
I found a need to parse GPT headers directly from a potentially corrupt or truncated disk image. sgdisk makes a best effort when printing out the table, but its output is hard to parse. sfdisk provides very nice output options but fails to read an image that doesn’t have both the header and trailer GPT tables. I also wanted access to the GUID and CRC values for some additional logic.

$ sgdisk -p disk.img
Disk /dev/nvme0n1: 1000215216 sectors, 476.9 GiB
Model: SAMSUNG MZVL2512HCJQ-00BH1              
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 50BA123A-641E-419F-AE95-93E722FBAE66
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1000215182
Partitions will be aligned on 2048-sector boundaries
Total free space is 2669 sectors (1.3 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            4095   1024.0 KiB  EF02  
   2            4096          503807   244.0 MiB   EF00  EFI System Partition
   3          503808         8503295   3.8 GiB     8300  
   4         8503296      1000214527   472.9 GiB   8300  

Looking around for something in my environment that would let me parse the table in a declarative way, without having to munge bytes myself, I found hexdump. It provides a way to convert binary data to human-readable or, more importantly, machine-parseable output.

hexdump from util-linux version 2.37.2 is what I have. You may encounter reimplementations of it for other OSs with different features.

Wikipedia provides a nice reference for the data layout of the GPT table. I am not going to go into detail on the table layout here, but I encourage you to have a look. I will note that GPT refers to LBAs, which are the (generally) 512-byte chunks/sectors/blocks that the block device is divided into.

hexdump can accept a format string describing how to parse and print the binary data. It borrows from the printf token scheme. The format string is a chain of space-separated “format units” that describe how to deserialize the data and print it. Each unit is composed of two numbers and a double-quoted string containing the printf token ([c/b] "%s"). Either number is optional, but if provided it must include the / so hexdump can tell which one you are providing. The first number is how many times to repeat the format unit. The second is how many bytes to consume on each iteration. The string can contain only a single printf % token. You can think of it as hexdump calling printf once per iteration, passing the consumed bytes as the single argument.
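To make that concrete, here is a minimal sketch run against a synthetic 12-byte record rather than a real disk. The input bytes and the tag=/size= labels are made up for illustration, and the integer result assumes a little-endian host, since hexdump reads multi-byte values in host byte order.

```bash
# Synthetic record: 8 ASCII characters followed by a little-endian
# 32-bit integer with the value 92 (octal escape \134 is byte 0x5c).
parse_demo() {
  printf 'EFI PART\134\000\000\000' |
    hexdump -v -e '"tag=" 8/1 "%c" "\n" "size=" 1/4 "%u" "\n"'
}
parse_demo
```

The first unit repeats a one-byte %c eight times, and the second consumes four bytes once as an unsigned integer, so this should print tag=EFI PART followed by size=92.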

hexdump’s behaviour around when it consumes bytes from the input stream is inconsistent and seems to depend on the state left by the previous format unit. Generally, I have found that if the string doesn’t contain a printf token, it doesn’t consume bytes from the stream, regardless of the numbers preceding the string. This is specifically true when you provide the empty string to try to discard bytes (1/4 ""). That doesn’t seem to work, so I just print the bytes in a way that can be ignored. The exception seems to be when it is the last format unit.
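One way to sketch that workaround: give the unwanted field a real printf token and a prefix that is easy to filter out afterwards. The eight input bytes and the #ignored/value names below are invented for the example, and the values shown assume a little-endian host.

```bash
# 8 synthetic bytes: a 4-byte field we do not care about, then a
# little-endian 32-bit value of 7.
skip_demo() {
  printf '\001\002\003\004\007\000\000\000' |
    hexdump -v -e '"#ignored=" 1/4 "%08x" "\n" "value=" 1/4 "%u" "\n"'
}
# Printing the field consumes its bytes; grep then drops the marker line.
skip_demo | grep -v '^#'
```

Because the ignored line starts with #, it is also harmless if the output is later sourced by a shell.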

hexdump also allows you to provide a file that contains the format string rather than trying to manage it on the command line. The way it handles the file is a bit odd. Each line in the file is used to format the input, meaning multiple lines will repeatedly output the same input file from the beginning rather than allow you to have a single format string spanning multiple lines.
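If you want to keep a multi-line format file readable but still have hexdump treat it as one format string, one option is to join the lines yourself before handing them over. This is a small sketch with a throwaway format file and four made-up input bytes. It passes the joined text through -e instead of -f and assumes a little-endian host.

```bash
# Throwaway format file: two groups of format units on separate lines.
fmt_file=$(mktemp)
cat > "$fmt_file" <<'EOF'
"first=" 1/2 "%u" "\n"
"second=" 1/2 "%u" "\n"
EOF

join_demo() {
  # tr folds the newlines into spaces so hexdump sees one format string.
  printf '\001\000\002\000' | hexdump -v -e "$(tr '\n' ' ' < "$fmt_file")"
}
join_demo
```

With the lines joined, the two 16-bit values are read one after the other from a single pass over the input, giving first=1 and second=2.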

For your copy-pasta delight I have provided three files:

Copy the following into a file named gpt_header

#!/usr/bin/env -S bash -c 'hexdump -v -s$(( ${BLOCK_SIZE:-512} * ${OFFSET_LBA:-1} )) -n${BLOCK_SIZE:-512} -f <(tail -n +2 $0 | tr "\n" " ") $1'

"Signature='" 8/1 "%1_u" "'\n"
"Header_Revision=" 2/2 "%u" "\n"
"Header_Size=" 1/4 "%u" "\n"
"Header_CRC32='" 1/4 "%08x" "'\n"
"#Reserved" 1/4 "%d\n"
"Current_LBA=" 1/8 "%u" "\n"
"Backup_LBA=" 1/8 "%u" "\n"
"First_usable_LBA=" 1/8 "%u" "\n"
"Last_usable_LBA=" 1/8 "%u" "\n"
"Disk_GUID='" 1/4 "%08X-" 2/2 "%04X-" 2/1 "%02X" "-" 6/1 "%02X" "'\n"
"Partition_Entries_LBA=" 1/8 "%u" "\n"
"Partition_Max_Count=" 1/4 "%u" "\n"
"Partition_Entry_Size=" 1/4 "%u" "\n"
"Partition_Entries_CRC32='" 1/4 "%08x" "'\n"
1/420 ""

Copy the following into a file named gpt_entries

#!/usr/bin/env -S bash -c 'hexdump -v -s$(( ${BLOCK_SIZE:-512} * ( ${OFFSET_LBA:-1} + 1 ) )) -n$(( ${LIMIT:-128} * 128 )) -f <(tail -n +2 $0 | tr "\n" " ") $1'

1/4 "%08X-" 2/2 "%04X-" 2/1 "%02X" "-" 6/1 "%02X" "\t"
1/4 "%08X-" 2/2 "%04X-" 2/1 "%02X" "-" 6/1 "%02X" "\t"
1/8 "%u" "\t"
1/8 "%u" "\t"
1/8 "%08x" "\t"
72/1 "%1_p" "\n"

You will see that I have included a shebang in the files allowing you to chmod +x gpt_header gpt_entries and execute the files directly rather than having to manage separate bash scripts. The shebangs also include a workaround that allows a single format string to span multiple lines for easier reading. Each script is passed a single argument, the path to the disk image or block device.

If for whatever reason your disk has a sector size other than 512 bytes you can set an env variable BLOCK_SIZE=<your sector size>. If whatever partitioned your disk did not put the GPT header at LBA 1, you can specify the offset (in LBAs) via the OFFSET_LBA=<your GPT offset> env variable. gpt_entries also allows you to specify a limit on the number of entries it lists via the LIMIT= env variable.
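The arithmetic behind those two variables is simple enough to sketch on its own. gpt_offsets is a hypothetical helper, not one of the files above; it only mirrors the -s calculations in the two shebang lines.

```bash
gpt_offsets() {
  block_size=${BLOCK_SIZE:-512}
  offset_lba=${OFFSET_LBA:-1}
  # The header sits at OFFSET_LBA; the entries are assumed to start in the next block.
  echo "header_offset=$(( block_size * offset_lba ))"
  echo "entries_offset=$(( block_size * (offset_lba + 1) ))"
}
gpt_offsets                        # defaults: 512-byte sectors, header at LBA 1
( BLOCK_SIZE=4096; gpt_offsets )   # a disk with 4096-byte sectors
```

With the defaults this gives byte offsets 512 and 1024, and with 4096-byte sectors it gives 4096 and 8192.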

What always surprises me is the number of people who do not know you can inline environment variables for any executable command by putting their declaration before the executable path.

For example, rather than

export BLOCK_SIZE=512
export OFFSET_LBA=1
export LIMIT=3
./gpt_entries disk.img

You can simply write it like this and the variables will be set within the ./gpt_entries execution environment.

BLOCK_SIZE=512 OFFSET_LBA=1 LIMIT=3 ./gpt_entries disk.img
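A quick way to convince yourself that the assignment is scoped to the one command is to run a child shell with and without it. This sketch uses sh -c as a stand-in for ./gpt_entries.

```bash
unset LIMIT
# The prefix assignment is exported to this one child process only.
LIMIT=3 sh -c 'echo "LIMIT=${LIMIT:-unset}"'
# The next command starts from the unchanged parent environment.
sh -c 'echo "LIMIT=${LIMIT:-unset}"'
```

The first command prints LIMIT=3 and the second prints LIMIT=unset.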

Obligatory example outputs

$ ./gpt_header disk.img 
Signature='EFI PART'
Header_Revision=01
Header_Size=92
Header_CRC32='c87f96b0'
#Reserved0
Current_LBA=1
Backup_LBA=1000215215
First_usable_LBA=34
Last_usable_LBA=1000215182
Disk_GUID='50BA123A-641E-419F-AE95-93E722FBAE66'
Partition_Entries_LBA=2
Partition_Max_Count=128
Partition_Entry_Size=128
Partition_Entries_CRC32='1c30552d'
$ ./gpt_entries disk.img | head
21686148-6449-6E6F-744E-656564454649	D0269D3E-2390-4536-97CD-D1E1F2DAD9D2	      2048	      4095	00000000	........................................................................
C12A7328-F81F-11D2-BA4B-00A0C93EC93B	E2FFD449-0A42-4613-8110-2FB7FF137BF8	      4096	    503807	00000000	E.F.I. .S.y.s.t.e.m. .P.a.r.t.i.t.i.o.n.................................
0FC63DAF-8483-4772-8E79-3D69D8477DE4	A6A09F63-13E2-401D-B450-5816661E4E45	    503808	   8503295	00000000	........................................................................
0FC63DAF-8483-4772-8E79-3D69D8477DE4	0BC98C47-894A-461E-A033-CB99D3D2E93A	   8503296	1000214527	00000000	........................................................................
00000000-0000-0000-0000-000000000000	00000000-0000-0000-0000-000000000000	         0	         0	00000000	........................................................................
00000000-0000-0000-0000-000000000000	00000000-0000-0000-0000-000000000000	         0	         0	00000000	........................................................................
00000000-0000-0000-0000-000000000000	00000000-0000-0000-0000-000000000000	         0	         0	00000000	........................................................................
00000000-0000-0000-0000-000000000000	00000000-0000-0000-0000-000000000000	         0	         0	00000000	........................................................................
00000000-0000-0000-0000-000000000000	00000000-0000-0000-0000-000000000000	         0	         0	00000000	........................................................................
00000000-0000-0000-0000-000000000000	00000000-0000-0000-0000-000000000000	         0	         0	00000000	........................................................................

Unfortunately, hexdump can’t format UTF-16 encoded strings, so you get a . for each null byte in the label.
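If you need the real label text, one workaround is to pull the 72 label bytes out separately and let iconv decode them. gpt_label is a hypothetical helper that is not part of the scripts above; it assumes 128-byte entries located directly after the header, with the name stored as UTF-16LE in the last 72 bytes of each entry.

```bash
# Usage: gpt_label <disk> <entry index, starting at 0>
gpt_label() (
  disk=$1
  index=$2
  block_size=${BLOCK_SIZE:-512}
  offset_lba=${OFFSET_LBA:-1}
  # Byte offset of the name field inside the chosen entry.
  start=$(( block_size * (offset_lba + 1) + index * 128 + 56 ))
  dd if="$disk" bs=1 skip="$start" count=72 2>/dev/null |
    iconv -f UTF-16LE -t UTF-8 |
    tr -d '\000'   # strip the null padding after the name
)
# Example: gpt_label disk.img 1
```

Unlike deleting every . from the hexdump output, this keeps any literal dots that are part of a label.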

Note, there is nothing stopping you from re-working these scripts to output as JSON or any format you like.
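As a rough sketch of the JSON idea, the tab-separated output of gpt_entries can be piped through awk to produce one JSON object per line. entries_to_json and the key names are my own invention here, and the label column is left out because it would still need decoding and escaping.

```bash
# Reads gpt_entries output on stdin, writes one JSON object per used entry.
entries_to_json() {
  awk -F'\t' '
    $1 ~ /^0+-0+-0+-0+-0+$/ { next }        # all-zero type GUID: unused entry
    {
      gsub(/ /, "", $3); gsub(/ /, "", $4)  # drop any padding around the LBAs
      printf "{\"type\":\"%s\",\"guid\":\"%s\",\"start\":%s,\"last\":%s,\"attributes\":\"%s\"}\n",
             $1, $2, $3, $4, $5
    }'
}
# Example: ./gpt_entries disk.img | entries_to_json
```

Emitting one object per line keeps the awk side simple and lets a JSON tool of your choice wrap the lines in an array later.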

For completeness, here is a bash script that brings everything together and cleans up the output.

#!/usr/bin/bash

set -eu

BLOCK_SIZE=${BLOCK_SIZE:-512}
OFFSET_LBA=${OFFSET_LBA:-1}

disk="${1?'You must provide the path to the GPT disk as the first argument'}"
src="$(dirname ${BASH_SOURCE[0]})"

source <($src/gpt_header $disk)

# Validate header info
[[ $Signature == 'EFI PART' ]] || {
  echo "Unexpected header signature: $Signature"
  exit 1
}

[[ $Header_Revision == 01 ]] || {
  echo "Unexpected GPT revision: $Header_Revision"
  exit 1
}

dd="dd if=$disk of=/dev/stdout bs=1 status=none"
# The CRC field needs to be zero'd for the calculation
calculatedCRC=$(cat <($dd skip=$(( OFFSET_LBA * BLOCK_SIZE )) count=16) <(head -c4 /dev/zero) <($dd skip=$(( (OFFSET_LBA * BLOCK_SIZE) + 20 )) count=72) | crc32 /dev/stdin)
[[ $calculatedCRC == $Header_CRC32 ]] || {
  echo "Unexpected header CRC: expected $Header_CRC32 got $calculatedCRC"
  exit 1
}

$src/gpt_header $disk

# List partition entries filtering out unused entries and remove the null bytes from the labels
echo # blank line
fmt='% 36s % 36s % 10s % 10s % 10s %s\n'
printf "$fmt" 'Type GUID' 'Partition GUID' 'Start LBA' 'Last LBA' 'Attributes' 'Label'
$src/gpt_entries $disk | grep -v '^00000000-0000-0000-0000-000000000000' | sed -e "s/^/'/g;s/\t/' '/g;s/$/'/g" | tr -d '.' | xargs -l printf "$fmt"

Output

$ ./dumpgpt.sh disk.img 
Signature='EFI PART'
Header_Revision=01
Header_Size=92
Header_CRC32='c87f96b0'
#Reserved0
Current_LBA=1
Backup_LBA=1000215215
First_usable_LBA=34
Last_usable_LBA=1000215182
Disk_GUID='50BA123A-641E-419F-AE95-93E722FBAE66'
Partition_Entries_LBA=2
Partition_Max_Count=128
Partition_Entry_Size=128
Partition_Entries_CRC32='1c30552d'

                           Type GUID                       Partition GUID  Start LBA   Last LBA Attributes Label
21686148-6449-6E6F-744E-656564454649 D0269D3E-2390-4536-97CD-D1E1F2DAD9D2       2048       4095   00000000 
C12A7328-F81F-11D2-BA4B-00A0C93EC93B E2FFD449-0A42-4613-8110-2FB7FF137BF8       4096     503807   00000000 EFI System Partition
0FC63DAF-8483-4772-8E79-3D69D8477DE4 A6A09F63-13E2-401D-B450-5816661E4E45     503808    8503295   00000000 
0FC63DAF-8483-4772-8E79-3D69D8477DE4 0BC98C47-894A-461E-A033-CB99D3D2E93A    8503296 1000214527   00000000 
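
The crc32 command used in the script is not installed everywhere. As a hedged alternative, the CRC-32 that gzip stores in its trailer uses the same algorithm, so it can be borrowed. crc32_hex is a hypothetical helper, not part of the script above, and it assumes a little-endian host for the hexdump step.

```bash
# Reads stdin, prints its CRC-32 as 8 lowercase hex digits.
crc32_hex() {
  # A gzip stream ends with the CRC-32 and then the input size,
  # 4 bytes each, both little-endian.
  gzip -c | tail -c 8 | head -c 4 | hexdump -v -e '1/4 "%08x" "\n"'
}
printf '123456789' | crc32_hex   # standard CRC-32 check input
```

For the check input 123456789 this prints cbf43926, the well-known CRC-32 check value, so the same function could stand in for crc32 /dev/stdin in the header validation.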

I hope you have found this informative on the use of hexdump, bash, and general parsing of binary data like GPT headers. Here is a bonus script that I created to extract a partition image from a disk image. There are better utilities that already exist to do this but this was a proof-of-concept for a more complex system.

#!/usr/bin/env bash
# Use: ./dumppartition 'E2FFD449-0A42-4613-8110-2FB7FF137BF8' disk.img part.img
set -eu -o pipefail

BLOCK_SIZE=${BLOCK_SIZE:-512}
OFFSET_LBA=${OFFSET_LBA:-1}

search="${1?"The first argument must be either a GUID or disk label to search for"}"
disk="${2?'You must provide the path to the GPT disk as the second argument'}"
out="${3?"The third argument must be a path to write the partition"}"
src="$(dirname ${BASH_SOURCE[0]})"

source <($src/gpt_header $disk)

# Validate header info
[[ $Signature == 'EFI PART' ]] || {
  echo "Unexpected header signature: $Signature"
  exit 1
}

[[ $Header_Revision == 01 ]] || {
  echo "Unexpected GPT revision: $Header_Revision"
  exit 1
}

dd="dd if=$disk of=/dev/stdout bs=1 status=none"
# The CRC field needs to be zero'd for the calculation
calculatedCRC=$(cat <($dd skip=$(( OFFSET_LBA * BLOCK_SIZE )) count=16) <(head -c4 /dev/zero) <($dd skip=$(( (OFFSET_LBA * BLOCK_SIZE) + 20 )) count=72) | crc32 /dev/stdin)
[[ $calculatedCRC == $Header_CRC32 ]] || {
  echo "Unexpected header CRC: expected $Header_CRC32 got $calculatedCRC"
  exit 1
}

found="$($src/gpt_entries $disk | grep -v '^00000000-0000-0000-0000-000000000000' | tr -d '.' | grep -m1 -F "$search")" || {
  echo "No matching partition found"
  exit 1
}

startLBA=$(cut -f3 <<<"$found")
lastLBA=$(cut -f4 <<<"$found")

# Arithmetic expansion tolerates any whitespace padding around the LBA fields
dd if="$disk" of="$out" skip=$(( startLBA )) count=$(( lastLBA - startLBA + 1 )) bs="$BLOCK_SIZE"