Part 2 — S–Records parsing S-Records
Andrey Zagrebin, Moshe Kol, Shlomi Oberman
This post is the second of a four-part blog series documenting the different structures and stages of the firmware update. The next parts of the series will be uploaded week by week as we write them.
- Part 1 – Just Print Me
- Part 2 – S-Records parsing S-Records
- Part 3 – From NAND to RAM through sliding windows
- Part 4 – memory map leads us to our destination
In the previous post we detailed how to unpack HP firmware raster graphics and extract its encoded data. A second layer of encoded data should be visible at this point. This post details how to decipher the next encoding layer.
The second firmware encoding layer begins with a series of approximately 6,000 structured ASCII lines, followed by a large blob of binary data. In our research, we initially ignored the opening ASCII lines, focusing primarily on the binary data. In this post, however, we start with the ASCII lines and come back to the binary data later.
The S-Record format
The second-stage file’s ASCII lines, found at the beginning of the file, use the “S-Record” format explained below, with some proprietary extensions thrown in for good measure.
Motorola developed the S-Record format, an ASCII format used for encoding binary data. Its regular purpose is to program EEPROM/Flash memory chips. It is commonly used by firmware developers, and is composed of different types of records, with two main categories being:
- Data records, consisting of a load address followed by data.
- Start address records, containing an address for redirecting execution at the end of the programming.
Each S-Record is a separate line (ending with \n) and consists of 5 fields: the record type, the record length, the address, the data, and the checksum.
The use of each field depends on the record type. Aside from the first character (which is S), all the others are ASCII-hexadecimal.
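The field layout can be sketched in a few lines of Python. This is a minimal illustration, not the parser used in our research: the names SRecord, ADDRESS_BYTES and parse_srecord are hypothetical, and the address-width table covers only the standard record types, so a proprietary type such as A is rejected with a KeyError.

```python
from typing import NamedTuple

# Address width in bytes for the standard record types (hypothetical helper table).
ADDRESS_BYTES = {"0": 2, "1": 2, "2": 3, "3": 4, "7": 4, "8": 3, "9": 2}


class SRecord(NamedTuple):
    rtype: str      # the character that follows 'S'
    length: int     # counts the address, data and checksum bytes
    address: int
    data: bytes
    checksum: int


def parse_srecord(line: str) -> SRecord:
    """Split one ASCII S-Record line into its five fields."""
    line = line.strip().replace(" ", "")  # tolerate spaces added for readability
    if not line.startswith("S"):
        raise ValueError("not an S-Record")
    rtype = line[1]
    length = int(line[2:4], 16)
    body = bytes.fromhex(line[4:])
    if len(body) != length:
        raise ValueError("length field does not match the record body")
    n = ADDRESS_BYTES[rtype]
    return SRecord(rtype, length, int.from_bytes(body[:n], "big"), body[n:-1], body[-1])


# A type 3 record taken from later in this post.
rec = parse_srecord("S 3 19 401D0000 F24111C0F2CC011F7808B170F64E410EF6C34102 77")
print(rec.rtype, hex(rec.address), len(rec.data))  # 3 0x401d0000 20
```

The parser only splits the line; it does not verify the checksum.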
The ASCII part of the 2nd-stage firmware file begins, in our case, with 7 S-Records of type A, each of them of length 0x27 (spaces were added for convenience):
S A 27 0202026F02EA000801008F8B97F7FFC72708D194D0F4181400BF28865C57BDEFADCDDE136EB6 21
S A 27 5E116D98172B43812D1FA171C827EA524AC7A7699AA4AAA66D09C92F0FFE82FC8C611E6D2E90 F7
S A 27 9527FD68BC05365133E59A9C3F5C3E63819D64D7BBD839AF5B40ACCA7A4964ED97ECD21476B8 55
S A 27 46E0ADF0F9BF0D9210B70EA7E6FB96C007CED8DF8F92FCA2F5A4ADDE730EF4565049F2523097 2D
S A 27 1B1A3B9BD4ADB31654C1B7A94D98629BB52974BB362F18FCD7D2898B6D20AA872070EC0BC3FC E5
S A 27 87737AEB4FDAA840BFC6355D3C6F405FFFB8F074141D1B59B302E4808944AC5CD4AE8FE3DEB8 CF
S A 27 AC2DD17268E76FBAD51F8B979994DD70760F2BE56356659086F46FD5B109762F84ACBBCE0E42 4B
This type of S-Record is proprietary and undocumented. Some investigation suggests that it is an authentication header (possibly a signature of the update package).
After the authentication header, there is an S-Record of type 0. A record of type 0 is called the header record. Its address field is usually zero, and its data field normally contains textual information describing the following block of S-Records in a human-readable format. This record type does not affect record parsing. In our case, it contains the string “reflash” padded with three spaces (the spaces are intentional) and two trailing null bytes:
S 0 0F 0000 7265666C6173682020200000 AB
            r e f l a s h _ _ _ [spaces]
The next 5,908 lines are S-Records of type 3. Type 3 records instruct the flash programmer to store the record data to the specified 4-byte memory address. For example, the following S-Record line means “store 20 bytes of data at address 401D0000”:
S 3 19 401D0000 F24111C0F2CC011F7808B170F64E410EF6C34102 77
0x19 == 25: 4-byte address, 20 bytes of pure data, 1-byte checksum
The last byte is the checksum and is computed as follows:
- Sum all bytes (modulo 256) starting at the length field.
- Take the 1’s complement of the result.
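In our example, the bytes from the length field (19) through the last data byte (02) sum to 88 modulo 0x100. The stored checksum, 77, is the 1’s complement of that value, and one can verify that
88 & 77 = 0 and 88 + 77 = FF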
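The two checksum steps translate directly into Python. The sketch below is only an illustration (the function name srecord_checksum is ours); it recomputes the checksum of the type 3 record shown above.

```python
def srecord_checksum(body: bytes) -> int:
    """body is the length byte, the address bytes and the data bytes (checksum excluded)."""
    return ~sum(body) & 0xFF  # 1's complement of the sum, kept to one byte


# Length 19, address 401D0000, then the 20 data bytes of the example record.
body = bytes.fromhex("19" "401D0000" "F24111C0F2CC011F7808B170F64E410EF6C34102")
print(hex(sum(body) & 0xFF))        # 0x88
print(hex(srecord_checksum(body)))  # 0x77
```

The same function reproduces the checksums of the other records quoted in this post, for example AB for the header record.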
The block of type 3 S-Records is terminated with an S-Record of type 7:
S 7 05 401D8D38 D8
This record has no data field, and the address field is a 4-byte address to which control (CPU execution) is passed.
Using the open-source tool srec_info (for a link, see the reference section below), we can gain some information about the file (be sure to manually remove the first 7 S-Record lines so that srec_info recognizes the file). The output should look like this:
$ srec_info outer.srec
Format: Motorola S-Record
Header: "reflash "
Execution Start Address: 401D8D38
srec_info: outer.srec: 5911: warning: ignoring garbage lines
Data: 401D0000 - 401E4D95
      401E4D98 - 401EBE67
      401EBE98 - 401EBEBF
      C01F02E0 - C01F11BF
      C01F8164 - C01F8164
This output shows that the loaded data is not contiguous in memory. The garbage lines referred to by srec_info are the two lines immediately following the S7 record. These appear to be special (proprietary) records beginning with P followed by a four-byte address. They could be used to set up some registers or state before branching, serve as an indicator of a version number, or do something else entirely, but we have not investigated this further.
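As a rough cross-check of the address ranges, the type 3 records can be coalesced by hand. The sketch below is our own illustration and not how srec_info is implemented; data_ranges is a hypothetical name, and every line that is not a type 3 record (the A records, the header, the S7 record and the P lines) is simply skipped.

```python
def data_ranges(lines):
    """Return merged, inclusive (first, last) address ranges covered by S3 records."""
    spans = []
    for line in lines:
        line = line.strip()
        if not line.startswith("S3"):
            continue  # skip proprietary and non-data records
        body = bytes.fromhex(line[4:])     # address + data + checksum
        start = int.from_bytes(body[:4], "big")
        ndata = len(body) - 5              # minus 4 address bytes and 1 checksum byte
        if ndata > 0:
            spans.append((start, start + ndata - 1))
    merged = []
    for first, last in sorted(spans):
        if merged and first <= merged[-1][1] + 1:
            # Adjacent or overlapping: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged
```

Running it over the lines of the second-stage file and printing each pair in hexadecimal should give a list comparable to the Data section of the srec_info output.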
The binary data
Following the S-Records comes the binary data. In our research, this was the first part we analyzed.
Many sequences in the binary data are easily detectable as human-readable strings, but they are “broken”, i.e., they are readable fragments of strings, not whole strings. This suggests that we are closing in on the final raw data, and that the format is not compressed, encrypted, or heavily encoded.
By staring a bit at the binary data, a pattern emerges. Take a look at this certificate extracted from the binary:
You can see a definite pattern in the binary data: a 7-byte sequence recurs after every 40 bytes of data. These 7 bytes are clearly not part of the real data (i.e., they contain non-ASCII bytes). Initial observations indicate that the second and third bytes, 33 2D, are some sort of header, while the next 4 bytes hold some kind of counter that is incremented by 0x28 every time. The first byte looks like some kind of checksum.
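An observation like this can be checked mechanically. The following sketch is a hypothetical helper rather than the tooling we used: it looks for every 33 2D marker, reads the 4 bytes after it as a big-endian counter, and builds a histogram of the differences between consecutive counters. The marker can also occur by chance inside the payload, so a few outliers are to be expected on real data.

```python
from collections import Counter


def counter_deltas(blob: bytes, marker: bytes = b"\x33\x2d") -> Counter:
    """Histogram of differences between consecutive 4-byte values that follow the marker."""
    counters = []
    pos = blob.find(marker)
    while pos != -1 and pos + 6 <= len(blob):
        counters.append(int.from_bytes(blob[pos + 2:pos + 6], "big"))
        pos = blob.find(marker, pos + 2)
    return Counter(b - a for a, b in zip(counters, counters[1:]))
```

If the counter hypothesis holds, the histogram is dominated by a single key, 0x28.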
Internally we referred to this pattern as the “mysterious 7-byte pattern”. Several questions intrigued us:
- What is the purpose of this pattern?
- What is the purpose of the ASCII commands at the beginning of the file?
- How is the checksum computed (and is it a checksum)?
Let’s look at the first byte, which looks like a checksum. Note that the value is different for two identical sequences. Consider the following:
It appears that, for every 0x28 bytes, the checksum byte is usually decremented by 0x28 (modulo 0x100), but that occasionally it is decremented by 0x29.
To see this, let’s take a look at the address 0x13AB3E in the image and the three checksum bytes that follow it. We can examine the absolute difference between the checksum values:
1A + 28 = 42    (the first byte at address 0x13AB3E)
F1 + 29 = 1A
^
+--- (the first byte at address 0x13AB9C)
Fast forward many grueling hours, in which we stare at the data, and it stares back at us. In the end, it gives up its secrets. The checksum calculation and the entire format can be deduced.
It turns out the first byte is actually the last byte of the previous “7-byte pattern”, and the pattern is nothing more than a binary version of the Motorola S-Records — mysterious indeed!
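With the S-Record checksum in mind, the 0x28 and 0x29 decrements have a simple explanation, assuming the payload of consecutive records is identical. The address grows by 0x28 per record, so the byte sum grows by 0x28 and the checksum drops by the same amount. When the lowest address byte overflows, a carry adds 1 to the next address byte, and the sum grows by 0x29 instead. The sketch below illustrates this with made-up addresses and an all-zero payload; the checksum helper is ours.

```python
def checksum(length: int, address: int, data: bytes) -> int:
    """S-Record checksum over the length byte, a 4-byte address and the data."""
    body = bytes([length]) + address.to_bytes(4, "big") + data
    return ~sum(body) & 0xFF


data = bytes(40)  # identical payload in both records of each pair
for addr in (0x0018A008, 0x0018A0F0):  # the second address carries into the next byte
    before = checksum(0x2D, addr, data)
    after = checksum(0x2D, addr + 0x28, data)
    print(hex(addr), hex((before - after) & 0xFF))
```

The first pair shows a decrement of 0x28 and the second, where 0xF0 + 0x28 overflows the low byte, a decrement of 0x29.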
Records in the binary S-Record have the same structure as their ASCII counterparts, except that the SX in the ASCII version (where X denotes the type) is replaced by 0x3X in the binary version, and the rest of the data is binary data and not ASCII-encoded. The newlines are also omitted.
For type 3 records the format is:
33 <length> <4-byte address> <... data ...> <checksum>
For example, the following record
33 2D 00 18 A0 08 A3 62 02 B2 38 CC DA 54 6F 9B 4A 02 FC 81 0D 32 0E BB B1 3C 37 D6 8F A8 3C E7 9D F2 38 8C 63 FC F6 EA FC 38 68 03 2B F2 AA
is parsed as:
33 - Record type (3)
2D - Record length (45)
00 18 A0 08 - Record address
A3 62 02 B2 38 CC DA 54 6F 9B 4A 02 FC 81 0D 32 0E BB B1 3C 37 D6 8F A8 3C E7 9D F2 38 8C 63 FC F6 EA FC 38 68 03 2B F2 - data
AA - Record checksum
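The binary layout is easy to walk in Python. The following is a minimal sketch under two assumptions that we label as such: every record carries a 4-byte address (true for the type 3 records shown here), and the checksum follows the same rule as in the ASCII format. The name parse_binary_records is ours.

```python
def parse_binary_records(blob: bytes):
    """Yield (record type, address, data) for each binary S-Record in blob."""
    pos = 0
    while pos + 2 <= len(blob):
        rtype = blob[pos] - 0x30          # 0x33 -> type 3
        length = blob[pos + 1]            # counts address + data + checksum
        body = blob[pos + 2:pos + 2 + length]
        if len(body) != length:
            raise ValueError(f"truncated record at offset {pos:#x}")
        # Length, address, data and checksum must sum to 0xFF modulo 0x100.
        if (length + sum(body)) & 0xFF != 0xFF:
            raise ValueError(f"bad checksum at offset {pos:#x}")
        # Assumption: 4-byte address, as in the type 3 example.
        yield rtype, int.from_bytes(body[:4], "big"), body[4:-1]
        pos += 2 + length


record = bytes.fromhex(
    "332D0018A008A36202B238CCDA546F9B4A02FC810D320EBBB13C37D68FA83CE7"
    "9DF2388C63FCF6EAFC3868032BF2AA"
)
for rtype, address, data in parse_binary_records(record):
    print(rtype, hex(address), len(data))  # 3 0x18a008 40
```

The checksum test doubles as a sanity check that the parser is still aligned on record boundaries.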
I heard you like S-Records
So what does this all mean? Why do we have ASCII S-Records followed by a binary version of S-Records? Why would HP use a binary S-Record format instead of the straightforward ASCII version? And if they decided to use binary S-Records, then why are the ASCII S-Records used as well?
The answers are, believe it or not, in the S-Records.
We looked at the data encoded by the (ASCII) S-Record layer, writing a simple Python script to extract the records. The script loads the decoded data into Ghidra at address 0x401D0000, the address at which the S-Records place this section.
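An extraction step of this kind might look like the sketch below. It is not our original script, and load_segments is a hypothetical name. It assumes the type 3 records appear in ascending address order, verifies each checksum, and groups the data into contiguous segments keyed by start address, which avoids allocating one huge buffer for the distant C01F.... ranges.

```python
def load_segments(lines):
    """Group S3 record data into contiguous segments: {start address: bytearray}."""
    segments = {}  # start address -> data
    ends = {}      # exclusive end address -> start address of that segment
    for line in lines:
        line = line.strip()
        if not line.startswith("S3"):
            continue  # skip the A records, the header, S7 and the P lines
        body = bytes.fromhex(line[4:])
        if (int(line[2:4], 16) + sum(body)) & 0xFF != 0xFF:
            raise ValueError("checksum mismatch: " + line)
        addr, data = int.from_bytes(body[:4], "big"), body[4:-1]
        start = ends.pop(addr, addr)  # extend a segment that ends exactly here
        segments.setdefault(start, bytearray()).extend(data)
        ends[addr + len(data)] = start
    return segments
```

The segment that starts at 0x401D0000 can then be written to a file and imported as a raw binary at that base address.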
It turns out that the ASCII S-Records contain code.
An intriguing function is located at address 0x401d84d4. Here is an excerpt from Ghidra’s decompilation:
Lo and behold, the initial ASCII S-Record lines contain code. What does this code do? It decodes the binary S-Records that follow (yes, you read that correctly).
Why did HP do this? We don’t have a clear answer, but space is a likely reason for the binary S-Records. Binary data will expand in size by approximately 2.4 times when represented in ASCII S-Record format (see SRecord Reference Manual page 134). The firmware update file is already quite large – close to 40MB, and there’s no reason to double its size.
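The overhead of the two encodings can be estimated from the record layouts seen in this file. The arithmetic below is a back-of-the-envelope sketch: the function names are ours, a 4-byte address is assumed, and the exact ASCII ratio depends on how many data bytes each line carries, which is why it differs somewhat from the manual's figure.

```python
def ascii_record_size(data_len: int, addr_len: int = 4) -> int:
    # 'S' + type digit, 2 hex digits of length, hex-encoded address/data/checksum, newline
    return 2 + 2 + 2 * (addr_len + data_len + 1) + 1


def binary_record_size(data_len: int, addr_len: int = 4) -> int:
    # type byte, length byte, address, data, checksum
    return 2 + addr_len + data_len + 1


print(ascii_record_size(20) / 20)    # 2.75  (20 data bytes per ASCII line, as in this file)
print(binary_record_size(40) / 40)   # 1.175 (40 data bytes per binary record)
```

For the record sizes used here, the binary form costs roughly 18% on top of the raw data, while the ASCII form nearly triples it.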
Why the double encoding? Possibly to enable all updates to be backward-compatible with versions of the printer that have code to decode ASCII S-Records. It could also be related to a staged process performed while updating the firmware.
We now have a raw flash image!
In the next blog post, we will see how to load this image into memory for reverse engineering (and regular printer operation).