Unpacking HP Firmware Updates

Part 2 – S-Records parsing S-Records

Andrey Zagrebin, Moshe Kol, Shlomi Oberman

This post is the second of a four-part blog series documenting the different structures and stages of the firmware update. The next parts of the series will be uploaded week by week as we write them.

    • Part 1 – Just Print Me
    • Part 2 – S-Records parsing S-Records
    • Part 3 – From NAND to RAM through sliding windows (coming soon)
    • Part 4 – Tools and process (coming soon)

In the previous post we detailed how to unpack HP firmware raster graphics and extract its encoded data. A second layer of encoded data should be visible at this point. This post details how to decipher the next encoding layer.

The second firmware encoding layer begins with a series of approximately 6,000 structured ASCII lines, followed by a large blob of binary data. In our research, we initially ignored the opening ASCII lines and focused on the binary data. In this post, however, we start with the ASCII lines and come back to the binary data later.

The S-Record format

The second-stage file’s ASCII lines, found at the beginning of the file, use the “S-Record” format explained below, with some proprietary extensions thrown in for good measure.

Motorola developed the S-Record format, an ASCII format used for encoding binary data. Its regular purpose is to program EEPROM/Flash memory chips. It is commonly used by firmware developers, and is composed of different types of records, with two main categories being:

  • Data records, consisting of a load address followed by data.
  • Start address records, containing an address to which execution is redirected once programming ends.

Each S-Record is a separate line ending with <LF> (\n). It starts with the letter S, followed by 5 fields:

S <Type> <Length> <Address> <Data> <Checksum>

The use of each field depends on the record type. Aside from the first character (which is S), all the others are ASCII-hexadecimal.

The ASCII part of the 2nd-stage firmware file begins, in our case, with 7 S-Records of type A, each of them of length 0x27 (spaces were added for convenience):

S A 27 0202026F02EA000801008F8B97F7FFC72708D194D0F4181400BF28865C57BDEFADCDDE136EB6 21 <LF>
S A 27 5E116D98172B43812D1FA171C827EA524AC7A7699AA4AAA66D09C92F0FFE82FC8C611E6D2E90 F7 <LF>
S A 27 9527FD68BC05365133E59A9C3F5C3E63819D64D7BBD839AF5B40ACCA7A4964ED97ECD21476B8 55 <LF>
S A 27 46E0ADF0F9BF0D9210B70EA7E6FB96C007CED8DF8F92FCA2F5A4ADDE730EF4565049F2523097 2D <LF>
S A 27 1B1A3B9BD4ADB31654C1B7A94D98629BB52974BB362F18FCD7D2898B6D20AA872070EC0BC3FC E5 <LF>
S A 27 87737AEB4FDAA840BFC6355D3C6F405FFFB8F074141D1B59B302E4808944AC5CD4AE8FE3DEB8 CF <LF>
S A 27 AC2DD17268E76FBAD51F8B979994DD70760F2BE56356659086F46FD5B109762F84ACBBCE0E42 4B <LF>

This type of S-Record is proprietary and undocumented. Some investigation suggests that it is likely an authentication header (possibly a signature of the update package).

After the authentication header, there is an S-Record of type 0, known as the header record. Its address field is usually zero, and its data field normally contains human-readable text describing the block of S-Records that follows. This record type does not affect record parsing. In our case, the data field holds the string “reflash” followed by three spaces (the spaces are intentional) and two zero bytes:

S 0 0F 0000 7265666C6173682020200000 AB <LF>
            r e f l a s h _ _ _[spaces]
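As a quick sanity check, the data field of this header record can be decoded by hand. The following is a minimal Python sketch; the variable names are ours and purely illustrative.

```python
# Data field of the S0 record shown above, as ASCII-hex.
s0_data_hex = "7265666C6173682020200000"

raw = bytes.fromhex(s0_data_hex)
# Drop the trailing zero bytes but keep the intentional spaces.
header_text = raw.rstrip(b"\x00").decode("ascii")

print(repr(header_text))  # the text with its trailing spaces made visible
print(len(raw))           # number of bytes in the data field
```

The data field is 12 bytes long, which matches the length field 0x0F (15): 2 address bytes, 12 data bytes, and 1 checksum byte.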

The next 5,908 lines are S-Records of type 3. Type 3 records instruct the flash programmer to store the record data to the specified 4-byte memory address. For example, the following S-Record line means “store 20 bytes of data at address 401D0000”:

S 3 19 401D0000 F24111C0F2CC011F7808B170F64E410EF6C34102 77 <LF>

Here the length field 0x19 (25) covers the 4-byte address, the 20 bytes of data, and the 1-byte checksum.
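The field layout of a type 3 record can be sketched in a few lines of Python. This is an illustration only: split_s3 is a hypothetical helper name, and it checks the length field but deliberately leaves the checksum alone.

```python
def split_s3(line: str) -> dict:
    """Split an ASCII S3 record into its fields (no checksum validation)."""
    line = line.strip()
    if not line.startswith("S3"):
        raise ValueError("not an S3 record")
    body = bytes.fromhex(line[2:])      # everything after "S3" is ASCII-hex
    length = body[0]                    # counts address + data + checksum
    if length != len(body) - 1:
        raise ValueError("length field does not match record size")
    return {
        "length": length,
        "address": int.from_bytes(body[1:5], "big"),  # 4-byte address
        "data": body[5:-1],
        "checksum": body[-1],
    }

# The example record from above, with the convenience spaces removed.
rec = split_s3("S319401D0000F24111C0F2CC011F7808B170F64E410EF6C3410277")
print(hex(rec["address"]), len(rec["data"]), hex(rec["checksum"]))
```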

The last byte is the checksum and is computed as follows:

  1. Sum all bytes (modulo 256) starting at the length field.
  2. Take the 1’s complement of the result.

In our example (all values are hexadecimal, and the sum is taken modulo 0x100),

19+40+1D+00+00+F2+41+11+C0+F2+CC+01+1F+78+08+B1+70+F6+4E+41+0E+F6+C3+41+02=88

and one can verify that 77 is the 1’s complement of 88:

88 + 77 = FF
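The two checksum steps translate directly into code. Below is a minimal Python sketch (srec_checksum is our own name for the helper) applied to the type 3 example record.

```python
def srec_checksum(body: bytes) -> int:
    """One's complement of the byte sum (mod 256) of length + address + data."""
    return ~sum(body) & 0xFF

# Length, address and data bytes of the S3 example (checksum byte excluded).
body = bytes.fromhex("19401D0000F24111C0F2CC011F7808B170F64E410EF6C34102")
total = sum(body) & 0xFF

print(f"{total:02X} {srec_checksum(body):02X}")
```

The same function can be applied to any other record in the file, as long as the bytes passed in start at the length field and stop just before the checksum.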

The block of type 3 S-Records is terminated with an S-Record of type 7:

S 7 05 401D8D38 D8 <LF>

This record has no data field, and the address field is a 4-byte address to which control (CPU execution) is passed.

Using the open-source tool srec_info (for a link, see the reference section below), we can gain some information about the file (be sure to manually remove the first 7 S-Record lines so that srec_info recognizes the file). The output should look like this:

$ srec_info outer.srec
Format: Motorola S-Record
Header: "reflash "
Execution Start Address: 401D8D38
srec_info: outer.srec: 5911: warning: ignoring garbage lines
Data: 401D0000 - 401E4D95
      401E4D98 - 401EBE67
      401EBE98 - 401EBEBF
      C01F02E0 - C01F11BF
      C01F8164 - C01F8164

This output shows that the loaded data is not contiguous in memory. The garbage lines referred to by srec_info are the two lines immediately following the S7 record:

F026A0DA4
P02628000

These appear to be special (proprietary) records that begin with F or P, followed by a four-byte address. They could be used to set up some registers or state before branching, or to indicate a version number, among other possibilities; we have not investigated this further.
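The manual cleanup needed before running srec_info can also be scripted. The sketch below is one possible approach, not the procedure we used: keep_standard_records is a hypothetical helper, it assumes the ASCII lines have already been separated from the binary data, and it drops the F and P lines as well as the type A records.

```python
STANDARD_TYPES = set("0123456789")

def keep_standard_records(lines):
    """Keep only lines that look like standard S0..S9 records."""
    return [
        ln for ln in lines
        if len(ln) >= 2 and ln[0] == "S" and ln[1] in STANDARD_TYPES
    ]

# Sample lines quoted in this post (one type A record stands in for all seven).
sample = [
    "SA270202026F02EA000801008F8B97F7FFC72708D194D0F4181400BF28865C57BDEFADCDDE136EB621",
    "S00F00007265666C6173682020200000AB",
    "S319401D0000F24111C0F2CC011F7808B170F64E410EF6C3410277",
    "S705401D8D38D8",
    "F026A0DA4",
    "P02628000",
]

print("\n".join(keep_standard_records(sample)))
```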

The binary data

Following the S-Records comes the binary data. In our research, this was the first part we analyzed.

Many sequences in the binary data are easily detectable as human-readable strings, but they are “broken”, i.e., they are readable fragments of strings, not whole strings. This suggests that we are closing in on the final raw data, and that the format is not compressed, encrypted, or heavily encoded.

By staring a bit at the binary data, a pattern emerges. Take a look at this certificate extracted from the binary:

You can see a definite 7-byte pattern that recurs after every 40 bytes of data. These 7 bytes are clearly not part of the real data (i.e., they contain non-ASCII bytes). Initial observations indicate that the second and third bytes, 33 2D, are some sort of header, while the next 4 bytes hold some kind of counter that is incremented by 0x28 every time. The first byte looks like some kind of checksum.

Internally we referred to this pattern as the “mysterious 7-byte pattern”. Several questions intrigued us:

  • What is the purpose of this pattern?
  • What is the purpose of the ASCII commands at the beginning of the file?
  • How is the checksum computed (and is it a checksum)?

Let’s look at the first byte, which looks like a checksum. Note that the value is different for two identical sequences. Consider the following:

It appears that, from one pattern to the next (every 0x28 bytes of data), the checksum byte is usually decremented by 0x28 (modulo 0x100), but that occasionally it is decremented by 0x29.

To see this, let’s take a look at the address 0x13AB3E in the image and the three checksum bytes starting there. We can examine the difference between consecutive checksum values (hexadecimal, modulo 0x100):

1A + 28 = 42 (the first byte at address 0x13AB3E)
F1 + 29 = 1A
^
+--- (the first byte at address 0x13AB9C)

Fast forward many grueling hours, in which we stare at the data, and it stares back at us. In the end, it gives up its secrets. The checksum calculation and the entire format can be deduced.

It turns out the first byte is actually the last byte of the previous “7-byte pattern”, and the pattern is nothing more than a binary version of the Motorola S-Records — mysterious indeed!
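In hindsight, the 0x28 and 0x29 steps follow from the S-Record checksum rule described earlier. If two consecutive records carry identical data and their addresses differ by 0x28, the byte sum normally grows by 0x28. When the addition carries out of the low address byte, that byte wraps (a change of 0x28 minus 0x100) and the next byte gains 1, so the sum grows by 0x29 modulo 0x100. The Python sketch below illustrates this with made-up addresses and all-zero data; none of these values come from the firmware.

```python
def srec_checksum(body: bytes) -> int:
    """One's complement of the byte sum (mod 256)."""
    return ~sum(body) & 0xFF

def checksum_for(address: int, data: bytes) -> int:
    """Checksum of a record with a 4-byte address and the given data."""
    length = 4 + len(data) + 1          # address + data + checksum
    body = bytes([length]) + address.to_bytes(4, "big") + data
    return srec_checksum(body)

data = bytes(40)  # hypothetical payload: 40 zero bytes, identical in every record

def step(address: int) -> int:
    """How much the checksum drops when the address advances by 0x28."""
    return (checksum_for(address, data) - checksum_for(address + 0x28, data)) & 0xFF

print(hex(step(0x00001000)))  # no carry out of the low address byte
print(hex(step(0x000010F0)))  # low byte wraps, the next byte gains a carry
```

By the same logic, a carry that ripples through two address bytes changes the checksum by one more, which should be rarer still.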

Records in the binary S-Record format have the same structure as their ASCII counterparts, except that the SX in the ASCII version (where X denotes the type) is replaced by the byte 0x3X in the binary version, and the remaining fields are raw binary rather than ASCII-encoded. The <LF> newlines are also omitted.

For type 3 records the format is:

33 <byte_count> <4-byte address> <... data ...> <checksum_byte>

For example, the following record

33 2D 00 18 A0 08 A3 62 02 B2 38 CC DA 54 6F 9B 4A 02 FC 81 0D 32 0E BB B1 3C 37 D6 8F A8 3C E7 9D F2 38 8C 63 FC F6 EA FC 38 68 03 2B F2 AA

is parsed as:

33 - Record type (3)
2D - Record length (45)
00 18 A0 08 - Record address
A3 62 02 B2 38 CC DA 54 6F 9B 4A 02 FC 81 0D 32 0E BB B1 3C 37 D6 8F A8 3C E7 9D F2 38 8C 63 FC F6 EA FC 38 68 03 2B F2 - data
AA - Record checksum
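The breakdown above can be reproduced with a short Python sketch. It is an illustration of the format as described here, not HP's decoder; parse_binary_s3 is a hypothetical name, and the big-endian address is an assumption consistent with the counter growing by 0x28 per record.

```python
def parse_binary_s3(buf: bytes, offset: int = 0):
    """Parse one binary type-3 record; return (address, data, next_offset)."""
    if buf[offset] != 0x33:
        raise ValueError("not a binary type-3 record")
    length = buf[offset + 1]                       # address + data + checksum
    body = buf[offset + 1 : offset + 2 + length]   # length byte .. checksum byte
    if len(body) != length + 1:
        raise ValueError("truncated record")
    # Length, address, data and checksum must add up to 0xFF (mod 256).
    if sum(body) & 0xFF != 0xFF:
        raise ValueError("bad checksum")
    address = int.from_bytes(body[1:5], "big")
    return address, body[5:-1], offset + 2 + length

record = bytes.fromhex(
    "33 2D 00 18 A0 08 A3 62 02 B2 38 CC DA 54 6F 9B 4A 02 FC 81 0D 32 0E BB"
    " B1 3C 37 D6 8F A8 3C E7 9D F2 38 8C 63 FC F6 EA FC 38 68 03 2B F2 AA"
)
address, data, next_offset = parse_binary_s3(record)
print(hex(address), len(data), next_offset)
```

Calling the function repeatedly with the returned next_offset walks through consecutive records in a buffer.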

I heard you like S-Records

So what does this all mean? Why do we have ASCII S-Records followed by a binary version of S-Records? Why would HP use a binary S-Record format instead of the straightforward ASCII version? And if they decided to use binary S-Records, then why are the ASCII S-Records used as well?

The answers are, believe it or not, in the S-Records.

We looked at the data encoded by the (ASCII) S-Record layer, writing a simple Python script to extract the records. The script loads the decoded data into Ghidra at address 0x401D0000, the address at which the S-Records say this section should be written.
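A script along these lines could look like the sketch below. It is not our original script. It assumes the ASCII lines are already available as strings, handles only type 3 records, and merges records that are contiguous in memory into segments, so that each segment can be loaded at its own address. The name load_segments is hypothetical.

```python
def load_segments(lines):
    """Merge consecutive S3 records into (start_address, bytearray) segments."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line.startswith("S3"):
            continue                       # skip S0, S7 and proprietary lines
        body = bytes.fromhex(line[2:])     # length, address, data, checksum
        if sum(body) & 0xFF != 0xFF:       # all bytes must sum to 0xFF (mod 256)
            raise ValueError(f"bad checksum: {line}")
        address = int.from_bytes(body[1:5], "big")
        data = body[5:-1]
        if segments and segments[-1][0] + len(segments[-1][1]) == address:
            segments[-1][1].extend(data)   # contiguous with the previous record
        else:
            segments.append((address, bytearray(data)))
    return segments

# Records quoted in this post; a real run would feed in every ASCII line.
lines = [
    "S00F00007265666C6173682020200000AB",
    "S319401D0000F24111C0F2CC011F7808B170F64E410EF6C3410277",
    "S705401D8D38D8",
]
for start, data in load_segments(lines):
    print(f"{start:08X} - {start + len(data) - 1:08X}")
```

Keeping separate segments, rather than one flat image, matters here because the loaded ranges are far apart in the address space.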

It turns out that the ASCII S-Records contain code.

An intriguing function is located at address 0x401d84d4. Here is an excerpt from Ghidra’s decompilation:

 

Lo and behold, the initial ASCII S-Record lines contain code. What does this code do? It decodes the binary S-Records that follow (yes, you read that correctly).

Why did HP do this? We don’t have a clear answer, but space is a likely reason for the binary S-Records. Binary data will expand in size by approximately 2.4 times when represented in ASCII S-Record format (see SRecord Reference Manual page 134). The firmware update file is already quite large – close to 40MB, and there’s no reason to double its size.
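The size argument can be made concrete by counting the bytes in the two record layouts described in this post. This is a back-of-the-envelope Python sketch with hypothetical function names; the exact ratio depends on how many data bytes each record carries.

```python
def ascii_s3_size(n: int) -> int:
    """Characters in an ASCII S3 line carrying n data bytes (including LF)."""
    return 2 + 2 + 8 + 2 * n + 2 + 1   # "S3", length, address, data, checksum, LF

def binary_s3_size(n: int) -> int:
    """Bytes in a binary type-3 record carrying n data bytes."""
    return 1 + 1 + 4 + n + 1           # type, length, address, data, checksum

# Expansion factors for the record sizes seen in this file.
print(ascii_s3_size(20) / 20)    # ASCII records here carry 20 data bytes
print(binary_s3_size(40) / 40)   # binary records here carry 40 data bytes
```

An ASCII line with 20 data bytes takes 55 characters, while a binary record with 40 data bytes takes 47 bytes, so the binary form adds far less overhead per byte of payload.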

Why the double encoding? Possibly to keep all updates backward-compatible with versions of the printer that only have code to decode ASCII S-Records. It could also be related to a staged process performed while updating the firmware.

Next steps

We now have a raw flash image!

In the next blog post, we will see how to load this image into memory for reverse engineering (and regular printer operation).

References