Unpacking HP Firmware Updates – Part 3

Part 3 — From NAND to RAM through sliding windows

Andrey Zagrebin, Moshe Kol, Shlomi Oberman

This post is the third of a four-part blog series documenting the different structures and stages of the firmware update. The next parts of the series will be uploaded week by week as we write them.

In the previous post we detailed how the S-record double-encoding works and how to decode it.

We now have a raw flash image in our hands, but don’t get excited yet. The path to enlightenment is almost as long as the path to HP firmware unpacking.

Flash out-of-band data

After peeling-off the binary S-Record layer, it looks like our situation has improved. The data looks more like plaintext, and human-readable strings can be recognized, but some of the strings are still broken. the same area of certificate information from the previous post can be seen. It looks “better”, but not quite perfect yet.

If we look at the data carefully and try to understand what is going on, we can see that there are sequences of real data followed by short sequences of what appears to be unused data (i.e., garbage data).

Here is the certificate area with the garbage data highlighted in green:

The interspersing of real data with seemingly “garbage” makes sense, as this is a raw flash image meant to be programmed onto a flash memory chip. These types of chips have designated areas for what is known as “out-of-band” (OOB) data.

The purpose of out-of-band data (also known as “spare area”) is storing ECC (Error Correction Code) and filesystem-dependent data. Unless we plan to use this file for writing directly to the flash (as well as for some internal calculations), we have no use for this data. Therefore the next stage of unpacking is to remove the redundant OOB.

The flash memory appears to be NAND flash with a page size of 0x800 bytes and 0x40 of out-of-band bytes per page:

Different devices might have different types of non-volatile memory. In particular, the page and OOB area sizes may change from one NAND flash device to another. Since our initial research focused on only one device, we will use these sizes. The sizes could be inferred from the flash data, as explained below, enabling us to adapt the tool to other devices that use a flash chip with different parameters.

Flash structure

At this stage, we have the raw flash data as it will be read from the NAND flash drive (the NAND flash user does not see the out-of-band data).

Now we need to understand how raw flash data gets loaded into memory and how to deal with any further encodings and compressions, if there Happen to be any.

We immediately see an intriguing header at the start of the image:

The printer\’s bootloader needs to load the firmware from flash to RAM and eventually pass execution to the firmware code, When first looking at this data we suspected it is some kind of boot header with all the necessary parameters to do so, and our suspicions indeed proved to be true.

The length of the Header 0x5C bytes. After these initial header bytes, we see a sequence of 0xFF padding bytes until the end of the flash page.

On the next page, we see the following:

The beginning of this area is easily recognizable as a header of a bitmap image file. The bitmap header indicates that the image is 0x5FA38 bytes long, and indeed after 0x5FA38 bytes we see padding up to the next page.

If we extract this file and load into an image viewer, this is what we see:

This graphic is the splash image, displayed on the device’s screen when booting. And Possibly in other cases.

What comes next appears to be firmware code, i.e., the area loaded into RAM. A seasoned ARM reverse-engineer should immediately notice the recurring 0xE? every four bytes, indicating we are probably looking at ARM instructions.

Note for the curious: This Opcode encoding starting with 0xE? characterizes the vast majority of ARM instructions, especially those commonly used. For example, the opcodes MOV, LDR, ADD, STR, CMP are all encoded this way. The common opcodes not encoded this way are branch instructions. Back to our data:

Knowing this information, we reversed engineered the code and worked back from that to decipher the meaning of some interesting values of the flash header:


The field we marked as bmp_size is the same as what we extracted from the bitmap file header. If we load the firmware section to the address given in the field load_addr, we see the following function at exec_addr:

This code appears to be the start of execution (note the resetting of registers).

We won’t go into how we found and confirmed the loading address and offset in this post, but it is relatively straight-forward when analyzing the code.

The pagesize_1 and pagesize_2 fields have the same value, which is equal to the page size. We don’t know the exact purpose of these fields. One of them could be the page size of our particular device, and the other could be an offset of the bootsplash BMP within the flash image.

The load_size field seems to be the size of the firmware image. However, if we skip 0x1F7FFF8 bytes forward from the start of firmware and align to the next page boundary we discover another page (0x800 bytes) of data on the flash:

The final section has 0x10A bytes of data and is then padded by null bytes until the end of the final page. These bytes are very similar to the authentication header we saw at the start of the ASCII S-Records that we covered in our previous post: its structure is very similar to the payload of the “SA” records. We assume it serves a similar function.

To conclude, the raw flash image is divided into four sections, each aligned to flash page boundaries:

With the following properties on our device:

SectionStart offsetSize in example (bytes)Size aligned to page (bytes) [Number of pages]
Header0x00x5c0x800 [1 page]
Bootsplash BMP0x8000x5fa380x60000 [0xc0 pages]
Firmware0x608000x1f7fff80x1f80000 [0x3f00 pages]
Authentication0x1fe08000x10a0x800 [1 page]

The sizes before page alignment are present in the firmware header (as described above). We don’t know whether the number of pages per section is constant or not. In our loader tool, we inferred the page count from the size header by rounding up to the nearest page boundary.

The bootloader loads the firmware section from flash to RAM, as a single chunk to addresses starting from 0x4e080000 (load_addr) up to 0x4ffffff8. Execution then transfers to exec_addr, which in our case is 0x4fffc140.

For purposes of reverse engineering, we need to load the firmware binary into a reverse-engineering tool (we chose Ghidra) and define the appropriate memory maps, so that the tool, as well as we, can follow branches and infer references.

The pre-loader

By now, you might think we’ve reached our goal – after all, we know that the firmware image is loaded to RAM as a single chunk, and we know its loading address. However, don’t let this feeling misguide you; remember the wise saying, “There is no way to HP printer firmware unpacking; unpacking is the way”. The most valuable part is the journey itself.

If we follow the path, and the code flow from the entry point (exec_addr), we unconditionally reach a branch to a function at address 0x400d9c28:

But hold on! Our firmware loaded into the range 0x4e080000-0x4ffffff8. Is there some other code already present in RAM, or have we missed something?

The answer lies a few code lines above this branch, as seen in the output of Ghidra’s decompiler:

The functions named memsetmemcpy, and uncompress were given their names by us after reverse-engineering of their code. The memset and memcpy functions appear to have identical function to their libc counterparts. The nature of uncompress is discussed in the next section. The last (indirect) call in this decompilation is the branch to our mysterious function.

The final call to uncompress decompresses data found at 0x4e0807b0, storing the uncompressed data to address 0x400d5440 and onward. This range is the area where our previous mysterious function should reside.

For these reasons, we named the code leading up to the mysterious function “the pre-loader code”. We resisted the urge to call it the “preloader-post-loader-post-binary-s-records-post-ascii-s-records-wrapped-in-raster-graphics-and-pcl code”, in part because it didn’t fit on our computer screen.

To fully be able to reverse engineer the printer firmware, we must “echo” these memsetmemcpy,anduncompress` functions that load and initializese the relevant memory sections. Note that the parameters of the functions are hardcoded. This is unfortunate because it means we can probably give up hope of creating a completely automated Ghidra loader tool, as these addresses differ from one firmware version to another.

Is that the best we can do?

A better solution comes from the next part of the code, which will be described in the next post, but let’s first understand how to uncompress the compressed code and data.

Sliding window compression

As suggested by the name of the decompression function, some of the memory sections reside in a compressed form in flash memory. Fortunately, we did not need to reverse engineer the decompression method, because Check Point Research already documented it.

The compression method used to compress the memory sections is LZSS. A general description of the algorithm can be found in any standard book about compression algorithms (see references for one), but its implementations vary.

The following section goes into the nitty-gritty details of the compression. If you are already familiar with this compression method, or would like to treat it as a black box and continue unpacking, feel free to skip to the next post (when published).

We now describe the compression method and the specifics of the parameters used in this case, as well as the decoder operation.

LZSS is a lossless data compression algorithm whose “dictionary coding method” avoids repeating previously encountered data. It performs compression by referring to already observed data using a token, which contains both an offset to this data and the length of the data to reuse.

“Previously observed” data resides in an area of the data called the window which can be limited in size. A significant window size results in better compression but slower encoding because it needs to search the entire window for matches. In such a case, a larger token (used to describe the offset in the window) needs to be used. (As an aside, the number of bits required to encode the offset is the base-2 logarithm of the window size.)

In practice, LZSS implementations frequently limit the window size, often specifying a few thousand bytes.

Another implementation parameter is the maximum number of bytes represented by a token (“maximum length”). The size of the “look-ahead buffer” determines the maximum length, usually in the order of several dozen bytes.

In our specific case, the window size is 4096 bytes, so 12 bits are required to encode the offset. The maximum length size is 4 bits. These two numbers (i.e., 12 and 4) mean that a token is encoded in 12+4=16 bits, or 2 bytes.

The token size should be less than the data we compress (otherwise, we don’t gain anything in terms of compression). Therefore, it only makes sense to compress at least 3 bytes using a token.

The compressed data cannot consist exclusively of tokens, as the tokens must reference some literal (i.e., raw) data. Separating tokens from literal data is accomplished using a bit flag (1 – literal, 0 – token). Instead of using a bit flag before each literal byte or token, flags are stored in groups of eight (8) bits encoded in groups as one byte per group. A single byte holds the flags for the following eight values, declaring them as a token or a literal.

The high-level structure of the compressed data repeats the following construct: a single flag byte followed by eight (8) values, where each value is a single-byte literal or a two-byte token.

With the high-level structure in mind, we can now dive into the specific quirks of the format.

We start with the flag byte. The least-significant bit of the flag byte corresponds to the first value. For example, the flag byte 0xFB = 0b11111011 means that the first two values are literals (21 01), the third value is a token (F8 F0, occupies two bytes), and the remaining five values are literals.

As for the token format, the low nibble of the second byte represents the length. You will recall the minimal compression length is 3. The length value is encoded, so a token length value of zero (0) maps to an actual length of 3 (i.e., add 3 to the encoded length). This representation allows for a maximum compression of 18 bytes.

The least significant byte of the token’s offset is the first byte of the token, and the high nibble of the offset is the high nibble of the second byte of the token. For example, the second token above, ED F0 (marked with dark-blue), encodes a length of 0+3=3 and an offset of 0xFED.

Missing from our description is the behavior of the sliding window. Let write_head denote the current position of the output (decoded) file, and let WINDOW_SIZE denote the size of the window in bytes (in our case, its value is 4096).

The token’s offset value is relative to a pointer we call window_start. In principle, when the decoder processes a token, it accesses position window_start + offset of the output buffer and outputs length bytes from there.

To avoid shifting the entire window after each encoding, LZSS implements a circular access to the window buffer. This means that if window_start + offset >= write_head, we wrap-around and access position window_start + offset - WINDOW_SIZE instead.

A picture can help:

When decoding bytes, the window slides from left to right, but the window_start anchor point remains intact. This anchor point moves every 4096 decoded bytes. The decoder keeps a counter known as the window_counter, enabling moving the anchor point when appropriate. This counter is incremented continuously by the number of decoded bytes, and wraps around when it reaches 4096 (i.e., the window size).

We compute the initial value of window_counter by subtracting the maximum token length (18) from the window size (4096). In our case, this is 4096-18=4078.

The last thing to keep in mind is that occasionally we have to access negative offsets (outside of our output buffer). Implementations use negative offsets to refer to data that is pre-filled in the window before decoding. Some implementations use 0x20 (ASCII space) while some use null bytes. In our case, null bytes are used.

We are now at a stage we can decompress the snippet of compressed data we saw above. This is an illustration of the stages of decompression of the data (snippet reproduced here for convenience):


Now that we know how it’s done – we can decompress and load the compressed section that contains the next stage of code.

In the next post we will explore this next stage of code and the final stages needed in order to correctly map and load the firmware in to a reverse-engineering tool.

References

Big thanks to our proofreader Moshe Rubin.

Get our posts to your Email

Subscribe to our mailing list