Part 4 — Memory map leads us to our destination
Andrey Zagrebin, Moshe Kol, Shlomi Oberman
This post is the fourth and final installment of a four-part blog series documenting the different structures and stages of the firmware update.
- Part 1 – Just Print Me
- Part 2 – S-Records parsing S-Records
- Part 3 – From NAND to RAM through sliding windows
- Part 4 – Memory map leads us to our destination
In the previous post we detailed the flash layout and the sliding window compression used to store memory sections on-disk.
We now have a raw flash image on our hands.
At this point, we have preloaded the last few code sections, decompressing some of them, followed by general decompressing. We’re done and ready to look at the application code, right?
Don’t get excited yet. The path to enlightenment is almost as long as the path to HP firmware unpacking.
Following the newly loaded code, we reach yet another indirect call. From the debug strings around the call opcode, it looks like this is the entry point of the printer application code:
This entry point ultimately comes from decoding a structure already loaded into memory:
For reasons that will become apparent later, we refer to this structure as the application header or apphdr, and to the code that uses it as the application loader (or app loader).
0x4fffc004 is the start of this structure, and at 0x4fffc038 we find the entry point, 0x4145a9b4+1. This address is once again in a not-yet-initialized part of RAM. Reverse-engineering the function that parses the application header reveals valuable implementation details, presented in the following paragraphs.
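The address arithmetic above can be checked by hand. The following is a minimal sketch, assuming that the trailing +1 is the ARM Thumb-mode bit (our inference from the odd pointer value, not something the loader states) and using the field offset 52 from the apphdr layout; the function name is illustrative.

```python
from typing import Tuple

APPHDR_BASE = 0x4FFFC004   # start of the apphdr in RAM
ENTRY_POINT_OFFSET = 52    # decimal offset of entry_point inside the apphdr


def split_code_pointer(ptr: int) -> Tuple[int, bool]:
    """Split a code pointer into (aligned address, low-bit flag).

    Assumption: a set low bit selects Thumb mode, as on ARM cores.
    """
    return ptr & ~1, bool(ptr & 1)


entry_field_addr = APPHDR_BASE + ENTRY_POINT_OFFSET
code_addr, is_thumb = split_code_pointer(0x4145A9B5)
print(hex(entry_field_addr), hex(code_addr), is_thumb)
```

The field address works out to 0x4fffc038, matching the location where the entry point was found.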
One of the first operations in the app loader is displaying the bootsplash bitmap picture. This picture is identical to the one found in the Flash image before loading the firmware to RAM.
Next, the application loader again performs memset, memcpy, and decompression operations on chunks of memory. Curiously, both the pre-loader and application loader have their own copies of these functions rather than sharing one set. This duplication suggests a possible organizational barrier between the pre-loader and app loader software development teams.
This time though, instead of using hardcoded arguments, the app loader invokes these functions in a loop, reading sets of parameters from memory pointed to indirectly by the app header.
Here’s an example of decompilation of the part of the app loader that invokes memcpy in bulk:
Note the verify_address function. This function checks whether the address range about to be written overlaps so-called “protected ranges”. A protected range is a range of memory addresses that will not be overwritten, even if a section is marked for loading at an address that overlaps with that range. If there is an overlap, the loader does not invoke the relevant operation (uncompress, for example) for that section. To check whether an address range is protected, the loader compares the range against 0x1a pairs of starting and ending addresses of protected memory ranges. The array of pairs of addresses is also pointed to by the apphdr. We’ll discuss why these ranges are so special when we discuss the different memory sections.
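The protected-range check described above can be sketched as a simple interval-overlap test. This is an illustration, not the decompiled function: the name overlaps_protected, the sample ranges, and the choice to treat each end address as exclusive are all assumptions.

```python
from typing import Iterable, Tuple

Range = Tuple[int, int]  # (start, end); end is treated as exclusive (assumption)


def overlaps_protected(start: int, size: int, protected: Iterable[Range]) -> bool:
    """Return True if [start, start + size) intersects any protected range."""
    end = start + size
    return any(start < p_end and p_start < end for p_start, p_end in protected)


# Hypothetical protected ranges, for illustration only.
protected_ranges = [(0x1000, 0x2000), (0x8000, 0x8100)]

inside = overlaps_protected(0x1800, 0x100, protected_ranges)     # fully inside a range
adjacent = overlaps_protected(0x2000, 0x100, protected_ranges)   # starts right after one
straddle = overlaps_protected(0x0F00, 0x101, protected_ranges)   # crosses into one by a byte
print(inside, adjacent, straddle)
```

A loader script would skip any list entry for which this test returns True, mirroring the behavior of the app loader.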
memcpy parameters are stored as an array of triplets of the form:
| Offset | Size | Type | Description |
|---|---|---|---|
| 0 | 4 | void* | dest – the start address of the block to initialize |
| 4 | 4 | void* | src – the start address of the block to read from |
| 8 | 4 | size_t | num – number of bytes to copy (size of the block) |
And similarly for memset:
| Offset | Size | Type | Description |
|---|---|---|---|
| 0 | 4 | void* | addr – the start address of the block to initialize |
| 4 | 4 | int | value – the byte value to set |
| 8 | 4 | size_t | num – number of bytes to set (size of block to initialize) |
Note: Although the second argument represents a byte value, memset expects an int, which is recast internally to a byte, consistent with the libc version of memset.
And for uncompress:
| Offset | Size | Type | Description |
|---|---|---|---|
| 0 | 4 | void* | dest – the start address of the block to initialize |
| 4 | 4 | void* | src – the start address of the block of compressed data |
| 8 | 4 | size_t | compressed_size – size of the compressed block |
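Since all three lists share the same 12-byte triplet shape, one small decoder handles them all. The sketch below assumes little-endian 32-bit fields (the byte order is not stated here, so it is a parameter), and the sample list is synthetic; parse_triplets and memset_byte are illustrative names.

```python
import struct
from typing import List, Tuple

TRIPLET_SIZE = 12  # three 32-bit fields per entry


def parse_triplets(blob: bytes, little_endian: bool = True) -> List[Tuple[int, int, int]]:
    """Decode a parameter list into a list of 3-tuples of 32-bit words."""
    if len(blob) % TRIPLET_SIZE:
        raise ValueError("list length is not a multiple of 12 bytes")
    fmt = ("<" if little_endian else ">") + "3I"
    return list(struct.iter_unpack(fmt, blob))


def memset_byte(value: int) -> int:
    """memset receives an int but only its low byte is stored."""
    return value & 0xFF


# Synthetic two-entry memset list (addresses and sizes are made up).
demo = struct.pack("<3I", 0x40000000, 0x00, 0x100) + struct.pack("<3I", 0x40001000, 0x1FF, 0x20)
entries = parse_triplets(demo)
for addr, value, num in entries:
    print(hex(addr), hex(memset_byte(value)), num)
```

The meaning of each tuple element depends on which list is being decoded: (addr, value, num) for memset, (dest, src, num) for memcpy, and (dest, src, compressed_size) for uncompress.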
There is quite a lot more code in the application loader, but we need only focus on the code that loads into memory the sections required for our goal, which is to reverse engineer the firmware and find security vulnerabilities.
In the end, execution is passed to the app_entry function, pointed to by the apphdr.
All the parameters related to the application loader reside in the application header and the memory it points to. Let’s go through the important members of the application header structure:
(“Offset” means the decimal offset from structure start. We omit irrelevant and unknown fields.)
| Offset | Value | Name | Description |
|---|---|---|---|
| 0 | 0x3ca55a3c | magic | Checked before the structure is used |
| 4 | 0x6c | size | Total size of the struct in bytes |
| 20 | 0x4e0b0000 | bootsplash_bmp | Pointer to the bootsplash bitmap image (BMP file format). This appears to be the same picture as the one found on the flash image before the code that is loaded to RAM |
| 52 | 0x4145a9b5 | entry_point | Pointer to the application entry point |
| 56 | 0x4fffc000 | protected_count | Pointer to a 32-bit integer counting the number of protected memory ranges |
| 60 | 0x4fffc070 | protected_addresses | Pointer to pairs of (start, end) protected memory ranges |
| 64 | 0x4e10fcc0 | section_linked_list | Pointer to a linked list of memory section descriptors |
| 72 | 0x4e10fa68 | memset_list_start | Start of the list of memset parameters |
| 76 | 0x4e10fad4 | memset_list_end | End of the list of memset parameters |
| 80 | 0x4e10fad4 | copy_list_start | Start of the list of memcpy parameters |
| 84 | 0x4e10fbdc | copy_list_end | End of the list of memcpy parameters |
| 92 | 0x4e10fbdc | uncompress_list_start | Start of the list of uncompress parameters |
| 96 | 0x4e10fcc0 | uncompress_list_end | End of the list of uncompress parameters |
- All fields are 32 bits (4 bytes) long.
- The purpose of the two more_magic fields is not clear; we conjecture they might be a version ID or some kind of bitmask. Interestingly, their two values are bitwise complements of one another. Both values, except for the most significant nibble, are checked before reading from the apphdr. Technically, each value is masked with 0x0fffffff and tested against an expected constant.
- The copy_list_barrier field points to the middle of the memcpy parameter list, and is not used in this implementation of the loader. It may indicate that the values before this point have a different purpose than those following.
- The uncompress_list_barrier field points to the middle of the uncompress parameter list in much the same way.
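The header layout in the table above translates directly into a small parser. This is a sketch under assumptions: fields are taken to be little-endian, only the documented offsets are decoded, and the demo header is rebuilt from the table values rather than read from a real image. The names parse_apphdr and triplet_count are illustrative.

```python
import struct
from typing import Dict

APPHDR_MAGIC = 0x3CA55A3C
# Field name -> decimal offset from the start of the structure.
APPHDR_FIELDS = {
    "magic": 0, "size": 4, "bootsplash_bmp": 20, "entry_point": 52,
    "protected_count": 56, "protected_addresses": 60, "section_linked_list": 64,
    "memset_list_start": 72, "memset_list_end": 76,
    "copy_list_start": 80, "copy_list_end": 84,
    "uncompress_list_start": 92, "uncompress_list_end": 96,
}


def parse_apphdr(blob: bytes, little_endian: bool = True) -> Dict[str, int]:
    """Decode the documented apphdr fields and check the magic value."""
    order = "<" if little_endian else ">"
    hdr = {name: struct.unpack_from(order + "I", blob, off)[0]
           for name, off in APPHDR_FIELDS.items()}
    if hdr["magic"] != APPHDR_MAGIC:
        raise ValueError("bad apphdr magic")
    return hdr


def triplet_count(hdr: Dict[str, int], prefix: str) -> int:
    """Number of 12-byte entries between <prefix>_list_start and <prefix>_list_end."""
    return (hdr[prefix + "_list_end"] - hdr[prefix + "_list_start"]) // 12


# Rebuild a header from the table values (unknown fields are left as zero).
table_values = {
    "magic": 0x3CA55A3C, "size": 0x6C, "bootsplash_bmp": 0x4E0B0000,
    "entry_point": 0x4145A9B5, "protected_count": 0x4FFFC000,
    "protected_addresses": 0x4FFFC070, "section_linked_list": 0x4E10FCC0,
    "memset_list_start": 0x4E10FA68, "memset_list_end": 0x4E10FAD4,
    "copy_list_start": 0x4E10FAD4, "copy_list_end": 0x4E10FBDC,
    "uncompress_list_start": 0x4E10FBDC, "uncompress_list_end": 0x4E10FCC0,
}
demo = bytearray(0x6C)
for name, off in APPHDR_FIELDS.items():
    struct.pack_into("<I", demo, off, table_values[name])

hdr = parse_apphdr(bytes(demo))
counts = {p: triplet_count(hdr, p) for p in ("memset", "copy", "uncompress")}
print(counts)
```

With the table values, every list length divides evenly by 12, giving 9 memset entries, 22 memcpy entries, and 19 uncompress entries, which is a useful sanity check on the triplet layout.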
As briefly mentioned above, the apphdr has a field (section_linked_list) pointing to a linked list of memory section descriptors. The app loader code does not seem to use it. However, it contains information about the structure of the printer’s memory, including section names, which may aid us in loading and reverse-engineering the firmware.
section_linked_list points to the first element of this list and each element consists of the following members:
All members are 32-bit (4 bytes) long.
Following is a description of the element members:
- next: Pointer to the next element of the linked list.
- section_name: Pointer to a null-terminated string containing the section name.
- start_addr: The starting address of the section.
- size: The size of the section in bytes.
- unknown: The purpose of this field was not researched. It could contain information about the section type or various flags (e.g., rwx (“read-write-execute”) permissions). Values observed were:
- dest_section: If this section is used to initialize another section (e.g., it is the source of an uncompress operation), this field holds a pointer to the destination section descriptor. Otherwise, it is NULL. This field points to the descriptor (i.e., linked-list element) and not to the start of the section in memory.
Example of two entries:
[0x4e110504] .cromtext:
    next: 0x4e110528 (.crommodule section descriptor)
    section_name: 0x4e11051c (".cromtext")
    start_addr: 0x4f522fd0
    size: 0xa25a6c
    unknown: 0x1
    dest_section: 0x4e110b78 (.text section descriptor)
[0x4e110b78] .text:
    next: 0x4e110b98 (.module section descriptor)
    section_name: 0x4e110b90 (".text")
    start_addr: 0x4036800c
    size: 0x116655c
    unknown: 0x1
    dest_section: 0x0
In this example, .cromtext has a non-zero dest_section (0x4e110b78), which is the address of the .text section descriptor. As expected, the .cromtext section is decompressed and loaded to the .text section (start address 0x4036800c) by the app loader.
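Walking the descriptor list is straightforward once memory can be read by address. The sketch below assumes the six members are stored in the order they are described (next, section_name, start_addr, size, unknown, dest_section) and are little-endian; the Memory class is a toy stand-in for a RAM dump. The demo reuses the two entries above, but links .cromtext directly to .text so that the synthetic list is only two elements long.

```python
import struct
from typing import Dict, Iterator, NamedTuple


class Section(NamedTuple):
    desc_addr: int
    name: str
    start_addr: int
    size: int
    unknown: int
    dest_section: int


class Memory:
    """Toy sparse memory: maps a base address to a bytes blob."""

    def __init__(self, regions: Dict[int, bytes]):
        self.regions = regions

    def read(self, addr: int, n: int) -> bytes:
        for base, blob in self.regions.items():
            if base <= addr and addr + n <= base + len(blob):
                return blob[addr - base: addr - base + n]
        raise KeyError(hex(addr))

    def cstr(self, addr: int) -> str:
        out = bytearray()
        while True:
            b = self.read(addr + len(out), 1)
            if b == b"\0":
                return out.decode("ascii")
            out += b


def walk_sections(mem: Memory, head: int, little_endian: bool = True) -> Iterator[Section]:
    """Yield each descriptor, stopping at a NULL next pointer or on a cycle."""
    fmt = ("<" if little_endian else ">") + "6I"
    addr, seen = head, set()
    while addr and addr not in seen:
        seen.add(addr)
        nxt, name_ptr, start, size, unknown, dest = struct.unpack(fmt, mem.read(addr, 24))
        yield Section(addr, mem.cstr(name_ptr), start, size, unknown, dest)
        addr = nxt


# Synthetic memory: each 24-byte descriptor is followed by its name string.
cromtext = struct.pack("<6I", 0x4E110B78, 0x4E11051C, 0x4F522FD0, 0xA25A6C, 1, 0x4E110B78)
text = struct.pack("<6I", 0, 0x4E110B90, 0x4036800C, 0x116655C, 1, 0)
mem = Memory({0x4E110504: cromtext + b".cromtext\0", 0x4E110B78: text + b".text\0"})

sections = list(walk_sections(mem, 0x4E110504))
by_desc = {s.desc_addr: s for s in sections}
for s in sections:
    target = by_desc[s.dest_section].name if s.dest_section else "-"
    print(s.name, hex(s.start_addr), hex(s.size), "->", target)
```

Resolving dest_section through the descriptor address, rather than treating it as a memory address, reflects the note above that the field points at a list element.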
Some examples of the contents of memory sections include:
- The .load_apphdr section is constructed as follows:
  - Protected memory entries count (0x1a)
  - The apphdr struct itself
  - Protected memory entries (pairs of 32-bit addresses, 0x1a*8=0xd0 bytes)
- The .secinfo section contains the parameter triples for the memset, memcpy, and decompress functions, elements of the section descriptor linked list, and the section names as null-terminated strings.
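The .load_apphdr layout can be cross-checked against the pointers stored in the apphdr. This is a small worked calculation, assuming the section starts at the address held in protected_count (0x4fffc000) and that its three parts are packed back to back with no padding.

```python
SECTION_BASE = 0x4FFFC000    # value of the protected_count pointer
COUNT_SIZE = 4               # one 32-bit counter
APPHDR_SIZE = 0x6C           # size field of the apphdr
PROTECTED_ENTRIES = 0x1A     # number of protected ranges
PAIR_SIZE = 8                # two 32-bit addresses per range

apphdr_addr = SECTION_BASE + COUNT_SIZE
pairs_addr = apphdr_addr + APPHDR_SIZE
section_end = pairs_addr + PROTECTED_ENTRIES * PAIR_SIZE

print(hex(apphdr_addr), hex(pairs_addr), hex(section_end))
```

Under these assumptions the apphdr lands at 0x4fffc004 and the address pairs at 0x4fffc070, which agrees with the protected_addresses pointer in the header.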
Now that we can associate memory address ranges with sections, we can reach some interesting conclusions:
- The memory sections do not overlap.
- The protected areas of memory include the following named sections:
  - .load_text
  - .load_rodata
  - .boot_ncdram_hole (empty section)
  - .load_ncdata (empty section)
  - .load_data
  - .load_ncbss
  - .load_cgdbuf
  - .load_bss
  - .nosi_text
  - .nosi_rodata (empty section)
  - .nosd_data (empty section)
  - .nosd_bss
  - .startup_text
  - .startup_rodata
  - .startup_data (empty section)
  - .startup_bss
  - .stack
  - .erom_support_2
  - .secinfo

  These are mostly sections that are critical for running the app loader, and include sections that were initialized by the pre-loader.
- The memset, memcpy, and uncompress parameters correspond to entire memory sections and do not overlap.
- All the sections that need initialization have corresponding parameters in one of the memset, memcpy, or uncompress lists, even those that have been initialized by the pre-loader and are part of the protected ranges.
The last conclusion is encouraging. If we know the address of the apphdr structure, we can have a loader script parse it and initialize the uninitialized memory automatically. The initialization includes those sections with hardcoded addresses in the pre-loader code. The magic 4-byte number at the start of the apphdr structure is unique, and can be used to find the structure.
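Locating the structure by its magic number could look like the sketch below. It assumes a little-endian image and additionally filters candidates by the size field value 0x6c seen in this firmware; other firmware versions may use a different size, so that filter is an assumption. The function name and the test image are illustrative.

```python
import struct
from typing import Iterator

APPHDR_MAGIC = 0x3CA55A3C
APPHDR_SIZE = 0x6C  # size field observed in this firmware (may differ elsewhere)


def find_apphdr_candidates(image: bytes, little_endian: bool = True) -> Iterator[int]:
    """Yield offsets where the magic is followed by the expected size field."""
    order = "<" if little_endian else ">"
    needle = struct.pack(order + "I", APPHDR_MAGIC)
    pos = image.find(needle)
    while pos != -1:
        if pos + 8 <= len(image):
            (size,) = struct.unpack_from(order + "I", image, pos + 4)
            if size == APPHDR_SIZE:
                yield pos
        pos = image.find(needle, pos + 1)


# Synthetic image: one plausible header at offset 16, then a decoy with a wrong size.
image = (bytes(16)
         + struct.pack("<2I", APPHDR_MAGIC, APPHDR_SIZE) + bytes(0x64)
         + struct.pack("<2I", APPHDR_MAGIC, 0x1234))
candidates = list(find_apphdr_candidates(image))
print(candidates)
```

A loader script would then parse the header at each candidate offset and replay the memset, memcpy, and uncompress lists it points to.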
So, are we done yet? Every path has an end, and we have finally reached ours. Once here, we discover that “Accomplishments will prove to be a journey, not a destination” (Dwight D. Eisenhower).
Let’s take a moment to remember all the stages of the firmware we had to unpack and decode to reach this point:
In post no. 1, we unpacked the PCL format, including a proprietary extension and extracted data encoded as a raster graphics image.
In post no. 2, we encountered S-records, and all they wanted to do was parse some S-records. We dealt with a proprietary S-record binary variation along the way.
In post no. 3, we started looking at the code and discovered that it is self-modifying and staged, with each part loading the next into memory. We also got a crash course in sliding-window compression 101.
In post no. 4, we uncovered the app header structure, saw how the different sections of code and data are loaded into memory, and started to see the light at the end of the tunnel. We also finished this blog post series.
What should we do next week?
Be on the lookout for our upcoming announcement on June 16th, when we announce the security findings for which we performed all of this initial research of firmware unpacking.
- No references used for this post
Thanks to Moshe Rubin and Daniel Goldberg for proofreading.