Unpacking HP Firmware Updates

Part 1 — Just Print Me

Moshe Kol, Shlomi Oberman

This post is the first of a four-part blog series documenting the different structures and stages of the firmware update. The next parts of the series will be uploaded week by week as we write them.

Background

There comes a time when every person will need to reverse engineer an HP firmware update. That time came for a few of us at JSOF these past few months. This is part of a larger security research project to be released in the following months. We needed to be able to reverse engineer an HP firmware and chose to do so by looking at an update file. We wrote tools and documentation that can take us from a printer update file in the .rfu format all the way to a firmware mapped correctly in a Ghidra project.

The firmware file format has been partially documented in numerous places by different researchers and companies at different times, including a thorough analysis by Check Point Research and basic official documentation for the outer layer of encoding. In our research, we found that up-to-date and correct information was lacking. We also did not find any tooling to unpack and load the contents of an update package into a memory map for reverse engineering. We ended up writing in-house tooling and documenting the essential structures and encodings.

When reading through this document, it becomes evident that the highly layered firmware encoding/packaging is quite convoluted and random. We are not sure whether this is some form of security-by-obscurity or a heap of legacy implementations built on top of each other like an ancient ruin. One can see, in the colors and shapes of the stones, the opposing political factions that dominated the printer corporate empire over time.

We wrote two main tools to reverse engineer printer updates. The first tool unpacks the firmware update package to the stage where we have a flash image. The second takes a flash image and loads it correctly into Ghidra.

Disclaimer: All of this information is correct for the printer and the firmware version that we used. We used an HP OfficeJet Pro 8720 with the firmware update file ojpro_8720_1919B_05102019.rfu.

How to obtain an HP firmware

HP firmware comes packaged in the remote firmware update (RFU) format. The update files are available on the HP FTP server: ftp://ftp.hp.com/pub/networking/software/pfirmware/.

When in doubt, sniff the update process of the printer using Wireshark to find the correct download link and firmware version.

The firmware update

The firmware update consists of the following main stages:

  1. Unpacking and decoding of the update package (with extension .rfu) to produce a flash image.
  2. The flash image gets loaded to memory on every printer boot. This stage is essential because it enables us to load different sections into memory for reverse engineering. It also allows us to compress parts of the data and code on disk.

The RFU (Remote Firmware Update) format

The firmware is compressed and encoded in the following high-level layers:

  1. The Printer Job Language (PJL) format, which is a documented format describing print jobs and is an extension of the Printer Command Language (PCL) format.
  2. A proprietary encoding scheme comprised of a binary version of binary-to-text encoding, similar to Motorola’s SREC format. After decoding this stage, we have the raw data as written to the NAND flash (as in our case).
  3. A proprietary firmware description format comprised of a section table with section descriptions, structures, and metadata.
  4. Sections containing the firmware data and code. Many of the sections are compressed using one of several supported compression schemes.

PJL and PCL languages

HP’s RFU file format contains Printer Job Language (PJL) commands, deployed to the printer like a typical print job (you simply print it!). HP developed the Printer Job Language (PJL) to allow switching printer languages (also called personalities) at the job level. An application that supports PJL can print one job using PCL and another job using some other printer language (e.g., PostScript).

In our case, the start of the update file shows the following PJL commands:

%-12345X@PJL
@PJL COMMENT MODEL=HP OfficeJet Pro 8720
@PJL COMMENT VERSION=WMP1CN1919BR
@PJL COMMENT DATECODE=20190510
@PJL UPGRADE SIZE =39640723
%-12345X@PJL COMMENT (null)
@PJL ENTER LANGUAGE=FWUPDATE
EThis device does not support FWUPDATE!

The first thing to note is the particular sequence <ESC>%-12345X at the beginning of the file, where <ESC> represents the ASCII escape character (hex 1B). This sequence, known as the Universal Exit Language (UEL) command, causes the printer to exit the active printer language and return control to the PJL layer, which is the default control layer. This command also appears at the end of the file.

From this header we can learn the printer model for which this firmware is intended, as well as firmware version and code build/release date. HP uses an undocumented PJL command UPGRADE and its option value SIZE in order to specify the size of the RFU file in bytes.

The ENTER command is used to select a particular printer language for the printing of subsequent data. Normally it is PCL or PostScript, but in our case the language selected is FWUPDATE. Not surprisingly, this language is used for the firmware update process and is undocumented. It is crucial to understand this language in order to extract the firmware image. Another indication that this is a non-standard printer language can be seen after the ENTER command – there’s a printer reset command (<ESC>E) followed by a message This device does not support FWUPDATE!. This should be printed by printers which don’t support this method of firmware update delivery.
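As a rough illustration of how this header can be read programmatically, the Python sketch below collects the KEY=VALUE pairs from the PJL lines up to the ENTER command. The function name parse_pjl_header, the 4096-byte scan limit and the assumption that header lines are newline-terminated are ours; they are not taken from any HP tool or documentation.

```python
import re

UEL = b"\x1b%-12345X"  # Universal Exit Language sequence
PJL_LINE = re.compile(rb"@PJL\s+(\w+)\s+(\w+)\s*=\s*(.*)")


def parse_pjl_header(blob: bytes, limit: int = 4096) -> dict:
    """Collect KEY=VALUE pairs from the PJL lines that precede the payload.

    Only the first `limit` bytes are scanned (an assumption: the header is short).
    """
    fields = {}
    for raw in blob[:limit].split(b"\n"):
        line = raw.strip().replace(UEL, b"")
        m = PJL_LINE.match(line)
        if m is None:
            continue  # bare "@PJL", "COMMENT (null)", or binary data
        command, key, value = (g.decode("ascii") for g in m.groups())
        fields[key] = value
        if command == "ENTER":
            break  # everything after this line belongs to the selected language
    return fields


sample = (
    b"\x1b%-12345X@PJL\n"
    b"@PJL COMMENT MODEL=HP OfficeJet Pro 8720\n"
    b"@PJL COMMENT VERSION=WMP1CN1919BR\n"
    b"@PJL COMMENT DATECODE=20190510\n"
    b"@PJL UPGRADE SIZE =39640723\n"
    b"\x1b%-12345X@PJL COMMENT (null)\n"
    b"@PJL ENTER LANGUAGE=FWUPDATE\n"
    b"\x1bEThis device does not support FWUPDATE!"
)
header = parse_pjl_header(sample)
print(header["MODEL"], header["SIZE"], header["LANGUAGE"])
```

The SIZE value can then be compared against the actual file length as a quick sanity check before attempting to decode the payload.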

Upon examination of the RFU binary, the pattern <ESC>*b can be seen recurring throughout the entire file. Some online research led us to the PCL technical reference manual, and more specifically to the chapter on Raster Graphics (chapter 6 of the PCL 5 Color Technical Reference Manual).

Raster Graphics

A raster image is an image composed of dots (also known as a bitmap image). Each dot is represented by a bit (0 – print nothing, 1 – print a dot). The printer is capable of printing raster images using raster commands, which are part of PCL. The image is delivered to the printer as dot rows, where each row represents a strip of the raster image.

Here’s an example of a simple raster image (0 was replaced by a dot for visibility):

For the printer to print a raster image, a certain command sequence must be followed. These commands define the raster area (height and width) as well as the resolution and possibly color information. This is a relatively complex format and is fairly well documented (see Further Reading section), so we won’t go into detail about every part of it, only the parts relevant to unpacking the firmware.

At a high level, the important command sequence for us is as follows, where each row is a command:

Source Raster Height
Source Raster Width
Start Raster Graphics
Y Offset
Raster Compression
Transfer Raster Data
...
Transfer Raster Data
Y Offset
Transfer Raster Data
...
Raster Compression
Transfer Raster Data
...
End Raster Graphics

The Raster Height and Raster Width commands define the raster area of an image. But what is the height and width of a firmware image? It seems odd at first that these commands are used. In our case the height was unspecified (implied 0) which means that it is ignored. The width was set to 16384, and it defines the length of the row in the raster image. As a technique to save space, the printer automatically fills any partial row (i.e. a row with length less than the specified width) with zero bytes up to the full length.

The Y-Offset command skips entire rows, so that there’s no need to send a batch of rows which are all zeroes. This command is irrelevant for firmware unpacking.

The Transfer Raster Data command specifies a length followed by a binary blob, which is compressed using the method specified by the most recent Raster Compression command. Several compression methods exist; the ones relevant to us are described below.

PCL Command Syntax

Now that we have a basic understanding of the RFU file structure, we need to peel off these raster graphics and PCL layers. For this task, we need to understand the basic command syntax of PCL and the raster graphics commands in particular.

PCL commands are escape sequences specified in ASCII format, and consist of at least two characters. The first character is the escape character, <ESC>. Whatever comes next is interpreted as a printer command.

The general format of a printer command is as follows:

Group character, parameter character, value field and data are considered optional. We’ve already seen a PCL command without these: the printer reset command <ESC>E which appears directly after the ENTER PJL command. Actually, HP forgot the first # character before z1 in their PCL technical reference; hopefully they’ll fix it after reading this post.

The first PCL command just after the string This device does not support FWUPDATE! is:

<ESC>*rt16384sA

This command is as bad (i.e., complex) as it gets, so let’s parse it together. The rest of the commands should be relatively easy to understand.

The asterisk (‘*’) character is known as a parameterized character. It is the one used by all of the PCL commands in the RFU file, except for the Printer Reset command and UEL. The exact meaning of this character is not relevant for our purpose, only its syntactical role.

The r character is the group character. Its meaning is also irrelevant for our purpose.

The t character is the parameter character which specifies the raster height. A value field for this character should appear before the character and is missing. Therefore, the value 0 is implied.

After the t character we see a numerical value 16384 (in decimal) followed by the parameter character s. The s character specifies the raster width.

Finally, we see a capital letter A which indicates both the end of this escape sequence and the end of the “Start Raster Graphics” command.

In summary, this combined escape sequence specifies the raster height (0), the raster width (16384) and a marker to start raster graphics. It can be thought of as a short-hand for writing these commands in sequence (note the uppercase T and S, which now take the role of the termination character):

<ESC>*r0T       Raster Height
<ESC>*r16384S   Raster Width
<ESC>*r0A       Start Raster Graphics
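The expansion above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the function name split_combined is hypothetical, the input is expected to hold only the command text (no trailing binary data), and a missing value field is treated as 0, as described for the t character.

```python
import re

ESC = 0x1B
# An optional signed decimal value followed by one parameter/termination letter.
FIELD = re.compile(rb"([+-]?\d*)([A-Za-z])")


def split_combined(seq: bytes):
    """Expand a combined escape sequence into (prefix, value, letter) triples."""
    if len(seq) < 4 or seq[0] != ESC:
        raise ValueError("not a parameterized PCL escape sequence")
    prefix = seq[1:3].decode("ascii")  # parameterized + group character, e.g. "*r"
    parts = []
    for digits, letter in FIELD.findall(seq[3:]):
        # A missing value field implies 0.
        value = int(digits.decode("ascii")) if digits.strip(b"+-") else 0
        # Lowercase letters chain further parameters; report them in uppercase.
        parts.append((prefix, value, letter.decode("ascii").upper()))
    return parts


for prefix, value, letter in split_combined(b"\x1b*rt16384sA"):
    print(f"<ESC>{prefix}{value}{letter}")
```

Running it on the first command of our RFU file yields the three stand-alone commands listed above: raster height 0, raster width 16384 and Start Raster Graphics.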

Compression Methods

To save space, the binary payload of the command Transfer Raster Data is often compressed. To signal which compression is being used, a Set Compression Method command is used. The command syntax is:

<ESC>*b#M

By now we know how to parse this command. HP supports several compression methods, each identified by a unique value:

Value Compression
0 Unencoded
1 Run-Length Encoding
2 Tagged Image File Format (TIFF) rev 4.0
3 Delta row
4 Empty row
5 Duplicate row/Adaptive compression

The compression methods are documented in the HP PCL technical reference manual (see Further Reading section).

In our RFU sample, the compression methods used are Unencoded (0) and TIFF (2). The former is trivial, so we describe only the latter.

Tagged image file format encoding is a combination of run-length encoding (RLE) and no encoding at all. Each sequence of pattern bytes is preceded by a control byte. This control byte identifies whether we use RLE or no encoding at all.

The control byte is interpreted as a signed byte (in two’s complement representation).

  • A non-negative control byte c, with a value between 0 and 127, indicates that the following c+1 bytes should be interpreted literally.
  • A negative control byte -c (-127 to -1) indicates that the following byte should be replicated c+1 times.
  • A control value of -128 denotes a NO-OP.

Take a look at the following compressed data for the string JSOFrulez!!!!!!111111111:
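Applying the three rules by hand, the string packs into 14 bytes: the control byte 0x08 followed by the nine literal bytes JSOFrulez, then 0xFB 0x21 (control byte -5, so six copies of !), then 0xF8 0x31 (control byte -8, so nine copies of 1). The Python sketch below decodes such a buffer; the name tiff_decode is ours, and the code is only an illustration of the rules as stated here, not HP's implementation.

```python
def tiff_decode(data: bytes) -> bytes:
    """Decode compression method 2 using the control-byte rules described above."""
    out = bytearray()
    i = 0
    while i < len(data):
        # Reinterpret the unsigned byte as a signed two's complement value.
        control = data[i] - 256 if data[i] > 127 else data[i]
        i += 1
        if control >= 0:
            # Literal run: copy the next control+1 bytes unchanged.
            out += data[i:i + control + 1]
            i += control + 1
        elif control != -128:
            # Repeat run: a control byte of -c means c+1 copies of the next byte.
            out += data[i:i + 1] * (1 - control)
            i += 1
        # A control byte of -128 is a NO-OP and consumes nothing else.
    return bytes(out)


packed = b"\x08JSOFrulez" + b"\xfb!" + b"\xf8" + b"1"
print(len(packed), tiff_decode(packed))
```

The 14 packed bytes expand back to the original 24-byte string.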

There are two methods to deliver raster data to the printer: by row or by plane. The general syntax for Transfer Raster Data by Row is <ESC>*b#W and for transfer by plane is <ESC>*b#V.

The value field identifies the number of bytes in the transfer (in compressed form) and can be any number in the range 0 to 32767.

Originally, these two commands were designed to deliver pixel information to an HP printer. The W method is the older of the two, and it is used to deliver monochrome raster data. The V method is used for sending colored pixels by plane according to a selected color palette. For instance, if we use RGB, then the first, second and third planes correspond to the colors red, green and blue, respectively.

In our case, these commands are used to specify the number of bytes to follow (in compressed form). One thing to note is that the output of Transfer Raster Data by Plane (‘V’) is zero-filled if the number of bytes after decompression is less than the raster width, while the output of Transfer Raster Data by Row (‘W’) is not zero-filled. This behavior is unique to the FWUPDATE language, and it was a source of subtle bugs in our unpacker.

Putting It All Together

This was a lot of information, so let us conclude this post with a simple example. We are going to encode the string JSOFrulez!!!!!!111111111 using the FWUPDATE language:

Breaking it down:

  1. <ESC>*rt32sA sets a height of 0 and a width of 32, and signals the start of raster graphics.
  2. <ESC>*b+0Y does nothing, but this is mandatory.
  3. <ESC>*b2m14V specifies TIFF as the compression method, and specifies that the following 14 bytes should be decompressed. Since the decompressed data length is 24 (the length of our string) and the raster width is 32, the row is zero-filled with 8 null bytes.
  4. <ESC>*bW is mandatory according to the format (but writes nothing in our case), as every sequence of V commands must be followed by a W command.
  5. <ESC>*rC specifies the end of raster graphics.
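Reading the five steps back into bytes gives a short stream that a toy interpreter can replay. The Python sketch below is our own illustration, not the unpacker described in this post: the names unpack and tiff_decode are hypothetical, only compression methods 0 and 2 are handled, the Y-Offset is ignored, and the PJL header, printer reset and UEL are assumed to have been stripped already.

```python
import re

ESC = b"\x1b"
FIELD = re.compile(rb"([+-]?\d*)([A-Za-z])")  # optional value + parameter letter


def tiff_decode(data: bytes) -> bytes:
    """Method 2: a signed control byte selects a literal run or a repeated byte."""
    out, i = bytearray(), 0
    while i < len(data):
        control = data[i] - 256 if data[i] > 127 else data[i]
        i += 1
        if control >= 0:                  # copy the next control+1 bytes
            out += data[i:i + control + 1]
            i += control + 1
        elif control != -128:             # -c means c+1 copies of the next byte
            out += data[i:i + 1] * (1 - control)
            i += 1
    return bytes(out)


def unpack(stream: bytes) -> bytes:
    """Replay the raster commands and return the decoded bytes."""
    out = bytearray()
    width = method = pos = 0
    while pos < len(stream):
        if stream[pos:pos + 2] != ESC + b"*":
            raise ValueError(f"expected <ESC>* at offset {pos}")
        group = stream[pos + 2:pos + 3]
        pos += 3
        while True:
            m = FIELD.match(stream, pos)
            if m is None:
                raise ValueError(f"malformed command at offset {pos}")
            pos = m.end()
            digits, letter = m.groups()
            value = int(digits.decode("ascii")) if digits.strip(b"+-") else 0
            command = group + letter.upper()
            if command == b"rS":          # raster width
                width = value
            elif command == b"bM":        # compression method
                if value not in (0, 2):
                    raise NotImplementedError(f"compression method {value}")
                method = value
            elif command in (b"bV", b"bW"):
                payload = stream[pos:pos + value]
                pos += value
                row = tiff_decode(payload) if method == 2 else payload
                if command == b"bV":      # only V output is zero-filled to the width
                    row = row.ljust(width, b"\x00")
                out += row
            if letter.isupper():          # an uppercase letter ends the sequence
                break
    return bytes(out)


packed = b"\x08JSOFrulez" + b"\xfb!" + b"\xf8" + b"1"
stream = (ESC + b"*rt32sA" + ESC + b"*b+0Y" + ESC + b"*b2m14V" + packed
          + ESC + b"*bW" + ESC + b"*rC")
print(unpack(stream))
```

The result is the 24-byte string followed by 8 zero bytes, for a total of 32 bytes, which matches the raster width set in step 1.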

Part 2 coming soon!

Further Reading

The following references helped us during analysis:

Thanks

We’d like to thank the researchers at Check Point for their helpful, previously published research on this subject.

Big thanks to our proofreaders: Moshe Rubin, Nadav Cohen and Yaakov Cohen.

And lastly, we’d like to thank the EFF (Electronic Frontier Foundation) for their time, patience and guidance.