Summary

I needed to understand how to put together and take apart a JPEG file. Strangely enough, detailed and complete information on the popular JPEG file format is hard to find. Perhaps because libjpeg is so common, people don't generally roll their own JPEG implementation. This post documents what I discovered.

JPEG file format

I remember playing with TIFF files in the 1980s. JPEG uses similar constructs. There is a set of common tags -- called markers or segments -- each followed by a size, and then the data specific to that tag. This way, a parser that doesn't know how to interpret a certain marker can skip ahead in the file to the next one, completely ignoring structures that it doesn't understand.

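All tags in JPEG files start with the value 0xFF. If the value 0xFF is ever needed within the compressed image data, it must be escaped by immediately following it with 0x00. This is called "byte stuffing". Inside the other segments no escaping is needed, since the size field already says where the segment ends.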
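As a minimal sketch of byte stuffing, assuming it is applied to the compressed image data only, an encoder inserts a 0x00 after every 0xFF and a decoder removes it again. The function names stuff_bytes and unstuff_bytes are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encoder side: insert a 0x00 after every 0xFF in the compressed data.
std::vector<std::uint8_t> stuff_bytes(const std::vector<std::uint8_t>& raw)
{
    std::vector<std::uint8_t> out;
    out.reserve(raw.size());
    for (const std::uint8_t b : raw)
    {
        out.push_back(b);
        if (b == 0xFF)
        {
            out.push_back(0x00);
        }
    }
    return out;
}

// Decoder side: drop the 0x00 that follows each 0xFF.
// Assumes the input holds only compressed data, with no markers in it.
std::vector<std::uint8_t> unstuff_bytes(const std::vector<std::uint8_t>& stuffed)
{
    std::vector<std::uint8_t> out;
    out.reserve(stuffed.size());
    for (std::size_t i = 0; i < stuffed.size(); ++i)
    {
        out.push_back(stuffed[i]);
        if (stuffed[i] == 0xFF && i + 1 < stuffed.size() && stuffed[i + 1] == 0x00)
        {
            ++i; // skip the stuffed zero
        }
    }
    return out;
}
```

Running unstuff_bytes on the output of stuff_bytes should give back the original bytes.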

So knowing that markers are 0xFF followed by anything other than 0x00, it becomes easy to start pulling apart JPEG files.

    xxd -c16 -g1 -u testimg.jpg | grep --color=always -C999 FF
    00000000: FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01  ......JFIF......
    00000010: 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08  .......C........
    00000020: 07 07 07 09 09 08 0A 0C 14 0D 0C 0B 0B 0C 19 12  ................
    00000030: 13 0F 14 1D 1A 1F 1E 1D 1A 1C 1C 20 24 2E 27 20  ........... $.'
    00000040: 22 2C 23 1C 1C 28 37 29 2C 30 31 34 34 34 1F 27  ",#..(7),01444.'
    00000050: 39 3D 38 32 3C 2E 33 34 32 FF DB 00 43 01 09 09  9=82<.342...C...
    00000060: 09 0C 0B 0C 18 0D 0D 18 32 21 1C 21 32 32 32 32  ........2!.!2222
    00000070: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32  2222222222222222
    00000080: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32  2222222222222222
    00000090: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 FF C0  22222222222222..
    000000a0: 00 11 08 00 95 00 E3 03 01 22 00 02 11 01 03 11  ........."......
    000000b0: 01 FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00  ................
    000000c0: 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09  ................
    ...

Of the markers described here, all but two (SOI and EOI) are immediately followed by a 2-byte size, stored big-endian. The size never includes the 2-byte marker itself, but always includes the 2-byte size. Since the largest size value is 0xFFFF (65535), the data in a marker is limited to 65533 bytes. However, a marker may appear multiple times, and one particular marker -- the one with the actual image data -- works slightly differently to accommodate a payload of any size.
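The skipping described above can be sketched roughly as follows. This is only an illustration under a few assumptions: the whole file is already in memory, the walk stops at SOS (whose image data is not covered by a size) or at EOI, and the names Segment and list_segments are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Segment
{
    std::uint8_t marker; // second marker byte, e.g. 0xE0 for APP0
    std::size_t offset;  // offset of the 0xFF byte within the file
    std::size_t size;    // value of the size field (0 when there is none)
};

// Walk the segments from SOI up to and including SOS or EOI.
std::vector<Segment> list_segments(const std::vector<std::uint8_t>& data)
{
    std::vector<Segment> segments;
    std::size_t pos = 0;
    while (pos + 1 < data.size() && data[pos] == 0xFF)
    {
        const std::uint8_t marker = data[pos + 1];
        if (marker == 0xD8 || marker == 0xD9) // SOI and EOI carry no size
        {
            segments.push_back({marker, pos, 0});
            if (marker == 0xD9)
            {
                break;
            }
            pos += 2;
            continue;
        }
        if (pos + 3 >= data.size())
        {
            break; // truncated size field
        }
        const std::size_t size =
            (static_cast<std::size_t>(data[pos + 2]) << 8) | data[pos + 3];
        if (size < 2)
        {
            break; // the size always includes its own 2 bytes
        }
        segments.push_back({marker, pos, size});
        if (marker == 0xDA)
        {
            break; // SOS: compressed data follows, not covered by the size
        }
        pos += 2 + size; // 2 marker bytes, then the sized payload
    }
    return segments;
}
```

Because unknown markers are skipped by size alone, the loop never needs to know what an individual segment contains.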

JPEG segments, tags, markers

Some information on JPEG segments/tags/markers:

| TLA | Name | Hex | Size | Required | Special Notes |
|---|---|---|---|---|---|
| SOI | start of image | 0xFF 0xD8 | This tag does not have a size. | Yes | This tag must be the first one in the file. |
| APP0 | application data | 0xFF 0xE0 | 0x00 0x10 (16 bytes) for a standard image without a thumbnail. | Yes | This tag must come immediately after the SOI. |
| DQT | define quantization table | 0xFF 0xDB | Variable size. Typically 0x00 0x43 (67 bytes) per table if this tag appears multiple times in the file. 0x00 0x84 (132 bytes) if two tables have been combined into a single tag. More if there are more tables, or if the tables are 16-bit instead of 8-bit. | Yes | The standard allows for multiple tables to be combined into a single DQT tag. I've seen both in use: JPEG files with multiple DQT segments, and JPEG files where the tables have been combined. |
| DHT | define Huffman table | 0xFF 0xC4 | Variable, depending on the number and the size of the tables. | Yes | The standard allows for multiple tables to be combined into a single DHT tag. I've seen both in use: JPEG files with multiple DHT segments, and JPEG files where the tables have been combined. |
| SOF0 | start of frame (baseline DCT) | 0xFF 0xC0 | Variable size. Typically 0x00 0x11 (17 bytes) for images with 3 components (e.g., YCbCr). | Yes, but see "special notes". | SOF0 can be replaced with SOF1 (0xFFC1, extended sequential DCT), SOF2 (0xFFC2, progressive DCT), etc... |
| COM | comment | 0xFF 0xFE | Variable size. | No | |
| SOS | start of scan | 0xFF 0xDA | Complicated. See below for details. | Yes | The compressed image data comes immediately after the SOS tag. |
| EOI | end of image | 0xFF 0xD9 | This tag does not have a size. | Yes | This tag must be the last one in the image. |

There are many more segments/tags/markers than this. But these are the basic ones I needed to understand.

SOI (start of image) - 0xFFD8

This simple tag indicates the start of an image. It appears at the very start of the file. It has no size, and in a JFIF file it is immediately followed by the APP0 tag.

    xxd -c16 -g1 -u testimg.jpg | grep --color=always "FF D8"
    00000000: FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01  ......JFIF......
    00000010: 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08  .......C........
    ...

APP0 (application data) - 0xFFE0

    xxd -c16 -g1 -u testimg.jpg | grep --color=always "FF E0"
    00000000: FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01  ......JFIF......
    00000010: 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08  .......C........
    ...

Description:

    0xFF, 0xE0,                     // APP0 segment
    0x00, 0x10,                     // size of segment, including these 2 bytes; 0x10 = 16 bytes
    0x4A, 0x46, 0x49, 0x46, 0x00,   // identifier string: "JFIF" plus a terminating zero
    0x01, 0x01,                     // JFIF version 1.01
    0x00,                           // density units (0=no units)
    0x00, 0x01,                     // horizontal density
    0x00, 0x01,                     // vertical density
    0x00,                           // X thumbnail size
    0x00                            // Y thumbnail size

It is recommended that thumbnails no longer be inserted into the APP0 segment, so the thumbnail sizes should both be zero. If APP0 does contain a thumbnail, it is stored as uncompressed 24-bit RGB after these fields, so the segment size grows by 3 x width x height bytes.
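Reading those fields back could look something like the sketch below. It assumes the buffer starts at the 0xFF of the APP0 marker and that any thumbnail is the plain 3 x width x height RGB kind; JfifHeader and parse_jfif_app0 are names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct JfifHeader
{
    std::uint8_t version_major;
    std::uint8_t version_minor;
    std::uint8_t density_units; // 0 = no units
    std::uint16_t x_density;
    std::uint16_t y_density;
    std::uint8_t thumb_width;
    std::uint8_t thumb_height;
};

// `data` must start at the 0xFF of the APP0 marker.
// Returns false if the bytes do not look like a JFIF APP0 segment.
bool parse_jfif_app0(const std::vector<std::uint8_t>& data, JfifHeader& out)
{
    if (data.size() < 18 || data[0] != 0xFF || data[1] != 0xE0)
    {
        return false;
    }
    const std::size_t size = (static_cast<std::size_t>(data[2]) << 8) | data[3];
    if (data.size() < 2 + size)
    {
        return false; // buffer ends before the segment does
    }
    if (std::memcmp(&data[4], "JFIF", 5) != 0) // 5 bytes: "JFIF" plus the zero
    {
        return false;
    }
    out.version_major = data[9];
    out.version_minor = data[10];
    out.density_units = data[11];
    out.x_density = static_cast<std::uint16_t>((data[12] << 8) | data[13]);
    out.y_density = static_cast<std::uint16_t>((data[14] << 8) | data[15]);
    out.thumb_width = data[16];
    out.thumb_height = data[17];
    // 16 fixed bytes, plus 3 bytes per thumbnail pixel
    return size == 16u + 3u * out.thumb_width * out.thumb_height;
}
```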

DQT (define quantization table) - 0xFFDB

    xxd -c16 -g1 -u testimg.jpg | grep --color=always -C2 "FF DB"
    00000000: FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01  ......JFIF......
    00000010: 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08  .......C........
    00000020: 07 07 07 09 09 08 0A 0C 14 0D 0C 0B 0B 0C 19 12  ................
    00000030: 13 0F 14 1D 1A 1F 1E 1D 1A 1C 1C 20 24 2E 27 20  ........... $.'
    00000040: 22 2C 23 1C 1C 28 37 29 2C 30 31 34 34 34 1F 27  ",#..(7),01444.'
    00000050: 39 3D 38 32 3C 2E 33 34 32 FF DB 00 43 01 09 09  9=82<.342...C...
    00000060: 09 0C 0B 0C 18 0D 0D 18 32 21 1C 21 32 32 32 32  ........2!.!2222

Multiple quantization tables can be stored within a single DQT tag, or the JPEG file may have multiple DQT tags.

Description:

    0xFF, 0xDB,     // DQT segment
    0x00, 0x43,     // length of segment depends on the number of tables
    0x00,           // high nibble 0 = 8-bit values, low nibble 0 = table #0
    // followed by the 64 byte quantization table (stored in zigzag order)
    0x08, 0x06, 0x06, 0x07, 0x06, 0x05, 0x08, 0x07,
    0x07, 0x07, 0x09, 0x09, 0x08, 0x0A, 0x0C, 0x14,
    0x0D, 0x0C, 0x0B, 0x0B, 0x0C, 0x19, 0x12, 0x13,
    0x0F, 0x14, 0x1D, 0x1A, 0x1F, 0x1E, 0x1D, 0x1A,
    0x1C, 0x1C, 0x20, 0x24, 0x2E, 0x27, 0x20, 0x22,
    0x2C, 0x23, 0x1C, 0x1C, 0x28, 0x37, 0x29, 0x2C,
    0x30, 0x31, 0x34, 0x34, 0x34, 0x1F, 0x27, 0x39,
    0x3D, 0x38, 0x32, 0x3C, 0x2E, 0x33, 0x34, 0x32

Default luma and chroma quantizer tables can be found in RFC 2435 Appendix A, or in libjpeg's jcparam.c.
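Since one DQT segment may hold several tables, a reader has to loop until the payload is used up. The following is a rough sketch of that loop, assuming the payload passed in starts right after the 2-byte size; QuantTable and parse_dqt are names made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct QuantTable
{
    std::uint8_t id;                      // low nibble of the first byte
    bool is_16bit;                        // high nibble non-zero means 16-bit values
    std::array<std::uint16_t, 64> values; // kept in the order stored in the file
};

// `payload` is the DQT segment content after the 2-byte size field.
std::vector<QuantTable> parse_dqt(const std::vector<std::uint8_t>& payload)
{
    std::vector<QuantTable> tables;
    std::size_t pos = 0;
    while (pos < payload.size())
    {
        QuantTable t{};
        t.is_16bit = (payload[pos] >> 4) != 0;
        t.id = static_cast<std::uint8_t>(payload[pos] & 0x0F);
        ++pos;
        const std::size_t bytes = t.is_16bit ? 128 : 64;
        if (pos + bytes > payload.size())
        {
            throw std::runtime_error("truncated DQT table");
        }
        for (std::uint16_t& v : t.values)
        {
            if (t.is_16bit)
            {
                // 16-bit values are stored big-endian
                v = static_cast<std::uint16_t>((payload[pos] << 8) | payload[pos + 1]);
                pos += 2;
            }
            else
            {
                v = payload[pos++];
            }
        }
        tables.push_back(t);
    }
    return tables;
}
```

With two 8-bit tables the payload is 2 x 65 = 130 bytes, which together with the size field gives the 0x0084 (132 bytes) seen in files that combine both tables into one segment.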

DHT (define Huffman table) - 0xFFC4

    xxd -c16 -g1 -u testimg.jpg | grep --color=always -C1 "FF C4"
    000000b0: 01 FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00  ................
    000000c0: 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09  ................
    000000d0: 0A 0B FF C4 00 B5 10 00 02 01 03 03 02 04 03 05  ................
    000000e0: 05 04 04 00 00 01 7D 01 02 03 00 04 11 05 12 21  ......}........!

Multiple Huffman tables can be stored within a single DHT tag, or the JPEG file may have multiple DHT tags.

Description:

    0xFF, 0xC4,     // DHT segment
    0x00, 0xB5,     // length of segment depends on the size of the table
    0x10,           // Huffman table: high nibble 1 = AC table, low nibble 0 = table #0
    // next 16 bytes give the number of codes of each length, from 1 bit to 16 bits
    // (in this example, the sum of 0+2+1+...+1+7d is 0xA2 or 162 decimal)
    0x00, 0x02, 0x01, 0x03, 0x03, 0x02, 0x04, 0x03,
    0x05, 0x05, 0x04, 0x04, 0x00, 0x00, 0x01, 0x7D,
    // table starts here -- this example has 162 table entries
    0x01, 0x02, 0x03, 0x00, 0x04, 0x11, 0x05, 0x12,
    0x21, 0x31, 0x41, 0x06, 0x13, 0x51, 0x61, 0x07,
    0x22, 0x71, 0x14, 0x32, 0x81, 0x91, 0xA1, 0x08,
    0x23, 0x42, 0xB1, 0xC1, 0x15, 0x52, 0xD1, 0xF0,
    ...
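The relationship between the 16 count bytes and the number of table entries can be sketched in code. This is only an illustration, assuming the payload starts right after the 2-byte size; it splits the tables apart but does not build the actual Huffman codes. HuffmanTable and parse_dht are names invented here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct HuffmanTable
{
    bool is_ac;                          // high nibble: 0 = DC, 1 = AC
    std::uint8_t id;                     // low nibble: table number
    std::array<std::uint8_t, 16> counts; // number of codes of length 1..16 bits
    std::vector<std::uint8_t> symbols;   // the table entries
};

// `payload` is the DHT segment content after the 2-byte size field.
std::vector<HuffmanTable> parse_dht(const std::vector<std::uint8_t>& payload)
{
    std::vector<HuffmanTable> tables;
    std::size_t pos = 0;
    while (pos < payload.size())
    {
        if (pos + 17 > payload.size())
        {
            throw std::runtime_error("truncated DHT header");
        }
        HuffmanTable t;
        t.is_ac = (payload[pos] >> 4) == 1;
        t.id = static_cast<std::uint8_t>(payload[pos] & 0x0F);
        ++pos;
        std::size_t total = 0; // sum of the 16 counts = number of entries
        for (std::size_t i = 0; i < 16; ++i)
        {
            t.counts[i] = payload[pos + i];
            total += t.counts[i];
        }
        pos += 16;
        if (pos + total > payload.size())
        {
            throw std::runtime_error("truncated DHT symbols");
        }
        const auto first = payload.begin() + static_cast<std::ptrdiff_t>(pos);
        t.symbols.assign(first, first + static_cast<std::ptrdiff_t>(total));
        pos += total;
        tables.push_back(t);
    }
    return tables;
}
```

For the example above this gives 1 + 16 + 162 = 179 payload bytes, and adding the 2-byte size field yields the 0x00B5 (181) stored in the segment.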

SOF0 (start of frame) - 0xFFC0

    xxd -c16 -g1 -u testimg.jpg | grep --color=always -C2 "FF C0"
    0000090: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 FF C0  22222222222222..
    00000a0: 00 11 08 00 95 00 E3 03 01 22 00 02 11 01 03 11  ........."......
    00000b0: 01 FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00  ................

Description:

    0xFF, 0xC0,         // SOF0 segment
    0x00, 0x11,         // length of segment depends on the number of components (8 + 3 per component)
    0x08,               // sample precision: bits per sample, per component
    0x00, 0x95,         // image height (0x0095 = 149)
    0x00, 0xE3,         // image width (0x00E3 = 227)
    0x03,               // number of components (should be 1 or 3)
    0x01, 0x22, 0x00,   // 0x01=Y component, 0x22=sampling factor (2 horizontal, 2 vertical), 0x00=quantization table number
    0x02, 0x11, 0x01,   // 0x02=Cb component, ...
    0x03, 0x11, 0x01    // 0x03=Cr component, ...

The 2-byte image height and width fields explain why a JPEG is limited to 65535 x 65535 in size.
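Pulling the dimensions and the per-component fields out of a SOF0 segment might look like the sketch below. It assumes the payload starts right after the 2-byte size field; FrameComponent, FrameHeader and parse_sof0 are names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameComponent
{
    std::uint8_t id;          // 1=Y, 2=Cb, 3=Cr in the example above
    std::uint8_t h_sampling;  // high nibble of the sampling byte
    std::uint8_t v_sampling;  // low nibble of the sampling byte
    std::uint8_t quant_table; // which DQT table this component uses
};

struct FrameHeader
{
    std::uint8_t precision; // bits per sample
    std::uint16_t height;
    std::uint16_t width;
    std::vector<FrameComponent> components;
};

// `payload` is the SOF0 segment content after the 2-byte size field.
bool parse_sof0(const std::vector<std::uint8_t>& payload, FrameHeader& out)
{
    if (payload.size() < 6)
    {
        return false;
    }
    const std::size_t count = payload[5];
    if (payload.size() != 6 + 3 * count) // 6 fixed bytes, 3 per component
    {
        return false;
    }
    out.precision = payload[0];
    out.height = static_cast<std::uint16_t>((payload[1] << 8) | payload[2]);
    out.width = static_cast<std::uint16_t>((payload[3] << 8) | payload[4]);
    out.components.clear();
    for (std::size_t i = 0; i < count; ++i)
    {
        const std::size_t base = 6 + 3 * i;
        FrameComponent c;
        c.id = payload[base];
        c.h_sampling = static_cast<std::uint8_t>(payload[base + 1] >> 4);
        c.v_sampling = static_cast<std::uint8_t>(payload[base + 1] & 0x0F);
        c.quant_table = payload[base + 2];
        out.components.push_back(c);
    }
    return true;
}
```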

SOS (start of scan) - 0xFFDA

    xxd -c16 -g1 -u testimg.jpg | grep --color=always -A4 "FF DA"
    0000260: FA FF DA 00 0C 03 01 00 02 11 03 11 00 3F 00 F2  .............?..
    0000270: E5 6A 76 FA 66 29 08 34 9A 1A 63 F7 F3 46 F1 51  .jv.f).4..c..F.Q
    ...

For a 3-component image, the size will be 0x000C (12 bytes): 6 bytes plus 2 bytes per component. However, the actual compressed image data comes immediately after the SOS segment, and isn't accounted for in the SOS size. This is how JPEG files can be larger than the usual 64KiB segment size limitation. So when reading through the file looking for segments, the SOS must be treated differently.

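To find the next segment after the SOS, you must keep reading until you find a 0xFF byte which is not immediately followed by 0x00 (see "byte stuffing"). Restart markers (0xFFD0 through 0xFFD7) can also appear inside the compressed data and should be skipped the same way. Normally, the next segment will be the EOI that comes at the end of the file.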
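A rough sketch of that search, assuming the whole file is in memory and that `start` is the offset of the first compressed byte after the SOS header. The function name find_scan_end is made up for this example; besides stuffed bytes it also steps over restart markers (0xFFD0 to 0xFFD7) and 0xFF fill bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offset of the 0xFF that starts the next real marker after
// the compressed data, or data.size() if no such marker is found.
std::size_t find_scan_end(const std::vector<std::uint8_t>& data, std::size_t start)
{
    for (std::size_t i = start; i + 1 < data.size(); ++i)
    {
        if (data[i] != 0xFF)
        {
            continue; // ordinary compressed byte
        }
        const std::uint8_t next = data[i + 1];
        if (next == 0x00)
        {
            ++i; // stuffed byte: a literal 0xFF in the image data
            continue;
        }
        if (next >= 0xD0 && next <= 0xD7)
        {
            ++i; // restart marker inside the scan, keep going
            continue;
        }
        if (next == 0xFF)
        {
            continue; // fill byte: look again starting at the next 0xFF
        }
        return i; // a real marker, normally EOI (0xFFD9)
    }
    return data.size();
}
```

In a baseline file the returned offset normally points at the EOI; in a progressive file it may instead point at a DHT or at the next SOS.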

    0xFF, 0xDA,     // SOS segment
    0x00, 0x0C,     // length of segment depends on the number of components
    0x03,           // number of components (1=monochrome, 3=colour)
    // 2 bytes for each component:
    0x01, 0x00,     // 0x01=Y, 0x00=Huffman tables to use (high nibble = DC table, low nibble = AC table)
    0x02, 0x11,     // 0x02=Cb, 0x11=Huffman tables to use
    0x03, 0x11,     // 0x03=Cr, 0x11=Huffman tables to use
    // I never figured out the actual meaning of these next 3 bytes
    0x00,           // start of spectral selection or predictor selection
    0x3F,           // end of spectral selection
    0x00,           // successive approximation bit position or point transform
    // image data starts here
    0xF2, 0xE5, 0x6A, ...

Last modified: 2017-02-01
Stéphane Charette, stephanecharette@gmail.com