Let’s Write a JPEG Decoder
derekb@vimeo.com
@daemon404
Derek Buitenhuis
12 December 2018
New York, USA / The Internet
JPEG? Who cares?
• Good as a first step into codecs
• Extremely simple
• Doesn’t even have spatial prediction
• Convince people DCTs aren’t scary
• In extremely wide use and will continue to be for the foreseeable future
• Writing a JPEG encoder is a good hands-on way to get into hacking on multimedia code
• Real, viewable results
Vimeo Lunch Talks
Encoding
Step 0: RGB to Y’CbCr
• Most JPEGs store image as Y’CbCr
• Some weird ones store as CMYK or XYZ
• JFIF doesn’t actually define a way to tag this info other than “number of planes”
• Most web uses are 4:2:0 subsampling
• Cb and Cr are half the resolution of Y’ in each dimension
• Saves space for the things we notice more (brightness detail rather than colour detail)
• Always BT.601
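Step 0 can be sketched in a few lines. This is a minimal illustration, assuming full-range 8-bit BT.601 as JFIF uses it; the function names `rgb_to_ycbcr` and `subsample_420` are made up for this example, and averaging each 2x2 group is only one possible way to subsample chroma.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 Y'CbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    # Round and clamp back into the 8-bit range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


def subsample_420(plane):
    """Halve a chroma plane in both dimensions by averaging 2x2 groups.

    Assumes the plane (a list of rows) has even width and height.
    """
    out = []
    for r in range(0, len(plane), 2):
        row = []
        for c in range(0, len(plane[r]), 2):
            total = (plane[r][c] + plane[r][c + 1]
                     + plane[r + 1][c] + plane[r + 1][c + 1])
            row.append((total + 2) // 4)  # rounded average
        out.append(row)
    return out
```

Grey pixels land on Cb = Cr = 128, which is why the chroma planes of a mostly grey image are nearly flat.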
Step 1: Shift
• Subtract 128 from all values
• DCT = Discrete Cosine Transform
• Think of cosine’s range: [-1,1]. It is centred on zero, so we centre our samples on zero too
• Implementation note: Be careful with implicit type conversions here (uint8 / int8)
• Example: 60 → -68
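The type-conversion warning is easy to demonstrate. The sketch below uses plain Python integers, which cannot overflow, and simulates what an unsigned 8-bit subtraction would do; `level_shift` and `wrapped_u8` are names invented for this example.

```python
def level_shift(block):
    """Subtract 128 from every sample, producing signed values in [-128, 127]."""
    return [[int(v) - 128 for v in row] for row in block]


def wrapped_u8(value):
    """What the same arithmetic yields if the result is stored as uint8."""
    return value & 0xFF
```

With these, `level_shift([[60]])` gives `[[-68]]`, while `wrapped_u8(60 - 128)` gives 188: the silent wraparound you get if the subtraction happens in an unsigned 8-bit type.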
Step 2: Apply 8x8 Forward DCT
• Split planes into 8x8 blocks
• Do this:
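For reference, the standard 8x8 forward DCT (a DCT-II) can be written as below, using the symbols the following slides define: G for the coefficients, g for the shifted samples, and α for the normalizing factor. This is the usual textbook form with the conventional JPEG scaling, which is assumed here rather than taken from the slide.

```latex
G_{u,v} = \frac{1}{4}\,\alpha(u)\,\alpha(v)
          \sum_{x=0}^{7} \sum_{y=0}^{7} g_{x,y}
          \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
          \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right],
\qquad
\alpha(u) =
\begin{cases}
  \dfrac{1}{\sqrt{2}} & \text{if } u = 0 \\
  1 & \text{otherwise}
\end{cases}
```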
5 Second Overview of DSP
• Background:
• Convert the sample values into the frequency domain using a reversible transform
• Higher frequencies = Finer (less noticeable) details
• Lower frequencies = Less granular details (e.g. solid rectangles)
• DCT chosen over DFT because DCT happens to have a nice property where its energy is concentrated into a smaller set of coefficients, which is better for data compression.
• Intelligently drop higher frequencies we shouldn’t notice
• Intelligently reduce precision
Don’t Run!
Step 2: Apply 8x8 Forward DCT — Continued
• G(u,v) is the resulting DCT coefficient at point u,v (see below)
• u and v are 0 to 7 (8 spatial frequencies in each direction, since we are using 8x8 blocks)
• g(x,y) is the shifted sample value at point x,y in our 8x8 block
• α(u) is this function: α(u) = 1/√2 if u = 0, otherwise 1
• If you remember your linear algebra class, this normalizing factor makes the transform orthonormal: the basis functions are orthogonal to each other and all have unit length
• Useful since we want to combine basis functions, and they have to be independent!
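Written out directly, the definitions above turn into four nested loops. This is a deliberately naive sketch, assuming the standard JPEG scaling of 1/4 · α(u) · α(v); `alpha` and `fdct_8x8` are names chosen for this example, and the block is indexed as `g[x][y]`.

```python
import math


def alpha(u):
    """Normalizing factor: 1/sqrt(2) for the zero frequency, 1 otherwise."""
    return 1 / math.sqrt(2) if u == 0 else 1.0


def fdct_8x8(g):
    """Naive forward DCT of one 8x8 block of level-shifted samples."""
    G = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (g[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            G[u][v] = 0.25 * alpha(u) * alpha(v) * total
    return G
```

A completely flat block is a handy sanity check: every sample equal to 10 gives G(0,0) = 80 and (up to float error) zero everywhere else, because there is no detail at any non-zero frequency.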
Step 2: Apply 8x8 Forward DCT — Continued
• Can be thought of as overlaying basis functions on top of each other at varying intensities
• This is where coefficients come into play: each coefficient is the intensity of one basis function
Step 3: Zig-zag
• Notice: Low frequencies cluster near the top left and higher frequencies radiate out
• The top left (lowest frequency) value is called the DC Value
• The rest are called AC values
• These are named as such for historical reasons (DC = direct current, AC = alternating current)
• DCT was used to analyze electrical signals before this
• Re-ordering the coefficients using a zig-zag pattern yields a set ordered by frequency
• Useful for entropy coding (more on that later)
• This is where FFmpeg’s logo comes from
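The zig-zag order does not need to be typed in by hand; it can be generated by walking the anti-diagonals of the block and alternating direction. A small sketch, with `zigzag_order` as a name made up for this example (real decoders typically just hard-code the 64-entry table):

```python
def zigzag_order(n=8):
    """Return raster indices (row * n + col) of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):          # s = row + col picks one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)       # even diagonals run bottom-left to top-right
        order.extend(r * n + (s - r) for r in rows)
    return order
```

Reordering a block is then `[block[i] for i in zigzag_order()]` for a block stored as a flat list of 64 values in raster order.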
Step 4: Quantization
• Quantization generally refers to taking a continuous (or larger) set and sampling it, or mapping it to a smaller (discrete) set.
• Aside: The universe is quantum in nature, so can we really call anything continuous?
• This is the lossy part of JPEG compression.
• We want to map our larger set of DCT coefficients (in our case, floats, but in real cases, a larger set of integers) to a smaller set of integers we’ll actually code into the bitstream
• We do this by dividing element-wise by an 8x8 quantization matrix, and rounding to integers
• The matrix is provided by the encoder, and coded into the bitstream
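Quantization and its decoder-side counterpart fit in a few lines. A minimal sketch: `quantize` and `dequantize` are hypothetical names, the functions work on any equally sized pair of 2-D lists, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for whatever rounding a real encoder uses.

```python
def quantize(coeffs, qmatrix):
    """Divide each DCT coefficient by its matrix entry and round to an integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]


def dequantize(levels, qmatrix):
    """Decoder side: multiply back. The rounding error is gone for good."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]
```

For example, a coefficient of -415.38 with a matrix entry of 16 quantizes to -26 and comes back as -416: close, but not identical, which is exactly where the loss happens.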
Step 4: Quantization — Continued
[Figure: example quantization matrix, input block, and quantized output block]
Step 5: Run Length Encode Zeroes
• Lots of zeroes now! Let’s code them efficiently.
• Example set (in raster order): 57,45,0,0,0,0,23,0,-30,-16,0,0,1,0, …
• Code them as pairs of values: (X,Y)
• X is the number of preceding zeroes
• Y is the next value
• Special case #1: (0,0) means fill the rest of the set with zeroes after this point
• Special case #2: (15,0) in the middle of a set means insert 16 zeroes
• From our example set: (0, 57); (0, 45); (4, 23); (1, -30); (0, -16); (2, 1); (0, 0)
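The pairing rules can be checked with a short encoder. This is a sketch of the scheme as described on this slide (pairs of plain integers); `rle_encode` is a name invented here, and the real bitstream packs the run and the value's bit size into one Huffman-coded byte rather than storing pairs like this.

```python
def rle_encode(coeffs):
    """Turn a list of coefficients into (preceding_zeroes, value) pairs."""
    # Everything after the last non-zero value collapses into one (0, 0).
    last = max((i for i, v in enumerate(coeffs) if v != 0), default=-1)
    pairs = []
    run = 0
    for v in coeffs[:last + 1]:
        if v == 0:
            run += 1
            if run == 16:
                pairs.append((15, 0))  # special case #2: a run of 16 zeroes
                run = 0
        else:
            pairs.append((run, v))
            run = 0
    if last + 1 < len(coeffs):
        pairs.append((0, 0))           # special case #1: the rest is zeroes
    return pairs
```

Running it on the example set, with the trailing "…" taken to be all zeroes, yields `(0, 57), (0, 45), (4, 23), (1, -30), (0, -16), (2, 1), (0, 0)`. Only one zero sits between 23 and -30, so that pair has a run of 1.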
Step 6: DC Prediction
• Prediction means “predicting” a current value based off of other values
• The “other” values can be separated by space (different parts of the same image), or for video, time (different parts of previous or future images)
• Most prediction is done before DCT, on raw sample values
• JPEG does prediction post-DCT, but only on DC values
• Someone working on JPEG noticed that DC values of neighbouring blocks tend to be similar
• So instead of coding the DC value directly, code its difference from the previous block’s (in raster order) DC value
• The first block is predicted from an initial value of 0
• Each following block is coded as its difference from the previous block
• So if you have e.g. 3 blocks with DCs of 10, 12, 10, you end up coding 10, 2, -2
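Both directions of DC prediction are a running difference and a running sum. A minimal sketch for a single colour plane; `dc_encode` and `dc_decode` are names chosen for this example.

```python
def dc_encode(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    prev = 0                      # the first block predicts from 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


def dc_decode(diffs):
    """Decoder side: accumulate the differences to recover the DC values."""
    prev = 0
    dc_values = []
    for d in diffs:
        prev += d
        dc_values.append(prev)
    return dc_values
```

`dc_encode([10, 12, 10])` returns `[10, 2, -2]`, and `dc_decode` of that list gives back the original values.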
Step 7: Huffman Coding
• Simple idea: Values that appear frequently in our data get assigned shorter codes
• Codes are variable length (sometimes called VLCs, or Variable Length Codes)
• JPEG writes only the lengths of these codes; the codes themselves can be regenerated using a known algorithm once the lengths are read.
• AC and DC coefficients have separate length tables coded (remember we predicted the DC value!)
• How we assign values to codes can be optimized “cleverly” in the encoder:
• Example: mozjpeg uses something akin to Viterbi
• These lengths are written as static tables in the JPEG
• Each table holds the number of Huffman codes of each length (1 to 16 bits long), along with a table of the byte values those codes stand for, sorted by code length.
• This will make more sense when you see the decoder code
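The "known algorithm" is short: hand out codes in increasing order, and append a zero bit every time the code length grows. The sketch below is one way to write it; `build_huffman_table` is a hypothetical name, and the returned dictionary keyed by (length, code) is a choice made for readability, not speed.

```python
def build_huffman_table(counts, symbols):
    """Rebuild Huffman codes from the two lists stored in a JPEG table.

    counts[i] is the number of codes that are i + 1 bits long (16 entries);
    symbols lists the byte values in order of increasing code length.
    Returns a dict mapping (length, code) to the symbol it decodes to.
    """
    table = {}
    code = 0
    k = 0
    for length in range(1, 17):
        for _ in range(counts[length - 1]):
            table[(length, code)] = symbols[k]
            code += 1
            k += 1
        code <<= 1                # moving to the next length appends a 0 bit
    return table
```

As a check, feeding it the counts `0, 1, 5, 1, 1, 1, 1, 1, 1, 0, …` with symbols 0 to 11 (the widely used default luminance DC table from the JPEG specification's Annex K) gives the 2-bit code `00` for symbol 0 and the 3-bit code `010` for symbol 1. A decoder reads one bit at a time and looks up (bits read so far, value so far) until it hits an entry.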
Decoding
.jpeg isn’t JPEG
• What we think of as a “JPEG file” isn’t actually JPEG
• Called JFIF, and several versions exist; we’re covering 1.01
• This format is both extremely simple and way too flexible
• Allows for all sorts of crazy crap, while simultaneously being underspecified (APPn markers)
• The decoder we’re writing today makes a lot of assumptions about files being “good”
• It’s also very slow, since we’re going for simplicity rather than speed
JFIF
• Basically a series of segments: a marker, followed by a 16-bit length, followed by data
• 0xFF, 0xNN – NN is the marker
• 16-bit length (big-endian, and it counts its own two bytes)
• (length - 2) bytes worth of data
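Walking that structure takes only a loop. A minimal sketch that, like the decoder in this talk, assumes a well-formed file: `iter_segments` is a name invented here, SOI (0xD8) and EOI (0xD9) are treated as markers without a length, and iteration stops after the SOS header (0xDA) because entropy-coded data, not another length-prefixed segment, follows it.

```python
import struct


def iter_segments(data):
    """Yield (marker, payload) for each segment in a JFIF byte string."""
    pos = 0
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected 0xFF at offset %d" % pos)
        marker = data[pos + 1]
        pos += 2
        if marker in (0xD8, 0xD9):        # SOI / EOI carry no length or data
            yield marker, b""
            if marker == 0xD9:
                return
            continue
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        yield marker, data[pos + 2:pos + length]   # (length - 2) bytes of data
        pos += length
        if marker == 0xDA:                # SOS: compressed image data follows
            return
```

On a real file the first few markers are typically 0xD8 (start of image), 0xE0 (the JFIF APP0 header), then the quantization and Huffman tables.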
Before anything:
You need a bitstream reader
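A bitstream reader hands out an arbitrary number of bits at a time, most significant bit first. The sketch below is one simple way to build it, not the code from the talk; `BitReader` and `read_bits` are names chosen for this example. It also skips the 0x00 byte that JPEG inserts after every 0xFF inside compressed data so that the data cannot be mistaken for a marker.

```python
class BitReader:
    """Read big-endian (MSB-first) bit fields from JPEG entropy-coded data."""

    def __init__(self, data):
        self.data = data
        self.pos = 0       # index of the next unread byte
        self.bitbuf = 0    # bits fetched but not yet consumed
        self.bitcnt = 0    # how many bits are in bitbuf

    def _fill(self):
        byte = self.data[self.pos]
        self.pos += 1
        # Byte stuffing: a literal 0xFF is stored as 0xFF 0x00; drop the 0x00.
        if (byte == 0xFF and self.pos < len(self.data)
                and self.data[self.pos] == 0x00):
            self.pos += 1
        self.bitbuf = (self.bitbuf << 8) | byte
        self.bitcnt += 8

    def read_bits(self, n):
        """Return the next n bits as an unsigned integer."""
        while self.bitcnt < n:
            self._fill()
        self.bitcnt -= n
        value = (self.bitbuf >> self.bitcnt) & ((1 << n) - 1)
        self.bitbuf &= (1 << self.bitcnt) - 1   # keep only the unread bits
        return value
```

Huffman decoding then boils down to calling `read_bits(1)` repeatedly until the bits read so far match a code.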
Boring Stuff: JFIF Markers & Bitstream Parsing
Finally, Decoding Can Start
IDCT
• Can calculate the inverse of the DCT, called the IDCT
• No more or less scary than the forward DCT
• Our implementation will use simple matrix multiplication and floats
• Real world implementations use fast integer transforms based on butterflies (see references at
end)
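The matrix-multiplication approach can be sketched as follows. Assuming the usual JPEG normalization, build the 8x8 matrix C with C[u][x] = α(u)/2 · cos((2x+1)uπ/16); because C is orthonormal, the inverse transform is simply Cᵀ · G · C. All function names here are made up for the example, and this is a slow float version for illustration only.

```python
import math

N = 8


def dct_matrix():
    """C[u][x] = alpha(u) / 2 * cos((2x + 1) * u * pi / 16)."""
    c = [[0.0] * N for _ in range(N)]
    for u in range(N):
        a = 1 / math.sqrt(2) if u == 0 else 1.0
        for x in range(N):
            c[u][x] = 0.5 * a * math.cos((2 * x + 1) * u * math.pi / 16)
    return c


def matmul(a, b):
    """Plain 8x8 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def idct_8x8(coeffs):
    """Inverse DCT of one 8x8 block of dequantized coefficients."""
    c = dct_matrix()
    return matmul(matmul(transpose(c), coeffs), c)
```

A block whose only non-zero coefficient is a DC value of 80 decodes to a flat block of 10s. To get displayable pixels, the decoder then adds 128 back and clamps to the 0 to 255 range.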
Links & References to Read
• Start from nothing: https://dspguide.com/pdfbook.html
• Very good intro to JFIF and JPEG: http://www.opennet.ru/docs/formats/jpeg.txt
• More advanced background (where the AA&N fast DCT came from, and why, and why things are the way they are (AC/DC)): https://www.amazon.com/JPEG-Compression-Standard-Multimedia-Standards/dp/0442012721/
• THE intro to video codecs: https://www.amazon.com/H-264-Advanced-Video-Compression-Standard/dp/0470516925/ (can be found digitally)
