Huffman Codes
 Huffman codes are an effective technique for
‘lossless data compression’, which means no
information is lost
 The algorithm builds a table of the
frequencies of each character in a file
 The table is then used to determine an optimal
way of representing each character as a binary
string
Huffman Codes
 Consider a file of 100,000 characters from
a-f, with these frequencies:
 a = 45,000
 b = 13,000
 c = 12,000
 d = 16,000
 e = 9,000
 f = 5,000
Huffman Codes
 Typically each character in a file is stored as a
single byte (8 bits)
 If we know we only have six characters, we can use a 3-bit
code for the characters instead:
 a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
 This is called a fixed-length code
 With this scheme, we can encode the whole file with 300,000
bits (45,000·3 + 13,000·3 + 12,000·3 + 16,000·3 + 9,000·3 + 5,000·3)
 We can do better
 Better compression
 More flexibility
Huffman Codes
 Variable-length codes can perform significantly
better
 Frequent characters are given short code words, while
infrequent characters get longer code words
 Consider this scheme:
 a = 0; b = 101; c = 100; d = 111; e = 1101; f = 1100
 How many bits are now required to encode our file?
 45,000*1 + 13,000*3 + 12,000*3 + 16,000*3 + 9,000*4 + 5,000*4
= 224,000 bits
 This is in fact an optimal character code for this file
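The comparison above can be checked directly. This is a minimal sketch using the frequency table and the variable-length scheme from these slides (variable names are illustrative, not from the slides):

```python
# Frequency table from the slides (character counts in the 100,000-character file)
freq = {'a': 45_000, 'b': 13_000, 'c': 12_000, 'd': 16_000, 'e': 9_000, 'f': 5_000}

# Fixed-length code: every character costs 3 bits
fixed_bits = sum(count * 3 for count in freq.values())

# Variable-length code from the slide: a=0; b=101; c=100; d=111; e=1101; f=1100
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
variable_bits = sum(count * len(code[ch]) for ch, count in freq.items())

print(fixed_bits)     # 300000
print(variable_bits)  # 224000
```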
Huffman Codes
 Prefix codes
 Huffman codes are constructed in such a way that they can
be unambiguously translated back to the original data, yet
still be an optimal character code
 Huffman codes are in fact “prefix codes”
 No code word is a prefix of any other code word
Prefix code: (9, 55, 50)
Not a prefix code: (9, 5, 55, 59), because 5 is a prefix of both 55 and 59
 This guarantees unambiguous decoding
 Once a code word is recognized, we can replace it with the
decoded data, without worrying about whether we may also match
some other code word
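The unambiguous left-to-right decoding described above can be sketched without any tree at all: because no code word is a prefix of another, the first match in the scan is the only possible match (the `decode` helper and its names are illustrative, not from the slides):

```python
# The variable-length code from the earlier slide
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
decode_table = {bits: ch for ch, bits in code.items()}

def decode(bitstring):
    """Scan left to right; emit a character as soon as the buffer matches a code word."""
    result, buffer = [], ''
    for bit in bitstring:
        buffer += bit
        if buffer in decode_table:      # prefix property: exactly one possible match
            result.append(decode_table[buffer])
            buffer = ''
    return ''.join(result)

print(decode('0101100'))  # 'abc'  (0 = a, 101 = b, 100 = c)
```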
Huffman Codes
 Both the encoder and decoder make use of a binary
tree to recognize codes
 The leaves of the tree represent the unencoded characters
 Each left branch indicates a “0” placed in the encoded bit
string
 Each right branch indicates a “1” placed in the bit string
Huffman Codes
A Huffman Code Tree (left branch = “0”, right branch = “1”; leaves show character:frequency):

100 → 0: a:45, 1: 55
55 → 0: 25, 1: 30
25 → 0: c:12, 1: b:13
30 → 0: 14, 1: d:16
14 → 0: f:5, 1: e:9

Resulting code words: a = 0, c = 100, b = 101, f = 1100, e = 1101, d = 111
 To encode:
 Search the tree for the character
to encode
 As you progress, append “0” or “1”
to the right of the code
 Code is complete when you find
character
 To decode a code:
 Proceed through bit string left to right
 For each bit, proceed left or right as
indicated
 When you reach a leaf, that is the
decoded character
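The tree-walking rules above can be sketched directly. This hypothetical representation encodes the tree from the figure as nested (left, right) pairs, with plain strings as leaves:

```python
# The tree from the figure: left on "0", right on "1"; strings are leaves
tree = ('a', (('c', 'b'), (('f', 'e'), 'd')))

def decode(tree, bits):
    """Walk the tree per bit; restart from the root whenever a leaf is reached."""
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == '0' else node[1]
        if isinstance(node, str):       # reached a leaf: emit the character
            out.append(node)
            node = tree
    return ''.join(out)

def codes(node, prefix=''):
    """Build the code table by traversing: '0' for each left branch, '1' for each right."""
    if isinstance(node, str):
        return {node: prefix}
    return {**codes(node[0], prefix + '0'), **codes(node[1], prefix + '1')}

print(decode(tree, '0101100'))  # 'abc'
print(codes(tree))              # {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```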
Huffman Codes
 Using this representation, an optimal code will
always be represented by a full binary tree
 Every non-leaf node has two children
 If this were not true, then there would be wasted bits, as in the
fixed-length code, leading to non-optimal compression
 For a set of c characters, this requires c leaves and c-1
internal nodes
Huffman Codes
 Given a Huffman tree, how do we compute the
number of bits required to encode a file?
 For every character c:
 Let f(c) denote the character’s frequency
 Let d_T(c) denote the character’s depth in the tree
 This is also the length of the character’s code word
 The total bits required is then:
B(T) = Σ_{c ∈ C} f(c) · d_T(c)
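For the tree in the earlier figure, B(T) can be evaluated term by term; the depths below are read off that tree (a at depth 1; b, c, d at depth 3; e, f at depth 4):

```python
# B(T) = sum over c in C of f(c) * d_T(c), for the slides' tree
freq  = {'a': 45_000, 'b': 13_000, 'c': 12_000, 'd': 16_000, 'e': 9_000, 'f': 5_000}
depth = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}   # d_T(c), read off the tree

B = sum(freq[c] * depth[c] for c in freq)
print(B)  # 224000
```

This matches the 224,000-bit total computed earlier from code-word lengths, since a character's depth equals the length of its code word.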
Constructing a Huffman Code
 Huffman developed a greedy algorithm for
constructing an optimal prefix code
 The algorithm builds the tree in a bottom-up manner
 It begins with the leaves, then performs merging operations
to build up the tree
 At each step, it merges the two least frequent members
together
 It removes these characters from the set, and replaces them
with a “metacharacter” with frequency = sum of the removed
characters’ frequencies
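The merging steps above can be sketched with a min-heap. This is a minimal illustration (function names are not from the slides); tie-breaking may assign different 0/1 labels than the figure, but the code-word lengths, and hence the 224,000-bit total, come out the same:

```python
import heapq

def huffman(freq):
    """Greedy bottom-up construction: repeatedly merge the two least frequent members."""
    # Heap entries are (frequency, tiebreak, node); leaves are plain characters
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # least frequent
        f2, _, right = heapq.heappop(heap)   # second least frequent
        # Replace the pair with a "metacharacter" whose frequency is their sum
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    return heap[0][2]

def codes(node, prefix=''):
    if isinstance(node, str):
        return {node: prefix}
    return {**codes(node[0], prefix + '0'), **codes(node[1], prefix + '1')}

freq = {'a': 45_000, 'b': 13_000, 'c': 12_000, 'd': 16_000, 'e': 9_000, 'f': 5_000}
table = codes(huffman(freq))
total = sum(freq[c] * len(table[c]) for c in freq)
print(total)  # 224000
```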
Example
(The merging steps for this frequency table were shown as figures in the original slides.)
 For example, with the fixed-length code the 3-character
file ‘abc’ is encoded as 000.001.010
