Huffman Codes
Encoding messages
• Encode a message composed of a string of characters
• Codes used by computer systems
  • ASCII
    • stored in 8 bits per character (the standard itself defines 128 characters in 7 bits)
    • an 8-bit code can encode 256 characters
  • Unicode (original 16-bit form, UCS-2)
    • 16 bits per character
    • can encode 65536 characters
    • includes all characters encoded by ASCII
• 8-bit ASCII and 16-bit Unicode are fixed-length codes
  • all characters are represented by the same number of bits
Problems
• Suppose that we want to encode a message constructed from the symbols A, B, C, D, and E using a fixed-length code
  • How many bits are required to encode each symbol?
    • at least 3 bits are required
    • 2 bits are not enough (they can encode only four symbols)
  • How many bits are required to encode the message DEAACAAAAABA?
    • there are twelve symbols, and each requires 3 bits
    • 12 × 3 = 36 bits are required
Drawbacks of fixed-length codes
• Wasted space
  • 16-bit Unicode uses twice as much space as 8-bit ASCII
    • inefficient for plain-text messages containing only ASCII characters
• Same number of bits used to represent all characters
  • ‘a’ and ‘e’ occur more frequently than ‘q’ and ‘z’
• Potential solution: use variable-length codes
  • use a variable number of bits per character when the frequency of occurrence is known
  • short codes for characters that occur frequently
Advantages of variable-length codes
• The advantage of variable-length codes over fixed-length codes is that short codes can be given to characters that occur frequently
  • on average, the encoded message is shorter than with a fixed-length encoding
• Potential problem: how do we know where one character ends and another begins?
  • not a problem if the number of bits is fixed!
A = 00
B = 01
C = 10
D = 11
0010110111001111111111
A C D B D A D D D D D
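With a fixed-length code, decoding is just slicing the bit string into equal-width chunks. The sketch below assumes the 2-bit table shown above; the function name `decode_fixed` is hypothetical.

```python
# Lookup table for the 2-bit code on this slide (codeword -> symbol).
FIXED_CODE = {"00": "A", "01": "B", "10": "C", "11": "D"}

def decode_fixed(bits: str, width: int = 2) -> str:
    """Decode a bit string by cutting it into chunks of `width` bits."""
    if len(bits) % width != 0:
        raise ValueError("bit string length is not a multiple of the code width")
    chunks = (bits[i:i + width] for i in range(0, len(bits), width))
    return "".join(FIXED_CODE[chunk] for chunk in chunks)

decoded = decode_fixed("0010110111001111111111")
print(decoded, len(decoded))   # ACDBDADDDDD 11
```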
Prefix property
• A code has the prefix property if no character's code is a prefix (the start of the code) of another character's code
• Example:
  • 000 is not a prefix of 11, 01, 001, or 10
  • 11 is not a prefix of 000, 01, 001, or 10 …
Symbol Code
P 000
Q 11
R 01
S 001
T 10
01001101100010
R S T Q P T
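The prefix property can be checked mechanically, and it is what lets a decoder emit a symbol the moment the bits read so far match a codeword. The following is a sketch under that assumption; `has_prefix_property` and `decode_prefix` are illustrative names.

```python
def has_prefix_property(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of a different symbol's codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )

def decode_prefix(bits: str, code: dict[str, str]) -> str:
    """Read bits left to right, emitting a symbol whenever a codeword is complete."""
    lookup = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            symbols.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(symbols)

code = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}
print(has_prefix_property(code))               # True
print(decode_prefix("01001101100010", code))   # RSTQPT
```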
Code without prefix property
• The following code does not have the prefix property
• The pattern 1110 can be decoded as QQQP, QTP, TQP, QQS, or TS
Symbol Code
P 0
Q 1
R 01
S 10
T 11
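To see the ambiguity concretely, a short recursive search can enumerate every way of splitting a bit string into codewords. This is an illustrative sketch; `all_decodings` is a hypothetical helper.

```python
def all_decodings(bits: str, code: dict[str, str]) -> list[str]:
    """Return every symbol sequence whose codewords concatenate to `bits`."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Consume this codeword, then decode the remainder in every possible way.
            for rest in all_decodings(bits[len(word):], code):
                results.append(symbol + rest)
    return results

ambiguous = {"P": "0", "Q": "1", "R": "01", "S": "10", "T": "11"}
decodings = sorted(all_decodings("1110", ambiguous))
print(decodings)   # ['QQQP', 'QQS', 'QTP', 'TQP', 'TS']
```

For a code with the prefix property the same function returns at most one decoding, because at most one codeword can match at each position.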
Problem
• Design a variable-length prefix-free code such that the message DEAACAAAAABA can be encoded using 22 bits
• Possible solution:
  • A occurs eight times while B, C, D, and E each occur once
  • represent A with a one-bit code, say 0
    • remaining codes cannot start with 0
  • represent B with the two-bit code 10
    • remaining codes cannot start with 0 or 10
  • represent C with 110
  • represent D with 1110
  • represent E with 11110
Encoded message
Symbol Code
A 0
B 10
C 110
D 1110
E 11110
DEAACAAAAABA
1110111100011000000100 22 bits
Another possible code
Symbol Code
A 0
B 100
C 101
D 1101
E 1111
DEAACAAAAABA
1101111100101000001000 22 bits
Better code
Symbol Code
A 0
B 100
C 101
D 110
E 111
DEAACAAAAABA
11011100101000001000 20 bits
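The three candidate codes can be compared by encoding the same message with each. A small sketch; the dictionary keys `first`, `second`, and `better` are labels chosen here for illustration only.

```python
message = "DEAACAAAAABA"

codes = {
    "first":  {"A": "0", "B": "10",  "C": "110", "D": "1110", "E": "11110"},
    "second": {"A": "0", "B": "100", "C": "101", "D": "1101", "E": "1111"},
    "better": {"A": "0", "B": "100", "C": "101", "D": "110",  "E": "111"},
}

def encode(text: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every symbol in `text`."""
    return "".join(code[symbol] for symbol in text)

lengths = {name: len(encode(message, code)) for name, code in codes.items()}
print(lengths)   # {'first': 22, 'second': 22, 'better': 20}
```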
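What code to use?
• Question: Is there a variable-length code that makes the most efficient use of space?
• Answer: Yes! Huffman coding produces a prefix code with the minimum total encoded length for the given symbol frequencies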
Huffman coding tree
• Binary tree
  • each leaf contains a symbol (character)
  • label the edge from a node to its left child with 0
  • label the edge from a node to its right child with 1
• The code for any symbol is obtained by following the path from the root to the leaf containing the symbol
• The code has the prefix property
  • a leaf node cannot appear on the path to another leaf
  • note: a fixed-length code corresponds to a tree in which every leaf is at the same depth, so it clearly has the prefix property
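Reading codes off a tree is a simple traversal that appends 0 when going left and 1 when going right. The sketch below assumes a tree stored as nested tuples, where a leaf is a one-character string and an internal node is a `(left, right)` pair; that representation and the name `codes_from_tree` are assumptions made for this example.

```python
def codes_from_tree(node, prefix: str = "") -> dict[str, str]:
    """Map each leaf symbol to the 0/1 labels on the path from the root."""
    if isinstance(node, str):
        # A tree consisting of a single leaf still needs a one-bit code.
        return {node: prefix or "0"}
    left, right = node
    table = codes_from_tree(left, prefix + "0")          # left edge is labelled 0
    table.update(codes_from_tree(right, prefix + "1"))   # right edge is labelled 1
    return table

tree = ("A", (("B", "C"), ("D", "E")))
table = codes_from_tree(tree)
print(table)   # {'A': '0', 'B': '100', 'C': '101', 'D': '110', 'E': '111'}
```

Because symbols live only in leaves, no codeword produced this way can be a prefix of another.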
Building a Huffman tree
• Find the frequency of each symbol occurring in the message
• Begin with a forest of single-node trees
  • each contains a symbol and its frequency
• Repeat
  • select the two trees with the smallest frequencies at their roots
  • produce a new binary tree with the selected trees as children and store the sum of their frequencies in its root
• Repetition ends when there is one tree
  • this is the Huffman coding tree
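The construction above can be sketched with a priority queue. This is one possible implementation, not the slides' own: it assumes a non-empty message, stores trees as nested `(left, right)` tuples with one-character strings as leaves, and breaks frequency ties by insertion order, so individual codewords may differ from a hand-built tree even though the total encoded length is the same.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(frequencies: dict[str, int]):
    """Repeatedly merge the two lowest-frequency trees until one tree remains."""
    tiebreak = count()   # unique sequence numbers so trees themselves are never compared
    heap = [(freq, next(tiebreak), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        freq1, _, tree1 = heapq.heappop(heap)
        freq2, _, tree2 = heapq.heappop(heap)
        heapq.heappush(heap, (freq1 + freq2, next(tiebreak), (tree1, tree2)))
    return heap[0][2]

def code_lengths(tree, depth: int = 0) -> dict[str, int]:
    """Depth of each leaf, which equals the length of its codeword."""
    if isinstance(tree, str):
        return {tree: max(depth, 1)}
    left, right = tree
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths

def encoded_length(message: str) -> int:
    """Total number of bits a Huffman code needs for `message`."""
    frequencies = Counter(message)
    lengths = code_lengths(build_huffman_tree(frequencies))
    return sum(frequencies[s] * lengths[s] for s in frequencies)

print(encoded_length("DEAACAAAAABA"))          # 20
print(encoded_length("THIS_IS_HIS_MESSAGE"))   # 56
```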
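Example
• Build the Huffman coding tree for the message
This is his message
• Character frequencies (_ denotes the space character; upper and lower case are counted together)
Symbol     A G M T E H _ I S
Frequency  1 1 1 1 2 2 3 3 5
• Begin with a forest of single-node trees
[Figure: nine single-node trees, one per symbol, each labelled with its frequency]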
Step 1
[Figure: two leaves of frequency 1 merged into a tree with root 2]
Step 2
[Figure: the other two leaves of frequency 1 merged into a second tree with root 2]
Step 3
[Figure: E (2) and H (2) merged into a tree with root 4]
Step 4
[Figure: the two root-2 trees (A, G and M, T) merged into a tree with root 4]
Step 5
[Figure: _ (3) and I (3) merged into a tree with root 6]
Step 6
[Figure: the two root-4 trees merged into a tree with root 8]
Step 7
[Figure: S (5) and the root-6 tree merged into a tree with root 11]
Step 8
[Figure: the root-8 and root-11 trees merged into the final tree with root 19]
Label edges
[Figure: the final tree with every left edge labelled 0 and every right edge labelled 1]
Huffman code & encoded message
S 11
E 010
H 011
_ 100
I 101
A 0000
G 0001
M 0010
T 0011
This is his message
00110111011110010111100011101111000010010111100000001010
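As a check on the table above, the message can be encoded and decoded again in a few lines. The sketch assumes, as the table does, that letters are treated case-insensitively and that `_` stands for a space; the function names are illustrative.

```python
HUFFMAN_CODE = {
    "S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
    "A": "0000", "G": "0001", "M": "0010", "T": "0011",
}

def encode(text: str) -> str:
    """Upper-case the text, map spaces to '_', and concatenate the codewords."""
    return "".join(HUFFMAN_CODE[ch] for ch in text.upper().replace(" ", "_"))

def decode(bits: str) -> str:
    """Emit a symbol as soon as the bits read so far form a codeword."""
    lookup = {word: symbol for symbol, word in HUFFMAN_CODE.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            symbols.append(lookup[current])
            current = ""
    return "".join(symbols)

encoded = encode("This is his message")
print(encoded)           # 00110111011110010111100011101111000010010111100000001010
print(len(encoded))      # 56
print(decode(encoded))   # THIS_IS_HIS_MESSAGE
```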
Average Code Length
The average code length of the Huffman code can be determined using the formula below:
Average code length = ∑ (frequency × code length) / ∑ (frequency)
For the message "This is his message":
Symbol  Frequency (F)  Code  Code length (CL)  Total (F × CL)
S       5              11    2                 10
E       2              010   3                 6
H       2              011   3                 6
_       3              100   3                 9
I       3              101   3                 9
A       1              0000  4                 4
G       1              0001  4                 4
M       1              0010  4                 4
T       1              0011  4                 4
Total   19                                     56
Average = 56/19 ≈ 2.94737 bits per symbol
Length of the encoded string? 56 bits (the sum of the F × CL column)
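The same calculation in Python, with a fixed-length baseline added for comparison. The baseline (4 bits per symbol for a nine-symbol alphabet) is computed here and does not appear in the slides.

```python
frequency = {"S": 5, "E": 2, "H": 2, "_": 3, "I": 3, "A": 1, "G": 1, "M": 1, "T": 1}
code = {
    "S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
    "A": "0000", "G": "0001", "M": "0010", "T": "0011",
}

total_symbols = sum(frequency.values())
total_bits = sum(frequency[s] * len(code[s]) for s in code)   # sum of F * CL
average = total_bits / total_symbols
print(total_symbols, total_bits, round(average, 5))           # 19 56 2.94737

# Fixed-length baseline: nine symbols need ceil(log2(9)) = 4 bits each.
fixed_width = max(1, (len(code) - 1).bit_length())
fixed_bits = total_symbols * fixed_width
print(fixed_width, fixed_bits)                                # 4 76
```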
