Huffman Encoding



           Βαγγέλης Δούρος
           EY0619




1
Text Compression

     On a computer: changing the representation
     of a file so that it takes less space to store
     and/or less time to transmit.
     –   The original file can be reconstructed exactly from
         the compressed representation.
     Different from data compression in general:
     –   text compression has to be lossless.
     –   compare with sound and images, where small
         changes and noise are tolerated.

2
First Approach

     Consider the word ABRACADABRA.
     What is the most economical way to write this
     string in a binary representation?
     Generally speaking, if a text consists of N
     different characters, we need ⌈log₂ N⌉ bits to
     represent each one using a fixed-length
     encoding.
     Thus, it would require ⌈log₂ 5⌉ = 3 bits for each
     of the 5 different letters, or 33 bits for the
     11 letters of ABRACADABRA.
     Can we do better?
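The fixed-length arithmetic above can be checked with a few lines of Python (a sketch added here, not part of the original slides):

```python
import math

def fixed_length_bits(text: str) -> int:
    """Bits needed to store `text` with a fixed-length code."""
    n_symbols = len(set(text))                          # N distinct characters
    bits_per_symbol = math.ceil(math.log2(n_symbols))   # ceil(log2 N)
    return bits_per_symbol * len(text)

print(fixed_length_bits("ABRACADABRA"))  # 5 letters -> 3 bits each -> 33
```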

3
Yes!!!!

    We can do better, provided:
    –   Some characters are more frequent than others.
    –   Codewords may have different bit lengths, so that,
        for example, in the English alphabet the frequent
        letter a may use only one or two bits, while the
        rare letter y may use several.
    –   We have a unique way of decoding the bit stream.



4
Using Variable-Length Encoding (1)

     Magic word: ABRACADABRA
     LET A = 0
         B = 100
         C = 1010
         D = 1011
         R = 11
     Thus, ABRACADABRA = 01001101010010110100110
     So 11 letters demand 23 bits < 33 bits, an
     improvement of about 30%.
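Encoding with this table is simple concatenation; a quick Python check (not from the original slides) confirms the 23-bit count:

```python
# The variable-length code from the slide.
CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def encode(text: str) -> str:
    """Concatenate the codeword of each character."""
    return "".join(CODE[ch] for ch in text)

bits = encode("ABRACADABRA")
print(bits, len(bits))  # 01001101010010110100110 23
```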



5
Using Variable-Length Encoding (2)

     However, there is a serious danger: how do we ensure
     unique reconstruction?
     Let A = 01 and B = 0101.
     How to decode 010101?
     AB?
     BA?
     AAA?
     No problem…
     if we use prefix codes: no codeword is a prefix of
     another codeword.
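The prefix property is easy to test mechanically. A small helper (my own sketch, not from the slides) relies on the fact that in sorted order a prefix always appears immediately before some word that extends it:

```python
def is_prefix_code(codewords) -> bool:
    """True iff no codeword is a prefix of another."""
    words = sorted(codewords)            # a prefix sorts right before its extensions
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_code(["0", "100", "1010", "1011", "11"]))  # True
print(is_prefix_code(["01", "0101"]))                      # False: 01 prefixes 0101
```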

6
Prefix Codes (1)

     Any prefix code can be represented by a full
     binary tree.
     Each leaf stores a symbol.
     Each internal node has two children – the left
     branch means 0, the right means 1.
     codeword = the path from the root to a leaf,
     interpreting the left and right branches
     accordingly.

7
Prefix Codes (2)
     ABRACADABRA

     A=0
     B = 100
     C = 1010
     D = 1011
     R = 11
     Decoding is unique and simple!
     Read the bit stream from left to
     right, starting from the root;
     whenever a leaf is reached,
     write down its symbol and
     return to the root.
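The decoding rule above can be sketched in Python. The nested-dict tree representation is an assumption of this sketch, not something the slides prescribe:

```python
CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def build_tree(code):
    """Turn a prefix code into a nested dict: edges '0'/'1', leaves are symbols."""
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol          # last bit leads to the leaf
    return root

def decode(bits, code):
    root = build_tree(code)
    node, out = root, []
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):        # leaf reached: emit symbol, return to root
            out.append(node)
            node = root
    return "".join(out)

print(decode("01001101010010110100110", CODE))  # ABRACADABRA
```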




8
Prefix Codes (3)

     Let f_i be the frequency of the i-th symbol and
     d_i the number of bits of its codeword
     (= the depth of this symbol in the tree), 1 ≤ i ≤ n.
     How do we find the optimal coding tree, which
     minimizes the cost of the tree, C = Σ_{i=1}^{n} f_i · d_i ?
      –   Frequent characters should have short
          codewords.
      –   Rare characters should have long codewords.
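For the ABRACADABRA code, the cost formula reproduces the 23-bit total from earlier. A quick check (added here as an illustration):

```python
# Cost C = sum over symbols of f_i * d_i for the ABRACADABRA code.
freq  = {"A": 5, "B": 2, "C": 1, "D": 1, "R": 2}   # character counts in ABRACADABRA
depth = {"A": 1, "B": 3, "C": 4, "D": 4, "R": 2}   # codeword lengths = tree depths

cost = sum(freq[s] * depth[s] for s in freq)
print(cost)  # 23, exactly the length of the encoded bit stream
```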

9
Huffman’s Idea
      From the previous definition of the cost of the tree, it is clear
      that the two symbols with the smallest frequencies must be at the
      bottom of the optimal tree, as children of the lowest internal
      node.
      This is a good sign that we should build the optimal code in a
      bottom-up manner!
      Huffman’s idea is a greedy approach based on these observations.
      Repeat until all nodes are merged into one tree:
       –   Remove the two nodes with the lowest frequencies.
       –   Create a new internal node, with the two just-removed nodes as
           children (either node can be either child) and the sum of their
           frequencies as the new frequency.
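The repeated merge step maps directly onto a min-heap. The sketch below (my own, using Python's `heapq`) tracks only frequencies and accumulates the cost C; each merge sum is added once per level it pushes its leaves down:

```python
import heapq

def huffman_cost(freqs):
    """Total encoded length C: repeatedly merge the two lowest frequencies."""
    heap = list(freqs)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)          # remove the two lowest-frequency nodes
        b = heapq.heappop(heap)
        cost += a + b                    # the merge deepens all leaves below it by 1
        heapq.heappush(heap, a + b)      # new internal node with the summed frequency
    return cost

print(huffman_cost([40, 20, 10, 10, 20]))  # 220 for the example on the next slide
```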


10
Constructing a Huffman Code (1)

      Assume that the frequencies of the symbols are:
      –   A: 40 B: 20 C: 10 D: 10 R: 20
      The smallest values are 10 and 10 (C and D), so
      connect them first.




11
Constructing a Huffman Code (2)
      C and D have already been
      merged, and the new node
      above them (call it C+D) has
      value 20.
      The smallest values are now B,
      C+D, and R, all of which
      have value 20.
       –   Connect any two of these.
      Clearly the algorithm does not
      construct a unique tree, but
      even if we had chosen the
      other possible connection,
      the code would be optimal
      too!

12
Constructing a Huffman Code (3)

      The smallest value is now R (20), while A and B+C+D
      both have value 40.
      Connect R to either of the others.




13
Constructing a Huffman Code(4)

      Connect the final two nodes, labeling each left
      branch 0 and each right branch 1.




14
Algorithm
                      [Pseudocode figure: HUFFMAN. X is the set of
                      symbols, whose frequencies are known in advance;
                      Q is a min-priority queue, implemented as a
                      binary heap. The main loop runs n − 1 times.]
15
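The pseudocode did not survive as text, but the procedure described on slide 10 can be sketched in Python; the tuple-based tree and the counter tie-breaker are choices of this sketch, not of the original algorithm figure:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code table from a {symbol: frequency} dict."""
    tick = count()                       # tie-breaker so the heap never compares trees
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # EXTRACT-MIN, twice
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))  # INSERT the merge
    codes = {}
    def walk(node, word):
        if isinstance(node, tuple):
            walk(node[0], word + "0")    # left branch means 0
            walk(node[1], word + "1")    # right branch means 1
        else:
            codes[node] = word or "0"    # lone-symbol alphabet still gets one bit
    walk(heap[0][2], "")
    return codes

freqs = {"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}
codes = huffman_codes(freqs)
cost = sum(freqs[s] * len(w) for s, w in codes.items())
print(cost)  # 220; individual codeword lengths depend on how ties are broken
```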
What about Complexity?

      [Annotated pseudocode figure.] The loop runs n − 1
      times, and each heap operation (two EXTRACT-MINs
      and one INSERT) needs O(log n); thus the loop
      needs O(n log n).
      Thus, the algorithm needs O(n log n).
16
Algorithm’s Correctness
      It can be proven that the greedy algorithm HUFFMAN is correct, as the
      problem of determining an optimal prefix code exhibits the greedy-
      choice and optimal-substructure properties.
      Greedy choice: Let C be an alphabet in which each character c ∈ C has
      frequency f[c], and let x and y be two characters in C having the lowest
      frequencies. Then there exists an optimal prefix code for C in which
      the codewords for x and y have the same length and differ only in the
      last bit.
      Optimal substructure: Let C be a given alphabet with frequency f[c]
      defined for each character c ∈ C, and let x and y be two characters in C
      with minimum frequency. Let C′ be the alphabet C with characters x, y
      removed and a new character z added, so that C′ = (C − {x, y}) ∪ {z};
      define f for C′ as for C, except that f[z] = f[x] + f[y]. Let T′ be any tree
      representing an optimal prefix code for the alphabet C′. Then the tree
      T, obtained from T′ by replacing the leaf node for z with an internal
      node having x and y as children, represents an optimal prefix code for
      the alphabet C.
17
Last Remarks

     • Huffman codes are widely used in applications that
       involve the compression and transmission of digital
       data, such as fax machines, modems, and computer
       networks.
     • Huffman encoding is practical if:
        –   the encoded string is large relative to the code table
            (because the code table has to be shipped along with
            the message if it is not widely known), or
        –   we agree on the code table in advance.
             • For example, it is easy to find a table of letter frequencies for
               English (or any other alphabet-based language).


18
Thank you!




19
