1. Huffman Coding and its Implementation in MATLAB
To
Dr. Samir Ghadhban
By
Faisal K. Al-Hajri
227064
King Fahd University of Petroleum & Minerals
Electrical Engineering Department
EE430-062
2. Before I start explaining how to implement Huffman coding
using MATLAB functions, let's first get a clear idea about
Huffman coding and how to find the codewords.
3. Huffman Coding:
Huffman codes are lossless data compression
codes. They play an important role in data
communications, speech coding, and video or graphical
image compression. Huffman codes generally have
variable-length code words.
Construction of Huffman Codes:
1- List the source symbols in a column in
descending order of probability.
2- Combine symbols, starting with the two lowest
probability symbols, to form a new compound symbol.
4. 3- Repeat step (2), using the two lowest-probability
symbols from the new set. This process continues
until all of the original symbols have been combined into a
single compound symbol having probability 1.
4- Code words are assigned by reading the branch labels of
the tree from right to left back to the original symbol.
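The four steps above can be sketched as a small MATLAB function. This is only an illustrative sketch, not the huffcode program described later in these slides; on ties the 0/1 labels may differ from the worked example, but the codeword lengths will still be optimal.

```matlab
function codes = huffman_sketch(p)
% p: vector of symbol probabilities. Returns a cell array of codewords
% (as strings of '0'/'1'), one per symbol, in the order of p.
n = numel(p);
codes = repmat({''}, 1, n);      % one (growing) codeword per symbol
groups = num2cell(1:n);          % each node starts as one symbol index
while numel(groups) > 1
    [p, order] = sort(p, 'descend');   % step 1: keep descending order
    groups = groups(order);
    % steps 2-3: combine the two lowest-probability nodes; prepend a
    % branch label to every symbol under each node (this realizes step 4,
    % reading labels from right to left back to the original symbol)
    for k = groups{end-1}, codes{k} = ['0' codes{k}]; end
    for k = groups{end},   codes{k} = ['1' codes{k}]; end
    p = [p(1:end-2), p(end-1) + p(end)];
    groups = [groups(1:end-2), {[groups{end-1}, groups{end}]}];
end
end
```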
5. Example:
Construct a Huffman code for the following probabilities:
Symbols Probabilities
E 0.30
N 0.13
I 0.10
O 0.16
P 0.05
T 0.23
W 0.03
6. • First, list the source symbols in a column in descending
order of probability.
Symbols Probabilities
E 0.30
T 0.23
O 0.16
N 0.13
I 0.10
P 0.05
W 0.03
7. • Second, combine symbols, starting with the two lowest-probability
symbols, to form a new compound symbol.
Symbols Probabilities
E 0.30
T 0.23
O 0.16
N 0.13
I 0.10
P 0.05
W 0.03
These are the two lowest probabilities, so we will add them.
9. We keep adding the lowest probabilities from the new set
until all of the original symbols have been combined into a
single compound symbol (E, T, O, N, I, P, W) having
probability 1.
Note: for each new list, we MUST rearrange it in descending
order.
12. So, the code words for all the symbols are:
The code word for the symbol E is: (11)
The code word for the symbol T is: (01)
The code word for the symbol O is: (101)
The code word for the symbol N is: (100)
The code word for the symbol I is: (001)
The code word for the symbol P is: (0001)
The code word for the symbol W is: (0000)
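We can check this code numerically. The following sketch computes the average codeword length and the source entropy for the example, using the probabilities and the codeword lengths read off the list above:

```matlab
% Worked-example check: probabilities in descending order (E T O N I P W)
% and the lengths of the codewords 11, 01, 101, 100, 001, 0001, 0000.
p   = [0.30 0.23 0.16 0.13 0.10 0.05 0.03];
len = [2 2 3 3 3 4 4];
L   = sum(p .* len);         % average length = 2.55 bits/symbol
H   = -sum(p .* log2(p));    % entropy ~ 2.51 bits/symbol
eta = H / L;                 % efficiency ~ 0.986
```

Since the entropy is about 2.51 bits/symbol and the average codeword length is 2.55 bits/symbol, this Huffman code is about 98.6% efficient.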
13. Now,
after we have a clear idea about Huffman coding, let's try
to implement it using MATLAB…
14. There are many ways to implement Huffman coding using
MATLAB or another language. My program consists of three parts, or
should I call them three phases.
First Phase:
This is a checking phase, where all the input arguments are
checked to see whether they satisfy the conditions or not (as we will see).
Second Phase:
This phase I would call the heart of the algorithm. In this
phase, loops are employed for sorting and for decisions
such as comparisons.
Third Phase:
In the third phase, all the output arguments are finalized
and calculated.
15. Before introducing each phase, let me talk briefly about the MATLAB keyword
( function )
Description:
You add new functions to the MATLAB vocabulary by expressing
them in terms of existing functions. The existing commands and functions that
compose the new function reside in a text file called an M-file. A function is
usually written for general use, not for one specific case.
My algorithm starts with the line huffcode(s,p), which means the following:
huffcode is my function name, and the M-file must be named the same as the
function. s stands for symbols and p for probabilities.
CWord, WLen, Entropy, EfficiencyBefore, EfficiencyAfter & Codes are the
output arguments.
19. • This if statement checks whether the probabilities
are in a vector or not. If they are not, MATLAB
will show the error message and will not
continue to the rest of the program.
20. You can’t give probabilities of type integer;
you must give probabilities of type double.
Note: this check is not too important.
21. Check each element in the probability
vector. If there is a number less than zero
(negative), then MATLAB will show the
error message and stop.
22. To be sure that all the elements in
probability vector are less than one.
23. It’s not enough to check that all the elements
are less than one. We also have to check that
the sum of all the probability elements is
equal to one.
24. Here, simply to make sure that the sizes of the
symbols vector and the probability vector are
equal.
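The checks on slides 19–24 can be sketched together as follows. This is an assumed reconstruction, since the slides' own code is not reproduced here: the function name huffcode_checks, the error messages, and the sum tolerance are all illustrative, not the original program's.

```matlab
function huffcode_checks(s, p)
% Sketch of the first (checking) phase: validate symbols s and
% probabilities p before the main algorithm runs.
if ~isvector(p)
    error('Probabilities must be given as a vector.');        % slide 19
end
if ~isa(p, 'double')
    error('Probabilities must be of type double.');           % slide 20
end
if any(p < 0)
    error('Probabilities cannot be negative.');               % slide 21
end
if any(p >= 1)
    error('Each probability must be less than one.');         % slide 22
end
if abs(sum(p) - 1) > 1e-9   % tolerance is an assumption
    error('The probabilities must sum to one.');              % slide 23
end
if numel(s) ~= numel(p)
    error('Symbols and probabilities must be the same size.');% slide 24
end
end
```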
26. In this phase I will show the idea of
structures and how I used them to write this
algorithm.
Before taking the Symbols and Probabilities
vectors and pushing them inside the structure,
we must rearrange them in descending
order of probability. That can be done
easily by a simple piece of code.
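The slide's own code is not reproduced in this transcript; one standard way to do this descending sort in MATLAB, assuming the vectors s and p from above, is:

```matlab
% Sort probabilities in descending order and reorder the symbols to match.
p = [0.13 0.30 0.23 0.03 0.16 0.10 0.05];   % example input order
s = 'NETWOIP';
[p, idx] = sort(p, 'descend');  % idx maps sorted positions back to originals
s = s(idx);                     % keep each symbol with its probability
```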
27. Structure Function
Struct
Create structure array
Syntax
s = struct('field1', {}, 'field2', {}, ...)
s = struct('field1', values1, 'field2', values2, ...)
Description
s = struct('field1', {}, 'field2', {}, ...) creates an empty structure with fields field1, field2, ...
s = struct('field1', values1, 'field2', values2, ...) creates a structure array with the
specified fields and values. The value arrays values1, values2, etc., must be cell
arrays of the same size or scalar cells. Corresponding elements of the value arrays
are placed into corresponding structure array elements. The size of the resulting
structure is the same size as the value cell arrays or 1-by-1 if none of the values is a
cell.
28. Tree is the name of the structure. This structure now contains only one
structure with the following properties:
Symbol: To save the symbol name.
prob: To save the original symbol probability.
pTemp1: Where the addition and sorting are done.
Node: To trace each symbol.
Serial: Not important, just for checking.
Code: Where the code words are saved.
CP: This is the code pointer. The code pointer points to where the next
0 or 1 should be written.
Note: one structure is only for one symbol. So, we have to put Tree
inside a loop to generate as many structures as the number of
symbols we have.
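A loop of that kind could look like the sketch below. The field list follows the slide above, but the exact spellings and initial values (empty code string, code pointer starting at 1) are assumptions, since the original M-file is not shown in this transcript:

```matlab
% Sketch: build one Tree structure per symbol, assuming s and p are
% already sorted in descending order of probability.
s = 'ETONIPW';
p = [0.30 0.23 0.16 0.13 0.10 0.05 0.03];
N = numel(p);
for i = 1:N
    Tree(i).symbol = s(i);  % symbol name
    Tree(i).prob   = p(i);  % original probability
    Tree(i).pTemp1 = p(i);  % working copy where adding and sorting happen
    Tree(i).node   = i;     % used to trace each symbol
    Tree(i).serial = i;     % just for checking
    Tree(i).code   = '';    % codeword bits accumulate here
    Tree(i).cp     = 1;     % code pointer: where the next 0/1 is written
end
```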
29. Let’s recall our example now and see how it is going to fit inside the
structure.
43. After we have calculated the code word for
each symbol, we go to the finalizing
stage, where all the calculations of
length, entropy, efficiency and code words are done.
This is easy, just applying formulas.
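The finalizing formulas can be sketched as below, assuming p holds the probabilities and len the codeword lengths in the same order. The slides do not show the formula for EfficiencyBefore; the fixed-length-code reading used here is only one plausible interpretation:

```matlab
p   = [0.30 0.23 0.16 0.13 0.10 0.05 0.03];  % example probabilities
len = [2 2 3 3 3 4 4];                        % example codeword lengths

WLen    = sum(p .* len);        % average codeword length, bits/symbol
Entropy = -sum(p .* log2(p));   % source entropy, bits/symbol
EfficiencyAfter  = Entropy / WLen;                    % with Huffman coding
% Assumption: "before" means a fixed-length code of ceil(log2(N)) bits.
EfficiencyBefore = Entropy / ceil(log2(numel(p)));
```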
44. This is the output. Code words are listed in descending order
of the original probabilities.