Data Compression and Encryption (DCE)
TE-EXTC Elective Subject
What is Data Compression?
• Data can be text, image, audio, or video information.
• Data compression is about storing and sending a smaller number of bits.
• Data compression is the art or science of representing information in a compact form.
• We create these compact representations by identifying and using structures that exist in the data.
How data compression is possible
DATA = INFORMATION + REDUNDANT DATA
Need for data compression
• Less storage space
• Less transmission time
• Reduced bandwidth
• During compression, data is converted into a coded form, which adds a degree of security to the information.
How Data Compression Works (diagram omitted)
Applications
• In Satellite Communication
• In Radar
• In Mobile Communication
• In Digital Television
• In Banking Software
• Medical applications, and so on
Data Compression Techniques
1. Lossless Compression Techniques:
 No loss of information
 Original data can be recovered exactly from the compressed data
 Used where loss of information is not tolerable
 Lower compression ratio
 Fidelity and quality are high
 Highly robust and reliable
 Distortion is less
 Rate is high
 Used for text and image compression
 Lossless algorithms: Huffman coding, arithmetic coding
 Applications: medical, satellite, banking, etc.
Data Compression Techniques
2. Lossy Compression Techniques:
 Some loss of information
 Original data cannot be recovered exactly from the compressed data
 Higher compression ratio
 Used where some loss of information is tolerable
 Fidelity and quality are lower
 Less robust and reliable
 Distortion is more
 Rate is low
 Used for audio and video compression
 Lossy algorithms: MPEG, JPEG, etc.
 Applications: telephony, mobile, TV, etc.
Measures of Performance
A compression algorithm can be evaluated in different ways:
1. Relative complexity of the algorithm
2. Memory required to implement the algorithm
3. How fast the algorithm performs on a given machine
4. Amount of compression
5. How closely the reconstruction resembles the original data
Measures of Performance (contd.)
1. Compression Ratio:
CR = (size of original data) / (size of compressed data)
• If CR < 1, there is expansion instead of compression; this is known as negative compression.
• For compression to occur, the compression ratio must be greater than 1.
Measures of Performance (contd.)
2. Compression Factor:
• The compression factor is the inverse of the compression ratio.
• CF = (size of compressed data) / (size of original data)
• For compression to occur, the compression factor is less than 1.
• If CF > 1, there is expansion instead of compression.
• Thus, the smaller the factor, the better the compression.
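As a small illustration (not part of the original slides), here is a minimal Python sketch computing both measures; the file sizes are hypothetical:

```python
def compression_ratio(original_bits: int, compressed_bits: int) -> float:
    """CR = original size / compressed size; > 1 means compression."""
    return original_bits / compressed_bits

def compression_factor(original_bits: int, compressed_bits: int) -> float:
    """CF = compressed size / original size; the inverse of CR, < 1 means compression."""
    return compressed_bits / original_bits

# Hypothetical example: a 65,536-byte image compressed to 16,384 bytes.
print(compression_ratio(65536, 16384))   # 4.0  -> 4:1 compression
print(compression_factor(65536, 16384))  # 0.25 -> smaller is better
```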
Measures of Performance (contd.)
3. Rate:
• The average number of bits required to represent a single sample is called the rate.
• E.g., image compression (bits/pixel).
4. Distortion:
• The difference between the original data and the reconstructed data is called distortion.
• E.g., lossy compression.
Measures of Performance (contd.)
5. Speed:
• The speed of compression can be measured in cycles/byte.
• It is important when compression is done by special hardware.
6. Fidelity and Quality:
• These indicate the difference between the reconstructed and original data.
• High fidelity and quality mean that the difference between the reconstructed and original data is small.
• E.g., lossless compression (image/text).
Lossless Compression
• Lossless compression techniques, as their name implies,
involve no loss of information.
• If data have been losslessly compressed, the original data
can be recovered exactly from the compressed data.
• Lossless compression is generally used for applications
that cannot tolerate any difference between the original
and reconstructed data.
• Text compression is an important area for lossless compression. It is very important that the reconstruction is identical to the original text, as very small differences can result in statements with very different meanings.
Classification of Lossless Compression Techniques
Lossless techniques are classified into static, adaptive (or dynamic),
and hybrid.
• In a static method the mapping from the set of messages to the set of
code words is fixed before transmission begins, so that a given message is
represented by the same codeword every time it appears in the message
being encoded.
• Static coding requires two passes: one pass to compute probabilities (or
frequencies) and determine the mapping, and a second pass to encode.
• Examples: Static Huffman Coding
• In an adaptive method the mapping from the set of messages to the set
of code words changes over time.
• All of the adaptive methods are one-pass methods; only one scan of the
message is required.
• Examples: LZ77, LZ78, LZW, and Adaptive Huffman Coding
• An algorithm may also be a hybrid, neither completely static nor
completely dynamic.
Lossy Compression
• Lossy compression techniques involve some loss of
information, and data that have been compressed using
lossy techniques generally cannot be recovered or
reconstructed exactly.
• In return for accepting this distortion in the reconstruction,
we can generally obtain much higher compression ratios
than is possible with lossless compression.
• In many applications, this lack of exact reconstruction is not
a problem. For example, when storing or transmitting
speech, the exact value of each sample of speech is not
necessary.
• Depending on the quality required of the reconstructed
speech, varying amounts of loss of information about the
value of each sample can be tolerated.
Compression Utilities and Formats
• Compression tool examples: winzip, pkzip, compress, gzip
• General compression formats: .zip, .gz
• Common image compression formats: JPEG, JPEG 2000, BMP, GIF, PCX, PNG, TGA, TIFF, WMP
• Common audio (sound) compression formats: MPEG-1 Layer III (known as MP3), RealAudio (RA, RAM, RP), AU, Vorbis, WMA, AIFF, WAVE, G.729a
• Common video (sound and image) compression formats: MPEG-1, MPEG-2, MPEG-4, DivX, QuickTime (MOV), RealVideo (RM), Windows Media Video (WMV), Video for Windows (AVI), Flash Video (FLV)
Modeling and Coding
A compression technique that works well for compressing text may not work well for compressing images.
The development of data compression algorithms for a variety of data is divided into two phases:
1. Modeling 2. Coding
Modeling:
• In this phase we try to extract information about any redundancy that exists in the data and describe the redundancy in the form of a model.
• A description of the model and a description of how the data differ from the model are encoded, generally using a binary alphabet.
• The difference between the data and the model is often referred to as the residual.
• We can obtain compression by transmitting or storing the parameters of the model and the residual sequence, as the sketch below illustrates.
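To make the model/residual idea concrete, here is a minimal Python sketch; the data values and the straight-line model are hypothetical, chosen only for illustration:

```python
# A minimal sketch of modeling + residual coding (hypothetical data).
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

# Model: x_hat[n] = n + 9, a simple straight-line fit to this sequence.
model = [n + 9 for n in range(len(data))]

# Residual: the part of the data the model does not capture.
residual = [x - m for x, m in zip(data, model)]
print(residual)  # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

# The residual takes values only in {-1, 0, 1}, so it needs far fewer
# bits per sample than the raw data; storing the model parameters plus
# the residual sequence therefore achieves compression.
```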
Modeling and Coding
Coding:
• In this phase we perform coding operations on the modeled data, according to the simplicity and quality required.
• Coding techniques used for data compression include Huffman coding, Shannon–Fano coding, and arithmetic coding.
Different Models
• Physical Models
• Probability Models
• Markov Models
• Composite Source Model
Different Coding
• Uniquely Decodable Codes
• Prefix Codes
• Huffman Codes
• Shannon–Fano Codes
Different Types of Models
1. Physical Model:
• If we know something about the physics of the data generation process, we can use that information to construct a model; such a model is called a physical model.
• E.g., in speech applications, knowledge about the physics of speech production is used to construct a mathematical model for the sampled speech process.
• Sampled speech is then encoded using this model.
• In practice, the physics of data generation is often too complicated to model, so physical models are rarely used.
Different Types of Models (contd.)
2. Probability Model:
• This model is used when we have little idea about the statistics of the source.
• In this model, a probability is assigned to each letter generated by the source.
• It assumes that each letter generated by the source is independent of every other letter, and so this model is called the “ignorance model”.
• It is useful for text and image compression; a small sketch follows.
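As an illustration (not on the original slide), a minimal Python sketch of such an ignorance model, estimating independent letter probabilities from a hypothetical text sample:

```python
from collections import Counter

text = "this is an example of a probability model"

# Ignorance model: each letter is modeled independently by its
# relative frequency, with no context taken into account.
counts = Counter(text)
probabilities = {ch: c / len(text) for ch, c in counts.items()}

print(probabilities["a"])  # relative frequency of the letter 'a'
```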
Different Types of Models (contd.)
3. Markov Model:
• In this model, the present sample depends on the previous samples.
• This model is useful for lossless data compression.
• A specific type of Markov process is used, called a ‘discrete-time Markov process’.
• Let {Xn} be the sequence of observations. It follows a kth-order Markov model if
P(Xn | Xn-1, Xn-2, …, Xn-k) = P(Xn | Xn-1, Xn-2, …, Xn-k, …)
• This means that knowledge of the past k symbols is equivalent to knowledge of the entire past history of the process.
• The values Xn-1, Xn-2, …, Xn-k are called the states of the process.
• The first-order Markov model is written as
P(Xn | Xn-1) = P(Xn | Xn-1, Xn-2, …)
• Markov models are more complex but give more accurate results; a first-order sketch follows.
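A minimal Python sketch (not from the slides) that estimates first-order Markov probabilities P(next letter | current letter) from a hypothetical text sample:

```python
from collections import Counter, defaultdict

def first_order_model(text):
    """Estimate P(next_char | current_char) from bigram counts."""
    pair_counts = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        pair_counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for cur, counts in pair_counts.items()
    }

# Hypothetical sample: in English text 'u' almost always follows 'q',
# so a first-order model can code 'u' very cheaply in that context.
model = first_order_model("the quick brown fox jumps over the lazy dog")
print(model["t"])  # {'h': 1.0} in this tiny sample
```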
Different Types of Models (contd.)
4. Composite Source Model:
• In many applications it is not easy to describe the source with a single model.
• In such cases a composite source model is useful.
• It uses a number of different sources Si, each with its own model Mi.
• A switch is used to select a particular source according to the requirement.
(Diagram: Source 1, Source 2, …, Source n feed a switch that selects which source is active at any given time.)
• The first phase is usually referred to as modeling.
• In this phase we try to extract information about any
redundancy that exists in the data and describe the
redundancy in the form of a model.
• The second phase is called coding. A description of
the model and a “description” of how the data differ
from the model are encoded, generally using a binary
alphabet.
• The difference between the data and the model is often referred to as the residual.
Coding
When we talk about coding we mean the assignment of binary
sequences to elements of an alphabet. The set of binary sequences
is called a code, and the individual members of the set are called
code words. An alphabet is a collection of symbols called letters.
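For example (an illustration not on the slide): for the alphabet {a, b, c}, the set {0, 10, 11} is a code, and 0, 10 and 11 are its code words, assigned to the letters a, b and c respectively.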
Shannon–Fano Coding
• In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).
• The technique was proposed in Shannon's "A Mathematical Theory of Communication", his 1948 article introducing the field of information theory.
• It is suboptimal in the sense that it does not always achieve the lowest possible expected codeword length, as Huffman coding does; however, unlike Huffman coding, it does guarantee that all codeword lengths are within one bit of their theoretical ideal, -log2 P(x).
Basic Technique
 In Shannon–Fano coding, the symbols are arranged in order
from most probable to least probable, and then divided into
two sets whose total probabilities are as close as possible to
being equal.
 All symbols then have the first digits of their codes assigned;
symbols in the first set receive "0" and symbols in the
second set receive "1".
 As long as any sets with more than one member remain, the
same process is repeated on those sets, to determine
successive digits of their codes.
 When a set has been reduced to one symbol, that symbol's code is complete and will not form the prefix of any other symbol's code. (A sketch of the procedure follows.)
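A minimal recursive sketch of this procedure in Python (one way to realize the splitting rule; the split point is chosen where the two partial sums are closest to equal):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs.
    Returns {symbol: codeword}."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in group)
        # Find the split point where the two sets are closest to equal.
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs((total - running) - running)
            if diff < best_diff:
                best_diff, best_i = diff, i
        split(group[:best_i], prefix + "0")   # upper set receives '0'
        split(group[best_i:], prefix + "1")   # lower set receives '1'

    split(sorted(symbols, key=lambda sp: -sp[1]), "")
    return codes

# The first worked example from the slides:
probs = [("x1", 0.4), ("x2", 0.19), ("x3", 0.16), ("x4", 0.15), ("x5", 0.1)]
print(shannon_fano(probs))
# {'x1': '00', 'x2': '01', 'x3': '10', 'x4': '110', 'x5': '111'}
```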
Message   Prob.   Step 1
x1        0.40    0
x2        0.19    0
x3        0.16    1
x4        0.15    1
x5        0.10    1

Step 1: x1 and x2 form the upper partition (0.40 + 0.19 = 0.59); x3, x4 and x5 form the lower partition (0.16 + 0.15 + 0.10 = 0.41). The two partitions are as close to equally probable as possible.
Message   Prob.   Step 1   Step 2
x1        0.40    0        0
x2        0.19    0        1
x3        0.16    1
x4        0.15    1
x5        0.10    1
Message   Prob.   Step 1   Step 2
x1        0.40    0        0
x2        0.19    0        1
x3        0.16    1        0
x4        0.15    1        1
x5        0.10    1        1
Message   Prob.   Step 1   Step 2   Step 3   Code   Code length
x1        0.40    0        0                 00     2
x2        0.19    0        1                 01     2
x3        0.16    1        0                 10     2
x4        0.15    1        1        0        110    3
x5        0.10    1        1        1        111    3
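As a check (this arithmetic is not on the original slide), the average code length is L = 0.4(2) + 0.19(2) + 0.16(2) + 0.15(3) + 0.1(3) = 2.25 bits/symbol, against a source entropy of about 2.15 bits/symbol.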
Message   Prob.   Step 1   Step 2   Step 3   Code   Code length
x1        0.30
x2        0.25
x3        0.15
x4        0.12
x5        0.10
x6        0.08
Message   Prob.   Step 1
x1        0.30    0
x2        0.25    0
x3        0.15    1
x4        0.12    1
x5        0.10    1
x6        0.08    1

Step 1: x1 and x2 form the upper partition (0.30 + 0.25 = 0.55); x3, x4, x5 and x6 form the lower partition (0.15 + 0.12 + 0.10 + 0.08 = 0.45).
Message   Prob.   Step 1   Step 2
x1        0.30    0        0
x2        0.25    0        1
x3        0.15    1
x4        0.12    1
x5        0.10    1
x6        0.08    1
Message   Prob.   Step 1   Step 2
x1        0.30    0        0
x2        0.25    0        1
x3        0.15    1        0
x4        0.12    1        0
x5        0.10    1        1
x6        0.08    1        1
Message   Prob.   Step 1   Step 2   Step 3
x1        0.30    0        0
x2        0.25    0        1
x3        0.15    1        0        0
x4        0.12    1        0        1
x5        0.10    1        1
x6        0.08    1        1
Message   Prob.   Step 1   Step 2   Step 3   Code   Code length
x1        0.30    0        0                 00     2
x2        0.25    0        1                 01     2
x3        0.15    1        0        0        100    3
x4        0.12    1        0        1        101    3
x5        0.10    1        1        0        110    3
x6        0.08    1        1        1        111    3
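As a check (not on the original slide), the average code length is L = 0.30(2) + 0.25(2) + 0.15(3) + 0.12(3) + 0.10(3) + 0.08(3) = 2.45 bits/symbol, against a source entropy of about 2.42 bits/symbol.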
The Huffman Coding Algorithm
This technique was developed by David Huffman as part of a
class assignment; the class was the first ever in the area of
information theory and was taught by Robert Fano at MIT.
The codes generated using this technique or procedure are
called Huffman codes. These codes are prefix codes and are
optimum for a given model (set of probabilities).
The Huffman procedure is based on two observations regarding
optimum prefix codes.
1. In an optimum code, symbols that occur more frequently (have a higher probability of occurrence) will have shorter codewords than symbols that occur less frequently.
2. In an optimum code, the two symbols that occur least frequently will have codewords of the same length.
A heap-based sketch of the procedure follows.
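A compact Python sketch of the Huffman procedure (a standard heap-based construction; it is one way to realize the algorithm, not necessarily the construction shown on the slides):

```python
import heapq
import itertools

def huffman(probs):
    """probs: dict {symbol: probability}. Returns {symbol: codeword}."""
    tie = itertools.count()  # tie-breaker so heap entries always compare
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)  # two least probable nodes
        p2, _, codes2 = heapq.heappop(heap)
        # Prepend 0 to one subtree's codewords and 1 to the other's.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

# The source from the design example below:
codes = huffman({"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1})
print(codes)  # an optimum prefix code; average length 2.2 bits/symbol
```

Note that ties in probability can be broken in more than one way, so different runs of the procedure can produce different, equally optimal codes (this is the point of the minimum variance variant discussed later).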
Design of a Huffman code
Let us design a Huffman code for a source that puts out letters from an alphabet A = {a1, a2, a3, a4, a5} with P(a1) = P(a3) = 0.2, P(a2) = 0.4, and P(a4) = P(a5) = 0.1. The entropy of this source is 2.122 bits/symbol.
To design the Huffman code, we first sort the letters in descending order of probability, as shown in Table 3.1. Here c(ai) denotes the codeword for ai.
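Completing the arithmetic of the example (the construction tables and tree appear as figures on the original slides): one optimum code assigns codeword lengths 1, 2, 3, 4 and 4 to a2, a1, a3, a4 and a5 respectively, giving an average length of 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 2.2 bits/symbol. The redundancy, the difference between the average length and the entropy, is 2.2 - 2.122 = 0.078 bits/symbol.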
Minimum Variance Huffman Codes
When probabilities tie during the Huffman procedure, placing the newly combined letter as high as possible in the sorted list yields a code with the same average length but the minimum variance of codeword lengths. (The worked tables for this topic appeared as figures on the original slides.)