Compression Fundamentals
Topics today…
- Why Compression?
- Information Theory Basics
- Classification of Compression Algorithms
- Data Compression Model
- Compression Performance
Why Compression?
- Digital representation of analog signals requires huge storage.
- A high-quality audio signal requires about 1.5 megabits/sec.
- A low-resolution movie (30 frames per second, 640 x 480 pixels per frame, 24 bits per pixel) requires roughly 210 megabits per second, i.e., about 95 gigabytes per hour!
- Transferring such files through the limited bandwidth of available networks is challenging.
Why Compression? Table 1: Uncompressed source data rates

Source                        | Bit rate for uncompressed source (approximate)
Telephony (200-3400 Hz)       | 8000 samples/sec x 12 bits/sample = 96 kbps
Wideband audio (20-20000 Hz)  | 44100 samples/sec x 2 channels x 16 bits/sample = 1.412 Mbps
Images                        | 512x512 pixel color image x 24 bits/pixel = 6.3 Mbits/image
Video                         | 640x480 pixel color image x 24 bits/pixel x 30 images/sec = 221 Mbps (a 650-megabyte CD stores only about 23.5 seconds of such video!)
HDTV                          | 1280x720 pixel color image x 60 images/sec x 24 bits/pixel = 1.3 Gbps
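The table's arithmetic is easy to check. A minimal Python sketch (the helper name and the 650 MB CD figure are illustrative assumptions, not from any standard library):

```python
def uncompressed_rate_bps(samples_per_sec, bits_per_sample, channels=1):
    """Raw bit rate of an uncompressed sampled source, in bits/second."""
    return samples_per_sec * bits_per_sample * channels

print(uncompressed_rate_bps(8000, 12) / 1e3, "kbps")               # telephony: 96.0
print(uncompressed_rate_bps(44100, 16, channels=2) / 1e6, "Mbps")  # wideband audio: 1.4112

video_bps = uncompressed_rate_bps(640 * 480 * 30, 24)              # pixels/sec x bits/pixel
print(video_bps / 1e6, "Mbps")                                     # video: ~221
print(650 * 8e6 / video_bps, "seconds on a 650 MB CD")             # ~23.5
```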
The compression problem
- Efficient digital representation of a source.
- Data compression is the representation of the source in digital form with as few bits as possible while maintaining an acceptable loss in fidelity.
- The source can be data, still images, speech, audio, video, or whatever signal needs to be stored and transmitted.
Synonyms for Data Compression
- Signal compression, signal coding
- Source coding, and source coding with a fidelity criterion (in information theory)
- Noiseless and noisy source coding (lossless and lossy compression); "noise" here refers to reconstruction noise
- Bandwidth compression, redundancy removal (more dated terminology from the 1980s)
Types of Data Compression Problem
- Distortion-rate problem: given a constraint on transmitted data rate or storage capacity, compress the source at or below this rate but at the highest fidelity possible. Examples: voice mail, video conferencing, digital cellular.
- Rate-distortion problem: given a constraint on fidelity, achieve it with as few bits as possible. Example: CD-quality audio.
Information Theory Basics
- The representation of data is the combination of information and redundancy.
- Data compression is essentially a redundancy-reduction technique.
- A data compression scheme can be broadly divided into two phases: modeling and coding.
Information Theory Basics [2]
- In the modeling phase, information about redundancy is analyzed and represented as a model. This can be done by observing the empirical distribution of the symbols the source generates.
- In the coding phase, the difference between the actual data and the model is coded.
Discrete Memoryless Model
- A source is discrete memoryless if it generates symbols that are statistically independent of one another.
- It is described by the source alphabet A = {a_1, a_2, a_3, ..., a_n} and the associated probabilities P = (p(a_1), p(a_2), p(a_3), ..., p(a_n)).
- The amount of information content for a source symbol a_i is
  I(a_i) = log2 (1 / p(a_i)) = -log2 p(a_i)
- The base-2 logarithm means the information content is expressed in bits; higher-probability symbols are coded with fewer bits.
Discrete Memoryless Model [2]
- Averaging the information content over all symbols, we get the entropy E as follows:
  E = sum_i p(a_i) I(a_i) = -sum_i p(a_i) log2 p(a_i)  bits per symbol
- Hence, entropy is the expected length of a binary code over all the symbols.
- Estimation of the entropy depends on the observations and the assumptions made about the structure of the source symbols.
Noiseless source coding theorem The  Noiseless Source Coding Theorem  states that any source can be losslessly encoded with a code whose average number of bits per source symbol is arbitrarily close to, but not less than, the source entropy  E  in bits by coding infinitely long extensions of the source.
Entropy Reduction
- Consider a discrete memoryless source with source alphabet A1 = {α, β, γ, δ} and probabilities p(α) = 0.65, p(β) = 0.20, p(γ) = 0.10, p(δ) = 0.05.
- The entropy of this source is
  E = -(0.65 log2 0.65 + 0.20 log2 0.20 + 0.10 log2 0.10 + 0.05 log2 0.05) = 1.42 bits per symbol
- A data source of 2000 such symbols can therefore be represented using 2000 x 1.42 = 2840 bits.
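This entropy value is easy to verify numerically. A minimal Python sketch (the slide's 2840 bits comes from rounding E to 1.42 before multiplying):

```python
import math

def entropy(probs):
    """Shannon entropy in bits/symbol: E = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

A1 = {"alpha": 0.65, "beta": 0.20, "gamma": 0.10, "delta": 0.05}
E = entropy(A1.values())
print(f"E = {E:.4f} bits/symbol")                   # E = 1.4165
print(f"2000 symbols need >= {2000 * E:.0f} bits")  # ~2833 bits
```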
Entropy Reduction [2]
- Now assume we know something about the structure of the sequence.
- Alphabet A2 = {0, 1, 2, 3}, sequence D = 0 1 1 2 3 3 3 3 3 3 3 3 3 2 2 2 3 3 3 3
- p(0) = 0.05, p(1) = 0.10, p(2) = 0.20, and p(3) = 0.65, so E = 1.42 bits per symbol, as before.
- To exploit the correlation between consecutive samples, we attempt to reduce it by taking the difference r_i = s_i - s_{i-1} for each sample s_i.
Entropy Reduction [3]
- Now D = 0 1 0 1 1 0 0 0 0 0 0 0 0 -1 0 0 1 0 0 0 over the alphabet A2 = {-1, 0, 1}
- p(-1) = 0.05, p(1) = 0.20, and p(0) = 0.75, giving E = 0.992 bits per symbol
- With an appropriate entropy coding technique, this lower entropy translates into more compression.
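A minimal sketch of this differencing step, computing the empirical entropy before and after (sequence and probabilities taken from the slides):

```python
import math
from collections import Counter

def empirical_entropy(seq):
    """Entropy in bits/symbol estimated from observed frequencies."""
    counts, n = Counter(seq), len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

D = [0, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 3]
# Differencing r_i = s_i - s_{i-1}; the first sample is kept as-is.
R = [D[0]] + [b - a for a, b in zip(D, D[1:])]

print(f"H(D) = {empirical_entropy(D):.3f} bits/symbol")  # 1.417
print(f"H(R) = {empirical_entropy(R):.3f} bits/symbol")  # 0.992
```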
Unique Decipherability
- Consider the following table of candidate codes (the original codeword table was an image lost in export; the codewords shown here are a representative set chosen so the properties discussed next hold):

  Symbol | Code A | Code B | Code C
  α      | 00     | 0      | 0
  β      | 01     | 10     | 1
  γ      | 10     | 110    | 01
  δ      | 11     | 111    | 11

- Symbols are encoded with codes A, B, and C. Consider the string S = ααγαβαδ.
Unique Decipherability [2]
- Deciphering C_A(S) and C_B(S) is unambiguous, and we recover the string S.
- C_C(S) is ambiguous and thus not uniquely decipherable.
- Fixed-length codes are always uniquely decipherable; not all variable-length codes are.
Unique Decipherability [3]
- A sufficient condition for unique decipherability is the prefix property: no codeword in the code set forms the prefix of another, distinct codeword (a minimal check is sketched below).
- Popular variable-length coding techniques: Shannon-Fano coding, Huffman coding, Elias coding, arithmetic coding.
- Fixed-length codes can be treated as a special case of uniquely decipherable variable-length codes.
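A small Python sketch of the prefix-property check, using the illustrative codes from the table above:

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another distinct codeword."""
    words = sorted(codewords)  # after sorting, any prefix violation appears between adjacent words
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

code_A = ["00", "01", "10", "11"]   # fixed-length: trivially prefix-free
code_B = ["0", "10", "110", "111"]  # variable-length, prefix-free
code_C = ["0", "1", "01", "11"]     # "0" is a prefix of "01": not uniquely decipherable

for name, code in [("A", code_A), ("B", code_B), ("C", code_C)]:
    print(name, is_prefix_free(code))  # A True, B True, C False
```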
Classification of compression algorithms
[Diagram from the slide: input data D -> compression (coder) -> c(D) -> storage/transmission -> decompression (decoder) -> reconstructed data D'; the coder and decoder together form a CODEC.]
Classification of compression algorithms [2]
- Data compression is a method that takes input data D and generates a shorter representation of the data, c(D), with fewer bits than D.
- The reverse process is called decompression; it takes the compressed data c(D) and generates or reconstructs the data D'.
- The compression (coding) and decompression (decoding) systems together are sometimes called a "CODEC."
Classification of compression algorithms [3]
- If the reconstructed data D' is an exact replica of the original data D, we call the algorithms applied to compress D and decompress c(D) lossless; otherwise the algorithms are lossy.
- Text, scientific data, and medical images are some of the applications that require lossless compression.
- Compression can be static or dynamic, depending on the coding scheme used. (A lossless round trip is sketched below.)
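The lossless round trip D -> c(D) -> D' can be demonstrated with zlib from Python's standard library (the input string is chosen arbitrarily):

```python
import zlib

D = b"abracadabra " * 100        # highly redundant input
cD = zlib.compress(D, level=9)   # compression: c(D)
D_prime = zlib.decompress(cD)    # decompression: reconstruct D'

assert D_prime == D              # lossless: D' is an exact replica of D
print(f"{len(D)} bytes -> {len(cD)} bytes")
```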
Data compression model
A data compression system mainly consists of three major steps:
- removal or reduction of data redundancy
- reduction in entropy
- entropy encoding
Data compression model: REDUCTION IN DATA REDUNDANCY
- Removal or reduction of data redundancy is typically achieved by transforming the original data from one form or representation to another.
- Popular transformation techniques are the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT), etc.
- This step leads to the reduction of entropy.
- For lossless compression, this transformation is completely reversible.
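A hedged sketch of why such transforms help, using SciPy's DCT-II on a correlated signal (the signal and sizes are arbitrary): the transform packs most of the signal energy into a few coefficients, which the later quantization and entropy-coding steps can exploit.

```python
import numpy as np
from scipy.fft import dct

n = 64
# A smooth, correlated signal, as in a typical image row or audio frame
x = np.cos(2 * np.pi * np.arange(n) / n) \
    + 0.1 * np.random.default_rng(0).standard_normal(n)

X = dct(x, norm="ortho")  # orthonormal DCT-II of the signal

# Energy compaction: a handful of coefficients carry almost all the energy
sorted_energy = np.sort(X ** 2)[::-1]
share = sorted_energy[:8].sum() / sorted_energy.sum()
print(f"Top 8 of {n} DCT coefficients hold {share:.1%} of the energy")
```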
Data compression model: REDUCTION IN ENTROPY
- A non-reversible process, achieved by dropping insignificant information in the transformed data (lossy!).
- It is done by quantization techniques; the amount of quantization dictates the quality of the reconstructed data.
- The entropy of the quantized data is lower than that of the original, hence more compression. (A uniform-quantizer sketch follows.)
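A minimal sketch of uniform quantization, the simplest such technique (the step size is chosen arbitrarily); the reconstruction error is bounded by half the step size:

```python
import numpy as np

def quantize(x, step):
    """Uniform quantization: map each value to the nearest multiple of `step`."""
    return np.round(x / step).astype(int)

def dequantize(q, step):
    return q * step

x = np.array([0.93, -0.11, 2.47, 0.02, -1.58])
q = quantize(x, step=0.5)        # [ 2  0  5  0 -3]: small ints, lower entropy
x_hat = dequantize(q, step=0.5)  # [ 1.0  0.0  2.5  0.0 -1.5]
print("max error:", np.max(np.abs(x - x_hat)))  # <= step/2 = 0.25
```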
Compression Performance
The performance measures of data compression algorithms can be looked at from different perspectives, depending on the application requirements:
- amount of compression achieved
- objective and subjective quality of the reconstructed data
- relative complexity of the algorithm
- speed of execution, etc.
Compression Performance: AMOUNT OF COMPRESSION ACHIEVED
- Compression ratio: the ratio of the number of bits needed to represent the original data to the number of bits needed to represent the compressed data.
- The compression ratio achievable by a lossless compression scheme is totally input-data dependent: sources with less redundancy have higher entropy and are therefore harder to compress (demonstrated below).
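This input dependence is easy to see with zlib (the data sizes are arbitrary): a highly redundant source compresses well, while high-entropy random bytes do not:

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size."""
    return len(data) / len(zlib.compress(data, level=9))

redundant = b"ab" * 5000          # low entropy: very redundant
random_bytes = os.urandom(10000)  # high entropy: essentially no redundancy

print(f"redundant:    {compression_ratio(redundant):.1f}:1")     # large ratio
print(f"random bytes: {compression_ratio(random_bytes):.2f}:1")  # ~1:1 or slightly worse
```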
Compression Performance: SUBJECTIVE QUALITY METRIC
- MOS, the mean observers score or mean opinion score, is a common measure.
- A statistically significant number of observers is randomly chosen to evaluate the visual quality of the reconstructed images.
- Each observer assigns a numeric score to each reconstructed image based on his or her perception of its quality, say within a range 1-5, with 5 being the highest quality and 1 the worst.
- The MOS is the average of these scores.
Compression Performance: OBJECTIVE QUALITY METRIC
- Common quality metrics are root-mean-squared error (RMSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR).
- If I is an M x N image and I' is the corresponding reconstructed image after compression and decompression, the RMSE is calculated by
  RMSE = sqrt( (1 / (M*N)) * sum over i,j of (I(i,j) - I'(i,j))^2 )
- The SNR in decibel units (dB) is expressed as
  SNR = 20 log10 ( sqrt( (1 / (M*N)) * sum over i,j of I(i,j)^2 ) / RMSE )
  and, for B-bit images, PSNR = 20 log10 ( (2^B - 1) / RMSE ).
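A short NumPy sketch of these metrics (the test images here are synthetic placeholders, not real compression output):

```python
import numpy as np

def rmse(I, I_rec):
    return np.sqrt(np.mean((I.astype(float) - I_rec.astype(float)) ** 2))

def psnr(I, I_rec, peak=255.0):
    """PSNR in dB for images whose maximum pixel value is `peak`."""
    return 20 * np.log10(peak / rmse(I, I_rec))

rng = np.random.default_rng(0)
I = rng.integers(0, 256, size=(512, 512))                 # stand-in "original" image
I_rec = np.clip(I + rng.normal(0, 2.0, I.shape), 0, 255)  # mildly distorted copy

print(f"RMSE = {rmse(I, I_rec):.2f}, PSNR = {psnr(I, I_rec):.1f} dB")
```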
Compression Performance: CODING DELAY AND COMPLEXITY
- Coding delay is a performance measure for compression algorithms where interactive encoding and decoding is the requirement (e.g., interactive video teleconferencing, online image browsing, real-time voice communication).
- The more complex the compression algorithm, the greater the coding delay; compression system designers therefore often use a less sophisticated algorithm for such systems.
Compression Performance: CODING DELAY AND COMPLEXITY [2]
- Coding complexity is a performance measure considered where the computational requirement to implement the codec is an important criterion.
- MOPS (millions of operations per second) and MIPS (millions of instructions per second) are often used to measure compression performance on a specific computing engine's architecture.
References
- Chapter 1 of JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures by Tinku Acharya and Ping-Sing Tsai, John Wiley & Sons
- http://discovery.bits-pilani.ac.in/discipline/csis/vimal/course%2006-07%20Second/MMC/Lectures/cf.doc
- Chapter 1 of Digital Compression for Multimedia: Principles & Standards by Jerry D. Gibson
