Chameli Devi Group Of Institutions 
Chameli Devi School Of Engineering 
Guided By:- 
Shadab Pasha 
Submitted By:- 
Arvind Carpenter
Contents 
1. Introduction 
2. Categorization of Compression 
3. Lossless Compression 
4. Run-length Encoding 
5. Huffman Coding 
6. Lempel Ziv (LZ) Encoding 
7. Lossy Compression 
8. Image Compression (JPEG) Encoding 
9. Video Compression (MPEG) Encoding 
10. Audio Compression (MP3) 
11. Conclusion 
12. References
Why Compression?
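• Video: 30 pictures (frames) per second 
• Each picture = 200,000 dots or pixels 
• 8 bits to represent each primary color 
• For RGB: 2^8 × 2^8 × 2^8 = 2^24 colors, so each pixel needs 3 × 8 = 24 bits 
• Bits required for a one-second movie = 30 × 200,000 × 24 = 144,000,000 bits 
• A two-hour movie requires 2 × 60 × 60 × 144,000,000 = 1,036,800,000,000 bits (about 130 GB)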
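The arithmetic above can be checked with a few lines of Python. The constant names are illustrative; the numbers are the ones on the slide (30 frames per second, 200,000 pixels, 8 bits per primary color).

```python
# Raw (uncompressed) size of the video described on the slide.
FRAMES_PER_SECOND = 30
PIXELS_PER_FRAME = 200_000
BITS_PER_PIXEL = 3 * 8            # 8 bits for each of R, G and B

bits_per_second = FRAMES_PER_SECOND * PIXELS_PER_FRAME * BITS_PER_PIXEL
two_hour_bits = 2 * 60 * 60 * bits_per_second

print(f"{bits_per_second:,} bits per second")
print(f"{two_hour_bits:,} bits for two hours (~{two_hour_bits / 8 / 1e9:.0f} GB)")
```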
Introduction 
• Compression is a way to reduce the number of bits needed to represent data while retaining its meaning. 
• It decreases storage space, transmission time, and cost. 
• The technique is to identify redundancy and eliminate it. 
• If a file contains only capital letters, we can encode all 26 letters using 5-bit codes (2^5 = 32 ≥ 26) instead of 8-bit ASCII codes. 
• If the file has n characters, the savings = (8n − 5n)/8n = 3/8 = 37.5%.
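A minimal sketch of the 5-bit idea, assuming a hypothetical mapping A = 0, ..., Z = 25 (the slide does not specify which 5-bit numbers are used). `pack_5bit` and `unpack_5bit` are illustrative helpers, not a standard API.

```python
def pack_5bit(text: str) -> tuple[int, int]:
    """Return (packed_integer, number_of_bits), 5 bits per capital letter."""
    value = 0
    for ch in text:
        code = ord(ch) - ord("A")          # hypothetical mapping: A=0 ... Z=25
        if not 0 <= code < 26:
            raise ValueError(f"not a capital letter: {ch!r}")
        value = (value << 5) | code        # append the next 5 bits
    return value, 5 * len(text)

def unpack_5bit(value: int, nbits: int) -> str:
    """Read the 5-bit groups back from most to least significant."""
    return "".join(chr(ord("A") + ((value >> shift) & 0b11111))
                   for shift in range(nbits - 5, -1, -5))

text = "HELLOWORLD"
packed, nbits = pack_5bit(text)
ascii_bits = 8 * len(text)
print(nbits, ascii_bits, (ascii_bits - nbits) / ascii_bits)   # 50 80 0.375
assert unpack_5bit(packed, nbits) == text
```

Passing the bit count to `unpack_5bit` keeps leading A's (code 0) from being lost.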
Categories of Compression
Lossless Compression 
In lossless data compression: 
o The integrity of the data is preserved. 
o The original data and the data after compression and 
decompression are exactly the same. 
o No data loss. 
o Redundant data is removed in compression and added 
during decompression. 
o Lossless compression methods are normally used 
when we cannot afford to lose any data.
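The defining property, that decompression returns exactly the original data, can be checked with Python's standard `zlib` module (DEFLATE, which combines LZ77-style matching with Huffman coding). The sample input is arbitrary.

```python
import zlib

original = b"Eerie eyes seen near lake. " * 20   # arbitrary, repetitive sample
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

assert restored == original        # lossless: every byte comes back unchanged
print(len(original), "->", len(compressed), "bytes")
```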
Run-length Encoding 
Run-length encoding is simple and lossless. 
Here is how it works:
Notice that there are 9 
pieces of fruit. 
We could store this information as is...
There is a much better way... 
Check it out!
Currently, to read the line of fruit aloud exactly as it appears, you would say: 
“Apple, apple, apple, pear, pear, banana, orange, orange, apple.” 
Kind of redundant...
To save space, we can 
“compress” the 
information...
Notice that some fruits appear 
several times in a row... 
Simplify:
Now, if we read this aloud, it is not 
so awkward: 
“Three apples, two pears, one banana, two oranges 
and one apple.” 
...And it saves space.
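Now to translate this into computer terms: 
a scan line contains runs of numbers... 
55556987444425555611111988888222222222 
...Using run-length encoding, each run becomes a (count, value) pair: 
(4,5) (1,6) (1,9) (1,8) (1,7) 
(4,4) (1,2) (4,5) (1,6) (5,1) 
(1,9) (5,8) (9,2) 
The 38 digits become 13 pairs (26 numbers). A run of length 1 takes two numbers instead of one, so RLE pays off only when the data has long runs.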
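A minimal Python sketch of the scheme, using the same (count, value) pair format as the slide. `rle_encode` and `rle_decode` are illustrative names; the run detection is done by the standard-library `itertools.groupby`.

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[int, str]]:
    """Collapse each run of identical symbols into a (count, symbol) pair."""
    return [(len(list(run)), symbol) for symbol, run in groupby(data)]

def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand every (count, symbol) pair back into the original run."""
    return "".join(symbol * count for count, symbol in pairs)

scan_line = "55556987444425555611111988888222222222"
pairs = rle_encode(scan_line)
print(" ".join(f"({count},{symbol})" for count, symbol in pairs))
assert rle_decode(pairs) == scan_line    # lossless round trip
```

Running it prints the same 13 pairs listed above, from (4,5) through (9,2).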
To sum it up, in Wikipedia's terms: 
Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.
Huffman Coding 
• Huffman coding is credited to David Albert Huffman. 
• Huffman coding is an entropy encoding algorithm used for lossless data compression. 
• Huffman coding is a method of storing strings of data as binary code in an efficient manner. 
• Huffman coding uses variable-length codes: each symbol in the data is converted into a binary code whose length depends on how often that symbol occurs, so frequent symbols get shorter codes. 
• A binary tree is used to decide which code to give each character.
The (Real) Basic Algorithm 
• Scan the text to be compressed and tally the occurrences of all characters. 
• Sort or prioritize the characters based on their number of occurrences in the text. 
• Build the Huffman code tree based on the prioritized list. 
• Perform a traversal of the tree to determine all code words. 
• Scan the text again and create a new file using the Huffman codes.
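The five steps above can be sketched in Python as follows. This is a minimal sketch, not the slides' own code: it assumes the standard-library `heapq` module as the priority queue and an `itertools.count` tie-breaker so the heap never compares two trees. Because ties can be broken differently, individual code words may differ from the code table later in this section, but every Huffman tree for the same frequencies gives the same total length.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman tree for `text` and return {char: code word}."""
    freq = Counter(text)                              # step 1: tally
    tiebreak = count()                                # keeps heap entries comparable
    # step 2: priority queue of (weight, tiebreak, tree); a leaf is a single char
    heap = [(w, next(tiebreak), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                                # only one distinct symbol
        return {heap[0][2]: "0"}
    # step 3: repeatedly merge the two lightest trees
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    # step 4: traverse the tree; left edge = 0, right edge = 1
    codes: dict[str, str] = {}
    def walk(node, prefix: str) -> None:
        if isinstance(node, str):                     # leaf reached: code word complete
            codes[node] = prefix
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

text = "Eerie eyes seen near lake."
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)         # step 5: rescan and encode
print(len(encoded), "bits vs", 8 * len(text), "bits of ASCII")   # 84 bits vs 208 bits of ASCII
```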
Building a Tree 
Scan the original text 
• Consider the following short text: 
Eerie eyes seen near lake. 
• Count up the occurrences of all characters in the text. 
CS 102
What characters are present? 
E e r i space y s n a l k . 
What is the frequency of each character in the text “Eerie eyes seen near lake.”? 
Char    Freq 
E       1 
e       8 
r       2 
i       1 
space   4 
y       1 
s       2 
n       2 
a       2 
l       1 
k       1 
.       1 
Building a Tree 
• The queue after inserting all nodes, ordered by frequency (null pointers are not shown): 
E 1, i 1, y 1, l 1, k 1, . 1, r 2, s 2, n 2, a 2, sp 4, e 8 
• The two nodes with the lowest frequencies are repeatedly removed, joined under a new node whose frequency is their sum, and the new node is put back in the queue. The figures show these merges in order: 
1. E (1) + i (1) → 2 
2. y (1) + l (1) → 2 
3. k (1) + . (1) → 2 
4. r (2) + s (2) → 4 
5. n (2) + a (2) → 4 
6. [E, i] (2) + [y, l] (2) → 4 
7. [k, .] (2) + sp (4) → 6 
What is happening to the characters with a low number of occurrences? They are merged first, so they end up deepest in the tree and get the longest code words.
8. [r, s] (4) + [n, a] (4) → 8 
9. [E, i, y, l] (4) + [k, ., sp] (6) → 10 
10. e (8) + [r, s, n, a] (8) → 16 
11. (10) + (16) → 26 
After enqueueing this node there is only one node left in the priority queue: the root of the finished tree, whose frequency (26) is the number of characters in the text. 
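The merge weights above also give the size of the encoded text directly: every character's frequency is counted once for each new node above it, i.e. once per bit of its code word, so the total number of encoded bits equals the sum of all the merged weights. A small sketch (the helper name `merge_weights` is illustrative) reproduces the sequence with the standard-library `heapq`:

```python
import heapq
from collections import Counter

def merge_weights(text: str) -> list[int]:
    """Weights of the new nodes created while building a Huffman tree, in order."""
    heap = list(Counter(text).values())
    heapq.heapify(heap)
    created = []
    while len(heap) > 1:
        parent = heapq.heappop(heap) + heapq.heappop(heap)   # two lightest nodes
        created.append(parent)
        heapq.heappush(heap, parent)
    return created

weights = merge_weights("Eerie eyes seen near lake.")
print(weights)         # [2, 2, 2, 4, 4, 4, 6, 8, 10, 16, 26]
print(sum(weights))    # 84 bits in the encoded text
```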
• Perform a traversal of the tree to obtain the new code words. 
• Going left is a 0; going right is a 1. 
• A code word is only complete when a leaf node is reached. 
[Figure: the finished Huffman tree. Root 26 = 10 (left) + 16 (right); 10 = [E, i, y, l] 4 + [k, ., sp] 6; 16 = e 8 + [r, s, n, a] 8.] 
Encoding the File: Traverse Tree for Codes 
Char Code 
E 0000 
i 0001 
y 0010 
l 0011 
k 0100 
. 0101 
space 011 
e 10 
r 1100 
s 1101 
n 1110 
a 1111 
Encoding the File 
• Rescan the text and encode the file using the new code words: 
Eerie eyes seen near lake. 
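0000101100000110011100010101101011 
11011010111001111101011111100011 
001111110100100101 
• Why is there no need for a separator character? Because no code word is a prefix of any other (every character sits at a leaf of the tree), the decoder always knows where one code word ends and the next begins.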
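The code table can be used directly to encode the sentence and decode it again without separators. This is a sketch: `CODES`, `encode` and `decode` are illustrative names, and the table is copied from the slide. The decoder emits a character as soon as its buffer matches a code word, which is only safe because the code is prefix-free.

```python
# Code table from the slide (prefix-free: no code word starts another).
CODES = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011", "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}

def encode(text: str) -> str:
    return "".join(CODES[ch] for ch in text)

def decode(bits: str) -> str:
    """Read bits left to right; emit a char as soon as the buffer is a code word."""
    lookup = {code: ch for ch, code in CODES.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:          # a leaf is reached: the code word is complete
            out.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)

bits = encode("Eerie eyes seen near lake.")
print(len(bits))                      # 84
assert decode(bits) == "Eerie eyes seen near lake."
```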
Encoding the File: Results 
• Have we made things any better? 
• 84 bits to encode the text 
• ASCII would take 8 × 26 = 208 bits, so the Huffman-coded text is about 40% of the ASCII size.
Lempel-Ziv (LZ) Encoding 
• Until the late 1970s, data compression was mainly directed towards creating better methodologies for Huffman coding. 
• An innovative, radically different method was introduced in 1977 by Abraham Lempel and Jacob Ziv. 
• This technique (called Lempel-Ziv) actually consists of two considerably different algorithms, LZ77 and LZ78. 
• Due to patents, LZ77 and LZ78 led to many variants: 
o LZ77 variants: LZR, LZSS, LZB, LZH 
o LZ78 variants: LZW, LZC, LZT, LZMW, LZJ, LZFG 
• The zip and unzip programs use the LZH technique, while UNIX's compress methods belong to the LZW and LZC classes.
EXAMPLE: LZ78 COMPRESSION 
Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ78 algorithm. 
The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B) 
Each pair is (dictionary index of the longest match, next character), where index 0 stands for the empty string. 
Note: The above is just a representation; the commas and parentheses are not transmitted. We will discuss the actual form of the compressed message later on in slide 12.
EXAMPLE: LZ78 COMPRESSION (CONT’D) 
1. A is not in the dictionary; insert it as entry 1. Output (0,A). 
2. B is not in the dictionary; insert it as entry 2. Output (0,B). 
3. B is in the dictionary (entry 2). 
BC is not in the dictionary; insert it as entry 3. Output (2,C). 
4. B is in the dictionary. 
BC is in the dictionary (entry 3). 
BCA is not in the dictionary; insert it as entry 4. Output (3,A). 
5. B is in the dictionary (entry 2). 
BA is not in the dictionary; insert it as entry 5. Output (2,A). 
6. B is in the dictionary. 
BC is in the dictionary. 
BCA is in the dictionary (entry 4). 
BCAA is not in the dictionary; insert it as entry 6. Output (4,A). 
7. B is in the dictionary. 
BC is in the dictionary. 
BCA is in the dictionary. 
BCAA is in the dictionary (entry 6). 
BCAAB is not in the dictionary; insert it as entry 7. Output (6,B).
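The dictionary-building steps above can be sketched in Python. This is a minimal illustration using a plain `dict` from phrase to entry number; `lz78_encode` and `lz78_decode` are hypothetical names, and the pairs are kept as tuples rather than the bit-level form actually transmitted. If the input ends while the current phrase is already in the dictionary, the sketch emits a final pair with an empty character.

```python
def lz78_encode(data: str) -> list[tuple[int, str]]:
    """Emit (index of longest known prefix, next char), then learn prefix + char."""
    dictionary: dict[str, int] = {}        # phrase -> entry number (0 = empty phrase)
    output = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                   # keep extending the match
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:                             # input ended inside a known phrase
        output.append((dictionary[phrase], ""))
    return output

def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild each phrase from an earlier entry plus one character."""
    entries = [""]                         # entry 0 is the empty phrase
    for index, ch in pairs:
        entries.append(entries[index] + ch)
    return "".join(entries[1:])

pairs = lz78_encode("ABBCBCABABCAABCAAB")
print(pairs)   # [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A'), (4, 'A'), (6, 'B')]
assert lz78_decode(pairs) == "ABBCBCABABCAABCAAB"
```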
Lossy Compression Methods 
• Used for compressing images and video files (our eyes cannot distinguish subtle changes, so losing some data is acceptable). 
• These methods are cheaper: they take less time and space. 
• Several methods: 
o JPEG: compress pictures and graphics 
o MPEG: compress video 
o MP3: compress audio
JPEG Encoding 
• Used to compress pictures and graphics. 
• In JPEG, a grayscale picture is divided into 8×8 pixel blocks to decrease the number of calculations. 
• Basic idea: 
o Change the picture into a linear (vector) set of numbers that reveals the redundancies. 
o The redundancies are then removed by one of the lossless compression methods.
JPEG Encoding - DCT 
DCT: Discrete Cosine Transform 
The DCT transforms the 64 values in an 8×8 pixel block in a way that keeps the relative relationships between pixels but reveals the redundancies. 
• Example: a gradient grayscale image
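A direct (slow but readable) sketch of the 8×8 DCT in pure Python, assuming the scaling used in JPEG, T(u,v) = ¼·C(u)·C(v)·Σx Σy P(x,y)·cos((2x+1)uπ/16)·cos((2y+1)vπ/16) with C(0) = 1/√2 and C(k) = 1 otherwise. It skips the level shift by 128 that baseline JPEG applies first. `dct_8x8` is an illustrative name. A uniform block puts all of its information into T(0,0), and a left-to-right gradient only fills the first row of the table: that concentration of values is the redundancy the DCT reveals.

```python
import math

def dct_8x8(block: list[list[float]]) -> list[list[float]]:
    """2-D DCT-II of an 8x8 block with JPEG-style scaling (no level shift)."""
    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0
    table = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            table[u][v] = 0.25 * c(u) * c(v) * total
    return table

flat = [[100] * 8 for _ in range(8)]                   # uniform gray block
T = dct_8x8(flat)
print(round(T[0][0], 6))                               # 800.0; every other value is ~0

gradient = [[10 * y for y in range(8)] for _ in range(8)]   # brightens left to right
G = dct_8x8(gradient)                                  # only row u = 0 is non-zero
```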
Quantization & Compression 
• Quantization: 
o After the T table (the DCT output) is created, its values are quantized to reduce the number of bits needed for encoding. 
o Quantization divides each value in the table by a constant, then drops the fraction. The constants are chosen to balance the number of bits against the number of 0s for each particular application. 
• Compression: 
o The quantized values are read from the table and redundant 0s are removed. 
o To cluster the 0s together, the table is read diagonally in a zigzag fashion. The reason is that if the picture does not have fine changes, the bottom-right corner of the table is all 0s. 
o JPEG usually uses lossless run-length encoding in the compression phase.
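The quantize, zigzag and run-length steps can be sketched together. Assumptions, loudly: a single constant `step` divides every value (real JPEG uses an 8×8 table of constants and rounds instead of truncating), the input table is hypothetical, and `zigzag_order`, `quantize` and `run_length` are illustrative names. The zigzag walks the anti-diagonals, alternating direction, which matches the JPEG scan order (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions in zigzag order: walk the anti-diagonals,
    alternating direction so each diagonal starts where the last one ended."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else -p[0]))

def quantize(table, step: int):
    """Divide every value by the constant `step` and drop the fraction."""
    return [[int(v / step) for v in row] for row in table]

def run_length(values):
    """(count, value) pairs, as in the run-length encoding section."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][1] == v:
            pairs[-1] = (pairs[-1][0] + 1, v)
        else:
            pairs.append((1, v))
    return pairs

# Hypothetical DCT output: only a few low-frequency values are non-zero.
T = [[0.0] * 8 for _ in range(8)]
T[0][0], T[0][1], T[1][0] = 800.0, -45.0, 30.0
Q = quantize(T, step=10)
sequence = [Q[r][c] for r, c in zigzag_order()]
print(run_length(sequence))   # [(1, 80), (1, -4), (1, 3), (61, 0)]
```

Reading in zigzag order leaves all 61 zeros in a single run, which run-length encoding stores as one pair.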
JPEG Encoding
MPEG Encoding 
• Used to compress video. 
• Basic idea: 
o A video is a rapid sequence of frames. Each frame is a spatial combination of pixels, i.e. a picture. 
o Compressing video = spatially compressing each frame + temporally compressing a set of frames.
MPEG Encoding 
• Spatial compression 
• Each frame is spatially compressed with JPEG (or a JPEG-like method). 
• Temporal compression 
• Redundancy between consecutive frames is removed: only what changes from one frame to the next needs to be encoded. 
• For example, in a static scene in which someone is talking, most frames are the same except for the segment around the speaker’s lips, which changes from one frame to the next.
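The idea behind temporal compression can be illustrated with a toy frame difference: store only the pixels that changed since the previous frame. This is a simplification, not how MPEG works in detail (real encoders predict whole blocks using motion compensation); the frames and the helpers `frame_difference` and `apply_difference` are hypothetical.

```python
def frame_difference(prev, curr):
    """Positions whose pixel value changed between two frames, with the new value."""
    return {(r, c): curr[r][c]
            for r in range(len(curr)) for c in range(len(curr[0]))
            if curr[r][c] != prev[r][c]}

def apply_difference(prev, changes):
    """Rebuild the current frame from the previous frame plus the stored changes."""
    frame = [row[:] for row in prev]       # copy, so `prev` is left untouched
    for (r, c), value in changes.items():
        frame[r][c] = value
    return frame

# Hypothetical 6x8 grayscale frames: a static scene where only a small region changes.
frame1 = [[50] * 8 for _ in range(6)]
frame2 = [row[:] for row in frame1]
frame2[4][3], frame2[4][4] = 90, 95
changes = frame_difference(frame1, frame2)
print(len(changes), "of", 6 * 8, "pixels need to be stored")   # 2 of 48
assert apply_difference(frame1, changes) == frame2
```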
Audio Compression 
Used for speech or music 
• Speech: compress a 64 kbps digitized signal 
• Music: compress a 1.411 Mbps signal 
Two categories of techniques: 
• Predictive encoding 
• Perceptual encoding
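Both figures are uncompressed PCM bit rates (sample rate × bits per sample × channels). The parameters below are assumptions based on the usual formats: telephone-quality speech at 8,000 samples per second with 8 bits per sample, and CD audio at 44,100 samples per second, 16 bits, stereo. `pcm_bit_rate` is an illustrative helper.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

speech = pcm_bit_rate(8_000, 8)           # telephone-quality speech
music = pcm_bit_rate(44_100, 16, 2)       # CD audio: 16-bit stereo
print(speech, music)                      # 64000 1411200
```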
Audio Encoding 
• Predictive encoding 
o Only the differences between samples are encoded, not the whole sample values. 
o Several standards: GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3 kbps) 
• Perceptual encoding: MP3 
o CD-quality audio needs 1.411 Mbps uncompressed, which is impractical to send over the Internet without compression. 
o MP3 (MPEG audio layer 3) uses a perceptual encoding technique to compress audio.
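The core of predictive encoding, sending differences instead of full sample values, can be sketched as simple (lossless) delta encoding. The speech standards listed above use far more elaborate prediction models; this only shows the basic idea, and the sample values and helper names are hypothetical.

```python
def delta_encode(samples: list[int]) -> list[int]:
    """Keep the first sample, then only the difference from the previous sample."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    """A running sum of the differences restores the original samples."""
    samples, total = [], 0
    for d in deltas:
        total += d
        samples.append(total)
    return samples

# Hypothetical slowly varying samples: the differences are small numbers,
# which need fewer bits than the full sample values.
samples = [1000, 1004, 1009, 1011, 1010, 1006]
deltas = delta_encode(samples)
print(deltas)        # [1000, 4, 5, 2, -1, -4]
assert delta_decode(deltas) == samples
```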
Conclusion 
Compression is used for all types of data to save space and time. There are two types of data compression: lossy and lossless. Lossy techniques are used for images, video and audio, where we can tolerate some data loss. Lossless techniques are used for textual data, which can be encoded with run-length encoding, Huffman coding or Lempel-Ziv encoding.
References 
• http://www.csie.kuas.edu.tw/course/cs/english/ch-15.ppt 
• CS157B Lecture 19 by Professor Lee, http://cs.sjsu.edu/~lee/cs157b/cs157b.html 
• “The Essentials of Computer Organization and Architecture” by Linda Null and Julia Lobur 
• Wikipedia, http://www.wikipedia.org
Thank 
You
Data Compression 
Questions
