Damian Gordon
• Rather than have to store every
character in a file, it would be great if
we could find a way of reducing the
length of the file to allow it to be
stored in a smaller space.
• This is the File Manager’s job.
• Also, rather than have to send every
character in a message, it would be
great if we could find a way of
reducing the length of the message to
allow it to be transmitted more quickly.
• This is the Network Manager’s job.
• Whether it is the File Manager or the
Network Manager, both take a similar
approach to compression.
• Let’s look at an example.
• Let’s imagine we had to send the
following message down a wire, exactly as it is:
The rain in Spain lies mainly in the plain
• That is a total of 42 characters (including
8 spaces).
• Let’s replace the word “the” with the
number 1.
• We’ve reduced the number of characters to 38.
1 rain in Spain lies mainly in 1 plain
the = 1
• Let’s replace the letters “ain” with the
number 2.
• We’ve reduced the number of characters to 30.
1 r2 in Sp2 lies m2ly in 1 pl2
the = 1
ain = 2
• Let’s replace the letters “in” with the
number 3.
• We’ve reduced the number of characters to 28.
1 r2 3 Sp2 lies m2ly 3 1 pl2
the = 1
ain = 2
in = 3
• Now let’s say 1 means “the ”, that is,
“the” followed by a space.
• We’ve reduced the number of characters to 26.
1r2 3 Sp2 lies m2ly 3 1pl2
“the ” = 1
ain = 2
in = 3
• Now let’s say 3 means “in ”, that is, “in”
followed by a space.
• We’ve reduced the number of characters to 24.
1r2 3Sp2 lies m2ly 31pl2
“the ” = 1
ain = 2
“in ” = 3
• So that’s 24 characters for a
42-character message (not counting the dictionary), not bad.
The rain in Spain lies mainly in the plain
1r2 3Sp2 lies m2ly 31pl2
“the ” = 1
ain = 2
“in ” = 3
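The substitution steps above can be sketched in Python. This is only an illustration under stated assumptions, not a general-purpose compressor: the names `DICTIONARY`, `encode` and `decode` are hypothetical, the message is assumed to contain no digits, and the leading “The” is written in lower case because the slides treat “The” and “the” as the same word.

```python
# Hypothetical dictionary taken from the worked example: phrase -> one-digit code.
# Order matters when encoding: "ain" has to be replaced before "in ".
DICTIONARY = [("the ", "1"), ("ain", "2"), ("in ", "3")]


def encode(message: str) -> str:
    """Replace each dictionary phrase with its code, in dictionary order."""
    for phrase, code in DICTIONARY:
        message = message.replace(phrase, code)
    return message


def decode(encoded: str) -> str:
    """Expand each code back into its phrase (assumes the text had no digits)."""
    for phrase, code in reversed(DICTIONARY):
        encoded = encoded.replace(code, phrase)
    return encoded


# Assumption: lower-case "the" at the start, as the slides ignore the capital T.
message = "the rain in Spain lies mainly in the plain"
packed = encode(message)
print(packed)                      # 1r2 3Sp2 lies m2ly 31pl2
print(len(message), len(packed))   # 42 24
```

The receiver needs the same dictionary to run `decode`, so in practice the dictionary either has to be agreed in advance or sent along with the message.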
• Let’s try a different example. Let’s say
we are sending a list of jobs, where each
item on the list is 10 characters long
(padding is shown here as “-”).
• Bookkeeper
• Teacher---
• Porter----
• Nurse-----
• Doctor----
• Rather than sending the spaces, we
could just say how many there are:
• Bookkeeper
• Teacher3-
• Porter4-
• Nurse5-
• Doctor4-
• We’ve gone from 50 to 42 characters.
PROGRAM CompressExample:
BEGIN
    Get Current Character;
    Set Counter to 1;
    WHILE (NOT End_of_Line)
    DO  Get Next Character;
        IF (Current Character == Next Character)
            THEN Keep counting;
            ELSE Write out Counter if it is more than 1;
                 Write out Current Character;
                 Set current to next;
                 Reset Counter to 1;
        ENDIF;
    ENDWHILE;
    Write out the last run (Counter if more than 1, then Current Character);
END.
PROGRAM CompressExample:
BEGIN
    char Current_Char, Next_Char;
    int Counter;
    Counter := 1;
    Current_Char := Get_char();
    WHILE (NOT End_of_Line)
    DO  Next_Char := Get_char();
        IF (Current_Char == Next_Char)
            THEN Counter := Counter + 1;
            ELSE IF (Counter > 1)
                     THEN Write out Counter;
                 ENDIF;
                 Write out Current_Char;
                 Current_Char := Next_Char;
                 Counter := 1;
        ENDIF;
    ENDWHILE;
    IF (Counter > 1)
        THEN Write out Counter;
    ENDIF;
    Write out Current_Char;
END.
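The pseudocode above could be written in Python roughly as follows. This is a sketch rather than a definitive implementation: the function names `rle_encode` and `rle_decode` are hypothetical, a run is written as its count followed by the character, single characters are written without a count, and the input is assumed to contain no digits (otherwise the decoder could not tell counts from data).

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Write each run of repeated characters as <count><char>; singles stay as-is."""
    parts = []
    for char, run in groupby(text):
        count = sum(1 for _ in run)
        parts.append(f"{count}{char}" if count > 1 else char)
    return "".join(parts)


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode (assumes the original text contained no digits)."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # collect multi-digit counts such as "12"
        else:
            out.append(ch * (int(count) if count else 1))
            count = ""
    return "".join(out)


print(rle_encode("Teacher---"))   # Teacher3-
print(rle_decode("Teacher3-"))    # Teacher---
```

Note that a general run encoder like this one also rewrites doubled letters: “Bookkeeper” becomes “B2o2k2eper”, which is still 10 characters, so nothing is gained for runs of only two.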
• Or let’s imagine we are sending a list
of house prices.
• 350000
• 600000
• 550000
• 2100000
• 3000000
• Now let’s use # followed by a count to
indicate the number of trailing zeros:
• 35#4
• 6#5
• 55#4
• 21#5
• 3#6
• We’ve gone from 32 characters to 18
characters.
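The house-price trick can be sketched in Python as below. The function names `encode_zeros` and `decode_zeros` are hypothetical, and the sketch assumes each price is a string of digits that never contains “#”.

```python
def encode_zeros(number: str) -> str:
    """Replace trailing zeros with '#' followed by how many there were."""
    digits = number.rstrip("0")
    zeros = len(number) - len(digits)
    return f"{digits}#{zeros}" if zeros else number


def decode_zeros(encoded: str) -> str:
    """Expand '#<n>' back into n zeros; values without '#' pass through unchanged."""
    digits, marker, zeros = encoded.partition("#")
    return (digits + "0" * int(zeros)) if marker else encoded


prices = ["350000", "600000", "550000", "2100000", "3000000"]
packed = [encode_zeros(p) for p in prices]
print(packed)   # ['35#4', '6#5', '55#4', '21#5', '3#6']
print(sum(len(p) for p in prices), sum(len(p) for p in packed))   # 32 18
```

The marker and count cost two characters, so this only pays off when a number ends in three or more zeros; a value such as “50” would actually grow to “5#1”.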
Simple Data Compression
• Let’s think about images.
• Let’s say we are trying to display the
letter ‘A’.
• We could encode this as:
• WWWBBWWW
• WWBWWBWW
• WBWWWWBW
• WBWWWWBW
• WBBBBBBW
• WBWWWWBW
• WBWWWWBW
• WWWWWWWW
• We could compress this to:
• 3W2B3W
• 2WB2WB2W
• WB4WBW
• WB4WBW
• W6BW
• WB4WBW
• WB4WBW
• 8W
• From 64 characters to 44 characters.
• We call this “run-length encoding”, or
RLE.
• Now let’s add one more rule.
• Let’s imagine that if we send the number
‘0’, it means repeat the previous line.
• So the run-length encoded lines above
now become:
• 3W2B3W
• 2WB2WB2W
• WB4WBW
• 0
• W6BW
• WB4WBW
• 0
• 8W
• Going from 64 to 44 to 34 characters.
• In many simple images, lines are
repeated frequently, so RLE combined with
the repeat-line rule can give large savings.
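Putting the two rules together for the letter ‘A’ might look like this in Python. It is a sketch under assumptions: the names `rle_row` and `compress_image` are hypothetical, each image line is a string of W and B characters, and ‘0’ is reserved to mean “same as the previous line”.

```python
from itertools import groupby


def rle_row(row: str) -> str:
    """Run-length encode one line: <count><char> for runs, plain char for singles."""
    parts = []
    for char, run in groupby(row):
        count = len(list(run))
        parts.append(f"{count}{char}" if count > 1 else char)
    return "".join(parts)


def compress_image(rows):
    """Encode each line with RLE, or send '0' when it repeats the previous line."""
    out = []
    previous = None
    for row in rows:
        out.append("0" if row == previous else rle_row(row))
        previous = row
    return out


letter_a = [
    "WWWBBWWW",
    "WWBWWBWW",
    "WBWWWWBW",
    "WBWWWWBW",
    "WBBBBBBW",
    "WBWWWWBW",
    "WBWWWWBW",
    "WWWWWWWW",
]
packed = compress_image(letter_a)
print(packed)
print(sum(len(r) for r in letter_a), sum(len(r) for r in packed))   # 64 34
```

Comparing against the previous uncompressed line (rather than the previous output) means three or more identical lines in a row would each be sent as a single ‘0’.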