UNIVERSITY OF ANBAR LECTURE NOTES ON INFORMATION THEORY FOR 4th CLASS STUDENTS
COLLEGE OF ENGINEERING BY: NASER AL FALAHI ELECTRICAL ENGINEERING
LZ Coding & Data compression
A drawback of the Huffman code is that it requires knowledge of a probabilistic
model of the source; unfortunately, in practice, source statistics are not always known
a priori, thereby compromising the efficiency of the code. To overcome these practical
limitations, we may use the Lempel-Ziv algorithm, which is intrinsically adaptive and
simpler to implement than Huffman coding.
Basically, encoding in the Lempel-Ziv algorithm is accomplished by parsing the source
data stream into segments that are the shortest subsequences not encountered
previously. To illustrate this simple yet elegant idea, consider the example of an input
binary sequence specified as follows:
000101110010100101 ...
It is assumed that the binary symbols 0 and 1 are already stored in that order in the
code book. We thus write
Subsequences stored: 0, 1
Data to be parsed: 000101110010100101 . . .
The encoding process begins at the left. With symbols 0 and 1 already stored, the
shortest subsequence of the data stream not seen before is 00; so we write
Subsequences stored: 0, 1, 00
Data to be parsed: 0101110010100101 . . .
The next shortest subsequence not seen before is 01; accordingly, we go on to write
Subsequences stored: 0, 1, 00, 01
Data to be parsed: 01110010100101 . . .
The next shortest subsequence not encountered previously is 011; hence, we write
Subsequences stored: 0, 1, 00, 01, 011
Data to be parsed: 10010100101 . . .
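We continue in this manner until the given data stream has been completely parsed.
Thus, for the example at hand, we get the code book of binary subsequences shown in
the second row below:
Numerical positions:     1     2     3     4     5     6     7     8     9
Subsequences:            0     1     00    01    011   10    010   100   101
Binary encoded blocks:               0010  0011  1001  0100  1000  1100  1101
Each binary encoded block consists of a 3-bit pointer to the position of the root
subsequence, followed by the innovation symbol.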
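The parsing steps above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation; the function name lz_parse and its initial parameter are assumptions introduced here. The code book starts with 0 and 1, as in the example, and each new entry is the shortest subsequence not yet stored.

```python
def lz_parse(bits, initial=("0", "1")):
    """Split `bits` into the shortest subsequences not yet in the code book."""
    codebook = list(initial)      # entries in order of appearance (position 1, 2, ...)
    seen = set(codebook)
    current = ""
    for b in bits:
        current += b
        if current not in seen:   # shortest subsequence not seen before
            codebook.append(current)
            seen.add(current)
            current = ""
    return codebook, current      # `current` holds any unfinished tail


codebook, tail = lz_parse("000101110010100101")
print(codebook)
# ['0', '1', '00', '01', '011', '10', '010', '100', '101']
```

For the example stream the parse consumes all 18 bits, so the returned tail is empty.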
WEEK 7
The decoder is just as simple as the encoder. Specifically, it uses the pointer to identify
the root subsequence and then appends the innovation symbol. Consider, for example,
the binary encoded block 1101 in position 9. The last bit, 1, is the innovation symbol.
The remaining bits, 110, point to the root subsequence 10 in position 6. Hence, the block
1101 is decoded into 101, which is correct.
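The pointer-plus-innovation scheme described above can be sketched as follows. The function names are hypothetical, and the 3-bit pointer width is an assumption inferred from the example block 1101 (pointer 110, innovation symbol 1); positions are counted from 1, with 0 and 1 pre-stored at positions 1 and 2.

```python
def encode_blocks(codebook, n_initial=2, pointer_bits=3):
    """Encode each parsed subsequence as pointer-to-root + innovation symbol."""
    position = {s: i + 1 for i, s in enumerate(codebook)}   # 1-based positions
    blocks = []
    for s in codebook[n_initial:]:
        root, innovation = s[:-1], s[-1]
        blocks.append(format(position[root], f"0{pointer_bits}b") + innovation)
    return blocks


def decode_blocks(blocks, initial=("0", "1")):
    """Rebuild the code book from the blocks, one subsequence per block."""
    codebook = list(initial)
    for block in blocks:
        pointer, innovation = block[:-1], block[-1]
        root = codebook[int(pointer, 2) - 1]   # pointer is a 1-based position
        codebook.append(root + innovation)
    return codebook


# Code book obtained by parsing the example stream 000101110010100101.
codebook = ["0", "1", "00", "01", "011", "10", "010", "100", "101"]
blocks = encode_blocks(codebook)
print(blocks)
# ['0010', '0011', '1001', '0100', '1000', '1100', '1101']
print(decode_blocks(blocks)[-1])   # block 1101 -> root 10 (position 6) + 1
# 101
```

The decoder needs no side information: it rebuilds the same code book entry by entry, so every pointer refers to a subsequence it has already recovered.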
♣---------------------------------♣
Another procedure, in which the output code is a variable-length code and the symbols
0 and 1 are not stored in the encoder beforehand, is as follows.
The method of compression is to replace a substring with a pointer to an earlier
occurrence of the same substring.
Example:
If we want to encode the string
1011010100010 . . . ,
we parse it into an ordered dictionary of substrings that have not appeared before, as
follows:
λ, 1, 0, 11, 01, 010, 00, 10, . . .
We include the empty substring λ as the first substring in
the dictionary and order the substrings in the dictionary by the order in which they
emerged from the source. After every comma, we look along the next part of the input
sequence until we have read a substring that has not been marked off before. A
moment's reflection will confirm that this substring is longer by one bit than a substring
that has occurred earlier in the dictionary. This means that we can encode each
substring by giving a pointer to the earlier occurrence of that prefix and then sending
the extra bit by which the new substring in the dictionary differs from the earlier
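substring. If, at the nth bit, we have enumerated s(n) substrings, then we can give the
value of the pointer in ⌈log2 s(n)⌉ bits. The code for the above sequence is then as shown
in the fourth line of the following table (with punctuation included for clarity), the
upper lines indicating the source string and the value of s(n):
Source substrings:   λ     1      0      11      01      010      00       10
s(n):                0     1      2      3       4       5        6        7
s(n) in binary:      000   001    010    011     100     101      110      111
(pointer, bit):            (,1)   (0,0)  (01,1)  (10,1)  (100,0)  (010,0)  (001,0)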
Notice that the first pointer we send is empty because, given that there is only one
substring in the dictionary (the string λ), no bits are needed to convey the 'choice' of that
substring as the prefix. The encoded string is 100011101100001000010. The encoding,
in this simple case, is actually longer than the source string (21 bits against 13), because
there was no obvious redundancy in the source string.
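The variable-length procedure can be sketched as below. This is an illustrative reading of the description, with the function name lz_encode assumed here. The dictionary starts with only the empty substring λ at index 0, and each pointer is written with just enough bits to address the entries enumerated so far.

```python
def lz_encode(bits):
    """Return the dictionary and the (pointer, bit) pairs for `bits`."""
    dictionary = [""]                  # index 0 is the empty substring λ
    index = {"": 0}
    pairs = []
    current = ""
    for b in bits:
        if current + b in index:       # still a known substring, keep reading
            current += b
            continue
        # Pointer width in bits; equals ceil(log2(number of entries so far)).
        width = (len(dictionary) - 1).bit_length()
        pointer = format(index[current], f"0{width}b") if width else ""
        pairs.append((pointer, b))
        index[current + b] = len(dictionary)
        dictionary.append(current + b)
        current = ""
    return dictionary, pairs


dictionary, pairs = lz_encode("1011010100010")
print(pairs)
# [('', '1'), ('0', '0'), ('01', '1'), ('10', '1'), ('100', '0'), ('010', '0'), ('001', '0')]
print("".join(p + b for p, b in pairs))
# 100011101100001000010
```

The integer expression (len(dictionary) - 1).bit_length() is used in place of a floating-point logarithm so the width is exact; it gives 0 bits for the first pointer, matching the empty pointer noted above.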
In simple words, the idea is to take each substring in the first row, remove its last bit
(the innovation bit), and look at the remaining bits (the prefix), which must match an
earlier substring. We then take the binary index of that earlier substring as the pointer
and append the innovation bit.
For example, the last substring is 10. Its last bit, 0, is the innovation bit, to be added
later. The remaining bit, 1, matches the earlier substring 1, whose index in binary is 001.
The sub-code is constructed from the pointer and the innovation bit, which yields (001,0).
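The notes do not spell out a decoder for this second procedure. The sketch below is one way it could work, under the assumption that the decoder rebuilds the same dictionary and therefore knows how many pointer bits to read at each step; the name lz_decode is hypothetical.

```python
def lz_decode(code):
    """Invert the (pointer, bit) encoding of the variable-length procedure."""
    dictionary = [""]                  # index 0 is the empty substring λ
    out = []
    i = 0
    while i < len(code):
        # Same pointer width the encoder used at this step.
        width = (len(dictionary) - 1).bit_length()
        if i + width >= len(code):     # incomplete final pair, stop
            break
        pointer = int(code[i:i + width], 2) if width else 0
        bit = code[i + width]          # innovation bit follows the pointer
        substring = dictionary[pointer] + bit
        dictionary.append(substring)
        out.append(substring)
        i += width + 1
    return "".join(out)


print(lz_decode("100011101100001000010"))
# 1011010100010
```

The last sub-code read is 001 followed by 0: the pointer 001 selects the substring 1, and appending the innovation bit gives 10, as in the worked example.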
Exercises:
Encode the following stream using the LZ algorithm:
0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1 . . .
Encode the following stream using the LZ algorithm:
0011001111010100010001001 . . .
