Acknowledgements
I would like to thank the Department of Electrical Engineering at the Indian Institute of
Technology (IIT), Delhi for providing a stimulating academic environment that inspired this
book. In particular, I would like to thank Prof. S.C. Dutta Roy, Prof. Surendra Prasad,
Prof. H.M. Gupta, Prof. V.K. Jain, Prof. Vinod Chandra, Prof. Santanu Chaudhury, Prof. S.D.
Joshi, Prof. Sheel Aditya, Prof. Devi Chadha, Prof. D. Nagchoudri, Prof. G.S. Visweswaran,
Prof. R.K. Patney, Prof. V.C. Prasad, Prof. S.S. Jamuar and Prof. R.K.P. Bhatt. I am also
thankful to Dr. Subrat Kar, Dr. Ranjan K. Mallik and Dr. Shankar Prakriya for friendly
discussions. I have been fortunate to have several batches of excellent students whose feedback
has helped me improve the contents of this book. Many of the problems given at the end of the
chapters have been tested either as assignment problems or examination problems.
My heartfelt gratitude is due to Prof. Bernard D. Steinberg, University of Pennsylvania, who
has been my guide, mentor, friend and also my Ph.D thesis advisor. I am also grateful to
Prof. Avraham Freedman, Tel Aviv University, for his support and suggestions as and when
sought by me. I would like to thank Prof. B. Sundar Rajan of the Electrical Communication
Engineering group at the Indian Institute of Science, Bangalore, with whom I had a preliminary
discussion about writing this book.
I wish to acknowledge valuable feedback on this initial manuscript from Prof. Ravi Motwani,
IIT Kanpur, Prof. A.K. Chaturvedi, IIT Kanpur, Prof. N. Kumaravel, Anna University, Prof. V.
Maleswara Rao, College of Engineering, GITAM, Visakhapatnam, Prof. M. Chandrasekaran,
Government College of Engineering, Salem and Prof. Vikram Gadre, IIT Mumbai.
I am indebted to my parents, for their love and moral support throughout my life. I am also
grateful to my grandparents for their blessings, and to my younger brother, Shantanu, for the
infinite discussions on finite topics.
Finally, I would like to thank my wife and best friend, Aloka, who encouraged me at every
stage of writing this book. Her constructive suggestions and balanced criticism have been
instrumental in making the book more readable and palatable. It was her infinite patience,
unending support, understanding and sense of humour that were critical in transforming my
dream into this book.
RANJAN BOSE
New Delhi
Contents

Preface
Acknowledgements

Part I: Information Theory and Source Coding

1. Source Coding
   1.1 Introduction to Information Theory
   1.2 Uncertainty and Information
   1.3 Average Mutual Information and Entropy
   1.4 Information Measures for Continuous Random Variables
   1.5 Source Coding Theorem
   1.6 Huffman Coding
   1.7 The Lempel-Ziv Algorithm
   1.8 Run Length Encoding and the PCX Format
   1.9 Rate Distortion Function
   1.10 Optimum Quantizer Design
   1.11 Introduction to Image Compression
   1.12 The JPEG Standard for Lossless Compression
   1.13 The JPEG Standard for Lossy Compression
   1.14 Concluding Remarks
   Summary
   Problems
   Computer Problems

2. Channel Capacity and Coding
   2.1 Introduction
   2.2 Channel Models
   2.3 Channel Capacity
   2.4 Channel Coding
   2.5 Information Capacity Theorem
   2.6 The Shannon Limit
   2.7 Random Selection of Codes
   2.8 Concluding Remarks
   Summary
   Problems
   Computer Problems

Part II: Error Control Coding (Channel Coding)

3. Linear Block Codes for Error Correction
   3.1 Introduction to Error Correcting Codes
   3.2 Basic Definitions
   3.3 Matrix Description of Linear Block Codes
   3.4 Equivalent Codes
   3.5 Parity Check Matrix
   3.6 Decoding of a Linear Block Code
   3.7 Syndrome Decoding
   3.8 Error Probability after Coding (Probability of Error Correction)
   3.9 Perfect Codes
   3.10 Hamming Codes
   3.11 Optimal Linear Codes
   3.12 Maximum Distance Separable (MDS) Codes
   3.13 Concluding Remarks
   Summary
   Problems
   Computer Problems

4. Cyclic Codes
   4.1 Introduction to Cyclic Codes
   4.2 Polynomials
   4.3 The Division Algorithm for Polynomials
   4.4 A Method for Generating Cyclic Codes
   4.5 Matrix Description of Cyclic Codes
   4.6 Burst Error Correction
   4.7 Fire Codes
   4.8 Golay Codes
   4.9 Cyclic Redundancy Check (CRC) Codes
   4.10 Circuit Implementation of Cyclic Codes
   4.11 Concluding Remarks
   Summary
   Problems
   Computer Problems

5. Bose-Chaudhuri Hocquenghem (BCH) Codes
   5.1 Introduction to BCH Codes
   5.2 Primitive Elements
   5.3 Minimal Polynomials
   5.4 Generator Polynomials in Terms of Minimal Polynomials
   5.5 Some Examples of BCH Codes
   5.6 Decoding of BCH Codes
   5.7 Reed-Solomon Codes
   5.8 Implementation of Reed-Solomon Encoders and Decoders
   5.9 Nested Codes
   5.10 Concluding Remarks
   Summary
   Problems
   Computer Problems

6. Convolutional Codes
   6.1 Introduction to Convolutional Codes
   6.2 Tree Codes and Trellis Codes
   6.3 Polynomial Description of Convolutional Codes (Analytical Representation)
   6.4 Distance Notions for Convolutional Codes
   6.5 The Generating Function
   6.6 Matrix Description of Convolutional Codes
   6.7 Viterbi Decoding of Convolutional Codes
   6.8 Distance Bounds for Convolutional Codes
   6.9 Performance Bounds
   6.10 Known Good Convolutional Codes
   6.11 Turbo Codes
   6.12 Turbo Decoding
   6.13 Concluding Remarks
   Summary
   Problems
   Computer Problems

7. Trellis Coded Modulation
   7.1 Introduction to TCM
   7.2 The Concept of Coded Modulation
   7.3 Mapping by Set Partitioning
   7.4 Ungerboeck's TCM Design Rules
   7.5 TCM Decoder
   7.6 Performance Evaluation for AWGN Channel
   7.7 Computation of d_free
   7.8 TCM for Fading Channels
   7.9 Concluding Remarks
   Summary
   Problems
   Computer Problems

Part III: Coding for Secure Communications

8. Cryptography
   8.1 Introduction to Cryptography
   8.2 An Overview of Encryption Techniques
   8.3 Operations Used by Encryption Algorithms
   8.4 Symmetric (Secret Key) Cryptography
   8.5 Data Encryption Standard (DES)
   8.6 International Data Encryption Algorithm (IDEA)
   8.7 RC Ciphers
   8.8 Asymmetric (Public-Key) Algorithms
   8.9 The RSA Algorithm
   8.10 Pretty Good Privacy (PGP)
   8.11 One-Way Hashing
   8.12 Other Techniques
   8.13 Secure Communication Using Chaos Functions
   8.14 Cryptanalysis
   8.15 Politics of Cryptography
   8.16 Concluding Remarks
   Summary
   Problems
   Computer Problems

Index
1
Source Coding
Not everything that can be counted counts, and not everything that counts can be counted.
-Albert Einstein (1879-1955)
1.1 INTRODUCTION TO INFORMATION THEORY
Today we live in the information age. The internet has become an integral part of our lives,
making this, the third planet from the sun, a global village. People talking over the cellular
phones is a common sight, sometimes even in cinema theatres. Movies can be rented in the
form of a DVD disk. Email addresses and web addresses are common on business cards. Many
people prefer to send emails and e-cards to their friends rather than the regular snail mail. Stock
quotes can be checked over the mobile phone.
Information has become the key to success (it has always been a key to success, but in today's
world it is the key). And behind all this information and its exchange lie the tiny 1's and 0's (the
omnipresent bits) that hold information by merely the way they sit next to one another. Yet the
information age that we live in today owes its existence, primarily, to a seminal paper published
in 1948 that laid the foundation of the wonderful field of Information Theory, a theory
initiated by one man, the American electrical engineer Claude E. Shannon, whose ideas
appeared in the article "A Mathematical Theory of Communication" in the Bell System
Technical Journal (1948). In its broadest sense, information includes the content of any of the
standard communication media, such as telegraphy, telephony, radio, or television, and the
signals of electronic computers, servo-mechanism systems, and other data-processing devices.
The theory is even applicable to the signals of the nerve networks of humans and other animals.
The chief concern of information theory is to discover mathematical laws governing systems
designed to communicate or manipulate information. It sets up quantitative measures of
information and of the capacity of various systems to transmit, store, and otherwise process
information. Some of the problems treated are related to finding the best methods of using
various available communication systems and the best methods for separating wanted
information or signal, from extraneous information or noise. Another problem is the setting of
upper bounds on the capacity of a given information-carrying medium (often called an
information channel). While the results are chiefly of interest to communication engineers,
some of the concepts have been adopted and found useful in such fields as psychology and
linguistics.
The boundaries of information theory are quite fuzzy. The theory overlaps heavily with
communication theory but is more oriented towards the fundamental limitations on the
processing and communication of information and less towards the detailed operation of the
devices employed.
In this chapter, we shall first develop an intuitive understanding of information. It will be
followed by mathematical models of information sources and a quantitative measure of the
information emitted by a source. We shall then state and prove the source coding theorem.
Having developed the necessary mathematical framework, we shall look at two source coding
techniques, the Huffman encoding and the Lempel-Ziv encoding. This chapter will then
discuss the basics of the Run Length Encoding. The concept of the Rate Distortion Function
and the Optimum Quantizer will then be introduced. The chapter concludes with an
introduction to image compression, one of the important application areas of source coding. In
particular, the JPEG (Joint Photographic Experts Group) standard will be discussed in brief.
1.2 UNCERTAINTY AND INFORMATION
Any information source, analog or digital, produces an output that is random in nature. If it
were not random, i.e., the output were known exactly, there would be no need to transmit it!
We live in an analog world and most sources are analog sources, for example, speech,
temperature fluctuations etc. The discrete sources are man-made sources, for example, a source
(say, a man) that generates a sequence of letters from a finite alphabet (typing his email).
Before we go on to develop a mathematical measure of information, let us develop an
intuitive feel for it. Read the following sentences:
(A) Tomorrow, the sun will rise from the East.
(B) The phone will ring in the next one hour.
(C) It will snow in Delhi this winter.
The three sentences carry different amounts of information. In fact, the first sentence hardly
carries any information. Everybody knows that the sun rises in the East and the probability of
this happening again is almost unity. Sentence (B) appears to carry more information than
sentence (A). The phone may ring, or it may not. There is a finite probability that the phone will
ring in the next one hour (unless the maintenance people are at work again!). The last sentence
probably made you read it over twice. This is because it has never snowed in Delhi, and the
probability of a snowfall is very low. It is interesting to note that the amount of information
carried by the sentences listed above have something to do with the probability of occurrence of
the events stated in the sentences. And we observe an inverse relationship. Sentence (A), which
talks about an event which has a probability of occurrence very close to 1 carries almost no
information. Sentence (C), which has a very low probability of occurrence, appears to carry a
lot of information (made us read it twice to be sure we got the information right!). The other
interesting thing to note is that the length of the sentence has nothing to do with the amount of
information it conveys. In fact, sentence (A) is the longest but carries the minimum information.
We will now develop a mathematical measure of information.
Definition 1.1 Consider a discrete random variable X with possible outcomes x_i, i = 1, 2, ..., n.
The Self-Information of the event X = x_i is defined as

$$I(x_i) = \log\left(\frac{1}{P(x_i)}\right) = -\log P(x_i) \qquad (1.1)$$
We note that a high probability event conveys less information than a low probability event.
For an event with P(x_i) = 1, I(x_i) = 0. Since a lower probability implies a higher degree of
uncertainty (and vice versa), a random variable with a higher degree of uncertainty contains
more information. We will use this correlation between uncertainty and level of information for
physical interpretations throughout this chapter.

The units of I(x_i) are determined by the base of the logarithm, which is usually selected as 2 or
e. When the base is 2, the units are in bits and when the base is e, the units are in nats (natural
units). Since 0 ≤ P(x_i) ≤ 1, I(x_i) ≥ 0, i.e., self-information is non-negative. The following two
examples illustrate why a logarithmic measure of information is appropriate.
Example 1.1 Consider a binary source which tosses a fair coin and outputs a 1 if a head (H)
appears and a 0 if a tail (T) appears. For this source, P(1) = P(0) = 0.5. The information content of
each output from the source is

$$I(x_i) = -\log_2 P(x_i) = -\log_2 (0.5) = 1 \text{ bit} \qquad (1.2)$$

Indeed, we have to use only one bit to represent the output from this binary source (say, we use
a 1 to represent H and a 0 to represent T).

Now, suppose the successive outputs from this binary source are statistically independent, i.e.,
the source is memoryless. Consider a block of m bits. There are 2^m possible m-bit blocks, each of
which is equally probable with probability 2^{-m}. The self-information of an m-bit block is

$$I(x_i) = -\log_2 P(x_i) = -\log_2 2^{-m} = m \text{ bits} \qquad (1.3)$$

Again, we observe that we indeed need m bits to represent the possible m-bit blocks.
Thus, this logarithmic measure of information possesses the desired additive property when a
number of source outputs is considered as a block.
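As a quick check of Examples 1.1 and 1.2, the short Python sketch below (illustrative only; it assumes base-2 logarithms, so the results are in bits) evaluates the self-information of a fair-coin output and of an m-bit block of independent coin flips:

```python
import math

def self_information(p: float) -> float:
    """Self-information I(x) = -log2 P(x), in bits."""
    return -math.log2(p)

# Fair coin: P(H) = P(T) = 0.5 -> 1 bit per outcome
print(self_information(0.5))        # 1.0

# A block of m independent fair-coin outputs has probability 2**-m
m = 8
print(self_information(2.0 ** -m))  # 8.0, i.e., m bits
```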
Example 1.2 Consider a discrete memoryless source (DMS) (source C) that outputs two bits at
a time. This source comprises two binary sources (sources A and B) as mentioned in Example 1.1,
each source contributing one bit. The two binary sources within the source C are independent.
Intuitively, the information content of the aggregate source (source C) should be the sum of the
information contained in the outputs of the two independent sources that constitute the source C.
Let us look at the information content of the outputs of source C. There are four possible outcomes
{00, 01, 10, 11}, each with probability P(C) = P(A)P(B) = (0.5)(0.5) = 0.25, because the sources
A and B are independent. The information content of each output from the source C is

$$I(C) = -\log_2 P(C) = -\log_2 (0.25) = 2 \text{ bits} \qquad (1.4)$$

We have to use two bits to represent the output from this combined binary source.
Thus, the logarithmic measure of information possesses the desired additive property for
independent events.
Next, consider two discrete random variables X and Y with possible outcomes x_i, i = 1, 2, ..., n
and y_j, j = 1, 2, ..., m respectively. Suppose we observe some outcome Y = y_j and we want to
determine the amount of information this event provides about the event X = x_i, i = 1, 2, ..., n, i.e.,
we want to mathematically represent the mutual information. We note the two extreme cases:

(i) X and Y are independent, in which case the occurrence of Y = y_j provides no information
about X = x_i.
(ii) X and Y are fully dependent events, in which case the occurrence of Y = y_j determines the
occurrence of the event X = x_i.

A suitable measure that satisfies these conditions is the logarithm of the ratio of the conditional
probability

$$P(X = x_i \mid Y = y_j) = P(x_i \mid y_j) \qquad (1.5)$$

divided by the probability

$$P(X = x_i) = P(x_i) \qquad (1.6)$$
Definition 1.2 The mutual information I(x_i; y_j) between x_i and y_j is defined as

$$I(x_i; y_j) = \log\left(\frac{P(x_i \mid y_j)}{P(x_i)}\right) \qquad (1.7)$$

As before, the units of I(x_i; y_j) are determined by the base of the logarithm, which is
usually selected as 2 or e. When the base is 2, the units are in bits. Note that

$$\frac{P(x_i \mid y_j)}{P(x_i)} = \frac{P(x_i \mid y_j)P(y_j)}{P(x_i)P(y_j)} = \frac{P(x_i, y_j)}{P(x_i)P(y_j)} = \frac{P(y_j \mid x_i)}{P(y_j)} \qquad (1.8)$$

Therefore,

$$I(x_i; y_j) = I(y_j; x_i) \qquad (1.9)$$

The physical interpretation of I(x_i; y_j) = I(y_j; x_i) is as follows. The information provided by the
occurrence of the event Y = y_j about the event X = x_i is identical to the information provided by
the occurrence of the event X = x_i about the event Y = y_j.
Let us now verify the two extreme cases:

(i) When the random variables X and Y are statistically independent, P(x_i | y_j) = P(x_i), which
leads to I(x_i; y_j) = 0.
(ii) When the occurrence of Y = y_j uniquely determines the occurrence of the event X = x_i,
P(x_i | y_j) = 1, and the mutual information becomes

$$I(x_i; y_j) = \log\left(\frac{1}{P(x_i)}\right) = -\log P(x_i) \qquad (1.10)$$

This is the self-information of the event X = x_i.
Thus, the logarithmic definition of mutual information confirms our intuition.
Example 1.3 Consider a Binary Symmetric Channel (BSC) as shown in Fig. 1.1. It is a channel
that transports 1's and 0's from the transmitter (Tx) to the receiver (Rx). It makes an error
occasionally, with probability p. A BSC flips a 1 to 0 and vice versa with equal probability. Let X
and Y be binary random variables that represent the input and output of this BSC respectively. Let
the input symbols be equally likely and the output symbols depend upon the input according to the
channel transition probabilities given below:

P(Y = 0 | X = 0) = 1 - p,
P(Y = 0 | X = 1) = p,
P(Y = 1 | X = 1) = 1 - p,
P(Y = 1 | X = 0) = p.

Fig. 1.1 A Binary Symmetric Channel.

This simply implies that the probability of a bit getting flipped (i.e., in error) when transmitted over
this BSC is p. From the channel transition probabilities we have

P(Y = 0) = P(X = 0) P(Y = 0 | X = 0) + P(X = 1) P(Y = 0 | X = 1) = 0.5(1 - p) + 0.5(p) = 0.5, and
P(Y = 1) = P(X = 0) P(Y = 1 | X = 0) + P(X = 1) P(Y = 1 | X = 1) = 0.5(p) + 0.5(1 - p) = 0.5.
Suppose we are at the receiver and we want to determine what was transmitted at the transmitter,
on the basis of what was received. The mutual information about the occurrence of the event X = 0,
given that Y = 0, is

$$I(x_0; y_0) = I(0; 0) = \log_2\left(\frac{P(Y=0 \mid X=0)}{P(Y=0)}\right) = \log_2\left(\frac{1-p}{0.5}\right) = \log_2 2(1 - p).$$

Similarly,

$$I(x_1; y_0) = I(1; 0) = \log_2\left(\frac{P(Y=0 \mid X=1)}{P(Y=0)}\right) = \log_2\left(\frac{p}{0.5}\right) = \log_2 2p.$$
Let us consider some specific cases. Suppose p = 0, i.e., it is an ideal (noiseless) channel. Then,

I(x_0; y_0) = I(0; 0) = log2 2(1 - p) = 1 bit.

Hence, from the output, we can determine what was transmitted with certainty. Recall that the
self-information about the event X = x_0 was 1 bit.

However, if p = 0.5, we get

I(x_0; y_0) = I(0; 0) = log2 2(1 - p) = log2 2(0.5) = 0.

It is clear that the output gives us no information about what was transmitted. Thus, it is a
useless channel. For such a channel, we may as well toss a fair coin at the receiver in order to
determine what was sent!

Suppose we have a channel where p = 0.1. Then,

I(x_0; y_0) = I(0; 0) = log2 2(1 - p) = log2 2(0.9) = 0.848 bits.
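These values are easy to reproduce numerically. The sketch below (an illustrative helper, assuming equally likely inputs as in this example) evaluates I(x_0; y_0) = log2 2(1 - p) for a few values of p:

```python
import math

def bsc_mutual_information(p: float) -> float:
    """I(x0; y0) = log2( P(Y=0|X=0) / P(Y=0) ) for a BSC with
    equally likely inputs, where P(Y=0) = 0.5."""
    return math.log2((1.0 - p) / 0.5)

for p in (0.0, 0.1, 0.5):
    print(p, round(bsc_mutual_information(p), 3))
# 0.0 -> 1.0 bit (noiseless), 0.1 -> 0.848 bits, 0.5 -> 0.0 (useless channel)
```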
Example 1.4 Let X and Y be binary random variables that represent the input and output of the
binary channel shown in Fig. 1.2. Let the input symbols be equally likely, and the output symbols
depend upon the input according to the channel transition probabilities:

P(Y = 0 | X = 0) = 1 - p_0,
P(Y = 0 | X = 1) = p_1,
P(Y = 1 | X = 1) = 1 - p_1,
P(Y = 1 | X = 0) = p_0.

Fig. 1.2 A Binary Channel with Asymmetric Probabilities.

From the channel transition probabilities we have

P(Y = 0) = P(X = 0) P(Y = 0 | X = 0) + P(X = 1) P(Y = 0 | X = 1) = 0.5(1 - p_0) + 0.5(p_1) = 0.5(1 - p_0 + p_1), and
P(Y = 1) = P(X = 0) P(Y = 1 | X = 0) + P(X = 1) P(Y = 1 | X = 1) = 0.5(p_0) + 0.5(1 - p_1) = 0.5(1 - p_1 + p_0).
Suppose we are at the receiver and we want to determine what was transmitted at the transmitter,
on the basis of what is received. The mutual information about the occurrence of the event X = 0,
given that Y = 0, is

$$I(x_0; y_0) = I(0; 0) = \log_2\left(\frac{P(Y=0 \mid X=0)}{P(Y=0)}\right) = \log_2\left(\frac{1-p_0}{0.5(1-p_0+p_1)}\right) = \log_2\left(\frac{2(1-p_0)}{1-p_0+p_1}\right).$$

Similarly,

$$I(x_1; y_0) = I(1; 0) = \log_2\left(\frac{P(Y=0 \mid X=1)}{P(Y=0)}\right) = \log_2\left(\frac{2p_1}{1-p_0+p_1}\right).$$
Definition 1.3 The Conditional Self-Information of the event X = x_i given Y = y_j is defined as

$$I(x_i \mid y_j) = \log\left(\frac{1}{P(x_i \mid y_j)}\right) = -\log P(x_i \mid y_j) \qquad (1.11)$$

Thus, we may write

$$I(x_i; y_j) = I(x_i) - I(x_i \mid y_j) \qquad (1.12)$$

The conditional self-information can be interpreted as the self-information about the
event X = x_i on the basis of the event Y = y_j. Recall that both I(x_i) ≥ 0 and I(x_i | y_j) ≥ 0.
Therefore, I(x_i; y_j) < 0 when I(x_i) < I(x_i | y_j) and I(x_i; y_j) > 0 when I(x_i) > I(x_i | y_j).
Hence, mutual information can be positive, negative or zero.
Example 1.5 Consider the BSC discussed in Example 1.3. The plot of the mutual information
I(x_0; y_0) versus the probability of error, p, is given in Fig. 1.3.

Fig. 1.3 The Plot of the Mutual Information I(x_0; y_0) Versus the Probability of Error, p.
It can be seen from the figure that I(x_0; y_0) is negative for p > 0.5. The physical interpretation is as
follows. A negative mutual information implies that, having observed Y = y_0, we must avoid
choosing X = x_0 as the transmitted bit.

For p = 0.1,

I(x_0; y_1) = I(0; 1) = log2 2(p) = log2 2(0.1) = -2.322 bits.

This shows that the mutual information between the events X = x_0 and Y = y_1 is negative for p = 0.1.

For the extreme case of p = 1, we have

I(x_0; y_1) = I(0; 1) = log2 2(p) = log2 2(1) = 1 bit.

The channel always changes a 0 to a 1 and vice versa (since p = 1). This implies that if y_1 is
observed at the receiver, it can be concluded that x_0 was actually transmitted. This is actually a
useful channel with a 100% bit error rate! We just flip the received bit.
1.3 AVERAGE MUTUAL INFORMATION AND ENTROPY

So far we have studied the mutual information associated with a pair of events x_i and y_j, which
are the possible outcomes of the two random variables X and Y. We now want to find the
average mutual information between the two random variables. This can be obtained simply by
weighting I(x_i; y_j) by the probability of occurrence of the joint event and summing over all
possible joint events.
Definition 1.4 The Average Mutual Information between two random variables
X and Y is given by

$$I(X; Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\, I(x_i; y_j) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\log\frac{P(x_i, y_j)}{P(x_i)P(y_j)} \qquad (1.13)$$

For the case when X and Y are statistically independent, I(X; Y) = 0, i.e., there is no
average mutual information between X and Y. An important property of the average
mutual information is that I(X; Y) ≥ 0, where equality holds if and only if X and Y are
statistically independent.
Definition 1.5 The Average Self-Information of a random variable X is defined as

$$H(X) = \sum_{i=1}^{n} P(x_i)\, I(x_i) = -\sum_{i=1}^{n} P(x_i)\log P(x_i) \qquad (1.14)$$

When X represents the alphabet of possible output letters from a source, H(X)
represents the average information per source letter. In this case H(X) is called the
entropy. The term entropy has been borrowed from statistical mechanics, where it is
used to denote the level of disorder in a system. It is interesting to see that the Chinese
character for entropy looks like 熵!
Example 1.6 Consider a discrete binary source that emits a sequence of statistically independent
symbols. The output is either a 0 with probability p or a 1 with probability 1 - p. The entropy of
this binary source is

$$H(X) = -\sum_{i=0}^{1} P(x_i)\log P(x_i) = -p\log_2(p) - (1-p)\log_2(1-p) \qquad (1.15)$$
The plot of the Binary Entropy Function versus p is given in Fig. 1.4.
We observe from the figure that the value of the binary entropy function reaches its maximum
value for p = 0.5, i.e., when both 1 and 0 are equally likely. In general it can be shown that the
entropy of a discrete source is maximum when the letters from the source are equally probable.
Fig. 1.4 The Binary Entropy Function, H(X) = -p log2(p) - (1 - p) log2(1 - p).
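The shape of this curve can be reproduced with a few lines of code; the sketch below (function name is illustrative) evaluates the binary entropy function and confirms that it peaks at 1 bit for p = 0.5:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(binary_entropy(p), 4))
# The maximum value, 1 bit, occurs at p = 0.5 (both symbols equally likely).
```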
Definition 1.6 The Average Conditional Self-Information, called the conditional
entropy, is defined as

$$H(X \mid Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\log\frac{1}{P(x_i \mid y_j)} \qquad (1.16)$$

The physical interpretation of this definition is as follows. H(X | Y) is the information
(or uncertainty) in X having observed Y. Based on the definitions of H(X | Y) and
H(Y | X) we can write

$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \qquad (1.17)$$

We make the following observations.

(i) Since I(X; Y) ≥ 0, it implies that H(X) ≥ H(X | Y).
(ii) The case I(X; Y) = 0 implies that H(X) = H(X | Y), which is possible if and only
if X and Y are statistically independent.
(iii) Since H(X | Y) is the conditional self-information about X given Y and H(X) is
the average uncertainty (self-information) of X, I(X; Y) is the average reduction in
uncertainty about X from having observed Y.
(iv) Since H(X) ≥ H(X | Y), the observation of Y does not increase the entropy
(uncertainty); it can only decrease the entropy. That is, observing Y cannot
reduce the information about X; it can only add to the information.
Example 1.7 Consider the BSC discussed in Example 1.3. Let the input symbols be '0' with
probability q and '1' with probability 1 - q, as shown in Fig. 1.5.

Fig. 1.5 A Binary Symmetric Channel (BSC) with Input Symbol Probabilities Equal to q and 1 - q.

The entropy of this binary source is

$$H(X) = -\sum_{i=0}^{1} P(x_i)\log P(x_i) = -q\log_2(q) - (1-q)\log_2(1-q) \qquad (1.18)$$

The conditional entropy is given by

$$H(X \mid Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\log\frac{1}{P(x_i \mid y_j)} \qquad (1.19)$$

In order to calculate the values of H(X | Y), we can make use of the following equalities:

$$P(x_i, y_j) = P(x_i \mid y_j)P(y_j) = P(y_j \mid x_i)P(x_i)$$

The plot of H(X | Y) versus q is given in Fig. 1.6, with p as the parameter.
Fig. 1.6 The Plot of Conditional Entropy H(X | Y) Versus q.
The average mutual information I(X; Y) is given in Fig. 1.7. It can be seen from the plot that as we
increase the parameter p from 0 to 0.5, I(X; Y) decreases. Physically it implies that, as we make the
channel less reliable (increase the value of p towards 0.5), the mutual information between the random
variable X (at the transmitter) and the random variable Y (at the receiver) decreases.
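A numerical check of this behaviour is straightforward. The sketch below (an illustrative helper, not taken from the text) uses the identity I(X; Y) = H(Y) - H(Y | X) from Eq. (1.17); that H(Y | X) equals the binary entropy of the crossover probability p for a BSC is a standard fact assumed here rather than derived in the text:

```python
import math

def h2(p: float) -> float:
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_average_mutual_information(q: float, p: float) -> float:
    """I(X; Y) = H(Y) - H(Y|X) for a BSC with P(X=0) = q and crossover probability p."""
    p_y0 = q * (1 - p) + (1 - q) * p   # P(Y = 0)
    return h2(p_y0) - h2(p)            # H(Y) - H(Y|X)

for p in (0.0, 0.1, 0.3, 0.5):
    print(p, round(bsc_average_mutual_information(0.5, p), 4))
# I(X; Y) shrinks toward 0 as p approaches 0.5, i.e., as the channel becomes useless.
```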
1.4 INFORMATION MEASURES FOR CONTINUOUS RANDOM VARIABLES
The definitions of mutual information for discrete random variables can be directly extended to
continuous random variables. Let X and Y be random variables with joint probability density
function (pdf) p(x, y) and marginal pdfs p(x) and p(y). The average mutual information between X
and Y is defined as follows.
Definition 1.7 The average mutual information between two continuous
random variables X and Y is defined as

$$I(X; Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x)\,p(y \mid x)\log\frac{p(y \mid x)\,p(x)}{p(x)\,p(y)}\,dx\,dy \qquad (1.20)$$

Fig. 1.7 The Plot of the Average Mutual Information I(X; Y) Versus q.
It should be pointed out that the definition of average mutual information can be
carried over from discrete random variables to continuous random variables, but the
concept and physical interpretation cannot. The reason is that the information
content in a continuous random variable is actually infinite, and we require an infinite
number of bits to represent a continuous random variable precisely. The self-information,
and hence the entropy, is infinite. To get around this problem we define
a quantity called the differential entropy.

Definition 1.8 The differential entropy of a continuous random variable X is
defined as

$$h(X) = -\int_{-\infty}^{\infty} p(x)\log p(x)\,dx \qquad (1.21)$$

Again, it should be understood that there is no physical meaning attached to the
above quantity. We carry on with extending our definitions further.
Definition 1.9 The Average Conditional Entropy of a continuous random
variable X given Y is defined as

$$H(X \mid Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x, y)\log\frac{1}{p(x \mid y)}\,dx\,dy \qquad (1.22)$$

The average mutual information can be expressed as

$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \qquad (1.23)$$
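As a sanity check of Definition 1.8, the sketch below (a rough midpoint-rule integration; the closed-form values quoted in the comments are standard results assumed here, not derived in this chapter) estimates the differential entropy of a uniform and of a Gaussian density, and shows that differential entropy can even be negative:

```python
import math

def differential_entropy(pdf, lo, hi, n=200_000):
    """Numerically estimate h(X) = -integral of p(x)*log2 p(x) dx over [lo, hi]."""
    dx = (hi - lo) / n
    h = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = pdf(x)
        if p > 0.0:
            h -= p * math.log2(p) * dx
    return h

# Uniform density on (0, a): h(X) = log2(a), which is negative for a < 1.
a = 0.5
print(differential_entropy(lambda x: 1.0 / a, 0.0, a))            # about -1.0

# Gaussian density with sigma = 1: h(X) = 0.5*log2(2*pi*e), about 2.05 bits.
gauss = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
print(differential_entropy(gauss, -10.0, 10.0))                   # about 2.05
```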
1.5 SOURCE CODING THEOREM

In this section we explore efficient representation (efficient coding) of symbols generated by a
source. The primary objective is the compression of data by efficient representation of the
symbols. Suppose a discrete memoryless source (DMS) outputs a symbol every t seconds and
each symbol is selected from a finite set of symbols x_i, i = 1, 2, ..., L, occurring with probabilities
P(x_i), i = 1, 2, ..., L. The entropy of this DMS in bits per source symbol is

$$H(X) = -\sum_{i=1}^{L} P(x_i)\log_2 P(x_i) \le \log_2 L \qquad (1.24)$$

The equality holds when the symbols are equally likely. It means that the average number of
bits per source symbol is H(X) and the source rate is H(X)/t bits/sec.

Now let us represent the 26 letters of the English alphabet using bits. We observe that
2^5 = 32 > 26. Hence, each of the letters can be uniquely represented using 5 bits. This is an
example of a Fixed Length Code (FLC). Each letter has a corresponding 5-bit-long codeword.
Definition 1.10 A code is a set of vectors called codewords.

Suppose a DMS outputs a symbol selected from a finite set of symbols x_i, i = 1, 2, ...,
L. The number of bits R required for unique coding when L is a power of 2 is

$$R = \log_2 L, \qquad (1.25)$$

and, when L is not a power of 2, it is

$$R = \lfloor \log_2 L \rfloor + 1. \qquad (1.26)$$

As we saw earlier, to encode the letters of the English alphabet, we need R = ⌊log2 26⌋
+ 1 = 5 bits. The FLC for the English alphabet suggests that every letter in the
alphabet is equally important (probable) and hence each one requires 5 bits for
representation. However, we know that some letters are less common (x, q, z, etc.)
while others are more frequently used (s, t, e, etc.). It appears that allotting an equal
number of bits to the frequently used letters as well as to the not so commonly used
letters is not an efficient way of representation (coding). Intuitively, we should
represent the more frequently occurring letters by fewer bits and the less frequently
occurring letters by more bits. In this manner, if we have to encode a whole page of
written text, we might end up using fewer bits overall. When the source symbols are
not equally probable, a more efficient method is to use a Variable Length Code (VLC).
Example 1.8 Suppose we have only the first eight letters of the English alphabet (A-H) in our
vocabulary. The Fixed Length Code (FLC) for this set of letters would be

Letter  Codeword    Letter  Codeword
A       000         E       100
B       001         F       101
C       010         G       110
D       011         H       111

Fixed Length Code

A VLC for the same set of letters can be

Letter  Codeword    Letter  Codeword
A       00          E       101
B       010         F       110
C       011         G       1110
D       100         H       1111

Variable Length Code 1
Suppose we have to code the series of letters: "A BAD CAB". The fixed length and the variable
length representations of this pseudo sentence would be

Fixed Length Code       000 001 000 011 010 000 001    Total bits = 21
Variable Length Code 1  00 010 00 100 011 00 010       Total bits = 18

Note that the variable length code uses fewer bits simply because the letters appearing
more frequently in the pseudo sentence are represented with fewer bits.
We look at yet another VLC for the first 8 letters of the English alphabet:

Letter  Codeword    Letter  Codeword
A       0           E       10
B       1           F       11
C       00          G       000
D       01          H       111

Variable Length Code 2

This second variable length code appears to be more efficient in terms of representation of the
letters.
Variable Length Code 1  00 010 00 100 011 00 010    Total bits = 18
Variable Length Code 2  0 1001 0001                 Total bits = 9

However, there is a problem with VLC2. Consider the sequence of bits 0 1001 0001, which is
used to represent A BAD CAB. We could regroup the bits in a different manner to have [0]
[10][0][1][0][0][01], which translates to A EAB AAD, or [0][1][0][0][1][0][0][0][1], which
stands for A BAAB AAAB! Obviously there is a problem with the unique decoding of the code.
We have no clue where one codeword (symbol) ends and the next one begins, since the lengths of
the codewords are variable. However, this problem does not exist with VLC1. Here no codeword
forms the prefix of any other codeword. This is called the prefix condition. As soon as a sequence
of bits corresponding to any one of the possible codewords is detected, we can declare that symbol
decoded. Such codes, called Uniquely Decodable or Instantaneous Codes, cause no decoding
delay. In this example, VLC2 is not a uniquely decodable code, hence not a code of any utility.
VLC1 is uniquely decodable, though less economical in terms of bits per symbol.
Definition 1.11 A Prefix Code is one in which no codeword forms the prefix of any
other codeword. Such codes are also called Uniquely Decodable or Instantaneous
Codes.
We now proceed to devise a systematic procedure for constructing uniquely
decodable, variable length codes that are efficient in terms of the average number of
bits per source letter. Let the source output a symbol from a finite set of symbols x_i,
i = 1, 2, ..., L, occurring with probabilities P(x_i), i = 1, 2, ..., L. The average number of
bits per source letter is defined as

$$\bar{R} = \sum_{k=1}^{L} n(x_k)P(x_k) \qquad (1.27)$$

where n(x_k) is the length of the codeword for the symbol x_k.
Theorem 1.1 (Kraft Inequality) A necessary and sufficient condition for the existence of
a binary code with codewords having lengths n_1 ≤ n_2 ≤ ... ≤ n_L that satisfy the prefix condition
is

$$\sum_{k=1}^{L} 2^{-n_k} \le 1 \qquad (1.28)$$

Proof First we prove the sufficient condition. Consider a binary tree of order (depth) n =
n_L. This tree has 2^{n_L} terminal nodes, as depicted in Fig. 1.8. Let us select any node of order
n_1 as the first codeword c_1. Since no codeword is the prefix of any other codeword (the
prefix condition), this choice eliminates 2^{n - n_1} terminal nodes. This process continues until
the last codeword is assigned at the terminal node n = n_L. Consider the node of order j < L.
The fraction of terminal nodes eliminated is

$$\sum_{k=1}^{j} 2^{-n_k} < \sum_{k=1}^{L} 2^{-n_k} \le 1. \qquad (1.29)$$

Thus, we can construct a prefix code that is embedded in the full tree of n_L nodes.
The nodes that are eliminated are depicted by the dotted arrow lines leading to them in
the figure.

Fig. 1.8 A Binary Tree of Order n_L.

We now prove the necessary condition. We observe that in the code tree of order n = n_L,
the number of terminal nodes eliminated from the total number of 2^n terminal nodes is

$$\sum_{k=1}^{L} 2^{n - n_k} \le 2^n \qquad (1.30)$$

This leads to

$$\sum_{k=1}^{L} 2^{-n_k} \le 1. \qquad (1.31)$$
Example 1.9 Consider the construction of a prefix code using a binary tree.
Fig. 1.9 Constructing a Binary Prefix Code using a Binary Tree.
We start from the mother node and proceed toward the terminal nodes of the binary tree (Fig. 1.9).
Let the mother node be labelled '0' (it could have been labelled '1' as well). Each node gives rise to
two branches (binary tree). Let us label the upper branch '0' and the lower branch '1' (these labels
could also have been mutually exchanged). First we follow the upper branch from the mother node.
We obtain our first codeword c_1 = 0, terminating at node n_00. Since we want to construct a prefix
code where no codeword is a prefix of any other codeword, we must discard all the daughter nodes
generated from the node labelled c_1.

Next, we proceed on the lower branch from the mother node and reach the node n_01. We proceed
along the upper branch first and reach node n_010. We label this as the codeword c_2 = 10 (the labels
of the branches that lead up to this node travelling from the mother node). Following the lower
branch from the node n_01, we ultimately reach the terminal nodes n_0110 and n_0111, which correspond
to the codewords c_3 = 110 and c_4 = 111 respectively.

Thus the binary tree has given us four prefix codewords: {0, 10, 110, 111}. By construction, this
is a prefix code. For this code

$$\sum_{k=1}^{4} 2^{-n_k} = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 0.5 + 0.25 + 0.125 + 0.125 = 1$$

Thus, the Kraft inequality is satisfied.
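Checking the Kraft inequality for a candidate set of codeword lengths takes only a couple of lines; the sketch below (illustrative only) tests the lengths {1, 2, 3, 3} of Example 1.9 and a set of lengths for which no prefix code can exist:

```python
def kraft_sum(lengths):
    """Return the sum of 2**(-n_k) over the codeword lengths n_k."""
    return sum(2.0 ** -n for n in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> a prefix code exists (Example 1.9)
print(kraft_sum([1, 1, 2]))      # 1.25 -> violates the inequality; no prefix code possible
```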
We now state and prove the noiseless Source Coding theorem, which applies to the codes that
satisfy the prefix condition.
Theorem 1.2 (Source Coding Theorem) Let X be the set of letters from a DMS with finite
entropy H(X) and x_k, k = 1, 2, ..., L the output symbols, occurring with probabilities P(x_k),
k = 1, 2, ..., L. Given these parameters, it is possible to construct a code that satisfies the
prefix condition and has an average length R̄ that satisfies the inequality

$$H(X) \le \bar{R} < H(X) + 1 \qquad (1.32)$$

Proof First consider the lower bound of the inequality. For codewords that have length
n_k, 1 ≤ k ≤ L, the difference H(X) - R̄ can be expressed as

$$H(X) - \bar{R} = \sum_{k=1}^{L} P_k \log_2\frac{1}{P_k} - \sum_{k=1}^{L} P_k n_k = \sum_{k=1}^{L} P_k \log_2\frac{2^{-n_k}}{P_k}$$

We now make use of the inequality ln x ≤ x - 1 to get

$$H(X) - \bar{R} \le (\log_2 e)\left(\sum_{k=1}^{L} 2^{-n_k} - 1\right) \le 0$$

The last inequality follows from the Kraft inequality. Equality holds if and only if P_k = 2^{-n_k}
for 1 ≤ k ≤ L. Thus the lower bound is proved.

Next, we prove the upper bound. Let us select the codeword lengths n_k such that
2^{-n_k} ≤ P_k < 2^{-n_k + 1}. First consider 2^{-n_k} ≤ P_k. Summing both sides over 1 ≤ k ≤ L gives us

$$\sum_{k=1}^{L} 2^{-n_k} \le \sum_{k=1}^{L} P_k = 1$$

which is the Kraft inequality, for which there exists a code satisfying the prefix condition.
Next consider P_k < 2^{-n_k+1}. Taking the logarithm of both sides gives

log2 P_k < -n_k + 1, or, n_k < 1 - log2 P_k.

On multiplying both sides by P_k and summing over 1 ≤ k ≤ L we obtain

$$\sum_{k=1}^{L} P_k n_k < \sum_{k=1}^{L} P_k + \left(-\sum_{k=1}^{L} P_k\log_2 P_k\right)$$

or,

$$\bar{R} < H(X) + 1$$

Thus the upper bound is proved.
The Source Coding Theorem tells us that for any prefix code used to represent the symbols
from a source, the minimum number of bits required to represent the source symbols on an
average must be at least equal to the entropy of the source. If we have found a prefix code that
satisfies R̄ = H(X) for a certain source X, we must abandon further search because we cannot do
any better. The theorem also tells us that a source with higher entropy (uncertainty) requires, on
average, more bits to represent the source symbols in terms of a prefix code.
Definition 1.12 The efficiency of a prefix code is defined as

$$\eta = \frac{H(X)}{\bar{R}} \qquad (1.33)$$

It is clear from the source coding theorem that the efficiency of a prefix code satisfies η ≤ 1.
Efficient representation of symbols leads to compression of data. Source coding is
primarily used for compression of data (and images).
Example 1.10 Consider a source X which generates four symbols with probabilities p_1 = 0.5,
p_2 = 0.3, p_3 = 0.1 and p_4 = 0.1. The entropy of this source is

$$H(X) = -\sum_{k=1}^{4} p_k \log_2 p_k = 1.685 \text{ bits.}$$

Suppose we use the prefix code {0, 10, 110, 111} constructed in Example 1.9. Then the average
codeword length R̄ is given by

$$\bar{R} = \sum_{k=1}^{4} n(x_k)P(x_k) = 1(0.5) + 2(0.3) + 3(0.1) + 3(0.1) = 1.700 \text{ bits.}$$

Thus we have

$$H(X) \le \bar{R} < H(X) + 1$$

The efficiency of this code is η = (1.685/1.700) = 0.9912. Had the source symbol probabilities
been p_k = 2^{-n_k}, i.e., p_1 = 2^{-1} = 0.5, p_2 = 2^{-2} = 0.25, p_3 = 2^{-3} = 0.125 and p_4 = 2^{-3} = 0.125, the
average codeword length would be R̄ = 1.750 bits = H(X). In this case, η = 1.
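These numbers are easily verified. The sketch below (variable names are illustrative) computes H(X), the average length R̄ and the efficiency for the source and prefix code of Example 1.10:

```python
import math

probs   = [0.5, 0.3, 0.1, 0.1]   # P(x_k) from Example 1.10
lengths = [1, 2, 3, 3]           # lengths of the prefix code {0, 10, 110, 111}

H = -sum(p * math.log2(p) for p in probs)          # source entropy
R = sum(n * p for n, p in zip(lengths, probs))     # average codeword length
print(round(H, 3), round(R, 3), round(H / R, 3))
# H ~ 1.685 bits, R = 1.7 bits, efficiency ~ 0.99
```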
1.6 HUFFMAN CODING

We will now study an algorithm for constructing efficient source codes for a DMS with source
symbols that are not equally probable. A variable length encoding algorithm was suggested by
Huffman in 1952, based on the source symbol probabilities P(x_i), i = 1, 2, ..., L. The algorithm
is optimal in the sense that the average number of bits it requires to represent the source symbols
is a minimum, and it also meets the prefix condition. The steps of the Huffman coding algorithm
are given below:
(i) Arrange the source symbols in decreasing order of their probabilities.
(ii) Take the bottom two symbols and tie them together as shown in Fig. 1.10. Add the
probabilities of the two symbols and write it on the combined node. Label the two
branches with a '1' and a '0'.

Fig. 1.10 Combining Probabilities in Huffman Coding.
(iii) Treat this sum of probabilities as a new probability associated with a new symbol. Again
pick the two smallest probabilities, tie them together to form a new probability. Each
time we perform the combination of two symbols we reduce the total number of symbols
by one. Whenever we tie together two probabilities (nodes) we label the two branches
with a '1' and a '0'.
(iv) Continue the procedure until only one probability is left (and it should be 1 if your
addition is correct!). This completes the construction of the Huffman tree.
(v) To find out the prefix codeword for any symbol, follow the branches from the final node
back to the symbol. While tracing back the route read out the labels on the branches.
This is the codeword for the symbol.
The algorithm can be easily understood using the following example.
Example 1.11 Consider a DMS with seven possible symbols x_i, i = 1, 2, ..., 7 and the
corresponding probabilities p_1 = 0.37, p_2 = 0.33, p_3 = 0.16, p_4 = 0.07, p_5 = 0.04, p_6 = 0.02, and
p_7 = 0.01. We first arrange the probabilities in decreasing order and then construct the Huffman
tree as in Fig. 1.11.

Symbol    Probability    Self-Information    Codeword
x1        0.37           1.4344              0
x2        0.33           1.5995              10
x3        0.16           2.6439              110
x4        0.07           3.8365              1110
x5        0.04           4.6439              11110
x6        0.02           5.6439              111110
x7        0.01           6.6439              111111
Fig. 1.11 Huffman Coding for Example 1.11.
To find the codeword for any particular symbol, we just trace back the route from the final node to
the symbol. For the sake of illustration, the route for the symbol x_4 (probability 0.07) is shown with
a dotted line in the figure. We read out the labels of the branches on the way to obtain the codeword
1110.

The entropy of the source is found to be

$$H(X) = -\sum_{k=1}^{7} p_k \log_2 p_k = 2.1152 \text{ bits,}$$

and the average number of binary digits per symbol is calculated to be

$$\bar{R} = \sum_{k=1}^{7} n(x_k)P(x_k) = 1(0.37) + 2(0.33) + 3(0.16) + 4(0.07) + 5(0.04) + 6(0.02) + 6(0.01) = 2.1700 \text{ bits.}$$

The efficiency of this code is η = (2.1152/2.1700) = 0.9747.
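The codeword lengths and the average length of Example 1.11 can be reproduced with a small heap-based Huffman routine; the sketch below is a minimal implementation (it does not reproduce the book's exact labelling or tie-breaking, only the code lengths):

```python
import heapq
import itertools

def huffman_code_lengths(probs):
    """Return Huffman codeword lengths (in bits) for the given probabilities."""
    counter = itertools.count()            # tie-breaker so heapq never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, ids1 = heapq.heappop(heap)
        p2, _, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:              # each merge adds one bit to these symbols
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), ids1 + ids2))
    return lengths

probs = [0.37, 0.33, 0.16, 0.07, 0.04, 0.02, 0.01]
lengths = huffman_code_lengths(probs)
print(lengths)                                     # [1, 2, 3, 4, 5, 6, 6]
print(sum(n * p for n, p in zip(lengths, probs)))  # ~2.17 bits per symbol
```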
Example 1.12 This example shows that Huffman coding is not unique. Consider a DMS with
seven possible symbols x_i, i = 1, 2, ..., 7 and the corresponding probabilities p_1 = 0.46, p_2 = 0.30,
p_3 = 0.12, p_4 = 0.06, p_5 = 0.03, p_6 = 0.02, and p_7 = 0.01.

Symbol    Probability    Self-Information    Codeword
x1        0.46           1.1203              1
x2        0.30           1.7370              00
x3        0.12           3.0589              010
x4        0.06           4.0589              0110
x5        0.03           5.0589              01110
x6        0.02           5.6439              011110
x7        0.01           6.6439              011111
Fig. 1.12 Huffman Coding for Example 1.12.
The entropy of the source is found to be

$$H(X) = -\sum_{k=1}^{7} p_k \log_2 p_k = 1.9781 \text{ bits,}$$

and the average number of binary digits per symbol is calculated to be

$$\bar{R} = \sum_{k=1}^{7} n(x_k)P(x_k) = 1(0.46) + 2(0.30) + 3(0.12) + 4(0.06) + 5(0.03) + 6(0.02) + 6(0.01) = 1.9900 \text{ bits.}$$

The efficiency of this code is η = (1.9781/1.9900) = 0.9940.
We shall now see that Huffman coding is not unique. Consider the combination of the two smallest
probabilities (symbols x_6 and x_7). Their sum is equal to 0.03, which is equal to the next higher
probability, corresponding to the symbol x_5. So, for the second step, we may choose to put this
combined probability (belonging to, say, symbol x_6') higher than, or lower than, the symbol x_5.
Suppose we put the combined probability at a lower level. We proceed further, to find again that the
combination of x_6' and x_5 yields the probability 0.06, which is equal to that of symbol x_4. We again
have a choice whether to put the combined probability higher than, or lower than, the symbol x_4.
Each time we make a choice (or flip a fair coin) we end up changing the final codewords for the
symbols. In Fig. 1.13, each time we have to make a choice between two probabilities that are equal,
we put the probability of the combined symbols at a higher level.
Fig. 1.13 Alternative Way of Huffman Coding in Example 1.12 which Leads to a Different Code.
Symbol    Probability    Self-Information    Codeword
x1        0.46           1.1203              1
x2        0.30           1.7370              00
x3        0.12           3.0589              011
x4        0.06           4.0589              0101
x5        0.03           5.0589              01001
x6        0.02           5.6439              010000
x7        0.01           6.6439              010001

The entropy of the source is

$$H(X) = -\sum_{k=1}^{7} p_k \log_2 p_k = 1.9781 \text{ bits,}$$

and the average number of bits per symbol is

$$\bar{R} = \sum_{k=1}^{7} n(x_k)P(x_k) = 1(0.46) + 2(0.30) + 3(0.12) + 4(0.06) + 5(0.03) + 6(0.02) + 6(0.01) = 1.9900 \text{ bits.}$$

The efficiency of this code is η = (1.9781/1.9900) = 0.9940. Thus both codes are equally efficient.
In the above examples, encoding is done symbol by symbol. A more efficient procedure
is to encode blocks of B symbols at a time. In this case the bounds of the source coding
theorem become

$$BH(X) \le \bar{R}_B < BH(X) + 1$$

since the entropy of a B-symbol block is simply BH(X), and R̄_B is the average number of
bits per B-symbol block. We can rewrite the bound as

$$H(X) \le \frac{\bar{R}_B}{B} < H(X) + \frac{1}{B} \qquad (1.34)$$

where R̄_B / B = R̄ is the average number of bits per source symbol. Thus, R̄ can be made
arbitrarily close to H(X) by selecting a large enough block length B.
Example 1.13 Consider the source symbols and their respective probabilities listed below.
Symbol    Probability    Self-Information    Codeword
x1        0.40           1.3219              1
x2        0.35           1.5146              00
x3        0.25           2.0000              01

For this code, the entropy of the source is

$$H(X) = -\sum_{k=1}^{3} p_k \log_2 p_k = 1.5589 \text{ bits.}$$

The average number of binary digits per symbol is

$$\bar{R} = \sum_{k=1}^{3} n(x_k)P(x_k) = 1(0.40) + 2(0.35) + 2(0.25) = 1.60 \text{ bits,}$$

and the efficiency of this code is η = (1.5589/1.6000) = 0.9743.
We now group the symbols two at a time and again apply the Huffman encoding
algorithm. The probabilities of the symbol pairs, in decreasing order, are listed below.

Symbol Pair    Probability    Self-Information    Codeword
x1x1           0.1600         2.6439              10
x1x2           0.1400         2.8365              001
x2x1           0.1400         2.8365              010
x2x2           0.1225         3.0291              011
x1x3           0.1000         3.3219              111
x3x1           0.1000         3.3219              0000
x2x3           0.0875         3.5146              0001
x3x2           0.0875         3.5146              1100
x3x3           0.0625         4.0000              1101
For this code, the entropy is

$$2H(X) = -\sum_{k=1}^{9} p_k \log_2 p_k = 3.1177 \text{ bits,} \quad\Rightarrow\quad H(X) = 1.5589 \text{ bits.}$$

Note that the source entropy has not changed! The average number of bits per block (symbol pair) is

$$\bar{R}_B = \sum_{k=1}^{9} n(x_k)P(x_k) = 2(0.1600) + 3(0.1400) + 3(0.1400) + 3(0.1225) + 3(0.1000) + 4(0.1000) + 4(0.0875) + 4(0.0875) + 4(0.0625) = 3.1775 \text{ bits per symbol pair,}$$

so R̄ = 3.1775/2 = 1.5888 bits per symbol, and the efficiency of this code is η = (1.5589/1.5888) = 0.9812.
Thus we see that grouping two letters to make a symbol has improved the coding efficiency.
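A quick numerical check of these figures, using the pair probabilities and codeword lengths tabulated above, might look like:

```python
import math

# Pair probabilities and codeword lengths from the table of Example 1.13
pair_probs   = [0.16, 0.14, 0.14, 0.1225, 0.10, 0.10, 0.0875, 0.0875, 0.0625]
pair_lengths = [2, 3, 3, 3, 3, 4, 4, 4, 4]

H_pair = -sum(p * math.log2(p) for p in pair_probs)         # = 2*H(X) ~ 3.1177 bits
R_B = sum(n * p for n, p in zip(pair_lengths, pair_probs))  # ~ 3.1775 bits per pair
print(H_pair / 2, R_B / 2, (H_pair / 2) / (R_B / 2))
# ~1.5589 bits/symbol, ~1.5888 bits/symbol, efficiency ~0.981
```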
Example 1.14 Consider the source symbols and their respective probabilities listed below.
Symbol    Probability    Self-Information    Codeword
x1        0.50           1.0000              1
x2        0.30           1.7370              00
x3        0.20           2.3219              01

For this code, the entropy of the source is

$$H(X) = -\sum_{k=1}^{3} p_k \log_2 p_k = 1.4855 \text{ bits.}$$

The average number of bits per symbol is

$$\bar{R} = \sum_{k=1}^{3} n(x_k)P(x_k) = 1(0.50) + 2(0.30) + 2(0.20) = 1.50 \text{ bits,}$$

and the efficiency of this code is η = (1.4855/1.5000) = 0.9903.
We now group together the symbols, two at a time, and again apply the Huffman encoding
algorithm. The probabilities of the symbol pairs, in decreasing order, are listed as follows.
Symbol Pair    Probability    Self-Information    Codeword
x1x1           0.25           2.0000              00
x1x2           0.15           2.7370              010
x2x1           0.15           2.7370              011
x1x3           0.10           3.3219              100
x3x1           0.10           3.3219              110
x2x2           0.09           3.4739              1010
x2x3           0.06           4.0589              1011
x3x2           0.06           4.0589              1110
x3x3           0.04           4.6439              1111

For this code, the entropy is

$$2H(X) = -\sum_{k=1}^{9} p_k \log_2 p_k = 2.9710 \text{ bits,} \quad\Rightarrow\quad H(X) = 1.4855 \text{ bits.}$$

The average number of bits per block (symbol pair) is

$$\bar{R}_B = \sum_{k=1}^{9} n(x_k)P(x_k) = 2(0.25) + 3(0.15) + 3(0.15) + 3(0.10) + 3(0.10) + 4(0.09) + 4(0.06) + 4(0.06) + 4(0.04) = 3.00 \text{ bits per symbol pair,}$$

so R̄ = 3.00/2 = 1.5000 bits per symbol, and the efficiency of this code is η_2 = (1.4855/1.5000) = 0.9903.
In this case, grouping together two letters at a time has not increased the efficiency of the code!
However, if we group 3 letters at a time (triplets) and then apply Huffman coding, we obtain a
code efficiency of η_3 = 0.9932. Upon grouping four letters at a time we see a further improvement
(η_4 = 0.9946).
1.7 THE LEMPEL-ZIV ALGORITHM
Huffman coding requires the symbol probabilities. But most real life scenarios do not provide
the symbol probabilities in advance (i.e., the statistics of the source is unknown). In principle, it
is possible to observe the output of the source for a long enough time period and estimate the
symbol probabilities. However, this is impractical for real-time application. Also, while
Huffman coding is optimal for a DMS source where the occurrence of one symbol does not
alter the probabilities of the subsequent symbols, it is not the best choice for a source with
memory. For example, consider the problem of compression of written text. We know that
many letters occur in pairs or groups, like 'q-u', 't-h', 'i-n-g' etc. It would be more efficient to use
the statistical inter-dependence of the letters in the alphabet along with their individual
probabilities of occurrence. Such a scheme was proposed by Lempel and Ziv in 1977. Their
source coding algorithm does not need the source statistics. It is a Variable-to-Fixed Length
Source Coding Algorithm and belongs to the class of universal source coding algorithms.
The logic behind Lempel-Ziv universal coding is as follows. The compression of an arbitrary
sequence of bits is possible by coding a series of O's and 1's as some previous such string (the
prefix string) plus one new bit (called innovation bit). Then, the new string formed by adding
the new bit to the previously used prefix string becomes a potential prefix string for future
strings. These variable length blocks are called phrases. The phrases are listed in a dictionary
which stores the existing phrases and their locations. In encoding a new phrase, we specify the
location of the existing phrase in the dictionary and append the new letter. We can derive a
better understanding of how the Lempel-Ziv algorithm works by the following example.
Example 1.15 Suppose we wish to code the string: 101011011010101011. We will begin by
parsing it into comma-separated phrases that represent strings that can be represented by a
previous string as a prefix, plus a bit.
The first bit, a 1, has no predecessors, so, it has a null prefix string and the one extra bit is itself:
1, 01011011010101011
The same goes for the 0 that follows, since it can't be expressed in terms of the only existing prefix:
1, 0, 1011011010101011
So far our dictionary contains the strings '1' and '0'. Next we encounter a 1, but it already exists in
our dictionary. Hence we proceed further. The following 10 is obviously a combination of the
prefix 1 and a 0, so we now have:
1, 0, 10, 11011010101011
Continuing in this way we eventually parse the whole string as follows:
1, 0, 10, 11, 01, 101, 010, 1011
Now, since we found 8 phrases, we will use a three bit code to label the null phrase and the first
seven phrases for a total of 8 numbered phrases. Next, we write the string in terms of the number of
the prefix phrase plus the new bit needed to create the new phrase. We will use parentheses and
commas to separate these at first, in order to aid our visualization of the process. The eight phrases
can be described by:
(000,1), (000,0), (001,0), (001,1), (010,1), (011,1), (101,0), (110,1).
It can be read out as: (codeword at location 0,1), (codeword at location 0,0), (codeword at
location 1,0), (codeword at location 1,1), (codeword at location 2,1), (codeword at location 3,1),
and so on.
Thus the coded version of the above string is:
00010000001000110101011110101101.
The dictionary for this example is given in Table 1.1. In this case, we have not obtained any
compression, our coded string is actually longer! However, the larger the initial string, the more
saving we get as we move along, because prefixes that are quite large become representable as
small numerical indices. In fact, Ziv proved that for long documents, the compression of the file
approaches the optimum obtainable as determined by the information content ofthe document.
Table 1.1 Dictionary for the Lempel-Ziv algorithm

Dictionary Location    Dictionary Content    Fixed Length Codeword
001                    1                     0001
010                    0                     0000
011                    10                    0010
100                    11                    0011
101                    01                    0101
110                    101                   0111
111                    010                   1010
                       1011                  1101
The next question is what the length of the table should be. In practical applications, regardless
of the length of the table, it will eventually overflow. This problem can be solved by
pre-deciding a large enough size of the dictionary. The encoder and decoder can update their
dictionaries by periodically substituting the less used phrases from their dictionaries by more
frequently used ones. The Lempel-Ziv algorithm is widely used in practice. The compress and
uncompress utilities of the UNIX operating system use a modified version of this algorithm.
The standard algorithms for compressing binary files use code words of 12 bits and transmit 1
extra bit to indicate a new sequence. Using such a code, the Lempel-Ziv algorithm can compress
transmissions of English text by about 55 per cent, whereas the Huffman code compresses the
transmission by only 43 per cent.
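The parsing step of Example 1.15 can be mimicked in a few lines; the sketch below (a simplified LZ78-style parser, not the exact dictionary management used by practical implementations) splits a bit string into phrases, each consisting of a previously seen phrase plus one innovation bit:

```python
def lz_parse(bits: str):
    """Parse a bit string into Lempel-Ziv phrases: each phrase is a
    previously seen phrase (possibly the null phrase) plus one new bit."""
    dictionary = {"": 0}          # phrase -> index; index 0 is the null phrase
    phrases = []                  # list of (prefix_index, innovation_bit)
    current = ""
    for b in bits:
        if current + b in dictionary:
            current += b          # keep extending until the phrase is new
        else:
            phrases.append((dictionary[current], b))
            dictionary[current + b] = len(dictionary)
            current = ""
    return phrases

print(lz_parse("101011011010101011"))
# [(0,'1'), (0,'0'), (1,'0'), (1,'1'), (2,'1'), (3,'1'), (5,'0'), (6,'1')]
# i.e., the phrases 1, 0, 10, 11, 01, 101, 010, 1011 of Example 1.15.
```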
In the following section we will study another type of source coding scheme, particularly
useful for facsimile transmission and image compression.
1.8 RUN LENGTH ENCODING AND THE PCX FORMAT
Run-Length Encoding or RLE is a technique used to reduce the size of a repeating string of
characters. This repeating string is called a run. Typically RLE encodes a run of symbols into
two bytes, a count and a symbol. RLE can compress any type of data regardless of its
information content, but the content of the data to be compressed affects the compression ratio.
RLE cannot achieve high compression ratios compared to other compression methods, but it is
easy to implement and is quick to execute. RLE is supported by most bitmap file formats such
as TIFF, BMP and PCX.
Example 1.16 Consider the following bit stream:

11111111111111100000000000000000001111

This can be represented as: fifteen 1's, nineteen 0's, four 1's, i.e., (15,1), (19,0), (4,1). Since the
maximum number of repetitions is 19, which can be represented with 5 bits, we can encode the bit
stream as (01111,1), (10011,0), (00100,1). The compression ratio in this case is 18:38 = 1:2.11.
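A minimal run-length encoder along these lines (illustrative only; practical formats add flag bytes, length limits and byte packing) might look like:

```python
def rle_encode(bits: str):
    """Encode a bit string as a list of (run_length, symbol) pairs."""
    runs = []
    count = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((count, prev))
            count = 1
    if bits:
        runs.append((count, bits[-1]))
    return runs

print(rle_encode("1" * 15 + "0" * 19 + "1" * 4))
# [(15, '1'), (19, '0'), (4, '1')]
```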
RLE is highly suitable for FAX images of typical office documents. These two-colour images
(black and white) are predominantly white. If we spatially sample these images for conversion
into digital data, we find that many entire horizontal lines are entirely white (long runs of O's).
Furthermore, if a given pixel is black or white, the chances are very good that the next pixel will
match. The code for fax machines is actually a combination of a run-length code and a Huffman
code. A run-length code maps run lengths into code words, and the codebook is partitioned into
two parts. The first part contains symbols for runs of lengths that are a multiple of 64; the second
part is made up of runs from 0 to 63 pixels. Any run length would then be represented as a
multiple of 64 plus some remainder. For example, a run of 205 pixels would be sent using the
code word for a run of length 192 (3 x 64) plus the code word for a run of length 13. In this way
the number of bits needed to represent the run is decreased significantly. In addition, certain
runs that are known to have a higher probability of occurrence are encoded into code words of
short length, further reducing the number of bits that need to be transmitted. Using this type of
encoding, typical compressions for facsimile transmission range between 4 to 1 and 8 to 1.
Coupled with higher modem speeds, these compressions reduce the transmission time of a single
page to less than a minute.
Run length coding is also used for the compression of images in the PCX format. The PCX
format was introduced as part of the PC Paintbrush series of software for image painting and
editing, sold by the ZSoft company. Today, the PCX format is actually an umbrella name for
several image compression methods and a means to identify which has been applied. We will
restrict our attention here to only one of the methods, for 256-colour images. We will restrict
ourselves to that portion of the PCX data stream that actually contains the coded image, and not
those parts that store the colour palette and image information such as the number of lines, pixels
per line, file size and the coding method.
The basic scheme is as follows. If a string of pixels are identical in colour value, encode them
as a special flag byte which contains the count followed by a byte with the value of the repeated
pixel. If the pixel is not repeated, simply encode it as the byte itself. Such simple schemes can
often become more complicated in practice. Consider that in the above scheme, if all 256
colours in a palette are used in an image, then we need all 256 values of a byte to represent
those colours. Hence, if we are going to use just bytes as our basic code unit, we do not have any
unused byte values that can be used as a flag/count byte. On the other hand, if we use
two bytes for every coded pixel to leave room for the flag/count combinations, we might double
the size of pathological images instead of compressing them.
The compromise in the PCX format is based on the belief of its designers that many user-
created drawings (which were the primary intended output of their software) would not use all
256 colours. So, they optimized their compression scheme for the case of up to 192 colours only.
Images with more colours will also probably get good compression, just not quite as good, with
this scheme.
Example 1.17 PCX compression encodes single occurrences of colour (that is, a pixel that is not
part of a run of the same colour) 0 through 191 simply as the binary byte representation of exactly
that numerical value. Consider Table 1.2.
Table 1.2 Example of PCX encoding

Pixel colour value   Hex code   Binary code
0                    00         00000000
1                    01         00000001
2                    02         00000010
3                    03         00000011
190                  BE         10111110
191                  BF         10111111
For the colour 192 (and all the colours higher than 192), the codeword is equal to one byte in which the
two most significant bits (MSBs) are both set to 1. We will use these codewords to signify a flag and
count byte. If the two MSBs are equal to one, we will say that they have flagged a count. The remaining
6 bits in the flag/count byte will be interpreted as a 6-bit binary number for the count (from 0 to 63).
This byte is then followed by the byte which represents the colour. In fact, if we have a run of pixels of
one of the colours with palette code over 191, we can still code the run easily, since the top two bits
are not reserved in this second, colour code byte of the run coding byte pair.
If a run of pixels exceeds 63 in length, we simply use this code for the first 63 pixels in the run
and then code additional runs of that pixel until we exhaust all the pixels in the run. The next question
is: how do we code those remaining colours in a nearly full palette image when there is no run? We
still code these as a run by simply setting the run length to 1. That means, for the case of at most 64
colours which appear as single pixels in the image and not as part of runs, we expand the data by a
factor of two. Luckily this rarely happens!
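A simplified sketch of the flag/count scheme just described is given below. It covers only the coded-image portion (no palette or header fields), and the function name and sample pixel values are assumptions for illustration.

```python
from itertools import groupby

def pcx_encode(pixels):
    """PCX-style run-length coding of 256-colour pixels: a flag byte with both MSBs set
    (0xC0-0xFF) carries a 6-bit run length and is followed by the colour byte; a single
    pixel with colour value below 0xC0 (192) is stored as the byte itself."""
    out = bytearray()
    for colour, group in groupby(pixels):
        run = len(list(group))
        while run > 0:
            n = min(run, 63)                  # runs longer than 63 are split up
            if n == 1 and colour < 0xC0:
                out.append(colour)            # single pixel, colour 0-191: just the byte
            else:
                out.append(0xC0 | n)          # flag/count byte
                out.append(colour)            # colour byte (its top two bits are not reserved)
            run -= n
    return bytes(out)

print(pcx_encode([5, 5, 5, 5, 200, 7]).hex())   # 'c405c1c807'
```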
In the next section, we will study coding for analog sources. Recall that we ideally need an
infinite number of bits to accurately represent an analog source. Anything fewer will only be an
approximate representation. We can choose to use fewer and fewer bits for representation at
the cost of a poorer approximation of the original signal. Thus, quantization of the amplitudes of
the sampled signals results in data compression. We would like to study the distortion
introduced when the samples from the information source are quantized.
1.9 RATE DISTORTION FUNCTION
Although we live in an analog world, most of the communication takes place in digital form.
Since most natural sources (e.g. speech, video etc.) are analog, they are first sampled, quantized
and then processed. Consider an analog message waveform x (t) which is a sample waveform of
a stochastic process X(t). Assuming X(t) is a bandlimited, stationary process, it can be
represented by a sequence of uniform samples taken at the Nyquist rate. These samples are
quantized in amplitude and encoded as a sequence of bits. A simple encoding strategy can be to
define L levels and encode every sample using
R = log2 L bits if L is a power of 2, or
R = ⌊log2 L⌋ + 1 bits if L is not a power of 2.
If all levels are not equally probable, we may use entropy coding for a more efficient
representation. In order to represent the analog waveform more accurately, we need more
levels, which would imply more bits per sample. Theoretically, we need infinitely many bits
per sample to perfectly represent an analog source. Quantization of amplitude
results in data compression at the cost of signal integrity. It is a form of lossy data compression
where some measure of difference between the actual source samples {x_k} and the
corresponding quantized values {x̂_k} constitutes the distortion.
Definition 1.13 The squared-error distortion is defined as
d(x_k, x̂_k) = (x_k - x̂_k)²
In general, a distortion measure may be represented as
d(x_k, x̂_k) = |x_k - x̂_k|^p
Consider a sequence of n samples, X_n, and the corresponding n quantized values, X̂_n.
Let d(x_k, x̂_k) be the distortion measure per sample (letter). Then the distortion
measure between the original sequence and the sequence of quantized values will
simply be the average over the n source output samples, i.e.,
d(X_n, X̂_n) = (1/n) Σ_{k=1}^{n} d(x_k, x̂_k)
We observe that the source output is a random process, hence X_n and consequently
d(X_n, X̂_n) are random variables. We now define the distortion as follows.
Definition 1.14 The distortion between a sequence of n samples, X_n, and their
corresponding n quantized values, X̂_n, is defined as
D = E[d(X_n, X̂_n)] = (1/n) Σ_{k=1}^{n} E[d(x_k, x̂_k)] = E[d(x_k, x̂_k)]
It has been assumed here that the random process is stationary.
Next, let a memoryless source have a continuous output X and the quantized output
alphabet X̂. Let the probability density function of this continuous amplitude be p(x)
and the per letter distortion measure be d(x, x̂), where x ∈ X and x̂ ∈ X̂. We next
introduce the rate distortion function, which gives us the minimum number of bits per
sample required to represent the source output symbols given a prespecified
allowable distortion.
Definition 1.15 The minimum rate (in bits/source output) required to represent the
output X of the memoryless source with a distortion less than or equal to D is called
the rate distortion function R(D), defined as
R(D) = min_{p(x̂|x): E[d(X, X̂)] ≤ D} I(X; X̂)
where I(X; X̂) is the average mutual information between X and X̂.
We will now state (without proof) two theorems related to the rate distortion function.
Theorem 1.3 The minimum information rate necessary to represent the output of a discrete
time, continuous amplitude memoryless Gaussian source with variance σ_x², based on a
mean square-error distortion measure per symbol, is
R_g(D) = (1/2) log2(σ_x²/D)   for 0 ≤ D ≤ σ_x²
R_g(D) = 0                    for D > σ_x²
Consider the two cases:
(i) D ≥ σ_x²: For this case there is no need to transfer any information. For the
reconstruction of the samples (with distortion greater than or equal to the variance)
one can use statistically independent, zero mean Gaussian noise samples with
variance σ_x².
(ii) D < σ_x²: For this case the number of bits per output symbol decreases monotonically
as D increases. The plot of the rate distortion function is given in Fig. 1.14.
Fig. 1.14 Plot of R_g(D) versus D/σ_x².
Theorem 1.4 There exists an encoding scheme that maps the source output into codewords
such that for any given distortion D, the minimum rate R(D) bits per sample is sufficient to
reconstruct the source output with an average distortion that is arbitrarily close to D.
Thus, the rate distortion function for any source gives the lower bound on the source rate
that is possible for a given level of distortion.
Definition 1.16 The distortion rate function for a discrete time, memoryless
Gaussian source is defined as
D_g(R) = 2^(-2R) σ_x²
Example 1.18 For a discrete time, memoryless Gaussian source, the distortion (in dB) as a
function of the rate can be expressed as
10 log10 D_g(R) = -6R + 10 log10 σ_x².
Thus the mean square distortion decreases at a rate of 6 dB/bit.
The rate distortion function of a discrete time, memoryless continuous amplitude source with zero
mean and finite variance σ_x², with respect to the mean square error distortion measure D, is upper
bounded as
R(D) ≤ (1/2) log2(σ_x²/D),   0 ≤ D ≤ σ_x².
This upper bound can be intuitively understood as follows. We know that for a given variance,
the zero mean Gaussian random variable exhibits the maximum differential entropy attainable by
any random variable. Hence, for a given distortion, the minimum number of bits per sample
required is upper bounded by that of the Gaussian random variable.
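Both R_g(D) and D_g(R) are easy to evaluate numerically. The sketch below (function names are illustrative) reproduces the 6 dB/bit behaviour of Example 1.18:

```python
import numpy as np

def rate_distortion_gaussian(D, var=1.0):
    """R_g(D) of a discrete time, memoryless Gaussian source with variance var,
    under the mean square error distortion measure."""
    D = np.asarray(D, dtype=float)
    return np.where(D < var, 0.5 * np.log2(var / np.maximum(D, 1e-300)), 0.0)

def distortion_rate_gaussian(R, var=1.0):
    """D_g(R) = 2**(-2R) * var, the corresponding distortion rate function."""
    return 2.0 ** (-2.0 * np.asarray(R, dtype=float)) * var

print(rate_distortion_gaussian([0.1, 0.25, 1.0]))        # [1.661  1.0  0.0] bits/sample
print(10 * np.log10(distortion_rate_gaussian([1, 2])))   # [-6.02  -12.04] dB for unit variance
```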
The next obvious question is: What would be a good design for a quantizer? Is there a way to
construct a quantizer that minimizes the distortion without using too many bits? We shall find
the answers to these questions in the next section.
1.10 OPTIMUM QUANTIZER DESIGN
In this section, we look at optimum quantizer design. Consider a continuous amplitude signal
whose amplitude is not uniformly distributed, but varies according to a certain probability
density function, p(x). We wish to design the optimum scalar quantizer that minimizes some
function of the quantization error q = x̂ - x, where x̂ is the quantized value of x. The distortion
resulting from the quantization can be expressed as
D = ∫_{-∞}^{∞} f(x̂ - x) p(x) dx,
where f(x̂ - x) is the desired function of the error. An optimum quantizer is one that minimizes
D by optimally selecting the output levels and the corresponding input range of each output
level. The resulting optimum quantizer is called the Lloyd-Max Quantizer. For an L-level
quantizer the distortion is given by
D = Σ_{k=1}^{L} ∫_{x_{k-1}}^{x_k} f(x̂_k - x) p(x) dx
The necessary conditions for minimum distortion are obtained by differentiating D with
respect to {x_k} and {x̂_k}. As a result of the differentiation process we end up with the following
system of equations
f(x_k - x̂_k) = f(x_k - x̂_{k+1}),   k = 1, 2, ..., L - 1
∫_{x_{k-1}}^{x_k} f′(x̂_k - x) p(x) dx = 0,   k = 1, 2, ..., L
For f(x) = x², i.e., the mean square value of the distortion, the above equations simplify to
x_k = (1/2)(x̂_k + x̂_{k+1}),   k = 1, 2, ..., L - 1
∫_{x_{k-1}}^{x_k} (x̂_k - x) p(x) dx = 0,   k = 1, 2, ..., L
The nonuniform quantizers are optimized with respect to the distortion. However, each
quantized sample is represented by an equal number of bits (say, R bits/sample). It is possible to
have a more efficient representation using a variable length code (VLC). The discrete source
outputs that result from quantization are characterized by a set of probabilities p_k. These
probabilities are then used to design an efficient VLC (source coding). In order to compare the
performance of different nonuniform quantizers, we first fix the distortion, D, and then compare
the average number of bits required per sample.
Example 1.19 Consider an eight level quantizer for a Gaussian random variable. This problem
was first solved by Max in 1960. The random variable has zero mean and variance equal to unity.
For a mean square error minimization, the values x_k and x̂_k are listed in Table 1.3.

Table 1.3 Optimum quantization and Huffman coding

Level   x_k       x̂_k       P_k     Huffman Code
1       -1.748    -2.152    0.040   0010
2       -1.050    -1.344    0.107   011
3       -0.500    -0.756    0.162   010
4        0        -0.245    0.191   10
5        0.500     0.245    0.191   11
6        1.050     0.756    0.162   001
7        1.748     1.344    0.107   0000
8        ∞         2.152    0.040   0011
For these values, D = 0.0345 which equals -14.62 dB.
The number of bits/sample for this optimum 8-level quantizer is R = 3. On performing Huffman
coding, the average number of bits per sample required is RH = 2.88 bits/sample. The theoretical
limit is H(X) = 2.82 bits/sample.
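The table values can be approximated numerically by iterating the two Lloyd-Max conditions (boundaries at the midpoints of adjacent output levels, output levels at the centroids of their regions). The sketch below runs the iteration on samples drawn from a unit-variance Gaussian rather than on the pdf itself, so the results should come out close to, but not exactly equal to, the entries of Table 1.3:

```python
import numpy as np

def lloyd_max(samples, L, iters=100):
    """Iterative Lloyd-Max design of an L-level scalar quantizer for squared-error distortion,
    using an empirical set of samples in place of the density p(x)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    levels = np.linspace(samples[0], samples[-1], L)            # initial output levels
    for _ in range(iters):
        boundaries = 0.5 * (levels[:-1] + levels[1:])           # x_k = (x̂_k + x̂_{k+1}) / 2
        regions = np.digitize(samples, boundaries)
        levels = np.array([samples[regions == k].mean() if np.any(regions == k) else levels[k]
                           for k in range(L)])                  # centroid condition
    boundaries = 0.5 * (levels[:-1] + levels[1:])
    distortion = np.mean((samples - levels[np.digitize(samples, boundaries)]) ** 2)
    return levels, distortion

rng = np.random.default_rng(0)
levels, D = lloyd_max(rng.standard_normal(200_000), L=8)
print(np.round(levels, 3))   # close to ±2.152, ±1.344, ±0.756, ±0.245
print(round(D, 4))           # close to 0.0345
```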
1.11 INTRODUCTION TO IMAGE COMPRESSION
Earlier in this chapter we discussed the coding of data sets for compression. By applying these
techniques we can store or transmit all of the information content of a string of data with fewer
bits than are in the source data. The minimum number of bits that we must use to convey all the
information in the source data is determined by the entropy measure of the source. Good
compression ratios can be obtained via entropy encoders and universal encoders for sufficiently
large source data blocks. In this section, we look at compression techniques used to store and
transmit image data.
Images can be sampled and quantized sufficiently finely so that a binary data stream can
represent the original data to an extent that is satisfactory to the most discerning eye. Since we
can represent a picture by anything from a thousand to a million bytes of data, we should be
able to apply the techniques studied earlier directly to the task of compressing that data for
storage and transmission. First, we consider the following points:
1. High quality images are represented by very large data sets. A photographic quality
image may require 40 to 100 million bits for representation. These large file sizes drive
the need for extremely high compression ratios to make storage and transmission
(particularly of movies) practical.
2. Applications that involve imagery such as television, movies, computer graphical user
interfaces, and the World Wide Web need to be fast in execution and transmission across
distribution networks, particularly if they involve moving images, to be acceptable to the
human eye.
3. Imagery is characterised by higher redundancy than is true of other data. For example, a
pair of adjacent horizontal lines in an image is nearly identical, while two adjacent lines
of text in a book are generally different.
The first two points indicate that the highest level of compression technology needs to be
used for the movement and storage of image data. The third factor indicates that high
compression ratios can be achieved. The third factor also says that some special compression
techniques may be possible to take advantage of the structure and properties of image data. The
close relationship between neighbouring pixels in an image can be exploited to improve the
compression ratios. This has a very important implication for the task of coding and decoding
image data for real-time applications.
Another interesting point to note is that the human eye is highly tolerant to approximation
error in an image. Thus, it may be possible to compress the image data in a manner in which the
less important details (to the human eye) can be ignored. That is, by trading off some of the
quality of the image we might obtain a significantly reduced data size. This technique is called
Lossy Compression, as opposed to the Lossless Compression techniques discussed earlier.
Such liberty cannot be taken with, say, financial or textual data! Lossy Compression can only be
applied to data such as images and audio where the deficiencies are masked by the tolerance of
the human senses of sight and hearing.
1.12 THE JPEG STANDARD FOR LOSSLESS COMPRESSION
The Joint Photographic Experts Group (JPEG) was formed jointly by two 'standards'
organisations: the CCITT (the international telephone and telegraph standards body, now the
ITU-T) and the International Standards Organisation (ISO). Let us now consider the lossless
compression option of the JPEG Image Compression Standard, which is a description of 29 distinct coding
systems for compression of images. Why are there so many approaches? It is because the needs
of different users vary so much with respect to quality versus compression and compression
versus computation time that the committee decided to provide a broad selection from which to
choose. We shall briefly discuss here two methods that use entropy coding.
The two lossless JPEG compression options discussed here differ only in the form of the
entropy code that is applied to the data. The user can choose either a Huffman Code or an
Arithmetic Code. We will not treat the Arithmetic Code concept in much detail here.
However, we will summarize its main features:
Arithmetic Code, like Huffman Code, achieves compression in transmission or storage by
using the probabilistic nature of the data to render the information with fewer bits than used in
the source data stream. Its primary advantage over the Huffman Code is that it comes closer to
the Shannon entropy limit of compression for data streams that involve a relatively small
alphabet. The reason is that Huffman codes work best (highest compression ratios) when the
probabilities of the symbols can be expressed as fractions of powers of two. The Arithmetic
code construction is not closely tied to these particular values, as is the Huffman code. The
computation of coding and decoding Arithmetic codes is costlier than that of Huffman codes.
Typically a 5 to 10% reduction in file size is seen with the application of Arithmetic codes over
that obtained with Huffman coding.
Some compression can be achieved if we can predict the next pixel using the previous pixels. In
this way we just have to transmit the prediction coefficients (or the difference in the values) instead of
the entire pixel. The predictive process that is used in the lossless JPEG coding schemes to form the
innovations data is also variable. However, in this case, the variation is not based upon the user's
choice but is made, for any image, on a line by line basis. The choice is made according to the
prediction method that yields the best prediction overall for the entire line.
There are eight prediction methods available in the JPEG coding standards. One of the eight
(which is the no prediction option) is not used for the lossless coding option that we are
examining here. The other seven may be divided into the following categories:
1. Predict the next pixel on the line as having the same value as the last one.
2. Predict the next pixel on the line as having the same value as the pixel in this position on
the previous line (that is, above it).
3. Predict the next pixel on the line as having a value related to a combination of the
previous, above and previous to the above pixel values. One such combination is simply
the average of the other three.
The differential encoding used in the JPEG standard consists of the differences between the
actual image pixel values and the predicted values. As a result of the smoothness and
redundancy present in most pictures, these differences give rise to relatively small positive and
negative numbers that represent the small typical error in the prediction. Hence, the
probabilities associated with these values are large for the small innovation values and quite
small for large ones. This is exactly the kind of data stream that compresses well with an entropy
code.
The typical lossless compression for natural images is 2: 1. While this is substantial, it does
not in general solve the problem of storing or moving large sequences of images as encountered
in high quality video.
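A rough sketch of how such predictors and the resulting innovations (prediction residuals) can be formed is given below. The predictor numbering follows the usual lossless JPEG convention (1: left, 2: above, 3: above-left, 4-7: combinations); the function and the synthetic image are assumptions for illustration only.

```python
import numpy as np

def lossless_jpeg_residuals(image, predictor=7):
    """Prediction residuals for one of the lossless JPEG predictors.
    a = pixel to the left, b = pixel above, c = pixel above-left."""
    img = image.astype(int)
    a = np.roll(img, 1, axis=1)                       # left neighbour
    b = np.roll(img, 1, axis=0)                       # neighbour above
    c = np.roll(np.roll(img, 1, axis=0), 1, axis=1)   # neighbour above-left
    predictions = {1: a, 2: b, 3: c, 4: a + b - c,
                   5: a + (b - c) // 2, 6: b + (a - c) // 2, 7: (a + b) // 2}
    residuals = img - predictions[predictor]
    return residuals[1:, 1:]   # drop the first row and column, which have no true neighbours

img = np.tile(np.arange(8), (8, 1)) * 10 + 100        # a smooth synthetic 8 x 8 "image"
print(np.abs(lossless_jpeg_residuals(img)).max())     # small residuals compress well with an entropy code
```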
1.13 THE JPEG STANDARD FOR LOSSY COMPRESSION
The JPEG standard includes a set of sophisticated lossy compression options developed after a
study of image distortion acceptable to human senses. The JPEG lossy compression algorithm
consists of an image simplification stage, which removes the image complexity at some loss of
fidelity, followed by a lossless compression step based on predictive filtering and Huffman or
Arithmetic coding.
The lossy image simplification step, which we will call the image reduction, is based on the
exploitation of an operation known as the Discrete Cosine Transform (DCT), defined as follows.
Y(k, l) = Σ_{i=0}^{N-1} Σ_{j=0}^{M-1} 4 y(i, j) cos(πk(2i + 1)/2N) cos(πl(2j + 1)/2M)
where the input image is N pixels by M pixels, y(i, j) is the intensity of the pixel in row i and
column j, and Y(k, l) is the DCT coefficient in row k and column l of the DCT matrix. All DCT
multiplications are real. This lowers the number of required multiplications, as compared to the
Discrete Fourier Transform. For most images, much of the signal energy lies at low frequencies,
which appear in the upper left corner of the DCT. The lower right values represent higher
frequencies, and are often small (usually small enough to be neglected with little visible
distortion).
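The defining double sum can be coded directly. The sketch below implements the unnormalised form given above with a plain double loop (it is not an optimised or standard-library DCT):

```python
import numpy as np

def dct2(block):
    """2-D DCT of an N x M block:
    Y(k, l) = sum_i sum_j 4*y(i, j) * cos(pi*k*(2i+1)/(2N)) * cos(pi*l*(2j+1)/(2M))."""
    N, M = block.shape
    i, j = np.arange(N), np.arange(M)
    Y = np.empty((N, M))
    for k in range(N):
        for l in range(M):
            ck = np.cos(np.pi * k * (2 * i + 1) / (2 * N))   # row cosines
            cl = np.cos(np.pi * l * (2 * j + 1) / (2 * M))   # column cosines
            Y[k, l] = 4.0 * np.sum(block * np.outer(ck, cl))
    return Y

# A smooth 8 x 8 block: almost all of the energy lands in the DC and low frequency coefficients
block = np.tile(np.linspace(100, 110, 8), (8, 1))
coeffs = dct2(block)
print(round(coeffs[0, 0], 1), round(np.abs(coeffs[4:, 4:]).max(), 6))
```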
In the JPEG image reduction process, the DCT is applied to 8 by 8 pixel blocks of the image.
Hence, if the image is 256 by 256 pixels in size, we break it into 32 by 32 square blocks of 8 by
8 pixels and treat each one independently. The 64 pixel values in each block are transformed by
the DCT into a new set of 64 values. These new 64 values, known also as the DCT coefficients,
form a whole new way of representing an image. The DCT coefficients represent the spatial
frequency of the image sub-block. The upper left corner of the DCT matrix has low frequency
components and the lower right corner the high frequency components (see Fig. 1.15). The top
left coefficient is called the DC coefficient. Its value is proportional to the average value of the 8
by 8 block of pixels. The rest are called the AC coefficients.
So far we have not obtained any reduction simply by taking the DCT. However, due to the
nature of most natural images, maximum energy (information) lies in low frequency as opposed
to high frequency. We can represent the high frequency components coarsely, or drop them
altogether, without strongly affecting the quality of the resulting image reconstruction. This
leads to a lot of compression (lossy). The JPEG lossy compression algorithm does the following
operations:
1. First the lowest weights are trimmed by setting them to zero.
2. The remaining weights are quantized (that is, rounded off to the nearest of some number
of discrete code represented values), some more coarsely than others according to
observed levels of sensitivity of viewers to these degradations.
[Fig. 1.15 shows a typical 4 x 4 block of DCT values: the DC coefficient (4.32) and the other low
frequency coefficients sit in the upper left corner, and the higher frequency (AC) coefficients
towards the lower right.]
4.32  3.12  3.01  2.41
2.74  2.11  1.92  1.55
2.11  1.33  0.32  0.11
1.62  0.44  0.03  0.02
Fig. 1.15 Typical Discrete Cosine Transform (DCT) Values.
Now several lossless compression steps are applied to the weight data that results from the
above DCT and quantization process, for all the image blocks. We observe that the DC
coefficient, which represents the average image intensity, tends to vary slowly from one block of
8 x 8 pixels to the next. Hence, the prediction of this value from surrounding blocks works well.
We just need to send one DC coefficient and the difference between the DC coefficients of
successive blocks. These differences can also be source coded.
We next look at the AC coefficients. We first quantize them, which transforms most of the
high frequency coefficients to zero. We then use a zig-zag coding as shown in Fig. 1.16. The
purpose of the zig-zag coding is that we gradually move from the low frequency to high
frequency, avoiding abrupt jumps in the values. Zig-zag coding will lead to long runs of O's,
which are ideal for RLE followed by Huffman or Arithmetic coding.
4.32  3.12  3.01  2.41         4  3  3  2
2.74  2.11  1.92  1.55    →    3  2  2  2    →    4333222122200000
2.11  1.33  0.32  0.11         2  1  0  0
1.62  0.44  0.03  0.02         2  0  0  0
Fig. 1.16 An Example of Quantization followed by Zig-zag Coding.
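A small sketch of the quantization and zig-zag scan of Fig. 1.16, assuming rounding to the nearest integer as the (very coarse) quantizer:

```python
import numpy as np

dct_block = np.array([[4.32, 3.12, 3.01, 2.41],
                      [2.74, 2.11, 1.92, 1.55],
                      [2.11, 1.33, 0.32, 0.11],
                      [1.62, 0.44, 0.03, 0.02]])

quantized = np.rint(dct_block).astype(int)   # coarse quantization of the DCT coefficients

def zigzag(block):
    """Traverse a square block along its anti-diagonals, alternating direction, so that
    low frequency coefficients come first and the trailing zeros are grouped together."""
    n = block.shape[0]
    coords = [(i, j) for i in range(n) for j in range(n)]
    coords.sort(key=lambda p: (p[0] + p[1],
                               p[1] if (p[0] + p[1]) % 2 else p[0]))
    return [block[i, j] for i, j in coords]

print(''.join(str(v) for v in zigzag(quantized)))   # 4333222122200000
```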
The typically quoted performance for JPEG is that photographic quality images of natural
scenes can be preserved with compression ratios of up to about 20:1 or 25:1. Usable quality
(that is, for noncritical purposes) can result for compression ratios in the range of 200:1 up to
230:1.
1.14 CONCLUDING REMARKS
In 1948, Shannon published his landmark paper titled "A Mathematical Theory of
Communication". He begins this pioneering paper on information theory by observing that the
fundamental problem of communication is that of reproducing at one point either exactly or
approximately a message selected at another point. He then proceeds so thoroughly to establish
the foundations of information theory that his framework and terminology remain standard.
Shannon's theory was an immediate success with communications engineers and stimulated the
growth of a technology which led to today's Information Age. Shannon published many more
provocative and influential articles in a variety of disciplines. His master's thesis, "A Symbolic
Analysis of Relay and Switching Circuits", used Boolean algebra to establish the theoretical
underpinnings of digital circuits. This work has broad significance because digital circuits are
fundamental to the operation of modern computers and telecommunications systems.
Shannon was renowned for his eclectic interests and capabilities. A favourite story describes
him juggling while riding a unicycle down the halls of Bell Labs. He designed and built chess-
playing, maze-solving, juggling and mind-reading machines. These activities bear out Shannon's
claim that he was more motivated by curiosity than usefulness. In his words "I just wondered
how things were put together."
The Huffman code was created by the American scientist D. A. Huffman in 1952. Modified
Huffman coding is today used in the Joint Photographic Experts Group (JPEG) and Moving
Picture Experts Group (MPEG) standards.
A very efficient technique for encoding sources without needing to know their probabilities of
occurrence was developed in the 1970s by the Israelis Abraham Lempel and Jacob Ziv. The
compress and uncompress utilities of the UNIX operating system use a modified version of this
algorithm. The GIF format (Graphics Interchange Format), developed by CompuServe,
involves simply an application of the Lempel-Ziv-Welch (LZW) universal coding algorithm to
the image data.
And finally, to conclude this chapter we mention that Shannon, the father of Information
Theory, passed away on February 24, 2001. Excerpts from the obituary published in the New
York Times:
SUMMARY
• The Self-Information of the event X = x_i is given by I(x_i) = log(1/P(x_i)) = -log P(x_i).
• The Mutual Information I(x_i; y_j) between x_i and y_j is given by I(x_i; y_j) = log(P(x_i|y_j)/P(x_i)).
• The Conditional Self-Information of the event X = x_i given Y = y_j is defined as
I(x_i|y_j) = log(1/P(x_i|y_j)) = -log P(x_i|y_j).
• The Average Mutual Information between two random variables X and Y is given by
I(X; Y) = Σ_{i=1}^{n} Σ_{j=1}^{m} P(x_i, y_j) I(x_i; y_j) = Σ_{i=1}^{n} Σ_{j=1}^{m} P(x_i, y_j) log [P(x_i, y_j)/(P(x_i)P(y_j))].
For the case when X and Y are statistically independent, I(X; Y) = 0. The average mutual
information I(X; Y) ≥ 0, with equality if and only if X and Y are statistically independent.
• The Average Self-Information of a random variable X is given by H(X) = Σ_{i=1}^{n} P(x_i) I(x_i)
= -Σ_{i=1}^{n} P(x_i) log P(x_i). H(X) is called the entropy.
• The Average Conditional Self-Information, called the Conditional Entropy, is given by
H(X|Y) = Σ_{i=1}^{n} Σ_{j=1}^{m} P(x_i, y_j) log [1/P(x_i|y_j)]
• I(x_i; y_j) = I(x_i) - I(x_i|y_j) and I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X). Since I(X; Y)
≥ 0, it implies that H(X) ≥ H(X|Y).
• The Average Mutual Information between two continuous random variables X and Y
is given by I(X; Y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} p(x) p(y|x) log [p(y|x)/p(y)] dx dy
• The Differential Entropy of a continuous random variable X is given by
H(X) = -∫ p(x) log p(x) dx.
• The Average Conditional Entropy of a continuous random variable X given Y is
given by H(X|Y) = -∫∫ p(x, y) log p(x|y) dx dy.
• A necessary and sufficient condition for the existence of a binary code with codewords
having lengths n_1 ≤ n_2 ≤ ... ≤ n_L that satisfy the prefix condition is Σ_{k=1}^{L} 2^(-n_k) ≤ 1.
The efficiency of a prefix code is given by η = H(X)/R̄.
• Let X be the ensemble of letters from a DMS with finite entropy H(X). The Source
Coding Theorem suggests that it is possible to construct a code that satisfies the prefix
condition, and has an average length R̄ that satisfies the inequality H(X) ≤ R̄ < H(X) + 1.
Efficient representation of symbols leads to compression of data.
• Huffman Encoding and Lempel-Ziv Encoding are two popular source coding
techniques. In contrast to the Huffman coding scheme, the Lempel-Ziv technique is
independent of the source statistics. The Lempel-Ziv technique generates a Fixed Length
Code, whereas the Huffman code is a Variable Length Code.
• Run-Length Encoding or RLE is a technique used to reduce the size of a repeating
string of characters. This repeating string is called a run. Run-length encoding is
supported by most bitmap file formats such as TIFF, BMP and PCX.
• Distortion implies some measure of difference between the actual source samples {x_k}
and the corresponding quantized values {x̂_k}. The squared-error distortion is given by
d(x_k, x̂_k) = (x_k - x̂_k)². In general, a distortion measure may be represented as
d(x_k, x̂_k) = |x_k - x̂_k|^p.
• The Minimum Rate (in bits/source output) required to represent the output X of a
memoryless source with a distortion less than or equal to D is called the rate distortion
function R(D), defined as R(D) = min_{p(x̂|x): E[d(X, X̂)] ≤ D} I(X; X̂), where I(X; X̂) is the
average mutual information between X and X̂.
• The distortion resulting from quantization can be expressed as D = ∫_{-∞}^{∞} f(x̂ - x) p(x) dx,
where f(x̂ - x) is the desired function of the error. An optimum quantizer is one
that minimizes D by optimally selecting the output levels and the corresponding input
range of each output level. The resulting optimum quantizer is called the Lloyd-Max
quantizer.
• Quantization and source coding techniques (Huffman coding, arithmetic coding and run-
length coding) are used in the JPEG standard for image compression.
Obstacles are those frightful things you see when you take your eyes off your goal.
- Henry Ford (1863-1947)
PROBLEMS
1.1 Consider a DMS with source probabilities {0.30, 0.25, 0.20, 0.15, 0.10}. Find the source
entropy, H (X).
1.2 Prove that the entropy for a discrete source is a maximum when the output symbols are
equally probable.
1.3 Prove the inequality ln x ≤ x - 1. Plot the curves y1 = ln x and y2 = x - 1 to demonstrate the
validity of this inequality.
1.4 Show that I(X; Y) ≥ 0. Under what condition does the equality hold?
1.5 A source, X, has an infinitely large set of outputs with probability of occurrence given by
P(x_i) = 2^(-i), i = 1, 2, 3, .... What is the average self information, H(X), of this source?
1.6 Consider another geometrically distributed random variable X with P(x_i) = p(1 - p)^(i-1),
i = 1, 2, 3, .... What is the average self information, H(X), of this source?
1.7 Consider an integer valued random variable, X, given by P(X = n) = 1/(A n log² n), where
A = Σ_{n=2}^{∞} 1/(n log² n) and n = 2, 3, ..., ∞. Find the entropy, H(X).
1.8 Calculate the differential entropy, H(X), of the uniformly distributed random variable X
with the pdf
p(x) = 1/a for 0 ≤ x ≤ a, and 0 otherwise.
Plot the differential entropy, H(X), versus the parameter a (0.1 < a < 10). Comment on
the result.
1.9 Consider a DMS with source probabilities {0.35, 0.25, 0.20, 0.15, 0.05}.
(i) Determine the Huffman code for this source.
(ii) Determine the average length R of the codewords.
(iii) What is the efficiency η of the code?
1.10 Consider a DMS with source probabilities {0.20, 0.20, 0.15, 0.15, 0.10, 0.10, 0.05, 0.05}.
(i) Determine an efficient fixed length code for the source.
(ii) Determine the Huffman code for this source.
(iii) Compare the two codes and comment.
1.11 A DMS has three output symbols with probabilities {0.5, 0.4, 0.1}.
(i) Determine the Huffman code for this source and find the efficiency η.
(ii) Determine the Huffman code for this source taking two symbols at a time and find the
efficiency η.
(iii) Determine the Huffman code for this source taking three symbols at a time and find
the efficiency η.
1.12 For a source with entropy H(X), prove that the entropy of a B-symbol block is BH(X).
1.13 Let X and Y be random variables that take on values x1, x2, ..., xr and y1, y2, ..., ys
respectively. Let Z = X + Y.
(a) Show that H(Z|X) = H(Y|X).
(b) If X and Y are independent, then argue that H(Y) ≤ H(Z) and H(X) ≤ H(Z).
Comment on this observation.
(c) Under what condition will H(Z) = H(X) + H(Y)?
1.14 Determine the Lempel-Ziv code for the following bit stream
01001111100101000001010101100110000.
Recover the original sequence from the encoded stream.
1.15 Find the rate distortion function R(D) = min I(X; X̂) for a Bernoulli distributed X with
p = 0.5, where the distortion is given by
d(x, x̂) = 0 for x = x̂, 1 for x = 1, x̂ = 0, and ∞ for x = 0, x̂ = 1.
1.16 Consider a source X uniformly distributed on the set {1, 2, ..., m}. Find the rate distortion
function for this source with Hamming distortion defined as
d(x, x̂) = 0 for x = x̂, and 1 for x ≠ x̂.
COMPUTER PROBLEMS
1.17 Write a program that performs Huffman coding, given the source probabilities. It should
generate the code and give the coding efficiency.
1.18 Modify the above program so that it can group together n source symbols and then
generate the Huffman code. Plot the coding efficiency η versus n for the following source
symbol probabilities: {0.55, 0.25, 0.20}. For what value of n does the efficiency become
better than 0.9999? Repeat the exercise for following source symbol probabilities {0.45,
0.25, 0.15, 0.10, 0.05}.
1.19 Write a program that executes the Lempel Ziv algorithm. The input to the program can
be the English alphabets. It should convert the alphabets to their ASCII code and then
perform the compression routine. It should output the compression achieved. Using this
program, find out the compression achieved for the following strings of letters.
(i) The Lempel Ziv algorithm can compress the English text by about fifty five percent.
(ii) The cat cannot sit on the canopy of the car.
1.20 Write a program that performs run length encoding (RLE) on a sequence of bits and
gives the coded output along with the compression ratio. What is the output of the
program if the following sequence is fed into it:
1100000000111100000111111111111111111100000110000000.
Now feed back the encoded output to the program, i.e., perform the RLE two times on the
original sequence of bits. What do you observe? Corr:ment.
1.21 Write a program that takes in a 2n level gray scale image (n bits per pixel) and performs
the following operations:
(i) Breaks it up into 8 by 8 pixel blocks.
(ii) Performs DCT on each of the 8 by 8 blocks.
(iii) Quantizes the DCT coefficients by retaining only the m most significant bits (MSB),
where m ≤ n.
(iv) Performs the zig-zag coding followed by run length coding.
(v) Performs Huffman coding on the bit stream obtained above (think of a reasonable
way of calculating the symbol probabilities).
(vi) Calculates the compression ratio.
(vii) Performs the decompression (i.e., the inverse operation of the steps (v) back to (i)).
Perform image compression using this program for different values of m. Up to what
value of m is there no perceptible difference between the original image and the compressed
image?
2
Channel Capacity and Coding

Experimentalists think that it is a mathematical theorem while the mathematicians believe it to be an experimental fact.
(on the Gaussian curve)
- Lippmann, Gabriel (1845-1921)
2.1 INTRODUCTION
In the previous chapter we saw that most natural sources have inherent redundancies and it is
possible to compress data by removing these redundancies using different source coding
techniques. After efficient representation of source symbols by the minimum possible number
of bits, we transmit these bit-streams over channels (e.g., telephone lines, optical fibres etc.).
These bits may be transmitted as they are (for baseband communications), or after modulation
(for passband communications). Unfortunately, all real-life channels are noisy. The term noise
designates unwanted waves that disturb the transmission and processing of the wanted signals
in communication systems. The source of noise may be external to the system (e.g., atmospheric
noise, man generated noise etc.), or internal (e.g., thermal noise, shot noise etc.). In effect, the
bit stream obtained at the receiver is likely to be different from what was transmitted. In
passband communication, the demodulator processes the channel-corrupted waveform and
reduces each waveform to a scalar or a vector that represents an estimate of the transmitted data
symbols. The detector, which follows the demodulator, decides whether the transmitted bit is a
0 or a 1. This is called Hard Decision Decoding. This decision process at the decoder is
similar to a binary quantization with two levels. If there are more than 2 levels of quantization,
the detector is said to perform a Soft Decision Decoding.
The use of hard decision decoding causes an irreversible loss of information at the receiver.
Suppose the modulator sends only binary symbols but the demodulator has an alphabet with Q
symbols, and assuming the use of the quantizer as depicted in Fig. 2.1 (a), we have Q = 8. Such a
channel is called a binary input Q-ary output Discrete Memoryless Channel. The
corresponding channel is shown in Fig. 2.1 (b). The decoder performance depends on the
location of the representation levels of the quantizers, which in turn depends on the signal level
and the noise power. Accordingly, the demodulator must incorporate automatic gain control in
order to realize an effective multilevel quantizer. It is clear that the construction of such a
decoder is more complicated than the hard decision decoder. However, soft decision decoding
can provide significant improvement in performance over hard decision decoding.
Fig. 2.1 (a) Transfer Characteristic of Multilevel Quantizer
(b) Channel Transition Probability Diagram.
There are three balls that a digital communication engineer must juggle: (i) the transmitted
signal power, (ii) the channel bandwidth, and (iii) the reliability of the communication system
(in terms of the bit error rate). Channel coding allows us to trade off one of these commodities
(signal power, bandwidth or reliability) with respect to the others. In this chapter, we will study how
to achieve reliable communication in the presence of noise. We shall ask ourselves questions
like: how many bits per second can be sent over a channel of a given bandwidth and for a given
signal to noise ratio (SNR)? For that, we begin by studying a few channel models first.
2.2 CHANNEL MODELS
We have already come across the simplest of the channel models, the Binary Symmetric
Channel (BSC), in the previous chapter. If the modulator employs binary waveforms and the
detector makes hard decisions, then the channel may be viewed as one in which a binary bit
stream enters at the transmitting end and another bit stream comes out at the receiving end.
This is depicted in Fig. 2.2.
[Fig. 2.2 shows the chain: Channel Encoder → Modulator → Channel → Demodulator/Detector → Channel Decoder.]
Fig. 2.2 A Composite Discrete-input, Discrete-output Channel.
The composite Discrete-input, Discrete-output Channel is characterized by the set X =
{0, 1} of possible inputs, the set Y = {0, 1} of possible outputs and a set of conditional probabilities
that relate the possible outputs to the possible inputs. Assuming the noise in the channel causes
independent errors in the transmitted binary sequence with average probability of error p,
P(Y = 0 | X = 1) = P(Y = 1 | X = 0) = p,
P(Y = 1 | X = 1) = P(Y = 0 | X = 0) = 1 - p.     (2.1)
A BSC is shown in Fig. 2.3.
1- p
0
0
p
1-p
Fig. 2.3 A Binary Symmetric Channel (BSC).
The BSC is a special case of a general, discrete-input, discrete-output channel. Let the input
to the channel be q-ary symbols, i.e., X = {x_0, x_1, ..., x_{q-1}}, and the output of the detector at the
receiver consist of Q-ary symbols, i.e., Y = {y_0, y_1, ..., y_{Q-1}}. We assume that the channel and the
modulation are memoryless. The inputs and outputs can then be related by a set of qQ conditional
probabilities
P(Y = y_i | X = x_j) = P(y_i | x_j),     (2.2)
where i = 0, 1, ..., Q - 1 and j = 0, 1, ..., q - 1. This channel is known as a Discrete Memoryless
Channel (DMC) and is depicted in Fig. 2.4.
Definition 2.1 The conditional probability P(y_i | x_j) is defined as the Channel
Transition Probability and is denoted by p_ji.
Definition 2.2 The conditional probabilities {P(y_i | x_j)} that characterize a DMC can
be arranged in the matrix form P = [p_ji]. P is called the Probability Transition
Matrix for the channel.
Fig. 2.4 A Discrete Memoryless Channel (DMC) with q-ary input and Q-ary output.
In the next section, we will try to answer the question: How many bits can be sent
across a given noisy channel, each time the channel is used?
2.3 CHANNEL CAPACITY
Consider a DMC having an input alphabet X = {x_0, x_1, ..., x_{q-1}} and an output alphabet Y =
{y_0, y_1, ..., y_{r-1}}. Let us denote the set of channel transition probabilities by P(y_i | x_j). The
average mutual information provided by the output Y about the input X is given by (see Chapter 1,
Section 1.2)
I(X; Y) = Σ_{j=0}^{q-1} Σ_{i=0}^{r-1} P(x_j) P(y_i | x_j) log [P(y_i | x_j)/P(y_i)]     (2.3)
The channel transition probabilities P(y_i | x_j) are determined by the channel characteristics
(particularly the noise in the channel). However, the input symbol probabilities P(x_j) are within
the control of the discrete channel encoder. The value of the average mutual information, I(X; Y),
maximized over the set of input symbol probabilities P(x_j), is a quantity that depends only on the
channel transition probabilities P(y_i | x_j). This quantity is called the Capacity of the Channel.
Definition 2.3 The Capacity of a DMC is defined as the maximum average mutual
information in any single use of the channel, where the maximization is over all
possible input probabilities. That is,
C = max_{P(x_j)} I(X; Y)
  = max_{P(x_j)} Σ_{j=0}^{q-1} Σ_{i=0}^{r-1} P(x_j) P(y_i | x_j) log [P(y_i | x_j)/P(y_i)]     (2.4)
The maximization of I(X; Y) is performed under the constraints
P(x_j) ≥ 0, and Σ_{j=0}^{q-1} P(x_j) = 1
The units of channel capacity are bits per channel use (provided the base of the
logarithm is 2).
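For a small channel, the maximization in equation (2.4) can be carried out numerically. The sketch below evaluates I(X; Y) for a given input distribution and transition matrix and then searches over binary input distributions by brute force; the grid search is only for illustration, not an efficient algorithm such as the Blahut-Arimoto iteration.

```python
import numpy as np

def mutual_information(px, P):
    """Average mutual information I(X; Y) in bits, for an input distribution px (length q)
    and a channel transition matrix P with P[j, i] = P(y_i | x_j)."""
    pxy = px[:, None] * P                 # joint probabilities P(x_j, y_i)
    py = pxy.sum(axis=0)                  # output probabilities P(y_i)
    rows, cols = np.nonzero(pxy)
    return float(np.sum(pxy[rows, cols] * np.log2(P[rows, cols] / py[cols])))

# Brute-force search over binary input distributions for a BSC with p = 0.1
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
alphas = np.linspace(0.001, 0.999, 999)
C = max(mutual_information(np.array([a, 1 - a]), P) for a in alphas)
print(round(C, 3))   # about 0.531 = 1 - H(0.1), attained at the uniform input distribution
```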
Example 2.1 Consider a BSC with channel transition probabilities
P(0|1) = p = P(1|0)
By symmetry, the capacity, C = max_{P(x_j)} I(X; Y), is achieved for equally likely inputs,
P(x_0) = P(x_1) = 0.5. From equation (2.4) we obtain the capacity of a BSC as
C = 1 + p log2 p + (1 - p) log2(1 - p)
Let us define the entropy function
H(p) = -p log2 p - (1 - p) log2(1 - p)
Hence, we can rewrite the capacity of a binary symmetric channel as
C = 1 - H(p).
Fig. 2.5 The Capacity of a BSC plotted against the probability of error p.
The plot of the capacity versus p is given in Fig. 2.5. From the plot we make the following
observations.
(i) For p = 0 (i.e., noise free channel), the capacity is 1 bit/use, as expected. Each time we use the
channel, we can successfully transmit 1 bit of information.
(ii) Forp = 0.5, the channel capacity is 0, i.e., observing the output gives no information about
the input. It is equivalent to the case when the channel is broken. We might as well discard
the channel and toss a fair coin in order to estimate what was transmitted.
(iii) For 0.5 < p < 1, the capacity increases with increasing p. In this case we simply reverse the
positions of 1 and 0 at the output of the BSC.
(iv) For p = 1 (i.e., every bit gets flipped by the channel), the capacity is again 1 bit/use, as
expected. In this case, one simply flips the bit at the output of the receiver so as to undo the
effect of the channel.
(v) Since p is a monotonically decreasing function of the signal to noise ratio (SNR), the capacity of
a BSC is a monotonically increasing function of SNR.
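The curve of Fig. 2.5, and the value used in Example 2.3 below, can be reproduced in a few lines (a sketch; the function name is illustrative):

```python
import numpy as np

def bsc_capacity(p):
    """Capacity C = 1 - H(p) of a binary symmetric channel with crossover probability p."""
    p = np.asarray(p, dtype=float)
    h = np.zeros_like(p)
    mask = (p > 0) & (p < 1)
    h[mask] = -p[mask] * np.log2(p[mask]) - (1 - p[mask]) * np.log2(1 - p[mask])
    return 1.0 - h

print(bsc_capacity(np.array([0.0, 0.01, 0.5, 1.0])))   # approximately [1.0  0.919  0.0  1.0]
```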
Having developed the notion of capacity of a channel, we shall now try to relate it to reliable
communication over the channel. So far, we have only talked about bits that can be sent over a
channel each time it is used (bits/use). But, what is the number of bits that can be sent per second
(bits/sec)? To answer this question we introduce the concept of Channel Coding.
2.4 CHANNEL CODING
All real-life channels are affected by noise. Noise causes discrepancies (errors) between the
input and the output data sequences of a digital communication system. For a typical noisy
channel, the probability of bit error may be as high as 10^-2. This means that, on an average, 1
bit out of every 100 transmitted over this channel gets flipped. For most applications, this level of
reliability is far from adequate. Different applications require different levels of reliability (which
is a component of the quality of service). Table 2.1 lists the typical acceptable bit error rates for
various applications.
Table 2.1 Acceptable bit error rates for various applications

Application                                Probability of Error
Speech telephony                           10^-4
Voice band data                            10^-6
Electronic mail, Electronic newspaper      10^-6
Internet access                            10^-6
Video telephony, High speed computing      10^-7
In order to achieve such high levels of reliability, we resort to Channel Coding. The basic
objective of channel coding is to increase the resistance of the digital communication system to
channel noise. This is done by adding redundancies to the transmitted data stream in a controlled
manner.
In channel coding, we map the incoming data sequence to a channel input sequence. This
encoding procedure is done by the Channel Encoder. The encoded sequence is then
transmitted over the noisy channel. The channel output sequence at the receiver is inverse
mapped on to an output data sequence. This is called the decoding procedure, and is carried out
by the Channel Decoder. Both the encoder and the decoder are under the designer's control.
As already mentioned, the encoder introduces redundancy in a prescribed manner. The
decoder exploits this redundancy in order to reconstruct the original source sequence as
accurately as possible. Thus, channel coding makes it possible to carry out reliable
communication over unreliable (noisy) channels. Channel coding is also referred to as Error
Control Coding, and we will use these terms interchangeably. It is interesting to note here that
the source coder reduces redundancy to improve efficiency, whereas the channel coder adds
redundancy in a controlled manner to improve reliability.
We first look at a class of channel codes called Block Codes. In this class of codes, the
incoming message sequence is first sub-divided into sequential blocks, each of length k bits.
Each k-bit long information block is mapped into an n-bit block by the channel coder, where n
> k. This means that for every k bits of information, (n - k) redundant bits are added. The ratio
r = k/n     (2.5)
is called the Code Rate. The code rate of any coding scheme is, naturally, less than unity. A small
code rate implies that more and more bits per block are redundant bits, corresponding to a
higher coding overhead. This may reduce the effect of noise, but will also reduce the
communication rate as we will end up transmitting more redundant bits and fewer information
bits. The question before us is whether there exists a coding scheme such that the probability
that the message bit will be in error is arbitrarily small and yet the coding rate is not too small?
The answer is yes and it was first provided by Shannon in his second theorem on channel
capacity. We will study this shortly.
Let us now introduce the concept of time in our discussion. We wish to look at questions like
how many bits per second can we send over a given noisy channel with arbitrarily low bit error
rates? Suppose the DMS has the source alphabet X and entropy H(X) bits per source symbol
and the source generates a symbol every T_s seconds; then the average information rate of the
source is H(X)/T_s bits per second. Let us assume that the channel can be used once every T_c
seconds and the capacity of the channel is C bits per channel use. Then, the channel capacity
per unit time is C/T_c bits per second. We now state Shannon's second theorem known as the
Channel Coding Theorem.
Theorem 2.1 Channel Coding Theorem (Noisy coding theorem)
(i) Let a DMS with an alphabet X have entropy H(X) and produce symbols every T_s
seconds. Let a DMC have capacity C and be used once every T_c seconds. Then, if
H(X)/T_s ≤ C/T_c     (2.6)
there exists a coding scheme for which the source output can be transmitted over the
noisy channel and be reconstructed with an arbitrarily low probability of error.
(ii) Conversely, if
H(X)/T_s > C/T_c     (2.7)
it is not possible to transmit information over the channel and reconstruct it with an
arbitrarily small probability of error.
The parameter C/T_c is called the Critical Rate.
The channel coding theorem is a very important result in information theory. The theorem
specifies the channel capacity, C, as a fundamental limit on the rate at which reliable communi-
cation can be carried out over an unreliable (noisy) DMS channel. It should be noted that the
channel coding theorem tells us about the existence of some codes that can achieve reliable
communication in a noisy environment. Unfortunately, it does not give us the recipe to
construct these codes. Therefore, channel coding is still an active area of research as the search
for better and better codes is still going on. From the next chapter onwards we shall study some
good channel codes.
Example 2.2 Consider a DMS source that emits equally likely binary symbols (p = 0.5) once
every T_s seconds. The entropy for this binary source is
H(p) = -p log2 p - (1 - p) log2 (1 - p) = 1 bit.
The information rate of this source is
H(X)/T_s = 1/T_s bits/second.
Suppose we wish to transmit the source symbols over a noisy channel. The source sequence is
applied to a channel coder with code rate r. This channel coder uses the channel once every T_c
seconds to send the coded sequence. We want to have reliable communication (the probability of
error as small as desired). From the channel coding theorem, if
1/T_s ≤ C/T_c     (2.8)
we can make the probability of error as small as desired by a suitable choice of a channel coding
scheme, and hence have reliable communication. We note that the code rate of the coder can be
expressed as
r = T_c/T_s     (2.9)
Hence, the condition for reliable communication can be rewritten as
r ≤ C     (2.10)
Thus, for a BSC one can find a suitable channel coding scheme with a code rate, r ≤ C, which
will ensure reliable communication regardless of how noisy the channel is! Of course, we can
state that at least one such code exists, but finding that code may not be a trivial job. As we shall
see later, the level of noise in the channel will manifest itself by limiting the channel capacity,
and hence the code rate.
Example 2.3 Consider a BSC with a transition probability p = 10^-2. Such error rates are typical
of wireless channels. We saw in Example 2.1 that for a BSC the capacity is given by
C = 1 + p log2 p + (1 - p) log2 (1 - p)
By plugging in the value of p = 10^-2 we obtain the channel capacity C = 0.919. From the
previous example we can conclude that there exists at least one coding scheme with the code rate
r ≤ 0.919 which will guarantee us a (non-zero) probability of error that is as small as desired.
Example 2.4 Consider the repetition code in which each message bit is simply repeated n times,
where n is an odd integer. For example, for n = 3, we have the mapping scheme
0 → 000; 1 → 111
Similarly, for n = 5 we have the mapping scheme
0 → 00000; 1 → 11111
Note that the code rate of the repetition code with blocklength n is
r = 1/n     (2.11)
The decoding strategy is as follows: If in a block of n received bits the number of 0's exceeds the
number of 1's, decide in favour of 0, and vice versa. This is otherwise known as Majority
Decoding. This also answers the question why n should be an odd integer for repetition codes.
Let n = 2m + 1, where m is a positive integer. This decoding strategy will make an error if more
than m bits are in error, because in that case, if a 0 is encoded and sent, there would be more 1's
than 0's in the received word. Let us assume that the a priori probabilities of 1 and 0 are equal. Then,
the average probability of error is given by
P_e = Σ_{i=m+1}^{n} (n choose i) p^i (1 - p)^(n-i)     (2.12)
where p is the channel transition probability. The average probability of error for repetition codes
for different code rates is given in Table 2.2.
Table 2.2 Average probability of error for repetition codes

Code Rate, r     Average Probability of Error, P_e
1                10^-2
1/3              3 x 10^-4
1/5              10^-5
1/7              4 x 10^-7
1/9              10^-8
1/11             5 x 10^-10
From the table we see that as the code rate decreases, there is a steep fall in the average
probability of error. The decrease in P_e is much more rapid than the decrease in the code
rate, r. However, for repetition codes, the code rate tends to zero if we want smaller and smaller
P_e. Thus the repetition code exchanges code rate for message reliability. But the channel coding
theorem states that the code rate need not tend to zero in order to obtain an arbitrarily low
probability of error. The theorem merely requires the code rate r to be less than the channel
capacity, C. So there must exist some code (other than the repetition code) with code rate r = 0.9
which can achieve an arbitrarily low probability of error. Such a coding scheme will add just 1
parity bit to 9 information bits (or, maybe, add 10 extra bits to 90 information bits) and give us
as small a P_e as desired (say, 10^-20)! The hard part is finding such a code.
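The entries of Table 2.2 follow directly from equation (2.12); a short sketch:

```python
from math import comb

def repetition_error_prob(n, p):
    """Probability that majority decoding of an n-bit repetition code fails on a BSC with
    crossover probability p, i.e. that more than (n - 1)/2 of the n bits are flipped."""
    m = (n - 1) // 2
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1, n + 1))

for n in (1, 3, 5, 7, 9, 11):
    print(f"r = 1/{n}:  Pe = {repetition_error_prob(n, 1e-2):.1e}")
# 1.0e-02, 3.0e-04, 9.9e-06, 3.4e-07, 1.2e-08, 4.4e-10 (compare Table 2.2)
```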
2.5 INFORMATION CAPACITY THEOREM
So far we have studied limits on the maximum rate at which information can be sent over a
channel reliably in terms of the channel capacity. In this section we will formulate the
Information Capacity Theorem for band-limited, power-limited Gaussian channels.
Consider a zero mean, stationary random process X(t) that is band limited to W Hertz. Let X_k,
k = 1, 2, ..., K, denote the continuous random variables obtained by uniform sampling of the
process X(t) at the Nyquist rate of 2W samples per second. These symbols are transmitted over
a noisy channel which is also band-limited to W Hertz. The channel output is corrupted by
Additive White Gaussian Noise (AWGN) of zero mean and power spectral density (psd)
N_0/2. Because of the channel, the noise is band limited to W Hertz. Let Y_k, k = 1, 2, ..., K, denote
the samples of the received signal. Therefore,
Y_k = X_k + N_k,   k = 1, 2, ..., K     (2.13)
where N_k is the noise sample with zero mean and variance σ² = N_0 W. It is assumed that Y_k,
k = 1, 2, ..., K, are statistically independent. Since the transmitter is usually power-limited, let us
put a constraint on the average power in X_k:
E[X_k²] = P,   k = 1, 2, ..., K     (2.14)
The information capacity of this band-limited, power-limited channel is the maximum of the
mutual information between the channel input X_k and the channel output Y_k. The maximization
has to be done over all distributions on the input X_k that satisfy the power constraint of equation
(2.14). Thus, the information capacity of the channel (same as the channel capacity) is given by
C = max_{f_Xk(x)} { I(X_k; Y_k) : E[X_k²] = P },     (2.15)
where f_Xk(x) is the probability density function of X_k.
Now, from the previous chapter we have,
I(X_k; Y_k) = h(Y_k) - h(Y_k | X_k)     (2.16)
Note that X_k and N_k are independent random variables. Therefore, the conditional differential
entropy of Y_k given X_k is equal to the differential entropy of N_k. Intuitively, this is because given
X_k the uncertainty arising in Y_k is purely due to N_k. That is,
h(Y_k | X_k) = h(N_k)     (2.17)
Hence we can write Eq. (2.16) as

    I(Xk; Yk) = h(Yk) − h(Nk)                                         (2.18)

Since h(Nk) is independent of Xk, maximizing I(Xk; Yk) translates to maximizing h(Yk). It can be shown that for h(Yk) to be maximum, Yk has to be a Gaussian random variable (see Problem 2.10). If we assume Yk to be Gaussian, and Nk is Gaussian by definition, then Xk is also Gaussian. This is because the sum (or difference) of two Gaussian random variables is also Gaussian. Thus, in order to maximize the mutual information between the channel input Xk and the channel output Yk, the transmitted signal should also be Gaussian. Therefore we can rewrite (2.15) as

    C = I(Xk; Yk), with E[Xk²] = P and Xk Gaussian                    (2.19)
We know that if two independent Gaussian random variables are added, the variance of the resulting Gaussian random variable is the sum of the variances. Therefore, the variance of the received sample Yk equals P + N0W. It can be shown that the differential entropy of a Gaussian random variable with variance σ² is (1/2) log2(2πeσ²) (see Problem 2.10). Therefore,

    h(Yk) = (1/2) log2 [2πe(P + N0W)]                                 (2.20)

and

    h(Nk) = (1/2) log2 [2πe(N0W)]                                     (2.21)

Substituting these values of differential entropy for Yk and Nk we get

    C = (1/2) log2 (1 + P/(N0W)) bits per channel use                 (2.22)
We are transmitting 2W samples per second, i.e., the channel is being used 2W times in one second. Therefore, the information capacity can be expressed as

    C = W log2 (1 + P/(N0W)) bits per second                          (2.23)

This basic formula for the capacity of the band-limited AWGN waveform channel with a band-limited and average power-limited input was first derived by Shannon in 1948. It is known as Shannon's third theorem, the Information Capacity Theorem.

Theorem 2.2 (Information Capacity Theorem) The information capacity of a continuous channel of bandwidth W Hertz, disturbed by Additive White Gaussian Noise of power spectral density N0/2 and limited in bandwidth to W, is given by

    C = W log2 (1 + P/(N0W)) bits per second

where P is the average transmitted power. This theorem is also called the Channel Capacity Theorem.
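As a quick numerical illustration of Eq. (2.23) (this snippet is not part of the original text; the bandwidth and SNR values are arbitrary example inputs), the capacity formula can be evaluated directly:

```python
from math import log2

def awgn_capacity(bandwidth_hz, snr_linear):
    """Shannon capacity C = W log2(1 + P/(N0 W)) in bits per second,
    with the SNR supplied as the linear ratio P/(N0 W)."""
    return bandwidth_hz * log2(1 + snr_linear)

# Example: a 3000 Hz channel at 20 dB SNR (linear SNR = 100)
print(awgn_capacity(3000, 10 ** (20 / 10)))   # about 19,975 bits per second
```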
The Information Capacity Theorem is one of the important results in information theory. In a single formula one can see the trade-off between the channel bandwidth, the average transmitted power and the noise power spectral density. Given the channel bandwidth and the SNR, the channel capacity (bits/second) can be computed. This channel capacity is the fundamental limit on the rate of reliable communication for a power-limited, band-limited Gaussian channel. It should be kept in mind that in order to approach this limit, the transmitted signal must have statistical properties that are Gaussian in nature. Note that the terms channel capacity and information capacity have been used interchangeably.
Let us now derive the same result in a more intuitive manner. Suppose we have a coding scheme that results in an acceptably low probability of error. Let this coding scheme take k information bits and encode them into n bit long codewords. The total number of codewords is M = 2^k. Let the average power per bit be P. Thus the average power required to transmit an entire codeword is nP. Let these codewords be transmitted over a Gaussian channel with the noise variance equal to σ². The received vector of n bits is also Gaussian with the mean equal to the transmitted codeword and the variance equal to nσ². Since the code is a good one (acceptable error rate), the received vector lies inside a sphere of radius √(nσ²) centred on the transmitted codeword. This sphere itself is contained in a larger sphere of radius √(n(P + σ²)), where n(P + σ²) is the average power of the received vector.
This concept may be visualized as depicted in Fig. 2.6. There is a large sphere of radius √(n(P + σ²)) which contains M smaller spheres of radius √(nσ²). Here M = 2^k is the total number of codewords. Each of these small spheres is centred on a codeword. These are called the Decoding Spheres. Any received word lying within a sphere is decoded as the codeword on which the sphere is centred. Suppose a codeword is transmitted over a noisy channel. Then there is a high probability that the received vector will lie inside the correct decoding sphere (since it is a reasonably good code). The question arises: how many non-intersecting spheres can be packed inside the large sphere? The more spheres one can pack, the more efficient will be the code in terms of the code rate. This is known as the Sphere Packing Problem.
Fig. 2.6 Visualization of the Sphere Packing Problem.
The volume of an n-dimensional sphere of radius r can be expressed as

    V = An r^n                                                        (2.24)

where An is a scaling factor. Therefore, the volume of the large sphere (the sphere of all possible received vectors) can be written as

    Vall = An [n(P + σ²)]^(n/2)                                       (2.25)

and the volume of a decoding sphere can be written as

    Vds = An [nσ²]^(n/2)                                              (2.26)

The maximum number of non-intersecting decoding spheres that can be packed inside the large sphere of all possible received vectors is

    M = Vall/Vds = [n(P + σ²)]^(n/2) / [nσ²]^(n/2) = (1 + P/σ²)^(n/2) = 2^((n/2) log2(1 + P/σ²))    (2.27)

On taking the logarithm (base 2) on both sides of the equation we get

    log2 M = (n/2) log2 (1 + P/σ²)                                    (2.28)

Observing that k = log2 M, we have

    k/n = (1/2) log2 (1 + P/σ²)                                       (2.29)

Note that each time we use the channel, we effectively transmit k/n bits. Thus, the maximum number of bits that can be transmitted per channel use, with a low probability of error, is (1/2) log2 (1 + P/σ²), as seen previously in Eq. (2.22). Note that σ² represents the noise power and is equal to N0W for AWGN with power spectral density N0/2 and limited in bandwidth to W.
2.6 THE SHANNON LIMIT
Consider a Gaussian channel that is limited both in power and bandwidth. We wish to explore the limits of a communication system under these constraints. Let us define an ideal system which can transmit data at a bit rate Rb equal to the capacity, C, of the channel, i.e., Rb = C. Suppose the energy per bit is Eb. Then the average transmitted power is

    P = Eb Rb = Eb C                                                  (2.30)

Therefore, the channel capacity theorem for this ideal system can be written as

    C/W = log2 (1 + (Eb/N0)(C/W))                                     (2.31)
This equation can be re-written in the following form

    Eb/N0 = (2^(C/W) − 1) / (C/W)                                     (2.32)

The plot of the bandwidth efficiency Rb/W versus Eb/N0 is called the Bandwidth Efficiency Diagram, and is given in Fig. 2.7. The ideal system is represented by the line Rb = C.
Fig. 2.7 The Bandwidth Efficiency Diagram.
The following conclusions can be drawn from the Bandwidth Efficiency Diagram.
(i) For infinite bandwidth, the ratio Eb/N0 tends to the limiting value

    Eb/N0 | (W → ∞) = ln 2 = 0.693 = −1.6 dB                          (2.33)

This value is called the Shannon Limit. It is interesting to note that the Shannon limit is a fraction. This implies that for very large bandwidths, reliable communication is possible even for the case when the signal power is less than the noise power! The channel capacity corresponding to this limiting value is

    C | (W → ∞) = (P/N0) log2 e                                       (2.34)

Thus, at infinite bandwidth, the capacity of the channel is determined by the SNR.
(ii) The curve for the critical rate Rb = C is known as the Capacity Boundary. For the case Rb > C, reliable communication is not guaranteed. However, for Rb < C, there exists some coding scheme which can provide an arbitrarily low probability of error.
(iii) The Bandwidth Efficiency Diagram shows the trade-offs between the quantities Rb/W, Eb/N0 and the probability of error, Pe. Note that for designing any communication system the basic design parameters are the bandwidth available, the SNR and the bit error rate (BER). The BER is determined by the application and the quality of service (QoS) desired. The bandwidth and the power can be traded one for the other to provide the desired BER.
(iv) Any point on the Bandwidth Efficiency Diagram corresponds to an operating point with a particular set of values of SNR, bandwidth efficiency and BER.
The Information Capacity Theorem predicts the maximum amount of information that can be transmitted through a given bandwidth for a given SNR. We see from Fig. 2.7 that acceptable capacity can be achieved even for low SNRs, provided adequate bandwidth is available. The optimum usage of a given bandwidth is obtained when the signals are noise-like and a minimal SNR is maintained at the receiver. This principle lies at the heart of any spread spectrum communication system, such as Code Division Multiple Access (CDMA).
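A small sketch (added here for illustration, not from the text) traces the capacity boundary of Eq. (2.32) and confirms the limiting value of Eq. (2.33): as the spectral efficiency C/W shrinks, the required Eb/N0 approaches ln 2, about -1.6 dB.

```python
from math import log, log10

def ebn0_required(spectral_efficiency):
    """Minimum Eb/N0 (linear) on the capacity boundary Rb = C,
    from Eq. (2.32): Eb/N0 = (2^(C/W) - 1) / (C/W)."""
    r = spectral_efficiency
    return (2 ** r - 1) / r

# Sweep the spectral efficiency downwards; the required Eb/N0 tends to ln 2.
for r in (4.0, 1.0, 0.1, 0.001):
    print(f"C/W = {r:>6}: Eb/N0 = {10 * log10(ebn0_required(r)):+.2f} dB")

print(f"Shannon limit: {10 * log10(log(2)):+.2f} dB")   # about -1.59 dB
```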
2.7 RANDOM SELECTION OF CODES
Consider a set of M coded signal waveforms constructed from a set of n-dimensional binary codewords. Let us represent these codewords as follows

    ci = [ci1 ci2 ... cin],  i = 1, 2, ..., M                         (2.35)

Since we are considering binary codes, cij is either a 0 or a 1. Let each bit of the codeword be mapped on to a BPSK waveform pj(t), so that the codeword may be represented as

    si(t) = Σ (j = 1 to n) sij pj(t),  i = 1, 2, ..., M               (2.36)

where

    sij = +√E for cij = 1,   sij = −√E for cij = 0                    (2.37)

and E is the energy per code bit. The waveform si(t) can then be represented as the n-dimensional vector

    si = [si1 si2 ... sin],  i = 1, 2, ..., M                         (2.38)

We observe that this corresponds to a hypercube in the n-dimensional space. Let us now encode k bits of information into an n bit long codeword, and map this codeword to one of the M waveforms. Note that there are a total of 2^k possible waveforms corresponding to the M = 2^k different codewords.
Let the information rate into the encoder be R bits/sec. The encoder takes in k bits at a time and maps each k-bit block to one of the M waveforms. Thus, k = RT and M = 2^k = 2^(RT) signals are required.
Let us define a parameter D as follows:

    D = n/T dimensions/sec                                            (2.39)

so that n = DT is the dimensionality of the space. The hypercube mentioned above has 2^n = 2^(DT) vertices. Of these, we must choose M = 2^(RT) to transmit the information. Under the constraint D > R, the fraction of vertices that can be used as signal points is

    F = 2^k / 2^n = 2^(RT) / 2^(DT) = 2^(−(D − R)T)                   (2.40)
For D > R, F → 0 as T → ∞. Since n = DT, it implies that F → 0 as n → ∞. Designing a good coding scheme translates to choosing M vertices out of the 2^n vertices of the hypercube in such a manner that the probability of error tends to zero as we increase n. We saw that the fraction F tends to zero as we choose larger and larger n. This implies that it is possible to increase the minimum distance between these M signal points as n → ∞. Increasing the minimum distance between the signal points drives the probability of error Pe → 0.
There are (2^n)^M = 2^(nM) distinct ways of choosing the M codewords out of the total 2^n vertices. Each of these choices corresponds to a coding scheme. For each set of M waveforms, it is possible to design a communication system consisting of a modulator and a demodulator. Thus, there are 2^(nM) communication systems, one for each choice of the M coded waveforms. Each of these communication systems is characterized by its probability of error. Of course, many of these communication systems will perform poorly in terms of the probability of error.
Let us pick one of the codes at random from the possible 2^(nM) sets of codes. The random selection of the m-th code occurs with the probability

    P({si}m) = 2^(−nM)                                                (2.41)

Let the corresponding probability of error for this choice of code be Pe({si}m). Then the average probability of error over the ensemble of codes is

    P̄e = Σ (m = 1 to 2^(nM)) Pe({si}m) P({si}m) = 2^(−nM) Σ (m = 1 to 2^(nM)) Pe({si}m)    (2.42)
We will next try to upper bound this average probability of error. If we have an upper bound on P̄e, then we can conclude that there exists at least one code for which this upper bound will also hold. Furthermore, if P̄e → 0 as n → ∞, we can surmise that Pe({si}) → 0 as n → ∞ for such codes.
Consider the transmission of a k-bit message Xk = [x1 x2 ... xk], where xj is binary for j = 1, 2, ..., k. The conditional probability of error averaged over all possible codes is

    P̄e(Xk) = Σ (all codes) Pe(Xk, {si}m) P({si}m)                     (2.43)

where Pe(Xk, {si}m) is the conditional probability of error for a given k-bit message Xk = [x1 x2 ... xk], which is transmitted using the code {si}m. For the m-th code,

    Pe(Xk, {si}m) ≤ Σ (l = 1, l ≠ k, to M) P2m(sl, sk)                (2.44)

where P2m(sl, sk) is the probability of error for the binary communication system using the signal vectors sl and sk to transmit one of two equally likely k-bit messages. Hence,

    P̄e(Xk) ≤ Σ (all codes) P({si}m) Σ (l = 1, l ≠ k, to M) P2m(sl, sk)    (2.45)

On changing the order of summation we obtain

    P̄e(Xk) ≤ Σ (l = 1, l ≠ k, to M) [ Σ (all codes) P({si}m) P2m(sl, sk) ] = Σ (l = 1, l ≠ k, to M) P̄2(sl, sk)    (2.46)
where P̄2(sl, sk) represents the ensemble average of P2m(sl, sk) over the 2^(nM) codes. For an Additive White Gaussian Noise channel,

    P2m(sl, sk) = Q( √(d_lk² / (2N0)) )                               (2.47)

where

    d_lk² = |sl − sk|² = Σ (j = 1 to n) (slj − skj)² = d (2√E)² = 4dE    (2.48)

and d is the number of positions in which the two codewords differ. Therefore,

    P2m(sl, sk) = Q( √(2dE / N0) )                                    (2.49)

Under the assumption that all codes are equally probable, it is equally likely that the vector sl is any of the 2^n vertices of the hypercube. Further, sl and sk are statistically independent. Hence, the probability that sl and sk differ in exactly d places is
    P(d) = (1/2)^n (n choose d)                                       (2.50)

The expected value of P2m(sl, sk) over the ensemble of codes is then given by

    P̄2(sl, sk) = Σ (d = 0 to n) (1/2)^n (n choose d) Q( √(2dE/N0) )   (2.51)

Using the upper bound

    Q( √(2dE/N0) ) < e^(−dE/N0)                                       (2.52)

we obtain

    P̄2(sl, sk) < (1/2)^n Σ (d = 0 to n) (n choose d) e^(−dE/N0) = [ (1/2)(1 + e^(−E/N0)) ]^n    (2.53)

From Eqs. (2.46) and (2.53) we obtain

    P̄e(Xk) ≤ Σ (l ≠ k) P̄2(sl, sk) = (M − 1) [ (1/2)(1 + e^(−E/N0)) ]^n < M [ (1/2)(1 + e^(−E/N0)) ]^n    (2.54)
Recall that we need an upper bound on P̄e, the average error probability. To obtain P̄e we average P̄e(Xk) over all possible k-bit information sequences. Thus,

    P̄e = Σ (Xk) P(Xk) P̄e(Xk) < M [ (1/2)(1 + e^(−E/N0)) ]^n           (2.55)

We now define a new parameter as follows.

Definition 2.4 The Cutoff Rate R0 is defined as

    R0 = log2 [ 2 / (1 + e^(−E/N0)) ] = 1 − log2 (1 + e^(−E/N0))      (2.56)

The cutoff rate has the units of bits/dimension. Observe that 0 ≤ R0 ≤ 1. The plot of R0 with respect to the SNR per dimension is given in Fig. 2.8.
Eq. (2.55) can now be written succinctly as

    P̄e < M 2^(−nR0) = 2^(RT) 2^(−nR0)                                 (2.57)

Substituting n = DT, we obtain

    P̄e < 2^(−T(DR0 − R))                                              (2.58)

If we substitute T = n/D, we obtain

    P̄e < 2^(−n(R0 − R/D))                                             (2.59)
Fig. 2.8 Cutoff Rate, R0, Versus the SNR (in dB) Per Dimension.
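The following sketch is illustrative only (the 5 dB operating point and the block parameters n and Rc are assumed values, not from the text); it evaluates the cutoff rate of Eq. (2.56) and the ensemble-average error bound that follows in Eq. (2.61).

```python
from math import exp, log2

def cutoff_rate(e_over_n0):
    """Cutoff rate R0 = 1 - log2(1 + exp(-E/N0)) in bits/dimension,
    where E/N0 is the SNR per dimension (linear), as in Eq. (2.56)."""
    return 1 - log2(1 + exp(-e_over_n0))

def ensemble_error_bound(n, r0, rc):
    """Ensemble-average bound of Eq. (2.61): Pe_bar < 2^(-n (R0 - Rc))."""
    return 2 ** (-n * (r0 - rc))

snr = 10 ** (5 / 10)              # 5 dB per dimension (assumed operating point)
r0 = cutoff_rate(snr)
print(f"R0 = {r0:.3f} bits/dimension")
print(f"n = 100, Rc = 0.5: Pe < {ensemble_error_bound(100, r0, 0.5):.2e}")
```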
Observe that

    R/D = R/(n/T) = RT/n = k/n = Rc                                   (2.60)

Here, Rc represents the code rate. Hence the average error probability can be written in the following instructive form:

    P̄e < 2^(−n(R0 − Rc))                                              (2.61)
From the above equation we can conclude the following.
(i) For Rc < R0 the average probability of error P̄e → 0 as n → ∞. Since, by choosing large values of n, P̄e can be made arbitrarily small, there exist good codes in the ensemble which have a probability of error less than P̄e.
(ii) Observe that P̄e is the ensemble average. Therefore, if a code is selected at random, the probability that its error probability exceeds a·P̄e is less than 1/a. This implies that no more than 10% of the codes have an error probability that exceeds 10·P̄e. Thus, there are many good codes.
(iii) The codes whose probability of error exceeds P̄e are not always bad codes. The probability of error of these codes may be reduced by increasing the dimensionality, n.
For binary coded signals, the cutoff rate, R0, saturates at 1 bit/dimension for large values of E/N0 (say, greater than 10). Thus, to achieve lower probabilities of error one must reduce the code rate, Rc. Alternatively, very large block lengths have to be used. This is not an efficient approach. So, binary codes become inefficient at high SNRs. For high SNR scenarios, non-binary coded signal sets should be used to achieve an increase in the number of bits per dimension. Multiple-amplitude coded signal sets can be easily constructed from non-binary codes by mapping each code element into one of the possible amplitude levels (e.g. Pulse Amplitude Modulation). For random codes using M-ary multi-amplitude signals, it was shown by Shannon (in 1959) that

    (2.62)

Let us now relate the cutoff rate R0* to the capacity of the AWGN channel, which is given by

    C = W log2 (1 + P/(N0W)) bits per second                          (2.63)
The energy per code bit is equal to

    E = PT/n                                                          (2.64)

Recall from the sampling theorem that a signal of bandwidth W may be represented by samples taken at a rate of 2W samples per second. Thus, in a time interval of length T there are n = 2WT samples. Therefore, we may write D = n/T = 2W. Hence,

    P = nE/T = DE                                                     (2.65)

Define the normalized capacity Cn = C/(2W) = C/D and substitute for W and P in (2.63) to obtain

    Cn = (1/2) log2 (1 + 2E/N0) = (1/2) log2 (1 + 2Rc γb)             (2.66)

where γb = Eb/N0 is the SNR per bit (the energy per code bit is E = Rc Eb). The normalized capacity, Cn, and the cutoff rate, R0*, are plotted in Fig. 2.9. From the figure we can conclude the following:
(i) R0* < Cn for all values of E/N0. This is expected because Cn is the ultimate limit on the transmission rate R/D.
(ii) For smaller values of E/N0, the difference between Cn and R0* is approximately 3 dB. This means that randomly selected, average power limited, multi-amplitude signals yield an R0* within 3 dB of the channel capacity.
Fig. 2.9 The Normalized Capacity, Cn, and Cutoff Rate, R0*, for an AWGN Channel.
2.8 CONCLUDING REMARKS
Pioneering work in the area of channel capacity was done by Shannon in 1948. Shannon's
second theorem was indeed a surprising result at the time of its publication. It claimed that the
probability of error for a BSC could be made as small as desired provided the code rate was less
than the channel capacity. This theorem paved the way for a systematic study of reliable
communication over unreliable (noisy) channels. Shannon's third theorem, the Information
Capacity Theorem, is one of the most remarkable results in information theory. It gives a
relation between the channel bandwidth, the signal to noise ratio and the channel capacity.
Additional work was carried out in the 1950s and 1960s by Gilbert, Gallager, Wyner, Forney
and Viterbi to name some of the prominent contributors.
The concept of cutoff rate was also developed by Shannon, and was later used by Wozencraft, Jacobs and Kennedy as a design parameter for communication systems. Jordan used the concept of cutoff rate to design coded waveforms for M-ary orthogonal signals with coherent and non-coherent detection. Cutoff rates have been widely used as a design criterion for various channels, including the fading channels encountered in wireless communications.
SUMMARY
• The conditional probability P(yi | xj) is called the channel transition probability and is denoted by pji. The conditional probabilities {P(yi | xj)} that characterize a DMC can be arranged in the matrix form P = [pji]. P is known as the probability transition matrix for the channel.
• The capacity of a discrete memoryless channel (DMC) is defined as the maximum average mutual information in any single use of the channel, where the maximization is over all possible input probabilities P(xj). That is,

    C = max I(X; Y) = max Σ (j = 0 to q−1) Σ (i = 0 to r−1) P(xj) P(yi | xj) log [ P(yi | xj) / P(yi) ]
• The basic objective of channel coding is to increase the resistance of the digital
communication system to channel noise. This is done by adding redundancies in the
transmitted data stream in a controlled manner. Channel coding is also referred to as
error control coding.
• The ratio r = k/n is called the code rate. The code rate of any coding scheme is always less than unity.
• Let a DMS with an alphabet X have entropy H(X) and produce symbols every Ts seconds. Let a DMC have capacity C and be used once every Tc seconds. Then, if H(X)/Ts ≤ C/Tc, there exists a coding scheme for which the source output can be transmitted over the noisy channel and be reconstructed with an arbitrarily low probability of error. This is the Channel Coding Theorem or the Noisy Coding Theorem.
• For H(X)/Ts > C/Tc, it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error. The parameter C/Tc is called the Critical Rate.
• The information capacity can be expressed as C = W log2 (1 + P/(N0W)) bits per second. This is the basic formula for the capacity of the band-limited AWGN waveform channel with a band-limited and average power-limited input. This is the crux of the Information Capacity Theorem. This theorem is also called the Channel Capacity Theorem.
• The cutoff rate R0 is given by R0 = log2 [2 / (1 + e^(−E/N0))] = 1 − log2 (1 + e^(−E/N0)). The cutoff rate has the units of bits/dimension. Note that 0 ≤ R0 ≤ 1. The average error probability in terms of the cutoff rate can be written as P̄e < 2^(−n(R0 − Rc)). For Rc < R0 the average probability of error P̄e → 0 as n → ∞.
PROBLEMS
2.1 Consider the binary channel shown in Fig. 2.10. Let the a priori probabilities of sending the binary symbols be p0 and p1, where p0 + p1 = 1. Find the a posteriori probabilities P(X = 0 | Y = 0) and P(X = 1 | Y = 1).

Fig. 2.10
2.2 Find the capacity of the binary erasure channel shown in Fig. 2.11, where p0 and p1 are the a priori probabilities.

Fig. 2.11
2.3 Consider the channels A, B and the cascaded channel AB shown in Fig. 2.12.
(a) Find CA, the capacity of channel A.
(b) Find CB, the capacity of channel B.
(c) Next, cascade the two channels and determine the combined capacity CAB.
(d) Explain the relation between CA, CB and CAB.

Fig. 2.12
2.4 Find the capacity of the channel shown in Fig. 2.13.

Fig. 2.13
2.5 (a) A telephone channel has a bandwidth of 3000 Hz and the SNR = 20 dB. Determine
the channel capacity.
(b) If the SNR is increased to 25 dB, determine the capacity.
2.6 Determine the channel capacity of the channel shown in Fig. 2.14.

Fig. 2.14
2.7 Suppose a TV displays 30 frames/second. There are approximately 2 × 10^5 pixels per frame, each pixel requiring 16 bits for colour display. Assuming an SNR of 25 dB, calculate the bandwidth required to support the transmission of the TV video signal (use the Information Capacity Theorem).
2.8 Consider the Z channel shown in Fig. 2.15.
(a) Find the input probabilities that result in capacity.
(b) If N such channels are cascaded, show that the combined channel can be represented by an equivalent Z channel, and determine its channel transition probability.
(c) What is the capacity of the combined channel as N → ∞?

Fig. 2.15
2.9 Consider a communication system using antipodal signalling. The SNR is 20 dB.
(a) Find the cutoff rate, R0.
(b) We want to design a code which results in an average probability of error, Pe < 10^-6. What is the best code rate we can achieve?
(c) What will be the dimensionality, n, of this code?
(d) Repeat parts (a), (b) and (c) for an SNR of 5 dB. Compare the results.
2.10 (a) Prove that for a finite variance σ², the Gaussian random variable has the largest differential entropy attainable by any random variable.
(b) Show that this entropy is given by (1/2) log2 (2πeσ²).
COMPUTER PROBLEMS
2.11 Write a computer program that takes in the channel transition probability matrix and
computes the capacity of the channel.
2.12 Plot the operating points on the bandwidth efficiency diagram for M-PSK, M = 2, 4, 8, 16 and 32, and the probabilities of error: (a) Pe = 10^-6 and (b) Pe = 10^-8.
2.13 Write a program that implements the binary repetition code of rate 1/n, where n is an odd integer. Develop a decoder for the repetition code. Test the performance of this coding scheme over a BSC with the channel transition probability, p. Generalize the program for a repetition code of rate 1/n over GF(q). Plot the residual Bit Error Rate (BER) versus p and q (make a 3-D mesh plot).
3. Linear Block Codes for Error Correction
Richard W. Hamming
3.1 INTRODUCTION TO ERROR CORRECTING CODES
In this age of information, there is an increasing need not only for speed, but also for accuracy in the storage, retrieval, and transmission of data. The channels over which messages are transmitted are often imperfect. Machines do make errors, and their non-man-made mistakes can turn otherwise flawless programming into worthless, even dangerous, trash. Just as architects design buildings that will stand even through an earthquake, their computer counterparts have come up with sophisticated techniques capable of counteracting the digital manifestations of Murphy's Law ("If anything can go wrong, it will"). Error Correcting Codes are a kind of safety net: the mathematical insurance against the vagaries of an imperfect digital world.
Error Correcting Codes, as the name suggests, are used for correcting errors when messages are transmitted over a noisy channel or stored data is retrieved. The physical medium through which the messages are transmitted is called a channel (e.g. a telephone line, a satellite link, a wireless channel used for mobile communications etc.). Different kinds of channels are
prone to different kinds of noise, which corrupt the data being transmitted. The noise could be
caused by lightning, human errors, equipment malfunction, voltage surge etc. Because these
error correcting codes try to overcome the detrimental effects of noise in the channel, the
encoding procedure is also called Channel Coding. Error control codes are also used for accurate
transfer of information from one place to another, for example storing data and reading it from
a compact disc (CD). In this case, the error could be due to a scratch on the surface of the CD.
The error correcting coding scheme will try to recover the original data from the corrupted one.
The basic idea behind error correcting codes is to add some redundancy in the form of extra symbols to a message prior to its transmission through a noisy channel. This redundancy is added in a controlled manner. The encoded message, when transmitted, might be corrupted by noise in the channel. At the receiver, the original message can be recovered from the corrupted one if the number of errors is within the limit for which the code has been designed. The block diagram of a digital communication system is illustrated in Fig. 3.1. Note that the most important block in the figure is that of noise, without which there would be no need for the channel encoder.
Example 3.1 Let us see how redundancy combats the effects of noise. The normal language that we use to communicate (say, English) has a lot of redundancy built into it. Consider the following sentence:

    CODNG THEORY IS AN INTRSTNG SUBJCT.

As we can see, there are a number of errors in this sentence. However, due to familiarity with the language we may guess the original text to have read:

    CODING THEORY IS AN INTERESTING SUBJECT.

What we have just used is an error correcting strategy that makes use of the in-built redundancy in the English language to reconstruct the original message from the corrupted one.
Fig. 3.1 Block Diagram (and the principle) of a Digital Communication System.
Here the Source Coder/Decoder Block has not been shown.
The objectives of a good error control coding scheme are:
(i) error correcting capability in terms of the number of errors that it can rectify,
(ii) fast and efficient encoding of the message,
(iii) fast and efficient decoding of the received message,
(iv) maximum transfer of information bits per unit time (i.e., fewer overheads in terms of redundancy).
The first objective is the primary one. In order to increase the error correcting capability of a coding scheme one must introduce more redundancy. However, increased redundancy leads to a slower rate of transfer of the actual information. Thus the objectives (i) and (iv) are not totally compatible. Also, as the coding strategies become more complicated for correcting larger numbers of errors, the objectives (ii) and (iii) also become difficult to achieve.
In this chapter, we shall first learn the basic definitions of error control coding. These definitions, as we shall see, will be used throughout this book. The concept of Linear Block Codes will then be introduced. Linear Block Codes form a very large class of useful codes. We will see that it is very easy to work with the matrix description of these codes. In the later part of this chapter, we will learn how to efficiently decode these Linear Block Codes. Finally, the notion of perfect codes and optimal linear codes will be introduced.
3.2 BASIC DEFINITIONS
Given here are some basic definitions, which will be frequently used here as well as in the later
chapters.
Definition 3.1 A Word is a sequence of symbols.
Definition 3.2 A Code is a set of vectors called Codewords.
Definition 3.3 The Hamming Weight of a codeword (or any vector) is equal to the number of nonzero elements in the codeword. The Hamming Weight of a codeword c is denoted by w(c). The Hamming Distance between two codewords is the number of places in which the codewords differ. The Hamming Distance between two codewords c1 and c2 is denoted by d(c1, c2). It is easy to see that d(c1, c2) = w(c1 − c2).
Example 3.2 Consider a code C with two codewords {0100, 1111}, with Hamming weights w(0100) = 1 and w(1111) = 4. The Hamming distance between the two codewords is 3 because they differ in the 1st, 3rd and 4th places. Observe that w(0100 − 1111) = w(1011) = 3 = d(0100, 1111).
Example 3.3 For the code C = {01234, 43210}, the Hamming weight of each codeword is 4 and the Hamming distance between the codewords is 4 (only the 3rd component of the two codewords is identical, while they differ in the other 4 places).
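These two quantities are straightforward to compute; the sketch below (added for illustration, not part of the text) reproduces the values quoted in Examples 3.2 and 3.3.

```python
def hamming_weight(c):
    """Number of nonzero symbols in a word (sequence of symbols)."""
    return sum(1 for s in c if s != 0)

def hamming_distance(c1, c2):
    """Number of positions in which two equal-length words differ."""
    return sum(1 for a, b in zip(c1, c2) if a != b)

# Example 3.2: binary codewords 0100 and 1111
print(hamming_weight([0, 1, 0, 0]), hamming_distance([0, 1, 0, 0], [1, 1, 1, 1]))  # 1 3
# Example 3.3: codewords over a larger alphabet
print(hamming_distance([0, 1, 2, 3, 4], [4, 3, 2, 1, 0]))                          # 4
```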
Definition 3.4 A Block Code consists of a set of fixed length codewords. The
fixed length of these codewords is called the Block Length and is typically denoted
by n. Thus, a code of blocklength n consists of a set of codewords having n
components.
A block code of size M defined over an alphabet with q symbols is a set of M q-ary sequences, each of length n. In the special case that q = 2, the symbols are called bits and the code is said to be a binary code. Usually, M = q^k for some integer k, and we call such a code an (n, k) code.
Example 3.4 The code C = {00000, 10100, 11110, 11001} is a block code of block length equal to 5. This code can be used to represent two-bit binary numbers as follows:

    Uncoded bits    Codewords
    00              00000
    01              10100
    10              11110
    11              11001

Here M = 4, k = 2 and n = 5. Suppose we have to transmit a sequence of 1's and 0's using the above coding scheme. Let's say that the sequence to be encoded is 1 0 0 1 0 1 0 0 1 1 ... The first step is to break the sequence into groups of two bits (because we want to encode two bits at a time). So we partition it as follows:

    10 01 01 00 11 ...

Next, replace each block by its corresponding codeword:

    11110 10100 10100 00000 11001 ...

Thus 5 coded bits are sent for every 2 bits of uncoded message. It should be observed that for every 2 bits of information we are sending 3 extra bits (redundancy).
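A minimal sketch of this lookup-table encoding (added for illustration; the function name and structure are not from the text) reproduces the encoded stream above.

```python
# Lookup-table encoder for the (5, 2) block code of Example 3.4.
codebook = {"00": "00000", "01": "10100", "10": "11110", "11": "11001"}

def encode(bits):
    """Split the bit string into 2-bit blocks and replace each block
    by its 5-bit codeword."""
    assert len(bits) % 2 == 0, "the input must contain an even number of bits"
    return " ".join(codebook[bits[i:i + 2]] for i in range(0, len(bits), 2))

print(encode("1001010011"))   # 11110 10100 10100 00000 11001
```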
Definition 3.5 The Code Rate of an (n, k) code is defined as the ratio (k/n), and denotes the fraction of the codeword that consists of the information symbols. The code rate is always less than unity. The smaller the code rate, the greater the redundancy, i.e., more redundant symbols are present per information symbol in a codeword. A code with greater redundancy has the potential to detect and correct more symbols in error, but reduces the actual rate of transmission of information.
Definition 3.6 The minimum distance of a code is the minimum Hamming distance between any two codewords. If the code C consists of the set of codewords {ci, i = 0, 1, ..., M − 1}, then the minimum distance of the code is given by d* = min d(ci, cj), i ≠ j. An (n, k) code with minimum distance d* is sometimes denoted by (n, k, d*).

Definition 3.7 The minimum weight of a code is the smallest weight of any non-zero codeword, and is denoted by w*.
Theorem 3.1 For a linear code the minimum distance is equal to the minimum weight of the code, i.e., d* = w*.

Intuitive proof: The distance dij between any two codewords ci and cj is simply the weight of the codeword formed by ci − cj. Since the code is linear, the difference between two codewords is another valid codeword. Thus, the minimum weight of a non-zero codeword equals the minimum distance of the code.
Definition 3.8 A linear code has the following properties:
(i) The sum of two codewords belonging to the code is also a codeword belonging to the code.
(ii) The all-zero word is always a codeword.
(iii) The minimum Hamming distance between two codewords of a linear code is equal to the minimum weight of any non-zero codeword, i.e., d* = w*.
Note that if the sum of two codewords is another codeword, the difference of two codewords will also yield a valid codeword. For example, if c1, c2 and c3 are valid codewords such that c1 + c2 = c3, then c3 − c1 = c2. Hence it is obvious that the all-zero codeword must always be a valid codeword for a linear block code (self-subtraction of a codeword).
Example 3.5 The code C = {0000, 1010, 0101, 1111} is a linear block code of block length n = 4. Observe that all the ten possible sums of the codewords,

    0000 + 0000 = 0000, 0000 + 1010 = 1010, 0000 + 0101 = 0101,
    0000 + 1111 = 1111, 1010 + 1010 = 0000, 1010 + 0101 = 1111,
    1010 + 1111 = 0101, 0101 + 0101 = 0000, 0101 + 1111 = 1010 and
    1111 + 1111 = 0000,

are in C, and the all-zero codeword is in C. The minimum distance of this code is d* = 2. In order to verify the minimum distance of this linear code we can determine the distance between all pairs of codewords (of which there are C(4, 2) = 6):

    d(0000, 1010) = 2, d(0000, 0101) = 2, d(0000, 1111) = 4,
    d(1010, 0101) = 4, d(1010, 1111) = 2, d(0101, 1111) = 2.

We observe that the minimum distance of this code is 2.
Note that the code given in Example 3.4 is not linear because 10100 + 11110 = 01010, which is not a valid codeword. Even though the all-zero word is a valid codeword there, this does not guarantee linearity. The presence of the all-zero codeword is thus a necessary but not a sufficient condition for linearity.
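For a binary code, linearity amounts to closure under componentwise XOR, since every word is its own additive inverse. The sketch below (added for illustration, not from the text) checks this closure for the codes of Examples 3.5 and 3.4.

```python
def is_linear_binary(code):
    """Check closure of a binary block code under componentwise XOR,
    which is sufficient for linearity over GF(2)."""
    words = {tuple(c) for c in code}
    return all(tuple(a ^ b for a, b in zip(c1, c2)) in words
               for c1 in words for c2 in words)

C_linear = [(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 1, 1)]           # Example 3.5
C_nonlinear = [(0, 0, 0, 0, 0), (1, 0, 1, 0, 0), (1, 1, 1, 1, 0), (1, 1, 0, 0, 1)]  # Example 3.4
print(is_linear_binary(C_linear))      # True
print(is_linear_binary(C_nonlinear))   # False
```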
In order to make the error correcting codes easier to use, understand and analyze, it is helpful to impose some basic algebraic structure on them. As we shall soon see, it is useful to have an alphabet wherein it is easy to carry out basic mathematical operations such as addition, subtraction, multiplication and division.
Definition 3.9 A field F is a set of elements with two operations + (addition) and · (multiplication) satisfying the following properties:
(i) F is closed under + and ·, i.e., a + b and a · b are in F if a and b are in F.
For all a, b and c in F, the following hold:
(ii) Commutative laws: a + b = b + a, a · b = b · a
(iii) Associative laws: (a + b) + c = a + (b + c), a · (b · c) = (a · b) · c
(iv) Distributive law: a · (b + c) = a · b + a · c
Further, identity elements 0 and 1 must exist in F satisfying:
(v) a + 0 = a
(vi) a · 1 = a
(vii) For any a in F, there exists an additive inverse (−a) such that a + (−a) = 0.
(viii) For any non-zero a in F, there exists a multiplicative inverse (a^−1) such that a · a^−1 = 1.
The above properties are true for fields with both finite as well as infinite numbers of elements. A field with a finite number of elements (say, q) is called a Galois Field (pronounced Galva Field) and is denoted by GF(q). If only the first seven properties are satisfied, then it is called a ring.
Example 3.6 Consider GF(4) with 4 elements {0, 1, 2, 3}. The addition and multiplication tables for GF(4) are

    +  0 1 2 3        ·  0 1 2 3
    0  0 1 2 3        0  0 0 0 0
    1  1 0 3 2        1  0 1 2 3
    2  2 3 0 1        2  0 2 3 1
    3  3 2 1 0        3  0 3 1 2

It should be noted here that the addition in GF(4) is not modulo-4 addition.
Let us define a vector space, GF(q)^n, which is the set of n-tuples of elements from GF(q). Linear block codes can be looked upon as a set of n-tuples (vectors of length n) over GF(q) such that the sum of two codewords is also a codeword, and the product of any codeword by a field element is a codeword. Thus, a linear block code is a subspace of GF(q)^n.
Let S be a set of vectors of length n whose components are defined over GF(q). The set of all linear combinations of the vectors of S is called the linear span of S and is denoted by <S>. The linear span is thus a subspace of GF(q)^n, generated by S. Given any subset S of GF(q)^n, it is possible to obtain a linear code C = <S> generated by S, consisting of precisely the following codewords:
(i) the all-zero word,
(ii) all words in S,
(iii) all linear combinations of two or more words in S.
Example 3.7 Let S = {1100, 0100, 0011}. All possible linear combinations of S are 1100 + 0100 = 1000, 1100 + 0011 = 1111, 0100 + 0011 = 0111, 1100 + 0100 + 0011 = 1011. Therefore, C = <S> = {0000, 1100, 0100, 0011, 1000, 1111, 0111, 1011}. The minimum distance of this code is w(0100) = 1.
Example 3.8 Let S = {12, 21}, defined over GF(3). The addition and multiplication tables of the field GF(3) = {0, 1, 2} are given by:

    +  0 1 2        ·  0 1 2
    0  0 1 2        0  0 0 0
    1  1 2 0        1  0 1 2
    2  2 0 1        2  0 2 1

All possible linear combinations of 12 and 21 are:
12 + 21 = 00, 12 + 2(21) = 21, 2(12) + 21 = 12.
Therefore, C = <S> = {00, 12, 21, 00, 21, 12} = {00, 12, 21}.
3.3 MATRIX DESCRIPTION OF LINEAR BLOCK CODES
As we have observed earlier, any code C is a subspace of GF(q)^n. Any set of basis vectors can be used to generate the code space. We can, therefore, define a generator matrix, G, the rows of which form the basis vectors of the subspace. The rows of G will be linearly independent. Thus, a linear combination of the rows can be used to generate the codewords of C. The generator matrix will be a k × n matrix with rank k. Since the choice of the basis vectors is not unique, the generator matrix is not unique for a given linear code.
The generator matrix converts (encodes) a vector of length k to a vector of length n. Let the input vector (uncoded symbols) be represented by i. The coded symbols will be given by

    c = iG                                                            (3.1)

where c is called the codeword and i is called the information word.
The generator matrix provides a concise and efficient way of representing a linear block code. The k × n matrix can generate q^k codewords. Thus, instead of having a large look-up table of q^k codewords, one can simply store the generator matrix. This provides an enormous saving in storage space for large codes. For example, for the binary (46, 24) code the total number of codewords is 2^24 = 16,777,216 and the size of the lookup table of codewords would be n × 2^k = 771,751,936 bits. On the other hand, if we use a generator matrix, the total storage requirement would be only n × k = 46 × 24 = 1104 bits.
Example 3.9 Consider the generator matrix

    G = [ 1 0 1 ]
        [ 0 1 0 ]

The four codewords are obtained by multiplying each information word by G:

    c1 = [0 0] G = [0 0 0],   c2 = [0 1] G = [0 1 0],
    c3 = [1 0] G = [1 0 1],   c4 = [1 1] G = [1 1 1].

Therefore, this generator matrix generates the code C = {000, 010, 101, 111}.
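A short sketch (illustrative, not from the text) enumerates the codewords c = iG for Example 3.9 by running the information word i over all q^k possibilities.

```python
from itertools import product

def codewords_from_generator(G, q=2):
    """All q^k codewords c = i G (arithmetic modulo q) of the linear block
    code defined by the k x n generator matrix G, given as a list of k rows."""
    k, n = len(G), len(G[0])
    codewords = []
    for i in product(range(q), repeat=k):          # every possible information word
        c = tuple(sum(i[r] * G[r][col] for r in range(k)) % q for col in range(n))
        codewords.append(c)
    return codewords

G = [[1, 0, 1],
     [0, 1, 0]]
print(codewords_from_generator(G))   # [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
```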
3.4 EQUIVALENT CODES
Definition 3.10 A permutation of a set S = {x1, x2, ..., xn} is a one-to-one mapping from S to itself. A permutation can be denoted as follows:

    ( x1      x2      ...   xn    )
    ( f(x1)   f(x2)   ...   f(xn) )                                   (3.2)

Definition 3.11 Two q-ary codes are called equivalent if one can be obtained from the other by one or both of the operations listed below:
(i) permutation of the symbols appearing in a fixed position,
(ii) permutation of the positions of the code.
Suppose a code containing M codewords is displayed in the form of an M × n matrix, where the rows represent the codewords. Operation (i) corresponds to the re-labelling of the symbols appearing in a given column, and operation (ii) represents the rearrangement of the columns of the matrix.
Example 3.10 Consider the ternary code (a code whose components ∈ {0, 1, 2}) of block length 3

    C = { 2 0 1
          1 2 0
          0 1 2 }

If we apply the permutation 0 → 2, 2 → 1, 1 → 0 to column 2 and 1 → 2, 0 → 1, 2 → 0 to column 3, we obtain

    C' = { 2 2 2
           1 1 1
           0 0 0 }

The code C' is equivalent to a repetition code of length 3.
Note that the original code is not linear, but is equivalent to a linear code.
Definition 3.12 Two linear q-ary codes are called equivalent if one can be
obtained from the other by one or both operations listed below:
(i) multiplication of the components by a non-zero scalar,
(ii) permutation of the positions of the code.
Note that in Definition 3.11 we have defined equivalent codes that are not necessarily
linear.
Theorem 3.2 Two k x n matrices generate equivalent linear (n, k) codes over GF(q) if one
matrix can be obtained from the other by a sequence of the following operations:
(i) Permutation of rows
(ii) Multiplication of a row by a non-zero scalar
(iii) Addition of a scalar multiple of one row to another
(iv) Permutation of columns
(v) Multiplication of any column by a non-zero scalar.
Proof The first three operations (which are just row operations) preserve the linear independence of the rows of the generator matrix. These operations merely modify the basis. The last two operations (which are column operations) convert the matrix to one which will produce an equivalent code.
Theorem 3.3 A generator matrix can be reduced to its systematic form (also called the standard form of the generator matrix) of the type G = [I | P], where I is a k × k identity matrix and P is a k × (n − k) matrix.

Proof The k rows of any generator matrix (of size k × n) are linearly independent. Hence, by performing elementary row operations and column permutations it is possible to obtain an equivalent generator matrix in row echelon form. This matrix will be of the form [I | P].
Example 3.11 Consider the generator matrix of a (4, 3) code over GF(3):

    G = [ 0 1 2 1 ]
        [ 1 0 1 0 ]
        [ 1 2 2 1 ]

Let us represent the i-th row by ri and the j-th column by cj. Upon replacing r3 by r3 − r1 − r2 we get (note that in GF(3), −1 = 2 and −2 = 1 because 1 + 2 = 0; see the tables in Example 3.8)

    G = [ 0 1 2 1 ]
        [ 1 0 1 0 ]
        [ 0 1 2 0 ]

Next we replace r1 by r1 − r3 to obtain

    G = [ 0 0 0 1 ]
        [ 1 0 1 0 ]
        [ 0 1 2 0 ]

Finally, shifting c4 → c1, c1 → c2, c2 → c3 and c3 → c4 we obtain the standard form of the generator matrix

    G = [ 1 0 0 0 ]
        [ 0 1 0 1 ]
        [ 0 0 1 2 ]
3.5 PARITY CHECK MATRIX
One of the objectives of a good code design is to have fast and efficient encoding and decoding methodologies. So far we have dealt with the efficient generation of linear block codes using a generator matrix. Codewords are obtained simply by multiplying the input vector (uncoded word) by the generator matrix. Is it possible to detect a valid codeword using a similar concept? The answer is yes, and such a matrix is called the Parity Check Matrix, H, for the given code. For a parity check matrix,

    cH^T = 0                                                          (3.3)

where c is a valid codeword. Since c = iG, therefore, iGH^T = 0. For this to hold true for all valid information words we must have

    GH^T = 0                                                          (3.4)

The size of the parity check matrix is (n − k) × n. A parity check matrix provides a simple method of detecting whether an error has occurred or not. If the multiplication of the received word (at the receiver) with the transpose of H yields a non-zero vector, it implies that an error has occurred. This methodology, however, will fail if the errors in the transmitted codeword exceed the number of errors for which the coding scheme is designed. We shall soon find out that the non-zero product cH^T might help us not only to detect but also to correct the errors under some conditions.
Suppose the generator matrix is represented in its systematic form G = [I | P]. The matrix P is called the Coefficient Matrix. Then the parity check matrix will be given by

    H = [−P^T | I],                                                   (3.5)

where P^T represents the transpose of matrix P. This is because

    GH^T = [I | P] [−P^T | I]^T = −P + P = 0                          (3.6)

Since the choice of a generator matrix is not unique for a code, the parity check matrix will not be unique either. Given a generator matrix G, we can determine the corresponding parity check matrix and vice versa. Thus the parity check matrix H can be used to specify the code completely. From Eq. (3.3) we observe that the vector c must have 1's in such positions that the corresponding rows of H^T add up to the zero vector 0. Now, we know that the number of 1's in a codeword pertains to its Hamming weight. Hence, the minimum distance d* of a linear block code is given by the minimum number of rows of H^T (or, equivalently, columns of H) whose sum is equal to the zero vector.
Example 3.12 For a (7, 4) linear block code the generator matrix is given by

    G = [ 1 0 0 0 1 0 1 ]
        [ 0 1 0 0 1 1 1 ]
        [ 0 0 1 0 1 1 0 ]
        [ 0 0 0 1 0 1 1 ]

The matrix P is given by

    P = [ 1 0 1 ]
        [ 1 1 1 ]
        [ 1 1 0 ]
        [ 0 1 1 ]

and P^T is given by

    P^T = [ 1 1 1 0 ]
          [ 0 1 1 1 ]
          [ 1 1 0 1 ]

Observing the fact that −1 = 1 for the binary case, we can write the parity check matrix as

    H = [−P^T | I] = [ 1 1 1 0 1 0 0 ]
                     [ 0 1 1 1 0 1 0 ]
                     [ 1 1 0 1 0 0 1 ]

Note that the columns 1, 5 and 7 of the parity check matrix, H, add up to the zero vector. Hence, for this code, d* = 3.
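The sketch below (illustrative, not from the text; it uses the systematic matrices as reconstructed above, so treat the specific entries as an assumption) builds H = [-P^T | I] from G = [I | P] and verifies that every codeword satisfies cH^T = 0 and that the minimum weight of the code is 3.

```python
import itertools
import numpy as np

# Systematic generator G = [I | P] for the (7, 4) code discussed above;
# the P entries are the reconstructed (illustrative) values.
P = np.array([[1, 0, 1],
              [1, 1, 1],
              [1, 1, 0],
              [0, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])
H = np.hstack([P.T, np.eye(3, dtype=int)])     # -P^T = P^T over GF(2)

# Every codeword c = iG (mod 2) must satisfy c H^T = 0 (mod 2).
codewords = [(np.array(i) @ G) % 2 for i in itertools.product((0, 1), repeat=4)]
print(all(not np.any((c @ H.T) % 2) for c in codewords))    # True
print(min(int(c.sum()) for c in codewords if c.any()))      # minimum weight: 3
```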
Theorem 3.4 The code C contains a nonzero codeword of Hamming weight w or less if and only if a linearly dependent set of w columns of H exists.

Proof Consider a codeword c ∈ C. Let the weight of c be w, which implies that there are w non-zero components and (n − w) zero components in c. If we discard the (n − w) zero components, then from the relation cH^T = 0 we can conclude that w columns of H are linearly dependent.
Conversely, if H has w linearly dependent columns, then a linear combination of at most w columns is zero. These w non-zero coefficients define a codeword of weight w or less that satisfies cH^T = 0.
Definition 3.13 An (n, k) systematic code is one in which the first k symbols of the codeword of block length n are the information symbols themselves (i.e., the uncoded vector) and the remaining (n − k) symbols form the parity symbols.
Example 3.13 The following is a (5, 2) systematic code over GF(3)

    S.No.   Information Symbols (k = 2)   Codewords (n = 5)
    1.      00                            00 000
    2.      01                            01 121
    3.      02                            02 220
    4.      10                            10 012
    5.      11                            11 221
    6.      12                            12 210
    7.      20                            20 020
    8.      21                            21 100
    9.      22                            22 212

Note that the total number of codewords is 3^k = 3^2 = 9. Each codeword begins with the information symbols and has three parity symbols at the end. The parity symbols for the information word 01 are 121 in the above table. A generator matrix in the systematic form (standard form) will generate a systematic code.
Theorem 3.5 The minimum distance (minimum weight) of an (n, k) linear code is bounded as follows:

    d* ≤ n − k + 1                                                    (3.7)

This is known as the Singleton Bound.

Proof We can reduce any linear block code to its equivalent systematic form. Consider a systematic codeword having just one non-zero information symbol; at most all of its (n − k) parity symbols can be non-zero, so the weight of this codeword is at most (n − k) + 1. Thus the minimum weight (and hence the minimum distance) of the code cannot exceed n − k + 1. This gives the following definition of a maximum distance code.

Definition 3.14 A Maximum Distance Code satisfies d* = n − k + 1.
Having familiarized ourselves with the concept of minimum distance of a linear code, we
shall now explore how this minimum distance is related to the total number of errors the code
can detect and possibly correct. So we move over to the receiver end and take a look at the
methods of decoding a linear block code.
3.6 DECODING OF A LINEAR BLOCK CODE
The basic objective of channel coding is to detect and correct errors when messages are transmitted over a noisy channel. The noise in the channel randomly transforms some of the symbols of the transmitted codeword into some other symbols. If the noise, for example, changes just one of the symbols in the transmitted codeword, the erroneous word will be at a Hamming distance of one from the original codeword. If the noise transforms t symbols (that is, t symbols in the codeword are in error), the received word will be at a Hamming distance of t from the originally transmitted codeword. Given a code, how many errors can it detect and how many can it correct? Let us first look at the detection problem.
An error will be detected as long as it does not transform one codeword into another valid codeword. If the minimum distance between the codewords is d*, the weight of the error pattern must be d* or more to cause a transformation of one codeword into another. Therefore, an (n, k, d*) code will detect at least all nonzero error patterns of weight less than or equal to (d* − 1). Moreover, there is at least one error pattern of weight d* which will not be detected. This corresponds to the two codewords that are the closest. It may be possible that some error patterns of weight d* or more are detected, but all error patterns of weight d* will not be detected.
Example 3.14 For the code C1 = {000, 111} the minimum distance is 3. Therefore error patterns of weight 2 or 1 can be detected. This means that any error pattern belonging to the set {011, 101, 110, 001, 010, 100} will be detected by this code.
Next consider the code C2 = {001, 110, 101} with d* = 1. Nothing can be said regarding how many errors this code can detect because d* − 1 = 0. However, the error pattern 010 of weight 1 can be detected by this code. But it cannot detect all error patterns of weight one; e.g., the error vector 100 cannot be detected.
Next let us look at the problem of error correction. The objective is to make the best possible
guess regarding the originally transmitted codeword on the basis of the received word. What
would be a smart decoding strategy? Since only one of the valid codewords must have been
transmitted, it is logical to conclude that a valid codeword nearest (in terms of Hamming
distance) to the received word must have been actually transmitted. In other words, the
codeword which resembles the received word most is assumed to be the one that was sent. This
strategy is called the Nearest Neighbour Decoding, as we are picking the codeword nearest
to the received word in terms of the Hamming distance.
It may be possible that more than one codeword is at the same Hamming distance from the
received word. In that case the receiver can do one of the following:
(i) It can pick one of the equally distant neighbours randomly, or
(ii) request the transmitter to re-transmit.
To ensure that the received word (with at most t errors) is closest to the original codeword, and farther from all other codewords, we must put the following condition on the minimum distance of the code:

    d* ≥ 2t + 1                                                       (3.8)

Graphically, the condition for correcting t errors or less can be visualized from Fig. 3.2. Consider the space of all q-ary n-tuples. Every q-ary vector of length n can be represented as a point in this space. Every codeword can thus be depicted as a point in this space, and all words at a Hamming distance of t or less from it would lie within the sphere centred at the codeword with a radius of t. If the minimum distance of the code is d*, and the condition d* ≥ 2t + 1 holds good, then none of these spheres intersect. Any received vector (which is just a point) within a specific sphere is closer to its centre (which represents a codeword) than to any other codeword. We will call the sphere associated with each codeword its Decoding Sphere. Hence it is possible to decode the received vector using the 'nearest neighbour' method without ambiguity.

Fig. 3.2 Decoding Spheres.

Figure 3.2 shows that words within the sphere of radius t centred at c1 will be decoded as c1. For unambiguous decoding, d* ≥ 2t + 1.
The condition d* ≥ 2t + 1 takes care of the worst case scenario. It may be possible, however, that the above condition is not met but it is still feasible to correct t errors, as illustrated in the following example.
Example 3.15 Consider the code C = {00000, 01010, 10101, 11111}. The minimum distance is d* = 2. Suppose the codeword 11111 was transmitted and the received word is 11110, i.e., t = 1 (one error has occurred, in the fifth component). Now,

    d(11110, 00000) = 4,  d(11110, 01010) = 2,
    d(11110, 10101) = 3,  d(11110, 11111) = 1.

Using nearest neighbour decoding we can conclude that 11111 was transmitted. Even though a single error correction (t = 1) was done in this case, d* < 2t + 1 = 3. So it is possible to correct errors even when d* < 2t + 1. However, in many cases a single error correction may not be possible with this code. For example, if 00000 was sent and 01000 was received,

    d(01000, 00000) = 1,  d(01000, 01010) = 1,
    d(01000, 10101) = 4,  d(01000, 11111) = 4.

In this case there cannot be a clear-cut decision, and a coin will have to be flipped!
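A minimal nearest neighbour decoder (added for illustration, not from the text) reproduces both cases of Example 3.15: one received word decodes unambiguously, while the other is equidistant from two codewords.

```python
def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbour_decode(received, code):
    """Return the codeword(s) at minimum Hamming distance from the received
    word. More than one returned codeword means the decoder must either
    guess or request a re-transmission."""
    dists = {c: hamming_distance(received, c) for c in code}
    dmin = min(dists.values())
    return [c for c, d in dists.items() if d == dmin]

C = ["00000", "01010", "10101", "11111"]
print(nearest_neighbour_decode("11110", C))   # ['11111']            -> unambiguous
print(nearest_neighbour_decode("01000", C))   # ['00000', '01010']   -> ambiguous
```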
Definition 3.15 An Incomplete Decoder decodes only those received words that are clearly closest to one of the codewords. In the case of ambiguity, the decoder declares that the received word is unrecognizable and the transmitter is requested to re-transmit. A Complete Decoder decodes every received word, i.e., it tries to map every received word to some codeword, even if it has to make a guess. The coin-flipping decoder of Example 3.15 is a complete decoder. Such decoders may be used when it is better to have a good guess rather than to have no guess at all. Most real-life decoders are incomplete decoders. Usually they send a message back to the transmitter requesting a re-transmission.

Definition 3.16 A receiver declares that an erasure has occurred (i.e., a received symbol has been erased) when the symbol is received ambiguously, or the presence of an interference is detected during reception.
Example 3.16 Consider a binary Pulse Amplitude Modulation (PAM) scheme where 1 is represented by five volts and 0 is represented by zero volts. The noise margin is one volt, which implies that at the receiver:
if the received voltage is between 4 volts and 5 volts → the bit sent is 1,
if the received voltage is between 0 volts and 1 volt → the bit sent is 0,
if the received voltage is between 1 volt and 4 volts → an erasure has occurred.
Thus if the receiver received 2.9 volts during a bit interval, it will declare that an erasure has occurred.
A channel can be prone both to errors and erasures. If in such a channel t errors and r erasures occur, the error correcting scheme should be able to compensate for the erasures as well as correct the errors. If r erasures occur, the minimum distance of the code will become d* − r in the worst case. This is because the erased symbols have to be simply discarded, and if they were contributing to the minimum distance, this distance will reduce. A simple example will illustrate the point. Consider the repetition code in which

    0 → 00000
    1 → 11111

Here d* = 5. If r = 2, i.e., two bits get erased (let us say the first two), we will have

    0 → ??000
    1 → ??111

Now, the effective minimum distance is d1* = d* − r = 3.
Therefore, for a channel with t errors and r erasures, d* − r ≥ 2t + 1, or

    d* ≥ 2t + r + 1                                                   (3.9)

For a channel which has no errors (t = 0), only r erasures,

    d* ≥ r + 1                                                        (3.10)
Next let us give a little more formal treatment to the decoding procedure. Can we construct some mathematical tools to simplify nearest neighbour decoding? Suppose the codeword c = c1 c2 ... cn is transmitted over a noisy channel. The noise in the channel changes some or all of the symbols of the codeword. Let the received vector be denoted by v = v1 v2 ... vn. Define the error vector as

    e = v − c = (v1 − c1, v2 − c2, ..., vn − cn) = e1 e2 ... en       (3.11)

The decoder has to decide from the received vector, v, which codeword was transmitted, or equivalently, it must determine the error vector, e.
Definition 3.17 Let C be an (n, k) code over GF(q) and let a be any vector of length n. Then the set

    a + C = { a + x | x ∈ C }                                         (3.12)

is called a Coset (or translate) of C. Two vectors a and b are said to be in the same coset if (a − b) ∈ C.
Theorem 3.6 Suppose C is an (n, k) code over GF(q). Then,
(i) every vector b of length n is in some coset of C.
(ii) each coset contains exactly q^k vectors.
(iii) two cosets are either disjoint or coincide (partial overlap is not possible).
(iv) if a + C is a coset of C and b ∈ a + C, we have b + C = a + C.
Proof
(i) b = b + 0 ∈ b + C.
(ii) Observe that the mapping C → a + C defined by x → a + x, for all x ∈ C, is a one-to-
one mapping. Thus the cardinality of a + C is the same as that of C, which is equal to
q^k.
(iii) Suppose the cosets a + C and b + C overlap, i.e., they have at least one vector in
common. Let v ∈ (a + C) ∩ (b + C). Thus, for some x, y ∈ C,
v = a + x = b + y.
Or, b = a + x - y = a + z, where z ∈ C
(because the difference of two codewords is also a codeword).
Thus, b + C = a + z + C, or (b + C) ⊆ (a + C).
Similarly, it can be shown that (a + C) ⊆ (b + C). From these two we can conclude
that (b + C) = (a + C).
(iv) Since b ∈ a + C, it implies that b = a + x, for some x ∈ C.
Next, if b + y ∈ b + C, then
b + y = (a + x) + y = a + (x + y) ∈ a + C.
Hence, b + C ⊆ a + C. On the other hand, if a + z ∈ a + C, then
a + z = (b - x) + z = b + (z - x) ∈ b + C.
Hence, a + C ⊆ b + C, and so b + C = a + C.
Definition 3.18 The vector having the minimum weight in a coset is called the
Coset Leader. If there is more than one vector with the minimum weight, one of
them is chosen at random and is declared the coset leader.
Example 3.17 Let C be the binary (3, 2) code with the generator matrix given by
G = [1 0 1]
    [0 1 0]
i.e., C = {000, 010, 101, 111}. The cosets of C are
000 + C = 000, 010, 101, 111,
001 + C = 001, 011, 100, 110.
Note that all the eight vectors have been covered by these two cosets. As we have already seen (in
the above theorem), if a + C is a coset of C and b ∈ a + C, we have b + C = a + C.
Hence, all cosets have been listed. For the sake of illustration we write down the following:
010 + C = 010, 000, 111, 101,
011 + C = 011, 001, 110, 100,
100 + C = 100, 110, 001, 011,
101 + C = 101, 111, 000, 010,
110 + C = 110, 100, 011, 001,
111 + C = 111, 101, 010, 000.
It can be seen that all these sets are already covered.
Since two cosets are either disjoint or coincide (from Theorem 3.6), the set of all vectors, GF(q)^n,
can be written as
GF(q)^n = C ∪ (a1 + C) ∪ (a2 + C) ∪ ... ∪ (at + C),
where
t = q^(n-k) - 1.
Definition 3.19 A Standard Array for an (n, k) code C is a q^(n-k) x q^k array of all
vectors in GF(q)^n in which the first row consists of the code C (with 0 on the extreme
left), and the other rows are the cosets ai + C, each arranged in corresponding order,
with the coset leader on the left.
Steps for constructing a standard array:
(i) In the first row write down all the valid codewords, starting with the all-zero codeword.
(ii) Choose a vector a1 of minimum weight which is not in the first row. Write down the coset a1 + C as the
second row such that a1 + x is written under x ∈ C.
(iii) Next choose another vector a2 (not present in the first two rows) of minimum weight and
write down the coset a2 + C as the third row such that a2 + x is written under x ∈ C.
(iv) Continue the process until all the cosets are listed and every vector in GF(q)^n appears
exactly once.
Example 3.18 Consider the code C = {0000, 1011, 0101, 1110}. The corresponding standard
array is
codewords →  0000   1011   0101   1110
             1000   0011   1101   0110
             0100   1111   0001   1010
             0010   1001   0111   1100
              ↑
        coset leader
Note that each entry is the sum of the codeword and its coset leader.
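The construction steps above are easy to automate. The following is a minimal sketch (not from the text), in Python, that builds the standard array of the binary (4, 2) code of Example 3.18; when several minimum-weight vectors are available, the coset leader is simply the first one encountered, since Definition 3.18 allows the choice to be arbitrary.

from itertools import product

def standard_array(codewords, q=2):
    n = len(codewords[0])
    rows = [list(codewords)]                   # first row: the code itself, 0 on the left
    covered = set(codewords)
    # visit all q^n vectors in order of increasing Hamming weight
    for leader in sorted(product(range(q), repeat=n), key=lambda w: sum(s != 0 for s in w)):
        if leader in covered:
            continue
        row = [tuple((leader[i] + c[i]) % q for i in range(n)) for c in codewords]
        rows.append(row)
        covered.update(row)
    return rows

code = [(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 0, 1), (1, 1, 1, 0)]
for row in standard_array(code):
    print(row)

Running this lists the same four cosets as the array above, one row per coset with the coset leader in the first column.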
Let us now look at the concept of decoding (obtaining the information symbols from the received
codewords) using the standard array. Since the standard array comprises all possible words
belonging to GF(q)^n, the received word can always be identified with one of the elements of the
standard array. If the received word is a valid codeword, it is concluded that no errors have
occurred (this conclusion may be wrong with a very low probability of error, when one valid
codeword gets modified to another valid codeword due to noise!). In the case that the received
word, v, does not belong to the set of valid codewords, we surmise that an error has occurred.
The decoder then declares that the coset leader is the error vector, e, and decodes the codeword
as v - e. This is the codeword at the top of the column containing v. Thus, mechanically, we
decode the codeword as the one on the top of the column containing the received word.
Example 3.19 Suppose the code in the previous example C = {0000, 1011, 0101, 1110} is used
and the received word is v = 1101. Since it is not one of the valid codewords, we deduce that an
error has occurred. Next we try to estimate which one of the four possible codewords was actually
transmitted. If we make use of the standard array of the earlier example, we find that 1101 lies in
the 3rd column. The topmost entry of this column is 0101. Hence the estimated codeword is 0101.
Observe that:
d(1101, 0000) = 3, d(1101, 1011) = 2,
d(1101, 0101) = 1, d(1101, 1110) = 2,
and the error vector e = 1000, the coset leader.
Codes with larger blocklengths are desirable (though not always; see the concluding remarks
of this chapter) because the code rates of larger codes perform closer to the Shannon Limit. As
we go to larger codes (with larger values of k and n), the method of the standard array becomes
less practical because the size of the standard array (q^(n-k) x q^k) will become unmanageably large.
One of the basic objectives of coding theory is to develop efficient decoding strategies. If we are
to build decoders that will work in real-time, the decoding scheme should be realizable both in
terms of memory required as well as the computational load. Is it possible to reduce the standard
array? The answer lies in the concept of Syndrome Decoding, which we are going to discuss
next.
3.7 SYNDROME DECODING
The standard array can be simplified if we store only the first column, and compute the
remaining columns, if needed. To do so, we introduce the concept of the Syndrome of the error
pattern.
Definition 3.20 Suppose H is a parity check matrix of an (n, k) code, then for any
vector v ∈ GF(q)^n, the vector
s = vH^T (3.13)
is called the Syndrome of v.
The syndrome of v is sometimes explicitly written as s(v). It is called a syndrome
because it gives us the symptoms of the error, thereby helping us to diagnose the
error.
Theorem 3.7 Two vectors x and y are in the same coset of C if and only if they have the
same syndrome.
Proof The vectors x and y belong to the same coset
⇔ x + C = y + C
⇔ x - y ∈ C
⇔ (x - y)H^T = 0
⇔ xH^T = yH^T
⇔ s(x) = s(y)
Thus, there is a one-to-one correspondence between cosets and syndromes.
We can reduce the size of the standard array by simply listing the syndromes and the
corresponding coset leaders.
Example 3.20 We now extend the standard array listed in Example 3.18 by adding a syndrome
column. The code is C = {0000, 1011, 0101, 1110}. The corresponding standard array is
Codewords →  0000   1011   0101   1110     Syndrome  00
             1000   0011   1101   0110               11
             0100   1111   0001   1010               01
             0010   1001   0111   1100               10
              ↑
        coset leader
The steps for syndrome decoding are as follows:
(i) Determine the syndrome (s = vH^T) of the received word, v.
(ii) Locate the syndrome in the 'syndrome column'.
(iii) Determine the corresponding coset leader. This is the error vector, e.
(iv) Subtract this error vector from the received word to get the codeword c = v - e.
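A minimal sketch (not from the text) of these four steps in Python is given below, for the (4, 2) code of Example 3.20. The parity check matrix H and the syndrome table are assumptions chosen to be consistent with the code C = {0000, 1011, 0101, 1110}; they are not quoted from the book.

import numpy as np

H = np.array([[1, 0, 1, 0],                     # assumed parity check matrix for C
              [1, 1, 0, 1]])

coset_leaders = {(0, 0): (0, 0, 0, 0),          # syndrome -> coset leader (error vector)
                 (1, 1): (1, 0, 0, 0),
                 (0, 1): (0, 1, 0, 0),
                 (1, 0): (0, 0, 1, 0)}

def decode(v):
    s = tuple(np.dot(H, v) % 2)                 # step (i): compute the syndrome s = vH^T
    e = np.array(coset_leaders[s])              # steps (ii)-(iii): look up the coset leader
    return (np.array(v) - e) % 2                # step (iv): c = v - e

print(decode([1, 1, 0, 1]))                     # -> [0 1 0 1], as in Example 3.19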
Having developed an efficient decoding methodology by means of syndrome decoding, let us
now find out how much advantage coding actually provides.
3.8 ERROR PROBABILITY AFTER CODING (PROBABILITY OF
ERROR CORRECTION)
Definition 3.21 The Probability of Error (or, the Word Error Rate) P_err for any
decoding scheme is the probability that the decoder output is a wrong codeword. It is
also called the Residual Error Rate.
Suppose there are M codewords (of length n) which are used with equal probability. Let the
decoding be done using a standard array. Let the number of coset leaders with weight i be
denoted by α_i. We assume that the channel is a BSC with symbol error probability p. A decoding
error occurs if the error vector e is not a coset leader. Therefore, the probability of correct
decoding will be
P_cor = Σ_{i=0}^{n} α_i p^i (1 - p)^(n-i) (3.14)
Hence, the probability of error will be
P_err = 1 - Σ_{i=0}^{n} α_i p^i (1 - p)^(n-i) (3.15)
Example 3.21 Consider the standard array in Example 3.18. The coset leaders are 0000, 1000,
0100 and 0010. Therefore α_0 = 1 (only one coset leader with weight equal to zero), α_1 = 3 (the
remaining three are of weight one) and all other α_i = 0.
Therefore,
P_err = 1 - [(1 - p)^4 + 3p(1 - p)^3]
Recall that this code has four codewords, and can be used to send 2 bits at a time. If we did not
perform coding, the probability of error of the 2-bit message being received incorrectly would be
P_err = 1 - P_cor = 1 - (1 - p)^2.
Note that for p = 0.01, the Word Error Rate (upon coding) is P_err = 0.0103, while for the uncoded
case P_err = 0.0199. So, coding has almost halved the word error rate. The comparison of P_err for
messages with and without coding is plotted in Fig. 3.3. It can be seen that coding outperforms the
uncoded case only for p < 0.5. Note that the improvement due to coding comes at the cost of
information transfer rate. In this example, the rate of information transfer has been cut down by
half as we are sending two parity bits for every two information bits.
Fig. 3.3 Comparison of P_err for Coded and Uncoded 2-Bit Messages (P_err versus the bit error probability p).
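The two word error rates quoted above can be checked numerically. The short Python sketch below (not from the text) evaluates P_err for the coded and uncoded cases and also shows the crossover at p = 0.5 visible in Fig. 3.3.

def p_err_coded(p):
    # 1 - [alpha_0 (1-p)^4 + alpha_1 p (1-p)^3] with alpha_0 = 1, alpha_1 = 3
    return 1 - ((1 - p)**4 + 3 * p * (1 - p)**3)

def p_err_uncoded(p):
    return 1 - (1 - p)**2

for p in (0.01, 0.1, 0.5, 0.6):
    print(p, round(p_err_coded(p), 4), round(p_err_uncoded(p), 4))

At p = 0.01 this prints roughly 0.0103 (coded) and 0.0199 (uncoded); at p = 0.5 the two values coincide, and beyond that the coded scheme is the worse of the two.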
Example 3.22 This example will help us visualize the power of coding. Consider a BSC with the
probability of symbol error p = 10^-7. Suppose 10-bit long words are being transmitted without
coding. Let the bit rate of the transmitter be 10^7 b/s, which implies that 10^6 words/s are being sent.
The probability that a word is received incorrectly is (writing C(n, i) for the binomial coefficient)
C(10,1)(1 - p)^9 p + C(10,2)(1 - p)^8 p^2 + C(10,3)(1 - p)^7 p^3 + ... ≈ C(10,1)(1 - p)^9 p ≈ 10^-6.
Therefore, in one second, 10^-6 x 10^6 = 1 word will be in error! The implication is that every
second a word will be in error and it will not be detected.
Next, let us add a parity bit to the uncoded words so as to make them 11 bits long. The parity
makes all the codewords of even parity and thus ensures that a single bit in error will be detected.
The only way that the coded word will be in error is if two or more bits get flipped, i.e., at least two
bits are in error. This can be computed as 1 - probability that less than two bits are in error.
Therefore, the probability of word error will be
1 - (1 - p)^11 - C(11,1)(1 - p)^10 p ≈ 1 - (1 - 11p) - 11(1 - 10p)p = 110p^2 = 11 x 10^-13.
The new word rate will be 10^7/11 words/s because now 11 bits constitute one word and the bit
rate is the same as before. Thus, in one second, (10^7/11) x (11 x 10^-13) = 10^-6 words will be in
error. This implies that, after coding, one word will be received incorrectly without detection every
10^6 seconds ≈ 11.5 days!
So just by increasing the word length from 10 bits (uncoded) to 11 bits (with coding), we have
been able to obtain a dramatic decrease in the Word Error Rate. For the second case, each time a
word is detected to be in error, we can request the transmitter to re-transmit the word.
This strategy for retransmission is called the Automatic Repeat Request (ARQ).
3.9 PERFECT CODES
Definition 3.22 For any vector u in GF(q)^n and any integer r ≥ 0, the sphere of
radius r and centre u, denoted by S(u, r), is the set {v ∈ GF(q)^n | d(u, v) ≤ r}.
This definition can be interpreted graphically, as shown in Fig. 3.4. Consider a code C with
minimum distance d*(C) ≥ 2t + 1. The spheres of radius t centred at the codewords {c1, c2, ..., cM}
of C will then be disjoint. Now consider the decoding problem. Any received vector can be
represented as a point in this space. If this point lies within a sphere, then by nearest neighbour
decoding it will be decoded as the centre of the sphere. If t or fewer errors occur, the received
word will definitely lie within the sphere of the codeword that was transmitted, and will be
correctly decoded. If, however, more than t errors occur, it will escape the sphere, thus resulting
in incorrect decoding.
Fig. 3.4 The concept of spheres in GF(q)^n.
The codewords of the code with d*(C) ≥ 2t + 1 are the centres of these non-overlapping spheres.
Theorem 3.8 A sphere of radius r (0 ≤ r ≤ n) contains exactly
C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, r)(q - 1)^r vectors. (3.16)
Proof Consider a vector u in GF(q)^n and another vector v which is at a distance m from
u. This implies that the vectors u and v differ at exactly m places. The total number of ways
in which m positions can be chosen from n positions is C(n, m). Now, each of these m places
can be replaced by (q - 1) possible symbols. This is because the total size of the alphabet is
q, out of which one is currently being used in that particular position in u. Hence, the
number of vectors at a distance exactly m from u is C(n, m)(q - 1)^m. Summing over m = 0, 1, ..., r,
the total number of vectors in the sphere of radius r is
C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, r)(q - 1)^r (3.17)
Example 3.23 Consider a binary code (i.e., q = 2). From Theorem 3.8, the number of vectors
at a distance 2 or less from any given codeword is C(n, 0) + C(n, 1) + C(n, 2).
Without loss of generality we can choose the fixed vector to be the all-zero vector; the vectors at a
distance 2 or less are then simply all the vectors of Hamming weight 0, 1 or 2.
Theorem 3.9 A q-ary (n, k) code with M codewords and minimum distance (2t + 1)
satisfies
M{C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t} ≤ q^n (3.18)
Proof Suppose C is a q-ary (n, k) code. Consider spheres of radius t centred on the M
codewords. Each sphere of radius t has
C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t
vectors (Theorem 3.8). Since none of the spheres intersect, the total number of vectors for
the M disjoint spheres is M{C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t}, which is upper
bounded by q^n, the total number of vectors of length n in GF(q)^n.
This bound is called the Hamming Bound or the Sphere Packing Bound and it holds
good for nonlinear codes as well. For binary codes, the Hamming Bound becomes
M{C(n, 0) + C(n, 1) + C(n, 2) + ... + C(n, t)} ≤ 2^n (3.19)
It should be noted here that the mere existence of a set of integers n, M and t satisfying
the Hamming Bound does not confirm the existence of a binary code with those parameters. For example, the set n = 5, M
= 5 and t = 1 satisfies the Hamming Bound. However, no binary code exists for this
specification.
Observe that for the case when M = q^k, the Hamming Bound may be alternatively
written as
C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t ≤ q^(n-k) (3.20)
Definition 3.23 A code that satisfies the Hamming Bound with equality, i.e., for which
M{C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t} = q^n,
is called a Perfect Code.
For a Perfect Code, there are equal-radius disjoint spheres centred at the codewords
which completely fill the space. Thus, a t-error correcting perfect code utilizes the
entire space in the most efficient manner.
Example 3.24 Consider the Binary Repetition Code
C = {00...0, 11...1}
of block length n, where n is odd. In this case M = 2 and t = (n - 1)/2. Upon substituting these
values in the left hand side of the inequality for the Hamming Bound we get
2{C(n, 0) + C(n, 1) + ... + C(n, (n - 1)/2)} = 2 · 2^(n-1) = 2^n.
Thus the repetition code is a Perfect Code. It is actually called a Trivial Perfect Code. In the next
chapter, we shall see some examples of Non-trivial Perfect Codes.
One of the ways to search for perfect codes is to obtain the integer solutions for the
parameters n, q, M and t in the equation for the Hamming Bound. Some of the solutions found by
exhaustive computer search are listed below.
S.No.   n     q     M       t
1       23    2     2^12    3
2       90    2     2^78    2
3       11    3     3^6     2
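Such a search is simple to program. The sketch below (not from the text, and essentially Computer Problem 3.16) looks for parameter sets that meet the Hamming Bound with equality; restricting M to powers of q (so that M = q^k) is an assumption made here to keep the output short.

from math import comb

def sphere_size(n, q, t):
    return sum(comb(n, i) * (q - 1)**i for i in range(t + 1))

for q in (2, 3):
    for n in range(3, 100):
        for t in range(1, (n - 1) // 2 + 1):
            vol = sphere_size(n, q, t)
            if q**n % vol == 0:
                M = q**n // vol
                k = 0
                while q**k < M:
                    k += 1
                if q**k == M and 0 < k < n:     # keep only M = q^k with 0 < k < n
                    print(f"n={n}, q={q}, M={q}^{k}, t={t}")

Among its output are the three parameter sets listed in the table above, together with the binary Hamming and repetition code parameters.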
3.10 HAMMING CODES
There are both binary and non-binary Hamming Codes. Here, we shall limit our discussion to
binary Hamming Codes. The binary Hamming Codes have the property that
(n, k) = (2^m - 1, 2^m - 1 - m) (3.22)
where m is any positive integer. For example, for m = 3 we have a (7, 4) Hamming Code. The
parity check matrix, H, of a Hamming Code is a very interesting matrix. Recall that the parity
check matrix of an (n, k) code has n - k rows and n columns. For the binary (n, k) Hamming
code, the n = 2^m - 1 columns consist of all possible binary vectors with n - k = m elements,
except the all-zero vector.
Example 3.25 The generator matrix for the binary (7, 4) Hamming Code is given by
G = [1 1 0 1 0 0 0]
    [0 1 1 0 1 0 0]
    [0 0 1 1 0 1 0]
    [0 0 0 1 1 0 1]
The corresponding parity check matrix is
H = [1 0 1 1 1 0 0]
    [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]
Observe that the columns of the parity check matrix consist of (100), (010), (101), (110), (111),
(011) and (001). These seven are all the possible non-zero binary vectors of length three. It is quite
easy to generate a systematic Hamming Code. The parity check matrix H can be arranged in the
systematic form as follows
H = [1 1 1 0 1 0 0]
    [0 1 1 1 0 1 0]  = [-P^T | I].
    [1 1 0 1 0 0 1]
Thus, the generator matrix in the systematic form for the binary Hamming code is
G = [I | P] = [1 0 0 0 : 1 0 1]
              [0 1 0 0 : 1 1 1]
              [0 0 1 0 : 1 1 0]
              [0 0 0 1 : 0 1 1]
From the above example, we observe that no two columns of H are linearly dependent
(otherwise they would be identical). However, for m > 1, it is possible to identify three columns
of H that would add up to zero. Thus, the minimum distance, d*, of an (n, k) Hamming Code is
equal to 3, which implies that it is a single-error correcting code. Hamming Codes are Perfect
Codes.
By adding an overall parity bit, an (n, k) Hamming Code can be modified to yield an (n + 1, k)
code with d* = 4. On the other hand, an (n, k) Hamming Code can be shortened to an (n - l, k
- l) code by removing l rows of its generator matrix G or, equivalently, by removing l columns
of its parity check matrix H. We can now give a more formal definition of Hamming Codes.
Information Theory, Coding and Cryptography
Definition 3.24 Let n = (q^m - 1)/(q - 1) for some integer m ≥ 2. A Hamming
Code over GF(q) is an (n, n - m) code for which the columns of the parity check matrix are pairwise linearly
independent (over GF(q)), i.e., the columns are a maximal set of pairwise linearly independent
vectors.
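The single-error correcting property is easy to demonstrate in a few lines of Python. The sketch below (not from the text) uses a systematic G and H pair for the (7, 4) binary Hamming Code; this particular pair is one of several equivalent choices consistent with Example 3.25.

import numpy as np

G = np.array([[1, 0, 0, 0, 1, 0, 1],            # assumed systematic generator matrix
              [0, 1, 0, 0, 1, 1, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1, 1]])
H = np.array([[1, 1, 1, 0, 1, 0, 0],            # corresponding parity check matrix
              [0, 1, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])

def encode(i):
    return np.dot(i, G) % 2

def correct(v):
    s = np.dot(H, v) % 2                        # syndrome of the received word
    if s.any():                                 # non-zero: flip the bit whose H column matches s
        for j in range(7):
            if np.array_equal(H[:, j], s):
                v = v.copy()
                v[j] ^= 1
                break
    return v

c = encode(np.array([1, 0, 1, 1]))
r = c.copy()
r[2] ^= 1                                        # introduce a single bit error
print(c, correct(r))                             # the single error is corrected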
3.11 OPTIMAL LINEAR CODES
Definition 3.25 For an (n, k, d*) Optimal Code, no (n - 1, k, d*), (n + 1, k + 1, d*) or
(n + 1, k, d* + 1) code exists.
Optimal Linear Codes give the best distance property under the constraint of the block length.
Most of the optimal codes have been found by long computer searches. It may be possible to
have more than one optimal code for a given set of parameters n, k and d*. For instance, there
exist two different binary (25, 5, 10) optimal codes.
For example, it can be shown that no binary (23, 12, 8), (25, 13, 8) or (25, 12, 9) code exists.
Thus the binary (24, 12, 8) code is an optimal code.
3.12 MAXIMUM DISTANCE SEPARABLE (MDS) CODES
In this section we consider the problem of finding as large a minimum distance as possible for
a given redundancy, r.
Theorem 3.10 An (n, n - r, d*) code satisfies d* ≤ r + 1.
Proof From the Singleton Bound we have d* ≤ n - k + 1.
Substituting k = n - r we get d* ≤ r + 1.
3.13 CONCLUDING REMARKS
The classic paper by Claude Elwood Shannon in the Bell System Technical Journal in 1948 gave
birth to two important fields: (i) Information Theory and (ii) Coding Theory. At that time,
Shannon was only 32 years old. According to Shannon's Channel Coding Theorem, "the error
rate of data transmitted over a band-limited noisy channel can be reduced to an arbitrarily small amount if
the information rate is less than the channel capacity". Shannon predicted the existence of good
channel codes but did not construct them. Since then the search for good codes has been on.
Shannon's seminal paper can be accessed from the site:
http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html
In 1950, R.W. Hamming introduced the first single-error correcting code, which is still used
today. The work on linear codes was extended by Golay (whose codes will be studied in the
following chapter). Golay also introduced the concept of Perfect Codes. Non-binary Hamming
Codes were developed by Golay and Cocke in the late 1950s. Lately, a lot of computer searches
have been used to find interesting codes. However, some of the best known codes are ones
discovered by sheer genius rather than exhaustive searches.
According to Shannon's theorem, if C(p) represents the capacity (see Chapter 1 for further
details) of a BSC with probability of bit error equal to p, then for arbitrarily low probability of
symbol error we must have the code rate R < C(p). Even though the channel capacity provides
an upper bound on the achievable code rate (R = k/n), evaluating a code exclusively against
channel capacity may be misleading. The block length of the code, which translates directly into
delay, is also an important parameter. Even if a code performs far from ideal, it is possible that
it is the best possible code for a given rate and length. It has been observed that as we increase
the block length of codes, the bounds on code rate are closer to channel capacity as opposed to
codes with smaller blocklengths. However, longer blocklengths imply longer delays in
decoding. This is because decoding of a codeword cannot begin until we have received the
entire codeword. The maximum delay allowable is limited by practical constraints. For
example, in mobile radio communications, packets of data are restricted to fewer than 200 bits.
In these cases, codewords with very large blocklengths cannot be used.
SUMMARY
• A Word is a sequence of symbols. A Code is a set of vectors called codewords.
• The Hamming Weight of a codeword (or any vector) is equal to the number of non-zero
elements in the codeword. The Hamming Weight of a codeword c is denoted by w(c).
• A Block Code consists of a set of fixed length codewords. The fixed length of these
codewords is called the Block Length and is typically denoted by n. A Block Coding
Scheme converts a block of k information symbols to n coded symbols. Such a code is
denoted by (n, k).
• The Code Rate of an (n, k) code is defined as the ratio (kin), and reflects the fraction of
the codeword that consists of the information symbols.
• The minimum distance of a code is the minimum Hamming Distance between any two
codewords. An (n, k) code with minimum distance d* is sometimes denoted by (n, k, d*).
The minimum weight of a code is the smallest weight of any non-zero codeword, and is
denoted by w*. For a Linear Code the minimum distance is equal to the minimum weight
of the code, i.e., d* = w*.
• A Linear Code has the following properties:
(i) The sum of two codewords belonging to the code is also a codeword belonging to the
code.
(ii) The all-zero codeword is always a codeword.
(iii) The minimum Hamming Distance between two codewords of a linear code is equal
to the minimum weight of any non-zero codeword, i.e., d* = w*.
• The generator matrix converts (encodes) a vector of length k to a vector of length n. Let
the input vector (uncoded symbols) be represented by i. The coded symbols will be given
by c = iG.
• Two q-ary codes are called equivalent if one can be obtained from the other by one or
both operations listed below:
(i) permutation of symbols appearing in a fixed position.
(ii) permutation of position of the code.
• An (n, k) Systematic Code is one in which the first k symbols of the codeword of block
length n are the information symbols themselves. A generator matrix of the form G =
[I | P] is called the systematic form or the standard form of the generator matrix, where I
is a k x k identity matrix and P is a k x (n - k) matrix.
• The Parity Check Matrix, H, for the given code satisfies cH^T = 0, where c is a valid
codeword. Since c = iG, therefore, iGH^T = 0. The Parity Check Matrix is not unique for
a given code.
• A Maximum Distance Code satisfies d* = n - k + 1.
• For a code to be able to correct up to t errors, we must have d* ≥ 2t + 1, where d* is the
minimum distance of the code.
• Let C be an (n, k) code over GF(q) and a be any vector of length n. Then the set a + C =
{a + x | x ∈ C} is called a coset (or translate) of C. a and b are said to be in the same coset
iff (a - b) ∈ C.
• Suppose H is a Parity Check Matrix of an (n, k) code. Then for any vector v ∈ GF(q)^n, the
vector s = vH^T is called the Syndrome of v. It is called a syndrome because it gives us the
symptoms of the error, thereby helping us to diagnose the error.
• A Perfect Code achieves the Hamming Bound with equality, i.e.,
M{C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t} = q^n.
• The binary Hamming Codes have the property that (n, k) = (2^m - 1, 2^m - 1 - m), where
m is any positive integer. Hamming Codes are Perfect Codes.
• For an (n, k, d*) Optimal Code, no (n - 1, k, d*), (n + 1, k + 1, d*) or (n + 1, k, d* + 1) code
exists.
• An (n, n - r, r + 1) code is called a Maximum Distance Separable (MDS) Code. An MDS
code is a linear code of redundancy r, whose minimum distance is equal to r + 1.
PROBLEMS
3.1 Show that C = {0000, 1100, 0011, 1111} is a linear code. What is its minimum distance?
3.2 Construct, if possible, binary (n, k, d) codes with the following parameters:
(i) (6, 1, 6)
(ii) (3, 3, 1)
(iii) (4, 3, 2)
3.3 Consider the following generator matrix over GF(2)
G= iセ@ セ@ セ@ セ@ セ}ᄋ@
lo 1 o 1 o
(i) Generate all possible codewords using this matrix.
(ii) Find the parity check matrix, H.
(iii) Find the generator matrix of an equivalent systematic code.
(iv) Construct the standard array for this code.
(v) What is the minimum distance of this code?
(vi) How many errors can this code detect?
(vii) Write down the set of error patterns this code can detect.
(viii) How many errors can this code correct?
(ix) What is the probability of symbol error if we use this encoding scheme? Compare it
with the uncoded probability of error.
(x) Is this a linear code?
3.4 For the code C= {00000, 10101, 01010, 11111} construct the generator matrix. Since this
G is not unique, suggest another generator matrix that can also generate this set of
codewords.
3.5 Show that if there is a binary (n, k, d) code with d even, then there exists a binary (n, k,
d) code in which all codewords have even weight.
3.6 Show that if C is a binary linear code, then the code obtained by adding an overall parity
check bit to C is also linear.
3.7 For each of the following sets S, list the code <S>.
(a) S= {0101, 1010, 1100}.
(b) S = {1000, 0100, 0010, 0001}.
(c) S = {11000, 01111, 11110, 01010}.
3.8 Consider the (23, 12, 7) binary code. Show that if it is used over a binary symmetric
channel (BSC) with probability of bit error p= 0.01, the word error will be approximately
0.00008.
3.9 Suppose C is a binary code with parity check matrix, H. Show that the extended code C1,
obtained from C by adding an overall parity bit, has the parity check matrix
H1 = [          0]
     [    H     0]
     [          0]
     [1  1 ... 1 1]
3.10 For a (5, 3) code over GF(4), the generator matrix is given by
G= {セ@ セ@ セ@ セ@ セ}@
0 0 1 1 3
(i) Find the parity check matrix.
(ii) How many errors can this code detect?
(iii) How many errors can this code correct?
(iv) How many erasures can this code correct?
(v) Is this a perfect code?
3.11 Let C be a binary perfect code of length n with minimum distance 7. Show that n = 7 or
n=23.
3.12 Let r_H denote the code rate of the binary Hamming code. Determine lim r_H as k → ∞.
3.13 Show that a (15, 8, 5) code does not exist.
COMPUTER PROBLEMS
3.14 Write a computer program to find the minimum distance of a Linear Block Code over
GF(2), given the generator matrix for the code.
3.15 Generalize the above program to find the minimum distance of any Linear Block Code
over GF(q).
3.16 Write a computer program to exhaustively search for all the perfect code parameters n, q,
M and t in the equation for the Hamming Bound. Search for 1 ≤ n ≤ 200, 2 ≤ q ≤ 11.
3.17 Write a computer program for a universal binary Hamming encoder with rate
(2^m - 1 - m)/(2^m - 1). The program should take as input the value of m and a bit-stream to be encoded. It
should then generate an encoded bit-stream. Develop a program for the decoder also.
Now, perform the following tasks:
(i) Write an error generator module that takes in a bit stream and outputs another bit-
stream after inverting every bit with probability p, i.e., the probability of a bit error is p.
(ii) For m = 3, pass the Hamming encoded bit-stream through the above-mentioned
module and then decode the received words using the decoder block.
(iii) Plot the residual error probability (the probability of error after decoding) as a
function of p. Note that if you are working in the range of BER = 10^-r, you must
transmit of the order of 10^(r+2) bits (why?).
(iv) Repeat your simulations for m = 5, 8 and 15. What happens as m → ∞?
Cyclic Codes
We arrive at truth, not by reason only, but also by the heart.
Blaise Pascal (1623-1662)
4.1 INTRODUCTION TO CYCLIC CODES
In the previous chapter, while dealing with Linear Block Codes, certain linearity constraints
were imposed on the structure of the block codes. These structural properties help us to search
for good linear block codes that are fast and easy to encode and decode. In this chapter, we shall
explore a subclass of linear block codes which has another constraint on the structure of the
codes. The additional constraint is that any cyclic shift of a codeword results in another valid
codeword. This condition allows very simple implementation of these cyclic codes by using
shift registers. Efficient circuit implementation is a selling feature of any error control code. We
shall also see that the theory of Galois Field can be used effectively to study, analyze and
discover new cyclic codes. The Galois Field representation of cyclic codes leads to low-
complexity encoding and decoding algorithms.
This chapter is organized as follows. In the first two sections, we take a mathematical detour
to polynomials. We will review some old concepts and learn a few new ones. Then, we will use
these mathematical tools to construct and analyze cyclic codes. The matrix description of cyclic
codes will be introduced next. We will then discuss some popular cyclic codes. The chapter will
conclude with a discussion on circuit implementation of cyclic codes.
Definition 4.1 A code C is cyclic if
(i) C is a linear code, and,
(ii) any cyclic shift of a codeword is also a codeword, i.e., if the codeword a0a1...an-1 is
in C then an-1a0...an-2 is also in C.
Example 4.1 The binary code C1 = {0000, 0101, 1010, 1111} is a cyclic code. However, C2 =
{0000, 0110, 1001, 1111} is not a cyclic code, but is equivalent to the first code. Interchanging the
third and the fourth components of C2 yields C1.
4.2 POLYNOMIALS
Definition 4.2 A polynomial is a mathematical expression
f(x) = f0 + f1x + ... + fmx^m, (4.1)
where the symbol x is called the indeterminate and the coefficients f0, f1, ..., fm are the
elements of GF(q). The coefficient fm is called the leading coefficient. If fm ≠ 0, then m
is called the degree of the polynomial, and is denoted by deg f(x).
Definition 4.3 A polynomial is called monic if its leading coefficient is unity.
Example 4.2 f(x) = 3 + 7x + x^2 + 5x^4 + x^6 is a monic polynomial over GF(8). The degree of this
polynomial is 6.
Polynomials play an important role in the study of cyclic codes, the subject of this chapter. Let
F[x] be the set of polynomials in x with coefficients in GF(q). Different polynomials in F[x] can
be added, subtracted and multiplied in the usual manner. F[x] is an example of an algebraic
structure called a ring. A ring satisfies the first seven of the eight axioms that define a field (see
Sec. 3.2 of Chapter 3). F[x] is not a field because polynomials of degree greater than zero do not
have a multiplicative inverse. It can be seen that if f(x), g(x) ∈ F[x], then deg(f(x)g(x)) = deg f(x)
+ deg g(x). However, deg(f(x) + g(x)) is not necessarily max{deg f(x), deg g(x)}.
For example, consider the two polynomials f(x) and g(x) over GF(2) such that f(x) = 1 + x^2 and
g(x) = 1 + x + x^2. Then deg(f(x) + g(x)) = deg(x) = 1. This is because, in GF(2), 1 + 1 = 0, and
x^2 + x^2 = (1 + 1)x^2 = 0.
Example 4.3 Consider the polynomials f(x) = 2 + x + x^2 + 2x^4 and g(x) = 1 + 2x^2 + 2x^4 + x^5 over
GF(3). Then,
f(x) + g(x) = (2 + 1) + x + (1 + 2)x^2 + (2 + 2)x^4 + x^5 = x + x^4 + x^5.
f(x) · g(x) = (2 + x + x^2 + 2x^4)(1 + 2x^2 + 2x^4 + x^5)
= 2 + x + (1 + 2·2)x^2 + 2x^3 + (2 + 2 + 2·2)x^4 + (2 + 2)x^5
+ (1 + 2 + 2·2)x^6 + x^7 + 2·2x^8 + 2x^9
= 2 + x + 2x^2 + 2x^3 + 2x^4 + x^5 + x^6 + x^7 + x^8 + 2x^9
Note that the addition and multiplication of the coefficients have been carried out in GF(3).
Example 4.4 Consider the polynomial f(x) = 1 + x over GF(2).
(f(x))^2 = 1 + (1 + 1)x + x^2 = 1 + x^2
Again consider f(x) = 1 + x over GF(3).
(f(x))^2 = 1 + (1 + 1)x + x^2 = 1 + 2x + x^2
4.3 THE DIVISION ALGORITHM FOR POLYNOMIALS
The Division Algorithm states that, for every pair of polynomials a(x) and b(x) ≠ 0 in F[x], there
exists a unique pair of polynomials q(x), the quotient, and r(x), the remainder, such that a(x) =
q(x)b(x) + r(x), where deg r(x) < deg b(x). The remainder is sometimes also called the residue,
and is denoted by R_b(x)[a(x)] = r(x).
Two important properties of residues are
(i) R_f(x)[a(x) + b(x)] = R_f(x)[a(x)] + R_f(x)[b(x)], and (4.2)
(ii) R_f(x)[a(x) · b(x)] = R_f(x){R_f(x)[a(x)] · R_f(x)[b(x)]} (4.3)
where a(x), b(x) and f(x) are polynomials over GF(q).
Example 4.5 Let the polynomials a(x) = x^3 + x + 1 and b(x) = x^2 + x + 1 be defined over GF(2).
We can carry out the long division of a(x) by b(x) as follows:

                     x + 1             ← q(x)
x^2 + x + 1 )  x^3       + x + 1       ← a(x)
               x^3 + x^2 + x
               ----------------
                     x^2     + 1
                     x^2 + x + 1
                     -----------
                           x           ← r(x)

Thus, a(x) = (x + 1)b(x) + x. Hence, we may write a(x) = q(x)b(x) + r(x), where q(x) = x + 1 and
r(x) = x. Note that deg r(x) < deg b(x).
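The long division above is mechanical enough to code directly. The following Python sketch (not from the text) implements the division algorithm for polynomials over GF(q), q prime, with coefficient lists stored lowest degree first.

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b, q):
    # return (quotient, remainder) with a(x) = q(x) b(x) + r(x), deg r < deg b
    inv_lead = pow(b[-1], q - 2, q)             # inverse of the leading coefficient of b(x)
    quot = [0] * max(len(a) - len(b) + 1, 1)
    rem = list(a)
    while len(trim(rem)) >= len(b) and any(rem):
        shift = len(rem) - len(b)
        factor = (rem[-1] * inv_lead) % q
        quot[shift] = factor
        for i, bc in enumerate(b):
            rem[shift + i] = (rem[shift + i] - factor * bc) % q
        rem = trim(rem)
    return trim(quot), rem

# Example 4.5: a(x) = x^3 + x + 1, b(x) = x^2 + x + 1 over GF(2)
print(poly_divmod([1, 1, 0, 1], [1, 1, 1], 2))  # -> ([1, 1], [0, 1]), i.e. q(x) = x + 1, r(x) = x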
Definition 4.4 Let f(x) be a fixed polynomial in F[x]. Two polynomials, g(x) and
h(x) in F[x], are said to be congruent modulo f(x), depicted by g(x) ≡ h(x) (mod f(x)),
if g(x) - h(x) is divisible by f(x).
Example 4.6 Let the polynomials g(x) = x^9 + x^2 + 1, h(x) = x^5 + x^2 + 1 and f(x) = x^4 + 1 be
defined over GF(2). Since g(x) - h(x) = x^5 f(x), we can write g(x) ≡ h(x) (mod f(x)).
Next, let us denote F[x]/f(x) as the set of polynomials in F[x] of degree less than deg f(x), with
addition and multiplication carried out modulo f(x) as follows:
(i) If a(x) and b(x) belong to F[x]/f(x), then the sum a(x) + b(x) in F[x]/f(x) is the same as in
F[x]. This is because deg a(x) < deg f(x), deg b(x) < deg f(x) and therefore deg(a(x) + b(x))
< deg f(x).
(ii) The product a(x)b(x) is the unique polynomial of degree less than deg f(x) to which
a(x)b(x) (multiplication being carried out in F[x]) is congruent modulo f(x).
F[x]/f(x) is called the ring of polynomials (over F[x]) modulo f(x). As mentioned earlier, a
ring satisfies the first seven of the eight axioms that define a field. A ring in which every
element also has a multiplicative inverse forms a field.
Example 4.7 Consider the product (x + 1)^2 in F[x]/(x^2 + x + 1) defined over GF(2). (x + 1)^2 = x^2
+ x + x + 1 = x^2 + 1 = x (mod x^2 + x + 1).
The product (x + 1)^2 in F[x]/(x^2 + 1) defined over GF(2) can be expressed as (x + 1)^2 = x^2 + x + x
+ 1 = x^2 + 1 = 0 (mod x^2 + 1).
The product (x + 1)^2 in F[x]/(x^2 + x + 1) defined over GF(3) can be expressed as (x + 1)^2 = x^2 + x
+ x + 1 = x^2 + 2x + 1 = x (mod x^2 + x + 1).
If f(x) has degree n, then the ring F[x]/f(x) over GF(q) consists of polynomials of degree ≤ n - 1.
The size of the ring will be q^n because each of the n coefficients of the polynomials can be one of the
q elements in GF(q).
Example 4.8 Consider the ring F[x]/(x^2 + x + 1) defined over GF(2). This ring will have
polynomials with highest degree 1. This ring contains q^n = 2^2 = 4 elements (each element is a
polynomial). The elements of the ring will be 0, 1, x and x + 1. The addition and multiplication
tables can be written as follows.
+      0      1      x      x+1        ·      0      1      x      x+1
0      0      1      x      x+1        0      0      0      0      0
1      1      0      x+1    x          1      0      1      x      x+1
x      x      x+1    0      1          x      0      x      x+1    1
x+1    x+1    x      1      0          x+1    0      x+1    1      x
Next, consider F[x]/(x^2 + 1) defined over GF(2). The elements of the ring will be 0, 1, x and x + 1.
The addition and multiplication tables can be written as follows.
+      0      1      x      x+1        ·      0      1      x      x+1
0      0      1      x      x+1        0      0      0      0      0
1      1      0      x+1    x          1      0      1      x      x+1
x      x      x+1    0      1          x      0      x      1      x+1
x+1    x+1    x      1      0          x+1    0      x+1    x+1    0
It is interesting to note that F[x]/(x^2 + x + 1) is actually a field, as the multiplicative inverse of every
non-zero element also exists. On the other hand, F[x]/(x^2 + 1) is not a field because the
multiplicative inverse of the element x + 1 does not exist.
It is worthwhile exploring the properties of f(x) which make F[x]/f(x) a field. As we shall
shortly find out, the polynomial f(x) must be irreducible (non-factorizable).
Definition 4.5 A polynomial f(x) in F[x] is said to be reducible if f(x) = a(x)b(x),
where a(x), b(x) are elements of F[x] and deg a(x) and deg b(x) are both smaller than
deg f(x). If f(x) is not reducible, it is called irreducible. A monic irreducible
polynomial of degree at least one is called a prime polynomial.
It is helpful to compare a reducible polynomial with a positive integer that can be factorized
into a product of prime numbers. Any monic polynomial in F[x] can be factorized uniquely into
a product of irreducible monic polynomials (prime polynomials). One way to verify a prime
polynomial is by trial and error, testing for all possible factorizations. This would require a
computer search. Prime polynomials of every degree exist over every Galois Field.
Theorem 4.1
(i) A polynomial f(x) has a linear factor (x - a) if and only if f(a) = 0, where a is a field
element.
(ii) A polynomial f(x) in F[x] of degree 2 or 3 over GF(q) is irreducible if and only if f(a)
≠ 0 for all a in GF(q).
(iii) Over any field, x^n - 1 = (x - 1)(x^(n-1) + x^(n-2) + ... + x + 1). The second factor may be
further reducible.
Proof
(i) If f(x) = (x - a)g(x), then obviously f(a) = 0. On the other hand, if f(a) = 0, by the
division algorithm, f(x) = q(x)(x - a) + r(x), where deg r(x) < deg(x - a) = 1. This
implies that r(x) is a constant. But, since f(a) = 0, r(x) must be zero, and therefore,
f(x) = q(x)(x - a).
(ii) A polynomial of degree 2 or 3 over GF(q) will be reducible if, and only if, it has at
least one linear factor. The result (ii) then directly follows from (i). This result does
not necessarily hold for polynomials of degree more than 3. This is because it might
be possible to factorize a polynomial of degree 4 or higher into a product of
polynomials none of which are linear, i.e., of the type (x - a).
(iii) From (i), (x - 1) is a factor of (x^n - 1). By carrying out long division of (x^n - 1) by (x - 1)
we obtain (x^(n-1) + x^(n-2) + ... + x + 1).
Example 4.9 Consider f(x) = x^3 - 1 over GF(2). Using (iii) of Theorem 4.1 we can write x^3 - 1 =
(x - 1)(x^2 + x + 1). This factorization is true over any field. Now, let us try to factorize the second
term, p(x) = (x^2 + x + 1).
p(0) = 0 + 0 + 1 = 1, over GF(2),
p(1) = 1 + 1 + 1 = 1, over GF(2).
Therefore, p(x) cannot be factorized further (from Theorem 4.1 (ii)).
Thus, over GF(2), x^3 - 1 = (x - 1)(x^2 + x + 1).
Next, consider f(x) = x^3 - 1 over GF(3).
x^3 - 1 = (x - 1)(x^2 + x + 1).
Again, let p(x) = (x^2 + x + 1).
p(0) = 0 + 0 + 1 = 1, over GF(3),
p(1) = 1 + 1 + 1 = 0, over GF(3),
p(2) = 2·2 + 2 + 1 = 1 + 2 + 1 = 1, over GF(3).
Since p(1) = 0, from (i) we have (x - 1) as a factor of p(x).
Thus, over GF(3),
x^3 - 1 = (x - 1)(x - 1)(x - 1).
Theorem 4.2 The ring F[x]/f(x) is a field if, and only if, f(x) is a prime polynomial in F[x].
Proof To prove that a ring is a field, we must show that every non-zero element of the
ring has a multiplicative inverse. Let s(x) be a non-zero element of the ring. We have deg
s(x) < deg f(x), because s(x) is contained in the ring F[x]/f(x). It can be shown that the
Greatest Common Divisor (GCD) of two polynomials f(x) and s(x) can be expressed as
GCD(f(x), s(x)) = a(x)f(x) + b(x)s(x),
where a(x) and b(x) are polynomials over GF(q). Since f(x) is irreducible in F[x], we have
GCD(f(x), s(x)) = 1 = a(x)f(x) + b(x)s(x).
Now, 1 = R_f(x)[1] = R_f(x)[a(x)f(x) + b(x)s(x)]
= R_f(x)[a(x)f(x)] + R_f(x)[b(x)s(x)] (property (i) of residues)
= 0 + R_f(x)[b(x)s(x)]
= R_f(x){R_f(x)[b(x)] · R_f(x)[s(x)]} (property (ii) of residues)
= R_f(x){R_f(x)[b(x)] · s(x)}
Hence, R_f(x)[b(x)] is the multiplicative inverse of s(x).
Next, let us prove the only if part of the theorem. Let us suppose f(x) has a degree of at
least 2, and is not a prime polynomial (a polynomial of degree one is always irreducible).
Therefore, we can write
f(x) = r(x)s(x)
for some polynomials r(x) and s(x) with degrees at least one. If the ring F[x]/f(x) is
indeed a field, then a multiplicative inverse of r(x), r^(-1)(x), exists, since all polynomials in
the field must have their corresponding multiplicative inverses. Hence,
s(x) = R_f(x){s(x)} = R_f(x){r^(-1)(x)r(x)s(x)} = R_f(x){r^(-1)(x)f(x)} = 0.
However, we had assumed s(x) ≠ 0. Thus, there is a contradiction, implying that the ring
is not a field.
Note that a prime polynomial is both monic and irreducible. In the above theorem it is
sufficient to have f(x) irreducible in order to obtain a field. The theorem could as well have been
stated as: "The ring F[x]/f(x) is a field if and only if f(x) is irreducible in F[x]".
So, now we have an elegant mechanism of generating Galois Fields! If we can identify a
prime polynomial of degree n over GF(q), we can construct a Galois Field with q^n elements.
Such a field will have polynomials as the elements of the field. These polynomials will be
defined over GF(q) and consist of all polynomials of degree less than n. It can be seen that there
will be q^n such polynomials, which form the elements of the Extension Field.
Example 4.10 Consider the polynomial p(x) = x^3 + x + 1 over GF(2). Since p(0) ≠ 0 and p(1) ≠
0, the polynomial is irreducible over GF(2). Since it is also monic, p(x) is a prime polynomial. Here
we have n = 3, so we can use p(x) to construct a field with 2^3 = 8 elements, i.e., GF(8). The
elements of this field will be 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1, which are all possible
polynomials of degree less than n = 3. It is easy to construct the addition and multiplication tables
for this field (exercise).
Having developed the necessary mathematical tools, we now resume our study of cyclic
codes. We now fix f(x) = x^n - 1 for the remainder of the chapter. We also denote F[x]/f(x) by Rn.
Before we proceed, we make the following observations:
(i) x^n ≡ 1 (mod x^n - 1). Hence, any polynomial, modulo x^n - 1, can be reduced simply by
replacing x^n by 1, x^(n+1) by x and so on.
(ii) A codeword can uniquely be represented by a polynomial. A codeword consists of a
sequence of elements. We can use a polynomial to represent the locations and values of
all the elements in the codeword. For example, the codeword c0c1c2...cn-1 can be represented
by the polynomial c(x) = c0 + c1x + c2x^2 + ... + cn-1x^(n-1). As another example, the codeword over
GF(8), c = 207735, can be represented by the polynomial c(x) = 2 + 7x^2 + 7x^3 + 3x^4 + 5x^5.
(iii) Multiplying any polynomial by x corresponds to a single cyclic right-shift of the
codeword elements. More explicitly, in Rn, by multiplying c(x) by x we get x·c(x) = c0x +
c1x^2 + ... + cn-1x^n = cn-1 + c0x + c1x^2 + ... + cn-2x^(n-1).
Theorem 4.3 A code C in Rn is a cyclic code if, and only if, C satisfies the following
conditions:
(i) a(x), b(x) ∈ C ⟹ a(x) + b(x) ∈ C (4.4)
(ii) a(x) ∈ C and r(x) ∈ Rn ⟹ a(x)r(x) ∈ C (4.5)
Proof
(i) Suppose C is a cyclic code in Rn. Since cyclic codes are a subset of linear block codes,
the first condition holds.
(ii) Let r(x) = r0 + r1x + r2x^2 + ... + rn-1x^(n-1). Multiplication by x corresponds to a cyclic
right-shift. But, by definition, the cyclic shift of a cyclic codeword is also a valid
codeword. That is,
x·a(x) ∈ C, x·(xa(x)) ∈ C,
and so on. Hence
r(x)a(x) = r0a(x) + r1xa(x) + r2x^2a(x) + ... + rn-1x^(n-1)a(x)
is also in C since each summand is also in C.
Next, we prove the only if part of the theorem. Suppose (i) and (ii) hold. Take r(x) to be
a scalar. Then (i) implies that C is linear. Take r(x) = x in (ii), which shows that any cyclic
shift also leads to a codeword. Hence (i) and (ii) imply that C is a cyclic code.
In the next section, we shall use the mathematical tools developed so far to construct
cyclic codes.
4.4 A METHOD FOR GENERATING CYCLIC CODES
The following steps can be used to generate a cyclic code:
(i) Take a polynomial f(x) in Rn.
(ii) Obtain a set of polynomials by multiplying f(x) by all possible polynomials in Rn.
(iii) The set of polynomials obtained above corresponds to the set of codewords belonging to
a cyclic code. The blocklength of the code is n.
Example 4.11 Consider the polynomial f(x) = 1 + x^2 in R3 defined over GF(2). In general a
polynomial in R3 (= F[x]/(x^3 - 1)) can be represented as r(x) = r0 + r1x + r2x^2, where the
coefficients can take the values 0 or 1 (since defined over GF(2)).
Thus, there can be a total of 2 x 2 x 2 = 8 polynomials in R3 defined over GF(2), which are 0, 1,
x, x^2, 1 + x, 1 + x^2, x + x^2, 1 + x + x^2. To generate the cyclic code, we multiply f(x) with these 8
possible elements of R3 and then reduce the results modulo (x^3 - 1):
(1 + x^2)·0 = 0, (1 + x^2)·1 = (1 + x^2), (1 + x^2)·x = 1 + x, (1 + x^2)·x^2 = x + x^2,
(1 + x^2)·(1 + x) = x + x^2, (1 + x^2)·(1 + x^2) = 1 + x, (1 + x^2)·(x + x^2) = (1 + x^2),
(1 + x^2)·(1 + x + x^2) = 0.
Thus there are only four distinct codewords: {0, 1 + x, 1 + x^2, x + x^2} which correspond to
{000, 110, 101, 011}.
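The sketch below (not from the text) reproduces this example in Python: it multiplies f(x) = 1 + x^2 by every polynomial in R3 over GF(2), reducing modulo x^3 - 1 by folding x^n back to 1.

from itertools import product

def mult_mod_xn_minus_1(f, r, n, q):
    # multiply two coefficient lists (lowest degree first) and reduce modulo x^n - 1
    out = [0] * n
    for i, fi in enumerate(f):
        for j, rj in enumerate(r):
            out[(i + j) % n] = (out[(i + j) % n] + fi * rj) % q
    return tuple(out)

n, q = 3, 2
f = [1, 0, 1]                                   # f(x) = 1 + x^2
codewords = {mult_mod_xn_minus_1(f, r, n, q) for r in product(range(q), repeat=n)}
print(sorted(codewords))                        # the four codewords 000, 110, 101, 011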
From the above example it appears that we can have some sort of a Generator Polynomial
which can be used to construct the cyclic code.
Theorem 4.4 Let C be an (n, k) non-zero cyclic code in Rn. Then,
(i) there exists a unique monic polynomial g(x) of the smallest degree in C,
(ii) the cyclic code C consists of all multiples of the generator polynomial g(x) by
polynomials of degree k - 1 or less,
(iii) g(x) is a factor of x^n - 1.
Proof
(i) Suppose both g(x) and h(x) are monic polynomials in C of smallest degree. Then g(x)
- h(x) is also in C, and has a smaller degree. If g(x) ≠ h(x), then a suitable scalar
multiple of g(x) - h(x) is monic, is in C, and is of smaller degree than g(x). This
gives a contradiction.
(ii) Let a(x) ∈ C. Then, by the division algorithm, a(x) = q(x)g(x) + r(x), where deg r(x) < deg
g(x). But r(x) = a(x) - q(x)g(x) ∈ C because both terms on the right hand side of the
equation are codewords. However, the degree of g(x) must be the minimum among
all codewords. This can only be possible if r(x) = 0 and a(x) = q(x)g(x). Thus, a
codeword is obtained by multiplying the generator polynomial g(x) with the
polynomial q(x). For a code defined over GF(q), there are q^k distinct codewords
possible. These codewords correspond to multiplying g(x) with the q^k distinct
polynomials, q(x), where deg q(x) ≤ (k - 1).
(iii) By the division algorithm, x^n - 1 = q(x)g(x) + r(x), where deg r(x) < deg g(x). Or, r(x) = {(x^n
- 1) - q(x)g(x)} modulo (x^n - 1) = -q(x)g(x). But -q(x)g(x) ∈ C because we are
multiplying the generator polynomial by another polynomial -q(x). Thus, we have a
codeword r(x) whose degree is less than that of g(x). This violates the minimality of
the degree of g(x), unless r(x) = 0, which implies x^n - 1 = q(x)g(x), i.e., g(x) is a factor
of x^n - 1.
The last part of the theorem gives us the recipe to obtain the generator polynomial for a
cyclic code. All we have to do is to factorize x^n - 1 into irreducible, monic polynomials. We can
also find all the possible cyclic codes of blocklength n simply by factorizing x^n - 1.
Note 1: A cyclic code C may contain polynomials other than the generator polynomial which
also generate C. But the polynomial with the minimum degree is called the generator
polynomial.
Note 2: The degree of g(x) is n - k (this will be shown later).
Example 4.12 To find all the binary cyclic codes of blocklength 3, we first factorize x^3 - 1. Note
that for GF(2), 1 = -1, since 1 + 1 = 0. Hence,
x^3 - 1 = x^3 + 1 = (x + 1)(x^2 + x + 1)
Thus, we can make the following table.
Generator Polynomial    Code (polynomial)                 Code (binary)
1                       {R3}                              {000, 001, 010, 011, 100, 101, 110, 111}
(x + 1)                 {0, x + 1, x^2 + x, x^2 + 1}      {000, 011, 110, 101}
(x^2 + x + 1)           {0, x^2 + x + 1}                  {000, 111}
(x^3 + 1)               {0}                               {000}
A simple encoding rule to generate the codewords from the generator polynomial is
c(x) = i(x)g(x), (4.6)
where i(x) is the information polynomial, c(x) is the codeword polynomial and g(x) is the
generator polynomial. We have seen, already, that there is a one to one correspondence
between a word (vector) and a polynomial. The error vector can also be represented as the error
polynomial, e(x). Thus, the received word at the receiver, after passing through a noisy channel,
can be expressed as
v(x) = c(x) + e(x). (4.7)
We define the Syndrome Polynomial, s(x), as the remainder of v(x) under division by g(x),
i.e.,
s(x) = R_g(x)[v(x)] = R_g(x)[c(x) + e(x)] = R_g(x)[c(x)] + R_g(x)[e(x)] = R_g(x)[e(x)], (4.8)
because R_g(x)[c(x)] = 0.
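A minimal sketch (not from the text) of this encoding rule and of the syndrome computation is given below; coefficient lists are lowest degree first and g(x) is assumed monic, as generator polynomials are.

def poly_mult(a, b, q):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out

def poly_mod(v, g, q):
    # remainder of v(x) on division by the monic polynomial g(x), i.e. R_g(x)[v(x)]
    r = list(v)
    for shift in range(len(r) - len(g), -1, -1):
        factor = r[shift + len(g) - 1]
        if factor:
            for i, gc in enumerate(g):
                r[shift + i] = (r[shift + i] - factor * gc) % q
    return r[:len(g) - 1]

g = [1, 1, 0, 1]                                # g(x) = 1 + x + x^3, a factor of x^7 - 1 over GF(2)
c = poly_mult([1, 0, 1, 1], g, 2)               # encode i(x) = 1 + x^2 + x^3 via c(x) = i(x) g(x)
print(poly_mod(c, g, 2))                        # [0, 0, 0]: a valid codeword has zero syndrome
v = list(c)
v[2] ^= 1                                        # single error in the x^2 position
print(poly_mod(v, g, 2))                         # [0, 0, 1]: the non-zero syndrome flags the error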
Example 4.13 Consider the generator polynomial g(x) = x^2 + 1 for ternary cyclic codes (i.e., over
GF(3)) of blocklength n = 4. Here we are dealing with cyclic codes, therefore, the highest power of
g(x) is n - k. Since n = 4, k must be 2. So, we are going to construct a (4, 2) cyclic ternary code.
There will be a total of q^k = 3^2 = 9 codewords. The information polynomials and the corresponding
codeword polynomials are listed below.
i     i(x)      c(x) = i(x)g(x)            c
00    0         0                          0000
01    1         x^2 + 1                    0101
02    2         2x^2 + 2                   0202
10    x         x^3 + x                    1010
11    x + 1     x^3 + x^2 + x + 1          1111
12    x + 2     x^3 + 2x^2 + x + 2         1212
20    2x        2x^3 + 2x                  2020
21    2x + 1    2x^3 + x^2 + 2x + 1        2121
22    2x + 2    2x^3 + 2x^2 + 2x + 2       2222
It can be seen that the cyclic shift of any codeword results in another valid codeword. By
observing the codewords we find that the minimum distance of this code is 2 (there are four non-
zero codewords with the minimum Hamming weight = 2). Therefore, this code is capable of
detecting one error and correcting zero errors.
Observing the fact that the codeword polynomial is divisible by the generator polynomial, we
can detect more errors than suggested by the minimum distance of the code. Since we
are dealing with cyclic codes, which are a subset of linear block codes, we can use the all-zero
codeword to illustrate this point without loss of generality.
Assume that g(x) = x^2 + 1 and the transmitted codeword is the all-zero codeword.
Therefore, the received word is the error polynomial, i.e.,
v(x) = c(x) + e(x) = e(x). (4.9)
At the receiver end, an error will be detected if g(x) fails to divide the received word v(x) = e(x).
Now, g(x) has only two terms. So if e(x) has an odd number of terms, i.e., if the number of errors
is odd, it will be caught by the decoder! For example, if we try to divide e(x) = x^2 + x + 1 by g(x),
we will always get a remainder. In the example of the (4, 2) cyclic code with g(x) = x^2 + 1, the d*
= 2, suggesting that it can detect d* - 1 = 1 error. However, by this simple observation, we find that
it can detect any odd number of errors ≤ n. In this case, it can detect 1 error or 3 errors, but not 2
errors.
4.5 MATRIX DESCRIPTION OF CYCLIC CODES
Theorem 4.5 Suppose C is a cyclic code with generator polynomial g(x) = g0 + g1x + ... + grx^r
of degree r. Then the generator matrix of C is given by

    [g0  g1  ...  gr  0   0   ...  0 ]
    [0   g0  g1  ...  gr  0   ...  0 ]
G = [0   0   g0  g1  ...  gr  ...  0 ]      k = (n - r) rows      (4.10)
    [ ...                             ]
    [0   ...  0   0   g0  g1  ...  gr ]
              (n columns)

Proof The (n - r) rows of the matrix are obviously linearly independent because of the
echelon form of the matrix. These (n - r) rows represent the codewords g(x), xg(x),
x^2g(x), ..., x^(n-r-1)g(x). Thus, the matrix can generate these codewords. Now, to prove that
the matrix can generate all the possible codewords, we must show that every
possible codeword can be represented as a linear combination of the codewords
g(x), xg(x), x^2g(x), ..., x^(n-r-1)g(x).
We know that if c(x) is a codeword, it can be represented as
c(x) = q(x)g(x)
for some polynomial q(x). Since the degree of c(x) < n (because the length of the codeword
is n), it follows that the degree of q(x) < n - r. Hence,
q(x)g(x) = (q0 + q1x + ... + q_(n-r-1)x^(n-r-1))g(x) = q0g(x) + q1xg(x) + ... + q_(n-r-1)x^(n-r-1)g(x)
Thus, any codeword can be represented as a linear combination of g(x), xg(x), x^2g(x), ...,
x^(n-r-1)g(x). This proves that the matrix G is indeed the generator matrix.
We also know that the dimensions of the generator matrix are k x n. Therefore, r = n - k, i.e.,
the degree of g(x) is n - k.
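Equation (4.10) translates directly into code. The sketch below (not from the text) builds the k x n generator matrix by writing the coefficients of g(x) in the first row and cyclically shifting them one position to the right in each subsequent row.

def cyclic_generator_matrix(g, n):
    r = len(g) - 1                               # degree of g(x)
    k = n - r
    return [[0] * i + list(g) + [0] * (n - r - 1 - i) for i in range(k)]

# g(x) = 1 + x + x^3 with n = 7 gives the generator matrix of the binary (7, 4) cyclic code
for row in cyclic_generator_matrix([1, 1, 0, 1], 7):
    print(row)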
Example 4.14 To find the generator matrices of all ternary cyclic codes (i.e., codes over GF(3)) of
blocklength n = 4, we first factorize x^4 - 1.
x^4 - 1 = (x - 1)(x^3 + x^2 + x + 1) = (x - 1)(x + 1)(x^2 + 1)
We know that all the factors of x^4 - 1 are capable of generating a cyclic code. The resultant
generator matrices are listed in Table 4.1. Note that -1 = 2 for GF(3).
Table 4.1 Cyclic codes of blocklength n = 4 over GF(3)
g(x)                                    (n, k)    dmin    G
1                                       (4, 4)    1       [I4]
(x - 1)                                 (4, 3)    2       [-1  1  0  0]
                                                          [ 0 -1  1  0]
                                                          [ 0  0 -1  1]
(x + 1)                                 (4, 3)    2       [1  1  0  0]
                                                          [0  1  1  0]
                                                          [0  0  1  1]
(x^2 + 1)                               (4, 2)    2       [1  0  1  0]
                                                          [0  1  0  1]
(x^2 - 1)                               (4, 2)    2       [-1  0  1  0]
                                                          [ 0 -1  0  1]
(x - 1)(x^2 + 1) = x^3 - x^2 + x - 1    (4, 1)    4       [-1  1 -1  1]
(x + 1)(x^2 + 1) = x^3 + x^2 + x + 1    (4, 1)    4       [1  1  1  1]
(x^4 - 1)                               (4, 0)    0       [0  0  0  0]
It can be seen from the table that none of the (4, 2) ternary cyclic codes are single error correcting
codes (since their minimum distance is less than 3). An interesting observation is that we do not
have any ternary (4, 2) Hamming Code that is cyclic! Remember, Hamming Codes are single error
correcting codes with n = (q^r - 1)/(q - 1) and k = (q^r - 1)/(q - 1) - r, where r is an integer ≥ 2.
Therefore, a (4, 2) ternary Hamming code exists, but it is not a cyclic code.
The next step is to explore if we can find a parity check polynomial corresponding to our
generator polynomial, g(x). We already know that g(x) is a factor of x^n - 1. Hence we can write
x^n - 1 = h(x)g(x), (4.11)
where h(x) is some polynomial. The following can be concluded by simply observing the above
equation:
(i) Since g(x) is monic, h(x) has to be monic because the left hand side of the equation is also
monic (the leading coefficient is unity).
(ii) Since the degree of g(x) is n - k, the degree of h(x) must be k.
Suppose C is a cyclic code in Rn with the generator polynomial g(x). Recall that we are denoting
F[x]/f(x) by Rn, where f(x) = x^n - 1. In Rn, h(x)g(x) = x^n - 1 = 0. Then, any codeword belonging to
C can be written as c(x) = a(x)g(x), where the polynomial a(x) ∈ Rn. Therefore, in Rn,
c(x)h(x) = a(x)g(x)h(x) = a(x)·0 = 0.
Thus, h(x) behaves like a Parity Check Polynomial. Any valid codeword when multiplied
by the parity check polynomial yields the zero polynomial. This concept is parallel to that of the
parity check matrix introduced in the previous chapter. Since we are still in the domain of linear
block codes, we go ahead and define the parity check matrix in relation to the parity check
polynomial.
Suppose C is a cyclic code with the parity check polynomial h(x) = h0 + h1x + ... + hkx^k. Then
the parity check matrix of C is given by

    [hk  hk-1  ...  h1  h0  0    ...  0 ]
H = [0   hk   hk-1 ...  h1  h0   ...  0 ]      (n - k) rows      (4.12)
    [ ...                               ]
    [0   ...  0    hk  hk-1  ...  h1  h0]

Recall that cH^T = 0. Therefore, iGH^T = 0 for any information vector, i. Hence, GH^T = 0. We
further have s = vH^T, where s is the syndrome vector and v is the received word.
Example 4.15 For binary codes of blocklength n = 7, we have
x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1)
Consider g(x) = (x^3 + x + 1). Since g(x) is a factor of x^7 - 1, there is a cyclic code that can be
generated by it. The generator matrix corresponding to g(x) is
G = [1 1 0 1 0 0 0]
    [0 1 1 0 1 0 0]
    [0 0 1 1 0 1 0]
    [0 0 0 1 1 0 1]
The parity check polynomial h(x) is (x - 1)(x^3 + x^2 + 1) = (x^4 + x^2 + x + 1), and the corresponding
parity check matrix is
H = [1 0 1 1 1 0 0]
    [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]
The minimum distance of this code is 3, and this happens to be the (7, 4) Hamming Code. Thus, the
binary (7, 4) Hamming Code is also a cyclic code.
4.6 BURST ERROR CORRECTION
In many real life channels, errors are not random, but occur in bursts. For example, in a mobile
communications channel, fading results in Burst errors. When errors occur at a stretch, as
opposed to random errors, we term them as Burst Errors.
Example 4.16 Let the transmitted sequence of bits, transmitted at 10 kb/s over a wireless channel,
be
c = 0 1 0 0 0 1 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1
Suppose, after 0.5 ms of the start of transmission, the channel experiences a fade of duration 1 ms.
During this time interval, the channel corrupts the transmitted bits. The error sequence can be
written as
b = 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 0 0 0 0 0 0 0.
This is an example of a burst error, where a portion of the transmitted sequence gets garbled due to
the channel. Here the length of the burst is 10 bits. However, not all ten locations are in error.
Definition 4.6 A Cyclic Burst of length t is a vector whose non-zero components
are among t successive components, the first and last of which are non-zero.
If we are constructing codes for channels more prone to burst errors of length t (as opposed to an arbitrary pattern of t random errors), it might be possible to design more efficient codes. We can describe a burst error as
e(x) = x^i b(x),   (4.13)
where the burst pattern b(x) is a polynomial of degree <= t - 1. Here x^i marks the starting location of the burst pattern within the codeword being transmitted.
A code designed for correcting a burst of length t must have unique syndromes for every error pattern, i.e.,
s(x) = R_g(x)[e(x)]
must be different for each polynomial representing a burst of length t.
Example 4.17 For a binary code of blocklength n = 15, consider the generator polynomial
g(x) = x^6 + x^3 + x^2 + x + 1.   (4.14)
This code is capable of correcting bursts of length 3 or less. To prove this we must show that all the syndromes corresponding to the different burst errors are distinct. The different burst errors are
(i) Bursts of length 1:
e(x) = x^i for i = 0, 1, ..., 14.
(ii) Bursts of length 2:
e(x) = x^i (1 + x) for i = 0, 1, ..., 13, and e(x) = x^i (1 + x^2) for i = 0, 1, ..., 13.
(iii) Bursts of length 3:
e(x) = x^i (1 + x + x^2) for i = 0, 1, ..., 12.
It can be shown that the syndromes of all these 56 (= 15 + 14 + 14 + 13) error patterns are distinct. A table can be made of each pattern and the corresponding syndrome, which can be used for correcting any burst error of length 3 or less. It should be emphasized that codes designed specifically for correcting burst errors are more efficient in terms of the code rate. The code being discussed here is a (15, 9) cyclic code with code rate k/n = 0.6 and minimum distance d* = 3. As a random error correcting code it can correct only 1 error, yet it corrects all bursts of length up to 3. Note that correction of one random error amounts to correcting a burst error of length 1.
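The distinctness of the 56 syndromes claimed in Example 4.17 can be checked by brute force. A possible Python sketch (not from the text; polynomials are again integers with bit i as the coefficient of x^i, and bursts that run past x^14 are wrapped around modulo x^15 - 1):

def poly_mod(a, b):
    # Remainder of a(x)/b(x) over GF(2).
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

g = 0b1001111                                    # g(x) = x^6 + x^3 + x^2 + x + 1
wrap = (1 << 15) | 1                             # x^15 - 1, used to wrap bursts around the block
patterns = [(0b1, range(15)), (0b11, range(14)), (0b101, range(14)), (0b111, range(13))]
bursts = [poly_mod(p << i, wrap) for p, rng in patterns for i in rng]
syndromes = {poly_mod(e, g) for e in bursts}
print(len(bursts), len(syndromes))               # 56 56 -> every burst has a distinct syndrome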
Similar to the Singleton bound studied in the previous chapter, there is a bound on the minimum number of parity symbols required for a burst error correcting linear block code: 'A linear block code that corrects all bursts of length t or less must have at least 2t parity symbols'.
In the next three sections, we will study three different sub-classes of cyclic codes. Each sub-
class has a specific objective.
4.7 FIRE CODES
Definition 4.7 A Fire code is a cyclic burst error correcting code over GF(q) with the generator polynomial
g(x) = (x^{2t-1} - 1) p(x),   (4.15)
where p(x) is a prime polynomial over GF(q) whose degree m is not smaller than t, and p(x) does not divide x^{2t-1} - 1. The blocklength of the Fire code is the smallest integer n such that g(x) divides x^n - 1. A Fire code can correct all burst errors of length t or less.
Example 4.18 Consider the Fire code with t = m = 3. A prime polynomial over GF(2) of degree 3 is p(x) = x^3 + x + 1, which does not divide (x^5 - 1). The generator polynomial of the Fire code will be
g(x) = (x^5 - 1) p(x) = (x^5 - 1)(x^3 + x + 1)
     = x^8 + x^6 + x^5 - x^3 - x - 1
     = x^8 + x^6 + x^5 + x^3 + x + 1.
The degree of g(x) = n - k = 8. The blocklength is the smallest integer n such that g(x) divides x^n - 1. After trial and error we get n = 35. Thus, the parameters of the Fire code are (35, 27), with g(x) = x^8 + x^6 + x^5 + x^3 + x + 1. This code can correct all bursts of length 3 or less. The code rate of this code is 0.77, which makes it more efficient than the code generated by g(x) = x^6 + x^3 + x^2 + x + 1, which has
a code rate of only 0.6.
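The 'trial and error' that gives n = 35 is a one-line search: find the smallest n for which g(x) divides x^n - 1. A Python sketch (not from the text, using the same bit-mask convention as above):

def poly_mod(a, b):
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

g = 0b101101011                                  # g(x) = x^8 + x^6 + x^5 + x^3 + x + 1
n = 1
while poly_mod((1 << n) | 1, g):                 # stop at the smallest n with g(x) | x^n - 1
    n += 1
print(n)                                         # 35 -> the Fire code is a (35, 27) code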
Fire Codes become more efficient as we increase t. The code rates for binary Fire Codes (with m = t) for different values of t are plotted in Fig. 4.1.
[Figure: code rate of binary Fire codes (with m = t) plotted against t, for t = 2 to 10.]
Fig. 4.1 Code Rates for Different Fire Codes.
4.8 GOLAY CODES
The Binary Golay Code
In the previous chapter, Sec. 3.9, we saw that a (23, 12) perfect code exists with d* = 7. Recall that, for a perfect code,
M [ C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t ] = q^n,   (4.16)
which is satisfied for the values n = 23, k = 12, M = 2^k = 2^12, q = 2 and t = (d* - 1)/2 = 3. This (23, 12) perfect code is the Binary Golay Code. We shall now explore this perfect code as a cyclic code. We start with the factorization of (x^23 - 1):
(x^23 - 1) = (x - 1)(x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1)(x^11 + x^9 + x^7 + x^6 + x^5 + x + 1)
           = (x - 1) g1(x) g2(x).   (4.17)
The degree of g1(x) = n - k = 11, hence k = 12, which implies that there exists a (23, 12) cyclic code. In order to prove that it is a perfect code, we must show that the minimum distance of this (23, 12) cyclic code is 7. One way is to write out the parity check matrix, H, and show that no six columns are linearly dependent. Another way is to prove it analytically, which is a long and drawn-out proof. The easiest way is to write a computer program to list out all the 2^12 codewords and find the minimum weight (on a fast computer it takes about 30 seconds!). The code rate is 0.52 and it is a triple error correcting code. However, the relatively small blocklength of this perfect code makes it impractical for most real-life applications.
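The exhaustive search mentioned above is short enough to show explicitly. The Python sketch below (not from the text) generates all 2^12 codewords of the cyclic code generated by g1(x) and reports the minimum non-zero weight; it should print 7.

from functools import reduce

g1 = 0b110001110101                    # g1(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1
k = 12
rows = [g1 << i for i in range(k)]     # k shifted copies of g1(x) form a generator matrix
min_wt = min(bin(reduce(lambda a, b: a ^ b,
                        (rows[i] for i in range(k) if m >> i & 1), 0)).count('1')
             for m in range(1, 1 << k))
print(min_wt)                          # 7 -> the (23, 12) Golay code corrects 3 random errors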
The Ternary Golay Code
We next examine the ternary (11, 6) cyclic code, which is also the Ternary Golay Code. This code has a minimum distance of 5, and can be verified to be a perfect code. We begin by factorizing (x^11 - 1) over GF(3):
(x^11 - 1) = (x - 1)(x^5 + x^4 - x^3 + x^2 - 1)(x^5 - x^3 + x^2 - x - 1)
           = (x - 1) g1(x) g2(x).   (4.18)
The degree of g1(x) = n - k = 5, hence k = 6, which implies that there exists an (11, 6) cyclic code. In order to prove that it is a perfect code, we must show that the minimum distance of this (11, 6) cyclic code is 5. Again, we resort to an exhaustive computer search and find that the minimum distance is indeed 5.
It can be shown that (x^p - 1) has a factorization of the form (x - 1) g1(x) g2(x) over GF(2) whenever p is a prime number of the form 8m +/- 1 (m is a positive integer). In such cases, g1(x) and g2(x) generate equivalent codes. If the minimum distance of the code generated by g1(x) is odd, then it satisfies the Square Root Bound
d* >= sqrt(p).   (4.19)
Note that p denotes the blocklength.
4.9 CYCLIC REDUNDANCY CHECK (CRC) CODES
One of the common error detecting codes is the Cyclic Redundancy Check (CRC) code. For a k-bit message block, the (n, k) CRC encoder generates an (n - k)-bit frame check sequence (FCS). Let us define the following:
T = n-bit frame to be transmitted
D = k-bit message block (information bits)
F = (n - k)-bit FCS, the last (n - k) bits of T
P = the predetermined divisor, a pattern of (n - k + 1) bits.
The predetermined divisor, P, should be able to divide the codeword T exactly; that is, T/P has no remainder. Now, D is the k-bit message block. Therefore, 2^{n-k} D amounts to shifting the k bits to the left by (n - k) bits and padding the result with zeros (recall that a left shift by 1 bit of a binary sequence is equivalent to multiplying the number represented by the binary sequence by two). The codeword, T, can then be represented as
T = 2^{n-k} D + F.   (4.20)
Adding F in the above equation yields the concatenation of D and F. If we divide 2^{n-k} D by P, we obtain
2^{n-k} D / P = Q + R/P,   (4.21)
where Q is the quotient and R/P is the remainder. Suppose we use R as the FCS; then
T = 2^{n-k} D + R.   (4.22)
In this case, upon dividing T by P we obtain
T/P = (2^{n-k} D + R)/P = 2^{n-k} D / P + R/P
    = Q + R/P + R/P = Q + (R + R)/P = Q,   (4.23)
since R + R = 0 in modulo-2 arithmetic. Thus there is no remainder, i.e., T is exactly divisible by P. To generate such an FCS, we simply divide 2^{n-k} D by P and use the (n - k)-bit remainder as the FCS.
Let an error E occur when T is transmitted over a noisy channel. The received word is given by
V = T + E.   (4.24)
The CRC scheme will fail to detect the error only if V is completely divisible by P. This translates to the case when E is completely divisible by P (because T is divisible by P).
Example 4.19 Let the message D = 1010001101, i.e., k = 10, and the pattern P = 110101. The number of FCS bits = 5. Therefore, n = 15. We wish to determine the FCS.
First, the message is multiplied by 2^5 (left shift by 5 and pad with 5 zeros). This yields
2^5 D = 101000110100000.
Next, divide the resulting number by P = 110101. By long division we obtain Q = 1101010110 and R = 01110. The remainder is added to 2^5 D to obtain
T = 101000110101110.
T is the transmitted codeword. If no errors occur in the channel, the received word when divided by P will yield 0 as the remainder.
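The long division in Example 4.19 can be checked with a short routine. The following Python sketch (not from the text) computes the FCS as the remainder of 2^(n-k) D divided by P using modulo-2 arithmetic:

def crc_fcs(data_bits, divisor_bits):
    # Frame check sequence = remainder of 2^(n-k) D divided by P (all arithmetic modulo 2).
    a = int(data_bits, 2) << (len(divisor_bits) - 1)
    p = int(divisor_bits, 2)
    while a and a.bit_length() >= p.bit_length():
        a ^= p << (a.bit_length() - p.bit_length())
    return format(a, '0%db' % (len(divisor_bits) - 1))

fcs = crc_fcs('1010001101', '110101')
print(fcs)                              # 01110
print('1010001101' + fcs)               # 101000110101110, the transmitted codeword T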
CRC codes can also be defined using the polynomial representation. Let the message polynomial be D(x) and the predetermined divisor be P(x). Therefore,
x^{n-k} D(x) / P(x) = Q(x) + R(x)/P(x),
T(x) = x^{n-k} D(x) + R(x).   (4.25)
At the receiver end, the received word is divided by P(x). Suppose the received word is
V(x) = T(x) + E(x),   (4.26)
where E(x) is the error polynomial. Then [T(x) + E(x)]/P(x) = E(x)/P(x), because T(x) is exactly divisible by P(x).
Those errors that happen to correspond to polynomials containing P(x) as a factor will slip by,
and the others will be caught in the net of the CRC decoder. The polynomial P(x) is also called
the generator polynomial for the CRC code. CRC codes are also known as Polynomial Codes.
Example 4.20 Suppose the transmitted codeword undergoes a single-bit error. The error polynomial E(x) can be represented by E(x) = x^i, where i determines the location of the single error bit. If P(x) contains two or more terms, E(x)/P(x) can never leave a zero remainder. Thus all single errors will be caught by such a CRC code.
Example 4.21 Suppose two isolated errors occur, i.e., E(x) = x^i + x^j, i > j. Alternately, E(x) = x^j (x^{i-j} + 1). If we assume that P(x) is not divisible by x, then a sufficient condition for detecting all double errors is that P(x) does not divide x^k + 1 for any k up to the maximum value of i - j (i.e., the frame length). For example, x^15 + x^14 + 1 will not divide x^k + 1 for any value of k below 32,768.
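The figure of 32,768 quoted above can be verified by computing the order of x modulo P(x), i.e., the smallest k for which P(x) divides x^k + 1. A Python sketch (not from the text):

def poly_mod(a, b):
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

P = (1 << 15) | (1 << 14) | 1           # P(x) = x^15 + x^14 + 1
k, xk = 1, 2                            # xk holds x^k mod P(x), starting with x^1
while xk != 1:
    xk = poly_mod(xk << 1, P)           # multiply by x and reduce
    k += 1
print(k)                                # 32767 -> P(x) divides no x^k + 1 with k < 32768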
Example 4.22 Suppose the error polynomial has an odd number of terms (corresponding to an odd number of errors). An interesting fact is that there is no polynomial with an odd number of terms that has x + 1 as a factor if we are performing binary arithmetic (modulo-2 operations). By making (x + 1) a factor of P(x), we can catch all errors consisting of an odd number of bits (i.e., we can catch at least half of all possible error patterns!).
Another interesting feature of CRC codes is their ability to detect burst errors. A burst error of length k can be represented by x^i (x^{k-1} + x^{k-2} + ... + 1), where i determines how far from the right end of the received frame the burst is located. If P(x) has a constant (x^0) term, it does not have x as a factor, so the factor x^i cannot help divisibility by P(x). Hence, if the degree of (x^{k-1} + x^{k-2} + ... + 1) is less than the degree of P(x), the remainder can never be zero. Therefore, a polynomial code with r check bits can detect all burst errors of length <= r. If the burst length is r + 1, the remainder of the division by P(x) will be zero if, and only if, the burst is identical to P(x). Now, the first and last bits of a burst must be 1 (by definition). The intermediate bits can be 1 or 0. Therefore, the exact matching of the burst error with the polynomial P(x) depends on the r - 1 intermediate bits. Assuming all combinations are equally likely, the probability of a miss is 1/2^{r-1}. One can show that when an error burst of length greater than r + 1 occurs, or several shorter bursts occur, the probability of a bad frame slipping through is 1/2^r.
Example 4.23 Four versions of P(x) have become international standards:
CRC-12:    P(x) = x^12 + x^11 + x^3 + x^2 + x + 1
CRC-16:    P(x) = x^16 + x^15 + x^2 + 1
CRC-CCITT: P(x) = x^16 + x^12 + x^5 + 1
CRC-32:    P(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
(4.27)
CRC-12, CRC-16 and CRC-CCITT contain (x + 1) as a factor. CRC-12 is used for transmission of streams of 6-bit characters and generates a 12-bit FCS. Both CRC-16 and CRC-CCITT are popular for 8-bit characters. They result in a 16-bit FCS and can catch all single and double errors, all errors with an odd number of bits, all burst errors of length 16 or less, 99.997% of 17-bit bursts and 99.998% of 18-bit and longer bursts. CRC-32 is specified as an option in some point-to-point synchronous transmission standards.
4.10 CIRCUIT IMPLEMENTATION OF CYCLIC CODES
Shift registers can be used to encode and decode cyclic codes easily. Encoding and decoding of cyclic codes require multiplication and division by polynomials. The shift property of shift registers is ideally suited for such operations. Shift registers are banks of memory units which are capable of shifting the contents of one unit to the next at every clock pulse. Here we will focus on circuit implementation for codes over GF(2^m). Besides the shift register, we will make use of the following circuit elements:
(i) A scaler, whose job is to multiply the input by a fixed field element.
(ii) An adder, which takes in two inputs and adds them together. A simple circuit realization
of an adder is the 'exclusive-or' or the 'xor' gate.
(iii) A multiplier, which is basically the 'and' gate.
These elements are depicted in Fig. 4.2.
[Figure: an N-stage shift register, a scaler, an adder and a multiplier.]
Fig. 4.2 Circuit Elements Used to Construct Encoders and Decoders for Cyclic Codes.
A field element of GF(2) can simply be represented by a single bit. For GF(2^m) we require m bits to represent one element. For example, the elements of GF(8) can be represented as the elements of the set {000, 001, 010, 011, 100, 101, 110, 111}. For such a representation we need three clock pulses to shift an element from one stage of the effective shift register to the next. The effective shift register for GF(8) is shown in Fig. 4.3. Any arbitrary element of this field can be represented by ax^2 + bx + c, where a, b, c are binary, and the power of the indeterminate x is used to denote the position. For example, 101 = x^2 + 1.
[Figure: one stage of the effective shift register for GF(8) consists of three binary memory units.]
Fig. 4.3 The Effective Shift Register for GF(8).
Example 4.24 We now consider the multiplication of an arbitrary element by another field element over GF(8). Recall the construction of GF(8) from GF(2) using the prime polynomial p(x) = x^3 + x + 1. The elements of the field are 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1. We want to obtain the circuit representation for the multiplication of an arbitrary field element (ax^2 + bx + c) by another element, say, x^2 + x. We have
(ax^2 + bx + c)(x^2 + x) = ax^4 + (a + b)x^3 + (b + c)x^2 + cx   (modulo p(x))
                         = (a + b + c)x^2 + (b + c)x + (a + b).
One possible circuit realization is shown in Fig. 4.4.
Fig. 4.4 Multiplication of an Arbitrary Field Element.
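The identity derived in Example 4.24 can be confirmed by running over all eight field elements. A Python sketch (not from the text; a GF(8) element ax^2 + bx + c is stored as the 3-bit mask abc):

def gf8_mul(u, v):
    # Multiply two GF(8) elements constructed with p(x) = x^3 + x + 1.
    w = 0
    for i in range(3):
        if v >> i & 1:
            w ^= u << i
    for i in (4, 3):                    # reduce: x^4 = x^2 + x, x^3 = x + 1
        if w >> i & 1:
            w ^= 0b1011 << (i - 3)
    return w

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            elem = a << 2 | b << 1 | c
            lhs = gf8_mul(elem, 0b110)                       # multiply by x^2 + x
            rhs = (a ^ b ^ c) << 2 | (b ^ c) << 1 | (a ^ b)  # (a+b+c)x^2 + (b+c)x + (a+b)
            assert lhs == rhs
print("identity holds for all 8 elements")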
We next focus on the multiplication of an arbitrary polynomial a(x) by g(x). Let the polynomial g(x) be represented as
g(x) = g_L x^L + ... + g_1 x + g_0,   (4.28)
the polynomial a(x) be represented as
a(x) = a_k x^k + ... + a_1 x + a_0,   (4.29)
and the resultant polynomial b(x) = a(x)g(x) be represented as
b(x) = b_{k+L} x^{k+L} + ... + b_1 x + b_0.   (4.30)
The circuit realization of b(x) is given in Fig. 4.5. This is a linear feed-forward shift register. It is also called a Finite Impulse Response (FIR) filter.
Fig. 4.5 A Finite Impulse Response (FIR) Filter.
In electrical engineering jargon, the coefficients of a(x) and g(x) are convolved by the shift register. For our purpose, we have a circuit realization for multiplying two polynomials. Thus, we have an efficient mechanism for encoding a cyclic code by multiplying the information polynomial by the generator polynomial.
Example 4.25 The encoder circuit for the generator polynomial
g(x) = x^8 + x^6 + x^5 + x^3 + x + 1
is given in Fig. 4.6. This is the generator polynomial for the Fire code with t = m = 3. It is easy to interpret the circuit: the 8 memory units shift the input one unit at a time, and the shifted outputs are summed at the proper locations. There are five adders for summing up the six shifted versions of the input.
Fig. 4.6 Circuit Realization of the Encoder for the Fire Code.
For dividing an arbitrary polynomial by a fixed polynomial g(x), the circuit realization is given in Fig. 4.7.
We can thus use a shift register circuit for dividing an arbitrary polynomial, a(x), by a fixed polynomial g(x). We assume here that the divisor is a monic polynomial. We already know how to factor out a scalar in order to convert any polynomial to a monic polynomial. The division process can be expressed as a pair of recursive equations. Let Q^(r)(x) and R^(r)(x) be the quotient polynomial and the remainder polynomial at the r-th recursion step, with the initial conditions Q^(0)(x) = 0 and R^(0)(x) = a(x). Then, the recursive equations can be written as
Q^(r)(x) = Q^(r-1)(x) + R^(r-1)_{n-r} x^{k-r},
R^(r)(x) = R^(r-1)(x) - R^(r-1)_{n-r} x^{k-r} g(x),   (4.31)
where R^(r-1)_{n-r} represents the leading coefficient of the remainder polynomial at stage (r - 1). After n shifts, the quotient is passed out of the shift register, and the value stored in the shift register is the remainder. Thus the shift register implementation of a decoder is very simple. The contents of the shift register can be checked for all entries being zero after the division of the received polynomial by the generator polynomial. If even a single memory unit of the shift register is non-zero, an error is detected.
Fig. 4.7 A Shift Register Circuit for Dividing by g(x).
Example 4.26 The shift register circuit for dividing by g(x) = x^8 + x^6 + x^5 + x^3 + x + 1 is given in Fig. 4.8.
Fig. 4.8 A Shift Register Circuit for Dividing by g(x) = x^8 + x^6 + x^5 + x^3 + x + 1.
The procedure for error detection and error correction is as follows. The received word is first stored in a buffer. It is then subjected to a divide-by-g(x) operation. As we have seen, this division can be carried out very efficiently by a shift register circuit. The remainder in the shift register is then compared with all the possible (pre-computed) syndromes. This set of syndromes corresponds to the set of correctable error patterns. If a syndrome match is found, the error is subtracted out from the received word. The corrected version of the received word is then passed on to the next stage of the receiver unit for further processing. This kind of a decoder is known as a Meggitt Decoder. The flow chart for this is given in Fig. 4.9.
[Flow chart: the received word enters an n-stage shift register with divide-by-g(x) feedback; the remainder is compared with all test syndromes; the matching error pattern is removed to give the corrected word.]
Fig. 4.9 The Flow Chart of a Meggitt Decoder.
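In software, the Meggitt procedure reduces to a division followed by a table look-up. The Python sketch below (not from the text) uses the (7, 4) code of Example 4.15, whose correctable error patterns are the seven single errors:

def poly_mod(a, b):
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

g, n = 0b1011, 7
table = {poly_mod(1 << i, g): 1 << i for i in range(n)}   # syndrome -> single-error pattern

c = 0b0001011                       # a codeword (g(x) itself)
v = c ^ (1 << 5)                    # the channel flips one bit
s = poly_mod(v, g)                  # divide the received word by g(x)
if s:                               # non-zero remainder -> error detected
    v ^= table[s]                   # subtract the matching error pattern
print(v == c)                       # True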
4.11 CONCLUDING REMARKS
The notion of cyclic codes was first introduced by Prange in 1957. The work on cyclic codes was
further developed by Peterson and Kasami. Pioneering work on the minimum distance of cyclic
codes was done by Bose and Raychaudhuri in the early 1960s. Another subclass of cyclic codes,
the BCH codes (named after Bose, Chaudhuri and Hocquenghem) will be studied in detail in
the next chapter. It was soon discovered that almost all of the earlier discovered linear block
codes could be made cyclic. The initial steps in the area of burst error correction were taken by Abramson in 1959. The Fire Codes were published in the same year. The binary and the ternary Golay Codes were published by Golay as early as 1949.
Shift register circuits for cyclic codes were introduced in the works of Peterson, Chien and
Meggitt in the early 1960s. Important contributions were also made by Kasami, MacWilliams,
Mitchell and Rudolph.
SUMMARY
• A polynomial is a mathematical expression f(x) = f_0 + f_1 x + ... + f_m x^m, where the symbol x is called the indeterminate and the coefficients f_0, f_1, ..., f_m are the elements of GF(q). The coefficient f_m is called the leading coefficient. If f_m is not 0, then m is called the degree of the polynomial, and is denoted by deg f(x). A polynomial is called monic if its leading coefficient is unity.
• The division algorithm states that, for every pair of polynomials a(x) and b(x) (b(x) not 0) in F[x], there exists a unique pair of polynomials q(x), the quotient, and r(x), the remainder, such that a(x) = q(x) b(x) + r(x), where deg r(x) < deg b(x). The remainder is sometimes also called the residue, and is denoted by R_b(x)[a(x)] = r(x).
• Two important properties of residues are
(i) R_f(x)[a(x) + b(x)] = R_f(x)[a(x)] + R_f(x)[b(x)], and
(ii) R_f(x)[a(x) . b(x)] = R_f(x){R_f(x)[a(x)] . R_f(x)[b(x)]},
where a(x), b(x) and f(x) are polynomials over GF(q).
• A polynomial f(x) in F[x] is said to be reducible if f(x) = a(x) b(x), where a(x), b(x) are elements of F[x] and deg a(x) and deg b(x) are both smaller than deg f(x). If f(x) is not reducible, it is called irreducible. A monic irreducible polynomial of degree at least one is called a prime polynomial.
• The ring F[x]/f(x) is a field if and only if f(x) is a prime polynomial in F[x].
• A code C in R_n is a cyclic code if and only if C satisfies the following conditions:
(i) a(x), b(x) in C implies a(x) + b(x) in C,
(ii) a(x) in C and r(x) in R_n implies a(x)r(x) in C.
• The following steps can be used to generate a cyclic code:
(i) Take a polynomial f(x) in R_n.
(ii) Obtain a set of polynomials by multiplying f(x) by all possible polynomials in R_n.
(iii) The set of polynomials obtained as above corresponds to the set of codewords belonging to a cyclic code. The blocklength of the code is n.
• Let C be an (n, k) non-zero cyclic code in R_n. Then,
(i) there exists a unique monic polynomial g(x) of the smallest degree in C,
(ii) the cyclic code C consists of all multiples of the generator polynomial g(x) by polynomials of degree k - 1 or less,
(iii) g(x) is a factor of x^n - 1,
(iv) the degree of g(x) is n - k.
• For a cyclic code, C, with generator polynomial g(x) = g_0 + g_1 x + ... + g_r x^r of degree r, the generator matrix is given by
G = [ g_0  g_1  ...  g_r   0    0   ...  0 ]
    [ 0    g_0  g_1  ...  g_r   0   ...  0 ]
    [ ...                                  ]
    [ 0    ...  0    g_0  g_1  ...     g_r ]
which has k = n - r rows and n columns.
• For a cyclic code, C, with the parity check polynomial h(x) = h_0 + h_1 x + ... + h_k x^k, the parity check matrix is given by
H = [ h_k  h_{k-1}  ...  h_0   0    ...  0 ]
    [ 0    h_k      ...  h_1   h_0  ...  0 ]
    [ ...                                  ]
    [ 0    ...      0    h_k   ...     h_0 ]
which has (n - k) rows and n columns.
• x^n - 1 = h(x) g(x), where g(x) is the generator polynomial and h(x) is the parity check polynomial.
• A Fire code is a cyclic burst error correcting code over GF(q) with the generator polynomial g(x) = (x^{2t-1} - 1) p(x), where p(x) is a prime polynomial over GF(q) whose degree m is not smaller than t and p(x) does not divide x^{2t-1} - 1. The blocklength of the Fire code is the smallest integer n such that g(x) divides x^n - 1. A Fire code can correct all burst errors of length t or less.
• The generator polynomial of the Binary Golay Code:
g1(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1, or
g2(x) = x^11 + x^9 + x^7 + x^6 + x^5 + x + 1.
• The generator polynomial of the Ternary Golay Code:
g1(x) = x^5 + x^4 - x^3 + x^2 - 1, or
g2(x) = x^5 - x^3 + x^2 - x - 1.
• One of the common error detecting codes is the Cyclic Redundancy Check (CRC) code. For a k-bit message block, the (n, k) CRC encoder generates an (n - k)-bit Frame Check Sequence (FCS).
• Shift registers can be used to encode and decode cyclic codes easily. Encoding and decoding of cyclic codes require multiplication and division by polynomials. The shift property of shift registers is ideally suited for such operations.
Everything should be made as simple as possible, but not simpler.
Albert Einstein (1879-1955)
PROBLEMS
4.1 Which of the following codes are (a) cyclic, (b) equivalent to a cyclic code?
(a) {0000, 0110, 1100, 0011, 1001} over GF(2).
(b) {00000, 10110, 01101, 11011} over GF(2).
(c) {00000, 10110, 01101, 11011} over GF(3).
(d) {0000, 1122, 2211} over GF(3).
(e) The q-ary repetition code of length n.
4.2 Construct the addition and multiplication tables for
(a) F[x]/(x^2 + 1) defined over GF(2).
(b) F[x]/(x^2 + 1) defined over GF(3).
Which of the above is a field?
4.3 List out all the irreducible polynomials over
(a) GF(2) of degrees 1 to 5.
(b) GF(3) of degrees 1 to 3.
4.4 Find all the cyclic binary codes of blocklength 5. Find the minimum distance of each code.
4.5 Suppose x^n - 1 is a product of r distinct irreducible polynomials over GF(q). How many cyclic codes of blocklength n over GF(q) exist? Comment on the minimum distance of these codes.
4.6 (a) Factorize x^8 - 1 over GF(3).
(b) How many ternary cyclic codes of length 8 exist?
(c) How many quaternary cyclic codes of length 8 exist?
4.7 Let the polynomial
g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
be the generator polynomial of a cyclic code over GF(2) with blocklength 15.
(a) Find the generator matrix G.
(b) Find the parity check matrix H.
(c) How many errors can this code detect?
(d) How many errors can this code correct?
(e) Write the generator matrix in the systematic form.
4.8 Consider the polynomial
g(x) = x^6 + 3x^5 + x^4 + x^3 + 2x^2 + 2x + 1.
(a) Is this a valid generator polynomial for a cyclic code over GF(4) with blocklength 15?
(b) Find the parity check matrix H.
(c) What is the minimum distance of this code?
(d) What is the code rate of this code?
(e) Is the received word v(x) = x^6 + x^5 + 3x^4 + x^3 + 3x + 1 a valid codeword?
4.9 An error vector of the form x^i + x^{i+1} in R_n is called a double adjacent error. Show that the code generated by the generator polynomial g1(x) = (x - 1) g_H(x) is capable of correcting all double adjacent errors, where g_H(x) is the generator polynomial of the binary Hamming Code.
4.10 Design the shift register encoder and the Meggitt decoder for the code generated in Problem 4.8.
4.11 The code with the generator polynomial g(x) = (x^23 + 1)(x^17 + x^3 + 1) is used for error detection and correction in the GSM standard.
(i) How many random errors can this code correct?
(ii) How long a burst of errors can this code correct?
COMPUTER PROBLEMS
4.12 Write a computer program to find the minimum distance of a cyclic code over GF(q), given the generator polynomial (or the generator matrix) for the code.
4.13 Write a computer program to encode and decode a (35, 27) Fire Code. It should be able to automatically correct bursts of length 3 or less. What happens when you try to decode a received word with a burst error of length 4?
Bose-Chaudhuri
Hocquenghem (BCH) Codes
5.1 INTRODUCTION TO BCH CODES
The class of Bose-Chaudhuri Hocquenghem (BCH) codes is one of the most powerful known
class of Linear Cyclic Block Codes. BCH codes are known for their multiple error correcting
ability, and the ease of encoding and decoding. So far, our approach has been to construct a
code and then find out its minimum distance in order to estimate its error correcting capability.
In this class of code, we will start from the other end. We begin by specifying the number of
random errors we desire the code to correct. Then we go on to construct the generator
polynomial for the code. As mentioned above, BCH codes are a subclass of cyclic codes, and
therefore, the decoding methodology for any cyclic code also works for the BCH codes.
However, more efficient decoding procedures are known for BCH codes, and will be discussed
in this chapter.
We begin by building the necessary mathematical tools in the next couple of sections. We
shall then look at the method for constructing the generator polynomial for BCH codes. Efficient
decoding techniques for this class of codes will be discussed next. An important sub-set of BCH
codes, the Reed-Solomon codes, will be introduced in the later part of this chapter.
5.2 PRIMITIVE ELEMENTS
Definition 5.1 A Primitive Element of GF(q) is an element a such that every
field element except zero can be expressed as a power of a.
Example 5.1 Consider GF(5). Since q = 5 is a prime number, modulo arithmetic will work. Consider the element 2.
2^0 = 1 (mod 5) = 1,
2^1 = 2 (mod 5) = 2,
2^2 = 4 (mod 5) = 4,
2^3 = 8 (mod 5) = 3.
Hence, all the non-zero elements of GF(5), i.e., {1, 2, 3, 4}, can be represented as powers of 2. Therefore, 2 is a primitive element of GF(5).
Next, consider the element 3.
3^0 = 1 (mod 5) = 1,
3^1 = 3 (mod 5) = 3,
3^2 = 9 (mod 5) = 4,
3^3 = 27 (mod 5) = 2.
Again, all the non-zero elements of GF(5), i.e., {1, 2, 3, 4}, can be represented as powers of 3. Therefore, 3 is also a primitive element of GF(5).
However, it can be verified that the other non-zero elements, 1 and 4, are not primitive elements.
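A brute-force check of Example 5.1 (a Python sketch, not from the text):

q = 5
for a in range(1, q):
    powers = {pow(a, i, q) for i in range(1, q)}    # a^1, ..., a^(q-1) modulo 5
    print(a, "is primitive" if len(powers) == q - 1 else "is not primitive")
# 2 and 3 are primitive; 1 and 4 are not (4 has order 2).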
We saw in the example that there can be more than one primitive element in a field. But is
there a guarantee of finding at least one primitive element? The answer is yes! The non-zero
elements of every Galois Field form a cyclic group. Hence, a Galois Field will include an
element of order q- 1. This will be the primitive element. Primitive elements are very useful in
constructing fields. Once we have a primitive element, we can easily find all the other elements
by simply evaluating the powers of the primitive element.
Definition 5.2 A Primitive Polynomial p(x) over GF(q) is a prime polynomial
over GF(q) with the property that in the extension field constructed modulo p(x), the
field element represented by x is a primitive element.
Primitive polynomials of every degree exist over every Galois Field. A primitive polynomial
can be used to construct an extension field.
Example 5.2 We can construct GF(8) using the primitive polynomial p(x) = x^3 + x + 1. Let the primitive element of GF(8) be alpha = z. Then, we can represent all the elements of GF(8) by the powers of alpha evaluated modulo p(x). Thus, we can form Table 5.1.
Table 5.1 The elements of GF(8)
alpha^1    z
alpha^2    z^2
alpha^3    z + 1
alpha^4    z^2 + z
alpha^5    z^2 + z + 1
alpha^6    z^2 + 1
alpha^7    1
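Table 5.1 can be regenerated by repeatedly multiplying by alpha and reducing modulo p(z). A Python sketch (not from the text; an element a z^2 + b z + c is stored as the 3-bit mask abc):

elem = 1
for i in range(1, 8):
    elem <<= 1                      # multiply by z
    if elem & 0b1000:               # reduce using z^3 = z + 1
        elem ^= 0b1011
    print("alpha^%d = %s" % (i, format(elem, '03b')))
# 010, 100, 011, 110, 111, 101, 001 -> z, z^2, z+1, z^2+z, z^2+z+1, z^2+1, 1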
Theorem 5.1 Let beta_1, beta_2, ..., beta_{q-1} denote the non-zero field elements of GF(q). Then,
x^{q-1} - 1 = (x - beta_1)(x - beta_2) ... (x - beta_{q-1}).   (5.1)
Proof The set of non-zero elements of GF(q) is a finite group under the operation of multiplication. Let beta be any non-zero element of the field. It can be represented as a power of the primitive element alpha. Let beta = alpha^r for some integer r. Therefore,
beta^{q-1} = (alpha^r)^{q-1} = (alpha^{q-1})^r = (1)^r = 1,
because alpha^{q-1} = 1. Hence, beta is a zero of x^{q-1} - 1. This is true for any non-zero element beta. Hence, each of the q - 1 non-zero field elements is a root of x^{q-1} - 1, which is a polynomial of degree q - 1, and the factorization (5.1) follows.
Example 5.3 Consider the field GF(5). The non-zero elements of this field are {1, 2, 3, 4}. Therefore, we can write
x^4 - 1 = (x - 1)(x - 2)(x - 3)(x - 4).
5.3 MINIMAL POLYNOMIALS
In the previous chapter we saw that in order to find the generator polynomials for cyclic codes of blocklength n, we have to first factorize x^n - 1. Thus x^n - 1 can be written as the product of its p prime factors:
x^n - 1 = f_1(x) f_2(x) f_3(x) ... f_p(x).   (5.2)
Any combination of these factors can be multiplied together to form a generator polynomial g(x). If the prime factors of x^n - 1 are distinct, then there are (2^p - 2) different non-trivial cyclic codes of blocklength n. The two trivial cases that are being disregarded are g(x) = 1 and g(x) = x^n - 1. Not all of the (2^p - 2) possible cyclic codes are good codes in terms of their minimum distance. We now evolve a strategy for finding good codes, i.e., codes of desirable minimum distance.
In the previous chapter we learnt how to construct an extension field from the subfield. In this section we will study the prime polynomials (in a certain field) that have zeros in the extension field. Our strategy for constructing g(x) will be as follows: using the desirable zeros in the extension field, we will find prime polynomials in the subfield, which will be multiplied together to yield a desirable g(x).
Definition 5.3 A blocklength n of the form n = q^m - 1 is called a Primitive Block Length for a code over GF(q). A cyclic code over GF(q) of primitive blocklength is called a Primitive Cyclic Code.
The field GF(q^m) is an extension field of GF(q). Let the primitive blocklength n = q^m - 1. Consider the factorization
x^n - 1 = x^{q^m - 1} - 1 = f_1(x) f_2(x) ... f_p(x)   (5.3)
over the field GF(q). This factorization will also be valid over the extension field GF(q^m), because the addition and multiplication tables of the subfield form a part of the tables of the extension field. We also know that g(x) divides x^n - 1, i.e., x^{q^m - 1} - 1; hence g(x) must be the product of some of these polynomials f_i(x). Also, every non-zero element of GF(q^m) is a zero of x^{q^m - 1} - 1. Hence, it is possible to factor x^{q^m - 1} - 1 in the extension field GF(q^m) to get
x^{q^m - 1} - 1 = prod_j (x - beta_j),   (5.4)
where beta_j ranges over all the non-zero elements of GF(q^m). This implies that each of the polynomials f_i(x) can be represented in GF(q^m) as a product of some of the linear terms, and each beta_j is a zero of exactly one of the f_i(x). This f_i(x) is called the Minimal Polynomial of beta_j.
Definition 5.4 The smallest degree polynomial with coefficients in the base field GF(q) that has beta as a zero in the extension field GF(q^m) is called the Minimal Polynomial of beta.
Example 5.4 Consider the subfield GF(2) and its extension field GF(8). Here q = 2 and m = 3. The factorization of x^7 - 1 (in the subfield/extension field) yields
x^{2^3 - 1} - 1 = x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1).
Next, consider the elements of the extension field GF(8). The elements can be represented as 0, 1, z, z + 1, z^2, z^2 + 1, z^2 + z, z^2 + z + 1 (from Example 4.10 of Chapter 4). Therefore, we can write
x^{2^3 - 1} - 1 = x^7 - 1 = (x - 1)(x - z)(x - z - 1)(x - z^2)(x - z^2 - 1)(x - z^2 - z)(x - z^2 - z - 1)
              = (x - 1) . [(x - z)(x - z^2)(x - z^2 - z)] . [(x - z - 1)(x - z^2 - 1)(x - z^2 - z - 1)].
It can be seen that over GF(8),
(x^3 + x + 1) = (x - z)(x - z^2)(x - z^2 - z), and
(x^3 + x^2 + 1) = (x - z - 1)(x - z^2 - 1)(x - z^2 - z - 1).
The multiplication and addition are carried out over GF(8). Interestingly, after a little bit of algebra it is found that the coefficients of the minimal polynomials belong to GF(2) only. We can now make Table 5.2.
Table 5.2 The Elements of GF(8) in Terms of the Powers of the Primitive Element alpha
Minimal polynomial f_i(x)   Corresponding elements beta_j in GF(8)   Elements in terms of powers of alpha
(x - 1)                     1                                        alpha^0
(x^3 + x + 1)               z, z^2 and z^2 + z                       alpha^1, alpha^2, alpha^4
(x^3 + x^2 + 1)             z + 1, z^2 + 1 and z^2 + z + 1           alpha^3, alpha^6, alpha^5 (= alpha^12)
It is interesting to note the elements (in terms of powers of the primitive element alpha) that correspond to the same minimal polynomial. If we make the observation that alpha^12 = alpha^7 . alpha^5 = 1 . alpha^5, we see a pattern in the elements that correspond to a certain minimal polynomial. In fact, the elements that are roots of the same minimal polynomial in the extension field are of the type beta^{q^r}, where beta is an element of the extension field. In the above example, the zeros of the minimal polynomial f_2(x) = x^3 + x + 1 are alpha^1, alpha^2 and alpha^4, and those of f_3(x) = x^3 + x^2 + 1 are alpha^3, alpha^6 and alpha^12 (= alpha^5).
Definition 5.5 Two elements of GF(q^m) that share the same minimal polynomial over GF(q) are called Conjugates with respect to GF(q).
Example 5.5 The elements {alpha^1, alpha^2, alpha^4} are conjugates with respect to GF(2). They share the same minimal polynomial f_2(x) = x^3 + x + 1.
As we have seen, a single element in the extension field may have more than one conjugate. The conjugacy relationship between two elements depends on the base field. For example, the extension field GF(16) can be constructed using either GF(2) or GF(4). Two elements that are conjugates with respect to GF(2) may not be conjugates with respect to GF(4).
If f(x) is the minimal polynomial of beta, then it is also the minimal polynomial of the elements in the set {beta, beta^q, beta^{q^2}, ..., beta^{q^{r-1}}}, where r is the smallest integer such that beta^{q^r} = beta. The set {beta, beta^q, beta^{q^2}, ..., beta^{q^{r-1}}} is called the Set of Conjugates. The elements in the set of conjugates are all the zeros of f(x). Hence, the minimal polynomial of beta can be written as
f(x) = (x - beta)(x - beta^q)(x - beta^{q^2}) ... (x - beta^{q^{r-1}}).   (5.5)
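Equation (5.5) can be checked numerically. The Python sketch below (not from the text) expands the minimal polynomial of beta = alpha^3 in GF(8), constructed with p(z) = z^3 + z + 1 as in Table 5.2, and recovers x^3 + x^2 + 1 with coefficients in GF(2) only:

def gf8_mul(u, v):
    # GF(8) multiplication for p(z) = z^3 + z + 1; elements are 3-bit masks.
    w = 0
    for i in range(3):
        if v >> i & 1:
            w ^= u << i
    for i in (4, 3):
        if w >> i & 1:
            w ^= 0b1011 << (i - 3)
    return w

alpha = [1]
for _ in range(7):
    alpha.append(gf8_mul(alpha[-1], 0b010))       # powers of alpha = z

conjugates = [alpha[3], alpha[6], alpha[12 % 7]]  # {beta, beta^2, beta^4} for beta = alpha^3
poly = [1]                                        # coefficients, leading term first
for c in conjugates:                              # multiply by (x + c); minus = plus here
    poly = [a ^ gf8_mul(b, c) for a, b in zip(poly + [0], [0] + poly)]
print(poly)                                       # [1, 1, 0, 1] -> x^3 + x^2 + 1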
Example 5.6 Consider GF(256) as an extension field of GF(2). Let alpha be the primitive element of GF(256). Then a set of conjugates would be
{alpha^1, alpha^2, alpha^4, alpha^8, alpha^16, alpha^32, alpha^64, alpha^128}.
Note that alpha^256 = alpha^255 . alpha = alpha, hence the set of conjugates terminates with alpha^128. The minimal polynomial of alpha is
f(x) = (x - alpha^1)(x - alpha^2)(x - alpha^4)(x - alpha^8)(x - alpha^16)(x - alpha^32)(x - alpha^64)(x - alpha^128).
The right hand side of the equation, when multiplied out, would only contain coefficients from GF(2).
Similarly, the minimal polynomial of alpha^3 would be
f(x) = (x - alpha^3)(x - alpha^6)(x - alpha^12)(x - alpha^24)(x - alpha^48)(x - alpha^96)(x - alpha^192)(x - alpha^129).
Definition 5.6 BCH codes defined over GF(q) with blocklength q^m - 1 are called Primitive BCH codes.
Having developed the necessary mathematical tools, we shall now begin our study of BCH codes. We will develop a method for constructing the generator polynomials of BCH codes that can correct a pre-specified number of random errors, t.
5.4 GENERATOR POLYNOMIALS IN TERMS OF MINIMAL POLYNOMIALS
We know that g(x) is a factor of x^n - 1. Therefore, the generator polynomial of a cyclic code can be written in the form
g(x) = LCM [f_1(x), f_2(x), ..., f_p(x)],   (5.6)
where f_1(x), f_2(x), ..., f_p(x) are the minimal polynomials of the zeros of g(x). Each minimal polynomial corresponds to a zero of g(x) in an extension field. We will design good codes (i.e., determine the generator polynomials) with desirable zeros using this approach.
Let c(x) be a codeword polynomial and e(x) be an error polynomial. Then the received polynomial can be written as
v(x) = c(x) + e(x),   (5.7)
where the polynomial coefficients are in GF(q). Now consider the extension field GF(q^m). Let gamma_1, gamma_2, ..., gamma_p be those elements of GF(q^m) which are the zeros of g(x), i.e., g(gamma_i) = 0 for i = 1, ..., p. Since c(x) = a(x)g(x) for some polynomial a(x), we also have c(gamma_i) = 0 for i = 1, ..., p. Thus,
v(gamma_i) = c(gamma_i) + e(gamma_i) = e(gamma_i)  for i = 1, ..., p.   (5.8)
For a blocklength n, we have
v(gamma_i) = sum_{j=0}^{n-1} e_j gamma_i^j  for i = 1, ..., p.   (5.9)
Thus, we have a set of p equations that involve components of the error pattern only. If it is possible to solve this set of equations for the e_j, the error pattern can be precisely determined. Whether this set of equations can be solved depends on the value of p, the number of zeros of g(x). In order to solve for the error pattern, we must choose the set of p equations properly. If we have to design a t error correcting cyclic code, our choice should be such that the set of equations can solve for at most t non-zero e_j.
Let us define the syndromes S_i = e(gamma_i) for i = 1, ..., p. We wish to choose gamma_1, gamma_2, ..., gamma_p in such a manner that t errors can be computed from S_1, S_2, ..., S_p. If alpha is a primitive element, then the set of gamma_i which allows the correction of t errors is {alpha^1, alpha^2, alpha^3, ..., alpha^{2t}}. Thus, we have a simple mechanism for determining the generator polynomial of a BCH code that can correct t errors.
Steps for Determining the Generator Polynomial of a t-error Correcting BCH Code:
For a primitive blocklength n = q^m - 1:
(i) Choose a prime polynomial of degree m and construct GF(q^m).
(ii) Find f_i(x), the minimal polynomial of alpha^i, for i = 1, ..., 2t.
(iii) The generator polynomial for the t error correcting code is simply
g(x) = LCM [f_1(x), f_2(x), ..., f_{2t}(x)].   (5.10)
Codes designed in this manner can correct at least t errors. In many cases the codes will be able to correct more than t errors. For this reason,
d = 2t + 1   (5.11)
is called the Designed Distance of the code, and the minimum distance d* >= 2t + 1. The generator polynomial has a degree equal to n - k (see Theorem 4.4, Chapter 4). It should be noted that once we fix n and t, we can determine the generator polynomial for the BCH code. The information length k is then decided by the degree of g(x). Intuitively, for a fixed blocklength n, a larger value of t will force the information length k to be smaller (because a higher redundancy will be required to correct a larger number of errors). In the following section, we look at a few specific examples of BCH codes.
5.5 SOME EXAMPLES OF BCH CODES
The following example illustrates the construction of the extension field GF(16) from GF(2).
The minimal polynomials obtained will be used in the subsequent examples.
Example 5.7 Consider the primitive polynomial p(z) = z^4 + z + 1 over GF(2). We shall use this to construct the extension field GF(16). Let alpha = z be the primitive element. The elements of GF(16) as powers of alpha and the corresponding minimal polynomials are listed in Table 5.3.
Table 5.3 The elements of GF(16) and the corresponding minimal polynomials
alpha^1    z                       x^4 + x + 1
alpha^2    z^2                     x^4 + x + 1
alpha^3    z^3                     x^4 + x^3 + x^2 + x + 1
alpha^4    z + 1                   x^4 + x + 1
alpha^5    z^2 + z                 x^2 + x + 1
alpha^6    z^3 + z^2               x^4 + x^3 + x^2 + x + 1
alpha^7    z^3 + z + 1             x^4 + x^3 + 1
alpha^8    z^2 + 1                 x^4 + x + 1
alpha^9    z^3 + z                 x^4 + x^3 + x^2 + x + 1
alpha^10   z^2 + z + 1             x^2 + x + 1
alpha^11   z^3 + z^2 + z           x^4 + x^3 + 1
alpha^12   z^3 + z^2 + z + 1       x^4 + x^3 + x^2 + x + 1
alpha^13   z^3 + z^2 + 1           x^4 + x^3 + 1
alpha^14   z^3 + 1                 x^4 + x^3 + 1
alpha^15   1                       x + 1
Example 5.8 We wish to determine the generator polynomial of a single error correcting BCH code, i.e., t = 1, with a blocklength n = 15. From (5.10), the generator polynomial for a BCH code is given by LCM [f_1(x), f_2(x), ..., f_{2t}(x)]. We will make use of Table 5.3 to obtain the minimal polynomials f_1(x) and f_2(x). Thus, the generator polynomial of the single error correcting BCH code will be
g(x) = LCM [f_1(x), f_2(x)]
     = LCM [(x^4 + x + 1), (x^4 + x + 1)]
     = x^4 + x + 1.
Since deg(g(x)) = n - k, we have n - k = 4, which gives k = 11. Thus we have obtained the generator polynomial of the BCH (15, 11) single error correcting code. The designed distance of this code is d = 2t + 1 = 3. It can be calculated that the minimum distance d* of this code is also 3. Thus, in this case the designed distance is equal to the minimum distance.
Next, we wish to determine the generator polynomial of a double error correcting BCH code, i.e., t = 2, with a blocklength n = 15. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x)]
     = LCM [(x^4 + x + 1), (x^4 + x + 1), (x^4 + x^3 + x^2 + x + 1), (x^4 + x + 1)]
     = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)
     = x^8 + x^7 + x^6 + x^4 + 1.
Since deg(g(x)) = n - k, we have n - k = 8, which gives k = 7. Thus, we have obtained the generator polynomial of the BCH (15, 7) double error correcting code. The designed distance of this code is d = 2t + 1 = 5. It can be calculated that the minimum distance d* of this code is also 5. Thus, in this case the designed distance is equal to the minimum distance.
Next, we determine the generator polynomial for the triple error correcting binary BCH code. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x)]
     = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)
     = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
In this case, deg(g(x)) = n - k = 10, which gives k = 5. Thus we have obtained the generator polynomial of the BCH (15, 5) triple error correcting code. The designed distance of this code is d = 2t + 1 = 7. It can be calculated that the minimum distance d* of this code is also 7. Thus in this case the designed distance is equal to the minimum distance.
Next, we determine the generator polynomial for a binary BCH code for the case t = 4. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x), f_7(x), f_8(x)]
     = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x^4 + x^3 + 1)
     = x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1.
In this case, deg(g(x)) = n - k = 14, which gives k = 1. It can be seen that this is the simple repetition code. The designed distance of this code is d = 2t + 1 = 9. However, it can be seen that the minimum distance d* of this code is 15. Thus in this case the designed distance is not equal to the minimum distance, and the code is over-designed. This code can actually correct (d* - 1)/2 = 7 random errors!
If we repeat the exercise for t = 5, 6 or 7, we get the same generator polynomial (repetition code). Note that there are only 15 non-zero field elements in GF(16) and hence there are only 15 minimal polynomials corresponding to these field elements. Thus, we cannot go beyond t = 7 (because for t = 8 we would need f_16(x), which is undefined). Hence, to obtain BCH codes that can correct a larger number of errors we must use an extension field with more elements!
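The whole construction of Example 5.8 can be automated: build GF(16), collect the conjugates of each alpha^i to get its minimal polynomial, and multiply the distinct minimal polynomials together. A Python sketch (not from the text; GF(16) elements are 4-bit masks, GF(2) polynomials are longer bit masks):

def gf16_mul(u, v):
    # GF(16) multiplication for p(z) = z^4 + z + 1.
    w = 0
    for i in range(4):
        if v >> i & 1:
            w ^= u << i
    for i in range(7, 3, -1):
        if w >> i & 1:
            w ^= 0b10011 << (i - 4)
    return w

alpha = [1]
for _ in range(15):
    alpha.append(gf16_mul(alpha[-1], 2))

def minimal_poly(i):
    # Expand prod (x + alpha^j) over the cyclotomic coset of i; return a GF(2) bit mask.
    coset = {(i * 2 ** r) % 15 for r in range(4)}
    poly = [1]
    for j in coset:
        poly = [a ^ gf16_mul(b, alpha[j]) for a, b in zip(poly + [0], [0] + poly)]
    return sum(bit << (len(poly) - 1 - k) for k, bit in enumerate(poly))

def gf2_poly_mul(a, b):
    w, i = 0, 0
    while b >> i:
        if b >> i & 1:
            w ^= a << i
        i += 1
    return w

for t in (1, 2, 3):
    g = 1
    for f in {minimal_poly(i) for i in range(1, 2 * t + 1)}:   # LCM = product of distinct f_i
        g = gf2_poly_mul(g, f)
    print(t, format(g, 'b'))
# 1 10011         -> x^4 + x + 1
# 2 111010001     -> x^8 + x^7 + x^6 + x^4 + 1
# 3 10100110111   -> x^10 + x^8 + x^5 + x^4 + x^2 + x + 1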
Example 5.9 We can construct GF(16) as an extension field of GF(4) using the primitive polynomial p(z) = z^2 + z + 2 over GF(4). Let the elements of GF(4) consist of the quaternary symbols contained in the set {0, 1, 2, 3}. The addition and multiplication tables for GF(4) are given below for handy reference.
GF(4) addition:            GF(4) multiplication:
+  0 1 2 3                 .  0 1 2 3
0  0 1 2 3                 0  0 0 0 0
1  1 0 3 2                 1  0 1 2 3
2  2 3 0 1                 2  0 2 3 1
3  3 2 1 0                 3  0 3 1 2
Table 5.4 lists the elements of GF(16) as powers of alpha and the corresponding minimal polynomials.
Table 5.4
Powers of alpha    Elements of GF(16)    Minimal polynomials
alpha^1            z                     x^2 + x + 2
alpha^2            z + 2                 x^2 + x + 3
alpha^3            3z + 2                x^2 + 3x + 1
alpha^4            z + 1                 x^2 + x + 2
alpha^5            2                     x + 2
alpha^6            2z                    x^2 + 2x + 1
alpha^7            2z + 3                x^2 + 2x + 2
alpha^8            z + 3                 x^2 + x + 3
alpha^9            2z + 2                x^2 + 2x + 1
alpha^10           3                     x + 3
alpha^11           3z                    x^2 + 3x + 3
alpha^12           3z + 1                x^2 + 3x + 1
alpha^13           2z + 1                x^2 + 2x + 2
alpha^14           3z + 3                x^2 + 3x + 3
alpha^15           1                     x + 1
For t = 1,
g(x) = LCM [f_1(x), f_2(x)]
     = (x^2 + x + 2)(x^2 + x + 3)
     = x^4 + x + 1.
Since deg(g(x)) = n - k, we have n - k = 4, which gives k = 11. Thus we have obtained the generator polynomial of the single error correcting BCH (15, 11) code over GF(4). It takes in 11 quaternary information symbols and encodes them into 15 quaternary symbols. Note that one quaternary symbol is equivalent to two bits. So, in effect, the BCH (15, 11) code takes in 22 input bits and transforms them into 30 encoded bits (can this code be used to correct a burst of length 2 for a binary sequence of length 30?). The designed distance of this code is d = 2t + 1 = 3. It can be calculated that the minimum distance d* of this code is also 3. Thus in this case the designed distance is equal to the minimum distance.
For t = 2,
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x)]
     = LCM [(x^2 + x + 2), (x^2 + x + 3), (x^2 + 3x + 1), (x^2 + x + 2)]
     = (x^2 + x + 2)(x^2 + x + 3)(x^2 + 3x + 1)
     = x^6 + 3x^5 + x^4 + x^3 + 2x^2 + 2x + 1.
This is the generator polynomial of a (15, 9) double error correcting BCH code over GF(4).
For t = 3,
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x)]
     = LCM [(x^2 + x + 2), (x^2 + x + 3), (x^2 + 3x + 1), (x^2 + x + 2), (x + 2), (x^2 + 2x + 1)]
     = (x^2 + x + 2)(x^2 + x + 3)(x^2 + 3x + 1)(x + 2)(x^2 + 2x + 1)
     = x^9 + 3x^8 + 3x^7 + 2x^6 + x^5 + 2x^4 + x + 2.
This is the generator polynomial of a (15, 6) triple error correcting BCH code over GF(4).
Similarly, for t = 4,
g(x) = x^11 + x^10 + 2x^8 + 3x^7 + 3x^6 + x^5 + 3x^4 + x^3 + x + 3.
This is the generator polynomial of a (15, 4) four error correcting BCH code over GF(4).
Similarly, for t = 5,
g(x) = x^12 + 2x^11 + 3x^10 + 2x^9 + 2x^8 + x^7 + 3x^6 + 3x^4 + 3x^3 + x^2 + 2.
This is the generator polynomial of a (15, 3) five error correcting BCH code over GF(4).
Similarly, for t = 6,
g(x) = x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1.
This is the generator polynomial of a (15, 1) six error correcting BCH code over GF(4). As is obvious, this is the simple repetition code, and can correct up to 7 errors.
Table 5.5 lists the generator polynomials of binary BCH codes of length up to 2^5 - 1. Suppose we wish to construct the generator polynomial of the BCH (15, 7) code. From the table we have (111 010 001) for the coefficients of the generator polynomial. Therefore,
g(x) = x^8 + x^7 + x^6 + x^4 + 1.
Table 5.5 The generator polynomials of binary BCH codes of length up to 2^5 - 1
n    k    t    Generator polynomial coefficients
7    4    1    1 011
15   11   1    10 011
15   7    2    111 010 001
15   5    3    10 100 110 111
31   26   1    100 101
31   21   2    11 101 101 001
31   16   3    1 000 111 110 101 111
31   11   5    101 100 010 011 011 010 101
31   6    7    11 001 011 011 110 101 000 100 111
5.6 DECODING OF BCH CODES
So far we have learnt to obtain the generator polynomial for a BCH code given the number of
random errors to be corrected. With the knowledge of the generator polynomial, very fast
encoders can be built in hardware. We now shift our attention to the decoding of the BCH
codes. Since the BCH codes are a subclass of the cyclic codes, any standard decoding procedure
for cyclic codes is also applicable to BCH codes. However, better, more efficient algorithms
have been designed specifically for BCH codes. We discuss the Gorenstein-Zierler decoding
algorithm, which is the generalized form of the binary decoding algorithm first proposed by
Peterson.
We develop here the decoding algorithm for a t error correcting BCH code. Suppose a BCH code is constructed based on the field element alpha. Consider the error polynomial
e(x) = e_{n-1} x^{n-1} + e_{n-2} x^{n-2} + ... + e_1 x + e_0,   (5.12)
where at most t coefficients are non-zero. Suppose that nu errors actually occur, where 0 <= nu <= t. Let these errors occur at locations i_1, i_2, ..., i_nu. The error polynomial can then be written as
e(x) = e_{i_1} x^{i_1} + e_{i_2} x^{i_2} + ... + e_{i_nu} x^{i_nu},   (5.13)
where e_{i_k} is the magnitude of the kth error. Note that we are considering the general case. For binary codes, e_{i_k} = 1. For error correction, we must know two things:
(i) where the errors have occurred, i.e., the error locations, and
(ii) what the magnitudes of these errors are.
Thus, the unknowns are i_1, i_2, ..., i_nu and e_{i_1}, e_{i_2}, ..., e_{i_nu}, which signify the locations and the magnitudes of the errors respectively. The syndrome can be obtained by evaluating the received polynomial at alpha:
S_1 = v(alpha) = c(alpha) + e(alpha) = e(alpha)
    = e_{i_1} alpha^{i_1} + e_{i_2} alpha^{i_2} + ... + e_{i_nu} alpha^{i_nu}.   (5.14)
Next, define the error magnitudes Y_k = e_{i_k} for k = 1, 2, ..., nu, and the error locations X_k = alpha^{i_k} for k = 1, 2, ..., nu, where i_k is the location of the kth error and X_k is the field element associated with this location. Now, the syndrome can be written as
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_nu X_nu.   (5.15)
We can evaluate the received polynomial at each of the powers of alpha that has been used to define g(x). We define the syndromes for j = 1, 2, ..., 2t by
S_j = v(alpha^j) = c(alpha^j) + e(alpha^j) = e(alpha^j).   (5.16)
Thus, we have the following set of 2t simultaneous equations, with nu unknown error locations X_1, X_2, ..., X_nu and the nu unknown error magnitudes Y_1, Y_2, ..., Y_nu:
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_nu X_nu
S_2 = Y_1 X_1^2 + Y_2 X_2^2 + ... + Y_nu X_nu^2   (5.17)
...
S_2t = Y_1 X_1^{2t} + Y_2 X_2^{2t} + ... + Y_nu X_nu^{2t}
Next, define the error locator polynomial
Lambda(x) = Lambda_nu x^nu + Lambda_{nu-1} x^{nu-1} + ... + Lambda_1 x + 1.   (5.18)
The zeros of this polynomial are the inverse error locations X_k^{-1} for k = 1, 2, ..., nu. That is,
Lambda(x) = (1 - x X_1)(1 - x X_2) ... (1 - x X_nu).   (5.19)
So, if we know the coefficients of the error locator polynomial Lambda(x), we can obtain the error locations X_1, X_2, ..., X_nu. After some algebraic manipulations we obtain
Lambda_1 S_{j+nu-1} + Lambda_2 S_{j+nu-2} + ... + Lambda_nu S_j = - S_{j+nu}  for j = 1, 2, ..., nu.   (5.20)
This is nothing but a set of linear equations that relate the syndromes to the coefficients of Lambda(x). This set of equations can be written in the matrix form as follows:
[ S_1   S_2     ...  S_{nu-1}   S_nu      ] [ Lambda_nu     ]   [ -S_{nu+1} ]
[ S_2   S_3     ...  S_nu       S_{nu+1}  ] [ Lambda_{nu-1} ] = [ -S_{nu+2} ]
[ ...                                     ] [ ...           ]   [ ...       ]
[ S_nu  S_{nu+1} ... S_{2nu-2}  S_{2nu-1} ] [ Lambda_1      ]   [ -S_{2nu}  ]
(5.21)
The values of the coefficients of the error locator polynomial can be determined by inverting the syndrome matrix. This is possible only if the matrix is non-singular. It can be shown that this matrix is non-singular if there are nu errors.
Steps for Decoding BCH Codes
(i) As a trial value, set nu = t and compute the determinant of the matrix of syndromes, M. If the determinant is zero, set nu = t - 1. Again compute the determinant of M. Repeat this process until a value of nu is found for which the determinant of the matrix of syndromes is non-zero. This value of nu is the actual number of errors that occurred.
(ii) Invert the matrix M and find the coefficients of the error locator polynomial Lambda(x).
(iii) Solve Lambda(x) = 0 to obtain the zeros and from them compute the error locations X_1, X_2, ..., X_nu. If it is a binary code, stop (because the magnitudes of the errors are unity).
(iv) If the code is not binary, go back to the system of equations:
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_nu X_nu
S_2 = Y_1 X_1^2 + Y_2 X_2^2 + ... + Y_nu X_nu^2
...
S_2t = Y_1 X_1^{2t} + Y_2 X_2^{2t} + ... + Y_nu X_nu^{2t}
Since the error locations are now known, these form a set of 2t linear equations. These can be solved to obtain the error magnitudes.
Solving for the Lambda_i by inverting the nu x nu matrix can be computationally expensive. The number of computations required is proportional to nu^3. If we need to correct a large number of errors (i.e., a large nu), we need more efficient ways to solve the matrix equation. Various refinements have been found which greatly reduce the computational complexity. It can be seen that the nu x nu matrix is not arbitrary in form. The entries in its diagonal perpendicular to the main diagonal are all identical. This property is called persymmetry. This structure was exploited by Berlekamp (1968) and Massey (1969) to find a simpler solution to the system of equations.
The simplest way to search for the zeros of Lambda(x) is to test all the field elements one by one. This method of exhaustive search is known as the Chien search.
Example 5.10 Consider the BCH (15, 5) triple error correcting code with the generator polynomial
g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
Let the all-zero codeword be transmitted and the received polynomial be v(x) = x^5 + x^3. Thus, there are two errors, at the locations corresponding to x^3 and x^5. The error polynomial is e(x) = x^5 + x^3. But the decoder does not know this. It does not even know how many errors have actually occurred. We use the Gorenstein-Zierler decoding algorithm. First we compute the syndromes using the arithmetic of GF(16):
S_1 = alpha^5 + alpha^3 = alpha^11
S_2 = alpha^10 + alpha^6 = alpha^7
S_3 = alpha^15 + alpha^9 = alpha^7
S_4 = alpha^20 + alpha^12 = alpha^14
S_5 = alpha^25 + alpha^15 = alpha^5
S_6 = alpha^30 + alpha^18 = alpha^14
First set nu = t = 3, since this is a triple error correcting code. For nu = 3, the matrix of syndromes is
M = [ S_1 S_2 S_3 ]
    [ S_2 S_3 S_4 ]
    [ S_3 S_4 S_5 ]
Det(M) = 0, which implies that fewer than 3 errors have occurred. Next, set nu = 2, for which
M = [ S_1 S_2 ]
    [ S_2 S_3 ]
Det(M) is non-zero, which implies that 2 errors have actually occurred. We next calculate M^{-1}. It so happens that in this case Det(M) = 1, so that
M^{-1} = [ S_3 S_2 ]
         [ S_2 S_1 ]
Solving for Lambda_1 and Lambda_2 we get Lambda_2 = alpha^8 and Lambda_1 = alpha^11. Thus,
Lambda(x) = alpha^8 x^2 + alpha^11 x + 1 = (alpha^5 x + 1)(alpha^3 x + 1).
Thus, the recovered error locations are alpha^5 and alpha^3. Since the code is binary, the error magnitudes are 1. Thus, e(x) = x^5 + x^3.
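The arithmetic of Example 5.10 is easy to mechanize. The Python sketch below (not from the text) builds GF(16) log/antilog tables for p(z) = z^4 + z + 1, computes the six syndromes for e(x) = x^5 + x^3, solves the 2 x 2 system of (5.21) by Cramer's rule and finds the roots of Lambda(x) by a Chien-style search:

exp = [1]
for _ in range(15):
    e = exp[-1] << 1
    exp.append(e ^ 0b10011 if e & 0b10000 else e)     # antilog table: exp[i] = alpha^i
log = {exp[i]: i for i in range(15)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else exp[(log[a] + log[b]) % 15]

S = [exp[5 * j % 15] ^ exp[3 * j % 15] for j in range(1, 7)]   # S_j = e(alpha^j)
print([log[s] for s in S])                      # [11, 7, 7, 14, 5, 14] -> powers of alpha

S1, S2, S3, S4 = S[:4]
det = mul(S1, S3) ^ mul(S2, S2)                 # determinant of [[S1, S2], [S2, S3]]
inv = exp[(15 - log[det]) % 15]                 # its inverse (non-zero, so 2 errors occurred)
L2 = mul(inv, mul(S3, S3) ^ mul(S2, S4))        # Cramer's rule; minus signs vanish in char. 2
L1 = mul(inv, mul(S1, S4) ^ mul(S2, S3))
print(log[L2], log[L1])                         # 8 11 -> Lambda(x) = a^8 x^2 + a^11 x + 1

roots = [i for i in range(15)
         if mul(L2, mul(exp[i], exp[i])) ^ mul(L1, exp[i]) ^ 1 == 0]
print([(15 - i) % 15 for i in roots])           # [5, 3] -> errors at x^5 and x^3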
In the next section, we will study the famous Reed-Solomon codes, an important sub-class of BCH codes.
5.7 REED-SOLOMON CODES
Reed-Solomon (RS) Codes are an important subclass of the non-binary BCH with a wide range
of applications in digital communications and data storage. The typical application areas of the
RS code are
• Storage devices (including tape, Compact Disk, DVD, barcodes, etc),
• Wireless or mobile communication (including cellular telephones, microwave links, etc),
• Satellite communication,
• Digital television / Digital Video Broadcast (DVB),
• High-speed modems such as those employing ADSL, xDSL, etc.
It all began with a five-page paper that appeared in 1960 in the Journal of the Society for Industrial and Applied Mathematics. The paper, "Polynomial Codes over Certain Finite Fields" by Irving S. Reed and Gustave Solomon of MIT's Lincoln Laboratory, introduced the ideas that form a significant portion of current error correcting techniques for everything from computer hard disk drives to CD players. Reed-Solomon codes (plus a lot of engineering wizardry, of course) made possible the stunning pictures of the outer planets sent back by Voyager II. They make it possible to scratch a compact disc and still enjoy the music. And in the not-too-distant future, they will enable the profit mongers of cable television to squeeze more than 500 channels into their systems.
An RS coding system is based on groups of bits, such as bytes, rather than individual 0s and 1s, making it particularly good at dealing with bursts of errors: six consecutive bit errors, for example, can affect at most two bytes. Thus, even a double-error-correcting version of a Reed-Solomon code can provide a comfortable safety factor. Current implementations of Reed-Solomon codes in CD technology are able to cope with error bursts as long as 4000 consecutive bits.
In this sub-class of BCH codes, the symbol field GF(q) and the error locator field GF(q^m) are
the same, i.e., m = 1. Thus, in this case
n = q^m - 1 = q - 1     (5.22)
The minimal polynomial of any element b in the same field GF(q) is
f_b(x) = x - b     (5.23)
Since the symbol field (sub-field) and the error locator field (extension field) are the same, all
the minimal polynomials are linear. The generator polynomial for a t error correcting code will
be simply
g(x) = LCM[f1(x), f2(x), ..., f_2t(x)]
     = (x - a)(x - a^2) ... (x - a^(2t-1))(x - a^2t)     (5.24)
Hence, the degree of the generator polynomial will always be 2t. Thus, the RS code satisfies
n - k = 2t     (5.25)
In general, the generator polynomial of an RS code can be written as
g(x) = (x - a^i)(x - a^(i+1)) ... (x - a^(i+2t-1))     (5.26)
Example 5.11 Consider the double error correcting RS code of blocklength 15 over GF(16).
Here t = 2. We use here the elements of the extension field GF(16) constructed from GF(2) using
the primitive polynomial p(z) = z^4 + z + 1. The generator polynomial can be written as
g(x) = (x - a)(x - a^2)(x - a^3)(x - a^4)
     = x^4 + (a^3 + a^2 + 1)x^3 + (a^3 + a^2)x^2 + a^3 x + (a^2 + a + 1)
     = x^4 + a^13 x^3 + a^6 x^2 + a^3 x + a^10
Here n - k = 4, which implies k = 11. Thus, we have obtained the generator polynomial of an RS
(15, 11) code over GF(16). Note that this coding procedure takes in 11 symbols (equivalent to 4 x
11 = 44 bits) and encodes them into 15 symbols (equivalent to 60 bits).
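The expansion of the generator polynomial can also be verified numerically. Below is a minimal sketch (not from the book) that constructs GF(16) from p(z) = z^4 + z + 1 and multiplies out (x - a)(x - a^2)(x - a^3)(x - a^4).

# Minimal sketch: the RS (15, 11) generator polynomial of Example 5.11.
PRIM, M = 0b10011, 4
N = (1 << M) - 1

exp_t, log_t = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    exp_t[i], log_t[x] = x, i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
for i in range(N, 2 * N):
    exp_t[i] = exp_t[i - N]

def mul(a, b):                        # GF(16) multiplication via the log tables
    return 0 if a == 0 or b == 0 else exp_t[log_t[a] + log_t[b]]

def poly_mul(p, q):                   # coefficient lists, lowest degree first
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= mul(a, b)
    return r

g = [1]
for i in range(1, 5):                 # multiply the factors (x - a^i) = (a^i + x)
    g = poly_mul(g, [exp_t[i], 1])

print(["a^%d" % log_t[c] if c else "0" for c in reversed(g)])
# prints ['a^0', 'a^13', 'a^6', 'a^3', 'a^10'], i.e. x^4 + a^13 x^3 + a^6 x^2 + a^3 x + a^10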
Theorem 5.2 A Reed-Solomon Code is a Maximum Distance Separable (MDS) Code
and its minimum distance is n - k + 1.
Proof Let the designed distance of the RS code be d = 2t + 1. The minimum distance d*
satisfies the condition
d* >= d = 2t + 1
But, for an RS code, 2t = n - k. Hence,
d* >= d = n - k + 1.
But, by the Singleton Bound for any linear code,
d* <= n - k + 1.
Thus, d* = n - k + 1, and the minimum distance d* = d, the designed distance of the code.
Since RS codes are maximum distance separable (MDS), all of the possible code words are
as far away as possible algebraically in the code space. It implies a uniform code word
distribution in the code space.
Table 5.6 lists the parameters of some RS codes. Note that for a given minimum distance, in
order to have a high code rate, one must work with larger Galois Fields.
Table 5.6 Some RS code parameters

m    q = 2^m    n = q - 1    t     k     d     r = k/n
2    4          3            1     1     3     0.3333
3    8          7            1     5     3     0.7143
                             2     3     5     0.4286
                             3     1     7     0.1429
4    16         15           1     13    3     0.8667
                             2     11    5     0.7333
                             3     9     7     0.6000
                             4     7     9     0.4667
                             5     5     11    0.3333
                             6     3     13    0.2000
                             7     1     15    0.0667
5    32         31           1     29    3     0.9355
                             5     21    11    0.6774
                             8     15    17    0.4839
8    256        255          5     245   11    0.9608
                             15    225   31    0.8824
                             50    155   101   0.6078
Example 5.12 A popular Reed-Solomon code is RS(255, 223) with 8-bit symbols (bytes), i.e.,
over GF(256). Each codeword contains 255 code word bytes, of which 223 bytes are data and 32
bytes are parity. For this code, n = 255, k = 223 and n - k = 32. Hence, 2t = 32, or t = 16. Thus, the
decoder can correct any 16 symbol random error in the codeword, i.e., errors in up to 16 bytes
anywhere in the codeword can be corrected.
Example 5.13 Reed-Solomon error correction codes have an extremely pronounced effect on the
efficiency of a digital communication channel. For example, an operation running at a data rate of
1 million bytes per second will carry approximately 4000 blocks of 255 bytes each second. If 1000
random short errors (less than 17 bits in length) per second are injected into the channel, about 600
to 800 blocks per second would be corrupted, which might require retransmission of nearly all of
the blocks. By applying the Reed-Solomon (255, 235) code (that corrects up to 10 errors per block
of 235 information bytes and 20 parity bytes), the typical time between blocks that cannot be
corrected and would require retransmission will be about 800 years. The mean time between
incorrectly decoded blocks will be over 20 billion years!
5.8 IMPLEMENTATION OF REED-SOLOMON ENCODERS AND DECODERS
Hardware Implementation
A number of commercial hardware implementations exist for RS codes. Many existing systems
use off-the-shelf integrated circuits that encode and decode Reed-Solomon codes. These ICs
tend to support a certain amount of programmability, for example, RS(255, k) where t = 1 to 16
symbols. The recent trend has been towards VHDL or Verilog Designs (logic cores or
intellectual property cores). These have a number of advantages over standard ICs. A logic core
can be integrated with other VHDL or Verilog components and synthesized to an FPGA (Field
Programmable Gate Array) or ASIC (Application Specific Integrated Circuit); this enables so-called
"System on Chip" designs where multiple modules can be combined in a single IC.
Depending on production volumes, logic cores can often give significantly lower system costs
than standard ICs. By using logic cores, a designer avoids the potential need to do a life-time
buy of a Reed-Solomon IC.
Software Implementation
Until recently, software implementations in "real-time" required too much computational power
for all but the simplest of Reed-Solomon codes (i.e., codes with small values of t). The major
difficulty in implementing Reed-Solomon codes in software is that general purpose processors
do not support Galois Field arithmetic operations. For example, to implement a Galois Field
multiply in software requires a test for 0, two log table look-ups, modulo add and anti-log table
look-up. However, careful design together with increases in processor performance means that
software implementations can operate at relatively high data rates. Table 5.7 gives sample
benchmark figures on a 1.6 GHz Pentium PC. These data rates are for decoding only. Encoding
is considerably faster since it requires less computation.
Table 5.7 Sample benchmark figures for software decoding of some RS codes

Code            Data Rate      t
RS(255, 251)    ~ 120 Mbps     2
RS(255, 239)    ~ 30 Mbps      8
RS(255, 223)    ~ 10 Mbps      16
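The Galois Field multiply mentioned above (test for zero, two log table look-ups, an exponent addition and one antilog look-up) is easy to sketch. The fragment below is an illustrative implementation only, assuming the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 for GF(256); practical RS codecs may use a different polynomial.

# Minimal sketch of a GF(256) multiply using log/antilog tables.
PRIM_POLY = 0x11d                     # x^8 + x^4 + x^3 + x^2 + 1 (an assumed choice)
EXP = [0] * 512                       # antilog table, doubled to avoid a modulo on lookup
LOG = [0] * 256

x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1                           # multiply by alpha
    if x & 0x100:                     # reduce modulo the primitive polynomial
        x ^= PRIM_POLY
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf256_mul(a, b):
    # test for 0, two log look-ups, add the exponents, one antilog look-up
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf256_mul(0x57, 0x13))          # example product of two field elements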
5.9 NESTED CODES
One of the ways to achieve codes with large blocklengths is to nest codes. This technique
combines a code of a small alphabet size and one of a larger alphabet size. Let a block of q-ary
symbols be of length kK. This block can be broken up into K sub-blocks of k symbols. Each sub-block
can be viewed as an element of a q^k-ary alphabet. A sequence of K such sub-blocks can be
encoded with an (N, K) code over GF(q^k). Now, each of the N q^k-ary symbols can be viewed as
k q-ary symbols and can be coded with an (n, k) q-ary code. Thus, a nested code has two distinct
levels of coding. This method of generating a nested code is given in Fig. 5.1.
Fig. 5.1 Nesting of Codes. [Block diagram: the data passes through an Outer Encoder, an (N, K) code over GF(q^k), then through an Inner Encoder, an (n, k) code over GF(q), and over the q-ary channel; an Inner Decoder and an Outer Decoder reverse the two levels. The inner encoder, q-ary channel and inner decoder together form a q^k-ary super channel.]
Example 5.14 The following two codes can be nested to form a code with a larger blocklength.
Inner code: The RS (7, 3) double error correcting code over GF(8).
Outer code: The RS (511, 505) triple error correcting code over GF(8^3).
On nesting these codes we obtain a (3577, 1515) code over GF(8). This code can correct any
random pattern of 11 errors. The codeword is 3577 symbols long, where the symbols are the
elements of GF(8).
Example 5.15 RS codes are extensively used in compact discs (CD) for error correction.
Below we give the standard Compact Disc digital format.
Sampling frequency: 44.1 kHz, i.e., 10% margin with respect to the Nyquist frequency (audible
frequencies below 20 kHz)
Quantization: 16-bit linear => theoretical SNR about 98 dB (for sinusoidal signal with maximum
allowed amplitude), 2's complement
Signal format: Audio bit rate 1.41 Mbit/s (44.1 kHz x 16 bits x 2 channels), Cross Interleave
Reed-Solomon Code (CIRC), total data rate (CIRC, sync, subcode) 2.034 Mbit/s.
Playing time: Maximum 74.7 min.
Disc specifications: Diameter 120 mm, thickness 1.2 mm, track pitch 1.6 um, one side medium,
disc rotates clockwise, signal is recorded from inside to outside, constant linear velocity (CLV),
recording maximizes recording density (the speed of revolution of the disc is not constant; it
gradually decreases from 500 to 200 r/min), pit is about 0.5 um wide, each pit edge is '1' and all
areas in between, whether inside or outside a pit, are '0's.
Error Correction: A typical error rate of a CD system is 10^-5, which means that a data error
occurs roughly 20 times per second (bit rate x BER). About 200 errors/s can be corrected.
Sources of errors: Dust, scratches, fingerprints, pit asymmetry, bubbles or defects in substrate,
coating defects and dropouts.
Cross Interleave Reed-Solomon Code (CIRC)
• C2 can effectively correct burst errors.
• C1 can correct random errors and detect burst errors.
• Three interleaving stages to encode data before it is placed on a disc.
• Parity checking to correct random errors.
• Cross interleaving to permit parity to correct burst errors.
1. Input stage: 12 words (16-bit, 6 words per channel) of data per input frame divided into 24
symbols of 8 bits.
2. C2 Reed-Solomon code: 24 symbols of data are encoded into a (28, 24) RS code and 4
parity symbols are used for error correction.
3. Cross interleaving: to guard against burst errors, separate error correction codes, one code
can check the accuracy of another, error correction is enhanced.
4. C1 Reed-Solomon code: cross-interleaved 28 symbols of the C2 code are encoded again
into a (32, 28) RS code (4 parity symbols are used for error correction).
5. Output stage: half of the code word is subject to a 1-symbol delay to avoid 2-symbol error
at the boundary of symbols.
Performance of CIRC: Both RS coders (C1 and C2) have four parities, and their minimum distance
is 5. If the error location is not known, up to two symbols can be corrected. If the errors exceed the
correction limit, they are concealed by interpolation. Since even-numbered sampled data and odd-numbered
sampled data are interleaved as much as possible, CIRC can conceal long burst errors by
simple linear interpolation.
• Maximum correctable burst length is about 4000 data bits (2.5 mm track length).
• Maximum correctable burst length by interpolation in the worst case is about 12320 data
bits (7.7 mm track length).
The sample interpolation rate is one sample every 10 hours at BER (Bit Error Rate) = 10^-4 and 1000
samples at BER = 10^-3. Undetectable error samples (clicks) occur less than once every 750 hours at
BER = 10^-3 and are negligible at BER = 10^-4.
5.10 CONCLUDING REMARKS
The class of BCH codes was discovered independently by Hocquenghem in 1959 and Bose
and Ray-Chaudhuri in 1960. The BCH codes constitute one of the most important and powerful
classes of linear block codes, which are cyclic.
The Reed-Solomon codes were discovered by Irving S. Reed and Gustave Solomon who
published a five-page paper in the journal of the Society for Industrial and Applied Mathematics
in 1960 titled "Polynomial Codes over Certain Finite Fields". Despite their advantages, Reed-
Solomon codes did not go into use immediately after their invention. They had to wait for the
hardware technology to catch up. In 1960, there was no such thing as fast digital electronics, at
least not by today's standards. The Reed-Solomon paper suggested some nice ways to process
data, but nobody knew if it was practical or not, and in 1960 it probably wasn't practical.
Eventually technology did catch up, and numerous researchers began to work on
implementing the codes. One of the key individuals was Elwyn Berlekamp, a professor of
electrical engineering at the University of California at Berkeley, who invented an efficient
algorithm for decoding the Reed-Solomon code. Berlekamp's algorithm was used by Voyager
II and is the basis for decoding in CD players. Many other bells and whistles (some of
fundamental theoretic significance) have also been added. Compact discs, for example, use a
version called cross-interleaved Reed-Solomon code, or CIRC.
SUMMARY
• A primitive element of GF(q) is an element a such that every field element except zero
can be expressed as a power of a. A field can have more than one primitive element.
• A primitive polynomial p(x) over GF(q) is a prime polynomial over GF(q) with the
property that in the extension field constructed modulo p(x), the field element
represented by x is a primitive element.
• A blocklength n of the form n = q^m - 1 is called a primitive blocklength for a code over
GF(q). A cyclic code over GF(q) of primitive blocklength is called a primitive cyclic code.
• It is possible to factor x^(q^m - 1) - 1 in the extension field GF(q^m) to get
x^(q^m - 1) - 1 = PROD_j (x - b_j), where b_j ranges over all the non-zero elements of GF(q^m).
This implies that each of the polynomials f_i(x) can be represented in GF(q^m) as a product
of some of the linear terms, and each b_j is a zero of exactly one of the f_i(x). This f_i(x)
is called the minimal polynomial of b_j.
• Two elements of GF(q^m) that share the same minimal polynomial over GF(q) are called
conjugates with respect to GF(q).
• BCH codes defined over GF(q) with blocklength q^m - 1 are called primitive BCH codes.
• To determine the generator polynomial of a t-error correcting BCH code for a primitive
blocklength n = q^m - 1: (i) choose a prime polynomial of degree m and construct GF(q^m),
(ii) find f_i(x), the minimal polynomial of a^i for i = 1, ..., 2t, (iii) obtain the generator
polynomial g(x) = LCM[f1(x), f2(x), ..., f_2t(x)]. Codes designed in this manner can correct
at least t errors. In many cases the codes will be able to correct more than t errors. For
this reason, d = 2t + 1 is called the designed distance of the code, and the minimum
distance d* >= 2t + 1. (A short construction sketch, in the form of a small program, is
given at the end of this summary.)
• Steps for decoding BCH codes:
(1) As a trial value, set v = t and compute the determinant of the matrix of syndromes, M.
If the determinant is zero, set v = t - 1. Again compute the determinant of M. Repeat
this process until a value of v is found for which the determinant of the matrix of
syndromes is non zero. This value of v is the actual number of errors that occurred.
(2) Invert the matrix M and find the coefficients of the error locator polynomial A(x).
(3) Solve A(x) = 0 to obtain the zeros and from them compute the error locations X1, X2,
..., Xv. If it is a binary code, stop (because the magnitudes of error are unity).
(4) If the code is not binary, go back to the system of equations:
S1 = Y1 X1 + Y2 X2 + ... + Yv Xv
S2 = Y1 X1^2 + Y2 X2^2 + ... + Yv Xv^2
...
S2t = Y1 X1^2t + Y2 X2^2t + ... + Yv Xv^2t
Since the error locations are now known, these form a set of 2t linear equations. These
can be solved to obtain the error magnitudes.
• The generator polynomial for a t-error correcting RS code will be simply g(x) = LCM[f1(x),
f2(x), ..., f_2t(x)] = (x - a)(x - a^2) ... (x - a^(2t-1))(x - a^2t). Hence, the degree of the generator
polynomial will always be 2t. Thus, the RS code satisfies n - k = 2t.
• A Reed-Solomon code is a Maximum Distance Separable (MDS) Code and its minimum
distance is n- k + 1.
• One of the ways to achieve codes with large blocklengths is to nest codes. This technique
combines a code of a small alphabet size and a code of a larger alphabet size. Let a block
of q-ary symbols be of length kK. This block can be broken up into K sub-blocks of k
symbols. Each sub-block can be viewed as an element of a q^k-ary alphabet.
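As a companion to the generator polynomial construction summarised above, the following is a minimal sketch (not the book's program) that carries out the three steps for the triple error correcting binary BCH code of blocklength 15 used in Example 5.10: it builds GF(16) from p(z) = z^4 + z + 1, forms the minimal polynomials of a^1, ..., a^6 from their cyclotomic cosets, and multiplies the distinct ones together.

# Minimal sketch: generator polynomial of the BCH (15, 5), t = 3 code.
M, PRIM = 4, 0b10011
N = (1 << M) - 1                      # blocklength 15

exp_t, log_t = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    exp_t[i], log_t[x] = x, i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
for i in range(N, 2 * N):
    exp_t[i] = exp_t[i - N]

def mul(a, b):
    return 0 if a == 0 or b == 0 else exp_t[log_t[a] + log_t[b]]

def poly_mul(p, q):                   # coefficient lists over GF(16), lowest degree first
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= mul(a, b)
    return r

t = 3
g, used = [1], set()
for i in range(1, 2 * t + 1):
    if i in used:
        continue
    coset, j = set(), i               # exponents sharing the same minimal polynomial
    while j not in coset:
        coset.add(j)
        j = (2 * j) % N
    used |= coset
    f = [1]                           # minimal polynomial f_i(x) = product of (x - a^j)
    for j in coset:
        f = poly_mul(f, [exp_t[j], 1])
    g = poly_mul(g, f)

print(g)  # [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] -> g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1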
When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
(by Sir Arthur Conan Doyle 1859-1930)
PROBLEMS
5.1 Construct GF(9) from GF(3) using an appropriate primitive polynomial.
5.2 (i) Find the generator polynomial g(x) for a single error correcting ternary BCH code of
blocklength 26. What is the code rate of this code? Compare it with the (11, 6) ternary
Golay code with respect to the code rate and the minimum distance.
(ii) Next, find the generator polynomial g(x) for a triple error correcting ternary BCH
code of blocklength 26.
5.3 Find the generator polynomial g(x) for a binary BCH code of blocklength 31. Use the
primitive polynomial p(x) = x^5 + x^2 + 1 to construct GF(32). What is the minimum
distance of this code?
5.4 Find the generator polynomials and the minimum distance for the following codes:
(i) RS (15, 11) code
(ii) RS (15, 7) code
(iii) RS (31, 21) code.
5.5 Show that every BCH code is a subfield subcode of a Reed-Solomon Code of the same
designed distance. Under what condition is the code rate of the BCH code equal to that of
the RS code?
5.6 Consider the code over GF(11) with a parity check matrix
     [ 1   1    1    ...  1    ]
H =  [ 1   2    3    ...  10   ]
     [ 1   2^2  3^2  ...  10^2 ]
(i) Find the minimum distance of this code.
(ii) Show that this is an optimal code with respect to the Singleton Bound.
5.7 Consider the code over GF(11) with a parity check matrix
     [ 1   1    1    ...  1    ]
     [ 1   2    3    ...  10   ]
H =  [ 1   2^2  3^2  ...  10^2 ]
     [ 1   2^3  3^3  ...  10^3 ]
     [ 1   2^4  3^4  ...  10^4 ]
     [ 1   2^5  3^5  ...  10^5 ]
(i) Show that the code is a triple error correcting code.
(ii) Find the generator polynomial for this code.
COMPUTER PROBLEMS
5.8 Write a computer program which takes in the coefficients of a primitive polynomial, the
values of q and m, and then constructs the extension field GF(q^m).
5.9 Write a computer program that performs addition and multiplication over GF(2m),
where m is an integer.
5.10 Find the generator polynomial g(x) for a binary BCH code of blocklength 63. Use the
primitive polynomial p(x) = x^6 + x + 1 to construct GF(64). What is the minimum distance
of this code?
5.11 Write a program that performs BCH decoding given n, q, t and the received vector.
5.12 Write a program that outputs the generator polynomial of the Reed-Solomon code with
the codeword length n and the message length k. A valid n should be 2^M - 1, where M is
an integer not less than 3. The program should also list the minimum distance of the
code.
5.13 Write a computer program that performs the two level RS coding as done in a standard
compact disc.
6
Convolutional Codes
The shortest path between two truths in the real domain passes through the complex domain.
Jacques Hadamard (1865-1963)
6.1 INTRODUCTION TO CONVOLUTIONAL CODES
So far we have studied block codes, where a block of k information symbols is encoded into a
block of n coded symbols. There is always a one-to-one correspondence between the uncoded
block of symbols (information word) and the coded block of symbols (codeword). This method
is particularly useful for high data rate applications, where the incoming stream of uncoded data
is first broken into blocks, encoded, and then transmitted (Fig. 6.1). A large blocklength is
important because of the following reasons.
(i) Many of the good codes that have large distance properties are of large blocklengths
(e.g., the RS codes),
(ii) Larger blocklengths imply that the encoding overhead is small.
However, very large blocklengths have the disadvantage that unless the entire block of
encoded data is received at the receiver, the decoding procedure cannot start, which may result
in delays. In contrast, there is another coding scheme in which much smaller blocks of uncoded
data of length k0 are used. These are called Information Frames. An information frame
typically contains just a few symbols, and can have as few as just one symbol! These information
frames are encoded into Codeword Frames of length n0. However, just one information
frame is not used to obtain the codeword frame. Instead, the current information frame together with the
previous m information frames is used to obtain a single codeword frame. This implies that
such encoders have memory, which retains the previous m incoming information frames. The
codes that are obtained in this fashion are called Tree Codes. An important sub-class of Tree
Codes, used frequently in practice, is called Convolutional Codes. Up to now, all the decoding
techniques discussed are algebraic and are memoryless, i.e. decoding decisions are based only
on the current codeword. Convolutional codes make decisions based on past information, i.e.
memory is required.
Fig. 6.1 Encoding Using a Block Encoder.
In this chapter, we start with an introduction to Tree and Trellis Codes. We will then
develop the necessary mathematical tools to construct convolutional codes. We will see that
convolutional codes can be easily represented by polynomials. Next, we will give a matrix
description of convolutional codes. The chapter goes on to discuss the famous Viterbi
Decoding Technique. We shall conclude this chapter by giving an introduction to Turbo
Coding and Decoding.
6.2 TREE CODES AND TRELLIS CODES
We assume that we have an infinitely long stream of incoming symbols (thanks to the volumes
of information sent these days, it is not a bad assumption!). This stream of symbols is first
broken up into segments of k0 symbols. Each segment is called an Information Frame, as
mentioned earlier. The encoder consists of two parts (Fig. 6.2):
(i) memory, which basically is a shift register,
(ii) a logic circuit.
The memory of the encoder can store m information frames. Each time a new information
frame arrives, it is shifted into the shift register and the oldest information frame is discarded. At
the end of any frame time the encoder has m most recent information frames in its memory,
which corresponds to a total of mk0 information symbols.
When a new frame arrives, the encoder computes the codeword frame using this new frame
that has just arrived and the stored previous m frames. The computation of the codeword frame
is done using the logic circuit. This codeword frame is then shifted out. The oldest information
frame in the memory is then discarded and the most recent information frame is shifted in. The
encoder is now ready for the next incoming information frame. Thus, for every information
frame (k0 symbols) that comes in, the encoder generates a codeword frame (n0 symbols). It
should be observed that the same information frame may not generate the same codeword
frame because the codeword frame also depends on the m previous information frames.
Definition 6.1 The Constraint Length of a shift register encoder is defined as the
number of symbols it can store in its memory. We shall give a more formal definition
of constraint length later in this chapter.
If the shift register encoder stores m previous information frames of length k0, the constraint
length of this encoder is v = mk0.
Fig. 6.2 A Shift Register Encoder that Generates a Tree Code.
Definition 6.2 The infinite set of all infinitely long codewords obtained by feeding
every possible input sequence to a shift register encoder is called a (k0, n0) Tree
Code. The rate of this tree code is defined as
R = k0/n0     (6.1)
A more formal definition is that a (k0, n0) Tree Code is a mapping from the set of
semi-infinite sequences of elements of GF(q) into itself such that if for any m, two semi-infinite
sequences agree in the first mk0 components, then their images agree in the
first mn0 components.
Definition 6.3 The Wordlength of a shift register encoder is defined as k = (m + 1)k0.
The Blocklength of a shift register encoder is defined as n = (m + 1)n0 = k(n0/k0).
Note that the code rate R = k0/n0 = k/n. Normally, for practical shift register encoders,
the information frame length k0 is small (usually less than 5). Therefore, it is difficult
to obtain the code rate R of tree codes close to unity, as is possible with block codes
(e.g., RS codes).
Information Theory, Coding and Cryptography
Definition 6.4 A (n0, k0) tree code that is linear, time-invariant, and has a finite
wordlength k = (m + 1)k0 is called an (n, k) Convolutional Code.
Definition 6.5 A (n0, k0) tree code that is time-invariant and has a finite wordlength
k is called an (n, k) Sliding Block Code. Thus, a linear sliding block code is a
convolutional code.
Example 6.1 Consider the convolutional encoder given in Fig. 6.3.
Fig. 6.3 Convolutional Encoder of Example 6.1.
This encoder takes in one bit at a time and encodes it into 2 bits. The information frame length
k0 = 1, the codeword frame length n0 = 2 and the blocklength (m + 1)n0 = 6. The constraint length
of this encoder is v = 2 and the code rate is 1/2. The clock rate of the outgoing data is twice as fast as
that of the incoming data. The adders are binary adders, and from the point of view of circuit
implementation, are simply XOR gates.
Let us assume that the initial state of the shift register is [0 0]. Now, either '0' will come or '1'
will come as the incoming bit. Suppose '0' comes. On performing the logic operations, we see that
the computed value of the codeword frame is [0 0]. The 0 will be pushed into the memory (shift
register) and the rightmost '0' will be dropped. The state of the shift register remains [0 0]. Next,
let '1' arrive at the encoder. Again we perform the logic operations to compute the codeword
frame. This time we obtain [1 1]. So, this will be pushed out as the encoded frame. The incoming
'1' will be shifted into the memory, and the rightmost bit will be dropped. So the new state of the
shift register will be [1 0].
Table 6.1 lists all the possibilities.
Table 6.1 The Incoming and Outgoing Bits of the Convolutional Encoder.

Incoming Bit    Current State of the Encoder    Outgoing Bits
0               0 0                             0 0
1               0 0                             1 1
0               0 1                             1 1
1               0 1                             0 0
0               1 0                             0 1
1               1 0                             1 0
0               1 1                             1 0
1               1 1                             0 1
We observe that there are only 2^2 = 4 possible states of the shift register. So, we can construct
the state diagram of the encoder as shown in Fig. 6.4. The bits associated with each arrow
represent the incoming bit. It can be seen that the same incoming bit gets encoded differently
depending on the current state of the encoder. This is different from the linear block codes
studied in previous chapters where there is always a one-to-one correspondence between the
incoming uncoded block of symbols (information word) and the coded block of symbols
(codeword).
Fig. 6.4 The State Diagram for the Encoder in Example 6.1.
The same information contained in the state diagram can be conveyed usefully in terms of a
graph called the Trellis Diagram. A trellis is a graph whose nodes are in a rectangular grid,
which is semi-infinite to the right. Hence, these codes are also called Trellis Codes. The number
of nodes in each column is finite. The following example gives an illustration of a trellis diagram.
Example 6.2 The trellis diagram for the convolutional encoder discussed in Example 6.1 is given
in Fig. 6.5.
Every node in the trellis diagram represents a state of the shift register. Since the rate of the
encoder is 1/2, one bit at a time is processed by the encoder. The incoming bit is either a '0' or a '1'.
Therefore, there are two branches emanating from each node. The top branch represents the input
as '0' and the lower branch corresponds to '1'. Therefore, labelling is not required for a binary
trellis diagram. In general, one would label each branch with the input symbol to which it
corresponds. Normally, the nodes that cannot be reached by starting at the top left node and moving
only to the right are not shown in the trellis diagram. Corresponding to a certain state and a
particular incoming bit, the encoder will produce an output.
Fig. 6.5 The Trellis Diagram for the Encoder Given in Fig. 6.3.
The output of the encoder is written on top of that branch. Thus, a trellis diagram gives a very easy
method to encode a stream of input data. The encoding procedure using a trellis diagram is as follows.
• We start from the top left node (since the initial state of the encoder is [0 O]).
• Depending on whether a '0' or a '1' comes, we follow the upper or the lower branch to the next
node.
• The encoder output is read out from the top of the branch being traversed.
• Again, depending on whether a '0' or a '1' comes, we follow the upper or the lower branch
from the current node (state).
• Thus, the encoding procedure is simply following the branches on the diagram and reading out
the encoder outputs that are written on top of each branch.
Encoding the bit stream 1 0 0 1 1 0 1 ... gives a trellis diagram as illustrated in Fig. 6.6. The
encoded sequence can be read out from the diagram as 11 01 11 11 10 10 00 ....
It can be seen that there is a one-to-one correspondence between the encoded sequence and a
path in the trellis diagram. Should the decoding procedure, then, just search for the most likely
path in the trellis diagram? The answer is yes, as we shall see further along in this chapter!
Fig. 6.6 Encoding an Input Sequence Using the Trellis Diagram.
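The encoding just carried out on the trellis can be checked with a few lines of code. The sketch below assumes the tap connections implied by Table 6.1 (first output bit i_n + i_(n-2), second output bit i_n + i_(n-1) + i_(n-2)); it is an illustration, not the book's program.

# Minimal sketch of the rate-1/2 shift register encoder of Example 6.1.
def conv_encode(bits):
    s1, s2 = 0, 0                     # s1 = previous input bit, s2 = the bit before that
    frames = []
    for b in bits:
        frames.append((b ^ s2, b ^ s1 ^ s2))   # codeword frame for this input bit
        s1, s2 = b, s1                         # shift the register
    return frames

print(conv_encode([1, 0, 0, 1, 1, 0, 1]))
# [(1, 1), (0, 1), (1, 1), (1, 1), (1, 0), (1, 0), (0, 0)], i.e. 11 01 11 11 10 10 00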
6.3 POLYNOMIAL DESCRIPTION OF CONVOLUTIONAL CODES
(ANALYTICAL REPRESENTATION)
In contrast to the two pictorial representations of the convolutional codes (the state diagram and
the trellis diagram), there is a very useful analytical representation of convolutional codes. The
representation makes use of the delay operator, D. We have earlier seen a one-to-one
correspondence between a word (vector) and a polynomial. The delay operator is used in a
similar manner. For example, consider the word 10100011 with the oldest digit at the left. The
analytical representation (sometimes referred to as the transform) of this information word I(D)
will be
I(D) = 1 + D^2 + D^6 + D^7     (6.2)
The power of the indeterminate D is the number of time units of delay of that digit relative to the chosen
time origin, which is usually taken to coincide with the first bit. In general, the sequence i0, i1, i2,
i3, ... has the representation i0 + i1 D + i2 D^2 + i3 D^3 + ...
A convolutional code over GF(q) with a wordlength k = (m + 1)k0, a blocklength n = (m + 1)n0
and a constraint length v = mk0 can be encoded by sets of finite impulse response (FIR) filters.
Each set of filters consists of n0 FIR filters in GF(q). The input to the encoder is a sequence of k0
symbols, and the output is a sequence of n0 symbols. Figure 6.7 shows an encoder for a binary
convolutional code with k0 = 1 and n0 = 4. Figure 6.8 shows a convolutional filter with k0 = 2 and
n0 = 4.
Each of the FIR filters can be represented by a polynomial of degree <= m. The input stream of
symbols can also be represented by a polynomial. The operation of the filter can then simply be
a multiplication of the two polynomials. Thus, the encoder (and hence the code) can be
represented by a set of polynomials called the generator polynomials of the code. This set
contains k0 n0 polynomials. The largest degree of a polynomial in this set of generator
polynomials is m. We remind the reader that a block code was represented by a single generator
polynomial. Thus we can define a generator polynomial matrix of size k0 x n0 for a
convolutional code as follows.
G(D) = [g_ij(D)]     (6.3)
Fig. 6.7 A Convolutional Encoder in Terms of FIR Filters with k0 = 1 and n0 = 4.
Fig. 6.8 A Convolutional Encoder in Terms of FIR Filters with k0 = 2 and n0 = 4. The Rate of This Encoder is R = 1/2.
Example 6.3 Consider the convolutional encoder given in Fig. 6.9.
Fig. 6.9 The Rate 1/2 Convolutional Encoder with G(D) = [D^2 + D + 1   D^2 + 1].
The first bit of the output is a = i_(n-2) + i_(n-1) + i_n and the second bit of the output is b = i_(n-2) + i_n, where
i_(n-l) represents the input that arrived l time units earlier. Let the input stream of symbols be
represented by a polynomial. We know that multiplying any polynomial by D corresponds to a
single cyclic right-shift of the elements. Therefore,
g11(D) = D^2 + D + 1 and g12(D) = D^2 + 1
and the generator polynomial matrix of this encoder can be written as
G(D) = [D^2 + D + 1   D^2 + 1].
Next, consider the encoder circuit shown in Fig. 6.10.
Fig. 6.10 The Rate 1/2 Convolutional Encoder with G(D) = [1   D^4 + 1].
In this case, a = i_n and b = i_(n-4) + i_n. Therefore, the generator polynomial matrix of this encoder
can be written as
G(D) = [1   D^4 + 1].
Note that the first k0 bits (k0 = 1) of the codeword frame are identical to the information frame.
Hence, this is a Systematic Convolutional Encoder.
Example 6.4 Consider the systematic convolutional encoder represented by the following circuit
(Fig. 6.11).
Fig. 6.11 The Rate 2/3 Convolutional Encoder for Example 6.4.
The generator polynomial matrix of this encoder can be written as
G(D) = [ g11(D)  g12(D)  g13(D) ]  =  [ 1   0   D^3 + D + 1 ]
       [ g21(D)  g22(D)  g23(D) ]     [ 0   1   0           ]
It is easy to write the generator polynomial matrix by visual inspection. The element in the i-th row
and j-th column of the matrix represents the relation between the i-th input bit and the j-th output bit. To
write the generator polynomial for the (i-th, j-th) entry of the matrix, just trace the route from the i-th
input bit to the j-th output bit. If no path exists, the generator polynomial is the zero polynomial, as in
the case of g12(D), g21(D) and g23(D). If only a direct path exists without any delay elements, the
value of the generator polynomial is unity, as in g11(D) and g22(D). If the route from the i-th input bit
to the j-th output bit involves a series of memory elements (delay elements), represent each delay by
an additional power of D, as in g13(D). Note that three of the generator polynomials in the set of
generator polynomials are zero. When k0 is greater than 1, it is not unusual for some of the
generator polynomials to be the zero polynomials.
We can now give the formal definitions of the Wordlength, the Blocklength and the
Constraint Length of a Convolutional Encoder.
Definition 6.6 Given the generator polynomial matrix [g_ij(D)] of a convolutional
code:
(i) The Wordlength of the code is
k = k0 max_(i,j) [deg g_ij(D) + 1].     (6.4)
(ii) The Blocklength of the code is
n = n0 max_(i,j) [deg g_ij(D) + 1].     (6.5)
(iii) The Constraint Length of the code is
v = SUM_(i=1)^(k0) max_j [deg g_ij(D)].     (6.6)
Recall that the input message stream i0, i1, i2, i3, ... has the polynomial representation I(D) = i0
+ i1 D + i2 D^2 + i3 D^3 + ... + i_(k-1) D^(k-1) and the codeword polynomial can be written as C(D) = c0 + c1 D
+ c2 D^2 + c3 D^3 + ... + c_(n-1) D^(n-1). The encoding operation can simply be described as a vector-matrix
product,
C(D) = I(D) G(D)     (6.7)
or equivalently,
c_j(D) = SUM_(i=1)^(k0) i_i(D) g_ij(D).     (6.8)
Observing that the encoding operation can simply be described as vector matrix product, it can
be easily shown that convolutional codes belong to the class of linear codes (exercise).
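Equation (6.8) says that each output stream is just the product of the information polynomial with one generator polynomial. A minimal sketch over GF(2), using the generator polynomials of the encoder in Fig. 6.9 and an arbitrary example input, is given below.

# Minimal sketch of c_j(D) = I(D) g_1j(D) over GF(2) (coefficient lists, lowest power first).
def gf2_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= a & b
    return r

g11 = [1, 1, 1]                       # D^2 + D + 1
g12 = [1, 0, 1]                       # D^2 + 1
i_poly = [1, 0, 0, 1, 1, 0, 1]        # information sequence 1 0 0 1 1 0 1, oldest bit first

c1 = gf2_poly_mul(i_poly, g11)        # first output stream
c2 = gf2_poly_mul(i_poly, g12)        # second output stream
print(c1, c2)

Interleaving the two output streams frame by frame gives the serial codeword produced by the shift register encoder (the two extra trailing frames correspond to flushing the encoder memory with zeros).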
Definition 6.7 A Parity Check Matrix H(D) is an (n0 - k0) by n0 matrix of
polynomials that satisfies
G(D) H(D)^T = 0     (6.9)
and the Syndrome Polynomial vector, which is an (n0 - k0)-component row vector, is
given by
s(D) = v(D) H(D)^T     (6.10)
Definition 6.8 A Systematic Encoder for a convolutional code has the generator
polynomial matrix of the form
G(D) = [ I | P(D) ]     (6.11)
where I is a k0 by k0 identity matrix and P(D) is a k0 by (n0 - k0) matrix of polynomials.
The parity check polynomial matrix for a systematic convolutional encoder is
H(D) = [ -P(D)^T | I ]     (6.12)
where I is an (n0 - k0) by (n0 - k0) identity matrix. It follows that
G(D) H(D)^T = 0     (6.13)
Definition 6.9 A convolutional code whose generator polynomials g1(D), g2(D), ...,
g_n0(D) satisfy
GCD[g1(D), g2(D), ..., g_n0(D)] = D^a     (6.14)
for some a is called a Non-Catastrophic Convolutional Code. Otherwise it is
called a Catastrophic Convolutional Code.
Without loss of generality, one may take a = 0, i.e., D^a = 1. Thus the task of finding a non-catastrophic
convolutional code is equivalent to finding a good set of relatively prime generator
polynomials. Relatively prime polynomials can be easily found by computer searches.
However, what is difficult is to find a set of relatively prime generator polynomials that have
good error correcting capabilities.
Example 6.5 All systematic codes are non-catastrophic because for them g1(D) = 1 and
therefore,
GCD[1, g2(D), ..., g_n0(D)] = 1
Thus the systematic convolutional encoder represented by the generator polynomial matrix
G(D) = [1   D^4 + 1]
is non-catastrophic.
Consider the following generator polynomial matrix of a binary convolutional encoder
G(D) = [D^2 + 1   D^4 + 1]
We observe that (D^2 + 1)^2 = D^4 + (D^2 + D^2) + 1 = D^4 + 1 for a binary encoder (modulo 2 arithmetic).
Hence, GCD[g1(D), g2(D)] = D^2 + 1, which is not equal to 1. Therefore, this is a catastrophic encoder.
Next, consider the generator polynomial matrix of a non-systematic binary convolutional
encoder whose two generator polynomials are relatively prime, i.e., GCD[g1(D), g2(D)] = 1.
Hence this represents a non-catastrophic convolutional encoder.
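The non-catastrophic test of Definition 6.9 is a GCD computation over GF(2), which can be sketched as follows (polynomials are held as integer bit masks, bit i being the coefficient of D^i); the second test pair is an assumed illustration using the generator polynomials of Example 6.3.

# Minimal sketch: GCD of generator polynomials over GF(2).
def gf2_polymod(a, b):                # remainder of a divided by b
    while b and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_polymod(a, b)
    return a

print(bin(gf2_gcd(0b101, 0b10001)))   # D^2 + 1 and D^4 + 1 -> 0b101, catastrophic
print(bin(gf2_gcd(0b111, 0b101)))     # D^2 + D + 1 and D^2 + 1 -> 0b1, non-catastrophic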
In the next section, we see that the distance notion of convolutional codes is an important
parameter that determines the number of errors a convolutional code can correct.
6.4 DISTANCE NOTIONS FOR CONVOLUTIONAL CODES
Recall that, for block codes, the concept of (Hamming) Distance between two codewords
provides a way of quantitatively describing how different the two vectors are, and how a good
code must possess the maximum possible minimum distance. Convolutional codes also have a
distance concept that determines how good the code is.
When a codeword from a convolutional encoder passes through a channel, errors occur from
time to time. The job of the decoder is to correct these errors by processing the received vector.
In principle, the convolutional codeword is infinite in length. However, the decoding decisions
are made on codeword segments of a finite length. The number of symbols that the decoder can
store is called the Decoding Window Width. Regardless of the size of these finite segments
(decoding window width), the previous frames affect the current frame because of the memory
of the encoder. In general, one gets a better performance by increasing the decoding window
width, but one eventually reaches a point of diminishing return.
Most of the decoding procedures for decoding convolutional codes work by focussing on the
errors in the first frame. If this frame can be corrected and decoded, then the first frame of
information is known at the receiver end. The effect of these information symbols on the
subsequent information frames can be computed and subtracted from subsequent codeword
frames. Thus the problem of decoding the second codeword frame is the same as the problem
of decoding the first frame.
We extend this logic further. If the first l frames have been decoded successfully, the problem
of decoding the (l + 1)th frame is the same as the problem of decoding the first frame. But what
happens if a frame in-between was not decoded correctly? If it is possible for a single decoding
error event to induce an infinite number of additional errors, then the decoder is said to be
subject to Error Propagation. In the case where the decoding algorithm is responsible for
error propagation, it is termed as Ordinary Error Propagation. In the case where the poor
choice of catastrophic generator polynomials cause error propagation, we call it Catastrophic
Error Propagation.
Definition 6.10 The l-th minimum distance d_l* of a convolutional code is equal to
the smallest Hamming Distance between any two initial codeword segments l frames
long that are not identical in the initial frame. If l = m + 1, then this (m + 1)th minimum
distance is called the Minimum Distance of the code and is denoted by d*, where m
is the number of information frames that can be stored in the memory of the encoder.
In literature, the minimum distance is also denoted by d_min.
We note here that a convolutional code is a linear code. Therefore, one of the two codewords
used to determine the minimum distance of the code can be chosen to be the all zero codeword.
The l-th minimum distance is then equal to the weight of the smallest-weight codeword segment
l frames long that is non-zero in the first frame (i.e., different from the all zero frame).
Suppose the l-th minimum distance of a convolutional code is d_l*. The code can correct t errors
occurring in the first l frames provided
d_l* >= 2t + 1     (6.15)
Next, put l = m + 1, in which case d_l* = d*_(m+1) = d*. The code can correct t errors occurring in the
first blocklength n = (m + 1)n0 provided
d* >= 2t + 1     (6.16)
Example 6.6 Consider the convolutional encoder of Example 6.1 (Fig. 6.3). The Trellis Diagram
for the encoder is given in Fig. 6.12.
Fig. 6.12 The Trellis Diagram for the Convolutional Encoder of Example 6.1.
In this case d1* = 2, d2* = 3, d3* = 5, d4* = 5, ... We observe that d_l* = 5 for l >= 3. For this encoder,
m = 2. Therefore, the minimum distance of the code is d3* = d* = 5. This code can correct (d* - 1)/2
= 2 random errors that occur in one blocklength, n = (m + 1)n0 = 6.
Definition 6.11 The Free Distance of a convolutional code is given by
d_free = max_l [d_l*]     (6.17)
It follows that d*_(m+1) <= d*_(m+2) <= ... <= d_free.
The term d_free was first coined by Massey in 1969 to denote a type of distance that was found
to be an important parameter for the decoding techniques of convolutional codes. Since d_free
represents the minimum distance between arbitrarily long (possibly infinite) encoded
sequences, d_free is also denoted by d_infinity in literature. The parameter d_free can be directly calculated
from the trellis diagram. The free distance d_free is the minimum weight of a path that deviates
from the all zero path and later merges back into the all zero path at some point further down
the trellis, as depicted in Fig. 6.13. Searching for a code with large minimum distance and large
free distance is a tedious process, and is often done using a computer. Clever techniques have
been designed that reduce the effort by avoiding exhaustive searches. Most of the good
convolutional codes known today have been discovered by computer searches.
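For small encoders the search for d_free can be automated directly on the trellis. The sketch below (an illustration, assuming the tap connections of the Example 6.1 encoder used earlier) runs a shortest-path search for the minimum-weight path that leaves the all-zero state and merges back into it.

# Minimal sketch: d_free by a shortest-path search on the trellis.
import heapq

def step(state, bit):
    s1, s2 = state                               # contents of the shift register
    weight = (bit ^ s2) + (bit ^ s1 ^ s2)        # Hamming weight of the output frame
    return weight, (bit, s1)                     # branch weight, next state

def d_free():
    w0, s0 = step((0, 0), 1)                     # the path must leave with an input '1'
    heap, best = [(w0, s0)], {}
    while heap:
        w, state = heapq.heappop(heap)
        if state == (0, 0):                      # first return to the all-zero state
            return w
        if state in best and best[state] <= w:
            continue
        best[state] = w
        for bit in (0, 1):
            dw, nxt = step(state, bit)
            heapq.heappush(heap, (w + dw, nxt))

print(d_free())                                   # 5 for this encoder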
Definition 6.12 The free length n_free of a convolutional code is the length of the
non-zero segment of a smallest weight convolutional codeword of non-zero weight.
Thus, d_l = d_free if l = n_free, and d_l < d_free if l < n_free. In literature, n_free is also denoted by n_infinity.
Fig. 6.13 The Free Distance d_free Path.
Example 6.7 Consider the convolutional encoder of Example 6.1 (Fig. 6.3). For this encoder,
d_free = 5. There is usually more than one pair of paths that can be used to calculate d_free. The two
paths that have been used to calculate d_free are shown in Fig. 6.14 by double lines. In this example,
d_min = d_free.
Fig. 6.14 Calculating d_free in the Trellis Diagram.
The free length of the convolutional code is n_free = 6. In this example, n_free is equal to the
blocklength n of the code. In general it can be longer than the blocklength.
6.5 THE GENERATING FUNCTION
The performance of a convolutional code depends on its free distance, d_free. Since convolutional
codes are a sub-class of linear codes, the set of Hamming distances between coded sequences is
the same as the set of distances of the coded sequences from the all-zero sequence. Therefore,
we can consider the distance structure of the convolutional codes with respect to the all-zero
sequence without loss of generality. In this section, we shall study an elegant method of
determining the free distance, d_free, of a convolutional code.
To find d_free we need the set of paths that diverge from the all-zero path and merge back at a
later time. The brute force (and time consuming, not to mention, exasperating) method is to
determine the distances of all possible paths from the trellis diagram. Another way to find out
the d_free of a convolutional code is to use the concept of a generating function, whose expansion
provides all the distance information directly. The generating function can be understood by
the following example.
Example 6.8 Consider again the convolutional encoder of Example 6.1 (Fig. 6.3). The state
diagram of the encoder is given in Fig. 6.4. We now construct a modified state diagram as shown in
Fig. 6.15.
The branches of this modified state diagram are labelled by the branch gain D^i, i = 0, 1, 2, where
the exponent of D denotes the Hamming Weight of the branch. Note that the self loop at S0 has
been neglected as it does not contribute to the distance property of the code. Circulating around this
loop simply generates the all zero sequence. Note that S0 has been split into two states, initial and
final. Any path that diverges from state S0 and later merges back to S0 can be thought of
equivalently as traversing along the branches of this modified state diagram, starting from the
initial S0 and ending at the final S0. Hence this modified state diagram encompasses all possible
paths that diverge from and then later merge back to the all zero path in the trellis diagram.
Fig. 6.15 The Modified State Diagram of the Convolutional Encoder Shown in Fig. 6.3.
We can find the distance profile of the convolutional code using the state equations of this modified
state diagram. These state equations are
X1 = D^2 + X2,
X2 = D X1 + D X3,
X3 = D X1 + D X3,
T(D) = D^2 X2,     (6.18)
where the Xi are dummy variables. Upon solving these equations simultaneously, we obtain the
generating function
T(D) = D^5 / (1 - 2D)     (6.19)
Note that the expression for T(D) can also be (easily) obtained by Mason's Gain Formula,
which is well known to the students of Digital Signal Processing.
The following conclusions can be drawn from the series expansion of the generating function:
(i) There are an infinite number of possible paths that diverge from the all zero path and later
merge back again (this is also intuitive).
(ii) There is only one path with Hamming Distance 5, two paths with Hamming Distance 6,
and in general 2^k paths with Hamming Distance k + 5 from the all zero path.
(iii) The free Hamming Distance d_free for this code is 5. There is only one path corresponding to
d_free. Example 6.7 explicitly illustrates the pair of paths that result in d_free = 5.
We now introduce two new labels in the modified state diagram. To enumerate the length of
a given path, the label L is attached to each branch. So every time we traverse along a branch we
increment the path length counter. We also add the label I^i to each branch to enumerate the
Hamming Weight of the input bits associated with each branch of the modified state diagram
(see Fig. 6.16).
Fig. 6.16 The Modified State Diagram to Determine the Augmented Generating Function.
The state equations in this case would be
X1 = D^2 L I + L I X2,
X2 = D L X1 + D L X3,
X3 = D L I X1 + D L I X3,
and the Augmented Generating Function is
T(D, L, I) = D^2 L X2.
On solving these simultaneous equations, we obtain
T(D, L, I) = D^5 L^3 I / [1 - D L (L + 1) I]     (6.20)
           = D^5 L^3 I + D^6 L^4 (L + 1) I^2 + ... + D^(k+5) L^(3+k) (L + 1)^k I^(k+1) + ...     (6.21)
Further conclusions from the series expansion of the augmented generating function are:
(i) The path with the minimum Hamming Distance of 5 has length equal to 3.
(ii) The input sequence corresponding to this path has weight equal to 1.
(iii) There are two paths with Hamming Distance equal to 6 from the all zero path. Of these,
one has a path length of 4 and the other 5 (observe the power of L in the second term in
the summation). Both these paths have an input sequence weight of 2.
In the next section, we study the matrix description of convolutional codes which is a bit
more complicated than the matrix description of linear block codes.
6.6 MATRIX DESCRIPTION OF CONVOLUTIONAL CODES
A convolutional code consists of an infinite number of infinitely long
codewords and (visualize the trellis diagram) belongs to the class of linear codes. It
can be described by an infinite generator matrix. As can be expected, the matrix description of
convolutional codes is messier than that of the linear block codes.
Let the generator polynomials of a Convolutional Code be represented by
g_ij(D) = SUM_l g_ijl D^l     (6.22)
In order to obtain a generator matrix, the g_ijl coefficients are arranged in a matrix format. For
each l, let G_l be a k0 by n0 matrix,
G_l = [g_ijl]     (6.23)
Then, the generator matrix for the Convolutional Code that has been truncated to a block code
of blocklength n is
        [ G0   G1   G2   ...   Gm   ]
        [ 0    G0   G1   ...   Gm-1 ]
G(n) =  [ 0    0    G0   ...   Gm-2 ]     (6.24)
        [ ...                       ]
        [ 0    0    0    ...   G0   ]
where 0 is a k0 by n0 matrix of zeros and m is the length of the shift register used to generate the
code. The generator matrix for the Convolutional Code is given by
     [ G0   G1   G2   ...   Gm     0      0    0   ... ]
G =  [ 0    G0   G1   ...   Gm-1   Gm     0    0   ... ]     (6.25)
     [ 0    0    G0   ...   Gm-2   Gm-1   Gm   0   ... ]
     [ ...                                             ]
The matrix extends infinitely far down and to the right. For a systematic convolutional code, the
generator matrix can be written as
     [ I P0   0 P1   0 P2   ...   0 Pm                      ]
G =  [ 0 0    I P0   0 P1   ...   0 Pm-1   0 Pm             ]     (6.26)
     [ 0 0    0 0    I P0   ...   0 Pm-2   0 Pm-1   0 Pm    ]
     [ ...                                                  ]
where I is a k0 by k0 identity matrix, 0 is a k0 by k0 matrix of zeros and P0, P1, ..., Pm are k0 by
(n0 - k0) matrices. The parity check matrix can then be written as
     [ P0^T   -I                                                   ]
     [ P1^T   0    P0^T   -I                                       ]
     [ P2^T   0    P1^T   0    P0^T   -I                           ]
H =  [ ...                                                         ]     (6.27)
     [ Pm^T   0    Pm-1^T 0    Pm-2^T 0    ...   P0^T   -I         ]
     [               Pm^T  0    Pm-1^T 0   ...                     ]
     [                            Pm^T  0  ...                     ]
Example 6.9 Consider the convolutional encoder shown in Fig. 6.17. Let us first write the generator
polynomial matrix for this encoder. To do so, we just follow individual inputs to the outputs,
one by one, and count the delays. The generator polynomial matrix is obtained as
G(D) = [ D + D^2   D^2   D + D^2 ]
       [ D^2       D     D       ]
Fig. 6.17 A Rate 2/3 Convolutional Encoder.
The generator polynomials are g11(D) = D + D^2, g12(D) = D^2, g13(D) = D + D^2, g21(D) = D^2,
g22(D) = D and g23(D) = D.
To write out the matrix G0, we look at the constants (coefficients of D^0) in the generator
polynomials. Since there are no constant terms in any of the generator polynomials,
G0 = [ 0  0  0 ]
     [ 0  0  0 ]
Next, to write out the matrix G1, we look at the coefficients of D^1 in the generator polynomials.
The 1st row, 1st column entry of the matrix G1 corresponds to the coefficient of D^1 in g11(D). The
1st row, 2nd column entry corresponds to the coefficient of D^1 in g12(D), and so on. Thus,
G1 = [ 1  0  1 ]
     [ 0  1  1 ]
Similarly, we can write
G2 = [ 1  1  1 ]
     [ 1  0  0 ]
The generator matrix can now be written as
     [ 0 0 0   1 0 1   1 1 1   0 0 0   ... ]
     [ 0 0 0   0 1 1   1 0 0   0 0 0   ... ]
     [ 0 0 0   0 0 0   1 0 1   1 1 1   ... ]
G =  [ 0 0 0   0 0 0   0 1 1   1 0 0   ... ]
     [ 0 0 0   0 0 0   0 0 0   1 0 1   ... ]
     [ 0 0 0   0 0 0   0 0 0   0 1 1   ... ]
     [ ...                                 ]
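The construction of G from the matrices G_l can be mechanised. The following sketch (an illustration only) uses the generator polynomials listed above for the encoder of Fig. 6.17 and prints the generator matrix truncated to four frames.

# Minimal sketch of Eqs. (6.23)-(6.25) for the Example 6.9 encoder.
gpoly = [[[0, 1, 1], [0, 0, 1], [0, 1, 1]],      # g11 = D + D^2, g12 = D^2, g13 = D + D^2
         [[0, 0, 1], [0, 1, 0], [0, 1, 0]]]      # g21 = D^2,     g22 = D,   g23 = D

k0, n0 = len(gpoly), len(gpoly[0])
m = max(len(p) for row in gpoly for p in row) - 1

def G_l(l):                                       # Eq. (6.23): coefficients of D^l
    return [[row[j][l] if l < len(row[j]) else 0 for j in range(n0)] for row in gpoly]

def truncated_G(frames):                          # Eq. (6.24) truncated to `frames` frames
    zero = [[0] * n0 for _ in range(k0)]
    rows = []
    for f in range(frames):
        for i in range(k0):
            row = []
            for c in range(frames):
                block = G_l(c - f) if 0 <= c - f <= m else zero
                row.extend(block[i])
            rows.append(row)
    return rows

for r in truncated_G(4):
    print(r)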
Our next task is to look at an efficient decoding strategy for the convolutional codes. One of the
very popular decoding methods, the Viterbi Decoding Technique, is discussed in detail.
6.7 VITERBI DECODING OF CONVOLUTIONAL CODES
There are three important decoding techniques for convolutional codes: Sequential Decoding,
Threshold Decoding and the Viterbi Decoding. The Sequential Decoding technique was
proposed by Wozencraft in 1957. Sequential Decoding has the advantage that it can perform
very well with long-constraint-length convolutional codes, but it has a variable decoding time.
.Threshold Decoding, also known as Majority Logic Decoding, was proposed by Massey in
1963 in his doctoral thesis at MIT. Threshold decoders were the first commercially produced
decoders for convolutional codes. Viterbi Decoding was developed by Andrew J. Viterbi in
1967 and in the late 1970s became the dominant technique for convolutional codes.
Viterbi Decoding had the advantages of
(i) a highly satisfactory bit error performance,
(ii) high speed of operation,
(iii) ease of implementation,
(iv) low cost.
Threshold decoding lost its popularity specially because of its inferior bit error performance.
It is conceptually and practically closest to block decoding and it requires the calculation of a set
of syndromes, just as in the case of block codes. In this case, the syndrome is a sequence because
the information and check digits occur as sequences.
Viterbi Decoding has the advantage that it has a fixed decoding time. It is well suited to
hardware decoder implementation. But its computational requirements grow exponentially as a
function of the constraint length, so it is usually limited in practice to constraint lengths of
Convolutional Codes
v = 9 or less. As of early 2000, some leading companies claimed to have produced a
v = 9 Viterbi decoder that operates at rates up to 2 Mbps.
Since the time when Viterbi proposed his algorithm, other researchers have expanded on his
work by finding good convolutional codes, exploring the performance limits of the technique,
and varying decoder design parameters to optimize the implementation of the technique in
hardware and software. The Viterbi Decoding algorithm is also used in decoding Trellis Coded
Modulation (TCM), the technique used in telephone-line modems to squeeze high ratios of
bits-per-second to Hertz out of 3 kHz-bandwidth analog telephone lines. We shall see more of
TCM in the next chapter. For years, convolutional coding with Viterbi Decoding has been the
predominant FEC (Forward Error Correction) technique used in space communications,
particularly in geostationary satellite communication networks such as VSAT (very small
aperture terminal) networks. The most common variant used in VSAT networks is rate 1/2
convolutional coding using a code with a constraint length v = 7. With this code, one can
transmit binary or quaternary phase-shift-keyed (BPSK or QPSK) signals with at least 5 dB less
power than without coding. That is a reduction in Watts of more than a factor of three! This is
very useful in reducing transmitter and antenna cost or permitting increased data rates given the
same transmitter power and antenna sizes.
We will now consider how to decode convolutional codes using the Viterbi Decoding
algorithm. The nomenclature used here is that we have a message vector i from which the
encoder generates a code vector c that is sent across a discrete memoryless channel. The
received vector r may differ from the transmitted vector c (unless the channel is ideal or we are
very lucky!). The decoder is required to make an estimate of the message vector. Since there is
a one to one correspondence between code vector and message vector, the decoder makes an
estimate of the code vector.
Optimum decoding will result in a minimum probability of decoding error. Let p(r|c) be the
conditional probability of receiving r given that c was sent. We can state that the optimum
decoder is the maximum likelihood decoder with a decision rule to choose the code vector
estimate c for which the log-likelihood function ln p(r|c) is maximum.
If we consider a BSC where the vector elements of c and r are denoted by c_i and r_i, then we
have
p(r|c) = PROD_(i=1)^N p(r_i|c_i),     (6.28)
and hence, the log-likelihood function equals
ln p(r|c) = SUM_(i=1)^N ln p(r_i|c_i)     (6.29)
Let us assume
p(r_i|c_i) = p if r_i is not equal to c_i, and p(r_i|c_i) = 1 - p if r_i = c_i.     (6.30)
If we suppose that the received vector differs from the transmitted vector in exactly d
positions (the Hamming Distance between vectors c and r), we may rewrite the log-likelihood
function as
ln p(r|c) = d ln p + (N - d) ln(1 - p)
          = d ln [p/(1 - p)] + N ln(1 - p)     (6.31)
We can assume the probability of error p < 1/2 and we note that N ln(1 - p) is a constant for
all code vectors. Now we can make the statement that the maximum likelihood decoding rule for a
Binary Symmetric Channel is to choose the code vector estimate c that minimizes the Hamming Distance
between the received vector r and the transmitted vector c.
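Before turning to soft decisions, the hard-decision rule above can be put directly into code. The sketch below is an illustration (not the book's program) of the survivor-per-state search described in the remainder of this section, using the rate-1/2 encoder of Example 6.1 (with the tap connections assumed earlier) and the Hamming distance as the metric.

# Minimal sketch: hard-decision Viterbi decoding for the Example 6.1 encoder.
def branch(state, bit):
    s1, s2 = state
    return (bit ^ s2, bit ^ s1 ^ s2), (bit, s1)   # output frame, next state

def viterbi(received):                            # received: list of 2-bit frames
    INF = float("inf")
    metric = {(0, 0): 0, (0, 1): INF, (1, 0): INF, (1, 1): INF}
    paths = {s: [] for s in metric}
    for r in received:
        new_metric = {s: INF for s in metric}
        new_paths = {}
        for state in metric:
            if metric[state] == INF:
                continue
            for bit in (0, 1):
                out, nxt = branch(state, bit)
                d = (out[0] != r[0]) + (out[1] != r[1])      # branch metric
                if metric[state] + d < new_metric[nxt]:      # keep the survivor
                    new_metric[nxt] = metric[state] + d
                    new_paths[nxt] = paths[state] + [bit]
        metric, paths = new_metric, new_paths
    best = min(metric, key=metric.get)            # most likely final state
    return paths[best]

# The codeword of Example 6.2 with a single bit error in the second frame:
rx = [(1, 1), (1, 1), (1, 1), (1, 1), (1, 0), (1, 0), (0, 0)]
print(viterbi(rx))                                # recovers 1 0 0 1 1 0 1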
For Soft Decision Decoding in the Additive White Gaussian Noise (AWGN) channel with single
sided noise power N0, the likelihood function is given by
p(r|c) = PROD_(i=1)^N (1/sqrt(pi N0)) exp(-|r_i - c_i|^2 / N0)
       = (1/(pi N0))^(N/2) exp(-(1/N0) SUM_(i=1)^N |r_i - c_i|^2)     (6.32)
Thus the maximum likelihood decoding rule for the AWGN channel with Soft Decision
Decoding is to minimize the squared Euclidean Distance between r and c. This squared Euclidean
Distance is given by
d_E^2(r|c) = SUM_(i=1)^N |r_i - c_i|^2     (6.33)
Viterbi Decoding works by choosing that trial information sequence, the encoded version of
which is closest to the received sequence. Here, Hamming Distance will be used as a measure of
proximity between two sequences. The Viterbi Decoding procedure can be easily understood
by the following example.
Example 6.10 Consider the rate 113 convolutional encoder given in Fig. 6.18 and the corresponding
trellis diagram.
Fig. 6.18 A Rate 1/3 Convolutional Encoder and Its Trellis Diagram.
Suppose the transmitted sequence was an all zero sequence. Let the received sequence be
r= 010000100001 ...
Since it is a 1/3 rate encoder, we first segment the received sequence in groups of three
bits (because n0 = 3), i.e.,
r = 010 000 100 001 ...
The task at hand is to find out the most likely path through the trellis. Since a path must pass
through nodes in the trellis, we will try to find out which nodes in the trellis belong to the most
likely path. At any time, every node has two incoming branches. We simply determine which
of these two branches belongs to a more likely path (and discard the other). We make this decision
based on some metric (Hamming Distance). In this way we retain just one path per node and the
metric of that path. In this example, we retain only four paths as we progress with our decoding
(since we have only 4 states in our trellis).
Let us consider the first branch of the trellis which is labelled 000. We find the Hamming
distance between this branch and the first received framelength, 010. The Hamming distance
d(000, 010) = 1. Thus the metric for this first branch is 1, and is called the Branch Metric. Upon
reaching the top node from the starting node, this branch has accumulated a metric= 1. Next, we
compare the received framelength with the lower branch, which terminates at the second node from
the top. The Hamming Distance in this case is d (111, 010) = 2. Thus, the metric for this first
branch is 2. At each node we write the total metric accumulated by the path, called the Path
Metric. The path metrics are marked by circled numbers in the trellis diagram in Fig. 6.19. At the
subsequent stages of decoding, when two paths terminate at every node, we will retain the path with
the smaller value of the metric.
Fig. 6.19 The Path Metric after the 1st Step of Viterbi Decoding.
We now move to the next stage of the trellis. The Hamming Distances between the branches are
computed with respect to the second frame received, 000. The branch metrics for the two branches
emanating from the topmost node are 0 and 3. The branch metrics for the two branches emanating
from the second node are 2 and 1. The total path metric is marked by circled numbers in the trellis
diagram shown in Fig. 6.20.
Fig. 6.20 The Path Metric after the 2nd Step of Viterbi Decoding.
We now proceed to the next stage. We again compute the branch metrics and add them to the
respective path metrics to get the new path metrics. Consider the topmost node at this stage. Two
branches terminate at this node. The path coming from node 1 of the previous stage has a path
metric of 2, while the other incoming path has a path metric of 6. The path
with the lower metric is retained and the other discarded. The trellis diagram shown in Fig. 6.21
gives the surviving paths (double lines) and the path metrics (circled numbers). Viterbi called
these surviving paths Survivors. It is interesting to note that node 4 receives two paths with
equal path metrics. We have arbitrarily chosen one of them as the surviving path (by tossing a fair
coin!).
Fig. 6.21 The Path Metric after the 3rd Step of Viterbi Decoding.
We continue this procedure for Viterbi decoding for the next stage. The final branch metrics
and path metrics are shown in Fig. 6.22. At the end we pick the path with the minimum metric.
This path corresponds to the all zero path. Thus the decoding procedure has been able to
correctly decode the received vector.
Fig. 6.22 The Path Metric after the 4th Step of Viterbi Decoding.
The minimum distance for this code is d* = 6. The number of errors that it can correct per frame
length is equal to
    t = ⌊(d* − 1)/2⌋ = ⌊(6 − 1)/2⌋ = 2.
In this example, the maximum number of errors per framelength was 1.
Consider the set of surviving paths at the rth frame time. If all the surviving paths cross
through the same nodes, then a decision regarding the most likely path transmitted can be made
up to the point where the nodes are common. To build a practical Viterbi Decoder, one must
choose a decoding window width w, which is usually several times as big as the blocklength. At
a given frame time, f, the decoder examines all the surviving paths to see if they agree in the first
branch. This branch defines a decoded information frame and is passed out of the decoder. In
the previous example of Viterbi Decoding, we see that by the time the decoder reaches the 4th
frame, all the surviving paths agree in their first decoded branch (called a well-defined decision).
The decoder drops the first branch (after delivering the decoded frame) and takes in a new
frame of the received word for the next iteration. If again, all the surviving paths pass through
the same node of the oldest surviving frame, then this information frame is decoded. The
process continues in this way indefinitely.
If a long enough decoding window w is chosen, then a well-defined decision can be reached
almost always. A well designed code will lead to correct decoding with a high probability. Note
that a well designed code carries meaning only in the context of a particular channel. The
random errors induced by the channel should be within the error correcting capability of the
code. The Viterbi decoder can be visualized as a sliding window through which the trellis is
viewed (see Fig. 6.23). The window slides to the right as new frames are processed. The
surviving paths are marked on the portion of the trellis which is visible through the window. As
the window slides, new nodes appear on the right, and some of the surviving paths are extended
to these new nodes while the other paths disappear.
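The decoding procedure described above can be captured in a few lines of code. The sketch below (our illustration, not a listing from the book) implements a generic hard-decision Viterbi decoder: the trellis is supplied as two tables, next_state[s][u] and output[s][u], which would be derived from the encoder's generator polynomials; one survivor is kept per state and the decoded bits are read off the best survivor at the end (rather than through a sliding window, for simplicity).

```python
def viterbi_decode(received_frames, next_state, output, n_states):
    """Hard-decision Viterbi decoding with Hamming branch metrics.

    received_frames : list of received frames, each a list of n0 bits
    next_state[s][u]: state reached from state s on input bit u
    output[s][u]    : the n0-bit branch label for that transition
    """
    INF = float("inf")
    path_metric = [0.0] + [INF] * (n_states - 1)     # encoder starts in the all-zero state
    survivors = [[] for _ in range(n_states)]        # input bits along each surviving path

    for frame in received_frames:
        new_metric = [INF] * n_states
        new_survivors = [None] * n_states
        for s in range(n_states):
            if path_metric[s] == INF:
                continue
            for u in (0, 1):
                ns = next_state[s][u]
                branch = output[s][u]
                bm = sum(r != c for r, c in zip(frame, branch))   # Hamming branch metric
                if path_metric[s] + bm < new_metric[ns]:          # keep the better path
                    new_metric[ns] = path_metric[s] + bm
                    new_survivors[ns] = survivors[s] + [u]
        path_metric, survivors = new_metric, new_survivors

    best = min(range(n_states), key=lambda s: path_metric[s])     # minimum-metric survivor
    return survivors[best]
```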
Fig. 6.23 The Viterbi Decoder as a Sliding Window through which the Trellis is Viewed.
If the surviving paths do not go through the same node, we label it a Decoding Failure. The
decoder can break the deadlock using any arbitrary rule. To this limited extent, the decoder
becomes an incomplete decoder. Let us revert back to the previous example. At the 4th stage,
the surviving paths could as well be chosen as shown in Fig. 6.24, which will render the decoder
as an incomplete decoder.
CD CD 0
0
8
• •
0
0· ®
States
Time axis
Fig. 6.24 Example of an Incomplete Decoder in Viterbi Decoding Process.
It is possible that in some cases the decoder reaches a well-defined decision, but a wrong one!
If this happens, the decoder has no way of knowing that it has taken a wrong decision. Based on
this wrong decision, the decoder will take more wrong decisions. However, if the code is non-
catastrophic, the decoder will recover from the errors.
The next section deals with some Distance Bounds for convolutional codes. These bounds
will help us compare different convolutional coding schemes.
6.8 DISTANCE BOUNDS FOR CONVOLUTIONAL CODES
Upper bounds can be computed on the minimum distance of a convolutional code that has a
rate R = k₀/n₀ and a constraint length v. These bounds are similar in nature and derivation
to those for block codes, with block length corresponding to constraint length. However, as we
shall see, the bounds are not very tight. These bounds just give us a rough idea of how good the
code is. Here we present the bounds (without proof) for binary codes.
For rate R and constraint length v, let d be the largest integer that satisfies

    H(d/(n₀v)) ≤ 1 − R                                                 (6.34)

Then at least one binary convolutional code exists with minimum distance d for which the
above inequality holds. Here H(x) is the familiar entropy function for a binary alphabet

    H(x) = − x log₂ x − (1 − x) log₂ (1 − x),   0 ≤ x ≤ 1
For a binary code with R = 1/n₀ the minimum distance dmin satisfies

    dmin ≤ ⌊(n₀v + n₀)/2⌋                                              (6.35)

where ⌊x⌋ denotes the largest integer less than or equal to x.

An upper bound on dfree is given by (Heller, 1968)

    dfree = min_{j≥1} ⌊ (n₀/2) (2^j/(2^j − 1)) (v + j − 1) ⌋           (6.36)

To calculate the upper bound, the right-hand side should be plotted for different integer
values of j. The upper bound is the minimum of this plot.
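The Heller bound of (6.36) is easy to evaluate numerically. The short sketch below (ours, not from the book) computes it by minimising d(j) over the first few integer values of j; since the bracketed term grows with j once j is moderately large, scanning j up to about 20 is more than enough.

```python
import math

def heller_bound(n0, v, j_max=20):
    """Upper bound on d_free from eq. (6.36): min over j >= 1 of
    floor((n0/2) * (2**j / (2**j - 1)) * (v + j - 1))."""
    best = math.inf
    for j in range(1, j_max + 1):
        d_j = math.floor(n0 * 2 ** (j - 1) / (2 ** j - 1) * (v + j - 1))
        best = min(best, d_j)
    return best

print(heller_bound(2, 4))   # -> 6, matching the (15, 17) entry of Table 6.2
print(heller_bound(3, 3))   # -> 8, matching the bound found in Example 6.10
```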
Example 6.10 Let us apply the distance bounds to the convolutional encoder given in Example
6.1. We will first apply the bound given by (6.34). For this encoder, k₀ = 1, n₀ = 2, R = 1/2 and v = 2.

    H(d/(n₀v)) = H(d/4) ≤ 1 − R = 1/2  ⟹  H(d/4) ≤ 0.5
But we have,
    H(0.11) = − 0.11 log₂ 0.11 − (1 − 0.11) log₂ (1 − 0.11) = 0.4999, and
    H(0.12) = − 0.12 log₂ 0.12 − (1 − 0.12) log₂ (1 − 0.12) = 0.5294.
Therefore, d/4 ≤ 0.11, or d ≤ 0.44.
The largest integer d that satisfies this bound is d = 0. This implies that at least one
binary convolutional code exists with minimum distance d = 0. This statement does not
say much (i.e., the bound is not strict enough for this encoder)!
Next, consider the encoder shown in Fig. 6.25.

Fig. 6.25 Convolutional Encoder for Example 6.10.

For this encoder,
    G(D) = [1    D + D²    D + D² + D³],
    k₀ = 1, n₀ = 3, R = 1/3 and v = 3.

    H(d/(n₀v)) = H(d/9) ≤ 1 − R = 2/3  ⟹  H(d/9) ≤ 0.6666

But we have,
    H(0.17) = − 0.17 log₂ 0.17 − (1 − 0.17) log₂ (1 − 0.17) = 0.6577, and
    H(0.18) = − 0.18 log₂ 0.18 − (1 − 0.18) log₂ (1 − 0.18) = 0.6801.
Therefore, d/9 ≤ 0.17, or d ≤ 1.53.
The largest integer d that satisfies this bound is d = 1. Then at least one binary
convolutional code exists with minimum distance d = 1. This is a very loose bound.
Let us now evaluate the second bound, given by (6.35).

    dmin ≤ ⌊(n₀v + n₀)/2⌋ = ⌊(9 + 3)/2⌋ = 6

This gives us dmin ≤ 6, which is a good upper bound as seen from the trellis diagram
for the encoder (Fig. 6.26). Since n₀ = 3, every branch in the trellis is labelled by 3
bits. The two paths that have been used to calculate dmin are shown in Fig. 6.26 by
double lines. In this example, dmin = dfree = 6.
Fig. 6.26 The Trellis Diagram for the Convolutional Encoder Given in Fig. 6.25.
Next, we determine the Heller Bound on dfree, as given by (6.36). The plot of
the function

    d(j) = ⌊ (n₀/2) (2^j/(2^j − 1)) (v + j − 1) ⌋

for different integer values of j is given in Fig. 6.27.

Fig. 6.27 The Heller Bound Plot.

From Fig. 6.27, we see that the upper bound on the free distance of the code is dfree
≤ 8. This is a good upper bound. The actual value of dfree = 6.
6.9 PERFORMANCE BOUNDS
One of the useful performance criteria for convolutional codes is the bit error probability P_b.
The bit error probability, or the bit error rate (a misnomer!), is defined as the expected number
of decoded information bit errors per information bit. Instead of obtaining an exact expression
for P_b, typically an upper bound on the error probability is calculated. We will first determine
the First Event Error Probability, which is the probability of error for sequences that merge
with the all zero (correct) path for the first time at a given node in the trellis diagram.
Since convolutional codes are linear, let us assume that the all zero codeword is transmitted.
An error will be made by the decoder if it chooses the incorrect path c′ instead of the all zero
path. Let c′ differ from the all zero path in d bits. Therefore, a wrong decision will be made by
the maximum likelihood decoder if more than ⌊d/2⌋ errors occur, where ⌊x⌋ is the largest integer
less than or equal to x. If the channel transition probability is p, then the probability of error can
be upper bounded as follows:

    P_d ≤ Σ_{k=⌊d/2⌋+1}^{d} C(d, k) p^k (1 − p)^{d−k}                  (6.37)
Now, there would be many paths with different distances that merge with the correct path at
a given time for the first time. The upper bound on the first event error probability can be
obtained by summing the error probabilities of all such possible paths:

    P_e ≤ Σ_{d=dfree}^{∞} a_d P_d                                      (6.38)

where a_d is the number of codewords at Hamming Distance d from the all zero codeword.
Comparing (6.19) and (6.38) we obtain

    P_e ≤ T(D)|_{D = 2√(p(1−p))}                                       (6.39)
The bit error probability, P_b, can now be determined as follows. P_b can be upper bounded by
weighting each pairwise error probability P_d in (6.37) by the number of incorrectly decoded
information bits for the corresponding incorrect path, n_d. For a rate k/n encoder, the average P_b is

    P_b ≤ (1/k) Σ_{d=dfree}^{∞} n_d P_d                                (6.40)

It can be shown that

    Σ_{d=dfree}^{∞} n_d D^d = ∂T(D, I)/∂I |_{I=1}                      (6.41)

Thus,

    P_b ≤ (1/k) ∂T(D, I)/∂I |_{I=1, D=2√(p(1−p))}                      (6.42)
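The union bounds above are straightforward to evaluate once the distance spectrum of the code is known. The sketch below (ours, not from the book) computes the pairwise error probability in the binomial form used in (6.37) and the first-event error bound of (6.38); the distance spectrum passed in is purely hypothetical, just to show the call.

```python
from math import comb

def pairwise_error_prob(d, p):
    """P_d: probability of more than floor(d/2) channel errors on a BSC, cf. eq. (6.37)."""
    t = d // 2
    return sum(comb(d, k) * p ** k * (1 - p) ** (d - k) for k in range(t + 1, d + 1))

def first_event_bound(spectrum, p):
    """Union bound of eq. (6.38); spectrum maps distance d -> a_d."""
    return sum(a_d * pairwise_error_prob(d, p) for d, a_d in spectrum.items())

# Hypothetical distance spectrum {d: a_d} of a small code, for illustration only.
print(first_event_bound({5: 1, 6: 2, 7: 4}, p=0.01))
```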
6.10 KNOWN GOOD CONVOLUTIONAL CODES
In this section, we shall look at some known good convolutional codes. So far, only a few
constructive classes of convolutional codes have been reported. There exists no class with an
algebraic structure comparable to the t-error correcting BCH codes. No constructive methods
exist for finding convolutional codes of long constraint length. Most of the codes presented here
have been found by computer searches.
Initial work on short convolutional codes with maximal free distance was reported by
Odenwalder (1970) and Larsen (1973). A few of the codes are listed in Tables 6.2, 6.3 and 6.4 for
code rates 1/2, 1/3 and 1/4 respectively. The generators are given in octal notation as the
coefficient sequences g₀^(i), g₁^(i), ..., where

    g_i(D) = g₀^(i) + g₁^(i) D + g₂^(i) D² + ...                       (6.43)
For example, the octal notation for the generators of the R = 1/2, v = 4 encoder is 15 and 17
(see Table 6.2). The octal 15 can be deciphered as 15 = 1-5 = 1-101. Therefore,
    g₁(D) = 1 + (1)D + (0)D² + (1)D³ = 1 + D + D³.
Similarly,
    17 = 1-7 = 1-111.
Therefore,
    g₂(D) = 1 + (1)D + (1)D² + (1)D³ = 1 + D + D² + D³, and
    G(D) = [1 + D + D³    1 + D + D² + D³].
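The octal-to-polynomial conversion illustrated above is mechanical, so a tiny helper (ours, not from the book) can expand any entry of Tables 6.2 to 6.4; the leftmost surviving bit is taken as the coefficient of D⁰, matching the convention of the worked example.

```python
def octal_generator_to_taps(octal_str):
    """Expand an octal generator (e.g. '15') into its binary tap list [1, 1, 0, 1],
    read as the coefficients of D^0, D^1, D^2, ... in that order."""
    bits = []
    for digit in octal_str:
        bits.extend(int(b) for b in format(int(digit, 8), "03b"))
    while bits and bits[0] == 0:      # drop padding zeros in front of the first tap
        bits.pop(0)
    return bits

print(octal_generator_to_taps("15"))   # [1, 1, 0, 1]     -> 1 + D + D^3
print(octal_generator_to_taps("17"))   # [1, 1, 1, 1]     -> 1 + D + D^2 + D^3
```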
Table 6.2 Rate 1/2 codes with maximum free distance

Non-Catastrophic
    v    n    Generators (octal)    dfree    Heller Bound
    3    6    5, 7                  5        5
    4    8    15, 17                6        6
    5    10   23, 35                7        8
    6    12   53, 75                8        8
    7    14   133, 171              10       10

Catastrophic
    v    n    Generators (octal)    dfree    Heller Bound
    5    10   27, 35                8        8
    12   24   5237, 6731            16       16
    14   28   21645, 37133          17       17
Table 6.3 Rate 1/3 codes with maximum free distance

    v    n    Generators (octal)    dfree    Heller Bound
    3    9    5, 7, 7               8        8
    4    12   13, 15, 17            10       10
    5    15   25, 33, 37            12       12
    6    18   47, 53, 75            13       13
    7    21   133, 145, 175         15       15
Table 6.4 Rate 1/4 codes with maximum free distance

    v    n    Generators (octal)    dfree    Heller Bound
    3    12   5, 7, 7, 7            10       10
    4    16   13, 15, 15, 17        13       13
    5    20   25, 27, 33, 37        16       16
    6    24   53, 67, 71, 75        18       18
    7    28   135, 135, 147, 163    20       20
Next, we study an interesting class of codes, called Turbo Codes, which lie somewhere between
linear block codes and convolutional codes.
6.11 TURBO CODES
Turbo Codes were introduced in 1993 at the International Conference on Communications (ICC) by
Berrou, Glavieux and Thitimajshima in their paper "Near Shannon Limit Error Correction
Coding and Decoding: Turbo-Codes". In this paper, they quoted a BER performance of 10⁻⁵ at
an E_b/N₀ of 0.7 dB using only a rate 1/2 code, generating tremendous interest in the field. Turbo
Codes perform well in the low SNR scenario. At high SNRs, some of the traditional codes like
the Reed-Solomon Code have comparable or better performance than Turbo Codes.
Even though Turbo Codes are considered as Block Codes, they do not exactly work like
block codes. Turbo Codes are actually a quasi mix between Block and Convolutional Codes.
They require, like a block code, that the whole block be present before encoding can begin.
However, rather than computing parity bits from a system of equations, they use shift registers
just like Convolutional Codes.
Turbo Codes typically use at least two convolutional component encoders and two maximum
a posteriori (MAP) algorithm component decoders. This is known as
concatenation. Three different arrangements of turbo codes are Parallel Concatenated
Convolutional Codes (PCCC), Serial Concatenated Convolutional Codes (SCCC), and
Hybrid Concatenated Convolutional Codes (HCCC). Typically, Turbo Codes are arranged
like the PCCC. An example of a PCCC Turbo encoder given in Fig. 6.28 shows that two
encoders run in parallel.
Fig. 6.28 Block Diagram of a Rate 1/3, PCCC Turbo Encoder.
One reason for the better performance of Turbo codes is that they produce high weight code
words. For example, if the input sequence (U_k) is originally low weight, the systematic (X_k) and
parity 1 (Y1_k) outputs may produce a low weight codeword. However, the parity 2 output (Y2_k)
is less likely to be a low weight codeword due to the interleaver in front of it. The interleaver
shuffles the input sequence, Uk, in such a way that when introduced to the second encoder, it is
more likely to produce a high weight codeword. This is ideal for the code because high weight
codewords result in better decoder performance. Intuitively, when one of the encoders produces
a 'weak' codeword, the other encoder has a low probability of producing another 'weak'
codeword because of the interleaver. The concatenated version of the two codewords is,
therefore, a 'strong' codeword. Here, the expression 'weak' is used as a measure of the average
Hamming Distance of a codeword from all other codewords.
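The parallel structure just described is small enough to sketch in code. The following is our illustration of a rate-1/3 PCCC encoder built from two identical recursive systematic convolutional (RSC) component encoders and a pseudo-random interleaver; the particular feedback and feedforward taps (the classic memory-2 pair, octal 7 and 5) are an assumption for the example, not taken from Fig. 6.28.

```python
import random

def rsc_parity(bits, taps_fb=(1, 1, 1), taps_out=(1, 0, 1)):
    """Parity sequence of a recursive systematic convolutional encoder.
    taps_fb  : feedback polynomial coefficients (1 + D + D^2 here)
    taps_out : feedforward polynomial coefficients (1 + D^2 here)"""
    m = len(taps_fb) - 1
    state = [0] * m
    parity = []
    for u in bits:
        fb = u
        for t, s in zip(taps_fb[1:], state):      # feedback bit
            fb ^= t & s
        p = taps_out[0] & fb
        for t, s in zip(taps_out[1:], state):     # feedforward (parity) bit
            p ^= t & s
        parity.append(p)
        state = [fb] + state[:-1]                 # shift the register
    return parity

def pccc_encode(bits, seed=0):
    """Rate-1/3 PCCC: systematic bit X_k, parity Y1_k from encoder 1, and parity
    Y2_k from encoder 2, which sees an interleaved copy of the input."""
    perm = list(range(len(bits)))
    random.Random(seed).shuffle(perm)             # pseudo-random interleaver
    y1 = rsc_parity(bits)
    y2 = rsc_parity([bits[i] for i in perm])
    return list(zip(bits, y1, y2))                # (X_k, Y1_k, Y2_k)

print(pccc_encode([1, 0, 1, 1, 0, 0, 1, 0]))
```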
Although the encoder determines the capability for error correction, it is the decoder that
determines the actual performance. The performance, however, depends upon which algorithm
is used. Since Turbo Decoding is an iterative process, it requires a soft output algorithm like the
maximum a-posteriori algorithm (MAP) or the Soft Output Viterbi Algorithm (SOVA) for
decoding. Soft output algorithms out-perform hard decision algorithms because they have
available a better estimate of what the sent data actually was. This is because soft output yields
a gradient of information about the computed information bit rather than just choosing a 1 or 0
as hard output does. A typical Turbo Decoder is shown in Fig. 6.29.
The MAP algorithm is often used to estimate the most likely information bit to have been
transmitted in a coded sequence. The MAP algorithm is favoured because it outperforms other
algorithms, such as the SOVA, under low SNR conditions. The major drawback, however, is
that it is more complex than most algorithms because of its focus on each individual bit of
information. Research in the area (in late 1990s) has resulted in great simplification of the MAP
algorithm.
Fig. 6.29 Block Diagram of a Turbo Decoder.
A Turbo Decoder generally uses the MAP algorithm in at least one of its component
decoders. The decoding process begins by receiving partial information from the channel (X_k
and Y1_k) and passing it to the first decoder. The rest of the information, parity 2 (Y2_k), goes to the
second decoder and waits for the rest of the information to catch up. While the second decoder
is waiting, the first decoder makes an estimate of the transmitted information, interleaves it to
match the format of parity 2, and sends it to the second decoder. The second decoder takes
information from both the first decoder and the channel and re-estimates the information. This
second estimation is looped back to the first decoder, where the process starts again. The iterative
process of the Turbo Decoder is illustrated in Fig. 6.30.
Fig. 6.30 Iterative Decoding of Turbo Code.
This cycle will continue until certain conditions are met, such as a certain number of iterations
being performed. It is from this iterative process that Turbo Coding gets its name. The decoder
circulates estimates of the sent data like a turbo engine circulates air. When the decoder is
ready, the estimated information is finally kicked out of the cycle and hard decisions are made
in the threshold component. The result is the decoded information sequence.
In the following section, we study two decoding methods for the Turbo Codes, in detail.
6.12 TURBO DECODING
We have seen that the Viterbi Algorithm is used for the decoding of convolutional codes. The
Viterbi Algorithm performs a systematic elimination of the paths in the trellis. However, such
luck does not exist for the Turbo Decoder. The presence of the interleaver complicates the matter
immensely. Before the discovery of Turbo Codes, a lot of work was being done in the area of
suboptimal decoding strategies for concatenated codes, involving multiple decoders. The
symbol-by-symbol maximum a posteriori (MAP) algorithm of Bahl, Cocke, Jelinek and Raviv,
published in the IEEE Transactions on Information Theory in March 1974 also received some
attention. It was this algorithm, which was used by Berrou et al. in the iterative decoding of their
Turbo Codes. In this section, we shall discuss two methods useful for Turbo Decoding:
(A) The modified Bahl, Cocke, Jelinek and Raviv (BCJR) Algorithm.
(B) The Iterative MAP Decoding.
A. MODIFIED BAHL, COCKE, JELINEK AND RAVIV (BCJR) ALGORITHM
The modified BCJR Decoding Algorithm is a symbol-by-symbol decoder. The decoder decides
u_k = +1 if

    P(u_k = +1 | y) > P(u_k = −1 | y),                                 (6.44)

and it decides u_k = −1 otherwise, where y = (y₁, y₂, ..., y_N) is the noisy received word.
More succinctly, the decision u_k is given by

    u_k = sign [L(u_k)]                                                (6.45)

where L(u_k) is the Log A Posteriori Probability (LAPP) Ratio defined as

    L(u_k) = log [ P(u_k = +1 | y) / P(u_k = −1 | y) ]                 (6.46)
Incorporating the code's trellis, this may be written as

    L(u_k) = log [ (Σ_{S⁺} p(s_{k−1} = s′, s_k = s, y)/p(y)) / (Σ_{S⁻} p(s_{k−1} = s′, s_k = s, y)/p(y)) ]     (6.47)

where s_k ∈ S is the state of the encoder at time k, S⁺ is the set of ordered pairs (s′, s)
corresponding to all state transitions (s_{k−1} = s′) to (s_k = s) caused by data input u_k = +1, and
S⁻ is similarly defined for u_k = −1.
Let us define

    γ_k(s′, s) = p(s_k = s, y_k | s_{k−1} = s′)                        (6.48)

    α_k(s) = Σ_{s′} α_{k−1}(s′) γ_k(s′, s) / Σ_s Σ_{s′} α_{k−1}(s′) γ_k(s′, s)      (6.49)

and

    β_{k−1}(s′) = Σ_s β_k(s) γ_k(s′, s) / Σ_s Σ_{s′} α_{k−1}(s′) γ_k(s′, s)         (6.50)

with boundary conditions

    α₀(0) = 1 and α₀(s ≠ 0) = 0,
    β_N(0) = 1 and β_N(s ≠ 0) = 0.                                     (6.51)

Then the modified BCJR Algorithm gives the LAPP Ratio in the following form

    L(u_k) = log [ Σ_{S⁺} α_{k−1}(s′) γ_k(s′, s) β_k(s) / Σ_{S⁻} α_{k−1}(s′) γ_k(s′, s) β_k(s) ]     (6.52)

B. ITERATIVE MAP DECODING
The decoder is shown in Fig. 6.31. D1 and D2 are the two decoders. S is the set of 2^m constituent
encoder states. y is the noisy received word. Using Bayes' rule we can write L(u_k) as

    L(u_k) = log [ p(y | u_k = +1) / p(y | u_k = −1) ] + log [ P(u_k = +1) / P(u_k = −1) ]     (6.53)

with the second term representing a priori information. Since P(u_k = +1) = P(u_k = −1) typically,
the a priori term is usually zero for conventional decoders. However, for iterative decoders, D1
receives extrinsic or soft information for each u_k from D2, which serves as a priori information.
Similarly, D2 receives extrinsic information from D1, and the decoding iteration proceeds with
each of the two decoders passing soft information along to the other decoder at each half-
iteration except for the first. The idea behind extrinsic information is that D2 provides soft
information to D1 for each u_k using only information not available to D1. D1 does likewise for
D2.
Fig. 6.31 Implementation of the Iterative Turbo Decoder.
At any given iteration, D1 computes

    L₁(u_k) = L_c y_k^s + L₂₁(u_k) + L₁₂(u_k)                          (6.54)

where the first term is the channel value, L_c = 4E_c/N₀ (E_c = energy per channel bit), L₂₁(u_k) is
the extrinsic information passed from D2 to D1, and L₁₂(u_k) is the extrinsic information from D1
to D2, where

    γ_k^e(s′, s) = exp [ (1/2) L_c y_k^p x_k^p ]                       (6.55)

    γ_k(s′, s) = exp [ (1/2) u_k (L^e(u_k) + L_c y_k^s) ] γ_k^e(s′, s)  (6.56)

    α_k(s) = Σ_{s′} α_{k−1}(s′) γ_k(s′, s) / Σ_s Σ_{s′} α_{k−1}(s′) γ_k(s′, s)       (6.57)

and

    β_{k−1}(s′) = Σ_s β_k(s) γ_k(s′, s) / Σ_s Σ_{s′} α_{k−1}(s′) γ_k(s′, s)          (6.58)
For the above algorithm, each decoder must have full knowledge of the trellis of the
constituent encoders, i.e. each decoder must have a table containing the input bits and parity
bits for all possible state transitions s′ to s. Also, care should be taken that the last m bits of the
N-bit information word to be encoded must force encoder 1 to the zero state by the Nth bit.
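The data flow of the iterative decoder of Fig. 6.31 can be summarised with a short sketch. The code below is our structural illustration only: siso_decode is a stand-in that merely returns values of the right shape, not an implementation of the BCJR/MAP recursions of (6.55)-(6.58); what the sketch does show correctly is how extrinsic information is extracted by subtracting the inputs from the a posteriori output, interleaved, and handed to the other decoder at each half-iteration.

```python
def siso_decode(llr_sys, llr_par, llr_apriori):
    # Stand-in soft-in soft-out decoder: NOT the MAP algorithm, only a placeholder
    # so that the iteration loop below is runnable end to end.
    return [s + a + p for s, a, p in zip(llr_sys, llr_apriori, llr_par)]

def turbo_decode(llr_sys, llr_par1, llr_par2, perm, n_iter=8):
    """llr_sys, llr_par1, llr_par2: channel LLRs for X_k, Y1_k, Y2_k;
    perm: the interleaver permutation used by the encoder."""
    n = len(llr_sys)
    inv = [0] * n
    for i, j in enumerate(perm):
        inv[j] = i                                       # de-interleaver
    llr_sys_i = [llr_sys[j] for j in perm]               # interleaved systematic LLRs
    ext21 = [0.0] * n                                    # extrinsic information D2 -> D1
    for _ in range(n_iter):
        apost1 = siso_decode(llr_sys, llr_par1, ext21)
        ext12 = [o - s - a for o, s, a in zip(apost1, llr_sys, ext21)]   # strip inputs
        apost2 = siso_decode(llr_sys_i, llr_par2, [ext12[j] for j in perm])
        ext21_i = [o - s - a for o, s, a in
                   zip(apost2, llr_sys_i, [ext12[j] for j in perm])]
        ext21 = [ext21_i[inv[k]] for k in range(n)]      # back to D1's bit ordering
    return [1 if l > 0 else 0 for l in apost1]           # hard decisions, cf. eq. (6.45)
```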
The complexity of convolutional codes has slowed the development of low-cost Turbo
Convolutional Codes (TCC) decoders. On the other hand, another type of turbo code, known
as Turbo Product Code (TPC), uses block codes, solving multiple steps simultaneously,
thereby achieving high data throughput in hardware. We give here a brief introduction to
product codes. Let us consider two systematic linear block codes, C₁ with parameters (n₁, k₁, d₁)
and C₂ with parameters (n₂, k₂, d₂), where n_i, k_i and d_i (i = 1, 2) stand for codeword length,
number of information bits and minimum Hamming Distance respectively. The concatenation
of two block codes (or product code) P = C₁ * C₂ is obtained (see Fig. 6.32) by the following
steps:
Fig. 6.32 Example of a Product Code P = C₁ * C₂.
(i) placing (k₁ × k₂) information bits in an array of k₁ rows and k₂ columns,
(ii) coding the k₁ rows using code C₂,
(iii) coding the n₂ columns using code C₁.
The parameters of the product code P are: n = n₁ * n₂, k = k₁ * k₂, d = d₁ * d₂, and the code
rate R is given by R₁ * R₂, where R_i is the code rate of code C_i. Thus, we can build very long block
codes with large minimum Hamming Distance by combining short codes with small minimum
Hamming Distance. Given the procedure used to construct the product code, it is clear that the
(n₂ − k₂) last columns of the matrix are codewords of C₁. By using the generator matrix, one can
show that the (n₁ − k₁) last rows of matrix P are codewords of C₂. Hence all the rows of matrix P are
codewords of C₂ and all the columns of matrix P are codewords of C₁.
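The construction in steps (i)-(iii) is easy to reproduce in a few lines. The sketch below (ours, not from the text) builds a small product code using single-parity-check codes as C₁ and C₂ purely for illustration; any pair of systematic linear block codes could be substituted.

```python
import numpy as np

def spc_encode_rows(info):
    """Append one even-parity bit to every row (a toy systematic block code)."""
    parity = info.sum(axis=1, keepdims=True) % 2
    return np.hstack([info, parity])

def product_encode(info_bits, k1, k2):
    """Steps (i)-(iii): arrange k1*k2 bits in a k1-by-k2 array, encode the rows
    with C2, then encode the resulting columns with C1."""
    arr = np.array(info_bits, dtype=int).reshape(k1, k2)
    rows_coded = spc_encode_rows(arr)             # k1 x n2
    cols_coded = spc_encode_rows(rows_coded.T).T  # n1 x n2, includes checks on checks
    return cols_coded

P = product_encode([1, 0, 1, 1, 0, 0], k1=2, k2=3)
print(P)   # a 3 x 4 array: n = 12, k = 6, and d = d1 * d2 = 4 for these SPC codes
```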
Let us now consider the decoding of the rows and columns of a product code P transmitted
on a Gaussian Channel using QPSK signalling. On receiving the matrix R corresponding to a
transmitted codeword E, the first decoder performs the soft decoding of the rows (or columns)
of P using as input the matrix R. Soft Input / Soft Output decoding is performed using the new
algorithm proposed by R. Pyndiah. By subtracting the soft input from the soft output we obtain
the extrinsic information W(2), where index 2 indicates that we are considering the extrinsic
information for the second decoding of P, which was computed during the first decoding of P.
The soft input for the decoding of the columns (or rows) at the second decoding of P is given by

    R(2) = R + a(2)W(2),                                               (6.61)

where a(2) is a scaling factor which takes into account the fact that the standard deviation of
samples in matrix R and in matrix W are different. The standard deviation of the extrinsic
information is very high in the first decoding steps and decreases as we iterate the decoding.
This scaling factor a is also used to reduce the effect of the extrinsic information in the soft
decoder in the first decoding steps when the BER is relatively high. It takes a small value in the
first decoding steps and increases as the BER tends to 0. The decoding procedure described
above is then generalized by cascading the elementary decoders illustrated in Fig. 6.33.
Fig. 6.33 Block Diagram of an Elementary Block Turbo Decoder.
Let us now briefly look at the performance of Turbo Codes and compare it to that of other
existing schemes. As shown in Fig. 6.34, Turbo Codes are the best practical codes due to their
performance at low SNR (at high SNRs, the Reed-Solomon Codes outperform Turbo Codes!). It
is obvious from the graph that the Recursive Systematic Convolutional (RSC) Turbo Code is
the best practical code known so far because it can achieve low BER at low SNR and is the
closest to the theoretical maximum of channel performance, the Shannon Limit. The magnitude
of how well it performs is determined by the coding gain. It can be recalled that the coding gain
is the difference in SNR between a coded channel and an uncoded channel for the same
performance (BER).
Fig. 6.34 Comparison of Different Coding Systems (bit error rate versus signal to noise ratio in dB).
Coding gain can be determined by measuring the distance between the SNR values of any of the
coded channels and the uncoded channel at a given BER. For example,
the coding gain for the RSC Turbo code, with rate 1/2 at a BER of 10⁻⁵, is about 8.5 dB. The
physical consequence can be visualized as follows. Consider space communication where the
received power follows the inverse square law (P_R ∝ 1/d²). This means that the Turbo coded
signal can either be received 2.65 (≈ √7) times farther away than the uncoded signal (at the same
transmitting power), or it only requires 1/7 the transmitting power (for the same transmitting
distance). Another way of looking at it is to turn it around and talk about portable device battery
lifetimes. For instance, since the RSC Turbo Coded Channel requires only 1/7 the power of the
uncoded channel, we can say that a device using a Turbo codec, such as a cell phone, has a
battery life 7 times longer than the device without any channel coding.
6.13 CONCLUDING REMARKS
The notion of convolutional codes was first proposed by Elias (1954) and later developed by
Wozencraft (1957) and Ash (1963). A class of multiple error correcting convolutional code was
suggested by Massey (1963). The study of the algebraic structure of convolutional codes was
carried out by Massey (1968) and Forney (1970).
Viterbi Decoding was developed by Andrew J. Viterbi, founder of Qualcomm Corporation.
His seminal paper on the technique titled "Error Bounds for Convolutional Codes and an
Asymptotically Optimum Decoding Algorithm," was published in IEEE Transactions on
Information Theory, Volume IT-13, pages 260-269, in April, 1967. In 1968, Heller showed that
the Viterbi Algorithm is practical if the constraint length is not too large.
Turbo Codes represent the next leap forward in error correction. Turbo Codes were
introduced in 1993 at the International Conference on Communications (ICC) by Berrou, Glavieux
and Thitimajshima in their paper "Near-Shannon-Limit Error Correction Coding and
Decoding: Turbo-Codes". These codes get their name from the fact that the decoded data are
recycled through the decoder several times. The inventors probably found this reminiscent of
the way a turbocharger operates. Turbo Codes have been shown to perform within 1 dB of the
Shannon Limit at a BER of 10⁻⁵. They break a complex decoding problem down into simple
steps, where each step is repeated until a solution is reached. The term "Turbo Code" is often
steps, where each step is repeated until a solution is reached. The term "Turbo Code" is often
used to refer to turbo convolutional codes (TCCs)-one form of Turbo Codes. The symbol-by-
symbol Maximum A Posteriori (MAP) algorithm of Bahl, Cocke, Jelinek and Raviv, published in
1974 (nineteen years before the introduction of Turbo Codes!), was used by Berrou et al. for the
iterative decoding of their Turbo Codes. The complexity of convolutional codes has slowed the
development of low-cost TCC decoders. On the other hand, another type of Turbo Code,
known as Turbo Product Code (TPC), uses block codes, solving multiple steps simultaneously,
thereby achieving high data throughput in hardware.
SUMMARY
• An important sub-class of tree codes is called Convolutional Codes. Convolutional Codes
make decisions based on past information, i.e. memory is required. A (k₀, n₀) tree code
that is linear, time-invariant, and has a finite wordlength k = (m + 1)k₀ is called an (n, k)
Convolutional Code.
• For Convolutional Codes, much smaller blocks of uncoded data of length k₀ are used.
These are called Information Frames. These Information Frames are encoded into
Codeword Frames of length n₀. The rate of this Tree Code is defined as R = k₀/n₀.
• The constraint length of a shift register encoder is defined as the number of symbols it
can store in its memory.
• For Convolutional Codes, the Generator Polynomial Matrix of size k₀ × n₀ is given by
G(D) = [g_ij(D)], where g_ij(D) are the generator polynomials of the code. g_ij(D) are obtained
simply by tracing the path from input i to output j.
• The Wordlength of a Convolutional Code is given by k = k₀ max_{i,j} [deg g_ij(D) + 1], the
Blocklength is given by n = n₀ max_{i,j} [deg g_ij(D) + 1] and the constraint length is given by
v = Σ_{i=1}^{k₀} max_j [deg g_ij(D)].
• The encoding operation can simply be described as a vector matrix product, C(D) =
I(D) G(D), or equivalently, c_j(D) = Σ_{i=1}^{k₀} I_i(D) g_{ij}(D).
• A parity check matrix H(D) is an (n₀ − k₀) by n₀ matrix of polynomials that satisfies
G(D)H(D)ᵀ = 0, and the syndrome polynomial vector, which is an (n₀ − k₀)-component row
vector, is given by s(D) = v(D)H(D)ᵀ.
• A systematic encoder for a convolutional code has the generator polynomial matrix of
the form G(D) = [I | P(D)], where I is a k₀ by k₀ identity matrix and P(D) is a k₀ by (n₀ − k₀)
matrix of polynomials. The Parity Check Polynomial Matrix for a Systematic
Convolutional Encoder is H(D) = [−P(D)ᵀ | I].
• A Convolutional Code whose generator polynomials g₁(D), g₂(D), ..., g_{n₀}(D) satisfy
GCD[g₁(D), g₂(D), ..., g_{n₀}(D)] = Dᵃ, for some a, is called a Non-Catastrophic Convolutional
Code. Otherwise it is called a Catastrophic Convolutional Code.
• The lth minimum distance d_l of a Convolutional Code is equal to the smallest Hamming
Distance between any two initial codeword segments l frames long that are not identical in
the initial frame. If l = m + 1, then this (m + 1)th minimum distance is called the minimum
distance of the code and is denoted by d*, where m is the number of information frames
that can be stored in the memory of the encoder. In literature, the minimum distance is
also denoted by dmin.
• If the lth minimum distance of a Convolutional Code is d_l, the code can correct t errors
occurring in the first l frames provided d_l ≥ 2t + 1. The free distance of a Convolutional
Code is given by dfree = max_l [d_l].
• The Free Length nfree of a Convolutional Code is the length of the non-zero segment of a
smallest weight convolutional codeword of non-zero weight. Thus, d_l = dfree if l = nfree, and
d_l < dfree if l < nfree. In literature, nfree is also denoted by n∞.
• Another way to find out the dfree of a Convolutional Code is to use the concept of a
generating function, whose expansion provides all the distance information directly.
• The generator matrix for the Convolutional Code is given by

        [ G₀  G₁  G₂  ...  Gₘ    0     0    0   ... ]
    G = [ 0   G₀  G₁  ...  Gₘ₋₁  Gₘ    0    0   ... ]
        [ 0   0   G₀  ...  Gₘ₋₂  Gₘ₋₁  Gₘ   0   ... ]
        [ ...                                       ]
• The Viterbi Decoding Technique is an efficient decoding method for Convolutional
Codes. Its computational requirements grow exponentially as a function of the constraint
length.
• For rate R and constraint length v, let d be the largest integer that satisfies H(d/(n₀v)) ≤ 1 − R.
Then, at least one Binary Convolutional Code exists with minimum distance d for which
the above inequality holds. Here H(x) is the familiar entropy function for a binary
alphabet.
• For a binary code with R = 1/n₀ the minimum distance dmin satisfies dmin ≤ ⌊(n₀v + n₀)/2⌋,
where ⌊x⌋ denotes the largest integer less than or equal to x.
• An upper bound on dfree given by Heller is dfree = min_{j≥1} ⌊ (n₀/2) (2^j/(2^j − 1)) (v + j − 1) ⌋. To
calculate the upper bound, the right-hand side should be plotted for different integer
values of j. The upper bound is the minimum of this plot.
• For Convolutional Codes, the upper bound on the first event error probability can be obtained
by P_e ≤ T(D)|_{D=2√(p(1−p))}, and the bit error probability satisfies
P_b ≤ (1/k) ∂T(D, I)/∂I |_{I=1, D=2√(p(1−p))}.
• Turbo codes are actually a quasi mix between Block and Convolutional Codes. Turbo
Codes typically use at least two convolutional component encoders and two maximum
a posteriori (MAP) algorithm component decoders. Although the
encoder determines the capability for error correction, it is the decoder that
determines the actual performance.
"It's kind of fun to do the impossible."
Walt Disney (1901-1966)
PROBLEMS
6.1 Design a rate 1/2 Convolutional encoder with a constraint length v = 4 and d* = 6.
(i) Construct the State Diagram for this encoder.
(ii) Construct the Trellis Diagram for this encoder.
(iii) What is the dfree for this code?
(iv) Give the Generator Matrix, G.
(v) Is this code Non-Catastrophic? Why?
6.2 Design a (12, 3) systematic convolutional encoder with a constraint length v = 3 and d* ≥ 8.
(i) Construct the Trellis Diagram for this encoder.
(ii) What is the dfree for this code?
6.3 Consider the binary encoder shown in Fig. 6.35.
Fig. 6.35
(i) Construct the Trellis Diagram for this encoder.
(ii) Write down the values of k₀, n₀, v, m and R for this encoder.
(iii) What are the values of d* and dfree for this code?
(iv) Give the Generator Polynomial Matrix, G(D).
6.4 Consider the binary encoder shown in Fig. 6.36.
Fig. 6.36
(i) Write down the values of k, n, v, m and R for this encoder.
(ii) Give the Generator Polynomial Matrix G{D) for this encoder.
(iii) Give the Generator Matrix G for this encoder.
(iv) Give the Parity Check Matrix H for this encoder.
(v) What are the values of d*, dfree and nfree for this code?
(vi) Is this encoder optimal in the sense of the Heller Bound on dfree?
(vii) Encode the following sequence of bits using this encoder: 101 001 001 010 000.
6.5 Consider a convolutional encoder described by its Generator Polynomial Matrix, defined
over GF(2):
[
D 0
G{D) = D 2
0
1 0
1 D2
0 1+D
D 2
0
(i) Draw the circuit realization of this encoder using shift registers. What is the value of v?
(ii) Is this a Catastrophic Code? Why?
(iii) Is this code optimal in the sense of the Heller Bound on dfree?
6.6 The Parity Check Matrix of the (12, 9) Wyner-Ash code for m = 2 is given as follows.

        [ 1 1 1 1 |         |         |         |     ]
        [ 1 1 0 0 | 1 1 1 1 |         |         |     ]
    H = [ 1 0 1 0 | 1 1 0 0 | 1 1 1 1 |         |     ]
        [ 0 0 0 0 | 1 0 1 0 | 1 1 0 0 | 1 1 1 1 | ... ]
        [ 0 0 0 0 | 0 0 0 0 | 1 0 1 0 | 1 1 0 0 | ... ]
(i) Determine the Generator Matrix, G.
(ii) Determine the Generator Polynomial Matrix, G{D).
(iii) Give the circuit realization of the (12, 9) Wyner-Ash Convolutional Code.
(iv) What are the values of d* and dfree for this code?
6.7 Consider a Convolutional Encoder defined over GF(4) with the Generator Polynomials
    g₁(D) = 2D³ + 3D² + 1 and
    g₂(D) = D³ + D + 1.
(i) What is the minimum distance of this code?
(ii) Is this code Non-Catastrophic? Why?
6.8 Let the Generator Polynomials of a rate 1/3 binary Convolutional Encoder be given by
    g₁(D) = D³ + D² + 1,
    g₂(D) = D³ + D and
    g₃(D) = D³ + 1.
(i) Encode the bit stream: Q_J__1OQQ11110101;
(ii) Encode the bit stream: 1010101010 ....
(iii) Decode the received bit stream: 001001101111000110011.
6.9 Consider a rate 1/2 Convolutional Encoder defined over GF(3) with the Generator
Polynomials
    g₁(D) = 2D³ + 2D² + 1 and
    g₂(D) = D³ + D + 2.
(i) Show the circuit realization of this encoder.
(ii) What is the minimum distance of this code?
(iii) Encode the following string of symbols using this encoder: 2012111002102.
(iv) Suppose the error vector is given by 0010102000201. Construct the received vector
and then decode this received vector using the Viterbi Algorithm.
COMPUTER PROBLEMS
6.10 Write a computer program that determines the Heller Bound on dfree, given the values for
n0 and v.
6.11 Write a computer program to exhaustively search for good systematic Convolutional
Codes. The program should loop over the parameters k₀, n₀, v, m, etc. and determine the
Generator Polynomial Matrix (in octal format) for the best Convolutional Code in its
category.
6.12 Write a program that calculates the d* and dfree given the generator polynomial matrix of
any convolutional encoder.
6.13 Write a computer program that constructs all possible rate 1/2 Convolutional Encoders for
a given constraint length v and chooses the best code for a given value of v. Using the
program, obtain the following plots:
(i) the minimum distance, d* versus v, and
(ii) the free distance, dfree versus v.
Comment on the error correcting capability of Convolutional Codes in terms of the
memory requirement.
6.14 Write a Viterbi Decoder in software that takes in the following:
(i) code parameters in the Octal Format, and
(ii) the received bit stream
The decoder then produces the survivors and the decoded bit stream.
6.15 Verify the Heller Bound on the entries in Table 6.4 for v = 3 , 4, ..., 7.
6.16 Write a generalized computer program for a Turbo Encoder. The program should take in
the parameters for the two encoders and the type of interleaver. It should then generate
the encoded bit-stream when an input (uncoded) bit-stream is fed into the program.
6.17 Modify the Turbo Encoder program developed in the previous question to determine the
dfree of the Turbo Encoder.
6.18 Consider the rate 1/3 Turbo Encoder shown in Fig. 6.37. Let the random interleaver size
be 256 bits.
(i) Find the dfree of this Turbo encoder.
(ii) If the input bit rate is 28.8 kb/s, what is the time delay caused by the Encoder?
6.19 Write a generalized computer program that performs Turbo Decoding using the iterative
MAP Decoding algorithm. The program should take in the parameters for the two
encoders, the type of interleaver used for encoding and the SNR. It should produce a
sequence of decoded bits when fed with a noisy, encoded bit-stream.
6.20 Consider the rate 1/3 Turbo Encoder comprising the following constituent encoders:

    G₁(D) = G₂(D) = [ 1    (1 + D² + D³ + D⁴)/(1 + D + D⁴) ].

The encoded output consists of the information bit, followed by the two parity bits from
the two encoders. Thus the rate of the encoder is 1/3. Use a random interleaver of size
256.
Fig. 6.37 Turbo Encoder for Problem 6.18.
(i) For this Turbo Encoder, generate a plot for the bit error rate (BER) versus the signal
to noise ratio (SNR). Vary the SNR from -2 dB through 10 dB.
(ii) Repeat the above for an interleaver of size 1024. Comment on your results.
7
Trellis Coded Modulation
7.1 INTRODUCTION TO TCM
In the previous chapters we have studied a number of error control coding techniques. In all
these techniques, extra bits are added to the information bits in a known manner. However, the
improvement in the Bit Error Rate is obtained at the expense of bandwidth caused by these
extra bits. This bandwidth expansion is equal to the reciprocal of the code rate.
For example, an RS (255, 223) Code has a code rate R = 223/255 = 0.8745 and 1/R = 1.1435.
Hence, to send 100 information bits, we have to transmit 14.35 extra bits (overhead). This
translates to a bandwidth expansion of 14.35%. Even for this efficient RS (255, 223) code, the
excess bandwidth requirement is not small.
In power limited channels (like deep space communications) one may trade the bandwidth
expansion for a desired performance. However, for bandwidth limited channels (like the
telephone channel), this may not be the ideal option. In such channels, a bandwidth efficient
signalling scheme such as Pulse Amplitude Modulation (PAM), Quadrature Amplitude
Modulation (QAM) or Multi Phase Shift Keying (MPSK) is usually employed to support high
bandwidth efficiency (in bits/s/Hz).
In general, either extra bandwidth or a higher signal power is needed in order to improve the
performance (error rate). Is it possible to achieve an improvement in system performance
without sacrificing either the bandwidth (which translates to the data rate) or using additional
power? In this chapter we study a coding technique called the Trellis Coded Modulation
Technique, which can achieve better performance without bandwidth expansion or using extra
power.
We begin this chapter by introducing the concept of coded modulation. We, then, study
some design techniques to construct good Coded Modulation Schemes. Finally, the
performance of different Coded Modulation Schemes are discussed for Additive White
Gaussian Noise (AWGN) Channels as well as for Fading Channels.
7.2 THE CONCEPT OF CODED MODULATION
Traditionally, coding and modulation were considered two separate parts of a digital
communications system. The input message stream is first channel encoded (extra bits are
added) and then these encoded bits are converted into an analog waveform by the modulator.
The objective of both the channel encoder and the modulator is to correct errors resulting from
use of a non-ideal channel. Both these blocks (the encoder and the modulator) are optimized
independently even though their objective is the same, that is, to correct errors introduced by the
channel! As we have seen, a higher performance is possible by lowering the code rate at the cost
of bandwidth expansion and increased decoding complexity. However, it is possible to obtain
Coding Gain without bandwidth expansion if the channel encoder is integrated with the
modulator. We illustrate this by a simple example.
Example 7.1 Consider data transmission over a channel with a throughput of 2 bits/s/Hz. One
possible solution is to use uncoded QPSK. Another possibility is to first use a rate 2/3
Convolutional Encoder (which converts 2 uncoded bits to 3 coded bits) and then use an 8-PSK
signal set which has a throughput of 3 bit/s/Hz. This coded 8-PSK scheme yields the same
information data throughput as the uncoded QPSK (2 bit/s/Hz). Note that both the QPSK and the
8-PSK schemes require the same bandwidth. But we know that the symbol error rate for the 8-PSK
is worse than that of QPSK for the same energy per symbol. However, the 2/3 convolutional
encoder would provide some coding gain. It may be possible that the coding gain provided by the
encoder outweighs the performance loss because of the 8-PSK signal set. If the coded modulation
scheme performs superior to the uncoded one at the same SNR, we can claim that an improvement
is achieved without sacrificing either the data rate or the bandwidth. In this example we have
combined a trellis encoder with the modulator. Such a scheme is called a Trellis Coded
Modulation (TCM) scheme.
We observe that the expansion of the signal set to provide redundancy results in the shrinking
of the Euclidean distance between the signal points, if the average signal energy is to be kept
constant (Fig. 7.1). This reduction in the Euclidean distance increases the error rate which
should be compensated with coding (increase in the Hamming Distance). Here we are assuming
an AWGN channel. We also know that the use of a hard-decision demodulation prior to decoding
in a coded scheme causes an irreversible loss of information. This translates to a loss of SNR.
For coded modulation schemes, where the expansion of the signal set implies a power penalty,
the use of soft-decision decoding is imperative. As a result, demodulation and decoding should be
combined in a single step, and the decoder should operate on the soft output samples of the
channel. For maximum likelihood decoding using soft-decisions, the optimal decoder chooses
that code sequence which is nearest to the received sequence in terms of the Euclidean distance.
Hence, an efficient coding scheme should be designed based on maximizing the minimum
Euclidean distance between the coded sequences rather than the Hamming Distance.
Fig. 7.1 The Euclidean Distances between the Signal Points for QPSK and 8-PSK. For QPSK:
δ₀² = 2E_s and δ₁² = 4E_s. For 8-PSK: δ₀² = 0.586E_s, δ₁² = 2E_s, δ₂² = 3.414E_s and δ₃² = 4E_s.
The Viterbi Algorithm can be used to decode the received symbols for a TCM scheme. In the
previous chapter we saw that the basic idea in Viterbi decoding is to trace out the most likely
path through the trellis. The most likely path is that which is closest to the received sequence in
terms of the Hamming Distance. For a TCM scheme, the Viterbi decoder chooses the most
likely path in terms of Euclidean Distance. The performance of the decoding algorithm depends
on the minimum Euclidean distance between a pair of paths forming an error event.
Definition 7.1 The minimum Euclidean Distance between any two paths in the
trellis is called the Free Euclidean Distance, dfree, of the TCM scheme.
In the previous chapter we had defined dfree in terms of the Hamming Distance between any two
paths in the trellis. The minimum free distance in terms of Hamming Weight could be calculated
as the minimum weight of a path that deviates from the all zero path and later merges back into
the all zero path at some point further down the trellis. This was a consequence of the linearity
of Convolutional Codes. However, the same does not apply for the case of TCM, which is non
linear. It may be possible that dfree is the Euclidean Distance between two paths in the trellis
neither of which is the all zero path. Thus, in order to calculate the Free Euclidean Distance for
a TCM scheme, all pairs of paths have to be evaluated.
Example 7.2 Consider the convolutional encoder followed by a modulation block performing
natural mapping (000 → s₀, 001 → s₁, ..., 111 → s₇) shown in Fig. 7.2. The rate of the encoder
is 2/3. It takes in two bits at a time (a₁, a₂) and outputs three encoded bits (c₁, c₂, c₃). The three
output bits are then mapped to one of the eight possible signals in the 8-PSK signal set.

Fig. 7.2 The TCM Scheme for Example 7.2.
This combined encoding and modulation can also be represented using a trellis with its branches
labelled with the output symbol si. The TCM scheme is depicted below. This is a fully connected
trellis. Each branch is labelled by a symbol from the 8-PSK constellation diagram. In order to
represent the symbol allocation unambiguously, the assigned symbols to the branches are written
at the front end of the trellis. The convention is as follows. Consider state 1. The branch from state
1 to state 1 is labelled with s₀, the branch from state 1 to state 2 is labelled with s₇, the branch from state
1 to state 3 is labelled with s₅ and the branch from state 1 to state 4 is labelled with s₂. So, the 4-tuple
{s₀, s₇, s₅, s₂} in front of state 1 represents the branch labels emanating from state 1 in sequence.
To encode any incoming bit stream, we follow the same procedure as for convolutional encoder.
However, in the case of TCM, the output is a sequence of symbols rather than a sequence of bits.
Suppose we have to encode the bit stream 1 0 1 1 1 0 0 0 1 0 0 1 ... We first group the input
sequence in pairs because the input is two bits at a time. The grouped input sequence is
10 11 10 00 10 01 ...
The TCM encoder output can be obtained simply by following the path in the trellis as dictated
by the input sequence. The first input pair is 10. Starting from the first node in state 0, we traverse
the third branch emanating from this node, as dictated by this input pair. This takes us to state 2. The
symbol output for this branch is s₅. From state 2 we move along the fourth branch, as determined by
the next input pair 11. The symbol output for this branch is s₁. In this manner, the output symbols
corresponding to the given input sequence are obtained.
Fig. 7.3 The Path in the Trellis Corresponding to the Input Sequence 10 11 10 00 ...
The path in the Trellis Diagram is depicted by the bold lines in Fig. 7.3. As in the case of
convolutional encoder, in TCM too, every encoded sequence corresponds to a unique path in the
trellis. The objective of the decoder is to recover this path from the Trellis Diagram.
Example 7.3 Consider the TCM scheme shown in Example 7.2. The free Euclidean Distance,
dfree of the TCM scheme can be found by inspecting all possible pairs of paths in the trellis. The
two paths that are separated by the minimum squared Euclidean Distance (which yields the d²_free)
are shown in the Trellis Diagram given in Fig. 7.4 with bold lines.
Fig. 7.4 The Two Paths in the Trellis that have the Free Euclidean Distance, d²_free.
    d²_free = d²(s₀, s₇) + d²(s₀, s₀) + d²(s₂, s₁)
            = δ₀² + 0 + δ₀² = 2δ₀² = 1.172 E_s

It can be seen that in this case, the error event that results in dfree does not involve the all zero
sequence. As mentioned before, in order to find the dfree, we must evaluate all possible pairs of
paths in the trellis. It is not sufficient just to evaluate the paths diverging from and later merging
back into the all zero path, because of the non-linear nature of TCM.
We must now develop a method to compare the coded scheme with the uncoded one. We
introduce the concept of coding gain below.
Definition 7.2 The difference between the values of the SNR for the coded and
uncoded schemes required to achieve the same error probability is defined as the
Coding Gain, g.

    g = SNR|uncoded − SNR|coded                                        (7.1)

At high SNR, the coding gain can be expressed as

    g_∞ = g|_{SNR→∞} = 10 log₁₀ [ (d²_free/E_s)|coded / (d²_free/E_s)|uncoded ]     (7.2)

where g_∞ represents the Asymptotic Coding Gain and E_s is the average signal
energy. For uncoded schemes, d_free is simply the minimum Euclidean Distance
between the signal points.
Example 7.4 Consider the TCM scheme discussed in Example 7.2 in which the encoder takes in
2 bits at a time. If we were to send uncoded bits, we would employ QPSK. The d²_free for the uncoded
scheme (QPSK) is 2E_s from Fig. 7.1. From Example 7.3 we have d²_free = 1.172E_s for our TCM
scheme. The Asymptotic Coding Gain is then given by

    g_∞ = 10 log₁₀ (1.172/2) = −2.3 dB.

This implies that the performance of our TCM scheme is actually worse than the uncoded
scheme. A quick look at the convolutional encoder used in this example suggests that it has good
properties in terms of Hamming Distance. In fact, it can be verified that this convolutional encoder
is optimal in the sense of maximizing the free Hamming Distance. However, the encoder fails to
perform well for the case of TCM. This illustrates the point that TCM schemes must be designed to
maximize the Euclidean Distance rather than the Hamming Distance.
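The asymptotic coding gain of (7.2) is a one-line computation once the two squared free distances are known. The snippet below (ours, not from the book) reproduces the number obtained in Example 7.4.

```python
import math

def asymptotic_coding_gain(dfree2_coded, dfree2_uncoded):
    """Asymptotic coding gain of eq. (7.2), assuming equal average energy E_s."""
    return 10 * math.log10(dfree2_coded / dfree2_uncoded)

# Example 7.4: d_free^2 = 1.172 E_s for the TCM scheme versus 2 E_s for uncoded QPSK.
print(round(asymptotic_coding_gain(1.172, 2.0), 1))   # -> -2.3 dB
```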
For the fully connected trellis discussed in Example 7.2, by a proper choice of the mapping
scheme, we can improve the performance. In order to design a better TCM scheme, it is possible
to work directly from the trellis onwards. The objective is to assign the 8 symbols from the
8-PSK signal set in such a manner that the dfree is maximized. One approach is to use an
exhaustive computer search. There are a total of 16 branches that have to be assigned labels
(symbols) from time t_k to t_{k+1}. We have 8 symbols to choose from. Thus, an exhaustive search
would involve 8¹⁶ different cases!
Another approach is to assign the symbols to the branches in the trellis in a heuristic manner
so as to increase the dfree. We know that an error event consists of a path diverging in one state
and then merging back after one or more transitions, as depicted in Fig. 7.5. The Euclidean
Distance associated with such an error event can be expressed as

    d²_total = d²(diverging pair of paths) + ... + d²(re-merging pair of paths)     (7.3)

Fig. 7.5 An Error Event.
Thus, in order to design a TCM scheme with a large dfree, we can at least ensure that the d²
(diverging pair of paths) and the d² (re-merging pair of paths) are as large as possible. In TCM
schemes, a redundant 2^{m+1}-ary signal set is often used to transmit m bits in each signalling
interval. The m input bits are first encoded by an m/(m+1) convolutional encoder. The resulting
m + 1 output bits are then mapped to the signal points of the 2^{m+1}-ary signal set. Now, recall that
the maximum likelihood decoding rule for the AWGN channel with soft decision decoding is
to minimize the squared Euclidean Distance between the received vector and the code vector
estimate from the trellis diagram (see Section 6.7, Chapter 6). Therefore, the mapping is done in
such a manner as to maximize the minimum Euclidean Distance between the different paths in
the trellis. This is done using a rule called Mapping by Set Partitioning.
7.3 MAPPING BY SET PARTITIONING
The Mapping by Set Partitioning is based on successive partitioning of the expanded 2^{m+1}-ary
signal set into subsets with increasing minimum Euclidean Distances. Each time we partition
the set, we reduce the number of signal points in the subset, but increase the minimum
distance between the signal points in the subset. The set partitioning can be understood by the
following example.
Example 7.5 Consider the set partitioning of 8-PSK. Before partitioning, the minimum Euclidean
Distance of the signal set is Δ₀ = δ₀. In the first step, the 8 points in the constellation diagram are
subdivided into two subsets, A₀ and A₁, each containing 4 signal points as shown in Fig. 7.6. As a
result of this first step, the minimum Euclidean Distance of each of the subsets is now Δ₁ = δ₁,
which is larger than the minimum Euclidean Distance of the original 8-PSK. We continue this
procedure and subdivide the sets A₀ and A₁ into two subsets each, A₀ → {A₀₀, A₀₁} and A₁ →
{A₁₀, A₁₁}. As a result of this second step, the minimum Euclidean Distance of each of the subsets
is now Δ₂ = δ₂. Further subdivision results in one signal point per subset.
Fig. 7.6 Set Partitioning of the 8-PSK Signal Set.
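The partitioning of Example 7.5 can be checked numerically. The following sketch assumes unit average energy (E_s = 1) and the indexing s_k = √E_s e^{j2πk/8}; it splits the 8-PSK set into the subsets A₀, A₁, A₀₀ and reports the intra-subset minimum squared distances δ₀², δ₁² and δ₂².

```python
import numpy as np

# 8-PSK signal points with average energy Es = 1
Es = 1.0
pts = {k: np.sqrt(Es) * np.exp(1j * 2 * np.pi * k / 8) for k in range(8)}

def min_sq_distance(subset):
    """Minimum squared Euclidean distance within a subset of signal indices."""
    return min(abs(pts[a] - pts[b])**2
               for i, a in enumerate(subset) for b in subset[i + 1:])

A = list(range(8))                    # full 8-PSK set
A0, A1 = A[0::2], A[1::2]             # first partitioning step
A00, A01 = A0[0::2], A0[1::2]         # second partitioning step

print(min_sq_distance(A))    # delta_0^2 = (2 - sqrt(2)) Es, about 0.586
print(min_sq_distance(A0))   # delta_1^2 = 2 Es
print(min_sq_distance(A00))  # delta_2^2 = 4 Es
```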
Consider the expanded 2^{m+1}-ary signal set used for TCM. In general, it is not necessary to continue
the process of set partitioning until the last stage. The partitioning can be stopped as soon as the
minimum distance of a subset is larger than the desired minimum Euclidean Distance of the TCM
scheme to be designed. Suppose the desired Euclidean Distance is obtained just after the (m̃ + 1)th
set partitioning step (m̃ ≤ m). It can be seen that after m̃ + 1 steps we have 2^{m̃+1} subsets and
each subset contains 2^{m−m̃} signal points.
A general structure of a TCM encoder is given in Fig. 7.7. It consists of m input bits of which
m̃ bits are fed into a rate m̃/(m̃+1) convolutional encoder while the remaining m − m̃ bits
are left uncoded. The m̃ + 1 output bits of the encoder along with the m − m̃ uncoded bits are
then input to the signal mapper. The signal mapper uses the m̃ + 1 bits from the convolutional
encoder to select one of the possible 2^{m̃+1} subsets. The remaining m − m̃ uncoded bits are used
to select one of the 2^{m−m̃} signals from this subset. Thus, the input to the TCM encoder is m
bits and the output is a signal point chosen from the original constellation.
Fig. 7.7 The General Structure of a TCM Encoder (the m̃ + 1 coded bits select the subset, the m − m̃ uncoded bits select the signal from that subset).
For the TCM encoder shown in Fig. 7.7 we observe that the m − m̃ uncoded bits have no effect on
the state of the convolutional encoder because its input is not being altered. Thus, we can
change these m − m̃ bits of the total m input bits without changing the encoder state. This
implies that 2^{m−m̃} parallel transitions exist between states. These parallel transitions are
associated with the signals of the subsets in the lowest layer of the set partitioning tree. For the
case of m̃ = m, the states are joined by single transitions.
Let us denote the minimum Euclidean Distance between parallel transitions by Δ_{m̃+1} and
the minimum Euclidean Distance between non-parallel paths of the trellis by d_free(m̃). The free
Euclidean Distance of the TCM encoder shown in Fig. 7.7 can then be written as
d_free = min [Δ_{m̃+1}, d_free(m̃)].  (7.4)
Example 7.6 Consider the TCM scheme proposed by Ungerboeck. It is designed to maximize the
Free Euclidean Distance between coded sequences. It consists of a rate 2/3 convolutional encoder
coupled with an 8-PSK signal set mapping. The encoder is given in Fig. 7.8 and the corresponding
trellis diagram in Fig. 7.9.
Fig. 7.8 The TCM Encoder for Example 7.6 (natural mapping onto 8-PSK).
s₀ s₄ s₂ s₆
s₁ s₅ s₃ s₇
s₂ s₆ s₀ s₄
s₃ s₇ s₁ s₅
Fig. 7.9 The Trellis Diagram for the Encoder in Example 7.6.
For this encoder m = 2 and m̃ = 1, which implies that there are 2^{m−m̃} = 2¹ = 2 parallel transitions
between each state. The minimum squared Euclidean Distance between parallel transitions is
Δ²_{m̃+1} = Δ₂² = δ₂² = 4E_s.
The minimum squared Euclidean Distance between non-parallel paths in the trellis, d²_free(m̃), is
given by the error event shown in Fig. 7.9 by bold lines. From the figure, we have
d²_free(m̃) = d²_E(s₀, s₂) + d²_E(s₀, s₁) + d²_E(s₀, s₂)
= δ₁² + δ₀² + δ₁² = 4.586 E_s.
The error events associated with the parallel paths have the minimum squared Euclidean
Distance among all the possible error events. Therefore, the minimum squared Euclidean Distance
for the TCM scheme is d²_free = min[Δ²_{m̃+1}, d²_free(m̃)] = 4E_s. The asymptotic coding gain of this
scheme is
g_∞ = 10 log (4/2) = 3 dB
This shows that the TCM scheme proposed by Ungerboeck shows an improvement of 3 dB over
the uncoded QPSK. This example illustrates the point that the combined coded modulation scheme
can compensate for the loss from the expansion of the signal set by the coding gain achieved by the
convolutional encoder. Further, for the non-parallel paths
d²_E = d²(diverging pair of paths) + ... + d²(re-merging pair of paths)
= δ₁² + ... + δ₁² = (δ₁² + δ₁²) + ... = δ₂² + ... = 4E_s + ...
However, the minimum squared Euclidean Distance for the parallel transitions is δ₂² = 4E_s.
Hence, the minimum squared Euclidean Distance of this TCM scheme is determined by the parallel
transitions.
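As a quick numerical check of Example 7.6, the sketch below evaluates d²_free = min[Δ₂², d²_free(m̃)] and the resulting asymptotic coding gain over uncoded QPSK, assuming the standard 8-PSK distances δ₀² = (2 − √2)E_s, δ₁² = 2E_s and δ₂² = 4E_s.

```python
import math

Es = 1.0
d0_sq = (2 - math.sqrt(2)) * Es   # delta_0^2 for 8-PSK
d1_sq = 2 * Es                    # delta_1^2
d2_sq = 4 * Es                    # delta_2^2

# Parallel transitions vs. the shortest non-parallel error event of Fig. 7.9
parallel_sq = d2_sq                        # Delta_2^2
non_parallel_sq = d1_sq + d0_sq + d1_sq    # three-branch error event
d_free_sq = min(parallel_sq, non_parallel_sq)

gain_db = 10 * math.log10(d_free_sq / (2 * Es))   # uncoded QPSK has d^2 = 2 Es
print(d_free_sq, round(gain_db, 2))                # 4.0, 3.01 dB
```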
7.4 UNGERBOECK'S TCM DESIGN RULES
In 1982 Ungerboeck proposed a set of design rules for maximizing the free Euclidean Distance
for TCM schemes. These design rules are based on heuristics.
Rule 1: Parallel Transitions, if present, must be associated with the signals of the subsets in the
lowest layer of the Set Partitioning Tree. These signals have the minimum Euclidean Distance Δ_{m̃+1}.
Rule 2: The transitions originating from or merging into one state must be associated with signals
of the first step of set partitioning. The Euclidean Distance between these signals is at least Δ₁.
Rule 3: All signals are used with equal frequency in the Trellis Diagram.
Example 7.7 Next, we wish to improve upon the TCM scheme proposed in Example 7.6. We
observed in the previous example that the parallel transitions limit the d_free. Therefore, we must
come up with a trellis that has no parallel transitions. The absence of parallel paths would imply
that the d²_free is not limited to δ₂², the maximum possible squared separation between two signal points in the
8-PSK Constellation Diagram. Consider the Trellis Diagram shown in Fig. 7.10. The trellis has 8
states. There are no Parallel Transitions in the Trellis Diagram. We wish to assign the symbols
from an 8-PSK signal set to the branches of this trellis according to the Ungerboeck rules.
Since there are no parallel transitions here, we start directly with Ungerboeck's second rule. We
must assign the transitions originating from or merging into one state with signals from the first
step of set partitioning. We will refer to Fig. 7.6 for the Set Partitioning Diagram for 8-PSK. The
first step of set partitioning yields two subsets, A₀ and A₁, each consisting of four signal points. We
first focus on the diverging paths. Consider the topmost node (state S₀). We assign to these four
diverging paths the signals s₀, s₄, s₂ and s₆. Note that they all belong to the subset A₀. For the next
node (state S₁), we assign the signals s₁, s₅, s₃ and s₇ belonging to the subset A₁. For the next node
(state S₂), we assign the signals s₄, s₀, s₆ and s₂ belonging to the subset A₀. The order has been
shuffled to ensure that at the re-merging end we still have signals from the first step of set
partitioning. If we observe the four paths that merge into the node of state S₀, their branches are
labelled s₀, s₄, s₂ and s₆, which belong to A₀. This clever assignment has ensured that the transitions
originating from or merging into one state are labelled with signals from the first step of set
partitioning, thus satisfying Rule 2. It can be verified that all the signals have been used with equal
frequency. We did not have to make any special effort to ensure that.
The error event corresponding to the squared free Euclidean Distance is shown in the Trellis
Diagram with bold lines. The squared free Euclidean Distance of this TCM Scheme is
d²_free = d²_E(s₀, s₆) + d²_E(s₀, s₇) + d²_E(s₀, s₆)
= δ₁² + δ₀² + δ₁² = 4.586 E_s
State S₀: s₀ s₄ s₂ s₆
State S₁: s₁ s₅ s₃ s₇
State S₂: s₄ s₀ s₆ s₂
State S₃: s₅ s₁ s₇ s₃
State S₄: s₂ s₆ s₀ s₄
State S₅: s₃ s₇ s₁ s₅
State S₆: s₆ s₂ s₄ s₀
State S₇: s₇ s₃ s₅ s₁
Fig. 7.10 The Trellis Diagram for the Encoder in Example 7.7.
In comparison to uncoded QPSK, this translates to an asymptotic coding gain of
g_∞ = 10 log (4.586/2) = 3.60 dB
Thus, at the cost of added encoding and decoding complexity, we have achieved a 0.6 dB gain over
the TCM scheme discussed in Example 7.6.
Example 7.8 Consider the 8 state, 8-PSK TCM scheme discussed in Example 7.7. The equivalent
systematic encoder realization with feedback is given in Fig. 7.11.
Fig. 7.11 The TCM Encoder for Example 7.7 (natural mapping onto 8-PSK).
Let us represent the output of the convolutional encoder shown in Fig. 7.11 in terms of the input and
delayed versions of the input (See Section 6.3 of Chapter 6 for analytical representation of
Convolutional Codes). From the figure, we have
c₁(D) = a₁(D),
c₂(D) = a₂(D),
Information Theory, Coding and Cryptography
c₃(D) = (D²/(1 + D³)) a₁(D) + (D/(1 + D³)) a₂(D)
Therefore, the Generator Polynomial Matrix of this encoder is
G(D) = [ 1   0   D²/(1 + D³) ]
       [ 0   1   D/(1 + D³)  ]
and the parity check polynomial matrix, H(D), satisfying G(D) · H^T(D) = 0 is
H(D) = [ D²   D   1 + D³ ].
We can re-write the parity check polynomial matrix as H(D) = [H₁(D) H₂(D) H₃(D)], where
H₁(D) = D² = (000 100)_binary = (04)_octal,
H₂(D) = D = (000 010)_binary = (02)_octal,
H₃(D) = 1 + D³ = (001 001)_binary = (11)_octal.
Table 7.1 gives the encoder realization and asymptotic coding gains of some of the good TCM
codes constructed for the 8-PSK signal constellation. Almost all of these TCM schemes have been
found by exhaustive computer searches. The coding gain is given with respect to uncoded
QPSK. The parity check polynomials are expressed in octal form.
Table 7.1 TCM Schemes Using 8-PSK
Number of states   H₁(D)   H₂(D)   H₃(D)   d²_free/E_s   Asymptotic coding gain (dB)
4                  -       2       5       4.00          3.01
8                  04      02      11      4.58          3.6
16                 16      04      23      5.17          4.13
32                 34      16      45      5.75          4.59
64                 066     030     103     6.34          5.01
128                122     054     277     6.58          5.17
256                130     072     435     7.51          5.75
Example 7.9 We now look at a TCM scheme that involves 16QAM. The TCM encoder takes in
3 bits and outputs one symbol from the 16QAM Constellation Diagram. This TCM scheme has a
throughput of 3 bits/s/Hz and we will compare it with uncoded 8-PSK, which also has a throughput
of 3 bits/s/Hz.
Let the minimum distance between two points in the Signal Constellation of 16QAM be δ₀ as
depicted in Fig. 7.12. It is assumed that all the signals are equiprobable. Then the average signal
energy of a 16QAM signal is obtained as E_s = 2.5 δ₀².
Fig. 7.12 Set Partitioning of 16QAM (Δ₀ = δ₀, Δ₁ = √2 δ₀, Δ₂ = 2δ₀, Δ₃ = 2√2 δ₀).
Thus we have
δ₀ = 2√(E_s/10), i.e., δ₀² = 0.4 E_s.
The Trellis Diagram for the 16QAM TCM scheme is given in Fig. 7.13. The trellis has 8 states.
Each node has 8 branches emanating from it because the encoder takes in 3 input bits at a time
(2³ = 8).
The encoder realization is given in Fig. 7.14. The Ungerboeck design rules are followed to assign
the symbols to the different branches. The branches diverging from a node and the branches
merging back to a node are assigned symbols from the sets A₀ and A₁. The parallel paths are
assigned symbols from the lowest layer of the Set Partitioning Tree (A₀₀₀, A₀₀₁, etc.).
The squared Euclidean Distance between any two parallel paths is Δ₃² = 8δ₀². This is by design,
as we have assigned symbols to the parallel paths from the lowest layer of the Set Partitioning Tree.
The minimum squared Euclidean Distance between non-parallel paths is
d² = Δ₁² + Δ₀² + Δ₁² = 5δ₀²
Therefore, the free Euclidean Distance for the TCM scheme is
d²_free = min {8δ₀², 5δ₀²} = 5δ₀² = 2E_s.
Note that the free Euclidean Distance is determined by the non-parallel paths rather than the
parallel paths. We now compare the TCM scheme with the uncoded 8-PSK, which has the same
throughput. For uncoded 8-PSK, the minimum squared Euclidean Distance is (2 − √2)E_s. Thus,
the asymptotic coding gain for this TCM encoder is
g_∞ = 10 log [2/(2 − √2)] = 5.3 dB
State S₀: A₀₀₀ A₁₀₀ A₀₁₀ A₁₁₀
State S₁: A₀₀₁ A₁₀₁ A₀₁₁ A₁₁₁
State S₂: A₀₀₀ A₁₀₀ A₁₁₀ A₀₁₀
State S₃: A₁₀₁ A₀₀₁ A₁₁₁ A₀₁₁
State S₄: A₀₁₀ A₁₁₀ A₀₀₀ A₁₀₀
State S₅: A₀₁₁ A₁₁₁ A₀₀₁ A₁₀₁
State S₆: A₁₁₀ A₀₁₀ A₁₀₀ A₀₀₀
State S₇: A₁₁₁ A₀₁₁ A₁₀₁ A₀₀₁
Fig. 7.13 Trellis Diagram for the 16QAM TCM Scheme.
Fig. 7.14 The Equivalent Systematic Encoder for the Trellis Diagram Shown in Fig. 7.13 (natural mapping onto 16QAM).
7.5 TCM DECODER
We have seen that, like Convolutional Codes, TCM Schemes are also described using Trellis
Diagrams. Any input sequence to a TCM encoder gets encoded into a sequence of symbols
based on the Trellis Diagram. The encoded sequence corresponds to a particular path in this
trellis diagram. There exists a one-to-one correspondence between an encoded sequence and a
path within the trellis. The task of the TCM decoder is simply to identify the most likely path in
the trellis. This is based on the maximum likelihood criterion. As seen in the previous chapter,
an efficient search method is to use the Viterbi algorithm (see Section 6.7 of the previous
chapter).
For soft decision decoding of the received sequences using the Viterbi Algorithm, each trellis
branch is labelled by the branch metric based on the observed received sequence. Using the
maximum likelihood decoder for the Additive White Gaussian Noise (AWGN) channels, the
branch metric is defined as the Euclidean Distance between the coded sequence and the
received sequence. The Viterbi Decoder finds a path through the trellis which is closest to the
received sequence in the Euclidean Distance sense.
Definition 7.3 The branch metric for a TCM scheme designed for the AWGN
channel is the Euclidean Distance between the received signal and the signal associated
with the corresponding branch in the trellis.
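A minimal sketch of such a decoder is given below. It is not the book's implementation; it assumes the trellis is supplied as two lookup tables (next_state and branch_symbol, both hypothetical names) and uses the squared Euclidean distance of Definition 7.3 as the branch metric.

```python
import numpy as np

def viterbi_tcm(received, next_state, branch_symbol, n_states):
    """Soft-decision Viterbi decoding sketch for a TCM trellis.

    next_state[s][u]    : state reached from state s on input symbol u
    branch_symbol[s][u] : complex channel symbol transmitted on that branch
    """
    n_inputs = len(next_state[0])
    cost = np.full(n_states, np.inf)
    cost[0] = 0.0                              # start in the all-zero state
    paths = {0: []}
    for r in received:
        new_cost = np.full(n_states, np.inf)
        new_paths = {}
        for s in range(n_states):
            if np.isinf(cost[s]):
                continue
            for u in range(n_inputs):
                ns = next_state[s][u]
                metric = cost[s] + abs(r - branch_symbol[s][u])**2
                if metric < new_cost[ns]:      # keep the survivor path
                    new_cost[ns] = metric
                    new_paths[ns] = paths[s] + [u]
        cost, paths = new_cost, new_paths
    best = int(np.argmin(cost))
    return paths[best]                         # decoded input symbol sequence
```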
In the next section, we study the performance of TCM schemes over AWGN channels. We
also develop some design rules.
7.6 PERFORMANCE EVALUATION FOR AWGN CHANNEL
There are different performance measures for a TCM scheme designed for an AWGN channel.
We have already discussed the asymptotic coding gain, which is based on the free Euclidean
Distance, d_free. We will now look at some other parameters that are used to characterize a TCM
Code.
Definition 7.4 The average number of nearest neighbours at free distance, N(d_free),
gives the average number of paths in the trellis with free Euclidean Distance d_free from a
transmitted sequence. This number is used in conjunction with d_free for the evaluation
of the error event probability.
Definition 7.5 Two finite length paths in the trellis form an error event if they
start from the same state, diverge and then later merge back. An error event of length
l is defined by two coded sequences s_l and ŝ_l,
s_l = (s_n, s_{n+1}, ..., s_{n+l+1}) and ŝ_l = (ŝ_n, ŝ_{n+1}, ..., ŝ_{n+l+1}),
such that
s_n = ŝ_n, s_{n+l+1} = ŝ_{n+l+1},
s_i ≠ ŝ_i, i = n + 1, ..., n + l.  (7.5)
Definition 7.6 The probability of an error event starting at time n, given that the
decoder has estimated the correct transmitter state at that time, is called the error
event probability, P_e.
The performance of TCM schemes is generally evaluated by means of upper bounds
on the error event probability. It is based on the generating function approach. Let us
consider again the Ungerboeck model for a rate m/(m + 1) TCM scheme as shown in
Fig. 7.7. The encoder takes in m bits at a time and encodes them into m + 1 bits, which are
then mapped by a memoryless mapper, f(.), onto a symbol s_i. Let us call the binary
(m + 1)-tuples c_i the labels of the signals s_i. We observe that there is a one-to-one
correspondence between a symbol and its label. Hence, an error event of length l can
be equivalently described by two sequences of labels
C_l = (c_k, c_{k+1}, ..., c_{k+l-1}) and C'_l = (c'_k, c'_{k+1}, ..., c'_{k+l-1}),  (7.6)
where c'_k = c_k ⊕ e_k, c'_{k+1} = c_{k+1} ⊕ e_{k+1}, ..., and E_l = (e_k, e_{k+1}, ..., e_{k+l-1}) is a sequence
of binary error vectors. The mathematical symbol ⊕ represents binary (modulo-2)
addition. An error event of length l occurs when the decoder chooses, instead of the
transmitted sequence C_l, the sequence C'_l which corresponds to a path in the trellis
diagram that diverges from the original transmitted path and re-merges back exactly
after l time intervals. To find the probability of error, we need to sum over all possible
values of l the probabilities of error events of length l (i.e., the joint probabilities that C_l is
transmitted and C'_l is detected). The upper bound on the probability of error is
obtained by the following union bound
P_e ≤ Σ_{l=1}^{∞} Σ_{s_l} Σ_{s'_l ≠ s_l} P(s_l) P(s_l → s'_l)  (7.7)
where P(s_l → s'_l) denotes the pairwise error probability (i.e., the probability that the
sequence s_l is transmitted and the sequence s'_l is detected). Assuming a one-to-one
correspondence between a symbol and its label, we can write
P_e ≤ Σ_{l=1}^{∞} Σ_{C_l} Σ_{C'_l ≠ C_l} P(C_l) P(C_l → C'_l)
= Σ_{l=1}^{∞} Σ_{C_l} Σ_{E_l ≠ 0} P(C_l) P(C_l → C_l ⊕ E_l)  (7.8)
The pairwise error probability P(C_l → C_l ⊕ E_l) can be upper-bounded by the
Bhattacharyya Bound (see Problem 7.12) as follows
P(C_l → C_l ⊕ E_l) ≤ exp{ −(1/4N₀) ||f(C_l) − f(C_l ⊕ E_l)||² }  (7.9)
where f(.) is the memoryless mapper. Let D = e^{−1/4N₀} (for the Additive White Gaussian
Noise channel with single sided power spectral density N₀), then
P(C_l → C_l ⊕ E_l) ≤ D^{d²_E(f(C_l), f(C_l ⊕ E_l))}  (7.10)
where d²_E(f(C_l), f(C'_l)) represents the squared Euclidean distance between the symbol sequences
s_l and s'_l. Next, define the function
W(E_l) = Σ_{C_l} P(C_l) D^{||f(C_l) − f(C_l ⊕ E_l)||²}  (7.11)
We can now write the probability of error as
P_e ≤ Σ_{l=1}^{∞} Σ_{E_l ≠ 0} W(E_l)  (7.12)
From the above equation we observe that the probability of error is upper-bounded by a sum
over all possible error events, E_l. Note that
(7.13)
We now introduce the concept of an error state diagram which is essentially a graph whose
branches have matrix labels. We assume that the source symbols are equally probable with
probabilities 2^{−m} = 1/M.
Definition 7.7 The error weight matrix, G(e_i), is an N × N matrix whose element in
the pth row and qth column is defined as
[G(e_i)]_{p,q} = (1/2^m) Σ_{c_{p→q}} D^{||f(c_{p→q}) − f(c_{p→q} ⊕ e_i)||²}, if there is a transition from state p to state q in the trellis,
             = 0, if there is no transition from state p to state q in the trellis,  (7.14)
where c_{p→q} are the label vectors generated by the transitions from state p to state q.
The summation accounts for the possible parallel transitions (parallel paths) between states in
the Trellis Diagram. The entry (p, q) in the matrix G provides an upper bound on the probability
that an error event occurs starting from node p and ending on node q. Similarly, (1/N) G·1 is a vector
whose pth entry is a bound on the probability of any error event starting from node p. Now, to any
sequence E_l = e_1, e_2, ..., e_l, there corresponds a sequence of l error weight matrices G(e_1), G(e_2), ...,
G(e_l). Thus, we have
W(E_l) = (1/N) 1^T Π_{n=1}^{l} G(e_n) 1  (7.15)
where 1 is a column N-vector all of whose elements are unity. We make the following
observations:
(i) For any matrix A, 1^T A 1 represents the sum of all the entries of A.
(ii) The element (p, q) of the matrix Π_{n=1}^{l} G(e_n) enumerates the Euclidean Distances
involved in the transitions from state p to state q in exactly l steps.
Our next job is to relate the above analysis to the probability of error, P_e. It should be noted
that the error vectors e_1, e_2, ..., e_l are not independent. The Error State Diagram has a structure
determined only by the Linear Convolutional Code and differs from the Code State Diagram
only in the denomination of its state and branch labels (G(e_i)). Since the error vectors e_i are
simply the differences of the vectors c_i, the connections among the vectors e_i are the same as those
among the vectors c_i. Therefore, from (7.12) and (7.15) we have
P_e ≤ T(D)|_{D = e^{−1/4N₀}}  (7.16)
where
T(D) = (1/N) 1^T G 1,  (7.17)
and the matrix
G = Σ_{l=1}^{∞} Σ_{E_l ≠ 0} Π_{n=1}^{l} G(e_n)  (7.18)
is the matrix transfer function of the error state diagram. T(D) is called the scalar transfer
function or simply the transfer function of the Error State Diagram.
Example 7.10 Consider a rate 1/2 TCM scheme with m = 1, and M = 4. It takes
one bit at a time and encodes it into two bits, which are then mapped to one of
the four QPSK symbols. The two state Trellis Diagram and the symbol allocation
from the 4-PSK Constellation are given in Fig. 7.15.
Fig. 7.15
Let us denote the error vector by e = (e₂ e₁). Then, from (7.14),
G(e₂e₁) = (1/2) [ D^{||f(00) − f(00 ⊕ e₂e₁)||²}   D^{||f(10) − f(10 ⊕ e₂e₁)||²} ]
                [ D^{||f(01) − f(01 ⊕ e₂e₁)||²}   D^{||f(11) − f(11 ⊕ e₂e₁)||²} ]
        = (1/2) [ D^{||f(00) − f(e₂e₁)||²}   D^{||f(10) − f(ē₂e₁)||²} ]
                [ D^{||f(01) − f(e₂ē₁)||²}   D^{||f(11) − f(ē₂ē₁)||²} ]  (7.19)
where ē = 1 ⊕ e. The error state diagram for this TCM scheme is given in Fig. 7.16.
Fig. 7.16
The matrix transfer function of the error state diagram is
G = G(10) [I₂ − G(11)]⁻¹ G(01)  (7.20)
where I₂ is the 2 × 2 identity matrix. In this case there are only three error vectors possible,
{01, 10, 11}. From (7.19) we calculate
G(01) = (1/2) [D² D²; D² D²], G(10) = (1/2) [D⁴ D⁴; D⁴ D⁴] and G(11) = (1/2) [D² D²; D² D²]
Using (7.20) we obtain the matrix transfer function of the Error State Diagram as
G = (1/2) D⁶/(1 − D²) [1 1; 1 1]  (7.21)
The scalar transfer function, T(D), is then given by
T(D) = (1/2) 1^T G 1 = D⁶/(1 − D²)  (7.22)
The upper bound on the probability of error can be computed by substituting D = e^{−1/4N₀} in
(7.22):
P_e ≤ D⁶/(1 − D²) |_{D = e^{−1/4N₀}}  (7.23)
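The bound (7.23) can also be evaluated numerically straight from the matrix formulation. The sketch below assumes the error weight matrices given above, G(01) = G(11) = (1/2)D²J and G(10) = (1/2)D⁴J with E_s normalised to 1 (J is the all-ones 2 × 2 matrix), and evaluates T(D) at D = e^{−1/4N₀}.

```python
import numpy as np

def tcm_error_bound(Es_over_N0):
    """Evaluate T(D) at D = exp(-1/4N0) for the scheme of Example 7.10,
    assuming Es = 1 and G(01) = G(11) = 0.5*D^2*J, G(10) = 0.5*D^4*J."""
    N0 = 1.0 / Es_over_N0                      # Es normalised to 1
    D = np.exp(-1.0 / (4.0 * N0))
    J = np.ones((2, 2))
    G01, G10, G11 = 0.5 * D**2 * J, 0.5 * D**4 * J, 0.5 * D**2 * J
    G = G10 @ np.linalg.inv(np.eye(2) - G11) @ G01   # Eq. (7.20)
    one = np.ones(2)
    return 0.5 * one @ G @ one                 # T(D) = (1/N) 1^T G 1 with N = 2

print(tcm_error_bound(10.0))    # upper bound on the error event probability
```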
Example 7.11 Consider another rate 1/2 TCM scheme with m = 1, and M = 4. The two state
Trellis Diagram is the same as the one in the previous example. However, the symbol allocation from
the 4-PSK Constellation is different, and is given in Fig. 7.17.
Fig. 7.17
Note that this symbol assignment violates the Ungerboeck design principles. Let us again denote the
error vector by e = (e₂ e₁). The Error State Diagram for this TCM scheme is given in Fig. 7.18.
Fig. 7.18
The matrix transfer function of the Error State Diagram is
G = G(11) [I₂ − G(10)]⁻¹ G(01)  (7.24)
where I₂ is the 2 × 2 identity matrix. In this case there are only three error vectors possible,
{01, 10, 11}. From (7.19) we calculate
G(01) = (1/2) [D² D²; D² D²], G(10) = (1/2) [D⁴ D⁴; D⁴ D⁴] and G(11) = (1/2) [D² D²; D² D²]
Using (7.24) we obtain the matrix transfer function of the Error State Diagram as
G = (1/2) D⁴/(1 − D⁴) [1 1; 1 1]  (7.25)
The scalar transfer function, T(D), is then given by
T(D) = (1/2) 1^T G 1 = D⁴/(1 − D⁴)  (7.26)
The upper bound on the probability of error is
P_e ≤ D⁴/(1 − D⁴) |_{D = e^{−1/4N₀}}  (7.27)
Comparing (7.23) and (7.27) we observe that, simply by changing the symbol assignment to the
branches of the Trellis Diagram, we degrade the performance considerably. In the second example,
the upper bound on the error probability has loosened by two orders of magnitude (assuming D << 1,
i.e., for the high SNR case).
A tighter upper bound on the error event probability is given by (exercise)
P_e ≤ (1/2) erfc(√(d²_free/4N₀)) e^{d²_free/4N₀} T(D)|_{D = e^{−1/4N₀}}  (7.28)
From (7.28), an asymptotic estimate of the error event probability can be obtained by
considering only the error events with free Euclidean Distance:
P_e ≈ (1/2) N(d_free) erfc(√(d²_free/4N₀))  (7.29)
The bit error probability can be upper bounded simply by weighting the pairwise error
probabilities by the number of incorrect input bits associated with each error vector and then
dividing the result by m. Therefore,
P_b ≤ (1/m) ∂T(D, I)/∂I |_{I = 1, D = e^{−1/4N₀}}  (7.30)
where T(D, I) is the augmented generating function of the Modified State Diagram. The
concept of the Modified State Diagram was introduced in the chapter on Convolutional
Codes (Section 6.5). A tighter upper bound can also be obtained for the bit error probability, and is
given by
P_b ≤ (1/2m) erfc(√(d²_free/4N₀)) e^{d²_free/4N₀} ∂T(D, I)/∂I |_{I = 1, D = e^{−1/4N₀}}  (7.31)
From (7.31), we observe that the upper bound on the bit error probability strongly depends on d_free.
In the next section, we will learn some methods for estimating d_free.
7.7 COMPUTATION OF d_free
We have seen that the free Euclidean Distance, d_free, is the single most important parameter for
determining how good a TCM scheme is for AWGN channels. It defines the asymptotic coding
gain of the scheme. In Chapter 6 (Section 6.5) we saw that the generating function can be used to
calculate the free Hamming Distance. The transfer function of the error state diagram, T(D),
includes information about the distance of all the paths in the trellis from the all zero path. If
T(D) is obtained in a closed form, the value of d_free follows immediately from the expansion of
the function in a power series. The generating function can be written as
T(D) = N(d_free) D^{d²_free} + N(d_next) D^{d²_next} + ...  (7.32)
where d²_next is the second smallest squared Euclidean Distance. Hence the smallest exponent of
D in the series expansion is d²_free. However, in most cases, a closed form expression for T(D) may
not be available, and one has to resort to numerical techniques.
Consider the function
φ₁(D) = d ln T(D)/d ln D  (7.33)
φ₁(D) decreases monotonically to the limit d²_free as D → 0. Therefore we have an upper bound
on d²_free provided D > 0. In order to obtain a lower bound on d²_free, consider the following function
φ₂(D) = ln T(D)/ln D  (7.34)
Taking the logarithm on both sides of (7.32) we get
d²_free ln D = ln T(D) − ln N(d_free) − ln [1 + (N(d_next)/N(d_free)) D^{d²_next − d²_free} + ...]  (7.35)
If we take D → 0, provided D > 0, from (7.34) and (7.35) we obtain
ln T(D)/ln D = d²_free − ε(D)  (7.36)
where ε(D) is a function that is greater than zero, and tends to zero monotonically as D → 0.
Thus, if we take smaller and smaller values of D in φ₁(D) and φ₂(D), we can obtain values that are
extremely close to d²_free.
It should be kept in mind that even though d²_free is the single most important parameter to
determine the quality of a TCM scheme, two other parameters are also influential:
(i) The error coefficient N(d_free): A factor of two increase in this error coefficient
reduces the coding gain by approximately 0.2 dB for error rates of 10⁻⁶.
(ii) The next distance d_next: It is the second smallest Euclidean Distance between two
paths forming an error event. If d_next is very close to d_free, the SNR requirement for a
good approximation of the upper bound on P_e may be very large.
So far, we have focussed primarily on AWGN channels. We found that the best design
strategy is to maximize the free Euclidean Distance, d_free, for the code. In the next section we
consider the design rules for TCM over fading channels. Just to remind the readers, fading
channels are frequently encountered in radio and mobile communications. One common cause
of fading is the multipath nature of the propagation medium. In this case, the signal arrives at
the receiver from different paths (with time varying nature) and gets added together. Depending
on whether the signals from different paths add up in phase, or out of phase, the net received
signal exhibits a random variation in amplitude and phase. The drops in the received signal
level below a threshold are called fades.
7.8 TCM FOR FADING CHANNELS
In this section we will consider the performance of trellis coded M-ary Phase Shift Keying
(MPSK) over a Fading Channel. We know that a TCM encoder takes in an input bit stream and
outputs a sequence of symbols. In this treatment we will assume that each of these symbols si
belong to the MPSK signal set. By using complex notation, each symbol can be represented by
a point in the complex plane. The coded signals are interleaved in order to spread the burst of
errors caused by the slowly varying fading process. These interleaved symbols are then pulse-
shaped for no inter-symbol interference and finally translated to RF frequencies for
transmission over the channel. The channel corrupts these transmitted symbols by adding a
fading gain (which is a negative gain, or a positive loss, depending on one's outlook) and
AWGN. At the receiver end, the received sequences are demodulated and quantized for soft
decision decoding. In many implementations, the channel estimator provides an estimate of the
channel gain, which is also termed as the channel state information. Thus we can represent
the received signal at time i as
r_i = g_i s_i + n_i  (7.37)
where n_i is a sample of the zero mean Gaussian noise process with variance N₀/2 and g_i is the
complex channel gain, which is also a sample of a complex Gaussian process with variance σ²_g.
The complex channel gain can be explicitly written using the phasor notation as follows
g_i = a_i e^{jφ_i},  (7.38)
where a_i and φ_i are the amplitude and phase processes respectively. We now make the following
assumptions:
(i) The receiver performs coherent detection,
(ii) The interleaving is ideal, which implies that the fading amplitudes are statistically
independent and the channel can be treated as memoryless.
Thus, we can write
r_i = a_i s_i + n_i  (7.39)
We know that for a channel with a diffused multipath and no direct path the fading amplitude
is Rayleigh distributed with Probability Density Function (pdf)
p_A(a) = 2a e^{−a²}, a ≥ 0  (7.40)
For the case when there exists a direct path in addition to the multipath, Rician Fading is
observed. The pdf of the Rician Fading Amplitude is given by
p_A(a) = 2a(1 + K) e^{−(K + a²(K+1))} I₀(2a√(K(1 + K))),  (7.41)
where I₀(.) is the zero-order modified Bessel Function of the first kind and K is the Rician
Parameter defined as follows.
Definition 7.8 The Rician Parameter K is defined as the ratio of the energy of the
direct component to the energy of the diffused multipath component. For the
extreme case of K = 0, the pdf of the Rician distribution becomes the same as the pdf
of the Rayleigh Distribution.
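The Rician amplitude model can be simulated by adding a fixed direct component to a complex Gaussian diffuse component. A minimal sketch is given below, assuming the amplitude is normalised so that E[a²] = 1; setting K = 0 reproduces Rayleigh fading.

```python
import numpy as np

def rician_amplitude(K, n, rng=np.random.default_rng(0)):
    """Draw n fading amplitudes with Rician parameter K and E[a^2] = 1.
    The direct component carries K/(1+K) of the power, the diffuse part 1/(1+K)."""
    mean = np.sqrt(K / (1 + K))
    sigma = np.sqrt(1 / (2 * (1 + K)))      # per real dimension
    g = (mean + sigma * rng.standard_normal(n)) + 1j * sigma * rng.standard_normal(n)
    return np.abs(g)

a = rician_amplitude(K=3, n=100_000)
print(np.mean(a**2))     # close to 1, as required by the normalisation
```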
We now look at the performance of the TCM scheme over a fading channel. Let r_l = (r₁, r₂,
..., r_l) be the received signal. The maximum likelihood decoder, which is usually implemented by
the Viterbi decoder, chooses the coded sequence that most likely corresponds to the received
signals. This is achieved by computing a metric between the sequence of received signals, r_l,
and the possible transmitted signals, s_l. As we have seen earlier, this metric is related to the
conditional channel probabilities
m(r_l, s_l) = ln p(r_l | s_l)  (7.42)
If the channel state information is being used, the metric becomes
m(r_l, s_l; a_l) = ln p(r_l | s_l, a_l)  (7.43)
Under the assumption of ideal interleaving, the channel is memoryless and hence the metrics
can be expressed as the following summations
m(r_l, s_l) = Σ_{i=1}^{l} ln p(r_i | s_i)  (7.44)
and
m(r_l, s_l; a_l) = Σ_{i=1}^{l} ln p(r_i | s_i, a_i)  (7.45)
First, we consider the scenario where the channel state information is known, i.e., â_i = a_i. The
metric can be written as
m(r_i, s_i; a_i) = −|r_i − a_i s_i|²  (7.46)
Therefore, the pairwise error probability is given by
P₂(s_l, ŝ_l) = E_{a_l}[P₂(s_l, ŝ_l | a_l)],  (7.47)
where
(7.48)
and E is the statistical expectation operator. Using the Chernoff Bound, the pairwise error
probability can be upper bounded as follows:
P₂(s_l, ŝ_l) ≤ Π_{i=1}^{l} [(1 + K)/(1 + K + (1/4N₀)|s_i − ŝ_i|²)] exp[− K (1/4N₀)|s_i − ŝ_i|² / (1 + K + (1/4N₀)|s_i − ŝ_i|²)]  (7.49)
For high SNR, the above equation simplifies to
P₂(s_l, ŝ_l) ≤ Π_{i∈η} (1 + K)e^{−K} / [(1/4N₀)|s_i − ŝ_i|²]  (7.50)
where η is the set of all i for which s_i ≠ ŝ_i. Let us denote the number of elements in η by l_η;
then we can write
P₂(s_l, ŝ_l) ≤ ((1 + K)e^{−K})^{l_η} / [(1/4N₀)^{l_η} d_p²(l_η)]  (7.51)
where
d_p²(l_η) = Π_{i∈η} |s_i − ŝ_i|²  (7.52)
is the squared product distance of the signals s_i ≠ ŝ_i. The term l_η is called the effective
length of the error event (s_l, ŝ_l).
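Given a transmitted and a detected symbol sequence, the effective length l_η and the squared product distance d_p²(l_η) of (7.51) and (7.52) can be computed directly, as in the sketch below (the two QPSK sequences used are purely hypothetical).

```python
import numpy as np

def effective_length_and_product_distance(s, s_hat, tol=1e-9):
    """Effective length l_eta and squared product distance d_p^2(l_eta)
    of the error event between two symbol sequences (Eqs. 7.51-7.52)."""
    diff_sq = np.abs(np.asarray(s) - np.asarray(s_hat))**2
    eta = diff_sq > tol                     # positions where the symbols differ
    return int(np.sum(eta)), float(np.prod(diff_sq[eta]))

# Hypothetical QPSK sequences differing in the last two positions
s     = [1 + 0j, 0 + 1j, -1 + 0j]
s_hat = [1 + 0j, 0 - 1j,  1 + 0j]
print(effective_length_and_product_distance(s, s_hat))   # (2, 16.0)
```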
A union bound on the error event probability P_e has already been discussed before. For the high
SNR case, the upper bound on P_e can be expressed as
P_e ≤ Σ_{l_η} Σ_{d_p²(l_η)} a[l_η, d_p²(l_η)] ((1 + K)e^{−K})^{l_η} / [(1/4N₀)^{l_η} d_p²(l_η)]  (7.53)
where a[l_η, d_p²(l_η)] is the average number of code sequences having the effective length l_η and
the squared product distance d_p²(l_η). The error event probability is actually dominated by the
smallest effective length l_η and the smallest product distance d_p²(l_η). Let us denote the smallest
effective length by L and the corresponding product distance by d_p²(L). The error event
probability can then be asymptotically approximated by
P_e ≈ a(L, d_p²(L)) ((1 + K)e^{−K})^L / [(1/4N₀)^L d_p²(L)]  (7.54)
We make the following observations from (7.54):
(i) The error event probability asymptotically varies with the Lth power of the SNR. This is
similar to what is achieved with a time diversity technique. Hence, L is also called the
time diversity of the TCM scheme.
(ii) The important TCM design parameters for fading channels are the time diversity, L,
and the product distance d_p²(L). This is in contrast to the free Euclidean Distance
parameter for AWGN channels.
(iii) TCM codes designed for AWGN channels would normally fare poorly in fading
channels and vice versa.
(iv) For large values of the Rician parameter, K, the effect of the free Euclidean Distance
on the performance of the TCM scheme becomes dominant.
(v) At low SNR, again, the free Euclidean Distance becomes important for the
performance of the TCM scheme.
Thus the basic design rules for TCMs for fading channels, at high SNR and for small values of
K, are
(i) maximize the effective length, L, of the code, and
(ii) maximize the minimum product distance d_p²(L).
Consider a TCM scheme with effective length L and minimum product distance d_{p1}²(L).
Suppose the code is redesigned to yield a minimum product distance d_{p2}²(L) with the same L.
The increase in the coding gain due to the increase in the minimum product distance is given by
Δg = SNR₁ − SNR₂|_{P_e1 = P_e2} = (10/L) log [d_{p2}²(L) a₁ / (d_{p1}²(L) a₂)],  (7.55)
where a_i, i = 1, 2, is the average number of code sequences with effective length L for the TCM
scheme i. We observe that, because of the 1/L factor, increasing the minimum product distance
corresponding to a smaller value of L is more effective in improving the performance of the
code.
So far, we have assumed that the channel state information was available. A similar analysis
to the one carried out for the case where channel state information was available can also be done when
the information about the channel is unavailable. In the absence of channel state information, the
metric can be expressed as
m(r_i, s_i) = −|r_i − s_i|².  (7.56)
After some mathematical manipulations, it can be shown that
P₂(s_l, ŝ_l) ≤ (1 + K)^{l_η} e^{−l_η K} (2e/l_η)^{l_η} [Σ_{i∈η} |s_i − ŝ_i|²]^{l_η} / [(1/N₀)^{l_η} d_p²(l_η)]  (7.57)
Using arguments discussed earlier in this section, the error event probability P_e can be
determined for this case when the channel state information is not available.
7.9 CONCLUDING REMARKS
Coding and modulation were first analyzed together as a single entity by Massey in 1974. Prior
to that time, in all coded digital communications systems, the encoder/decoder and the
modulator/demodulator were designed and optimized separately. Massey's idea of combined
coding and modulation was concretized in the seminal paper by Ungerboeck in 1982. Similar
ideas were also proposed earlier by Imai and Hirakawa in 1977, but did not get due attention.
The primary advantage of TCM was its ability to achieve increased power efficiency without the
customary increase in the bandwidth introduced by the coding process. In the following years
the theory of TCM was formalized by different researchers. Calderbank and Mazo showed that
the asymmetric one-dimensional TCM schemes provide more coding gain than symmetric
TCM schemes. Rotationally invariant TCM schemes were proposed by Wei in 1984, which
were subsequently adopted by CCITT for use in the new high speed voiceband modems.
SUMMARY
• The Trellis Coded Modulation (TCM) Technique allows us to achieve a better
performance without bandwidth expansion or using extra power.
• The minimum Euclidean Distance between any two paths in the trellis is called the free
Euclidean Distance, d_free, of the TCM scheme.
• The difference between the values of the SNR for the coded and uncoded schemes
required to achieve the same error probability is known as the coding gain, g = SNR|_uncoded
− SNR|_coded. At high SNR, the coding gain can be expressed as g_∞ = g|_{SNR→∞} = 10 log
[(d²_free/E_s)_coded / (d²_free/E_s)_uncoded], where g_∞ represents the Asymptotic Coding Gain and E_s is the average
signal energy.
• The mapping by Set Partitioning is based on successive partitioning of the expanded
2^{m+1}-ary signal set into subsets with increasing minimum Euclidean Distance. Each time
we partition the set, we reduce the number of the signal points in the subset, but increase
the minimum distance between the signal points in the subset.
• Ungerboeck's TCM design rules (based on heuristics) for AWGN channels are
Rule 1: Parallel transitions, if present, must be associated with the signals of the subsets in
the lowest layer of the Set Partitioning Tree. These signals have the minimum Euclidean
Distance Δ_{m̃+1}.
Rule 2: The transitions originating from or merging into one state must be associated with
signals of the first step of set partitioning. The Euclidean distance between these signals is
at least Δ₁.
Rule 3: All signals are used with equal frequency in the Trellis Diagram.
• The Viterbi Algorithm can be used to decode the received symbols for a TCM scheme.
The branch metric used in the decoding algorithm is the Euclidean Distance between the
received signal and the signal associated with the corresponding branch in the trellis.
• The average number of nearest neighbours at free distance, N(d_free), gives the average
number of paths in the trellis with free Euclidean Distance d_free from a transmitted
sequence. This number is used in conjunction with d_free for the evaluation of the error
event probability.
• The probability of error P_e ≤ T(D)|_{D = e^{−1/4N₀}}, where T(D) = (1/N) 1^T G 1, and the matrix
G = Σ_{l=1}^{∞} Σ_{E_l ≠ 0} Π_{n=1}^{l} G(e_n). T(D) is the scalar transfer function. A tighter upper bound on the
error event probability is given by P_e ≤ (1/2) erfc(√(d²_free/4N₀)) e^{d²_free/4N₀} T(D)|_{D = e^{−1/4N₀}}.
• For fading channels, P₂(s_l, ŝ_l) ≤ ((1 + K)e^{−K})^{l_η} / [(1/4N₀)^{l_η} d_p²(l_η)], where d_p²(l_η) = Π_{i∈η} |s_i − ŝ_i|². The term l_η
is the effective length of the error event (s_l, ŝ_l) and K is the Rician parameter. Thus, the
error event probability is dominated by the smallest effective length l_η and the smallest
product distance d_p²(l_η).
• The design rules for TCMs for fading channels, at high SNR and for small values of K, are
(i) maximize the effective length, L, of the code, and
(ii) maximize the minimum product distance d_p²(L).
• The increase in the coding gain due to an increase in the minimum product distance is given by
Δg = SNR₁ − SNR₂|_{P_e1 = P_e2} = (10/L) log [d_{p2}²(L) a₁ / (d_{p1}²(L) a₂)], where a_i, i = 1, 2, is the average number
of code sequences with effective length L for the TCM scheme i.
PROBLEMS
7.1 Consider a rate 2/3 Convolutional Code defined by
G(D) = [ 1    D       D + D²      ]
       [ D²   1 + D   1 + D + D²  ]
This code is used with an 8PSK signal set that uses Gray Coding (the three bits per
symbol are assigned such that the codes for two adjacent symbols differ only in 1 bit
location). The throughput of this TCM scheme is 2 bits/sec/Hz.
(a) How many states are there in the Trellis Diagram for this encoder?
(b) Find the free Euclidean Distance.
(c) Find the Asymptotic coding gain with respect to uncoded QPSK, which has a
throughput of 2 bits/sec/Hz.
7.2 In Problem 7.1, suppose instead of Gray Coding, natural mapping is performed, i.e.,
s₀ → 000, s₁ → 001, ..., s₇ → 111.
(a) Find the free Euclidean Distance.
(b) Find the Asymptotic coding gain with respect to uncoded QPSK (2 bits/sec/Hz).
7.3 Consider the TCM encoder shown in Fig. 7.19.
Fig. 7.19 Figure for Problem 7.3.
(a) Draw the State Diagram for this encoder.
(b) Draw the Trellis Diagram for this encoder.
(c) Find the free Euclidean Distance, d_free. In the Trellis Diagram, show one pair of paths
which result in d²_free. What is N(d_free)?
(d) Next, use set partitioning to assign the symbols of 8-PSK to the branches of the Trellis
Diagram. What is the d²_free now?
(e) Encode the following bit stream using this encoder: 1 0 0 1 0 0 0 1 0 1 0 ... Give your
answer for both the natural mapping and the mapping using Set Partitioning.
(f) Compare the asymptotic coding gains for the two different kinds of mapping.
7.4 We want to design a TCM scheme that has a 2/3 convolutional encoder followed by a
signal mapper. The mapping is done based on set partitioning of the Asymmetric
Constellation Diagram shown below. The trellis is a four-state, fully connected trellis.
(a) Perform Set Partitioning for the following Asymmetric Constellation Diagram.
(b) What is the free Euclidean distance, d_free, for this asymmetric TCM scheme?
Compare it with the d_free for the case when we use the standard 8-PSK Signal
Constellation.
(c) How will you choose the value of θ for improving the performance of the TCM
scheme using the Asymmetric Signal Constellation shown in Fig. 7.20?
Fig. 7.20 Figure for Problem 7.4.
7.5 Consider the rate 3/4 encoder shown in Fig. 7.21. The four output bits from the encoder
are mapped onto one of the sixteen possible symbols from the Constellation Diagram
shown below. Use Ungerboeck's design rules to design a TCM scheme for an AWGN
channel. What is the asymptotic coding gain with respect to uncoded 8-PSK?
Fig. 7.21 Figure for Problem 7.5.
7.6 Consider the expression for the pairwise error probability over a Rician Fading Channel.
Comment.
(b) Show that for low SNR the original inequality may be expressed as
P₂(s_l, ŝ_l) ≤ exp[−d²_E(s_l, ŝ_l)/4N₀]
7.7 Consider a TCM scheme designed for a Rician Fading Channel with an effective length
L and the minimum product distance d_p²(L). Suppose we wish to redesign this code to
obtain an improvement of 3 dB in SNR.
(a) Compute the desired effective length L if the d_p²(L) is kept unchanged.
(b) Compute the desired product distance d_p²(L) if the effective length L is kept
unchanged.
7.8 Suppose you have to design a TCM scheme for an AWGN channel (SNR = y). The
desired BER is Pe. Draw a flowchart as to how you will go about designing such a scheme.
(a) How many states will there be in your Trellis?
(b) How will you design the convolutional encoder?
(c) Would you have parallel paths in your design?
(d) What kind of modulation scheme will you choose and why?
(e) How will you assign the symbols of the modulation scheme to the branches?
7.9 For Viterbi decoding the metric used is of the form
m(r_l, s_l) = ln p(r_l | s_l).
(a) What is the logic behind choosing such a metric?
(b) Suggest another metric that will be suitable for fading channels. Give reasons for
your answer.
7.10 A TCM scheme designed for a Rician Fading Channel (K = 3) and a high SNR
environment (SNR = 20 dB) has L = 5 and d_p²(L) = 2.34 E_s⁵. It has to be redesigned to
produce an improvement of 2 dB.
(a) What is the d_p²(L) of the new code?
(b) Comment on the new d_free.
7.11 Consider the TCM scheme shown in Fig. 7.22 consisting of a rate 1/2 convolutional
encoder coupled with a mapper.
(a) Draw the Trellis Diagram for this encoder.
(b) Determine the scalar transfer function, T(D).
(c) Determine the augmented generating function, T(D, L, I).
(d) What is the minimum Hamming Distance (d_free) of this code?
(e) How many paths are there with this d_free?
Fig. 7.22 Figure for Problem 7.11.
7.12 Consider the pairwise error probability P₂(s_l, ŝ_l).
(a) For a maximum likelihood decoder, prove that
P₂(s_l, ŝ_l) = ∫ f(r) p_{R|S}(r | s_l) dr
where r is the received vector, p_{R|S}(r | s_l) is the channel transition probability density
function and
f(r) ≤ √[ p_{R|S}(r | ŝ_l) / p_{R|S}(r | s_l) ]
(b) Show that
P₂(s_l, ŝ_l) ≤ ∫ √[ p_{R|S}(r | ŝ_l) p_{R|S}(r | s_l) ] dr
COMPUTER PROBLEMS
7.13 Write a computer program to perform trellis coded modulation, given the trellis structure
and the mapping rule. The program should take in an input bit stream and output a
sequence of symbols. The input to the program may be taken as two matrices, one that
gives the connectivity between the states of the trellis (essentially the structure of the
trellis) and the second, which gives the branch labels.
7.14 Write a computer program to calculate the squared free Euclidean distance, d²_free, the
effective length, L, and the minimum product distance, d_p²(L), of a TCM scheme, given
the Trellis Diagram and the labels on the branches.
7.15 Write a computer program that performs Viterbi decoding on an input stream of
symbols. This program makes use of a given trellis and the labels on the branches of the
Trellis Diagram.
7.16 Verify the performance of the different TCM schemes given in this chapter in AWGN
environment. To do so, take a long chain of random bits and input it to the TCM encoder.
The encoder will produce a sequence of symbols (analog waveforms). Corrupt these
symbols with AWGN of different noise power, i.e., simulate scenarios with different
SNRs. Use Viterbi decoding to decode the received sequence of corrupted symbols
(distorted waveforms). Generate a plot of the BER versus the SNR and compare it with
the theoretically predicted error rates.
7.17 Write a program to observe the effect of decoding window size for the Viterbi decoder.
Generate a plot of the error rate versus the window size. Also plot the number of
computations versus the window size.
7.18 Write a computer program that performs an exhaustive search in order to determine a rate
2/3 TCM encoder which is designed for AWGN (maximize d_free). Assume that there are
four states in the Trellis Diagram and it is a fully connected trellis. The branches of this
trellis are labelled using the symbols from an 8-PSK signal set. Modify the program to
perform exhaustive search for a good TCM scheme with a four-state trellis with the
possibility of parallel branches.
7.19 Write a computer program that performs an exhaustive search in order to determine a rate
2/3 TCM encoder which is designed for a fading channel (maximize d_p²(L)). Assume that
there are four states in the trellis diagram and it is a fully connected trellis. The branches
of this trellis are labelled using the symbols from an 8-PSK signal set. List out the d_p²(L)
and L of some of the better codes found during the search.
7.20 Draw the family of curves depicting the relation between Pe and Leff for different values
of K (Rician Parameter) for
(a) High SNR,
(b) Low SNR.
Comment on the plots.
8
Cryptography
8.1 INTRODUCTION TO CRYPTOGRAPHY
Cryptography is the science of devising methods that allow information to be sent in a secure
form· in such a way that the only person able to retrieve this information is the intended
recipient. Encryption is based on algorithms that scramble information into unreadable or non-
discernible form. Decryption is the process of restoring the scrambled information to its original
form (see Fig. 8.1).
A Cryptosystem is a collection of algorithms and associated procedures for hiding and
revealing (un-hiding!) information. Cryptanalysis is the process (actually, the art) of analyzing
a cryptosystem, either to verify its integrity or to break it for ulterior motives. An attacker is a
person or system that performs unauthorised cryptanalysis in order to break a cryptosystem.
Attackers are also referred to as hackers, interlopers or eavesdroppers. The process of attacking
a cryptosystem is often called cracking.
The job of the cryptanalyst is to find the weaknesses in the cryptosystem. In many cases, the
developers of a cryptosystem announce a public challenge with a large prize-money for anyone
who can crack the scheme. Once a cryptosystem is broken (and the cryptanalyst discloses his
techniques), the designers of the scheme try to strengthen the algorithm. Just because a
cryptosystem has been broken does not render it useless. The hackers may have broken the
system under optimal conditions using equipment (fast computers, dedicated microprocessors,
etc.) that is usually not available to common people. Some cryptosystems are rated in terms of
the length of time and the price of the computing equipment it would take to break them!
In the last few decades, cryptographic algorithms, being mathematical in nature, have
become so advanced that they can only be handled by computers. This, in effect, means that the
uncoded message (prior to encryption) is binary in form, and can therefore be anything; a
picture, a voice, a text such as an e-mail or even a video.
Fig. 8.1 The Process of Encryption and Decryption.
Cryptography is not merely used for military and diplomatic communications as many
people tend to believe. In reality, cryptography has many commercial uses and applications.
From protecting confidential company information, to protecting a telephone call, to allowing
someone to order a product on the Internet without the fear of their credit card number being
intercepted and misused, cryptography is all about increasing the level of privacy of individuals
and groups. For example, cryptography is often used to prevent forgers from counterfeiting
winning lottery tickets. Each lottery ticket can have two numbers printed onto it, one plaintext
and one the corresponding cipher. Unless the counterfeiter has cryptanalyzed the lottery's
cryptosystem he or she will not be able to print an acceptable forgery.
The chapter is organized as follows. We begin with an overview of different encryption
techniques. We will, then, study the concept of secret-key cryptography. Some specific secret-
key cryptographic techniques will be discussed in detail. The public-key cryptography will be
introduced next. Two popular public-key cryptographic techniques, the RSA algorithm and
PGP, will be discussed in detail. A flavour of some other cryptographic techniques in use today
will also be given. The chapter will conclude with a discussion on cryptanalysis and the politics
of cryptography.
8.2 AN OVERVIEW OF ENCRYPTION TECHNIQUES
The goal of a cryptographic system is to provide a high level of confidentiality, integrity, non-
repudiability and authenticity to information that is exchanged over networks.
Confidentiality of messages and stored data is protected by hiding information using
encryption techniques. Message integrity ensures that a message remains unchanged from the
time it is created to the time it is opened by the recipient. Non-repudiation can provide a way of
proving that the message came from someone even if they try to deny it. Authentication
provides two services. First, it establishes beyond doubt the origin of the message. Second, it
verifies the identity of a user logging into a system and continues to verify their identity in case
someone tries to break into the system.
Definition 8.1 A message being sent is known as plaintext. The message is coded
using a Cryptographic Algorithm. This process is called encryption. An encrypted
message is known as ciphertext, and is turned back into plaintext by the process of
decryption.
It must be assumed that any eavesdropper has access to all communication between the
sender and the recipient. A method of encryption is only secure if even with this complete
access, the eavesdropper is still unable to recover the original plaintext from the ciphertext.
There is a big difference between security and obscurity. Suppose a message is left for
somebody in an airport locker, and the details of the airport and the locker number are known
only to the intended recipient; then this message is not secure, merely obscure. If, however, all
potential eavesdroppers know the exact location of the locker, and they still cannot open the
locker and access the message, then this message is secure.
Definition 8.2 A key is a value that causes a Cryptographic Algorithm to run in a
specific manner and produce a specific ciphertext as an output. The key size is usually
measured in bits. The bigger the key size, the more secure will be the algorithm.
Example 8.1 Suppose we have to encrypt and send the following stream of binary data (which
might be originating from voice, video, text or any other source)
0110001010011111 ...
We can use a 4-bit long key, x = 1011, to encrypt this bit stream. To perform encryption, the
plaintext (binary bit stream) is first subdivided into blocks of 4 bits.
0110 0010 1001 1111 ...
Each sub-block is XORed (binary addition) with the key, x = 1011. The encrypted message will be
1 1 0 1 1 0 0 1 0 0 1 0 0 1 0 0 ...
The recipient must also possess the knowledge of the key in order to decrypt the message. The
decryption is fairly simple in this case. The ciphertext (the received binary bit stream) is
first subdivided into blocks of 4 bits. Each sub-block is XORed with the key, x = 1011. The
decrypted message will be the original plaintext
0110 0010 1001 1111...
It should be noted that just one key is used both for encryption and decryption.
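The XOR operation of Example 8.1 is its own inverse, so the same routine serves for both encryption and decryption. A minimal sketch:

```python
def xor_cipher(bits, key):
    """Encrypt/decrypt a bit string by XORing successive blocks with the key
    (the operation is its own inverse), as in Example 8.1."""
    k = len(key)
    out = []
    for i in range(0, len(bits), k):
        block = bits[i:i + k]
        out.append(''.join(str(int(b) ^ int(c)) for b, c in zip(block, key)))
    return ''.join(out)

plaintext = '0110001010011111'
cipher = xor_cipher(plaintext, '1011')
print(cipher)                          # 1101100100100100
print(xor_cipher(cipher, '1011'))      # recovers 0110001010011111
```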
Example 8.2 Let us devise an algorithm for text messages, which we shall call character + x. Let
x = 5. In this encryption technique, we replace every alphabet by the fifth one following it, i.e., A
becomes F, B becomes G, C becomes H, and so on. The recipients of the encrypted message just
need to know the value of the key, x, in order to decipher the message. The key must be kept
separate from the encrypted message being sent. Because there is just one key which is used for
encryption and decryption, this kind of technique is called Symmetric Cryptography or Single
Key Cryptography or Secret Key Cryptography. The problem with this technique is that the
key has to be kept confidential. Also, the key must be changed from time to time to ensure secrecy
of transmission. This means that the secret key (or the set of keys) has to be communicated to the
recipient. This might be done physically.
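A minimal sketch of the character + x technique (a Caesar-type shift) is given below; the example message is, of course, hypothetical.

```python
def character_plus_x(text, x):
    """Shift each letter x places forward in the alphabet (Example 8.2);
    decryption uses a shift of -x. Non-alphabetic characters are unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + x) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

cipher = character_plus_x('ATTACK AT DAWN', 5)
print(cipher)                          # FYYFHP FY IFBS
print(character_plus_x(cipher, -5))    # ATTACK AT DAWN
```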
To get around this problem of communicating the key, the concept of Public Key
Cryptography was developed by Diffie and Hellman. This technique is also called
Asymmetric Encryption. The concept is simple. There are two keys, one is held privately and the
other one is made public. What one key can lock, the other key can unlock.
Example 8.3 Suppose we want to send an encrypted message to recipient A using the public key
encryption technique. To do so we will use the public key of the recipient A and use it to encrypt
the message. When the message is received, recipient A decrypts it with his private key. Only the
private key of recipient A can decrypt a message that has been encrypted with his public key.
Similarly, recipient B can only decrypt a message that has been encrypted with his public key.
Thus, no private key ever needs to be communicated and hence one does not have to trust any
communication channel to convey the keys.
Let us consider another scenario. Suppose we want to send somebody a message and also
provide a proof that the message is actually from us (a lot of harm can be done by providing
bogus information, or rather, misinformation!). In order to keep a message private and also
provide authentication (that it is indeed from us), we can perform a special encryption on the
plaintext with our private key, then encrypt it again with the public key of the recipient. The
recipient uses his private key to open the message and then uses our public key to verify the
authenticity. This technique is said to use Digital Signatures.
There is another important encryption technique called the One-way Function. It is a non-
reversible, quick encryption method. The encryption is easy and fast, but the decryption is not.
Suppose we send a document to recipient A and want to check at a later time whether the
document has been tampered with. We can do so by running a one-way function, which
produces a fixed-length value called a hash (also called the message digest). The hash is the
unique signature of the document that can be sent along with the document. Recipient A can
run the same one-way function to check whether the document has been altered.
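As an illustration, a hash-based tamper check might look like the following Python sketch; SHA-256 is an assumed choice here, since the text describes one-way functions in general.

import hashlib

def digest(document):
    # fixed-length "fingerprint" of the document
    return hashlib.sha256(document).hexdigest()

original = b"Contract: pay Rs 1000."
fingerprint = digest(original)               # sent or stored along with the document

# later, recipient A re-runs the same one-way function and compares
tampered = b"Contract: pay Rs 9000."
print(digest(original) == fingerprint)       # True  -> unchanged
print(digest(tampered) == fingerprint)       # False -> document was altered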
The actual mathematical function used to encrypt and decrypt messages is called a
Cryptographic Algorithm or cipher. This is only a part of the system used to send and
receive secure messages. This will become clearer as we discuss specific systems in detail.
As with most historical ciphers, the security of the message being sent relies on the algorithm
itself remaining secret. This technique is known as a Restricted Algorithm. It has the following
fundamental drawbacks.
(i) The algorithm obviously has to be restricted to only those people that you want to be able
to decode your message. Therefore a new algorithm must be invented for every discrete
group of users.
(ii) A large or changing group of users cannot utilise them, as every time one user leaves the
group, everyone must change the algorithm.
(iii) If the algorithm is compromised in any way, a new algorithm must be implemented.
Because of these drawbacks, Restricted Algorithms are no longer popular and have given
way to key-based algorithms.
Practically all modern cryptographic systems make use of a key. Algorithms that use a key
allow all details of the algorithm to be widely available. This is because all of the security lies in
the key. With a key-based algorithm the plaintext is encrypted and decrypted by the algorithm
which uses a certain key, and the resulting ciphertext is dependent on the key, and not the
algorithm. This means that an eavesdropper can have a complete copy of the algorithm in use,
but without the specific key used to encrypt that message, it is useless.
8.3 OPERATIONS USED BY ENCRYPTION ALGORITHMS
Although the methods of encryption/decryption have changed dramatically since the advent of
computers, there are still only two basic operations that can be carried out on a piece of
plaintext: substitution and transposition. The only real difference is that earlier these were
carried out on the letters of the alphabet, whereas nowadays they are carried out on binary bits.
Substitution
Substitution operations replace bits in the plaintext with other bits decided upon by the
algorithm, to produce ciphertext. This substitution then just has to be reversed to produce
plaintext from ciphertext. This can be made increasingly complicated. For instance one
plaintext character could correspond to one of a number of ciphertext characters (homophonic
substitution), or each character of plaintext is substituted by a character of corresponding
position in a length of another text (running cipher).
Example 8.4 Julius Caesar was one of the first to use substitution encryption to send messages to
troops during the war. The substitution method he invented advances each character three spaces in
the alphabet. Thus,
THIS IS SUBSTITUTION CIPHER    (8.1)
WKLV LV VXEVWLWXWLRQ FLSKHU    (8.2)
Transposition
Transposition (or permutation) does not alter any of the bits in plaintext, but instead moves
their positions around within it. If the resultant ciphertext is then put through more
transpositions, the end result is increased security.
XOR
XOR is an exclusive-or operation. It is a Boolean operator such that if exactly one of two bits is
true, then so is the result, but if both are true or both are false then the result is false. For example,
0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0    (8.3)
A surprising amount of commercial software uses simple XOR functions to provide security,
including the USA digital cellular telephone network and many office applications, and it is
trivial to crack. However the XOR operation, as will be seen later in this chapter, is a vital part of
many advanced Cryptographic Algorithms when performed between long blocks of bits that
also undergo substitution and/or transposition.
8.4 SYMMETRIC (SECRET KEY) CRYPTOGRAPHY
Symmetric Algorithms (or Single Key Algorithms or Secret Key Algorithms) have one
key that is used both to encrypt and decrypt the message, hence the name. In order for the
recipient to decrypt the message they need to have an identical copy of the key. This presents
one major problem, the distribution of the keys. Unless the recipient can meet the sender in
person and obtain a key, the key itself must be transmitted to the recipient, and is thus
susceptible to eavesdropping. However, single key algorithms are fast and efficient, especially if
large volumes of data need to be processed.
In Symmetric Cryptography, the two parties that exchange messages use the same algorithm.
Only the key is changed from time to time. The same plaintext with a different key results in a
different ciphertext. The encryption algorithm is available to the public, hence should be strong
and well-tested. The more powerful the algorithm, the less likely that an attacker will be able to
decrypt the resulting cipher.
The size of the key is critical in producing strong ciphertext. The US National Security
Agency, NSA stated in the mid-1990s that a 40-bit length was acceptable to them (i.e., they
could crack it sufficiently quickly!). Increasing processor speeds, combined with loosely-coupled
multi-processor configurations, have brought the ability to crack such short keys within the
reach of potential hackers. In 1998, it was suggested that in order to be strong, the key size needs
to be at least 56 bits long. It was argued by an expert group as early as 1996 that 90 bits is a more
appropriate length. Today, the most secure schemes use 128-bit keys or even longer keys.
Symmetric Cryptography provides a means of satisfying the requirement of message content
security, because the content cannot be read without the secret key. There remains a risk of
exposure, however, because neither party can be sure that the other party has not exposed the
secret key to a third party (whether accidentally or intentionally).
Symmetric Cryptography can also be used to address integrity and authentication
requirements. The sender creates a summary of the message, or Message Authentication
Code (MAC), encrypts it with the secret key, and sends that with the message. The recipient
then re-creates the MAC, decrypts the MAC that was sent, and compares the two. If they are
identical, then the message that was received must have been identical with that which was sent.
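One standard way to realise such a MAC today is an HMAC built from a hash function and the shared secret key; the following Python sketch (with an assumed key and message) illustrates the create, re-create and compare flow described above.

import hmac, hashlib

secret_key = b"shared-secret-key"            # known only to sender and recipient

def make_mac(message):
    return hmac.new(secret_key, message, hashlib.sha256).digest()

message = b"Transfer Rs 1000 to account 42"
tag = make_mac(message)                      # sent along with the message

# the recipient re-creates the MAC and compares the two
print(hmac.compare_digest(tag, make_mac(message)))                      # True
print(hmac.compare_digest(tag, make_mac(b"Transfer Rs 9000 to 42")))    # False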
As mentioned earlier, a major difficulty with symmetric schemes is that the secret key has to
be possessed by both parties, and hence has to be transmitted from whoever creates it to the
other party. Moreover, if the key is compromised, all of the message transmission security
measures are undermined. The steps taken to provide a secure mechanism for creating and
passing on the secret key are referred to as Key Management.
The technique does not adequately address the non-repudiation requirement, because both
parties have the same secret key. Hence each is exposed to the risk of fraudulent falsification of
a message by the other, and a claim by either party not to have sent a message is credible,
because the other may have compromised the key.
There are two types of Symmetric Algorithms-Block Ciphers and Stream Ciphers.
Definition 8.3 Block Ciphers usually operate on groups of bits called blocks. Each
block is processed a multiple number of times. In each round the key is applied in a
unique manner. The more the number of iterations, the longer the encryption
process, but it results in a more secure ciphertext.
Definition 8.4 Stream Ciphers operate on plaintext one bit at a time. Plaintext is
streamed as raw bits through the encryption algorithm. While a block cipher will
produce the same ciphertext from the same plaintext using the same key, a stream
cipher will not. The ciphertext produced by a stream cipher will vary under the same
conditions.
How long should a key be? There is no single answer to this question. It depends on the
specific situation. To determine how much security one needs, the following questions must be
answered:
(i) What is the worth of the data to be protected?
(ii) How long does it need to be secure?
(iii) What are the resources available to the cryptanalyst/hacker?
A customer list might be worth Rs 1000, advertisement data might be worth Rs 50,000 and
the master key for a digital cash system might be worth millions. In the world of stock markets,
the secrets have to be kept for a couple of minutes. In the newspaper business, today's secret is
tomorrow's headline. The census data of a country have to be kept secret for months (if not
years). Corporate trade secrets are interesting to rival companies and military secrets are
interesting to rival militaries. Thus, the security requirements can be specified in these terms.
For example, one may require that the key length must be such that there is a probability of
0.0001% that a hacker with the resources of Rs 1 million could break the system in 1 year,
assuming that the technology advances at a rate of 25% per annum over that period. The
minimum key requirements for different applications are listed in Table 8.1. This table should be
used as a guideline only.
Table 8.1 Minimum key requirements for different applications
Type of information               Lifetime         Minimum key length
Tactical military information     Minutes/hours    56-64 bits
Product announcements             Days/weeks       64 bits
Interest rates                    Days/weeks       64 bits
Trade secrets                     Decades          112 bits
Nuclear bomb secrets              > 50 years       128 bits
Identities of spies               > 50 years       128 bits
Personal affairs                  > 60 years       > 128 bits
Diplomatic embarrassments         > 70 years       > 128 bits
Future computing power is difficult to estimate. A rule of thumb is that the efficiency of
computing equipment divided by price doubles every 18 months, and increases by a factor of
10 every five years. Thus, in 50 years the fastest computer will be 10 billion times faster than
today's! These numbers refer to general-purpose computers. We cannot predict what kind of
specialized crypto-system breaking computers might be developed in the years to come.
Two symmetric algorithms, both block ciphers, will be discussed in this chapter. These are
the Data Encryption Standard (DES) and the International Data Encryption Algorithm
(IDEA).
8.5 DATA ENCRYPTION STANDARD (DES)
DES, an acronym for the Data Encryption Standard, is the name of the Federal Information
Processing Standard (FIPS) 46-3, which describes the Data Encryption Algorithm (DEA). The
DEA is also defined in the ANSI standard X9.32.
Created by IBM, DES came about due to a public request by the US National Bureau of
Standards (NBS) requesting proposals for a Standard Cryptographic Algorithm that satisfied
the following criteria:
(i) Provides a high level of security
(ii) The security depends on keys, not the secrecy of the algorithm
(iii) The security is capable of being evaluated
(iv) The algorithm is completely specified and easy to understand
(v) It is efficient to use and adaptable
(vi) Must be available to all users
(vii) Must be exportable
DEA is essentially an improvement of the 'Algorithm Lucifer' developed by IBM in the early
1970s. The US National Bureau of Standards published the Data Encryption Standard in 1975.
While the algorithm was basically designed by IBM, the NSA and NBS (now NIST) played a
substantial role in the final stages of the development. The DES has been extensively studied
since its publication and is the best known and the most widely used Symmetric Algorithm in
the world.
The DEA has a 64-bit block size and uses a 56-bit key during execution (8 parity bits are
stripped off from the full 64-bit key). The DEA is a Symmetric Cryptosystem, specifically a 16-
round Feistel Cipher and was originally designed for implementation in hardware. When used
for communication, both sender and receiver must know the same secret key, which can be
used to encrypt and decrypt the message, or to generate and verify a Message Authentication
Code (MAC). The DEA can also be used for single-user encryption, such as to store files on a
hard disk in encrypted form. In a multi-user environment, secure key distribution may be
difficult; public-key cryptography provides an ideal solution to this problem.
NIST re-certifies DES (FIPS 46-1, 46-2, 46-3) every five years. FIPS 46-3 reaffirms DES usage
as of October 1999, but single DES is permitted only for legacy systems. FIPS 46-3 includes a
definition of triple-DES (TDEA, corresponding to X9.52). Within a few years, DES and triple-
DES will be replaced with the Advanced Encryption Standard.
DES has now been in world-wide use for over 20 years, and the fact that it is a defined
standard means that any system implementing DES can communicate with any other system
using it. DES is used in banks and businesses all over the world, as well as in networks (as
Kerberos) and to protect the password file on UNIX Operating Systems (as CRYPT).
DES Encryption
DES is a symmetric, block-cipher algorithm with a key length of 64 bits, and a block size of 64
bits (i.e. the algorithm operates on successive 64 bit blocks of plaintext). Being symmetric, the
same key is used for encryption and decryption, and DES also uses the same algorithm for
encryption and decryption.
First a transposition is carried out according to a set table (the initial permutation), the 64-bit
plaintext block is then split into two 32-bit blocks, and 16 identical operations called rounds are
carried out on each half. The two halves are then joined back together, and the reverse of the
initial permutation carried out. The purpose of the first transposition is not clear, as it does not
affect the security of the algorithm, but is thought to be for the purpose of allowing plaintext and
ciphertext to be loaded into 8-bit chips in byte-sized pieces.
In any round, only one half of the original 64-bit block is operated on. The rounds alternate
between the two halves. One round in DES consists of the following.
Key Transformation
The 64-bit key is reduced to 56 bits by removing every eighth bit (these are sometimes used for
error checking). Sixteen different 48-bit subkeys are then created, one for each round. This is
achieved by splitting the 56-bit key into two halves, and then circularly shifting them left by 1 or
2 bits, depending on the round. After this, 48 of the bits are selected. Because they are shifted,
different groups of key bits are used in each subkey. This process is called a compression
permutation due to the transposition of the bits and the reduction of the overall size.
Expansion Permutation
After the key transformation, whichever half of the block is being operated on undergoes an
expansion permutation. In this operation, the expansion and transposition are achieved
simultaneously by allowing the 1st and 4th bits in each 4 bit block to appear twice in the output,
i.e., the 4th input bit becomes the 5th and 7th output bits (see Fig. 8.2).
The expansion permutation achieves 3 things: Firstly it increases the size of the half-block
from 32 bits to 48, the same number of bits as in the compressed key subset, which is important
as the next operation is to XOR the two together. Secondly, it produces a longer string of data
for the substitution operation that subsequently compresses it. Thirdly, and most importantly,
because in the subsequent substitutions the 1st and 4th bits appear in two S-boxes (described
shortly), they affect two substitutions. The effect of this is that the dependency of the output bits
on the input bits increases rapidly, and so, therefore, does the security of the algorithm.
Fig. 8.2 The Expansion Permutation.
XOR
The resulting 48-bit block is then XORed with the appropriate subkey for that round.
Substitution
The next operation is to perform substitutions on the expanded block. There are eight
substitution boxes, called S-boxes. The first S-box operates on the first 6 bits of the 48-bit
expanded block, the 2nd S-box on the next six, and so on. Each S-box operates from a table of 4
rows and 16 columns, each entry in the table is a 4-bit number. The 6-bit number the S-box
takes as input is used to look up the appropriate entry in the table in the following way. The 1st
and 6th bits are combined to form a 2-bit number corresponding to a row number, and the 2nd
to 5th bits are combined to form a 4-bit number corresponding to a particular column. The net
result of the substitution phase is eight 4-bit blocks that are then combined into a 32-bit block.
It is the non-linear relationship of the S-boxes that really provides DES with its security; all the
other processes within the DES algorithm are linear, and as such relatively easy to analyze.
Fig. 8.3 The S-box Substitution.
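The row/column indexing described above can be sketched as follows; the 4 x 16 table used here is a made-up placeholder, not an actual DES S-box.

def sbox_lookup(six_bits, table):
    # six_bits: string such as '011011'; table: 4 rows x 16 columns of 4-bit values
    row = int(six_bits[0] + six_bits[5], 2)   # 1st and 6th bits give the row (0..3)
    col = int(six_bits[1:5], 2)               # 2nd to 5th bits give the column (0..15)
    return format(table[row][col], '04b')     # 4-bit output

toy_table = [[(r + 2 * c) % 16 for c in range(16)] for r in range(4)]   # placeholder values
print(sbox_lookup('011011', toy_table))       # row 1, column 13 of the table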
Permutation
The 32-bit output of the substitution phase then undergoes a straightforward transposition using
a table sometimes known as the P-box.
After all the rounds have been completed, the two 'half-blocks' of 32 bits are recombined to
form a 64-bit output, the final permutation is performed on it, and the resulting 64-bit block is
the DES encrypted ciphertext of the input plaintext block.
DES Decryption
Decrypting DES is very easy (if one has the correct key!). Thanks to its design, the decryption
algorithm is identical to the encryption algorithm; the only alteration is that to decrypt DES
ciphertext, the subkeys used in each round are used in reverse, i.e., the 16th subkey is used
first.
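This property is easiest to see on a toy Feistel network (not DES itself): running the same routine with the subkeys in reverse order undoes the encryption. The round function and subkey values below are arbitrary examples.

def round_function(half, subkey):
    # a deliberately simple, non-cryptographic round function
    return ((half * 31) ^ subkey) & 0xFFFFFFFF

def feistel(block64, subkeys):
    left, right = block64 >> 32, block64 & 0xFFFFFFFF
    for k in subkeys:
        left, right = right, left ^ round_function(right, k)
    # swapping the halves at the end makes the same routine its own inverse
    return (right << 32) | left

subkeys = [0x1F2E3D4C, 0x55AA55AA, 0x0BADF00D, 0x12345678]
plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, subkeys)
recovered = feistel(ciphertext, list(reversed(subkeys)))   # subkeys in reverse order
print(hex(ciphertext), recovered == plaintext)             # ... True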
Security of DES
Unfortunately, with advances in the field of cryptanalysis and the huge increase in available
computing power, DES is no longer considered to be very secure. There are algorithms that can
be used to reduce the number of keys that need to be checked, but even using a straightforward
brute-force attack and just trying every single possible key, there are computers that can crack
DES in a matter of minutes. It is rumoured that the US National Security Agency (NSA) can
crack a DES encrypted message in 3-15 minutes.
If a time limit of 2 hours to crack a DES encrypted file is set, then you have to check all
possible keys (2^56) in two hours, which is roughly 5 trillion keys per second. While this may
seem like a huge number, consider that a $10 Application-Specific Integrated Circuit (ASIC)
chip can test 200 million keys per second, and many of these can be paralleled together. It is
suggested that a $10 million investment in ASICs would allow a computer to be built that would
be capable of breaking a DES encrypted message in 6 minutes.
DES can no longer be considered a sufficiently secure algorithm. If a DES-encrypted message
can be broken in minutes by supercomputers today, then the rapidly increasing power of
computers means that it will be a trivial matter to break DES encryption in the future (when a
message encrypted today may still need to be secure). An extension of DES called DESX is
considered to be virtually immune to an exhaustive key search.
8.6 INTERNATIONAL DATA ENCRYPTION ALGORITHM (IDEA)
IDEA was created in its first form by Xuejia Lai and James Massey in 1990, and was called the
Proposed Encryption Standard (PES). In 1991, Lai and Massey strengthened the algorithm
against differential cryptanalysis and called the result Improved PES (IPES). The name of IPES
was changed to International Data Encryption Algorithm (IDEA) in 1992. IDEA is perhaps best
known for its implementation in PGP (Pretty Good Privacy).
The Algorithm
IDEA is a symmetric, block-cipher algorithm with a key length of 128 bits, a block size of 64
bits, and as with DES, the same algorithm provides encryption and decryption.
IDEA consists of 8 rounds using 52 subkeys. Each round uses six subkeys, with the remaining
four being used for the output transformation. The subkeys are created as follows.
Firstly the 128-bit key is divided into eight 16-bit keys to provide the first eight subkeys. The
bits of the original key are then shifted 25 bits to the left, and then it is again split into eight
subkeys. This shifting and then splitting is repeated until all 52 subkeys (SK1-SK52) have been
created.
The 64-bit plaintext block is first split into four blocks (B1-B4). A round then consists of the
following steps (OB stands for output block):
OB1 = B1 * SK1 (multiply 1st sub-block with 1st subkey)
OB2 = B2 + SK2 (add 2nd sub-block to 2nd subkey)
OB3 = B3 + SK3 (add 3rd sub-block to 3rd subkey)
OB4 = B4 * SK4 (multiply 4th sub-block with 4th subkey)
OB5 = OB1 XOR OB3 (XOR results of steps 1 and 3)
OB6 = OB2 XOR OB4
OB7 = OB5 * SK5 (multiply result of step 5 with 5th subkey)
OB8 = OB6 + OB7 (add results of steps 6 and 7)
OB9 = OB8 * SK6 (multiply result of step 8 with 6th subkey)
OB10 = OB7 + OB9
OB11 = OB1 XOR OB9 (XOR results of steps 1 and 9)
OB12 = OB3 XOR OB9
OB13 = OB2 XOR OB10
OB14 = OB4 XOR OB10
The input to the next round is the four sub-blocks OB11, OB13, OB12, OB14 in that order.
After the eighth round, the four final output blocks (F1-F4) are used in a final transformation
to produce four sub-blocks of ciphertext (C1-C4) that are then rejoined to form the final 64-bit
block of ciphertext.
C1 = F1 * SK49
C2 = F2 + SK50
C3 = F3 + SK51
C4 = F4 * SK52
Ciphertext = C1 C2 C3 C4.
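A Python sketch of one round, following the step list above, is given below. In IDEA, '+' is addition modulo 2^16 and '*' is multiplication modulo 2^16 + 1 (with 0 standing for 2^16); the subkey and block values used here are arbitrary examples.

MOD_ADD, MOD_MUL = 1 << 16, (1 << 16) + 1

def mul(a, b):
    # multiplication modulo 2^16 + 1, with 0 representing 2^16
    a, b = a or MOD_ADD, b or MOD_ADD
    return (a * b) % MOD_MUL % MOD_ADD

def idea_round(B, SK):
    B1, B2, B3, B4 = B
    SK1, SK2, SK3, SK4, SK5, SK6 = SK
    OB1 = mul(B1, SK1)
    OB2 = (B2 + SK2) % MOD_ADD
    OB3 = (B3 + SK3) % MOD_ADD
    OB4 = mul(B4, SK4)
    OB5 = OB1 ^ OB3
    OB6 = OB2 ^ OB4
    OB7 = mul(OB5, SK5)
    OB8 = (OB6 + OB7) % MOD_ADD
    OB9 = mul(OB8, SK6)
    OB10 = (OB7 + OB9) % MOD_ADD
    OB11, OB12 = OB1 ^ OB9, OB3 ^ OB9
    OB13, OB14 = OB2 ^ OB10, OB4 ^ OB10
    return OB11, OB13, OB12, OB14            # order of the sub-blocks fed to the next round

print(idea_round((0x0123, 0x4567, 0x89AB, 0xCDEF), (1, 2, 3, 4, 5, 6)))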
Security Provided by IDEA
Not only is IDEA approximately twice as fast as DES, but it is also considerably more secure.
Using a brute-force approach, there are 2^128 possible keys. If a billion chips that could each test
1 billion keys a second were used to try and crack an IDEA-encrypted message, it would take
them 10^13 years, which is considerably longer than the age of the universe! Being a fairly new
algorithm, it is possible a better attack than brute-force will be found, which, when coupled with
much more powerful machines in the future may be able to crack a message. However, for a
long way into the future, IDEA seems to be an extremely secure cipher.
8.7 RC CIPHERS
The RC ciphers were designed by Ron Rivest for the RSA Data Security. RC stands for Ron's
Code or Rivest Cipher. RC2 was designed as a quick-fix replacement for DES that is more secure.
It is a block cipher with a variable key size that has a proprietary algorithm. RC2 is a variable-key-
length cipher. However, when using the Microsoft Base Cryptographic Provider, the key length
is hard-coded to 40 bits. When using the Microsoft Enhanced Cryptographic Provider, the key
length is 128 bits by default and can be in the range of 40 to 128 bits in 8-bit increments.
RC4 was developed by Ron Rivest in 1987. It is a variable-key-size stream cipher. The details
of the algorithm have not been officially published. The algorithm is extremely easy to describe
and program. Just like RC2, 40-bit RC4 is supported by the Microsoft Base Cryptographic
provider, and the Enhanced provider allows keys in the range of 40 to 128 bits in 8-bit
increments.
RC5 is a block cipher designed for speed. The block size, key size and the number of
iterations are all variables. In particular, the key size can be as large as 2,048 bits.
All the encryption techniques discussed so far belong to the class of symmetric cryptography
(DES, IDEA and RC Ciphers). We now look at the class of Asymmetric Cryptographic
Techniques.
8.8 ASYMMETRIC (PUBLIC-KEY) ALGORITHMS
Public-key Algorithms are asymmetric, that is to say the key that is used to encrypt the
message is different from the key used to decrypt the message. The encryption key, known as
the public key is used to encrypt a message, but the message can only be decoded by the person
that has the decryption key, known as the private key.
This type of algorithm has a number of advantages over traditional symmetric ciphers. It
means that the recipient can make their public key widely available - anyone wanting to send
them a message uses the algorithm and the recipient's public key to do so. An eavesdropper
may have both the algorithm and the public key, but will still not be able to decrypt the message.
Only the recipient, with their private key can decrypt the message.
A disadvantage of public-key algorithms is that they are more computationally intensive than
symmetric algorithms, and therefore encryption and decryption take longer. This may not be
significant for a short text message, but certainly is for long messages or audio/video.
The Public-Key Cryptography Standards (PKCS) are specifications produced by RSA
Laboratories in cooperation with secure systems developers worldwide for the purpose of
accelerating the deployment of public-key cryptography. First published in 1991 as a result of
meetings with a small group of early adopters of public-key technology, the PKCS documents
have become widely referenced and implemented. Contributions from the PKCS series have
become part of many formal and de facto standards, including ANSI X9 documents, PKIX,
SET, S/MIME, and SSL.
The next two sections describe two popular public-key algorithms, the RSA Algorithm and
the Pretty Good Privacy (PGP) Hybrid Algorithm.
8.9 THE RSA ALGORITHM
RSA, named after its three creators-Rivest, Shamir and Adleman, was the first effective public-
key algorithm, and for years has withstood intense scrutiny by cryptanalysts all over the world.
Unlike symmetric key algorithms, where, as long as one presumes that an algorithm is not
flawed, the security relies on having to try all possible keys, public-key algorithms rely on it
being computationally unfeasible to recover the private key from the public key.
RSA relies on the fact that it is easy to multiply two large prime numbers together, but
extremely hard (i.e. time consuming) to factor them back from the result. Factoring a number
means finding its prime factors, which are the prime numbers that need to be multiplied
together in order to produce that number. For example,
10 = 2 × 5
60 = 2 × 2 × 3 × 5
2^113 - 1 = 3391 × 23279 × 65993 × 1868569 × 1066818132868207
The algorithm
Two very large prime numbers, normally of equal length, are randomly chosen and then
multiplied together.
N = A × B    (8.4)
T = (A - 1) × (B - 1)    (8.5)
A third number is then also chosen randomly as the public key (E) such that it has no common
factors (i.e. is relatively prime) with T. The private key (D) is then:
D = E^(-1) mod T    (8.6)
To encrypt a block of plaintext (M) into ciphertext (C):
C = M^E mod N    (8.7)
To decrypt:
M = C^D mod N    (8.8)
Example 8.5 Consider the following implementation of the RSA algorithm.
1st prime (A) = 37
2nd prime (B) = 23
So,
N = 37 × 23 = 851
T = (37 - 1) × (23 - 1) = 36 × 22 = 792
E must have no factors other than 1 in common with 792.
E (public key) could be 5.
D (private key) = 5^(-1) mod 792 = 317
To encrypt a message (M) of the character 'G':
If G is represented as 7 (7th letter in alphabet), then M = 7.
C (ciphertext) = 7^5 mod 851 = 638
To decrypt: M = 638^317 mod 851 = 7.
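The numbers in Example 8.5 can be checked with a few lines of Python, using the built-in three-argument pow for modular exponentiation (and, in Python 3.8+, for the modular inverse).

A, B = 37, 23
N = A * B                        # 851
T = (A - 1) * (B - 1)            # 792
E = 5                            # public key, relatively prime to T
D = pow(E, -1, T)                # 317, the modular inverse of E mod T (Python 3.8+)

M = 7                            # 'G', the 7th letter of the alphabet
C = pow(M, E, N)                 # 7^5 mod 851 = 638
print(N, T, D, C, pow(C, D, N))  # 851 792 317 638 7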
Security of RSA
The security of the RSA algorithm depends on the ability of the hacker to factorize numbers. New,
faster and better methods for factoring numbers are constantly being devised. The current best
for long numbers is the Number Field Sieve. Numbers of a length that was unimaginable a
mere decade ago are now factored easily. Obviously the longer a number is, the harder it is to
factor, and so the better the security of RSA. As theory and computers improve, larger and
larger keys will have to be used. The disadvantage in using extremely long keys is the
computational overhead involved in encryption/decryption. This will only become a problem
if a new factoring technique emerges that requires keys of such lengths to be used that necessary
key length increases much faster than the increasing average speed of computers utilising the
RSA algorithm.
In 1997, a specific assessment of the security of 512-bit RSA keys showed that one may be
factored for less than $1,000,000 in cost and eight months of effort. It is therefore believed that
512-bit keys provide insufficient security for anything other than short-term needs. RSA
Laboratories currently recommend key sizes of 768 bits for personal use, 1024 bits for corporate
use, and 2048 bits for extremely valuable keys like the root-key pair used by a certifying
authority. Security can be increased by changing a user's keys regularly and it is typical for a
user's key to expire after two years (the opportunity to change keys also allows for a longer
length key to be chosen).
Even without using huge keys, RSA is about 1000 times slower to encrypt/decrypt than DES.
This has resulted in it not being widely used as a stand-alone cryptography system. However, it
is used in many hybrid cryptosystems such as PGP. The basic principle of hybrid systems is to
encrypt plaintext with a Symmetric Algorithm (usually DES or IDEA); the symmetric
algorithm's key is then itself encrypted with a public-key algorithm such as RSA. The RSA
encrypted key and symmetric algorithm-encrypted message are then sent to the recipient, who
uses his private RSA key to decrypt the Symmetric Algorithm's key, and then that key to decrypt
the message. This is considerably faster than using RSA throughout, and allows a different
symmetric key to be used each time, considerably enhancing the security of the Symmetric
Algorithm.
RSA's future security relies· solely on advances in factoring techniques. Barring an
astronomical increase in the efficiency of factoring techniques, or available computing power,
the 2048-bit key will ensure very secure protection into the foreseeable future. For instance an
Intel Paragon, which can achieve 50,000 MIPS (million instructions per second), would take a
million years to factor a 2048-bit key using current techniques.
8.10 PRETTY GOOD PRIVACY (PGP)
Pretty Good Privacy (PGP) is a hybrid cryptosystem that was created by Phil Zimmerman and
released onto the Internet as a freeware program in 1991. PGP is not a new algorithm in its own
right, but rather a series of other algorithms that are performed along with a sophisticated
protocol. PGP's intended use was for e-mail security, but there is no reason why the basic
principles behind it could not be applied to any type of transmission.
PGP and its source code is freely available on the Internet. This means that since its creation
PGP has been subjected to an enormous amount of scrutiny by cryptanalysts, who have yet to
find an exploitable fault in it.
PGP has four main modules: a symmetric cipher- IDEA for message encryption, a public
key algorithm-RSA to encrypt the IDEA key and hash values, a one-way hash function-MD5
for signing, and a random number generator.
The fact that the body of the message is encrypted with a symmetric algorithm (IDEA) means
that PGP generated e-mails are a lot faster to encrypt and decrypt than ones using simple RSA.
The key for the IDEA module is randomly generated each time as a one-off session key, this
makes PGP very secure, as even if one message was cracked, all previous and subsequent
messages would remain secure. This session key is then encrypted with the public key of the
recipient using RSA. Given that keys up to 2048 bits long can be used, this is extremely secure.
MD5 can be used to produce a hash of the message, which can then be signed by the sender's
private key. Another feature of PGP's security is that the user's private key is encrypted using a
hashed pass-phrase rather than simply a password, making the private key extremely resistant
to copying even with access to the user's computer.
Generating true random numbers on a computer is notoriously hard. PGP tries to achieve
randomness by making use of the keyboard latency when the user is typing. This means that the
program measures the gap of time between each key-press. Whilst at first this may seem to be
distinctly non-random, it is actually fairly effective-people take longer to hit some keys than
others, pause for thought, make mistakes and vary their overall typing speed on all sorts of
factors such as knowledge of the subject and tiredness. These measurements are not actually
used directly but used to trigger a pseudo-random number generator. There are other ways of
generating random numbers, but to be much better than this gets very complex.
PGP uses a very clever, but complex, protocol for key management. Each user generates and
distributes their public key. If James is happy that a person's public key belongs to who it claims
to belong to, then he can sign that person's public key and James's program will then accept
messages from that person as valid. The user can allocate levels of trust to other users. For
instance, James may decide that he completely trusts Earl to sign other peoples' keys, in effect
saying "his word is good enough for me". This means that if Rachel, who has had her key signed
by Earl, wants to communicate with James, she sends James her signed key. James's program
recognises Earl's signature, has been told that Earl can be trusted to sign keys, and so accepts
Rachel's key as valid. In effect Earl has introduced Rachel to James.
PGP allows many levels of trust to be assigned to people, and this is best illustrated in Fig. 8.4.
The explanations are as follows.
1st line
James has signed the keys of Earl, Sarah, Jacob and Kate. James completely trusts Earl to sign
other peoples' keys, does not trust Sarah at all, and partially trusts Jacob and Kate (he trusts
Jacob more than Kate).
Fig. 8.4 An Example of a PGP User Web. (Legend: fully trusted; partially trusted; partially
trusted to a lesser degree; not validated; key validated directly or by introduction. Level 1:
people with keys signed by James; Level 2: people with keys signed by those on Level 1;
Level 3: people with keys signed by those on Level 2.)
2nd line
Although James has not signed Sam's key, he still trusts Sam to sign other peoples' keys, maybe
on Bob's say so or due to them actually meeting. Because Earl has signed Rachel's key, Rachel
is validated (but not trusted to sign keys). Even though Bob's key is signed by Sarah and Jacob,
because Sarah is not trusted and Jacob only partially trusted, Bob is not validated. Two partially
trusted people, Jacob and Kate, have signed Archie's key, therefore Archie is validated.
3rd line
Sam, who is fully trusted, has signed Hal's key, therefore Hal is validated. Louise's key has been
signed by Rachel and Bob, neither of whom is trusted, therefore Louise is not validated.
Odd one out
Mike's key has not been signed by anyone in James' group, maybe James found it on the
Internet and does not know whether it is genuine or not.
PGP never prevents the user from sending or receiving e-mail, it does however warn the user
if a key is not validated, and the decision is then up to the user as to whether to heed the warning
or not.
Key Revocation
If a user's private key is compromised then they can send out a key revocation certificate.
Unfortunately this does not guarantee that everyone with that user's public key will receive it, as
keys are often swapped in a disorganised manner. Additionally, if the user no longer has the
private key then they cannot issue a certificate, as the key is required to sign it.
Security of PGP
"A chain is only as strong as its weakest link" is the saying and it holds true for PGP. If the user
chooses a 40-bit RSA key to encrypt his session keys and never validates any users, then PGP
will not be very secure. If however a 2048-bit RSA key is chosen and the user is reasonably
vigilant, then PGP is the closest thing to military-grade encryption the public can hope to get
their hands on.
The Deputy Director of the NSA was quoted as saying:
"If all the personal computers in the world, an estimated 260 million, were put to work on a single PGP-
encrypted message, it would still take an estimated 72 million times the age of the universe, on average, to
break a single message."
A disadvantage of public-key cryptography is that anyone can send you a message using your
public key, it is then necessary to prove that this message came from who it claims to have been
sent by. A message encrypted by someone's private key, can be decrypted by anyone with their
public key. This means that if the sender encrypted a message with his private key, and then
encrypted the resulting ciphertext with the recipient's public key, the recipient would be able to
decrypt the message with first their private key, and then the sender's public key, thus
recovering the message and proving it came from the correct sender.
This process is very time-consuming, and therefore rarely used. A much more common
method of digitally signing a message is using a method called One-Way Hashing.
8.11 ONE-WAY HASHING
A One-Way Hash Function is a mathematical function that takes a message string of any
length (pre-string) and returns a smaller fixed-length string (hash value). These functions are
designed in such a way that not only is it very difficult to deduce the message from its hashed
version, but also that even given that all hashes are a certain length, it is extremely hard to find
two messages that hash to the same value. In fact, to find two messages with the same hash from
a 128-bit hash function, 2^64 hashes would have to be tried. In other words, the hash value of a
file is a small unique 'fingerprint'. Even a slight change in an input string should cause the hash
value to change drastically. Even if 1 bit is flipped in the input string, at least half of the bits in
the hash value will flip as a result. This is called an Avalanche Effect.
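The avalanche effect is easy to observe experimentally; the following Python sketch uses SHA-256 (an assumed choice, since the discussion is about hash functions in general) and counts how many output bits change when a single input bit is flipped.

import hashlib

def hash_bits(data):
    return int.from_bytes(hashlib.sha256(data).digest(), 'big')

msg = bytearray(b"information theory")
h1 = hash_bits(bytes(msg))
msg[0] ^= 0x01                      # flip a single bit of the input
h2 = hash_bits(bytes(msg))
print(bin(h1 ^ h2).count('1'))      # typically close to 128 of the 256 output bits differ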
If H = hash value, f = hash function, and M = original message/pre-string, then
H = f(M)    (8.9)
If you know M then H is easy to compute. However, knowing H and f, it is not easy to compute
M, and it is hopefully computationally unfeasible.
As long as there is a low risk of collision (i.e. 2 messages hashing to the same value), and the
hash is very hard to reverse, then a one-way hash function proves extremely useful for a number
of aspects of cryptography.
If you one-way hash a message, the result will be a much shorter but still unique (at least
statistically) number. This can be used as proof of ownership of a message without having to
reveal the contents of the actual message. For instance rather than keeping a database of
copyrighted documents, if just the hash values of each document were stored, then not only
would this save a lot of space, but it would also provide a great deal of security. If copyright then
needs to be proved, the owner could produce the original document and prove it hashes to that
value.
Hash-functions can also be used to prove that no changes have been made to a file, as adding
even one character to a file would completely change its hash value.
By far the most common use of hash functions is to digitally sign messages. The sender
performs a one-way hash on the plaintext message, encrypts it with his private key and then
encrypts both with the recipient's public key and sends in the usual way. On decrypting the
ciphertext, the recipient can use the sender's public key to decrypt the hash value, he can then
perform a one-way hash himself on the plaintext message, and check this with the one he has
received. If the hash values are identical, the recipient knows not only that the message came
from the correct sender, as it used their private key to encrypt the hash, but also that the
plaintext message is completely authentic as it hashes to the same value.
The above method is greatly preferable to encrypting the whole message with a private key,
as the hash of a message will normally be considerably smaller than the message itself. This
means that it will not significantly slow down the decryption process in the same way that
decrypting the entire message with the sender's public key, and then decrypting it again with
the recipient's private key would. The PGP system uses the MD5 hash function for precisely this
purpose.
The Microsoft Cryptographic Providers support three hash algorithms: MD4, MD5 and
SHA. Both MD4 and MD5 were invented by Ron Rivest. MD stands for Message Digest. Both
algorithms produce 128-bit hash values. MD5 is an improved version of MD4. SHA stands for
Secure Hash Algorithm. It was designed by NIST and NSA. SHA produces 160-bit hash values,
longer than MD4 and MD5. SHA is generally considered more secure than other algorithms
and is the recommended hash algorithm.
8.12 OTHER TECHNIQUES
One Time Pads
The one-time pad was invented by Major Joseph Mauborgne and Gilbert Vernam in 1917, and
is an unconditionally secure (i.e. unbreakable) algorithm. The theory behind a one-time pad is
simple. The pad is a non-repeating random string of letters. Each letter on the pad is used once
only to encrypt one corresponding plaintext character. After use, the pad must never be re-used.
As long as the pad remains secure, so is the message. This is because a random key added to a
non-random message produces completely random ciphertext, and there is absolutely no
amount of analysis or computation that can alter that. If both pads are destroyed then the
original message will never be recovered. There are two major drawbacks:
Firstly, it is extremely hard to generate truly random numbers, and a pad that has even a
couple of non-random properties is theoretically breakable. Secondly, because the pad can
never be reused no matter how large it is, the length of the pad must be the same as the length
of the message which is fine for text, but virtually impossible for video.
Steganography
Steganography is not actually a method of encrypting messages, but hiding them within
something else to enable them to pass undetected. Traditionally this was achieved with invisible
ink, microfilm or taking the first letter from each word of a message. This is now achieved by
hiding the message within a graphics or sound file. For instance in a 256-greyscale image, if the
least significant bit of each byte is replaced with a bit from the message then the result will be
indistinguishable to the human eye. An eavesdropper will not even realise a message is being
sent. This is not cryptography however, and although it would fool a human, a computer would
be able to detect this very quickly and reproduce the original message.
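The least-significant-bit idea can be sketched as follows; the image is represented simply as a list of greyscale byte values, and the pixel and message values are made-up examples.

def embed(pixels, message_bits):
    # replace the least significant bit of each pixel with one message bit
    stego = [(p & 0xFE) | b for p, b in zip(pixels, message_bits)]
    return stego + pixels[len(message_bits):]

def extract(pixels, n_bits):
    return [p & 1 for p in pixels[:n_bits]]

cover = [120, 121, 119, 200, 201, 202, 13, 14]   # toy greyscale pixel values
secret = [1, 0, 1, 1, 0, 1]
stego = embed(cover, secret)
print(stego)                    # each pixel value changes by at most 1
print(extract(stego, 6))        # [1, 0, 1, 1, 0, 1]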
Secure Mail and S/MIME
Secure Multipurpose Internet Mail Extensions (S/MIME) is a de facto standard developed by
RSA Data Security, Inc., for sending secure mail based on public-key cryptography. MIME is
the industry standard format for electronic mail, which defines the structure of the message's
body. S/MIME-supporting e-mail applications add digital signatures and encryption capabilities
to that format to ensure message integrity, data origin authentication and confidentiality of
electronic mail.
When a signed message is sent, a detached signature in the PKCS #7 format is sent along
with the message as an attachment. The signature attachment contains the hash of the original
message signed with the sender's private key, as well as the signer certificate. S/MIME also
supports messages that are first signed with the sender's private key and then enveloped using
the recipients' public keys.
8.13 SECURE COMMUNICATION USING CHAOS FUNCTIONS
Chaos functions have also been used for secure communications and cryptographic
applications. By a chaos function we mean here an iterative difference equation that
exhibits chaotic behaviour. Since cryptography has more to do with unpredictability
than with randomness, chaos functions are a good choice because of their
property of unpredictability. If a hacker intercepts part of the sequence, he will have no
information on how to predict what comes next. The unpredictability of chaos functions makes
them a good choice for generating the keys for symmetric cryptography.
Example 8.6 Consider the difference equation
x_(n+1) = a x_n (1 - x_n)    (8.10)
For a = 4, this function behaves like a chaos function, i.e.,
(i) the values obtained by successive iterations are unpredictable, and
(ii) the function is extremely sensitive to the initial condition, x_0.
For any given initial condition, this function will generate values of x_n between 0 and 1 for each
iteration. These values are good candidates for key generation. In single-key cryptography, a key is
used for enciphering the message. This key is usually a pseudo noise (PN) sequence. The message
can be simply XORed with the key in order to scramble it. Since x_n takes positive values that are
always less than unity, the binary equivalent of these fractions can serve as keys. Thus, one of the
ways of generating keys from these random, unpredictable decimal numbers is to directly use their
binary representation. The lengths of these binary sequences will be limited only by the accuracy of
the decimal numbers, and hence very long binary keys can be generated. The recipient must know
the initial condition in order to generate the keys for decryption.
For application in single-key cryptography the following two factors need to be decided:
(i) The start value for the iterations (x_0), and
(ii) The number of decimal places of the mantissa that are to be supported by the calculating
machine (to avoid round-off error).
For single-key cryptography, the chaos values obtained after some number of iterations are
converted to binary fractions whose first 64 bits are taken to generate PN sequences. These
initial iterations would make it still more difficult for the hacker to guess the initial condition.
The starting value should be taken between 0 and 1. A good choice of the starting value can
improve the performance slightly.
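A sketch of this key-generation procedure for the logistic map of Example 8.6 is given below. The starting value and the number of warm-up iterations are arbitrary examples, and, as noted above, a practical implementation would need higher-precision arithmetic than ordinary floating point.

def chaos_key(x0, warmup=1000, n_bits=64):
    x = x0
    for _ in range(warmup):              # initial iterations make x0 harder to guess
        x = 4.0 * x * (1.0 - x)
    bits = []
    frac = x                             # binary fraction of the chaos value: 0.b1 b2 b3 ...
    for _ in range(n_bits):
        frac *= 2.0
        if frac >= 1.0:
            bits.append('1')
            frac -= 1.0
        else:
            bits.append('0')
    # note: double precision carries only about 52 significant bits, so a real
    # implementation would use higher-precision arithmetic, as discussed above
    return ''.join(bits)

print(chaos_key(0.3141592653589793))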
The secrecy of the starting number, x_0, is the key to the success of this algorithm. Since chaos
functions are extremely sensitive to even errors of 10^(-30) in the starting number (x_0), it means
that we can have 10^30 unique starting combinations. Therefore, a hacker who knows the chaos
function and the encryption algorithm has to try out 10^30 different start combinations. In the
DES algorithm the hacker had to try out approximately 10^19 different key values.
Chaos based algorithms require a high computational overhead to generate the chaos values
as well as high computational speeds. Hence, they might not be suitable for bulk data encryption.
8.14 CRYPTANALYSIS
Cryptanalysis is the science (or black art!) of recovering the plaintext of a message from the
ciphertext without access to the key. In cryptanalysis, it is always assumed that the cryptanalyst
has full access to the algorithm. An attempted cryptanalysis is known as an attack, of which
there are five major types:
• Brute-force attack: This technique requires a large amount of computing power and a large
amount of time to run. It consists of trying all possibilities in a logical manner until the
correct one is found. For the majority of encryption algorithms a brute force attack is
impractical due to the large number of possibilities.
• Ciphertext-only: The only information the cryptanalyst has to work with is the ciphertext
of various messages all encrypted with the same algorithm.
• Known-plaintext: In this scenario, the cryptanalyst has access not only to the ciphertext of
various messages, but also to the corresponding plaintext.
• Chosen-plaintext: The cryptanalyst has access to the same information as in a known
plaintext attack, but this time may choose the plaintext that gets encrypted. This attack is
more powerful, as specific plaintext blocks can be chosen that may yield more
information about the key. An adaptive-chosen-plaintext attack is merely one where the
cryptanalyst may repeatedly encrypt plaintext, thereby modifying the input based on the
results of a previous encryption.
• Chosen-ciphertext: The cryptanalyst uses a relatively new technique called differential
cryptanalysis, which is an interactive and iterative process. It works through many rounds
using the results from previous rounds, until the key is identified. The cryptanalyst
repeatedly chooses ciphertext to be decrypted, and has access to the resulting plaintext.
From this they try to deduce the key.
There is only one totally secure algorithm, the one-time pad. All other algorithms can be
broken given infinite time and resources. Modern cryptography relies on making it
computationally unfeasible to break an algorithm. This means, that while it is theoretically
possible, the time scale and resources involved make it completely unrealistic.
If an algorithm is presumed to be perfect, then the only method of breaking it relies on trying
every possible key combination until the resulting ciphertext makes sense. As mentioned above,
this type of attack is called a brute-force attack. The field of parallel computing is perfectly
suited to the task of brute force attacks, as every processor can be given a number of possible
keys to try, and they do not need to interact with each other at all except to announce the result.
A technique that is becoming increasingly popular is parallel processing using thousands of
individual computers connected to the Internet. This is known as distributed computing. Many
cryptographers believe that brute force attacks are basically ineffective when long keys are
used. An encryption algorithm with a large key (over 100 bits) can take millions of years to
crack, even with powerful, networked computers of today. Besides, adding a single extra key bit
doubles the cost of performing a brute force cryptanalysis.
Regarding brute force attack, there are a couple of other pertinent questions. What if the
original plaintext is itself a cipher? In that case, how will the hacker know if he has found the
right key? In addition, is the cryptanalyst sitting at the computer and watching the result of each
key that is being tested? Thus, we can assume that brute force attack is impossible provided long
enough keys are being used.
Here are some of the techniques that have been used by cryptanalysts to attack ciphertext.
• Differential cryptanalysis: As mentioned before, this technique uses an iterative process to
evaluate a cipher that has been generated using an iterative block algorithm (e.g. DES).
Related plaintext is encrypted using the same key. The difference is analysed. This
technique proved successful against DES and some hash functions.
• Linear Cryptanalysis: In this, pairs of plaintext and ciphertext are analysed and a linear
approximation technique is used to determine the behaviour of the block cipher. This
technique was also used successfully against DES.
• Algebraic attack: This technique exploits the mathematical structure in block ciphers. If
the structure exists, a single encryption with one key might produce the same result as a
double encryption with two different keys. Thus the search time can be reduced.
However strong or weak the algorithm used to encrypt it, a message can be thought of as
secure if the time and/or resources needed to recover the plaintext greatly exceed the benefits
bestowed by having the contents. This could be because the cost involved is greater than the
financial value of the message, or simply that by the time the plaintext is recovered the contents
will be outdated.
8. 15 POLITICS OF CRYPTOGRAPHY
Widespread use of cryptosystems is something most governments are not particularly happy
about-precisely because it threatens to give more privacy to the individual, including criminals.
For many years, police forces have been able to tap phone lines and intercept mail, however, in
an encrypted future that may become impossible.
This has led to some strange decisions on the part of governments, particularly the United
States government. In the United States, cryptography is classified as a munition and the export
of programs containing cryptosystems is tightly controlled. In 1992, the Software Publishers
Association reached agreement with the State Department to allow the export of software that
contained RSA's RC2 and RC4 encryption algorithms, but only if the key size was limited to 40
bits as opposed to the 128-bit keys available for use within the US. This significantly reduced the
level of privacy produced. In 1993 the US Congress had asked the National Research Council
to study US cryptographic policy. Its 1996 report, the result of two years' work, offered the
following conclusions and recommendations:
• "On balance, the advantages of more widespread use of cryptography outweigh the
disadvantages."
• "No law should bar the manufacture, sale or use of any form of encryption within the
United States."
• "Export controls on cryptography should be progressively relaxed but not eliminated."
In 1997 the limit on the key size was increased to 56 bits. The US government has proposed
several methods whereby it would allow the export of stronger encryption, all based on a system
where the US government could gain access to the keys if necessary, for example the clipper
chip. Recently there has been a lot of protest from the cryptographic community against the US
government imposing restrictions on the development of cryptographic techniques. The article
by Ronald L. Rivest, Professor, MIT, in the October 1998 issue of the Scientific American (pages
116-117), titled "The Case against Regulating Encryption Technology," is an example of such a
protest. The resolution of this issue is regarded to be one of the most important for the future of
e-commerce.
8.16 CONCLUDING REMARKS
In this section we present a brief history of cryptography. People have tried to conceal
information in written form since writing was developed. Examples survive in stone inscriptions
and papyruses showing that many ancient civilizations including the Egyptians, Hebrews and
Assyrians all developed cryptographic systems. The first recorded use of cryptography for
correspondence was by the Spartans who (as early as 400 BC) employed a cipher device called
a scytale to send secret communications between military commanders.
The scytale consisted of a tapered baton around which was wrapped a piece of parchment
inscribed with the message. Once unwrapped the parchment appeared to contain an
incomprehensible set of letters, however when wrapped around another baton of identical size
the original text appears.
The Greeks were therefore the inventors of the first transposition cipher and in the fourth
century BC the earliest treatise on the subject was written by a Greek, Aeneas Tacticus, as part
of a work entitled On the Defence of Fortifications. Another Greek, Polybius, later devised a means
of encoding letters into pairs of symbols using a device known as the Polybius checkerboard, which
contains many elements common to later encryption systems. In addition to the Greeks there
are similar examples of primitive substitution or transposition ciphers in use by other
civilizations including the Romans. The Polybius checkerboard consists of a five by five grid
containing all the letters of the alphabet. Each letter is converted into two numbers, the first is
the row in which the letter can be found and the second is the column. Hence the letter A
becomes 11, the letter B 12 and so forth.
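The checkerboard encoding is easy to sketch in code. The following is a minimal illustration
(not from the book); it assumes the common convention of merging I and J so that 26 letters
fit into the 25 cells, a detail the text does not specify.

```python
# Sketch of the Polybius checkerboard: each letter -> (row, column) pair.
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"   # 25 letters; J merged with I (assumed convention)

def polybius_encode(text):
    pairs = []
    for ch in text.upper():
        if not ch.isalpha():
            continue                      # skip spaces and punctuation
        ch = "I" if ch == "J" else ch
        idx = ALPHABET.index(ch)
        row, col = idx // 5 + 1, idx % 5 + 1   # rows and columns numbered 1..5
        pairs.append(str(row) + str(col))
    return " ".join(pairs)

print(polybius_encode("AB"))              # prints '11 12', as in the text
```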
The Arabs were the first people to clearly understand the principles of cryptography. They
devised and used both substitution and transposition ciphers and discovered the use of letter
frequency distributions in cryptanalysis. As a result of this, by approximately 1412,
al-Kalkashandi could include in his encyclopaedia Subh al-a'sha a respectable, if elementary,
treatment of several cryptographic systems. He also gave explicit instructions on how to
cryptanalyze ciphertext using letter frequency counts, including examples illustrating the technique.
European cryptography dates from the Middle Ages, during which it was developed by the
Papal and Italian city states. The earliest ciphers involved only vowel substitution (leaving the
consonants unchanged). Circa 1379 the first European manual on cryptography, consisting of a
compilation of ciphers, was produced by Gabriele de Lavinde of Parma, who served Pope
Clement VII. This manual contains a set of keys for correspondents and uses symbols for letters
and nulls, with several two-character code equivalents for words and names. The first brief code
vocabularies, called nomenclators, were expanded gradually and for several centuries were the
mainstay of diplomatic communication for nearly all European governments. In 1470 Leon
Battista Alberti described the first cipher disk in Trattati in cifra, and the Traicté des chiffres,
published in 1586 by Blaise de Vigenère, contained a square table commonly attributed to him
as well as descriptions of the first plaintext and ciphertext autokey systems.
By 1860 large codes were in common use for diplomatic communications, and cipher systems
had become a rarity for this application. However, cipher systems prevailed for military
communications (except for high-command communication, because of the difficulty of
protecting codebooks from capture or compromise). During the US Civil War the Federal Army
extensively used transposition ciphers. The Confederate Army primarily used the Vigenère
cipher and, on occasion, monoalphabetic substitution. While the Union cryptanalysts solved
most of the intercepted Confederate ciphers, the Confederacy, in desperation, sometimes
published Union ciphers in newspapers, appealing for help from readers in cryptanalysing
them.
During the First World War both sides employed cipher systems almost exclusively for tactical
communication, while code systems were still used mainly for high-command and diplomatic
communication. Although field cipher systems such as the US Signal Corps cipher disk lacked
sophistication, some complicated cipher systems were used for high-level communications by
the end of the war. The most famous of these was the German ADFGVX fractionation cipher.
In the 1920s the maturing of mechanical and electromechanical technology came together
with the needs of telegraphy and radio to bring about a revolution in cryptodevices: the
development of rotor cipher machines. The concept of the rotor had been anticipated in the
older mechanical cipher disks; however, it was an American, Edward Hebern, who recognised
that by hardwiring a monoalphabetic substitution in the connections from the contacts on one
side of an electrical rotor to those on the other side and cascading a collection of such rotors,
polyalphabetic substitutions of almost any complexity could be produced. From 1921, and
continuing through the next decade, Hebern constructed a series of steadily improving rotor
machines that were evaluated by the US Navy. It was undoubtedly this work which led to the
United States' superior position in cryptology during the Second World War. At almost the
same time as Hebern was inventing the rotor cipher machine in the United States, European
engineers such as Hugo Koch (Netherlands) and Arthur Scherbius (Germany) independently
discovered the rotor concept and designed the precursors to the most famous cipher machine in
history, the German Enigma Machine, which was used during World War 2. These machines
were also the stimulus for the TYPEX, the cipher machine employed by the British during
World War 2.
The United States introduced the M-134-C (SIGABA) cipher machine during World War 2.
The Japanese cipher machines of World War 2 have an interesting history linking them to both
the Hebern and the Enigma machines. After Herbert Yardley, an American cryptographer who
organised and directed the US government's first formal code-breaking efforts during and after
the First World War, published The American Black Chamber, in which he outlined details of the
American successes in cryptanalysing the Japanese ciphers, the Japanese government set out to
develop the best cryptomachines possible. With this in mind, it purchased the rotor machines of
Hebern and the commercial Enigmas, as well as several other contemporary machines, for
study. In 1930 Japan's first rotor machine, code-named RED by US cryptanalysts, was
put into service by the Japanese Foreign Office. However, drawing on experience gained from
cryptanalysing the ciphers produced by the Hebern rotor machines, the US Army Signal
Intelligence Service team of cryptanalysts succeeded in cryptanalysing the RED ciphers. In
1939, the Japanese introduced a new cipher machine, code-named PURPLE by US
cryptanalysts, in which the rotors were replaced by telephone stepping switches. The greatest
triumphs of cryptanalysis occurred during the Second World War, when the Polish and British
cracked the Enigma ciphers and the American cryptanalysts broke the Japanese RED,
ORANGE and PURPLE ciphers. These developments played a major role in the Allies' conduct
of World War 2.
After World War 2 the electronics that had been developed in support of radar were adapted
to cryptomachines. The first electrical cryptomachines were little more than rotor machines
where the rotors had been replaced by electronic substitutions. The only advantage of these
electronic rotor machines was their speed of operation as they were still affected by the inherent
weaknesses of the mechanical rotor machines.
The era of computers and electronics has meant an unprecedented freedom for cipher
designers to use elaborate designs which would be far too prone to error if handled with pencil
and paper, or far too expensive to implement in the form of an electromechanical cipher
machine. The main thrust of development has been in block ciphers,
beginning with the LUCIFER project at IBM, a direct ancestor of the DES (Data Encryption
Standard).
There is a place for both symmetric and public-key algorithms in modern cryptography.
Hybrid cryptosystems successfully combine aspects of both and seem to be secure and fast.
While PGP and its complex protocols are designed with the Internet community in mind, it
should be obvious that the encryption behind it is very strong and could be adapted to suit
many applications. There may still be instances when a simple algorithm is necessary, and with
the security provided by algorithms like IDEA, there is absolutely no reason to think of these as
significantly less secure.
An article posted on the Internet on the subject of picking locks stated: "The most effective
door opening tool in any burglars' toolkit remains the crowbar". This also applies to
cryptanalysis - direct action is often the most effective. It is all very well transmitting your
messages with 128-bit IDEA encryption, but if all that is necessary to obtain that key is to walk
up to one of the computers used for encryption with a floppy disk, then the whole point of
encryption is negated. In other words, an incredibly strong algorithm is not sufficient. For a
system to be effective there must be effective management protocols involved. Finally, in the
words of Edgar Allan Poe, "Human ingenuity cannot concoct a cipher which human
ingenuity cannot resolve."
SUMMARY
• A cryptosystem is a collection of algorithms and associated procedures for hiding and
revealing information. Cryptanalysis is the process of analysing a cryptosystem, either to
verify its integrity or to break it for ulterior motives. An attacker is a person or system
that performs cryptanalysis in order to break a cryptosystem. The process of attacking a
cryptosystem is often called cracking. The job of the cryptanalyst is to find the weaknesses
in the cryptosystem.
• A message being sent is known as plaintext. The message is coded using a cryptographic
algorithm. This process is called encryption. An encrypted message is known as
ciphertext, and is turned back into plaintext by the process of decryption.
• A key is a value that causes a cryptographic algorithm to run in a specific manner
and produce a specific ciphertext as an output. The key size is usually measured in
bits. Generally, the bigger the key size, the more secure the algorithm.
• Symmetric algorithms (or single key algorithms or secret key algorithms) have one key
that is used both to encrypt and decrypt the message, hence their name. In order for the
recipient to decrypt the message they need to have an identical copy of the key. This
presents one major problem: the distribution of the keys.
• Block ciphers usually operate on groups of bits called blocks. Each block is processed a
multiple number of times. In each round the key is applied in a unique manner. The
greater the number of iterations, the longer the encryption process takes, but the more
secure the resulting ciphertext.
• Stream ciphers operate on plaintext one bit at a time. Plaintext is streamed as raw bits
through the encryption algorithm. While a block cipher will produce the same ciphertext
from the same plaintext using the same key, a stream cipher will not. The ciphertext
produced by a stream cipher will vary under the same conditions.
• To determine how much security one needs, the following questions must be answered:
1. What is the worth of the data to be protected?
2. How long does it need to be secure?
3. What are the resources available to the cryptanalyst/hacker?
Cryptography
• Two symmetric algorithms, both block ciphers, were discussed in this chapter. These are
the Data Encryption Standard (DES) and the International Data Encryption Algorithm
(IDEA).
• Public-key algorithms are asymmetric, that is to say the key that is used to encrypt the
message is different from the key used to decrypt the message. The encryption key, known
as the public key, is used to encrypt a message, but the message can only be decoded by
the person that has the decryption key, known as the private key. The Rivest, Shamir and
Adleman (RSA) algorithm and Pretty Good Privacy (PGP) are two popular public-
key encryption techniques.
• RSA relies on the fact that it is easy to multiply two large prime numbers together, but
extremely hard (i.e., time consuming) to recover them by factoring the result. Factoring a
number means finding its prime factors, which are the prime numbers that need to be
multiplied together in order to produce that number.
• A one-way hash function is a mathematical function that takes a message string of any
length (pre-string) and returns a smaller fixed-length string (hash value). These functions
are designed in such a way that not only is it very difficult to deduce the message from its
hashed version, but also that, even given that all hashes are of a certain length, it is extremely
hard to find two messages that hash to the same value.
• Chaos functions can be used for secure communication and cryptographic applications.
The chaotic functions are primarily used for generating keys that are essentially
unpredictable.
• An attempted unauthorised cryptanalysis is known as an attack, of which there are five
major types: brute force attack, ciphertext-only, known-plaintext, chosen-plaintext and
chosen-ciphertext.
• The common techniques that are used by cryptanalysts to attack ciphertext are
differential cryptanalysis, linear cryptanalysis and algebraic attack.
• Widespread use of cryptosystems is something most governments are not particularly
happy about, because it threatens to give more privacy to the individual, including
criminals.
Imagination is more important than knowledge.
-Albert Einstein (1879-1955)
PROBLEMS
8.1 We want to test the security of a "character + n" encrypting technique in which each letter
of the plaintext is shifted by n to produce the ciphertext.
(a) How many different attempts must be made to crack this code, assuming a brute force
attack is being used?
(b) Assuming it takes a computer 1 ms to check out one value of the shift, how soon can
this code be broken into?
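Before attempting the problem, it may help to see the encryption itself. The following is a small
sketch (not part of the problem statement) of the shift-by-n operation on the letters A-Z.

```python
# Sketch of the "character + n" cipher: shift each letter forward by n positions,
# wrapping around the 26-letter alphabet; non-letters are passed through unchanged.
def shift_encrypt(plaintext, n):
    out = []
    for ch in plaintext.upper():
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('A') + n) % 26 + ord('A')))
        else:
            out.append(ch)
    return "".join(out)

print(shift_encrypt("MERCHANT", 3))   # -> 'PHUFKDQW'
```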
8.2 Suppose a group of N people want to use secret key cryptography. Each pair of people in
the group should be able to communicate secretly. How many distinct keys are required?
8.3 Transposition ciphers rearrange the letters of the plaintext without changing the letters
themselves. For example, a very simple transposition cipher is the rail fence, in which the
plaintext is staggered between two rows and then read off to give the ciphertext. In a two
row rail fence the message MERCHANT TAYLORS' SCHOOL becomes:
M R H N T Y O S C O L
E C A T A L R S H O
which is read out as: MRHNTYOSCOLECATALRSHO (see the sketch following this problem).
(a) If a cryptanalyst wants to break into the rail fence cipher, how many distinct attacks
must he make, given the length of the ciphertext is n?
(b) Suggest a decrypting algorithm for the rail fence cipher.
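For readers who want to experiment, here is a minimal sketch of the two-row rail fence
encryption described in Problem 8.3; the decryption asked for in part (b) is deliberately not shown.

```python
# Two-row rail fence encryption: odd-position letters form the top row,
# even-position letters form the bottom row, and the rows are concatenated.
def railfence2_encrypt(plaintext):
    letters = [c for c in plaintext.upper() if c.isalpha()]
    top = letters[0::2]      # 1st, 3rd, 5th, ... letters
    bottom = letters[1::2]   # 2nd, 4th, 6th, ... letters
    return "".join(top + bottom)

print(railfence2_encrypt("MERCHANT TAYLORS' SCHOOL"))
# -> 'MRHNTYOSCOLECATALRSHO', matching the example in the text
```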
8.4 One of the most famous field ciphers ever was a fractionation system - the ADFGVX
cipher, which was employed by the German Army during the First World War. This system
was so named because it used a 6 x 6 matrix to substitution-encrypt the 26 letters of the
alphabet and 10 digits into pairs of the symbols A, D, F, G, V and X. The resulting
biliteral cipher is only an intermediate cipher; it is then written into a rectangular matrix
and transposed to produce the final cipher, which is the one that would be transmitted.
Here is an example of enciphering the phrase "Merchant Taylors" with this cipher using
the key word "Subject".
    A D F G V X
A   S U B J E C
D   T A D F G H
F   I K L M N O
G   P Q R V W X
V   Y Z 0 1 2 3
X   4 5 6 7 8 9
Plaintext:  M  E  R  C  H  A  N  T  T  A  Y  L  O  R  S
Ciphertext: FG AV GF AX DX DD FV DA DA DD VA FF FX GF AA
This intermediate ciphertext can then be put in a transposition matrix based on a different key.
C I P H E R
1 4 5 3 2 6
F G A V G F
A X D X D D
F V D A D A
D D V A F F
F X G F A A
The final cipher is therefore: FAFDFGDDFAVXAAFGXVDXADDVGFDAFA.
(a) If a cryptanalyst wants to break into the ADFGVX cipher, how many distinct attacks
must he make, given the length of the ciphertext is n?
(b) Suggest a decrypting algorithm for the ADFGVX cipher.
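The two-stage encipherment worked through in Problem 8.4 can be sketched as follows. The
square is filled from the keyword "Subject" exactly as in the table above and the transposition
uses the key "CIPHER"; for simplicity the sketch assumes the intermediate text length is a
multiple of the key length, as it is in this example.

```python
# Sketch of ADFGVX: fractionating substitution followed by columnar transposition.
SYMBOLS = "ADFGVX"
SQUARE = ["SUBJEC",
          "TADFGH",
          "IKLMNO",
          "PQRVWX",
          "YZ0123",
          "456789"]

def substitute(plaintext):
    pairs = []
    for ch in plaintext.upper():
        for r, row in enumerate(SQUARE):
            c = row.find(ch)
            if c >= 0:
                pairs.append(SYMBOLS[r] + SYMBOLS[c])   # row symbol + column symbol
                break
    return "".join(pairs)

def transpose(intermediate, key):
    # column i holds every len(key)-th symbol; read columns in alphabetical key order
    cols = {k: intermediate[i::len(key)] for i, k in enumerate(key)}
    return "".join(cols[k] for k in sorted(key))

inter = substitute("MERCHANTTAYLORS")   # FGAVGFAXDXDDFVDADADDVAFFFXGFAA
final = transpose(inter, "CIPHER")      # FAFDFGDDFAVXAAFGXVDXADDVGFDAFA
print(inter, final)
```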
8.5 Consider the knapsack technique for encryption proposed by Ralph Merkle of XEROX
and Martin Hellman of Stanford University in 1976. They suggested using the knapsack,
or subset-sum, problem as the basis for a public key cryptosystem. This problem entails
determining whether a number can be expressed as a sum of some subset of a given
sequence of numbers and, more importantly, which subset has the desired sum.
Given a sequence of numbers A, where A = (a1, ..., an), and a number C, the knapsack
problem is to find a subset of a1, ..., an which sums to C.
Consider the following example:
n= 5, C= 14, A= (1, 10, 5, 22, 3)
Solution= 14 = 1 + 10 + 3
In general, all the possible sums of all subsets can be expressed by:
m1a1 + m2a2 + m3a3 + ... + mnan, where each mi is either 0 or 1.
The solution is therefore a binary vector M = (1, 1, 0, 0, 1).
There is a total of 2^n such vectors (in this example 2^5 = 32).
Obviously not all values of C can be formed from the sum of a subset and some can be
formed in more than one way. For example, when A= (14, 28, 56, 82, 90, 132, 197, 284,
341, 455, 515) the figure 515 can be formed in three different ways but the number 516
cannot be formed in any way.
(a) If a cryptanalyst wants to break into this knapsack cipher, how many distinct attacks
must he make?
(b) Suggest a decrypting algorithm for the knapsack cipher.
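A brute-force subset-sum search illustrates the knapsack problem stated in Problem 8.5. This
sketch simply tries all 2^n binary vectors M; it is only an illustration of the problem statement,
not an efficient attack.

```python
# Exhaustive subset-sum search: return the first binary vector M with M . A = C.
from itertools import product

def knapsack_solve(A, C):
    n = len(A)
    for m in product((0, 1), repeat=n):          # all 2**n candidate vectors
        if sum(mi * ai for mi, ai in zip(m, A)) == C:
            return m
    return None                                  # C cannot be formed from any subset

print(knapsack_solve((1, 10, 5, 22, 3), 14))     # -> (1, 1, 0, 0, 1), as in the text
```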
8.6
(a) Use the prime numbers 29 and 61 to generate keys using the RSA algorithm.
(b) Represent the letters 'RSA' in ASCII and encode them using the key generated
above.
(c) Next, generate keys using the pair of primes, 37 and 67. Which is more secure, the
keys in part (a) or part (c)? Why?
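As an illustration of the key-generation step referred to in Problem 8.6, the following sketch
implements textbook RSA with the primes 29 and 61. The public exponent e = 17 is an assumption
made here for illustration; any e coprime to (p - 1)(q - 1) would serve.

```python
# Textbook RSA key generation and a single-character round trip.
from math import gcd

def rsa_keys(p, q, e=17):                 # e = 17 is an illustrative (assumed) choice
    n = p * q                             # modulus: 29 * 61 = 1769
    phi = (p - 1) * (q - 1)               # 28 * 60 = 1680
    assert gcd(e, phi) == 1               # e must be coprime to phi
    d = pow(e, -1, phi)                   # private exponent: e*d = 1 (mod phi)
    return (e, n), (d, n)                 # public key, private key

public, private = rsa_keys(29, 61)
m = ord('R')                              # ASCII code of 'R', as in part (b)
c = pow(m, public[0], public[1])          # encryption: c = m^e mod n
print(public, private, c, pow(c, private[0], private[1]))   # last value recovers m
```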
8.7 Write a program that performs encryption using DES.
8.8 Write a program to encode and decode using IDEA. Compare the number of
computations required to encrypt a plaintext using the same key size for DES and
IDEA.
8.9 Write a general program that can factorize a given number.
8.10 Write a program to encode and decode using the RSA algorithm. Plot the number of
floating point operations required to be performed by the program versus the key-size.
8.11 Consider the difference equation
xn+1 = a xn (1 - xn)
For a = 4, this function behaves like a chaos function.
(a) Plot a sequence of 100 values obtained by iterative application of the difference
equation. What happens if the starting value x0 = 0.5?
(b) Take two initial conditions (i.e., two different starting values, x01 and x02) which are
separated by Δx. Use the difference equation to iterate each starting point n times and
obtain the final values y01 and y02, which are separated by Δy. For a given Δx, plot Δy
versus n.
(c) For a given value of n (say n = 500), plot Δx versus Δy.
(d) Repeat parts (a), (b) and (c) for a = 3.7 and a = 3.9. Compare and comment.
(e) Develop a chaos-based encryption program that generates keys for single-key
encryption. Use the chaos function
xn+1 = 4 xn (1 - xn)
(f) Compare the encryption speed of this chaos-based program with that of IDEA for
a key length of 128 bits.
(g) Compare the security of this chaos-based algorithm with that of IDEA for the 128
bit long key.
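The iteration in Problem 8.11 is straightforward to program. The following sketch generates
the sequence needed for part (a); the plots and the comparisons asked for in the later parts
are left to the reader.

```python
# Iterate the logistic map x_{n+1} = a * x_n * (1 - x_n) and return the orbit.
def logistic_sequence(x0, a=4.0, n=100):
    xs = [x0]
    for _ in range(n):
        xs.append(a * xs[-1] * (1.0 - xs[-1]))
    return xs

print(logistic_sequence(0.3)[:5])   # first few values of one orbit; try x0 = 0.5 as well
```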
Index
A Mathematical Theory of Communication 41
a scytale 265
AC coefficients 40
Additive White Gaussian Noise (AWGN) 56
Aeneas Tacticus 265
Algebraic attack 264
Asymmetric (Public-Key) Algorithms 254
Asymmetric Encryption 244
attacker 241
Augmented Generating Function 175
authenticity 242
Automatic Repeat Request 97
Avalanche Effect 259
Average Conditional Entropy 15
Average Conditional Self-Information 12
Average Mutual Information 11, 14
average number of nearest neighbours 221
Average Self-Information 11
Bandwidth Efficiency Diagram 60
Binary Entropy Function 12
Binary Golay Code 124
Binary Symmetric Channel 8
Blaise de Vigenère 266
Block Ciphers 247
Block Code 53, 77
Block Codes 53
Block Length 78, 161
Blocklength 161, 168
Brute force attack 263
BSC 13
Burst Error Correction 121
Burst Errors 121
Capacity Boundary 61
catastrophic 185
Catastrophic Convolutional Code 169
Catastrophic Error Propagation 170
Channel 49
Channel Capacity 50
Channel Coding 48, 76
Channel Coding Theorem 53
Channel Decoder 52
Channel Encoder 52
Channel Formatting 52
Channel Models 48
channel state information 229
channel transition probabilities 9
More Related Content

PDF
Selected Topics In Information And Coding Theory Issac Woungang
PDF
PDF
Broadband wireless communications
PDF
Fundamentals of information_theory_and_coding_design__discrete_mathematics_an...
PDF
Algebra And Coding Theory A Leroy S K Jain
PDF
Fundamentals Of Information Theory And Coding Design 1st Edition Roberto Togneri
PDF
Making effective use of graphics processing units (GPUs) in computations
PDF
Statistical_mechanics_of_complex_network.pdf
Selected Topics In Information And Coding Theory Issac Woungang
Broadband wireless communications
Fundamentals of information_theory_and_coding_design__discrete_mathematics_an...
Algebra And Coding Theory A Leroy S K Jain
Fundamentals Of Information Theory And Coding Design 1st Edition Roberto Togneri
Making effective use of graphics processing units (GPUs) in computations
Statistical_mechanics_of_complex_network.pdf

Similar to information_theory_coding_and_cryptograp.pdf (20)

PDF
Digital Signal Processing for Wireless Communication using Matlab (Signals an...
PDF
Theory And Design Of Digital Communication Systems Tri T Ha
PDF
An Introduction to Mathematical Cryptography-Springer-.pdf
PDF
RoutSaroj_ActiveMMterahertz_PhD_thesis_May_2016
PDF
Neural Networks and Deep Learning Syllabus
PDF
Serendipitous Web Applications through Semantic Hypermedia
PDF
Thesis_Underwater Swarm Sensor Networks
PDF
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
PDF
TOP CITED ARTICLE IN 2011 - INTERNATIONAL JOURNAL OF MOBILE NETWORK COMMUNICA...
PDF
thesis-final-version-for-viewing
PDF
Gravitational Billion Body Project
PDF
Chip xtalk ebook_
PDF
Ph d model-driven physical-design for future nanoscale architectures
PDF
Essay On Affirmative Action.pdf
PDF
Network literacy-high-res
PPTX
Semantic Sensor Networks and Linked Stream Data
PDF
Enabling next-generation sequencing applications with IBM Storwize V7000 Unif...
DOCX
syllbus (2).docx
PDF
fundamentals-of-neural-networks-laurene-fausett
PDF
Analog_and_digital_signals_and_systems.pdf
Digital Signal Processing for Wireless Communication using Matlab (Signals an...
Theory And Design Of Digital Communication Systems Tri T Ha
An Introduction to Mathematical Cryptography-Springer-.pdf
RoutSaroj_ActiveMMterahertz_PhD_thesis_May_2016
Neural Networks and Deep Learning Syllabus
Serendipitous Web Applications through Semantic Hypermedia
Thesis_Underwater Swarm Sensor Networks
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
TOP CITED ARTICLE IN 2011 - INTERNATIONAL JOURNAL OF MOBILE NETWORK COMMUNICA...
thesis-final-version-for-viewing
Gravitational Billion Body Project
Chip xtalk ebook_
Ph d model-driven physical-design for future nanoscale architectures
Essay On Affirmative Action.pdf
Network literacy-high-res
Semantic Sensor Networks and Linked Stream Data
Enabling next-generation sequencing applications with IBM Storwize V7000 Unif...
syllbus (2).docx
fundamentals-of-neural-networks-laurene-fausett
Analog_and_digital_signals_and_systems.pdf
Ad

Recently uploaded (20)

DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Empathic Computing: Creating Shared Understanding
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
Cloud computing and distributed systems.
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
cuic standard and advanced reporting.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Approach and Philosophy of On baking technology
PPTX
A Presentation on Artificial Intelligence
PPTX
Machine Learning_overview_presentation.pptx
The AUB Centre for AI in Media Proposal.docx
NewMind AI Weekly Chronicles - August'25-Week II
Network Security Unit 5.pdf for BCA BBA.
Digital-Transformation-Roadmap-for-Companies.pptx
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Unlocking AI with Model Context Protocol (MCP)
Empathic Computing: Creating Shared Understanding
“AI and Expert System Decision Support & Business Intelligence Systems”
Cloud computing and distributed systems.
Spectral efficient network and resource selection model in 5G networks
Mobile App Security Testing_ A Comprehensive Guide.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Chapter 3 Spatial Domain Image Processing.pdf
cuic standard and advanced reporting.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
MIND Revenue Release Quarter 2 2025 Press Release
Approach and Philosophy of On baking technology
A Presentation on Artificial Intelligence
Machine Learning_overview_presentation.pptx
Ad

information_theory_coding_and_cryptograp.pdf

  • 1. Acknowledgements I would like to thank the Department of Electrical Engineering at the Indian Institute of Technology (liT), Delhi for providing a stimulating academic environment that inspired this book. In particular, I would like to thank Prof. S.C. Dutta Roy, Prof. Surendra Prasad, Prof. H.M. Gupta, Prof. V.K Jain, Prof. Vinod Chandra, Prof. Santanu Chaudhury, Prof. S.D. Joshi, Prof. Sheel Aditya, Prof. Devi Chadha, Prof. D. Nagchoudri, Prof. G.S. Visweswaran, Prof. R K. Patney, Prof. V. C. Prasad, Prof. S. S. Jamuar and Prof. R K P. Bhatt. I am also thankful to Dr. Subrat Kar, Dr. Ranjan K. Mallik and Dr. Shankar Prakriya for friendly discussions. I have been fortunate to have several batches of excellent students whose feedback have helped me improve the contents ofthis book. Many of the problems given at the end of the chapters have been tested either as assignment problems or examination problems. My heartfelt gratitude is due to Prof. Bernard D. Steinberg, University of Pennsylvania, who has been my guide, mentor, friend and also my Ph.D thesis advisor. I am also grateful to Prof. Avraham Freedman, Tel Aviv University, for his support and suggestions as and when sought by me. I would like to thank Prof. B. Sundar Rajan of the Electrical Communication Engineering group at the Indian Institute of Science, Bangalore, with whom I had a preliminary discussion about writing this book. I wish to acknowledge valuable feedback on this initial manuscript from Prof. Ravi Motwani, liT Kanpur, Prof. A.K. Chaturvedi, liT Kanpur, Prof. N. Kumaravel, Anna University, Prof. V. Maleswara Rao, College of Engineering, GITAM, Visakhapatnam, Prof. M. Chandrasekaran, Government College of Engineering, Salem and Prof. Vikram Gadre, liT Mumbai. I am indebted to my parents, for their love and moral support throughout my life. I am also grateful to my grandparents for their blessings, and to my younger brother, Shantanu, for the infinite discussions on finite topics. Finally, I would like to thank my wife and best friend, Aloka, who encouraged me at·every stage of writing this book. Her constructive suggestions and balanced criticism have been instru- mental in making the book more readable and palatable. It was her infinite patience, unending support, understanding and the sense of humour that were critical in transforming my セイ・。ュ@ into this book. RANJAN BosE New Delhi Contents Preface Acknowledgements Part I Information Theory and Source Coding 1. Source Coding 1.1 Introduction to Information Theory 3 セ@ Uncertainty And Information 4 セaカ・イ。ァ・@ Mutual Information And Entropy 77 1.4 Information Measures For Continuous セ、ッュ@ Variables 74 .LVSource Coding Theorem 75 1.6 Huffman Coding 27 I.7 The Lempel-Ziv Algorithm 28 1.8 Run Length Encoding and the PCX Format 30 1.9 Rate Distortion Function 33 1.10 Optimum Quantizer Design 36 1.11 Introduction to Image Compression 37 1.12 The Jpeg Standard for Lossless Compression 38 1.13 The Jpeg Standard for Lossy Compression 39 1.14 Concluding Remarks 47 Summary 42 Problems 44 Computer Problems 46 2. Channel Capacity and Coding 2.1 Introduction 47 2.2 Channel Models 48 セ@ Channel Capacity 50 2.4 Channel Coding 52 W Information Capacity Theorem 56 IX XII 3 47
  • 2. 2.6 The Shannon Limit 59 2.7 Random Selection of Codes 67 2.8 Concluding Remarks 67 Summary 68 Problems 69 Computer Problems 77 Part II セ@ セN@ 1 Error Control Coding セjエ@ 1JM"セ@ (Channel Coding) 3. · ear Block Codes for Error Correction セ@ Introduction to Error Correcting Codes 75 3.2 JBasic Definitions 77 3.3 vMatrix Description of Linear Block Codes 87 Equivalent Codes 82 3.5 v'Parity Check Matrix 85 3.6 l.JDecoding of a Linear Block Code 87 3.7 Jsyndrome Decoding 94 3.8 Error Probability after Coding (Probability of Error Correction) 95 3.9 Perfect Codes 97 3.10 Hamming Codes 700 3.11 Optimal Linear Codes 702 3.12 Maximum Distance Separable (MDS) Codes 702 3.13 Concluding Remarks 102 Summary 703 Problems 705 Computer Problems 706 4. Cyclic Codes セセ@ セNQ@ 4.1 Introduction to Cyclic Codes 708 tY Polynomials 709 ,!Y The Division Algorithm for Polynomials 770 V A Method for Generating Cyclic Codes 775 W Matrix Description of Cyclic Codes 779 4.6 Burst Error Correction 727 75 108 4.7 Fire Codes 723 4.8 Golay Codes 724 4.9 Cyclic Redundancy Check (CRC) Codes 725 4.10 4.11 Circuit Implementation of Cyclic Codes Concluding Remarks 732 728 Summary 732 Problems 134 Computer Problems 735 5. Bose-Chaudhuri Hocquenghem (BCH) Codes セセN@ nJJ:,-idft セ@ z-id<. NGiヲセ@ 5.1 Introduction to BCH Codes 736 セ@ Primitive Elements 737 e.)" Minimal Polynomials 739 セ@ Generator Polynomials in Terms of Minimal Polynomials 2.Y Some Examples of BCH Codes 743 5.6 Decoding of BCH Codes 747 V Reed-Solomon Codes 750 747 5.8 Implementation of Reed-Solomon Encoders and Decoders 753 5.9 Nested Codes 753 5.10 Concluding Remarks 755 Summary 756 Problems 757 Computer Problems 758 136 6. Convolutional Codes 159 6.1 Introduction to Convolutional Codes QUセ@ iJ. J) iJ セ@ 1! ', セl@ セ@ Tree Codes and Trellis Codes 760 JJtセセャ@ fo4! セ@ セ@ A/ '1/A セ@ セッャケョッュゥ。ャ@ Description of Convolutional Codes セセ@ 1 . (;__ (Analytical Representation) 765 V Distance Notions for Convolutional Codes 770 セ@ The Generating Function 773 6.6 Matrix Description of Convolutional Codes 776 セN@ セ@ V/ Viterbi Decoding of Convolutional Codes 778 R}1.L GSM OuセセゥヲッセN@ 6.8 Distance Bounds for Convolutional Codes 785 1 6.9 Performance Bounds 787 6.10 Known Good Convolutional Codes 788 6.11 Turbo Codes 790
  • 3. 6.12 Turbo Decoding 792 6.13 Concluding Remarks 798 Summary 799 Problems 207 Computer Problems 203 7. Trellis Coded Modulation 7.1 Introduction to TCM 206 7.2 The Concept of Coded Modulation 207 7.3 Mapping by Set Partitioning 272 7.4 Ungerboeck's TCM Design Rules 276 7.5 Tern Decoder 220 7.6 Performance Evaluation for Awgn Channel 227 7.7 Computation of tip.ee 227 7.8 Tern for Fading Channels 228 7.9 Concluding Remarks · 232 Summary 233 Problems 234 Computer Problems 238 Partm Coding for Secure Communications . Cryptography 8.1 Introduction to Cryptography 247 L j 8.2 An Overview of Encryption Techniques 242 セ@ セ@ セセセ@ 8.3 Operations Used By Encryption Algorithms 245 8.4 Symmetric (Secret Key) Cryptography 246 8.5 Data Encryption Standard (DES) 248 8.6 International Data Encryption Algorithm (IDEA) 252 8.7 RC Ciphers 253 8.8 Asymmetric (Public-Key) Algorithms 254 8.9 The RSA Algorithm 254 8.10 Pretty Good Privacy (PGP) 256 8.11 One-Way Hashing 259 8.12 Other Techniques 260 8.13 Secure Communication Using Chaos Functions 267 /xvul 8.14 Cryptanalysis 262 8.15 Politics of Cryptography 264 8.16 Concluding Remarks 265 Summary 268 Problems 269 206 Computer Problems 277 Index 273 241
  • 5. 1 Source Coding , Not- セ@ tluU: ca.t'1 be- セ@ セ@ a.n.d.- n.ot セ@ tluU; CCf.U'.t¥ 。オ|セ@ 「・Mセ@ . -Alberl" eセ@ (1879-1955) 1.1 INTRODUCTION TO INFORMATION THEORY Today we live in the information age. The internet has become an integral part of our lives, making this, the third planet from the sun, a global village. People talking over the cellular phones is a common sight, sometimes even in cinema theatres. Movies can be rented in the form of a DVD disk. Email addresses and web addresses are common on business cards. Many people prefer to send emails and e-cards to their friends rather than the regular snail mail. Stock quotes can be checked over the mobile phone. Information has become the key to success (it has always been a key to success, but in today's world it is tlu key). And behind all this information and its exchange lie the tiny l's and O's {the omnipresent bits) that hold information by merely the way they sit next to one another. Yet the information age that we live in today owes its existence, primarily, to a seminal paper published in 1948 that laid the foundation of the wonderful field of Information Theory-a theory initiated by one man, the American Electrical Engineer Claude E. Shannon, whose ideas
  • 6. Information Theory, Coding and Cryptography appeared in the article "The Mathematical Theory of Communication" in the Bell System Technical]ournal (1948). In its broadest sense, information includes the content of any of the standard communication media, such as telegraphy, telephony, radio, or television, and the signals of electronic computers, servo-mechanism systems, and other data-processing devices. The theory is even applicable to the signals of the nerve networks of humans and other animals. The chief concern of information theory is to discover mathematical laws governing systems designed to communicate or manipulate information. It sets up quantitative measures of information and of the capacity of various systems to transmit, store, and otherwise process information. Some of the problems treated are related to finding the best methods of using various available communication systems and the best methods for separating wanted information or signal, from extraneous information or noise. Another problem is the setting of upper bounds on the capacity of a given information-carrying medium (often called an information channel). While the results are chiefly of interest to communication engineers, some of the concepts have been adopted and found useful in such fields as psychology and linguistics. The boundaries of information theory are quite fuzzy. The theory overlaps heavily with communication theory but is more oriented towards the fundamental limitations on the processing and communication of information and less towards the detailed operation of the devices employed. In this chapter, we shall first develop an intuitive understanding of information. It will be followed by mathematical models of information sources and a quantitative measure of the information emitted by a source. We shall then state and prove the source coding theorem. Having developed the necessary mathematical framework, we shall look at two source coding techniques, the Huffman encoding and the Lempel-Ziv encoding. This chapter will then discuss the basics of the Run Length Encoding. The concept of the Rate Distortion Function and the Optimum Quantizer will then be introduced. The chapter concludes with an introduction to image compression, one of the important application areas of source coding. In particular, theJPEG (joint Photographic Experts Group) standard will be discussed in brief. 1.2 UNCERTAINTY AND INFORMATION Any information source, analog or digital, produces an output that is random in nature. If it were not random, i.e., the output were known exactly, there would be no need to transmit it! We live in an analog world and most sources are analog sources, for example, speech, temperature fluctuations etc. The discrete sources are man-made sources, for example, a source (say, a man) that generates a sequence of letters from a finite alphabet (typing his email). Before we go on to develop a mathematical measure of information, let us develop an intuitive feel for it. Read the following sentences: I I Source Coding (A) Tomorrow, the sun will rise from the East. (B) The phone will ring in the next one hour. (C) It will snow in Delhi this winter. The three sentences carry different amounts of information. In fact, the first sentence hardly carries any information. Everybody knows that the sun rises in the East and the probability of this happening again is almost unity. Sentence (B) appears to carry more information than sentence (A). The phone may ring, or it may not. 
There is a finite probability that the phone will ring in the next one hour (unless the maintenance people are at work again!). The last sentence probably made you read it over twice. This is because it has never snowed in Delhi, and the probability of a snowfall is very low. It is interesting to note that the amount of information carried by the sentences listed above have something to do with the probability of occurrence of the events stated in the sentences. And we observe an inverse relationship. Sentence (A), which talks about an event which has a probability of occurrence very close to 1 carries almost no information. Sentence (C), which has a very low probability of occurrence, appears to carry a lot of information (made us read it twice to be sure we got the information right!). The other interesting thing to note is that the length of the sentence has nothing to do with the amount of information it conveys. In fact, sentence (A) is the longest but carries the minimum information. We will now develop a mathematical measure of information. Definition 1.1 Consider a discrete random variable X withpossible 011teomes·セG@ i:::;: 1, 2, ..., n. . ·.. . . The Self-Information of the event X= xi is defined as /(xi}= log (- 1 -) =-log P(x-) P(x1} • . (1.1) We note that a high probability event conveys less information than a low probability event. For an event with P(x) = 1, J(x) = 0. Since a lower probability implies a higher degree of uncertainty (and vice versa), a random variable with a higher degree of uncertainty contains more information. We will use this correlation between uncertainty and level of information for physical interpretations throughout this chapter. The units of I(x) are determined by the base of the logarithm, which is usually selected as 2 or e. When the base is 2, the units are in bits and when the base is e, the units are in nats (natural units). Since 0 セ@ P(xl; セ@ 1, J(x;) ;;::: 0, i.e., self information is non-negative. The following two examples illustrate why a logarithmic measure of information is appropriate.
  • 7. Information Theory, Coding and Cryptography Example 1.1 Consider a binary source which tosses a fair coin and outputs a 1 if a head (H) appears and aO if a tail (T) appears. For this source,P{l) =P(O) =0.5. The information content of each output from the source is I(x;) = :_ log2 P (x;) = -log2 P (0.5) = 1 bit (1.2) Indeed, we have to use only one bit to represent the output from this binary source (say, we use a 1 to represent H and a 0 to represent T). Now, suppose the successive outputs ·from this binary source are statistically independent, i.e., the source is memoryless. Consider a block of m bits. There are 2m possible m-bit blocks, each of which is equally probable with probability 2-m . The self-information of an m-bit block is I(x;) = - log2 P (xi) = - log2 2-m = m bits (1.3) Again, we observe that we indeed need m bits to represent the possible m-bit blocks. Thus, this logarithmic measure of information possesses the desired additive property when a number of source outputs is considered as a block. Example 1.2 Consider a discrete, memoryless source (DMS) (source C) that outputs two bits at a time. This source comprises two binary sources (sourcesA andB) as mentioned in Example 1.1, each source contributing one bit. The two binary sources within the source Care independent. Intuitively, the information content of the aggregate source (source C) should be the sum of tbe information contained in the outputs of the two independent sources that constitute this セcN@ Let us look at the information content ofthe outputs of sourceC. There are four possible outcomes {00, 01, 10, 11 }, each with a probability P(C) = P(A)P(B) = (0.5)(0.5) =0.25, because the source A and B are independent. The information content ofeach output from the source Cis I(C) = - log2 P(x;) = -log2 P(0.25) = 2 bits (1.4) We have to use two bits to represent the outpat from this combined binary source. Thus, the logarithmic measure of information possesses the desired additive property for independent events. Next, consider two discrete random variables X and Ywith possible outcomes X;, i = 1, 2, ..., 11 and Yj• j = 1, 2, ..., m respectively. Suppose we observe some outcome Y = Yi and we want to Source Coding determine the amount of information this event provides about the eventX =x;, i = 1, 2, ..., 11, i.e., we want to mathematically represent the mutual information. We note the two extreme cases: (i) X and Yare independent, in which case the occurrence of Y = Yj provides no information aboutX=x;. (ii) X and Yare fully dependent events, in which case the occurrence ofY= yi determines the occurrence of the event X= x;· A suitable measure that satisfies these conditions is the logarithm of the ratio of the conditional probability P(X = X; I Y = Yj) = P(x; IY} divided by the probability P(X =X;) = P(x;) Definition 1.2 The mutual information I(x;; y) between X; and Yi is defined as I(x,; y) =log ( ーセZセサII@ (1.5) (1.6) (1.7) As before, the units of I(x) are determined by the base of the logarithm, which is usually selected as 2 or e. When the base is 2 the units are in bits. Note that Therefore, P(x;!yi) _ P(x;IYi)P(y;) _ P(x;,y1) _ P(y1lx;) P{X;) - P(x;)P{y;) - P(x;)P(y1 )- P(y1 ) (1.8) {1.9) The physical interpretation of I(x;; y 1) = I(y1; xJ is as follows. 
The information provided by the occurrence.of the event Y= y1about the event X= X; is identical to the information provided by the occurrence of the event X= X; about the event Y = yl Let us now verify the two extreme cases: (i) When the random variables X and Yare statistically independent, P(x; Iy1)= P(xJ, it leads to I(x;; y) = 0. (ii) When the occurrence of Y = y 1uniquely determines the occurrence of the event X= X;, P(x; I'1}·) = 1, the mutual information becomes I(x;; y) = lo{ Ptx;)) =-log P(x;) (1.10) This is the self-information of the event X= X;. Thus, the logarithmic definition of mutual information confirms our intuition.
  • 8. Information Theory, Coding and Cryptography Example 1.3 Consider a Binary Symmetric Channel (BSC) as shown in Fig. 1.1. It is a channel that transports 1's and O's from the transmitter (Tx) to the receiver (Rx). It makes an error occasionally, with probabilityp. A BSC flips a 1to 0 and vice-versa with equal probability. Let X and Ybe binary random variables that represent the input and output of this BSC respectively. Let the input symbols be equally likely and the output symbols depend upon the input according to the channel transition probabilities as given below P(Y =0 I X= 0) =1 - p, P(Y=OIX=1)=p, P(Y = 11 X= 1) = 1-p, P(Y = 1 I X= 0) = p. Channel Fig. 1.1 A Binary Symmetric Channel. It simply implies that the probability of a bit getting flipped (i.e. in error) when transmitted over this BSC is p. From the channel transition probabilities we have P(Y = 0) = P(X = 0) X P(Y = 0 I X= 0) + P(X = 1) X P(Y = 0 I X= 1) = 0.5(1- p) + 0.5(p) = 0.5, and, P(Y= 1)=P(X=0) X P(Y= 11X=0)+P(X= 1) X P(Y= 11X= 1) = 0.5(p) + 0.5(1- p) = 0.5. Suppose we are at the receiver and we want to determine what was transmitted at the transmitter, on the basis of what was received. The mutual information about the occurrence ofthe eventX= 0 given that Y= 0 is ( P(Y = OIX =0)) (!=__.e_J I (x0; yo) = /(0; 0) = log2 = log2 = log22(1 - p). P(Y=O) 0.5 Similarly, l(x1; yo)= /(1; 0) = log2 = log2 _l!_ =1ogz2p. . ( P(Y =OIX =1)) ( ) P(Y=O) 0.5 Let us consider some specific cases. Source Coding Suppose, p =0, i.e., it is an ideal channel (noiseless), then, l(Xo; y0) = 1(0; 0) = log22(1 - p) = 1 bit. Hence, from the output, we can determine what was transmitted with certainty. Recall that the self- information about the event X= x0 was 1bit. However, ifp = 0.5, we get l(x0; y0) = /(0; 0) = log22(1- p) = log22(0.5) = 0. It is clear from the output that we have no information about what was transmitted. Thus, it is a useless channel. For such a channel, we may as well toss a fair coin at the receiver in order to determine what was sent! Suppose we have a channel wherep = 0.1, then, l(x0; yo)= 1(0; 0) = logz2(1- p) = log22(0.9) = 0.848 bits. Example 1.4 Let X and Y be binary random variables that represent the input and output of a binary channel shown in Fig. 1.2. Let the input symlfols be equally likely, and the output symbols depend upon the input according to the channel transition probabilities: P(Y = 0 I X= 0) = 1 - p0, P(Y = 0 I X= 1) = p 1, P(Y = 1 I X = 1) = 1 - p1, P(Y =1 I X= 0) =p0. 1-Po Channel Fig. 1.2 A Binary Channel with Asymmetric Probabilities. From the channel transition probabilities we have P(Y = 0) = P(X = O).P(Y = 0 I X= 0) + P(X = I).P(Y = 0 I X= 1) = 0.5(1 - p0) + 0.5(p1) = 0.5(1 -Po+ p1), and, pHyセ@ 1) =P(X=O).P(Y= 11X=0)+ P(X= l).P(Y= II X= 1) = 0.5(p0) + 0.5(1 - p1) = 0.5(1 - Pt + po).
  • 9. Information Theory, Coding and Cryptography Suppose we are at the receiver and we want to determine what was transmitted at the transmitter, on the basis of what is received. The mutual information about the occurrence of the event X= 0 given that Y = 0 is I(x,; yo) =/(0; O) =log2( pHZWy P セ P [@ O)) =log2 ( 0.5(/--セ P@ +1'1J=log2 C セセZ@ セセ@ ). Similarly, ( P(Y =OIX =1)) ( 2Pt ) l(x1; yo)= /(1; 0) = log2 = log2 • P(Y =0) 1- Po + Pt Definition 1.3 The Conditional Self Information of the event X= X; given Y= y 1 is defined as /(x11y) =ws(pHセijゥII@ =-log セクLャ@ Y)· (1.11) Thus, we may write I{x6 y 1) = I(xJ - I(x; Iy 1). (1.12) The conditional self information can be interpreted as the self information about the event X= X; on the basis of the event Y= Yt Recall that both J(x;) セ@ 0 and I(x; Iy1)セ@ 0. Therefore, I(x;; y1) <0 when J..x;) < I(x; Iy1) and I(x6 y 1) >0 when /(x;) > /(x; IY)· Hence, mutual information can be positive, negative or zero. Examph 1.5 Consider the BSC discussed in Example 1.3. The plot of the mutual information l(x0; yo) versus the probability of error, pis given in Fig. 1.3. 1 M]セMMセMMセMMセセMMセMMNMMMセセセセ@ I I I I I I I セU@ MMセMMセMMセMMセMMセMMセMM 1 I I I I I I I 0 MMセMMセMMセMMセMM MMセMMセMMセMMセMM 1 I I I I I I I I I I I I I I I I MセU@ --l--l--l--l--l--l MセMMセMMセMM 1 I I I I I I I -1 MMセMMセMMセMMセMMセMMセMMセ@ ,--,-- 1 I I I I I I I -1.1 ----1---i---i---i----1---i----j-- ---j-- 1 I I I I I I I I -2 MMセMMセMMセMMセMMMMQMMMMQMMMMQMMMMQM -l-- ,, I I I I I I I I I - 2.5 L_____l._____L__ ___L____.J..___.i._ ____J._---,.-----l__.J..__.,.--l,--___J 0 Fig. 1.3 The Plot of ftle Mutua/Information /(xo: ycJ vbセsus@ ftle Probability of Error. p. Source Coding It can be seen from the figure thatl(Xo; y0) is negative forp > 0.5. The physical interpretation is as follows. A negative mutual information implies that having observed Y = y0, we must avoid choosing X = Xo as the transmitted bit. For p = 0.1, l(x0; y1) = 1(0; 1) = log22{p) = log22(0.1) =- 2.322 bits. This shows that the mutual information between the eventsX= Xo andY= y1 is negative forp =0.1. For the extreme case ofp = 1, we have l(x0; y1) = 1(0; 1) = log22{p) = log22(1) =- I bit. The channel always changes a 0 to a 1 and vice versa (since p = 1). This implies that if y1 is observed at the receiver, it can be concluded that Xo was actually transmitted. This is actually a useful channel with a 100% bit error rate! We just flip the received bit. 1.3 AVERAGE MUTUAL INFORMATION AND ENTROPY So far we have studied the mutual information associated with a pair of events xi and y 1 which are the possible outcomes of the two random variables X and Y. We now want to find out the average mutual information between the two random variables. This can be obtained simply by weighting !{xi; y 1) by the probability of occurrence of the joint event and summing over all possible joint events. Definition 1.4 The Average Mutual Information between two random variables X and Yis given by For the case when X and Yare statistically independent, I(X; Y) = 0, i.e., there is no average mutual information between X and Y. An important property of the average mutual information is that /(X; Y) セ@ 0, where equality holds if and only if X and Yare statistically independent. Definition 1.5 The Average Self-Information of a random variable Xis defined as n n H(X) = LP(x;)I(Xj) =- LP(xi)logP(Xj) {1.14) i=l i=l When X represents the alphabet of possible output letters from a source, H(X) represents the average information per source letter. 
In this case H (X) is called the entropy. The term entropy has been borrowed from statistical mechanics, where it is used to denote the level of disorder in a system. It is interesting to see that the Chinese character for entropy looks like II!
  • 10. Information Theory, Coding and Cryptography Example 1.6 Consider a discrete binary source that emits a sequence of statistically independent symbols. The output is either a 0 with probabilityp or a 1 with a probability 1- p. The entropy of this binary source is 1 H(X) = - L P(x; )log P(x;) =- plog2 (p)- (1- p) log2 (1- p) '(1.15) i=O The plot of the Binary Entropy Function versus p is given in Fig. 1.4. We observe from the figure that the value of the binary entropy function reaches its maximum value for p = 0.5, i.e., when both 1 and 0 are equally likely. In general it can be shown that the entropy of a discrete source is maximum when the letters from the source are equally probable. H(X) 1 セMMMNMMMMPMWMMセMMMMNMMMMN@ 0.8 0.6 0.4 0.2 0.2 0.4 0.6 0.8 Fig. 1.4 The Binary Entropy Function, H (X)=- p log2 (p)- (7 - p) log2 (1 - p). Definition 1.6 The Average Conditional Self-Information called the conditional entropy is defined as n m 1 H(X IY) = LLP(xi, y1)log ( ) i=1J=1 P xiiYJ (1.16) The physical interpretation of this definition is as follows. H(XIY) is the information (or uncertainty) in X having observed Y. Based on the definitions of H(X IY) and H( YIX) we can write I(X; Y) = H(X)- H(XIY) = H(Y)- H(YIX). (1.17) We make the following observations. (i) Since /(X; Y) セ@ 0, it implies that H(X) セ@ H(XI Y). Source Coding (ii) The case I (X; Y) = 0 implies that H(X) = H(XI Y), and it is possible if and only if X and Yare statistically independent. (iii) Since H(X IY) is the conditional self-information about X given Y and H(X) is the average uncertainty (self-information) of A'; I(X; Y) is the average uncertainty about xセ。カゥョァ@ observed Y. (iv) Since H(X) セ@ H(X IY), the observation of Y does not increase the entropy (uncertainty). It can only decrease the entropy. That is, observing Y cannot reduce the information about セ@ it can only add to the information. Example 1.7 Consider the BSC discussed in Example 1.3. Let the input symbols be '0' with probability q and '1' with probability 1- q as shown in Fig. 1.5. 1-p Probability 0 セMMMMMMMMMMMMMMMMMNNNN@ 0 q Tx 1-q Channel Fig. 1.5 A Binary Symmetric Channel (BSC) with Input Symbols Probabilities Equal to q and 7 - q. The entropy of this binary source is 1 Rx 1 H(X) =- L P(x;)logP(x;) = -qlog2(q)- (1- q)log2 (1- q) i=O The conditional entropy is given by n m 1 H(XlY)= L L,P(x;,y)log-- i=1 j=1 p(x;IYj) In order to calculate the values ofH(XlY), we can make use of the following equalities P(x;, Y) =P(x; IY) P(y) =P(yj IX;) P(x;) The plot ofH(XIY) versus q is given in Fig. 1.6 withp as the parameter. (1.18) (1.19)
  • 11. Information Theory, Coding and Cryptography H(XJY) Fig. 1.6 The Plot of Conditional Entropy H(XI Y) Versus q. The average mutual information /(X; Y) is given in Fig. 1.7. It can be seen from the plot that as we increase the parameterp from 0 to 0.5, I(X; Y) decreases. Physically it implies that, as we make the channel less reliable (increase the value ofp セ@ 0.5), the mutual information between the random variable X (at the transmitter) and the random variable Y (receiver) decreases. 1.4 INFORMATION MEASURES FOR CONTINUOUS RANDOM VARIABLES The definitions of mutual information for discrete random variables can be directly extended to continuous random variables. Let X and Y be random variables with joint probability density function (pdf} p(x, y) and marginal pdfs p{x) and p(y). The average mutual information between X and Y is defined as follows. Definition 1.7 The average mutual information between two continuous random variables X and Y is defined as I(X:, Y) = j jp(x)p(ylx)log p(ylx)p(x) dxdy ---- p(x)p(y) (1.20) Fig. 1.7 The Plot of the Average Mutua/Information I(X: 'r? Versus q. r I I i Source Coding It should be pointed out that the definition of average mutual information can be carried over from discrete random variables to continuous random variables, but the concept and physical interpretation cannot. The reason is that the information content in a continuous random variable is actually infinite, and we require infinite number of bits to represent a continuous random variable precisely. The self- information and hence the entropy is infinite. To get around the problem we define a quantity called the differential entropy. Definition 1.8 The differential entropy of a continuous random variable X is defined as hHセ@ =-Ip(x)logp(x) (1.21) Again, it should be understood that there is no physical meaning attached to the above quantity. We carry on with extending our definitions further. Definition 1.9 1he Average Conditiona) Entropy of a continuous random variables X given Y is defined as H(XI Y) = I Ip(x, ケIャッァーHクャケIセ、ケ@ (1.22) The average mutual information can be expressed as I(X:, Y)=H(X) -H(XIY)=H(Y) -H(YIX) (1.23) 1.5 SOURCE CODING THEOREM In this section we explore efficient representation (efficient coding) of symbols generated by a source. The primary objective is the compression of data by efficient representation of the symbols. Suppose a discrete memoryless source (DMS) outputs a symbol every t seconds and each symbol is selected from a finite set of symbols xfl i= 1, 2, ..., L, occurring with probabilities P (x;), i = 1, 2, ..., L, the entropy of this DMS in bits per source symbols is L H(X) = L P(x; )log2 P(x;) :5log2 L (1.24) j=! The equality holds when the symbols are equally likely. It means that the average number of bits per source symbol is H(X) and the source rate is H(X)Itbitslsec. Now let us represent the 26 letters in the English alphabet using bits. We observe that 25 = 32 > 26. Hence, each of the letters can be uniquely represented using 5 bits. This is an example of a Fixed Length Code (FLC). Each letter has a corresponding 5 bit long codeword.
  • 12. Information Theory, Coding and Cryptography I Definition 1.10 A code is a set of vectors called codewords. Suppose a DMS outputs a symbol selected from a finite set of symbols xi, i= 1, 2, ..., L. The number of bits R required for unique coding when L is a power of 2 is R = log2 L, (1.25) and, when L is not a power of 2, it is R = Llog2LJ + 1. (1.26) As we saw earlier, to encode the letters of the English alphabet, we need R= Llog226J + 1 = 5 bits. The FLC for the English alphabet suggests that every letter in the alphabet is equally important (probable) and hence each one requires 5 bits for representation. However, we know that some letters are less common (x, q, z etc.) while others are more frequently used (s, t, e etc.). It appears that allotting equal number of bits to both the frequently used letters as well as not so commonly used letters is rwt an efficient way of representation (coding). Intuitively, we should represent the more frequently occurring letters by fewer number of bits and represent the less frequently occurring letters by larger number of bits. In this manner, if we have to encode a whole page of written text, we might end up using fewer number of bits overall. When the source symbols are not equally probable, a more efficient method is to use a Variable Length Code (VLC). Example 1.8 Suppose we have only the frrst eight letters of the English alphabet (A-H) in our vocabulary. The Fixed Length Code (FLC) for this set of letters would be Letter Codeword Letter A 000 E B 001 F c 010 G D 011 H Fixed Length Code A VLC for the same set of letters can be Letter Codeword Letter A 00 E B 010 F c 011 G D 100 H Variable Length Code 1 Codeword 100 101 110 111 Codeword 101 110 1110 1111 f Source Coding Suppose we have to code the series of letters: "A BAD CAB". The fixed lertgth and the variable length representation of the pseudo sentence would be Fixed Length Code 000 001 000 011 010 000 001 I Total bits- 21 Variable Length Code 00 010 00 100 011 00 010 I Total bits セ@ 18 Note that the variable length code uses fewer number of bits simply because the letters appearing more frequently in the pseudo sentence are represented with fewer number of bits. We look at yet another VLC for the frrst 8 letters of the English alphabet: Letter Codeword Letter Codeword A 0 E 10 B 1 F 11 c 00 G 000 D 01 H Ill Variable Length Code 2 This second variable length code appears to be more efficient in terms of representation of the letters. Variable Length Code 1 00 010 00 100 011 00 010 Total bits= 18 Variable Length Code 2 0 1001 0001 Total bits = 9 However there is a problem with VLC2. Consider the sequence of bits here-0 100I 0001 which is used to represent A BAD CAB. We could regroup the bits in a different manner to have [0] [10][0][1] [0][0][01] which translates to A EAB AAD or [0] [1][0][0][1] [0][0][0][1] which stands for A BAAB AAAB ! Obviously there is a problem with the unique decoding of the code. We have no clue where one codeword (symbol) ends and the next one begins, since the lengths of the codewords are variable. However, this problem does not exist with VLCl. Here no codeword forms the prefix ofany other codeword. This is called the prefix condition. As soon as a sequence ofbits corresponding to any one ofthe possible codewords is detected, we can declare that symbol decoded. Such codes called Uniquely Decodable or Instantaneous Codes cause no decoding delay. In this example, the VLC2 is not a uniquely decodable code, hence not a code ofany utility. 
VLC 1 is uniquely decodable, though less economical than VLC 2 in terms of bits per symbol.

Definition 1.11 A Prefix Code is one in which no codeword forms the prefix of any other codeword. Such codes are also called Uniquely Decodable or Instantaneous Codes.

We now proceed to devise a systematic procedure for constructing uniquely decodable, variable length codes that are efficient in terms of the average number of bits per source letter. Let the source output a symbol from a finite set of symbols x_i,
i = 1, 2, ..., L, occurring with probabilities P(x_i), i = 1, 2, ..., L. The average number of bits per source letter is defined as

R̄ = Σ_{k=1}^{L} n(x_k) P(x_k)        (1.27)

where n(x_k) is the length of the codeword for the symbol x_k.

Theorem 1.1 (Kraft Inequality) A necessary and sufficient condition for the existence of a binary code with codewords having lengths n_1 ≤ n_2 ≤ ... ≤ n_L that satisfy the prefix condition is

Σ_{k=1}^{L} 2^{-n_k} ≤ 1        (1.28)

Proof First we prove the sufficient condition. Consider a binary tree of order (depth) n = n_L. This tree has 2^{n_L} terminal nodes, as depicted in Fig. 1.8. Let us select any node of order n_1 as the first codeword c_1. Since no codeword is the prefix of any other codeword (the prefix condition), this choice eliminates 2^{n - n_1} terminal nodes. This process continues until the last codeword is assigned at a terminal node of order n = n_L. Consider a node of order j < L. The fraction of terminal nodes eliminated is

Σ_{k=1}^{j} 2^{-n_k} < Σ_{k=1}^{L} 2^{-n_k} ≤ 1.        (1.29)

Thus, we can construct a prefix code that is embedded in the full tree of depth n_L. The nodes that are eliminated are depicted by the dotted arrow lines leading to them in the figure.

Fig. 1.8 A binary tree of order n_L.

We now prove the necessary condition. We observe that in the code tree of order n = n_L, the number of terminal nodes eliminated from the total of 2^n terminal nodes is

Σ_{k=1}^{L} 2^{n - n_k} ≤ 2^n.        (1.30)

This leads to

Σ_{k=1}^{L} 2^{-n_k} ≤ 1.        (1.31)

Example 1.9 Consider the construction of a prefix code using a binary tree.

Fig. 1.9 Constructing a binary prefix code using a binary tree, with nodes n_0, n_00, n_01, n_010, n_011, n_0110 and n_0111.

We start from the mother node and proceed toward the terminal nodes of the binary tree (Fig. 1.9). Let the mother node be labelled '0' (it could have been labelled '1' as well). Each node gives rise to two branches (binary tree). Let us label the upper branch '0' and the lower branch '1' (these labels could also have been interchanged). First we follow the upper branch from the mother node. We obtain our first codeword c_1 = 0, terminating at node n_00. Since we want to construct a prefix code in which no codeword is a prefix of any other codeword, we must discard all the daughter nodes generated from the node labelled c_1. Next, we proceed on the lower branch from the mother node and reach the node n_01. We proceed along the upper branch first and reach node n_010. We label this as the codeword c_2 = 10 (the labels of the branches that lead up to this node travelling from the mother node). Following the lower branch from the node n_01, we ultimately reach the terminal nodes n_0110 and n_0111, which correspond to the codewords c_3 = 110 and c_4 = 111 respectively. Thus the binary tree has given us four prefix codewords: {0, 10, 110, 111}. By construction, this is a prefix code. For this code

Σ_{k=1}^{L} 2^{-n_k} = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 0.5 + 0.25 + 0.125 + 0.125 = 1.

Thus the Kraft inequality is satisfied. We now state and prove the noiseless Source Coding Theorem, which applies to codes that satisfy the prefix condition.
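The Kraft sum itself takes one line to check numerically. This small sketch (illustrative helper name) verifies the code {0, 10, 110, 111} just constructed and shows a set of lengths that no binary prefix code can achieve.

```python
def kraft_sum(lengths):
    """Sum of 2**(-n_k) over the codeword lengths (Eq. 1.28)."""
    return sum(2.0 ** -n for n in lengths)

print(kraft_sum([1, 2, 3, 3]))      # 1.0   -> the code {0, 10, 110, 111} is admissible
print(kraft_sum([1, 1, 2]) <= 1)    # False -> no binary prefix code has these lengths
```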
Theorem 1.2 (Source Coding Theorem) Let X be the set of letters from a DMS with finite entropy H(X) and output symbols x_k, k = 1, 2, ..., L, occurring with probabilities P(x_k), k = 1, 2, ..., L. Given these parameters, it is possible to construct a code that satisfies the prefix condition and has an average length R̄ that satisfies the inequality

H(X) ≤ R̄ < H(X) + 1        (1.32)

Proof First consider the lower bound of the inequality. For codewords of length n_k, 1 ≤ k ≤ L, the difference H(X) - R̄ can be expressed as

H(X) - R̄ = Σ_{k=1}^{L} p_k log2 (1/p_k) - Σ_{k=1}^{L} p_k n_k = Σ_{k=1}^{L} p_k log2 ( 2^{-n_k} / p_k ).

We now make use of the inequality ln x ≤ x - 1 to get

H(X) - R̄ ≤ (log2 e) Σ_{k=1}^{L} p_k ( 2^{-n_k}/p_k - 1 ) = (log2 e) ( Σ_{k=1}^{L} 2^{-n_k} - 1 ) ≤ 0.

The last inequality follows from the Kraft inequality. Equality holds if and only if p_k = 2^{-n_k} for 1 ≤ k ≤ L. Thus the lower bound is proved.

Next, we prove the upper bound. Let us select the codeword lengths n_k such that 2^{-n_k} ≤ p_k < 2^{-n_k + 1}. First consider 2^{-n_k} ≤ p_k. Summing both sides over 1 ≤ k ≤ L gives us

Σ_{k=1}^{L} 2^{-n_k} ≤ Σ_{k=1}^{L} p_k = 1,

which is the Kraft inequality, for which there exists a code satisfying the prefix condition. Next consider p_k < 2^{-n_k + 1}. Taking the logarithm of both sides gives log2 p_k < -n_k + 1, or n_k < 1 - log2 p_k. On multiplying both sides by p_k and summing over 1 ≤ k ≤ L we obtain

Σ_{k=1}^{L} p_k n_k < Σ_{k=1}^{L} p_k + Σ_{k=1}^{L} ( -p_k log2 p_k ),

or R̄ < H(X) + 1. Thus the upper bound is proved.

The Source Coding Theorem tells us that for any prefix code used to represent the symbols from a source, the minimum number of bits required to represent the source symbols, on an average, must be at least equal to the entropy of the source. If we have found a prefix code that satisfies R̄ = H(X) for a certain source X, we must abandon further search because we cannot do any better. The theorem also tells us that a source with higher entropy (uncertainty) requires, on an average, more bits to represent its symbols in terms of a prefix code.

Definition 1.12 The efficiency of a prefix code is defined as

η = H(X) / R̄        (1.33)

It is clear from the Source Coding Theorem that the efficiency of a prefix code satisfies η ≤ 1. Efficient representation of symbols leads to compression of data. Source coding is primarily used for compression of data (and images).

Example 1.10 Consider a source X which generates four symbols with probabilities p_1 = 0.5, p_2 = 0.3, p_3 = 0.1 and p_4 = 0.1. The entropy of this source is

H(X) = -Σ_{k=1}^{4} p_k log2 p_k = 1.685 bits.

Suppose we use the prefix code {0, 10, 110, 111} constructed in Example 1.9. Then the average codeword length R̄ is given by

R̄ = Σ_{k=1}^{4} n(x_k) P(x_k) = 1(0.5) + 2(0.3) + 3(0.1) + 3(0.1) = 1.700 bits.

Thus we have H(X) ≤ R̄ ≤ H(X) + 1. The efficiency of this code is η = 1.685/1.700 = 0.9912. Had the source symbol probabilities been p_k = 2^{-n_k}, i.e., p_1 = 2^{-1} = 0.5, p_2 = 2^{-2} = 0.25, p_3 = 2^{-3} = 0.125 and p_4 = 2^{-3} = 0.125, the average codeword length would have been R̄ = 1.750 bits = H(X). In this case, η = 1.

1.6 HUFFMAN CODING

We will now study an algorithm for constructing efficient source codes for a DMS with source symbols that are not equally probable. A variable length encoding algorithm was suggested by Huffman in 1952, based on the source symbol probabilities P(x_i), i = 1, 2, ..., L. The algorithm is optimal in the sense that the average number of bits it requires to represent the source symbols
is a minimum, and also meets the prefix condition. The steps of the Huffman coding algorithm are given below:

(i) Arrange the source symbols in decreasing order of their probabilities.
(ii) Take the bottom two symbols and tie them together as shown in Fig. 1.10. Add the probabilities of the two symbols and write the sum on the combined node. Label the two branches with a '1' and a '0'.

Fig. 1.10 Combining the two smallest probabilities, p_{n-1} and p_n, into a new node with probability p_{n-1} + p_n in Huffman coding.

(iii) Treat this sum of probabilities as a new probability associated with a new symbol. Again pick the two smallest probabilities and tie them together to form a new probability. Each time we combine two symbols we reduce the total number of symbols by one. Whenever we tie together two probabilities (nodes), we label the two branches with a '1' and a '0'.
(iv) Continue the procedure until only one probability is left (and it should be 1 if the additions are correct!). This completes the construction of the Huffman tree.
(v) To find the prefix codeword for any symbol, follow the branches from the final node back to the symbol. While tracing back the route, read out the labels on the branches. This is the codeword for the symbol.

The algorithm can be easily understood using the following example.

Example 1.11 Consider a DMS with seven possible symbols x_i, i = 1, 2, ..., 7 and the corresponding probabilities p_1 = 0.37, p_2 = 0.33, p_3 = 0.16, p_4 = 0.07, p_5 = 0.04, p_6 = 0.02 and p_7 = 0.01. We first arrange the probabilities in decreasing order and then construct the Huffman tree as in Fig. 1.11.

    Symbol    Probability    Self-Information    Codeword
    x1        0.37           1.4344              0
    x2        0.33           1.5995              10
    x3        0.16           2.6439              110
    x4        0.07           3.8365              1110
    x5        0.04           4.6439              11110
    x6        0.02           5.6439              111110
    x7        0.01           6.6439              111111

Fig. 1.11 Huffman coding for Example 1.11.

To find the codeword for any particular symbol, we just trace back the route from the final node to the symbol. For the sake of illustration, the route for the symbol x4 (probability 0.07) is shown with a dotted line in the figure; reading out the labels of the branches on the way gives the codeword 1110. The entropy of the source is found to be

H(X) = -Σ_{k=1}^{7} p_k log2 p_k = 2.1152 bits,

and the average number of binary digits per symbol is calculated to be

R̄ = Σ_{k=1}^{7} n(x_k) P(x_k) = 1(0.37) + 2(0.33) + 3(0.16) + 4(0.07) + 5(0.04) + 6(0.02) + 6(0.01) = 2.1700 bits.

The efficiency of this code is η = 2.1152/2.1700 = 0.9747.

Example 1.12 This example shows that Huffman coding is not unique. Consider a DMS with seven possible symbols x_i, i = 1, 2, ..., 7 and the corresponding probabilities p_1 = 0.46, p_2 = 0.30, p_3 = 0.12, p_4 = 0.06, p_5 = 0.03, p_6 = 0.02 and p_7 = 0.01.

    Symbol    Probability    Self-Information    Codeword
    x1        0.46           1.1203              1
    x2        0.30           1.7370              00
    x3        0.12           3.0589              010
    x4        0.06           4.0589              0110
    x5        0.03           5.0589              01110
    x6        0.02           5.6439              011110
    x7        0.01           6.6439              011111
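A compact Huffman implementation reproduces the average codeword lengths of Examples 1.11 and 1.12. The sketch below uses Python's heapq module; because of ties during the merging steps, the individual codewords it produces may differ from the tables above, but the average length, and hence the efficiency, is the same.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: codeword} for a binary Huffman code.
    probs: dict mapping each symbol to its probability."""
    tiebreak = count()                        # avoids comparing dicts on equal probability
    heap = [(p, next(tiebreak), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)       # the two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {'x1': 0.37, 'x2': 0.33, 'x3': 0.16, 'x4': 0.07,
         'x5': 0.04, 'x6': 0.02, 'x7': 0.01}             # source of Example 1.11
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(round(avg_len, 4))                                  # 2.17 bits/symbol
```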
  • 16. Information Theory, Coding and Cryptography x1 0.46 x2 0.30 0 I X3 0.12 0 0.54 X4 0.06 Xs 0.03 0 0.24 0 1 0.12 1 Xs 0.02 X7 0.01 0 0.06 x6 ' I 0.03 1 I 1 Fig. 1.12 Huffman Coding for Example 7. 7 2. The entropy of the source is found out to be 7 H(X) =-IPk log2 Pk =1.9781 bits, k=l I and the average number of binary digits per symbol is calculated to be 7 R = I n(xk )P(xk) k=l 0 セ@ 1 = 1(0.46) + 2(0.30) + 3(0.12) + 4(0.06) + 5(0.03) + 6(0.02) + 6(0.01) = 1.9900 bits. The efficiency of this code is 11 =(1.978111.9900) =0.9940. We shall now see that Huffman coding is not unique. Consider the combination ofthe two smallest probabilities (symbols x6 。ョ、セIN@ Their sum is equal to 0.03, which is equal to the next higher probability corresponding to the symbol x5. So, for the second step, we may choose to put this combined probability (belonging to, say, symbol xt;') higher than, or lower than, the symbol x5• Suppose we put the combined probability at a lower level. We proceed further, to again find the combination ofx6' and x5 yields the probability 0.06, which is equal to that ofsymbolx4• We again have a choice whether to put the combined probability higher than, or lower than, the symbol x4• Each time we make a choice (or flip a fair coin) we end up changing the final codeword for the symbols. In Fig. 1.13, each time we have to make a choice between two probabilities that are equal, we put the probability of the combined symbols at a higher level. Source Coding X1 0.46 x2 0.30 0.54 X3 0.12 0.24 X4 0.06 0.12 xs 0.03 0 0.06 x6 0.02 0.03 X7 0.01 1 Fig. 1.13 Alternative way of Huffman Coding in Example 7. 7 2 which Leads to a Different Code. Symbol xl x2 x3 x4 Xs x6 X? The entropy of the source is 7 Probability 0.46 0.30 0.12 0.06 0.03 0.02 0.01 SelfInformation 1.1203 1.7370 3.0589 4.0589 5.0589 5.6439 6.6439 H(X) =-IPk log2 Pk =1.9781 bits, k=l and the average number of bits per symbol is 7 R = In(xk)P(xk) k=l Codeword 1 00 011 0101 01001 010000 010001 = 1(0.46) + 2(0.30) + 3(0.12) + 4(0.06) + 5(0.03) + 6(0.02) + 6(0.01) = 1.9900 bits. The efficiency of this code is 71 = (1.978111.9900) = 0.9940. Thus both codes are equally efficient. In the above examples, encoding is done symbol by symbol. A more efficient procedure is to encode blocks of B symbols at a time, In this case the bounds of the source coding theorem becomes BH(X) :::; RB < BH(X) + 1 I
since the entropy of a B-symbol block is simply BH(X), and R̄_B is the average number of bits per B-symbol block. We can rewrite the bound as

H(X) ≤ R̄_B / B < H(X) + 1/B        (1.34)

where R̄_B / B = R̄ is the average number of bits per source symbol. Thus, R̄ can be made arbitrarily close to H(X) by selecting a large enough block size B.

Example 1.13 Consider the source symbols and their respective probabilities listed below.

    Symbol    Probability    Self-Information    Codeword
    x1        0.40           1.3219              1
    x2        0.35           1.5146              00
    x3        0.25           2.0000              01

For this code, the entropy of the source is

H(X) = -Σ_{k=1}^{3} p_k log2 p_k = 1.5589 bits.

The average number of binary digits per symbol is

R̄ = Σ_{k=1}^{3} n(x_k) P(x_k) = 1(0.40) + 2(0.35) + 2(0.25) = 1.60 bits,

and the efficiency of this code is η = 1.5589/1.6000 = 0.9743.

We now group together the symbols, two at a time, and again apply the Huffman encoding algorithm. The probabilities of the symbol pairs, in decreasing order, are listed below.

    Symbol Pair    Probability    Self-Information    Codeword
    x1x1           0.1600         2.6439              10
    x1x2           0.1400         2.8365              001
    x2x1           0.1400         2.8365              010
    x2x2           0.1225         3.0291              011
    x1x3           0.1000         3.3219              111
    x3x1           0.1000         3.3219              0000
    x2x3           0.0875         3.5146              0001
    x3x2           0.0875         3.5146              1100
    x3x3           0.0625         4.0000              1101

For this code, the entropy of the pair source is

2H(X) = -Σ_{k=1}^{9} p_k log2 p_k = 3.1177 bits, which gives H(X) = 1.5589 bits.

Note that the source entropy has not changed! The average number of bits per block (symbol pair) is

R̄_B = Σ_{k=1}^{9} n(x_k) P(x_k)
    = 2(0.1600) + 3(0.1400) + 3(0.1400) + 3(0.1225) + 3(0.1000) + 4(0.1000) + 4(0.0875) + 4(0.0875) + 4(0.0625)
    = 3.1775 bits per symbol pair, which gives R̄ = 3.1775/2 = 1.5888 bits per symbol,

and the efficiency of this code is η = 1.5589/1.5888 = 0.9812. Thus we see that grouping two letters into one symbol has improved the coding efficiency.

Example 1.14 Consider the source symbols and their respective probabilities listed below.

    Symbol    Probability    Self-Information    Codeword
    x1        0.50           1.0000              1
    x2        0.30           1.7370              00
    x3        0.20           2.3219              01

For this code, the entropy of the source is

H(X) = -Σ_{k=1}^{3} p_k log2 p_k = 1.4855 bits.

The average number of bits per symbol is

R̄ = Σ_{k=1}^{3} n(x_k) P(x_k) = 1(0.50) + 2(0.30) + 2(0.20) = 1.50 bits,

and the efficiency of this code is η = 1.4855/1.5000 = 0.9903. We now group together the symbols, two at a time, and again apply the Huffman encoding algorithm. The probabilities of the symbol pairs, in decreasing order, are listed as follows.
  • 18. Information Theory, Coding and Cryptography Symbol Pairs Probability SelfInformation xlxl 0.25 2.0000 x1 x2 0.15 2.7370 XzX! 0.15 2.7370 xlx3 0.10 3.3219 x3xl 0.10 3.3219 XzXz 0.09 3.4739 XzX3 0.06 4.0589 x3x2 0.06 4.0589 クセクセ@ 0.04 4.6439 For this code, the entropy is 9 2H(X) = - L,Pk 1og2 Pk = 2.9710 bits, k=! セ@ H(X) = 1.4855 bits. The average number of bits per block (symbol pair) is 9 RB = L n(xk )P(xk) k=! Codeword 00 010 011 100 110 1010 1011 1110 1111 = 2(0.25) + 3(0.15) + 3(0.15) + 3(0.10) + 3(0.10) + 4(0.09) + 4(0.06) + 4(0.06) + 4(0.04) = 3.00 bits per symbol pair. セ@ ii = 3.00/2 = 1.5000 bits per symbol. and the efficiency of this code is ry2 = (1.4855 /1.5000) = 0.9903. In this case, grouping together two letters at a time has not increased the efficiency ofthe code! However, if we group 3 letters at a time (triplets) and then apply Huffman coding, we obtain the code efficiency as ry3 = 0.9932. Upon grouping four letters at a time we see a further improvement (TJ4 = 0.9946). 1.7 THE LEMPEL-ZIV ALGORITHM Huffman coding requires the symbol probabilities. But most real life scenarios do not provide the symbol probabilities in advance (i.e., the statistics of the source is unknown). In principle, it is possible to observe the output of the source for a long enough time period and estimate the symbol probabilities. However, this is impractical for real-time application. Also, while Huffman coding is optimal for a DMS source where the occurrence of one symbol does not alter the probabilities of the subsequent symbols, it is not the best choice for a source with r i sセイ」・@ Coding memory. For example, consider the problem of compression of written text. We know that many letters occur in pairs or groups, like 'q-u', 't-h', 'i-n-g' etc. It would be more efficient to use the statistical inter-dependence of the letters in the alphabet along with their individual probabilities of occurrence. Such a scheme was proposed by Lempel and Ziv in 1977. Their source coding algorithm does not need the source statistics. It is a Variable-to-Fixed Length Source Coding Algorithm and belongs to セ・@ class of universal source coding algorithms. The logic behind Lempel-Ziv universal coding is as follows. The compression of an arbitrary sequence of bits is possible by coding a series of O's and 1's as some previous such string (the prefix string) plus one new bit (called innovation bit). Then, the new string formed by adding the new bit to the previously used prefix string becomes a potential prefix string for· future strings. These variable length blocks are called phrases. The phrases are listed in a dictionary which stores the existing phrases and their locations. In encoding a new phrase, we specify the location of the existing phrase in the dictionary and append the new letter. We can derive a better understanding of how the Lempel-Ziv algorithm works by the following example. Example 1.15 Suppose we wish to code the string: 101011011010101011. We will begin by parsing it into comma-separated phrases that represent strings that can be represented by a previous string as a prefix, plus a bit. · The first bit, a 1, has no predecessors, so, it has a null prefix string and the one extra bit is itself: 1, 01011011010101011 The same goes for the 0 that follows since it can't be expressed in terms ofthe only existing prefix: 1, 0, 1011011010101011 So far our dictionary contains the strings' 1' and '0'. Next we encounter a 1, but it already exists in our dictionary. Hence we proceed further. 
The following 10 is obviously a combination of the prefix 1 and a 0, so we now have: 1, 0, 10, 11011010101011 Continuing in this way we eventually parse the whole string as follows: 1, 0, 10, 11, 01, 101, 010, 1011 Now, since we found 8 phrases, we will use a three bit code to label the null phrase and the first seven phrases for a total of 8 numbered phrases. Next, we write the string in terms ofthe number of the prefix phrase plus the new bit needed to create the new phrase. We will use parentheses and commas to separate these at first, in order to aid our visualization ofthe process. The eight phrases can be described by: (000,1)(000,0),(001,0),(001'1),(010,1),(011,1),(101,0),(110,1).
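The parsing just described is easy to mechanize. The following Python sketch (an illustration of the idea, not the exact encoder of any particular standard) reproduces the eight (prefix location, new bit) pairs for the string of Example 1.15; location 0 stands for the null phrase, and a trailing incomplete phrase, if any, is simply dropped here.

```python
def lz_parse(bits):
    """Parse a bit string into Lempel-Ziv phrases (prefix index, innovation bit)."""
    dictionary = {'': 0}              # phrase -> dictionary location, 0 = null phrase
    phrases, current = [], ''
    for b in bits:
        if current + b in dictionary:
            current += b              # keep extending a phrase seen before
        else:
            phrases.append((dictionary[current], b))
            dictionary[current + b] = len(dictionary)
            current = ''
    return phrases

print(lz_parse("101011011010101011"))
# [(0,'1'), (0,'0'), (1,'0'), (1,'1'), (2,'1'), (3,'1'), (5,'0'), (6,'1')]
```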
Each pair can be read out as: (codeword at location 0, 1), (codeword at location 0, 0), (codeword at location 1, 0), (codeword at location 1, 1), (codeword at location 2, 1), (codeword at location 3, 1), and so on. Thus the coded version of the above string is

00010000001000110101011110101101.

The dictionary for this example is given in Table 1.1. In this case we have not obtained any compression; our coded string is actually longer! However, the larger the initial string, the more saving we get as we move along, because prefixes that are quite large become representable as small numerical indices. In fact, Ziv proved that for long documents the compression of the file approaches the optimum obtainable as determined by the information content of the document.

Table 1.1 Dictionary for the Lempel-Ziv algorithm

    Dictionary Location    Dictionary Content    Fixed Length Codeword
    001                    1                     0001
    010                    0                     0000
    011                    10                    0010
    100                    11                    0011
    101                    01                    0101
    110                    101                   0111
    111                    010                   1010
    --                     1011                  1101

(The last phrase, 1011, needs no 3-bit location of its own, since no later phrase refers to it.)

The next question is: what should be the length of the table? In practical applications, regardless of the length of the table, it will eventually overflow. This problem can be solved by pre-deciding a large enough size of the dictionary. The encoder and decoder can then update their dictionaries by periodically substituting the less used phrases with more frequently used ones.

The Lempel-Ziv algorithm is widely used in practice. The compress and uncompress utilities of the UNIX operating system use a modified version of this algorithm. The standard algorithms for compressing binary files use codewords of 12 bits and transmit 1 extra bit to indicate a new sequence. Using such a code, the Lempel-Ziv algorithm can compress transmissions of English text by about 55 per cent, whereas the Huffman code compresses the transmission by only 43 per cent.

In the following section we will study another type of source coding scheme, particularly useful for facsimile transmission and image compression.

1.8 RUN LENGTH ENCODING AND THE PCX FORMAT

Run-Length Encoding, or RLE, is a technique used to reduce the size of a repeating string of characters. This repeating string is called a run. Typically RLE encodes a run of symbols into two bytes, a count and a symbol. RLE can compress any type of data regardless of its information content, but the content of the data to be compressed affects the compression ratio. RLE cannot achieve the high compression ratios of other compression methods, but it is easy to implement and quick to execute. RLE is supported by most bitmap file formats, such as TIFF, BMP and PCX.

Example 1.16 Consider the following bit stream:

111111111111111 0000000000000000000 1111    (15 ones, 19 zeros, 4 ones)

This can be represented as fifteen 1's, nineteen 0's, four 1's, i.e., (15, 1), (19, 0), (4, 1). Since the maximum run length is 19, which can be represented with 5 bits, we can encode the bit stream as (01111, 1), (10011, 0), (00100, 1). The compression ratio in this case is 18:38 = 1:2.11.

RLE is highly suitable for FAX images of typical office documents. These two-colour (black and white) images are predominantly white. If we spatially sample these images for conversion into digital data, we find that many horizontal lines are entirely white (long runs of 0's). Furthermore, if a given pixel is black or white, the chances are very good that the next pixel will match. The code for fax machines is actually a combination of a run-length code and a Huffman code.
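Before looking at the fax code in more detail, the run-length pairs of Example 1.16 can be reproduced with a few lines of Python (a sketch; the 5-bit count field is chosen here only because the longest run in the example is 19).

```python
from itertools import groupby

def run_length_encode(bits):
    """Return (run length, bit) pairs for a binary string."""
    return [(len(list(g)), b) for b, g in groupby(bits)]

stream = "1" * 15 + "0" * 19 + "1" * 4          # the bit stream of Example 1.16
runs = run_length_encode(stream)
print(runs)                                     # [(15, '1'), (19, '0'), (4, '1')]
print(len(runs) * (5 + 1), "vs", len(stream))   # 18 coded bits vs 38 original bits
```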
A run-length code maps run lengths into code words, and the codebook is partitioned into two parts. The first part contains symbols for runs of lengths that are a multiple of 64; the second part is made up of runs from 0 to 63 pixels. Any run length would then be represented as a multiple of 64 plus some remainder. For example, a run of 205 pixels would be sent using the code word for a run of length 192 (3 x 64) plus the code word for a run of length 13. In this way the number of bits ョセ・、・、@ to represent the run is decreased significantly. In addition, certain runs that are known to have a higher probability of occurrence are encoded into code words of short length, further reducing the number of bits that need to be transmitted. Using this type of encoding, typical compressions for facsimile transmission range between 4 to 1 and 8 to 1. Coupled to higher modem speeds, these compressions reduce the transmission time of a single page to less than a minute. Run length coding is also used for the compression of images in the PCX formaL The PCX format was introduced as part of the PC Paintbrush series of software for image painting and editing, sold by the ZSoft company. Today, the PCX format is actually an umbrella name for several image compression methods and a means to identify which has been applied. We will restrict our attention here to only one of the methods, for 256-colour images. We will restrict ourselves to that portion of the PCX data stream that actually contains the coded image, and not those parts that store the colour palette and image information such as number of lines, pixels per line, file and the coding method. The basic scheme is as follows. If a string of pixels are identical in colour value, encode them as a special flag byte which contains the count followed by a byte with the value of the repeated pixel. If the pixel is not repeated, simply encode it as the byte itself. Such simple schemes can
  • 20. Information Theory, Coding and Cryptography often become more complicated in practice. Consider that in the above scheme, if all 256 colours in a palette are used in an image, then, we need all 256 values of a byte to represent those colours. Hence, if we are going to use just bytes as our basic code unit, we don't have any possible unused byte values that can be used as a flag/count byte. On. the. other ィ。ョセL@ if we use two bytes for every coded pixel to leave room for the flag/count combmations, we mtght double the size of pathological images instead of compressing them. The compromise in the PCX format is based on the belief of its designers than many user- created drawings (which was the primary intended output of their software) would not use all 256 colours. So, they optimized their compression scheme for the case of up to 192 colors only. Images with more colours will also probably get good compression, just not quite as good, with this scheme. Example 1.17 PCX compression encodes single occurrences of colour (that is, a pixel that is not part of a run of the same colour) 0 through 191 simply as the binary byte representation of exactly that numerical value. Consider Table 1.2. Table 1.2 Example of PCX encoding P1xel color value Hex code Binary code 0 1 2 3 190 191 00 01 02 03 BE BF 00000000 00000001 00000010 00000011 10111110 10111111 Forthe colour 192 (and all the colours higher than 192), the codeword is equal to one byte in which the two most significant bits (MSBs) are both set to a 1. We will use these codewords to signify a flag and count byte. Ifthe two MSBs are equal to one, we will say that they have flagged a count. The remaining 6 bits in the flag/count byte will be interpreted as a 6 bit binary number for the count (from 0 to 63). This byte is then followed by the byte which represents the colour. In fact, ifwe have a run ofpixels of one ofthe colours with palette code even over 191, we can still code the run easily since the top two bits are not reserved in this second, colour code byte of a run coding byte pair. If a run of pixels exceeds 63 in length, we simply use this code for the first 63 pixels in the run and that code additional runs of that pixel until we exhaust all pixels in the run. The next question is: how do we code those remaining colours in a nearly full palette image when there is no run? We still code these as a run by simply setting the run length to 1. That means, for the case ofat most 64 colours which appear as single pixels in the image and not part of runs, we expand the data by a factor of two. Luckily this rarely happens! Source Coding In the next section, we will study coding for analog sources. Recall that we ideally need infinite number of bits to accurately represent an analog source. Anything fewer will only be an approximate representation. We can choose to use fewer and fewer bits for representation at the cost of a poorer approximation of the original signal. Thus, quantization of the amplitudes of the sampled signals results in data compression. We would like to study the distortion introduced when the samples from the information source are quantized. 1.9 RATE DISTORTION FUNCTION Although we live in an analog world, most of the communication takes place in digital form. Since most natural sources (e.g. speech, video etc.) are analog, they are first sampled, quantized and then processed. Consider an analog message waveform x (t) which is a sample waveform of a stochastic process X(t). 
Assuming X(t) is a bandlimited, stationary process, it can be represented by a sequence of uniform samples taken at the Nyquist rate. These samples are quantized in amplitude and encoded as a sequence of bits. A simple encoding strategy is to define L levels and encode every sample using R = log2 L bits if L is a power of 2, or R = ⌊log2 L⌋ + 1 bits if L is not a power of 2. If all levels are not equally probable, we may use entropy coding for a more efficient representation. In order to represent the analog waveform more accurately, we need more levels, which implies more bits per sample. Theoretically, we need an infinite number of bits per sample to represent an analog source perfectly. Quantization of amplitude thus results in data compression at the cost of signal integrity. It is a form of lossy data compression in which the distortion is some measure of the difference between the actual source samples {x_k} and the corresponding quantized values {x̂_k}.

Definition 1.13 The squared-error distortion is defined as

d(x_k, x̂_k) = (x_k - x̂_k)².

In general, a distortion measure may be represented as

d(x_k, x̂_k) = |x_k - x̂_k|^p.

Consider a sequence of n samples X_n and the corresponding n quantized values X̂_n. Let d(x_k, x̂_k) be the distortion measure per sample (letter). Then the distortion measure between the original sequence and the sequence of quantized values is simply the average over the n source output samples, i.e.,

d(X_n, X̂_n) = (1/n) Σ_{k=1}^{n} d(x_k, x̂_k).

We observe that the source output is a random process, hence X_n and consequently d(X_n, X̂_n) are random variables. We now define the distortion as follows.
Definition 1.14 The distortion between a sequence of n samples X_n and their corresponding n quantized values X̂_n is defined as

D = E[ d(X_n, X̂_n) ] = (1/n) Σ_{k=1}^{n} E[ d(x_k, x̂_k) ] = E[ d(x_k, x̂_k) ].

It has been assumed here that the random process is stationary.

Next, let a memoryless source have a continuous output X and a quantized output alphabet X̂. Let the probability density function of this continuous amplitude be p(x) and the per letter distortion measure be d(x, x̂), where x ∈ X and x̂ ∈ X̂. We now introduce the rate distortion function, which gives us the minimum number of bits per sample required to represent the source output symbols for a prespecified allowable distortion.

Definition 1.15 The minimum rate (in bits/source output) required to represent the output X of the memoryless source with a distortion less than or equal to D is called the rate distortion function R(D), defined as

R(D) = min_{ p(x̂|x) : E[d(X, X̂)] ≤ D }  I(X; X̂)

where I(X; X̂) is the average mutual information between X and X̂.

We will now state (without proof) two theorems related to the rate distortion function.

Theorem 1.3 The minimum information rate necessary to represent the output of a discrete time, continuous amplitude, memoryless Gaussian source with variance σ_x², based on a mean square error distortion measure per symbol, is

R_g(D) = (1/2) log2( σ_x² / D )   for 0 ≤ D ≤ σ_x²,
R_g(D) = 0                        for D > σ_x².

Consider the two cases:
(i) D ≥ σ_x²: In this case there is no need to transfer any information. For the reconstruction of the samples (with distortion greater than or equal to the variance) one can simply use statistically independent, zero mean Gaussian noise samples with variance σ_x².
(ii) D < σ_x²: In this case the number of bits per output symbol decreases monotonically as D increases. The plot of the rate distortion function is given in Fig. 1.14.

Fig. 1.14 Plot of R_g(D) versus D/σ_x².

Theorem 1.4 There exists an encoding scheme that maps the source output into codewords such that, for any given distortion D, the minimum rate R(D) bits per sample is sufficient to reconstruct the source output with an average distortion that is arbitrarily close to D.

Thus, the rate distortion function of any source gives the lower bound on the source rate that is possible for a given level of distortion.

Definition 1.16 The distortion rate function for a discrete time, memoryless Gaussian source is defined as

D_g(R) = 2^{-2R} σ_x².

Example 1.18 For a discrete time, memoryless Gaussian source, the distortion (in dB) as a function of its variance can be expressed as

10 log10 D_g(R) = -6R + 10 log10 σ_x².

Thus the mean square distortion decreases at a rate of 6 dB/bit. The rate distortion function of a discrete time, memoryless, continuous amplitude source with zero mean and finite variance σ_x², with respect to the mean square error distortion measure D, is upper bounded as

R(D) ≤ (1/2) log2( σ_x² / D ).

This upper bound can be intuitively understood as follows. We know that, for a given variance, the zero mean Gaussian random variable exhibits the maximum differential entropy attainable by any random variable. Hence, for a given distortion, the minimum number of bits per sample required for any such source is upper bounded by that required for the Gaussian random variable.
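The two Gaussian formulas above are straightforward to evaluate. The following sketch (illustrative function names) computes R_g(D) and D_g(R) and confirms the 6 dB-per-bit behaviour noted in Example 1.18.

```python
import math

def rate_distortion_gaussian(D, var):
    """R_g(D) for a memoryless Gaussian source under squared-error distortion."""
    return 0.0 if D >= var else 0.5 * math.log2(var / D)

def distortion_rate_gaussian(R, var):
    """D_g(R) = 2**(-2R) * variance."""
    return 2.0 ** (-2 * R) * var

var = 1.0
print(rate_distortion_gaussian(0.25, var))                 # 1.0 bit/sample
print(10 * math.log10(distortion_rate_gaussian(1, var)))   # about -6 dB
print(10 * math.log10(distortion_rate_gaussian(2, var)))   # about -12 dB: 6 dB per extra bit
```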
The next obvious question is: what would be a good design for a quantizer? Is there a way to construct a quantizer that minimizes the distortion without using too many bits? We shall find the answers to these questions in the next section.

1.10 OPTIMUM QUANTIZER DESIGN

In this section we look at optimum quantizer design. Consider a continuous amplitude signal whose amplitude is not uniformly distributed, but varies according to a certain probability density function p(x). We wish to design the optimum scalar quantizer that minimizes some function of the quantization error q = x̂ - x, where x̂ is the quantized value of x. The distortion resulting from the quantization can be expressed as

D = ∫ f(x̂ - x) p(x) dx,

where f(x̂ - x) is the desired function of the error. An optimum quantizer is one that minimizes D by optimally selecting the output levels and the corresponding input range of each output level. The resulting optimum quantizer is called the Lloyd-Max quantizer.

For an L-level quantizer with decision thresholds {x_k} and output levels {x̂_k}, the distortion is given by

D = Σ_{k=1}^{L} ∫_{x_{k-1}}^{x_k} f(x̂_k - x) p(x) dx.

The necessary conditions for minimum distortion are obtained by differentiating D with respect to {x_k} and {x̂_k}. As a result of the differentiation process we end up with the following system of equations:

f(x̂_k - x_k) = f(x̂_{k+1} - x_k),        k = 1, 2, ..., L - 1,
∫_{x_{k-1}}^{x_k} f'(x̂_k - x) p(x) dx = 0,        k = 1, 2, ..., L.

For f(x) = x², i.e., the mean square value of the distortion, the above equations simplify to

x_k = (x̂_k + x̂_{k+1}) / 2,        k = 1, 2, ..., L - 1,
∫_{x_{k-1}}^{x_k} (x̂_k - x) p(x) dx = 0,        k = 1, 2, ..., L.

That is, each threshold lies midway between the two adjacent output levels, and each output level is the centroid of its decision region.

Nonuniform quantizers of this kind are optimized with respect to the distortion. However, each quantized sample is represented by an equal number of bits (say, R bits/sample). It is possible to do better with a VLC: the discrete source outputs that result from quantization are characterized by a set of probabilities p_k, and these probabilities can then be used to design an efficient VLC (source coding). In order to compare the performance of different nonuniform quantizers, we first fix the distortion D and then compare the average number of bits required per sample.
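The two Lloyd-Max conditions suggest a simple fixed-point iteration: alternately place each threshold midway between adjacent output levels and move each output level to the centroid of its decision region. The sketch below does this numerically on a discretized Gaussian pdf (the grid and iteration count are arbitrary assumptions, not part of the text); its output levels come out close to those listed in Table 1.3 of the example that follows (approximately ±0.245, ±0.756, ±1.344, ±2.152).

```python
import numpy as np

def lloyd_max(grid, pdf, levels, iters=200):
    """Iterate the Lloyd-Max conditions for a scalar quantizer on a discretized pdf."""
    x, p = np.asarray(grid), np.asarray(pdf)
    y = np.sort(np.asarray(levels, dtype=float))        # output levels
    for _ in range(iters):
        t = np.concatenate(([x[0]], (y[:-1] + y[1:]) / 2, [x[-1]]))   # thresholds
        for k in range(len(y)):                          # centroid of each region
            m = (x >= t[k]) & (x <= t[k + 1])
            if p[m].sum() > 0:
                y[k] = np.sum(x[m] * p[m]) / np.sum(p[m])
    t = np.concatenate(([x[0]], (y[:-1] + y[1:]) / 2, [x[-1]]))
    return y, t

grid = np.linspace(-5, 5, 20001)
pdf = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)        # zero-mean, unit-variance Gaussian
y, t = lloyd_max(grid, pdf, np.linspace(-2, 2, 8))       # 8-level quantizer
print(np.round(y, 3))
```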
In this section, we look at compression techniques used to store and transmit image data. Images can be sampled and quantized sufficiently finely so that a binary data stream can represent the original data to an extent that is satisfactory to the most discerning eye. Since we can represent a picture by anything from a thousand to a million bytes of data, we should be able to apply the techniques studied earlier directly to the task of compressing that data for storage and transmission. First, we consider the following points: 1. High quality images are represented by very large data sets. A photographic quality image may require 40 to 100 million bits for representation. These large file sizes drive the need for extremely high compression ratios to make storage and transmission (particularly of movies) practical. 2. Applications that involve imagery such as television, movies, computer graphical user interfaces, and the World Wide Web need to be fast in execution and transmission across
  • 23. Information Theory, Coding and Cryptography distribution networks, particularly if they involve moving images, to be acceptable to the human eye. 3. Imagery is characterised by higher redundancy than is true of other data. For example, a pair of cセN、ェ。」・ョエ@ horizontal lines in an image is nearly identical while, two adjacent lines in a book are generally different. The first two points indicate that the highest level of compression technology needs to be used for the movement and storage of image data. The third factor indicates that high compression ratios could be applied. The third factor also says that some special compression techniques may be possible to take advantage of the structure and properties of image data. The close relationship between neighbouring pixels in an image can be exploited to improve the compression ratios. This has a very important implication for the task of coding and decoding image data for real-time applications. Another interesting point to note is that the human eye is highly tolerant to approximation error in an image. Thus, it may be possible to compress the image data in a manner in which the less important details (to the human eye) can be ignored. That is, by trading off some of the quality of the image we might obtain a significantly reduced data size. This technique is called Lossy Compression, as opposed to the Lossless Compression techniques discussed earlier. Such liberty cannot be taken, say, financial or textual data! Lossy Compression can only be applied to data such as images and audio where deficiencies are made up by the tolerance by human senses of sight and hearing. 1.12 THE JPEG STANDARD FOR LOSSLESS COMPRESSION The Joint Photographic Experts Group (]PEG) was formed jointly by two 'standards' organisations--the CCITT (The European Telecommunication Standards Organisation) and the International Standards Organisation (ISO). Let us now consider the lossless compression option of theJPEG Image Compression Standard which is a description of 29 distinct coding systems for compression of images. Why are there so many approaches? It is because the needs of different users vary so much with respect to quality versus compression and compression versus computation time that the committee decided to provide a broad selection from which to choose. We shall briefly discuss here two methods that use entropy coding. The two lossless JPEG compression options discussed here differ only in the form of the entropy code that is applied to the data. The user can choose either a Huffman Code or an Arithmetic Code. We will not treat the Arithmetic Code concept in much detail here. However, we will summarize its main features: Arithmetic Code, like Huffman Code, achieves compression in transmission or storage by using the probabilistic nature of the data to render the information with fewer bits than used in the source data stream. Its primary advantage over the Huffman Code is that it comes closer to the Shannon entropy limit of compression for data streams that involve a relatively small alphabet. The reason is that Huffman codes work best (highest compression ratios) when the Source Coding . probabilities of the symbols can be expressed as fractions of powers of two. The Arithmetic code construction is not closely tied to these particular values, as is the Huffman code. The computation of coding and decoding Arithmetic codes is costlier than that of Huffman codes. 
Typically a 5 to 10% reduction in file size is seen with the application of Arithmetic codes over that obtained with Huffman coding. Some compression can be achieved if we can predict the next pixel using the previous pixels. In this way we just have to transmit the prediction coefficients (or difference in the values) instead of the entire pixel. The predictive process that is used in the losslessJPEG coding schemes to form the innovations data is also variable. However, in this case, the variation is not based upon the user's choice, but rather, for any image on a line by line basis. The choice is made according to that prediction method that yields the best prediction overall for the entire line. There are eight prediction methods available in theJPEG coding standards. One of the eight (which is the no prediction option) is not used for the lossless coding option that we are examining here. The other seven may be divided into the following categories: 1. Predict the next pixel on the line as having the same value as the last one. 2. Predict the next pixel on the line as having the same value as the pixel in this position on the previous line (that is, above it). 3. Predict the next pixel on the line as havirg a value related to a combination of the previous, above and previous to the above pixel values. One such combination is simply the average of the other three. The differential encoding used in the JPEG standard consists of the differences between the actual image pixel values and the predicted values. As a result of the smoothness and redundancy present in most pictures, these differences give rise to relatively small positive and negative numbers that represent the small typical error in the prediction. Hence, the probabilities associated with these values are large for the small innovation values and quite small for large ones. This is exactly the kind of data stream that compresses well with an entropy code. The typical lossless compression for natural images is 2: 1. While this is substantial, it does not in general solve the problem of storing or moving large sequences of images as encountered in high quality video. 1.13 THE JPEG STANDARD FOR LOSSY COMPRESSION TheJPEG standard includes a set of sophisticated lossy compression options developed after a study of image distortion acceptable to human senses. The JPEG lossy compression algorithm consists of an image simplification stage, which removes the image complexity at some loss of fidelity, followed by a lossless compression step based on predictive filtering and Huffman or Arithmetic coding. The lossy image simplification step, which we will call the image reduction, is based on the exploitation of an operation known as the Discrete Cosine Transform (DCT), defined as follows.
Y(k, l) = Σ_{i=0}^{N-1} Σ_{j=0}^{M-1} 4 y(i, j) cos( πk(2i + 1)/(2N) ) cos( πl(2j + 1)/(2M) ),

where the input image is N pixels by M pixels, y(i, j) is the intensity of the pixel in row i and column j, and Y(k, l) is the DCT coefficient in row k and column l of the DCT matrix. All DCT multiplications are real. This lowers the number of required multiplications, as compared to the Discrete Fourier Transform. For most images, much of the signal energy lies at low frequencies, which appear in the upper left corner of the DCT. The lower right values represent higher frequencies, and are often small (usually small enough to be neglected with little visible distortion).

In the JPEG image reduction process, the DCT is applied to 8 by 8 pixel blocks of the image. Hence, if the image is 256 by 256 pixels in size, we break it into 32 by 32 square blocks of 8 by 8 pixels and treat each one independently. The 64 pixel values in each block are transformed by the DCT into a new set of 64 values. These new 64 values, known also as the DCT coefficients, form a whole new way of representing an image. The DCT coefficients represent the spatial frequency content of the image sub-block. The upper left corner of the DCT matrix contains the low frequency components and the lower right corner the high frequency components (see Fig. 1.15). The top left coefficient is called the DC coefficient; its value is proportional to the average value of the 8 by 8 block of pixels. The rest are called the AC coefficients.

So far we have not obtained any reduction simply by taking the DCT. However, due to the nature of most natural images, the maximum energy (information) lies in the low frequencies as opposed to the high frequencies. We can therefore represent the high frequency components coarsely, or drop them altogether, without strongly affecting the quality of the resulting image reconstruction. This leads to a lot of (lossy) compression. The JPEG lossy compression algorithm does the following operations:

1. First the lowest weights are trimmed by setting them to zero.
2. The remaining weights are quantized (that is, rounded off to the nearest of some number of discrete code-represented values), some more coarsely than others, according to observed levels of sensitivity of viewers to these degradations.

Fig. 1.15 Typical Discrete Cosine Transform (DCT) values: the DC coefficient and the low frequency coefficients lie in the upper left corner, the higher frequency (AC) coefficients towards the lower right.

Now several lossless compression steps are applied to the weight data that results from the above DCT and quantization process, for all the image blocks. We observe that the DC coefficient, which represents the average image intensity, tends to vary slowly from one block of 8 x 8 pixels to the next. Hence, prediction of this value from the surrounding blocks works well: we need to send only one DC coefficient and then the differences between the DC coefficients of successive blocks. These differences can also be source coded. We next look at the AC coefficients. We first quantize them, which transforms most of the high frequency coefficients to zero. We then use zig-zag coding, as shown in Fig. 1.16. The purpose of the zig-zag coding is to move gradually from the low frequencies to the high frequencies, avoiding abrupt jumps in the values. Zig-zag coding leads to long runs of 0's, which are ideal for RLE followed by Huffman or Arithmetic coding.
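Both the transform and the scan order are easy to prototype. The sketch below implements the DCT exactly in the unnormalized form defined above, together with a zig-zag scan; applied to the quantized 4 x 4 block of Fig. 1.16 it reproduces the sequence 4, 3, 3, 3, 2, 2, 2, 1, 2, 2, 2, 0, 0, 0, 0, 0. The block values are taken from the figure; the smooth test image and all helper names are illustrative assumptions.

```python
import numpy as np

def dct2(block):
    """2-D DCT: Y(k,l) = sum_ij 4*y(i,j)*cos(pi*k*(2i+1)/(2N))*cos(pi*l*(2j+1)/(2M))."""
    N, M = block.shape
    i, j = np.arange(N), np.arange(M)
    Y = np.empty((N, M))
    for k in range(N):
        for l in range(M):
            ck = np.cos(np.pi * k * (2 * i + 1) / (2 * N))
            cl = np.cos(np.pi * l * (2 * j + 1) / (2 * M))
            Y[k, l] = 4 * np.sum(block * np.outer(ck, cl))
    return Y

def zigzag_scan(block):
    """Read an n x n block diagonal by diagonal, alternating direction."""
    n = len(block)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))
    return [block[i][j] for i, j in order]

smooth = np.fromfunction(lambda i, j: 100 + i + j, (8, 8))
print(np.round(dct2(smooth)[:2, :2]))      # the energy concentrates in the upper-left corner

quantized = [[4, 3, 3, 2],                 # rounded DCT values of Fig. 1.16
             [3, 2, 2, 2],
             [2, 1, 0, 0],
             [2, 0, 0, 0]]
print(zigzag_scan(quantized))              # ends in a long run of zeros, ready for RLE
```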
4.32 3.12 3.01 2.41 4 3 3 2 2.74 2.11 1.92 1.55 セ@ 4333222122200000 2.11 1.33 0.32 0.11 1.62 0.44 0.03 0.02 Fig. 1.16 An Example of Quantization followed by Zig-zag Coding. The typically quoted performance for ]PEG is that photographic quality images of natural scenes can be preserved with compression ratios of up to about 20:1 or 25:1. Usable quality (that is, for noncritical purposes) can result for compression ratios in the range of 200:1 up to 230:1. 1.14 CONCLUDING REMARKS In 1948, Shannon published his landmark paper titled "A Mathematical Theory of Communication". He begins this pioneering paper on information theory by observing that the fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. He then proceeds so thoroughly to establish the foundations of information theory that his framework and terminology remain standard. Shannon's theory was an immediate success with communications engineers and stimulated the growth of a technology which led to today's Information Age. Shannon published many more provocative and influential articles in a variety of disciplines. His master's thesis, "A Symbolic Analysis of Relay and Switching Circuits", used Boolean algebra to establish the theoretical underpinnings of digital circuits. This work has broad significance because digital circuits are fundamental to the operation of modem computers and telecommunications systems.
  • 25. Information Theory, Coding and Cryptography Shannon was renowned for his eclectic interests and capabilities. A favourite story describes him juggling while riding a unicycle down the halls of Bell Labs. He designed and built chess- playing, maze-solving, juggling and mind-reading machines. These activities bear out Shannon's claim that he was more motivated by curiosity than usefulness. In his words "I just wondered how things were put together." The Huffman code was created by American Scientist, D. A. Huffman in 1952. Modified Huffman coding is today used in the Joint Photographic Experts Group (]PEG) and Moving Picture Experts Group (MEPG) standards. A very efficient technique for encoding sources without needing to know their probable occurrence was developed in the 1970s by the Israelis Abraham Lempel and Jacob Ziv. The compress and uncompress utilities of the UNIX operating system use a modified version of this algorithm. The GIF format (Graphics Interchange Format), developed by CompuServe, involves simply an application of the Lempel-Ziv-Welch (LZW) universal coding algorithm to the image data. And finally, to conclude this chapter we mention that Shannon, the father of Information Theory, passed away on February 24, 2001. Excerpts from the obituary published in the New York Times: SUMMARY • The Self-Information of the event X= x, is given by I(xJ = log ( P(:,) ) = - log P(xJ. • The Mutual Information I(x,; y) between x, and Jj is given by I(x,; y) =log ( ーセセセIIス@ • The Conditional Self-Information of the event X= xi given Y = y1is defined as J(xi I y) = log ( 1 J= - log P(xi Iy). P(xiiYJ) • The Average Mutual Information between two random variables X and Yis given by J(X:, n m n m P(X· J·) Y) = L L P(xi, y)I(xi; y) = L L P(xc y1)log '' 1 . For the case when X and Yare i=l J=l i=l J=l P(xJP(yJ) statistically independent, J(X; Y) = 0. The average mutual information J(X; Y) セ@ 0, with equality if and only if X and Yare statistically independent. Source Coding n • The Average Self-Information of a random variable Xis given by H(X; = L P(x;)I(xi) n i=l =-L P(xJlogP(xJ. H (X; is called the entropy. i=l • The Average Conditional Self-Information called the Conditional Entropy is given by n m 1 H(XI Y) = セ@ セ@ P(x;, y) log P(xiiYJ) • I(xi; y) = I(xJ- I(xi Iy) and I(X; Y) = H(X)- H(XI Y) = H(Y)- H(YIX). Since I(X; Y) セ@ 0, it implies that H(X) セ@ H(XI Y). • The Average Mutual Information between two continuous random variables X and Y 00 00 p( lx)p(x) is given by J(X; Y) = J Jp(x)p(ylx)log セクI@ ( ) dxdy -oo-oo p pY • The Differential Entropy of a continuous random variables X is given by H(X) = - Jp(x)log p{x). • The Average Conditional Entropy of a continuous random variables X given Y is given by H(XI Y) = J Jp(x, y)log p(xly)dxdy. • A necessary and sufficient condition for the existence of a binary code with codewords L having lengths n1 :5 セ@ :5 ... nL that satisfy the prefix condition is L 2-nk :5 1. The efficiency k=l H(x) of a prefix code is given by T] = R · • Let X be the ensemble of letters from a DMS with finite entropy H (X). The Source Coding Theorem suggests that it is possible to construct a code that satisfies the prefix condition, and has an average length R that satisfies the inequality H (X) :5 R <H(X) + 1. Efficient representation of symbols leads to compression of data. • Huffman Encoding and Lempel-Ziv Encoding are two popular source coding techniques. In contrast to the Huffman coding scheme, the Lempel-Ziv technique is independent of the source statistics. 
The Lempel-Ziv technique generates a Fixed Length Code, where as the Huffman code is a Variable Length Code. • Run-Length Encoding or RLE is a technique used to reduce the size of a repeating string of characters. This repeating string is called a run. Run-length encoding is supported by most bitmap file formats such as TIFF, BMP and PCX.
  • 26. Information Theory, Coding and Cryptography • Distortion implies some measure of difference between the actual source samples {x.J and the corresponding quantized value {xd. The squared-error distortion is given by d(xk, xk) =(xk- xkf In general, a distortion measure may be represented as d(xlt, xk) = jxk- xkjP. • The Minimum Rate (in bits/source output) required to represent the output X of a memoryless source with a distortion less than or equal to D is called the rate distortion function R(D), defined as R(D) = min _ I (X, X) where I(X, X); is the average p(x!x):E[d(X, X)] mutual information between X and .i. • The distortion resulting due to the quantization can be expressed as D = [ .. f(x - x) p(x)dx, where f(x- x) is the desired function of the error. An optimum quantizer is one that minimizes D by optimally selecting the output levels and the corresponding input range of each output level. The resulting optimum quantizer is called the Lloyd-Max quantizer. • Quantization and source coding techniques (Huffman coding, arithmetic coding iHld run- length coding) are used in the JPEG standard for image compression. ヲェoセ。エGB・エセセセケッキセキィ・LョNLケッキセ@ : . your ・Zyセ@ offyour セ@ 1 l - flenry FonL (1863-1947) i ..._) PRC913LEMS 1.1 Consider a DMS with source probabilities {0.30, 0.25, 0.20, 0.15, 0.10}. Find the source entropy, H (X). 1.2 Prove that the entropy for a discrete source is a maximum when the output symbols are equally probable. 1.3 Prove the inequality In x セ@ x- 1. Plot the curves y1 = In x and Y2 = x- 1 to demonstrate the validity of this inequality. 1.4 Show that I (X; Y) セ@ 0. Under what condition does the equality hold? 1.5 A source, X: has an infinitely large set of outputs with probability of occurrence given by P (xJ = Ti, i = 1, 2, 3, ..... What is the average self information, H (X), of this source? 1.6 Consider another geometrically distributed random variable X with P (x;) = p (1 - pt1 , i = 1, 2, 3, ..... What is the average self information, H (X), of this source? 1.7 Consider an integer valued random variable, X: given by P (X= n) = 1 2 , where . An log n セ@ 1 . A= L 1 2 and n= 2, 3 ..., oo. Find the entropy, H (X). n=Z n og n Source Coding 1.8 Calculate the differential entropy, H (X), of the uniformly distributed random variable X with the pdf, { -1 0 < < () a _x_a px= 0 (otherwise) Plot the differential entropy, H (X), versus the parameter a (0.1 < a< 10). Comment on the result. 1.9 Consider a DMS with source probabilities {0.35, 0.25, 0.20, 0.15, 0.05}. (i) Determine the Huffman code for this source. (ii) Determine the average length R of the codewords. (iii) What is the efficiency 11 of the code? 1.10 Consider a DMS with source probabilities {0.20, 0.20, 0.15, 0.15, 0.10, 0.10, 0.05, 0.05}. (i) Determine an efficient fixed length code for the source. (ii) Determine the Huffman code for this source. (iii) Compare the two codes and comment. 1.11 A DMS has three output symbols with probabilities {0.5, 0.4, 0.1}. (i) Determine the Huffman code for this source and find the efficiency 11· (ii) Determine the Huffman code for this source taking two symbols at a time and find the efficiency 11· (iii) Determine the Huffman code for this source taking three symbols at a time and find the efficiency 11· 1.12 For a source with entropy H(X), prove that the entropy of a B-symbol block is BH(X). 1.13 Let X and Y be random variables that take on values x1, セG@ ..., x, and Yr• )2, ..., Ys respectively. Let Z =X+ Y. 
1.14 1.15 (a) Show that H(Z!X) = H(Y!X) (b) If X and Yare independent, then argue that H( Y) セ@ H(Z) and H(X) セ@ H (Z). Comment on this observation. (c) Under what condition will H(Z) = H(X) + H(Y)? Determine the Lempel Ziv code for the following bit stream 01001111100101000001010101100110000. Recover the original sequence from the encoded stream. Find the rate distortion function R(D) =min !(X; X)for Bernoulli distributed X: with p = 0.5, where the distortion is given by { 0, d (x, x) = セN@ x=x, X= 1 X= 0, X= 0, X= 1. 1.16 Consider a source X uniformly distributed orr the set {1, 2, ..., m}. Find the rate distortion function for this source with Hamming distortion defined as d (x, x) = ' _ { o x=x 1, X -:1:- X • !
COMPUTER PROBLEMS

1.17 Write a program that performs Huffman coding, given the source probabilities. It should generate the code and give the coding efficiency.
1.18 Modify the above program so that it can group together n source symbols and then generate the Huffman code. Plot the coding efficiency η versus n for the following source symbol probabilities: {0.55, 0.25, 0.20}. For what value of n does the efficiency become better than 0.9999? Repeat the exercise for the following source symbol probabilities: {0.45, 0.25, 0.15, 0.10, 0.05}.
1.19 Write a program that executes the Lempel-Ziv algorithm. The input to the program can be English text. It should convert the letters to their ASCII codes and then perform the compression routine. It should output the compression achieved. Using this program, find out the compression achieved for the following strings of letters.
(i) The Lempel Ziv algorithm can compress the English text by about fifty five percent.
(ii) The cat cannot sit on the canopy of the car.
1.20 Write a program that performs run length encoding (RLE) on a sequence of bits and gives the coded output along with the compression ratio. What is the output of the program if the following sequence is fed into it: 1100000000111100000111111111111111111100000110000000? Now feed the encoded output back to the program, i.e., perform the RLE two times on the original sequence of bits. What do you observe? Comment.
1.21 Write a program that takes in a 2^n level grayscale image (n bits per pixel) and performs the following operations:
(i) Breaks it up into 8 by 8 pixel blocks.
(ii) Performs DCT on each of the 8 by 8 blocks.
(iii) Quantizes the DCT coefficients by retaining only the m most significant bits (MSB), where m ≤ n.
(iv) Performs the zig-zag coding followed by run length coding.
(v) Performs Huffman coding on the bit stream obtained above (think of a reasonable way of calculating the symbol probabilities).
(vi) Calculates the compression ratio.
(vii) Performs the decompression (i.e., the inverse operation of steps (v) back to (i)).
Perform image compression using this program for different values of m. Up to what value of m is there no perceptible difference between the original image and the compressed image?

2 Channel Capacity and Coding

"Experimenters think that it is a mathematical theorem, while the mathematicians believe it to be an experimental fact." (On the Gaussian curve)
— Gabriel Lippmann (1845-1921)

2.1 INTRODUCTION

In the previous chapter we saw that most natural sources have inherent redundancies and that it is possible to compress data by removing these redundancies using different source coding techniques. After efficient representation of source symbols by the minimum possible number of bits, we transmit these bit-streams over channels (e.g., telephone lines, optical fibres etc.). These bits may be transmitted as they are (for baseband communications), or after modulation (for passband communications). Unfortunately, all real-life channels are noisy. The term noise designates unwanted waves that disturb the transmission and processing of the wanted signals in communication systems. The source of noise may be external to the system (e.g., atmospheric noise, man-made noise etc.), or internal (e.g., thermal noise, shot noise etc.). In effect, the bit stream obtained at the receiver is likely to be different from what was transmitted.
In passband communication, the demodulator processes the channel-corrupted waveform and reduces each waveform to a scalar or a vector that represents an estimate of the transmitted data symbols. The detector, which follows the demodulator, decides whether the transmitted bit is a
0 or a 1. This is called Hard Decision Decoding. This decision process at the decoder is similar to binary quantization with two levels. If there are more than two levels of quantization, the detector is said to perform Soft Decision Decoding. The use of hard decision decoding causes an irreversible loss of information at the receiver. Suppose the modulator sends only binary symbols but the demodulator has an alphabet with Q symbols; assuming the use of the quantizer depicted in Fig. 2.1(a), we have Q = 8. Such a channel is called a binary-input, Q-ary output Discrete Memoryless Channel. The corresponding channel is shown in Fig. 2.1(b). The decoder performance depends on the location of the representation levels of the quantizer, which in turn depends on the signal level and the noise power. Accordingly, the demodulator must incorporate automatic gain control in order to realize an effective multilevel quantizer. It is clear that the construction of such a decoder is more complicated than the hard decision decoder. However, soft decision decoding can provide significant improvement in performance over hard decision decoding.

Fig. 2.1 (a) Transfer characteristic of the multilevel quantizer. (b) Channel transition probability diagram for the binary-input, 8-ary output channel (outputs b1, b2, ..., b8).

There are three balls that a digital communication engineer must juggle: (i) the transmitted signal power, (ii) the channel bandwidth, and (iii) the reliability of the communication system (in terms of the bit error rate). Channel coding allows us to trade off one of these commodities (signal power, bandwidth or reliability) against the others. In this chapter, we will study how to achieve reliable communication in the presence of noise. We shall ask ourselves questions like: how many bits per second can be sent over a channel of a given bandwidth and for a given signal to noise ratio (SNR)? For that, we begin by studying a few channel models first.

2.2 CHANNEL MODELS

We have already come across the simplest of the channel models, the Binary Symmetric Channel (BSC), in the previous chapter. If the modulator employs binary waveforms and the detector makes hard decisions, then the channel may be viewed as one in which a binary bit stream enters at the transmitting end and another bit stream comes out at the receiving end. This is depicted in Fig. 2.2.

Fig. 2.2 A composite discrete-input, discrete-output channel: channel encoder → modulator → channel → demodulator/detector → channel decoder.

The composite discrete-input, discrete-output channel is characterized by the set X = {0, 1} of possible inputs, the set Y = {0, 1} of possible outputs and a set of conditional probabilities that relate the possible outputs to the possible inputs. Assuming the noise in the channel causes independent errors in the transmitted binary sequence with average probability of error p,

P(Y = 0 | X = 1) = P(Y = 1 | X = 0) = p,
P(Y = 1 | X = 1) = P(Y = 0 | X = 0) = 1 - p.    (2.1)

A BSC is shown in Fig. 2.3.

Fig. 2.3 A Binary Symmetric Channel (BSC): each input is received correctly with probability 1 - p and flipped with probability p.

The BSC is a special case of a general, discrete-input, discrete-output channel. Let the input to the channel be q-ary symbols, i.e., X = {x_0, x_1, ..., x_{q-1}}, and let the output of the detector at the receiver consist of Q-ary symbols, i.e., Y = {y_0, y_1, ..., y_{Q-1}}. We assume that the channel and the modulation are memoryless.
The inputs and outputs can then be related by a set of qQ conditional probabilities

P(Y = y_i | X = x_j) = P(y_i | x_j),    (2.2)

where i = 0, 1, ..., Q - 1 and j = 0, 1, ..., q - 1. This channel is known as a Discrete Memoryless Channel (DMC) and is depicted in Fig. 2.4.

Definition 2.1 The conditional probability P(y_i | x_j) is defined as the Channel Transition Probability and is denoted by p_ji.

Definition 2.2 The conditional probabilities {P(y_i | x_j)} that characterize a DMC can be arranged in the matrix form P = [p_ji]. P is called the Probability Transition Matrix for the channel.
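Since a DMC is completely specified by its probability transition matrix, the average mutual information I(X; Y) for a given input distribution can be computed directly from P = [p_ji]. The short Python sketch below is not from the book; the function and variable names are illustrative. It evaluates I(X; Y) for an arbitrary DMC, and maximizing it over input distributions then gives the channel capacity discussed in the next section.

```python
import numpy as np

def mutual_information(p_x, P):
    """I(X;Y) in bits for input distribution p_x and transition matrix
    P, where P[j, i] = P(y_i | x_j)."""
    p_x = np.asarray(p_x, dtype=float)
    P = np.asarray(P, dtype=float)
    p_y = p_x @ P                      # output distribution P(y_i)
    I = 0.0
    for j in range(P.shape[0]):
        for i in range(P.shape[1]):
            if p_x[j] > 0 and P[j, i] > 0:
                I += p_x[j] * P[j, i] * np.log2(P[j, i] / p_y[i])
    return I

# Binary symmetric channel with crossover probability p = 0.01 and
# equally likely inputs (which achieve capacity for the BSC)
p = 0.01
P_bsc = [[1 - p, p], [p, 1 - p]]
print(mutual_information([0.5, 0.5], P_bsc))   # about 0.919 bits/channel use
```

With equally likely inputs, the value printed for p = 0.01 is approximately 0.919 bits per channel use, the BSC capacity that reappears in Example 2.3.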
Fig. 2.4 A Discrete Memoryless Channel (DMC) with q-ary input (x_0, ..., x_{q-1}) and Q-ary output (y_0, ..., y_{Q-1}).

In the next section, we will try to answer the question: how many bits can be sent across a given noisy channel each time the channel is used?

2.3 CHANNEL CAPACITY

Consider a DMC having an input alphabet X = {x_0, x_1, ..., x_{q-1}} and an output alphabet Y = {y_0, y_1, ..., y_{Q-1}}. Let us denote the set of channel transition probabilities by P(y_i | x_j). The average mutual information provided by the output Y about the input X is given by (see Chapter 1, Section 1.2)

I(X; Y) = Σ_{j=0}^{q-1} Σ_{i=0}^{Q-1} P(x_j) P(y_i | x_j) log [ P(y_i | x_j) / P(y_i) ]    (2.3)

The channel transition probabilities P(y_i | x_j) are determined by the channel characteristics (particularly the noise in the channel). However, the input symbol probabilities P(x_j) are within the control of the discrete channel encoder. The value of the average mutual information, I(X; Y), maximized over the set of input symbol probabilities P(x_j), is a quantity that depends only on the channel transition probabilities P(y_i | x_j). This quantity is called the Capacity of the Channel.

Definition 2.3 The Capacity of a DMC is defined as the maximum average mutual information in any single use of the channel, where the maximization is over all possible input probabilities. That is,

C = max_{P(x_j)} I(X; Y) = max_{P(x_j)} Σ_{j=0}^{q-1} Σ_{i=0}^{Q-1} P(x_j) P(y_i | x_j) log [ P(y_i | x_j) / P(y_i) ]    (2.4)

The maximization of I(X; Y) is performed under the constraints P(x_j) ≥ 0 and Σ_{j=0}^{q-1} P(x_j) = 1.

The unit of channel capacity is bits per channel use (provided the base of the logarithm is 2).

Example 2.1 Consider a BSC with channel transition probabilities P(0|1) = p = P(1|0). By symmetry, the capacity C = max I(X; Y) is achieved for equally likely inputs, P(x_0) = P(x_1) = 0.5. From equation (2.4) we obtain the capacity of a BSC as

C = 1 + p log2 p + (1 - p) log2(1 - p).

Let us define the entropy function H(p) = -p log2 p - (1 - p) log2(1 - p). Hence, we can rewrite the capacity of a binary symmetric channel as C = 1 - H(p).

Fig. 2.5 The capacity of a BSC, plotted versus the probability of error p.

The plot of the capacity versus p is given in Fig. 2.5. From the plot we make the following observations.
(i) For p = 0 (i.e., a noise-free channel), the capacity is 1 bit/use, as expected. Each time we use the channel, we can successfully transmit 1 bit of information.
(ii) For p = 0.5, the channel capacity is 0, i.e., observing the output gives no information about the input. It is equivalent to the case when the channel is broken. We might as well discard the channel and toss a fair coin in order to estimate what was transmitted.
(iii) For 0.5 < p < 1, the capacity increases with increasing p. In this case we simply reverse the positions of 1 and 0 at the output of the BSC.
(iv) For p = 1 (i.e., every bit gets flipped by the channel), the capacity is again 1 bit/use, as expected. In this case, one simply flips the bit at the output of the receiver so as to undo the effect of the channel.
(v) Since p is a monotonically decreasing function of the signal to noise ratio (SNR), the capacity of a BSC is a monotonically increasing function of SNR.

Having developed the notion of capacity of a channel, we shall now try to relate it to reliable communication over the channel. So far, we have only talked about the number of bits that can be sent over a channel each time it is used (bits/use). But what is the number of bits that can be sent per second (bits/sec)? To answer this question we introduce the concept of Channel Coding.

2.4 CHANNEL CODING

All real-life channels are affected by noise. Noise causes discrepancies (errors) between the input and the output data sequences of a digital communication system. For a typical noisy channel, the probability of bit error may be as high as 10^-2. This means that, on an average, 1 bit out of every 100 transmitted over this channel gets flipped. For most applications, this level of reliability is far from adequate. Different applications require different levels of reliability (which is a component of the quality of service). Table 2.1 lists the typical acceptable bit error rates for various applications.

Table 2.1 Acceptable bit error rates for various applications

Application                              | Probability of Error
Speech telephony                         | 10^-4
Voice band data                          | 10^-6
Electronic mail, Electronic newspaper    | 10^-6
Internet access                          | 10^-6
Video telephony, High speed computing    | 10^-7

In order to achieve such high levels of reliability, we resort to Channel Coding. The basic objective of channel coding is to increase the resistance of the digital communication system to channel noise. This is done by adding redundancies in the transmitted data stream in a controlled manner.

In channel coding, we map the incoming data sequence to a channel input sequence. This encoding procedure is done by the Channel Encoder. The encoded sequence is then transmitted over the noisy channel. The channel output sequence at the receiver is inverse mapped onto an output data sequence. This is called the decoding procedure, and is carried out by the Channel Decoder. Both the encoder and the decoder are under the designer's control. As already mentioned, the encoder introduces redundancy in a prescribed manner. The decoder exploits this redundancy in order to reconstruct the original source sequence as accurately as possible. Thus, channel coding makes it possible to carry out reliable communication over unreliable (noisy) channels. Channel coding is also referred to as Error Control Coding, and we will use these terms interchangeably. It is interesting to note here that the source coder reduces redundancy to improve efficiency, whereas the channel coder adds redundancy in a controlled manner to improve reliability.

We first look at a class of channel codes called Block Codes. In this class of codes, the incoming message sequence is first sub-divided into sequential blocks, each of length k bits. Each k-bit long information block is mapped into an n-bit block by the channel coder, where n > k. This means that for every k bits of information, (n - k) redundant bits are added. The ratio

r = k/n    (2.5)

is called the Code Rate. The code rate of any coding scheme is, naturally, less than unity. A small code rate implies that more and more bits per block are redundant bits, corresponding to a higher coding overhead.
This may reduce the effect of noise, but it will also reduce the communication rate, as we will end up transmitting more redundant bits and fewer information bits. The question before us is whether there exists a coding scheme such that the probability that a message bit will be in error is arbitrarily small and yet the code rate is not too small. The answer is yes, and it was first provided by Shannon in his second theorem on channel capacity. We will study this shortly.

Let us now introduce the concept of time into our discussion. We wish to look at questions like: how many bits per second can we send over a given noisy channel with arbitrarily low bit error rate? Suppose the DMS has the source alphabet X and entropy H(X) bits per source symbol, and the source generates a symbol every T_s seconds; then the average information rate of the source is H(X)/T_s bits per second. Let us assume that the channel can be used once every T_c seconds, and that the capacity of the channel is C bits per channel use. Then, the channel capacity per unit time is C/T_c bits per second. We now state Shannon's second theorem, known as the Channel Coding Theorem.

Theorem 2.1 Channel Coding Theorem (Noisy Coding Theorem)
(i) Let a DMS with an alphabet X have entropy H(X) and produce symbols every T_s seconds. Let a DMC have capacity C and be used once every T_c seconds. Then, if

H(X)/T_s ≤ C/T_c    (2.6)

there exists a coding scheme for which the source output can be transmitted over the noisy channel and be reconstructed with an arbitrarily low probability of error.
(ii) Conversely, if

H(X)/T_s > C/T_c    (2.7)

it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error.

The parameter C/T_c is called the Critical Rate.
The channel coding theorem is a very important result in information theory. The theorem specifies the channel capacity, C, as a fundamental limit on the rate at which reliable communication can be carried out over an unreliable (noisy) DMC. It should be noted that the channel coding theorem tells us about the existence of codes that can achieve reliable communication in a noisy environment. Unfortunately, it does not give us a recipe to construct these codes. Therefore, channel coding is still an active area of research, as the search for better and better codes goes on. From the next chapter onwards we shall study some good channel codes.

Example 2.2 Consider a DMS that emits equally likely binary symbols (p = 0.5) once every T_s seconds. The entropy of this binary source is

H(p) = -p log2 p - (1 - p) log2(1 - p) = 1 bit.

The information rate of this source is H(X)/T_s = 1/T_s bits/second. Suppose we wish to transmit the source symbols over a noisy channel. The source sequence is applied to a channel coder with code rate r. This channel coder uses the channel once every T_c seconds to send the coded sequence. We want reliable communication (the probability of error as small as desired). From the channel coding theorem, if

1/T_s ≤ C/T_c    (2.8)

we can make the probability of error as small as desired by a suitable choice of a channel coding scheme, and hence have reliable communication. We note that the code rate of the coder can be expressed as

r = T_c/T_s    (2.9)

Hence, the condition for reliable communication can be rewritten as

r ≤ C    (2.10)

Thus, for a BSC one can find a suitable channel coding scheme with a code rate r ≤ C which will ensure reliable communication regardless of how noisy the channel is! Of course, we can state that at least one such code exists, but finding that code may not be a trivial job. As we shall see later, the level of noise in the channel manifests itself by limiting the channel capacity, and hence the code rate.

Example 2.3 Consider a BSC with a transition probability p = 10^-2. Such error rates are typical of wireless channels. We saw in Example 2.1 that for a BSC the capacity is given by

C = 1 + p log2 p + (1 - p) log2(1 - p).

By plugging in the value p = 10^-2 we obtain the channel capacity C = 0.919. From the previous example we can conclude that there exists at least one coding scheme with code rate r ≤ 0.919 which will guarantee a probability of error that is as small as desired.

Example 2.4 Consider the repetition code in which each message bit is simply repeated n times, where n is an odd integer. For example, for n = 3, we have the mapping scheme 0 → 000; 1 → 111. Similarly, for n = 5 we have the mapping scheme 0 → 00000; 1 → 11111. Note that the code rate of the repetition code with blocklength n is

r = 1/n    (2.11)

The decoding strategy is as follows: if in a block of n received bits the number of 0's exceeds the number of 1's, decide in favour of 0, and vice versa. This is known as Majority Decoding. This also answers the question why n should be an odd integer for repetition codes. Let n = 2m + 1, where m is a positive integer. This decoding strategy will make an error if more than m bits are in error, because in that case, if a 0 is encoded and sent, there would be more 1's than 0's in the received word. Let us assume that the a priori probabilities of 1 and 0 are equal.
Then, the average probability of error is given by

P_e = Σ_{k=m+1}^{2m+1} C(2m+1, k) p^k (1 - p)^{2m+1-k}    (2.12)

where p is the channel transition probability and C(2m+1, k) denotes the binomial coefficient. The average probability of error for repetition codes of different code rates is given in Table 2.2 (the r = 1 entry corresponds to the uncoded channel with p = 10^-2).

Table 2.2 Average probability of error for repetition codes

Code Rate, r:                          1        1/3       1/5      1/7       1/9      1/11
Average Probability of Error, P_e:     10^-2    3x10^-4   10^-6    4x10^-7   10^-8    5x10^-10
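Equation (2.12) is a simple binomial sum, so the behaviour summarized in Table 2.2 can be explored numerically. The sketch below is illustrative only (it assumes the crossover probability p = 10^-2 of Example 2.3); it evaluates P_e for a rate-1/n repetition code under majority decoding and shows the rapid fall-off as the rate decreases.

```python
from math import comb

def repetition_error_prob(n, p):
    """P_e for a rate-1/n repetition code (n = 2m+1) over a BSC with
    crossover probability p, decoded by majority vote (Eq. 2.12)."""
    m = (n - 1) // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(m + 1, n + 1))

p = 1e-2
for n in [1, 3, 5, 7, 9, 11]:
    print(f"r = 1/{n:<2d}  Pe = {repetition_error_prob(n, p):.2e}")
```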
From the table we see that as the code rate decreases, there is a steep fall in the average probability of error. The decrease in P_e is much more rapid than the decrease in the code rate, r. However, for repetition codes, the code rate tends to zero if we want smaller and smaller P_e. Thus the repetition code exchanges code rate for message reliability. But the channel coding theorem states that the code rate need not tend to zero in order to obtain an arbitrarily low probability of error. The theorem merely requires the code rate r to be less than the channel capacity, C. So there must exist some code (other than the repetition code) with code rate r = 0.9 which can achieve an arbitrarily low probability of error. Such a coding scheme would add just 1 parity bit to 9 information bits (or, maybe, add 10 extra bits to 90 information bits) and give us as small a P_e as desired (say, 10^-20)! The hard part is finding such a code.

2.5 INFORMATION CAPACITY THEOREM

So far we have studied limits on the maximum rate at which information can be sent over a channel reliably, in terms of the channel capacity. In this section we will formulate the Information Capacity Theorem for band-limited, power-limited Gaussian channels.

Consider a zero-mean, stationary random process X(t) that is band-limited to W Hertz. Let X_k, k = 1, 2, ..., K, denote the continuous random variables obtained by uniform sampling of the process X(t) at the Nyquist rate of 2W samples per second. These symbols are transmitted over a noisy channel which is also band-limited to W Hertz. The channel output is corrupted by Additive White Gaussian Noise (AWGN) of zero mean and power spectral density (psd) N0/2. Because of the channel, the noise is band-limited to W Hertz. Let Y_k, k = 1, 2, ..., K, denote the samples of the received signal. Therefore,

Y_k = X_k + N_k,  k = 1, 2, ..., K    (2.13)

where N_k is the noise sample with zero mean and variance σ² = N0 W. It is assumed that the Y_k, k = 1, 2, ..., K, are statistically independent. Since the transmitter is usually power-limited, let us put a constraint on the average power in X_k:

E[X_k²] = P,  k = 1, 2, ..., K    (2.14)

The information capacity of this band-limited, power-limited channel is the maximum of the mutual information between the channel input X_k and the channel output Y_k. The maximization has to be done over all distributions of the input X_k that satisfy the power constraint of equation (2.14). Thus, the information capacity of the channel (same as the channel capacity) is given by

C = max_{f_{X_k}(x)} { I(X_k; Y_k) : E[X_k²] = P }    (2.15)

where f_{X_k}(x) is the probability density function of X_k. Now, from the previous chapter we have

I(X_k; Y_k) = h(Y_k) - h(Y_k | X_k)    (2.16)

Note that X_k and N_k are independent random variables. Therefore, the conditional differential entropy of Y_k given X_k is equal to the differential entropy of N_k. Intuitively, this is because, given X_k, the uncertainty in Y_k is purely due to N_k. That is,

h(Y_k | X_k) = h(N_k)    (2.17)

Hence we can write Eq. (2.16) as

I(X_k; Y_k) = h(Y_k) - h(N_k)    (2.18)

Since h(N_k) is independent of X_k, maximizing I(X_k; Y_k) translates to maximizing h(Y_k). It can be shown that for h(Y_k) to be maximum, Y_k has to be a Gaussian random variable (see Problem 2.10). If we assume Y_k to be Gaussian, and N_k is Gaussian by definition, then X_k must also be Gaussian. This is because the sum (or difference) of two Gaussian random variables is also Gaussian.
Thus, in order to maximize the mutual information between the channel input X_k and the channel output Y_k, the transmitted signal should also be Gaussian. Therefore, we can rewrite (2.15) as

C = I(X_k; Y_k), with E[X_k²] = P and X_k Gaussian    (2.19)

We know that if two independent Gaussian random variables are added, the variance of the resulting Gaussian random variable is the sum of the variances. Therefore, the variance of the received sample Y_k equals P + N0 W. It can be shown that the differential entropy of a Gaussian random variable with variance σ² is (1/2) log2(2πeσ²) (see Problem 2.10). Therefore,

h(Y_k) = (1/2) log2[2πe(P + N0 W)]    (2.20)

and

h(N_k) = (1/2) log2[2πe(N0 W)]    (2.21)

Substituting these values of differential entropy for Y_k and N_k, we get

C = (1/2) log2(1 + P/(N0 W)) bits per channel use    (2.22)

We are transmitting 2W samples per second, i.e., the channel is being used 2W times in one second. Therefore, the information capacity can be expressed as

C = W log2(1 + P/(N0 W)) bits per second    (2.23)

This basic formula for the capacity of the band-limited AWGN waveform channel with a band-limited and average power-limited input was first derived by Shannon in 1948. It is known as Shannon's third theorem, the Information Capacity Theorem.

Theorem 2.2 (Information Capacity Theorem) The information capacity of a continuous channel of bandwidth W Hertz, disturbed by Additive White Gaussian Noise of power spectral density N0/2 and limited in bandwidth to W, is given by

C = W log2(1 + P/(N0 W)) bits per second

where P is the average transmitted power. This theorem is also called the Channel Capacity Theorem.
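The capacity formula of Theorem 2.2 is straightforward to evaluate numerically. The following sketch is illustrative (the function name and the chosen numbers are not from the book); it computes the capacity of a band-limited AWGN channel for a given bandwidth and SNR.

```python
import numpy as np

def awgn_capacity(W, P, N0):
    """Information capacity C = W log2(1 + P/(N0 W)) in bits/second."""
    return W * np.log2(1.0 + P / (N0 * W))

# Example: a 3000 Hz channel with SNR = P/(N0 W) = 20 dB
W = 3000.0
snr = 10.0 ** (20.0 / 10.0)        # 20 dB -> a power ratio of 100
N0 = 1.0
P = snr * N0 * W
print(awgn_capacity(W, P, N0))     # about 1.997e4 bits/second
```

For a 3000 Hz channel at 20 dB SNR the formula gives roughly 2 x 10^4 bits/second, which is the kind of calculation asked for in Problem 2.5.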
The Information Capacity Theorem is one of the most important results in information theory. In a single formula one can see the trade-off between the channel bandwidth, the average transmitted power and the noise power spectral density. Given the channel bandwidth and the SNR, the channel capacity (bits/second) can be computed. This channel capacity is the fundamental limit on the rate of reliable communication for a power-limited, band-limited Gaussian channel. It should be kept in mind that in order to approach this limit, the transmitted signal must have statistical properties that are Gaussian in nature. Note that the terms channel capacity and information capacity have been used interchangeably.

Let us now derive the same result in a more intuitive manner. Suppose we have a coding scheme that results in an acceptably low probability of error. Let this coding scheme take k information bits and encode them into n-bit long codewords. The total number of codewords is M = 2^k. Let the average power per bit be P. Thus the average power required to transmit an entire codeword is nP. Let these codewords be transmitted over a Gaussian channel with noise variance σ². The received vector of n bits is also Gaussian, with mean equal to the transmitted codeword and variance equal to nσ². Since the code is a good one (acceptable error rate), the received vector lies inside a sphere of radius sqrt(nσ²) centred on the transmitted codeword. This sphere itself is contained in a larger sphere of radius sqrt(n(P + σ²)), where n(P + σ²) is the average power of the received vector.

This concept may be visualized as depicted in Fig. 2.6. There is a large sphere of radius sqrt(n(P + σ²)) which contains M smaller spheres of radius sqrt(nσ²). Here M = 2^k is the total number of codewords. Each of these small spheres is centred on a codeword. These are called the Decoding Spheres. Any received word lying within a sphere is decoded as the codeword on which the sphere is centred. Suppose a codeword is transmitted over a noisy channel. Then there is a high probability that the received vector will lie inside the correct decoding sphere (since it is a reasonably good code). The question arises: how many non-intersecting spheres can be packed inside the large sphere? The more spheres one can pack, the more efficient the code will be in terms of the code rate. This is known as the Sphere Packing Problem.

Fig. 2.6 Visualization of the Sphere Packing Problem.

The volume of an n-dimensional sphere of radius r can be expressed as

V = A_n r^n    (2.24)

where A_n is a scaling factor. Therefore, the volume of the large sphere (the sphere of all possible received vectors) can be written as

V_all = A_n [n(P + σ²)]^(n/2)    (2.25)

and the volume of a decoding sphere can be written as

V_dec = A_n [nσ²]^(n/2)    (2.26)

The maximum number of non-intersecting decoding spheres that can be packed inside the large sphere of all possible received vectors is

M = A_n [n(P + σ²)]^(n/2) / (A_n [nσ²]^(n/2)) = (1 + P/σ²)^(n/2) = 2^((n/2) log2(1 + P/σ²))    (2.27)

On taking the logarithm (base 2) of both sides of the equation we get

log2 M = (n/2) log2(1 + P/σ²)    (2.28)

Observing that k = log2 M, we have

k/n = (1/2) log2(1 + P/σ²)    (2.29)

Note that each time we use the channel, we effectively transmit k/n bits. Thus, the maximum number of bits that can be transmitted per channel use, with a low probability of error, is (1/2) log2(1 + P/σ²), as seen previously in Eq. (2.22).
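The sphere-packing bound of Eq. (2.29) is easy to explore numerically. The sketch below is illustrative (the function name and the chosen SNR values are not from the book); it computes the maximum number of bits per channel use and the corresponding number of decoding spheres for a given blocklength.

```python
import numpy as np

def sphere_packing_rate(snr):
    """Maximum bits per channel use k/n = 0.5 * log2(1 + P/sigma^2)."""
    return 0.5 * np.log2(1.0 + snr)

n = 100                                  # codeword length (dimensionality)
for snr_db in [0, 10, 20]:
    snr = 10.0 ** (snr_db / 10.0)
    rate = sphere_packing_rate(snr)
    print(f"SNR = {snr_db:2d} dB  k/n = {rate:.3f}  "
          f"M = 2^{rate * n:.0f} decoding spheres")
```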
Note that σ² represents the noise power and is equal to N0 W for AWGN with power spectral density N0/2 and limited in bandwidth to W.

2.6 THE SHANNON LIMIT

Consider a Gaussian channel that is limited both in power and in bandwidth. We wish to explore the limits of a communication system under these constraints. Let us define an ideal system as one which transmits data at a bit rate R_b equal to the capacity, C, of the channel, i.e., R_b = C. Suppose the energy per bit is E_b. Then the average transmitted power is

P = E_b R_b = E_b C    (2.30)

Therefore, the channel capacity theorem for this ideal system can be written as

C/W = log2(1 + (E_b/N0)(C/W))    (2.31)
This equation can be re-written in the following form:

E_b/N0 = (2^(C/W) - 1) / (C/W)    (2.32)

The plot of the bandwidth efficiency R_b/W versus E_b/N0 is called the Bandwidth Efficiency Diagram, and is given in Fig. 2.7. The ideal system is represented by the line R_b = C.

Fig. 2.7 The Bandwidth Efficiency Diagram: bandwidth efficiency R_b/W plotted against E_b/N0 (dB), with the capacity boundary R_b = C.

The following conclusions can be drawn from the Bandwidth Efficiency Diagram.

(i) For infinite bandwidth, the ratio E_b/N0 tends to the limiting value

(E_b/N0)|_{W→∞} = ln 2 = 0.693 = -1.6 dB    (2.33)

This value is called the Shannon Limit. It is interesting to note that the Shannon limit is a fraction. This implies that for very large bandwidths, reliable communication is possible even when the signal power is less than the noise power! The channel capacity corresponding to this limiting value is

C|_{W→∞} = (P/N0) log2 e    (2.34)

Thus, at infinite bandwidth, the capacity of the channel is determined by the SNR.

(ii) The curve for the critical rate R_b = C is known as the Capacity Boundary. For the case R_b > C, reliable communication is not guaranteed. However, for R_b < C, there exists some coding scheme which can provide an arbitrarily low probability of error.

(iii) The Bandwidth Efficiency Diagram shows the trade-offs between the quantities R_b/W, E_b/N0 and the probability of error, P_e. Note that for designing any communication system the basic design parameters are the available bandwidth, the SNR and the bit error rate (BER). The BER is determined by the application and the quality of service (QoS) desired. The bandwidth and the power can be traded one for the other to provide the desired BER.

(iv) Any point on the Bandwidth Efficiency Diagram corresponds to an operating point, i.e., a set of values of SNR, bandwidth efficiency and BER. The information capacity theorem predicts the maximum amount of information that can be transmitted through a given bandwidth for a given SNR. We see from Fig. 2.7 that acceptable capacity can be achieved even for low SNRs, provided adequate bandwidth is available. The optimum usage of a given bandwidth is obtained when the signals are noise-like and a minimal SNR is maintained at the receiver. This principle lies at the heart of any spread spectrum communication system, like Code Division Multiple Access (CDMA).

2.7 RANDOM SELECTION OF CODES

Consider a set of M coded signal waveforms constructed from a set of n-dimensional binary codewords. Let us represent these codewords as follows:

c_i = [c_i1 c_i2 ... c_in],  i = 1, 2, ..., M    (2.35)

Since we are considering binary codes, c_ij is either a 0 or a 1. Let each bit of the codeword be mapped onto a BPSK waveform p_j(t), so that the codeword may be represented as

s_i(t) = Σ_{j=1}^{n} s_ij p_j(t),  i = 1, 2, ..., M    (2.36)

where

s_ij = +sqrt(E) for c_ij = 1, and s_ij = -sqrt(E) for c_ij = 0    (2.37)

and E is the energy per code bit. The waveform s_i(t) can then be represented as the n-dimensional vector

s_i = [s_i1 s_i2 ... s_in],  i = 1, 2, ..., M    (2.38)

We observe that this corresponds to a hypercube in the n-dimensional space. Let us now encode k bits of information into an n-bit long codeword, and map this codeword to one of the M
waveforms. Note that there are a total of 2^k possible waveforms corresponding to the M = 2^k different codewords. Let the information rate into the encoder be R bits/sec. The encoder takes in k bits at a time and maps the k-bit block to one of the M waveforms. Thus, k = RT and M = 2^k = 2^(RT) signals are required. Let us define a parameter D as follows:

D = n/T dimensions/sec    (2.39)

so that n = DT is the dimensionality of the space. The hypercube mentioned above has 2^n = 2^(DT) vertices. Of these, we must choose M = 2^(RT) to transmit the information. Under the constraint D > R, the fraction of vertices that can be used as signal points is

F = 2^k / 2^n = 2^(RT) / 2^(DT) = 2^(-(D-R)T)    (2.40)

For D > R, F → 0 as T → ∞. Since n = DT, this implies that F → 0 as n → ∞. Designing a good coding scheme translates to choosing M vertices out of the 2^n vertices of the hypercube in such a manner that the probability of error tends to zero as we increase n. We saw that the fraction F tends to zero as we choose larger and larger n. This implies that it is possible to increase the minimum distance between these M signal points as n → ∞. Increasing the minimum distance between the signal points gives us a probability of error P_e → 0.

There are (2^n)^M = 2^(nM) distinct ways of choosing M codewords out of the total 2^n vertices. Each of these choices corresponds to a coding scheme. For each set of M waveforms, it is possible to design a communication system consisting of a modulator and a demodulator. Thus, there are 2^(nM) communication systems, one for each choice of the M coded waveforms. Each of these communication systems is characterized by its probability of error. Of course, many of these communication systems will perform poorly in terms of the probability of error.

Let us pick one of the codes at random from the possible 2^(nM) sets of codes. The random selection of the m-th code occurs with the probability

P({s_i}_m) = 2^(-nM)    (2.41)

Let the corresponding probability of error for this choice of code be P_e({s_i}_m). Then the average probability of error over the ensemble of codes is

P̄_e = Σ_m P_e({s_i}_m) P({s_i}_m) = 2^(-nM) Σ_m P_e({s_i}_m)    (2.42)

We will next try to upper bound this average probability of error. If we have an upper bound on P̄_e, then we can conclude that there exists at least one code for which this upper bound will also hold. Furthermore, if P̄_e → 0 as n → ∞, we can surmise that P_e({s_i}) → 0 as n → ∞ for that code.

Consider the transmission of a k-bit message X_k = [x_1 x_2 ... x_k], where x_j is binary for j = 1, 2, ..., k. The conditional probability of error averaged over all possible codes is

P̄_e(X_k) = Σ_{all codes} P_e(X_k, {s_i}_m) P({s_i}_m)    (2.43)

where P_e(X_k, {s_i}_m) is the conditional probability of error for a given k-bit message X_k = [x_1 x_2 ... x_k] which is transmitted using the code {s_i}_m. For the m-th code,

P_e(X_k, {s_i}_m) ≤ Σ_{l=1, l≠k}^{M} P_2m(s_l, s_k)    (2.44)

where P_2m(s_l, s_k) is the probability of error for the binary communication system using the signal vectors s_l and s_k to transmit one of two equally likely k-bit messages. Hence,

P̄_e(X_k) ≤ Σ_{all codes} P({s_i}_m) Σ_{l=1, l≠k}^{M} P_2m(s_l, s_k)    (2.45)

On changing the order of summation we obtain

P̄_e(X_k) ≤ Σ_{l=1, l≠k}^{M} [ Σ_{all codes} P({s_i}_m) P_2m(s_l, s_k) ] = Σ_{l=1, l≠k}^{M} P̄_2(s_l, s_k)    (2.46)

where P̄_2(s_l, s_k) represents the ensemble average of P_2m(s_l, s_k) over the 2^(nM) codes.
For the additive white Gaussian noise channel,

P_2m(s_l, s_k) = Q( sqrt(d_lk² / (2 N0)) )    (2.47)

where

d_lk² = |s_l - s_k|² = Σ_{j=1}^{n} (s_lj - s_kj)² = d (2 sqrt(E))² = 4 d E    (2.48)

and d is the number of places in which s_l and s_k differ. Therefore,

P_2m(s_l, s_k) = Q( sqrt(2 d E / N0) )    (2.49)

Under the assumption that all codes are equally probable, it is equally likely that the vector s_l is any of the 2^n vertices of the hypercube. Further, s_l and s_k are statistically independent. Hence, the probability that s_l and s_k differ in exactly d places is
P(d) = (1/2)^n (n choose d)    (2.50)

The expected value of P_2m(s_l, s_k) over the ensemble of codes is then given by

P̄_2(s_l, s_k) = Σ_{d=0}^{n} P(d) Q( sqrt(2 d E / N0) )    (2.51)

Using the upper bound

Q( sqrt(2 d E / N0) ) ≤ e^(-dE/N0)    (2.52)

we obtain

P̄_2(s_l, s_k) ≤ (1/2)^n Σ_{d=0}^{n} (n choose d) e^(-dE/N0) = [ (1/2)(1 + e^(-E/N0)) ]^n    (2.53)

From Eqs. (2.46) and (2.53) we obtain

P̄_e(X_k) ≤ Σ_{l=1, l≠k}^{M} P̄_2(s_l, s_k) = (M - 1) [ (1/2)(1 + e^(-E/N0)) ]^n < M [ (1/2)(1 + e^(-E/N0)) ]^n    (2.54)

Recall that we need an upper bound on P̄_e, the average error probability. To obtain P̄_e we average P̄_e(X_k) over all possible k-bit information sequences. Thus,

P̄_e < M [ (1/2)(1 + e^(-E/N0)) ]^n    (2.55)

We now define a new parameter as follows.

Definition 2.4 The Cutoff Rate R0 is defined as

R0 = log2 [ 2 / (1 + e^(-E/N0)) ] = 1 - log2(1 + e^(-E/N0))    (2.56)

The cutoff rate has the units of bits/dimension. Observe that 0 ≤ R0 ≤ 1. The plot of R0 with respect to the SNR per dimension is given in Fig. 2.8. Equation (2.55) can now be written succinctly as

P̄_e < M 2^(-n R0) = 2^(RT) 2^(-n R0)    (2.57)

Substituting n = DT, we obtain

P̄_e < 2^(-T(D R0 - R))    (2.58)

If we substitute T = n/D, we obtain

P̄_e < 2^(-n(R0 - R/D))    (2.59)

Fig. 2.8 Cutoff Rate, R0, versus the SNR per dimension, E/N0 (dB).

Observe that

R/D = R/(n/T) = RT/n = k/n = R_c    (2.60)

Here, R_c represents the code rate. Hence the average error probability can be written in the following instructive form:

P̄_e < 2^(-n(R0 - R_c))    (2.61)

From the above equation we can conclude the following.

(i) For R_c < R0 the average probability of error P̄_e → 0 as n → ∞. Since by choosing large values of n, P̄_e can be made arbitrarily small, there exist good codes in the ensemble which have a probability of error less than P̄_e.
(ii) We observe that P̄_e is the ensemble average. Therefore, if a code is selected at random, the probability that its error probability exceeds a·P̄_e is less than 1/a. This implies that no more than 10% of the codes have an error probability exceeding 10·P̄_e. Thus, there are many good codes.
(iii) The codes whose probability of error exceeds P̄_e are not always bad codes. The probability of error of these codes may be reduced by increasing the dimensionality, n.
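The cutoff rate of Definition 2.4 and the random-coding bound of Eq. (2.61) are easy to evaluate numerically. The sketch below is illustrative (the SNR, code rate and target error probability are assumed values, not from the book); it shows roughly how large the block length n must be for the bound 2^(-n(R0 - Rc)) to fall below a target error probability.

```python
import numpy as np

def cutoff_rate(e_n0):
    """R0 = 1 - log2(1 + exp(-E/N0)) in bits/dimension (Eq. 2.56)."""
    return 1.0 - np.log2(1.0 + np.exp(-e_n0))

e_n0 = 10.0 ** (5.0 / 10.0)            # SNR per dimension of 5 dB
R0 = cutoff_rate(e_n0)
Rc = 0.5                               # code rate; must satisfy Rc < R0
target = 1e-6
# Smallest n with 2^{-n(R0 - Rc)} <= target
n = int(np.ceil(-np.log2(target) / (R0 - Rc)))
print(f"R0 = {R0:.3f} bits/dimension, need n >= {n} for the bound < {target}")
```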
For binary coded signals, the cutoff rate R0 saturates at 1 bit/dimension for large values of E/N0, say E/N0 > 10. Thus, to achieve lower probabilities of error one must reduce the code rate R_c; alternatively, very large block lengths have to be used. This is not an efficient approach, so binary codes become inefficient at high SNRs. For high-SNR scenarios, non-binary coded signal sets should be used to achieve an increase in the number of bits per dimension. Multiple-amplitude coded signal sets can easily be constructed from non-binary codes by mapping each code element into one of the possible amplitude levels (e.g. Pulse Amplitude Modulation). For random codes using M-ary multi-amplitude signals, it was shown by Shannon (in 1959) that

R0* = (1/2) log2(1 + P/(2σ²)) = (1/2) log2(1 + E/N0)    (2.62)

Let us now relate the cutoff rate R0* to the capacity of the AWGN channel, which is given by

C = W log2(1 + P/(N0 W)) bits per second    (2.63)

The energy per code bit is equal to

E = PT/n    (2.64)

Recall from the sampling theorem that a signal of bandwidth W may be represented by samples taken at a rate 2W. Thus, in a time interval of length T there are n = 2WT samples. Therefore, we may write D = n/T = 2W. Hence,

P = nE/T = DE    (2.65)

Define the normalized capacity C_n = C/(2W) = C/D and substitute for W and P in (2.63) to obtain

C_n = (1/2) log2(1 + 2E/N0) = (1/2) log2(1 + 2 R_c γ_b)    (2.66)

where γ_b is the SNR per bit. The normalized capacity, C_n, and the cutoff rate, R0*, are plotted in Fig. 2.9. From the figure we can conclude the following:

(i) R0* < C_n for all values of E/N0. This is expected because C_n is the ultimate limit on the transmission rate R/D.
(ii) For smaller values of E/N0, the difference between C_n and R0* is approximately 3 dB. This means that randomly selected, average power limited, multi-amplitude signals yield R0* within 3 dB of channel capacity.

Fig. 2.9 The normalized capacity, C_n, and cutoff rate, R0*, for an AWGN channel, plotted versus E/N0 (dB).

2.8 CONCLUDING REMARKS

Pioneering work in the area of channel capacity was done by Shannon in 1948. Shannon's second theorem was indeed a surprising result at the time of its publication. It claimed that the probability of error for a BSC could be made as small as desired provided the code rate was less than the channel capacity. This theorem paved the way for a systematic study of reliable communication over unreliable (noisy) channels. Shannon's third theorem, the Information Capacity Theorem, is one of the most remarkable results in information theory. It gives a relation between the channel bandwidth, the signal to noise ratio and the channel capacity.

Additional work was carried out in the 1950s and 1960s by Gilbert, Gallager, Wyner, Forney and Viterbi, to name some of the prominent contributors. The concept of cutoff rate was also developed by Shannon, but was later used by Wozencraft, Jacobs and Kennedy as a design parameter for communication systems. Jordan used the concept of cutoff rate to design coded waveforms for M-ary orthogonal signals with coherent and non-coherent detection. Cutoff rates have been widely used as a design criterion for various channels, including fading channels encountered in wireless communications.
SUMMARY

• The conditional probability P(y_i | x_j) is called the channel transition probability and is denoted by p_ji. The conditional probabilities {P(y_i | x_j)} that characterize a DMC can be arranged in the matrix form P = [p_ji]. P is known as the probability transition matrix for the channel.

• The capacity of a discrete memoryless channel (DMC) is defined as the maximum average mutual information in any single use of the channel, where the maximization is over all possible input probabilities. That is,

C = max_{P(x_j)} I(X; Y) = max_{P(x_j)} Σ_{j=0}^{q-1} Σ_{i=0}^{Q-1} P(x_j) P(y_i | x_j) log [ P(y_i | x_j) / P(y_i) ]

• The basic objective of channel coding is to increase the resistance of the digital communication system to channel noise. This is done by adding redundancies in the transmitted data stream in a controlled manner. Channel coding is also referred to as error control coding.

• The ratio r = k/n is called the code rate. The code rate of any coding scheme is always less than unity.

• Let a DMS with an alphabet X have entropy H(X) and produce symbols every T_s seconds. Let a DMC have capacity C and be used once every T_c seconds. Then, if H(X)/T_s ≤ C/T_c, there exists a coding scheme for which the source output can be transmitted over the noisy channel and be reconstructed with an arbitrarily low probability of error. This is the Channel Coding Theorem or the Noisy Coding Theorem.

• For H(X)/T_s > C/T_c, it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error. The parameter C/T_c is called the Critical Rate.

• The information capacity can be expressed as C = W log2(1 + P/(N0 W)) bits per second. This is the basic formula for the capacity of the band-limited AWGN waveform channel with a band-limited and average power-limited input. This is the crux of the Information Capacity Theorem, which is also called the Channel Capacity Theorem.

• The cutoff rate R0 is given by R0 = log2[2/(1 + e^(-E/N0))] = 1 - log2(1 + e^(-E/N0)). The cutoff rate has the units of bits/dimension. Note that 0 ≤ R0 ≤ 1. The average error probability in terms of the cutoff rate can be written as P̄_e < 2^(-n(R0 - R_c)). For R_c < R0 the average probability of error P̄_e → 0 as n → ∞.

PROBLEMS

2.1 Consider the binary channel shown in Fig. 2.10. Let the a priori probabilities of sending the binary symbols be p0 and p1, where p0 + p1 = 1. Find the a posteriori probabilities P(X = 0 | Y = 0) and P(X = 1 | Y = 1).

Fig. 2.10 A binary channel with transition probabilities q and 1 - q.

2.2 Find the capacity of the binary erasure channel shown in Fig. 2.11, where p0 and p1 are the a priori probabilities.

Fig. 2.11 A binary erasure channel with erasure output e and erasure probability q.

2.3 Consider the channels A, B and the cascaded channel AB shown in Fig. 2.12.
(a) Find C_A, the capacity of channel A.
(b) Find C_B, the capacity of channel B.
(c) Next, cascade the two channels and determine the combined capacity C_AB.
(d) Explain the relation between C_A, C_B and C_AB.
Fig. 2.12 Channels A and B, and the cascaded channel AB.

2.4 Find the capacity of the channel shown in Fig. 2.13.

Fig. 2.13 A channel with transition probabilities of 0.5.

2.5 (a) A telephone channel has a bandwidth of 3000 Hz and an SNR of 20 dB. Determine the channel capacity.
(b) If the SNR is increased to 25 dB, determine the capacity.

2.6 Determine the channel capacity of the channel shown in Fig. 2.14.

Fig. 2.14 A binary channel with transition probabilities p and 1 - p.

2.7 Suppose a TV displays 30 frames/second. There are approximately 2 x 10^5 pixels per frame, each pixel requiring 16 bits for colour display. Assuming an SNR of 25 dB, calculate the bandwidth required to support the transmission of the TV video signal (use the Information Capacity Theorem).

2.8 Consider the Z channel shown in Fig. 2.15.
(a) Find the input probabilities that result in capacity.
(b) If N such channels are cascaded, show that the combined channel can be represented by an equivalent Z channel with channel transition probability p^N.
(c) What is the capacity of the combined channel as N → ∞?

Fig. 2.15 The Z channel with transition probabilities p and 1 - p.

2.9 Consider a communication system using antipodal signalling. The SNR is 20 dB.
(a) Find the cutoff rate, R0.
(b) We want to design a code which results in an average probability of error P_e < 10^-6. What is the best code rate we can achieve?
(c) What will be the dimensionality, n, of this code?
(d) Repeat parts (a), (b) and (c) for an SNR of 5 dB. Compare the results.

2.10 (a) Prove that for a finite variance σ², the Gaussian random variable has the largest differential entropy attainable by any random variable.
(b) Show that this entropy is given by (1/2) log2(2πeσ²).

COMPUTER PROBLEMS

2.11 Write a computer program that takes in the channel transition probability matrix and computes the capacity of the channel.

2.12 Plot the operating points on the bandwidth efficiency diagram for M-PSK, M = 2, 4, 8, 16 and 32, and the probabilities of error: (a) P_e = 10^-6 and (b) P_e = 10^-8.

2.13 Write a program that implements the binary repetition code of rate 1/n, where n is an odd integer. Develop a decoder for the repetition code. Test the performance of this coding scheme over a BSC with channel transition probability p. Generalize the program for a repetition code of rate 1/n over GF(q). Plot the residual Bit Error Rate (BER) versus p and q (make a 3-D mesh plot).
3 Linear Block Codes for Error Correction

[Chapter epigraph — Richard W. Hamming]

3.1 INTRODUCTION TO ERROR CORRECTING CODES

In this age of information, there is an increasing need not only for speed, but also for accuracy in the storage, retrieval, and transmission of data. The channels over which messages are transmitted are often imperfect. Machines do make errors, and their non-man-made mistakes can turn otherwise flawless programming into worthless, even dangerous, trash. Just as architects design buildings that will stand even through an earthquake, their computer counterparts have come up with sophisticated techniques capable of counteracting the digital manifestations of Murphy's Law ("If anything can go wrong, it will"). Error Correcting Codes are a kind of safety net: the mathematical insurance against the vagaries of an imperfect digital world.

Error Correcting Codes, as the name suggests, are used for correcting errors when messages are transmitted over a noisy channel or when stored data is retrieved. The physical medium through which the messages are transmitted is called a channel (e.g. a telephone line, a satellite link, a wireless channel used for mobile communications etc.). Different kinds of channels are
prone to different kinds of noise, which corrupt the data being transmitted. The noise could be caused by lightning, human errors, equipment malfunction, voltage surges etc. Because these error correcting codes try to overcome the detrimental effects of noise in the channel, the encoding procedure is also called Channel Coding. Error control codes are also used for accurate transfer of information from one place to another, for example when storing data and reading it from a compact disc (CD). In this case, the error could be due to a scratch on the surface of the CD. The error correcting coding scheme will try to recover the original data from the corrupted version.

The basic idea behind error correcting codes is to add some redundancy, in the form of extra symbols, to a message prior to its transmission through a noisy channel. This redundancy is added in a controlled manner. The encoded message, when transmitted, might be corrupted by noise in the channel. At the receiver, the original message can be recovered from the corrupted one if the number of errors is within the limit for which the code has been designed. The block diagram of a digital communication system is illustrated in Fig. 3.1. Note that the most important block in the figure is that of noise, without which there would be no need for the channel encoder.

Example 3.1 Let us see how redundancy combats the effects of noise. The normal language that we use to communicate (say, English) has a lot of redundancy built into it. Consider the following sentence.

CODNG THEORY IS AN INTRSTNG SUBJCT.

As we can see, there are a number of errors in this sentence. However, due to familiarity with the language we may guess the original text to have read:

CODING THEORY IS AN INTERESTING SUBJECT.

What we have just used is an error correcting strategy that makes use of the in-built redundancy in the English language to reconstruct the original message from the corrupted one.

Fig. 3.1 Block diagram (and the principle) of a digital communication system: information source → channel encoder → modulator → channel (with noise) → demodulator → channel decoder → use of information. The source coder/decoder block has not been shown.

The objectives of a good error control coding scheme are:
(i) good error correcting capability in terms of the number of errors that it can rectify,
(ii) fast and efficient encoding of the message,
(iii) fast and efficient decoding of the received message,
(iv) maximum transfer of information bits per unit time (i.e., fewer overheads in terms of redundancy).

The first objective is the primary one. In order to increase the error correcting capability of a coding scheme one must introduce more redundancy. However, increased redundancy leads to a slower rate of transfer of the actual information. Thus objectives (i) and (iv) are not totally compatible. Also, as the coding strategies become more complicated in order to correct a larger number of errors, objectives (ii) and (iii) also become difficult to achieve.

In this chapter, we shall first learn the basic definitions of error control coding. These definitions, as we shall see, will be used throughout this book. The concept of Linear Block Codes will then be introduced. Linear Block Codes form a very large class of useful codes. We will see that it is very easy to work with the matrix description of these codes. In the later part of this chapter, we will learn how to efficiently decode these Linear Block Codes.
Finally, the notion of perfect codes and optimal linear codes will be introduced.

3.2 BASIC DEFINITIONS

Given here are some basic definitions, which will be frequently used here as well as in the later chapters.

Definition 3.1 A Word is a sequence of symbols.

Definition 3.2 A Code is a set of vectors called Codewords.

Definition 3.3 The Hamming Weight of a codeword (or of any vector) is equal to the number of non-zero elements in the codeword. The Hamming Weight of a codeword c is denoted by w(c). The Hamming Distance between two codewords is the number of places in which the codewords differ. The Hamming Distance between two codewords c1 and c2 is denoted by d(c1, c2). It is easy to see that d(c1, c2) = w(c1 - c2).

Example 3.2 Consider a code C with two codewords, C = {0100, 1111}, with Hamming Weights w(0100) = 1 and w(1111) = 4. The Hamming Distance between the two codewords is 3 because they differ in the 1st, 3rd and 4th places. Observe that w(0100 - 1111) = w(1011) = 3 = d(0100, 1111).

Example 3.3 For the code C = {01234, 43210}, the Hamming Weight of each codeword is 4 and the Hamming Distance between the codewords is 4 (because only the 3rd component of the two codewords is identical, while they differ in 4 places).
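The Hamming weight and Hamming distance of Definition 3.3 are simple to compute. The short sketch below is illustrative (the function names are not from the book); it checks the values quoted in Examples 3.2 and 3.3 for codewords represented as strings of symbols.

```python
def hamming_weight(word):
    """Number of non-zero symbols in a codeword given as a string."""
    return sum(symbol != '0' for symbol in word)

def hamming_distance(c1, c2):
    """Number of positions in which two equal-length codewords differ."""
    assert len(c1) == len(c2)
    return sum(a != b for a, b in zip(c1, c2))

print(hamming_weight('0100'), hamming_weight('1111'))   # 1 4
print(hamming_distance('0100', '1111'))                 # 3 (Example 3.2)
print(hamming_distance('01234', '43210'))               # 4 (Example 3.3)
```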
Definition 3.4 A Block Code consists of a set of fixed-length codewords. The fixed length of these codewords is called the Block Length and is typically denoted by n. Thus, a code of blocklength n consists of a set of codewords having n components.

A block code of size M defined over an alphabet with q symbols is a set of M q-ary sequences, each of length n. In the special case that q = 2, the symbols are called bits and the code is said to be a binary code. Usually, M = q^k for some integer k, and we call such a code an (n, k) code.

Example 3.4 The code C = {00000, 10100, 11110, 11001} is a block code of block length equal to 5. This code can be used to represent two-bit binary numbers as follows:

Uncoded bits | Codewords
00           | 00000
01           | 10100
10           | 11110
11           | 11001

Here M = 4, k = 2 and n = 5. Suppose we have to transmit a sequence of 1's and 0's using the above coding scheme, and say that the sequence to be encoded is 1 0 0 1 0 1 0 0 1 1 ... The first step is to break the sequence into groups of two bits (because we want to encode two bits at a time). So we partition it as follows: 10 01 01 00 11 ... Next, we replace each block by its corresponding codeword:

11110 10100 10100 00000 11001 ...

Thus 5 coded bits are sent for every 2 bits of uncoded message. It should be observed that for every 2 bits of information we are sending 3 extra bits (redundancy).

Definition 3.5 The Code Rate of an (n, k) code is defined as the ratio (k/n), and denotes the fraction of the codeword that consists of the information symbols. The code rate is always less than unity. The smaller the code rate, the greater the redundancy, i.e., more redundant symbols are present per information symbol in a codeword. A code with greater redundancy has the potential to detect and correct more symbols in error, but it reduces the actual rate of transmission of information.

Definition 3.6 The Minimum Distance of a code is the minimum Hamming distance between any two codewords. If the code C consists of the set of codewords {c_i, i = 0, 1, ..., M-1}, then the minimum distance of the code is given by d* = min d(c_i, c_j), i ≠ j. An (n, k) code with minimum distance d* is sometimes denoted by (n, k, d*).

Definition 3.7 The Minimum Weight of a code is the smallest weight of any non-zero codeword, and is denoted by w*.

Theorem 3.1 For a linear code the minimum distance is equal to the minimum weight of the code, i.e., d* = w*.

Intuitive proof: The distance d_ij between any two codewords c_i and c_j is simply the weight of the codeword formed by c_i - c_j. Since the code is linear, the difference of two codewords results in another valid codeword. Thus, the minimum weight of a non-zero codeword reflects the minimum distance of the code.

Definition 3.8 A Linear Code has the following properties:
(i) The sum of two codewords belonging to the code is also a codeword belonging to the code.
(ii) The all-zero word is always a codeword.
(iii) The minimum Hamming distance between two codewords of a linear code is equal to the minimum weight of any non-zero codeword, i.e., d* = w*.

Note that if the sum of two codewords is another codeword, the difference of two codewords will also yield a valid codeword. For example, if c1, c2 and c3 are valid codewords such that c1 + c2 = c3, then c3 - c1 = c2. Hence it is obvious that the all-zero codeword must always be a valid codeword for a linear block code (self-subtraction of a codeword).

Example 3.5 The code C = {0000, 1010, 0101, 1111} is a linear block code of block length n = 4. Observe that all ten possible sums of the codewords,

0000 + 0000 = 0000, 0000 + 1010 = 1010, 0000 + 0101 = 0101, 0000 + 1111 = 1111, 1010 + 1010 = 0000, 1010 + 0101 = 1111, 1010 + 1111 = 0101, 0101 + 0101 = 0000, 0101 + 1111 = 1010 and 1111 + 1111 = 0000,

are in C, and the all-zero codeword is in C. The minimum distance of this code is d* = 2. In order to verify the minimum distance of this linear code we can determine the distance between all pairs of codewords (which are C(4, 2) = 6 in number):

d(0000, 1010) = 2, d(0000, 0101) = 2, d(0000, 1111) = 4,
d(1010, 0101) = 4, d(1010, 1111) = 2, d(0101, 1111) = 2.

We observe that the minimum distance of this code is 2.

Note that the code given in Example 3.4 is not linear because 10100 + 11110 = 01010, which is not a valid codeword. Even though the all-zero word is a valid codeword, it does not guarantee linearity. The presence of an all-zero codeword is thus a necessary but not a sufficient condition for linearity.
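For small binary codes, linearity and the minimum distance can be checked exhaustively, which makes Theorem 3.1 and Examples 3.4 and 3.5 easy to verify. The sketch below is illustrative (binary codes only; the function names are not from the book).

```python
from itertools import combinations

def xor_words(a, b):
    """Componentwise sum over GF(2) of two binary strings."""
    return ''.join('1' if x != y else '0' for x, y in zip(a, b))

def is_linear(code):
    """A binary block code is linear if it is closed under GF(2) addition."""
    return all(xor_words(a, b) in code for a in code for b in code)

def minimum_distance(code):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(code, 2))

C1 = {'0000', '1010', '0101', '1111'}        # Example 3.5: linear, d* = 2
C2 = {'00000', '10100', '11110', '11001'}    # Example 3.4: not linear
print(is_linear(C1), minimum_distance(C1))   # True 2
print(is_linear(C2))                         # False
```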
Example 3.5 The code C = {0000, 1010, 0101, 1111} is a linear block code of block length n = 4. Observe that all the ten possible sums of the codewords, 0000 + 0000 = 0000, 0000 + 1010 = 1010, 0000 + 0101 = 0101, 0000 + 1111 = 1111, 1010 + 1010 = 0000, 1010 + 0101 = 1111, 1010 + 1111 = 0101, 0101 + 0101 = 0000, 0101 + 1111 = 1010 and 1111 + 1111 = 0000, are in C, and the all-zero codeword is in C. The minimum distance of this code is d* = 2. In order to verify the minimum distance of this linear code we can determine the distance between all pairs of codewords (which are C(4, 2) = 6 in number):

    d(0000, 1010) = 2, d(0000, 0101) = 2, d(0000, 1111) = 4,
    d(1010, 0101) = 4, d(1010, 1111) = 2, d(0101, 1111) = 2.

We observe that the minimum distance of this code is 2. Note that the code given in Example 3.4 is not linear because 1010 + 1111 = 0101, which is not a valid codeword. Even though the all-zero word is a valid codeword, it does not guarantee linearity. The presence of an all-zero codeword is thus a necessary but not a sufficient condition for linearity.
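The closure test carried out by hand in Example 3.5 is easy to automate. A minimal sketch, assuming binary codewords represented as tuples of 0s and 1s (the function name is illustrative):

    from itertools import product

    def is_linear_binary(code):
        # A binary code is linear iff the modulo-2 sum of every pair of
        # codewords is again a codeword.
        codewords = set(code)
        for c1, c2 in product(codewords, repeat=2):
            s = tuple((a + b) % 2 for a, b in zip(c1, c2))
            if s not in codewords:
                return False
        return True

    C = {(0,0,0,0), (1,0,1,0), (0,1,0,1), (1,1,1,1)}            # Example 3.5: linear
    D = {(0,0,0,0,0), (1,0,1,0,0), (1,1,1,1,0), (1,1,0,0,1)}    # Example 3.4: not linear
    print(is_linear_binary(C), is_linear_binary(D))             # True False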
In order to make the error correcting codes easier to use, understand and analyze, it is helpful to impose some basic algebraic structure on them. As we shall soon see, it is useful to have an alphabet wherein it is easy to carry out basic mathematical operations such as add, subtract, multiply and divide.

Definition 3.9 A field F is a set of elements with two operations + (addition) and . (multiplication) satisfying the following properties:
(i) F is closed under + and ., i.e., a + b and a . b are in F if a and b are in F.
For all a, b and c in F, the following hold:
(ii) Commutative laws: a + b = b + a, a . b = b . a
(iii) Associative laws: (a + b) + c = a + (b + c), a . (b . c) = (a . b) . c
(iv) Distributive law: a . (b + c) = a . b + a . c
Further, identity elements 0 and 1 must exist in F satisfying:
(v) a + 0 = a
(vi) a . 1 = a
(vii) For any a in F, there exists an additive inverse (-a) such that a + (-a) = 0.
(viii) For any non-zero a in F, there exists a multiplicative inverse (a^-1) such that a . a^-1 = 1.

The above properties are true for fields with both finite as well as infinite elements. A field with a finite number of elements (say, q) is called a Galois Field (pronounced 'galva' field) and is denoted by GF(q). If only the first seven properties are satisfied, then it is called a ring.

Example 3.6 Consider GF(4) with 4 elements {0, 1, 2, 3}. The addition and multiplication tables for GF(4) are

    +  0 1 2 3        .  0 1 2 3
    0  0 1 2 3        0  0 0 0 0
    1  1 0 3 2        1  0 1 2 3
    2  2 3 0 1        2  0 2 3 1
    3  3 2 1 0        3  0 3 1 2

It should be noted here that the addition in GF(4) is not modulo-4 addition.

Let us define a vector space, GF(q)^n, which is a set of n-tuples of elements from GF(q). Linear block codes can be looked upon as a set of n-tuples (vectors of length n) over GF(q) such that the sum of two codewords is also a codeword, and the product of any codeword by a field element is a codeword. Thus, a linear block code is a subspace of GF(q)^n. Let S be a set of vectors of length n whose components are defined over GF(q). The set of all linear combinations of the vectors of S is called the linear span of S and is denoted by <S>. The linear span is thus a subspace of GF(q)^n, generated by S. Given any subset S of GF(q)^n, it is possible to obtain a linear code C = <S> generated by S, consisting of precisely the following codewords:
(i) the all-zero word,
(ii) all words in S,
(iii) all linear combinations of two or more words in S.

Example 3.7 Let S = {1100, 0100, 0011}. All possible linear combinations of S are 1100 + 0100 = 1000, 1100 + 0011 = 1111, 0100 + 0011 = 0111, 1100 + 0100 + 0011 = 1011. Therefore, C = <S> = {0000, 1100, 0100, 0011, 1000, 1111, 0111, 1011}. The minimum distance of this code is w(0100) = 1.

Example 3.8 Let S = {12, 21} defined over GF(3). The addition and multiplication tables of the field GF(3) = {0, 1, 2} are given by:

    +  0 1 2        .  0 1 2
    0  0 1 2        0  0 0 0
    1  1 2 0        1  0 1 2
    2  2 0 1        2  0 2 1

All possible linear combinations of 12 and 21 are: 12 + 21 = 00, 12 + 2(21) = 21, 2(12) + 21 = 12. Therefore, C = <S> = {00, 12, 21, 00, 21, 12} = {00, 12, 21}.

3.3 MATRIX DESCRIPTION OF LINEAR BLOCK CODES

As we have observed earlier, any code C is a subspace of GF(q)^n. Any set of basis vectors can be used to generate the code space. We can, therefore, define a generator matrix, G, the rows of which form the basis vectors of the subspace. The rows of G will be linearly independent. Thus, a linear combination of the rows can be used to generate the codewords of C.
The generator matrix will be a k x n matrix with rank k. Since the choice of the basis vectors is not unique, the generator matrix is not unique for a given linear code.
The generator matrix converts (encodes) a vector of length k to a vector of length n. Let the input vector (uncoded symbols) be represented by i. The coded symbols will be given by

    c = iG                                                          (3.1)

where c is called the codeword and i is called the information word. The generator matrix provides a concise and efficient way of representing a linear block code. The k x n matrix can generate q^k codewords. Thus, instead of having a large look-up table of q^k codewords, one can simply store a generator matrix. This provides an enormous saving in storage space for large codes. For example, for the binary (46, 24) code the total number of codewords is 2^24 = 16,777,216, and the size of the look-up table of codewords would be n x 2^k = 771,751,936 bits. On the other hand, if we use a generator matrix, the total storage requirement is only n x k = 46 x 24 = 1104 bits.

Example 3.9 Consider the generator matrix

    G = [1 0 1]
        [0 1 0]

The four information words and the corresponding codewords are

    c1 = [0 0] G = [0 0 0],   c2 = [0 1] G = [0 1 0],
    c3 = [1 0] G = [1 0 1],   c4 = [1 1] G = [1 1 1].

Therefore, this generator matrix generates the code C = {000, 010, 101, 111}.

3.4 EQUIVALENT CODES

Definition 3.10 A permutation of a set S = {x1, x2, ..., xn} is a one-to-one mapping from S to itself. A permutation can be denoted as follows:

    x1      x2      ...   xn
    ↓       ↓             ↓                                         (3.2)
    f(x1)   f(x2)   ...   f(xn)

Definition 3.11 Two q-ary codes are called equivalent if one can be obtained from the other by one or both of the operations listed below:
(i) permutation of the symbols appearing in a fixed position,
(ii) permutation of the positions of the code.

Suppose a code containing M codewords is displayed in the form of an M x n matrix, where the rows represent the codewords. Operation (i) corresponds to the re-labelling of the symbols appearing in a given column, and operation (ii) represents the rearrangement of the columns of the matrix.

Example 3.10 Consider the ternary code (a code whose components belong to {0, 1, 2}) of block length 3

    C = {1 2 0
         2 0 1
         0 1 2}

If we apply the permutation 0 -> 2, 2 -> 1, 1 -> 0 to column 2 and 1 -> 2, 0 -> 1, 2 -> 0 to column 3, we obtain

    C1 = {1 1 1
          2 2 2
          0 0 0}

The code C1 is equivalent to a repetition code of length 3. Note that the original code is not linear, but is equivalent to a linear code.

Definition 3.12 Two linear q-ary codes are called equivalent if one can be obtained from the other by one or both of the operations listed below:
(i) multiplication of the components by a non-zero scalar,
(ii) permutation of the positions of the code.

Note that in Definition 3.11 we have defined equivalent codes that are not necessarily linear.

Theorem 3.2 Two k x n matrices generate equivalent linear (n, k) codes over GF(q) if one matrix can be obtained from the other by a sequence of the following operations:
(i) Permutation of rows
(ii) Multiplication of a row by a non-zero scalar
(iii) Addition of a scalar multiple of one row to another
(iv) Permutation of columns
(v) Multiplication of any column by a non-zero scalar.
Proof: The first three operations (which are just row operations) preserve the linear independence of the rows of the generator matrix. These operations merely modify the basis. The last two operations (which are column operations) convert the matrix to one which will produce an equivalent code.

Theorem 3.3 A generator matrix can be reduced to its systematic form (also called the standard form of the generator matrix) of the type G = [I | P], where I is a k x k identity matrix and P is a k x (n - k) matrix.

Proof: The k rows of any generator matrix (of size k x n) are linearly independent. Hence, by performing elementary row operations and column permutations it is possible to obtain an equivalent generator matrix in a row echelon form. This matrix will be of the form [I | P].

Example 3.11 Consider the generator matrix of a (4, 3) code over GF(3):

    G = [0 1 2 1]
        [1 0 1 0]
        [1 2 2 1]

Let us represent the i-th row by r_i and the j-th column by c_j. Upon replacing r3 by r3 - r1 - r2 we get (note that in GF(3), -1 = 2 and -2 = 1 because 1 + 2 = 0; see the table in Example 3.8)

    G = [0 1 2 1]
        [1 0 1 0]
        [0 1 2 0]

Next we replace r1 by r1 - r3 to obtain

    G = [0 0 0 1]
        [1 0 1 0]
        [0 1 2 0]

Finally, shifting c4 -> c1, c1 -> c2, c2 -> c3 and c3 -> c4 we obtain the standard form of the generator matrix

    G = [1 0 0 0]
        [0 1 0 1]
        [0 0 1 2]

3.5 PARITY CHECK MATRIX

One of the objectives of a good code design is to have fast and efficient encoding and decoding methodologies. So far we have dealt with the efficient generation of linear block codes using a generator matrix. Codewords are obtained simply by multiplying the input vector (uncoded word) by the generator matrix. Is it possible to detect a valid codeword using a similar concept? The answer is yes, and such a matrix is called the Parity Check Matrix, H, for the given code. For a parity check matrix,

    cH^T = 0                                                        (3.3)

where c is a valid codeword. Since c = iG, therefore iGH^T = 0. For this to hold true for all valid information words we must have

    GH^T = 0                                                        (3.4)

The size of the parity check matrix is (n - k) x n. A parity check matrix provides a simple method of detecting whether an error has occurred or not. If the multiplication of the received word (at the receiver) with the transpose of H yields a non-zero vector, it implies that an error has occurred. This methodology, however, will fail if the errors in the transmitted codeword exceed the number of errors for which the coding scheme is designed. We shall soon find out that the non-zero product cH^T might help us not only to detect but also to correct the errors under some conditions. Suppose the generator matrix is represented in its systematic form G = [I | P]. The matrix P is called the Coefficient Matrix. Then the parity check matrix will be defined as

    H = [-P^T | I],                                                 (3.5)

where P^T represents the transpose of the matrix P. This is because

    GH^T = [I | P][-P^T | I]^T = -P + P = 0                         (3.6)

Since the choice of a generator matrix is not unique for a code, the parity check matrix will not be unique either. Given a generator matrix G, we can determine the corresponding parity check matrix and vice versa. Thus the parity check matrix H can be used to specify the code completely. From Eq. (3.3) we observe that the vector c must have 1's in such positions that the corresponding rows of H^T add up to the zero vector 0. Now, we know that the number of 1's in a codeword pertains to its Hamming weight.
Hence, the minimum distance d* of a linear block code is given by the minimum number of rows of H^T (or, equivalently, columns of H) whose sum is equal to the zero vector.
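The relations c = iG, H = [-P^T | I] and s = vH^T are all small matrix computations and are easy to verify numerically for binary codes. A minimal sketch (the function names are illustrative; arithmetic is modulo 2, and the systematic generator used in the demonstration is one for the code C = {0000, 1011, 0101, 1110}):

    def encode(i, G):
        # c = iG over GF(2), Eq. (3.1).
        return [sum(ib * gb for ib, gb in zip(i, col)) % 2 for col in zip(*G)]

    def parity_check_from_systematic(G):
        # G = [I_k | P]  ->  H = [P^T | I_(n-k)]  (over GF(2), -P^T = P^T), Eq. (3.5).
        k, n = len(G), len(G[0])
        P = [row[k:] for row in G]
        return [[P[i][j] for i in range(k)] + [int(l == j) for l in range(n - k)]
                for j in range(n - k)]

    def syndrome(v, H):
        # s = vH^T over GF(2); zero for every valid codeword.
        return [sum(vb * hb for vb, hb in zip(v, row)) % 2 for row in H]

    G = [[1, 0, 1, 1],
         [0, 1, 0, 1]]
    H = parity_check_from_systematic(G)      # [[1, 0, 1, 0], [1, 1, 0, 1]]
    c = encode([1, 1], G)                    # [1, 1, 1, 0]
    print(syndrome(c, H))                    # [0, 0]  -> valid codeword
    print(syndrome([1, 1, 0, 1], H))         # [1, 1]  -> an error has occurred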
Example 3.12 For a (7, 4) linear block code the generator matrix is given by

    G = [1 0 0 0 1 0 1]
        [0 1 0 0 1 1 1]
        [0 0 1 0 0 1 0]
        [0 0 0 1 0 1 0]

The matrix P is given by

    P = [1 0 1]
        [1 1 1]
        [0 1 0]
        [0 1 0]

and P^T is given by

    P^T = [1 1 0 0]
          [0 1 1 1]
          [1 1 0 0]

Observing the fact that -1 = 1 for the binary case, we can write the parity check matrix as

    H = [-P^T | I] = [1 1 0 0 1 0 0]
                     [0 1 1 1 0 1 0]
                     [1 1 0 0 0 0 1]

Note that the columns 1, 5 and 7 of the parity check matrix, H, add up to the zero vector. Hence, for this code, d* = 3.

Theorem 3.4 The code C contains a non-zero codeword of Hamming weight w or less if and only if a linearly dependent set of w columns of H exists.

Proof: Consider a codeword c in C. Let the weight of c be w, which implies that there are w non-zero components and (n - w) zero components in c. If we discard the (n - w) zero components, then from the relation cH^T = 0 we can conclude that w columns of H are linearly dependent. Conversely, if H has w linearly dependent columns, then a linear combination of at most w columns is zero. These w non-zero coefficients would define a codeword of weight w or less that satisfies cH^T = 0.

Definition 3.13 An (n, k) systematic code is one in which the first k symbols of the codeword of block length n are the information symbols themselves (i.e., the uncoded vector), and the remaining (n - k) symbols form the parity symbols.

Example 3.13 The following is a (5, 2) systematic code over GF(3):

    S. No.   Information symbols (k = 2)   Codewords (n = 5)
    1.       00                            00 000
    2.       01                            01 121
    3.       02                            02 220
    4.       10                            10 012
    5.       11                            11 221
    6.       12                            12 210
    7.       20                            20 020
    8.       21                            21 100
    9.       22                            22 212

Note that the total number of codewords is 3^k = 3^2 = 9. Each codeword begins with the information symbols and has three parity symbols at the end. The parity symbols for the information word 01 are 121 in the above table. A generator matrix in the systematic form (standard form) will generate a systematic code.

Theorem 3.5 The minimum distance (minimum weight) of an (n, k) linear code is bounded as follows:

    d* ≤ n - k + 1                                                  (3.7)

This is known as the Singleton Bound.

Proof: We can reduce any linear block code to its equivalent systematic form. Consider a systematic codeword having exactly one non-zero information symbol. At most all of its (n - k) parity symbols can be non-zero, so the weight of this codeword is at most (n - k) + 1. The minimum weight of the code therefore cannot exceed n - k + 1.

This gives the following definition of a maximum distance code.

Definition 3.14 A Maximum Distance Code satisfies d* = n - k + 1.

Having familiarized ourselves with the concept of the minimum distance of a linear code, we shall now explore how this minimum distance is related to the total number of errors the code can detect and possibly correct. So we move over to the receiver end and take a look at the methods of decoding a linear block code.

3.6 DECODING OF A LINEAR BLOCK CODE

The basic objective of channel coding is to detect and correct errors when messages are transmitted over a noisy channel. The noise in the channel randomly transforms some of the
symbols of the transmitted codeword into some other symbols. If the noise, for example, changes just one of the symbols in the transmitted codeword, the erroneous codeword will be at a Hamming distance of one from the original codeword. If the noise transforms t symbols (that is, t symbols in the codeword are in error), the received word will be at a Hamming distance of t from the originally transmitted codeword. Given a code, how many errors can it detect and how many can it correct?

Let us first look at the detection problem. An error will be detected as long as it does not transform one codeword into another valid codeword. If the minimum distance between the codewords is d*, the weight of the error pattern must be d* or more to cause a transformation of one codeword into another. Therefore, an (n, k, d*) code will detect at least all non-zero error patterns of weight less than or equal to (d* - 1). Moreover, there is at least one error pattern of weight d* which will not be detected. This corresponds to the two codewords that are the closest. It may be possible that some error patterns of weight d* or more are detected, but all error patterns of weight d* will not be detected.

Example 3.14 For the code C1 = {000, 111} the minimum distance is 3. Therefore error patterns of weight 2 or 1 can be detected. This means that any error pattern belonging to the set {011, 101, 110, 001, 010, 100} will be detected by this code. Next consider the code C2 = {001, 110, 101} with d* = 1. Nothing can be said regarding how many errors this code can detect because d* - 1 = 0. However, the error pattern 010 of weight 1 can be detected by this code. But it cannot detect all error patterns of weight one, e.g., the error vector 100 cannot be detected.

Next let us look at the problem of error correction. The objective is to make the best possible guess regarding the originally transmitted codeword on the basis of the received word. What would be a smart decoding strategy? Since only one of the valid codewords must have been transmitted, it is logical to conclude that a valid codeword nearest (in terms of Hamming distance) to the received word must have been actually transmitted. In other words, the codeword which resembles the received word most is assumed to be the one that was sent. This strategy is called Nearest Neighbour Decoding, as we are picking the codeword nearest to the received word in terms of the Hamming distance. It may be possible that more than one codeword is at the same Hamming distance from the received word. In that case the receiver can do one of the following: (i) pick one of the equally distant neighbours randomly, or (ii) request the transmitter to re-transmit.

To ensure that the received word (that has at most t errors) is closest to the original codeword, and farther from all other codewords, we must put the following condition on the minimum distance of the code:

    d* ≥ 2t + 1                                                     (3.8)

Graphically, the condition for correcting t errors or less can be visualized from Fig. 3.2. Consider the space of all q-ary n-tuples. Every q-ary vector of length n can be represented as a point in this space. Every codeword can thus be depicted as a point in this space, and all words at a Hamming distance of t or less would lie within the sphere centred at the codeword and with a radius of t.
If the minimum distance of the code is d*, and the condition d* ≥ 2t + 1 holds good, then none of these spheres intersect. Any received vector (which is just a point) within a specific sphere will be closer to its centre (which represents a codeword) than to any other codeword. We will call the sphere associated with each codeword its Decoding Sphere. Hence it is possible to decode the received vector using the 'nearest neighbour' method without ambiguity.

Fig. 3.2 Decoding Spheres: words within the sphere of radius t centred at c1 will be decoded as c1. For unambiguous decoding, d* ≥ 2t + 1.

The condition d* ≥ 2t + 1 takes care of the worst case scenario. It may be possible, however, that the above condition is not met but it is still feasible to correct t errors, as illustrated in the following example.

Example 3.15 Consider the code C = {00000, 01010, 10101, 11111}. The minimum distance is d* = 2. Suppose the codeword 11111 was transmitted and the received word is 11110, i.e., t = 1 (one error has occurred, in the fifth component). Now,

    d(11110, 00000) = 4, d(11110, 01010) = 2, d(11110, 10101) = 3, d(11110, 11111) = 1.

Using nearest neighbour decoding we can conclude that 11111 was transmitted. Even though a single error correction (t = 1) was done in this case, d* < 2t + 1 = 3. So it is possible to correct
errors even when the condition d* ≥ 2t + 1 is not satisfied. However, in many cases a single error correction may not be possible with this code. For example, if 00000 was sent and 01000 was received,

    d(01000, 00000) = 1, d(01000, 01010) = 1, d(01000, 10101) = 4, d(01000, 11111) = 4.

In this case there cannot be a clear-cut decision, and a coin will have to be flipped!

Definition 3.15 An Incomplete Decoder decodes only those received words that are clearly closest to one of the codewords. In the case of ambiguity, the decoder declares that the received word is unrecognizable, and the transmitter is then requested to re-transmit. A Complete Decoder decodes every received word, i.e., it tries to map every received word to some codeword, even if it has to make a guess.

Example 3.15 was that of a complete decoder. Such decoders may be used when it is better to have a good guess rather than to have no guess at all. Most real-life decoders are incomplete decoders; usually they send a message back to the transmitter requesting a re-transmission.

Definition 3.16 A receiver declares that an erasure has occurred (i.e., a received symbol has been erased) when the symbol is received ambiguously, or the presence of an interference is detected during reception.

Example 3.16 Consider a binary Pulse Amplitude Modulation (PAM) scheme where 1 is represented by five volts and 0 is represented by zero volts. The noise margin is one volt, which implies that at the receiver:
if the received voltage is between 4 volts and 5 volts -> the bit sent is 1,
if the received voltage is between 0 volts and 1 volt -> the bit sent is 0,
if the received voltage is between 1 volt and 4 volts -> an erasure has occurred.
Thus if the receiver received 2.9 volts during a bit interval, it will declare that an erasure has occurred.

A channel can be prone both to errors and erasures. If in such a channel t errors and r erasures occur, the error correcting scheme should be able to compensate for the erasures as well as correct the errors. If r erasures occur, the minimum distance of the code will become d* - r in the worst case. This is because the erased symbols have to be simply discarded, and if they were contributing to the minimum distance, this distance will reduce. A simple example will illustrate the point. Consider the repetition code in which

    0 -> 00000
    1 -> 11111

Here d* = 5. If r = 2, i.e., two bits get erased (let us say the first two), we will have

    0 -> ??000
    1 -> ??111

Now, the effective minimum distance is d1* = d* - r = 3. Therefore, for a channel with t errors and r erasures,

    d* - r ≥ 2t + 1, or d* ≥ 2t + r + 1                             (3.9)

For a channel which has no errors (t = 0), only r erasures,

    d* ≥ r + 1                                                      (3.10)

Next let us give a little more formal treatment to the decoding procedure. Can we construct some mathematical tools to simplify nearest neighbour decoding? Suppose the codeword c = c1 c2 ... cn is transmitted over a noisy channel. The noise in the channel changes some or all of the symbols of the codeword. Let the received vector be denoted by v = v1 v2 ... vn. Define the error vector as

    e = v - c = (v1 - c1, v2 - c2, ..., vn - cn) = (e1, e2, ..., en)        (3.11)

The decoder has to decide from the received vector, v, which codeword was transmitted, or equivalently, it must determine the error vector, e.

Definition 3.17 Let C be an (n, k) code over GF(q) and a be any vector of length n. Then the set

    a + C = {a + x | x ∈ C}                                         (3.12)

is called a Coset (or translate) of C.
a and b are said to be in the same coset if (a - b) ∈ C.

Theorem 3.6 Suppose C is an (n, k) code over GF(q). Then,
(i) every vector b of length n is in some coset of C,
(ii) each coset contains exactly q^k vectors,
(iii) two cosets are either disjoint or coincide (partial overlap is not possible),
(iv) if a + C is a coset of C and b ∈ a + C, then b + C = a + C.

Proof
(i) b = b + 0 ∈ b + C.
(ii) Observe that the mapping C -> a + C defined by x -> a + x, for all x ∈ C, is a one-to-one mapping. Thus the cardinality of a + C is the same as that of C, which is equal to q^k.
(iii) Suppose the cosets a + C and b + C overlap, i.e., they have at least one vector in common. Let v ∈ (a + C) ∩ (b + C). Then, for some x, y ∈ C, v = a + x = b + y. Or, b = a + x - y = a + z, where z ∈ C (because the difference of two codewords is also a codeword). Thus, b + C = a + z + C ⊆ a + C. Similarly, it can be shown that (a + C) ⊆ (b + C). From these two we can conclude that (b + C) = (a + C).
(iv) Since b ∈ a + C, it implies that b = a + x for some x ∈ C. Next, if b + y ∈ b + C, then b + y = (a + x) + y = a + (x + y) ∈ a + C. Hence, b + C ⊆ a + C. On the other hand, if a + z ∈ a + C, then a + z = (b - x) + z = b + (z - x) ∈ b + C. Hence, a + C ⊆ b + C, and so b + C = a + C.

Definition 3.18 The vector having the minimum weight in a coset is called the Coset Leader. If there is more than one vector with the minimum weight, one of them is chosen at random and declared the coset leader.

Example 3.17 Let C be the binary (3, 2) code with the generator matrix given by

    G = [1 0 1]
        [0 1 0]

i.e., C = {000, 010, 101, 111}. The cosets of C are

    000 + C = 000, 010, 101, 111,
    001 + C = 001, 011, 100, 110.

Note that all the eight vectors have been covered by these two cosets. As we have already seen (in the above theorem), if a + C is a coset of C and b ∈ a + C, we have b + C = a + C. Hence, all cosets have been listed. For the sake of illustration we write down the following:

    010 + C = 010, 000, 111, 101,
    011 + C = 011, 001, 110, 100,
    100 + C = 100, 110, 001, 011,
    101 + C = 101, 111, 000, 010,
    110 + C = 110, 100, 011, 001,
    111 + C = 111, 101, 010, 000.

It can be seen that all these sets are already covered.

Since two cosets are either disjoint or coincide (from Theorem 3.6), the set of all vectors GF(q)^n can be written as

    GF(q)^n = C ∪ (a1 + C) ∪ (a2 + C) ∪ ... ∪ (at + C), where t = q^(n-k) - 1.

Definition 3.19 A Standard Array for an (n, k) code C is a q^(n-k) x q^k array of all vectors in GF(q)^n in which the first row consists of the code C (with 0 on the extreme left), and the other rows are the cosets a_i + C, each arranged in corresponding order, with the coset leader on the left.

Steps for constructing a standard array:
(i) In the first row write down all the valid codewords, starting with the all-zero codeword.
(ii) Choose a vector a1 which is not in the first row. Write down the coset a1 + C as the second row such that a1 + x is written under x ∈ C.
(iii) Next choose another vector a2 (not present in the first two rows) of minimum weight and write down the coset a2 + C as the third row such that a2 + x is written under x ∈ C.
(iv) Continue the process until all the cosets are listed and every vector in GF(q)^n appears exactly once.

Example 3.18 Consider the code C = {0000, 1011, 0101, 1110}. The corresponding standard array is

    codewords ->   0000   1011   0101   1110
                   1000   0011   1101   0110
                   0100   1111   0001   1010
                   0010   1001   0111   1100
                     ^
                coset leaders

Note that each entry is the sum of the codeword at the top of its column and the coset leader of its row.

Let us now look at the concept of decoding (obtaining the information symbols from the received codewords) using the standard array. Since the standard array comprises all possible words belonging to GF(q)^n, the received word can always be identified with one of the elements of the standard array.
If the received word is a valid codeword, it is concluded that no errors have occurred (this conclusion may be wrong with a very low probability of error, when one valid codeword gets modified to another valid codeword due to noise!). In the case that the received word, v, does not belong to the set of valid codewords, we surmise that an error has occurred. The decoder then declares that the coset leader is the error vector, e, and decodes the codeword as v - e. This is the codeword at the top of the column containing v. Thus, mechanically, we decode the codeword as the one on the top of the column containing the received word.
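The standard-array procedure translates directly into code. A small sketch for binary codes (illustrative only; it builds the array from the list of codewords and then decodes by reading off the codeword at the top of the column):

    from itertools import product

    def build_standard_array(codewords, n):
        codewords = [tuple(c) for c in codewords]
        rows = [codewords]                       # first row: the code itself
        used = set(codewords)
        # remaining vectors, taken in order of increasing weight (coset leaders)
        candidates = sorted((v for v in product((0, 1), repeat=n) if v not in used),
                            key=lambda v: sum(v))
        for a in candidates:
            if a in used:
                continue
            row = [tuple((ai + xi) % 2 for ai, xi in zip(a, x)) for x in codewords]
            rows.append(row)
            used.update(row)
        return rows

    def decode(v, rows):
        # Find v in the array; the codeword at the top of its column is the estimate.
        for row in rows:
            if tuple(v) in row:
                return rows[0][row.index(tuple(v))]

    C = [(0,0,0,0), (1,0,1,1), (0,1,0,1), (1,1,1,0)]
    array = build_standard_array(C, 4)
    print(decode((1,1,0,1), array))   # (0, 1, 0, 1), the case worked out in Example 3.19 below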
Example 3.19 Suppose the code in the previous example, C = {0000, 1011, 0101, 1110}, is used and the received word is v = 1101. Since it is not one of the valid codewords, we deduce that an error has occurred. Next we try to estimate which one of the four possible codewords was actually transmitted. If we make use of the standard array of the earlier example, we find that 1101 lies in the 3rd column. The topmost entry of this column is 0101. Hence the estimated codeword is 0101. Observe that

    d(1101, 0000) = 3, d(1101, 1011) = 2, d(1101, 0101) = 1, d(1101, 1110) = 2,

and the error vector e = 1000, the coset leader.

Codes with larger block lengths are desirable (though not always; see the concluding remarks of this chapter) because the code rates of larger codes perform closer to the Shannon limit. As we go to larger codes (with larger values of k and n), the method of the standard array becomes less practical because the size of the standard array (q^(n-k) x q^k) becomes unmanageably large. One of the basic objectives of coding theory is to develop efficient decoding strategies. If we are to build decoders that will work in real time, the decoding scheme should be realizable both in terms of the memory required as well as the computational load. Is it possible to reduce the standard array? The answer lies in the concept of Syndrome Decoding, which we are going to discuss next.

3.7 SYNDROME DECODING

The standard array can be simplified if we store only the first column, and compute the remaining columns if needed. To do so, we introduce the concept of the Syndrome of the error pattern.

Definition 3.20 Suppose H is a parity check matrix of an (n, k) code. Then for any vector v ∈ GF(q)^n, the vector

    s = vH^T                                                        (3.13)

is called the Syndrome of v. The syndrome of v is sometimes explicitly written as s(v). It is called a syndrome because it gives us the symptoms of the error, thereby helping us to diagnose the error.

Theorem 3.7 Two vectors x and y are in the same coset of C if and only if they have the same syndrome.

Proof: The vectors x and y belong to the same coset
Suppose there are M codewords (of length n) which are used with equal probability. Let the decoding be done using a standard array. Let the number of coset leaders with weight i be denoted by a.,, We assume that the channel is a BSC with symbol error probabilityp. A decoding error occurs if the error vector e is rwt a coset leader. Therefore, the probability of correct decoding will be I
    P_cor = Σ_{i=0}^{n} α_i p^i (1 - p)^(n-i)                       (3.14)

Hence, the probability of error will be

    P_err = 1 - Σ_{i=0}^{n} α_i p^i (1 - p)^(n-i)                   (3.15)

Example 3.21 Consider the standard array in Example 3.18. The coset leaders are 0000, 1000, 0100 and 0010. Therefore α_0 = 1 (only one coset leader with weight equal to zero), α_1 = 3 (the remaining three are of weight one) and all other α_i = 0. Therefore,

    P_err = 1 - [(1 - p)^4 + 3p(1 - p)^3]

Recall that this code has four codewords, and can be used to send 2 bits at a time. If we did not perform coding, the probability of the 2-bit message being received incorrectly would be P_err = 1 - P_cor = 1 - (1 - p)^2. Note that for p = 0.01, the word error rate upon coding is P_err = 0.0103, while for the uncoded case P_err = 0.0199. So, coding has almost halved the word error rate. The comparison of P_err for messages with and without coding is plotted in Fig. 3.3. It can be seen that coding outperforms the uncoded case only for p < 0.5. Note that the improvement due to coding comes at the cost of the information transfer rate. In this example, the rate of information transfer has been cut down by half, as we are sending two parity bits for every two information bits.

Fig. 3.3 Comparison of P_err for coded and uncoded 2-bit messages (P_err plotted against p, for 0 ≤ p ≤ 1).

Example 3.22 This example will help us visualize the power of coding. Consider a BSC with the probability of symbol error p = 10^-7. Suppose 10-bit long words are being transmitted without coding. Let the bit rate of the transmitter be 10^7 bits/s, which implies that 10^6 words/s are being sent. The probability that a word is received incorrectly is

    C(10, 1)(1 - p)^9 p + C(10, 2)(1 - p)^8 p^2 + C(10, 3)(1 - p)^7 p^3 + ... ≈ C(10, 1)(1 - p)^9 p ≈ 10^-6.

Therefore, in one second, 10^-6 x 10^6 = 1 word will be in error! The implication is that every second a word will be in error and it will not be detected. Next, let us add a parity bit to the uncoded words so as to make them 11 bits long. The parity makes all the codewords of even parity and thus ensures that a single bit in error will be detected. The only way that the coded word will be in error is if two or more bits get flipped, i.e., at least two bits are in error. This can be computed as 1 minus the probability that fewer than two bits are in error. Therefore, the probability of word error will be

    1 - (1 - p)^11 - C(11, 1)(1 - p)^10 p ≈ 1 - (1 - 11p) - 11(1 - 10p)p = 110p^2 = 11 x 10^-13.

The new word rate will be 10^7/11 words/s because now 11 bits constitute one word and the bit rate is the same as before. Thus in one second, (10^7/11) x (11 x 10^-13) = 10^-6 words will be in error. This implies that, after coding, one word will be received incorrectly without detection every 10^6 seconds ≈ 11.5 days! So just by increasing the word length from 10 bits (uncoded) to 11 bits (with coding), we have been able to obtain a dramatic decrease in the word error rate. For the second case, each time a word is detected to be in error, we can request the transmitter to re-transmit the word. This strategy for retransmission is called Automatic Repeat Request (ARQ).

3.9 PERFECT CODES

Definition 3.22 For any vector u in GF(q)^n and any integer r ≥ 0, the sphere of radius r and centre u, denoted by S(u, r), is the set {v ∈ GF(q)^n | d(u, v) ≤ r}.

This definition can be interpreted graphically, as shown in Fig. 3.4. Consider a code C with minimum distance d*(C) ≥ 2t + 1.
The spheres of radius t centred at the codewords {c1, c2, ..., cM} of C will then be disjoint. Now consider the decoding problem. Any received vector can be represented as a point in this space. If this point lies within a sphere, then by nearest neighbour decoding it will be decoded as the centre of the sphere. If t or fewer errors occur, the received word will definitely lie within the sphere of the codeword that was transmitted, and will be correctly decoded. If, however, more than t errors occur, it will escape the sphere, thus resulting in incorrect decoding.
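Before moving on, note that the word-error-rate expressions (3.14) and (3.15) are simple finite sums and are easy to evaluate numerically. A small sketch that reproduces the numbers of Example 3.21 (illustrative only):

    def word_error_rate(alphas, p):
        # P_err = 1 - sum_i alpha_i * p^i * (1-p)^(n-i), Eq. (3.15);
        # alphas[i] = number of coset leaders of weight i, len(alphas) = n + 1.
        n = len(alphas) - 1
        p_cor = sum(a * (p ** i) * ((1 - p) ** (n - i)) for i, a in enumerate(alphas))
        return 1 - p_cor

    # Example 3.21: coset leaders 0000, 1000, 0100, 0010 -> alpha_0 = 1, alpha_1 = 3
    print(round(word_error_rate([1, 3, 0, 0, 0], 0.01), 4))   # 0.0103 (coded)
    print(round(1 - (1 - 0.01) ** 2, 4))                      # 0.0199 (uncoded)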
Fig. 3.4 The concept of spheres in GF(q)^n. The codewords of a code with d*(C) ≥ 2t + 1 are the centres of these non-overlapping spheres.

Theorem 3.8 A sphere of radius r (0 ≤ r ≤ n) contains exactly

    C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, r)(q - 1)^r        (3.16)

vectors.

Proof: Consider a vector u in GF(q)^n and another vector v which is at a distance m from u. This implies that the vectors u and v differ at exactly m places. The total number of ways in which m positions can be chosen from n positions is C(n, m). Now, each of these m places can be filled by any of (q - 1) symbols. This is because the total size of the alphabet is q, out of which one is currently being used in that particular position in u. Hence, the number of vectors at a distance exactly m from u is C(n, m)(q - 1)^m, and the total number of vectors at a distance r or less from u is

    C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, r)(q - 1)^r        (3.17)

Example 3.23 Consider a binary code (i.e., q = 2). The number of vectors at a distance 2 or less from any codeword is C(n, 0) + C(n, 1) + C(n, 2). Without loss of generality we can choose the fixed vector to be the all-zero vector 00...0; the vectors at a distance 2 or less from it are then the all-zero vector itself, the n vectors of weight one and the C(n, 2) vectors of weight two.

Theorem 3.9 A q-ary (n, k) code with M codewords and minimum distance (2t + 1) satisfies

    M {C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t} ≤ q^n        (3.18)

Proof: Suppose C is a q-ary (n, k) code. Consider spheres of radius t centred on the M codewords. Each sphere of radius t has

    C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t

vectors (Theorem 3.8). Since none of the spheres intersect, the total number of vectors in the M disjoint spheres is

    M {C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t},

which is upper bounded by q^n, the total number of vectors of length n in GF(q)^n.

This bound is called the Hamming Bound or the Sphere Packing Bound, and it holds good for non-linear codes as well. For binary codes, the Hamming Bound becomes

    M {C(n, 0) + C(n, 1) + C(n, 2) + ... + C(n, t)} ≤ 2^n                       (3.19)

It should be noted here that the mere existence of a set of integers n, M and t satisfying the Hamming Bound does not confirm the existence of such a binary code. For example, the set n = 5, M = 5 and t = 1 satisfies the Hamming Bound; however, no binary code exists for this specification. Observe that for the case when M = q^k, the Hamming Bound may alternatively be written as

    q^(n-k) ≥ C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t                 (3.20)

Definition 3.23 A perfect code is a t-error correcting code that satisfies the Hamming Bound with equality, i.e.,

    M {C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t} = q^n                 (3.21)
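The left-hand side of the Hamming Bound is a simple finite sum, so candidate parameter sets (and the perfect-code condition of Definition 3.23) can be checked numerically. A minimal sketch, which is essentially what Computer Problem 3.16 at the end of this chapter asks for:

    from math import comb

    def sphere_volume(n, q, t):
        # Number of q-ary vectors within Hamming distance t of a fixed vector, Eq. (3.16).
        return sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))

    def satisfies_hamming_bound(n, q, M, t):
        return M * sphere_volume(n, q, t) <= q ** n

    def is_perfect(n, q, M, t):
        return M * sphere_volume(n, q, t) == q ** n

    print(satisfies_hamming_bound(5, 2, 5, 1))   # True, yet no such binary code exists
    print(is_perfect(23, 2, 2 ** 12, 3))         # True (parameters listed in the table below)
    print(is_perfect(7, 2, 2 ** 4, 1))           # True (the (7, 4) Hamming code)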
For a Perfect Code, there are equal-radius disjoint spheres centred at the codewords which completely fill the space. Thus, a t-error correcting perfect code utilizes the entire space in the most efficient manner.

Example 3.24 Consider the Binary Repetition Code

    C = {00...0, 11...1}

of block length n, where n is odd. In this case M = 2 and t = (n - 1)/2. Upon substituting these values in the left-hand side of the inequality for the Hamming Bound we get

    2 {C(n, 0) + C(n, 1) + ... + C(n, (n-1)/2)} = 2 . 2^(n-1) = 2^n.

Thus the repetition code is a Perfect Code. It is actually called a Trivial Perfect Code. In the next chapter, we shall see some examples of Non-trivial Perfect Codes.

One of the ways to search for perfect codes is to obtain the integer solutions for the parameters n, q, M and t in the equation for the Hamming Bound. Some of the solutions found by exhaustive computer search are listed below.

    S. No.   n    q   M      t
    1        23   2   2^12   3
    2        90   2   2^78   2
    3        11   3   3^6    2

3.10 HAMMING CODES

There are both binary and non-binary Hamming Codes. Here, we shall limit our discussion to binary Hamming Codes. The binary Hamming Codes have the property that

    (n, k) = (2^m - 1, 2^m - 1 - m)                                 (3.22)

where m is any positive integer. For example, for m = 3 we have a (7, 4) Hamming Code. The parity check matrix, H, of a Hamming Code is a very interesting matrix. Recall that the parity check matrix of an (n, k) code has n - k rows and n columns. For the binary (n, k) Hamming code, the n = 2^m - 1 columns consist of all possible binary vectors with n - k = m elements, except the all-zero vector.
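Since the columns of H are just all the non-zero m-bit vectors, the parity check matrix of a binary Hamming code can be generated mechanically for any m. A small sketch (the column ordering used here is one arbitrary choice among many, so the resulting code is equivalent to, though not necessarily identical with, the one in Example 3.25 below):

    def hamming_parity_check(m):
        # H has m rows and 2^m - 1 columns: every non-zero m-bit vector appears once.
        n = 2 ** m - 1
        columns = [[(j >> (m - 1 - i)) & 1 for i in range(m)] for j in range(1, n + 1)]
        return [[col[i] for col in columns] for i in range(m)]   # transpose to rows

    H = hamming_parity_check(3)
    for row in H:
        print(row)
    # 3 rows x 7 columns; e.g. the first row is [0, 0, 0, 1, 1, 1, 1]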
Example 3.25 The generator matrix for the binary (7, 4) Hamming Code is given by

    G = [1 1 0 1 0 0 0]
        [0 1 1 0 1 0 0]
        [0 0 1 1 0 1 0]
        [0 0 0 1 1 0 1]

The corresponding parity check matrix is

    H = [1 0 1 1 1 0 0]
        [0 1 0 1 1 1 0]
        [0 0 1 0 1 1 1]

Observe that the columns of the parity check matrix consist of (100), (010), (101), (110), (111), (011) and (001). These seven are all the possible non-zero binary vectors of length three. It is quite easy to generate a systematic Hamming Code. The parity check matrix H can be arranged in the systematic form as follows:

    H = [1 1 1 0 1 0 0]
        [0 1 1 1 0 1 0]  = [-P^T | I].
        [1 1 0 1 0 0 1]

Thus, the generator matrix in the systematic form for the binary Hamming code is

    G = [I | P] = [1 0 0 0 1 0 1]
                  [0 1 0 0 1 1 1]
                  [0 0 1 0 1 1 0]
                  [0 0 0 1 0 1 1]

From the above example, we observe that no two columns of H are linearly dependent (otherwise they would be identical). However, for m > 1, it is possible to identify three columns of H that add up to zero. Thus, the minimum distance, d*, of an (n, k) Hamming Code is equal to 3, which implies that it is a single-error correcting code. Hamming Codes are Perfect Codes.

By adding an overall parity bit, an (n, k) Hamming Code can be modified to yield an (n + 1, k) code with d* = 4. On the other hand, an (n, k) Hamming Code can be shortened to an (n - l, k - l) code by removing l rows of its generator matrix G or, equivalently, by removing l columns of its parity check matrix H. We can now give a more formal definition of Hamming Codes.
For example, in mobile radio communications, packets of data are restricted to fewer than 200 bits. In these cases, codewords with very large blocklengths cannot be used. SUMMARY • A Word is a sequence of symbols. A Code is a set of セ・」エッイウ@ called codewords. • The Hamming Weight of a codeword (or any vector) is equal to the number of non-zero elements in the codeword. The Hamming Weight of a codeword cis denoted by w(c). • A Block Code consists of a set of fixed length codewords. The fixed length of these codewords is called the Block Length and is typically denoted by n. A Block Coding Scheme converts a block of k information symbols to n coded symbols. Such a code is denoted by (n, k). • The Code Rate of an (n, k) code is defined as the ratio (kin), and reflects the fraction of the codeword that consists of the information symbols. • The minimum distance of a code is the minimum Hamming Distance between any two codewords. An (n, k) code with minimum distanced'' is sometimes denoted by (n, k, d). The minimum weight of a code is the smallest weight of any non-zero codeword, and is
denoted by w*. For a Linear Code the minimum distance is equal to the minimum weight of the code, i.e., d* = w*.
• A Linear Code has the following properties: (i) The sum of two codewords belonging to the code is also a codeword belonging to the code. (ii) The all-zero codeword is always a codeword. (iii) The minimum Hamming Distance between two codewords of a linear code is equal to the minimum weight of any non-zero codeword, i.e., d* = w*.
• The generator matrix converts (encodes) a vector of length k to a vector of length n. Let the input vector (uncoded symbols) be represented by i. The coded symbols are given by c = iG.
• Two q-ary codes are called equivalent if one can be obtained from the other by one or both of the operations listed below: (i) permutation of the symbols appearing in a fixed position, (ii) permutation of the positions of the code.
• An (n, k) Systematic Code is one in which the first k symbols of the codeword of block length n are the information symbols themselves. A generator matrix of the form G = [I | P] is called the systematic form or the standard form of the generator matrix, where I is a k x k identity matrix and P is a k x (n - k) matrix.
• The Parity Check Matrix, H, for a given code satisfies cH^T = 0, where c is a valid codeword. Since c = iG, therefore GH^T = 0. The Parity Check Matrix is not unique for a given code.
• A Maximum Distance Code satisfies d* = n - k + 1.
• For a code to be able to correct up to t errors, we must have d* ≥ 2t + 1, where d* is the minimum distance of the code.
• Let C be an (n, k) code over GF(q) and a be any vector of length n. Then the set a + C = {a + x | x ∈ C} is called a coset (or translate) of C. a and b are said to be in the same coset iff (a - b) ∈ C.
• Suppose H is a Parity Check Matrix of an (n, k) code. Then for any vector v ∈ GF(q)^n, the vector s = vH^T is called the Syndrome of v. It is called a syndrome because it gives us the symptoms of the error, thereby helping us to diagnose the error.
• A Perfect Code achieves the Hamming Bound with equality, i.e., M {C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t} = q^n.
• The binary Hamming Codes have the property that (n, k) = (2^m - 1, 2^m - 1 - m), where m is any positive integer. Hamming Codes are Perfect Codes.
• For an (n, k, d*) Optimal Code, no (n - 1, k, d*), (n + 1, k + 1, d*) or (n + 1, k, d* + 1) code exists.
• An (n, n - r, r + 1) code is called a Maximum Distance Separable (MDS) Code. An MDS code is a linear code of redundancy r whose minimum distance is equal to r + 1.

PROBLEMS

3.1 Show that C = {0000, 1100, 0011, 1111} is a linear code. What is its minimum distance?
3.2 Construct, if possible, binary (n, k, d*) codes with the following parameters: (i) (6, 1, 6) (ii) (3, 3, 1) (iii) (4, 3, 2)
3.3 Consider the following generator matrix over GF(2)

    G = [. . . . .]
        [. . . . .]
        [0 1 0 1 0]

(i) Generate all possible codewords using this matrix.
(ii) Find the parity check matrix, H.
(iii) Find the generator matrix of an equivalent systematic code.
(iv) Construct the standard array for this code.
(v) What is the minimum distance of this code?
(vi) How many errors can this code detect?
(vii) Write down the set of error patterns this code can detect.
(viii) How many errors can this code correct?
(ix) What is the probability of symbol error if we use this encoding scheme? Compare it with the uncoded probability of error.
(x) Is this a linear code?
3.4 For the code C = {00000, 10101, 01010, 11111} construct the generator matrix. Since this G is not unique, suggest another generator matrix that can also generate this set of codewords.
3.5 Show that if there is a binary (n, k, d) code with d even, then there exists a binary (n, k, d) code in which all codewords have even weight.
3.6 Show that if C is a binary linear code, then the code obtained by adding an overall parity check bit to C is also linear.
3.7 For each of the following sets S, list the code <S>.
(a) S = {0101, 1010, 1100}.
(b) S = {1000, 0100, 0010, 0001}.
(c) S = {11000, 01111, 11110, 01010}.
3.8 Consider the (23, 12, 7) binary code. Show that if it is used over a binary symmetric channel (BSC) with probability of bit error p = 0.01, the word error rate will be approximately 0.00008.
3.9 Suppose C is a binary code with parity check matrix H. Show that the extended code C1, obtained from C by adding an overall parity bit, has the parity check matrix

    H1 = [      H       | 0 ]
         [ 1  1  ...  1 | 1 ]

3.10 For a (5, 3) code over GF(4), the generator matrix is given by

    G = [1 0 0 . .]
        [0 1 0 . .]
        [0 0 1 1 3]

(i) Find the parity check matrix.
(ii) How many errors can this code detect?
(iii) How many errors can this code correct?
(iv) How many erasures can this code correct?
(v) Is this a perfect code?
3.11 Let C be a binary perfect code of length n with minimum distance 7. Show that n = 7 or n = 23.
3.12 Let r_H denote the code rate of the binary Hamming code. Determine the limit of r_H as k tends to infinity.
3.13 Show that a (15, 8, 5) code does not exist.

COMPUTER PROBLEMS

3.14 Write a computer program to find the minimum distance of a Linear Block Code over GF(2), given the generator matrix of the code.
3.15 Generalize the above program to find the minimum distance of any Linear Block Code over GF(q).
3.16 Write a computer program to exhaustively search for all the perfect code parameters n, q, M and t in the equation for the Hamming Bound. Search for 1 ≤ n ≤ 200, 2 ≤ q ≤ 11.
3.17 Write a computer program for a universal binary Hamming encoder with rate (2^m - 1 - m)/(2^m - 1). The program should take as input the value of m and a bit-stream to be encoded. It should then generate an encoded bit-stream. Develop a program for the decoder also.
  • 58. I' I Cyclic Codes We,t etn'lNe,t 。エセ@ not" by セ@ Ofll:y, butti4o- by the.- hecwt. ーセ@ GXセ@ (1623-1662) 4. 1 INTRODUCTION TO CYCLIC CODES In the previous chapter, while dealing with Linear Block Codes, certain linearity constraints were imposed on the structure of the block codes. These structural properties help セウ@ to search for good linear block codes that are fast and easy to encode and decode. In this chapter, we shall explore a subclass of linear block codes which has another constraint on the structure of the codes. The additional constraint is that any cyclic shift of a codeword results in another valid codeword. This condition allows very simple implementation of these cyclic codes by using shift registers. Efficient circuit implementation is a selling feature of any error control code. We shall also see that the theory of Galois Field can be used effectively to study, analyze and discover new cyclic codes. The Galois Field representation of cyclic codes leads to low- complexity encoding and decoding algorithms. This chapter is organized as follows. In the first two sections, we take a mathematical detour to polynomials. We will review some old concepts and learn a few new ones. Then, we will use these mathematical tools to construct and analyze cyclic codes. The matrix description of cyclic Cyclic Codes codes will be introduced next. We will then, discuss some popular cyclic codes. The chapter wili conclude with a discussion on circuit implementation of cyclic codes. Definition 4.1 A code Cis cyclic if (i) Cis a linear code, and, (ii) any cyclic shift of a codeword is also a codeword, i.e., if the codeword tzoa1 ••• セ Q@ is in C then an-lOo···an-2is also in C. Example 4.1 The binary code C1 = {0000, 0101, 1010, 1111} is a cyclic code. However C2 = {0000, 0110, 1001, 1111} is not a cyclic code, but is equivalentto the first code. Interchanging the third and the fourth components ofC2 yields C1. 4.2 POLYNOMIALS Definition 4.2 A polynomial is a mathematical expression f(x) =fo +fix+ ... +[,/', e (4.1) where the symbol xis called the indeterminate and the coefficientsfo,fi, ...,fm are the elements of GF (q). The coefficientfm is called the leading coefficient. Iffm # 0, then m is called the degree of the polynomial, and is denoted by deg f(x). Definition 4.3 A polynomial is called monic if its leading coefficient is unity. Example 4.2 j{x) = 3 + ?x + セ@ + 5x4 + x6 is a monic polynomial over GF(8). The degree of this polynomial is 6. Polynomials play an important role in the study of cyclic codes, the subject of this chapter. Let F[x] be the set of polynomials in x with coefficients in GF(q). Different polynomials in F[x] can be added, subtracted and multiplied in the usual manner. F[x] is an example of an algebraic , structure called a ring. A ring satisfies the first seven of the eight axioms that define a field (see Sec. 3.2 of Chapter 3). F[x] is not a field because polynomials of degree greater than zero do not· have a multiplicative inverse. It can be seen that if[(x), g(x) E F[x], then deg (f(x)g(x)) = degf(x) + deg g(x). However, deg (f(x) + g(x)) is not necessarily max{ deg f(x), deg g(x)}. For example, consider the two polynomials,f(x) and g(x), over GF(2) such thatf(x) = 1 + x2 and g(x) = 1 + x + x2 . Then, deg (f(x) + g(x)) = deg (x) = 1. This is because, in GF(2), 1 + 1 = 0, and x 2 + x 2 =(I+ 1); = 0.
  • 59. r-- Information Theory, Coding and Cryptography Example 4.3 Consider the polynomialsf(x) = 2 +x + セ@ + 2x4 and g{x) = 1 + '1f + 2x4 Kセッカ・イ@ GF(3). Then, f(i; + g(x) = (2 + 1) + x + (1 + RIセ@ + (2 + 2)x4 + セ@ = x + x4 + セN@ f(x). g(x) = (2 + x + セ@ + RクセH@ 1 + '1f + 2x4 + セI@ = 2 +X+ (1 + 2.2) セ@ + U + (2 + 2 + 2.2)x4 + (2 + 2) セ@ + (1 + 2 + l).li + x1 + 2.2x8 + セ@ = 2 + x + (1 + Qセ@ + U + (2 + 2 + l)x4 + (2 + Rセ@ + (1 + 2 + 1),/i + x1 + :/' Kセ@ = 2 + x + セ@ + :zx3 + 2x4 + :C + .li + x1 + x8 + セ@ Note that the addition and multiplication of the coefficients have been carried out in GF(3). Example 4.4 Consider the polynomialf(x) = 1 + x over GF(2). (f(x)) 2 = 1 + (1 + l)x + セ@ = 1 + セ@ Again considerf(x) = 1 + x over GF(3). (f(x))2 = 1 + HQKQIクKセ]@ 1 + RクKセ@ 4.3 THE DIVISION ALGORITHM FOR POLYNOMIALS The Division Algorithm states that, for every pair of polynomial a(x) and b(x) :t 0 in F[ x], there exists a unique pair of polynomials q(x), the quotient, and r(x), the remainder, such that a(x) = q(x) b(x) + r(x), where deg r(x) <deg b(x). The remainder is sometimes also called the residue, and is denoted by Rh(x) [a(x)] = r(x). Two important properties of residues are (i) Rtr.x) [a(x) + b(x)] = Rtr.x) [a(x)] + Rttx) [b(x)], and (ii) Rtr.x) [a(x). b(x)] = Rttx) {Rp_x) [a(x)]. Rttx) [b(x)]} where a(x), b(x) and f(x) are polynomials over GF(q). (4.2) (4.3) Example 4.5 Let the polynomials, a(x) = xl + x + land b(x) = セ@ + x + 1 be defined over GF{2). We can carry out the long division of a(x) by b(x) as follows x+l -q(x) b(x) -xl+x+ 1) X3+ x+ 1-a(x) _x3+ _x2 +X .xl+ .xl+x+ 1 x - r(x) Cyclic Codes Thus, a(x) = (x+ 1) b(x) + x. Hence, we may write a(x) = q(x) b(x) + r(x), where q(x) = x + 1 and r(x) = x. Note that deg r(x) <deg b(x). Definition 4.4 Let f(x) be a fixed polynomial in F(Xj. Two polynomials, g(x) and h(x) in F[x] are said to be congruent modulo f(x), depicted by g(x) =h(x) (modf(x)), if g(x) - h(x) is divisible by f(x). Example 4.6 Let the polynomials g(x) = x 9 + セ@ + 1, h(x) = セ@ + セ@ + 1 and f(x) = x 4 + 1 be defined over GF(2). Since g(x)- h(x) = J?j(x), we can write g(x) =h(x) (modf(x)). Next, let us denote F[x]!f(x) as the set ofpolynomials in F[x] of degree less than deg f(x), with addition and multiplication carried out modulo f(x) as follows: (i) If a(x) and b(x) belong to F[x]!f(x), then the sum a(x) + b(x) in F[x]lf(x) is the same as in F[x]. This is because deg a(x) <degf(x), deg b(x) <degf(x) and therefore deg (a(x) + h(x)) <deg f(x). (ii) The product a(x)b(x) is the unique polynomial of degree less than deg f(x) to which a(x)b(x) (multiplication being carried out in F[x]) is congruent modulo f(x). F[x]!f(x) is called the ring ofpolynomials (over F[x]) modulo f(x). As mentioned earlier, a ring satisfies the first seven of the eight axioms that define a field. A ring in which every element also has a multiplicative inverse forms a field. Example 4.7 Consider the product (x + 1)2 in f{ク}ャHセ@ + x + 1) defined over GF{2). (x + 1) 2 = セ@ +X+ X+ 1 = セ@ + 1 =X Hュッ、セK@ X+ 1). The product (x + 1)2 in f{ク}ャHセ@ + 1) defmed over GF(2) can be expressed as (x + 1) 2 = セ@ + x + x + 1 ]セK@ 1 ]oHュッ、セKクK@ 1). The product (x + 1)2 in F [x}OHセ@ + x + 1) defined over GF(3) can be expressed as (x + 1) 2 = セ@ + x +X+ 1 = セ@ + 2x + 1 =X Hュッ、セK@ X + 1). Ifj(x) has degree n, then the ring F[x]!f(x) over GF(q) consists of polynomials of 、・ァイ・・セ@ n- 1. The size of ring will be qn because each of the n coefficients of the polynomials can be one of the q elements in GF(q). 
Example 4.8 Consider the ring F[x]/(x^2 + x + 1) defined over GF(2). This ring will have polynomials of highest degree equal to 1, and it contains q^n = 2^2 = 4 elements (each element is a polynomial). The elements of the ring are 0, 1, x and x + 1. The addition and multiplication tables can be written as follows.
Addition table for F[x]/(x^2 + x + 1) over GF(2):

  +      0      1      x      x+1
  0      0      1      x      x+1
  1      1      0      x+1    x
  x      x      x+1    0      1
  x+1    x+1    x      1      0

Multiplication table for F[x]/(x^2 + x + 1) over GF(2):

  .      0      1      x      x+1
  0      0      0      0      0
  1      0      1      x      x+1
  x      0      x      x+1    1
  x+1    0      x+1    1      x

Next, consider F[x]/(x^2 + 1) defined over GF(2). The elements of the ring are again 0, 1, x and x + 1. The addition table is the same as above, and the multiplication table is:

  .      0      1      x      x+1
  0      0      0      0      0
  1      0      1      x      x+1
  x      0      x      1      x+1
  x+1    0      x+1    x+1    0

It is interesting to note that F[x]/(x^2 + x + 1) is actually a field, as the multiplicative inverse of every non-zero element exists. On the other hand, F[x]/(x^2 + 1) is not a field, because the multiplicative inverse of the element x + 1 does not exist. It is worthwhile exploring the properties of f(x) which make F[x]/f(x) a field. As we shall shortly find out, the polynomial f(x) must be irreducible (non-factorizable).

Definition 4.5 A polynomial f(x) in F[x] is said to be reducible if f(x) = a(x) b(x), where a(x), b(x) are elements of F[x] and deg a(x) and deg b(x) are both smaller than deg f(x). If f(x) is not reducible, it is called irreducible. A monic irreducible polynomial of degree at least one is called a prime polynomial.

It is helpful to compare a reducible polynomial with a positive integer that can be factorized into a product of prime numbers. Any monic polynomial in F[x] can be factorized uniquely into a product of irreducible monic polynomials (prime polynomials). One way to verify a prime polynomial is by trial and error, testing all possible factorizations. This would require a computer search. Prime polynomials of every degree exist over every Galois Field.

Theorem 4.1
(i) A polynomial f(x) has a linear factor (x - a) if and only if f(a) = 0, where a is a field element.
(ii) A polynomial f(x) in F[x] of degree 2 or 3 over GF(q) is irreducible if and only if f(a) != 0 for all a in GF(q).
(iii) Over any field, x^n - 1 = (x - 1)(x^{n-1} + x^{n-2} + ... + x + 1). The second factor may be further reducible.

Proof
(i) If f(x) = (x - a) g(x), then obviously f(a) = 0. On the other hand, if f(a) = 0, by the division algorithm, f(x) = q(x)(x - a) + r(x), where deg r(x) < deg (x - a) = 1. This implies that r(x) is a constant. But, since f(a) = 0, r(x) must be zero, and therefore f(x) = q(x)(x - a).
(ii) A polynomial of degree 2 or 3 over GF(q) will be reducible if, and only if, it has at least one linear factor. The result (ii) then directly follows from (i). This result does not necessarily hold for polynomials of degree more than 3. This is because it might be possible to factorize a polynomial of degree 4 or higher into a product of polynomials none of which is linear, i.e., of the type (x - a).
(iii) From (i), (x - 1) is a factor of (x^n - 1). By carrying out the long division of (x^n - 1) by (x - 1) we obtain (x^{n-1} + x^{n-2} + ... + x + 1).

Example 4.9 Consider f(x) = x^3 - 1 over GF(2). Using (iii) of Theorem 4.1 we can write
x^3 - 1 = (x - 1)(x^2 + x + 1).
This factorization is true over any field. Now, let us try to factorize the second term, p(x) = (x^2 + x + 1):
p(0) = 0 + 0 + 1 = 1, over GF(2),
p(1) = 1 + 1 + 1 = 1, over GF(2).
Therefore, p(x) cannot be factorized further (from Theorem 4.1 (ii)). Thus, over GF(2), x^3 - 1 = (x - 1)(x^2 + x + 1).
Next, consider f(x) = x^3 - 1 over GF(3). Again, x^3 - 1 = (x - 1)(x^2 + x + 1). Let p(x) = (x^2 + x + 1):
p(0) = 0 + 0 + 1 = 1, over GF(3),
p(1) = 1 + 1 + 1 = 0, over GF(3),
p(2) = 2.2 + 2 + 1 = 1 + 2 + 1 = 1, over GF(3).
Since p(1) = 0, from (i) we have (x - 1) as a factor of p(x).
Thus, over GF(3), x^3 - 1 = (x - 1)(x - 1)(x - 1).

Theorem 4.2 The ring F[x]/f(x) is a field if, and only if, f(x) is a prime polynomial in F[x].

Proof To prove that a ring is a field, we must show that every non-zero element of the ring has a multiplicative inverse. Let s(x) be a non-zero element of the ring. We have deg s(x) < deg f(x), because s(x) is contained in the ring F[x]/f(x). It can be shown that the Greatest Common Divisor (GCD) of two polynomials f(x) and s(x) can be expressed as
GCD(f(x), s(x)) = a(x) f(x) + b(x) s(x),
where a(x) and b(x) are polynomials over GF(q). Since f(x) is irreducible in F[x], we have
GCD(f(x), s(x)) = 1 = a(x) f(x) + b(x) s(x).
Now,
1 = R_f(x)[1] = R_f(x)[a(x) f(x) + b(x) s(x)]
  = R_f(x)[a(x) f(x)] + R_f(x)[b(x) s(x)]            (property (i) of residues)
  = 0 + R_f(x)[b(x) s(x)]
  = R_f(x){R_f(x)[b(x)] . R_f(x)[s(x)]}              (property (ii) of residues)
  = R_f(x){R_f(x)[b(x)] . s(x)}.
Hence, R_f(x)[b(x)] is the multiplicative inverse of s(x).

Next, let us prove the only if part of the theorem. Let us suppose f(x) has a degree of at least 2 and is not a prime polynomial (a polynomial of degree one is always irreducible). Therefore, we can write f(x) = r(x) s(x) for some polynomials r(x) and s(x) with degrees at least one. If the ring F[x]/f(x) is indeed a field, then a multiplicative inverse of r(x), denoted r^{-1}(x), exists, since all non-zero polynomials in the field must have their corresponding multiplicative inverses. Hence,
s(x) = R_f(x){s(x)} = R_f(x){r(x) r^{-1}(x) s(x)} = R_f(x){r^{-1}(x) r(x) s(x)} = R_f(x){r^{-1}(x) f(x)} = 0.
However, we had assumed s(x) != 0. Thus, there is a contradiction, implying that the ring is not a field.

Note that a prime polynomial is both monic and irreducible. In the above theorem it is sufficient to have f(x) irreducible in order to obtain a field. The theorem could as well have been stated as: "The ring F[x]/f(x) is a field if and only if f(x) is irreducible in F[x]".

So, now we have an elegant mechanism of generating Galois Fields! If we can identify a prime polynomial of degree n over GF(q), we can construct a Galois Field with q^n elements. Such a field will have polynomials as the elements of the field. These polynomials will be defined over GF(q) and consist of all polynomials of degree less than n. It can be seen that there will be q^n such polynomials, which form the elements of the Extension Field.

Example 4.10 Consider the polynomial p(x) = x^3 + x + 1 over GF(2). Since p(0) != 0 and p(1) != 0, the polynomial is irreducible in GF(2). Since it is also monic, p(x) is a prime polynomial. Here we have n = 3, so we can use p(x) to construct a field with 2^3 = 8 elements, i.e., GF(8). The elements of this field will be 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1, which are all possible polynomials of degree less than n = 3. It is easy to construct the addition and multiplication tables for this field (exercise).

Having developed the necessary mathematical tools, we now resume our study of cyclic codes. We fix f(x) = x^n - 1 for the remainder of the chapter, and we denote F[x]/f(x) by R_n. Before we proceed, we make the following observations:
(i) x^n = 1 (mod x^n - 1). Hence, any polynomial modulo x^n - 1 can be reduced simply by replacing x^n by 1, x^{n+1} by x, and so on.
(ii) A codeword can uniquely be represented by a polynomial. A codeword consists of a sequence of elements, and we can use a polynomial to represent the locations and values of all the elements in the codeword. The codeword c_0 c_1 ... c_{n-1} can be represented by the polynomial c(x) = c_0 + c_1 x + c_2 x^2 + ... + c_{n-1} x^{n-1}. As another example, the codeword c = 207735 over GF(8) can be represented by the polynomial c(x) = 2 + 7x^2 + 7x^3 + 3x^4 + 5x^5.
(iii) Multiplying any polynomial by x corresponds to a single cyclic right-shift of the codeword elements. More explicitly, in R_n, by multiplying c(x) by x we get
x . c(x) = c_0 x + c_1 x^2 + ... + c_{n-1} x^n = c_{n-1} + c_0 x + c_1 x^2 + ... + c_{n-2} x^{n-1}.
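Observation (iii) is worth seeing in code. The following is an illustrative sketch (not from the text): a codeword is stored as its coefficient list, and multiplication by x in R_n simply moves the last coefficient to the front.

```python
# In R_n = F[x]/(x^n - 1), multiplying a codeword polynomial by x gives a
# single cyclic right-shift of the codeword (observation (iii) above).

def shift_by_x(c):
    """c = [c0, c1, ..., c_{n-1}]; return the coefficients of x*c(x) mod (x^n - 1)."""
    # x*c(x) = c0 x + c1 x^2 + ... + c_{n-1} x^n, and x^n is replaced by 1
    return [c[-1]] + c[:-1]

c = [1, 1, 0, 1, 0, 0, 0]          # c(x) = 1 + x + x^3, a length-7 word
print(shift_by_x(c))               # [0, 1, 1, 0, 1, 0, 0]
print(shift_by_x(shift_by_x(c)))   # two shifts: [0, 0, 1, 1, 0, 1, 0]
```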
Theorem 4.3 A code C in R_n is a cyclic code if, and only if, C satisfies the following conditions:
(i) a(x), b(x) in C implies a(x) + b(x) in C,   (4.4)
(ii) a(x) in C and r(x) in R_n implies r(x) a(x) in C.   (4.5)

Proof
(i) Suppose C is a cyclic code in R_n. Since cyclic codes are a subset of linear block codes, the first condition holds.
(ii) Let r(x) = r_0 + r_1 x + r_2 x^2 + ... + r_{n-1} x^{n-1}. Multiplication by x corresponds to a cyclic right-shift. But, by definition, the cyclic shift of a cyclic codeword is also a valid codeword. That is, x.a(x) is in C, x.(x a(x)) is in C, and so on. Hence r(x) a(x) = r_0 a(x) + r_1 x a(x) + r_2 x^2 a(x) + ... + r_{n-1} x^{n-1} a(x) is also in C, since each summand is in C.
Next, we prove the only if part of the theorem. Suppose (i) and (ii) hold. Take r(x) to be a scalar; then (i) and (ii) together imply that C is linear. Take r(x) = x in (ii), which shows that any cyclic shift of a codeword also leads to a codeword. Hence (i) and (ii) imply that C is a cyclic code.

In the next section, we shall use the mathematical tools developed so far to construct cyclic codes.

4.4 A METHOD FOR GENERATING CYCLIC CODES

The following steps can be used to generate a cyclic code (a brute-force sketch of these steps is given below):
(i) Take a polynomial f(x) in R_n.
(ii) Obtain a set of polynomials by multiplying f(x) by all possible polynomials in R_n.
(iii) The set of polynomials obtained above corresponds to the set of codewords belonging to a cyclic code. The blocklength of the code is n.
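The three steps can be carried out exhaustively for small n. The sketch below is illustrative only (the chosen f(x), n and helper names are not prescribed by the text); it multiplies f(x) by every polynomial in R_3 over GF(2) and collects the distinct results.

```python
# Brute-force generation of a cyclic code over GF(2) with n = 3 and f(x) = 1 + x^2.
from itertools import product

n, q = 3, 2
f = [1, 0, 1]                      # f(x) = 1 + x^2

def mul_mod(a, b):
    """Multiply a(x) b(x) and reduce modulo x^n - 1 over GF(q)."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] = (out[(i + j) % n] + ai * bj) % q
    return tuple(out)

# Step (ii): multiply f(x) by every polynomial r(x) in R_n
code = {mul_mod(f, r) for r in product(range(q), repeat=n)}
print(sorted(code))
# four distinct codewords: (0,0,0), (0,1,1), (1,0,1), (1,1,0)
```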
Example 4.11 Consider the polynomial f(x) = 1 + x^2 in R_3 defined over GF(2). In general, a polynomial in R_3 (= F[x]/(x^3 - 1)) can be represented as r(x) = r_0 + r_1 x + r_2 x^2, where the coefficients can take the values 0 or 1 (since we are working over GF(2)). Thus, there can be a total of 2 x 2 x 2 = 8 polynomials in R_3 defined over GF(2), which are 0, 1, x, x^2, 1 + x, 1 + x^2, x + x^2, 1 + x + x^2. To generate the cyclic code, we multiply f(x) with these 8 possible elements of R_3 and then reduce the results modulo (x^3 - 1):
(1 + x^2) . 0 = 0,
(1 + x^2) . 1 = 1 + x^2,
(1 + x^2) . x = 1 + x,
(1 + x^2) . x^2 = x + x^2,
(1 + x^2) . (1 + x) = x + x^2,
(1 + x^2) . (1 + x^2) = 1 + x,
(1 + x^2) . (x + x^2) = 1 + x^2,
(1 + x^2) . (1 + x + x^2) = 0.
Thus there are only four distinct codewords, {0, 1 + x, 1 + x^2, x + x^2}, which correspond to {000, 110, 101, 011}.

From the above example it appears that we can have some sort of a Generator Polynomial which can be used to construct the cyclic code.

Theorem 4.4 Let C be an (n, k) non-zero cyclic code in R_n. Then,
(i) there exists a unique monic polynomial g(x) of the smallest degree in C,
(ii) the cyclic code C consists of all multiples of the generator polynomial g(x) by polynomials of degree k - 1 or less,
(iii) g(x) is a factor of x^n - 1.

Proof
(i) Suppose both g(x) and h(x) are monic polynomials in C of smallest degree. Then g(x) - h(x) is also in C and has a smaller degree. If g(x) != h(x), then a suitable scalar multiple of g(x) - h(x) is monic, is in C, and is of smaller degree than g(x). This gives a contradiction.
(ii) Let a(x) be in C. Then, by the division algorithm, a(x) = q(x) g(x) + r(x), where deg r(x) < deg g(x). But r(x) = a(x) - q(x) g(x) is in C, because both terms on the right hand side of the equation are codewords. However, the degree of g(x) must be the minimum among all non-zero codewords. This can only be possible if r(x) = 0 and a(x) = q(x) g(x). Thus, a codeword is obtained by multiplying the generator polynomial g(x) with the polynomial q(x). For a code defined over GF(q), there are q^k distinct codewords possible. These codewords correspond to multiplying g(x) with the q^k distinct polynomials q(x), where deg q(x) <= (k - 1).
(iii) By the division algorithm, x^n - 1 = q(x) g(x) + r(x), where deg r(x) < deg g(x). Or, r(x) = {(x^n - 1) - q(x) g(x)} modulo (x^n - 1) = -q(x) g(x). But -q(x) g(x) is in C, because we are multiplying the generator polynomial by another polynomial, -q(x). Thus, we have a codeword r(x) whose degree is less than that of g(x). This violates the minimality of the degree of g(x), unless r(x) = 0, which implies x^n - 1 = q(x) g(x), i.e., g(x) is a factor of x^n - 1.

The last part of the theorem gives us the recipe to obtain the generator polynomial for a cyclic code. All we have to do is factorize x^n - 1 into irreducible, monic polynomials. We can also find all the possible cyclic codes of blocklength n simply by factorizing x^n - 1.

Note 1: A cyclic code C may contain polynomials other than the generator polynomial which also generate C. But the polynomial with the minimum degree is called the generator polynomial.
Note 2: The degree of g(x) is n - k (this will be shown later).

Example 4.12 To find all the binary cyclic codes of blocklength 3, we first factorize x^3 - 1. Note that for GF(2), 1 = -1, since 1 + 1 = 0. Hence,
x^3 - 1 = x^3 + 1 = (x + 1)(x^2 + x + 1).
Thus, we can make the following table.
Generator polynomial     Code (polynomial)                   Code (binary)
1                        {R_3}                               {000, 001, 010, 011, 100, 101, 110, 111}
(x + 1)                  {0, x + 1, x^2 + x, x^2 + 1}        {000, 011, 110, 101}
(x^2 + x + 1)            {0, x^2 + x + 1}                    {000, 111}
(x^3 + 1) = 0            {0}                                 {000}

A simple encoding rule to generate the codewords from the generator polynomial is
c(x) = i(x) g(x),   (4.6)
where i(x) is the information polynomial, c(x) is the codeword polynomial and g(x) is the generator polynomial. We have already seen that there is a one to one correspondence between a word (vector) and a polynomial. The error vector can also be represented as the error polynomial, e(x). Thus, the received word at the receiver, after passing through a noisy channel, can be expressed as
v(x) = c(x) + e(x).   (4.7)
We define the Syndrome Polynomial, s(x), as the remainder of v(x) under division by g(x), i.e.,
s(x) = R_g(x)[v(x)] = R_g(x)[c(x) + e(x)] = R_g(x)[c(x)] + R_g(x)[e(x)] = R_g(x)[e(x)],   (4.8)
because R_g(x)[c(x)] = 0.
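The encoding rule (4.6) and the syndrome (4.8) can be checked with a short sketch. The code below is illustrative only (it reuses the blocklength-3 binary code with g(x) = 1 + x from the table above and assumes a single-error pattern of my choosing): a valid codeword has zero syndrome, a corrupted word does not.

```python
# Encode with c(x) = i(x) g(x) and compute s(x) = R_g(x)[v(x)] over GF(2).

q = 2

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return trim(out)

def poly_mod(a, b):
    a, b = trim(a[:]), trim(b)
    while len(a) >= len(b) and any(a):
        shift = len(a) - len(b)
        factor = a[-1] * pow(b[-1], -1, q) % q
        for i, bi in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * bi) % q
        a = trim(a)
    return a

g = [1, 1]                          # g(x) = 1 + x
i = [1, 1]                          # information polynomial i(x) = 1 + x (k = 2)
c = mul(i, g)                       # codeword polynomial 1 + x^2 -> 101
c = c + [0] * (3 - len(c))          # pad to blocklength n = 3
e = [0, 1, 0]                       # a single error in position 1
v = [(ci + ei) % q for ci, ei in zip(c, e)]
print(c, poly_mod(c, g), poly_mod(v, g))
# [1, 0, 1] [0] [1] -> zero syndrome for the codeword, non-zero for the corrupted word
```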
Example 4.13 Consider the generator polynomial g(x) = x^2 + 1 for ternary cyclic codes (i.e., over GF(3)) of blocklength n = 4. Since we are dealing with cyclic codes, the degree of g(x) is n - k. Here n = 4 and deg g(x) = 2, so k must be 2. We are therefore going to construct a (4, 2) cyclic ternary code. There will be a total of q^k = 3^2 = 9 codewords. The information polynomials and the corresponding codeword polynomials are listed below (the codeword c lists the coefficients of c(x) from the highest degree to the lowest).

i      i(x)       c(x) = i(x) g(x)          c
00     0          0                         0000
01     1          x^2 + 1                   0101
02     2          2x^2 + 2                  0202
10     x          x^3 + x                   1010
11     x + 1      x^3 + x^2 + x + 1         1111
12     x + 2      x^3 + 2x^2 + x + 2        1212
20     2x         2x^3 + 2x                 2020
21     2x + 1     2x^3 + x^2 + 2x + 1       2121
22     2x + 2     2x^3 + 2x^2 + 2x + 2      2222

It can be seen that the cyclic shift of any codeword results in another valid codeword. By observing the codewords we find that the minimum distance of this code is 2 (there are four non-zero codewords with the minimum Hamming weight of 2). Therefore, this code is capable of detecting one error and correcting zero errors.

Observing the fact that the codeword polynomial is divisible by the generator polynomial, we can in fact detect more errors than suggested by the minimum distance of the code. Since we are dealing with cyclic codes, which are a subset of linear block codes, we can use the all-zero codeword to illustrate this point without loss of generality. Assume that g(x) = x^2 + 1 and the transmitted codeword is the all-zero codeword. Therefore, the received word is the error polynomial itself, i.e.,
v(x) = c(x) + e(x) = e(x).   (4.9)
At the receiver end, an error will be detected if g(x) fails to divide the received word v(x) = e(x). Now, g(x) has only two terms. So if e(x) has an odd number of terms, i.e., if the number of errors is odd, it will be caught by the decoder! For example, if we try to divide e(x) = x^2 + x + 1 by g(x), we will always get a remainder. In the example of the (4, 2) cyclic code with g(x) = x^2 + 1, d* = 2, suggesting that it can detect d* - 1 = 1 error. However, by this simple observation, we find that it can detect any odd number of errors up to n. In this case, it can detect 1 error or 3 errors, but not 2 errors.

4.5 MATRIX DESCRIPTION OF CYCLIC CODES

Theorem 4.5 Suppose C is a cyclic code with generator polynomial g(x) = g_0 + g_1 x + ... + g_r x^r of degree r. Then the generator matrix of C is given by

      | g_0  g_1  ...  g_r  0    0    ...  0   |
      | 0    g_0  g_1  ...  g_r  0    ...  0   |
G =   | 0    0    g_0  g_1  ...  g_r  ...  0   |     k = (n - r) rows, n columns.   (4.10)
      | ...                                    |
      | 0    0    ...  0    g_0  g_1  ...  g_r |

Proof The (n - r) rows of the matrix are obviously linearly independent because of the echelon form of the matrix. These (n - r) rows represent the codewords g(x), x g(x), x^2 g(x), ..., x^{n-r-1} g(x). Thus, the matrix can generate these codewords. Now, to prove that the matrix can generate all the possible codewords, we must show that every possible codeword can be represented as a linear combination of the codewords g(x), x g(x), x^2 g(x), ..., x^{n-r-1} g(x). We know that if c(x) is a codeword, it can be represented as c(x) = q(x).g(x) for some polynomial q(x). Since the degree of c(x) is less than n (because the length of the codeword is n), it follows that the degree of q(x) is less than n - r. Hence,
q(x).g(x) = (q_0 + q_1 x + ... + q_{n-r-1} x^{n-r-1}) g(x) = q_0 g(x) + q_1 x g(x) + ... + q_{n-r-1} x^{n-r-1} g(x).
Thus, any codeword can be represented as a linear combination of g(x), x g(x), x^2 g(x), ..., x^{n-r-1} g(x). This proves that the matrix G is indeed the generator matrix. We also know that the dimensions of the generator matrix are k x n. Therefore, r = n - k, i.e., the degree of g(x) is n - k.
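Theorem 4.5 translates directly into code: each row of G is the coefficient list of x^i g(x). The sketch below is illustrative (the function name and the chosen g(x) are my own); with g(x) = 1 + x + x^3 and n = 7 it produces the generator matrix of a (7, 4) binary cyclic code.

```python
# Build the generator matrix of Theorem 4.5 from the coefficients of g(x).

def generator_matrix(g, n):
    """g = [g_0, ..., g_r]; rows are g(x), x g(x), ..., x^{k-1} g(x), k = n - r."""
    r = len(g) - 1
    k = n - r
    rows = []
    for i in range(k):
        rows.append([0] * i + list(g) + [0] * (n - r - i - 1))
    return rows

g = [1, 1, 0, 1]                   # g(x) = 1 + x + x^3 over GF(2)
for row in generator_matrix(g, 7):
    print(row)
# [1, 1, 0, 1, 0, 0, 0]
# [0, 1, 1, 0, 1, 0, 0]
# [0, 0, 1, 1, 0, 1, 0]
# [0, 0, 0, 1, 1, 0, 1]
```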
Example 4.14 To find the generator matrices of all ternary cyclic codes (i.e., codes over GF(3)) of blocklength n = 4, we first factorize x^4 - 1:
x^4 - 1 = (x - 1)(x^3 + x^2 + x + 1) = (x - 1)(x + 1)(x^2 + 1).
We know that every factor of x^4 - 1 is capable of generating a cyclic code. The resultant generator matrices are listed in Table 4.1. Note that -1 = 2 for GF(3).

Table 4.1 Cyclic codes of blocklength n = 4 over GF(3)

g(x)          (n, k)    dmin    G
1             (4, 4)    1       [ I_4 ]
(x - 1)       (4, 3)    2       [ -1  1  0  0 ]
                                [  0 -1  1  0 ]
                                [  0  0 -1  1 ]
Table 4.1 (continued)

g(x)                   (n, k)    dmin    G
(x + 1)                (4, 3)    2       [ 1  1  0  0 ]
                                         [ 0  1  1  0 ]
                                         [ 0  0  1  1 ]
(x^2 + 1)              (4, 2)    2       [ 1  0  1  0 ]
                                         [ 0  1  0  1 ]
(x^2 - 1)              (4, 2)    2       [ -1  0  1  0 ]
                                         [  0 -1  0  1 ]
(x - 1)(x^2 + 1)       (4, 1)    4       [ -1  1  -1  1 ]
(x + 1)(x^2 + 1)       (4, 1)    4       [ 1  1  1  1 ]
(x^4 - 1)              (4, 0)    -       [ 0  0  0  0 ]

It can be seen from the table that none of the (4, 2) ternary cyclic codes is a single error correcting code (since their minimum distance is less than 3). An interesting observation is that we do not have any ternary (4, 2) Hamming code that is cyclic! Recall that Hamming codes are single error correcting codes with n = (q^r - 1)/(q - 1) and k = (q^r - 1)/(q - 1) - r, where r is an integer >= 2. Therefore, a (4, 2) ternary Hamming code exists, but it is not a cyclic code.

The next step is to explore whether we can find a parity check polynomial corresponding to our generator polynomial, g(x). We already know that g(x) is a factor of x^n - 1. Hence we can write
x^n - 1 = h(x) g(x),   (4.11)
where h(x) is some polynomial. The following can be concluded simply by observing the above equation:
(i) Since g(x) is monic, h(x) has to be monic, because the left hand side of the equation is also monic (the leading coefficient is unity).
(ii) Since the degree of g(x) is n - k, the degree of h(x) must be k.

Suppose C is a cyclic code in R_n with the generator polynomial g(x). Recall that we are denoting F[x]/f(x) by R_n, where f(x) = x^n - 1. In R_n, h(x) g(x) = x^n - 1 = 0. Then, any codeword belonging to C can be written as c(x) = a(x) g(x), where the polynomial a(x) is in R_n. Therefore, in R_n,
c(x) h(x) = a(x) g(x) h(x) = a(x) . 0 = 0.
Thus, h(x) behaves like a Parity Check Polynomial. Any valid codeword, when multiplied by the parity check polynomial, yields the zero polynomial. This concept is parallel to that of the parity check matrix introduced in the previous chapter. Since we are still in the domain of linear block codes, we go ahead and define the parity check matrix in relation to the parity check polynomial.

Suppose C is a cyclic code with the parity check polynomial h(x) = h_0 + h_1 x + ... + h_k x^k. Then the parity check matrix of C is given by

      | h_k  h_{k-1}  ...  h_0  0    ...  0   |
H =   | 0    h_k  h_{k-1}  ...  h_0  ...  0   |     (n - k) rows, n columns.   (4.12)
      | ...                                   |
      | 0    ...  0    h_k  h_{k-1}  ...  h_0 |

Recall that cH^T = 0. Therefore, iGH^T = 0 for any information vector i. Hence, GH^T = 0. We further have s = vH^T, where s is the syndrome vector and v is the received word.

Example 4.15 For binary codes of blocklength n = 7, we have
x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1).
Consider g(x) = (x^3 + x + 1). Since g(x) is a factor of x^7 - 1, there is a cyclic code that can be generated by it. The generator matrix corresponding to g(x) is

G = | 1 1 0 1 0 0 0 |
    | 0 1 1 0 1 0 0 |
    | 0 0 1 1 0 1 0 |
    | 0 0 0 1 1 0 1 |

The parity check polynomial is h(x) = (x - 1)(x^3 + x^2 + 1) = x^4 + x^2 + x + 1, and the corresponding parity check matrix is

H = | 1 0 1 1 1 0 0 |
    | 0 1 0 1 1 1 0 |
    | 0 0 1 0 1 1 1 |

The minimum distance of this code is 3, and this happens to be the (7, 4) Hamming code. Thus, the binary (7, 4) Hamming code is also a cyclic code.

4.6 BURST ERROR CORRECTION

In many real life channels, errors are not random, but occur in bursts. For example, in a mobile communications channel, fading results in burst errors. When errors occur at a stretch, as opposed to random errors, we term them Burst Errors.
Example 4.16 Let the sequence of bits, transmitted at 10 kb/s over a wireless channel, be
c = 0 1 0 0 0 1 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1.
Suppose, 0.5 ms after the start of transmission, the channel experiences a fade of duration 1 ms. During this time interval, the channel corrupts the transmitted bits. The error sequence can be written as
e = 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 0 0 0 0 0 0 0.
This is an example of a burst error, where a portion of the transmitted sequence gets garbled due to the channel. Here the length of the burst is 10 bits. However, not all ten locations are in error.

Definition 4.6 A Cyclic Burst of length t is a vector whose non-zero components are confined to t successive components, the first and last of which are non-zero.

If we are constructing codes for channels more prone to burst errors of length t (as opposed to an arbitrary pattern of t random errors), it might be possible to design more efficient codes. We can describe a burst error as
e(x) = x^i b(x),   (4.13)
where b(x) is a polynomial of degree <= t - 1 representing the burst pattern, and x^i marks the starting location of the burst pattern within the codeword being transmitted. A code designed for correcting bursts of length t must have unique syndromes for every such error pattern, i.e., s(x) = R_g(x)[e(x)] must be different for each polynomial representing a burst of length t or less.

Example 4.17 For a binary code of blocklength n = 15, consider the generator polynomial
g(x) = x^6 + x^3 + x^2 + x + 1.   (4.14)
This code is capable of correcting bursts of length 3 or less. To prove this we must show that all the syndromes corresponding to the different burst errors are distinct. The different burst errors are:
(i) Bursts of length 1: e(x) = x^i for i = 0, 1, ..., 14.
(ii) Bursts of length 2: e(x) = x^i (1 + x) for i = 0, 1, ..., 13.
(iii) Bursts of length 3: e(x) = x^i (1 + x^2) and e(x) = x^i (1 + x + x^2), for i = 0, 1, ..., 12.
It can be shown that the syndromes of all these 55 (= 15 + 14 + 13 + 13) error patterns are distinct and non-zero. A table can be made for each pattern and the corresponding syndrome, which can then be used for correcting a burst error of length 3 or less.

It should be emphasized that codes designed specifically for correcting burst errors are more efficient in terms of the code rate. The code being discussed here is a (15, 9) cyclic code with code rate k/n = 0.6 and minimum distance d* = 3. This code can correct only 1 random error (but a burst of up to three errors!). Note that correction of one random error amounts to correcting a burst error of length 1.

Similar to the Singleton bound studied in the previous chapter, there is a bound on the minimum number of parity bits required for a burst-error correcting linear block code: 'A linear block code that corrects all bursts of length t or less must have at least 2t parity symbols'.

In the next three sections, we will study three different sub-classes of cyclic codes. Each sub-class has a specific objective.

4.7 FIRE CODES

Definition 4.7 A Fire code is a cyclic burst error correcting code over GF(q) with the generator polynomial
g(x) = (x^{2t-1} - 1) p(x),   (4.15)
where p(x) is a prime polynomial over GF(q) whose degree m is not smaller than t, and p(x) does not divide x^{2t-1} - 1. The blocklength of the Fire code is the smallest integer n such that g(x) divides x^n - 1. A Fire code can correct all burst errors of length t or less.

Example 4.18 Consider the Fire code with t = m = 3.
A prime polynomial over GF(2) of degree 3 is p(x) = x^3 + x + 1, which does not divide (x^5 - 1). The generator polynomial of the Fire code will be
g(x) = (x^5 - 1) p(x) = (x^5 - 1)(x^3 + x + 1) = x^8 + x^6 + x^5 - x^3 - x - 1 = x^8 + x^6 + x^5 + x^3 + x + 1.
The degree of g(x) is n - k = 8. The blocklength is the smallest integer n such that g(x) divides x^n - 1. After trial and error we get n = 35. Thus, the parameters of the Fire code are (35, 27), with g(x) = x^8 + x^6 + x^5 + x^3 + x + 1. This code can correct all bursts of length 3 or less. The code rate of this code is 0.77, and it is more efficient than the code generated by g(x) = x^6 + x^3 + x^2 + x + 1, which has a code rate of only 0.6.

Fire codes become more efficient as we increase t. The code rates for binary Fire codes (with m = t) for different values of t are plotted in Fig. 4.1.

[Fig. 4.1 Code rates of binary Fire codes (m = t) plotted against t, for t = 2 to 10; the code rate increases from about 0.5 towards 0.9 as t grows.]
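The "trial and error" search for the blocklength can be automated. The sketch below is illustrative only: polynomials over GF(2) are stored as Python integers (bit i holds the coefficient of x^i), and the loop finds the smallest n for which g(x) divides x^n - 1, i.e., the smallest n with x^n = 1 modulo g(x).

```python
# Find the blocklength of the Fire code of Example 4.18 over GF(2).

def gf2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

g = gf2_mul(0b100001, 0b1011)      # (x^5 + 1)(x^3 + x + 1) = x^8 + x^6 + x^5 + x^3 + x + 1
x_power = 1
for n in range(1, 200):
    x_power = gf2_mod(x_power << 1, g)   # x^n mod g(x)
    if x_power == 1:                      # then g(x) divides x^n - 1
        print("blocklength n =", n)       # prints: blocklength n = 35
        break
```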
4.8 GOLAY CODES

The Binary Golay Code
In the previous chapter, Sec. 3.9, we saw that a (23, 12) perfect code exists with d* = 7. Recall that, for a perfect code,
M {(n choose 0) + (n choose 1)(q - 1) + (n choose 2)(q - 1)^2 + ... + (n choose t)(q - 1)^t} = q^n,   (4.16)
which is satisfied for the values n = 23, k = 12, M = 2^k = 2^12, q = 2 and t = (d* - 1)/2 = 3. This (23, 12) perfect code is the Binary Golay Code. We shall now explore this perfect code as a cyclic code. We start with the factorization of (x^23 - 1):
(x^23 - 1) = (x - 1)(x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1)(x^11 + x^9 + x^7 + x^6 + x^5 + x + 1)
           = (x - 1) g1(x) g2(x).   (4.17)
The degree of g1(x) is n - k = 11, hence k = 12, which implies that there exists a (23, 12) cyclic code. In order to prove that it is a perfect code, we must show that the minimum distance of this (23, 12) cyclic code is 7. One way is to write out the parity check matrix, H, and show that no six columns are linearly dependent. Another way is to prove it analytically, which is a long and drawn-out proof. The easiest way is to write a computer program to list out all the 2^12 codewords and find the minimum weight (on a fast computer it takes only seconds!). The code rate is 0.52 and it is a triple error correcting code. However, the relatively small blocklength of this perfect code makes it impractical for most real life applications.

The Ternary Golay Code
We next examine the ternary (11, 6) cyclic code, which is also the Ternary Golay Code. This code has a minimum distance of 5, and can be verified to be a perfect code. We begin by factorizing (x^11 - 1) over GF(3).
Therefore, 2n-kD amounts to shifting the k bits to the left by (n- k) bits, and padding the result with zeros (recall that left shift by 1 bit of a binary sequence is equivalent to multiplying the number represented by the binary sequence by two). The codeword, T, can then be represented as T= 2n-kD + F (4.20) Adding Fin the above equation yields the concatenation of D and F. If we divide 2n-k D by P, we obtain 2n-k D R - - = Q,+ - (4.21) p p where, Q,is the quotient and RlPis the remainder. Suppose we use R as the FCS, then, T= 2n-k D + R (4.22) In this case, upon dividing Thy P we obtain T 2"-k D+R 2"-k D R -----= +- p p p p
    = Q + R/P + R/P = Q + (R + R)/P = Q,   (4.23)
since R + R = 0 in modulo-2 arithmetic. Thus there is no remainder, i.e., T is exactly divisible by P. To generate such an FCS, we simply divide 2^{n-k} D by P and use the (n - k)-bit remainder as the FCS.

Let an error E occur when T is transmitted over a noisy channel. The received word is given by
V = T + E.   (4.24)
The CRC scheme will fail to detect the error only if V is completely divisible by P. This translates to the case when E is completely divisible by P (because T is divisible by P).

Example 4.19 Let the message D = 1010001101, i.e., k = 10, and the pattern P = 110101. The number of FCS bits is 5. Therefore, n = 15. We wish to determine the FCS. First, the message is multiplied by 2^5 (left shift by 5 and pad with 5 zeros). This yields
2^5 D = 101000110100000.
Next, divide the resulting number by P = 110101. By long division we obtain Q = 1101010110 and R = 01110. The remainder is appended to 2^5 D to obtain
T = 101000110101110.
T is the transmitted codeword. If no errors occur in the channel, the received word, when divided by P, will yield 0 as the remainder.

CRC codes can also be defined using the polynomial representation. Let the message polynomial be D(x) and the predetermined divisor be P(x). Therefore,
x^{n-k} D(x) / P(x) = Q(x) + R(x)/P(x),
T(x) = x^{n-k} D(x) + R(x).   (4.25)
At the receiver end, the received word is divided by P(x). Suppose the received word is
V(x) = T(x) + E(x),   (4.26)
where E(x) is the error polynomial. Then [T(x) + E(x)]/P(x) leaves the same remainder as E(x)/P(x), because T(x) is exactly divisible by P(x). Those errors that happen to correspond to polynomials containing P(x) as a factor will slip by, and the others will be caught in the net of the CRC decoder. The polynomial P(x) is also called the generator polynomial for the CRC code. CRC codes are also known as Polynomial Codes.

Example 4.20 Suppose the transmitted codeword undergoes a single-bit error. The error polynomial E(x) can be represented by E(x) = x^i, where i determines the location of the single error bit. If P(x) contains two or more terms, E(x)/P(x) can never leave a zero remainder. Thus all single errors will be caught by such a CRC code.

Example 4.21 Suppose two isolated errors occur, i.e., E(x) = x^i + x^j, i > j. Alternately, E(x) = x^j (x^{i-j} + 1). If we assume that P(x) is not divisible by x, then a sufficient condition for detecting all double errors is that P(x) does not divide x^k + 1 for any k up to the maximum value of i - j (i.e., the frame length). For example, x^15 + x^14 + 1 will not divide x^k + 1 for any value of k below 32,768.

Example 4.22 Suppose the error polynomial has an odd number of terms (corresponding to an odd number of errors). An interesting fact is that there is no polynomial with an odd number of terms that has x + 1 as a factor if we are performing binary arithmetic (modulo 2 operations). By making (x + 1) a factor of P(x), we can catch all errors consisting of an odd number of bits (i.e., we can catch at least half of all possible error patterns!).

Another interesting feature of CRC codes is their ability to detect burst errors. A burst error of length k can be represented by x^i (x^{k-1} + x^{k-2} + ... + 1), where i determines how far from the right end of the received frame the burst is located. If P(x) has a non-zero constant term, it does not have x as a factor. So, if the degree of (x^{k-1} + x^{k-2} + ... + 1) is less than the degree of P(x), the remainder can never be zero. Therefore, a polynomial code with r check bits can detect all burst errors of length <= r.
If the burst length is r + 1, the remainder of the division by P(x) will be zero if, and only if, the burst is identical to P(x). Now, the first and last bits of a burst must be 1 (by definition), while the intermediate bits can be 1 or 0. Therefore, the exact matching of the burst error with the polynomial P(x) depends on the r - 1 intermediate bits. Assuming all combinations are equally likely, the probability of a miss is 1/2^{r-1}. One can also show that when an error burst of length greater than r + 1 occurs, or when several shorter bursts occur, the probability of a bad frame slipping through is 1/2^r.

Example 4.23 Four versions of P(x) have become international standards:
CRC-12:    P(x) = x^12 + x^11 + x^3 + x^2 + x + 1,
CRC-16:    P(x) = x^16 + x^15 + x^2 + 1,
CRC-CCITT: P(x) = x^16 + x^12 + x^5 + 1,
CRC-32:    P(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1.   (4.27)
CRC-12, CRC-16 and CRC-CCITT contain (x + 1) as a factor. CRC-12 is used for the transmission of streams of 6-bit characters and generates a 12-bit FCS. Both CRC-16 and CRC-CCITT are popular for 8-bit characters. They result in a 16-bit FCS and can catch all single and double errors, all errors with an odd number of bits, all burst errors of length 16 or less, 99.997% of 17-bit bursts and 99.998% of 18-bit and longer bursts. CRC-32 is specified as an option in some point-to-point synchronous transmission standards.
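The FCS computation of Example 4.19 is straightforward modulo-2 long division on bit strings. The sketch below is illustrative only (function names are my own); it reproduces R = 01110 and T = 101000110101110, and then checks that the resulting frame leaves a zero remainder.

```python
# Frame check sequence by modulo-2 long division (bit strings, MSB first).

def crc_remainder(data_bits, divisor_bits):
    n_fcs = len(divisor_bits) - 1
    reg = [int(b) for b in data_bits] + [0] * n_fcs      # data shifted left by n - k
    div = [int(b) for b in divisor_bits]
    for i in range(len(data_bits)):
        if reg[i] == 1:
            for j in range(len(div)):
                reg[i + j] ^= div[j]
    return ''.join(str(b) for b in reg[-n_fcs:])

D = "1010001101"
P = "110101"
R = crc_remainder(D, P)
print(R)                          # 01110
print(D + R)                      # 101000110101110 -- the transmitted frame T
print(crc_remainder(D + R, P))    # 00000 -> T is exactly divisible by P
```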
4.10 CIRCUIT IMPLEMENTATION OF CYCLIC CODES

Shift registers can be used to encode and decode cyclic codes easily. Encoding and decoding of cyclic codes require multiplication and division by polynomials, and the shift property of shift registers is ideally suited for such operations. Shift registers are banks of memory units which are capable of shifting the contents of one unit to the next at every clock pulse. Here we will focus on circuit implementations for codes over GF(2^m). Besides the shift register, we will make use of the following circuit elements:
(i) A scaler, whose job is to multiply the input by a fixed field element.
(ii) An adder, which takes in two inputs and adds them together. A simple circuit realization of an adder is the 'exclusive-or' or 'xor' gate.
(iii) A multiplier, which is basically the 'and' gate.
These elements are depicted in Fig. 4.2.

[Fig. 4.2 Circuit elements used to construct encoders and decoders for cyclic codes: an N-stage shift register, a scaler, an adder and a multiplier.]

A field element of GF(2) can simply be represented by a single bit. For GF(2^m) we require m bits to represent one element. For example, the elements of GF(8) can be represented as the elements of the set {000, 001, 010, 011, 100, 101, 110, 111}. For such a representation we need three clock pulses to shift an element from one stage of the shift register to the next. The effective shift register for GF(8) is shown in Fig. 4.3. Any arbitrary element of this field can be represented by ax^2 + bx + c, where a, b, c are binary, and the power of the indeterminate x is used to denote the position. For example, 101 = x^2 + 1.

[Fig. 4.3 The effective shift register for GF(8): each stage consists of three binary memory units.]

Example 4.24 We now consider the multiplication of an arbitrary element by another field element over GF(8). Recall the construction of GF(8) from GF(2) using the prime polynomial p(x) = x^3 + x + 1. The elements of the field are 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1. We want to obtain the circuit representation for the multiplication of an arbitrary field element (ax^2 + bx + c) by another element, say, x^2 + x. We have
(ax^2 + bx + c)(x^2 + x) = ax^4 + (a + b)x^3 + (b + c)x^2 + cx   (modulo p(x))
                         = (a + b + c)x^2 + (b + c)x + (a + b).
One possible circuit realization is shown in Fig. 4.4.

[Fig. 4.4 Multiplication of an arbitrary field element of GF(8) by x^2 + x.]

We next focus on the multiplication of an arbitrary polynomial a(x) by g(x). Let the polynomial g(x) be represented as
g(x) = g_L x^L + ... + g_1 x + g_0,   (4.28)
the polynomial a(x) be represented as
a(x) = a_k x^k + ... + a_1 x + a_0,   (4.29)
and the resultant polynomial b(x) = a(x) g(x) be represented as
b(x) = b_{k+L} x^{k+L} + ... + b_1 x + b_0.   (4.30)
The circuit realization of b(x) is given in Fig. 4.5. This is a linear feed-forward shift register. It is also called a Finite Impulse Response (FIR) filter.

[Fig. 4.5 A Finite Impulse Response (FIR) filter.]
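The feed-forward structure of Fig. 4.5 can be simulated in software. The following is a hedged sketch (the register model and names are my own): the coefficients of a(x) are clocked in, highest first, and the weighted tap outputs are summed to produce the coefficients of b(x) = a(x) g(x).

```python
# Simulate an L-stage feed-forward (FIR) shift register over GF(2).

q = 2

def fir_multiply(a, g):
    """a, g are coefficient lists, lowest degree first; returns b = a * g."""
    L = len(g) - 1                                   # number of register stages
    register = [0] * L
    out = []
    for coeff in list(reversed(a)) + [0] * L:        # feed a_k first, then flush with zeros
        taps = [coeff] + register                    # current input plus stored values
        out.append(sum(gi * ti for gi, ti in zip(reversed(g), taps)) % q)
        register = taps[:-1]                         # shift: each stage takes the previous content
    return list(reversed(out))                       # coefficients b_0, ..., b_{k+L}

a = [1, 0, 1]            # a(x) = 1 + x^2
g = [1, 1, 0, 1]         # g(x) = 1 + x + x^3
print(fir_multiply(a, g))
# [1, 1, 1, 0, 0, 1] -> (1 + x^2)(1 + x + x^3) = 1 + x + x^2 + x^5
```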
In electrical engineering jargon, the coefficients of a(x) and g(x) are convolved by the shift register. For our purpose, we have a circuit realization for multiplying two polynomials. Thus, we have an efficient mechanism of encoding a cyclic code by multiplying the information polynomial by the generator polynomial.

Example 4.25 The encoder circuit for the generator polynomial g(x) = x^8 + x^6 + x^5 + x^3 + x + 1 is given in Fig. 4.6. This is the generator polynomial for the Fire code with t = m = 3. It is easy to interpret the circuit: the 8 memory units shift the input, one unit at a time, and the shifted outputs are summed at the proper locations. There are five adders for summing up the six shifted versions of the input.

[Fig. 4.6 Circuit realization of the encoder for the Fire code.]

We can also use a shift register circuit for dividing an arbitrary polynomial, a(x), by a fixed polynomial g(x). We assume here that the divisor is a monic polynomial; we already know how to factor out a scalar in order to convert any polynomial to a monic polynomial. The division process can be expressed as a pair of recursive equations. Let Q^(r)(x) and R^(r)(x) be the quotient polynomial and the remainder polynomial at the r-th recursion step, with the initial conditions Q^(0)(x) = 0 and R^(0)(x) = a(x). Then, the recursive equations can be written as
Q^(r)(x) = Q^(r-1)(x) + R^(r-1)_{n-r} x^{k-r},
R^(r)(x) = R^(r-1)(x) - R^(r-1)_{n-r} x^{k-r} g(x),   (4.31)
where R^(r-1)_{n-r} represents the leading coefficient of the remainder polynomial at stage (r - 1). For dividing an arbitrary polynomial a(x) by a fixed polynomial g(x), the circuit realization is given in Fig. 4.7. After n shifts, the quotient has been passed out of the shift register, and the value stored in the shift register is the remainder. Thus the shift register implementation of a decoder is very simple: the contents of the shift register are checked after the division of the received polynomial by the generator polynomial, and if even a single memory unit of the shift register is non-zero, an error is detected.

[Fig. 4.7 A shift register circuit for dividing by g(x).]

Example 4.26 The shift register circuit for dividing by g(x) = x^8 + x^6 + x^5 + x^3 + x + 1 is given in Fig. 4.8.

[Fig. 4.8 A shift register circuit for dividing by g(x) = x^8 + x^6 + x^5 + x^3 + x + 1.]

The procedure for error detection and error correction is as follows. The received word is first stored in a buffer and is subjected to the divide-by-g(x) operation. As we have seen, the division can be carried out very efficiently by a shift register circuit. The remainder in the shift register is then compared with all the possible (pre-computed) syndromes. This set of syndromes corresponds to the set of correctable error patterns. If a syndrome match is found, the error is subtracted out from the received word. The corrected version of the received word is then passed on to the next stage of the receiver unit for further processing. This kind of decoder is known as a Meggitt Decoder. The flow chart for it is given in Fig. 4.9.

[Fig. 4.9 The flow chart of a Meggitt decoder: the received word enters an n-stage shift register, is divided by g(x) (with feedback), the remainder is compared with all test syndromes, and the corrected word is output.]
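The divide-by-g(x) register of Figs. 4.7 and 4.8 can also be simulated directly. The sketch below is illustrative (the codeword c = (1 + x) g(x) and the single error are my own choices): after clocking in all n coefficients, the register holds the remainder, i.e., the syndrome a Meggitt decoder would look up.

```python
# Simulate the divide-by-g(x) shift register over GF(2); g(x) is assumed monic.

def lfsr_remainder(v, g):
    r = len(g) - 1                          # degree of g(x)
    register = [0] * r                      # register[i] holds the coefficient of x^i
    for coeff in reversed(v):               # feed v_{n-1}, ..., v_1, v_0
        feedback = register[r - 1]
        register = [coeff] + register[:-1]  # multiply the state by x and feed in the next coefficient
        if feedback:                        # subtract feedback * g(x)
            register = [ri ^ gi for ri, gi in zip(register, g[:r])]
    return register                         # remainder coefficients r_0, ..., r_{r-1}

g = [1, 1, 0, 1]                            # g(x) = 1 + x + x^3
c = [1, 0, 1, 1, 1, 0, 0]                   # c(x) = (1 + x) g(x) = 1 + x^2 + x^3 + x^4, a codeword
print(lfsr_remainder(c, g))                 # [0, 0, 0] -> no error detected
e = [0, 0, 0, 0, 1, 0, 0]                   # single error at position 4
v = [ci ^ ei for ci, ei in zip(c, e)]
print(lfsr_remainder(v, g))                 # [0, 1, 1] -> non-zero syndrome, error detected
```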
4.11 CONCLUDING REMARKS

The notion of cyclic codes was first introduced by Prange in 1957. The work on cyclic codes was further developed by Peterson and Kasami. Pioneering work on the minimum distance of cyclic codes was done by Bose and Raychaudhuri in the early 1960s. Another subclass of cyclic codes, the BCH codes (named after Bose, Chaudhuri and Hocquenghem), will be studied in detail in the next chapter. It was soon discovered that almost all of the earlier discovered linear block codes could be made cyclic. The initial steps in the area of burst error correction were taken by Abramson in 1959. The Fire codes were published in the same year. The binary and the ternary Golay codes were published by Golay as early as 1949. Shift register circuits for cyclic codes were introduced in the works of Peterson, Chien and Meggitt in the early 1960s. Important contributions were also made by Kasami, MacWilliams, Mitchell and Rudolph.

SUMMARY

• A polynomial is a mathematical expression f(x) = f_0 + f_1 x + ... + f_m x^m, where the symbol x is called the indeterminate and the coefficients f_0, f_1, ..., f_m are the elements of GF(q). The coefficient f_m is called the leading coefficient. If f_m != 0, then m is called the degree of the polynomial, and is denoted by deg f(x). A polynomial is called monic if its leading coefficient is unity.
• The division algorithm states that, for every pair of polynomials a(x) and b(x) != 0 in F[x], there exists a unique pair of polynomials q(x), the quotient, and r(x), the remainder, such that a(x) = q(x) b(x) + r(x), where deg r(x) < deg b(x). The remainder is sometimes also called the residue, and is denoted by R_b(x)[a(x)] = r(x).
• Two important properties of residues are (i) R_f(x)[a(x) + b(x)] = R_f(x)[a(x)] + R_f(x)[b(x)], and (ii) R_f(x)[a(x).b(x)] = R_f(x){R_f(x)[a(x)].R_f(x)[b(x)]}, where a(x), b(x) and f(x) are polynomials over GF(q).
• A polynomial f(x) in F[x] is said to be reducible if f(x) = a(x) b(x), where a(x), b(x) are elements of F[x] and deg a(x) and deg b(x) are both smaller than deg f(x). If f(x) is not reducible, it is called irreducible. A monic irreducible polynomial of degree at least one is called a prime polynomial.
• The ring F[x]/f(x) is a field if and only if f(x) is a prime polynomial in F[x].
• A code C in R_n is a cyclic code if and only if C satisfies the following conditions: (i) a(x), b(x) in C implies a(x) + b(x) in C, and (ii) a(x) in C and r(x) in R_n implies a(x) r(x) in C.
• The following steps can be used to generate a cyclic code:
  (i) Take a polynomial f(x) in R_n.
  (ii) Obtain a set of polynomials by multiplying f(x) by all possible polynomials in R_n.
  (iii) The set of polynomials obtained above corresponds to the set of codewords belonging to a cyclic code. The blocklength of the code is n.
• Let C be an (n, k) non-zero cyclic code in R_n. Then,
  (i) there exists a unique monic polynomial g(x) of the smallest degree in C,
  (ii) the cyclic code C consists of all multiples of the generator polynomial g(x) by polynomials of degree k - 1 or less,
  (iii) g(x) is a factor of x^n - 1,
  (iv) the degree of g(x) is n - k.
• For a cyclic code C with generator polynomial g(x) = g_0 + g_1 x + ... + g_r x^r of degree r, the generator matrix is given by

        | g_0  g_1  ...  g_r  0    ...  0   |
  G =   | 0    g_0  g_1  ...  g_r  ...  0   |        k = (n - r) rows, n columns.
        | ...                               |
        | 0    ...  0    g_0  g_1  ...  g_r |

• For a cyclic code C with parity check polynomial h(x) = h_0 + h_1 x + ... + h_k x^k, the parity check matrix is given by

        | h_k  h_{k-1}  ...  h_0  0    ...  0   |
  H =   | 0    h_k  h_{k-1}  ...  h_0  ...  0   |        (n - k) rows, n columns.
        | ...                                   |
        | 0    ...  0    h_k  h_{k-1}  ...  h_0 |

• x^n - 1 = h(x) g(x), where g(x) is the generator polynomial and h(x) is the parity check polynomial.
• A Fire code is a cyclic burst error correcting code over GF(q) with the generator polynomial g(x) = (x^{2t-1} - 1) p(x), where p(x) is a prime polynomial over GF(q) whose degree m is not smaller than t and p(x) does not divide x^{2t-1} - 1. The blocklength of the Fire code is the smallest integer n such that g(x) divides x^n - 1. A Fire code can correct all burst errors of length t or less.
• The generator polynomial of the Binary Golay Code: g1(x) = (x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1), or g2(x) = (x^11 + x^9 + x^7 + x^6 + x^5 + x + 1).
• The generator polynomial of the Ternary Golay Code: g1(x) = (x^5 + x^4 - x^3 + x^2 - 1), or g2(x) = (x^5 - x^3 + x^2 - x - 1).
• One of the common error detecting codes is the class of Cyclic Redundancy Check (CRC) codes. For a k-bit block of bits, the (n, k) CRC encoder generates an (n - k) bit long Frame Check Sequence (FCS).
• Shift registers can be used to encode and decode cyclic codes easily. Encoding and decoding of cyclic codes require multiplication and division by polynomials, and the shift property of shift registers is ideally suited for such operations.

Everything should be made as simple as possible, but not simpler.
Albert Einstein (1879-1955)

PROBLEMS

4.1 Which of the following codes are (a) cyclic, (b) equivalent to a cyclic code?
(a) {0000, 0110, 1100, 0011, 1001} over GF(2).
(b) {00000, 10110, 01101, 11011} over GF(2).
(c) {00000, 10110, 01101, 11011} over GF(3).
(d) {0000, 1122, 2211} over GF(3).
(e) The q-ary repetition code of length n.
4.2 Construct the addition and multiplication tables for
(a) F[x]/(x^2 + 1) defined over GF(2),
(b) F[x]/(x^2 + 1) defined over GF(3).
Which of the above is a field?
4.3 List out all the irreducible polynomials over
(a) GF(2) of degrees 1 to 5,
(b) GF(3) of degrees 1 to 3.
4.4 Find all the cyclic binary codes of blocklength 5. Find the minimum distance of each code.
4.5 Suppose x^n - 1 is a product of r distinct irreducible polynomials over GF(q). How many cyclic codes of blocklength n over GF(q) exist? Comment on the minimum distance of these codes.
4.6 (a) Factorize x^8 - 1 over GF(3).
(b) How many ternary cyclic codes of length 8 exist?
(c) How many quaternary cyclic codes of length 8 exist?
4.7 Let the polynomial g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1 be the generator polynomial of a cyclic code over GF(2) with blocklength 15.
(a) Find the generator matrix G.
(b) Find the parity check matrix H.
(c) How many errors can this code detect?
(d) How many errors can this code correct?
(e) Write the generator matrix in the systematic form.
4.8 Consider the polynomial g(x) = x^6 + 3x^5 + x^4 + x^3 + 2x^2 + 2x + 1.
(a) Is this a valid generator polynomial for a cyclic code over GF(4) with blocklength 15?
(b) Find the parity check matrix H.
(c) What is the minimum distance of this code?
(d) What is the code rate of this code?
(e) Is the received word v(x) = x^5 + x^4 + 3x^3 + x^2 + 3x + 1 a valid codeword?
4.9 An error vector of the form x^i + x^{i+1} in R_n is called a double adjacent error. Show that the code generated by the generator polynomial g1(x) = (x - 1) g_H(x) is capable of correcting all double adjacent errors, where g_H(x) is the generator polynomial of the binary Hamming code.
4.10 Design the shift register encoder and the Meggitt decoder for the code generated in Problem 4.8.
4.11 The code with the generator polynomial g(x) = (x^23 + 1)(x^17 + x^3 + 1) is used for error detection and correction in the GSM standard.
(i) How many random errors can this code correct?
(ii) How many burst errors can this code correct?

COMPUTER PROBLEMS

4.12 Write a computer program to find the minimum distance of a cyclic code over GF(q), given the generator polynomial (or the generator matrix) for the code.
4.13 Write a computer program to encode and decode a (35, 27) Fire code. It should be able to automatically correct bursts of length 3 or less. What happens when you try to decode a received word with a burst error of length 4?
5
Bose-Chaudhuri Hocquenghem (BCH) Codes

5.1 INTRODUCTION TO BCH CODES

The class of Bose-Chaudhuri Hocquenghem (BCH) codes is one of the most powerful known classes of linear cyclic block codes. BCH codes are known for their multiple error correcting ability and the ease of encoding and decoding. So far, our approach has been to construct a code and then find out its minimum distance in order to estimate its error correcting capability. In this class of codes, we start from the other end: we begin by specifying the number of random errors we desire the code to correct, and then go on to construct the generator polynomial for the code. As mentioned above, BCH codes are a subclass of cyclic codes, and therefore the decoding methodology for any cyclic code also works for BCH codes. However, more efficient decoding procedures are known for BCH codes, and these will be discussed in this chapter.

We begin by building the necessary mathematical tools in the next couple of sections. We shall then look at the method for constructing the generator polynomial for BCH codes. Efficient decoding techniques for this class of codes will be discussed next. An important subset of BCH codes, the Reed-Solomon codes, will be introduced in the later part of this chapter.

5.2 PRIMITIVE ELEMENTS

Definition 5.1 A Primitive Element of GF(q) is an element α such that every field element except zero can be expressed as a power of α.

Example 5.1 Consider GF(5). Since q = 5 is a prime number, modulo arithmetic will work. Consider the element 2:
2^0 = 1 (mod 5), 2^1 = 2 (mod 5), 2^2 = 4 (mod 5), 2^3 = 8 (mod 5) = 3.
Hence, all the non-zero elements of GF(5), i.e., {1, 2, 3, 4}, can be represented as powers of 2. Therefore, 2 is a primitive element of GF(5).
Next, consider the element 3:
3^0 = 1 (mod 5), 3^1 = 3 (mod 5), 3^2 = 9 (mod 5) = 4, 3^3 = 27 (mod 5) = 2.
Again, all the non-zero elements of GF(5) can be represented as powers of 3. Therefore, 3 is also a primitive element of GF(5). However, it can be verified that the other non-zero elements, 1 and 4, are not primitive elements.

We saw in the example that there can be more than one primitive element in a field. But is there a guarantee of finding at least one primitive element? The answer is yes! The non-zero elements of every Galois Field form a cyclic group. Hence, a Galois Field will include an element of order q - 1. This will be the primitive element. Primitive elements are very useful in constructing fields. Once we have a primitive element, we can easily find all the other elements by simply evaluating the powers of the primitive element.

Definition 5.2 A Primitive Polynomial p(x) over GF(q) is a prime polynomial over GF(q) with the property that in the extension field constructed modulo p(x), the field element represented by x is a primitive element.

Primitive polynomials of every degree exist over every Galois Field. A primitive polynomial can be used to construct an extension field.
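The search in Example 5.1 is a one-liner to automate. The following quick sketch (illustrative only) checks, for every non-zero element of GF(5), whether its powers generate all four non-zero field elements.

```python
# Which elements of GF(5) are primitive?

q = 5
for a in range(1, q):
    powers = {pow(a, i, q) for i in range(1, q)}      # a^1, ..., a^{q-1} mod 5
    status = "primitive" if len(powers) == q - 1 else "not primitive"
    print(a, sorted(powers), status)
# 2 and 3 turn out to be primitive; 1 and 4 are not.
```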
Example 5.2 We can construct GF(8) using the primitive polynomial p(x) = x^3 + x + 1. Let the primitive element of GF(8) be α = z. Then, we can represent all the elements of GF(8) by the powers of α evaluated modulo p(x). Thus, we can form Table 5.1.

Table 5.1 The elements of GF(8)

α^1    z
α^2    z^2
α^3    z + 1
α^4    z^2 + z
α^5    z^2 + z + 1
α^6    z^2 + 1
α^7    1

Theorem 5.1 Let β_1, β_2, ..., β_{q-1} denote the non-zero field elements of GF(q). Then,
x^{q-1} - 1 = (x - β_1)(x - β_2) ... (x - β_{q-1}).   (5.1)

Proof The set of non-zero elements of GF(q) is a finite group under the operation of multiplication. Let β be any non-zero element of the field. It can be represented as a power of the primitive element α. Let β = α^r for some integer r. Therefore,
β^{q-1} = (α^r)^{q-1} = (α^{q-1})^r = (1)^r = 1,
because α^{q-1} = 1. Hence, β is a zero of x^{q-1} - 1. This is true for any non-zero element β, which proves the result.

Example 5.3 Consider the field GF(5). The non-zero elements of this field are {1, 2, 3, 4}. Therefore, we can write x^4 - 1 = (x - 1)(x - 2)(x - 3)(x - 4).

5.3 MINIMAL POLYNOMIALS

In the previous chapter we saw that in order to find the generator polynomials for cyclic codes of blocklength n, we have to first factorize x^n - 1. Thus x^n - 1 can be written as the product of its p prime factors:
x^n - 1 = f_1(x) f_2(x) f_3(x) ... f_p(x).   (5.2)
Any combination of these factors can be multiplied together to obtain a generator polynomial g(x). If the prime factors of x^n - 1 are distinct, then there are (2^p - 2) different non-trivial cyclic codes of blocklength n. The two trivial cases that are being disregarded are g(x) = 1 and g(x) = x^n - 1. Not all of the (2^p - 2) possible cyclic codes are good codes in terms of their minimum distance. We now evolve a strategy for finding good codes, i.e., codes of desirable minimum distance. In the previous chapter we learnt how to construct an extension field from the subfield. In this section we will study the prime polynomials (in a certain field) that have zeros in the extension field. Our strategy for constructing g(x) will be as follows: using the desirable zeros in the extension field, we will find prime polynomials in the subfield, which will be multiplied together to yield a desirable g(x).

Definition 5.3 A blocklength n of the form n = q^m - 1 is called a Primitive Blocklength for a code over GF(q). A cyclic code over GF(q) of primitive blocklength is called a Primitive Cyclic Code.

The field GF(q^m) is an extension field of GF(q). Let the primitive blocklength be n = q^m - 1. Consider the factorization
x^n - 1 = x^{q^m - 1} - 1 = f_1(x) f_2(x) ... f_p(x)   (5.3)
over the field GF(q). This factorization is also valid over the extension field GF(q^m), because the addition and multiplication tables of the subfield form a part of the tables of the extension field. We also know that g(x) divides x^n - 1, i.e., x^{q^m - 1} - 1; hence g(x) must be the product of some of these polynomials f_i(x). Also, every non-zero element of GF(q^m) is a zero of x^{q^m - 1} - 1. Hence, it is possible to factor x^{q^m - 1} - 1 in the extension field GF(q^m) to get
x^{q^m - 1} - 1 = prod over j of (x - β_j),   (5.4)
where β_j ranges over all the non-zero elements of GF(q^m). This implies that each of the polynomials f_i(x) can be represented in GF(q^m) as a product of some of these linear terms, and each β_j is a zero of exactly one of the f_i(x). This f_i(x) is called the minimal polynomial of β_j.

Definition 5.4 The smallest degree polynomial with coefficients in the base field GF(q) that has a given element β of the extension field GF(q^m) as a zero is called the Minimal Polynomial of β.
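Table 5.1 can be regenerated mechanically. The sketch below is illustrative (it stores field elements as 3-bit integers, a representation of my choosing): starting from α^0 = 1, it repeatedly multiplies by α = z and reduces modulo p(z) = z^3 + z + 1.

```python
# Generate the powers of alpha in GF(8) built from p(z) = z^3 + z + 1.
# Bit i of an element holds the coefficient of z^i.

p = 0b1011                      # z^3 + z + 1
m = 3

def times_z(e):
    e <<= 1                     # multiply by z
    if e >> m:                  # the z^3 term appeared: subtract p(z)
        e ^= p
    return e

e = 1                           # alpha^0
for i in range(1, 8):
    e = times_z(e)
    print(f"alpha^{i} = {e:03b}")
# alpha^1 = 010 (z), alpha^2 = 100 (z^2), alpha^3 = 011 (z + 1), ..., alpha^7 = 001 (1)
```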
This j;(x) is called the Minimal Polynomial of f3t Definition 5..4 The smallest degree polynomial with coefficients in the base field GF(q) that has a zero in the extension field GF(tj) is called the Minima) Polynomial of ·. I
  • 74. I li Information Theory, Coding and Cryptography Example 5.4 Consider the subfield GF(2) and its extension field GF(8). Here q = 2 and m = 3. The factorization ofrl -1 (in the subfield/extension field) yields rr-l- 1 = x7 - 1 =(X- 1) HセKxK@ 1) Hセ@ + セ@ + 1). Next, consider the elements ofthe extension field GF(8). The elements can be represented as 0, 1, z, z + 1, z?, i + 1, i + z, i + z +, 1 (from Example 4.10 of Chapter 4). Therefore, we can write rr-1 -1 = x1 -1 =(x-1)(x-z)(x-z-1)(x-i)Cx-i-1)(x-i-z)(x-i-z-1) = (x- 1) · [(x- z)(x- C) (x- z?- z)] · [(x- z- 1)(x -z!-l)(x- z?- z- 1)]. It can be seen that over GF(8), Hセ@ + x + 1) = (x - z)(x - C)(x - i - z), and Hセ@ + セ@ + 1) = (x- z- 1)(x- z 2 - 1)(x- i- z- 1). The multiplication and addition are carried out over GF(8). Interestingly, after a little bit of algebra it is found that the coefficients of the minimal polynomial belong to GF (2) only. We can now make Table 5.2. Table 5.2 The Elements of GF(B) in Terms of the Powers of the Primitive Element a Minimal polynomial Corresponding Elements Elements in Terms f,(x) [31 in GF(8) of powers of a (x- I) (x1 + x+ I) (x1 +;+I) I z. i and i + z z+ I, i + I and i + z+ I ao a 1 ,a2 ,a4 a3, a6, as (= a!2) It is interesting to note the elements (in terms of powers of the primitive element a) that correspond to the same minimal polynomial. If we make the observation that a12 = d ·d = 1· a5 , we see a pattern in the elements that correspond to a certain minimal polynomial. In fact, the elements that are roots of a minimal polynomial in the extension field are of the type f3qr-l where f3 is an element of the extension field. In the above example, the zeros of the minimal polynomialf2(x) =; + x+ 1 are a 1 , a2 and a4 and that ofh(x) =; + ; + 1 are if, cf and d 2 . Definition 5.5 Two elements of GF(tj) that share the same minimal polynomial over GF(q) are called Conjugates with respect to GF(q). . Example 5.5 The elements { a 1 , a 2 , a4 } are conjugates with respect to GF(2). They share the same minimal polynomialf2(x) = セ@ + x + 1. Bose-Chaudhuri Hocquenghem (BCH) Codes As we have seen, a single element in the extension field may have more than one conjugate. The conjugacy relationship between two elements depends on the base field. For example, the extension field GF(16) can be constructed using eitherGF(2) orGF(4). Two elements that are conjugates of GF(2) may not be conjugates ofGF(4). Iff(x) is the minimal polynomial of/3, then it is also the minimal polynomial ofthe elements in the set {/3, f3q, f3q 2 ,...,f3 qr-J }, where r is the smallest integer such that f3qr-l = /3. The set {/3, f3q, f3q 2 ,•.• ,{Jqr- 1 } is called the Set of Conjugates. The elements in the set of conjugates are all the zeros off(x). Hence, the minimal polynomial offJ can be written as f(x) = (x- fJ)(x- fJq)(x- pi") ... (x- pi- 1 ). (5.5) Example 5.6 Consider GF(256) as an extension field ofGF(2). Let abe the primitive elementof GF(256). Then a set of conjugates would be {at, c?, a4, as, al6, a32, a64, al28} Note that cl-56 = a255 d = d, hence the set of conjugates terminates with d 28. The minimal polynomial of a is f(x) = (x- a 1 )(x- a 2 )(x- 。セHクM a 8 )(x- 。 Q セHクM a 32)(x- a 64 )(x- 。 Q セ@ The right hand side of the equation when multiplied out would only contain coefficients from GF(2). 
Similarly, the minimal polynomial of α^3 would be
f(x) = (x - α^3)(x - α^6)(x - α^12)(x - α^24)(x - α^48)(x - α^96)(x - α^192)(x - α^129).

Definition 5.6 BCH codes defined over GF(q) with blocklength q^m - 1 are called Primitive BCH Codes.

Having developed the necessary mathematical tools, we shall now begin our study of BCH codes. We will develop a method for constructing the generator polynomials of BCH codes that can correct a pre-specified number t of random errors.

5.4 GENERATOR POLYNOMIALS IN TERMS OF MINIMAL POLYNOMIALS

We know that g(x) is a factor of x^n - 1. Therefore, the generator polynomial of a cyclic code can be written in the form
g(x) = LCM [f_1(x), f_2(x), ..., f_p(x)],   (5.6)
where f_1(x), f_2(x), ..., f_p(x) are the minimal polynomials of the zeros of g(x). Each minimal polynomial corresponds to a zero of g(x) in an extension field. We will design good codes (i.e., determine the generator polynomials) with desirable zeros using this approach.
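As a concrete check of this strategy, the sketch below multiplies out (x - α)(x - α^2)(x - α^4) in GF(8) — the conjugate set of α from Table 5.2 — and confirms, as equation (5.5) promises, that the product collapses to x^3 + x + 1 with coefficients in GF(2). The GF(8) arithmetic is built exactly as in the earlier sketch; this is an illustration under those assumptions, not a general minimal-polynomial routine.

```python
# Verify that (x - alpha)(x - alpha^2)(x - alpha^4) = x^3 + x + 1 over GF(8).
P = 0b1011                       # primitive polynomial x^3 + x + 1

exp = [1] * 7                    # exp[i] = alpha^i as a 3-bit integer
for i in range(1, 7):
    e = exp[i - 1] << 1
    exp[i] = e ^ P if e & 0b1000 else e
log = {exp[i]: i for i in range(7)}

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else exp[(log[a] + log[b]) % 7]

def poly_mul(f, g):
    """Multiply polynomials with GF(8) coefficients (lists, lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] ^= gf_mul(fi, gj)   # addition in GF(8) is XOR
    return out

f = [1]
for k in (1, 2, 4):              # conjugate exponents of alpha
    f = poly_mul(f, [exp[k], 1]) # (x - alpha^k) = (x + alpha^k) in characteristic 2
print(f)                         # [1, 1, 0, 1], i.e. 1 + x + x^3
```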
Let c(x) be a codeword polynomial and e(x) be an error polynomial. Then the received polynomial can be written as
v(x) = c(x) + e(x),   (5.7)
where the polynomial coefficients are in GF(q). Now consider the extension field GF(q^m). Let γ_1, γ_2, ..., γ_p be those elements of GF(q^m) which are the zeros of g(x), i.e., g(γ_i) = 0 for i = 1, ..., p. Since c(x) = a(x)g(x) for some polynomial a(x), we also have c(γ_i) = 0 for i = 1, ..., p. Thus,
v(γ_i) = c(γ_i) + e(γ_i) = e(γ_i) for i = 1, ..., p.   (5.8)
For a blocklength n, we have
v(γ_i) = Σ_(j=0)^(n-1) e_j γ_i^j for i = 1, ..., p.   (5.9)
Thus, we have a set of p equations that involve components of the error pattern only. If it is possible to solve this set of equations for the e_j, the error pattern can be precisely determined. Whether this set of equations can be solved depends on the value of p, the number of zeros of g(x). In order to solve for the error pattern, we must choose the set of p equations properly. If we have to design a t error correcting cyclic code, our choice should be such that the set of equations can solve for at most t non-zero e_j.

Let us define the syndromes S_i = e(γ_i) for i = 1, ..., p. We wish to choose γ_1, γ_2, ..., γ_p in such a manner that t errors can be computed from S_1, S_2, ..., S_p. If α is a primitive element, then a set of γ_i which allows the correction of t errors is {α^1, α^2, α^3, ..., α^(2t)}. Thus, we have a simple mechanism for determining the generator polynomial of a BCH code that can correct t errors.

Steps for Determining the Generator Polynomial of a t-error Correcting BCH Code. For a primitive blocklength n = q^m - 1:
(i) Choose a prime polynomial of degree m and construct GF(q^m).
(ii) Find f_i(x), the minimal polynomial of α^i, for i = 1, ..., 2t.
(iii) The generator polynomial for the t error correcting code is simply
g(x) = LCM [f_1(x), f_2(x), ..., f_2t(x)].   (5.10)

Codes designed in this manner can correct at least t errors. In many cases the codes will be able to correct more than t errors. For this reason,
d = 2t + 1   (5.11)
is called the Designed Distance of the code, and the minimum distance d* ≥ 2t + 1. The generator polynomial has a degree equal to n - k (see Theorem 4.4, Chapter 4). It should be noted that once we fix n and t, we can determine the generator polynomial for the BCH code. The information length k is then decided by the degree of g(x). Intuitively, for a fixed blocklength n, a larger value of t will force the information length k to be smaller (because a higher redundancy will be required to correct a larger number of errors). In the following section, we look at a few specific examples of BCH codes.

5.5 SOME EXAMPLES OF BCH CODES

The following example illustrates the construction of the extension field GF(16) from GF(2). The minimal polynomials obtained will be used in the subsequent examples.

Example 5.7 Consider the primitive polynomial p(z) = z^4 + z + 1 over GF(2). We shall use this to construct the extension field GF(16). Let α = z be the primitive element. The elements of GF(16) as powers of α and the corresponding minimal polynomials are listed in Table 5.3.

Table 5.3 The elements of GF(16) and the corresponding minimal polynomials
Power of α | Element of GF(16) | Minimal polynomial
α^1 | z | x^4 + x + 1
α^2 | z^2 | x^4 + x + 1
α^3 | z^3 | x^4 + x^3 + x^2 + x + 1
α^4 | z + 1 | x^4 + x + 1
α^5 | z^2 + z | x^2 + x + 1
α^6 | z^3 + z^2 | x^4 + x^3 + x^2 + x + 1
α^7 | z^3 + z + 1 | x^4 + x^3 + 1
α^8 | z^2 + 1 | x^4 + x + 1
α^9 | z^3 + z | x^4 + x^3 + x^2 + x + 1
α^10 | z^2 + z + 1 | x^2 + x + 1
α^11 | z^3 + z^2 + z | x^4 + x^3 + 1
α^12 | z^3 + z^2 + z + 1 | x^4 + x^3 + x^2 + x + 1
α^13 | z^3 + z^2 + 1 | x^4 + x^3 + 1
α^14 | z^3 + 1 | x^4 + x^3 + 1
α^15 | 1 | x + 1
Example 5.8 We wish to determine the generator polynomial of a single error correcting BCH code, i.e., t = 1, with a blocklength n = 15. From (5.10), the generator polynomial for a BCH code is given by LCM [f_1(x), f_2(x), ..., f_2t(x)]. We will make use of Table 5.3 to obtain the minimal polynomials f_1(x) and f_2(x). Thus, the generator polynomial of the single error correcting BCH code will be
g(x) = LCM [f_1(x), f_2(x)] = LCM [(x^4 + x + 1), (x^4 + x + 1)] = x^4 + x + 1.
Since deg(g(x)) = n - k, we have n - k = 4, which gives k = 11. Thus we have obtained the generator polynomial of the BCH (15, 11) single error correcting code. The designed distance of this code is d = 2t + 1 = 3. It can be calculated that the minimum distance d* of this code is also 3. Thus, in this case the designed distance is equal to the minimum distance.
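Because the minimal polynomials involved are distinct primes, the LCM in these constructions reduces to a plain product of binary polynomials. The sketch below (ours; polynomials are stored as integer bit masks, bit i = coefficient of x^i) reproduces this generator polynomial and the ones for the double and triple error correcting codes derived next, using the minimal polynomials of Table 5.3.

```python
def mul_gf2(a, b):
    """Multiply two binary polynomials given as bit masks."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

# Minimal polynomials over GF(2) for GF(16), from Table 5.3:
f1 = 0b10011     # x^4 + x + 1              (alpha^1, alpha^2, alpha^4, alpha^8)
f3 = 0b11111     # x^4 + x^3 + x^2 + x + 1  (alpha^3, alpha^6, alpha^12, alpha^9)
f5 = 0b111       # x^2 + x + 1              (alpha^5, alpha^10)

g_t1 = f1                                  # t = 1: BCH (15, 11)
g_t2 = mul_gf2(f1, f3)                     # t = 2: BCH (15, 7)
g_t3 = mul_gf2(mul_gf2(f1, f3), f5)        # t = 3: BCH (15, 5)

print(bin(g_t1))   # 0b10011        -> x^4 + x + 1
print(bin(g_t2))   # 0b111010001    -> x^8 + x^7 + x^6 + x^4 + 1
print(bin(g_t3))   # 0b10100110111  -> x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
```

The last two printed coefficient strings also agree with the (15, 7) and (15, 5) rows of Table 5.5 later in this section.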
Next, we wish to determine the generator polynomial of a double error correcting BCH code, i.e., t = 2, with a blocklength n = 15. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x)]
= LCM [(x^4 + x + 1), (x^4 + x + 1), (x^4 + x^3 + x^2 + x + 1), (x^4 + x + 1)]
= (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)
= x^8 + x^7 + x^6 + x^4 + 1.
Since deg(g(x)) = n - k, we have n - k = 8, which gives k = 7. Thus, we have obtained the generator polynomial of the BCH (15, 7) double error correcting code. The designed distance of this code is d = 2t + 1 = 5. It can be calculated that the minimum distance d* of this code is also 5. Thus, in this case the designed distance is equal to the minimum distance.

Next, we determine the generator polynomial for the triple error correcting binary BCH code. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x)]
= (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)
= x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
In this case, deg(g(x)) = n - k = 10, which gives k = 5. Thus we have obtained the generator polynomial of the BCH (15, 5) triple error correcting code. The designed distance of this code is d = 2t + 1 = 7. It can be calculated that the minimum distance d* of this code is also 7. Thus, in this case the designed distance is equal to the minimum distance.

Next, we determine the generator polynomial for a binary BCH code for the case t = 4. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x), f_7(x), f_8(x)]
= (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x^4 + x^3 + 1)
= x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1.
In this case, deg(g(x)) = n - k = 14, which gives k = 1. It can be seen that this is the simple repetition code. The designed distance of this code is d = 2t + 1 = 9. However, it can be seen that the minimum distance d* of this code is 15. Thus in this case the designed distance is not equal to the minimum distance, and the code is over-designed. This code can actually correct (d* - 1)/2 = 7 random errors! If we repeat the exercise for t = 5, 6 or 7, we get the same generator polynomial (repetition code). Note that there are only 15 non-zero field elements in GF(16) and hence there are only 15 minimal polynomials corresponding to these field elements. Thus, we cannot go beyond t = 7 (because for t = 8 we need f_16(x), which is undefined). Hence, to obtain BCH codes that can correct a larger number of errors we must use an extension field with more elements!

Example 5.9 We can construct GF(16) as an extension field of GF(4) using the primitive polynomial p(z) = z^2 + z + 2 over GF(4), with α = z as the primitive element. Let the elements of GF(4) consist of the quaternary symbols contained in the set {0, 1, 2, 3}. The addition and multiplication tables for GF(4) are given below for handy reference.

GF(4) addition:
+ | 0 1 2 3
0 | 0 1 2 3
1 | 1 0 3 2
2 | 2 3 0 1
3 | 3 2 1 0

GF(4) multiplication:
× | 0 1 2 3
0 | 0 0 0 0
1 | 0 1 2 3
2 | 0 2 3 1
3 | 0 3 1 2

Table 5.4 lists the elements of GF(16) as powers of α and the corresponding minimal polynomials.

Table 5.4 Powers of α, the elements of GF(16), and the minimal polynomials
Power of α | Element of GF(16) | Minimal polynomial
α^1 | z | x^2 + x + 2
α^2 | z + 2 | x^2 + x + 3
α^3 | 3z + 2 | x^2 + 3x + 1
α^4 | z + 1 | x^2 + x + 2
α^5 | 2 | x + 2
α^6 | 2z | x^2 + 2x + 1
α^7 | 2z + 3 | x^2 + 2x + 2
α^8 | z + 3 | x^2 + x + 3
α^9 | 2z + 2 | x^2 + 2x + 1
α^10 | 3 | x + 3
α^11 | 3z | x^2 + 3x + 3
α^12 | 3z + 1 | x^2 + 3x + 1
α^13 | 2z + 1 | x^2 + 2x + 2
α^14 | 3z + 3 | x^2 + 3x + 3
α^15 | 1 | x + 1

For t = 1,
g(x) = LCM [f_1(x), f_2(x)] = LCM [(x^2 + x + 2), (x^2 + x + 3)] = (x^2 + x + 2)(x^2 + x + 3) = x^4 + x + 1.
Since deg(g(x)) = n - k, we have n - k = 4, which gives k = 11. Thus we have obtained the generator polynomial of the single error correcting BCH (15, 11) code over GF(4). It takes in 11 quaternary information symbols and encodes them into 15 quaternary symbols.
Note that one quaternary symbol is equivalent to two bits. So, in effect, the BCH (15, 11) code over GF(4) takes in 22 input bits and transforms them into 30 encoded bits (can this code be used to correct a burst of length 2 in a binary sequence of length 30?). The designed distance of this code is d = 2t + 1 = 3. It can be calculated that the minimum distance d* of this code is also 3. Thus in this case the designed distance is equal to the minimum distance.
For t = 2,
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x)]
= LCM [(x^2 + x + 2), (x^2 + x + 3), (x^2 + 3x + 1), (x^2 + x + 2)]
= (x^2 + x + 2)(x^2 + x + 3)(x^2 + 3x + 1)
= x^6 + 3x^5 + x^4 + x^3 + 2x^2 + 2x + 1.
This is the generator polynomial of a (15, 9) double error correcting BCH code over GF(4).

For t = 3,
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x)]
= LCM [(x^2 + x + 2), (x^2 + x + 3), (x^2 + 3x + 1), (x^2 + x + 2), (x + 2), (x^2 + 2x + 1)]
= (x^2 + x + 2)(x^2 + x + 3)(x^2 + 3x + 1)(x + 2)(x^2 + 2x + 1)
= x^9 + 3x^8 + 3x^7 + 2x^6 + x^5 + 2x^4 + x + 2.
This is the generator polynomial of a (15, 6) triple error correcting BCH code over GF(4).

Similarly, for t = 4, g(x) = x^11 + x^10 + 2x^8 + 3x^7 + 3x^6 + x^5 + 3x^4 + x^3 + x + 3. This is the generator polynomial of a (15, 4) four error correcting BCH code over GF(4).

Similarly, for t = 5, g(x) = x^12 + 2x^11 + 3x^10 + 2x^9 + 2x^8 + x^7 + 3x^6 + 3x^4 + 3x^3 + x^2 + 2. This is the generator polynomial of a (15, 3) five error correcting BCH code over GF(4).

Similarly, for t = 6, g(x) = x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1. This is the generator polynomial of a (15, 1) six error correcting BCH code over GF(4). As is obvious, this is the simple repetition code, and it can correct up to 7 errors.

Table 5.5 lists the generator polynomials of binary BCH codes of length up to 2^5 - 1. Suppose we wish to construct the generator polynomial of the BCH (15, 7) code. From the table we have (111 010 001) for the coefficients of the generator polynomial. Therefore, g(x) = x^8 + x^7 + x^6 + x^4 + 1.

Table 5.5 The generator polynomials of binary BCH codes of length up to 2^5 - 1
n | k | t | Generator polynomial coefficients
7 | 4 | 1 | 1 011
15 | 11 | 1 | 10 011
15 | 7 | 2 | 111 010 001
15 | 5 | 3 | 10 100 110 111
31 | 26 | 1 | 100 101
31 | 21 | 2 | 11 101 101 001
31 | 16 | 3 | 1 000 111 110 101 111
31 | 11 | 5 | 101 100 010 011 011 010 101
31 | 6 | 7 | 11 001 011 011 110 101 000 100 111

5.6 DECODING OF BCH CODES

So far we have learnt to obtain the generator polynomial for a BCH code given the number of random errors to be corrected. With the knowledge of the generator polynomial, very fast encoders can be built in hardware. We now shift our attention to the decoding of BCH codes. Since the BCH codes are a subclass of the cyclic codes, any standard decoding procedure for cyclic codes is also applicable to BCH codes. However, better, more efficient algorithms have been designed specifically for BCH codes. We discuss the Gorenstein-Zierler decoding algorithm, which is the generalized form of the binary decoding algorithm first proposed by Peterson.

We develop here the decoding algorithm for a t error correcting BCH code. Suppose a BCH code is constructed based on the field element α. Consider the error polynomial
e(x) = e_(n-1) x^(n-1) + e_(n-2) x^(n-2) + ... + e_1 x + e_0,   (5.12)
where at most t coefficients are non-zero. Suppose that ν errors actually occur, where 0 ≤ ν ≤ t. Let these errors occur at locations i_1, i_2, ..., i_ν. The error polynomial can then be written as
e(x) = e_(i_1) x^(i_1) + e_(i_2) x^(i_2) + ... + e_(i_ν) x^(i_ν),   (5.13)
where e_(i_k) is the magnitude of the k-th error. Note that we are considering the general case; for binary codes, e_(i_k) = 1. For error correction, we must know two things: (i) where the errors have occurred, i.e., the error locations, and (ii) what the magnitudes of these errors are. Thus, the unknowns are i_1, i_2, ..., i_ν and e_(i_1), e_(i_2), ..., e_(i_ν), which signify the locations and the magnitudes of the errors respectively.
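All of the decoding computations that follow, including the syndrome evaluations of Example 5.10 below, are carried out in GF(16) built from p(z) = z^4 + z + 1 (Table 5.3). A convenient software representation, sketched here under that assumption, is a pair of exponential/logarithm tables: addition is bitwise XOR of the 4-bit representations, and multiplication becomes addition of exponents modulo 15.

```python
# GF(16) arithmetic via exp/log tables, p(z) = z^4 + z + 1.
P16 = 0b10011
exp16 = [1] * 15
for i in range(1, 15):
    e = exp16[i - 1] << 1
    exp16[i] = e ^ P16 if e & 0b10000 else e
log16 = {exp16[i]: i for i in range(15)}

def add16(a, b):      # addition in GF(16) is XOR
    return a ^ b

def mul16(a, b):      # multiplication via logarithms
    return 0 if 0 in (a, b) else exp16[(log16[a] + log16[b]) % 15]

# For instance, alpha^5 + alpha^3 = alpha^11, the value of S1 in Example 5.10:
print(log16[add16(exp16[5], exp16[3])])   # 11
```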
The syndromes can be obtained by evaluating the received polynomial at α:
S_1 = v(α) = c(α) + e(α) = e(α) = e_(i_1) α^(i_1) + e_(i_2) α^(i_2) + ... + e_(i_ν) α^(i_ν).   (5.14)
Next, define the error magnitudes Y_k = e_(i_k) for k = 1, 2, ..., ν, and the error locations X_k = α^(i_k) for k = 1, 2, ..., ν, where i_k is the location of the k-th error and X_k is the field element associated with this location. Now, the syndrome can be written as
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_ν X_ν.   (5.15)
We can evaluate the received polynomial at each of the powers of α that has been used to define g(x). We define the syndromes for j = 1, 2, ..., 2t by
S_j = v(α^j) = c(α^j) + e(α^j) = e(α^j).   (5.16)
Thus, we have the following set of 2t simultaneous equations, with ν unknown error locations X_1, X_2, ..., X_ν and ν unknown error magnitudes Y_1, Y_2, ..., Y_ν:
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_ν X_ν
S_2 = Y_1 X_1^2 + Y_2 X_2^2 + ... + Y_ν X_ν^2
...
S_2t = Y_1 X_1^(2t) + Y_2 X_2^(2t) + ... + Y_ν X_ν^(2t).   (5.17)
Next, define the error locator polynomial
Λ(x) = Λ_ν x^ν + Λ_(ν-1) x^(ν-1) + ... + Λ_1 x + 1.   (5.18)
The zeros of this polynomial are the inverse error locations X_k^(-1) for k = 1, 2, ..., ν. That is,
Λ(x) = (1 - x X_1)(1 - x X_2) ... (1 - x X_ν).   (5.19)
So, if we know the coefficients of the error locator polynomial Λ(x), we can obtain the error locations X_1, X_2, ..., X_ν. After some algebraic manipulations we obtain
Λ_1 S_(j+ν-1) + Λ_2 S_(j+ν-2) + ... + Λ_ν S_j = -S_(j+ν) for j = 1, 2, ..., ν.   (5.20)
This is nothing but a set of linear equations that relate the syndromes to the coefficients of Λ(x). This set of equations can be written in matrix form as follows:

[ S_1   S_2      ...  S_(ν-1)   S_ν     ] [ Λ_ν     ]   [ -S_(ν+1) ]
[ S_2   S_3      ...  S_ν       S_(ν+1) ] [ Λ_(ν-1) ] = [ -S_(ν+2) ]
[  :     :             :         :      ] [   :     ]   [    :     ]
[ S_ν   S_(ν+1)  ...  S_(2ν-2)  S_(2ν-1)] [ Λ_1     ]   [ -S_(2ν)  ]   (5.21)

The values of the coefficients of the error locator polynomial can be determined by inverting the syndrome matrix. This is possible only if the matrix is non-singular. It can be shown that this matrix is non-singular if there are ν errors.

Steps for Decoding BCH Codes
(i) As a trial value, set ν = t and compute the determinant of the matrix of syndromes, M. If the determinant is zero, set ν = t - 1 and again compute the determinant of M. Repeat this process until a value of ν is found for which the determinant of the matrix of syndromes is non-zero. This value of ν is the actual number of errors that occurred.
(ii) Invert the matrix M and find the coefficients of the error locator polynomial Λ(x).
(iii) Solve Λ(x) = 0 to obtain the zeros and from them compute the error locations X_1, X_2, ..., X_ν. If it is a binary code, stop (because the magnitudes of the errors are unity).
(iv) If the code is not binary, go back to the system of equations:
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_ν X_ν
S_2 = Y_1 X_1^2 + Y_2 X_2^2 + ... + Y_ν X_ν^2
...
S_2t = Y_1 X_1^(2t) + Y_2 X_2^(2t) + ... + Y_ν X_ν^(2t).
Since the error locations are now known, these form a set of 2t linear equations. These can be solved to obtain the error magnitudes.

Solving for the Λ_i by inverting the ν × ν matrix can be computationally expensive. The number of computations required will be proportional to ν^3. If we need to correct a large number of errors (i.e., a large ν), we need more efficient ways to solve the matrix equation. Various refinements have been found which greatly reduce the computational complexity. It can be seen that the ν × ν matrix is not arbitrary in form: the entries in each diagonal perpendicular to the main diagonal are all identical. This property is called persymmetry. This structure was exploited by Berlekamp (1968) and Massey (1969) to find a simpler solution to the system of equations. The simplest way to search for the zeros of Λ(x) is to test all the field elements one by one. This method of exhaustive search is known as the Chien search.

Example 5.10 Consider the BCH (15, 5) triple error correcting code with the generator polynomial
g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
Let the all-zero codeword be transmitted and the received polynomial be v(x) = x^5 + x^3.
Thus, there are two errors, at locations 5 and 3, and the error polynomial is e(x) = x^5 + x^3. But the decoder does not know this. It does not even know how many errors have actually occurred. We use the Gorenstein-Zierler decoding algorithm. First we compute the syndromes using the arithmetic of GF(16):
S_1 = α^5 + α^3 = α^11
S_2 = α^10 + α^6 = α^7
S_3 = α^15 + α^9 = α^7
S_4 = α^20 + α^12 = α^14
S_5 = α^25 + α^15 = α^5
S_6 = α^30 + α^18 = α^14
First set ν = t = 3, since this is a triple error correcting code. Det(M) = 0, which implies that fewer than 3 errors have occurred. Next, set ν = 2.
Det(M) ≠ 0, which implies that 2 errors have actually occurred. We next calculate M^(-1) and solve for Λ_1 and Λ_2, obtaining Λ_2 = α^8 and Λ_1 = α^11. Thus,
Λ(x) = α^8 x^2 + α^11 x + 1 = (α^5 x + 1)(α^3 x + 1).
Thus, the recovered error locations are α^5 and α^3. Since the code is binary, the error magnitudes are 1. Thus, e(x) = x^5 + x^3.

In the next section, we will study the famous Reed-Solomon codes, an important subclass of BCH codes.

5.7 REED-SOLOMON CODES

Reed-Solomon (RS) codes are an important subclass of the non-binary BCH codes with a wide range of applications in digital communications and data storage. The typical application areas of RS codes are:
• Storage devices (including tape, Compact Disc, DVD, barcodes, etc.),
• Wireless or mobile communication (including cellular telephones, microwave links, etc.),
• Satellite communication,
• Digital television / Digital Video Broadcast (DVB),
• High-speed modems such as those employing ADSL, xDSL, etc.

It all began with a five-page paper that appeared in 1960 in the Journal of the Society for Industrial and Applied Mathematics. The paper, "Polynomial Codes over Certain Finite Fields" by Irving S. Reed and Gustave Solomon of MIT's Lincoln Laboratory, introduced the ideas that form a significant portion of current error correcting techniques for everything from computer hard disk drives to CD players. Reed-Solomon codes (plus a lot of engineering wizardry, of course) made possible the stunning pictures of the outer planets sent back by Voyager II. They make it possible to scratch a compact disc and still enjoy the music. And in the not-too-distant future, they will enable the profit mongers of cable television to squeeze more than 500 channels into their systems.

An RS coding system is based on groups of bits, such as bytes, rather than individual 0s and 1s, making it particularly good at dealing with bursts of errors: six consecutive bit errors, for example, can affect at most two bytes. Thus, even a double-error-correcting version of a Reed-Solomon code can provide a comfortable safety factor. Current implementations of Reed-Solomon codes in CD technology are able to cope with error bursts as long as 4000 consecutive bits.

In this subclass of BCH codes, the symbol field GF(q) and the error locator field GF(q^m) are the same, i.e., m = 1. Thus, in this case
n = q^m - 1 = q - 1.   (5.22)
The minimal polynomial of any element β in the same field GF(q) is
f_β(x) = x - β.   (5.23)
Since the symbol field (subfield) and the error locator field (extension field) are the same, all the minimal polynomials are linear. The generator polynomial for a t error correcting code will be simply
g(x) = LCM [f_1(x), f_2(x), ..., f_2t(x)] = (x - α)(x - α^2) ... (x - α^(2t-1))(x - α^(2t)).   (5.24)
Hence, the degree of the generator polynomial will always be 2t. Thus, the RS code satisfies
n - k = 2t.   (5.25)
In general, the generator polynomial of an RS code can be written as
g(x) = (x - α^j)(x - α^(j+1)) ... (x - α^(j+2t-2))(x - α^(j+2t-1)).   (5.26)

Example 5.11 Consider the double error correcting RS code of blocklength 15 over GF(16). Here t = 2. We use here the elements of the extension field GF(16) constructed from GF(2) using the primitive polynomial p(z) = z^4 + z + 1.
The generator polynomial can be written as
g(x) = (x - α)(x - α^2)(x - α^3)(x - α^4)
= x^4 + (z^3 + z^2 + 1) x^3 + (z^3 + z^2) x^2 + z^3 x + (z^2 + z + 1)
= x^4 + α^13 x^3 + α^6 x^2 + α^3 x + α^10.
Here n - k = 4, which implies k = 11. Thus, we have obtained the generator polynomial of an RS (15, 11) code over GF(16). Note that this coding procedure takes in 11 symbols (equivalent to 4 × 11 = 44 bits) and encodes them into 15 symbols (equivalent to 60 bits).

Theorem 5.2 A Reed-Solomon code is a Maximum Distance Separable (MDS) code and its minimum distance is n - k + 1.

Proof: Let the designed distance of the RS code be d = 2t + 1. The minimum distance d* satisfies the condition
d* ≥ d = 2t + 1.
But, for an RS code, 2t = n - k. Hence,
d* ≥ n - k + 1.
But, by the Singleton bound, for any linear code
d* ≤ n - k + 1.
Thus, d* = n - k + 1, and the minimum distance d* equals d, the designed distance of the code.

Since RS codes are maximum distance separable (MDS), all of the possible codewords are as far apart as possible algebraically in the code space. It implies a uniform codeword distribution in the code space. Table 5.6 lists the parameters of some RS codes. Note that for a given minimum distance, in order to have a high code rate, one must work with larger Galois fields.

Table 5.6 Some RS code parameters
m | q = 2^m | n = q - 1 | t | k | d | r = k/n
2 | 4 | 3 | 1 | 1 | 3 | 0.3333
3 | 8 | 7 | 1 | 5 | 3 | 0.7143
3 | 8 | 7 | 2 | 3 | 5 | 0.4286
3 | 8 | 7 | 3 | 1 | 7 | 0.1429
4 | 16 | 15 | 1 | 13 | 3 | 0.8667
4 | 16 | 15 | 2 | 11 | 5 | 0.7333
4 | 16 | 15 | 3 | 9 | 7 | 0.6000
4 | 16 | 15 | 4 | 7 | 9 | 0.4667
4 | 16 | 15 | 5 | 5 | 11 | 0.3333
4 | 16 | 15 | 6 | 3 | 13 | 0.2000
4 | 16 | 15 | 7 | 1 | 15 | 0.0667
5 | 32 | 31 | 1 | 29 | 3 | 0.9355
5 | 32 | 31 | 5 | 21 | 11 | 0.6774
5 | 32 | 31 | 8 | 15 | 17 | 0.4839
8 | 256 | 255 | 5 | 245 | 11 | 0.9608
8 | 256 | 255 | 15 | 225 | 31 | 0.8824
8 | 256 | 255 | 50 | 155 | 101 | 0.6078

Example 5.12 A popular Reed-Solomon code is RS(255, 223) with 8-bit symbols (bytes), i.e., over GF(256). Each codeword contains 255 codeword bytes, of which 223 bytes are data and 32 bytes are parity. For this code, n = 255, k = 223 and n - k = 32. Hence, 2t = 32, or t = 16. Thus, the decoder can correct any 16-symbol random error pattern in the codeword, i.e., errors in up to 16 bytes anywhere in the codeword can be corrected.

Example 5.13 Reed-Solomon error correction codes have an extremely pronounced effect on the efficiency of a digital communication channel. For example, an operation running at a data rate of 1 million bytes per second will carry approximately 4000 blocks of 255 bytes each second. If 1000 random short errors (less than 17 bits in length) per second are injected into the channel, about 600 to 800 blocks per second would be corrupted, which might require retransmission of nearly all of the blocks. By applying the Reed-Solomon (255, 235) code (that corrects up to 10 errors per block of 235 information bytes and 20 parity bytes), the typical time between blocks that cannot be corrected and would require retransmission will be about 800 years. The mean time between incorrectly decoded blocks will be over 20 billion years!

5.8 IMPLEMENTATION OF REED-SOLOMON ENCODERS AND DECODERS

Hardware Implementation
A number of commercial hardware implementations exist for RS codes. Many existing systems use off-the-shelf integrated circuits that encode and decode Reed-Solomon codes. These ICs tend to support a certain amount of programmability, for example, RS(255, k) where t = 1 to 16 symbols. The recent trend has been towards VHDL or Verilog designs (logic cores or intellectual property cores). These have a number of advantages over standard ICs. A logic core can be integrated with other VHDL or Verilog components and synthesized to an FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit); this enables so-called "System on Chip" designs where multiple modules can be combined in a single IC. Depending on production volumes, logic cores can often give significantly lower system costs than standard ICs. By using logic cores, a designer avoids the potential need to do a life-time buy of a Reed-Solomon IC.

Software Implementation
Until recently, software implementations in "real-time" required too much computational power for all but the simplest of Reed-Solomon codes (i.e., codes with small values of t). The major difficulty in implementing Reed-Solomon codes in software is that general purpose processors do not support Galois field arithmetic operations.
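The standard software workaround, described in the next paragraph, is to precompute logarithm and antilogarithm tables. A minimal sketch is given below; it assumes the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common choice for GF(256), although a particular standard may specify a different one.

```python
# Table-driven GF(256) multiplication, as commonly used in software RS codecs.
PRIM = 0x11D                 # x^8 + x^4 + x^3 + x^2 + 1 (assumed primitive polynomial)
EXP = [0] * 512              # doubled so EXP[i + j] needs no modulo
LOG = [0] * 256

x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:            # degree reached 8: reduce modulo PRIM
        x ^= PRIM
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf256_mul(a, b):
    """Multiply two GF(256) elements: a test for zero, two log look-ups,
    an addition of exponents, and one antilog look-up."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf256_mul(0x57, 0x13))   # a sample product; addition in GF(256) is simply XOR
```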
For example, to implement a Galois Field multiply in software requires a test for 0, two log table look-ups, modulo add and anti-log table look-up. However, careful design together with increases in processor performance mean that software implementation can operate at relatively high data rates. Table 5.7 gives sample benchmark figures on a 1.6 GHz Pentium PC. These data rates are for decoding only. Encoding is considerably faster since it requires less computation. Table 5.7 Sample benchmark figures for software decoding of some RS codes · Code Data Rate t . RS(255,251) RS(255,239) RS(255,223) 5.9 NESTED CODES - 120 Mbps - 30 Mbps - 10 Mbps 2 8 16 One of the ways to achieve codes with large blocklengths is to nest codes. This technique combines a code of a small alphabet size and one of a larger alphabet size. Let a block of ttary symbols be of length kK. This block can be broken up into Ksub-blocks of k symbols. Each sub- block can be viewed as an element of a l-ary alphabet. A sequence of K such sub-blocks can be encoded with an (N, K) code over GF(q*). Now, each of theN q*-ary symbols can be viewed as .I
  • 81. Information Theory, Coding and Cryptography k q-ary symbols and can be coded with an (n, k) q-ary code. Thus, a nested code has two distinct levels of coding. This method of generating a nested code is given in Fig. 5.I. セMMMMMMMMMMMMMMMMMMMMQ@ Outer Encoder. - (N, K)Code over GF(qk} 1 アセ\M。イケ@ super channel I I I r-r I I I Inner Encoder: (n, k) Code t---- over GF(q) I I q-ary f----+- Inner Decoder t-+ channel I _____________________ j Fig. 5. 1 Nesting of Codes. Outer Decoder Example 5.14 The following two codes can be nested to form a code with a larger blocklength. Inner code: The RS (7, 3) double error correcting code over GF(8). Outer code: The RS (5II, 505) triple error correcting code over GF(83). On nesting these codes we obtain a (3577, 1515) code over GF(8). This code can correct any random pattern of II errors. The codeword is 3577 symbols long, where the symbols are the elements ofGF(8). Example 5.15 RS codes are extensively used in the compact discs (CD) for ermr correction. Below we give the standard Compact Disc digital fonnat. Sampling frequency: 44.1 kHz, i.e., 10% margin with respect to the Nyquist frequency (audible frequencies below 20kHz) Quantization: 16-bit linear=> theoretical SNR about 98 dB (for sinusoidal signal with maximum allowed amplitude), 2's complement Signal format: Audio bit rate 1.4I Mbit/s (44.I kHz x 16 bits x 2 channels), Cross Interleave Reed-Solomon Code (CIRC), total data rate (CIRC, sync, subcode) 2.034 Mbit/s. Playing time: Maximum 74.7 min. Disc specifications: Diameter 120 mm, thickness 1.2 mm, track pitch 1.6 f.lill, one side medium, disc rotates clockwise, signal is recorded from inside to outside, constant linear velocity (CLV), recording maximizes recording density (the speed of revolution of the disc is not constant; it gradually decreases from 500 to 200 r/min), pit is about 0.5 J.1IIl wide, each pit edge is '1' and all areas in between, whether inside or outside a pit, are '0's. E"or Correction: A typical error rate of a CD system is w-5, which means that a data error occurs roughly 20 times per second (bit rate x BER). About 200 error/s can be ..;Orrected. Soutces ofe"ors: Dust, scratches, fingerprints, pit asymmetry, bubbles or defects in substrate, coating defects and dropouts. Bose-Chaudhuri Hocquenghem (BCH) Codes Cross Interleave Reed-Solomon Code (CIRC) • C2 can effectively correct burst errors. • C1 can correct random errors and detect burst errors. • Three interleaving stages to encode data before it is placed on a disc. • Parity checking to correct random errors • Cross interleaving to permit parity to correct burst errors. I. Input stage: I2 words (16-bit, 6 words per channel) of data per input frame divided into 24 symbols of 8 bits. 2. C2 Reed Solomon code: 24 symbols of data are enclosed into a (28, 24) RS code and 4 parity symbols are used for error correction. 3. Cross interleaving: to guard against burst errors, separate error correction codes, one code can check the accuracy of another, error correction is enhanced. 4. C1 Reed-Solomon code: cross-interleaved 28 symbols of the C2 code are encoded again into a (32, 28) R-S code (4 parity symbols are used for error correction). 5. Output stage: half of the code word is subject to a 1-symbol delay to avoid 2-symbol error at the boundary of symbols. Performance ofCIRC: Both RS coders (C1 and C2) have four parities, and their minimumdistance is 5. If error location is not known, up to two symbols can be corrected. 
If the errors exceed the correction limit, they are concealed by interpolation. Since even-numbered sampleddata and odd- numbered sampled data are interleaved as much as possible, CIRC can conceal long burst errors by simple lin<?ar interpolation. • Maximum correctable burst length is about 4000 data bits (2.5 mm track length). • Maximum correctable burst length by interpolation in the worst case is about 12320 data bits (7.7 mm track length). Sample interpolation rate is one sample every IO hours at BER (Bit Error Rate)= 1o- 4 and 1000 samples at BER = 10-3 • Undetectable error samples (clicks) are less than one every 750 hours at BER = 10-3 and negligible at BER = 10-4 . 5.10 CONCLUDING REMARKS The class of BCH codes were discovered independently by Hocquenghem in I959 and Bose and Ray Chaudhuri in I960. The BCH codes constitute one of the most important and powerful classes of linear block codes, which are cyclic. The Reed-Solomon codes were discovered by Irving S. Reed and Gustave Solomon who published a five-page paper in the journal of the Society for Industrial and Applied Mathematics in 1960 titled "Polynomial Codes over Certain Finite Fields". Despite their advantages, Reed-
  • 82. Information Theory, Coding and Cryptography Solomon codes did not go into use immediately after their invention. They had to wait for the hardware technology to catch up. In 1960, there was no such thing as fast digital electronics, at least not by today's standards. The Reed-Solomon paper suggested some nice ways to process data, but nobody knew if it was practical or not, and in 1960 it probably wasn't practical. Eventually technology did catch up, and numerous researchers began to work on implementing the codes. One of the key individuals was Elwyn Berlekamp, a professor of electrical engineering at the University of California at Berkeley, who invented an efficient algorithm for decoding the Reed-Solomon code. Berlekamp's algorithm was used by Voyager II and is the basis for decoding in CD players. Many other bells and whistles (some of fundamental theoretic significance) have also been added. Compact discs, for example, use a version called cross-interleaved Reed-Solomon code, or CIRC. SUMMARY • A primitive element of GF(q) is an element a such that every field element except zero can be expressed as a power of a. A field can have more than one primitive element. • A primitive polynomial p(x) over GF(q) is a prime polynomial over GF(q) with the property that in the extension field constructed modulo p(x), the field element represented by x is a primitive element. • A blocklength n of the form n = rf - 1 is called a primitive block length for a code over GF(q). A cyclic code over GF(q) of primitive blocklength is called a primitive cyclic code. • It is possible to factor xqm-r - 1 in the extension field GF(rf) to ァセエクアュMャ@ - 1= IT (x- fJi), j where /3- ranges over all the non-zero elements of GF(rf). This implies that each of the polynon'iials fi(x) can be represented in GF(rf) as a product of some of the linear terms, and each {31 is a zero of exactly one of the fi (x). This fi(x) is called the minimal polynomial of f3t • Two elements of GF(rf) that share the same minimal polynomial over GF(q) are called conjugates with respect to GF (q). • BCH codes defined over GF(q) with blocklength rf - 1 are called primitive BCH codes. • To determine the generator polynomial of a t-error correcting BCH code for a primitive blocklength n = qm- 1, (i) Choose a prime po.lynomial of degree m and construct GF(q,, (ii) find Ji(x), the minimal polynomial of a1 for i = 1, ..., p. (iii) obtain the generator polynomial g(x) = LCM lfi(x) f2(x), ...,.f2e(x)]. Codes designed in this manner can correct at least terrors. In many cases the codes will be able to correct more than terrors. For this reason, d = 2t + 1 is called the designed distance of the code, and the minimum distance d セ@ 2t + 1. • Steps for decoding BCH codes: (1) As a trial value, set v= t and compute the determinant of the matrix of syndromes, M. If the determinant is zero, set v= t - 1. Again compute the determinant ofM. Repeat Bose-Chaudhuri Hocquenghem (BCH) Codes this process until a value of v is found for which the determinant of the matrix of syndromes is non zero. This value of v is the actual number of errors that occurred. (2) Invert the matrix M and find the coefficients of the error locator polynomial A(x). (3) Solve A(x) = 0 to obtain the zeros and from them compute the error locations Xi, x;, ... , Xrr If it is a binary code, stop (because the magnitudes of error are unity). (4) If the code in not binary, go back to the system of equations: S1 = YrX1 + Y2X2 + ... 
+ YvXv セ@ = ll.x'lr + J2.x'l2 + ··· + f;; xセ@ セエ@ = YrX2i + jRxRセ@ + ... + ヲ[[xセエ@ Since the error locations are now known, these form a set of2t linear equations. These can be solved to obtain the error magnitudes. • The generator polynomial for a terror correcting RS code will be simply g(x) = LCM[fi(x) f2(x), ..., ht(x)] = (x- a)(x- a2 ) ... (x- a2 t- 1 )(x- ift). Hence, the degree of the generator polynomial will always be 2t. Thus, the RS code satisfies n - k = 2t. • A Reed-Solomon code is a Maximum Distance Separable (MDS) Code and its minimum distance is n- k + 1. • One of the ways to achieve codes with large blocklengths is to nest codes. !his technique combines a code of a small alphabet size and a code of a larger alphabet Size. Let a b}ock of q-ary symbols be of length kK. This block can be broken up into K subblocks of k symbols. Each sub-block can be viewed as an element of a l-ary alphabet. 9 BoョN」・Mケッキセエmイセ@ whaawer セ@ n.o- : ·I セィ・キ@ ゥューセ@ セッッエョN・Mエイセ@ .. J -UセQAセ@ (by SirAt"dwr CO"f.at'!Voyles 1859-1930) PRO'BLEMS Vconstruct ,CF(9) from Gft3) using an appropriate primitive ーッャセッュゥ。ャN@ ..£Z(i) Find the ァ・ョ・セ。エッイ@ ーッャケョッュゥセ@ g (x) for a セセセYQ・@ error 」ッイイ・」⦅エゥョセ@ Je__rnary BCH code of 'C/ 「ャッ」ォャ・ョ・ᄋキセ。エ@ is the code rate of this code? cセセー。イ・@ It セエャヲエィ・@ (11, 6) ternary Golay 」ッセエィ@ respect to the code rate and the mimmum distance. (ii) Next, find' the generator polynomial g(x) for a triple error correcting ternary BCH code of blocklength 26. 5.3 Find the generator polynomial g(x) for a binary BCH code of 「ャッ」ォャ・ョセ@ 31. ⦅ャjセ・@ the primitive polynomial p(x) = セ@ + } + 1 to construct GF(32). What is e mimmum distance of this code? セ、@ the generator polynomials and the minimum distance for the following codes: §IRS (15, 11) code
  • 83. ,, Ill lJ ! I :! Information Theory, Coding and Cryptography (ii) RS (15, 7) code (iii) RS (31, 21) code. 5.5 Show that every BCH code is a subfield subcode of a Reed-Solomon Code of the same designed distance. Under what condition is the code rate of the BCH code equal to that of the RS code? 5.6 Consider the code over GF(11) with a parity check matrix - {セ@ セ@ セ@ セZZ@ Qセ@ l H- 1 'J? セ@ 1o2 1 r.t :il 1ol (i) Find the minimum distance of this code. (ii) Show that this is an optimal code with respect to the Singleton Bound. 5.7 Consider the code over GF(11) with a parity check matrix 1 I 1 1 1 1 2 3 • 10 1 2? セ@ .2 1cf H= I cji <ji .3 Hf I セ@ 3' •• HJ' 1 z5 セ@ .5 tOS (i) Show that the code is a triple error correcting code. (ii) Find the generator polynomial for this code. COMPUTER PROBLEMS 5.8 Write a computer program which takes in the coefficients of a primitive polynomial, the values of q and m, and then constructs the extension field GF(tj). 5.9 Write a computer program that performs addition and multiplication over GF(2m), where m is an integer. 5.10 Find the generator polynomial g(x) for a binary BCH code of blocklength 63. Use the primitive polynomial p(x) = } + x+ 1 to construct GF(64). What is the minimum distance of this code? 5.11 Write a program that performs BCH decoding given n, q, t and the received vector. 5.12 Write a program that outputs the generator polynomial of the Reed-Solomon code with the codeword length n and the message length k. A valid n should be 2M- 1, where M is an integer not less than 3. The program should also list the minimum distance of the code. 5.13 Write a computer program that performs the two level RS coding as done in a standard compact disc. 6 Convolutional Codes f rhet セー。エャャNエ@ o・エキ・・KG|ャエャ・ャッMセ@ cョエセ@ イ・。ャLセ@ I セセセ」」キアZゥャL・NヲエOセ@ L j。L」アオ。MQAセQXVUᄋQYVS@ 6.1 INTRODUCTION TO CONVOLUTIONAL CODES So far we have studied block codes, where a block of k information symbols are encoded into a · block of n coded symbols. There is always a one-to-one correspondence between the uncoded block of symbols (information word) and the coded block of symbols (codeword). This method is particularly useful for high data rate applications, where the incoming stream of uncoded data is first broken into blocks, encoded, and then transmitted (Fig. 6.1). A large blocklength is important because of the following reasons. (i) Many of the good codes that have large distance properties are of large blocklengths (e.g., the RS codes), (ii) Larger blocklengths imply that the encoding overhead is small. However, very large blocklengths have the disadvantage that unless the entire block of encoded data is received at the receiver, the decoding procedure cannot start, which may result in delays. In contrast, there is another coding scheme in which much smaller blocks of uncoded
  • 84. Information Theory, Coding and Cryptography data of length セ。イ・@ used. These are called Information Frames. An information frame typically contains just a few symbols, and can have as few as just one symbol! These information frames are encoded into Codeword Frames of length no . However, just one information frame is not used to obtain the codeword frame. Instead, the current information frame with previous m information frames are used to obtain a single codeword frame. This implies that such encoders have memory, which retain the previous m incoming information frames. The codes that are obtained in this fashion are called Tree Codes. An important sub-class of Tree Codes, used frequently in practice, is called Convolutional Codes. Up to now, all the decoding techniques discussed are algebraic_ and are memoryless, i.e. decoding decisions are based only on the current codeword. Convolutional codes make decisions based on past information, i.e. memory is required. 101110... 01100101 ... Block encoder Fig. 6.1 Encoding Using a Block Encoder. In this chapter, we start with an introduction to Tree and Trellis Codes. We will then, develop the necessary mathematical tools to construct convolutional codes. We will see that convolutional codes can be easily represented by polynomials. Next, we will give a matrix description of convolutional codes. The chapter goes on to discuss the famous Viterbi Decoding Technique. We shall conclude this chapter by giving an introduction to Turbo Coding and Decoding. 6.2 TREE CODES AND TRELLIS CODES We assume that we have an infinitely long stream of incoming symbols (thanks to the volumes of information sent these days, it is not a bad assumption!). This stream of symbols is first broken up into segments of セ@ symbols. Each segment is called an Information Frame, as mentioned earlier. The encoder consists of two parts (Fig. 6.2): (i) memory, which basically is a shift register, (ii) a logic circuit. The memory of the encoder can store m information frames. Each time a new information frame arrives, it is shifted into the shift register and the oldest information frame is discarded. At the end of any frame time the encoder has m most recent information frames in its memory, which corresponds to a total of mlc0 information symbols. When a new frame arrives, the encoder computes the codeword frame using this new frame that has just arrived and the stored previous m frames. The computation of the codeword frame is done using the logic circuit. This codeword frame is then shifted out. The oldest information frame in the memory is then discarded and the most recent information frame is shifted in. The Convolutional Codes encoder is now ready for the next incoming information frame. Thus, for every information frame セ@ symbols) that comes in, the encoder generates a codeword frarrie (no symbols). It should be observed that the same information frame may not generate the same codeword frame because the codeword frame also depends on the m previous information frames. Definition 6.1 The Constraint Length of a shift register encoder is defined as the number of symbols it can store in its memory. We shall give a more formal definition of cqnstraint length later in this chapter. If the shift register encoder stores m previous information frames of length セL@ the constraint length of this encoder v= ュセN@ 101110... Information frame Fig. 6.2 A Shift Register Encoder that Generates a Tree Code. 
Definition 6.2 The infinite set of all infinitely long codewords obtained by feeding every possible input sequence to a shift register encoder is called a (AQ, 71Q ) Tree Code. The rate of this tree code is defined as (6.1) A more formal definition is that a HセL@ no) Tree Code is a mapping from the set of semi infinite sequences of elements of GF(q) into itself such that iffor any m, two semi infinite sequences agree in the first mAQ components, then their images agree in the first Nセ@ components. Definition 6.3 The Wordlength of a shift register encoder is defined as k= (m + l)AQ. The Blocklength of a shift register encoder is defined as n = (m + 1)71Q =k :: Note that the code rate R =セ@ =.!... Normally, for practical shift register encoders, no n the information frame length Ao is small (usually less than 5). Therefore, it is difficult to obtain the code rate R of tree codes close to unity, as is possible with block codes (e.g., RS codes).
Definition 6.4 A (k_0, n_0) tree code that is linear, time-invariant, and has a finite wordlength k = (m + 1) k_0 is called an (n, k) Convolutional Code.

Definition 6.5 A (k_0, n_0) tree code that is time-invariant and has a finite wordlength k is called an (n, k) Sliding Block Code. Thus, a linear sliding block code is a convolutional code.

Example 6.1 Consider the convolutional encoder given in Fig. 6.3.

Fig. 6.3 Convolutional Encoder of Example 6.1.

This encoder takes in one bit at a time and encodes it into 2 bits. The information frame length is k_0 = 1, the codeword frame length is n_0 = 2 and the blocklength is (m + 1) n_0 = 6. The constraint length of this encoder is v = 2 and the code rate is 1/2. The clock rate of the outgoing data is twice as fast as that of the incoming data. The adders are binary adders and, from the point of view of circuit implementation, are simply XOR gates.

Let us assume that the initial state of the shift register is [0 0]. Now, either a '0' or a '1' will arrive as the incoming bit. Suppose '0' comes. On performing the logic operations, we see that the computed value of the codeword frame is [0 0]. The 0 will be pushed into the memory (shift register) and the rightmost '0' will be dropped. The state of the shift register remains [0 0]. Next, let '1' arrive at the encoder. Again we perform the logic operations to compute the codeword frame. This time we obtain [1 1], which is pushed out as the encoded frame. The incoming '1' will be shifted into the memory, and the rightmost bit will be dropped. So the new state of the shift register will be [1 0].
The top branch represents the input as '0' and the lower branch corresponds to '1'. Therefore, labelling is not required for a binary trellis diagram. In general, one would label each branch with the input symbol to which it corresponds. Normally, the nodes that cannot be reached by starting at the top left node and moving only to the right are not shown in the trellis diagram. Corresponding to a certain state and a particular incoming bit, the encoder will produce an output. The output ofthe encoder is written on
Fig. 6.5 The Trellis Diagram for the Encoder Given in Fig. 6.3.

The output of the encoder is written on top of that branch. Thus, a trellis diagram gives a very easy method to encode a stream of input data. The encoding procedure using a trellis diagram is as follows.
• We start from the top left node (since the initial state of the encoder is [0 0]).
• Depending on whether a '0' or a '1' comes, we follow the upper or the lower branch to the next node.
• The encoder output is read out from the top of the branch being traversed.
• Again, depending on whether a '0' or a '1' comes, we follow the upper or the lower branch from the current node (state).
• Thus, the encoding procedure is simply following the branches on the diagram and reading out the encoder outputs that are written on top of each branch.

Encoding the bit stream 1 0 0 1 1 0 1 ... gives a trellis path as illustrated in Fig. 6.6. The encoded sequence can be read out from the diagram as 11 01 11 11 10 10 00 .... It can be seen that there is a one-to-one correspondence between the encoded sequence and a path in the trellis diagram. Should the decoding procedure, then, just search for the most likely path in the trellis diagram? The answer is yes, as we shall see further along in this chapter!

Fig. 6.6 Encoding an Input Sequence Using the Trellis Diagram.
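The trellis walk of Fig. 6.6 is easy to mimic in software. The sketch below implements the two-bit-memory encoder of Example 6.1 as a small state machine. The tap assignment — first output bit formed from the input and the oldest stored bit, second output bit from all three — is our reading of Fig. 6.3 (an assumption), chosen because it reproduces the encoded sequence 11 01 11 11 10 10 00 quoted above.

```python
def encode(bits):
    """Rate-1/2 convolutional encoder with memory [i(n-1), i(n-2)], initially 0 0.
    Output frame per input bit: (i + i(n-2), i + i(n-1) + i(n-2)), arithmetic mod 2."""
    s1 = s2 = 0                      # s1 = previous input bit, s2 = the bit before that
    frames = []
    for i in bits:
        out = (i ^ s2, i ^ s1 ^ s2)
        frames.append(f"{out[0]}{out[1]}")
        s1, s2 = i, s1               # shift the register
    return " ".join(frames)

print(encode([1, 0, 0, 1, 1, 0, 1]))   # 11 01 11 11 10 10 00
```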
G(D) = [giJ (D)] (6.3) FIR FIR FIR FIR Fig. 6.7 A Convolutional Encoder in Terms of FIR Filters with k0 = 7and n0 = 4.
Fig. 6.8 A Convolutional Encoder in Terms of FIR Filters with k_0 = 2 and n_0 = 4. The Rate of This Encoder is R = 1/2.

Example 6.3 Consider the convolutional encoder given in Fig. 6.9.

Fig. 6.9 The Rate 1/2 Convolutional Encoder with G(D) = [D^2 + D + 1   D^2 + 1].

The first bit of the output is a = i_(n-2) + i_(n-1) + i_n and the second bit of the output is b = i_(n-2) + i_n, where i_(n-l) represents the input that arrived l time units earlier. Let the input stream of symbols be represented by a polynomial. We know that multiplying any polynomial by D corresponds to a single right-shift of the sequence, i.e., a delay of one time unit. Therefore, g_11(D) = D^2 + D + 1 and g_12(D) = D^2 + 1, and the generator polynomial matrix of this encoder can be written as
G(D) = [D^2 + D + 1   D^2 + 1].
  • 88. I I II .j :i I ,, l ) i Information Theory, Coding and Cryptography write the generator polynomial for the (iw,jtll) entry of the matrix, just trace the route from the ,-Ut input bit to theJ-tl!OUtpUt bit. Ifno path exists, the generator polynomial is the zero polynomial, as in the case of g12(D), g21(D) and g2lD). If only a direct path exists without any delay elements, the value ofthe generator polynomial is unity, as in g11(D) and g2z(D). Ifthe route from the ithinput bit to theJ-Utoutput bit involves a series ofmemory elements (delay elements), represent each delay by an 。、、ゥエゥセョ。ャ@ power of D, as in g13(D). Note that three of the generator polynomials in the set of generator polynomials are zero. When ko is greater than 1, it is not unusual for some of the generator polynomials to be the zero polynomials. We can now give the formal definitions of the W ordlength, the Blocklength and the Constraint Length of a Convolutional Encoder. Definition 6.6 Given the generator polynomial matrix [gij(D)] of a convolutional code: {i) The Wordlength of the code is k = セ@ セH@ deg gij(D) + 1). {6.4) 1,) {ii) The Blocklength of the code is n = n0 11?-il:X[deg gij(D) + 1]. 1,) {iii) The Constraint Length of the code is ko V= lセ{、・ァ@ gij(D)]. i=j 1,) {6.5) {6.6) Recall that the input message stream IQ, i1, £;., セ@ ... has the polynomial representation I (D) = z0 + i1D + £;_/J + i3U + ... + iMJ nMJ and the codeword polynomial can be written as C(D) = li> + c1 D + £2 D 2 + 0,Ii +...+ criJ I?J. The encoding operation can simply be described as vector matrix product, C (D) = /(D) G (D) (6.7) or equivalently, c1(D) = Liz(D)g1, 1(D). (6.8) i=l Observing that the encoding operation can simply be described as vector matrix product, it can be easily shown that convolutional codes belong to the class of linear codes (exercise). Convolutional Codes I Definition セ@ A Parity Check Matrix H(D) is an (no- セI@ by no matrix of polynomials that satisfies G(D)H(D)T= 0 (6.9) and the Syndrome Polynomial vector which is a (no- セIM」ッューッョ・ョエ@ row vector is give:J;l by s(D) = v(D)H(D)T (6.10) d・セエゥッョ@ セ@ A Systematic Encoder for a convolutional code has the generator polynomial matrix of the form G(D) =[I IP(D)] (6.11) where I is a ko by ko identity matrix and P(D) is a ko by (no - fro) matrix of polynomials. The parity check polynomial matrix for a systematic convolutional encoder is H(D) = [- P(D)T II] (6.12) where I is a (no - セI@ by (no - セI@ identity matrix. It follows that G(D)H(D)T = 0 (6.13) Definition 'if A convolutional code whose generator polynomials g1(IJ), l!;;.(D), ..., g110 (D) satisfy GCD[gl(D), l!;;.(D), ..., gno (D)] = XZ (6.14) for some a is called a Non-Catastrophic Convolutional Code. Otherwise it is called a Catastrophic Convolutional Code. Without loss of generality, one may take a = 0, i.e., XZ = 1. Thus the task of finding a non catastrophic convolutional code is equivalent to finding a good set of relatively prime generator polynomials. Relatively prime polynomials can be easily found by computer searches. However, what is difficult is to find a set of relatively prime generator polynomials that have good error correcting capabilities. Example 6.5 All systematic codes are non-catastrophic because for them g1 (D) = 1 and therefore, GCD[l, g2(D)], ..., g"' (D)]= 1 Thus the systematic convolutional encoder represented by the generator polynomial matrix G(D) = [1 D4 + 1] is non-catastrophic. 
Consider the following generator polynomial matrix of a binary convolutional encoder G(D) = [VZ + 1 D4 + 1] We observe that(D2 + 1)2 =D4 + (D2 + D2 ) + 1 =D4 + 1for binary encoder (modulo 2 arithmetic). Hence, GCD[gdD), gz(D)] = D2 + 1;t 1. Therefore, this is a catastrophic encoder. I
Next, consider the following generator polynomial matrix of a non-systematic binary convolutional encoder. The two generator polynomials are relatively prime, i.e., GCD[g1(D), g2(D)] = 1. Hence this represents a non-catastrophic convolutional encoder.

In the next section, we see that the distance notion of convolutional codes is an important parameter that determines the number of errors a convolutional code can correct.

6.4 DISTANCE NOTIONS FOR CONVOLUTIONAL CODES

Recall that, for block codes, the concept of (Hamming) Distance between two codewords provides a way of quantitatively describing how different the two vectors are, and that a good code must possess the maximum possible minimum distance. Convolutional codes also have a distance concept that determines how good the code is.

When a codeword from a convolutional encoder passes through a channel, errors occur from time to time. The job of the decoder is to correct these errors by processing the received vector. In principle, the convolutional codeword is infinite in length. However, the decoding decisions are made on codeword segments of a finite length. The number of symbols that the decoder can store is called the Decoding Window Width. Regardless of the size of these finite segments (decoding window width), the previous frames affect the current frame because of the memory of the encoder. In general, one gets better performance by increasing the decoding window width, but one eventually reaches a point of diminishing returns.

Most of the decoding procedures for convolutional codes work by focussing on the errors in the first frame. If this frame can be corrected and decoded, then the first frame of information is known at the receiver end. The effect of these information symbols on the subsequent information frames can be computed and subtracted from subsequent codeword frames. Thus the problem of decoding the second codeword frame is the same as the problem of decoding the first frame. We extend this logic further. If the first l frames have been decoded successfully, the problem of decoding the (l + 1)th frame is the same as the problem of decoding the first frame. But what happens if a frame in between was not decoded correctly? If it is possible for a single decoding error event to induce an infinite number of additional errors, then the decoder is said to be subject to Error Propagation. In the case where the decoding algorithm is responsible for error propagation, it is termed Ordinary Error Propagation. In the case where a poor choice of catastrophic generator polynomials causes error propagation, we call it Catastrophic Error Propagation.

Definition 6.10 The l-th minimum distance d_l* of a convolutional code is equal to the smallest Hamming Distance between any two initial codeword segments l frames long that are not identical in the initial frame. If l = m + 1, then this (m + 1)th minimum distance is called the Minimum Distance of the code and is denoted by d*, where m is the number of information frames that can be stored in the memory of the encoder. In literature, the minimum distance is also denoted by d_min.

We note here that a convolutional code is a linear code. Therefore, one of the two codewords used to determine the minimum distance of the code can be chosen to be the all-zero codeword.
The l-th minimum distance is then equal to the weight of the smallest-weight codeword segment l frames long that is non-zero in the first frame (i.e., different from the all-zero frame). Suppose the l-th minimum distance of a convolutional code is d_l*. The code can correct t errors occurring in the first l frames provided
    d_l* ≥ 2t + 1.    (6.15)
Next, put l = m + 1, in which case d_l* = d*_{m+1} = d*. The code can correct t errors occurring in the first blocklength n = (m + 1)n0 provided
    d* ≥ 2t + 1.    (6.16)

Example 6.6 Consider the convolutional encoder of Example 6.1 (Fig. 6.3). The Trellis Diagram for the encoder is given in Fig. 6.12.

Fig. 6.12 The Trellis Diagram for the Convolutional Encoder of Example 6.1 (the trellis continues to infinity along the time axis).

In this case d1* = 2, d2* = 3, d3* = 5, d4* = 5, ... We observe that d_l* = 5 for l ≥ 3. For this encoder, m = 2. Therefore, the minimum distance of the code is d* = 5. This code can correct (d* - 1)/2 = 2 random errors that occur in one blocklength, n = (m + 1)n0 = 6.
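By linearity, d_l* can be found by a direct search over all input sequences whose first information bit is 1, exactly as argued above. The following sketch does this for a rate-1/n0 feedforward encoder; the generator polynomials in the usage line (bit i = coefficient of D^i) are only an illustrative pair, not necessarily those of the encoder of Fig. 6.3.

```python
# Brute-force computation of the l-th minimum distance d_l* of a rate-1/n0
# feedforward binary convolutional code (searching codeword segments whose
# first information bit is 1, per Definition 6.10 and linearity).

from itertools import product

def encode_frames(info_bits, gens):
    """Output frames for the given input bits, starting from the all-zero state."""
    window = 0                         # bit i of window = input bit from i steps ago
    frames = []
    for b in info_bits:
        window = (window << 1) | b
        frames.append([bin(window & g).count("1") % 2 for g in gens])
    return frames

def lth_minimum_distance(l, gens):
    best = None
    for tail in product((0, 1), repeat=l - 1):
        frames = encode_frames([1] + list(tail), gens)
        w = sum(sum(frame) for frame in frames)
        best = w if best is None else min(best, w)
    return best

gens = [0b111, 0b101]                  # an illustrative pair: 1 + D + D^2 and 1 + D^2
for l in range(1, 6):
    print(l, lth_minimum_distance(l, gens))
```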
  • 90. I ·i E2J Information Theory, Coding and Cryptography Definition 6.11 The Free Distance of a convolutional code is given by flt,ee = m;uc[dj] (6.17) It follows that t4n+l :5 dm+2 :5 ··· :5 dfree · The term dfree was first coined by Massey in 1969 to denote a type of distance that was found to be an important parameter for the decoding techniques of convolutional codes. Since, fir.ee represents the minimum distance between arbitrarily long (possibly infinite) encoded sequences, dfree is also denoted by doo in literature. The parameter dfree can be directlycalculated from the trellis diagram. The free distance fit,ee is the minimum weight of a path that deviates from the all zero path and later merges back into the all zero path at some point further down the trellis as depicted in Fig. 6.13. Searching for a code with large minimumdistance and large free distance is a tedious process, and is often done using a computer.Clever techniques have been designed that reduce the effort by avoiding exhaustive searches. Most of the good convolutional codes known today have been discovered by computer searches. Definition 6.12 The free length セッヲ@ a convolutional code is the length of the non-zero segment of a smallest weight convolutional codeword of non-zero weight. Thus, d1= dfreeif l= nfree, and d1< flt,eeif l< nfree· In literature, 7l_trnis also denoted by n00 • --------The all zero path ___________... セ@ Re-merges V Nodes in the trellis Fig. 6.13 The Free Distance dtree path. Example 6.7 Consider the convolutional encoder of Example 6.1 (Fig. 6.3). For this encoder, dfree = 5. There are usually more than one ーセ@ ofpaths that can be used to calculate dfree . The two paths that have been used to calculatedfree are shown in Fig. 6.14 by double lines. In this example, dmin = dfree . Convolutional Codes States Time axis Fig. 6.14 Calculating dtree in the Trellis Diagram. Q Q Q Continues to infinity The free length of the convolutional code is nfree = 6. In this example, the セ@ is equal to the blocklength n of the code. In general it can be longer than the blocklength. 6.5 THE GENERATING FUNCTION The performance of a convolutional code depends on its free distance, dfree . Since convolutional codes are a sub-class of linear codes, the set of Hamming distances between coded sequences is the same as the set of distances of the coded sequel).ces from the all-zero sequence. Therefore, we can consider the distance structure of the convolutional codes with respect to the all-zero sequence without loss of generality. In this section, we shall study an elegant method of determining the free distance, セ・@ of a convolutional code. To find fir.ee we need the set of paths that diverge from the all-zero path and merge back at a later time. The brute force (and time consuming, not to mention, exasperating) method is to determine the distances of all possible paths from the trellis diagram. Another way to find out the fit,ee of a convolutional code is use the concep.t of a generating function, whose expansion provides all the distance information directly. The generating function can be understood by the following example. Example 6.8 Consider again the convolutional encoder of Example 6.1 (Fig. 6.3). The state diagram ofthe encoder is given in Fig. 6.4. We now construct a modified state diagram as shown in Fig. 6.15. The branches of this modified state diagram are labelled by branch gain d ,i = 0, 1, 2, where the expc)nent of D denotes the Hamming Weight of the·branch. 
Note that the self-loop at S0 has been neglected, as it does not contribute to the distance property of the code; circulating around this loop simply generates the all-zero sequence. Note also that S0 has been split into two states, initial and final. Any path that diverges from state S0 and later merges back to S0 can be thought of, equivalently, as traversing the branches of this modified state diagram, starting from the
initial S0 and ending at the final S0. Hence this modified state diagram encompasses all possible paths that diverge from, and then later merge back into, the all-zero path in the trellis diagram.

Fig. 6.15 The Modified State Diagram of the Convolutional Encoder Shown in Fig. 6.3.

We can find the distance profile of the convolutional code using the state equations of this modified state diagram. These state equations are
    X1 = D^2 + X2,
    X2 = D X1 + D X3,
    X3 = D X1 + D X3,
    T(D) = D^2 X2,    (6.18)
where the Xi are dummy variables. Upon solving these equations simultaneously, we obtain the generating function
    T(D) = D^5 / (1 - 2D).    (6.19)
Note that the expression for T(D) can also be (easily) obtained by Mason's Gain Formula, which is well known to students of Digital Signal Processing. The following conclusions can be drawn from the series expansion of the generating function:
(i) There are an infinite number of possible paths that diverge from the all-zero path and later merge back again (this is also intuitive).
(ii) There is only one path with Hamming Distance 5, two paths with Hamming Distance 6, and in general 2^k paths with Hamming Distance k + 5 from the all-zero path.
(iii) The free Hamming Distance d_free for this code is 5. There is only one path corresponding to d_free. Example 6.7 explicitly illustrates the pair of paths that result in d_free = 5.

We now introduce two new labels in the modified state diagram. To enumerate the length of a given path, the label L is attached to each branch, so that every time we traverse a branch we increment the path length counter. We also add a label I^i to each branch, where the exponent i is the Hamming Weight of the input bits associated with that branch of the modified state diagram (see Fig. 6.16).

Fig. 6.16 The Modified State Diagram to Determine the Augmented Generating Function.

The state equations in this case would be
    X1 = D^2 L I + L I X2,
    X2 = D L X1 + D L X3,
    X3 = D L I X1 + D L I X3,
and the Augmented Generating Function is T(D, L, I) = D^2 L X2. On solving these simultaneous equations, we obtain
    T(D, L, I) = D^5 L^3 I / (1 - D L (L + 1) I)    (6.20)
               = D^5 L^3 I + D^6 L^4 (L + 1) I^2 + ... + D^{k+5} L^{3+k} (L + 1)^k I^{k+1} + ...    (6.21)
Further conclusions from the series expansion of the augmented generating function are:
(i) The path with the minimum Hamming Distance of 5 has length equal to 3.
(ii) The input sequence corresponding to this path has weight equal to 1.
(iii) There are two paths with Hamming Distance equal to 6 from the all-zero path. Of these, one has a path length of 4 and the other 5 (observe the power of L in the second term of the expansion). Both these paths have an input sequence weight of 2.

In the next section, we study the matrix description of convolutional codes, which is a bit more complicated than the matrix description of linear block codes.
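The series expansion used above can also be produced numerically by iterating the state equations (6.18), with each dummy variable held as a truncated power series in D. The sketch below (our own, with illustrative names) reproduces the path counts quoted in conclusion (ii): one path of weight 5, two of weight 6, and in general 2^k of weight k + 5.

```python
# Expand the generating function T(D) of Example 6.8 numerically by iterating
# the state equations X1 = D^2 + X2, X2 = D*X1 + D*X3, X3 = D*X1 + D*X3,
# T = D^2 * X2. A series is a dict {Hamming weight: number of paths}.

from collections import defaultdict

MAX_W = 12            # truncate the series at this Hamming weight

def series_add(a, b):
    out = defaultdict(int, a)
    for w, c in b.items():
        out[w] += c
    return dict(out)

def times_D(a, d):
    """Multiply a series by D^d, dropping terms beyond MAX_W."""
    return {w + d: c for w, c in a.items() if w + d <= MAX_W}

X1, X2, X3 = {}, {}, {}
for _ in range(4 * MAX_W):                     # enough passes for the low-order terms to settle
    X1 = series_add({2: 1}, X2)                # X1 = D^2 + X2
    X2 = series_add(times_D(X1, 1), times_D(X3, 1))
    X3 = series_add(times_D(X1, 1), times_D(X3, 1))

T = times_D(X2, 2)                             # T(D) = D^2 * X2
print(sorted(T.items()))                       # [(5, 1), (6, 2), (7, 4), (8, 8), ...]
```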
  • 92. ! f I セ@ i w "I H- Information Theory, Coding and Cryptography 6.6 MATRIX DESCRIPTION OF CONVOLUTIONAL CODES A convolutional code can be described as consisting of an infinite number of infinitely long codewords and which (visualize the trellis diagram) belongs to the class of linear codes. They can be described by an infinite generator matrix. As can be expected, the matrix description of convolutional codes is messier than that of the linear block codes. Let the generator polynomials of a Convolutional Code be represented by giJ(D) = セァゥj@ D 1 (6.22) In order to obtain a generator matrix, the gijl coefficients are arranged in a matrix format. For each l, let G1be a セ@ by no matrix. Gt=[gii1] (6.23) Then, the generator matrix for the Convolutional Code that has been truncated to a block code of blocklength n is Go G1 Gz Gm 0 Go Go Gm-1 G(n) = G(n)= 0 0 Go Gm-2 (6.24) 0 0 0 Go where 0 is a セ@ by no matrix of zeros and m is the length of the shift register used to generate the code. The generator matrix for the Convolutional Code is given by G=[:' セ@ G2 Gm 0 0 0 0 """] Go セ@ Gm-1 Gm 0 0 0 ... 0 Go Gm-2 Gm-1 Gm 0 0 (6.25) The matrix extends infinitely far down and to the right. For a systematic convolutional code, the generator matrix can be written as I Po 0 セ@ 0 p2 0 pm :o 0 0 0 0 0 I Po 0 11 0 I Pm-1 l 0 pm 0 0 0 0 0 0 I Po 0 I pm-1 0 pm Pm-2 l 0 (6.26) G= :o pm-2 0 Pm-1 I I 0 Pm-2 I I I I where I is 。セ@ 「ケセ@ identity matrix, 0 is 。セ@ 「ケセ@ matrix of zeros and P0 , P 2 , ..., Pュ。イ・セ@ by (no - セI@ matrices. The parity check matrix can then be written as Convolutional Codes RT 0 -I p,T 1 0 P/ -I Pl 0 p,T 1 0 P/ -I H= (6.27} pT m 0 p,;_1 0 p,;_2 0 P{ -I pT m 0 p,;_I 0 pT m 0 Example 6.9 Consider the convolutional encoder shown in Fig. 6.17. Let us first write the gene- rator polynomial matrix for this encoder. To do so, we just follow individual inputs to the outputs, one by one, and count the delays. The generator polynomial matrix is obtained as [ D + D2 DD2 D+DD2] G(D)= D2 i2 + C:3 Fig. 6.17 A Rate 2!3 Convolutional Encoder. The generator polynomials are g11(D) = D + IY, g12(D) = IY, g13(D) = D + IY, g21(D) = D 2 , セ R HdI@ = D and セ S@ (D) = D. To write out the matrix G0, we look at the constants (coefficients of D 0 ) in the generator· polynomials. Since there are no constant terms in any of the generator polynomials, gッ]{セ@ セ@ セ}@ Next, to write out the matrix G1, we look at the coefficients ofD1 in the generator polynomials. The l8 trow, 1st column entry of the matrixG1 corresponds to the coefficients ofD 1 ingu(D). The l8 t row, 2nd column entry corresponds to the coefficients ofD1in g12(D), and so on. Thus,
  • 93. i i !I ! i Information Theory, Coding and Cryptography Similarly, we can write Gr= [: セ@ セ}@ The generator matrix can now be written as o o o:1 o 1:1 1 1:o o o ... I I I o o o:o 1 1:o 1 1: o o ... -------r-------T-------r---------- 10 0 0 11 0 111 1 1 I I I :o o o:o 1 1:o 1 1 G= I I I MMMMMMM[MMMMMMM[MMMMMMMZセMMッMMセMMM 1 I ! : 0 1 1 ... Our next task is to look at an efficient decoding strategy for the convolutional codes. One of the very popular decoding methods, the Viterbi Decoding Technique, is discussed in detail. 6.7 VITERBI DECODING OF CONVOLUTIONAL CODES There are three important decoding techniques for convolutional codes: Sequential Decoding, Threshold Decoding and the Viterbi Decoding. The Sequential Decoding technique was proposed by Wozencraft in 1957. Sequential Decoding has the advantage that it can perform very well with long-constraint-length convolutional codes, but it has a variable decoding time. .Threshold Decoding, also known as Majority Logic Decoding, was proposed by Massey in 1963 in his doctoral thesis at MIT. Threshold decoders were the first commercially produced decoders for convolutional codes. Viterbi Decoding was developed by AndrewJ. Viterbi in 1967 and in the late 1970s became the dominant technique for convolutional codes. Viterbi Decoding had the advantages of (i) a highly satisfactory bit error performance, (ii) high speed of operation, (iii) ease of implementation, (iv) low cost. Threshold decoding lost its popularity specially because of its inferior bit error performance. It is conceptually and practically closest to block decoding and it requires the calculation of a set of syndromes, just as in the case of block codes. In this case, the syndrome is a sequence because the information and check digits occur as sequences. Viterbi Decoding has the advantage that it has a fixed decoding time. It is well suited to hardware decoder implementation. But its computational requirements grow exponentially as a function of the constraint length, so it is usually limited in practice to constraint lengths of Convolutional Codes v = 9 or less. As of early 2000, some leading companies claimed to have produced a V = 9 Viterbi decoder that operates at rates up to 2 Mbps. Since the time when Viterbi proposed his algorithm, other researchers have expanded on his work by finding good convolutional codes, exploring the performance limits of the technique, and varying decoder design parameters to optimize the implementation of the technique in hardware and software. The Viterbi Decoding algorithm is also used in decoding Trellis Coded Modulation (TCM), the technique used in telephone-line modems to squeeze high ratios of bits per-second to Hertz out of 3 kHz-bandwidth analog telephone lines. We shall see more of TCM in the next chapter. For years, convolutional coding with Viterbi Decoding has been the predominant FEC (Forward Error Correction) technique used in space communications, particularly in geostationary satellite communication networks such as VSAT (very small aperture terminal) networks. The most common variant used in VSAT networks is rate 112 convolutional coding using a code with a constraint length V= 7. With this code, one can transmit binary or quaternary phase-shift-keyed (BPSK or QPSK) signals with at least 5 dB less power than without coding. That is a reduction in Watts of more than a factor of three! 
This is very useful in reducing transmitter and antenna cost or permitting increased data rates given the same transmitter power and antenna sizes. We will now consider how to decode convoh.Itional codes using the Viterbi Decoding algorithm. The nomenclature used here is that we have a message vector i from which the encoder generates a code vector c that is sent across a discrete memoryless channel. The received vector r may differ from the transmitted vector c (unless the channel is ideal or we are very lucky!). The decoder is required to make an estimate of the message vector. Since there is a one to one correspondence between code vector and message vector, the decoder makes an estimate of the code vector. Optimum decoding will result in a minimum probability of Decoding error. Let p(rlcJ be the conditional probability of receiving r given that c was sent. We can state that the optimum decoder is the maximum likelihood decoder with a decision rule to choose the code vector estimate Cfor which the log-likelihood function In p(rlcJ is maximum. If we consider a BSC where the vector elements of c and r are denoted by ci and r; , then, we have N p{rlcJ = L P(T; lc;), i=l and hence, the log-likelihood function equals N In p(r Ic)= L In p(r; jcJ i=l Let us assume (6.28) (6.29) (6.30)
If we suppose that the received vector differs from the transmitted vector in exactly d positions (the Hamming Distance between the vectors c and r), we may rewrite the log-likelihood function as
    ln p(r | c) = d ln p + (N - d) ln(1 - p)
                = d ln(p / (1 - p)) + N ln(1 - p).    (6.31)
We can assume the probability of error p < 1/2, and we note that N ln(1 - p) is a constant for all code vectors. Now we can make the statement that the maximum likelihood decoding rule for a Binary Symmetric Channel is to choose the code vector estimate ĉ that minimizes the Hamming Distance between the received vector r and the transmitted vector c.

For Soft Decision Decoding in an Additive White Gaussian Noise (AWGN) channel with single-sided noise power spectral density N0, the likelihood function is given by
    p(r | c) = Π_{i=1}^{N} (1/√(π N0)) exp(-|r_i - c_i|^2 / N0)
             = (1/√(π N0))^N exp(-(1/N0) Σ_{i=1}^{N} |r_i - c_i|^2).    (6.32)
Thus the maximum likelihood decoding rule for the AWGN channel with Soft Decision Decoding is to minimize the squared Euclidean Distance between r and c. This squared Euclidean Distance is given by
    d_E^2(r | c) = Σ_{i=1}^{N} |r_i - c_i|^2.    (6.33)
Viterbi Decoding works by choosing that trial information sequence whose encoded version is closest to the received sequence. Here, Hamming Distance will be used as the measure of proximity between two sequences. The Viterbi Decoding procedure can be easily understood by the following example.

Example 6.10 Consider the rate 1/3 convolutional encoder given in Fig. 6.18 and the corresponding trellis diagram.

Fig. 6.18 A Rate 1/3 Convolutional Encoder and Its Trellis Diagram (the trellis continues to infinity along the time axis).

Suppose the transmitted sequence was the all-zero sequence. Let the received sequence be r = 010000100001 ... Since it is a rate 1/3 encoder, we first segment the received sequence into groups of three bits (because n0 = 3), i.e., r = 010 000 100 001 ...

The task at hand is to find the most likely path through the trellis. Since a path must pass through nodes in the trellis, we will try to find out which nodes in the trellis belong to the most likely path. At any time, every node has two incoming branches. We simply determine which of these two branches belongs to a more likely path (and discard the other). We make this decision based on some metric (Hamming Distance). In this way we retain just one path per node, together with the metric of that path. In this example, we retain only four paths as we progress with our decoding (since we have only 4 states in our trellis).

Let us consider the first branch of the trellis, which is labelled 000. We find the Hamming distance between this branch and the first received framelength, 010. The Hamming distance d(000, 010) = 1. Thus the metric for this first branch is 1, and is called the Branch Metric. Upon reaching the top node from the starting node, this branch has accumulated a metric of 1. Next, we compare the received framelength with the lower branch, which terminates at the second node from the top. The Hamming Distance in this case is d(111, 010) = 2. Thus, the metric for this branch is 2. At each node we write the total metric accumulated by the path, called the Path Metric. The path metrics are marked by circled numbers in the trellis diagram in Fig. 6.19. At the subsequent stages of decoding, when two paths terminate at every node, we will retain the path with the smaller value of the metric.
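The add-compare-select recursion that the following stages apply at every node can be collected into a short program. Below is a minimal hard-decision Viterbi decoder sketch for a rate-1/n0 feedforward binary code; it is our own illustration, and the generator polynomials and message in the usage lines are placeholders, not the encoder of Fig. 6.18.

```python
# Minimal hard-decision Viterbi decoder for a rate-1/n0 feedforward binary
# convolutional code with memory m. Generators are bit masks (bit i = D^i).

def conv_encode(bits, gens, m):
    """Encode a bit list, appending m zero tail bits to return to state 0."""
    window, out = 0, []
    for b in bits + [0] * m:
        window = (window << 1) | b
        out += [bin(window & g).count("1") % 2 for g in gens]
    return out

def viterbi_decode(received, gens, m):
    """Return (decoded bits, best path metric) for a frame-aligned bit list."""
    n0, n_states, INF = len(gens), 1 << m, float("inf")
    metric = [0] + [INF] * (n_states - 1)         # start in the all-zero state
    paths = [[] for _ in range(n_states)]         # surviving input sequences
    for i in range(0, len(received), n0):
        frame = received[i:i + n0]
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue
            for b in (0, 1):                      # extend this survivor by one branch
                window = (s << 1) | b
                branch_out = [bin(window & g).count("1") % 2 for g in gens]
                branch = sum(o != r for o, r in zip(branch_out, frame))
                ns = window & (n_states - 1)
                if metric[s] + branch < new_metric[ns]:   # compare and select
                    new_metric[ns] = metric[s] + branch
                    new_paths[ns] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(range(n_states), key=lambda s: metric[s])
    return paths[best], metric[best]

# Usage with an illustrative rate-1/2, m = 2 code (generators 1+D+D^2 and 1+D^2):
gens, m = [0b111, 0b101], 2
msg = [1, 0, 1, 1, 0]
coded = conv_encode(msg, gens, m)
coded[3] ^= 1                                     # introduce a single channel error
decoded, pm = viterbi_decode(coded, gens, m)
print(decoded[:len(msg)], pm)                     # recovers the message; metric = 1
```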
  • 95. I, i. [ l l! T セ@ ,, ·, Information Theory, Coding and Cryptography Lo1 I • L1o I • • • • • 0· • ウュセウ@ MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMセ@ Time axis Fig. 6.19 The Path Metric after the 1st Step of Vlterbl Decoding. We, now, move to the next stage ofthe trellis. The Hamming Distance betweenthe branches are computed with respect to the second frame received, 000. The branch metrics for the two branches emanating from the topmost node are 0 and 3. The branch metrics for the two branches emanating from the second node are 2 and 1. The total path metric is marked by circled numbers in the trellis diagram shown in Fig. 6.20. 0 0 8 (i 8 8 0 8 (i 0 8 (i セ@ MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMセ@ Time axis Fig. 6.20 The Path Metric after the 2nd Step of Vlterbl Decoding. We now proceed to the next stage. We again compute the branch metrics and add them to the respective path metrics to get the new path metrics. Consider the topmost node at this stage. Two branches terminate at this node. The path coming from node 1 of the previous stage has a path metric of2 and the path coming from node 1ofthe previous stage has a path metric of6. The path with a lower metric is retained and the other discarded. The trellis diagram shown in Fig. 6.21 gives the surviving paths (double lines) and the path metrics (circled numbers). Viterbi called these surviving paths as Survivors. It is interesting to note that node 4 receives two paths with equal path metrics. We have arbitrarily chosen one ofthem as the surviving path (by tossing a fair coin!). Convolutional Codes G G· 0 G 8 0· .. 0-- 8 States Time axis Fig. 6.21 The Path Metric after the 3rd step of Viterbl Decoding. We continue this procedure for Viterbi decoding for the next stage. The final branch metrics and path metrics are shown in Fig. 6.22. At the end we pick the path with the minimum metric. This path corresponds to the all zero path. Thus the decoding procedure has been able to correctly decode the received vector. CDo f;; セセセセセセセセセセ]]セP@ G· 0 @ (oil G 0· (i 0 0· @ @ states lime axis Fig. 6.22 The Path Metric after the 4th Step of Viterbi Decoding. The minimum distance for this code is a= 6. The number oferrors that it can correct perframe length is equal to t= l(d.- 1)/2)j = l(6- l)/2j = 2. In this example, the maximum number oferrors per framelength was 1. Consider the set of surviving paths at the rth frame time. If all the surviving paths cross through the same nodes then a decision regarding the most likely pathNエイ。ョセュゥエエ・、@ can be made up to the point where the nodes are common. To build a ーイセ」エゥ」。ャ@ vエセイ「エ@ Decoder, one must choose a decoding window width w, which is usually several times as btg. as the 「ャッ」セ・ョァエィN@ At a given frame time,f, the decoder examines all the surviving paths to see If they agree m the first
  • 96. i; l '·! !I· li ,, Information Theory, Coding and Cryptography branch. This branch defines a decoded information frame and is passed out of the decoder. In the previous example of Viterbi Decoding, we see that by the time the decoder reaches the 4th frame, all the surviving paths agree in their first decoded branch (called a well-defined decision). The decoder drops the first branch (after delivering the decoded frame) and takes in a new frame of the received word for the next iteration. If again, all the surviving paths pass through the same node of the oldest surviving frame, then this information frame is decoded. The process continues in this way indefinitely. If a long·enough decoding window w is chosen, then a well-defined decision can be reached almost always. A well designed code will lead to correct decoding with a high probability. Note that a well designed code carries meaning only in the context of a particular channel. The random errors induced by the channel should be within the error correcting capability of the code. The Viterbi decoder can be visualized as a sliding window through which the trellis is viewed (see Fig. 6.23). The window slides to the right as new frames are processed. The surviving paths are marked on the portion of the trellis which is visible through the window. As the window slides, new nodes appear on the right, and some of the surviving paths are extended to these new nodes while the other paths disappear. Decoding Window • • • • • • • • • • • • • • • • • • • • • • • • • • • • w Fig. 6.23 The Viterbi Decoder as a Sliding Window through which the Trellis is Viewed. If the surviving paths do not go through the same node, we label it a Decoding Failure. The decoder can break the deadlock using any arbitrary rule. To this limited extent, the decoder becomes an incomplete decoder. Let us revert back to the previous example. At the 4th stage, the surviving paths could as well be chosen as shown in Fig. 6.24, which will render the decoder as an incomplete decoder. CD CD 0 0 8 • • 0 0· ® States Time axis Fig. 6.24 Example of an Incomplete Decoder in Viterbi Decoding Process. Convolutional Codes It is possible that in some cases the decoder reaches a well-defined decision, but a wrong one! If this happens, the decoder has no way of knowing that it has taken a wrong decision. Based on this wrong decision, the decoder will take more wrong decisions. However, if the code is non- catastrophic, the decoder will recover from the errors. The next section deals with some Distance Bounds for convolutional codes. These bounds will help _us compare different convolutional coding schemes. 6.8 DISTANCE BOUNDS FOR CONVOLUTIONAL CODES Upper bounds can be computed on the minimum distance of a convolutional code that has a rate R = !!L and a constraint length v = ュセN@ These bounds are similar in nature and derivation no to those for block codes, with block length corresponding to constraint length. However, as we shall see, the bounds are not very tight. These bounds just give us a rough idea of how good the code is. Here we present the bounds (without ーイッッセ@ for binary codes. For rate R and constraint length v, let d be the largest integer that satisfies hHセカ@ IセゥMr@ (6.34) Then at least one binary convolutional code exists with minimum distance d for which the above inequality holds. 
Here H(x) is the familiar entropy function for a binary alphabet,
    H(x) = -x log2 x - (1 - x) log2(1 - x),    0 ≤ x ≤ 1.
For a binary code with R = 1/n0, the minimum distance d_min satisfies
    d_min ≤ ⌊(n0 v + n0)/2⌋    (6.35)
where ⌊y⌋ denotes the largest integer less than or equal to y. An upper bound on d_free is given by (Heller, 1968)
    d_free ≤ min_{j ≥ 1} ⌊ (n0/2) (2^j / (2^j - 1)) (v + j - 1) ⌋.    (6.36)
To calculate the upper bound, the right-hand side should be plotted for different integer values of j. The upper bound is the minimum of this plot.

Example 6.10 Let us apply the distance bounds to the convolutional encoder given in Example 6.1. We will first apply the bound given by (6.34). For this encoder, k0 = 1, n0 = 2, R = 1/2 and v = 2, so
    H(d / (n0 v)) = H(d/4) ≤ 1 - R = 1/2  =>  H(d/4) ≤ 0.5.
  • 97. I. i I i :I But we have, Information Theory, Coding and Cryptography H{0.11) = - 0.11log2 0.11- (1- 0.11) log2 {1- 0.11) = 0.4999, and H{0.12) =- 0.12 log2 0.12- {1- 0.12) log2 {1- 0.12) = 0.5294. Therefore, 、OTセ@ .0.11, or d セ@ 0.44 The largest integer d that satisfies this bound is d = 0. This implies that at least one binary convolutional code exists with minimum distance d = 0. This statement does not say much (i.e., the bound is not strict enough for this encoder)! NeXt, consider the encoder shown in Fig. 6.25. For this encoder, But we have, ---------------- ; I I :_----------------- J Fig. 6.25 Convolutional Encoder for Example 6. 10. G(D) = [1 D+ D2 D+ D2 + D3 ], セ@ =1, no= 3, R= 113 and v =3. n(セカI@ =H(d/9) Sl- R; 2/3=> H(d/9) s0.6666 H{O.l7) = - 0.17log2 0.17- (1- 0.17) log2 (1- 0.17) =0.6577, and H(O.l8) = - 0.18log2 0.18- (I - 0.18) log2 (1 - 0.18) = 0.6801 Therefore, d/9 セ@ 0.17, or d セ@ 1.53. The largest integer d that satisfies this bound is d = 1. Then at least one binary convolutional code exists with minimum distance d = 1. This is a very loose bound. Let us now evaluate the second bound, given by (6.35). dmin セ@ l(nov +no )12J=l(9+ 3)/2J=6 This gives us dmm = 6, which is a good upper bound as seen from the trellis diagram ヲセイ@ the encoder (Fig. 6.26). Since no= 3, every branch in the trellis is labelled by 3 btts. The two paths that have been used to calculate dmin are shown in Fig. 6.26 by double lines. In this example, dmin = セ@ = 6. Convolutional Codes States lime axis e e e Continues to infinity Fig. 6.26 The Trellis Diagram for the Convolutional Encoder Given in Fig. 6.25. Next, we determine the Heller Bound on dfm, as given by (6.36). The plot of the function d(j) = l(no/2){211(21 -t))(v +j-1)J for different integer values of j is given in Fig. 6.27. .-3 ·. 4. 5 Fig. 6.27 The Heller Bound Plot. From Fig. 6.27, we see that the upper bound on the free dista,nce of the code is dfm セ@ 8. This is a good upper bound. The actual value of dfrtt = 6. 6.9 PERFORMANCE BOUNDS One of the useful performance criteria for convolutional codes is the bit error probability P1r The bit error probability or the bit error rate (a misnomer!) is defined as the expected number of decoded information bit errors per information bit. Instead of obtaining an exact expression for Ph , typically, an upper bound on the error probability is calculated. We will first determine the First Event Error Probability, which is the probability of error for sequences that merge with the all zero (correct) path for the first time at a given node in the trellis diagram.
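Before carrying this derivation through, the two distance bounds just applied in Example 6.10 can be checked numerically. The short sketch below (our own helper names) finds the largest d allowed by (6.34) by scanning the binary entropy function, reproducing the values d = 0 and d = 1 obtained above.

```python
# Numerical evaluation of the bound (6.34): largest integer d such that
# H(d / (n0*v)) <= 1 - R, using the binary entropy function H(x).

from math import log2

def H(x):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def largest_d(n0, v, R, steps=200000):
    x_max = 0.0
    for i in range(steps):
        x = 0.5 * i / steps              # H(x) is increasing on [0, 1/2]
        if H(x) > 1 - R:
            break
        x_max = x
    return int(x_max * n0 * v)

print(largest_d(2, 2, 1/2))   # -> 0, as for the encoder of Example 6.1
print(largest_d(3, 3, 1/3))   # -> 1, as for the encoder of Fig. 6.25
```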
Since convolutional codes are linear, let us assume that the all-zero codeword is transmitted. An error will be made by the decoder if it chooses an incorrect path c' instead of the all-zero path. Let c' differ from the all-zero path in d bits. Therefore, a wrong decision will be made by the maximum likelihood decoder if more than ⌊d/2⌋ errors occur, where ⌊x⌋ is the largest integer less than or equal to x. If the channel transition probability is p, then the probability of error for this path can be upper bounded as follows:
    P_d ≤ [2 √(p(1 - p))]^d.    (6.37)
Now, there would be many paths with different distances that merge with the correct path at a given time for the first time. The upper bound on the first event error probability can be obtained by summing the error probabilities of all such possible paths:
    P_e ≤ Σ_{d = d_free}^{∞} a_d P_d,    (6.38)
where a_d is the number of codewords at Hamming Distance d from the all-zero codeword. Comparing (6.19) and (6.38) we obtain
    P_e ≤ T(D) |_{D = 2√(p(1 - p))}.    (6.39)
The bit error probability, P_b, can now be determined as follows. P_b can be upper bounded by weighting each pairwise error probability P_d in (6.37) by n_d, the number of incorrectly decoded information bits for the corresponding incorrect paths. For a rate k/n encoder, the average
    P_b ≤ (1/k) Σ_{d = d_free}^{∞} n_d P_d.    (6.40)
It can be shown that
    ∂T(D, I)/∂I |_{I=1} = Σ_{d = d_free}^{∞} n_d D^d.    (6.41)
Thus,
    P_b ≤ (1/k) ∂T(D, I)/∂I |_{I=1, D = 2√(p(1 - p))}.    (6.42)

6.10 KNOWN GOOD CONVOLUTIONAL CODES

In this section, we shall look at some known good convolutional codes. So far, only a few constructive classes of convolutional codes have been reported. There exists no class with an algebraic structure comparable to the t-error correcting BCH codes. No constructive methods exist for finding convolutional codes of long constraint length. Most of the codes presented here have been found by computer searches. Initial work on short convolutional codes with maximal free distance was reported by Odenwalder (1970) and Larsen (1973). A few of the codes are listed in Tables 6.2, 6.3 and 6.4 for code rates 1/2, 1/3 and 1/4 respectively. The generators are given (in octal notation) as the sequences
    g^(i) = (g_0^(i), g_1^(i), ..., g_m^(i)),    (6.43)
i.e., the coefficients of g_i(D) from D^0 upward. For example, the octal notation for the generators of the R = 1/2, v = 4 encoder is 15 and 17 (see Table 6.2). The octal 15 can be deciphered as 15 = 1-5 = 1-101. Therefore,
    g1(D) = 1 + (1)D + (0)D^2 + (1)D^3 = 1 + D + D^3.
Similarly, 17 = 1-7 = 1-111. Therefore,
    g2(D) = 1 + (1)D + (1)D^2 + (1)D^3 = 1 + D + D^2 + D^3,
and G(D) = [1 + D + D^3    1 + D + D^2 + D^3].

Table 6.2 Rate 1/2 codes with maximum free distance

Non-catastrophic:
    v     n     Generators (octal)    d_free    Heller Bound
    3     6     5, 7                  5         5
    4     8     15, 17                6         6
    5     10    23, 35                7         8
    6     12    53, 75                8         8
    7     14    133, 171              10        10
Catastrophic:
    5     10    27, 35                8         8
    12    24    5237, 6731            16        16
    14    28    21645, 37133          17        17

Table 6.3 Rate 1/3 codes with maximum free distance

    v     n     Generators (octal)    d_free    Heller Bound
    3     9     5, 7, 7               8         8
    4     12    13, 17, 17            10        10
    5     15    25, 37, 37            12        12
    6     18    47, 75, 75            13        13
    7     21    133, 175, 175         15        15
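The octal-to-polynomial conversion worked out above, and the Heller bound used in the last column of these tables, are both easy to script. The sketch below (our own function names) converts an octal generator into its coefficient list and evaluates the bound (6.36); it reproduces, for instance, the Heller bound entries 5 and 10 for the v = 3 and v = 7 rate-1/2 codes of Table 6.2.

```python
# Octal generator notation (as in Tables 6.2-6.4) and the Heller bound (6.36).
from math import floor

def octal_to_poly(octal_str):
    """'15' -> [1, 1, 0, 1], i.e. 1 + D + D^3 (leftmost bit is the D^0 coefficient)."""
    bits = format(int(octal_str[0], 8), "b")
    for digit in octal_str[1:]:
        bits += format(int(digit, 8), "03b")
    return [int(b) for b in bits]

def heller_bound(n0, v, j_max=30):
    """Upper bound on d_free for a rate 1/n0 code of constraint length v."""
    return min(floor((n0 / 2) * (2 ** j / (2 ** j - 1)) * (v + j - 1))
               for j in range(1, j_max + 1))

print(octal_to_poly("15"), octal_to_poly("17"))   # [1, 1, 0, 1] [1, 1, 1, 1]
print(heller_bound(2, 3), heller_bound(2, 7))     # 5 10, matching Table 6.2
```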
  • 99. .. I I 1 Information Theory, Coding and Cryptography Table 6.4 Rate 7/4 codes with maximum free distance l' n Generators d,,<'•-' Heller 3 4 5 {) 7 12 16 20 24 28 5 13 25 53 135 7 15 27 67 135 (Octal) Bound 7 15 33 71 147 7 17 37 75 163 10 15 16 18 20 10 15 16 18 20 Next, we study an interesting class of codes, called Turbo Codes, which lie somewhere between linear block codes and convolutional codes. 6. 11 TURBO CODES Turbo Codes were introduced in 1993 at the International Conference on Communications (ICC) by Berrou, Glavieux and Thitimajshima in their paper "Near Shannon Limit Error Correction Coding and Decoding-Turbo-Codes". In this paper, they quoted a BER performance of 10-5 at an E/No of 0.7 dB using only a 112 rate code, generating tremendous interest in the field. Turbo Codes perform well in the low SNR scenario. At high SNRs, some of the traditional codes like the Reed-Solomon Code have comparable or better performance than Turbo Codes. Even though Turbo Codes are considered as Block Codes, they do not exactly work like block codes. Turbo Codes are actually a quasi mix between Block and Convolutional Codes. They require, like a block code, that the whole block be present before encoding can begin. However, rather than computing parity bits from a system of equations, they use shift registers just like Convolutional Codes. Turbo Codes typically use at least two convolutional component encoders and two maximum aposteriori (MAP) algorithm component decoders in the Turbo codes. This is known as concatenation. Three different arrangements of turbo codes are Parallel Concatenated cッョカセャオエゥッョ。ャ@ Codes (PCCC), Serial Concatenated Convolutional Codes (SCCC), and セケ「ョ、@ Concatenated Convolutional Codes (HCCC). Typically, Turbo Codes are arranged hke the PCCC. An example of a PCCC Turbo encoder given in Fig. 6.28 shows that two encoders run in parallel. Fig. 6.28 Block Diagram of a Rate 7/3, PCCC Turbo Encoder. Convolutional Codes One reason for the better performance of Turbo codes is that they produce high weight code words. For example, if the input sequence (Uk) is originally low weight, the systematic (Xk) and parity 1 (Y1) outputs may produce a low weight codeword. However, the parity 2 output (Yf) is less likely to be a low weight codeword due to the interleaver in front of it. The interleaver shuffles the input sequence, Uk, in such a way that when introduced to the second encoder, it is more likely to produce a high weight codeword. This is ideal for the code because high weight codewords result in better decoder performance. Intuitively, when one of the encoders produces a 'weak' codeword, the other encoder has a low probability of producing another 'weak' codeword because of the interleaver. The concatenated version of the two codewords is, therefore, a 'strong' codeword. Here, the expression 'weak' is used as a measure of the average Hamming Distance of a codeword from all other codewords. Although the encoder determines the capability for error correction, it is the decoder that determines the actual performance. The performance, however, depends upon which algorithm is used. Since Turbo Decoding is an iterative process, it requires a soft output algorithm like the maximum a-posteriori algorithm (MAP) or the Soft Output Viterbi Algorithm (SOVA) for decoding. Soft output algorithms out-perform hard decision algorithms because they have available a better estimate of what the sent data actually was. 
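A rate-1/3 parallel concatenation in the spirit of Fig. 6.28 can be sketched in a few lines: one systematic stream and two parity streams produced by identical recursive systematic convolutional (RSC) encoders, the second encoder fed through a pseudo-random interleaver. The RSC polynomials chosen below (feedback 1 + D + D^2, feedforward 1 + D^2) and all names are illustrative assumptions, not the particular encoder of the figure.

```python
# Minimal rate-1/3 PCCC (turbo) encoder sketch: systematic bit, parity from
# RSC encoder 1, and parity from RSC encoder 2 acting on interleaved input.
import random

def rsc_parity(bits, feedback=0b111, forward=0b101, m=2):
    """Parity sequence of a simple RSC encoder (zero initial state, not terminated)."""
    state, parity = 0, []
    for b in bits:
        # feedback bit a_t = u_t XOR (taps of the feedback polynomial on past a's)
        a = b ^ (bin(state & (feedback >> 1)).count("1") % 2)
        reg = (state << 1) | a                    # reg bit i holds a_{t-i}
        parity.append(bin(reg & forward).count("1") % 2)
        state = reg & ((1 << m) - 1)
    return parity

def pccc_encode(bits, seed=1):
    perm = list(range(len(bits)))
    random.Random(seed).shuffle(perm)             # pseudo-random interleaver
    p1 = rsc_parity(bits)
    p2 = rsc_parity([bits[i] for i in perm])
    coded = []
    for u, a, b in zip(bits, p1, p2):
        coded += [u, a, b]                        # systematic, parity-1, parity-2
    return coded, perm

coded, perm = pccc_encode([1, 0, 1, 1, 0, 0, 1, 0])
print(coded)
```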
This is because soft output yields a gradient of information about the computed infohnation bit rather than just choosing a 1 or 0 like hard output. A typical Turbo Decoder is shown in Fig. 6.29. The MAP algorithm is often used to estimate the most likely information bit to have been transmitted in a coded sequence. The MAP algorithm is favoured because it outperforms other algorithms, such as the SOVA, under low SNR conditions. The major drawback, however, is that it is more complex than most algorithms because of its focus on each individual bit of information. Research in the area (in late 1990s) has resulted in great simplification of the MAP algorithm. カAMMMMMMMMMMMMMMMMMMMMMMMMMMMMセ@ De-lnter1eaver Decoder1 lnterleaver Final Estimate Fig. 6.29 Block Diagram of a Turbo Decoder.
  • 100. Information Theory, Coding and Cryptography A Turbo Decoder generally uses the MAP algorithm in at least one of its component decoders. The decoding process begins by receiving partial information from the channel (Xk and Yi} and passing it to the first decoder. The rest of the information, parity 2 (Yl ), goes to the second decoder and waits for the rest of the information to catch up. While the second decoder is waiting, the first decoder makes an estimate of the transmitted information, interleaves it to match the format of parity 2, and sends it to the second decoder. The second decoder takes information from both the first decoder and the channel and re-estimates the information. This second estizp.ation is looped back to the first encoder where the process starts again. The iterative process of the Turbo Decoder is illustrated below in Fig. 6.30. esif1'Ste , _ ; セエ・@ セ。|HNXU@ セッヲエQGX||PiG@ ゥtsセセセ@ based,"". 0.-- . セ・ウセ@ ゥヲoHエGZセ@ ·ni0ft1'ai0{' Reeewes ' ne aod セLL@ esiroate ゥヲ。ョウQ・ヲBsセ@ 0{fS セヲ@ ......... Fig. 6.30 Iterative Decoding of Turbo Code. This cycle will continue until certain conditions are met, such as a certain number ofiterations being performed. It is from this iterative process that Turbo Coding gets its name. The decoder circulates estimates of the sent data like a turbo engine circulates air. When the decoder is ready, the estimated information is finally kicked out of the cycle and hard decisions are made in the threshold component. The result is the decoded information sequence. In the following section, we study two decoding methods for the Turbo Codes, in detail. 6.12 TURBO DECODING We have seen that the Viterbi Algorithm is used for the decoding of convolutional codes. The Viterbi Algorithm performs a systematic elimination of the paths in the trellis. However, such luck does not exist for Turbo Decoder. The presence of the interleaver complicates the matter immensely. Before the discovery of Turbo Codes, a lot of work was being done in the area of Convolutional Codes suboptimal decoding strategies for concatenated codes, involving multiple decoders. The symbol-by-symbol maximum a posteriori (MAP) algorithm of Bahl, Cocke, Jelinek and Raviv, published in the IEEE Transactions on Information Theory in March 1974 also received some attention. It was this algorithm, which was used by Berrou et al. in the iterative decoding of their Turbo Codes. In this section, we shall discuss two methods useful for Turbo Decoding: (A) The modified Bahl, Cocke,Jelinek and Raviv (BCJR) Algorithm. (B) Th_e Iterative MAP Decoding. A. MODIFIED BAHL, COCKE, JELINEK AND RAVIV (BCJR) ALGORITHM The modified BCJR Decoding Algorithm is a symbol-by-symbol decoder. The decoder decides uk= +1 if P(uk= +II y) > P(uk= -11 y), (6.44) and it decides uk = -1 otherwise, where y = (yi, y2, ..., Yn) is the noisy received word. More succincdy, the decision uk is given by uk = sign [L(uk )] (6.45) where L(uk ) is the Log A Posteriori Probability (LAPP) Ratio defined as L(u )= lo ( P(uk = +1ly)) k g P(uk = -1ly) (6.46) Incorporating the code's trellis, this may be written as [ セー@ (sk-I= s',sk = s,y)/p(y) J L(uk ) = log , ' IP (sk-I =s 'sk =s, y)/p(y) s- (6.47) where sk e S is the state of the encoder at time k, s+ is the set of ordered pairs (s', s) corresponding to all state transitions {sk-I = f) to {sk = s) caused by data input uk = +1, and s- is similarly defined for uk = -1. 
Let us define
    γ_k(s', s) = p(s_k = s, y_k | s_{k-1} = s'),    (6.48)
    α_k(s) = [ Σ_{s'} α_{k-1}(s') γ_k(s', s) ] / [ Σ_s Σ_{s'} α_{k-1}(s') γ_k(s', s) ],    (6.49)
and
    β_{k-1}(s') = [ Σ_s β_k(s) γ_k(s', s) ] / [ Σ_s Σ_{s'} α_{k-1}(s') γ_k(s', s) ],    (6.50)
  • 101. セi@ :1 j Information Theory, Coding and Cryptography with boundary conditions ao(O) = 1 and lXo ((s :t 0) = 0, fiN(O) = 1 and fiN(s :t 0) = 0. Then the modified BCJR Algorithm gives the LAPP Ratio in the following form B. ITERATIVE MAP DECODING (6.51) (6.52) The decoder is shown in Fig. 6.31. D1 and D2 are the two decoders. Sis the set of 2m constituent encoder states. y is the noisy received word. Using Baye's rule we can write L(ulc) as L(ulc) =log ( p (yiulc =+1) J+ log ( p (ulc =+1) J P (yiulc =-1) P (ulc =-1) (6.53) with the second term representing apn:ori information. Since P(ulc = +1) = P(ulc = -1 ) typically, the a priori term is usually zero for conventional decoders. However, for iterative decoders, D1 receives extrinsic or soft information for each ulc from D2 which serves as a priori information. Similarly, D2 receives extrinsic information from D 1 and the decoding iteration proceeds with the each of the two decoders passing soft information along to the other decoder at each half- iteration except for the first. The idea behind extrinsic information is that D2 provides soft information to D1 for each ub using only information not available to D1. D1 does likewise for D2. N-Bit セdFMゥョエ・イエ@ 01 D2 y1P N-Bit MAP yB e L12 tnter1eaT Decoder2 N-Bit lntertea'f ケセMMMMMMMMMMMMMMMMセ]]]]]]]]MMMMMM⦅⦅ェ@ Fig. 6.31 Implementation of the Iterative Turbo Decoder. At any given iteration, D1 computes Lr(ulc) = L,y}. +L21(ulc)+.G2(ulc) (6.54) where, the first term is the channel value, L, = 4E, I N0 (E, = energy per channel bit), L2r (ulc) is extrinsic information passed from D2 to D1, and .G2 (ulc) is the extrinsic information from D1 to D2. where Convolutional Codes e(' )- [1L P P] yk S 'S - exp 2 cylcXlc ' イセ」HウGLウI@ = exp [ セ@ オセ」Hl・Hオャ」IKlLケォI}NイォHウGLウI@ _Lalr.-1(s')rセイNH@ s' s) alc(s) = ii ')'and a1c_1(s')r セ」HウG@ s s s' Lセ@ lc(s)yセ」HウGLウI@ セャイNMイHウGI@ = .LI · ale-! (s')ylc(s',s) s s' (6.55) (6.56) (6.57) {6.58) {6.59) (6.60) For the above algorithm, each decoder must have full knowledge of the trellis of the constituent encoders, i.e. each decoder must have a table containing the input bits and parity bits for all possible state transitions s' to s. Also, care should be taken that the last m bits of the Nbit information word to be encoded must force encoder1 to the zero state by the セ@ bit. The complexity of convolutional codes has slowed the development of low-cost Turbo Convolutional Codes (TCC) decoders. On the other hand, another type of turbo code, known as Turbo Product Code (TPC), uses block codes, solving multiple steps simultaneously, thereby achieving high data throughput in hardware. We give here a brief introduction to product codes. Let us consider two systematic linear block codes C1 with parameters (nbkl,d1) and セキゥエィ@ parameters HセL@ k;_, セ@ ) where ni, ki and di (i = 1, 2) セエ。ョ、@ for codeword length, number of information bits and minimum Hamming Distance respectively. The concatenation of two block codes (or product code) P = C1 * セゥウ@ obtained (see Fig. 6.32) by the following steps:
  • 102. I n1 Information Theory, Coding and Cryptography ----- MセM n2 セMMMMMMM ---- Check on columns Checks on rows cm」ォゥセ@ -Oft セMMMM セMM Fig. 6.32 Example of a Product Code P = C7 ,. C:c (i) placing (k1 x セ@ ) information bits in an array of k1 rows and セ@ columns, (ii) coding the k1 rows using code C2, (iii) coding the セ」ッャオュョウ@ using code cl. The parameters of the product code Pare : n= n1 * セ@ , k = k1 * セ@ , d =:= d1 * セ@ and the code rate R is given by R1 *R;_ where Ri is the code rate of code C;- Thus, we can build very long block codes with large minimum Hamming Distance by combining short codes with small minimum Hamming Distance. Given the procedure used to construct the product code, it is clear that the Hセ@ - セI@ last columns of the matrix are codewords of C1. By using the matrix generator, one can show that-the last rows of matrix Pare codewords of セM Hence all the rows of matrix Pare codewords of C1 and all the columns of matrix Pare codewords of セM Let us now consider the decoding of the rows and columns of a product code P transmitted ona Gaussian Channel using QPSK signaling. On receiving matrix R corresponding to a transmitted codeword E, the first decoder performs the soft decoding of the rows (or columns) of P using as input matrix R. Soft Input I Soft Output decoding is performed using the new algorithm proposed by R. Pyndiah. By subtracting the soft input from the soft output we obtain the extririsic information W(2) where index 2 indicates that we are considering the extrinsic information for the second decoding of P which was computed during the first decoding of P. The soft input for the decoding of the columns (or rows) at the second decoding ofPis given by R(2) = R + a(2)W(2), (6.61) where a(2) is a scaling factor which takes into account -the fact that the standard deviation of samples in matrix R and in matrix W are different. The standard deviation of the extrinsic information is very high in the first decoding steps and decreases as we iterate the decoding. This scaling factor a is also used to reduce the effect of the extrinsic information in the soft Convolutional Codes decoder in the first decoding steps when the BER is relatively high. It takes a small value ゥセ@ the first decoding steps and increases as the BER tends to 0. tセ・@ decodin? pr?cedure descnbed above is then generalized by cascading elementary decoders Illustrated m Fig. 6.33. a(m) B(m) l W(m + 1) R R DetAY LINE R Fig. 6.33 Block Diagram of Elementary Block Turbo Decoder. Let us now, briefly, look at the performance of Turbo Codes and compare it to that of other existing schemes. As shown in Fig. 6.34, Turbo Codes are the best practical codes due to their performance at lowSNR (at high SNRs, the Reed Solomon Codes ッセエー・イヲッイュ@ Turbo Codes!). セエ@ is obvious from the graph that the Recursive Systematic Convolutional (RSC) Turbo Code 1s the best practical code known so far because it can achieve low BER at セッセ@ SNR and ゥセ@ the closest to the theoretical maximum of channel performance, the Shannon Limit. The magnitude of how well it performs is determined by the coding gain. It can be recalled that the coding gain is the difference in SNR between a coded channel and an uncoded channel for the same performance (BER). Coding gain can be determined by measuring the distance between the 10--{) 10-1 r--- 10-2 10-3 10-4 Bit error 10-S rate QPセ@ 10-7 QPセ@ D [ A 10-9 -1 en 16 ::J ::J 0 ::J -i :::T (1) 0 iセ@ ('i" Ill r 3" 0 -r-- セ@ セゥョァ@ u., |Hセ@ , --...:.. F---r---..... 
Fig. 6.34 Comparison of Different Coding Systems (bit error rate versus signal-to-noise ratio in dB).
  • 103. !· Information Theory, Coding and Cryptography SNR values of any of the coded channels and the uncoded channel at a given BER. For example, the coding gain for the RSC Turbo code, with rate 112 at a BER of 10-5 , is about 8.5 dB. The physical consequence can be visualized as follows. Consider space communication where the received power follows the inverse square law (PR oc 11d2 ). This means that the Turbo coded signal can either be received 2.65 (= -J7) times farther away than the uncoded signal (at the same transmitting power), or it only requires 1/7 the transmitting power (for same transmitting distance). Another way of looking at it is to turn it around and talk about portable device battery lifetimes. For instance, since the RSC Turbo Coded Channel requires only 1/7 the power of the uncoded channel, we can say that a device using a Turbo codec, such as a cell phone, has a battery life 7 times longer than the device without any channel coding. 6.13 CONCLUDING REMARKS The notion of convolutional codes was first proposed by Elias (1954) and later developed by Wozencraft (1957) and Ash (1963). A class of multiple error correcting convolutional code was suggested by Massey (1963). The study of the algebraic structure of convolutional codes was carried out by Massey (1968) and Forney (1970). Viterbi Decoding was developed by AndrewJ. Viterbi, founCler of Qualcomm Corporation. His seminal paper on the technique titled "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," was published in IEEE Transactions on Information Theory, Volume IT-13, pages 260-269, in April, 1967. In 1968, Heller showed that the Viterbi Algorithm is practical if the constraint length is not too large. Turbo Codes represent the next leap forward in error correction. Turbo Codes were introduced in 1993 at the International Conference on Communications (ICC) by Berrou, Glavieux arrl Thitimajshima in their paper "Near-Shannon-Limit Error Correction Coding and Decoding- Turbo-Codes". These codes get their name from the fact that the decoded data are recycled through the decoder several times. The inventors probably found this reminiscent of the way a turbocharger operates. Turbo Codes have been shown to perform within 1 dB of the Shannon Limit at a BER of 1o-5 . They break a complex decoding problem down into simple steps, where each step is repeated until a solution is reached. The term "Turbo Code" is often used to refer to turbo convolutional codes (TCCs)-one form of Turbo Codes. The symbol-by- symbol Maximum A Posteriori (MAP) algorithm of Bahl, Cocke, Jelinek and Raviv, published in 1974 (nineteen years before the introduction of Turbo Codes!), was used by Berrou et al. for the iterative decoding of their Turbo Codes. The complexity of convolutional codes has slowed the development of low-cost TCC decoders. On the other hand, another type of Turbo Code, known as Turbo Product Code (TPC), uses block codes, solving multiple steps simultaneously, thereby achieving high data throughput in hardware. Convolutional Codes SUMMARY • An important sub-class of tree codes is called Convolutional Codes. Convolutional Codes make decisions based on past information, i.e. memory is required. A (AQ, no) tree code that is linear, time-invariant, and has a finite wordlength k= (m + 1)AQ is called an (n, k) Convolutional Code. • For Convolutional Codes, much smaller blocks of uncoded data of length hQ are used. These are called Information Frames. 
These Information Frames are encoded into Codeword Frames of length Tlo· The rate of this Tree Code is defined as R = .5!_. no • The constraint length of a shift register encoder is defined as the number of symbols it can store in its memory. • For Convolutional Codes, the Generator Polynomial Matrix of size hQ x 1lo is given by C{D) = [gij(D)], where, g9 {D) are the generator polynomials of the code. gij(D) are obtained simply by tracing the path from input i to output j. • The Wordlength of a Convolutional Code is given by k= hQ , fi?.3:X [deg giJ (D) + 1], the 1,) Blocklength is given by n= 1lo , セ@ [deg gij(D) + 1] and the constraint length is given by 1,) *<J v = L ュセ@ [deg gij (D)] i= 1 1 • The encoding operation can simply be described as vector matrix product, C(D) = *<J I(D) G(D), or equivalently, c 1(D)= Liz(D)gz,1(D). i=l • A parity check matrix H(D) is an (no - セI@ by 1lo matrix of polynomials that satisfies G(D)H(D)T= 0, and the syndrome polynomial vector which is a (no- AQ)-componentrow vector is given by s (D) = v(D) H (D) T. • A systematic encoder for a convolutional code has the generator polynomial matrix ot the form G(D)= [/I P (D)], where I is a hQ by セ@ identity matrix and P (D) is a セ@ by (no - AQ) matrix of polynomials. The Parity check polynomial matrix for a Systematic Convolutional Encoder is H(D)= [- P(Df I/]. • A Convolutional Code whose generator polynomials g1(D), g2(D),..., griJ(D) satisfy GCD[g1(D), g2(D), ..., griJ(D)] = Jf, for some ais called a Non-Catastrophic Convolutional Code. Otherwise it is called a Catastrophic Convolutional Code. • The lth minimum distance lzof a Convolutional Code is equal to the smallest Hamming Distance between any two initial codeword ウ・セ・ョエウ@ l frame long that are not identical in the initial frame. If l= m+ 1, then this (m + 1) minimum distance is called the minimum distance of the code and is denoted by d*, where m is the number of information frames that can be stored in the memory of the encoder. In literature, the minimum distance is also denoted by dmin .
  • 104. Information Theory, Coding and Cryptography • If the l th minimum distance of a Convolutional Code is d; , the code can correct terrors occurring in the first l frames provided, d[セ@ 2t + 1. The free distance of a Convolutional Code is given by セ・@ = mF [dz]. • The Free Length nfree of a Convolutional Code is the length of the non-zero segment of a smallest weight convolutional codeword of non zero weight. Thus, d1= flt,ee if l= "free, and d1<dfree if l < nfree . In literature, nfree is also denoted by n00 • • Ano!her way to find out the flt,ee of a Convolutional Code is use the concept of a generating function, whose expansion provides all the distance information directly. • The generator matrix for the Convolutional Code is given by [ G0 G1 G2 ··· Gm G= 0 Go G1 .. · Gm- 1 Gm 0 0 G0 Gm _2 Gm -1 Gm 0 0 0 0 '"] 0 0 .. . 0 0 .. . 0 • The Viterbi Decoding Technique is an efficient decoding method for Convolutional ·Codes. Its computational requirements grow exponentially as a function of the constraint length. • For rate R and constraint length, let d be the largest integer that satisfies H( セ@ v ) ,; I - R . Then, at least one Binary Convolutional Code exists with minimum distance dfor which the above inequality holds. Here H(x) is the familiar entropy function for a binary alphabet. • For a binary code with R = llno the minimum distance dmin satisfies dmin セ@ UnoV + no)!2J, where LI Jdenotes the largest integer less than or equal to 1 • An upper bound on dfree is given by Heller is dfree = min ャセMMMMMエ⦅Hカ@ + j -1)j. To j'?.1 2 21 -1 calculate the upper bound, the right hand side should be plotted for different integer values of j. The upper bound is the minimum of this plot. • For Convolutional Codes, the upper bound on the first error probability can be obtained 1 ()T(D, I) by Pe セ@ T(D)ID= 2 セ Q M ) and the bit error probability P6 セM MカpセャMpj@ k ()I r:tl-- 1 = 1,<D= 2-.; p(1- p) • Turbo codes are actually a quasi mix between Block and Convolutional Codes. Turbo Codes typically use at least two convolutional component encoders and two maximum aposteriori (MAP) algorithm component decoders in the Turbo Codes. Although the encoder determines the capability for the error correction, it is the decoder that determines the actual performance. Convolutional Codes セ@ It't-Jc.iwLof.c.·""'-0- do- the,- rmャャGィセ@ • セjj@ セNN@ セ@ i ! 6 f!NrV セᄋ@ ..イセセ@ I i 1 Walt Disney (1901-1966) Gr---------------------------------------------------_j PRV13LEjvtS 6.1 Design a rate 1 12 Convolutional encoder with a constraint length v = 4 and d* = 6. (i) Construct the State Diagram for this encoder. (ii) Construct the Trellis Diagram for this encoder. (iii) What is the dfree for this code? (iv) Give the Generator Matrix, G. (v) Is this code Non-Catastrophic? Why? セ・ウゥァョ@ a (12, 3) systematic convolutional encoder with a constraint length v = 3 and '?.8. (i) Construct the Trellis Diagram for this encoder. (ii) What is the dfree for this code? セッョウゥ、・イ@ the binary encoder shown in Fig. 6.35. Fig. 6.35 (i) Construct the Trellis Diagram for this encoder. '@YW"rite down the values ッヲセG@ no, v, mand R for this encoder. j,: I I ?lt: ,, v; t;.. (iii) What are the values of d* and dfree for this code? /rl::. 4-. ,._ :. セ@ .• セゥセZM the Generator pッャIGiAッセイセ@ . Gセ@ [D+ I pセMQM D'" {/+ dセMエ@ D).l
• 105. 6.4 Consider the binary encoder shown in Fig. 6.36.
Fig. 6.36
(i) Write down the values of k0, n0, v, m and R for this encoder.
(ii) Give the Generator Polynomial Matrix G(D) for this encoder.
(iii) Give the Generator Matrix G for this encoder.
(iv) Give the Parity Check Matrix H for this encoder.
(v) What are the values of d*, d_free and n_free for this code?
(vi) Is this encoder optimal in the sense of the Heller Bound on d_free?
(vii) Encode the following sequence of bits using this encoder: 101 001 001 010 000.
6.5 Consider a convolutional encoder described by its Generator Polynomial Matrix, defined over GF(2):
G(D) = [ D  0  D²  0  1  0  1  D²  0  1+D  D²  0  0  1 ]
(i) Draw the circuit realization of this encoder using shift registers. What is the value of v?
(ii) Is this a Catastrophic Code? Why?
(iii) Is this code optimal in the sense of the Heller Bound on d_free?
6.6 The Parity Check Matrix of the (12, 9) Wyner-Ash code for m = 2 is given as follows.
H = [ 1 1 1 1
      1 1 0 0   1 1 1 1
      1 0 1 0   1 1 0 0   1 1 1 1
      0 0 0 0   1 0 1 0   1 1 0 0   1 1 1 1   ...
      0 0 0 0   0 0 0 0   1 0 1 0   1 1 0 0   ... ]
(i) Determine the Generator Matrix, G.
(ii) Determine the Generator Polynomial Matrix, G(D).
(iii) Give the circuit realization of the (12, 9) Wyner-Ash Convolutional Code.
(iv) What are the values of d* and d_free for this code?
6.7 Consider a Convolutional Encoder defined over GF(4) with the Generator Polynomials g1(D) = 2D³ + 3D² + 1 and g2(D) = D³ + D + 1.
(i) What is the minimum distance of this code?
(ii) Is this code Non-Catastrophic? Why?
6.8 Let the Generator Polynomials of a rate 1/3 binary Convolutional Encoder be given by g1(D) = D³ + D + 1, g2(D) = D³ + D and g3(D) = D³ + 1.
(i) Encode the bit stream: Q_J__1OQQ11110101.
(ii) Encode the bit stream: 1010101010 ...
(iii) Decode the received bit stream: 001001101111000110011.
6.9 Consider a rate 1/2 Convolutional Encoder defined over GF(3) with the Generator Polynomials g1(D) = 2D² + 2D + 1 and g2(D) = D² + D + 2.
(i) Show the circuit realization of this encoder.
(ii) What is the minimum distance of this code?
(iii) Encode the following string of symbols using this encoder: 2012111002102.
(iv) Suppose the error vector is given by 0010102000201. Construct the received vector and then decode this received vector using the Viterbi Algorithm.

COMPUTER PROBLEMS
6.10 Write a computer program that determines the Heller Bound on d_free, given the values for n0 and v.
• 106. 6.11 Write a computer program to exhaustively search for good systematic Convolutional Codes. The program should loop over the parameters k0, n0, v, m, etc. and determine the Generator Polynomial Matrix (in octal format) for the best Convolutional Code in its category.
6.12 Write a program that calculates d* and d_free, given the generator polynomial matrix of any convolutional encoder.
6.13 Write a computer program that constructs all possible rate 1/2 Convolutional Encoders for a given constraint length v and chooses the best code for a given value of v. Using the program, obtain the following plots: (i) the minimum distance, d*, versus v, and (ii) the free distance, d_free, versus v. Comment on the error correcting capability of Convolutional Codes in terms of the memory requirement.
6.14 Write a Viterbi Decoder in software that takes in the following: (i) the code parameters in the Octal Format, and (ii) the received bit stream. The decoder then produces the survivors and the decoded bit stream.
6.15 Verify the Heller Bound on the entries in Table 6.4 for v = 3, 4, ..., 7.
6.16 Write a generalized computer program for a Turbo Encoder. The program should take in the parameters for the two encoders and the type of interleaver. It should then generate the encoded bit-stream when an input (uncoded) bit-stream is fed into the program.
6.17 Modify the Turbo Encoder program developed in the previous question to determine the d_free of the Turbo Encoder.
6.18 Consider the rate 1/3 Turbo Encoder shown in Fig. 6.37. Let the random interleaver size be 256 bits. (i) Find the d_free of this Turbo Encoder. (ii) If the input bit rate is 28.8 kb/s, what is the time delay caused by the Encoder?
6.19 Write a generalized computer program that performs Turbo Decoding using the iterative MAP Decoding algorithm. The program should take in the parameters for the two encoders, the type of interleaver used for encoding and the SNR. It should produce a sequence of decoded bits when fed with a noisy, encoded bit-stream.
6.20 Consider the rate 1/3 Turbo Encoder comprising the following constituent encoders:
G1(D) = G2(D) = ( 1   (1 + D² + D³ + D⁴)/(1 + D + D⁴) ).
The encoded output consists of the information bit, followed by the two parity bits from the two encoders. Thus the rate of the encoder is 1/3. Use a random interleaver of size 256.
Fig. 6.37 Turbo Encoder for Problem 6.18.
(i) For this Turbo Encoder, generate a plot of the bit error rate (BER) versus the signal to noise ratio (SNR). Vary the SNR from −2 dB through 10 dB.
(ii) Repeat the above for an interleaver of size 1024. Comment on your results.
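The Heller bound quoted in the chapter summary (and needed for Computer Problems 6.10 and 6.15) is straightforward to evaluate numerically. The sketch below is only one possible realization, not the book's reference program; it assumes the bound in the form d_free ≤ min_{j≥1} ⌊(2^(j−1)/(2^j − 1))(v + j − 1) n0⌋, which is how the garbled formula in the summary reads, and simply scans a range of j.

```python
import math

def heller_bound(n0: int, v: int, j_max: int = 30) -> int:
    """Upper bound on d_free of a rate-1/n0 convolutional code with
    constraint length v (Heller bound, in the form assumed above)."""
    best = None
    for j in range(1, j_max + 1):
        value = math.floor((2 ** (j - 1)) / (2 ** j - 1) * (v + j - 1) * n0)
        best = value if best is None else min(best, value)
    return best

if __name__ == "__main__":
    # Tabulate the bound for rate-1/2 codes, the kind of check that
    # Problem 6.15 asks for against Table 6.4.
    for v in range(3, 8):
        print(f"n0 = 2, v = {v}: d_free <= {heller_bound(2, v)}")
```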
• 107. 7 Trellis Coded Modulation

7.1 INTRODUCTION TO TCM
In the previous chapters we have studied a number of error control coding techniques. In all these techniques, extra bits are added to the information bits in a known manner. However, the improvement in the Bit Error Rate is obtained at the expense of bandwidth caused by these extra bits. This bandwidth expansion is equal to the reciprocal of the code rate. For example, an RS (255, 223) Code has a code rate R = 223/255 = 0.8745 and 1/R = 1.1435. Hence, to send 100 information bits, we have to transmit 14.35 extra bits (overhead). This translates to a bandwidth expansion of 14.35%. Even for this efficient RS (255, 223) code, the excess bandwidth requirement is not small. In power limited channels (like deep space communications) one may trade the bandwidth expansion for a desired performance. However, for bandwidth limited channels (like the telephone channel), this may not be the ideal option. In such channels, a bandwidth efficient signalling scheme such as Pulse Amplitude Modulation (PAM), Quadrature Amplitude Modulation (QAM) or Multi Phase Shift Keying (MPSK) is usually employed to support high bandwidth efficiency (in bits/s/Hz). In general, either extra bandwidth or a higher signal power is needed in order to improve the performance (error rate). Is it possible to achieve an improvement in system performance without sacrificing either the bandwidth (which translates to the data rate) or using additional power? In this chapter we study a coding technique called the Trellis Coded Modulation Technique, which can achieve better performance without bandwidth expansion or using extra power. We begin this chapter by introducing the concept of coded modulation. We then study some design techniques to construct good Coded Modulation Schemes. Finally, the performance of different Coded Modulation Schemes is discussed for Additive White Gaussian Noise (AWGN) Channels as well as for Fading Channels.

7.2 THE CONCEPT OF CODED MODULATION
Traditionally, coding and modulation were considered two separate parts of a digital communications system. The input message stream is first channel encoded (extra bits are added) and then these encoded bits are converted into an analog waveform by the modulator. The objective of both the channel encoder and the modulator is to correct errors resulting from the use of a non-ideal channel. Both these blocks (the encoder and the modulator) are optimized independently even though their objective is the same, that is, to correct errors introduced by the channel! As we have seen, a higher performance is possible by lowering the code rate at the cost of bandwidth expansion and increased decoding complexity. However, it is possible to obtain Coding Gain without bandwidth expansion if the channel encoder is integrated with the modulator. We illustrate this by a simple example.

Example 7.1 Consider data transmission over a channel with a throughput of 2 bits/s/Hz. One possible solution is to use uncoded QPSK. Another possibility is to first use a rate 2/3 Convolutional Encoder (which converts 2 uncoded bits to 3 coded bits) and then use an 8-PSK signal set which has a throughput of 3 bits/s/Hz. This coded 8-PSK scheme yields the same information data throughput as the uncoded QPSK (2 bits/s/Hz). Note that both the QPSK and the 8-PSK schemes require the same bandwidth. But we know that the
symbol error rate for the 8-PSK is worse than that of QPSK for the same energy per symbol. However, the rate 2/3 convolutional encoder would provide some coding gain. It may be possible that the coding gain provided by the encoder outweighs the performance loss due to the 8-PSK signal set. If the coded modulation scheme performs better than the uncoded one at the same SNR, we can claim that an improvement has been achieved without sacrificing either the data rate or the bandwidth. In this example we have combined a trellis encoder with the modulator. Such a scheme is called a Trellis Coded Modulation (TCM) scheme.
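The loss that comes with expanding the constellation can be made concrete by computing the squared Euclidean distances between M-PSK signal points at the same average symbol energy; these are the numbers tabulated in Fig. 7.1 on the next page. A small sketch, assuming unit-energy PSK constellations (the function names are illustrative, not from the text):

```python
import cmath

def psk_points(M, Es=1.0):
    """Return the M-PSK constellation points with average symbol energy Es."""
    r = Es ** 0.5
    return [r * cmath.exp(2j * cmath.pi * k / M) for k in range(M)]

def squared_distances(M, Es=1.0):
    """Distinct squared Euclidean distances from s0 to the other points."""
    pts = psk_points(M, Es)
    return sorted({round(abs(pts[0] - p) ** 2, 3) for p in pts[1:]})

print("QPSK :", squared_distances(4))   # [2.0, 4.0]            -> 2Es, 4Es
print("8-PSK:", squared_distances(8))   # [0.586, 2.0, 3.414, 4.0]
```

The smallest distance drops from 2Es to 0.586Es when QPSK is expanded to 8-PSK, which is exactly the penalty the coding gain must recover.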
• 108. We observe that the expansion of the signal set to provide redundancy results in the shrinking of the Euclidean distance between the signal points, if the average signal energy is to be kept constant (Fig. 7.1). This reduction in the Euclidean distance increases the error rate, which should be compensated for by coding (increase in the Hamming Distance). Here we are assuming an AWGN channel. We also know that the use of hard-decision demodulation prior to decoding in a coded scheme causes an irreversible loss of information. This translates to a loss of SNR. For coded modulation schemes, where the expansion of the signal set implies a power penalty, the use of soft-decision decoding is imperative. As a result, demodulation and decoding should be combined in a single step, and the decoder should operate on the soft output samples of the channel. For maximum likelihood decoding using soft decisions, the optimal decoder chooses the code sequence which is nearest to the received sequence in terms of the Euclidean Distance. Hence, an efficient coding scheme should be designed based on maximizing the minimum Euclidean Distance between the coded sequences rather than the Hamming Distance.

Fig. 7.1 The Euclidean Distances between the Signal Points for QPSK and 8-PSK (QPSK: δ1² = 2Es, δ2² = 4Es; 8-PSK: δ1² = 0.586 Es, δ2² = 2Es, δ3² = 3.414 Es, δ4² = 4Es).

The Viterbi Algorithm can be used to decode the received symbols for a TCM scheme. In the previous chapter we saw that the basic idea in Viterbi decoding is to trace out the most likely path through the trellis. The most likely path is the one which is closest to the received sequence in terms of the Hamming Distance. For a TCM scheme, the Viterbi decoder chooses the most likely path in terms of the Euclidean Distance. The performance of the decoding algorithm depends on the minimum Euclidean distance between a pair of paths forming an error event.

Definition 7.1 The minimum Euclidean Distance between any two paths in the trellis is called the Free Euclidean Distance, dfree, of the TCM scheme.

In the previous chapter we had defined dfree in terms of the Hamming Distance between any two paths in the trellis. The minimum free distance in terms of Hamming weight could be calculated as the minimum weight of a path that deviates from the all-zero path and later merges back into the all-zero path at some point further down the trellis. This was a consequence of the linearity of Convolutional Codes. However, the same does not apply for the case of TCM, which is non-linear. It may be possible that dfree is the Euclidean Distance between two paths in the trellis neither of which is the all-zero path. Thus, in order to calculate the Free Euclidean Distance for a TCM scheme, all pairs of paths have to be evaluated.

Example 7.2 Consider the convolutional encoder followed by a modulation block performing natural mapping (000 → s0, 001 → s1, ..., 111 → s7) shown in Fig. 7.2. The rate of the encoder is 2/3. It takes in two bits at a time (a1, a2) and outputs three encoded bits (c1, c2, c3). The three output bits are then mapped to one of the eight possible signals in the 8-PSK signal set.

Fig. 7.2 The TCM Scheme for Example 7.2.

This combined encoding and modulation can also be represented using a trellis with its branches labelled with the output symbols si. The TCM scheme is depicted below. This is a fully connected trellis.
Each branch is labelled by a symbol from the 8-PSK constellation diagram. In order to represent the symbol allocation unambiguously, the symbols assigned to the branches are written at the front end of the trellis. The convention is as follows. Consider state 1. The branch from state 1 to state 1 is labelled with s0, the branch from state 1 to state 2 is labelled with s7, the branch from state 1 to state 3 is labelled with s5 and the branch from state 1 to state 4 is labelled with s2. So, the 4-tuple {s0, s7, s5, s2} in front of state 1 represents the branch labels emanating from state 1 in sequence. To encode any incoming bit stream, we follow the same procedure as for a convolutional encoder. However, in the case of TCM, the output is a sequence of symbols rather than a sequence of bits. Suppose we have to encode the bit stream 1 0 1 1 1 0 0 0 1 0 0 1 ... We first group the input sequence in pairs because the input is two bits at a time. The grouped input sequence is 10 11 10 00 ... The TCM encoder output can be obtained simply by following the path in the trellis as dictated by the input sequence. The first input pair is 10. Starting from the first node in state 0, we traverse the third branch emanating from this node as dictated by the input 10. This takes us to state 2. The
• 109. symbol output for this branch is s5. From state 2 we move along the fourth branch as determined by the next input pair 11. The symbol output for this branch is s1. In this manner, the output symbols corresponding to the given input sequence are obtained.

Fig. 7.3 The Path in the Trellis Corresponding to the Input Sequence 10 11 10 00 ... (branch labels from state 0: s0 s7 s5 s2; from state 1: s5 s2 s0 s7).

The path in the Trellis Diagram is depicted by the bold lines in Fig. 7.3. As in the case of a convolutional encoder, in TCM too, every encoded sequence corresponds to a unique path in the trellis. The objective of the decoder is to recover this path from the Trellis Diagram.

Example 7.3 Consider the TCM scheme shown in Example 7.2. The Free Euclidean Distance, dfree, of the TCM scheme can be found by inspecting all possible pairs of paths in the trellis. The two paths that are separated by the minimum squared Euclidean Distance (which yields the d²free) are shown in the Trellis Diagram given in Fig. 7.4 with bold lines.

Fig. 7.4 The Two Paths in the Trellis that have the Free Euclidean Distance, d²free.

d²free = d²(s0, s7) + d²(s0, s0) + d²(s2, s1) = δ1² + 0 + δ1² = 2δ1² = 1.172 Es.

It can be seen that in this case, the error event that results in dfree does not involve the all-zero sequence. As mentioned before, in order to find the dfree, we must evaluate all possible pairs of paths in the trellis. It is not sufficient just to evaluate the paths diverging from and later merging back into the all-zero path, because of the non-linear nature of TCM. We must now develop a method to compare the coded scheme with the uncoded one. We introduce the concept of coding gain below.

Definition 7.2 The difference between the values of the SNR for the coded and uncoded schemes required to achieve the same error probability is defined as the Coding Gain, g:
g = SNR|uncoded − SNR|coded. (7.1)
At high SNR, the coding gain can be expressed as
g∞ = g|SNR→∞ = 10 log [ (d²free/Es)coded / (d²free/Es)uncoded ], (7.2)
where g∞ represents the Asymptotic Coding Gain and Es is the average signal energy. For uncoded schemes, dfree is simply the minimum Euclidean Distance between the signal points.

Example 7.4 Consider the TCM scheme discussed in Example 7.2 in which the encoder takes in 2 bits at a time. If we were to send uncoded bits, we would employ QPSK. The d²free for the uncoded scheme (QPSK) is 2Es from Fig. 7.1. From Example 7.3 we have d²free = 1.172 Es for our TCM scheme. The Asymptotic Coding Gain is then given by g∞ = 10 log (1.172/2) = −2.3 dB. This implies that the performance of our TCM scheme is actually worse than that of the uncoded scheme. A quick look at the convolutional encoder used in this example suggests that it has good properties in terms of Hamming Distance. In fact, it can be verified that this convolutional encoder is optimal in the sense of maximizing the free Hamming Distance. However, the encoder fails to perform well for the case of TCM. This illustrates the point that TCM schemes must be designed to maximize the Euclidean Distance rather than the Hamming Distance.
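The numbers in Examples 7.3 and 7.4 reduce to one-line arithmetic once the two squared free distances are known. A minimal sketch of Eq. (7.2), assuming (as in the text) that both schemes use the same average signal energy Es; the function name is illustrative:

```python
import math

def asymptotic_coding_gain_db(d2free_coded, d2free_uncoded):
    """g_inf = 10 log10[(d2free/Es)_coded / (d2free/Es)_uncoded], Eq. (7.2),
    with both squared distances expressed in units of Es."""
    return 10 * math.log10(d2free_coded / d2free_uncoded)

# Example 7.4: coded 8-PSK scheme with d2free = 1.172 Es versus uncoded QPSK (2 Es)
print(round(asymptotic_coding_gain_db(1.172, 2.0), 2), "dB")   # about -2.3 dB
```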
• 110. For the fully connected trellis discussed in Example 7.2, by a proper choice of the mapping scheme, we can improve the performance. In order to design a better TCM scheme, it is possible to work directly from the trellis onwards. The objective is to assign the 8 symbols from the 8-PSK signal set in such a manner that the dfree is maximized. One approach is to use an exhaustive computer search. There are a total of 16 branches that have to be assigned labels (symbols) from time t_k to t_(k+1). We have 8 symbols to choose from. Thus, an exhaustive search would involve 8^16 different cases! Another approach is to assign the symbols to the branches in the trellis in a heuristic manner so as to increase the dfree. We know that an error event consists of a path diverging in one state and then merging back after one or more transitions, as depicted in Fig. 7.5. The Euclidean Distance associated with such an error event can be expressed as
d²total = d²(diverging pair of paths) + ... + d²(re-merging pair of paths). (7.3)

Fig. 7.5 An Error Event (two paths diverging from and re-merging into common nodes in the trellis).

Thus, in order to design a TCM scheme with a large dfree, we can at least ensure that the d²(diverging pair of paths) and the d²(re-merging pair of paths) are as large as possible. In TCM schemes, a redundant 2^(m+1)-ary signal set is often used to transmit m bits in each signalling interval. The m input bits are first encoded by an m/(m+1) convolutional encoder. The resulting m + 1 output bits are then mapped on to the signal points of the 2^(m+1)-ary signal set. Now, recall that the maximum likelihood decoding rule for the AWGN channel with soft decision decoding is to minimize the squared Euclidean Distance between the received vector and the code vector estimate from the trellis diagram (see Section 6.7, Chapter 6). Therefore, the mapping is done in such a manner as to maximize the minimum Euclidean Distance between the different paths in the trellis. This is done using a rule called Mapping by Set Partitioning.

7.3 MAPPING BY SET PARTITIONING
Mapping by Set Partitioning is based on successive partitioning of the expanded 2^(m+1)-ary signal set into subsets with increasing minimum Euclidean Distances. Each time we partition the set, we reduce the number of the signal points in the subset, but increase the minimum distance between the signal points in the subset. The set partitioning can be understood by the following example.

Example 7.5 Consider the set partitioning of 8-PSK. Before partitioning, the minimum Euclidean Distance of the signal set is Δ0 = δ0. In the first step, the 8 points in the constellation diagram are subdivided into two subsets, A0 and A1, each containing 4 signal points, as shown in Fig. 7.6. As a result of this first step, the minimum Euclidean Distance of each of the subsets is now Δ1 = δ1, which is larger than the minimum Euclidean Distance of the original 8-PSK. We continue this procedure and subdivide the sets A0 and A1 into two subsets each, A0 → {A00, A01} and A1 → {A10, A11}. As a result of this second step, the minimum Euclidean Distance of each of the subsets is now Δ2 = δ2. Further subdivision results in one signal point per subset.

Fig. 7.6 Set Partitioning of the 8-PSK Signal Set (Δ0 = δ0 for the full set; Δ1 = δ1 for the subsets A0, A1; Δ2 = δ2 for the subsets A00, A01, A10, A11).
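The doubling of the intra-subset minimum distance at every partition level can be checked numerically. The sketch below assumes natural labelling of the 8-PSK points, s_k = √Es · exp(j2πk/8), and splits the set first on k mod 2 and then on k mod 4, which reproduces the subsets of Example 7.5; it is an illustration, not the book's procedure.

```python
import cmath
from itertools import combinations

Es = 1.0
pts = {k: Es ** 0.5 * cmath.exp(2j * cmath.pi * k / 8) for k in range(8)}

def min_distance(labels):
    """Minimum Euclidean distance within one subset of 8-PSK points."""
    if len(labels) < 2:
        return float("inf")
    return min(abs(pts[a] - pts[b]) for a, b in combinations(labels, 2))

level0 = [list(range(8))]                                   # full 8-PSK set
level1 = [[k for k in range(8) if k % 2 == b] for b in range(2)]   # A0, A1
level2 = [[k for k in range(8) if k % 4 == b] for b in range(4)]   # A00 ... A11

for name, subsets in [("Delta0", level0), ("Delta1", level1), ("Delta2", level2)]:
    print(name, "=", round(min(min_distance(s) for s in subsets), 3))
# Expected (in units of sqrt(Es)): Delta0 = 0.765, Delta1 = 1.414, Delta2 = 2.0
```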
Consider the expanded 2^(m+1)-ary signal set used for TCM. In general, it is not necessary to continue the process of set partitioning until the last stage. The partitioning can be stopped as soon as the minimum distance of a subset is larger than the desired minimum Euclidean Distance of the TCM scheme to be designed. Suppose the desired Euclidean Distance is obtained just after the (m̃ + 1)th set partitioning step (m̃ ≤ m). It can be seen that after m̃ + 1 steps we have 2^(m̃+1) subsets and each subset contains 2^(m − m̃) signal points. A general structure of a TCM encoder is given in Fig. 7.7. It consists of m input bits, of which m̃ bits are fed into a rate m̃/(m̃ + 1) convolutional encoder while the remaining m − m̃ bits are left uncoded. The m̃ + 1 output bits of the encoder along with the m − m̃ uncoded bits are then input to the signal mapper. The signal mapper uses the m̃ + 1 bits from the convolutional encoder to select one of the possible 2^(m̃+1) subsets. The remaining m − m̃ uncoded bits are used to select one of the 2^(m − m̃) signals from this subset. Thus, the input to the TCM encoder is m bits and the output is a signal point chosen from the original constellation.
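The split just described, coded bits select the subset while uncoded bits select the point inside it, can be sketched in a few lines. The code below is only an illustration of that structure for m = 2, m̃ = 1: the toy_rate_half_encoder and the antipodal 8-PSK subsets are hypothetical stand-ins, not the encoder or the mapping of Fig. 7.7.

```python
def toy_rate_half_encoder(bits):
    """A hypothetical rate-1/2 feedforward encoder (g1 = 1 + D, g2 = 1),
    used only so the sketch runs end to end: one input bit -> two coded bits."""
    state, out = 0, []
    for b in bits:
        out.append((b ^ state, b))   # (b + D*b_prev, b) over GF(2)
        state = b
    return out

def tcm_encode(pairs, signal_set):
    """pairs: list of (uncoded_bit, coded_input_bit), i.e. m = 2 and m_tilde = 1.
    signal_set[subset][point] is a constellation symbol."""
    coded = toy_rate_half_encoder([c for _, c in pairs])
    symbols = []
    for (u, _), (c1, c0) in zip(pairs, coded):
        subset = (c1 << 1) | c0      # m_tilde + 1 = 2 coded bits -> 4 subsets
        point = u                    # m - m_tilde = 1 uncoded bit -> 2 points
        symbols.append(signal_set[subset][point])
    return symbols

# 8-PSK split into 4 subsets of 2 antipodal points (the lowest partition layer).
eight_psk = {s: [s, (s + 4) % 8] for s in range(4)}
print(tcm_encode([(1, 0), (0, 1), (1, 1), (0, 0)], eight_psk))
```

Because the uncoded bit never enters the encoder, changing it moves the output only within a subset, which is exactly the origin of the parallel transitions discussed next.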
• 111. Fig. 7.7 The General Structure of a TCM Encoder (of the m input bits, m̃ bits enter a rate m̃/(m̃ + 1) convolutional encoder whose m̃ + 1 coded bits select the subset, while the m − m̃ uncoded bits select the signal from that subset).

For the TCM encoder shown in Fig. 7.7 we observe that the m − m̃ uncoded bits have no effect on the state of the convolutional encoder because its input is not being altered. Thus, we can change the first m − m̃ bits of the total m input bits without changing the encoder state. This implies that 2^(m − m̃) parallel transitions exist between states. These parallel transitions are associated with the signals of the subsets in the lowest layer of the set partitioning tree. For the case of m = m̃, the states are joined by single transitions. Let us denote the minimum Euclidean Distance between parallel transitions by Δ_(m̃+1) and the minimum Euclidean Distance between non-parallel paths of the trellis by dfree(m̃). The Free Euclidean Distance of the TCM encoder shown in Fig. 7.7 can then be written as
dfree = min [Δ_(m̃+1), dfree(m̃)]. (7.4)

Example 7.6 Consider the TCM scheme proposed by Ungerboeck. It is designed to maximize the Free Euclidean Distance between coded sequences. It consists of a rate 2/3 convolutional encoder coupled with an 8-PSK signal set mapping. The encoder is given in Fig. 7.8 and the corresponding trellis diagram in Fig. 7.9.

Fig. 7.8 The TCM Encoder for Example 7.6 (input bits a1, a2; coded bits c1, c2, c3; natural mapping onto 8-PSK).
  • 112. Information Theory, Coding and Cryptography 7.4 UNGERBOECK'S TCM DESIGN RULES In 1982 Ungerboeck proposed a set of design rules for maximizing the free Euclidean Distance ·for TCM schemes. These design rules are based on heuristics. Rule 1: Parallel Transitions, if present, must be associated with the signals of the subsets in the lowest layer of the Set Partitioning Tree. These signals have the minimum Euclidean Distance Rule 2: The transitions originating from or merging into one state must be associated with signals of the first step of set partitioning. The Euclidean Distance between these signals is at least L11• Rule 3: All signals are used with equal frequency in the Trellis Diagram. 2£& w; t : a a : .a a a Example 7.7 Next, we wish to improve upon the TCM scheme proposed in Example 7.6. We observed in the previous example that the parallel transitions limit the dfree .Therefore, we must come up with a trellis that has no parallel transitions. The absence of parallel paths would imply that the dJ;.ee is not limited todl, the maximum possible separation between two signal points in the 8-PSK Constellation Diagram. Consider the Trellis Diagram shown in Fig. 7.10. The trellis has 8 states. There are no Parallel Transitions in the Trellis Diagram. We wish to assign the symbols from an 8-PSK signal set to the branches of this trellis according to the Ungerboeck rules. Since there are no parallel transitions here, we start directly with Ungerboeck's second rule. We must assign the transitions originating from or merging into one state with signals from the first step of set partitioning. We will refer to Fig. 7.6 for the Set Partitioning Diagram for 8-PSK. The first step of set partitioning yields two subsets, A0 andA1 , each consisting offour signal points. We first focus on the diverging paths. Consider the topmost node (state S0 ). We assign to these four diverging paths the signals s0, s4, s2 ands6 . Note that they all belong to the subsetA0 . fセヲGエィ・@ next node (stateS1 ), we assign the signalss1, s5, s3 ands7 belonging to the subsetA1. For the next node (state S2 ), we assign the signals s4, s0, s6 and s2 belonging to the subset A0• The order has been shuffled to ensure that at the re-merging end we still have signals from the first step of set partitioning. If we observe the four paths that merge into the node of state S0, their branches are labelleds0, s4, s2 ands6, which belong toAo. This clever assignment has ensured that the transitions originating from or merging into one state are labelled with signals from the first step of set partitioning, thus satisfying rule 2. It can be verified that all the signals have been used with equal frequency. We did not have to make any special effort to ensure that. The error event corresponding to the squared free Euclidean Distance is shown in the Trellis Diagram with bold lines. The squared free Euclidean Distance of this TCM Scheme is 4,= 4 (.fo, 56 ) + 4 (.fo, s7 ) + di (.fo, s6 ) = \Uセ@ + \UセK@ \Uセ]@ 4.586 Es Trellis Coded Modulation State So: so S<j セ@ Ss 0 State Sf s1 ss s3 s-, 0 State S2: S<j so Sa s2 0 State S3: Ss s1 s-, s3 0 0 State S4: セ@ Ss So s4 0 0 State S5: s3 s-, s1 ss 0 0 State Ss: ss セ@ s4 so 0 0 State S7: s-, S3 ss s1 0 0 0 Fig. 7.10 The Trellis Diagram for the Encoder in Example 7.7. 
In comparison to uncoded QPSK, this translates to an asymptotic coding gain of goo = 10 log 4 · 586 = 3.60 dB 2 [W] Thus, at the cost ofadded encoding and decoding complexity, we have achieved a 0.6 dB gain over the TCM scheme discussed in Example 7.6. Example 7.8 Consider the 8 state, 8-PSK TCM scheme discussed in Example 7.7. The equivalent systematic encoder realization with feedback is given in Fig. 7.11. 。 Q MMMMMMMMMMMMMMMMMMMMMMMMMMMNMMMMMMMMMMセ」Q@ S; MMMMMMMMMMMMMMセMMMMMMMMMMMMKMMMMMMMMMMMMMMA@ <>.2 Natural 8 2 Mapping (8-PSK) Fig. 7.11 The TCM Encoder for Example 7.7. Let us represent the output of the convolutional encoder shown in Fig. 7.11 in terms of the input and delayed versions of the input (See Section 6.3 of Chapter 6 for analytical representation of Convolutional Codes). From the figure, we have c1 (D)= a1 (D), c2 (D)= a2 (D),
• 113. c3(D) = (D²/(1 + D³)) a1(D) + (D/(1 + D³)) a2(D).
Therefore, the Generator Polynomial Matrix of this encoder is
G(D) = [ 1   0   D²/(1 + D³)
         0   1   D/(1 + D³) ]
and the parity check polynomial matrix, H(D), satisfying G(D) H^T(D) = 0 is
H(D) = [ D²   D   1 + D³ ].
We can re-write the parity check polynomial matrix as H(D) = [H1(D) H2(D) H3(D)], where
H1(D) = D² = (000 100)binary = (04)octal,
H2(D) = D = (000 010)binary = (02)octal,
H3(D) = 1 + D³ = (001 001)binary = (11)octal.
Table 7.1 gives the encoder realization and asymptotic coding gains of some of the good TCM codes constructed for the 8-PSK signal constellation. Almost all of these TCM schemes have been found by exhaustive computer searches. The coding gain is given with respect to the uncoded QPSK. The parity check polynomials are expressed in octal form.

Table 7.1 TCM Schemes Using 8-PSK
Number of states | H1(D) | H2(D) | H3(D) | d²free/Es | Asymptotic coding gain (dB)
4    | -   | 2   | 5   | 4.00 | 3.01
8    | 04  | 02  | 11  | 4.58 | 3.6
16   | 16  | 04  | 23  | 5.17 | 4.13
32   | 34  | 16  | 45  | 5.75 | 4.59
64   | 066 | 030 | 103 | 6.34 | 5.01
128  | 122 | 054 | 277 | 6.58 | 5.17
256  | 130 | 072 | 435 | 7.51 | 5.75

Example 7.9 We now look at a TCM scheme that involves 16-QAM. The TCM encoder takes in 3 bits and outputs one symbol from the 16-QAM Constellation Diagram. This TCM scheme has a throughput of 3 bits/s/Hz and we will compare it with uncoded 8-PSK, which also has a throughput of 3 bits/s/Hz. Let the minimum distance between two points in the signal constellation of 16-QAM be δ0, as depicted in Fig. 7.12. It is assumed that all the signals are equiprobable. Then the average signal energy of a 16-QAM signal is obtained as Es = 2.5 δ0².

Fig. 7.12 Set Partitioning of 16-QAM (Δ0 = δ0, Δ1 = √2 δ0, Δ2 = 2δ0, Δ3 = 2√2 δ0).

Thus we have δ0² = (4/10) Es = 0.4 Es. The Trellis Diagram for the 16-QAM TCM scheme is given in Fig. 7.13. The trellis has 8 states. Each node has 8 branches emanating from it because the encoder takes in 3 input bits at a time (2³ = 8). The encoder realization is given in Fig. 7.14. The Ungerboeck design rules are followed to assign the symbols to the different branches. The branches diverging from a node and the branches merging back into a node are assigned symbols from the sets A0 and A1. The parallel paths are assigned symbols from the lowest layer of the Set Partitioning Tree (A000, A001, etc.). The squared Euclidean Distance between any two parallel paths is Δ3² = 8δ0². This is by design, as we have assigned symbols to the parallel paths from the lowest layer of the Set Partitioning Tree. The minimum squared Euclidean Distance between non-parallel paths is
d² = Δ1² + Δ0² + Δ1² = 5δ0².
Therefore, the free Euclidean Distance for the TCM scheme is
d²free = min {8δ0², 5δ0²} = 5δ0² = 2Es.
• 114. Note that the free Euclidean Distance is determined by the non-parallel paths rather than the parallel paths. We now compare the TCM scheme with the uncoded 8-PSK, which has the same throughput. For uncoded 8-PSK, the minimum squared Euclidean Distance is (2 − √2)Es. Thus, the asymptotic coding gain for this TCM encoder is
g∞ = 10 log [ 2 / (2 − √2) ] = 5.3 dB.

Fig. 7.13 Trellis Diagram for the 16-QAM TCM Scheme (branch labels, in order, from each state: S0: A000 A100 A010 A110; S1: A001 A101 A011 A111; S2: A000 A100 A110 A010; S3: A101 A001 A111 A011; S4: A010 A110 A000 A100; S5: A011 A111 A001 A101; S6: A110 A010 A100 A000; S7: A111 A011 A101 A001).

Fig. 7.14 The Equivalent Systematic Encoder for the Trellis Diagram Shown in Fig. 7.13 (input bits a1, a2, a3; coded bits c1, c2, c3, c4; natural mapping onto 16-QAM).

7.5 TCM DECODER
We have seen that, like Convolutional Codes, TCM schemes are also described using Trellis Diagrams. Any input sequence to a TCM encoder gets encoded into a sequence of symbols based on the Trellis Diagram. The encoded sequence corresponds to a particular path in this trellis diagram. There exists a one-to-one correspondence between an encoded sequence and a path within the trellis. The task of the TCM decoder is simply to identify the most likely path in the trellis. This is based on the maximum likelihood criterion. As seen in the previous chapter, an efficient search method is to use the Viterbi algorithm (see Section 6.7 of the previous chapter). For soft decision decoding of the received sequences using the Viterbi Algorithm, each trellis branch is labelled by the branch metric based on the observed received sequence. Using the maximum likelihood decoder for the Additive White Gaussian Noise (AWGN) channel, the branch metric is defined as the Euclidean Distance between the coded sequence and the received sequence. The Viterbi Decoder finds a path through the trellis which is closest to the received sequence in the Euclidean Distance sense.

Definition 7.3 The branch metric for a TCM scheme designed for the AWGN channel is the Euclidean Distance between the received signal and the signal associated with the corresponding branch in the trellis.

In the next section, we study the performance of TCM schemes over AWGN channels. We also develop some design rules.

7.6 PERFORMANCE EVALUATION FOR AWGN CHANNEL
There are different performance measures for a TCM scheme designed for an AWGN channel. We have already discussed the asymptotic coding gain, which is based on the free Euclidean Distance, dfree. We will now look at some other parameters that are used to characterize a TCM code.

Definition 7.4 The average number of nearest neighbours at free distance, N(dfree), gives the average number of paths in the trellis with free Euclidean Distance dfree from a transmitted sequence. This number is used in conjunction with dfree for the evaluation of the error event probability.

Definition 7.5 Two finite length paths in the trellis form an error event if they start from the same state, diverge and then later merge back. An error event of length l is defined by two coded sequences s_l and s'_l, with s_l = (s_n, s_(n+1), ..., s_(n+l+1)), such that
s_(n+l+1) = s'_(n+l+1) and s_i ≠ s'_i, i = n + 1, ..., n + l. (7.5)

Definition 7.6 The probability of an error event starting at time n, given that the decoder has estimated the correct transmitter state at that time, is called the error event probability, Pe.
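Before turning to the error-probability bounds, the decoder of Section 7.5 can be sketched directly: it is ordinary Viterbi decoding with the Hamming metric replaced by the squared Euclidean branch metric of Definition 7.3. The implementation below is generic over any trellis table; the two-state QPSK trellis used to exercise it is an illustrative assumption, not one of the codes of this chapter.

```python
import cmath

def viterbi_tcm(received, next_state, branch_symbol, num_states):
    """Soft-decision Viterbi decoder for a TCM trellis.
    next_state[state][u] / branch_symbol[state][u] give, for input label u,
    the successor state and the transmitted constellation point."""
    INF = float("inf")
    metric = [0.0] + [INF] * (num_states - 1)          # encoder starts in state 0
    paths = [[] for _ in range(num_states)]
    for r in received:
        new_metric = [INF] * num_states
        new_paths = [None] * num_states
        for st in range(num_states):
            if metric[st] == INF:
                continue
            for u, ns in enumerate(next_state[st]):
                m = metric[st] + abs(r - branch_symbol[st][u]) ** 2   # Euclidean metric
                if m < new_metric[ns]:
                    new_metric[ns] = m
                    new_paths[ns] = paths[st] + [u]
        metric, paths = new_metric, new_paths
    best = min(range(num_states), key=lambda st: metric[st])
    return paths[best]

# Illustrative 2-state trellis mapped onto QPSK (assumed for the demo only).
qpsk = [cmath.exp(2j * cmath.pi * k / 4) for k in range(4)]
next_state = [[0, 1], [0, 1]]
branch_symbol = [[qpsk[0], qpsk[2]], [qpsk[1], qpsk[3]]]
tx_bits = [1, 0, 1, 1, 0]
state, tx = 0, []
for b in tx_bits:                       # re-encode, then decode the noiseless sequence
    tx.append(branch_symbol[state][b])
    state = next_state[state][b]
print(viterbi_tcm(tx, next_state, branch_symbol, 2))   # recovers [1, 0, 1, 1, 0]
```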
, n + L (7.5) Definition 7.6 The probability of an error event starting at time n, given that the decoder has estimated the correct transmitter state at that time, is called the error ・カ・セエ@ probability, Pe-
  • 115. r. r 'i /1 li !i !i II vi r: I 1, •1 .'l Ill II! I! :I ;• I Information Theory, Coding and Cryptography The performance ofTCM schemes is generally evaluated by means ofupper bounds on error event probability. It is based on the generating function approach. Let us consider again the Ungerboek model for rate ml(m + 1) TCM scheme as shown in Fig. 7.7. The encoder takes in mbits at a time and encodes it to m+1 bits, which are then mapped by a memoryless mapper, f(.), on to a symbol s,, Let us call the binary (m + 1)-tuples ci as the labels of the signals si. We observe that there is a one-to-one 」ッイイ・セーッョ、・ョ」・@ between a symbol and its label. Hence, an error event oflength l can be equivalently described by two sequences of labels Cz = (ck, ck+l' ..., 」ォKセャI@ and C[= (ck, c'k+b ..., c GォKセャ@ ), (7.6) where, ck = ck EB ek, c'k+1 = ckt1 EB ek+1 , .•• , and E1=(ek, ek+1 , ••• , ・ォKセャI@ is a sequence of binary error vectors. The mathematical symbol EB represents binary {modulo-2) addition. An error event of length l occurs when the decoder chooses, instead of the transmitted sequence C1, the sequence C[ which corresponds to a path in the trellis diagram that diverges from the original transmitted path and re-merges back exactly after !time intervals. To find the probability of error, we need to sum over all possible values of l the probabilities of error events oflength l (i.e., joint probabilities that C1is transmitted and C[ is detected). The upper bound on the probability of error is obtained by the following union bound 00 p ・セ@ LL LP(sl)P(Sz,s[) (7.7) l = 1s1 s{ *'I where P (s1 s[) denotes the pairwise error probability (i.e., probability that the sequence s1 is transmitted and the sequence s[ is detected). Assuming a one-to-one correspondence between a symbol and its label, we can write 00 pLセ@ LL LP(C1)P(C1,Cl) i=lCr Cl*Cr 00 = LL LP(C1)P(C1 EB E,) {7.8) ャ]QcイセJッ@ The pairwise error probability P (Cz, Cz, E1 ) can be upper-bounded by the Bhattacharyya Bound (see Problem 7.12) as follows -{- 1 -llf(CL)- f(Cr)l2 } P(Cz, Cz, EB E1) セ@ e 4No = eMセNセ@ ..セセヲHcャIM f(C'llf) (7.9) where f(.) is the memoryless mapper. Let D = e-{T セ P@ } (for Additive White Gaussian Noise Channel with single sided power spectral density N0), then Trellis Coded Modulation (7.10) where 、セサヲHc Q@ ),f(C'1)) represents the squared Euclidean distance between the symbol sequences s1and s[. Next, define the function W(Et) = LP(Cl) r)if(Cr)- f(Ct(;f)Etlii 2 Ct (7.11) We can now write the probability of error as 00 ー・セ@ L LW(El) (7.12) I= 1Ez *o From the above equation we observe that the probability of error is upper-bounded by a sum over all possible error events, E1. Note that (7.13) We now introduce the concept of an error state diagram which is essentially a graph whose branches have matrix labels. We assume that the source symbols are equally probable with probabilities 2-m = 11M DefbdtloD ,._,.Theerror weight.matrtx,·.G(e;) is tm Nx Nmatrix whose elementin me·f'.iowtt:nd f1h eolumn is detmed u Tセ@ )t :イセセセeカャHセmI@ セ@ Q GセQTエセャエ@ ·ifthereJl.aセセNM・Bャエッ@ ウエ。エ・セエ@ :_ ' l セ@ ; ' ' [aヲセセヲN@ ·. .· . •セG@ •,;••; .: • :•'• I : . , . ' N。gオエセスセ@ i}FQ,'ifthere is no セMNヲゥZッュNエ。エ・ー」キN@ ヲNᆱセ@ trdiD, {7.14) ·where c 1 -+ fare the label vectorS generatedby the transitionfrom. state .p.to state 1J. The summation accounts for the possible parallel transitions (parallel paths) between states in the Trellis Diagram. 
The entry (p, q) in the matrix G provides an upperbound on the probability that an error event occurs starting from the node pand ending on q. Similarly, (11N)Gl is a vector whose pth entry is a bound on the probability of any error event starting from node p. Now, to any sequence E1 =e1 , e2 , ..., e1, there corresponds a sequence of l error weight matrices G(e1), G(e1), .•. , G(e1). Thus, we have 1 T l fYt:Et) = -1 TI G(en)1 N n=l (7.15) where 1 is a column N-vector all elements of which are unity. We make the following observations: ·.I
  • 116. I> :1 li II iᄋセ@ Information Theory, Coding and Cryptography (i) For any matrix A, 1T A 1 represents the sum of all entries of A. l (ii) The element (p, q) of the matrix = TI G{en) enumerates the Euclidean Distance n=l involved in transition from state pto state qin exactly l steps. Our next job is to relate the above analysis to the probability of error, Pe . It should be noted that the error vectors e1, e2 , ... , e1 are not independent. The Error State Diagram has a structure determined only by the Linear Convolutional Code and differs from the Code State Diagram only in the denomination of its state and branch labels (G(ei)). Since the error vectors ei are simply the differences of the vectors ci, the connections among the vectors ei are the same as that among the vectors ci. Therefore, from (7.12) and (7.15) we have (7.16) where T(D) = セ@ IT Gl, (7.17) and the matrix 00 l G=L L n G(en) (7.18) l =1 Er"O n =1 is the matrix transfer function of the error state diagram. T(D) is called the scalar transfer function or simply the transfer function of the Error State Diagram. &ample 7.10 Consider a rate lf2 TCM scheme with m = 1, and M = 4. It takes one bit at a time and encodes it into two bits, which are then mapped to one of the four QPSK symbols. The two state Trellis Diagram and the symbol allocation from the 4-PSKConstellation is given in Fig. 7.15. 10 00 0 0 0 16 Fig. 7.15 Let us denote the error vector bye= (e2 e1). Then, from (7.14) Trellis Coded Modulation 1 [nllf(OO)- f(OOE!l e2'Illi 2 G(e2el) = 2 nllf(OO)- f(OlE!le2'Il112 = _!_[nllf(OO)- [(t2e1 lll 2 2 nllf(Ol)-f(e21Jll2 nllf(lO)- f{lO Ellt21J. )1 2 ] nllf(ll)- f(ll Ell e2till2 nllf(lO)- f(e21J. )12 ] nllf(ll)- f(e2iJ. IセR@ where e·= 1 ffi e. The error state diagram for this TCM scheme is given in Fig. 7.16. G(10) A G(01) 0 So s1 So Fig. 7.16 The matrix transfer function of the error state diagram is G = G(I0)[/2 -G(ll)r 1 G(Ol) (7.19) (7.20) where 12 is the 2 x 2 identity matrix. In this case there are only three error vectors possible, {01,10,11 }. From (7.19) we calculate G(OI) = セ@ {セZ@ セZ}L@ G(IO) = hセZ@ セZ}N。ョ、@ G(ll) = hセZ@ セZ}@ Using (7.20) we obtain the matrix transfer function of the Error State Diagram as 1 D 6 [1 1] G = 2 1- D6 1 1 The scalar transfer function, T(D), is then given by T(D) = 1__ 1T G1= D 6 2 2 1-D (7.21) (7.22) d b b · · D e- {T セj@ in The upper bound on the probability of error can be compute y su sututmg = (7.22). 6 I P < D -1 e- 2 - 1- D D=t 4No (7.23) Example 7.11 Consider another rate 112 TCM scheme with m = 1, and M = 4. The t:Vo state Trellis Diagram is same as the one in the previous example. However, the symbol allocat1on from the 4-PSK Constellation is different, and is given in Fig. 7.17.
  • 117. Information Theory, Coding and Cryptography 01 10 00 0 0 0 11 Fig. 7.17 Note that this symbol assignment violates the Ungerboek design principles. Let us again denote the error vector bye= (ez et ). The Error State Diagram for this TCM scheme is given in Fig. 7.18. G(11) G(10) s1 Fig. 7.18 G(01) The matrix transfer function of the Error State Diagram is So G = G(11)[lz- G(lO)r1 G(01) (7.24) where lz is the 2 x 2 identity matrix. In this case there are only three error vectors possible {01, 10,11 }. From (7.19) we calculate ' G(OI) = セ@ {セZ@ セZ}N@ G(IO) = セ@ [セZ@ セZ}N。ョ、@ G(ll) = セ@ [セZ@ セZ}@ Using (7.23) we obtain the matrix transfer function of the Error State Diagram as G 1 D 4 [1 1] = 2 1- D4 1 1 (7.25) The scalar transfer function, T(D), is then given by T (D) = _!_ 1T G1= D 4 2 1- D4 (7.26) The upper bound on the probability of error is D4 I P, 1- D4 ' D=e__:_!__ {7.27) -tNo Comparing (7.23) and (7.27) we observe that, simply by changing the symbol assignment to th branches oftheTrellis Diagram, we degrade the performance considerably. In the second examplee エィセ@ upper 「ッオョセ@ on the error probability has loosened by two orders of magnitude(assuming D \セ@ 1, 1.e., for the high SNR case). Trellis Coded Modulation A tighter upper bound on the error event probability is given by (exercise) (Pf] !!h._ -1 p セ@ _!_efrc d free e 4N0 T(D)I _ -tNo e 2 4N D-t . 0 (7.28) From (7.28), an asymptotic estimate on the error event probability can be obtained by considering only the error events with free Euclidean Distance P, セ@ セ@ N(dfr<,) efrc (Jセ@ J (7.29) The bit error probability can be upper bounded simply by weighting the pairwise error probabilities by the number of incorrect input bits associated with each error vector and then dividing the result by m. Therefore, -1 p < 1 aT(D,[) I mo b- m aJ I= l,D= t (7.30) Where T(D, I) is the augmented generating function of the modified State Diagram. The concept of the Modified State Diagram was introduced in the chapter on Convolutional Codes (Section 6.5). A tighter upper bound can also be obtained for the bit error probability, and is given by p < _l_n+..c (Jd]re, Je 1;: aT(D,l) (7.31) e- 2m r:;J'' 4N 0 ai 1 l=l,D=e 4 No From (7.31), we observe that the upper bound on the bit error probability strongly depends ondfne . In the next section, we will learn some methods for estimating dfree • 7.7 COMPUTATION OF drree We have seen that the Euclidean Free Distance, dfree• is the singlemost importaat parameter for determining how good a TCM scheme is for AWGN channels. It defines the asymptotic coding gain of the scheme. In chapter 6 (Section 6.5) we saw that the generating function can be used to calculate the Hamming Free Distance dfrer The transfer function of the error state diagram, T(D), includes information about the distance of all the paths in the trellis from the all zero path. If I(D) is obtained in a closed form, the value of dfree follows immediately from the expansion of the function in a power series. The generating function can be written as d2 d2 T (D) = N (dfree) D free + N (dnext) D next + ... (7.32) where d!xt is the second smallest squared Euclidean Distance. Hence the smallest exponent of Din the series expansion is エゥーN・セ@ However, in most cases, a closed form expression for T(D) may not be available, and one has to resort to numerical techniques. .I
  • 118. Information Theory, Coding and Cryptography Consider the function ¢ (D)= ln [ T(dJ)] 1 T(D) (7.33) ¢I(D) decreases monotonically to the limit dfo 2 as D ---7 0 Therefore we h b d d2 .d ee · ave an upper oun on free proVI ed D>0. In order to obtain a lower bound on d}ee consider the following function ¢2(D) = ln T(D) ln D (7.34) Taking logarithm on both sides of (7.32) we get, d}ee ln D = ln T(D) -ln N (d. ) -ln [1+ N(dfree) d、セ Q@ -dL .. ·] }Tee N(dnext ) ,.•• (7.35) If we take D ---7 0, provided D > 0, from (7.34) and (7.35) we obtain ln T(D) 2 ln D = dfree- e(D) (7.36) where,. e (D) is a function that is greater than zero, and tends to zero monotonically as D ---7 0 Thus, If we take smaller and smaller values of ¢1(D) and ¢2(D), we can obtain val th . extremely close to dfree- ues at are It should be kept in mind that even though d2 ·s th · 1 · - d . free I e smg e most Important parameter to ・エ・イュゥセ・@ the quality of a TCM scheme, two other parameters are also influential: (I) The error coefficient N (d )· A £ t f · · · d . . free · ac or o two mcrease m this error coefficient re uces the codmg gam by approximately 0.2 dB for error rates of 10-6. (ii) The next distance d · · th d all · th . next · IS e secon sm est Euchdean Distance between two pa セ@ formm? an. error event. If dnext is very close to dfrw the SNR requirement for goo approximation of the upper bound on Pe may be very large. So fax:, we ィ。カセ@ セッ」オウウ・、@ primarily on AWGN channels. We found that the best design ウエイ。エセァケ@ Is to maximiZe the free Euclidean Distance, dfret' for the code. In the next section we 」セョウゥ、・イ@ the design rules for TCM over fading channels. Just to remind the readers fading cf =·els セ・@ frequen.tly encountered in radio and mobile communications. One 」ッュュセョ@ cause 0 ュセ@ IS the ュオセエゥー。エィ@ nature of the propagation medium. In this case, the signal arrives at セ・@ ZセZセ・イ@ :om.different ー。セウ@ (with time varying nature) and gets added together. Depending si al セセN@ e signals from セiセ・イ・セエ@ paths add up in phase, or out of phase, the net received Zーャゥセセ・@ I(bitsl a ranthdomhvanld)ation m amplitude and phase. The drops in the received signal e ow a res o are called fades. 7.8 TCM FOR FADING CHANNELS In this section we will co ·d th rl (MPSK) over a Fadin nsi er e pe ormance of trellis coded M-ary Phase Shift Keying g Channel. We know that a TCM encoder takes in an input bit stream and Trellis Coded Modulation outputs a sequence of symbols. In this treatment we will assume that each of these symbols si belong to the MPSK signal set. By using complex notation, each symbol can be represented by a point in the complex plane. The coded signals are interleaved in order to spread the burst of errors caused by the slowly varying fading process. These interleaved symbols are then pulse- shaped for no inter-symbol interference and finally translated to RF frequencies for transmission over the channel. The channel corrupts these transmitted symbols by adding a fading gain (which is a negative gain, or a positive loss, depending on one's outlook) and AWGN. At the receiver end, the received sequences are demodulated and quantized for soft decision decoding. In many implementations, the channel estimator provides an estimate of the channel gain, which is also termed as the channel state information. 
Thus we can represent the received signal at time i as (7.37) where ni is a sample of the zero mean Gaussian noise process with variance N012 and gi is the complex channel gain, which is also a sample of a complex Gaussian process with variance 」イセ@ The complex channel gain can be explicitly written using the phasor notation as follows gi= ai ・ゥセゥL@ (7.38) where ai and ¢i are the amplitude and phase processes respectively. We now make the following assumptions: (i) The receiver performs coherent detection, (ii) The interleaving is ideal, which implies that the fading amplitudes are statistically independent and the channel can be treated as memoryless. Thus, we can write (7.39) We kr1ow that for a channel with a diffused multipath and no direct path the fading amplitude is Rayleigh distributed with Probability Density Function (pdf) (7.40) For the case when there exists a direct path in addition to the multipath, Rician Fading is observed. The pdf of the Rician Fading Amplitude is given by PA (a)= 2a(1 + K)e- (K +a 2 (k +I)10( 2aJK(l + K)), (7.41) where / 0 (.) is the zero-order, modified Bessel Function of the first kind and K is the Rician Parameter defined as follows. Definition 7.8 The Rician Parameter K is defined as the ratio of the energy of the direct 」ッューッョセョエ@ to the energy of the diffused multipath component. For the extreme case of K = 0, the pdf of the Rician distribution becomes the same as the pdf of the Rayleigh Distribution.
  • 119. Information Theory, .Coding and Cryptography We now look at the performance of the TCM scheme over a fading channel. Let r1= (r1, r2, ...,rei) be the received signal. The maximum likely decoder, which is usually implemented by the Viterbi decoder, chooses the coded sequence that most likely corresponds to the received signals. This is achieved by computing a metric between the sequence of received signals, rtl and the possible transmitted signals, St As we have seen earlier, this metric is related to the conditional channel probabilities m(r1, s1) =In p(r11s& (7.42) If the channel state information is being used, the metric becomes m(r[, St; aI) =In p(rzlsz, aI) (7.43L Under the assumption of ideal interleaving, the channel is memoryless and hence the metrics can be expressed as the following summations l m(rz, s1 ) =lin p(r1is1) (7.44) i=l and l m(rz, hz; al) = llnp(rzisz,at) (7.45) i=l First, we consider the scenario where the channel state information is known, i.e., a; = a;. The metric can be written as m(r;, s;; a;) =- ir;- a; s;i2 Therefore, the pairwise error probability is given by P2(Sz, Sz) = Eal [P2(sb sii az)], where (7.46) (7.47) (7.48) and E is the statistical expectation operator. Using the Chernoff Bound, the pairwise error probability can be upper bounded as follows. A l l+K [ kTセッャウ@ .. -s..l2] P2(sz, Sz) セii@ 1 exp - MMMMMZ[MセMMM i=Il+K + 4No Is; -i.-1 l+K 4No lsi -iil2 For high SNR, the above equation simplifies to A (1 + K)e- K P2(sz, sz) セイイMMMM]MQMMM iETJ 4N is;- s;i2 0 (7.49) (7.50) where 11 is.the set of all ifor which S; 7= si. Let us denote the number of elements in 17 by セ@ , then we can wnte Trellis Coded Modulation (7.51) where d; HセI@ セ@ 111S; - s;l2 (7.52) iETJ is the squared product distance of the signals s; 7= s; .The term セ@ is called the effective length of the error event (sz, 7= i 1). A union bound on the error event probability Pehas already been discussed before. For the high SNR case, the upper bound on Pecan be expressed as 2 ((1 + K)e- K)lry p・セ@ I I 。{セL、ー@ (lTJ)] try セ、セHセI@ HTセセI@ d;(l") (7.53) where a [l11 , 、セ@ HセI}ゥウ@ the average number of code sequences having the effective length lTI and the squared product distance dl (セIN@ The error event probability is actually dominated by the smallest effective length セ@ and the smallest product distance 、セ@ (セIN@ Let us denote the smallest effective length セ@ by L and the corresponding product distance by 、セ@ (セIN@ The error event probability can then be asymptotically approximated by ((1 + K)e-K)L Pe z a (L, d; (L)) L · (TセッI@ d; (L) (7.54) We make the following observations from (7.54) (i) The error event probability asymptotically varies with the Ltb power of SNR. This is similar to what is achieved with a time diversity technique. Hence, Lis also called the time diversity of the TCM scheme. (ii) The important TCM design parameters for fading Channel are the time diversity, L, and the product distance dJ(L). This is in contrast to the free Euclidean Distance parameter for AWGN channel. (iii) TCM codes designed for AWGN channels would normally fare poorly in fading channels and vice versa. (iv) For large values of the Rician parameter, K, the effect of the free Euclidean Distance on the performance of the TCM scheme becomes dominant. (v) At low SNR, again, the free Euclidean Distance becomes important for the performance of the TCM scheme. Thus the basic design rules for TCMs for fading channels, at high SNR and for small values of K, are ,
  • 120. ' I rl Information Theory, Coding and Cryptography (i) maximize the effective length, L, of the code, and (ii) minimize the minimum product distance dj (L). Consider a TCM scheme with effective length, L, and the minimum product distance ih(L). Suppose the code is redesigned to yield a minimum product distance, ih(L) with the same L. The increa,se in the coding gain due the increase in the minimum product distance is given by 10 、セ@ (L)a1 L1 = SNR1 - SNR;.iP P ]MャッァセMM g el- セ@ L 、セ@ (L)a2 ' (7.55) where ai, i= 1, 2, is the average number of code sequences with effective length L for the TCM scheme i. We observe that for a fixed value of L, increasing the minimum product distance corresponding to a smaller value of L is more effective in improving the performance of the code. So far, we have assumed that the channel state information was available. A similar analysis as carried out for the case where channel state information was available can also be done when the information about the channel is unavailable. In the absence of channel state information, the metric can be expressed as m (r;, sj; aj) = -lri- sil2 . (7.56) After some mathematical manipulations, it is shown that A (2e/"')" {LセiウLM .i,l2r R (s1 s1 ) < (1 + K)lr, e-t 11 K 2 ' - (l!No)lr, dj(ZTI) (7.57) Using arguments discussed earlier in this section, the error event probability Pecan be determined for this case when the channel state information is not available. 7.9 CONCLUDING REMARKS Coding and modulation were first analyzed together as a single entity by Massey in 1974. Prior to that time, in all coded digital communications systems, the encoder/decoder and the modulator/demodulator were designed and optimized separately. Massey's idea of combined coding and modulation was concretized in the seminal paper by Ungerboeck in 1982. Similar ideas were also proposed earlier by Imai and Hirakawa in 1977, but did not get due attention. The primary advantage ofTCM was its ability to achieve increased power efficiency without the customary increase in the bandwidth introduced by the coding process. In the following years the theory of TCM was formalized by different researchers. Calderbank and Mazo showed that the asymmetric one-dimensional TCM schemes provide more coding gain than symmetric TCM schemes. Rotationally invariant TCM schemes were proposed by Wei in 1984, which were subsequently adopted by CCITT for use in the new high speed voiceband modems. Trellis Coded Modulation SUMMARY • The Trellis Coded Modulation (TCM) Technique allows us to achieve a better performance without bandwidth expansion or using extra power. • The minimum Euclidean Distance between any two paths in the trellis is called the free Euclidean Distance, セ・・@ of the TCM scheme. • The difference between the values of the SNR for the coded and uncoded schemes required to achieve the same error probability is known as the coding gain, g= SNRiuncoded - SNRicoded' At high SNR, the coding gain can be expressed as g, = giSNR--?= = 10 log (dfee IEs)coded , where g"" represents the Asymptotic Coding Gain and Es is the average (dfree / Es)uncoded signal energy. • The mapping by Set Partitioning is based on successive partitioning of the expanded 2m+1-ary signal set into subsets with increasing minimum Euclidean Distance. Each time we partition the set, we reduce the number of the signal points in the subset, but increase the minimum distance between the signal points in the subset. 
• Ungerboeck's TCM design rules (based on heuristics) for AWGN channels are
Rule 1: Parallel transitions, if present, must be associated with the signals of the subsets in the lowest layer of the Set Partitioning Tree. These signals have the minimum Euclidean Distance \Delta_{m+1}.
Rule 2: The transitions originating from or merging into one state must be associated with signals of the first step of set partitioning. The Euclidean Distance between these signals is at least \Delta_1.
Rule 3: All signals are used with equal frequency in the Trellis Diagram.
• The Viterbi Algorithm can be used to decode the received symbols for a TCM scheme. The branch metric used in the decoding algorithm is the Euclidean Distance between the received signal and the signal associated with the corresponding branch in the trellis.
• The average number of nearest neighbours at free distance, N(d_free), gives the average number of paths in the trellis with free Euclidean Distance d_free from a transmitted sequence. This number is used in conjunction with d_free for the evaluation of the error event probability.
• The probability of error P_e \le T(D)|_{D = e^{-1/(4N_0)}}, where T(D) = \mathbf{1}^T \mathbf{G}\, \mathbf{1} is the scalar transfer function and the matrix \mathbf{G} = \sum_{l=1}^{\infty} \sum_{\mathbf{e}_l \ne \mathbf{0}} \prod_{n=1}^{l} \mathbf{G}(e_n). A tighter upper bound on the error event probability is given by

P_e \le \frac{1}{2}\, \mathrm{erfc}\left( \sqrt{\frac{d_{free}^2}{4N_0}} \right) e^{d_{free}^2/(4N_0)}\; T(D)\big|_{D = e^{-1/(4N_0)}}
• 121. Information Theory, Coding and Cryptography

• For fading channels, P_2(s_l, \hat{s}_l) \le \frac{\left((1+K)e^{-K}\right)^{l_\eta}}{\left(\frac{1}{4N_0}\right)^{l_\eta} d_p^2(l_\eta)}, where d_p^2(l_\eta) = \prod_{i \in \eta} |s_i - \hat{s}_i|^2. The term l_\eta is the effective length of the error event (s_l, \hat{s}_l) and K is the Rician parameter. Thus, the error event probability is dominated by the smallest effective length l_\eta and the smallest product distance d_p^2(l_\eta).
• The design rules for TCMs for fading channels, at high SNR and for small values of K, are (i) maximize the effective length, L, of the code, and (ii) maximize the minimum product distance d_p^2(L).
• The increase in the coding gain due to an increase in the minimum product distance is given by \Delta g = SNR_1 - SNR_2 |_{P_{e1} = P_{e2}} = \frac{10}{L} \log \frac{d_{p2}^2(L)\, a_1}{d_{p1}^2(L)\, a_2}, where a_i, i = 1, 2, is the average number of code sequences with effective length L for the TCM scheme i.

PROBLEMS

7.1 Consider a rate 2/3 Convolutional Code defined by
G(D) = [ 1      D      D + D^2
         D^2    1 + D  1 + D + D^2 ]
This code is used with an 8-PSK signal set that uses Gray Coding (the three bits per symbol are assigned such that the codes for two adjacent symbols differ only in 1 bit location). The throughput of this TCM scheme is 2 bits/sec/Hz.
(a) How many states are there in the Trellis Diagram for this encoder?
(b) Find the free Euclidean Distance.
(c) Find the Asymptotic Coding Gain with respect to uncoded QPSK, which has a throughput of 2 bits/sec/Hz.

7.2 In Problem 7.1, suppose instead of Gray Coding, natural mapping is performed, i.e., S_0 -> 000, S_1 -> 001, ..., S_7 -> 111.
(a) Find the free Euclidean Distance.
(b) Find the Asymptotic Coding Gain with respect to uncoded QPSK (2 bits/sec/Hz).

7.3 Consider the TCM encoder shown in Fig. 7.19.
Fig. 7.19 Figure for Problem 7.3.
(a) Draw the State Diagram for this encoder.
(b) Draw the Trellis Diagram for this encoder.
(c) Find the free Euclidean Distance, d_free. In the Trellis Diagram, show one pair of paths which result in d_free. What is N(d_free)?
(d) Next, use set partitioning to assign the symbols of 8-PSK to the branches of the Trellis Diagram. What is the d_free now?
(e) Encode the following bit stream using this encoder: 1 0 0 1 0 0 0 1 0 1 0 ... Give your answer for both the natural mapping and the mapping using Set Partitioning.
(f) Compare the asymptotic coding gains for the two different kinds of mapping.

7.4 We want to design a TCM scheme that has a 2/3 convolutional encoder followed by a signal mapper. The mapping is done based on set partitioning of the Asymmetric Constellation Diagram shown below. The trellis is a four-state, fully connected trellis.
(a) Perform Set Partitioning for the following Asymmetric Constellation Diagram.
(b) What is the free Euclidean Distance, d_free, for this asymmetric TCM scheme? Compare it with the d_free for the case when we use the standard 8-PSK Signal Constellation.
(c) How will you choose the value of the angle θ for improving the performance of the TCM scheme using the Asymmetric Signal Constellation shown in Fig. 7.20?
Fig. 7.20 Figure for Problem 7.4.
• 122. Information Theory, Coding and Cryptography

7.5 Consider the rate 3/4 encoder shown in Fig. 7.21. The four output bits from the encoder are mapped onto one of the sixteen possible symbols from the Constellation Diagram shown below. Use Ungerboeck's design rules to design a TCM scheme for an AWGN channel. What is the asymptotic coding gain with respect to uncoded 8-PSK?
Fig. 7.21 Figure for Problem 7.5.

7.6 Consider the expression for the pairwise error probability over a Rician Fading Channel. Comment.
(b) Show that for low SNR the original inequality may be expressed as
P_2(s_l, \hat{s}_l) \le \exp\left[ -\frac{d_E^2(s_l, \hat{s}_l)}{4N_0} \right]

7.7 Consider a TCM scheme designed for a Rician Fading Channel with an effective length L and the minimum product distance d_p^2(L). Suppose we wish to redesign this code to obtain an improvement of 3 dB in SNR.
(a) Compute the desired effective length L if the d_p^2(L) is kept unchanged.
(b) Compute the desired product distance d_p^2(L) if the effective length L is kept unchanged.

7.8 Suppose you have to design a TCM scheme for an AWGN channel (SNR = γ). The desired BER is P_e. Draw a flowchart as to how you will go about designing such a scheme.
(a) How many states will there be in your trellis?
(b) How will you design the convolutional encoder?
(c) Would you have parallel paths in your design?
(d) What kind of modulation scheme will you choose and why?
(e) How will you assign the symbols of the modulation scheme to the branches?

7.9 For Viterbi decoding the metric used is of the form m(r_l, s_l) = \ln p(r_l | s_l).
(a) What is the logic behind choosing such a metric?
(b) Suggest another metric that will be suitable for fading channels. Give reasons for your answer.

7.10 A TCM scheme designed for a Rician Fading Channel (K = 3) and a high SNR environment (SNR = 20 dB) has L = 5 and d_p^2(L) = 2.34 E_s^L. It has to be redesigned to produce an improvement of 2 dB.
(a) What is the d_p^2(L) of the new code?
(b) Comment on the new d_free.

7.11 Consider the TCM scheme shown in Fig. 7.22 consisting of a rate 1/2 convolutional encoder coupled with a mapper.
(a) Draw the Trellis Diagram for this encoder.
(b) Determine the scalar transfer function, T(D).
(c) Determine the augmented generating function, T(D, L, I).
(d) What is the minimum Hamming Distance (d_free) of this code?
(e) How many paths are there with this d_free?
Fig. 7.22 Figure for Problem 7.11.

7.12 Consider the pairwise error probability P_2(s_l, \hat{s}_l).
(a) For a maximum likelihood decoder, prove that
P_2(s_l, \hat{s}_l) = \int f(\mathbf{r})\, p_{R|S}(\mathbf{r} | s_l)\, d\mathbf{r}
where \mathbf{r} is the received vector, p_{R|S}(\mathbf{r} | s_l) is the channel transition probability density function, and f(\mathbf{r}) = 1 if p_{R|S}(\mathbf{r} | \hat{s}_l) \ge p_{R|S}(\mathbf{r} | s_l), and f(\mathbf{r}) = 0 otherwise.
• 123. Information Theory, Coding and Cryptography

(b) Show that
f(\mathbf{r}) \le \sqrt{ \frac{p_{R|S}(\mathbf{r} | \hat{s}_l)}{p_{R|S}(\mathbf{r} | s_l)} }
and hence
P_2(s_l, \hat{s}_l) \le \int \sqrt{ p_{R|S}(\mathbf{r} | \hat{s}_l)\, p_{R|S}(\mathbf{r} | s_l) }\, d\mathbf{r}

COMPUTER PROBLEMS

7.13 Write a computer program to perform trellis coded modulation, given the trellis structure and the mapping rule. The program should take in an input bit stream and output a sequence of symbols. The input to the program may be taken as two matrices, one that gives the connectivity between the states of the trellis (essentially the structure of the trellis) and the second, which gives the branch labels.
7.14 Write a computer program to calculate the squared free Euclidean distance d_free^2, the effective length L, and the minimum product distance, d_p^2(L), of a TCM scheme, given the Trellis Diagram and the labels on the branches.
7.15 Write a computer program that performs Viterbi decoding on an input stream of symbols. This program makes use of a given trellis and the labels on the branches of the Trellis Diagram.
7.16 Verify the performance of the different TCM schemes given in this chapter in an AWGN environment. To do so, take a long chain of random bits and input it to the TCM encoder. The encoder will produce a sequence of symbols (analog waveforms). Corrupt these symbols with AWGN of different noise power, i.e., simulate scenarios with different SNRs. Use Viterbi decoding to decode the received sequence of corrupted symbols (distorted waveforms). Generate a plot of the BER versus the SNR and compare it with the theoretically predicted error rates.
7.17 Write a program to observe the effect of decoding window size for the Viterbi decoder. Generate a plot of the error rate versus the window size. Also plot the number of computations versus the window size.
7.18 Write a computer program that performs exhaustive search in order to determine a rate 2/3 TCM encoder which is designed for AWGN (maximize d_free). Assume that there are four states in the Trellis Diagram and it is a fully connected trellis. The branches of this trellis are labelled using the symbols from an 8-PSK signal set. Modify the program to perform exhaustive search for a good TCM scheme with a four-state trellis with the possibility of parallel branches.
7.19 Write a computer program that performs exhaustive search in order to determine a rate 2/3 TCM encoder which is designed for a fading channel (maximize d_p^2(L)). Assume that there are four states in the trellis diagram and it is a fully connected trellis. The branches of this trellis are labelled using the symbols from an 8-PSK signal set. List out the d_p^2(L) and L of some of the better codes found during the search.
7.20 Draw the family of curves depicting the relation between P_e and L_eff for different values of K (Rician parameter) for (a) high SNR, (b) low SNR. Comment on the plots.
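As a starting point for the computer problems above, the following is a minimal Python sketch (our own illustration, not part of the text) that evaluates, for a single error event, the fading-channel quantities of Section 7.8: the effective length l_eta, the squared product distance d_p^2(l_eta) of (7.52), and the high-SNR pairwise error probability bound of (7.51). The example symbol sequences, the Rician parameter and the noise level are illustrative assumptions only.

import numpy as np

def error_event_parameters(s, s_hat):
    """Effective length l_eta and squared product distance d_p^2 (eqs. 7.51-7.52)
    for an error event between two complex symbol sequences."""
    s, s_hat = np.asarray(s), np.asarray(s_hat)
    diff_sq = np.abs(s - s_hat) ** 2          # |s_i - s_hat_i|^2 for each symbol
    eta = diff_sq > 1e-12                     # positions where the sequences differ
    l_eta = int(np.sum(eta))                  # effective length
    d_p2 = float(np.prod(diff_sq[eta]))       # squared product distance
    return l_eta, d_p2

def pep_high_snr(l_eta, d_p2, K, N0):
    """High-SNR pairwise error probability bound (7.51) for a Rician channel."""
    return ((1 + K) * np.exp(-K)) ** l_eta / ((1.0 / (4 * N0)) ** l_eta * d_p2)

# Illustrative unit-energy 8-PSK sequences; only two positions differ.
psk = np.exp(1j * 2 * np.pi * np.arange(8) / 8)
s, s_hat = psk[[0, 2, 4, 6]], psk[[0, 3, 4, 7]]

l_eta, d_p2 = error_event_parameters(s, s_hat)
print("effective length:", l_eta, " squared product distance:", round(d_p2, 3))
print("PEP bound:", pep_high_snr(l_eta, d_p2, K=3, N0=0.05))

The same two helper functions can be reused inside an exhaustive code search of the kind asked for in Problems 7.18 and 7.19.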
  • 124. 8 Cryptography ff, 8.1 INTRODUCTION TO CRYPTOGRAPHY Cryptography is the science of devising methods that allow information to be sent in a secure form· in such a way that the only person able to retrieve this information is the intended recipient. Encryption is based on algorithms that scramble information into unreadable or non- discernible form. Decryption is the process of restoring the scrambled information to its original form (see Fig. 8.1). A Cryptosystem is a collection of algorithms and associated procedures for hiding and revealing (un-hiding!) information. Cryptanalysis is the process (actually, the art) of analyzing a cryptosystem, either to verify its integrity or to break it for ulterior motives. An attacker is a person or system that performs unauthorised cryptanalysis in order to break a cryptosystem. Attackers are also referred to as hackers, interlopers or eavesdroppers. The process of attacking a cryptosystem is often called cracking. The job of the cryptanalyst is to find the weaknesses in the cryptosystem. In many cases, the developers of a cryptosystem announce a public challenge with a large prize-money for anyone
  • 125. Information Theory, Coding and Cryptography who can crack the scheme. Once a cryptosystem is broken (and the cryptanalyst discloses his techniques), the designers of the scheme try to strengthen the algorithm. Just because a cryptosystem has been broken does not render it useless. The hackers may have broken the system under optimal conditions using equipment (fast computers, dedicated microprocessors, etc.) that is usually not available to common people. Some cryptosystems are rated in terms of the length of time and the price of the computing equipment it would take to break them! In the last few decades, cryptographic algorithms, being mathematical in nature, have become so advanced that they can only be handled by computers. This, in effect, means that the uncoded message (prior to encryption) is binary in form, and can therefore be anything; a picture, a voice, a text such as an e-mail or even a video. Fig. 8.1 The Process of Encryption and Decryption. Cryptography is not merely used for military and diplomatic communications as many people tend to believe. In reality, cryptography has many commercial uses and applications. From protecting confidential company information, to protecting a telephone call, to allowing someone to order a product on the Internet without the fear of their credit card number being intercepted and misused, cryptography is all about increasing the level of privacy of individuals and groups. For example, cryptography is often used to prevent forgers from counterfeiting winning lottery tickets. Each lottery ticket can have two numbers printed onto it, one plaintext and one the corresponding cipher. Unless the counterfeiter has cryptanalyzed the lottery's cryptosystem he or she will not be able to print an acceptable forgery. The chapter is organized as follows. We begin with an overview of different encryption techniques. We will, then, study the concept of secret-key cryptography. Some specific secret- key cryptographic techniques will be discussed in detail. The public-key cryptography will be introduced next. Two popular public-key cryptographic techniques, the RSA algorithm and PGP, will be discussed in detail. A flavour of some other cryptographic techniques in use today will also be given. The chapter will conclude with a discussion on cryptanalysis and the politics of cryptography. 8.2 AN OVERVIEW OF ENCRYPTION TECHNIQUES The goal of a cryptographic system is to provide a high level of confidentiality, integrity, non- repudiability and authenticity to information that is exchanged over networks. Cryptography Confidentiality of messages and stored data is protected by hiding information using encryption techniques. Message integrity ensures that a message remains unchanged from the time it is created to the time it is opened by the recipient. Non-repudiation can provide a way of proving that the message came from someone even if they try to deny it. Authentication provides two services. First, it establishes beyond doubt the origin of the message. Second, it verifies the identity of a user logging into a system and continues to verify their identity in case someone tries to break into the system. Definition 8.1 A message being sent is known as plaintext The message is code<J. using a Cryptographic Algorithm. This process is called encryption. An encrypted message is known as ciphertext, and is turned back into plaintext by the process.of decryption. 
It must be assumed that any eavesdropper has access to all communication between the sender and the recipient. A method of encryption is only secure if, even with this complete access, the eavesdropper is still unable to recover the original plaintext from the ciphertext. There is a big difference between security and obscurity. Suppose a message is left for somebody in an airport locker, and the details of the airport and the locker number are known only to the intended recipient; then this message is not secure, merely obscure. If, however, all potential eavesdroppers know the exact location of the locker, and they still cannot open the locker and access the message, then this message is secure.

Definition 8.2 A key is a value that causes a Cryptographic Algorithm to run in a specific manner and produce a specific ciphertext as an output. The key size is usually measured in bits. The bigger the key size, the more secure will be the algorithm.

Example 8.1 Suppose we have to encrypt and send the following stream of binary data (which might be originating from voice, video, text or any other source)
0 1 1 0 0 0 1 0 1 0 0 1 1 1 1 1 ...
We can use a 4-bit long key, x = 1011, to encrypt this bit stream. To perform encryption, the plaintext (binary bit stream) is first subdivided into blocks of 4 bits:
0110 0010 1001 1111 ...
Each sub-block is XORed (binary addition) with the key, x = 1011. The encrypted message will be
1101 1001 0010 0100 ...
The recipient must also possess the knowledge of the key in order to decrypt the message. The decryption is fairly simple in this case. The ciphertext (the received binary bit stream) is first subdivided into blocks of 4 bits. Each sub-block is XORed with the key, x = 1011. The decrypted message will be the original plaintext
• 126. Information Theory, Coding and Cryptography

0110 0010 1001 1111 ...
It should be noted that just one key is used both for encryption and decryption.

Example 8.2 Let us devise an algorithm for text messages, which we shall call character + x. Let x = 5. In this encryption technique, we replace every alphabet by the fifth one following it, i.e., A becomes F, B becomes G, C becomes H, and so on. The recipients of the encrypted message just need to know the value of the key, x, in order to decipher the message.

The key must be kept separate from the encrypted message being sent. Because there is just one key which is used for encryption and decryption, this kind of technique is called Symmetric Cryptography or Single Key Cryptography or Secret Key Cryptography. The problem with this technique is that the key has to be kept confidential. Also, the key must be changed from time to time to ensure secrecy of transmission. This means that the secret key (or the set of keys) has to be communicated to the recipient. This might be done physically. To get around this problem of communicating the key, the concept of Public Key Cryptography was developed by Diffie and Hellman. This technique is also called Asymmetric Encryption. The concept is simple. There are two keys, one is held privately and the other one is made public. What one key can lock, the other key can unlock.

Example 8.3 Suppose we want to send an encrypted message to recipient A using the public key encryption technique. To do so we will use the public key of recipient A and use it to encrypt the message. When the message is received, recipient A decrypts it with his private key. Only the private key of recipient A can decrypt a message that has been encrypted with his public key. Similarly, recipient B can only decrypt a message that has been encrypted with his public key. Thus, no private key ever needs to be communicated and hence one does not have to trust any communication channel to convey the keys.

Let us consider another scenario. Suppose we want to send somebody a message and also provide a proof that the message is actually from us (a lot of harm can be done by providing bogus information, or rather, misinformation!). In order to keep a message private and also provide authentication (that it is indeed from us), we can perform a special encryption on the plaintext with our private key, then encrypt it again with the public key of the recipient. The recipient uses his private key to open the message and then uses our public key to verify the authenticity. This technique is said to use Digital Signatures.

There is another important encryption technique called the One-way Function. It is a non-reversible, quick encryption method. The encryption is easy and fast, but the decryption is not. Suppose we send a document to recipient A and want to check at a later time whether the document has been tampered with. We can do so by running a one-way function, which produces a fixed-length value called a hash (also called the message digest). The hash is the unique signature of the document that can be sent along with the document. Recipient A can run the same one-way function to check whether the document has been altered.

The actual mathematical function used to encrypt and decrypt messages is called a Cryptographic Algorithm or cipher. This is only a part of the system used to send and receive secure messages. This will become clearer as we discuss specific systems in detail.
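As a small illustration of Examples 8.1 and 8.2, here is a minimal Python sketch (our own, not part of the original text) of the two toy symmetric ciphers just described: the 4-bit XOR cipher and the character + x shift cipher. The function names are our own choices; real systems use far stronger algorithms, as the following sections explain.

def xor_cipher(bits, key):
    # Example 8.1: XOR each block of len(key) bits with the key.
    # The same function both encrypts and decrypts, since XOR is its own inverse.
    out = []
    for i in range(0, len(bits), len(key)):
        block = bits[i:i + len(key)]
        out.append(''.join(str(int(b) ^ int(k)) for b, k in zip(block, key)))
    return ''.join(out)

def character_plus_x(text, x):
    # Example 8.2: replace every letter by the x-th one following it (wrapping around).
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            result.append(chr((ord(ch) - base + x) % 26 + base))
        else:
            result.append(ch)
    return ''.join(result)

plain = "0110001010011111"
cipher = xor_cipher(plain, "1011")
print(cipher)                        # 1101100100100100, as in Example 8.1
print(xor_cipher(cipher, "1011"))    # XORing again with the key recovers the plaintext
print(character_plus_x("HELLO", 5))  # MJQQT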
As with most historical ciphers, the security of the message being sent relies on the algorithm itself remaining secret. This technique is known as a Restricted Algorithm. It has the following fundamental drawbacks. . (i) The algorithm obviously has to be restricted to only those people that you want to be able to decode your message. Therefore a new algorithm must be invented for every discrete group of users. (ii) A large or changing group of users cannot utilise them, as every time one user leaves the group, everyone must change the algorithm. (iii) If the algorithm is compromised in any way, a new algorithm must be implemented. Because of these drawbacks, Restricted Algorithms are no longer popular and have given way to key-based algorithms. Practically all modem cryptographic systems make use of a key. Algorithms that use a key allow all details of the algorithm to be widely available. This is because all of the security lies in the key. With a key-based algorithm the plaintext is encrypted and decrypted by the algorithm which uses a certain key, and the resulting ciphertext is dependent on the key, and not the algorithm. This means that an eavesdropper can have a complete copy of the algorithm in use, but without the specific key used to encrypt that message, it is useless. 8.3 OPERATIONS USED BY ENCRYPTION ALGORITHMS Although the methods of encryption/decryption have changed dramatically since the advent of computers, there are still only two basic operations that can be carried out on a piece of plaintext: substitution and transposition. The only real difference is that, earlier these were carried out with the alphabet, nowadays they are carried out on binary bits. Substitution Substitution operations replace bits in the plaintext with other bits decided upon by the algorithm, to produce ciphertext. This substitution then just has to be reversed to produce plaintext from ciphertext. This can be made increasingly complicated. For instance one plaintext character could correspond to one of a number of ciphertext characters (homophonic substitution), or each character of plaintext is substituted by a character of corresponding position in a length of another text (running cipher). .Example8.4 Julius Caesar was one ofthe first to use substitution encryption to sendmessages to troops during the war. The substitution methodhe invented advances eachcharacterthree spacesin the alphabet. Thus,
• 127. Information Theory, Coding and Cryptography

THIS IS SUBSTITUTION CIPHER      (8.1)
WKLV LV VXEVWLWXWLRQ FLSKHU      (8.2)

Transposition
Transposition (or permutation) does not alter any of the bits in plaintext, but instead moves their positions around within it. If the resultant ciphertext is then put through more transpositions, the end result is increasing security.

XOR
XOR is an exclusive-or operation. It is a Boolean operator whose result is true if exactly one of the two bits is true, and false if both are true or both are false. For example,
0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0      (8.3)
A surprising amount of commercial software uses simple XOR functions to provide security, including the USA digital cellular telephone network and many office applications, and it is trivial to crack. However, the XOR operation, as will be seen later in this chapter, is a vital part of many advanced Cryptographic Algorithms when performed between long blocks of bits that also undergo substitution and/or transposition.

8.4 SYMMETRIC (SECRET KEY) CRYPTOGRAPHY

Symmetric Algorithms (or Single Key Algorithms or Secret Key Algorithms) have one key that is used both to encrypt and decrypt the message, hence the name. In order for the recipient to decrypt the message they need to have an identical copy of the key. This presents one major problem, the distribution of the keys. Unless the recipient can meet the sender in person and obtain a key, the key itself must be transmitted to the recipient, and is thus susceptible to eavesdropping. However, single key algorithms are fast and efficient, especially if large volumes of data need to be processed.
In Symmetric Cryptography, the two parties that exchange messages use the same algorithm. Only the key is changed from time to time. The same plaintext with a different key results in a different ciphertext. The encryption algorithm is available to the public, hence it should be strong and well-tested. The more powerful the algorithm, the less likely that an attacker will be able to decrypt the resulting cipher.
The size of the key is critical in producing strong ciphertext. The US National Security Agency, NSA, stated in the mid-1990s that a 40-bit length was acceptable to them (i.e., they
As mentioned earlier, a major difficulty with symmetric schemes is that the secret key has to be possessed by both parties, and hence has to be transmitted from whoever 」セ・セエ・ウ@ it to セ・@ other party. Moreover, if the key is compromised, all of the message エセ。ョウュゥウウキョ@ セ・」オョエケ@ measures are undermined. The steps taken to provide a secure mechanism for creating and passing on the secret key are referred to as Key Management. The technique does not adequately address the non-repudiation requirement, 「・セ。オウ・N@ both parties have the same secret key. Hence each is exposed to the risk of fraudulent ヲ。ャセiヲゥ」。エゥセョ@ of a message by the other, and a claim by either party not to have sent a message IS credible, because the other may have compromised the key. There are two types of Symmetric Algorithms-Block Ciphers and Stream Ciphers. Definition 8.3 Block Ciphen usually operate on groups of bits called blocks. Each block is processed a multiple number of times. In each round the セ・ケ@ is applied ゥセ@ a unique manner. The more the number of iterations, the longer IS the encryption process, but results in a more secure ciphertext Definition 8.4 Stream Ciphen operate on plaintext one bit at a time. Plaintext is streamed as raw bits through the encryption algorithm. While a block cipher will produce the same ciphertext from the same plaintext using the same key, a stream cipher will not. The ciphertext produced by a stream cipher will vary under the same conditions. How long should a key be? There is no single answer to this アオ・ウエゥセョN@ It 、・セ・ョ、ウ@ on the specific situation. To determine how much security one needs, the followmg questions must be answered: (i) What is the worth of the data to be protected? -,
• 128. Information Theory, Coding and Cryptography

(ii) How long does it need to be secure?
(iii) What are the resources available to the cryptanalyst/hacker?

A customer list might be worth Rs 1000, advertisement data might be worth Rs 50,000 and the master key for a digital cash system might be worth millions. In the world of stock markets, the secrets have to be kept for a couple of minutes. In the newspaper business, today's secret is tomorrow's headlines. The census data of a country have to be kept secret for months (if not years). Corporate trade secrets are interesting to rival companies and military secrets are interesting to rival militaries. Thus, the security requirements can be specified in these terms. For example, one may require that the key length must be such that there is a probability of 0.0001% that a hacker with the resources of Rs 1 million could break the system in 1 year, assuming that the technology advances at a rate of 25% per annum over that period. The minimum key requirements for different applications are listed in Table 8.1. This table should be used as a guideline only.

Table 8.1 Minimum key requirements for different applications

Type of information             Lifetime          Minimum key length
Tactical military information   Minutes/hours     56-64 bits
Product announcements           Days/weeks        64 bits
Interest rates                  Days/weeks        64 bits
Trade secrets                   Decades           112 bits
Nuclear bomb secrets            > 50 years        128 bits
Identities of spies             > 50 years        128 bits
Personal affairs                > 60 years        > 128 bits
Diplomatic embarrassments       > 70 years        > 128 bits

Future computing power is difficult to estimate. A rule of thumb is that the efficiency of computing equipment divided by price doubles every 18 months, and increases by a factor of 10 every five years. Thus, in 50 years the fastest computer will be 10 billion times faster than today's! These numbers refer to general-purpose computers. We cannot predict what kind of specialized crypto-system breaking computers might be developed in the years to come.
Two symmetric algorithms, both block ciphers, will be discussed in this chapter. These are the Data Encryption Standard (DES) and the International Data Encryption Algorithm (IDEA).

8.5 DATA ENCRYPTION STANDARD (DES)

DES, an acronym for the Data Encryption Standard, is the name of the Federal Information Processing Standard (FIPS) 46-3, which describes the Data Encryption Algorithm (DEA). The DEA is also defined in the ANSI standard X9.32. Created by IBM, DES came about due to a public request by the US National Bureau of Standards (NBS) requesting proposals for a Standard Cryptographic Algorithm that satisfied the following criteria:
(i) Provides a high level of security
(ii) The security depends on keys, not the secrecy of the algorithm
(iii) The security is capable of being evaluated
(iv) The algorithm is completely specified and easy to understand
(v) It is efficient to use and adaptable
(vi) Must be available to all users
(vii) Must be exportable
DEA is essentially an improvement of the 'Algorithm Lucifer' developed by IBM in the early 1970s. The US National Bureau of Standards published the Data Encryption Standard in 1975. While the algorithm was basically designed by IBM, the NSA and NBS (now NIST) played a substantial role in the final stages of the development. The DES has been extensively studied since its publication and is the best known and the most widely used Symmetric Algorithm in the world.
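Before examining the internals of DES in the following subsections, a short usage sketch may help fix ideas. The Python fragment below is our own illustration and assumes the third-party PyCryptodome package is installed (this library choice is not prescribed by the text); it simply shows one 64-bit block being encrypted and decrypted under a 64-bit key (of which 56 bits are effective).

# Minimal DES usage sketch (assumes PyCryptodome: pip install pycryptodome)
from Crypto.Cipher import DES

key = b"8bytekey"          # 64-bit key; 8 parity bits are ignored, 56 bits are effective
plaintext = b"ABCDEFGH"    # DES operates on 64-bit (8-byte) blocks

cipher = DES.new(key, DES.MODE_ECB)   # single-block (ECB) mode, for illustration only
ciphertext = cipher.encrypt(plaintext)
print(ciphertext.hex())

decipher = DES.new(key, DES.MODE_ECB)
print(decipher.decrypt(ciphertext))   # b'ABCDEFGH'

Note that ECB mode and single DES are both considered insecure today, as the discussion of the security of DES later in this section explains; the sketch is only meant to show the block size and key size in action.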
The DEA has a 64-bit block size and uses a 56-bit key during execution (8 parity bits are stripped off from the full 64-bit key). The DEA is a Symmetric Cryptosystem, specifically a 16- round Feistel Cipher and was originally designed for implementation in hardware. When used for communication, both sender and receiver must know the same secret key, which can be used to encrypt and decrypt the message, or to generate and verify a Message Authentication Code (MAC). The DEA can also be used for single-user encryption, such as to store files on a hard disk in encrypted form. In a multi-user environment, secure key distribution may be difficult; public-key cryptography provides an ideal solution to this problem. NIST re-certifies DES (FIPS 46_:1, 46-2, 46-3) every five years. FIPS 46-3 reaffirms DES usage as of October 1999, but single DES is permitted only for legacy systems. FIPS 46-3 includes a definition of triple-DES (TDEA, corresponding to X9.52). Within a few years, DES and triple- DES will be replaced with the Advanced Encryption Standard. DES has now been in world-wide use for over 20 years, and due to the fact that it is a defined standard means that any system implementing DES can communicate with any other system using it. DES is used in banks and businesses all over the world, as well as in networks (as Kerberos) and to protect the password file on UNIX Operating Systems (as CRYPT). DES Encryption DES is a symmetric, block-cipher algorithm with a key length of 64 bits, and a block size of 64 bits (i.e. the algorithm operates on successive 64 bit blocks of plaintext). Being symmetric, the same key is used for encryption and decryption, and DES also uses the same algorithm for encryption and decryption. First a transposition is carried out according to a set table (the initial permutation), the 64-bit plaintext block is then split into two 32-bit blocks, and 16 identical operations called rounds are carried out on each half. The two halves are then joined back together, and the reverse of the
• 129. Information Theory, Coding and Cryptography

initial permutation carried out. The purpose of the first transposition is not clear, as it does not affect the security of the algorithm, but is thought to be for the purpose of allowing plaintext and ciphertext to be loaded into 8-bit chips in byte-sized pieces. In any round, only one half of the original 64-bit block is operated on. The rounds alternate between the two halves. One round in DES consists of the following.

Key Transformation
The 64-bit key is reduced to 56 by removing every eighth bit (these are sometimes used for error checking). Sixteen different 48-bit subkeys are then created, one for each round. This is achieved by splitting the 56-bit key into two halves, and then circularly shifting them left by 1 or 2 bits, depending on the round. After this, 48 of the bits are selected. Because they are shifted, different groups of key bits are used in each subkey. This process is called a compression permutation due to the transposition of the bits and the reduction of the overall size.

Expansion Permutation
After the key transformation, whichever half of the block is being operated on undergoes an expansion permutation. In this operation, the expansion and transposition are achieved simultaneously by allowing the 1st and 4th bits in each 4-bit block to appear twice in the output, i.e., the 4th input bit becomes the 5th and 7th output bits (see Fig. 8.2). The expansion permutation achieves 3 things: Firstly, it increases the size of the half-block from 32 bits to 48, the same number of bits as in the compressed key subset, which is important as the next operation is to XOR the two together. Secondly, it produces a longer string of data for the substitution operation that subsequently compresses it. Thirdly, and most importantly, because in the subsequent substitutions the 1st and 4th bits appear in two S-boxes (described shortly), they affect two substitutions. The effect of this is that the dependency of the output bits on the input bits increases rapidly, and so, therefore, does the security of the algorithm.
Fig. 8.2 The Expansion Permutation.

XOR
The resulting 48-bit block is then XORed with the appropriate subset key for that round.

Substitution
The next operation is to perform substitutions on the expanded block. There are eight substitution boxes, called S-boxes. The first S-box operates on the first 6 bits of the 48-bit expanded block, the 2nd S-box on the next six, and so on. Each S-box operates from a table of 4 rows and 16 columns; each entry in the table is a 4-bit number. The 6-bit number the S-box takes as input is used to look up the appropriate entry in the table in the following way. The 1st and 6th bits are combined to form a 2-bit number corresponding to a row number, and the 2nd to 5th bits are combined to form a 4-bit number corresponding to a particular column. The net result of the substitution phase is eight 4-bit blocks that are then combined into a 32-bit block. It is the non-linear relationship of the S-boxes that really provides DES with its security; all the other processes within the DES algorithm are linear, and as such relatively easy to analyze.
Fig. 8.3 The S-box Substitution (48-bit input, 32-bit output).
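The row/column indexing just described is easy to express in code. Below is a minimal Python sketch (our own illustration, using a made-up table rather than the actual DES S-box entries) showing how a 6-bit input selects a 4-bit output: the 1st and 6th bits form the row, the 2nd to 5th bits form the column.

def sbox_lookup(six_bits, table):
    # Map a 6-bit string to a 4-bit string using a 4 x 16 S-box table,
    # exactly as described above: outer bits -> row, inner bits -> column.
    row = int(six_bits[0] + six_bits[5], 2)   # 1st and 6th bits
    col = int(six_bits[1:5], 2)               # 2nd to 5th bits
    return format(table[row][col], '04b')     # table entries are 4-bit numbers

# Hypothetical S-box (NOT the real DES S1): 4 rows x 16 columns of 4-bit values.
toy_sbox = [[(3 * r + 5 * c) % 16 for c in range(16)] for r in range(4)]

print(sbox_lookup("011011", toy_sbox))   # '011011' -> row 01 (1), column 1101 (13)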
Permutation
The 32-bit output of the substitution phase then undergoes a straightforward transposition using a table sometimes known as the P-box.
After all the rounds have been completed, the two 'half-blocks' of 32 bits are recombined to form a 64-bit output, the final permutation is performed on it, and the resulting 64-bit block is the DES encrypted ciphertext of the input plaintext block.

DES Decryption
Decrypting DES is very easy (if one has the correct key!). Thanks to its design, the decryption algorithm is identical to the encryption algorithm; the only alteration that is made is that to decrypt DES ciphertext, the subsets of the key used in each round are used in reverse, i.e., the 16th subset is used first.

Security of DES
Unfortunately, with advances in the field of cryptanalysis and the huge increase in available computing power, DES is no longer considered to be very secure. There are algorithms that can be used to reduce the number of keys that need to be checked, but even using a straightforward brute-force attack and just trying every single possible key, there are computers that can crack DES in a matter of minutes. It is rumoured that the US National Security Agency (NSA) can crack a DES encrypted message in 3-15 minutes. If a time limit of 2 hours to crack a DES encrypted file is set, then you have to check all possible keys (2^56) in two hours, which is roughly 5 trillion keys per second. While this may
• 130. Information Theory, Coding and Cryptography

seem like a huge number, consider that a $10 Application-Specific Integrated Circuit (ASIC) chip can test 200 million keys per second, and many of these can be paralleled together. It is suggested that a $10 million investment in ASICs would allow a computer to be built that would be capable of breaking a DES encrypted message in 6 minutes.
DES can no longer be considered a sufficiently secure algorithm. If a DES-encrypted message can be broken in minutes by supercomputers today, then the rapidly increasing power of computers means that it will be a trivial matter to break DES encryption in the future (when a message encrypted today may still need to be secure). An extension of DES called DESX is considered to be virtually immune to an exhaustive key search.

8.6 INTERNATIONAL DATA ENCRYPTION ALGORITHM (IDEA)

IDEA was created in its first form by Xuejia Lai and James Massey in 1990, and was called the Proposed Encryption Standard (PES). In 1991, Lai and Massey strengthened the algorithm against differential cryptanalysis and called the result Improved PES (IPES). The name of IPES was changed to International Data Encryption Algorithm (IDEA) in 1992. IDEA is perhaps best known for its implementation in PGP (Pretty Good Privacy).

The Algorithm
IDEA is a symmetric, block-cipher algorithm with a key length of 128 bits, a block size of 64 bits, and as with DES, the same algorithm provides encryption and decryption.
IDEA consists of 8 rounds using 52 subkeys. Each round uses six subkeys, with the remaining four being used for the output transformation. The subkeys are created as follows.
Firstly, the 128-bit key is divided into eight 16-bit keys to provide the first eight subkeys. The bits of the original key are then shifted 25 bits to the left, and then it is again split into eight subkeys. This shifting and then splitting is repeated until all 52 subkeys (SK1-SK52) have been created.
The 64-bit plaintext block is first split into four blocks (B1-B4). A round then consists of the following steps (OB stands for output block):
OB1 = B1 * SK1 (multiply 1st sub-block with 1st subkey)
OB2 = B2 + SK2 (add 2nd sub-block to 2nd subkey)
OB3 = B3 + SK3 (add 3rd sub-block to 3rd subkey)
OB4 = B4 * SK4 (multiply 4th sub-block with 4th subkey)
OB5 = OB1 XOR OB3 (XOR results of steps 1 and 3)
OB6 = OB2 XOR OB4
OB7 = OB5 * SK5 (multiply result of step 5 with 5th subkey)
OB8 = OB6 + OB7 (add results of steps 5 and 7)
OB9 = OB8 * SK6 (multiply result of step 8 with 6th subkey)
Being a fairly new algorithm, it is possible a better attack than brute-force will be found, which, when coupled with much more powerful machines in the future may be able to crack a message. However, for a long way into the future, IDEA seems to be an extremely secure cipher. 8.7 RC CIPHERS The RC ciphers were designed by Ron Rivest for the RSA Data Security. RC stands for Ron's Code or Rivest Cipher. RC2 was designed as a quick-fix replacement for DES that is more secure. It is a block cipher with a variable key size that has a propriety algorithm. RC2 is a variable-key- length cipher. However, when using the Microsoft Base Cryptographic Provider, the key length is hard-coded to 40 bits. When using the Microsoft Enhanced Cryptographic Provider, the key length is 128 bits by default and can be in the range of 40 to 128 bits in 8-bit increments. RC4 was developed by Ron Rivest in 1987. It is a variable-key-size stream cipher. The details of the algorithm have not been officially published. The algorithm is extremely easy to describe and program. Just like RC2, 40-bit RC4 is supported by the Microsoft Base Cryptographic provider, and the Enhanced provider allows keys in the range of 40 to 128 bits in 8-bit increments. RC5 is a block cipher designed for speed. The block size, key size and the number of iterations are all variables. In particular, the key size can be as large as 2,048 bits. l
• 131. Information Theory, Coding and Cryptography

All the encryption techniques discussed so far belong to the class of symmetric cryptography (DES, IDEA and RC Ciphers). We now look at the class of Asymmetric Cryptographic Techniques.

8.8 ASYMMETRIC (PUBLIC-KEY) ALGORITHMS

Public-key Algorithms are asymmetric, that is to say the key that is used to encrypt the message is different from the key used to decrypt the message. The encryption key, known as the public key, is used to encrypt a message, but the message can only be decoded by the person that has the decryption key, known as the private key.
This type of algorithm has a number of advantages over traditional symmetric ciphers. It means that the recipient can make their public key widely available; anyone wanting to send them a message uses the algorithm and the recipient's public key to do so. An eavesdropper may have both the algorithm and the public key, but will still not be able to decrypt the message. Only the recipient, with their private key, can decrypt the message.
A disadvantage of public-key algorithms is that they are more computationally intensive than symmetric algorithms, and therefore encryption and decryption take longer. This may not be significant for a short text message, but certainly is for long messages or audio/video.
The Public-Key Cryptography Standards (PKCS) are specifications produced by RSA Laboratories in cooperation with secure systems developers worldwide for the purpose of accelerating the deployment of public-key cryptography. First published in 1991 as a result of meetings with a small group of early adopters of public-key technology, the PKCS documents have become widely referenced and implemented. Contributions from the PKCS series have become part of many formal and de facto standards, including ANSI X9 documents, PKIX, SET, S/MIME, and SSL.
The next two sections describe two popular public-key algorithms, the RSA Algorithm and the Pretty Good Privacy (PGP) Hybrid Algorithm.

8.9 THE RSA ALGORITHM

RSA, named after its three creators, Rivest, Shamir and Adleman, was the first effective public-key algorithm, and for years has withstood intense scrutiny by cryptanalysts all over the world. Unlike symmetric key algorithms, where, as long as one presumes that an algorithm is not flawed, the security relies on having to try all possible keys, public-key algorithms rely on it being computationally unfeasible to recover the private key from the public key.
RSA relies on the fact that it is easy to multiply two large prime numbers together, but extremely hard (i.e. time consuming) to factor them back from the result. Factoring a number means finding its prime factors, which are the prime numbers that need to be multiplied together in order to produce that number. For example,
10 = 2 x 5
60 = 2 x 2 x 3 x 5
2^113 - 1 = 3391 x 23279 x 65993 x 1868569 x 1066818132868207

The algorithm
Two very large prime numbers, normally of equal length, are randomly chosen and then multiplied together:
N = A x B      (8.4)
T = (A - 1) x (B - 1)      (8.5)
A third number is then also chosen randomly as the public key (E) such that it has no common factors (i.e. is relatively prime) with T. The private key (D) is then
D = E^-1 mod T      (8.6)
To encrypt a block of plaintext (M) into ciphertext (C):
C = M^E mod N      (8.7)
To decrypt:
M = C^D mod N      (8.8)
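The arithmetic above is easy to try out for small numbers. The following Python sketch is our own illustration (real RSA uses primes hundreds of digits long and padding schemes not discussed here); it builds a toy key pair and encrypts and decrypts a single number, using the same small values as the worked example that follows.

from math import gcd

def rsa_toy_keys(A, B, E):
    # Build a toy RSA key pair from two primes A, B and a public exponent E
    # that is relatively prime to T = (A-1)(B-1).
    N = A * B
    T = (A - 1) * (B - 1)
    assert gcd(E, T) == 1, "E must share no factors with T"
    D = pow(E, -1, T)          # modular inverse, D = E^-1 mod T (Python 3.8+)
    return (E, N), (D, N)      # (public key), (private key)

def rsa_encrypt(M, public_key):
    E, N = public_key
    return pow(M, E, N)        # C = M^E mod N

def rsa_decrypt(C, private_key):
    D, N = private_key
    return pow(C, D, N)        # M = C^D mod N

public, private = rsa_toy_keys(A=37, B=23, E=5)
C = rsa_encrypt(7, public)            # encrypt the number 7 (the letter 'G')
print(C, rsa_decrypt(C, private))     # 638 7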
Example 8.5 Consider the following implementation of the RSA algorithm.
1st prime (A) = 37
2nd prime (B) = 23
So, N = 37 x 23 = 851
T = (37 - 1) x (23 - 1) = 36 x 22 = 792
E must have no factors other than 1 in common with 792. E (public key) could be 5.
D (private key) = 5^-1 mod 792 = 317
To encrypt a message (M) of the character 'G': if G is represented as 7 (7th letter in the alphabet), then M = 7.
C (ciphertext) = 7^5 mod 851 = 638
To decrypt: M = 638^317 mod 851 = 7.

Security of RSA
The security of the RSA algorithm depends on the ability of the hacker to factorize numbers. New, faster and better methods for factoring numbers are constantly being devised. The current best for long numbers is the Number Field Sieve. Numbers of a length that was unimaginable a mere decade ago are now factored easily. Obviously the longer a number is, the harder it is to factor, and so the better the security of RSA. As theory and computers improve, larger and
• 132. Information Theory, Coding and Cryptography

larger keys will have to be used. The disadvantage in using extremely long keys is the computational overhead involved in encryption/decryption. This will only become a problem if a new factoring technique emerges that requires keys of such lengths to be used that the necessary key length increases much faster than the increasing average speed of computers utilising the RSA algorithm.
In 1997, a specific assessment of the security of 512-bit RSA keys showed that one may be factored for less than $1,000,000 in cost and eight months of effort. It is therefore believed that 512-bit keys provide insufficient security for anything other than short-term needs. RSA Laboratories currently recommend key sizes of 768 bits for personal use, 1024 bits for corporate use, and 2048 bits for extremely valuable keys like the root-key pair used by a certifying authority. Security can be increased by changing a user's keys regularly and it is typical for a user's key to expire after two years (the opportunity to change keys also allows for a longer length key to be chosen).
Even without using huge keys, RSA is about 1000 times slower to encrypt/decrypt than DES. This has resulted in it not being widely used as a stand-alone cryptography system. However, it is used in many hybrid cryptosystems such as PGP. The basic principle of hybrid systems is to encrypt plaintext with a Symmetric Algorithm (usually DES or IDEA); the symmetric algorithm's key is then itself encrypted with a public-key algorithm such as RSA. The RSA-encrypted key and the symmetric-algorithm-encrypted message are then sent to the recipient, who uses his private RSA key to decrypt the Symmetric Algorithm's key, and then that key to decrypt the message. This is considerably faster than using RSA throughout, and allows a different symmetric key to be used each time, considerably enhancing the security of the Symmetric Algorithm.
RSA's future security relies solely on advances in factoring techniques. Barring an astronomical increase in the efficiency of factoring techniques, or available computing power, the 2048-bit key will ensure very secure protection into the foreseeable future. For instance an Intel Paragon, which can achieve 50,000 mips (million instructions per second), would take a million years to factor a 2048-bit key using current techniques.

8.10 PRETTY GOOD PRIVACY (PGP)

Pretty Good Privacy (PGP) is a hybrid cryptosystem that was created by Phil Zimmerman and released onto the Internet as a freeware program in 1991. PGP is not a new algorithm in its own right, but rather a series of other algorithms that are performed along with a sophisticated protocol. PGP's intended use was for e-mail security, but there is no reason why the basic principles behind it could not be applied to any type of transmission. PGP and its source code is freely available on the Internet. This means that since its creation PGP has been subjected to an enormous amount of scrutiny by cryptanalysts, who have yet to find an exploitable fault in it.
PGP has four main modules: a symmetric cipher (IDEA) for message encryption, a public key algorithm (RSA) to encrypt the IDEA key and hash values, a one-way hash function (MD5) for signing, and a random number generator. The fact that the body of the message is encrypted with a symmetric algorithm (IDEA) means that PGP generated e-mails are a lot faster to encrypt and decrypt than ones using simple RSA.
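The hybrid principle described above (a fresh symmetric session key protects the message body, and only that short key is encrypted with the recipient's public key) can be illustrated in a few lines of Python. This sketch is our own and uses deliberately simplified stand-ins: a repeating XOR keystream in place of IDEA, and the toy RSA routines sketched earlier in place of real RSA.

import secrets

def xor_bytes(data, key):
    # Toy stand-in for the symmetric cipher: XOR with a repeating session key.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def hybrid_encrypt(message, rsa_public, key_bytes=1):
    E, N = rsa_public
    session_key = secrets.token_bytes(key_bytes)                   # fresh key per message
    body = xor_bytes(message, session_key)                         # symmetric part
    wrapped_key = pow(int.from_bytes(session_key, 'big'), E, N)    # public-key part
    return wrapped_key, body

def hybrid_decrypt(wrapped_key, body, rsa_private, key_bytes=1):
    D, N = rsa_private
    session_key = pow(wrapped_key, D, N).to_bytes(key_bytes, 'big')
    return xor_bytes(body, session_key)

# Toy RSA key pair (same small numbers as Example 8.5; far too small for real use).
public, private = (5, 851), (317, 851)
wrapped, body = hybrid_encrypt(b"HI", public)
print(hybrid_decrypt(wrapped, body, private))   # b'HI'

Because a new session key is drawn for every message, cracking one message reveals nothing about earlier or later ones, which is exactly the property claimed for PGP in the next paragraph.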
The key for the IDEA module is randomly generated each time as a one-off session key; this makes PGP very secure, as even if one message was cracked, all previous and subsequent messages would remain secure. This session key is then encrypted with the public key of the recipient using RSA. Given that keys up to 2048 bits long can be used, this is extremely secure. MD5 can be used to produce a hash of the message, which can then be signed by the sender's private key.
Another feature of PGP's security is that the user's private key is encrypted using a hashed pass-phrase rather than simply a password, making the private key extremely resistant to copying even with access to the user's computer.
Generating true random numbers on a computer is notoriously hard. PGP tries to achieve randomness by making use of the keyboard latency when the user is typing. This means that the program measures the gap of time between each key-press. Whilst at first this may seem to be distinctly non-random, it is actually fairly effective: people take longer to hit some keys than others, pause for thought, make mistakes and vary their overall typing speed on all sorts of factors such as knowledge of the subject and tiredness. These measurements are not actually used directly but used to trigger a pseudo-random number generator. There are other ways of generating random numbers, but to be much better than this gets very complex.
PGP uses a very clever, but complex, protocol for key management. Each user generates and distributes their public key. If James is happy that a person's public key belongs to who it claims to belong to, then he can sign that person's public key and James's program will then accept messages from that person as valid. The user can allocate levels of trust to other users. For instance, James may decide that he completely trusts Earl to sign other peoples' keys, in effect saying "his word is good enough for me". This means that if Rachel, who has had her key signed by Earl, wants to communicate with James, she sends James her signed key. James's program recognises Earl's signature, has been told that Earl can be trusted to sign keys, and so accepts Rachel's key as valid. In effect Earl has introduced Rachel to James.
PGP allows many levels of trust to be assigned to people, and this is best illustrated in Fig. 8.4. The explanations are as follows.
1st line James has signed the keys of Earl, Sarah, Jacob and Kate. James completely trusts Earl to sign other peoples' keys, does not trust Sarah at all, and partially trusts Jacob and Kate (he trusts Jacob more than Kate).
• 133. Information Theory, Coding and Cryptography

Fig. 8.4 An Example of a PGP User Web. (Legend: fully trusted; partially trusted; partially trusted to a lesser degree; not validated; key validated directly or by introduction. Level 1: people with keys signed by James. Level 2: people with keys signed by those on level 1. Level 3: people with keys signed by those on level 2.)

2nd line Although James has not signed Sam's key, he still trusts Sam to sign other peoples' keys, maybe on Bob's say so or due to them actually meeting. Because Earl has signed Rachel's key, Rachel is validated (but not trusted to sign keys). Even though Bob's key is signed by Sarah and Jacob, because Sarah is not trusted and Jacob only partially trusted, Bob is not validated. Two partially trusted people, Jacob and Kate, have signed Archie's key, therefore Archie is validated.
3rd line Sam, who is fully trusted, has signed Hal's key, therefore Hal is validated. Louise's key has been signed by Rachel and Bob, neither of whom is trusted, therefore Louise is not validated.
Odd one out Mike's key has not been signed by anyone in James' group; maybe James found it on the Internet and does not know whether it is genuine or not.
PGP never prevents the user from sending or receiving e-mail; it does however warn the user if a key is not validated, and the decision is then up to the user as to whether to heed the warning or not.

Key Revocation
If a user's private key is compromised then they can send out a key revocation certificate. Unfortunately this does not guarantee that everyone with that user's public key will receive it, as keys are often swapped in a disorganised manner. Additionally, if the user no longer has the private key then they cannot issue a certificate, as the key is required to sign it.
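The validation logic illustrated by Fig. 8.4 can be captured in a few lines. The Python sketch below is our own simplification of the rule described above: a key counts as validated if it carries a signature from at least one fully trusted key, or from at least two partially trusted keys. The trust assignments and signatures mirror James's choices in the figure.

# Trust levels James has assigned (from the discussion of Fig. 8.4).
FULL, PARTIAL, NONE = 2, 1, 0
trust = {"James": FULL, "Earl": FULL, "Sam": FULL,
         "Jacob": PARTIAL, "Kate": PARTIAL,
         "Sarah": NONE, "Rachel": NONE, "Bob": NONE}

# Who has signed whose key.
signatures = {
    "Rachel": ["Earl"], "Bob": ["Sarah", "Jacob"], "Archie": ["Jacob", "Kate"],
    "Hal": ["Sam"], "Louise": ["Rachel", "Bob"], "Mike": [],
}

def is_validated(person):
    # Simplified PGP rule: one fully trusted signature, or two partial ones.
    signers = signatures.get(person, [])
    full = sum(trust.get(s, NONE) == FULL for s in signers)
    partial = sum(trust.get(s, NONE) == PARTIAL for s in signers)
    return full >= 1 or partial >= 2

for name in signatures:
    print(name, "validated" if is_validated(name) else "not validated")
# Rachel, Archie and Hal come out validated; Bob, Louise and Mike do not,
# matching the explanations of the 2nd and 3rd lines above.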
These functions are designed in such a way that not only is it very difficult to deduce the message from its hashed version, but also that, even given that all hashes are of a certain length, it is extremely hard to find two messages that hash to the same value. In fact, to find two messages with the same hash from a 128-bit hash function, about 2^64 hashes would have to be tried. In other words, the hash value of a file is a small, unique 'fingerprint'. Even a slight change in an input string should cause the hash value to change drastically: even if 1 bit is flipped in the input string, at least half of the bits in the hash value will flip as a result. This is called an Avalanche Effect. If H = hash value, f = hash function and M = original message (pre-string), then

H = f(M)   (8.9)

If you know M, then H is easy to compute. However, knowing H and f, it is not easy to compute M; indeed it should be computationally infeasible. As long as there is a low risk of collision (i.e., two messages hashing to the same value), and the hash is very hard to reverse, a one-way hash function proves extremely useful for a number of aspects of cryptography.
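As a quick illustration of the avalanche effect, the sketch below (Python, using the standard hashlib module) hashes two made-up messages that differ in a single character and counts how many of the 128 MD5 hash bits change; typically about half of them will differ.

import hashlib

m1 = b"Transfer 100 rupees to account 12345"
m2 = b"Transfer 900 rupees to account 12345"   # one character changed

h1 = hashlib.md5(m1).digest()
h2 = hashlib.md5(m2).digest()

# Count how many of the 128 hash bits differ between the two digests.
differing = sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))
print(hashlib.md5(m1).hexdigest())
print(hashlib.md5(m2).hexdigest())
print(differing, "of 128 bits changed")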
If you one-way hash a message, the result will be a much shorter but still (at least statistically) unique number. This can be used as proof of ownership of a message without having to reveal the contents of the actual message. For instance, rather than keeping a database of copyrighted documents, if just the hash values of each document were stored, then not only would this save a lot of space, but it would also provide a great deal of security. If copyright then needs to be proved, the owner could produce the original document and show that it hashes to the stored value. Hash functions can also be used to prove that no changes have been made to a file, as adding even one character to a file would completely change its hash value.

By far the most common use of hash functions is to digitally sign messages. The sender performs a one-way hash on the plaintext message, encrypts it with his private key and then encrypts both with the recipient's public key and sends them in the usual way. On decrypting the ciphertext, the recipient can use the sender's public key to decrypt the hash value; he can then perform a one-way hash himself on the plaintext message and check this against the one he has received. If the hash values are identical, the recipient knows not only that the message came from the correct sender, since the sender's private key was used to encrypt the hash, but also that the plaintext message is completely authentic, as it hashes to the same value. This method is greatly preferable to encrypting the whole message with a private key, as the hash of a message will normally be considerably smaller than the message itself. This means that it will not significantly slow down the decryption process in the way that decrypting the entire message with the sender's public key, and then decrypting it again with the recipient's private key, would. The PGP system uses the MD5 hash function for precisely this purpose.

The Microsoft Cryptographic Providers support three hash algorithms: MD4, MD5 and SHA. Both MD4 and MD5 were invented by Ron Rivest. MD stands for Message Digest. Both algorithms produce 128-bit hash values; MD5 is an improved version of MD4. SHA stands for Secure Hash Algorithm. It was designed by NIST and NSA. SHA produces 160-bit hash values, longer than those of MD4 and MD5. SHA is generally considered more secure than the other algorithms and is the recommended hash algorithm.

8.12 OTHER TECHNIQUES

One-Time Pads
The one-time pad was invented by Major Joseph Mauborgne and Gilbert Vernam in 1917, and is an unconditionally secure (i.e. unbreakable) algorithm. The theory behind a one-time pad is simple. The pad is a non-repeating random string of letters. Each letter on the pad is used once only to encrypt one corresponding plaintext character. After use, the pad must never be re-used. As long as the pad remains secure, so is the message, because a random key added to a non-random message produces completely random ciphertext, and no amount of analysis or computation can alter that. If both pads are destroyed, the original message can never be recovered. There are two major drawbacks. Firstly, it is extremely hard to generate truly random numbers, and a pad that has even a couple of non-random properties is theoretically breakable.
Secondly, because the pad can never be reused, no matter how large it is, the length of the pad must be the same as the length of the message, which is fine for text but virtually impossible for video.

Steganography
Steganography is not actually a method of encrypting messages, but of hiding them within something else so that they pass undetected. Traditionally this was achieved with invisible ink, microfilm or taking the first letter from each word of a message. It is now achieved by hiding the message within a graphics or sound file. For instance, in a 256-greyscale image, if the least significant bit of each byte is replaced with a bit from the message, the result will be indistinguishable to the human eye. An eavesdropper will not even realise a message is being sent. This is not cryptography, however, and although it would fool a human, a computer would be able to detect this very quickly and reproduce the original message.

Secure Mail and S/MIME
Secure Multipurpose Internet Mail Extensions (S/MIME) is a de facto standard developed by RSA Data Security, Inc., for sending secure mail based on public-key cryptography. MIME is the industry standard format for electronic mail, which defines the structure of the message's body. S/MIME-supporting e-mail applications add digital signatures and encryption capabilities to that format to ensure message integrity, data origin authentication and confidentiality of electronic mail. When a signed message is sent, a detached signature in the PKCS #7 format is sent along with the message as an attachment. The signature attachment contains the hash of the original message signed with the sender's private key, as well as the signer's certificate. S/MIME also supports messages that are first signed with the sender's private key and then enveloped using the recipients' public keys.

8.13 SECURE COMMUNICATION USING CHAOS FUNCTIONS

Chaos functions have also been used for secure communications and cryptographic applications. A chaos function here means an iterative difference equation that exhibits chaotic behaviour. If we observe that cryptography has more to do with unpredictability than with randomness, chaos functions are a good choice because of their property of unpredictability: if a hacker intercepts part of the sequence, he will have no information on how to predict what comes next. The unpredictability of chaos functions makes them a good choice for generating the keys for symmetric cryptography.
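As a rough sketch of how such a recursion can be turned into key material, consider the following Python fragment; the warm-up count, the threshold rule used to extract bits and the starting value are illustrative assumptions rather than a prescription from the text, and Example 8.6 below works through the underlying idea in more detail.

def chaos_key_bits(x0, nbits=64, warmup=100):
    # Iterate the logistic map x_{n+1} = 4*x_n*(1 - x_n); the secrecy of
    # the starting value x0 plays the role of the secret key.
    x = x0
    for _ in range(warmup):            # discard initial iterations so the
        x = 4.0 * x * (1.0 - x)        # output is harder to relate to x0
    bits = []
    for _ in range(nbits):
        x = 4.0 * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)   # one key bit per iteration
    return bits

def xor_scramble(message_bits, key_bits):
    # Single-key (stream-cipher style) scrambling: XOR message with key.
    return [m ^ k for m, k in zip(message_bits, key_bits)]

key = chaos_key_bits(x0=0.3141592653)
print("".join(str(b) for b in key))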
Example 8.6 Consider the difference equation

xn+1 = a xn(1 - xn)   (8.10)

For a = 4, this function behaves like a chaos function, i.e., (i) the values obtained by successive iterations are unpredictable, and (ii) the function is extremely sensitive to the initial condition x0. For any given initial condition, this function will generate values of xn between 0 and 1 for each iteration. These values are good candidates for key generation. In single-key cryptography, a key is used for enciphering the message. This key is usually a pseudo-noise (PN) sequence. The message can simply be XORed with the key in order to scramble it. Since xn takes positive values that are always less than unity, the binary equivalents of these fractions can serve as keys. Thus, one of the ways of generating keys from these random, unpredictable decimal numbers is to use their binary representation directly. The lengths of these binary sequences are limited only by the accuracy of the decimal numbers, and hence very long binary keys can be generated. The recipient must know the initial condition in order to generate the keys for decryption.

For application in single-key cryptography, the following two factors need to be decided: (i) the start value for the iterations (x0), and (ii) the number of decimal places of the mantissa that are to be supported by the calculating machine (to avoid round-off error). For single-key cryptography, the chaos values obtained after some number of iterations are converted to binary fractions whose first 64 bits are taken to generate PN sequences. These initial iterations make it still more difficult for the hacker to guess the initial condition. The starting value should be taken between 0 and 1; a good choice of starting value can improve the performance slightly. The secrecy of the starting number, x0, is the key to the success of this algorithm. Since chaos functions are extremely sensitive even to errors of 10^-30 in the starting number x0, we can have 10^30 unique starting combinations. Therefore, a hacker who knows the chaos function and the encryption algorithm has to try out 10^30 different start combinations. (In the DES algorithm the hacker had to try out approximately 10^19 different key values.) Chaos-based algorithms require a high computational overhead to generate the chaos values as well as high computational speeds; hence, they might not be suitable for bulk data encryption.

8.14 CRYPTANALYSIS

Cryptanalysis is the science (or black art!) of recovering the plaintext of a message from the ciphertext without access to the key. In cryptanalysis, it is always assumed that the cryptanalyst has full access to the algorithm. An attempted cryptanalysis is known as an attack, of which there are five major types:
• Brute-force attack: This technique requires a large amount of computing power and a large amount of time to run. It consists of trying all possibilities in a logical manner until the correct one is found. For the majority of encryption algorithms a brute-force attack is impractical due to the large number of possibilities.
• Ciphertext-only: The only information the cryptanalyst has to work with is the ciphertext of various messages, all encrypted with the same algorithm.
• Known-plaintext: In this scenario, the cryptanalyst has access not only to the ciphertext of various messages, but also to the corresponding plaintext.
• Chosen-plaintext: The cryptanalyst has access to the same information as in a known-plaintext attack, but this time may choose the plaintext that gets encrypted. This attack is more powerful, as specific plaintext blocks can be chosen that may yield more information about the key. An adaptive chosen-plaintext attack is one where the cryptanalyst may repeatedly encrypt plaintext, modifying the input based on the results of a previous encryption.
• Chosen-ciphertext: The cryptanalyst repeatedly chooses ciphertext to be decrypted and has access to the resulting plaintext, from which he tries to deduce the key. A relatively new technique used here is differential cryptanalysis, an interactive and iterative process that works through many rounds, using the results from previous rounds, until the key is identified.

There is only one totally secure algorithm, the one-time pad. All other algorithms can be broken given infinite time and resources. Modern cryptography relies on making it computationally infeasible to break an algorithm. This means that, while it is theoretically possible, the time scale and resources involved make it completely unrealistic. If an algorithm is presumed to be perfect, then the only method of breaking it relies on trying every possible key combination until the resulting ciphertext makes sense. As mentioned above, this type of attack is called a brute-force attack. The field of parallel computing is perfectly suited to the task of brute-force attacks, as every processor can be given a number of possible keys to try, and the processors do not need to interact with each other at all except to announce the result. A technique that is becoming increasingly popular is parallel processing using thousands of individual computers connected to the Internet. This is known as distributed computing.

Many cryptographers believe that brute-force attacks are basically ineffective when long keys are used. An encryption algorithm with a large key (over 100 bits) can take millions of years to crack, even with the powerful, networked computers of today. Besides, adding a single extra bit to the key doubles the cost of performing a brute-force cryptanalysis. Regarding the brute-force attack, there are a couple of other pertinent questions. What if the original plaintext is itself a cipher? In that case, how will the hacker know when he has found the right key? In addition, is the cryptanalyst sitting at the computer and watching the result of each
key that is being tested? Thus, we can assume that a brute-force attack is impossible provided long enough keys are used. Here are some of the techniques that have been used by cryptanalysts to attack ciphertext.
• Differential cryptanalysis: As mentioned before, this technique uses an iterative process to evaluate ciphertext that has been generated using an iterative block algorithm (e.g. DES). Related plaintexts are encrypted using the same key and the differences are analysed. This technique proved successful against DES and some hash functions.
• Linear cryptanalysis: Pairs of plaintext and ciphertext are analysed and a linear approximation technique is used to determine the behaviour of the block cipher. This technique was also used successfully against DES.
• Algebraic attack: This technique exploits mathematical structure in block ciphers. If such structure exists, a single encryption with one key might produce the same result as a double encryption with two different keys, and thus the search time can be reduced.

However strong or weak the algorithm used to encrypt it, a message can be thought of as secure if the time and/or resources needed to recover the plaintext greatly exceed the benefits bestowed by having the contents. This could be because the cost involved is greater than the financial value of the message, or simply because by the time the plaintext is recovered its contents will be outdated.

8.15 POLITICS OF CRYPTOGRAPHY

Widespread use of cryptosystems is something most governments are not particularly happy about, precisely because it threatens to give more privacy to the individual, including criminals. For many years, police forces have been able to tap phone lines and intercept mail; in an encrypted future that may become impossible. This has led to some strange decisions on the part of governments, particularly the United States government. In the United States, cryptography is classified as a munition and the export of programs containing cryptosystems is tightly controlled. In 1992, the Software Publishers Association reached an agreement with the State Department to allow the export of software that contained RSA's RC2 and RC4 encryption algorithms, but only if the key size was limited to 40 bits, as opposed to the 128-bit keys available for use within the US. This significantly reduced the level of privacy provided. In 1993 the US Congress asked the National Research Council to study US cryptographic policy. Its 1996 report, the result of two years' work, offered the following conclusions and recommendations:
• "On balance, the advantages of more widespread use of cryptography outweigh the disadvantages."
• "No law should bar the manufacture, sale or use of any form of encryption within the United States."
• "Export controls on cryptography should be progressively relaxed but not eliminated."

In 1997 the limit on the key size was increased to 56 bits. The US government has proposed several methods whereby it would allow the export of stronger encryption, all based on a system where the US government could gain access to the keys if necessary, for example the Clipper chip. Recently there has been a lot of protest from the cryptographic community against the US government imposing restrictions on the development of cryptographic techniques. The article by Ronald L.
Rivest, Professor, MIT, in the October 1998 issue of Scientific American (pages 116-117), titled "The Case against Regulating Encryption Technology", is an example of such a protest. The resolution of this issue is regarded as one of the most important for the future of e-commerce.

8.16 CONCLUDING REMARKS

In this section we present a brief history of cryptography. People have tried to conceal information in written form since writing was developed. Examples survive in stone inscriptions and papyruses showing that many ancient civilizations, including the Egyptians, Hebrews and Assyrians, developed cryptographic systems. The first recorded use of cryptography for correspondence was by the Spartans, who (as early as 400 BC) employed a cipher device called a scytale to send secret communications between military commanders. The scytale consisted of a tapered baton around which was wrapped a piece of parchment inscribed with the message. Once unwrapped, the parchment appeared to contain an incomprehensible set of letters; however, when wrapped around another baton of identical size the original text appears. The Greeks were therefore the inventors of the first transposition cipher, and in the fourth century BC the earliest treatise on the subject was written by a Greek, Aeneas Tacticus, as part of a work entitled On the Defence of Fortifications. Another Greek, Polybius, later devised a means of encoding letters into pairs of symbols using a device known as the Polybius checkerboard, which contains many elements common to later encryption systems. In addition to the Greeks, there are similar examples of primitive substitution or transposition ciphers in use by other civilizations, including the Romans. The Polybius checkerboard consists of a five by five grid containing all the letters of the alphabet. Each letter is converted into two numbers: the first is the row in which the letter can be found and the second is the column. Hence the letter A becomes 11, the letter B 12, and so forth.

The Arabs were the first people to clearly understand the principles of cryptography. They devised and used both substitution and transposition ciphers and discovered the use of letter frequency distributions in cryptanalysis. As a result of this, by approximately 1412, al-Kalka-shandi could include in his encyclopaedia Subh al-a'sha a respectable, if elementary, treatment of several cryptographic systems. He also gave explicit instructions on how to cryptanalyze ciphertext using letter frequency counts, including examples illustrating the technique.
European cryptography dates from the Middle Ages, during which it was developed by the Papal and Italian city states. The earliest ciphers involved only vowel substitution (leaving the consonants unchanged). Circa 1379 the first European manual on cryptography, consisting of a compilation of ciphers, was produced by Gabriele de Lavinde of Parma, who served Pope Clement VII. This manual contains a set of keys for correspondents and uses symbols for letters and nulls, with several two-character code equivalents for words and names. The first brief code vocabularies, called nomenclators, were expanded gradually and for several centuries were the mainstay of diplomatic communication for nearly all European governments. In 1470 Leon Battista Alberti described the first cipher disk in Trattati in cifra, and the Traicte des chiffres, published in 1586 by Blaise de Vigenere, contained a square table commonly attributed to him as well as descriptions of the first plaintext and ciphertext autokey systems.

By 1860 large codes were in common use for diplomatic communications, and cipher systems had become a rarity for this application. However, cipher systems prevailed for military communications (except for high-command communication, because of the difficulty of protecting codebooks from capture or compromise). During the US Civil War the Federal Army extensively used transposition ciphers. The Confederate Army primarily used the Vigenere cipher and occasional monoalphabetic substitution. While the Union cryptanalysts solved most of the intercepted Confederate ciphers, the Confederacy, in desperation, sometimes published Union ciphers in newspapers, appealing for help from readers in cryptanalysing them.

During the First World War both sides employed cipher systems almost exclusively for tactical communication, while code systems were still used mainly for high-command and diplomatic communication. Although field cipher systems such as the US Signal Corps cipher disk lacked sophistication, some complicated cipher systems were used for high-level communications by the end of the war. The most famous of these was the German ADFGVX fractionation cipher.

In the 1920s the maturing of mechanical and electromechanical technology came together with the needs of telegraphy and radio to bring about a revolution in cryptodevices: the development of rotor cipher machines. The concept of the rotor had been anticipated in the older mechanical cipher disks; however, it was an American, Edward Hebern, who recognised that by hardwiring a monoalphabetic substitution in the connections from the contacts on one side of an electrical rotor to those on the other side, and cascading a collection of such rotors, polyalphabetic substitutions of almost any complexity could be produced. From 1921 and continuing through the next decade, Hebern constructed a series of steadily improving rotor machines that were evaluated by the US Navy. It was undoubtedly this work which led to the United States' superior position in cryptology during the Second World War. At almost the same time as Hebern was inventing the rotor cipher machine in the United States, European engineers such as Hugo Koch (Netherlands) and Arthur Scherbius (Germany) independently discovered the rotor concept and designed the precursors to the most famous cipher machine in history, the German Enigma machine, which was used during World War 2.
These machines were also the stimulus for TYPEX, the cipher machine employed by the British during World War 2. The United States introduced the M-134-C (SIGABA) cipher machine during World War 2. The Japanese cipher machines of World War 2 have an interesting history linking them to both the Hebern and the Enigma machines. After Herbert Yardley, an American cryptographer who organised and directed the US government's first formal code-breaking efforts during and after the First World War, published The American Black Chamber, in which he outlined details of the American successes in cryptanalysing the Japanese ciphers, the Japanese government set out to develop the best cryptomachines possible. With this in mind, it purchased the rotor machines of Hebern and the commercial Enigmas, as well as several other contemporary machines, for study. In 1930 Japan's first rotor machine, code-named RED by US cryptanalysts, was put into service by the Japanese Foreign Office. However, drawing on experience gained from cryptanalysing the ciphers produced by the Hebern rotor machines, the US Army Signal Intelligence Service team of cryptanalysts succeeded in cryptanalysing the RED ciphers. In 1939 the Japanese introduced a new cipher machine, code-named PURPLE by US cryptanalysts, in which the rotors were replaced by telephone stepping switches. The greatest triumphs of cryptanalysis occurred during the Second World War, when the Polish and British cracked the Enigma ciphers and the American cryptanalysts broke the Japanese RED, ORANGE and PURPLE ciphers. These developments played a major role in the Allies' conduct of World War 2.

After World War 2 the electronics that had been developed in support of radar were adapted to cryptomachines. The first electrical cryptomachines were little more than rotor machines in which the rotors had been replaced by electronic substitutions. The only advantage of these electronic rotor machines was their speed of operation, as they were still affected by the inherent weaknesses of the mechanical rotor machines. The era of computers and electronics has meant an unprecedented freedom for cipher designers to use elaborate designs which would be far too prone to error if handled with pencil and paper, or far too expensive to implement in the form of an electromechanical cipher machine. The main thrust of development has been in block ciphers, beginning with the LUCIFER project at IBM, a direct ancestor of the DES (Data Encryption Standard).

There is a place for both symmetric and public-key algorithms in modern cryptography. Hybrid cryptosystems successfully combine aspects of both and seem to be secure and fast. While PGP and its complex protocols are designed with the Internet community in mind, it should be obvious that the encryption behind it is very strong and could be adapted to suit many applications. There may still be instances when a simple algorithm is necessary, and with the security provided by algorithms like IDEA, there is absolutely no reason to think of these as significantly less secure.
An article posted on the Internet on the subject of picking locks stated: "The most effective door-opening tool in any burglar's toolkit remains the crowbar". This also applies to cryptanalysis: direct action is often the most effective. It is all very well transmitting your messages with 128-bit IDEA encryption, but if all that is necessary to obtain that key is to walk up to one of the computers used for encryption with a floppy disk, then the whole point of encryption is negated. In other words, an incredibly strong algorithm is not sufficient; for a system to be effective there must also be effective management protocols. Finally, in the words of Edgar Allan Poe, "Human ingenuity cannot concoct a cipher which human ingenuity cannot resolve."

SUMMARY

• A cryptosystem is a collection of algorithms and associated procedures for hiding and revealing information. Cryptanalysis is the process of analysing a cryptosystem, either to verify its integrity or to break it for ulterior motives. An attacker is a person or system that performs cryptanalysis in order to break a cryptosystem. The process of attacking a cryptosystem is often called cracking. The job of the cryptanalyst is to find the weaknesses in the cryptosystem.
• A message being sent is known as plaintext. The message is coded using a cryptographic algorithm; this process is called encryption. An encrypted message is known as ciphertext, and is turned back into plaintext by the process of decryption.
• A key is a value that causes a cryptographic algorithm to run in a specific manner and produce a specific ciphertext as an output. The key size is usually measured in bits. The bigger the key size, the more secure the algorithm.
• Symmetric algorithms (or single-key algorithms or secret-key algorithms) have one key that is used both to encrypt and decrypt the message, hence their name. In order for the recipient to decrypt the message, they need an identical copy of the key. This presents one major problem: the distribution of the keys.
• Block ciphers usually operate on groups of bits called blocks. Each block is processed a multiple number of times, and in each round the key is applied in a unique manner. The more iterations, the longer the encryption process, but the more secure the ciphertext.
• Stream ciphers operate on plaintext one bit at a time. Plaintext is streamed as raw bits through the encryption algorithm. While a block cipher will produce the same ciphertext from the same plaintext using the same key, a stream cipher will not; the ciphertext produced by a stream cipher will vary under the same conditions.
• To determine how much security one needs, the following questions must be answered:
  1. What is the worth of the data to be protected?
  2. How long does it need to be secure?
  3. What are the resources available to the cryptanalyst/hacker?
• Two symmetric algorithms, both block ciphers, were discussed in this chapter: the Data Encryption Standard (DES) and the International Data Encryption Algorithm (IDEA).
• Public-key algorithms are asymmetric, that is to say the key that is used to encrypt the message is different from the key used to decrypt the message. The encryption key, known as the public key, is used to encrypt a message, but the message can only be decoded by the person who has the decryption key, known as the private key. The Rivest, Shamir and Adleman (RSA) algorithm and Pretty Good Privacy (PGP) are two popular public-key encryption techniques.
• RSA relies on the fact that it is easy to multiply two large prime numbers together, but extremely hard (i.e. time-consuming) to factor the result. Factoring a number means finding its prime factors, the prime numbers that need to be multiplied together in order to produce that number.
• A one-way hash function is a mathematical function that takes a message string of any length (pre-string) and returns a smaller fixed-length string (hash value). These functions are designed in such a way that not only is it very difficult to deduce the message from its hashed version, but also that, even given that all hashes are of a certain length, it is extremely hard to find two messages that hash to the same value.
• Chaos functions can be used for secure communication and cryptographic applications. The chaotic functions are primarily used for generating keys that are essentially unpredictable.
• An attempted unauthorised cryptanalysis is known as an attack, of which there are five major types: brute-force attack, ciphertext-only, known-plaintext, chosen-plaintext and chosen-ciphertext.
• The common techniques used by cryptanalysts to attack ciphertext are differential cryptanalysis, linear cryptanalysis and the algebraic attack.
• Widespread use of cryptosystems is something most governments are not particularly happy about, because it threatens to give more privacy to the individual, including criminals.

"Imagination is more important than knowledge."
Albert Einstein (1879-1955)

PROBLEMS

8.1 We want to test the security of the "character + x" encrypting technique, in which each letter of the plaintext is shifted by n to produce the ciphertext.
(a) How many different attempts must be made to crack this code, assuming a brute-force attack is being used?
(b) Assuming it takes a computer 1 ms to check one value of the shift, how soon can this code be broken into?
8.2 Suppose a group of N people want to use secret-key cryptography. Each pair of people in the group should be able to communicate secretly. How many distinct keys are required?

8.3 Transposition ciphers rearrange the letters of the plaintext without changing the letters themselves. For example, a very simple transposition cipher is the rail fence, in which the plaintext is staggered between two rows and then read off to give the ciphertext. In a two-row rail fence the message MERCHANT TAYLORS' SCHOOL becomes:

M R H N T Y O S C O L
 E C A T A L R S H O

which is read out as: MRHNTYOSCOLECATALRSHO.
(a) If a cryptanalyst wants to break the rail fence cipher, how many distinct attacks must he make, given that the length of the ciphertext is n?
(b) Suggest a decrypting algorithm for the rail fence cipher.

8.4 One of the most famous field ciphers ever was a fractionation system, the ADFGVX cipher, which was employed by the German Army during the First World War. This system was so named because it used a 6 x 6 matrix to substitution-encrypt the 26 letters of the alphabet and 10 digits into pairs of the symbols A, D, F, G, V and X. The resulting biliteral cipher is only an intermediate cipher; it is then written into a rectangular matrix and transposed to produce the final cipher, which is the one that would be transmitted. Here is an example of enciphering the phrase "Merchant Taylors" with this cipher using the key word "Subject".

    A  D  F  G  V  X
A   S  U  B  J  E  C
D   T  A  D  F  G  H
F   I  K  L  M  N  O
G   P  Q  R  V  W  X
V   Y  Z  0  1  2  3
X   4  5  6  7  8  9

Plaintext:  M  E  R  C  H  A  N  T  T  A  Y  L  O  R  S
Ciphertext: FG AV GF AX DX DD FV DA DA DD VA FF FX GF AA

This intermediate ciphertext can then be put in a transposition matrix based on a different key:

C  I  P  H  E  R
1  4  5  3  2  6
F  G  A  V  G  F
A  X  D  X  D  D
F  V  D  A  D  A
D  D  V  A  F  F
F  X  G  F  A  A

The final cipher is therefore: FAFDFGDDFAVXAAFGXVDXADDVGFDAFA.
(a) If a cryptanalyst wants to break the ADFGVX cipher, how many distinct attacks must he make, given that the length of the ciphertext is n?
(b) Suggest a decrypting algorithm for the ADFGVX cipher.

8.5 Consider the knapsack technique for encryption proposed by Ralph Merkle of XEROX and Martin Hellman of Stanford University in 1976. They suggested using the knapsack, or subset-sum, problem as the basis for a public-key cryptosystem. This problem entails determining whether a number can be expressed as a sum of some subset of a given sequence of numbers and, more importantly, which subset has the desired sum. Given a sequence of numbers A, where A = (a1, ..., an), and a number C, the knapsack problem is to find a subset of a1, ..., an which sums to C. Consider the following example:

n = 5, C = 14, A = (1, 10, 5, 22, 3)
Solution: 14 = 1 + 10 + 3

In general, all the possible sums of all subsets can be expressed by

m1 a1 + m2 a2 + m3 a3 + ... + mn an

where each mi is either 0 or 1. The solution above is therefore the binary vector M = (1, 1, 0, 0, 1). There is a total of 2^n such vectors (in this example 2^5 = 32). Obviously, not all values of C can be formed from the sum of a subset, and some can be formed in more than one way. For example, when A = (14, 28, 56, 82, 90, 132, 197, 284, 341, 455, 515), the number 515 can be formed in three different ways but the number 516 cannot be formed in any way.
(a) If a cryptanalyst wants to break this knapsack cipher, how many distinct attacks must he make?
(b) Suggest a decrypting algorithm for the knapsack cipher.
8.6 (a) Use the prime numbers 29 and 61 to generate keys using the RSA algorithm.
(b) Represent the letters 'RSA' in ASCII and encode them using the keys generated above.
(c) Next, generate keys using the pair of primes 37 and 67. Which is more secure, the keys in part (a) or part (c)? Why?

8.7 Write a program that performs encryption using DES.

8.8 Write a program to encode and decode using IDEA. Compare the number of computations required to encrypt a plaintext using the same key size for DES and IDEA.

8.9 Write a general program that can factorize a given number.

8.10 Write a program to encode and decode using the RSA algorithm. Plot the number of floating-point operations required to be performed by the program versus the key size.
  • 140. Information Theory, Coding and Cryptography 8.11 Consider the difference equation Xn+ I= axn(l - Xn) For a = 4, this function behaves like a chaos function. (a) Plot a sequence of 100 values obtained by iterative application of the difference equation. What happens if the starting values Xo = 0.5? (b) Take two initial conditions (i.e., two different starting values, Xor and .xo2 ) which are separated by セクN@ Use the difference equation to iterate each starting point ntimes and obtain the final values y01 and y02, which are separated by セケN@ For a given セクL@ plot L1y versus n. (c) For a given value of n (say n= 500), plot セク@ verus セケN@ (d) Repeat parts (a), (b) and (c) for a= 3.7 and a= 3.9. Compare and comment. (e) Develop a chaos-based encryption program that generates keys for single-key encryption. Use the chaos function xn+1 = 4xn(1-xJ (fj Compare the encryption speed of this chaos-based program with that of IDEA for a key length of 128 bits. (g) Compare the security of this chaos-based algorithm with that of IDEA for the 128 bit long key. Cryptography Index A Mathematical Theory of Communication 41 a scytale 265 AC coefficients 40 Additive White Gaussian Noise (AWGN) 56 Aeneas Tacticus 265 Algebraic attack 264 Asymmetric (Public-Key) Algorithms 254 Asymmetric Encryption 244 attacker 241 Augmented Generating Function 175 authenticity 242 Automatic Repeat Request 97 Avalanche Effect 259 Average Conditional Entropy 15 Average Conditional Self-Information 12 Average Mutual Information 11, 14 average number of nearest neighbours 221 Average Self-Information 11 Bandwidth Efficiency Diagram 60 Binary Entropy Function 12 Binary Golay Code 124 Binary Symmetric Channel 8 Blaise de Vigemere 266 Block Ciphers 247 Block Code 53, 77 Block Codes 53 Block Length 78, 161 Blocklength 161, 168 Brute force attack 263 BSC 13 Burst Error Correction 121 Burst Errors 121 Capacity Boundary 61 catastrophic 185 Catastrophic Convolutional Code 169 Catastrophic Error Propagation 170 Channel 49 Channel Capacity 50 Channel Coding 48, 76 Channel Coding Theorem 53 Channel Decoder 52 Channel Encoder 52 Channel Formatting 52 Channel Models 48 channel state information 229 channel transition probabilities 9