Acknowledgements
I would like to thank the Department of Electrical Engineering at the Indian Institute of
Technology (IIT), Delhi for providing a stimulating academic environment that inspired this
book. In particular, I would like to thank Prof. S.C. Dutta Roy, Prof. Surendra Prasad,
Prof. H.M. Gupta, Prof. V.K. Jain, Prof. Vinod Chandra, Prof. Santanu Chaudhury, Prof. S.D.
Joshi, Prof. Sheel Aditya, Prof. Devi Chadha, Prof. D. Nagchoudri, Prof. G.S. Visweswaran,
Prof. R.K. Patney, Prof. V.C. Prasad, Prof. S.S. Jamuar and Prof. R.K.P. Bhatt. I am also
thankful to Dr. Subrat Kar, Dr. Ranjan K. Mallik and Dr. Shankar Prakriya for friendly
discussions. I have been fortunate to have several batches of excellent students whose feedback
has helped me improve the contents of this book. Many of the problems given at the end of the
chapters have been tested either as assignment problems or examination problems.
My heartfelt gratitude is due to Prof. Bernard D. Steinberg, University of Pennsylvania, who
has been my guide, mentor, friend and also my Ph.D thesis advisor. I am also grateful to
Prof. Avraham Freedman, Tel Aviv University, for his support and suggestions as and when
sought by me. I would like to thank Prof. B. Sundar Rajan of the Electrical Communication
Engineering group at the Indian Institute of Science, Bangalore, with whom I had a preliminary
discussion about writing this book.
I wish to acknowledge valuable feedback on this initial manuscript from Prof. Ravi Motwani,
IIT Kanpur, Prof. A.K. Chaturvedi, IIT Kanpur, Prof. N. Kumaravel, Anna University, Prof. V.
Maleswara Rao, College of Engineering, GITAM, Visakhapatnam, Prof. M. Chandrasekaran,
Government College of Engineering, Salem and Prof. Vikram Gadre, IIT Mumbai.
I am indebted to my parents, for their love and moral support throughout my life. I am also
grateful to my grandparents for their blessings, and to my younger brother, Shantanu, for the
infinite discussions on finite topics.
Finally, I would like to thank my wife and best friend, Aloka, who encouraged me at every
stage of writing this book. Her constructive suggestions and balanced criticism have been
instrumental in making the book more readable and palatable. It was her infinite patience,
unending support, understanding and sense of humour that were critical in transforming my
dream into this book.
RANJAN BOSE
New Delhi
Contents

Preface
Acknowledgements

Part I: Information Theory and Source Coding

1. Source Coding
   1.1 Introduction to Information Theory
   1.2 Uncertainty and Information
   1.3 Average Mutual Information and Entropy
   1.4 Information Measures for Continuous Random Variables
   1.5 Source Coding Theorem
   1.6 Huffman Coding
   1.7 The Lempel-Ziv Algorithm
   1.8 Run Length Encoding and the PCX Format
   1.9 Rate Distortion Function
   1.10 Optimum Quantizer Design
   1.11 Introduction to Image Compression
   1.12 The JPEG Standard for Lossless Compression
   1.13 The JPEG Standard for Lossy Compression
   1.14 Concluding Remarks
   Summary
   Problems
   Computer Problems

2. Channel Capacity and Coding
   2.1 Introduction
   2.2 Channel Models
   2.3 Channel Capacity
   2.4 Channel Coding
   2.5 Information Capacity Theorem
   2.6 The Shannon Limit
   2.7 Random Selection of Codes
   2.8 Concluding Remarks
   Summary
   Problems
   Computer Problems

Part II: Error Control Coding (Channel Coding)

3. Linear Block Codes for Error Correction
   3.1 Introduction to Error Correcting Codes
   3.2 Basic Definitions
   3.3 Matrix Description of Linear Block Codes
   3.4 Equivalent Codes
   3.5 Parity Check Matrix
   3.6 Decoding of a Linear Block Code
   3.7 Syndrome Decoding
   3.8 Error Probability after Coding (Probability of Error Correction)
   3.9 Perfect Codes
   3.10 Hamming Codes
   3.11 Optimal Linear Codes
   3.12 Maximum Distance Separable (MDS) Codes
   3.13 Concluding Remarks
   Summary
   Problems
   Computer Problems

4. Cyclic Codes
   4.1 Introduction to Cyclic Codes
   4.2 Polynomials
   4.3 The Division Algorithm for Polynomials
   4.4 A Method for Generating Cyclic Codes
   4.5 Matrix Description of Cyclic Codes
   4.6 Burst Error Correction
   4.7 Fire Codes
   4.8 Golay Codes
   4.9 Cyclic Redundancy Check (CRC) Codes
   4.10 Circuit Implementation of Cyclic Codes
   4.11 Concluding Remarks
   Summary
   Problems
   Computer Problems

5. Bose-Chaudhuri Hocquenghem (BCH) Codes
   5.1 Introduction to BCH Codes
   5.2 Primitive Elements
   5.3 Minimal Polynomials
   5.4 Generator Polynomials in Terms of Minimal Polynomials
   5.5 Some Examples of BCH Codes
   5.6 Decoding of BCH Codes
   5.7 Reed-Solomon Codes
   5.8 Implementation of Reed-Solomon Encoders and Decoders
   5.9 Nested Codes
   5.10 Concluding Remarks
   Summary
   Problems
   Computer Problems

6. Convolutional Codes
   6.1 Introduction to Convolutional Codes
   6.2 Tree Codes and Trellis Codes
   6.3 Polynomial Description of Convolutional Codes (Analytical Representation)
   6.4 Distance Notions for Convolutional Codes
   6.5 The Generating Function
   6.6 Matrix Description of Convolutional Codes
   6.7 Viterbi Decoding of Convolutional Codes
   6.8 Distance Bounds for Convolutional Codes
   6.9 Performance Bounds
   6.10 Known Good Convolutional Codes
   6.11 Turbo Codes
   6.12 Turbo Decoding
   6.13 Concluding Remarks
   Summary
   Problems
   Computer Problems

7. Trellis Coded Modulation
   7.1 Introduction to TCM
   7.2 The Concept of Coded Modulation
   7.3 Mapping by Set Partitioning
   7.4 Ungerboeck's TCM Design Rules
   7.5 TCM Decoder
   7.6 Performance Evaluation for AWGN Channel
   7.7 Computation of d_free
   7.8 TCM for Fading Channels
   7.9 Concluding Remarks
   Summary
   Problems
   Computer Problems

Part III: Coding for Secure Communications

8. Cryptography
   8.1 Introduction to Cryptography
   8.2 An Overview of Encryption Techniques
   8.3 Operations Used by Encryption Algorithms
   8.4 Symmetric (Secret Key) Cryptography
   8.5 Data Encryption Standard (DES)
   8.6 International Data Encryption Algorithm (IDEA)
   8.7 RC Ciphers
   8.8 Asymmetric (Public-Key) Algorithms
   8.9 The RSA Algorithm
   8.10 Pretty Good Privacy (PGP)
   8.11 One-Way Hashing
   8.12 Other Techniques
   8.13 Secure Communication Using Chaos Functions
   8.14 Cryptanalysis
   8.15 Politics of Cryptography
   8.16 Concluding Remarks
   Summary
   Problems
   Computer Problems

Index
1
Source Coding
Not everything that can be counted counts, and not everything that counts can be counted.
-Albert Einstein (1879-1955)
1.1 INTRODUCTION TO INFORMATION THEORY
Today we live in the information age. The internet has become an integral part of our lives,
making this, the third planet from the sun, a global village. People talking over the cellular
phones is a common sight, sometimes even in cinema theatres. Movies can be rented in the
form of a DVD disk. Email addresses and web addresses are common on business cards. Many
people prefer to send emails and e-cards to their friends rather than the regular snail mail. Stock
quotes can be checked over the mobile phone.
Information has become the key to success (it has always been a key to success, but in today's
world it is the key). And behind all this information and its exchange lie the tiny 1's and 0's (the
omnipresent bits) that hold information by merely the way they sit next to one another. Yet the
information age that we live in today owes its existence, primarily, to a seminal paper published
in 1948 that laid the foundation of the wonderful field of Information Theory, a theory
initiated by one man, the American electrical engineer Claude E. Shannon, whose ideas
appeared in the article "A Mathematical Theory of Communication" in the Bell System
Technical Journal (1948). In its broadest sense, information includes the content of any of the
standard communication media, such as telegraphy, telephony, radio, or television, and the
signals of electronic computers, servo-mechanism systems, and other data-processing devices.
The theory is even applicable to the signals of the nerve networks of humans and other animals.
The chief concern of information theory is to discover mathematical laws governing systems
designed to communicate or manipulate information. It sets up quantitative measures of
information and of the capacity of various systems to transmit, store, and otherwise process
information. Some of the problems treated are related to finding the best methods of using
various available communication systems and the best methods for separating wanted
information or signal, from extraneous information or noise. Another problem is the setting of
upper bounds on the capacity of a given information-carrying medium (often called an
information channel). While the results are chiefly of interest to communication engineers,
some of the concepts have been adopted and found useful in such fields as psychology and
linguistics.
The boundaries of information theory are quite fuzzy. The theory overlaps heavily with
communication theory but is more oriented towards the fundamental limitations on the
processing and communication of information and less towards the detailed operation of the
devices employed.
In this chapter, we shall first develop an intuitive understanding of information. It will be
followed by mathematical models of information sources and a quantitative measure of the
information emitted by a source. We shall then state and prove the source coding theorem.
Having developed the necessary mathematical framework, we shall look at two source coding
techniques, the Huffman encoding and the Lempel-Ziv encoding. This chapter will then
discuss the basics of the Run Length Encoding. The concept of the Rate Distortion Function
and the Optimum Quantizer will then be introduced. The chapter concludes with an
introduction to image compression, one of the important application areas of source coding. In
particular, the JPEG (Joint Photographic Experts Group) standard will be discussed in brief.
1.2 UNCERTAINTY AND INFORMATION
Any information source, analog or digital, produces an output that is random in nature. If it
were not random, i.e., the output were known exactly, there would be no need to transmit it!
We live in an analog world and most sources are analog sources, for example, speech,
temperature fluctuations etc. The discrete sources are man-made sources, for example, a source
(say, a man) that generates a sequence of letters from a finite alphabet (typing his email).
Before we go on to develop a mathematical measure of information, let us develop an
intuitive feel for it. Read the following sentences:
(A) Tomorrow, the sun will rise from the East.
(B) The phone will ring in the next one hour.
(C) It will snow in Delhi this winter.
The three sentences carry different amounts of information. In fact, the first sentence hardly
carries any information. Everybody knows that the sun rises in the East and the probability of
this happening again is almost unity. Sentence (B) appears to carry more information than
sentence (A). The phone may ring, or it may not. There is a finite probability that the phone will
ring in the next one hour (unless the maintenance people are at work again!). The last sentence
probably made you read it over twice. This is because it has never snowed in Delhi, and the
probability of a snowfall is very low. It is interesting to note that the amount of information
carried by the sentences listed above have something to do with the probability of occurrence of
the events stated in the sentences. And we observe an inverse relationship. Sentence (A), which
talks about an event which has a probability of occurrence very close to 1 carries almost no
information. Sentence (C), which has a very low probability of occurrence, appears to carry a
lot of information (made us read it twice to be sure we got the information right!). The other
interesting thing to note is that the length of the sentence has nothing to do with the amount of
information it conveys. In fact, sentence (A) is the longest but carries the minimum information.
We will now develop a mathematical measure of information.
Definition 1.1 Consider a discrete random variable X with possible outcomes x_i, i = 1, 2, ..., n.
The Self-Information of the event X = x_i is defined as

$$I(x_i) = \log\left(\frac{1}{P(x_i)}\right) = -\log P(x_i) \qquad (1.1)$$
We note that a high probability event conveys less information than a low probability event.
For an event with P(x_i) = 1, I(x_i) = 0. Since a lower probability implies a higher degree of
uncertainty (and vice versa), a random variable with a higher degree of uncertainty contains
more information. We will use this correlation between uncertainty and level of information for
physical interpretations throughout this chapter.

The units of I(x_i) are determined by the base of the logarithm, which is usually selected as 2 or
e. When the base is 2, the units are in bits and when the base is e, the units are in nats (natural
units). Since 0 ≤ P(x_i) ≤ 1, I(x_i) ≥ 0, i.e., self-information is non-negative. The following two
examples illustrate why a logarithmic measure of information is appropriate.
Example 1.1 Consider a binary source which tosses a fair coin and outputs a 1 if a head (H)
appears and a 0 if a tail (T) appears. For this source, P(1) = P(0) = 0.5. The information content of
each output from the source is

$$I(x_i) = -\log_2 P(x_i) = -\log_2 (0.5) = 1 \text{ bit} \qquad (1.2)$$

Indeed, we have to use only one bit to represent the output from this binary source (say, we use
a 1 to represent H and a 0 to represent T).

Now, suppose the successive outputs from this binary source are statistically independent, i.e.,
the source is memoryless. Consider a block of m bits. There are 2^m possible m-bit blocks, each of
which is equally probable with probability 2^{-m}. The self-information of an m-bit block is

$$I(x_i) = -\log_2 P(x_i) = -\log_2 2^{-m} = m \text{ bits} \qquad (1.3)$$

Again, we observe that we indeed need m bits to represent the possible m-bit blocks.
Thus, this logarithmic measure of information possesses the desired additive property when a
number of source outputs is considered as a block.
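As a quick check of Examples 1.1 and 1.2, the short Python sketch below (illustrative only; it assumes base-2 logarithms, so the results are in bits) evaluates the self-information of a fair-coin output and of an m-bit block of independent coin flips:

```python
import math

def self_information(p: float) -> float:
    """Self-information I(x) = -log2 P(x), in bits."""
    return -math.log2(p)

# Fair coin: P(H) = P(T) = 0.5 -> 1 bit per outcome
print(self_information(0.5))        # 1.0

# A block of m independent fair-coin outputs has probability 2**-m
m = 8
print(self_information(2.0 ** -m))  # 8.0, i.e., m bits
```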
Example 1.2 Consider a discrete memoryless source (DMS) (source C) that outputs two bits at
a time. This source comprises two binary sources (sources A and B) as mentioned in Example 1.1,
each source contributing one bit. The two binary sources within the source C are independent.
Intuitively, the information content of the aggregate source (source C) should be the sum of the
information contained in the outputs of the two independent sources that constitute the source C.
Let us look at the information content of the outputs of source C. There are four possible outcomes
{00, 01, 10, 11}, each with probability P(C) = P(A)P(B) = (0.5)(0.5) = 0.25, because the sources
A and B are independent. The information content of each output from the source C is

$$I(C) = -\log_2 P(C) = -\log_2 (0.25) = 2 \text{ bits} \qquad (1.4)$$

We have to use two bits to represent the output from this combined binary source.
Thus, the logarithmic measure of information possesses the desired additive property for
independent events.
Next, consider two discrete random variables X and Y with possible outcomes x_i, i = 1, 2, ..., n
and y_j, j = 1, 2, ..., m respectively. Suppose we observe some outcome Y = y_j and we want to
determine the amount of information this event provides about the event X = x_i, i = 1, 2, ..., n, i.e.,
we want to mathematically represent the mutual information. We note the two extreme cases:

(i) X and Y are independent, in which case the occurrence of Y = y_j provides no information
about X = x_i.
(ii) X and Y are fully dependent events, in which case the occurrence of Y = y_j determines the
occurrence of the event X = x_i.

A suitable measure that satisfies these conditions is the logarithm of the ratio of the conditional
probability

$$P(X = x_i \mid Y = y_j) = P(x_i \mid y_j) \qquad (1.5)$$

divided by the probability

$$P(X = x_i) = P(x_i) \qquad (1.6)$$
Definition 1.2 The mutual information I(x_i; y_j) between x_i and y_j is defined as

$$I(x_i; y_j) = \log\left(\frac{P(x_i \mid y_j)}{P(x_i)}\right) \qquad (1.7)$$

As before, the units of I(x_i; y_j) are determined by the base of the logarithm, which is
usually selected as 2 or e. When the base is 2, the units are in bits. Note that

$$\frac{P(x_i \mid y_j)}{P(x_i)} = \frac{P(x_i \mid y_j)P(y_j)}{P(x_i)P(y_j)} = \frac{P(x_i, y_j)}{P(x_i)P(y_j)} = \frac{P(y_j \mid x_i)}{P(y_j)} \qquad (1.8)$$

Therefore,

$$I(x_i; y_j) = I(y_j; x_i) \qquad (1.9)$$

The physical interpretation of I(x_i; y_j) = I(y_j; x_i) is as follows. The information provided by the
occurrence of the event Y = y_j about the event X = x_i is identical to the information provided by
the occurrence of the event X = x_i about the event Y = y_j.
Let us now verify the two extreme cases:

(i) When the random variables X and Y are statistically independent, P(x_i | y_j) = P(x_i), which
leads to I(x_i; y_j) = 0.
(ii) When the occurrence of Y = y_j uniquely determines the occurrence of the event X = x_i,
P(x_i | y_j) = 1, and the mutual information becomes

$$I(x_i; y_j) = \log\left(\frac{1}{P(x_i)}\right) = -\log P(x_i) \qquad (1.10)$$

This is the self-information of the event X = x_i.
Thus, the logarithmic definition of mutual information confirms our intuition.
Example 1.3 Consider a Binary Symmetric Channel (BSC) as shown in Fig. 1.1. It is a channel
that transports 1's and 0's from the transmitter (Tx) to the receiver (Rx). It makes an error
occasionally, with probability p. A BSC flips a 1 to 0 and vice versa with equal probability. Let X
and Y be binary random variables that represent the input and output of this BSC respectively. Let
the input symbols be equally likely and the output symbols depend upon the input according to the
channel transition probabilities given below:

P(Y = 0 | X = 0) = 1 - p,
P(Y = 0 | X = 1) = p,
P(Y = 1 | X = 1) = 1 - p,
P(Y = 1 | X = 0) = p.

Fig. 1.1 A Binary Symmetric Channel.

This simply implies that the probability of a bit getting flipped (i.e., in error) when transmitted over
this BSC is p. From the channel transition probabilities we have

P(Y = 0) = P(X = 0) P(Y = 0 | X = 0) + P(X = 1) P(Y = 0 | X = 1) = 0.5(1 - p) + 0.5(p) = 0.5, and
P(Y = 1) = P(X = 0) P(Y = 1 | X = 0) + P(X = 1) P(Y = 1 | X = 1) = 0.5(p) + 0.5(1 - p) = 0.5.
Suppose we are at the receiver and we want to determine what was transmitted at the transmitter,
on the basis of what was received. The mutual information about the occurrence of the event X = 0,
given that Y = 0, is

$$I(x_0; y_0) = I(0; 0) = \log_2\left(\frac{P(Y=0 \mid X=0)}{P(Y=0)}\right) = \log_2\left(\frac{1-p}{0.5}\right) = \log_2 2(1 - p).$$

Similarly,

$$I(x_1; y_0) = I(1; 0) = \log_2\left(\frac{P(Y=0 \mid X=1)}{P(Y=0)}\right) = \log_2\left(\frac{p}{0.5}\right) = \log_2 2p.$$
Let us consider some specific cases. Suppose p = 0, i.e., it is an ideal (noiseless) channel. Then,

I(x_0; y_0) = I(0; 0) = log2 2(1 - p) = 1 bit.

Hence, from the output, we can determine what was transmitted with certainty. Recall that the
self-information about the event X = x_0 was 1 bit.

However, if p = 0.5, we get

I(x_0; y_0) = I(0; 0) = log2 2(1 - p) = log2 2(0.5) = 0.

It is clear that the output gives us no information about what was transmitted. Thus, it is a
useless channel. For such a channel, we may as well toss a fair coin at the receiver in order to
determine what was sent!

Suppose we have a channel where p = 0.1. Then,

I(x_0; y_0) = I(0; 0) = log2 2(1 - p) = log2 2(0.9) = 0.848 bits.
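These values are easy to reproduce numerically. The sketch below (an illustrative helper, assuming equally likely inputs as in this example) evaluates I(x_0; y_0) = log2 2(1 - p) for a few values of p:

```python
import math

def bsc_mutual_information(p: float) -> float:
    """I(x0; y0) = log2( P(Y=0|X=0) / P(Y=0) ) for a BSC with
    equally likely inputs, where P(Y=0) = 0.5."""
    return math.log2((1.0 - p) / 0.5)

for p in (0.0, 0.1, 0.5):
    print(p, round(bsc_mutual_information(p), 3))
# 0.0 -> 1.0 bit (noiseless), 0.1 -> 0.848 bits, 0.5 -> 0.0 (useless channel)
```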
Example 1.4 Let X and Y be binary random variables that represent the input and output of the
binary channel shown in Fig. 1.2. Let the input symbols be equally likely, and the output symbols
depend upon the input according to the channel transition probabilities:

P(Y = 0 | X = 0) = 1 - p_0,
P(Y = 0 | X = 1) = p_1,
P(Y = 1 | X = 1) = 1 - p_1,
P(Y = 1 | X = 0) = p_0.

Fig. 1.2 A Binary Channel with Asymmetric Probabilities.

From the channel transition probabilities we have

P(Y = 0) = P(X = 0) P(Y = 0 | X = 0) + P(X = 1) P(Y = 0 | X = 1) = 0.5(1 - p_0) + 0.5(p_1) = 0.5(1 - p_0 + p_1), and
P(Y = 1) = P(X = 0) P(Y = 1 | X = 0) + P(X = 1) P(Y = 1 | X = 1) = 0.5(p_0) + 0.5(1 - p_1) = 0.5(1 - p_1 + p_0).
Suppose we are at the receiver and we want to determine what was transmitted at the transmitter,
on the basis of what is received. The mutual information about the occurrence of the event X = 0,
given that Y = 0, is

$$I(x_0; y_0) = I(0; 0) = \log_2\left(\frac{P(Y=0 \mid X=0)}{P(Y=0)}\right) = \log_2\left(\frac{1-p_0}{0.5(1-p_0+p_1)}\right) = \log_2\left(\frac{2(1-p_0)}{1-p_0+p_1}\right).$$

Similarly,

$$I(x_1; y_0) = I(1; 0) = \log_2\left(\frac{P(Y=0 \mid X=1)}{P(Y=0)}\right) = \log_2\left(\frac{2p_1}{1-p_0+p_1}\right).$$
Definition 1.3 The Conditional Self-Information of the event X = x_i given Y = y_j is defined as

$$I(x_i \mid y_j) = \log\left(\frac{1}{P(x_i \mid y_j)}\right) = -\log P(x_i \mid y_j) \qquad (1.11)$$

Thus, we may write

$$I(x_i; y_j) = I(x_i) - I(x_i \mid y_j) \qquad (1.12)$$

The conditional self-information can be interpreted as the self-information about the
event X = x_i on the basis of the event Y = y_j. Recall that both I(x_i) ≥ 0 and I(x_i | y_j) ≥ 0.
Therefore, I(x_i; y_j) < 0 when I(x_i) < I(x_i | y_j) and I(x_i; y_j) > 0 when I(x_i) > I(x_i | y_j).
Hence, mutual information can be positive, negative or zero.
Example 1.5 Consider the BSC discussed in Example 1.3. The plot of the mutual information
I(x_0; y_0) versus the probability of error, p, is given in Fig. 1.3.

Fig. 1.3 The Plot of the Mutual Information I(x_0; y_0) Versus the Probability of Error, p.
It can be seen from the figure that I(x_0; y_0) is negative for p > 0.5. The physical interpretation is as
follows. A negative mutual information implies that, having observed Y = y_0, we must avoid
choosing X = x_0 as the transmitted bit.

For p = 0.1,

I(x_0; y_1) = I(0; 1) = log2 2(p) = log2 2(0.1) = -2.322 bits.

This shows that the mutual information between the events X = x_0 and Y = y_1 is negative for p = 0.1.

For the extreme case of p = 1, we have

I(x_0; y_1) = I(0; 1) = log2 2(p) = log2 2(1) = 1 bit.

The channel always changes a 0 to a 1 and vice versa (since p = 1). This implies that if y_1 is
observed at the receiver, it can be concluded that x_0 was actually transmitted. This is actually a
useful channel with a 100% bit error rate! We just flip the received bit.
1.3 AVERAGE MUTUAL INFORMATION AND ENTROPY

So far we have studied the mutual information associated with a pair of events x_i and y_j, which
are the possible outcomes of the two random variables X and Y. We now want to find the
average mutual information between the two random variables. This can be obtained simply by
weighting I(x_i; y_j) by the probability of occurrence of the joint event and summing over all
possible joint events.
Definition 1.4 The Average Mutual Information between two random variables
X and Y is given by

$$I(X; Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\, I(x_i; y_j) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\log\frac{P(x_i, y_j)}{P(x_i)P(y_j)} \qquad (1.13)$$

For the case when X and Y are statistically independent, I(X; Y) = 0, i.e., there is no
average mutual information between X and Y. An important property of the average
mutual information is that I(X; Y) ≥ 0, where equality holds if and only if X and Y are
statistically independent.
Definition 1.5 The Average Self-Information of a random variable X is defined as

$$H(X) = \sum_{i=1}^{n} P(x_i)\, I(x_i) = -\sum_{i=1}^{n} P(x_i)\log P(x_i) \qquad (1.14)$$

When X represents the alphabet of possible output letters from a source, H(X)
represents the average information per source letter. In this case H(X) is called the
entropy. The term entropy has been borrowed from statistical mechanics, where it is
used to denote the level of disorder in a system. It is interesting to see that the Chinese
character for entropy looks like 熵!
Example 1.6 Consider a discrete binary source that emits a sequence of statistically independent
symbols. The output is either a 0 with probability p or a 1 with probability 1 - p. The entropy of
this binary source is

$$H(X) = -\sum_{i=0}^{1} P(x_i)\log P(x_i) = -p\log_2(p) - (1-p)\log_2(1-p) \qquad (1.15)$$
The plot of the Binary Entropy Function versus p is given in Fig. 1.4.
We observe from the figure that the value of the binary entropy function reaches its maximum
value for p = 0.5, i.e., when both 1 and 0 are equally likely. In general it can be shown that the
entropy of a discrete source is maximum when the letters from the source are equally probable.
Fig. 1.4 The Binary Entropy Function, H(X) = -p log2(p) - (1 - p) log2(1 - p).
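The shape of this curve can be reproduced with a few lines of code; the sketch below (function name is illustrative) evaluates the binary entropy function and confirms that it peaks at 1 bit for p = 0.5:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(binary_entropy(p), 4))
# The maximum value, 1 bit, occurs at p = 0.5 (both symbols equally likely).
```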
Definition 1.6 The Average Conditional Self-Information, called the conditional
entropy, is defined as

$$H(X \mid Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\log\frac{1}{P(x_i \mid y_j)} \qquad (1.16)$$

The physical interpretation of this definition is as follows. H(X | Y) is the information
(or uncertainty) in X having observed Y. Based on the definitions of H(X | Y) and
H(Y | X) we can write

$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \qquad (1.17)$$

We make the following observations.

(i) Since I(X; Y) ≥ 0, it implies that H(X) ≥ H(X | Y).
(ii) The case I(X; Y) = 0 implies that H(X) = H(X | Y), which is possible if and only
if X and Y are statistically independent.
(iii) Since H(X | Y) is the conditional self-information about X given Y and H(X) is
the average uncertainty (self-information) of X, I(X; Y) is the average reduction in
uncertainty about X from having observed Y.
(iv) Since H(X) ≥ H(X | Y), the observation of Y does not increase the entropy
(uncertainty); it can only decrease the entropy. That is, observing Y cannot
reduce the information about X; it can only add to the information.
Example 1.7 Consider the BSC discussed in Example 1.3. Let the input symbols be '0' with
probability q and '1' with probability 1 - q, as shown in Fig. 1.5.

Fig. 1.5 A Binary Symmetric Channel (BSC) with Input Symbol Probabilities Equal to q and 1 - q.

The entropy of this binary source is

$$H(X) = -\sum_{i=0}^{1} P(x_i)\log P(x_i) = -q\log_2(q) - (1-q)\log_2(1-q) \qquad (1.18)$$

The conditional entropy is given by

$$H(X \mid Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\log\frac{1}{P(x_i \mid y_j)} \qquad (1.19)$$

In order to calculate the values of H(X | Y), we can make use of the following equalities:

$$P(x_i, y_j) = P(x_i \mid y_j)P(y_j) = P(y_j \mid x_i)P(x_i)$$

The plot of H(X | Y) versus q is given in Fig. 1.6, with p as the parameter.
Fig. 1.6 The Plot of Conditional Entropy H(X | Y) Versus q.
The average mutual information I(X; Y) is given in Fig. 1.7. It can be seen from the plot that as we
increase the parameter p from 0 to 0.5, I(X; Y) decreases. Physically it implies that, as we make the
channel less reliable (increase the value of p towards 0.5), the mutual information between the random
variable X (at the transmitter) and the random variable Y (at the receiver) decreases.
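A numerical check of this behaviour is straightforward. The sketch below (an illustrative helper, not taken from the text) uses the identity I(X; Y) = H(Y) - H(Y | X) from Eq. (1.17); that H(Y | X) equals the binary entropy of the crossover probability p for a BSC is a standard fact assumed here rather than derived in the text:

```python
import math

def h2(p: float) -> float:
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_average_mutual_information(q: float, p: float) -> float:
    """I(X; Y) = H(Y) - H(Y|X) for a BSC with P(X=0) = q and crossover probability p."""
    p_y0 = q * (1 - p) + (1 - q) * p   # P(Y = 0)
    return h2(p_y0) - h2(p)            # H(Y) - H(Y|X)

for p in (0.0, 0.1, 0.3, 0.5):
    print(p, round(bsc_average_mutual_information(0.5, p), 4))
# I(X; Y) shrinks toward 0 as p approaches 0.5, i.e., as the channel becomes useless.
```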
1.4 INFORMATION MEASURES FOR CONTINUOUS RANDOM VARIABLES
The definitions of mutual information for discrete random variables can be directly extended to
continuous random variables. Let X and Y be random variables with joint probability density
function (pdf) p(x, y) and marginal pdfs p(x) and p(y). The average mutual information between X
and Y is defined as follows.
Definition 1.7 The average mutual information between two continuous
random variables X and Y is defined as

$$I(X; Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x)\,p(y \mid x)\log\frac{p(y \mid x)\,p(x)}{p(x)\,p(y)}\,dx\,dy \qquad (1.20)$$

Fig. 1.7 The Plot of the Average Mutual Information I(X; Y) Versus q.
It should be pointed out that the definition of average mutual information can be
carried over from discrete random variables to continuous random variables, but the
concept and physical interpretation cannot. The reason is that the information
content in a continuous random variable is actually infinite, and we require an infinite
number of bits to represent a continuous random variable precisely. The self-information,
and hence the entropy, is infinite. To get around this problem we define
a quantity called the differential entropy.

Definition 1.8 The differential entropy of a continuous random variable X is
defined as

$$h(X) = -\int_{-\infty}^{\infty} p(x)\log p(x)\,dx \qquad (1.21)$$

Again, it should be understood that there is no physical meaning attached to the
above quantity. We carry on with extending our definitions further.
Definition 1.9 The Average Conditional Entropy of a continuous random
variable X given Y is defined as

$$H(X \mid Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x, y)\log\frac{1}{p(x \mid y)}\,dx\,dy \qquad (1.22)$$

The average mutual information can be expressed as

$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \qquad (1.23)$$
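As a sanity check of Definition 1.8, the sketch below (a rough midpoint-rule integration; the closed-form values quoted in the comments are standard results assumed here, not derived in this chapter) estimates the differential entropy of a uniform and of a Gaussian density, and shows that differential entropy can even be negative:

```python
import math

def differential_entropy(pdf, lo, hi, n=200_000):
    """Numerically estimate h(X) = -integral of p(x)*log2 p(x) dx over [lo, hi]."""
    dx = (hi - lo) / n
    h = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = pdf(x)
        if p > 0.0:
            h -= p * math.log2(p) * dx
    return h

# Uniform density on (0, a): h(X) = log2(a), which is negative for a < 1.
a = 0.5
print(differential_entropy(lambda x: 1.0 / a, 0.0, a))            # about -1.0

# Gaussian density with sigma = 1: h(X) = 0.5*log2(2*pi*e), about 2.05 bits.
gauss = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
print(differential_entropy(gauss, -10.0, 10.0))                   # about 2.05
```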
1.5 SOURCE CODING THEOREM

In this section we explore efficient representation (efficient coding) of symbols generated by a
source. The primary objective is the compression of data by efficient representation of the
symbols. Suppose a discrete memoryless source (DMS) outputs a symbol every t seconds and
each symbol is selected from a finite set of symbols x_i, i = 1, 2, ..., L, occurring with probabilities
P(x_i), i = 1, 2, ..., L. The entropy of this DMS in bits per source symbol is

$$H(X) = -\sum_{i=1}^{L} P(x_i)\log_2 P(x_i) \le \log_2 L \qquad (1.24)$$

The equality holds when the symbols are equally likely. It means that the average number of
bits per source symbol is H(X) and the source rate is H(X)/t bits/sec.

Now let us represent the 26 letters of the English alphabet using bits. We observe that
2^5 = 32 > 26. Hence, each of the letters can be uniquely represented using 5 bits. This is an
example of a Fixed Length Code (FLC). Each letter has a corresponding 5-bit-long codeword.
Definition 1.10 A code is a set of vectors called codewords.

Suppose a DMS outputs a symbol selected from a finite set of symbols x_i, i = 1, 2, ...,
L. The number of bits R required for unique coding when L is a power of 2 is

$$R = \log_2 L, \qquad (1.25)$$

and, when L is not a power of 2, it is

$$R = \lfloor \log_2 L \rfloor + 1. \qquad (1.26)$$

As we saw earlier, to encode the letters of the English alphabet, we need R = ⌊log2 26⌋
+ 1 = 5 bits. The FLC for the English alphabet suggests that every letter in the
alphabet is equally important (probable) and hence each one requires 5 bits for
representation. However, we know that some letters are less common (x, q, z, etc.)
while others are more frequently used (s, t, e, etc.). It appears that allotting an equal
number of bits to the frequently used letters as well as to the not so commonly used
letters is not an efficient way of representation (coding). Intuitively, we should
represent the more frequently occurring letters by fewer bits and the less frequently
occurring letters by more bits. In this manner, if we have to encode a whole page of
written text, we might end up using fewer bits overall. When the source symbols are
not equally probable, a more efficient method is to use a Variable Length Code (VLC).
Example 1.8 Suppose we have only the first eight letters of the English alphabet (A-H) in our
vocabulary. The Fixed Length Code (FLC) for this set of letters would be

Letter  Codeword    Letter  Codeword
A       000         E       100
B       001         F       101
C       010         G       110
D       011         H       111

Fixed Length Code

A VLC for the same set of letters can be

Letter  Codeword    Letter  Codeword
A       00          E       101
B       010         F       110
C       011         G       1110
D       100         H       1111

Variable Length Code 1
Suppose we have to code the series of letters: "A BAD CAB". The fixed length and the variable
length representations of this pseudo sentence would be

Fixed Length Code       000 001 000 011 010 000 001    Total bits = 21
Variable Length Code 1  00 010 00 100 011 00 010       Total bits = 18

Note that the variable length code uses fewer bits simply because the letters appearing
more frequently in the pseudo sentence are represented with fewer bits.
We look at yet another VLC for the first 8 letters of the English alphabet:

Letter  Codeword    Letter  Codeword
A       0           E       10
B       1           F       11
C       00          G       000
D       01          H       111

Variable Length Code 2

This second variable length code appears to be more efficient in terms of representation of the
letters.
Variable Length Code 1  00 010 00 100 011 00 010    Total bits = 18
Variable Length Code 2  0 1001 0001                 Total bits = 9

However, there is a problem with VLC2. Consider the sequence of bits 0 1001 0001, which is
used to represent A BAD CAB. We could regroup the bits in a different manner to have [0]
[10][0][1][0][0][01], which translates to A EAB AAD, or [0][1][0][0][1][0][0][0][1], which
stands for A BAAB AAAB! Obviously there is a problem with the unique decoding of the code.
We have no clue where one codeword (symbol) ends and the next one begins, since the lengths of
the codewords are variable. However, this problem does not exist with VLC1. Here no codeword
forms the prefix of any other codeword. This is called the prefix condition. As soon as a sequence
of bits corresponding to any one of the possible codewords is detected, we can declare that symbol
decoded. Such codes, called Uniquely Decodable or Instantaneous Codes, cause no decoding
delay. In this example, VLC2 is not a uniquely decodable code, hence not a code of any utility.
VLC1 is uniquely decodable, though less economical in terms of bits per symbol.
Definition 1.11 A Prefix Code is one in which no codeword forms the prefix of any
other codeword. Such codes are also called Uniquely Decodable or Instantaneous
Codes.
We now proceed to devise a systematic procedure for constructing uniquely
decodable, variable length codes that are efficient in terms of the average number of
bits per source letter. Let the source output a symbol from a finite set of symbols x_i,
i = 1, 2, ..., L, occurring with probabilities P(x_i), i = 1, 2, ..., L. The average number of
bits per source letter is defined as

$$\bar{R} = \sum_{k=1}^{L} n(x_k)P(x_k) \qquad (1.27)$$

where n(x_k) is the length of the codeword for the symbol x_k.
Theorem 1.1 (Kraft Inequality) A necessary and sufficient condition for the existence of
a binary code with codewords having lengths n_1 ≤ n_2 ≤ ... ≤ n_L that satisfy the prefix condition
is

$$\sum_{k=1}^{L} 2^{-n_k} \le 1 \qquad (1.28)$$

Proof First we prove the sufficient condition. Consider a binary tree of order (depth) n =
n_L. This tree has 2^{n_L} terminal nodes, as depicted in Fig. 1.8. Let us select any node of order
n_1 as the first codeword c_1. Since no codeword is the prefix of any other codeword (the
prefix condition), this choice eliminates 2^{n - n_1} terminal nodes. This process continues until
the last codeword is assigned at the terminal node n = n_L. Consider the node of order j < L.
The fraction of terminal nodes eliminated is

$$\sum_{k=1}^{j} 2^{-n_k} < \sum_{k=1}^{L} 2^{-n_k} \le 1. \qquad (1.29)$$

Thus, we can construct a prefix code that is embedded in the full tree of n_L nodes.
The nodes that are eliminated are depicted by the dotted arrow lines leading to them in
the figure.

Fig. 1.8 A Binary Tree of Order n_L.

We now prove the necessary condition. We observe that in the code tree of order n = n_L,
the number of terminal nodes eliminated from the total number of 2^n terminal nodes is

$$\sum_{k=1}^{L} 2^{n - n_k} \le 2^n \qquad (1.30)$$

This leads to

$$\sum_{k=1}^{L} 2^{-n_k} \le 1. \qquad (1.31)$$
Example 1.9 Consider the construction of a prefix code using a binary tree.
Fig. 1.9 Constructing a Binary Prefix Code using a Binary Tree.
We start from the mother node and proceed toward the terminal nodes of the binary tree (Fig. 1.9).
Let the mother node be labelled '0' (it could have been labelled '1' as well). Each node gives rise to
two branches (binary tree). Let us label the upper branch '0' and the lower branch '1' (these labels
could also have been mutually exchanged). First we follow the upper branch from the mother node.
We obtain our first codeword c_1 = 0, terminating at node n_00. Since we want to construct a prefix
code where no codeword is a prefix of any other codeword, we must discard all the daughter nodes
generated from the node labelled c_1.

Next, we proceed on the lower branch from the mother node and reach the node n_01. We proceed
along the upper branch first and reach node n_010. We label this as the codeword c_2 = 10 (the labels
of the branches that lead up to this node travelling from the mother node). Following the lower
branch from the node n_01, we ultimately reach the terminal nodes n_0110 and n_0111, which correspond
to the codewords c_3 = 110 and c_4 = 111 respectively.

Thus the binary tree has given us four prefix codewords: {0, 10, 110, 111}. By construction, this
is a prefix code. For this code

$$\sum_{k=1}^{4} 2^{-n_k} = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 0.5 + 0.25 + 0.125 + 0.125 = 1$$

Thus, the Kraft inequality is satisfied.
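Checking the Kraft inequality for a candidate set of codeword lengths takes only a couple of lines; the sketch below (illustrative only) tests the lengths {1, 2, 3, 3} of Example 1.9 and a set of lengths for which no prefix code can exist:

```python
def kraft_sum(lengths):
    """Return the sum of 2**(-n_k) over the codeword lengths n_k."""
    return sum(2.0 ** -n for n in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> a prefix code exists (Example 1.9)
print(kraft_sum([1, 1, 2]))      # 1.25 -> violates the inequality; no prefix code possible
```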
We now state and prove the noiseless Source Coding theorem, which applies to the codes that
satisfy the prefix condition.
Theorem 1.2 (Source Coding Theorem) Let X be the set of letters from a DMS with finite
entropy H(X) and x_k, k = 1, 2, ..., L the output symbols, occurring with probabilities P(x_k),
k = 1, 2, ..., L. Given these parameters, it is possible to construct a code that satisfies the
prefix condition and has an average length R̄ that satisfies the inequality

$$H(X) \le \bar{R} < H(X) + 1 \qquad (1.32)$$

Proof First consider the lower bound of the inequality. For codewords that have length
n_k, 1 ≤ k ≤ L, the difference H(X) - R̄ can be expressed as

$$H(X) - \bar{R} = \sum_{k=1}^{L} P_k \log_2\frac{1}{P_k} - \sum_{k=1}^{L} P_k n_k = \sum_{k=1}^{L} P_k \log_2\frac{2^{-n_k}}{P_k}$$

We now make use of the inequality ln x ≤ x - 1 to get

$$H(X) - \bar{R} \le (\log_2 e)\left(\sum_{k=1}^{L} 2^{-n_k} - 1\right) \le 0$$

The last inequality follows from the Kraft inequality. Equality holds if and only if P_k = 2^{-n_k}
for 1 ≤ k ≤ L. Thus the lower bound is proved.

Next, we prove the upper bound. Let us select the codeword lengths n_k such that
2^{-n_k} ≤ P_k < 2^{-n_k + 1}. First consider 2^{-n_k} ≤ P_k. Summing both sides over 1 ≤ k ≤ L gives us

$$\sum_{k=1}^{L} 2^{-n_k} \le \sum_{k=1}^{L} P_k = 1$$

which is the Kraft inequality, for which there exists a code satisfying the prefix condition.
Next consider P_k < 2^{-n_k+1}. Taking the logarithm of both sides gives

log2 P_k < -n_k + 1, or, n_k < 1 - log2 P_k.

On multiplying both sides by P_k and summing over 1 ≤ k ≤ L we obtain

$$\sum_{k=1}^{L} P_k n_k < \sum_{k=1}^{L} P_k + \left(-\sum_{k=1}^{L} P_k\log_2 P_k\right)$$

or,

$$\bar{R} < H(X) + 1$$

Thus the upper bound is proved.
The Source Coding Theorem tells us that for any prefix code used to represent the symbols
from a source, the minimum number of bits required to represent the source symbols on an
average must be at least equal to the entropy of the source. If we have found a prefix code that
satisfies R̄ = H(X) for a certain source X, we must abandon further search because we cannot do
any better. The theorem also tells us that a source with higher entropy (uncertainty) requires, on
average, more bits to represent the source symbols in terms of a prefix code.
Definition 1.12 The efficiency of a prefix code is defined as

$$\eta = \frac{H(X)}{\bar{R}} \qquad (1.33)$$

It is clear from the source coding theorem that the efficiency of a prefix code satisfies η ≤ 1.
Efficient representation of symbols leads to compression of data. Source coding is
primarily used for compression of data (and images).
Example 1.10 Consider a source X which generates four symbols with probabilities p_1 = 0.5,
p_2 = 0.3, p_3 = 0.1 and p_4 = 0.1. The entropy of this source is

$$H(X) = -\sum_{k=1}^{4} p_k \log_2 p_k = 1.685 \text{ bits.}$$

Suppose we use the prefix code {0, 10, 110, 111} constructed in Example 1.9. Then the average
codeword length R̄ is given by

$$\bar{R} = \sum_{k=1}^{4} n(x_k)P(x_k) = 1(0.5) + 2(0.3) + 3(0.1) + 3(0.1) = 1.700 \text{ bits.}$$

Thus we have

$$H(X) \le \bar{R} < H(X) + 1$$

The efficiency of this code is η = (1.685/1.700) = 0.9912. Had the source symbol probabilities
been p_k = 2^{-n_k}, i.e., p_1 = 2^{-1} = 0.5, p_2 = 2^{-2} = 0.25, p_3 = 2^{-3} = 0.125 and p_4 = 2^{-3} = 0.125, the
average codeword length would be R̄ = 1.750 bits = H(X). In this case, η = 1.
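These numbers are easily verified. The sketch below (variable names are illustrative) computes H(X), the average length R̄ and the efficiency for the source and prefix code of Example 1.10:

```python
import math

probs   = [0.5, 0.3, 0.1, 0.1]   # P(x_k) from Example 1.10
lengths = [1, 2, 3, 3]           # lengths of the prefix code {0, 10, 110, 111}

H = -sum(p * math.log2(p) for p in probs)          # source entropy
R = sum(n * p for n, p in zip(lengths, probs))     # average codeword length
print(round(H, 3), round(R, 3), round(H / R, 3))
# H ~ 1.685 bits, R = 1.7 bits, efficiency ~ 0.99
```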
1.6 HUFFMAN CODING

We will now study an algorithm for constructing efficient source codes for a DMS with source
symbols that are not equally probable. A variable length encoding algorithm was suggested by
Huffman in 1952, based on the source symbol probabilities P(x_i), i = 1, 2, ..., L. The algorithm
is optimal in the sense that the average number of bits it requires to represent the source symbols
is a minimum, and it also meets the prefix condition. The steps of the Huffman coding algorithm
are given below:
(i) Arrange the source symbols in decreasing order of their probabilities.
(ii) Take the bottom two symbols and tie them together as shown in Fig. 1.10. Add the
probabilities of the two symbols and write it on the combined node. Label the two
branches with a '1' and a '0'.

Fig. 1.10 Combining Probabilities in Huffman Coding.
(iii) Treat this sum of probabilities as a new probability associated with a new symbol. Again
pick the two smallest probabilities, tie them together to form a new probability. Each
time we perform the combination of two symbols we reduce the total number of symbols
by one. Whenever we tie together two probabilities (nodes) we label the two branches
with a '1' and a '0'.
(iv) Continue the procedure until only one probability is left (and it should be 1 if your
addition is correct!). This completes the construction of the Huffman tree.
(v) To find out the prefix codeword for any symbol, follow the branches from the final node
back to the symbol. While tracing back the route read out the labels on the branches.
This is the codeword for the symbol.
The algorithm can be easily understood using the following example.
Example 1.11 Consider a DMS with seven possible symbols x_i, i = 1, 2, ..., 7 and the
corresponding probabilities p_1 = 0.37, p_2 = 0.33, p_3 = 0.16, p_4 = 0.07, p_5 = 0.04, p_6 = 0.02, and
p_7 = 0.01. We first arrange the probabilities in decreasing order and then construct the Huffman
tree as in Fig. 1.11.

Symbol    Probability    Self-Information    Codeword
x1        0.37           1.4344              0
x2        0.33           1.5995              10
x3        0.16           2.6439              110
x4        0.07           3.8365              1110
x5        0.04           4.6439              11110
x6        0.02           5.6439              111110
x7        0.01           6.6439              111111
Fig. 1.11 Huffman Coding for Example 1.11.
To find the codeword for any particular symbol, we just trace back the route from the final node to
the symbol. For the sake of illustration, the route for the symbol x_4 (probability 0.07) is shown with
a dotted line in the figure. We read out the labels of the branches on the way to obtain the codeword
1110.

The entropy of the source is found to be

$$H(X) = -\sum_{k=1}^{7} p_k \log_2 p_k = 2.1152 \text{ bits,}$$

and the average number of binary digits per symbol is calculated to be

$$\bar{R} = \sum_{k=1}^{7} n(x_k)P(x_k) = 1(0.37) + 2(0.33) + 3(0.16) + 4(0.07) + 5(0.04) + 6(0.02) + 6(0.01) = 2.1700 \text{ bits.}$$

The efficiency of this code is η = (2.1152/2.1700) = 0.9747.
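The codeword lengths and the average length of Example 1.11 can be reproduced with a small heap-based Huffman routine; the sketch below is a minimal implementation (it does not reproduce the book's exact labelling or tie-breaking, only the code lengths):

```python
import heapq
import itertools

def huffman_code_lengths(probs):
    """Return Huffman codeword lengths (in bits) for the given probabilities."""
    counter = itertools.count()            # tie-breaker so heapq never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, ids1 = heapq.heappop(heap)
        p2, _, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:              # each merge adds one bit to these symbols
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), ids1 + ids2))
    return lengths

probs = [0.37, 0.33, 0.16, 0.07, 0.04, 0.02, 0.01]
lengths = huffman_code_lengths(probs)
print(lengths)                                     # [1, 2, 3, 4, 5, 6, 6]
print(sum(n * p for n, p in zip(lengths, probs)))  # ~2.17 bits per symbol
```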
Example 1.12 This example shows that Huffman coding is not unique. Consider a DMS with
seven possible symbols x_i, i = 1, 2, ..., 7 and the corresponding probabilities p_1 = 0.46, p_2 = 0.30,
p_3 = 0.12, p_4 = 0.06, p_5 = 0.03, p_6 = 0.02, and p_7 = 0.01.

Symbol    Probability    Self-Information    Codeword
x1        0.46           1.1203              1
x2        0.30           1.7370              00
x3        0.12           3.0589              010
x4        0.06           4.0589              0110
x5        0.03           5.0589              01110
x6        0.02           5.6439              011110
x7        0.01           6.6439              011111
Fig. 1.12 Huffman Coding for Example 1.12.
The entropy of the source is found to be

$$H(X) = -\sum_{k=1}^{7} p_k \log_2 p_k = 1.9781 \text{ bits,}$$

and the average number of binary digits per symbol is calculated to be

$$\bar{R} = \sum_{k=1}^{7} n(x_k)P(x_k) = 1(0.46) + 2(0.30) + 3(0.12) + 4(0.06) + 5(0.03) + 6(0.02) + 6(0.01) = 1.9900 \text{ bits.}$$

The efficiency of this code is η = (1.9781/1.9900) = 0.9940.
We shall now see that Huffman coding is not unique. Consider the combination of the two smallest
probabilities (symbols x_6 and x_7). Their sum is equal to 0.03, which is equal to the next higher
probability, corresponding to the symbol x_5. So, for the second step, we may choose to put this
combined probability (belonging to, say, symbol x_6') higher than, or lower than, the symbol x_5.
Suppose we put the combined probability at a lower level. We proceed further, to find again that the
combination of x_6' and x_5 yields the probability 0.06, which is equal to that of symbol x_4. We again
have a choice whether to put the combined probability higher than, or lower than, the symbol x_4.
Each time we make a choice (or flip a fair coin) we end up changing the final codewords for the
symbols. In Fig. 1.13, each time we have to make a choice between two probabilities that are equal,
we put the probability of the combined symbols at a higher level.
Fig. 1.13 Alternative Way of Huffman Coding in Example 1.12 which Leads to a Different Code.
Symbol    Probability    Self-Information    Codeword
x1        0.46           1.1203              1
x2        0.30           1.7370              00
x3        0.12           3.0589              011
x4        0.06           4.0589              0101
x5        0.03           5.0589              01001
x6        0.02           5.6439              010000
x7        0.01           6.6439              010001

The entropy of the source is

$$H(X) = -\sum_{k=1}^{7} p_k \log_2 p_k = 1.9781 \text{ bits,}$$

and the average number of bits per symbol is

$$\bar{R} = \sum_{k=1}^{7} n(x_k)P(x_k) = 1(0.46) + 2(0.30) + 3(0.12) + 4(0.06) + 5(0.03) + 6(0.02) + 6(0.01) = 1.9900 \text{ bits.}$$

The efficiency of this code is η = (1.9781/1.9900) = 0.9940. Thus both codes are equally efficient.
In the above examples, encoding is done symbol by symbol. A more efficient procedure
is to encode blocks of B symbols at a time. In this case the bounds of the source coding
theorem become

$$BH(X) \le \bar{R}_B < BH(X) + 1$$

since the entropy of a B-symbol block is simply BH(X), and R̄_B is the average number of
bits per B-symbol block. We can rewrite the bound as

$$H(X) \le \frac{\bar{R}_B}{B} < H(X) + \frac{1}{B} \qquad (1.34)$$

where R̄_B / B = R̄ is the average number of bits per source symbol. Thus, R̄ can be made
arbitrarily close to H(X) by selecting a large enough block length B.
Example 1.13 Consider the source symbols and their respective probabilities listed below.
Symbol    Probability    Self-Information    Codeword
x1        0.40           1.3219              1
x2        0.35           1.5146              00
x3        0.25           2.0000              01

For this code, the entropy of the source is

$$H(X) = -\sum_{k=1}^{3} p_k \log_2 p_k = 1.5589 \text{ bits.}$$

The average number of binary digits per symbol is

$$\bar{R} = \sum_{k=1}^{3} n(x_k)P(x_k) = 1(0.40) + 2(0.35) + 2(0.25) = 1.60 \text{ bits,}$$

and the efficiency of this code is η = (1.5589/1.6000) = 0.9743.
We now group the symbols two at a time and again apply the Huffman encoding
algorithm. The probabilities of the symbol pairs, in decreasing order, are listed below.

Symbol Pair    Probability    Self-Information    Codeword
x1x1           0.1600         2.6439              10
x1x2           0.1400         2.8365              001
x2x1           0.1400         2.8365              010
x2x2           0.1225         3.0291              011
x1x3           0.1000         3.3219              111
x3x1           0.1000         3.3219              0000
x2x3           0.0875         3.5146              0001
x3x2           0.0875         3.5146              1100
x3x3           0.0625         4.0000              1101
For this code, the entropy is

$$2H(X) = -\sum_{k=1}^{9} p_k \log_2 p_k = 3.1177 \text{ bits,} \quad\Rightarrow\quad H(X) = 1.5589 \text{ bits.}$$

Note that the source entropy has not changed! The average number of bits per block (symbol pair) is

$$\bar{R}_B = \sum_{k=1}^{9} n(x_k)P(x_k) = 2(0.1600) + 3(0.1400) + 3(0.1400) + 3(0.1225) + 3(0.1000) + 4(0.1000) + 4(0.0875) + 4(0.0875) + 4(0.0625) = 3.1775 \text{ bits per symbol pair,}$$

so R̄ = 3.1775/2 = 1.5888 bits per symbol, and the efficiency of this code is η = (1.5589/1.5888) = 0.9812.
Thus we see that grouping two letters to make a symbol has improved the coding efficiency.
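A quick numerical check of these figures, using the pair probabilities and codeword lengths tabulated above, might look like:

```python
import math

# Pair probabilities and codeword lengths from the table of Example 1.13
pair_probs   = [0.16, 0.14, 0.14, 0.1225, 0.10, 0.10, 0.0875, 0.0875, 0.0625]
pair_lengths = [2, 3, 3, 3, 3, 4, 4, 4, 4]

H_pair = -sum(p * math.log2(p) for p in pair_probs)         # = 2*H(X) ~ 3.1177 bits
R_B = sum(n * p for n, p in zip(pair_lengths, pair_probs))  # ~ 3.1775 bits per pair
print(H_pair / 2, R_B / 2, (H_pair / 2) / (R_B / 2))
# ~1.5589 bits/symbol, ~1.5888 bits/symbol, efficiency ~0.981
```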
Example 1.14 Consider the source symbols and their respective probabilities listed below.
Symbol    Probability    Self-Information    Codeword
x1        0.50           1.0000              1
x2        0.30           1.7370              00
x3        0.20           2.3219              01

For this code, the entropy of the source is

$$H(X) = -\sum_{k=1}^{3} p_k \log_2 p_k = 1.4855 \text{ bits.}$$

The average number of bits per symbol is

$$\bar{R} = \sum_{k=1}^{3} n(x_k)P(x_k) = 1(0.50) + 2(0.30) + 2(0.20) = 1.50 \text{ bits,}$$

and the efficiency of this code is η = (1.4855/1.5000) = 0.9903.
We now group together the symbols, two at a time, and again apply the Huffman encoding
algorithm. The probabilities of the symbol pairs, in decreasing order, are listed as follows.
Symbol Pair    Probability    Self-Information    Codeword
x1x1           0.25           2.0000              00
x1x2           0.15           2.7370              010
x2x1           0.15           2.7370              011
x1x3           0.10           3.3219              100
x3x1           0.10           3.3219              110
x2x2           0.09           3.4739              1010
x2x3           0.06           4.0589              1011
x3x2           0.06           4.0589              1110
x3x3           0.04           4.6439              1111

For this code, the entropy is

$$2H(X) = -\sum_{k=1}^{9} p_k \log_2 p_k = 2.9710 \text{ bits,} \quad\Rightarrow\quad H(X) = 1.4855 \text{ bits.}$$

The average number of bits per block (symbol pair) is

$$\bar{R}_B = \sum_{k=1}^{9} n(x_k)P(x_k) = 2(0.25) + 3(0.15) + 3(0.15) + 3(0.10) + 3(0.10) + 4(0.09) + 4(0.06) + 4(0.06) + 4(0.04) = 3.00 \text{ bits per symbol pair,}$$

so R̄ = 3.00/2 = 1.5000 bits per symbol, and the efficiency of this code is η_2 = (1.4855/1.5000) = 0.9903.
In this case, grouping together two letters at a time has not increased the efficiency of the code!
However, if we group 3 letters at a time (triplets) and then apply Huffman coding, we obtain a
code efficiency of η_3 = 0.9932. Upon grouping four letters at a time we see a further improvement
(η_4 = 0.9946).
1.7 THE LEMPEL-ZIV ALGORITHM
Huffman coding requires the symbol probabilities. But most real life scenarios do not provide
the symbol probabilities in advance (i.e., the statistics of the source is unknown). In principle, it
is possible to observe the output of the source for a long enough time period and estimate the
symbol probabilities. However, this is impractical for real-time application. Also, while
Huffman coding is optimal for a DMS source where the occurrence of one symbol does not
alter the probabilities of the subsequent symbols, it is not the best choice for a source with
memory. For example, consider the problem of compression of written text. We know that
many letters occur in pairs or groups, like 'q-u', 't-h', 'i-n-g' etc. It would be more efficient to use
the statistical inter-dependence of the letters in the alphabet along with their individual
probabilities of occurrence. Such a scheme was proposed by Lempel and Ziv in 1977. Their
source coding algorithm does not need the source statistics. It is a Variable-to-Fixed Length
Source Coding Algorithm and belongs to the class of universal source coding algorithms.
The logic behind Lempel-Ziv universal coding is as follows. The compression of an arbitrary
sequence of bits is possible by coding a series of O's and 1's as some previous such string (the
prefix string) plus one new bit (called innovation bit). Then, the new string formed by adding
the new bit to the previously used prefix string becomes a potential prefix string for future
strings. These variable length blocks are called phrases. The phrases are listed in a dictionary
which stores the existing phrases and their locations. In encoding a new phrase, we specify the
location of the existing phrase in the dictionary and append the new letter. We can derive a
better understanding of how the Lempel-Ziv algorithm works by the following example.
Example 1.15 Suppose we wish to code the string: 101011011010101011. We will begin by
parsing it into comma-separated phrases that represent strings that can be represented by a
previous string as a prefix, plus a bit.
The first bit, a 1, has no predecessors, so, it has a null prefix string and the one extra bit is itself:
1, 01011011010101011
The same goes for the 0 that follows, since it can't be expressed in terms of the only existing prefix:
1, 0, 1011011010101011
So far our dictionary contains the strings '1' and '0'. Next we encounter a 1, but it already exists in
our dictionary. Hence we proceed further. The following 10 is obviously a combination of the
prefix 1 and a 0, so we now have:
1, 0, 10, 11011010101011
Continuing in this way we eventually parse the whole string as follows:
1, 0, 10, 11, 01, 101, 010, 1011
Now, since we found 8 phrases, we will use a three bit code to label the null phrase and the first
seven phrases for a total of 8 numbered phrases. Next, we write the string in terms of the number of
the prefix phrase plus the new bit needed to create the new phrase. We will use parentheses and
commas to separate these at first, in order to aid our visualization of the process. The eight phrases
can be described by:
(000,1), (000,0), (001,0), (001,1), (010,1), (011,1), (101,0), (110,1).
It can be read out as: (codeword at location 0,1), (codeword at location 0,0), (codeword at
location 1,0), (codeword at location 1,1), (codeword at location 2,1), (codeword at location 3,1),
and so on.
Thus the coded version of the above string is:
00010000001000110101011110101101.
The dictionary for this example is given in Table 1.1. In this case, we have not obtained any
compression, our coded string is actually longer! However, the larger the initial string, the more
saving we get as we move along, because prefixes that are quite large become representable as
small numerical indices. In fact, Ziv proved that for long documents, the compression of the file
approaches the optimum obtainable as determined by the information content ofthe document.
Table 1.1 Dictionary for the Lempel-Ziv algorithm

Dictionary Location    Dictionary Content    Fixed Length Codeword
001                    1                     0001
010                    0                     0000
011                    10                    0010
100                    11                    0011
101                    01                    0101
110                    101                   0111
111                    010                   1010
                       1011                  1101
The next question is what the length of the table should be. In practical applications, regardless
of the length of the table, it will eventually overflow. This problem can be solved by
pre-deciding a large enough size of the dictionary. The encoder and decoder can update their
dictionaries by periodically substituting the less used phrases from their dictionaries by more
frequently used ones. The Lempel-Ziv algorithm is widely used in practice. The compress and
uncompress utilities of the UNIX operating system use a modified version of this algorithm.
The standard algorithms for compressing binary files use code words of 12 bits and transmit 1
extra bit to indicate a new sequence. Using such a code, the Lempel-Ziv algorithm can compress
transmissions of English text by about 55 per cent, whereas the Huffman code compresses the
transmission by only 43 per cent.
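The parsing step of Example 1.15 can be mimicked in a few lines; the sketch below (a simplified LZ78-style parser, not the exact dictionary management used by practical implementations) splits a bit string into phrases, each consisting of a previously seen phrase plus one innovation bit:

```python
def lz_parse(bits: str):
    """Parse a bit string into Lempel-Ziv phrases: each phrase is a
    previously seen phrase (possibly the null phrase) plus one new bit."""
    dictionary = {"": 0}          # phrase -> index; index 0 is the null phrase
    phrases = []                  # list of (prefix_index, innovation_bit)
    current = ""
    for b in bits:
        if current + b in dictionary:
            current += b          # keep extending until the phrase is new
        else:
            phrases.append((dictionary[current], b))
            dictionary[current + b] = len(dictionary)
            current = ""
    return phrases

print(lz_parse("101011011010101011"))
# [(0,'1'), (0,'0'), (1,'0'), (1,'1'), (2,'1'), (3,'1'), (5,'0'), (6,'1')]
# i.e., the phrases 1, 0, 10, 11, 01, 101, 010, 1011 of Example 1.15.
```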
In the following section we will study another type of source coding scheme, particularly
useful for facsimile transmission and image compression.
1.8 RUN LENGTH ENCODING AND THE PCX FORMAT
Run-Length Encoding or RLE is a technique used to reduce the size of a repeating string of
characters. This repeating string is called a run. Typically RLE encodes a run of symbols into
two bytes, a count and a symbol. RLE can compress any type of data regardless of its
information content, but the content of the data to be compressed affects the compression ratio.
RLE cannot achieve high compression ratios compared to other compression methods, but it is
easy to implement and is quick to execute. RLE is supported by most bitmap file formats such
as TIFF, BMP and PCX.
Example 1.16 Consider the following bit stream:

11111111111111100000000000000000001111

This can be represented as: fifteen 1's, nineteen 0's, four 1's, i.e., (15,1), (19,0), (4,1). Since the
maximum number of repetitions is 19, which can be represented with 5 bits, we can encode the bit
stream as (01111,1), (10011,0), (00100,1). The compression ratio in this case is 18:38 = 1:2.11.
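A minimal run-length encoder along these lines (illustrative only; practical formats add flag bytes, length limits and byte packing) might look like:

```python
def rle_encode(bits: str):
    """Encode a bit string as a list of (run_length, symbol) pairs."""
    runs = []
    count = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((count, prev))
            count = 1
    if bits:
        runs.append((count, bits[-1]))
    return runs

print(rle_encode("1" * 15 + "0" * 19 + "1" * 4))
# [(15, '1'), (19, '0'), (4, '1')]
```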
RLE is highly suitable for FAX images of typical office documents. These two-colour images
(black and white) are predominantly white. If we spatially sample these images for conversion
into digital data, we find that many entire horizontal lines are entirely white (long runs of O's).
Furthermore, if a given pixel is black or white, the chances are very good that the next pixel will
match. The code for fax machines is actually a combination of a run-length code and a Huffman
code. A run-length code maps run lengths into code words, and the codebook is partitioned into
two parts. The first part contains symbols for runs of lengths that are a multiple of 64; the second
part is made up of runs from 0 to 63 pixels. Any run length would then be represented as a
multiple of 64 plus some remainder. For example, a run of 205 pixels would be sent using the
code word for a run of length 192 (3 x 64) plus the code word for a run of length 13. In this way
the number of bits needed to represent the run is decreased significantly. In addition, certain
runs that are known to have a higher probability of occurrence are encoded into code words of
short length, further reducing the number of bits that need to be transmitted. Using this type of
encoding, typical compressions for facsimile transmission range between 4 to 1 and 8 to 1.
Coupled with higher modem speeds, these compressions reduce the transmission time of a single
page to less than a minute.
Run length coding is also used for the compression of images in the PCX format. The PCX
format was introduced as part of the PC Paintbrush series of software for image painting and
editing, sold by the ZSoft company. Today, the PCX format is actually an umbrella name for
several image compression methods and a means to identify which has been applied. We will
restrict our attention here to only one of the methods, for 256-colour images. We will restrict
ourselves to that portion of the PCX data stream that actually contains the coded image, and not
those parts that store the colour palette and image information such as the number of lines, pixels
per line, file size and the coding method.
The basic scheme is as follows. If a string of pixels are identical in colour value, encode them
as a special flag byte which contains the count followed by a byte with the value of the repeated
pixel. If the pixel is not repeated, simply encode it as the byte itself. Such simple schemes can
often become more complicated in practice. Consider that in the above scheme, if all 256
colours in a palette are used in an image, then we need all 256 values of a byte to represent
those colours. Hence, if we are going to use just bytes as our basic code unit, we do not have any
unused byte values that can be used as a flag/count byte. On the other hand, if we use
two bytes for every coded pixel to leave room for the flag/count combinations, we might double
the size of pathological images instead of compressing them.
The compromise in the PCX format is based on the belief of its designers that many user-
created drawings (which were the primary intended output of their software) would not use all
256 colours. So, they optimized their compression scheme for the case of up to 192 colours only.
Images with more colours will also probably get good compression, just not quite as good, with
this scheme.
Example 1.17 PCX compression encodes single occurrences of colour (that is, a pixel that is not
part of a run of the same colour) 0 through 191 simply as the binary byte representation of exactly
that numerical value. Consider Table 1.2.
Table 1.2 Example of PCX encoding

Pixel colour value   Hex code   Binary code
0                    00         00000000
1                    01         00000001
2                    02         00000010
3                    03         00000011
190                  BE         10111110
191                  BF         10111111
For the colour 192 (and all the colours higher than 192), the codeword is equal to one byte in which the
two most significant bits (MSBs) are both set to 1. We will use these codewords to signify a flag and
count byte. If the two MSBs are equal to one, we will say that they have flagged a count. The remaining
6 bits in the flag/count byte will be interpreted as a 6-bit binary number for the count (from 0 to 63).
This byte is then followed by the byte which represents the colour. In fact, if we have a run of pixels of
one of the colours with palette code over 191, we can still code the run easily, since the top two bits
are not reserved in this second, colour code byte of the run coding byte pair.
If a run of pixels exceeds 63 in length, we simply use this code for the first 63 pixels in the run
and then code additional runs of that pixel until we exhaust all the pixels in the run. The next question
is: how do we code those remaining colours in a nearly full palette image when there is no run? We
still code these as a run by simply setting the run length to 1. That means, for the case of at most 64
colours which appear as single pixels in the image and not as part of runs, we expand the data by a
factor of two. Luckily this rarely happens!
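A simplified sketch of the flag/count scheme just described is given below. It covers only the coded-image portion (no palette or header fields), and the function name and sample pixel values are assumptions for illustration.

```python
from itertools import groupby

def pcx_encode(pixels):
    """PCX-style run-length coding of 256-colour pixels: a flag byte with both MSBs set
    (0xC0-0xFF) carries a 6-bit run length and is followed by the colour byte; a single
    pixel with colour value below 0xC0 (192) is stored as the byte itself."""
    out = bytearray()
    for colour, group in groupby(pixels):
        run = len(list(group))
        while run > 0:
            n = min(run, 63)                  # runs longer than 63 are split up
            if n == 1 and colour < 0xC0:
                out.append(colour)            # single pixel, colour 0-191: just the byte
            else:
                out.append(0xC0 | n)          # flag/count byte
                out.append(colour)            # colour byte (its top two bits are not reserved)
            run -= n
    return bytes(out)

print(pcx_encode([5, 5, 5, 5, 200, 7]).hex())   # 'c405c1c807'
```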
In the next section, we will study coding for analog sources. Recall that we ideally need an
infinite number of bits to accurately represent an analog source. Anything fewer will only be an
approximate representation. We can choose to use fewer and fewer bits for representation at
the cost of a poorer approximation of the original signal. Thus, quantization of the amplitudes of
the sampled signals results in data compression. We would like to study the distortion
introduced when the samples from the information source are quantized.
1.9 RATE DISTORTION FUNCTION
Although we live in an analog world, most of the communication takes place in digital form.
Since most natural sources (e.g. speech, video etc.) are analog, they are first sampled, quantized
and then processed. Consider an analog message waveform x (t) which is a sample waveform of
a stochastic process X(t). Assuming X(t) is a bandlimited, stationary process, it can be
represented by a sequence of uniform samples taken at the Nyquist rate. These samples are
quantized in amplitude and encoded as a sequence of bits. A simple encoding strategy can be to
define L levels and encode every sample using
R = log2 L bits if L is a power of 2, or
R = ⌊log2 L⌋ + 1 bits if L is not a power of 2.
If all levels are not equally probable, we may use entropy coding for a more efficient
representation. In order to represent the analog waveform more accurately, we need more
levels, which would imply more bits per sample. Theoretically, we need infinitely many bits
per sample to perfectly represent an analog source. Quantization of amplitude
results in data compression at the cost of signal integrity. It is a form of lossy data compression
where some measure of difference between the actual source samples {x_k} and the
corresponding quantized values {x̂_k} constitutes the distortion.
Definition 1.13 The squared-error distortion is defined as
d(x_k, x̂_k) = (x_k - x̂_k)²
In general, a distortion measure may be represented as
d(x_k, x̂_k) = |x_k - x̂_k|^p
Consider a sequence of n samples, X_n, and the corresponding n quantized values, X̂_n.
Let d(x_k, x̂_k) be the distortion measure per sample (letter). Then the distortion
measure between the original sequence and the sequence of quantized values will
simply be the average over the n source output samples, i.e.,
d(X_n, X̂_n) = (1/n) Σ_{k=1}^{n} d(x_k, x̂_k)
We observe that the source output is a random process, hence X_n and consequently
d(X_n, X̂_n) are random variables. We now define the distortion as follows.
Definition 1.14 The distortion between a sequence of n samples, X_n, and their
corresponding n quantized values, X̂_n, is defined as
D = E[d(X_n, X̂_n)] = (1/n) Σ_{k=1}^{n} E[d(x_k, x̂_k)] = E[d(x_k, x̂_k)]
It has been assumed here that the random process is stationary.
Next, let a memoryless source have a continuous output X and the quantized output
alphabet X̂. Let the probability density function of this continuous amplitude be p(x)
and the per letter distortion measure be d(x, x̂), where x ∈ X and x̂ ∈ X̂. We next
introduce the rate distortion function, which gives us the minimum number of bits per
sample required to represent the source output symbols given a prespecified
allowable distortion.
Definition 1.15 The minimum rate (in bits/source output) required to represent the
output X of the memoryless source with a distortion less than or equal to D is called
the rate distortion function R(D), defined as
R(D) = min_{p(x̂|x): E[d(X, X̂)] ≤ D} I(X; X̂)
where I(X; X̂) is the average mutual information between X and X̂.
We will now state (without proof) two theorems related to the rate distortion function.
Theorem 1.3 The minimum information rate necessary to represent the output of a discrete
time, continuous amplitude memoryless Gaussian source with variance σ_x², based on a
mean square-error distortion measure per symbol, is
R_g(D) = (1/2) log2(σ_x²/D)   for 0 ≤ D ≤ σ_x²
R_g(D) = 0                    for D > σ_x²
Consider the two cases:
(i) D ≥ σ_x²: For this case there is no need to transfer any information. For the
reconstruction of the samples (with distortion greater than or equal to the variance)
one can use statistically independent, zero mean Gaussian noise samples with
variance σ_x².
(ii) D < σ_x²: For this case the number of bits per output symbol decreases monotonically
as D increases. The plot of the rate distortion function is given in Fig. 1.14.
Fig. 1.14 Plot of R_g(D) versus D/σ_x².
Theorem 1.4 There exists an encoding scheme that maps the source output into codewords
such that for any given distortion D, the minimum rate R(D) bits per sample is sufficient to
reconstruct the source output with an average distortion that is arbitrarily close to D.
Thus, the rate distortion function for any source gives the lower bound on the source rate
that is possible for a given level of distortion.
Definition 1.16 The distortion rate function for a discrete time, memoryless
Gaussian source is defined as
D_g(R) = 2^(-2R) σ_x²
Example 1.18 For a discrete time, memoryless Gaussian source, the distortion (in dB) as a
function of the rate can be expressed as
10 log10 D_g(R) = -6R + 10 log10 σ_x².
Thus the mean square distortion decreases at a rate of 6 dB/bit.
The rate distortion function of a discrete time, memoryless continuous amplitude source with zero
mean and finite variance σ_x², with respect to the mean square error distortion measure D, is upper
bounded as
R(D) ≤ (1/2) log2(σ_x²/D),   0 ≤ D ≤ σ_x².
This upper bound can be intuitively understood as follows. We know that for a given variance,
the zero mean Gaussian random variable exhibits the maximum differential entropy attainable by
any random variable. Hence, for a given distortion, the minimum number of bits per sample
required is upper bounded by that of the Gaussian random variable.
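Both R_g(D) and D_g(R) are easy to evaluate numerically. The sketch below (function names are illustrative) reproduces the 6 dB/bit behaviour of Example 1.18:

```python
import numpy as np

def rate_distortion_gaussian(D, var=1.0):
    """R_g(D) of a discrete time, memoryless Gaussian source with variance var,
    under the mean square error distortion measure."""
    D = np.asarray(D, dtype=float)
    return np.where(D < var, 0.5 * np.log2(var / np.maximum(D, 1e-300)), 0.0)

def distortion_rate_gaussian(R, var=1.0):
    """D_g(R) = 2**(-2R) * var, the corresponding distortion rate function."""
    return 2.0 ** (-2.0 * np.asarray(R, dtype=float)) * var

print(rate_distortion_gaussian([0.1, 0.25, 1.0]))        # [1.661  1.0  0.0] bits/sample
print(10 * np.log10(distortion_rate_gaussian([1, 2])))   # [-6.02  -12.04] dB for unit variance
```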
The next obvious question is: What would be a good design for a quantizer? Is there a way to
construct a quantizer that minimizes the distortion without using too many bits? We shall find
the answers to these questions in the next section.
1.10 OPTIMUM QUANTIZER DESIGN
In this section, we look at optimum quantizer design. Consider a continuous amplitude signal
whose amplitude is not uniformly distributed, but varies according to a certain probability
density function, p(x). We wish to design the optimum scalar quantizer that minimizes some
function of the quantization error q = x̂ - x, where x̂ is the quantized value of x. The distortion
resulting from the quantization can be expressed as
D = ∫_{-∞}^{∞} f(x̂ - x) p(x) dx,
where f(x̂ - x) is the desired function of the error. An optimum quantizer is one that minimizes
D by optimally selecting the output levels and the corresponding input range of each output
level. The resulting optimum quantizer is called the Lloyd-Max Quantizer. For an L-level
quantizer the distortion is given by
D = Σ_{k=1}^{L} ∫_{x_{k-1}}^{x_k} f(x̂_k - x) p(x) dx
The necessary conditions for minimum distortion are obtained by differentiating D with
respect to {x_k} and {x̂_k}. As a result of the differentiation process we end up with the following
system of equations
f(x_k - x̂_k) = f(x_k - x̂_{k+1}),   k = 1, 2, ..., L - 1
∫_{x_{k-1}}^{x_k} f′(x̂_k - x) p(x) dx = 0,   k = 1, 2, ..., L
For f(x) = x², i.e., the mean square value of the distortion, the above equations simplify to
x_k = (1/2)(x̂_k + x̂_{k+1}),   k = 1, 2, ..., L - 1
∫_{x_{k-1}}^{x_k} (x̂_k - x) p(x) dx = 0,   k = 1, 2, ..., L
The nonuniform quantizers are optimized with respect to the distortion. However, each
quantized sample is represented by an equal number of bits (say, R bits/sample). It is possible to
have a more efficient representation using a variable length code (VLC). The discrete source
outputs that result from quantization are characterized by a set of probabilities p_k. These
probabilities are then used to design an efficient VLC (source coding). In order to compare the
performance of different nonuniform quantizers, we first fix the distortion, D, and then compare
the average number of bits required per sample.
Example 1.19 Consider an eight level quantizer for a Gaussian random variable. This problem
was first solved by Max in 1960. The random variable has zero mean and variance equal to unity.
For a mean square error minimization, the values x_k and x̂_k are listed in Table 1.3.

Table 1.3 Optimum quantization and Huffman coding

Level   x_k       x̂_k       P_k     Huffman Code
1       -1.748    -2.152    0.040   0010
2       -1.050    -1.344    0.107   011
3       -0.500    -0.756    0.162   010
4        0        -0.245    0.191   10
5        0.500     0.245    0.191   11
6        1.050     0.756    0.162   001
7        1.748     1.344    0.107   0000
8        ∞         2.152    0.040   0011
For these values, D = 0.0345 which equals -14.62 dB.
The number of bits/sample for this optimum 8-level quantizer is R = 3. On performing Huffman
coding, the average number of bits per sample required is RH = 2.88 bits/sample. The theoretical
limit is H(X) = 2.82 bits/sample.
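The table values can be approximated numerically by iterating the two Lloyd-Max conditions (boundaries at the midpoints of adjacent output levels, output levels at the centroids of their regions). The sketch below runs the iteration on samples drawn from a unit-variance Gaussian rather than on the pdf itself, so the results should come out close to, but not exactly equal to, the entries of Table 1.3:

```python
import numpy as np

def lloyd_max(samples, L, iters=100):
    """Iterative Lloyd-Max design of an L-level scalar quantizer for squared-error distortion,
    using an empirical set of samples in place of the density p(x)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    levels = np.linspace(samples[0], samples[-1], L)            # initial output levels
    for _ in range(iters):
        boundaries = 0.5 * (levels[:-1] + levels[1:])           # x_k = (x̂_k + x̂_{k+1}) / 2
        regions = np.digitize(samples, boundaries)
        levels = np.array([samples[regions == k].mean() if np.any(regions == k) else levels[k]
                           for k in range(L)])                  # centroid condition
    boundaries = 0.5 * (levels[:-1] + levels[1:])
    distortion = np.mean((samples - levels[np.digitize(samples, boundaries)]) ** 2)
    return levels, distortion

rng = np.random.default_rng(0)
levels, D = lloyd_max(rng.standard_normal(200_000), L=8)
print(np.round(levels, 3))   # close to ±2.152, ±1.344, ±0.756, ±0.245
print(round(D, 4))           # close to 0.0345
```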
1.11 INTRODUCTION TO IMAGE COMPRESSION
Earlier in this chapter we discussed the coding of data sets for compression. By applying these
techniques we can store or transmit all of the information content of a string of data with fewer
bits than are in the source data. The minimum number of bits that we must use to convey all the
information in the source data is determined by the entropy measure of the source. Good
compression ratios can be obtained via entropy encoders and universal encoders for sufficiently
large source data blocks. In this section, we look at compression techniques used to store and
transmit image data.
Images can be sampled and quantized sufficiently finely so that a binary data stream can
represent the original data to an extent that is satisfactory to the most discerning eye. Since we
can represent a picture by anything from a thousand to a million bytes of data, we should be
able to apply the techniques studied earlier directly to the task of compressing that data for
storage and transmission. First, we consider the following points:
1. High quality images are represented by very large data sets. A photographic quality
image may require 40 to 100 million bits for representation. These large file sizes drive
the need for extremely high compression ratios to make storage and transmission
(particularly of movies) practical.
2. Applications that involve imagery such as television, movies, computer graphical user
interfaces, and the World Wide Web need to be fast in execution and transmission across
distribution networks, particularly if they involve moving images, to be acceptable to the
human eye.
3. Imagery is characterised by higher redundancy than is true of other data. For example, a
pair of adjacent horizontal lines in an image is nearly identical, while two adjacent lines
of text in a book are generally different.
The first two points indicate that the highest level of compression technology needs to be
used for the movement and storage of image data. The third factor indicates that high
compression ratios can be achieved. The third factor also says that some special compression
techniques may be possible to take advantage of the structure and properties of image data. The
close relationship between neighbouring pixels in an image can be exploited to improve the
compression ratios. This has a very important implication for the task of coding and decoding
image data for real-time applications.
Another interesting point to note is that the human eye is highly tolerant to approximation
error in an image. Thus, it may be possible to compress the image data in a manner in which the
less important details (to the human eye) can be ignored. That is, by trading off some of the
quality of the image we might obtain a significantly reduced data size. This technique is called
Lossy Compression, as opposed to the Lossless Compression techniques discussed earlier.
Such liberty cannot be taken with, say, financial or textual data! Lossy Compression can only be
applied to data such as images and audio where the deficiencies are masked by the tolerance of
the human senses of sight and hearing.
1.12 THE JPEG STANDARD FOR LOSSLESS COMPRESSION
The Joint Photographic Experts Group (JPEG) was formed jointly by two 'standards'
organisations: the CCITT (the international telephone and telegraph standards body, now the
ITU-T) and the International Standards Organisation (ISO). Let us now consider the lossless
compression option of the JPEG Image Compression Standard, which is a description of 29 distinct coding
systems for compression of images. Why are there so many approaches? It is because the needs
of different users vary so much with respect to quality versus compression and compression
versus computation time that the committee decided to provide a broad selection from which to
choose. We shall briefly discuss here two methods that use entropy coding.
The two lossless JPEG compression options discussed here differ only in the form of the
entropy code that is applied to the data. The user can choose either a Huffman Code or an
Arithmetic Code. We will not treat the Arithmetic Code concept in much detail here.
However, we will summarize its main features:
Arithmetic Code, like Huffman Code, achieves compression in transmission or storage by
using the probabilistic nature of the data to render the information with fewer bits than used in
the source data stream. Its primary advantage over the Huffman Code is that it comes closer to
the Shannon entropy limit of compression for data streams that involve a relatively small
alphabet. The reason is that Huffman codes work best (highest compression ratios) when the
probabilities of the symbols can be expressed as fractions of powers of two. The Arithmetic
code construction is not closely tied to these particular values, as is the Huffman code. The
computation of coding and decoding Arithmetic codes is costlier than that of Huffman codes.
Typically a 5 to 10% reduction in file size is seen with the application of Arithmetic codes over
that obtained with Huffman coding.
Some compression can be achieved if we can predict the next pixel using the previous pixels. In
this way we just have to transmit the prediction coefficients (or the difference in the values) instead of
the entire pixel. The predictive process that is used in the lossless JPEG coding schemes to form the
innovations data is also variable. However, in this case, the variation is not based upon the user's
choice but is made, for any image, on a line by line basis. The choice is made according to the
prediction method that yields the best prediction overall for the entire line.
There are eight prediction methods available in the JPEG coding standards. One of the eight
(which is the no prediction option) is not used for the lossless coding option that we are
examining here. The other seven may be divided into the following categories:
1. Predict the next pixel on the line as having the same value as the last one.
2. Predict the next pixel on the line as having the same value as the pixel in this position on
the previous line (that is, above it).
3. Predict the next pixel on the line as having a value related to a combination of the
previous, above and previous to the above pixel values. One such combination is simply
the average of the other three.
The differential encoding used in the JPEG standard consists of the differences between the
actual image pixel values and the predicted values. As a result of the smoothness and
redundancy present in most pictures, these differences give rise to relatively small positive and
negative numbers that represent the small typical error in the prediction. Hence, the
probabilities associated with these values are large for the small innovation values and quite
small for large ones. This is exactly the kind of data stream that compresses well with an entropy
code.
The typical lossless compression for natural images is 2: 1. While this is substantial, it does
not in general solve the problem of storing or moving large sequences of images as encountered
in high quality video.
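A rough sketch of how such predictors and the resulting innovations (prediction residuals) can be formed is given below. The predictor numbering follows the usual lossless JPEG convention (1: left, 2: above, 3: above-left, 4-7: combinations); the function and the synthetic image are assumptions for illustration only.

```python
import numpy as np

def lossless_jpeg_residuals(image, predictor=7):
    """Prediction residuals for one of the lossless JPEG predictors.
    a = pixel to the left, b = pixel above, c = pixel above-left."""
    img = image.astype(int)
    a = np.roll(img, 1, axis=1)                       # left neighbour
    b = np.roll(img, 1, axis=0)                       # neighbour above
    c = np.roll(np.roll(img, 1, axis=0), 1, axis=1)   # neighbour above-left
    predictions = {1: a, 2: b, 3: c, 4: a + b - c,
                   5: a + (b - c) // 2, 6: b + (a - c) // 2, 7: (a + b) // 2}
    residuals = img - predictions[predictor]
    return residuals[1:, 1:]   # drop the first row and column, which have no true neighbours

img = np.tile(np.arange(8), (8, 1)) * 10 + 100        # a smooth synthetic 8 x 8 "image"
print(np.abs(lossless_jpeg_residuals(img)).max())     # small residuals compress well with an entropy code
```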
1.13 THE JPEG STANDARD FOR LOSSY COMPRESSION
The JPEG standard includes a set of sophisticated lossy compression options developed after a
study of image distortion acceptable to human senses. The JPEG lossy compression algorithm
consists of an image simplification stage, which removes the image complexity at some loss of
fidelity, followed by a lossless compression step based on predictive filtering and Huffman or
Arithmetic coding.
The lossy image simplification step, which we will call the image reduction, is based on the
exploitation of an operation known as the Discrete Cosine Transform (DCT), defined as follows.
Y(k, l) = Σ_{i=0}^{N-1} Σ_{j=0}^{M-1} 4 y(i, j) cos(πk(2i + 1)/2N) cos(πl(2j + 1)/2M)
where the input image is N pixels by M pixels, y(i, j) is the intensity of the pixel in row i and
column j, and Y(k, l) is the DCT coefficient in row k and column l of the DCT matrix. All DCT
multiplications are real. This lowers the number of required multiplications, as compared to the
Discrete Fourier Transform. For most images, much of the signal energy lies at low frequencies,
which appear in the upper left corner of the DCT. The lower right values represent higher
frequencies, and are often small (usually small enough to be neglected with little visible
distortion).
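The defining double sum can be coded directly. The sketch below implements the unnormalised form given above with a plain double loop (it is not an optimised or standard-library DCT):

```python
import numpy as np

def dct2(block):
    """2-D DCT of an N x M block:
    Y(k, l) = sum_i sum_j 4*y(i, j) * cos(pi*k*(2i+1)/(2N)) * cos(pi*l*(2j+1)/(2M))."""
    N, M = block.shape
    i, j = np.arange(N), np.arange(M)
    Y = np.empty((N, M))
    for k in range(N):
        for l in range(M):
            ck = np.cos(np.pi * k * (2 * i + 1) / (2 * N))   # row cosines
            cl = np.cos(np.pi * l * (2 * j + 1) / (2 * M))   # column cosines
            Y[k, l] = 4.0 * np.sum(block * np.outer(ck, cl))
    return Y

# A smooth 8 x 8 block: almost all of the energy lands in the DC and low frequency coefficients
block = np.tile(np.linspace(100, 110, 8), (8, 1))
coeffs = dct2(block)
print(round(coeffs[0, 0], 1), round(np.abs(coeffs[4:, 4:]).max(), 6))
```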
In the JPEG image reduction process, the DCT is applied to 8 by 8 pixel blocks of the image.
Hence, if the image is 256 by 256 pixels in size, we break it into 32 by 32 square blocks of 8 by
8 pixels and treat each one independently. The 64 pixel values in each block are transformed by
the DCT into a new set of 64 values. These new 64 values, known also as the DCT coefficients,
form a whole new way of representing an image. The DCT coefficients represent the spatial
frequency of the image sub-block. The upper left corner of the DCT matrix has low frequency
components and the lower right corner the high frequency components (see Fig. 1.15). The top
left coefficient is called the DC coefficient. Its value is proportional to the average value of the 8
by 8 block of pixels. The rest are called the AC coefficients.
So far we have not obtained any reduction simply by taking the DCT. However, due to the
nature of most natural images, maximum energy (information) lies in low frequency as opposed
to high frequency. We can represent the high frequency components coarsely, or drop them
altogether, without strongly affecting the quality of the resulting image reconstruction. This
leads to a lot of compression (lossy). The JPEG lossy compression algorithm does the following
operations:
1. First the lowest weights are trimmed by setting them to zero.
2. The remaining weights are quantized (that is, rounded off to the nearest of some number
of discrete code represented values), some more coarsely than others according to
observed levels of sensitivity of viewers to these degradations.
[Fig. 1.15 shows a typical 4 x 4 block of DCT values: the DC coefficient (4.32) and the other low
frequency coefficients sit in the upper left corner, and the higher frequency (AC) coefficients
towards the lower right.]
4.32  3.12  3.01  2.41
2.74  2.11  1.92  1.55
2.11  1.33  0.32  0.11
1.62  0.44  0.03  0.02
Fig. 1.15 Typical Discrete Cosine Transform (DCT) Values.
Now several lossless compression steps are applied to the weight data that results from the
above DCT and quantization process, for all the image blocks. We observe that the DC
coefficient, which represents the average image intensity, tends to vary slowly from one block of
8 x 8 pixels to the next. Hence, the prediction of this value from surrounding blocks works well.
We just need to send one DC coefficient and the difference between the DC coefficients of
successive blocks. These differences can also be source coded.
We next look at the AC coefficients. We first quantize them, which transforms most of the
high frequency coefficients to zero. We then use a zig-zag coding as shown in Fig. 1.16. The
purpose of the zig-zag coding is that we gradually move from the low frequency to high
frequency, avoiding abrupt jumps in the values. Zig-zag coding will lead to long runs of O's,
which are ideal for RLE followed by Huffman or Arithmetic coding.
4.32  3.12  3.01  2.41         4  3  3  2
2.74  2.11  1.92  1.55    →    3  2  2  2    →    4333222122200000
2.11  1.33  0.32  0.11         2  1  0  0
1.62  0.44  0.03  0.02         2  0  0  0
Fig. 1.16 An Example of Quantization followed by Zig-zag Coding.
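A small sketch of the quantization and zig-zag scan of Fig. 1.16, assuming rounding to the nearest integer as the (very coarse) quantizer:

```python
import numpy as np

dct_block = np.array([[4.32, 3.12, 3.01, 2.41],
                      [2.74, 2.11, 1.92, 1.55],
                      [2.11, 1.33, 0.32, 0.11],
                      [1.62, 0.44, 0.03, 0.02]])

quantized = np.rint(dct_block).astype(int)   # coarse quantization of the DCT coefficients

def zigzag(block):
    """Traverse a square block along its anti-diagonals, alternating direction, so that
    low frequency coefficients come first and the trailing zeros are grouped together."""
    n = block.shape[0]
    coords = [(i, j) for i in range(n) for j in range(n)]
    coords.sort(key=lambda p: (p[0] + p[1],
                               p[1] if (p[0] + p[1]) % 2 else p[0]))
    return [block[i, j] for i, j in coords]

print(''.join(str(v) for v in zigzag(quantized)))   # 4333222122200000
```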
The typically quoted performance for JPEG is that photographic quality images of natural
scenes can be preserved with compression ratios of up to about 20:1 or 25:1. Usable quality
(that is, for noncritical purposes) can result for compression ratios in the range of 200:1 up to
230:1.
1.14 CONCLUDING REMARKS
In 1948, Shannon published his landmark paper titled "A Mathematical Theory of
Communication". He begins this pioneering paper on information theory by observing that the
fundamental problem of communication is that of reproducing at one point either exactly or
approximately a message selected at another point. He then proceeds so thoroughly to establish
the foundations of information theory that his framework and terminology remain standard.
Shannon's theory was an immediate success with communications engineers and stimulated the
growth of a technology which led to today's Information Age. Shannon published many more
provocative and influential articles in a variety of disciplines. His master's thesis, "A Symbolic
Analysis of Relay and Switching Circuits", used Boolean algebra to establish the theoretical
underpinnings of digital circuits. This work has broad significance because digital circuits are
fundamental to the operation of modern computers and telecommunications systems.
Shannon was renowned for his eclectic interests and capabilities. A favourite story describes
him juggling while riding a unicycle down the halls of Bell Labs. He designed and built chess-
playing, maze-solving, juggling and mind-reading machines. These activities bear out Shannon's
claim that he was more motivated by curiosity than usefulness. In his words "I just wondered
how things were put together."
The Huffman code was created by the American scientist D. A. Huffman in 1952. Modified
Huffman coding is today used in the Joint Photographic Experts Group (JPEG) and Moving
Picture Experts Group (MPEG) standards.
A very efficient technique for encoding sources without needing to know their probabilities of
occurrence was developed in the 1970s by the Israelis Abraham Lempel and Jacob Ziv. The
compress and uncompress utilities of the UNIX operating system use a modified version of this
algorithm. The GIF format (Graphics Interchange Format), developed by CompuServe,
involves simply an application of the Lempel-Ziv-Welch (LZW) universal coding algorithm to
the image data.
And finally, to conclude this chapter we mention that Shannon, the father of Information
Theory, passed away on February 24, 2001. Excerpts from the obituary published in the New
York Times:
SUMMARY
• The Self-Information of the event X = x_i is given by I(x_i) = log(1/P(x_i)) = -log P(x_i).
• The Mutual Information I(x_i; y_j) between x_i and y_j is given by I(x_i; y_j) = log(P(x_i|y_j)/P(x_i)).
• The Conditional Self-Information of the event X = x_i given Y = y_j is defined as
I(x_i|y_j) = log(1/P(x_i|y_j)) = -log P(x_i|y_j).
• The Average Mutual Information between two random variables X and Y is given by
I(X; Y) = Σ_{i=1}^{n} Σ_{j=1}^{m} P(x_i, y_j) I(x_i; y_j) = Σ_{i=1}^{n} Σ_{j=1}^{m} P(x_i, y_j) log [P(x_i, y_j)/(P(x_i)P(y_j))].
For the case when X and Y are statistically independent, I(X; Y) = 0. The average mutual
information I(X; Y) ≥ 0, with equality if and only if X and Y are statistically independent.
• The Average Self-Information of a random variable X is given by H(X) = Σ_{i=1}^{n} P(x_i) I(x_i)
= -Σ_{i=1}^{n} P(x_i) log P(x_i). H(X) is called the entropy.
• The Average Conditional Self-Information, called the Conditional Entropy, is given by
H(X|Y) = Σ_{i=1}^{n} Σ_{j=1}^{m} P(x_i, y_j) log [1/P(x_i|y_j)]
• I(x_i; y_j) = I(x_i) - I(x_i|y_j) and I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X). Since I(X; Y)
≥ 0, it implies that H(X) ≥ H(X|Y).
• The Average Mutual Information between two continuous random variables X and Y
is given by I(X; Y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} p(x) p(y|x) log [p(y|x)/p(y)] dx dy
• The Differential Entropy of a continuous random variable X is given by
H(X) = -∫ p(x) log p(x) dx.
• The Average Conditional Entropy of a continuous random variable X given Y is
given by H(X|Y) = -∫∫ p(x, y) log p(x|y) dx dy.
• A necessary and sufficient condition for the existence of a binary code with codewords
having lengths n_1 ≤ n_2 ≤ ... ≤ n_L that satisfy the prefix condition is Σ_{k=1}^{L} 2^(-n_k) ≤ 1.
The efficiency of a prefix code is given by η = H(X)/R̄.
• Let X be the ensemble of letters from a DMS with finite entropy H(X). The Source
Coding Theorem suggests that it is possible to construct a code that satisfies the prefix
condition, and has an average length R̄ that satisfies the inequality H(X) ≤ R̄ < H(X) + 1.
Efficient representation of symbols leads to compression of data.
• Huffman Encoding and Lempel-Ziv Encoding are two popular source coding
techniques. In contrast to the Huffman coding scheme, the Lempel-Ziv technique is
independent of the source statistics. The Lempel-Ziv technique generates a Fixed Length
Code, whereas the Huffman code is a Variable Length Code.
• Run-Length Encoding or RLE is a technique used to reduce the size of a repeating
string of characters. This repeating string is called a run. Run-length encoding is
supported by most bitmap file formats such as TIFF, BMP and PCX.
• Distortion implies some measure of difference between the actual source samples {x_k}
and the corresponding quantized values {x̂_k}. The squared-error distortion is given by
d(x_k, x̂_k) = (x_k - x̂_k)². In general, a distortion measure may be represented as
d(x_k, x̂_k) = |x_k - x̂_k|^p.
• The Minimum Rate (in bits/source output) required to represent the output X of a
memoryless source with a distortion less than or equal to D is called the rate distortion
function R(D), defined as R(D) = min_{p(x̂|x): E[d(X, X̂)] ≤ D} I(X; X̂), where I(X; X̂) is the
average mutual information between X and X̂.
• The distortion resulting from quantization can be expressed as D = ∫_{-∞}^{∞} f(x̂ - x) p(x) dx,
where f(x̂ - x) is the desired function of the error. An optimum quantizer is one
that minimizes D by optimally selecting the output levels and the corresponding input
range of each output level. The resulting optimum quantizer is called the Lloyd-Max
quantizer.
• Quantization and source coding techniques (Huffman coding, arithmetic coding and run-
length coding) are used in the JPEG standard for image compression.
Obstacles are those frightful things you see when you take your eyes off your goal.
- Henry Ford (1863-1947)
PROBLEMS
1.1 Consider a DMS with source probabilities {0.30, 0.25, 0.20, 0.15, 0.10}. Find the source
entropy, H (X).
1.2 Prove that the entropy for a discrete source is a maximum when the output symbols are
equally probable.
1.3 Prove the inequality ln x ≤ x - 1. Plot the curves y1 = ln x and y2 = x - 1 to demonstrate the
validity of this inequality.
1.4 Show that I(X; Y) ≥ 0. Under what condition does the equality hold?
1.5 A source, X, has an infinitely large set of outputs with probability of occurrence given by
P(x_i) = 2^(-i), i = 1, 2, 3, .... What is the average self information, H(X), of this source?
1.6 Consider another geometrically distributed random variable X with P(x_i) = p(1 - p)^(i-1),
i = 1, 2, 3, .... What is the average self information, H(X), of this source?
1.7 Consider an integer valued random variable, X, given by P(X = n) = 1/(A n log² n), where
A = Σ_{n=2}^{∞} 1/(n log² n) and n = 2, 3, ..., ∞. Find the entropy, H(X).
1.8 Calculate the differential entropy, H(X), of the uniformly distributed random variable X
with the pdf
p(x) = 1/a for 0 ≤ x ≤ a, and 0 otherwise.
Plot the differential entropy, H(X), versus the parameter a (0.1 < a < 10). Comment on
the result.
1.9 Consider a DMS with source probabilities {0.35, 0.25, 0.20, 0.15, 0.05}.
(i) Determine the Huffman code for this source.
(ii) Determine the average length R of the codewords.
(iii) What is the efficiency η of the code?
1.10 Consider a DMS with source probabilities {0.20, 0.20, 0.15, 0.15, 0.10, 0.10, 0.05, 0.05}.
(i) Determine an efficient fixed length code for the source.
(ii) Determine the Huffman code for this source.
(iii) Compare the two codes and comment.
1.11 A DMS has three output symbols with probabilities {0.5, 0.4, 0.1}.
(i) Determine the Huffman code for this source and find the efficiency η.
(ii) Determine the Huffman code for this source taking two symbols at a time and find the
efficiency η.
(iii) Determine the Huffman code for this source taking three symbols at a time and find
the efficiency η.
1.12 For a source with entropy H(X), prove that the entropy of a B-symbol block is BH(X).
1.13 Let X and Y be random variables that take on values x1, x2, ..., xr and y1, y2, ..., ys
respectively. Let Z = X + Y.
(a) Show that H(Z|X) = H(Y|X).
(b) If X and Y are independent, then argue that H(Y) ≤ H(Z) and H(X) ≤ H(Z).
Comment on this observation.
(c) Under what condition will H(Z) = H(X) + H(Y)?
1.14 Determine the Lempel-Ziv code for the following bit stream
01001111100101000001010101100110000.
Recover the original sequence from the encoded stream.
1.15 Find the rate distortion function R(D) = min I(X; X̂) for a Bernoulli distributed X with
p = 0.5, where the distortion is given by
d(x, x̂) = 0 for x = x̂, 1 for x = 1, x̂ = 0, and ∞ for x = 0, x̂ = 1.
1.16 Consider a source X uniformly distributed on the set {1, 2, ..., m}. Find the rate distortion
function for this source with Hamming distortion defined as
d(x, x̂) = 0 for x = x̂, and 1 for x ≠ x̂.
COMPUTER PROBLEMS
1.17 Write a program that performs Huffman coding, given the source probabilities. It should
generate the code and give the coding efficiency.
1.18 Modify the above program so that it can group together n source symbols and then
generate the Huffman code. Plot the coding efficiency η versus n for the following source
symbol probabilities: {0.55, 0.25, 0.20}. For what value of n does the efficiency become
better than 0.9999? Repeat the exercise for following source symbol probabilities {0.45,
0.25, 0.15, 0.10, 0.05}.
1.19 Write a program that executes the Lempel Ziv algorithm. The input to the program can
be the English alphabets. It should convert the alphabets to their ASCII code and then
perform the compression routine. It should output the compression achieved. Using this
program, find out the compression achieved for the following strings of letters.
(i) The Lempel Ziv algorithm can compress the English text by about fifty five percent.
(ii) The cat cannot sit on the canopy of the car.
1.20 Write a program that performs run length encoding (RLE) on a sequence of bits and
gives the coded output along with the compression ratio. What is the output of the
program if the following sequence is fed into it:
1100000000111100000111111111111111111100000110000000.
Now feed back the encoded output to the program, i.e., perform the RLE two times on the
original sequence of bits. What do you observe? Corr:ment.
1.21 Write a program that takes in a 2n level gray scale image (n bits per pixel) and performs
the following operations:
(i) Breaks it up into 8 by 8 pixel blocks.
(ii) Performs DCT on each of the 8 by 8 blocks.
(iii) Quantizes the DCT coefficients by retaining only the m most significant bits (MSB),
where m ≤ n.
(iv) Performs the zig-zag coding followed by run length coding.
(v) Performs Huffman coding on the bit stream obtained above (think of a reasonable
way of calculating the symbol probabilities).
(vi) Calculates the compression ratio.
(vii) Performs the decompression (i.e., the inverse operation of the steps (v) back to (i)).
Perform image compression using this program for different values of m. Up to what
value of m is there no perceptible difference between the original image and the compressed
image?
2
Channel Capacity and Coding

Experimentalists think that it is a mathematical theorem while the mathematicians believe it to be an experimental fact.
(on the Gaussian curve)
- Lippmann, Gabriel (1845-1921)
2.1 INTRODUCTION
In the previous chapter we saw that most natural sources have inherent redundancies and it is
possible to compress data by removing these redundancies using different source coding
techniques. After efficient representation of source symbols by the minimum possible number
of bits, we transmit these bit-streams over channels (e.g., telephone lines, optical fibres etc.).
These bits may be transmitted as they are (for baseband communications), or after modulation
(for passband communications). Unfortunately, all real-life channels are noisy. The term noise
designates unwanted waves that disturb the transmission and processing of the wanted signals
in communication systems. The source of noise may be external to the system (e.g., atmospheric
noise, man generated noise etc.), or internal (e.g., thermal noise, shot noise etc.). In effect, the
bit stream obtained at the receiver is likely to be different from what was transmitted. In
passband communication, the demodulator processes the channel-corrupted waveform and
reduces each waveform to a scalar or a vector that represents an estimate of the transmitted data
symbols. The detector, which follows the demodulator, decides whether the transmitted bit is a
0 or a 1. This is called Hard Decision Decoding. This decision process at the decoder is
similar to a binary quantization with two levels. If there are more than 2 levels of quantization,
the detector is said to perform a Soft Decision Decoding.
The use of hard decision decoding causes an irreversible loss of information at the receiver.
Suppose the modulator sends only binary symbols but the demodulator has an alphabet with Q
symbols, and assuming the use of the quantizer as depicted in Fig. 2.1 (a), we have Q = 8. Such a
channel is called a binary input Q-ary output Discrete Memoryless Channel. The
corresponding channel is shown in Fig. 2.1 (b). The decoder performance depends on the
location of the representation levels of the quantizers, which in turn depends on the signal level
and the noise power. Accordingly, the demodulator must incorporate automatic gain control in
order to realize an effective multilevel quantizer. It is clear that the construction of such a
decoder is more complicated than the hard decision decoder. However, soft decision decoding
can provide significant improvement in performance over hard decision decoding.
Fig. 2.1 (a) Transfer Characteristic of Multilevel Quantizer
(b) Channel Transition Probability Diagram.
There are three balls that a digital communication engineer must juggle: (i) the transmitted
signal power, (ii) the channel bandwidth, and (iii) the reliability of the communication system
(in terms of the bit error rate). Channel coding allows us to trade off one of these commodities
(signal power, bandwidth or reliability) with respect to the others. In this chapter, we will study how
to achieve reliable communication in the presence of noise. We shall ask ourselves questions
like: how many bits per second can be sent over a channel of a given bandwidth and for a given
signal to noise ratio (SNR)? For that, we begin by studying a few channel models first.
2.2 CHANNEL MODELS
We have already come across the simplest of the channel models, the Binary Symmetric
Channel (BSC), in the previous chapter. If the modulator employs binary waveforms and the
detector makes hard decisions, then the channel may be viewed as one in which a binary bit
stream enters at the transmitting end and another bit stream comes out at the receiving end.
This is depicted in Fig. 2.2.
[Fig. 2.2 shows the chain: Channel Encoder → Modulator → Channel → Demodulator/Detector → Channel Decoder.]
Fig. 2.2 A Composite Discrete-input, Discrete-output Channel.
The composite Discrete-input, Discrete-output Channel is characterized by the set X =
{0, 1} of possible inputs, the set Y = {0, 1} of possible outputs and a set of conditional probabilities
that relate the possible outputs to the possible inputs. Assuming the noise in the channel causes
independent errors in the transmitted binary sequence with average probability of error p,
P(Y = 0 | X = 1) = P(Y = 1 | X = 0) = p,
P(Y = 1 | X = 1) = P(Y = 0 | X = 0) = 1 - p.     (2.1)
A BSC is shown in Fig. 2.3.
1- p
0
0
p
1-p
Fig. 2.3 A Binary Symmetric Channel (BSC).
The BSC is a special case of a general, discrete-input, discrete-output channel. Let the input
to the channel be q-ary symbols, i.e., X = {x_0, x_1, ..., x_{q-1}}, and the output of the detector at the
receiver consist of Q-ary symbols, i.e., Y = {y_0, y_1, ..., y_{Q-1}}. We assume that the channel and the
modulation are memoryless. The inputs and outputs can then be related by a set of qQ conditional
probabilities
P(Y = y_i | X = x_j) = P(y_i | x_j),     (2.2)
where i = 0, 1, ..., Q - 1 and j = 0, 1, ..., q - 1. This channel is known as a Discrete Memoryless
Channel (DMC) and is depicted in Fig. 2.4.
Definition 2.1 The conditional probability P(y_i | x_j) is defined as the Channel
Transition Probability and is denoted by p_ji.
Definition 2.2 The conditional probabilities {P(y_i | x_j)} that characterize a DMC can
be arranged in the matrix form P = [p_ji]. P is called the Probability Transition
Matrix for the channel.
Fig. 2.4 A Discrete Memoryless Channel (DMC) with q-ary input and Q-ary output.
In the next section, we will try to answer the question: How many bits can be sent
across a given noisy channel, each time the channel is used?
2.3 CHANNEL CAPACITY
Consider a DMC having an input alphabet X = {x_0, x_1, ..., x_{q-1}} and an output alphabet Y =
{y_0, y_1, ..., y_{r-1}}. Let us denote the set of channel transition probabilities by P(y_i | x_j). The
average mutual information provided by the output Y about the input X is given by (see Chapter 1,
Section 1.2)
I(X; Y) = Σ_{j=0}^{q-1} Σ_{i=0}^{r-1} P(x_j) P(y_i | x_j) log [P(y_i | x_j)/P(y_i)]     (2.3)
The channel transition probabilities P(y_i | x_j) are determined by the channel characteristics
(particularly the noise in the channel). However, the input symbol probabilities P(x_j) are within
the control of the discrete channel encoder. The value of the average mutual information, I(X; Y),
maximized over the set of input symbol probabilities P(x_j), is a quantity that depends only on the
channel transition probabilities P(y_i | x_j). This quantity is called the Capacity of the Channel.
Definition 2.3 The Capacity of a DMC is defined as the maximum average mutual
information in any single use of the channel, where the maximization is over all
possible input probabilities. That is,
C = max_{P(x_j)} I(X; Y)
  = max_{P(x_j)} Σ_{j=0}^{q-1} Σ_{i=0}^{r-1} P(x_j) P(y_i | x_j) log [P(y_i | x_j)/P(y_i)]     (2.4)
The maximization of I(X; Y) is performed under the constraints
P(x_j) ≥ 0, and Σ_{j=0}^{q-1} P(x_j) = 1
The units of channel capacity are bits per channel use (provided the base of the
logarithm is 2).
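For a small channel, the maximization in equation (2.4) can be carried out numerically. The sketch below evaluates I(X; Y) for a given input distribution and transition matrix and then searches over binary input distributions by brute force; the grid search is only for illustration, not an efficient algorithm such as the Blahut-Arimoto iteration.

```python
import numpy as np

def mutual_information(px, P):
    """Average mutual information I(X; Y) in bits, for an input distribution px (length q)
    and a channel transition matrix P with P[j, i] = P(y_i | x_j)."""
    pxy = px[:, None] * P                 # joint probabilities P(x_j, y_i)
    py = pxy.sum(axis=0)                  # output probabilities P(y_i)
    rows, cols = np.nonzero(pxy)
    return float(np.sum(pxy[rows, cols] * np.log2(P[rows, cols] / py[cols])))

# Brute-force search over binary input distributions for a BSC with p = 0.1
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
alphas = np.linspace(0.001, 0.999, 999)
C = max(mutual_information(np.array([a, 1 - a]), P) for a in alphas)
print(round(C, 3))   # about 0.531 = 1 - H(0.1), attained at the uniform input distribution
```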
Example 2.1 Consider a BSC with channel transition probabilities
P(0|1) = p = P(1|0)
By symmetry, the capacity, C = max_{P(x_j)} I(X; Y), is achieved for equally likely inputs,
P(x_0) = P(x_1) = 0.5. From equation (2.4) we obtain the capacity of a BSC as
C = 1 + p log2 p + (1 - p) log2(1 - p)
Let us define the entropy function
H(p) = -p log2 p - (1 - p) log2(1 - p)
Hence, we can rewrite the capacity of a binary symmetric channel as
C = 1 - H(p).
Fig. 2.5 The Capacity of a BSC plotted against the probability of error p.
The plot of the capacity versus p is given in Fig. 2.5. From the plot we make the following
observations.
(i) For p = 0 (i.e., noise free channel), the capacity is 1 bit/use, as expected. Each time we use the
channel, we can successfully transmit 1 bit of information.
(ii) Forp = 0.5, the channel capacity is 0, i.e., observing the output gives no information about
the input. It is equivalent to the case when the channel is broken. We might as well discard
the channel and toss a fair coin in order to estimate what was transmitted.
(iii) For 0.5 < p < 1, the capacity increases with increasing p. In this case we simply reverse the
positions of 1 and 0 at the output of the BSC.
(iv) For p = 1 (i.e., every bit gets flipped by the channel), the capacity is again 1 bit/use, as
expected. In this case, one simply flips the bit at the output of the receiver so as to undo the
effect of the channel.
(v) Since p is a monotonically decreasing function of the signal to noise ratio (SNR), the capacity of
a BSC is a monotonically increasing function of SNR.
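The curve of Fig. 2.5, and the value used in Example 2.3 below, can be reproduced in a few lines (a sketch; the function name is illustrative):

```python
import numpy as np

def bsc_capacity(p):
    """Capacity C = 1 - H(p) of a binary symmetric channel with crossover probability p."""
    p = np.asarray(p, dtype=float)
    h = np.zeros_like(p)
    mask = (p > 0) & (p < 1)
    h[mask] = -p[mask] * np.log2(p[mask]) - (1 - p[mask]) * np.log2(1 - p[mask])
    return 1.0 - h

print(bsc_capacity(np.array([0.0, 0.01, 0.5, 1.0])))   # approximately [1.0  0.919  0.0  1.0]
```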
Having developed the notion of capacity of a channel, we shall now try to relate it to reliable
communication over the channel. So far, we have only talked about bits that can be sent over a
channel each time it is used (bits/use). But, what is the number of bits that can be sent per second
(bits/sec)? To answer this question we introduce the concept of Channel Coding.
2.4 CHANNEL CODING
All real-life channels are affected by noise. Noise causes discrepancies (errors) between the
input and the output data sequences of a digital communication system. For a typical noisy
channel, the probability of bit error may be as high as 10^-2. This means that, on an average, 1
bit out of every 100 transmitted over this channel gets flipped. For most applications, this level of
reliability is far from adequate. Different applications require different levels of reliability (which
is a component of the quality of service). Table 2.1 lists the typical acceptable bit error rates for
various applications.
Table 2.1 Acceptable bit error rates for various applications

Application                                Probability of Error
Speech telephony                           10^-4
Voice band data                            10^-6
Electronic mail, Electronic newspaper      10^-6
Internet access                            10^-6
Video telephony, High speed computing      10^-7
In order to achieve such high levels of reliability, we resort to Channel Coding. The basic
objective of channel coding is to increase the resistance of the digital communication system to
channel noise. This is done by adding redundancies to the transmitted data stream in a controlled
manner.
In channel coding, we map the incoming data sequence to a channel input sequence. This
encoding procedure is done by the Channel Encoder. The encoded sequence is then
transmitted over the noisy channel. The channel output sequence at the receiver is inverse
mapped on to an output data sequence. This is called the decoding procedure, and is carried out
by the Channel Decoder. Both the encoder and the decoder are under the designer's control.
As already mentioned, the encoder introduces redundancy in a prescribed manner. The
decoder exploits this redundancy in order to reconstruct the original source sequence as
accurately as possible. Thus, channel coding makes it possible to carry out reliable
communication over unreliable (noisy) channels. Channel coding is also referred to as Error
Control Coding, and we will use these terms interchangeably. It is interesting to note here that
the source coder reduces redundancy to improve efficiency, whereas the channel coder adds
redundancy in a controlled manner to improve reliability.
We first look at a class of channel codes called Block Codes. In this class of codes, the
incoming message sequence is first sub-divided into sequential blocks, each of length k bits.
Each k-bit long information block is mapped into an n-bit block by the channel coder, where n
> k. This means that for every k bits of information, (n - k) redundant bits are added. The ratio
r = k/n     (2.5)
is called the Code Rate. The code rate of any coding scheme is, naturally, less than unity. A small
code rate implies that more and more bits per block are redundant bits, corresponding to a
higher coding overhead. This may reduce the effect of noise, but will also reduce the
communication rate as we will end up transmitting more redundant bits and fewer information
bits. The question before us is whether there exists a coding scheme such that the probability
that the message bit will be in error is arbitrarily small and yet the coding rate is not too small?
The answer is yes and it was first provided by Shannon in his second theorem on channel
capacity. We will study this shortly.
Let us now introduce the concept of time in our discussion. We wish to look at questions like
how many bits per second can we send over a given noisy channel with arbitrarily low bit error
rates? Suppose the DMS has the source alphabet X and entropy H(X) bits per source symbol
and the source generates a symbol every T_s seconds; then the average information rate of the
source is H(X)/T_s bits per second. Let us assume that the channel can be used once every T_c
seconds and the capacity of the channel is C bits per channel use. Then, the channel capacity
per unit time is C/T_c bits per second. We now state Shannon's second theorem known as the
Channel Coding Theorem.
Theorem 2.1 Channel Coding Theorem (Noisy coding theorem)
(i) Let a DMS with an alphabet X have entropy H(X) and produce symbols every T_s
seconds. Let a DMC have capacity C and be used once every T_c seconds. Then, if
H(X)/T_s ≤ C/T_c     (2.6)
there exists a coding scheme for which the source output can be transmitted over the
noisy channel and be reconstructed with an arbitrarily low probability of error.
(ii) Conversely, if
H(X)/T_s > C/T_c     (2.7)
it is not possible to transmit information over the channel and reconstruct it with an
arbitrarily small probability of error.
The parameter C/T_c is called the Critical Rate.
The channel coding theorem is a very important result in information theory. The theorem
specifies the channel capacity, C, as a fundamental limit on the rate at which reliable communi-
cation can be carried out over an unreliable (noisy) DMS channel. It should be noted that the
channel coding theorem tells us about the existence of some codes that can achieve reliable
communication in a noisy environment. Unfortunately, it does not give us the recipe to
construct these codes. Therefore, channel coding is still an active area of research as the search
for better and better codes is still going on. From the next chapter onwards we shall study some
good channel codes.
Example 2.2 Consider a DMS source that emits equally likely binary symbols (p = 0.5) once
every T_s seconds. The entropy for this binary source is
H(p) = -p log2 p - (1 - p) log2 (1 - p) = 1 bit.
The information rate of this source is
H(X)/T_s = 1/T_s bits/second.
Suppose we wish to transmit the source symbols over a noisy channel. The source sequence is
applied to a channel coder with code rate r. This channel coder uses the channel once every T_c
seconds to send the coded sequence. We want to have reliable communication (the probability of
error as small as desired). From the channel coding theorem, if
1/T_s ≤ C/T_c     (2.8)
we can make the probability of error as small as desired by a suitable choice of a channel coding
scheme, and hence have reliable communication. We note that the code rate of the coder can be
expressed as
r = T_c/T_s     (2.9)
Hence, the condition for reliable communication can be rewritten as
r ≤ C     (2.10)
Thus, for a BSC one can find a suitable channel coding scheme with a code rate, r ≤ C, which
will ensure reliable communication regardless of how noisy the channel is! Of course, we can
state that at least one such code exists, but finding that code may not be a trivial job. As we shall
see later, the level of noise in the channel will manifest itself by limiting the channel capacity,
and hence the code rate.
Example 2.3 Consider a BSC with a transition probability p = 10^-2. Such error rates are typical
of wireless channels. We saw in Example 2.1 that for a BSC the capacity is given by
C = 1 + p log2 p + (1 - p) log2 (1 - p)
By plugging in the value of p = 10^-2 we obtain the channel capacity C = 0.919. From the
previous example we can conclude that there exists at least one coding scheme with the code rate
r ≤ 0.919 which will guarantee us a (non-zero) probability of error that is as small as desired.
Example 2.4 Consider the repetition code in which each message bit is simply repeated n times,
where n is an odd integer. For example, for n = 3, we have the mapping scheme
0 → 000; 1 → 111
Similarly, for n = 5 we have the mapping scheme
0 → 00000; 1 → 11111
Note that the code rate of the repetition code with blocklength n is
r = 1/n     (2.11)
The decoding strategy is as follows: If in a block of n received bits the number of 0's exceeds the
number of 1's, decide in favour of 0, and vice versa. This is otherwise known as Majority
Decoding. This also answers the question why n should be an odd integer for repetition codes.
Let n = 2m + 1, where m is a positive integer. This decoding strategy will make an error if more
than m bits are in error, because in that case, if a 0 is encoded and sent, there would be more 1's
than 0's in the received word. Let us assume that the a priori probabilities of 1 and 0 are equal. Then,
the average probability of error is given by
P_e = Σ_{i=m+1}^{n} (n choose i) p^i (1 - p)^(n-i)     (2.12)
where p is the channel transition probability. The average probability of error for repetition codes
for different code rates is given in Table 2.2.
Table 2.2 Average probability of error for repetition codes

Code Rate, r     Average Probability of Error, P_e
1                10^-2
1/3              3 x 10^-4
1/5              10^-5
1/7              4 x 10^-7
1/9              10^-8
1/11             5 x 10^-10
From the table we see that as the code rate decreases, there is a steep fall in the average
probability of error. The decrease in P_e is much more rapid than the decrease in the code
rate, r. However, for repetition codes, the code rate tends to zero if we want smaller and smaller
P_e. Thus the repetition code exchanges code rate for message reliability. But the channel coding
theorem states that the code rate need not tend to zero in order to obtain an arbitrarily low
probability of error. The theorem merely requires the code rate r to be less than the channel
capacity, C. So there must exist some code (other than the repetition code) with code rate r = 0.9
which can achieve an arbitrarily low probability of error. Such a coding scheme will add just 1
parity bit to 9 information bits (or, maybe, add 10 extra bits to 90 information bits) and give us
as small a P_e as desired (say, 10^-20)! The hard part is finding such a code.
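The entries of Table 2.2 follow directly from equation (2.12); a short sketch:

```python
from math import comb

def repetition_error_prob(n, p):
    """Probability that majority decoding of an n-bit repetition code fails on a BSC with
    crossover probability p, i.e. that more than (n - 1)/2 of the n bits are flipped."""
    m = (n - 1) // 2
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1, n + 1))

for n in (1, 3, 5, 7, 9, 11):
    print(f"r = 1/{n}:  Pe = {repetition_error_prob(n, 1e-2):.1e}")
# 1.0e-02, 3.0e-04, 9.9e-06, 3.4e-07, 1.2e-08, 4.4e-10 (compare Table 2.2)
```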
2.5 INFORMATION CAPACITY THEOREM
So far we have studied limits on the maximum rate at which information can be sent over a
channel reliably in terms of the channel capacity. In this section we will formulate the
Information Capacity Theorem for band-limited, power-limited Gaussian channels.
Consider a zero mean, stationary random process X(t) that is band limited to W Hertz. Let X_k,
k = 1, 2, ..., K, denote the continuous random variables obtained by uniform sampling of the
process X(t) at the Nyquist rate of 2W samples per second. These symbols are transmitted over
a noisy channel which is also band-limited to W Hertz. The channel output is corrupted by
Additive White Gaussian Noise (AWGN) of zero mean and power spectral density (psd)
N_0/2. Because of the channel, the noise is band limited to W Hertz. Let Y_k, k = 1, 2, ..., K, denote
the samples of the received signal. Therefore,
Y_k = X_k + N_k,   k = 1, 2, ..., K     (2.13)
where N_k is the noise sample with zero mean and variance σ² = N_0 W. It is assumed that Y_k,
k = 1, 2, ..., K, are statistically independent. Since the transmitter is usually power-limited, let us
put a constraint on the average power in X_k:
E[X_k²] = P,   k = 1, 2, ..., K     (2.14)
The information capacity of this band-limited, power-limited channel is the maximum of the
mutual information between the channel input X_k and the channel output Y_k. The maximization
has to be done over all distributions on the input X_k that satisfy the power constraint of equation
(2.14). Thus, the information capacity of the channel (same as the channel capacity) is given by
C = max_{f_Xk(x)} { I(X_k; Y_k) : E[X_k²] = P },     (2.15)
where f_Xk(x) is the probability density function of X_k.
Now, from the previous chapter we have,
I(X_k; Y_k) = h(Y_k) - h(Y_k | X_k)     (2.16)
Note that X_k and N_k are independent random variables. Therefore, the conditional differential
entropy of Y_k given X_k is equal to the differential entropy of N_k. Intuitively, this is because given
X_k the uncertainty arising in Y_k is purely due to N_k. That is,
h(Y_k | X_k) = h(N_k)     (2.17)
Hence we can write Eq. (2.16) as

    I(Xk; Yk) = h(Yk) − h(Nk)                                         (2.18)

Since h(Nk) is independent of Xk, maximizing I(Xk; Yk) translates to maximizing h(Yk). It can be shown that for h(Yk) to be maximum, Yk has to be a Gaussian random variable (see Problem 2.10). If we assume Yk to be Gaussian, and Nk is Gaussian by definition, then Xk is also Gaussian. This is because the sum (or difference) of two Gaussian random variables is also Gaussian. Thus, in order to maximize the mutual information between the channel input Xk and the channel output Yk, the transmitted signal should also be Gaussian. Therefore we can rewrite (2.15) as

    C = I(Xk; Yk), with E[Xk²] = P and Xk Gaussian                    (2.19)
We know that if two independent Gaussian random variables are added, the variance of the resulting Gaussian random variable is the sum of the variances. Therefore, the variance of the received sample Yk equals P + N0W. It can be shown that the differential entropy of a Gaussian random variable with variance σ² is (1/2) log2(2πeσ²) (see Problem 2.10). Therefore,

    h(Yk) = (1/2) log2 [2πe(P + N0W)]                                 (2.20)

and

    h(Nk) = (1/2) log2 [2πe(N0W)]                                     (2.21)

Substituting these values of differential entropy for Yk and Nk we get

    C = (1/2) log2 (1 + P/(N0W)) bits per channel use                 (2.22)
We are transmitting 2W samples per second, i.e., the channel is being used 2W times in one second. Therefore, the information capacity can be expressed as

    C = W log2 (1 + P/(N0W)) bits per second                          (2.23)

This basic formula for the capacity of the band-limited AWGN waveform channel with a band-limited and average power-limited input was first derived by Shannon in 1948. It is known as Shannon's third theorem, the Information Capacity Theorem.

Theorem 2.2 (Information Capacity Theorem) The information capacity of a continuous channel of bandwidth W Hertz, disturbed by Additive White Gaussian Noise of power spectral density N0/2 and limited in bandwidth to W, is given by

    C = W log2 (1 + P/(N0W)) bits per second

where P is the average transmitted power. This theorem is also called the Channel Capacity Theorem.
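As a quick numerical illustration of Eq. (2.23) (this snippet is not part of the original text; the bandwidth and SNR values are arbitrary example inputs), the capacity formula can be evaluated directly:

```python
from math import log2

def awgn_capacity(bandwidth_hz, snr_linear):
    """Shannon capacity C = W log2(1 + P/(N0 W)) in bits per second,
    with the SNR supplied as the linear ratio P/(N0 W)."""
    return bandwidth_hz * log2(1 + snr_linear)

# Example: a 3000 Hz channel at 20 dB SNR (linear SNR = 100)
print(awgn_capacity(3000, 10 ** (20 / 10)))   # about 19,975 bits per second
```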
The Information Capacity Theorem is one of the important results in information theory. In a single formula one can see the trade-off between the channel bandwidth, the average transmitted power and the noise power spectral density. Given the channel bandwidth and the SNR, the channel capacity (bits/second) can be computed. This channel capacity is the fundamental limit on the rate of reliable communication for a power-limited, band-limited Gaussian channel. It should be kept in mind that in order to approach this limit, the transmitted signal must have statistical properties that are Gaussian in nature. Note that the terms channel capacity and information capacity have been used interchangeably.
Let us now derive the same result in a more intuitive manner. Suppose we have a coding scheme that results in an acceptably low probability of error. Let this coding scheme take k information bits and encode them into n bit long codewords. The total number of codewords is M = 2^k. Let the average power per bit be P. Thus the average power required to transmit an entire codeword is nP. Let these codewords be transmitted over a Gaussian channel with the noise variance equal to σ². The received vector of n bits is also Gaussian with the mean equal to the transmitted codeword and the variance equal to nσ². Since the code is a good one (acceptable error rate), the received vector lies inside a sphere of radius √(nσ²) centred on the transmitted codeword. This sphere itself is contained in a larger sphere of radius √(n(P + σ²)), where n(P + σ²) is the average power of the received vector.
This concept may be visualized as depicted in Fig. 2.6. There is a large sphere of radius √(n(P + σ²)) which contains M smaller spheres of radius √(nσ²). Here M = 2^k is the total number of codewords. Each of these small spheres is centred on a codeword. These are called the Decoding Spheres. Any received word lying within a sphere is decoded as the codeword on which the sphere is centred. Suppose a codeword is transmitted over a noisy channel. Then there is a high probability that the received vector will lie inside the correct decoding sphere (since it is a reasonably good code). The question arises: how many non-intersecting spheres can be packed inside the large sphere? The more spheres one can pack, the more efficient will be the code in terms of the code rate. This is known as the Sphere Packing Problem.
Fig. 2.6 Visualization of the Sphere Packing Problem.
The volume of an n-dimensional sphere of radius r can be expressed as

    V = An r^n                                                        (2.24)

where An is a scaling factor. Therefore, the volume of the large sphere (the sphere of all possible received vectors) can be written as

    Vall = An [n(P + σ²)]^(n/2)                                       (2.25)

and the volume of a decoding sphere can be written as

    Vds = An [nσ²]^(n/2)                                              (2.26)

The maximum number of non-intersecting decoding spheres that can be packed inside the large sphere of all possible received vectors is

    M = Vall/Vds = [n(P + σ²)]^(n/2) / [nσ²]^(n/2) = (1 + P/σ²)^(n/2) = 2^((n/2) log2(1 + P/σ²))    (2.27)

On taking the logarithm (base 2) on both sides of the equation we get

    log2 M = (n/2) log2 (1 + P/σ²)                                    (2.28)

Observing that k = log2 M, we have

    k/n = (1/2) log2 (1 + P/σ²)                                       (2.29)

Note that each time we use the channel, we effectively transmit k/n bits. Thus, the maximum number of bits that can be transmitted per channel use, with a low probability of error, is (1/2) log2 (1 + P/σ²), as seen previously in Eq. (2.22). Note that σ² represents the noise power and is equal to N0W for AWGN with power spectral density N0/2 and limited in bandwidth to W.
2.6 THE SHANNON LIMIT
Consider a Gaussian channel that is limited both in power and bandwidth. We wish to explore the limits of a communication system under these constraints. Let us define an ideal system which can transmit data at a bit rate Rb equal to the capacity, C, of the channel, i.e., Rb = C. Suppose the energy per bit is Eb. Then the average transmitted power is

    P = Eb Rb = Eb C                                                  (2.30)

Therefore, the channel capacity theorem for this ideal system can be written as

    C/W = log2 (1 + (Eb/N0)(C/W))                                     (2.31)
This equation can be re-written in the following form

    Eb/N0 = (2^(C/W) − 1) / (C/W)                                     (2.32)

The plot of the bandwidth efficiency Rb/W versus Eb/N0 is called the Bandwidth Efficiency Diagram, and is given in Fig. 2.7. The ideal system is represented by the line Rb = C.
Fig. 2.7 The Bandwidth Efficiency Diagram.
The following conclusions can be drawn from the Bandwidth Efficiency Diagram.
(i) For infinite bandwidth, the ratio Eb/N0 tends to the limiting value

    Eb/N0 | (W → ∞) = ln 2 = 0.693 = −1.6 dB                          (2.33)

This value is called the Shannon Limit. It is interesting to note that the Shannon limit is a fraction. This implies that for very large bandwidths, reliable communication is possible even for the case when the signal power is less than the noise power! The channel capacity corresponding to this limiting value is

    C | (W → ∞) = (P/N0) log2 e                                       (2.34)

Thus, at infinite bandwidth, the capacity of the channel is determined by the SNR.
(ii) The curve for the critical rate Rb = C is known as the Capacity Boundary. For the case Rb > C, reliable communication is not guaranteed. However, for Rb < C, there exists some coding scheme which can provide an arbitrarily low probability of error.
(iii) The Bandwidth Efficiency Diagram shows the trade-offs between the quantities Rb/W, Eb/N0 and the probability of error, Pe. Note that for designing any communication system the basic design parameters are the bandwidth available, the SNR and the bit error rate (BER). The BER is determined by the application and the quality of service (QoS) desired. The bandwidth and the power can be traded one for the other to provide the desired BER.
(iv) Any point on the Bandwidth Efficiency Diagram corresponds to an operating point with a particular set of values of SNR, bandwidth efficiency and BER.
The Information Capacity Theorem predicts the maximum amount of information that can be transmitted through a given bandwidth for a given SNR. We see from Fig. 2.7 that acceptable capacity can be achieved even for low SNRs, provided adequate bandwidth is available. The optimum usage of a given bandwidth is obtained when the signals are noise-like and a minimal SNR is maintained at the receiver. This principle lies at the heart of any spread spectrum communication system, such as Code Division Multiple Access (CDMA).
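A small sketch (added here for illustration, not from the text) traces the capacity boundary of Eq. (2.32) and confirms the limiting value of Eq. (2.33): as the spectral efficiency C/W shrinks, the required Eb/N0 approaches ln 2, about -1.6 dB.

```python
from math import log, log10

def ebn0_required(spectral_efficiency):
    """Minimum Eb/N0 (linear) on the capacity boundary Rb = C,
    from Eq. (2.32): Eb/N0 = (2^(C/W) - 1) / (C/W)."""
    r = spectral_efficiency
    return (2 ** r - 1) / r

# Sweep the spectral efficiency downwards; the required Eb/N0 tends to ln 2.
for r in (4.0, 1.0, 0.1, 0.001):
    print(f"C/W = {r:>6}: Eb/N0 = {10 * log10(ebn0_required(r)):+.2f} dB")

print(f"Shannon limit: {10 * log10(log(2)):+.2f} dB")   # about -1.59 dB
```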
2.7 RANDOM SELECTION OF CODES
Consider a set of M coded signal waveforms constructed from a set of n-dimensional binary codewords. Let us represent these codewords as follows

    ci = [ci1 ci2 ... cin],  i = 1, 2, ..., M                         (2.35)

Since we are considering binary codes, cij is either a 0 or a 1. Let each bit of the codeword be mapped on to a BPSK waveform pj(t), so that the codeword may be represented as

    si(t) = Σ (j = 1 to n) sij pj(t),  i = 1, 2, ..., M               (2.36)

where

    sij = +√E for cij = 1,   sij = −√E for cij = 0                    (2.37)

and E is the energy per code bit. The waveform si(t) can then be represented as the n-dimensional vector

    si = [si1 si2 ... sin],  i = 1, 2, ..., M                         (2.38)

We observe that this corresponds to a hypercube in the n-dimensional space. Let us now encode k bits of information into an n bit long codeword, and map this codeword to one of the M waveforms. Note that there are a total of 2^k possible waveforms corresponding to the M = 2^k different codewords.
Let the information rate into the encoder be R bits/sec. The encoder takes in k bits at a time and maps each k-bit block to one of the M waveforms. Thus, k = RT and M = 2^k = 2^(RT) signals are required.
Let us define a parameter D as follows:

    D = n/T dimensions/sec                                            (2.39)

so that n = DT is the dimensionality of the space. The hypercube mentioned above has 2^n = 2^(DT) vertices. Of these, we must choose M = 2^(RT) to transmit the information. Under the constraint D > R, the fraction of vertices that can be used as signal points is

    F = 2^k / 2^n = 2^(RT) / 2^(DT) = 2^(−(D − R)T)                   (2.40)
For D > R, F → 0 as T → ∞. Since n = DT, it implies that F → 0 as n → ∞. Designing a good coding scheme translates to choosing M vertices out of the 2^n vertices of the hypercube in such a manner that the probability of error tends to zero as we increase n. We saw that the fraction F tends to zero as we choose larger and larger n. This implies that it is possible to increase the minimum distance between these M signal points as n → ∞. Increasing the minimum distance between the signal points drives the probability of error Pe → 0.
There are (2^n)^M = 2^(nM) distinct ways of choosing the M codewords out of the total 2^n vertices. Each of these choices corresponds to a coding scheme. For each set of M waveforms, it is possible to design a communication system consisting of a modulator and a demodulator. Thus, there are 2^(nM) communication systems, one for each choice of the M coded waveforms. Each of these communication systems is characterized by its probability of error. Of course, many of these communication systems will perform poorly in terms of the probability of error.
Let us pick one of the codes at random from the possible 2^(nM) sets of codes. The random selection of the m-th code occurs with the probability

    P({si}m) = 2^(−nM)                                                (2.41)

Let the corresponding probability of error for this choice of code be Pe({si}m). Then the average probability of error over the ensemble of codes is

    P̄e = Σ (m = 1 to 2^(nM)) Pe({si}m) P({si}m) = 2^(−nM) Σ (m = 1 to 2^(nM)) Pe({si}m)    (2.42)
We will next try to upper bound this average probability of error. If we have an upper bound on P̄e, then we can conclude that there exists at least one code for which this upper bound will also hold. Furthermore, if P̄e → 0 as n → ∞, we can surmise that Pe({si}) → 0 as n → ∞ for such codes.
Consider the transmission of a k-bit message Xk = [x1 x2 ... xk], where xj is binary for j = 1, 2, ..., k. The conditional probability of error averaged over all possible codes is

    P̄e(Xk) = Σ (all codes) Pe(Xk, {si}m) P({si}m)                     (2.43)

where Pe(Xk, {si}m) is the conditional probability of error for a given k-bit message Xk = [x1 x2 ... xk], which is transmitted using the code {si}m. For the m-th code,

    Pe(Xk, {si}m) ≤ Σ (l = 1, l ≠ k, to M) P2m(sl, sk)                (2.44)

where P2m(sl, sk) is the probability of error for the binary communication system using the signal vectors sl and sk to transmit one of two equally likely k-bit messages. Hence,

    P̄e(Xk) ≤ Σ (all codes) P({si}m) Σ (l = 1, l ≠ k, to M) P2m(sl, sk)    (2.45)

On changing the order of summation we obtain

    P̄e(Xk) ≤ Σ (l = 1, l ≠ k, to M) [ Σ (all codes) P({si}m) P2m(sl, sk) ] = Σ (l = 1, l ≠ k, to M) P̄2(sl, sk)    (2.46)
where P̄2(sl, sk) represents the ensemble average of P2m(sl, sk) over the 2^(nM) codes. For an Additive White Gaussian Noise channel,

    P2m(sl, sk) = Q( √(d_lk² / (2N0)) )                               (2.47)

where

    d_lk² = |sl − sk|² = Σ (j = 1 to n) (slj − skj)² = d (2√E)² = 4dE    (2.48)

and d is the number of positions in which the two codewords differ. Therefore,

    P2m(sl, sk) = Q( √(2dE / N0) )                                    (2.49)

Under the assumption that all codes are equally probable, it is equally likely that the vector sl is any of the 2^n vertices of the hypercube. Further, sl and sk are statistically independent. Hence, the probability that sl and sk differ in exactly d places is
    P(d) = (1/2)^n (n choose d)                                       (2.50)

The expected value of P2m(sl, sk) over the ensemble of codes is then given by

    P̄2(sl, sk) = Σ (d = 0 to n) (1/2)^n (n choose d) Q( √(2dE/N0) )   (2.51)

Using the upper bound

    Q( √(2dE/N0) ) < e^(−dE/N0)                                       (2.52)

we obtain

    P̄2(sl, sk) < (1/2)^n Σ (d = 0 to n) (n choose d) e^(−dE/N0) = [ (1/2)(1 + e^(−E/N0)) ]^n    (2.53)

From Eqs. (2.46) and (2.53) we obtain

    P̄e(Xk) ≤ Σ (l ≠ k) P̄2(sl, sk) = (M − 1) [ (1/2)(1 + e^(−E/N0)) ]^n < M [ (1/2)(1 + e^(−E/N0)) ]^n    (2.54)
Recall that we need an upper bound on P̄e, the average error probability. To obtain P̄e we average P̄e(Xk) over all possible k-bit information sequences. Thus,

    P̄e = Σ (Xk) P(Xk) P̄e(Xk) < M [ (1/2)(1 + e^(−E/N0)) ]^n           (2.55)

We now define a new parameter as follows.

Definition 2.4 The Cutoff Rate R0 is defined as

    R0 = log2 [ 2 / (1 + e^(−E/N0)) ] = 1 − log2 (1 + e^(−E/N0))      (2.56)

The cutoff rate has the units of bits/dimension. Observe that 0 ≤ R0 ≤ 1. The plot of R0 with respect to the SNR per dimension is given in Fig. 2.8.
Eq. (2.55) can now be written succinctly as

    P̄e < M 2^(−nR0) = 2^(RT) 2^(−nR0)                                 (2.57)

Substituting n = DT, we obtain

    P̄e < 2^(−T(DR0 − R))                                              (2.58)

If we substitute T = n/D, we obtain

    P̄e < 2^(−n(R0 − R/D))                                             (2.59)
Fig. 2.8 Cutoff Rate, R0, Versus the SNR (in dB) Per Dimension.
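The following sketch is illustrative only (the 5 dB operating point and the block parameters n and Rc are assumed values, not from the text); it evaluates the cutoff rate of Eq. (2.56) and the ensemble-average error bound that follows in Eq. (2.61).

```python
from math import exp, log2

def cutoff_rate(e_over_n0):
    """Cutoff rate R0 = 1 - log2(1 + exp(-E/N0)) in bits/dimension,
    where E/N0 is the SNR per dimension (linear), as in Eq. (2.56)."""
    return 1 - log2(1 + exp(-e_over_n0))

def ensemble_error_bound(n, r0, rc):
    """Ensemble-average bound of Eq. (2.61): Pe_bar < 2^(-n (R0 - Rc))."""
    return 2 ** (-n * (r0 - rc))

snr = 10 ** (5 / 10)              # 5 dB per dimension (assumed operating point)
r0 = cutoff_rate(snr)
print(f"R0 = {r0:.3f} bits/dimension")
print(f"n = 100, Rc = 0.5: Pe < {ensemble_error_bound(100, r0, 0.5):.2e}")
```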
Observe that

    R/D = R/(n/T) = RT/n = k/n = Rc                                   (2.60)

Here, Rc represents the code rate. Hence the average error probability can be written in the following instructive form:

    P̄e < 2^(−n(R0 − Rc))                                              (2.61)
From the above equation we can conclude the following.
(i) For Rc < R0 the average probability of error P̄e → 0 as n → ∞. Since, by choosing large values of n, P̄e can be made arbitrarily small, there exist good codes in the ensemble which have a probability of error less than P̄e.
(ii) Observe that P̄e is the ensemble average. Therefore, if a code is selected at random, the probability that its error probability exceeds a·P̄e is less than 1/a. This implies that no more than 10% of the codes have an error probability that exceeds 10·P̄e. Thus, there are many good codes.
(iii) The codes whose probability of error exceeds P̄e are not always bad codes. The probability of error of these codes may be reduced by increasing the dimensionality, n.
For binary coded signals, the cutoff rate, R0, saturates at 1 bit/dimension for large values of E/N0 (say, greater than 10). Thus, to achieve lower probabilities of error one must reduce the code rate, Rc. Alternatively, very large block lengths have to be used. This is not an efficient approach. So, binary codes become inefficient at high SNRs. For high SNR scenarios, non-binary coded signal sets should be used to achieve an increase in the number of bits per dimension. Multiple-amplitude coded signal sets can be easily constructed from non-binary codes by mapping each code element into one of the possible amplitude levels (e.g. Pulse Amplitude Modulation). For random codes using M-ary multi-amplitude signals, it was shown by Shannon (in 1959) that

    (2.62)

Let us now relate the cutoff rate R0* to the capacity of the AWGN channel, which is given by

    C = W log2 (1 + P/(N0W)) bits per second                          (2.63)
The energy per code bit is equal to

    E = PT/n                                                          (2.64)

Recall from the sampling theorem that a signal of bandwidth W may be represented by samples taken at a rate of 2W samples per second. Thus, in a time interval of length T there are n = 2WT samples. Therefore, we may write D = n/T = 2W. Hence,

    P = nE/T = DE                                                     (2.65)

Define the normalized capacity Cn = C/(2W) = C/D and substitute for W and P in (2.63) to obtain

    Cn = (1/2) log2 (1 + 2E/N0) = (1/2) log2 (1 + 2Rc γb)             (2.66)

where γb = Eb/N0 is the SNR per bit (the energy per code bit is E = Rc Eb). The normalized capacity, Cn, and the cutoff rate, R0*, are plotted in Fig. 2.9. From the figure we can conclude the following:
(i) R0* < Cn for all values of E/N0. This is expected because Cn is the ultimate limit on the transmission rate R/D.
(ii) For smaller values of E/N0, the difference between Cn and R0* is approximately 3 dB. This means that randomly selected, average power limited, multi-amplitude signals yield an R0* within 3 dB of the channel capacity.
Fig. 2.9 The Normalized Capacity, Cn, and Cutoff Rate, R0*, for an AWGN Channel.
2.8 CONCLUDING REMARKS
Pioneering work in the area of channel capacity was done by Shannon in 1948. Shannon's
second theorem was indeed a surprising result at the time of its publication. It claimed that the
probability of error for a BSC could be made as small as desired provided the code rate was less
than the channel capacity. This theorem paved the way for a systematic study of reliable
communication over unreliable (noisy) channels. Shannon's third theorem, the Information
Capacity Theorem, is one of the most remarkable results in information theory. It gives a
relation between the channel bandwidth, the signal to noise ratio and the channel capacity.
Additional work was carried out in the 1950s and 1960s by Gilbert, Gallager, Wyner, Forney
and Viterbi to name some of the prominent contributors.
The concept of cutoff rate was also developed by Shannon, and was later used by Wozencraft, Jacobs and Kennedy as a design parameter for communication systems. Jordan used the concept of cutoff rate to design coded waveforms for M-ary orthogonal signals with coherent and non-coherent detection. Cutoff rates have been widely used as a design criterion for various channels, including the fading channels encountered in wireless communications.
SUMMARY
• The conditional probability P(yi | xj) is called the channel transition probability and is denoted by pji. The conditional probabilities {P(yi | xj)} that characterize a DMC can be arranged in the matrix form P = [pji]. P is known as the probability transition matrix for the channel.
• The capacity of a discrete memoryless channel (DMC) is defined as the maximum average mutual information in any single use of the channel, where the maximization is over all possible input probabilities P(xj). That is,

    C = max I(X; Y) = max Σ (j = 0 to q−1) Σ (i = 0 to r−1) P(xj) P(yi | xj) log [ P(yi | xj) / P(yi) ]
• The basic objective of channel coding is to increase the resistance of the digital
communication system to channel noise. This is done by adding redundancies in the
transmitted data stream in a controlled manner. Channel coding is also referred to as
error control coding.
• The ratio r = k/n is called the code rate. The code rate of any coding scheme is always less than unity.
• Let a DMS with an alphabet X have entropy H(X) and produce symbols every Ts seconds. Let a DMC have capacity C and be used once every Tc seconds. Then, if H(X)/Ts ≤ C/Tc, there exists a coding scheme for which the source output can be transmitted over the noisy channel and be reconstructed with an arbitrarily low probability of error. This is the Channel Coding Theorem or the Noisy Coding Theorem.
• For H(X)/Ts > C/Tc, it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error. The parameter C/Tc is called the Critical Rate.
• The information capacity can be expressed as C = W log2 (1 + P/(N0W)) bits per second. This is the basic formula for the capacity of the band-limited AWGN waveform channel with a band-limited and average power-limited input. This is the crux of the Information Capacity Theorem. This theorem is also called the Channel Capacity Theorem.
• The cutoff rate R0 is given by R0 = log2 [2 / (1 + e^(−E/N0))] = 1 − log2 (1 + e^(−E/N0)). The cutoff rate has the units of bits/dimension. Note that 0 ≤ R0 ≤ 1. The average error probability in terms of the cutoff rate can be written as P̄e < 2^(−n(R0 − Rc)). For Rc < R0 the average probability of error P̄e → 0 as n → ∞.
PROBLEMS
2.1 Consider the binary channel shown in Fig. 2.10. Let the a priori probabilities of sending the binary symbols be p0 and p1, where p0 + p1 = 1. Find the a posteriori probabilities P(X = 0 | Y = 0) and P(X = 1 | Y = 1).

Fig. 2.10
2.2 Find the capacity of the binary erasure channel shown in Fig. 2.11, where p0 and p1 are the a priori probabilities.

Fig. 2.11
2.3 Consider the channels A, B and the cascaded channel AB shown in Fig. 2.12.
(a) Find CA, the capacity of channel A.
(b) Find CB, the capacity of channel B.
(c) Next, cascade the two channels and determine the combined capacity CAB.
(d) Explain the relation between CA, CB and CAB.

Fig. 2.12
2.4 Find the capacity of the channel shown in Fig. 2.13.

Fig. 2.13
2.5 (a) A telephone channel has a bandwidth of 3000 Hz and the SNR = 20 dB. Determine
the channel capacity.
(b) If the SNR is increased to 25 dB, determine the capacity.
2.6 Determine the channel capacity of the channel shown in Fig. 2.14.

Fig. 2.14
2.7 Suppose a TV displays 30 frames/second. There are approximately 2 × 10^5 pixels per frame, each pixel requiring 16 bits for colour display. Assuming an SNR of 25 dB, calculate the bandwidth required to support the transmission of the TV video signal (use the Information Capacity Theorem).
2.8 Consider the Z channel shown in Fig. 2.15.
(a) Find the input probabilities that result in capacity.
(b) If N such channels are cascaded, show that the combined channel can be represented by an equivalent Z channel, and determine its channel transition probability.
(c) What is the capacity of the combined channel as N → ∞?

Fig. 2.15
2.9 Consider a communication system using antipodal signalling. The SNR is 20 dB.
(a) Find the cutoff rate, R0.
(b) We want to design a code which results in an average probability of error, Pe < 10^-6. What is the best code rate we can achieve?
(c) What will be the dimensionality, n, of this code?
(d) Repeat parts (a), (b) and (c) for an SNR of 5 dB. Compare the results.
2.10 (a) Prove that for a finite variance σ², the Gaussian random variable has the largest differential entropy attainable by any random variable.
(b) Show that this entropy is given by (1/2) log2 (2πeσ²).
COMPUTER PROBLEMS
2.11 Write a computer program that takes in the channel transition probability matrix and
computes the capacity of the channel.
2.12 Plot the operating points on the bandwidth efficiency diagram for M-PSK, M = 2, 4, 8, 16 and 32, and the probabilities of error: (a) Pe = 10^-6 and (b) Pe = 10^-8.
2.13 Write a program that implements the binary repetition code of rate 1/n, where n is an odd integer. Develop a decoder for the repetition code. Test the performance of this coding scheme over a BSC with the channel transition probability, p. Generalize the program for a repetition code of rate 1/n over GF(q). Plot the residual Bit Error Rate (BER) versus p and q (make a 3-D mesh plot).
3. Linear Block Codes for Error Correction
Richard W. Hamming
3.1 INTRODUCTION TO ERROR CORRECTING CODES
In this age of information, there is an increasing need not only for speed, but also for accuracy in the storage, retrieval, and transmission of data. The channels over which messages are transmitted are often imperfect. Machines do make errors, and their non-man-made mistakes can turn otherwise flawless programming into worthless, even dangerous, trash. Just as architects design buildings that will stand even through an earthquake, their computer counterparts have come up with sophisticated techniques capable of counteracting the digital manifestations of Murphy's Law ("If anything can go wrong, it will"). Error Correcting Codes are a kind of safety net: the mathematical insurance against the vagaries of an imperfect digital world.
Error Correcting Codes, as the name suggests, are used for correcting errors when messages are transmitted over a noisy channel or stored data is retrieved. The physical medium through which the messages are transmitted is called a channel (e.g. a telephone line, a satellite link, a wireless channel used for mobile communications etc.). Different kinds of channels are
prone to different kinds of noise, which corrupt the data being transmitted. The noise could be
caused by lightning, human errors, equipment malfunction, voltage surge etc. Because these
error correcting codes try to overcome the detrimental effects of noise in the channel, the
encoding procedure is also called Channel Coding. Error control codes are also used for accurate
transfer of information from one place to another, for example storing data and reading it from
a compact disc (CD). In this case, the error could be due to a scratch on the surface of the CD.
The error correcting coding scheme will try to recover the original data from the corrupted one.
The basic idea behind error correcting codes is to add some redundancy in the form of extra symbols to a message prior to its transmission through a noisy channel. This redundancy is added in a controlled manner. The encoded message, when transmitted, might be corrupted by noise in the channel. At the receiver, the original message can be recovered from the corrupted one if the number of errors is within the limit for which the code has been designed. The block diagram of a digital communication system is illustrated in Fig. 3.1. Note that the most important block in the figure is that of noise, without which there would be no need for the channel encoder.
Example 3.1 Let us see how redundancy combats the effects of noise. The normal language that we use to communicate (say, English) has a lot of redundancy built into it. Consider the following sentence:

    CODNG THEORY IS AN INTRSTNG SUBJCT.

As we can see, there are a number of errors in this sentence. However, due to familiarity with the language we may guess the original text to have read:

    CODING THEORY IS AN INTERESTING SUBJECT.

What we have just used is an error correcting strategy that makes use of the in-built redundancy in the English language to reconstruct the original message from the corrupted one.
Fig. 3.1 Block Diagram (and the principle) of a Digital Communication System.
Here the Source Coder/Decoder Block has not been shown.
The objectives of a good error control coding scheme are:
(i) error correcting capability in terms of the number of errors that it can rectify,
(ii) fast and efficient encoding of the message,
(iii) fast and efficient decoding of the received message,
(iv) maximum transfer of information bits per unit time (i.e., fewer overheads in terms of redundancy).
The first objective is the primary one. In order to increase the error correcting capability of a coding scheme one must introduce more redundancy. However, increased redundancy leads to a slower rate of transfer of the actual information. Thus the objectives (i) and (iv) are not totally compatible. Also, as the coding strategies become more complicated for correcting larger numbers of errors, the objectives (ii) and (iii) also become difficult to achieve.
In this chapter, we shall first learn the basic definitions of error control coding. These definitions, as we shall see, will be used throughout this book. The concept of Linear Block Codes will then be introduced. Linear Block Codes form a very large class of useful codes. We will see that it is very easy to work with the matrix description of these codes. In the later part of this chapter, we will learn how to efficiently decode these Linear Block Codes. Finally, the notion of perfect codes and optimal linear codes will be introduced.
3.2 BASIC DEFINITIONS
Given here are some basic definitions, which will be frequently used here as well as in the later
chapters.
Definition 3.1 A Word is a sequence of symbols.
Definition 3.2 A Code is a set of vectors called Codewords.
Definition 3.3 The Hamming Weight of a codeword (or any vector) is equal to the number of nonzero elements in the codeword. The Hamming Weight of a codeword c is denoted by w(c). The Hamming Distance between two codewords is the number of places in which the codewords differ. The Hamming Distance between two codewords c1 and c2 is denoted by d(c1, c2). It is easy to see that d(c1, c2) = w(c1 − c2).
Example 3.2 Consider a code C with two codewords {0100, 1111}, with Hamming weights w(0100) = 1 and w(1111) = 4. The Hamming distance between the two codewords is 3 because they differ in the 1st, 3rd and 4th places. Observe that w(0100 − 1111) = w(1011) = 3 = d(0100, 1111).
Example 3.3 For the code C = {01234, 43210}, the Hamming weight of each codeword is 4 and the Hamming distance between the codewords is 4 (only the 3rd component of the two codewords is identical, while they differ in the other 4 places).
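These two quantities are straightforward to compute; the sketch below (added for illustration, not part of the text) reproduces the values quoted in Examples 3.2 and 3.3.

```python
def hamming_weight(c):
    """Number of nonzero symbols in a word (sequence of symbols)."""
    return sum(1 for s in c if s != 0)

def hamming_distance(c1, c2):
    """Number of positions in which two equal-length words differ."""
    return sum(1 for a, b in zip(c1, c2) if a != b)

# Example 3.2: binary codewords 0100 and 1111
print(hamming_weight([0, 1, 0, 0]), hamming_distance([0, 1, 0, 0], [1, 1, 1, 1]))  # 1 3
# Example 3.3: codewords over a larger alphabet
print(hamming_distance([0, 1, 2, 3, 4], [4, 3, 2, 1, 0]))                          # 4
```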
Definition 3.4 A Block Code consists of a set of fixed length codewords. The
fixed length of these codewords is called the Block Length and is typically denoted
by n. Thus, a code of blocklength n consists of a set of codewords having n
components.
A block code of size M defined over an alphabet with q symbols is a set of M q-ary sequences, each of length n. In the special case that q = 2, the symbols are called bits and the code is said to be a binary code. Usually, M = q^k for some integer k, and we call such a code an (n, k) code.
Example 3.4 The code C = {00000, 10100, 11110, 11001} is a block code of block length equal to 5. This code can be used to represent two-bit binary numbers as follows:

    Uncoded bits    Codewords
    00              00000
    01              10100
    10              11110
    11              11001

Here M = 4, k = 2 and n = 5. Suppose we have to transmit a sequence of 1's and 0's using the above coding scheme. Let's say that the sequence to be encoded is 1 0 0 1 0 1 0 0 1 1 ... The first step is to break the sequence into groups of two bits (because we want to encode two bits at a time). So we partition it as follows:

    10 01 01 00 11 ...

Next, replace each block by its corresponding codeword:

    11110 10100 10100 00000 11001 ...

Thus 5 coded bits are sent for every 2 bits of uncoded message. It should be observed that for every 2 bits of information we are sending 3 extra bits (redundancy).
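A minimal sketch of this lookup-table encoding (added for illustration; the function name and structure are not from the text) reproduces the encoded stream above.

```python
# Lookup-table encoder for the (5, 2) block code of Example 3.4.
codebook = {"00": "00000", "01": "10100", "10": "11110", "11": "11001"}

def encode(bits):
    """Split the bit string into 2-bit blocks and replace each block
    by its 5-bit codeword."""
    assert len(bits) % 2 == 0, "the input must contain an even number of bits"
    return " ".join(codebook[bits[i:i + 2]] for i in range(0, len(bits), 2))

print(encode("1001010011"))   # 11110 10100 10100 00000 11001
```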
Definition 3.5 The Code Rate of an (n, k) code is defined as the ratio (k/n), and denotes the fraction of the codeword that consists of the information symbols. The code rate is always less than unity. The smaller the code rate, the greater the redundancy, i.e., more redundant symbols are present per information symbol in a codeword. A code with greater redundancy has the potential to detect and correct more symbols in error, but reduces the actual rate of transmission of information.
Definition 3.6 The minimum distance of a code is the minimum Hamming distance between any two codewords. If the code C consists of the set of codewords {ci, i = 0, 1, ..., M − 1}, then the minimum distance of the code is given by d* = min d(ci, cj), i ≠ j. An (n, k) code with minimum distance d* is sometimes denoted by (n, k, d*).

Definition 3.7 The minimum weight of a code is the smallest weight of any non-zero codeword, and is denoted by w*.
Theorem 3.1 For a linear code the minimum distance is equal to the minimum weight of the code, i.e., d* = w*.

Intuitive proof: The distance dij between any two codewords ci and cj is simply the weight of the codeword formed by ci − cj. Since the code is linear, the difference between two codewords is another valid codeword. Thus, the minimum weight of a non-zero codeword equals the minimum distance of the code.
Definition 3.8 A linear code has the following properties:
(i) The sum of two codewords belonging to the code is also a codeword belonging to the code.
(ii) The all-zero word is always a codeword.
(iii) The minimum Hamming distance between two codewords of a linear code is equal to the minimum weight of any non-zero codeword, i.e., d* = w*.
Note that if the sum of two codewords is another codeword, the difference of two codewords will also yield a valid codeword. For example, if c1, c2 and c3 are valid codewords such that c1 + c2 = c3, then c3 − c1 = c2. Hence it is obvious that the all-zero codeword must always be a valid codeword for a linear block code (self-subtraction of a codeword).
Example 3.5 The code C = {0000, 1010, 0101, 1111} is a linear block code of block length n = 4. Observe that all the ten possible sums of the codewords,

    0000 + 0000 = 0000, 0000 + 1010 = 1010, 0000 + 0101 = 0101,
    0000 + 1111 = 1111, 1010 + 1010 = 0000, 1010 + 0101 = 1111,
    1010 + 1111 = 0101, 0101 + 0101 = 0000, 0101 + 1111 = 1010 and
    1111 + 1111 = 0000,

are in C, and the all-zero codeword is in C. The minimum distance of this code is d* = 2. In order to verify the minimum distance of this linear code we can determine the distance between all pairs of codewords (of which there are C(4, 2) = 6):

    d(0000, 1010) = 2, d(0000, 0101) = 2, d(0000, 1111) = 4,
    d(1010, 0101) = 4, d(1010, 1111) = 2, d(0101, 1111) = 2.

We observe that the minimum distance of this code is 2.
Note that the code given in Example 3.4 is not linear because 10100 + 11110 = 01010, which is not a valid codeword. Even though the all-zero word is a valid codeword there, this does not guarantee linearity. The presence of the all-zero codeword is thus a necessary but not a sufficient condition for linearity.
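For a binary code, linearity amounts to closure under componentwise XOR, since every word is its own additive inverse. The sketch below (added for illustration, not from the text) checks this closure for the codes of Examples 3.5 and 3.4.

```python
def is_linear_binary(code):
    """Check closure of a binary block code under componentwise XOR,
    which is sufficient for linearity over GF(2)."""
    words = {tuple(c) for c in code}
    return all(tuple(a ^ b for a, b in zip(c1, c2)) in words
               for c1 in words for c2 in words)

C_linear = [(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 1, 1)]           # Example 3.5
C_nonlinear = [(0, 0, 0, 0, 0), (1, 0, 1, 0, 0), (1, 1, 1, 1, 0), (1, 1, 0, 0, 1)]  # Example 3.4
print(is_linear_binary(C_linear))      # True
print(is_linear_binary(C_nonlinear))   # False
```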
In order to make the error correcting codes easier to use, understand and analyze, it is helpful to impose some basic algebraic structure on them. As we shall soon see, it is useful to have an alphabet wherein it is easy to carry out basic mathematical operations such as addition, subtraction, multiplication and division.
Definition 3.9 A field F is a set of elements with two operations + (addition) and · (multiplication) satisfying the following properties:
(i) F is closed under + and ·, i.e., a + b and a · b are in F if a and b are in F.
For all a, b and c in F, the following hold:
(ii) Commutative laws: a + b = b + a, a · b = b · a
(iii) Associative laws: (a + b) + c = a + (b + c), a · (b · c) = (a · b) · c
(iv) Distributive law: a · (b + c) = a · b + a · c
Further, identity elements 0 and 1 must exist in F satisfying:
(v) a + 0 = a
(vi) a · 1 = a
(vii) For any a in F, there exists an additive inverse (−a) such that a + (−a) = 0.
(viii) For any non-zero a in F, there exists a multiplicative inverse (a^−1) such that a · a^−1 = 1.
The above properties are true for fields with both finite as well as infinite numbers of elements. A field with a finite number of elements (say, q) is called a Galois Field (pronounced Galva Field) and is denoted by GF(q). If only the first seven properties are satisfied, then it is called a ring.
Example 3.6 Consider GF(4) with 4 elements {0, 1, 2, 3}. The addition and multiplication tables for GF(4) are

    +  0 1 2 3        ·  0 1 2 3
    0  0 1 2 3        0  0 0 0 0
    1  1 0 3 2        1  0 1 2 3
    2  2 3 0 1        2  0 2 3 1
    3  3 2 1 0        3  0 3 1 2

It should be noted here that the addition in GF(4) is not modulo-4 addition.
Let us define a vector space, GF(q)^n, which is the set of n-tuples of elements from GF(q). Linear block codes can be looked upon as a set of n-tuples (vectors of length n) over GF(q) such that the sum of two codewords is also a codeword, and the product of any codeword by a field element is a codeword. Thus, a linear block code is a subspace of GF(q)^n.
Let S be a set of vectors of length n whose components are defined over GF(q). The set of all linear combinations of the vectors of S is called the linear span of S and is denoted by <S>. The linear span is thus a subspace of GF(q)^n, generated by S. Given any subset S of GF(q)^n, it is possible to obtain a linear code C = <S> generated by S, consisting of precisely the following codewords:
(i) the all-zero word,
(ii) all words in S,
(iii) all linear combinations of two or more words in S.
Example 3.7 Let S = {1100, 0100, 0011}. All possible linear combinations of S are 1100 + 0100 = 1000, 1100 + 0011 = 1111, 0100 + 0011 = 0111, 1100 + 0100 + 0011 = 1011. Therefore, C = <S> = {0000, 1100, 0100, 0011, 1000, 1111, 0111, 1011}. The minimum distance of this code is w(0100) = 1.
Example 3.8 Let S = {12, 21}, defined over GF(3). The addition and multiplication tables of the field GF(3) = {0, 1, 2} are given by:

    +  0 1 2        ·  0 1 2
    0  0 1 2        0  0 0 0
    1  1 2 0        1  0 1 2
    2  2 0 1        2  0 2 1

All possible linear combinations of 12 and 21 are:
12 + 21 = 00, 12 + 2(21) = 21, 2(12) + 21 = 12.
Therefore, C = <S> = {00, 12, 21, 00, 21, 12} = {00, 12, 21}.
3.3 MATRIX DESCRIPTION OF LINEAR BLOCK CODES
As we have observed earlier, any code C is a subspace of GF(q)^n. Any set of basis vectors can be used to generate the code space. We can, therefore, define a generator matrix, G, the rows of which form the basis vectors of the subspace. The rows of G will be linearly independent. Thus, a linear combination of the rows can be used to generate the codewords of C. The generator matrix will be a k × n matrix with rank k. Since the choice of the basis vectors is not unique, the generator matrix is not unique for a given linear code.
The generator matrix converts (encodes) a vector of length k to a vector of length n. Let the input vector (uncoded symbols) be represented by i. The coded symbols will be given by

    c = iG                                                            (3.1)

where c is called the codeword and i is called the information word.
The generator matrix provides a concise and efficient way of representing a linear block code. The k × n matrix can generate q^k codewords. Thus, instead of having a large look-up table of q^k codewords, one can simply store the generator matrix. This provides an enormous saving in storage space for large codes. For example, for the binary (46, 24) code the total number of codewords is 2^24 = 16,777,216 and the size of the lookup table of codewords would be n × 2^k = 771,751,936 bits. On the other hand, if we use a generator matrix, the total storage requirement would be only n × k = 46 × 24 = 1104 bits.
Example 3.9 Consider the generator matrix

    G = [ 1 0 1 ]
        [ 0 1 0 ]

The four codewords are obtained by multiplying each information word by G:

    c1 = [0 0] G = [0 0 0],   c2 = [0 1] G = [0 1 0],
    c3 = [1 0] G = [1 0 1],   c4 = [1 1] G = [1 1 1].

Therefore, this generator matrix generates the code C = {000, 010, 101, 111}.
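A short sketch (illustrative, not from the text) enumerates the codewords c = iG for Example 3.9 by running the information word i over all q^k possibilities.

```python
from itertools import product

def codewords_from_generator(G, q=2):
    """All q^k codewords c = i G (arithmetic modulo q) of the linear block
    code defined by the k x n generator matrix G, given as a list of k rows."""
    k, n = len(G), len(G[0])
    codewords = []
    for i in product(range(q), repeat=k):          # every possible information word
        c = tuple(sum(i[r] * G[r][col] for r in range(k)) % q for col in range(n))
        codewords.append(c)
    return codewords

G = [[1, 0, 1],
     [0, 1, 0]]
print(codewords_from_generator(G))   # [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
```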
3.4 EQUIVALENT CODES
Definition 3.10 A permutation of a set S = {x1, x2, ..., xn} is a one-to-one mapping from S to itself. A permutation can be denoted as follows:

    ( x1      x2      ...   xn    )
    ( f(x1)   f(x2)   ...   f(xn) )                                   (3.2)

Definition 3.11 Two q-ary codes are called equivalent if one can be obtained from the other by one or both of the operations listed below:
(i) permutation of the symbols appearing in a fixed position,
(ii) permutation of the positions of the code.
Suppose a code containing M codewords is displayed in the form of an M × n matrix, where the rows represent the codewords. Operation (i) corresponds to the re-labelling of the symbols appearing in a given column, and operation (ii) represents the rearrangement of the columns of the matrix.
Example 3.10 Consider the ternary code (a code whose components ∈ {0, 1, 2}) of block length 3

    C = { 2 0 1
          1 2 0
          0 1 2 }

If we apply the permutation 0 → 2, 2 → 1, 1 → 0 to column 2 and 1 → 2, 0 → 1, 2 → 0 to column 3, we obtain

    C' = { 2 2 2
           1 1 1
           0 0 0 }

The code C' is equivalent to a repetition code of length 3.
Note that the original code is not linear, but is equivalent to a linear code.
Definition 3.12 Two linear q-ary codes are called equivalent if one can be
obtained from the other by one or both operations listed below:
(i) multiplication of the components by a non-zero scalar,
(ii) permutation of the positions of the code.
Note that in Definition 3.11 we have defined equivalent codes that are not necessarily
linear.
Theorem 3.2 Two k x n matrices generate equivalent linear (n, k) codes over GF(q) if one
matrix can be obtained from the other by a sequence of the following operations:
(i) Permutation of rows
(ii) Multiplication of a row by a non-zero scalar
(iii) Addition of a scalar multiple of one row to another
(iv) Permutation of columns
(v) Multiplication of any column by a non-zero scalar.
Proof The first three operations (which are just row operations) preserve the linear independence of the rows of the generator matrix. These operations merely modify the basis. The last two operations (which are column operations) convert the matrix to one which will produce an equivalent code.
Theorem 3.3 A generator matrix can be reduced to its systematic form (also called the standard form of the generator matrix) of the type G = [I | P], where I is a k × k identity matrix and P is a k × (n − k) matrix.

Proof The k rows of any generator matrix (of size k × n) are linearly independent. Hence, by performing elementary row operations and column permutations it is possible to obtain an equivalent generator matrix in row echelon form. This matrix will be of the form [I | P].
Example 3.11 Consider the generator matrix of a (4, 3) code over GF(3):

    G = [ 0 1 2 1 ]
        [ 1 0 1 0 ]
        [ 1 2 2 1 ]

Let us represent the i-th row by ri and the j-th column by cj. Upon replacing r3 by r3 − r1 − r2 we get (note that in GF(3), −1 = 2 and −2 = 1 because 1 + 2 = 0; see the tables in Example 3.8)

    G = [ 0 1 2 1 ]
        [ 1 0 1 0 ]
        [ 0 1 2 0 ]

Next we replace r1 by r1 − r3 to obtain

    G = [ 0 0 0 1 ]
        [ 1 0 1 0 ]
        [ 0 1 2 0 ]

Finally, shifting c4 → c1, c1 → c2, c2 → c3 and c3 → c4 we obtain the standard form of the generator matrix

    G = [ 1 0 0 0 ]
        [ 0 1 0 1 ]
        [ 0 0 1 2 ]
3.5 PARITY CHECK MATRIX
One of the objectives of a good code design is to have fast and efficient encoding and decoding methodologies. So far we have dealt with the efficient generation of linear block codes using a generator matrix. Codewords are obtained simply by multiplying the input vector (uncoded word) by the generator matrix. Is it possible to detect a valid codeword using a similar concept? The answer is yes, and such a matrix is called the Parity Check Matrix, H, for the given code. For a parity check matrix,

    cH^T = 0                                                          (3.3)

where c is a valid codeword. Since c = iG, therefore, iGH^T = 0. For this to hold true for all valid information words we must have

    GH^T = 0                                                          (3.4)

The size of the parity check matrix is (n − k) × n. A parity check matrix provides a simple method of detecting whether an error has occurred or not. If the multiplication of the received word (at the receiver) with the transpose of H yields a non-zero vector, it implies that an error has occurred. This methodology, however, will fail if the errors in the transmitted codeword exceed the number of errors for which the coding scheme is designed. We shall soon find out that the non-zero product cH^T might help us not only to detect but also to correct the errors under some conditions.
Suppose the generator matrix is represented in its systematic form G = [I | P]. The matrix P is called the Coefficient Matrix. Then the parity check matrix will be given by

    H = [−P^T | I],                                                   (3.5)

where P^T represents the transpose of matrix P. This is because

    GH^T = [I | P] [−P^T | I]^T = −P + P = 0                          (3.6)

Since the choice of a generator matrix is not unique for a code, the parity check matrix will not be unique either. Given a generator matrix G, we can determine the corresponding parity check matrix and vice versa. Thus the parity check matrix H can be used to specify the code completely. From Eq. (3.3) we observe that the vector c must have 1's in such positions that the corresponding rows of H^T add up to the zero vector 0. Now, we know that the number of 1's in a codeword pertains to its Hamming weight. Hence, the minimum distance d* of a linear block code is given by the minimum number of rows of H^T (or, equivalently, columns of H) whose sum is equal to the zero vector.
Example 3.12 For a (7, 4) linear block code the generator matrix is given by

    G = [ 1 0 0 0 1 0 1 ]
        [ 0 1 0 0 1 1 1 ]
        [ 0 0 1 0 1 1 0 ]
        [ 0 0 0 1 0 1 1 ]

The matrix P is given by

    P = [ 1 0 1 ]
        [ 1 1 1 ]
        [ 1 1 0 ]
        [ 0 1 1 ]

and P^T is given by

    P^T = [ 1 1 1 0 ]
          [ 0 1 1 1 ]
          [ 1 1 0 1 ]

Observing the fact that −1 = 1 for the binary case, we can write the parity check matrix as

    H = [−P^T | I] = [ 1 1 1 0 1 0 0 ]
                     [ 0 1 1 1 0 1 0 ]
                     [ 1 1 0 1 0 0 1 ]

Note that the columns 1, 5 and 7 of the parity check matrix, H, add up to the zero vector. Hence, for this code, d* = 3.
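The sketch below (illustrative, not from the text; it uses the systematic matrices as reconstructed above, so treat the specific entries as an assumption) builds H = [-P^T | I] from G = [I | P] and verifies that every codeword satisfies cH^T = 0 and that the minimum weight of the code is 3.

```python
import itertools
import numpy as np

# Systematic generator G = [I | P] for the (7, 4) code discussed above;
# the P entries are the reconstructed (illustrative) values.
P = np.array([[1, 0, 1],
              [1, 1, 1],
              [1, 1, 0],
              [0, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])
H = np.hstack([P.T, np.eye(3, dtype=int)])     # -P^T = P^T over GF(2)

# Every codeword c = iG (mod 2) must satisfy c H^T = 0 (mod 2).
codewords = [(np.array(i) @ G) % 2 for i in itertools.product((0, 1), repeat=4)]
print(all(not np.any((c @ H.T) % 2) for c in codewords))    # True
print(min(int(c.sum()) for c in codewords if c.any()))      # minimum weight: 3
```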
Theorem 3.4 The code C contains a nonzero codeword of Hamming weight w or less if and only if a linearly dependent set of w columns of H exists.

Proof Consider a codeword c ∈ C. Let the weight of c be w, which implies that there are w non-zero components and (n − w) zero components in c. If we discard the (n − w) zero components, then from the relation cH^T = 0 we can conclude that w columns of H are linearly dependent.
Conversely, if H has w linearly dependent columns, then a linear combination of at most w columns is zero. These w non-zero coefficients define a codeword of weight w or less that satisfies cH^T = 0.
Definition 3.13 An (n, k) systematic code is one in which the first k symbols of the codeword of block length n are the information symbols themselves (i.e., the uncoded vector) and the remaining (n − k) symbols form the parity symbols.
Example 3.13 The following is a (5, 2) systematic code over GF(3)

    S.No.   Information Symbols (k = 2)   Codewords (n = 5)
    1.      00                            00 000
    2.      01                            01 121
    3.      02                            02 220
    4.      10                            10 012
    5.      11                            11 221
    6.      12                            12 210
    7.      20                            20 020
    8.      21                            21 100
    9.      22                            22 212

Note that the total number of codewords is 3^k = 3^2 = 9. Each codeword begins with the information symbols and has three parity symbols at the end. The parity symbols for the information word 01 are 121 in the above table. A generator matrix in the systematic form (standard form) will generate a systematic code.
Theorem 3.5 The minimum distance (minimum weight) of an (n, k) linear code is bounded as follows:

    d* ≤ n − k + 1                                                    (3.7)

This is known as the Singleton Bound.

Proof We can reduce any linear block code to its equivalent systematic form. Consider a systematic codeword having just one non-zero information symbol; at most all of its (n − k) parity symbols can be non-zero, so the weight of this codeword is at most (n − k) + 1. Thus the minimum weight (and hence the minimum distance) of the code cannot exceed n − k + 1. This gives the following definition of a maximum distance code.

Definition 3.14 A Maximum Distance Code satisfies d* = n − k + 1.
Having familiarized ourselves with the concept of minimum distance of a linear code, we
shall now explore how this minimum distance is related to the total number of errors the code
can detect and possibly correct. So we move over to the receiver end and take a look at the
methods of decoding a linear block code.
3.6 DECODING OF A LINEAR BLOCK CODE
The basic objective of channel coding is to detect and correct errors when messages are transmitted over a noisy channel. The noise in the channel randomly transforms some of the symbols of the transmitted codeword into some other symbols. If the noise, for example, changes just one of the symbols in the transmitted codeword, the erroneous word will be at a Hamming distance of one from the original codeword. If the noise transforms t symbols (that is, t symbols in the codeword are in error), the received word will be at a Hamming distance of t from the originally transmitted codeword. Given a code, how many errors can it detect and how many can it correct? Let us first look at the detection problem.
An error will be detected as long as it does not transform one codeword into another valid codeword. If the minimum distance between the codewords is d*, the weight of the error pattern must be d* or more to cause a transformation of one codeword into another. Therefore, an (n, k, d*) code will detect at least all nonzero error patterns of weight less than or equal to (d* − 1). Moreover, there is at least one error pattern of weight d* which will not be detected. This corresponds to the two codewords that are the closest. It may be possible that some error patterns of weight d* or more are detected, but all error patterns of weight d* will not be detected.
Example 3.14 For the code C1 = {000, 111} the minimum distance is 3. Therefore error patterns of weight 2 or 1 can be detected. This means that any error pattern belonging to the set {011, 101, 110, 001, 010, 100} will be detected by this code.
Next consider the code C2 = {001, 110, 101} with d* = 1. Nothing can be said regarding how many errors this code can detect because d* − 1 = 0. However, the error pattern 010 of weight 1 can be detected by this code. But it cannot detect all error patterns of weight one; e.g., the error vector 100 cannot be detected.
Next let us look at the problem of error correction. The objective is to make the best possible
guess regarding the originally transmitted codeword on the basis of the received word. What
would be a smart decoding strategy? Since only one of the valid codewords must have been
transmitted, it is logical to conclude that a valid codeword nearest (in terms of Hamming
distance) to the received word must have been actually transmitted. In other words, the
codeword which resembles the received word most is assumed to be the one that was sent. This
strategy is called the Nearest Neighbour Decoding, as we are picking the codeword nearest
to the received word in terms of the Hamming distance.
It may be possible that more than one codeword is at the same Hamming distance from the
received word. In that case the receiver can do one of the following:
(i) It can pick one of the equally distant neighbours randomly, or
(ii) request the transmitter to re-transmit.
To ensure that the received word (with at most t errors) is closest to the original codeword, and farther from all other codewords, we must put the following condition on the minimum distance of the code:

    d* ≥ 2t + 1                                                       (3.8)

Graphically, the condition for correcting t errors or less can be visualized from Fig. 3.2. Consider the space of all q-ary n-tuples. Every q-ary vector of length n can be represented as a point in this space. Every codeword can thus be depicted as a point in this space, and all words at a Hamming distance of t or less from it would lie within the sphere centred at the codeword with a radius of t. If the minimum distance of the code is d*, and the condition d* ≥ 2t + 1 holds good, then none of these spheres intersect. Any received vector (which is just a point) within a specific sphere is closer to its centre (which represents a codeword) than to any other codeword. We will call the sphere associated with each codeword its Decoding Sphere. Hence it is possible to decode the received vector using the 'nearest neighbour' method without ambiguity.

Fig. 3.2 Decoding Spheres.

Figure 3.2 shows that words within the sphere of radius t centred at c1 will be decoded as c1. For unambiguous decoding, d* ≥ 2t + 1.
The condition d* ≥ 2t + 1 takes care of the worst case scenario. It may be possible, however, that the above condition is not met but it is still feasible to correct t errors, as illustrated in the following example.
Example 3.15 Consider the code C = {00000, 01010, 10101, 11111}. The minimum distance is d* = 2. Suppose the codeword 11111 was transmitted and the received word is 11110, i.e., t = 1 (one error has occurred, in the fifth component). Now,

    d(11110, 00000) = 4,  d(11110, 01010) = 2,
    d(11110, 10101) = 3,  d(11110, 11111) = 1.

Using nearest neighbour decoding we can conclude that 11111 was transmitted. Even though a single error correction (t = 1) was done in this case, d* < 2t + 1 = 3. So it is possible to correct errors even when d* < 2t + 1. However, in many cases a single error correction may not be possible with this code. For example, if 00000 was sent and 01000 was received,

    d(01000, 00000) = 1,  d(01000, 01010) = 1,
    d(01000, 10101) = 4,  d(01000, 11111) = 4.

In this case there cannot be a clear-cut decision, and a coin will have to be flipped!
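A minimal nearest neighbour decoder (added for illustration, not from the text) reproduces both cases of Example 3.15: one received word decodes unambiguously, while the other is equidistant from two codewords.

```python
def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbour_decode(received, code):
    """Return the codeword(s) at minimum Hamming distance from the received
    word. More than one returned codeword means the decoder must either
    guess or request a re-transmission."""
    dists = {c: hamming_distance(received, c) for c in code}
    dmin = min(dists.values())
    return [c for c, d in dists.items() if d == dmin]

C = ["00000", "01010", "10101", "11111"]
print(nearest_neighbour_decode("11110", C))   # ['11111']            -> unambiguous
print(nearest_neighbour_decode("01000", C))   # ['00000', '01010']   -> ambiguous
```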
Definition 3.15 An Incomplete Decoder decodes only those received words that are clearly closest to one of the codewords. In the case of ambiguity, the decoder declares that the received word is unrecognizable and the transmitter is requested to re-transmit. A Complete Decoder decodes every received word, i.e., it tries to map every received word to some codeword, even if it has to make a guess. The coin-flipping decoder of Example 3.15 is a complete decoder. Such decoders may be used when it is better to have a good guess rather than to have no guess at all. Most real-life decoders are incomplete decoders. Usually they send a message back to the transmitter requesting a re-transmission.

Definition 3.16 A receiver declares that an erasure has occurred (i.e., a received symbol has been erased) when the symbol is received ambiguously, or the presence of an interference is detected during reception.
Example 3.16 Consider a binary Pulse Amplitude Modulation (PAM) scheme where 1 is represented by five volts and 0 is represented by zero volts. The noise margin is one volt, which implies that at the receiver:
if the received voltage is between 4 volts and 5 volts → the bit sent is 1,
if the received voltage is between 0 volts and 1 volt → the bit sent is 0,
if the received voltage is between 1 volt and 4 volts → an erasure has occurred.
Thus if the receiver received 2.9 volts during a bit interval, it will declare that an erasure has occurred.
A channel can be prone both to errors and erasures. If in such a channel t errors and r erasures occur, the error correcting scheme should be able to compensate for the erasures as well as correct the errors. If r erasures occur, the minimum distance of the code will become d* − r in the worst case. This is because the erased symbols have to be simply discarded, and if they were contributing to the minimum distance, this distance will reduce. A simple example will illustrate the point. Consider the repetition code in which

    0 → 00000
    1 → 11111

Here d* = 5. If r = 2, i.e., two bits get erased (let us say the first two), we will have

    0 → ??000
    1 → ??111

Now, the effective minimum distance is d1* = d* − r = 3.
Therefore, for a channel with t errors and r erasures, d* − r ≥ 2t + 1, or

    d* ≥ 2t + r + 1                                                   (3.9)

For a channel which has no errors (t = 0), only r erasures,

    d* ≥ r + 1                                                        (3.10)
Next let us give a little more formal treatment to the decoding procedure. Can we construct some mathematical tools to simplify nearest neighbour decoding? Suppose the codeword c = c1 c2 ... cn is transmitted over a noisy channel. The noise in the channel changes some or all of the symbols of the codeword. Let the received vector be denoted by v = v1 v2 ... vn. Define the error vector as

    e = v − c = (v1 − c1, v2 − c2, ..., vn − cn) = e1 e2 ... en       (3.11)

The decoder has to decide from the received vector, v, which codeword was transmitted, or equivalently, it must determine the error vector, e.
Definition 3.17 Let C be an (n, k) code over GF(q) and let a be any vector of length n. Then the set

    a + C = { a + x | x ∈ C }                                         (3.12)

is called a Coset (or translate) of C. Two vectors a and b are said to be in the same coset if (a − b) ∈ C.
Theorem 3.6 Suppose C is an (n, k) code over GF(q). Then,
(i) every vector b of length n is in some coset of C.
(ii) each coset contains exactly q^k vectors.
(iii) two cosets are either disjoint or coincide (partial overlap is not possible).
(iv) if a + C is a coset of C and b ∈ a + C, we have b + C = a + C.
Proof
(i) b = b + 0 ∈ b + C.
(ii) Observe that the mapping C → a + C defined by x → a + x, for all x ∈ C, is a one-to-
one mapping. Thus the cardinality of a + C is the same as that of C, which is equal to
q^k.
(iii) Suppose the cosets a + C and b + C overlap, i.e., they have at least one vector in
common. Let v ∈ (a + C) ∩ (b + C). Thus, for some x, y ∈ C,
v = a + x = b + y.
Or, b = a + x - y = a + z, where z ∈ C
(because the difference of two codewords is also a codeword).
Thus, b + C = a + z + C, or (b + C) ⊆ (a + C).
Similarly, it can be shown that (a + C) ⊆ (b + C). From these two we can conclude
that (b + C) = (a + C).
(iv) Since b ∈ a + C, it implies that b = a + x, for some x ∈ C.
Next, if b + y ∈ b + C, then
b + y = (a + x) + y = a + (x + y) ∈ a + C.
Hence, b + C ⊆ a + C. On the other hand, if a + z ∈ a + C, then
a + z = (b - x) + z = b + (z - x) ∈ b + C.
Hence, a + C ⊆ b + C, and so b + C = a + C.
Definition 3.18 The vector having the minimum weight in a coset is called the
Coset Leader. If there is more than one vector with the minimum weight, one of
them is chosen at random and is declared the coset leader.
Example 3.17 Let C be the binary (3, 2) code with the generator matrix given by
G = [1 0 1]
    [0 1 0]
i.e., C = {000, 010, 101, 111}. The cosets of C are
000 + C = 000, 010, 101, 111,
001 + C = 001, 011, 100, 110.
Note that all the eight vectors have been covered by these two cosets. As we have already seen (in
the above theorem), if a + C is a coset of C and b ∈ a + C, we have b + C = a + C.
Hence, all cosets have been listed. For the sake of illustration we write down the following:
010 + C = 010, 000, 111, 101,
011 + C = 011, 001, 110, 100,
100 + C = 100, 110, 001, 011,
101 + C = 101, 111, 000, 010,
110 + C = 110, 100, 011, 001,
111 + C = 111, 101, 010, 000.
It can be seen that all these sets are already covered.
Since two cosets are either disjoint or coincide (from Theorem 3.6), the set of all vectors, GF(q)^n,
can be written as
GF(q)^n = C ∪ (a1 + C) ∪ (a2 + C) ∪ ... ∪ (at + C),
where
t = q^(n-k) - 1.
Definition 3.19 A Standard Array for an (n, k) code C is a q^(n-k) x q^k array of all
vectors in GF(q)^n in which the first row consists of the code C (with 0 on the extreme
left), and the other rows are the cosets ai + C, each arranged in corresponding order,
with the coset leader on the left.
Steps for constructing a standard array:
(i) In the first row write down all the valid codewords, starting with the all-zero codeword.
(ii) Choose a vector a1 of minimum weight which is not in the first row. Write down the coset a1 + C as the
second row such that a1 + x is written under x ∈ C.
(iii) Next choose another vector a2 (not present in the first two rows) of minimum weight and
write down the coset a2 + C as the third row such that a2 + x is written under x ∈ C.
(iv) Continue the process until all the cosets are listed and every vector in GF(q)^n appears
exactly once.
Example 3.18 Consider the code C = {0000, 1011, 0101, 1110}. The corresponding standard
array is
codewords →  0000   1011   0101   1110
             1000   0011   1101   0110
             0100   1111   0001   1010
             0010   1001   0111   1100
              ↑
        coset leader
Note that each entry is the sum of the codeword and its coset leader.
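The construction steps above are easy to automate. The following is a minimal sketch (not from the text), in Python, that builds the standard array of the binary (4, 2) code of Example 3.18; when several minimum-weight vectors are available, the coset leader is simply the first one encountered, since Definition 3.18 allows the choice to be arbitrary.

from itertools import product

def standard_array(codewords, q=2):
    n = len(codewords[0])
    rows = [list(codewords)]                   # first row: the code itself, 0 on the left
    covered = set(codewords)
    # visit all q^n vectors in order of increasing Hamming weight
    for leader in sorted(product(range(q), repeat=n), key=lambda w: sum(s != 0 for s in w)):
        if leader in covered:
            continue
        row = [tuple((leader[i] + c[i]) % q for i in range(n)) for c in codewords]
        rows.append(row)
        covered.update(row)
    return rows

code = [(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 0, 1), (1, 1, 1, 0)]
for row in standard_array(code):
    print(row)

Running this lists the same four cosets as the array above, one row per coset with the coset leader in the first column.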
Let us now look at the concept of decoding (obtaining the information symbols from the received
codewords) using the standard array. Since the standard array comprises all possible words
belonging to GF(q)^n, the received word can always be identified with one of the elements of the
standard array. If the received word is a valid codeword, it is concluded that no errors have
occurred (this conclusion may be wrong with a very low probability of error, when one valid
codeword gets modified to another valid codeword due to noise!). In the case that the received
word, v, does not belong to the set of valid codewords, we surmise that an error has occurred.
The decoder then declares that the coset leader is the error vector, e, and decodes the codeword
as v - e. This is the codeword at the top of the column containing v. Thus, mechanically, we
decode the codeword as the one on the top of the column containing the received word.
Example 3.19 Suppose the code in the previous example C = {0000, 1011, 0101, 1110} is used
and the received word is v = 1101. Since it is not one of the valid codewords, we deduce that an
error has occurred. Next we try to estimate which one of the four possible codewords was actually
transmitted. If we make use of the standard array of the earlier example, we find that 1101 lies in
the 3rd column. The topmost entry of this column is 0101. Hence the estimated codeword is 0101.
Observe that:
d(1101, 0000) = 3, d(1101, 1011) = 2,
d(1101, 0101) = 1, d(1101, 1110) = 2,
and the error vector e = 1000, the coset leader.
Codes with larger blocklengths are desirable (though not always; see the concluding remarks
of this chapter) because the code rates of larger codes perform closer to the Shannon Limit. As
we go to larger codes (with larger values of k and n), the method of the standard array becomes
less practical because the size of the standard array (q^(n-k) x q^k) will become unmanageably large.
One of the basic objectives of coding theory is to develop efficient decoding strategies. If we are
to build decoders that will work in real-time, the decoding scheme should be realizable both in
terms of memory required as well as the computational load. Is it possible to reduce the standard
array? The answer lies in the concept of Syndrome Decoding, which we are going to discuss
next.
3.7 SYNDROME DECODING
The standard array can be simplified if we store only the first column, and compute the
remaining columns, if needed. To do so, we introduce the concept of the Syndrome of the error
pattern.
Definition 3.20 Suppose H is a parity check matrix of an (n, k) code, then for any
vector v ∈ GF(q)^n, the vector
s = vH^T (3.13)
is called the Syndrome of v.
The syndrome of v is sometimes explicitly written as s(v). It is called a syndrome
because it gives us the symptoms of the error, thereby helping us to diagnose the
error.
Theorem 3.7 Two vectors x and y are in the same coset of C if and only if they have the
same syndrome.
Proof The vectors x and y belong to the same coset
⇔ x + C = y + C
⇔ x - y ∈ C
⇔ (x - y)H^T = 0
⇔ xH^T = yH^T
⇔ s(x) = s(y)
Thus, there is a one-to-one correspondence between cosets and syndromes.
We can reduce the size of the standard array by simply listing the syndromes and the
corresponding coset leaders.
Example 3.20 We now extend the standard array listed in Example 3.18 by adding a syndrome
column. The code is C = {0000, 1011, 0101, 1110}. The corresponding standard array is
Codewords →  0000   1011   0101   1110     Syndrome  00
             1000   0011   1101   0110               11
             0100   1111   0001   1010               01
             0010   1001   0111   1100               10
              ↑
        coset leader
The steps for syndrome decoding are as follows:
(i) Determine the syndrome (s = vH^T) of the received word, v.
(ii) Locate the syndrome in the 'syndrome column'.
(iii) Determine the corresponding coset leader. This is the error vector, e.
(iv) Subtract this error vector from the received word to get the codeword c = v - e.
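A minimal sketch (not from the text) of these four steps in Python is given below, for the (4, 2) code of Example 3.20. The parity check matrix H and the syndrome table are assumptions chosen to be consistent with the code C = {0000, 1011, 0101, 1110}; they are not quoted from the book.

import numpy as np

H = np.array([[1, 0, 1, 0],                     # assumed parity check matrix for C
              [1, 1, 0, 1]])

coset_leaders = {(0, 0): (0, 0, 0, 0),          # syndrome -> coset leader (error vector)
                 (1, 1): (1, 0, 0, 0),
                 (0, 1): (0, 1, 0, 0),
                 (1, 0): (0, 0, 1, 0)}

def decode(v):
    s = tuple(np.dot(H, v) % 2)                 # step (i): compute the syndrome s = vH^T
    e = np.array(coset_leaders[s])              # steps (ii)-(iii): look up the coset leader
    return (np.array(v) - e) % 2                # step (iv): c = v - e

print(decode([1, 1, 0, 1]))                     # -> [0 1 0 1], as in Example 3.19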
Having developed an efficient decoding methodology by means of syndrome decoding, let us
now find out how much advantage coding actually provides.
3.8 ERROR PROBABILITY AFTER CODING (PROBABILITY OF
ERROR CORRECTION)
Definition 3.21 The Probability of Error (or, the Word Error Rate) P_err for any
decoding scheme is the probability that the decoder output is a wrong codeword. It is
also called the Residual Error Rate.
Suppose there are M codewords (of length n) which are used with equal probability. Let the
decoding be done using a standard array. Let the number of coset leaders with weight i be
denoted by α_i. We assume that the channel is a BSC with symbol error probability p. A decoding
error occurs if the error vector e is not a coset leader. Therefore, the probability of correct
decoding will be
P_cor = Σ_{i=0}^{n} α_i p^i (1 - p)^(n-i) (3.14)
Hence, the probability of error will be
P_err = 1 - Σ_{i=0}^{n} α_i p^i (1 - p)^(n-i) (3.15)
Example 3.21 Consider the standard array in Example 3.18. The coset leaders are 0000, 1000,
0100 and 0010. Therefore α_0 = 1 (only one coset leader with weight equal to zero), α_1 = 3 (the
remaining three are of weight one) and all other α_i = 0.
Therefore,
P_err = 1 - [(1 - p)^4 + 3p(1 - p)^3]
Recall that this code has four codewords, and can be used to send 2 bits at a time. If we did not
perform coding, the probability of error of the 2-bit message being received incorrectly would be
P_err = 1 - P_cor = 1 - (1 - p)^2.
Note that for p = 0.01, the Word Error Rate (upon coding) is P_err = 0.0103, while for the uncoded
case P_err = 0.0199. So, coding has almost halved the word error rate. The comparison of P_err for
messages with and without coding is plotted in Fig. 3.3. It can be seen that coding outperforms the
uncoded case only for p < 0.5. Note that the improvement due to coding comes at the cost of
information transfer rate. In this example, the rate of information transfer has been cut down by
half as we are sending two parity bits for every two information bits.
Fig. 3.3 Comparison of P_err for Coded and Uncoded 2-Bit Messages (P_err versus the bit error probability p).
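The two word error rates quoted above can be checked numerically. The short Python sketch below (not from the text) evaluates P_err for the coded and uncoded cases and also shows the crossover at p = 0.5 visible in Fig. 3.3.

def p_err_coded(p):
    # 1 - [alpha_0 (1-p)^4 + alpha_1 p (1-p)^3] with alpha_0 = 1, alpha_1 = 3
    return 1 - ((1 - p)**4 + 3 * p * (1 - p)**3)

def p_err_uncoded(p):
    return 1 - (1 - p)**2

for p in (0.01, 0.1, 0.5, 0.6):
    print(p, round(p_err_coded(p), 4), round(p_err_uncoded(p), 4))

At p = 0.01 this prints roughly 0.0103 (coded) and 0.0199 (uncoded); at p = 0.5 the two values coincide, and beyond that the coded scheme is the worse of the two.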
Example 3.22 This example will help us visualize the power of coding. Consider a BSC with the
probability of symbol error p = 10^-7. Suppose 10-bit long words are being transmitted without
coding. Let the bit rate of the transmitter be 10^7 b/s, which implies that 10^6 words/s are being sent.
The probability that a word is received incorrectly is (writing C(n, i) for the binomial coefficient)
C(10,1)(1 - p)^9 p + C(10,2)(1 - p)^8 p^2 + C(10,3)(1 - p)^7 p^3 + ... ≈ C(10,1)(1 - p)^9 p ≈ 10^-6.
Therefore, in one second, 10^-6 x 10^6 = 1 word will be in error! The implication is that every
second a word will be in error and it will not be detected.
Next, let us add a parity bit to the uncoded words so as to make them 11 bits long. The parity
makes all the codewords of even parity and thus ensures that a single bit in error will be detected.
The only way that the coded word will be in error is if two or more bits get flipped, i.e., at least two
bits are in error. This can be computed as 1 - probability that less than two bits are in error.
Therefore, the probability of word error will be
1 - (1 - p)^11 - C(11,1)(1 - p)^10 p ≈ 1 - (1 - 11p) - 11(1 - 10p)p = 110p^2 = 11 x 10^-13.
The new word rate will be 10^7/11 words/s because now 11 bits constitute one word and the bit
rate is the same as before. Thus, in one second, (10^7/11) x (11 x 10^-13) = 10^-6 words will be in
error. This implies that, after coding, one word will be received incorrectly without detection every
10^6 seconds ≈ 11.5 days!
So just by increasing the word length from 10 bits (uncoded) to 11 bits (with coding), we have
been able to obtain a dramatic decrease in the Word Error Rate. For the second case, each time a
word is detected to be in error, we can request the transmitter to re-transmit the word.
This strategy for retransmission is called the Automatic Repeat Request (ARQ).
3.9 PERFECT CODES
Definition 3.22 For any vector u in GF(q)^n and any integer r ≥ 0, the sphere of
radius r and centre u, denoted by S(u, r), is the set {v ∈ GF(q)^n | d(u, v) ≤ r}.
This definition can be interpreted graphically, as shown in Fig. 3.4. Consider a code C with
minimum distance d*(C) ≥ 2t + 1. The spheres of radius t centred at the codewords {c1, c2, ..., cM}
of C will then be disjoint. Now consider the decoding problem. Any received vector can be
represented as a point in this space. If this point lies within a sphere, then by nearest neighbour
decoding it will be decoded as the centre of the sphere. If t or fewer errors occur, the received
word will definitely lie within the sphere of the codeword that was transmitted, and will be
correctly decoded. If, however, more than t errors occur, it will escape the sphere, thus resulting
in incorrect decoding.
Fig. 3.4 The concept of spheres in GF(q)^n.
The codewords of the code with d*(C) ≥ 2t + 1 are the centres of these non-overlapping spheres.
Theorem 3.8 A sphere of radius r (0 ≤ r ≤ n) contains exactly
C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, r)(q - 1)^r vectors. (3.16)
Proof Consider a vector u in GF(q)^n and another vector v which is at a distance m from
u. This implies that the vectors u and v differ at exactly m places. The total number of ways
in which m positions can be chosen from n positions is C(n, m). Now, each of these m places
can be replaced by (q - 1) possible symbols. This is because the total size of the alphabet is
q, out of which one is currently being used in that particular position in u. Hence, the
number of vectors at a distance exactly m from u is C(n, m)(q - 1)^m. Summing over m = 0, 1, ..., r,
the total number of vectors in the sphere of radius r is
C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, r)(q - 1)^r (3.17)
Example 3.23 Consider a binary code (i.e., q = 2). From Theorem 3.8, the number of vectors
at a distance 2 or less from any given codeword is C(n, 0) + C(n, 1) + C(n, 2).
Without loss of generality we can choose the fixed vector to be the all-zero vector; the vectors at a
distance 2 or less are then simply all the vectors of Hamming weight 0, 1 or 2.
Theorem 3.9 A q-ary (n, k) code with M codewords and minimum distance (2t + 1)
satisfies
M{C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t} ≤ q^n (3.18)
Proof Suppose C is a q-ary (n, k) code. Consider spheres of radius t centred on the M
codewords. Each sphere of radius t has
C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t
vectors (Theorem 3.8). Since none of the spheres intersect, the total number of vectors for
the M disjoint spheres is M{C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t}, which is upper
bounded by q^n, the total number of vectors of length n in GF(q)^n.
This bound is called the Hamming Bound or the Sphere Packing Bound and it holds
good for nonlinear codes as well. For binary codes, the Hamming Bound becomes
M{C(n, 0) + C(n, 1) + C(n, 2) + ... + C(n, t)} ≤ 2^n (3.19)
It should be noted here that the mere existence of a set of integers n, M and t satisfying
the Hamming Bound does not confirm the existence of a binary code with those parameters. For example, the set n = 5, M
= 5 and t = 1 satisfies the Hamming Bound. However, no binary code exists for this
specification.
Observe that for the case when M = q^k, the Hamming Bound may be alternatively
written as
C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t ≤ q^(n-k) (3.20)
Definition 3.23 A code that satisfies the Hamming Bound with equality, i.e., for which
M{C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t} = q^n,
is called a Perfect Code.
For a Perfect Code, there are equal-radius disjoint spheres centred at the codewords
which completely fill the space. Thus, a t-error correcting perfect code utilizes the
entire space in the most efficient manner.
Example 3.24 Consider the Binary Repetition Code
C = {00...0, 11...1}
of block length n, where n is odd. In this case M = 2 and t = (n - 1)/2. Upon substituting these
values in the left hand side of the inequality for the Hamming Bound we get
2{C(n, 0) + C(n, 1) + ... + C(n, (n - 1)/2)} = 2 · 2^(n-1) = 2^n.
Thus the repetition code is a Perfect Code. It is actually called a Trivial Perfect Code. In the next
chapter, we shall see some examples of Non-trivial Perfect Codes.
One of the ways to search for perfect codes is to obtain the integer solutions for the
parameters n, q, M and t in the equation for the Hamming Bound. Some of the solutions found by
exhaustive computer search are listed below.
S.No.   n     q     M       t
1       23    2     2^12    3
2       90    2     2^78    2
3       11    3     3^6     2
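Such a search is simple to program. The sketch below (not from the text, and essentially Computer Problem 3.16) looks for parameter sets that meet the Hamming Bound with equality; restricting M to powers of q (so that M = q^k) is an assumption made here to keep the output short.

from math import comb

def sphere_size(n, q, t):
    return sum(comb(n, i) * (q - 1)**i for i in range(t + 1))

for q in (2, 3):
    for n in range(3, 100):
        for t in range(1, (n - 1) // 2 + 1):
            vol = sphere_size(n, q, t)
            if q**n % vol == 0:
                M = q**n // vol
                k = 0
                while q**k < M:
                    k += 1
                if q**k == M and 0 < k < n:     # keep only M = q^k with 0 < k < n
                    print(f"n={n}, q={q}, M={q}^{k}, t={t}")

Among its output are the three parameter sets listed in the table above, together with the binary Hamming and repetition code parameters.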
3.10 HAMMING CODES
There are both binary and non-binary Hamming Codes. Here, we shall limit our discussion to
binary Hamming Codes. The binary Hamming Codes have the property that
(n, k) = (2^m - 1, 2^m - 1 - m) (3.22)
where m is any positive integer. For example, for m = 3 we have a (7, 4) Hamming Code. The
parity check matrix, H, of a Hamming Code is a very interesting matrix. Recall that the parity
check matrix of an (n, k) code has n - k rows and n columns. For the binary (n, k) Hamming
code, the n = 2^m - 1 columns consist of all possible binary vectors with n - k = m elements,
except the all-zero vector.
Example 3.25 The generator matrix for the binary (7, 4) Hamming Code is given by
G = [1 1 0 1 0 0 0]
    [0 1 1 0 1 0 0]
    [0 0 1 1 0 1 0]
    [0 0 0 1 1 0 1]
The corresponding parity check matrix is
H = [1 0 1 1 1 0 0]
    [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]
Observe that the columns of the parity check matrix consist of (100), (010), (101), (110), (111),
(011) and (001). These seven are all the possible non-zero binary vectors of length three. It is quite
easy to generate a systematic Hamming Code. The parity check matrix H can be arranged in the
systematic form as follows
H = [1 1 1 0 1 0 0]
    [0 1 1 1 0 1 0]  = [-P^T | I].
    [1 1 0 1 0 0 1]
Thus, the generator matrix in the systematic form for the binary Hamming code is
G = [I | P] = [1 0 0 0 : 1 0 1]
              [0 1 0 0 : 1 1 1]
              [0 0 1 0 : 1 1 0]
              [0 0 0 1 : 0 1 1]
From the above example, we observe that no two columns of H are linearly dependent
(otherwise they would be identical). However, for m > 1, it is possible to identify three columns
of H that would add up to zero. Thus, the minimum distance, d*, of an (n, k) Hamming Code is
equal to 3, which implies that it is a single-error correcting code. Hamming Codes are Perfect
Codes.
By adding an overall parity bit, an (n, k) Hamming Code can be modified to yield an (n + 1, k)
code with d* = 4. On the other hand, an (n, k) Hamming Code can be shortened to an (n - l, k
- l) code by removing l rows of its generator matrix G or, equivalently, by removing l columns
of its parity check matrix H. We can now give a more formal definition of Hamming Codes.
Information Theory, Coding and Cryptography
Definition 3.24 Let n = (q^m - 1)/(q - 1) for some integer m ≥ 2. A Hamming
Code over GF(q) is an (n, n - m) code for which the columns of the parity check matrix are pairwise linearly
independent (over GF(q)), i.e., the columns are a maximal set of pairwise linearly independent
vectors.
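The single-error correcting property is easy to demonstrate in a few lines of Python. The sketch below (not from the text) uses a systematic G and H pair for the (7, 4) binary Hamming Code; this particular pair is one of several equivalent choices consistent with Example 3.25.

import numpy as np

G = np.array([[1, 0, 0, 0, 1, 0, 1],            # assumed systematic generator matrix
              [0, 1, 0, 0, 1, 1, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1, 1]])
H = np.array([[1, 1, 1, 0, 1, 0, 0],            # corresponding parity check matrix
              [0, 1, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])

def encode(i):
    return np.dot(i, G) % 2

def correct(v):
    s = np.dot(H, v) % 2                        # syndrome of the received word
    if s.any():                                 # non-zero: flip the bit whose H column matches s
        for j in range(7):
            if np.array_equal(H[:, j], s):
                v = v.copy()
                v[j] ^= 1
                break
    return v

c = encode(np.array([1, 0, 1, 1]))
r = c.copy()
r[2] ^= 1                                        # introduce a single bit error
print(c, correct(r))                             # the single error is corrected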
3.11 OPTIMAL LINEAR CODES
Definition 3.25 For an (n, k, d*) Optimal Code, no (n - 1, k, d*), (n + 1, k + 1, d*) or
(n + 1, k, d* + 1) code exists.
Optimal Linear Codes give the best distance property under the constraint of the block length.
Most of the optimal codes have been found by long computer searches. It may be possible to
have more than one optimal code for a given set of parameters n, k and d*. For instance, there
exist two different binary (25, 5, 10) optimal codes.
For example, it can be shown that no binary (23, 12, 8), (25, 13, 8) or (25, 12, 9) code exists.
Thus the binary (24, 12, 8) code is an optimal code.
3.12 MAXIMUM DISTANCE SEPARABLE (MDS) CODES
In this section we consider the problem of finding as large a minimum distance as possible for
a given redundancy, r.
Theorem 3.10 An (n, n - r, d*) code satisfies d* ≤ r + 1.
Proof From the Singleton Bound we have d* ≤ n - k + 1.
Substituting k = n - r we get d* ≤ r + 1.
3.13 CONCLUDING REMARKS
The classic paper by Claude Elwood Shannon in the Bell System Technical Journal in 1948 gave
birth to two important fields: (i) Information Theory and (ii) Coding Theory. At that time,
Shannon was only 32 years old. According to Shannon's Channel Coding Theorem, "the error
rate of data transmitted over a band-limited noisy channel can be reduced to an arbitrarily small amount if
the information rate is less than the channel capacity". Shannon predicted the existence of good
channel codes but did not construct them. Since then the search for good codes has been on.
Shannon's seminal paper can be accessed from the site:
http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html
In 1950, R.W. Hamming introduced the first single-error correcting code, which is still used
today. The work on linear codes was extended by Golay (whose codes will be studied in the
following chapter). Golay also introduced the concept of Perfect Codes. Non-binary Hamming
Codes were developed by Golay and Cocke in the late 1950s. Lately, a lot of computer searches
have been used to find interesting codes. However, some of the best known codes are ones
discovered by sheer genius rather than exhaustive searches.
According to Shannon's theorem, if C(p) represents the capacity (see Chapter 1 for further
details) of a BSC with probability of bit error equal to p, then for arbitrarily low probability of
symbol error we must have the code rate R < C(p). Even though the channel capacity provides
an upper bound on the achievable code rate (R = k/n), evaluating a code exclusively against
channel capacity may be misleading. The block length of the code, which translates directly into
delay, is also an important parameter. Even if a code performs far from ideal, it is possible that
it is the best possible code for a given rate and length. It has been observed that as we increase
the block length of codes, the bounds on code rate are closer to channel capacity as opposed to
codes with smaller blocklengths. However, longer blocklengths imply longer delays in
decoding. This is because decoding of a codeword cannot begin until we have received the
entire codeword. The maximum delay allowable is limited by practical constraints. For
example, in mobile radio communications, packets of data are restricted to fewer than 200 bits.
In these cases, codewords with very large blocklengths cannot be used.
SUMMARY
• A Word is a sequence of symbols. A Code is a set of vectors called codewords.
• The Hamming Weight of a codeword (or any vector) is equal to the number of non-zero
elements in the codeword. The Hamming Weight of a codeword c is denoted by w(c).
• A Block Code consists of a set of fixed length codewords. The fixed length of these
codewords is called the Block Length and is typically denoted by n. A Block Coding
Scheme converts a block of k information symbols to n coded symbols. Such a code is
denoted by (n, k).
• The Code Rate of an (n, k) code is defined as the ratio (kin), and reflects the fraction of
the codeword that consists of the information symbols.
• The minimum distance of a code is the minimum Hamming Distance between any two
codewords. An (n, k) code with minimum distance d* is sometimes denoted by (n, k, d*).
The minimum weight of a code is the smallest weight of any non-zero codeword, and is
denoted by w*. For a Linear Code the minimum distance is equal to the minimum weight
of the code, i.e., d* = w*.
• A Linear Code has the following properties:
(i) The sum of two codewords belonging to the code is also a codeword belonging to the
code.
(ii) The all-zero codeword is always a codeword.
(iii) The minimum Hamming Distance between two codewords of a linear code is equal
to the minimum weight of any non-zero codeword, i.e., d* = w*.
• The generator matrix converts (encodes) a vector of length k to a vector of length n. Let
the input vector (uncoded symbols) be represented by i. The coded symbols will be given
by c = iG.
• Two q-ary codes are called equivalent if one can be obtained from the other by one or
both operations listed below:
(i) permutation of symbols appearing in a fixed position.
(ii) permutation of position of the code.
• An (n, k) Systematic Code is one in which the first k symbols of the codeword of block
length n are the information symbols themselves. A generator matrix of the form G =
[I | P] is called the systematic form or the standard form of the generator matrix, where I
is a k x k identity matrix and P is a k x (n - k) matrix.
• The Parity Check Matrix, H, for the given code satisfies cH^T = 0, where c is a valid
codeword. Since c = iG, therefore, iGH^T = 0. The Parity Check Matrix is not unique for
a given code.
• A Maximum Distance Code satisfies d* = n - k + 1.
• For a code to be able to correct up to t errors, we must have d* ≥ 2t + 1, where d* is the
minimum distance of the code.
• Let C be an (n, k) code over GF(q) and a be any vector of length n. Then the set a + C =
{a + x | x ∈ C} is called a coset (or translate) of C. a and b are said to be in the same coset
iff (a - b) ∈ C.
• Suppose H is a Parity Check Matrix of an (n, k) code. Then for any vector v ∈ GF(q)^n, the
vector s = vH^T is called the Syndrome of v. It is called a syndrome because it gives us the
symptoms of the error, thereby helping us to diagnose the error.
• A Perfect Code achieves the Hamming Bound with equality, i.e.,
M{C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t} = q^n.
• The binary Hamming Codes have the property that (n, k) = (2^m - 1, 2^m - 1 - m), where
m is any positive integer. Hamming Codes are Perfect Codes.
• For an (n, k, d*) Optimal Code, no (n - 1, k, d*), (n + 1, k + 1, d*) or (n + 1, k, d* + 1) code
exists.
• An (n, n - r, r + 1) code is called a Maximum Distance Separable (MDS) Code. An MDS
code is a linear code of redundancy r, whose minimum distance is equal to r + 1.
PROBLEMS
3.1 Show that C = {0000, 1100, 0011, 1111} is a linear code. What is its minimum distance?
3.2 Construct, if possible, binary (n, k, d) codes with the following parameters:
(i) (6, 1, 6)
(ii) (3, 3, 1)
(iii) (4, 3, 2)
3.3 Consider the following generator matrix over GF(2)
G= iセ@ セ@ セ@ セ@ セ}ᄋ@
lo 1 o 1 o
(i) Generate all possible codewords using this matrix.
(ii) Find the parity check matrix, H.
(iii) Find the generator matrix of an equivalent systematic code.
(iv) Construct the standard array for this code.
(v) What is the minimum distance of this code?
(vi) How many errors can this code detect?
(vii) Write down the set of error patterns this code can detect.
(viii) How many errors can this code correct?
(ix) What is the probability of symbol error if we use this encoding scheme? Compare it
with the uncoded probability of error.
(x) Is this a linear code?
3.4 For the code C= {00000, 10101, 01010, 11111} construct the generator matrix. Since this
G is not unique, suggest another generator matrix that can also generate this set of
codewords.
3.5 Show that if there is a binary (n, k, d) code with d even, then there exists a binary (n, k,
d) code in which all codewords have even weight.
3.6 Show that if C is a binary linear code, then the code obtained by adding an overall parity
check bit to C is also linear.
3.7 For each of the following sets S, list the code <S>.
(a) S= {0101, 1010, 1100}.
(b) S = {1000, 0100, 0010, 0001}.
(c) S = {11000, 01111, 11110, 01010}.
3.8 Consider the (23, 12, 7) binary code. Show that if it is used over a binary symmetric
channel (BSC) with probability of bit error p= 0.01, the word error will be approximately
0.00008.
3.9 Suppose C is a binary code with parity check matrix, H. Show that the extended code C1,
obtained from C by adding an overall parity bit, has the parity check matrix
H1 = [          0]
     [    H     0]
     [          0]
     [1  1 ... 1 1]
3.10 For a (5, 3) code over GF(4), the generator matrix is given by
G= {セ@ セ@ セ@ セ@ セ}@
0 0 1 1 3
(i) Find the parity check matrix.
(ii) How many errors can this code detect?
(iii) How many errors can this code correct?
(iv) How many erasures can this code correct?
(v) Is this a perfect code?
3.11 Let C be a binary perfect code of length n with minimum distance 7. Show that n = 7 or
n=23.
3.12 Let r_H denote the code rate of the binary Hamming code. Determine lim r_H as k → ∞.
3.13 Show that a (15, 8, 5) code does not exist.
COMPUTER PROBLEMS
3.14 Write a computer program to find the minimum distance of a Linear Block Code over
GF(2), given the generator matrix for the code.
3.15 Generalize the above program to find the minimum distance of any Linear Block Code
over GF(q).
3.16 Write a computer program to exhaustively search for all the perfect code parameters n, q,
M and t in the equation for the Hamming Bound. Search for 1 ≤ n ≤ 200, 2 ≤ q ≤ 11.
3.17 Write a computer program for a universal binary Hamming encoder with rate
(2^m - 1 - m)/(2^m - 1). The program should take as input the value of m and a bit-stream to be encoded. It
should then generate an encoded bit-stream. Develop a program for the decoder also.
Now, perform the following tasks:
(i) Write an error generator module that takes in a bit stream and outputs another bit-
stream after inverting every bit with probability p, i.e., the probability of a bit error is p.
(ii) For m = 3, pass the Hamming encoded bit-stream through the above-mentioned
module and then decode the received words using the decoder block.
(iii) Plot the residual error probability (the probability of error after decoding) as a
function of p. Note that if you are working in the range of BER = 10^-r, you must
transmit of the order of 10^(r+2) bits (why?).
(iv) Repeat your simulations for m = 5, 8 and 15. What happens as m → ∞?
Cyclic Codes
We arrive at truth, not by reason only, but also by the heart.
Blaise Pascal (1623-1662)
4.1 INTRODUCTION TO CYCLIC CODES
In the previous chapter, while dealing with Linear Block Codes, certain linearity constraints
were imposed on the structure of the block codes. These structural properties help us to search
for good linear block codes that are fast and easy to encode and decode. In this chapter, we shall
explore a subclass of linear block codes which has another constraint on the structure of the
codes. The additional constraint is that any cyclic shift of a codeword results in another valid
codeword. This condition allows very simple implementation of these cyclic codes by using
shift registers. Efficient circuit implementation is a selling feature of any error control code. We
shall also see that the theory of Galois Field can be used effectively to study, analyze and
discover new cyclic codes. The Galois Field representation of cyclic codes leads to low-
complexity encoding and decoding algorithms.
This chapter is organized as follows. In the first two sections, we take a mathematical detour
to polynomials. We will review some old concepts and learn a few new ones. Then, we will use
these mathematical tools to construct and analyze cyclic codes. The matrix description of cyclic
codes will be introduced next. We will then discuss some popular cyclic codes. The chapter will
conclude with a discussion on circuit implementation of cyclic codes.
Definition 4.1 A code C is cyclic if
(i) C is a linear code, and,
(ii) any cyclic shift of a codeword is also a codeword, i.e., if the codeword a0a1...an-1 is
in C then an-1a0...an-2 is also in C.
Example 4.1 The binary code C1 = {0000, 0101, 1010, 1111} is a cyclic code. However, C2 =
{0000, 0110, 1001, 1111} is not a cyclic code, but is equivalent to the first code. Interchanging the
third and the fourth components of C2 yields C1.
4.2 POLYNOMIALS
Definition 4.2 A polynomial is a mathematical expression
f(x) = f0 + f1x + ... + fmx^m, (4.1)
where the symbol x is called the indeterminate and the coefficients f0, f1, ..., fm are the
elements of GF(q). The coefficient fm is called the leading coefficient. If fm ≠ 0, then m
is called the degree of the polynomial, and is denoted by deg f(x).
Definition 4.3 A polynomial is called monic if its leading coefficient is unity.
Example 4.2 f(x) = 3 + 7x + x^2 + 5x^4 + x^6 is a monic polynomial over GF(8). The degree of this
polynomial is 6.
Polynomials play an important role in the study of cyclic codes, the subject of this chapter. Let
F[x] be the set of polynomials in x with coefficients in GF(q). Different polynomials in F[x] can
be added, subtracted and multiplied in the usual manner. F[x] is an example of an algebraic
structure called a ring. A ring satisfies the first seven of the eight axioms that define a field (see
Sec. 3.2 of Chapter 3). F[x] is not a field because polynomials of degree greater than zero do not
have a multiplicative inverse. It can be seen that if f(x), g(x) ∈ F[x], then deg(f(x)g(x)) = deg f(x)
+ deg g(x). However, deg(f(x) + g(x)) is not necessarily max{deg f(x), deg g(x)}.
For example, consider the two polynomials f(x) and g(x) over GF(2) such that f(x) = 1 + x^2 and
g(x) = 1 + x + x^2. Then deg(f(x) + g(x)) = deg(x) = 1. This is because, in GF(2), 1 + 1 = 0, and
x^2 + x^2 = (1 + 1)x^2 = 0.
Example 4.3 Consider the polynomials f(x) = 2 + x + x^2 + 2x^4 and g(x) = 1 + 2x^2 + 2x^4 + x^5 over
GF(3). Then,
f(x) + g(x) = (2 + 1) + x + (1 + 2)x^2 + (2 + 2)x^4 + x^5 = x + x^4 + x^5.
f(x) · g(x) = (2 + x + x^2 + 2x^4)(1 + 2x^2 + 2x^4 + x^5)
= 2 + x + (1 + 2·2)x^2 + 2x^3 + (2 + 2 + 2·2)x^4 + (2 + 2)x^5
+ (1 + 2 + 2·2)x^6 + x^7 + 2·2x^8 + 2x^9
= 2 + x + 2x^2 + 2x^3 + 2x^4 + x^5 + x^6 + x^7 + x^8 + 2x^9
Note that the addition and multiplication of the coefficients have been carried out in GF(3).
Example 4.4 Consider the polynomial f(x) = 1 + x over GF(2).
(f(x))^2 = 1 + (1 + 1)x + x^2 = 1 + x^2
Again consider f(x) = 1 + x over GF(3).
(f(x))^2 = 1 + (1 + 1)x + x^2 = 1 + 2x + x^2
4.3 THE DIVISION ALGORITHM FOR POLYNOMIALS
The Division Algorithm states that, for every pair of polynomials a(x) and b(x) ≠ 0 in F[x], there
exists a unique pair of polynomials q(x), the quotient, and r(x), the remainder, such that a(x) =
q(x)b(x) + r(x), where deg r(x) < deg b(x). The remainder is sometimes also called the residue,
and is denoted by R_b(x)[a(x)] = r(x).
Two important properties of residues are
(i) R_f(x)[a(x) + b(x)] = R_f(x)[a(x)] + R_f(x)[b(x)], and (4.2)
(ii) R_f(x)[a(x) · b(x)] = R_f(x){R_f(x)[a(x)] · R_f(x)[b(x)]} (4.3)
where a(x), b(x) and f(x) are polynomials over GF(q).
Example 4.5 Let the polynomials a(x) = x^3 + x + 1 and b(x) = x^2 + x + 1 be defined over GF(2).
We can carry out the long division of a(x) by b(x) as follows:

                     x + 1             ← q(x)
x^2 + x + 1 )  x^3       + x + 1       ← a(x)
               x^3 + x^2 + x
               ----------------
                     x^2     + 1
                     x^2 + x + 1
                     -----------
                           x           ← r(x)

Thus, a(x) = (x + 1)b(x) + x. Hence, we may write a(x) = q(x)b(x) + r(x), where q(x) = x + 1 and
r(x) = x. Note that deg r(x) < deg b(x).
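The long division above is mechanical enough to code directly. The following Python sketch (not from the text) implements the division algorithm for polynomials over GF(q), q prime, with coefficient lists stored lowest degree first.

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b, q):
    # return (quotient, remainder) with a(x) = q(x) b(x) + r(x), deg r < deg b
    inv_lead = pow(b[-1], q - 2, q)             # inverse of the leading coefficient of b(x)
    quot = [0] * max(len(a) - len(b) + 1, 1)
    rem = list(a)
    while len(trim(rem)) >= len(b) and any(rem):
        shift = len(rem) - len(b)
        factor = (rem[-1] * inv_lead) % q
        quot[shift] = factor
        for i, bc in enumerate(b):
            rem[shift + i] = (rem[shift + i] - factor * bc) % q
        rem = trim(rem)
    return trim(quot), rem

# Example 4.5: a(x) = x^3 + x + 1, b(x) = x^2 + x + 1 over GF(2)
print(poly_divmod([1, 1, 0, 1], [1, 1, 1], 2))  # -> ([1, 1], [0, 1]), i.e. q(x) = x + 1, r(x) = x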
Definition 4.4 Let f(x) be a fixed polynomial in F[x]. Two polynomials, g(x) and
h(x) in F[x], are said to be congruent modulo f(x), depicted by g(x) ≡ h(x) (mod f(x)),
if g(x) - h(x) is divisible by f(x).
Example 4.6 Let the polynomials g(x) = x^9 + x^2 + 1, h(x) = x^5 + x^2 + 1 and f(x) = x^4 + 1 be
defined over GF(2). Since g(x) - h(x) = x^5 f(x), we can write g(x) ≡ h(x) (mod f(x)).
Next, let us denote F[x]/f(x) as the set of polynomials in F[x] of degree less than deg f(x), with
addition and multiplication carried out modulo f(x) as follows:
(i) If a(x) and b(x) belong to F[x]/f(x), then the sum a(x) + b(x) in F[x]/f(x) is the same as in
F[x]. This is because deg a(x) < deg f(x), deg b(x) < deg f(x) and therefore deg(a(x) + b(x))
< deg f(x).
(ii) The product a(x)b(x) is the unique polynomial of degree less than deg f(x) to which
a(x)b(x) (multiplication being carried out in F[x]) is congruent modulo f(x).
F[x]/f(x) is called the ring of polynomials (over F[x]) modulo f(x). As mentioned earlier, a
ring satisfies the first seven of the eight axioms that define a field. A ring in which every
element also has a multiplicative inverse forms a field.
Example 4.7 Consider the product (x + 1)^2 in F[x]/(x^2 + x + 1) defined over GF(2). (x + 1)^2 = x^2
+ x + x + 1 = x^2 + 1 = x (mod x^2 + x + 1).
The product (x + 1)^2 in F[x]/(x^2 + 1) defined over GF(2) can be expressed as (x + 1)^2 = x^2 + x + x
+ 1 = x^2 + 1 = 0 (mod x^2 + 1).
The product (x + 1)^2 in F[x]/(x^2 + x + 1) defined over GF(3) can be expressed as (x + 1)^2 = x^2 + x
+ x + 1 = x^2 + 2x + 1 = x (mod x^2 + x + 1).
If f(x) has degree n, then the ring F[x]/f(x) over GF(q) consists of polynomials of degree ≤ n - 1.
The size of the ring will be q^n because each of the n coefficients of the polynomials can be one of the
q elements in GF(q).
Example 4.8 Consider the ring F[x]/(x^2 + x + 1) defined over GF(2). This ring will have
polynomials with highest degree 1. This ring contains q^n = 2^2 = 4 elements (each element is a
polynomial). The elements of the ring will be 0, 1, x and x + 1. The addition and multiplication
tables can be written as follows.
+      0      1      x      x+1        ·      0      1      x      x+1
0      0      1      x      x+1        0      0      0      0      0
1      1      0      x+1    x          1      0      1      x      x+1
x      x      x+1    0      1          x      0      x      x+1    1
x+1    x+1    x      1      0          x+1    0      x+1    1      x
Next, consider F[x]/(x^2 + 1) defined over GF(2). The elements of the ring will be 0, 1, x and x + 1.
The addition and multiplication tables can be written as follows.
+      0      1      x      x+1        ·      0      1      x      x+1
0      0      1      x      x+1        0      0      0      0      0
1      1      0      x+1    x          1      0      1      x      x+1
x      x      x+1    0      1          x      0      x      1      x+1
x+1    x+1    x      1      0          x+1    0      x+1    x+1    0
It is interesting to note that F[x]/(x^2 + x + 1) is actually a field, as the multiplicative inverse of every
non-zero element also exists. On the other hand, F[x]/(x^2 + 1) is not a field because the
multiplicative inverse of the element x + 1 does not exist.
It is worthwhile exploring the properties of f(x) which make F[x]/f(x) a field. As we shall
shortly find out, the polynomial f(x) must be irreducible (non-factorizable).
Definition 4.5 A polynomial f(x) in F[x] is said to be reducible if f(x) = a(x)b(x),
where a(x), b(x) are elements of F[x] and deg a(x) and deg b(x) are both smaller than
deg f(x). If f(x) is not reducible, it is called irreducible. A monic irreducible
polynomial of degree at least one is called a prime polynomial.
It is helpful to compare a reducible polynomial with a positive integer that can be factorized
into a product of prime numbers. Any monic polynomial in F[x] can be factorized uniquely into
a product of irreducible monic polynomials (prime polynomials). One way to verify a prime
polynomial is by trial and error, testing for all possible factorizations. This would require a
computer search. Prime polynomials of every degree exist over every Galois Field.
Theorem 4.1
(i) A polynomial f(x) has a linear factor (x - a) if and only if f(a) = 0, where a is a field
element.
(ii) A polynomial f(x) in F[x] of degree 2 or 3 over GF(q) is irreducible if and only if f(a)
≠ 0 for all a in GF(q).
(iii) Over any field, x^n - 1 = (x - 1)(x^(n-1) + x^(n-2) + ... + x + 1). The second factor may be
further reducible.
Proof
(i) If f(x) = (x - a)g(x), then obviously f(a) = 0. On the other hand, if f(a) = 0, by the
division algorithm, f(x) = q(x)(x - a) + r(x), where deg r(x) < deg(x - a) = 1. This
implies that r(x) is a constant. But, since f(a) = 0, r(x) must be zero, and therefore,
f(x) = q(x)(x - a).
(ii) A polynomial of degree 2 or 3 over GF(q) will be reducible if, and only if, it has at
least one linear factor. The result (ii) then directly follows from (i). This result does
not necessarily hold for polynomials of degree more than 3. This is because it might
be possible to factorize a polynomial of degree 4 or higher into a product of
polynomials none of which are linear, i.e., of the type (x - a).
(iii) From (i), (x - 1) is a factor of (x^n - 1). By carrying out long division of (x^n - 1) by (x - 1)
we obtain (x^(n-1) + x^(n-2) + ... + x + 1).
Example 4.9 Consider f(x) = x^3 - 1 over GF(2). Using (iii) of Theorem 4.1 we can write x^3 - 1 =
(x - 1)(x^2 + x + 1). This factorization is true over any field. Now, let us try to factorize the second
term, p(x) = (x^2 + x + 1).
p(0) = 0 + 0 + 1 = 1, over GF(2),
p(1) = 1 + 1 + 1 = 1, over GF(2).
Therefore, p(x) cannot be factorized further (from Theorem 4.1 (ii)).
Thus, over GF(2), x^3 - 1 = (x - 1)(x^2 + x + 1).
Next, consider f(x) = x^3 - 1 over GF(3).
x^3 - 1 = (x - 1)(x^2 + x + 1).
Again, let p(x) = (x^2 + x + 1).
p(0) = 0 + 0 + 1 = 1, over GF(3),
p(1) = 1 + 1 + 1 = 0, over GF(3),
p(2) = 2·2 + 2 + 1 = 1 + 2 + 1 = 1, over GF(3).
Since p(1) = 0, from (i) we have (x - 1) as a factor of p(x).
Thus, over GF(3),
x^3 - 1 = (x - 1)(x - 1)(x - 1).
Theorem 4.2 The ring F[x]/f(x) is a field if, and only if, f(x) is a prime polynomial in F[x].
Proof To prove that a ring is a field, we must show that every non-zero element of the
ring has a multiplicative inverse. Let s(x) be a non-zero element of the ring. We have deg
s(x) < deg f(x), because s(x) is contained in the ring F[x]/f(x). It can be shown that the
Greatest Common Divisor (GCD) of two polynomials f(x) and s(x) can be expressed as
GCD(f(x), s(x)) = a(x)f(x) + b(x)s(x),
where a(x) and b(x) are polynomials over GF(q). Since f(x) is irreducible in F[x], we have
GCD(f(x), s(x)) = 1 = a(x)f(x) + b(x)s(x).
Now, 1 = R_f(x)[1] = R_f(x)[a(x)f(x) + b(x)s(x)]
= R_f(x)[a(x)f(x)] + R_f(x)[b(x)s(x)] (property (i) of residues)
= 0 + R_f(x)[b(x)s(x)]
= R_f(x){R_f(x)[b(x)] · R_f(x)[s(x)]} (property (ii) of residues)
= R_f(x){R_f(x)[b(x)] · s(x)}
Hence, R_f(x)[b(x)] is the multiplicative inverse of s(x).
Next, let us prove the only if part of the theorem. Let us suppose f(x) has a degree of at
least 2, and is not a prime polynomial (a polynomial of degree one is always irreducible).
Therefore, we can write
f(x) = r(x)s(x)
for some polynomials r(x) and s(x) with degrees at least one. If the ring F[x]/f(x) is
indeed a field, then a multiplicative inverse of r(x), r^(-1)(x), exists, since all polynomials in
the field must have their corresponding multiplicative inverses. Hence,
s(x) = R_f(x){s(x)} = R_f(x){r^(-1)(x)r(x)s(x)} = R_f(x){r^(-1)(x)f(x)} = 0.
However, we had assumed s(x) ≠ 0. Thus, there is a contradiction, implying that the ring
is not a field.
Note that a prime polynomial is both monic and irreducible. In the above theorem it is
sufficient to have f(x) irreducible in order to obtain a field. The theorem could as well have been
stated as: "The ring F[x]/f(x) is a field if and only if f(x) is irreducible in F[x]".
So, now we have an elegant mechanism of generating Galois Fields! If we can identify a
prime polynomial of degree n over GF(q), we can construct a Galois Field with q^n elements.
Such a field will have polynomials as the elements of the field. These polynomials will be
defined over GF(q) and consist of all polynomials of degree less than n. It can be seen that there
will be q^n such polynomials, which form the elements of the Extension Field.
Example 4.10 Consider the polynomial p(x) = x^3 + x + 1 over GF(2). Since p(0) ≠ 0 and p(1) ≠
0, the polynomial is irreducible over GF(2). Since it is also monic, p(x) is a prime polynomial. Here
we have n = 3, so we can use p(x) to construct a field with 2^3 = 8 elements, i.e., GF(8). The
elements of this field will be 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1, which are all possible
polynomials of degree less than n = 3. It is easy to construct the addition and multiplication tables
for this field (exercise).
Having developed the necessary mathematical tools, we now resume our study of cyclic
codes. We now fix f(x) = x^n - 1 for the remainder of the chapter. We also denote F[x]/f(x) by Rn.
Before we proceed, we make the following observations:
(i) x^n ≡ 1 (mod x^n - 1). Hence, any polynomial, modulo x^n - 1, can be reduced simply by
replacing x^n by 1, x^(n+1) by x and so on.
(ii) A codeword can uniquely be represented by a polynomial. A codeword consists of a
sequence of elements. We can use a polynomial to represent the locations and values of
all the elements in the codeword. For example, the codeword c0c1c2...cn-1 can be represented
by the polynomial c(x) = c0 + c1x + c2x^2 + ... + cn-1x^(n-1). As another example, the codeword over
GF(8), c = 207735, can be represented by the polynomial c(x) = 2 + 7x^2 + 7x^3 + 3x^4 + 5x^5.
(iii) Multiplying any polynomial by x corresponds to a single cyclic right-shift of the
codeword elements. More explicitly, in Rn, by multiplying c(x) by x we get x·c(x) = c0x +
c1x^2 + ... + cn-1x^n = cn-1 + c0x + c1x^2 + ... + cn-2x^(n-1).
Theorem 4.3 A code C in Rn is a cyclic code if, and only if, C satisfies the following
conditions:
(i) a(x), b(x) ∈ C ⟹ a(x) + b(x) ∈ C (4.4)
(ii) a(x) ∈ C and r(x) ∈ Rn ⟹ a(x)r(x) ∈ C (4.5)
Proof
(i) Suppose C is a cyclic code in Rn. Since cyclic codes are a subset of linear block codes,
the first condition holds.
(ii) Let r(x) = r0 + r1x + r2x^2 + ... + rn-1x^(n-1). Multiplication by x corresponds to a cyclic
right-shift. But, by definition, the cyclic shift of a cyclic codeword is also a valid
codeword. That is,
x·a(x) ∈ C, x·(xa(x)) ∈ C,
and so on. Hence
r(x)a(x) = r0a(x) + r1xa(x) + r2x^2a(x) + ... + rn-1x^(n-1)a(x)
is also in C since each summand is also in C.
Next, we prove the only if part of the theorem. Suppose (i) and (ii) hold. Take r(x) to be
a scalar. Then (i) implies that C is linear. Take r(x) = x in (ii), which shows that any cyclic
shift also leads to a codeword. Hence (i) and (ii) imply that C is a cyclic code.
In the next section, we shall use the mathematical tools developed so far to construct
cyclic codes.
4.4 A METHOD FOR GENERATING CYCLIC CODES
The following steps can be used to generate a cyclic code:
(i) Take a polynomial f(x) in Rn.
(ii) Obtain a set of polynomials by multiplying f(x) by all possible polynomials in Rn.
(iii) The set of polynomials obtained above corresponds to the set of codewords belonging to
a cyclic code. The blocklength of the code is n.
Example 4.11 Consider the polynomial f(x) = 1 + x^2 in R3 defined over GF(2). In general a
polynomial in R3 (= F[x]/(x^3 - 1)) can be represented as r(x) = r0 + r1x + r2x^2, where the
coefficients can take the values 0 or 1 (since defined over GF(2)).
Thus, there can be a total of 2 x 2 x 2 = 8 polynomials in R3 defined over GF(2), which are 0, 1,
x, x^2, 1 + x, 1 + x^2, x + x^2, 1 + x + x^2. To generate the cyclic code, we multiply f(x) with these 8
possible elements of R3 and then reduce the results modulo (x^3 - 1):
(1 + x^2)·0 = 0, (1 + x^2)·1 = (1 + x^2), (1 + x^2)·x = 1 + x, (1 + x^2)·x^2 = x + x^2,
(1 + x^2)·(1 + x) = x + x^2, (1 + x^2)·(1 + x^2) = 1 + x, (1 + x^2)·(x + x^2) = (1 + x^2),
(1 + x^2)·(1 + x + x^2) = 0.
Thus there are only four distinct codewords: {0, 1 + x, 1 + x^2, x + x^2} which correspond to
{000, 110, 101, 011}.
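The sketch below (not from the text) reproduces this example in Python: it multiplies f(x) = 1 + x^2 by every polynomial in R3 over GF(2), reducing modulo x^3 - 1 by folding x^n back to 1.

from itertools import product

def mult_mod_xn_minus_1(f, r, n, q):
    # multiply two coefficient lists (lowest degree first) and reduce modulo x^n - 1
    out = [0] * n
    for i, fi in enumerate(f):
        for j, rj in enumerate(r):
            out[(i + j) % n] = (out[(i + j) % n] + fi * rj) % q
    return tuple(out)

n, q = 3, 2
f = [1, 0, 1]                                   # f(x) = 1 + x^2
codewords = {mult_mod_xn_minus_1(f, r, n, q) for r in product(range(q), repeat=n)}
print(sorted(codewords))                        # the four codewords 000, 110, 101, 011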
From the above example it appears that we can have some sort of a Generator Polynomial
which can be used to construct the cyclic code.
Theorem 4.4 Let C be an (n, k) non-zero cyclic code in Rn. Then,
(i) there exists a unique monic polynomial g(x) of the smallest degree in C,
(ii) the cyclic code C consists of all multiples of the generator polynomial g(x) by
polynomials of degree k - 1 or less,
(iii) g(x) is a factor of x^n - 1.
Proof
(i) Suppose both g(x) and h(x) are monic polynomials in C of smallest degree. Then g(x)
- h(x) is also in C, and has a smaller degree. If g(x) ≠ h(x), then a suitable scalar
multiple of g(x) - h(x) is monic, is in C, and is of smaller degree than g(x). This
gives a contradiction.
(ii) Let a(x) ∈ C. Then, by the division algorithm, a(x) = q(x)g(x) + r(x), where deg r(x) < deg
g(x). But r(x) = a(x) - q(x)g(x) ∈ C because both terms on the right hand side of the
equation are codewords. However, the degree of g(x) must be the minimum among
all codewords. This can only be possible if r(x) = 0 and a(x) = q(x)g(x). Thus, a
codeword is obtained by multiplying the generator polynomial g(x) with the
polynomial q(x). For a code defined over GF(q), there are q^k distinct codewords
possible. These codewords correspond to multiplying g(x) with the q^k distinct
polynomials, q(x), where deg q(x) ≤ (k - 1).
(iii) By the division algorithm, x^n - 1 = q(x)g(x) + r(x), where deg r(x) < deg g(x). Or, r(x) = {(x^n
- 1) - q(x)g(x)} modulo (x^n - 1) = -q(x)g(x). But -q(x)g(x) ∈ C because we are
multiplying the generator polynomial by another polynomial -q(x). Thus, we have a
codeword r(x) whose degree is less than that of g(x). This violates the minimality of
the degree of g(x), unless r(x) = 0, which implies x^n - 1 = q(x)g(x), i.e., g(x) is a factor
of x^n - 1.
The last part of the theorem gives us the recipe to obtain the generator polynomial for a
cyclic code. All we have to do is to factorize x^n - 1 into irreducible, monic polynomials. We can
also find all the possible cyclic codes of blocklength n simply by factorizing x^n - 1.
Note 1: A cyclic code C may contain polynomials other than the generator polynomial which
also generate C. But the polynomial with the minimum degree is called the generator
polynomial.
Note 2: The degree of g(x) is n - k (this will be shown later).
Example 4.12 To find all the binary cyclic codes of blocklength 3, we first factorize x^3 - 1. Note
that for GF(2), 1 = -1, since 1 + 1 = 0. Hence,
x^3 - 1 = x^3 + 1 = (x + 1)(x^2 + x + 1)
Thus, we can make the following table.
Generator Polynomial    Code (polynomial)                 Code (binary)
1                       {R3}                              {000, 001, 010, 011, 100, 101, 110, 111}
(x + 1)                 {0, x + 1, x^2 + x, x^2 + 1}      {000, 011, 110, 101}
(x^2 + x + 1)           {0, x^2 + x + 1}                  {000, 111}
(x^3 + 1)               {0}                               {000}
A simple encoding rule to generate the codewords from the generator polynomial is
c(x) = i(x)g(x), (4.6)
where i(x) is the information polynomial, c(x) is the codeword polynomial and g(x) is the
generator polynomial. We have seen, already, that there is a one to one correspondence
between a word (vector) and a polynomial. The error vector can also be represented as the error
polynomial, e(x). Thus, the received word at the receiver, after passing through a noisy channel,
can be expressed as
v(x) = c(x) + e(x). (4.7)
We define the Syndrome Polynomial, s(x), as the remainder of v(x) under division by g(x),
i.e.,
s(x) = R_g(x)[v(x)] = R_g(x)[c(x) + e(x)] = R_g(x)[c(x)] + R_g(x)[e(x)] = R_g(x)[e(x)], (4.8)
because R_g(x)[c(x)] = 0.
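A minimal sketch (not from the text) of this encoding rule and of the syndrome computation is given below; coefficient lists are lowest degree first and g(x) is assumed monic, as generator polynomials are.

def poly_mult(a, b, q):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out

def poly_mod(v, g, q):
    # remainder of v(x) on division by the monic polynomial g(x), i.e. R_g(x)[v(x)]
    r = list(v)
    for shift in range(len(r) - len(g), -1, -1):
        factor = r[shift + len(g) - 1]
        if factor:
            for i, gc in enumerate(g):
                r[shift + i] = (r[shift + i] - factor * gc) % q
    return r[:len(g) - 1]

g = [1, 1, 0, 1]                                # g(x) = 1 + x + x^3, a factor of x^7 - 1 over GF(2)
c = poly_mult([1, 0, 1, 1], g, 2)               # encode i(x) = 1 + x^2 + x^3 via c(x) = i(x) g(x)
print(poly_mod(c, g, 2))                        # [0, 0, 0]: a valid codeword has zero syndrome
v = list(c)
v[2] ^= 1                                        # single error in the x^2 position
print(poly_mod(v, g, 2))                         # [0, 0, 1]: the non-zero syndrome flags the error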
Example 4.13 Consider the generator polynomial g(x) = x^2 + 1 for ternary cyclic codes (i.e., over
GF(3)) of blocklength n = 4. Here we are dealing with cyclic codes, therefore, the highest power of
g(x) is n - k. Since n = 4, k must be 2. So, we are going to construct a (4, 2) cyclic ternary code.
There will be a total of q^k = 3^2 = 9 codewords. The information polynomials and the corresponding
codeword polynomials are listed below.
i     i(x)      c(x) = i(x)g(x)            c
00    0         0                          0000
01    1         x^2 + 1                    0101
02    2         2x^2 + 2                   0202
10    x         x^3 + x                    1010
11    x + 1     x^3 + x^2 + x + 1          1111
12    x + 2     x^3 + 2x^2 + x + 2         1212
20    2x        2x^3 + 2x                  2020
21    2x + 1    2x^3 + x^2 + 2x + 1        2121
22    2x + 2    2x^3 + 2x^2 + 2x + 2       2222
It can be seen that the cyclic shift of any codeword results in another valid codeword. By
observing the codewords we find that the minimum distance of this code is 2 (there are four non-
zero codewords with the minimum Hamming weight = 2). Therefore, this code is capable of
detecting one error and correcting zero errors.
Observing the fact that the codeword polynomial is divisible by the generator polynomial, we
can detect more errors than suggested by the minimum distance of the code. Since we
are dealing with cyclic codes, which are a subset of linear block codes, we can use the all-zero
codeword to illustrate this point without loss of generality.
Assume that g(x) = x^2 + 1 and the transmitted codeword is the all-zero codeword.
Therefore, the received word is the error polynomial, i.e.,
v(x) = c(x) + e(x) = e(x). (4.9)
At the receiver end, an error will be detected if g(x) fails to divide the received word v(x) = e(x).
Now, g(x) has only two terms. So if e(x) has an odd number of terms, i.e., if the number of errors
is odd, it will be caught by the decoder! For example, if we try to divide e(x) = x^2 + x + 1 by g(x),
we will always get a remainder. In the example of the (4, 2) cyclic code with g(x) = x^2 + 1, the d*
= 2, suggesting that it can detect d* - 1 = 1 error. However, by this simple observation, we find that
it can detect any odd number of errors ≤ n. In this case, it can detect 1 error or 3 errors, but not 2
errors.
4.5 MATRIX DESCRIPTION OF CYCLIC CODES
Theorem 4.5 Suppose C is a cyclic code with generator polynomial g(x) = g0 + g1x + ... + grx^r
of degree r. Then the generator matrix of C is given by

    [g0  g1  ...  gr  0   0   ...  0 ]
    [0   g0  g1  ...  gr  0   ...  0 ]
G = [0   0   g0  g1  ...  gr  ...  0 ]      k = (n - r) rows      (4.10)
    [ ...                             ]
    [0   ...  0   0   g0  g1  ...  gr ]
              (n columns)

Proof The (n - r) rows of the matrix are obviously linearly independent because of the
echelon form of the matrix. These (n - r) rows represent the codewords g(x), xg(x),
x^2g(x), ..., x^(n-r-1)g(x). Thus, the matrix can generate these codewords. Now, to prove that
the matrix can generate all the possible codewords, we must show that every
possible codeword can be represented as a linear combination of the codewords
g(x), xg(x), x^2g(x), ..., x^(n-r-1)g(x).
We know that if c(x) is a codeword, it can be represented as
c(x) = q(x)g(x)
for some polynomial q(x). Since the degree of c(x) < n (because the length of the codeword
is n), it follows that the degree of q(x) < n - r. Hence,
q(x)g(x) = (q0 + q1x + ... + q_(n-r-1)x^(n-r-1))g(x) = q0g(x) + q1xg(x) + ... + q_(n-r-1)x^(n-r-1)g(x)
Thus, any codeword can be represented as a linear combination of g(x), xg(x), x^2g(x), ...,
x^(n-r-1)g(x). This proves that the matrix G is indeed the generator matrix.
We also know that the dimensions of the generator matrix are k x n. Therefore, r = n - k, i.e.,
the degree of g(x) is n - k.
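Equation (4.10) translates directly into code. The sketch below (not from the text) builds the k x n generator matrix by writing the coefficients of g(x) in the first row and cyclically shifting them one position to the right in each subsequent row.

def cyclic_generator_matrix(g, n):
    r = len(g) - 1                               # degree of g(x)
    k = n - r
    return [[0] * i + list(g) + [0] * (n - r - 1 - i) for i in range(k)]

# g(x) = 1 + x + x^3 with n = 7 gives the generator matrix of the binary (7, 4) cyclic code
for row in cyclic_generator_matrix([1, 1, 0, 1], 7):
    print(row)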
Example 4.14 To find the generator matrices of all ternary cyclic codes (i.e., codes over GF(3)) of
blocklength n = 4, we first factorize x^4 - 1.
x^4 - 1 = (x - 1)(x^3 + x^2 + x + 1) = (x - 1)(x + 1)(x^2 + 1)
We know that all the factors of x^4 - 1 are capable of generating a cyclic code. The resultant
generator matrices are listed in Table 4.1. Note that -1 = 2 for GF(3).
Table 4.1 Cyclic codes of blocklength n = 4 over GF(3)
g(x)                                    (n, k)    dmin    G
1                                       (4, 4)    1       [I4]
(x - 1)                                 (4, 3)    2       [-1  1  0  0]
                                                          [ 0 -1  1  0]
                                                          [ 0  0 -1  1]
(x + 1)                                 (4, 3)    2       [1  1  0  0]
                                                          [0  1  1  0]
                                                          [0  0  1  1]
(x^2 + 1)                               (4, 2)    2       [1  0  1  0]
                                                          [0  1  0  1]
(x^2 - 1)                               (4, 2)    2       [-1  0  1  0]
                                                          [ 0 -1  0  1]
(x - 1)(x^2 + 1) = x^3 - x^2 + x - 1    (4, 1)    4       [-1  1 -1  1]
(x + 1)(x^2 + 1) = x^3 + x^2 + x + 1    (4, 1)    4       [1  1  1  1]
(x^4 - 1)                               (4, 0)    0       [0  0  0  0]
It can be seen from the table that none of the (4, 2) ternary cyclic codes are single error correcting
codes (since their minimum distance is less than 3). An interesting observation is that we do not
have any ternary (4, 2) Hamming Code that is cyclic! Remember, Hamming Codes are single error
correcting codes with n = (q^r - 1)/(q - 1) and k = (q^r - 1)/(q - 1) - r, where r is an integer ≥ 2.
Therefore, a (4, 2) ternary Hamming code exists, but it is not a cyclic code.
The next step is to explore if we can find a parity check polynomial corresponding to our
generator polynomial, g(x). We already know that g(x) is a factor of x^n - 1. Hence we can write
x^n - 1 = h(x)g(x), (4.11)
where h(x) is some polynomial. The following can be concluded by simply observing the above
equation:
(i) Since g(x) is monic, h(x) has to be monic because the left hand side of the equation is also
monic (the leading coefficient is unity).
(ii) Since the degree of g(x) is n - k, the degree of h(x) must be k.
Suppose C is a cyclic code in Rn with the generator polynomial g(x). Recall that we are denoting
F[x]/f(x) by Rn, where f(x) = x^n - 1. In Rn, h(x)g(x) = x^n - 1 = 0. Then, any codeword belonging to
C can be written as c(x) = a(x)g(x), where the polynomial a(x) ∈ Rn. Therefore, in Rn,
c(x)h(x) = a(x)g(x)h(x) = a(x)·0 = 0.
Thus, h(x) behaves like a Parity Check Polynomial. Any valid codeword when multiplied
by the parity check polynomial yields the zero polynomial. This concept is parallel to that of the
parity check matrix introduced in the previous chapter. Since we are still in the domain of linear
block codes, we go ahead and define the parity check matrix in relation to the parity check
polynomial.
Suppose C is a cyclic code with the parity check polynomial h(x) = h0 + h1x + ... + hkx^k. Then
the parity check matrix of C is given by

    [hk  hk-1  ...  h1  h0  0    ...  0 ]
H = [0   hk   hk-1 ...  h1  h0   ...  0 ]      (n - k) rows      (4.12)
    [ ...                               ]
    [0   ...  0    hk  hk-1  ...  h1  h0]

Recall that cH^T = 0. Therefore, iGH^T = 0 for any information vector, i. Hence, GH^T = 0. We
further have s = vH^T, where s is the syndrome vector and v is the received word.
Example 4.15 For binary codes of blocklength n = 7, we have
x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1)
Consider g(x) = (x^3 + x + 1). Since g(x) is a factor of x^7 - 1, there is a cyclic code that can be
generated by it. The generator matrix corresponding to g(x) is
G = [1 1 0 1 0 0 0]
    [0 1 1 0 1 0 0]
    [0 0 1 1 0 1 0]
    [0 0 0 1 1 0 1]
The parity check polynomial h(x) is (x - 1)(x^3 + x^2 + 1) = (x^4 + x^2 + x + 1), and the corresponding
parity check matrix is
H = [1 0 1 1 1 0 0]
    [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]
The minimum distance of this code is 3, and this happens to be the (7, 4) Hamming Code. Thus, the
binary (7, 4) Hamming Code is also a cyclic code.
4.6 BURST ERROR CORRECTION
In many real life channels, errors are not random, but occur in bursts. For example, in a mobile
communications channel, fading results in Burst errors. When errors occur at a stretch, as
opposed to random errors, we term them as Burst Errors.
Example 4.16 Let the transmitted sequence of bits, transmitted at 10 kb/s over a wireless channel,
be
c = 0 1 0 0 0 1 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1
Suppose, after 0.5 ms of the start of transmission, the channel experiences a fade of duration 1 ms.
During this time interval, the channel corrupts the transmitted bits. The error sequence can be
written as
b = 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 0 0 0 0 0 0 0.
This is an example of a burst error, where a portion of the transmitted sequence gets garbled due to
the channel. Here the length of the burst is 10 bits. However, not all ten locations are in error.
Definition 4.6 A Cyclic Burst of length t is a vector whose non-zero components
are among t successive components, the first and last of which are non-zero.
If we are constructing codes for channels more prone to burst errors of length t (as opposed to an arbitrary pattern of t random errors), it might be possible to design more efficient codes. We can describe a burst error as
e(x) = x^i b(x),   (4.13)
where the burst pattern b(x) is a polynomial of degree <= t - 1. Here x^i marks the starting location of the burst pattern within the codeword being transmitted.
A code designed for correcting a burst of length t must have unique syndromes for every error pattern, i.e.,
s(x) = R_g(x)[e(x)]
must be different for each polynomial representing a burst of length t.
Example 4.17 For a binary code of blocklength n = 15, consider the generator polynomial
g(x) = x^6 + x^3 + x^2 + x + 1.   (4.14)
This code is capable of correcting bursts of length 3 or less. To prove this we must show that all the syndromes corresponding to the different burst errors are distinct. The different burst errors are
(i) Bursts of length 1:
e(x) = x^i for i = 0, 1, ..., 14.
(ii) Bursts of length 2:
e(x) = x^i (1 + x) for i = 0, 1, ..., 13, and e(x) = x^i (1 + x^2) for i = 0, 1, ..., 13.
(iii) Bursts of length 3:
e(x) = x^i (1 + x + x^2) for i = 0, 1, ..., 12.
It can be shown that the syndromes of all these 56 (= 15 + 14 + 14 + 13) error patterns are distinct. A table can be made of each pattern and the corresponding syndrome, which can be used for correcting any burst error of length 3 or less. It should be emphasized that codes designed specifically for correcting burst errors are more efficient in terms of the code rate. The code being discussed here is a (15, 9) cyclic code with code rate k/n = 0.6 and minimum distance d* = 3. As a random error correcting code it can correct only 1 error, yet it corrects all bursts of length up to 3. Note that correction of one random error amounts to correcting a burst error of length 1.
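The distinctness of the 56 syndromes claimed in Example 4.17 can be checked by brute force. A possible Python sketch (not from the text; polynomials are again integers with bit i as the coefficient of x^i, and bursts that run past x^14 are wrapped around modulo x^15 - 1):

def poly_mod(a, b):
    # Remainder of a(x)/b(x) over GF(2).
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

g = 0b1001111                                    # g(x) = x^6 + x^3 + x^2 + x + 1
wrap = (1 << 15) | 1                             # x^15 - 1, used to wrap bursts around the block
patterns = [(0b1, range(15)), (0b11, range(14)), (0b101, range(14)), (0b111, range(13))]
bursts = [poly_mod(p << i, wrap) for p, rng in patterns for i in rng]
syndromes = {poly_mod(e, g) for e in bursts}
print(len(bursts), len(syndromes))               # 56 56 -> every burst has a distinct syndrome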
Similar to the Singleton bound studied in the previous chapter, there is a bound on the minimum number of parity symbols required for a burst error correcting linear block code: 'A linear block code that corrects all bursts of length t or less must have at least 2t parity symbols'.
In the next three sections, we will study three different sub-classes of cyclic codes. Each sub-
class has a specific objective.
4.7 FIRE CODES
Definition 4.7 A Fire code is a cyclic burst error correcting code over GF(q) with the generator polynomial
g(x) = (x^{2t-1} - 1) p(x),   (4.15)
where p(x) is a prime polynomial over GF(q) whose degree m is not smaller than t, and p(x) does not divide x^{2t-1} - 1. The blocklength of the Fire code is the smallest integer n such that g(x) divides x^n - 1. A Fire code can correct all burst errors of length t or less.
Example 4.18 Consider the Fire code with t = m = 3. A prime polynomial over GF(2) of degree 3 is p(x) = x^3 + x + 1, which does not divide (x^5 - 1). The generator polynomial of the Fire code will be
g(x) = (x^5 - 1) p(x) = (x^5 - 1)(x^3 + x + 1)
     = x^8 + x^6 + x^5 - x^3 - x - 1
     = x^8 + x^6 + x^5 + x^3 + x + 1.
The degree of g(x) = n - k = 8. The blocklength is the smallest integer n such that g(x) divides x^n - 1. After trial and error we get n = 35. Thus, the parameters of the Fire code are (35, 27), with g(x) = x^8 + x^6 + x^5 + x^3 + x + 1. This code can correct all bursts of length 3 or less. The code rate of this code is 0.77, which makes it more efficient than the code generated by g(x) = x^6 + x^3 + x^2 + x + 1, which has
a code rate of only 0.6.
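The 'trial and error' that gives n = 35 is a one-line search: find the smallest n for which g(x) divides x^n - 1. A Python sketch (not from the text, using the same bit-mask convention as above):

def poly_mod(a, b):
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

g = 0b101101011                                  # g(x) = x^8 + x^6 + x^5 + x^3 + x + 1
n = 1
while poly_mod((1 << n) | 1, g):                 # stop at the smallest n with g(x) | x^n - 1
    n += 1
print(n)                                         # 35 -> the Fire code is a (35, 27) code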
Fire Codes become more efficient as we increase t. The code rates for binary Fire Codes (with m = t) for different values of t are plotted in Fig. 4.1.
[Figure: code rate of binary Fire codes (with m = t) plotted against t, for t = 2 to 10.]
Fig. 4.1 Code Rates for Different Fire Codes.
4.8 GOLAY CODES
The Binary Golay Code
In the previous chapter, Sec. 3.9, we saw that a (23, 12) perfect code exists with d* = 7. Recall that, for a perfect code,
M [ C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t ] = q^n,   (4.16)
which is satisfied for the values n = 23, k = 12, M = 2^k = 2^12, q = 2 and t = (d* - 1)/2 = 3. This (23, 12) perfect code is the Binary Golay Code. We shall now explore this perfect code as a cyclic code. We start with the factorization of (x^23 - 1):
(x^23 - 1) = (x - 1)(x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1)(x^11 + x^9 + x^7 + x^6 + x^5 + x + 1)
           = (x - 1) g1(x) g2(x).   (4.17)
The degree of g1(x) = n - k = 11, hence k = 12, which implies that there exists a (23, 12) cyclic code. In order to prove that it is a perfect code, we must show that the minimum distance of this (23, 12) cyclic code is 7. One way is to write out the parity check matrix, H, and show that no six columns are linearly dependent. Another way is to prove it analytically, which is a long and drawn-out proof. The easiest way is to write a computer program to list out all the 2^12 codewords and find the minimum weight (on a fast computer it takes about 30 seconds!). The code rate is 0.52 and it is a triple error correcting code. However, the relatively small blocklength of this perfect code makes it impractical for most real-life applications.
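The exhaustive search mentioned above is short enough to show explicitly. The Python sketch below (not from the text) generates all 2^12 codewords of the cyclic code generated by g1(x) and reports the minimum non-zero weight; it should print 7.

from functools import reduce

g1 = 0b110001110101                    # g1(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1
k = 12
rows = [g1 << i for i in range(k)]     # k shifted copies of g1(x) form a generator matrix
min_wt = min(bin(reduce(lambda a, b: a ^ b,
                        (rows[i] for i in range(k) if m >> i & 1), 0)).count('1')
             for m in range(1, 1 << k))
print(min_wt)                          # 7 -> the (23, 12) Golay code corrects 3 random errors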
The Ternary Golay Code
We next examine the ternary (11, 6) cyclic code, which is also the Ternary Golay Code. This code has a minimum distance of 5, and can be verified to be a perfect code. We begin by factorizing (x^11 - 1) over GF(3):
(x^11 - 1) = (x - 1)(x^5 + x^4 - x^3 + x^2 - 1)(x^5 - x^3 + x^2 - x - 1)
           = (x - 1) g1(x) g2(x).   (4.18)
The degree of g1(x) = n - k = 5, hence k = 6, which implies that there exists an (11, 6) cyclic code. In order to prove that it is a perfect code, we must show that the minimum distance of this (11, 6) cyclic code is 5. Again, we resort to an exhaustive computer search and find that the minimum distance is indeed 5.
It can be shown that (x^p - 1) has a factorization of the form (x - 1) g1(x) g2(x) over GF(2) whenever p is a prime number of the form 8m +/- 1 (m is a positive integer). In such cases, g1(x) and g2(x) generate equivalent codes. If the minimum distance of the code generated by g1(x) is odd, then it satisfies the Square Root Bound
d* >= sqrt(p).   (4.19)
Note that p denotes the blocklength.
4.9 CYCLIC REDUNDANCY CHECK (CRC) CODES
One of the common error detecting codes is the Cyclic Redundancy Check (CRC) code. For a k-bit message block, the (n, k) CRC encoder generates an (n - k)-bit frame check sequence (FCS). Let us define the following:
T = n-bit frame to be transmitted
D = k-bit message block (information bits)
F = (n - k)-bit FCS, the last (n - k) bits of T
P = the predetermined divisor, a pattern of (n - k + 1) bits.
The predetermined divisor, P, should be able to divide the codeword T exactly; that is, T/P has no remainder. Now, D is the k-bit message block. Therefore, 2^{n-k} D amounts to shifting the k bits to the left by (n - k) bits and padding the result with zeros (recall that a left shift by 1 bit of a binary sequence is equivalent to multiplying the number represented by the binary sequence by two). The codeword, T, can then be represented as
T = 2^{n-k} D + F.   (4.20)
Adding F in the above equation yields the concatenation of D and F. If we divide 2^{n-k} D by P, we obtain
2^{n-k} D / P = Q + R/P,   (4.21)
where Q is the quotient and R/P is the remainder. Suppose we use R as the FCS; then
T = 2^{n-k} D + R.   (4.22)
In this case, upon dividing T by P we obtain
T/P = (2^{n-k} D + R)/P = 2^{n-k} D / P + R/P
    = Q + R/P + R/P = Q + (R + R)/P = Q,   (4.23)
since R + R = 0 in modulo-2 arithmetic. Thus there is no remainder, i.e., T is exactly divisible by P. To generate such an FCS, we simply divide 2^{n-k} D by P and use the (n - k)-bit remainder as the FCS.
Let an error E occur when T is transmitted over a noisy channel. The received word is given by
V = T + E.   (4.24)
The CRC scheme will fail to detect the error only if V is completely divisible by P. This translates to the case when E is completely divisible by P (because T is divisible by P).
Example 4.19 Let the message D = 1010001101, i.e., k = 10, and the pattern P = 110101. The number of FCS bits = 5. Therefore, n = 15. We wish to determine the FCS.
First, the message is multiplied by 2^5 (left shift by 5 and pad with 5 zeros). This yields
2^5 D = 101000110100000.
Next, divide the resulting number by P = 110101. By long division we obtain Q = 1101010110 and R = 01110. The remainder is added to 2^5 D to obtain
T = 101000110101110.
T is the transmitted codeword. If no errors occur in the channel, the received word when divided by P will yield 0 as the remainder.
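The long division in Example 4.19 can be checked with a short routine. The following Python sketch (not from the text) computes the FCS as the remainder of 2^(n-k) D divided by P using modulo-2 arithmetic:

def crc_fcs(data_bits, divisor_bits):
    # Frame check sequence = remainder of 2^(n-k) D divided by P (all arithmetic modulo 2).
    a = int(data_bits, 2) << (len(divisor_bits) - 1)
    p = int(divisor_bits, 2)
    while a and a.bit_length() >= p.bit_length():
        a ^= p << (a.bit_length() - p.bit_length())
    return format(a, '0%db' % (len(divisor_bits) - 1))

fcs = crc_fcs('1010001101', '110101')
print(fcs)                              # 01110
print('1010001101' + fcs)               # 101000110101110, the transmitted codeword T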
CRC codes can also be defined using the polynomial representation. Let the message polynomial be D(x) and the predetermined divisor be P(x). Therefore,
x^{n-k} D(x) / P(x) = Q(x) + R(x)/P(x),
T(x) = x^{n-k} D(x) + R(x).   (4.25)
At the receiver end, the received word is divided by P(x). Suppose the received word is
V(x) = T(x) + E(x),   (4.26)
where E(x) is the error polynomial. Then [T(x) + E(x)]/P(x) = E(x)/P(x), because T(x) is exactly divisible by P(x).
Those errors that happen to correspond to polynomials containing P(x) as a factor will slip by,
and the others will be caught in the net of the CRC decoder. The polynomial P(x) is also called
the generator polynomial for the CRC code. CRC codes are also known as Polynomial Codes.
Example 4.20 Suppose the transmitted codeword undergoes a single-bit error. The error polynomial E(x) can be represented by E(x) = x^i, where i determines the location of the single error bit. If P(x) contains two or more terms, E(x)/P(x) can never leave a zero remainder. Thus all single errors will be caught by such a CRC code.
Example 4.21 Suppose two isolated errors occur, i.e., E(x) = x^i + x^j, i > j. Alternately, E(x) = x^j (x^{i-j} + 1). If we assume that P(x) is not divisible by x, then a sufficient condition for detecting all double errors is that P(x) does not divide x^k + 1 for any k up to the maximum value of i - j (i.e., the frame length). For example, x^15 + x^14 + 1 will not divide x^k + 1 for any value of k below 32,768.
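The figure of 32,768 quoted above can be verified by computing the order of x modulo P(x), i.e., the smallest k for which P(x) divides x^k + 1. A Python sketch (not from the text):

def poly_mod(a, b):
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

P = (1 << 15) | (1 << 14) | 1           # P(x) = x^15 + x^14 + 1
k, xk = 1, 2                            # xk holds x^k mod P(x), starting with x^1
while xk != 1:
    xk = poly_mod(xk << 1, P)           # multiply by x and reduce
    k += 1
print(k)                                # 32767 -> P(x) divides no x^k + 1 with k < 32768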
Example 4.22 Suppose the error polynomial has an odd number of terms (corresponding to an odd number of errors). An interesting fact is that there is no polynomial with an odd number of terms that has x + 1 as a factor if we are performing binary arithmetic (modulo-2 operations). By making (x + 1) a factor of P(x), we can catch all errors consisting of an odd number of bits (i.e., we can catch at least half of all possible error patterns!).
Another interesting feature of CRC codes is their ability to detect burst errors. A burst error of length k can be represented by x^i (x^{k-1} + x^{k-2} + ... + 1), where i determines how far from the right end of the received frame the burst is located. If P(x) has a constant (x^0) term, it does not have x as a factor, so the factor x^i cannot help divisibility by P(x). Hence, if the degree of (x^{k-1} + x^{k-2} + ... + 1) is less than the degree of P(x), the remainder can never be zero. Therefore, a polynomial code with r check bits can detect all burst errors of length <= r. If the burst length is r + 1, the remainder of the division by P(x) will be zero if, and only if, the burst is identical to P(x). Now, the first and last bits of a burst must be 1 (by definition). The intermediate bits can be 1 or 0. Therefore, the exact matching of the burst error with the polynomial P(x) depends on the r - 1 intermediate bits. Assuming all combinations are equally likely, the probability of a miss is 1/2^{r-1}. One can show that when an error burst of length greater than r + 1 occurs, or several shorter bursts occur, the probability of a bad frame slipping through is 1/2^r.
Example 4.23 Four versions of P(x) have become international standards:
CRC-12:    P(x) = x^12 + x^11 + x^3 + x^2 + x + 1
CRC-16:    P(x) = x^16 + x^15 + x^2 + 1
CRC-CCITT: P(x) = x^16 + x^12 + x^5 + 1
CRC-32:    P(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
(4.27)
CRC-12, CRC-16 and CRC-CCITT contain (x + 1) as a factor. CRC-12 is used for transmission of streams of 6-bit characters and generates a 12-bit FCS. Both CRC-16 and CRC-CCITT are popular for 8-bit characters. They result in a 16-bit FCS and can catch all single and double errors, all errors with an odd number of bits, all burst errors of length 16 or less, 99.997% of 17-bit bursts and 99.998% of 18-bit and longer bursts. CRC-32 is specified as an option in some point-to-point synchronous transmission standards.
4.10 CIRCUIT IMPLEMENTATION OF CYCLIC CODES
Shift registers can be used to encode and decode cyclic codes easily. Encoding and decoding of cyclic codes require multiplication and division by polynomials. The shift property of shift registers is ideally suited for such operations. Shift registers are banks of memory units which are capable of shifting the contents of one unit to the next at every clock pulse. Here we will focus on circuit implementation for codes over GF(2^m). Besides the shift register, we will make use of the following circuit elements:
(i) A scaler, whose job is to multiply the input by a fixed field element.
(ii) An adder, which takes in two inputs and adds them together. A simple circuit realization
of an adder is the 'exclusive-or' or the 'xor' gate.
(iii) A multiplier, which is basically the 'and' gate.
These elements are depicted in Fig. 4.2.
[Figure: an N-stage shift register, a scaler, an adder and a multiplier.]
Fig. 4.2 Circuit Elements Used to Construct Encoders and Decoders for Cyclic Codes.
A field element of GF(2) can simply be represented by a single bit. For GF(2^m) we require m bits to represent one element. For example, the elements of GF(8) can be represented as the elements of the set {000, 001, 010, 011, 100, 101, 110, 111}. For such a representation we need three clock pulses to shift an element from one stage of the effective shift register to the next. The effective shift register for GF(8) is shown in Fig. 4.3. Any arbitrary element of this field can be represented by ax^2 + bx + c, where a, b, c are binary, and the power of the indeterminate x is used to denote the position. For example, 101 = x^2 + 1.
[Figure: one stage of the effective shift register for GF(8) consists of three binary memory units.]
Fig. 4.3 The Effective Shift Register for GF(8).
Example 4.24 We now consider the multiplication of an arbitrary element by another field element over GF(8). Recall the construction of GF(8) from GF(2) using the prime polynomial p(x) = x^3 + x + 1. The elements of the field are 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1. We want to obtain the circuit representation for the multiplication of an arbitrary field element (ax^2 + bx + c) by another element, say, x^2 + x. We have
(ax^2 + bx + c)(x^2 + x) = ax^4 + (a + b)x^3 + (b + c)x^2 + cx   (modulo p(x))
                         = (a + b + c)x^2 + (b + c)x + (a + b).
One possible circuit realization is shown in Fig. 4.4.
Fig. 4.4 Multiplication of an Arbitrary Field Element.
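The identity derived in Example 4.24 can be confirmed by running over all eight field elements. A Python sketch (not from the text; a GF(8) element ax^2 + bx + c is stored as the 3-bit mask abc):

def gf8_mul(u, v):
    # Multiply two GF(8) elements constructed with p(x) = x^3 + x + 1.
    w = 0
    for i in range(3):
        if v >> i & 1:
            w ^= u << i
    for i in (4, 3):                    # reduce: x^4 = x^2 + x, x^3 = x + 1
        if w >> i & 1:
            w ^= 0b1011 << (i - 3)
    return w

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            elem = a << 2 | b << 1 | c
            lhs = gf8_mul(elem, 0b110)                       # multiply by x^2 + x
            rhs = (a ^ b ^ c) << 2 | (b ^ c) << 1 | (a ^ b)  # (a+b+c)x^2 + (b+c)x + (a+b)
            assert lhs == rhs
print("identity holds for all 8 elements")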
We next focus on the multiplication of an arbitrary polynomial a(x) by g(x). Let the polynomial g(x) be represented as
g(x) = g_L x^L + ... + g_1 x + g_0,   (4.28)
the polynomial a(x) be represented as
a(x) = a_k x^k + ... + a_1 x + a_0,   (4.29)
and the resultant polynomial b(x) = a(x)g(x) be represented as
b(x) = b_{k+L} x^{k+L} + ... + b_1 x + b_0.   (4.30)
The circuit realization of b(x) is given in Fig. 4.5. This is a linear feed-forward shift register. It is also called a Finite Impulse Response (FIR) filter.
Fig. 4.5 A Finite Impulse Response (FIR) Filter.
In electrical engineering jargon, the coefficients of a(x) and g(x) are convolved by the shift register. For our purpose, we have a circuit realization for multiplying two polynomials. Thus, we have an efficient mechanism for encoding a cyclic code by multiplying the information polynomial by the generator polynomial.
Example 4.25 The encoder circuit for the generator polynomial
g(x) = x^8 + x^6 + x^5 + x^3 + x + 1
is given in Fig. 4.6. This is the generator polynomial for the Fire code with t = m = 3. It is easy to interpret the circuit: the 8 memory units shift the input one unit at a time, and the shifted outputs are summed at the proper locations. There are five adders for summing up the six shifted versions of the input.
Fig. 4.6 Circuit Realization of the Encoder for the Fire Code.
For dividing an arbitrary polynomial by a fixed polynomial g(x), the circuit realization is given in Fig. 4.7.
We can thus use a shift register circuit for dividing an arbitrary polynomial, a(x), by a fixed polynomial g(x). We assume here that the divisor is a monic polynomial. We already know how to factor out a scalar in order to convert any polynomial to a monic polynomial. The division process can be expressed as a pair of recursive equations. Let Q^(r)(x) and R^(r)(x) be the quotient polynomial and the remainder polynomial at the r-th recursion step, with the initial conditions Q^(0)(x) = 0 and R^(0)(x) = a(x). Then, the recursive equations can be written as
Q^(r)(x) = Q^(r-1)(x) + R^(r-1)_{n-r} x^{k-r},
R^(r)(x) = R^(r-1)(x) - R^(r-1)_{n-r} x^{k-r} g(x),   (4.31)
where R^(r-1)_{n-r} represents the leading coefficient of the remainder polynomial at stage (r - 1). After n shifts, the quotient is passed out of the shift register, and the value stored in the shift register is the remainder. Thus the shift register implementation of a decoder is very simple. The contents of the shift register can be checked for all entries being zero after the division of the received polynomial by the generator polynomial. If even a single memory unit of the shift register is non-zero, an error is detected.
Fig. 4.7 A Shift Register Circuit for Dividing by g(x).
Example 4.26 The shift register circuit for dividing by g(x) = x^8 + x^6 + x^5 + x^3 + x + 1 is given in Fig. 4.8.
Fig. 4.8 A Shift Register Circuit for Dividing by g(x) = x^8 + x^6 + x^5 + x^3 + x + 1.
The procedure for error detection and error correction is as follows. The received word is first stored in a buffer. It is then subjected to a divide-by-g(x) operation. As we have seen, this division can be carried out very efficiently by a shift register circuit. The remainder in the shift register is then compared with all the possible (pre-computed) syndromes. This set of syndromes corresponds to the set of correctable error patterns. If a syndrome match is found, the error is subtracted out from the received word. The corrected version of the received word is then passed on to the next stage of the receiver unit for further processing. This kind of a decoder is known as a Meggitt Decoder. The flow chart for this is given in Fig. 4.9.
[Flow chart: the received word enters an n-stage shift register with divide-by-g(x) feedback; the remainder is compared with all test syndromes; the matching error pattern is removed to give the corrected word.]
Fig. 4.9 The Flow Chart of a Meggitt Decoder.
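In software, the Meggitt procedure reduces to a division followed by a table look-up. The Python sketch below (not from the text) uses the (7, 4) code of Example 4.15, whose correctable error patterns are the seven single errors:

def poly_mod(a, b):
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

g, n = 0b1011, 7
table = {poly_mod(1 << i, g): 1 << i for i in range(n)}   # syndrome -> single-error pattern

c = 0b0001011                       # a codeword (g(x) itself)
v = c ^ (1 << 5)                    # the channel flips one bit
s = poly_mod(v, g)                  # divide the received word by g(x)
if s:                               # non-zero remainder -> error detected
    v ^= table[s]                   # subtract the matching error pattern
print(v == c)                       # True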
4.11 CONCLUDING REMARKS
The notion of cyclic codes was first introduced by Prange in 1957. The work on cyclic codes was
further developed by Peterson and Kasami. Pioneering work on the minimum distance of cyclic
codes was done by Bose and Raychaudhuri in the early 1960s. Another subclass of cyclic codes,
the BCH codes (named after Bose, Chaudhuri and Hocquenghem) will be studied in detail in
the next chapter. It was soon discovered that almost all of the earlier discovered linear block
codes could be made cyclic. The initial steps in the area of burst error correction were taken by Abramson in 1959. The Fire Codes were published in the same year. The binary and the ternary Golay Codes were published by Golay as early as 1949.
Shift register circuits for cyclic codes were introduced in the works of Peterson, Chien and
Meggitt in the early 1960s. Important contributions were also made by Kasami, MacWilliams,
Mitchell and Rudolph.
SUMMARY
• A polynomial is a mathematical expression f(x) = f_0 + f_1 x + ... + f_m x^m, where the symbol x is called the indeterminate and the coefficients f_0, f_1, ..., f_m are the elements of GF(q). The coefficient f_m is called the leading coefficient. If f_m is not 0, then m is called the degree of the polynomial, and is denoted by deg f(x). A polynomial is called monic if its leading coefficient is unity.
• The division algorithm states that, for every pair of polynomials a(x) and b(x) (b(x) not 0) in F[x], there exists a unique pair of polynomials q(x), the quotient, and r(x), the remainder, such that a(x) = q(x) b(x) + r(x), where deg r(x) < deg b(x). The remainder is sometimes also called the residue, and is denoted by R_b(x)[a(x)] = r(x).
• Two important properties of residues are
(i) R_f(x)[a(x) + b(x)] = R_f(x)[a(x)] + R_f(x)[b(x)], and
(ii) R_f(x)[a(x) . b(x)] = R_f(x){R_f(x)[a(x)] . R_f(x)[b(x)]},
where a(x), b(x) and f(x) are polynomials over GF(q).
• A polynomial f(x) in F[x] is said to be reducible if f(x) = a(x) b(x), where a(x), b(x) are elements of F[x] and deg a(x) and deg b(x) are both smaller than deg f(x). If f(x) is not reducible, it is called irreducible. A monic irreducible polynomial of degree at least one is called a prime polynomial.
• The ring F[x]/f(x) is a field if and only if f(x) is a prime polynomial in F[x].
• A code C in R_n is a cyclic code if and only if C satisfies the following conditions:
(i) a(x), b(x) in C implies a(x) + b(x) in C,
(ii) a(x) in C and r(x) in R_n implies a(x)r(x) in C.
• The following steps can be used to generate a cyclic code:
(i) Take a polynomial f(x) in R_n.
(ii) Obtain a set of polynomials by multiplying f(x) by all possible polynomials in R_n.
(iii) The set of polynomials obtained as above corresponds to the set of codewords belonging to a cyclic code. The blocklength of the code is n.
• Let C be an (n, k) non-zero cyclic code in R_n. Then,
(i) there exists a unique monic polynomial g(x) of the smallest degree in C,
(ii) the cyclic code C consists of all multiples of the generator polynomial g(x) by polynomials of degree k - 1 or less,
(iii) g(x) is a factor of x^n - 1,
(iv) the degree of g(x) is n - k.
• For a cyclic code, C, with generator polynomial g(x) = g_0 + g_1 x + ... + g_r x^r of degree r, the generator matrix is given by
G = [ g_0  g_1  ...  g_r   0    0   ...  0 ]
    [ 0    g_0  g_1  ...  g_r   0   ...  0 ]
    [ ...                                  ]
    [ 0    ...  0    g_0  g_1  ...     g_r ]
which has k = n - r rows and n columns.
• For a cyclic code, C, with the parity check polynomial h(x) = h_0 + h_1 x + ... + h_k x^k, the parity check matrix is given by
H = [ h_k  h_{k-1}  ...  h_0   0    ...  0 ]
    [ 0    h_k      ...  h_1   h_0  ...  0 ]
    [ ...                                  ]
    [ 0    ...      0    h_k   ...     h_0 ]
which has (n - k) rows and n columns.
• x^n - 1 = h(x) g(x), where g(x) is the generator polynomial and h(x) is the parity check polynomial.
• A Fire code is a cyclic burst error correcting code over GF(q) with the generator polynomial g(x) = (x^{2t-1} - 1) p(x), where p(x) is a prime polynomial over GF(q) whose degree m is not smaller than t and p(x) does not divide x^{2t-1} - 1. The blocklength of the Fire code is the smallest integer n such that g(x) divides x^n - 1. A Fire code can correct all burst errors of length t or less.
• The generator polynomial of the Binary Golay Code:
g1(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1, or
g2(x) = x^11 + x^9 + x^7 + x^6 + x^5 + x + 1.
• The generator polynomial of the Ternary Golay Code:
g1(x) = x^5 + x^4 - x^3 + x^2 - 1, or
g2(x) = x^5 - x^3 + x^2 - x - 1.
• One of the common error detecting codes is the Cyclic Redundancy Check (CRC) code. For a k-bit message block, the (n, k) CRC encoder generates an (n - k)-bit Frame Check Sequence (FCS).
• Shift registers can be used to encode and decode cyclic codes easily. Encoding and decoding of cyclic codes require multiplication and division by polynomials. The shift property of shift registers is ideally suited for such operations.
Everything should be made as simple as possible, but not simpler.
Albert Einstein (1879-1955)
PROBLEMS
4.1 Which of the following codes are (a) cyclic, (b) equivalent to a cyclic code?
(a) {0000, 0110, 1100, 0011, 1001} over GF(2).
(b) {00000, 10110, 01101, 11011} over GF(2).
(c) {00000, 10110, 01101, 11011} over GF(3).
(d) {0000, 1122, 2211} over GF(3).
(e) The q-ary repetition code of length n.
4.2 Construct the addition and multiplication tables for
(a) F[x]/(x^2 + 1) defined over GF(2).
(b) F[x]/(x^2 + 1) defined over GF(3).
Which of the above is a field?
4.3 List out all the irreducible polynomials over
(a) GF(2) of degrees 1 to 5.
(b) GF(3) of degrees 1 to 3.
4.4 Find all the cyclic binary codes of blocklength 5. Find the minimum distance of each code.
4.5 Suppose x^n - 1 is a product of r distinct irreducible polynomials over GF(q). How many cyclic codes of blocklength n over GF(q) exist? Comment on the minimum distance of these codes.
4.6 (a) Factorize x^8 - 1 over GF(3).
(b) How many ternary cyclic codes of length 8 exist?
(c) How many quaternary cyclic codes of length 8 exist?
4.7 Let the polynomial
g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
be the generator polynomial of a cyclic code over GF(2) with blocklength 15.
(a) Find the generator matrix G.
(b) Find the parity check matrix H.
(c) How many errors can this code detect?
(d) How many errors can this code correct?
(e) Write the generator matrix in the systematic form.
4.8 Consider the polynomial
g(x) = x^6 + 3x^5 + x^4 + x^3 + 2x^2 + 2x + 1.
(a) Is this a valid generator polynomial for a cyclic code over GF(4) with blocklength 15?
(b) Find the parity check matrix H.
(c) What is the minimum distance of this code?
(d) What is the code rate of this code?
(e) Is the received word v(x) = x^6 + x^5 + 3x^4 + x^3 + 3x + 1 a valid codeword?
4.9 An error vector of the form x^i + x^{i+1} in R_n is called a double adjacent error. Show that the code generated by the generator polynomial g1(x) = (x - 1) g_H(x) is capable of correcting all double adjacent errors, where g_H(x) is the generator polynomial of the binary Hamming Code.
4.10 Design the shift register encoder and the Meggitt decoder for the code generated in Problem 4.8.
4.11 The code with the generator polynomial g(x) = (x^23 + 1)(x^17 + x^3 + 1) is used for error detection and correction in the GSM standard.
(i) How many random errors can this code correct?
(ii) How long a burst of errors can this code correct?
COMPUTER PROBLEMS
4.12 Write a computer program to find the minimum distance of a cyclic code over GF(q), given the generator polynomial (or the generator matrix) for the code.
4.13 Write a computer program to encode and decode a (35, 27) Fire Code. It should be able to automatically correct bursts of length 3 or less. What happens when you try to decode a received word with a burst error of length 4?
Bose-Chaudhuri
Hocquenghem (BCH) Codes
5.1 INTRODUCTION TO BCH CODES
The class of Bose-Chaudhuri Hocquenghem (BCH) codes is one of the most powerful known
class of Linear Cyclic Block Codes. BCH codes are known for their multiple error correcting
ability, and the ease of encoding and decoding. So far, our approach has been to construct a
code and then find out its minimum distance in order to estimate its error correcting capability.
In this class of code, we will start from the other end. We begin by specifying the number of
random errors we desire the code to correct. Then we go on to construct the generator
polynomial for the code. As mentioned above, BCH codes are a subclass of cyclic codes, and
therefore, the decoding methodology for any cyclic code also works for the BCH codes.
However, more efficient decoding procedures are known for BCH codes, and will be discussed
in this chapter.
We begin by building the necessary mathematical tools in the next couple of sections. We
shall then look at the method for constructing the generator polynomial for BCH codes. Efficient
decoding techniques for this class of codes will be discussed next. An important sub-set of BCH
codes, the Reed-Solomon codes, will be introduced in the later part of this chapter.
5.2 PRIMITIVE ELEMENTS
Definition 5.1 A Primitive Element of GF(q) is an element a such that every
field element except zero can be expressed as a power of a.
Example 5.1 Consider GF(5). Since q = 5 is a prime number, modulo arithmetic will work. Consider the element 2.
2^0 = 1 (mod 5) = 1,
2^1 = 2 (mod 5) = 2,
2^2 = 4 (mod 5) = 4,
2^3 = 8 (mod 5) = 3.
Hence, all the non-zero elements of GF(5), i.e., {1, 2, 3, 4}, can be represented as powers of 2. Therefore, 2 is a primitive element of GF(5).
Next, consider the element 3.
3^0 = 1 (mod 5) = 1,
3^1 = 3 (mod 5) = 3,
3^2 = 9 (mod 5) = 4,
3^3 = 27 (mod 5) = 2.
Again, all the non-zero elements of GF(5), i.e., {1, 2, 3, 4}, can be represented as powers of 3. Therefore, 3 is also a primitive element of GF(5).
However, it can be verified that the other non-zero elements, 1 and 4, are not primitive elements.
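A brute-force check of Example 5.1 (a Python sketch, not from the text):

q = 5
for a in range(1, q):
    powers = {pow(a, i, q) for i in range(1, q)}    # a^1, ..., a^(q-1) modulo 5
    print(a, "is primitive" if len(powers) == q - 1 else "is not primitive")
# 2 and 3 are primitive; 1 and 4 are not (4 has order 2).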
We saw in the example that there can be more than one primitive element in a field. But is
there a guarantee of finding at least one primitive element? The answer is yes! The non-zero
elements of every Galois Field form a cyclic group. Hence, a Galois Field will include an
element of order q- 1. This will be the primitive element. Primitive elements are very useful in
constructing fields. Once we have a primitive element, we can easily find all the other elements
by simply evaluating the powers of the primitive element.
Definition 5.2 A Primitive Polynomial p(x) over GF(q) is a prime polynomial
over GF(q) with the property that in the extension field constructed modulo p(x), the
field element represented by x is a primitive element.
Primitive polynomials of every degree exist over every Galois Field. A primitive polynomial
can be used to construct an extension field.
Example 5.2 We can construct GF(8) using the primitive polynomial p(x) = x^3 + x + 1. Let the primitive element of GF(8) be alpha = z. Then, we can represent all the elements of GF(8) by the powers of alpha evaluated modulo p(x). Thus, we can form Table 5.1.
Table 5.1 The elements of GF(8)
alpha^1    z
alpha^2    z^2
alpha^3    z + 1
alpha^4    z^2 + z
alpha^5    z^2 + z + 1
alpha^6    z^2 + 1
alpha^7    1
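Table 5.1 can be regenerated by repeatedly multiplying by alpha and reducing modulo p(z). A Python sketch (not from the text; an element a z^2 + b z + c is stored as the 3-bit mask abc):

elem = 1
for i in range(1, 8):
    elem <<= 1                      # multiply by z
    if elem & 0b1000:               # reduce using z^3 = z + 1
        elem ^= 0b1011
    print("alpha^%d = %s" % (i, format(elem, '03b')))
# 010, 100, 011, 110, 111, 101, 001 -> z, z^2, z+1, z^2+z, z^2+z+1, z^2+1, 1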
Theorem 5.1 Let beta_1, beta_2, ..., beta_{q-1} denote the non-zero field elements of GF(q). Then,
x^{q-1} - 1 = (x - beta_1)(x - beta_2) ... (x - beta_{q-1}).   (5.1)
Proof The set of non-zero elements of GF(q) is a finite group under the operation of multiplication. Let beta be any non-zero element of the field. It can be represented as a power of the primitive element alpha. Let beta = alpha^r for some integer r. Therefore,
beta^{q-1} = (alpha^r)^{q-1} = (alpha^{q-1})^r = (1)^r = 1,
because alpha^{q-1} = 1. Hence, beta is a zero of x^{q-1} - 1. This is true for any non-zero element beta. Hence, each of the q - 1 non-zero field elements is a root of x^{q-1} - 1, which is a polynomial of degree q - 1, and the factorization (5.1) follows.
Example 5.3 Consider the field GF(5). The non-zero elements of this field are {1, 2, 3, 4}. Therefore, we can write
x^4 - 1 = (x - 1)(x - 2)(x - 3)(x - 4).
5.3 MINIMAL POLYNOMIALS
In the previous chapter we saw that in order to find the generator polynomials for cyclic codes of blocklength n, we have to first factorize x^n - 1. Thus x^n - 1 can be written as the product of its p prime factors:
x^n - 1 = f_1(x) f_2(x) f_3(x) ... f_p(x).   (5.2)
Any combination of these factors can be multiplied together to form a generator polynomial g(x). If the prime factors of x^n - 1 are distinct, then there are (2^p - 2) different non-trivial cyclic codes of blocklength n. The two trivial cases that are being disregarded are g(x) = 1 and g(x) = x^n - 1. Not all of the (2^p - 2) possible cyclic codes are good codes in terms of their minimum distance. We now evolve a strategy for finding good codes, i.e., codes of desirable minimum distance.
In the previous chapter we learnt how to construct an extension field from the subfield. In this section we will study the prime polynomials (in a certain field) that have zeros in the extension field. Our strategy for constructing g(x) will be as follows: using the desirable zeros in the extension field, we will find prime polynomials in the subfield, which will be multiplied together to yield a desirable g(x).
Definition 5.3 A blocklength n of the form n = q^m - 1 is called a Primitive Block Length for a code over GF(q). A cyclic code over GF(q) of primitive blocklength is called a Primitive Cyclic Code.
The field GF(q^m) is an extension field of GF(q). Let the primitive blocklength n = q^m - 1. Consider the factorization
x^n - 1 = x^{q^m - 1} - 1 = f_1(x) f_2(x) ... f_p(x)   (5.3)
over the field GF(q). This factorization will also be valid over the extension field GF(q^m), because the addition and multiplication tables of the subfield form a part of the tables of the extension field. We also know that g(x) divides x^n - 1, i.e., x^{q^m - 1} - 1; hence g(x) must be the product of some of these polynomials f_i(x). Also, every non-zero element of GF(q^m) is a zero of x^{q^m - 1} - 1. Hence, it is possible to factor x^{q^m - 1} - 1 in the extension field GF(q^m) to get
x^{q^m - 1} - 1 = prod_j (x - beta_j),   (5.4)
where beta_j ranges over all the non-zero elements of GF(q^m). This implies that each of the polynomials f_i(x) can be represented in GF(q^m) as a product of some of the linear terms, and each beta_j is a zero of exactly one of the f_i(x). This f_i(x) is called the Minimal Polynomial of beta_j.
Definition 5.4 The smallest degree polynomial with coefficients in the base field GF(q) that has beta as a zero in the extension field GF(q^m) is called the Minimal Polynomial of beta.
Example 5.4 Consider the subfield GF(2) and its extension field GF(8). Here q = 2 and m = 3. The factorization of x^7 - 1 (in the subfield/extension field) yields
x^{2^3 - 1} - 1 = x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1).
Next, consider the elements of the extension field GF(8). The elements can be represented as 0, 1, z, z + 1, z^2, z^2 + 1, z^2 + z, z^2 + z + 1 (from Example 4.10 of Chapter 4). Therefore, we can write
x^{2^3 - 1} - 1 = x^7 - 1 = (x - 1)(x - z)(x - z - 1)(x - z^2)(x - z^2 - 1)(x - z^2 - z)(x - z^2 - z - 1)
              = (x - 1) . [(x - z)(x - z^2)(x - z^2 - z)] . [(x - z - 1)(x - z^2 - 1)(x - z^2 - z - 1)].
It can be seen that over GF(8),
(x^3 + x + 1) = (x - z)(x - z^2)(x - z^2 - z), and
(x^3 + x^2 + 1) = (x - z - 1)(x - z^2 - 1)(x - z^2 - z - 1).
The multiplication and addition are carried out over GF(8). Interestingly, after a little bit of algebra it is found that the coefficients of the minimal polynomials belong to GF(2) only. We can now make Table 5.2.
Table 5.2 The Elements of GF(8) in Terms of the Powers of the Primitive Element alpha
Minimal polynomial f_i(x)   Corresponding elements beta_j in GF(8)   Elements in terms of powers of alpha
(x - 1)                     1                                        alpha^0
(x^3 + x + 1)               z, z^2 and z^2 + z                       alpha^1, alpha^2, alpha^4
(x^3 + x^2 + 1)             z + 1, z^2 + 1 and z^2 + z + 1           alpha^3, alpha^6, alpha^5 (= alpha^12)
It is interesting to note the elements (in terms of powers of the primitive element alpha) that correspond to the same minimal polynomial. If we make the observation that alpha^12 = alpha^7 . alpha^5 = 1 . alpha^5, we see a pattern in the elements that correspond to a certain minimal polynomial. In fact, the elements that are roots of the same minimal polynomial in the extension field are of the type beta^{q^r}, where beta is an element of the extension field. In the above example, the zeros of the minimal polynomial f_2(x) = x^3 + x + 1 are alpha^1, alpha^2 and alpha^4, and those of f_3(x) = x^3 + x^2 + 1 are alpha^3, alpha^6 and alpha^12 (= alpha^5).
Definition 5.5 Two elements of GF(q^m) that share the same minimal polynomial over GF(q) are called Conjugates with respect to GF(q).
Example 5.5 The elements {alpha^1, alpha^2, alpha^4} are conjugates with respect to GF(2). They share the same minimal polynomial f_2(x) = x^3 + x + 1.
As we have seen, a single element in the extension field may have more than one conjugate. The conjugacy relationship between two elements depends on the base field. For example, the extension field GF(16) can be constructed using either GF(2) or GF(4). Two elements that are conjugates with respect to GF(2) may not be conjugates with respect to GF(4).
If f(x) is the minimal polynomial of beta, then it is also the minimal polynomial of the elements in the set {beta, beta^q, beta^{q^2}, ..., beta^{q^{r-1}}}, where r is the smallest integer such that beta^{q^r} = beta. The set {beta, beta^q, beta^{q^2}, ..., beta^{q^{r-1}}} is called the Set of Conjugates. The elements in the set of conjugates are all the zeros of f(x). Hence, the minimal polynomial of beta can be written as
f(x) = (x - beta)(x - beta^q)(x - beta^{q^2}) ... (x - beta^{q^{r-1}}).   (5.5)
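Equation (5.5) can be checked numerically. The Python sketch below (not from the text) expands the minimal polynomial of beta = alpha^3 in GF(8), constructed with p(z) = z^3 + z + 1 as in Table 5.2, and recovers x^3 + x^2 + 1 with coefficients in GF(2) only:

def gf8_mul(u, v):
    # GF(8) multiplication for p(z) = z^3 + z + 1; elements are 3-bit masks.
    w = 0
    for i in range(3):
        if v >> i & 1:
            w ^= u << i
    for i in (4, 3):
        if w >> i & 1:
            w ^= 0b1011 << (i - 3)
    return w

alpha = [1]
for _ in range(7):
    alpha.append(gf8_mul(alpha[-1], 0b010))       # powers of alpha = z

conjugates = [alpha[3], alpha[6], alpha[12 % 7]]  # {beta, beta^2, beta^4} for beta = alpha^3
poly = [1]                                        # coefficients, leading term first
for c in conjugates:                              # multiply by (x + c); minus = plus here
    poly = [a ^ gf8_mul(b, c) for a, b in zip(poly + [0], [0] + poly)]
print(poly)                                       # [1, 1, 0, 1] -> x^3 + x^2 + 1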
Example 5.6 Consider GF(256) as an extension field of GF(2). Let alpha be the primitive element of GF(256). Then a set of conjugates would be
{alpha^1, alpha^2, alpha^4, alpha^8, alpha^16, alpha^32, alpha^64, alpha^128}.
Note that alpha^256 = alpha^255 . alpha = alpha, hence the set of conjugates terminates with alpha^128. The minimal polynomial of alpha is
f(x) = (x - alpha^1)(x - alpha^2)(x - alpha^4)(x - alpha^8)(x - alpha^16)(x - alpha^32)(x - alpha^64)(x - alpha^128).
The right hand side of the equation, when multiplied out, would only contain coefficients from GF(2).
Similarly, the minimal polynomial of alpha^3 would be
f(x) = (x - alpha^3)(x - alpha^6)(x - alpha^12)(x - alpha^24)(x - alpha^48)(x - alpha^96)(x - alpha^192)(x - alpha^129).
Definition 5.6 BCH codes defined over GF(q) with blocklength q^m - 1 are called Primitive BCH codes.
Having developed the necessary mathematical tools, we shall now begin our study of BCH codes. We will develop a method for constructing the generator polynomials of BCH codes that can correct a pre-specified number of random errors, t.
5.4 GENERATOR POLYNOMIALS IN TERMS OF MINIMAL POLYNOMIALS
We know that g(x) is a factor of x^n - 1. Therefore, the generator polynomial of a cyclic code can be written in the form
g(x) = LCM [f_1(x), f_2(x), ..., f_p(x)],   (5.6)
where f_1(x), f_2(x), ..., f_p(x) are the minimal polynomials of the zeros of g(x). Each minimal polynomial corresponds to a zero of g(x) in an extension field. We will design good codes (i.e., determine the generator polynomials) with desirable zeros using this approach.
Let c(x) be a codeword polynomial and e(x) be an error polynomial. Then the received polynomial can be written as
v(x) = c(x) + e(x),   (5.7)
where the polynomial coefficients are in GF(q). Now consider the extension field GF(q^m). Let gamma_1, gamma_2, ..., gamma_p be those elements of GF(q^m) which are the zeros of g(x), i.e., g(gamma_i) = 0 for i = 1, ..., p. Since c(x) = a(x)g(x) for some polynomial a(x), we also have c(gamma_i) = 0 for i = 1, ..., p. Thus,
v(gamma_i) = c(gamma_i) + e(gamma_i) = e(gamma_i)  for i = 1, ..., p.   (5.8)
For a blocklength n, we have
v(gamma_i) = sum_{j=0}^{n-1} e_j gamma_i^j  for i = 1, ..., p.   (5.9)
Thus, we have a set of p equations that involve components of the error pattern only. If it is possible to solve this set of equations for the e_j, the error pattern can be precisely determined. Whether this set of equations can be solved depends on the value of p, the number of zeros of g(x). In order to solve for the error pattern, we must choose the set of p equations properly. If we have to design a t error correcting cyclic code, our choice should be such that the set of equations can solve for at most t non-zero e_j.
Let us define the syndromes S_i = e(gamma_i) for i = 1, ..., p. We wish to choose gamma_1, gamma_2, ..., gamma_p in such a manner that t errors can be computed from S_1, S_2, ..., S_p. If alpha is a primitive element, then the set of gamma_i which allows the correction of t errors is {alpha^1, alpha^2, alpha^3, ..., alpha^{2t}}. Thus, we have a simple mechanism for determining the generator polynomial of a BCH code that can correct t errors.
Steps for Determining the Generator Polynomial of a t-error Correcting BCH Code:
For a primitive blocklength n = q^m - 1:
(i) Choose a prime polynomial of degree m and construct GF(q^m).
(ii) Find f_i(x), the minimal polynomial of alpha^i, for i = 1, ..., 2t.
(iii) The generator polynomial for the t error correcting code is simply
g(x) = LCM [f_1(x), f_2(x), ..., f_{2t}(x)].   (5.10)
Codes designed in this manner can correct at least t errors. In many cases the codes will be able to correct more than t errors. For this reason,
d = 2t + 1   (5.11)
is called the Designed Distance of the code, and the minimum distance d* >= 2t + 1. The generator polynomial has a degree equal to n - k (see Theorem 4.4, Chapter 4). It should be noted that once we fix n and t, we can determine the generator polynomial for the BCH code. The information length k is then decided by the degree of g(x). Intuitively, for a fixed blocklength n, a larger value of t will force the information length k to be smaller (because a higher redundancy will be required to correct a larger number of errors). In the following section, we look at a few specific examples of BCH codes.
5.5 SOME EXAMPLES OF BCH CODES
The following example illustrates the construction of the extension field GF(16) from GF(2).
The minimal polynomials obtained will be used in the subsequent examples.
Example 5.7 Consider the primitive polynomial p(z) = z^4 + z + 1 over GF(2). We shall use this to construct the extension field GF(16). Let alpha = z be the primitive element. The elements of GF(16) as powers of alpha and the corresponding minimal polynomials are listed in Table 5.3.
Table 5.3 The elements of GF(16) and the corresponding minimal polynomials
alpha^1    z                       x^4 + x + 1
alpha^2    z^2                     x^4 + x + 1
alpha^3    z^3                     x^4 + x^3 + x^2 + x + 1
alpha^4    z + 1                   x^4 + x + 1
alpha^5    z^2 + z                 x^2 + x + 1
alpha^6    z^3 + z^2               x^4 + x^3 + x^2 + x + 1
alpha^7    z^3 + z + 1             x^4 + x^3 + 1
alpha^8    z^2 + 1                 x^4 + x + 1
alpha^9    z^3 + z                 x^4 + x^3 + x^2 + x + 1
alpha^10   z^2 + z + 1             x^2 + x + 1
alpha^11   z^3 + z^2 + z           x^4 + x^3 + 1
alpha^12   z^3 + z^2 + z + 1       x^4 + x^3 + x^2 + x + 1
alpha^13   z^3 + z^2 + 1           x^4 + x^3 + 1
alpha^14   z^3 + 1                 x^4 + x^3 + 1
alpha^15   1                       x + 1
Example 5.8 We wish to determine the generator polynomial of a single error correcting BCH code, i.e., t = 1, with a blocklength n = 15. From (5.10), the generator polynomial for a BCH code is given by LCM [f_1(x), f_2(x), ..., f_{2t}(x)]. We will make use of Table 5.3 to obtain the minimal polynomials f_1(x) and f_2(x). Thus, the generator polynomial of the single error correcting BCH code will be
g(x) = LCM [f_1(x), f_2(x)]
     = LCM [(x^4 + x + 1), (x^4 + x + 1)]
     = x^4 + x + 1.
Since deg(g(x)) = n - k, we have n - k = 4, which gives k = 11. Thus we have obtained the generator polynomial of the BCH (15, 11) single error correcting code. The designed distance of this code is d = 2t + 1 = 3. It can be calculated that the minimum distance d* of this code is also 3. Thus, in this case the designed distance is equal to the minimum distance.
Next, we wish to determine the generator polynomial of a double error correcting BCH code, i.e., t = 2, with a blocklength n = 15. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x)]
     = LCM [(x^4 + x + 1), (x^4 + x + 1), (x^4 + x^3 + x^2 + x + 1), (x^4 + x + 1)]
     = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)
     = x^8 + x^7 + x^6 + x^4 + 1.
Since deg(g(x)) = n - k, we have n - k = 8, which gives k = 7. Thus, we have obtained the generator polynomial of the BCH (15, 7) double error correcting code. The designed distance of this code is d = 2t + 1 = 5. It can be calculated that the minimum distance d* of this code is also 5. Thus, in this case the designed distance is equal to the minimum distance.
Next, we determine the generator polynomial for the triple error correcting binary BCH code. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x)]
     = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)
     = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
In this case, deg(g(x)) = n - k = 10, which gives k = 5. Thus we have obtained the generator polynomial of the BCH (15, 5) triple error correcting code. The designed distance of this code is d = 2t + 1 = 7. It can be calculated that the minimum distance d* of this code is also 7. Thus in this case the designed distance is equal to the minimum distance.
Next, we determine the generator polynomial for a binary BCH code for the case t = 4. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x), f_7(x), f_8(x)]
     = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x^4 + x^3 + 1)
     = x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1.
In this case, deg(g(x)) = n - k = 14, which gives k = 1. It can be seen that this is the simple repetition code. The designed distance of this code is d = 2t + 1 = 9. However, it can be seen that the minimum distance d* of this code is 15. Thus in this case the designed distance is not equal to the minimum distance, and the code is over-designed. This code can actually correct (d* - 1)/2 = 7 random errors!
If we repeat the exercise for t = 5, 6 or 7, we get the same generator polynomial (repetition code). Note that there are only 15 non-zero field elements in GF(16) and hence there are only 15 minimal polynomials corresponding to these field elements. Thus, we cannot go beyond t = 7 (because for t = 8 we would need f_16(x), which is undefined). Hence, to obtain BCH codes that can correct a larger number of errors we must use an extension field with more elements!
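The whole construction of Example 5.8 can be automated: build GF(16), collect the conjugates of each alpha^i to get its minimal polynomial, and multiply the distinct minimal polynomials together. A Python sketch (not from the text; GF(16) elements are 4-bit masks, GF(2) polynomials are longer bit masks):

def gf16_mul(u, v):
    # GF(16) multiplication for p(z) = z^4 + z + 1.
    w = 0
    for i in range(4):
        if v >> i & 1:
            w ^= u << i
    for i in range(7, 3, -1):
        if w >> i & 1:
            w ^= 0b10011 << (i - 4)
    return w

alpha = [1]
for _ in range(15):
    alpha.append(gf16_mul(alpha[-1], 2))

def minimal_poly(i):
    # Expand prod (x + alpha^j) over the cyclotomic coset of i; return a GF(2) bit mask.
    coset = {(i * 2 ** r) % 15 for r in range(4)}
    poly = [1]
    for j in coset:
        poly = [a ^ gf16_mul(b, alpha[j]) for a, b in zip(poly + [0], [0] + poly)]
    return sum(bit << (len(poly) - 1 - k) for k, bit in enumerate(poly))

def gf2_poly_mul(a, b):
    w, i = 0, 0
    while b >> i:
        if b >> i & 1:
            w ^= a << i
        i += 1
    return w

for t in (1, 2, 3):
    g = 1
    for f in {minimal_poly(i) for i in range(1, 2 * t + 1)}:   # LCM = product of distinct f_i
        g = gf2_poly_mul(g, f)
    print(t, format(g, 'b'))
# 1 10011         -> x^4 + x + 1
# 2 111010001     -> x^8 + x^7 + x^6 + x^4 + 1
# 3 10100110111   -> x^10 + x^8 + x^5 + x^4 + x^2 + x + 1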
Example 5.9 We can construct GF(16) as an extension field of GF(4) using the primitive polynomial p(z) = z^2 + z + 2 over GF(4). Let the elements of GF(4) consist of the quaternary symbols contained in the set {0, 1, 2, 3}. The addition and multiplication tables for GF(4) are given below for handy reference.
GF(4) addition:            GF(4) multiplication:
+  0 1 2 3                 .  0 1 2 3
0  0 1 2 3                 0  0 0 0 0
1  1 0 3 2                 1  0 1 2 3
2  2 3 0 1                 2  0 2 3 1
3  3 2 1 0                 3  0 3 1 2
Table 5.4 lists the elements of GF(16) as powers of alpha and the corresponding minimal polynomials.
Table 5.4
Powers of alpha    Elements of GF(16)    Minimal polynomials
alpha^1            z                     x^2 + x + 2
alpha^2            z + 2                 x^2 + x + 3
alpha^3            3z + 2                x^2 + 3x + 1
alpha^4            z + 1                 x^2 + x + 2
alpha^5            2                     x + 2
alpha^6            2z                    x^2 + 2x + 1
alpha^7            2z + 3                x^2 + 2x + 2
alpha^8            z + 3                 x^2 + x + 3
alpha^9            2z + 2                x^2 + 2x + 1
alpha^10           3                     x + 3
alpha^11           3z                    x^2 + 3x + 3
alpha^12           3z + 1                x^2 + 3x + 1
alpha^13           2z + 1                x^2 + 2x + 2
alpha^14           3z + 3                x^2 + 3x + 3
alpha^15           1                     x + 1
For t = 1,
g(x) = LCM [f_1(x), f_2(x)]
     = (x^2 + x + 2)(x^2 + x + 3)
     = x^4 + x + 1.
Since deg(g(x)) = n - k, we have n - k = 4, which gives k = 11. Thus we have obtained the generator polynomial of the single error correcting BCH (15, 11) code over GF(4). It takes in 11 quaternary information symbols and encodes them into 15 quaternary symbols. Note that one quaternary symbol is equivalent to two bits. So, in effect, the BCH (15, 11) code takes in 22 input bits and transforms them into 30 encoded bits (can this code be used to correct a burst of length 2 for a binary sequence of length 30?). The designed distance of this code is d = 2t + 1 = 3. It can be calculated that the minimum distance d* of this code is also 3. Thus in this case the designed distance is equal to the minimum distance.
For t = 2,
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x)]
     = LCM [(x^2 + x + 2), (x^2 + x + 3), (x^2 + 3x + 1), (x^2 + x + 2)]
     = (x^2 + x + 2)(x^2 + x + 3)(x^2 + 3x + 1)
     = x^6 + 3x^5 + x^4 + x^3 + 2x^2 + 2x + 1.
This is the generator polynomial of a (15, 9) double error correcting BCH code over GF(4).
For t = 3,
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x)]
     = LCM [(x^2 + x + 2), (x^2 + x + 3), (x^2 + 3x + 1), (x^2 + x + 2), (x + 2), (x^2 + 2x + 1)]
     = (x^2 + x + 2)(x^2 + x + 3)(x^2 + 3x + 1)(x + 2)(x^2 + 2x + 1)
     = x^9 + 3x^8 + 3x^7 + 2x^6 + x^5 + 2x^4 + x + 2.
This is the generator polynomial of a (15, 6) triple error correcting BCH code over GF(4).
Similarly, for t = 4,
g(x) = x^11 + x^10 + 2x^8 + 3x^7 + 3x^6 + x^5 + 3x^4 + x^3 + x + 3.
This is the generator polynomial of a (15, 4) four error correcting BCH code over GF(4).
Similarly, for t = 5,
g(x) = x^12 + 2x^11 + 3x^10 + 2x^9 + 2x^8 + x^7 + 3x^6 + 3x^4 + 3x^3 + x^2 + 2.
This is the generator polynomial of a (15, 3) five error correcting BCH code over GF(4).
Similarly, for t = 6,
g(x) = x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1.
This is the generator polynomial of a (15, 1) six error correcting BCH code over GF(4). As is obvious, this is the simple repetition code, and can correct up to 7 errors.
Table 5.5 lists the generator polynomials of binary BCH codes of length up to 2^5 - 1. Suppose we wish to construct the generator polynomial of the BCH (15, 7) code. From the table we have (111 010 001) for the coefficients of the generator polynomial. Therefore,
g(x) = x^8 + x^7 + x^6 + x^4 + 1.
Table 5.5 The generator polynomials of binary BCH codes of length up to 2^5 - 1
n    k    t    Generator polynomial coefficients
7    4    1    1 011
15   11   1    10 011
15   7    2    111 010 001
15   5    3    10 100 110 111
31   26   1    100 101
31   21   2    11 101 101 001
31   16   3    1 000 111 110 101 111
31   11   5    101 100 010 011 011 010 101
31   6    7    11 001 011 011 110 101 000 100 111
5.6 DECODING OF BCH CODES
So far we have learnt to obtain the generator polynomial for a BCH code given the number of
random errors to be corrected. With the knowledge of the generator polynomial, very fast
encoders can be built in hardware. We now shift our attention to the decoding of the BCH
codes. Since the BCH codes are a subclass of the cyclic codes, any standard decoding procedure
for cyclic codes is also applicable to BCH codes. However, better, more efficient algorithms
have been designed specifically for BCH codes. We discuss the Gorenstein-Zierler decoding
algorithm, which is the generalized form of the binary decoding algorithm first proposed by
Peterson.
We develop here the decoding algorithm for a t error correcting BCH code. Suppose a BCH code is constructed based on the field element alpha. Consider the error polynomial
e(x) = e_{n-1} x^{n-1} + e_{n-2} x^{n-2} + ... + e_1 x + e_0,   (5.12)
where at most t coefficients are non-zero. Suppose that nu errors actually occur, where 0 <= nu <= t. Let these errors occur at locations i_1, i_2, ..., i_nu. The error polynomial can then be written as
e(x) = e_{i_1} x^{i_1} + e_{i_2} x^{i_2} + ... + e_{i_nu} x^{i_nu},   (5.13)
where e_{i_k} is the magnitude of the kth error. Note that we are considering the general case. For binary codes, e_{i_k} = 1. For error correction, we must know two things:
(i) where the errors have occurred, i.e., the error locations, and
(ii) what the magnitudes of these errors are.
Thus, the unknowns are i_1, i_2, ..., i_nu and e_{i_1}, e_{i_2}, ..., e_{i_nu}, which signify the locations and the magnitudes of the errors respectively. The syndrome can be obtained by evaluating the received polynomial at alpha:
S_1 = v(alpha) = c(alpha) + e(alpha) = e(alpha)
    = e_{i_1} alpha^{i_1} + e_{i_2} alpha^{i_2} + ... + e_{i_nu} alpha^{i_nu}.   (5.14)
Next, define the error magnitudes Y_k = e_{i_k} for k = 1, 2, ..., nu, and the error locations X_k = alpha^{i_k} for k = 1, 2, ..., nu, where i_k is the location of the kth error and X_k is the field element associated with this location. Now, the syndrome can be written as
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_nu X_nu.   (5.15)
We can evaluate the received polynomial at each of the powers of alpha that has been used to define g(x). We define the syndromes for j = 1, 2, ..., 2t by
S_j = v(alpha^j) = c(alpha^j) + e(alpha^j) = e(alpha^j).   (5.16)
Thus, we have the following set of 2t simultaneous equations, with nu unknown error locations X_1, X_2, ..., X_nu and the nu unknown error magnitudes Y_1, Y_2, ..., Y_nu:
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_nu X_nu
S_2 = Y_1 X_1^2 + Y_2 X_2^2 + ... + Y_nu X_nu^2   (5.17)
...
S_2t = Y_1 X_1^{2t} + Y_2 X_2^{2t} + ... + Y_nu X_nu^{2t}
Next, define the error locator polynomial
Lambda(x) = Lambda_nu x^nu + Lambda_{nu-1} x^{nu-1} + ... + Lambda_1 x + 1.   (5.18)
The zeros of this polynomial are the inverse error locations X_k^{-1} for k = 1, 2, ..., nu. That is,
Lambda(x) = (1 - x X_1)(1 - x X_2) ... (1 - x X_nu).   (5.19)
So, if we know the coefficients of the error locator polynomial Lambda(x), we can obtain the error locations X_1, X_2, ..., X_nu. After some algebraic manipulations we obtain
Lambda_1 S_{j+nu-1} + Lambda_2 S_{j+nu-2} + ... + Lambda_nu S_j = - S_{j+nu}  for j = 1, 2, ..., nu.   (5.20)
This is nothing but a set of linear equations that relate the syndromes to the coefficients of Lambda(x). This set of equations can be written in the matrix form as follows:
[ S_1   S_2     ...  S_{nu-1}   S_nu      ] [ Lambda_nu     ]   [ -S_{nu+1} ]
[ S_2   S_3     ...  S_nu       S_{nu+1}  ] [ Lambda_{nu-1} ] = [ -S_{nu+2} ]
[ ...                                     ] [ ...           ]   [ ...       ]
[ S_nu  S_{nu+1} ... S_{2nu-2}  S_{2nu-1} ] [ Lambda_1      ]   [ -S_{2nu}  ]
(5.21)
The values of the coefficients of the error locator polynomial can be determined by inverting the syndrome matrix. This is possible only if the matrix is non-singular. It can be shown that this matrix is non-singular if there are nu errors.
Steps for Decoding BCH Codes
(i) As a trial value, set nu = t and compute the determinant of the matrix of syndromes, M. If the determinant is zero, set nu = t - 1. Again compute the determinant of M. Repeat this process until a value of nu is found for which the determinant of the matrix of syndromes is non-zero. This value of nu is the actual number of errors that occurred.
(ii) Invert the matrix M and find the coefficients of the error locator polynomial Lambda(x).
(iii) Solve Lambda(x) = 0 to obtain the zeros and from them compute the error locations X_1, X_2, ..., X_nu. If it is a binary code, stop (because the magnitudes of the errors are unity).
(iv) If the code is not binary, go back to the system of equations:
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_nu X_nu
S_2 = Y_1 X_1^2 + Y_2 X_2^2 + ... + Y_nu X_nu^2
...
S_2t = Y_1 X_1^{2t} + Y_2 X_2^{2t} + ... + Y_nu X_nu^{2t}
Since the error locations are now known, these form a set of 2t linear equations. These can be solved to obtain the error magnitudes.
Solving for the Lambda_i by inverting the nu x nu matrix can be computationally expensive. The number of computations required is proportional to nu^3. If we need to correct a large number of errors (i.e., a large nu), we need more efficient ways to solve the matrix equation. Various refinements have been found which greatly reduce the computational complexity. It can be seen that the nu x nu matrix is not arbitrary in form. The entries in its diagonal perpendicular to the main diagonal are all identical. This property is called persymmetry. This structure was exploited by Berlekamp (1968) and Massey (1969) to find a simpler solution to the system of equations.
The simplest way to search for the zeros of Lambda(x) is to test all the field elements one by one. This method of exhaustive search is known as the Chien search.
Example 5.10 Consider the BCH (15, 5) triple error correcting code with the generator polynomial
g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
Let the all-zero codeword be transmitted and the received polynomial be v(x) = x^5 + x^3. Thus, there are two errors, at the locations corresponding to x^3 and x^5. The error polynomial is e(x) = x^5 + x^3. But the decoder does not know this. It does not even know how many errors have actually occurred. We use the Gorenstein-Zierler decoding algorithm. First we compute the syndromes using the arithmetic of GF(16):
S_1 = alpha^5 + alpha^3 = alpha^11
S_2 = alpha^10 + alpha^6 = alpha^7
S_3 = alpha^15 + alpha^9 = alpha^7
S_4 = alpha^20 + alpha^12 = alpha^14
S_5 = alpha^25 + alpha^15 = alpha^5
S_6 = alpha^30 + alpha^18 = alpha^14
First set nu = t = 3, since this is a triple error correcting code. For nu = 3, the matrix of syndromes is
M = [ S_1 S_2 S_3 ]
    [ S_2 S_3 S_4 ]
    [ S_3 S_4 S_5 ]
Det(M) = 0, which implies that fewer than 3 errors have occurred. Next, set nu = 2, for which
M = [ S_1 S_2 ]
    [ S_2 S_3 ]
Det(M) is non-zero, which implies that 2 errors have actually occurred. We next calculate M^{-1}. It so happens that in this case Det(M) = 1, so that
M^{-1} = [ S_3 S_2 ]
         [ S_2 S_1 ]
Solving for Lambda_1 and Lambda_2 we get Lambda_2 = alpha^8 and Lambda_1 = alpha^11. Thus,
Lambda(x) = alpha^8 x^2 + alpha^11 x + 1 = (alpha^5 x + 1)(alpha^3 x + 1).
Thus, the recovered error locations are alpha^5 and alpha^3. Since the code is binary, the error magnitudes are 1. Thus, e(x) = x^5 + x^3.
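The arithmetic of Example 5.10 is easy to mechanize. The Python sketch below (not from the text) builds GF(16) log/antilog tables for p(z) = z^4 + z + 1, computes the six syndromes for e(x) = x^5 + x^3, solves the 2 x 2 system of (5.21) by Cramer's rule and finds the roots of Lambda(x) by a Chien-style search:

exp = [1]
for _ in range(15):
    e = exp[-1] << 1
    exp.append(e ^ 0b10011 if e & 0b10000 else e)     # antilog table: exp[i] = alpha^i
log = {exp[i]: i for i in range(15)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else exp[(log[a] + log[b]) % 15]

S = [exp[5 * j % 15] ^ exp[3 * j % 15] for j in range(1, 7)]   # S_j = e(alpha^j)
print([log[s] for s in S])                      # [11, 7, 7, 14, 5, 14] -> powers of alpha

S1, S2, S3, S4 = S[:4]
det = mul(S1, S3) ^ mul(S2, S2)                 # determinant of [[S1, S2], [S2, S3]]
inv = exp[(15 - log[det]) % 15]                 # its inverse (non-zero, so 2 errors occurred)
L2 = mul(inv, mul(S3, S3) ^ mul(S2, S4))        # Cramer's rule; minus signs vanish in char. 2
L1 = mul(inv, mul(S1, S4) ^ mul(S2, S3))
print(log[L2], log[L1])                         # 8 11 -> Lambda(x) = a^8 x^2 + a^11 x + 1

roots = [i for i in range(15)
         if mul(L2, mul(exp[i], exp[i])) ^ mul(L1, exp[i]) ^ 1 == 0]
print([(15 - i) % 15 for i in roots])           # [5, 3] -> errors at x^5 and x^3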
In the next section, we will study the famous Reed-Solomon codes, an important sub-class of BCH codes.
5.7 REED-SOLOMON CODES
Reed-Solomon (RS) Codes are an important subclass of the non-binary BCH with a wide range
of applications in digital communications and data storage. The typical application areas of the
RS code are
• Storage devices (including tape, Compact Disk, DVD, barcodes, etc),
• Wireless or mobile communication (including cellular telephones, microwave links, etc),
• Satellite communication,
• Digital television / Digital Video Broadcast (DVB),
• High-speed modems such as those employing ADSL, xDSL, etc.
It all began with a five-page paper that appeared in 1960 in the Journal of the Society for Industrial and Applied Mathematics. The paper, "Polynomial Codes over Certain Finite Fields" by Irving S. Reed and Gustave Solomon of MIT's Lincoln Laboratory, introduced the ideas that form a significant portion of current error correcting techniques for everything from computer hard disk drives to CD players. Reed-Solomon codes (plus a lot of engineering wizardry, of course) made possible the stunning pictures of the outer planets sent back by Voyager II. They make it possible to scratch a compact disc and still enjoy the music. And in the not-too-distant future, they will enable the profit mongers of cable television to squeeze more than 500 channels into their systems.
An RS coding system is based on groups of bits, such as bytes, rather than individual 0s and 1s, making it particularly good at dealing with bursts of errors: six consecutive bit errors, for example, can affect at most two bytes. Thus, even a double-error-correcting version of a Reed-Solomon code can provide a comfortable safety factor. Current implementations of Reed-Solomon codes in CD technology are able to cope with error bursts as long as 4000 consecutive bits.
In this sub-class of BCH codes, the symbol field GF(q) and the error locator field GF(q^m) are
the same, i.e., m = 1. Thus, in this case
n = q^m - 1 = q - 1     (5.22)
The minimal polynomial of any element b in the same field GF(q) is
f_b(x) = x - b     (5.23)
Since the symbol field (sub-field) and the error locator field (extension field) are the same, all
the minimal polynomials are linear. The generator polynomial for a t error correcting code will
be simply
g(x) = LCM[f1(x), f2(x), ..., f_2t(x)]
     = (x - a)(x - a^2) ... (x - a^(2t-1))(x - a^2t)     (5.24)
Hence, the degree of the generator polynomial will always be 2t. Thus, the RS code satisfies
n - k = 2t     (5.25)
In general, the generator polynomial of an RS code can be written as
g(x) = (x - a^i)(x - a^(i+1)) ... (x - a^(i+2t-1))     (5.26)
Example 5.11 Consider the double error correcting RS code of blocklength 15 over GF(16).
Here t = 2. We use here the elements of the extension field GF(16) constructed from GF(2) using
the primitive polynomial p(z) = z^4 + z + 1. The generator polynomial can be written as
g(x) = (x - a)(x - a^2)(x - a^3)(x - a^4)
     = x^4 + (a^3 + a^2 + 1)x^3 + (a^3 + a^2)x^2 + a^3 x + (a^2 + a + 1)
     = x^4 + a^13 x^3 + a^6 x^2 + a^3 x + a^10
Here n - k = 4, which implies k = 11. Thus, we have obtained the generator polynomial of an RS
(15, 11) code over GF(16). Note that this coding procedure takes in 11 symbols (equivalent to 4 x
11 = 44 bits) and encodes them into 15 symbols (equivalent to 60 bits).
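The expansion of the generator polynomial can also be verified numerically. Below is a minimal sketch (not from the book) that constructs GF(16) from p(z) = z^4 + z + 1 and multiplies out (x - a)(x - a^2)(x - a^3)(x - a^4).

# Minimal sketch: the RS (15, 11) generator polynomial of Example 5.11.
PRIM, M = 0b10011, 4
N = (1 << M) - 1

exp_t, log_t = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    exp_t[i], log_t[x] = x, i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
for i in range(N, 2 * N):
    exp_t[i] = exp_t[i - N]

def mul(a, b):                        # GF(16) multiplication via the log tables
    return 0 if a == 0 or b == 0 else exp_t[log_t[a] + log_t[b]]

def poly_mul(p, q):                   # coefficient lists, lowest degree first
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= mul(a, b)
    return r

g = [1]
for i in range(1, 5):                 # multiply the factors (x - a^i) = (a^i + x)
    g = poly_mul(g, [exp_t[i], 1])

print(["a^%d" % log_t[c] if c else "0" for c in reversed(g)])
# prints ['a^0', 'a^13', 'a^6', 'a^3', 'a^10'], i.e. x^4 + a^13 x^3 + a^6 x^2 + a^3 x + a^10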
Theorem 5.2 A Reed-Solomon Code is a Maximum Distance Separable (MDS) Code
and its minimum distance is n - k + 1.
Proof Let the designed distance of the RS code be d = 2t + 1. The minimum distance d*
satisfies the condition
d* >= d = 2t + 1
But, for an RS code, 2t = n - k. Hence,
d* >= d = n - k + 1.
But, by the Singleton Bound for any linear code,
d* <= n - k + 1.
Thus, d* = n - k + 1, and the minimum distance d* = d, the designed distance of the code.
Since RS codes are maximum distance separable (MDS), all of the possible code words are
as far away as possible algebraically in the code space. It implies a uniform code word
distribution in the code space.
Table 5.6 lists the parameters of some RS codes. Note that for a given minimum distance, in
order to have a high code rate, one must work with larger Galois Fields.
Table 5.6 Some RS code parameters

m    q = 2^m    n = q - 1    t     k     d     r = k/n
2    4          3            1     1     3     0.3333
3    8          7            1     5     3     0.7143
                             2     3     5     0.4286
                             3     1     7     0.1429
4    16         15           1     13    3     0.8667
                             2     11    5     0.7333
                             3     9     7     0.6000
                             4     7     9     0.4667
                             5     5     11    0.3333
                             6     3     13    0.2000
                             7     1     15    0.0667
5    32         31           1     29    3     0.9355
                             5     21    11    0.6774
                             8     15    17    0.4839
8    256        255          5     245   11    0.9608
                             15    225   31    0.8824
                             50    155   101   0.6078
Example 5.12 A popular Reed-Solomon code is RS(255, 223) with 8-bit symbols (bytes), i.e.,
over GF(256). Each codeword contains 255 code word bytes, of which 223 bytes are data and 32
bytes are parity. For this code, n = 255, k = 223 and n - k = 32. Hence, 2t = 32, or t = 16. Thus, the
decoder can correct any 16 symbol random error in the codeword, i.e., errors in up to 16 bytes
anywhere in the codeword can be corrected.
Example 5.13 Reed-Solomon error correction codes have an extremely pronounced effect on the
efficiency of a digital communication channel. For example, an operation running at a data rate of
1 million bytes per second will carry approximately 4000 blocks of 255 bytes each second. If 1000
random short errors (less than 17 bits in length) per second are injected into the channel, about 600
to 800 blocks per second would be corrupted, which might require retransmission of nearly all of
the blocks. By applying the Reed-Solomon (255, 235) code (that corrects up to 10 errors per block
of 235 information bytes and 20 parity bytes), the typical time between blocks that cannot be
corrected and would require retransmission will be about 800 years. The mean time between
incorrectly decoded blocks will be over 20 billion years!
5.8 IMPLEMENTATION OF REED-SOLOMON ENCODERS AND DECODERS
Hardware Implementation
A number of commercial hardware implementations exist for RS codes. Many existing systems
use off-the-shelf integrated circuits that encode and decode Reed-Solomon codes. These ICs
tend to support a certain amount of programmability, for example, RS(255, k) where t = 1 to 16
symbols. The recent trend has been towards VHDL or Verilog Designs (logic cores or
intellectual property cores). These have a number of advantages over standard ICs. A logic core
can be integrated with other VHDL or Verilog components and synthesized to an FPGA (Field
Programmable Gate Array) or ASIC (Application Specific Integrated Circuit); this enables so-called
"System on Chip" designs where multiple modules can be combined in a single IC.
Depending on production volumes, logic cores can often give significantly lower system costs
than standard ICs. By using logic cores, a designer avoids the potential need to do a life-time
buy of a Reed-Solomon IC.
Software Implementation
Until recently, software implementations in "real-time" required too much computational power
for all but the simplest of Reed-Solomon codes (i.e., codes with small values of t). The major
difficulty in implementing Reed-Solomon codes in software is that general purpose processors
do not support Galois Field arithmetic operations. For example, to implement a Galois Field
multiply in software requires a test for 0, two log table look-ups, modulo add and anti-log table
look-up. However, careful design together with increases in processor performance means that
software implementations can operate at relatively high data rates. Table 5.7 gives sample
benchmark figures on a 1.6 GHz Pentium PC. These data rates are for decoding only. Encoding
is considerably faster since it requires less computation.
Table 5.7 Sample benchmark figures for software decoding of some RS codes

Code            Data Rate      t
RS(255, 251)    ~ 120 Mbps     2
RS(255, 239)    ~ 30 Mbps      8
RS(255, 223)    ~ 10 Mbps      16
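The Galois Field multiply mentioned above (test for zero, two log table look-ups, an exponent addition and one antilog look-up) is easy to sketch. The fragment below is an illustrative implementation only, assuming the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 for GF(256); practical RS codecs may use a different polynomial.

# Minimal sketch of a GF(256) multiply using log/antilog tables.
PRIM_POLY = 0x11d                     # x^8 + x^4 + x^3 + x^2 + 1 (an assumed choice)
EXP = [0] * 512                       # antilog table, doubled to avoid a modulo on lookup
LOG = [0] * 256

x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1                           # multiply by alpha
    if x & 0x100:                     # reduce modulo the primitive polynomial
        x ^= PRIM_POLY
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf256_mul(a, b):
    # test for 0, two log look-ups, add the exponents, one antilog look-up
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf256_mul(0x57, 0x13))          # example product of two field elements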
5.9 NESTED CODES
One of the ways to achieve codes with large blocklengths is to nest codes. This technique
combines a code of a small alphabet size and one of a larger alphabet size. Let a block of q-ary
symbols be of length kK. This block can be broken up into K sub-blocks of k symbols. Each sub-block
can be viewed as an element of a q^k-ary alphabet. A sequence of K such sub-blocks can be
encoded with an (N, K) code over GF(q^k). Now, each of the N q^k-ary symbols can be viewed as
k q-ary symbols and can be coded with an (n, k) q-ary code. Thus, a nested code has two distinct
levels of coding. This method of generating a nested code is given in Fig. 5.1.
Fig. 5.1 Nesting of Codes. [Block diagram: the data passes through an Outer Encoder, an (N, K) code over GF(q^k), then through an Inner Encoder, an (n, k) code over GF(q), and over the q-ary channel; an Inner Decoder and an Outer Decoder reverse the two levels. The inner encoder, q-ary channel and inner decoder together form a q^k-ary super channel.]
Example 5.14 The following two codes can be nested to form a code with a larger blocklength.
Inner code: The RS (7, 3) double error correcting code over GF(8).
Outer code: The RS (511, 505) triple error correcting code over GF(8^3).
On nesting these codes we obtain a (3577, 1515) code over GF(8). This code can correct any
random pattern of 11 errors. The codeword is 3577 symbols long, where the symbols are the
elements of GF(8).
Example 5.15 RS codes are extensively used in compact discs (CD) for error correction.
Below we give the standard Compact Disc digital format.
Sampling frequency: 44.1 kHz, i.e., 10% margin with respect to the Nyquist frequency (audible
frequencies below 20 kHz)
Quantization: 16-bit linear => theoretical SNR about 98 dB (for sinusoidal signal with maximum
allowed amplitude), 2's complement
Signal format: Audio bit rate 1.41 Mbit/s (44.1 kHz x 16 bits x 2 channels), Cross Interleave
Reed-Solomon Code (CIRC), total data rate (CIRC, sync, subcode) 2.034 Mbit/s.
Playing time: Maximum 74.7 min.
Disc specifications: Diameter 120 mm, thickness 1.2 mm, track pitch 1.6 um, one side medium,
disc rotates clockwise, signal is recorded from inside to outside, constant linear velocity (CLV),
recording maximizes recording density (the speed of revolution of the disc is not constant; it
gradually decreases from 500 to 200 r/min), pit is about 0.5 um wide, each pit edge is '1' and all
areas in between, whether inside or outside a pit, are '0's.
Error Correction: A typical error rate of a CD system is 10^-5, which means that a data error
occurs roughly 20 times per second (bit rate x BER). About 200 errors/s can be corrected.
Sources of errors: Dust, scratches, fingerprints, pit asymmetry, bubbles or defects in substrate,
coating defects and dropouts.
Cross Interleave Reed-Solomon Code (CIRC)
• C2 can effectively correct burst errors.
• C1 can correct random errors and detect burst errors.
• Three interleaving stages to encode data before it is placed on a disc.
• Parity checking to correct random errors.
• Cross interleaving to permit parity to correct burst errors.
1. Input stage: 12 words (16-bit, 6 words per channel) of data per input frame divided into 24
symbols of 8 bits.
2. C2 Reed-Solomon code: 24 symbols of data are encoded into a (28, 24) RS code and 4
parity symbols are used for error correction.
3. Cross interleaving: to guard against burst errors, separate error correction codes, one code
can check the accuracy of another, error correction is enhanced.
4. C1 Reed-Solomon code: cross-interleaved 28 symbols of the C2 code are encoded again
into a (32, 28) RS code (4 parity symbols are used for error correction).
5. Output stage: half of the code word is subject to a 1-symbol delay to avoid 2-symbol error
at the boundary of symbols.
Performance of CIRC: Both RS coders (C1 and C2) have four parities, and their minimum distance
is 5. If the error location is not known, up to two symbols can be corrected. If the errors exceed the
correction limit, they are concealed by interpolation. Since even-numbered sampled data and odd-numbered
sampled data are interleaved as much as possible, CIRC can conceal long burst errors by
simple linear interpolation.
• Maximum correctable burst length is about 4000 data bits (2.5 mm track length).
• Maximum correctable burst length by interpolation in the worst case is about 12320 data
bits (7.7 mm track length).
The sample interpolation rate is one sample every 10 hours at BER (Bit Error Rate) = 10^-4 and 1000
samples at BER = 10^-3. Undetectable error samples (clicks) occur less than once every 750 hours at
BER = 10^-3 and are negligible at BER = 10^-4.
5.10 CONCLUDING REMARKS
The class of BCH codes was discovered independently by Hocquenghem in 1959 and Bose
and Ray-Chaudhuri in 1960. The BCH codes constitute one of the most important and powerful
classes of linear block codes, which are cyclic.
The Reed-Solomon codes were discovered by Irving S. Reed and Gustave Solomon who
published a five-page paper in the journal of the Society for Industrial and Applied Mathematics
in 1960 titled "Polynomial Codes over Certain Finite Fields". Despite their advantages, Reed-
Solomon codes did not go into use immediately after their invention. They had to wait for the
hardware technology to catch up. In 1960, there was no such thing as fast digital electronics, at
least not by today's standards. The Reed-Solomon paper suggested some nice ways to process
data, but nobody knew if it was practical or not, and in 1960 it probably wasn't practical.
Eventually technology did catch up, and numerous researchers began to work on
implementing the codes. One of the key individuals was Elwyn Berlekamp, a professor of
electrical engineering at the University of California at Berkeley, who invented an efficient
algorithm for decoding the Reed-Solomon code. Berlekamp's algorithm was used by Voyager
II and is the basis for decoding in CD players. Many other bells and whistles (some of
fundamental theoretic significance) have also been added. Compact discs, for example, use a
version called cross-interleaved Reed-Solomon code, or CIRC.
SUMMARY
• A primitive element of GF(q) is an element a such that every field element except zero
can be expressed as a power of a. A field can have more than one primitive element.
• A primitive polynomial p(x) over GF(q) is a prime polynomial over GF(q) with the
property that in the extension field constructed modulo p(x), the field element
represented by x is a primitive element.
• A blocklength n of the form n = q^m - 1 is called a primitive blocklength for a code over
GF(q). A cyclic code over GF(q) of primitive blocklength is called a primitive cyclic code.
• It is possible to factor x^(q^m - 1) - 1 in the extension field GF(q^m) to get
x^(q^m - 1) - 1 = PROD_j (x - b_j), where b_j ranges over all the non-zero elements of GF(q^m).
This implies that each of the polynomials f_i(x) can be represented in GF(q^m) as a product
of some of the linear terms, and each b_j is a zero of exactly one of the f_i(x). This f_i(x)
is called the minimal polynomial of b_j.
• Two elements of GF(q^m) that share the same minimal polynomial over GF(q) are called
conjugates with respect to GF(q).
• BCH codes defined over GF(q) with blocklength q^m - 1 are called primitive BCH codes.
• To determine the generator polynomial of a t-error correcting BCH code for a primitive
blocklength n = q^m - 1: (i) choose a prime polynomial of degree m and construct GF(q^m),
(ii) find f_i(x), the minimal polynomial of a^i for i = 1, ..., 2t, (iii) obtain the generator
polynomial g(x) = LCM[f1(x), f2(x), ..., f_2t(x)]. Codes designed in this manner can correct
at least t errors. In many cases the codes will be able to correct more than t errors. For
this reason, d = 2t + 1 is called the designed distance of the code, and the minimum
distance d* >= 2t + 1. (A short construction sketch, in the form of a small program, is
given at the end of this summary.)
• Steps for decoding BCH codes:
(1) As a trial value, set v = t and compute the determinant of the matrix of syndromes, M.
If the determinant is zero, set v = t - 1. Again compute the determinant of M. Repeat
this process until a value of v is found for which the determinant of the matrix of
syndromes is non zero. This value of v is the actual number of errors that occurred.
(2) Invert the matrix M and find the coefficients of the error locator polynomial A(x).
(3) Solve A(x) = 0 to obtain the zeros and from them compute the error locations X1, X2,
..., Xv. If it is a binary code, stop (because the magnitudes of error are unity).
(4) If the code is not binary, go back to the system of equations:
S1 = Y1 X1 + Y2 X2 + ... + Yv Xv
S2 = Y1 X1^2 + Y2 X2^2 + ... + Yv Xv^2
...
S2t = Y1 X1^2t + Y2 X2^2t + ... + Yv Xv^2t
Since the error locations are now known, these form a set of 2t linear equations. These
can be solved to obtain the error magnitudes.
• The generator polynomial for a t-error correcting RS code will be simply g(x) = LCM[f1(x),
f2(x), ..., f_2t(x)] = (x - a)(x - a^2) ... (x - a^(2t-1))(x - a^2t). Hence, the degree of the generator
polynomial will always be 2t. Thus, the RS code satisfies n - k = 2t.
• A Reed-Solomon code is a Maximum Distance Separable (MDS) Code and its minimum
distance is n- k + 1.
• One of the ways to achieve codes with large blocklengths is to nest codes. This technique
combines a code of a small alphabet size and a code of a larger alphabet size. Let a block
of q-ary symbols be of length kK. This block can be broken up into K sub-blocks of k
symbols. Each sub-block can be viewed as an element of a q^k-ary alphabet.
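As a companion to the generator polynomial construction summarised above, the following is a minimal sketch (not the book's program) that carries out the three steps for the triple error correcting binary BCH code of blocklength 15 used in Example 5.10: it builds GF(16) from p(z) = z^4 + z + 1, forms the minimal polynomials of a^1, ..., a^6 from their cyclotomic cosets, and multiplies the distinct ones together.

# Minimal sketch: generator polynomial of the BCH (15, 5), t = 3 code.
M, PRIM = 4, 0b10011
N = (1 << M) - 1                      # blocklength 15

exp_t, log_t = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    exp_t[i], log_t[x] = x, i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
for i in range(N, 2 * N):
    exp_t[i] = exp_t[i - N]

def mul(a, b):
    return 0 if a == 0 or b == 0 else exp_t[log_t[a] + log_t[b]]

def poly_mul(p, q):                   # coefficient lists over GF(16), lowest degree first
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= mul(a, b)
    return r

t = 3
g, used = [1], set()
for i in range(1, 2 * t + 1):
    if i in used:
        continue
    coset, j = set(), i               # exponents sharing the same minimal polynomial
    while j not in coset:
        coset.add(j)
        j = (2 * j) % N
    used |= coset
    f = [1]                           # minimal polynomial f_i(x) = product of (x - a^j)
    for j in coset:
        f = poly_mul(f, [exp_t[j], 1])
    g = poly_mul(g, f)

print(g)  # [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] -> g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1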
When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
(by Sir Arthur Conan Doyle 1859-1930)
PROBLEMS
5.1 Construct GF(9) from GF(3) using an appropriate primitive polynomial.
5.2 (i) Find the generator polynomial g(x) for a single error correcting ternary BCH code of
blocklength 26. What is the code rate of this code? Compare it with the (11, 6) ternary
Golay code with respect to the code rate and the minimum distance.
(ii) Next, find the generator polynomial g(x) for a triple error correcting ternary BCH
code of blocklength 26.
5.3 Find the generator polynomial g(x) for a binary BCH code of blocklength 31. Use the
primitive polynomial p(x) = x^5 + x^2 + 1 to construct GF(32). What is the minimum
distance of this code?
5.4 Find the generator polynomials and the minimum distance for the following codes:
(i) RS (15, 11) code
(ii) RS (15, 7) code
(iii) RS (31, 21) code.
5.5 Show that every BCH code is a subfield subcode of a Reed-Solomon Code of the same
designed distance. Under what condition is the code rate of the BCH code equal to that of
the RS code?
5.6 Consider the code over GF(11) with a parity check matrix
     [ 1   1    1    ...  1    ]
H =  [ 1   2    3    ...  10   ]
     [ 1   2^2  3^2  ...  10^2 ]
(i) Find the minimum distance of this code.
(ii) Show that this is an optimal code with respect to the Singleton Bound.
5.7 Consider the code over GF(11) with a parity check matrix
     [ 1   1    1    ...  1    ]
     [ 1   2    3    ...  10   ]
H =  [ 1   2^2  3^2  ...  10^2 ]
     [ 1   2^3  3^3  ...  10^3 ]
     [ 1   2^4  3^4  ...  10^4 ]
     [ 1   2^5  3^5  ...  10^5 ]
(i) Show that the code is a triple error correcting code.
(ii) Find the generator polynomial for this code.
COMPUTER PROBLEMS
5.8 Write a computer program which takes in the coefficients of a primitive polynomial, the
values of q and m, and then constructs the extension field GF(q^m).
5.9 Write a computer program that performs addition and multiplication over GF(2m),
where m is an integer.
5.10 Find the generator polynomial g(x) for a binary BCH code of blocklength 63. Use the
primitive polynomial p(x) = x^6 + x + 1 to construct GF(64). What is the minimum distance
of this code?
5.11 Write a program that performs BCH decoding given n, q, t and the received vector.
5.12 Write a program that outputs the generator polynomial of the Reed-Solomon code with
the codeword length n and the message length k. A valid n should be 2^M - 1, where M is
an integer not less than 3. The program should also list the minimum distance of the
code.
5.13 Write a computer program that performs the two level RS coding as done in a standard
compact disc.
6
Convolutional Codes
The shortest path between two truths in the real domain passes through the complex domain.
Jacques Hadamard (1865-1963)
6.1 INTRODUCTION TO CONVOLUTIONAL CODES
So far we have studied block codes, where a block of k information symbols is encoded into a
block of n coded symbols. There is always a one-to-one correspondence between the uncoded
block of symbols (information word) and the coded block of symbols (codeword). This method
is particularly useful for high data rate applications, where the incoming stream of uncoded data
is first broken into blocks, encoded, and then transmitted (Fig. 6.1). A large blocklength is
important because of the following reasons.
(i) Many of the good codes that have large distance properties are of large blocklengths
(e.g., the RS codes),
(ii) Larger blocklengths imply that the encoding overhead is small.
However, very large blocklengths have the disadvantage that unless the entire block of
encoded data is received at the receiver, the decoding procedure cannot start, which may result
in delays. In contrast, there is another coding scheme in which much smaller blocks of uncoded
data of length k0 are used. These are called Information Frames. An information frame
typically contains just a few symbols, and can have as few as just one symbol! These information
frames are encoded into Codeword Frames of length n0. However, just one information
frame is not used to obtain the codeword frame. Instead, the current information frame together with the
previous m information frames is used to obtain a single codeword frame. This implies that
such encoders have memory, which retains the previous m incoming information frames. The
codes that are obtained in this fashion are called Tree Codes. An important sub-class of Tree
Codes, used frequently in practice, is called Convolutional Codes. Up to now, all the decoding
techniques discussed are algebraic and are memoryless, i.e. decoding decisions are based only
on the current codeword. Convolutional codes make decisions based on past information, i.e.
memory is required.
Fig. 6.1 Encoding Using a Block Encoder.
In this chapter, we start with an introduction to Tree and Trellis Codes. We will then
develop the necessary mathematical tools to construct convolutional codes. We will see that
convolutional codes can be easily represented by polynomials. Next, we will give a matrix
description of convolutional codes. The chapter goes on to discuss the famous Viterbi
Decoding Technique. We shall conclude this chapter by giving an introduction to Turbo
Coding and Decoding.
6.2 TREE CODES AND TRELLIS CODES
We assume that we have an infinitely long stream of incoming symbols (thanks to the volumes
of information sent these days, it is not a bad assumption!). This stream of symbols is first
broken up into segments of k0 symbols. Each segment is called an Information Frame, as
mentioned earlier. The encoder consists of two parts (Fig. 6.2):
(i) memory, which basically is a shift register,
(ii) a logic circuit.
The memory of the encoder can store m information frames. Each time a new information
frame arrives, it is shifted into the shift register and the oldest information frame is discarded. At
the end of any frame time the encoder has m most recent information frames in its memory,
which corresponds to a total of mk0 information symbols.
When a new frame arrives, the encoder computes the codeword frame using this new frame
that has just arrived and the stored previous m frames. The computation of the codeword frame
is done using the logic circuit. This codeword frame is then shifted out. The oldest information
frame in the memory is then discarded and the most recent information frame is shifted in. The
encoder is now ready for the next incoming information frame. Thus, for every information
frame (k0 symbols) that comes in, the encoder generates a codeword frame (n0 symbols). It
should be observed that the same information frame may not generate the same codeword
frame because the codeword frame also depends on the m previous information frames.
Definition 6.1 The Constraint Length of a shift register encoder is defined as the
number of symbols it can store in its memory. We shall give a more formal definition
of constraint length later in this chapter.
If the shift register encoder stores m previous information frames of length k0, the constraint
length of this encoder is v = mk0.
Fig. 6.2 A Shift Register Encoder that Generates a Tree Code.
Definition 6.2 The infinite set of all infinitely long codewords obtained by feeding
every possible input sequence to a shift register encoder is called a (k0, n0) Tree
Code. The rate of this tree code is defined as
R = k0/n0     (6.1)
A more formal definition is that a (k0, n0) Tree Code is a mapping from the set of
semi-infinite sequences of elements of GF(q) into itself such that if for any m, two semi-infinite
sequences agree in the first mk0 components, then their images agree in the
first mn0 components.
Definition 6.3 The Wordlength of a shift register encoder is defined as k = (m + 1)k0.
The Blocklength of a shift register encoder is defined as n = (m + 1)n0 = k(n0/k0).
Note that the code rate R = k0/n0 = k/n. Normally, for practical shift register encoders,
the information frame length k0 is small (usually less than 5). Therefore, it is difficult
to obtain the code rate R of tree codes close to unity, as is possible with block codes
(e.g., RS codes).
Information Theory, Coding and Cryptography
Definition 6.4 A (n0, k0) tree code that is linear, time-invariant, and has a finite
wordlength k = (m + 1)k0 is called an (n, k) Convolutional Code.
Definition 6.5 A (n0, k0) tree code that is time-invariant and has a finite wordlength
k is called an (n, k) Sliding Block Code. Thus, a linear sliding block code is a
convolutional code.
Example 6.1 Consider the convolutional encoder given in Fig. 6.3.
Fig. 6.3 Convolutional Encoder of Example 6.1.
This encoder takes in one bit at a time and encodes it into 2 bits. The information frame length
k0 = 1, the codeword frame length n0 = 2 and the blocklength (m + 1)n0 = 6. The constraint length
of this encoder is v = 2 and the code rate is 1/2. The clock rate of the outgoing data is twice as fast as
that of the incoming data. The adders are binary adders, and from the point of view of circuit
implementation, are simply XOR gates.
Let us assume that the initial state of the shift register is [0 0]. Now, either '0' will come or '1'
will come as the incoming bit. Suppose '0' comes. On performing the logic operations, we see that
the computed value of the codeword frame is [0 0]. The 0 will be pushed into the memory (shift
register) and the rightmost '0' will be dropped. The state of the shift register remains [0 0]. Next,
let '1' arrive at the encoder. Again we perform the logic operations to compute the codeword
frame. This time we obtain [1 1]. So, this will be pushed out as the encoded frame. The incoming
'1' will be shifted into the memory, and the rightmost bit will be dropped. So the new state of the
shift register will be [1 0].
Table 6.1 lists all the possibilities.
Table 6.1 The Incoming and Outgoing Bits of the Convolutional Encoder.

Incoming Bit    Current State of the Encoder    Outgoing Bits
0               0 0                             0 0
1               0 0                             1 1
0               0 1                             1 1
1               0 1                             0 0
0               1 0                             0 1
1               1 0                             1 0
0               1 1                             1 0
1               1 1                             0 1
We observe that there are only 2^2 = 4 possible states of the shift register. So, we can construct
the state diagram of the encoder as shown in Fig. 6.4. The bits associated with each arrow
represent the incoming bit. It can be seen that the same incoming bit gets encoded differently
depending on the current state of the encoder. This is different from the linear block codes
studied in previous chapters where there is always a one-to-one correspondence between the
incoming uncoded block of symbols (information word) and the coded block of symbols
(codeword).
Fig. 6.4 The State Diagram for the Encoder in Example 6.1.
The same information contained in the state diagram can be conveyed usefully in terms of a
graph called the Trellis Diagram. A trellis is a graph whose nodes are in a rectangular grid,
which is semi-infinite to the right. Hence, these codes are also called Trellis Codes. The number
of nodes in each column is finite. The following example gives an illustration of a trellis diagram.
Example 6.2 The trellis diagram for the convolutional encoder discussed in Example 6.1 is given
in Fig. 6.5.
Every node in the trellis diagram represents a state of the shift register. Since the rate of the
encoder is 1/2, one bit at a time is processed by the encoder. The incoming bit is either a '0' or a '1'.
Therefore, there are two branches emanating from each node. The top branch represents the input
as '0' and the lower branch corresponds to '1'. Therefore, labelling is not required for a binary
trellis diagram. In general, one would label each branch with the input symbol to which it
corresponds. Normally, the nodes that cannot be reached by starting at the top left node and moving
only to the right are not shown in the trellis diagram. Corresponding to a certain state and a
particular incoming bit, the encoder will produce an output.
Fig. 6.5 The Trellis Diagram for the Encoder Given in Fig. 6.3.
The output of the encoder is written on top of that branch. Thus, a trellis diagram gives a very easy
method to encode a stream of input data. The encoding procedure using a trellis diagram is as follows.
• We start from the top left node (since the initial state of the encoder is [0 O]).
• Depending on whether a '0' or a '1' comes, we follow the upper or the lower branch to the next
node.
• The encoder output is read out from the top of the branch being traversed.
• Again, depending on whether a '0' or a '1' comes, we follow the upper or the lower branch
from the current node (state).
• Thus, the encoding procedure is simply following the branches on the diagram and reading out
the encoder outputs that are written on top of each branch.
Encoding the bit stream 1 0 0 1 1 0 1 ... gives a trellis diagram as illustrated in Fig. 6.6. The
encoded sequence can be read out from the diagram as 11 01 11 11 10 10 00 ....
It can be seen that there is a one-to-one correspondence between the encoded sequence and a
path in the trellis diagram. Should the decoding procedure, then, just search for the most likely
path in the trellis diagram? The answer is yes, as we shall see further along in this chapter!
Fig. 6.6 Encoding an Input Sequence Using the Trellis Diagram.
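The encoding just carried out on the trellis can be checked with a few lines of code. The sketch below assumes the tap connections implied by Table 6.1 (first output bit i_n + i_(n-2), second output bit i_n + i_(n-1) + i_(n-2)); it is an illustration, not the book's program.

# Minimal sketch of the rate-1/2 shift register encoder of Example 6.1.
def conv_encode(bits):
    s1, s2 = 0, 0                     # s1 = previous input bit, s2 = the bit before that
    frames = []
    for b in bits:
        frames.append((b ^ s2, b ^ s1 ^ s2))   # codeword frame for this input bit
        s1, s2 = b, s1                         # shift the register
    return frames

print(conv_encode([1, 0, 0, 1, 1, 0, 1]))
# [(1, 1), (0, 1), (1, 1), (1, 1), (1, 0), (1, 0), (0, 0)], i.e. 11 01 11 11 10 10 00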
6.3 POLYNOMIAL DESCRIPTION OF CONVOLUTIONAL CODES
(ANALYTICAL REPRESENTATION)
In contrast to the two pictorial representations of the convolutional codes (the state diagram and
the trellis diagram), there is a very useful analytical representation of convolutional codes. The
representation makes use of the delay operator, D. We have earlier seen a one-to-one
correspondence between a word (vector) and a polynomial. The delay operator is used in a
similar manner. For example, consider the word 10100011 with the oldest digit at the left. The
analytical representation (sometimes referred to as the transform) of this information word I(D)
will be
I(D) = 1 + D^2 + D^6 + D^7     (6.2)
The power of the indeterminate D is the number of time units of delay of that digit relative to the chosen
time origin, which is usually taken to coincide with the first bit. In general, the sequence i0, i1, i2,
i3, ... has the representation i0 + i1 D + i2 D^2 + i3 D^3 + ...
A convolutional code over GF(q) with a wordlength k = (m + 1)k0, a blocklength n = (m + 1)n0
and a constraint length v = mk0 can be encoded by sets of finite impulse response (FIR) filters.
Each set of filters consists of n0 FIR filters in GF(q). The input to the encoder is a sequence of k0
symbols, and the output is a sequence of n0 symbols. Figure 6.7 shows an encoder for a binary
convolutional code with k0 = 1 and n0 = 4. Figure 6.8 shows a convolutional filter with k0 = 2 and
n0 = 4.
Each of the FIR filters can be represented by a polynomial of degree <= m. The input stream of
symbols can also be represented by a polynomial. The operation of the filter can then simply be
a multiplication of the two polynomials. Thus, the encoder (and hence the code) can be
represented by a set of polynomials called the generator polynomials of the code. This set
contains k0 n0 polynomials. The largest degree of a polynomial in this set of generator
polynomials is m. We remind the reader that a block code was represented by a single generator
polynomial. Thus we can define a generator polynomial matrix of size k0 x n0 for a
convolutional code as follows.
G(D) = [g_ij(D)]     (6.3)
Fig. 6.7 A Convolutional Encoder in Terms of FIR Filters with k0 = 1 and n0 = 4.
Fig. 6.8 A Convolutional Encoder in Terms of FIR Filters with k0 = 2 and n0 = 4. The Rate of This Encoder is R = 1/2.
Example 6.3 Consider the convolutional encoder given in Fig. 6.9.
Fig. 6.9 The Rate 1/2 Convolutional Encoder with G(D) = [D^2 + D + 1   D^2 + 1].
The first bit of the output is a = i_(n-2) + i_(n-1) + i_n and the second bit of the output is b = i_(n-2) + i_n, where
i_(n-l) represents the input that arrived l time units earlier. Let the input stream of symbols be
represented by a polynomial. We know that multiplying any polynomial by D corresponds to a
single cyclic right-shift of the elements. Therefore,
g11(D) = D^2 + D + 1 and g12(D) = D^2 + 1
and the generator polynomial matrix of this encoder can be written as
G(D) = [D^2 + D + 1   D^2 + 1].
Next, consider the encoder circuit shown in Fig. 6.10.
Fig. 6.10 The Rate 1/2 Convolutional Encoder with G(D) = [1   D^4 + 1].
In this case, a = i_n and b = i_(n-4) + i_n. Therefore, the generator polynomial matrix of this encoder
can be written as
G(D) = [1   D^4 + 1].
Note that the first k0 bits (k0 = 1) of the codeword frame are identical to the information frame.
Hence, this is a Systematic Convolutional Encoder.
Example 6.4 Consider the systematic convolutional encoder represented by the following circuit
(Fig. 6.11).
Fig. 6.11 The Rate 2/3 Convolutional Encoder for Example 6.4.
The generator polynomial matrix of this encoder can be written as
G(D) = [ g11(D)  g12(D)  g13(D) ]  =  [ 1   0   D^3 + D + 1 ]
       [ g21(D)  g22(D)  g23(D) ]     [ 0   1   0           ]
It is easy to write the generator polynomial matrix by visual inspection. The element in the i-th row
and j-th column of the matrix represents the relation between the i-th input bit and the j-th output bit. To
write the generator polynomial for the (i-th, j-th) entry of the matrix, just trace the route from the i-th
input bit to the j-th output bit. If no path exists, the generator polynomial is the zero polynomial, as in
the case of g12(D), g21(D) and g23(D). If only a direct path exists without any delay elements, the
value of the generator polynomial is unity, as in g11(D) and g22(D). If the route from the i-th input bit
to the j-th output bit involves a series of memory elements (delay elements), represent each delay by
an additional power of D, as in g13(D). Note that three of the generator polynomials in the set of
generator polynomials are zero. When k0 is greater than 1, it is not unusual for some of the
generator polynomials to be the zero polynomials.
We can now give the formal definitions of the Wordlength, the Blocklength and the
Constraint Length of a Convolutional Encoder.
Definition 6.6 Given the generator polynomial matrix [g_ij(D)] of a convolutional
code:
(i) The Wordlength of the code is
k = k0 max_(i,j) [deg g_ij(D) + 1].     (6.4)
(ii) The Blocklength of the code is
n = n0 max_(i,j) [deg g_ij(D) + 1].     (6.5)
(iii) The Constraint Length of the code is
v = SUM_(i=1)^(k0) max_j [deg g_ij(D)].     (6.6)
Recall that the input message stream i0, i1, i2, i3, ... has the polynomial representation I(D) = i0
+ i1 D + i2 D^2 + i3 D^3 + ... + i_(k-1) D^(k-1) and the codeword polynomial can be written as C(D) = c0 + c1 D
+ c2 D^2 + c3 D^3 + ... + c_(n-1) D^(n-1). The encoding operation can simply be described as a vector-matrix
product,
C(D) = I(D) G(D)     (6.7)
or equivalently,
c_j(D) = SUM_(i=1)^(k0) i_i(D) g_ij(D).     (6.8)
Observing that the encoding operation can simply be described as vector matrix product, it can
be easily shown that convolutional codes belong to the class of linear codes (exercise).
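Equation (6.8) says that each output stream is just the product of the information polynomial with one generator polynomial. A minimal sketch over GF(2), using the generator polynomials of the encoder in Fig. 6.9 and an arbitrary example input, is given below.

# Minimal sketch of c_j(D) = I(D) g_1j(D) over GF(2) (coefficient lists, lowest power first).
def gf2_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= a & b
    return r

g11 = [1, 1, 1]                       # D^2 + D + 1
g12 = [1, 0, 1]                       # D^2 + 1
i_poly = [1, 0, 0, 1, 1, 0, 1]        # information sequence 1 0 0 1 1 0 1, oldest bit first

c1 = gf2_poly_mul(i_poly, g11)        # first output stream
c2 = gf2_poly_mul(i_poly, g12)        # second output stream
print(c1, c2)

Interleaving the two output streams frame by frame gives the serial codeword produced by the shift register encoder (the two extra trailing frames correspond to flushing the encoder memory with zeros).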
Definition 6.7 A Parity Check Matrix H(D) is an (n0 - k0) by n0 matrix of
polynomials that satisfies
G(D) H(D)^T = 0     (6.9)
and the Syndrome Polynomial vector, which is an (n0 - k0)-component row vector, is
given by
s(D) = v(D) H(D)^T     (6.10)
Definition 6.8 A Systematic Encoder for a convolutional code has the generator
polynomial matrix of the form
G(D) = [ I | P(D) ]     (6.11)
where I is a k0 by k0 identity matrix and P(D) is a k0 by (n0 - k0) matrix of polynomials.
The parity check polynomial matrix for a systematic convolutional encoder is
H(D) = [ -P(D)^T | I ]     (6.12)
where I is an (n0 - k0) by (n0 - k0) identity matrix. It follows that
G(D) H(D)^T = 0     (6.13)
Definition 6.9 A convolutional code whose generator polynomials g1(D), g2(D), ...,
g_n0(D) satisfy
GCD[g1(D), g2(D), ..., g_n0(D)] = D^a     (6.14)
for some a is called a Non-Catastrophic Convolutional Code. Otherwise it is
called a Catastrophic Convolutional Code.
Without loss of generality, one may take a = 0, i.e., D^a = 1. Thus the task of finding a non-catastrophic
convolutional code is equivalent to finding a good set of relatively prime generator
polynomials. Relatively prime polynomials can be easily found by computer searches.
However, what is difficult is to find a set of relatively prime generator polynomials that have
good error correcting capabilities.
Example 6.5 All systematic codes are non-catastrophic because for them g1(D) = 1 and
therefore,
GCD[1, g2(D), ..., g_n0(D)] = 1
Thus the systematic convolutional encoder represented by the generator polynomial matrix
G(D) = [1   D^4 + 1]
is non-catastrophic.
Consider the following generator polynomial matrix of a binary convolutional encoder
G(D) = [D^2 + 1   D^4 + 1]
We observe that (D^2 + 1)^2 = D^4 + (D^2 + D^2) + 1 = D^4 + 1 for a binary encoder (modulo 2 arithmetic).
Hence, GCD[g1(D), g2(D)] = D^2 + 1, which is not equal to 1. Therefore, this is a catastrophic encoder.
Next, consider the generator polynomial matrix of a non-systematic binary convolutional
encoder whose two generator polynomials are relatively prime, i.e., GCD[g1(D), g2(D)] = 1.
Hence this represents a non-catastrophic convolutional encoder.
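The non-catastrophic test of Definition 6.9 is a GCD computation over GF(2), which can be sketched as follows (polynomials are held as integer bit masks, bit i being the coefficient of D^i); the second test pair is an assumed illustration using the generator polynomials of Example 6.3.

# Minimal sketch: GCD of generator polynomials over GF(2).
def gf2_polymod(a, b):                # remainder of a divided by b
    while b and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_polymod(a, b)
    return a

print(bin(gf2_gcd(0b101, 0b10001)))   # D^2 + 1 and D^4 + 1 -> 0b101, catastrophic
print(bin(gf2_gcd(0b111, 0b101)))     # D^2 + D + 1 and D^2 + 1 -> 0b1, non-catastrophic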
In the next section, we see that the distance notion of convolutional codes is an important
parameter that determines the number of errors a convolutional code can correct.
6.4 DISTANCE NOTIONS FOR CONVOLUTIONAL CODES
Recall that, for block codes, the concept of (Hamming) Distance between two codewords
provides a way of quantitatively describing how different the two vectors are, and how a good
code must possess the maximum possible minimum distance. Convolutional codes also have a
distance concept that determines how good the code is.
When a codeword from a convolutional encoder passes through a channel, errors occur from
time to time. The job of the decoder is to correct these errors by processing the received vector.
In principle, the convolutional codeword is infinite in length. However, the decoding decisions
are made on codeword segments of a finite length. The number of symbols that the decoder can
store is called the Decoding Window Width. Regardless of the size of these finite segments
(decoding window width), the previous frames affect the current frame because of the memory
of the encoder. In general, one gets a better performance by increasing the decoding window
width, but one eventually reaches a point of diminishing return.
Most of the decoding procedures for decoding convolutional codes work by focussing on the
errors in the first frame. If this frame can be corrected and decoded, then the first frame of
information is known at the receiver end. The effect of these information symbols on the
subsequent information frames can be computed and subtracted from subsequent codeword
frames. Thus the problem of decoding the second codeword frame is the same as the problem
of decoding the first frame.
We extend this logic further. If the first l frames have been decoded successfully, the problem
of decoding the (l + 1)th frame is the same as the problem of decoding the first frame. But what
happens if a frame in-between was not decoded correctly? If it is possible for a single decoding
error event to induce an infinite number of additional errors, then the decoder is said to be
subject to Error Propagation. In the case where the decoding algorithm is responsible for
error propagation, it is termed as Ordinary Error Propagation. In the case where the poor
choice of catastrophic generator polynomials cause error propagation, we call it Catastrophic
Error Propagation.
Definition 6.10 The l-th minimum distance d_l* of a convolutional code is equal to
the smallest Hamming Distance between any two initial codeword segments l frames
long that are not identical in the initial frame. If l = m + 1, then this (m + 1)th minimum
distance is called the Minimum Distance of the code and is denoted by d*, where m
is the number of information frames that can be stored in the memory of the encoder.
In literature, the minimum distance is also denoted by d_min.
We note here that a convolutional code is a linear code. Therefore, one of the two codewords
used to determine the minimum distance of the code can be chosen to be the all zero codeword.
The l-th minimum distance is then equal to the weight of the smallest-weight codeword segment
l frames long that is non-zero in the first frame (i.e., different from the all zero frame).
Suppose the l-th minimum distance of a convolutional code is d_l*. The code can correct t errors
occurring in the first l frames provided
d_l* >= 2t + 1     (6.15)
Next, put l = m + 1, in which case d_l* = d*_(m+1) = d*. The code can correct t errors occurring in the
first blocklength n = (m + 1)n0 provided
d* >= 2t + 1     (6.16)
Example 6.6 Consider the convolutional encoder of Example 6.1 (Fig. 6.3). The Trellis Diagram
for the encoder is given in Fig. 6.12.
Fig. 6.12 The Trellis Diagram for the Convolutional Encoder of Example 6.1.
In this case d1* = 2, d2* = 3, d3* = 5, d4* = 5, ... We observe that d_l* = 5 for l >= 3. For this encoder,
m = 2. Therefore, the minimum distance of the code is d3* = d* = 5. This code can correct (d* - 1)/2
= 2 random errors that occur in one blocklength, n = (m + 1)n0 = 6.
Definition 6.11 The Free Distance of a convolutional code is given by
d_free = max_l [d_l*]     (6.17)
It follows that d*_(m+1) <= d*_(m+2) <= ... <= d_free.
The term d_free was first coined by Massey in 1969 to denote a type of distance that was found
to be an important parameter for the decoding techniques of convolutional codes. Since d_free
represents the minimum distance between arbitrarily long (possibly infinite) encoded
sequences, d_free is also denoted by d_infinity in literature. The parameter d_free can be directly calculated
from the trellis diagram. The free distance d_free is the minimum weight of a path that deviates
from the all zero path and later merges back into the all zero path at some point further down
the trellis, as depicted in Fig. 6.13. Searching for a code with large minimum distance and large
free distance is a tedious process, and is often done using a computer. Clever techniques have
been designed that reduce the effort by avoiding exhaustive searches. Most of the good
convolutional codes known today have been discovered by computer searches.
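For small encoders the search for d_free can be automated directly on the trellis. The sketch below (an illustration, assuming the tap connections of the Example 6.1 encoder used earlier) runs a shortest-path search for the minimum-weight path that leaves the all-zero state and merges back into it.

# Minimal sketch: d_free by a shortest-path search on the trellis.
import heapq

def step(state, bit):
    s1, s2 = state                               # contents of the shift register
    weight = (bit ^ s2) + (bit ^ s1 ^ s2)        # Hamming weight of the output frame
    return weight, (bit, s1)                     # branch weight, next state

def d_free():
    w0, s0 = step((0, 0), 1)                     # the path must leave with an input '1'
    heap, best = [(w0, s0)], {}
    while heap:
        w, state = heapq.heappop(heap)
        if state == (0, 0):                      # first return to the all-zero state
            return w
        if state in best and best[state] <= w:
            continue
        best[state] = w
        for bit in (0, 1):
            dw, nxt = step(state, bit)
            heapq.heappush(heap, (w + dw, nxt))

print(d_free())                                   # 5 for this encoder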
Definition 6.12 The free length n_free of a convolutional code is the length of the
non-zero segment of a smallest weight convolutional codeword of non-zero weight.
Thus, d_l = d_free if l = n_free, and d_l < d_free if l < n_free. In literature, n_free is also denoted by n_infinity.
Fig. 6.13 The Free Distance d_free Path.
Example 6.7 Consider the convolutional encoder of Example 6.1 (Fig. 6.3). For this encoder,
d_free = 5. There is usually more than one pair of paths that can be used to calculate d_free. The two
paths that have been used to calculate d_free are shown in Fig. 6.14 by double lines. In this example,
d_min = d_free.
Fig. 6.14 Calculating d_free in the Trellis Diagram.
The free length of the convolutional code is n_free = 6. In this example, n_free is equal to the
blocklength n of the code. In general it can be longer than the blocklength.
6.5 THE GENERATING FUNCTION
The performance of a convolutional code depends on its free distance, d_free. Since convolutional
codes are a sub-class of linear codes, the set of Hamming distances between coded sequences is
the same as the set of distances of the coded sequences from the all-zero sequence. Therefore,
we can consider the distance structure of the convolutional codes with respect to the all-zero
sequence without loss of generality. In this section, we shall study an elegant method of
determining the free distance, d_free, of a convolutional code.
To find d_free we need the set of paths that diverge from the all-zero path and merge back at a
later time. The brute force (and time consuming, not to mention, exasperating) method is to
determine the distances of all possible paths from the trellis diagram. Another way to find out
the d_free of a convolutional code is to use the concept of a generating function, whose expansion
provides all the distance information directly. The generating function can be understood by
the following example.
Example 6.8 Consider again the convolutional encoder of Example 6.1 (Fig. 6.3). The state
diagram of the encoder is given in Fig. 6.4. We now construct a modified state diagram as shown in
Fig. 6.15.
The branches of this modified state diagram are labelled by the branch gain D^i, i = 0, 1, 2, where
the exponent of D denotes the Hamming Weight of the branch. Note that the self loop at S0 has
been neglected as it does not contribute to the distance property of the code. Circulating around this
loop simply generates the all zero sequence. Note that S0 has been split into two states, initial and
final. Any path that diverges from state S0 and later merges back to S0 can be thought of
equivalently as traversing along the branches of this modified state diagram, starting from the
initial S0 and ending at the final S0. Hence this modified state diagram encompasses all possible
paths that diverge from and then later merge back to the all zero path in the trellis diagram.
Fig. 6.15 The Modified State Diagram of the Convolutional Encoder Shown in Fig. 6.3.
We can find the distance profile of the convolutional code using the state equations of this modified
state diagram. These state equations are
X1 = D^2 + X2,
X2 = D X1 + D X3,
X3 = D X1 + D X3,
T(D) = D^2 X2,     (6.18)
where the Xi are dummy variables. Upon solving these equations simultaneously, we obtain the
generating function
T(D) = D^5 / (1 - 2D)     (6.19)
Note that the expression for T(D) can also be (easily) obtained by Mason's Gain Formula,
which is well known to the students of Digital Signal Processing.
The following conclusions can be drawn from the series expansion of the generating function:
(i) There are an infinite number of possible paths that diverge from the all zero path and later
merge back again (this is also intuitive).
(ii) There is only one path with Hamming Distance 5, two paths with Hamming Distance 6,
and in general 2^k paths with Hamming Distance k + 5 from the all zero path.
(iii) The free Hamming Distance d_free for this code is 5. There is only one path corresponding to
d_free. Example 6.7 explicitly illustrates the pair of paths that result in d_free = 5.
We now introduce two new labels in the modified state diagram. To enumerate the length of
a given path, the label L is attached to each branch. So every time we traverse along a branch we
increment the path length counter. We also add the label I^i to each branch to enumerate the
Hamming Weight of the input bits associated with each branch of the modified state diagram
(see Fig. 6.16).
Fig. 6.16 The Modified State Diagram to Determine the Augmented Generating Function.
The state equations in this case would be
X1 = D^2 L I + L I X2,
X2 = D L X1 + D L X3,
X3 = D L I X1 + D L I X3,
and the Augmented Generating Function is
T(D, L, I) = D^2 L X2.
On solving these simultaneous equations, we obtain
T(D, L, I) = D^5 L^3 I / [1 - D L (L + 1) I]     (6.20)
           = D^5 L^3 I + D^6 L^4 (L + 1) I^2 + ... + D^(k+5) L^(3+k) (L + 1)^k I^(k+1) + ...     (6.21)
Further conclusions from the series expansion of the augmented generating function are:
(i) The path with the minimum Hamming Distance of 5 has length equal to 3.
(ii) The input sequence corresponding to this path has weight equal to 1.
(iii) There are two paths with Hamming Distance equal to 6 from the all zero path. Of these,
one has a path length of 4 and the other 5 (observe the power of L in the second term in
the summation). Both these paths have an input sequence weight of 2.
In the next section, we study the matrix description of convolutional codes which is a bit
more complicated than the matrix description of linear block codes.
6.6 MATRIX DESCRIPTION OF CONVOLUTIONAL CODES
A convolutional code consists of an infinite number of infinitely long
codewords and (visualize the trellis diagram) belongs to the class of linear codes. It
can be described by an infinite generator matrix. As can be expected, the matrix description of
convolutional codes is messier than that of the linear block codes.
Let the generator polynomials of a Convolutional Code be represented by
g_ij(D) = SUM_l g_ijl D^l     (6.22)
In order to obtain a generator matrix, the g_ijl coefficients are arranged in a matrix format. For
each l, let G_l be a k0 by n0 matrix,
G_l = [g_ijl]     (6.23)
Then, the generator matrix for the Convolutional Code that has been truncated to a block code
of blocklength n is
        [ G0   G1   G2   ...   Gm   ]
        [ 0    G0   G1   ...   Gm-1 ]
G(n) =  [ 0    0    G0   ...   Gm-2 ]     (6.24)
        [ ...                       ]
        [ 0    0    0    ...   G0   ]
where 0 is a k0 by n0 matrix of zeros and m is the length of the shift register used to generate the
code. The generator matrix for the Convolutional Code is given by
     [ G0   G1   G2   ...   Gm     0      0    0   ... ]
G =  [ 0    G0   G1   ...   Gm-1   Gm     0    0   ... ]     (6.25)
     [ 0    0    G0   ...   Gm-2   Gm-1   Gm   0   ... ]
     [ ...                                             ]
The matrix extends infinitely far down and to the right. For a systematic convolutional code, the
generator matrix can be written as
     [ I P0   0 P1   0 P2   ...   0 Pm                      ]
G =  [ 0 0    I P0   0 P1   ...   0 Pm-1   0 Pm             ]     (6.26)
     [ 0 0    0 0    I P0   ...   0 Pm-2   0 Pm-1   0 Pm    ]
     [ ...                                                  ]
where I is a k0 by k0 identity matrix, 0 is a k0 by k0 matrix of zeros and P0, P1, ..., Pm are k0 by
(n0 - k0) matrices. The parity check matrix can then be written as
     [ P0^T   -I                                                   ]
     [ P1^T   0    P0^T   -I                                       ]
     [ P2^T   0    P1^T   0    P0^T   -I                           ]
H =  [ ...                                                         ]     (6.27)
     [ Pm^T   0    Pm-1^T 0    Pm-2^T 0    ...   P0^T   -I         ]
     [               Pm^T  0    Pm-1^T 0   ...                     ]
     [                            Pm^T  0  ...                     ]
Example 6.9 Consider the convolutional encoder shown in Fig. 6.17. Let us first write the generator
polynomial matrix for this encoder. To do so, we just follow individual inputs to the outputs,
one by one, and count the delays. The generator polynomial matrix is obtained as
G(D) = [ D + D^2   D^2   D + D^2 ]
       [ D^2       D     D       ]
Fig. 6.17 A Rate 2/3 Convolutional Encoder.
The generator polynomials are g11(D) = D + D^2, g12(D) = D^2, g13(D) = D + D^2, g21(D) = D^2,
g22(D) = D and g23(D) = D.
To write out the matrix G0, we look at the constants (coefficients of D^0) in the generator
polynomials. Since there are no constant terms in any of the generator polynomials,
G0 = [ 0  0  0 ]
     [ 0  0  0 ]
Next, to write out the matrix G1, we look at the coefficients of D^1 in the generator polynomials.
The 1st row, 1st column entry of the matrix G1 corresponds to the coefficient of D^1 in g11(D). The
1st row, 2nd column entry corresponds to the coefficient of D^1 in g12(D), and so on. Thus,
G1 = [ 1  0  1 ]
     [ 0  1  1 ]
Similarly, we can write
G2 = [ 1  1  1 ]
     [ 1  0  0 ]
The generator matrix can now be written as
     [ 0 0 0   1 0 1   1 1 1   0 0 0   ... ]
     [ 0 0 0   0 1 1   1 0 0   0 0 0   ... ]
     [ 0 0 0   0 0 0   1 0 1   1 1 1   ... ]
G =  [ 0 0 0   0 0 0   0 1 1   1 0 0   ... ]
     [ 0 0 0   0 0 0   0 0 0   1 0 1   ... ]
     [ 0 0 0   0 0 0   0 0 0   0 1 1   ... ]
     [ ...                                 ]
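The construction of G from the matrices G_l can be mechanised. The following sketch (an illustration only) uses the generator polynomials listed above for the encoder of Fig. 6.17 and prints the generator matrix truncated to four frames.

# Minimal sketch of Eqs. (6.23)-(6.25) for the Example 6.9 encoder.
gpoly = [[[0, 1, 1], [0, 0, 1], [0, 1, 1]],      # g11 = D + D^2, g12 = D^2, g13 = D + D^2
         [[0, 0, 1], [0, 1, 0], [0, 1, 0]]]      # g21 = D^2,     g22 = D,   g23 = D

k0, n0 = len(gpoly), len(gpoly[0])
m = max(len(p) for row in gpoly for p in row) - 1

def G_l(l):                                       # Eq. (6.23): coefficients of D^l
    return [[row[j][l] if l < len(row[j]) else 0 for j in range(n0)] for row in gpoly]

def truncated_G(frames):                          # Eq. (6.24) truncated to `frames` frames
    zero = [[0] * n0 for _ in range(k0)]
    rows = []
    for f in range(frames):
        for i in range(k0):
            row = []
            for c in range(frames):
                block = G_l(c - f) if 0 <= c - f <= m else zero
                row.extend(block[i])
            rows.append(row)
    return rows

for r in truncated_G(4):
    print(r)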
Our next task is to look at an efficient decoding strategy for the convolutional codes. One of the
very popular decoding methods, the Viterbi Decoding Technique, is discussed in detail.
6.7 VITERBI DECODING OF CONVOLUTIONAL CODES
There are three important decoding techniques for convolutional codes: Sequential Decoding,
Threshold Decoding and the Viterbi Decoding. The Sequential Decoding technique was
proposed by Wozencraft in 1957. Sequential Decoding has the advantage that it can perform
very well with long-constraint-length convolutional codes, but it has a variable decoding time.
.Threshold Decoding, also known as Majority Logic Decoding, was proposed by Massey in
1963 in his doctoral thesis at MIT. Threshold decoders were the first commercially produced
decoders for convolutional codes. Viterbi Decoding was developed by Andrew J. Viterbi in
1967 and in the late 1970s became the dominant technique for convolutional codes.
Viterbi Decoding had the advantages of
(i) a highly satisfactory bit error performance,
(ii) high speed of operation,
(iii) ease of implementation,
(iv) low cost.
Threshold decoding lost its popularity specially because of its inferior bit error performance.
It is conceptually and practically closest to block decoding and it requires the calculation of a set
of syndromes, just as in the case of block codes. In this case, the syndrome is a sequence because
the information and check digits occur as sequences.
Viterbi Decoding has the advantage that it has a fixed decoding time. It is well suited to
hardware decoder implementation. But its computational requirements grow exponentially as a
function of the constraint length, so it is usually limited in practice to constraint lengths of
Convolutional Codes
v = 9 or less. As of early 2000, some leading companies claimed to have produced a
v = 9 Viterbi decoder that operates at rates up to 2 Mbps.
Since the time when Viterbi proposed his algorithm, other researchers have expanded on his
work by finding good convolutional codes, exploring the performance limits of the technique,
and varying decoder design parameters to optimize the implementation of the technique in
hardware and software. The Viterbi Decoding algorithm is also used in decoding Trellis Coded
Modulation (TCM), the technique used in telephone-line modems to squeeze high ratios of
bits-per-second to Hertz out of 3 kHz-bandwidth analog telephone lines. We shall see more of
TCM in the next chapter. For years, convolutional coding with Viterbi Decoding has been the
predominant FEC (Forward Error Correction) technique used in space communications,
particularly in geostationary satellite communication networks such as VSAT (very small
aperture terminal) networks. The most common variant used in VSAT networks is rate 1/2
convolutional coding using a code with a constraint length v = 7. With this code, one can
transmit binary or quaternary phase-shift-keyed (BPSK or QPSK) signals with at least 5 dB less
power than without coding. That is a reduction in Watts of more than a factor of three! This is
very useful in reducing transmitter and antenna cost or permitting increased data rates given the
same transmitter power and antenna sizes.
We will now consider how to decode convolutional codes using the Viterbi Decoding
algorithm. The nomenclature used here is that we have a message vector i from which the
encoder generates a code vector c that is sent across a discrete memoryless channel. The
received vector r may differ from the transmitted vector c (unless the channel is ideal or we are
very lucky!). The decoder is required to make an estimate of the message vector. Since there is
a one to one correspondence between code vector and message vector, the decoder makes an
estimate of the code vector.
Optimum decoding will result in a minimum probability of decoding error. Let p(r|c) be the
conditional probability of receiving r given that c was sent. We can state that the optimum
decoder is the maximum likelihood decoder with a decision rule to choose the code vector
estimate c for which the log-likelihood function ln p(r|c) is maximum.
If we consider a BSC where the vector elements of c and r are denoted by c_i and r_i, then we
have
p(r|c) = PROD_(i=1)^N p(r_i|c_i),     (6.28)
and hence, the log-likelihood function equals
ln p(r|c) = SUM_(i=1)^N ln p(r_i|c_i)     (6.29)
Let us assume
p(r_i|c_i) = p if r_i is not equal to c_i, and p(r_i|c_i) = 1 - p if r_i = c_i.     (6.30)
If we suppose that the received vector differs from the transmitted vector in exactly d
positions (the Hamming Distance between vectors c and r), we may rewrite the log-likelihood
function as
ln p(r|c) = d ln p + (N - d) ln(1 - p)
          = d ln [p/(1 - p)] + N ln(1 - p)     (6.31)
We can assume the probability of error p < 1/2 and we note that N ln(1 - p) is a constant for
all code vectors. Now we can make the statement that the maximum likelihood decoding rule for a
Binary Symmetric Channel is to choose the code vector estimate c that minimizes the Hamming Distance
between the received vector r and the transmitted vector c.
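Before turning to soft decisions, the hard-decision rule above can be put directly into code. The sketch below is an illustration (not the book's program) of the survivor-per-state search described in the remainder of this section, using the rate-1/2 encoder of Example 6.1 (with the tap connections assumed earlier) and the Hamming distance as the metric.

# Minimal sketch: hard-decision Viterbi decoding for the Example 6.1 encoder.
def branch(state, bit):
    s1, s2 = state
    return (bit ^ s2, bit ^ s1 ^ s2), (bit, s1)   # output frame, next state

def viterbi(received):                            # received: list of 2-bit frames
    INF = float("inf")
    metric = {(0, 0): 0, (0, 1): INF, (1, 0): INF, (1, 1): INF}
    paths = {s: [] for s in metric}
    for r in received:
        new_metric = {s: INF for s in metric}
        new_paths = {}
        for state in metric:
            if metric[state] == INF:
                continue
            for bit in (0, 1):
                out, nxt = branch(state, bit)
                d = (out[0] != r[0]) + (out[1] != r[1])      # branch metric
                if metric[state] + d < new_metric[nxt]:      # keep the survivor
                    new_metric[nxt] = metric[state] + d
                    new_paths[nxt] = paths[state] + [bit]
        metric, paths = new_metric, new_paths
    best = min(metric, key=metric.get)            # most likely final state
    return paths[best]

# The codeword of Example 6.2 with a single bit error in the second frame:
rx = [(1, 1), (1, 1), (1, 1), (1, 1), (1, 0), (1, 0), (0, 0)]
print(viterbi(rx))                                # recovers 1 0 0 1 1 0 1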
For Soft Decision Decoding in the Additive White Gaussian Noise (AWGN) channel with single
sided noise power N0, the likelihood function is given by
p(r|c) = PROD_(i=1)^N (1/sqrt(pi N0)) exp(-|r_i - c_i|^2 / N0)
       = (1/(pi N0))^(N/2) exp(-(1/N0) SUM_(i=1)^N |r_i - c_i|^2)     (6.32)
Thus the maximum likelihood decoding rule for the AWGN channel with Soft Decision
Decoding is to minimize the squared Euclidean Distance between r and c. This squared Euclidean
Distance is given by
d_E^2(r|c) = SUM_(i=1)^N |r_i - c_i|^2     (6.33)
Viterbi Decoding works by choosing that trial information sequence, the encoded version of
which is closest to the received sequence. Here, Hamming Distance will be used as a measure of
proximity between two sequences. The Viterbi Decoding procedure can be easily understood
by the following example.
Example 6.10 Consider the rate 113 convolutional encoder given in Fig. 6.18 and the corresponding
trellis diagram.
Fig. 6.18 A Rate 1/3 Convolutional Encoder and Its Trellis Diagram.
Suppose the transmitted sequence was an all zero sequence. Let the received sequence be
r= 010000100001 ...
Since it is a 1/3 rate encoder, we first segment the received sequence in groups of three
bits (because n0 = 3), i.e.,
r = 010 000 100 001 ...
The task at hand is to find out the most likely path through the trellis. Since a path must pass
through nodes in the trellis, we will try to find out which nodes in the trellis belong to the most
likely path. At any time, every node has two incoming branches. We simply determine which
of these two branches belongs to a more likely path (and discard the other). We make this decision
based on some metric (Hamming Distance). In this way we retain just one path per node and the
metric of that path. In this example, we retain only four paths as we progress with our decoding
(since we have only 4 states in our trellis).
Let us consider the first branch of the trellis which is labelled 000. We find the Hamming
distance between this branch and the first received framelength, 010. The Hamming distance
d(000, 010) = 1. Thus the metric for this first branch is 1, and is called the Branch Metric. Upon
reaching the top node from the starting node, this branch has accumulated a metric= 1. Next, we
compare the received framelength with the lower branch, which terminates at the second node from
the top. The Hamming Distance in this case is d (111, 010) = 2. Thus, the metric for this first
branch is 2. At each node we write the total metric accumulated by the path, called the Path
Metric. The path metrics are marked by circled numbers in the trellis diagram in Fig. 6.19. At the
subsequent stages of decoding, when two paths terminate at every node, we will retain the path with
the smaller value of the metric.
Fig. 6.19 The Path Metric after the 1st Step of Viterbi Decoding.
We now move to the next stage of the trellis. The Hamming Distances between the branches are
computed with respect to the second frame received, 000. The branch metrics for the two branches
emanating from the topmost node are 0 and 3. The branch metrics for the two branches emanating
from the second node are 2 and 1. The total path metric is marked by circled numbers in the trellis
diagram shown in Fig. 6.20.
Fig. 6.20 The Path Metric after the 2nd Step of Viterbi Decoding.
We now proceed to the next stage. We again compute the branch metrics and add them to the
respective path metrics to get the new path metrics. Consider the topmost node at this stage. Two
branches terminate at this node. The path coming from node 1 of the previous stage has a path
metric of 2, while the other incoming path has a path metric of 6. The path
with the lower metric is retained and the other discarded. The trellis diagram shown in Fig. 6.21
gives the surviving paths (double lines) and the path metrics (circled numbers). Viterbi called
these surviving paths Survivors. It is interesting to note that node 4 receives two paths with
equal path metrics. We have arbitrarily chosen one of them as the surviving path (by tossing a fair
coin!).
Fig. 6.21 The Path Metric after the 3rd Step of Viterbi Decoding.
We continue this procedure for Viterbi decoding for the next stage. The final branch metrics
and path metrics are shown in Fig. 6.22. At the end we pick the path with the minimum metric.
This path corresponds to the all zero path. Thus the decoding procedure has been able to
correctly decode the received vector.
Fig. 6.22 The Path Metric after the 4th Step of Viterbi Decoding.
The minimum distance for this code is d* = 6. The number of errors that it can correct per frame
length is equal to
    t = ⌊(d* − 1)/2⌋ = ⌊(6 − 1)/2⌋ = 2.
In this example, the maximum number of errors per framelength was 1.
Consider the set of surviving paths at the rth frame time. If all the surviving paths cross
through the same nodes, then a decision regarding the most likely path transmitted can be made
up to the point where the nodes are common. To build a practical Viterbi Decoder, one must
choose a decoding window width w, which is usually several times as big as the blocklength. At
a given frame time, f, the decoder examines all the surviving paths to see if they agree in the first
branch. This branch defines a decoded information frame and is passed out of the decoder. In
the previous example of Viterbi Decoding, we see that by the time the decoder reaches the 4th
frame, all the surviving paths agree in their first decoded branch (called a well-defined decision).
The decoder drops the first branch (after delivering the decoded frame) and takes in a new
frame of the received word for the next iteration. If again, all the surviving paths pass through
the same node of the oldest surviving frame, then this information frame is decoded. The
process continues in this way indefinitely.
If a long enough decoding window w is chosen, then a well-defined decision can be reached
almost always. A well designed code will lead to correct decoding with a high probability. Note
that a well designed code carries meaning only in the context of a particular channel. The
random errors induced by the channel should be within the error correcting capability of the
code. The Viterbi decoder can be visualized as a sliding window through which the trellis is
viewed (see Fig. 6.23). The window slides to the right as new frames are processed. The
surviving paths are marked on the portion of the trellis which is visible through the window. As
the window slides, new nodes appear on the right, and some of the surviving paths are extended
to these new nodes while the other paths disappear.
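The decoding procedure described above can be captured in a few lines of code. The sketch below (our illustration, not a listing from the book) implements a generic hard-decision Viterbi decoder: the trellis is supplied as two tables, next_state[s][u] and output[s][u], which would be derived from the encoder's generator polynomials; one survivor is kept per state and the decoded bits are read off the best survivor at the end (rather than through a sliding window, for simplicity).

```python
def viterbi_decode(received_frames, next_state, output, n_states):
    """Hard-decision Viterbi decoding with Hamming branch metrics.

    received_frames : list of received frames, each a list of n0 bits
    next_state[s][u]: state reached from state s on input bit u
    output[s][u]    : the n0-bit branch label for that transition
    """
    INF = float("inf")
    path_metric = [0.0] + [INF] * (n_states - 1)     # encoder starts in the all-zero state
    survivors = [[] for _ in range(n_states)]        # input bits along each surviving path

    for frame in received_frames:
        new_metric = [INF] * n_states
        new_survivors = [None] * n_states
        for s in range(n_states):
            if path_metric[s] == INF:
                continue
            for u in (0, 1):
                ns = next_state[s][u]
                branch = output[s][u]
                bm = sum(r != c for r, c in zip(frame, branch))   # Hamming branch metric
                if path_metric[s] + bm < new_metric[ns]:          # keep the better path
                    new_metric[ns] = path_metric[s] + bm
                    new_survivors[ns] = survivors[s] + [u]
        path_metric, survivors = new_metric, new_survivors

    best = min(range(n_states), key=lambda s: path_metric[s])     # minimum-metric survivor
    return survivors[best]
```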
Fig. 6.23 The Viterbi Decoder as a Sliding Window through which the Trellis is Viewed.
If the surviving paths do not go through the same node, we label it a Decoding Failure. The
decoder can break the deadlock using any arbitrary rule. To this limited extent, the decoder
becomes an incomplete decoder. Let us revert back to the previous example. At the 4th stage,
the surviving paths could as well be chosen as shown in Fig. 6.24, which will render the decoder
as an incomplete decoder.
CD CD 0
0
8
• •
0
0· ®
States
Time axis
Fig. 6.24 Example of an Incomplete Decoder in Viterbi Decoding Process.
It is possible that in some cases the decoder reaches a well-defined decision, but a wrong one!
If this happens, the decoder has no way of knowing that it has taken a wrong decision. Based on
this wrong decision, the decoder will take more wrong decisions. However, if the code is non-
catastrophic, the decoder will recover from the errors.
The next section deals with some Distance Bounds for convolutional codes. These bounds
will help us compare different convolutional coding schemes.
6.8 DISTANCE BOUNDS FOR CONVOLUTIONAL CODES
Upper bounds can be computed on the minimum distance of a convolutional code that has a
rate R = k₀/n₀ and a constraint length v. These bounds are similar in nature and derivation
to those for block codes, with block length corresponding to constraint length. However, as we
shall see, the bounds are not very tight. These bounds just give us a rough idea of how good the
code is. Here we present the bounds (without proof) for binary codes.
For rate R and constraint length v, let d be the largest integer that satisfies

    H(d/(n₀v)) ≤ 1 − R                                                 (6.34)

Then at least one binary convolutional code exists with minimum distance d for which the
above inequality holds. Here H(x) is the familiar entropy function for a binary alphabet

    H(x) = − x log₂ x − (1 − x) log₂ (1 − x),   0 ≤ x ≤ 1
For a binary code with R = 1/n₀ the minimum distance dmin satisfies

    dmin ≤ ⌊(n₀v + n₀)/2⌋                                              (6.35)

where ⌊x⌋ denotes the largest integer less than or equal to x.

An upper bound on dfree is given by (Heller, 1968)

    dfree = min_{j≥1} ⌊ (n₀/2) (2^j/(2^j − 1)) (v + j − 1) ⌋           (6.36)

To calculate the upper bound, the right-hand side should be plotted for different integer
values of j. The upper bound is the minimum of this plot.
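The Heller bound of (6.36) is easy to evaluate numerically. The short sketch below (ours, not from the book) computes it by minimising d(j) over the first few integer values of j; since the bracketed term grows with j once j is moderately large, scanning j up to about 20 is more than enough.

```python
import math

def heller_bound(n0, v, j_max=20):
    """Upper bound on d_free from eq. (6.36): min over j >= 1 of
    floor((n0/2) * (2**j / (2**j - 1)) * (v + j - 1))."""
    best = math.inf
    for j in range(1, j_max + 1):
        d_j = math.floor(n0 * 2 ** (j - 1) / (2 ** j - 1) * (v + j - 1))
        best = min(best, d_j)
    return best

print(heller_bound(2, 4))   # -> 6, matching the (15, 17) entry of Table 6.2
print(heller_bound(3, 3))   # -> 8, matching the bound found in Example 6.10
```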
Example 6.10 Let us apply the distance bounds to the convolutional encoder given in Example
6.1. We will first apply the bound given by (6.34). For this encoder, k₀ = 1, n₀ = 2, R = 1/2 and v = 2.

    H(d/(n₀v)) = H(d/4) ≤ 1 − R = 1/2  ⟹  H(d/4) ≤ 0.5
But we have,
    H(0.11) = − 0.11 log₂ 0.11 − (1 − 0.11) log₂ (1 − 0.11) = 0.4999, and
    H(0.12) = − 0.12 log₂ 0.12 − (1 − 0.12) log₂ (1 − 0.12) = 0.5294.
Therefore, d/4 ≤ 0.11, or d ≤ 0.44.
The largest integer d that satisfies this bound is d = 0. This implies that at least one
binary convolutional code exists with minimum distance d = 0. This statement does not
say much (i.e., the bound is not strict enough for this encoder)!
Next, consider the encoder shown in Fig. 6.25.

Fig. 6.25 Convolutional Encoder for Example 6.10.

For this encoder,
    G(D) = [1    D + D²    D + D² + D³],
    k₀ = 1, n₀ = 3, R = 1/3 and v = 3.

    H(d/(n₀v)) = H(d/9) ≤ 1 − R = 2/3  ⟹  H(d/9) ≤ 0.6666

But we have,
    H(0.17) = − 0.17 log₂ 0.17 − (1 − 0.17) log₂ (1 − 0.17) = 0.6577, and
    H(0.18) = − 0.18 log₂ 0.18 − (1 − 0.18) log₂ (1 − 0.18) = 0.6801.
Therefore, d/9 ≤ 0.17, or d ≤ 1.53.
The largest integer d that satisfies this bound is d = 1. Then at least one binary
convolutional code exists with minimum distance d = 1. This is a very loose bound.
Let us now evaluate the second bound, given by (6.35).

    dmin ≤ ⌊(n₀v + n₀)/2⌋ = ⌊(9 + 3)/2⌋ = 6

This gives us dmin ≤ 6, which is a good upper bound as seen from the trellis diagram
for the encoder (Fig. 6.26). Since n₀ = 3, every branch in the trellis is labelled by 3
bits. The two paths that have been used to calculate dmin are shown in Fig. 6.26 by
double lines. In this example, dmin = dfree = 6.
Fig. 6.26 The Trellis Diagram for the Convolutional Encoder Given in Fig. 6.25.
Next, we determine the Heller Bound on dfree, as given by (6.36). The plot of
the function

    d(j) = ⌊ (n₀/2) (2^j/(2^j − 1)) (v + j − 1) ⌋

for different integer values of j is given in Fig. 6.27.

Fig. 6.27 The Heller Bound Plot.

From Fig. 6.27, we see that the upper bound on the free distance of the code is dfree
≤ 8. This is a good upper bound. The actual value of dfree = 6.
6.9 PERFORMANCE BOUNDS
One of the useful performance criteria for convolutional codes is the bit error probability P_b.
The bit error probability, or the bit error rate (a misnomer!), is defined as the expected number
of decoded information bit errors per information bit. Instead of obtaining an exact expression
for P_b, typically an upper bound on the error probability is calculated. We will first determine
the First Event Error Probability, which is the probability of error for sequences that merge
with the all zero (correct) path for the first time at a given node in the trellis diagram.
Since convolutional codes are linear, let us assume that the all zero codeword is transmitted.
An error will be made by the decoder if it chooses the incorrect path c′ instead of the all zero
path. Let c′ differ from the all zero path in d bits. Therefore, a wrong decision will be made by
the maximum likelihood decoder if more than ⌊d/2⌋ errors occur, where ⌊x⌋ is the largest integer
less than or equal to x. If the channel transition probability is p, then the probability of error can
be upper bounded as follows:

    P_d ≤ Σ_{k=⌊d/2⌋+1}^{d} C(d, k) p^k (1 − p)^{d−k}                  (6.37)
Now, there would be many paths with different distances that merge with the correct path at
a given time for the first time. The upper bound on the first event error probability can be
obtained by summing the error probabilities of all such possible paths:

    P_e ≤ Σ_{d=dfree}^{∞} a_d P_d                                      (6.38)

where a_d is the number of codewords at Hamming Distance d from the all zero codeword.
Comparing (6.19) and (6.38) we obtain

    P_e ≤ T(D)|_{D = 2√(p(1−p))}                                       (6.39)
The bit error probability, P_b, can now be determined as follows. P_b can be upper bounded by
weighting each pairwise error probability P_d in (6.37) by the number of incorrectly decoded
information bits for the corresponding incorrect path, n_d. For a rate k/n encoder, the average P_b is

    P_b ≤ (1/k) Σ_{d=dfree}^{∞} n_d P_d                                (6.40)

It can be shown that

    Σ_{d=dfree}^{∞} n_d D^d = ∂T(D, I)/∂I |_{I=1}                      (6.41)

Thus,

    P_b ≤ (1/k) ∂T(D, I)/∂I |_{I=1, D=2√(p(1−p))}                      (6.42)
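The union bounds above are straightforward to evaluate once the distance spectrum of the code is known. The sketch below (ours, not from the book) computes the pairwise error probability in the binomial form used in (6.37) and the first-event error bound of (6.38); the distance spectrum passed in is purely hypothetical, just to show the call.

```python
from math import comb

def pairwise_error_prob(d, p):
    """P_d: probability of more than floor(d/2) channel errors on a BSC, cf. eq. (6.37)."""
    t = d // 2
    return sum(comb(d, k) * p ** k * (1 - p) ** (d - k) for k in range(t + 1, d + 1))

def first_event_bound(spectrum, p):
    """Union bound of eq. (6.38); spectrum maps distance d -> a_d."""
    return sum(a_d * pairwise_error_prob(d, p) for d, a_d in spectrum.items())

# Hypothetical distance spectrum {d: a_d} of a small code, for illustration only.
print(first_event_bound({5: 1, 6: 2, 7: 4}, p=0.01))
```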
6.10 KNOWN GOOD CONVOLUTIONAL CODES
In this section, we shall look at some known good convolutional codes. So far, only a few
constructive classes of convolutional codes have been reported. There exists no class with an
algebraic structure comparable to the t-error correcting BCH codes. No constructive methods
exist for finding convolutional codes of long constraint length. Most of the codes presented here
have been found by computer searches.
Initial work on short convolutional codes with maximal free distance was reported by
Odenwalder (1970) and Larsen (1973). A few of the codes are listed in Tables 6.2, 6.3 and 6.4 for
code rates 1/2, 1/3 and 1/4 respectively. The generators are given in octal notation as the
coefficient sequences g₀^(i), g₁^(i), ..., where

    g_i(D) = g₀^(i) + g₁^(i) D + g₂^(i) D² + ...                       (6.43)
For example, the octal notation for the generators of the R = 1/2, v = 4 encoder is 15 and 17
(see Table 6.2). The octal 15 can be deciphered as 15 = 1-5 = 1-101. Therefore,
    g₁(D) = 1 + (1)D + (0)D² + (1)D³ = 1 + D + D³.
Similarly,
    17 = 1-7 = 1-111.
Therefore,
    g₂(D) = 1 + (1)D + (1)D² + (1)D³ = 1 + D + D² + D³, and
    G(D) = [1 + D + D³    1 + D + D² + D³].
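The octal-to-polynomial conversion illustrated above is mechanical, so a tiny helper (ours, not from the book) can expand any entry of Tables 6.2 to 6.4; the leftmost surviving bit is taken as the coefficient of D⁰, matching the convention of the worked example.

```python
def octal_generator_to_taps(octal_str):
    """Expand an octal generator (e.g. '15') into its binary tap list [1, 1, 0, 1],
    read as the coefficients of D^0, D^1, D^2, ... in that order."""
    bits = []
    for digit in octal_str:
        bits.extend(int(b) for b in format(int(digit, 8), "03b"))
    while bits and bits[0] == 0:      # drop padding zeros in front of the first tap
        bits.pop(0)
    return bits

print(octal_generator_to_taps("15"))   # [1, 1, 0, 1]     -> 1 + D + D^3
print(octal_generator_to_taps("17"))   # [1, 1, 1, 1]     -> 1 + D + D^2 + D^3
```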
Table 6.2 Rate 1/2 codes with maximum free distance

Non-Catastrophic
    v    n    Generators (octal)    dfree    Heller Bound
    3    6    5, 7                  5        5
    4    8    15, 17                6        6
    5    10   23, 35                7        8
    6    12   53, 75                8        8
    7    14   133, 171              10       10

Catastrophic
    v    n    Generators (octal)    dfree    Heller Bound
    5    10   27, 35                8        8
    12   24   5237, 6731            16       16
    14   28   21645, 37133          17       17
Table 6.3 Rate 1/3 codes with maximum free distance

    v    n    Generators (octal)    dfree    Heller Bound
    3    9    5, 7, 7               8        8
    4    12   13, 15, 17            10       10
    5    15   25, 33, 37            12       12
    6    18   47, 53, 75            13       13
    7    21   133, 145, 175         15       15
Table 6.4 Rate 1/4 codes with maximum free distance

    v    n    Generators (octal)    dfree    Heller Bound
    3    12   5, 7, 7, 7            10       10
    4    16   13, 15, 15, 17        13       13
    5    20   25, 27, 33, 37        16       16
    6    24   53, 67, 71, 75        18       18
    7    28   135, 135, 147, 163    20       20
Next, we study an interesting class of codes, called Turbo Codes, which lie somewhere between
linear block codes and convolutional codes.
6.11 TURBO CODES
Turbo Codes were introduced in 1993 at the International Conference on Communications (ICC) by
Berrou, Glavieux and Thitimajshima in their paper "Near Shannon Limit Error Correction
Coding and Decoding: Turbo-Codes". In this paper, they quoted a BER performance of 10⁻⁵ at
an E_b/N₀ of 0.7 dB using only a rate 1/2 code, generating tremendous interest in the field. Turbo
Codes perform well in the low SNR scenario. At high SNRs, some of the traditional codes like
the Reed-Solomon Code have comparable or better performance than Turbo Codes.
Even though Turbo Codes are considered as Block Codes, they do not exactly work like
block codes. Turbo Codes are actually a quasi mix between Block and Convolutional Codes.
They require, like a block code, that the whole block be present before encoding can begin.
However, rather than computing parity bits from a system of equations, they use shift registers
just like Convolutional Codes.
Turbo Codes typically use at least two convolutional component encoders and two maximum
a posteriori (MAP) algorithm component decoders. This is known as
concatenation. Three different arrangements of turbo codes are Parallel Concatenated
Convolutional Codes (PCCC), Serial Concatenated Convolutional Codes (SCCC), and
Hybrid Concatenated Convolutional Codes (HCCC). Typically, Turbo Codes are arranged
like the PCCC. An example of a PCCC Turbo encoder given in Fig. 6.28 shows that two
encoders run in parallel.
Fig. 6.28 Block Diagram of a Rate 1/3, PCCC Turbo Encoder.
One reason for the better performance of Turbo codes is that they produce high weight code
words. For example, if the input sequence (U_k) is originally low weight, the systematic (X_k) and
parity 1 (Y1_k) outputs may produce a low weight codeword. However, the parity 2 output (Y2_k)
is less likely to be a low weight codeword due to the interleaver in front of it. The interleaver
shuffles the input sequence, Uk, in such a way that when introduced to the second encoder, it is
more likely to produce a high weight codeword. This is ideal for the code because high weight
codewords result in better decoder performance. Intuitively, when one of the encoders produces
a 'weak' codeword, the other encoder has a low probability of producing another 'weak'
codeword because of the interleaver. The concatenated version of the two codewords is,
therefore, a 'strong' codeword. Here, the expression 'weak' is used as a measure of the average
Hamming Distance of a codeword from all other codewords.
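The parallel structure just described is small enough to sketch in code. The following is our illustration of a rate-1/3 PCCC encoder built from two identical recursive systematic convolutional (RSC) component encoders and a pseudo-random interleaver; the particular feedback and feedforward taps (the classic memory-2 pair, octal 7 and 5) are an assumption for the example, not taken from Fig. 6.28.

```python
import random

def rsc_parity(bits, taps_fb=(1, 1, 1), taps_out=(1, 0, 1)):
    """Parity sequence of a recursive systematic convolutional encoder.
    taps_fb  : feedback polynomial coefficients (1 + D + D^2 here)
    taps_out : feedforward polynomial coefficients (1 + D^2 here)"""
    m = len(taps_fb) - 1
    state = [0] * m
    parity = []
    for u in bits:
        fb = u
        for t, s in zip(taps_fb[1:], state):      # feedback bit
            fb ^= t & s
        p = taps_out[0] & fb
        for t, s in zip(taps_out[1:], state):     # feedforward (parity) bit
            p ^= t & s
        parity.append(p)
        state = [fb] + state[:-1]                 # shift the register
    return parity

def pccc_encode(bits, seed=0):
    """Rate-1/3 PCCC: systematic bit X_k, parity Y1_k from encoder 1, and parity
    Y2_k from encoder 2, which sees an interleaved copy of the input."""
    perm = list(range(len(bits)))
    random.Random(seed).shuffle(perm)             # pseudo-random interleaver
    y1 = rsc_parity(bits)
    y2 = rsc_parity([bits[i] for i in perm])
    return list(zip(bits, y1, y2))                # (X_k, Y1_k, Y2_k)

print(pccc_encode([1, 0, 1, 1, 0, 0, 1, 0]))
```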
Although the encoder determines the capability for error correction, it is the decoder that
determines the actual performance. The performance, however, depends upon which algorithm
is used. Since Turbo Decoding is an iterative process, it requires a soft output algorithm like the
maximum a-posteriori algorithm (MAP) or the Soft Output Viterbi Algorithm (SOVA) for
decoding. Soft output algorithms out-perform hard decision algorithms because they have
available a better estimate of what the sent data actually was. This is because soft output yields
a gradient of information about the computed information bit rather than just choosing a 1 or 0
as hard output does. A typical Turbo Decoder is shown in Fig. 6.29.
The MAP algorithm is often used to estimate the most likely information bit to have been
transmitted in a coded sequence. The MAP algorithm is favoured because it outperforms other
algorithms, such as the SOVA, under low SNR conditions. The major drawback, however, is
that it is more complex than most algorithms because of its focus on each individual bit of
information. Research in the area (in late 1990s) has resulted in great simplification of the MAP
algorithm.
Fig. 6.29 Block Diagram of a Turbo Decoder.
A Turbo Decoder generally uses the MAP algorithm in at least one of its component
decoders. The decoding process begins by receiving partial information from the channel (X_k
and Y1_k) and passing it to the first decoder. The rest of the information, parity 2 (Y2_k), goes to the
second decoder and waits for the rest of the information to catch up. While the second decoder
is waiting, the first decoder makes an estimate of the transmitted information, interleaves it to
match the format of parity 2, and sends it to the second decoder. The second decoder takes
information from both the first decoder and the channel and re-estimates the information. This
second estimation is looped back to the first decoder, where the process starts again. The iterative
process of the Turbo Decoder is illustrated in Fig. 6.30.
Fig. 6.30 Iterative Decoding of Turbo Code.
This cycle will continue until certain conditions are met, such as a certain number of iterations
being performed. It is from this iterative process that Turbo Coding gets its name. The decoder
circulates estimates of the sent data like a turbo engine circulates air. When the decoder is
ready, the estimated information is finally kicked out of the cycle and hard decisions are made
in the threshold component. The result is the decoded information sequence.
In the following section, we study two decoding methods for the Turbo Codes, in detail.
6.12 TURBO DECODING
We have seen that the Viterbi Algorithm is used for the decoding of convolutional codes. The
Viterbi Algorithm performs a systematic elimination of the paths in the trellis. However, such
luck does not exist for the Turbo Decoder. The presence of the interleaver complicates the matter
immensely. Before the discovery of Turbo Codes, a lot of work was being done in the area of
suboptimal decoding strategies for concatenated codes, involving multiple decoders. The
symbol-by-symbol maximum a posteriori (MAP) algorithm of Bahl, Cocke, Jelinek and Raviv,
published in the IEEE Transactions on Information Theory in March 1974 also received some
attention. It was this algorithm, which was used by Berrou et al. in the iterative decoding of their
Turbo Codes. In this section, we shall discuss two methods useful for Turbo Decoding:
(A) The modified Bahl, Cocke, Jelinek and Raviv (BCJR) Algorithm.
(B) The Iterative MAP Decoding.
A. MODIFIED BAHL, COCKE, JELINEK AND RAVIV (BCJR) ALGORITHM
The modified BCJR Decoding Algorithm is a symbol-by-symbol decoder. The decoder decides
u_k = +1 if

    P(u_k = +1 | y) > P(u_k = −1 | y),                                 (6.44)

and it decides u_k = −1 otherwise, where y = (y₁, y₂, ..., y_N) is the noisy received word.
More succinctly, the decision u_k is given by

    u_k = sign [L(u_k)]                                                (6.45)

where L(u_k) is the Log A Posteriori Probability (LAPP) Ratio defined as

    L(u_k) = log [ P(u_k = +1 | y) / P(u_k = −1 | y) ]                 (6.46)
Incorporating the code's trellis, this may be written as

    L(u_k) = log [ (Σ_{S⁺} p(s_{k−1} = s′, s_k = s, y)/p(y)) / (Σ_{S⁻} p(s_{k−1} = s′, s_k = s, y)/p(y)) ]     (6.47)

where s_k ∈ S is the state of the encoder at time k, S⁺ is the set of ordered pairs (s′, s)
corresponding to all state transitions (s_{k−1} = s′) to (s_k = s) caused by data input u_k = +1, and
S⁻ is similarly defined for u_k = −1.
Let us define

    γ_k(s′, s) = p(s_k = s, y_k | s_{k−1} = s′)                        (6.48)

    α_k(s) = Σ_{s′} α_{k−1}(s′) γ_k(s′, s) / Σ_s Σ_{s′} α_{k−1}(s′) γ_k(s′, s)      (6.49)

and

    β_{k−1}(s′) = Σ_s β_k(s) γ_k(s′, s) / Σ_s Σ_{s′} α_{k−1}(s′) γ_k(s′, s)         (6.50)

with boundary conditions

    α₀(0) = 1 and α₀(s ≠ 0) = 0,
    β_N(0) = 1 and β_N(s ≠ 0) = 0.                                     (6.51)

Then the modified BCJR Algorithm gives the LAPP Ratio in the following form

    L(u_k) = log [ Σ_{S⁺} α_{k−1}(s′) γ_k(s′, s) β_k(s) / Σ_{S⁻} α_{k−1}(s′) γ_k(s′, s) β_k(s) ]     (6.52)

B. ITERATIVE MAP DECODING
The decoder is shown in Fig. 6.31. D1 and D2 are the two decoders. S is the set of 2^m constituent
encoder states. y is the noisy received word. Using Bayes' rule we can write L(u_k) as

    L(u_k) = log [ p(y | u_k = +1) / p(y | u_k = −1) ] + log [ P(u_k = +1) / P(u_k = −1) ]     (6.53)

with the second term representing a priori information. Since P(u_k = +1) = P(u_k = −1) typically,
the a priori term is usually zero for conventional decoders. However, for iterative decoders, D1
receives extrinsic or soft information for each u_k from D2, which serves as a priori information.
Similarly, D2 receives extrinsic information from D1, and the decoding iteration proceeds with
each of the two decoders passing soft information along to the other decoder at each half-
iteration except for the first. The idea behind extrinsic information is that D2 provides soft
information to D1 for each u_k using only information not available to D1. D1 does likewise for
D2.
Fig. 6.31 Implementation of the Iterative Turbo Decoder.
At any given iteration, D1 computes

    L₁(u_k) = L_c y_k^s + L₂₁(u_k) + L₁₂(u_k)                          (6.54)

where the first term is the channel value, L_c = 4E_c/N₀ (E_c = energy per channel bit), L₂₁(u_k) is
the extrinsic information passed from D2 to D1, and L₁₂(u_k) is the extrinsic information from D1
to D2, where

    γ_k^e(s′, s) = exp [ (1/2) L_c y_k^p x_k^p ]                       (6.55)

    γ_k(s′, s) = exp [ (1/2) u_k (L^e(u_k) + L_c y_k^s) ] γ_k^e(s′, s)  (6.56)

    α_k(s) = Σ_{s′} α_{k−1}(s′) γ_k(s′, s) / Σ_s Σ_{s′} α_{k−1}(s′) γ_k(s′, s)       (6.57)

and

    β_{k−1}(s′) = Σ_s β_k(s) γ_k(s′, s) / Σ_s Σ_{s′} α_{k−1}(s′) γ_k(s′, s)          (6.58)
For the above algorithm, each decoder must have full knowledge of the trellis of the
constituent encoders, i.e. each decoder must have a table containing the input bits and parity
bits for all possible state transitions s′ to s. Also, care should be taken that the last m bits of the
N-bit information word to be encoded must force encoder 1 to the zero state by the Nth bit.
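The data flow of the iterative decoder of Fig. 6.31 can be summarised with a short sketch. The code below is our structural illustration only: siso_decode is a stand-in that merely returns values of the right shape, not an implementation of the BCJR/MAP recursions of (6.55)-(6.58); what the sketch does show correctly is how extrinsic information is extracted by subtracting the inputs from the a posteriori output, interleaved, and handed to the other decoder at each half-iteration.

```python
def siso_decode(llr_sys, llr_par, llr_apriori):
    # Stand-in soft-in soft-out decoder: NOT the MAP algorithm, only a placeholder
    # so that the iteration loop below is runnable end to end.
    return [s + a + p for s, a, p in zip(llr_sys, llr_apriori, llr_par)]

def turbo_decode(llr_sys, llr_par1, llr_par2, perm, n_iter=8):
    """llr_sys, llr_par1, llr_par2: channel LLRs for X_k, Y1_k, Y2_k;
    perm: the interleaver permutation used by the encoder."""
    n = len(llr_sys)
    inv = [0] * n
    for i, j in enumerate(perm):
        inv[j] = i                                       # de-interleaver
    llr_sys_i = [llr_sys[j] for j in perm]               # interleaved systematic LLRs
    ext21 = [0.0] * n                                    # extrinsic information D2 -> D1
    for _ in range(n_iter):
        apost1 = siso_decode(llr_sys, llr_par1, ext21)
        ext12 = [o - s - a for o, s, a in zip(apost1, llr_sys, ext21)]   # strip inputs
        apost2 = siso_decode(llr_sys_i, llr_par2, [ext12[j] for j in perm])
        ext21_i = [o - s - a for o, s, a in
                   zip(apost2, llr_sys_i, [ext12[j] for j in perm])]
        ext21 = [ext21_i[inv[k]] for k in range(n)]      # back to D1's bit ordering
    return [1 if l > 0 else 0 for l in apost1]           # hard decisions, cf. eq. (6.45)
```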
The complexity of convolutional codes has slowed the development of low-cost Turbo
Convolutional Codes (TCC) decoders. On the other hand, another type of turbo code, known
as Turbo Product Code (TPC), uses block codes, solving multiple steps simultaneously,
thereby achieving high data throughput in hardware. We give here a brief introduction to
product codes. Let us consider two systematic linear block codes, C₁ with parameters (n₁, k₁, d₁)
and C₂ with parameters (n₂, k₂, d₂), where n_i, k_i and d_i (i = 1, 2) stand for codeword length,
number of information bits and minimum Hamming Distance respectively. The concatenation
of two block codes (or product code) P = C₁ * C₂ is obtained (see Fig. 6.32) by the following
steps:
Fig. 6.32 Example of a Product Code P = C₁ * C₂.
(i) placing (k₁ × k₂) information bits in an array of k₁ rows and k₂ columns,
(ii) coding the k₁ rows using code C₂,
(iii) coding the n₂ columns using code C₁.
The parameters of the product code P are: n = n₁ * n₂, k = k₁ * k₂, d = d₁ * d₂, and the code
rate R is given by R₁ * R₂, where R_i is the code rate of code C_i. Thus, we can build very long block
codes with large minimum Hamming Distance by combining short codes with small minimum
Hamming Distance. Given the procedure used to construct the product code, it is clear that the
(n₂ − k₂) last columns of the matrix are codewords of C₁. By using the generator matrix, one can
show that the (n₁ − k₁) last rows of matrix P are codewords of C₂. Hence all the rows of matrix P are
codewords of C₂ and all the columns of matrix P are codewords of C₁.
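The construction in steps (i)-(iii) is easy to reproduce in a few lines. The sketch below (ours, not from the text) builds a small product code using single-parity-check codes as C₁ and C₂ purely for illustration; any pair of systematic linear block codes could be substituted.

```python
import numpy as np

def spc_encode_rows(info):
    """Append one even-parity bit to every row (a toy systematic block code)."""
    parity = info.sum(axis=1, keepdims=True) % 2
    return np.hstack([info, parity])

def product_encode(info_bits, k1, k2):
    """Steps (i)-(iii): arrange k1*k2 bits in a k1-by-k2 array, encode the rows
    with C2, then encode the resulting columns with C1."""
    arr = np.array(info_bits, dtype=int).reshape(k1, k2)
    rows_coded = spc_encode_rows(arr)             # k1 x n2
    cols_coded = spc_encode_rows(rows_coded.T).T  # n1 x n2, includes checks on checks
    return cols_coded

P = product_encode([1, 0, 1, 1, 0, 0], k1=2, k2=3)
print(P)   # a 3 x 4 array: n = 12, k = 6, and d = d1 * d2 = 4 for these SPC codes
```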
Let us now consider the decoding of the rows and columns of a product code P transmitted
on a Gaussian Channel using QPSK signalling. On receiving the matrix R corresponding to a
transmitted codeword E, the first decoder performs the soft decoding of the rows (or columns)
of P using as input the matrix R. Soft Input / Soft Output decoding is performed using the new
algorithm proposed by R. Pyndiah. By subtracting the soft input from the soft output we obtain
the extrinsic information W(2), where index 2 indicates that we are considering the extrinsic
information for the second decoding of P, which was computed during the first decoding of P.
The soft input for the decoding of the columns (or rows) at the second decoding of P is given by

    R(2) = R + a(2)W(2),                                               (6.61)

where a(2) is a scaling factor which takes into account the fact that the standard deviation of
samples in matrix R and in matrix W are different. The standard deviation of the extrinsic
information is very high in the first decoding steps and decreases as we iterate the decoding.
This scaling factor a is also used to reduce the effect of the extrinsic information in the soft
decoder in the first decoding steps when the BER is relatively high. It takes a small value in the
first decoding steps and increases as the BER tends to 0. The decoding procedure described
above is then generalized by cascading the elementary decoders illustrated in Fig. 6.33.
Fig. 6.33 Block Diagram of an Elementary Block Turbo Decoder.
Let us now briefly look at the performance of Turbo Codes and compare it to that of other
existing schemes. As shown in Fig. 6.34, Turbo Codes are the best practical codes due to their
performance at low SNR (at high SNRs, the Reed-Solomon Codes outperform Turbo Codes!). It
is obvious from the graph that the Recursive Systematic Convolutional (RSC) Turbo Code is
the best practical code known so far because it can achieve low BER at low SNR and is the
closest to the theoretical maximum of channel performance, the Shannon Limit. The magnitude
of how well it performs is determined by the coding gain. It can be recalled that the coding gain
is the difference in SNR between a coded channel and an uncoded channel for the same
performance (BER).
Fig. 6.34 Comparison of Different Coding Systems (bit error rate versus signal to noise ratio in dB).
Coding gain can be determined by measuring the distance between the SNR values of any of the
coded channels and the uncoded channel at a given BER. For example,
the coding gain for the RSC Turbo code, with rate 1/2 at a BER of 10⁻⁵, is about 8.5 dB. The
physical consequence can be visualized as follows. Consider space communication where the
received power follows the inverse square law (P_R ∝ 1/d²). This means that the Turbo coded
signal can either be received 2.65 (≈ √7) times farther away than the uncoded signal (at the same
transmitting power), or it only requires 1/7 the transmitting power (for the same transmitting
distance). Another way of looking at it is to turn it around and talk about portable device battery
lifetimes. For instance, since the RSC Turbo Coded Channel requires only 1/7 the power of the
uncoded channel, we can say that a device using a Turbo codec, such as a cell phone, has a
battery life 7 times longer than the device without any channel coding.
6.13 CONCLUDING REMARKS
The notion of convolutional codes was first proposed by Elias (1954) and later developed by
Wozencraft (1957) and Ash (1963). A class of multiple error correcting convolutional code was
suggested by Massey (1963). The study of the algebraic structure of convolutional codes was
carried out by Massey (1968) and Forney (1970).
Viterbi Decoding was developed by Andrew J. Viterbi, founder of Qualcomm Corporation.
His seminal paper on the technique titled "Error Bounds for Convolutional Codes and an
Asymptotically Optimum Decoding Algorithm," was published in IEEE Transactions on
Information Theory, Volume IT-13, pages 260-269, in April, 1967. In 1968, Heller showed that
the Viterbi Algorithm is practical if the constraint length is not too large.
Turbo Codes represent the next leap forward in error correction. Turbo Codes were
introduced in 1993 at the International Conference on Communications (ICC) by Berrou, Glavieux
and Thitimajshima in their paper "Near-Shannon-Limit Error Correction Coding and
Decoding: Turbo-Codes". These codes get their name from the fact that the decoded data are
recycled through the decoder several times. The inventors probably found this reminiscent of
the way a turbocharger operates. Turbo Codes have been shown to perform within 1 dB of the
Shannon Limit at a BER of 10⁻⁵. They break a complex decoding problem down into simple
steps, where each step is repeated until a solution is reached. The term "Turbo Code" is often
steps, where each step is repeated until a solution is reached. The term "Turbo Code" is often
used to refer to turbo convolutional codes (TCCs)-one form of Turbo Codes. The symbol-by-
symbol Maximum A Posteriori (MAP) algorithm of Bahl, Cocke, Jelinek and Raviv, published in
1974 (nineteen years before the introduction of Turbo Codes!), was used by Berrou et al. for the
iterative decoding of their Turbo Codes. The complexity of convolutional codes has slowed the
development of low-cost TCC decoders. On the other hand, another type of Turbo Code,
known as Turbo Product Code (TPC), uses block codes, solving multiple steps simultaneously,
thereby achieving high data throughput in hardware.
SUMMARY
• An important sub-class of tree codes is called Convolutional Codes. Convolutional Codes
make decisions based on past information, i.e. memory is required. A (k₀, n₀) tree code
that is linear, time-invariant, and has a finite wordlength k = (m + 1)k₀ is called an (n, k)
Convolutional Code.
• For Convolutional Codes, much smaller blocks of uncoded data of length k₀ are used.
These are called Information Frames. These Information Frames are encoded into
Codeword Frames of length n₀. The rate of this Tree Code is defined as R = k₀/n₀.
• The constraint length of a shift register encoder is defined as the number of symbols it
can store in its memory.
• For Convolutional Codes, the Generator Polynomial Matrix of size k₀ × n₀ is given by
G(D) = [g_ij(D)], where g_ij(D) are the generator polynomials of the code. g_ij(D) are obtained
simply by tracing the path from input i to output j.
• The Wordlength of a Convolutional Code is given by k = k₀ max_{i,j} [deg g_ij(D) + 1], the
Blocklength is given by n = n₀ max_{i,j} [deg g_ij(D) + 1] and the constraint length is given by
v = Σ_{i=1}^{k₀} max_j [deg g_ij(D)].
• The encoding operation can simply be described as a vector matrix product, C(D) =
I(D) G(D), or equivalently, c_j(D) = Σ_{i=1}^{k₀} I_i(D) g_{ij}(D).
• A parity check matrix H(D) is an (n₀ − k₀) by n₀ matrix of polynomials that satisfies
G(D)H(D)ᵀ = 0, and the syndrome polynomial vector, which is an (n₀ − k₀)-component row
vector, is given by s(D) = v(D)H(D)ᵀ.
• A systematic encoder for a convolutional code has the generator polynomial matrix of
the form G(D) = [I | P(D)], where I is a k₀ by k₀ identity matrix and P(D) is a k₀ by (n₀ − k₀)
matrix of polynomials. The Parity Check Polynomial Matrix for a Systematic
Convolutional Encoder is H(D) = [−P(D)ᵀ | I].
• A Convolutional Code whose generator polynomials g₁(D), g₂(D), ..., g_{n₀}(D) satisfy
GCD[g₁(D), g₂(D), ..., g_{n₀}(D)] = Dᵃ, for some a, is called a Non-Catastrophic Convolutional
Code. Otherwise it is called a Catastrophic Convolutional Code.
• The lth minimum distance d_l of a Convolutional Code is equal to the smallest Hamming
Distance between any two initial codeword segments l frames long that are not identical in
the initial frame. If l = m + 1, then this (m + 1)th minimum distance is called the minimum
distance of the code and is denoted by d*, where m is the number of information frames
that can be stored in the memory of the encoder. In literature, the minimum distance is
also denoted by dmin.
• If the lth minimum distance of a Convolutional Code is d_l, the code can correct t errors
occurring in the first l frames provided d_l ≥ 2t + 1. The free distance of a Convolutional
Code is given by dfree = max_l [d_l].
• The Free Length nfree of a Convolutional Code is the length of the non-zero segment of a
smallest weight convolutional codeword of non-zero weight. Thus, d_l = dfree if l = nfree, and
d_l < dfree if l < nfree. In literature, nfree is also denoted by n∞.
• Another way to find out the dfree of a Convolutional Code is to use the concept of a
generating function, whose expansion provides all the distance information directly.
• The generator matrix for the Convolutional Code is given by

        [ G₀  G₁  G₂  ...  Gₘ    0     0    0   ... ]
    G = [ 0   G₀  G₁  ...  Gₘ₋₁  Gₘ    0    0   ... ]
        [ 0   0   G₀  ...  Gₘ₋₂  Gₘ₋₁  Gₘ   0   ... ]
        [ ...                                       ]
• The Viterbi Decoding Technique is an efficient decoding method for Convolutional
Codes. Its computational requirements grow exponentially as a function of the constraint
length.
• For rate R and constraint length v, let d be the largest integer that satisfies H(d/(n₀v)) ≤ 1 − R.
Then, at least one Binary Convolutional Code exists with minimum distance d for which
the above inequality holds. Here H(x) is the familiar entropy function for a binary
alphabet.
• For a binary code with R = 1/n₀ the minimum distance dmin satisfies dmin ≤ ⌊(n₀v + n₀)/2⌋,
where ⌊x⌋ denotes the largest integer less than or equal to x.
• An upper bound on dfree given by Heller is dfree = min_{j≥1} ⌊ (n₀/2) (2^j/(2^j − 1)) (v + j − 1) ⌋. To
calculate the upper bound, the right-hand side should be plotted for different integer
values of j. The upper bound is the minimum of this plot.
• For Convolutional Codes, the upper bound on the first event error probability can be obtained
by P_e ≤ T(D)|_{D=2√(p(1−p))}, and the bit error probability satisfies
P_b ≤ (1/k) ∂T(D, I)/∂I |_{I=1, D=2√(p(1−p))}.
• Turbo codes are actually a quasi mix between Block and Convolutional Codes. Turbo
Codes typically use at least two convolutional component encoders and two maximum
a posteriori (MAP) algorithm component decoders. Although the
encoder determines the capability for error correction, it is the decoder that
determines the actual performance.
"It's kind of fun to do the impossible."
Walt Disney (1901-1966)
PROBLEMS
6.1 Design a rate 1/2 Convolutional encoder with a constraint length v = 4 and d* = 6.
(i) Construct the State Diagram for this encoder.
(ii) Construct the Trellis Diagram for this encoder.
(iii) What is the dfree for this code?
(iv) Give the Generator Matrix, G.
(v) Is this code Non-Catastrophic? Why?
6.2 Design a (12, 3) systematic convolutional encoder with a constraint length v = 3 and d* ≥ 8.
(i) Construct the Trellis Diagram for this encoder.
(ii) What is the dfree for this code?
6.3 Consider the binary encoder shown in Fig. 6.35.
Fig. 6.35
(i) Construct the Trellis Diagram for this encoder.
(ii) Write down the values of k₀, n₀, v, m and R for this encoder.
(iii) What are the values of d* and dfree for this code?
(iv) Give the Generator Polynomial Matrix, G(D).
6.4 Consider the binary encoder shown in Fig. 6.36.
Fig. 6.36
(i) Write down the values of k, n, v, m and R for this encoder.
(ii) Give the Generator Polynomial Matrix G{D) for this encoder.
(iii) Give the Generator Matrix G for this encoder.
(iv) Give the Parity Check Matrix H for this encoder.
(v) What are the values of d*, dfree and nfree for this code?
(vi) Is this encoder optimal in the sense of the Heller Bound on dfree?
(vii) Encode the following sequence of bits using this encoder: 101 001 001 010 000.
6.5 Consider a convolutional encoder described by its Generator Polynomial Matrix, defined
over GF(2):
[
D 0
G{D) = D 2
0
1 0
1 D2
0 1+D
D 2
0
(i) Draw the circuit realization of this encoder using shift registers. What is the value of v?
(ii) Is this a Catastrophic Code? Why?
(iii) Is this code optimal in the sense of the Heller Bound on dfree?
6.6 The Parity Check Matrix of the (12, 9) Wyner-Ash code for m = 2 is given as follows.

        [ 1 1 1 1 |         |         |         |     ]
        [ 1 1 0 0 | 1 1 1 1 |         |         |     ]
    H = [ 1 0 1 0 | 1 1 0 0 | 1 1 1 1 |         |     ]
        [ 0 0 0 0 | 1 0 1 0 | 1 1 0 0 | 1 1 1 1 | ... ]
        [ 0 0 0 0 | 0 0 0 0 | 1 0 1 0 | 1 1 0 0 | ... ]
(i) Determine the Generator Matrix, G.
(ii) Determine the Generator Polynomial Matrix, G{D).
(iii) Give the circuit realization of the (12, 9) Wyner-Ash Convolutional Code.
(iv) What are the values of d* and dfree for this code?
6.7 Consider a Convolutional Encoder defined over GF(4) with the Generator Polynomials
    g₁(D) = 2D³ + 3D² + 1 and
    g₂(D) = D³ + D + 1.
(i) What is the minimum distance of this code?
(ii) Is this code Non-Catastrophic? Why?
6.8 Let the Generator Polynomials of a rate 1/3 binary Convolutional Encoder be given by
    g₁(D) = D³ + D² + 1,
    g₂(D) = D³ + D and
    g₃(D) = D³ + 1.
(i) Encode the bit stream: Q_J__1OQQ11110101;
(ii) Encode the bit stream: 1010101010 ....
(iii) Decode the received bit stream: 001001101111000110011.
6.9 Consider a rate 1/2 Convolutional Encoder defined over GF(3) with the Generator
Polynomials
    g₁(D) = 2D³ + 2D² + 1 and
    g₂(D) = D³ + D + 2.
(i) Show the circuit realization of this encoder.
(ii) What is the minimum distance of this code?
(iii) Encode the following string of symbols using this encoder: 2012111002102.
(iv) Suppose the error vector is given by 0010102000201. Construct the received vector
and then decode this received vector using the Viterbi Algorithm.
COMPUTER PROBLEMS
6.10 Write a computer program that determines the Heller Bound on dfree, given the values for
n0 and v.
6.11 Write a computer program to exhaustively search for good systematic Convolutional
Codes. The program should loop over the parameters k₀, n₀, v, m, etc. and determine the
Generator Polynomial Matrix (in octal format) for the best Convolutional Code in its
category.
6.12 Write a program that calculates the d* and dfree given the generator polynomial matrix of
any convolutional encoder.
6.13 Write a computer program that constructs all possible rate 1/2 Convolutional Encoders for
a given constraint length v and chooses the best code for a given value of v. Using the
program, obtain the following plots:
(i) the minimum distance, d* versus v, and
(ii) the free distance, dfree versus v.
Comment on the error correcting capability of Convolutional Codes in terms of the
memory requirement.
6.14 Write a Viterbi Decoder in software that takes in the following:
(i) code parameters in the Octal Format, and
(ii) the received bit stream
The decoder then produces the survivors and the decoded bit stream.
6.15 Verify the Heller Bound on the entries in Table 6.4 for v = 3 , 4, ..., 7.
6.16 Write a generalized computer program for a Turbo Encoder. The program should take in
the parameters for the two encoders and the type of interleaver. It should then generate
the encoded bit-stream when an input (uncoded) bit-stream is fed into the program.
6.17 Modify the Turbo Encoder program developed in the previous question to determine the
dfree of the Turbo Encoder.
6.18 Consider the rate 1/3 Turbo Encoder shown in Fig. 6.37. Let the random interleaver size
be 256 bits.
(i) Find the dfree of this Turbo encoder.
(ii) If the input bit rate is 28.8 kb/s, what is the time delay caused by the Encoder?
6.19 Write a generalized computer program that performs Turbo Decoding using the iterative
MAP Decoding algorithm. The program should take in the parameters for the two
encoders, the type of interleaver used for encoding and the SNR. It should produce a
sequence of decoded bits when fed with a noisy, encoded bit-stream.
6.20 Consider the rate 1/3 Turbo Encoder comprising the following constituent encoders:

    G₁(D) = G₂(D) = [ 1    (1 + D² + D³ + D⁴)/(1 + D + D⁴) ].

The encoded output consists of the information bit, followed by the two parity bits from
the two encoders. Thus the rate of the encoder is 1/3. Use a random interleaver of size
256.
Fig. 6.37 Turbo Encoder for Problem 6.18.
(i) For this Turbo Encoder, generate a plot for the bit error rate (BER) versus the signal
to noise ratio (SNR). Vary the SNR from -2 dB through 10 dB.
(ii) Repeat the above for an interleaver of size 1024. Comment on your results.
7
Trellis Coded Modulation
7.1 INTRODUCTION TO TCM
In the previous chapters we have studied a number of error control coding techniques. In all
these techniques, extra bits are added to the information bits in a known manner. However, the
improvement in the Bit Error Rate is obtained at the expense of bandwidth caused by these
extra bits. This bandwidth expansion is equal to the reciprocal of the code rate.
For example, an RS (255, 223) Code has a code rate R = 223/255 = 0.8745 and 1/R = 1.1435.
Hence, to send 100 information bits, we have to transmit 14.35 extra bits (overhead). This
translates to a bandwidth expansion of 14.35%. Even for this efficient RS (255, 223) code, the
excess bandwidth requirement is not small.
In power limited channels (like deep space communications) one may trade the bandwidth
expansion for a desired performance. However, for bandwidth limited channels (like the
telephone channel), this may not be the ideal option. In such channels, a bandwidth efficient
signalling scheme such as Pulse Amplitude Modulation (PAM), Quadrature Amplitude
Modulation (QAM) or Multi Phase Shift Keying (MPSK) is usually employed to support high
bandwidth efficiency (in bits/s/Hz).
In general, either extra bandwidth or a higher signal power is needed in order to improve the
performance (error rate). Is it possible to achieve an improvement in system performance
without sacrificing either the bandwidth (which translates to the data rate) or using additional
power? In this chapter we study a coding technique called the Trellis Coded Modulation
Technique, which can achieve better performance without bandwidth expansion or using extra
power.
We begin this chapter by introducing the concept of coded modulation. We, then, study
some design techniques to construct good Coded Modulation Schemes. Finally, the
performance of different Coded Modulation Schemes are discussed for Additive White
Gaussian Noise (AWGN) Channels as well as for Fading Channels.
7.2 THE CONCEPT OF CODED MODULATION
Traditionally, coding and modulation were considered two separate parts of a digital
communications system. The input message stream is first channel encoded (extra bits are
added) and then these encoded bits are converted into an analog waveform by the modulator.
The objective of both the channel encoder and the modulator is to correct errors resulting from
use of a non-ideal channel. Both these blocks (the encoder and the modulator) are optimized
independently even though their objective is the same, that is, to correct errors introduced by the
channel! As we have seen, a higher performance is possible by lowering the code rate at the cost
of bandwidth expansion and increased decoding complexity. However, it is possible to obtain
Coding Gain without bandwidth expansion if the channel encoder is integrated with the
modulator. We illustrate this by a simple example.
Example 7.1 Consider data transmission over a channel with a throughput of 2 bits/s/Hz. One
possible solution is to use uncoded QPSK. Another possibility is to first use a rate 2/3
Convolutional Encoder (which converts 2 uncoded bits to 3 coded bits) and then use an 8-PSK
signal set which has a throughput of 3 bit/s/Hz. This coded 8-PSK scheme yields the same
information data throughput as the uncoded QPSK (2 bit/s/Hz). Note that both the QPSK and the
8-PSK schemes require the same bandwidth. But we know that the symbol error rate for the 8-PSK
is worse than that of QPSK for the same energy per symbol. However, the 2/3 convolutional
encoder would provide some coding gain. It may be possible that the coding gain provided by the
encoder outweighs the performance loss because of the 8-PSK signal set. If the coded modulation
scheme performs superior to the uncoded one at the same SNR, we can claim that an improvement
is achieved without sacrificing either the data rate or the bandwidth. In this example we have
combined a trellis encoder with the modulator. Such a scheme is called a Trellis Coded
Modulation (TCM) scheme.
We observe that the expansion of the signal set to provide redundancy results in the shrinking
of the Euclidean distance between the signal points, if the average signal energy is to be kept
constant (Fig. 7.1). This reduction in the Euclidean distance increases the error rate which
should be compensated with coding (increase in the Hamming Distance). Here we are assuming
an AWGN channel. We also know that the use of a hard-decision demodulation prior to decoding
in a coded scheme causes an irreversible loss of information. This translates to a loss of SNR.
For coded modulation schemes, where the expansion of the signal set implies a power penalty,
the use of soft-decision decoding is imperative. As a result, demodulation and decoding should be
combined in a single step, and the decoder should operate on the soft output samples of the
channel. For maximum likelihood decoding using soft-decisions, the optimal decoder chooses
that code sequence which is nearest to the received sequence in terms of the Euclidean distance.
Hence, an efficient coding scheme should be designed based on maximizing the minimum
Euclidean distance between the coded sequences rather than the Hamming Distance.
Fig. 7.1 The Euclidean Distances between the Signal Points for QPSK and 8-PSK. For QPSK:
δ₀² = 2E_s and δ₁² = 4E_s. For 8-PSK: δ₀² = 0.586E_s, δ₁² = 2E_s, δ₂² = 3.414E_s and δ₃² = 4E_s.
The Viterbi Algorithm can be used to decode the received symbols for a TCM scheme. In the
previous chapter we saw that the basic idea in Viterbi decoding is to trace out the most likely
path through the trellis. The most likely path is that which is closest to the received sequence in
terms of the Hamming Distance. For a TCM scheme, the Viterbi decoder chooses the most
likely path in terms of Euclidean Distance. The performance of the decoding algorithm depends
on the minimum Euclidean distance between a pair of paths forming an error event.
Definition 7.1 The minimum Euclidean Distance between any two paths in the
trellis is called the Free Euclidean Distance, dfree, of the TCM scheme.
In the previous chapter we had defined dfree in terms of the Hamming Distance between any two
paths in the trellis. The minimum free distance in terms of Hamming Weight could be calculated
as the minimum weight of a path that deviates from the all zero path and later merges back into
the all zero path at some point further down the trellis. This was a consequence of the linearity
of Convolutional Codes. However, the same does not apply for the case of TCM, which is non
linear. It may be possible that dfree is the Euclidean Distance between two paths in the trellis
neither of which is the all zero path. Thus, in order to calculate the Free Euclidean Distance for
a TCM scheme, all pairs of paths have to be evaluated.
Example 7.2 Consider the convolutional encoder followed by a modulation block performing
natural mapping (000 → s₀, 001 → s₁, ..., 111 → s₇) shown in Fig. 7.2. The rate of the encoder
is 2/3. It takes in two bits at a time (a₁, a₂) and outputs three encoded bits (c₁, c₂, c₃). The three
output bits are then mapped to one of the eight possible signals in the 8-PSK signal set.

Fig. 7.2 The TCM Scheme for Example 7.2.
This combined encoding and modulation can also be represented using a trellis with its branches
labelled with the output symbol si. The TCM scheme is depicted below. This is a fully connected
trellis. Each branch is labelled by a symbol from the 8-PSK constellation diagram. In order to
represent the symbol allocation unambiguously, the assigned symbols to the branches are written
at the front end of the trellis. The convention is as follows. Consider state 1. The branch from state
1 to state 1 is labelled with s₀, the branch from state 1 to state 2 is labelled with s₇, the branch from state
1 to state 3 is labelled with s₅ and the branch from state 1 to state 4 is labelled with s₂. So, the 4-tuple
{s₀, s₇, s₅, s₂} in front of state 1 represents the branch labels emanating from state 1 in sequence.
To encode any incoming bit stream, we follow the same procedure as for convolutional encoder.
However, in the case of TCM, the output is a sequence of symbols rather than a sequence of bits.
Suppose we have to encode the bit stream 1 0 1 1 1 0 0 0 1 0 0 1 ... We first group the input
sequence in pairs because the input is two bits at a time. The grouped input sequence is
10 11 10 00 10 01 ...
The TCM encoder output can be obtained simply by following the path in the trellis as dictated
by the input sequence. The first input pair is 10. Starting from the first node in state 0, we traverse
the third branch emanating from this node, as dictated by this input pair. This takes us to state 2. The
symbol output for this branch is s₅. From state 2 we move along the fourth branch, as determined by
the next input pair 11. The symbol output for this branch is s₁. In this manner, the output symbols
corresponding to the given input sequence are obtained.
Fig. 7.3 The Path in the Trellis Corresponding to the Input Sequence 10 11 10 00 ...
The path in the Trellis Diagram is depicted by the bold lines in Fig. 7.3. As in the case of
convolutional encoder, in TCM too, every encoded sequence corresponds to a unique path in the
trellis. The objective of the decoder is to recover this path from the Trellis Diagram.
Example 7.3 Consider the TCM scheme shown in Example 7.2. The free Euclidean Distance,
dfree of the TCM scheme can be found by inspecting all possible pairs of paths in the trellis. The
two paths that are separated by the minimum squared Euclidean Distance (which yields the d²_free)
are shown in the Trellis Diagram given in Fig. 7.4 with bold lines.
Fig. 7.4 The Two Paths in the Trellis that have the Free Euclidean Distance, d²_free.
    d²_free = d²(s₀, s₇) + d²(s₀, s₀) + d²(s₂, s₁)
            = δ₀² + 0 + δ₀² = 2δ₀² = 1.172 E_s

It can be seen that in this case, the error event that results in dfree does not involve the all zero
sequence. As mentioned before, in order to find the dfree, we must evaluate all possible pairs of
paths in the trellis. It is not sufficient just to evaluate the paths diverging from and later merging
back into the all zero path, because of the non-linear nature of TCM.
We must now develop a method to compare the coded scheme with the uncoded one. We
introduce the concept of coding gain below.
Definition 7.2 The difference between the values of the SNR for the coded and
uncoded schemes required to achieve the same error probability is defined as the
Coding Gain, g.

    g = SNR|uncoded − SNR|coded                                        (7.1)

At high SNR, the coding gain can be expressed as

    g_∞ = g|_{SNR→∞} = 10 log₁₀ [ (d²_free/E_s)|coded / (d²_free/E_s)|uncoded ]     (7.2)

where g_∞ represents the Asymptotic Coding Gain and E_s is the average signal
energy. For uncoded schemes, d_free is simply the minimum Euclidean Distance
between the signal points.
Example 7.4 Consider the TCM scheme discussed in Example 7.2 in which the encoder takes in
2 bits at a time. If we were to send uncoded bits, we would employ QPSK. The d²_free for the uncoded
scheme (QPSK) is 2E_s from Fig. 7.1. From Example 7.3 we have d²_free = 1.172E_s for our TCM
scheme. The Asymptotic Coding Gain is then given by

    g_∞ = 10 log₁₀ (1.172/2) = −2.3 dB.

This implies that the performance of our TCM scheme is actually worse than the uncoded
scheme. A quick look at the convolutional encoder used in this example suggests that it has good
properties in terms of Hamming Distance. In fact, it can be verified that this convolutional encoder
is optimal in the sense of maximizing the free Hamming Distance. However, the encoder fails to
perform well for the case of TCM. This illustrates the point that TCM schemes must be designed to
maximize the Euclidean Distance rather than the Hamming Distance.
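The asymptotic coding gain of (7.2) is a one-line computation once the two squared free distances are known. The snippet below (ours, not from the book) reproduces the number obtained in Example 7.4.

```python
import math

def asymptotic_coding_gain(dfree2_coded, dfree2_uncoded):
    """Asymptotic coding gain of eq. (7.2), assuming equal average energy E_s."""
    return 10 * math.log10(dfree2_coded / dfree2_uncoded)

# Example 7.4: d_free^2 = 1.172 E_s for the TCM scheme versus 2 E_s for uncoded QPSK.
print(round(asymptotic_coding_gain(1.172, 2.0), 1))   # -> -2.3 dB
```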
For the fully connected trellis discussed in Example 7.2, by a proper choice of the mapping
scheme, we can improve the performance. In order to design a better TCM scheme, it is possible
to work directly from the trellis onwards. The objective is to assign the 8 symbols from the
8-PSK signal set in such a manner that the dfree is maximized. One approach is to use an
exhaustive computer search. There are a total of 16 branches that have to be assigned labels
(symbols) from time t_k to t_{k+1}. We have 8 symbols to choose from. Thus, an exhaustive search
would involve 8¹⁶ different cases!
Another approach is to assign the symbols to the branches in the trellis in a heuristic manner
so as to increase the dfree. We know that an error event consists of a path diverging in one state
and then merging back after one or more transitions, as depicted in Fig. 7.5. The Euclidean
Distance associated with such an error event can be expressed as

    d²_total = d²(diverging pair of paths) + ... + d²(re-merging pair of paths)     (7.3)

Fig. 7.5 An Error Event.
Thus, in order to design a TCM scheme with a large dfree, we can at least ensure that the d²
(diverging pair of paths) and the d² (re-merging pair of paths) are as large as possible. In TCM
schemes, a redundant 2^{m+1}-ary signal set is often used to transmit m bits in each signalling
interval. The m input bits are first encoded by an m/(m+1) convolutional encoder. The resulting
m + 1 output bits are then mapped to the signal points of the 2^{m+1}-ary signal set. Now, recall that
the maximum likelihood decoding rule for the AWGN channel with soft decision decoding is
to minimize the squared Euclidean Distance between the received vector and the code vector
estimate from the trellis diagram (see Section 6.7, Chapter 6). Therefore, the mapping is done in
such a manner as to maximize the minimum Euclidean Distance between the different paths in
the trellis. This is done using a rule called Mapping by Set Partitioning.
7.3 MAPPING BY SET PARTITIONING
The Mapping by Set Partitioning is based on successive partitioning of the expanded 2^{m+1}-ary
signal set into subsets with increasing minimum Euclidean Distances. Each time we partition
the set, we reduce the number of signal points in the subset, but increase the minimum
distance between the signal points in the subset. The set partitioning can be understood by the
following example.
Example 7.5 Consider the set partitioning of 8-PSK. Before partitioning, the minimum Euclidean
Distance of the signal set is Δ₀ = δ₀. In the first step, the 8 points in the constellation diagram are
subdivided into two subsets, A₀ and A₁, each containing 4 signal points as shown in Fig. 7.6. As a
result of this first step, the minimum Euclidean Distance of each of the subsets is now Δ₁ = δ₁,
which is larger than the minimum Euclidean Distance of the original 8-PSK. We continue this
procedure and subdivide the sets A₀ and A₁ into two subsets each, A₀ → {A₀₀, A₀₁} and A₁ →
{A₁₀, A₁₁}. As a result of this second step, the minimum Euclidean Distance of each of the subsets
is now Δ₂ = δ₂. Further subdivision results in one signal point per subset.
Fig. 7.6 Set Partitioning of the 8-PSK Signal Set.
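The partitioning of Example 7.5 can be checked numerically. The following sketch assumes unit average energy (E_s = 1) and the indexing s_k = √E_s e^{j2πk/8}; it splits the 8-PSK set into the subsets A₀, A₁, A₀₀ and reports the intra-subset minimum squared distances δ₀², δ₁² and δ₂².

```python
import numpy as np

# 8-PSK signal points with average energy Es = 1
Es = 1.0
pts = {k: np.sqrt(Es) * np.exp(1j * 2 * np.pi * k / 8) for k in range(8)}

def min_sq_distance(subset):
    """Minimum squared Euclidean distance within a subset of signal indices."""
    return min(abs(pts[a] - pts[b])**2
               for i, a in enumerate(subset) for b in subset[i + 1:])

A = list(range(8))                    # full 8-PSK set
A0, A1 = A[0::2], A[1::2]             # first partitioning step
A00, A01 = A0[0::2], A0[1::2]         # second partitioning step

print(min_sq_distance(A))    # delta_0^2 = (2 - sqrt(2)) Es, about 0.586
print(min_sq_distance(A0))   # delta_1^2 = 2 Es
print(min_sq_distance(A00))  # delta_2^2 = 4 Es
```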
Consider the expanded 2^{m+1}-ary signal set used for TCM. In general, it is not necessary to continue
the process of set partitioning until the last stage. The partitioning can be stopped as soon as the
minimum distance of a subset is larger than the desired minimum Euclidean Distance of the TCM
scheme to be designed. Suppose the desired Euclidean Distance is obtained just after the (m̃ + 1)th
set partitioning step (m̃ ≤ m). It can be seen that after m̃ + 1 steps we have 2^{m̃+1} subsets and
each subset contains 2^{m−m̃} signal points.
A general structure of a TCM encoder is given in Fig. 7.7. It consists of m input bits of which
m̃ bits are fed into a rate m̃/(m̃+1) convolutional encoder while the remaining m − m̃ bits
are left uncoded. The m̃ + 1 output bits of the encoder along with the m − m̃ uncoded bits are
then input to the signal mapper. The signal mapper uses the m̃ + 1 bits from the convolutional
encoder to select one of the possible 2^{m̃+1} subsets. The remaining m − m̃ uncoded bits are used
to select one of the 2^{m−m̃} signals from this subset. Thus, the input to the TCM encoder is m
bits and the output is a signal point chosen from the original constellation.
Fig. 7.7 The General Structure of a TCM Encoder (the m̃ + 1 coded bits select the subset, the m − m̃ uncoded bits select the signal from that subset).
For the TCM encoder shown in Fig. 7.7 we observe that the m − m̃ uncoded bits have no effect on
the state of the convolutional encoder because its input is not being altered. Thus, we can
change these m − m̃ bits of the total m input bits without changing the encoder state. This
implies that 2^{m−m̃} parallel transitions exist between states. These parallel transitions are
associated with the signals of the subsets in the lowest layer of the set partitioning tree. For the
case of m̃ = m, the states are joined by single transitions.
Let us denote the minimum Euclidean Distance between parallel transitions by Δ_{m̃+1} and
the minimum Euclidean Distance between non-parallel paths of the trellis by d_free(m̃). The free
Euclidean Distance of the TCM encoder shown in Fig. 7.7 can then be written as
d_free = min [Δ_{m̃+1}, d_free(m̃)].  (7.4)
Example 7.6 Consider the TCM scheme proposed by Ungerboeck. It is designed to maximize the
Free Euclidean Distance between coded sequences. It consists of a rate 2/3 convolutional encoder
coupled with an 8-PSK signal set mapping. The encoder is given in Fig. 7.8 and the corresponding
trellis diagram in Fig. 7.9.
Fig. 7.8 The TCM Encoder for Example 7.6 (natural mapping onto 8-PSK).
s₀ s₄ s₂ s₆
s₁ s₅ s₃ s₇
s₂ s₆ s₀ s₄
s₃ s₇ s₁ s₅
Fig. 7.9 The Trellis Diagram for the Encoder in Example 7.6.
For this encoder m = 2 and m̃ = 1, which implies that there are 2^{m−m̃} = 2¹ = 2 parallel transitions
between each state. The minimum squared Euclidean Distance between parallel transitions is
Δ²_{m̃+1} = Δ₂² = δ₂² = 4E_s.
The minimum squared Euclidean Distance between non-parallel paths in the trellis, d²_free(m̃), is
given by the error event shown in Fig. 7.9 by bold lines. From the figure, we have
d²_free(m̃) = d²_E(s₀, s₂) + d²_E(s₀, s₁) + d²_E(s₀, s₂)
= δ₁² + δ₀² + δ₁² = 4.586 E_s.
The error events associated with the parallel paths have the minimum squared Euclidean
Distance among all the possible error events. Therefore, the minimum squared Euclidean Distance
for the TCM scheme is d²_free = min[Δ²_{m̃+1}, d²_free(m̃)] = 4E_s. The asymptotic coding gain of this
scheme is
g_∞ = 10 log (4/2) = 3 dB
This shows that the TCM scheme proposed by Ungerboeck shows an improvement of 3 dB over
the uncoded QPSK. This example illustrates the point that the combined coded modulation scheme
can compensate for the loss from the expansion of the signal set by the coding gain achieved by the
convolutional encoder. Further, for the non-parallel paths
d²_E = d²(diverging pair of paths) + ... + d²(re-merging pair of paths)
= δ₁² + ... + δ₁² = (δ₁² + δ₁²) + ... = δ₂² + ... = 4E_s + ...
However, the minimum squared Euclidean Distance for the parallel transitions is δ₂² = 4E_s.
Hence, the minimum squared Euclidean Distance of this TCM scheme is determined by the parallel
transitions.
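As a quick numerical check of Example 7.6, the sketch below evaluates d²_free = min[Δ₂², d²_free(m̃)] and the resulting asymptotic coding gain over uncoded QPSK, assuming the standard 8-PSK distances δ₀² = (2 − √2)E_s, δ₁² = 2E_s and δ₂² = 4E_s.

```python
import math

Es = 1.0
d0_sq = (2 - math.sqrt(2)) * Es   # delta_0^2 for 8-PSK
d1_sq = 2 * Es                    # delta_1^2
d2_sq = 4 * Es                    # delta_2^2

# Parallel transitions vs. the shortest non-parallel error event of Fig. 7.9
parallel_sq = d2_sq                        # Delta_2^2
non_parallel_sq = d1_sq + d0_sq + d1_sq    # three-branch error event
d_free_sq = min(parallel_sq, non_parallel_sq)

gain_db = 10 * math.log10(d_free_sq / (2 * Es))   # uncoded QPSK has d^2 = 2 Es
print(d_free_sq, round(gain_db, 2))                # 4.0, 3.01 dB
```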
7.4 UNGERBOECK'S TCM DESIGN RULES
In 1982 Ungerboeck proposed a set of design rules for maximizing the free Euclidean Distance
for TCM schemes. These design rules are based on heuristics.
Rule 1: Parallel Transitions, if present, must be associated with the signals of the subsets in the
lowest layer of the Set Partitioning Tree. These signals have the minimum Euclidean Distance Δ_{m̃+1}.
Rule 2: The transitions originating from or merging into one state must be associated with signals
of the first step of set partitioning. The Euclidean Distance between these signals is at least Δ₁.
Rule 3: All signals are used with equal frequency in the Trellis Diagram.
Example 7.7 Next, we wish to improve upon the TCM scheme proposed in Example 7.6. We
observed in the previous example that the parallel transitions limit the d_free. Therefore, we must
come up with a trellis that has no parallel transitions. The absence of parallel paths would imply
that the d²_free is not limited to δ₂², the maximum possible squared separation between two signal points in the
8-PSK Constellation Diagram. Consider the Trellis Diagram shown in Fig. 7.10. The trellis has 8
states. There are no Parallel Transitions in the Trellis Diagram. We wish to assign the symbols
from an 8-PSK signal set to the branches of this trellis according to the Ungerboeck rules.
Since there are no parallel transitions here, we start directly with Ungerboeck's second rule. We
must assign the transitions originating from or merging into one state with signals from the first
step of set partitioning. We will refer to Fig. 7.6 for the Set Partitioning Diagram for 8-PSK. The
first step of set partitioning yields two subsets, A₀ and A₁, each consisting of four signal points. We
first focus on the diverging paths. Consider the topmost node (state S₀). We assign to these four
diverging paths the signals s₀, s₄, s₂ and s₆. Note that they all belong to the subset A₀. For the next
node (state S₁), we assign the signals s₁, s₅, s₃ and s₇ belonging to the subset A₁. For the next node
(state S₂), we assign the signals s₄, s₀, s₆ and s₂ belonging to the subset A₀. The order has been
shuffled to ensure that at the re-merging end we still have signals from the first step of set
partitioning. If we observe the four paths that merge into the node of state S₀, their branches are
labelled s₀, s₄, s₂ and s₆, which belong to A₀. This clever assignment has ensured that the transitions
originating from or merging into one state are labelled with signals from the first step of set
partitioning, thus satisfying Rule 2. It can be verified that all the signals have been used with equal
frequency. We did not have to make any special effort to ensure that.
The error event corresponding to the squared free Euclidean Distance is shown in the Trellis
Diagram with bold lines. The squared free Euclidean Distance of this TCM Scheme is
d²_free = d²_E(s₀, s₆) + d²_E(s₀, s₇) + d²_E(s₀, s₆)
= δ₁² + δ₀² + δ₁² = 4.586 E_s
State S₀: s₀ s₄ s₂ s₆
State S₁: s₁ s₅ s₃ s₇
State S₂: s₄ s₀ s₆ s₂
State S₃: s₅ s₁ s₇ s₃
State S₄: s₂ s₆ s₀ s₄
State S₅: s₃ s₇ s₁ s₅
State S₆: s₆ s₂ s₄ s₀
State S₇: s₇ s₃ s₅ s₁
Fig. 7.10 The Trellis Diagram for the Encoder in Example 7.7.
In comparison to uncoded QPSK, this translates to an asymptotic coding gain of
g_∞ = 10 log (4.586/2) = 3.60 dB
Thus, at the cost of added encoding and decoding complexity, we have achieved a 0.6 dB gain over
the TCM scheme discussed in Example 7.6.
Example 7.8 Consider the 8 state, 8-PSK TCM scheme discussed in Example 7.7. The equivalent
systematic encoder realization with feedback is given in Fig. 7.11.
Fig. 7.11 The TCM Encoder for Example 7.7 (natural mapping onto 8-PSK).
Let us represent the output of the convolutional encoder shown in Fig. 7.11 in terms of the input and
delayed versions of the input (See Section 6.3 of Chapter 6 for analytical representation of
Convolutional Codes). From the figure, we have
c₁(D) = a₁(D),
c₂(D) = a₂(D),
Information Theory, Coding and Cryptography
c₃(D) = (D²/(1 + D³)) a₁(D) + (D/(1 + D³)) a₂(D)
Therefore, the Generator Polynomial Matrix of this encoder is
G(D) = [ 1   0   D²/(1 + D³) ]
       [ 0   1   D/(1 + D³)  ]
and the parity check polynomial matrix, H(D), satisfying G(D) · H^T(D) = 0 is
H(D) = [ D²   D   1 + D³ ].
We can re-write the parity check polynomial matrix as H(D) = [H₁(D) H₂(D) H₃(D)], where
H₁(D) = D² = (000 100)_binary = (04)_octal,
H₂(D) = D = (000 010)_binary = (02)_octal,
H₃(D) = 1 + D³ = (001 001)_binary = (11)_octal.
Table 7.1 gives the encoder realization and asymptotic coding gains of some of the good TCM
codes constructed for the 8-PSK signal constellation. Almost all of these TCM schemes have been
found by exhaustive computer searches. The coding gain is given with respect to uncoded
QPSK. The parity check polynomials are expressed in octal form.
Table 7.1 TCM Schemes Using 8-PSK
Number of states   H₁(D)   H₂(D)   H₃(D)   d²_free/E_s   Asymptotic coding gain (dB)
4                  -       2       5       4.00          3.01
8                  04      02      11      4.58          3.6
16                 16      04      23      5.17          4.13
32                 34      16      45      5.75          4.59
64                 066     030     103     6.34          5.01
128                122     054     277     6.58          5.17
256                130     072     435     7.51          5.75
Example 7.9 We now look at a TCM scheme that involves 16QAM. The TCM encoder takes in
3 bits and outputs one symbol from the 16QAM Constellation Diagram. This TCM scheme has a
throughput of 3 bits/s/Hz and we will compare it with uncoded 8-PSK, which also has a throughput
of 3 bits/s/Hz.
Let the minimum distance between two points in the Signal Constellation of 16QAM be δ₀ as
depicted in Fig. 7.12. It is assumed that all the signals are equiprobable. Then the average signal
energy of a 16QAM signal is obtained as E_s = 2.5 δ₀².
Fig. 7.12 Set Partitioning of 16QAM (Δ₀ = δ₀, Δ₁ = √2 δ₀, Δ₂ = 2δ₀, Δ₃ = 2√2 δ₀).
Thus we have
δ₀ = 2√(E_s/10), i.e., δ₀² = 0.4 E_s.
The Trellis Diagram for the 16QAM TCM scheme is given in Fig. 7.13. The trellis has 8 states.
Each node has 8 branches emanating from it because the encoder takes in 3 input bits at a time
(2³ = 8).
The encoder realization is given in Fig. 7.14. The Ungerboeck design rules are followed to assign
the symbols to the different branches. The branches diverging from a node and the branches
merging back to a node are assigned symbols from the sets A₀ and A₁. The parallel paths are
assigned symbols from the lowest layer of the Set Partitioning Tree (A₀₀₀, A₀₀₁, etc.).
The squared Euclidean Distance between any two parallel paths is Δ₃² = 8δ₀². This is by design,
as we have assigned symbols to the parallel paths from the lowest layer of the Set Partitioning Tree.
The minimum squared Euclidean Distance between non-parallel paths is
d² = Δ₁² + Δ₀² + Δ₁² = 5δ₀²
Therefore, the free Euclidean Distance for the TCM scheme is
d²_free = min {8δ₀², 5δ₀²} = 5δ₀² = 2E_s.
Note that the free Euclidean Distance is determined by the non-parallel paths rather than the
parallel paths. We now compare the TCM scheme with the uncoded 8-PSK, which has the same
throughput. For uncoded 8-PSK, the minimum squared Euclidean Distance is (2 − √2)E_s. Thus,
the asymptotic coding gain for this TCM encoder is
g_∞ = 10 log [2/(2 − √2)] = 5.3 dB
State S₀: A₀₀₀ A₁₀₀ A₀₁₀ A₁₁₀
State S₁: A₀₀₁ A₁₀₁ A₀₁₁ A₁₁₁
State S₂: A₀₀₀ A₁₀₀ A₁₁₀ A₀₁₀
State S₃: A₁₀₁ A₀₀₁ A₁₁₁ A₀₁₁
State S₄: A₀₁₀ A₁₁₀ A₀₀₀ A₁₀₀
State S₅: A₀₁₁ A₁₁₁ A₀₀₁ A₁₀₁
State S₆: A₁₁₀ A₀₁₀ A₁₀₀ A₀₀₀
State S₇: A₁₁₁ A₀₁₁ A₁₀₁ A₀₀₁
Fig. 7.13 Trellis Diagram for the 16QAM TCM Scheme.
Fig. 7.14 The Equivalent Systematic Encoder for the Trellis Diagram Shown in Fig. 7.13 (natural mapping onto 16QAM).
7.5 TCM DECODER
We have seen that, like Convolutional Codes, TCM Schemes are also described using Trellis
Diagrams. Any input sequence to a TCM encoder gets encoded into a sequence of symbols
based on the Trellis Diagram. The encoded sequence corresponds to a particular path in this
trellis diagram. There exists a one-to-one correspondence between an encoded sequence and a
path within the trellis. The task of the TCM decoder is simply to identify the most likely path in
the trellis. This is based on the maximum likelihood criterion. As seen in the previous chapter,
an efficient search method is to use the Viterbi algorithm (see Section 6.7 of the previous
chapter).
For soft decision decoding of the received sequences using the Viterbi Algorithm, each trellis
branch is labelled by the branch metric based on the observed received sequence. Using the
maximum likelihood decoder for the Additive White Gaussian Noise (AWGN) channels, the
branch metric is defined as the Euclidean Distance between the coded sequence and the
received sequence. The Viterbi Decoder finds a path through the trellis which is closest to the
received sequence in the Euclidean Distance sense.
Definition 7.3 The branch metric for a TCM scheme designed for the AWGN
channel is the Euclidean Distance between the received signal and the signal associated
with the corresponding branch in the trellis.
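A minimal sketch of such a decoder is given below. It is not the book's implementation; it assumes the trellis is supplied as two lookup tables (next_state and branch_symbol, both hypothetical names) and uses the squared Euclidean distance of Definition 7.3 as the branch metric.

```python
import numpy as np

def viterbi_tcm(received, next_state, branch_symbol, n_states):
    """Soft-decision Viterbi decoding sketch for a TCM trellis.

    next_state[s][u]    : state reached from state s on input symbol u
    branch_symbol[s][u] : complex channel symbol transmitted on that branch
    """
    n_inputs = len(next_state[0])
    cost = np.full(n_states, np.inf)
    cost[0] = 0.0                              # start in the all-zero state
    paths = {0: []}
    for r in received:
        new_cost = np.full(n_states, np.inf)
        new_paths = {}
        for s in range(n_states):
            if np.isinf(cost[s]):
                continue
            for u in range(n_inputs):
                ns = next_state[s][u]
                metric = cost[s] + abs(r - branch_symbol[s][u])**2
                if metric < new_cost[ns]:      # keep the survivor path
                    new_cost[ns] = metric
                    new_paths[ns] = paths[s] + [u]
        cost, paths = new_cost, new_paths
    best = int(np.argmin(cost))
    return paths[best]                         # decoded input symbol sequence
```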
In the next section, we study the performance of TCM schemes over AWGN channels. We
also develop some design rules.
7.6 PERFORMANCE EVALUATION FOR AWGN CHANNEL
There are different performance measures for a TCM scheme designed for an AWGN channel.
We have already discussed the asymptotic coding gain, which is based on the free Euclidean
Distance, d_free. We will now look at some other parameters that are used to characterize a TCM
Code.
Definition 7.4 The average number of nearest neighbours at free distance, N(d_free),
gives the average number of paths in the trellis with free Euclidean Distance d_free from a
transmitted sequence. This number is used in conjunction with d_free for the evaluation
of the error event probability.
Definition 7.5 Two finite length paths in the trellis form an error event if they
start from the same state, diverge and then later merge back. An error event of length
l is defined by two coded sequences s_l and ŝ_l,
s_l = (s_n, s_{n+1}, ..., s_{n+l+1}) and ŝ_l = (ŝ_n, ŝ_{n+1}, ..., ŝ_{n+l+1}),
such that
s_n = ŝ_n, s_{n+l+1} = ŝ_{n+l+1},
s_i ≠ ŝ_i, i = n + 1, ..., n + l.  (7.5)
Definition 7.6 The probability of an error event starting at time n, given that the
decoder has estimated the correct transmitter state at that time, is called the error
event probability, P_e.
The performance of TCM schemes is generally evaluated by means of upper bounds
on the error event probability. It is based on the generating function approach. Let us
consider again the Ungerboeck model for a rate m/(m + 1) TCM scheme as shown in
Fig. 7.7. The encoder takes in m bits at a time and encodes them into m + 1 bits, which are
then mapped by a memoryless mapper, f(.), onto a symbol s_i. Let us call the binary
(m + 1)-tuples c_i the labels of the signals s_i. We observe that there is a one-to-one
correspondence between a symbol and its label. Hence, an error event of length l can
be equivalently described by two sequences of labels
C_l = (c_k, c_{k+1}, ..., c_{k+l-1}) and C'_l = (c'_k, c'_{k+1}, ..., c'_{k+l-1}),  (7.6)
where c'_k = c_k ⊕ e_k, c'_{k+1} = c_{k+1} ⊕ e_{k+1}, ..., and E_l = (e_k, e_{k+1}, ..., e_{k+l-1}) is a sequence
of binary error vectors. The mathematical symbol ⊕ represents binary (modulo-2)
addition. An error event of length l occurs when the decoder chooses, instead of the
transmitted sequence C_l, the sequence C'_l which corresponds to a path in the trellis
diagram that diverges from the original transmitted path and re-merges back exactly
after l time intervals. To find the probability of error, we need to sum over all possible
values of l the probabilities of error events of length l (i.e., the joint probabilities that C_l is
transmitted and C'_l is detected). The upper bound on the probability of error is
obtained by the following union bound
P_e ≤ Σ_{l=1}^{∞} Σ_{s_l} Σ_{s'_l ≠ s_l} P(s_l) P(s_l → s'_l)  (7.7)
where P(s_l → s'_l) denotes the pairwise error probability (i.e., the probability that the
sequence s_l is transmitted and the sequence s'_l is detected). Assuming a one-to-one
correspondence between a symbol and its label, we can write
P_e ≤ Σ_{l=1}^{∞} Σ_{C_l} Σ_{C'_l ≠ C_l} P(C_l) P(C_l → C'_l)
= Σ_{l=1}^{∞} Σ_{C_l} Σ_{E_l ≠ 0} P(C_l) P(C_l → C_l ⊕ E_l)  (7.8)
The pairwise error probability P(C_l → C_l ⊕ E_l) can be upper-bounded by the
Bhattacharyya Bound (see Problem 7.12) as follows
P(C_l → C_l ⊕ E_l) ≤ exp{ −(1/4N₀) ||f(C_l) − f(C_l ⊕ E_l)||² }  (7.9)
where f(.) is the memoryless mapper. Let D = e^{−1/4N₀} (for the Additive White Gaussian
Noise channel with single sided power spectral density N₀), then
P(C_l → C_l ⊕ E_l) ≤ D^{d²_E(f(C_l), f(C_l ⊕ E_l))}  (7.10)
where d²_E(f(C_l), f(C'_l)) represents the squared Euclidean distance between the symbol sequences
s_l and s'_l. Next, define the function
W(E_l) = Σ_{C_l} P(C_l) D^{||f(C_l) − f(C_l ⊕ E_l)||²}  (7.11)
We can now write the probability of error as
P_e ≤ Σ_{l=1}^{∞} Σ_{E_l ≠ 0} W(E_l)  (7.12)
From the above equation we observe that the probability of error is upper-bounded by a sum
over all possible error events, E_l. Note that
(7.13)
We now introduce the concept of an error state diagram which is essentially a graph whose
branches have matrix labels. We assume that the source symbols are equally probable with
probabilities 2^{−m} = 1/M.
Definition 7.7 The error weight matrix, G(e_i), is an N × N matrix whose element in
the pth row and qth column is defined as
[G(e_i)]_{p,q} = (1/2^m) Σ_{c_{p→q}} D^{||f(c_{p→q}) − f(c_{p→q} ⊕ e_i)||²}, if there is a transition from state p to state q in the trellis,
             = 0, if there is no transition from state p to state q in the trellis,  (7.14)
where c_{p→q} are the label vectors generated by the transitions from state p to state q.
The summation accounts for the possible parallel transitions (parallel paths) between states in
the Trellis Diagram. The entry (p, q) in the matrix G provides an upper bound on the probability
that an error event occurs starting from node p and ending on node q. Similarly, (1/N) G·1 is a vector
whose pth entry is a bound on the probability of any error event starting from node p. Now, to any
sequence E_l = e_1, e_2, ..., e_l, there corresponds a sequence of l error weight matrices G(e_1), G(e_2), ...,
G(e_l). Thus, we have
W(E_l) = (1/N) 1^T Π_{n=1}^{l} G(e_n) 1  (7.15)
where 1 is a column N-vector all of whose elements are unity. We make the following
observations:
(i) For any matrix A, 1^T A 1 represents the sum of all the entries of A.
(ii) The element (p, q) of the matrix Π_{n=1}^{l} G(e_n) enumerates the Euclidean Distances
involved in the transitions from state p to state q in exactly l steps.
Our next job is to relate the above analysis to the probability of error, P_e. It should be noted
that the error vectors e_1, e_2, ..., e_l are not independent. The Error State Diagram has a structure
determined only by the Linear Convolutional Code and differs from the Code State Diagram
only in the denomination of its state and branch labels (G(e_i)). Since the error vectors e_i are
simply the differences of the vectors c_i, the connections among the vectors e_i are the same as those
among the vectors c_i. Therefore, from (7.12) and (7.15) we have
P_e ≤ T(D)|_{D = e^{−1/4N₀}}  (7.16)
where
T(D) = (1/N) 1^T G 1,  (7.17)
and the matrix
G = Σ_{l=1}^{∞} Σ_{E_l ≠ 0} Π_{n=1}^{l} G(e_n)  (7.18)
is the matrix transfer function of the error state diagram. T(D) is called the scalar transfer
function or simply the transfer function of the Error State Diagram.
Example 7.10 Consider a rate 1/2 TCM scheme with m = 1, and M = 4. It takes
one bit at a time and encodes it into two bits, which are then mapped to one of
the four QPSK symbols. The two state Trellis Diagram and the symbol allocation
from the 4-PSK Constellation are given in Fig. 7.15.
Fig. 7.15
Let us denote the error vector by e = (e₂ e₁). Then, from (7.14),
G(e₂e₁) = (1/2) [ D^{||f(00) − f(00 ⊕ e₂e₁)||²}   D^{||f(10) − f(10 ⊕ e₂e₁)||²} ]
                [ D^{||f(01) − f(01 ⊕ e₂e₁)||²}   D^{||f(11) − f(11 ⊕ e₂e₁)||²} ]
        = (1/2) [ D^{||f(00) − f(e₂e₁)||²}   D^{||f(10) − f(ē₂e₁)||²} ]
                [ D^{||f(01) − f(e₂ē₁)||²}   D^{||f(11) − f(ē₂ē₁)||²} ]  (7.19)
where ē = 1 ⊕ e. The error state diagram for this TCM scheme is given in Fig. 7.16.
Fig. 7.16
The matrix transfer function of the error state diagram is
G = G(10) [I₂ − G(11)]⁻¹ G(01)  (7.20)
where I₂ is the 2 × 2 identity matrix. In this case there are only three error vectors possible,
{01, 10, 11}. From (7.19) we calculate
G(01) = (1/2) [D² D²; D² D²], G(10) = (1/2) [D⁴ D⁴; D⁴ D⁴] and G(11) = (1/2) [D² D²; D² D²]
Using (7.20) we obtain the matrix transfer function of the Error State Diagram as
G = (1/2) D⁶/(1 − D²) [1 1; 1 1]  (7.21)
The scalar transfer function, T(D), is then given by
T(D) = (1/2) 1^T G 1 = D⁶/(1 − D²)  (7.22)
The upper bound on the probability of error can be computed by substituting D = e^{−1/4N₀} in
(7.22):
P_e ≤ D⁶/(1 − D²) |_{D = e^{−1/4N₀}}  (7.23)
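The bound (7.23) can also be evaluated numerically straight from the matrix formulation. The sketch below assumes the error weight matrices given above, G(01) = G(11) = (1/2)D²J and G(10) = (1/2)D⁴J with E_s normalised to 1 (J is the all-ones 2 × 2 matrix), and evaluates T(D) at D = e^{−1/4N₀}.

```python
import numpy as np

def tcm_error_bound(Es_over_N0):
    """Evaluate T(D) at D = exp(-1/4N0) for the scheme of Example 7.10,
    assuming Es = 1 and G(01) = G(11) = 0.5*D^2*J, G(10) = 0.5*D^4*J."""
    N0 = 1.0 / Es_over_N0                      # Es normalised to 1
    D = np.exp(-1.0 / (4.0 * N0))
    J = np.ones((2, 2))
    G01, G10, G11 = 0.5 * D**2 * J, 0.5 * D**4 * J, 0.5 * D**2 * J
    G = G10 @ np.linalg.inv(np.eye(2) - G11) @ G01   # Eq. (7.20)
    one = np.ones(2)
    return 0.5 * one @ G @ one                 # T(D) = (1/N) 1^T G 1 with N = 2

print(tcm_error_bound(10.0))    # upper bound on the error event probability
```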
Example 7.11 Consider another rate 1/2 TCM scheme with m = 1, and M = 4. The two state
Trellis Diagram is the same as the one in the previous example. However, the symbol allocation from
the 4-PSK Constellation is different, and is given in Fig. 7.17.
Fig. 7.17
Note that this symbol assignment violates the Ungerboeck design principles. Let us again denote the
error vector by e = (e₂ e₁). The Error State Diagram for this TCM scheme is given in Fig. 7.18.
Fig. 7.18
The matrix transfer function of the Error State Diagram is
G = G(11) [I₂ − G(10)]⁻¹ G(01)  (7.24)
where I₂ is the 2 × 2 identity matrix. In this case there are only three error vectors possible,
{01, 10, 11}. From (7.19) we calculate
G(01) = (1/2) [D² D²; D² D²], G(10) = (1/2) [D⁴ D⁴; D⁴ D⁴] and G(11) = (1/2) [D² D²; D² D²]
Using (7.24) we obtain the matrix transfer function of the Error State Diagram as
G = (1/2) D⁴/(1 − D⁴) [1 1; 1 1]  (7.25)
The scalar transfer function, T(D), is then given by
T(D) = (1/2) 1^T G 1 = D⁴/(1 − D⁴)  (7.26)
The upper bound on the probability of error is
P_e ≤ D⁴/(1 − D⁴) |_{D = e^{−1/4N₀}}  (7.27)
Comparing (7.23) and (7.27) we observe that, simply by changing the symbol assignment to the
branches of the Trellis Diagram, we degrade the performance considerably. In the second example,
the upper bound on the error probability has loosened by two orders of magnitude (assuming D << 1,
i.e., for the high SNR case).
A tighter upper bound on the error event probability is given by (exercise)
P_e ≤ (1/2) erfc(√(d²_free/4N₀)) e^{d²_free/4N₀} T(D)|_{D = e^{−1/4N₀}}  (7.28)
From (7.28), an asymptotic estimate of the error event probability can be obtained by
considering only the error events with free Euclidean Distance:
P_e ≈ (1/2) N(d_free) erfc(√(d²_free/4N₀))  (7.29)
The bit error probability can be upper bounded simply by weighting the pairwise error
probabilities by the number of incorrect input bits associated with each error vector and then
dividing the result by m. Therefore,
P_b ≤ (1/m) ∂T(D, I)/∂I |_{I = 1, D = e^{−1/4N₀}}  (7.30)
where T(D, I) is the augmented generating function of the Modified State Diagram. The
concept of the Modified State Diagram was introduced in the chapter on Convolutional
Codes (Section 6.5). A tighter upper bound can also be obtained for the bit error probability, and is
given by
P_b ≤ (1/2m) erfc(√(d²_free/4N₀)) e^{d²_free/4N₀} ∂T(D, I)/∂I |_{I = 1, D = e^{−1/4N₀}}  (7.31)
From (7.31), we observe that the upper bound on the bit error probability strongly depends on d_free.
In the next section, we will learn some methods for estimating d_free.
7.7 COMPUTATION OF d_free
We have seen that the free Euclidean Distance, d_free, is the single most important parameter for
determining how good a TCM scheme is for AWGN channels. It defines the asymptotic coding
gain of the scheme. In Chapter 6 (Section 6.5) we saw that the generating function can be used to
calculate the free Hamming Distance. The transfer function of the error state diagram, T(D),
includes information about the distance of all the paths in the trellis from the all zero path. If
T(D) is obtained in a closed form, the value of d_free follows immediately from the expansion of
the function in a power series. The generating function can be written as
T(D) = N(d_free) D^{d²_free} + N(d_next) D^{d²_next} + ...  (7.32)
where d²_next is the second smallest squared Euclidean Distance. Hence the smallest exponent of
D in the series expansion is d²_free. However, in most cases, a closed form expression for T(D) may
not be available, and one has to resort to numerical techniques.
Consider the function
φ₁(D) = d ln T(D)/d ln D  (7.33)
φ₁(D) decreases monotonically to the limit d²_free as D → 0. Therefore we have an upper bound
on d²_free provided D > 0. In order to obtain a lower bound on d²_free, consider the following function
φ₂(D) = ln T(D)/ln D  (7.34)
Taking the logarithm on both sides of (7.32) we get
d²_free ln D = ln T(D) − ln N(d_free) − ln [1 + (N(d_next)/N(d_free)) D^{d²_next − d²_free} + ...]  (7.35)
If we take D → 0, provided D > 0, from (7.34) and (7.35) we obtain
ln T(D)/ln D = d²_free − ε(D)  (7.36)
where ε(D) is a function that is greater than zero, and tends to zero monotonically as D → 0.
Thus, if we take smaller and smaller values of D in φ₁(D) and φ₂(D), we can obtain values that are
extremely close to d²_free.
It should be kept in mind that even though d²_free is the single most important parameter to
determine the quality of a TCM scheme, two other parameters are also influential:
(i) The error coefficient N(d_free): A factor of two increase in this error coefficient
reduces the coding gain by approximately 0.2 dB for error rates of 10⁻⁶.
(ii) The next distance d_next: It is the second smallest Euclidean Distance between two
paths forming an error event. If d_next is very close to d_free, the SNR requirement for a
good approximation of the upper bound on P_e may be very large.
So far, we have focussed primarily on AWGN channels. We found that the best design
strategy is to maximize the free Euclidean Distance, d_free, for the code. In the next section we
consider the design rules for TCM over fading channels. Just to remind the readers, fading
channels are frequently encountered in radio and mobile communications. One common cause
of fading is the multipath nature of the propagation medium. In this case, the signal arrives at
the receiver from different paths (with time varying nature) and gets added together. Depending
on whether the signals from different paths add up in phase, or out of phase, the net received
signal exhibits a random variation in amplitude and phase. The drops in the received signal
level below a threshold are called fades.
7.8 TCM FOR FADING CHANNELS
In this section we will consider the performance of trellis coded M-ary Phase Shift Keying
(MPSK) over a Fading Channel. We know that a TCM encoder takes in an input bit stream and
outputs a sequence of symbols. In this treatment we will assume that each of these symbols si
belong to the MPSK signal set. By using complex notation, each symbol can be represented by
a point in the complex plane. The coded signals are interleaved in order to spread the burst of
errors caused by the slowly varying fading process. These interleaved symbols are then pulse-
shaped for no inter-symbol interference and finally translated to RF frequencies for
transmission over the channel. The channel corrupts these transmitted symbols by adding a
fading gain (which is a negative gain, or a positive loss, depending on one's outlook) and
AWGN. At the receiver end, the received sequences are demodulated and quantized for soft
decision decoding. In many implementations, the channel estimator provides an estimate of the
channel gain, which is also termed as the channel state information. Thus we can represent
the received signal at time i as
r_i = g_i s_i + n_i  (7.37)
where n_i is a sample of the zero mean Gaussian noise process with variance N₀/2 and g_i is the
complex channel gain, which is also a sample of a complex Gaussian process with variance σ²_g.
The complex channel gain can be explicitly written using the phasor notation as follows
g_i = a_i e^{jφ_i},  (7.38)
where a_i and φ_i are the amplitude and phase processes respectively. We now make the following
assumptions:
(i) The receiver performs coherent detection,
(ii) The interleaving is ideal, which implies that the fading amplitudes are statistically
independent and the channel can be treated as memoryless.
Thus, we can write
r_i = a_i s_i + n_i  (7.39)
We know that for a channel with a diffused multipath and no direct path the fading amplitude
is Rayleigh distributed with Probability Density Function (pdf)
p_A(a) = 2a e^{−a²}, a ≥ 0  (7.40)
For the case when there exists a direct path in addition to the multipath, Rician Fading is
observed. The pdf of the Rician Fading Amplitude is given by
p_A(a) = 2a(1 + K) e^{−(K + a²(K+1))} I₀(2a√(K(1 + K))),  (7.41)
where I₀(.) is the zero-order modified Bessel Function of the first kind and K is the Rician
Parameter defined as follows.
Definition 7.8 The Rician Parameter K is defined as the ratio of the energy of the
direct component to the energy of the diffused multipath component. For the
extreme case of K = 0, the pdf of the Rician distribution becomes the same as the pdf
of the Rayleigh Distribution.
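The Rician amplitude model can be simulated by adding a fixed direct component to a complex Gaussian diffuse component. A minimal sketch is given below, assuming the amplitude is normalised so that E[a²] = 1; setting K = 0 reproduces Rayleigh fading.

```python
import numpy as np

def rician_amplitude(K, n, rng=np.random.default_rng(0)):
    """Draw n fading amplitudes with Rician parameter K and E[a^2] = 1.
    The direct component carries K/(1+K) of the power, the diffuse part 1/(1+K)."""
    mean = np.sqrt(K / (1 + K))
    sigma = np.sqrt(1 / (2 * (1 + K)))      # per real dimension
    g = (mean + sigma * rng.standard_normal(n)) + 1j * sigma * rng.standard_normal(n)
    return np.abs(g)

a = rician_amplitude(K=3, n=100_000)
print(np.mean(a**2))     # close to 1, as required by the normalisation
```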
We now look at the performance of the TCM scheme over a fading channel. Let r_l = (r₁, r₂,
..., r_l) be the received signal. The maximum likelihood decoder, which is usually implemented by
the Viterbi decoder, chooses the coded sequence that most likely corresponds to the received
signals. This is achieved by computing a metric between the sequence of received signals, r_l,
and the possible transmitted signals, s_l. As we have seen earlier, this metric is related to the
conditional channel probabilities
m(r_l, s_l) = ln p(r_l | s_l)  (7.42)
If the channel state information is being used, the metric becomes
m(r_l, s_l; a_l) = ln p(r_l | s_l, a_l)  (7.43)
Under the assumption of ideal interleaving, the channel is memoryless and hence the metrics
can be expressed as the following summations
m(r_l, s_l) = Σ_{i=1}^{l} ln p(r_i | s_i)  (7.44)
and
m(r_l, s_l; a_l) = Σ_{i=1}^{l} ln p(r_i | s_i, a_i)  (7.45)
First, we consider the scenario where the channel state information is known, i.e., â_i = a_i. The
metric can be written as
m(r_i, s_i; a_i) = −|r_i − a_i s_i|²  (7.46)
Therefore, the pairwise error probability is given by
P₂(s_l, ŝ_l) = E_{a_l}[P₂(s_l, ŝ_l | a_l)],  (7.47)
where
(7.48)
and E is the statistical expectation operator. Using the Chernoff Bound, the pairwise error
probability can be upper bounded as follows:
P₂(s_l, ŝ_l) ≤ Π_{i=1}^{l} [(1 + K)/(1 + K + (1/4N₀)|s_i − ŝ_i|²)] exp[− K (1/4N₀)|s_i − ŝ_i|² / (1 + K + (1/4N₀)|s_i − ŝ_i|²)]  (7.49)
For high SNR, the above equation simplifies to
P₂(s_l, ŝ_l) ≤ Π_{i∈η} (1 + K)e^{−K} / [(1/4N₀)|s_i − ŝ_i|²]  (7.50)
where η is the set of all i for which s_i ≠ ŝ_i. Let us denote the number of elements in η by l_η;
then we can write
P₂(s_l, ŝ_l) ≤ ((1 + K)e^{−K})^{l_η} / [(1/4N₀)^{l_η} d_p²(l_η)]  (7.51)
where
d_p²(l_η) = Π_{i∈η} |s_i − ŝ_i|²  (7.52)
is the squared product distance of the signals s_i ≠ ŝ_i. The term l_η is called the effective
length of the error event (s_l, ŝ_l).
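Given a transmitted and a detected symbol sequence, the effective length l_η and the squared product distance d_p²(l_η) of (7.51) and (7.52) can be computed directly, as in the sketch below (the two QPSK sequences used are purely hypothetical).

```python
import numpy as np

def effective_length_and_product_distance(s, s_hat, tol=1e-9):
    """Effective length l_eta and squared product distance d_p^2(l_eta)
    of the error event between two symbol sequences (Eqs. 7.51-7.52)."""
    diff_sq = np.abs(np.asarray(s) - np.asarray(s_hat))**2
    eta = diff_sq > tol                     # positions where the symbols differ
    return int(np.sum(eta)), float(np.prod(diff_sq[eta]))

# Hypothetical QPSK sequences differing in the last two positions
s     = [1 + 0j, 0 + 1j, -1 + 0j]
s_hat = [1 + 0j, 0 - 1j,  1 + 0j]
print(effective_length_and_product_distance(s, s_hat))   # (2, 16.0)
```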
A union bound on the error event probability P_e has already been discussed before. For the high
SNR case, the upper bound on P_e can be expressed as
P_e ≤ Σ_{l_η} Σ_{d_p²(l_η)} a[l_η, d_p²(l_η)] ((1 + K)e^{−K})^{l_η} / [(1/4N₀)^{l_η} d_p²(l_η)]  (7.53)
where a[l_η, d_p²(l_η)] is the average number of code sequences having the effective length l_η and
the squared product distance d_p²(l_η). The error event probability is actually dominated by the
smallest effective length l_η and the smallest product distance d_p²(l_η). Let us denote the smallest
effective length by L and the corresponding product distance by d_p²(L). The error event
probability can then be asymptotically approximated by
P_e ≈ a(L, d_p²(L)) ((1 + K)e^{−K})^L / [(1/4N₀)^L d_p²(L)]  (7.54)
We make the following observations from (7.54):
(i) The error event probability asymptotically varies with the Lth power of the SNR. This is
similar to what is achieved with a time diversity technique. Hence, L is also called the
time diversity of the TCM scheme.
(ii) The important TCM design parameters for fading channels are the time diversity, L,
and the product distance d_p²(L). This is in contrast to the free Euclidean Distance
parameter for AWGN channels.
(iii) TCM codes designed for AWGN channels would normally fare poorly in fading
channels and vice versa.
(iv) For large values of the Rician parameter, K, the effect of the free Euclidean Distance
on the performance of the TCM scheme becomes dominant.
(v) At low SNR, again, the free Euclidean Distance becomes important for the
performance of the TCM scheme.
Thus the basic design rules for TCMs for fading channels, at high SNR and for small values of
K, are
(i) maximize the effective length, L, of the code, and
(ii) maximize the minimum product distance d_p²(L).
Consider a TCM scheme with effective length L and minimum product distance d_{p1}²(L).
Suppose the code is redesigned to yield a minimum product distance d_{p2}²(L) with the same L.
The increase in the coding gain due to the increase in the minimum product distance is given by
Δg = SNR₁ − SNR₂|_{P_e1 = P_e2} = (10/L) log [d_{p2}²(L) a₁ / (d_{p1}²(L) a₂)],  (7.55)
where a_i, i = 1, 2, is the average number of code sequences with effective length L for the TCM
scheme i. We observe that, because of the 1/L factor, increasing the minimum product distance
corresponding to a smaller value of L is more effective in improving the performance of the
code.
So far, we have assumed that the channel state information was available. A similar analysis
to the one carried out for the case where channel state information was available can also be done when
the information about the channel is unavailable. In the absence of channel state information, the
metric can be expressed as
m(r_i, s_i) = −|r_i − s_i|².  (7.56)
After some mathematical manipulations, it can be shown that
P₂(s_l, ŝ_l) ≤ (1 + K)^{l_η} e^{−l_η K} (2e/l_η)^{l_η} [Σ_{i∈η} |s_i − ŝ_i|²]^{l_η} / [(1/N₀)^{l_η} d_p²(l_η)]  (7.57)
Using arguments discussed earlier in this section, the error event probability P_e can be
determined for this case when the channel state information is not available.
7.9 CONCLUDING REMARKS
Coding and modulation were first analyzed together as a single entity by Massey in 1974. Prior
to that time, in all coded digital communications systems, the encoder/decoder and the
modulator/demodulator were designed and optimized separately. Massey's idea of combined
coding and modulation was concretized in the seminal paper by Ungerboeck in 1982. Similar
ideas were also proposed earlier by Imai and Hirakawa in 1977, but did not get due attention.
The primary advantage of TCM was its ability to achieve increased power efficiency without the
customary increase in the bandwidth introduced by the coding process. In the following years
the theory of TCM was formalized by different researchers. Calderbank and Mazo showed that
the asymmetric one-dimensional TCM schemes provide more coding gain than symmetric
TCM schemes. Rotationally invariant TCM schemes were proposed by Wei in 1984, which
were subsequently adopted by CCITT for use in the new high speed voiceband modems.
SUMMARY
• The Trellis Coded Modulation (TCM) Technique allows us to achieve a better
performance without bandwidth expansion or using extra power.
• The minimum Euclidean Distance between any two paths in the trellis is called the free
Euclidean Distance, d_free, of the TCM scheme.
• The difference between the values of the SNR for the coded and uncoded schemes
required to achieve the same error probability is known as the coding gain, g = SNR|_uncoded
− SNR|_coded. At high SNR, the coding gain can be expressed as g_∞ = g|_{SNR→∞} = 10 log
[(d²_free/E_s)_coded / (d²_free/E_s)_uncoded], where g_∞ represents the Asymptotic Coding Gain and E_s is the average
signal energy.
• The mapping by Set Partitioning is based on successive partitioning of the expanded
2^{m+1}-ary signal set into subsets with increasing minimum Euclidean Distance. Each time
we partition the set, we reduce the number of the signal points in the subset, but increase
the minimum distance between the signal points in the subset.
• Ungerboeck's TCM design rules (based on heuristics) for AWGN channels are
Rule 1: Parallel transitions, if present, must be associated with the signals of the subsets in
the lowest layer of the Set Partitioning Tree. These signals have the minimum Euclidean
Distance Δ_{m̃+1}.
Rule 2: The transitions originating from or merging into one state must be associated with
signals of the first step of set partitioning. The Euclidean distance between these signals is
at least Δ₁.
Rule 3: All signals are used with equal frequency in the Trellis Diagram.
• The Viterbi Algorithm can be used to decode the received symbols for a TCM scheme.
The branch metric used in the decoding algorithm is the Euclidean Distance between the
received signal and the signal associated with the corresponding branch in the trellis.
• The average number of nearest neighbours at free distance, N(d_free), gives the average
number of paths in the trellis with free Euclidean Distance d_free from a transmitted
sequence. This number is used in conjunction with d_free for the evaluation of the error
event probability.
• The probability of error P_e ≤ T(D)|_{D = e^{−1/4N₀}}, where T(D) = (1/N) 1^T G 1, and the matrix
G = Σ_{l=1}^{∞} Σ_{E_l ≠ 0} Π_{n=1}^{l} G(e_n). T(D) is the scalar transfer function. A tighter upper bound on the
error event probability is given by P_e ≤ (1/2) erfc(√(d²_free/4N₀)) e^{d²_free/4N₀} T(D)|_{D = e^{−1/4N₀}}.
• For fading channels, P₂(s_l, ŝ_l) ≤ ((1 + K)e^{−K})^{l_η} / [(1/4N₀)^{l_η} d_p²(l_η)], where d_p²(l_η) = Π_{i∈η} |s_i − ŝ_i|². The term l_η
is the effective length of the error event (s_l, ŝ_l) and K is the Rician parameter. Thus, the
error event probability is dominated by the smallest effective length l_η and the smallest
product distance d_p²(l_η).
• The design rules for TCMs for fading channels, at high SNR and for small values of K, are
(i) maximize the effective length, L, of the code, and
(ii) maximize the minimum product distance d_p²(L).
• The increase in the coding gain due to an increase in the minimum product distance is given by
Δg = SNR₁ − SNR₂|_{P_e1 = P_e2} = (10/L) log [d_{p2}²(L) a₁ / (d_{p1}²(L) a₂)], where a_i, i = 1, 2, is the average number
of code sequences with effective length L for the TCM scheme i.
PROBLEMS
7.1 Consider a rate 2/3 Convolutional Code defined by
G(D) = [ 1    D       D + D²      ]
       [ D²   1 + D   1 + D + D²  ]
This code is used with an 8PSK signal set that uses Gray Coding (the three bits per
symbol are assigned such that the codes for two adjacent symbols differ only in 1 bit
location). The throughput of this TCM scheme is 2 bits/sec/Hz.
(a) How many states are there in the Trellis Diagram for this encoder?
(b) Find the free Euclidean Distance.
(c) Find the Asymptotic coding gain with respect to uncoded QPSK, which has a
throughput of 2 bits/sec/Hz.
7.2 In Problem 7.1, suppose instead of Gray Coding, natural mapping is performed, i.e.,
s₀ → 000, s₁ → 001, ..., s₇ → 111.
(a) Find the free Euclidean Distance.
(b) Find the Asymptotic coding gain with respect to uncoded QPSK (2 bits/sec/Hz).
7.3 Consider the TCM encoder shown in Fig. 7.19.
Fig. 7.19 Figure for Problem 7.3.
(a) Draw the State Diagram for this encoder.
(b) Draw the Trellis Diagram for this encoder.
(c) Find the free Euclidean Distance, d_free. In the Trellis Diagram, show one pair of paths
which result in d²_free. What is N(d_free)?
(d) Next, use set partitioning to assign the symbols of 8-PSK to the branches of the Trellis
Diagram. What is the d²_free now?
(e) Encode the following bit stream using this encoder: 1 0 0 1 0 0 0 1 0 1 0 ... Give your
answer for both the natural mapping and the mapping using Set Partitioning.
(f) Compare the asymptotic coding gains for the two different kinds of mapping.
7.4 We want to design a TCM scheme that has a 2/3 convolutional encoder followed by a
signal mapper. The mapping is done based on set partitioning of the Asymmetric
Constellation Diagram shown below. The trellis is a four-state, fully connected trellis.
(a) Perform Set Partitioning for the following Asymmetric Constellation Diagram.
(b) What is the free Euclidean distance, d_free, for this asymmetric TCM scheme?
Compare it with the d_free for the case when we use the standard 8-PSK Signal
Constellation.
(c) How will you choose the value of θ for improving the performance of the TCM
scheme using the Asymmetric Signal Constellation shown in Fig. 7.20?
Fig. 7.20 Figure for Problem 7.4.
7.5 Consider the rate 3/4 encoder shown in Fig. 7.21. The four output bits from the encoder
are mapped onto one of the sixteen possible symbols from the Constellation Diagram
shown below. Use Ungerboeck's design rules to design a TCM scheme for an AWGN
channel. What is the asymptotic coding gain with respect to uncoded 8-PSK?
Fig. 7.21 Figure for Problem 7.5.
7.6 Consider the expression for the pairwise error probability over a Rician Fading Channel.
Comment.
(b) Show that for low SNR the original inequality may be expressed as
P₂(s_l, ŝ_l) ≤ exp[−d²_E(s_l, ŝ_l)/4N₀]
7.7 Consider a TCM scheme designed for a Rician Fading Channel with an effective length
L and the minimum product distance d_p²(L). Suppose we wish to redesign this code to
obtain an improvement of 3 dB in SNR.
(a) Compute the desired effective length L if the d_p²(L) is kept unchanged.
(b) Compute the desired product distance d_p²(L) if the effective length L is kept
unchanged.
7.8 Suppose you have to design a TCM scheme for an AWGN channel (SNR = y). The
desired BER is Pe. Draw a flowchart as to how you will go about designing such a scheme.
(a) How many states will there be in your Trellis?
(b) How will you design the convolutional encoder?
(c) Would you have parallel paths in your design?
(d) What kind of modulation scheme will you choose and why?
(e) How will you assign the symbols of the modulation scheme to the branches?
7.9 For Viterbi decoding the metric used is of the form
m(r_l, s_l) = ln p(r_l | s_l).
(a) What is the logic behind choosing such a metric?
(b) Suggest another metric that will be suitable for fading channels. Give reasons for
your answer.
7.10 A TCM scheme designed for a Rician Fading Channel (K = 3) and a high SNR
environment (SNR = 20 dB) has L = 5 and d_p²(L) = 2.34 E_s⁵. It has to be redesigned to
produce an improvement of 2 dB.
(a) What is the d_p²(L) of the new code?
(b) Comment on the new d_free.
7.11 Consider the TCM scheme shown in Fig. 7.22 consisting of a rate 1/2 convolutional
encoder coupled with a mapper.
(a) Draw the Trellis Diagram for this encoder.
(b) Determine the scalar transfer function, T(D).
(c) Determine the augmented generating function, T(D, L, I).
(d) What is the minimum Hamming Distance (d_free) of this code?
(e) How many paths are there with this d_free?
Fig. 7.22 Figure for Problem 7.11.
7.12 Consider the pairwise error probability P₂(s_l, ŝ_l).
(a) For a maximum likelihood decoder, prove that
P₂(s_l, ŝ_l) = ∫ f(r) p_{R|S}(r | s_l) dr
where r is the received vector, p_{R|S}(r | s_l) is the channel transition probability density
function and
f(r) ≤ √[ p_{R|S}(r | ŝ_l) / p_{R|S}(r | s_l) ]
(b) Show that
P₂(s_l, ŝ_l) ≤ ∫ √[ p_{R|S}(r | ŝ_l) p_{R|S}(r | s_l) ] dr
COMPUTER PROBLEMS
7.13 Write a computer program to perform trellis coded modulation, given the trellis structure
and the mapping rule. The program should take in an input bit stream and output a
sequence of symbols. The input to the program may be taken as two matrices, one that
gives the connectivity between the states of the trellis (essentially the structure of the
trellis) and the second, which gives the branch labels.
7.14 Write a computer program to calculate the squared free Euclidean distance, d²_free, the
effective length, L, and the minimum product distance, d_p²(L), of a TCM scheme, given
the Trellis Diagram and the labels on the branches.
7.15 Write a computer program that performs Viterbi decoding on an input stream of
symbols. This program makes use of a given trellis and the labels on the branches of the
Trellis Diagram.
7.16 Verify the performance of the different TCM schemes given in this chapter in AWGN
environment. To do so, take a long chain of random bits and input it to the TCM encoder.
The encoder will produce a sequence of symbols (analog waveforms). Corrupt these
symbols with AWGN of different noise power, i.e., simulate scenarios with different
SNRs. Use Viterbi decoding to decode the received sequence of corrupted symbols
(distorted waveforms). Generate a plot of the BER versus the SNR and compare it with
the theoretically predicted error rates.
7.17 Write a program to observe the effect of decoding window size for the Viterbi decoder.
Generate a plot of the error rate versus the window size. Also plot the number of
computations versus the window size.
7.18 Write a computer program that performs an exhaustive search in order to determine a rate
2/3 TCM encoder which is designed for AWGN (maximize d_free). Assume that there are
four states in the Trellis Diagram and it is a fully connected trellis. The branches of this
trellis are labelled using the symbols from an 8-PSK signal set. Modify the program to
perform exhaustive search for a good TCM scheme with a four-state trellis with the
possibility of parallel branches.
7.19 Write a computer program that performs an exhaustive search in order to determine a rate
2/3 TCM encoder which is designed for a fading channel (maximize d_p²(L)). Assume that
there are four states in the trellis diagram and it is a fully connected trellis. The branches
of this trellis are labelled using the symbols from an 8-PSK signal set. List out the d_p²(L)
and L of some of the better codes found during the search.
7.20 Draw the family of curves depicting the relation between Pe and Leff for different values
of K (Rician Parameter) for
(a) High SNR,
(b) Low SNR.
Comment on the plots.
8
Cryptography
8.1 INTRODUCTION TO CRYPTOGRAPHY
Cryptography is the science of devising methods that allow information to be sent in a secure
form· in such a way that the only person able to retrieve this information is the intended
recipient. Encryption is based on algorithms that scramble information into unreadable or non-
discernible form. Decryption is the process of restoring the scrambled information to its original
form (see Fig. 8.1).
A Cryptosystem is a collection of algorithms and associated procedures for hiding and
revealing (un-hiding!) information. Cryptanalysis is the process (actually, the art) of analyzing
a cryptosystem, either to verify its integrity or to break it for ulterior motives. An attacker is a
person or system that performs unauthorised cryptanalysis in order to break a cryptosystem.
Attackers are also referred to as hackers, interlopers or eavesdroppers. The process of attacking
a cryptosystem is often called cracking.
The job of the cryptanalyst is to find the weaknesses in the cryptosystem. In many cases, the
developers of a cryptosystem announce a public challenge with a large prize-money for anyone
who can crack the scheme. Once a cryptosystem is broken (and the cryptanalyst discloses his
techniques), the designers of the scheme try to strengthen the algorithm. Just because a
cryptosystem has been broken does not render it useless. The hackers may have broken the
system under optimal conditions using equipment (fast computers, dedicated microprocessors,
etc.) that is usually not available to common people. Some cryptosystems are rated in terms of
the length of time and the price of the computing equipment it would take to break them!
In the last few decades, cryptographic algorithms, being mathematical in nature, have
become so advanced that they can only be handled by computers. This, in effect, means that the
uncoded message (prior to encryption) is binary in form, and can therefore be anything; a
picture, a voice, a text such as an e-mail or even a video.
Fig. 8.1 The Process of Encryption and Decryption.
Cryptography is not merely used for military and diplomatic communications as many
people tend to believe. In reality, cryptography has many commercial uses and applications.
From protecting confidential company information, to protecting a telephone call, to allowing
someone to order a product on the Internet without the fear of their credit card number being
intercepted and misused, cryptography is all about increasing the level of privacy of individuals
and groups. For example, cryptography is often used to prevent forgers from counterfeiting
winning lottery tickets. Each lottery ticket can have two numbers printed onto it, one plaintext
and one the corresponding cipher. Unless the counterfeiter has cryptanalyzed the lottery's
cryptosystem he or she will not be able to print an acceptable forgery.
The chapter is organized as follows. We begin with an overview of different encryption
techniques. We will, then, study the concept of secret-key cryptography. Some specific secret-
key cryptographic techniques will be discussed in detail. The public-key cryptography will be
introduced next. Two popular public-key cryptographic techniques, the RSA algorithm and
PGP, will be discussed in detail. A flavour of some other cryptographic techniques in use today
will also be given. The chapter will conclude with a discussion on cryptanalysis and the politics
of cryptography.
8.2 AN OVERVIEW OF ENCRYPTION TECHNIQUES
The goal of a cryptographic system is to provide a high level of confidentiality, integrity, non-
repudiability and authenticity to information that is exchanged over networks.
Confidentiality of messages and stored data is protected by hiding information using
encryption techniques. Message integrity ensures that a message remains unchanged from the
time it is created to the time it is opened by the recipient. Non-repudiation can provide a way of
proving that the message came from someone even if they try to deny it. Authentication
provides two services. First, it establishes beyond doubt the origin of the message. Second, it
verifies the identity of a user logging into a system and continues to verify their identity in case
someone tries to break into the system.
Definition 8.1 A message being sent is known as plaintext. The message is coded
using a Cryptographic Algorithm. This process is called encryption. An encrypted
message is known as ciphertext, and is turned back into plaintext by the process of
decryption.
It must be assumed that any eavesdropper has access to all communication between the
sender and the recipient. A method of encryption is only secure if even with this complete
access, the eavesdropper is still unable to recover the original plaintext from the ciphertext.
There is a big difference between security and obscurity. Suppose a message is left for
somebody in an airport locker, and the details of the airport and the locker number are known
only to the intended recipient; then this message is not secure, merely obscure. If, however, all
potential eavesdroppers know the exact location of the locker, and they still cannot open the
locker and access the message, then this message is secure.
Definition 8.2 A key is a value that causes a Cryptographic Algorithm to run in a
specific manner and produce a specific ciphertext as an output. The key size is usually
measured in bits. The bigger the key size, the more secure will be the algorithm.
Example 8.1 Suppose we have to encrypt and send the following stream of binary data (which
might be originating from voice, video, text or any other source)
0110001010011111 ...
We can use a 4-bit long key, x = 1011, to encrypt this bit stream. To perform encryption, the
plaintext (binary bit stream) is first subdivided into blocks of 4 bits.
0110 0010 1001 1111 ...
Each sub-block is XORed (binary addition) with the key, x = 1011. The encrypted message will be
1 1 0 1 1 0 0 1 0 0 1 0 0 1 0 0 ...
The recipient must also possess the knowledge of the key in order to decrypt the message. The
decryption is fairly simple in this case. The ciphertext (the received binary bit stream) is
first subdivided into blocks of 4 bits. Each sub-block is XORed with the key, x = 1011. The
decrypted message will be the original plaintext
0110 0010 1001 1111...
It should be noted that just one key is used both for encryption and decryption.
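The XOR operation of Example 8.1 is its own inverse, so the same routine serves for both encryption and decryption. A minimal sketch:

```python
def xor_cipher(bits, key):
    """Encrypt/decrypt a bit string by XORing successive blocks with the key
    (the operation is its own inverse), as in Example 8.1."""
    k = len(key)
    out = []
    for i in range(0, len(bits), k):
        block = bits[i:i + k]
        out.append(''.join(str(int(b) ^ int(c)) for b, c in zip(block, key)))
    return ''.join(out)

plaintext = '0110001010011111'
cipher = xor_cipher(plaintext, '1011')
print(cipher)                          # 1101100100100100
print(xor_cipher(cipher, '1011'))      # recovers 0110001010011111
```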
Example 8.2 Let us devise an algorithm for text messages, which we shall call character + x. Let
x = 5. In this encryption technique, we replace every alphabet by the fifth one following it, i.e., A
becomes F, B becomes G, C becomes H, and so on. The recipients of the encrypted message just
need to know the value of the key, x, in order to decipher the message. The key must be kept
separate from the encrypted message being sent. Because there is just one key which is used for
encryption and decryption, this kind of technique is called Symmetric Cryptography or Single
Key Cryptography or Secret Key Cryptography. The problem with this technique is that the
key has to be kept confidential. Also, the key must be changed from time to time to ensure secrecy
of transmission. This means that the secret key (or the set of keys) has to be communicated to the
recipient. This might be done physically.
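A minimal sketch of the character + x technique (a Caesar-type shift) is given below; the example message is, of course, hypothetical.

```python
def character_plus_x(text, x):
    """Shift each letter x places forward in the alphabet (Example 8.2);
    decryption uses a shift of -x. Non-alphabetic characters are unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + x) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

cipher = character_plus_x('ATTACK AT DAWN', 5)
print(cipher)                          # FYYFHP FY IFBS
print(character_plus_x(cipher, -5))    # ATTACK AT DAWN
```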
To get around this problem of communicating the key, the concept of Public Key
Cryptography was developed by Diffie and Hellman. This technique is also called
Asymmetric Encryption. The concept is simple. There are two keys, one is held privately and the
other one is made public. What one key can lock, the other key can unlock.
Example 8.3 Suppose we want to send an encrypted message to recipient A using the public key
encryption technique. To do so we will use the public key of the recipient A and use it to encrypt
the message. When the message is received, recipient A decrypts it with his private key. Only the
private key of recipient A can decrypt a message that has been encrypted with his public key.
Similarly, recipient B can only decrypt a message that has been encrypted with his public key.
Thus, no private key ever needs to be communicated and hence one does not have to trust any
communication channel to convey the keys.
Let us consider another scenario. Suppose we want to send somebody a message and also
provide a proof that the message is actually from us (a lot of harm can be done by providing
bogus information, or rather, misinformation!). In order to keep a message private and also
provide authentication (that it is indeed from us), we can perform a special encryption on the
plaintext with our private key, then encrypt it again with the public key of the recipient. The
recipient uses his private key to open the message and then uses our public key to verify the
authenticity. This technique is said to use Digital Signatures.
There is another important encryption technique called the One-way Function. It is a non-
reversible, quick encryption method. The encryption is easy and fast, but the decryption is not.
Suppose we send a document to recipient A and want to check at a later time whether the
document has been tampered with. We can do so by running a one-way function, which
produces a fixed-length value called a hash (also called the message digest). The hash is the
unique signature of the document that can be sent along with the document. Recipient A can
run the same one-way function to check whether the document has been altered.
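As an illustration, a hash-based tamper check might look like the following Python sketch; SHA-256 is an assumed choice here, since the text describes one-way functions in general.

import hashlib

def digest(document):
    # fixed-length "fingerprint" of the document
    return hashlib.sha256(document).hexdigest()

original = b"Contract: pay Rs 1000."
fingerprint = digest(original)               # sent or stored along with the document

# later, recipient A re-runs the same one-way function and compares
tampered = b"Contract: pay Rs 9000."
print(digest(original) == fingerprint)       # True  -> unchanged
print(digest(tampered) == fingerprint)       # False -> document was altered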
The actual mathematical function used to encrypt and decrypt messages is called a
Cryptographic Algorithm or cipher. This is only a part of the system used to send and
receive secure messages. This will become clearer as we discuss specific systems in detail.
As with most historical ciphers, the security of the message being sent relies on the algorithm
itself remaining secret. This technique is known as a Restricted Algorithm. It has the following
fundamental drawbacks.
(i) The algorithm obviously has to be restricted to only those people that you want to be able
to decode your message. Therefore a new algorithm must be invented for every discrete
group of users.
(ii) A large or changing group of users cannot utilise them, as every time one user leaves the
group, everyone must change the algorithm.
(iii) If the algorithm is compromised in any way, a new algorithm must be implemented.
Because of these drawbacks, Restricted Algorithms are no longer popular and have given
way to key-based algorithms.
Practically all modern cryptographic systems make use of a key. Algorithms that use a key
allow all details of the algorithm to be widely available. This is because all of the security lies in
the key. With a key-based algorithm the plaintext is encrypted and decrypted by the algorithm
which uses a certain key, and the resulting ciphertext is dependent on the key, and not the
algorithm. This means that an eavesdropper can have a complete copy of the algorithm in use,
but without the specific key used to encrypt that message, it is useless.
8.3 OPERATIONS USED BY ENCRYPTION ALGORITHMS
Although the methods of encryption/decryption have changed dramatically since the advent of
computers, there are still only two basic operations that can be carried out on a piece of
plaintext: substitution and transposition. The only real difference is that earlier these were
carried out on the letters of the alphabet, whereas nowadays they are carried out on binary bits.
Substitution
Substitution operations replace bits in the plaintext with other bits decided upon by the
algorithm, to produce ciphertext. This substitution then just has to be reversed to produce
plaintext from ciphertext. This can be made increasingly complicated. For instance one
plaintext character could correspond to one of a number of ciphertext characters (homophonic
substitution), or each character of plaintext is substituted by a character of corresponding
position in a length of another text (running cipher).
Example 8.4 Julius Caesar was one of the first to use substitution encryption to send messages to
troops during the war. The substitution method he invented advances each character three spaces in
the alphabet. Thus,
THIS IS SUBSTITUTION CIPHER    (8.1)
WKLV LV VXEVWLWXWLRQ FLSKHU    (8.2)
Transposition
Transposition (or permutation) does not alter any of the bits in plaintext, but instead moves
their positions around within it. If the resultant ciphertext is then put through more
transpositions, the end result is increased security.
XOR
XOR is an exclusive-or operation. It is a Boolean operator such that if exactly one of two bits is
true, then so is the result, but if both are true or both are false then the result is false. For example,
0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0    (8.3)
A surprising amount of commercial software uses simple XOR functions to provide security,
including the USA digital cellular telephone network and many office applications, and it is
trivial to crack. However the XOR operation, as will be seen later in this chapter, is a vital part of
many advanced Cryptographic Algorithms when performed between long blocks of bits that
also undergo substitution and/or transposition.
8.4 SYMMETRIC (SECRET KEY) CRYPTOGRAPHY
Symmetric Algorithms (or Single Key Algorithms or Secret Key Algorithms) have one
key that is used both to encrypt and decrypt the message, hence the name. In order for the
recipient to decrypt the message they need to have an identical copy of the key. This presents
one major problem, the distribution of the keys. Unless the recipient can meet the sender in
person and obtain a key, the key itself must be transmitted to the recipient, and is thus
susceptible to eavesdropping. However, single key algorithms are fast and efficient, especially if
large volumes of data need to be processed.
In Symmetric Cryptography, the two parties that exchange messages use the same algorithm.
Only the key is changed from time to time. The same plaintext with a different key results in a
different ciphertext. The encryption algorithm is available to the public, hence should be strong
and well-tested. The more powerful the algorithm, the less likely that an attacker will be able to
decrypt the resulting cipher.
The size of the key is critical in producing strong ciphertext. The US National Security
Agency, NSA stated in the mid-1990s that a 40-bit length was acceptable to them (i.e., they
could crack it sufficiently quickly!). Increasing processor speeds, combined with loosely-coupled
multi-processor configurations, have brought the ability to crack such short keys within the
reach of potential hackers. In 1998, it was suggested that in order to be strong, the key size needs
to be at least 56 bits long. It was argued by an expert group as early as 1996 that 90 bits is a more
appropriate length. Today, the most secure schemes use 128-bit keys or even longer keys.
Symmetric Cryptography provides a means of satisfying the requirement of message content
security, because the content cannot be read without the secret key. There remains a risk of
exposure, however, because neither party can be sure that the other party has not exposed the
secret key to a third party (whether accidentally or intentionally).
Symmetric Cryptography can also be used to address integrity and authentication
requirements. The sender creates a summary of the message, or Message Authentication
Code (MAC), encrypts it with the secret key, and sends that with the message. The recipient
then re-creates the MAC, decrypts the MAC that was sent, and compares the two. If they are
identical, then the message that was received must have been identical with that which was sent.
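One standard way to realise such a MAC today is an HMAC built from a hash function and the shared secret key; the following Python sketch (with an assumed key and message) illustrates the create, re-create and compare flow described above.

import hmac, hashlib

secret_key = b"shared-secret-key"            # known only to sender and recipient

def make_mac(message):
    return hmac.new(secret_key, message, hashlib.sha256).digest()

message = b"Transfer Rs 1000 to account 42"
tag = make_mac(message)                      # sent along with the message

# the recipient re-creates the MAC and compares the two
print(hmac.compare_digest(tag, make_mac(message)))                      # True
print(hmac.compare_digest(tag, make_mac(b"Transfer Rs 9000 to 42")))    # False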
As mentioned earlier, a major difficulty with symmetric schemes is that the secret key has to
be possessed by both parties, and hence has to be transmitted from whoever creates it to the
other party. Moreover, if the key is compromised, all of the message transmission security
measures are undermined. The steps taken to provide a secure mechanism for creating and
passing on the secret key are referred to as Key Management.
The technique does not adequately address the non-repudiation requirement, because both
parties have the same secret key. Hence each is exposed to the risk of fraudulent falsification of
a message by the other, and a claim by either party not to have sent a message is credible,
because the other may have compromised the key.
There are two types of Symmetric Algorithms-Block Ciphers and Stream Ciphers.
Definition 8.3 Block Ciphers usually operate on groups of bits called blocks. Each
block is processed a multiple number of times. In each round the key is applied in a
unique manner. The more the number of iterations, the longer the encryption
process, but it results in a more secure ciphertext.
Definition 8.4 Stream Ciphers operate on plaintext one bit at a time. Plaintext is
streamed as raw bits through the encryption algorithm. While a block cipher will
produce the same ciphertext from the same plaintext using the same key, a stream
cipher will not. The ciphertext produced by a stream cipher will vary under the same
conditions.
How long should a key be? There is no single answer to this question. It depends on the
specific situation. To determine how much security one needs, the following questions must be
answered:
(i) What is the worth of the data to be protected?
(ii) How long does it need to be secure?
(iii) What are the resources available to the cryptanalyst/hacker?
A customer list might be worth Rs 1000, advertisement data might be worth Rs 50,000 and
the master key for a digital cash system might be worth millions. In the world of stock markets,
the secrets have to be kept for a couple of minutes. In the newspaper business, today's secret is
tomorrow's headline. The census data of a country have to be kept secret for months (if not
years). Corporate trade secrets are interesting to rival companies and military secrets are
interesting to rival militaries. Thus, the security requirements can be specified in these terms.
For example, one may require that the key length must be such that there is a probability of
0.0001% that a hacker with the resources of Rs 1 million could break the system in 1 year,
assuming that the technology advances at a rate of 25% per annum over that period. The
minimum key requirements for different applications are listed in Table 8.1. This table should be
used as a guideline only.
Table 8.1 Minimum key requirements for different applications
Type of information               Lifetime         Minimum key length
Tactical military information     Minutes/hours    56-64 bits
Product announcements             Days/weeks       64 bits
Interest rates                    Days/weeks       64 bits
Trade secrets                     Decades          112 bits
Nuclear bomb secrets              > 50 years       128 bits
Identities of spies               > 50 years       128 bits
Personal affairs                  > 60 years       > 128 bits
Diplomatic embarrassments         > 70 years       > 128 bits
Future computing power is difficult to estimate. A rule of thumb is that the efficiency of
computing equipment divided by price doubles every 18 months, and increases by a factor of
10 every five years. Thus, in 50 years the fastest computer will be 10 billion times faster than
today's! These numbers refer to general-purpose computers. We cannot predict what kind of
specialized crypto-system breaking computers might be developed in the years to come.
Two symmetric algorithms, both block ciphers, will be discussed in this chapter. These are
the Data Encryption Standard (DES) and the International Data Encryption Algorithm
(IDEA).
8.5 DATA ENCRYPTION STANDARD (DES)
DES, an acronym for the Data Encryption Standard, is the name of the Federal Information
Processing Standard (FIPS) 46-3, which describes the Data Encryption Algorithm (DEA). The
DEA is also defined in the ANSI standard X9.32.
Created by IBM, DES came about due to a public request by the US National Bureau of
Standards (NBS) requesting proposals for a Standard Cryptographic Algorithm that satisfied
the following criteria:
(i) Provides a high level of security
(ii) The security depends on keys, not the secrecy of the algorithm
(iii) The security is capable of being evaluated
(iv) The algorithm is completely specified and easy to understand
(v) It is efficient to use and adaptable
(vi) Must be available to all users
(vii) Must be exportable
DEA is essentially an improvement of the 'Algorithm Lucifer' developed by IBM in the early
1970s. The US National Bureau of Standards published the Data Encryption Standard in 1975.
While the algorithm was basically designed by IBM, the NSA and NBS (now NIST) played a
substantial role in the final stages of the development. The DES has been extensively studied
since its publication and is the best known and the most widely used Symmetric Algorithm in
the world.
The DEA has a 64-bit block size and uses a 56-bit key during execution (8 parity bits are
stripped off from the full 64-bit key). The DEA is a Symmetric Cryptosystem, specifically a 16-
round Feistel Cipher and was originally designed for implementation in hardware. When used
for communication, both sender and receiver must know the same secret key, which can be
used to encrypt and decrypt the message, or to generate and verify a Message Authentication
Code (MAC). The DEA can also be used for single-user encryption, such as to store files on a
hard disk in encrypted form. In a multi-user environment, secure key distribution may be
difficult; public-key cryptography provides an ideal solution to this problem.
NIST re-certifies DES (FIPS 46-1, 46-2, 46-3) every five years. FIPS 46-3 reaffirms DES usage
as of October 1999, but single DES is permitted only for legacy systems. FIPS 46-3 includes a
definition of triple-DES (TDEA, corresponding to X9.52). Within a few years, DES and triple-
DES will be replaced with the Advanced Encryption Standard.
DES has now been in world-wide use for over 20 years, and the fact that it is a defined
standard means that any system implementing DES can communicate with any other system
using it. DES is used in banks and businesses all over the world, as well as in networks (as
Kerberos) and to protect the password file on UNIX Operating Systems (as CRYPT).
DES Encryption
DES is a symmetric, block-cipher algorithm with a key length of 64 bits, and a block size of 64
bits (i.e. the algorithm operates on successive 64 bit blocks of plaintext). Being symmetric, the
same key is used for encryption and decryption, and DES also uses the same algorithm for
encryption and decryption.
First a transposition is carried out according to a set table (the initial permutation), the 64-bit
plaintext block is then split into two 32-bit blocks, and 16 identical operations called rounds are
carried out on each half. The two halves are then joined back together, and the reverse of the
initial permutation carried out. The purpose of the first transposition is not clear, as it does not
affect the security of the algorithm, but is thought to be for the purpose of allowing plaintext and
ciphertext to be loaded into 8-bit chips in byte-sized pieces.
In any round, only one half of the original 64-bit block is operated on. The rounds alternate
between the two halves. One round in DES consists of the following.
Key Transformation
The 64-bit key is reduced to 56 bits by removing every eighth bit (these are sometimes used for
error checking). Sixteen different 48-bit subkeys are then created, one for each round. This is
achieved by splitting the 56-bit key into two halves, and then circularly shifting them left by 1 or
2 bits, depending on the round. After this, 48 of the bits are selected. Because they are shifted,
different groups of key bits are used in each subkey. This process is called a compression
permutation due to the transposition of the bits and the reduction of the overall size.
Expansion Permutation
After the key transformation, whichever half of the block is being operated on undergoes an
expansion permutation. In this operation, the expansion and transposition are achieved
simultaneously by allowing the 1st and 4th bits in each 4 bit block to appear twice in the output,
i.e., the 4th input bit becomes the 5th and 7th output bits (see Fig. 8.2).
The expansion permutation achieves 3 things: Firstly it increases the size of the half-block
from 32 bits to 48, the same number of bits as in the compressed key subset, which is important
as the next operation is to XOR the two together. Secondly, it produces a longer string of data
for the substitution operation that subsequently compresses it. Thirdly, and most importantly,
because in the subsequent substitutions the 1st and 4th bits appear in two S-boxes (described
shortly), they affect two substitutions. The effect of this is that the dependency of the output bits
on the input bits increases rapidly, and so, therefore, does the security of the algorithm.
Fig. 8.2 The Expansion Permutation.
XOR
The resulting 48-bit block is then XORed with the appropriate subkey for that round.
Substitution
The next operation is to perform substitutions on the expanded block. There are eight
substitution boxes, called S-boxes. The first S-box operates on the first 6 bits of the 48-bit
expanded block, the 2nd S-box on the next six, and so on. Each S-box operates from a table of 4
rows and 16 columns, each entry in the table is a 4-bit number. The 6-bit number the S-box
takes as input is used to look up the appropriate entry in the table in the following way. The 1st
and 6th bits are combined to form a 2-bit number corresponding to a row number, and the 2nd
to 5th bits are combined to form a 4-bit number corresponding to a particular column. The net
result of the substitution phase is eight 4-bit blocks that are then combined into a 32-bit block.
It is the non-linear relationship of the S-boxes that really provides DES with its security; all the
other processes within the DES algorithm are linear, and as such relatively easy to analyze.
Fig. 8.3 The S-box Substitution.
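The row/column indexing described above can be sketched as follows; the 4 x 16 table used here is a made-up placeholder, not an actual DES S-box.

def sbox_lookup(six_bits, table):
    # six_bits: string such as '011011'; table: 4 rows x 16 columns of 4-bit values
    row = int(six_bits[0] + six_bits[5], 2)   # 1st and 6th bits give the row (0..3)
    col = int(six_bits[1:5], 2)               # 2nd to 5th bits give the column (0..15)
    return format(table[row][col], '04b')     # 4-bit output

toy_table = [[(r + 2 * c) % 16 for c in range(16)] for r in range(4)]   # placeholder values
print(sbox_lookup('011011', toy_table))       # row 1, column 13 of the table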
Permutation
The 32-bit output of the substitution phase then undergoes a straightforward transposition using
a table sometimes known as the P-box.
After all the rounds have been completed, the two 'half-blocks' of 32 bits are recombined to
form a 64-bit output, the final permutation is performed on it, and the resulting 64-bit block is
the DES encrypted ciphertext of the input plaintext block.
DES Decryption
Decrypting DES is very easy (if one has the correct key!). Thanks to its design, the decryption
algorithm is identical to the encryption algorithm; the only alteration is that to decrypt DES
ciphertext, the subkeys used in each round are used in reverse, i.e., the 16th subkey is used
first.
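This property is easiest to see on a toy Feistel network (not DES itself): running the same routine with the subkeys in reverse order undoes the encryption. The round function and subkey values below are arbitrary examples.

def round_function(half, subkey):
    # a deliberately simple, non-cryptographic round function
    return ((half * 31) ^ subkey) & 0xFFFFFFFF

def feistel(block64, subkeys):
    left, right = block64 >> 32, block64 & 0xFFFFFFFF
    for k in subkeys:
        left, right = right, left ^ round_function(right, k)
    # swapping the halves at the end makes the same routine its own inverse
    return (right << 32) | left

subkeys = [0x1F2E3D4C, 0x55AA55AA, 0x0BADF00D, 0x12345678]
plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, subkeys)
recovered = feistel(ciphertext, list(reversed(subkeys)))   # subkeys in reverse order
print(hex(ciphertext), recovered == plaintext)             # ... True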
Security of DES
Unfortunately, with advances in the field of cryptanalysis and the huge increase in available
computing power, DES is no longer considered to be very secure. There are algorithms that can
be used to reduce the number of keys that need to be checked, but even using a straightforward
brute-force attack and just trying every single possible key, there are computers that can crack
DES in a matter of minutes. It is rumoured that the US National Security Agency (NSA) can
crack a DES encrypted message in 3-15 minutes.
If a time limit of 2 hours to crack a DES encrypted file is set, then you have to check all
possible keys (2^56) in two hours, which is roughly 5 trillion keys per second. While this may
seem like a huge number, consider that a $10 Application-Specific Integrated Circuit (ASIC)
chip can test 200 million keys per second, and many of these can be paralleled together. It is
suggested that a $10 million investment in ASICs would allow a computer to be built that would
be capable of breaking a DES encrypted message in 6 minutes.
DES can no longer be considered a sufficiently secure algorithm. If a DES-encrypted message
can be broken in minutes by supercomputers today, then the rapidly increasing power of
computers means that it will be a trivial matter to break DES encryption in the future (when a
message encrypted today may still need to be secure). An extension of DES called DESX is
considered to be virtually immune to an exhaustive key search.
8.6 INTERNATIONAL DATA ENCRYPTION ALGORITHM (IDEA)
IDEA was created in its first form by Xuejia Lai and James Massey in 1990, and was called the
Proposed Encryption Standard (PES). In 1991, Lai and Massey strengthened the algorithm
against differential cryptanalysis and called the result Improved PES (IPES). The name of IPES
was changed to International Data Encryption Algorithm (IDEA) in 1992. IDEA is perhaps best
known for its implementation in PGP (Pretty Good Privacy).
The Algorithm
IDEA is a symmetric, block-cipher algorithm with a key length of 128 bits, a block size of 64
bits, and as with DES, the same algorithm provides encryption and decryption.
IDEA consists of 8 rounds using 52 subkeys. Each round uses six subkeys, with the remaining
four being used for the output transformation. The subkeys are created as follows.
Firstly the 128-bit key is divided into eight 16-bit keys to provide the first eight subkeys. The
bits of the original key are then shifted 25 bits to the left, and then it is again split into eight
subkeys. This shifting and then splitting is repeated until all 52 subkeys (SK1-SK52) have been
created.
The 64-bit plaintext block is first split into four blocks (B1-B4). A round then consists of the
following steps (OB stands for output block):
OB1 = B1 * SK1 (multiply 1st sub-block with 1st subkey)
OB2 = B2 + SK2 (add 2nd sub-block to 2nd subkey)
OB3 = B3 + SK3 (add 3rd sub-block to 3rd subkey)
OB4 = B4 * SK4 (multiply 4th sub-block with 4th subkey)
OB5 = OB1 XOR OB3 (XOR results of steps 1 and 3)
OB6 = OB2 XOR OB4
OB7 = OB5 * SK5 (multiply result of step 5 with 5th subkey)
OB8 = OB6 + OB7 (add results of steps 6 and 7)
OB9 = OB8 * SK6 (multiply result of step 8 with 6th subkey)
OB10 = OB7 + OB9
OB11 = OB1 XOR OB9 (XOR results of steps 1 and 9)
OB12 = OB3 XOR OB9
OB13 = OB2 XOR OB10
OB14 = OB4 XOR OB10
The input to the next round is the four sub-blocks OB11, OB13, OB12, OB14 in that order.
After the eighth round, the four final output blocks (F1-F4) are used in a final transformation
to produce four sub-blocks of ciphertext (C1-C4) that are then rejoined to form the final 64-bit
block of ciphertext.
C1 = F1 * SK49
C2 = F2 + SK50
C3 = F3 + SK51
C4 = F4 * SK52
Ciphertext = C1 C2 C3 C4.
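A Python sketch of one round, following the step list above, is given below. In IDEA, '+' is addition modulo 2^16 and '*' is multiplication modulo 2^16 + 1 (with 0 standing for 2^16); the subkey and block values used here are arbitrary examples.

MOD_ADD, MOD_MUL = 1 << 16, (1 << 16) + 1

def mul(a, b):
    # multiplication modulo 2^16 + 1, with 0 representing 2^16
    a, b = a or MOD_ADD, b or MOD_ADD
    return (a * b) % MOD_MUL % MOD_ADD

def idea_round(B, SK):
    B1, B2, B3, B4 = B
    SK1, SK2, SK3, SK4, SK5, SK6 = SK
    OB1 = mul(B1, SK1)
    OB2 = (B2 + SK2) % MOD_ADD
    OB3 = (B3 + SK3) % MOD_ADD
    OB4 = mul(B4, SK4)
    OB5 = OB1 ^ OB3
    OB6 = OB2 ^ OB4
    OB7 = mul(OB5, SK5)
    OB8 = (OB6 + OB7) % MOD_ADD
    OB9 = mul(OB8, SK6)
    OB10 = (OB7 + OB9) % MOD_ADD
    OB11, OB12 = OB1 ^ OB9, OB3 ^ OB9
    OB13, OB14 = OB2 ^ OB10, OB4 ^ OB10
    return OB11, OB13, OB12, OB14            # order of the sub-blocks fed to the next round

print(idea_round((0x0123, 0x4567, 0x89AB, 0xCDEF), (1, 2, 3, 4, 5, 6)))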
Security Provided by IDEA
Not only is IDEA approximately twice as fast as DES, but it is also considerably more secure.
Using a brute-force approach, there are 2^128 possible keys. If a billion chips that could each test
1 billion keys a second were used to try and crack an IDEA-encrypted message, it would take
them 10^13 years, which is considerably longer than the age of the universe! Being a fairly new
algorithm, it is possible a better attack than brute-force will be found, which, when coupled with
much more powerful machines in the future may be able to crack a message. However, for a
long way into the future, IDEA seems to be an extremely secure cipher.
8.7 RC CIPHERS
The RC ciphers were designed by Ron Rivest for the RSA Data Security. RC stands for Ron's
Code or Rivest Cipher. RC2 was designed as a quick-fix replacement for DES that is more secure.
It is a block cipher with a variable key size that has a proprietary algorithm. RC2 is a variable-key-
length cipher. However, when using the Microsoft Base Cryptographic Provider, the key length
is hard-coded to 40 bits. When using the Microsoft Enhanced Cryptographic Provider, the key
length is 128 bits by default and can be in the range of 40 to 128 bits in 8-bit increments.
RC4 was developed by Ron Rivest in 1987. It is a variable-key-size stream cipher. The details
of the algorithm have not been officially published. The algorithm is extremely easy to describe
and program. Just like RC2, 40-bit RC4 is supported by the Microsoft Base Cryptographic
provider, and the Enhanced provider allows keys in the range of 40 to 128 bits in 8-bit
increments.
RC5 is a block cipher designed for speed. The block size, key size and the number of
iterations are all variables. In particular, the key size can be as large as 2,048 bits.
All the encryption techniques discussed so far belong to the class of symmetric cryptography
(DES, IDEA and RC Ciphers). We now look at the class of Asymmetric Cryptographic
Techniques.
8.8 ASYMMETRIC (PUBLIC-KEY) ALGORITHMS
Public-key Algorithms are asymmetric, that is to say the key that is used to encrypt the
message is different from the key used to decrypt the message. The encryption key, known as
the public key is used to encrypt a message, but the message can only be decoded by the person
that has the decryption key, known as the private key.
This type of algorithm has a number of advantages over traditional symmetric ciphers. It
means that the recipient can make their public key widely available - anyone wanting to send
them a message uses the algorithm and the recipient's public key to do so. An eavesdropper
may have both the algorithm and the public key, but will still not be able to decrypt the message.
Only the recipient, with their private key can decrypt the message.
A disadvantage of public-key algorithms is that they are more computationally intensive than
symmetric algorithms, and therefore encryption and decryption take longer. This may not be
significant for a short text message, but certainly is for long messages or audio/video.
The Public-Key Cryptography Standards (PKCS) are specifications produced by RSA
Laboratories in cooperation with secure systems developers worldwide for the purpose of
accelerating the deployment of public-key cryptography. First published in 1991 as a result of
meetings with a small group of early adopters of public-key technology, the PKCS documents
have become widely referenced and implemented. Contributions from the PKCS series have
become part of many formal and de facto standards, including ANSI X9 documents, PKIX,
SET, S/MIME, and SSL.
The next two sections describe two popular public-key algorithms, the RSA Algorithm and
the Pretty Good Privacy (PGP) Hybrid Algorithm.
8.9 THE RSA ALGORITHM
RSA, named after its three creators-Rivest, Shamir and Adleman, was the first effective public-
key algorithm, and for years has withstood intense scrutiny by cryptanalysts all over the world.
Unlike symmetric key algorithms, where, as long as one presumes that an algorithm is not
flawed, the security relies on having to try all possible keys, public-key algorithms rely on it
being computationally unfeasible to recover the private key from the public key.
RSA relies on the fact that it is easy to multiply two large prime numbers together, but
extremely hard (i.e. time consuming) to factor them back from the result. Factoring a number
means finding its prime factors, which are the prime numbers that need to be multiplied
together in order to produce that number. For example,
10 = 2 × 5
60 = 2 × 2 × 3 × 5
2^113 - 1 = 3391 × 23279 × 65993 × 1868569 × 1066818132868207
The algorithm
Two very large prime numbers, normally of equal length, are randomly chosen and then
multiplied together.
N = A × B    (8.4)
T = (A - 1) × (B - 1)    (8.5)
A third number is then also chosen randomly as the public key (E) such that it has no common
factors (i.e. is relatively prime) with T. The private key (D) is then:
D = E^(-1) mod T    (8.6)
To encrypt a block of plaintext (M) into ciphertext (C):
C = M^E mod N    (8.7)
To decrypt:
M = C^D mod N    (8.8)
Example 8.5 Consider the following implementation of the RSA algorithm.
1st prime (A) = 37
2nd prime (B) = 23
So,
N = 37 × 23 = 851
T = (37 - 1) × (23 - 1) = 36 × 22 = 792
E must have no factors other than 1 in common with 792.
E (public key) could be 5.
D (private key) = 5^(-1) mod 792 = 317
To encrypt a message (M) of the character 'G':
If G is represented as 7 (7th letter in alphabet), then M = 7.
C (ciphertext) = 7^5 mod 851 = 638
To decrypt: M = 638^317 mod 851 = 7.
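The numbers in Example 8.5 can be checked with a few lines of Python, using the built-in three-argument pow for modular exponentiation (and, in Python 3.8+, for the modular inverse).

A, B = 37, 23
N = A * B                        # 851
T = (A - 1) * (B - 1)            # 792
E = 5                            # public key, relatively prime to T
D = pow(E, -1, T)                # 317, the modular inverse of E mod T (Python 3.8+)

M = 7                            # 'G', the 7th letter of the alphabet
C = pow(M, E, N)                 # 7^5 mod 851 = 638
print(N, T, D, C, pow(C, D, N))  # 851 792 317 638 7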
Security of RSA
The security of the RSA algorithm depends on the ability of the hacker to factorize numbers. New,
faster and better methods for factoring numbers are constantly being devised. The current best
for long numbers is the Number Field Sieve. Numbers of a length that was unimaginable a
mere decade ago are now factored easily. Obviously the longer a number is, the harder it is to
factor, and so the better the security of RSA. As theory and computers improve, larger and
larger keys will have to be used. The disadvantage in using extremely long keys is the
computational overhead involved in encryption/decryption. This will only become a problem
if a new factoring technique emerges that requires keys of such lengths to be used that necessary
key length increases much faster than the increasing average speed of computers utilising the
RSA algorithm.
In 1997, a specific assessment of the security of 512-bit RSA keys showed that one may be
factored for less than $1,000,000 in cost and eight months of effort. It is therefore believed that
512-bit keys provide insufficient security for anything other than short-term needs. RSA
Laboratories currently recommend key sizes of 768 bits for personal use, 1024 bits for corporate
use, and 2048 bits for extremely valuable keys like the root-key pair used by a certifying
authority. Security can be increased by changing a user's keys regularly and it is typical for a
user's key to expire after two years (the opportunity to change keys also allows for a longer
length key to be chosen).
Even without using huge keys, RSA is about 1000 times slower to encrypt/decrypt than DES.
This has resulted in it not being widely used as a stand-alone cryptography system. However, it
is used in many hybrid cryptosystems such as PGP. The basic principle of hybrid systems is to
encrypt plaintext with a Symmetric Algorithm (usually DES or IDEA); the symmetric
algorithm's key is then itself encrypted with a public-key algorithm such as RSA. The RSA
encrypted key and symmetric algorithm-encrypted message are then sent to the recipient, who
uses his private RSA key to decrypt the Symmetric Algorithm's key, and then that key to decrypt
the message. This is considerably faster than using RSA throughout, and allows a different
symmetric key to be used each time, considerably enhancing the security of the Symmetric
Algorithm.
RSA's future security relies· solely on advances in factoring techniques. Barring an
astronomical increase in the efficiency of factoring techniques, or available computing power,
the 2048-bit key will ensure very secure protection into the foreseeable future. For instance an
Intel Paragon, which can achieve 50,000 MIPS (million instructions per second), would take a
million years to factor a 2048-bit key using current techniques.
8.10 PRETTY GOOD PRIVACY (PGP)
Pretty Good Privacy (PGP) is a hybrid cryptosystem that was created by Phil Zimmerman and
released onto the Internet as a freeware program in 1991. PGP is not a new algorithm in its own
right, but rather a series of other algorithms that are performed along with a sophisticated
protocol. PGP's intended use was for e-mail security, but there is no reason why the basic
principles behind it could not be applied to any type of transmission.
PGP and its source code is freely available on the Internet. This means that since its creation
PGP has been subjected to an enormous amount of scrutiny by cryptanalysts, who have yet to
find an exploitable fault in it.
PGP has four main modules: a symmetric cipher- IDEA for message encryption, a public
key algorithm-RSA to encrypt the IDEA key and hash values, a one-way hash function-MD5
for signing, and a random number generator.
The fact that the body of the message is encrypted with a symmetric algorithm (IDEA) means
that PGP generated e-mails are a lot faster to encrypt and decrypt than ones using simple RSA.
The key for the IDEA module is randomly generated each time as a one-off session key, this
makes PGP very secure, as even if one message was cracked, all previous and subsequent
messages would remain secure. This session key is then encrypted with the public key of the
recipient using RSA. Given that keys up to 2048 bits long can be used, this is extremely secure.
MD5 can be used to produce a hash of the message, which can then be signed by the sender's
private key. Another feature of PGP's security is that the user's private key is encrypted using a
hashed pass-phrase rather than simply a password, making the private key extremely resistant
to copying even with access to the user's computer.
Generating true random numbers on a computer is notoriously hard. PGP tries to achieve
randomness by making use of the keyboard latency when the user is typing. This means that the
program measures the gap of time between each key-press. Whilst at first this may seem to be
distinctly non-random, it is actually fairly effective-people take longer to hit some keys than
others, pause for thought, make mistakes and vary their overall typing speed on all sorts of
factors such as knowledge of the subject and tiredness. These measurements are not actually
used directly but used to trigger a pseudo-random number generator. There are other ways of
generating random numbers, but to be much better than this gets very complex.
PGP uses a very clever, but complex, protocol for key management. Each user generates and
distributes their public key. If James is happy that a person's public key belongs to who it claims
to belong to, then he can sign that person's public key and James's program will then accept
messages from that person as valid. The user can allocate levels of trust to other users. For
instance, James may decide that he completely trusts Earl to sign other peoples' keys, in effect
saying "his word is good enough for me". This means that if Rachel, who has had her key signed
by Earl, wants to communicate with James, she sends James her signed key. James's program
recognises Earl's signature, has been told that Earl can be trusted to sign keys, and so accepts
Rachel's key as valid. In effect Earl has introduced Rachel to James.
PGP allows many levels of trust to be assigned to people, and this is best illustrated in Fig. 8.4.
The explanations are as follows.
1st line
James has signed the keys of Earl, Sarah, Jacob and Kate. James completely trusts Earl to sign
other peoples' keys, does not trust Sarah at all, and partially trusts Jacob and Kate (he trusts
Jacob more than Kate).
Fig. 8.4 An Example of a PGP User Web. (Legend: fully trusted; partially trusted; partially
trusted to a lesser degree; not validated; key validated directly or by introduction. Level 1:
people with keys signed by James; Level 2: people with keys signed by those on Level 1;
Level 3: people with keys signed by those on Level 2.)
2nd line
Although James has not signed Sam's key, he still trusts Sam to sign other peoples' keys, maybe
on Bob's say so or due to them actually meeting. Because Earl has signed Rachel's key, Rachel
is validated (but not trusted to sign keys). Even though Bob's key is signed by Sarah and Jacob,
because Sarah is not trusted and Jacob only partially trusted, Bob is not validated. Two partially
trusted people, Jacob and Kate, have signed Archie's key, therefore Archie is validated.
3rd line
Sam, who is fully trusted, has signed Hal's key, therefore Hal is validated. Louise's key has been
signed by Rachel and Bob, neither of whom is trusted, therefore Louise is not validated.
Odd one out
Mike's key has not been signed by anyone in James' group, maybe James found it on the
Internet and does not know whether it is genuine or not.
PGP never prevents the user from sending or receiving e-mail, it does however warn the user
if a key is not validated, and the decision is then up to the user as to whether to heed the warning
or not.
Key Revocation
If a user's private key is compromised then they can send out a key revocation certificate.
Unfortunately this does not guarantee that everyone with that user's public key will receive it, as
keys are often swapped in a disorganised manner. Additionally, if the user no longer has the
private key then they cannot issue a certificate, as the key is required to sign it.
Security of PGP
"A chain is only as strong as its weakest link" is the saying and it holds true for PGP. If the user
chooses a 40-bit RSA key to encrypt his session keys and never validates any users, then PGP
will not be very secure. If however a 2048-bit RSA key is chosen and the user is reasonably
vigilant, then PGP is the closest thing to military-grade encryption the public can hope to get
their hands on.
The Deputy Director of the NSA was quoted as saying:
"If all the personal computers in the world, an estimated 260 million, were put to work on a single PGP-
encrypted message, it would still take an estimated 72 million times the age of the universe, on average, to
break a single message."
A disadvantage of public-key cryptography is that anyone can send you a message using your
public key, it is then necessary to prove that this message came from who it claims to have been
sent by. A message encrypted by someone's private key, can be decrypted by anyone with their
public key. This means that if the sender encrypted a message with his private key, and then
encrypted the resulting ciphertext with the recipient's public key, the recipient would be able to
decrypt the message with first their private key, and then the sender's public key, thus
recovering the message and proving it came from the correct sender.
This process is very time-consuming, and therefore rarely used. A much more common
method of digitally signing a message is using a method called One-Way Hashing.
8.11 ONE-WAY HASHING
A One-Way Hash Function is a mathematical function that takes a message string of any
length (pre-string) and returns a smaller fixed-length string (hash value). These functions are
designed in such a way that not only is it very difficult to deduce the message from its hashed
version, but also that even given that all hashes are a certain length, it is extremely hard to find
two messages that hash to the same value. In fact, to find two messages with the same hash from
a 128-bit hash function, 2^64 hashes would have to be tried. In other words, the hash value of a
file is a small unique 'fingerprint'. Even a slight change in an input string should cause the hash
value to change drastically. Even if 1 bit is flipped in the input string, at least half of the bits in
the hash value will flip as a result. This is called an Avalanche Effect.
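The avalanche effect is easy to observe experimentally; the following Python sketch uses SHA-256 (an assumed choice, since the discussion is about hash functions in general) and counts how many output bits change when a single input bit is flipped.

import hashlib

def hash_bits(data):
    return int.from_bytes(hashlib.sha256(data).digest(), 'big')

msg = bytearray(b"information theory")
h1 = hash_bits(bytes(msg))
msg[0] ^= 0x01                      # flip a single bit of the input
h2 = hash_bits(bytes(msg))
print(bin(h1 ^ h2).count('1'))      # typically close to 128 of the 256 output bits differ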
If H = hash value, f = hash function, and M = original message/pre-string, then
H = f(M)    (8.9)
If you know M then H is easy to compute. However, knowing H and f, it is not easy to compute
M, and it is hopefully computationally unfeasible.
As long as there is a low risk of collision (i.e. 2 messages hashing to the same value), and the
hash is very hard to reverse, then a one-way hash function proves extremely useful for a number
of aspects of cryptography.
If you one-way hash a message, the result will be a much shorter but still unique (at least
statistically) number. This can be used as proof of ownership of a message without having to
reveal the contents of the actual message. For instance rather than keeping a database of
copyrighted documents, if just the hash values of each document were stored, then not only
would this save a lot of space, but it would also provide a great deal of security. If copyright then
needs to be proved, the owner could produce the original document and prove it hashes to that
value.
Hash-functions can also be used to prove that no changes have been made to a file, as adding
even one character to a file would completely change its hash value.
By far the most common use of hash functions is to digitally sign messages. The sender
performs a one-way hash on the plaintext message, encrypts it with his private key and then
encrypts both with the recipient's public key and sends in the usual way. On decrypting the
ciphertext, the recipient can use the sender's public key to decrypt the hash value, he can then
perform a one-way hash himself on the plaintext message, and check this with the one he has
received. If the hash values are identical, the recipient knows not only that the message came
from the correct sender, as it used their private key to encrypt the hash, but also that the
plaintext message is completely authentic as it hashes to the same value.
The above method is greatly preferable to encrypting the whole message with a private key,
as the hash of a message will normally be considerably smaller than the message itself. This
means that it will not significantly slow down the decryption process in the same way that
decrypting the entire message with the sender's public key, and then decrypting it again with
the recipient's private key would. The PGP system uses the MD5 hash function for precisely this
purpose.
The Microsoft Cryptographic Providers support three hash algorithms: MD4, MD5 and
SHA. Both MD4 and MD5 were invented by Ron Rivest. MD stands for Message Digest. Both
algorithms produce 128-bit hash values. MD5 is an improved version of MD4. SHA stands for
Secure Hash Algorithm. It was designed by NIST and NSA. SHA produces 160-bit hash values,
longer than MD4 and MD5. SHA is generally considered more secure than other algorithms
and is the recommended hash algorithm.
8.12 OTHER TECHNIQUES
One Time Pads
The one-time pad was invented by Major Joseph Mauborgne and Gilbert Vernam in 1917, and
is an unconditionally secure (i.e. unbreakable) algorithm. The theory behind a one-time pad is
simple. The pad is a non-repeating random string of letters. Each letter on the pad is used once
only to encrypt one corresponding plaintext character. After use, the pad must never be re-used.
As long as the pad remains secure, so is the message. This is because a random key added to a
non-random message produces completely random ciphertext, and there is absolutely no
amount of analysis or computation that can alter that. If both pads are destroyed then the
original message will never be recovered. There are two major drawbacks:
Firstly, it is extremely hard to generate truly random numbers, and a pad that has even a
couple of non-random properties is theoretically breakable. Secondly, because the pad can
never be reused no matter how large it is, the length of the pad must be the same as the length
of the message which is fine for text, but virtually impossible for video.
Steganography
Steganography is not actually a method of encrypting messages, but hiding them within
something else to enable them to pass undetected. Traditionally this was achieved with invisible
ink, microfilm or taking the first letter from each word of a message. This is now achieved by
hiding the message within a graphics or sound file. For instance in a 256-greyscale image, if the
least significant bit of each byte is replaced with a bit from the message then the result will be
indistinguishable to the human eye. An eavesdropper will not even realise a message is being
sent. This is not cryptography however, and although it would fool a human, a computer would
be able to detect this very quickly and reproduce the original message.
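The least-significant-bit idea can be sketched as follows; the image is represented simply as a list of greyscale byte values, and the pixel and message values are made-up examples.

def embed(pixels, message_bits):
    # replace the least significant bit of each pixel with one message bit
    stego = [(p & 0xFE) | b for p, b in zip(pixels, message_bits)]
    return stego + pixels[len(message_bits):]

def extract(pixels, n_bits):
    return [p & 1 for p in pixels[:n_bits]]

cover = [120, 121, 119, 200, 201, 202, 13, 14]   # toy greyscale pixel values
secret = [1, 0, 1, 1, 0, 1]
stego = embed(cover, secret)
print(stego)                    # each pixel value changes by at most 1
print(extract(stego, 6))        # [1, 0, 1, 1, 0, 1]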
Secure Mail and S/MIME
Secure Multipurpose Internet Mail Extensions (S/MIME) is a de facto standard developed by
RSA Data Security, Inc., for sending secure mail based on public-key cryptography. MIME is
the industry standard format for electronic mail, which defines the structure of the message's
body. S/MIME-supporting e-mail applications add digital signatures and encryption capabilities
to that format to ensure message integrity, data origin authentication and confidentiality of
electronic mail.
When a signed message is sent, a detached signature in the PKCS #7 format is sent along
with the message as an attachment. The signature attachment contains the hash of the original
message signed with the sender's private key, as well as the signer certificate. S/MIME also
supports messages that are first signed with the sender's private key and then enveloped using
the recipients' public keys.
8.13 SECURE COMMUNICATION USING CHAOS FUNCTIONS
Chaos functions have also been used for secure communications and cryptographic
applications. By a chaos function we mean here an iterative difference equation that
exhibits chaotic behaviour. Since cryptography has more to do with unpredictability
than with randomness, chaos functions are a good choice because of their
property of unpredictability. If a hacker intercepts part of the sequence, he will have no
information on how to predict what comes next. The unpredictability of chaos functions makes
them a good choice for generating the keys for symmetric cryptography.
Example 8.6 Consider the difference equation
x_(n+1) = a x_n (1 - x_n)    (8.10)
For a = 4, this function behaves like a chaos function, i.e.,
(i) the values obtained by successive iterations are unpredictable, and
(ii) the function is extremely sensitive to the initial condition, x_0.
For any given initial condition, this function will generate values of x_n between 0 and 1 for each
iteration. These values are good candidates for key generation. In single-key cryptography, a key is
used for enciphering the message. This key is usually a pseudo noise (PN) sequence. The message
can be simply XORed with the key in order to scramble it. Since x_n takes positive values that are
always less than unity, the binary equivalent of these fractions can serve as keys. Thus, one of the
ways of generating keys from these random, unpredictable decimal numbers is to directly use their
binary representation. The lengths of these binary sequences will be limited only by the accuracy of
the decimal numbers, and hence very long binary keys can be generated. The recipient must know
the initial condition in order to generate the keys for decryption.
For application in single-key cryptography the following two factors need to be decided:
(i) The start value for the iterations (x_0), and
(ii) The number of decimal places of the mantissa that are to be supported by the calculating
machine (to avoid round-off error).
For single-key cryptography, the chaos values obtained after some number of iterations are
converted to binary fractions whose first 64 bits are taken to generate PN sequences. These
initial iterations would make it still more difficult for the hacker to guess the initial condition.
The starting value should be taken between 0 and 1. A good choice of the starting value can
improve the performance slightly.
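A sketch of this key-generation procedure for the logistic map of Example 8.6 is given below. The starting value and the number of warm-up iterations are arbitrary examples, and, as noted above, a practical implementation would need higher-precision arithmetic than ordinary floating point.

def chaos_key(x0, warmup=1000, n_bits=64):
    x = x0
    for _ in range(warmup):              # initial iterations make x0 harder to guess
        x = 4.0 * x * (1.0 - x)
    bits = []
    frac = x                             # binary fraction of the chaos value: 0.b1 b2 b3 ...
    for _ in range(n_bits):
        frac *= 2.0
        if frac >= 1.0:
            bits.append('1')
            frac -= 1.0
        else:
            bits.append('0')
    # note: double precision carries only about 52 significant bits, so a real
    # implementation would use higher-precision arithmetic, as discussed above
    return ''.join(bits)

print(chaos_key(0.3141592653589793))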
The secrecy of the starting number, x_0, is the key to the success of this algorithm. Since chaos
functions are extremely sensitive to even errors of 10^(-30) in the starting number (x_0), it means
that we can have 10^30 unique starting combinations. Therefore, a hacker who knows the chaos
function and the encryption algorithm has to try out 10^30 different start combinations. In the
DES algorithm the hacker had to try out approximately 10^19 different key values.
Chaos based algorithms require a high computational overhead to generate the chaos values
as well as high computational speeds. Hence, they might not be suitable for bulk data encryption.
8.14 CRYPTANALYSIS
Cryptanalysis is the science (or black art!) of recovering the plaintext of a message from the
ciphertext without access to the key. In cryptanalysis, it is always assumed that the cryptanalyst
has full access to the algorithm. An attempted cryptanalysis is known as an attack, of which
there are five major types:
• Brute-force attack: This technique requires a large amount of computing power and a large
amount of time to run. It consists of trying all possibilities in a logical manner until the
correct one is found. For the majority of encryption algorithms a brute force attack is
impractical due to the large number of possibilities.
• Ciphertext-only: The only information the cryptanalyst has to work with is the ciphertext
of various messages all encrypted with the same algorithm.
• Known-plaintext: In this scenario, the cryptanalyst has access not only to the ciphertext of
various messages, but also to the corresponding plaintext.
• Chosen-plaintext: The cryptanalyst has access to the same information as in a known
plaintext attack, but this time may choose the plaintext that gets encrypted. This attack is
more powerful, as specific plaintext blocks can be chosen that may yield more
information about the key. An adaptive-chosen-plaintext attack is merely one where the
cryptanalyst may repeatedly encrypt plaintext, thereby modifying the input based on the
results of a previous encryption.
• Chosen-ciphertext: The cryptanalyst uses a relatively new technique called differential
cryptanalysis, which is an interactive and iterative process. It works through many rounds
using the results from previous rounds, until the key is identified. The cryptanalyst
repeatedly chooses ciphertext to be decrypted, and has access to the resulting plaintext.
From this they try to deduce the key.
There is only one totally secure algorithm, the one-time pad. All other algorithms can be
broken given infinite time and resources. Modern cryptography relies on making it
computationally unfeasible to break an algorithm. This means, that while it is theoretically
possible, the time scale and resources involved make it completely unrealistic.
If an algorithm is presumed to be perfect, then the only method of breaking it relies on trying
every possible key combination until the resulting ciphertext makes sense. As mentioned above,
this type of attack is called a brute-force attack. The field of parallel computing is perfectly
suited to the task of brute force attacks, as every processor can be given a number of possible
keys to try, and they do not need to interact with each other at all except to announce the result.
A technique that is becoming increasingly popular is parallel processing using thousands of
individual computers connected to the Internet. This is known as distributed computing. Many
cryptographers believe that brute force attacks are basically ineffective when long keys are
used. An encryption algorithm with a large key (over 100 bits) can take millions of years to
crack, even with powerful, networked computers of today. Besides, adding a single extra key bit
doubles the cost of performing a brute force cryptanalysis.
Regarding brute force attack, there are a couple of other pertinent questions. What if the
original plaintext is itself a cipher? In that case, how will the hacker know if he has found the
right key? In addition, is the cryptanalyst sitting at the computer and watching the result of each
key that is being tested? Thus, we can assume that brute force attack is impossible provided long
enough keys are being used.
Here are some of the techniques that have been used by cryptanalysts to attack ciphertext.
• Differential cryptanalysis: As mentioned before, this technique uses an iterative process to
evaluate a cipher that has been generated using an iterative block algorithm (e.g. DES).
Related plaintext is encrypted using the same key. The difference is analysed. This
technique proved successful against DES and some hash functions.
• Linear Cryptanalysis: In this, pairs of plaintext and ciphertext are analysed and a linear
approximation technique is used to determine the behaviour of the block cipher. This
technique was also used successfully against DES.
• Algebraic attack: This technique exploits the mathematical structure in block ciphers. If
the structure exists, a single encryption with one key might produce the same result as a
double encryption with two different keys. Thus the search time can be reduced.
However strong or weak the algorithm used to encrypt it, a message can be thought of as
secure if the time and/or resources needed to recover the plaintext greatly exceed the benefits
bestowed by having the contents. This could be because the cost involved is greater than the
financial value of the message, or simply that by the time the plaintext is recovered the contents
will be outdated.
8. 15 POLITICS OF CRYPTOGRAPHY
Widespread use of cryptosystems is something most governments are not particularly happy
about-precisely because it threatens to give more privacy to the individual, including criminals.
For many years, police forces have been able to tap phone lines and intercept mail, however, in
an encrypted future that may become impossible.
This has led to some strange decisions on the part of governments, particularly the United
States government. In the United States, cryptography is classified as a munition and the export
of programs containing cryptosystems is tightly controlled. In 1992, the Software Publishers
Association reached agreement with the State Department to allow the export of software that
contained RSA's RC2 and RC4 encryption algorithms, but only if the key size was limited to 40
bits as opposed to the 128-bit keys available for use within the US. This significantly reduced the
level of privacy produced. In 1993 the US Congress had asked the National Research Council
to study US cryptographic policy. Its 1996 report, the result of two years' work, offered the
following conclusions and recommendations:
• "On balance, the advantages of more widespread use of cryptography outweigh the
disadvantages."
• "No law should bar the manufacture, sale or use of any form of encryption within the
United States."
• "Export controls on cryptography should be progressively relaxed but not eliminated."
In 1997 the limit on the key size was increased to 56 bits. The US government has proposed
several methods whereby it would allow the export of stronger encryption, all based on a system
where the US government could gain access to the keys if necessary, for example the clipper
chip. Recently there has been a lot of protest from the cryptographic community against the US
government imposing restrictions on the development of cryptographic techniques. The article
by Ronald L. Rivest, Professor, MIT, in the October 1998 issue of the Scientific American (pages
116-117), titled "The Case against Regulating Encryption Technology," is an example of such a
protest. The resolution of this issue is regarded to be one of the most important for the future of
e-commerce.
8.16 CONCLUDING REMARKS
In this section we present a brief history of cryptography. People have tried to conceal
information in written form since writing was developed. Examples survive in stone inscriptions
and papyruses showing that many ancient civilizations including the Egyptians, Hebrews and
Assyrians all developed cryptographic systems. The first recorded use of cryptography for
correspondence was by the Spartans who (as early as 400 BC) employed a cipher device called
a scytale to send secret communications between military commanders.
The scytale consisted of a tapered baton around which was wrapped a piece of parchment
inscribed with the message. Once unwrapped the parchment appeared to contain an
incomprehensible set of letters, however when wrapped around another baton of identical size
the original text appears.
The Greeks were therefore the inventors of the first transposition cipher and in the fourth
century BC the earliest treatise on the subject was written by a Greek, Aeneas Tacticus, as part
of a work entitled On the Defence of Fortifications. Another Greek, Polybius, later devised a means
of encoding letters into pairs of symbols using a device known as the Polybius checkerboard, which
contains many elements common to later encryption systems. In addition to the Greeks there
are similar examples of primitive substitution or transposition ciphers in use by other
civilizations including the Romans. The Polybius checkerboard consists of a five by five grid
containing all the letters of the alphabet. Each letter is converted into two numbers, the first is
the row in which the letter can be found and the second is the column. Hence the letter A
becomes 11, the letter B 12 and so forth.
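The checkerboard encoding is easy to sketch in code. The following is a minimal illustration
(not from the book); it assumes the common convention of merging I and J so that 26 letters
fit into the 25 cells, a detail the text does not specify.

```python
# Sketch of the Polybius checkerboard: each letter -> (row, column) pair.
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"   # 25 letters; J merged with I (assumed convention)

def polybius_encode(text):
    pairs = []
    for ch in text.upper():
        if not ch.isalpha():
            continue                      # skip spaces and punctuation
        ch = "I" if ch == "J" else ch
        idx = ALPHABET.index(ch)
        row, col = idx // 5 + 1, idx % 5 + 1   # rows and columns numbered 1..5
        pairs.append(str(row) + str(col))
    return " ".join(pairs)

print(polybius_encode("AB"))              # prints '11 12', as in the text
```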
The Arabs were the first people to clearly understand the principles of cryptography. They
devised and used both substitution and transposition ciphers and discovered the use of letter
frequency distributions in cryptanalysis. As a result of this, by approximately 1412,
al-Kalkashandi could include in his encyclopaedia Subh al-a'sha a respectable, if elementary,
treatment of several cryptographic systems. He also gave explicit instructions on how to
cryptanalyze ciphertext using letter frequency counts, including examples illustrating the technique.
European cryptography dates from the Middle Ages, during which it was developed by the
Papal and Italian city states. The earliest ciphers involved only vowel substitution (leaving the
consonants unchanged). Circa 1379 the first European manual on cryptography, consisting of a
compilation of ciphers, was produced by Gabriele de Lavinde of Parma, who served Pope
Clement VII. This manual contains a set of keys for correspondents and uses symbols for letters
and nulls, with several two-character code equivalents for words and names. The first brief code
vocabularies, called nomenclators, were expanded gradually and for several centuries were the
mainstay of diplomatic communication for nearly all European governments. In 1470 Leon
Battista Alberti described the first cipher disk in Trattati in cifra, and the Traicté des chiffres,
published in 1586 by Blaise de Vigenère, contained a square table commonly attributed to him
as well as descriptions of the first plaintext and ciphertext autokey systems.
By 1860 large codes were in common use for diplomatic communications, and cipher systems
had become a rarity for this application. However, cipher systems prevailed for military
communications (except for high-command communication, because of the difficulty of
protecting codebooks from capture or compromise). During the US Civil War the Federal Army
extensively used transposition ciphers. The Confederate Army primarily used the Vigenère
cipher and, on occasion, monoalphabetic substitution. While the Union cryptanalysts solved
most of the intercepted Confederate ciphers, the Confederacy, in desperation, sometimes
published Union ciphers in newspapers, appealing for help from readers in cryptanalysing
them.
During the First World War both sides employed cipher systems almost exclusively for tactical
communication, while code systems were still used mainly for high-command and diplomatic
communication. Although field cipher systems such as the US Signal Corps cipher disk lacked
sophistication, some complicated cipher systems were used for high-level communications by
the end of the war. The most famous of these was the German ADFGVX fractionation cipher.
In the 1920s the maturing of mechanical and electromechanical technology came together
with the needs of telegraphy and radio to bring about a revolution in cryptodevices: the
development of rotor cipher machines. The concept of the rotor had been anticipated in the
older mechanical cipher disks; however, it was an American, Edward Hebern, who recognised
that by hardwiring a monoalphabetic substitution in the connections from the contacts on one
side of an electrical rotor to those on the other side and cascading a collection of such rotors,
polyalphabetic substitutions of almost any complexity could be produced. From 1921, and
continuing through the next decade, Hebern constructed a series of steadily improving rotor
machines that were evaluated by the US Navy. It was undoubtedly this work which led to the
United States' superior position in cryptology during the Second World War. At almost the
same time as Hebern was inventing the rotor cipher machine in the United States, European
engineers such as Hugo Koch (Netherlands) and Arthur Scherbius (Germany) independently
discovered the rotor concept and designed the precursors to the most famous cipher machine in
history, the German Enigma Machine, which was used during World War 2. These machines
were also the stimulus for the TYPEX, the cipher machine employed by the British during
World War 2.
The United States introduced the M-134-C (SIGABA) cipher machine during World War 2.
The Japanese cipher machines of World War 2 have an interesting history linking them to both
the Hebern and the Enigma machines. After Herbert Yardley, an American cryptographer who
organised and directed the US government's first formal code-breaking efforts during and after
the First World War, published The American Black Chamber, in which he outlined details of the
American successes in cryptanalysing the Japanese ciphers, the Japanese government set out to
develop the best cryptomachines possible. With this in mind, it purchased the rotor machines of
Hebern and the commercial Enigmas, as well as several other contemporary machines, for
study. In 1930 Japan's first rotor machine, code-named RED by US cryptanalysts, was
put into service by the Japanese Foreign Office. However, drawing on experience gained from
cryptanalysing the ciphers produced by the Hebern rotor machines, the US Army Signal
Intelligence Service team of cryptanalysts succeeded in cryptanalysing the RED ciphers. In
1939, the Japanese introduced a new cipher machine, code-named PURPLE by US
cryptanalysts, in which the rotors were replaced by telephone stepping switches. The greatest
triumphs of cryptanalysis occurred during the Second World War, when the Polish and British
cracked the Enigma ciphers and the American cryptanalysts broke the Japanese RED,
ORANGE and PURPLE ciphers. These developments played a major role in the Allies' conduct
of World War 2.
After World War 2 the electronics that had been developed in support of radar were adapted
to cryptomachines. The first electrical cryptomachines were little more than rotor machines
where the rotors had been replaced by electronic substitutions. The only advantage of these
electronic rotor machines was their speed of operation as they were still affected by the inherent
weaknesses of the mechanical rotor machines.
The era of computers and electronics has meant an unprecedented freedom for cipher
designers to use elaborate designs which would be far too prone to error if handled with pencil
and paper, or far too expensive to implement in the form of an electromechanical cipher
machine. The main thrust of development has been in block ciphers,
beginning with the LUCIFER project at IBM, a direct ancestor of the DES (Data Encryption
Standard).
There is a place for both symmetric and public-key algorithms in modern cryptography.
Hybrid cryptosystems successfully combine aspects of both and seem to be secure and fast.
While PGP and its complex protocols are designed with the Internet community in mind, it
should be obvious that the encryption behind it is very strong and could be adapted to suit
many applications. There may still be instances when a simple algorithm is necessary, and with
the security provided by algorithms like IDEA, there is absolutely no reason to think of these as
significantly less secure.
An article posted on the Internet on the subject of picking locks stated: "The most effective
door opening tool in any burglars' toolkit remains the crowbar". This also applies to
cryptanalysis - direct action is often the most effective. It is all very well transmitting your
messages with 128-bit IDEA encryption, but if all that is necessary to obtain that key is to walk
up to one of the computers used for encryption with a floppy disk, then the whole point of
encryption is negated. In other words, an incredibly strong algorithm is not sufficient. For a
system to be effective there must be effective management protocols involved. Finally, in the
words of Edgar Allan Poe, "Human ingenuity cannot concoct a cipher which human
ingenuity cannot resolve."
SUMMARY
• A cryptosystem is a collection of algorithms and associated procedures for hiding and
revealing information. Cryptanalysis is the process of analysing a cryptosystem, either to
verify its integrity or to break it for ulterior motives. An attacker is a person or system
that performs cryptanalysis in order to break a cryptosystem. The process of attacking a
cryptosystem is often called cracking. The job of the cryptanalyst is to find the weaknesses
in the cryptosystem.
• A message being sent is known as plaintext. The message is coded using a cryptographic
algorithm. This process is called encryption. An encrypted message is known as
ciphertext, and is turned back into plaintext by the process of decryption.
• A key is a value that causes a cryptographic algorithm to run in a specific manner
and produce a specific ciphertext as an output. The key size is usually measured in
bits. Generally, the bigger the key size, the more secure the algorithm.
• Symmetric algorithms (or single key algorithms or secret key algorithms) have one key
that is used both to encrypt and decrypt the message, hence their name. In order for the
recipient to decrypt the message they need to have an identical copy of the key. This
presents one major problem: the distribution of the keys.
• Block ciphers usually operate on groups of bits called blocks. Each block is processed a
multiple number of times. In each round the key is applied in a unique manner. The
greater the number of iterations, the longer the encryption process takes, but the more
secure the resulting ciphertext.
• Stream ciphers operate on plaintext one bit at a time. Plaintext is streamed as raw bits
through the encryption algorithm. While a block cipher will produce the same ciphertext
from the same plaintext using the same key, a stream cipher will not. The ciphertext
produced by a stream cipher will vary under the same conditions.
• To determine how much security one needs, the following questions must be answered:
1. What is the worth of the data to be protected?
2. How long does it need to be secure?
3. What are the resources available to the cryptanalyst/hacker?
Cryptography
• Two symmetric algorithms, both block ciphers, were discussed in this chapter. These are
the Data Encryption Standard (DES) and the International Data Encryption Algorithm
(IDEA).
• Public-key algorithms are asymmetric, that is to say the key that is used to encrypt the
message is different from the key used to decrypt the message. The encryption key, known
as the public key, is used to encrypt a message, but the message can only be decoded by
the person that has the decryption key, known as the private key. The Rivest, Shamir and
Adleman (RSA) algorithm and Pretty Good Privacy (PGP) are two popular public-
key encryption techniques.
• RSA relies on the fact that it is easy to multiply two large prime numbers together, but
extremely hard (i.e., time consuming) to recover them by factoring the result. Factoring a
number means finding its prime factors, which are the prime numbers that need to be
multiplied together in order to produce that number.
• A one-way hash function is a mathematical function that takes a message string of any
length (pre-string) and returns a smaller fixed-length string (hash value). These functions
are designed in such a way that not only is it very difficult to deduce the message from its
hashed version, but also that, even given that all hashes are of a certain length, it is extremely
hard to find two messages that hash to the same value.
• Chaos functions can be used for secure communication and cryptographic applications.
The chaotic functions are primarily used for generating keys that are essentially
unpredictable.
• An attempted unauthorised cryptanalysis is known as an attack, of which there are five
major types: brute force attack, ciphertext-only, known-plaintext, chosen-plaintext and
chosen-ciphertext.
• The common techniques that are used by cryptanalysts to attack ciphertext are
differential cryptanalysis, linear cryptanalysis and algebraic attack.
• Widespread use of cryptosystems is something most governments are not particularly
happy about, because it threatens to give more privacy to the individual, including
criminals.
Imagination is more important than knowledge.
-Albert Einstein (1879-1955)
PROBLEMS
8.1 We want to test the security of a "character + n" encrypting technique in which each letter
of the plaintext is shifted by n to produce the ciphertext.
(a) How many different attempts must be made to crack this code, assuming a brute force
attack is being used?
(b) Assuming it takes a computer 1 ms to check out one value of the shift, how soon can
this code be broken into?
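Before attempting the problem, it may help to see the encryption itself. The following is a small
sketch (not part of the problem statement) of the shift-by-n operation on the letters A-Z.

```python
# Sketch of the "character + n" cipher: shift each letter forward by n positions,
# wrapping around the 26-letter alphabet; non-letters are passed through unchanged.
def shift_encrypt(plaintext, n):
    out = []
    for ch in plaintext.upper():
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('A') + n) % 26 + ord('A')))
        else:
            out.append(ch)
    return "".join(out)

print(shift_encrypt("MERCHANT", 3))   # -> 'PHUFKDQW'
```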
8.2 Suppose a group of N people want to use secret key cryptography. Each pair of people in
the group should be able to communicate secretly. How many distinct keys are required?
8.3 Transposition ciphers rearrange the letters of the plaintext without changing the letters
themselves. For example, a very simple transposition cipher is the rail fence, in which the
plaintext is staggered between two rows and then read off to give the ciphertext. In a two
row rail fence the message MERCHANT TAYLORS' SCHOOL becomes:
M R H N T Y O S C O L
E C A T A L R S H O
which is read out as: MRHNTYOSCOLECATALRSHO (see the sketch following this problem).
(a) If a cryptanalyst wants to break into the rail fence cipher, how many distinct attacks
must he make, given the length of the ciphertext is n?
(b) Suggest a decrypting algorithm for the rail fence cipher.
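For readers who want to experiment, here is a minimal sketch of the two-row rail fence
encryption described in Problem 8.3; the decryption asked for in part (b) is deliberately not shown.

```python
# Two-row rail fence encryption: odd-position letters form the top row,
# even-position letters form the bottom row, and the rows are concatenated.
def railfence2_encrypt(plaintext):
    letters = [c for c in plaintext.upper() if c.isalpha()]
    top = letters[0::2]      # 1st, 3rd, 5th, ... letters
    bottom = letters[1::2]   # 2nd, 4th, 6th, ... letters
    return "".join(top + bottom)

print(railfence2_encrypt("MERCHANT TAYLORS' SCHOOL"))
# -> 'MRHNTYOSCOLECATALRSHO', matching the example in the text
```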
8.4 One of the most famous field ciphers ever was a fractionation system - the ADFGVX
cipher, which was employed by the German Army during the First World War. This system
was so named because it used a 6 x 6 matrix to substitution-encrypt the 26 letters of the
alphabet and 10 digits into pairs of the symbols A, D, F, G, V and X. The resulting
biliteral cipher is only an intermediate cipher; it is then written into a rectangular matrix
and transposed to produce the final cipher, which is the one that would be transmitted.
Here is an example of enciphering the phrase "Merchant Taylors" with this cipher using
the key word "Subject".
    A D F G V X
A   S U B J E C
D   T A D F G H
F   I K L M N O
G   P Q R V W X
V   Y Z 0 1 2 3
X   4 5 6 7 8 9
Plaintext:  M  E  R  C  H  A  N  T  T  A  Y  L  O  R  S
Ciphertext: FG AV GF AX DX DD FV DA DA DD VA FF FX GF AA
This intermediate ciphertext can then be put in a transposition matrix based on a different key.
C I P H E R
1 4 5 3 2 6
F G A V G F
A X D X D D
F V D A D A
D D V A F F
F X G F A A
The final cipher is therefore: FAFDFGDDFAVXAAFGXVDXADDVGFDAFA.
(a) If a cryptanalyst wants to break into the ADFGVX cipher, how many distinct attacks
must he make, given the length of the ciphertext is n?
(b) Suggest a decrypting algorithm for the ADFGVX cipher.
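The two-stage encipherment worked through in Problem 8.4 can be sketched as follows. The
square is filled from the keyword "Subject" exactly as in the table above and the transposition
uses the key "CIPHER"; for simplicity the sketch assumes the intermediate text length is a
multiple of the key length, as it is in this example.

```python
# Sketch of ADFGVX: fractionating substitution followed by columnar transposition.
SYMBOLS = "ADFGVX"
SQUARE = ["SUBJEC",
          "TADFGH",
          "IKLMNO",
          "PQRVWX",
          "YZ0123",
          "456789"]

def substitute(plaintext):
    pairs = []
    for ch in plaintext.upper():
        for r, row in enumerate(SQUARE):
            c = row.find(ch)
            if c >= 0:
                pairs.append(SYMBOLS[r] + SYMBOLS[c])   # row symbol + column symbol
                break
    return "".join(pairs)

def transpose(intermediate, key):
    # column i holds every len(key)-th symbol; read columns in alphabetical key order
    cols = {k: intermediate[i::len(key)] for i, k in enumerate(key)}
    return "".join(cols[k] for k in sorted(key))

inter = substitute("MERCHANTTAYLORS")   # FGAVGFAXDXDDFVDADADDVAFFFXGFAA
final = transpose(inter, "CIPHER")      # FAFDFGDDFAVXAAFGXVDXADDVGFDAFA
print(inter, final)
```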
8.5 Consider the knapsack technique for encryption proposed by Ralph Merkle of XEROX
and Martin Hellman of Stanford University in 1976. They suggested using the knapsack,
or subset-sum, problem as the basis for a public key cryptosystem. This problem entails
determining whether a number can be expressed as a sum of some subset of a given
sequence of numbers and, more importantly, which subset has the desired sum.
Given a sequence of numbers A, where A = (a1, ..., an), and a number C, the knapsack
problem is to find a subset of a1, ..., an which sums to C.
Consider the following example:
n= 5, C= 14, A= (1, 10, 5, 22, 3)
Solution= 14 = 1 + 10 + 3
In general, all the possible sums of all subsets can be expressed by:
m1a1 + m2a2 + m3a3 + ... + mnan, where each mi is either 0 or 1.
The solution is therefore a binary vector M = (1, 1, 0, 0, 1).
There is a total of 2^n such vectors (in this example 2^5 = 32).
Obviously not all values of C can be formed from the sum of a subset and some can be
formed in more than one way. For example, when A= (14, 28, 56, 82, 90, 132, 197, 284,
341, 455, 515) the figure 515 can be formed in three different ways but the number 516
cannot be formed in any way.
(a) If a cryptanalyst wants to break into this knapsack cipher, how many distinct attacks
must he make?
(b) Suggest a decrypting algorithm for the knapsack cipher.
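A brute-force subset-sum search illustrates the knapsack problem stated in Problem 8.5. This
sketch simply tries all 2^n binary vectors M; it is only an illustration of the problem statement,
not an efficient attack.

```python
# Exhaustive subset-sum search: return the first binary vector M with M . A = C.
from itertools import product

def knapsack_solve(A, C):
    n = len(A)
    for m in product((0, 1), repeat=n):          # all 2**n candidate vectors
        if sum(mi * ai for mi, ai in zip(m, A)) == C:
            return m
    return None                                  # C cannot be formed from any subset

print(knapsack_solve((1, 10, 5, 22, 3), 14))     # -> (1, 1, 0, 0, 1), as in the text
```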
8.6
(a) Use the prime numbers 29 and 61 to generate keys using the RSA algorithm.
(b) Represent the letters 'RSA' in ASCII and encode them using the key generated
above.
(c) Next, generate keys using the pair of primes, 37 and 67. Which is more secure, the
keys in part (a) or part (c)? Why?
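As an illustration of the key-generation step referred to in Problem 8.6, the following sketch
implements textbook RSA with the primes 29 and 61. The public exponent e = 17 is an assumption
made here for illustration; any e coprime to (p - 1)(q - 1) would serve.

```python
# Textbook RSA key generation and a single-character round trip.
from math import gcd

def rsa_keys(p, q, e=17):                 # e = 17 is an illustrative (assumed) choice
    n = p * q                             # modulus: 29 * 61 = 1769
    phi = (p - 1) * (q - 1)               # 28 * 60 = 1680
    assert gcd(e, phi) == 1               # e must be coprime to phi
    d = pow(e, -1, phi)                   # private exponent: e*d = 1 (mod phi)
    return (e, n), (d, n)                 # public key, private key

public, private = rsa_keys(29, 61)
m = ord('R')                              # ASCII code of 'R', as in part (b)
c = pow(m, public[0], public[1])          # encryption: c = m^e mod n
print(public, private, c, pow(c, private[0], private[1]))   # last value recovers m
```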
8.7 Write a program that performs encryption using DES.
8.8 Write a program to encode and decode using IDEA. Compare the number of
computations required to encrypt a plaintext using the same key size for DES and
IDEA.
8.9 Write a general program that can factorize a given number.
8.10 Write a program to encode and decode using the RSA algorithm. Plot the number of
floating point operations required to be performed by the program versus the key-size.
8.11 Consider the difference equation
xn+1 = a xn (1 - xn)
For a = 4, this function behaves like a chaos function.
(a) Plot a sequence of 100 values obtained by iterative application of the difference
equation. What happens if the starting value x0 = 0.5?
(b) Take two initial conditions (i.e., two different starting values, x01 and x02) which are
separated by Δx. Use the difference equation to iterate each starting point n times and
obtain the final values y01 and y02, which are separated by Δy. For a given Δx, plot Δy
versus n.
(c) For a given value of n (say n = 500), plot Δx versus Δy.
(d) Repeat parts (a), (b) and (c) for a = 3.7 and a = 3.9. Compare and comment.
(e) Develop a chaos-based encryption program that generates keys for single-key
encryption. Use the chaos function
xn+1 = 4 xn (1 - xn)
(f) Compare the encryption speed of this chaos-based program with that of IDEA for
a key length of 128 bits.
(g) Compare the security of this chaos-based algorithm with that of IDEA for the 128
bit long key.
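The iteration in Problem 8.11 is straightforward to program. The following sketch generates
the sequence needed for part (a); the plots and the comparisons asked for in the later parts
are left to the reader.

```python
# Iterate the logistic map x_{n+1} = a * x_n * (1 - x_n) and return the orbit.
def logistic_sequence(x0, a=4.0, n=100):
    xs = [x0]
    for _ in range(n):
        xs.append(a * xs[-1] * (1.0 - xs[-1]))
    return xs

print(logistic_sequence(0.3)[:5])   # first few values of one orbit; try x0 = 0.5 as well
```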
Index
A Mathematical Theory of Communication 41
a scytale 265
AC coefficients 40
Additive White Gaussian Noise (AWGN) 56
Aeneas Tacticus 265
Algebraic attack 264
Asymmetric (Public-Key) Algorithms 254
Asymmetric Encryption 244
attacker 241
Augmented Generating Function 175
authenticity 242
Automatic Repeat Request 97
Avalanche Effect 259
Average Conditional Entropy 15
Average Conditional Self-Information 12
Average Mutual Information 11, 14
average number of nearest neighbours 221
Average Self-Information 11
Bandwidth Efficiency Diagram 60
Binary Entropy Function 12
Binary Golay Code 124
Binary Symmetric Channel 8
Blaise de Vigenère 266
Block Ciphers 247
Block Code 53, 77
Block Codes 53
Block Length 78, 161
Blocklength 161, 168
Brute force attack 263
BSC 13
Burst Error Correction 121
Burst Errors 121
Capacity Boundary 61
catastrophic 185
Catastrophic Convolutional Code 169
Catastrophic Error Propagation 170
Channel 49
Channel Capacity 50
Channel Coding 48, 76
Channel Coding Theorem 53
Channel Decoder 52
Channel Encoder 52
Channel Formatting 52
Channel Models 48
channel state information 229
channel transition probabilities 9
More Related Content

PDF
Selected Topics In Information And Coding Theory Issac Woungang
PDF
PDF
Broadband wireless communications
PDF
Fundamentals of information_theory_and_coding_design__discrete_mathematics_an...
PDF
Algebra And Coding Theory A Leroy S K Jain
PDF
Fundamentals Of Information Theory And Coding Design 1st Edition Roberto Togneri
PDF
Making effective use of graphics processing units (GPUs) in computations
PDF
Statistical_mechanics_of_complex_network.pdf
Selected Topics In Information And Coding Theory Issac Woungang
Broadband wireless communications
Fundamentals of information_theory_and_coding_design__discrete_mathematics_an...
Algebra And Coding Theory A Leroy S K Jain
Fundamentals Of Information Theory And Coding Design 1st Edition Roberto Togneri
Making effective use of graphics processing units (GPUs) in computations
Statistical_mechanics_of_complex_network.pdf

Similar to information_theory_coding_and_cryptograp.pdf (20)

PDF
Digital Signal Processing for Wireless Communication using Matlab (Signals an...
PDF
Theory And Design Of Digital Communication Systems Tri T Ha
PDF
An Introduction to Mathematical Cryptography-Springer-.pdf
PDF
RoutSaroj_ActiveMMterahertz_PhD_thesis_May_2016
PDF
Neural Networks and Deep Learning Syllabus
PDF
Serendipitous Web Applications through Semantic Hypermedia
PDF
Thesis_Underwater Swarm Sensor Networks
PDF
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
PDF
TOP CITED ARTICLE IN 2011 - INTERNATIONAL JOURNAL OF MOBILE NETWORK COMMUNICA...
PDF
thesis-final-version-for-viewing
PDF
Gravitational Billion Body Project
PDF
Chip xtalk ebook_
PDF
Ph d model-driven physical-design for future nanoscale architectures
PDF
Essay On Affirmative Action.pdf
PDF
Network literacy-high-res
PPTX
Semantic Sensor Networks and Linked Stream Data
PDF
Enabling next-generation sequencing applications with IBM Storwize V7000 Unif...
DOCX
syllbus (2).docx
PDF
fundamentals-of-neural-networks-laurene-fausett
PDF
Analog_and_digital_signals_and_systems.pdf
Digital Signal Processing for Wireless Communication using Matlab (Signals an...
Theory And Design Of Digital Communication Systems Tri T Ha
An Introduction to Mathematical Cryptography-Springer-.pdf
RoutSaroj_ActiveMMterahertz_PhD_thesis_May_2016
Neural Networks and Deep Learning Syllabus
Serendipitous Web Applications through Semantic Hypermedia
Thesis_Underwater Swarm Sensor Networks
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
TOP CITED ARTICLE IN 2011 - INTERNATIONAL JOURNAL OF MOBILE NETWORK COMMUNICA...
thesis-final-version-for-viewing
Gravitational Billion Body Project
Chip xtalk ebook_
Ph d model-driven physical-design for future nanoscale architectures
Essay On Affirmative Action.pdf
Network literacy-high-res
Semantic Sensor Networks and Linked Stream Data
Enabling next-generation sequencing applications with IBM Storwize V7000 Unif...
syllbus (2).docx
fundamentals-of-neural-networks-laurene-fausett
Analog_and_digital_signals_and_systems.pdf
Ad

Recently uploaded (20)

DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Empathic Computing: Creating Shared Understanding
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
Cloud computing and distributed systems.
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
cuic standard and advanced reporting.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Approach and Philosophy of On baking technology
PPTX
A Presentation on Artificial Intelligence
PPTX
Machine Learning_overview_presentation.pptx
The AUB Centre for AI in Media Proposal.docx
NewMind AI Weekly Chronicles - August'25-Week II
Network Security Unit 5.pdf for BCA BBA.
Digital-Transformation-Roadmap-for-Companies.pptx
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Unlocking AI with Model Context Protocol (MCP)
Empathic Computing: Creating Shared Understanding
“AI and Expert System Decision Support & Business Intelligence Systems”
Cloud computing and distributed systems.
Spectral efficient network and resource selection model in 5G networks
Mobile App Security Testing_ A Comprehensive Guide.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Chapter 3 Spatial Domain Image Processing.pdf
cuic standard and advanced reporting.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
MIND Revenue Release Quarter 2 2025 Press Release
Approach and Philosophy of On baking technology
A Presentation on Artificial Intelligence
Machine Learning_overview_presentation.pptx
Ad

information_theory_coding_and_cryptograp.pdf

  • 1. Acknowledgements I would like to thank the Department of Electrical Engineering at the Indian Institute of Technology (liT), Delhi for providing a stimulating academic environment that inspired this book. In particular, I would like to thank Prof. S.C. Dutta Roy, Prof. Surendra Prasad, Prof. H.M. Gupta, Prof. V.K Jain, Prof. Vinod Chandra, Prof. Santanu Chaudhury, Prof. S.D. Joshi, Prof. Sheel Aditya, Prof. Devi Chadha, Prof. D. Nagchoudri, Prof. G.S. Visweswaran, Prof. R K. Patney, Prof. V. C. Prasad, Prof. S. S. Jamuar and Prof. R K P. Bhatt. I am also thankful to Dr. Subrat Kar, Dr. Ranjan K. Mallik and Dr. Shankar Prakriya for friendly discussions. I have been fortunate to have several batches of excellent students whose feedback have helped me improve the contents ofthis book. Many of the problems given at the end of the chapters have been tested either as assignment problems or examination problems. My heartfelt gratitude is due to Prof. Bernard D. Steinberg, University of Pennsylvania, who has been my guide, mentor, friend and also my Ph.D thesis advisor. I am also grateful to Prof. Avraham Freedman, Tel Aviv University, for his support and suggestions as and when sought by me. I would like to thank Prof. B. Sundar Rajan of the Electrical Communication Engineering group at the Indian Institute of Science, Bangalore, with whom I had a preliminary discussion about writing this book. I wish to acknowledge valuable feedback on this initial manuscript from Prof. Ravi Motwani, liT Kanpur, Prof. A.K. Chaturvedi, liT Kanpur, Prof. N. Kumaravel, Anna University, Prof. V. Maleswara Rao, College of Engineering, GITAM, Visakhapatnam, Prof. M. Chandrasekaran, Government College of Engineering, Salem and Prof. Vikram Gadre, liT Mumbai. I am indebted to my parents, for their love and moral support throughout my life. I am also grateful to my grandparents for their blessings, and to my younger brother, Shantanu, for the infinite discussions on finite topics. Finally, I would like to thank my wife and best friend, Aloka, who encouraged me at·every stage of writing this book. Her constructive suggestions and balanced criticism have been instru- mental in making the book more readable and palatable. It was her infinite patience, unending support, understanding and the sense of humour that were critical in transforming my セイ・。ュ@ into this book. RANJAN BosE New Delhi Contents Preface Acknowledgements Part I Information Theory and Source Coding 1. Source Coding 1.1 Introduction to Information Theory 3 セ@ Uncertainty And Information 4 セaカ・イ。ァ・@ Mutual Information And Entropy 77 1.4 Information Measures For Continuous セ、ッュ@ Variables 74 .LVSource Coding Theorem 75 1.6 Huffman Coding 27 I.7 The Lempel-Ziv Algorithm 28 1.8 Run Length Encoding and the PCX Format 30 1.9 Rate Distortion Function 33 1.10 Optimum Quantizer Design 36 1.11 Introduction to Image Compression 37 1.12 The Jpeg Standard for Lossless Compression 38 1.13 The Jpeg Standard for Lossy Compression 39 1.14 Concluding Remarks 47 Summary 42 Problems 44 Computer Problems 46 2. Channel Capacity and Coding 2.1 Introduction 47 2.2 Channel Models 48 セ@ Channel Capacity 50 2.4 Channel Coding 52 W Information Capacity Theorem 56 IX XII 3 47
  • 2. 2.6 The Shannon Limit 59 2.7 Random Selection of Codes 67 2.8 Concluding Remarks 67 Summary 68 Problems 69 Computer Problems 77 Part II セ@ セN@ 1 Error Control Coding セjエ@ 1JM"セ@ (Channel Coding) 3. · ear Block Codes for Error Correction セ@ Introduction to Error Correcting Codes 75 3.2 JBasic Definitions 77 3.3 vMatrix Description of Linear Block Codes 87 Equivalent Codes 82 3.5 v'Parity Check Matrix 85 3.6 l.JDecoding of a Linear Block Code 87 3.7 Jsyndrome Decoding 94 3.8 Error Probability after Coding (Probability of Error Correction) 95 3.9 Perfect Codes 97 3.10 Hamming Codes 700 3.11 Optimal Linear Codes 702 3.12 Maximum Distance Separable (MDS) Codes 702 3.13 Concluding Remarks 102 Summary 703 Problems 705 Computer Problems 706 4. Cyclic Codes セセ@ セNQ@ 4.1 Introduction to Cyclic Codes 708 tY Polynomials 709 ,!Y The Division Algorithm for Polynomials 770 V A Method for Generating Cyclic Codes 775 W Matrix Description of Cyclic Codes 779 4.6 Burst Error Correction 727 75 108 4.7 Fire Codes 723 4.8 Golay Codes 724 4.9 Cyclic Redundancy Check (CRC) Codes 725 4.10 4.11 Circuit Implementation of Cyclic Codes Concluding Remarks 732 728 Summary 732 Problems 134 Computer Problems 735 5. Bose-Chaudhuri Hocquenghem (BCH) Codes セセN@ nJJ:,-idft セ@ z-id<. NGiヲセ@ 5.1 Introduction to BCH Codes 736 セ@ Primitive Elements 737 e.)" Minimal Polynomials 739 セ@ Generator Polynomials in Terms of Minimal Polynomials 2.Y Some Examples of BCH Codes 743 5.6 Decoding of BCH Codes 747 V Reed-Solomon Codes 750 747 5.8 Implementation of Reed-Solomon Encoders and Decoders 753 5.9 Nested Codes 753 5.10 Concluding Remarks 755 Summary 756 Problems 757 Computer Problems 758 136 6. Convolutional Codes 159 6.1 Introduction to Convolutional Codes QUセ@ iJ. J) iJ セ@ 1! ', セl@ セ@ Tree Codes and Trellis Codes 760 JJtセセャ@ fo4! セ@ セ@ A/ '1/A セ@ セッャケョッュゥ。ャ@ Description of Convolutional Codes セセ@ 1 . (;__ (Analytical Representation) 765 V Distance Notions for Convolutional Codes 770 セ@ The Generating Function 773 6.6 Matrix Description of Convolutional Codes 776 セN@ セ@ V/ Viterbi Decoding of Convolutional Codes 778 R}1.L GSM OuセセゥヲッセN@ 6.8 Distance Bounds for Convolutional Codes 785 1 6.9 Performance Bounds 787 6.10 Known Good Convolutional Codes 788 6.11 Turbo Codes 790
  • 3. 6.12 Turbo Decoding 792 6.13 Concluding Remarks 798 Summary 799 Problems 207 Computer Problems 203 7. Trellis Coded Modulation 7.1 Introduction to TCM 206 7.2 The Concept of Coded Modulation 207 7.3 Mapping by Set Partitioning 272 7.4 Ungerboeck's TCM Design Rules 276 7.5 Tern Decoder 220 7.6 Performance Evaluation for Awgn Channel 227 7.7 Computation of tip.ee 227 7.8 Tern for Fading Channels 228 7.9 Concluding Remarks · 232 Summary 233 Problems 234 Computer Problems 238 Partm Coding for Secure Communications . Cryptography 8.1 Introduction to Cryptography 247 L j 8.2 An Overview of Encryption Techniques 242 セ@ セ@ セセセ@ 8.3 Operations Used By Encryption Algorithms 245 8.4 Symmetric (Secret Key) Cryptography 246 8.5 Data Encryption Standard (DES) 248 8.6 International Data Encryption Algorithm (IDEA) 252 8.7 RC Ciphers 253 8.8 Asymmetric (Public-Key) Algorithms 254 8.9 The RSA Algorithm 254 8.10 Pretty Good Privacy (PGP) 256 8.11 One-Way Hashing 259 8.12 Other Techniques 260 8.13 Secure Communication Using Chaos Functions 267 /xvul 8.14 Cryptanalysis 262 8.15 Politics of Cryptography 264 8.16 Concluding Remarks 265 Summary 268 Problems 269 206 Computer Problems 277 Index 273 241
  • 5. 1 Source Coding , Not- セ@ tluU: ca.t'1 be- セ@ セ@ a.n.d.- n.ot セ@ tluU; CCf.U'.t¥ 。オ|セ@ 「・Mセ@ . -Alberl" eセ@ (1879-1955) 1.1 INTRODUCTION TO INFORMATION THEORY Today we live in the information age. The internet has become an integral part of our lives, making this, the third planet from the sun, a global village. People talking over the cellular phones is a common sight, sometimes even in cinema theatres. Movies can be rented in the form of a DVD disk. Email addresses and web addresses are common on business cards. Many people prefer to send emails and e-cards to their friends rather than the regular snail mail. Stock quotes can be checked over the mobile phone. Information has become the key to success (it has always been a key to success, but in today's world it is tlu key). And behind all this information and its exchange lie the tiny l's and O's {the omnipresent bits) that hold information by merely the way they sit next to one another. Yet the information age that we live in today owes its existence, primarily, to a seminal paper published in 1948 that laid the foundation of the wonderful field of Information Theory-a theory initiated by one man, the American Electrical Engineer Claude E. Shannon, whose ideas
  • 6. Information Theory, Coding and Cryptography appeared in the article "The Mathematical Theory of Communication" in the Bell System Technical]ournal (1948). In its broadest sense, information includes the content of any of the standard communication media, such as telegraphy, telephony, radio, or television, and the signals of electronic computers, servo-mechanism systems, and other data-processing devices. The theory is even applicable to the signals of the nerve networks of humans and other animals. The chief concern of information theory is to discover mathematical laws governing systems designed to communicate or manipulate information. It sets up quantitative measures of information and of the capacity of various systems to transmit, store, and otherwise process information. Some of the problems treated are related to finding the best methods of using various available communication systems and the best methods for separating wanted information or signal, from extraneous information or noise. Another problem is the setting of upper bounds on the capacity of a given information-carrying medium (often called an information channel). While the results are chiefly of interest to communication engineers, some of the concepts have been adopted and found useful in such fields as psychology and linguistics. The boundaries of information theory are quite fuzzy. The theory overlaps heavily with communication theory but is more oriented towards the fundamental limitations on the processing and communication of information and less towards the detailed operation of the devices employed. In this chapter, we shall first develop an intuitive understanding of information. It will be followed by mathematical models of information sources and a quantitative measure of the information emitted by a source. We shall then state and prove the source coding theorem. Having developed the necessary mathematical framework, we shall look at two source coding techniques, the Huffman encoding and the Lempel-Ziv encoding. This chapter will then discuss the basics of the Run Length Encoding. The concept of the Rate Distortion Function and the Optimum Quantizer will then be introduced. The chapter concludes with an introduction to image compression, one of the important application areas of source coding. In particular, theJPEG (joint Photographic Experts Group) standard will be discussed in brief. 1.2 UNCERTAINTY AND INFORMATION Any information source, analog or digital, produces an output that is random in nature. If it were not random, i.e., the output were known exactly, there would be no need to transmit it! We live in an analog world and most sources are analog sources, for example, speech, temperature fluctuations etc. The discrete sources are man-made sources, for example, a source (say, a man) that generates a sequence of letters from a finite alphabet (typing his email). Before we go on to develop a mathematical measure of information, let us develop an intuitive feel for it. Read the following sentences: I I Source Coding (A) Tomorrow, the sun will rise from the East. (B) The phone will ring in the next one hour. (C) It will snow in Delhi this winter. The three sentences carry different amounts of information. In fact, the first sentence hardly carries any information. Everybody knows that the sun rises in the East and the probability of this happening again is almost unity. Sentence (B) appears to carry more information than sentence (A). The phone may ring, or it may not. 
There is a finite probability that the phone will ring in the next one hour (unless the maintenance people are at work again!). The last sentence probably made you read it over twice. This is because it has never snowed in Delhi, and the probability of a snowfall is very low. It is interesting to note that the amount of information carried by the sentences listed above have something to do with the probability of occurrence of the events stated in the sentences. And we observe an inverse relationship. Sentence (A), which talks about an event which has a probability of occurrence very close to 1 carries almost no information. Sentence (C), which has a very low probability of occurrence, appears to carry a lot of information (made us read it twice to be sure we got the information right!). The other interesting thing to note is that the length of the sentence has nothing to do with the amount of information it conveys. In fact, sentence (A) is the longest but carries the minimum information. We will now develop a mathematical measure of information. Definition 1.1 Consider a discrete random variable X withpossible 011teomes·セG@ i:::;: 1, 2, ..., n. . ·.. . . The Self-Information of the event X= xi is defined as /(xi}= log (- 1 -) =-log P(x-) P(x1} • . (1.1) We note that a high probability event conveys less information than a low probability event. For an event with P(x) = 1, J(x) = 0. Since a lower probability implies a higher degree of uncertainty (and vice versa), a random variable with a higher degree of uncertainty contains more information. We will use this correlation between uncertainty and level of information for physical interpretations throughout this chapter. The units of I(x) are determined by the base of the logarithm, which is usually selected as 2 or e. When the base is 2, the units are in bits and when the base is e, the units are in nats (natural units). Since 0 セ@ P(xl; セ@ 1, J(x;) ;;::: 0, i.e., self information is non-negative. The following two examples illustrate why a logarithmic measure of information is appropriate.
  • 7. Information Theory, Coding and Cryptography Example 1.1 Consider a binary source which tosses a fair coin and outputs a 1 if a head (H) appears and aO if a tail (T) appears. For this source,P{l) =P(O) =0.5. The information content of each output from the source is I(x;) = :_ log2 P (x;) = -log2 P (0.5) = 1 bit (1.2) Indeed, we have to use only one bit to represent the output from this binary source (say, we use a 1 to represent H and a 0 to represent T). Now, suppose the successive outputs ·from this binary source are statistically independent, i.e., the source is memoryless. Consider a block of m bits. There are 2m possible m-bit blocks, each of which is equally probable with probability 2-m . The self-information of an m-bit block is I(x;) = - log2 P (xi) = - log2 2-m = m bits (1.3) Again, we observe that we indeed need m bits to represent the possible m-bit blocks. Thus, this logarithmic measure of information possesses the desired additive property when a number of source outputs is considered as a block. Example 1.2 Consider a discrete, memoryless source (DMS) (source C) that outputs two bits at a time. This source comprises two binary sources (sourcesA andB) as mentioned in Example 1.1, each source contributing one bit. The two binary sources within the source Care independent. Intuitively, the information content of the aggregate source (source C) should be the sum of tbe information contained in the outputs of the two independent sources that constitute this セcN@ Let us look at the information content ofthe outputs of sourceC. There are four possible outcomes {00, 01, 10, 11 }, each with a probability P(C) = P(A)P(B) = (0.5)(0.5) =0.25, because the source A and B are independent. The information content ofeach output from the source Cis I(C) = - log2 P(x;) = -log2 P(0.25) = 2 bits (1.4) We have to use two bits to represent the outpat from this combined binary source. Thus, the logarithmic measure of information possesses the desired additive property for independent events. Next, consider two discrete random variables X and Ywith possible outcomes X;, i = 1, 2, ..., 11 and Yj• j = 1, 2, ..., m respectively. Suppose we observe some outcome Y = Yi and we want to Source Coding determine the amount of information this event provides about the eventX =x;, i = 1, 2, ..., 11, i.e., we want to mathematically represent the mutual information. We note the two extreme cases: (i) X and Yare independent, in which case the occurrence of Y = Yj provides no information aboutX=x;. (ii) X and Yare fully dependent events, in which case the occurrence ofY= yi determines the occurrence of the event X= x;· A suitable measure that satisfies these conditions is the logarithm of the ratio of the conditional probability P(X = X; I Y = Yj) = P(x; IY} divided by the probability P(X =X;) = P(x;) Definition 1.2 The mutual information I(x;; y) between X; and Yi is defined as I(x,; y) =log ( ーセZセサII@ (1.5) (1.6) (1.7) As before, the units of I(x) are determined by the base of the logarithm, which is usually selected as 2 or e. When the base is 2 the units are in bits. Note that Therefore, P(x;!yi) _ P(x;IYi)P(y;) _ P(x;,y1) _ P(y1lx;) P{X;) - P(x;)P{y;) - P(x;)P(y1 )- P(y1 ) (1.8) {1.9) The physical interpretation of I(x;; y 1) = I(y1; xJ is as follows. 
The information provided by the occurrence.of the event Y= y1about the event X= X; is identical to the information provided by the occurrence of the event X= X; about the event Y = yl Let us now verify the two extreme cases: (i) When the random variables X and Yare statistically independent, P(x; Iy1)= P(xJ, it leads to I(x;; y) = 0. (ii) When the occurrence of Y = y 1uniquely determines the occurrence of the event X= X;, P(x; I'1}·) = 1, the mutual information becomes I(x;; y) = lo{ Ptx;)) =-log P(x;) (1.10) This is the self-information of the event X= X;. Thus, the logarithmic definition of mutual information confirms our intuition.
  • 8. Information Theory, Coding and Cryptography Example 1.3 Consider a Binary Symmetric Channel (BSC) as shown in Fig. 1.1. It is a channel that transports 1's and O's from the transmitter (Tx) to the receiver (Rx). It makes an error occasionally, with probabilityp. A BSC flips a 1to 0 and vice-versa with equal probability. Let X and Ybe binary random variables that represent the input and output of this BSC respectively. Let the input symbols be equally likely and the output symbols depend upon the input according to the channel transition probabilities as given below P(Y =0 I X= 0) =1 - p, P(Y=OIX=1)=p, P(Y = 11 X= 1) = 1-p, P(Y = 1 I X= 0) = p. Channel Fig. 1.1 A Binary Symmetric Channel. It simply implies that the probability of a bit getting flipped (i.e. in error) when transmitted over this BSC is p. From the channel transition probabilities we have P(Y = 0) = P(X = 0) X P(Y = 0 I X= 0) + P(X = 1) X P(Y = 0 I X= 1) = 0.5(1- p) + 0.5(p) = 0.5, and, P(Y= 1)=P(X=0) X P(Y= 11X=0)+P(X= 1) X P(Y= 11X= 1) = 0.5(p) + 0.5(1- p) = 0.5. Suppose we are at the receiver and we want to determine what was transmitted at the transmitter, on the basis of what was received. The mutual information about the occurrence ofthe eventX= 0 given that Y= 0 is ( P(Y = OIX =0)) (!=__.e_J I (x0; yo) = /(0; 0) = log2 = log2 = log22(1 - p). P(Y=O) 0.5 Similarly, l(x1; yo)= /(1; 0) = log2 = log2 _l!_ =1ogz2p. . ( P(Y =OIX =1)) ( ) P(Y=O) 0.5 Let us consider some specific cases. Source Coding Suppose, p =0, i.e., it is an ideal channel (noiseless), then, l(Xo; y0) = 1(0; 0) = log22(1 - p) = 1 bit. Hence, from the output, we can determine what was transmitted with certainty. Recall that the self- information about the event X= x0 was 1bit. However, ifp = 0.5, we get l(x0; y0) = /(0; 0) = log22(1- p) = log22(0.5) = 0. It is clear from the output that we have no information about what was transmitted. Thus, it is a useless channel. For such a channel, we may as well toss a fair coin at the receiver in order to determine what was sent! Suppose we have a channel wherep = 0.1, then, l(x0; yo)= 1(0; 0) = logz2(1- p) = log22(0.9) = 0.848 bits. Example 1.4 Let X and Y be binary random variables that represent the input and output of a binary channel shown in Fig. 1.2. Let the input symlfols be equally likely, and the output symbols depend upon the input according to the channel transition probabilities: P(Y = 0 I X= 0) = 1 - p0, P(Y = 0 I X= 1) = p 1, P(Y = 1 I X = 1) = 1 - p1, P(Y =1 I X= 0) =p0. 1-Po Channel Fig. 1.2 A Binary Channel with Asymmetric Probabilities. From the channel transition probabilities we have P(Y = 0) = P(X = O).P(Y = 0 I X= 0) + P(X = I).P(Y = 0 I X= 1) = 0.5(1 - p0) + 0.5(p1) = 0.5(1 -Po+ p1), and, pHyセ@ 1) =P(X=O).P(Y= 11X=0)+ P(X= l).P(Y= II X= 1) = 0.5(p0) + 0.5(1 - p1) = 0.5(1 - Pt + po).
  • 9. Information Theory, Coding and Cryptography Suppose we are at the receiver and we want to determine what was transmitted at the transmitter, on the basis of what is received. The mutual information about the occurrence of the event X= 0 given that Y = 0 is I(x,; yo) =/(0; O) =log2( pHZWy P セ P [@ O)) =log2 ( 0.5(/--セ P@ +1'1J=log2 C セセZ@ セセ@ ). Similarly, ( P(Y =OIX =1)) ( 2Pt ) l(x1; yo)= /(1; 0) = log2 = log2 • P(Y =0) 1- Po + Pt Definition 1.3 The Conditional Self Information of the event X= X; given Y= y 1 is defined as /(x11y) =ws(pHセijゥII@ =-log セクLャ@ Y)· (1.11) Thus, we may write I{x6 y 1) = I(xJ - I(x; Iy 1). (1.12) The conditional self information can be interpreted as the self information about the event X= X; on the basis of the event Y= Yt Recall that both J(x;) セ@ 0 and I(x; Iy1)セ@ 0. Therefore, I(x;; y1) <0 when J..x;) < I(x; Iy1) and I(x6 y 1) >0 when /(x;) > /(x; IY)· Hence, mutual information can be positive, negative or zero. Examph 1.5 Consider the BSC discussed in Example 1.3. The plot of the mutual information l(x0; yo) versus the probability of error, pis given in Fig. 1.3. 1 M]セMMセMMセMMセセMMセMMNMMMセセセセ@ I I I I I I I セU@ MMセMMセMMセMMセMMセMMセMM 1 I I I I I I I 0 MMセMMセMMセMMセMM MMセMMセMMセMMセMM 1 I I I I I I I I I I I I I I I I MセU@ --l--l--l--l--l--l MセMMセMMセMM 1 I I I I I I I -1 MMセMMセMMセMMセMMセMMセMMセ@ ,--,-- 1 I I I I I I I -1.1 ----1---i---i---i----1---i----j-- ---j-- 1 I I I I I I I I -2 MMセMMセMMセMMセMMMMQMMMMQMMMMQMMMMQM -l-- ,, I I I I I I I I I - 2.5 L_____l._____L__ ___L____.J..___.i._ ____J._---,.-----l__.J..__.,.--l,--___J 0 Fig. 1.3 The Plot of ftle Mutua/Information /(xo: ycJ vbセsus@ ftle Probability of Error. p. Source Coding It can be seen from the figure thatl(Xo; y0) is negative forp > 0.5. The physical interpretation is as follows. A negative mutual information implies that having observed Y = y0, we must avoid choosing X = Xo as the transmitted bit. For p = 0.1, l(x0; y1) = 1(0; 1) = log22{p) = log22(0.1) =- 2.322 bits. This shows that the mutual information between the eventsX= Xo andY= y1 is negative forp =0.1. For the extreme case ofp = 1, we have l(x0; y1) = 1(0; 1) = log22{p) = log22(1) =- I bit. The channel always changes a 0 to a 1 and vice versa (since p = 1). This implies that if y1 is observed at the receiver, it can be concluded that Xo was actually transmitted. This is actually a useful channel with a 100% bit error rate! We just flip the received bit. 1.3 AVERAGE MUTUAL INFORMATION AND ENTROPY So far we have studied the mutual information associated with a pair of events xi and y 1 which are the possible outcomes of the two random variables X and Y. We now want to find out the average mutual information between the two random variables. This can be obtained simply by weighting !{xi; y 1) by the probability of occurrence of the joint event and summing over all possible joint events. Definition 1.4 The Average Mutual Information between two random variables X and Yis given by For the case when X and Yare statistically independent, I(X; Y) = 0, i.e., there is no average mutual information between X and Y. An important property of the average mutual information is that /(X; Y) セ@ 0, where equality holds if and only if X and Yare statistically independent. Definition 1.5 The Average Self-Information of a random variable Xis defined as n n H(X) = LP(x;)I(Xj) =- LP(xi)logP(Xj) {1.14) i=l i=l When X represents the alphabet of possible output letters from a source, H(X) represents the average information per source letter. 
In this case H (X) is called the entropy. The term entropy has been borrowed from statistical mechanics, where it is used to denote the level of disorder in a system. It is interesting to see that the Chinese character for entropy looks like II!
  • 10. Information Theory, Coding and Cryptography Example 1.6 Consider a discrete binary source that emits a sequence of statistically independent symbols. The output is either a 0 with probabilityp or a 1 with a probability 1- p. The entropy of this binary source is 1 H(X) = - L P(x; )log P(x;) =- plog2 (p)- (1- p) log2 (1- p) '(1.15) i=O The plot of the Binary Entropy Function versus p is given in Fig. 1.4. We observe from the figure that the value of the binary entropy function reaches its maximum value for p = 0.5, i.e., when both 1 and 0 are equally likely. In general it can be shown that the entropy of a discrete source is maximum when the letters from the source are equally probable. H(X) 1 セMMMNMMMMPMWMMセMMMMNMMMMN@ 0.8 0.6 0.4 0.2 0.2 0.4 0.6 0.8 Fig. 1.4 The Binary Entropy Function, H (X)=- p log2 (p)- (7 - p) log2 (1 - p). Definition 1.6 The Average Conditional Self-Information called the conditional entropy is defined as n m 1 H(X IY) = LLP(xi, y1)log ( ) i=1J=1 P xiiYJ (1.16) The physical interpretation of this definition is as follows. H(XIY) is the information (or uncertainty) in X having observed Y. Based on the definitions of H(X IY) and H( YIX) we can write I(X; Y) = H(X)- H(XIY) = H(Y)- H(YIX). (1.17) We make the following observations. (i) Since /(X; Y) セ@ 0, it implies that H(X) セ@ H(XI Y). Source Coding (ii) The case I (X; Y) = 0 implies that H(X) = H(XI Y), and it is possible if and only if X and Yare statistically independent. (iii) Since H(X IY) is the conditional self-information about X given Y and H(X) is the average uncertainty (self-information) of A'; I(X; Y) is the average uncertainty about xセ。カゥョァ@ observed Y. (iv) Since H(X) セ@ H(X IY), the observation of Y does not increase the entropy (uncertainty). It can only decrease the entropy. That is, observing Y cannot reduce the information about セ@ it can only add to the information. Example 1.7 Consider the BSC discussed in Example 1.3. Let the input symbols be '0' with probability q and '1' with probability 1- q as shown in Fig. 1.5. 1-p Probability 0 セMMMMMMMMMMMMMMMMMNNNN@ 0 q Tx 1-q Channel Fig. 1.5 A Binary Symmetric Channel (BSC) with Input Symbols Probabilities Equal to q and 7 - q. The entropy of this binary source is 1 Rx 1 H(X) =- L P(x;)logP(x;) = -qlog2(q)- (1- q)log2 (1- q) i=O The conditional entropy is given by n m 1 H(XlY)= L L,P(x;,y)log-- i=1 j=1 p(x;IYj) In order to calculate the values ofH(XlY), we can make use of the following equalities P(x;, Y) =P(x; IY) P(y) =P(yj IX;) P(x;) The plot ofH(XIY) versus q is given in Fig. 1.6 withp as the parameter. (1.18) (1.19)
  • 11. Information Theory, Coding and Cryptography H(XJY) Fig. 1.6 The Plot of Conditional Entropy H(XI Y) Versus q. The average mutual information /(X; Y) is given in Fig. 1.7. It can be seen from the plot that as we increase the parameterp from 0 to 0.5, I(X; Y) decreases. Physically it implies that, as we make the channel less reliable (increase the value ofp セ@ 0.5), the mutual information between the random variable X (at the transmitter) and the random variable Y (receiver) decreases. 1.4 INFORMATION MEASURES FOR CONTINUOUS RANDOM VARIABLES The definitions of mutual information for discrete random variables can be directly extended to continuous random variables. Let X and Y be random variables with joint probability density function (pdf} p(x, y) and marginal pdfs p{x) and p(y). The average mutual information between X and Y is defined as follows. Definition 1.7 The average mutual information between two continuous random variables X and Y is defined as I(X:, Y) = j jp(x)p(ylx)log p(ylx)p(x) dxdy ---- p(x)p(y) (1.20) Fig. 1.7 The Plot of the Average Mutua/Information I(X: 'r? Versus q. r I I i Source Coding It should be pointed out that the definition of average mutual information can be carried over from discrete random variables to continuous random variables, but the concept and physical interpretation cannot. The reason is that the information content in a continuous random variable is actually infinite, and we require infinite number of bits to represent a continuous random variable precisely. The self- information and hence the entropy is infinite. To get around the problem we define a quantity called the differential entropy. Definition 1.8 The differential entropy of a continuous random variable X is defined as hHセ@ =-Ip(x)logp(x) (1.21) Again, it should be understood that there is no physical meaning attached to the above quantity. We carry on with extending our definitions further. Definition 1.9 1he Average Conditiona) Entropy of a continuous random variables X given Y is defined as H(XI Y) = I Ip(x, ケIャッァーHクャケIセ、ケ@ (1.22) The average mutual information can be expressed as I(X:, Y)=H(X) -H(XIY)=H(Y) -H(YIX) (1.23) 1.5 SOURCE CODING THEOREM In this section we explore efficient representation (efficient coding) of symbols generated by a source. The primary objective is the compression of data by efficient representation of the symbols. Suppose a discrete memoryless source (DMS) outputs a symbol every t seconds and each symbol is selected from a finite set of symbols xfl i= 1, 2, ..., L, occurring with probabilities P (x;), i = 1, 2, ..., L, the entropy of this DMS in bits per source symbols is L H(X) = L P(x; )log2 P(x;) :5log2 L (1.24) j=! The equality holds when the symbols are equally likely. It means that the average number of bits per source symbol is H(X) and the source rate is H(X)Itbitslsec. Now let us represent the 26 letters in the English alphabet using bits. We observe that 25 = 32 > 26. Hence, each of the letters can be uniquely represented using 5 bits. This is an example of a Fixed Length Code (FLC). Each letter has a corresponding 5 bit long codeword.
  • 12. Information Theory, Coding and Cryptography I Definition 1.10 A code is a set of vectors called codewords. Suppose a DMS outputs a symbol selected from a finite set of symbols xi, i= 1, 2, ..., L. The number of bits R required for unique coding when L is a power of 2 is R = log2 L, (1.25) and, when L is not a power of 2, it is R = Llog2LJ + 1. (1.26) As we saw earlier, to encode the letters of the English alphabet, we need R= Llog226J + 1 = 5 bits. The FLC for the English alphabet suggests that every letter in the alphabet is equally important (probable) and hence each one requires 5 bits for representation. However, we know that some letters are less common (x, q, z etc.) while others are more frequently used (s, t, e etc.). It appears that allotting equal number of bits to both the frequently used letters as well as not so commonly used letters is rwt an efficient way of representation (coding). Intuitively, we should represent the more frequently occurring letters by fewer number of bits and represent the less frequently occurring letters by larger number of bits. In this manner, if we have to encode a whole page of written text, we might end up using fewer number of bits overall. When the source symbols are not equally probable, a more efficient method is to use a Variable Length Code (VLC). Example 1.8 Suppose we have only the frrst eight letters of the English alphabet (A-H) in our vocabulary. The Fixed Length Code (FLC) for this set of letters would be Letter Codeword Letter A 000 E B 001 F c 010 G D 011 H Fixed Length Code A VLC for the same set of letters can be Letter Codeword Letter A 00 E B 010 F c 011 G D 100 H Variable Length Code 1 Codeword 100 101 110 111 Codeword 101 110 1110 1111 f Source Coding Suppose we have to code the series of letters: "A BAD CAB". The fixed lertgth and the variable length representation of the pseudo sentence would be Fixed Length Code 000 001 000 011 010 000 001 I Total bits- 21 Variable Length Code 00 010 00 100 011 00 010 I Total bits セ@ 18 Note that the variable length code uses fewer number of bits simply because the letters appearing more frequently in the pseudo sentence are represented with fewer number of bits. We look at yet another VLC for the frrst 8 letters of the English alphabet: Letter Codeword Letter Codeword A 0 E 10 B 1 F 11 c 00 G 000 D 01 H Ill Variable Length Code 2 This second variable length code appears to be more efficient in terms of representation of the letters. Variable Length Code 1 00 010 00 100 011 00 010 Total bits= 18 Variable Length Code 2 0 1001 0001 Total bits = 9 However there is a problem with VLC2. Consider the sequence of bits here-0 100I 0001 which is used to represent A BAD CAB. We could regroup the bits in a different manner to have [0] [10][0][1] [0][0][01] which translates to A EAB AAD or [0] [1][0][0][1] [0][0][0][1] which stands for A BAAB AAAB ! Obviously there is a problem with the unique decoding of the code. We have no clue where one codeword (symbol) ends and the next one begins, since the lengths of the codewords are variable. However, this problem does not exist with VLCl. Here no codeword forms the prefix ofany other codeword. This is called the prefix condition. As soon as a sequence ofbits corresponding to any one ofthe possible codewords is detected, we can declare that symbol decoded. Such codes called Uniquely Decodable or Instantaneous Codes cause no decoding delay. In this example, the VLC2 is not a uniquely decodable code, hence not a code ofany utility. 
VLC 1 is uniquely decodable, though less economical than VLC 2 in terms of bits per symbol.

Definition 1.11 A Prefix Code is one in which no codeword forms the prefix of any other codeword. Such codes are also called Uniquely Decodable or Instantaneous Codes.

We now proceed to devise a systematic procedure for constructing uniquely decodable, variable length codes that are efficient in terms of the average number of bits per source letter. Let the source output a symbol from a finite set of symbols x_i,
i = 1, 2, ..., L, occurring with probabilities P(x_i), i = 1, 2, ..., L. The average number of bits per source letter is defined as

R̄ = Σ_{k=1}^{L} n(x_k) P(x_k)        (1.27)

where n(x_k) is the length of the codeword for the symbol x_k.

Theorem 1.1 (Kraft Inequality) A necessary and sufficient condition for the existence of a binary code with codewords having lengths n_1 ≤ n_2 ≤ ... ≤ n_L that satisfy the prefix condition is

Σ_{k=1}^{L} 2^{-n_k} ≤ 1        (1.28)

Proof First we prove the sufficient condition. Consider a binary tree of order (depth) n = n_L. This tree has 2^{n_L} terminal nodes, as depicted in Fig. 1.8. Let us select any node of order n_1 as the first codeword c_1. Since no codeword is the prefix of any other codeword (the prefix condition), this choice eliminates 2^{n - n_1} terminal nodes. This process continues until the last codeword is assigned at a terminal node of order n = n_L. Consider a node of order j < L. The fraction of terminal nodes eliminated is

Σ_{k=1}^{j} 2^{-n_k} < Σ_{k=1}^{L} 2^{-n_k} ≤ 1.        (1.29)

Thus, we can construct a prefix code that is embedded in the full tree of depth n_L. The nodes that are eliminated are depicted by the dotted arrow lines leading to them in the figure.

Fig. 1.8 A binary tree of order n_L.

We now prove the necessary condition. We observe that in the code tree of order n = n_L, the number of terminal nodes eliminated from the total of 2^n terminal nodes is

Σ_{k=1}^{L} 2^{n - n_k} ≤ 2^n.        (1.30)

This leads to

Σ_{k=1}^{L} 2^{-n_k} ≤ 1.        (1.31)

Example 1.9 Consider the construction of a prefix code using a binary tree.

Fig. 1.9 Constructing a binary prefix code using a binary tree, with nodes n_0, n_00, n_01, n_010, n_011, n_0110 and n_0111.

We start from the mother node and proceed toward the terminal nodes of the binary tree (Fig. 1.9). Let the mother node be labelled '0' (it could have been labelled '1' as well). Each node gives rise to two branches (binary tree). Let us label the upper branch '0' and the lower branch '1' (these labels could also have been interchanged). First we follow the upper branch from the mother node. We obtain our first codeword c_1 = 0, terminating at node n_00. Since we want to construct a prefix code in which no codeword is a prefix of any other codeword, we must discard all the daughter nodes generated from the node labelled c_1. Next, we proceed on the lower branch from the mother node and reach the node n_01. We proceed along the upper branch first and reach node n_010. We label this as the codeword c_2 = 10 (the labels of the branches that lead up to this node travelling from the mother node). Following the lower branch from the node n_01, we ultimately reach the terminal nodes n_0110 and n_0111, which correspond to the codewords c_3 = 110 and c_4 = 111 respectively. Thus the binary tree has given us four prefix codewords: {0, 10, 110, 111}. By construction, this is a prefix code. For this code

Σ_{k=1}^{L} 2^{-n_k} = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 0.5 + 0.25 + 0.125 + 0.125 = 1.

Thus the Kraft inequality is satisfied. We now state and prove the noiseless Source Coding Theorem, which applies to codes that satisfy the prefix condition.
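The Kraft sum itself takes one line to check numerically. This small sketch (illustrative helper name) verifies the code {0, 10, 110, 111} just constructed and shows a set of lengths that no binary prefix code can achieve.

```python
def kraft_sum(lengths):
    """Sum of 2**(-n_k) over the codeword lengths (Eq. 1.28)."""
    return sum(2.0 ** -n for n in lengths)

print(kraft_sum([1, 2, 3, 3]))      # 1.0   -> the code {0, 10, 110, 111} is admissible
print(kraft_sum([1, 1, 2]) <= 1)    # False -> no binary prefix code has these lengths
```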
Theorem 1.2 (Source Coding Theorem) Let X be the set of letters from a DMS with finite entropy H(X) and output symbols x_k, k = 1, 2, ..., L, occurring with probabilities P(x_k), k = 1, 2, ..., L. Given these parameters, it is possible to construct a code that satisfies the prefix condition and has an average length R̄ that satisfies the inequality

H(X) ≤ R̄ < H(X) + 1        (1.32)

Proof First consider the lower bound of the inequality. For codewords of length n_k, 1 ≤ k ≤ L, the difference H(X) - R̄ can be expressed as

H(X) - R̄ = Σ_{k=1}^{L} p_k log2 (1/p_k) - Σ_{k=1}^{L} p_k n_k = Σ_{k=1}^{L} p_k log2 ( 2^{-n_k} / p_k ).

We now make use of the inequality ln x ≤ x - 1 to get

H(X) - R̄ ≤ (log2 e) Σ_{k=1}^{L} p_k ( 2^{-n_k}/p_k - 1 ) = (log2 e) ( Σ_{k=1}^{L} 2^{-n_k} - 1 ) ≤ 0.

The last inequality follows from the Kraft inequality. Equality holds if and only if p_k = 2^{-n_k} for 1 ≤ k ≤ L. Thus the lower bound is proved.

Next, we prove the upper bound. Let us select the codeword lengths n_k such that 2^{-n_k} ≤ p_k < 2^{-n_k + 1}. First consider 2^{-n_k} ≤ p_k. Summing both sides over 1 ≤ k ≤ L gives us

Σ_{k=1}^{L} 2^{-n_k} ≤ Σ_{k=1}^{L} p_k = 1,

which is the Kraft inequality, for which there exists a code satisfying the prefix condition. Next consider p_k < 2^{-n_k + 1}. Taking the logarithm of both sides gives log2 p_k < -n_k + 1, or n_k < 1 - log2 p_k. On multiplying both sides by p_k and summing over 1 ≤ k ≤ L we obtain

Σ_{k=1}^{L} p_k n_k < Σ_{k=1}^{L} p_k + Σ_{k=1}^{L} ( -p_k log2 p_k ),

or R̄ < H(X) + 1. Thus the upper bound is proved.

The Source Coding Theorem tells us that for any prefix code used to represent the symbols from a source, the minimum number of bits required to represent the source symbols, on an average, must be at least equal to the entropy of the source. If we have found a prefix code that satisfies R̄ = H(X) for a certain source X, we must abandon further search because we cannot do any better. The theorem also tells us that a source with higher entropy (uncertainty) requires, on an average, more bits to represent its symbols in terms of a prefix code.

Definition 1.12 The efficiency of a prefix code is defined as

η = H(X) / R̄        (1.33)

It is clear from the Source Coding Theorem that the efficiency of a prefix code satisfies η ≤ 1. Efficient representation of symbols leads to compression of data. Source coding is primarily used for compression of data (and images).

Example 1.10 Consider a source X which generates four symbols with probabilities p_1 = 0.5, p_2 = 0.3, p_3 = 0.1 and p_4 = 0.1. The entropy of this source is

H(X) = -Σ_{k=1}^{4} p_k log2 p_k = 1.685 bits.

Suppose we use the prefix code {0, 10, 110, 111} constructed in Example 1.9. Then the average codeword length R̄ is given by

R̄ = Σ_{k=1}^{4} n(x_k) P(x_k) = 1(0.5) + 2(0.3) + 3(0.1) + 3(0.1) = 1.700 bits.

Thus we have H(X) ≤ R̄ ≤ H(X) + 1. The efficiency of this code is η = 1.685/1.700 = 0.9912. Had the source symbol probabilities been p_k = 2^{-n_k}, i.e., p_1 = 2^{-1} = 0.5, p_2 = 2^{-2} = 0.25, p_3 = 2^{-3} = 0.125 and p_4 = 2^{-3} = 0.125, the average codeword length would have been R̄ = 1.750 bits = H(X). In this case, η = 1.

1.6 HUFFMAN CODING

We will now study an algorithm for constructing efficient source codes for a DMS with source symbols that are not equally probable. A variable length encoding algorithm was suggested by Huffman in 1952, based on the source symbol probabilities P(x_i), i = 1, 2, ..., L. The algorithm is optimal in the sense that the average number of bits it requires to represent the source symbols
is a minimum, and also meets the prefix condition. The steps of the Huffman coding algorithm are given below:

(i) Arrange the source symbols in decreasing order of their probabilities.
(ii) Take the bottom two symbols and tie them together as shown in Fig. 1.10. Add the probabilities of the two symbols and write the sum on the combined node. Label the two branches with a '1' and a '0'.

Fig. 1.10 Combining the two smallest probabilities, p_{n-1} and p_n, into a new node with probability p_{n-1} + p_n in Huffman coding.

(iii) Treat this sum of probabilities as a new probability associated with a new symbol. Again pick the two smallest probabilities and tie them together to form a new probability. Each time we combine two symbols we reduce the total number of symbols by one. Whenever we tie together two probabilities (nodes), we label the two branches with a '1' and a '0'.
(iv) Continue the procedure until only one probability is left (and it should be 1 if the additions are correct!). This completes the construction of the Huffman tree.
(v) To find the prefix codeword for any symbol, follow the branches from the final node back to the symbol. While tracing back the route, read out the labels on the branches. This is the codeword for the symbol.

The algorithm can be easily understood using the following example.

Example 1.11 Consider a DMS with seven possible symbols x_i, i = 1, 2, ..., 7 and the corresponding probabilities p_1 = 0.37, p_2 = 0.33, p_3 = 0.16, p_4 = 0.07, p_5 = 0.04, p_6 = 0.02 and p_7 = 0.01. We first arrange the probabilities in decreasing order and then construct the Huffman tree as in Fig. 1.11.

    Symbol    Probability    Self-Information    Codeword
    x1        0.37           1.4344              0
    x2        0.33           1.5995              10
    x3        0.16           2.6439              110
    x4        0.07           3.8365              1110
    x5        0.04           4.6439              11110
    x6        0.02           5.6439              111110
    x7        0.01           6.6439              111111

Fig. 1.11 Huffman coding for Example 1.11.

To find the codeword for any particular symbol, we just trace back the route from the final node to the symbol. For the sake of illustration, the route for the symbol x4 (probability 0.07) is shown with a dotted line in the figure; reading out the labels of the branches on the way gives the codeword 1110. The entropy of the source is found to be

H(X) = -Σ_{k=1}^{7} p_k log2 p_k = 2.1152 bits,

and the average number of binary digits per symbol is calculated to be

R̄ = Σ_{k=1}^{7} n(x_k) P(x_k) = 1(0.37) + 2(0.33) + 3(0.16) + 4(0.07) + 5(0.04) + 6(0.02) + 6(0.01) = 2.1700 bits.

The efficiency of this code is η = 2.1152/2.1700 = 0.9747.

Example 1.12 This example shows that Huffman coding is not unique. Consider a DMS with seven possible symbols x_i, i = 1, 2, ..., 7 and the corresponding probabilities p_1 = 0.46, p_2 = 0.30, p_3 = 0.12, p_4 = 0.06, p_5 = 0.03, p_6 = 0.02 and p_7 = 0.01.

    Symbol    Probability    Self-Information    Codeword
    x1        0.46           1.1203              1
    x2        0.30           1.7370              00
    x3        0.12           3.0589              010
    x4        0.06           4.0589              0110
    x5        0.03           5.0589              01110
    x6        0.02           5.6439              011110
    x7        0.01           6.6439              011111
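A compact Huffman implementation reproduces the average codeword lengths of Examples 1.11 and 1.12. The sketch below uses Python's heapq module; because of ties during the merging steps, the individual codewords it produces may differ from the tables above, but the average length, and hence the efficiency, is the same.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: codeword} for a binary Huffman code.
    probs: dict mapping each symbol to its probability."""
    tiebreak = count()                        # avoids comparing dicts on equal probability
    heap = [(p, next(tiebreak), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)       # the two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {'x1': 0.37, 'x2': 0.33, 'x3': 0.16, 'x4': 0.07,
         'x5': 0.04, 'x6': 0.02, 'x7': 0.01}             # source of Example 1.11
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(round(avg_len, 4))                                  # 2.17 bits/symbol
```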
  • 16. Information Theory, Coding and Cryptography x1 0.46 x2 0.30 0 I X3 0.12 0 0.54 X4 0.06 Xs 0.03 0 0.24 0 1 0.12 1 Xs 0.02 X7 0.01 0 0.06 x6 ' I 0.03 1 I 1 Fig. 1.12 Huffman Coding for Example 7. 7 2. The entropy of the source is found out to be 7 H(X) =-IPk log2 Pk =1.9781 bits, k=l I and the average number of binary digits per symbol is calculated to be 7 R = I n(xk )P(xk) k=l 0 セ@ 1 = 1(0.46) + 2(0.30) + 3(0.12) + 4(0.06) + 5(0.03) + 6(0.02) + 6(0.01) = 1.9900 bits. The efficiency of this code is 11 =(1.978111.9900) =0.9940. We shall now see that Huffman coding is not unique. Consider the combination ofthe two smallest probabilities (symbols x6 。ョ、セIN@ Their sum is equal to 0.03, which is equal to the next higher probability corresponding to the symbol x5. So, for the second step, we may choose to put this combined probability (belonging to, say, symbol xt;') higher than, or lower than, the symbol x5• Suppose we put the combined probability at a lower level. We proceed further, to again find the combination ofx6' and x5 yields the probability 0.06, which is equal to that ofsymbolx4• We again have a choice whether to put the combined probability higher than, or lower than, the symbol x4• Each time we make a choice (or flip a fair coin) we end up changing the final codeword for the symbols. In Fig. 1.13, each time we have to make a choice between two probabilities that are equal, we put the probability of the combined symbols at a higher level. Source Coding X1 0.46 x2 0.30 0.54 X3 0.12 0.24 X4 0.06 0.12 xs 0.03 0 0.06 x6 0.02 0.03 X7 0.01 1 Fig. 1.13 Alternative way of Huffman Coding in Example 7. 7 2 which Leads to a Different Code. Symbol xl x2 x3 x4 Xs x6 X? The entropy of the source is 7 Probability 0.46 0.30 0.12 0.06 0.03 0.02 0.01 SelfInformation 1.1203 1.7370 3.0589 4.0589 5.0589 5.6439 6.6439 H(X) =-IPk log2 Pk =1.9781 bits, k=l and the average number of bits per symbol is 7 R = In(xk)P(xk) k=l Codeword 1 00 011 0101 01001 010000 010001 = 1(0.46) + 2(0.30) + 3(0.12) + 4(0.06) + 5(0.03) + 6(0.02) + 6(0.01) = 1.9900 bits. The efficiency of this code is 71 = (1.978111.9900) = 0.9940. Thus both codes are equally efficient. In the above examples, encoding is done symbol by symbol. A more efficient procedure is to encode blocks of B symbols at a time, In this case the bounds of the source coding theorem becomes BH(X) :::; RB < BH(X) + 1 I
since the entropy of a B-symbol block is simply BH(X), and R̄_B is the average number of bits per B-symbol block. We can rewrite the bound as

H(X) ≤ R̄_B / B < H(X) + 1/B        (1.34)

where R̄_B / B = R̄ is the average number of bits per source symbol. Thus, R̄ can be made arbitrarily close to H(X) by selecting a large enough block size B.

Example 1.13 Consider the source symbols and their respective probabilities listed below.

    Symbol    Probability    Self-Information    Codeword
    x1        0.40           1.3219              1
    x2        0.35           1.5146              00
    x3        0.25           2.0000              01

For this code, the entropy of the source is

H(X) = -Σ_{k=1}^{3} p_k log2 p_k = 1.5589 bits.

The average number of binary digits per symbol is

R̄ = Σ_{k=1}^{3} n(x_k) P(x_k) = 1(0.40) + 2(0.35) + 2(0.25) = 1.60 bits,

and the efficiency of this code is η = 1.5589/1.6000 = 0.9743.

We now group together the symbols, two at a time, and again apply the Huffman encoding algorithm. The probabilities of the symbol pairs, in decreasing order, are listed below.

    Symbol Pair    Probability    Self-Information    Codeword
    x1x1           0.1600         2.6439              10
    x1x2           0.1400         2.8365              001
    x2x1           0.1400         2.8365              010
    x2x2           0.1225         3.0291              011
    x1x3           0.1000         3.3219              111
    x3x1           0.1000         3.3219              0000
    x2x3           0.0875         3.5146              0001
    x3x2           0.0875         3.5146              1100
    x3x3           0.0625         4.0000              1101

For this code, the entropy of the pair source is

2H(X) = -Σ_{k=1}^{9} p_k log2 p_k = 3.1177 bits, which gives H(X) = 1.5589 bits.

Note that the source entropy has not changed! The average number of bits per block (symbol pair) is

R̄_B = Σ_{k=1}^{9} n(x_k) P(x_k)
    = 2(0.1600) + 3(0.1400) + 3(0.1400) + 3(0.1225) + 3(0.1000) + 4(0.1000) + 4(0.0875) + 4(0.0875) + 4(0.0625)
    = 3.1775 bits per symbol pair, which gives R̄ = 3.1775/2 = 1.5888 bits per symbol,

and the efficiency of this code is η = 1.5589/1.5888 = 0.9812. Thus we see that grouping two letters into one symbol has improved the coding efficiency.

Example 1.14 Consider the source symbols and their respective probabilities listed below.

    Symbol    Probability    Self-Information    Codeword
    x1        0.50           1.0000              1
    x2        0.30           1.7370              00
    x3        0.20           2.3219              01

For this code, the entropy of the source is

H(X) = -Σ_{k=1}^{3} p_k log2 p_k = 1.4855 bits.

The average number of bits per symbol is

R̄ = Σ_{k=1}^{3} n(x_k) P(x_k) = 1(0.50) + 2(0.30) + 2(0.20) = 1.50 bits,

and the efficiency of this code is η = 1.4855/1.5000 = 0.9903. We now group together the symbols, two at a time, and again apply the Huffman encoding algorithm. The probabilities of the symbol pairs, in decreasing order, are listed as follows.
  • 18. Information Theory, Coding and Cryptography Symbol Pairs Probability SelfInformation xlxl 0.25 2.0000 x1 x2 0.15 2.7370 XzX! 0.15 2.7370 xlx3 0.10 3.3219 x3xl 0.10 3.3219 XzXz 0.09 3.4739 XzX3 0.06 4.0589 x3x2 0.06 4.0589 クセクセ@ 0.04 4.6439 For this code, the entropy is 9 2H(X) = - L,Pk 1og2 Pk = 2.9710 bits, k=! セ@ H(X) = 1.4855 bits. The average number of bits per block (symbol pair) is 9 RB = L n(xk )P(xk) k=! Codeword 00 010 011 100 110 1010 1011 1110 1111 = 2(0.25) + 3(0.15) + 3(0.15) + 3(0.10) + 3(0.10) + 4(0.09) + 4(0.06) + 4(0.06) + 4(0.04) = 3.00 bits per symbol pair. セ@ ii = 3.00/2 = 1.5000 bits per symbol. and the efficiency of this code is ry2 = (1.4855 /1.5000) = 0.9903. In this case, grouping together two letters at a time has not increased the efficiency ofthe code! However, if we group 3 letters at a time (triplets) and then apply Huffman coding, we obtain the code efficiency as ry3 = 0.9932. Upon grouping four letters at a time we see a further improvement (TJ4 = 0.9946). 1.7 THE LEMPEL-ZIV ALGORITHM Huffman coding requires the symbol probabilities. But most real life scenarios do not provide the symbol probabilities in advance (i.e., the statistics of the source is unknown). In principle, it is possible to observe the output of the source for a long enough time period and estimate the symbol probabilities. However, this is impractical for real-time application. Also, while Huffman coding is optimal for a DMS source where the occurrence of one symbol does not alter the probabilities of the subsequent symbols, it is not the best choice for a source with r i sセイ」・@ Coding memory. For example, consider the problem of compression of written text. We know that many letters occur in pairs or groups, like 'q-u', 't-h', 'i-n-g' etc. It would be more efficient to use the statistical inter-dependence of the letters in the alphabet along with their individual probabilities of occurrence. Such a scheme was proposed by Lempel and Ziv in 1977. Their source coding algorithm does not need the source statistics. It is a Variable-to-Fixed Length Source Coding Algorithm and belongs to セ・@ class of universal source coding algorithms. The logic behind Lempel-Ziv universal coding is as follows. The compression of an arbitrary sequence of bits is possible by coding a series of O's and 1's as some previous such string (the prefix string) plus one new bit (called innovation bit). Then, the new string formed by adding the new bit to the previously used prefix string becomes a potential prefix string for· future strings. These variable length blocks are called phrases. The phrases are listed in a dictionary which stores the existing phrases and their locations. In encoding a new phrase, we specify the location of the existing phrase in the dictionary and append the new letter. We can derive a better understanding of how the Lempel-Ziv algorithm works by the following example. Example 1.15 Suppose we wish to code the string: 101011011010101011. We will begin by parsing it into comma-separated phrases that represent strings that can be represented by a previous string as a prefix, plus a bit. · The first bit, a 1, has no predecessors, so, it has a null prefix string and the one extra bit is itself: 1, 01011011010101011 The same goes for the 0 that follows since it can't be expressed in terms ofthe only existing prefix: 1, 0, 1011011010101011 So far our dictionary contains the strings' 1' and '0'. Next we encounter a 1, but it already exists in our dictionary. Hence we proceed further. 
The following 10 is obviously a combination of the prefix 1 and a 0, so we now have: 1, 0, 10, 11011010101011 Continuing in this way we eventually parse the whole string as follows: 1, 0, 10, 11, 01, 101, 010, 1011 Now, since we found 8 phrases, we will use a three bit code to label the null phrase and the first seven phrases for a total of 8 numbered phrases. Next, we write the string in terms ofthe number of the prefix phrase plus the new bit needed to create the new phrase. We will use parentheses and commas to separate these at first, in order to aid our visualization ofthe process. The eight phrases can be described by: (000,1)(000,0),(001,0),(001'1),(010,1),(011,1),(101,0),(110,1).
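The parsing just described is easy to mechanize. The following Python sketch (an illustration of the idea, not the exact encoder of any particular standard) reproduces the eight (prefix location, new bit) pairs for the string of Example 1.15; location 0 stands for the null phrase, and a trailing incomplete phrase, if any, is simply dropped here.

```python
def lz_parse(bits):
    """Parse a bit string into Lempel-Ziv phrases (prefix index, innovation bit)."""
    dictionary = {'': 0}              # phrase -> dictionary location, 0 = null phrase
    phrases, current = [], ''
    for b in bits:
        if current + b in dictionary:
            current += b              # keep extending a phrase seen before
        else:
            phrases.append((dictionary[current], b))
            dictionary[current + b] = len(dictionary)
            current = ''
    return phrases

print(lz_parse("101011011010101011"))
# [(0,'1'), (0,'0'), (1,'0'), (1,'1'), (2,'1'), (3,'1'), (5,'0'), (6,'1')]
```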
Each pair can be read out as: (codeword at location 0, 1), (codeword at location 0, 0), (codeword at location 1, 0), (codeword at location 1, 1), (codeword at location 2, 1), (codeword at location 3, 1), and so on. Thus the coded version of the above string is

00010000001000110101011110101101.

The dictionary for this example is given in Table 1.1. In this case we have not obtained any compression; our coded string is actually longer! However, the larger the initial string, the more saving we get as we move along, because prefixes that are quite large become representable as small numerical indices. In fact, Ziv proved that for long documents the compression of the file approaches the optimum obtainable as determined by the information content of the document.

Table 1.1 Dictionary for the Lempel-Ziv algorithm

    Dictionary Location    Dictionary Content    Fixed Length Codeword
    001                    1                     0001
    010                    0                     0000
    011                    10                    0010
    100                    11                    0011
    101                    01                    0101
    110                    101                   0111
    111                    010                   1010
    --                     1011                  1101

(The last phrase, 1011, needs no 3-bit location of its own, since no later phrase refers to it.)

The next question is: what should be the length of the table? In practical applications, regardless of the length of the table, it will eventually overflow. This problem can be solved by pre-deciding a large enough size of the dictionary. The encoder and decoder can then update their dictionaries by periodically substituting the less used phrases with more frequently used ones.

The Lempel-Ziv algorithm is widely used in practice. The compress and uncompress utilities of the UNIX operating system use a modified version of this algorithm. The standard algorithms for compressing binary files use codewords of 12 bits and transmit 1 extra bit to indicate a new sequence. Using such a code, the Lempel-Ziv algorithm can compress transmissions of English text by about 55 per cent, whereas the Huffman code compresses the transmission by only 43 per cent.

In the following section we will study another type of source coding scheme, particularly useful for facsimile transmission and image compression.

1.8 RUN LENGTH ENCODING AND THE PCX FORMAT

Run-Length Encoding, or RLE, is a technique used to reduce the size of a repeating string of characters. This repeating string is called a run. Typically RLE encodes a run of symbols into two bytes, a count and a symbol. RLE can compress any type of data regardless of its information content, but the content of the data to be compressed affects the compression ratio. RLE cannot achieve the high compression ratios of other compression methods, but it is easy to implement and quick to execute. RLE is supported by most bitmap file formats, such as TIFF, BMP and PCX.

Example 1.16 Consider the following bit stream:

111111111111111 0000000000000000000 1111    (15 ones, 19 zeros, 4 ones)

This can be represented as fifteen 1's, nineteen 0's, four 1's, i.e., (15, 1), (19, 0), (4, 1). Since the maximum run length is 19, which can be represented with 5 bits, we can encode the bit stream as (01111, 1), (10011, 0), (00100, 1). The compression ratio in this case is 18:38 = 1:2.11.

RLE is highly suitable for FAX images of typical office documents. These two-colour (black and white) images are predominantly white. If we spatially sample these images for conversion into digital data, we find that many horizontal lines are entirely white (long runs of 0's). Furthermore, if a given pixel is black or white, the chances are very good that the next pixel will match. The code for fax machines is actually a combination of a run-length code and a Huffman code.
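Before looking at the fax code in more detail, the run-length pairs of Example 1.16 can be reproduced with a few lines of Python (a sketch; the 5-bit count field is chosen here only because the longest run in the example is 19).

```python
from itertools import groupby

def run_length_encode(bits):
    """Return (run length, bit) pairs for a binary string."""
    return [(len(list(g)), b) for b, g in groupby(bits)]

stream = "1" * 15 + "0" * 19 + "1" * 4          # the bit stream of Example 1.16
runs = run_length_encode(stream)
print(runs)                                     # [(15, '1'), (19, '0'), (4, '1')]
print(len(runs) * (5 + 1), "vs", len(stream))   # 18 coded bits vs 38 original bits
```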
A run-length code maps run lengths into code words, and the codebook is partitioned into two parts. The first part contains symbols for runs of lengths that are a multiple of 64; the second part is made up of runs from 0 to 63 pixels. Any run length would then be represented as a multiple of 64 plus some remainder. For example, a run of 205 pixels would be sent using the code word for a run of length 192 (3 x 64) plus the code word for a run of length 13. In this way the number of bits ョセ・、・、@ to represent the run is decreased significantly. In addition, certain runs that are known to have a higher probability of occurrence are encoded into code words of short length, further reducing the number of bits that need to be transmitted. Using this type of encoding, typical compressions for facsimile transmission range between 4 to 1 and 8 to 1. Coupled to higher modem speeds, these compressions reduce the transmission time of a single page to less than a minute. Run length coding is also used for the compression of images in the PCX formaL The PCX format was introduced as part of the PC Paintbrush series of software for image painting and editing, sold by the ZSoft company. Today, the PCX format is actually an umbrella name for several image compression methods and a means to identify which has been applied. We will restrict our attention here to only one of the methods, for 256-colour images. We will restrict ourselves to that portion of the PCX data stream that actually contains the coded image, and not those parts that store the colour palette and image information such as number of lines, pixels per line, file and the coding method. The basic scheme is as follows. If a string of pixels are identical in colour value, encode them as a special flag byte which contains the count followed by a byte with the value of the repeated pixel. If the pixel is not repeated, simply encode it as the byte itself. Such simple schemes can
  • 20. Information Theory, Coding and Cryptography often become more complicated in practice. Consider that in the above scheme, if all 256 colours in a palette are used in an image, then, we need all 256 values of a byte to represent those colours. Hence, if we are going to use just bytes as our basic code unit, we don't have any possible unused byte values that can be used as a flag/count byte. On. the. other ィ。ョセL@ if we use two bytes for every coded pixel to leave room for the flag/count combmations, we mtght double the size of pathological images instead of compressing them. The compromise in the PCX format is based on the belief of its designers than many user- created drawings (which was the primary intended output of their software) would not use all 256 colours. So, they optimized their compression scheme for the case of up to 192 colors only. Images with more colours will also probably get good compression, just not quite as good, with this scheme. Example 1.17 PCX compression encodes single occurrences of colour (that is, a pixel that is not part of a run of the same colour) 0 through 191 simply as the binary byte representation of exactly that numerical value. Consider Table 1.2. Table 1.2 Example of PCX encoding P1xel color value Hex code Binary code 0 1 2 3 190 191 00 01 02 03 BE BF 00000000 00000001 00000010 00000011 10111110 10111111 Forthe colour 192 (and all the colours higher than 192), the codeword is equal to one byte in which the two most significant bits (MSBs) are both set to a 1. We will use these codewords to signify a flag and count byte. Ifthe two MSBs are equal to one, we will say that they have flagged a count. The remaining 6 bits in the flag/count byte will be interpreted as a 6 bit binary number for the count (from 0 to 63). This byte is then followed by the byte which represents the colour. In fact, ifwe have a run ofpixels of one ofthe colours with palette code even over 191, we can still code the run easily since the top two bits are not reserved in this second, colour code byte of a run coding byte pair. If a run of pixels exceeds 63 in length, we simply use this code for the first 63 pixels in the run and that code additional runs of that pixel until we exhaust all pixels in the run. The next question is: how do we code those remaining colours in a nearly full palette image when there is no run? We still code these as a run by simply setting the run length to 1. That means, for the case ofat most 64 colours which appear as single pixels in the image and not part of runs, we expand the data by a factor of two. Luckily this rarely happens! Source Coding In the next section, we will study coding for analog sources. Recall that we ideally need infinite number of bits to accurately represent an analog source. Anything fewer will only be an approximate representation. We can choose to use fewer and fewer bits for representation at the cost of a poorer approximation of the original signal. Thus, quantization of the amplitudes of the sampled signals results in data compression. We would like to study the distortion introduced when the samples from the information source are quantized. 1.9 RATE DISTORTION FUNCTION Although we live in an analog world, most of the communication takes place in digital form. Since most natural sources (e.g. speech, video etc.) are analog, they are first sampled, quantized and then processed. Consider an analog message waveform x (t) which is a sample waveform of a stochastic process X(t). 
Assuming X(t) is a bandlimited, stationary process, it can be represented by a sequence of uniform samples taken at the Nyquist rate. These samples are quantized in amplitude and encoded as a sequence of bits. A simple encoding strategy is to define L levels and encode every sample using R = log2 L bits if L is a power of 2, or R = ⌊log2 L⌋ + 1 bits if L is not a power of 2. If all levels are not equally probable, we may use entropy coding for a more efficient representation. In order to represent the analog waveform more accurately, we need more levels, which implies more bits per sample. Theoretically, we need an infinite number of bits per sample to represent an analog source perfectly. Quantization of amplitude thus results in data compression at the cost of signal integrity. It is a form of lossy data compression in which the distortion is some measure of the difference between the actual source samples {x_k} and the corresponding quantized values {x̂_k}.

Definition 1.13 The squared-error distortion is defined as

d(x_k, x̂_k) = (x_k - x̂_k)².

In general, a distortion measure may be represented as

d(x_k, x̂_k) = |x_k - x̂_k|^p.

Consider a sequence of n samples X_n and the corresponding n quantized values X̂_n. Let d(x_k, x̂_k) be the distortion measure per sample (letter). Then the distortion measure between the original sequence and the sequence of quantized values is simply the average over the n source output samples, i.e.,

d(X_n, X̂_n) = (1/n) Σ_{k=1}^{n} d(x_k, x̂_k).

We observe that the source output is a random process, hence X_n and consequently d(X_n, X̂_n) are random variables. We now define the distortion as follows.
Definition 1.14 The distortion between a sequence of n samples X_n and their corresponding n quantized values X̂_n is defined as

D = E[ d(X_n, X̂_n) ] = (1/n) Σ_{k=1}^{n} E[ d(x_k, x̂_k) ] = E[ d(x_k, x̂_k) ].

It has been assumed here that the random process is stationary.

Next, let a memoryless source have a continuous output X and a quantized output alphabet X̂. Let the probability density function of this continuous amplitude be p(x) and the per letter distortion measure be d(x, x̂), where x ∈ X and x̂ ∈ X̂. We now introduce the rate distortion function, which gives us the minimum number of bits per sample required to represent the source output symbols for a prespecified allowable distortion.

Definition 1.15 The minimum rate (in bits/source output) required to represent the output X of the memoryless source with a distortion less than or equal to D is called the rate distortion function R(D), defined as

R(D) = min_{ p(x̂|x) : E[d(X, X̂)] ≤ D }  I(X; X̂)

where I(X; X̂) is the average mutual information between X and X̂.

We will now state (without proof) two theorems related to the rate distortion function.

Theorem 1.3 The minimum information rate necessary to represent the output of a discrete time, continuous amplitude, memoryless Gaussian source with variance σ_x², based on a mean square error distortion measure per symbol, is

R_g(D) = (1/2) log2( σ_x² / D )   for 0 ≤ D ≤ σ_x²,
R_g(D) = 0                        for D > σ_x².

Consider the two cases:
(i) D ≥ σ_x²: In this case there is no need to transfer any information. For the reconstruction of the samples (with distortion greater than or equal to the variance) one can simply use statistically independent, zero mean Gaussian noise samples with variance σ_x².
(ii) D < σ_x²: In this case the number of bits per output symbol decreases monotonically as D increases. The plot of the rate distortion function is given in Fig. 1.14.

Fig. 1.14 Plot of R_g(D) versus D/σ_x².

Theorem 1.4 There exists an encoding scheme that maps the source output into codewords such that, for any given distortion D, the minimum rate R(D) bits per sample is sufficient to reconstruct the source output with an average distortion that is arbitrarily close to D.

Thus, the rate distortion function of any source gives the lower bound on the source rate that is possible for a given level of distortion.

Definition 1.16 The distortion rate function for a discrete time, memoryless Gaussian source is defined as

D_g(R) = 2^{-2R} σ_x².

Example 1.18 For a discrete time, memoryless Gaussian source, the distortion (in dB) as a function of its variance can be expressed as

10 log10 D_g(R) = -6R + 10 log10 σ_x².

Thus the mean square distortion decreases at a rate of 6 dB/bit. The rate distortion function of a discrete time, memoryless, continuous amplitude source with zero mean and finite variance σ_x², with respect to the mean square error distortion measure D, is upper bounded as

R(D) ≤ (1/2) log2( σ_x² / D ).

This upper bound can be intuitively understood as follows. We know that, for a given variance, the zero mean Gaussian random variable exhibits the maximum differential entropy attainable by any random variable. Hence, for a given distortion, the minimum number of bits per sample required for any such source is upper bounded by that required for the Gaussian random variable.
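The two Gaussian formulas above are straightforward to evaluate. The following sketch (illustrative function names) computes R_g(D) and D_g(R) and confirms the 6 dB-per-bit behaviour noted in Example 1.18.

```python
import math

def rate_distortion_gaussian(D, var):
    """R_g(D) for a memoryless Gaussian source under squared-error distortion."""
    return 0.0 if D >= var else 0.5 * math.log2(var / D)

def distortion_rate_gaussian(R, var):
    """D_g(R) = 2**(-2R) * variance."""
    return 2.0 ** (-2 * R) * var

var = 1.0
print(rate_distortion_gaussian(0.25, var))                 # 1.0 bit/sample
print(10 * math.log10(distortion_rate_gaussian(1, var)))   # about -6 dB
print(10 * math.log10(distortion_rate_gaussian(2, var)))   # about -12 dB: 6 dB per extra bit
```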
The next obvious question is: what would be a good design for a quantizer? Is there a way to construct a quantizer that minimizes the distortion without using too many bits? We shall find the answers to these questions in the next section.

1.10 OPTIMUM QUANTIZER DESIGN

In this section we look at optimum quantizer design. Consider a continuous amplitude signal whose amplitude is not uniformly distributed, but varies according to a certain probability density function p(x). We wish to design the optimum scalar quantizer that minimizes some function of the quantization error q = x̂ - x, where x̂ is the quantized value of x. The distortion resulting from the quantization can be expressed as

D = ∫ f(x̂ - x) p(x) dx,

where f(x̂ - x) is the desired function of the error. An optimum quantizer is one that minimizes D by optimally selecting the output levels and the corresponding input range of each output level. The resulting optimum quantizer is called the Lloyd-Max quantizer.

For an L-level quantizer with decision thresholds {x_k} and output levels {x̂_k}, the distortion is given by

D = Σ_{k=1}^{L} ∫_{x_{k-1}}^{x_k} f(x̂_k - x) p(x) dx.

The necessary conditions for minimum distortion are obtained by differentiating D with respect to {x_k} and {x̂_k}. As a result of the differentiation process we end up with the following system of equations:

f(x̂_k - x_k) = f(x̂_{k+1} - x_k),        k = 1, 2, ..., L - 1,
∫_{x_{k-1}}^{x_k} f'(x̂_k - x) p(x) dx = 0,        k = 1, 2, ..., L.

For f(x) = x², i.e., the mean square value of the distortion, the above equations simplify to

x_k = (x̂_k + x̂_{k+1}) / 2,        k = 1, 2, ..., L - 1,
∫_{x_{k-1}}^{x_k} (x̂_k - x) p(x) dx = 0,        k = 1, 2, ..., L.

That is, each threshold lies midway between the two adjacent output levels, and each output level is the centroid of its decision region.

Nonuniform quantizers of this kind are optimized with respect to the distortion. However, each quantized sample is represented by an equal number of bits (say, R bits/sample). It is possible to do better with a VLC: the discrete source outputs that result from quantization are characterized by a set of probabilities p_k, and these probabilities can then be used to design an efficient VLC (source coding). In order to compare the performance of different nonuniform quantizers, we first fix the distortion D and then compare the average number of bits required per sample.
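The two Lloyd-Max conditions suggest a simple fixed-point iteration: alternately place each threshold midway between adjacent output levels and move each output level to the centroid of its decision region. The sketch below does this numerically on a discretized Gaussian pdf (the grid and iteration count are arbitrary assumptions, not part of the text); its output levels come out close to those listed in Table 1.3 of the example that follows (approximately ±0.245, ±0.756, ±1.344, ±2.152).

```python
import numpy as np

def lloyd_max(grid, pdf, levels, iters=200):
    """Iterate the Lloyd-Max conditions for a scalar quantizer on a discretized pdf."""
    x, p = np.asarray(grid), np.asarray(pdf)
    y = np.sort(np.asarray(levels, dtype=float))        # output levels
    for _ in range(iters):
        t = np.concatenate(([x[0]], (y[:-1] + y[1:]) / 2, [x[-1]]))   # thresholds
        for k in range(len(y)):                          # centroid of each region
            m = (x >= t[k]) & (x <= t[k + 1])
            if p[m].sum() > 0:
                y[k] = np.sum(x[m] * p[m]) / np.sum(p[m])
    t = np.concatenate(([x[0]], (y[:-1] + y[1:]) / 2, [x[-1]]))
    return y, t

grid = np.linspace(-5, 5, 20001)
pdf = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)        # zero-mean, unit-variance Gaussian
y, t = lloyd_max(grid, pdf, np.linspace(-2, 2, 8))       # 8-level quantizer
print(np.round(y, 3))
```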
In this section, we look at compression techniques used to store and transmit image data. Images can be sampled and quantized sufficiently finely so that a binary data stream can represent the original data to an extent that is satisfactory to the most discerning eye. Since we can represent a picture by anything from a thousand to a million bytes of data, we should be able to apply the techniques studied earlier directly to the task of compressing that data for storage and transmission. First, we consider the following points: 1. High quality images are represented by very large data sets. A photographic quality image may require 40 to 100 million bits for representation. These large file sizes drive the need for extremely high compression ratios to make storage and transmission (particularly of movies) practical. 2. Applications that involve imagery such as television, movies, computer graphical user interfaces, and the World Wide Web need to be fast in execution and transmission across
  • 23. Information Theory, Coding and Cryptography distribution networks, particularly if they involve moving images, to be acceptable to the human eye. 3. Imagery is characterised by higher redundancy than is true of other data. For example, a pair of cセN、ェ。」・ョエ@ horizontal lines in an image is nearly identical while, two adjacent lines in a book are generally different. The first two points indicate that the highest level of compression technology needs to be used for the movement and storage of image data. The third factor indicates that high compression ratios could be applied. The third factor also says that some special compression techniques may be possible to take advantage of the structure and properties of image data. The close relationship between neighbouring pixels in an image can be exploited to improve the compression ratios. This has a very important implication for the task of coding and decoding image data for real-time applications. Another interesting point to note is that the human eye is highly tolerant to approximation error in an image. Thus, it may be possible to compress the image data in a manner in which the less important details (to the human eye) can be ignored. That is, by trading off some of the quality of the image we might obtain a significantly reduced data size. This technique is called Lossy Compression, as opposed to the Lossless Compression techniques discussed earlier. Such liberty cannot be taken, say, financial or textual data! Lossy Compression can only be applied to data such as images and audio where deficiencies are made up by the tolerance by human senses of sight and hearing. 1.12 THE JPEG STANDARD FOR LOSSLESS COMPRESSION The Joint Photographic Experts Group (]PEG) was formed jointly by two 'standards' organisations--the CCITT (The European Telecommunication Standards Organisation) and the International Standards Organisation (ISO). Let us now consider the lossless compression option of theJPEG Image Compression Standard which is a description of 29 distinct coding systems for compression of images. Why are there so many approaches? It is because the needs of different users vary so much with respect to quality versus compression and compression versus computation time that the committee decided to provide a broad selection from which to choose. We shall briefly discuss here two methods that use entropy coding. The two lossless JPEG compression options discussed here differ only in the form of the entropy code that is applied to the data. The user can choose either a Huffman Code or an Arithmetic Code. We will not treat the Arithmetic Code concept in much detail here. However, we will summarize its main features: Arithmetic Code, like Huffman Code, achieves compression in transmission or storage by using the probabilistic nature of the data to render the information with fewer bits than used in the source data stream. Its primary advantage over the Huffman Code is that it comes closer to the Shannon entropy limit of compression for data streams that involve a relatively small alphabet. The reason is that Huffman codes work best (highest compression ratios) when the Source Coding . probabilities of the symbols can be expressed as fractions of powers of two. The Arithmetic code construction is not closely tied to these particular values, as is the Huffman code. The computation of coding and decoding Arithmetic codes is costlier than that of Huffman codes. 
Typically a 5 to 10% reduction in file size is seen with the application of Arithmetic codes over that obtained with Huffman coding. Some compression can be achieved if we can predict the next pixel using the previous pixels. In this way we just have to transmit the prediction coefficients (or difference in the values) instead of the entire pixel. The predictive process that is used in the losslessJPEG coding schemes to form the innovations data is also variable. However, in this case, the variation is not based upon the user's choice, but rather, for any image on a line by line basis. The choice is made according to that prediction method that yields the best prediction overall for the entire line. There are eight prediction methods available in theJPEG coding standards. One of the eight (which is the no prediction option) is not used for the lossless coding option that we are examining here. The other seven may be divided into the following categories: 1. Predict the next pixel on the line as having the same value as the last one. 2. Predict the next pixel on the line as having the same value as the pixel in this position on the previous line (that is, above it). 3. Predict the next pixel on the line as havirg a value related to a combination of the previous, above and previous to the above pixel values. One such combination is simply the average of the other three. The differential encoding used in the JPEG standard consists of the differences between the actual image pixel values and the predicted values. As a result of the smoothness and redundancy present in most pictures, these differences give rise to relatively small positive and negative numbers that represent the small typical error in the prediction. Hence, the probabilities associated with these values are large for the small innovation values and quite small for large ones. This is exactly the kind of data stream that compresses well with an entropy code. The typical lossless compression for natural images is 2: 1. While this is substantial, it does not in general solve the problem of storing or moving large sequences of images as encountered in high quality video. 1.13 THE JPEG STANDARD FOR LOSSY COMPRESSION TheJPEG standard includes a set of sophisticated lossy compression options developed after a study of image distortion acceptable to human senses. The JPEG lossy compression algorithm consists of an image simplification stage, which removes the image complexity at some loss of fidelity, followed by a lossless compression step based on predictive filtering and Huffman or Arithmetic coding. The lossy image simplification step, which we will call the image reduction, is based on the exploitation of an operation known as the Discrete Cosine Transform (DCT), defined as follows.
Y(k, l) = Σ_{i=0}^{N-1} Σ_{j=0}^{M-1} 4 y(i, j) cos( πk(2i + 1)/(2N) ) cos( πl(2j + 1)/(2M) ),

where the input image is N pixels by M pixels, y(i, j) is the intensity of the pixel in row i and column j, and Y(k, l) is the DCT coefficient in row k and column l of the DCT matrix. All DCT multiplications are real. This lowers the number of required multiplications, as compared to the Discrete Fourier Transform. For most images, much of the signal energy lies at low frequencies, which appear in the upper left corner of the DCT. The lower right values represent higher frequencies, and are often small (usually small enough to be neglected with little visible distortion).

In the JPEG image reduction process, the DCT is applied to 8 by 8 pixel blocks of the image. Hence, if the image is 256 by 256 pixels in size, we break it into 32 by 32 square blocks of 8 by 8 pixels and treat each one independently. The 64 pixel values in each block are transformed by the DCT into a new set of 64 values. These new 64 values, known also as the DCT coefficients, form a whole new way of representing an image. The DCT coefficients represent the spatial frequency content of the image sub-block. The upper left corner of the DCT matrix contains the low frequency components and the lower right corner the high frequency components (see Fig. 1.15). The top left coefficient is called the DC coefficient; its value is proportional to the average value of the 8 by 8 block of pixels. The rest are called the AC coefficients.

So far we have not obtained any reduction simply by taking the DCT. However, due to the nature of most natural images, the maximum energy (information) lies in the low frequencies as opposed to the high frequencies. We can therefore represent the high frequency components coarsely, or drop them altogether, without strongly affecting the quality of the resulting image reconstruction. This leads to a lot of (lossy) compression. The JPEG lossy compression algorithm does the following operations:

1. First the lowest weights are trimmed by setting them to zero.
2. The remaining weights are quantized (that is, rounded off to the nearest of some number of discrete code-represented values), some more coarsely than others, according to observed levels of sensitivity of viewers to these degradations.

Fig. 1.15 Typical Discrete Cosine Transform (DCT) values: the DC coefficient and the low frequency coefficients lie in the upper left corner, the higher frequency (AC) coefficients towards the lower right.

Now several lossless compression steps are applied to the weight data that results from the above DCT and quantization process, for all the image blocks. We observe that the DC coefficient, which represents the average image intensity, tends to vary slowly from one block of 8 x 8 pixels to the next. Hence, prediction of this value from the surrounding blocks works well: we need to send only one DC coefficient and then the differences between the DC coefficients of successive blocks. These differences can also be source coded. We next look at the AC coefficients. We first quantize them, which transforms most of the high frequency coefficients to zero. We then use zig-zag coding, as shown in Fig. 1.16. The purpose of the zig-zag coding is to move gradually from the low frequencies to the high frequencies, avoiding abrupt jumps in the values. Zig-zag coding leads to long runs of 0's, which are ideal for RLE followed by Huffman or Arithmetic coding.
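Both the transform and the scan order are easy to prototype. The sketch below implements the DCT exactly in the unnormalized form defined above, together with a zig-zag scan; applied to the quantized 4 x 4 block of Fig. 1.16 it reproduces the sequence 4, 3, 3, 3, 2, 2, 2, 1, 2, 2, 2, 0, 0, 0, 0, 0. The block values are taken from the figure; the smooth test image and all helper names are illustrative assumptions.

```python
import numpy as np

def dct2(block):
    """2-D DCT: Y(k,l) = sum_ij 4*y(i,j)*cos(pi*k*(2i+1)/(2N))*cos(pi*l*(2j+1)/(2M))."""
    N, M = block.shape
    i, j = np.arange(N), np.arange(M)
    Y = np.empty((N, M))
    for k in range(N):
        for l in range(M):
            ck = np.cos(np.pi * k * (2 * i + 1) / (2 * N))
            cl = np.cos(np.pi * l * (2 * j + 1) / (2 * M))
            Y[k, l] = 4 * np.sum(block * np.outer(ck, cl))
    return Y

def zigzag_scan(block):
    """Read an n x n block diagonal by diagonal, alternating direction."""
    n = len(block)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))
    return [block[i][j] for i, j in order]

smooth = np.fromfunction(lambda i, j: 100 + i + j, (8, 8))
print(np.round(dct2(smooth)[:2, :2]))      # the energy concentrates in the upper-left corner

quantized = [[4, 3, 3, 2],                 # rounded DCT values of Fig. 1.16
             [3, 2, 2, 2],
             [2, 1, 0, 0],
             [2, 0, 0, 0]]
print(zigzag_scan(quantized))              # ends in a long run of zeros, ready for RLE
```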
4.32 3.12 3.01 2.41 4 3 3 2 2.74 2.11 1.92 1.55 セ@ 4333222122200000 2.11 1.33 0.32 0.11 1.62 0.44 0.03 0.02 Fig. 1.16 An Example of Quantization followed by Zig-zag Coding. The typically quoted performance for ]PEG is that photographic quality images of natural scenes can be preserved with compression ratios of up to about 20:1 or 25:1. Usable quality (that is, for noncritical purposes) can result for compression ratios in the range of 200:1 up to 230:1. 1.14 CONCLUDING REMARKS In 1948, Shannon published his landmark paper titled "A Mathematical Theory of Communication". He begins this pioneering paper on information theory by observing that the fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. He then proceeds so thoroughly to establish the foundations of information theory that his framework and terminology remain standard. Shannon's theory was an immediate success with communications engineers and stimulated the growth of a technology which led to today's Information Age. Shannon published many more provocative and influential articles in a variety of disciplines. His master's thesis, "A Symbolic Analysis of Relay and Switching Circuits", used Boolean algebra to establish the theoretical underpinnings of digital circuits. This work has broad significance because digital circuits are fundamental to the operation of modem computers and telecommunications systems.
  • 25. Information Theory, Coding and Cryptography Shannon was renowned for his eclectic interests and capabilities. A favourite story describes him juggling while riding a unicycle down the halls of Bell Labs. He designed and built chess- playing, maze-solving, juggling and mind-reading machines. These activities bear out Shannon's claim that he was more motivated by curiosity than usefulness. In his words "I just wondered how things were put together." The Huffman code was created by American Scientist, D. A. Huffman in 1952. Modified Huffman coding is today used in the Joint Photographic Experts Group (]PEG) and Moving Picture Experts Group (MEPG) standards. A very efficient technique for encoding sources without needing to know their probable occurrence was developed in the 1970s by the Israelis Abraham Lempel and Jacob Ziv. The compress and uncompress utilities of the UNIX operating system use a modified version of this algorithm. The GIF format (Graphics Interchange Format), developed by CompuServe, involves simply an application of the Lempel-Ziv-Welch (LZW) universal coding algorithm to the image data. And finally, to conclude this chapter we mention that Shannon, the father of Information Theory, passed away on February 24, 2001. Excerpts from the obituary published in the New York Times: SUMMARY • The Self-Information of the event X= x, is given by I(xJ = log ( P(:,) ) = - log P(xJ. • The Mutual Information I(x,; y) between x, and Jj is given by I(x,; y) =log ( ーセセセIIス@ • The Conditional Self-Information of the event X= xi given Y = y1is defined as J(xi I y) = log ( 1 J= - log P(xi Iy). P(xiiYJ) • The Average Mutual Information between two random variables X and Yis given by J(X:, n m n m P(X· J·) Y) = L L P(xi, y)I(xi; y) = L L P(xc y1)log '' 1 . For the case when X and Yare i=l J=l i=l J=l P(xJP(yJ) statistically independent, J(X; Y) = 0. The average mutual information J(X; Y) セ@ 0, with equality if and only if X and Yare statistically independent. Source Coding n • The Average Self-Information of a random variable Xis given by H(X; = L P(x;)I(xi) n i=l =-L P(xJlogP(xJ. H (X; is called the entropy. i=l • The Average Conditional Self-Information called the Conditional Entropy is given by n m 1 H(XI Y) = セ@ セ@ P(x;, y) log P(xiiYJ) • I(xi; y) = I(xJ- I(xi Iy) and I(X; Y) = H(X)- H(XI Y) = H(Y)- H(YIX). Since I(X; Y) セ@ 0, it implies that H(X) セ@ H(XI Y). • The Average Mutual Information between two continuous random variables X and Y 00 00 p( lx)p(x) is given by J(X; Y) = J Jp(x)p(ylx)log セクI@ ( ) dxdy -oo-oo p pY • The Differential Entropy of a continuous random variables X is given by H(X) = - Jp(x)log p{x). • The Average Conditional Entropy of a continuous random variables X given Y is given by H(XI Y) = J Jp(x, y)log p(xly)dxdy. • A necessary and sufficient condition for the existence of a binary code with codewords L having lengths n1 :5 セ@ :5 ... nL that satisfy the prefix condition is L 2-nk :5 1. The efficiency k=l H(x) of a prefix code is given by T] = R · • Let X be the ensemble of letters from a DMS with finite entropy H (X). The Source Coding Theorem suggests that it is possible to construct a code that satisfies the prefix condition, and has an average length R that satisfies the inequality H (X) :5 R <H(X) + 1. Efficient representation of symbols leads to compression of data. • Huffman Encoding and Lempel-Ziv Encoding are two popular source coding techniques. In contrast to the Huffman coding scheme, the Lempel-Ziv technique is independent of the source statistics. 
The Lempel-Ziv technique generates a Fixed Length Code, where as the Huffman code is a Variable Length Code. • Run-Length Encoding or RLE is a technique used to reduce the size of a repeating string of characters. This repeating string is called a run. Run-length encoding is supported by most bitmap file formats such as TIFF, BMP and PCX.
  • 26. Information Theory, Coding and Cryptography • Distortion implies some measure of difference between the actual source samples {x.J and the corresponding quantized value {xd. The squared-error distortion is given by d(xk, xk) =(xk- xkf In general, a distortion measure may be represented as d(xlt, xk) = jxk- xkjP. • The Minimum Rate (in bits/source output) required to represent the output X of a memoryless source with a distortion less than or equal to D is called the rate distortion function R(D), defined as R(D) = min _ I (X, X) where I(X, X); is the average p(x!x):E[d(X, X)] mutual information between X and .i. • The distortion resulting due to the quantization can be expressed as D = [ .. f(x - x) p(x)dx, where f(x- x) is the desired function of the error. An optimum quantizer is one that minimizes D by optimally selecting the output levels and the corresponding input range of each output level. The resulting optimum quantizer is called the Lloyd-Max quantizer. • Quantization and source coding techniques (Huffman coding, arithmetic coding iHld run- length coding) are used in the JPEG standard for image compression. ヲェoセ。エGB・エセセセケッキセキィ・LョNLケッキセ@ : . your ・Zyセ@ offyour セ@ 1 l - flenry FonL (1863-1947) i ..._) PRC913LEMS 1.1 Consider a DMS with source probabilities {0.30, 0.25, 0.20, 0.15, 0.10}. Find the source entropy, H (X). 1.2 Prove that the entropy for a discrete source is a maximum when the output symbols are equally probable. 1.3 Prove the inequality In x セ@ x- 1. Plot the curves y1 = In x and Y2 = x- 1 to demonstrate the validity of this inequality. 1.4 Show that I (X; Y) セ@ 0. Under what condition does the equality hold? 1.5 A source, X: has an infinitely large set of outputs with probability of occurrence given by P (xJ = Ti, i = 1, 2, 3, ..... What is the average self information, H (X), of this source? 1.6 Consider another geometrically distributed random variable X with P (x;) = p (1 - pt1 , i = 1, 2, 3, ..... What is the average self information, H (X), of this source? 1.7 Consider an integer valued random variable, X: given by P (X= n) = 1 2 , where . An log n セ@ 1 . A= L 1 2 and n= 2, 3 ..., oo. Find the entropy, H (X). n=Z n og n Source Coding 1.8 Calculate the differential entropy, H (X), of the uniformly distributed random variable X with the pdf, { -1 0 < < () a _x_a px= 0 (otherwise) Plot the differential entropy, H (X), versus the parameter a (0.1 < a< 10). Comment on the result. 1.9 Consider a DMS with source probabilities {0.35, 0.25, 0.20, 0.15, 0.05}. (i) Determine the Huffman code for this source. (ii) Determine the average length R of the codewords. (iii) What is the efficiency 11 of the code? 1.10 Consider a DMS with source probabilities {0.20, 0.20, 0.15, 0.15, 0.10, 0.10, 0.05, 0.05}. (i) Determine an efficient fixed length code for the source. (ii) Determine the Huffman code for this source. (iii) Compare the two codes and comment. 1.11 A DMS has three output symbols with probabilities {0.5, 0.4, 0.1}. (i) Determine the Huffman code for this source and find the efficiency 11· (ii) Determine the Huffman code for this source taking two symbols at a time and find the efficiency 11· (iii) Determine the Huffman code for this source taking three symbols at a time and find the efficiency 11· 1.12 For a source with entropy H(X), prove that the entropy of a B-symbol block is BH(X). 1.13 Let X and Y be random variables that take on values x1, セG@ ..., x, and Yr• )2, ..., Ys respectively. Let Z =X+ Y. 
1.14 1.15 (a) Show that H(Z!X) = H(Y!X) (b) If X and Yare independent, then argue that H( Y) セ@ H(Z) and H(X) セ@ H (Z). Comment on this observation. (c) Under what condition will H(Z) = H(X) + H(Y)? Determine the Lempel Ziv code for the following bit stream 01001111100101000001010101100110000. Recover the original sequence from the encoded stream. Find the rate distortion function R(D) =min !(X; X)for Bernoulli distributed X: with p = 0.5, where the distortion is given by { 0, d (x, x) = セN@ x=x, X= 1 X= 0, X= 0, X= 1. 1.16 Consider a source X uniformly distributed orr the set {1, 2, ..., m}. Find the rate distortion function for this source with Hamming distortion defined as d (x, x) = ' _ { o x=x 1, X -:1:- X • !
COMPUTER PROBLEMS

1.17 Write a program that performs Huffman coding, given the source probabilities. It should generate the code and give the coding efficiency.
1.18 Modify the above program so that it can group together n source symbols and then generate the Huffman code. Plot the coding efficiency η versus n for the following source symbol probabilities: {0.55, 0.25, 0.20}. For what value of n does the efficiency become better than 0.9999? Repeat the exercise for the following source symbol probabilities: {0.45, 0.25, 0.15, 0.10, 0.05}.
1.19 Write a program that executes the Lempel-Ziv algorithm. The input to the program can be English text. It should convert the letters to their ASCII codes and then perform the compression routine. It should output the compression achieved. Using this program, find out the compression achieved for the following strings of letters.
(i) The Lempel Ziv algorithm can compress the English text by about fifty five percent.
(ii) The cat cannot sit on the canopy of the car.
1.20 Write a program that performs run length encoding (RLE) on a sequence of bits and gives the coded output along with the compression ratio. What is the output of the program if the following sequence is fed into it: 1100000000111100000111111111111111111100000110000000? Now feed the encoded output back to the program, i.e., perform the RLE two times on the original sequence of bits. What do you observe? Comment.
1.21 Write a program that takes in a 2^n level grayscale image (n bits per pixel) and performs the following operations:
(i) Breaks it up into 8 by 8 pixel blocks.
(ii) Performs DCT on each of the 8 by 8 blocks.
(iii) Quantizes the DCT coefficients by retaining only the m most significant bits (MSB), where m ≤ n.
(iv) Performs the zig-zag coding followed by run length coding.
(v) Performs Huffman coding on the bit stream obtained above (think of a reasonable way of calculating the symbol probabilities).
(vi) Calculates the compression ratio.
(vii) Performs the decompression (i.e., the inverse operation of steps (v) back to (i)).
Perform image compression using this program for different values of m. Up to what value of m is there no perceptible difference between the original image and the compressed image?

2 Channel Capacity and Coding

"Experimenters think that it is a mathematical theorem, while the mathematicians believe it to be an experimental fact." (On the Gaussian curve)
— Gabriel Lippmann (1845-1921)

2.1 INTRODUCTION

In the previous chapter we saw that most natural sources have inherent redundancies and that it is possible to compress data by removing these redundancies using different source coding techniques. After efficient representation of source symbols by the minimum possible number of bits, we transmit these bit-streams over channels (e.g., telephone lines, optical fibres etc.). These bits may be transmitted as they are (for baseband communications), or after modulation (for passband communications). Unfortunately, all real-life channels are noisy. The term noise designates unwanted waves that disturb the transmission and processing of the wanted signals in communication systems. The source of noise may be external to the system (e.g., atmospheric noise, man-made noise etc.), or internal (e.g., thermal noise, shot noise etc.). In effect, the bit stream obtained at the receiver is likely to be different from what was transmitted.
In passband communication, the demodulator processes the channel-corrupted waveform and reduces each waveform to a scalar or a vector that represents an estimate of the transmitted data symbols. The detector, which follows the demodulator, decides whether the transmitted bit is a
0 or a 1. This is called Hard Decision Decoding. This decision process at the decoder is similar to binary quantization with two levels. If there are more than two levels of quantization, the detector is said to perform Soft Decision Decoding. The use of hard decision decoding causes an irreversible loss of information at the receiver. Suppose the modulator sends only binary symbols but the demodulator has an alphabet with Q symbols; assuming the use of the quantizer depicted in Fig. 2.1(a), we have Q = 8. Such a channel is called a binary-input, Q-ary output Discrete Memoryless Channel. The corresponding channel is shown in Fig. 2.1(b). The decoder performance depends on the location of the representation levels of the quantizer, which in turn depends on the signal level and the noise power. Accordingly, the demodulator must incorporate automatic gain control in order to realize an effective multilevel quantizer. It is clear that the construction of such a decoder is more complicated than the hard decision decoder. However, soft decision decoding can provide significant improvement in performance over hard decision decoding.

Fig. 2.1 (a) Transfer characteristic of the multilevel quantizer. (b) Channel transition probability diagram for the binary-input, 8-ary output channel (outputs b1, b2, ..., b8).

There are three balls that a digital communication engineer must juggle: (i) the transmitted signal power, (ii) the channel bandwidth, and (iii) the reliability of the communication system (in terms of the bit error rate). Channel coding allows us to trade off one of these commodities (signal power, bandwidth or reliability) against the others. In this chapter, we will study how to achieve reliable communication in the presence of noise. We shall ask ourselves questions like: how many bits per second can be sent over a channel of a given bandwidth and for a given signal to noise ratio (SNR)? For that, we begin by studying a few channel models first.

2.2 CHANNEL MODELS

We have already come across the simplest of the channel models, the Binary Symmetric Channel (BSC), in the previous chapter. If the modulator employs binary waveforms and the detector makes hard decisions, then the channel may be viewed as one in which a binary bit stream enters at the transmitting end and another bit stream comes out at the receiving end. This is depicted in Fig. 2.2.

Fig. 2.2 A composite discrete-input, discrete-output channel: channel encoder → modulator → channel → demodulator/detector → channel decoder.

The composite discrete-input, discrete-output channel is characterized by the set X = {0, 1} of possible inputs, the set Y = {0, 1} of possible outputs and a set of conditional probabilities that relate the possible outputs to the possible inputs. Assuming the noise in the channel causes independent errors in the transmitted binary sequence with average probability of error p,

P(Y = 0 | X = 1) = P(Y = 1 | X = 0) = p,
P(Y = 1 | X = 1) = P(Y = 0 | X = 0) = 1 - p.    (2.1)

A BSC is shown in Fig. 2.3.

Fig. 2.3 A Binary Symmetric Channel (BSC): each input is received correctly with probability 1 - p and flipped with probability p.

The BSC is a special case of a general, discrete-input, discrete-output channel. Let the input to the channel be q-ary symbols, i.e., X = {x_0, x_1, ..., x_{q-1}}, and let the output of the detector at the receiver consist of Q-ary symbols, i.e., Y = {y_0, y_1, ..., y_{Q-1}}. We assume that the channel and the modulation are memoryless.
The inputs and outputs can then be related by a set of qQ conditional probabilities

P(Y = y_i | X = x_j) = P(y_i | x_j),    (2.2)

where i = 0, 1, ..., Q - 1 and j = 0, 1, ..., q - 1. This channel is known as a Discrete Memoryless Channel (DMC) and is depicted in Fig. 2.4.

Definition 2.1 The conditional probability P(y_i | x_j) is defined as the Channel Transition Probability and is denoted by p_ji.

Definition 2.2 The conditional probabilities {P(y_i | x_j)} that characterize a DMC can be arranged in the matrix form P = [p_ji]. P is called the Probability Transition Matrix for the channel.
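Since a DMC is completely specified by its probability transition matrix, the average mutual information I(X; Y) for a given input distribution can be computed directly from P = [p_ji]. The short Python sketch below is not from the book; the function and variable names are illustrative. It evaluates I(X; Y) for an arbitrary DMC, and maximizing it over input distributions then gives the channel capacity discussed in the next section.

```python
import numpy as np

def mutual_information(p_x, P):
    """I(X;Y) in bits for input distribution p_x and transition matrix
    P, where P[j, i] = P(y_i | x_j)."""
    p_x = np.asarray(p_x, dtype=float)
    P = np.asarray(P, dtype=float)
    p_y = p_x @ P                      # output distribution P(y_i)
    I = 0.0
    for j in range(P.shape[0]):
        for i in range(P.shape[1]):
            if p_x[j] > 0 and P[j, i] > 0:
                I += p_x[j] * P[j, i] * np.log2(P[j, i] / p_y[i])
    return I

# Binary symmetric channel with crossover probability p = 0.01 and
# equally likely inputs (which achieve capacity for the BSC)
p = 0.01
P_bsc = [[1 - p, p], [p, 1 - p]]
print(mutual_information([0.5, 0.5], P_bsc))   # about 0.919 bits/channel use
```

With equally likely inputs, the value printed for p = 0.01 is approximately 0.919 bits per channel use, the BSC capacity that reappears in Example 2.3.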
Fig. 2.4 A Discrete Memoryless Channel (DMC) with q-ary input (x_0, ..., x_{q-1}) and Q-ary output (y_0, ..., y_{Q-1}).

In the next section, we will try to answer the question: how many bits can be sent across a given noisy channel each time the channel is used?

2.3 CHANNEL CAPACITY

Consider a DMC having an input alphabet X = {x_0, x_1, ..., x_{q-1}} and an output alphabet Y = {y_0, y_1, ..., y_{Q-1}}. Let us denote the set of channel transition probabilities by P(y_i | x_j). The average mutual information provided by the output Y about the input X is given by (see Chapter 1, Section 1.2)

I(X; Y) = Σ_{j=0}^{q-1} Σ_{i=0}^{Q-1} P(x_j) P(y_i | x_j) log [ P(y_i | x_j) / P(y_i) ]    (2.3)

The channel transition probabilities P(y_i | x_j) are determined by the channel characteristics (particularly the noise in the channel). However, the input symbol probabilities P(x_j) are within the control of the discrete channel encoder. The value of the average mutual information, I(X; Y), maximized over the set of input symbol probabilities P(x_j), is a quantity that depends only on the channel transition probabilities P(y_i | x_j). This quantity is called the Capacity of the Channel.

Definition 2.3 The Capacity of a DMC is defined as the maximum average mutual information in any single use of the channel, where the maximization is over all possible input probabilities. That is,

C = max_{P(x_j)} I(X; Y) = max_{P(x_j)} Σ_{j=0}^{q-1} Σ_{i=0}^{Q-1} P(x_j) P(y_i | x_j) log [ P(y_i | x_j) / P(y_i) ]    (2.4)

The maximization of I(X; Y) is performed under the constraints P(x_j) ≥ 0 and Σ_{j=0}^{q-1} P(x_j) = 1.

The unit of channel capacity is bits per channel use (provided the base of the logarithm is 2).

Example 2.1 Consider a BSC with channel transition probabilities P(0|1) = p = P(1|0). By symmetry, the capacity C = max I(X; Y) is achieved for equally likely inputs, P(x_0) = P(x_1) = 0.5. From equation (2.4) we obtain the capacity of a BSC as

C = 1 + p log2 p + (1 - p) log2(1 - p).

Let us define the entropy function H(p) = -p log2 p - (1 - p) log2(1 - p). Hence, we can rewrite the capacity of a binary symmetric channel as C = 1 - H(p).

Fig. 2.5 The capacity of a BSC, plotted versus the probability of error p.

The plot of the capacity versus p is given in Fig. 2.5. From the plot we make the following observations.
(i) For p = 0 (i.e., a noise-free channel), the capacity is 1 bit/use, as expected. Each time we use the channel, we can successfully transmit 1 bit of information.
(ii) For p = 0.5, the channel capacity is 0, i.e., observing the output gives no information about the input. It is equivalent to the case when the channel is broken. We might as well discard the channel and toss a fair coin in order to estimate what was transmitted.
(iii) For 0.5 < p < 1, the capacity increases with increasing p. In this case we simply reverse the positions of 1 and 0 at the output of the BSC.
(iv) For p = 1 (i.e., every bit gets flipped by the channel), the capacity is again 1 bit/use, as expected. In this case, one simply flips the bit at the output of the receiver so as to undo the effect of the channel.
(v) Since p is a monotonically decreasing function of the signal to noise ratio (SNR), the capacity of a BSC is a monotonically increasing function of SNR.

Having developed the notion of capacity of a channel, we shall now try to relate it to reliable communication over the channel. So far, we have only talked about the number of bits that can be sent over a channel each time it is used (bits/use). But what is the number of bits that can be sent per second (bits/sec)? To answer this question we introduce the concept of Channel Coding.

2.4 CHANNEL CODING

All real-life channels are affected by noise. Noise causes discrepancies (errors) between the input and the output data sequences of a digital communication system. For a typical noisy channel, the probability of bit error may be as high as 10^-2. This means that, on an average, 1 bit out of every 100 transmitted over this channel gets flipped. For most applications, this level of reliability is far from adequate. Different applications require different levels of reliability (which is a component of the quality of service). Table 2.1 lists the typical acceptable bit error rates for various applications.

Table 2.1 Acceptable bit error rates for various applications

Application                              | Probability of Error
Speech telephony                         | 10^-4
Voice band data                          | 10^-6
Electronic mail, Electronic newspaper    | 10^-6
Internet access                          | 10^-6
Video telephony, High speed computing    | 10^-7

In order to achieve such high levels of reliability, we resort to Channel Coding. The basic objective of channel coding is to increase the resistance of the digital communication system to channel noise. This is done by adding redundancies in the transmitted data stream in a controlled manner.

In channel coding, we map the incoming data sequence to a channel input sequence. This encoding procedure is done by the Channel Encoder. The encoded sequence is then transmitted over the noisy channel. The channel output sequence at the receiver is inverse mapped onto an output data sequence. This is called the decoding procedure, and is carried out by the Channel Decoder. Both the encoder and the decoder are under the designer's control. As already mentioned, the encoder introduces redundancy in a prescribed manner. The decoder exploits this redundancy in order to reconstruct the original source sequence as accurately as possible. Thus, channel coding makes it possible to carry out reliable communication over unreliable (noisy) channels. Channel coding is also referred to as Error Control Coding, and we will use these terms interchangeably. It is interesting to note here that the source coder reduces redundancy to improve efficiency, whereas the channel coder adds redundancy in a controlled manner to improve reliability.

We first look at a class of channel codes called Block Codes. In this class of codes, the incoming message sequence is first sub-divided into sequential blocks, each of length k bits. Each k-bit long information block is mapped into an n-bit block by the channel coder, where n > k. This means that for every k bits of information, (n - k) redundant bits are added. The ratio

r = k/n    (2.5)

is called the Code Rate. The code rate of any coding scheme is, naturally, less than unity. A small code rate implies that more and more bits per block are redundant bits, corresponding to a higher coding overhead.
This may reduce the effect of noise, but it will also reduce the communication rate, as we will end up transmitting more redundant bits and fewer information bits. The question before us is whether there exists a coding scheme such that the probability that a message bit will be in error is arbitrarily small and yet the code rate is not too small. The answer is yes, and it was first provided by Shannon in his second theorem on channel capacity. We will study this shortly.

Let us now introduce the concept of time into our discussion. We wish to look at questions like: how many bits per second can we send over a given noisy channel with arbitrarily low bit error rate? Suppose the DMS has the source alphabet X and entropy H(X) bits per source symbol, and the source generates a symbol every T_s seconds; then the average information rate of the source is H(X)/T_s bits per second. Let us assume that the channel can be used once every T_c seconds, and that the capacity of the channel is C bits per channel use. Then, the channel capacity per unit time is C/T_c bits per second. We now state Shannon's second theorem, known as the Channel Coding Theorem.

Theorem 2.1 Channel Coding Theorem (Noisy Coding Theorem)
(i) Let a DMS with an alphabet X have entropy H(X) and produce symbols every T_s seconds. Let a DMC have capacity C and be used once every T_c seconds. Then, if

H(X)/T_s ≤ C/T_c    (2.6)

there exists a coding scheme for which the source output can be transmitted over the noisy channel and be reconstructed with an arbitrarily low probability of error.
(ii) Conversely, if

H(X)/T_s > C/T_c    (2.7)

it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error.

The parameter C/T_c is called the Critical Rate.
The channel coding theorem is a very important result in information theory. The theorem specifies the channel capacity, C, as a fundamental limit on the rate at which reliable communication can be carried out over an unreliable (noisy) DMC. It should be noted that the channel coding theorem tells us about the existence of codes that can achieve reliable communication in a noisy environment. Unfortunately, it does not give us a recipe to construct these codes. Therefore, channel coding is still an active area of research, as the search for better and better codes goes on. From the next chapter onwards we shall study some good channel codes.

Example 2.2 Consider a DMS that emits equally likely binary symbols (p = 0.5) once every T_s seconds. The entropy of this binary source is

H(p) = -p log2 p - (1 - p) log2(1 - p) = 1 bit.

The information rate of this source is H(X)/T_s = 1/T_s bits/second. Suppose we wish to transmit the source symbols over a noisy channel. The source sequence is applied to a channel coder with code rate r. This channel coder uses the channel once every T_c seconds to send the coded sequence. We want reliable communication (the probability of error as small as desired). From the channel coding theorem, if

1/T_s ≤ C/T_c    (2.8)

we can make the probability of error as small as desired by a suitable choice of a channel coding scheme, and hence have reliable communication. We note that the code rate of the coder can be expressed as

r = T_c/T_s    (2.9)

Hence, the condition for reliable communication can be rewritten as

r ≤ C    (2.10)

Thus, for a BSC one can find a suitable channel coding scheme with a code rate r ≤ C which will ensure reliable communication regardless of how noisy the channel is! Of course, we can state that at least one such code exists, but finding that code may not be a trivial job. As we shall see later, the level of noise in the channel manifests itself by limiting the channel capacity, and hence the code rate.

Example 2.3 Consider a BSC with a transition probability p = 10^-2. Such error rates are typical of wireless channels. We saw in Example 2.1 that for a BSC the capacity is given by

C = 1 + p log2 p + (1 - p) log2(1 - p).

By plugging in the value p = 10^-2 we obtain the channel capacity C = 0.919. From the previous example we can conclude that there exists at least one coding scheme with code rate r ≤ 0.919 which will guarantee a probability of error that is as small as desired.

Example 2.4 Consider the repetition code in which each message bit is simply repeated n times, where n is an odd integer. For example, for n = 3, we have the mapping scheme 0 → 000; 1 → 111. Similarly, for n = 5 we have the mapping scheme 0 → 00000; 1 → 11111. Note that the code rate of the repetition code with blocklength n is

r = 1/n    (2.11)

The decoding strategy is as follows: if in a block of n received bits the number of 0's exceeds the number of 1's, decide in favour of 0, and vice versa. This is known as Majority Decoding. This also answers the question why n should be an odd integer for repetition codes. Let n = 2m + 1, where m is a positive integer. This decoding strategy will make an error if more than m bits are in error, because in that case, if a 0 is encoded and sent, there would be more 1's than 0's in the received word. Let us assume that the a priori probabilities of 1 and 0 are equal.
Then, the average probability of error is given by

P_e = Σ_{k=m+1}^{2m+1} C(2m+1, k) p^k (1 - p)^{2m+1-k}    (2.12)

where p is the channel transition probability and C(2m+1, k) denotes the binomial coefficient. The average probability of error for repetition codes of different code rates is given in Table 2.2 (the r = 1 entry corresponds to the uncoded channel with p = 10^-2).

Table 2.2 Average probability of error for repetition codes

Code Rate, r:                          1        1/3       1/5      1/7       1/9      1/11
Average Probability of Error, P_e:     10^-2    3x10^-4   10^-6    4x10^-7   10^-8    5x10^-10
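Equation (2.12) is a simple binomial sum, so the behaviour summarized in Table 2.2 can be explored numerically. The sketch below is illustrative only (it assumes the crossover probability p = 10^-2 of Example 2.3); it evaluates P_e for a rate-1/n repetition code under majority decoding and shows the rapid fall-off as the rate decreases.

```python
from math import comb

def repetition_error_prob(n, p):
    """P_e for a rate-1/n repetition code (n = 2m+1) over a BSC with
    crossover probability p, decoded by majority vote (Eq. 2.12)."""
    m = (n - 1) // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(m + 1, n + 1))

p = 1e-2
for n in [1, 3, 5, 7, 9, 11]:
    print(f"r = 1/{n:<2d}  Pe = {repetition_error_prob(n, p):.2e}")
```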
From the table we see that as the code rate decreases, there is a steep fall in the average probability of error. The decrease in P_e is much more rapid than the decrease in the code rate, r. However, for repetition codes, the code rate tends to zero if we want smaller and smaller P_e. Thus the repetition code exchanges code rate for message reliability. But the channel coding theorem states that the code rate need not tend to zero in order to obtain an arbitrarily low probability of error. The theorem merely requires the code rate r to be less than the channel capacity, C. So there must exist some code (other than the repetition code) with code rate r = 0.9 which can achieve an arbitrarily low probability of error. Such a coding scheme would add just 1 parity bit to 9 information bits (or, maybe, add 10 extra bits to 90 information bits) and give us as small a P_e as desired (say, 10^-20)! The hard part is finding such a code.

2.5 INFORMATION CAPACITY THEOREM

So far we have studied limits on the maximum rate at which information can be sent over a channel reliably, in terms of the channel capacity. In this section we will formulate the Information Capacity Theorem for band-limited, power-limited Gaussian channels.

Consider a zero-mean, stationary random process X(t) that is band-limited to W Hertz. Let X_k, k = 1, 2, ..., K, denote the continuous random variables obtained by uniform sampling of the process X(t) at the Nyquist rate of 2W samples per second. These symbols are transmitted over a noisy channel which is also band-limited to W Hertz. The channel output is corrupted by Additive White Gaussian Noise (AWGN) of zero mean and power spectral density (psd) N0/2. Because of the channel, the noise is band-limited to W Hertz. Let Y_k, k = 1, 2, ..., K, denote the samples of the received signal. Therefore,

Y_k = X_k + N_k,  k = 1, 2, ..., K    (2.13)

where N_k is the noise sample with zero mean and variance σ² = N0 W. It is assumed that the Y_k, k = 1, 2, ..., K, are statistically independent. Since the transmitter is usually power-limited, let us put a constraint on the average power in X_k:

E[X_k²] = P,  k = 1, 2, ..., K    (2.14)

The information capacity of this band-limited, power-limited channel is the maximum of the mutual information between the channel input X_k and the channel output Y_k. The maximization has to be done over all distributions of the input X_k that satisfy the power constraint of equation (2.14). Thus, the information capacity of the channel (same as the channel capacity) is given by

C = max_{f_{X_k}(x)} { I(X_k; Y_k) : E[X_k²] = P }    (2.15)

where f_{X_k}(x) is the probability density function of X_k. Now, from the previous chapter we have

I(X_k; Y_k) = h(Y_k) - h(Y_k | X_k)    (2.16)

Note that X_k and N_k are independent random variables. Therefore, the conditional differential entropy of Y_k given X_k is equal to the differential entropy of N_k. Intuitively, this is because, given X_k, the uncertainty in Y_k is purely due to N_k. That is,

h(Y_k | X_k) = h(N_k)    (2.17)

Hence we can write Eq. (2.16) as

I(X_k; Y_k) = h(Y_k) - h(N_k)    (2.18)

Since h(N_k) is independent of X_k, maximizing I(X_k; Y_k) translates to maximizing h(Y_k). It can be shown that for h(Y_k) to be maximum, Y_k has to be a Gaussian random variable (see Problem 2.10). If we assume Y_k to be Gaussian, and N_k is Gaussian by definition, then X_k must also be Gaussian. This is because the sum (or difference) of two Gaussian random variables is also Gaussian.
Thus, in order to maximize the mutual information between the channel input X_k and the channel output Y_k, the transmitted signal should also be Gaussian. Therefore, we can rewrite (2.15) as

C = I(X_k; Y_k), with E[X_k²] = P and X_k Gaussian    (2.19)

We know that if two independent Gaussian random variables are added, the variance of the resulting Gaussian random variable is the sum of the variances. Therefore, the variance of the received sample Y_k equals P + N0 W. It can be shown that the differential entropy of a Gaussian random variable with variance σ² is (1/2) log2(2πeσ²) (see Problem 2.10). Therefore,

h(Y_k) = (1/2) log2[2πe(P + N0 W)]    (2.20)

and

h(N_k) = (1/2) log2[2πe(N0 W)]    (2.21)

Substituting these values of differential entropy for Y_k and N_k, we get

C = (1/2) log2(1 + P/(N0 W)) bits per channel use    (2.22)

We are transmitting 2W samples per second, i.e., the channel is being used 2W times in one second. Therefore, the information capacity can be expressed as

C = W log2(1 + P/(N0 W)) bits per second    (2.23)

This basic formula for the capacity of the band-limited AWGN waveform channel with a band-limited and average power-limited input was first derived by Shannon in 1948. It is known as Shannon's third theorem, the Information Capacity Theorem.

Theorem 2.2 (Information Capacity Theorem) The information capacity of a continuous channel of bandwidth W Hertz, disturbed by Additive White Gaussian Noise of power spectral density N0/2 and limited in bandwidth to W, is given by

C = W log2(1 + P/(N0 W)) bits per second

where P is the average transmitted power. This theorem is also called the Channel Capacity Theorem.
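The capacity formula of Theorem 2.2 is straightforward to evaluate numerically. The following sketch is illustrative (the function name and the chosen numbers are not from the book); it computes the capacity of a band-limited AWGN channel for a given bandwidth and SNR.

```python
import numpy as np

def awgn_capacity(W, P, N0):
    """Information capacity C = W log2(1 + P/(N0 W)) in bits/second."""
    return W * np.log2(1.0 + P / (N0 * W))

# Example: a 3000 Hz channel with SNR = P/(N0 W) = 20 dB
W = 3000.0
snr = 10.0 ** (20.0 / 10.0)        # 20 dB -> a power ratio of 100
N0 = 1.0
P = snr * N0 * W
print(awgn_capacity(W, P, N0))     # about 1.997e4 bits/second
```

For a 3000 Hz channel at 20 dB SNR the formula gives roughly 2 x 10^4 bits/second, which is the kind of calculation asked for in Problem 2.5.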
The Information Capacity Theorem is one of the most important results in information theory. In a single formula one can see the trade-off between the channel bandwidth, the average transmitted power and the noise power spectral density. Given the channel bandwidth and the SNR, the channel capacity (bits/second) can be computed. This channel capacity is the fundamental limit on the rate of reliable communication for a power-limited, band-limited Gaussian channel. It should be kept in mind that in order to approach this limit, the transmitted signal must have statistical properties that are Gaussian in nature. Note that the terms channel capacity and information capacity have been used interchangeably.

Let us now derive the same result in a more intuitive manner. Suppose we have a coding scheme that results in an acceptably low probability of error. Let this coding scheme take k information bits and encode them into n-bit long codewords. The total number of codewords is M = 2^k. Let the average power per bit be P. Thus the average power required to transmit an entire codeword is nP. Let these codewords be transmitted over a Gaussian channel with noise variance σ². The received vector of n bits is also Gaussian, with mean equal to the transmitted codeword and variance equal to nσ². Since the code is a good one (acceptable error rate), the received vector lies inside a sphere of radius sqrt(nσ²) centred on the transmitted codeword. This sphere itself is contained in a larger sphere of radius sqrt(n(P + σ²)), where n(P + σ²) is the average power of the received vector.

This concept may be visualized as depicted in Fig. 2.6. There is a large sphere of radius sqrt(n(P + σ²)) which contains M smaller spheres of radius sqrt(nσ²). Here M = 2^k is the total number of codewords. Each of these small spheres is centred on a codeword. These are called the Decoding Spheres. Any received word lying within a sphere is decoded as the codeword on which the sphere is centred. Suppose a codeword is transmitted over a noisy channel. Then there is a high probability that the received vector will lie inside the correct decoding sphere (since it is a reasonably good code). The question arises: how many non-intersecting spheres can be packed inside the large sphere? The more spheres one can pack, the more efficient the code will be in terms of the code rate. This is known as the Sphere Packing Problem.

Fig. 2.6 Visualization of the Sphere Packing Problem.

The volume of an n-dimensional sphere of radius r can be expressed as

V = A_n r^n    (2.24)

where A_n is a scaling factor. Therefore, the volume of the large sphere (the sphere of all possible received vectors) can be written as

V_all = A_n [n(P + σ²)]^(n/2)    (2.25)

and the volume of a decoding sphere can be written as

V_dec = A_n [nσ²]^(n/2)    (2.26)

The maximum number of non-intersecting decoding spheres that can be packed inside the large sphere of all possible received vectors is

M = A_n [n(P + σ²)]^(n/2) / (A_n [nσ²]^(n/2)) = (1 + P/σ²)^(n/2) = 2^((n/2) log2(1 + P/σ²))    (2.27)

On taking the logarithm (base 2) of both sides of the equation we get

log2 M = (n/2) log2(1 + P/σ²)    (2.28)

Observing that k = log2 M, we have

k/n = (1/2) log2(1 + P/σ²)    (2.29)

Note that each time we use the channel, we effectively transmit k/n bits. Thus, the maximum number of bits that can be transmitted per channel use, with a low probability of error, is (1/2) log2(1 + P/σ²), as seen previously in Eq. (2.22).
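The sphere-packing bound of Eq. (2.29) is easy to explore numerically. The sketch below is illustrative (the function name and the chosen SNR values are not from the book); it computes the maximum number of bits per channel use and the corresponding number of decoding spheres for a given blocklength.

```python
import numpy as np

def sphere_packing_rate(snr):
    """Maximum bits per channel use k/n = 0.5 * log2(1 + P/sigma^2)."""
    return 0.5 * np.log2(1.0 + snr)

n = 100                                  # codeword length (dimensionality)
for snr_db in [0, 10, 20]:
    snr = 10.0 ** (snr_db / 10.0)
    rate = sphere_packing_rate(snr)
    print(f"SNR = {snr_db:2d} dB  k/n = {rate:.3f}  "
          f"M = 2^{rate * n:.0f} decoding spheres")
```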
Note that σ² represents the noise power and is equal to N0 W for AWGN with power spectral density N0/2 and limited in bandwidth to W.

2.6 THE SHANNON LIMIT

Consider a Gaussian channel that is limited both in power and in bandwidth. We wish to explore the limits of a communication system under these constraints. Let us define an ideal system as one which transmits data at a bit rate R_b equal to the capacity, C, of the channel, i.e., R_b = C. Suppose the energy per bit is E_b. Then the average transmitted power is

P = E_b R_b = E_b C    (2.30)

Therefore, the channel capacity theorem for this ideal system can be written as

C/W = log2(1 + (E_b/N0)(C/W))    (2.31)
This equation can be re-written in the following form:

E_b/N0 = (2^(C/W) - 1) / (C/W)    (2.32)

The plot of the bandwidth efficiency R_b/W versus E_b/N0 is called the Bandwidth Efficiency Diagram, and is given in Fig. 2.7. The ideal system is represented by the line R_b = C.

Fig. 2.7 The Bandwidth Efficiency Diagram: bandwidth efficiency R_b/W plotted against E_b/N0 (dB), with the capacity boundary R_b = C.

The following conclusions can be drawn from the Bandwidth Efficiency Diagram.

(i) For infinite bandwidth, the ratio E_b/N0 tends to the limiting value

(E_b/N0)|_{W→∞} = ln 2 = 0.693 = -1.6 dB    (2.33)

This value is called the Shannon Limit. It is interesting to note that the Shannon limit is a fraction. This implies that for very large bandwidths, reliable communication is possible even when the signal power is less than the noise power! The channel capacity corresponding to this limiting value is

C|_{W→∞} = (P/N0) log2 e    (2.34)

Thus, at infinite bandwidth, the capacity of the channel is determined by the SNR.

(ii) The curve for the critical rate R_b = C is known as the Capacity Boundary. For the case R_b > C, reliable communication is not guaranteed. However, for R_b < C, there exists some coding scheme which can provide an arbitrarily low probability of error.

(iii) The Bandwidth Efficiency Diagram shows the trade-offs between the quantities R_b/W, E_b/N0 and the probability of error, P_e. Note that for designing any communication system the basic design parameters are the available bandwidth, the SNR and the bit error rate (BER). The BER is determined by the application and the quality of service (QoS) desired. The bandwidth and the power can be traded one for the other to provide the desired BER.

(iv) Any point on the Bandwidth Efficiency Diagram corresponds to an operating point, i.e., a set of values of SNR, bandwidth efficiency and BER. The information capacity theorem predicts the maximum amount of information that can be transmitted through a given bandwidth for a given SNR. We see from Fig. 2.7 that acceptable capacity can be achieved even for low SNRs, provided adequate bandwidth is available. The optimum usage of a given bandwidth is obtained when the signals are noise-like and a minimal SNR is maintained at the receiver. This principle lies at the heart of any spread spectrum communication system, like Code Division Multiple Access (CDMA).

2.7 RANDOM SELECTION OF CODES

Consider a set of M coded signal waveforms constructed from a set of n-dimensional binary codewords. Let us represent these codewords as follows:

c_i = [c_i1 c_i2 ... c_in],  i = 1, 2, ..., M    (2.35)

Since we are considering binary codes, c_ij is either a 0 or a 1. Let each bit of the codeword be mapped onto a BPSK waveform p_j(t), so that the codeword may be represented as

s_i(t) = Σ_{j=1}^{n} s_ij p_j(t),  i = 1, 2, ..., M    (2.36)

where

s_ij = +sqrt(E) for c_ij = 1, and s_ij = -sqrt(E) for c_ij = 0    (2.37)

and E is the energy per code bit. The waveform s_i(t) can then be represented as the n-dimensional vector

s_i = [s_i1 s_i2 ... s_in],  i = 1, 2, ..., M    (2.38)

We observe that this corresponds to a hypercube in the n-dimensional space. Let us now encode k bits of information into an n-bit long codeword, and map this codeword to one of the M
waveforms. Note that there are a total of 2^k possible waveforms corresponding to the M = 2^k different codewords. Let the information rate into the encoder be R bits/sec. The encoder takes in k bits at a time and maps the k-bit block to one of the M waveforms. Thus, k = RT and M = 2^k = 2^(RT) signals are required. Let us define a parameter D as follows:

D = n/T dimensions/sec    (2.39)

so that n = DT is the dimensionality of the space. The hypercube mentioned above has 2^n = 2^(DT) vertices. Of these, we must choose M = 2^(RT) to transmit the information. Under the constraint D > R, the fraction of vertices that can be used as signal points is

F = 2^k / 2^n = 2^(RT) / 2^(DT) = 2^(-(D-R)T)    (2.40)

For D > R, F → 0 as T → ∞. Since n = DT, this implies that F → 0 as n → ∞. Designing a good coding scheme translates to choosing M vertices out of the 2^n vertices of the hypercube in such a manner that the probability of error tends to zero as we increase n. We saw that the fraction F tends to zero as we choose larger and larger n. This implies that it is possible to increase the minimum distance between these M signal points as n → ∞. Increasing the minimum distance between the signal points gives us a probability of error P_e → 0.

There are (2^n)^M = 2^(nM) distinct ways of choosing M codewords out of the total 2^n vertices. Each of these choices corresponds to a coding scheme. For each set of M waveforms, it is possible to design a communication system consisting of a modulator and a demodulator. Thus, there are 2^(nM) communication systems, one for each choice of the M coded waveforms. Each of these communication systems is characterized by its probability of error. Of course, many of these communication systems will perform poorly in terms of the probability of error.

Let us pick one of the codes at random from the possible 2^(nM) sets of codes. The random selection of the m-th code occurs with the probability

P({s_i}_m) = 2^(-nM)    (2.41)

Let the corresponding probability of error for this choice of code be P_e({s_i}_m). Then the average probability of error over the ensemble of codes is

P̄_e = Σ_m P_e({s_i}_m) P({s_i}_m) = 2^(-nM) Σ_m P_e({s_i}_m)    (2.42)

We will next try to upper bound this average probability of error. If we have an upper bound on P̄_e, then we can conclude that there exists at least one code for which this upper bound will also hold. Furthermore, if P̄_e → 0 as n → ∞, we can surmise that P_e({s_i}) → 0 as n → ∞ for that code.

Consider the transmission of a k-bit message X_k = [x_1 x_2 ... x_k], where x_j is binary for j = 1, 2, ..., k. The conditional probability of error averaged over all possible codes is

P̄_e(X_k) = Σ_{all codes} P_e(X_k, {s_i}_m) P({s_i}_m)    (2.43)

where P_e(X_k, {s_i}_m) is the conditional probability of error for a given k-bit message X_k = [x_1 x_2 ... x_k] which is transmitted using the code {s_i}_m. For the m-th code,

P_e(X_k, {s_i}_m) ≤ Σ_{l=1, l≠k}^{M} P_2m(s_l, s_k)    (2.44)

where P_2m(s_l, s_k) is the probability of error for the binary communication system using the signal vectors s_l and s_k to transmit one of two equally likely k-bit messages. Hence,

P̄_e(X_k) ≤ Σ_{all codes} P({s_i}_m) Σ_{l=1, l≠k}^{M} P_2m(s_l, s_k)    (2.45)

On changing the order of summation we obtain

P̄_e(X_k) ≤ Σ_{l=1, l≠k}^{M} [ Σ_{all codes} P({s_i}_m) P_2m(s_l, s_k) ] = Σ_{l=1, l≠k}^{M} P̄_2(s_l, s_k)    (2.46)

where P̄_2(s_l, s_k) represents the ensemble average of P_2m(s_l, s_k) over the 2^(nM) codes.
For the additive white Gaussian noise channel,

P_2m(s_l, s_k) = Q( sqrt(d_lk² / (2 N0)) )    (2.47)

where

d_lk² = |s_l - s_k|² = Σ_{j=1}^{n} (s_lj - s_kj)² = d (2 sqrt(E))² = 4 d E    (2.48)

and d is the number of places in which s_l and s_k differ. Therefore,

P_2m(s_l, s_k) = Q( sqrt(2 d E / N0) )    (2.49)

Under the assumption that all codes are equally probable, it is equally likely that the vector s_l is any of the 2^n vertices of the hypercube. Further, s_l and s_k are statistically independent. Hence, the probability that s_l and s_k differ in exactly d places is
P(d) = (1/2)^n (n choose d)    (2.50)

The expected value of P_2m(s_l, s_k) over the ensemble of codes is then given by

P̄_2(s_l, s_k) = Σ_{d=0}^{n} P(d) Q( sqrt(2 d E / N0) )    (2.51)

Using the upper bound

Q( sqrt(2 d E / N0) ) ≤ e^(-dE/N0)    (2.52)

we obtain

P̄_2(s_l, s_k) ≤ (1/2)^n Σ_{d=0}^{n} (n choose d) e^(-dE/N0) = [ (1/2)(1 + e^(-E/N0)) ]^n    (2.53)

From Eqs. (2.46) and (2.53) we obtain

P̄_e(X_k) ≤ Σ_{l=1, l≠k}^{M} P̄_2(s_l, s_k) = (M - 1) [ (1/2)(1 + e^(-E/N0)) ]^n < M [ (1/2)(1 + e^(-E/N0)) ]^n    (2.54)

Recall that we need an upper bound on P̄_e, the average error probability. To obtain P̄_e we average P̄_e(X_k) over all possible k-bit information sequences. Thus,

P̄_e < M [ (1/2)(1 + e^(-E/N0)) ]^n    (2.55)

We now define a new parameter as follows.

Definition 2.4 The Cutoff Rate R0 is defined as

R0 = log2 [ 2 / (1 + e^(-E/N0)) ] = 1 - log2(1 + e^(-E/N0))    (2.56)

The cutoff rate has the units of bits/dimension. Observe that 0 ≤ R0 ≤ 1. The plot of R0 with respect to the SNR per dimension is given in Fig. 2.8. Equation (2.55) can now be written succinctly as

P̄_e < M 2^(-n R0) = 2^(RT) 2^(-n R0)    (2.57)

Substituting n = DT, we obtain

P̄_e < 2^(-T(D R0 - R))    (2.58)

If we substitute T = n/D, we obtain

P̄_e < 2^(-n(R0 - R/D))    (2.59)

Fig. 2.8 Cutoff Rate, R0, versus the SNR per dimension, E/N0 (dB).

Observe that

R/D = R/(n/T) = RT/n = k/n = R_c    (2.60)

Here, R_c represents the code rate. Hence the average error probability can be written in the following instructive form:

P̄_e < 2^(-n(R0 - R_c))    (2.61)

From the above equation we can conclude the following.

(i) For R_c < R0 the average probability of error P̄_e → 0 as n → ∞. Since by choosing large values of n, P̄_e can be made arbitrarily small, there exist good codes in the ensemble which have a probability of error less than P̄_e.
(ii) We observe that P̄_e is the ensemble average. Therefore, if a code is selected at random, the probability that its error probability exceeds a·P̄_e is less than 1/a. This implies that no more than 10% of the codes have an error probability exceeding 10·P̄_e. Thus, there are many good codes.
(iii) The codes whose probability of error exceeds P̄_e are not always bad codes. The probability of error of these codes may be reduced by increasing the dimensionality, n.
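The cutoff rate of Definition 2.4 and the random-coding bound of Eq. (2.61) are easy to evaluate numerically. The sketch below is illustrative (the SNR, code rate and target error probability are assumed values, not from the book); it shows roughly how large the block length n must be for the bound 2^(-n(R0 - Rc)) to fall below a target error probability.

```python
import numpy as np

def cutoff_rate(e_n0):
    """R0 = 1 - log2(1 + exp(-E/N0)) in bits/dimension (Eq. 2.56)."""
    return 1.0 - np.log2(1.0 + np.exp(-e_n0))

e_n0 = 10.0 ** (5.0 / 10.0)            # SNR per dimension of 5 dB
R0 = cutoff_rate(e_n0)
Rc = 0.5                               # code rate; must satisfy Rc < R0
target = 1e-6
# Smallest n with 2^{-n(R0 - Rc)} <= target
n = int(np.ceil(-np.log2(target) / (R0 - Rc)))
print(f"R0 = {R0:.3f} bits/dimension, need n >= {n} for the bound < {target}")
```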
For binary coded signals, the cutoff rate R0 saturates at 1 bit/dimension for large values of E/N0, say E/N0 > 10. Thus, to achieve lower probabilities of error one must reduce the code rate R_c; alternatively, very large block lengths have to be used. This is not an efficient approach, so binary codes become inefficient at high SNRs. For high-SNR scenarios, non-binary coded signal sets should be used to achieve an increase in the number of bits per dimension. Multiple-amplitude coded signal sets can easily be constructed from non-binary codes by mapping each code element into one of the possible amplitude levels (e.g. Pulse Amplitude Modulation). For random codes using M-ary multi-amplitude signals, it was shown by Shannon (in 1959) that

R0* = (1/2) log2(1 + P/(2σ²)) = (1/2) log2(1 + E/N0)    (2.62)

Let us now relate the cutoff rate R0* to the capacity of the AWGN channel, which is given by

C = W log2(1 + P/(N0 W)) bits per second    (2.63)

The energy per code bit is equal to

E = PT/n    (2.64)

Recall from the sampling theorem that a signal of bandwidth W may be represented by samples taken at a rate 2W. Thus, in a time interval of length T there are n = 2WT samples. Therefore, we may write D = n/T = 2W. Hence,

P = nE/T = DE    (2.65)

Define the normalized capacity C_n = C/(2W) = C/D and substitute for W and P in (2.63) to obtain

C_n = (1/2) log2(1 + 2E/N0) = (1/2) log2(1 + 2 R_c γ_b)    (2.66)

where γ_b is the SNR per bit. The normalized capacity, C_n, and the cutoff rate, R0*, are plotted in Fig. 2.9. From the figure we can conclude the following:

(i) R0* < C_n for all values of E/N0. This is expected because C_n is the ultimate limit on the transmission rate R/D.
(ii) For smaller values of E/N0, the difference between C_n and R0* is approximately 3 dB. This means that randomly selected, average power limited, multi-amplitude signals yield R0* within 3 dB of channel capacity.

Fig. 2.9 The normalized capacity, C_n, and cutoff rate, R0*, for an AWGN channel, plotted versus E/N0 (dB).

2.8 CONCLUDING REMARKS

Pioneering work in the area of channel capacity was done by Shannon in 1948. Shannon's second theorem was indeed a surprising result at the time of its publication. It claimed that the probability of error for a BSC could be made as small as desired provided the code rate was less than the channel capacity. This theorem paved the way for a systematic study of reliable communication over unreliable (noisy) channels. Shannon's third theorem, the Information Capacity Theorem, is one of the most remarkable results in information theory. It gives a relation between the channel bandwidth, the signal to noise ratio and the channel capacity.

Additional work was carried out in the 1950s and 1960s by Gilbert, Gallager, Wyner, Forney and Viterbi, to name some of the prominent contributors. The concept of cutoff rate was also developed by Shannon, but was later used by Wozencraft, Jacobs and Kennedy as a design parameter for communication systems. Jordan used the concept of cutoff rate to design coded waveforms for M-ary orthogonal signals with coherent and non-coherent detection. Cutoff rates have been widely used as a design criterion for various channels, including fading channels encountered in wireless communications.
SUMMARY

• The conditional probability P(y_i | x_j) is called the channel transition probability and is denoted by p_ji. The conditional probabilities {P(y_i | x_j)} that characterize a DMC can be arranged in the matrix form P = [p_ji]. P is known as the probability transition matrix for the channel.

• The capacity of a discrete memoryless channel (DMC) is defined as the maximum average mutual information in any single use of the channel, where the maximization is over all possible input probabilities. That is,

C = max_{P(x_j)} I(X; Y) = max_{P(x_j)} Σ_{j=0}^{q-1} Σ_{i=0}^{Q-1} P(x_j) P(y_i | x_j) log [ P(y_i | x_j) / P(y_i) ]

• The basic objective of channel coding is to increase the resistance of the digital communication system to channel noise. This is done by adding redundancies in the transmitted data stream in a controlled manner. Channel coding is also referred to as error control coding.

• The ratio r = k/n is called the code rate. The code rate of any coding scheme is always less than unity.

• Let a DMS with an alphabet X have entropy H(X) and produce symbols every T_s seconds. Let a DMC have capacity C and be used once every T_c seconds. Then, if H(X)/T_s ≤ C/T_c, there exists a coding scheme for which the source output can be transmitted over the noisy channel and be reconstructed with an arbitrarily low probability of error. This is the Channel Coding Theorem or the Noisy Coding Theorem.

• For H(X)/T_s > C/T_c, it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error. The parameter C/T_c is called the Critical Rate.

• The information capacity can be expressed as C = W log2(1 + P/(N0 W)) bits per second. This is the basic formula for the capacity of the band-limited AWGN waveform channel with a band-limited and average power-limited input. This is the crux of the Information Capacity Theorem, which is also called the Channel Capacity Theorem.

• The cutoff rate R0 is given by R0 = log2[2/(1 + e^(-E/N0))] = 1 - log2(1 + e^(-E/N0)). The cutoff rate has the units of bits/dimension. Note that 0 ≤ R0 ≤ 1. The average error probability in terms of the cutoff rate can be written as P̄_e < 2^(-n(R0 - R_c)). For R_c < R0 the average probability of error P̄_e → 0 as n → ∞.

PROBLEMS

2.1 Consider the binary channel shown in Fig. 2.10. Let the a priori probabilities of sending the binary symbols be p0 and p1, where p0 + p1 = 1. Find the a posteriori probabilities P(X = 0 | Y = 0) and P(X = 1 | Y = 1).

Fig. 2.10 A binary channel with transition probabilities q and 1 - q.

2.2 Find the capacity of the binary erasure channel shown in Fig. 2.11, where p0 and p1 are the a priori probabilities.

Fig. 2.11 A binary erasure channel with erasure output e and erasure probability q.

2.3 Consider the channels A, B and the cascaded channel AB shown in Fig. 2.12.
(a) Find C_A, the capacity of channel A.
(b) Find C_B, the capacity of channel B.
(c) Next, cascade the two channels and determine the combined capacity C_AB.
(d) Explain the relation between C_A, C_B and C_AB.
Fig. 2.12 Channels A and B, and the cascaded channel AB.

2.4 Find the capacity of the channel shown in Fig. 2.13.

Fig. 2.13 A channel with transition probabilities of 0.5.

2.5 (a) A telephone channel has a bandwidth of 3000 Hz and an SNR of 20 dB. Determine the channel capacity.
(b) If the SNR is increased to 25 dB, determine the capacity.

2.6 Determine the channel capacity of the channel shown in Fig. 2.14.

Fig. 2.14 A binary channel with transition probabilities p and 1 - p.

2.7 Suppose a TV displays 30 frames/second. There are approximately 2 x 10^5 pixels per frame, each pixel requiring 16 bits for colour display. Assuming an SNR of 25 dB, calculate the bandwidth required to support the transmission of the TV video signal (use the Information Capacity Theorem).

2.8 Consider the Z channel shown in Fig. 2.15.
(a) Find the input probabilities that result in capacity.
(b) If N such channels are cascaded, show that the combined channel can be represented by an equivalent Z channel with channel transition probability p^N.
(c) What is the capacity of the combined channel as N → ∞?

Fig. 2.15 The Z channel with transition probabilities p and 1 - p.

2.9 Consider a communication system using antipodal signalling. The SNR is 20 dB.
(a) Find the cutoff rate, R0.
(b) We want to design a code which results in an average probability of error P_e < 10^-6. What is the best code rate we can achieve?
(c) What will be the dimensionality, n, of this code?
(d) Repeat parts (a), (b) and (c) for an SNR of 5 dB. Compare the results.

2.10 (a) Prove that for a finite variance σ², the Gaussian random variable has the largest differential entropy attainable by any random variable.
(b) Show that this entropy is given by (1/2) log2(2πeσ²).

COMPUTER PROBLEMS

2.11 Write a computer program that takes in the channel transition probability matrix and computes the capacity of the channel.

2.12 Plot the operating points on the bandwidth efficiency diagram for M-PSK, M = 2, 4, 8, 16 and 32, and the probabilities of error: (a) P_e = 10^-6 and (b) P_e = 10^-8.

2.13 Write a program that implements the binary repetition code of rate 1/n, where n is an odd integer. Develop a decoder for the repetition code. Test the performance of this coding scheme over a BSC with channel transition probability p. Generalize the program for a repetition code of rate 1/n over GF(q). Plot the residual Bit Error Rate (BER) versus p and q (make a 3-D mesh plot).
3 Linear Block Codes for Error Correction

[Chapter epigraph — Richard W. Hamming]

3.1 INTRODUCTION TO ERROR CORRECTING CODES

In this age of information, there is an increasing need not only for speed, but also for accuracy in the storage, retrieval, and transmission of data. The channels over which messages are transmitted are often imperfect. Machines do make errors, and their non-man-made mistakes can turn otherwise flawless programming into worthless, even dangerous, trash. Just as architects design buildings that will stand even through an earthquake, their computer counterparts have come up with sophisticated techniques capable of counteracting the digital manifestations of Murphy's Law ("If anything can go wrong, it will"). Error Correcting Codes are a kind of safety net: the mathematical insurance against the vagaries of an imperfect digital world.

Error Correcting Codes, as the name suggests, are used for correcting errors when messages are transmitted over a noisy channel or when stored data is retrieved. The physical medium through which the messages are transmitted is called a channel (e.g. a telephone line, a satellite link, a wireless channel used for mobile communications etc.). Different kinds of channels are
prone to different kinds of noise, which corrupt the data being transmitted. The noise could be caused by lightning, human errors, equipment malfunction, voltage surges etc. Because these error correcting codes try to overcome the detrimental effects of noise in the channel, the encoding procedure is also called Channel Coding. Error control codes are also used for accurate transfer of information from one place to another, for example when storing data and reading it from a compact disc (CD). In this case, the error could be due to a scratch on the surface of the CD. The error correcting coding scheme will try to recover the original data from the corrupted version.

The basic idea behind error correcting codes is to add some redundancy, in the form of extra symbols, to a message prior to its transmission through a noisy channel. This redundancy is added in a controlled manner. The encoded message, when transmitted, might be corrupted by noise in the channel. At the receiver, the original message can be recovered from the corrupted one if the number of errors is within the limit for which the code has been designed. The block diagram of a digital communication system is illustrated in Fig. 3.1. Note that the most important block in the figure is that of noise, without which there would be no need for the channel encoder.

Example 3.1 Let us see how redundancy combats the effects of noise. The normal language that we use to communicate (say, English) has a lot of redundancy built into it. Consider the following sentence.

CODNG THEORY IS AN INTRSTNG SUBJCT.

As we can see, there are a number of errors in this sentence. However, due to familiarity with the language we may guess the original text to have read:

CODING THEORY IS AN INTERESTING SUBJECT.

What we have just used is an error correcting strategy that makes use of the in-built redundancy in the English language to reconstruct the original message from the corrupted one.

Fig. 3.1 Block diagram (and the principle) of a digital communication system: information source → channel encoder → modulator → channel (with noise) → demodulator → channel decoder → use of information. The source coder/decoder block has not been shown.

The objectives of a good error control coding scheme are:
(i) good error correcting capability in terms of the number of errors that it can rectify,
(ii) fast and efficient encoding of the message,
(iii) fast and efficient decoding of the received message,
(iv) maximum transfer of information bits per unit time (i.e., fewer overheads in terms of redundancy).

The first objective is the primary one. In order to increase the error correcting capability of a coding scheme one must introduce more redundancy. However, increased redundancy leads to a slower rate of transfer of the actual information. Thus objectives (i) and (iv) are not totally compatible. Also, as the coding strategies become more complicated in order to correct a larger number of errors, objectives (ii) and (iii) also become difficult to achieve.

In this chapter, we shall first learn the basic definitions of error control coding. These definitions, as we shall see, will be used throughout this book. The concept of Linear Block Codes will then be introduced. Linear Block Codes form a very large class of useful codes. We will see that it is very easy to work with the matrix description of these codes. In the later part of this chapter, we will learn how to efficiently decode these Linear Block Codes.
Finally, the notion of perfect codes and optimal linear codes will be introduced.

3.2 BASIC DEFINITIONS

Given here are some basic definitions, which will be frequently used here as well as in the later chapters.

Definition 3.1 A Word is a sequence of symbols.

Definition 3.2 A Code is a set of vectors called Codewords.

Definition 3.3 The Hamming Weight of a codeword (or of any vector) is equal to the number of non-zero elements in the codeword. The Hamming Weight of a codeword c is denoted by w(c). The Hamming Distance between two codewords is the number of places in which the codewords differ. The Hamming Distance between two codewords c1 and c2 is denoted by d(c1, c2). It is easy to see that d(c1, c2) = w(c1 - c2).

Example 3.2 Consider a code C with two codewords, C = {0100, 1111}, with Hamming Weights w(0100) = 1 and w(1111) = 4. The Hamming Distance between the two codewords is 3 because they differ in the 1st, 3rd and 4th places. Observe that w(0100 - 1111) = w(1011) = 3 = d(0100, 1111).

Example 3.3 For the code C = {01234, 43210}, the Hamming Weight of each codeword is 4 and the Hamming Distance between the codewords is 4 (because only the 3rd component of the two codewords is identical, while they differ in 4 places).
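The Hamming weight and Hamming distance of Definition 3.3 are simple to compute. The short sketch below is illustrative (the function names are not from the book); it checks the values quoted in Examples 3.2 and 3.3 for codewords represented as strings of symbols.

```python
def hamming_weight(word):
    """Number of non-zero symbols in a codeword given as a string."""
    return sum(symbol != '0' for symbol in word)

def hamming_distance(c1, c2):
    """Number of positions in which two equal-length codewords differ."""
    assert len(c1) == len(c2)
    return sum(a != b for a, b in zip(c1, c2))

print(hamming_weight('0100'), hamming_weight('1111'))   # 1 4
print(hamming_distance('0100', '1111'))                 # 3 (Example 3.2)
print(hamming_distance('01234', '43210'))               # 4 (Example 3.3)
```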
Definition 3.4 A Block Code consists of a set of fixed-length codewords. The fixed length of these codewords is called the Block Length and is typically denoted by n. Thus, a code of blocklength n consists of a set of codewords having n components.

A block code of size M defined over an alphabet with q symbols is a set of M q-ary sequences, each of length n. In the special case that q = 2, the symbols are called bits and the code is said to be a binary code. Usually, M = q^k for some integer k, and we call such a code an (n, k) code.

Example 3.4 The code C = {00000, 10100, 11110, 11001} is a block code of block length equal to 5. This code can be used to represent two-bit binary numbers as follows:

Uncoded bits | Codewords
00           | 00000
01           | 10100
10           | 11110
11           | 11001

Here M = 4, k = 2 and n = 5. Suppose we have to transmit a sequence of 1's and 0's using the above coding scheme, and say that the sequence to be encoded is 1 0 0 1 0 1 0 0 1 1 ... The first step is to break the sequence into groups of two bits (because we want to encode two bits at a time). So we partition it as follows: 10 01 01 00 11 ... Next, we replace each block by its corresponding codeword:

11110 10100 10100 00000 11001 ...

Thus 5 coded bits are sent for every 2 bits of uncoded message. It should be observed that for every 2 bits of information we are sending 3 extra bits (redundancy).

Definition 3.5 The Code Rate of an (n, k) code is defined as the ratio (k/n), and denotes the fraction of the codeword that consists of the information symbols. The code rate is always less than unity. The smaller the code rate, the greater the redundancy, i.e., more redundant symbols are present per information symbol in a codeword. A code with greater redundancy has the potential to detect and correct more symbols in error, but it reduces the actual rate of transmission of information.

Definition 3.6 The Minimum Distance of a code is the minimum Hamming distance between any two codewords. If the code C consists of the set of codewords {c_i, i = 0, 1, ..., M-1}, then the minimum distance of the code is given by d* = min d(c_i, c_j), i ≠ j. An (n, k) code with minimum distance d* is sometimes denoted by (n, k, d*).

Definition 3.7 The Minimum Weight of a code is the smallest weight of any non-zero codeword, and is denoted by w*.

Theorem 3.1 For a linear code the minimum distance is equal to the minimum weight of the code, i.e., d* = w*.

Intuitive proof: The distance d_ij between any two codewords c_i and c_j is simply the weight of the codeword formed by c_i - c_j. Since the code is linear, the difference of two codewords results in another valid codeword. Thus, the minimum weight of a non-zero codeword reflects the minimum distance of the code.

Definition 3.8 A Linear Code has the following properties:
(i) The sum of two codewords belonging to the code is also a codeword belonging to the code.
(ii) The all-zero word is always a codeword.
(iii) The minimum Hamming distance between two codewords of a linear code is equal to the minimum weight of any non-zero codeword, i.e., d* = w*.

Note that if the sum of two codewords is another codeword, the difference of two codewords will also yield a valid codeword. For example, if c1, c2 and c3 are valid codewords such that c1 + c2 = c3, then c3 - c1 = c2. Hence it is obvious that the all-zero codeword must always be a valid codeword for a linear block code (self-subtraction of a codeword).

Example 3.5 The code C = {0000, 1010, 0101, 1111} is a linear block code of block length n = 4. Observe that all ten possible sums of the codewords,

0000 + 0000 = 0000, 0000 + 1010 = 1010, 0000 + 0101 = 0101, 0000 + 1111 = 1111, 1010 + 1010 = 0000, 1010 + 0101 = 1111, 1010 + 1111 = 0101, 0101 + 0101 = 0000, 0101 + 1111 = 1010 and 1111 + 1111 = 0000,

are in C, and the all-zero codeword is in C. The minimum distance of this code is d* = 2. In order to verify the minimum distance of this linear code we can determine the distance between all pairs of codewords (which are C(4, 2) = 6 in number):

d(0000, 1010) = 2, d(0000, 0101) = 2, d(0000, 1111) = 4,
d(1010, 0101) = 4, d(1010, 1111) = 2, d(0101, 1111) = 2.

We observe that the minimum distance of this code is 2.

Note that the code given in Example 3.4 is not linear because 10100 + 11110 = 01010, which is not a valid codeword. Even though the all-zero word is a valid codeword, it does not guarantee linearity. The presence of an all-zero codeword is thus a necessary but not a sufficient condition for linearity.
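For small binary codes, linearity and the minimum distance can be checked exhaustively, which makes Theorem 3.1 and Examples 3.4 and 3.5 easy to verify. The sketch below is illustrative (binary codes only; the function names are not from the book).

```python
from itertools import combinations

def xor_words(a, b):
    """Componentwise sum over GF(2) of two binary strings."""
    return ''.join('1' if x != y else '0' for x, y in zip(a, b))

def is_linear(code):
    """A binary block code is linear if it is closed under GF(2) addition."""
    return all(xor_words(a, b) in code for a in code for b in code)

def minimum_distance(code):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(code, 2))

C1 = {'0000', '1010', '0101', '1111'}        # Example 3.5: linear, d* = 2
C2 = {'00000', '10100', '11110', '11001'}    # Example 3.4: not linear
print(is_linear(C1), minimum_distance(C1))   # True 2
print(is_linear(C2))                         # False
```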
Example 3.5 The code C = {0000, 1010, 0101, 1111} is a linear block code of block length n = 4. Observe that all the ten possible sums of the codewords, 0000 + 0000 = 0000, 0000 + 1010 = 1010, 0000 + 0101 = 0101, 0000 + 1111 = 1111, 1010 + 1010 = 0000, 1010 + 0101 = 1111, 1010 + 1111 = 0101, 0101 + 0101 = 0000, 0101 + 1111 = 1010 and 1111 + 1111 = 0000, are in C, and the all-zero codeword is in C. The minimum distance of this code is d* = 2. In order to verify the minimum distance of this linear code we can determine the distance between all pairs of codewords (which are C(4, 2) = 6 in number):

    d(0000, 1010) = 2, d(0000, 0101) = 2, d(0000, 1111) = 4,
    d(1010, 0101) = 4, d(1010, 1111) = 2, d(0101, 1111) = 2.

We observe that the minimum distance of this code is 2. Note that the code given in Example 3.4 is not linear because 1010 + 1111 = 0101, which is not a valid codeword. Even though the all-zero word is a valid codeword, it does not guarantee linearity. The presence of an all-zero codeword is thus a necessary but not a sufficient condition for linearity.
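The closure test carried out by hand in Example 3.5 is easy to automate. A minimal sketch, assuming binary codewords represented as tuples of 0s and 1s (the function name is illustrative):

    from itertools import product

    def is_linear_binary(code):
        # A binary code is linear iff the modulo-2 sum of every pair of
        # codewords is again a codeword.
        codewords = set(code)
        for c1, c2 in product(codewords, repeat=2):
            s = tuple((a + b) % 2 for a, b in zip(c1, c2))
            if s not in codewords:
                return False
        return True

    C = {(0,0,0,0), (1,0,1,0), (0,1,0,1), (1,1,1,1)}            # Example 3.5: linear
    D = {(0,0,0,0,0), (1,0,1,0,0), (1,1,1,1,0), (1,1,0,0,1)}    # Example 3.4: not linear
    print(is_linear_binary(C), is_linear_binary(D))             # True False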
In order to make the error correcting codes easier to use, understand and analyze, it is helpful to impose some basic algebraic structure on them. As we shall soon see, it is useful to have an alphabet wherein it is easy to carry out basic mathematical operations such as add, subtract, multiply and divide.

Definition 3.9 A field F is a set of elements with two operations + (addition) and . (multiplication) satisfying the following properties:
(i) F is closed under + and ., i.e., a + b and a . b are in F if a and b are in F.
For all a, b and c in F, the following hold:
(ii) Commutative laws: a + b = b + a, a . b = b . a
(iii) Associative laws: (a + b) + c = a + (b + c), a . (b . c) = (a . b) . c
(iv) Distributive law: a . (b + c) = a . b + a . c
Further, identity elements 0 and 1 must exist in F satisfying:
(v) a + 0 = a
(vi) a . 1 = a
(vii) For any a in F, there exists an additive inverse (-a) such that a + (-a) = 0.
(viii) For any non-zero a in F, there exists a multiplicative inverse (a^-1) such that a . a^-1 = 1.

The above properties are true for fields with both finite as well as infinite elements. A field with a finite number of elements (say, q) is called a Galois Field (pronounced 'galva' field) and is denoted by GF(q). If only the first seven properties are satisfied, then it is called a ring.

Example 3.6 Consider GF(4) with 4 elements {0, 1, 2, 3}. The addition and multiplication tables for GF(4) are

    +  0 1 2 3        .  0 1 2 3
    0  0 1 2 3        0  0 0 0 0
    1  1 0 3 2        1  0 1 2 3
    2  2 3 0 1        2  0 2 3 1
    3  3 2 1 0        3  0 3 1 2

It should be noted here that the addition in GF(4) is not modulo-4 addition.

Let us define a vector space, GF(q)^n, which is a set of n-tuples of elements from GF(q). Linear block codes can be looked upon as a set of n-tuples (vectors of length n) over GF(q) such that the sum of two codewords is also a codeword, and the product of any codeword by a field element is a codeword. Thus, a linear block code is a subspace of GF(q)^n. Let S be a set of vectors of length n whose components are defined over GF(q). The set of all linear combinations of the vectors of S is called the linear span of S and is denoted by <S>. The linear span is thus a subspace of GF(q)^n, generated by S. Given any subset S of GF(q)^n, it is possible to obtain a linear code C = <S> generated by S, consisting of precisely the following codewords:
(i) the all-zero word,
(ii) all words in S,
(iii) all linear combinations of two or more words in S.

Example 3.7 Let S = {1100, 0100, 0011}. All possible linear combinations of S are 1100 + 0100 = 1000, 1100 + 0011 = 1111, 0100 + 0011 = 0111, 1100 + 0100 + 0011 = 1011. Therefore, C = <S> = {0000, 1100, 0100, 0011, 1000, 1111, 0111, 1011}. The minimum distance of this code is w(0100) = 1.

Example 3.8 Let S = {12, 21} defined over GF(3). The addition and multiplication tables of the field GF(3) = {0, 1, 2} are given by:

    +  0 1 2        .  0 1 2
    0  0 1 2        0  0 0 0
    1  1 2 0        1  0 1 2
    2  2 0 1        2  0 2 1

All possible linear combinations of 12 and 21 are: 12 + 21 = 00, 12 + 2(21) = 21, 2(12) + 21 = 12. Therefore, C = <S> = {00, 12, 21, 00, 21, 12} = {00, 12, 21}.

3.3 MATRIX DESCRIPTION OF LINEAR BLOCK CODES

As we have observed earlier, any code C is a subspace of GF(q)^n. Any set of basis vectors can be used to generate the code space. We can, therefore, define a generator matrix, G, the rows of which form the basis vectors of the subspace. The rows of G will be linearly independent. Thus, a linear combination of the rows can be used to generate the codewords of C.
The generator matrix will be a k x n matrix with rank k. Since the choice of the basis vectors is not unique, the generator matrix is not unique for a given linear code.
The generator matrix converts (encodes) a vector of length k to a vector of length n. Let the input vector (uncoded symbols) be represented by i. The coded symbols will be given by

    c = iG                                                          (3.1)

where c is called the codeword and i is called the information word. The generator matrix provides a concise and efficient way of representing a linear block code. The k x n matrix can generate q^k codewords. Thus, instead of having a large look-up table of q^k codewords, one can simply store a generator matrix. This provides an enormous saving in storage space for large codes. For example, for the binary (46, 24) code the total number of codewords is 2^24 = 16,777,216, and the size of the look-up table of codewords would be n x 2^k = 771,751,936 bits. On the other hand, if we use a generator matrix, the total storage requirement is only n x k = 46 x 24 = 1104 bits.

Example 3.9 Consider the generator matrix

    G = [1 0 1]
        [0 1 0]

The four information words and the corresponding codewords are

    c1 = [0 0] G = [0 0 0],   c2 = [0 1] G = [0 1 0],
    c3 = [1 0] G = [1 0 1],   c4 = [1 1] G = [1 1 1].

Therefore, this generator matrix generates the code C = {000, 010, 101, 111}.

3.4 EQUIVALENT CODES

Definition 3.10 A permutation of a set S = {x1, x2, ..., xn} is a one-to-one mapping from S to itself. A permutation can be denoted as follows:

    x1      x2      ...   xn
    ↓       ↓             ↓                                         (3.2)
    f(x1)   f(x2)   ...   f(xn)

Definition 3.11 Two q-ary codes are called equivalent if one can be obtained from the other by one or both of the operations listed below:
(i) permutation of the symbols appearing in a fixed position,
(ii) permutation of the positions of the code.

Suppose a code containing M codewords is displayed in the form of an M x n matrix, where the rows represent the codewords. Operation (i) corresponds to the re-labelling of the symbols appearing in a given column, and operation (ii) represents the rearrangement of the columns of the matrix.

Example 3.10 Consider the ternary code (a code whose components belong to {0, 1, 2}) of block length 3

    C = {1 2 0
         2 0 1
         0 1 2}

If we apply the permutation 0 -> 2, 2 -> 1, 1 -> 0 to column 2 and 1 -> 2, 0 -> 1, 2 -> 0 to column 3, we obtain

    C1 = {1 1 1
          2 2 2
          0 0 0}

The code C1 is equivalent to a repetition code of length 3. Note that the original code is not linear, but is equivalent to a linear code.

Definition 3.12 Two linear q-ary codes are called equivalent if one can be obtained from the other by one or both of the operations listed below:
(i) multiplication of the components by a non-zero scalar,
(ii) permutation of the positions of the code.

Note that in Definition 3.11 we have defined equivalent codes that are not necessarily linear.

Theorem 3.2 Two k x n matrices generate equivalent linear (n, k) codes over GF(q) if one matrix can be obtained from the other by a sequence of the following operations:
(i) Permutation of rows
(ii) Multiplication of a row by a non-zero scalar
(iii) Addition of a scalar multiple of one row to another
(iv) Permutation of columns
(v) Multiplication of any column by a non-zero scalar.
Proof: The first three operations (which are just row operations) preserve the linear independence of the rows of the generator matrix. These operations merely modify the basis. The last two operations (which are column operations) convert the matrix to one which will produce an equivalent code.

Theorem 3.3 A generator matrix can be reduced to its systematic form (also called the standard form of the generator matrix) of the type G = [I | P], where I is a k x k identity matrix and P is a k x (n - k) matrix.

Proof: The k rows of any generator matrix (of size k x n) are linearly independent. Hence, by performing elementary row operations and column permutations it is possible to obtain an equivalent generator matrix in a row echelon form. This matrix will be of the form [I | P].

Example 3.11 Consider the generator matrix of a (4, 3) code over GF(3):

    G = [0 1 2 1]
        [1 0 1 0]
        [1 2 2 1]

Let us represent the i-th row by r_i and the j-th column by c_j. Upon replacing r3 by r3 - r1 - r2 we get (note that in GF(3), -1 = 2 and -2 = 1 because 1 + 2 = 0; see the table in Example 3.8)

    G = [0 1 2 1]
        [1 0 1 0]
        [0 1 2 0]

Next we replace r1 by r1 - r3 to obtain

    G = [0 0 0 1]
        [1 0 1 0]
        [0 1 2 0]

Finally, shifting c4 -> c1, c1 -> c2, c2 -> c3 and c3 -> c4 we obtain the standard form of the generator matrix

    G = [1 0 0 0]
        [0 1 0 1]
        [0 0 1 2]

3.5 PARITY CHECK MATRIX

One of the objectives of a good code design is to have fast and efficient encoding and decoding methodologies. So far we have dealt with the efficient generation of linear block codes using a generator matrix. Codewords are obtained simply by multiplying the input vector (uncoded word) by the generator matrix. Is it possible to detect a valid codeword using a similar concept? The answer is yes, and such a matrix is called the Parity Check Matrix, H, for the given code. For a parity check matrix,

    cH^T = 0                                                        (3.3)

where c is a valid codeword. Since c = iG, therefore iGH^T = 0. For this to hold true for all valid information words we must have

    GH^T = 0                                                        (3.4)

The size of the parity check matrix is (n - k) x n. A parity check matrix provides a simple method of detecting whether an error has occurred or not. If the multiplication of the received word (at the receiver) with the transpose of H yields a non-zero vector, it implies that an error has occurred. This methodology, however, will fail if the errors in the transmitted codeword exceed the number of errors for which the coding scheme is designed. We shall soon find out that the non-zero product cH^T might help us not only to detect but also to correct the errors under some conditions. Suppose the generator matrix is represented in its systematic form G = [I | P]. The matrix P is called the Coefficient Matrix. Then the parity check matrix will be defined as

    H = [-P^T | I],                                                 (3.5)

where P^T represents the transpose of the matrix P. This is because

    GH^T = [I | P][-P^T | I]^T = -P + P = 0                         (3.6)

Since the choice of a generator matrix is not unique for a code, the parity check matrix will not be unique either. Given a generator matrix G, we can determine the corresponding parity check matrix and vice versa. Thus the parity check matrix H can be used to specify the code completely. From Eq. (3.3) we observe that the vector c must have 1's in such positions that the corresponding rows of H^T add up to the zero vector 0. Now, we know that the number of 1's in a codeword pertains to its Hamming weight.
Hence, the minimum distance d* of a linear block code is given by the minimum number of rows of H^T (or, equivalently, columns of H) whose sum is equal to the zero vector.
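The relations c = iG, H = [-P^T | I] and s = vH^T are all small matrix computations and are easy to verify numerically for binary codes. A minimal sketch (the function names are illustrative; arithmetic is modulo 2, and the systematic generator used in the demonstration is one for the code C = {0000, 1011, 0101, 1110}):

    def encode(i, G):
        # c = iG over GF(2), Eq. (3.1).
        return [sum(ib * gb for ib, gb in zip(i, col)) % 2 for col in zip(*G)]

    def parity_check_from_systematic(G):
        # G = [I_k | P]  ->  H = [P^T | I_(n-k)]  (over GF(2), -P^T = P^T), Eq. (3.5).
        k, n = len(G), len(G[0])
        P = [row[k:] for row in G]
        return [[P[i][j] for i in range(k)] + [int(l == j) for l in range(n - k)]
                for j in range(n - k)]

    def syndrome(v, H):
        # s = vH^T over GF(2); zero for every valid codeword.
        return [sum(vb * hb for vb, hb in zip(v, row)) % 2 for row in H]

    G = [[1, 0, 1, 1],
         [0, 1, 0, 1]]
    H = parity_check_from_systematic(G)      # [[1, 0, 1, 0], [1, 1, 0, 1]]
    c = encode([1, 1], G)                    # [1, 1, 1, 0]
    print(syndrome(c, H))                    # [0, 0]  -> valid codeword
    print(syndrome([1, 1, 0, 1], H))         # [1, 1]  -> an error has occurred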
Example 3.12 For a (7, 4) linear block code the generator matrix is given by

    G = [1 0 0 0 1 0 1]
        [0 1 0 0 1 1 1]
        [0 0 1 0 0 1 0]
        [0 0 0 1 0 1 0]

The matrix P is given by

    P = [1 0 1]
        [1 1 1]
        [0 1 0]
        [0 1 0]

and P^T is given by

    P^T = [1 1 0 0]
          [0 1 1 1]
          [1 1 0 0]

Observing the fact that -1 = 1 for the binary case, we can write the parity check matrix as

    H = [-P^T | I] = [1 1 0 0 1 0 0]
                     [0 1 1 1 0 1 0]
                     [1 1 0 0 0 0 1]

Note that the columns 1, 5 and 7 of the parity check matrix, H, add up to the zero vector. Hence, for this code, d* = 3.

Theorem 3.4 The code C contains a non-zero codeword of Hamming weight w or less if and only if a linearly dependent set of w columns of H exists.

Proof: Consider a codeword c in C. Let the weight of c be w, which implies that there are w non-zero components and (n - w) zero components in c. If we discard the (n - w) zero components, then from the relation cH^T = 0 we can conclude that w columns of H are linearly dependent. Conversely, if H has w linearly dependent columns, then a linear combination of at most w columns is zero. These w non-zero coefficients would define a codeword of weight w or less that satisfies cH^T = 0.

Definition 3.13 An (n, k) systematic code is one in which the first k symbols of the codeword of block length n are the information symbols themselves (i.e., the uncoded vector), and the remaining (n - k) symbols form the parity symbols.

Example 3.13 The following is a (5, 2) systematic code over GF(3):

    S. No.   Information symbols (k = 2)   Codewords (n = 5)
    1.       00                            00 000
    2.       01                            01 121
    3.       02                            02 220
    4.       10                            10 012
    5.       11                            11 221
    6.       12                            12 210
    7.       20                            20 020
    8.       21                            21 100
    9.       22                            22 212

Note that the total number of codewords is 3^k = 3^2 = 9. Each codeword begins with the information symbols and has three parity symbols at the end. The parity symbols for the information word 01 are 121 in the above table. A generator matrix in the systematic form (standard form) will generate a systematic code.

Theorem 3.5 The minimum distance (minimum weight) of an (n, k) linear code is bounded as follows:

    d* ≤ n - k + 1                                                  (3.7)

This is known as the Singleton Bound.

Proof: We can reduce any linear block code to its equivalent systematic form. Consider a systematic codeword having exactly one non-zero information symbol. At most all of its (n - k) parity symbols can be non-zero, so the weight of this codeword is at most (n - k) + 1. The minimum weight of the code therefore cannot exceed n - k + 1.

This gives the following definition of a maximum distance code.

Definition 3.14 A Maximum Distance Code satisfies d* = n - k + 1.

Having familiarized ourselves with the concept of the minimum distance of a linear code, we shall now explore how this minimum distance is related to the total number of errors the code can detect and possibly correct. So we move over to the receiver end and take a look at the methods of decoding a linear block code.

3.6 DECODING OF A LINEAR BLOCK CODE

The basic objective of channel coding is to detect and correct errors when messages are transmitted over a noisy channel. The noise in the channel randomly transforms some of the
symbols of the transmitted codeword into some other symbols. If the noise, for example, changes just one of the symbols in the transmitted codeword, the erroneous codeword will be at a Hamming distance of one from the original codeword. If the noise transforms t symbols (that is, t symbols in the codeword are in error), the received word will be at a Hamming distance of t from the originally transmitted codeword. Given a code, how many errors can it detect and how many can it correct?

Let us first look at the detection problem. An error will be detected as long as it does not transform one codeword into another valid codeword. If the minimum distance between the codewords is d*, the weight of the error pattern must be d* or more to cause a transformation of one codeword into another. Therefore, an (n, k, d*) code will detect at least all non-zero error patterns of weight less than or equal to (d* - 1). Moreover, there is at least one error pattern of weight d* which will not be detected. This corresponds to the two codewords that are the closest. It may be possible that some error patterns of weight d* or more are detected, but all error patterns of weight d* will not be detected.

Example 3.14 For the code C1 = {000, 111} the minimum distance is 3. Therefore error patterns of weight 2 or 1 can be detected. This means that any error pattern belonging to the set {011, 101, 110, 001, 010, 100} will be detected by this code. Next consider the code C2 = {001, 110, 101} with d* = 1. Nothing can be said regarding how many errors this code can detect because d* - 1 = 0. However, the error pattern 010 of weight 1 can be detected by this code. But it cannot detect all error patterns of weight one, e.g., the error vector 100 cannot be detected.

Next let us look at the problem of error correction. The objective is to make the best possible guess regarding the originally transmitted codeword on the basis of the received word. What would be a smart decoding strategy? Since only one of the valid codewords must have been transmitted, it is logical to conclude that a valid codeword nearest (in terms of Hamming distance) to the received word must have been actually transmitted. In other words, the codeword which resembles the received word most is assumed to be the one that was sent. This strategy is called Nearest Neighbour Decoding, as we are picking the codeword nearest to the received word in terms of the Hamming distance. It may be possible that more than one codeword is at the same Hamming distance from the received word. In that case the receiver can do one of the following: (i) pick one of the equally distant neighbours randomly, or (ii) request the transmitter to re-transmit.

To ensure that the received word (that has at most t errors) is closest to the original codeword, and farther from all other codewords, we must put the following condition on the minimum distance of the code:

    d* ≥ 2t + 1                                                     (3.8)

Graphically, the condition for correcting t errors or less can be visualized from Fig. 3.2. Consider the space of all q-ary n-tuples. Every q-ary vector of length n can be represented as a point in this space. Every codeword can thus be depicted as a point in this space, and all words at a Hamming distance of t or less would lie within the sphere centred at the codeword and with a radius of t.
If the minimum distance of the code is d*, and the condition d* ≥ 2t + 1 holds good, then none of these spheres intersect. Any received vector (which is just a point) within a specific sphere will be closer to its centre (which represents a codeword) than to any other codeword. We will call the sphere associated with each codeword its Decoding Sphere. Hence it is possible to decode the received vector using the 'nearest neighbour' method without ambiguity.

Fig. 3.2 Decoding Spheres: words within the sphere of radius t centred at c1 will be decoded as c1. For unambiguous decoding, d* ≥ 2t + 1.

The condition d* ≥ 2t + 1 takes care of the worst case scenario. It may be possible, however, that the above condition is not met but it is still feasible to correct t errors, as illustrated in the following example.

Example 3.15 Consider the code C = {00000, 01010, 10101, 11111}. The minimum distance is d* = 2. Suppose the codeword 11111 was transmitted and the received word is 11110, i.e., t = 1 (one error has occurred, in the fifth component). Now,

    d(11110, 00000) = 4, d(11110, 01010) = 2, d(11110, 10101) = 3, d(11110, 11111) = 1.

Using nearest neighbour decoding we can conclude that 11111 was transmitted. Even though a single error correction (t = 1) was done in this case, d* < 2t + 1 = 3. So it is possible to correct
errors even when the condition d* ≥ 2t + 1 is not satisfied. However, in many cases a single error correction may not be possible with this code. For example, if 00000 was sent and 01000 was received,

    d(01000, 00000) = 1, d(01000, 01010) = 1, d(01000, 10101) = 4, d(01000, 11111) = 4.

In this case there cannot be a clear-cut decision, and a coin will have to be flipped!

Definition 3.15 An Incomplete Decoder decodes only those received words that are clearly closest to one of the codewords. In the case of ambiguity, the decoder declares that the received word is unrecognizable, and the transmitter is then requested to re-transmit. A Complete Decoder decodes every received word, i.e., it tries to map every received word to some codeword, even if it has to make a guess.

Example 3.15 was that of a complete decoder. Such decoders may be used when it is better to have a good guess rather than to have no guess at all. Most real-life decoders are incomplete decoders; usually they send a message back to the transmitter requesting a re-transmission.

Definition 3.16 A receiver declares that an erasure has occurred (i.e., a received symbol has been erased) when the symbol is received ambiguously, or the presence of an interference is detected during reception.

Example 3.16 Consider a binary Pulse Amplitude Modulation (PAM) scheme where 1 is represented by five volts and 0 is represented by zero volts. The noise margin is one volt, which implies that at the receiver:
if the received voltage is between 4 volts and 5 volts -> the bit sent is 1,
if the received voltage is between 0 volts and 1 volt -> the bit sent is 0,
if the received voltage is between 1 volt and 4 volts -> an erasure has occurred.
Thus if the receiver received 2.9 volts during a bit interval, it will declare that an erasure has occurred.

A channel can be prone both to errors and erasures. If in such a channel t errors and r erasures occur, the error correcting scheme should be able to compensate for the erasures as well as correct the errors. If r erasures occur, the minimum distance of the code will become d* - r in the worst case. This is because the erased symbols have to be simply discarded, and if they were contributing to the minimum distance, this distance will reduce. A simple example will illustrate the point. Consider the repetition code in which

    0 -> 00000
    1 -> 11111

Here d* = 5. If r = 2, i.e., two bits get erased (let us say the first two), we will have

    0 -> ??000
    1 -> ??111

Now, the effective minimum distance is d1* = d* - r = 3. Therefore, for a channel with t errors and r erasures,

    d* - r ≥ 2t + 1, or d* ≥ 2t + r + 1                             (3.9)

For a channel which has no errors (t = 0), only r erasures,

    d* ≥ r + 1                                                      (3.10)

Next let us give a little more formal treatment to the decoding procedure. Can we construct some mathematical tools to simplify nearest neighbour decoding? Suppose the codeword c = c1 c2 ... cn is transmitted over a noisy channel. The noise in the channel changes some or all of the symbols of the codeword. Let the received vector be denoted by v = v1 v2 ... vn. Define the error vector as

    e = v - c = (v1 - c1, v2 - c2, ..., vn - cn) = (e1, e2, ..., en)        (3.11)

The decoder has to decide from the received vector, v, which codeword was transmitted, or equivalently, it must determine the error vector, e.

Definition 3.17 Let C be an (n, k) code over GF(q) and a be any vector of length n. Then the set

    a + C = {a + x | x ∈ C}                                         (3.12)

is called a Coset (or translate) of C.
a and b are said to be in the same coset if (a - b) ∈ C.

Theorem 3.6 Suppose C is an (n, k) code over GF(q). Then,
(i) every vector b of length n is in some coset of C,
(ii) each coset contains exactly q^k vectors,
(iii) two cosets are either disjoint or coincide (partial overlap is not possible),
(iv) if a + C is a coset of C and b ∈ a + C, then b + C = a + C.

Proof
(i) b = b + 0 ∈ b + C.
(ii) Observe that the mapping C -> a + C defined by x -> a + x, for all x ∈ C, is a one-to-one mapping. Thus the cardinality of a + C is the same as that of C, which is equal to q^k.
(iii) Suppose the cosets a + C and b + C overlap, i.e., they have at least one vector in common. Let v ∈ (a + C) ∩ (b + C). Then, for some x, y ∈ C, v = a + x = b + y. Or, b = a + x - y = a + z, where z ∈ C (because the difference of two codewords is also a codeword). Thus, b + C = a + z + C ⊆ a + C. Similarly, it can be shown that (a + C) ⊆ (b + C). From these two we can conclude that (b + C) = (a + C).
(iv) Since b ∈ a + C, it implies that b = a + x for some x ∈ C. Next, if b + y ∈ b + C, then b + y = (a + x) + y = a + (x + y) ∈ a + C. Hence, b + C ⊆ a + C. On the other hand, if a + z ∈ a + C, then a + z = (b - x) + z = b + (z - x) ∈ b + C. Hence, a + C ⊆ b + C, and so b + C = a + C.

Definition 3.18 The vector having the minimum weight in a coset is called the Coset Leader. If there is more than one vector with the minimum weight, one of them is chosen at random and declared the coset leader.

Example 3.17 Let C be the binary (3, 2) code with the generator matrix given by

    G = [1 0 1]
        [0 1 0]

i.e., C = {000, 010, 101, 111}. The cosets of C are

    000 + C = 000, 010, 101, 111,
    001 + C = 001, 011, 100, 110.

Note that all the eight vectors have been covered by these two cosets. As we have already seen (in the above theorem), if a + C is a coset of C and b ∈ a + C, we have b + C = a + C. Hence, all cosets have been listed. For the sake of illustration we write down the following:

    010 + C = 010, 000, 111, 101,
    011 + C = 011, 001, 110, 100,
    100 + C = 100, 110, 001, 011,
    101 + C = 101, 111, 000, 010,
    110 + C = 110, 100, 011, 001,
    111 + C = 111, 101, 010, 000.

It can be seen that all these sets are already covered.

Since two cosets are either disjoint or coincide (from Theorem 3.6), the set of all vectors GF(q)^n can be written as

    GF(q)^n = C ∪ (a1 + C) ∪ (a2 + C) ∪ ... ∪ (at + C), where t = q^(n-k) - 1.

Definition 3.19 A Standard Array for an (n, k) code C is a q^(n-k) x q^k array of all vectors in GF(q)^n in which the first row consists of the code C (with 0 on the extreme left), and the other rows are the cosets a_i + C, each arranged in corresponding order, with the coset leader on the left.

Steps for constructing a standard array:
(i) In the first row write down all the valid codewords, starting with the all-zero codeword.
(ii) Choose a vector a1 which is not in the first row. Write down the coset a1 + C as the second row such that a1 + x is written under x ∈ C.
(iii) Next choose another vector a2 (not present in the first two rows) of minimum weight and write down the coset a2 + C as the third row such that a2 + x is written under x ∈ C.
(iv) Continue the process until all the cosets are listed and every vector in GF(q)^n appears exactly once.

Example 3.18 Consider the code C = {0000, 1011, 0101, 1110}. The corresponding standard array is

    codewords ->   0000   1011   0101   1110
                   1000   0011   1101   0110
                   0100   1111   0001   1010
                   0010   1001   0111   1100
                     ^
                coset leaders

Note that each entry is the sum of the codeword at the top of its column and the coset leader of its row.

Let us now look at the concept of decoding (obtaining the information symbols from the received codewords) using the standard array. Since the standard array comprises all possible words belonging to GF(q)^n, the received word can always be identified with one of the elements of the standard array.
If the received word is a valid codeword, it is concluded that no errors have occurred (this conclusion may be wrong with a very low probability of error, when one valid codeword gets modified to another valid codeword due to noise!). In the case that the received word, v, does not belong to the set of valid codewords, we surmise that an error has occurred. The decoder then declares that the coset leader is the error vector, e, and decodes the codeword as v - e. This is the codeword at the top of the column containing v. Thus, mechanically, we decode the codeword as the one on the top of the column containing the received word.
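The standard-array procedure translates directly into code. A small sketch for binary codes (illustrative only; it builds the array from the list of codewords and then decodes by reading off the codeword at the top of the column):

    from itertools import product

    def build_standard_array(codewords, n):
        codewords = [tuple(c) for c in codewords]
        rows = [codewords]                       # first row: the code itself
        used = set(codewords)
        # remaining vectors, taken in order of increasing weight (coset leaders)
        candidates = sorted((v for v in product((0, 1), repeat=n) if v not in used),
                            key=lambda v: sum(v))
        for a in candidates:
            if a in used:
                continue
            row = [tuple((ai + xi) % 2 for ai, xi in zip(a, x)) for x in codewords]
            rows.append(row)
            used.update(row)
        return rows

    def decode(v, rows):
        # Find v in the array; the codeword at the top of its column is the estimate.
        for row in rows:
            if tuple(v) in row:
                return rows[0][row.index(tuple(v))]

    C = [(0,0,0,0), (1,0,1,1), (0,1,0,1), (1,1,1,0)]
    array = build_standard_array(C, 4)
    print(decode((1,1,0,1), array))   # (0, 1, 0, 1), the case worked out in Example 3.19 below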
Example 3.19 Suppose the code in the previous example, C = {0000, 1011, 0101, 1110}, is used and the received word is v = 1101. Since it is not one of the valid codewords, we deduce that an error has occurred. Next we try to estimate which one of the four possible codewords was actually transmitted. If we make use of the standard array of the earlier example, we find that 1101 lies in the 3rd column. The topmost entry of this column is 0101. Hence the estimated codeword is 0101. Observe that

    d(1101, 0000) = 3, d(1101, 1011) = 2, d(1101, 0101) = 1, d(1101, 1110) = 2,

and the error vector e = 1000, the coset leader.

Codes with larger block lengths are desirable (though not always; see the concluding remarks of this chapter) because the code rates of larger codes perform closer to the Shannon limit. As we go to larger codes (with larger values of k and n), the method of the standard array becomes less practical because the size of the standard array (q^(n-k) x q^k) becomes unmanageably large. One of the basic objectives of coding theory is to develop efficient decoding strategies. If we are to build decoders that will work in real time, the decoding scheme should be realizable both in terms of the memory required as well as the computational load. Is it possible to reduce the standard array? The answer lies in the concept of Syndrome Decoding, which we are going to discuss next.

3.7 SYNDROME DECODING

The standard array can be simplified if we store only the first column, and compute the remaining columns if needed. To do so, we introduce the concept of the Syndrome of the error pattern.

Definition 3.20 Suppose H is a parity check matrix of an (n, k) code. Then for any vector v ∈ GF(q)^n, the vector

    s = vH^T                                                        (3.13)

is called the Syndrome of v. The syndrome of v is sometimes explicitly written as s(v). It is called a syndrome because it gives us the symptoms of the error, thereby helping us to diagnose the error.

Theorem 3.7 Two vectors x and y are in the same coset of C if and only if they have the same syndrome.

Proof: The vectors x and y belong to the same coset
Suppose there are M codewords (of length n) which are used with equal probability. Let the decoding be done using a standard array. Let the number of coset leaders with weight i be denoted by a.,, We assume that the channel is a BSC with symbol error probabilityp. A decoding error occurs if the error vector e is rwt a coset leader. Therefore, the probability of correct decoding will be I
    P_cor = Σ_{i=0}^{n} α_i p^i (1 - p)^(n-i)                       (3.14)

Hence, the probability of error will be

    P_err = 1 - Σ_{i=0}^{n} α_i p^i (1 - p)^(n-i)                   (3.15)

Example 3.21 Consider the standard array in Example 3.18. The coset leaders are 0000, 1000, 0100 and 0010. Therefore α_0 = 1 (only one coset leader with weight equal to zero), α_1 = 3 (the remaining three are of weight one) and all other α_i = 0. Therefore,

    P_err = 1 - [(1 - p)^4 + 3p(1 - p)^3]

Recall that this code has four codewords, and can be used to send 2 bits at a time. If we did not perform coding, the probability of the 2-bit message being received incorrectly would be P_err = 1 - P_cor = 1 - (1 - p)^2. Note that for p = 0.01, the word error rate upon coding is P_err = 0.0103, while for the uncoded case P_err = 0.0199. So, coding has almost halved the word error rate. The comparison of P_err for messages with and without coding is plotted in Fig. 3.3. It can be seen that coding outperforms the uncoded case only for p < 0.5. Note that the improvement due to coding comes at the cost of the information transfer rate. In this example, the rate of information transfer has been cut down by half, as we are sending two parity bits for every two information bits.

Fig. 3.3 Comparison of P_err for coded and uncoded 2-bit messages (P_err plotted against p, for 0 ≤ p ≤ 1).

Example 3.22 This example will help us visualize the power of coding. Consider a BSC with the probability of symbol error p = 10^-7. Suppose 10-bit long words are being transmitted without coding. Let the bit rate of the transmitter be 10^7 bits/s, which implies that 10^6 words/s are being sent. The probability that a word is received incorrectly is

    C(10, 1)(1 - p)^9 p + C(10, 2)(1 - p)^8 p^2 + C(10, 3)(1 - p)^7 p^3 + ... ≈ C(10, 1)(1 - p)^9 p ≈ 10^-6.

Therefore, in one second, 10^-6 x 10^6 = 1 word will be in error! The implication is that every second a word will be in error and it will not be detected. Next, let us add a parity bit to the uncoded words so as to make them 11 bits long. The parity makes all the codewords of even parity and thus ensures that a single bit in error will be detected. The only way that the coded word will be in error is if two or more bits get flipped, i.e., at least two bits are in error. This can be computed as 1 minus the probability that fewer than two bits are in error. Therefore, the probability of word error will be

    1 - (1 - p)^11 - C(11, 1)(1 - p)^10 p ≈ 1 - (1 - 11p) - 11(1 - 10p)p = 110p^2 = 11 x 10^-13.

The new word rate will be 10^7/11 words/s because now 11 bits constitute one word and the bit rate is the same as before. Thus in one second, (10^7/11) x (11 x 10^-13) = 10^-6 words will be in error. This implies that, after coding, one word will be received incorrectly without detection every 10^6 seconds ≈ 11.5 days! So just by increasing the word length from 10 bits (uncoded) to 11 bits (with coding), we have been able to obtain a dramatic decrease in the word error rate. For the second case, each time a word is detected to be in error, we can request the transmitter to re-transmit the word. This strategy for retransmission is called Automatic Repeat Request (ARQ).

3.9 PERFECT CODES

Definition 3.22 For any vector u in GF(q)^n and any integer r ≥ 0, the sphere of radius r and centre u, denoted by S(u, r), is the set {v ∈ GF(q)^n | d(u, v) ≤ r}.

This definition can be interpreted graphically, as shown in Fig. 3.4. Consider a code C with minimum distance d*(C) ≥ 2t + 1.
The spheres of radius t centred at the codewords {c1, c2, ..., cM} of C will then be disjoint. Now consider the decoding problem. Any received vector can be represented as a point in this space. If this point lies within a sphere, then by nearest neighbour decoding it will be decoded as the centre of the sphere. If t or fewer errors occur, the received word will definitely lie within the sphere of the codeword that was transmitted, and will be correctly decoded. If, however, more than t errors occur, it will escape the sphere, thus resulting in incorrect decoding.
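Before moving on, note that the word-error-rate expressions (3.14) and (3.15) are simple finite sums and are easy to evaluate numerically. A small sketch that reproduces the numbers of Example 3.21 (illustrative only):

    def word_error_rate(alphas, p):
        # P_err = 1 - sum_i alpha_i * p^i * (1-p)^(n-i), Eq. (3.15);
        # alphas[i] = number of coset leaders of weight i, len(alphas) = n + 1.
        n = len(alphas) - 1
        p_cor = sum(a * (p ** i) * ((1 - p) ** (n - i)) for i, a in enumerate(alphas))
        return 1 - p_cor

    # Example 3.21: coset leaders 0000, 1000, 0100, 0010 -> alpha_0 = 1, alpha_1 = 3
    print(round(word_error_rate([1, 3, 0, 0, 0], 0.01), 4))   # 0.0103 (coded)
    print(round(1 - (1 - 0.01) ** 2, 4))                      # 0.0199 (uncoded)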
Fig. 3.4 The concept of spheres in GF(q)^n. The codewords of a code with d*(C) ≥ 2t + 1 are the centres of these non-overlapping spheres.

Theorem 3.8 A sphere of radius r (0 ≤ r ≤ n) contains exactly

    C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, r)(q - 1)^r        (3.16)

vectors.

Proof: Consider a vector u in GF(q)^n and another vector v which is at a distance m from u. This implies that the vectors u and v differ at exactly m places. The total number of ways in which m positions can be chosen from n positions is C(n, m). Now, each of these m places can be filled by any of (q - 1) symbols. This is because the total size of the alphabet is q, out of which one is currently being used in that particular position in u. Hence, the number of vectors at a distance exactly m from u is C(n, m)(q - 1)^m, and the total number of vectors at a distance r or less from u is

    C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, r)(q - 1)^r        (3.17)

Example 3.23 Consider a binary code (i.e., q = 2). The number of vectors at a distance 2 or less from any codeword is C(n, 0) + C(n, 1) + C(n, 2). Without loss of generality we can choose the fixed vector to be the all-zero vector 00...0; the vectors at a distance 2 or less from it are then the all-zero vector itself, the n vectors of weight one and the C(n, 2) vectors of weight two.

Theorem 3.9 A q-ary (n, k) code with M codewords and minimum distance (2t + 1) satisfies

    M {C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t} ≤ q^n        (3.18)

Proof: Suppose C is a q-ary (n, k) code. Consider spheres of radius t centred on the M codewords. Each sphere of radius t has

    C(n, 0) + C(n, 1)(q - 1) + C(n, 2)(q - 1)^2 + ... + C(n, t)(q - 1)^t

vectors (Theorem 3.8). Since none of the spheres intersect, the total number of vectors in the M disjoint spheres is

    M {C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t},

which is upper bounded by q^n, the total number of vectors of length n in GF(q)^n.

This bound is called the Hamming Bound or the Sphere Packing Bound, and it holds good for non-linear codes as well. For binary codes, the Hamming Bound becomes

    M {C(n, 0) + C(n, 1) + C(n, 2) + ... + C(n, t)} ≤ 2^n                       (3.19)

It should be noted here that the mere existence of a set of integers n, M and t satisfying the Hamming Bound does not confirm the existence of such a binary code. For example, the set n = 5, M = 5 and t = 1 satisfies the Hamming Bound; however, no binary code exists for this specification. Observe that for the case when M = q^k, the Hamming Bound may alternatively be written as

    q^(n-k) ≥ C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t                 (3.20)

Definition 3.23 A perfect code is a t-error correcting code that satisfies the Hamming Bound with equality, i.e.,

    M {C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t} = q^n                 (3.21)
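The left-hand side of the Hamming Bound is a simple finite sum, so candidate parameter sets (and the perfect-code condition of Definition 3.23) can be checked numerically. A minimal sketch, which is essentially what Computer Problem 3.16 at the end of this chapter asks for:

    from math import comb

    def sphere_volume(n, q, t):
        # Number of q-ary vectors within Hamming distance t of a fixed vector, Eq. (3.16).
        return sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))

    def satisfies_hamming_bound(n, q, M, t):
        return M * sphere_volume(n, q, t) <= q ** n

    def is_perfect(n, q, M, t):
        return M * sphere_volume(n, q, t) == q ** n

    print(satisfies_hamming_bound(5, 2, 5, 1))   # True, yet no such binary code exists
    print(is_perfect(23, 2, 2 ** 12, 3))         # True (parameters listed in the table below)
    print(is_perfect(7, 2, 2 ** 4, 1))           # True (the (7, 4) Hamming code)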
For a Perfect Code, there are equal-radius disjoint spheres centred at the codewords which completely fill the space. Thus, a t-error correcting perfect code utilizes the entire space in the most efficient manner.

Example 3.24 Consider the Binary Repetition Code

    C = {00...0, 11...1}

of block length n, where n is odd. In this case M = 2 and t = (n - 1)/2. Upon substituting these values in the left-hand side of the inequality for the Hamming Bound we get

    2 {C(n, 0) + C(n, 1) + ... + C(n, (n-1)/2)} = 2 . 2^(n-1) = 2^n.

Thus the repetition code is a Perfect Code. It is actually called a Trivial Perfect Code. In the next chapter, we shall see some examples of Non-trivial Perfect Codes.

One of the ways to search for perfect codes is to obtain the integer solutions for the parameters n, q, M and t in the equation for the Hamming Bound. Some of the solutions found by exhaustive computer search are listed below.

    S. No.   n    q   M      t
    1        23   2   2^12   3
    2        90   2   2^78   2
    3        11   3   3^6    2

3.10 HAMMING CODES

There are both binary and non-binary Hamming Codes. Here, we shall limit our discussion to binary Hamming Codes. The binary Hamming Codes have the property that

    (n, k) = (2^m - 1, 2^m - 1 - m)                                 (3.22)

where m is any positive integer. For example, for m = 3 we have a (7, 4) Hamming Code. The parity check matrix, H, of a Hamming Code is a very interesting matrix. Recall that the parity check matrix of an (n, k) code has n - k rows and n columns. For the binary (n, k) Hamming code, the n = 2^m - 1 columns consist of all possible binary vectors with n - k = m elements, except the all-zero vector.
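Since the columns of H are just all the non-zero m-bit vectors, the parity check matrix of a binary Hamming code can be generated mechanically for any m. A small sketch (the column ordering used here is one arbitrary choice among many, so the resulting code is equivalent to, though not necessarily identical with, the one in Example 3.25 below):

    def hamming_parity_check(m):
        # H has m rows and 2^m - 1 columns: every non-zero m-bit vector appears once.
        n = 2 ** m - 1
        columns = [[(j >> (m - 1 - i)) & 1 for i in range(m)] for j in range(1, n + 1)]
        return [[col[i] for col in columns] for i in range(m)]   # transpose to rows

    H = hamming_parity_check(3)
    for row in H:
        print(row)
    # 3 rows x 7 columns; e.g. the first row is [0, 0, 0, 1, 1, 1, 1]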
Example 3.25 The generator matrix for the binary (7, 4) Hamming Code is given by

    G = [1 1 0 1 0 0 0]
        [0 1 1 0 1 0 0]
        [0 0 1 1 0 1 0]
        [0 0 0 1 1 0 1]

The corresponding parity check matrix is

    H = [1 0 1 1 1 0 0]
        [0 1 0 1 1 1 0]
        [0 0 1 0 1 1 1]

Observe that the columns of the parity check matrix consist of (100), (010), (101), (110), (111), (011) and (001). These seven are all the possible non-zero binary vectors of length three. It is quite easy to generate a systematic Hamming Code. The parity check matrix H can be arranged in the systematic form as follows:

    H = [1 1 1 0 1 0 0]
        [0 1 1 1 0 1 0]  = [-P^T | I].
        [1 1 0 1 0 0 1]

Thus, the generator matrix in the systematic form for the binary Hamming code is

    G = [I | P] = [1 0 0 0 1 0 1]
                  [0 1 0 0 1 1 1]
                  [0 0 1 0 1 1 0]
                  [0 0 0 1 0 1 1]

From the above example, we observe that no two columns of H are linearly dependent (otherwise they would be identical). However, for m > 1, it is possible to identify three columns of H that add up to zero. Thus, the minimum distance, d*, of an (n, k) Hamming Code is equal to 3, which implies that it is a single-error correcting code. Hamming Codes are Perfect Codes.

By adding an overall parity bit, an (n, k) Hamming Code can be modified to yield an (n + 1, k) code with d* = 4. On the other hand, an (n, k) Hamming Code can be shortened to an (n - l, k - l) code by removing l rows of its generator matrix G or, equivalently, by removing l columns of its parity check matrix H. We can now give a more formal definition of Hamming Codes.
For example, in mobile radio communications, packets of data are restricted to fewer than 200 bits. In these cases, codewords with very large blocklengths cannot be used. SUMMARY • A Word is a sequence of symbols. A Code is a set of セ・」エッイウ@ called codewords. • The Hamming Weight of a codeword (or any vector) is equal to the number of non-zero elements in the codeword. The Hamming Weight of a codeword cis denoted by w(c). • A Block Code consists of a set of fixed length codewords. The fixed length of these codewords is called the Block Length and is typically denoted by n. A Block Coding Scheme converts a block of k information symbols to n coded symbols. Such a code is denoted by (n, k). • The Code Rate of an (n, k) code is defined as the ratio (kin), and reflects the fraction of the codeword that consists of the information symbols. • The minimum distance of a code is the minimum Hamming Distance between any two codewords. An (n, k) code with minimum distanced'' is sometimes denoted by (n, k, d). The minimum weight of a code is the smallest weight of any non-zero codeword, and is
denoted by w*. For a Linear Code the minimum distance is equal to the minimum weight of the code, i.e., d* = w*.
• A Linear Code has the following properties: (i) The sum of two codewords belonging to the code is also a codeword belonging to the code. (ii) The all-zero codeword is always a codeword. (iii) The minimum Hamming Distance between two codewords of a linear code is equal to the minimum weight of any non-zero codeword, i.e., d* = w*.
• The generator matrix converts (encodes) a vector of length k to a vector of length n. Let the input vector (uncoded symbols) be represented by i. The coded symbols are given by c = iG.
• Two q-ary codes are called equivalent if one can be obtained from the other by one or both of the operations listed below: (i) permutation of the symbols appearing in a fixed position, (ii) permutation of the positions of the code.
• An (n, k) Systematic Code is one in which the first k symbols of the codeword of block length n are the information symbols themselves. A generator matrix of the form G = [I | P] is called the systematic form or the standard form of the generator matrix, where I is a k x k identity matrix and P is a k x (n - k) matrix.
• The Parity Check Matrix, H, for a given code satisfies cH^T = 0, where c is a valid codeword. Since c = iG, therefore GH^T = 0. The Parity Check Matrix is not unique for a given code.
• A Maximum Distance Code satisfies d* = n - k + 1.
• For a code to be able to correct up to t errors, we must have d* ≥ 2t + 1, where d* is the minimum distance of the code.
• Let C be an (n, k) code over GF(q) and a be any vector of length n. Then the set a + C = {a + x | x ∈ C} is called a coset (or translate) of C. a and b are said to be in the same coset iff (a - b) ∈ C.
• Suppose H is a Parity Check Matrix of an (n, k) code. Then for any vector v ∈ GF(q)^n, the vector s = vH^T is called the Syndrome of v. It is called a syndrome because it gives us the symptoms of the error, thereby helping us to diagnose the error.
• A Perfect Code achieves the Hamming Bound with equality, i.e., M {C(n, 0) + C(n, 1)(q - 1) + ... + C(n, t)(q - 1)^t} = q^n.
• The binary Hamming Codes have the property that (n, k) = (2^m - 1, 2^m - 1 - m), where m is any positive integer. Hamming Codes are Perfect Codes.
• For an (n, k, d*) Optimal Code, no (n - 1, k, d*), (n + 1, k + 1, d*) or (n + 1, k, d* + 1) code exists.
• An (n, n - r, r + 1) code is called a Maximum Distance Separable (MDS) Code. An MDS code is a linear code of redundancy r whose minimum distance is equal to r + 1.

PROBLEMS

3.1 Show that C = {0000, 1100, 0011, 1111} is a linear code. What is its minimum distance?
3.2 Construct, if possible, binary (n, k, d*) codes with the following parameters: (i) (6, 1, 6) (ii) (3, 3, 1) (iii) (4, 3, 2)
3.3 Consider the following generator matrix over GF(2)

    G = [. . . . .]
        [. . . . .]
        [0 1 0 1 0]

(i) Generate all possible codewords using this matrix.
(ii) Find the parity check matrix, H.
(iii) Find the generator matrix of an equivalent systematic code.
(iv) Construct the standard array for this code.
(v) What is the minimum distance of this code?
(vi) How many errors can this code detect?
(vii) Write down the set of error patterns this code can detect.
(viii) How many errors can this code correct?
(ix) What is the probability of symbol error if we use this encoding scheme? Compare it with the uncoded probability of error.
(x) Is this a linear code?
3.4 For the code C = {00000, 10101, 01010, 11111} construct the generator matrix. Since this G is not unique, suggest another generator matrix that can also generate this set of codewords.
3.5 Show that if there is a binary (n, k, d) code with d even, then there exists a binary (n, k, d) code in which all codewords have even weight.
3.6 Show that if C is a binary linear code, then the code obtained by adding an overall parity check bit to C is also linear.
3.7 For each of the following sets S, list the code <S>.
(a) S = {0101, 1010, 1100}.
(b) S = {1000, 0100, 0010, 0001}.
(c) S = {11000, 01111, 11110, 01010}.
3.8 Consider the (23, 12, 7) binary code. Show that if it is used over a binary symmetric channel (BSC) with probability of bit error p = 0.01, the word error rate will be approximately 0.00008.
3.9 Suppose C is a binary code with parity check matrix H. Show that the extended code C1, obtained from C by adding an overall parity bit, has the parity check matrix

    H1 = [      H       | 0 ]
         [ 1  1  ...  1 | 1 ]

3.10 For a (5, 3) code over GF(4), the generator matrix is given by

    G = [1 0 0 . .]
        [0 1 0 . .]
        [0 0 1 1 3]

(i) Find the parity check matrix.
(ii) How many errors can this code detect?
(iii) How many errors can this code correct?
(iv) How many erasures can this code correct?
(v) Is this a perfect code?
3.11 Let C be a binary perfect code of length n with minimum distance 7. Show that n = 7 or n = 23.
3.12 Let r_H denote the code rate of the binary Hamming code. Determine the limit of r_H as k tends to infinity.
3.13 Show that a (15, 8, 5) code does not exist.

COMPUTER PROBLEMS

3.14 Write a computer program to find the minimum distance of a Linear Block Code over GF(2), given the generator matrix of the code.
3.15 Generalize the above program to find the minimum distance of any Linear Block Code over GF(q).
3.16 Write a computer program to exhaustively search for all the perfect code parameters n, q, M and t in the equation for the Hamming Bound. Search for 1 ≤ n ≤ 200, 2 ≤ q ≤ 11.
3.17 Write a computer program for a universal binary Hamming encoder with rate (2^m - 1 - m)/(2^m - 1). The program should take as input the value of m and a bit-stream to be encoded. It should then generate an encoded bit-stream. Develop a program for the decoder also.
  • 58. I' I Cyclic Codes We,t etn'lNe,t 。エセ@ not" by セ@ Ofll:y, butti4o- by the.- hecwt. ーセ@ GXセ@ (1623-1662) 4. 1 INTRODUCTION TO CYCLIC CODES In the previous chapter, while dealing with Linear Block Codes, certain linearity constraints were imposed on the structure of the block codes. These structural properties help セウ@ to search for good linear block codes that are fast and easy to encode and decode. In this chapter, we shall explore a subclass of linear block codes which has another constraint on the structure of the codes. The additional constraint is that any cyclic shift of a codeword results in another valid codeword. This condition allows very simple implementation of these cyclic codes by using shift registers. Efficient circuit implementation is a selling feature of any error control code. We shall also see that the theory of Galois Field can be used effectively to study, analyze and discover new cyclic codes. The Galois Field representation of cyclic codes leads to low- complexity encoding and decoding algorithms. This chapter is organized as follows. In the first two sections, we take a mathematical detour to polynomials. We will review some old concepts and learn a few new ones. Then, we will use these mathematical tools to construct and analyze cyclic codes. The matrix description of cyclic Cyclic Codes codes will be introduced next. We will then, discuss some popular cyclic codes. The chapter wili conclude with a discussion on circuit implementation of cyclic codes. Definition 4.1 A code Cis cyclic if (i) Cis a linear code, and, (ii) any cyclic shift of a codeword is also a codeword, i.e., if the codeword tzoa1 ••• セ Q@ is in C then an-lOo···an-2is also in C. Example 4.1 The binary code C1 = {0000, 0101, 1010, 1111} is a cyclic code. However C2 = {0000, 0110, 1001, 1111} is not a cyclic code, but is equivalentto the first code. Interchanging the third and the fourth components ofC2 yields C1. 4.2 POLYNOMIALS Definition 4.2 A polynomial is a mathematical expression f(x) =fo +fix+ ... +[,/', e (4.1) where the symbol xis called the indeterminate and the coefficientsfo,fi, ...,fm are the elements of GF (q). The coefficientfm is called the leading coefficient. Iffm # 0, then m is called the degree of the polynomial, and is denoted by deg f(x). Definition 4.3 A polynomial is called monic if its leading coefficient is unity. Example 4.2 j{x) = 3 + ?x + セ@ + 5x4 + x6 is a monic polynomial over GF(8). The degree of this polynomial is 6. Polynomials play an important role in the study of cyclic codes, the subject of this chapter. Let F[x] be the set of polynomials in x with coefficients in GF(q). Different polynomials in F[x] can be added, subtracted and multiplied in the usual manner. F[x] is an example of an algebraic , structure called a ring. A ring satisfies the first seven of the eight axioms that define a field (see Sec. 3.2 of Chapter 3). F[x] is not a field because polynomials of degree greater than zero do not· have a multiplicative inverse. It can be seen that if[(x), g(x) E F[x], then deg (f(x)g(x)) = degf(x) + deg g(x). However, deg (f(x) + g(x)) is not necessarily max{ deg f(x), deg g(x)}. For example, consider the two polynomials,f(x) and g(x), over GF(2) such thatf(x) = 1 + x2 and g(x) = 1 + x + x2 . Then, deg (f(x) + g(x)) = deg (x) = 1. This is because, in GF(2), 1 + 1 = 0, and x 2 + x 2 =(I+ 1); = 0.
  • 59. r-- Information Theory, Coding and Cryptography Example 4.3 Consider the polynomialsf(x) = 2 +x + セ@ + 2x4 and g{x) = 1 + '1f + 2x4 Kセッカ・イ@ GF(3). Then, f(i; + g(x) = (2 + 1) + x + (1 + RIセ@ + (2 + 2)x4 + セ@ = x + x4 + セN@ f(x). g(x) = (2 + x + セ@ + RクセH@ 1 + '1f + 2x4 + セI@ = 2 +X+ (1 + 2.2) セ@ + U + (2 + 2 + 2.2)x4 + (2 + 2) セ@ + (1 + 2 + l).li + x1 + 2.2x8 + セ@ = 2 + x + (1 + Qセ@ + U + (2 + 2 + l)x4 + (2 + Rセ@ + (1 + 2 + 1),/i + x1 + :/' Kセ@ = 2 + x + セ@ + :zx3 + 2x4 + :C + .li + x1 + x8 + セ@ Note that the addition and multiplication of the coefficients have been carried out in GF(3). Example 4.4 Consider the polynomialf(x) = 1 + x over GF(2). (f(x)) 2 = 1 + (1 + l)x + セ@ = 1 + セ@ Again considerf(x) = 1 + x over GF(3). (f(x))2 = 1 + HQKQIクKセ]@ 1 + RクKセ@ 4.3 THE DIVISION ALGORITHM FOR POLYNOMIALS The Division Algorithm states that, for every pair of polynomial a(x) and b(x) :t 0 in F[ x], there exists a unique pair of polynomials q(x), the quotient, and r(x), the remainder, such that a(x) = q(x) b(x) + r(x), where deg r(x) <deg b(x). The remainder is sometimes also called the residue, and is denoted by Rh(x) [a(x)] = r(x). Two important properties of residues are (i) Rtr.x) [a(x) + b(x)] = Rtr.x) [a(x)] + Rttx) [b(x)], and (ii) Rtr.x) [a(x). b(x)] = Rttx) {Rp_x) [a(x)]. Rttx) [b(x)]} where a(x), b(x) and f(x) are polynomials over GF(q). (4.2) (4.3) Example 4.5 Let the polynomials, a(x) = xl + x + land b(x) = セ@ + x + 1 be defined over GF{2). We can carry out the long division of a(x) by b(x) as follows x+l -q(x) b(x) -xl+x+ 1) X3+ x+ 1-a(x) _x3+ _x2 +X .xl+ .xl+x+ 1 x - r(x) Cyclic Codes Thus, a(x) = (x+ 1) b(x) + x. Hence, we may write a(x) = q(x) b(x) + r(x), where q(x) = x + 1 and r(x) = x. Note that deg r(x) <deg b(x). Definition 4.4 Let f(x) be a fixed polynomial in F(Xj. Two polynomials, g(x) and h(x) in F[x] are said to be congruent modulo f(x), depicted by g(x) =h(x) (modf(x)), if g(x) - h(x) is divisible by f(x). Example 4.6 Let the polynomials g(x) = x 9 + セ@ + 1, h(x) = セ@ + セ@ + 1 and f(x) = x 4 + 1 be defined over GF(2). Since g(x)- h(x) = J?j(x), we can write g(x) =h(x) (modf(x)). Next, let us denote F[x]!f(x) as the set ofpolynomials in F[x] of degree less than deg f(x), with addition and multiplication carried out modulo f(x) as follows: (i) If a(x) and b(x) belong to F[x]!f(x), then the sum a(x) + b(x) in F[x]lf(x) is the same as in F[x]. This is because deg a(x) <degf(x), deg b(x) <degf(x) and therefore deg (a(x) + h(x)) <deg f(x). (ii) The product a(x)b(x) is the unique polynomial of degree less than deg f(x) to which a(x)b(x) (multiplication being carried out in F[x]) is congruent modulo f(x). F[x]!f(x) is called the ring ofpolynomials (over F[x]) modulo f(x). As mentioned earlier, a ring satisfies the first seven of the eight axioms that define a field. A ring in which every element also has a multiplicative inverse forms a field. Example 4.7 Consider the product (x + 1)2 in f{ク}ャHセ@ + x + 1) defined over GF{2). (x + 1) 2 = セ@ +X+ X+ 1 = セ@ + 1 =X Hュッ、セK@ X+ 1). The product (x + 1)2 in f{ク}ャHセ@ + 1) defmed over GF(2) can be expressed as (x + 1) 2 = セ@ + x + x + 1 ]セK@ 1 ]oHュッ、セKクK@ 1). The product (x + 1)2 in F [x}OHセ@ + x + 1) defined over GF(3) can be expressed as (x + 1) 2 = セ@ + x +X+ 1 = セ@ + 2x + 1 =X Hュッ、セK@ X + 1). Ifj(x) has degree n, then the ring F[x]!f(x) over GF(q) consists of polynomials of 、・ァイ・・セ@ n- 1. The size of ring will be qn because each of the n coefficients of the polynomials can be one of the q elements in GF(q). 
Example 4.8 Consider the ring F[x]/(x^2 + x + 1) defined over GF(2). This ring will have polynomials of highest degree equal to 1, and it contains q^n = 2^2 = 4 elements (each element is a polynomial). The elements of the ring are 0, 1, x and x + 1. The addition and multiplication tables can be written as follows.
Addition table for F[x]/(x^2 + x + 1) over GF(2):

  +      0      1      x      x+1
  0      0      1      x      x+1
  1      1      0      x+1    x
  x      x      x+1    0      1
  x+1    x+1    x      1      0

Multiplication table for F[x]/(x^2 + x + 1) over GF(2):

  .      0      1      x      x+1
  0      0      0      0      0
  1      0      1      x      x+1
  x      0      x      x+1    1
  x+1    0      x+1    1      x

Next, consider F[x]/(x^2 + 1) defined over GF(2). The elements of the ring are again 0, 1, x and x + 1. The addition table is the same as above, and the multiplication table is:

  .      0      1      x      x+1
  0      0      0      0      0
  1      0      1      x      x+1
  x      0      x      1      x+1
  x+1    0      x+1    x+1    0

It is interesting to note that F[x]/(x^2 + x + 1) is actually a field, as the multiplicative inverse of every non-zero element exists. On the other hand, F[x]/(x^2 + 1) is not a field, because the multiplicative inverse of the element x + 1 does not exist. It is worthwhile exploring the properties of f(x) which make F[x]/f(x) a field. As we shall shortly find out, the polynomial f(x) must be irreducible (non-factorizable).

Definition 4.5 A polynomial f(x) in F[x] is said to be reducible if f(x) = a(x) b(x), where a(x), b(x) are elements of F[x] and deg a(x) and deg b(x) are both smaller than deg f(x). If f(x) is not reducible, it is called irreducible. A monic irreducible polynomial of degree at least one is called a prime polynomial.

It is helpful to compare a reducible polynomial with a positive integer that can be factorized into a product of prime numbers. Any monic polynomial in F[x] can be factorized uniquely into a product of irreducible monic polynomials (prime polynomials). One way to verify a prime polynomial is by trial and error, testing all possible factorizations. This would require a computer search. Prime polynomials of every degree exist over every Galois Field.

Theorem 4.1
(i) A polynomial f(x) has a linear factor (x - a) if and only if f(a) = 0, where a is a field element.
(ii) A polynomial f(x) in F[x] of degree 2 or 3 over GF(q) is irreducible if and only if f(a) != 0 for all a in GF(q).
(iii) Over any field, x^n - 1 = (x - 1)(x^{n-1} + x^{n-2} + ... + x + 1). The second factor may be further reducible.

Proof
(i) If f(x) = (x - a) g(x), then obviously f(a) = 0. On the other hand, if f(a) = 0, by the division algorithm, f(x) = q(x)(x - a) + r(x), where deg r(x) < deg (x - a) = 1. This implies that r(x) is a constant. But, since f(a) = 0, r(x) must be zero, and therefore f(x) = q(x)(x - a).
(ii) A polynomial of degree 2 or 3 over GF(q) will be reducible if, and only if, it has at least one linear factor. The result (ii) then directly follows from (i). This result does not necessarily hold for polynomials of degree more than 3. This is because it might be possible to factorize a polynomial of degree 4 or higher into a product of polynomials none of which is linear, i.e., of the type (x - a).
(iii) From (i), (x - 1) is a factor of (x^n - 1). By carrying out the long division of (x^n - 1) by (x - 1) we obtain (x^{n-1} + x^{n-2} + ... + x + 1).

Example 4.9 Consider f(x) = x^3 - 1 over GF(2). Using (iii) of Theorem 4.1 we can write
x^3 - 1 = (x - 1)(x^2 + x + 1).
This factorization is true over any field. Now, let us try to factorize the second term, p(x) = (x^2 + x + 1):
p(0) = 0 + 0 + 1 = 1, over GF(2),
p(1) = 1 + 1 + 1 = 1, over GF(2).
Therefore, p(x) cannot be factorized further (from Theorem 4.1 (ii)). Thus, over GF(2), x^3 - 1 = (x - 1)(x^2 + x + 1).
Next, consider f(x) = x^3 - 1 over GF(3). Again, x^3 - 1 = (x - 1)(x^2 + x + 1). Let p(x) = (x^2 + x + 1):
p(0) = 0 + 0 + 1 = 1, over GF(3),
p(1) = 1 + 1 + 1 = 0, over GF(3),
p(2) = 2.2 + 2 + 1 = 1 + 2 + 1 = 1, over GF(3).
Since p(1) = 0, from (i) we have (x - 1) as a factor of p(x).
Thus, over GF(3), x^3 - 1 = (x - 1)(x - 1)(x - 1).

Theorem 4.2 The ring F[x]/f(x) is a field if, and only if, f(x) is a prime polynomial in F[x].

Proof To prove that a ring is a field, we must show that every non-zero element of the ring has a multiplicative inverse. Let s(x) be a non-zero element of the ring. We have deg s(x) < deg f(x), because s(x) is contained in the ring F[x]/f(x). It can be shown that the Greatest Common Divisor (GCD) of two polynomials f(x) and s(x) can be expressed as
GCD(f(x), s(x)) = a(x) f(x) + b(x) s(x),
where a(x) and b(x) are polynomials over GF(q). Since f(x) is irreducible in F[x], we have
GCD(f(x), s(x)) = 1 = a(x) f(x) + b(x) s(x).
Now,
1 = R_f(x)[1] = R_f(x)[a(x) f(x) + b(x) s(x)]
  = R_f(x)[a(x) f(x)] + R_f(x)[b(x) s(x)]            (property (i) of residues)
  = 0 + R_f(x)[b(x) s(x)]
  = R_f(x){R_f(x)[b(x)] . R_f(x)[s(x)]}              (property (ii) of residues)
  = R_f(x){R_f(x)[b(x)] . s(x)}.
Hence, R_f(x)[b(x)] is the multiplicative inverse of s(x).

Next, let us prove the only if part of the theorem. Let us suppose f(x) has a degree of at least 2 and is not a prime polynomial (a polynomial of degree one is always irreducible). Therefore, we can write f(x) = r(x) s(x) for some polynomials r(x) and s(x) with degrees at least one. If the ring F[x]/f(x) is indeed a field, then a multiplicative inverse of r(x), denoted r^{-1}(x), exists, since all non-zero polynomials in the field must have their corresponding multiplicative inverses. Hence,
s(x) = R_f(x){s(x)} = R_f(x){r(x) r^{-1}(x) s(x)} = R_f(x){r^{-1}(x) r(x) s(x)} = R_f(x){r^{-1}(x) f(x)} = 0.
However, we had assumed s(x) != 0. Thus, there is a contradiction, implying that the ring is not a field.

Note that a prime polynomial is both monic and irreducible. In the above theorem it is sufficient to have f(x) irreducible in order to obtain a field. The theorem could as well have been stated as: "The ring F[x]/f(x) is a field if and only if f(x) is irreducible in F[x]".

So, now we have an elegant mechanism of generating Galois Fields! If we can identify a prime polynomial of degree n over GF(q), we can construct a Galois Field with q^n elements. Such a field will have polynomials as the elements of the field. These polynomials will be defined over GF(q) and consist of all polynomials of degree less than n. It can be seen that there will be q^n such polynomials, which form the elements of the Extension Field.

Example 4.10 Consider the polynomial p(x) = x^3 + x + 1 over GF(2). Since p(0) != 0 and p(1) != 0, the polynomial is irreducible in GF(2). Since it is also monic, p(x) is a prime polynomial. Here we have n = 3, so we can use p(x) to construct a field with 2^3 = 8 elements, i.e., GF(8). The elements of this field will be 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1, which are all possible polynomials of degree less than n = 3. It is easy to construct the addition and multiplication tables for this field (exercise).

Having developed the necessary mathematical tools, we now resume our study of cyclic codes. We fix f(x) = x^n - 1 for the remainder of the chapter, and we denote F[x]/f(x) by R_n. Before we proceed, we make the following observations:
(i) x^n = 1 (mod x^n - 1). Hence, any polynomial modulo x^n - 1 can be reduced simply by replacing x^n by 1, x^{n+1} by x, and so on.
(ii) A codeword can uniquely be represented by a polynomial. A codeword consists of a sequence of elements, and we can use a polynomial to represent the locations and values of all the elements in the codeword. The codeword c_0 c_1 ... c_{n-1} can be represented by the polynomial c(x) = c_0 + c_1 x + c_2 x^2 + ... + c_{n-1} x^{n-1}. As another example, the codeword c = 207735 over GF(8) can be represented by the polynomial c(x) = 2 + 7x^2 + 7x^3 + 3x^4 + 5x^5.
(iii) Multiplying any polynomial by x corresponds to a single cyclic right-shift of the codeword elements. More explicitly, in R_n, by multiplying c(x) by x we get
x . c(x) = c_0 x + c_1 x^2 + ... + c_{n-1} x^n = c_{n-1} + c_0 x + c_1 x^2 + ... + c_{n-2} x^{n-1}.
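Observation (iii) is worth seeing in code. The following is an illustrative sketch (not from the text): a codeword is stored as its coefficient list, and multiplication by x in R_n simply moves the last coefficient to the front.

```python
# In R_n = F[x]/(x^n - 1), multiplying a codeword polynomial by x gives a
# single cyclic right-shift of the codeword (observation (iii) above).

def shift_by_x(c):
    """c = [c0, c1, ..., c_{n-1}]; return the coefficients of x*c(x) mod (x^n - 1)."""
    # x*c(x) = c0 x + c1 x^2 + ... + c_{n-1} x^n, and x^n is replaced by 1
    return [c[-1]] + c[:-1]

c = [1, 1, 0, 1, 0, 0, 0]          # c(x) = 1 + x + x^3, a length-7 word
print(shift_by_x(c))               # [0, 1, 1, 0, 1, 0, 0]
print(shift_by_x(shift_by_x(c)))   # two shifts: [0, 0, 1, 1, 0, 1, 0]
```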
Theorem 4.3 A code C in R_n is a cyclic code if, and only if, C satisfies the following conditions:
(i) a(x), b(x) in C implies a(x) + b(x) in C,   (4.4)
(ii) a(x) in C and r(x) in R_n implies r(x) a(x) in C.   (4.5)

Proof
(i) Suppose C is a cyclic code in R_n. Since cyclic codes are a subset of linear block codes, the first condition holds.
(ii) Let r(x) = r_0 + r_1 x + r_2 x^2 + ... + r_{n-1} x^{n-1}. Multiplication by x corresponds to a cyclic right-shift. But, by definition, the cyclic shift of a cyclic codeword is also a valid codeword. That is, x.a(x) is in C, x.(x a(x)) is in C, and so on. Hence r(x) a(x) = r_0 a(x) + r_1 x a(x) + r_2 x^2 a(x) + ... + r_{n-1} x^{n-1} a(x) is also in C, since each summand is in C.
Next, we prove the only if part of the theorem. Suppose (i) and (ii) hold. Take r(x) to be a scalar; then (i) and (ii) together imply that C is linear. Take r(x) = x in (ii), which shows that any cyclic shift of a codeword also leads to a codeword. Hence (i) and (ii) imply that C is a cyclic code.

In the next section, we shall use the mathematical tools developed so far to construct cyclic codes.

4.4 A METHOD FOR GENERATING CYCLIC CODES

The following steps can be used to generate a cyclic code (a brute-force sketch of these steps is given below):
(i) Take a polynomial f(x) in R_n.
(ii) Obtain a set of polynomials by multiplying f(x) by all possible polynomials in R_n.
(iii) The set of polynomials obtained above corresponds to the set of codewords belonging to a cyclic code. The blocklength of the code is n.
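The three steps can be carried out exhaustively for small n. The sketch below is illustrative only (the chosen f(x), n and helper names are not prescribed by the text); it multiplies f(x) by every polynomial in R_3 over GF(2) and collects the distinct results.

```python
# Brute-force generation of a cyclic code over GF(2) with n = 3 and f(x) = 1 + x^2.
from itertools import product

n, q = 3, 2
f = [1, 0, 1]                      # f(x) = 1 + x^2

def mul_mod(a, b):
    """Multiply a(x) b(x) and reduce modulo x^n - 1 over GF(q)."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] = (out[(i + j) % n] + ai * bj) % q
    return tuple(out)

# Step (ii): multiply f(x) by every polynomial r(x) in R_n
code = {mul_mod(f, r) for r in product(range(q), repeat=n)}
print(sorted(code))
# four distinct codewords: (0,0,0), (0,1,1), (1,0,1), (1,1,0)
```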
Example 4.11 Consider the polynomial f(x) = 1 + x^2 in R_3 defined over GF(2). In general, a polynomial in R_3 (= F[x]/(x^3 - 1)) can be represented as r(x) = r_0 + r_1 x + r_2 x^2, where the coefficients can take the values 0 or 1 (since we are working over GF(2)). Thus, there can be a total of 2 x 2 x 2 = 8 polynomials in R_3 defined over GF(2), which are 0, 1, x, x^2, 1 + x, 1 + x^2, x + x^2, 1 + x + x^2. To generate the cyclic code, we multiply f(x) with these 8 possible elements of R_3 and then reduce the results modulo (x^3 - 1):
(1 + x^2) . 0 = 0,
(1 + x^2) . 1 = 1 + x^2,
(1 + x^2) . x = 1 + x,
(1 + x^2) . x^2 = x + x^2,
(1 + x^2) . (1 + x) = x + x^2,
(1 + x^2) . (1 + x^2) = 1 + x,
(1 + x^2) . (x + x^2) = 1 + x^2,
(1 + x^2) . (1 + x + x^2) = 0.
Thus there are only four distinct codewords, {0, 1 + x, 1 + x^2, x + x^2}, which correspond to {000, 110, 101, 011}.

From the above example it appears that we can have some sort of a Generator Polynomial which can be used to construct the cyclic code.

Theorem 4.4 Let C be an (n, k) non-zero cyclic code in R_n. Then,
(i) there exists a unique monic polynomial g(x) of the smallest degree in C,
(ii) the cyclic code C consists of all multiples of the generator polynomial g(x) by polynomials of degree k - 1 or less,
(iii) g(x) is a factor of x^n - 1.

Proof
(i) Suppose both g(x) and h(x) are monic polynomials in C of smallest degree. Then g(x) - h(x) is also in C and has a smaller degree. If g(x) != h(x), then a suitable scalar multiple of g(x) - h(x) is monic, is in C, and is of smaller degree than g(x). This gives a contradiction.
(ii) Let a(x) be in C. Then, by the division algorithm, a(x) = q(x) g(x) + r(x), where deg r(x) < deg g(x). But r(x) = a(x) - q(x) g(x) is in C, because both terms on the right hand side of the equation are codewords. However, the degree of g(x) must be the minimum among all non-zero codewords. This can only be possible if r(x) = 0 and a(x) = q(x) g(x). Thus, a codeword is obtained by multiplying the generator polynomial g(x) with the polynomial q(x). For a code defined over GF(q), there are q^k distinct codewords possible. These codewords correspond to multiplying g(x) with the q^k distinct polynomials q(x), where deg q(x) <= (k - 1).
(iii) By the division algorithm, x^n - 1 = q(x) g(x) + r(x), where deg r(x) < deg g(x). Or, r(x) = {(x^n - 1) - q(x) g(x)} modulo (x^n - 1) = -q(x) g(x). But -q(x) g(x) is in C, because we are multiplying the generator polynomial by another polynomial, -q(x). Thus, we have a codeword r(x) whose degree is less than that of g(x). This violates the minimality of the degree of g(x), unless r(x) = 0, which implies x^n - 1 = q(x) g(x), i.e., g(x) is a factor of x^n - 1.

The last part of the theorem gives us the recipe to obtain the generator polynomial for a cyclic code. All we have to do is factorize x^n - 1 into irreducible, monic polynomials. We can also find all the possible cyclic codes of blocklength n simply by factorizing x^n - 1.

Note 1: A cyclic code C may contain polynomials other than the generator polynomial which also generate C. But the polynomial with the minimum degree is called the generator polynomial.
Note 2: The degree of g(x) is n - k (this will be shown later).

Example 4.12 To find all the binary cyclic codes of blocklength 3, we first factorize x^3 - 1. Note that for GF(2), 1 = -1, since 1 + 1 = 0. Hence,
x^3 - 1 = x^3 + 1 = (x + 1)(x^2 + x + 1).
Thus, we can make the following table.
Generator polynomial     Code (polynomial)                   Code (binary)
1                        {R_3}                               {000, 001, 010, 011, 100, 101, 110, 111}
(x + 1)                  {0, x + 1, x^2 + x, x^2 + 1}        {000, 011, 110, 101}
(x^2 + x + 1)            {0, x^2 + x + 1}                    {000, 111}
(x^3 + 1) = 0            {0}                                 {000}

A simple encoding rule to generate the codewords from the generator polynomial is
c(x) = i(x) g(x),   (4.6)
where i(x) is the information polynomial, c(x) is the codeword polynomial and g(x) is the generator polynomial. We have already seen that there is a one to one correspondence between a word (vector) and a polynomial. The error vector can also be represented as the error polynomial, e(x). Thus, the received word at the receiver, after passing through a noisy channel, can be expressed as
v(x) = c(x) + e(x).   (4.7)
We define the Syndrome Polynomial, s(x), as the remainder of v(x) under division by g(x), i.e.,
s(x) = R_g(x)[v(x)] = R_g(x)[c(x) + e(x)] = R_g(x)[c(x)] + R_g(x)[e(x)] = R_g(x)[e(x)],   (4.8)
because R_g(x)[c(x)] = 0.
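The encoding rule (4.6) and the syndrome (4.8) can be checked with a short sketch. The code below is illustrative only (it reuses the blocklength-3 binary code with g(x) = 1 + x from the table above and assumes a single-error pattern of my choosing): a valid codeword has zero syndrome, a corrupted word does not.

```python
# Encode with c(x) = i(x) g(x) and compute s(x) = R_g(x)[v(x)] over GF(2).

q = 2

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return trim(out)

def poly_mod(a, b):
    a, b = trim(a[:]), trim(b)
    while len(a) >= len(b) and any(a):
        shift = len(a) - len(b)
        factor = a[-1] * pow(b[-1], -1, q) % q
        for i, bi in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * bi) % q
        a = trim(a)
    return a

g = [1, 1]                          # g(x) = 1 + x
i = [1, 1]                          # information polynomial i(x) = 1 + x (k = 2)
c = mul(i, g)                       # codeword polynomial 1 + x^2 -> 101
c = c + [0] * (3 - len(c))          # pad to blocklength n = 3
e = [0, 1, 0]                       # a single error in position 1
v = [(ci + ei) % q for ci, ei in zip(c, e)]
print(c, poly_mod(c, g), poly_mod(v, g))
# [1, 0, 1] [0] [1] -> zero syndrome for the codeword, non-zero for the corrupted word
```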
Example 4.13 Consider the generator polynomial g(x) = x^2 + 1 for ternary cyclic codes (i.e., over GF(3)) of blocklength n = 4. Since we are dealing with cyclic codes, the degree of g(x) is n - k. Here n = 4 and deg g(x) = 2, so k must be 2. We are therefore going to construct a (4, 2) cyclic ternary code. There will be a total of q^k = 3^2 = 9 codewords. The information polynomials and the corresponding codeword polynomials are listed below (the codeword c lists the coefficients of c(x) from the highest degree to the lowest).

i      i(x)       c(x) = i(x) g(x)          c
00     0          0                         0000
01     1          x^2 + 1                   0101
02     2          2x^2 + 2                  0202
10     x          x^3 + x                   1010
11     x + 1      x^3 + x^2 + x + 1         1111
12     x + 2      x^3 + 2x^2 + x + 2        1212
20     2x         2x^3 + 2x                 2020
21     2x + 1     2x^3 + x^2 + 2x + 1       2121
22     2x + 2     2x^3 + 2x^2 + 2x + 2      2222

It can be seen that the cyclic shift of any codeword results in another valid codeword. By observing the codewords we find that the minimum distance of this code is 2 (there are four non-zero codewords with the minimum Hamming weight of 2). Therefore, this code is capable of detecting one error and correcting zero errors.

Observing the fact that the codeword polynomial is divisible by the generator polynomial, we can in fact detect more errors than suggested by the minimum distance of the code. Since we are dealing with cyclic codes, which are a subset of linear block codes, we can use the all-zero codeword to illustrate this point without loss of generality. Assume that g(x) = x^2 + 1 and the transmitted codeword is the all-zero codeword. Therefore, the received word is the error polynomial itself, i.e.,
v(x) = c(x) + e(x) = e(x).   (4.9)
At the receiver end, an error will be detected if g(x) fails to divide the received word v(x) = e(x). Now, g(x) has only two terms. So if e(x) has an odd number of terms, i.e., if the number of errors is odd, it will be caught by the decoder! For example, if we try to divide e(x) = x^2 + x + 1 by g(x), we will always get a remainder. In the example of the (4, 2) cyclic code with g(x) = x^2 + 1, d* = 2, suggesting that it can detect d* - 1 = 1 error. However, by this simple observation, we find that it can detect any odd number of errors up to n. In this case, it can detect 1 error or 3 errors, but not 2 errors.

4.5 MATRIX DESCRIPTION OF CYCLIC CODES

Theorem 4.5 Suppose C is a cyclic code with generator polynomial g(x) = g_0 + g_1 x + ... + g_r x^r of degree r. Then the generator matrix of C is given by

      | g_0  g_1  ...  g_r  0    0    ...  0   |
      | 0    g_0  g_1  ...  g_r  0    ...  0   |
G =   | 0    0    g_0  g_1  ...  g_r  ...  0   |     k = (n - r) rows, n columns.   (4.10)
      | ...                                    |
      | 0    0    ...  0    g_0  g_1  ...  g_r |

Proof The (n - r) rows of the matrix are obviously linearly independent because of the echelon form of the matrix. These (n - r) rows represent the codewords g(x), x g(x), x^2 g(x), ..., x^{n-r-1} g(x). Thus, the matrix can generate these codewords. Now, to prove that the matrix can generate all the possible codewords, we must show that every possible codeword can be represented as a linear combination of the codewords g(x), x g(x), x^2 g(x), ..., x^{n-r-1} g(x). We know that if c(x) is a codeword, it can be represented as c(x) = q(x).g(x) for some polynomial q(x). Since the degree of c(x) is less than n (because the length of the codeword is n), it follows that the degree of q(x) is less than n - r. Hence,
q(x).g(x) = (q_0 + q_1 x + ... + q_{n-r-1} x^{n-r-1}) g(x) = q_0 g(x) + q_1 x g(x) + ... + q_{n-r-1} x^{n-r-1} g(x).
Thus, any codeword can be represented as a linear combination of g(x), x g(x), x^2 g(x), ..., x^{n-r-1} g(x). This proves that the matrix G is indeed the generator matrix. We also know that the dimensions of the generator matrix are k x n. Therefore, r = n - k, i.e., the degree of g(x) is n - k.
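Theorem 4.5 translates directly into code: each row of G is the coefficient list of x^i g(x). The sketch below is illustrative (the function name and the chosen g(x) are my own); with g(x) = 1 + x + x^3 and n = 7 it produces the generator matrix of a (7, 4) binary cyclic code.

```python
# Build the generator matrix of Theorem 4.5 from the coefficients of g(x).

def generator_matrix(g, n):
    """g = [g_0, ..., g_r]; rows are g(x), x g(x), ..., x^{k-1} g(x), k = n - r."""
    r = len(g) - 1
    k = n - r
    rows = []
    for i in range(k):
        rows.append([0] * i + list(g) + [0] * (n - r - i - 1))
    return rows

g = [1, 1, 0, 1]                   # g(x) = 1 + x + x^3 over GF(2)
for row in generator_matrix(g, 7):
    print(row)
# [1, 1, 0, 1, 0, 0, 0]
# [0, 1, 1, 0, 1, 0, 0]
# [0, 0, 1, 1, 0, 1, 0]
# [0, 0, 0, 1, 1, 0, 1]
```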
Example 4.14 To find the generator matrices of all ternary cyclic codes (i.e., codes over GF(3)) of blocklength n = 4, we first factorize x^4 - 1:
x^4 - 1 = (x - 1)(x^3 + x^2 + x + 1) = (x - 1)(x + 1)(x^2 + 1).
We know that every factor of x^4 - 1 is capable of generating a cyclic code. The resultant generator matrices are listed in Table 4.1. Note that -1 = 2 for GF(3).

Table 4.1 Cyclic codes of blocklength n = 4 over GF(3)

g(x)          (n, k)    dmin    G
1             (4, 4)    1       [ I_4 ]
(x - 1)       (4, 3)    2       [ -1  1  0  0 ]
                                [  0 -1  1  0 ]
                                [  0  0 -1  1 ]
Table 4.1 (continued)

g(x)                   (n, k)    dmin    G
(x + 1)                (4, 3)    2       [ 1  1  0  0 ]
                                         [ 0  1  1  0 ]
                                         [ 0  0  1  1 ]
(x^2 + 1)              (4, 2)    2       [ 1  0  1  0 ]
                                         [ 0  1  0  1 ]
(x^2 - 1)              (4, 2)    2       [ -1  0  1  0 ]
                                         [  0 -1  0  1 ]
(x - 1)(x^2 + 1)       (4, 1)    4       [ -1  1  -1  1 ]
(x + 1)(x^2 + 1)       (4, 1)    4       [ 1  1  1  1 ]
(x^4 - 1)              (4, 0)    -       [ 0  0  0  0 ]

It can be seen from the table that none of the (4, 2) ternary cyclic codes is a single error correcting code (since their minimum distance is less than 3). An interesting observation is that we do not have any ternary (4, 2) Hamming code that is cyclic! Recall that Hamming codes are single error correcting codes with n = (q^r - 1)/(q - 1) and k = (q^r - 1)/(q - 1) - r, where r is an integer >= 2. Therefore, a (4, 2) ternary Hamming code exists, but it is not a cyclic code.

The next step is to explore whether we can find a parity check polynomial corresponding to our generator polynomial, g(x). We already know that g(x) is a factor of x^n - 1. Hence we can write
x^n - 1 = h(x) g(x),   (4.11)
where h(x) is some polynomial. The following can be concluded simply by observing the above equation:
(i) Since g(x) is monic, h(x) has to be monic, because the left hand side of the equation is also monic (the leading coefficient is unity).
(ii) Since the degree of g(x) is n - k, the degree of h(x) must be k.

Suppose C is a cyclic code in R_n with the generator polynomial g(x). Recall that we are denoting F[x]/f(x) by R_n, where f(x) = x^n - 1. In R_n, h(x) g(x) = x^n - 1 = 0. Then, any codeword belonging to C can be written as c(x) = a(x) g(x), where the polynomial a(x) is in R_n. Therefore, in R_n,
c(x) h(x) = a(x) g(x) h(x) = a(x) . 0 = 0.
Thus, h(x) behaves like a Parity Check Polynomial. Any valid codeword, when multiplied by the parity check polynomial, yields the zero polynomial. This concept is parallel to that of the parity check matrix introduced in the previous chapter. Since we are still in the domain of linear block codes, we go ahead and define the parity check matrix in relation to the parity check polynomial.

Suppose C is a cyclic code with the parity check polynomial h(x) = h_0 + h_1 x + ... + h_k x^k. Then the parity check matrix of C is given by

      | h_k  h_{k-1}  ...  h_0  0    ...  0   |
H =   | 0    h_k  h_{k-1}  ...  h_0  ...  0   |     (n - k) rows, n columns.   (4.12)
      | ...                                   |
      | 0    ...  0    h_k  h_{k-1}  ...  h_0 |

Recall that cH^T = 0. Therefore, iGH^T = 0 for any information vector i. Hence, GH^T = 0. We further have s = vH^T, where s is the syndrome vector and v is the received word.

Example 4.15 For binary codes of blocklength n = 7, we have
x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1).
Consider g(x) = (x^3 + x + 1). Since g(x) is a factor of x^7 - 1, there is a cyclic code that can be generated by it. The generator matrix corresponding to g(x) is

G = | 1 1 0 1 0 0 0 |
    | 0 1 1 0 1 0 0 |
    | 0 0 1 1 0 1 0 |
    | 0 0 0 1 1 0 1 |

The parity check polynomial is h(x) = (x - 1)(x^3 + x^2 + 1) = x^4 + x^2 + x + 1, and the corresponding parity check matrix is

H = | 1 0 1 1 1 0 0 |
    | 0 1 0 1 1 1 0 |
    | 0 0 1 0 1 1 1 |

The minimum distance of this code is 3, and this happens to be the (7, 4) Hamming code. Thus, the binary (7, 4) Hamming code is also a cyclic code.

4.6 BURST ERROR CORRECTION

In many real life channels, errors are not random, but occur in bursts. For example, in a mobile communications channel, fading results in burst errors. When errors occur at a stretch, as opposed to random errors, we term them Burst Errors.
Example 4.16 Let the sequence of bits, transmitted at 10 kb/s over a wireless channel, be
c = 0 1 0 0 0 1 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1.
Suppose, 0.5 ms after the start of transmission, the channel experiences a fade of duration 1 ms. During this time interval, the channel corrupts the transmitted bits. The error sequence can be written as
e = 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 0 0 0 0 0 0 0.
This is an example of a burst error, where a portion of the transmitted sequence gets garbled due to the channel. Here the length of the burst is 10 bits. However, not all ten locations are in error.

Definition 4.6 A Cyclic Burst of length t is a vector whose non-zero components are confined to t successive components, the first and last of which are non-zero.

If we are constructing codes for channels more prone to burst errors of length t (as opposed to an arbitrary pattern of t random errors), it might be possible to design more efficient codes. We can describe a burst error as
e(x) = x^i b(x),   (4.13)
where b(x) is a polynomial of degree <= t - 1 representing the burst pattern, and x^i marks the starting location of the burst pattern within the codeword being transmitted. A code designed for correcting bursts of length t must have unique syndromes for every such error pattern, i.e., s(x) = R_g(x)[e(x)] must be different for each polynomial representing a burst of length t or less.

Example 4.17 For a binary code of blocklength n = 15, consider the generator polynomial
g(x) = x^6 + x^3 + x^2 + x + 1.   (4.14)
This code is capable of correcting bursts of length 3 or less. To prove this we must show that all the syndromes corresponding to the different burst errors are distinct. The different burst errors are:
(i) Bursts of length 1: e(x) = x^i for i = 0, 1, ..., 14.
(ii) Bursts of length 2: e(x) = x^i (1 + x) for i = 0, 1, ..., 13.
(iii) Bursts of length 3: e(x) = x^i (1 + x^2) and e(x) = x^i (1 + x + x^2), for i = 0, 1, ..., 12.
It can be shown that the syndromes of all these 55 (= 15 + 14 + 13 + 13) error patterns are distinct and non-zero. A table can be made for each pattern and the corresponding syndrome, which can then be used for correcting a burst error of length 3 or less.

It should be emphasized that codes designed specifically for correcting burst errors are more efficient in terms of the code rate. The code being discussed here is a (15, 9) cyclic code with code rate k/n = 0.6 and minimum distance d* = 3. This code can correct only 1 random error (but a burst of up to three errors!). Note that correction of one random error amounts to correcting a burst error of length 1.

Similar to the Singleton bound studied in the previous chapter, there is a bound on the minimum number of parity bits required for a burst-error correcting linear block code: 'A linear block code that corrects all bursts of length t or less must have at least 2t parity symbols'.

In the next three sections, we will study three different sub-classes of cyclic codes. Each sub-class has a specific objective.

4.7 FIRE CODES

Definition 4.7 A Fire code is a cyclic burst error correcting code over GF(q) with the generator polynomial
g(x) = (x^{2t-1} - 1) p(x),   (4.15)
where p(x) is a prime polynomial over GF(q) whose degree m is not smaller than t, and p(x) does not divide x^{2t-1} - 1. The blocklength of the Fire code is the smallest integer n such that g(x) divides x^n - 1. A Fire code can correct all burst errors of length t or less.

Example 4.18 Consider the Fire code with t = m = 3.
A prime polynomial over GF(2) of degree 3 is p(x) = x^3 + x + 1, which does not divide (x^5 - 1). The generator polynomial of the Fire code will be
g(x) = (x^5 - 1) p(x) = (x^5 - 1)(x^3 + x + 1) = x^8 + x^6 + x^5 - x^3 - x - 1 = x^8 + x^6 + x^5 + x^3 + x + 1.
The degree of g(x) is n - k = 8. The blocklength is the smallest integer n such that g(x) divides x^n - 1. After trial and error we get n = 35. Thus, the parameters of the Fire code are (35, 27), with g(x) = x^8 + x^6 + x^5 + x^3 + x + 1. This code can correct all bursts of length 3 or less. The code rate of this code is 0.77, and it is more efficient than the code generated by g(x) = x^6 + x^3 + x^2 + x + 1, which has a code rate of only 0.6.

Fire codes become more efficient as we increase t. The code rates for binary Fire codes (with m = t) for different values of t are plotted in Fig. 4.1.

[Fig. 4.1 Code rates of binary Fire codes (m = t) plotted against t, for t = 2 to 10; the code rate increases from about 0.5 towards 0.9 as t grows.]
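The "trial and error" search for the blocklength can be automated. The sketch below is illustrative only: polynomials over GF(2) are stored as Python integers (bit i holds the coefficient of x^i), and the loop finds the smallest n for which g(x) divides x^n - 1, i.e., the smallest n with x^n = 1 modulo g(x).

```python
# Find the blocklength of the Fire code of Example 4.18 over GF(2).

def gf2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

g = gf2_mul(0b100001, 0b1011)      # (x^5 + 1)(x^3 + x + 1) = x^8 + x^6 + x^5 + x^3 + x + 1
x_power = 1
for n in range(1, 200):
    x_power = gf2_mod(x_power << 1, g)   # x^n mod g(x)
    if x_power == 1:                      # then g(x) divides x^n - 1
        print("blocklength n =", n)       # prints: blocklength n = 35
        break
```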
4.8 GOLAY CODES

The Binary Golay Code
In the previous chapter, Sec. 3.9, we saw that a (23, 12) perfect code exists with d* = 7. Recall that, for a perfect code,
M {(n choose 0) + (n choose 1)(q - 1) + (n choose 2)(q - 1)^2 + ... + (n choose t)(q - 1)^t} = q^n,   (4.16)
which is satisfied for the values n = 23, k = 12, M = 2^k = 2^12, q = 2 and t = (d* - 1)/2 = 3. This (23, 12) perfect code is the Binary Golay Code. We shall now explore this perfect code as a cyclic code. We start with the factorization of (x^23 - 1):
(x^23 - 1) = (x - 1)(x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1)(x^11 + x^9 + x^7 + x^6 + x^5 + x + 1)
           = (x - 1) g1(x) g2(x).   (4.17)
The degree of g1(x) is n - k = 11, hence k = 12, which implies that there exists a (23, 12) cyclic code. In order to prove that it is a perfect code, we must show that the minimum distance of this (23, 12) cyclic code is 7. One way is to write out the parity check matrix, H, and show that no six columns are linearly dependent. Another way is to prove it analytically, which is a long and drawn-out proof. The easiest way is to write a computer program to list out all the 2^12 codewords and find the minimum weight (on a fast computer it takes only seconds!). The code rate is 0.52 and it is a triple error correcting code. However, the relatively small blocklength of this perfect code makes it impractical for most real life applications.

The Ternary Golay Code
We next examine the ternary (11, 6) cyclic code, which is also the Ternary Golay Code. This code has a minimum distance of 5, and can be verified to be a perfect code. We begin by factorizing (x^11 - 1) over GF(3).
Therefore, 2n-kD amounts to shifting the k bits to the left by (n- k) bits, and padding the result with zeros (recall that left shift by 1 bit of a binary sequence is equivalent to multiplying the number represented by the binary sequence by two). The codeword, T, can then be represented as T= 2n-kD + F (4.20) Adding Fin the above equation yields the concatenation of D and F. If we divide 2n-k D by P, we obtain 2n-k D R - - = Q,+ - (4.21) p p where, Q,is the quotient and RlPis the remainder. Suppose we use R as the FCS, then, T= 2n-k D + R (4.22) In this case, upon dividing Thy P we obtain T 2"-k D+R 2"-k D R -----= +- p p p p
    = Q + R/P + R/P = Q + (R + R)/P = Q,   (4.23)
since R + R = 0 in modulo-2 arithmetic. Thus there is no remainder, i.e., T is exactly divisible by P. To generate such an FCS, we simply divide 2^{n-k} D by P and use the (n - k)-bit remainder as the FCS.

Let an error E occur when T is transmitted over a noisy channel. The received word is given by
V = T + E.   (4.24)
The CRC scheme will fail to detect the error only if V is completely divisible by P. This translates to the case when E is completely divisible by P (because T is divisible by P).

Example 4.19 Let the message D = 1010001101, i.e., k = 10, and the pattern P = 110101. The number of FCS bits is 5. Therefore, n = 15. We wish to determine the FCS. First, the message is multiplied by 2^5 (left shift by 5 and pad with 5 zeros). This yields
2^5 D = 101000110100000.
Next, divide the resulting number by P = 110101. By long division we obtain Q = 1101010110 and R = 01110. The remainder is appended to 2^5 D to obtain
T = 101000110101110.
T is the transmitted codeword. If no errors occur in the channel, the received word, when divided by P, will yield 0 as the remainder.

CRC codes can also be defined using the polynomial representation. Let the message polynomial be D(x) and the predetermined divisor be P(x). Therefore,
x^{n-k} D(x) / P(x) = Q(x) + R(x)/P(x),
T(x) = x^{n-k} D(x) + R(x).   (4.25)
At the receiver end, the received word is divided by P(x). Suppose the received word is
V(x) = T(x) + E(x),   (4.26)
where E(x) is the error polynomial. Then [T(x) + E(x)]/P(x) leaves the same remainder as E(x)/P(x), because T(x) is exactly divisible by P(x). Those errors that happen to correspond to polynomials containing P(x) as a factor will slip by, and the others will be caught in the net of the CRC decoder. The polynomial P(x) is also called the generator polynomial for the CRC code. CRC codes are also known as Polynomial Codes.

Example 4.20 Suppose the transmitted codeword undergoes a single-bit error. The error polynomial E(x) can be represented by E(x) = x^i, where i determines the location of the single error bit. If P(x) contains two or more terms, E(x)/P(x) can never leave a zero remainder. Thus all single errors will be caught by such a CRC code.

Example 4.21 Suppose two isolated errors occur, i.e., E(x) = x^i + x^j, i > j. Alternately, E(x) = x^j (x^{i-j} + 1). If we assume that P(x) is not divisible by x, then a sufficient condition for detecting all double errors is that P(x) does not divide x^k + 1 for any k up to the maximum value of i - j (i.e., the frame length). For example, x^15 + x^14 + 1 will not divide x^k + 1 for any value of k below 32,768.

Example 4.22 Suppose the error polynomial has an odd number of terms (corresponding to an odd number of errors). An interesting fact is that there is no polynomial with an odd number of terms that has x + 1 as a factor if we are performing binary arithmetic (modulo 2 operations). By making (x + 1) a factor of P(x), we can catch all errors consisting of an odd number of bits (i.e., we can catch at least half of all possible error patterns!).

Another interesting feature of CRC codes is their ability to detect burst errors. A burst error of length k can be represented by x^i (x^{k-1} + x^{k-2} + ... + 1), where i determines how far from the right end of the received frame the burst is located. If P(x) has a non-zero constant term, it does not have x as a factor. So, if the degree of (x^{k-1} + x^{k-2} + ... + 1) is less than the degree of P(x), the remainder can never be zero. Therefore, a polynomial code with r check bits can detect all burst errors of length <= r.
If the burst length is r + 1, the remainder of the division by P(x) will be zero if, and only if, the burst is identical to P(x). Now, the first and last bits of a burst must be 1 (by definition), while the intermediate bits can be 1 or 0. Therefore, the exact matching of the burst error with the polynomial P(x) depends on the r - 1 intermediate bits. Assuming all combinations are equally likely, the probability of a miss is 1/2^{r-1}. One can also show that when an error burst of length greater than r + 1 occurs, or when several shorter bursts occur, the probability of a bad frame slipping through is 1/2^r.

Example 4.23 Four versions of P(x) have become international standards:
CRC-12:    P(x) = x^12 + x^11 + x^3 + x^2 + x + 1,
CRC-16:    P(x) = x^16 + x^15 + x^2 + 1,
CRC-CCITT: P(x) = x^16 + x^12 + x^5 + 1,
CRC-32:    P(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1.   (4.27)
CRC-12, CRC-16 and CRC-CCITT contain (x + 1) as a factor. CRC-12 is used for the transmission of streams of 6-bit characters and generates a 12-bit FCS. Both CRC-16 and CRC-CCITT are popular for 8-bit characters. They result in a 16-bit FCS and can catch all single and double errors, all errors with an odd number of bits, all burst errors of length 16 or less, 99.997% of 17-bit bursts and 99.998% of 18-bit and longer bursts. CRC-32 is specified as an option in some point-to-point synchronous transmission standards.
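The FCS computation of Example 4.19 is straightforward modulo-2 long division on bit strings. The sketch below is illustrative only (function names are my own); it reproduces R = 01110 and T = 101000110101110, and then checks that the resulting frame leaves a zero remainder.

```python
# Frame check sequence by modulo-2 long division (bit strings, MSB first).

def crc_remainder(data_bits, divisor_bits):
    n_fcs = len(divisor_bits) - 1
    reg = [int(b) for b in data_bits] + [0] * n_fcs      # data shifted left by n - k
    div = [int(b) for b in divisor_bits]
    for i in range(len(data_bits)):
        if reg[i] == 1:
            for j in range(len(div)):
                reg[i + j] ^= div[j]
    return ''.join(str(b) for b in reg[-n_fcs:])

D = "1010001101"
P = "110101"
R = crc_remainder(D, P)
print(R)                          # 01110
print(D + R)                      # 101000110101110 -- the transmitted frame T
print(crc_remainder(D + R, P))    # 00000 -> T is exactly divisible by P
```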
4.10 CIRCUIT IMPLEMENTATION OF CYCLIC CODES

Shift registers can be used to encode and decode cyclic codes easily. Encoding and decoding of cyclic codes require multiplication and division by polynomials, and the shift property of shift registers is ideally suited for such operations. Shift registers are banks of memory units which are capable of shifting the contents of one unit to the next at every clock pulse. Here we will focus on circuit implementations for codes over GF(2^m). Besides the shift register, we will make use of the following circuit elements:
(i) A scaler, whose job is to multiply the input by a fixed field element.
(ii) An adder, which takes in two inputs and adds them together. A simple circuit realization of an adder is the 'exclusive-or' or 'xor' gate.
(iii) A multiplier, which is basically the 'and' gate.
These elements are depicted in Fig. 4.2.

[Fig. 4.2 Circuit elements used to construct encoders and decoders for cyclic codes: an N-stage shift register, a scaler, an adder and a multiplier.]

A field element of GF(2) can simply be represented by a single bit. For GF(2^m) we require m bits to represent one element. For example, the elements of GF(8) can be represented as the elements of the set {000, 001, 010, 011, 100, 101, 110, 111}. For such a representation we need three clock pulses to shift an element from one stage of the shift register to the next. The effective shift register for GF(8) is shown in Fig. 4.3. Any arbitrary element of this field can be represented by ax^2 + bx + c, where a, b, c are binary, and the power of the indeterminate x is used to denote the position. For example, 101 = x^2 + 1.

[Fig. 4.3 The effective shift register for GF(8): each stage consists of three binary memory units.]

Example 4.24 We now consider the multiplication of an arbitrary element by another field element over GF(8). Recall the construction of GF(8) from GF(2) using the prime polynomial p(x) = x^3 + x + 1. The elements of the field are 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1. We want to obtain the circuit representation for the multiplication of an arbitrary field element (ax^2 + bx + c) by another element, say, x^2 + x. We have
(ax^2 + bx + c)(x^2 + x) = ax^4 + (a + b)x^3 + (b + c)x^2 + cx   (modulo p(x))
                         = (a + b + c)x^2 + (b + c)x + (a + b).
One possible circuit realization is shown in Fig. 4.4.

[Fig. 4.4 Multiplication of an arbitrary field element of GF(8) by x^2 + x.]

We next focus on the multiplication of an arbitrary polynomial a(x) by g(x). Let the polynomial g(x) be represented as
g(x) = g_L x^L + ... + g_1 x + g_0,   (4.28)
the polynomial a(x) be represented as
a(x) = a_k x^k + ... + a_1 x + a_0,   (4.29)
and the resultant polynomial b(x) = a(x) g(x) be represented as
b(x) = b_{k+L} x^{k+L} + ... + b_1 x + b_0.   (4.30)
The circuit realization of b(x) is given in Fig. 4.5. This is a linear feed-forward shift register. It is also called a Finite Impulse Response (FIR) filter.

[Fig. 4.5 A Finite Impulse Response (FIR) filter.]
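The feed-forward structure of Fig. 4.5 can be simulated in software. The following is a hedged sketch (the register model and names are my own): the coefficients of a(x) are clocked in, highest first, and the weighted tap outputs are summed to produce the coefficients of b(x) = a(x) g(x).

```python
# Simulate an L-stage feed-forward (FIR) shift register over GF(2).

q = 2

def fir_multiply(a, g):
    """a, g are coefficient lists, lowest degree first; returns b = a * g."""
    L = len(g) - 1                                   # number of register stages
    register = [0] * L
    out = []
    for coeff in list(reversed(a)) + [0] * L:        # feed a_k first, then flush with zeros
        taps = [coeff] + register                    # current input plus stored values
        out.append(sum(gi * ti for gi, ti in zip(reversed(g), taps)) % q)
        register = taps[:-1]                         # shift: each stage takes the previous content
    return list(reversed(out))                       # coefficients b_0, ..., b_{k+L}

a = [1, 0, 1]            # a(x) = 1 + x^2
g = [1, 1, 0, 1]         # g(x) = 1 + x + x^3
print(fir_multiply(a, g))
# [1, 1, 1, 0, 0, 1] -> (1 + x^2)(1 + x + x^3) = 1 + x + x^2 + x^5
```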
In electrical engineering jargon, the coefficients of a(x) and g(x) are convolved by the shift register. For our purpose, we have a circuit realization for multiplying two polynomials. Thus, we have an efficient mechanism of encoding a cyclic code by multiplying the information polynomial by the generator polynomial.

Example 4.25 The encoder circuit for the generator polynomial g(x) = x^8 + x^6 + x^5 + x^3 + x + 1 is given in Fig. 4.6. This is the generator polynomial for the Fire code with t = m = 3. It is easy to interpret the circuit: the 8 memory units shift the input, one unit at a time, and the shifted outputs are summed at the proper locations. There are five adders for summing up the six shifted versions of the input.

[Fig. 4.6 Circuit realization of the encoder for the Fire code.]

We can also use a shift register circuit for dividing an arbitrary polynomial, a(x), by a fixed polynomial g(x). We assume here that the divisor is a monic polynomial; we already know how to factor out a scalar in order to convert any polynomial to a monic polynomial. The division process can be expressed as a pair of recursive equations. Let Q^(r)(x) and R^(r)(x) be the quotient polynomial and the remainder polynomial at the r-th recursion step, with the initial conditions Q^(0)(x) = 0 and R^(0)(x) = a(x). Then, the recursive equations can be written as
Q^(r)(x) = Q^(r-1)(x) + R^(r-1)_{n-r} x^{k-r},
R^(r)(x) = R^(r-1)(x) - R^(r-1)_{n-r} x^{k-r} g(x),   (4.31)
where R^(r-1)_{n-r} represents the leading coefficient of the remainder polynomial at stage (r - 1). For dividing an arbitrary polynomial a(x) by a fixed polynomial g(x), the circuit realization is given in Fig. 4.7. After n shifts, the quotient has been passed out of the shift register, and the value stored in the shift register is the remainder. Thus the shift register implementation of a decoder is very simple: the contents of the shift register are checked after the division of the received polynomial by the generator polynomial, and if even a single memory unit of the shift register is non-zero, an error is detected.

[Fig. 4.7 A shift register circuit for dividing by g(x).]

Example 4.26 The shift register circuit for dividing by g(x) = x^8 + x^6 + x^5 + x^3 + x + 1 is given in Fig. 4.8.

[Fig. 4.8 A shift register circuit for dividing by g(x) = x^8 + x^6 + x^5 + x^3 + x + 1.]

The procedure for error detection and error correction is as follows. The received word is first stored in a buffer and is subjected to the divide-by-g(x) operation. As we have seen, the division can be carried out very efficiently by a shift register circuit. The remainder in the shift register is then compared with all the possible (pre-computed) syndromes. This set of syndromes corresponds to the set of correctable error patterns. If a syndrome match is found, the error is subtracted out from the received word. The corrected version of the received word is then passed on to the next stage of the receiver unit for further processing. This kind of decoder is known as a Meggitt Decoder. The flow chart for it is given in Fig. 4.9.

[Fig. 4.9 The flow chart of a Meggitt decoder: the received word enters an n-stage shift register, is divided by g(x) (with feedback), the remainder is compared with all test syndromes, and the corrected word is output.]
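The divide-by-g(x) register of Figs. 4.7 and 4.8 can also be simulated directly. The sketch below is illustrative (the codeword c = (1 + x) g(x) and the single error are my own choices): after clocking in all n coefficients, the register holds the remainder, i.e., the syndrome a Meggitt decoder would look up.

```python
# Simulate the divide-by-g(x) shift register over GF(2); g(x) is assumed monic.

def lfsr_remainder(v, g):
    r = len(g) - 1                          # degree of g(x)
    register = [0] * r                      # register[i] holds the coefficient of x^i
    for coeff in reversed(v):               # feed v_{n-1}, ..., v_1, v_0
        feedback = register[r - 1]
        register = [coeff] + register[:-1]  # multiply the state by x and feed in the next coefficient
        if feedback:                        # subtract feedback * g(x)
            register = [ri ^ gi for ri, gi in zip(register, g[:r])]
    return register                         # remainder coefficients r_0, ..., r_{r-1}

g = [1, 1, 0, 1]                            # g(x) = 1 + x + x^3
c = [1, 0, 1, 1, 1, 0, 0]                   # c(x) = (1 + x) g(x) = 1 + x^2 + x^3 + x^4, a codeword
print(lfsr_remainder(c, g))                 # [0, 0, 0] -> no error detected
e = [0, 0, 0, 0, 1, 0, 0]                   # single error at position 4
v = [ci ^ ei for ci, ei in zip(c, e)]
print(lfsr_remainder(v, g))                 # [0, 1, 1] -> non-zero syndrome, error detected
```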
4.11 CONCLUDING REMARKS

The notion of cyclic codes was first introduced by Prange in 1957. The work on cyclic codes was further developed by Peterson and Kasami. Pioneering work on the minimum distance of cyclic codes was done by Bose and Raychaudhuri in the early 1960s. Another subclass of cyclic codes, the BCH codes (named after Bose, Chaudhuri and Hocquenghem), will be studied in detail in the next chapter. It was soon discovered that almost all of the earlier discovered linear block codes could be made cyclic. The initial steps in the area of burst error correction were taken by Abramson in 1959. The Fire codes were published in the same year. The binary and the ternary Golay codes were published by Golay as early as 1949. Shift register circuits for cyclic codes were introduced in the works of Peterson, Chien and Meggitt in the early 1960s. Important contributions were also made by Kasami, MacWilliams, Mitchell and Rudolph.

SUMMARY

• A polynomial is a mathematical expression f(x) = f_0 + f_1 x + ... + f_m x^m, where the symbol x is called the indeterminate and the coefficients f_0, f_1, ..., f_m are the elements of GF(q). The coefficient f_m is called the leading coefficient. If f_m != 0, then m is called the degree of the polynomial, and is denoted by deg f(x). A polynomial is called monic if its leading coefficient is unity.
• The division algorithm states that, for every pair of polynomials a(x) and b(x) != 0 in F[x], there exists a unique pair of polynomials q(x), the quotient, and r(x), the remainder, such that a(x) = q(x) b(x) + r(x), where deg r(x) < deg b(x). The remainder is sometimes also called the residue, and is denoted by R_b(x)[a(x)] = r(x).
• Two important properties of residues are (i) R_f(x)[a(x) + b(x)] = R_f(x)[a(x)] + R_f(x)[b(x)], and (ii) R_f(x)[a(x).b(x)] = R_f(x){R_f(x)[a(x)].R_f(x)[b(x)]}, where a(x), b(x) and f(x) are polynomials over GF(q).
• A polynomial f(x) in F[x] is said to be reducible if f(x) = a(x) b(x), where a(x), b(x) are elements of F[x] and deg a(x) and deg b(x) are both smaller than deg f(x). If f(x) is not reducible, it is called irreducible. A monic irreducible polynomial of degree at least one is called a prime polynomial.
• The ring F[x]/f(x) is a field if and only if f(x) is a prime polynomial in F[x].
• A code C in R_n is a cyclic code if and only if C satisfies the following conditions: (i) a(x), b(x) in C implies a(x) + b(x) in C, and (ii) a(x) in C and r(x) in R_n implies a(x) r(x) in C.
• The following steps can be used to generate a cyclic code:
  (i) Take a polynomial f(x) in R_n.
  (ii) Obtain a set of polynomials by multiplying f(x) by all possible polynomials in R_n.
  (iii) The set of polynomials obtained above corresponds to the set of codewords belonging to a cyclic code. The blocklength of the code is n.
• Let C be an (n, k) non-zero cyclic code in R_n. Then,
  (i) there exists a unique monic polynomial g(x) of the smallest degree in C,
  (ii) the cyclic code C consists of all multiples of the generator polynomial g(x) by polynomials of degree k - 1 or less,
  (iii) g(x) is a factor of x^n - 1,
  (iv) the degree of g(x) is n - k.
• For a cyclic code C with generator polynomial g(x) = g_0 + g_1 x + ... + g_r x^r of degree r, the generator matrix is given by

        | g_0  g_1  ...  g_r  0    ...  0   |
  G =   | 0    g_0  g_1  ...  g_r  ...  0   |        k = (n - r) rows, n columns.
        | ...                               |
        | 0    ...  0    g_0  g_1  ...  g_r |

• For a cyclic code C with parity check polynomial h(x) = h_0 + h_1 x + ... + h_k x^k, the parity check matrix is given by

        | h_k  h_{k-1}  ...  h_0  0    ...  0   |
  H =   | 0    h_k  h_{k-1}  ...  h_0  ...  0   |        (n - k) rows, n columns.
        | ...                                   |
        | 0    ...  0    h_k  h_{k-1}  ...  h_0 |

• x^n - 1 = h(x) g(x), where g(x) is the generator polynomial and h(x) is the parity check polynomial.
• A Fire code is a cyclic burst error correcting code over GF(q) with the generator polynomial g(x) = (x^{2t-1} - 1) p(x), where p(x) is a prime polynomial over GF(q) whose degree m is not smaller than t and p(x) does not divide x^{2t-1} - 1. The blocklength of the Fire code is the smallest integer n such that g(x) divides x^n - 1. A Fire code can correct all burst errors of length t or less.
• The generator polynomial of the Binary Golay Code: g1(x) = (x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1), or g2(x) = (x^11 + x^9 + x^7 + x^6 + x^5 + x + 1).
• The generator polynomial of the Ternary Golay Code: g1(x) = (x^5 + x^4 - x^3 + x^2 - 1), or g2(x) = (x^5 - x^3 + x^2 - x - 1).
• One of the common error detecting codes is the class of Cyclic Redundancy Check (CRC) codes. For a k-bit block of bits, the (n, k) CRC encoder generates an (n - k) bit long Frame Check Sequence (FCS).
• Shift registers can be used to encode and decode cyclic codes easily. Encoding and decoding of cyclic codes require multiplication and division by polynomials, and the shift property of shift registers is ideally suited for such operations.

Everything should be made as simple as possible, but not simpler.
Albert Einstein (1879-1955)

PROBLEMS

4.1 Which of the following codes are (a) cyclic, (b) equivalent to a cyclic code?
(a) {0000, 0110, 1100, 0011, 1001} over GF(2).
(b) {00000, 10110, 01101, 11011} over GF(2).
(c) {00000, 10110, 01101, 11011} over GF(3).
(d) {0000, 1122, 2211} over GF(3).
(e) The q-ary repetition code of length n.
4.2 Construct the addition and multiplication tables for
(a) F[x]/(x^2 + 1) defined over GF(2),
(b) F[x]/(x^2 + 1) defined over GF(3).
Which of the above is a field?
4.3 List out all the irreducible polynomials over
(a) GF(2) of degrees 1 to 5,
(b) GF(3) of degrees 1 to 3.
4.4 Find all the cyclic binary codes of blocklength 5. Find the minimum distance of each code.
4.5 Suppose x^n - 1 is a product of r distinct irreducible polynomials over GF(q). How many cyclic codes of blocklength n over GF(q) exist? Comment on the minimum distance of these codes.
4.6 (a) Factorize x^8 - 1 over GF(3).
(b) How many ternary cyclic codes of length 8 exist?
(c) How many quaternary cyclic codes of length 8 exist?
4.7 Let the polynomial g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1 be the generator polynomial of a cyclic code over GF(2) with blocklength 15.
(a) Find the generator matrix G.
(b) Find the parity check matrix H.
(c) How many errors can this code detect?
(d) How many errors can this code correct?
(e) Write the generator matrix in the systematic form.
4.8 Consider the polynomial g(x) = x^6 + 3x^5 + x^4 + x^3 + 2x^2 + 2x + 1.
(a) Is this a valid generator polynomial for a cyclic code over GF(4) with blocklength 15?
(b) Find the parity check matrix H.
(c) What is the minimum distance of this code?
(d) What is the code rate of this code?
(e) Is the received word v(x) = x^5 + x^4 + 3x^3 + x^2 + 3x + 1 a valid codeword?
4.9 An error vector of the form x^i + x^{i+1} in R_n is called a double adjacent error. Show that the code generated by the generator polynomial g1(x) = (x - 1) g_H(x) is capable of correcting all double adjacent errors, where g_H(x) is the generator polynomial of the binary Hamming code.
4.10 Design the shift register encoder and the Meggitt decoder for the code generated in Problem 4.8.
4.11 The code with the generator polynomial g(x) = (x^23 + 1)(x^17 + x^3 + 1) is used for error detection and correction in the GSM standard.
(i) How many random errors can this code correct?
(ii) How many burst errors can this code correct?

COMPUTER PROBLEMS

4.12 Write a computer program to find the minimum distance of a cyclic code over GF(q), given the generator polynomial (or the generator matrix) for the code.
4.13 Write a computer program to encode and decode a (35, 27) Fire code. It should be able to automatically correct bursts of length 3 or less. What happens when you try to decode a received word with a burst error of length 4?
5
Bose-Chaudhuri Hocquenghem (BCH) Codes

5.1 INTRODUCTION TO BCH CODES

The class of Bose-Chaudhuri Hocquenghem (BCH) codes is one of the most powerful known classes of linear cyclic block codes. BCH codes are known for their multiple error correcting ability and the ease of encoding and decoding. So far, our approach has been to construct a code and then find out its minimum distance in order to estimate its error correcting capability. In this class of codes, we start from the other end: we begin by specifying the number of random errors we desire the code to correct, and then go on to construct the generator polynomial for the code. As mentioned above, BCH codes are a subclass of cyclic codes, and therefore the decoding methodology for any cyclic code also works for BCH codes. However, more efficient decoding procedures are known for BCH codes, and these will be discussed in this chapter.

We begin by building the necessary mathematical tools in the next couple of sections. We shall then look at the method for constructing the generator polynomial for BCH codes. Efficient decoding techniques for this class of codes will be discussed next. An important subset of BCH codes, the Reed-Solomon codes, will be introduced in the later part of this chapter.

5.2 PRIMITIVE ELEMENTS

Definition 5.1 A Primitive Element of GF(q) is an element α such that every field element except zero can be expressed as a power of α.

Example 5.1 Consider GF(5). Since q = 5 is a prime number, modulo arithmetic will work. Consider the element 2:
2^0 = 1 (mod 5), 2^1 = 2 (mod 5), 2^2 = 4 (mod 5), 2^3 = 8 (mod 5) = 3.
Hence, all the non-zero elements of GF(5), i.e., {1, 2, 3, 4}, can be represented as powers of 2. Therefore, 2 is a primitive element of GF(5).
Next, consider the element 3:
3^0 = 1 (mod 5), 3^1 = 3 (mod 5), 3^2 = 9 (mod 5) = 4, 3^3 = 27 (mod 5) = 2.
Again, all the non-zero elements of GF(5) can be represented as powers of 3. Therefore, 3 is also a primitive element of GF(5). However, it can be verified that the other non-zero elements, 1 and 4, are not primitive elements.

We saw in the example that there can be more than one primitive element in a field. But is there a guarantee of finding at least one primitive element? The answer is yes! The non-zero elements of every Galois Field form a cyclic group. Hence, a Galois Field will include an element of order q - 1. This will be the primitive element. Primitive elements are very useful in constructing fields. Once we have a primitive element, we can easily find all the other elements by simply evaluating the powers of the primitive element.

Definition 5.2 A Primitive Polynomial p(x) over GF(q) is a prime polynomial over GF(q) with the property that in the extension field constructed modulo p(x), the field element represented by x is a primitive element.

Primitive polynomials of every degree exist over every Galois Field. A primitive polynomial can be used to construct an extension field.
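The search in Example 5.1 is a one-liner to automate. The following quick sketch (illustrative only) checks, for every non-zero element of GF(5), whether its powers generate all four non-zero field elements.

```python
# Which elements of GF(5) are primitive?

q = 5
for a in range(1, q):
    powers = {pow(a, i, q) for i in range(1, q)}      # a^1, ..., a^{q-1} mod 5
    status = "primitive" if len(powers) == q - 1 else "not primitive"
    print(a, sorted(powers), status)
# 2 and 3 turn out to be primitive; 1 and 4 are not.
```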
Example 5.2 We can construct GF(8) using the primitive polynomial p(x) = x^3 + x + 1. Let the primitive element of GF(8) be α = z. Then, we can represent all the elements of GF(8) by the powers of α evaluated modulo p(x). Thus, we can form Table 5.1.

Table 5.1 The elements of GF(8)

α^1    z
α^2    z^2
α^3    z + 1
α^4    z^2 + z
α^5    z^2 + z + 1
α^6    z^2 + 1
α^7    1

Theorem 5.1 Let β_1, β_2, ..., β_{q-1} denote the non-zero field elements of GF(q). Then,
x^{q-1} - 1 = (x - β_1)(x - β_2) ... (x - β_{q-1}).   (5.1)

Proof The set of non-zero elements of GF(q) is a finite group under the operation of multiplication. Let β be any non-zero element of the field. It can be represented as a power of the primitive element α. Let β = α^r for some integer r. Therefore,
β^{q-1} = (α^r)^{q-1} = (α^{q-1})^r = (1)^r = 1,
because α^{q-1} = 1. Hence, β is a zero of x^{q-1} - 1. This is true for any non-zero element β, which proves the result.

Example 5.3 Consider the field GF(5). The non-zero elements of this field are {1, 2, 3, 4}. Therefore, we can write x^4 - 1 = (x - 1)(x - 2)(x - 3)(x - 4).

5.3 MINIMAL POLYNOMIALS

In the previous chapter we saw that in order to find the generator polynomials for cyclic codes of blocklength n, we have to first factorize x^n - 1. Thus x^n - 1 can be written as the product of its p prime factors:
x^n - 1 = f_1(x) f_2(x) f_3(x) ... f_p(x).   (5.2)
Any combination of these factors can be multiplied together to obtain a generator polynomial g(x). If the prime factors of x^n - 1 are distinct, then there are (2^p - 2) different non-trivial cyclic codes of blocklength n. The two trivial cases that are being disregarded are g(x) = 1 and g(x) = x^n - 1. Not all of the (2^p - 2) possible cyclic codes are good codes in terms of their minimum distance. We now evolve a strategy for finding good codes, i.e., codes of desirable minimum distance. In the previous chapter we learnt how to construct an extension field from the subfield. In this section we will study the prime polynomials (in a certain field) that have zeros in the extension field. Our strategy for constructing g(x) will be as follows: using the desirable zeros in the extension field, we will find prime polynomials in the subfield, which will be multiplied together to yield a desirable g(x).

Definition 5.3 A blocklength n of the form n = q^m - 1 is called a Primitive Blocklength for a code over GF(q). A cyclic code over GF(q) of primitive blocklength is called a Primitive Cyclic Code.

The field GF(q^m) is an extension field of GF(q). Let the primitive blocklength be n = q^m - 1. Consider the factorization
x^n - 1 = x^{q^m - 1} - 1 = f_1(x) f_2(x) ... f_p(x)   (5.3)
over the field GF(q). This factorization is also valid over the extension field GF(q^m), because the addition and multiplication tables of the subfield form a part of the tables of the extension field. We also know that g(x) divides x^n - 1, i.e., x^{q^m - 1} - 1; hence g(x) must be the product of some of these polynomials f_i(x). Also, every non-zero element of GF(q^m) is a zero of x^{q^m - 1} - 1. Hence, it is possible to factor x^{q^m - 1} - 1 in the extension field GF(q^m) to get
x^{q^m - 1} - 1 = prod over j of (x - β_j),   (5.4)
where β_j ranges over all the non-zero elements of GF(q^m). This implies that each of the polynomials f_i(x) can be represented in GF(q^m) as a product of some of these linear terms, and each β_j is a zero of exactly one of the f_i(x). This f_i(x) is called the minimal polynomial of β_j.

Definition 5.4 The smallest degree polynomial with coefficients in the base field GF(q) that has a given element β of the extension field GF(q^m) as a zero is called the Minimal Polynomial of β.
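Table 5.1 can be regenerated mechanically. The sketch below is illustrative (it stores field elements as 3-bit integers, a representation of my choosing): starting from α^0 = 1, it repeatedly multiplies by α = z and reduces modulo p(z) = z^3 + z + 1.

```python
# Generate the powers of alpha in GF(8) built from p(z) = z^3 + z + 1.
# Bit i of an element holds the coefficient of z^i.

p = 0b1011                      # z^3 + z + 1
m = 3

def times_z(e):
    e <<= 1                     # multiply by z
    if e >> m:                  # the z^3 term appeared: subtract p(z)
        e ^= p
    return e

e = 1                           # alpha^0
for i in range(1, 8):
    e = times_z(e)
    print(f"alpha^{i} = {e:03b}")
# alpha^1 = 010 (z), alpha^2 = 100 (z^2), alpha^3 = 011 (z + 1), ..., alpha^7 = 001 (1)
```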
This j;(x) is called the Minimal Polynomial of f3t Definition 5..4 The smallest degree polynomial with coefficients in the base field GF(q) that has a zero in the extension field GF(tj) is called the Minima) Polynomial of ·. I
  • 74. I li Information Theory, Coding and Cryptography Example 5.4 Consider the subfield GF(2) and its extension field GF(8). Here q = 2 and m = 3. The factorization ofrl -1 (in the subfield/extension field) yields rr-l- 1 = x7 - 1 =(X- 1) HセKxK@ 1) Hセ@ + セ@ + 1). Next, consider the elements ofthe extension field GF(8). The elements can be represented as 0, 1, z, z + 1, z?, i + 1, i + z, i + z +, 1 (from Example 4.10 of Chapter 4). Therefore, we can write rr-1 -1 = x1 -1 =(x-1)(x-z)(x-z-1)(x-i)Cx-i-1)(x-i-z)(x-i-z-1) = (x- 1) · [(x- z)(x- C) (x- z?- z)] · [(x- z- 1)(x -z!-l)(x- z?- z- 1)]. It can be seen that over GF(8), Hセ@ + x + 1) = (x - z)(x - C)(x - i - z), and Hセ@ + セ@ + 1) = (x- z- 1)(x- z 2 - 1)(x- i- z- 1). The multiplication and addition are carried out over GF(8). Interestingly, after a little bit of algebra it is found that the coefficients of the minimal polynomial belong to GF (2) only. We can now make Table 5.2. Table 5.2 The Elements of GF(B) in Terms of the Powers of the Primitive Element a Minimal polynomial Corresponding Elements Elements in Terms f,(x) [31 in GF(8) of powers of a (x- I) (x1 + x+ I) (x1 +;+I) I z. i and i + z z+ I, i + I and i + z+ I ao a 1 ,a2 ,a4 a3, a6, as (= a!2) It is interesting to note the elements (in terms of powers of the primitive element a) that correspond to the same minimal polynomial. If we make the observation that a12 = d ·d = 1· a5 , we see a pattern in the elements that correspond to a certain minimal polynomial. In fact, the elements that are roots of a minimal polynomial in the extension field are of the type f3qr-l where f3 is an element of the extension field. In the above example, the zeros of the minimal polynomialf2(x) =; + x+ 1 are a 1 , a2 and a4 and that ofh(x) =; + ; + 1 are if, cf and d 2 . Definition 5.5 Two elements of GF(tj) that share the same minimal polynomial over GF(q) are called Conjugates with respect to GF(q). . Example 5.5 The elements { a 1 , a 2 , a4 } are conjugates with respect to GF(2). They share the same minimal polynomialf2(x) = セ@ + x + 1. Bose-Chaudhuri Hocquenghem (BCH) Codes As we have seen, a single element in the extension field may have more than one conjugate. The conjugacy relationship between two elements depends on the base field. For example, the extension field GF(16) can be constructed using eitherGF(2) orGF(4). Two elements that are conjugates of GF(2) may not be conjugates ofGF(4). Iff(x) is the minimal polynomial of/3, then it is also the minimal polynomial ofthe elements in the set {/3, f3q, f3q 2 ,...,f3 qr-J }, where r is the smallest integer such that f3qr-l = /3. The set {/3, f3q, f3q 2 ,•.• ,{Jqr- 1 } is called the Set of Conjugates. The elements in the set of conjugates are all the zeros off(x). Hence, the minimal polynomial offJ can be written as f(x) = (x- fJ)(x- fJq)(x- pi") ... (x- pi- 1 ). (5.5) Example 5.6 Consider GF(256) as an extension field ofGF(2). Let abe the primitive elementof GF(256). Then a set of conjugates would be {at, c?, a4, as, al6, a32, a64, al28} Note that cl-56 = a255 d = d, hence the set of conjugates terminates with d 28. The minimal polynomial of a is f(x) = (x- a 1 )(x- a 2 )(x- 。セHクM a 8 )(x- 。 Q セHクM a 32)(x- a 64 )(x- 。 Q セ@ The right hand side of the equation when multiplied out would only contain coefficients from GF(2). 
Similarly, the minimal polynomial of α^3 would be
f(x) = (x - α^3)(x - α^6)(x - α^12)(x - α^24)(x - α^48)(x - α^96)(x - α^192)(x - α^129).

Definition 5.6 BCH codes defined over GF(q) with blocklength q^m - 1 are called Primitive BCH Codes.

Having developed the necessary mathematical tools, we shall now begin our study of BCH codes. We will develop a method for constructing the generator polynomials of BCH codes that can correct a pre-specified number t of random errors.

5.4 GENERATOR POLYNOMIALS IN TERMS OF MINIMAL POLYNOMIALS

We know that g(x) is a factor of x^n - 1. Therefore, the generator polynomial of a cyclic code can be written in the form
g(x) = LCM [f_1(x), f_2(x), ..., f_p(x)],   (5.6)
where f_1(x), f_2(x), ..., f_p(x) are the minimal polynomials of the zeros of g(x). Each minimal polynomial corresponds to a zero of g(x) in an extension field. We will design good codes (i.e., determine the generator polynomials) with desirable zeros using this approach.
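As a concrete check of this strategy, the sketch below multiplies out (x - α)(x - α^2)(x - α^4) in GF(8) — the conjugate set of α from Table 5.2 — and confirms, as equation (5.5) promises, that the product collapses to x^3 + x + 1 with coefficients in GF(2). The GF(8) arithmetic is built exactly as in the earlier sketch; this is an illustration under those assumptions, not a general minimal-polynomial routine.

```python
# Verify that (x - alpha)(x - alpha^2)(x - alpha^4) = x^3 + x + 1 over GF(8).
P = 0b1011                       # primitive polynomial x^3 + x + 1

exp = [1] * 7                    # exp[i] = alpha^i as a 3-bit integer
for i in range(1, 7):
    e = exp[i - 1] << 1
    exp[i] = e ^ P if e & 0b1000 else e
log = {exp[i]: i for i in range(7)}

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else exp[(log[a] + log[b]) % 7]

def poly_mul(f, g):
    """Multiply polynomials with GF(8) coefficients (lists, lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] ^= gf_mul(fi, gj)   # addition in GF(8) is XOR
    return out

f = [1]
for k in (1, 2, 4):              # conjugate exponents of alpha
    f = poly_mul(f, [exp[k], 1]) # (x - alpha^k) = (x + alpha^k) in characteristic 2
print(f)                         # [1, 1, 0, 1], i.e. 1 + x + x^3
```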
Let c(x) be a codeword polynomial and e(x) be an error polynomial. Then the received polynomial can be written as
v(x) = c(x) + e(x),   (5.7)
where the polynomial coefficients are in GF(q). Now consider the extension field GF(q^m). Let γ_1, γ_2, ..., γ_p be those elements of GF(q^m) which are the zeros of g(x), i.e., g(γ_i) = 0 for i = 1, ..., p. Since c(x) = a(x)g(x) for some polynomial a(x), we also have c(γ_i) = 0 for i = 1, ..., p. Thus,
v(γ_i) = c(γ_i) + e(γ_i) = e(γ_i) for i = 1, ..., p.   (5.8)
For a blocklength n, we have
v(γ_i) = Σ_(j=0)^(n-1) e_j γ_i^j for i = 1, ..., p.   (5.9)
Thus, we have a set of p equations that involve components of the error pattern only. If it is possible to solve this set of equations for the e_j, the error pattern can be precisely determined. Whether this set of equations can be solved depends on the value of p, the number of zeros of g(x). In order to solve for the error pattern, we must choose the set of p equations properly. If we have to design a t error correcting cyclic code, our choice should be such that the set of equations can solve for at most t non-zero e_j.

Let us define the syndromes S_i = e(γ_i) for i = 1, ..., p. We wish to choose γ_1, γ_2, ..., γ_p in such a manner that t errors can be computed from S_1, S_2, ..., S_p. If α is a primitive element, then a set of γ_i which allows the correction of t errors is {α^1, α^2, α^3, ..., α^(2t)}. Thus, we have a simple mechanism for determining the generator polynomial of a BCH code that can correct t errors.

Steps for Determining the Generator Polynomial of a t-error Correcting BCH Code. For a primitive blocklength n = q^m - 1:
(i) Choose a prime polynomial of degree m and construct GF(q^m).
(ii) Find f_i(x), the minimal polynomial of α^i, for i = 1, ..., 2t.
(iii) The generator polynomial for the t error correcting code is simply
g(x) = LCM [f_1(x), f_2(x), ..., f_2t(x)].   (5.10)

Codes designed in this manner can correct at least t errors. In many cases the codes will be able to correct more than t errors. For this reason,
d = 2t + 1   (5.11)
is called the Designed Distance of the code, and the minimum distance d* ≥ 2t + 1. The generator polynomial has a degree equal to n - k (see Theorem 4.4, Chapter 4). It should be noted that once we fix n and t, we can determine the generator polynomial for the BCH code. The information length k is then decided by the degree of g(x). Intuitively, for a fixed blocklength n, a larger value of t will force the information length k to be smaller (because a higher redundancy will be required to correct a larger number of errors). In the following section, we look at a few specific examples of BCH codes.

5.5 SOME EXAMPLES OF BCH CODES

The following example illustrates the construction of the extension field GF(16) from GF(2). The minimal polynomials obtained will be used in the subsequent examples.

Example 5.7 Consider the primitive polynomial p(z) = z^4 + z + 1 over GF(2). We shall use this to construct the extension field GF(16). Let α = z be the primitive element. The elements of GF(16) as powers of α and the corresponding minimal polynomials are listed in Table 5.3.

Table 5.3 The elements of GF(16) and the corresponding minimal polynomials
Power of α | Element of GF(16) | Minimal polynomial
α^1 | z | x^4 + x + 1
α^2 | z^2 | x^4 + x + 1
α^3 | z^3 | x^4 + x^3 + x^2 + x + 1
α^4 | z + 1 | x^4 + x + 1
α^5 | z^2 + z | x^2 + x + 1
α^6 | z^3 + z^2 | x^4 + x^3 + x^2 + x + 1
α^7 | z^3 + z + 1 | x^4 + x^3 + 1
α^8 | z^2 + 1 | x^4 + x + 1
α^9 | z^3 + z | x^4 + x^3 + x^2 + x + 1
α^10 | z^2 + z + 1 | x^2 + x + 1
α^11 | z^3 + z^2 + z | x^4 + x^3 + 1
α^12 | z^3 + z^2 + z + 1 | x^4 + x^3 + x^2 + x + 1
α^13 | z^3 + z^2 + 1 | x^4 + x^3 + 1
α^14 | z^3 + 1 | x^4 + x^3 + 1
α^15 | 1 | x + 1
Example 5.8 We wish to determine the generator polynomial of a single error correcting BCH code, i.e., t = 1, with a blocklength n = 15. From (5.10), the generator polynomial for a BCH code is given by LCM [f_1(x), f_2(x), ..., f_2t(x)]. We will make use of Table 5.3 to obtain the minimal polynomials f_1(x) and f_2(x). Thus, the generator polynomial of the single error correcting BCH code will be
g(x) = LCM [f_1(x), f_2(x)] = LCM [(x^4 + x + 1), (x^4 + x + 1)] = x^4 + x + 1.
Since deg(g(x)) = n - k, we have n - k = 4, which gives k = 11. Thus we have obtained the generator polynomial of the BCH (15, 11) single error correcting code. The designed distance of this code is d = 2t + 1 = 3. It can be calculated that the minimum distance d* of this code is also 3. Thus, in this case the designed distance is equal to the minimum distance.
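Because the minimal polynomials involved are distinct primes, the LCM in these constructions reduces to a plain product of binary polynomials. The sketch below (ours; polynomials are stored as integer bit masks, bit i = coefficient of x^i) reproduces this generator polynomial and the ones for the double and triple error correcting codes derived next, using the minimal polynomials of Table 5.3.

```python
def mul_gf2(a, b):
    """Multiply two binary polynomials given as bit masks."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

# Minimal polynomials over GF(2) for GF(16), from Table 5.3:
f1 = 0b10011     # x^4 + x + 1              (alpha^1, alpha^2, alpha^4, alpha^8)
f3 = 0b11111     # x^4 + x^3 + x^2 + x + 1  (alpha^3, alpha^6, alpha^12, alpha^9)
f5 = 0b111       # x^2 + x + 1              (alpha^5, alpha^10)

g_t1 = f1                                  # t = 1: BCH (15, 11)
g_t2 = mul_gf2(f1, f3)                     # t = 2: BCH (15, 7)
g_t3 = mul_gf2(mul_gf2(f1, f3), f5)        # t = 3: BCH (15, 5)

print(bin(g_t1))   # 0b10011        -> x^4 + x + 1
print(bin(g_t2))   # 0b111010001    -> x^8 + x^7 + x^6 + x^4 + 1
print(bin(g_t3))   # 0b10100110111  -> x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
```

The last two printed coefficient strings also agree with the (15, 7) and (15, 5) rows of Table 5.5 later in this section.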
Next, we wish to determine the generator polynomial of a double error correcting BCH code, i.e., t = 2, with a blocklength n = 15. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x)]
= LCM [(x^4 + x + 1), (x^4 + x + 1), (x^4 + x^3 + x^2 + x + 1), (x^4 + x + 1)]
= (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)
= x^8 + x^7 + x^6 + x^4 + 1.
Since deg(g(x)) = n - k, we have n - k = 8, which gives k = 7. Thus, we have obtained the generator polynomial of the BCH (15, 7) double error correcting code. The designed distance of this code is d = 2t + 1 = 5. It can be calculated that the minimum distance d* of this code is also 5. Thus, in this case the designed distance is equal to the minimum distance.

Next, we determine the generator polynomial for the triple error correcting binary BCH code. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x)]
= (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)
= x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
In this case, deg(g(x)) = n - k = 10, which gives k = 5. Thus we have obtained the generator polynomial of the BCH (15, 5) triple error correcting code. The designed distance of this code is d = 2t + 1 = 7. It can be calculated that the minimum distance d* of this code is also 7. Thus, in this case the designed distance is equal to the minimum distance.

Next, we determine the generator polynomial for a binary BCH code for the case t = 4. The generator polynomial of the BCH code will be
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x), f_7(x), f_8(x)]
= (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x^4 + x^3 + 1)
= x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1.
In this case, deg(g(x)) = n - k = 14, which gives k = 1. It can be seen that this is the simple repetition code. The designed distance of this code is d = 2t + 1 = 9. However, it can be seen that the minimum distance d* of this code is 15. Thus in this case the designed distance is not equal to the minimum distance, and the code is over-designed. This code can actually correct (d* - 1)/2 = 7 random errors! If we repeat the exercise for t = 5, 6 or 7, we get the same generator polynomial (repetition code). Note that there are only 15 non-zero field elements in GF(16) and hence there are only 15 minimal polynomials corresponding to these field elements. Thus, we cannot go beyond t = 7 (because for t = 8 we need f_16(x), which is undefined). Hence, to obtain BCH codes that can correct a larger number of errors we must use an extension field with more elements!

Example 5.9 We can construct GF(16) as an extension field of GF(4) using the primitive polynomial p(z) = z^2 + z + 2 over GF(4), with α = z as the primitive element. Let the elements of GF(4) consist of the quaternary symbols contained in the set {0, 1, 2, 3}. The addition and multiplication tables for GF(4) are given below for handy reference.

GF(4) addition:
+ | 0 1 2 3
0 | 0 1 2 3
1 | 1 0 3 2
2 | 2 3 0 1
3 | 3 2 1 0

GF(4) multiplication:
× | 0 1 2 3
0 | 0 0 0 0
1 | 0 1 2 3
2 | 0 2 3 1
3 | 0 3 1 2

Table 5.4 lists the elements of GF(16) as powers of α and the corresponding minimal polynomials.

Table 5.4 Powers of α, the elements of GF(16), and the minimal polynomials
Power of α | Element of GF(16) | Minimal polynomial
α^1 | z | x^2 + x + 2
α^2 | z + 2 | x^2 + x + 3
α^3 | 3z + 2 | x^2 + 3x + 1
α^4 | z + 1 | x^2 + x + 2
α^5 | 2 | x + 2
α^6 | 2z | x^2 + 2x + 1
α^7 | 2z + 3 | x^2 + 2x + 2
α^8 | z + 3 | x^2 + x + 3
α^9 | 2z + 2 | x^2 + 2x + 1
α^10 | 3 | x + 3
α^11 | 3z | x^2 + 3x + 3
α^12 | 3z + 1 | x^2 + 3x + 1
α^13 | 2z + 1 | x^2 + 2x + 2
α^14 | 3z + 3 | x^2 + 3x + 3
α^15 | 1 | x + 1

For t = 1,
g(x) = LCM [f_1(x), f_2(x)] = LCM [(x^2 + x + 2), (x^2 + x + 3)] = (x^2 + x + 2)(x^2 + x + 3) = x^4 + x + 1.
Since deg(g(x)) = n - k, we have n - k = 4, which gives k = 11. Thus we have obtained the generator polynomial of the single error correcting BCH (15, 11) code over GF(4). It takes in 11 quaternary information symbols and encodes them into 15 quaternary symbols.
Note that one quaternary symbol is equivalent to two bits. So, in effect, the BCH (15, 11) code over GF(4) takes in 22 input bits and transforms them into 30 encoded bits (can this code be used to correct a burst of length 2 in a binary sequence of length 30?). The designed distance of this code is d = 2t + 1 = 3. It can be calculated that the minimum distance d* of this code is also 3. Thus in this case the designed distance is equal to the minimum distance.
For t = 2,
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x)]
= LCM [(x^2 + x + 2), (x^2 + x + 3), (x^2 + 3x + 1), (x^2 + x + 2)]
= (x^2 + x + 2)(x^2 + x + 3)(x^2 + 3x + 1)
= x^6 + 3x^5 + x^4 + x^3 + 2x^2 + 2x + 1.
This is the generator polynomial of a (15, 9) double error correcting BCH code over GF(4).

For t = 3,
g(x) = LCM [f_1(x), f_2(x), f_3(x), f_4(x), f_5(x), f_6(x)]
= LCM [(x^2 + x + 2), (x^2 + x + 3), (x^2 + 3x + 1), (x^2 + x + 2), (x + 2), (x^2 + 2x + 1)]
= (x^2 + x + 2)(x^2 + x + 3)(x^2 + 3x + 1)(x + 2)(x^2 + 2x + 1)
= x^9 + 3x^8 + 3x^7 + 2x^6 + x^5 + 2x^4 + x + 2.
This is the generator polynomial of a (15, 6) triple error correcting BCH code over GF(4).

Similarly, for t = 4, g(x) = x^11 + x^10 + 2x^8 + 3x^7 + 3x^6 + x^5 + 3x^4 + x^3 + x + 3. This is the generator polynomial of a (15, 4) four error correcting BCH code over GF(4).

Similarly, for t = 5, g(x) = x^12 + 2x^11 + 3x^10 + 2x^9 + 2x^8 + x^7 + 3x^6 + 3x^4 + 3x^3 + x^2 + 2. This is the generator polynomial of a (15, 3) five error correcting BCH code over GF(4).

Similarly, for t = 6, g(x) = x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1. This is the generator polynomial of a (15, 1) six error correcting BCH code over GF(4). As is obvious, this is the simple repetition code, and it can correct up to 7 errors.

Table 5.5 lists the generator polynomials of binary BCH codes of length up to 2^5 - 1. Suppose we wish to construct the generator polynomial of the BCH (15, 7) code. From the table we have (111 010 001) for the coefficients of the generator polynomial. Therefore, g(x) = x^8 + x^7 + x^6 + x^4 + 1.

Table 5.5 The generator polynomials of binary BCH codes of length up to 2^5 - 1
n | k | t | Generator polynomial coefficients
7 | 4 | 1 | 1 011
15 | 11 | 1 | 10 011
15 | 7 | 2 | 111 010 001
15 | 5 | 3 | 10 100 110 111
31 | 26 | 1 | 100 101
31 | 21 | 2 | 11 101 101 001
31 | 16 | 3 | 1 000 111 110 101 111
31 | 11 | 5 | 101 100 010 011 011 010 101
31 | 6 | 7 | 11 001 011 011 110 101 000 100 111

5.6 DECODING OF BCH CODES

So far we have learnt to obtain the generator polynomial for a BCH code given the number of random errors to be corrected. With the knowledge of the generator polynomial, very fast encoders can be built in hardware. We now shift our attention to the decoding of BCH codes. Since the BCH codes are a subclass of the cyclic codes, any standard decoding procedure for cyclic codes is also applicable to BCH codes. However, better, more efficient algorithms have been designed specifically for BCH codes. We discuss the Gorenstein-Zierler decoding algorithm, which is the generalized form of the binary decoding algorithm first proposed by Peterson.

We develop here the decoding algorithm for a t error correcting BCH code. Suppose a BCH code is constructed based on the field element α. Consider the error polynomial
e(x) = e_(n-1) x^(n-1) + e_(n-2) x^(n-2) + ... + e_1 x + e_0,   (5.12)
where at most t coefficients are non-zero. Suppose that ν errors actually occur, where 0 ≤ ν ≤ t. Let these errors occur at locations i_1, i_2, ..., i_ν. The error polynomial can then be written as
e(x) = e_(i_1) x^(i_1) + e_(i_2) x^(i_2) + ... + e_(i_ν) x^(i_ν),   (5.13)
where e_(i_k) is the magnitude of the k-th error. Note that we are considering the general case; for binary codes, e_(i_k) = 1. For error correction, we must know two things: (i) where the errors have occurred, i.e., the error locations, and (ii) what the magnitudes of these errors are. Thus, the unknowns are i_1, i_2, ..., i_ν and e_(i_1), e_(i_2), ..., e_(i_ν), which signify the locations and the magnitudes of the errors respectively.
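All of the decoding computations that follow, including the syndrome evaluations of Example 5.10 below, are carried out in GF(16) built from p(z) = z^4 + z + 1 (Table 5.3). A convenient software representation, sketched here under that assumption, is a pair of exponential/logarithm tables: addition is bitwise XOR of the 4-bit representations, and multiplication becomes addition of exponents modulo 15.

```python
# GF(16) arithmetic via exp/log tables, p(z) = z^4 + z + 1.
P16 = 0b10011
exp16 = [1] * 15
for i in range(1, 15):
    e = exp16[i - 1] << 1
    exp16[i] = e ^ P16 if e & 0b10000 else e
log16 = {exp16[i]: i for i in range(15)}

def add16(a, b):      # addition in GF(16) is XOR
    return a ^ b

def mul16(a, b):      # multiplication via logarithms
    return 0 if 0 in (a, b) else exp16[(log16[a] + log16[b]) % 15]

# For instance, alpha^5 + alpha^3 = alpha^11, the value of S1 in Example 5.10:
print(log16[add16(exp16[5], exp16[3])])   # 11
```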
The syndromes can be obtained by evaluating the received polynomial at α:
S_1 = v(α) = c(α) + e(α) = e(α) = e_(i_1) α^(i_1) + e_(i_2) α^(i_2) + ... + e_(i_ν) α^(i_ν).   (5.14)
Next, define the error magnitudes Y_k = e_(i_k) for k = 1, 2, ..., ν, and the error locations X_k = α^(i_k) for k = 1, 2, ..., ν, where i_k is the location of the k-th error and X_k is the field element associated with this location. Now, the syndrome can be written as
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_ν X_ν.   (5.15)
We can evaluate the received polynomial at each of the powers of α that has been used to define g(x). We define the syndromes for j = 1, 2, ..., 2t by
S_j = v(α^j) = c(α^j) + e(α^j) = e(α^j).   (5.16)
Thus, we have the following set of 2t simultaneous equations, with ν unknown error locations X_1, X_2, ..., X_ν and ν unknown error magnitudes Y_1, Y_2, ..., Y_ν:
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_ν X_ν
S_2 = Y_1 X_1^2 + Y_2 X_2^2 + ... + Y_ν X_ν^2
...
S_2t = Y_1 X_1^(2t) + Y_2 X_2^(2t) + ... + Y_ν X_ν^(2t).   (5.17)
Next, define the error locator polynomial
Λ(x) = Λ_ν x^ν + Λ_(ν-1) x^(ν-1) + ... + Λ_1 x + 1.   (5.18)
The zeros of this polynomial are the inverse error locations X_k^(-1) for k = 1, 2, ..., ν. That is,
Λ(x) = (1 - x X_1)(1 - x X_2) ... (1 - x X_ν).   (5.19)
So, if we know the coefficients of the error locator polynomial Λ(x), we can obtain the error locations X_1, X_2, ..., X_ν. After some algebraic manipulations we obtain
Λ_1 S_(j+ν-1) + Λ_2 S_(j+ν-2) + ... + Λ_ν S_j = -S_(j+ν) for j = 1, 2, ..., ν.   (5.20)
This is nothing but a set of linear equations that relate the syndromes to the coefficients of Λ(x). This set of equations can be written in matrix form as follows:

[ S_1   S_2      ...  S_(ν-1)   S_ν     ] [ Λ_ν     ]   [ -S_(ν+1) ]
[ S_2   S_3      ...  S_ν       S_(ν+1) ] [ Λ_(ν-1) ] = [ -S_(ν+2) ]
[  :     :             :         :      ] [   :     ]   [    :     ]
[ S_ν   S_(ν+1)  ...  S_(2ν-2)  S_(2ν-1)] [ Λ_1     ]   [ -S_(2ν)  ]   (5.21)

The values of the coefficients of the error locator polynomial can be determined by inverting the syndrome matrix. This is possible only if the matrix is non-singular. It can be shown that this matrix is non-singular if there are ν errors.

Steps for Decoding BCH Codes
(i) As a trial value, set ν = t and compute the determinant of the matrix of syndromes, M. If the determinant is zero, set ν = t - 1 and again compute the determinant of M. Repeat this process until a value of ν is found for which the determinant of the matrix of syndromes is non-zero. This value of ν is the actual number of errors that occurred.
(ii) Invert the matrix M and find the coefficients of the error locator polynomial Λ(x).
(iii) Solve Λ(x) = 0 to obtain the zeros and from them compute the error locations X_1, X_2, ..., X_ν. If it is a binary code, stop (because the magnitudes of the errors are unity).
(iv) If the code is not binary, go back to the system of equations:
S_1 = Y_1 X_1 + Y_2 X_2 + ... + Y_ν X_ν
S_2 = Y_1 X_1^2 + Y_2 X_2^2 + ... + Y_ν X_ν^2
...
S_2t = Y_1 X_1^(2t) + Y_2 X_2^(2t) + ... + Y_ν X_ν^(2t).
Since the error locations are now known, these form a set of 2t linear equations. These can be solved to obtain the error magnitudes.

Solving for the Λ_i by inverting the ν × ν matrix can be computationally expensive. The number of computations required will be proportional to ν^3. If we need to correct a large number of errors (i.e., a large ν), we need more efficient ways to solve the matrix equation. Various refinements have been found which greatly reduce the computational complexity. It can be seen that the ν × ν matrix is not arbitrary in form: the entries in each diagonal perpendicular to the main diagonal are all identical. This property is called persymmetry. This structure was exploited by Berlekamp (1968) and Massey (1969) to find a simpler solution to the system of equations. The simplest way to search for the zeros of Λ(x) is to test all the field elements one by one. This method of exhaustive search is known as the Chien search.

Example 5.10 Consider the BCH (15, 5) triple error correcting code with the generator polynomial
g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
Let the all-zero codeword be transmitted and the received polynomial be v(x) = x^5 + x^3.
Thus, there are two errors, at locations 5 and 3, and the error polynomial is e(x) = x^5 + x^3. But the decoder does not know this. It does not even know how many errors have actually occurred. We use the Gorenstein-Zierler decoding algorithm. First we compute the syndromes using the arithmetic of GF(16):
S_1 = α^5 + α^3 = α^11
S_2 = α^10 + α^6 = α^7
S_3 = α^15 + α^9 = α^7
S_4 = α^20 + α^12 = α^14
S_5 = α^25 + α^15 = α^5
S_6 = α^30 + α^18 = α^14
First set ν = t = 3, since this is a triple error correcting code. Det(M) = 0, which implies that fewer than 3 errors have occurred. Next, set ν = 2.
Det(M) ≠ 0, which implies that 2 errors have actually occurred. We next calculate M^(-1) and solve for Λ_1 and Λ_2, obtaining Λ_2 = α^8 and Λ_1 = α^11. Thus,
Λ(x) = α^8 x^2 + α^11 x + 1 = (α^5 x + 1)(α^3 x + 1).
Thus, the recovered error locations are α^5 and α^3. Since the code is binary, the error magnitudes are 1. Thus, e(x) = x^5 + x^3.

In the next section, we will study the famous Reed-Solomon codes, an important subclass of BCH codes.

5.7 REED-SOLOMON CODES

Reed-Solomon (RS) codes are an important subclass of the non-binary BCH codes with a wide range of applications in digital communications and data storage. The typical application areas of RS codes are:
• Storage devices (including tape, Compact Disc, DVD, barcodes, etc.),
• Wireless or mobile communication (including cellular telephones, microwave links, etc.),
• Satellite communication,
• Digital television / Digital Video Broadcast (DVB),
• High-speed modems such as those employing ADSL, xDSL, etc.

It all began with a five-page paper that appeared in 1960 in the Journal of the Society for Industrial and Applied Mathematics. The paper, "Polynomial Codes over Certain Finite Fields" by Irving S. Reed and Gustave Solomon of MIT's Lincoln Laboratory, introduced the ideas that form a significant portion of current error correcting techniques for everything from computer hard disk drives to CD players. Reed-Solomon codes (plus a lot of engineering wizardry, of course) made possible the stunning pictures of the outer planets sent back by Voyager II. They make it possible to scratch a compact disc and still enjoy the music. And in the not-too-distant future, they will enable the profit mongers of cable television to squeeze more than 500 channels into their systems.

An RS coding system is based on groups of bits, such as bytes, rather than individual 0s and 1s, making it particularly good at dealing with bursts of errors: six consecutive bit errors, for example, can affect at most two bytes. Thus, even a double-error-correcting version of a Reed-Solomon code can provide a comfortable safety factor. Current implementations of Reed-Solomon codes in CD technology are able to cope with error bursts as long as 4000 consecutive bits.

In this subclass of BCH codes, the symbol field GF(q) and the error locator field GF(q^m) are the same, i.e., m = 1. Thus, in this case
n = q^m - 1 = q - 1.   (5.22)
The minimal polynomial of any element β in the same field GF(q) is
f_β(x) = x - β.   (5.23)
Since the symbol field (subfield) and the error locator field (extension field) are the same, all the minimal polynomials are linear. The generator polynomial for a t error correcting code will be simply
g(x) = LCM [f_1(x), f_2(x), ..., f_2t(x)] = (x - α)(x - α^2) ... (x - α^(2t-1))(x - α^(2t)).   (5.24)
Hence, the degree of the generator polynomial will always be 2t. Thus, the RS code satisfies
n - k = 2t.   (5.25)
In general, the generator polynomial of an RS code can be written as
g(x) = (x - α^j)(x - α^(j+1)) ... (x - α^(j+2t-2))(x - α^(j+2t-1)).   (5.26)

Example 5.11 Consider the double error correcting RS code of blocklength 15 over GF(16). Here t = 2. We use here the elements of the extension field GF(16) constructed from GF(2) using the primitive polynomial p(z) = z^4 + z + 1.
The generator polynomial can be written as
g(x) = (x - α)(x - α^2)(x - α^3)(x - α^4)
= x^4 + (z^3 + z^2 + 1) x^3 + (z^3 + z^2) x^2 + z^3 x + (z^2 + z + 1)
= x^4 + α^13 x^3 + α^6 x^2 + α^3 x + α^10.
Here n - k = 4, which implies k = 11. Thus, we have obtained the generator polynomial of an RS (15, 11) code over GF(16). Note that this coding procedure takes in 11 symbols (equivalent to 4 × 11 = 44 bits) and encodes them into 15 symbols (equivalent to 60 bits).

Theorem 5.2 A Reed-Solomon code is a Maximum Distance Separable (MDS) code and its minimum distance is n - k + 1.

Proof: Let the designed distance of the RS code be d = 2t + 1. The minimum distance d* satisfies the condition
d* ≥ d = 2t + 1.
But, for an RS code, 2t = n - k. Hence,
d* ≥ n - k + 1.
But, by the Singleton bound, for any linear code
d* ≤ n - k + 1.
Thus, d* = n - k + 1, and the minimum distance d* equals d, the designed distance of the code.

Since RS codes are maximum distance separable (MDS), all of the possible codewords are as far apart as possible algebraically in the code space. It implies a uniform codeword distribution in the code space. Table 5.6 lists the parameters of some RS codes. Note that for a given minimum distance, in order to have a high code rate, one must work with larger Galois fields.

Table 5.6 Some RS code parameters
m | q = 2^m | n = q - 1 | t | k | d | r = k/n
2 | 4 | 3 | 1 | 1 | 3 | 0.3333
3 | 8 | 7 | 1 | 5 | 3 | 0.7143
3 | 8 | 7 | 2 | 3 | 5 | 0.4286
3 | 8 | 7 | 3 | 1 | 7 | 0.1429
4 | 16 | 15 | 1 | 13 | 3 | 0.8667
4 | 16 | 15 | 2 | 11 | 5 | 0.7333
4 | 16 | 15 | 3 | 9 | 7 | 0.6000
4 | 16 | 15 | 4 | 7 | 9 | 0.4667
4 | 16 | 15 | 5 | 5 | 11 | 0.3333
4 | 16 | 15 | 6 | 3 | 13 | 0.2000
4 | 16 | 15 | 7 | 1 | 15 | 0.0667
5 | 32 | 31 | 1 | 29 | 3 | 0.9355
5 | 32 | 31 | 5 | 21 | 11 | 0.6774
5 | 32 | 31 | 8 | 15 | 17 | 0.4839
8 | 256 | 255 | 5 | 245 | 11 | 0.9608
8 | 256 | 255 | 15 | 225 | 31 | 0.8824
8 | 256 | 255 | 50 | 155 | 101 | 0.6078

Example 5.12 A popular Reed-Solomon code is RS(255, 223) with 8-bit symbols (bytes), i.e., over GF(256). Each codeword contains 255 codeword bytes, of which 223 bytes are data and 32 bytes are parity. For this code, n = 255, k = 223 and n - k = 32. Hence, 2t = 32, or t = 16. Thus, the decoder can correct any 16-symbol random error pattern in the codeword, i.e., errors in up to 16 bytes anywhere in the codeword can be corrected.

Example 5.13 Reed-Solomon error correction codes have an extremely pronounced effect on the efficiency of a digital communication channel. For example, an operation running at a data rate of 1 million bytes per second will carry approximately 4000 blocks of 255 bytes each second. If 1000 random short errors (less than 17 bits in length) per second are injected into the channel, about 600 to 800 blocks per second would be corrupted, which might require retransmission of nearly all of the blocks. By applying the Reed-Solomon (255, 235) code (that corrects up to 10 errors per block of 235 information bytes and 20 parity bytes), the typical time between blocks that cannot be corrected and would require retransmission will be about 800 years. The mean time between incorrectly decoded blocks will be over 20 billion years!

5.8 IMPLEMENTATION OF REED-SOLOMON ENCODERS AND DECODERS

Hardware Implementation
A number of commercial hardware implementations exist for RS codes. Many existing systems use off-the-shelf integrated circuits that encode and decode Reed-Solomon codes. These ICs tend to support a certain amount of programmability, for example, RS(255, k) where t = 1 to 16 symbols. The recent trend has been towards VHDL or Verilog designs (logic cores or intellectual property cores). These have a number of advantages over standard ICs. A logic core can be integrated with other VHDL or Verilog components and synthesized to an FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit); this enables so-called "System on Chip" designs where multiple modules can be combined in a single IC. Depending on production volumes, logic cores can often give significantly lower system costs than standard ICs. By using logic cores, a designer avoids the potential need to do a life-time buy of a Reed-Solomon IC.

Software Implementation
Until recently, software implementations in "real-time" required too much computational power for all but the simplest of Reed-Solomon codes (i.e., codes with small values of t). The major difficulty in implementing Reed-Solomon codes in software is that general purpose processors do not support Galois field arithmetic operations.
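The standard software workaround, described in the next paragraph, is to precompute logarithm and antilogarithm tables. A minimal sketch is given below; it assumes the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common choice for GF(256), although a particular standard may specify a different one.

```python
# Table-driven GF(256) multiplication, as commonly used in software RS codecs.
PRIM = 0x11D                 # x^8 + x^4 + x^3 + x^2 + 1 (assumed primitive polynomial)
EXP = [0] * 512              # doubled so EXP[i + j] needs no modulo
LOG = [0] * 256

x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:            # degree reached 8: reduce modulo PRIM
        x ^= PRIM
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf256_mul(a, b):
    """Multiply two GF(256) elements: a test for zero, two log look-ups,
    an addition of exponents, and one antilog look-up."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf256_mul(0x57, 0x13))   # a sample product; addition in GF(256) is simply XOR
```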
For example, to implement a Galois Field multiply in software requires a test for 0, two log table look-ups, modulo add and anti-log table look-up. However, careful design together with increases in processor performance mean that software implementation can operate at relatively high data rates. Table 5.7 gives sample benchmark figures on a 1.6 GHz Pentium PC. These data rates are for decoding only. Encoding is considerably faster since it requires less computation. Table 5.7 Sample benchmark figures for software decoding of some RS codes · Code Data Rate t . RS(255,251) RS(255,239) RS(255,223) 5.9 NESTED CODES - 120 Mbps - 30 Mbps - 10 Mbps 2 8 16 One of the ways to achieve codes with large blocklengths is to nest codes. This technique combines a code of a small alphabet size and one of a larger alphabet size. Let a block of ttary symbols be of length kK. This block can be broken up into Ksub-blocks of k symbols. Each sub- block can be viewed as an element of a l-ary alphabet. A sequence of K such sub-blocks can be encoded with an (N, K) code over GF(q*). Now, each of theN q*-ary symbols can be viewed as .I
  • 81. Information Theory, Coding and Cryptography k q-ary symbols and can be coded with an (n, k) q-ary code. Thus, a nested code has two distinct levels of coding. This method of generating a nested code is given in Fig. 5.I. セMMMMMMMMMMMMMMMMMMMMQ@ Outer Encoder. - (N, K)Code over GF(qk} 1 アセ\M。イケ@ super channel I I I r-r I I I Inner Encoder: (n, k) Code t---- over GF(q) I I q-ary f----+- Inner Decoder t-+ channel I _____________________ j Fig. 5. 1 Nesting of Codes. Outer Decoder Example 5.14 The following two codes can be nested to form a code with a larger blocklength. Inner code: The RS (7, 3) double error correcting code over GF(8). Outer code: The RS (5II, 505) triple error correcting code over GF(83). On nesting these codes we obtain a (3577, 1515) code over GF(8). This code can correct any random pattern of II errors. The codeword is 3577 symbols long, where the symbols are the elements ofGF(8). Example 5.15 RS codes are extensively used in the compact discs (CD) for ermr correction. Below we give the standard Compact Disc digital fonnat. Sampling frequency: 44.1 kHz, i.e., 10% margin with respect to the Nyquist frequency (audible frequencies below 20kHz) Quantization: 16-bit linear=> theoretical SNR about 98 dB (for sinusoidal signal with maximum allowed amplitude), 2's complement Signal format: Audio bit rate 1.4I Mbit/s (44.I kHz x 16 bits x 2 channels), Cross Interleave Reed-Solomon Code (CIRC), total data rate (CIRC, sync, subcode) 2.034 Mbit/s. Playing time: Maximum 74.7 min. Disc specifications: Diameter 120 mm, thickness 1.2 mm, track pitch 1.6 f.lill, one side medium, disc rotates clockwise, signal is recorded from inside to outside, constant linear velocity (CLV), recording maximizes recording density (the speed of revolution of the disc is not constant; it gradually decreases from 500 to 200 r/min), pit is about 0.5 J.1IIl wide, each pit edge is '1' and all areas in between, whether inside or outside a pit, are '0's. E"or Correction: A typical error rate of a CD system is w-5, which means that a data error occurs roughly 20 times per second (bit rate x BER). About 200 error/s can be ..;Orrected. Soutces ofe"ors: Dust, scratches, fingerprints, pit asymmetry, bubbles or defects in substrate, coating defects and dropouts. Bose-Chaudhuri Hocquenghem (BCH) Codes Cross Interleave Reed-Solomon Code (CIRC) • C2 can effectively correct burst errors. • C1 can correct random errors and detect burst errors. • Three interleaving stages to encode data before it is placed on a disc. • Parity checking to correct random errors • Cross interleaving to permit parity to correct burst errors. I. Input stage: I2 words (16-bit, 6 words per channel) of data per input frame divided into 24 symbols of 8 bits. 2. C2 Reed Solomon code: 24 symbols of data are enclosed into a (28, 24) RS code and 4 parity symbols are used for error correction. 3. Cross interleaving: to guard against burst errors, separate error correction codes, one code can check the accuracy of another, error correction is enhanced. 4. C1 Reed-Solomon code: cross-interleaved 28 symbols of the C2 code are encoded again into a (32, 28) R-S code (4 parity symbols are used for error correction). 5. Output stage: half of the code word is subject to a 1-symbol delay to avoid 2-symbol error at the boundary of symbols. Performance ofCIRC: Both RS coders (C1 and C2) have four parities, and their minimumdistance is 5. If error location is not known, up to two symbols can be corrected. 
If the errors exceed the correction limit, they are concealed by interpolation. Since even-numbered sampleddata and odd- numbered sampled data are interleaved as much as possible, CIRC can conceal long burst errors by simple lin<?ar interpolation. • Maximum correctable burst length is about 4000 data bits (2.5 mm track length). • Maximum correctable burst length by interpolation in the worst case is about 12320 data bits (7.7 mm track length). Sample interpolation rate is one sample every IO hours at BER (Bit Error Rate)= 1o- 4 and 1000 samples at BER = 10-3 • Undetectable error samples (clicks) are less than one every 750 hours at BER = 10-3 and negligible at BER = 10-4 . 5.10 CONCLUDING REMARKS The class of BCH codes were discovered independently by Hocquenghem in I959 and Bose and Ray Chaudhuri in I960. The BCH codes constitute one of the most important and powerful classes of linear block codes, which are cyclic. The Reed-Solomon codes were discovered by Irving S. Reed and Gustave Solomon who published a five-page paper in the journal of the Society for Industrial and Applied Mathematics in 1960 titled "Polynomial Codes over Certain Finite Fields". Despite their advantages, Reed-
  • 82. Information Theory, Coding and Cryptography Solomon codes did not go into use immediately after their invention. They had to wait for the hardware technology to catch up. In 1960, there was no such thing as fast digital electronics, at least not by today's standards. The Reed-Solomon paper suggested some nice ways to process data, but nobody knew if it was practical or not, and in 1960 it probably wasn't practical. Eventually technology did catch up, and numerous researchers began to work on implementing the codes. One of the key individuals was Elwyn Berlekamp, a professor of electrical engineering at the University of California at Berkeley, who invented an efficient algorithm for decoding the Reed-Solomon code. Berlekamp's algorithm was used by Voyager II and is the basis for decoding in CD players. Many other bells and whistles (some of fundamental theoretic significance) have also been added. Compact discs, for example, use a version called cross-interleaved Reed-Solomon code, or CIRC. SUMMARY • A primitive element of GF(q) is an element a such that every field element except zero can be expressed as a power of a. A field can have more than one primitive element. • A primitive polynomial p(x) over GF(q) is a prime polynomial over GF(q) with the property that in the extension field constructed modulo p(x), the field element represented by x is a primitive element. • A blocklength n of the form n = rf - 1 is called a primitive block length for a code over GF(q). A cyclic code over GF(q) of primitive blocklength is called a primitive cyclic code. • It is possible to factor xqm-r - 1 in the extension field GF(rf) to ァセエクアュMャ@ - 1= IT (x- fJi), j where /3- ranges over all the non-zero elements of GF(rf). This implies that each of the polynon'iials fi(x) can be represented in GF(rf) as a product of some of the linear terms, and each {31 is a zero of exactly one of the fi (x). This fi(x) is called the minimal polynomial of f3t • Two elements of GF(rf) that share the same minimal polynomial over GF(q) are called conjugates with respect to GF (q). • BCH codes defined over GF(q) with blocklength rf - 1 are called primitive BCH codes. • To determine the generator polynomial of a t-error correcting BCH code for a primitive blocklength n = qm- 1, (i) Choose a prime po.lynomial of degree m and construct GF(q,, (ii) find Ji(x), the minimal polynomial of a1 for i = 1, ..., p. (iii) obtain the generator polynomial g(x) = LCM lfi(x) f2(x), ...,.f2e(x)]. Codes designed in this manner can correct at least terrors. In many cases the codes will be able to correct more than terrors. For this reason, d = 2t + 1 is called the designed distance of the code, and the minimum distance d セ@ 2t + 1. • Steps for decoding BCH codes: (1) As a trial value, set v= t and compute the determinant of the matrix of syndromes, M. If the determinant is zero, set v= t - 1. Again compute the determinant ofM. Repeat Bose-Chaudhuri Hocquenghem (BCH) Codes this process until a value of v is found for which the determinant of the matrix of syndromes is non zero. This value of v is the actual number of errors that occurred. (2) Invert the matrix M and find the coefficients of the error locator polynomial A(x). (3) Solve A(x) = 0 to obtain the zeros and from them compute the error locations Xi, x;, ... , Xrr If it is a binary code, stop (because the magnitudes of error are unity). (4) If the code in not binary, go back to the system of equations: S1 = YrX1 + Y2X2 + ... 
+ YvXv セ@ = ll.x'lr + J2.x'l2 + ··· + f;; xセ@ セエ@ = YrX2i + jRxRセ@ + ... + ヲ[[xセエ@ Since the error locations are now known, these form a set of2t linear equations. These can be solved to obtain the error magnitudes. • The generator polynomial for a terror correcting RS code will be simply g(x) = LCM[fi(x) f2(x), ..., ht(x)] = (x- a)(x- a2 ) ... (x- a2 t- 1 )(x- ift). Hence, the degree of the generator polynomial will always be 2t. Thus, the RS code satisfies n - k = 2t. • A Reed-Solomon code is a Maximum Distance Separable (MDS) Code and its minimum distance is n- k + 1. • One of the ways to achieve codes with large blocklengths is to nest codes. !his technique combines a code of a small alphabet size and a code of a larger alphabet Size. Let a b}ock of q-ary symbols be of length kK. This block can be broken up into K subblocks of k symbols. Each sub-block can be viewed as an element of a l-ary alphabet. 9 BoョN」・Mケッキセエmイセ@ whaawer セ@ n.o- : ·I セィ・キ@ ゥューセ@ セッッエョN・Mエイセ@ .. J -UセQAセ@ (by SirAt"dwr CO"f.at'!Voyles 1859-1930) PRO'BLEMS Vconstruct ,CF(9) from Gft3) using an appropriate primitive ーッャセッュゥ。ャN@ ..£Z(i) Find the ァ・ョ・セ。エッイ@ ーッャケョッュゥセ@ g (x) for a セセセYQ・@ error 」ッイイ・」⦅エゥョセ@ Je__rnary BCH code of 'C/ 「ャッ」ォャ・ョ・ᄋキセ。エ@ is the code rate of this code? cセセー。イ・@ It セエャヲエィ・@ (11, 6) ternary Golay 」ッセエィ@ respect to the code rate and the mimmum distance. (ii) Next, find' the generator polynomial g(x) for a triple error correcting ternary BCH code of blocklength 26. 5.3 Find the generator polynomial g(x) for a binary BCH code of 「ャッ」ォャ・ョセ@ 31. ⦅ャjセ・@ the primitive polynomial p(x) = セ@ + } + 1 to construct GF(32). What is e mimmum distance of this code? セ、@ the generator polynomials and the minimum distance for the following codes: §IRS (15, 11) code
  • 83. ,, Ill lJ ! I :! Information Theory, Coding and Cryptography (ii) RS (15, 7) code (iii) RS (31, 21) code. 5.5 Show that every BCH code is a subfield subcode of a Reed-Solomon Code of the same designed distance. Under what condition is the code rate of the BCH code equal to that of the RS code? 5.6 Consider the code over GF(11) with a parity check matrix - {セ@ セ@ セ@ セZZ@ Qセ@ l H- 1 'J? セ@ 1o2 1 r.t :il 1ol (i) Find the minimum distance of this code. (ii) Show that this is an optimal code with respect to the Singleton Bound. 5.7 Consider the code over GF(11) with a parity check matrix 1 I 1 1 1 1 2 3 • 10 1 2? セ@ .2 1cf H= I cji <ji .3 Hf I セ@ 3' •• HJ' 1 z5 セ@ .5 tOS (i) Show that the code is a triple error correcting code. (ii) Find the generator polynomial for this code. COMPUTER PROBLEMS 5.8 Write a computer program which takes in the coefficients of a primitive polynomial, the values of q and m, and then constructs the extension field GF(tj). 5.9 Write a computer program that performs addition and multiplication over GF(2m), where m is an integer. 5.10 Find the generator polynomial g(x) for a binary BCH code of blocklength 63. Use the primitive polynomial p(x) = } + x+ 1 to construct GF(64). What is the minimum distance of this code? 5.11 Write a program that performs BCH decoding given n, q, t and the received vector. 5.12 Write a program that outputs the generator polynomial of the Reed-Solomon code with the codeword length n and the message length k. A valid n should be 2M- 1, where M is an integer not less than 3. The program should also list the minimum distance of the code. 5.13 Write a computer program that performs the two level RS coding as done in a standard compact disc. 6 Convolutional Codes f rhet セー。エャャNエ@ o・エキ・・KG|ャエャ・ャッMセ@ cョエセ@ イ・。ャLセ@ I セセセ」」キアZゥャL・NヲエOセ@ L j。L」アオ。MQAセQXVUᄋQYVS@ 6.1 INTRODUCTION TO CONVOLUTIONAL CODES So far we have studied block codes, where a block of k information symbols are encoded into a · block of n coded symbols. There is always a one-to-one correspondence between the uncoded block of symbols (information word) and the coded block of symbols (codeword). This method is particularly useful for high data rate applications, where the incoming stream of uncoded data is first broken into blocks, encoded, and then transmitted (Fig. 6.1). A large blocklength is important because of the following reasons. (i) Many of the good codes that have large distance properties are of large blocklengths (e.g., the RS codes), (ii) Larger blocklengths imply that the encoding overhead is small. However, very large blocklengths have the disadvantage that unless the entire block of encoded data is received at the receiver, the decoding procedure cannot start, which may result in delays. In contrast, there is another coding scheme in which much smaller blocks of uncoded
  • 84. Information Theory, Coding and Cryptography data of length セ。イ・@ used. These are called Information Frames. An information frame typically contains just a few symbols, and can have as few as just one symbol! These information frames are encoded into Codeword Frames of length no . However, just one information frame is not used to obtain the codeword frame. Instead, the current information frame with previous m information frames are used to obtain a single codeword frame. This implies that such encoders have memory, which retain the previous m incoming information frames. The codes that are obtained in this fashion are called Tree Codes. An important sub-class of Tree Codes, used frequently in practice, is called Convolutional Codes. Up to now, all the decoding techniques discussed are algebraic_ and are memoryless, i.e. decoding decisions are based only on the current codeword. Convolutional codes make decisions based on past information, i.e. memory is required. 101110... 01100101 ... Block encoder Fig. 6.1 Encoding Using a Block Encoder. In this chapter, we start with an introduction to Tree and Trellis Codes. We will then, develop the necessary mathematical tools to construct convolutional codes. We will see that convolutional codes can be easily represented by polynomials. Next, we will give a matrix description of convolutional codes. The chapter goes on to discuss the famous Viterbi Decoding Technique. We shall conclude this chapter by giving an introduction to Turbo Coding and Decoding. 6.2 TREE CODES AND TRELLIS CODES We assume that we have an infinitely long stream of incoming symbols (thanks to the volumes of information sent these days, it is not a bad assumption!). This stream of symbols is first broken up into segments of セ@ symbols. Each segment is called an Information Frame, as mentioned earlier. The encoder consists of two parts (Fig. 6.2): (i) memory, which basically is a shift register, (ii) a logic circuit. The memory of the encoder can store m information frames. Each time a new information frame arrives, it is shifted into the shift register and the oldest information frame is discarded. At the end of any frame time the encoder has m most recent information frames in its memory, which corresponds to a total of mlc0 information symbols. When a new frame arrives, the encoder computes the codeword frame using this new frame that has just arrived and the stored previous m frames. The computation of the codeword frame is done using the logic circuit. This codeword frame is then shifted out. The oldest information frame in the memory is then discarded and the most recent information frame is shifted in. The Convolutional Codes encoder is now ready for the next incoming information frame. Thus, for every information frame セ@ symbols) that comes in, the encoder generates a codeword frarrie (no symbols). It should be observed that the same information frame may not generate the same codeword frame because the codeword frame also depends on the m previous information frames. Definition 6.1 The Constraint Length of a shift register encoder is defined as the number of symbols it can store in its memory. We shall give a more formal definition of cqnstraint length later in this chapter. If the shift register encoder stores m previous information frames of length セL@ the constraint length of this encoder v= ュセN@ 101110... Information frame Fig. 6.2 A Shift Register Encoder that Generates a Tree Code. 
Definition 6.2 The infinite set of all infinitely long codewords obtained by feeding every possible input sequence to a shift register encoder is called a (AQ, 71Q ) Tree Code. The rate of this tree code is defined as (6.1) A more formal definition is that a HセL@ no) Tree Code is a mapping from the set of semi infinite sequences of elements of GF(q) into itself such that iffor any m, two semi infinite sequences agree in the first mAQ components, then their images agree in the first Nセ@ components. Definition 6.3 The Wordlength of a shift register encoder is defined as k= (m + l)AQ. The Blocklength of a shift register encoder is defined as n = (m + 1)71Q =k :: Note that the code rate R =セ@ =.!... Normally, for practical shift register encoders, no n the information frame length Ao is small (usually less than 5). Therefore, it is difficult to obtain the code rate R of tree codes close to unity, as is possible with block codes (e.g., RS codes).
Definition 6.4 A (k_0, n_0) tree code that is linear, time-invariant, and has a finite wordlength k = (m + 1) k_0 is called an (n, k) Convolutional Code.

Definition 6.5 A (k_0, n_0) tree code that is time-invariant and has a finite wordlength k is called an (n, k) Sliding Block Code. Thus, a linear sliding block code is a convolutional code.

Example 6.1 Consider the convolutional encoder given in Fig. 6.3.

Fig. 6.3 Convolutional Encoder of Example 6.1.

This encoder takes in one bit at a time and encodes it into 2 bits. The information frame length is k_0 = 1, the codeword frame length is n_0 = 2 and the blocklength is (m + 1) n_0 = 6. The constraint length of this encoder is v = 2 and the code rate is 1/2. The clock rate of the outgoing data is twice as fast as that of the incoming data. The adders are binary adders and, from the point of view of circuit implementation, are simply XOR gates.

Let us assume that the initial state of the shift register is [0 0]. Now, either a '0' or a '1' will arrive as the incoming bit. Suppose '0' comes. On performing the logic operations, we see that the computed value of the codeword frame is [0 0]. The 0 will be pushed into the memory (shift register) and the rightmost '0' will be dropped. The state of the shift register remains [0 0]. Next, let '1' arrive at the encoder. Again we perform the logic operations to compute the codeword frame. This time we obtain [1 1], which is pushed out as the encoded frame. The incoming '1' will be shifted into the memory, and the rightmost bit will be dropped. So the new state of the shift register will be [1 0].
The top branch represents the input as '0' and the lower branch corresponds to '1'. Therefore, labelling is not required for a binary trellis diagram. In general, one would label each branch with the input symbol to which it corresponds. Normally, the nodes that cannot be reached by starting at the top left node and moving only to the right are not shown in the trellis diagram. Corresponding to a certain state and a particular incoming bit, the encoder will produce an output. The output ofthe encoder is written on
Fig. 6.5 The Trellis Diagram for the Encoder Given in Fig. 6.3.

The output of the encoder is written on top of that branch. Thus, a trellis diagram gives a very easy method to encode a stream of input data. The encoding procedure using a trellis diagram is as follows.
• We start from the top left node (since the initial state of the encoder is [0 0]).
• Depending on whether a '0' or a '1' comes, we follow the upper or the lower branch to the next node.
• The encoder output is read out from the top of the branch being traversed.
• Again, depending on whether a '0' or a '1' comes, we follow the upper or the lower branch from the current node (state).
• Thus, the encoding procedure is simply following the branches on the diagram and reading out the encoder outputs that are written on top of each branch.

Encoding the bit stream 1 0 0 1 1 0 1 ... gives a trellis path as illustrated in Fig. 6.6. The encoded sequence can be read out from the diagram as 11 01 11 11 10 10 00 .... It can be seen that there is a one-to-one correspondence between the encoded sequence and a path in the trellis diagram. Should the decoding procedure, then, just search for the most likely path in the trellis diagram? The answer is yes, as we shall see further along in this chapter!

Fig. 6.6 Encoding an Input Sequence Using the Trellis Diagram.
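The trellis walk of Fig. 6.6 is easy to mimic in software. The sketch below implements the two-bit-memory encoder of Example 6.1 as a small state machine. The tap assignment — first output bit formed from the input and the oldest stored bit, second output bit from all three — is our reading of Fig. 6.3 (an assumption), chosen because it reproduces the encoded sequence 11 01 11 11 10 10 00 quoted above.

```python
def encode(bits):
    """Rate-1/2 convolutional encoder with memory [i(n-1), i(n-2)], initially 0 0.
    Output frame per input bit: (i + i(n-2), i + i(n-1) + i(n-2)), arithmetic mod 2."""
    s1 = s2 = 0                      # s1 = previous input bit, s2 = the bit before that
    frames = []
    for i in bits:
        out = (i ^ s2, i ^ s1 ^ s2)
        frames.append(f"{out[0]}{out[1]}")
        s1, s2 = i, s1               # shift the register
    return " ".join(frames)

print(encode([1, 0, 0, 1, 1, 0, 1]))   # 11 01 11 11 10 10 00
```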
G(D) = [giJ (D)] (6.3) FIR FIR FIR FIR Fig. 6.7 A Convolutional Encoder in Terms of FIR Filters with k0 = 7and n0 = 4.
Fig. 6.8 A Convolutional Encoder in Terms of FIR Filters with k_0 = 2 and n_0 = 4. The Rate of This Encoder is R = 1/2.

Example 6.3 Consider the convolutional encoder given in Fig. 6.9.

Fig. 6.9 The Rate 1/2 Convolutional Encoder with G(D) = [D^2 + D + 1   D^2 + 1].

The first bit of the output is a = i_(n-2) + i_(n-1) + i_n and the second bit of the output is b = i_(n-2) + i_n, where i_(n-l) represents the input that arrived l time units earlier. Let the input stream of symbols be represented by a polynomial. We know that multiplying any polynomial by D corresponds to a single right-shift of the sequence, i.e., a delay of one time unit. Therefore, g_11(D) = D^2 + D + 1 and g_12(D) = D^2 + 1, and the generator polynomial matrix of this encoder can be written as
G(D) = [D^2 + D + 1   D^2 + 1].
  • 88. I I II .j :i I ,, l ) i Information Theory, Coding and Cryptography write the generator polynomial for the (iw,jtll) entry of the matrix, just trace the route from the ,-Ut input bit to theJ-tl!OUtpUt bit. Ifno path exists, the generator polynomial is the zero polynomial, as in the case of g12(D), g21(D) and g2lD). If only a direct path exists without any delay elements, the value ofthe generator polynomial is unity, as in g11(D) and g2z(D). Ifthe route from the ithinput bit to theJ-Utoutput bit involves a series ofmemory elements (delay elements), represent each delay by an 。、、ゥエゥセョ。ャ@ power of D, as in g13(D). Note that three of the generator polynomials in the set of generator polynomials are zero. When ko is greater than 1, it is not unusual for some of the generator polynomials to be the zero polynomials. We can now give the formal definitions of the W ordlength, the Blocklength and the Constraint Length of a Convolutional Encoder. Definition 6.6 Given the generator polynomial matrix [gij(D)] of a convolutional code: {i) The Wordlength of the code is k = セ@ セH@ deg gij(D) + 1). {6.4) 1,) {ii) The Blocklength of the code is n = n0 11?-il:X[deg gij(D) + 1]. 1,) {iii) The Constraint Length of the code is ko V= lセ{、・ァ@ gij(D)]. i=j 1,) {6.5) {6.6) Recall that the input message stream IQ, i1, £;., セ@ ... has the polynomial representation I (D) = z0 + i1D + £;_/J + i3U + ... + iMJ nMJ and the codeword polynomial can be written as C(D) = li> + c1 D + £2 D 2 + 0,Ii +...+ criJ I?J. The encoding operation can simply be described as vector matrix product, C (D) = /(D) G (D) (6.7) or equivalently, c1(D) = Liz(D)g1, 1(D). (6.8) i=l Observing that the encoding operation can simply be described as vector matrix product, it can be easily shown that convolutional codes belong to the class of linear codes (exercise). Convolutional Codes I Definition セ@ A Parity Check Matrix H(D) is an (no- セI@ by no matrix of polynomials that satisfies G(D)H(D)T= 0 (6.9) and the Syndrome Polynomial vector which is a (no- セIM」ッューッョ・ョエ@ row vector is give:J;l by s(D) = v(D)H(D)T (6.10) d・セエゥッョ@ セ@ A Systematic Encoder for a convolutional code has the generator polynomial matrix of the form G(D) =[I IP(D)] (6.11) where I is a ko by ko identity matrix and P(D) is a ko by (no - fro) matrix of polynomials. The parity check polynomial matrix for a systematic convolutional encoder is H(D) = [- P(D)T II] (6.12) where I is a (no - セI@ by (no - セI@ identity matrix. It follows that G(D)H(D)T = 0 (6.13) Definition 'if A convolutional code whose generator polynomials g1(IJ), l!;;.(D), ..., g110 (D) satisfy GCD[gl(D), l!;;.(D), ..., gno (D)] = XZ (6.14) for some a is called a Non-Catastrophic Convolutional Code. Otherwise it is called a Catastrophic Convolutional Code. Without loss of generality, one may take a = 0, i.e., XZ = 1. Thus the task of finding a non catastrophic convolutional code is equivalent to finding a good set of relatively prime generator polynomials. Relatively prime polynomials can be easily found by computer searches. However, what is difficult is to find a set of relatively prime generator polynomials that have good error correcting capabilities. Example 6.5 All systematic codes are non-catastrophic because for them g1 (D) = 1 and therefore, GCD[l, g2(D)], ..., g"' (D)]= 1 Thus the systematic convolutional encoder represented by the generator polynomial matrix G(D) = [1 D4 + 1] is non-catastrophic. 
Consider the following generator polynomial matrix of a binary convolutional encoder G(D) = [VZ + 1 D4 + 1] We observe that(D2 + 1)2 =D4 + (D2 + D2 ) + 1 =D4 + 1for binary encoder (modulo 2 arithmetic). Hence, GCD[gdD), gz(D)] = D2 + 1;t 1. Therefore, this is a catastrophic encoder. I
Next, consider the following generator polynomial matrix of a non-systematic binary convolutional encoder. The two generator polynomials are relatively prime, i.e., GCD[g1(D), g2(D)] = 1. Hence this represents a non-catastrophic convolutional encoder.

In the next section, we see that the distance notion of convolutional codes is an important parameter that determines the number of errors a convolutional code can correct.

6.4 DISTANCE NOTIONS FOR CONVOLUTIONAL CODES

Recall that, for block codes, the concept of (Hamming) Distance between two codewords provides a way of quantitatively describing how different the two vectors are, and that a good code must possess the maximum possible minimum distance. Convolutional codes also have a distance concept that determines how good the code is.

When a codeword from a convolutional encoder passes through a channel, errors occur from time to time. The job of the decoder is to correct these errors by processing the received vector. In principle, the convolutional codeword is infinite in length. However, the decoding decisions are made on codeword segments of a finite length. The number of symbols that the decoder can store is called the Decoding Window Width. Regardless of the size of these finite segments (decoding window width), the previous frames affect the current frame because of the memory of the encoder. In general, one gets better performance by increasing the decoding window width, but one eventually reaches a point of diminishing returns.

Most of the decoding procedures for convolutional codes work by focussing on the errors in the first frame. If this frame can be corrected and decoded, then the first frame of information is known at the receiver end. The effect of these information symbols on the subsequent information frames can be computed and subtracted from subsequent codeword frames. Thus the problem of decoding the second codeword frame is the same as the problem of decoding the first frame. We extend this logic further. If the first l frames have been decoded successfully, the problem of decoding the (l + 1)th frame is the same as the problem of decoding the first frame. But what happens if a frame in between was not decoded correctly? If it is possible for a single decoding error event to induce an infinite number of additional errors, then the decoder is said to be subject to Error Propagation. In the case where the decoding algorithm is responsible for error propagation, it is termed Ordinary Error Propagation. In the case where a poor choice of catastrophic generator polynomials causes error propagation, we call it Catastrophic Error Propagation.

Definition 6.10 The l-th minimum distance d_l* of a convolutional code is equal to the smallest Hamming Distance between any two initial codeword segments l frames long that are not identical in the initial frame. If l = m + 1, then this (m + 1)th minimum distance is called the Minimum Distance of the code and is denoted by d*, where m is the number of information frames that can be stored in the memory of the encoder. In literature, the minimum distance is also denoted by d_min.

We note here that a convolutional code is a linear code. Therefore, one of the two codewords used to determine the minimum distance of the code can be chosen to be the all-zero codeword.
The l-th minimum distance is then equal to the weight of the smallest-weight codeword segment l frames long that is non-zero in the first frame (i.e., different from the all-zero frame). Suppose the l-th minimum distance of a convolutional code is d_l*. The code can correct t errors occurring in the first l frames provided
    d_l* ≥ 2t + 1.    (6.15)
Next, put l = m + 1, in which case d_l* = d*_{m+1} = d*. The code can correct t errors occurring in the first blocklength n = (m + 1)n0 provided
    d* ≥ 2t + 1.    (6.16)

Example 6.6 Consider the convolutional encoder of Example 6.1 (Fig. 6.3). The Trellis Diagram for the encoder is given in Fig. 6.12.

Fig. 6.12 The Trellis Diagram for the Convolutional Encoder of Example 6.1 (the trellis continues to infinity along the time axis).

In this case d1* = 2, d2* = 3, d3* = 5, d4* = 5, ... We observe that d_l* = 5 for l ≥ 3. For this encoder, m = 2. Therefore, the minimum distance of the code is d* = 5. This code can correct (d* - 1)/2 = 2 random errors that occur in one blocklength, n = (m + 1)n0 = 6.
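By linearity, d_l* can be found by a direct search over all input sequences whose first information bit is 1, exactly as argued above. The following sketch does this for a rate-1/n0 feedforward encoder; the generator polynomials in the usage line (bit i = coefficient of D^i) are only an illustrative pair, not necessarily those of the encoder of Fig. 6.3.

```python
# Brute-force computation of the l-th minimum distance d_l* of a rate-1/n0
# feedforward binary convolutional code (searching codeword segments whose
# first information bit is 1, per Definition 6.10 and linearity).

from itertools import product

def encode_frames(info_bits, gens):
    """Output frames for the given input bits, starting from the all-zero state."""
    window = 0                         # bit i of window = input bit from i steps ago
    frames = []
    for b in info_bits:
        window = (window << 1) | b
        frames.append([bin(window & g).count("1") % 2 for g in gens])
    return frames

def lth_minimum_distance(l, gens):
    best = None
    for tail in product((0, 1), repeat=l - 1):
        frames = encode_frames([1] + list(tail), gens)
        w = sum(sum(frame) for frame in frames)
        best = w if best is None else min(best, w)
    return best

gens = [0b111, 0b101]                  # an illustrative pair: 1 + D + D^2 and 1 + D^2
for l in range(1, 6):
    print(l, lth_minimum_distance(l, gens))
```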
  • 90. I ·i E2J Information Theory, Coding and Cryptography Definition 6.11 The Free Distance of a convolutional code is given by flt,ee = m;uc[dj] (6.17) It follows that t4n+l :5 dm+2 :5 ··· :5 dfree · The term dfree was first coined by Massey in 1969 to denote a type of distance that was found to be an important parameter for the decoding techniques of convolutional codes. Since, fir.ee represents the minimum distance between arbitrarily long (possibly infinite) encoded sequences, dfree is also denoted by doo in literature. The parameter dfree can be directlycalculated from the trellis diagram. The free distance fit,ee is the minimum weight of a path that deviates from the all zero path and later merges back into the all zero path at some point further down the trellis as depicted in Fig. 6.13. Searching for a code with large minimumdistance and large free distance is a tedious process, and is often done using a computer.Clever techniques have been designed that reduce the effort by avoiding exhaustive searches. Most of the good convolutional codes known today have been discovered by computer searches. Definition 6.12 The free length セッヲ@ a convolutional code is the length of the non-zero segment of a smallest weight convolutional codeword of non-zero weight. Thus, d1= dfreeif l= nfree, and d1< flt,eeif l< nfree· In literature, 7l_trnis also denoted by n00 • --------The all zero path ___________... セ@ Re-merges V Nodes in the trellis Fig. 6.13 The Free Distance dtree path. Example 6.7 Consider the convolutional encoder of Example 6.1 (Fig. 6.3). For this encoder, dfree = 5. There are usually more than one ーセ@ ofpaths that can be used to calculate dfree . The two paths that have been used to calculatedfree are shown in Fig. 6.14 by double lines. In this example, dmin = dfree . Convolutional Codes States Time axis Fig. 6.14 Calculating dtree in the Trellis Diagram. Q Q Q Continues to infinity The free length of the convolutional code is nfree = 6. In this example, the セ@ is equal to the blocklength n of the code. In general it can be longer than the blocklength. 6.5 THE GENERATING FUNCTION The performance of a convolutional code depends on its free distance, dfree . Since convolutional codes are a sub-class of linear codes, the set of Hamming distances between coded sequences is the same as the set of distances of the coded sequel).ces from the all-zero sequence. Therefore, we can consider the distance structure of the convolutional codes with respect to the all-zero sequence without loss of generality. In this section, we shall study an elegant method of determining the free distance, セ・@ of a convolutional code. To find fir.ee we need the set of paths that diverge from the all-zero path and merge back at a later time. The brute force (and time consuming, not to mention, exasperating) method is to determine the distances of all possible paths from the trellis diagram. Another way to find out the fit,ee of a convolutional code is use the concep.t of a generating function, whose expansion provides all the distance information directly. The generating function can be understood by the following example. Example 6.8 Consider again the convolutional encoder of Example 6.1 (Fig. 6.3). The state diagram ofthe encoder is given in Fig. 6.4. We now construct a modified state diagram as shown in Fig. 6.15. The branches of this modified state diagram are labelled by branch gain d ,i = 0, 1, 2, where the expc)nent of D denotes the Hamming Weight of the·branch. 
Note that the self-loop at S0 has been neglected, as it does not contribute to the distance property of the code; circulating around this loop simply generates the all-zero sequence. Note also that S0 has been split into two states, initial and final. Any path that diverges from state S0 and later merges back to S0 can be thought of, equivalently, as traversing the branches of this modified state diagram, starting from the
initial S0 and ending at the final S0. Hence this modified state diagram encompasses all possible paths that diverge from, and then later merge back into, the all-zero path in the trellis diagram.

Fig. 6.15 The Modified State Diagram of the Convolutional Encoder Shown in Fig. 6.3.

We can find the distance profile of the convolutional code using the state equations of this modified state diagram. These state equations are
    X1 = D^2 + X2,
    X2 = D X1 + D X3,
    X3 = D X1 + D X3,
    T(D) = D^2 X2,    (6.18)
where the Xi are dummy variables. Upon solving these equations simultaneously, we obtain the generating function
    T(D) = D^5 / (1 - 2D).    (6.19)
Note that the expression for T(D) can also be (easily) obtained by Mason's Gain Formula, which is well known to students of Digital Signal Processing. The following conclusions can be drawn from the series expansion of the generating function:
(i) There are an infinite number of possible paths that diverge from the all-zero path and later merge back again (this is also intuitive).
(ii) There is only one path with Hamming Distance 5, two paths with Hamming Distance 6, and in general 2^k paths with Hamming Distance k + 5 from the all-zero path.
(iii) The free Hamming Distance d_free for this code is 5. There is only one path corresponding to d_free. Example 6.7 explicitly illustrates the pair of paths that result in d_free = 5.

We now introduce two new labels in the modified state diagram. To enumerate the length of a given path, the label L is attached to each branch, so that every time we traverse a branch we increment the path length counter. We also add a label I^i to each branch, where the exponent i is the Hamming Weight of the input bits associated with that branch of the modified state diagram (see Fig. 6.16).

Fig. 6.16 The Modified State Diagram to Determine the Augmented Generating Function.

The state equations in this case would be
    X1 = D^2 L I + L I X2,
    X2 = D L X1 + D L X3,
    X3 = D L I X1 + D L I X3,
and the Augmented Generating Function is T(D, L, I) = D^2 L X2. On solving these simultaneous equations, we obtain
    T(D, L, I) = D^5 L^3 I / (1 - D L (L + 1) I)    (6.20)
               = D^5 L^3 I + D^6 L^4 (L + 1) I^2 + ... + D^{k+5} L^{3+k} (L + 1)^k I^{k+1} + ...    (6.21)
Further conclusions from the series expansion of the augmented generating function are:
(i) The path with the minimum Hamming Distance of 5 has length equal to 3.
(ii) The input sequence corresponding to this path has weight equal to 1.
(iii) There are two paths with Hamming Distance equal to 6 from the all-zero path. Of these, one has a path length of 4 and the other 5 (observe the power of L in the second term of the expansion). Both these paths have an input sequence weight of 2.

In the next section, we study the matrix description of convolutional codes, which is a bit more complicated than the matrix description of linear block codes.
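The series expansion used above can also be produced numerically by iterating the state equations (6.18), with each dummy variable held as a truncated power series in D. The sketch below (our own, with illustrative names) reproduces the path counts quoted in conclusion (ii): one path of weight 5, two of weight 6, and in general 2^k of weight k + 5.

```python
# Expand the generating function T(D) of Example 6.8 numerically by iterating
# the state equations X1 = D^2 + X2, X2 = D*X1 + D*X3, X3 = D*X1 + D*X3,
# T = D^2 * X2. A series is a dict {Hamming weight: number of paths}.

from collections import defaultdict

MAX_W = 12            # truncate the series at this Hamming weight

def series_add(a, b):
    out = defaultdict(int, a)
    for w, c in b.items():
        out[w] += c
    return dict(out)

def times_D(a, d):
    """Multiply a series by D^d, dropping terms beyond MAX_W."""
    return {w + d: c for w, c in a.items() if w + d <= MAX_W}

X1, X2, X3 = {}, {}, {}
for _ in range(4 * MAX_W):                     # enough passes for the low-order terms to settle
    X1 = series_add({2: 1}, X2)                # X1 = D^2 + X2
    X2 = series_add(times_D(X1, 1), times_D(X3, 1))
    X3 = series_add(times_D(X1, 1), times_D(X3, 1))

T = times_D(X2, 2)                             # T(D) = D^2 * X2
print(sorted(T.items()))                       # [(5, 1), (6, 2), (7, 4), (8, 8), ...]
```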
  • 92. ! f I セ@ i w "I H- Information Theory, Coding and Cryptography 6.6 MATRIX DESCRIPTION OF CONVOLUTIONAL CODES A convolutional code can be described as consisting of an infinite number of infinitely long codewords and which (visualize the trellis diagram) belongs to the class of linear codes. They can be described by an infinite generator matrix. As can be expected, the matrix description of convolutional codes is messier than that of the linear block codes. Let the generator polynomials of a Convolutional Code be represented by giJ(D) = セァゥj@ D 1 (6.22) In order to obtain a generator matrix, the gijl coefficients are arranged in a matrix format. For each l, let G1be a セ@ by no matrix. Gt=[gii1] (6.23) Then, the generator matrix for the Convolutional Code that has been truncated to a block code of blocklength n is Go G1 Gz Gm 0 Go Go Gm-1 G(n) = G(n)= 0 0 Go Gm-2 (6.24) 0 0 0 Go where 0 is a セ@ by no matrix of zeros and m is the length of the shift register used to generate the code. The generator matrix for the Convolutional Code is given by G=[:' セ@ G2 Gm 0 0 0 0 """] Go セ@ Gm-1 Gm 0 0 0 ... 0 Go Gm-2 Gm-1 Gm 0 0 (6.25) The matrix extends infinitely far down and to the right. For a systematic convolutional code, the generator matrix can be written as I Po 0 セ@ 0 p2 0 pm :o 0 0 0 0 0 I Po 0 11 0 I Pm-1 l 0 pm 0 0 0 0 0 0 I Po 0 I pm-1 0 pm Pm-2 l 0 (6.26) G= :o pm-2 0 Pm-1 I I 0 Pm-2 I I I I where I is 。セ@ 「ケセ@ identity matrix, 0 is 。セ@ 「ケセ@ matrix of zeros and P0 , P 2 , ..., Pュ。イ・セ@ by (no - セI@ matrices. The parity check matrix can then be written as Convolutional Codes RT 0 -I p,T 1 0 P/ -I Pl 0 p,T 1 0 P/ -I H= (6.27} pT m 0 p,;_1 0 p,;_2 0 P{ -I pT m 0 p,;_I 0 pT m 0 Example 6.9 Consider the convolutional encoder shown in Fig. 6.17. Let us first write the gene- rator polynomial matrix for this encoder. To do so, we just follow individual inputs to the outputs, one by one, and count the delays. The generator polynomial matrix is obtained as [ D + D2 DD2 D+DD2] G(D)= D2 i2 + C:3 Fig. 6.17 A Rate 2!3 Convolutional Encoder. The generator polynomials are g11(D) = D + IY, g12(D) = IY, g13(D) = D + IY, g21(D) = D 2 , セ R HdI@ = D and セ S@ (D) = D. To write out the matrix G0, we look at the constants (coefficients of D 0 ) in the generator· polynomials. Since there are no constant terms in any of the generator polynomials, gッ]{セ@ セ@ セ}@ Next, to write out the matrix G1, we look at the coefficients ofD1 in the generator polynomials. The l8 trow, 1st column entry of the matrixG1 corresponds to the coefficients ofD 1 ingu(D). The l8 t row, 2nd column entry corresponds to the coefficients ofD1in g12(D), and so on. Thus,
  • 93. i i !I ! i Information Theory, Coding and Cryptography Similarly, we can write Gr= [: セ@ セ}@ The generator matrix can now be written as o o o:1 o 1:1 1 1:o o o ... I I I o o o:o 1 1:o 1 1: o o ... -------r-------T-------r---------- 10 0 0 11 0 111 1 1 I I I :o o o:o 1 1:o 1 1 G= I I I MMMMMMM[MMMMMMM[MMMMMMMZセMMッMMセMMM 1 I ! : 0 1 1 ... Our next task is to look at an efficient decoding strategy for the convolutional codes. One of the very popular decoding methods, the Viterbi Decoding Technique, is discussed in detail. 6.7 VITERBI DECODING OF CONVOLUTIONAL CODES There are three important decoding techniques for convolutional codes: Sequential Decoding, Threshold Decoding and the Viterbi Decoding. The Sequential Decoding technique was proposed by Wozencraft in 1957. Sequential Decoding has the advantage that it can perform very well with long-constraint-length convolutional codes, but it has a variable decoding time. .Threshold Decoding, also known as Majority Logic Decoding, was proposed by Massey in 1963 in his doctoral thesis at MIT. Threshold decoders were the first commercially produced decoders for convolutional codes. Viterbi Decoding was developed by AndrewJ. Viterbi in 1967 and in the late 1970s became the dominant technique for convolutional codes. Viterbi Decoding had the advantages of (i) a highly satisfactory bit error performance, (ii) high speed of operation, (iii) ease of implementation, (iv) low cost. Threshold decoding lost its popularity specially because of its inferior bit error performance. It is conceptually and practically closest to block decoding and it requires the calculation of a set of syndromes, just as in the case of block codes. In this case, the syndrome is a sequence because the information and check digits occur as sequences. Viterbi Decoding has the advantage that it has a fixed decoding time. It is well suited to hardware decoder implementation. But its computational requirements grow exponentially as a function of the constraint length, so it is usually limited in practice to constraint lengths of Convolutional Codes v = 9 or less. As of early 2000, some leading companies claimed to have produced a V = 9 Viterbi decoder that operates at rates up to 2 Mbps. Since the time when Viterbi proposed his algorithm, other researchers have expanded on his work by finding good convolutional codes, exploring the performance limits of the technique, and varying decoder design parameters to optimize the implementation of the technique in hardware and software. The Viterbi Decoding algorithm is also used in decoding Trellis Coded Modulation (TCM), the technique used in telephone-line modems to squeeze high ratios of bits per-second to Hertz out of 3 kHz-bandwidth analog telephone lines. We shall see more of TCM in the next chapter. For years, convolutional coding with Viterbi Decoding has been the predominant FEC (Forward Error Correction) technique used in space communications, particularly in geostationary satellite communication networks such as VSAT (very small aperture terminal) networks. The most common variant used in VSAT networks is rate 112 convolutional coding using a code with a constraint length V= 7. With this code, one can transmit binary or quaternary phase-shift-keyed (BPSK or QPSK) signals with at least 5 dB less power than without coding. That is a reduction in Watts of more than a factor of three! 
This is very useful in reducing transmitter and antenna cost or permitting increased data rates given the same transmitter power and antenna sizes. We will now consider how to decode convoh.Itional codes using the Viterbi Decoding algorithm. The nomenclature used here is that we have a message vector i from which the encoder generates a code vector c that is sent across a discrete memoryless channel. The received vector r may differ from the transmitted vector c (unless the channel is ideal or we are very lucky!). The decoder is required to make an estimate of the message vector. Since there is a one to one correspondence between code vector and message vector, the decoder makes an estimate of the code vector. Optimum decoding will result in a minimum probability of Decoding error. Let p(rlcJ be the conditional probability of receiving r given that c was sent. We can state that the optimum decoder is the maximum likelihood decoder with a decision rule to choose the code vector estimate Cfor which the log-likelihood function In p(rlcJ is maximum. If we consider a BSC where the vector elements of c and r are denoted by ci and r; , then, we have N p{rlcJ = L P(T; lc;), i=l and hence, the log-likelihood function equals N In p(r Ic)= L In p(r; jcJ i=l Let us assume (6.28) (6.29) (6.30)
If we suppose that the received vector differs from the transmitted vector in exactly d positions (the Hamming Distance between the vectors c and r), we may rewrite the log-likelihood function as
    ln p(r | c) = d ln p + (N - d) ln(1 - p)
                = d ln(p / (1 - p)) + N ln(1 - p).    (6.31)
We can assume the probability of error p < 1/2, and we note that N ln(1 - p) is a constant for all code vectors. Now we can make the statement that the maximum likelihood decoding rule for a Binary Symmetric Channel is to choose the code vector estimate ĉ that minimizes the Hamming Distance between the received vector r and the transmitted vector c.

For Soft Decision Decoding in an Additive White Gaussian Noise (AWGN) channel with single-sided noise power spectral density N0, the likelihood function is given by
    p(r | c) = Π_{i=1}^{N} (1/√(π N0)) exp(-|r_i - c_i|^2 / N0)
             = (1/√(π N0))^N exp(-(1/N0) Σ_{i=1}^{N} |r_i - c_i|^2).    (6.32)
Thus the maximum likelihood decoding rule for the AWGN channel with Soft Decision Decoding is to minimize the squared Euclidean Distance between r and c. This squared Euclidean Distance is given by
    d_E^2(r | c) = Σ_{i=1}^{N} |r_i - c_i|^2.    (6.33)
Viterbi Decoding works by choosing that trial information sequence whose encoded version is closest to the received sequence. Here, Hamming Distance will be used as the measure of proximity between two sequences. The Viterbi Decoding procedure can be easily understood by the following example.

Example 6.10 Consider the rate 1/3 convolutional encoder given in Fig. 6.18 and the corresponding trellis diagram.

Fig. 6.18 A Rate 1/3 Convolutional Encoder and Its Trellis Diagram (the trellis continues to infinity along the time axis).

Suppose the transmitted sequence was the all-zero sequence. Let the received sequence be r = 010000100001 ... Since it is a rate 1/3 encoder, we first segment the received sequence into groups of three bits (because n0 = 3), i.e., r = 010 000 100 001 ...

The task at hand is to find the most likely path through the trellis. Since a path must pass through nodes in the trellis, we will try to find out which nodes in the trellis belong to the most likely path. At any time, every node has two incoming branches. We simply determine which of these two branches belongs to a more likely path (and discard the other). We make this decision based on some metric (Hamming Distance). In this way we retain just one path per node, together with the metric of that path. In this example, we retain only four paths as we progress with our decoding (since we have only 4 states in our trellis).

Let us consider the first branch of the trellis, which is labelled 000. We find the Hamming distance between this branch and the first received framelength, 010. The Hamming distance d(000, 010) = 1. Thus the metric for this first branch is 1, and is called the Branch Metric. Upon reaching the top node from the starting node, this branch has accumulated a metric of 1. Next, we compare the received framelength with the lower branch, which terminates at the second node from the top. The Hamming Distance in this case is d(111, 010) = 2. Thus, the metric for this branch is 2. At each node we write the total metric accumulated by the path, called the Path Metric. The path metrics are marked by circled numbers in the trellis diagram in Fig. 6.19. At the subsequent stages of decoding, when two paths terminate at every node, we will retain the path with the smaller value of the metric.
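The add-compare-select recursion that the following stages apply at every node can be collected into a short program. Below is a minimal hard-decision Viterbi decoder sketch for a rate-1/n0 feedforward binary code; it is our own illustration, and the generator polynomials and message in the usage lines are placeholders, not the encoder of Fig. 6.18.

```python
# Minimal hard-decision Viterbi decoder for a rate-1/n0 feedforward binary
# convolutional code with memory m. Generators are bit masks (bit i = D^i).

def conv_encode(bits, gens, m):
    """Encode a bit list, appending m zero tail bits to return to state 0."""
    window, out = 0, []
    for b in bits + [0] * m:
        window = (window << 1) | b
        out += [bin(window & g).count("1") % 2 for g in gens]
    return out

def viterbi_decode(received, gens, m):
    """Return (decoded bits, best path metric) for a frame-aligned bit list."""
    n0, n_states, INF = len(gens), 1 << m, float("inf")
    metric = [0] + [INF] * (n_states - 1)         # start in the all-zero state
    paths = [[] for _ in range(n_states)]         # surviving input sequences
    for i in range(0, len(received), n0):
        frame = received[i:i + n0]
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue
            for b in (0, 1):                      # extend this survivor by one branch
                window = (s << 1) | b
                branch_out = [bin(window & g).count("1") % 2 for g in gens]
                branch = sum(o != r for o, r in zip(branch_out, frame))
                ns = window & (n_states - 1)
                if metric[s] + branch < new_metric[ns]:   # compare and select
                    new_metric[ns] = metric[s] + branch
                    new_paths[ns] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(range(n_states), key=lambda s: metric[s])
    return paths[best], metric[best]

# Usage with an illustrative rate-1/2, m = 2 code (generators 1+D+D^2 and 1+D^2):
gens, m = [0b111, 0b101], 2
msg = [1, 0, 1, 1, 0]
coded = conv_encode(msg, gens, m)
coded[3] ^= 1                                     # introduce a single channel error
decoded, pm = viterbi_decode(coded, gens, m)
print(decoded[:len(msg)], pm)                     # recovers the message; metric = 1
```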
  • 95. I, i. [ l l! T セ@ ,, ·, Information Theory, Coding and Cryptography Lo1 I • L1o I • • • • • 0· • ウュセウ@ MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMセ@ Time axis Fig. 6.19 The Path Metric after the 1st Step of Vlterbl Decoding. We, now, move to the next stage ofthe trellis. The Hamming Distance betweenthe branches are computed with respect to the second frame received, 000. The branch metrics for the two branches emanating from the topmost node are 0 and 3. The branch metrics for the two branches emanating from the second node are 2 and 1. The total path metric is marked by circled numbers in the trellis diagram shown in Fig. 6.20. 0 0 8 (i 8 8 0 8 (i 0 8 (i セ@ MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMセ@ Time axis Fig. 6.20 The Path Metric after the 2nd Step of Vlterbl Decoding. We now proceed to the next stage. We again compute the branch metrics and add them to the respective path metrics to get the new path metrics. Consider the topmost node at this stage. Two branches terminate at this node. The path coming from node 1 of the previous stage has a path metric of2 and the path coming from node 1ofthe previous stage has a path metric of6. The path with a lower metric is retained and the other discarded. The trellis diagram shown in Fig. 6.21 gives the surviving paths (double lines) and the path metrics (circled numbers). Viterbi called these surviving paths as Survivors. It is interesting to note that node 4 receives two paths with equal path metrics. We have arbitrarily chosen one ofthem as the surviving path (by tossing a fair coin!). Convolutional Codes G G· 0 G 8 0· .. 0-- 8 States Time axis Fig. 6.21 The Path Metric after the 3rd step of Viterbl Decoding. We continue this procedure for Viterbi decoding for the next stage. The final branch metrics and path metrics are shown in Fig. 6.22. At the end we pick the path with the minimum metric. This path corresponds to the all zero path. Thus the decoding procedure has been able to correctly decode the received vector. CDo f;; セセセセセセセセセセ]]セP@ G· 0 @ (oil G 0· (i 0 0· @ @ states lime axis Fig. 6.22 The Path Metric after the 4th Step of Viterbi Decoding. The minimum distance for this code is a= 6. The number oferrors that it can correct perframe length is equal to t= l(d.- 1)/2)j = l(6- l)/2j = 2. In this example, the maximum number oferrors per framelength was 1. Consider the set of surviving paths at the rth frame time. If all the surviving paths cross through the same nodes then a decision regarding the most likely pathNエイ。ョセュゥエエ・、@ can be made up to the point where the nodes are common. To build a ーイセ」エゥ」。ャ@ vエセイ「エ@ Decoder, one must choose a decoding window width w, which is usually several times as btg. as the 「ャッ」セ・ョァエィN@ At a given frame time,f, the decoder examines all the surviving paths to see If they agree m the first
  • 96. i; l '·! !I· li ,, Information Theory, Coding and Cryptography branch. This branch defines a decoded information frame and is passed out of the decoder. In the previous example of Viterbi Decoding, we see that by the time the decoder reaches the 4th frame, all the surviving paths agree in their first decoded branch (called a well-defined decision). The decoder drops the first branch (after delivering the decoded frame) and takes in a new frame of the received word for the next iteration. If again, all the surviving paths pass through the same node of the oldest surviving frame, then this information frame is decoded. The process continues in this way indefinitely. If a long·enough decoding window w is chosen, then a well-defined decision can be reached almost always. A well designed code will lead to correct decoding with a high probability. Note that a well designed code carries meaning only in the context of a particular channel. The random errors induced by the channel should be within the error correcting capability of the code. The Viterbi decoder can be visualized as a sliding window through which the trellis is viewed (see Fig. 6.23). The window slides to the right as new frames are processed. The surviving paths are marked on the portion of the trellis which is visible through the window. As the window slides, new nodes appear on the right, and some of the surviving paths are extended to these new nodes while the other paths disappear. Decoding Window • • • • • • • • • • • • • • • • • • • • • • • • • • • • w Fig. 6.23 The Viterbi Decoder as a Sliding Window through which the Trellis is Viewed. If the surviving paths do not go through the same node, we label it a Decoding Failure. The decoder can break the deadlock using any arbitrary rule. To this limited extent, the decoder becomes an incomplete decoder. Let us revert back to the previous example. At the 4th stage, the surviving paths could as well be chosen as shown in Fig. 6.24, which will render the decoder as an incomplete decoder. CD CD 0 0 8 • • 0 0· ® States Time axis Fig. 6.24 Example of an Incomplete Decoder in Viterbi Decoding Process. Convolutional Codes It is possible that in some cases the decoder reaches a well-defined decision, but a wrong one! If this happens, the decoder has no way of knowing that it has taken a wrong decision. Based on this wrong decision, the decoder will take more wrong decisions. However, if the code is non- catastrophic, the decoder will recover from the errors. The next section deals with some Distance Bounds for convolutional codes. These bounds will help _us compare different convolutional coding schemes. 6.8 DISTANCE BOUNDS FOR CONVOLUTIONAL CODES Upper bounds can be computed on the minimum distance of a convolutional code that has a rate R = !!L and a constraint length v = ュセN@ These bounds are similar in nature and derivation no to those for block codes, with block length corresponding to constraint length. However, as we shall see, the bounds are not very tight. These bounds just give us a rough idea of how good the code is. Here we present the bounds (without ーイッッセ@ for binary codes. For rate R and constraint length v, let d be the largest integer that satisfies hHセカ@ IセゥMr@ (6.34) Then at least one binary convolutional code exists with minimum distance d for which the above inequality holds. 
Here H(x) is the familiar entropy function for a binary alphabet,
    H(x) = -x log2 x - (1 - x) log2(1 - x),    0 ≤ x ≤ 1.
For a binary code with R = 1/n0, the minimum distance d_min satisfies
    d_min ≤ ⌊(n0 v + n0)/2⌋    (6.35)
where ⌊y⌋ denotes the largest integer less than or equal to y. An upper bound on d_free is given by (Heller, 1968)
    d_free ≤ min_{j ≥ 1} ⌊ (n0/2) (2^j / (2^j - 1)) (v + j - 1) ⌋.    (6.36)
To calculate the upper bound, the right-hand side should be plotted for different integer values of j. The upper bound is the minimum of this plot.

Example 6.10 Let us apply the distance bounds to the convolutional encoder given in Example 6.1. We will first apply the bound given by (6.34). For this encoder, k0 = 1, n0 = 2, R = 1/2 and v = 2, so
    H(d / (n0 v)) = H(d/4) ≤ 1 - R = 1/2  =>  H(d/4) ≤ 0.5.
  • 97. I. i I i :I But we have, Information Theory, Coding and Cryptography H{0.11) = - 0.11log2 0.11- (1- 0.11) log2 {1- 0.11) = 0.4999, and H{0.12) =- 0.12 log2 0.12- {1- 0.12) log2 {1- 0.12) = 0.5294. Therefore, 、OTセ@ .0.11, or d セ@ 0.44 The largest integer d that satisfies this bound is d = 0. This implies that at least one binary convolutional code exists with minimum distance d = 0. This statement does not say much (i.e., the bound is not strict enough for this encoder)! NeXt, consider the encoder shown in Fig. 6.25. For this encoder, But we have, ---------------- ; I I :_----------------- J Fig. 6.25 Convolutional Encoder for Example 6. 10. G(D) = [1 D+ D2 D+ D2 + D3 ], セ@ =1, no= 3, R= 113 and v =3. n(セカI@ =H(d/9) Sl- R; 2/3=> H(d/9) s0.6666 H{O.l7) = - 0.17log2 0.17- (1- 0.17) log2 (1- 0.17) =0.6577, and H(O.l8) = - 0.18log2 0.18- (I - 0.18) log2 (1 - 0.18) = 0.6801 Therefore, d/9 セ@ 0.17, or d セ@ 1.53. The largest integer d that satisfies this bound is d = 1. Then at least one binary convolutional code exists with minimum distance d = 1. This is a very loose bound. Let us now evaluate the second bound, given by (6.35). dmin セ@ l(nov +no )12J=l(9+ 3)/2J=6 This gives us dmm = 6, which is a good upper bound as seen from the trellis diagram ヲセイ@ the encoder (Fig. 6.26). Since no= 3, every branch in the trellis is labelled by 3 btts. The two paths that have been used to calculate dmin are shown in Fig. 6.26 by double lines. In this example, dmin = セ@ = 6. Convolutional Codes States lime axis e e e Continues to infinity Fig. 6.26 The Trellis Diagram for the Convolutional Encoder Given in Fig. 6.25. Next, we determine the Heller Bound on dfm, as given by (6.36). The plot of the function d(j) = l(no/2){211(21 -t))(v +j-1)J for different integer values of j is given in Fig. 6.27. .-3 ·. 4. 5 Fig. 6.27 The Heller Bound Plot. From Fig. 6.27, we see that the upper bound on the free dista,nce of the code is dfm セ@ 8. This is a good upper bound. The actual value of dfrtt = 6. 6.9 PERFORMANCE BOUNDS One of the useful performance criteria for convolutional codes is the bit error probability P1r The bit error probability or the bit error rate (a misnomer!) is defined as the expected number of decoded information bit errors per information bit. Instead of obtaining an exact expression for Ph , typically, an upper bound on the error probability is calculated. We will first determine the First Event Error Probability, which is the probability of error for sequences that merge with the all zero (correct) path for the first time at a given node in the trellis diagram.
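Before carrying this derivation through, the two distance bounds just applied in Example 6.10 can be checked numerically. The short sketch below (our own helper names) finds the largest d allowed by (6.34) by scanning the binary entropy function, reproducing the values d = 0 and d = 1 obtained above.

```python
# Numerical evaluation of the bound (6.34): largest integer d such that
# H(d / (n0*v)) <= 1 - R, using the binary entropy function H(x).

from math import log2

def H(x):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def largest_d(n0, v, R, steps=200000):
    x_max = 0.0
    for i in range(steps):
        x = 0.5 * i / steps              # H(x) is increasing on [0, 1/2]
        if H(x) > 1 - R:
            break
        x_max = x
    return int(x_max * n0 * v)

print(largest_d(2, 2, 1/2))   # -> 0, as for the encoder of Example 6.1
print(largest_d(3, 3, 1/3))   # -> 1, as for the encoder of Fig. 6.25
```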
Since convolutional codes are linear, let us assume that the all-zero codeword is transmitted. An error will be made by the decoder if it chooses an incorrect path c' instead of the all-zero path. Let c' differ from the all-zero path in d bits. Therefore, a wrong decision will be made by the maximum likelihood decoder if more than ⌊d/2⌋ errors occur, where ⌊x⌋ is the largest integer less than or equal to x. If the channel transition probability is p, then the probability of error for this path can be upper bounded as follows:
    P_d ≤ [2 √(p(1 - p))]^d.    (6.37)
Now, there would be many paths with different distances that merge with the correct path at a given time for the first time. The upper bound on the first event error probability can be obtained by summing the error probabilities of all such possible paths:
    P_e ≤ Σ_{d = d_free}^{∞} a_d P_d,    (6.38)
where a_d is the number of codewords at Hamming Distance d from the all-zero codeword. Comparing (6.19) and (6.38) we obtain
    P_e ≤ T(D) |_{D = 2√(p(1 - p))}.    (6.39)
The bit error probability, P_b, can now be determined as follows. P_b can be upper bounded by weighting each pairwise error probability P_d in (6.37) by n_d, the number of incorrectly decoded information bits for the corresponding incorrect paths. For a rate k/n encoder, the average
    P_b ≤ (1/k) Σ_{d = d_free}^{∞} n_d P_d.    (6.40)
It can be shown that
    ∂T(D, I)/∂I |_{I=1} = Σ_{d = d_free}^{∞} n_d D^d.    (6.41)
Thus,
    P_b ≤ (1/k) ∂T(D, I)/∂I |_{I=1, D = 2√(p(1 - p))}.    (6.42)

6.10 KNOWN GOOD CONVOLUTIONAL CODES

In this section, we shall look at some known good convolutional codes. So far, only a few constructive classes of convolutional codes have been reported. There exists no class with an algebraic structure comparable to the t-error correcting BCH codes. No constructive methods exist for finding convolutional codes of long constraint length. Most of the codes presented here have been found by computer searches. Initial work on short convolutional codes with maximal free distance was reported by Odenwalder (1970) and Larsen (1973). A few of the codes are listed in Tables 6.2, 6.3 and 6.4 for code rates 1/2, 1/3 and 1/4 respectively. The generators are given (in octal notation) as the sequences
    g^(i) = (g_0^(i), g_1^(i), ..., g_m^(i)),    (6.43)
i.e., the coefficients of g_i(D) from D^0 upward. For example, the octal notation for the generators of the R = 1/2, v = 4 encoder is 15 and 17 (see Table 6.2). The octal 15 can be deciphered as 15 = 1-5 = 1-101. Therefore,
    g1(D) = 1 + (1)D + (0)D^2 + (1)D^3 = 1 + D + D^3.
Similarly, 17 = 1-7 = 1-111. Therefore,
    g2(D) = 1 + (1)D + (1)D^2 + (1)D^3 = 1 + D + D^2 + D^3,
and G(D) = [1 + D + D^3    1 + D + D^2 + D^3].

Table 6.2 Rate 1/2 codes with maximum free distance

Non-catastrophic:
    v     n     Generators (octal)    d_free    Heller Bound
    3     6     5, 7                  5         5
    4     8     15, 17                6         6
    5     10    23, 35                7         8
    6     12    53, 75                8         8
    7     14    133, 171              10        10
Catastrophic:
    5     10    27, 35                8         8
    12    24    5237, 6731            16        16
    14    28    21645, 37133          17        17

Table 6.3 Rate 1/3 codes with maximum free distance

    v     n     Generators (octal)    d_free    Heller Bound
    3     9     5, 7, 7               8         8
    4     12    13, 17, 17            10        10
    5     15    25, 37, 37            12        12
    6     18    47, 75, 75            13        13
    7     21    133, 175, 175         15        15
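The octal-to-polynomial conversion worked out above, and the Heller bound used in the last column of these tables, are both easy to script. The sketch below (our own function names) converts an octal generator into its coefficient list and evaluates the bound (6.36); it reproduces, for instance, the Heller bound entries 5 and 10 for the v = 3 and v = 7 rate-1/2 codes of Table 6.2.

```python
# Octal generator notation (as in Tables 6.2-6.4) and the Heller bound (6.36).
from math import floor

def octal_to_poly(octal_str):
    """'15' -> [1, 1, 0, 1], i.e. 1 + D + D^3 (leftmost bit is the D^0 coefficient)."""
    bits = format(int(octal_str[0], 8), "b")
    for digit in octal_str[1:]:
        bits += format(int(digit, 8), "03b")
    return [int(b) for b in bits]

def heller_bound(n0, v, j_max=30):
    """Upper bound on d_free for a rate 1/n0 code of constraint length v."""
    return min(floor((n0 / 2) * (2 ** j / (2 ** j - 1)) * (v + j - 1))
               for j in range(1, j_max + 1))

print(octal_to_poly("15"), octal_to_poly("17"))   # [1, 1, 0, 1] [1, 1, 1, 1]
print(heller_bound(2, 3), heller_bound(2, 7))     # 5 10, matching Table 6.2
```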
  • 99. .. I I 1 Information Theory, Coding and Cryptography Table 6.4 Rate 7/4 codes with maximum free distance l' n Generators d,,<'•-' Heller 3 4 5 {) 7 12 16 20 24 28 5 13 25 53 135 7 15 27 67 135 (Octal) Bound 7 15 33 71 147 7 17 37 75 163 10 15 16 18 20 10 15 16 18 20 Next, we study an interesting class of codes, called Turbo Codes, which lie somewhere between linear block codes and convolutional codes. 6. 11 TURBO CODES Turbo Codes were introduced in 1993 at the International Conference on Communications (ICC) by Berrou, Glavieux and Thitimajshima in their paper "Near Shannon Limit Error Correction Coding and Decoding-Turbo-Codes". In this paper, they quoted a BER performance of 10-5 at an E/No of 0.7 dB using only a 112 rate code, generating tremendous interest in the field. Turbo Codes perform well in the low SNR scenario. At high SNRs, some of the traditional codes like the Reed-Solomon Code have comparable or better performance than Turbo Codes. Even though Turbo Codes are considered as Block Codes, they do not exactly work like block codes. Turbo Codes are actually a quasi mix between Block and Convolutional Codes. They require, like a block code, that the whole block be present before encoding can begin. However, rather than computing parity bits from a system of equations, they use shift registers just like Convolutional Codes. Turbo Codes typically use at least two convolutional component encoders and two maximum aposteriori (MAP) algorithm component decoders in the Turbo codes. This is known as concatenation. Three different arrangements of turbo codes are Parallel Concatenated cッョカセャオエゥッョ。ャ@ Codes (PCCC), Serial Concatenated Convolutional Codes (SCCC), and セケ「ョ、@ Concatenated Convolutional Codes (HCCC). Typically, Turbo Codes are arranged hke the PCCC. An example of a PCCC Turbo encoder given in Fig. 6.28 shows that two encoders run in parallel. Fig. 6.28 Block Diagram of a Rate 7/3, PCCC Turbo Encoder. Convolutional Codes One reason for the better performance of Turbo codes is that they produce high weight code words. For example, if the input sequence (Uk) is originally low weight, the systematic (Xk) and parity 1 (Y1) outputs may produce a low weight codeword. However, the parity 2 output (Yf) is less likely to be a low weight codeword due to the interleaver in front of it. The interleaver shuffles the input sequence, Uk, in such a way that when introduced to the second encoder, it is more likely to produce a high weight codeword. This is ideal for the code because high weight codewords result in better decoder performance. Intuitively, when one of the encoders produces a 'weak' codeword, the other encoder has a low probability of producing another 'weak' codeword because of the interleaver. The concatenated version of the two codewords is, therefore, a 'strong' codeword. Here, the expression 'weak' is used as a measure of the average Hamming Distance of a codeword from all other codewords. Although the encoder determines the capability for error correction, it is the decoder that determines the actual performance. The performance, however, depends upon which algorithm is used. Since Turbo Decoding is an iterative process, it requires a soft output algorithm like the maximum a-posteriori algorithm (MAP) or the Soft Output Viterbi Algorithm (SOVA) for decoding. Soft output algorithms out-perform hard decision algorithms because they have available a better estimate of what the sent data actually was. 
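A rate-1/3 parallel concatenation in the spirit of Fig. 6.28 can be sketched in a few lines: one systematic stream and two parity streams produced by identical recursive systematic convolutional (RSC) encoders, the second encoder fed through a pseudo-random interleaver. The RSC polynomials chosen below (feedback 1 + D + D^2, feedforward 1 + D^2) and all names are illustrative assumptions, not the particular encoder of the figure.

```python
# Minimal rate-1/3 PCCC (turbo) encoder sketch: systematic bit, parity from
# RSC encoder 1, and parity from RSC encoder 2 acting on interleaved input.
import random

def rsc_parity(bits, feedback=0b111, forward=0b101, m=2):
    """Parity sequence of a simple RSC encoder (zero initial state, not terminated)."""
    state, parity = 0, []
    for b in bits:
        # feedback bit a_t = u_t XOR (taps of the feedback polynomial on past a's)
        a = b ^ (bin(state & (feedback >> 1)).count("1") % 2)
        reg = (state << 1) | a                    # reg bit i holds a_{t-i}
        parity.append(bin(reg & forward).count("1") % 2)
        state = reg & ((1 << m) - 1)
    return parity

def pccc_encode(bits, seed=1):
    perm = list(range(len(bits)))
    random.Random(seed).shuffle(perm)             # pseudo-random interleaver
    p1 = rsc_parity(bits)
    p2 = rsc_parity([bits[i] for i in perm])
    coded = []
    for u, a, b in zip(bits, p1, p2):
        coded += [u, a, b]                        # systematic, parity-1, parity-2
    return coded, perm

coded, perm = pccc_encode([1, 0, 1, 1, 0, 0, 1, 0])
print(coded)
```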
This is because soft output yields a gradient of information about the computed infohnation bit rather than just choosing a 1 or 0 like hard output. A typical Turbo Decoder is shown in Fig. 6.29. The MAP algorithm is often used to estimate the most likely information bit to have been transmitted in a coded sequence. The MAP algorithm is favoured because it outperforms other algorithms, such as the SOVA, under low SNR conditions. The major drawback, however, is that it is more complex than most algorithms because of its focus on each individual bit of information. Research in the area (in late 1990s) has resulted in great simplification of the MAP algorithm. カAMMMMMMMMMMMMMMMMMMMMMMMMMMMMセ@ De-lnter1eaver Decoder1 lnterleaver Final Estimate Fig. 6.29 Block Diagram of a Turbo Decoder.
  • 100. Information Theory, Coding and Cryptography A Turbo Decoder generally uses the MAP algorithm in at least one of its component decoders. The decoding process begins by receiving partial information from the channel (Xk and Yi} and passing it to the first decoder. The rest of the information, parity 2 (Yl ), goes to the second decoder and waits for the rest of the information to catch up. While the second decoder is waiting, the first decoder makes an estimate of the transmitted information, interleaves it to match the format of parity 2, and sends it to the second decoder. The second decoder takes information from both the first decoder and the channel and re-estimates the information. This second estizp.ation is looped back to the first encoder where the process starts again. The iterative process of the Turbo Decoder is illustrated below in Fig. 6.30. esif1'Ste , _ ; セエ・@ セ。|HNXU@ セッヲエQGX||PiG@ ゥtsセセセ@ based,"". 0.-- . セ・ウセ@ ゥヲoHエGZセ@ ·ni0ft1'ai0{' Reeewes ' ne aod セLL@ esiroate ゥヲ。ョウQ・ヲBsセ@ 0{fS セヲ@ ......... Fig. 6.30 Iterative Decoding of Turbo Code. This cycle will continue until certain conditions are met, such as a certain number ofiterations being performed. It is from this iterative process that Turbo Coding gets its name. The decoder circulates estimates of the sent data like a turbo engine circulates air. When the decoder is ready, the estimated information is finally kicked out of the cycle and hard decisions are made in the threshold component. The result is the decoded information sequence. In the following section, we study two decoding methods for the Turbo Codes, in detail. 6.12 TURBO DECODING We have seen that the Viterbi Algorithm is used for the decoding of convolutional codes. The Viterbi Algorithm performs a systematic elimination of the paths in the trellis. However, such luck does not exist for Turbo Decoder. The presence of the interleaver complicates the matter immensely. Before the discovery of Turbo Codes, a lot of work was being done in the area of Convolutional Codes suboptimal decoding strategies for concatenated codes, involving multiple decoders. The symbol-by-symbol maximum a posteriori (MAP) algorithm of Bahl, Cocke, Jelinek and Raviv, published in the IEEE Transactions on Information Theory in March 1974 also received some attention. It was this algorithm, which was used by Berrou et al. in the iterative decoding of their Turbo Codes. In this section, we shall discuss two methods useful for Turbo Decoding: (A) The modified Bahl, Cocke,Jelinek and Raviv (BCJR) Algorithm. (B) Th_e Iterative MAP Decoding. A. MODIFIED BAHL, COCKE, JELINEK AND RAVIV (BCJR) ALGORITHM The modified BCJR Decoding Algorithm is a symbol-by-symbol decoder. The decoder decides uk= +1 if P(uk= +II y) > P(uk= -11 y), (6.44) and it decides uk = -1 otherwise, where y = (yi, y2, ..., Yn) is the noisy received word. More succincdy, the decision uk is given by uk = sign [L(uk )] (6.45) where L(uk ) is the Log A Posteriori Probability (LAPP) Ratio defined as L(u )= lo ( P(uk = +1ly)) k g P(uk = -1ly) (6.46) Incorporating the code's trellis, this may be written as [ セー@ (sk-I= s',sk = s,y)/p(y) J L(uk ) = log , ' IP (sk-I =s 'sk =s, y)/p(y) s- (6.47) where sk e S is the state of the encoder at time k, s+ is the set of ordered pairs (s', s) corresponding to all state transitions {sk-I = f) to {sk = s) caused by data input uk = +1, and s- is similarly defined for uk = -1. 
Let us define
    γ_k(s', s) = p(s_k = s, y_k | s_{k-1} = s'),    (6.48)
    α_k(s) = [ Σ_{s'} α_{k-1}(s') γ_k(s', s) ] / [ Σ_s Σ_{s'} α_{k-1}(s') γ_k(s', s) ],    (6.49)
and
    β_{k-1}(s') = [ Σ_s β_k(s) γ_k(s', s) ] / [ Σ_s Σ_{s'} α_{k-1}(s') γ_k(s', s) ],    (6.50)
  • 101. セi@ :1 j Information Theory, Coding and Cryptography with boundary conditions ao(O) = 1 and lXo ((s :t 0) = 0, fiN(O) = 1 and fiN(s :t 0) = 0. Then the modified BCJR Algorithm gives the LAPP Ratio in the following form B. ITERATIVE MAP DECODING (6.51) (6.52) The decoder is shown in Fig. 6.31. D1 and D2 are the two decoders. Sis the set of 2m constituent encoder states. y is the noisy received word. Using Baye's rule we can write L(ulc) as L(ulc) =log ( p (yiulc =+1) J+ log ( p (ulc =+1) J P (yiulc =-1) P (ulc =-1) (6.53) with the second term representing apn:ori information. Since P(ulc = +1) = P(ulc = -1 ) typically, the a priori term is usually zero for conventional decoders. However, for iterative decoders, D1 receives extrinsic or soft information for each ulc from D2 which serves as a priori information. Similarly, D2 receives extrinsic information from D 1 and the decoding iteration proceeds with the each of the two decoders passing soft information along to the other decoder at each half- iteration except for the first. The idea behind extrinsic information is that D2 provides soft information to D1 for each ub using only information not available to D1. D1 does likewise for D2. N-Bit セdFMゥョエ・イエ@ 01 D2 y1P N-Bit MAP yB e L12 tnter1eaT Decoder2 N-Bit lntertea'f ケセMMMMMMMMMMMMMMMMセ]]]]]]]]MMMMMM⦅⦅ェ@ Fig. 6.31 Implementation of the Iterative Turbo Decoder. At any given iteration, D1 computes Lr(ulc) = L,y}. +L21(ulc)+.G2(ulc) (6.54) where, the first term is the channel value, L, = 4E, I N0 (E, = energy per channel bit), L2r (ulc) is extrinsic information passed from D2 to D1, and .G2 (ulc) is the extrinsic information from D1 to D2. where Convolutional Codes e(' )- [1L P P] yk S 'S - exp 2 cylcXlc ' イセ」HウGLウI@ = exp [ セ@ オセ」Hl・Hオャ」IKlLケォI}NイォHウGLウI@ _Lalr.-1(s')rセイNH@ s' s) alc(s) = ii ')'and a1c_1(s')r セ」HウG@ s s s' Lセ@ lc(s)yセ」HウGLウI@ セャイNMイHウGI@ = .LI · ale-! (s')ylc(s',s) s s' (6.55) (6.56) (6.57) {6.58) {6.59) (6.60) For the above algorithm, each decoder must have full knowledge of the trellis of the constituent encoders, i.e. each decoder must have a table containing the input bits and parity bits for all possible state transitions s' to s. Also, care should be taken that the last m bits of the Nbit information word to be encoded must force encoder1 to the zero state by the セ@ bit. The complexity of convolutional codes has slowed the development of low-cost Turbo Convolutional Codes (TCC) decoders. On the other hand, another type of turbo code, known as Turbo Product Code (TPC), uses block codes, solving multiple steps simultaneously, thereby achieving high data throughput in hardware. We give here a brief introduction to product codes. Let us consider two systematic linear block codes C1 with parameters (nbkl,d1) and セキゥエィ@ parameters HセL@ k;_, セ@ ) where ni, ki and di (i = 1, 2) セエ。ョ、@ for codeword length, number of information bits and minimum Hamming Distance respectively. The concatenation of two block codes (or product code) P = C1 * セゥウ@ obtained (see Fig. 6.32) by the following steps:
  • 102. I n1 Information Theory, Coding and Cryptography ----- MセM n2 セMMMMMMM ---- Check on columns Checks on rows cm」ォゥセ@ -Oft セMMMM セMM Fig. 6.32 Example of a Product Code P = C7 ,. C:c (i) placing (k1 x セ@ ) information bits in an array of k1 rows and セ@ columns, (ii) coding the k1 rows using code C2, (iii) coding the セ」ッャオュョウ@ using code cl. The parameters of the product code Pare : n= n1 * セ@ , k = k1 * セ@ , d =:= d1 * セ@ and the code rate R is given by R1 *R;_ where Ri is the code rate of code C;- Thus, we can build very long block codes with large minimum Hamming Distance by combining short codes with small minimum Hamming Distance. Given the procedure used to construct the product code, it is clear that the Hセ@ - セI@ last columns of the matrix are codewords of C1. By using the matrix generator, one can show that-the last rows of matrix Pare codewords of セM Hence all the rows of matrix Pare codewords of C1 and all the columns of matrix Pare codewords of セM Let us now consider the decoding of the rows and columns of a product code P transmitted ona Gaussian Channel using QPSK signaling. On receiving matrix R corresponding to a transmitted codeword E, the first decoder performs the soft decoding of the rows (or columns) of P using as input matrix R. Soft Input I Soft Output decoding is performed using the new algorithm proposed by R. Pyndiah. By subtracting the soft input from the soft output we obtain the extririsic information W(2) where index 2 indicates that we are considering the extrinsic information for the second decoding of P which was computed during the first decoding of P. The soft input for the decoding of the columns (or rows) at the second decoding ofPis given by R(2) = R + a(2)W(2), (6.61) where a(2) is a scaling factor which takes into account -the fact that the standard deviation of samples in matrix R and in matrix W are different. The standard deviation of the extrinsic information is very high in the first decoding steps and decreases as we iterate the decoding. This scaling factor a is also used to reduce the effect of the extrinsic information in the soft Convolutional Codes decoder in the first decoding steps when the BER is relatively high. It takes a small value ゥセ@ the first decoding steps and increases as the BER tends to 0. tセ・@ decodin? pr?cedure descnbed above is then generalized by cascading elementary decoders Illustrated m Fig. 6.33. a(m) B(m) l W(m + 1) R R DetAY LINE R Fig. 6.33 Block Diagram of Elementary Block Turbo Decoder. Let us now, briefly, look at the performance of Turbo Codes and compare it to that of other existing schemes. As shown in Fig. 6.34, Turbo Codes are the best practical codes due to their performance at lowSNR (at high SNRs, the Reed Solomon Codes ッセエー・イヲッイュ@ Turbo Codes!). セエ@ is obvious from the graph that the Recursive Systematic Convolutional (RSC) Turbo Code 1s the best practical code known so far because it can achieve low BER at セッセ@ SNR and ゥセ@ the closest to the theoretical maximum of channel performance, the Shannon Limit. The magnitude of how well it performs is determined by the coding gain. It can be recalled that the coding gain is the difference in SNR between a coded channel and an uncoded channel for the same performance (BER). Coding gain can be determined by measuring the distance between the 10--{) 10-1 r--- 10-2 10-3 10-4 Bit error 10-S rate QPセ@ 10-7 QPセ@ D [ A 10-9 -1 en 16 ::J ::J 0 ::J -i :::T (1) 0 iセ@ ('i" Ill r 3" 0 -r-- セ@ セゥョァ@ u., |Hセ@ , --...:.. F---r---..... 
Fig. 6.34 Comparison of Different Coding Systems (bit error rate versus signal-to-noise ratio in dB).
  • 103. !· Information Theory, Coding and Cryptography SNR values of any of the coded channels and the uncoded channel at a given BER. For example, the coding gain for the RSC Turbo code, with rate 112 at a BER of 10-5 , is about 8.5 dB. The physical consequence can be visualized as follows. Consider space communication where the received power follows the inverse square law (PR oc 11d2 ). This means that the Turbo coded signal can either be received 2.65 (= -J7) times farther away than the uncoded signal (at the same transmitting power), or it only requires 1/7 the transmitting power (for same transmitting distance). Another way of looking at it is to turn it around and talk about portable device battery lifetimes. For instance, since the RSC Turbo Coded Channel requires only 1/7 the power of the uncoded channel, we can say that a device using a Turbo codec, such as a cell phone, has a battery life 7 times longer than the device without any channel coding. 6.13 CONCLUDING REMARKS The notion of convolutional codes was first proposed by Elias (1954) and later developed by Wozencraft (1957) and Ash (1963). A class of multiple error correcting convolutional code was suggested by Massey (1963). The study of the algebraic structure of convolutional codes was carried out by Massey (1968) and Forney (1970). Viterbi Decoding was developed by AndrewJ. Viterbi, founCler of Qualcomm Corporation. His seminal paper on the technique titled "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," was published in IEEE Transactions on Information Theory, Volume IT-13, pages 260-269, in April, 1967. In 1968, Heller showed that the Viterbi Algorithm is practical if the constraint length is not too large. Turbo Codes represent the next leap forward in error correction. Turbo Codes were introduced in 1993 at the International Conference on Communications (ICC) by Berrou, Glavieux arrl Thitimajshima in their paper "Near-Shannon-Limit Error Correction Coding and Decoding- Turbo-Codes". These codes get their name from the fact that the decoded data are recycled through the decoder several times. The inventors probably found this reminiscent of the way a turbocharger operates. Turbo Codes have been shown to perform within 1 dB of the Shannon Limit at a BER of 1o-5 . They break a complex decoding problem down into simple steps, where each step is repeated until a solution is reached. The term "Turbo Code" is often used to refer to turbo convolutional codes (TCCs)-one form of Turbo Codes. The symbol-by- symbol Maximum A Posteriori (MAP) algorithm of Bahl, Cocke, Jelinek and Raviv, published in 1974 (nineteen years before the introduction of Turbo Codes!), was used by Berrou et al. for the iterative decoding of their Turbo Codes. The complexity of convolutional codes has slowed the development of low-cost TCC decoders. On the other hand, another type of Turbo Code, known as Turbo Product Code (TPC), uses block codes, solving multiple steps simultaneously, thereby achieving high data throughput in hardware. Convolutional Codes SUMMARY • An important sub-class of tree codes is called Convolutional Codes. Convolutional Codes make decisions based on past information, i.e. memory is required. A (AQ, no) tree code that is linear, time-invariant, and has a finite wordlength k= (m + 1)AQ is called an (n, k) Convolutional Code. • For Convolutional Codes, much smaller blocks of uncoded data of length hQ are used. These are called Information Frames. 
These Information Frames are encoded into Codeword Frames of length Tlo· The rate of this Tree Code is defined as R = .5!_. no • The constraint length of a shift register encoder is defined as the number of symbols it can store in its memory. • For Convolutional Codes, the Generator Polynomial Matrix of size hQ x 1lo is given by C{D) = [gij(D)], where, g9 {D) are the generator polynomials of the code. gij(D) are obtained simply by tracing the path from input i to output j. • The Wordlength of a Convolutional Code is given by k= hQ , fi?.3:X [deg giJ (D) + 1], the 1,) Blocklength is given by n= 1lo , セ@ [deg gij(D) + 1] and the constraint length is given by 1,) *<J v = L ュセ@ [deg gij (D)] i= 1 1 • The encoding operation can simply be described as vector matrix product, C(D) = *<J I(D) G(D), or equivalently, c 1(D)= Liz(D)gz,1(D). i=l • A parity check matrix H(D) is an (no - セI@ by 1lo matrix of polynomials that satisfies G(D)H(D)T= 0, and the syndrome polynomial vector which is a (no- AQ)-componentrow vector is given by s (D) = v(D) H (D) T. • A systematic encoder for a convolutional code has the generator polynomial matrix ot the form G(D)= [/I P (D)], where I is a hQ by セ@ identity matrix and P (D) is a セ@ by (no - AQ) matrix of polynomials. The Parity check polynomial matrix for a Systematic Convolutional Encoder is H(D)= [- P(Df I/]. • A Convolutional Code whose generator polynomials g1(D), g2(D),..., griJ(D) satisfy GCD[g1(D), g2(D), ..., griJ(D)] = Jf, for some ais called a Non-Catastrophic Convolutional Code. Otherwise it is called a Catastrophic Convolutional Code. • The lth minimum distance lzof a Convolutional Code is equal to the smallest Hamming Distance between any two initial codeword ウ・セ・ョエウ@ l frame long that are not identical in the initial frame. If l= m+ 1, then this (m + 1) minimum distance is called the minimum distance of the code and is denoted by d*, where m is the number of information frames that can be stored in the memory of the encoder. In literature, the minimum distance is also denoted by dmin .
  • 104. Information Theory, Coding and Cryptography • If the l th minimum distance of a Convolutional Code is d; , the code can correct terrors occurring in the first l frames provided, d[セ@ 2t + 1. The free distance of a Convolutional Code is given by セ・@ = mF [dz]. • The Free Length nfree of a Convolutional Code is the length of the non-zero segment of a smallest weight convolutional codeword of non zero weight. Thus, d1= flt,ee if l= "free, and d1<dfree if l < nfree . In literature, nfree is also denoted by n00 • • Ano!her way to find out the flt,ee of a Convolutional Code is use the concept of a generating function, whose expansion provides all the distance information directly. • The generator matrix for the Convolutional Code is given by [ G0 G1 G2 ··· Gm G= 0 Go G1 .. · Gm- 1 Gm 0 0 G0 Gm _2 Gm -1 Gm 0 0 0 0 '"] 0 0 .. . 0 0 .. . 0 • The Viterbi Decoding Technique is an efficient decoding method for Convolutional ·Codes. Its computational requirements grow exponentially as a function of the constraint length. • For rate R and constraint length, let d be the largest integer that satisfies H( セ@ v ) ,; I - R . Then, at least one Binary Convolutional Code exists with minimum distance dfor which the above inequality holds. Here H(x) is the familiar entropy function for a binary alphabet. • For a binary code with R = llno the minimum distance dmin satisfies dmin セ@ UnoV + no)!2J, where LI Jdenotes the largest integer less than or equal to 1 • An upper bound on dfree is given by Heller is dfree = min ャセMMMMMエ⦅Hカ@ + j -1)j. To j'?.1 2 21 -1 calculate the upper bound, the right hand side should be plotted for different integer values of j. The upper bound is the minimum of this plot. • For Convolutional Codes, the upper bound on the first error probability can be obtained 1 ()T(D, I) by Pe セ@ T(D)ID= 2 セ Q M ) and the bit error probability P6 セM MカpセャMpj@ k ()I r:tl-- 1 = 1,<D= 2-.; p(1- p) • Turbo codes are actually a quasi mix between Block and Convolutional Codes. Turbo Codes typically use at least two convolutional component encoders and two maximum aposteriori (MAP) algorithm component decoders in the Turbo Codes. Although the encoder determines the capability for the error correction, it is the decoder that determines the actual performance. Convolutional Codes セ@ It't-Jc.iwLof.c.·""'-0- do- the,- rmャャGィセ@ • セjj@ セNN@ セ@ i ! 6 f!NrV セᄋ@ ..イセセ@ I i 1 Walt Disney (1901-1966) Gr---------------------------------------------------_j PRV13LEjvtS 6.1 Design a rate 1 12 Convolutional encoder with a constraint length v = 4 and d* = 6. (i) Construct the State Diagram for this encoder. (ii) Construct the Trellis Diagram for this encoder. (iii) What is the dfree for this code? (iv) Give the Generator Matrix, G. (v) Is this code Non-Catastrophic? Why? セ・ウゥァョ@ a (12, 3) systematic convolutional encoder with a constraint length v = 3 and '?.8. (i) Construct the Trellis Diagram for this encoder. (ii) What is the dfree for this code? セッョウゥ、・イ@ the binary encoder shown in Fig. 6.35. Fig. 6.35 (i) Construct the Trellis Diagram for this encoder. '@YW"rite down the values ッヲセG@ no, v, mand R for this encoder. j,: I I ?lt: ,, v; t;.. (iii) What are the values of d* and dfree for this code? /rl::. 4-. ,._ :. セ@ .• セゥセZM the Generator pッャIGiAッセイセ@ . Gセ@ [D+ I pセMQM D'" {/+ dセMエ@ D).l
• 105. 6.4 Consider the binary encoder shown in Fig. 6.36.
Fig. 6.36
(i) Write down the values of k0, n0, v, m and R for this encoder.
(ii) Give the Generator Polynomial Matrix G(D) for this encoder.
(iii) Give the Generator Matrix G for this encoder.
(iv) Give the Parity Check Matrix H for this encoder.
(v) What are the values of d*, d_free and n_free for this code?
(vi) Is this encoder optimal in the sense of the Heller Bound on d_free?
(vii) Encode the following sequence of bits using this encoder: 101 001 001 010 000.
6.5 Consider a convolutional encoder described by its Generator Polynomial Matrix, defined over GF(2):
G(D) = [ D  0  D²  0  1  0  1  D²  0  1+D  D²  0  0  1 ]
(i) Draw the circuit realization of this encoder using shift registers. What is the value of v?
(ii) Is this a Catastrophic Code? Why?
(iii) Is this code optimal in the sense of the Heller Bound on d_free?
6.6 The Parity Check Matrix of the (12, 9) Wyner-Ash code for m = 2 is given as follows.
H = [ 1 1 1 1
      1 1 0 0   1 1 1 1
      1 0 1 0   1 1 0 0   1 1 1 1
      0 0 0 0   1 0 1 0   1 1 0 0   1 1 1 1   ...
      0 0 0 0   0 0 0 0   1 0 1 0   1 1 0 0   ... ]
(i) Determine the Generator Matrix, G.
(ii) Determine the Generator Polynomial Matrix, G(D).
(iii) Give the circuit realization of the (12, 9) Wyner-Ash Convolutional Code.
(iv) What are the values of d* and d_free for this code?
6.7 Consider a Convolutional Encoder defined over GF(4) with the Generator Polynomials g1(D) = 2D³ + 3D² + 1 and g2(D) = D³ + D + 1.
(i) What is the minimum distance of this code?
(ii) Is this code Non-Catastrophic? Why?
6.8 Let the Generator Polynomials of a rate 1/3 binary Convolutional Encoder be given by g1(D) = D³ + D + 1, g2(D) = D³ + D and g3(D) = D³ + 1.
(i) Encode the bit stream: Q_J__1OQQ11110101.
(ii) Encode the bit stream: 1010101010 ...
(iii) Decode the received bit stream: 001001101111000110011.
6.9 Consider a rate 1/2 Convolutional Encoder defined over GF(3) with the Generator Polynomials g1(D) = 2D² + 2D + 1 and g2(D) = D² + D + 2.
(i) Show the circuit realization of this encoder.
(ii) What is the minimum distance of this code?
(iii) Encode the following string of symbols using this encoder: 2012111002102.
(iv) Suppose the error vector is given by 0010102000201. Construct the received vector and then decode this received vector using the Viterbi Algorithm.

COMPUTER PROBLEMS
6.10 Write a computer program that determines the Heller Bound on d_free, given the values for n0 and v.
• 106. 6.11 Write a computer program to exhaustively search for good systematic Convolutional Codes. The program should loop over the parameters k0, n0, v, m, etc. and determine the Generator Polynomial Matrix (in octal format) for the best Convolutional Code in its category.
6.12 Write a program that calculates d* and d_free, given the generator polynomial matrix of any convolutional encoder.
6.13 Write a computer program that constructs all possible rate 1/2 Convolutional Encoders for a given constraint length v and chooses the best code for a given value of v. Using the program, obtain the following plots: (i) the minimum distance, d*, versus v, and (ii) the free distance, d_free, versus v. Comment on the error correcting capability of Convolutional Codes in terms of the memory requirement.
6.14 Write a Viterbi Decoder in software that takes in the following: (i) the code parameters in the Octal Format, and (ii) the received bit stream. The decoder then produces the survivors and the decoded bit stream.
6.15 Verify the Heller Bound on the entries in Table 6.4 for v = 3, 4, ..., 7.
6.16 Write a generalized computer program for a Turbo Encoder. The program should take in the parameters for the two encoders and the type of interleaver. It should then generate the encoded bit-stream when an input (uncoded) bit-stream is fed into the program.
6.17 Modify the Turbo Encoder program developed in the previous question to determine the d_free of the Turbo Encoder.
6.18 Consider the rate 1/3 Turbo Encoder shown in Fig. 6.37. Let the random interleaver size be 256 bits. (i) Find the d_free of this Turbo Encoder. (ii) If the input bit rate is 28.8 kb/s, what is the time delay caused by the Encoder?
6.19 Write a generalized computer program that performs Turbo Decoding using the iterative MAP Decoding algorithm. The program should take in the parameters for the two encoders, the type of interleaver used for encoding and the SNR. It should produce a sequence of decoded bits when fed with a noisy, encoded bit-stream.
6.20 Consider the rate 1/3 Turbo Encoder comprising the following constituent encoders:
G1(D) = G2(D) = ( 1   (1 + D² + D³ + D⁴)/(1 + D + D⁴) ).
The encoded output consists of the information bit, followed by the two parity bits from the two encoders. Thus the rate of the encoder is 1/3. Use a random interleaver of size 256.
Fig. 6.37 Turbo Encoder for Problem 6.18.
(i) For this Turbo Encoder, generate a plot of the bit error rate (BER) versus the signal to noise ratio (SNR). Vary the SNR from −2 dB through 10 dB.
(ii) Repeat the above for an interleaver of size 1024. Comment on your results.
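The Heller bound quoted in the chapter summary (and needed for Computer Problems 6.10 and 6.15) is straightforward to evaluate numerically. The sketch below is only one possible realization, not the book's reference program; it assumes the bound in the form d_free ≤ min_{j≥1} ⌊(2^(j−1)/(2^j − 1))(v + j − 1) n0⌋, which is how the garbled formula in the summary reads, and simply scans a range of j.

```python
import math

def heller_bound(n0: int, v: int, j_max: int = 30) -> int:
    """Upper bound on d_free of a rate-1/n0 convolutional code with
    constraint length v (Heller bound, in the form assumed above)."""
    best = None
    for j in range(1, j_max + 1):
        value = math.floor((2 ** (j - 1)) / (2 ** j - 1) * (v + j - 1) * n0)
        best = value if best is None else min(best, value)
    return best

if __name__ == "__main__":
    # Tabulate the bound for rate-1/2 codes, the kind of check that
    # Problem 6.15 asks for against Table 6.4.
    for v in range(3, 8):
        print(f"n0 = 2, v = {v}: d_free <= {heller_bound(2, v)}")
```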
• 107. 7 Trellis Coded Modulation

7.1 INTRODUCTION TO TCM
In the previous chapters we have studied a number of error control coding techniques. In all these techniques, extra bits are added to the information bits in a known manner. However, the improvement in the Bit Error Rate is obtained at the expense of bandwidth caused by these extra bits. This bandwidth expansion is equal to the reciprocal of the code rate. For example, an RS (255, 223) Code has a code rate R = 223/255 = 0.8745 and 1/R = 1.1435. Hence, to send 100 information bits, we have to transmit 14.35 extra bits (overhead). This translates to a bandwidth expansion of 14.35%. Even for this efficient RS (255, 223) code, the excess bandwidth requirement is not small. In power limited channels (like deep space communications) one may trade the bandwidth expansion for a desired performance. However, for bandwidth limited channels (like the telephone channel), this may not be the ideal option. In such channels, a bandwidth efficient signalling scheme such as Pulse Amplitude Modulation (PAM), Quadrature Amplitude Modulation (QAM) or Multi Phase Shift Keying (MPSK) is usually employed to support high bandwidth efficiency (in bits/s/Hz). In general, either extra bandwidth or a higher signal power is needed in order to improve the performance (error rate). Is it possible to achieve an improvement in system performance without sacrificing either the bandwidth (which translates to the data rate) or using additional power? In this chapter we study a coding technique called the Trellis Coded Modulation Technique, which can achieve better performance without bandwidth expansion or using extra power. We begin this chapter by introducing the concept of coded modulation. We then study some design techniques to construct good Coded Modulation Schemes. Finally, the performance of different Coded Modulation Schemes is discussed for Additive White Gaussian Noise (AWGN) Channels as well as for Fading Channels.

7.2 THE CONCEPT OF CODED MODULATION
Traditionally, coding and modulation were considered two separate parts of a digital communications system. The input message stream is first channel encoded (extra bits are added) and then these encoded bits are converted into an analog waveform by the modulator. The objective of both the channel encoder and the modulator is to correct errors resulting from the use of a non-ideal channel. Both these blocks (the encoder and the modulator) are optimized independently even though their objective is the same, that is, to correct errors introduced by the channel! As we have seen, a higher performance is possible by lowering the code rate at the cost of bandwidth expansion and increased decoding complexity. However, it is possible to obtain Coding Gain without bandwidth expansion if the channel encoder is integrated with the modulator. We illustrate this by a simple example.

Example 7.1 Consider data transmission over a channel with a throughput of 2 bits/s/Hz. One possible solution is to use uncoded QPSK. Another possibility is to first use a rate 2/3 Convolutional Encoder (which converts 2 uncoded bits to 3 coded bits) and then use an 8-PSK signal set which has a throughput of 3 bits/s/Hz. This coded 8-PSK scheme yields the same information data throughput as the uncoded QPSK (2 bits/s/Hz). Note that both the QPSK and the 8-PSK schemes require the same bandwidth. But we know that the
symbol error rate for the 8-PSK is worse than that of QPSK for the same energy per symbol. However, the rate 2/3 convolutional encoder would provide some coding gain. It may be possible that the coding gain provided by the encoder outweighs the performance loss due to the 8-PSK signal set. If the coded modulation scheme performs better than the uncoded one at the same SNR, we can claim that an improvement has been achieved without sacrificing either the data rate or the bandwidth. In this example we have combined a trellis encoder with the modulator. Such a scheme is called a Trellis Coded Modulation (TCM) scheme.
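The loss that comes with expanding the constellation can be made concrete by computing the squared Euclidean distances between M-PSK signal points at the same average symbol energy; these are the numbers tabulated in Fig. 7.1 on the next page. A small sketch, assuming unit-energy PSK constellations (the function names are illustrative, not from the text):

```python
import cmath

def psk_points(M, Es=1.0):
    """Return the M-PSK constellation points with average symbol energy Es."""
    r = Es ** 0.5
    return [r * cmath.exp(2j * cmath.pi * k / M) for k in range(M)]

def squared_distances(M, Es=1.0):
    """Distinct squared Euclidean distances from s0 to the other points."""
    pts = psk_points(M, Es)
    return sorted({round(abs(pts[0] - p) ** 2, 3) for p in pts[1:]})

print("QPSK :", squared_distances(4))   # [2.0, 4.0]            -> 2Es, 4Es
print("8-PSK:", squared_distances(8))   # [0.586, 2.0, 3.414, 4.0]
```

The smallest distance drops from 2Es to 0.586Es when QPSK is expanded to 8-PSK, which is exactly the penalty the coding gain must recover.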
• 108. We observe that the expansion of the signal set to provide redundancy results in the shrinking of the Euclidean distance between the signal points, if the average signal energy is to be kept constant (Fig. 7.1). This reduction in the Euclidean distance increases the error rate, which should be compensated for by coding (increase in the Hamming Distance). Here we are assuming an AWGN channel. We also know that the use of hard-decision demodulation prior to decoding in a coded scheme causes an irreversible loss of information. This translates to a loss of SNR. For coded modulation schemes, where the expansion of the signal set implies a power penalty, the use of soft-decision decoding is imperative. As a result, demodulation and decoding should be combined in a single step, and the decoder should operate on the soft output samples of the channel. For maximum likelihood decoding using soft decisions, the optimal decoder chooses the code sequence which is nearest to the received sequence in terms of the Euclidean Distance. Hence, an efficient coding scheme should be designed based on maximizing the minimum Euclidean Distance between the coded sequences rather than the Hamming Distance.

Fig. 7.1 The Euclidean Distances between the Signal Points for QPSK and 8-PSK (QPSK: δ1² = 2Es, δ2² = 4Es; 8-PSK: δ1² = 0.586 Es, δ2² = 2Es, δ3² = 3.414 Es, δ4² = 4Es).

The Viterbi Algorithm can be used to decode the received symbols for a TCM scheme. In the previous chapter we saw that the basic idea in Viterbi decoding is to trace out the most likely path through the trellis. The most likely path is the one which is closest to the received sequence in terms of the Hamming Distance. For a TCM scheme, the Viterbi decoder chooses the most likely path in terms of the Euclidean Distance. The performance of the decoding algorithm depends on the minimum Euclidean distance between a pair of paths forming an error event.

Definition 7.1 The minimum Euclidean Distance between any two paths in the trellis is called the Free Euclidean Distance, dfree, of the TCM scheme.

In the previous chapter we had defined dfree in terms of the Hamming Distance between any two paths in the trellis. The minimum free distance in terms of Hamming weight could be calculated as the minimum weight of a path that deviates from the all-zero path and later merges back into the all-zero path at some point further down the trellis. This was a consequence of the linearity of Convolutional Codes. However, the same does not apply for the case of TCM, which is non-linear. It may be possible that dfree is the Euclidean Distance between two paths in the trellis neither of which is the all-zero path. Thus, in order to calculate the Free Euclidean Distance for a TCM scheme, all pairs of paths have to be evaluated.

Example 7.2 Consider the convolutional encoder followed by a modulation block performing natural mapping (000 → s0, 001 → s1, ..., 111 → s7) shown in Fig. 7.2. The rate of the encoder is 2/3. It takes in two bits at a time (a1, a2) and outputs three encoded bits (c1, c2, c3). The three output bits are then mapped to one of the eight possible signals in the 8-PSK signal set.

Fig. 7.2 The TCM Scheme for Example 7.2.

This combined encoding and modulation can also be represented using a trellis with its branches labelled with the output symbols si. The TCM scheme is depicted below. This is a fully connected trellis.
Each branch is labelled by a symbol from the 8-PSK constellation diagram. In order to represent the symbol allocation unambiguously, the symbols assigned to the branches are written at the front end of the trellis. The convention is as follows. Consider state 1. The branch from state 1 to state 1 is labelled with s0, the branch from state 1 to state 2 is labelled with s7, the branch from state 1 to state 3 is labelled with s5 and the branch from state 1 to state 4 is labelled with s2. So, the 4-tuple {s0, s7, s5, s2} in front of state 1 represents the branch labels emanating from state 1 in sequence. To encode any incoming bit stream, we follow the same procedure as for a convolutional encoder. However, in the case of TCM, the output is a sequence of symbols rather than a sequence of bits. Suppose we have to encode the bit stream 1 0 1 1 1 0 0 0 1 0 0 1 ... We first group the input sequence in pairs because the input is two bits at a time. The grouped input sequence is 10 11 10 00 ... The TCM encoder output can be obtained simply by following the path in the trellis as dictated by the input sequence. The first input pair is 10. Starting from the first node in state 0, we traverse the third branch emanating from this node as dictated by the input 10. This takes us to state 2. The
• 109. symbol output for this branch is s5. From state 2 we move along the fourth branch as determined by the next input pair 11. The symbol output for this branch is s1. In this manner, the output symbols corresponding to the given input sequence are obtained.

Fig. 7.3 The Path in the Trellis Corresponding to the Input Sequence 10 11 10 00 ... (branch labels from state 0: s0 s7 s5 s2; from state 1: s5 s2 s0 s7).

The path in the Trellis Diagram is depicted by the bold lines in Fig. 7.3. As in the case of a convolutional encoder, in TCM too, every encoded sequence corresponds to a unique path in the trellis. The objective of the decoder is to recover this path from the Trellis Diagram.

Example 7.3 Consider the TCM scheme shown in Example 7.2. The Free Euclidean Distance, dfree, of the TCM scheme can be found by inspecting all possible pairs of paths in the trellis. The two paths that are separated by the minimum squared Euclidean Distance (which yields the d²free) are shown in the Trellis Diagram given in Fig. 7.4 with bold lines.

Fig. 7.4 The Two Paths in the Trellis that have the Free Euclidean Distance, d²free.

d²free = d²(s0, s7) + d²(s0, s0) + d²(s2, s1) = δ1² + 0 + δ1² = 2δ1² = 1.172 Es.

It can be seen that in this case, the error event that results in dfree does not involve the all-zero sequence. As mentioned before, in order to find the dfree, we must evaluate all possible pairs of paths in the trellis. It is not sufficient just to evaluate the paths diverging from and later merging back into the all-zero path, because of the non-linear nature of TCM. We must now develop a method to compare the coded scheme with the uncoded one. We introduce the concept of coding gain below.

Definition 7.2 The difference between the values of the SNR for the coded and uncoded schemes required to achieve the same error probability is defined as the Coding Gain, g:
g = SNR|uncoded − SNR|coded. (7.1)
At high SNR, the coding gain can be expressed as
g∞ = g|SNR→∞ = 10 log [ (d²free/Es)coded / (d²free/Es)uncoded ], (7.2)
where g∞ represents the Asymptotic Coding Gain and Es is the average signal energy. For uncoded schemes, dfree is simply the minimum Euclidean Distance between the signal points.

Example 7.4 Consider the TCM scheme discussed in Example 7.2 in which the encoder takes in 2 bits at a time. If we were to send uncoded bits, we would employ QPSK. The d²free for the uncoded scheme (QPSK) is 2Es from Fig. 7.1. From Example 7.3 we have d²free = 1.172 Es for our TCM scheme. The Asymptotic Coding Gain is then given by g∞ = 10 log (1.172/2) = −2.3 dB. This implies that the performance of our TCM scheme is actually worse than that of the uncoded scheme. A quick look at the convolutional encoder used in this example suggests that it has good properties in terms of Hamming Distance. In fact, it can be verified that this convolutional encoder is optimal in the sense of maximizing the free Hamming Distance. However, the encoder fails to perform well for the case of TCM. This illustrates the point that TCM schemes must be designed to maximize the Euclidean Distance rather than the Hamming Distance.
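The numbers in Examples 7.3 and 7.4 reduce to one-line arithmetic once the two squared free distances are known. A minimal sketch of Eq. (7.2), assuming (as in the text) that both schemes use the same average signal energy Es; the function name is illustrative:

```python
import math

def asymptotic_coding_gain_db(d2free_coded, d2free_uncoded):
    """g_inf = 10 log10[(d2free/Es)_coded / (d2free/Es)_uncoded], Eq. (7.2),
    with both squared distances expressed in units of Es."""
    return 10 * math.log10(d2free_coded / d2free_uncoded)

# Example 7.4: coded 8-PSK scheme with d2free = 1.172 Es versus uncoded QPSK (2 Es)
print(round(asymptotic_coding_gain_db(1.172, 2.0), 2), "dB")   # about -2.3 dB
```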
• 110. For the fully connected trellis discussed in Example 7.2, by a proper choice of the mapping scheme, we can improve the performance. In order to design a better TCM scheme, it is possible to work directly from the trellis onwards. The objective is to assign the 8 symbols from the 8-PSK signal set in such a manner that the dfree is maximized. One approach is to use an exhaustive computer search. There are a total of 16 branches that have to be assigned labels (symbols) from time t_k to t_(k+1). We have 8 symbols to choose from. Thus, an exhaustive search would involve 8^16 different cases! Another approach is to assign the symbols to the branches in the trellis in a heuristic manner so as to increase the dfree. We know that an error event consists of a path diverging in one state and then merging back after one or more transitions, as depicted in Fig. 7.5. The Euclidean Distance associated with such an error event can be expressed as
d²total = d²(diverging pair of paths) + ... + d²(re-merging pair of paths). (7.3)

Fig. 7.5 An Error Event (two paths diverging from and re-merging into common nodes in the trellis).

Thus, in order to design a TCM scheme with a large dfree, we can at least ensure that the d²(diverging pair of paths) and the d²(re-merging pair of paths) are as large as possible. In TCM schemes, a redundant 2^(m+1)-ary signal set is often used to transmit m bits in each signalling interval. The m input bits are first encoded by an m/(m+1) convolutional encoder. The resulting m + 1 output bits are then mapped on to the signal points of the 2^(m+1)-ary signal set. Now, recall that the maximum likelihood decoding rule for the AWGN channel with soft decision decoding is to minimize the squared Euclidean Distance between the received vector and the code vector estimate from the trellis diagram (see Section 6.7, Chapter 6). Therefore, the mapping is done in such a manner as to maximize the minimum Euclidean Distance between the different paths in the trellis. This is done using a rule called Mapping by Set Partitioning.

7.3 MAPPING BY SET PARTITIONING
Mapping by Set Partitioning is based on successive partitioning of the expanded 2^(m+1)-ary signal set into subsets with increasing minimum Euclidean Distances. Each time we partition the set, we reduce the number of the signal points in the subset, but increase the minimum distance between the signal points in the subset. The set partitioning can be understood by the following example.

Example 7.5 Consider the set partitioning of 8-PSK. Before partitioning, the minimum Euclidean Distance of the signal set is Δ0 = δ0. In the first step, the 8 points in the constellation diagram are subdivided into two subsets, A0 and A1, each containing 4 signal points, as shown in Fig. 7.6. As a result of this first step, the minimum Euclidean Distance of each of the subsets is now Δ1 = δ1, which is larger than the minimum Euclidean Distance of the original 8-PSK. We continue this procedure and subdivide the sets A0 and A1 into two subsets each, A0 → {A00, A01} and A1 → {A10, A11}. As a result of this second step, the minimum Euclidean Distance of each of the subsets is now Δ2 = δ2. Further subdivision results in one signal point per subset.

Fig. 7.6 Set Partitioning of the 8-PSK Signal Set (Δ0 = δ0 for the full set; Δ1 = δ1 for the subsets A0, A1; Δ2 = δ2 for the subsets A00, A01, A10, A11).
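The doubling of the intra-subset minimum distance at every partition level can be checked numerically. The sketch below assumes natural labelling of the 8-PSK points, s_k = √Es · exp(j2πk/8), and splits the set first on k mod 2 and then on k mod 4, which reproduces the subsets of Example 7.5; it is an illustration, not the book's procedure.

```python
import cmath
from itertools import combinations

Es = 1.0
pts = {k: Es ** 0.5 * cmath.exp(2j * cmath.pi * k / 8) for k in range(8)}

def min_distance(labels):
    """Minimum Euclidean distance within one subset of 8-PSK points."""
    if len(labels) < 2:
        return float("inf")
    return min(abs(pts[a] - pts[b]) for a, b in combinations(labels, 2))

level0 = [list(range(8))]                                   # full 8-PSK set
level1 = [[k for k in range(8) if k % 2 == b] for b in range(2)]   # A0, A1
level2 = [[k for k in range(8) if k % 4 == b] for b in range(4)]   # A00 ... A11

for name, subsets in [("Delta0", level0), ("Delta1", level1), ("Delta2", level2)]:
    print(name, "=", round(min(min_distance(s) for s in subsets), 3))
# Expected (in units of sqrt(Es)): Delta0 = 0.765, Delta1 = 1.414, Delta2 = 2.0
```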
Consider the expanded 2^(m+1)-ary signal set used for TCM. In general, it is not necessary to continue the process of set partitioning until the last stage. The partitioning can be stopped as soon as the minimum distance of a subset is larger than the desired minimum Euclidean Distance of the TCM scheme to be designed. Suppose the desired Euclidean Distance is obtained just after the (m̃ + 1)th set partitioning step (m̃ ≤ m). It can be seen that after m̃ + 1 steps we have 2^(m̃+1) subsets and each subset contains 2^(m − m̃) signal points. A general structure of a TCM encoder is given in Fig. 7.7. It consists of m input bits, of which m̃ bits are fed into a rate m̃/(m̃ + 1) convolutional encoder while the remaining m − m̃ bits are left uncoded. The m̃ + 1 output bits of the encoder along with the m − m̃ uncoded bits are then input to the signal mapper. The signal mapper uses the m̃ + 1 bits from the convolutional encoder to select one of the possible 2^(m̃+1) subsets. The remaining m − m̃ uncoded bits are used to select one of the 2^(m − m̃) signals from this subset. Thus, the input to the TCM encoder is m bits and the output is a signal point chosen from the original constellation.
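The split just described, coded bits select the subset while uncoded bits select the point inside it, can be sketched in a few lines. The code below is only an illustration of that structure for m = 2, m̃ = 1: the toy_rate_half_encoder and the antipodal 8-PSK subsets are hypothetical stand-ins, not the encoder or the mapping of Fig. 7.7.

```python
def toy_rate_half_encoder(bits):
    """A hypothetical rate-1/2 feedforward encoder (g1 = 1 + D, g2 = 1),
    used only so the sketch runs end to end: one input bit -> two coded bits."""
    state, out = 0, []
    for b in bits:
        out.append((b ^ state, b))   # (b + D*b_prev, b) over GF(2)
        state = b
    return out

def tcm_encode(pairs, signal_set):
    """pairs: list of (uncoded_bit, coded_input_bit), i.e. m = 2 and m_tilde = 1.
    signal_set[subset][point] is a constellation symbol."""
    coded = toy_rate_half_encoder([c for _, c in pairs])
    symbols = []
    for (u, _), (c1, c0) in zip(pairs, coded):
        subset = (c1 << 1) | c0      # m_tilde + 1 = 2 coded bits -> 4 subsets
        point = u                    # m - m_tilde = 1 uncoded bit -> 2 points
        symbols.append(signal_set[subset][point])
    return symbols

# 8-PSK split into 4 subsets of 2 antipodal points (the lowest partition layer).
eight_psk = {s: [s, (s + 4) % 8] for s in range(4)}
print(tcm_encode([(1, 0), (0, 1), (1, 1), (0, 0)], eight_psk))
```

Because the uncoded bit never enters the encoder, changing it moves the output only within a subset, which is exactly the origin of the parallel transitions discussed next.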
• 111. Fig. 7.7 The General Structure of a TCM Encoder (of the m input bits, m̃ bits enter a rate m̃/(m̃ + 1) convolutional encoder whose m̃ + 1 coded bits select the subset, while the m − m̃ uncoded bits select the signal from that subset).

For the TCM encoder shown in Fig. 7.7 we observe that the m − m̃ uncoded bits have no effect on the state of the convolutional encoder because its input is not being altered. Thus, we can change the first m − m̃ bits of the total m input bits without changing the encoder state. This implies that 2^(m − m̃) parallel transitions exist between states. These parallel transitions are associated with the signals of the subsets in the lowest layer of the set partitioning tree. For the case of m = m̃, the states are joined by single transitions. Let us denote the minimum Euclidean Distance between parallel transitions by Δ_(m̃+1) and the minimum Euclidean Distance between non-parallel paths of the trellis by dfree(m̃). The Free Euclidean Distance of the TCM encoder shown in Fig. 7.7 can then be written as
dfree = min [Δ_(m̃+1), dfree(m̃)]. (7.4)

Example 7.6 Consider the TCM scheme proposed by Ungerboeck. It is designed to maximize the Free Euclidean Distance between coded sequences. It consists of a rate 2/3 convolutional encoder coupled with an 8-PSK signal set mapping. The encoder is given in Fig. 7.8 and the corresponding trellis diagram in Fig. 7.9.

Fig. 7.8 The TCM Encoder for Example 7.6 (input bits a1, a2; coded bits c1, c2, c3; natural mapping onto 8-PSK).
  • 112. Information Theory, Coding and Cryptography 7.4 UNGERBOECK'S TCM DESIGN RULES In 1982 Ungerboeck proposed a set of design rules for maximizing the free Euclidean Distance ·for TCM schemes. These design rules are based on heuristics. Rule 1: Parallel Transitions, if present, must be associated with the signals of the subsets in the lowest layer of the Set Partitioning Tree. These signals have the minimum Euclidean Distance Rule 2: The transitions originating from or merging into one state must be associated with signals of the first step of set partitioning. The Euclidean Distance between these signals is at least L11• Rule 3: All signals are used with equal frequency in the Trellis Diagram. 2£& w; t : a a : .a a a Example 7.7 Next, we wish to improve upon the TCM scheme proposed in Example 7.6. We observed in the previous example that the parallel transitions limit the dfree .Therefore, we must come up with a trellis that has no parallel transitions. The absence of parallel paths would imply that the dJ;.ee is not limited todl, the maximum possible separation between two signal points in the 8-PSK Constellation Diagram. Consider the Trellis Diagram shown in Fig. 7.10. The trellis has 8 states. There are no Parallel Transitions in the Trellis Diagram. We wish to assign the symbols from an 8-PSK signal set to the branches of this trellis according to the Ungerboeck rules. Since there are no parallel transitions here, we start directly with Ungerboeck's second rule. We must assign the transitions originating from or merging into one state with signals from the first step of set partitioning. We will refer to Fig. 7.6 for the Set Partitioning Diagram for 8-PSK. The first step of set partitioning yields two subsets, A0 andA1 , each consisting offour signal points. We first focus on the diverging paths. Consider the topmost node (state S0 ). We assign to these four diverging paths the signals s0, s4, s2 ands6 . Note that they all belong to the subsetA0 . fセヲGエィ・@ next node (stateS1 ), we assign the signalss1, s5, s3 ands7 belonging to the subsetA1. For the next node (state S2 ), we assign the signals s4, s0, s6 and s2 belonging to the subset A0• The order has been shuffled to ensure that at the re-merging end we still have signals from the first step of set partitioning. If we observe the four paths that merge into the node of state S0, their branches are labelleds0, s4, s2 ands6, which belong toAo. This clever assignment has ensured that the transitions originating from or merging into one state are labelled with signals from the first step of set partitioning, thus satisfying rule 2. It can be verified that all the signals have been used with equal frequency. We did not have to make any special effort to ensure that. The error event corresponding to the squared free Euclidean Distance is shown in the Trellis Diagram with bold lines. The squared free Euclidean Distance of this TCM Scheme is 4,= 4 (.fo, 56 ) + 4 (.fo, s7 ) + di (.fo, s6 ) = \Uセ@ + \UセK@ \Uセ]@ 4.586 Es Trellis Coded Modulation State So: so S<j セ@ Ss 0 State Sf s1 ss s3 s-, 0 State S2: S<j so Sa s2 0 State S3: Ss s1 s-, s3 0 0 State S4: セ@ Ss So s4 0 0 State S5: s3 s-, s1 ss 0 0 State Ss: ss セ@ s4 so 0 0 State S7: s-, S3 ss s1 0 0 0 Fig. 7.10 The Trellis Diagram for the Encoder in Example 7.7. 
In comparison to uncoded QPSK, this translates to an asymptotic coding gain of goo = 10 log 4 · 586 = 3.60 dB 2 [W] Thus, at the cost ofadded encoding and decoding complexity, we have achieved a 0.6 dB gain over the TCM scheme discussed in Example 7.6. Example 7.8 Consider the 8 state, 8-PSK TCM scheme discussed in Example 7.7. The equivalent systematic encoder realization with feedback is given in Fig. 7.11. 。 Q MMMMMMMMMMMMMMMMMMMMMMMMMMMNMMMMMMMMMMセ」Q@ S; MMMMMMMMMMMMMMセMMMMMMMMMMMMKMMMMMMMMMMMMMMA@ <>.2 Natural 8 2 Mapping (8-PSK) Fig. 7.11 The TCM Encoder for Example 7.7. Let us represent the output of the convolutional encoder shown in Fig. 7.11 in terms of the input and delayed versions of the input (See Section 6.3 of Chapter 6 for analytical representation of Convolutional Codes). From the figure, we have c1 (D)= a1 (D), c2 (D)= a2 (D),
• 113. c3(D) = (D²/(1 + D³)) a1(D) + (D/(1 + D³)) a2(D).
Therefore, the Generator Polynomial Matrix of this encoder is
G(D) = [ 1   0   D²/(1 + D³)
         0   1   D/(1 + D³) ]
and the parity check polynomial matrix, H(D), satisfying G(D) H^T(D) = 0 is
H(D) = [ D²   D   1 + D³ ].
We can re-write the parity check polynomial matrix as H(D) = [H1(D) H2(D) H3(D)], where
H1(D) = D² = (000 100)binary = (04)octal,
H2(D) = D = (000 010)binary = (02)octal,
H3(D) = 1 + D³ = (001 001)binary = (11)octal.
Table 7.1 gives the encoder realization and asymptotic coding gains of some of the good TCM codes constructed for the 8-PSK signal constellation. Almost all of these TCM schemes have been found by exhaustive computer searches. The coding gain is given with respect to the uncoded QPSK. The parity check polynomials are expressed in octal form.

Table 7.1 TCM Schemes Using 8-PSK
Number of states | H1(D) | H2(D) | H3(D) | d²free/Es | Asymptotic coding gain (dB)
4    | -   | 2   | 5   | 4.00 | 3.01
8    | 04  | 02  | 11  | 4.58 | 3.6
16   | 16  | 04  | 23  | 5.17 | 4.13
32   | 34  | 16  | 45  | 5.75 | 4.59
64   | 066 | 030 | 103 | 6.34 | 5.01
128  | 122 | 054 | 277 | 6.58 | 5.17
256  | 130 | 072 | 435 | 7.51 | 5.75

Example 7.9 We now look at a TCM scheme that involves 16-QAM. The TCM encoder takes in 3 bits and outputs one symbol from the 16-QAM Constellation Diagram. This TCM scheme has a throughput of 3 bits/s/Hz and we will compare it with uncoded 8-PSK, which also has a throughput of 3 bits/s/Hz. Let the minimum distance between two points in the signal constellation of 16-QAM be δ0, as depicted in Fig. 7.12. It is assumed that all the signals are equiprobable. Then the average signal energy of a 16-QAM signal is obtained as Es = 2.5 δ0².

Fig. 7.12 Set Partitioning of 16-QAM (Δ0 = δ0, Δ1 = √2 δ0, Δ2 = 2δ0, Δ3 = 2√2 δ0).

Thus we have δ0² = (4/10) Es = 0.4 Es. The Trellis Diagram for the 16-QAM TCM scheme is given in Fig. 7.13. The trellis has 8 states. Each node has 8 branches emanating from it because the encoder takes in 3 input bits at a time (2³ = 8). The encoder realization is given in Fig. 7.14. The Ungerboeck design rules are followed to assign the symbols to the different branches. The branches diverging from a node and the branches merging back into a node are assigned symbols from the sets A0 and A1. The parallel paths are assigned symbols from the lowest layer of the Set Partitioning Tree (A000, A001, etc.). The squared Euclidean Distance between any two parallel paths is Δ3² = 8δ0². This is by design, as we have assigned symbols to the parallel paths from the lowest layer of the Set Partitioning Tree. The minimum squared Euclidean Distance between non-parallel paths is
d² = Δ1² + Δ0² + Δ1² = 5δ0².
Therefore, the free Euclidean Distance for the TCM scheme is
d²free = min {8δ0², 5δ0²} = 5δ0² = 2Es.
• 114. Note that the free Euclidean Distance is determined by the non-parallel paths rather than the parallel paths. We now compare the TCM scheme with the uncoded 8-PSK, which has the same throughput. For uncoded 8-PSK, the minimum squared Euclidean Distance is (2 − √2)Es. Thus, the asymptotic coding gain for this TCM encoder is
g∞ = 10 log [ 2 / (2 − √2) ] = 5.3 dB.

Fig. 7.13 Trellis Diagram for the 16-QAM TCM Scheme (branch labels, in order, from each state: S0: A000 A100 A010 A110; S1: A001 A101 A011 A111; S2: A000 A100 A110 A010; S3: A101 A001 A111 A011; S4: A010 A110 A000 A100; S5: A011 A111 A001 A101; S6: A110 A010 A100 A000; S7: A111 A011 A101 A001).

Fig. 7.14 The Equivalent Systematic Encoder for the Trellis Diagram Shown in Fig. 7.13 (input bits a1, a2, a3; coded bits c1, c2, c3, c4; natural mapping onto 16-QAM).

7.5 TCM DECODER
We have seen that, like Convolutional Codes, TCM schemes are also described using Trellis Diagrams. Any input sequence to a TCM encoder gets encoded into a sequence of symbols based on the Trellis Diagram. The encoded sequence corresponds to a particular path in this trellis diagram. There exists a one-to-one correspondence between an encoded sequence and a path within the trellis. The task of the TCM decoder is simply to identify the most likely path in the trellis. This is based on the maximum likelihood criterion. As seen in the previous chapter, an efficient search method is to use the Viterbi algorithm (see Section 6.7 of the previous chapter). For soft decision decoding of the received sequences using the Viterbi Algorithm, each trellis branch is labelled by the branch metric based on the observed received sequence. Using the maximum likelihood decoder for the Additive White Gaussian Noise (AWGN) channel, the branch metric is defined as the Euclidean Distance between the coded sequence and the received sequence. The Viterbi Decoder finds a path through the trellis which is closest to the received sequence in the Euclidean Distance sense.

Definition 7.3 The branch metric for a TCM scheme designed for the AWGN channel is the Euclidean Distance between the received signal and the signal associated with the corresponding branch in the trellis.

In the next section, we study the performance of TCM schemes over AWGN channels. We also develop some design rules.

7.6 PERFORMANCE EVALUATION FOR AWGN CHANNEL
There are different performance measures for a TCM scheme designed for an AWGN channel. We have already discussed the asymptotic coding gain, which is based on the free Euclidean Distance, dfree. We will now look at some other parameters that are used to characterize a TCM code.

Definition 7.4 The average number of nearest neighbours at free distance, N(dfree), gives the average number of paths in the trellis with free Euclidean Distance dfree from a transmitted sequence. This number is used in conjunction with dfree for the evaluation of the error event probability.

Definition 7.5 Two finite length paths in the trellis form an error event if they start from the same state, diverge and then later merge back. An error event of length l is defined by two coded sequences s_l and s'_l, with s_l = (s_n, s_(n+1), ..., s_(n+l+1)), such that
s_(n+l+1) = s'_(n+l+1) and s_i ≠ s'_i, i = n + 1, ..., n + l. (7.5)

Definition 7.6 The probability of an error event starting at time n, given that the decoder has estimated the correct transmitter state at that time, is called the error event probability, Pe.
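Before turning to the error-probability bounds, the decoder of Section 7.5 can be sketched directly: it is ordinary Viterbi decoding with the Hamming metric replaced by the squared Euclidean branch metric of Definition 7.3. The implementation below is generic over any trellis table; the two-state QPSK trellis used to exercise it is an illustrative assumption, not one of the codes of this chapter.

```python
import cmath

def viterbi_tcm(received, next_state, branch_symbol, num_states):
    """Soft-decision Viterbi decoder for a TCM trellis.
    next_state[state][u] / branch_symbol[state][u] give, for input label u,
    the successor state and the transmitted constellation point."""
    INF = float("inf")
    metric = [0.0] + [INF] * (num_states - 1)          # encoder starts in state 0
    paths = [[] for _ in range(num_states)]
    for r in received:
        new_metric = [INF] * num_states
        new_paths = [None] * num_states
        for st in range(num_states):
            if metric[st] == INF:
                continue
            for u, ns in enumerate(next_state[st]):
                m = metric[st] + abs(r - branch_symbol[st][u]) ** 2   # Euclidean metric
                if m < new_metric[ns]:
                    new_metric[ns] = m
                    new_paths[ns] = paths[st] + [u]
        metric, paths = new_metric, new_paths
    best = min(range(num_states), key=lambda st: metric[st])
    return paths[best]

# Illustrative 2-state trellis mapped onto QPSK (assumed for the demo only).
qpsk = [cmath.exp(2j * cmath.pi * k / 4) for k in range(4)]
next_state = [[0, 1], [0, 1]]
branch_symbol = [[qpsk[0], qpsk[2]], [qpsk[1], qpsk[3]]]
tx_bits = [1, 0, 1, 1, 0]
state, tx = 0, []
for b in tx_bits:                       # re-encode, then decode the noiseless sequence
    tx.append(branch_symbol[state][b])
    state = next_state[state][b]
print(viterbi_tcm(tx, next_state, branch_symbol, 2))   # recovers [1, 0, 1, 1, 0]
```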
, n + L (7.5) Definition 7.6 The probability of an error event starting at time n, given that the decoder has estimated the correct transmitter state at that time, is called the error ・カ・セエ@ probability, Pe-
  • 115. r. r 'i /1 li !i !i II vi r: I 1, •1 .'l Ill II! I! :I ;• I Information Theory, Coding and Cryptography The performance ofTCM schemes is generally evaluated by means ofupper bounds on error event probability. It is based on the generating function approach. Let us consider again the Ungerboek model for rate ml(m + 1) TCM scheme as shown in Fig. 7.7. The encoder takes in mbits at a time and encodes it to m+1 bits, which are then mapped by a memoryless mapper, f(.), on to a symbol s,, Let us call the binary (m + 1)-tuples ci as the labels of the signals si. We observe that there is a one-to-one 」ッイイ・セーッョ、・ョ」・@ between a symbol and its label. Hence, an error event oflength l can be equivalently described by two sequences of labels Cz = (ck, ck+l' ..., 」ォKセャI@ and C[= (ck, c'k+b ..., c GォKセャ@ ), (7.6) where, ck = ck EB ek, c'k+1 = ckt1 EB ek+1 , .•• , and E1=(ek, ek+1 , ••• , ・ォKセャI@ is a sequence of binary error vectors. The mathematical symbol EB represents binary {modulo-2) addition. An error event of length l occurs when the decoder chooses, instead of the transmitted sequence C1, the sequence C[ which corresponds to a path in the trellis diagram that diverges from the original transmitted path and re-merges back exactly after !time intervals. To find the probability of error, we need to sum over all possible values of l the probabilities of error events oflength l (i.e., joint probabilities that C1is transmitted and C[ is detected). The upper bound on the probability of error is obtained by the following union bound 00 p ・セ@ LL LP(sl)P(Sz,s[) (7.7) l = 1s1 s{ *'I where P (s1 s[) denotes the pairwise error probability (i.e., probability that the sequence s1 is transmitted and the sequence s[ is detected). Assuming a one-to-one correspondence between a symbol and its label, we can write 00 pLセ@ LL LP(C1)P(C1,Cl) i=lCr Cl*Cr 00 = LL LP(C1)P(C1 EB E,) {7.8) ャ]QcイセJッ@ The pairwise error probability P (Cz, Cz, E1 ) can be upper-bounded by the Bhattacharyya Bound (see Problem 7.12) as follows -{- 1 -llf(CL)- f(Cr)l2 } P(Cz, Cz, EB E1) セ@ e 4No = eMセNセ@ ..セセヲHcャIM f(C'llf) (7.9) where f(.) is the memoryless mapper. Let D = e-{T セ P@ } (for Additive White Gaussian Noise Channel with single sided power spectral density N0), then Trellis Coded Modulation (7.10) where 、セサヲHc Q@ ),f(C'1)) represents the squared Euclidean distance between the symbol sequences s1and s[. Next, define the function W(Et) = LP(Cl) r)if(Cr)- f(Ct(;f)Etlii 2 Ct (7.11) We can now write the probability of error as 00 ー・セ@ L LW(El) (7.12) I= 1Ez *o From the above equation we observe that the probability of error is upper-bounded by a sum over all possible error events, E1. Note that (7.13) We now introduce the concept of an error state diagram which is essentially a graph whose branches have matrix labels. We assume that the source symbols are equally probable with probabilities 2-m = 11M DefbdtloD ,._,.Theerror weight.matrtx,·.G(e;) is tm Nx Nmatrix whose elementin me·f'.iowtt:nd f1h eolumn is detmed u Tセ@ )t :イセセセeカャHセmI@ セ@ Q GセQTエセャエ@ ·ifthereJl.aセセNM・Bャエッ@ ウエ。エ・セエ@ :_ ' l セ@ ; ' ' [aヲセセヲN@ ·. .· . •セG@ •,;••; .: • :•'• I : . , . ' N。gオエセスセ@ i}FQ,'ifthere is no セMNヲゥZッュNエ。エ・ー」キN@ ヲNᆱセ@ trdiD, {7.14) ·where c 1 -+ fare the label vectorS generatedby the transitionfrom. state .p.to state 1J. The summation accounts for the possible parallel transitions (parallel paths) between states in the Trellis Diagram. 
The entry (p, q) in the matrix G provides an upperbound on the probability that an error event occurs starting from the node pand ending on q. Similarly, (11N)Gl is a vector whose pth entry is a bound on the probability of any error event starting from node p. Now, to any sequence E1 =e1 , e2 , ..., e1, there corresponds a sequence of l error weight matrices G(e1), G(e1), .•. , G(e1). Thus, we have 1 T l fYt:Et) = -1 TI G(en)1 N n=l (7.15) where 1 is a column N-vector all elements of which are unity. We make the following observations: ·.I
  • 116. I> :1 li II iᄋセ@ Information Theory, Coding and Cryptography (i) For any matrix A, 1T A 1 represents the sum of all entries of A. l (ii) The element (p, q) of the matrix = TI G{en) enumerates the Euclidean Distance n=l involved in transition from state pto state qin exactly l steps. Our next job is to relate the above analysis to the probability of error, Pe . It should be noted that the error vectors e1, e2 , ... , e1 are not independent. The Error State Diagram has a structure determined only by the Linear Convolutional Code and differs from the Code State Diagram only in the denomination of its state and branch labels (G(ei)). Since the error vectors ei are simply the differences of the vectors ci, the connections among the vectors ei are the same as that among the vectors ci. Therefore, from (7.12) and (7.15) we have (7.16) where T(D) = セ@ IT Gl, (7.17) and the matrix 00 l G=L L n G(en) (7.18) l =1 Er"O n =1 is the matrix transfer function of the error state diagram. T(D) is called the scalar transfer function or simply the transfer function of the Error State Diagram. &ample 7.10 Consider a rate lf2 TCM scheme with m = 1, and M = 4. It takes one bit at a time and encodes it into two bits, which are then mapped to one of the four QPSK symbols. The two state Trellis Diagram and the symbol allocation from the 4-PSKConstellation is given in Fig. 7.15. 10 00 0 0 0 16 Fig. 7.15 Let us denote the error vector bye= (e2 e1). Then, from (7.14) Trellis Coded Modulation 1 [nllf(OO)- f(OOE!l e2'Illi 2 G(e2el) = 2 nllf(OO)- f(OlE!le2'Il112 = _!_[nllf(OO)- [(t2e1 lll 2 2 nllf(Ol)-f(e21Jll2 nllf(lO)- f{lO Ellt21J. )1 2 ] nllf(ll)- f(ll Ell e2till2 nllf(lO)- f(e21J. )12 ] nllf(ll)- f(e2iJ. IセR@ where e·= 1 ffi e. The error state diagram for this TCM scheme is given in Fig. 7.16. G(10) A G(01) 0 So s1 So Fig. 7.16 The matrix transfer function of the error state diagram is G = G(I0)[/2 -G(ll)r 1 G(Ol) (7.19) (7.20) where 12 is the 2 x 2 identity matrix. In this case there are only three error vectors possible, {01,10,11 }. From (7.19) we calculate G(OI) = セ@ {セZ@ セZ}L@ G(IO) = hセZ@ セZ}N。ョ、@ G(ll) = hセZ@ セZ}@ Using (7.20) we obtain the matrix transfer function of the Error State Diagram as 1 D 6 [1 1] G = 2 1- D6 1 1 The scalar transfer function, T(D), is then given by T(D) = 1__ 1T G1= D 6 2 2 1-D (7.21) (7.22) d b b · · D e- {T セj@ in The upper bound on the probability of error can be compute y su sututmg = (7.22). 6 I P < D -1 e- 2 - 1- D D=t 4No (7.23) Example 7.11 Consider another rate 112 TCM scheme with m = 1, and M = 4. The t:Vo state Trellis Diagram is same as the one in the previous example. However, the symbol allocat1on from the 4-PSK Constellation is different, and is given in Fig. 7.17.
  • 117. Information Theory, Coding and Cryptography 01 10 00 0 0 0 11 Fig. 7.17 Note that this symbol assignment violates the Ungerboek design principles. Let us again denote the error vector bye= (ez et ). The Error State Diagram for this TCM scheme is given in Fig. 7.18. G(11) G(10) s1 Fig. 7.18 G(01) The matrix transfer function of the Error State Diagram is So G = G(11)[lz- G(lO)r1 G(01) (7.24) where lz is the 2 x 2 identity matrix. In this case there are only three error vectors possible {01, 10,11 }. From (7.19) we calculate ' G(OI) = セ@ {セZ@ セZ}N@ G(IO) = セ@ [セZ@ セZ}N。ョ、@ G(ll) = セ@ [セZ@ セZ}@ Using (7.23) we obtain the matrix transfer function of the Error State Diagram as G 1 D 4 [1 1] = 2 1- D4 1 1 (7.25) The scalar transfer function, T(D), is then given by T (D) = _!_ 1T G1= D 4 2 1- D4 (7.26) The upper bound on the probability of error is D4 I P, 1- D4 ' D=e__:_!__ {7.27) -tNo Comparing (7.23) and (7.27) we observe that, simply by changing the symbol assignment to th branches oftheTrellis Diagram, we degrade the performance considerably. In the second examplee エィセ@ upper 「ッオョセ@ on the error probability has loosened by two orders of magnitude(assuming D \セ@ 1, 1.e., for the high SNR case). Trellis Coded Modulation A tighter upper bound on the error event probability is given by (exercise) (Pf] !!h._ -1 p セ@ _!_efrc d free e 4N0 T(D)I _ -tNo e 2 4N D-t . 0 (7.28) From (7.28), an asymptotic estimate on the error event probability can be obtained by considering only the error events with free Euclidean Distance P, セ@ セ@ N(dfr<,) efrc (Jセ@ J (7.29) The bit error probability can be upper bounded simply by weighting the pairwise error probabilities by the number of incorrect input bits associated with each error vector and then dividing the result by m. Therefore, -1 p < 1 aT(D,[) I mo b- m aJ I= l,D= t (7.30) Where T(D, I) is the augmented generating function of the modified State Diagram. The concept of the Modified State Diagram was introduced in the chapter on Convolutional Codes (Section 6.5). A tighter upper bound can also be obtained for the bit error probability, and is given by p < _l_n+..c (Jd]re, Je 1;: aT(D,l) (7.31) e- 2m r:;J'' 4N 0 ai 1 l=l,D=e 4 No From (7.31), we observe that the upper bound on the bit error probability strongly depends ondfne . In the next section, we will learn some methods for estimating dfree • 7.7 COMPUTATION OF drree We have seen that the Euclidean Free Distance, dfree• is the singlemost importaat parameter for determining how good a TCM scheme is for AWGN channels. It defines the asymptotic coding gain of the scheme. In chapter 6 (Section 6.5) we saw that the generating function can be used to calculate the Hamming Free Distance dfrer The transfer function of the error state diagram, T(D), includes information about the distance of all the paths in the trellis from the all zero path. If I(D) is obtained in a closed form, the value of dfree follows immediately from the expansion of the function in a power series. The generating function can be written as d2 d2 T (D) = N (dfree) D free + N (dnext) D next + ... (7.32) where d!xt is the second smallest squared Euclidean Distance. Hence the smallest exponent of Din the series expansion is エゥーN・セ@ However, in most cases, a closed form expression for T(D) may not be available, and one has to resort to numerical techniques. .I
  • 118. Information Theory, Coding and Cryptography Consider the function ¢ (D)= ln [ T(dJ)] 1 T(D) (7.33) ¢I(D) decreases monotonically to the limit dfo 2 as D ---7 0 Therefore we h b d d2 .d ee · ave an upper oun on free proVI ed D>0. In order to obtain a lower bound on d}ee consider the following function ¢2(D) = ln T(D) ln D (7.34) Taking logarithm on both sides of (7.32) we get, d}ee ln D = ln T(D) -ln N (d. ) -ln [1+ N(dfree) d、セ Q@ -dL .. ·] }Tee N(dnext ) ,.•• (7.35) If we take D ---7 0, provided D > 0, from (7.34) and (7.35) we obtain ln T(D) 2 ln D = dfree- e(D) (7.36) where,. e (D) is a function that is greater than zero, and tends to zero monotonically as D ---7 0 Thus, If we take smaller and smaller values of ¢1(D) and ¢2(D), we can obtain val th . extremely close to dfree- ues at are It should be kept in mind that even though d2 ·s th · 1 · - d . free I e smg e most Important parameter to ・エ・イュゥセ・@ the quality of a TCM scheme, two other parameters are also influential: (I) The error coefficient N (d )· A £ t f · · · d . . free · ac or o two mcrease m this error coefficient re uces the codmg gam by approximately 0.2 dB for error rates of 10-6. (ii) The next distance d · · th d all · th . next · IS e secon sm est Euchdean Distance between two pa セ@ formm? an. error event. If dnext is very close to dfrw the SNR requirement for goo approximation of the upper bound on Pe may be very large. So fax:, we ィ。カセ@ セッ」オウウ・、@ primarily on AWGN channels. We found that the best design ウエイ。エセァケ@ Is to maximiZe the free Euclidean Distance, dfret' for the code. In the next section we 」セョウゥ、・イ@ the design rules for TCM over fading channels. Just to remind the readers fading cf =·els セ・@ frequen.tly encountered in radio and mobile communications. One 」ッュュセョ@ cause 0 ュセ@ IS the ュオセエゥー。エィ@ nature of the propagation medium. In this case, the signal arrives at セ・@ ZセZセ・イ@ :om.different ー。セウ@ (with time varying nature) and gets added together. Depending si al セセN@ e signals from セiセ・イ・セエ@ paths add up in phase, or out of phase, the net received Zーャゥセセ・@ I(bitsl a ranthdomhvanld)ation m amplitude and phase. The drops in the received signal e ow a res o are called fades. 7.8 TCM FOR FADING CHANNELS In this section we will co ·d th rl (MPSK) over a Fadin nsi er e pe ormance of trellis coded M-ary Phase Shift Keying g Channel. We know that a TCM encoder takes in an input bit stream and Trellis Coded Modulation outputs a sequence of symbols. In this treatment we will assume that each of these symbols si belong to the MPSK signal set. By using complex notation, each symbol can be represented by a point in the complex plane. The coded signals are interleaved in order to spread the burst of errors caused by the slowly varying fading process. These interleaved symbols are then pulse- shaped for no inter-symbol interference and finally translated to RF frequencies for transmission over the channel. The channel corrupts these transmitted symbols by adding a fading gain (which is a negative gain, or a positive loss, depending on one's outlook) and AWGN. At the receiver end, the received sequences are demodulated and quantized for soft decision decoding. In many implementations, the channel estimator provides an estimate of the channel gain, which is also termed as the channel state information. 
Thus we can represent the received signal at time i as (7.37) where ni is a sample of the zero mean Gaussian noise process with variance N012 and gi is the complex channel gain, which is also a sample of a complex Gaussian process with variance 」イセ@ The complex channel gain can be explicitly written using the phasor notation as follows gi= ai ・ゥセゥL@ (7.38) where ai and ¢i are the amplitude and phase processes respectively. We now make the following assumptions: (i) The receiver performs coherent detection, (ii) The interleaving is ideal, which implies that the fading amplitudes are statistically independent and the channel can be treated as memoryless. Thus, we can write (7.39) We kr1ow that for a channel with a diffused multipath and no direct path the fading amplitude is Rayleigh distributed with Probability Density Function (pdf) (7.40) For the case when there exists a direct path in addition to the multipath, Rician Fading is observed. The pdf of the Rician Fading Amplitude is given by PA (a)= 2a(1 + K)e- (K +a 2 (k +I)10( 2aJK(l + K)), (7.41) where / 0 (.) is the zero-order, modified Bessel Function of the first kind and K is the Rician Parameter defined as follows. Definition 7.8 The Rician Parameter K is defined as the ratio of the energy of the direct 」ッューッョセョエ@ to the energy of the diffused multipath component. For the extreme case of K = 0, the pdf of the Rician distribution becomes the same as the pdf of the Rayleigh Distribution.
  • 119. Information Theory, .Coding and Cryptography We now look at the performance of the TCM scheme over a fading channel. Let r1= (r1, r2, ...,rei) be the received signal. The maximum likely decoder, which is usually implemented by the Viterbi decoder, chooses the coded sequence that most likely corresponds to the received signals. This is achieved by computing a metric between the sequence of received signals, rtl and the possible transmitted signals, St As we have seen earlier, this metric is related to the conditional channel probabilities m(r1, s1) =In p(r11s& (7.42) If the channel state information is being used, the metric becomes m(r[, St; aI) =In p(rzlsz, aI) (7.43L Under the assumption of ideal interleaving, the channel is memoryless and hence the metrics can be expressed as the following summations l m(rz, s1 ) =lin p(r1is1) (7.44) i=l and l m(rz, hz; al) = llnp(rzisz,at) (7.45) i=l First, we consider the scenario where the channel state information is known, i.e., a; = a;. The metric can be written as m(r;, s;; a;) =- ir;- a; s;i2 Therefore, the pairwise error probability is given by P2(Sz, Sz) = Eal [P2(sb sii az)], where (7.46) (7.47) (7.48) and E is the statistical expectation operator. Using the Chernoff Bound, the pairwise error probability can be upper bounded as follows. A l l+K [ kTセッャウ@ .. -s..l2] P2(sz, Sz) セii@ 1 exp - MMMMMZ[MセMMM i=Il+K + 4No Is; -i.-1 l+K 4No lsi -iil2 For high SNR, the above equation simplifies to A (1 + K)e- K P2(sz, sz) セイイMMMM]MQMMM iETJ 4N is;- s;i2 0 (7.49) (7.50) where 11 is.the set of all ifor which S; 7= si. Let us denote the number of elements in 17 by セ@ , then we can wnte Trellis Coded Modulation (7.51) where d; HセI@ セ@ 111S; - s;l2 (7.52) iETJ is the squared product distance of the signals s; 7= s; .The term セ@ is called the effective length of the error event (sz, 7= i 1). A union bound on the error event probability Pehas already been discussed before. For the high SNR case, the upper bound on Pecan be expressed as 2 ((1 + K)e- K)lry p・セ@ I I 。{セL、ー@ (lTJ)] try セ、セHセI@ HTセセI@ d;(l") (7.53) where a [l11 , 、セ@ HセI}ゥウ@ the average number of code sequences having the effective length lTI and the squared product distance dl (セIN@ The error event probability is actually dominated by the smallest effective length セ@ and the smallest product distance 、セ@ (セIN@ Let us denote the smallest effective length セ@ by L and the corresponding product distance by 、セ@ (セIN@ The error event probability can then be asymptotically approximated by ((1 + K)e-K)L Pe z a (L, d; (L)) L · (TセッI@ d; (L) (7.54) We make the following observations from (7.54) (i) The error event probability asymptotically varies with the Ltb power of SNR. This is similar to what is achieved with a time diversity technique. Hence, Lis also called the time diversity of the TCM scheme. (ii) The important TCM design parameters for fading Channel are the time diversity, L, and the product distance dJ(L). This is in contrast to the free Euclidean Distance parameter for AWGN channel. (iii) TCM codes designed for AWGN channels would normally fare poorly in fading channels and vice versa. (iv) For large values of the Rician parameter, K, the effect of the free Euclidean Distance on the performance of the TCM scheme becomes dominant. (v) At low SNR, again, the free Euclidean Distance becomes important for the performance of the TCM scheme. Thus the basic design rules for TCMs for fading channels, at high SNR and for small values of K, are ,
  • 120. ' I rl Information Theory, Coding and Cryptography (i) maximize the effective length, L, of the code, and (ii) minimize the minimum product distance dj (L). Consider a TCM scheme with effective length, L, and the minimum product distance ih(L). Suppose the code is redesigned to yield a minimum product distance, ih(L) with the same L. The increa,se in the coding gain due the increase in the minimum product distance is given by 10 、セ@ (L)a1 L1 = SNR1 - SNR;.iP P ]MャッァセMM g el- セ@ L 、セ@ (L)a2 ' (7.55) where ai, i= 1, 2, is the average number of code sequences with effective length L for the TCM scheme i. We observe that for a fixed value of L, increasing the minimum product distance corresponding to a smaller value of L is more effective in improving the performance of the code. So far, we have assumed that the channel state information was available. A similar analysis as carried out for the case where channel state information was available can also be done when the information about the channel is unavailable. In the absence of channel state information, the metric can be expressed as m (r;, sj; aj) = -lri- sil2 . (7.56) After some mathematical manipulations, it is shown that A (2e/"')" {LセiウLM .i,l2r R (s1 s1 ) < (1 + K)lr, e-t 11 K 2 ' - (l!No)lr, dj(ZTI) (7.57) Using arguments discussed earlier in this section, the error event probability Pecan be determined for this case when the channel state information is not available. 7.9 CONCLUDING REMARKS Coding and modulation were first analyzed together as a single entity by Massey in 1974. Prior to that time, in all coded digital communications systems, the encoder/decoder and the modulator/demodulator were designed and optimized separately. Massey's idea of combined coding and modulation was concretized in the seminal paper by Ungerboeck in 1982. Similar ideas were also proposed earlier by Imai and Hirakawa in 1977, but did not get due attention. The primary advantage ofTCM was its ability to achieve increased power efficiency without the customary increase in the bandwidth introduced by the coding process. In the following years the theory of TCM was formalized by different researchers. Calderbank and Mazo showed that the asymmetric one-dimensional TCM schemes provide more coding gain than symmetric TCM schemes. Rotationally invariant TCM schemes were proposed by Wei in 1984, which were subsequently adopted by CCITT for use in the new high speed voiceband modems. Trellis Coded Modulation SUMMARY • The Trellis Coded Modulation (TCM) Technique allows us to achieve a better performance without bandwidth expansion or using extra power. • The minimum Euclidean Distance between any two paths in the trellis is called the free Euclidean Distance, セ・・@ of the TCM scheme. • The difference between the values of the SNR for the coded and uncoded schemes required to achieve the same error probability is known as the coding gain, g= SNRiuncoded - SNRicoded' At high SNR, the coding gain can be expressed as g, = giSNR--?= = 10 log (dfee IEs)coded , where g"" represents the Asymptotic Coding Gain and Es is the average (dfree / Es)uncoded signal energy. • The mapping by Set Partitioning is based on successive partitioning of the expanded 2m+1-ary signal set into subsets with increasing minimum Euclidean Distance. Each time we partition the set, we reduce the number of the signal points in the subset, but increase the minimum distance between the signal points in the subset. 
• Ungerboeck's TCM design rules (based on heuristics) for AWGN channels are
Rule 1: Parallel transitions, if present, must be associated with the signals of the subsets in the lowest layer of the Set Partitioning Tree. These signals have the minimum Euclidean Distance \Delta_{m+1}.
Rule 2: The transitions originating from or merging into one state must be associated with signals of the first step of set partitioning. The Euclidean Distance between these signals is at least \Delta_1.
Rule 3: All signals are used with equal frequency in the Trellis Diagram.
• The Viterbi Algorithm can be used to decode the received symbols for a TCM scheme. The branch metric used in the decoding algorithm is the Euclidean Distance between the received signal and the signal associated with the corresponding branch in the trellis.
• The average number of nearest neighbours at free distance, N(d_free), gives the average number of paths in the trellis with free Euclidean Distance d_free from a transmitted sequence. This number is used in conjunction with d_free for the evaluation of the error event probability.
• The probability of error P_e \le T(D)|_{D = e^{-1/(4N_0)}}, where T(D) = \mathbf{1}^T \mathbf{G}\, \mathbf{1} is the scalar transfer function and the matrix \mathbf{G} = \sum_{l=1}^{\infty} \sum_{\mathbf{e}_l \ne \mathbf{0}} \prod_{n=1}^{l} \mathbf{G}(e_n). A tighter upper bound on the error event probability is given by

P_e \le \frac{1}{2}\, \mathrm{erfc}\left( \sqrt{\frac{d_{free}^2}{4N_0}} \right) e^{d_{free}^2/(4N_0)}\; T(D)\big|_{D = e^{-1/(4N_0)}}
• 121. Information Theory, Coding and Cryptography

• For fading channels, P_2(s_l, \hat{s}_l) \le \frac{\left((1+K)e^{-K}\right)^{l_\eta}}{\left(\frac{1}{4N_0}\right)^{l_\eta} d_p^2(l_\eta)}, where d_p^2(l_\eta) = \prod_{i \in \eta} |s_i - \hat{s}_i|^2. The term l_\eta is the effective length of the error event (s_l, \hat{s}_l) and K is the Rician parameter. Thus, the error event probability is dominated by the smallest effective length l_\eta and the smallest product distance d_p^2(l_\eta).
• The design rules for TCMs for fading channels, at high SNR and for small values of K, are (i) maximize the effective length, L, of the code, and (ii) maximize the minimum product distance d_p^2(L).
• The increase in the coding gain due to an increase in the minimum product distance is given by \Delta g = SNR_1 - SNR_2 |_{P_{e1} = P_{e2}} = \frac{10}{L} \log \frac{d_{p2}^2(L)\, a_1}{d_{p1}^2(L)\, a_2}, where a_i, i = 1, 2, is the average number of code sequences with effective length L for the TCM scheme i.

PROBLEMS

7.1 Consider a rate 2/3 Convolutional Code defined by
G(D) = [ 1      D      D + D^2
         D^2    1 + D  1 + D + D^2 ]
This code is used with an 8-PSK signal set that uses Gray Coding (the three bits per symbol are assigned such that the codes for two adjacent symbols differ only in 1 bit location). The throughput of this TCM scheme is 2 bits/sec/Hz.
(a) How many states are there in the Trellis Diagram for this encoder?
(b) Find the free Euclidean Distance.
(c) Find the Asymptotic Coding Gain with respect to uncoded QPSK, which has a throughput of 2 bits/sec/Hz.

7.2 In Problem 7.1, suppose instead of Gray Coding, natural mapping is performed, i.e., S_0 -> 000, S_1 -> 001, ..., S_7 -> 111.
(a) Find the free Euclidean Distance.
(b) Find the Asymptotic Coding Gain with respect to uncoded QPSK (2 bits/sec/Hz).

7.3 Consider the TCM encoder shown in Fig. 7.19.
Fig. 7.19 Figure for Problem 7.3.
(a) Draw the State Diagram for this encoder.
(b) Draw the Trellis Diagram for this encoder.
(c) Find the free Euclidean Distance, d_free. In the Trellis Diagram, show one pair of paths which result in d_free. What is N(d_free)?
(d) Next, use set partitioning to assign the symbols of 8-PSK to the branches of the Trellis Diagram. What is the d_free now?
(e) Encode the following bit stream using this encoder: 1 0 0 1 0 0 0 1 0 1 0 ... Give your answer for both the natural mapping and the mapping using Set Partitioning.
(f) Compare the asymptotic coding gains for the two different kinds of mapping.

7.4 We want to design a TCM scheme that has a 2/3 convolutional encoder followed by a signal mapper. The mapping is done based on set partitioning of the Asymmetric Constellation Diagram shown below. The trellis is a four-state, fully connected trellis.
(a) Perform Set Partitioning for the following Asymmetric Constellation Diagram.
(b) What is the free Euclidean Distance, d_free, for this asymmetric TCM scheme? Compare it with the d_free for the case when we use the standard 8-PSK Signal Constellation.
(c) How will you choose the value of the angle θ for improving the performance of the TCM scheme using the Asymmetric Signal Constellation shown in Fig. 7.20?
Fig. 7.20 Figure for Problem 7.4.
• 122. Information Theory, Coding and Cryptography

7.5 Consider the rate 3/4 encoder shown in Fig. 7.21. The four output bits from the encoder are mapped onto one of the sixteen possible symbols from the Constellation Diagram shown below. Use Ungerboeck's design rules to design a TCM scheme for an AWGN channel. What is the asymptotic coding gain with respect to uncoded 8-PSK?
Fig. 7.21 Figure for Problem 7.5.

7.6 Consider the expression for the pairwise error probability over a Rician Fading Channel. Comment.
(b) Show that for low SNR the original inequality may be expressed as
P_2(s_l, \hat{s}_l) \le \exp\left[ -\frac{d_E^2(s_l, \hat{s}_l)}{4N_0} \right]

7.7 Consider a TCM scheme designed for a Rician Fading Channel with an effective length L and the minimum product distance d_p^2(L). Suppose we wish to redesign this code to obtain an improvement of 3 dB in SNR.
(a) Compute the desired effective length L if the d_p^2(L) is kept unchanged.
(b) Compute the desired product distance d_p^2(L) if the effective length L is kept unchanged.

7.8 Suppose you have to design a TCM scheme for an AWGN channel (SNR = γ). The desired BER is P_e. Draw a flowchart as to how you will go about designing such a scheme.
(a) How many states will there be in your trellis?
(b) How will you design the convolutional encoder?
(c) Would you have parallel paths in your design?
(d) What kind of modulation scheme will you choose and why?
(e) How will you assign the symbols of the modulation scheme to the branches?

7.9 For Viterbi decoding the metric used is of the form m(r_l, s_l) = \ln p(r_l | s_l).
(a) What is the logic behind choosing such a metric?
(b) Suggest another metric that will be suitable for fading channels. Give reasons for your answer.

7.10 A TCM scheme designed for a Rician Fading Channel (K = 3) and a high SNR environment (SNR = 20 dB) has L = 5 and d_p^2(L) = 2.34 E_s^L. It has to be redesigned to produce an improvement of 2 dB.
(a) What is the d_p^2(L) of the new code?
(b) Comment on the new d_free.

7.11 Consider the TCM scheme shown in Fig. 7.22 consisting of a rate 1/2 convolutional encoder coupled with a mapper.
(a) Draw the Trellis Diagram for this encoder.
(b) Determine the scalar transfer function, T(D).
(c) Determine the augmented generating function, T(D, L, I).
(d) What is the minimum Hamming Distance (d_free) of this code?
(e) How many paths are there with this d_free?
Fig. 7.22 Figure for Problem 7.11.

7.12 Consider the pairwise error probability P_2(s_l, \hat{s}_l).
(a) For a maximum likelihood decoder, prove that
P_2(s_l, \hat{s}_l) = \int f(\mathbf{r})\, p_{R|S}(\mathbf{r} | s_l)\, d\mathbf{r}
where \mathbf{r} is the received vector, p_{R|S}(\mathbf{r} | s_l) is the channel transition probability density function, and f(\mathbf{r}) = 1 if p_{R|S}(\mathbf{r} | \hat{s}_l) \ge p_{R|S}(\mathbf{r} | s_l), and f(\mathbf{r}) = 0 otherwise.
• 123. Information Theory, Coding and Cryptography

(b) Show that
f(\mathbf{r}) \le \sqrt{ \frac{p_{R|S}(\mathbf{r} | \hat{s}_l)}{p_{R|S}(\mathbf{r} | s_l)} }
and hence
P_2(s_l, \hat{s}_l) \le \int \sqrt{ p_{R|S}(\mathbf{r} | \hat{s}_l)\, p_{R|S}(\mathbf{r} | s_l) }\, d\mathbf{r}

COMPUTER PROBLEMS

7.13 Write a computer program to perform trellis coded modulation, given the trellis structure and the mapping rule. The program should take in an input bit stream and output a sequence of symbols. The input to the program may be taken as two matrices, one that gives the connectivity between the states of the trellis (essentially the structure of the trellis) and the second, which gives the branch labels.
7.14 Write a computer program to calculate the squared free Euclidean distance d_free^2, the effective length L, and the minimum product distance, d_p^2(L), of a TCM scheme, given the Trellis Diagram and the labels on the branches.
7.15 Write a computer program that performs Viterbi decoding on an input stream of symbols. This program makes use of a given trellis and the labels on the branches of the Trellis Diagram.
7.16 Verify the performance of the different TCM schemes given in this chapter in an AWGN environment. To do so, take a long chain of random bits and input it to the TCM encoder. The encoder will produce a sequence of symbols (analog waveforms). Corrupt these symbols with AWGN of different noise power, i.e., simulate scenarios with different SNRs. Use Viterbi decoding to decode the received sequence of corrupted symbols (distorted waveforms). Generate a plot of the BER versus the SNR and compare it with the theoretically predicted error rates.
7.17 Write a program to observe the effect of decoding window size for the Viterbi decoder. Generate a plot of the error rate versus the window size. Also plot the number of computations versus the window size.
7.18 Write a computer program that performs exhaustive search in order to determine a rate 2/3 TCM encoder which is designed for AWGN (maximize d_free). Assume that there are four states in the Trellis Diagram and it is a fully connected trellis. The branches of this trellis are labelled using the symbols from an 8-PSK signal set. Modify the program to perform exhaustive search for a good TCM scheme with a four-state trellis with the possibility of parallel branches.
7.19 Write a computer program that performs exhaustive search in order to determine a rate 2/3 TCM encoder which is designed for a fading channel (maximize d_p^2(L)). Assume that there are four states in the trellis diagram and it is a fully connected trellis. The branches of this trellis are labelled using the symbols from an 8-PSK signal set. List out the d_p^2(L) and L of some of the better codes found during the search.
7.20 Draw the family of curves depicting the relation between P_e and L_eff for different values of K (Rician parameter) for (a) high SNR, (b) low SNR. Comment on the plots.
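As a starting point for the computer problems above, the following is a minimal Python sketch (our own illustration, not part of the text) that evaluates, for a single error event, the fading-channel quantities of Section 7.8: the effective length l_eta, the squared product distance d_p^2(l_eta) of (7.52), and the high-SNR pairwise error probability bound of (7.51). The example symbol sequences, the Rician parameter and the noise level are illustrative assumptions only.

import numpy as np

def error_event_parameters(s, s_hat):
    """Effective length l_eta and squared product distance d_p^2 (eqs. 7.51-7.52)
    for an error event between two complex symbol sequences."""
    s, s_hat = np.asarray(s), np.asarray(s_hat)
    diff_sq = np.abs(s - s_hat) ** 2          # |s_i - s_hat_i|^2 for each symbol
    eta = diff_sq > 1e-12                     # positions where the sequences differ
    l_eta = int(np.sum(eta))                  # effective length
    d_p2 = float(np.prod(diff_sq[eta]))       # squared product distance
    return l_eta, d_p2

def pep_high_snr(l_eta, d_p2, K, N0):
    """High-SNR pairwise error probability bound (7.51) for a Rician channel."""
    return ((1 + K) * np.exp(-K)) ** l_eta / ((1.0 / (4 * N0)) ** l_eta * d_p2)

# Illustrative unit-energy 8-PSK sequences; only two positions differ.
psk = np.exp(1j * 2 * np.pi * np.arange(8) / 8)
s, s_hat = psk[[0, 2, 4, 6]], psk[[0, 3, 4, 7]]

l_eta, d_p2 = error_event_parameters(s, s_hat)
print("effective length:", l_eta, " squared product distance:", round(d_p2, 3))
print("PEP bound:", pep_high_snr(l_eta, d_p2, K=3, N0=0.05))

The same two helper functions can be reused inside an exhaustive code search of the kind asked for in Problems 7.18 and 7.19.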
  • 124. 8 Cryptography ff, 8.1 INTRODUCTION TO CRYPTOGRAPHY Cryptography is the science of devising methods that allow information to be sent in a secure form· in such a way that the only person able to retrieve this information is the intended recipient. Encryption is based on algorithms that scramble information into unreadable or non- discernible form. Decryption is the process of restoring the scrambled information to its original form (see Fig. 8.1). A Cryptosystem is a collection of algorithms and associated procedures for hiding and revealing (un-hiding!) information. Cryptanalysis is the process (actually, the art) of analyzing a cryptosystem, either to verify its integrity or to break it for ulterior motives. An attacker is a person or system that performs unauthorised cryptanalysis in order to break a cryptosystem. Attackers are also referred to as hackers, interlopers or eavesdroppers. The process of attacking a cryptosystem is often called cracking. The job of the cryptanalyst is to find the weaknesses in the cryptosystem. In many cases, the developers of a cryptosystem announce a public challenge with a large prize-money for anyone
  • 125. Information Theory, Coding and Cryptography who can crack the scheme. Once a cryptosystem is broken (and the cryptanalyst discloses his techniques), the designers of the scheme try to strengthen the algorithm. Just because a cryptosystem has been broken does not render it useless. The hackers may have broken the system under optimal conditions using equipment (fast computers, dedicated microprocessors, etc.) that is usually not available to common people. Some cryptosystems are rated in terms of the length of time and the price of the computing equipment it would take to break them! In the last few decades, cryptographic algorithms, being mathematical in nature, have become so advanced that they can only be handled by computers. This, in effect, means that the uncoded message (prior to encryption) is binary in form, and can therefore be anything; a picture, a voice, a text such as an e-mail or even a video. Fig. 8.1 The Process of Encryption and Decryption. Cryptography is not merely used for military and diplomatic communications as many people tend to believe. In reality, cryptography has many commercial uses and applications. From protecting confidential company information, to protecting a telephone call, to allowing someone to order a product on the Internet without the fear of their credit card number being intercepted and misused, cryptography is all about increasing the level of privacy of individuals and groups. For example, cryptography is often used to prevent forgers from counterfeiting winning lottery tickets. Each lottery ticket can have two numbers printed onto it, one plaintext and one the corresponding cipher. Unless the counterfeiter has cryptanalyzed the lottery's cryptosystem he or she will not be able to print an acceptable forgery. The chapter is organized as follows. We begin with an overview of different encryption techniques. We will, then, study the concept of secret-key cryptography. Some specific secret- key cryptographic techniques will be discussed in detail. The public-key cryptography will be introduced next. Two popular public-key cryptographic techniques, the RSA algorithm and PGP, will be discussed in detail. A flavour of some other cryptographic techniques in use today will also be given. The chapter will conclude with a discussion on cryptanalysis and the politics of cryptography. 8.2 AN OVERVIEW OF ENCRYPTION TECHNIQUES The goal of a cryptographic system is to provide a high level of confidentiality, integrity, non- repudiability and authenticity to information that is exchanged over networks. Cryptography Confidentiality of messages and stored data is protected by hiding information using encryption techniques. Message integrity ensures that a message remains unchanged from the time it is created to the time it is opened by the recipient. Non-repudiation can provide a way of proving that the message came from someone even if they try to deny it. Authentication provides two services. First, it establishes beyond doubt the origin of the message. Second, it verifies the identity of a user logging into a system and continues to verify their identity in case someone tries to break into the system. Definition 8.1 A message being sent is known as plaintext The message is code<J. using a Cryptographic Algorithm. This process is called encryption. An encrypted message is known as ciphertext, and is turned back into plaintext by the process.of decryption. 
It must be assumed that any eavesdropper has access to all communication between the sender and the recipient. A method of encryption is only secure if, even with this complete access, the eavesdropper is still unable to recover the original plaintext from the ciphertext. There is a big difference between security and obscurity. Suppose a message is left for somebody in an airport locker, and the details of the airport and the locker number are known only to the intended recipient; then this message is not secure, merely obscure. If, however, all potential eavesdroppers know the exact location of the locker, and they still cannot open the locker and access the message, then this message is secure.

Definition 8.2 A key is a value that causes a Cryptographic Algorithm to run in a specific manner and produce a specific ciphertext as an output. The key size is usually measured in bits. The bigger the key size, the more secure will be the algorithm.

Example 8.1 Suppose we have to encrypt and send the following stream of binary data (which might be originating from voice, video, text or any other source)
0 1 1 0 0 0 1 0 1 0 0 1 1 1 1 1 ...
We can use a 4-bit long key, x = 1011, to encrypt this bit stream. To perform encryption, the plaintext (binary bit stream) is first subdivided into blocks of 4 bits:
0110 0010 1001 1111 ...
Each sub-block is XORed (binary addition) with the key, x = 1011. The encrypted message will be
1101 1001 0010 0100 ...
The recipient must also possess the knowledge of the key in order to decrypt the message. The decryption is fairly simple in this case. The ciphertext (the received binary bit stream) is first subdivided into blocks of 4 bits. Each sub-block is XORed with the key, x = 1011. The decrypted message will be the original plaintext
• 126. Information Theory, Coding and Cryptography

0110 0010 1001 1111 ...
It should be noted that just one key is used both for encryption and decryption.

Example 8.2 Let us devise an algorithm for text messages, which we shall call character + x. Let x = 5. In this encryption technique, we replace every alphabet by the fifth one following it, i.e., A becomes F, B becomes G, C becomes H, and so on. The recipients of the encrypted message just need to know the value of the key, x, in order to decipher the message.

The key must be kept separate from the encrypted message being sent. Because there is just one key which is used for encryption and decryption, this kind of technique is called Symmetric Cryptography or Single Key Cryptography or Secret Key Cryptography. The problem with this technique is that the key has to be kept confidential. Also, the key must be changed from time to time to ensure secrecy of transmission. This means that the secret key (or the set of keys) has to be communicated to the recipient. This might be done physically. To get around this problem of communicating the key, the concept of Public Key Cryptography was developed by Diffie and Hellman. This technique is also called Asymmetric Encryption. The concept is simple. There are two keys, one is held privately and the other one is made public. What one key can lock, the other key can unlock.

Example 8.3 Suppose we want to send an encrypted message to recipient A using the public key encryption technique. To do so we will use the public key of recipient A and use it to encrypt the message. When the message is received, recipient A decrypts it with his private key. Only the private key of recipient A can decrypt a message that has been encrypted with his public key. Similarly, recipient B can only decrypt a message that has been encrypted with his public key. Thus, no private key ever needs to be communicated and hence one does not have to trust any communication channel to convey the keys.

Let us consider another scenario. Suppose we want to send somebody a message and also provide a proof that the message is actually from us (a lot of harm can be done by providing bogus information, or rather, misinformation!). In order to keep a message private and also provide authentication (that it is indeed from us), we can perform a special encryption on the plaintext with our private key, then encrypt it again with the public key of the recipient. The recipient uses his private key to open the message and then uses our public key to verify the authenticity. This technique is said to use Digital Signatures.

There is another important encryption technique called the One-way Function. It is a non-reversible, quick encryption method. The encryption is easy and fast, but the decryption is not. Suppose we send a document to recipient A and want to check at a later time whether the document has been tampered with. We can do so by running a one-way function, which produces a fixed-length value called a hash (also called the message digest). The hash is the unique signature of the document that can be sent along with the document. Recipient A can run the same one-way function to check whether the document has been altered.

The actual mathematical function used to encrypt and decrypt messages is called a Cryptographic Algorithm or cipher. This is only a part of the system used to send and receive secure messages. This will become clearer as we discuss specific systems in detail.
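As a small illustration of Examples 8.1 and 8.2, here is a minimal Python sketch (our own, not part of the original text) of the two toy symmetric ciphers just described: the 4-bit XOR cipher and the character + x shift cipher. The function names are our own choices; real systems use far stronger algorithms, as the following sections explain.

def xor_cipher(bits, key):
    # Example 8.1: XOR each block of len(key) bits with the key.
    # The same function both encrypts and decrypts, since XOR is its own inverse.
    out = []
    for i in range(0, len(bits), len(key)):
        block = bits[i:i + len(key)]
        out.append(''.join(str(int(b) ^ int(k)) for b, k in zip(block, key)))
    return ''.join(out)

def character_plus_x(text, x):
    # Example 8.2: replace every letter by the x-th one following it (wrapping around).
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            result.append(chr((ord(ch) - base + x) % 26 + base))
        else:
            result.append(ch)
    return ''.join(result)

plain = "0110001010011111"
cipher = xor_cipher(plain, "1011")
print(cipher)                        # 1101100100100100, as in Example 8.1
print(xor_cipher(cipher, "1011"))    # XORing again with the key recovers the plaintext
print(character_plus_x("HELLO", 5))  # MJQQT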
As with most historical ciphers, the security of the message being sent relies on the algorithm itself remaining secret. This technique is known as a Restricted Algorithm. It has the following fundamental drawbacks. . (i) The algorithm obviously has to be restricted to only those people that you want to be able to decode your message. Therefore a new algorithm must be invented for every discrete group of users. (ii) A large or changing group of users cannot utilise them, as every time one user leaves the group, everyone must change the algorithm. (iii) If the algorithm is compromised in any way, a new algorithm must be implemented. Because of these drawbacks, Restricted Algorithms are no longer popular and have given way to key-based algorithms. Practically all modem cryptographic systems make use of a key. Algorithms that use a key allow all details of the algorithm to be widely available. This is because all of the security lies in the key. With a key-based algorithm the plaintext is encrypted and decrypted by the algorithm which uses a certain key, and the resulting ciphertext is dependent on the key, and not the algorithm. This means that an eavesdropper can have a complete copy of the algorithm in use, but without the specific key used to encrypt that message, it is useless. 8.3 OPERATIONS USED BY ENCRYPTION ALGORITHMS Although the methods of encryption/decryption have changed dramatically since the advent of computers, there are still only two basic operations that can be carried out on a piece of plaintext: substitution and transposition. The only real difference is that, earlier these were carried out with the alphabet, nowadays they are carried out on binary bits. Substitution Substitution operations replace bits in the plaintext with other bits decided upon by the algorithm, to produce ciphertext. This substitution then just has to be reversed to produce plaintext from ciphertext. This can be made increasingly complicated. For instance one plaintext character could correspond to one of a number of ciphertext characters (homophonic substitution), or each character of plaintext is substituted by a character of corresponding position in a length of another text (running cipher). .Example8.4 Julius Caesar was one ofthe first to use substitution encryption to sendmessages to troops during the war. The substitution methodhe invented advances eachcharacterthree spacesin the alphabet. Thus,
• 127. Information Theory, Coding and Cryptography

THIS IS SUBSTITUTION CIPHER      (8.1)
WKLV LV VXEVWLWXWLRQ FLSKHU      (8.2)

Transposition
Transposition (or permutation) does not alter any of the bits in plaintext, but instead moves their positions around within it. If the resultant ciphertext is then put through more transpositions, the end result is increasing security.

XOR
XOR is an exclusive-or operation. It is a Boolean operator whose result is true if exactly one of the two bits is true, and false if both are true or both are false. For example,
0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0      (8.3)
A surprising amount of commercial software uses simple XOR functions to provide security, including the USA digital cellular telephone network and many office applications, and it is trivial to crack. However, the XOR operation, as will be seen later in this chapter, is a vital part of many advanced Cryptographic Algorithms when performed between long blocks of bits that also undergo substitution and/or transposition.

8.4 SYMMETRIC (SECRET KEY) CRYPTOGRAPHY

Symmetric Algorithms (or Single Key Algorithms or Secret Key Algorithms) have one key that is used both to encrypt and decrypt the message, hence the name. In order for the recipient to decrypt the message they need to have an identical copy of the key. This presents one major problem, the distribution of the keys. Unless the recipient can meet the sender in person and obtain a key, the key itself must be transmitted to the recipient, and is thus susceptible to eavesdropping. However, single key algorithms are fast and efficient, especially if large volumes of data need to be processed.
In Symmetric Cryptography, the two parties that exchange messages use the same algorithm. Only the key is changed from time to time. The same plaintext with a different key results in a different ciphertext. The encryption algorithm is available to the public, hence it should be strong and well-tested. The more powerful the algorithm, the less likely that an attacker will be able to decrypt the resulting cipher.
The size of the key is critical in producing strong ciphertext. The US National Security Agency, NSA, stated in the mid-1990s that a 40-bit length was acceptable to them (i.e., they
As mentioned earlier, a major difficulty with symmetric schemes is that the secret key has to be possessed by both parties, and hence has to be transmitted from whoever 」セ・セエ・ウ@ it to セ・@ other party. Moreover, if the key is compromised, all of the message エセ。ョウュゥウウキョ@ セ・」オョエケ@ measures are undermined. The steps taken to provide a secure mechanism for creating and passing on the secret key are referred to as Key Management. The technique does not adequately address the non-repudiation requirement, 「・セ。オウ・N@ both parties have the same secret key. Hence each is exposed to the risk of fraudulent ヲ。ャセiヲゥ」。エゥセョ@ of a message by the other, and a claim by either party not to have sent a message IS credible, because the other may have compromised the key. There are two types of Symmetric Algorithms-Block Ciphers and Stream Ciphers. Definition 8.3 Block Ciphen usually operate on groups of bits called blocks. Each block is processed a multiple number of times. In each round the セ・ケ@ is applied ゥセ@ a unique manner. The more the number of iterations, the longer IS the encryption process, but results in a more secure ciphertext Definition 8.4 Stream Ciphen operate on plaintext one bit at a time. Plaintext is streamed as raw bits through the encryption algorithm. While a block cipher will produce the same ciphertext from the same plaintext using the same key, a stream cipher will not. The ciphertext produced by a stream cipher will vary under the same conditions. How long should a key be? There is no single answer to this アオ・ウエゥセョN@ It 、・セ・ョ、ウ@ on the specific situation. To determine how much security one needs, the followmg questions must be answered: (i) What is the worth of the data to be protected? -,
• 128. Information Theory, Coding and Cryptography

(ii) How long does it need to be secure?
(iii) What are the resources available to the cryptanalyst/hacker?

A customer list might be worth Rs 1000, advertisement data might be worth Rs 50,000 and the master key for a digital cash system might be worth millions. In the world of stock markets, the secrets have to be kept for a couple of minutes. In the newspaper business, today's secret is tomorrow's headlines. The census data of a country have to be kept secret for months (if not years). Corporate trade secrets are interesting to rival companies and military secrets are interesting to rival militaries. Thus, the security requirements can be specified in these terms. For example, one may require that the key length must be such that there is a probability of 0.0001% that a hacker with the resources of Rs 1 million could break the system in 1 year, assuming that the technology advances at a rate of 25% per annum over that period. The minimum key requirements for different applications are listed in Table 8.1. This table should be used as a guideline only.

Table 8.1 Minimum key requirements for different applications

Type of information             Lifetime          Minimum key length
Tactical military information   Minutes/hours     56-64 bits
Product announcements           Days/weeks        64 bits
Interest rates                  Days/weeks        64 bits
Trade secrets                   Decades           112 bits
Nuclear bomb secrets            > 50 years        128 bits
Identities of spies             > 50 years        128 bits
Personal affairs                > 60 years        > 128 bits
Diplomatic embarrassments       > 70 years        > 128 bits

Future computing power is difficult to estimate. A rule of thumb is that the efficiency of computing equipment divided by price doubles every 18 months, and increases by a factor of 10 every five years. Thus, in 50 years the fastest computer will be 10 billion times faster than today's! These numbers refer to general-purpose computers. We cannot predict what kind of specialized crypto-system breaking computers might be developed in the years to come.
Two symmetric algorithms, both block ciphers, will be discussed in this chapter. These are the Data Encryption Standard (DES) and the International Data Encryption Algorithm (IDEA).

8.5 DATA ENCRYPTION STANDARD (DES)

DES, an acronym for the Data Encryption Standard, is the name of the Federal Information Processing Standard (FIPS) 46-3, which describes the Data Encryption Algorithm (DEA). The DEA is also defined in the ANSI standard X9.32. Created by IBM, DES came about due to a public request by the US National Bureau of Standards (NBS) requesting proposals for a Standard Cryptographic Algorithm that satisfied the following criteria:
(i) Provides a high level of security
(ii) The security depends on keys, not the secrecy of the algorithm
(iii) The security is capable of being evaluated
(iv) The algorithm is completely specified and easy to understand
(v) It is efficient to use and adaptable
(vi) Must be available to all users
(vii) Must be exportable
DEA is essentially an improvement of the 'Algorithm Lucifer' developed by IBM in the early 1970s. The US National Bureau of Standards published the Data Encryption Standard in 1975. While the algorithm was basically designed by IBM, the NSA and NBS (now NIST) played a substantial role in the final stages of the development. The DES has been extensively studied since its publication and is the best known and the most widely used Symmetric Algorithm in the world.
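Before examining the internals of DES in the following subsections, a short usage sketch may help fix ideas. The Python fragment below is our own illustration and assumes the third-party PyCryptodome package is installed (this library choice is not prescribed by the text); it simply shows one 64-bit block being encrypted and decrypted under a 64-bit key (of which 56 bits are effective).

# Minimal DES usage sketch (assumes PyCryptodome: pip install pycryptodome)
from Crypto.Cipher import DES

key = b"8bytekey"          # 64-bit key; 8 parity bits are ignored, 56 bits are effective
plaintext = b"ABCDEFGH"    # DES operates on 64-bit (8-byte) blocks

cipher = DES.new(key, DES.MODE_ECB)   # single-block (ECB) mode, for illustration only
ciphertext = cipher.encrypt(plaintext)
print(ciphertext.hex())

decipher = DES.new(key, DES.MODE_ECB)
print(decipher.decrypt(ciphertext))   # b'ABCDEFGH'

Note that ECB mode and single DES are both considered insecure today, as the discussion of the security of DES later in this section explains; the sketch is only meant to show the block size and key size in action.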
The DEA has a 64-bit block size and uses a 56-bit key during execution (8 parity bits are stripped off from the full 64-bit key). The DEA is a Symmetric Cryptosystem, specifically a 16- round Feistel Cipher and was originally designed for implementation in hardware. When used for communication, both sender and receiver must know the same secret key, which can be used to encrypt and decrypt the message, or to generate and verify a Message Authentication Code (MAC). The DEA can also be used for single-user encryption, such as to store files on a hard disk in encrypted form. In a multi-user environment, secure key distribution may be difficult; public-key cryptography provides an ideal solution to this problem. NIST re-certifies DES (FIPS 46_:1, 46-2, 46-3) every five years. FIPS 46-3 reaffirms DES usage as of October 1999, but single DES is permitted only for legacy systems. FIPS 46-3 includes a definition of triple-DES (TDEA, corresponding to X9.52). Within a few years, DES and triple- DES will be replaced with the Advanced Encryption Standard. DES has now been in world-wide use for over 20 years, and due to the fact that it is a defined standard means that any system implementing DES can communicate with any other system using it. DES is used in banks and businesses all over the world, as well as in networks (as Kerberos) and to protect the password file on UNIX Operating Systems (as CRYPT). DES Encryption DES is a symmetric, block-cipher algorithm with a key length of 64 bits, and a block size of 64 bits (i.e. the algorithm operates on successive 64 bit blocks of plaintext). Being symmetric, the same key is used for encryption and decryption, and DES also uses the same algorithm for encryption and decryption. First a transposition is carried out according to a set table (the initial permutation), the 64-bit plaintext block is then split into two 32-bit blocks, and 16 identical operations called rounds are carried out on each half. The two halves are then joined back together, and the reverse of the
• 129. Information Theory, Coding and Cryptography

initial permutation carried out. The purpose of the first transposition is not clear, as it does not affect the security of the algorithm, but is thought to be for the purpose of allowing plaintext and ciphertext to be loaded into 8-bit chips in byte-sized pieces. In any round, only one half of the original 64-bit block is operated on. The rounds alternate between the two halves. One round in DES consists of the following.

Key Transformation
The 64-bit key is reduced to 56 by removing every eighth bit (these are sometimes used for error checking). Sixteen different 48-bit subkeys are then created, one for each round. This is achieved by splitting the 56-bit key into two halves, and then circularly shifting them left by 1 or 2 bits, depending on the round. After this, 48 of the bits are selected. Because they are shifted, different groups of key bits are used in each subkey. This process is called a compression permutation due to the transposition of the bits and the reduction of the overall size.

Expansion Permutation
After the key transformation, whichever half of the block is being operated on undergoes an expansion permutation. In this operation, the expansion and transposition are achieved simultaneously by allowing the 1st and 4th bits in each 4-bit block to appear twice in the output, i.e., the 4th input bit becomes the 5th and 7th output bits (see Fig. 8.2). The expansion permutation achieves 3 things: Firstly, it increases the size of the half-block from 32 bits to 48, the same number of bits as in the compressed key subset, which is important as the next operation is to XOR the two together. Secondly, it produces a longer string of data for the substitution operation that subsequently compresses it. Thirdly, and most importantly, because in the subsequent substitutions the 1st and 4th bits appear in two S-boxes (described shortly), they affect two substitutions. The effect of this is that the dependency of the output bits on the input bits increases rapidly, and so, therefore, does the security of the algorithm.
Fig. 8.2 The Expansion Permutation.

XOR
The resulting 48-bit block is then XORed with the appropriate subset key for that round.

Substitution
The next operation is to perform substitutions on the expanded block. There are eight substitution boxes, called S-boxes. The first S-box operates on the first 6 bits of the 48-bit expanded block, the 2nd S-box on the next six, and so on. Each S-box operates from a table of 4 rows and 16 columns; each entry in the table is a 4-bit number. The 6-bit number the S-box takes as input is used to look up the appropriate entry in the table in the following way. The 1st and 6th bits are combined to form a 2-bit number corresponding to a row number, and the 2nd to 5th bits are combined to form a 4-bit number corresponding to a particular column. The net result of the substitution phase is eight 4-bit blocks that are then combined into a 32-bit block. It is the non-linear relationship of the S-boxes that really provides DES with its security; all the other processes within the DES algorithm are linear, and as such relatively easy to analyze.
Fig. 8.3 The S-box Substitution (48-bit input, 32-bit output).
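The row/column indexing just described is easy to express in code. Below is a minimal Python sketch (our own illustration, using a made-up table rather than the actual DES S-box entries) showing how a 6-bit input selects a 4-bit output: the 1st and 6th bits form the row, the 2nd to 5th bits form the column.

def sbox_lookup(six_bits, table):
    # Map a 6-bit string to a 4-bit string using a 4 x 16 S-box table,
    # exactly as described above: outer bits -> row, inner bits -> column.
    row = int(six_bits[0] + six_bits[5], 2)   # 1st and 6th bits
    col = int(six_bits[1:5], 2)               # 2nd to 5th bits
    return format(table[row][col], '04b')     # table entries are 4-bit numbers

# Hypothetical S-box (NOT the real DES S1): 4 rows x 16 columns of 4-bit values.
toy_sbox = [[(3 * r + 5 * c) % 16 for c in range(16)] for r in range(4)]

print(sbox_lookup("011011", toy_sbox))   # '011011' -> row 01 (1), column 1101 (13)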
Permutation
The 32-bit output of the substitution phase then undergoes a straightforward transposition using a table sometimes known as the P-box.
After all the rounds have been completed, the two 'half-blocks' of 32 bits are recombined to form a 64-bit output, the final permutation is performed on it, and the resulting 64-bit block is the DES encrypted ciphertext of the input plaintext block.

DES Decryption
Decrypting DES is very easy (if one has the correct key!). Thanks to its design, the decryption algorithm is identical to the encryption algorithm; the only alteration that is made is that to decrypt DES ciphertext, the subsets of the key used in each round are used in reverse, i.e., the 16th subset is used first.

Security of DES
Unfortunately, with advances in the field of cryptanalysis and the huge increase in available computing power, DES is no longer considered to be very secure. There are algorithms that can be used to reduce the number of keys that need to be checked, but even using a straightforward brute-force attack and just trying every single possible key, there are computers that can crack DES in a matter of minutes. It is rumoured that the US National Security Agency (NSA) can crack a DES encrypted message in 3-15 minutes. If a time limit of 2 hours to crack a DES encrypted file is set, then you have to check all possible keys (2^56) in two hours, which is roughly 5 trillion keys per second. While this may
• 130. Information Theory, Coding and Cryptography

seem like a huge number, consider that a $10 Application-Specific Integrated Circuit (ASIC) chip can test 200 million keys per second, and many of these can be paralleled together. It is suggested that a $10 million investment in ASICs would allow a computer to be built that would be capable of breaking a DES encrypted message in 6 minutes.
DES can no longer be considered a sufficiently secure algorithm. If a DES-encrypted message can be broken in minutes by supercomputers today, then the rapidly increasing power of computers means that it will be a trivial matter to break DES encryption in the future (when a message encrypted today may still need to be secure). An extension of DES called DESX is considered to be virtually immune to an exhaustive key search.

8.6 INTERNATIONAL DATA ENCRYPTION ALGORITHM (IDEA)

IDEA was created in its first form by Xuejia Lai and James Massey in 1990, and was called the Proposed Encryption Standard (PES). In 1991, Lai and Massey strengthened the algorithm against differential cryptanalysis and called the result Improved PES (IPES). The name of IPES was changed to International Data Encryption Algorithm (IDEA) in 1992. IDEA is perhaps best known for its implementation in PGP (Pretty Good Privacy).

The Algorithm
IDEA is a symmetric, block-cipher algorithm with a key length of 128 bits, a block size of 64 bits, and as with DES, the same algorithm provides encryption and decryption.
IDEA consists of 8 rounds using 52 subkeys. Each round uses six subkeys, with the remaining four being used for the output transformation. The subkeys are created as follows.
Firstly, the 128-bit key is divided into eight 16-bit keys to provide the first eight subkeys. The bits of the original key are then shifted 25 bits to the left, and then it is again split into eight subkeys. This shifting and then splitting is repeated until all 52 subkeys (SK1-SK52) have been created.
The 64-bit plaintext block is first split into four blocks (B1-B4). A round then consists of the following steps (OB stands for output block):
OB1 = B1 * SK1 (multiply 1st sub-block with 1st subkey)
OB2 = B2 + SK2 (add 2nd sub-block to 2nd subkey)
OB3 = B3 + SK3 (add 3rd sub-block to 3rd subkey)
OB4 = B4 * SK4 (multiply 4th sub-block with 4th subkey)
OB5 = OB1 XOR OB3 (XOR results of steps 1 and 3)
OB6 = OB2 XOR OB4
OB7 = OB5 * SK5 (multiply result of step 5 with 5th subkey)
OB8 = OB6 + OB7 (add results of steps 5 and 7)
OB9 = OB8 * SK6 (multiply result of step 8 with 6th subkey)
Being a fairly new algorithm, it is possible a better attack than brute-force will be found, which, when coupled with much more powerful machines in the future may be able to crack a message. However, for a long way into the future, IDEA seems to be an extremely secure cipher. 8.7 RC CIPHERS The RC ciphers were designed by Ron Rivest for the RSA Data Security. RC stands for Ron's Code or Rivest Cipher. RC2 was designed as a quick-fix replacement for DES that is more secure. It is a block cipher with a variable key size that has a propriety algorithm. RC2 is a variable-key- length cipher. However, when using the Microsoft Base Cryptographic Provider, the key length is hard-coded to 40 bits. When using the Microsoft Enhanced Cryptographic Provider, the key length is 128 bits by default and can be in the range of 40 to 128 bits in 8-bit increments. RC4 was developed by Ron Rivest in 1987. It is a variable-key-size stream cipher. The details of the algorithm have not been officially published. The algorithm is extremely easy to describe and program. Just like RC2, 40-bit RC4 is supported by the Microsoft Base Cryptographic provider, and the Enhanced provider allows keys in the range of 40 to 128 bits in 8-bit increments. RC5 is a block cipher designed for speed. The block size, key size and the number of iterations are all variables. In particular, the key size can be as large as 2,048 bits. l
• 131. Information Theory, Coding and Cryptography

All the encryption techniques discussed so far belong to the class of symmetric cryptography (DES, IDEA and RC Ciphers). We now look at the class of Asymmetric Cryptographic Techniques.

8.8 ASYMMETRIC (PUBLIC-KEY) ALGORITHMS

Public-key Algorithms are asymmetric, that is to say the key that is used to encrypt the message is different from the key used to decrypt the message. The encryption key, known as the public key, is used to encrypt a message, but the message can only be decoded by the person that has the decryption key, known as the private key.
This type of algorithm has a number of advantages over traditional symmetric ciphers. It means that the recipient can make their public key widely available; anyone wanting to send them a message uses the algorithm and the recipient's public key to do so. An eavesdropper may have both the algorithm and the public key, but will still not be able to decrypt the message. Only the recipient, with their private key, can decrypt the message.
A disadvantage of public-key algorithms is that they are more computationally intensive than symmetric algorithms, and therefore encryption and decryption take longer. This may not be significant for a short text message, but certainly is for long messages or audio/video.
The Public-Key Cryptography Standards (PKCS) are specifications produced by RSA Laboratories in cooperation with secure systems developers worldwide for the purpose of accelerating the deployment of public-key cryptography. First published in 1991 as a result of meetings with a small group of early adopters of public-key technology, the PKCS documents have become widely referenced and implemented. Contributions from the PKCS series have become part of many formal and de facto standards, including ANSI X9 documents, PKIX, SET, S/MIME, and SSL.
The next two sections describe two popular public-key algorithms, the RSA Algorithm and the Pretty Good Privacy (PGP) Hybrid Algorithm.

8.9 THE RSA ALGORITHM

RSA, named after its three creators, Rivest, Shamir and Adleman, was the first effective public-key algorithm, and for years has withstood intense scrutiny by cryptanalysts all over the world. Unlike symmetric key algorithms, where, as long as one presumes that an algorithm is not flawed, the security relies on having to try all possible keys, public-key algorithms rely on it being computationally unfeasible to recover the private key from the public key.
RSA relies on the fact that it is easy to multiply two large prime numbers together, but extremely hard (i.e. time consuming) to factor them back from the result. Factoring a number means finding its prime factors, which are the prime numbers that need to be multiplied together in order to produce that number. For example,
10 = 2 x 5
60 = 2 x 2 x 3 x 5
2^113 - 1 = 3391 x 23279 x 65993 x 1868569 x 1066818132868207

The algorithm
Two very large prime numbers, normally of equal length, are randomly chosen and then multiplied together:
N = A x B      (8.4)
T = (A - 1) x (B - 1)      (8.5)
A third number is then also chosen randomly as the public key (E) such that it has no common factors (i.e. is relatively prime) with T. The private key (D) is then
D = E^-1 mod T      (8.6)
To encrypt a block of plaintext (M) into ciphertext (C):
C = M^E mod N      (8.7)
To decrypt:
M = C^D mod N      (8.8)
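The arithmetic above is easy to try out for small numbers. The following Python sketch is our own illustration (real RSA uses primes hundreds of digits long and padding schemes not discussed here); it builds a toy key pair and encrypts and decrypts a single number, using the same small values as the worked example that follows.

from math import gcd

def rsa_toy_keys(A, B, E):
    # Build a toy RSA key pair from two primes A, B and a public exponent E
    # that is relatively prime to T = (A-1)(B-1).
    N = A * B
    T = (A - 1) * (B - 1)
    assert gcd(E, T) == 1, "E must share no factors with T"
    D = pow(E, -1, T)          # modular inverse, D = E^-1 mod T (Python 3.8+)
    return (E, N), (D, N)      # (public key), (private key)

def rsa_encrypt(M, public_key):
    E, N = public_key
    return pow(M, E, N)        # C = M^E mod N

def rsa_decrypt(C, private_key):
    D, N = private_key
    return pow(C, D, N)        # M = C^D mod N

public, private = rsa_toy_keys(A=37, B=23, E=5)
C = rsa_encrypt(7, public)            # encrypt the number 7 (the letter 'G')
print(C, rsa_decrypt(C, private))     # 638 7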
Example 8.5 Consider the following implementation of the RSA algorithm.
1st prime (A) = 37
2nd prime (B) = 23
So, N = 37 x 23 = 851
T = (37 - 1) x (23 - 1) = 36 x 22 = 792
E must have no factors other than 1 in common with 792. E (public key) could be 5.
D (private key) = 5^-1 mod 792 = 317
To encrypt a message (M) of the character 'G': if G is represented as 7 (7th letter in the alphabet), then M = 7.
C (ciphertext) = 7^5 mod 851 = 638
To decrypt: M = 638^317 mod 851 = 7.

Security of RSA
The security of the RSA algorithm depends on the ability of the hacker to factorize numbers. New, faster and better methods for factoring numbers are constantly being devised. The current best for long numbers is the Number Field Sieve. Numbers of a length that was unimaginable a mere decade ago are now factored easily. Obviously the longer a number is, the harder it is to factor, and so the better the security of RSA. As theory and computers improve, larger and
• 132. Information Theory, Coding and Cryptography

larger keys will have to be used. The disadvantage in using extremely long keys is the computational overhead involved in encryption/decryption. This will only become a problem if a new factoring technique emerges that requires keys of such lengths to be used that the necessary key length increases much faster than the increasing average speed of computers utilising the RSA algorithm.
In 1997, a specific assessment of the security of 512-bit RSA keys showed that one may be factored for less than $1,000,000 in cost and eight months of effort. It is therefore believed that 512-bit keys provide insufficient security for anything other than short-term needs. RSA Laboratories currently recommend key sizes of 768 bits for personal use, 1024 bits for corporate use, and 2048 bits for extremely valuable keys like the root-key pair used by a certifying authority. Security can be increased by changing a user's keys regularly and it is typical for a user's key to expire after two years (the opportunity to change keys also allows for a longer length key to be chosen).
Even without using huge keys, RSA is about 1000 times slower to encrypt/decrypt than DES. This has resulted in it not being widely used as a stand-alone cryptography system. However, it is used in many hybrid cryptosystems such as PGP. The basic principle of hybrid systems is to encrypt plaintext with a Symmetric Algorithm (usually DES or IDEA); the symmetric algorithm's key is then itself encrypted with a public-key algorithm such as RSA. The RSA-encrypted key and the symmetric-algorithm-encrypted message are then sent to the recipient, who uses his private RSA key to decrypt the Symmetric Algorithm's key, and then that key to decrypt the message. This is considerably faster than using RSA throughout, and allows a different symmetric key to be used each time, considerably enhancing the security of the Symmetric Algorithm.
RSA's future security relies solely on advances in factoring techniques. Barring an astronomical increase in the efficiency of factoring techniques, or available computing power, the 2048-bit key will ensure very secure protection into the foreseeable future. For instance an Intel Paragon, which can achieve 50,000 mips (million instructions per second), would take a million years to factor a 2048-bit key using current techniques.

8.10 PRETTY GOOD PRIVACY (PGP)

Pretty Good Privacy (PGP) is a hybrid cryptosystem that was created by Phil Zimmerman and released onto the Internet as a freeware program in 1991. PGP is not a new algorithm in its own right, but rather a series of other algorithms that are performed along with a sophisticated protocol. PGP's intended use was for e-mail security, but there is no reason why the basic principles behind it could not be applied to any type of transmission. PGP and its source code is freely available on the Internet. This means that since its creation PGP has been subjected to an enormous amount of scrutiny by cryptanalysts, who have yet to find an exploitable fault in it.
PGP has four main modules: a symmetric cipher (IDEA) for message encryption, a public key algorithm (RSA) to encrypt the IDEA key and hash values, a one-way hash function (MD5) for signing, and a random number generator. The fact that the body of the message is encrypted with a symmetric algorithm (IDEA) means that PGP generated e-mails are a lot faster to encrypt and decrypt than ones using simple RSA.
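The hybrid principle described above (a fresh symmetric session key protects the message body, and only that short key is encrypted with the recipient's public key) can be illustrated in a few lines of Python. This sketch is our own and uses deliberately simplified stand-ins: a repeating XOR keystream in place of IDEA, and the toy RSA routines sketched earlier in place of real RSA.

import secrets

def xor_bytes(data, key):
    # Toy stand-in for the symmetric cipher: XOR with a repeating session key.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def hybrid_encrypt(message, rsa_public, key_bytes=1):
    E, N = rsa_public
    session_key = secrets.token_bytes(key_bytes)                   # fresh key per message
    body = xor_bytes(message, session_key)                         # symmetric part
    wrapped_key = pow(int.from_bytes(session_key, 'big'), E, N)    # public-key part
    return wrapped_key, body

def hybrid_decrypt(wrapped_key, body, rsa_private, key_bytes=1):
    D, N = rsa_private
    session_key = pow(wrapped_key, D, N).to_bytes(key_bytes, 'big')
    return xor_bytes(body, session_key)

# Toy RSA key pair (same small numbers as Example 8.5; far too small for real use).
public, private = (5, 851), (317, 851)
wrapped, body = hybrid_encrypt(b"HI", public)
print(hybrid_decrypt(wrapped, body, private))   # b'HI'

Because a new session key is drawn for every message, cracking one message reveals nothing about earlier or later ones, which is exactly the property claimed for PGP in the next paragraph.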
The key for the IDEA module is randomly generated each time as a one-off session key; this makes PGP very secure, as even if one message was cracked, all previous and subsequent messages would remain secure. This session key is then encrypted with the public key of the recipient using RSA. Given that keys up to 2048 bits long can be used, this is extremely secure. MD5 can be used to produce a hash of the message, which can then be signed by the sender's private key.
Another feature of PGP's security is that the user's private key is encrypted using a hashed pass-phrase rather than simply a password, making the private key extremely resistant to copying even with access to the user's computer.
Generating true random numbers on a computer is notoriously hard. PGP tries to achieve randomness by making use of the keyboard latency when the user is typing. This means that the program measures the gap of time between each key-press. Whilst at first this may seem to be distinctly non-random, it is actually fairly effective: people take longer to hit some keys than others, pause for thought, make mistakes and vary their overall typing speed on all sorts of factors such as knowledge of the subject and tiredness. These measurements are not actually used directly but used to trigger a pseudo-random number generator. There are other ways of generating random numbers, but to be much better than this gets very complex.
PGP uses a very clever, but complex, protocol for key management. Each user generates and distributes their public key. If James is happy that a person's public key belongs to who it claims to belong to, then he can sign that person's public key and James's program will then accept messages from that person as valid. The user can allocate levels of trust to other users. For instance, James may decide that he completely trusts Earl to sign other peoples' keys, in effect saying "his word is good enough for me". This means that if Rachel, who has had her key signed by Earl, wants to communicate with James, she sends James her signed key. James's program recognises Earl's signature, has been told that Earl can be trusted to sign keys, and so accepts Rachel's key as valid. In effect Earl has introduced Rachel to James.
PGP allows many levels of trust to be assigned to people, and this is best illustrated in Fig. 8.4. The explanations are as follows.
1st line James has signed the keys of Earl, Sarah, Jacob and Kate. James completely trusts Earl to sign other peoples' keys, does not trust Sarah at all, and partially trusts Jacob and Kate (he trusts Jacob more than Kate).
• 133. Information Theory, Coding and Cryptography

Fig. 8.4 An Example of a PGP User Web. (Legend: fully trusted; partially trusted; partially trusted to a lesser degree; not validated; key validated directly or by introduction. Level 1: people with keys signed by James. Level 2: people with keys signed by those on level 1. Level 3: people with keys signed by those on level 2.)

2nd line Although James has not signed Sam's key, he still trusts Sam to sign other peoples' keys, maybe on Bob's say so or due to them actually meeting. Because Earl has signed Rachel's key, Rachel is validated (but not trusted to sign keys). Even though Bob's key is signed by Sarah and Jacob, because Sarah is not trusted and Jacob only partially trusted, Bob is not validated. Two partially trusted people, Jacob and Kate, have signed Archie's key, therefore Archie is validated.
3rd line Sam, who is fully trusted, has signed Hal's key, therefore Hal is validated. Louise's key has been signed by Rachel and Bob, neither of whom is trusted, therefore Louise is not validated.
Odd one out Mike's key has not been signed by anyone in James' group; maybe James found it on the Internet and does not know whether it is genuine or not.
PGP never prevents the user from sending or receiving e-mail; it does however warn the user if a key is not validated, and the decision is then up to the user as to whether to heed the warning or not.

Key Revocation
If a user's private key is compromised then they can send out a key revocation certificate. Unfortunately this does not guarantee that everyone with that user's public key will receive it, as keys are often swapped in a disorganised manner. Additionally, if the user no longer has the private key then they cannot issue a certificate, as the key is required to sign it.
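The validation logic illustrated by Fig. 8.4 can be captured in a few lines. The Python sketch below is our own simplification of the rule described above: a key counts as validated if it carries a signature from at least one fully trusted key, or from at least two partially trusted keys. The trust assignments and signatures mirror James's choices in the figure.

# Trust levels James has assigned (from the discussion of Fig. 8.4).
FULL, PARTIAL, NONE = 2, 1, 0
trust = {"James": FULL, "Earl": FULL, "Sam": FULL,
         "Jacob": PARTIAL, "Kate": PARTIAL,
         "Sarah": NONE, "Rachel": NONE, "Bob": NONE}

# Who has signed whose key.
signatures = {
    "Rachel": ["Earl"], "Bob": ["Sarah", "Jacob"], "Archie": ["Jacob", "Kate"],
    "Hal": ["Sam"], "Louise": ["Rachel", "Bob"], "Mike": [],
}

def is_validated(person):
    # Simplified PGP rule: one fully trusted signature, or two partial ones.
    signers = signatures.get(person, [])
    full = sum(trust.get(s, NONE) == FULL for s in signers)
    partial = sum(trust.get(s, NONE) == PARTIAL for s in signers)
    return full >= 1 or partial >= 2

for name in signatures:
    print(name, "validated" if is_validated(name) else "not validated")
# Rachel, Archie and Hal come out validated; Bob, Louise and Mike do not,
# matching the explanations of the 2nd and 3rd lines above.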
These functions are designed in such a way that not only is it very difficult to deduce the message from its hashed version, but also that, even given that all hashes are of a certain length, it is extremely hard to find two messages that hash to the same value. In fact, to find two messages with the same hash from a 128-bit hash function, about 2^64 hashes would have to be tried. In other words, the hash value of a file is a small, unique 'fingerprint'. Even a slight change in an input string should cause the hash value to change drastically: even if 1 bit is flipped in the input string, at least half of the bits in the hash value will flip as a result. This is called an Avalanche Effect. If H = hash value, f = hash function and M = original message (pre-string), then

H = f(M)   (8.9)

If you know M, then H is easy to compute. However, knowing H and f, it is not easy to compute M; indeed it should be computationally infeasible. As long as there is a low risk of collision (i.e., two messages hashing to the same value), and the hash is very hard to reverse, a one-way hash function proves extremely useful for a number of aspects of cryptography.
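As a quick illustration of the avalanche effect, the sketch below (Python, using the standard hashlib module) hashes two made-up messages that differ in a single character and counts how many of the 128 MD5 hash bits change; typically about half of them will differ.

import hashlib

m1 = b"Transfer 100 rupees to account 12345"
m2 = b"Transfer 900 rupees to account 12345"   # one character changed

h1 = hashlib.md5(m1).digest()
h2 = hashlib.md5(m2).digest()

# Count how many of the 128 hash bits differ between the two digests.
differing = sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))
print(hashlib.md5(m1).hexdigest())
print(hashlib.md5(m2).hexdigest())
print(differing, "of 128 bits changed")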
If you one-way hash a message, the result will be a much shorter but still (at least statistically) unique number. This can be used as proof of ownership of a message without having to reveal the contents of the actual message. For instance, rather than keeping a database of copyrighted documents, if just the hash values of each document were stored, then not only would this save a lot of space, but it would also provide a great deal of security. If copyright then needs to be proved, the owner could produce the original document and show that it hashes to the stored value. Hash functions can also be used to prove that no changes have been made to a file, as adding even one character to a file would completely change its hash value.

By far the most common use of hash functions is to digitally sign messages. The sender performs a one-way hash on the plaintext message, encrypts it with his private key and then encrypts both with the recipient's public key and sends them in the usual way. On decrypting the ciphertext, the recipient can use the sender's public key to decrypt the hash value; he can then perform a one-way hash himself on the plaintext message and check this against the one he has received. If the hash values are identical, the recipient knows not only that the message came from the correct sender, since the sender's private key was used to encrypt the hash, but also that the plaintext message is completely authentic, as it hashes to the same value. This method is greatly preferable to encrypting the whole message with a private key, as the hash of a message will normally be considerably smaller than the message itself. This means that it will not significantly slow down the decryption process in the way that decrypting the entire message with the sender's public key, and then decrypting it again with the recipient's private key, would. The PGP system uses the MD5 hash function for precisely this purpose.

The Microsoft Cryptographic Providers support three hash algorithms: MD4, MD5 and SHA. Both MD4 and MD5 were invented by Ron Rivest. MD stands for Message Digest. Both algorithms produce 128-bit hash values; MD5 is an improved version of MD4. SHA stands for Secure Hash Algorithm. It was designed by NIST and NSA. SHA produces 160-bit hash values, longer than those of MD4 and MD5. SHA is generally considered more secure than the other algorithms and is the recommended hash algorithm.

8.12 OTHER TECHNIQUES

One-Time Pads
The one-time pad was invented by Major Joseph Mauborgne and Gilbert Vernam in 1917, and is an unconditionally secure (i.e. unbreakable) algorithm. The theory behind a one-time pad is simple. The pad is a non-repeating random string of letters. Each letter on the pad is used once only to encrypt one corresponding plaintext character. After use, the pad must never be re-used. As long as the pad remains secure, so is the message, because a random key added to a non-random message produces completely random ciphertext, and no amount of analysis or computation can alter that. If both pads are destroyed, the original message can never be recovered. There are two major drawbacks. Firstly, it is extremely hard to generate truly random numbers, and a pad that has even a couple of non-random properties is theoretically breakable.
Secondly, because the pad can never be reused, no matter how large it is, the length of the pad must be the same as the length of the message, which is fine for text but virtually impossible for video.

Steganography
Steganography is not actually a method of encrypting messages, but of hiding them within something else so that they pass undetected. Traditionally this was achieved with invisible ink, microfilm or taking the first letter from each word of a message. It is now achieved by hiding the message within a graphics or sound file. For instance, in a 256-greyscale image, if the least significant bit of each byte is replaced with a bit from the message, the result will be indistinguishable to the human eye. An eavesdropper will not even realise a message is being sent. This is not cryptography, however, and although it would fool a human, a computer would be able to detect this very quickly and reproduce the original message.

Secure Mail and S/MIME
Secure Multipurpose Internet Mail Extensions (S/MIME) is a de facto standard developed by RSA Data Security, Inc., for sending secure mail based on public-key cryptography. MIME is the industry standard format for electronic mail, which defines the structure of the message's body. S/MIME-supporting e-mail applications add digital signatures and encryption capabilities to that format to ensure message integrity, data origin authentication and confidentiality of electronic mail. When a signed message is sent, a detached signature in the PKCS #7 format is sent along with the message as an attachment. The signature attachment contains the hash of the original message signed with the sender's private key, as well as the signer's certificate. S/MIME also supports messages that are first signed with the sender's private key and then enveloped using the recipients' public keys.

8.13 SECURE COMMUNICATION USING CHAOS FUNCTIONS

Chaos functions have also been used for secure communications and cryptographic applications. A chaos function here means an iterative difference equation that exhibits chaotic behaviour. If we observe that cryptography has more to do with unpredictability than with randomness, chaos functions are a good choice because of their property of unpredictability: if a hacker intercepts part of the sequence, he will have no information on how to predict what comes next. The unpredictability of chaos functions makes them a good choice for generating the keys for symmetric cryptography.
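As a rough sketch of how such a recursion can be turned into key material, consider the following Python fragment; the warm-up count, the threshold rule used to extract bits and the starting value are illustrative assumptions rather than a prescription from the text, and Example 8.6 below works through the underlying idea in more detail.

def chaos_key_bits(x0, nbits=64, warmup=100):
    # Iterate the logistic map x_{n+1} = 4*x_n*(1 - x_n); the secrecy of
    # the starting value x0 plays the role of the secret key.
    x = x0
    for _ in range(warmup):            # discard initial iterations so the
        x = 4.0 * x * (1.0 - x)        # output is harder to relate to x0
    bits = []
    for _ in range(nbits):
        x = 4.0 * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)   # one key bit per iteration
    return bits

def xor_scramble(message_bits, key_bits):
    # Single-key (stream-cipher style) scrambling: XOR message with key.
    return [m ^ k for m, k in zip(message_bits, key_bits)]

key = chaos_key_bits(x0=0.3141592653)
print("".join(str(b) for b in key))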
Example 8.6 Consider the difference equation

xn+1 = a xn(1 - xn)   (8.10)

For a = 4, this function behaves like a chaos function, i.e., (i) the values obtained by successive iterations are unpredictable, and (ii) the function is extremely sensitive to the initial condition x0. For any given initial condition, this function will generate values of xn between 0 and 1 for each iteration. These values are good candidates for key generation. In single-key cryptography, a key is used for enciphering the message. This key is usually a pseudo-noise (PN) sequence. The message can simply be XORed with the key in order to scramble it. Since xn takes positive values that are always less than unity, the binary equivalents of these fractions can serve as keys. Thus, one of the ways of generating keys from these random, unpredictable decimal numbers is to use their binary representation directly. The lengths of these binary sequences are limited only by the accuracy of the decimal numbers, and hence very long binary keys can be generated. The recipient must know the initial condition in order to generate the keys for decryption.

For application in single-key cryptography, the following two factors need to be decided: (i) the start value for the iterations (x0), and (ii) the number of decimal places of the mantissa that are to be supported by the calculating machine (to avoid round-off error). For single-key cryptography, the chaos values obtained after some number of iterations are converted to binary fractions whose first 64 bits are taken to generate PN sequences. These initial iterations make it still more difficult for the hacker to guess the initial condition. The starting value should be taken between 0 and 1; a good choice of starting value can improve the performance slightly. The secrecy of the starting number, x0, is the key to the success of this algorithm. Since chaos functions are extremely sensitive even to errors of 10^-30 in the starting number x0, we can have 10^30 unique starting combinations. Therefore, a hacker who knows the chaos function and the encryption algorithm has to try out 10^30 different start combinations. (In the DES algorithm the hacker had to try out approximately 10^19 different key values.) Chaos-based algorithms require a high computational overhead to generate the chaos values as well as high computational speeds; hence, they might not be suitable for bulk data encryption.

8.14 CRYPTANALYSIS

Cryptanalysis is the science (or black art!) of recovering the plaintext of a message from the ciphertext without access to the key. In cryptanalysis, it is always assumed that the cryptanalyst has full access to the algorithm. An attempted cryptanalysis is known as an attack, of which there are five major types:
• Brute-force attack: This technique requires a large amount of computing power and a large amount of time to run. It consists of trying all possibilities in a logical manner until the correct one is found. For the majority of encryption algorithms a brute-force attack is impractical due to the large number of possibilities.
• Ciphertext-only: The only information the cryptanalyst has to work with is the ciphertext of various messages, all encrypted with the same algorithm.
• Known-plaintext: In this scenario, the cryptanalyst has access not only to the ciphertext of various messages, but also to the corresponding plaintext.
• Chosen-plaintext: The cryptanalyst has access to the same information as in a known-plaintext attack, but this time may choose the plaintext that gets encrypted. This attack is more powerful, as specific plaintext blocks can be chosen that may yield more information about the key. An adaptive chosen-plaintext attack is one where the cryptanalyst may repeatedly encrypt plaintext, modifying the input based on the results of a previous encryption.
• Chosen-ciphertext: The cryptanalyst repeatedly chooses ciphertext to be decrypted and has access to the resulting plaintext, from which he tries to deduce the key. A relatively new technique used here is differential cryptanalysis, an interactive and iterative process that works through many rounds, using the results from previous rounds, until the key is identified.

There is only one totally secure algorithm, the one-time pad. All other algorithms can be broken given infinite time and resources. Modern cryptography relies on making it computationally infeasible to break an algorithm. This means that, while it is theoretically possible, the time scale and resources involved make it completely unrealistic. If an algorithm is presumed to be perfect, then the only method of breaking it relies on trying every possible key combination until the resulting ciphertext makes sense. As mentioned above, this type of attack is called a brute-force attack. The field of parallel computing is perfectly suited to the task of brute-force attacks, as every processor can be given a number of possible keys to try, and the processors do not need to interact with each other at all except to announce the result. A technique that is becoming increasingly popular is parallel processing using thousands of individual computers connected to the Internet. This is known as distributed computing.

Many cryptographers believe that brute-force attacks are basically ineffective when long keys are used. An encryption algorithm with a large key (over 100 bits) can take millions of years to crack, even with the powerful, networked computers of today. Besides, adding a single extra bit to the key doubles the cost of performing a brute-force cryptanalysis. Regarding the brute-force attack, there are a couple of other pertinent questions. What if the original plaintext is itself a cipher? In that case, how will the hacker know when he has found the right key? In addition, is the cryptanalyst sitting at the computer and watching the result of each
key that is being tested? Thus, we can assume that a brute-force attack is impossible provided long enough keys are used. Here are some of the techniques that have been used by cryptanalysts to attack ciphertext.
• Differential cryptanalysis: As mentioned before, this technique uses an iterative process to evaluate ciphertext that has been generated using an iterative block algorithm (e.g. DES). Related plaintexts are encrypted using the same key and the differences are analysed. This technique proved successful against DES and some hash functions.
• Linear cryptanalysis: Pairs of plaintext and ciphertext are analysed and a linear approximation technique is used to determine the behaviour of the block cipher. This technique was also used successfully against DES.
• Algebraic attack: This technique exploits mathematical structure in block ciphers. If such structure exists, a single encryption with one key might produce the same result as a double encryption with two different keys, and thus the search time can be reduced.

However strong or weak the algorithm used to encrypt it, a message can be thought of as secure if the time and/or resources needed to recover the plaintext greatly exceed the benefits bestowed by having the contents. This could be because the cost involved is greater than the financial value of the message, or simply because by the time the plaintext is recovered its contents will be outdated.

8.15 POLITICS OF CRYPTOGRAPHY

Widespread use of cryptosystems is something most governments are not particularly happy about, precisely because it threatens to give more privacy to the individual, including criminals. For many years, police forces have been able to tap phone lines and intercept mail; in an encrypted future that may become impossible. This has led to some strange decisions on the part of governments, particularly the United States government. In the United States, cryptography is classified as a munition and the export of programs containing cryptosystems is tightly controlled. In 1992, the Software Publishers Association reached an agreement with the State Department to allow the export of software that contained RSA's RC2 and RC4 encryption algorithms, but only if the key size was limited to 40 bits, as opposed to the 128-bit keys available for use within the US. This significantly reduced the level of privacy provided. In 1993 the US Congress asked the National Research Council to study US cryptographic policy. Its 1996 report, the result of two years' work, offered the following conclusions and recommendations:
• "On balance, the advantages of more widespread use of cryptography outweigh the disadvantages."
• "No law should bar the manufacture, sale or use of any form of encryption within the United States."
• "Export controls on cryptography should be progressively relaxed but not eliminated."

In 1997 the limit on the key size was increased to 56 bits. The US government has proposed several methods whereby it would allow the export of stronger encryption, all based on a system where the US government could gain access to the keys if necessary, for example the Clipper chip. Recently there has been a lot of protest from the cryptographic community against the US government imposing restrictions on the development of cryptographic techniques. The article by Ronald L.
Rivest, Professor, MIT, in the October 1998 issue of Scientific American (pages 116-117), titled "The Case against Regulating Encryption Technology", is an example of such a protest. The resolution of this issue is regarded as one of the most important for the future of e-commerce.

8.16 CONCLUDING REMARKS

In this section we present a brief history of cryptography. People have tried to conceal information in written form since writing was developed. Examples survive in stone inscriptions and papyruses showing that many ancient civilizations, including the Egyptians, Hebrews and Assyrians, developed cryptographic systems. The first recorded use of cryptography for correspondence was by the Spartans, who (as early as 400 BC) employed a cipher device called a scytale to send secret communications between military commanders. The scytale consisted of a tapered baton around which was wrapped a piece of parchment inscribed with the message. Once unwrapped, the parchment appeared to contain an incomprehensible set of letters; however, when wrapped around another baton of identical size the original text appears. The Greeks were therefore the inventors of the first transposition cipher, and in the fourth century BC the earliest treatise on the subject was written by a Greek, Aeneas Tacticus, as part of a work entitled On the Defence of Fortifications. Another Greek, Polybius, later devised a means of encoding letters into pairs of symbols using a device known as the Polybius checkerboard, which contains many elements common to later encryption systems. In addition to the Greeks, there are similar examples of primitive substitution or transposition ciphers in use by other civilizations, including the Romans. The Polybius checkerboard consists of a five by five grid containing all the letters of the alphabet. Each letter is converted into two numbers: the first is the row in which the letter can be found and the second is the column. Hence the letter A becomes 11, the letter B 12, and so forth.

The Arabs were the first people to clearly understand the principles of cryptography. They devised and used both substitution and transposition ciphers and discovered the use of letter frequency distributions in cryptanalysis. As a result of this, by approximately 1412, al-Kalka-shandi could include in his encyclopaedia Subh al-a'sha a respectable, if elementary, treatment of several cryptographic systems. He also gave explicit instructions on how to cryptanalyze ciphertext using letter frequency counts, including examples illustrating the technique.
European cryptography dates from the Middle Ages, during which it was developed by the Papal and Italian city states. The earliest ciphers involved only vowel substitution (leaving the consonants unchanged). Circa 1379 the first European manual on cryptography, consisting of a compilation of ciphers, was produced by Gabriele de Lavinde of Parma, who served Pope Clement VII. This manual contains a set of keys for correspondents and uses symbols for letters and nulls, with several two-character code equivalents for words and names. The first brief code vocabularies, called nomenclators, were expanded gradually and for several centuries were the mainstay of diplomatic communication for nearly all European governments. In 1470 Leon Battista Alberti described the first cipher disk in Trattati in cifra, and the Traicte des chiffres, published in 1586 by Blaise de Vigenere, contained a square table commonly attributed to him as well as descriptions of the first plaintext and ciphertext autokey systems.

By 1860 large codes were in common use for diplomatic communications, and cipher systems had become a rarity for this application. However, cipher systems prevailed for military communications (except for high-command communication, because of the difficulty of protecting codebooks from capture or compromise). During the US Civil War the Federal Army extensively used transposition ciphers. The Confederate Army primarily used the Vigenere cipher and occasional monoalphabetic substitution. While the Union cryptanalysts solved most of the intercepted Confederate ciphers, the Confederacy, in desperation, sometimes published Union ciphers in newspapers, appealing for help from readers in cryptanalysing them.

During the First World War both sides employed cipher systems almost exclusively for tactical communication, while code systems were still used mainly for high-command and diplomatic communication. Although field cipher systems such as the US Signal Corps cipher disk lacked sophistication, some complicated cipher systems were used for high-level communications by the end of the war. The most famous of these was the German ADFGVX fractionation cipher.

In the 1920s the maturing of mechanical and electromechanical technology came together with the needs of telegraphy and radio to bring about a revolution in cryptodevices: the development of rotor cipher machines. The concept of the rotor had been anticipated in the older mechanical cipher disks; however, it was an American, Edward Hebern, who recognised that by hardwiring a monoalphabetic substitution in the connections from the contacts on one side of an electrical rotor to those on the other side, and cascading a collection of such rotors, polyalphabetic substitutions of almost any complexity could be produced. From 1921 and continuing through the next decade, Hebern constructed a series of steadily improving rotor machines that were evaluated by the US Navy. It was undoubtedly this work which led to the United States' superior position in cryptology during the Second World War. At almost the same time as Hebern was inventing the rotor cipher machine in the United States, European engineers such as Hugo Koch (Netherlands) and Arthur Scherbius (Germany) independently discovered the rotor concept and designed the precursors to the most famous cipher machine in history, the German Enigma machine, which was used during World War 2.
These machines were also the stimulus for TYPEX, the cipher machine employed by the British during World War 2. The United States introduced the M-134-C (SIGABA) cipher machine during World War 2. The Japanese cipher machines of World War 2 have an interesting history linking them to both the Hebern and the Enigma machines. After Herbert Yardley, an American cryptographer who organised and directed the US government's first formal code-breaking efforts during and after the First World War, published The American Black Chamber, in which he outlined details of the American successes in cryptanalysing the Japanese ciphers, the Japanese government set out to develop the best cryptomachines possible. With this in mind, it purchased the rotor machines of Hebern and the commercial Enigmas, as well as several other contemporary machines, for study. In 1930 Japan's first rotor machine, code-named RED by US cryptanalysts, was put into service by the Japanese Foreign Office. However, drawing on experience gained from cryptanalysing the ciphers produced by the Hebern rotor machines, the US Army Signal Intelligence Service team of cryptanalysts succeeded in cryptanalysing the RED ciphers. In 1939 the Japanese introduced a new cipher machine, code-named PURPLE by US cryptanalysts, in which the rotors were replaced by telephone stepping switches. The greatest triumphs of cryptanalysis occurred during the Second World War, when the Polish and British cracked the Enigma ciphers and the American cryptanalysts broke the Japanese RED, ORANGE and PURPLE ciphers. These developments played a major role in the Allies' conduct of World War 2.

After World War 2 the electronics that had been developed in support of radar were adapted to cryptomachines. The first electrical cryptomachines were little more than rotor machines in which the rotors had been replaced by electronic substitutions. The only advantage of these electronic rotor machines was their speed of operation, as they were still affected by the inherent weaknesses of the mechanical rotor machines. The era of computers and electronics has meant an unprecedented freedom for cipher designers to use elaborate designs which would be far too prone to error if handled with pencil and paper, or far too expensive to implement in the form of an electromechanical cipher machine. The main thrust of development has been in block ciphers, beginning with the LUCIFER project at IBM, a direct ancestor of the DES (Data Encryption Standard).

There is a place for both symmetric and public-key algorithms in modern cryptography. Hybrid cryptosystems successfully combine aspects of both and seem to be secure and fast. While PGP and its complex protocols are designed with the Internet community in mind, it should be obvious that the encryption behind it is very strong and could be adapted to suit many applications. There may still be instances when a simple algorithm is necessary, and with the security provided by algorithms like IDEA, there is absolutely no reason to think of these as significantly less secure.
An article posted on the Internet on the subject of picking locks stated: "The most effective door-opening tool in any burglar's toolkit remains the crowbar". This also applies to cryptanalysis: direct action is often the most effective. It is all very well transmitting your messages with 128-bit IDEA encryption, but if all that is necessary to obtain that key is to walk up to one of the computers used for encryption with a floppy disk, then the whole point of encryption is negated. In other words, an incredibly strong algorithm is not sufficient; for a system to be effective there must also be effective management protocols. Finally, in the words of Edgar Allan Poe, "Human ingenuity cannot concoct a cipher which human ingenuity cannot resolve."

SUMMARY

• A cryptosystem is a collection of algorithms and associated procedures for hiding and revealing information. Cryptanalysis is the process of analysing a cryptosystem, either to verify its integrity or to break it for ulterior motives. An attacker is a person or system that performs cryptanalysis in order to break a cryptosystem. The process of attacking a cryptosystem is often called cracking. The job of the cryptanalyst is to find the weaknesses in the cryptosystem.
• A message being sent is known as plaintext. The message is coded using a cryptographic algorithm; this process is called encryption. An encrypted message is known as ciphertext, and is turned back into plaintext by the process of decryption.
• A key is a value that causes a cryptographic algorithm to run in a specific manner and produce a specific ciphertext as an output. The key size is usually measured in bits. The bigger the key size, the more secure the algorithm.
• Symmetric algorithms (or single-key algorithms or secret-key algorithms) have one key that is used both to encrypt and decrypt the message, hence their name. In order for the recipient to decrypt the message, they need an identical copy of the key. This presents one major problem: the distribution of the keys.
• Block ciphers usually operate on groups of bits called blocks. Each block is processed a multiple number of times, and in each round the key is applied in a unique manner. The more iterations, the longer the encryption process, but the more secure the ciphertext.
• Stream ciphers operate on plaintext one bit at a time. Plaintext is streamed as raw bits through the encryption algorithm. While a block cipher will produce the same ciphertext from the same plaintext using the same key, a stream cipher will not; the ciphertext produced by a stream cipher will vary under the same conditions.
• To determine how much security one needs, the following questions must be answered:
  1. What is the worth of the data to be protected?
  2. How long does it need to be secure?
  3. What are the resources available to the cryptanalyst/hacker?
• Two symmetric algorithms, both block ciphers, were discussed in this chapter: the Data Encryption Standard (DES) and the International Data Encryption Algorithm (IDEA).
• Public-key algorithms are asymmetric, that is to say the key that is used to encrypt the message is different from the key used to decrypt the message. The encryption key, known as the public key, is used to encrypt a message, but the message can only be decoded by the person who has the decryption key, known as the private key. The Rivest, Shamir and Adleman (RSA) algorithm and Pretty Good Privacy (PGP) are two popular public-key encryption techniques.
• RSA relies on the fact that it is easy to multiply two large prime numbers together, but extremely hard (i.e. time-consuming) to factor the result. Factoring a number means finding its prime factors, the prime numbers that need to be multiplied together in order to produce that number.
• A one-way hash function is a mathematical function that takes a message string of any length (pre-string) and returns a smaller fixed-length string (hash value). These functions are designed in such a way that not only is it very difficult to deduce the message from its hashed version, but also that, even given that all hashes are of a certain length, it is extremely hard to find two messages that hash to the same value.
• Chaos functions can be used for secure communication and cryptographic applications. The chaotic functions are primarily used for generating keys that are essentially unpredictable.
• An attempted unauthorised cryptanalysis is known as an attack, of which there are five major types: brute-force attack, ciphertext-only, known-plaintext, chosen-plaintext and chosen-ciphertext.
• The common techniques used by cryptanalysts to attack ciphertext are differential cryptanalysis, linear cryptanalysis and the algebraic attack.
• Widespread use of cryptosystems is something most governments are not particularly happy about, because it threatens to give more privacy to the individual, including criminals.

"Imagination is more important than knowledge."
Albert Einstein (1879-1955)

PROBLEMS

8.1 We want to test the security of the "character + x" encrypting technique, in which each letter of the plaintext is shifted by n to produce the ciphertext.
(a) How many different attempts must be made to crack this code, assuming a brute-force attack is being used?
(b) Assuming it takes a computer 1 ms to check one value of the shift, how soon can this code be broken into?
8.2 Suppose a group of N people want to use secret-key cryptography. Each pair of people in the group should be able to communicate secretly. How many distinct keys are required?

8.3 Transposition ciphers rearrange the letters of the plaintext without changing the letters themselves. For example, a very simple transposition cipher is the rail fence, in which the plaintext is staggered between two rows and then read off to give the ciphertext. In a two-row rail fence the message MERCHANT TAYLORS' SCHOOL becomes:

M R H N T Y O S C O L
 E C A T A L R S H O

which is read out as: MRHNTYOSCOLECATALRSHO.
(a) If a cryptanalyst wants to break the rail fence cipher, how many distinct attacks must he make, given that the length of the ciphertext is n?
(b) Suggest a decrypting algorithm for the rail fence cipher.

8.4 One of the most famous field ciphers ever was a fractionation system, the ADFGVX cipher, which was employed by the German Army during the First World War. This system was so named because it used a 6 x 6 matrix to substitution-encrypt the 26 letters of the alphabet and 10 digits into pairs of the symbols A, D, F, G, V and X. The resulting biliteral cipher is only an intermediate cipher; it is then written into a rectangular matrix and transposed to produce the final cipher, which is the one that would be transmitted. Here is an example of enciphering the phrase "Merchant Taylors" with this cipher using the key word "Subject".

    A  D  F  G  V  X
A   S  U  B  J  E  C
D   T  A  D  F  G  H
F   I  K  L  M  N  O
G   P  Q  R  V  W  X
V   Y  Z  0  1  2  3
X   4  5  6  7  8  9

Plaintext:  M  E  R  C  H  A  N  T  T  A  Y  L  O  R  S
Ciphertext: FG AV GF AX DX DD FV DA DA DD VA FF FX GF AA

This intermediate ciphertext can then be put in a transposition matrix based on a different key:

C  I  P  H  E  R
1  4  5  3  2  6
F  G  A  V  G  F
A  X  D  X  D  D
F  V  D  A  D  A
D  D  V  A  F  F
F  X  G  F  A  A

The final cipher is therefore: FAFDFGDDFAVXAAFGXVDXADDVGFDAFA.
(a) If a cryptanalyst wants to break the ADFGVX cipher, how many distinct attacks must he make, given that the length of the ciphertext is n?
(b) Suggest a decrypting algorithm for the ADFGVX cipher.

8.5 Consider the knapsack technique for encryption proposed by Ralph Merkle of XEROX and Martin Hellman of Stanford University in 1976. They suggested using the knapsack, or subset-sum, problem as the basis for a public-key cryptosystem. This problem entails determining whether a number can be expressed as a sum of some subset of a given sequence of numbers and, more importantly, which subset has the desired sum. Given a sequence of numbers A, where A = (a1, ..., an), and a number C, the knapsack problem is to find a subset of a1, ..., an which sums to C. Consider the following example:

n = 5, C = 14, A = (1, 10, 5, 22, 3)
Solution: 14 = 1 + 10 + 3

In general, all the possible sums of all subsets can be expressed by

m1 a1 + m2 a2 + m3 a3 + ... + mn an

where each mi is either 0 or 1. The solution above is therefore the binary vector M = (1, 1, 0, 0, 1). There is a total of 2^n such vectors (in this example 2^5 = 32). Obviously, not all values of C can be formed from the sum of a subset, and some can be formed in more than one way. For example, when A = (14, 28, 56, 82, 90, 132, 197, 284, 341, 455, 515), the number 515 can be formed in three different ways but the number 516 cannot be formed in any way.
(a) If a cryptanalyst wants to break this knapsack cipher, how many distinct attacks must he make?
(b) Suggest a decrypting algorithm for the knapsack cipher.
8.6 (a) Use the prime numbers 29 and 61 to generate keys using the RSA algorithm.
(b) Represent the letters 'RSA' in ASCII and encode them using the keys generated above.
(c) Next, generate keys using the pair of primes 37 and 67. Which is more secure, the keys in part (a) or part (c)? Why?

8.7 Write a program that performs encryption using DES.

8.8 Write a program to encode and decode using IDEA. Compare the number of computations required to encrypt a plaintext using the same key size for DES and IDEA.

8.9 Write a general program that can factorize a given number.

8.10 Write a program to encode and decode using the RSA algorithm. Plot the number of floating-point operations required to be performed by the program versus the key size.
  • 140. Information Theory, Coding and Cryptography 8.11 Consider the difference equation Xn+ I= axn(l - Xn) For a = 4, this function behaves like a chaos function. (a) Plot a sequence of 100 values obtained by iterative application of the difference equation. What happens if the starting values Xo = 0.5? (b) Take two initial conditions (i.e., two different starting values, Xor and .xo2 ) which are separated by セクN@ Use the difference equation to iterate each starting point ntimes and obtain the final values y01 and y02, which are separated by セケN@ For a given セクL@ plot L1y versus n. (c) For a given value of n (say n= 500), plot セク@ verus セケN@ (d) Repeat parts (a), (b) and (c) for a= 3.7 and a= 3.9. Compare and comment. (e) Develop a chaos-based encryption program that generates keys for single-key encryption. Use the chaos function xn+1 = 4xn(1-xJ (fj Compare the encryption speed of this chaos-based program with that of IDEA for a key length of 128 bits. (g) Compare the security of this chaos-based algorithm with that of IDEA for the 128 bit long key. Cryptography Index A Mathematical Theory of Communication 41 a scytale 265 AC coefficients 40 Additive White Gaussian Noise (AWGN) 56 Aeneas Tacticus 265 Algebraic attack 264 Asymmetric (Public-Key) Algorithms 254 Asymmetric Encryption 244 attacker 241 Augmented Generating Function 175 authenticity 242 Automatic Repeat Request 97 Avalanche Effect 259 Average Conditional Entropy 15 Average Conditional Self-Information 12 Average Mutual Information 11, 14 average number of nearest neighbours 221 Average Self-Information 11 Bandwidth Efficiency Diagram 60 Binary Entropy Function 12 Binary Golay Code 124 Binary Symmetric Channel 8 Blaise de Vigemere 266 Block Ciphers 247 Block Code 53, 77 Block Codes 53 Block Length 78, 161 Blocklength 161, 168 Brute force attack 263 BSC 13 Burst Error Correction 121 Burst Errors 121 Capacity Boundary 61 catastrophic 185 Catastrophic Convolutional Code 169 Catastrophic Error Propagation 170 Channel 49 Channel Capacity 50 Channel Coding 48, 76 Channel Coding Theorem 53 Channel Decoder 52 Channel Encoder 52 Channel Formatting 52 Channel Models 48 channel state information 229 channel transition probabilities 9