A Student’s Guide to Coding and Information Theory
This easy-to-read guide provides a concise introduction to the engineering background of
modern communication systems, from mobile phones to data compression and storage.
Background mathematics and specific engineering techniques are kept to a minimum,
so that only a basic knowledge of high-school mathematics is needed to understand the
material covered. The authors begin with many practical applications in coding, includ-
ing the repetition code, the Hamming code, and the Huffman code. They then explain
the corresponding information theory, from entropy and mutual information to channel
capacity and the information transmission theorem. Finally, they provide insights into
the connections between coding theory and other fields. Many worked examples are
given throughout the book, using practical applications to illustrate theoretical defini-
tions. Exercises are also included, enabling readers to double-check what they have
learned and gain glimpses into more advanced topics, making this perfect for anyone
who needs a quick introduction to the subject.
stefan m. moser is an Associate Professor in the Department of Electrical Engi-
neering at the National Chiao Tung University (NCTU), Hsinchu, Taiwan, where he has
worked since 2005. He has received many awards for his work and teaching, including
the Best Paper Award for Young Scholars by the IEEE Communications Society and
IT Society (Taipei/Tainan Chapters) in 2009, the NCTU Excellent Teaching Award, and
the NCTU Outstanding Mentoring Award (both in 2007).
po-ning chen is a Professor in the Department of Electrical Engineering at the
National Chiao Tung University (NCTU). Amongst his awards, he has received the
2000 Young Scholar Paper Award from Academia Sinica. He was also selected as
the Outstanding Tutor Teacher of NCTU in 2002, and he received the Distinguished
Teaching Award from the College of Electrical and Computer Engineering in 2003.
A Student’s Guide to Coding and
Information Theory
STEFAN M. MOSER
PO-NING CHEN
National Chiao Tung University (NCTU),
Hsinchu, Taiwan
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Tokyo, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107015838
© Cambridge University Press 2012
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2012
Printed in the United Kingdom at the University Press, Cambridge
A catalog record for this publication is available from the British Library
ISBN 978-1-107-01583-8 Hardback
ISBN 978-1-107-60196-3 Paperback
Additional resources for this publication at www.cambridge.org/moser
Cambridge University Press has no responsibility for the persistence or
accuracy of URLs for external or third-party internet websites referred to
in this publication, and does not guarantee that any content on such
websites is, or will remain, accurate or appropriate.
Contents
List of contributors
Preface
1 Introduction
  1.1 Information theory versus coding theory
  1.2 Model and basic operations of information processing systems
  1.3 Information source
  1.4 Encoding a source alphabet
  1.5 Octal and hexadecimal codes
  1.6 Outline of the book
  References
2 Error-detecting codes
  2.1 Review of modular arithmetic
  2.2 Independent errors – white noise
  2.3 Single parity-check code
  2.4 The ASCII code
  2.5 Simple burst error-detecting code
  2.6 Alphabet plus number codes – weighted codes
  2.7 Trade-off between redundancy and error-detecting capability
  2.8 Further reading
  References
3 Repetition and Hamming codes
  3.1 Arithmetics in the binary field
  3.2 Three-times repetition code
  3.3 Hamming code
    3.3.1 Some historical background
    3.3.2 Encoding and error correction of the (7,4) Hamming code
    3.3.3 Hamming bound: sphere packing
  3.4 Further reading
  References
4 Data compression: efficient coding of a random message
  4.1 A motivating example
  4.2 Prefix-free or instantaneous codes
  4.3 Trees and codes
  4.4 The Kraft Inequality
  4.5 Trees with probabilities
  4.6 Optimal codes: Huffman code
  4.7 Types of codes
  4.8 Some historical background
  4.9 Further reading
  References
5 Entropy and Shannon’s Source Coding Theorem
  5.1 Motivation
  5.2 Uncertainty or entropy
    5.2.1 Definition
    5.2.2 Binary entropy function
    5.2.3 The Information Theory Inequality
    5.2.4 Bounds on the entropy
  5.3 Trees revisited
  5.4 Bounds on the efficiency of codes
    5.4.1 What we cannot do: fundamental limitations of source coding
    5.4.2 What we can do: analysis of the best codes
    5.4.3 Coding Theorem for a Single Random Message
  5.5 Coding of an information source
  5.6 Some historical background
  5.7 Further reading
  5.8 Appendix: Uniqueness of the definition of entropy
  References
6 Mutual information and channel capacity
  6.1 Introduction
  6.2 The channel
  6.3 The channel relationships
  6.4 The binary symmetric channel
  6.5 System entropies
  6.6 Mutual information
  6.7 Definition of channel capacity
  6.8 Capacity of the binary symmetric channel
  6.9 Uniformly dispersive channel
  6.10 Characterization of the capacity-achieving input distribution
  6.11 Shannon’s Channel Coding Theorem
  6.12 Some historical background
  6.13 Further reading
  References
7 Approaching the Shannon limit by turbo coding
  7.1 Information Transmission Theorem
  7.2 The Gaussian channel
  7.3 Transmission at a rate below capacity
  7.4 Transmission at a rate above capacity
  7.5 Turbo coding: an introduction
  7.6 Further reading
  7.7 Appendix: Why we assume uniform and independent data at the encoder
  7.8 Appendix: Definition of concavity
  References
8 Other aspects of coding theory
  8.1 Hamming code and projective geometry
  8.2 Coding and game theory
  8.3 Further reading
  References
References
Index
Contributors
Po-Ning Chen (Chapter 7)
Francis Lu (Chapters 3 and 8)
Stefan M. Moser (Chapters 4 and 5)
Chung-Hsuan Wang (Chapters 1 and 2)
Jwo-Yuh Wu (Chapter 6)
Preface
Most of the books on coding and information theory are prepared for those
who already have good background knowledge in probability and random pro-
cesses. It is therefore hard to find a ready-to-use textbook in these two subjects
suitable for engineering students at the freshman level, or for non-engineering
majors who are interested in knowing, at least conceptually, how in-
formation is encoded and decoded in practice and the theories behind it. Since
communications has become a part of modern life, such knowledge is more
and more of practical significance. For this reason, when our school requested
us to offer a preliminary course in coding and information theory for students
who do not have any engineering background, we saw this as an opportunity
and initiated the plan to write a textbook.
In preparing this material, we hope that, in addition to the aforementioned
purpose, the book can also serve as a beginner’s guide that inspires and at-
tracts students to enter this interesting area. The material covered in this book
has been carefully selected to keep the amount of background mathematics
and electrical engineering to a minimum. At most, simple calculus plus a lit-
tle probability theory are used here, and anything beyond that is developed
as needed. Its first version has been used as a textbook in the 2009 summer
freshman course Conversion Between Information and Codes: A Historical
View at National Chiao Tung University, Taiwan. The course was attended by
47 students, including 12 from departments other than electrical engineering.
Encouraged by the positive feedback from the students, the book went into
a round of revision that took many of the students’ comments into account.
A preliminary version of this revision was again the basis of the correspond-
ing 2010 summer freshman course, which this time was attended by 51 stu-
dents from ten different departments. Specific credit must be given to Professor
Chung-Hsuan Wang, who volunteered to teach these 2009 and 2010 courses
and whose input considerably improved the first version, to Ms. Hui-Ting
Chang (a graduate student in our institute), who has redrawn all the figures
and brought them into shape, and to Pei-Yu Shih (a post-doc in our institute)
and Ting-Yi Wu (a second-year Ph.D. student in our institute), who checked
the readability and feasibility of all exercises. The authors also gratefully ac-
knowledge the support from our department, which continues to promote this
course.
Among the eight chapters in this book, Chapters 1 to 4 discuss coding tech-
niques (including error-detecting and error-correcting codes), followed by a
briefing in information theory in Chapters 5 and 6. By adopting this arrange-
ment, students can build up some background knowledge on coding through
concrete examples before plunging into information theory. Chapter 7 con-
cludes the quest on information theory by introducing the Information Trans-
mission Theorem. It attempts to explain the practical meaning of the so-called
Shannon limit in communications, and reviews the historical breakthrough of
turbo coding, which, after 50 years of research efforts, finally managed to ap-
proach this limit. The final chapter takes a few glances at unexpected relations
between coding theory and other fields. This chapter is less important for an
understanding of the basic principles, and is more an attempt to broaden the
view on coding and information theory.
In summary, Chapter 1 gives an overview of this book, including the system
model, some basic operations of information processing, and illustrations of
how an information source is encoded.
Chapter 2 looks at ways of encoding source symbols such that any errors,
up to a given level, can be detected at the receiver end. Basics of modular
arithmetic that will be used in the analysis of the error-detecting capability are
also included and discussed.
Chapter 3 introduces the fundamental concepts of error-correcting codes us-
ing the three-times repetition code and the Hamming code as starting exam-
ples. The error-detecting and -correcting capabilities of general linear block
codes are also discussed.
Chapter 4 looks at data compression. It shows how source codes represent
the output of an information source efficiently. The chapter uses Professor
James L. Massey’s beautifully simple and elegant approach based on trees.
By this means it is possible to prove all main results in an intuitive fashion that
relies on graphical explanations and requires no abstract math.
Chapter 5 presents a basic introduction to information theory and its main
quantity entropy, and then demonstrates its relation to the source coding of
Chapter 4. Since the basic definition of entropy and some of its properties
are rather dry mathematical derivations, some time is spent on motivating the
definitions. The proofs of the fundamental source coding results are then again
based on trees and are therefore scarcely abstract in spite of their theoretical
importance.
Chapter 6 addresses how to convey information reliably over a noisy com-
munication channel. The mutual information between channel input and output
is defined and then used to quantify the maximal amount of information that
can get through a channel (the so-called channel capacity). The issue of how
to achieve channel capacity via proper selection of the input is also discussed.
Chapter 7 begins with the introduction of the Information Transmission
Theorem over communication channels corrupted by additive white Gaussian
noise. The optimal error rate that has been proven to be attainable by Claude
E. Shannon (baptized the Shannon limit) is then addressed, particularly for the
situation when the amount of transmitted information is above the channel ca-
pacity. The chapter ends with a simple illustration of turbo coding, which is
considered the first practical design approaching the Shannon limit.
Chapter 8 describes two particularly interesting connections between coding
theory and seemingly unrelated fields: firstly the relation of the Hamming code
to projective geometry is discussed, and secondly an application of codes to
game theory is given.
The title, A Student’s Guide to Coding and Information Theory, expresses
our hope that this book is suitable as a beginner’s guide, giving an overview to
anyone who wishes to enter this area. In order not to scare the students (espe-
cially those without an engineering background), no problems are given at the
end of each chapter as usual textbooks do. Instead, the problems are incorpo-
rated into the main text in the form of Exercises. The readers are encouraged
to work them out. They are very helpful in understanding the concepts and are
motivating examples for the theories covered in this book at a more advanced
level.
The book will undergo further revisions as long as the course continues to
be delivered. If a reader would like to provide comments or correct typos and
errors, please email any of the authors. We will appreciate it very much!
1
Introduction
Systems dedicated to the communication or storage of information are com-
monplace in everyday life. Generally speaking, a communication system is a
system which sends information from one place to another. Examples include
telephone networks, computer networks, audio/video broadcasting, etc. Stor-
age systems, e.g. magnetic and optical disk drives, are systems for storage and
later retrieval of information. In a sense, such systems may be regarded as com-
munication systems which transmit information from now (the present) to then
(the future). Whenever or wherever problems of information processing arise,
there is a need to know how to compress the textual material and how to protect
it against possible corruption. This book covers the fundamentals of infor-
mation theory and coding theory, addresses these two main problems, and
gives related examples from practice. The amount of background mathematics and
electrical engineering is kept to a minimum. At most, simple results of calculus
and probability theory are used here, and anything beyond that is developed as
needed.
1.1 Information theory versus coding theory
Information theory is a branch of probability theory with extensive applica-
tions to communication systems. Like several other branches of mathematics,
information theory has a physical origin. It was initiated by communication
scientists who were studying the statistical structure of electrical communica-
tion equipment and was principally founded by Claude E. Shannon through the
landmark contribution [Sha48] on the mathematical theory of communications.
In this paper, Shannon developed the fundamental limits on data compression
and reliable transmission over noisy channels. Since its inception, information
theory has attracted a tremendous amount of research effort and provided lots
of inspiring insights into many research fields, not only communication and
signal processing in electrical engineering, but also statistics, physics, com-
puter science, economics, biology, etc.
Coding theory is mainly concerned with explicit methods for efficient and
reliable data transmission or storage, which can be roughly divided into data
compression and error-control techniques. Of the two, the former attempts to
compress the data from a source in order to transmit or store them more effi-
ciently. This practice is found every day on the Internet where data are usually
transformed into the ZIP format to make files smaller and reduce the network
load.
The latter adds extra data bits to make the transmission of data more robust
to channel disturbances. Although people may not be aware of its existence in
many applications, its impact has been crucial to the development of the Inter-
net, the popularity of compact discs (CD), the feasibility of mobile phones, the
success of the deep space missions, etc.
Logically speaking, coding theory leads to information theory, and informa-
tion theory provides the performance limits on what can be done by suitable
encoding of the information. Thus the two theories are intimately related, al-
though in the past they have been developed to a great extent quite separately.
One of the main purposes of this book is to show their mutual relationships.
1.2 Model and basic operations of information
processing systems
Communication and storage systems can be regarded as examples of informa-
tion processing systems and may be represented abstractly by the block dia-
gram in Figure 1.1. In all cases, there is a source from which the information
originates. The information source may be many things; for example, a book,
music, or video are all information sources in daily life.
[Figure: Information source → Encoder → Channel → Decoder → Information sink]
Figure 1.1 Basic information processing system.
The source output is processed by an encoder to facilitate the transmission
(or storage) of the information. In communication systems, this function is
often called a transmitter, while in storage systems we usually speak of a
recorder. In general, three basic operations can be executed in the encoder:
source coding, channel coding, and modulation. For source coding, the en-
coder maps the source output into digital format. The mapping is one-to-one,
and the objective is to eliminate or reduce the redundancy, i.e. that part of the
data which can be removed from the source output without harm to the infor-
mation to be transmitted. So, source coding provides an efficient representation
of the source output. For channel coding, the encoder introduces extra redun-
dant data in a prescribed fashion so as to combat the noisy environment in
which the information must be transmitted or stored. Discrete symbols may
not be suitable for transmission over a physical channel or recording on a digi-
tal storage medium. Therefore, we need proper modulation to convert the data
after source and channel coding to waveforms that are suitable for transmission
or recording.
The output of the encoder is then transmitted through some physical com-
munication channel (in the case of a communication system) or stored in some
physical storage medium (in the case of a storage system). As examples of
the former we mention wireless radio transmission based on electromagnetic
waves, telephone communication through copper cables, and wired high-speed
transmission through fiber optic cables. As examples of the latter we indicate
magnetic storage media, such as those used by a magnetic tape, a hard-drive, or
a floppy disk drive, and optical storage disks, such as a CD-ROM¹ or a DVD.²
Each of these examples is subject to various types of noise disturbances. On a
telephone line, the disturbance may come from thermal noise, switching noise,
or crosstalk from other lines. On magnetic disks, surface defects and dust par-
ticles are regarded as noise disturbances. Regardless of the explicit form of the
medium, we shall refer to it as the channel.
Information conveyed through (or stored in) the channel must be recovered
at the destination and processed to restore its original form. This is the task
of the decoder. In the case of a communication system, this device is often
referred to as the receiver. In the case of a storage system, this block is often
called the playback system. The signal processing performed by the decoder
can be viewed as the inverse of the function performed by the encoder. The
output of the decoder is then presented to the final user, which we call the
information sink.
The physical channel usually produces a received signal which differs from
the original input signal. This is because of signal distortion and noise intro-
duced by the channel. Consequently, the decoder can only produce an estimate
1 CD-ROM stands for compact disc read-only memory.
2 DVD stands for digital video disc or digital versatile disc.
of the original information message. All well-designed systems aim at repro-
ducing the message as reliably as possible while sending as much information as possible
per unit time (for communication systems) or per unit storage (for storage sys-
tems).
1.3 Information source
Nature usually supplies information in continuous forms like, e.g., a beauti-
ful mountain scene or the lovely chirping of birds. However, digital signals in
which both amplitude and time take on discrete values are preferred in modern
communication systems. Part of the reason for this use of digital signals is that
they can be transmitted more reliably than analog signals. When the inevitable
corruption of the transmission system begins to degrade the signal, the digital
pulses can be detected, reshaped, and amplified to standard form before relay-
ing them to their final destination. Figure 1.2 illustrates an ideal binary digital
pulse propagating along a transmission line, where the pulse shape is degraded
as a function of line length. At a propagation distance where the transmitted
pulse can still be reliably identified (before it is degraded to an ambiguous
state), the pulse is amplified by a digital amplifier that recovers its original
ideal shape. The pulse is thus regenerated. On the other hand, analog signals
cannot be so reshaped since they take an infinite variety of shapes. Hence the
farther the signal is sent and the more it is processed, the more degradation it
suffers from small errors.
[Figure: an ideal binary pulse degrading with propagation distance and being regenerated to its original shape; labels: original signal, regenerated signal, propagation distance.]
Figure 1.2 Pulse degradation and regeneration.
Modern practice for transforming analog signals into digital form is to sam-
ple the continuous signal at equally spaced intervals of time, and then to quan-
tize the observed value, i.e. each sample value is approximated by the nearest
level in a finite set of discrete levels. By mapping each quantized sample to a
codeword consisting of a prescribed number of code elements, the information
is then sent as a stream of digits. The conversion process is illustrated in Fig-
ure 1.3. Figure 1.3(a) shows a segment of an analog waveform. Figure 1.3(b)
shows the corresponding digital waveform based on the binary code in Ta-
ble 1.1. In this example, symbols 0 and 1 of the binary code are represented by
zero and one volt, respectively. Each sampled value is quantized into four bi-
nary digits (bits) with the last bit called sign bit indicating whether the sample
value is positive or negative. The remaining three bits are chosen to represent
the absolute value of a sample in accordance with Table 1.1.
Table 1.1 Binary representation of quantized levels
Index of quantization level   Binary representation   Index expressed as sum of powers of 2
0                             000
1                             001                     2^0
2                             010                     2^1
3                             011                     2^1 + 2^0
4                             100                     2^2
5                             101                     2^2 + 2^0
6                             110                     2^2 + 2^1
7                             111                     2^2 + 2^1 + 2^0
As a result of the sampling and quantizing operations, errors are introduced
into the digital signal. These errors are nonreversible in that it is not possible to
produce an exact replica of the original analog signal from its digital represen-
tation. However, the errors are under the designer’s control. Indeed, by proper
selection of the sampling rate and number of the quantization levels, the errors
due to the sampling and quantizing can be made so small that the difference
between the analog signal and its digital reconstruction is not discernible by a
human observer.
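The conversion just described can be sketched in a few lines of code. The following Python fragment (an illustration added here, not from the book) quantizes integer sample values into the four-bit format of Table 1.1; the placement of the sign bit after the three magnitude bits, and the convention that 1 means "negative", are assumptions made only for this example, since the text merely states that the last bit indicates the sign.

```python
# Illustrative sketch (not from the book): quantize integer samples in -7..7
# into 4 bits, i.e. three magnitude bits (Table 1.1) followed by a sign bit.

def quantize(sample):
    magnitude = min(abs(int(sample)), 7)        # clip to the eight levels of Table 1.1
    sign_bit = '1' if sample < 0 else '0'       # assumed convention: 1 = negative
    return format(magnitude, '03b') + sign_bit  # e.g. 5 -> '1010', -6 -> '1101'

for s in [2, 1, 5, -6, -2]:                     # sample values as suggested by Figure 1.3
    print(s, '->', quantize(s))
```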
1.4 Encoding a source alphabet
Based on the discussion in Section 1.3, we can assume without loss of gener-
ality that an information source generates a finite (but possibly large) number
of messages. This is undoubtedly true for a digital source. As for an analog
[Figure omitted: (a) a segment of an analog waveform (voltage versus time, with sample values such as 2, 1, 5, −6, −2); (b) the corresponding digital waveform, where each sample is represented by three magnitude bits followed by a sign bit, the binary symbols being transmitted as 0.0 V and 1.0 V pulses.]
Figure 1.3 (a) Analog waveform. (b) Digital representation.
source, the analog-to-digital conversion process mentioned above also makes
the assumption feasible. However, even though specific messages are actually
sent, the system designer has no idea in advance which message will be chosen
for transmission. We thus need to think of the source as a random (or stochas-
tic) source of information, and ask how we may encode, transmit, and recover
the original information.
An information source’s output alphabet is defined as the collection of all
possible messages. Denote by U a source alphabet which consists of r mes-
sages, say u1,u2,...,ur, with probabilities p1, p2,..., pr satisfying
$p_i \ge 0, \; \forall i, \quad \text{and} \quad \sum_{i=1}^{r} p_i = 1.$ (1.1)
Here the notation ∀ means “for all” or “for every.” We can always represent
each message by a sequence of bits, which provides for easier processing by
computer systems. For instance, if we toss a fair die to see which number
faces up, only six possible outputs are available with U = {1,2,3,4,5,6} and
pi = 1/6, ∀ 1 ≤ i ≤ 6. The following shows a straightforward binary description
of these messages:
1 ↔ 001, 2 ↔ 010, 3 ↔ 011, 4 ↔ 100, 5 ↔ 101, 6 ↔ 110, (1.2)
where each decimal number is encoded as its binary expression. Obviously,
there exist many other ways of encoding. For example, consider the two map-
pings listed below:
1 ↔ 00, 2 ↔ 01, 3 ↔ 100, 4 ↔ 101, 5 ↔ 110, 6 ↔ 111 (1.3)
and
1 ↔ 1100, 2 ↔ 1010, 3 ↔ 0110, 4 ↔ 1001, 5 ↔ 0101, 6 ↔ 0011. (1.4)
Note that all the messages are one-to-one mapped to the binary sequences,
no matter which of the above encoding methods is employed. The original
message can always be recovered from the binary sequence.
Given an encoding method, let li denote the length of the output sequence,
called the codeword, corresponding to ui, ∀ 1 ≤ i ≤ r. From the viewpoint of
source coding for data compression, an optimal encoding should minimize the
average length of codewords defined by
$L_{\text{av}} \triangleq \sum_{i=1}^{r} p_i l_i.$ (1.5)
By (1.5), the average lengths of codewords in (1.2), (1.3), and (1.4) are, re-
spectively,
$L_{\text{av}}^{(1.2)} = \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 3 = 3,$ (1.6)

$L_{\text{av}}^{(1.3)} = \frac{1}{6}\cdot 2 + \frac{1}{6}\cdot 2 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 3 = \frac{8}{3} \approx 2.667,$ (1.7)

$L_{\text{av}}^{(1.4)} = \frac{1}{6}\cdot 4 + \frac{1}{6}\cdot 4 + \frac{1}{6}\cdot 4 + \frac{1}{6}\cdot 4 + \frac{1}{6}\cdot 4 + \frac{1}{6}\cdot 4 = 4.$ (1.8)
The encoding method in (1.3) thus provides a more efficient way for the rep-
resentation of these source messages.
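As a quick illustrative check of these numbers, the following Python sketch (not part of the original text) evaluates definition (1.5) for the three codes (1.2)–(1.4) under the uniform probabilities p_i = 1/6.

```python
# Average codeword length L_av = sum_i p_i * l_i for the three dice codes,
# each of the six messages having probability 1/6.

codes = {
    '(1.2)': ['001', '010', '011', '100', '101', '110'],
    '(1.3)': ['00', '01', '100', '101', '110', '111'],
    '(1.4)': ['1100', '1010', '0110', '1001', '0101', '0011'],
}

for name, codewords in codes.items():
    p = 1 / len(codewords)                      # uniform probabilities
    l_av = sum(p * len(c) for c in codewords)   # definition (1.5)
    print(name, 'L_av =', round(l_av, 3))
# Prints 3.0, 2.667, and 4.0, matching (1.6)-(1.8).
```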
As for channel coding, a good encoding method should be able to protect
the source messages against the inevitable noise corruption. Suppose 3 is to be
transmitted and an error occurs in the least significant bit (LSB), namely the
first bit counted from the right-hand side of the associated codeword. In the
case of code (1.2) we now receive 010 instead of 011, and in the case of code
(1.3) we receive 101 instead of 100. In both cases, the decoder will retrieve
a wrong message (2 and 4, respectively). However, 0111 will be received if 3
is encoded by (1.4). Since 0111 is different from all the codewords in (1.4),
we can be aware of the occurrence of an error, i.e. the error is detected, and
possible retransmission of the message can be requested. Not just the error in
the LSB, but any single error can be detected by this encoding method. The
code (1.4) is therefore a better choice from the viewpoint of channel coding.
Typically, for channel coding, the encoding of the message to be transmitted
over the channel adds redundancy to combat channel noise. On the other hand,
the source encoding usually removes redundancy contained in the message to
be compressed. A more detailed discussion on channel and source coding will
be shown in Chapters 2 and 3 and in Chapters 4 and 5, respectively.
1.5 Octal and hexadecimal codes
Although the messages of an information source are usually encoded as bi-
nary sequences, the binary code is sometimes inconvenient for humans to use.
People usually prefer to make a single discrimination among many things. Ev-
idence for this is the size of the common alphabets. For example, the English
alphabet has 26 letters, the Chinese “alphabet” (bopomofo) has 37 letters, the
Phoenician alphabet has 22 letters, the Greek alphabet has 24 letters, the Rus-
sian alphabet has 33 letters, the Cyrillic alphabet has 44 letters, etc. Thus, for human use,
it is often convenient to group the bits into groups of three at a time and call
them the octal code (base 8). This code is given in Table 1.2.
Table 1.2 Octal code
Binary Octal
000 0
001 1
010 2
011 3
100 4
101 5
110 6
111 7
When using the octal representation, numbers are often enclosed in paren-
theses with a following subscript 8. For example, the decimal number 81 is
written in octal as (121)8 since 81 = “1”×8^2 + “2”×8^1 + “1”×8^0. The trans-
lation from octal to binary is so immediate that there is little trouble in going
either way.
The binary digits are sometimes grouped in fours to make the hexadecimal
code (Table 1.3). For instance, to translate the binary sequence 101011000111
to the octal form, we first partition these bits into groups of three:
101 011 000 111. (1.9)
Each group of bits is then mapped to an octal number by Table 1.2, hence
resulting in the octal representation (5307)8. If we partition the bits into groups
of four, i.e.
1010 1100 0111, (1.10)
we can get the hexadecimal representation (AC7)16 by Table 1.3. Since com-
puters usually work in bytes, which are 8 bits each, the hexadecimal code fits
into the machine architecture better than the octal code. However, the octal
code seems to fit better into the human’s psychology. Thus, in practice, neither
code has a clear victory over the other.
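For readers who like to experiment, the regrouping of bits described above can be mimicked with a few lines of Python (an illustration, not from the book):

```python
# Convert the binary sequence 101011000111 of Section 1.5 to octal and
# hexadecimal by grouping bits in threes and fours, as in (1.9) and (1.10).

bits = '101011000111'

def regroup(bits, group_size, digits='0123456789ABCDEF'):
    # pad on the left so the length is a multiple of the group size
    padded = bits.zfill(-(-len(bits) // group_size) * group_size)
    groups = [padded[i:i + group_size] for i in range(0, len(padded), group_size)]
    return ''.join(digits[int(g, 2)] for g in groups)

print(regroup(bits, 3))   # 5307 -> (5307)8
print(regroup(bits, 4))   # AC7  -> (AC7)16
```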
1.6 Outline of the book
After the introduction of the above main topics, we now have a basis for dis-
cussing the material the book is to cover.
Table 1.3 Hexadecimal code
Binary Hexadecimal
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 A
1011 B
1100 C
1101 D
1110 E
1111 F
In general, the error-detecting capability will be accomplished by adding
some digits to the message, thus making the message slightly longer. The main
problem is to achieve a required protection against the inevitable channel er-
rors without too much cost in adding extra digits. Chapter 2 will look at ways
of encoding source symbols so that any errors, up to a given level, may be
detected at the terminal end. For a detected error, we might call for a repeat
transmission of the message, hoping to get it right the next time.
In contrast to error-detecting codes, error-correcting codes are able to cor-
rect some detected errors directly without having to retransmit the message a
second time. In Chapter 3, we will discuss two kinds of error-correcting codes,
the repetition code and the Hamming code, as well as their encoding and de-
coding methods.
In Chapter 4, we consider ways of representing information in an efficient
way. The typical example will be an information source that can take on r
different possible values. We will represent each of these r values by a string
of 0s and 1s with varying length. The question is how to design these strings
such that the average length is minimized, but such that we are still able to
recover the original data from it. So, in contrast to Chapters 2 and 3, here we
try to shorten the codewords.
While in Chapters 2 to 4 we are concerned with coding theory, Chapter 5 in-
troduces information theory. We define some way of measuring “information”
and then apply it to the codes introduced in Chapter 4. By doing so we can not
only compare different codes but also derive some fundamental limits of what
is possible and what not. So Chapter 5 provides the information theory related
to the coding theory introduced in Chapter 4.
In Chapter 6, we continue on the path of information theory and develop the
relation to the coding theory of Chapters 2 and 3. Prior to the mid 1940s people
believed that transmitted data subject to noise corruption can never be perfectly
recovered unless the transmission rate approaches zero. Shannon’s landmark
work in 1948 [Sha48] disproved this thinking and established a fundamental
result for modern communication: as long as the transmission rate is below
a certain threshold (the so-called channel capacity), errorless data transmis-
sion can be realized by some properly designed coding scheme. Chapter 6 will
highlight the essentials regarding the channel capacity. We shall first introduce
a communication channel model from the general probabilistic setting. Based
on the results of Chapter 5, we then go on to specify the mutual information,
which provides a natural way of characterizing the channel capacity.
In Chapter 7, we build further on the ideas introduced in Chapters 2, 3, and
6. We will cover the basic concept of the theory of reliable transmission of in-
formation bearing signals over a noisy communication channel. In particular,
we will discuss the additive white Gaussian noise (AWGN) channel and intro-
duce the famous turbo code that is the first code that can approach the Shannon
limit of the AWGN channel up to less than 1 dB at a bit error rate (BER) of
10−5.
Finally, in Chapter 8, we try to broaden the view by showing two relations
of coding theory to quite unexpected fields. Firstly we explain a connection of
projective geometry to the Hamming code of Chapter 3. Secondly we show
how codes (in particular the three-times repetition code and the Hamming
code) can be applied to game theory.
References
[Sha48] Claude E. Shannon, “A mathematical theory of communication,” Bell System
Technical Journal, vol. 27, pp. 379–423 and 623–656, July and October 1948.
Available: http://moser.cm.nctu.edu.tw/nctu/doc/shannon1948.pdf
2
Error-detecting codes
When a message is transmitted, the inevitable noise disturbance usually de-
grades the quality of communication. Whenever repetition is possible, it is
sufficient to detect the occurrence of an error. When an error is detected, we
simply repeat the message, and it may be correct the second time or even pos-
sibly the third time.
It is not possible to detect an error if every possible symbol, or set of sym-
bols, that can be received is a legitimate message. It is only possible to catch
errors if there are some restrictions on what a proper message is. The prob-
lem is to keep these restrictions on the possible messages down to ones that
are simple. In practice, “simple” means “easily computable.” In this chapter,
we will mainly investigate the problem of designing codes such that any sin-
gle error can be detected at the receiver. In Chapter 3, we will then consider
correcting the errors that occur during the transmission.
2.1 Review of modular arithmetic
We first give a quick review of the basic arithmetic which is extensively used
in the following sections. For binary digits, which take values of only 0 and 1,
the rules for addition and multiplication are defined by
0+0 = 0
0+1 = 1
1+0 = 1
1+1 = 0
and
0×0 = 0
0×1 = 0
1×0 = 0
1×1 = 1,
(2.1)
respectively. For example, by (2.1), we have
1+1×0+0+1×1 = 1+0+0+1 = 0. (2.2)
If we choose to work in the decimal arithmetic, the binary arithmetic in (2.1)
can be obtained by dividing the result in decimal by 2 and taking the remainder.
For example, (2.2) yields
1+0+0+1 = 2 ≡ 0 mod 2. (2.3)
Occasionally, we may work modulo some number other than 2 for the case
of a nonbinary source. Given a positive integer m, for the addition and multi-
plication mod m (“mod” is an abbreviation for “modulo”), we merely divide
the result in decimal by m and take the nonnegative remainder. For instance,
consider an information source with five distinct outputs 0, 1, 2, 3, 4. It follows
that
2+4 = 1×5+“1” ⇐⇒ 2+4 ≡ 1 mod 5, (2.4)
3×4 = 2×5+“2” ⇐⇒ 3×4 ≡ 2 mod 5. (2.5)
Other cases for the modulo 5 addition and multiplication can be referred to in
Table 2.1.
Table 2.1 Addition and multiplication modulo 5
+ mod 5 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3
× mod 5 0 1 2 3 4
0 0 0 0 0 0
1 0 1 2 3 4
2 0 2 4 1 3
3 0 3 1 4 2
4 0 4 3 2 1
For multiplication mod m, we have to be more careful if m is not a prime.
Suppose that we have the numbers a and b congruent to a′ and b′ modulo the
modulus m. This means that
$a \equiv a' \mod m \quad \text{and} \quad b \equiv b' \mod m$ (2.6)
or
$a = a' + k_1 m \quad \text{and} \quad b = b' + k_2 m$ (2.7)
for some integers k1 and k2. For the product ab, we have
$ab = a'b' + a' k_2 m + b' k_1 m + k_1 k_2 m^2$ (2.8)
and hence
$ab \equiv a'b' \mod m.$ (2.9)
Now consider the particular case
a = 15, b = 12, m = 10. (2.10)
We have a′ = 5 and b′ = 2 by (2.7) and ab ≡ a′b′ ≡ 0 mod 10 by (2.9). But
neither a nor b is zero! Only for a prime modulus do we have the important
property that if a product is zero, then at least one factor must be zero.
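The modular computations of this section are easy to reproduce. The following Python lines (added here for illustration) check (2.4), (2.5), and the non-prime-modulus example (2.10); Python's % operator returns the nonnegative remainder used throughout this chapter.

```python
print((2 + 4) % 5)    # 1, as in (2.4)
print((3 * 4) % 5)    # 2, as in (2.5)

# With the non-prime modulus 10, a product can be 0 even though neither
# factor is congruent to 0:
a, b, m = 15, 12, 10
print(a % m, b % m, (a * b) % m)   # 5 2 0
```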
Exercise 2.1 In order to become more familiar with the modular operation
check out the following problems:
3×6+7 ≡ ? mod 11 (2.11)
and
5−4×2 ≡ ? mod 7. (2.12)
♦
More on modular arithmetic can be found in Section 3.1.
2.2 Independent errors – white noise
To simplify the analysis of noise behavior, we assume that errors in a message
satisfy the following constraints:
(1) the probability of an error in any binary position is assumed to be a fixed
number p, and
(2) errors in different positions are assumed to be independent.¹
Such noise is called “white noise” in analogy with white light, which is sup-
posed to contain uniformly all the frequencies detected by the human eye.
However, in practice, there are often reasons for errors to be more common
in some positions in the message than in others, and it is often true that errors
tend to occur in bursts and not to be independent. We assume white noise in the
very beginning because this is the simplest case, and it is better to start from
the simplest case and move on to more complex situations after we have built
up a solid knowledge on the simple case.
Consider a message consisting of n digits for transmission. For white noise,
the probability of no error in any position is given by
$(1-p)^n.$ (2.13)
¹ Given events $A_\ell$, they are said to be independent if $\Pr\bigl(\bigcap_{\ell=1}^{n} A_\ell\bigr) = \prod_{\ell=1}^{n} \Pr(A_\ell)$. Here $\bigcap_\ell$ denotes set-intersection, i.e. $\bigcap_\ell A_\ell$ is the set of elements that are members of all sets $A_\ell$. Hence, $\Pr\bigl(\bigcap_{\ell=1}^{n} A_\ell\bigr)$ is the event that all events $A_\ell$ occur at the same time. The notation $\prod_\ell$ is a shorthand for multiplication: $\prod_{\ell=1}^{n} a_\ell \triangleq a_1 \cdot a_2 \cdots a_n$.
The probability of a single error in the message is given by
$np(1-p)^{n-1}.$ (2.14)

The probability of ℓ errors is given by the (ℓ+1)th term in the binomial expansion:

$1 = \bigl((1-p)+p\bigr)^n$ (2.15)

$= \binom{n}{0}(1-p)^n + \binom{n}{1}p(1-p)^{n-1} + \binom{n}{2}p^2(1-p)^{n-2} + \cdots + \binom{n}{n}p^n$ (2.16)

$= (1-p)^n + np(1-p)^{n-1} + \frac{n(n-1)}{2}p^2(1-p)^{n-2} + \cdots + p^n.$ (2.17)

For example, the probability of exactly two errors is given by

$\frac{n(n-1)}{2}p^2(1-p)^{n-2}.$ (2.18)
We can obtain the probability of an even number of errors (0,2,4,...) by
adding the following two binomial expansions and dividing by 2:
$1 = \bigl((1-p)+p\bigr)^n = \sum_{\ell=0}^{n} \binom{n}{\ell} p^\ell (1-p)^{n-\ell},$ (2.19)

$(1-2p)^n = \bigl((1-p)-p\bigr)^n = \sum_{\ell=0}^{n} (-1)^\ell \binom{n}{\ell} p^\ell (1-p)^{n-\ell}.$ (2.20)

Denote by ⌊ξ⌋ the greatest integer not larger than ξ. We have²

$\Pr(\text{An even number of errors}) = \sum_{\ell=0}^{\lfloor n/2 \rfloor} \binom{n}{2\ell} p^{2\ell} (1-p)^{n-2\ell}$ (2.21)

$= \frac{1+(1-2p)^n}{2}.$ (2.22)

² Note that zero errors also counts as an even number of errors here.
The probability of an odd number of errors is 1 minus this number.
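A small numerical experiment can confirm the closed form just derived. The Python sketch below (not from the book) compares the binomial sum (2.21) with the expression (2.22) for an arbitrary choice of n and p.

```python
from math import comb

def prob_even_errors(n, p):
    # direct evaluation of the binomial sum (2.21)
    return sum(comb(n, 2 * l) * p**(2 * l) * (1 - p)**(n - 2 * l)
               for l in range(n // 2 + 1))

n, p = 8, 0.1
print(prob_even_errors(n, p))       # binomial sum
print((1 + (1 - 2 * p)**n) / 2)     # closed form (2.22); the two values agree
```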
Exercise 2.2 Actually, this is a good chance to practice your basic skills on
the method of induction: can you show that
$\sum_{\ell=0}^{\lfloor n/2 \rfloor} \binom{n}{2\ell} p^{2\ell} (1-p)^{n-2\ell} = \frac{1+(1-2p)^n}{2}$ (2.23)

and

$\sum_{\ell=0}^{\lfloor (n-1)/2 \rfloor} \binom{n}{2\ell+1} p^{2\ell+1} (1-p)^{n-2\ell-1} = \frac{1-(1-2p)^n}{2}$ (2.24)

by induction on n?
Hint: Note that
$\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}$ (2.25)
for n,k ≥ 1. ♦
2.3 Single parity-check code
The simplest way of encoding a binary message to make it error-detectable
is to count the number of 1s in the message, and then append a final binary
digit chosen so that the entire message has an even number of 1s in it. The
entire message is therefore of even parity. Thus to (n − 1) message positions
we append an nth parity-check position. Denote by xℓ the original bit in the
ℓth message position, ∀ 1 ≤ ℓ ≤ n−1, and let xn be the parity-check bit. The
constraint of even parity implies that

$x_n = \sum_{\ell=1}^{n-1} x_\ell$ (2.26)

by (2.1). Note that here (and for the remainder of this book) we omit “mod 2”
and implicitly assume it everywhere. Let yℓ be the channel output correspond-
ing to xℓ, ∀ 1 ≤ ℓ ≤ n. At the receiver, we firstly count the number of 1s in the
received sequence y. If the even-parity constraint is violated for the received
vector, i.e.

$\sum_{\ell=1}^{n} y_\ell \ne 0,$ (2.27)
this indicates that at least one error has occurred.
For example, given a message (x1,x2,x3,x4) = (0111), the parity-check bit
is obtained by
x5 = 0+1+1+1 = 1, (2.28)
and hence the resulting even-parity codeword (x1,x2,x3,x4,x5) is (01111).
Suppose the codeword is transmitted, but a vector y = (00111) is received. In
this case, an error in the second position is met. We have
y1 + y2 + y3 + y4 + y5 = 0 + 0 + 1 + 1 + 1 = 1 (≠ 0); (2.29)
thereby the error is detected. However, if another vector of (00110) is re-
ceived, where two errors (in the second and the last position) have occurred,
no error will be detected since
y1 +y2 +y3 +y4 +y5 = 0+0+1+1+0 = 0. (2.30)
Evidently in this code any odd number of errors can be detected. But any even
number of errors cannot be detected.
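The encoding and checking rules (2.26) and (2.27) translate directly into code. The following Python sketch (added here for illustration) reproduces the example above, detecting the single error but missing the double error.

```python
# Single parity-check code: append a bit so that the codeword has even parity,
# and flag any received vector whose overall parity is odd.

def encode(message_bits):
    parity = sum(message_bits) % 2          # (2.26)
    return message_bits + [parity]

def error_detected(received_bits):
    return sum(received_bits) % 2 != 0      # (2.27)

print(encode([0, 1, 1, 1]))                 # [0, 1, 1, 1, 1], as in (2.28)
print(error_detected([0, 0, 1, 1, 1]))      # True: the single error is detected
print(error_detected([0, 0, 1, 1, 0]))      # False: the double error goes unnoticed
```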
For channels with white noise, (2.22) gives the probability of any even num-
ber of errors in the message. Dropping the first term of (2.21), which corre-
sponds to the probability of no error, we have the following probability of
undetectable errors for the single parity-check code introduced here:
$\Pr(\text{Undetectable errors}) = \sum_{\ell=1}^{\lfloor n/2 \rfloor} \binom{n}{2\ell} p^{2\ell} (1-p)^{n-2\ell}$ (2.31)

$= \frac{1+(1-2p)^n}{2} - (1-p)^n.$ (2.32)

The probability of detectable errors, i.e. all the odd-number errors, is then obtained by

$\Pr(\text{Detectable errors}) = 1 - \frac{1+(1-2p)^n}{2} = \frac{1-(1-2p)^n}{2}.$ (2.33)

Obviously, we should have that

$\Pr(\text{Detectable errors}) \gg \Pr(\text{Undetectable errors}).$ (2.34)
For p very small, we have
$\Pr(\text{Undetectable errors}) = \frac{1+(1-2p)^n}{2} - (1-p)^n$ (2.35)

$= \frac{1}{2} + \frac{1}{2}\left[\binom{n}{0} - \binom{n}{1}(2p) + \binom{n}{2}(2p)^2 - \cdots\right] - \left[\binom{n}{0} - \binom{n}{1}p + \binom{n}{2}p^2 - \cdots\right]$ (2.36)

$= \frac{1}{2} + \frac{1}{2}\left[1 - 2np + \frac{n(n-1)}{2}4p^2 - \cdots\right] - \left[1 - np + \frac{n(n-1)}{2}p^2 - \cdots\right]$ (2.37)

$\approx \frac{n(n-1)}{2}p^2$ (2.38)

and

$\Pr(\text{Detectable errors}) = \frac{1-(1-2p)^n}{2}$ (2.39)

$= \frac{1}{2} - \frac{1}{2}\left[\binom{n}{0} - \binom{n}{1}(2p) + \cdots\right]$ (2.40)

$= \frac{1}{2} - \frac{1}{2}\left[1 - 2np + \cdots\right]$ (2.41)

$\approx np.$ (2.42)
In the above approximations, we only retain the leading term that dominates
the sum.
Hence, (2.34) requires
$np \gg \frac{n(n-1)}{2}\,p^2,$ (2.43)
and implies that the shorter the message, the better the detecting performance.
In practice, it is common to break up a long message in the binary alphabet
into blocks of (n−1) digits and to append one binary digit to each block. This
produces the redundancy of
$\frac{n}{n-1} = 1 + \frac{1}{n-1},$ (2.44)
where the redundancy is defined as the total number of binary digits divided
by the minimum necessary. The excess redundancy is 1/(n − 1). Clearly, for
low redundancy we want to use long messages, but for high reliability short
messages are better. Thus the choice of the length n for the blocks to be sent is
a compromise between the two opposing forces.
2.4 The ASCII code
Here we introduce an example of a single parity-check code, called the Amer-
ican Standard Code for Information Interchange (ASCII), which was the first
code developed specifically for computer communications. Each character in
ASCII is represented by seven data bits constituting a unique binary sequence.
Thus a total of 128 (= 2^7) different characters can be represented in ASCII.
The characters are various commonly used letters, numbers, special control
symbols, and punctuation symbols, e.g. $, %, and @. Some of the special con-
trol symbols, e.g. ENQ (enquiry) and ETB (end of transmission block), are
used for communication purposes. Other symbols, e.g. BS (back space) and
CR (carriage return), are used to control the printing of characters on a page.
A complete listing of ASCII characters is given in Table 2.2.
Since computers work in bytes which are blocks of 8 bits, a single ASCII
symbol often uses 8 bits. The eighth bit is set so that the total number of 1s in
the eight positions is an even number. For example, consider “K” in Table 2.2
encoded as (113)8, which can be transformed into binary form as follows:
(113)8 = 1001011 (2.45)
Table 2.2 Seven-bit ASCII code
Octal Char. Octal Char. Octal Char. Octal Char.
code code code code
000 NUL 040 SP 100 @ 140 `
001 SOH 041 ! 101 A 141 a
002 STX 042 ” 102 B 142 b
003 ETX 043 # 103 C 143 c
004 EOT 044 $ 104 D 144 d
005 ENQ 045 % 105 E 145 e
006 ACK 046 & 106 F 146 f
007 BEL 047 ’ 107 G 147 g
010 BS 050 ( 110 H 150 h
011 HT 051 ) 111 I 151 i
012 LF 052 * 112 J 152 j
013 VT 053 + 113 K 153 k
014 FF 054 , 114 L 154 l
015 CR 055 - 115 M 155 m
016 SO 056 . 116 N 156 n
017 SI 057 / 117 O 157 o
020 DLE 060 0 120 P 160 p
021 DC1 061 1 121 Q 161 q
022 DC2 062 2 122 R 162 r
023 DC3 063 3 123 S 163 s
024 DC4 064 4 124 T 164 t
025 NAK 065 5 125 U 165 u
026 SYN 066 6 126 V 166 v
027 ETB 067 7 127 W 167 w
030 CAN 070 8 130 X 170 x
031 EM 071 9 131 Y 171 y
032 SUB 072 : 132 Z 172 z
033 ESC 073 ; 133 [ 173 {
034 FS 074 < 134 \ 174 |
035 GS 075 = 135 ] 175 }
036 RS 076 > 136 ^ 176 ~
037 US 077 ? 137 _ 177 DEL
(where we have dropped the first 2 bits of the first octal symbol). In this case,
the parity-check bit is 0; “K” is thus encoded as 10010110 for even parity. You
are encouraged to encode the remaining characters in Table 2.2.
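The parity computation for ASCII characters is easily automated. The short Python sketch below (not from the book) reproduces the encoding of “K”; it appends the parity bit as the eighth, rightmost bit, which matches the example above.

```python
# Append the even-parity bit to a 7-bit ASCII character.

def ascii_with_parity(ch):
    bits7 = format(ord(ch), '07b')      # 7-bit ASCII pattern
    parity = bits7.count('1') % 2       # even-parity bit
    return bits7 + str(parity)

print(ascii_with_parity('K'))   # 10010110, as in the example
print(ascii_with_parity('e'))   # try further characters from Table 2.2
```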
By the constraint of even parity, any single error, a 0 changed into a 1 or a
1 changed into a 0, will be detected³ since after the change there will be an
odd number of 1s in the eight positions. Thus, we have an error-detecting code
that helps to combat channel noise. Perhaps more importantly, the code makes
it much easier to maintain the communication quality since the machine can
detect the occurrence of errors by itself.
2.5 Simple burst error-detecting code
In some situations, errors occur in bursts rather than in isolated positions in the
received message. For instance, lightning strikes, power-supply fluctuations,
loose flakes on a magnetic surface are all typical causes of a burst of noise.
Suppose that the maximum length of any error burst⁴ that we are to detect is
L. To protect data against the burst errors, we first divide the original message
into a sequence of words consisting of L bits. Aided with a pre-selected error-
detecting code, parity checks are then computed over the corresponding word
positions, instead of the bit positions.
Based on the above scenario, if an error burst occurs within one word, in
effect only a single word error is observed. If an error burst covers the end of
one word and the beginning of another, still no two errors corresponding to the
same position of words will be met, since we assumed that any burst length l
satisfies 0 ≤ l ≤ L. Consider the following example for illustration.
Example 2.3 If the message is
Hello NCTU
and the maximum burst error length L is 8, we can use the 7-bit ASCII code in
Table 2.2 and protect the message against burst errors as shown in Table 2.3.
(Here no parity check is used within the ASCII symbols.) The encoded mes-
sage is therefore
Hello NCTUn
3 Actually, to be precise, every odd number of errors is detected.
4 An error burst is said to have length L if errors are confined to L consecutive positions. By
this definition, the error patterns 0111110, 0101010, and 0100010 are all classified as bursts of
length 5. Note that a 0 in an error pattern denotes that no error has happened in that position,
while a 1 denotes an error. See also (3.34) in Section 3.3.2.
Table 2.3 Special type of parity check to protect against burst errors of
maximum length L = 8
H = (110)8 = 01001000
e = (145)8 = 01100101
l = (154)8 = 01101100
l = (154)8 = 01101100
o = (157)8 = 01101111
SP= (040)8 = 00100000
N = (116)8 = 01001110
C = (103)8 = 01000011
T = (124)8 = 01010100
U = (125)8 = 01010101
Check sum = 01101110 = n
where n is the parity-check symbol.
Suppose a burst error of length 5, as shown in Table 2.4, is met during the
transmission of the above message, where the bold-face positions are in error.
In this case, the burst error is successfully detected since the check sum is not
00000000. However, if the burst error of length 16 shown in Table 2.5 occurs,
the error will not be detected due to the all-zero check sum. ♦
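The check sum of Table 2.3 is simply the bit-wise modulo-2 sum of the 8-bit words. The following Python sketch (an illustration added here, not from the book) recomputes it for the message “Hello NCTU”, treating each 7-bit ASCII character, padded with a leading 0, as one 8-bit word.

```python
# Burst protection as in Example 2.3: the check word is the bitwise modulo-2
# sum (XOR) of all L-bit words, here with L = 8.

message = 'Hello NCTU'

check = 0
for ch in message:
    check ^= ord(ch)                    # XOR = parity check per bit position

print(format(check, '08b'), chr(check)) # 01101110 n, as in Table 2.3
```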
Exercise 2.4 Could you repeat the above process of encoding for the case of
L = 16? Also, show that the resulting code can detect all the bursts of length
at most 16. ♦
Exercise 2.5 Can you show that the error might not be detected if there is
more than one burst, even if each burst is of length no larger than L? ♦
2.6 Alphabet plus number codes – weighted codes
The codes we have discussed so far were all designed with respect to a simple
form of “white noise” that causes some bits to be flipped. This is very suit-
able for many types of machines. However, in some systems, where people are
involved, other types of noise are more appropriate. The first common human
error is to interchange adjacent digits of numbers; for example, 38 becomes 83.
A second common error is to double the wrong one of a triple of digits, where
two adjacent digits are the same; for example, 338 becomes 388. In addition,
the confusion of O (“oh”) and 0 (“zero”) is also very common.
Table 2.4 A burst error of length 5 has occurred during transmission and is
detected because the check sum is not 0000000; bold-face positions denote
positions in error
H ⇒ K 0 1 0 0 1 0 1 1
e ⇒ ENQ 0 0 0 0 0 1 0 1
l 0 1 1 0 1 1 0 0
l 0 1 1 0 1 1 0 0
o 0 1 1 0 1 1 1 1
SP 0 0 1 0 0 0 0 0
N 0 1 0 0 1 1 1 0
C 0 1 0 0 0 0 1 1
T 0 1 0 1 0 1 0 0
U 0 1 0 1 0 1 0 1
n 0 1 1 0 1 1 1 0
Check sum = 0 1 1 0 0 0 1 1
Table 2.5 A burst error of length 16 has occurred during transmission, but it
is not detected; bold-face positions denote positions in error
H ⇒ K 0 1 0 0 1 0 1 1
e ⇒ J 0 1 0 0 1 0 1 0
l ⇒ @ 0 1 0 0 0 0 0 0
l 0 1 1 0 1 1 0 0
o 0 1 1 0 1 1 1 1
SP 0 0 1 0 0 0 0 0
N 0 1 0 0 1 1 1 0
C 0 1 0 0 0 0 1 1
T 0 1 0 1 0 1 0 0
U 0 1 0 1 0 1 0 1
n 0 1 1 0 1 1 1 0
Check sum = 0 0 0 0 0 0 0 0
Table 2.6 Weighted sum: progressive digiting
Message Sum Sum of sum
w w w
x w+x 2w+x
y w+x+y 3w+2x+y
z w+x+y+z 4w+3x+2y+z
In English text-based systems, it is quite common to have a source alphabet
consisting of the 26 letters, space, and the 10 decimal digits. Since the size of
this source alphabet, 37 (= 26 + 1 + 10), is a prime number, we can use the
following method to detect the presence of the above described typical errors.
Firstly, each symbol in the source alphabet is mapped to a distinct number in
{0,1,2,...,36}. Given a message for encoding, we weight the symbols with
weights 1,2,3,..., beginning with the check digit of the message. Then, the
weighted digits are summed together and reduced to the remainder after divid-
ing by 37. Finally, a check symbol is selected such that the sum of the check
symbol and the remainder obtained above is congruent to 0 modulo 37.
To calculate this sum of weighted digits easily, a technique called progres-
sive digiting, illustrated in Table 2.6, has been developed. In Table 2.6, it is
supposed that we want to compute the weighted sum for a message wxyz, i.e.
4w+3x+2y+1z. For each symbol in the message, we first compute the run-
ning sum from w to the symbol in question, thereby obtaining the second col-
umn in Table 2.6. We can sum these sums again in the same way to obtain the
desired weighted sum.
Example 2.6 We assign a distinct number from {0,1,2,...,36} to each
symbol in the combined alphabet/number set in the following way: “0” = 0,
“1” = 1, “2” = 2, ..., “9” = 9, “A” = 10, “B” = 11, “C” = 12, ..., “Z” = 35,
and “space” = 36. Then we encode
3B 8.
We proceed with the progressive digiting as shown in Table 2.7 and obtain a
weighted sum of 183. Since 183 mod 37 = 35 and 35+2 is divisible by 37, it
follows that the appended check digit should be
“2” = 2.
Table 2.7 Progressive digiting for the example of “3B 8”: we need to add
“2” = 2 as a check-digit to make sure that the weighted sum is divisible by 37
Sum Sum of sum
“3” = 3 3 3
“B” = 11 14 17
“space” = 36 50 67
“8” = 8 58 125
Check-digit = ?? 58 183
183 = 4 × 37 + 35, i.e. 183 mod 37 = 35.
Table 2.8 Checking the encoded message “3B 82”
3 3×5 = 15
B 11×4 = 44
“space” 36×3 = 108
8 8×2 = 16
2 2×1 = 2
Sum = 185 = 37×5 ≡ 0 mod 37
The encoded message is therefore given by
3B 82.
To check whether this is a legitimate message at the receiver, we proceed as
shown in Table 2.8.
Now suppose “space” is lost during the transmission such that only “3B82”
is received. Such an error can be detected since the weighted sum is now not
congruent to 0 mod 37; see Table 2.9. Similarly, the interchange from “82” to
Table 2.9 Checking the corrupted message “3B82”
3 3×4 = 12
B 11×3 = 33
8 8×2 = 16
2 2×1 = 2
Sum = 63 ≢ 0 mod 37
Table 2.10 Checking the corrupted message “3B 28”
3 3×5 = 15
B 11×4 = 44
“space” 36×3 = 108
2 2×2 = 4
8 8×1 = 8
Sum = 179 ≢ 0 mod 37
“28” can also be detected; see Table 2.10. ♦
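The whole procedure of Example 2.6 can be condensed into a few lines of code. The Python sketch below (not from the book) maps the symbols to 0–36, computes the weighted sum with weights 1, 2, 3, ... counted from the check position, and appends the check symbol; the helper names are arbitrary choices for this illustration.

```python
# Weighted code over the 37-symbol alphabet of Example 2.6.

ALPHABET = '0123456789' + 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' + ' '

def weighted_sum(message):
    # rightmost symbol gets weight 1, the next weight 2, and so on
    return sum(w * ALPHABET.index(ch)
               for w, ch in zip(range(1, len(message) + 1), reversed(message)))

def append_check(message):
    rem = weighted_sum(message + '0')   # weight the (still unknown) check position with 1
    c = (-rem) % 37                     # choose the check symbol so the total is 0 mod 37
    return message + ALPHABET[c]

encoded = append_check('3B 8')
print(encoded)                          # 3B 82
print(weighted_sum(encoded) % 37)       # 0 -> valid, as in Table 2.8
print(weighted_sum('3B82') % 37)        # nonzero -> the lost space is detected (Table 2.9)
```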
In the following we give another two examples of error-detecting codes that
are based on modular arithmetic and are widely used in daily commerce.
Example 2.7 The International Standard Book Number (ISBN) is usually a
10-digit code used to identify a book uniquely. A typical example of the ISBN
is as follows:
0          country ID
52         publisher ID
18 – 4868  book number
7          check digit
where the hyphens do not matter and may appear in different positions. The
first digit stands for the country, with 0 meaning the United States and some
other English-speaking countries. The next two digits are the publisher ID;
here 52 means Cambridge University Press. The next six digits, 18 – 4868, are
the publisher-assigned book number. The last digit is the weighted check sum
modulo 11 and is represented by “X” if the required check digit is 10.
To confirm that this number is a legitimate ISBN number we proceed as
shown in Table 2.11. It checks! ♦
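As an illustration, here is a minimal Python sketch (ours; the function name isbn10_is_valid is hypothetical, not from the text) that validates an ISBN using the same progressive-digiting idea as in Table 2.11.

```python
# A small illustrative sketch: validating a 10-digit ISBN with the
# progressive-digiting idea of Table 2.11 ("X" stands for the value 10).

def isbn10_is_valid(isbn):
    digits = [10 if c == "X" else int(c) for c in isbn if c not in "- "]
    if len(digits) != 10:
        return False
    running = 0   # "Sum" column
    total = 0     # "Sum of sum" column
    for d in digits:
        running += d
        total += running
    return total % 11 == 0   # weighted sum must be divisible by 11

print(isbn10_is_valid("0-5218-4868-7"))   # True: 209 = 11 x 19
```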
Exercise 2.8 Check whether 0 – 8044 – 2957 – X is a valid ISBN number. ♦
Example 2.9 The Universal Product Code (UPC) is a 12-digit single parity-
check code employed on the bar codes of most merchandise to ensure reliabil-
ity in scanning. A typical example of UPC takes the form
  0  36000            29145         2
     manufacturer     item          parity
     ID               number        check
where the last digit is the parity-check digit.

Table 2.11 Checking the ISBN number 0 – 5218 – 4868 – 7

  Digit    Sum    Sum of sum
  0          0        0
  5          5        5
  2          7       12
  1          8       20
  8         16       36
  4         20       56
  8         28       84
  6         34      118
  8         42      160
  7         49      209 = 11 × 19 ≡ 0 mod 11

Denote the digits as x1, x2, ..., x12.
The parity digit x12 is determined such that
3(x1 +x3 +x5 +x7 +x9 +x11)+(x2 +x4 +x6 +x8 +x10 +x12) (2.46)
is a multiple5 of 10. In this case,
3(0+6+0+2+1+5)+(3+0+0+9+4+2) = 60. (2.47)
♦
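The parity check (2.46) is equally easy to automate; the following minimal sketch (again ours, purely illustrative) verifies the UPC of this example.

```python
# An illustrative sketch of the UPC parity check (2.46): three times the sum
# of the odd-positioned digits plus the sum of the even-positioned digits
# must be a multiple of 10.

def upc_is_valid(upc):
    x = [int(c) for c in upc if c.isdigit()]
    if len(x) != 12:
        return False
    odd = sum(x[0::2])    # x1, x3, ..., x11
    even = sum(x[1::2])   # x2, x4, ..., x12
    return (3 * odd + even) % 10 == 0

print(upc_is_valid("036000291452"))   # True: 3*14 + 18 = 60, cf. (2.47)
```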
2.7 Trade-off between redundancy and
error-detecting capability
As discussed in the previous sections, a single parity check to make the whole
message even-parity can help the detection of any single error (or even any odd
number of errors). However, if we want to detect the occurrence of more errors
in a noisy channel, what can we do for the design of error-detecting codes? Can
such a goal be achieved by increasing the number of parity checks, i.e. at the
cost of extra redundancy? Fortunately, the answer is positive. Let us consider
the following illustrative example.
5 Note that in this example the modulus 10 is used although this is not a prime. The slightly
unusual summation (2.46), however, makes sure that every single error can still be detected.
The reason why UPC chooses 10 as the modulus is that the check digit should also range from
0 to 9 so that it can easily be encoded by the bar code.
Example 2.10 For an information source of eight possible outputs, obviously
each output can be represented by a binary 3-tuple, say (x1,x2,x3). Suppose
three parity checks x4, x5, x6 are now appended to the original message by the
following equations:
x4 = x1 +x2,
x5 = x1 +x3,
x6 = x2 +x3,
(2.48)
to form a legitimate codeword (x1,x2,x3,x4,x5,x6). Compared with the sin-
gle parity-check code, this code increases the excess redundancy from 1/3 to
3/3. Let (y1,y2,y3,y4,y5,y6) be the received vector as (x1,x2,x3,x4,x5,x6) is
transmitted. If at least one of the following parity-check equations is violated:
y4 = y1 +y2,
y5 = y1 +y3,
y6 = y2 +y3,
(2.49)
the occurrence of an error is detected.
For instance, consider the case of a single error in the ith position such that
yi = xi + 1 and yℓ = xℓ, ∀ℓ ∈ {1,2,...,6}\{i}. (2.50)
It follows that
y4 ≠ y1 + y2, y5 ≠ y1 + y3                    if i = 1,
y4 ≠ y1 + y2, y6 ≠ y2 + y3                    if i = 2,
y5 ≠ y1 + y3, y6 ≠ y2 + y3                    if i = 3,
y4 ≠ y1 + y2                                  if i = 4,
y5 ≠ y1 + y3                                  if i = 5,
y6 ≠ y2 + y3                                  if i = 6.
(2.51)
Therefore, all single errors can be successfully detected. In addition, consider
the case of a double error in the ith and jth positions, respectively, such that
yi = xi + 1, yj = xj + 1, and yℓ = xℓ, ∀ℓ ∈ {1,2,...,6}\{i, j}. (2.52)
We then have
y5 ≠ y1 + y3, y6 ≠ y2 + y3                    if (i, j) = (1,2),
y4 ≠ y1 + y2, y6 ≠ y2 + y3                    if (i, j) = (1,3),
y5 ≠ y1 + y3                                  if (i, j) = (1,4),
y4 ≠ y1 + y2                                  if (i, j) = (1,5),
y4 ≠ y1 + y2, y5 ≠ y1 + y3, y6 ≠ y2 + y3      if (i, j) = (1,6),
y4 ≠ y1 + y2, y5 ≠ y1 + y3                    if (i, j) = (2,3),
y6 ≠ y2 + y3                                  if (i, j) = (2,4),
y4 ≠ y1 + y2, y5 ≠ y1 + y3, y6 ≠ y2 + y3      if (i, j) = (2,5),
y4 ≠ y1 + y2                                  if (i, j) = (2,6),
y4 ≠ y1 + y2, y5 ≠ y1 + y3, y6 ≠ y2 + y3      if (i, j) = (3,4),
y6 ≠ y2 + y3                                  if (i, j) = (3,5),
y5 ≠ y1 + y3                                  if (i, j) = (3,6),
y4 ≠ y1 + y2, y5 ≠ y1 + y3                    if (i, j) = (4,5),
y4 ≠ y1 + y2, y6 ≠ y2 + y3                    if (i, j) = (4,6),
y5 ≠ y1 + y3, y6 ≠ y2 + y3                    if (i, j) = (5,6).
(2.53)
Hence, this code can detect any pattern of double errors. ♦
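The claims of Example 2.10 can also be confirmed by brute force. The following illustrative sketch (ours, not from the text) enumerates all error patterns of a given weight and reports whether every one of them violates at least one check in (2.49).

```python
# An illustrative brute-force check of Example 2.10: the code defined by the
# parity checks (2.48) detects every single and every double error.

from itertools import combinations, product

def syndrome(y):
    """The three checks of (2.49); (0, 0, 0) means no violation is observed."""
    y1, y2, y3, y4, y5, y6 = y
    return ((y1 + y2 + y4) % 2, (y1 + y3 + y5) % 2, (y2 + y3 + y6) % 2)

def detects_all_errors_of_weight(w):
    for x1, x2, x3 in product([0, 1], repeat=3):      # every message
        x = [x1, x2, x3, (x1 + x2) % 2, (x1 + x3) % 2, (x2 + x3) % 2]
        for positions in combinations(range(6), w):   # every w-bit error pattern
            y = x[:]
            for i in positions:
                y[i] ^= 1                             # flip the erroneous bits
            if syndrome(y) == (0, 0, 0):              # the error goes unnoticed
                return False
    return True

print(detects_all_errors_of_weight(1))   # True
print(detects_all_errors_of_weight(2))   # True
print(detects_all_errors_of_weight(3))   # False, cf. Exercise 2.11
```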
Exercise 2.11 Unfortunately, not all triple errors may be caught by the code
of Example 2.10. Can you give an example that verifies this? ♦
Without a proper design, however, increasing the number of parity checks
may not always improve the error-detecting capability. For example, consider
another code which appends the parity checks by
x4 = x1 +x2 +x3,
x5 = x1 +x2 +x3,
x6 = x1 +x2 +x3.
(2.54)
In this case, x5 and x6 are simply repetitions of x4. Following a similar discus-
sion as in Example 2.10, we can show that all single errors are still detectable.
But if the following double error occurs during the transmission:
y1 = x1 + 1, y2 = x2 + 1, and yℓ = xℓ, ∀ 3 ≤ ℓ ≤ 6, (2.55)
none of the three parity-check equations corresponding to (2.54) will be vi-
olated. This code is thus not double-error-detecting, even though it requires
the same amount of redundancy as the code defined in (2.48).
2.8 Further reading
In this chapter simple coding schemes, e.g. single parity-check codes, burst
error-detecting codes, and weighted codes, have been introduced to detect the
presence of channel errors. However, there exists a class of linear block codes,
called cyclic codes, which are probably the most widely used form of error-
detecting codes. The popularity of cyclic codes arises primarily from the fact
that these codes can be implemented with extremely cost-effective electronic
circuits. The codes themselves also possess a high degree of structure and reg-
ularity (which gives rise to the promising advantage mentioned above), and
there is a certain beauty and elegance in the corresponding theory. Interested
readers are referred to [MS77], [Wic94], and [LC04] for more details of cyclic
codes.
References
[LC04] Shu Lin and Daniel J. Costello, Jr., Error Control Coding, 2nd edn. Prentice
Hall, Upper Saddle River, NJ, 2004.
[MS77] F. Jessy MacWilliams and Neil J. A. Sloane, The Theory of Error-Correcting
Codes. North-Holland, Amsterdam, 1977.
[Wic94] Stephen B. Wicker, Error Control Systems for Digital Communication and
Storage. Prentice Hall, Englewood Cliffs, NJ, 1994.
3 Repetition and Hamming codes
The theory of error-correcting codes comes from the need to protect informa-
tion from corruption during transmission or storage. Take your CD or DVD as
an example. Usually, you might convert your music into MP3 files1 for stor-
age. The reason for such a conversion is that MP3 files are more compact and
take less storage space, i.e. they use fewer binary digits (bits) compared with
the original format on CD. Certainly, the price to pay for a smaller file size is
that you will suffer some kind of distortion, or, equivalently, losses in audio
quality or fidelity. However, such loss is in general indiscernible to human au-
dio perception, and you can hardly notice the subtle differences between the
uncompressed and compressed audio signals. The compression of digital data
streams such as audio music streams is commonly referred to as source coding.
We will consider it in more detail in Chapters 4 and 5.
What we are going to discuss in this chapter is the opposite of compression.
After converting the music into MP3 files, you might want to store these files
on a CD or a DVD for later use. While burning the digital data onto a CD, there
is a special mechanism called error control coding behind the CD burning pro-
cess. Why do we need it? Well, the reason is simple. Storing CDs and DVDs in-
evitably causes small scratches on the disk surface. These scratches impair the
disk surface and create some kind of lens effect so that the laser reader might
not be able to retrieve the original information correctly. When this happens,
the stored files are corrupted and can no longer be used. Since the scratches
are inevitable, it makes no sense to ask the users to keep the disks in per-
fect condition, or discard them once a perfect read-out from the disk becomes
impossible. Therefore, it would be better to have some kind of engineering
mechanism to protect the data from being compromised by minor scratches.
1 MP3 stands for MPEG-2 audio layer 3, where MPEG is the abbreviation for moving picture
experts group.
We use error-correcting codes to accomplish this task. Error-correcting codes
are also referred to as channel coding in general.
First of all, you should note that it is impossible to protect the stored MP3
files from impairment without increasing the file size. To see this, say you have
a binary data stream s of length k bits. If the protection mechanism were not
allowed to increase the length, after endowing s with some protection capa-
bility, the resulting stream x is at best still of length k bits. Then the whole
protection process is nothing but a mapping from a k-bit stream to another k-
bit stream. Such mapping is, at its best, one-to-one and onto, i.e. a bijection,
since if it were not a bijection, it would not be possible to recover the original
data. On the other hand, because of the bijection, when the stored data stream
x is corrupted, it is impossible to recover the original s. Therefore, we see that
the protection process (henceforth we will refer to it as an encoding process)
must be an injection, meaning x must have length larger than k, say n, so that
when x is corrupted, there is a chance that s may be recovered by using the
extra (n−k) bits we have used for storing extra information.
How to encode efficiently a binary stream of length k with minimum (n−k)
extra bits added so that the length k stream s is well protected from corrup-
tion is the major concern of error-correcting codes. In this chapter, we will
briefly introduce two kinds of error-correcting codes: the repetition code and
the Hamming code. The repetition code, as its name suggests, simply repeats
information and is the simplest error-protecting/correcting scheme. The Ham-
ming code, developed by Richard Hamming when he worked at Bell Labs in
the late 1940s (we will come back to this story in Section 3.3.1), on the other
hand, is a bit more sophisticated than the repetition code. While the original
Hamming code is actually not that much more complicated than the repeti-
tion code, it turns out to be optimal in terms of sphere packing in some high-
dimensional space. Specifically, this means that for certain code length and
error-correction capability, the Hamming code actually achieves the maximal
possible rate, or, equivalently, it requires the fewest possible extra bits.
Besides error correction and data protection, the Hamming code is also good
in many other areas. Readers who wish to know more about these subjects
are referred to Chapter 8, where we will briefly discuss two other uses of the
Hamming code. We will show in Section 8.1 how the Hamming code relates
to a geometric subject called projective geometry, and in Section 8.2 how the
Hamming code can be used in some mathematical games.
3.1 Arithmetics in the binary field
Prior to introducing the codes, let us first study the arithmetics of binary oper-
ations (see also Section 2.1). These are very important because the digital data
is binary, i.e. each binary digit is either of value 0 or 1, and the data will be
processed in a binary fashion. By binary operations we mean binary addition,
subtraction, multiplication, and division. The binary addition is a modulo-2
addition, i.e.
0+0 = 0,
1+0 = 1,
0+1 = 1,
1+1 = 0.
(3.1)
The only difference between binary and usual additions is the case of 1 + 1.
Usual addition would say 1 + 1 = 2. But since we are working with modulo-
2 addition, meaning the sum is taken as the remainder when divided by 2,
the remainder of 2 divided by 2 equals 0, hence we have 1 + 1 = 0 in binary
arithmetics.
By moving the second operand to the right of these equations, we obtain
subtractions:
0 = 0−0,
1 = 1−0,
0 = 1−1,
1 = 0−1.
(3.2)
Further, it is interesting to note that the above equalities also hold if we replace
“−” by “+”. Then we realize that, in binary, subtraction is the same as addition.
This is because the remainder of −1 divided by 2 equals 1, meaning −1 is
considered the same as 1 in binary. In other words,
a−b = a+(−1)×b = a+(1)×b = a+b. (3.3)
Also, it should be noted that the above implies
a−b = b−a = a+b (3.4)
in binary, while this is certainly false for real numbers.
52. 34 Repetition and Hamming codes
Multiplication in binary is the same as usual, and we have
0×0 = 0,
1×0 = 0,
0×1 = 0,
1×1 = 1.
(3.5)
The same holds also for division.
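In programming terms, modulo-2 addition is simply the exclusive-or of two bits and modulo-2 multiplication is the logical AND. The following tiny sketch (ours, purely illustrative) tabulates (3.1) and (3.5).

```python
# An illustrative sketch: in binary arithmetics, addition is the exclusive-or
# (XOR) of two bits and multiplication is the logical AND; subtraction
# coincides with addition, as shown in (3.3).

def add(a, b):        # modulo-2 addition; also serves as subtraction
    return (a + b) % 2

def mul(a, b):        # modulo-2 multiplication
    return a * b

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} + {b} = {add(a, b)}    {a} x {b} = {mul(a, b)}")
```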
Exercise 3.1 Show that the laws of association and distribution hold for
binary arithmetics. That is, show that for any a,b,c ∈ {0,1} we have
a+b+c = (a+b)+c = a+(b+c) (additive associative law),
a×b×c = (a×b)×c = a×(b×c) (multiplicative associative law),
a×(b+c) = (a×b)+(a×c) (distributive law). ♦
Exercise 3.2 In this chapter, we will use the notation ?= to denote a conditional
equality, by which we mean that we are unsure whether the equality holds. Show
that the condition of a ?= b in binary is the same as a + b ?= 0. ♦
3.2 Three-times repetition code
A binary digit (or bit in short) s is to be stored on CD, but it could be corrupted
for some reason during read-out. To recover the corrupted data, a straight-
forward means of protection is to store as many copies of s as possible. For
simplicity, say we store three copies. Such a scheme is called the three-times
repetition code. Thus, instead of simply storing s, we store (s,s,s). To distin-
guish them, let us denote the first s as x1 and the others as x2 and x3. In other
words, we have
x2 = x3 = 0 if x1 = 0,
x2 = x3 = 1 if x1 = 1,
(3.6)
and the possible values of (x1,x2,x3) are (000) and (111).
When you read out the stream (x1,x2,x3) from a CD, you must check wheth-
er x1 = x2 and x1 = x3 in order to detect if there was a data corruption. From
Exercise 3.2, this can be achieved by the following computation:
data clean if x1 + x2 = 0 and x1 + x3 = 0,
data corrupted otherwise.
(3.7)
For example, if the read-out is (x1,x2,x3) = (000), then you might say the data
is clean. Otherwise, if the read-out shows (x1,x2,x3) = (001) you immediately
find x1 +x3 = 1 and the data is corrupted.
Now say that the probability of writing in 0 and reading out 1 is p, and
the same for writing in 1 and reading out 0. You see that a bit is corrupted
with probability p and remains clean with probability (1− p). Usually we can
assume p < 1/2, meaning the data is more likely to be clean than corrupted. In
the case of p > 1/2, a simple bit-flipping technique of treating the read-out of
1 as 0 and 0 as 1 would do the trick.
Thus, when p < 1/2, the only possibilities for data corruption going undetected
are the cases when the read-out shows (111) given writing in was (000)
and when the read-out shows (000) given writing in was (111). Each occurs
with probability2 p³ < 1/8. Compared with the case when the data is unprotected,
the probability of undetectable corruption drops from p to p³. It means
that when the read-out shows either (000) or (111), we are more confident
that such a read-out is clean.
The above scheme is commonly referred to as error detection (see also
Chapter 2), by which we mean we only detect whether the data is corrupted,
but we do not attempt to correct the errors. However, our goal was to correct
the corrupted data, not just detect it. This can be easily achieved with the rep-
etition code. Consider the case of a read-out (001): you would immediately
guess that the original data is more likely to be (000), which corresponds to
the binary bit s = 0. On the other hand, if the read-out shows (101), you would
guess the second bit is corrupted and the data is likely to be (111) and hence
determine the original s = 1.
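Putting the pieces together, here is a minimal sketch (ours, not the book's) of the three-times repetition code: the encoding rule (3.6), the detection rule (3.7), and the majority-vote guess just described.

```python
# An illustrative sketch of the three-times repetition code: the encoding
# rule (3.6), the detection rule (3.7), and the majority-vote guess.

def encode(s):
    return (s, s, s)

def detect(y):
    x1, x2, x3 = y
    corrupted = (x1 + x2) % 2 != 0 or (x1 + x3) % 2 != 0
    return "data corrupted" if corrupted else "data clean"

def decode(y):
    # Majority vote: output the bit value that occurs at least twice.
    return 1 if sum(y) >= 2 else 0

print(detect((0, 0, 0)), decode((0, 0, 0)))   # data clean 0
print(detect((0, 0, 1)), decode((0, 0, 1)))   # data corrupted 0  (error corrected)
print(detect((1, 0, 1)), decode((1, 0, 1)))   # data corrupted 1  (error corrected)
```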
There is a good reason for such a guess. Again let us denote by p the proba-
bility of a read-out bit being corrupted, and let us assume3 that the probability
of s being 0 is 1/2 (and of course the same probability for s being 1). Then,
given the read-out (001), the probability of the original data being (000) can
be computed as follows. Here again we assume that the read-out bits are cor-
rupted independently. Assuming Pr[s = 0] = Pr[s = 1] = 1/2, it is clear that
Pr(Writing in (000)) = Pr(Writing in (111)) = 1/2. (3.8)
It is also easy to see that
2 Here we assume each read-out bit is corrupted independently, meaning whether one bit is cor-
rupted or not has no effect on the other bits being corrupted or not. With this independence
assumption, the probability of having three corrupted bits is p · p · p = p³.
3 Why do we need this assumption? Would the situation be different without this assumption?
Take the case of Pr[s = 0] = 0 and Pr[s = 1] = 1 as an example.
Pr(Writing in (000) and reading out (001))
= Pr(Writing in (000)) · Pr(Reading out (001) | Writing in (000)) (3.9)
= Pr(Writing in (000)) · Pr(0 → 0) · Pr(0 → 0) · Pr(0 → 1) (3.10)
= (1/2) · (1 − p) · (1 − p) · p (3.11)
= (1 − p)² p / 2. (3.12)
Similarly, we have
Pr(Writing in (111) and reading out (001)) = (1 − p)p² / 2. (3.13)
These together show that
Pr(Reading out (001))
= Pr(Writing in (000) and reading out (001))
  + Pr(Writing in (111) and reading out (001)) (3.14)
= (1 − p)p / 2. (3.15)
Thus
Pr(Writing in (000) | Reading out (001))
= Pr(Writing in (000) and reading out (001)) / Pr(Reading out (001)) (3.16)
= 1 − p. (3.17)
Similarly, it can be shown that
Pr(Writing in (111)| Reading out (001)) = p. (3.18)
As p < 1/2 by assumption, we immediately see that
1 − p > p, (3.19)
and, given that the read-out is (001), the case of writing in (111) is less likely.
Hence we would guess the original data is more likely to be (000) due to its
higher probability. Arguing in a similar manner, we can construct a table for
decoding, shown in Table 3.1.
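These posterior probabilities are easy to check numerically. The short sketch below assumes the illustrative value p = 0.1; any p < 1/2 leads to the same qualitative conclusion.

```python
# A numerical check of (3.17) and (3.18), assuming the illustrative value
# p = 0.1 and equally likely write-ins (000) and (111).

p = 0.1   # probability that a single bit is flipped during read-out

joint_000 = 0.5 * (1 - p) * (1 - p) * p   # (3.12): write (000), read (001)
joint_111 = 0.5 * p * p * (1 - p)         # (3.13): write (111), read (001)

pr_read_001 = joint_000 + joint_111       # (3.14)-(3.15): (1 - p) * p / 2

print(joint_000 / pr_read_001)   # ~0.9 = 1 - p, cf. (3.17)
print(joint_111 / pr_read_001)   # ~0.1 = p,     cf. (3.18)
```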
From Table 3.1, we see that given the original data being (000), the cor-
rectable error events are the ones when the read-outs are (100), (010), and
(001), i.e. the ones when only one bit is in error. The same holds for the other
write-in of (111). Thus we say that the three-times repetition code is a single-
error-correcting code, meaning the code is able to correct all possible one-bit
errors. If there are at least two out of the three bits in error during read-out,
however, the majority vote yields the wrong bit, and such error patterns cannot
be corrected by this code.