SELECTED SOLUTIONS TO 
PRINCIPLES OF DIGITAL 
COMMUNICATION 
Cambridge Press 2008 
by ROBERT G. GALLAGER 
A complete set of solutions is available from Cambridge Press for instructors teaching a class 
using this text. This is a subset of solutions that I feel would be valuable for those studying the 
subject on their own. 
Chapter 2 
Exercise 2.2: 
(a) V +W is a random variable, so its expectation, by definition, is 
E[V + W] = ∑_{v∈V} ∑_{w∈W} (v + w) pVW(v,w)
= ∑_{v∈V} ∑_{w∈W} v pVW(v,w) + ∑_{v∈V} ∑_{w∈W} w pVW(v,w)
= ∑_{v∈V} v ∑_{w∈W} pVW(v,w) + ∑_{w∈W} w ∑_{v∈V} pVW(v,w)
= ∑_{v∈V} v pV(v) + ∑_{w∈W} w pW(w)
= E[V] + E[W].
(b) Once again, working from first principles, 
E[V · W] = ∑_{v∈V} ∑_{w∈W} (v · w) pVW(v,w)
= ∑_{v∈V} ∑_{w∈W} (v · w) pV(v) pW(w)   (using independence)
= (∑_{v∈V} v pV(v)) (∑_{w∈W} w pW(w))
= E[V] · E[W].
(c) To discover a case where E[V · W] ≠ E[V] · E[W], first try the simplest kind of example where
V and W are binary with the joint pmf 
pVW(0, 1) = pVW(1, 0) = 1/2; pVW(0, 0) = pVW(1, 1) = 0. 
Clearly, V and W are not independent. Also, E[V · W] = 0 whereas E[V ] = E[W] = 1/2 and 
hence E[V ] · E[W] = 1/4. 
The second case requires some experimentation. One approach is to choose a joint distribution 
such that E[V ·W] = 0 and E[V ] = 0. A simple solution is then given by the pmf, 
pVW(−1, 0) = pVW(0, 1) = pVW(1, 0) = 1/3. 
Again, V and W are not independent. Clearly, E[V ·W] = 0. Also, E[V ] = 0 (what is E[W]?). 
Hence, E[V ·W] = E[V ] · E[W]. 
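Both pmfs are small enough to check directly. The following short Python sketch (the helper moments is ad hoc, introduced only for this check) confirms that the first pmf gives E[V·W] = 0 while E[V]·E[W] = 1/4, and that the second gives E[V·W] = 0 = E[V]·E[W]:

    # Check E[V], E[W], E[VW] for a joint pmf given as {(v, w): probability}.
    def moments(pmf):
        EV = sum(p * v for (v, w), p in pmf.items())
        EW = sum(p * w for (v, w), p in pmf.items())
        EVW = sum(p * v * w for (v, w), p in pmf.items())
        return EV, EW, EVW

    pmf1 = {(0, 1): 0.5, (1, 0): 0.5}                   # first case above
    pmf2 = {(-1, 0): 1/3, (0, 1): 1/3, (1, 0): 1/3}     # second case above
    for pmf in (pmf1, pmf2):
        EV, EW, EVW = moments(pmf)
        print(EV, EW, EV * EW, EVW)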
(d) 
σ²_{V+W} = E[(V + W)²] − (E[V + W])²
= E[V²] + E[W²] + 2E[V · W] − (E[V] + E[W])²
= E[V²] + E[W²] + 2E[V] · E[W] − E[V]² − E[W]² − 2E[V] · E[W]
= E[V²] − E[V]² + E[W²] − E[W]²
= σ²_V + σ²_W.
Exercise 2.4: 
(a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X2) = Pr(X2 > X1). These two 
events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X2) and 
Pr(X1 < X2) sum to 1, so must each be 1/2. Thus Pr(X1 ≥ X2) = Pr(X2 ≥ X1) = 1/2. 
(b) Invoking the symmetry among X1, X2 and X3, we see that each has the same probability of 
being the smallest of the three (the probability of a tie is 0). These three events are mutually 
exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 
1/3. 
(c) The event {N > n} is the same as the event {X1 is the minimum among the n iid 
random variables X1, X2, · · · , Xn}. By extending the argument in part (b), we see that 
Pr(X1 is the smallest of X1, . . . , Xn) = 1/n. Finally, Pr{N ≥ n} = Pr{N > n−1} = 1/(n−1) for n ≥ 2.
(d) Since N is a non-negative integer random variable (taking on values from 2 to ∞), we can use Exercise 2.3(a) as follows:
E[N] = ∑_{n=1}^{∞} Pr{N ≥ n}
= Pr{N ≥ 1} + ∑_{n=2}^{∞} Pr{N ≥ n}
= 1 + ∑_{n=2}^{∞} 1/(n−1)
= 1 + ∑_{n=1}^{∞} 1/n.
Since the series ∑_{n=1}^{∞} 1/n diverges, we conclude that E[N] = ∞.
(e) Since the alphabet has a finite number of letters,¹ Pr(X1 = X2) is no longer 0 and depends on the particular probability distribution. Thus, although Pr(X1 ≥ X2) = Pr(X2 ≥ X1) by symmetry, neither can be found without knowing the distribution.
Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric value. If X1 = amin, then no subsequent rv X2, X3, . . . can have a smaller value, so N = ∞ in this case. Since the event X1 = amin occurs with positive probability, E[N] = ∞.
Exercise 2.6: 
(a) Assume the contrary; i.e., there is a suffix-free code that is not uniquely decodable. Then that 
code must contain two distinct sequences of source letters, say, x1, x2, . . . , xn and x′1, x′2, . . . , x′m such that
C(x1)C(x2) . . . C(xn) = C(x′1)C(x′2) . . . C(x′m).
Then one of the following must hold:
• C(xn) = C(x′m)
• C(xn) is a suffix of C(x′m)
• C(x′m) is a suffix of C(xn).
In the last two cases we arrive at a contradiction since the code is hypothesized to be suffix-free.
In the first case, xn must equal x′m because of the suffix freedom. Simply delete that final letter
from each sequence and repeat the argument. Since the sequences are distinct, the final letter 
must differ after some number of repetitions of the above argument, and at that point one of 
the latter two cases holds and a contradiction is reached. 
Hence, suffix-free codes are uniquely decodable. 
(b) Any prefix-free code becomes a suffix-free code if the ordering of symbols in each codeword 
is reversed. About the simplest such example is {0,01,11} which can be seen to be a suffix-free 
code (with codeword lengths {1, 2, 2}) but not a prefix-free code. 
A codeword in the above code cannot be decoded as soon as its last bit arrives at the decoder. 
To illustrate a rather extreme case, consider the following output produced by the encoder, 
0111111111 . . . 
Assuming that source letters {a,b,c} map to {0,01,11}, we cannot distinguish between the two 
possible source sequences, 
acccccccc . . . 
and 
bcccccccc . . . , 
¹The same results can be obtained, with some extra work, for a countably infinite discrete alphabet.
till the end of the string is reached. Hence, in this case the decoder might have to wait for an 
arbitrarily long time before decoding. 
(c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix free and suffix 
free. Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either the 
codewords 00 and 01 for C2 or C3, and thus must use 10 and 11, which is not suffix free. 
Exercise 2.7: Consider the set of codeword lengths (1,2,2) and arrange them as (2,1,2). Then, 
u1=0 is represented as 0.00. Next, u2 = 1/4 = 0.01 must be represented using 1 bit after the 
binary point, which is not possible. Hence, the algorithm fails. 
Exercise 2.9: 
(a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9) 
H[X] − L = ∑_{j=1}^{M} pj log(2^{−lj}/pj) ≤ ∑_{j=1}^{M} pj [2^{−lj}/pj − 1] log e = 0.
As is evident from Figure 2.7, the inequality is strict unless 2−lj = pj for each j. Thus if 
H[X] = L, it follows that 2−lj = pj for each j. 
(b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 
1/4. The first order node 0 corresponds to the letter a and has probability 1/2. The first order 
node 1 corresponds to the occurrence of either letter b or c, and thus has probability 1/2.
[Figure 2.4, repeated: the binary code tree with first-order nodes 0 (the letter a) and 1, second-order nodes for aa, ab/ac, ba/bb/bc, and ca/cb/cc, and the code:]
aa → 00 
ab → 011 
ac → 010 
ba → 110 
bb → 1111 
bc → 1110 
ca → 100 
cb → 1011 
cc → 1010 
Similarly, the second-order node 00 corresponds to aa, which has probability 1/4, and the second-order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the same way, the nodes 10 and 11 correspond to the letters c and b, with probabilities 1/4 each. One can proceed with higher-order nodes in the same way, but what is the principle behind this?
In general, when an infinite binary tree is used to represent an unending sequence of letters from an iid source where each letter j has probability pj = 2^{−ℓj} and length ℓj, we see that each node corresponding to an initial sequence of letters x1, . . . , xn has a probability ∏_i 2^{−ℓ_{x_i}} equal to the product of the individual letter probabilities, and an order equal to ∑_i ℓ_{x_i}. Thus each
The other nodes (those unlabelled in the example above) have a probability equal to the sum of 
the immediately following labelled nodes. This probability is again 2−` for an `th order node, 
which can be established by induction if one wishes to be formal. 
Exercise 2.11: (a) For n = 2,
(∑_{j=1}^{M} 2^{−lj})² = (∑_{j1=1}^{M} 2^{−lj1}) (∑_{j2=1}^{M} 2^{−lj2}) = ∑_{j1=1}^{M} ∑_{j2=1}^{M} 2^{−(lj1+lj2)}.
The same approach works for arbitrary n. 
(b) Each source n-tuple xⁿ = (aj1, aj2, . . . , ajn) is encoded into a concatenation C(aj1)C(aj2) . . . C(ajn) of binary digits of aggregate length l(xⁿ) = lj1 + lj2 + · · · + ljn. Since there is one n-tuple xⁿ for each choice of aj1, aj2, . . . , ajn, the result of part (a) can be rewritten as
(∑_{j=1}^{M} 2^{−lj})ⁿ = ∑_{xⁿ} 2^{−l(xⁿ)}.   (1)
(c) Rewriting (1) in terms of the number Ai of concatenations of n codewords of aggregate length 
i,
(∑_{j=1}^{M} 2^{−lj})ⁿ = ∑_{i=1}^{n·lmax} Ai 2^{−i}.
This uses the fact that since each codeword has length at most lmax, each concatenation has 
length at most nlmax. 
(d) From unique decodability, each of these concatenations must be different, so there are at 
most 2^i concatenations of aggregate length i, i.e., Ai ≤ 2^i. Thus, since the above sum contains
at most nlmax terms,
(∑_{j=1}^{M} 2^{−lj})ⁿ ≤ n·lmax.   (2)
(e) Note that 
[n·lmax]^{1/n} = exp(ln(n·lmax)/n) → exp(0) = 1
as n → ∞. Since (2) must be satisfied for all n, the Kraft inequality must be satisfied.
Exercise 2.13: 
(a) In the Huffman algorithm, we start by combining p3 and p4. Since we have p1 = p3+p4 ≥ p2, 
we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also 
combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding 
codewords of lengths 1,2,3 and 3 respectively. 
(b) Note that p3 ≤ p2 and p4 ≤ p2 so p3 + p4 ≤ 2p2. Thus 
p1 = p3 + p4 ≤ 2p2 which implies p1 + p3 + p4 ≤ 4p2. 
5
Since p2 = 1−p1 −p3 −p4, the latter equation implies that 1−p2 ≤ 4p2, or p2 ≥ 0.2. From the 
former equation, then, p1 ≤ 2p2 ≤ 0.4 shows that p1 ≤ 0.4. These bounds can be met by also 
choosing p3 = p4 = 0.2. Thus pmax = 0.4. 
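This boundary case is easy to check with a small Huffman construction (an illustrative Python sketch; huffman_lengths is an ad-hoc helper). With p = (0.4, 0.2, 0.2, 0.2), the tie-breaking in the algorithm can produce either the lengths (2,2,2,2) or (1,2,3,3), and both give the same expected length of 2 bits:

    import heapq

    def huffman_lengths(probs):
        # heap entries: (probability, tie-breaker, symbol indices in this subtree)
        heap = [(p, i, [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        count = len(probs)
        while len(heap) > 1:
            p1, _, s1 = heapq.heappop(heap)
            p2, _, s2 = heapq.heappop(heap)
            for i in s1 + s2:
                lengths[i] += 1      # every symbol under the merge gets one more bit
            count += 1
            heapq.heappush(heap, (p1 + p2, count, s1 + s2))
        return lengths

    p = [0.4, 0.2, 0.2, 0.2]
    L = huffman_lengths(p)
    print(L, sum(pi * li for pi, li in zip(p, L)))   # this tie-breaking gives [2, 2, 2, 2], 2.0 bits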
(c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1. Thus 
1 − 2p1 ≤ p1 so p1 ≥ 1/3, i.e., pmin = 1/3. This bound is achievable by choosing p1 = p2 = 1/3 
and p3 = p4 = 1/6. 
(d) The argument in part (b) remains the same if we assume p1 ≤ p3+p4 rather than p1 = p3+p4, 
i.e., p1 ≤ p3 + p4 implies that p1 ≤ pmax. Thus assuming p1 > pmax implies that p1 > p3 + p4. 
Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 
2 (or perhaps with symbol 1 if p2 = p1). Thus the codeword for symbol 1 (or perhaps the 
codeword for symbol 2) will have length 1. 
(e) The lengths of any optimal prefix free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p1 > 
pmax, then, from (b), p1 > p3 + p4, so the lengths (1, 2, 3, 3) yield a lower average length than 
(2, 2, 2, 2). 
(f) The argument in part (c) remains almost the same if we start with the assumption that 
p1 ≥ p3 + p4. In this case p2 = 1 − p1 − p3 − p4 ≥ 1 − 2p1. Combined with p1 ≥ p2, we again
have p1 ≥ pmin. Thus if p1 < pmin, we must have p3 + p4 > p1 ≥ p2. We then must combine p1 
and p2 in the second step of the Huffman algorithm, so each codeword will have length 2. 
(g) It turns out that pmax is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5 
and all other symbols have an aggregate probability of 1/5, then the Huffman code construction 
combines the least likely symbols until they are tied together into a supersymbol of probability 
1/5. The completion of the algorithm, as in part (b), can lead to either one codeword of length 
1 or 3 codewords of length 2 and the others of longer length. If p1 > 2/5, then at each stage of 
the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol 
1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then 
guarantees that the code will have one codeword of length 1. 
Exercise 2.15: 
(a) This is the same as Lemma 2.5.1. 
(b) Since p1 < pM−1 + pM, we see that p1 < p′_{M−1}, where p′_{M−1} is the probability of the node in the reduced code tree corresponding to letters M−1 and M in the original alphabet. Thus, by part (a), l1 ≥ l′_{M−1} = lM − 1.
(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by 
Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1, 
then l1 = lM, and otherwise, p1 < pM + pk, so l1 must be at least as large as the length of the 
immediate parent of M, showing that l1 ≥ lM − 1. 
(d) and (e) We have shown that the shortest and longest length differ by at most 1, with some number m ≥ 1 lengths equal to l1 and the remaining M−m lengths equal to l1+1. It follows that 2^{l1+1} = 2m + (M − m) = M + m. From this it follows that l1 = ⌊log2(M)⌋ and m = 2^{l1+1} − M.
Exercise 2.16: 
(a) A full ternary tree can be grown into a larger full ternary tree one step at a time. The smallest full tree has 3 leaves. For
the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves 
from that node. We lose 1 leaf but gain 3, for a net gain of 2 leaves at each growth extension. Thus, M = 3 + 2n (for n a non-negative integer).
(b) It is clear that for optimality, all the unused leaves in the tree must have the same length 
as the longest codeword. For M even, combine the 2 lowest probabilities into a node at the 
first step, then combine the 3 lowest probability nodes for all the rest of the steps until the root 
node. If M is odd, a full ternary tree is possible, so combine the 3 lowest probability nodes at 
each step. 
(c) If {a, b, c, d, e, f} have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the 
ternary Huffman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}. 
Exercise 2.18: 
(a) Applying the Huffman coding algorithm to the code with M +1 symbols with pM+1 = 0, we 
combine symbol M +1 with symbol M and the reduced code has M symbols with probabilities 
p1, . . . , pM. The Huffman code for this reduced set of symbols is simply the code for the original 
set of symbols with symbol M + 1 eliminated. Thus the code including symbol M + 1 is the reduced code modified by a unit length increase in the codeword for symbol M. Thus L = L′ + pM, where L′ is the expected length for the code with M symbols.
(b) All n of the zero probability symbols are combined together in the Huffman algorithm, and 
the reduced code from this combination is then the same as the code with M + 1 symbols in 
part (a). Thus L = L′ + pM again.
Exercise 2.19: 
(a) The entropies H(X), H(Y ), and H(XY ) can be expressed as 
H(XY) = −∑_{x∈X, y∈Y} pXY(x,y) log pXY(x,y)
H(X) = −∑_{x∈X, y∈Y} pXY(x,y) log pX(x)
H(Y) = −∑_{x∈X, y∈Y} pXY(x,y) log pY(y).
It is assumed that all symbol pairs x, y of zero probability have been removed from this sum, 
and thus all x (y) for which pX(x) = 0 ( pY (y) = 0) are consequently removed. Combining these 
equations, 
H(XY) − H(X) − H(Y) = ∑_{x∈X, y∈Y} pXY(x,y) log [pX(x)pY(y)/pXY(x,y)].
(b) Using the standard inequality log x ≤ (x − 1) log e, 
H(XY) − H(X) − H(Y) ≤ ∑_{x∈X, y∈Y} pXY(x,y) [pX(x)pY(y)/pXY(x,y) − 1] log e = 0.
Thus H(X, Y ) ≤ H(X)+H(Y ). Note that this inequality is satisfied with equality if and only if 
X and Y are independent. 
(c) For n symbols, X1, . . . ,Xn, let Y be the ‘super-symbol’ X2, . . . ,Xn. Then using (b), 
H(X1, . . . ,Xn) = H(X1, Y ) ≤ H(X1) + H(Y ) = H(X1) + H(X2, . . . ,Xn). 
Iterating this gives the desired result. 
An alternate approach generalizes part (b) in the following way: 
H(X1, . . . , Xn) − ∑_i H(Xi) = ∑_{x1,...,xn} p(x1, . . . , xn) log [p(x1) · · · p(xn)/p(x1, . . . , xn)] ≤ 0,
where we have used log x ≤ (x − 1) log e again. 
Exercise 2.20 
(a) Y is 1 if X = 1, which occurs with probability p1. Y is 0 otherwise. Thus 
H(Y ) = −p1 log(p1) − (1 − p1) log(1 − p1) = Hb(p1). 
(b) Given Y =1, X = 1 with probability 1, so H(X | Y =1) = 0. 
(c) Given Y=0, X=1 has probability 0, so X has M−1 possible choices with non-zero probability. The maximum entropy for an alphabet of M−1 terms is log(M−1), so H(X|Y=0) ≤ log(M−1). This upper bound is met with equality if Pr(X=j | X≠1) = 1/(M−1) for all j ≠ 1. Since Pr(X=j | X≠1) = pj/(1 − p1), this upper bound on H(X|Y=0) is achieved when p2 = p3 = · · · = pM. Combining this with part (b),
H(X|Y) = p1 H(X|Y=1) + (1−p1) H(X|Y=0) ≤ (1−p1) log(M − 1).
(d) Note that 
H(XY ) = H(Y ) + H(X|Y ) ≤ Hb(p1) + (1−p1) log(M−1) 
and this is met with equality for p2 = · · · = pM. There are now two reasonable approaches. One
is to note that H(XY ) can also be expressed as H(X) + H(Y |X). Since Y is uniquely specified 
by X, H(Y |X) = 0, 
H(X) = H(XY ) ≤ Hb(p1) + (1 − p1) log(M − 1), (3) 
with equality when p2 = p3 = · · · = pM. The other approach is to observe that H(X) ≤ H(XY ), 
which again leads to (3), but this does not immediately imply that equality is met for p2 = · · · = pM.
Equation (3) is the Fano bound of information theory; it is useful when p1 is very close to 1 and 
plays a key role in the noisy channel coding theorem. 
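A quick numerical check of (3) holding with equality when p2 = · · · = pM (an illustrative Python sketch; the values M = 5 and p1 = 0.9 are arbitrary):

    import math

    def H(pmf):
        return -sum(p * math.log2(p) for p in pmf if p > 0)

    M, p1 = 5, 0.9
    pmf = [p1] + [(1 - p1) / (M - 1)] * (M - 1)
    Hb = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))
    print(H(pmf), Hb + (1 - p1) * math.log2(M - 1))   # the two values agree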
(e) The same bound applies to each symbol by replacing p1 by pj for any j, 1 ≤ j ≤ M. Thus 
it also applies to pmax. 
Exercise 2.22: One way to generate a source code for (X1,X2,X3) is to concatenate a Huffman code for (X1,X2) with a Huffman code for X3. The expected length of the resulting code for (X1,X2,X3) is 2Lmin,2 + Lmin, so its expected length per source letter is (2Lmin,2 + Lmin)/3. The expected length per source letter of the optimal code for (X1,X2,X3) can be no worse, so
Lmin,3 ≤ (2/3)Lmin,2 + (1/3)Lmin.
Exercise 2.23: (Run Length Coding) 
(a) Let C and C′ be the codes mapping source symbols to intermediate integers and intermediate integers to output bits respectively. If C′ is uniquely decodable, then the intermediate integers
can be decoded from the received bit stream, and if C is also uniquely decodable, the original 
source bits can be decoded. 
The lengths specified for C′ satisfy the Kraft inequality and thus this code can be made prefix-free and thus
uniquely decodable. For example, mapping 8 → 1 and each other integer to 0 followed by its 3 
bit binary representation is prefix-free. 
C is a variable to fixed length code, mapping {b, ab, a2b, . . . , a7b, a8} to the integers 0 to 8. This 
set of strings forms a full prefix-free set, and thus any binary string can be parsed into these 
‘codewords’, which are then mapped to the integers 0 to 8. The integers can then be decoded 
into the ‘codewords’ which are then concatenated into the original binary sequence. In general, a 
variable to fixed length code is uniquely decodable if the encoder can parse, which is guaranteed 
if that set of ‘codewords’ is full and prefix-free. 
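A minimal sketch of this two-stage encoder and decoder (Python; the function names are ad hoc). The encoder emits the integer k as soon as a run of k a's is ended by a b (k = 0,...,7), or emits 8 when eight a's accumulate; a trailing partial run produces no output, as the scheme requires:

    def encode(source):
        out, run = [], 0
        for sym in source:
            if sym == 'a':
                run += 1
                if run == 8:
                    out.append('1')                 # integer 8: eight a's, no b yet
                    run = 0
            else:                                   # sym == 'b' ends the current run
                out.append('0' + format(run, '03b'))
                run = 0
        return ''.join(out), run                    # bits sent, unsent trailing run

    def decode(bits):
        out, i = [], 0
        while i < len(bits):
            if bits[i] == '1':
                out.append('a' * 8); i += 1
            else:
                run = int(bits[i+1:i+4], 2)
                out.append('a' * run + 'b'); i += 4
        return ''.join(out)

    bits, leftover = encode('aaabaaaaaaaaab')
    print(bits, decode(bits))                       # decoding recovers the source exactly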
(b) Each occurrence of source letter b causes 4 bits to leave the encoder immediately. In addition, each subsequent run of 8 a's causes 1 extra bit to leave the encoder. Thus, for each b, the encoder emits 4 bits with probability 1; it emits an extra bit with probability (0.9)^8; it emits yet a further bit with probability (0.9)^16, and so forth. Letting Y be the number of output bits per input b,
E[Y] = 4 + (0.9)^8 + (0.9)^16 + · · · = 4 + (0.9)^8/(1 − (0.9)^8) = 4.756.
(c) To count the number of b’s out of the source, let Bi = 1 if the ith source letter is b and 
Bi = 0 otherwise. Then E(Bi) = 0.1 and σ2B 
i = 0.09. Let AB = (1/n) 
Pn i=1 Bi be the number of 
b’s per input in a run of n = 1020 inputs. This has mean 0.1 and variance (0.9) · 10−21, which is 
close to 0.1 with very high probability. As the number of trials increase, it is closer to 0.1 with 
still higher probability. 
(d) The total number of output bits corresponding to the essentially 10^19 b's in the 10^20 source letters is with high probability close to 4.756·10^19 (1 + ε) for small ε. Thus,
L ≈ (0.1)[4 + (0.9)^8/(1 − (0.9)^8)] = 0.4756.
Renewal theory provides a more convincing way to fully justify this solution. 
Note that the achieved L is impressive considering that the entropy of the source is 
−(0.9) log(0.9) − (0.1) log(0.1) = 0.469 bits/source symbol. 
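As a sanity check on these numbers, a short Monte Carlo sketch (Python) that encodes a long iid source with Pr(a) = 0.9 by the scheme above and counts output bits per source symbol:

    import random

    random.seed(1)
    n, bits, run = 10**6, 0, 0
    for _ in range(n):
        if random.random() < 0.9:        # source letter a
            run += 1
            if run == 8:
                bits += 1; run = 0       # one bit for the integer 8
        else:                            # source letter b
            bits += 4; run = 0           # 0 followed by the 3-bit run length
    print(bits / n)                      # close to 0.4756; the source entropy is 0.469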
Exercise 2.25: 
(a) Note that W takes on the value −log(2/3) with probability 2/3 and −log(1/3) with probability 1/3. Thus E[W] = log 3 − 2/3. Note that E[W] = H(X). The fluctuation of W around its mean is −1/3 with probability 2/3 and 2/3 with probability 1/3. Thus σ²_W = 2/9.
(b) The bound on the probability of the typical set, as derived using the Chebyshev inequality, 
and stated in (2.2) is: 
Pr(Xⁿ ∈ Tⁿ_ε) ≥ 1 − σ²_W/(nε²) = 1 − 1/45.
(c) To count the number of a's out of the source, let the rv Yi(Xi) be 1 for Xi = a and 0 for Xi = b. The Yi(Xi)'s are iid with mean Ȳ = 2/3 and σ²_Y = 2/9. Na(Xⁿ) is given by
Na = ∑_{i=1}^{n} Yi(Xi),
which has mean 2n/3 and variance 2n/9.
(d) Since the n-tuple Xⁿ is iid, the sample value w(xⁿ) = ∑_i w(xi). Let na be the sample value of Na corresponding to xⁿ. Since w(a) = −log(2/3) and w(b) = −log(1/3), we have
w(xⁿ) = na(−log 2/3) + (n − na)(−log 1/3) = n log 3 − na,
W(Xⁿ) = n log 3 − Na.
In other words, W̃(Xⁿ), the fluctuation of W(Xⁿ) around its mean, is the negative of the fluctuation of Na(Xⁿ); that is, W̃(Xⁿ) = −Ña(Xⁿ).
(e) The typical set is given by:
Tⁿ_ε = {xⁿ : |w(xⁿ)/n − E[W]| < ε}
= {xⁿ : |w̃(xⁿ)/n| < ε}
= {xⁿ : |ña(xⁿ)/n| < ε}
= {xⁿ : 10^5(2/3 − ε) < na(xⁿ) < 10^5(2/3 + ε)},
where we have used w̃(xⁿ) = −ña(xⁿ). Thus, α = 10^5(2/3 − ε) and β = 10^5(2/3 + ε).
(f) From part (c), N̄a = 2n/3 and σ²_{Na} = 2n/9.
The CLT says that for n large, the sum of n iid random variables (rvs) has a distribution function close to Gaussian within several standard deviations from the mean. As n increases, the range and accuracy of the approximation increase. In this case, α and β are 10³ below and above the mean respectively. The standard deviation is √(2·10^5/9), so α and β are about 6.7 standard deviations from the mean. The probability that a Gaussian rv is more than 6.7 standard deviations from the mean is about (1.6)·10^{−10}.
This is not intended as an accurate approximation, but only to demonstrate the weakness of the 
Chebyshev bound, which is useful in bounding but not for numerical approximation. 
Exercise 2.26: Any particular string xⁿ which has i a's and n−i b's has probability (2/3)^i (1/3)^{n−i}. This is maximized when i = n = 10^5, and the corresponding probability is about 10^{−17,609}. Those strings with a single b have a probability 1/2 as large, and those with 2 b's have a probability 1/4 as large. Since there are C(n, i) different sequences that have exactly i a's and n − i b's,
Pr{Na = i} = C(n, i) (2/3)^i (1/3)^{n−i}.
Evaluating for i = n, n−1, and n−2 for n = 10^5:
Pr{Na = n} = (2/3)^n ≈ 10^{−17,609}
Pr{Na = n−1} = 10^5 (2/3)^{n−1} (1/3) ≈ 10^{−17,604}
Pr{Na = n−2} = C(10^5, 2) (2/3)^{n−2} (1/3)² ≈ 10^{−17,600}.
What this says is that the probability of any given string with na a's decreases as na decreases, while the aggregate probability of all strings with na a's increases as na decreases (for na large compared to N̄a). We saw in the previous exercise that the typical set is the set where na is close to N̄a, and we now see that the most probable individual strings have fantastically small probability, both individually and in the aggregate, and thus can be ignored.
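These exponents are easy to reproduce exactly in log10 form (an illustrative Python sketch):

    from math import lgamma, log10, log

    def log10_binom(n, k):
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(10)

    n = 10**5
    for i in (n, n - 1, n - 2):
        lp = log10_binom(n, i) + i * log10(2/3) + (n - i) * log10(1/3)
        print(i, round(lp, 1))          # about -17609, -17604, -17600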
Exercise 2.28: 
(a) The probability of an n-tuple xⁿ = (x1, . . . , xn) is pXn(xⁿ) = ∏_{k=1}^{n} pX(xk). This product includes Nj(xⁿ) terms xk for which xk is the letter j, and this is true for each j in the alphabet. Thus
pXn(xⁿ) = ∏_{j=1}^{M} pj^{Nj(xⁿ)}.   (4)
(b) Taking the log of (4),
−log pXn(xⁿ) = ∑_j Nj(xⁿ) log(1/pj).   (5)
Using the definition of Sⁿ_ε, all xⁿ ∈ Sⁿ_ε must satisfy
∑_j n pj(1−ε) log(1/pj) < ∑_j Nj(xⁿ) log(1/pj) < ∑_j n pj(1+ε) log(1/pj),
i.e.,
nH(X)(1−ε) < ∑_j Nj(xⁿ) log(1/pj) < nH(X)(1+ε).
Combining this with (5), every xⁿ ∈ Sⁿ_ε satisfies
H(X)(1−ε) < [−log pXn(xⁿ)]/n < H(X)(1+ε).   (6)
(c) With ε′ = H(X)ε, (6) shows that for all xⁿ ∈ Sⁿ_ε,
H(X) − ε′ < [−log pXn(xⁿ)]/n < H(X) + ε′.
By (2.25) in the text, this is the defining equation of Tⁿ_{ε′}, so all xⁿ in Sⁿ_ε are also in Tⁿ_{ε′}.
(d) For each j in the alphabet, the WLLN says that for any given ε > 0 and δ > 0, and for all sufficiently large n,
Pr(|Nj(xⁿ)/n − pj| ≥ ε) ≤ δ/M.   (7)
For all sufficiently large n, (7) is satisfied for all j, 1 ≤ j ≤ M. For all such large enough n, each xⁿ is either in Sⁿ_ε or is a member of the event that |Nj(xⁿ)/n − pj| ≥ ε for some j. The probability of the union of the events that |Nj(xⁿ)/n − pj| ≥ ε for some j is upper bounded by δ, so Pr(Sⁿ_ε) ≥ 1 − δ.
(e) The proof here is exactly the same as that of Theorem 2.7.1. Part (b) gives upper and lower bounds on Pr(xⁿ) for xⁿ ∈ Sⁿ_ε, and (d) shows that 1 − δ ≤ Pr(Sⁿ_ε) ≤ 1, which together give the desired bounds on the number of elements in Sⁿ_ε.
Exercise 2.30: 
(a) First note that the chain is ergodic (i.e., it is aperiodic and all states can be reached from all other states). Thus steady-state probabilities q(s) exist and satisfy the equations ∑_s q(s) = 1 and q(s) = ∑_{s′} q(s′) Q(s|s′). For the given chain, these latter equations are
q(1) = q(1)(1/2) + q(2)(1/2) + q(4)(1)
q(2) = q(1)(1/2)
q(3) = q(2)(1/2)
q(4) = q(3)(1).
Solving by inspection, q(1) = 1/2, q(2) = 1/4, and q(3) = q(4) = 1/8. 
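The steady-state vector can also be found numerically. A short sketch (Python/numpy) using the transition probabilities implied by the balance equations above (states 1-4 are indexed 0-3):

    import numpy as np

    Q = np.array([[0.5, 0.5, 0.0, 0.0],    # out of state 1
                  [0.5, 0.0, 0.5, 0.0],    # out of state 2
                  [0.0, 0.0, 0.0, 1.0],    # out of state 3
                  [1.0, 0.0, 0.0, 0.0]])   # out of state 4
    # q Q = q with sum(q) = 1: take the left eigenvector for eigenvalue 1.
    w, v = np.linalg.eig(Q.T)
    q = np.real(v[:, np.argmin(abs(w - 1))])
    print(q / q.sum())                     # [0.5, 0.25, 0.125, 0.125]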
(b) To calculate H(X1) we first calculate the pmf pX1(x) for each x ∈ X. Using the steady-state probabilities q(s) for S0, we have pX1(x) = ∑_s q(s) Pr{X1=x | S0=s}. Since X1=a occurs with probability 1/2 from both S0=1 and S0=2 and occurs with probability 1 from S0=4,
pX1(a) = q(1)(1/2) + q(2)(1/2) + q(4) = 1/2.
Similarly, pX1(b) = pX1(c) = 1/4. Hence the pmf of X1 is {1/2, 1/4, 1/4} and H(X1) = 3/2.
(c) The pmf of X1 conditioned on S0 = 1 is {1/2, 1/2}. Hence, H(X1|S0=1) = 1. Similarly, H(X1|S0=2) = 1. There is no uncertainty from states 3 and 4, so H(X1|S0=3) = H(X1|S0=4) = 0.
Since H(X1|S0) is defined as ∑_s Pr(S0=s) H(X1|S0=s), we have
H(X1|S0) = q(1)H(X1|S0=1) + q(2)H(X1|S0=2) = 3/4,
which is less than H(X1) as expected. 
(d) We can achieve L = H(X1|S0) by achieving L(s) = H(X1|s) for each state s ∈ S. To do 
that, we use an optimal prefix-free code for each state. 
For S0 = 1, the code {a → 0, b → 1} is optimal with L(S0=1) = 1 = H(X1|S0=1). 
Similarly, for S0=2 {a → 0, c → 1} is optimal with L(S0=2) = 1 = H(X1|S0=2). 
Since H(X1|S0=3) = H(X1|S0=4) = 0, we do not use any code at all for the states 3 and 4. 
In other words, our encoder does not transmit any bits for symbols that result from transitions 
from these states. 
Now we explain why the decoder can track the state after time 0. The decoder is assumed 
to know the initial state. When in states 1 or 2, the next codeword from the corresponding 
prefix-free code uniquely determines the next state. When state 3 is entered, the next state
must be 4 since there is a single deterministic transition out of state 3 that goes to state 4 (and 
this is known without receiving the next codeword). Similarly, when state 4 is entered, the next 
state must be 1. When states 3 or 4 are entered, the next received codeword corresponds to the 
subsequent transition out of state 1. In this manner, the decoder can keep track of the state. 
(e) The question is slightly ambiguous. The intended meaning is how many source symbols x1, x2, . . . , xk must be observed before the new state sk is known, but one could possibly interpret it as determining the initial state s0.
To determine the new state, note that the symbol a always drives the chain to state 1 and the symbol b always drives it to state 2. The symbol c, however, could lead to either state 3 or 4. In this case, the subsequent symbol could be c, leading to state 4 with certainty, or could be a, leading to state 1. Thus at most 2 symbols are needed to determine the new state.
Determining the initial state, on the other hand, is not always possible. The symbol a could 
come from states 1, 2, or 4, and no future symbols can resolve this ambiguity. 
A more interesting problem is to determine the state, and thus to start decoding correctly, when 
the initial state is unknown at the decoder. For the code above, this is easy, since whenever 
a 0 appears in the encoded stream, the corresponding symbol is a and the next state is 1,
permitting correct decoding from then on. This problem, known as the synchronizing problem, 
is quite challenging even for memoryless sources. 
Exercise 2.31: We know from (2.37) in the text that H(XY ) = H(Y ) + H(X | Y ) for any 
random symbols X and Y . For any k-tuple X1, . . . ,Xk of random symbols, we can view Xk 
as the symbol X above and view the k − 1 tuple Xk−1,Xk−2, . . . ,X1 as the symbol Y above, 
getting 
H(Xk,Xk−1 . . . ,X1) = H(Xk | Xk−1, . . . ,X1) + H(Xk−1, . . . ,X1). 
Since this expresses the entropy of each k-tuple in terms of a k−1-tuple, we can iterate, getting 
H(Xn, Xn−1, . . . , X1) = H(Xn | Xn−1, . . . , X1) + H(Xn−1, . . . , X1)
= H(Xn | Xn−1, . . . , X1) + H(Xn−1 | Xn−2, . . . , X1) + H(Xn−2, . . . , X1)
= · · · = ∑_{k=2}^{n} H(Xk | Xk−1, . . . , X1) + H(X1).
Exercise 2.32: 
(a) We must show that H(S2|S1S0) = H(S2|S1). Viewing the pair of random symbols S1S0 as a 
random symbol in its own right, the definition of conditional entropy is 
H(S2|S1S0) = ∑_{s1,s0} Pr(S1=s1, S0=s0) H(S2|S1=s1, S0=s0)
= ∑_{s1s0} Pr(s1s0) H(S2|s1s0),   (8)
where we will use the above abbreviations throughout for clarity. By the Markov property, 
Pr(S2=s2|s1s0) = Pr(S2=s2|s1) for all symbols s0, s1, s2. Thus 
H(S2|s1s0) = ∑_{s2} −Pr(S2=s2|s1s0) log Pr(S2=s2|s1s0)
= ∑_{s2} −Pr(S2=s2|s1) log Pr(S2=s2|s1) = H(S2|s1).
Substituting this in (8), we get 
H(S2|S1S0) = ∑_{s1s0} Pr(s1s0) H(S2|s1)
= ∑_{s1} Pr(s1) H(S2|s1) = H(S2|S1).   (9)
(b) Using the result of Exercise 2.31, 
H(S0, S1, . . . , Sn) = ∑_{k=1}^{n} H(Sk | Sk−1, . . . , S0) + H(S0).
Viewing S0 as one symbol and the n-tuple S1, . . . , Sn as another, 
H(S0, . . . , Sn) = H(S1, . . . , Sn | S0) + H(S0). 
Combining these two equations, 
H(S1, . . . , Sn | S0) = ∑_{k=1}^{n} H(Sk | Sk−1, . . . , S0).   (10)
Applying the same argument as in part (a), we see that 
H(Sk | Sk−1, . . . , S0) = H(Sk | Sk−1). 
Substituting this into (10), 
H(S1, . . . , Sn | S0) = ∑_{k=1}^{n} H(Sk | Sk−1).
(c) If the chain starts in steady state, each successive state has the same steady state pmf, so 
each of the terms above are the same and 
H(S1, . . . , Sn|S0) = nH(S1|S0). 
(d) By definition of a Markov source, the state S0 and the next source symbol X1 uniquely 
determine the next state S1 (and vice-versa). Also, given state S1, the next symbol X2 uniquely 
determines the next state S2. Thus, Pr(x1x2|s0) = Pr(s1s2|s0) where x1x2 are the sample 
values of X1X2 in one-to-one correspondence to the sample values s1s2 of S1S2, all conditional 
on S0 = s0. 
Hence the joint pmf of X1X2 conditioned on S0=s0 is the same as the joint pmf for S1S2 
conditioned on S0=s0. The result follows. 
(e) Combining the results of (c) and (d) verifies (2.40) in the text. 
Exercise 2.33: Lempel-Ziv parsing of the given string can be done as follows: 
Step 1: window 00011101, next parsed block 001: match at u = 7, length n = 3.
Step 2: window 00011101001, next parsed block 0101: match at u = 2, length n = 4.
Step 3: window 000111010010101, next parsed block 100: match at u = 8, length n = 3.
The string is parsed in three steps; in each step the parsed block immediately follows the window shown. The (n, u) pairs resulting from these steps are respectively (3,7), (4,2), and (3,8).
Using the unary-binary code for n, which maps 3 → 011 and 4 → 00100, and a standard 3-bit 
map for u, 1 ≤ u ≤ 8, the encoded sequence is 011, 111, 00100, 010, 011, 000 (transmitted without 
commas). 
Note that for small examples, as in this case, LZ77 may not be very efficient. In general, the 
algorithm requires much larger window sizes to compress efficiently. 
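The parse can be checked by running the decoder's copy rule on the (n, u) pairs, starting from the initial window (a short Python sketch; copies may run past the end of the window, as in step 2):

    def lz77_extend(window, pairs):
        s = list(window)
        for n, u in pairs:
            start = len(s) - u
            for k in range(n):
                s.append(s[start + k])
        return ''.join(s)

    print(lz77_extend('00011101', [(3, 7), (4, 2), (3, 8)]))
    # -> 000111010010101100, the original string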
Chapter 3 
Exercise 3.3: 
(a) Given a1 and a2, the Lloyd-Max conditions assert that b should be chosen half way between 
them, i.e., b = (a1+a2)/2. This insures that all points are mapped into the closest quantization 
point. If the probability density is zero in some region around (a1 + a2)/2, then it makes no 
difference where b is chosen within this region, since those points can not affect the MSE. 
(b) Note that y(x)/Q(x) is the expected value of U conditional on U ≥ x. Thus, given b, the MMSE choice for a2 is y(b)/Q(b). Similarly, a1 is (E[U] − y(b))/(1 − Q(b)). Using the symmetry condition, E[U] = 0, so
a1 = −y(b)/(1 − Q(b)),   a2 = y(b)/Q(b).   (11)
(c) Because of the symmetry,
Q(0) = ∫_{0}^{∞} f(u) du = ∫_{0}^{∞} f(−u) du = ∫_{−∞}^{0} f(u) du = 1 − Q(0).
This implicitly assumes that there is no impulse of probability density at the origin, since such
an impulse would cause the integrals to be ill-defined. Thus, with b = 0, (11) implies that 
a1 = −a2. 
(d) Part (c) shows that for b = 0, a1 = −a2 satisfies step 2 in the Lloyd-Max algorithm, and 
then b = 0 = (a1 + a2)/2 then satisfies step 3. 
(e) The solution in part (d) for the density below is b = 0, a1 = −2/3, and a2 = 2/3. Another 
solution is a2 = 1, a1 = −1/2 and b = 1/3. The final solution is the mirror image of the second, 
namely a1 = −1, a2 = 1/2, and b = −1/3. 
[Figure: f(u) consists of three rectangular pulses, each of width ε and height 1/(3ε), centered at −1, 0, and 1.]
(f) The MSE for the first solution above (b = 0) is 2/9. That for each of the other solutions 
is 1/6. These latter two solutions are optimal. On reflection, choosing the separation point b 
in the middle of one of the probability pulses seems like a bad idea, but the main point of the 
problem is that finding optimal solutions to MSE problems is often messy, despite the apparent 
simplicity of the Lloyd-Max algorithm. 
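Treating the three narrow pulses as point masses of probability 1/3 at −1, 0, and 1, the MSEs quoted above are easy to check directly (an illustrative Python sketch):

    def mse(b, a1, a2):
        # each mass goes to a1 if it lies at or below the threshold b, else to a2
        return sum((1/3) * (x - (a1 if x <= b else a2))**2 for x in (-1.0, 0.0, 1.0))

    print(mse(0.0, -2/3, 2/3))    # first solution:  2/9 ~ 0.222
    print(mse(1/3, -1/2, 1.0))    # second solution: 1/6 ~ 0.167
    print(mse(-1/3, -1.0, 1/2))   # mirror image:    1/6 ~ 0.167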
Exercise 3.4: 
(a) Using the hint, we minimize 
MSE(Δ1,Δ2) + λ f(Δ1,Δ2) = (1/12)[Δ1² f1 L1 + Δ2² f2 L2] + λ [L1/Δ1 + L2/Δ2]
over both Δ1 and Δ2. The function is convex over Δ1 and Δ2, so we simply take the derivative with respect to each and set it equal to 0, i.e.,
(1/6) Δ1 f1 L1 − λ L1/Δ1² = 0;   (1/6) Δ2 f2 L2 − λ L2/Δ2² = 0.
Rearranging,
6λ = Δ1³ f1 = Δ2³ f2,
which means that for each choice of λ, Δ1 f1^{1/3} = Δ2 f2^{1/3}.
(b) We see from part (a) that Δ1/Δ2 = (f2/f1)^{1/3} is fixed independent of M. Holding this ratio fixed, MSE is proportional to Δ1² and M is proportional to 1/Δ1. Thus M²·MSE is independent of Δ1 (for the fixed ratio Δ1/Δ2).
M²·MSE = (1/12)[f1 L1 + (Δ2²/Δ1²) f2 L2] [L1 + L2 Δ1/Δ2]²
= (1/12)[f1 L1 + f1^{2/3} f2^{1/3} L2] [L1 + L2 f2^{1/3}/f1^{1/3}]²
= (1/12)[f1^{1/3} L1 + f2^{1/3} L2]³.
(c) If the algorithm starts with M1 points uniformly spaced over the first region and M2 points 
uniformly spaced in the second region, then it is in equilibrium and never changes. 
(d) If the algorithm starts with one additional point in the central region of zero probability 
density, and if that point is more than Δ1/2 away from region 1 and Δ2/2 away from region
2, then the central point is unused (with probability 1). Since the conditional mean over the 
region mapping into that central point is not well defined, it is not clear what the algorithm 
will do. If it views that conditional mean as being in the center of the region mapped into that 
point, then the algorithm is in equilibrium. The point of parts (c) and (d) is to point out that
the Lloyd-Max algorithm is not very good at finding a global optimum. 
(e) The probability that the sample point lies in region j (j = 1, 2) is fjLj. The mean squared error, using Mj points in region j and conditional on lying in region j, is Lj²/(12Mj²). Thus, the MSE with Mj points in region j is
MSE = f1L1³/(12M1²) + f2L2³/(12M2²).
This can be minimized numerically over integer M1 subject to M1 + M2 = M. This was 
minimized in part (b) without the integer constraint, and thus the solution here is slightly 
larger than that there, except in the special cases where the non-integer solution happens to be 
integer. 
(f) With given Δ1 and Δ2, the probability of each point in region j, j = 1, 2, is fjΔj and the 
number of such points is Lj/Δj (assumed to be integer). Thus the entropy is 
H(V) = (L1/Δ1)(f1Δ1) ln[1/(f1Δ1)] + (L2/Δ2)(f2Δ2) ln[1/(f2Δ2)] = −L1 f1 ln(f1Δ1) − L2 f2 ln(f2Δ2).
(g) We use the same Lagrange multiplier approach as in part (a), now using the entropy H(V ) 
as the constraint. 
MSE(Δ1,Δ2) + λ H(Δ1,Δ2) = (1/12)[Δ1² f1 L1 + Δ2² f2 L2] − λ f1 L1 ln(f1Δ1) − λ f2 L2 ln(f2Δ2).
Setting the derivatives with respect to Δ1 and Δ2 equal to zero,
(1/6) Δ1 f1 L1 − λ f1 L1/Δ1 = 0;   (1/6) Δ2 f2 L2 − λ f2 L2/Δ2 = 0.
This leads to 6λ = Δ1² = Δ2², so that Δ1 = Δ2. This is the same type of approximation as before since it ignores the constraint that L1/Δ1 and L2/Δ2 must be integers.
Exercise 3.6: 
(a) The probability of the quantization region R is A = Δ(1/2 + x + Δ/2). To simplify the algebraic messiness, shift U to U − x − Δ/2, which, conditional on R, lies in [−Δ/2, Δ/2]. Let Y denote this shifted conditional variable. As shown below, fY(y) = (1/A)[y + (x + 1/2 + Δ/2)].
E[Y] = ∫_{−Δ/2}^{Δ/2} (y/A)[y + (x + 1/2 + Δ/2)] dy = ∫_{−Δ/2}^{Δ/2} (y²/A) dy + ∫_{−Δ/2}^{Δ/2} (y/A)[x + 1/2 + Δ/2] dy = Δ³/(12A),
since, by symmetry, the final integral above is 0. 
[Figure: fU(u) = u + 1/2 is a ramp of slope 1; over the region R of width Δ starting at x, the shifted conditional density is fY(y) = (1/A)(y + x + 1/2 + Δ/2) for −Δ/2 ≤ y ≤ Δ/2.]
Since Y is the shift of U conditioned on R, 
E[U|R] = x + Δ/2 + E[Y] = x + Δ/2 + Δ³/(12A).
That is, the conditional mean is slightly larger than the center of the region R because of the 
increasing density in the region. 
(b) Since the variance of a rv is invariant to shifts, MSE = σ²_{U|R} = σ²_Y. Also, note from symmetry that ∫_{−Δ/2}^{Δ/2} y³ dy = 0. Thus
E[Y²] = ∫_{−Δ/2}^{Δ/2} (y²/A)[y + (x + 1/2 + Δ/2)] dy = [(x + 1/2 + Δ/2)/A] (Δ³/12) = Δ²/12.
MSE = σ²_Y = E[Y²] − (E[Y])² = Δ²/12 − [Δ³/(12A)]².
MSE − Δ²/12 = −[Δ³/(12A)]² = −Δ⁴/[144(x + 1/2 + Δ/2)²].
(c) The quantizer output V is a discrete random variable whose entropy H[V] is
H[V] = ∑_{j=1}^{M} ∫_{(j−1)Δ}^{jΔ} −fU(u) log[f̄(u)Δ] du = ∫_{0}^{1} −fU(u) log[f̄(u)] du − log Δ,
where f̄(u) denotes the average of fU over the quantization interval containing u, and the differential entropy of U is by definition
h[U] = ∫_{0}^{1} −fU(u) log[fU(u)] du.
Thus,
h[U] − log Δ − H[V] = ∫_{0}^{1} fU(u) log[f̄(u)/fU(u)] du.
(d) Using the inequality ln x ≤ x − 1,
∫_{0}^{1} fU(u) log[f̄(u)/fU(u)] du ≤ log e ∫_{0}^{1} fU(u)[f̄(u)/fU(u) − 1] du = log e [∫_{0}^{1} f̄(u) du − ∫_{0}^{1} fU(u) du] = 0.
Thus, the difference h[U] − logΔ − H[V ] is non-positive (not non-negative). 
(e) Approximating ln x by (x−1) − (x−1)²/2 for x = f̄(u)/fU(u), and recognizing from part (d) that the integral for the linear term is 0, we get
∫_{0}^{1} fU(u) log[f̄(u)/fU(u)] du ≈ −(1/2) log e ∫_{0}^{1} fU(u)[f̄(u)/fU(u) − 1]² du   (12)
= −(1/2) log e ∫_{0}^{1} [f̄(u) − fU(u)]²/fU(u) du.   (13)
Now fU(u) varies by at most Δ over any single region, and f̄(u) lies between the minimum and maximum of fU(u) in that region. Thus |fU(u) − f̄(u)| ≤ Δ. Since fU(u) ≥ 1/2, the integrand above is at most 2Δ², so the right side of (13) is at most Δ² log e in magnitude.
Exercise 3.7: 
(a) Note that 1/[u(ln u)²] is the derivative of −1/ln u and thus integrates to 1 over the given interval.
(b)
h(U) = ∫_{e}^{∞} (1/[u(ln u)²]) [ln u + 2 ln(ln u)] du = ∫_{e}^{∞} du/(u ln u) + ∫_{e}^{∞} [2 ln(ln u)/(u(ln u)²)] du.
The first integrand above is the derivative of ln(ln u) and thus the integral is infinite. The second 
integrand is positive for large enough u, and therefore h(U) is infinite. 
(c) The hint establishes the result directly. 
Exercise 3.8: 
(a) As suggested in the hint² (and using common sense in any region where f(x) = 0),
−D(f‖g) = ∫ f(x) ln[g(x)/f(x)] dx ≤ ∫ f(x)[g(x)/f(x) − 1] dx = ∫ g(x) dx − ∫ f(x) dx = 0.
Thus D(f‖g) ≥ 0.
(b)
D(f‖φ) = ∫ f(x) ln[f(x)/φ(x)] dx = −h(f) + ∫ f(x)[ln√(2πσ²) + x²/(2σ²)] dx = −h(f) + ln√(2πeσ²).
(c) Combining parts (a) and (b), h(f) ≤ ln√(2πeσ²). Since D(φ‖φ) = 0, this inequality is satisfied with equality for a Gaussian rv ∼ N(0, σ²).
Exercise 3.9: 
(a) For the same reason as for sources with probability densities, each representation point aj 
must be chosen as the conditional mean of the set of symbols in Rj . Specifically, 
aj = (∑_{i∈Rj} pi ri) / (∑_{i∈Rj} pi).
²A useful feature of divergence is that it exists whether or not a density exists; it can be defined over any
quantization of the sample space and it increases as the quantization becomes finer, thus approaching a limit 
(which might be finite or infinite). 
(b) The symbol ri has a squared error |ri − aj |2 if mapped into Rj and thus into aj . Thus ri 
must be mapped into the closest aj and thus the region Rj must contain all source symbols 
that are closer to aj than to any other representation point. The quantization intervals are not 
uniquely determined by this rule since Rj can end and Rj+1 can begin at any point between 
the largest source symbol closest to aj and the smallest source symbol closest to aj+1. 
(c) For ri midway between aj and aj+1, the squared error is |ri − aj |2 = |ri − aj+1|2 no matter 
whether ri is mapped into aj or aj+1. 
(d) In order for the case of part (c) to achieve MMSE, it is necessary for aj and aj+1 to each 
be the conditional mean of the set of points in the corresponding region. Now assume that aj 
is the conditional mean of Rj under the assumption that ri is part of Rj . Switching ri to Rj+1 
will not change the MSE (as seen in part (c)), but it will change Rj and will thus change the 
conditional mean of Rj . Moving aj to that new conditional mean will reduce the MSE. The 
same argument applies if ri is viewed as being in Rj+1 or even if it is viewed as being partly in 
Rj and partly in Rj+1. 
Chapter 4 
Exercise 4.2: 
From (4.1) in the text, we have u(t) = ∑_{k=−∞}^{∞} ûk e^{2πikt/T} for t ∈ [−T/2, T/2]. Substituting this into ∫_{−T/2}^{T/2} u(t)u*(t) dt, we have
∫_{−T/2}^{T/2} |u(t)|² dt = ∫_{−T/2}^{T/2} ∑_{k=−∞}^{∞} ûk e^{2πikt/T} ∑_{ℓ=−∞}^{∞} û*ℓ e^{−2πiℓt/T} dt
= ∑_{k=−∞}^{∞} ∑_{ℓ=−∞}^{∞} ûk û*ℓ ∫_{−T/2}^{T/2} e^{2πi(k−ℓ)t/T} dt
= ∑_{k=−∞}^{∞} ∑_{ℓ=−∞}^{∞} ûk û*ℓ T δk,ℓ,
where δk,ℓ equals 1 if k = ℓ and 0 otherwise. Thus,
∫_{−T/2}^{T/2} |u(t)|² dt = T ∑_{k=−∞}^{∞} |ûk|².
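The relation can be checked numerically by approximating the Fourier series coefficients with a DFT (a short Python/numpy sketch; the test function is arbitrary):

    import numpy as np

    T, N = 2.0, 4096
    t = -T/2 + T * np.arange(N) / N
    u = np.exp(-t**2) * (1 + 1j * t)        # an arbitrary complex test function
    uhat = np.fft.fft(u) / N                # |uhat[k]| approximates |û_k|
    lhs = np.sum(np.abs(u)**2) * (T / N)    # approximates the energy integral over [-T/2, T/2]
    rhs = T * np.sum(np.abs(uhat)**2)
    print(lhs, rhs)                         # the two values agree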
Exercise 4.4: 
(a) Note that sa(k) − sa(k − 1) = ak ≥ 0, so the sequence sa(1), sa(2), . . . , is non-decreasing. 
A standard result in elementary analysis states that a bounded non-decreasing sequence must 
have a limit. The limit is the least upper bound of the sequence {sa(k); k ≥ 1}. 
(b) Let Jk = max{j(1), j(2), . . . , j(k)}, i.e., Jk is the largest index in aj(1), . . . , aj(k). Then
∑_{ℓ=1}^{k} bℓ = ∑_{ℓ=1}^{k} aj(ℓ) ≤ ∑_{j=1}^{Jk} aj ≤ Sa.
By the same argument as in part (a), ∑_{ℓ=1}^{k} bℓ has a limit as k → ∞ and the limit, say Sb, is at most Sa.
(c) Using the inverse permutation to define the sequence {ak} from the sequence {bk}, the same 
argument as in part (b) shows that Sa ≤ Sb. Thus Sa = Sb and the limit is independent of the 
order of summation. 
(d) The simplest example is the sequence {1,−1, 1,−1, . . . }. The partial sums here alternate between 1 and 0, so do not converge at all. Also, in a sequence taking two odd terms for each even term, the partial sums go to ∞. A more common (but complicated) example is the alternating harmonic series. This converges (to ln 2), but taking two odd terms for each even term, the series approaches ∞.
Exercise 4.5: 
(a) For E = I1 ∪I2, with the left end points satisfying a1 ≤ a2, there are three cases to consider. 
• a2 < b1. In this case, all points in I1 and all points in I2 lie between a1 and max{b1, b2}. 
Conversely all points in (a1, max{b1, b2}) lie in either I1 or I2. Thus E is a single interval 
which might or might not include each end point. 
• a2 > b1. In this case, I1 and I2 are disjoint. 
• a2 = b1. If I1 is open on the right and I2 is open on the left, then I1 and I2 are separated 
by the single point a2 = b1. Otherwise E is a single interval. 
(b) Let Ek = I1∪I2∪· · ·∪Ik and let Jk be the final interval in the separated interval representation 
of Ek. We have seen how to find J2 from E2 and note that the starting point of J2 is either a1 
or a2. Assume that in general the starting point of Jk is aj for some j, 1 ≤ j ≤ k. 
Assuming that the starting points are ordered a1 ≤ a2 ≤ · · · , we see that ak+1 is greater than 
or equal to the starting point of Jk. Thus Jk ∪ Ik+1 is either a single interval or two separated 
intervals by the argument in part (a). Thus Ek+1, in separated interval form, is Ek, with Jk 
replaced either by two separated intervals, the latter starting with ak+1, or by a single interval 
starting with the same starting point as Jk. Either way the starting point of Jk+1 is aj for some 
j, 1 ≤ j ≤ k+1, verifying the initial assumption by induction. 
(c) Each interval Jk created above starts with an interval starting point a1, . . . , and ends with 
an interval ending point b1, . . . , and therefore all the separated intervals start and end with such 
points. 
(d) Let I′1 ∪ · · · ∪ I′ℓ be the union of disjoint intervals arising from the above algorithm and let I″1 ∪ · · · ∪ I″i be any other ordered union of separated intervals. Let k be the smallest integer for which I′k ≠ I″k. Then the starting points or the ending points of these intervals are different, or one of the two intervals is open and one closed on one end. In all of these cases, there is at least one point that is in one of the unions and not the other.
Exercise 4.6: 
(a) If we assume that the intervals {Ij ; 1 ≤ j < ∞} are ordered in terms of starting points, then the argument in Exercise 4.5 immediately shows that the set of separated intervals stays the same as each new interval Ik+1 is added, except for the possible addition of a new interval at the right or the expansion of the right-most interval. However, with a countably infinite set of intervals, it is not necessarily possible to order the intervals in terms of starting points (e.g., suppose the left end points are the set of rationals in (0,1)). However, in the general case, in going from Bk to Bk+1, a single interval Ik+1 is added to Bk. This can add a new separated interval, or extend one of the existing separated intervals, or combine two or more adjacent separated intervals. In each of these cases, each of the separated intervals in Bk (including Ij,k) either stays the same or is expanded. Thus Ij,k ⊆ Ij,k+1.
(b) Since Ij,k ⊆ Ij,k+1, the left end points of the sequence {Ij,k; k ≥ j} form a monotonic decreasing sequence and thus have a limit (including the possibility of −∞). Similarly the right end points are monotonically increasing, and thus have a limit (possibly +∞). Thus limk→∞ Ij,k exists as an interval I′j that might be infinite on either end. Note now that any point in the interior of I′j must be in Ij,k for some k. The same is true for the left (right) end point of I′j if I′j is closed on the left (right). Thus I′j must be in B for each j.
(c) From Exercise 4.5, we know that for each k ≥ 1, the set of intervals {I1,k, I2,k, . . . , Ik,k} is a separated set whose union is Bk. Thus, for each ℓ, j ≤ k, either Iℓ,k = Ij,k or Iℓ,k and Ij,k are separated. If Iℓ,k = Ij,k, then the fact that Ij,k ⊆ Ij,k+1 ensures that Iℓ,k+1 = Ij,k+1, and thus, in the limit, I′ℓ = I′j. If Iℓ,k and Ij,k are separated, then, as explained in part (a), the addition of Ik+1 either maintains the separation or combines Iℓ,k and Ij,k into a single interval. Thus, as k increases, either Iℓ,k and Ij,k remain separated or become equal.
(d) The sequence {I′j ; j ≥ 1} is countable, and after removing repetitions it is still countable. It is a separated sequence of intervals from (c). From (b), ∪_{j=1}^{∞} I′j ⊆ B. Also, since B = ∪j Ij ⊆ ∪j I′j, we see that B = ∪j I′j.
(e) Let {I′j ; j ≥ 1} be the above sequence of separated intervals and let {I″j ; j ≥ 1} be any other sequence of separated intervals such that ∪j I″j = B. For each j ≥ 1, let c′j be the center point of I′j. Since c′j is in B, c′j ∈ I″k for some k ≥ 1. Assume first that I′j is open on the left. Letting a′j be the left end point of I′j, the interval (a′j, c′j] must be contained in I″k. Since a′j ∉ B, a′j must be the left end point of I″k and I″k must be open on the left. Similarly, if I′j is closed on the left, a′j is the left end point of I″k and I″k is closed on the left. Using the same analysis on the right end point of I′j, we see that I′j = I″k. Thus the sequence {I″j ; j ≥ 1} contains each interval in {I′j ; j ≥ 1}. The same analysis applied to each interval in {I″j ; j ≥ 1} shows that {I′j ; j ≥ 1} contains each interval in {I″j ; j ≥ 1}, and thus the two sequences are the same except for possibly different orderings.
Exercise 4.7: 
(a) and (b) For any finite unions of intervals E1 and E2, (4.87) in the text states that 
μ(E1) + μ(E2) = μ(E1 ∪ E2) + μ(E1 ∩ E2) ≥ μ(E1 ∪ E2), 
where the final inequality follows from the non-negativity of measure and is satisfied with equality 
if E1 and E2 are disjoint. For part (a), let I1 = E1 and I2 = E2 and for part (b), let Bk = E1 and 
Ik+1 = E2. 
(c) For k = 2, part (a) shows that μ(Bk) ≤ μ(I1) + μ(I2). Using this as the initial step of the induction and using part (b) for the inductive step shows that μ(Bk) ≤ ∑_{j=1}^{k} μ(Ij), with equality in the disjoint case.
(d) First assume that μ(B) is finite (this is always the case for measure over the interval 
[−T/2, T/2]). Then since Bk is non-decreasing in k, 
μ(B) = lim_{k→∞} μ(Bk) ≤ lim_{k→∞} ∑_{j=1}^{k} μ(Ij).
Alternatively, if μ(B) = ∞, then lim_{k→∞} ∑_{j=1}^{k} μ(Ij) = ∞ also.
Exercise 4.8: Let Bn = ∪_{j=1}^{∞} In,j. Then B = ∪_{n,j} In,j. The collection of intervals {In,j ; n ≥ 1, j ≥ 1} is a countable collection of intervals since the set of pairs of positive integers is countable.
Exercise 4.12: 
(a) By combining parts (a) and (c) of Exercise 4.11, {t : u(t) > β} is measurable for all β. 
Thus, {t : −u(t) < −β} is measurable for all β, so −u(t) is measurable. Next, for β > 0, 
{t : |u(t)| < β} = {t : u(t) < β} ∩ {t : u(t) > −β}, which is measurable. 
(b) {t : u(t) < β} = {t : g(u(t)) < g(β)}, so if u(t) is measurable, then g(u(t)) is also.
(c) Since exp(·) is increasing, exp[u(t)] is measurable by part (b). Part (a) shows that |u(t)| is measurable if u(t) is. Both the squaring function and the log function are increasing for positive values, so u²(t) = |u(t)|² and log(|u(t)|) are measurable.
Exercise 4.13: 
(a) Let y(t) = u(t) + v(t). We will show that {t : y(t) < β} is measurable for all real β. Let ε > 0 be arbitrary and k ∈ Z be arbitrary. Then, for any given t,
(k−1)ε ≤ u(t) < kε and v(t) < β − kε   =⇒   y(t) < β.
This means that the set of t for which the left side holds is included in the set of t for which the right side holds, so
{t : (k−1)ε ≤ u(t) < kε} ∩ {t : v(t) < β − kε} ⊆ {t : y(t) < β}.
This subset inequality holds for each integer k and thus must hold for the union over k,
∪_k [ {t : (k−1)ε ≤ u(t) < kε} ∩ {t : v(t) < β − kε} ] ⊆ {t : y(t) < β}.
Finally this must hold for all ε > 0, so we choose the sequence ε = 1/n for n ≥ 1, yielding
∪_{n≥1} ∪_k [ {t : (k−1)/n ≤ u(t) < k/n} ∩ {t : v(t) < β − k/n} ] ⊆ {t : y(t) < β}.
The set on the left is a countable union of measurable sets and thus is measurable. It is also 
equal to {t : y(t) < β}, since any t in this set also satisfies y(t) < β−1/n for sufficiently large n. 
(b) This can be shown by an adaptation of the argument in (a). If u(t) and v(t) are positive 
functions, it can also be shown by observing that ln u(t) and ln v(t) are measurable. Thus the 
sum is measurable by part (a) and exp[ln u(t) + ln v(t)] is measurable. 
Exercise 4.14: The hint says it all. 
Exercise 4.15: (a) Restrict attention to t ∈ [−T/2, T/2] throughout. First we show that vm(t) = inf_{n≥m} un(t) is measurable for all m ≥ 1. For any given t, if un(t) ≥ V for all n ≥ m, then V is a lower bound to un(t) over n ≥ m, and thus the greatest such lower bound satisfies vm(t) ≥ V. Similarly, vm(t) ≥ V implies that un(t) ≥ V for all n ≥ m. Thus,
{t : vm(t) ≥ V} = ∩_{n=m}^{∞} {t : un(t) ≥ V}.
Using Exercise 4.11, the measurability of un implies that {t : un(t) ≥ V } is measurable for 
each n. The countable intersection above is therefore measurable, and thus, using the result of 
Exercise 4.11 again, vm(t) is measurable for each m. 
Next, if vm(t) ≥ V then vm′(t) ≥ V for all m′ > m. This means that vm(t) is a non-decreasing function of m for each t, and thus limm vm(t) exists for each t. This also means that
{t : lim_{m→∞} vm(t) ≥ V} = ∪_{m=1}^{∞} [ ∩_{n=m}^{∞} {t : un(t) ≥ V} ].
This is a countable union of measurable sets and is thus measurable, showing that lim inf un(t) is measurable.
(b) If lim infn un(t) = V1 for a given t, then limm vm(t) = V1, which implies that for the given 
t, the sequence {un(t); n ≥ 1} has a subsequence that approaches V1 as a limit. Similarly, if 
lim supn un(t) = V2 for that t, then the sequence {un(t), n ≥ 1} has a subsequence approaching 
V2. If V1 < V2, then limn un(t) does not exist for that t, since the sequence oscillates infinitely 
between V1 and V2. If V1 = V2, the limit does exist and equals V1. 
(c) Using the same argument as in part (a), with inf and sup interchanged, 
{t : lim sup un(t) ≤ V} = ∩_{m=1}^{∞} [ ∪_{n=m}^{∞} {t : un(t) ≤ V} ]
is also measurable, and thus lim sup un(t) is measurable. It follows from this, with the help of 
Exercise 4.13 (a), that lim supn un(t) − lim infn un(t) is measurable. Using part (b), limn un(t) 
exists if and only if this difference equals 0. Thus the set of points on which limn un(t) exists is 
measurable and the function that is this limit when it exists and 0 otherwise is measurable. 
Exercise 4.16: As seen below, un(t) is a rectangular pulse taking the value 2^n from 1/2^{n+1} to 3/2^{n+1}. It follows that for any t ≤ 0, un(t) = 0 for all n. For any fixed t > 0, we can visually see that for n large enough, un(t) = 0. Since un(t) is 0 for all t greater than 3/2^{n+1}, then for any fixed t > 0, un(t) = 0 for all n > log2(3/t) − 1. Thus limn→∞ un(t) = 0 for all t.
Since limn→∞ un(t) = 0 for all t, it follows that ∫ limn→∞ un(t) dt = 0. On the other hand, ∫ un(t) dt = 1 for all n, so limn→∞ ∫ un(t) dt = 1.
[Figure: the pulses u1, u2, u3 have supports (1/4, 3/4), (1/8, 3/8), and (1/16, 3/16) respectively.]
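A small numerical illustration (Python), taking the pulse to be open at its endpoints:

    def u(n, t):
        return 2.0**n if 1/2**(n+1) < t < 3/2**(n+1) else 0.0

    t0 = 0.01
    print([u(n, t0) for n in range(1, 12)])    # eventually 0 for every fixed t0 > 0
    print([2.0**n * (3/2**(n+1) - 1/2**(n+1)) for n in range(1, 6)])   # each area is 1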
Exercise 4.17: 
(a) Since u(t) is real valued,
|∫ u(t) dt| = |∫ u⁺(t) dt − ∫ u⁻(t) dt|
≤ |∫ u⁺(t) dt| + |∫ u⁻(t) dt|
= ∫ u⁺(t) dt + ∫ u⁻(t) dt = ∫ |u(t)| dt.
(b) As in the hint we select α such that α∫u(t) dt is non-negative and real and |α| = 1. Now let αu(t) = v(t) + jw(t), where v(t) and w(t) are the real and imaginary parts of αu(t). Since α∫u(t) dt is real, we have ∫w(t) dt = 0 and α∫u(t) dt = ∫v(t) dt. Note also that |v(t)| ≤ |αu(t)|. Hence
|∫ u(t) dt| = |α ∫ u(t) dt| = |∫ v(t) dt|
≤ ∫ |v(t)| dt   (part a)
≤ ∫ |αu(t)| dt = ∫ |α| |u(t)| dt = ∫ |u(t)| dt.
Exercise 4.18: 
(a) The meaning of u(t) = v(t) a.e. is that μ{t : |u(t) − v(t)| > 0} = 0. It follows that ∫|u(t) − v(t)|² dt = 0. Thus u(t) and v(t) are L2 equivalent.
(b) If u(t) and v(t) are L2 equivalent, then ∫|u(t) − v(t)|² dt = 0. Now suppose that μ{t : |u(t) − v(t)|² > ε} is non-zero for some ε > 0. Then
∫|u(t) − v(t)|² dt ≥ ε μ{t : |u(t) − v(t)|² > ε} > 0,
which contradicts the assumption that u(t) and v(t) are L2 equivalent.
(c) The set {t : |u(t) − v(t)| > 0} can be expressed as 
{t : |u(t) − v(t)| > 0} = ∪_{n≥1} {t : |u(t) − v(t)| > 1/n}.
Since each term on the right has zero measure, the countable union also has zero measure. Thus 
{t : |u(t) − v(t)| > 0} has zero measure and u(t) = v(t) a.e. 
Exercise 4.21: 
(a) By expanding the magnitude squared within the given integral as a product of the function 
and its complex conjugate, we get 
∫ |u(t) − ∑_{m=−n}^{n} ∑_{k=−ℓ}^{ℓ} ûk,m θk,m(t)|² dt = ∫ |u(t)|² dt − ∑_{m=−n}^{n} ∑_{k=−ℓ}^{ℓ} T|ûk,m|².   (14)
Since each increase in n (or similarly in ℓ) subtracts additional non-negative terms, the given integral is non-increasing in n and ℓ.
(b) and (c) The set of terms T|ˆuk,m|2 for k ∈ Z and m ∈ Z is a countable set of non-negative 
terms with a sum bounded by ∫|u(t)|² dt, which is finite since u(t) is L2. Thus, using the result
of Exercise 4.4, the sum over this set of terms is independent of the ordering of the summation. 
Any scheme for increasing n and ` in any order relative to each other in (14) is just an example 
of this more general ordering and must converge to the same quantity. 
Since um(t) = u(t) rect(t/T − m) satisfies ∫|um(t)|² dt = T ∑_k |ûk,m|² by Theorem 4.4.1 of the text, it is clear that the limit of (14) as n, ℓ → ∞ is 0, so the limit is the same for any ordering.
There is a subtlety above which is important to understand, but not so important as far as developing the notation to avoid the subtlety. The easiest way to understand (14) is by understanding that ∫|um(t)|² dt = T ∑_k |ûk,m|², which suggests taking the limit k → ±∞ for each value of m in (14). This does not correspond to a countable ordering of (k,m). This can be straightened out with epsilons and deltas, but is better left to the imagination of the reader.
Exercise 4.22: 
(a) First note that:
Σ_{m=−n}^{n} u_m(t) = { 0 for |t| > (n + 1/2)T;   2u(t) for t = (m + 1/2)T, |m| < n;   u(t) otherwise }.
∫ |u(t) − Σ_{m=−n}^{n} u_m(t)|² dt = ∫_{−∞}^{(−n−1/2)T} |u(t)|² dt + ∫_{(n+1/2)T}^{∞} |u(t)|² dt.
By the definition of an L2 function over an infinite time interval, each of the integrals on the right approaches 0 with increasing n.
(b) Let u_m^ℓ(t) = Σ_{k=−ℓ}^{ℓ} û_{k,m} θ_{k,m}(t). Note that Σ_{m=−n}^{n} u_m^ℓ(t) = 0 for |t| > (n + 1/2)T. We can now write the given integral as:
∫_{|t|>(n+1/2)T} |u(t)|² dt + ∫_{−(n+1/2)T}^{(n+1/2)T} |u(t) − Σ_{m=−n}^{n} u_m^ℓ(t)|² dt.   (15)
As in part (a), the first integral vanishes as n → ∞.
(c) Since the û_{k,m} are the Fourier series coefficients of u_m(t), we know u_m(t) = l.i.m._{ℓ→∞} u_m^ℓ(t). Hence, for each n, the second integral goes to zero as ℓ → ∞. Thus, for any ε > 0, we can choose n so that the first term is less than ε/2 and then choose ℓ large enough that the second term is less than ε/2. Thus the limit of (15) as n, ℓ → ∞ is 0.
Exercise 4.23: The LHS of (4.40) is a function of t, so its Fourier transform is
F(u(t) ∗ v(t)) = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} u(τ)v(t − τ) dτ ) e^{−2πift} dt
= ∫_{−∞}^{∞} u(τ) ( ∫_{−∞}^{∞} v(t − τ)e^{−2πift} dt ) dτ
= ∫_{−∞}^{∞} u(τ) ( ∫_{−∞}^{∞} v(r)e^{−2πif(τ+r)} dr ) dτ
= ∫_{−∞}^{∞} u(τ)e^{−2πifτ} dτ ∫_{−∞}^{∞} v(r)e^{−2πifr} dr = û(f)v̂(f).
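As a sanity check (not part of the original solution), the discrete analogue of this convolution theorem can be confirmed with an FFT; the Gaussian pulses below are arbitrary illustrative choices.

```python
import numpy as np

# Check that the DFT of a discrete convolution equals the product of the DFTs.
dt = 0.01
t = np.arange(-10, 10, dt)
u = np.exp(-t**2)
v = np.exp(-2 * (t - 1)**2)

conv = np.convolve(u, v)                         # discrete convolution of the samples
F_conv = np.fft.fft(conv)
F_prod = np.fft.fft(u, len(conv)) * np.fft.fft(v, len(conv))
print(np.max(np.abs(F_conv - F_prod)) / np.max(np.abs(F_prod)))   # ~ 1e-15
```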
Exercise 4.24: 
(a)
∫_{|t|>T} |u(t)e^{−2πift} − u(t)e^{−2πi(f−δ)t}| dt = ∫_{|t|>T} |u(t)e^{−2πift}(1 − e^{2πiδt})| dt
= ∫_{|t|>T} |u(t)| |1 − e^{2πiδt}| dt ≤ 2 ∫_{|t|>T} |u(t)| dt   for all f > 0, δ > 0.
Since u(t) is L1, ∫_{−∞}^{∞} |u(t)| dt is finite. Thus, for T large enough, we can make ∫_{|t|>T} |u(t)| dt as small as we wish. In particular, we can let T be sufficiently large that 2∫_{|t|>T} |u(t)| dt is less than ε/2. The result follows.
(b) For all f,
∫_{|t|≤T} |u(t)e^{−2πift} − u(t)e^{−2πi(f−δ)t}| dt = ∫_{|t|≤T} |u(t)| |1 − e^{2πiδt}| dt.
For the T selected in part (a), we can make |1 − e^{2πiδt}| arbitrarily small for all |t| ≤ T by choosing δ to be small enough. Also, since u(t) is L1, ∫_{|t|≤T} |u(t)| dt is finite. Thus, by choosing δ small enough, we can make ∫_{|t|≤T} |u(t)| |1 − e^{2πiδt}| dt < ε/2.
Exercise 4.26: Exercise 4.11 shows that the sum of two measurable functions is measurable, 
so the question concerns the energy in au(t) + bv(t). Note that for each t, |au(t) + bv(t)|² ≤ 2|a|²|u(t)|² + 2|b|²|v(t)|². Thus, since ∫ |u(t)|² dt < ∞ and ∫ |v(t)|² dt < ∞, it follows that ∫ |au(t) + bv(t)|² dt < ∞.
If {t : u(t) ≤ β} is a union of disjoint intervals, then {t : u(t − T) ≤ β} is that same union of intervals, each shifted to the right by T, and therefore it has the same measure. In the general case, any cover of {t : u(t) ≤ β}, if shifted to the right by T, is a cover of {t : u(t−T) ≤ β}. Thus, for all β, μ{t : u(t) ≤ β} = μ{t : u(t−T) ≤ β}. Similarly, if {t : u(t) ≤ β} is a union of intervals, then {t : u(t/T) ≤ β} is that same set of intervals expanded by a factor of T. This generalizes to arbitrary measurable sets as before. Thus μ{t : u(t) ≤ β} = (1/T)μ{t : u(t/T) ≤ β}.
Exercise 4.29: The statement of the exercise contains a misprint — the transform ˆu(f) is 
limited to |f| ≤ 1/2 (thus making the sampling theorem applicable) rather than the function 
being time-limited. For the given sampling coefficients, we have 
u(t) = Σ_k u(k) sinc(t − k) = Σ_{k=−n}^{n} (−1)^k sinc(t − k)
u(n + 1/2) = Σ_{k=−n}^{n} (−1)^k sinc(n + 1/2 − k) = Σ_{k=−n}^{n} (−1)^k (−1)^{n−k} / (π[n − k + 1/2]).   (16)
Since n is even, (−1)^k(−1)^{n−k} = (−1)^n = 1. Substituting j for n − k, we then have
u(n + 1/2) = Σ_{k=0}^{2n} 1/(π(k + 1/2)).   (17)
The approximation Σ_{k=m₁}^{m₂} 1/(k + 1/2) ≈ ln((m₂+1)/m₁) comes from approximating the sum by an integral and is quite accurate for m₁ >> 0. To apply this approximation to (17), we must at least omit the term k = 0, and this gives us the approximation
u(n + 1/2) ≈ 2/π + (1/π) ln(2n + 1).
This goes to infinity logarithmically in n as n → 1. The approximation can be improved 
by removing the first few terms from (17) before applying the approximation, but the term 
ln(2n + 1) remains. 
We can evaluate u(n+m+1/2) and u(n−m−1/2) by the same procedure as in (16). In particular,
u(n+m+1/2) = Σ_{k=−n}^{n} (−1)^k sinc(n+m+1/2−k) = Σ_{k=−n}^{n} (−1)^k(−1)^{n+m−k} / (π[n+m−k+1/2]) = Σ_{j=m}^{2n+m} (−1)^{n+m} / (π[j+1/2]).
u(n−m−1/2) = Σ_{k=−n}^{n} (−1)^k(−1)^{n−m−k} / (π[n−m−k−1/2]) = Σ_{j=−m}^{2n−m} (−1)^{n−m} / (π[j−1/2]).
Taking magnitudes,
|u(n+m+1/2)| = Σ_{j=m}^{2n+m} 1/(π[j+1/2]) ;   |u(n−m−1/2)| = Σ_{j=−m}^{2n−m} 1/(π[j−1/2]).
All terms in the first expression above are positive, whereas those in the second expression are negative for j ≤ 0. We break this second expression into positive and negative terms:
|u(n−m−1/2)| = Σ_{j=−m}^{0} 1/(π[j−1/2]) + Σ_{j=1}^{2n−m} 1/(π[j−1/2]) = Σ_{k=−m}^{0} 1/(π[k−1/2]) + Σ_{j=0}^{2n−m−1} 1/(π[j+1/2]).
For each j, 0 ≤ j ≤ m, the term in the second sum above is the negative of the term in the first sum with j = −k. Cancelling these terms out,
|u(n−m−1/2)| = Σ_{j=m+1}^{2n−m−1} 1/(π[j+1/2]).
This is a sum of positive terms and is a subset of the positive terms in |u(n+m+1/2)|, establishing that |u(n−m−1/2)| ≤ |u(n+m+1/2)|. What is happening here is that for points inside [−n, n], the sinc functions from the samples on one side of the point cancel out the sinc functions from the samples on the other side.
The particular samples in this exercise have been chosen to illustrate that truncating the samples 
of a bandlimited function and truncating the function can have very different effects. Here 
the function with truncated samples oscillates wildly (at least logarithmically in n), with the 
oscillations larger outside of the interval than inside. Thus most of the energy in the function 
resides outside of the region where the samples are nonzero. 
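The logarithmic growth of u(n + 1/2) is easy to observe numerically. The sketch below evaluates the truncated sampling expansion directly and compares it with the approximation 2/π + ln(2n+1)/π derived above; the particular values of n are arbitrary, and the two quantities grow together logarithmically (the approximation is crude for the small-k terms, so a small constant offset remains).

```python
import numpy as np

# u(n + 1/2) = sum_{k=-n}^{n} (-1)^k sinc(n + 1/2 - k) for even n (np.sinc is sin(pi x)/(pi x)).
for n in [4, 16, 64, 256]:
    k = np.arange(-n, n + 1)
    exact = np.sum((-1.0)**k * np.sinc(n + 0.5 - k))
    approx = 2/np.pi + np.log(2*n + 1)/np.pi
    print(n, round(float(exact), 4), round(float(approx), 4))
```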
Exercise 4.31:
(a) Note that g(t) = p²(t) where p(t) = sinc(Wt). Thus ĝ(f) is the convolution of p̂(f) with itself. Since p̂(f) = (1/W)rect(f/W), we can convolve graphically to get the triangle function below.
[Figure: ĝ(f) is a triangle of height 1/W on −W ≤ f ≤ W (with W = 1/(2T)), and g(t) = sinc²(Wt).]
(b) Since u(t) = Σ_k u(kT) sinc(2Wt − k), it follows that v(t) = Σ_k u(kT) sinc(2Wt − k) ∗ g(t). Letting h(t) = sinc(t/T) ∗ g(t), we see that ĥ(f) = T rect(Tf) ĝ(f). Since rect(Tf) = 1 over the range where ĝ(f) is non-zero, ĥ(f) = T ĝ(f). Thus h(t) = Tg(t). It follows that
v(t) = Σ_k T u(kT) g(t − kT).   (18)
(c) Note that g(t) ≥ 0 for all t. This is the feature of g(t) that makes it useful in generating 
amplitude limited pulses. Thus, since u(kT) ≥ 0 for each k, each term in the sum is non-negative, 
and v(t) is non-negative. 
(d) The obvious but incomplete way to see that Σ_k sinc(t/T − k) = 1 is to observe that each sample of the constant function 1 is 1, so this is just the sampling expansion of a constant. Unfortunately, u(t) = 1 is not L2, so the sampling theorem does not apply. The problem is more than nit-picking, since, for example, the sampling expansion of a sequence of alternating 1's and -1's does not converge (as can be seen from Exercise 4.29). The desired result follows here from noting that both the sampling expansion and the constant function 1 are periodic in T and both are L2 over one period. Taking the Fourier series over a period establishes the equality.
(e) To evaluate Σ_k g(t−kT), consider (18) with each u(kT) = 1. For this choice, it follows that Σ_k g(t−kT) = v(t)/T. To evaluate v(t) for this choice, note that u(t) = 1 and v(t) = u(t) ∗ g(t), so that v(t) can be regarded as the output when the constant 1 is passed through the filter g(t). The output is then constant also and equal to ∫ g(t) dt = ĝ(0) = 1/W. Thus Σ_k g(t − kT) = 1/(TW) = 2.
(f) Note that v(t) = Σ_k u(kT) T g(t − kT) is non-decreasing, for each t, in each sample u(kT). Thus v(t) ≤ Σ_k T g(t − kT), which as we have seen is simply 2T.
(h) Since g is real and non-negative and each |u(kT)| ≤ 1,
|v(t)| ≤ Σ_k |u(kT)| T g(t − kT) ≤ 2T   for all t.
We will find in Chapter 6 that g(t) is not a very good modulation waveform at a sample 
separation T, but it could be used at a sample separation 2T. 
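Part (e) can be checked numerically. The sketch below truncates the sum Σ_k g(t − kT) with g(t) = sinc²(Wt); W = 1 and the truncation length are arbitrary choices.

```python
import numpy as np

W, T = 1.0, 0.5                                   # T = 1/(2W)
t = np.linspace(-0.7, 0.7, 9)
k = np.arange(-5000, 5001)                        # truncate the (slowly converging) sum
terms = np.sinc(W * (t[:, None] - k * T))**2      # g(t - kT), one row per value of t
print(np.round(terms.sum(axis=1), 3))             # every entry is close to 1/(TW) = 2
```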
Exercise 4.33: Consider the sequence of functions v_m(t) = rect(t − m) for m ∈ Z⁺, i.e., time-shifted rectangular pulses. For every t, lim_{m→∞} rect(t − m) = 0, so this sequence converges pointwise to 0. However, ∫ |rect(t − m) − rect(t − n)|² dt = 2 for all n ≠ m, so L2 convergence is impossible.
Exercise 4.37: 
(a)
∫ |ŝ(f)| df = ∫ |Σ_m û(f + m/T) rect(fT)| df ≤ ∫ Σ_m |û(f + m/T) rect(fT)| df = ∫ |û(f)| df,
which shows that ŝ(f) is L1 if û(f) is.
(b) The following sketch makes it clear that û(f) is L1 and L2. In particular,
∫ |û(f)| df = ∫ |û(f)|² df = 2 Σ_{k≥1} 1/k² < ∞.
[Figure: û(f), unit-height rectangles of widths 1/4, 1/9, . . . near f = ±1, ±2, . . . , and ŝ(f), the aliased version on |f| ≤ 1/2, taking the values 2, 4, 6, . . . on successively narrower intervals.]
It can be seen from the sketch of ŝ(f) that ŝ(f) = 2 from 1/8 to 1/2 and from −1/2 to −1/8, which is a set of measure 3/4. In general, for arbitrary integer k > 0, it can be seen that ŝ(f) = 2k from 1/(2(k+1)²) to 1/(2k²) and from −1/(2k²) to −1/(2(k+1)²). Thus ŝ(f) = 2k over a set of measure (2k+1)/(k²(k+1)²). It follows that
∫ |ŝ(f)|² df = lim_{n→∞} Σ_{k=1}^{n} (2k)² (2k+1)/(k²(k+1)²) = lim_{n→∞} Σ_{k=1}^{n} 4(2k+1)/(k+1)² ≥ lim_{n→∞} Σ_{k=1}^{n} 4(k+1)/(k+1)² = lim_{n→∞} Σ_{k=1}^{n} 4/(k+1) = ∞.
(c) Note that û(f) = 1 for every positive integer value of f, and thus (for positive ε) û(f)f^{1+ε} approaches ∞. It is 0 for other arbitrarily large values of f, and thus no limit exists.
Exercise 4.38:
∫_{−∞}^{∞} |u(t)|² dt = 2 (1 + 1/2² + 1/3² + · · ·).
This sum is finite so u(t) is L2. Now we'll show that
s(t) = Σ_k u(k) sinc(t − k) = Σ_k sinc(t − k)
is neither L1 nor L2. Taking the Fourier transform of s(t),
ŝ(f) = Σ_k rect(f) e^{−2πifk} = rect(f) Σ_k e^{−2πifk}.
To show that s(t) is not L1,
∫_{−∞}^{∞} |s(t)| dt = ∫_{−∞}^{∞} s(t) dt   (since s(t) ≥ 0 for all t)
= ŝ(0) = Σ_k 1 = ∞.
To show that s(t) is not L2,
∫_{−∞}^{∞} |s(t)|² dt = ∫_{−∞}^{∞} |Σ_k sinc(t − k)|² dt = ∞.
Since u(k) is equal to 1 for every integer k, Σ_k u²(k) = ∞. The sampling theorem energy equation does not apply here (∫ |u(t)|² dt ≠ T Σ_k |u(kT)|²) because û(f) is not band-limited.
Chapter 5 
Exercise 5.1: The first algorithm starts with a set of vectors S = {v1, . . . , vm} that span 
V but are dependent. A vector vk ∈ S is selected that is a linear combination of the other 
vectors in S. vk is removed from S, forming a reduced set S′. Now S′ still spans V since each
v ∈ V is a linear combination of vectors in S, and vk in that expansion can be replaced by 
its representation using the other vectors. If S′ is independent, we are done, and if not, the
previous step is repeated with S′ replacing S. Since the size of S is reduced by 1 on each such
step, the algorithm terminates with an independent spanning set, i.e., a basis. 
The second algorithm starts with an independent set S = {v1, . . . , vm} of vectors that do 
not span the space. An arbitrary nonzero vector vm+1 ∈ V is then selected that is not a 
linear combination of S (this is possible since S does not span V). It can be seen that S′ =
{v1, . . . , vm+1} is an independent set. If S′ spans V, we are done, and if not, the previous step is
repeated with S′ replacing S. With each repetition of this step, the independent set is increased
by 1 vector until it eventually spans V. 
It is not immediately clear that the second algorithm ever terminates. To prove this and also 
prove that all bases of a finite dimensional vector space have the same number of elements, we 
describe a third algorithm. Let S_ind = {v1, . . . , vm} be an arbitrary set of independent vectors and
let Ssp = {u1, . . . , un} be a finite spanning set for V (which must exist by the finite dimensional 
assumption). Then, for k = 1, . . . , m, successively add vk to Ssp and remove one of the original
vectors uj of Ssp so that the remaining set, say S′_sp, is still a spanning set. This is always possible
since the added element must be a linear combination of a spanning set, so the augmented set is 
linearly dependent. One of the original elements of Ssp can be removed (while maintaining the 
spanning property) since the newly added vector is not a linear combination of the previously 
added vectors. A contradiction occurs if m > n, i.e., if the independent set is larger than the 
spanning set, since no more than the n original vectors in the spanning set can be removed. 
We have just shown that every spanning set contains at least as many members as any independent set. Since every basis is both a spanning set and an independent set, this means that every basis contains the same number of elements, say b. Since every independent set contains at most b elements, algorithm 2 must terminate with a basis when S reaches b vectors.
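A minimal sketch of the first algorithm for vectors in Rⁿ, using a numpy rank computation to detect dependence; the example vectors are arbitrary illustrative choices.

```python
import numpy as np

def prune_to_basis(vectors):
    """Repeatedly drop a vector that is a linear combination of the others."""
    S = list(vectors)
    full_rank = np.linalg.matrix_rank(np.column_stack(S))   # dimension of the span
    while len(S) > full_rank:                                # S is still dependent
        for i in range(len(S)):
            rest = S[:i] + S[i+1:]
            if np.linalg.matrix_rank(np.column_stack(rest)) == full_rank:
                S = rest                                     # S[i] was redundant; rest still spans
                break
    return S

# Example: four vectors spanning R^2 are reduced to a basis of two.
spanning = [np.array([1., 0.]), np.array([2., 0.]), np.array([0., 1.]), np.array([1., 1.])]
print(len(prune_to_basis(spanning)))   # 2
```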
Exercise 5.3: Let the n vectors that uniquely span a vector space V be called v1, v2, . . . , vn. 
We will prove that the n vectors are linearly independent using proof by contradiction. Assume 
v1, v2, . . . , vn are linearly dependent. Then Σ_{j=1}^{n} αj vj = 0 for some set of scalars α1, α2, . . . , αn where not all the αj's equal zero. Say αk ≠ 0. We can express vk as a linear combination of the other n − 1 vectors {vj}_{j≠k}:
vk = Σ_{j≠k} (−αj/αk) vj .
Thus vk has two representations in terms of {v1, . . . , vn}. One is that above, and the other is vk = Σ_j βj vj where βk = 1 and βj = 0 for j ≠ k. Thus the representation is non-unique, demonstrating the contradiction.
It follows that if n vectors uniquely span a vector space, they are also independent and thus 
form a basis. From Theorem 5.1.1, the dimension of V is then n. 
Exercise 5.6: 
‖v + u‖² = ⟨v + u, v + u⟩
= ⟨v, v + u⟩ + ⟨u, v + u⟩   by axiom (b)
= ⟨v, v⟩ + ⟨v, u⟩ + ⟨u, v⟩ + ⟨u, u⟩   by axiom (b)
≤ |⟨v, v⟩| + |⟨v, u⟩| + |⟨u, v⟩| + |⟨u, u⟩|
≤ ‖v‖² + ‖v‖‖u‖ + ‖u‖‖v‖ + ‖u‖² = (‖v‖ + ‖u‖)².
So ‖v + u‖ ≤ ‖v‖ + ‖u‖.
Exercise 5.8: 
(a) By direct substitution of u(t) = Σ_{k,m} û_{k,m}θ_{k,m}(t) and v*(t) = Σ_{k,m} v̂*_{k,m}θ*_{k,m}(t) into the inner product definition,
⟨u, v⟩ = ∫_{−∞}^{∞} u(t)v*(t) dt
= ∫_{−∞}^{∞} Σ_{k,m} û_{k,m}θ_{k,m}(t) Σ_{k′,m′} v̂*_{k′,m′}θ*_{k′,m′}(t) dt
= Σ_{k,m} û_{k,m} Σ_{k′,m′} v̂*_{k′,m′} ∫_{−∞}^{∞} θ_{k,m}(t)θ*_{k′,m′}(t) dt
= T Σ_{k,m} û_{k,m} v̂*_{k,m}.
(b) For any real numbers a and b, 0 ≤ (a − b)² = a² − 2ab + b². It follows that ab ≤ a²/2 + b²/2. Applying this to |û_{k,m}| and |v̂_{k,m}|, we see that
|û_{k,m} v̂*_{k,m}| = |û_{k,m}| |v̂_{k,m}| ≤ (1/2)|û_{k,m}|² + (1/2)|v̂_{k,m}|².
Thus, using part (a),
|⟨u, v⟩| ≤ T Σ_{k,m} |û_{k,m} v̂*_{k,m}| ≤ (T/2) Σ_{k,m} |û_{k,m}|² + (T/2) Σ_{k,m} |v̂_{k,m}|².
Since u and v are L2, the latter sums above are finite, so |⟨u, v⟩| is also finite.
Since u and v are L2, the latter sums above are finite, so |hu, vi| is also finite. 
(c) It is necessary for inner products in an inner-product space to be finite since, by definition of a complex inner-product space, the inner product must be a complex number, and the set of complex numbers (just like the set of real numbers) does not include ∞. This seems like a technicality, but it is central to the special properties held by finite energy functions.
Exercise 5.9: 
(a) For V to be a vector subspace, it is necessary for v = 0 to be an element of V, and this 
is only possible in the special case where ku1k = ku2k. Even in this case, however, V is not a 
vector space. This will be shown at the end of part (b). It will be seen in studying detection in 
Chapter 8 that V is an important set of vectors, subspace or not. 
(b) V can be rewritten as V = {v : ‖v − u1‖² = ‖v − u2‖²}. Expanding these energy differences for k = 1, 2,
‖v − uk‖² = ‖v‖² − ⟨v, uk⟩ − ⟨uk, v⟩ + ‖uk‖² = ‖v‖² + ‖uk‖² − 2ℜ(⟨v, uk⟩).
It follows that v ∈ V if and only if
‖v‖² + ‖u1‖² − 2ℜ(⟨v, u1⟩) = ‖v‖² + ‖u2‖² − 2ℜ(⟨v, u2⟩).
Rearranging terms, v ∈ V if and only if
ℜ(⟨v, u2 − u1⟩) = (‖u2‖² − ‖u1‖²)/2.   (19)
Now to complete part (a), assume ‖u2‖² = ‖u1‖² (which is necessary for V to be a vector space) and assume u1 ≠ u2 to avoid the trivial case where V is all of L2. Now let v = i(u2 − u1). Then ⟨v, u2 − u1⟩ is pure imaginary so that v ∈ V. But iv is not in V since ⟨iv, u2 − u1⟩ = −‖u2 − u1‖² ≠ 0. In a vector subspace, multiplication by a scalar (in this case i) yields another element of the subspace, so V is not a subspace except in the trivial case where u1 = u2.
(c) Substituting (u1 + u2)/2 for v, we see that ‖v − u1‖ = ‖u2 − u1‖/2 and ‖v − u2‖ = ‖u1 − u2‖/2, so ‖v − u1‖ = ‖v − u2‖ and consequently v ∈ V.
(d) The geometric situation is more clear if the underlying class of functions is the class of real 
L2 functions. In that case V is a subspace whenever ku1k = ku2k. If ku1k6= ku2k, then V is 
a hyperplane. In general, a hyperplane H is defined in terms of a vector u and a subspace S as H = {v : v = u + s for some s ∈ S}. In R2 a hyperplane is a straight line, not necessarily 
through the origin, and in R3, a hyperplane is either a plane or a line, neither necessarily 
including the origin. For complex L2, V is not a hyperplane. Part of the reason for this exercise 
is to see that real L2 and complex L2, while similar in many aspects, are very different in other
aspects, especially those involving vector subspaces. 
Exercise 5.12: 
(a) To show that S⊥ is a subspace of V, we need to show that for any v1, v2 ∈ S⊥, αv1+βv2 ∈ S⊥ 
for all scalars α, β. If v1, v2 ∈ S⊥, then for all w ∈ S, hαv1 + βv2,wi = αhv1,wi+βhv2,wi = 
0 + 0. Thus αv1 + βv2 ∈ S⊥ and S⊥ is a subspace of V. 
(b) By the Projection Theorem, for any u ∈ V, there is a unique vector u_{|S} ∈ S such that ⟨u − u_{|S}, s⟩ = 0 for all s ∈ S. So u_{⊥S} = u − u_{|S} ∈ S⊥ and we have a unique decomposition of u into u = u_{|S} + u_{⊥S}.
(c) Let V and S (where S < V ) denote the dimensions of V and S respectively. Start with a 
set of V independent vectors s1, s2 · · · sV ∈ V. This set is chosen so that the first S of these i.e. 
s1, s2 · · · sS are in S. The first S orthonormal vectors obtained by Gram-Schmidt procedure 
will be a basis for S. The next V − S orthonormal vectors obtained by the procedure will be a 
basis for S⊥. 
Exercise 5.14: 
(a) Assume throughout this part that m, n are positive integers, m > n. We will show, as case 1, that if the left end, a_m, of the pulse g_m(t) satisfies a_m < a_n, then a_m + 2^{−m−1} < a_n, i.e., the pulses do not overlap at all. As case 2, we will show that if a_m ∈ (a_n, a_n + 2^{−n−1}), then a_m + 2^{−m−1} ∈ [a_n, a_n + 2^{−n−1}], i.e., the pulses overlap completely.
Case 1: Let d_m be the denominator of the rational number a_m (in reduced form). Thus (since a_n d_n and a_m d_m are integers), it follows that if a_m < a_n, then also a_m + 1/(d_n d_m) ≤ a_n. Since d_n ≤ d_m ≤ m for m ≥ 3, we have a_m + 1/m² ≤ a_n for m ≥ 3. Since 1/m² > 2^{−m−1} for m ≥ 3, it follows that a_m + 2^{−m−1} ≤ a_n. Thus, if a_m < a_n, g_m and g_n do not overlap for any m > 3. Since g_2 does not overlap g_1 by inspection, there can be no partial overlap for any a_m < a_n.
Case 2: Apologies! This is very tedious. Assume that a_m ∈ (a_n, a_n + 2^{−n−1}). By the same argument as above,
a_m ≥ a_n + 1/(d_n d_m)   and   a_m + 1/(d_m d′_n) ≤ a_n + 2^{−n−1},   (20)
where d′_n is the denominator of a_n + 2^{−n−1}. Combining these inequalities,
1/(d_n d_m) < 2^{−n−1}.   (21)
We now separate case 2 into three subcases. First, from inspection of Figure 5.3 in the text, there are no partial overlaps for m < 8. Next consider m ≥ 8 and n ≤ 4. From the right side of (20), there can be no partial overlap if
2^{−m−1} ≤ 1/(d_m d′_n)   (condition for no partial overlap).   (22)
From direct evaluation, we see that d′_n ≤ 48 for n ≤ 4. Now d_m 2^{−m−1} is 5/512 for m = 8 and is decreasing for m ≥ 8. Since 5/512 < 1/48, there is no partial overlap for n ≤ 4, m ≥ 8.
Next we consider the general case where n ≥ 5. From (21), we now derive a general condition on how small m can be for m, n pairs that satisfy the conditions of case 2. Since m ≥ d_m for m ≥ 3, we have
m > 2^{n+1}/d_n.   (23)
For n ≥ 5, 2^{n+1}/d_n ≥ 2n + 2, so the general case reduces to n ≥ 5 and m ≥ 2n + 2.
Next consider the condition for no partial overlap in (22). Since d′_n ≤ 2^{n+1} d_n ≤ 2^{n+1} n and d_m ≤ m, the following condition also implies no partial overlap:
m 2^{−m−1} ≤ 2^{−n−1}/n.   (24)
The left side of (24) is decreasing in m, so if we can establish (24) for m = 2n+2, it is established for all m ≥ 2n+2. The left side, for m = 2n+2, is (2n+2)2^{−2n−3}. Thus all that remains is to show that (2n + 2)n ≤ 2^{n+2}. This, however, is obvious for n ≥ 5.
Exercise 5.15: Using the same notation as in the proof of Theorem 4.5.1,
u^{(n)}(t) = Σ_{m=−n}^{n} Σ_{k=−n}^{n} û_{k,m}θ_{k,m}(t),   û^{(n)}(f) = Σ_{m=−n}^{n} Σ_{k=−n}^{n} û_{k,m}ψ_{k,m}(f).
Since ψ_{k,m}(f) is the Fourier transform of θ_{k,m}(t) for each k, m, the coefficients û_{k,m} are the same in each expansion. In the same way,
v^{(n)}(t) = Σ_{m=−n}^{n} Σ_{k=−n}^{n} v̂_{k,m}θ_{k,m}(t),   v̂^{(n)}(f) = Σ_{m=−n}^{n} Σ_{k=−n}^{n} v̂_{k,m}ψ_{k,m}(f).
It is elementary, using the orthonormality of the θ_{k,m} and the orthonormality of the ψ_{k,m}, to see that for all n > 0,
⟨u^{(n)}, v^{(n)}⟩ = Σ_{m=−n}^{n} Σ_{k=−n}^{n} û_{k,m} v̂*_{k,m} = ⟨û^{(n)}, v̂^{(n)}⟩.   (25)
Thus our problem is to show that this same relationship holds in the limit n → ∞. We know (from Theorem 4.5.1) that l.i.m._{n→∞} u^{(n)} = u, with the corresponding limits for v^{(n)}, û^{(n)}, and v̂^{(n)}. Using the Schwarz inequality on the second line below, and Bessel's inequality on the third,
|⟨u^{(n)}, v⟩ − ⟨u^{(n)}, v^{(n)}⟩| = |⟨u^{(n)}, v − v^{(n)}⟩| ≤ ‖u^{(n)}‖ ‖v − v^{(n)}‖ ≤ ‖u‖ ‖v − v^{(n)}‖.
Since lim_{n→∞} ‖v − v^{(n)}‖ = 0, we see that lim_{n→∞} |⟨u^{(n)}, v⟩ − ⟨u^{(n)}, v^{(n)}⟩| = 0. In the same way, lim_{n→∞} |⟨u^{(n)}, v⟩ − ⟨u, v⟩| = 0. Combining these limits, and going through the same operations on the transform side,
lim_{n→∞} ⟨u^{(n)}, v^{(n)}⟩ = ⟨u, v⟩,   lim_{n→∞} ⟨û^{(n)}, v̂^{(n)}⟩ = ⟨û, v̂⟩.   (26)
Combining (25) and (26), we get Parseval's relation for L2 functions, ⟨u, v⟩ = ⟨û, v̂⟩.
Exercise 5.16: 
(a) Colloquially, lim_{|f|→∞} û(f)|f|^{1+ε} = 0 means that |û(f)| |f|^{1+ε} becomes and stays increasingly small as |f| becomes large. More technically, it means that for any δ > 0, there is an A(δ) such that |û(f)| |f|^{1+ε} ≤ δ for all f such that |f| ≥ A(δ). Choosing δ = 1 and A = A(1), we see that |û(f)| ≤ |f|^{−1−ε} for |f| ≥ A.
(b)
∫_{−∞}^{∞} |û(f)| df = ∫_{|f|>A} |û(f)| df + ∫_{|f|≤A} |û(f)| df ≤ 2 ∫_{A}^{∞} f^{−1−ε} df + ∫_{−A}^{A} |û(f)| df = 2A^{−ε}/ε + ∫_{−A}^{A} |û(f)| df.
Since û(f) is L2, its truncated version to [−A, A] is also L1, so the second integral is finite, showing that û(f) (untruncated) is also L1. In other words, one role of the ε above is to make û(f) decrease quickly enough with increasing f to maintain the L1 property.
(c) Recall that ŝ^{(n)}(f) = Σ_{|m|≤n} ŝ_m(f) where ŝ_m(f) = û(f − m)rect(f). Assuming A to be an integer and m′ > A, |ŝ_{m′}(f)| ≤ (m′ − 1)^{−1−ε}. Thus for f ∈ (−1/2, 1/2],
|ŝ^{(n)}(f)| ≤ |Σ_{|m|≤A} û(f − m)| + Σ_{|m′|>A} (|m′| − 1)^{−1−ε} = |Σ_{|m|≤A} û(f − m)| + Σ_{m≥A} 2m^{−1−ε}.   (27)
The factor of 2 above was omitted by error from the exercise statement. Note that since the final sum converges, this is independent of n and is thus an upper bound on |ŝ(f)|. Now visualize the 2A + 1 terms in the first sum above as a vector, say a. Let 1 denote the vector of 2A + 1 ones, so that ⟨a, 1⟩ = Σ_k a_k. Applying the Schwarz inequality to this, |Σ_k a_k| ≤ ‖a‖ ‖1‖. Substituting this into (27),
|ŝ(f)| ≤ √[(2A + 1) Σ_{|m|≤A} |û(f + m)|²] + Σ_{m≥A} 2m^{−1−ε}.   (28)
(d) Note that for any complex numbers a and b, |a + b|² ≤ |a + b|² + |a − b|² = 2|a|² + 2|b|². Applying this to (28),
|ŝ(f)|² ≤ (4A + 2) Σ_{|m|≤A} |û(f + m)|² + 2 (Σ_{m≥A} 2m^{−1−ε})².
Since ŝ(f) is nonzero only in [−1/2, 1/2], we can demonstrate that ŝ(f) is L2 by showing that the integral of |ŝ(f)|² over [−1/2, 1/2] is finite. The integral of the first term above is 4A + 2 times the integral of |û(f)|² from −A − 1/2 to A + 1/2 and is finite since û(f) is L2. The integral of the second term is simply the second term itself, which is finite.
Chapter 6 
Exercise 6.1: Let Uk be a standard M-PAM random variable where the M points each have probability 1/M. Consider the analogy with a uniform M-level quantizer used on a uniformly distributed rv U over the interval [−Md/2, Md/2].
[Figure: M = 6 quantization regions R1, . . . , R6, each of width d, with quantization points a1, . . . , a6 at their centers.]
Let Q be the quantization error for the quantizer and Uk be the quantization point. Thus U = Uk + Q. Observe that for each quantization point the quantization error is uniformly distributed over [−d/2, d/2]. This means that Q is zero mean and statistically independent of the quantization point Uk. It follows that
E[U²] = E[(Q + Uk)²] = E[U_k²] + E[Q²] = E[U_k²] + d²/12.
On the other hand, since U is uniformly distributed, E[U²] = (dM)²/12. It then follows that
E[U_k²] = d²(M² − 1)/12.
Verifying the formula for M = 4:
E_S = [2((d/2)² + (3d/2)²)]/4 = (5/4)d²,   d²(M² − 1)/12 = (5/4)d².
Verifying the formula for M = 8:
E_S = [2((d/2)² + (3d/2)² + (5d/2)² + (7d/2)²)]/8 = (21/4)d²,   d²(M² − 1)/12 = (21/4)d².
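A quick numerical check of E[U_k²] = d²(M² − 1)/12 (d = 2 and the list of M values are arbitrary choices):

```python
import numpy as np

d = 2.0
for M in [2, 4, 6, 8, 16]:
    points = d * (np.arange(M) - (M - 1) / 2)        # ..., -3d/2, -d/2, d/2, 3d/2, ...
    print(M, np.mean(points**2), d**2 * (M**2 - 1) / 12)   # the two columns agree
```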
Exercise 6.3: 
(a) Since the received signal is decoded to the closest PAM signal, the intervals decoded to each signal are indicated below.
[Figure: 4-PAM points a1, a2, a3, a4 at −3d/2, −d/2, d/2, 3d/2, with decision regions R1, . . . , R4 separated by thresholds at −d, 0, d.]
Thus if Uk = a1 is transmitted, an error occurs if Zk ≥ d/2. The probability of this is Q(d/2), where
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−z²/2) dz.
If Uk = a2 is transmitted, an error occurs if either Zk ≥ d/2 or Zk < −d/2, so, using the symmetry of the Gaussian density, the probability of an error in this case is 2Q(d/2). In the same way, the error probability is 2Q(d/2) for a3 and Q(d/2) for a4. Thus the overall error probability is (3/2)Q(d/2).
(b) Now suppose the third point is moved to d/2 + ε. This moves the decision boundary between R3 and R4 by ε/2 and similarly moves the decision boundary between R2 and R3 by ε/2. The error probability then becomes
Pe(ε) = (1/2)[Q(d/2) + Q((d + ε)/2) + Q((d − ε)/2)].
dPe(ε)/dε = (1/4)[(1/√(2π)) exp(−(d − ε)²/8) − (1/√(2π)) exp(−(d + ε)²/8)].
This is equal to 0 at ε = 0, as can be seen by symmetry without actually taking the derivative.
(c) With the third signal point at d/2 + ε, the signal energy is
E_S = (1/4)[(d/2)² + ((d + ε)/2)² + 2(3d/2)²].
The derivative of this with respect to ε is (d + ε)/8.
(d) This means that to first order in ≤, the energy can be reduced by reducing a3 without 
changing Pe. Thus moving the two inner points slightly inward provides better energy efficiency 
for 4-PAM. This is quite counter-intuitive. The difference between optimizing the points in 
4-PAM and using standard PAM is not very significant, however. At 10 dB signal to noise 
ratio, the optimal placement of points (which requires considerably more computation) makes 
the ratio of outer points to inner points 3.15 instead of 3, but it reduces error probability by 
less than 1%. 
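The conclusions of parts (b)-(d) are easy to check numerically. The sketch below uses the expressions derived above with unit-variance noise; d = 2 and ε = −0.01 are arbitrary choices.

```python
from math import erfc, sqrt

def Q(x):                       # Gaussian tail, Q(x) = 0.5 erfc(x/sqrt(2))
    return 0.5 * erfc(x / sqrt(2))

def Pe(d, eps):                 # error probability from part (b)
    return 0.5 * (Q(d/2) + Q((d + eps)/2) + Q((d - eps)/2))

def Es(d, eps):                 # signal energy from part (c)
    return ((d/2)**2 + ((d + eps)/2)**2 + 2*(3*d/2)**2) / 4

d, eps = 2.0, -0.01
print(Pe(d, 0), Pe(d, eps))     # essentially unchanged: Pe is stationary at eps = 0
print(Es(d, 0), Es(d, eps))     # energy strictly smaller for eps < 0
```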
Exercise 6.4: 
(a) If for each j,
∫_{−∞}^{∞} u(t)d_j(t) dt = ∫_{−∞}^{∞} Σ_{k=1}^{∞} u_k p(t − kT) d_j(t) dt = Σ_{k=1}^{∞} u_k ∫_{−∞}^{∞} p(t − kT)d_j(t) dt = u_j ,
then it must be that ∫_{−∞}^{∞} p(t − kT)d_j(t) dt = ⟨p(t − kT), d_j(t)⟩ has the value one for k = j and the value zero for all k ≠ j. That is, d_j(t) must be orthogonal to p(t − kT) for all k ≠ j.
(b) Since ⟨p(t − kT), d_0(t)⟩ = 1 for k = 0 and equals zero for k ≠ 0, it follows by shifting each function by jT that ⟨p(t − (k − j)T), d_0(t)⟩ equals 1 for j = k and 0 for j ≠ k. It follows that d_j(t) = d_0(t − jT).
(c) In this exercise, to avoid ISI (intersymbol interference), we pass u(t) through a bank of filters d_0(−t), d_1(−t), . . . , d_j(−t), . . . , and the output of each filter at time t = 0 is u_0, u_1, . . . , u_j , . . . respectively. To see this, note that the output of the j-th filter in the filter bank is
r_j(t) = Σ_{k=1}^{∞} u_k ∫_{−∞}^{∞} p(τ − kT)d_j(−t + τ) dτ.
At time t = 0,
r_j(0) = Σ_{k=1}^{∞} u_k ∫_{−∞}^{∞} p(τ − kT)d_j(τ) dτ = u_j .
Thus, for every j, to retrieve u_j from u(t), we filter u(t) through d_j(−t) and look at the output at t = 0.
However, from part (b), d_j(t) = d_0(t − jT) (the j-th filter is just the first filter delayed by jT). Rather than processing in parallel through a filter bank and looking at the value at t = 0, we can process serially by filtering u(t) through d_0(−t) and looking at the output every T. To verify this, note that the output after filtering u(t) through d_0(−t) is
r(t) = Σ_{k=1}^{∞} u_k ∫_{−∞}^{∞} p(τ − kT)d_0(−t + τ) dτ,
and so for every j,
r(jT) = Σ_{k=1}^{∞} u_k ∫_{−∞}^{∞} p(τ − kT)d_0(τ − jT) dτ = u_j .
Filtering the received signal through d_0(−t) and looking at the values at jT for every j is the same operation as filtering the signal through q(t) and then sampling at jT. Thus, q(t) = d_0(−t).
Exercise 6.6: 
(a) g(t) must be ideal Nyquist, i.e., g(0) = 1 and g(kT) = 0 for all non-zero integer k. The existence of the channel filter does not change the requirement for the overall cascade of filters. The Nyquist criterion is given in the previous exercise.
(b) It is possible, as shown below. There is no ISI if the Nyquist criterion Σ_m ĝ(f + 2m) = 1/2 for |f| ≤ 1 is satisfied. Since ĝ(f) = p̂(f)ĥ(f)q̂(f), we know that ĝ(f) is zero wherever ĥ(f) = 0. In particular, ĝ(f) must be 0 for |f| > 5/4 (and thus for f ≥ 2). Thus we can use the band edge symmetry condition, ĝ(f) + ĝ(2 − f) = 1/2 over 0 ≤ f ≤ 1. Since ĝ(f) = 0 for 3/4 < f ≤ 1, it is necessary that ĝ(f) = 1/2 for 1 < f ≤ 5/4. Similarly, since ĝ(f) = 0 for f > 5/4, we must satisfy ĝ(f) = 1/2 for |f| < 3/4. Thus, to satisfy the Nyquist criterion, ĝ(f) is uniquely specified as below.
[Figure: ĝ(f) = 1/2 for |f| < 3/4 and for 1 < |f| ≤ 5/4; ĝ(f) = 0 elsewhere.]
In the regions where ĝ(f) = 1/2, we must choose q̂(f) = 1/[2p̂(f)ĥ(f)]. Elsewhere ĝ(f) = 0 because ĥ(f) = 0, and thus q̂(f) is arbitrary. More specifically, we must choose q̂(f) to satisfy
q̂(f) = { 0.5 for |f| ≤ 0.5;   1/(3 − 2|f|) for 0.5 < |f| ≤ 0.75;   1/(3 − 2|f|) for 1 ≤ |f| ≤ 5/4 }.
It makes no difference what q̂(f) is elsewhere as it will be multiplied by zero there.
(c) Since ĥ(f) = 0 for f > 3/4, it is necessary that ĝ(f) = 0 for |f| > 3/4. Thus, for all integers m, ĝ(f + 2m) is 0 for 3/4 < f < 1 and the Nyquist criterion cannot be met.
(d) If for some frequency f, p̂(f)ĥ(f) ≠ 0, it is possible for ĝ(f) to have an arbitrary value by choosing q̂(f) appropriately. On the other hand, if p̂(f)ĥ(f) = 0 for some f, then ĝ(f) = 0. Thus, to avoid ISI, it is necessary that for each 0 ≤ f ≤ 1/(2T), there is some integer m such that ĥ(f + m/T)p̂(f + m/T) ≠ 0. Equivalently, it is necessary that Σ_m ĥ(f + m/T)p̂(f + m/T) ≠ 0 for all f.
There is one peculiarity here that you were not expected to deal with. If p̂(f)ĥ(f) goes through zero at f_0 with some given slope, and that is the only f that can be used to satisfy the Nyquist criterion, then even if we ignore the point f_0, the response q̂(f) would approach infinity fast enough in the vicinity of f_0 that q̂(f) would not be L2.
This overall problem shows that under ordinary conditions (i.e., non-zero filter transforms), there is no problem in choosing q̂(f) to avoid intersymbol interference. Later, when noise is taken into account, it will be seen that it is undesirable for q̂(f) to be very large where p̂(f) is small, since this amplifies the noise in frequency regions where there is very little signal.
Exercise 6.8: 
(a) With α = 1, the flat part of ĝ(f) disappears. Using T = 1 and using the familiar formula cos²x = (1 + cos 2x)/2, ĝ1(f) becomes
ĝ1(f) = (1/2)[1 + cos(πf)] rect(f/2).
Writing cos x = (1/2)[e^{ix} + e^{−ix}] and using the frequency shift rule for Fourier transforms, we get
g1(t) = sinc(2t) + (1/2)sinc(2t + 1) + (1/2)sinc(2t − 1)
= sin(2πt)/(2πt) + (1/2) sin(π(2t + 1))/(π(2t + 1)) + (1/2) sin(π(2t − 1))/(π(2t − 1))
= sin(2πt)/(2πt) − (1/2) sin(2πt)/(π(2t + 1)) − (1/2) sin(2πt)/(π(2t − 1))
= [sin(2πt)/(2π)] [1/t − 1/(2t + 1) − 1/(2t − 1)]
= sin(2πt)/(2πt(1 − 4t²)) = sin(πt)cos(πt)/(πt(1 − 4t²)) = sinc(t)cos(πt)/(1 − 4t²).
This agrees with (6.18) in the text for α = 1, T = 1. Note that the denominator is 0 at t = ±0.5. The numerator is also 0, and it can be seen from the first equation above that the limiting value as t → ±0.5 is 1/2. Note also that this approaches 0 with increasing t as 1/t³, much faster than sinc(t).
(b) It is necessary to use the result of Exercise 6.6 here. As shown there, the inverse transform of a real symmetric waveform ĝ_α(f) that satisfies the Nyquist criterion for T = 1 and has a rolloff of α ≤ 1 is equal to sinc(t)v(t). Here v(t) is lowpass limited to α/2 and its transform v̂(f) is given by the following:
v̂(f + 1/2) = dĝ(f)/df   for −(1 + α)/2 < f < −(1 − α)/2.
That is, we take the derivative of the leading edge of ĝ(f), from −(1 + α)/2 to −(1 − α)/2, and shift by 1/2 to get v̂(f). Using the middle expression in (6.17) of the text, and using the fact that cos²(x) = (1 + cos 2x)/2,
v̂(f + 1/2) = (1/2) (d/df) [1 + cos(π(−f − (1 − α)/2)/α)]
for f in the interval (−(1 + α)/2, −(1 − α)/2). Shifting by letting s = f + 1/2,
v̂(s) = (1/2)(d/ds) cos(πs/α − π/2) = (1/2)(d/ds) sin(πs/α) = (π/(2α)) cos(πs/α)
for s ∈ (−α/2, α/2). Multiplying this by rect(s/α) gives us an expression for v̂(s) everywhere. Using cos x = (1/2)(e^{ix} + e^{−ix}) allows us to take the inverse transform of v̂(s), getting
v(t) = (π/4)[sinc(αt + 1/2) + sinc(αt − 1/2)]
= (π/4)[sin(παt + π/2)/(παt + π/2) + sin(παt − π/2)/(παt − π/2)].
Using the identity sin(x + π/2) = cos x again, this becomes
v(t) = (1/4)[cos(παt)/(αt + 1/2) − cos(παt)/(αt − 1/2)] = cos(παt)/(1 − 4α²t²).
Since g(t) = sinc(t/T)v(t), the above result for v(t) corresponds with (6.18) for T = 1.
(c) The result for arbitrary T follows simply by scaling. 
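The closed form for g₁(t) in part (a) can be checked against a direct numerical inverse transform of ĝ₁(f); the evaluation points below are arbitrary (t = ±0.5 is avoided, since the closed form is a 0/0 limit of value 1/2 there).

```python
import numpy as np

# g1_hat(f) = (1/2)(1 + cos(pi f)) on |f| <= 1, compared with sinc(t) cos(pi t)/(1 - 4 t^2).
f = np.linspace(-1.0, 1.0, 20001)
df = f[1] - f[0]
g_hat = 0.5 * (1 + np.cos(np.pi * f))

for t in [0.0, 0.2, 0.7, 1.3]:
    numeric = np.sum(g_hat * np.cos(2 * np.pi * f * t)) * df   # g1 is real and even in f
    closed = np.sinc(t) * np.cos(np.pi * t) / (1 - 4 * t**2)
    print(t, round(float(numeric), 5), round(float(closed), 5))
```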
Exercise 6.9: 
(a) The figure is incorrectly drawn in the exercise statement and should be as follows:
[Figure: ĝ_k(f), sketched for k = 2 — a central pulse of height 1 around f = 0 together with k narrow pulses of height 1/k on each side, the outermost reaching |f| = 7/4.]
In folding these pulses together to check the Nyquist criterion, note that each pulse on the positive side of the figure folds onto the interval from −1/2 to −1/4, and each pulse on the left folds onto 1/4 to 1/2. Since there are k of them, each of height 1/k, they add up to satisfy the Nyquist criterion.
(b) In the limit k → ∞, the height of each pulse goes to 0, so the pointwise limit is simply the middle pulse. Since there are 2k pulses, each of energy 1/(4k²), the energy difference between that pointwise limit and ĝ_k(f) is 1/(2k), which goes to 0 with k. Thus the pointwise limit and the L2 limit both converge to a function that does not satisfy the Nyquist criterion for T = 1 and is not remotely close to a function satisfying the Nyquist condition. Note also that one could start with any central pulse and construct a similar example such that the limit satisfies the Nyquist criterion.
Exercise 6.11: 
(a) Note that 
xk(t) = 2<{exp(2πi(fk + fc)t)} = cos[2π(fk + fc)t]. 
The cosine function is even, and thus x1(t) = x2(t) if f1 + fc = −f2 − fc. This is the only possibility for equality unless f1 = f2. Thus, the only f2 ≠ f1 for which x1(t) = x2(t) is f2 = −2fc − f1. Since f1 > −fc, this requires f2 < −fc, which is why this situation cannot arise when fk ∈ [−fc, fc) for each k.
(b) For any ˆu1(f), one can find a function ˆu2(f) by the transformation f2 = −2fc − f1 in 
(a). Thus without the knowledge that u1(t) is lowpass limited to some B < fc, the ambiguous 
frequency components in u1(t) cannot be differentiated from those of u2(t) by observing x(t). 
If u(t) is known only to be bandlimited to some band B greater than fc, then the frequencies
between −B and B − 2fc are ambiguous. 
An easy way to see the problem here is to visualize ˆu(f) both moved up by fc and down by fc. 
The bands overlap if B > fc and the overlapped portion can not be retrieved without additional 
knowledge about u(t). 
(c) The ambiguity is obvious by repeating the argument in (a). Now, since y(t) has some nonzero 
bandwidth, ambiguity might be possible in other ways also. We have already seen, however, 
that if u(t) has a bandwidth less than fc, then u(t) can be uniquely retrieved from x(t) in the 
absence of noise. 
(d) For u(t) real, x(t) = 2u(t) cos(2πfct), so u(t) can be retrieved by dividing x(t) by 2 cos(2πfct) 
except at those points of measure 0 where the cosine function is 0. This is not a reasonable 
approach, especially in the presence of noise, but it points out that the PAM case is essentially 
different from the QAM case.
(e) Since u∗(t) exp(2πifct) has energy at positive frequencies, the use of a Hilbert filter does 
not have an output equal to u(t) exp(2πifct), and thus u(t) does not result from shifting this 
output down by fc. In the same way, the bands at 2fc and −2fc that result from DSB-QC 
demodulation mix with those at 0 frequency, so cannot be removed by an ordinary LTI filter. 
For QAM, this problem is to be expected since u(t) cannot be uniquely generated by any means 
at all. 
For PAM it is surprising, since it says that these methods are not general. Since all time-limited 
waveforms are unbounded in frequency, it says that there is a fundamental theoretical problem 
with the standard methods of demodulation. This is not a problem in practice, since fc is usually 
so much larger than the nominal bandwidth of u(t) that this problem is of no significance. 
Exercise 6.13:
(a) Since u(t) is real, φ1(t) = ℜ{u(t)e^{2πif_ct}} = u(t) cos(2πf_ct), and since v(t) is pure imaginary, φ2(t) = ℜ{v(t)e^{2πif_ct}} = [iv(t)] sin(2πf_ct). Note that [iv(t)] is real. Thus we must show that
∫ u(t) cos(2πf_ct)[iv(t)] sin(2πf_ct) dt = (1/2) ∫ u(t)[iv(t)] sin(4πf_ct) dt = 0.
Since u(t) and v(t) are lowpass limited to B/2, their product (which corresponds to convolution in the frequency domain) is lowpass limited to B < 2f_c. Rewriting the sin(4πf_ct) above in terms of complex exponentials, and recognizing the resulting integral as the Fourier transform of u(t)[iv(t)] at ±2f_c, we see that the above integral is indeed zero.
(b) Almost anything works here, and a simple choice is u(t) = [iv(t)] = rect(8f_ct − 1/2).
Exercise 6.15: 
(a)
∫_{−∞}^{∞} √2 p(t − jT) cos(2πf_ct) √2 p*(t − kT) cos(2πf_ct) dt
= ∫_{−∞}^{∞} p(t − jT)p*(t − kT)[1 + cos(4πf_ct)] dt
= ∫_{−∞}^{∞} p(t − jT)p*(t − kT) dt + ∫_{−∞}^{∞} p(t − jT)p*(t − kT) cos(4πf_ct) dt
= δ_{jk} + (1/2) ∫_{−∞}^{∞} p(t − jT)p*(t − kT)[e^{4πif_ct} + e^{−4πif_ct}] dt.
The remaining task is to show that the integral above is 0. Let g_{jk}(t) = p(t − jT)p*(t − kT). Note that ĝ_{jk}(f) is the convolution of the transform of p(t − jT) and that of p*(t − kT). Since p is lowpass limited to f_c, g_{jk} is lowpass limited to 2f_c, and thus the integral (which calculates the Fourier transform of g_{jk} at 2f_c and −2f_c) is zero.
(b) Similar to part (a) we get,
∫_{−∞}^{∞} √2 p(t − jT) sin(2πf_ct) √2 p*(t − kT) sin(2πf_ct) dt
= ∫_{−∞}^{∞} p(t − jT)p*(t − kT)[1 − cos(4πf_ct)] dt
= ∫_{−∞}^{∞} p(t − jT)p*(t − kT) dt − ∫_{−∞}^{∞} p(t − jT)p*(t − kT) cos(4πf_ct) dt
= δ_{jk} − (1/2) ∫_{−∞}^{∞} g_{jk}(t)[e^{4πif_ct} + e^{−4πif_ct}] dt.
Again, the integral is 0 and orthonormality is proved.
Now for any j, k,
∫_{−∞}^{∞} √2 p(t − jT) sin(2πf_ct) √2 p*(t − kT) cos(2πf_ct) dt
= ∫_{−∞}^{∞} p(t − jT)p*(t − kT) sin(4πf_ct) dt
= (1/(2i)) ∫_{−∞}^{∞} g_{jk}(t)[e^{4πif_ct} − e^{−4πif_ct}] dt,
which is zero as before. Thus all sine terms are orthonormal to all cosine terms.
Exercise 6.16: Let ψ_k(t) = θ_k(t)e^{2πif_ct}. Since
∫ ψ_k(t)ψ*_j(t) dt = ∫ θ_k(t)e^{2πif_ct} θ*_j(t)e^{−2πif_ct} dt = δ_{kj} ,
the set {ψ_k(t)} is an orthonormal set. The set {ψ*_k(t)} is also an orthonormal set for the same reason. Also, since each ψ_k(t) is bandlimited to [f_c − B/2, f_c + B/2] and each ψ*_k(t) is bandlimited to [−f_c − B/2, −f_c + B/2], the frequency bands do not overlap, so by Parseval's relation, each ψ_k(t) is orthogonal to each ψ*_j(t). This is where the constraint B/2 < f_c has been used.
Next note that the sets of functions ψ_{k,1}(t) = ℜ{2ψ_k(t)} and ψ_{k,2}(t) = ℑ{2ψ_k(t)} are given by
ψ_{k,1}(t) = ψ_k(t) + ψ*_k(t)   and   iψ_{k,2}(t) = ψ_k(t) − ψ*_k(t).
It follows that the set {ψ_{k,1}(t)} is an orthogonal set, each of energy 2, and the set {ψ_{k,2}(t)} is an orthogonal set, each of energy 2. For the same reason, for each k, j with k ≠ j, ψ_{k,1} and ψ_{j,2} are orthogonal. Finally, for each k, and for each ℓ ∈ {1, 2},
∫ ψ_{k,ℓ} ψ_{k,ℓ} dt = ∫ |ψ_k(t)|² + |ψ*_k(t)|² dt = 2.
Exercise 6.17: 
(a) This expression is given in (6.25) of the text. 
(b) Note that the hypothesized expression for x(t) is 
2|u(t)| cos[2πfct + φ(t)] = 2|u(t)| cos[φ(t)] cos(2πfct) − 2|u(t)| sin[φ(t)] sin(2πfct). 
Since u(t) = |u(t)|e^{iφ(t)},
ℜ{u(t)} = |u(t)| cos[φ(t)]   and   ℑ{u(t)} = |u(t)| sin[φ(t)],
so the hypothesized expression agrees with (6.25). Assuming that φ(t) varies slowly with respect to f_ct, x(t) varies between 2|u(t)| and −2|u(t)|, touching ±2|u(t)| each once per cycle.
(c) Since |exp(2πift)| = 1 for any real f, |u′(t)| = |u(t)| = |x⁺(t)|. Thus this envelope varies with the baseband waveform and is defined independent of the carrier. The phase modulation (as well as x(t)) does depend on the carrier.
(d) Since 2πf_ct + φ(t) = 2πf′_ct + φ′(t), φ(t) and φ′(t) are related by
φ′(t) = φ(t) + 2π(f_c − f′_c)t.
(e) There are two reasonable approaches to this. First, 
x2(t) = 4|u(t)|2 cos2[2πfct + φ(t)] = 2|u(t)|2 + 2|u(t)|2 cos[4πfct + 2φ(t)]. 
Filtering out the term at 2fc, we are left with 2|u(t)|2. The filtering has the effect of forming a 
short term average. The trouble with this approach is that it is not quite obvious that all of the high frequency terms get filtered out. The other approach is more tedious and involves squaring
x(t) using (6.25). After numerous trigonometric identities left to the imagination of the reader, 
the same result as above is derived. 
Chapter 7 
Exercise 7.1: 
(a) and (b) Since X, Y are independent, 
fX,Y (x, y) = αe−x2/2αe−y2/2 = α2e−(x2+y2)/2. 
Hence, the joint density has circular symmetry and the contours of equal probability density in 
the plane are concentric circles. Define 
FS(s) = Pr{S ≤ s} = Pr{X2 + Y 2 ≤ s} = 
ZZ 
(x,y): x2+y2≤s 
fX,Y (x, y) dxdy. 
Thus we need to integrate the joint density over a circle of radius √s centered at the origin. 
Consider dividing the region of integration into concentric annular rings with thickness dr. An 
annulus of radius r then contributes 
α2e−r2/2(2πr)dr 
to the integral (since the value of the fX,Y at a distance r is α2e−r2/2). Hence, 
FS(s) = 
ZZ 
(x,y): x2+y2≤s 
fX,Y (x, y) dxdy = 
Z √s 
0 
α2e−r2/2(2πr)dr = α22π(1 − e−s/2). 
This must approach 1 as s → 1, so α22π = 1 and α = 1/√2π. 
Differentiating FS(s), 
fS(s) = α22π 
μ 
1 
2e−s/2 
∂ 
= 
1 
2e−s/2. 
Recognizing S as an exponential rv with parameter 1/2, E[S] = 2. Also S = X2 +Y 2, and since 
X and Y are iid, E[X2] = E[Y 2]. Thus 
E[X2] = 1. 
Finally, since fX(x) is symmetric about 0, E[X] = 0. 
The point of parts (a) and (b) is to see that the spherical symmetry of the joint density of two 
(or more) iid Gaussian rvs allows us to prove properties that are cumbersome to prove directly. 
For instance, proving α = 1/√2π or E[X2] = 1 by considering the Gaussian pdf directly would 
be rather tedious. 
(c) Since FR(r) = Pr(R < r) = Pr(√S < r) = Pr(S < r2) = FS(r2), 
fR(r) = fS(r2)2r = re−r2/2. 
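A Monte Carlo sketch confirming the distributions derived above (sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)
s = x**2 + y**2
print(s.mean())                                   # ~ 2 = E[S]
print(np.mean(s > 3.0), np.exp(-1.5))             # Pr(S > 3) vs e^{-3/2}
print(np.mean(np.sqrt(s) > 1.0), np.exp(-0.5))    # Pr(R > 1) vs e^{-1/2} (Rayleigh tail)
```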
Exercise 7.2:
(a) Let Z = X + Y. Since X and Y are independent, the density of Z is the convolution of the X and Y densities. To make the principles stand out, assume σ²_X = σ²_Y = 1.
f_Z(z) = f_X(z) ∗ f_Y(z) = ∫_{−∞}^{∞} f_X(x)f_Y(z − x) dx
= ∫_{−∞}^{∞} (1/√(2π))e^{−x²/2} (1/√(2π))e^{−(z−x)²/2} dx
= (1/(2π)) ∫_{−∞}^{∞} e^{−(x² − xz + z²/2)} dx
= (1/(2π)) ∫_{−∞}^{∞} e^{−(x² − xz + z²/4) − z²/4} dx
= (1/(2√π)) e^{−z²/4} ∫_{−∞}^{∞} (1/√π) e^{−(x − z/2)²} dx
= (1/(2√π)) e^{−z²/4},
since the last integral integrates a Gaussian pdf with mean z/2 and variance 1/2, which evaluates to 1. As expected, Z is Gaussian with zero mean and variance 2.
The 'trick' used here in the fourth equation above is called completing the square. The idea is to take a quadratic expression such as x² + αxz + βz² and to add and subtract α²z²/4. Then x² + αxz + α²z²/4 is (x + αz/2)², which leads to a Gaussian form that can be integrated.
Repeating the same steps for arbitrary σ²_X and σ²_Y, we get the Gaussian density with mean 0 and variance σ²_X + σ²_Y.
(b) and (c) We can also find the density of the sum of independent rvs by taking the product of the Fourier transforms of the densities and then taking the inverse transform. Since e^{−πt²} ↔ e^{−πf²} is a Fourier transform pair, the scaling property leads to
(1/√(2πσ²)) exp(−x²/(2σ²)) ↔ exp(−π(2πσ²)θ²) = exp(−2π²σ²θ²).   (29)
Since convolution for densities corresponds to multiplication for their transforms, the transform of Z = X + Y is given by
f̂_Z(θ) = f̂_X(θ) f̂_Y(θ) = exp[−2π²θ²(σ²_X + σ²_Y)].
Recognizing this, with σ² = σ²_X + σ²_Y, as the same transform as in (29), the density of Z is the inverse transform
f_Z(z) = (1/√(2π(σ²_X + σ²_Y))) exp(−z²/(2(σ²_X + σ²_Y))).   (30)
(d) Note that α_kW_k is a zero-mean Gaussian rv with variance α²_k. Thus for n = 2, the density of V is given by (30) as
f_V(v) = (1/√(2π(α²_1 + α²_2))) exp(−v²/(2(α²_1 + α²_2))).
The general formula, for arbitrary n, follows by iteration, viewing the sum of n variables as the sum of n − 1 variables (which is Gaussian) plus one new Gaussian rv. Thus
f_V(v) = (1/√(2π Σ_k α²_k)) exp(−v²/(2 Σ_{k=1}^{n} α²_k)).
Exercise 7.3: 
(a) Note that fX is twice the density of a normal N(0, 1) rv for x ≥ 0 and thus that X is the 
magnitude of a normal rv. Multiplying by U simply converts X into a N(0, 1) rv Y1 that can 
be positive or negative. The mean and variance of Y1 are then 0 and 1 respectively. 
(b) Let Z be independent of X with the same density as X and let Y2 = UZ. Then Y2 is also 
a normal Gaussian rv. Note that Y1 and Y2 are each nonnegative if U = 1 and each negative if 
U = −1. Furthermore, given U = 1 , Y1 and Y2 are equal to X and Z respectively and thus have 
an iid Gaussian density conditional on being in the first quadrant. Given U = −1, Y1 and Y2 
are each negative with the density of −X and −Z. The unconditional density is then positive 
only in the first and third quadrant, with the conditional density in each quadrant multiplied 
by 1/2. 
Note that Y1 and Y2 are individually Gaussian, but clearly not jointly Gaussian, since their joint 
density is 0 in the second and fourth quadrant, and thus the contours of equi-probability density 
are not the ellipses of Figure 7.2. 
(c) 
E[Y1Y2] = E[UX UZ] = E[U2]E[X]E[Z] = (E[X])2, 
where we used the independence of U,X, and Z and also used the fact that U2 is deterministic 
with value 1. Now, 
E[X] = ∫_0^{∞} x √(2/π) exp(−x²/2) dx = √(2/π),
where we have integrated by using the fact that x dx = d(x2/2). Combining the above equations, 
E[Y1Y2] = 2/π. For jointly Gaussian rvs, the mean and covariance specify the joint density (given 
by (7.20) in the text for 2 rvs). Since this density is different from that of Y1 and Y2, this provides 
a very detailed proof that Y1 and Y2 are not jointly Gaussian. 
(d) In order for the joint probability of two individually Gaussian rvs V1, V2 to be concentrated 
on the diagonal axes, v2 = v1 and v2 = −v1, we arrange that v2 = ±v1 for each joint sample 
value. To achieve this, let X be the same as above and let U1 and U2 be iid binary rvs, each 
taking on the values +1 and -1 with equal probability. Then we let V1 = U1X and V2 = U2X. 
That is V1 and V2 are individually Gaussian but have identical magnitudes. Thus their sample 
values lie on the diagonal axes and they are not jointly Gaussian. 
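A Monte Carlo sketch of the construction in parts (b)-(c) (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x, z = np.abs(rng.standard_normal(n)), np.abs(rng.standard_normal(n))   # |N(0,1)| has the density f_X
u = rng.choice([-1.0, 1.0], size=n)
y1, y2 = u * x, u * z
print(y1.mean(), y1.var())            # ~ 0 and ~ 1: Y1 is N(0,1)
print(np.mean(y1 * y2), 2/np.pi)      # E[Y1 Y2] ~ 2/pi ~ 0.6366
print(np.mean(y1 * y2 < 0))           # 0: no samples in the second and fourth quadrants
```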
Exercise 7.5: This exercise is needed to justify that {V(τ); τ ∈ R} in (7.34) of the text is L2 with probability 1, but contains nothing explicit about random processes. Let g(τ) = ∫ φ(t)h(τ − t) dt. Since convolution in the time domain corresponds to multiplication in the frequency domain, ĝ(f) = φ̂(f)ĥ(f). Since h is L1, |ĥ(f)| ≤ ∫ |h(t)| dt ≐ h° < ∞. Thus
∫ |g(τ)|² dτ = ∫ |ĝ(f)|² df = ∫ |ĥ(f)|²|φ̂(f)|² df ≤ (h°)² ∫ |φ̂(f)|² df = (h°)².
To see that the assumption that h is L1 (or some similar condition) is necessary, let φ̂(f) = ĥ(f) each be f^{−1/4} for 0 < |f| ≤ 1 and each be zero elsewhere. Then ∫_0^1 |φ̂(f)|² df = 2. Thus h and φ are L2, but ĥ is unbounded and hence h is not L1. Also, ∫_0^1 |φ̂(f)ĥ(f)|² df = ∫_0^1 (1/f) df = ∞.
What this problem shows in general is that the convolution of two L2 functions is not necessarily L2, but that the additional condition that one of the functions is L1 is enough to guarantee that the convolution is L2.
Exercise 7.7: 
(a) For each t ∈ R, Z(t) is a finite sum of zero-mean independent Gaussian rvs {φ_k(t)Z_k; 1 ≤ k ≤ n} and is thus itself zero-mean Gaussian.
var(Z(t)) = E[Z²(t)] = E[(Σ_{k=1}^{n} φ_k(t)Z_k)(Σ_{i=1}^{n} φ_i(t)Z_i)] = Σ_{i=1}^{n} Σ_{k=1}^{n} φ_i(t)φ_k(t)E[Z_iZ_k] = Σ_{k=1}^{n} φ_k²(t)σ_k²,
where E[Z_iZ_k] = σ_k²δ_{ik} since the Z_k are zero mean and independent.
(b) Let Y = (Z(t_1), Z(t_2), . . . , Z(t_ℓ))ᵀ and let Z = (Z_1, . . . , Z_n)ᵀ be the underlying Gaussian rvs defining the process. We can write Y = BZ, where B is an ℓ × n matrix whose (m, k)th entry is B(m, k) = φ_k(t_m). Since Z_1, . . . , Z_n are independent, Y is jointly Gaussian. Thus {Z(t_1), Z(t_2), . . . , Z(t_ℓ)} are jointly Gaussian. By definition then, Z(t) is a Gaussian process.
(c) Note that Z^{(j)}(t) − Z^{(n)}(t) = Σ_{k=n+1}^{j} Z_kφ_k(t). By the same analysis as in part (a),
var(Z^{(j)}(t) − Z^{(n)}(t)) = Σ_{k=n+1}^{j} φ_k²(t)σ_k².
Since |φ_k(t)| < A for all k and t, we have
var(Z^{(j)}(t) − Z^{(n)}(t)) ≤ Σ_{k=n+1}^{j} σ_k²A².
Because Σ_{k=1}^{∞} σ_k² < ∞, Σ_{k=n+1}^{∞} σ_k² converges to 0 as n → ∞. Thus, var(Z^{(j)}(t) − Z^{(n)}(t)) approaches 0 for all j as n → ∞, i.e., for any δ > 0, there is an n_δ such that var(Z^{(j)}(t) − Z^{(n)}(t)) ≤ δ for all j, n ≥ n_δ. By the Chebyshev inequality, for any ε > 0,
Pr[Z^{(j)}(t) − Z^{(n)}(t) ≥ ε] ≤ δ/ε².
Choosing ε² = √δ, and letting δ go to 0 (with n_δ → ∞), we see that
lim_{n,j→∞} Pr[Z^{(j)}(t) − Z^{(n)}(t) > 0] = 0.
Thus we have a sequence of Gaussian random variables that, in the limit, take on the same sample values with probability 1. This is a strong (but not completely rigorous) justification for saying that there is a limiting rv that is Gaussian.
(d) Since the set of functions {φ_k(t)} are orthonormal,
‖z‖² = ∫_{−∞}^{∞} (Σ_k z_kφ_k(t))(Σ_i z_iφ_i(t)) dt = Σ_{i,k} z_kz_i⟨φ_k(t), φ_i(t)⟩ = Σ_k z_k².
Thus the expected energy in the process is E[‖{Z(t)}‖²] = Σ_k E[Z_k²] = Σ_k σ_k².
(e) Using the Markov inequality,
Pr{‖{Z(t)}‖² > α} ≤ Σ_k σ_k²/α.
Since Σ_k σ_k² < ∞, lim_{α→∞} Pr{‖{Z(t)}‖² > α} = 0. This says that sample functions of Z(t) are L2 with probability 1.
Exercise 7.8: 
(a) Let t1, t2, . . . , tn be an arbitrary set of epochs. We will show that Z(t1), . . . ,Z(tn) are jointly 
Gaussian. Each of these rvs are linear combinations of the iid Gaussian set {Zk; k ∈ Z}, and, 
more specifically are linear combinations of a finite subset of {Zk; k ∈ Z}. Thus they are jointly 
Gaussian and {Z(t); t ∈ R} is a Gaussian random process. 
It is hard to take the need for a finite set above very seriously, but for the mathematically pure 
of heart, it follows from the fact that when t is not an integer plus 1/2, Z(t) is equal to Zk for 
k equal to the integer part of t + 1/2. In the special case where t is an integer plus 1/2, Z(t) is 
the sum of Zt−1/2 and Zt+1/2. 
(b) The covariance of {Z(t); t ∈ R} (neglecting the unimportant case where t or τ is equal to an integer plus 1/2) is:
K_Z(t, τ) = E[Z(t)Z(τ)] = E[Z_{⌊t+0.5⌋}Z_{⌊τ+0.5⌋}] = { 1 if ⌊t + 0.5⌋ = ⌊τ + 0.5⌋;   0 otherwise }.
(c) Since K_Z(t, τ) is a function of both t and τ rather than just t − τ (for example, 0 = K_Z(4/3, 2/3) ≠ K_Z(2/3, 0) = 1), the process is not WSS. Hence it is not stationary.
(d) Let V(0) = V_0 and V(0.5) = V_1. We see that V_0 and V_1 will be independent if 0 < φ ≤ 0.5 and V_0 = V_1 if 0.5 < φ ≤ 1. Hence,
f_{V_1|V_0,Φ}(v_1|v_0, φ) = { N(0, 1) for 0 < φ ≤ 0.5;   δ(v_1 − v_0) otherwise }.
Recognizing that Pr(0 < Φ ≤ 0.5) = 1/2, the conditioning on Φ can be eliminated to get
f_{V_1|V_0}(v_1|v_0) = (1/2)(1/√(2π)) exp(−v_1²/2) + (1/2)δ(v_1 − v_0).
The main observation to be made from this is that V_0 and V_1 cannot be jointly Gaussian since this conditional distribution is not Gaussian.
(e) Ignore the zero-probability event that φ = 1/2. Note that, given Φ < 1/2, V0 = Z0 and, 
given Φ > 1/2, V0 = Z−1. Since Φ is independent of Z0 and Z−1 and Z0 and Z−1 are iid, this 
means that V0 is N(0, 1) and, by the same argument, V1 is N(0, 1). Thus V (0) and V (0.5) are 
individually Gaussian but not jointly Gaussian. This is a less artificial example than those in 
Exercises 7.3 and 7.4 of Gaussian rvs that are not jointly Gaussian. The same argument applies 
to Z(t),Z(τ ) for any t, τ for which |t − τ | < 1, so {V (t); t ∈ R} is not a Gaussian process even 
though V (t) is a Gaussian rv for each t. 
(f) Consider the event that V(t) and V(τ) are both given by the same Z_i. This is impossible for |t − τ| > 1, so consider the case t ≤ τ < t + 1. Let V(t) = Z_i (where i is such that −1/2 < t − i − Φ ≤ 1/2). Since Φ is uniform in [0, 1], Pr[V(τ) = Z_i] = 1 − (τ − t). For t − 1 < τ ≤ t, Pr[V(τ) = Z_i] = 1 − (t − τ). Given the event that V(t) and V(τ) are both given by the same Z_i, the conditional expected value of V(t)V(τ) is 1, and otherwise it is 0. Thus
E[V(t)V(τ)] = { 1 − |t − τ| for |t − τ| ≤ 1;   0 otherwise }.
(g) Since the covariance function is a function of |t−τ | only, the process is WSS. For stationarity, 
we need to show that for all integers n, for all sets of epochs t1, . . . , tn ∈ R and for all shifts τ ∈ R, 
the joint distribution of V (t1), . . . , V (tn) is the same as the one of V (t1+τ ), . . . , V (tn+τ ). Let 
us consider n = 2. As in parts (d)-(e), one can show that the joint distribution of V (t1), V (t2) 
depends only on whether t1 −t2 is smaller than the value of φ or not; thus shifting both epochs 
by the same amount does not modify the joint distribution. For arbitrary n, one can observe 
that only the spacing between the epochs matters, hence a common shift does not affect the 
joint distribution and the process is stationary. 
Exercise 7.10: 
(a) Since X*_{−k} = X_k for all k, we see that E[X_kX*_{−k}] = E[X_k²]. Thus we are to show that E[X_k²] = 0 implies the given relations.
E[X_k²] = E[(ℜ(X_k) + iℑ(X_k))²] = E[ℜ(X_k)²] − E[ℑ(X_k)²] + 2iE[ℜ(X_k)ℑ(X_k)].
Since both the real and imaginary parts of this must be 0,
E[ℜ(X_k)²] = E[ℑ(X_k)²]   and   E[ℜ(X_k)ℑ(X_k)] = 0.
Note: If X_k is Gaussian, this shows that it is circularly symmetric.
(b) The statement of this part of the exercise is somewhat mangled. The intent (which is clear in Section 7.11.3, where this result is required) is to show that if E[X_kX*_m] = 0 for all m ≠ k, then E[ℜ(X_k)ℜ(X_m)], E[ℜ(X_k)ℑ(X_m)], and E[ℑ(X_k)ℑ(X_m)] are all zero for all m ≠ ±k. To show this, note that 2iℑ(X_m) = X_m − X*_m. Thus, for m ≠ ±k,
0 = E[X_kX*_m] − E[X_kX*_{−m}] = E[X_k(X*_m − X_m)] = −2iE[X_kℑ(X_m)].
The real and imaginary parts of this show that E[ℜ(X_k)ℑ(X_m)] = 0 and E[ℑ(X_k)ℑ(X_m)] = 0. The same argument, using 2ℜ(X_m) = X_m + X*_m, shows that E[ℜ(X_k)ℜ(X_m)] = 0.
Exercise 7.12: 
(a) We view Y as a linear functional of {Z(t)} by expressing it as 
Y = 
Z T 
0 
Z(t) dt = 
Z 
1 
−1 
Z(t)g(t) dt. 
where 
g(t) = 
( 
1 if t ∈ [0, T] 
0 otherwise 
. 
Since Y is a linear functional of a Gaussian process, it is a Gaussian rv. Hence, we only need to 
find its mean and variance. Since {Z(t)} is zero-mean, 
E[Y ] = E 
ΣZ 
1 
−1 
Π 
= 
Z(t)g(t) dt 
Z 
1 
−1 
E[Z(t)]g(t) dt = 0. 
E[Y 2] = E 
ΣZ T 
0 
Z(t)g(t) dt 
Z T 
0 
Z(τ )g(τ ) dτ 
Π 
= 
Z T 
0 
Z T 
0 
E[Z(t)Z(τ )]g(t)g(τ ) dτ dt 
= 
Z T 
0 
Z T 
0 
N0 
2 δ(τ − t)g(t)g(τ ) dτ dt 
= N0 
2 
Z 
1 
−1 
g2(τ ) dτ. 
For g(t) in this problem, 
R 
1 
−1 
g2(τ )dτ = T so E[Y 2] = N0T 
2 . 
(b) The ideal filter h(t), normalized to unit energy, is given by

h(t) = √(2W) sinc(2Wt)  ↔  ĥ(f) = (1/√(2W)) rect(f/2W).

The input process {Z(t)} is WGN of spectral density N0/2, as interpreted in the text. Thus the
output is a stationary Gaussian process with the spectral density

S_Y(f) = S_Z(f) |ĥ(f)|² = (N0/2)(1/2W) rect(f/2W).

The covariance function is then given by

K̃_Y(τ) = (N0/2) sinc(2Wτ).

The covariance matrix for Y1 = Y(0) and Y2 = Y(1/4W) is then

K = (N0/2) [ 1    2/π
             2/π  1  ].

Using (7.20), the resulting joint probability density for Y1, Y2 is

f_{Y1 Y2}(y1, y2) = 1/(π N0 √(1 − (2/π)²)) exp( −(y1² + y2² − (4/π) y1 y2) / (N0 (1 − (2/π)²)) ).
(c)

V = ∫_0^∞ e^{−t} Z(t) dt = ∫_{−∞}^{∞} g(t) Z(t) dt,

where g(t) = e^{−t} for t ≥ 0 and g(t) = 0 otherwise. Thus, V is a zero-mean Gaussian rv with variance

E[V²] = (N0/2) ∫_{−∞}^{∞} g²(t) dt = N0/4,

i.e., V ~ N(0, N0/4).
Exercise 7.14: 
(a) Following the hint, let Z1 be CN(0, 1) and let Z2 = UZ1 where U is independent of Z1 and 
is ±1 with equal probability. Then since <{Z1} and ={Z1} are iid and N(0, 1/2), it follows that 
<{Z2} = U<{Z1} and ={Z2} = U={Z1} are also iid and Gaussian and thus CN(0, 1). However 
<{Z1} and <{Z2} are not jointly Gaussian and ={Z1} and ={Z2} are not jointly Gaussian (see 
Exercise 7.3 (d)), and thus Z1 and Z2 are not jointly Gaussian. However, since Z1 and Z2 are 
circularly symmetric, Z and eiθZ have the same joint distribution for all θ. 
(b) We are given that <{Z} and ={Z} are Gaussian and that for each choice of φ, eiφZ has 
the same distribution as Z. Thus <{eiφZ} = cos(φ)<{Z} − sin(φ)={Z} must be Gaussian 
also. Scaling this, α cos(φ)<{Z} − α sin(φ)={Z} is also Gaussian for all α and all φ. Thus 
α1<{Z}+α2={Z} is Gaussian for all α1 and α2, which implies that <{Z} and ={Z} are jointly 
Gaussian. This, combined with the circular symmetry, implies that Z is circularly symmetric 
Gaussian. 
Chapter 8 
Exercise 8.1: 
(a) Conditional on observation v, the probability that hypothesis i is correct is pU|V(i|v). Thus
for decision Ũ = 0, the cost is C00 if the decision is correct (an event of probability pU|V(0|v))
and C10 if the decision is incorrect (an event of probability pU|V(1|v)). Thus the expected cost
for decision Ũ = 0 is

C00 pU|V(0|v) + C10 pU|V(1|v).

Combining this with the corresponding result for decision 1,

Ũ_mincost = arg min_j [ C0j pU|V(0|v) + C1j pU|V(1|v) ].
(b) We have

C01 pU|V(0|v) + C11 pU|V(1|v)  ≷  C00 pU|V(0|v) + C10 pU|V(1|v)   (choose Ũ = 0 if ≥, Ũ = 1 if <),

(C01 − C00) pU|V(0|v)  ≷  (C10 − C11) pU|V(1|v),

and using

pU|V(j|v) = pj fV|U(v|j) / fV(v),   j = 0, 1,

we get the desired threshold test.
(c) The MAP threshold test takes the form

Λ(v) = fV|U(v|0) / fV|U(v|1)  ≷  p1/p0   (choose Ũ = 0 if ≥, Ũ = 1 if <),
so only the RHS of the minimum cost and the MAP tests are different. We can interpret the 
RHS of the cost detection problem as follows: the relative cost of an error (i.e., the difference 
in costs between the two hypotheses) given that 1 is correct is given by C10 −C11. This relative 
cost is then weighted by the a priori probability p1 for the same reason as in the MAP case. The 
important thing to observe here, however, is that both tests are threshold tests on the likelihood 
ratio. Thus the receiver structure in both cases computes the likelihood ratio (or LLR) and then 
makes the decision according to a threshold. The calculation of the likelihood ratio is usually the
major task to be accomplished. Note also that the MAP test is the same as the minimum cost 
test when the relative cost is the same for each hypothesis. 
Exercise 8.3: Conditional on U, the Vi's are independent since the Xi's and Zi's are independent.
Under hypothesis U = 0, V1 and V2 are i.i.d. N(0, 1 + σ²) (because the sum of two
independent Gaussian random variables forms a Gaussian random variable whose variance is
the sum of the two individual variances) and V3 and V4 are i.i.d. N(0, σ²). Under hypothesis
U = 1, V1 and V2 are i.i.d. N(0, σ²) and V3 and V4 are i.i.d. N(0, 1 + σ²).
Note that by symmetry, the probability of error conditioned on U = 0 is same as that conditioned 
on U = 1. Hence the average probability of error is same as the probability of error conditioned 
on U = 0. 
(a), (b) Since the Vi’s are independent of each other under either hypothesis, we have 
fV|U(v|u) = fV1|U(v1|u)fV2|U(v2|u)fV3|U(v3|u)fV4|U(v4|u). Thus, 
LLR(v) = ln[ exp( −v1²/(1+σ²) − v2²/(1+σ²) − v3²/σ² − v4²/σ² )
           / exp( −v1²/σ² − v2²/σ² − v3²/(1+σ²) − v4²/(1+σ²) ) ]

       = (v1² + v2² − v3² − v4²) · ( 1/σ² − 1/(1+σ²) )

       = (Ea − Eb) / ( σ²(1 + σ²) ),

where Ea = v1² + v2² and Eb = v3² + v4². This shows that the log-likelihood ratio depends only
on the difference between Ea and Eb. Hence, the pair (Ea, Eb) is a sufficient statistic for this
problem. Actually the difference Ea − Eb is also a sufficient statistic.
(c) For ML detection, the LLR threshold is 0, i.e., the decision is Ũ = 0 if LLR(v) is greater
than or equal to zero and Ũ = 1 if LLR(v) is less than zero. Thus the ML detection rule reduces
to choosing Ũ = 0 if Ea ≥ Eb and Ũ = 1 otherwise. The threshold would shift to
ln( Pr(U=1)/Pr(U=0) ) for MAP detection.
(d) As described before, we only need to find the error probability conditioned on U = 0.
Conditioning on U = 0 throughout below, the ML detector will make an error if Ea < Eb. Here
(as shown in Exercise 7.1), Ea is an exponentially distributed random variable of mean 2 + 2σ²
and Eb is an independent exponential rv of mean 2σ²,

fEa(x) = 1/(2 + 2σ²) exp( −x/(2 + 2σ²) );    fEb(x) = 1/(2σ²) exp( −x/(2σ²) ).

Thus, conditional on Ea = x (as well as U = 0), an error is made if Eb > x.

Pr(Eb > x) = ∫_x^∞ 1/(2σ²) exp( −y/(2σ²) ) dy = exp( −x/(2σ²) ).
The overall error probability is then

Pr(e) = ∫_0^∞ fEa(x) exp( −x/(2σ²) ) dx
      = ∫_0^∞ 1/(2 + 2σ²) exp( −x/(2 + 2σ²) ) exp( −x/(2σ²) ) dx
      = 1 / (2 + 1/σ²).

We can make a quick sanity check of this answer by checking that it equals 0 for σ² = 0 and
approaches 1/2 as σ² → ∞. In the next chapter, it will be seen that this is the probability of error
for binary signaling in flat Rayleigh fading.
Exercise 8.5: Expanding y(t) and b(t) in terms of the orthogonal functions {φk,j(t)}, 
∫ y(t) b(t) dt = ∫ [ Σ_{k,j} y_{k,j} φ_{k,j}(t) ] [ Σ_{k′,j′} b_{k′,j′} φ_{k′,j′}(t) ] dt

             = Σ_{k,j} y_{k,j} b_{k,j} ∫ [φ_{k,j}(t)]² dt

             = 2 Σ_{k,j} y_{k,j} b_{k,j},

where we first used the orthogonality of the functions φ_{k,j}(t) and next the fact that they each
have energy 2. Dividing both sides by 2, we get (8.36) of the text.
Exercise 8.6: 
(a)

Q(x) = (1/√(2π)) ∫_x^∞ e^{−z²/2} dz

     = (1/√(2π)) ∫_0^∞ e^{−(x+y)²/2} dy        where y = z − x

     = (e^{−x²/2}/√(2π)) ∫_0^∞ e^{−y²/2 − xy} dy.
(b) The upper bound, exp(−y²/2) ≤ 1, is trivial, since exp(v) ≤ 1 whenever v ≤ 0. For the
lower bound, 1 − y²/2 ≤ exp(−y²/2) is equivalent (by taking the logarithm of both sides) to
ln(1 − y²/2) ≤ −y²/2, which is the familiar log inequality we have used many times.
(c) For the upper bound, use the upper bound of part (b), 
Q(x) = (e^{−x²/2}/√(2π)) ∫_0^∞ e^{−y²/2 − xy} dy ≤ (e^{−x²/2}/√(2π)) ∫_0^∞ e^{−xy} dy = e^{−x²/2} / (x√(2π)).

For the lower bound, use the lower bound of part (b) and then substitute z for xy.

Q(x) = (e^{−x²/2}/√(2π)) ∫_0^∞ e^{−y²/2 − xy} dy

     ≥ (e^{−x²/2}/√(2π)) ∫_0^∞ e^{−xy} dy − (e^{−x²/2}/√(2π)) ∫_0^∞ (y²/2) e^{−xy} dy

     = e^{−x²/2}/(x√(2π)) − (e^{−x²/2}/√(2π)) [ (1/x³) ∫_0^∞ (z²/2) e^{−z} dz ]

     = (1/x)(1 − 1/x²) e^{−x²/2}/√(2π).

Thus,

(1 − 1/x²) (1/(x√(2π))) e^{−x²/2} ≤ Q(x) ≤ (1/(x√(2π))) e^{−x²/2}    for x > 0.   (31)
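As a quick numerical sanity check of (31) (not part of the original solution; Python with scipy is my choice here), the following sketch compares Q(x) with the two bounds for a few values of x.

import numpy as np
from scipy.stats import norm

# Check (31): (1 - 1/x^2) * phi(x)/x <= Q(x) <= phi(x)/x for x > 0,
# where phi(x) = exp(-x^2/2)/sqrt(2*pi).
for x in [0.5, 1.0, 2.0, 4.0, 8.0]:
    phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    upper = phi / x
    lower = (1 - 1 / x**2) * phi / x
    q = norm.sf(x)                      # exact Q(x) = 1 - Phi(x)
    assert lower <= q <= upper
    print(f"x={x:4.1f}  lower={lower:10.3e}  Q(x)={q:10.3e}  upper={upper:10.3e}")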
Exercise 8.7: 
(a) We are to show that Q(γ + η) ≤ Q(γ) exp[−ηγ − η²/2] for γ ≥ 0 and η ≥ 0. Using the hint,

Q(γ + η) = (1/√(2π)) ∫_{γ+η}^∞ exp(−x²/2) dx = (1/√(2π)) ∫_γ^∞ exp(−(y + η)²/2) dy

         = (1/√(2π)) ∫_γ^∞ exp(−y²/2 − ηy − η²/2) dy

         ≤ (1/√(2π)) ∫_γ^∞ exp(−y²/2 − ηγ − η²/2) dy

         = exp[−ηγ − η²/2] Q(γ),

where, in the third step, we used the fact that y ≥ γ over the range of the integration.

(b) Setting γ = 0 and recognizing that Q(0) = 1/2,

Q(η) ≤ (1/2) exp[−η²/2].

This is tighter than the standard upper bound of (8.98) when 0 < η < √(2/π).

(c) Part (a) can be rewritten by adding and subtracting γ²/2 inside the exponent. Then

Q(γ + η) ≤ exp[ −(η + γ)²/2 + γ²/2 ] Q(γ).

Substituting w for γ + η yields the required result.
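The bound of part (a) can also be checked numerically in the same spirit (again my own sketch, not part of the text); the grid of γ and η values below is arbitrary.

import numpy as np
from scipy.stats import norm

# Check part (a): Q(gamma + eta) <= Q(gamma) * exp(-eta*gamma - eta^2/2)
for gamma in [0.0, 0.5, 1.0, 2.0]:
    for eta in [0.0, 0.5, 1.0, 3.0]:
        lhs = norm.sf(gamma + eta)
        rhs = norm.sf(gamma) * np.exp(-eta * gamma - eta**2 / 2)
        assert lhs <= rhs + 1e-15
        print(f"gamma={gamma:3.1f}  eta={eta:3.1f}  Q(gamma+eta)={lhs:9.3e}  bound={rhs:9.3e}")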
Exercise 8.8: 
(a) An M-dimensional orthogonal signal set has M signal points and can transmit log2M bits 
per M-dimensions. Hence, 
ρ = (2 log2 M) / M   bits per 2 dimensions.

The energy per symbol is E and there are log2 M bits per symbol. Hence, the energy per bit is

Eb = E / log2 M.
(b) The squared distance between two orthogonal signal points aj and ak is given by

‖aj − ak‖² = ⟨aj − ak, aj − ak⟩
           = ⟨aj, aj⟩ + ⟨ak, ak⟩ − 2⟨aj, ak⟩
           = 2E − 2E δjk
           = 2E if j ≠ k, and 0 otherwise.

Clearly, each point is equidistant from every other point and this distance is √(2E). Hence,
d²min(A) = 2E.
Also every signal point has M − 1 nearest neighbors. 
(c) The ML rule chooses the ai that minimizes ‖y − ai‖². Since every ‖ai‖² equals E, this is
equivalent to choosing the ai that maximizes ⟨y, ai⟩; indeed, ‖y − ai‖² ≥ ‖y − aj‖² ⇔
⟨y, ai⟩ ≤ ⟨y, aj⟩. The ML rule is to project y on each of the signals and choose the one with
the largest projection. In a coordinate system where each signal waveform is collinear with a
coordinate, this simply means choosing the hypothesis with the largest received coordinate.
Exercise 8.10: 
(a) Conditional on A = a0, the normalized outputs W0, . . . ,WM−1 are independent and, except 
for W0, are iid, N(0, 1). Thus, conditional on W0 = w0, W1, . . .WM−1 are still iid. 
(b) Either at least one of the Wm, 1 ≤ m ≤ M − 1, is greater than or equal to w0 (this is the
event whose probability is on the left of the first equation) or all are less than w0 (an event of
probability [1 − Q(w0)]^{M−1}). This verifies the first equation.

The second inequality is obvious for M = 2, so we now verify it for M ≥ 3. Let x be Q(w0),
0 ≤ x ≤ 1, and let M − 1 be n. We then want to show that

(1 − x)^n ≤ 1 − nx + n²x²/2

for 0 ≤ x ≤ 1 and n ≥ 2. This is clearly satisfied for x = 0, so it can be verified for 0 ≤ x ≤ 1
by showing that the same inequality is satisfied by the first derivative of each side, i.e., if
−n(1 − x)^{n−1} ≤ −n + n²x. This again is satisfied for x = 0, so it will be verified in general if its
derivative satisfies the inequality, i.e., if n(n − 1)(1 − x)^{n−2} ≤ n², which is clearly satisfied for
0 ≤ x ≤ 1.
(c) Let y(w0) = (M − 1)Q(w0). Then y(w0) is decreasing in w0 and reaches the value 1 when
w0 = γ1. Using part (b), we then have

Pr( ∪_{m=1}^{M−1} {Wm ≥ w0} | A = a0 ) ≥ y(w0) − y²(w0)/2 ≥ y(w0)/2 for w0 > γ1, and ≥ 1/2 for w0 = γ1.

The upper part of the second inequality is valid because, if w0 > γ1, then y(w0) is less than
1 and y² < y. The lower part of the second inequality follows because y(γ1) = 1. The lower
bound of 1/2 is also valid for all w0 < γ1 because the probability on the left of the equation is
increasing with decreasing w0.
Note that these lower bounds differ by a factor of 2 from the corresponding union bound (and the 
fact that probabilities are at most 1). This might seem very loose, but since y(w0) is exponential 
in w0 over the range of interest, a factor of 2 is not very significant. 
(d) Using part (c) in part (a),

Pr(e) = ∫_{−∞}^{∞} fW0|A(w0|a0) Pr( ∪_{m=1}^{M−1} {Wm ≥ w0} | A = a0 ) dw0

      ≥ ∫_{−∞}^{γ1} fW0|A(w0|a0)/2 dw0 + ∫_{γ1}^{∞} fW0|A(w0|a0) (M − 1)Q(w0)/2 dw0    (32)

      ≥ (1/2) Q(α − γ1),    (33)

where, in the last step, the first integral in (32) is evaluated using the fact that W0, conditional
on A = a0, is N(α, 1). The second integral in (32) has been lower bounded by 0.
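The lower bound (33) can be compared with a direct simulation of the normalized model of part (a): conditional on A = a0, W0 ~ N(α, 1) and W1, ..., W_{M−1} are iid N(0, 1), and an error occurs whenever some Wm, m ≥ 1, is at least W0. The sketch below is my own illustration (the values of M, α, and the number of trials are arbitrary); γ1 is computed from (M − 1)Q(γ1) = 1.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
M, alpha, trials = 64, 4.0, 50_000
gamma1 = norm.isf(1.0 / (M - 1))            # (M-1) * Q(gamma1) = 1

w0 = alpha + rng.standard_normal(trials)    # W0 given A = a0
others = rng.standard_normal((trials, M - 1))
p_sim = np.mean(others.max(axis=1) >= w0)   # simulated Pr(e)

lower = 0.5 * norm.sf(alpha - gamma1)       # the lower bound (33)
print(f"simulated Pr(e) = {p_sim:.4f},  lower bound (33) = {lower:.4f}")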
(e) From the upper and lower bounds to Q(x) in Exercise 8.6, we see that Q(x) ≈
(1/(√(2π) x)) exp(−x²/2) for large x. The coefficient 1/(√(2π) x) here is unimportant in the sense that

lim_{x→∞} ln[Q(x)] / (−x²/2) = 1.

This can be verified in greater detail by taking the log of each term in (31). Next, substituting
γ1 for x and noting that lim_{M→∞} γ1 = ∞, this becomes

lim_{M→∞} [ ln(M − 1) / (γ1²/2) ] = 1.

Note that the limit is not affected by replacing M − 1 by M. Taking the square root of the
resulting term in brackets, we get lim_{M→∞} γ1/γ = 1.

Associating γ with γ1, the upper bound to error probability in (8.57) of the text is substantially
the same as the lower bound in part (d) (again in the sense of ignoring the coefficient in the
bound to the Q function). The result that Pr(e) ≥ 1/4 for γ1 = α is immediate from (33), and
the result for γ1 > α follows from the monotonicity of Q.
(f) The problem statement was not very precise here. What was intended was to note that the 
lower bound here is close to the upper bound in (8.57), but not close to the upper bound in 
(8.59). The intent was to strengthen the lower bound in the case corresponding to (8.59), which 
is the low rate region where γ ≤ α/2. This can be done by using Exercise 8.6 to lower bound
Q(w0) in the second integral in (32). After completing the square in the exponent, this integral 
agrees, in the exponential terms, with (8.59) in the text. 
What has been accomplished in this exercise is to show that the upper bound to error probability 
in Section 8.5.3 is essentially the actual error probability for orthogonal signals. Frequently it 
is very messy to evaluate error probability for codes and for sequences of codes of increasing 
block length, and much more insight can be achieved by finding upper and lower bounds that 
are almost the same. 
Exercise 8.12: 
(a) By definition, a vector space is closed under scalar multiplication. In this case, i is a scalar 
in the complex space spanned by the vector a and thus ia is in this space. 
(b) We must show that the inner product (in R^{2n}) of D(a) and D(ia) is 0. Note that D(ia) =
(−Im(a1), ..., −Im(an), Re(a1), ..., Re(an)), which is a real 2n-tuple. Using the inner product of
R^{2n},

⟨D(a), D(ia)⟩ = Σ_j [ −Re(aj) Im(aj) + Im(aj) Re(aj) ] = 0.
(c) We first express ⟨v, a⟩ (the inner product in C^n) in terms of real and imaginary parts.

⟨v, a⟩ = [ ⟨Re v, Re a⟩ + ⟨Im v, Im a⟩ ] + i [ ⟨Im v, Re a⟩ − ⟨Re v, Im a⟩ ]
       = ⟨D(v), D(a)⟩ + i ⟨D(v), D(ia)⟩.    (34)

Using this,

v_{|a} = ⟨v, a⟩ a / ‖a‖² = [ ⟨D(v), D(a)⟩ a + ⟨D(v), D(ia)⟩ ia ] / ‖a‖².

Since the inner products above are real and since ‖a‖ = ‖D(a)‖, this can be converted to R^{2n} as

D(v_{|a}) = [ ⟨D(v), D(a)⟩ D(a) + ⟨D(v), D(ia)⟩ D(ia) ] / ‖D(a)‖².    (35)

This is the projection of D(v) onto the space spanned by D(a) and D(ia). It would be constructive
for the reader to trace through what this means in the special (but typical) case where
the components of a are all real.
(d) Note from (34) that Re⟨v, a⟩ = ⟨D(v), D(a)⟩. Thus

D( Re[⟨v, a⟩] a / ‖a‖² ) = ⟨D(v), D(a)⟩ D(a) / ‖D(a)‖².

From (35), this is the projection of D(v_{|a}) onto D(a).
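Relation (35) is easy to verify numerically for random vectors; the short sketch below is my own check (the mapping D is implemented exactly as defined in the exercise).

import numpy as np

rng = np.random.default_rng(1)
n = 5
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def D(x):
    # map a complex n-vector x to the real 2n-vector (Re x, Im x)
    return np.concatenate([x.real, x.imag])

# complex projection of v onto a: <v,a> a / ||a||^2, then mapped to R^{2n}
v_proj = (np.vdot(a, v) / np.vdot(a, a)) * a
lhs = D(v_proj)

# real projection of D(v) onto the plane spanned by D(a) and D(ia), as in (35)
Da, Dia = D(a), D(1j * a)
rhs = (np.dot(D(v), Da) * Da + np.dot(D(v), Dia) * Dia) / np.dot(Da, Da)

print(np.allclose(lhs, rhs))    # expected: True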
Exercise 8.15: Theorem 8.4.1, generalized to MAP detection is as follows: 
Let U(t) = Σ_{k=1}^n Uk p(t − kT) be a QAM (or PAM) baseband input to a WGN channel and
assume that {p(t − kT); 1 ≤ k ≤ n} is an orthonormal sequence. Assume that U = (U1, ..., Un)ᵀ
is an n-vector of iid random symbols, each with the pmf p(0), ..., p(M − 1). Then the M^n-ary
MAP decision on U = (U1, ..., Un)ᵀ is equivalent to making separate M-ary MAP decisions on
each Uk, 1 ≤ k ≤ n, where the decision on each Uk can be based either on the observation v(t)
or the observation of vk.

To see why this is true, view the detection as considering all binary MAP detections between
each pair u, u′ of M^n-ary possible inputs and choose the MAP decision over all. From (8.42)
and (8.43) of the text,

LLR_{u,u′}(v) = Σ_{k=1}^n [ −(vk − uk)² + (vk − u′k)² ] / N0.

The MAP test is to compare this with the MAP threshold,

ln[ PrU(u′) / PrU(u) ] = Σ_{k=1}^n ln[ p(u′k) / p(uk) ].

It can be seen that this comparison is equivalent to n single letter comparisons. The sequence
u that satisfies these tests for all u′ is the sequence for which each single letter test is satisfied.
Exercise 8.18: 
(a) The two codewords 00 and 01 are mapped into the signals (a, a) and (a, −a). In R², these
two points lie on the vertical line through a, symmetric about the horizontal axis (the original
figure shows the two signal points (a, a) and (a, −a)).
Thus the first bit contributes nothing to the distance between the two signals, but achieves 
orthogonality using only ±a as signal values. As mentioned in Section 8.6.1 of the text, this is 
a trivial example of binary orthogonal codewords (binary sequences that differ from each other 
in half their positions). 
(b) Any row u in the first half of H_{b+1} can be represented as (u1, u1) where u1 ∈ Hb is repeated
as the first and second 2^b entries of u. Similarly any other row u′ in the first half of H_{b+1} can
be represented as (u′1, u′1). The mod-2 sum of these two rows is thus

u ⊕ u′ = (u1, u1) ⊕ (u′1, u′1) = (u1 ⊕ u′1, u1 ⊕ u′1).

Since the mod-2 sum of any two rows of Hb is another row of Hb, u1 ⊕ u′1 = u′′1 is a row of Hb.
Thus u ⊕ u′ = (u′′1, u′′1) is a row in the first half of H_{b+1}.

Any row in the second half of H_{b+1} can be represented as (u1, u1 ⊕ 1), where 1 denotes the
vector of 2^b ones and u1 is a row of Hb. Letting u′ be another row in the second half of H_{b+1}
with the same form of representation,

u ⊕ u′ = (u1, u1 ⊕ 1) ⊕ (u′1, u′1 ⊕ 1) = (u1 ⊕ u′1, u1 ⊕ u′1),

where we have used the fact that 1 ⊕ 1 is the zero vector. Thus u ⊕ u′ is a row in the first
half of H_{b+1}.

Finally, if u = (u1, u1) and u′ = (u′1, u′1 ⊕ 1), then

u ⊕ u′ = (u1, u1) ⊕ (u′1, u′1 ⊕ 1) = (u1 ⊕ u′1, u1 ⊕ u′1 ⊕ 1),

so that u ⊕ u′ is a row in the second half of H_{b+1}.
Since H1 clearly has the property that each mod 2 sum of rows is another row, it follows by 
induction that Hb has this same property for all b ≥ 1. 
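The closure property can also be checked mechanically for small b; the sketch below (my own, using the (u, u) / (u, u ⊕ 1) construction described above with H1 = {00, 01}) verifies that the rows of Hb are closed under mod-2 addition.

import numpy as np

def build_H(b):
    # H_1 has rows 00 and 01; H_{b+1} stacks (u, u) over (u, u xor all-ones)
    H = np.array([[0, 0], [0, 1]], dtype=int)
    for _ in range(b - 1):
        H = np.vstack([np.hstack([H, H]), np.hstack([H, H ^ 1])])
    return H

for b in range(1, 5):
    H = build_H(b)
    rows = {tuple(r) for r in H}
    closed = all(tuple(H[i] ^ H[j]) in rows
                 for i in range(len(H)) for j in range(len(H)))
    print(f"b={b}: {len(H)} rows of length {2**b}, closed under mod-2 sums: {closed}")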
Exercise 8.19: As given in the hint, Σ_{j=0}^{r} C(m, j) is the number of binary m-tuples with at most
r ones. In this formula, as in most other formulas using factorials, 0! must be defined to be 1,
and thus C(m, 0) = 1. Each m-tuple with at most r ones that ends in a one has at most r − 1 ones
in the first m − 1 positions. There are Σ_{j=0}^{r−1} C(m−1, j) such (m−1)-tuples with at most r − 1 ones,
so this is the number of binary m-tuples ending in a one. The number ending in 0 is similarly the
number of binary (m−1)-tuples containing r or fewer ones. Thus

Σ_{j=0}^{r} C(m, j) = Σ_{j=0}^{r−1} C(m−1, j) + Σ_{j=0}^{r} C(m−1, j).    (36)
(b) Starting with m = 1, the code RM(0,1) consists of two codewords, 00 and 11, so k(0, 1)
(which is the base 2 log of the number of codewords) is 1. Similarly, RM(1,1) consists of four
codewords, so k(1, 1) = 2. Since C(1, 0) = 1 and C(1, 1) = 1, the formula

k(r, m) = Σ_{j=0}^{r} C(m, j)    (37)

is satisfied for m = 1, forming the basis for the induction. Next, for any m ≥ 2, assume that
(37) is satisfied for m′ = m − 1 and each r, 0 ≤ r ≤ m′. For 0 < r < m, each codeword x has
the form x = (u, u ⊕ v) where u ∈ RM(r, m′) and v ∈ RM(r − 1, m′). Since each choice of u
and v leads to a distinct codeword x, the number of codewords in RM(r, m) is the product of
the number in RM(r, m′) and that in RM(r − 1, m′). Taking logs to get the number of binary
information digits, k(r, m) = k(r, m − 1) + k(r − 1, m − 1). Using (36), k(r, m) satisfies (37).
Finally, k(0, m) = 1 and k(m, m) = 2^m, also satisfying (37).
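The induction can be spot-checked by computing k(r, m) from the recursion and comparing with the binomial sum (37); the few lines below are my own check (Python's math.comb supplies the binomial coefficients).

from math import comb
from functools import lru_cache

@lru_cache(None)
def k(r, m):
    # information bits of RM(r, m) via the (u, u+v) recursion with the stated boundaries
    if r == 0:
        return 1
    if r == m:
        return 2 ** m
    return k(r, m - 1) + k(r - 1, m - 1)

for m in range(1, 8):
    for r in range(m + 1):
        assert k(r, m) == sum(comb(m, j) for j in range(r + 1))
print("k(r, m) matches the binomial sum (37) for all 1 <= m <= 7")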
Chapter 9 
Exercise 9.1: 
(The original figure shows a fixed sending antenna, a receiving antenna moving with velocity v,
the line of sight of length d(0) at time 0, the distance r(t) at time t, and the angle φ between the
line of sight and the direction of motion.)
Let v be the velocity of the receiving antenna, r0 be the distance from the sending antenna at 
time 0, and r(t) be the distance at time t. Let φ be the angle between the line of sight and the 
direction of motion of the receiving antenna. Assume (contrary to the figure) that vt is very 
small compared with r0. By plane geometry, 
r²(t) = (r0 + vt cos φ)² + (vt sin φ)²
      = r0² ( 1 + 2vt cos φ / r0 + v²t²/r0² ).

Taking the square root of both sides, ignoring the term in t², and using the approximation
√(1 + x) ≈ 1 + x/2 for small x,

r(t) ≈ r0 + vt cos φ.
A more precise way of saying this is that the derivative of r(t) at t = 0 is v cos φ. Note that
this result is independent of the orientation of the antennas and of the two angles specifying the 
receiver location. 
The received waveform may therefore be approximated as

Er(f, t) ≈ Re[ α(θ0, ψ0, f) exp{ 2πi f ( t − (r0 + vt cos φ)/c ) } ] / ( r0 + vt cos φ ).
(b) Along with the approximations above for small t, there is also the assumption that the
combined antenna pattern α(θ0, ψ0, f) does not change appreciably with t. The angles θ0 and
ψ0 will change slowly with t, and the assumption here is that vt is small enough that α does not
change significantly.
Exercise 9.3: 
(The original figure shows the sending antenna at height hs and the receiving antenna at height
hr above a ground plane, separated by horizontal distance r, with the direct path of length r1,
the ground-reflected path of length r2, and reflection angles θ1 = θ2.)
(a) Let hs and hr be the heights of the sending and receiving antennas, respectively. By the 
reflection principle, θ1 = θ2. As shown in the figure, the length r2 of the reflected path is thus 
equal to the length of the path in that direction in the absence of a reflection. Thus 
r1 = √( (hs − hr)² + r² ),    r2 = √( (hs + hr)² + r² ).

Then,

r2 − r1 = r √( (hs + hr)²/r² + 1 ) − r √( (hs − hr)²/r² + 1 )

        ≈ r ( 1 + (hs + hr)²/(2r²) ) − r ( 1 + (hs − hr)²/(2r²) )

        = 2 hs hr / r,

where the approximation is good for (hs + hr)²/r² ≪ 1, i.e., when r is large relative to the heights
of the antennas. Thus, r2 − r1 is asymptotically equal to b/r where b = 2 hs hr. This is quite
surprising - the difference between the two path lengths is going to zero as 1/r.
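A short numerical check (mine; the heights are the example values used in part (b)) shows how quickly r2 − r1 approaches 2 hs hr / r as r grows.

import numpy as np

hs, hr = 10.0, 1.0                     # example antenna heights in meters
for r in [10.0, 100.0, 1000.0, 10000.0]:
    r1 = np.hypot(hs - hr, r)
    r2 = np.hypot(hs + hr, r)
    print(f"r={r:8.0f} m   r2-r1 = {r2 - r1:.6e} m   2*hs*hr/r = {2*hs*hr/r:.6e} m")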
(b) and (c) Using the approximation r2 ≈ r1 + b/r from part (a), we write

Er(f, t) = Re[ α e^{2πi[ft − f r1/c]} ] / r1 − Re[ α e^{2πi[ft − f r2/c]} ] / r2

         ≈ Re[ α e^{2πi[ft − f r1/c]} ( 1/r1 − (1/r2) e^{−2πi f b/(rc)} ) ].

For sufficiently large r, b/r ≪ c/f. For example, if hs = 10 m, hr = 1 m, and r = 1 km, then
b/r = 0.02 m, which is much smaller than the wavelength c/f at f = 1 GHz. With this assumption,
the second exponent above is close to zero and can be approximated by its first order expansion,

Er(f, t) ≈ Re[ α e^{2πi[ft − f r1/c]} ( 1/r1 − (1/r2)(1 − 2πi f b/(rc)) ) ].

Note that 1/r1 − 1/r2 is approximately b/r³, so it can be approximated by zero. Thus

Er(f, t) ≈ Re[ α e^{2πi[ft − f r1/c]} · 2πi f b / (r2 r c) ] ≈ −(2π α f b / (c r²)) sin[2π(ft − f r1/c)],    (38)

since r2 ≈ r for r ≫ 1. Thus, Er ≈ β/r² where β = −(2π α f b/c) sin[2π(ft − f r1/c)].
The main point here is that the difference between r1 and r2 gets so small with increasing r that 
there is no Doppler spread, but simply a cancellation of path responses which causes the 1/r2 
decay with r. Viewed another way, the ground plane is gradually absorbing the radiated power. 
The student should now go back and look at the various approximations, observing that terms 
in 1/r3 were set to zero, and terms in 1/r2 were kept. 
Exercise 9.5: 
(a) Since there is only one path and it has Doppler shift D1, the Doppler spread D is 0 and
Δ = D1.

ĥ(f, t) = e^{2πi D1 t}.

ψ(f, t) = e^{−2πi t Δ} ĥ(f, t) = 1.

The envelope of the output when the input is cos(2πft) is

|ĥ(f, t)| = |ψ(f, t)| = 1.

Note that in this case, there is Doppler shift but no fading, and the envelope, which is one,
captures this lack of fading.
(b) Here

ĥ(f, t) = e^{2πi D1 t} + 1

and there are two paths: one with Doppler shift D1 and the other with zero Doppler shift. Thus
D = D1 and Δ = D1/2.

ψ(f, t) = e^{−2πi t Δ} ĥ(f, t)
        = e^{−πi D1 t} ( e^{2πi D1 t} + 1 )
        = e^{πi D1 t} + e^{−πi D1 t}
        = 2 cos(π D1 t).

The envelope of the output when the input is cos(2πft) is

|ĥ(f, t)| = |ψ(f, t)| = 2 |cos(π D1 t)|.

Note that the fading here occurs at frequency D1/2, the same as if there were two paths with
Doppler shifts D1/2 and −D1/2. In other words, the fading frequency depends on the Doppler
spread rather than the individual shifts.
Exercise 9.6: 
(a) The envelope of Re[yf(t)] is defined to be |ĥ(f, t)|. Then

|yf(t)| = |e^{2πift} ĥ(f, t)| = |e^{2πift}| · |ĥ(f, t)| = |ĥ(f, t)|.

This is true no matter what f is, but the definition of |ĥ(f, t)| as an envelope only corresponds
to our intuitive idea of envelope when f is large.
(b)

(Re[yf(t)])² = |ĥ(f, t)|² cos²(2πft + ∠ĥ(f, t))
             = (1/2) |ĥ(f, t)|² ( 1 + cos(4πft + 2∠ĥ(f, t)) ).

The result of lowpass filtering this power waveform is (1/2)|ĥ(f, t)|².

With the assumption of large f, the angle of ĥ(f, t) is approximately constant over a cycle, so
the short-time time-average of the power is (1/2)|ĥ(f, t)|². Thus the output of lowpass filtering the
power waveform can be interpreted as the short-time time-average of the power.

The square root of this time-average is |ĥ(f, t)|/√2, which is just a scaled version of the envelope
of Re[yf(t)].

Thus, over short time scales, we can find the envelope of Re[yf(t)] by squaring Re[yf(t)], lowpass
filtering, taking the square root, and multiplying by √2.
Exercise 9.9: 
(a) Since θn and φn are independent, the mean of G0,0 is given by 
E[G0,0] = Σ_{n=1}^{N} E[θn] E[φn] = Σ_{n=1}^{N} (2/N)(0) = 0.

The variance of G0,0 is given by

var(G0,0) = E[G0,0²] = E[ ( Σ_{n=1}^{N} θn φn )² ] = Σ_{n=1}^{N} E[θn²] E[φn²] = Σ_{n=1}^{N} 2/N = 2,

where we used the fact that each θn φn is zero mean and independent of θi φi for all i ≠ n, and
then the fact that each θn is independent of φn.

Since G0,0 has mean 0 and variance 2 for all N, this is also the mean and variance in the limit
N → ∞.
(b) This is a non-negative, integer-valued rv and thus is obviously non-Gaussian. 
In the Central Limit Theorem (CLT) one adds N iid rvs Y1, ..., YN of given mean and variance,
divides by √N to normalize, and then goes to the limit N → ∞. Here we are adding N iid
rvs (Xn = θn φn) but normalizing by changing the distribution of each Xn as N changes. This
might seem like a small difference, but in the CLT case, a typical normalized sample sequence
y1/√N, ..., yN/√N is a sequence of many small numbers; here a sample sequence x1, ..., xN
is a sequence of numbers, almost all of which are 0, with a small subset, not growing with N, of
±1's.
(c) As mentioned above, G0,0 is the sum of a small number of ±1's, so it will have an integer
distribution where the integer will be small with high probability. To work this out analytically,
let G0,0 = V_N^(1) − V_N^(−1), where V_N^(1) is the number of values of n, 1 ≤ n ≤ N, for which
θn φn = 1. Similarly, V_N^(−1) is the number of values of n for which θn φn = −1. Since θn φn is 1
with probability 1/N, V_N^(1) has the binomial pmf,

Pr(V_N^(1) = k) = [ N! / (k!(N − k)!) ] (1/N)^k (1 − 1/N)^{N−k}.

Note that N!/(N − k)! = N(N − 1) ··· (N − k + 1). Thus, as N → ∞ for any fixed k,

lim_{N→∞} N! / ((N − k)! N^k) = 1.

Also (1 − 1/N)^{N−k} = exp[(N − k) ln(1 − 1/N)]. Thus for any fixed k,

lim_{N→∞} (1 − 1/N)^{N−k} = e^{−1}.

Putting these relations together,

lim_{N→∞} Pr(V_N^(1) = k) = Pr(V^(1) = k) = e^{−1} / k!.

In other words, the limiting rv, V^(1), is a Poisson rv with mean 1. By the same argument,
V^(−1), the limiting number of occurrences of θn φn = −1, is a Poisson rv of mean 1. The limiting
rvs V^(1) and V^(−1) are independent.³ Finally, the limiting rv G0,0 is equal to V^(1) − V^(−1).
Convolving the pmf's,

Pr(G0,0 = ℓ) = Σ_{k=0}^{∞} [ e^{−1} / (k + ℓ)! ] · [ e^{−1} / k! ]    for ℓ ≥ 0.

This goes to 0 rapidly with increasing ℓ. The pmf is symmetric around 0, so the same formula
applies for G0,0 = −ℓ.
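The limiting pmf (a difference of two independent Poisson(1) rvs) can be compared with a simulation of the finite-N model. The sketch below is my own check and assumes, as the moment calculations above suggest, that θn ∈ {0, 1} with Pr(θn = 1) = 2/N and φn = ±1 equiprobable; only the number K of indices with θn = 1 needs to be simulated.

import numpy as np
from math import factorial, exp

rng = np.random.default_rng(2)
N, trials = 10_000, 500_000

# K = number of n with theta_n = 1 (Binomial(N, 2/N)); each such n contributes +-1
K = rng.binomial(N, 2.0 / N, size=trials)
G = 2 * rng.binomial(K, 0.5) - K

def p_limit(l):
    # Pr(G = l) in the limit: sum_k e^{-2} / ((k+|l|)! k!)
    l = abs(l)
    return sum(exp(-2.0) / (factorial(k + l) * factorial(k)) for k in range(40))

for l in range(-3, 4):
    print(f"l={l:+d}   simulated={np.mean(G == l):.4f}   limiting pmf={p_limit(l):.4f}")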
Exercise 9.11: 
(a) From equation (9.59) of the text,

Pr[e | |G| = g̃] = (1/2) exp( −a² g̃² / (2W N0) ).

Since |G| has the Rayleigh density

f_{|G|}(g̃) = 2 g̃ e^{−g̃²},    (39)

the probability of error is given by

Pr(e) = ∫_0^∞ Pr[e | |G| = g̃] f_{|G|}(g̃) dg̃

      = ∫_0^∞ g̃ exp[ −g̃² ( 1 + a²/(2W N0) ) ] dg̃

      = [ 2 + a²/(W N0) ]^{−1} = 1 / (2 + Eb/N0),

which is the same result as that derived in equation (9.56).
³ (Footnote to Exercise 9.9(c).) This is mathematically subtle since V_N^(1) and V_N^(−1) are dependent. A
cleaner mathematical approach would be to find the pmf of the number of values of n that are either ±1 and then
to find the conditional number that are +1 and −1. The answer is the same.
(b) We want to find E[(a/|G|)²]. Since (a/|G|)² goes to infinity quadratically as |G| → 0 and
the probability density of |G| goes to 0 linearly as |G| → 0, we expect that this expected energy
might be infinite. To show this,

E[(a/|G|)²] = ∫_0^∞ (a²/g̃²) f_{|G|}(g̃) dg̃ = ∫_0^∞ (a²/g̃²) 2 g̃ e^{−g̃²} dg̃

            ≥ ∫_0^1 (2a²/g̃) e^{−1} dg̃ = ∞.
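Both conclusions are easy to confirm numerically; the sketch below is my own check, taking a²/(W N0) = Eb/N0 as in the text and an arbitrary value Eb/N0 = 10. The second loop illustrates part (b): the sample mean of 1/|G|² does not settle down as the number of samples grows.

import numpy as np
from scipy.integrate import quad

eb_n0 = 10.0   # example value of Eb/N0 = a^2/(W N0)

# part (a): numerically integrate Pr[e | |G| = g] * f_|G|(g) over g >= 0
integrand = lambda g: 0.5 * np.exp(-eb_n0 * g**2 / 2) * 2 * g * np.exp(-g**2)
val, _ = quad(integrand, 0, np.inf)
print(f"integral = {val:.6f},  1/(2 + Eb/N0) = {1 / (2 + eb_n0):.6f}")

# part (b): E[1/|G|^2] is infinite; the sample mean does not converge as n grows
rng = np.random.default_rng(3)
for n in [10**3, 10**5, 10**7]:
    g = rng.rayleigh(scale=np.sqrt(0.5), size=n)   # density 2 g exp(-g^2)
    print(f"n={n:>9}   sample mean of 1/|G|^2 = {np.mean(1.0 / g**2):8.1f}")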
Exercise 9.13: 
(a) Since X0,X1,X2,X3 are independent and the first two have densities αe−αx and the second 
two have densities βe−βx under u0, 
fX|U(x | u0) = α²β² exp[ −α(x0 + x1) − β(x2 + x3) ]

fX|U(x | u1) = α²β² exp[ −β(x0 + x1) − α(x2 + x3) ].

(b) Taking the log of the ratio,

LLR(x) = (β − α)(x0 + x1) − (β − α)(x2 + x3) = (β − α)(x0 + x1 − x2 − x3).

(c) Convolving the density for X0 (conditional on u0) with the conditional density for X1,

fY0|U(y0 | u0) = α² y0 e^{−α y0}.

Similarly,

fY1|U(y1 | u0) = β² y1 e^{−β y1}.
(d) Given U = u0, an error occurs if Y1 ≥ Y0. This is somewhat tedious, but probably the 
simplest approach is to first find Pr(e) conditional on u0 and Y0 = y0 and then multiply by the 
conditional density of y0 and integrate the answer over y0. 
Pr(e | U = u0, Y0 = y0) = ∫_{y0}^{∞} β² y1 e^{−β y1} dy1 = (1 + β y0) e^{−β y0}.

Performing the final tedious integration,

Pr(e) = Pr(e | U = u0) = ( α³ + 3α²β ) / (α + β)³ = ( 1 + 3β/α ) / ( 1 + β/α )³.

Since β/α = 1 + Eb/2N0, this yields the final expression for Pr(e). The next exercise generalizes
this and derives the result in a more insightful way.
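The closed form can be confirmed by simulating the two energies directly; this is my own sketch, with an arbitrary ratio β/α = 3 (only the ratio matters).

import numpy as np

rng = np.random.default_rng(4)
alpha, beta, trials = 1.0, 3.0, 1_000_000

# given U = u0: Y0 = X0 + X1 with Xi ~ Exp(alpha), Y1 = X2 + X3 with Xi ~ Exp(beta)
Y0 = rng.exponential(1 / alpha, (trials, 2)).sum(axis=1)
Y1 = rng.exponential(1 / beta, (trials, 2)).sum(axis=1)
p_sim = np.mean(Y1 >= Y0)

r = beta / alpha
p_formula = (1 + 3 * r) / (1 + r) ** 3
print(f"simulated Pr(e) = {p_sim:.4f},  (1 + 3*beta/alpha)/(1 + beta/alpha)^3 = {p_formula:.4f}")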
(e) Technically, we never used any assumption about this correlation. The reason is the same 
as with the flat fading case. G0,0 and G1,0 affect the result under one hypothesis and G0,2 and 
G1,2 under the other, but they never enter anything together. 
Exercise 9.14: 
(a) Under hypothesis H = 1, we see that Vm = Zm for 0 ≤ m < L and Vm = √Eb G_{m−L,m} + Zm
for L ≤ m < 2L. Thus (under H = 1), for 0 ≤ m < L, Vm is circularly symmetric complex Gaussian
with variance N0/2 per real and imaginary part, and for L ≤ m < 2L, Vm is similarly circularly
symmetric complex Gaussian with variance Eb/(2L) + N0/2 per real and imaginary part. That is,
given H = 1, Vm ~ CN(0, N0) for 0 ≤ m < L and Vm ~ CN(0, Eb/L + N0) for L ≤ m < 2L. In the
same way, conditional on H = 0, Vm ~ CN(0, Eb/L + N0) for 0 ≤ m < L and Vm ~ CN(0, N0) for
L ≤ m < 2L. Conditional on either hypothesis, the random variables V0, ..., V_{2L−1} are
statistically independent.

Thus, the log likelihood ratio is

LLR(v0, ..., v_{2L−1}) = ln[ f(v0, ..., v_{2L−1} | H = 0) / f(v0, ..., v_{2L−1} | H = 1) ]

  = ln[ A exp( −Σ_{m=0}^{L−1} |vm|²/(Eb/L + N0) − Σ_{m=L}^{2L−1} |vm|²/N0 )
      / ( A exp( −Σ_{m=0}^{L−1} |vm|²/N0 − Σ_{m=L}^{2L−1} |vm|²/(Eb/L + N0) ) ) ]

  = ( Σ_{m=0}^{L−1} |vm|² − Σ_{m=L}^{2L−1} |vm|² ) · (Eb/L) / ( (Eb/L + N0) N0 ),

where A denotes the coefficient of the exponential in each of the Gaussian densities above; these
terms cancel out of the LLR. Note that |vm|² is the sample value of the energy Xm in the mth
received symbol. The ML rule is then to select Ĥ = 0 if Σ_{m=0}^{L−1} Xm ≥ Σ_{m=L}^{2L−1} Xm and
Ĥ = 1 otherwise.
Conditional on H = 0, we know that Xm = |Vm|² for 0 ≤ m < L is exponential with density
α exp(−α x) for x ≥ 0, where α = 1/(Eb/L + N0). Also, Xm for L ≤ m < 2L is exponential with
density β exp(−β x) for x ≥ 0, where β = 1/N0. Thus, we can view Σ_{m=0}^{L−1} Xm as the time of
the Lth arrival in a Poisson process of rate α. Similarly we can view Σ_{m=L}^{2L−1} Xm as the time of
the Lth arrival in an independent Poisson process of rate β. Given H = 0, then, the probability of
error is the probability that the Lth arrival of the first process (that of rate α) occurs before the
Lth arrival of the second process (that of rate β). This is the probability that at least L arrivals
from the first process and L − 1 arrivals from the second process precede the Lth arrival from
the second process, i.e., that the first 2L − 1 arrivals from the two processes together contain at
least L arrivals from process 1. By symmetry, the same result applies conditional on H = 1.
(b) A basic fact about Poisson processes is that the sum of two independent Poisson processes,
one of rate α and the other of rate β, can be viewed as a single process of rate α + β which has
two types of arrivals. Each arrival of the combined process is a type 1 arrival with probability
p = α/(α + β) and is otherwise a type 2 arrival. The types are independent between combined
arrivals. Thus if we look at the first 2L − 1 arrivals from the combined process, Pr(e|H = 0) =
Pr(e|H = 1) is the probability that L or more of these independent events are type 1 events.
From the binomial expansion,

Pr(e) = Σ_{ℓ=L}^{2L−1} C(2L−1, ℓ) p^ℓ (1 − p)^{2L−1−ℓ}.    (40)
(c) Expressing p as (α/β)/(1 + α/β) and 1 − p as 1/(1 + α/β), (40) becomes

Pr(e) = Σ_{ℓ=L}^{2L−1} C(2L−1, ℓ) ( 1/(1 + α/β) )^{2L−1} (α/β)^ℓ.    (41)

Since β/α = 1 + Eb/(L N0), this becomes

Pr(e) = Σ_{ℓ=L}^{2L−1} C(2L−1, ℓ) ( (1 + Eb/(L N0)) / (2 + Eb/(L N0)) )^{2L−1} ( 1 + Eb/(L N0) )^{−ℓ}.    (42)
(d) For L = 1, note that 2L − 1 = 1 and the above sum has only one term, and that term has
ℓ = 1. The combinatorial coefficient is then 1 and the result is Pr(e) = 1/(2 + Eb/N0) as expected.
For L = 2, ℓ goes from 2 to 3 with combinatorial coefficients 3 and 1 respectively. Thus for L = 2,

Pr(e) = ( (1 + Eb/(2N0)) / (2 + Eb/(2N0)) )³ [ 3/(1 + Eb/(2N0))² + 1/(1 + Eb/(2N0))³ ]

      = ( 4 + 3Eb/(2N0) ) / ( 2 + Eb/(2N0) )³,

which agrees with the result in Exercise 9.13.
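Equation (42) is easy to evaluate directly. The sketch below (my own) checks the L = 1 and L = 2 closed forms and prints Pr(e) for a few diversity orders at a fixed Eb/N0; p = α/(α + β) = 1/(2 + Eb/(L N0)) as in part (b).

from math import comb

def pr_e(L, eb_n0):
    # equation (40)/(42): probability that L or more of the 2L-1 arrivals are type 1
    p = 1.0 / (2.0 + eb_n0 / L)      # alpha/(alpha+beta), since beta/alpha = 1 + Eb/(L N0)
    return sum(comb(2 * L - 1, l) * p**l * (1 - p) ** (2 * L - 1 - l)
               for l in range(L, 2 * L))

eb_n0 = 100.0                         # Eb/N0 = 20 dB
x = eb_n0 / 2
print(pr_e(1, eb_n0), 1 / (2 + eb_n0))                 # L = 1 closed form
print(pr_e(2, eb_n0), (4 + 3 * x) / (2 + x) ** 3)      # L = 2 closed form
for L in (1, 2, 4, 8):
    print(f"L={L}:  Pr(e) = {pr_e(L, eb_n0):.3e}")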
(e) The final term in (42) is geometrically decreasing with ℓ and the binomial coefficient is also
decreasing. When Eb/(L N0) is large, the decrease of the final term is so fast that all but the
first term, with ℓ = L, can be ignored. For large L, we can use Stirling's approximation for each
of the factorials in the binomial term,

C(2L−1, L) ≈ 2^{2L−1} / √(πL).

Thus, for L and Eb/(L N0) large relative to 1, the middle factor in (42) is approximately 1 and we
have

Pr(e) ≈ (1/√(4πL)) ( Eb/(4L N0) )^{−L}.
The conventional wisdom is that Pr(e) decreases as (Eb/4N0)^{−L} with increasing L, and this is
true if Eb is the energy per bit for each degree of diversity (i.e., for each channel tap). Here we 
have assumed that Eb is shared between the different degrees of diversity, which explains the 
additional factor of L. The viewpoint here is appropriate if one achieves diversity by spreading 
the available power over L frequency bands (modeled here by L channel taps), thus achieving 
frequency diversity at the expense of less power per degree of freedom. If one achieves diversity by 
additional antennas at the receiver, for example, it is more appropriate to take the conventional 
view. 
The analysis above covers a very special case of diversity in which each diversity path has the 
same expected strength, each suffers flat Rayleigh fading, and detection is performed without 
measurement of the dynamically varying channel strengths. One gets very different answers 
when the detection makes use of such measurements and also when the transmitter is able to 
adjust the energy in the different paths. The point of the exercise, however, is to show that 
diversity can be advantageous even in the case here. 
(f) For the system analyzed above, with Lth order diversity, consider the result of a receiver that
first makes hard decisions on each diversity path. That is, for each k, 0 ≤ k ≤ L−1, the receiver 
looks at outputs k and L + k and makes a decision independent of the other received symbols. 
The probability of error for a single value of k is simply the flat Rayleigh fading error probability
with no diversity (and energy Eb/L per branch), i.e.,

Pr(e, branch) = 1 / ( 2 + Eb/(L N0) ).
Note that this is equal to p = 1/(1 + β/α) as defined in part (b). Now suppose the diversity 
order is changed from L to 2L − 1, i.e., the discrete time model has 2L − 1 taps instead of 
L. We also assume that each mean square tap value remains at 1/L (this was not made clear 
in the problem statement). Thus the receiver now makes 2L − 1 hard decisions, each correct 
with probability p and, given the hypothesis, each independent of the others. Thus, using these 
2L − 1 hard decisions to make a final decision, the ML rule for the final decision is majority 
rule on the 2L − 1 local hard decisions. This means that the final decision is in error if L or 
more of the local decisions are in error. Since p is the probability of a local decision error, (40) 
gives the probability of a final decision error. This is the same as the error probability with Lth 
order diversity making an optimal decision on the raw received sequence. We note that in the 
situations such as that assumed here, where diversity is gained by dissipating available power, 
there is an additional 3dB power advantage in using ‘soft decisions’ beyond that exhibited here. 
Exercise 9.16: See the solution to Exercise 8.10, which is virtually identical. 
Exercise 9.17: (a) Note that the kth term of u ∗ u† is

(u ∗ u†)_k = Σ_ℓ u_ℓ u*_{ℓ+k} = 2a²n δ_k.

We assume from the context that n is the length of the sequence and assume from the factor
of 2 that this is an ideal 4-QAM PN sequence. Since ‖u‖² is the center term of u ∗ u†, i.e.,
(u ∗ u†)_0, it follows that ‖u‖² = 2a²n. Similarly, ‖b‖² is the center term of b ∗ b†, i.e.,
(b ∗ b†)_0. Using the commutativity and associativity of convolution,

b ∗ b† = u ∗ g ∗ u† ∗ g†
       = g ∗ u ∗ u† ∗ g†
       = 2a²n g ∗ g†.

Finally, since ‖g‖² is the center term of g ∗ g†, i.e., (g ∗ g†)_0,

‖b‖² = 2a²n ‖g‖² = ‖u‖² ‖g‖².
(b) If u0 and u1 are ideal PN sequences as given in part (a), then ‖u0‖² = ‖u1‖² = 2a²n. Using
part (a) then,

‖b0‖² = ‖u0‖² ‖g‖² = ‖u1‖² ‖g‖² = ‖b1‖².
73

More Related Content

PPTX
Calculus Homework Help
PPTX
Calculus Homework Help
PPTX
Calculus Homework Help
PDF
Probability and statistics assignment help
PDF
Calculus 08 techniques_of_integration
PDF
Mathematics assignment sample from assignmentsupport.com essay writing services
Calculus Homework Help
Calculus Homework Help
Calculus Homework Help
Probability and statistics assignment help
Calculus 08 techniques_of_integration
Mathematics assignment sample from assignmentsupport.com essay writing services

What's hot (19)

PPTX
Longest Common Subsequence
PPTX
Calculus Assignment Help
PDF
Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...
PDF
Continuity and Uniform Continuity
PDF
Higher order derivatives for N -body simulations
DOC
Chapter 1 (maths 3)
PDF
Mcq differential and ordinary differential equation
PPT
maths
PDF
The 2 Goldbach's Conjectures with Proof
PPTX
Longest Common Subsequence
PPTX
Differential Equations Homework Help
PDF
Hypothesis of Riemann's (Comprehensive Analysis)
PPT
Unit i ppt (1)
PDF
Imc2016 day2-solutions
PDF
Permutations and Combinations IIT JEE+Olympiad Lecture 4
PDF
Calculus AB - Slope of secant and tangent lines
PDF
Dr. majeed &humam paper
PDF
Chapter-4: More on Direct Proof and Proof by Contrapositive
Longest Common Subsequence
Calculus Assignment Help
Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...
Continuity and Uniform Continuity
Higher order derivatives for N -body simulations
Chapter 1 (maths 3)
Mcq differential and ordinary differential equation
maths
The 2 Goldbach's Conjectures with Proof
Longest Common Subsequence
Differential Equations Homework Help
Hypothesis of Riemann's (Comprehensive Analysis)
Unit i ppt (1)
Imc2016 day2-solutions
Permutations and Combinations IIT JEE+Olympiad Lecture 4
Calculus AB - Slope of secant and tangent lines
Dr. majeed &humam paper
Chapter-4: More on Direct Proof and Proof by Contrapositive
Ad

Viewers also liked (10)

PDF
Buy 1 bhk and 2 bhk flats in Moshi
PDF
Nerworking es xi
PPTX
Spare parts control for maintenance purposes
PDF
Time-Variant Distortions in OFDM
PDF
digital signal-processing-lab-manual
PPTX
My photography project
PPTX
On the Home Front Powerpoint
PDF
ภูมิปัญญาไทย
PDF
Usmstan2011
PPT
Konsep anatomi
Buy 1 bhk and 2 bhk flats in Moshi
Nerworking es xi
Spare parts control for maintenance purposes
Time-Variant Distortions in OFDM
digital signal-processing-lab-manual
My photography project
On the Home Front Powerpoint
ภูมิปัญญาไทย
Usmstan2011
Konsep anatomi
Ad

Similar to Prin digcommselectedsoln (20)

PDF
Proof of Kraft Mc-Millan theorem - nguyen vu hung
PDF
Basics of coding theory
PPTX
basicsofcodingtheory-160202182933-converted.pptx
PDF
Proof of Kraft-McMillan theorem
PDF
Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
PDF
PDF
06 Arithmetic 1
PDF
Lecture01-Modeling and Coding-P2.pdf
PDF
Probability and stochastic processes 3rd edition Quiz Solutions
PPTX
Programming Exam Help
PPTX
Fundamental Limits on Performance in InformationTheory.pptx
PDF
Unit 3 Arithmetic Coding
PDF
sadsad asdasd dasdsa dasda sadCHAPTER 13.pdf
PDF
Tail
DOCX
Arithmetic coding
PDF
Unequal-Cost Prefix-Free Codes
PPT
Datacompression1
PDF
Shan.pdfFully Homomorphic Encryption (FHE)
PDF
Ch 04 Arithmetic Coding (Ppt)
PDF
Ch 04 Arithmetic Coding ( P P T)
Proof of Kraft Mc-Millan theorem - nguyen vu hung
Basics of coding theory
basicsofcodingtheory-160202182933-converted.pptx
Proof of Kraft-McMillan theorem
Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
06 Arithmetic 1
Lecture01-Modeling and Coding-P2.pdf
Probability and stochastic processes 3rd edition Quiz Solutions
Programming Exam Help
Fundamental Limits on Performance in InformationTheory.pptx
Unit 3 Arithmetic Coding
sadsad asdasd dasdsa dasda sadCHAPTER 13.pdf
Tail
Arithmetic coding
Unequal-Cost Prefix-Free Codes
Datacompression1
Shan.pdfFully Homomorphic Encryption (FHE)
Ch 04 Arithmetic Coding (Ppt)
Ch 04 Arithmetic Coding ( P P T)

Prin digcommselectedsoln

  • 1. SELECTED SOLUTIONS TO PRINCIPLES OF DIGITAL COMMUNICATION Cambridge Press 2008 by ROBERT G. GALLAGER A complete set of solutions is available from Cambridge Press for instructors teaching a class using this text. This is a subset of solutions that I feel would be valuable for those studying the subject on their own. Chapter 2 Exercise 2.2: (a) V +W is a random variable, so its expectation, by definition, is E[V +W] = X v∈V X w∈W (v + w)pVW(v,w) = X v∈V X w∈W v pVW(v,w) + X v∈V X w∈W w pVW(v,w) = X v∈V v X w∈W pVW(v,w) + X w∈W w X v∈V pVW(v,w) = X v∈V v pV (v) + X w∈W w pW(w) = E[V ] + E[W]. (b) Once again, working from first principles, E[V ·W] = X v∈V X w∈W (v · w)pVW(v,w) = X v∈V X w∈W (v · w)pV (v)pW(w) (Using independence) = X v∈V v pV (v) X w∈W w pW(w) = E[V ] · E[W]. (c) To discover a case where E[V ·W]6= E[V ] ·E[W], first try the simplest kind of example where V and W are binary with the joint pmf pVW(0, 1) = pVW(1, 0) = 1/2; pVW(0, 0) = pVW(1, 1) = 0. 1
  • 2. Clearly, V and W are not independent. Also, E[V · W] = 0 whereas E[V ] = E[W] = 1/2 and hence E[V ] · E[W] = 1/4. The second case requires some experimentation. One approach is to choose a joint distribution such that E[V ·W] = 0 and E[V ] = 0. A simple solution is then given by the pmf, pVW(−1, 0) = pVW(0, 1) = pVW(1, 0) = 1/3. Again, V and W are not independent. Clearly, E[V ·W] = 0. Also, E[V ] = 0 (what is E[W]?). Hence, E[V ·W] = E[V ] · E[W]. (d) σ2V +W = E[(V +W)2] − (E[V +W])2 2W 2V = E[V 2] + E[W2] + E[2V ·W] − (E[V ] + E[W])2 = E[V 2] + E[W2] + 2E[V ] · E[W] − E[V ]2 − E[W]2 − 2E[V ] · E[W] = E[V 2] − E[V ]2 + E[W2] − E[W]2 = σ+ σ. Exercise 2.4: (a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X2) = Pr(X2 > X1). These two events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X2) and Pr(X1 < X2) sum to 1, so must each be 1/2. Thus Pr(X1 ≥ X2) = Pr(X2 ≥ X1) = 1/2. (b) Invoking the symmetry among X1, X2 and X3, we see that each has the same probability of being the smallest of the three (the probability of a tie is 0). These three events are mutually exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 1/3. (c) The event {N > n} is the same as the event {X1 is the minimum among the n iid random variables X1, X2, · · · , Xn}. By extending the argument in part (b), we see that Pr(X1 is the smallest of X1, . . . ,Xn) = 1/n. Finally, Pr {N ≥ n} = Pr {N > n − 1}= 1 n−1 for n ≥ 2. (d) Since N is a non-negative integer random variable (taking on values from 2 to 1), we can use Exercise 2.3(a) as follows: E[N] = 1X n=1 Pr {N ≥ n} = Pr {N ≥ 1} + 1X n=2 Pr {N ≥ n} = 1 + 1X n=2 1 n − 1 = 1 + 1X n=1 1 n . 2
  • 3. Since the series P 1n=1 1n diverges, we conclude that E[N] = 1. (e) Since the alphabet has a finite number of letters,1 Pr(X1 = X2) is no longer 0 and depends on the particular probability distribution. Thus, although, Pr(X1 ≥ X2) = Pr(X2 ≥ X1) by symmetry, neither can be found without knowing the distribution. Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric value. If X1 = amin, then no subsequent rv X2,X3, . . . can have a smaller value, so N = 1 in this case. Since the event X1 = amin occurs with positive probability, E[N] = 1. Exercise 2.6: (a) Assume the contrary; i.e., there is a suffix-free code that is not uniquely decodable. Then that code must contain two distinct sequences of source letters, say, x1, x2, . . . , xn and x01, x02, . . . , x0m such that, C(x1)C(x2) . . . C(xn) = C(x01)C(x02) . . . C(x0m). Then one of the following must hold: • C(xn) = C(x0m) • C(xn) is a suffix of C(x0m) • C(x0m) is a suffix of C(xn). In the last two cases we arrive at a contradiction since the code is hypothesized to be suffix-free. In the first case, xn must equal x0m because of the suffix freedom. Simply delete that final letter from each sequence and repeat the argument. Since the sequences are distinct, the final letter must differ after some number of repetitions of the above argument, and at that point one of the latter two cases holds and a contradiction is reached. Hence, suffix-free codes are uniquely decodable. (b) Any prefix-free code becomes a suffix-free code if the ordering of symbols in each codeword is reversed. About the simplest such example is {0,01,11} which can be seen to be a suffix-free code (with codeword lengths {1, 2, 2}) but not a prefix-free code. A codeword in the above code cannot be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder, 0111111111 . . . Assuming that source letters {a,b,c} map to {0,01,11}, we cannot distinguish between the two possible source sequences, acccccccc . . . and bcccccccc . . . , 1The same results can be obtained with some extra work for a countably infinite discrete alphabet. 3
  • 4. till the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding. (c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix free and suffix free. Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either the codewords 00 and 01 for C2 or C3, and thus must use 10 and 11, which is not suffix free. Exercise 2.7: Consider the set of codeword lengths (1,2,2) and arrange them as (2,1,2). Then, u1=0 is represented as 0.00. Next, u2 = 1/4 = 0.01 must be represented using 1 bit after the binary point, which is not possible. Hence, the algorithm fails. Exercise 2.9: (a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9) H[X] − L = MX j=1 pj log 2−lj pj ≤ MX j=1 pj Σ 2−lj pj − 1 Π log e = 0. As is evident from Figure 2.7, the inequality is strict unless 2−lj = pj for each j. Thus if H[X] = L, it follows that 2−lj = pj for each j. (b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 1/4. The first order node 0 corresponds to the letter a and has probability 1/2. The first order node 1 corresponds to the occurence of either letter b or c, and thus has probability 1/2. ✓ ✓ ✏✏✏ PPP ✏✏✏PPP PPP ✟✟✟ ❍❍❍PPP ✓✓ ❅ ❅❅ ✏✏✏ b ✏✏✏ 1 0 ✏✏✏ PPP ✟✟✟ PPP a c bb bc cb cc ba ca ab ac aa aa → 00 ab → 011 ac → 010 ba → 110 bb → 1111 bc → 1110 ca → 100 cb → 1011 cc → 1010 Similarly, the second order node 00 corresponds to aa, which has probability 1/4, and the second order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the same way, 10 amd 11 correspond to b and c, with probabilities 1/4 each. One can proceed with higher order nodes in the same way, but what is the principle behind this? In general, when an infinite binary tree is used to represent an unending sequence of letters from an iid source where each letter j has probability pj and length `j = 2−j , we Q see that each node corresponding to an initial sequence of letters x1, . . . , xn has a probability i 2−`xi equal to the product of the individual letter probabilities and an order equal to P i `xi . Thus each node labelled by a subsequence of letters has a probability 2−` where ` is the order of that node. The other nodes (those unlabelled in the example above) have a probability equal to the sum of the immediately following labelled nodes. This probability is again 2−` for an `th order node, which can be established by induction if one wishes to be formal. 4
  • 5. Exercise 2.11: (a) For n = 2,   MX j=1 2−lj   2 =   MX j1=1 2−lj1     MX j2=1 2−lj2   = MX j1=1 MX j2=1 2−(lj1+lj2 ). The same approach works for arbitrary n. (b) Each source n-tuple xn = (aj1aj2 , . . . , ajn), is encoded into a concatenation C(aj1)C(aj2) . . .C(ajn) of binary digits of aggregate length l(xn) = lj1 + lj2 + · · · ,+ljn. Since there is one n-tuple xn for each choice of aj1 , aj2 , . . . , ajn, the result of part (a) can be rewritten as   MX j=1 2−lj   n = X xn 2−l(xn). (1) (c) Rewriting (1) in terms of the number Ai of concatenations of n codewords of aggregate length i,   MX j=1 2−lj   n = nXlmax i=1 Ai2−i. This uses the fact that since each codeword has length at most lmax, each concatenation has length at most nlmax. (d) From unique decodability, each of these concatenations must be different, so there are at most 2i concatenations of aggregate length i, i.e., Ai ≤ 2i. Thus, since the above sum contains at most nlmax terms,   MX j=1 2−lj   n ≤ nlmax. (2) (e) Note that [nlmax]1/n = exp Σ ln(nlmax) n Π −→ exp(0) = 1 as n → 1. Since (2) must be satisfied for all n, the Kraft inequality must be satisfied. Exercise 2.13: (a) In the Huffman algorithm, we start by combining p3 and p4. Since we have p1 = p3+p4 ≥ p2, we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding codewords of lengths 1,2,3 and 3 respectively. (b) Note that p3 ≤ p2 and p4 ≤ p2 so p3 + p4 ≤ 2p2. Thus p1 = p3 + p4 ≤ 2p2 which implies p1 + p3 + p4 ≤ 4p2. 5
  • 6. Since p2 = 1−p1 −p3 −p4, the latter equation implies that 1−p2 ≤ 4p2, or p2 ≥ 0.2. From the former equation, then, p1 ≤ 2p2 ≤ 0.4 shows that p1 ≤ 0.4. These bounds can be met by also choosing p3 = p4 = 0.2. Thus pmax = 0.4. (c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1. Thus 1 − 2p1 ≤ p1 so p1 ≥ 1/3, i.e., pmin = 1/3. This bound is achievable by choosing p1 = p2 = 1/3 and p3 = p4 = 1/6. (d) The argument in part (b) remains the same if we assume p1 ≤ p3+p4 rather than p1 = p3+p4, i.e., p1 ≤ p3 + p4 implies that p1 ≤ pmax. Thus assuming p1 > pmax implies that p1 > p3 + p4. Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 2 (or perhaps with symbol 1 if p2 = p1). Thus the codeword for symbol 1 (or perhaps the codeword for symbol 2) will have length 1. (e) The lengths of any optimal prefix free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p1 > pmax, then, from (b), p1 > p3 + p4, so the lengths (1, 2, 3, 3) yield a lower average length than (2, 2, 2, 2). (f) The argument in part (c) remains almost the same if we start with the assumption that p1 ≥ p3 + p4. In this case p2 = 1 − p1 − p3 − p3 ≥ 1 − 2p1. Combined with p1 ≥ p2, we again have p1 ≥ pmin. Thus if p1 < pmin, we must have p3 + p4 > p1 ≥ p2. We then must combine p1 and p2 in the second step of the Huffman algorithm, so each codeword will have length 2. (g) It turns out that pmax is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5 and all other symbols have an aggregate probability of 1/5, then the Huffman code construction combines the least likely symbols until they are tied together into a supersymbol of probability 1/5. The completion of the algorithm, as in part (b), can lead to either one codeword of length 1 or 3 codewords of length 2 and the others of longer length. If p1 > 2/5, then at each stage of the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol 1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then guarantees that the code will have one codeword of length 1. Exercise 2.15: (a) This is the same as Lemma 2.5.1. (b) Since p1 < pM−1 + pM, we see that p1 < p0M−1, where p0M−1 is the probability of the node in the reduced code tree corresponding to letters M − 1 and M in the original alphabet. Thus, by part (a), l1 ≥ l0M −1 = lM − 1. (c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1, then l1 = lM, and otherwise, p1 < pM + pk, so l1 must be at least as large as the length of the immediate parent of M, showing that l1 ≥ lM − 1. (d) and (e) We have shown that the shortest and longest length differ by at most 1, with some number m ≥ 1 lengths equal to l1 and the remainingM−m lengths equal to l1+1. It follows that 2l1+1 = 2m+(M −m) = M +m. From this is follows that l1 = blog2(M)c and m = 2l1+1 −M. Exercise 2.16: (a) Grow a full ternary tree to a full ternary tree at each step. The smallest tree has 3 leaves. For the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves 6
  • 7. from that node. We lose 1 leaf, but gain 2 more at each growth extension. Thus, M = 3 + 2n (for n an integer). (b) It is clear that for optimality, all the unused leaves in the tree must have the same length as the longest codeword. For M even, combine the 2 lowest probabilities into a node at the first step, then combine the 3 lowest probability nodes for all the rest of the steps until the root node. If M is odd, a full ternary tree is possible, so combine the 3 lowest probability nodes at each step. (c) If {a, b, c, d, e, f} have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the ternary Huffman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}. Exercise 2.18: (a) Applying the Huffman coding algorithm to the code with M +1 symbols with pM+1 = 0, we combine symbol M +1 with symbol M and the reduced code has M symbols with probabilities p1, . . . , pM. The Huffman code for this reduced set of symbols is simply the code for the original set of symbols with symbol M + 1 eliminated. Thus the code including symbol M + 1 is the reduced code modified by a unit length increase in the codeword for symbolM. Thus L = L0+pM where L0 is the expected length for the code with M symbols. (b) All n of the zero probability symbols are combined together in the Huffman algorithm, and the reduced code from this combination is then the same as the code with M + 1 symbols in part (a). Thus L = L0 + pM again. Exercise 2.19: (a) The entropies H(X), H(Y ), and H(XY ) can be expressed as H(XY ) = − X x∈X,y∈Y pXY (x, y) log pXY (x, y) H(X) = − X x∈X,y∈Y pXY (x, y) log pX(x) H(Y ) = − X x∈X,y∈Y pXY (x, y) log pY (y). It is assumed that all symbol pairs x, y of zero probability have been removed from this sum, and thus all x (y) for which pX(x) = 0 ( pY (y) = 0) are consequently removed. Combining these equations, H(XY ) − H(X) − H(Y ) = X x∈X,y∈Y pXY (x, y) log pX(x)pY (y) pXY (x, y) . (b) Using the standard inequality log x ≤ (x − 1) log e, H(XY ) − H(X) − H(Y ) ≤ X x∈X,y∈Y pXY (x, y) Σ pX(x)pY (y) pXY (x, y) − 1 Π log e = 0. Thus H(X, Y ) ≤ H(X)+H(Y ). Note that this inequality is satisfied with equality if and only if X and Y are independent. 7
  • 8. (c) For n symbols, X1, . . . ,Xn, let Y be the ‘super-symbol’ X2, . . . ,Xn. Then using (b), H(X1, . . . ,Xn) = H(X1, Y ) ≤ H(X1) + H(Y ) = H(X1) + H(X2, . . . ,Xn). Iterating this gives the desired result. An alternate approach generalizes part (b) in the following way: H(X1, . . . ,Xn) − X i H(Xi) = X x1,... ,xn p(x1, . . . , xn) log p(x1), . . . , ...p(xn) p(x1, . . . , xn) ≤ 0, where we have used log x ≤ (x − 1) log e again. Exercise 2.20 (a) Y is 1 if X = 1, which occurs with probability p1. Y is 0 otherwise. Thus H(Y ) = −p1 log(p1) − (1 − p1) log(1 − p1) = Hb(p1). (b) Given Y =1, X = 1 with probability 1, so H(X | Y =1) = 0. (c) Given Y =0, X=1 has probability 0, so X hasM−1 possible choices with non-zero probability. The maximum entropy for an alphabet of sizeM−1 terms is log(M−1), so H(X|Y =0) ≤ log(M− 1). This upper bound is met with equality if Pr(X=j | X6=1) = 1 M−1 for all j6= 1. Since Pr(X=j|X6=1) = pj/(1 − p1), this upper bound on H(X | Y =0) is achieved when p2 = p3 = · · · = pM. Combining this with part (b), H(X | Y ) = p1H(X | Y =1) ≤ (1−p1) log(M − 1). (d) Note that H(XY ) = H(Y ) + H(X|Y ) ≤ Hb(p1) + (1−p1) log(M−1) and this is met with equality for p2 = · · · , pM. There are now two reasonable approaches. One is to note that H(XY ) can also be expressed as H(X) + H(Y |X). Since Y is uniquely specified by X, H(Y |X) = 0, H(X) = H(XY ) ≤ Hb(p1) + (1 − p1) log(M − 1), (3) with equality when p2 = p3 = · · · = pM. The other approach is to observe that H(X) ≤ H(XY ), which again leads (3), but this does not immediately imply that equality is met for p2 = · · · = pM. Equation (3) is the Fano bound of information theory; it is useful when p1 is very close to 1 and plays a key role in the noisy channel coding theorem. (e) The same bound applies to each symbol by replacing p1 by pj for any j, 1 ≤ j ≤ M. Thus it also applies to pmax. 23 Exercise 2.22: One way to generate a source code for (X1,X2,X3 is to concatenate a Huffman code for (X1,X2) with a Huffman code of X3. The expected length of the resulting code for (X1,X2,X3) is Lmin,2 + Lmin. The expected length per source letter of this code is Lmin,2 + 8
• 9. Exercise 2.22: One way to generate a source code for $(X_1, X_2, X_3)$ is to concatenate a Huffman code for $(X_1, X_2)$ with a Huffman code for $X_3$. The expected length of the resulting code for $(X_1, X_2, X_3)$ is $2L_{\min,2} + L_{\min}$, where $L_{\min,2}$ is the minimum expected length per source letter for coding pairs and $L_{\min}$ that for coding single letters. The expected length per source letter of this code is $\frac{2}{3}L_{\min,2} + \frac{1}{3}L_{\min}$. The expected length per source letter of the optimal code for $(X_1, X_2, X_3)$, namely $L_{\min,3}$, can be no worse, so
$$L_{\min,3} \le \tfrac{2}{3}L_{\min,2} + \tfrac{1}{3}L_{\min}.$$

Exercise 2.23: (Run Length Coding)

(a) Let $C$ and $C'$ be the codes mapping source symbols to intermediate integers and intermediate integers to output bits respectively. If $C'$ is uniquely decodable, then the intermediate integers can be decoded from the received bit stream, and if $C$ is also uniquely decodable, the original source bits can be decoded. The lengths specified for $C'$ satisfy Kraft, and thus this code can be made prefix-free and thus uniquely decodable. For example, mapping $8 \to 1$ and each other integer to 0 followed by its 3-bit binary representation is prefix-free. $C$ is a variable-to-fixed-length code, mapping $\{b, ab, a^2b, \ldots, a^7b, a^8\}$ to the integers 0 to 8. This set of strings forms a full prefix-free set, and thus any binary string can be parsed into these 'codewords', which are then mapped to the integers 0 to 8. The integers can then be decoded into the 'codewords', which are then concatenated into the original binary sequence. In general, a variable-to-fixed-length code is uniquely decodable if the encoder can parse, which is guaranteed if that set of 'codewords' is full and prefix-free.

(b) Each occurrence of source letter $b$ causes 4 bits to leave the encoder immediately. In addition, each subsequent run of 8 $a$'s causes 1 extra bit to leave the encoder. Thus, for each $b$, the encoder emits 4 bits with probability 1; it emits an extra bit with probability $(0.9)^8$; it emits yet a further bit with probability $(0.9)^{16}$, and so forth. Letting $Y$ be the number of output bits per input $b$,
$$\mathsf{E}(Y) = 4 + (0.9)^8 + (0.9)^{16} + \cdots = 4 + \frac{(0.9)^8}{1 - (0.9)^8} = 4.756.$$

(c) To count the number of $b$'s out of the source, let $B_i = 1$ if the $i$th source letter is $b$ and $B_i = 0$ otherwise. Then $\mathsf{E}(B_i) = 0.1$ and $\sigma^2_{B_i} = 0.09$. Let $A_B = (1/n)\sum_{i=1}^n B_i$ be the number of $b$'s per input in a run of $n = 10^{20}$ inputs. This has mean 0.1 and variance $(0.9)\cdot 10^{-21}$, so it is close to 0.1 with very high probability. As the number of trials increases, it is closer to 0.1 with still higher probability.

(d) The total number of output bits corresponding to the essentially $10^{19}$ $b$'s in the $10^{20}$ source letters is with high probability close to $4.756 \cdot 10^{19}(1 + \epsilon)$ for small $\epsilon$. Thus,
$$\bar{L} \approx (0.1)\left[4 + \frac{(0.9)^8}{1 - (0.9)^8}\right] = 0.4756.$$
Renewal theory provides a more convincing way to fully justify this solution. Note that the achieved $\bar{L}$ is impressive considering that the entropy of the source is $-(0.9)\log(0.9) - (0.1)\log(0.1) = 0.469$ bits/source symbol.
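A simulation of this run-length code (a sketch in plain Python; the function name encode and the handling of the unterminated final run are my own conventions and the latter is negligible here) confirms the rate in part (d): on a long pseudo-random source with Pr(a) = 0.9 it produces about 0.476 output bits per source symbol.

```python
import random

def encode(source):
    """Map runs of a's to intermediate integers 0..8, then to bits (C followed by C')."""
    bits, run = [], 0
    for s in source:
        if s == 'a':
            run += 1
            if run == 8:                      # integer 8 is sent as the single bit '1'
                bits.append('1')
                run = 0
        else:                                 # a 'b' ends the run: integer run in 0..7
            bits.append('0' + format(run, '03b'))
            run = 0
    return ''.join(bits)

random.seed(1)
n = 10**6
source = ''.join('a' if random.random() < 0.9 else 'b' for _ in range(n))
print(len(encode(source)) / n)    # close to 0.4756 bits per source symbol
```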
• 10. Exercise 2.25:

(a) Note that $W$ takes on the value $-\log(2/3)$ with probability 2/3 and $-\log(1/3)$ with probability 1/3. Thus $\mathsf{E}(W) = \log 3 - \frac{2}{3}$. Note that $\mathsf{E}(W) = H(X)$. The fluctuation of $W$ around its mean is $-\frac{1}{3}$ with probability $\frac{2}{3}$ and $\frac{2}{3}$ with probability $\frac{1}{3}$. Thus $\sigma^2_W = \frac{2}{9}$.

(b) The bound on the probability of the typical set, as derived using the Chebyshev inequality and stated in (2.2), is
$$\Pr(X^n \in T_\epsilon^n) \ge 1 - \frac{\sigma^2_W}{n\epsilon^2} = 1 - \frac{1}{45}.$$

(c) To count the number of $a$'s out of the source, let the rv $Y_i(X_i)$ be 1 for $X_i = a$ and 0 for $X_i = b$. The $Y_i(X_i)$'s are iid with mean $\bar{Y} = 2/3$ and $\sigma^2_Y = 2/9$. $N_a(X^n)$ is given by $N_a = \sum_{i=1}^n Y_i(X_i)$, which has mean $2n/3$ and variance $2n/9$.

(d) Since the $n$-tuple $X^n$ is iid, the sample outcome satisfies $w(x^n) = \sum_i w(x_i)$. Let $n_a$ be the sample value of $N_a$ corresponding to $x^n$. Since $w(a) = -\log 2/3$ and $w(b) = -\log 1/3$, we have
$$w(x^n) = n_a(-\log 2/3) + (n - n_a)(-\log 1/3) = n\log 3 - n_a, \qquad W(X^n) = n\log 3 - N_a.$$
In other words, $\tilde{W}(X^n)$, the fluctuation of $W(X^n)$ around its mean, is the negative of the fluctuation of $N_a(X^n)$; that is, $\tilde{W}(X^n) = -\tilde{N}_a(X^n)$.

(e) The typical set is given by
$$T_\epsilon^n = \left\{x^n : \left|\tfrac{w(x^n)}{n} - \mathsf{E}[W]\right| < \epsilon\right\} = \left\{x^n : \left|\tfrac{\tilde{w}(x^n)}{n}\right| < \epsilon\right\} = \left\{x^n : \left|\tfrac{\tilde{n}_a(x^n)}{n}\right| < \epsilon\right\} = \left\{x^n : 10^5\left(\tfrac{2}{3} - \epsilon\right) < n_a(x^n) < 10^5\left(\tfrac{2}{3} + \epsilon\right)\right\},$$
where we have used $\tilde{w}(x^n) = -\tilde{n}_a(x^n)$. Thus $\alpha = 10^5\!\left(\frac{2}{3} - \epsilon\right)$ and $\beta = 10^5\!\left(\frac{2}{3} + \epsilon\right)$.

(f) From part (c), $\bar{N}_a = 2n/3$ and $\sigma^2_{N_a} = 2n/9$. The CLT says that for $n$ large, the sum of $n$ iid random variables (rvs) has a distribution function close to Gaussian within several standard deviations from the mean. As $n$ increases, the range and accuracy of the approximation increase. In this case, $\alpha$ and $\beta$ are $10^3$ below and above the mean respectively. The standard deviation is $\sqrt{2\cdot 10^5/9}$, so $\alpha$ and $\beta$ are about 6.7 standard deviations from the mean. The probability that a Gaussian rv is more than 6.7 standard deviations from the mean is about $(1.6)\cdot 10^{-10}$. This is not intended as an accurate approximation, but only to demonstrate the weakness of the Chebyshev bound, which is useful in bounding but not for numerical approximation.

Exercise 2.26: Any particular string $x^n$ which has $i$ $a$'s and $n-i$ $b$'s has probability $\left(\frac{2}{3}\right)^i\left(\frac{1}{3}\right)^{n-i}$. This is maximized when $i = n = 10^5$, and the corresponding probability is $10^{-17{,}609}$. Those strings with a single $b$ have a probability 1/2 as large, and those with 2 $b$'s have a probability 1/4 as large. Since there are $\binom{n}{i}$ different sequences that have exactly $i$ $a$'s and $n-i$ $b$'s,
$$\Pr\{N_a = i\} = \binom{n}{i}\left(\frac{2}{3}\right)^i\left(\frac{1}{3}\right)^{n-i}.$$
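These binomial probabilities are far too small to evaluate directly, but they are easy to reproduce in the log domain; a sketch in Python using math.lgamma (the helper name is mine). It prints log10 Pr{Na = i} for i = n, n−1, n−2, matching the evaluations that follow, and also locates the i that maximizes Pr{Na = i}, which lies near the typical value 2n/3 rather than near n.

```python
import math

n = 10**5
p = 2.0 / 3.0

def log10_binom_pmf(i, n, p):
    """log10 of C(n,i) p^i (1-p)^(n-i), computed with log-gamma to avoid underflow."""
    log_c = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return (log_c + i * math.log(p) + (n - i) * math.log(1 - p)) / math.log(10)

for i in (n, n - 1, n - 2):
    print(i, log10_binom_pmf(i, n, p))   # about -17609, -17604, -17600

# The aggregate probability peaks near the typical count 2n/3, not near i = n
print(max(range(66000, 67400), key=lambda i: log10_binom_pmf(i, n, p)))
```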
  • 11. Evaluating for i = n, n−1, and n−2 for n = 105: Pr{Na = n} = μ 2 3 ∂n ≈ 10−17,609 Pr{Na = n−1} = 105 μ 2 3 ∂n−1 μ 1 3 ∂ ≈ 10−17604 Pr{Na = n−2} = μ 105 2 ∂μ 2 3 ∂n−2 μ 1 3 ∂2 ≈ 10−17600. What this says is that the probability of any given string with na ones decreases as na decreases, while the aggregate probability of all strings with na a’s increases with na (for na large compared to Na). We saw in the previous exercise that the typical set is the set where na is close to Na and we now see that the most probable individual strings have fantastically small probability, both individually and in the aggregate, and thus can be ignored. Exercise 2.28: (a) The probability of an n-tuple xn = (x1, . . . , xn) is pXn(xn) = Qnk =1 pX(xk). This product includes Nj(xn) terms xk for which xk is the letter j, and this is true for each j in the alphabet. Thus pXn(xn) = MY j=1 pNj (xn) j . (4) (b) Taking the log of (4), −log pXn(xn) = X j Nj(xn) log 1 pj . (5) Using the definition of Sn ≤ , all xn ∈ Sn ≤ must satisfy X j npj(1 − ≤) log 1 pj < X j Nj(xn) log 1 pj < npj(1 + ≤) log 1 pj nH(X)(1 − ≤) < X j Nj(xn) log 1 pj < nH(X)(1 + ≤). Combining this with (5, every xn ∈ S≤(n) satisfies H(X)(1 − ≤) < −log pXn(xn) n < H(X)(1 + ≤). (6) (c) With ≤0 = H(X)≤, (6) shows that for all xn ∈ Sn ≤ , H(X) − ≤0 < −log pXn(xn) n < H(X) + ≤0. By (2.25) in the text, this is the defining equation of Tn ≤0 , so all xn in Sn ≤ are also in Tn ≤0 . 11
• 12. (d) For each $j$ in the alphabet, the WLLN says that for any given $\epsilon > 0$ and $\delta > 0$, and for all sufficiently large $n$,
$$\Pr\left(\left|\frac{N_j(x^n)}{n} - p_j\right| \ge \epsilon\right) \le \frac{\delta}{M}. \qquad (7)$$
For all sufficiently large $n$, (7) is satisfied for all $j$, $1 \le j \le M$. For all such large enough $n$, each $x^n$ is either in $S_\epsilon^n$ or is a member of the event that $|\frac{N_j(x^n)}{n} - p_j| \ge \epsilon$ for some $j$. The probability of the union of the events that $|\frac{N_j(x^n)}{n} - p_j| \ge \epsilon$ for some $j$ is upper bounded by $\delta$, so $\Pr(S_\epsilon^n) \ge 1 - \delta$.

(e) The proof here is exactly the same as that of Theorem 2.7.1. Part (b) gives upper and lower bounds on $\Pr(x^n)$ for $x^n \in S_\epsilon^n$, and (d) shows that $1 - \delta \le \Pr(S_\epsilon^n) \le 1$, which together give the desired bounds on the number of elements in $S_\epsilon^n$.

Exercise 2.30:

(a) First note that the chain is ergodic (i.e., it is aperiodic and all states can be reached from all other states). Thus steady state probabilities $q(s)$ exist and satisfy the equations $\sum_s q(s) = 1$ and $q(s) = \sum_{s'} q(s')Q(s|s')$. For the given chain, these latter equations are
$$q(1) = q(1)\tfrac{1}{2} + q(2)\tfrac{1}{2} + q(4), \qquad q(2) = q(1)\tfrac{1}{2}, \qquad q(3) = q(2)\tfrac{1}{2}, \qquad q(4) = q(3).$$
Solving by inspection, $q(1) = 1/2$, $q(2) = 1/4$, and $q(3) = q(4) = 1/8$.

(b) To calculate $H(X_1)$ we first calculate the pmf $p_{X_1}(x)$ for each $x \in \mathcal{X}$. Using the steady state probabilities $q(s)$ for $S_0$, we have $p_{X_1}(x) = \sum_s q(s)\Pr\{X_1{=}x \mid S_0{=}s\}$. Since $X_1{=}a$ occurs with probability 1/2 from both $S_0{=}1$ and $S_0{=}2$ and occurs with probability 1 from $S_0{=}4$,
$$p_{X_1}(a) = q(1)\tfrac{1}{2} + q(2)\tfrac{1}{2} + q(4) = \tfrac{1}{2}.$$
Similarly, $p_{X_1}(b) = p_{X_1}(c) = 1/4$. Hence the pmf of $X_1$ is $\{\frac{1}{2}, \frac{1}{4}, \frac{1}{4}\}$ and $H(X_1) = 3/2$.

(c) The pmf of $X_1$ conditioned on $S_0 = 1$ is $\{\frac{1}{2}, \frac{1}{2}\}$. Hence $H(X_1|S_0{=}1) = 1$. Similarly, $H(X_1|S_0{=}2) = 1$. There is no uncertainty from states 3 and 4, so $H(X_1|S_0{=}3) = H(X_1|S_0{=}4) = 0$. Since $H(X_1|S_0)$ is defined as $\sum_s \Pr(S_0{=}s)H(X_1|S_0{=}s)$, we have
$$H(X_1|S_0) = q(1)H(X_1|S_0{=}1) + q(2)H(X_1|S_0{=}2) = \tfrac{3}{4},$$
which is less than $H(X_1)$ as expected.

(d) We can achieve $L = H(X_1|S_0)$ by achieving $L(s) = H(X_1|s)$ for each state $s \in \mathcal{S}$. To do that, we use an optimal prefix-free code for each state. For $S_0 = 1$, the code $\{a \to 0,\ b \to 1\}$ is optimal with $L(S_0{=}1) = 1 = H(X_1|S_0{=}1)$. Similarly, for $S_0{=}2$, $\{a \to 0,\ c \to 1\}$ is optimal with $L(S_0{=}2) = 1 = H(X_1|S_0{=}2)$. Since $H(X_1|S_0{=}3) = H(X_1|S_0{=}4) = 0$, we do not use any code at all for the states 3 and 4. In other words, our encoder does not transmit any bits for symbols that result from transitions from these states.
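A sketch in Python with numpy that reproduces these numbers, assuming the transition structure read off the solution above (state 1 emits a or b with probability 1/2 each, moving to states 1 and 2; state 2 emits a or c with probability 1/2 each, moving to states 1 and 3; state 3 emits c and moves to 4; state 4 emits a and moves to 1).

```python
import numpy as np

# Q[s, s'] = Pr(next state s' | current state s); states 1..4 stored as rows 0..3
Q = np.array([[0.5, 0.5, 0.0, 0.0],     # state 1: a -> 1, b -> 2
              [0.5, 0.0, 0.5, 0.0],     # state 2: a -> 1, c -> 3
              [0.0, 0.0, 0.0, 1.0],     # state 3: c -> 4
              [1.0, 0.0, 0.0, 0.0]])    # state 4: a -> 1

# Steady state: q Q = q together with sum(q) = 1, solved by least squares
A = np.vstack([Q.T - np.eye(4), np.ones(4)])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
q, *_ = np.linalg.lstsq(A, b, rcond=None)
print('q =', q)                          # [1/2, 1/4, 1/8, 1/8]

def H(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# pmf of X1: a comes from states 1, 2 (w.p. 1/2) and state 4 (w.p. 1); b from 1; c from 2, 3
p_a = q[0] * 0.5 + q[1] * 0.5 + q[3]
p_b = q[0] * 0.5
p_c = q[1] * 0.5 + q[2]
print('H(X1)    =', H([p_a, p_b, p_c]))   # 1.5
print('H(X1|S0) =', q[0] * 1 + q[1] * 1)  # 0.75
```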
  • 13. Now we explain why the decoder can track the state after time 0. The decoder is assumed to know the initial state. When in states 1 or 2, the next codeword from the corresponding prefix-free code uniquely determines the next state. When state 3 is entered, the the next state must be 4 since there is a single deterministic transition out of state 3 that goes to state 4 (and this is known without receiving the next codeword). Similarly, when state 4 is entered, the next state must be 1. When states 3 or 4 are entered, the next received codeword corresponds to the subsequent transition out of state 1. In this manner, the decoder can keep track of the state. (e)The question is slightly ambiguous. The intended meaning is how many source symbols x1, x2, . . . , xk must be observed before the new state sk is known, but one could possibly interpret it as determining the initial state s0. To determine the new state, note that the symbol a always drives the chain to state 0 and the symbol b always drives it to state 2. The symbol c, however, could lead to either state 3 or 4. In this case, the subsequent symbol could be c, leading to state 4 with certainty, or could be a, leading to state 1. Thus at most 2 symbols are needed to determine the new state. Determining the initial state, on the other hand, is not always possible. The symbol a could come from states 1, 2, or 4, and no future symbols can resolve this ambiguity. A more interesting problem is to determine the state, and thus to start decoding correctly, when the initial state is unknown at the decoder. For the code above, this is easy, since whenever a 0 appears in the encoded stream, the corresponding symbol is a and the next state is 0, permitting correct decoding from then on. This problem, known as the synchronizing problem, is quite challenging even for memoryless sources. Exercise 2.31: We know from (2.37) in the text that H(XY ) = H(Y ) + H(X | Y ) for any random symbols X and Y . For any k-tuple X1, . . . ,Xk of random symbols, we can view Xk as the symbol X above and view the k − 1 tuple Xk−1,Xk−2, . . . ,X1 as the symbol Y above, getting H(Xk,Xk−1 . . . ,X1) = H(Xk | Xk−1, . . . ,X1) + H(Xk−1, . . . ,X1). Since this expresses the entropy of each k-tuple in terms of a k−1-tuple, we can iterate, getting H(Xn,Xn−1, . . . ,X1) = H(Xn | Xn−1, . . . ,X1) + H(Xn−1, . . . ,X1) = H(Xn | Xn−1, . . . ,X1) + H(Xn−1 | Xn−2 . . . ,X1) + H(Xn−2 . . . ,X1) Xn = · · · = k=2 H(Xk | Xk−1, . . . ,X1) + H(X1). Exercise 2.32: (a) We must show that H(S2|S1S0) = H(S2|S1). Viewing the pair of random symbols S1S0 as a random symbol in its own right, the definition of conditional entropy is H(S2|S1S0) = X s1,s0 Pr(S1, S0 = s1, s0)H(S2|S1=s1, S0=s0) = X s1s0 Pr(s1s0)H(S2|s1s0). (8) 13
  • 14. where we will use the above abbreviations throughout for clarity. By the Markov property, Pr(S2=s2|s1s0) = Pr(S2=s2|s1) for all symbols s0, s1, s2. Thus H(S2|s1s0) = X s2 −Pr(S2=s2|s1s0) log Pr(S2=s2|s1s0) = X s2 −Pr(S2=s2|s1) log Pr(S2=s2|s1) = H(S2|s1). Substituting this in (8), we get H(S2|S1S0) = X s1s0 Pr(s1s0)H(S2|s1) = X s1 Pr(s1)H(S2|s1) = H(S2|S1). (9) (b) Using the result of Exercise 2.31, H(S0, S1, . . . , Sn) = Xn k=1 H(Sk | Sk−1, . . . , S0) + H(S0). Viewing S0 as one symbol and the n-tuple S1, . . . , Sn as another, H(S0, . . . , Sn) = H(S1, . . . , Sn | S0) + H(S0). Combining these two equations, H(S1, . . . , Sn | S0) = Xn k=1 H(Sk | Sk−1, . . . , S0). (10) Applying the same argument as in part (a), we see that H(Sk | Sk−1, . . . , S0) = H(Sk | Sk−1). Substituting this into (10), H(S1, . . . , Sn | S0) = Xn k=1 H(Sk | Sk−1). (c) If the chain starts in steady state, each successive state has the same steady state pmf, so each of the terms above are the same and H(S1, . . . , Sn|S0) = nH(S1|S0). (d) By definition of a Markov source, the state S0 and the next source symbol X1 uniquely determine the next state S1 (and vice-versa). Also, given state S1, the next symbol X2 uniquely determines the next state S2. Thus, Pr(x1x2|s0) = Pr(s1s2|s0) where x1x2 are the sample values of X1X2 in one-to-one correspondence to the sample values s1s2 of S1S2, all conditional on S0 = s0. Hence the joint pmf of X1X2 conditioned on S0=s0 is the same as the joint pmf for S1S2 conditioned on S0=s0. The result follows. (e) Combining the results of (c) and (d) verifies (2.40) in the text. Exercise 2.33: Lempel-Ziv parsing of the given string can be done as follows: 14
  • 15. Step 1: 00011101 |0{0z1} 0101100 ↑ u = 7 n = 3 Step 2: 00011101001 |01{z01} 100 u = 2 ↑ n = 4 Step 3: 000111010010101 |1{0z0} ↑ u = 8 n = 3 The string is parsed in three steps. In each step, the window is underlined and the parsed block is underbraced. The (n, u) pairs resulting from these steps are respectively (3,7), (4,2), and (3,8). Using the unary-binary code for n, which maps 3 → 011 and 4 → 00100, and a standard 3-bit map for u, 1 ≤ u ≤ 8, the encoded sequence is 011, 111, 00100, 010, 011, 000 (transmitted without commas). Note that for small examples, as in this case, LZ77 may not be very efficient. In general, the algorithm requires much larger window sizes to compress efficiently. Chapter 3 Exercise 3.3: (a) Given a1 and a2, the Lloyd-Max conditions assert that b should be chosen half way between them, i.e., b = (a1+a2)/2. This insures that all points are mapped into the closest quantization point. If the probability density is zero in some region around (a1 + a2)/2, then it makes no difference where b is chosen within this region, since those points can not affect the MSE. (b) Note that y(x)/Q(x) is the expected value of U conditional on U ≥ x, Thus, given b, the MSE choice for a2 is y(b)/Q(b). Similarly, a1 is (E[U] − y(b))/1 − Q(x). Using the symmetry condition, E[U] = 0, so a1 = −y(b) 1 − Q(b) a2 = y(b) Q(b) . (11) (c) Because of the symmetry, Q(0) = Z 1 0 f(u) du = Z 1 0 f(−u) du = Z 0 −1 f(u) du = 1 − Q(0). This implicity assumes that there is no impulse of probability density at the origin, since such an impulse would cause the integrals to be ill-defined. Thus, with b = 0, (11) implies that a1 = −a2. (d) Part (c) shows that for b = 0, a1 = −a2 satisfies step 2 in the Lloyd-Max algorithm, and then b = 0 = (a1 + a2)/2 then satisfies step 3. (e) The solution in part (d) for the density below is b = 0, a1 = −2/3, and a2 = 2/3. Another solution is a2 = 1, a1 = −1/2 and b = 1/3. The final solution is the mirror image of the second, namely a1 = −1, a2 = 1/2, and b = −1/3. 15
  • 16. 1 3≤ 1 3≤ ✲ ✛≤ ✲ ✛≤ ✲ ✛≤ -1 0 1 1 3≤ f(u) (f) The MSE for the first solution above (b = 0) is 2/9. That for each of the other solutions is 1/6. These latter two solutions are optimal. On reflection, choosing the separation point b in the middle of one of the probability pulses seems like a bad idea, but the main point of the problem is that finding optimal solutions to MSE problems is often messy, despite the apparent simplicity of the Lloyd-Max algorithm. Exercise 3.4: (a) Using the hint, we minimize MSE(Δ1,Δ2) + Πf(Δ1,Δ2) = 1 12 £ Δ21 f1L1 + Δ22 f2L2 § + Π Σ L1 Δ1 + L2 Δ2 Π over both Δ1 and Δ2. The function is convex over Δ1 and Δ2, so we simply take the derivative with respect to each and set it equal to 0, i.e., 1 6 Δ1f1L1 − Π L1 Δ21 = 0; 1 6 Δ2f2L2 − Π L2 Δ22 = 0. Rearranging, 6Π = Δ31 f1 = Δ32 f2, which means that for each choice of Π, Δ1f1/3 1 = Δ2f1/3 2 . 21 (b) We see from part (a) that Δ1/Δ2 = (f2/f1)1/3 is fixed independent of M. Holding this ratio fixed, MSE is proportional to Δand M is proportional to 1/Δ1 Thus M2 MSE is independent of Δ1 (for the fixed ratio Δ1/Δ2). M2MSE = 1 12 Σ f1L1 + Δ22 Δ21 f2L2 Π Σ L1 + L2 Δ1 Δ2 Π2 = 1 12 h f1L1 + f2/3 1 f1/3 2 L2 i " L1 + L2 f1/3 2 f1/3 1 #2 = 1 12 h f1/3 1 L1 + f1/3 2 L2 i3 . (c) If the algorithm starts with M1 points uniformly spaced over the first region and M2 points uniformly spaced in the second region, then it is in equilibrium and never changes. (d) If the algorithm starts with one additional point in the central region of zero probability density, and if that point is more than Δ1/2 away from region 1 and δ2/2 away from region 2, then the central point is unused (with probability 1). Since the conditional mean over the region mapping into that central point is not well defined, it is not clear what the algorithm will do. If it views that conditional mean as being in the center of the region mapped into that 16
  • 17. 2j point, then the algorihm is in equilibrium. The point of parts (c) and (d) is to point out that the Lloyd-Max algorithm is not very good at finding a global optimum. (e) The probability that the sample point lies in region j (j = 1, 2) is fjLj . The mean square error, using Mj points in region j and conditional on lying in region j, is L/(12M2 ). Thus, the j MSE with Mj points in region j is MSE = f1L31 12M21 + f2L32 12(M2)2 . This can be minimized numerically over integer M1 subject to M1 + M2 = M. This was minimized in part (b) without the integer constraint, and thus the solution here is slightly larger than that there, except in the special cases where the non-integer solution happens to be integer. (f) With given Δ1 and Δ2, the probability of each point in region j, j = 1, 2, is fjΔj and the number of such points is Lj/Δj (assumed to be integer). Thus the entropy is H(V ) = L1 Δ1 (f1Δ1) ln μ 1 f1L1 ∂ + L2 Δ2 (f2Δ2) ln μ 1 f2L2 ∂ = −L1f1 ln(f1L1) − L2f2 ln(f2L2). (g) We use the same Lagrange multiplier approach as in part (a), now using the entropy H(V ) as the constraint. MSE(Δ1,Δ2) + ΠH(Δ1,Δ2) = 1 12 £ Δ21 f1L1 + Δ22 f2L2 § − Πf1L1 ln(f1Δ1) − Πf2L2 ln(f2Δ2). Setting the derivatives with respect to Δ1 and Δ2 equal to zero, 1 6 Δ1f1L1 − Πf1L1 Δ1 = 0; 1 6 Δ2f2L2 − Πf2L2 Δ2 = 0. This leads to 6Π = Δ21 = Δ22 , so that Δ1 = Δ2. This is the same type of approximation as before since it ignores the constraint that L1/Δ1 and L2/Δ2 must be integers. Exercise 3.6: (a) The probability of the quantization region R is A = Δ(12 +x+ Δ2 ). To simplify the algebraic Δ2 12 1A messiness, shift U to U − x − Δ/2, which, conditional on R, lies in [−Δ/2,Δ/2]. Let Y denote this shifted conditional variable. As shown below, fY (y) = [y + (x++)]. E[Y ] = Z Δ/2 −Δ/2 y A [y + (x + 1 2 + Δ 2 )] dy = Z Δ/2 −Δ/2 y2 A dy + Z Δ/2 −Δ/2 y A [x + 1 2 + Δ 2 ] dy = Δ3 12A , since, by symmetry, the final integral above is 0. ✟✟✟✟✟✟✟✟ x fU(u) ✛Δ 1 2 + x ✲ −Δ 2 fY (y) = 1 Δ2 A(y+x+1 2+Δ 2 ) y 17
  • 18. Since Y is the shift of U conditioned on R, E[U|R] = x + Δ 2 + E[Y ] = x + Δ 2 + Δ3 12A . That is, the conditional mean is slightly larger than the center of the region R because of the increasing density in the region. (b) Since the variance of a rv is invariant to shifts, MSE= σ2U |R = σ2Y . Also, note from symmetry that R Δ/2 −Δ/2 y3 dy = 0. Thus E[Y 2] = Z Δ/2 −Δ/2 y2 A Σ y + (x + 1 2 + Δ 2 Π dy = ) (x + 12 + Δ2 ) A Δ3 12 = Δ2 12 . MSE = σ2Y = E[Y 2] − (E[Y ])2 = Δ2 12 − Σ Δ3 12A Π2 . MSE − Δ2 12 = − Σ Δ3 12A Π2 = Δ4 144(x + 12 + Δ2 )2 . (c) The quantizer output V is a discrete random variable whose entropy H[V ] is H(V ) = MX j=1 Z jΔ (j−1)Δ −fU(u) log[f(u)Δ] du = Z 1 0 −fU(u) log[f(u)] du − logΔ and the entropy of h(U) is by definition h[U] = Z 1 −0 −fU(u) log[fU(u)] du. Thus, h[U] − logΔ − H[V ] = Z 1 0 fU(u) log[f(u)/fU(u)] du. (d) Using the inequality ln x ≤ x − 1, Z 1 0 fU(u) log[f(u)/fU(u)] du ≤ log e Z 1 0 fU(u)[f(u)/fU(u) − 1] du = log e ΣZ 1 0 f(u) du − Z 1 0 Π fU(u) = 0. Thus, the difference h[U] − logΔ − H[V ] is non-positive (not non-negative). (e) Approximating ln x by (1+x) − (1+x)2/2 for x = f(u)/f(u) and recognizing from part d) that the integral for the linear term is 0, we get Z 1 0 fU(u) log[f(u)/fU(u)] du ≈ − 1 2 log e Z 1 0 Σ f(u) f(u) − 1 Π2 du (12) = − 1 2 log e Z 1 0 [f(u) − fU(u)]2 fU(u) du. (13) 18
• 19. Now $f_U(u)$ varies by at most $\Delta$ over any single region, and $f(u)$ lies between the minimum and maximum of $f_U(u)$ in that region. Thus $|f_U(u) - f(u)| \le \Delta$. Since $f_U(u) \ge 1/2$, the integrand above is at most $2\Delta^2$, so the right side of (13) is at most $\Delta^2\log e$.

Exercise 3.7:

(a) Note that $\frac{1}{u(\ln u)^2}$ is the derivative of $-1/\ln u$ and thus integrates to 1 over the given interval.

(b)
$$h(U) = \int_e^\infty \frac{1}{u(\ln u)^2}\left[\ln u + 2\ln(\ln u)\right]du = \int_e^\infty \frac{du}{u\ln u} + \int_e^\infty \frac{2\ln(\ln u)}{u(\ln u)^2}\,du.$$
The first integrand above is the derivative of $\ln(\ln u)$ and thus the integral is infinite. The second integrand is positive for large enough $u$, and therefore $h(U)$ is infinite.

(c) The hint establishes the result directly.

Exercise 3.8:

(a) As suggested in the hint² (and using common sense in any region where $f(x) = 0$),
$$-D(f\|g) = \int f(x)\ln\frac{g(x)}{f(x)}\,dx \le \int f(x)\left[\frac{g(x)}{f(x)} - 1\right]dx = \int g(x)\,dx - \int f(x)\,dx = 0.$$
Thus $D(f\|g) \ge 0$.

(b)
$$D(f\|\phi) = \int f(x)\ln\frac{f(x)}{\phi(x)}\,dx = -h(f) + \int f(x)\left[\ln\sqrt{2\pi\sigma^2} + \frac{x^2}{2\sigma^2}\right]dx = -h(f) + \ln\sqrt{2\pi e\sigma^2}.$$

(c) Combining parts (a) and (b), $h(f) \le \ln\sqrt{2\pi e\sigma^2}$. Since $D(\phi\|\phi) = 0$, this inequality is satisfied with equality for a Gaussian rv $\sim \mathcal{N}(0, \sigma^2)$.

Exercise 3.9:

(a) For the same reason as for sources with probability densities, each representation point $a_j$ must be chosen as the conditional mean of the set of symbols in $R_j$. Specifically,
$$a_j = \frac{\sum_{i\in R_j} p_i r_i}{\sum_{i\in R_j} p_i}.$$

²A useful feature of divergence is that it exists whether or not a density exists; it can be defined over any quantization of the sample space and it increases as the quantization becomes finer, thus approaching a limit (which might be finite or infinite).
  • 20. (b) The symbol ri has a squared error |ri − aj |2 if mapped into Rj and thus into aj . Thus ri must be mapped into the closest aj and thus the region Rj must contain all source symbols that are closer to aj than to any other representation point. The quantization intervals are not uniquely determined by this rule since Rj can end and Rj+1 can begin at any point between the largest source symbol closest to aj and the smallest source symbol closest to aj+1. (c) For ri midway between aj and aj+1, the squared error is |ri − aj |2 = |ri − aj+1|2 no matter whether ri is mapped into aj or aj+1. (d) In order for the case of part (c) to achieve MMSE, it is necessary for aj and aj+1 to each be the conditional mean of the set of points in the corresponding region. Now assume that aj is the conditional mean of Rj under the assumption that ri is part of Rj . Switching ri to Rj+1 will not change the MSE (as seen in part (c)), but it will change Rj and will thus change the conditional mean of Rj . Moving aj to that new conditional mean will reduce the MSE. The same argument applies if ri is viewed as being in Rj+1 or even if it is viewed as being partly in Rj and partly in Rj+1. 20
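Before moving to Chapter 4, a small numerical illustration of the bound $h(f) \le \ln\sqrt{2\pi e\sigma^2}$ from Exercise 3.8 (a sketch in plain Python, natural-log units; the particular comparison densities are my choice): among densities with a common variance, the uniform and Laplacian differential entropies fall short of the Gaussian value.

```python
import math

sigma = 1.7   # any value works; the comparison is scale-invariant

gauss   = 0.5 * math.log(2 * math.pi * math.e * sigma**2)   # the upper bound
uniform = math.log(sigma * math.sqrt(12))                    # uniform with variance sigma^2
laplace = 1 + math.log(math.sqrt(2) * sigma)                 # Laplacian with variance sigma^2

print(gauss, uniform, laplace)   # the Gaussian value is the largest of the three
```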
  • 21. Chapter 4 Exercise 4.2: From (4.1) in the text, we have u(t) = P 1k =−1 ˆuke2πikt/T for t ∈ [−T/2, T/2]. Substituting this into R T/2 −T/2 u(t)u∗(t)dt, we have Z T/2 −T/2 |u(t)|2dt = Z T/2 −T/2 1X k=−1 ˆuke2πikt/T 1X `=−1 ˆu∗` e−2πi`t/T dt = 1X k=−1 1X `=−1 ˆuk ˆu∗` Z T/2 −T/2 e2πi(k−`)t/T dt = 1X k=−1 1X `=−1 ˆuk ˆu∗` Tδk,`, where δk,` equals 1 if k = ` and 0 otherwise. Thus, Z T/2 −T/2 |u(t)|2dt = T 1X k=−1 |ˆuk|2. Exercise 4.4: (a) Note that sa(k) − sa(k − 1) = ak ≥ 0, so the sequence sa(1), sa(2), . . . , is non-decreasing. A standard result in elementary analysis states that a bounded non-decreasing sequence must have a limit. The limit is the least upper bound of the sequence {sa(k); k ≥ 1}. (b) Let Jk = max{j(a), j(2), . . . , j(k), i.e., Jk is the largest index in aj(1), . . . , aj(k). Then Xk `=1 b` = Xk `=1 aj(`) ≤ XJk j=1 aj ≤ Sa. By the same argument as in part (a), Pk `=1 b` has a limit as k → 1 and the limit, say Sb is at most Sa. (c) Using the inverse permutation to define the sequence {ak} from the sequence {bk}, the same argument as in part (b) shows that Sa ≤ Sb. Thus Sa = Sb and the limit is independent of the order of summation. (d) The simplest example is the sequence {1,−1, 1,−1, . . . }. The partial sums here alternate between 1 and 0, so do not converge at all. Also, in a sequence taking two odd terms for each even term, the series goes to 1. A more common (but complicated) example is the alternating harmonic series. This converges to 0, but taking two odd terms for each even term, the series approaches 1. Exercise 4.5: (a) For E = I1 ∪I2, with the left end points satisfying a1 ≤ a2, there are three cases to consider. 21
  • 22. • a2 < b1. In this case, all points in I1 and all points in I2 lie between a1 and max{b1, b2}. Conversely all points in (a1, max{b1, b2}) lie in either I1 or I2. Thus E is a single interval which might or might not include each end point. • a2 > b1. In this case, I1 and I2 are disjoint. • a2 = b1. If I1 is open on the right and I2 is open on the left, then I1 and I2 are separated by the single point a2 = b1. Otherwise E is a single interval. (b) Let Ek = I1∪I2∪· · ·∪Ik and let Jk be the final interval in the separated interval representation of Ek. We have seen how to find J2 from E2 and note that the starting point of J2 is either a1 or a2. Assume that in general the starting point of Jk is aj for some j, 1 ≤ j ≤ k. Assuming that the starting points are ordered a1 ≤ a2 ≤ · · · , we see that ak+1 is greater than or equal to the starting point of Jk. Thus Jk ∪ Ik+1 is either a single interval or two separated intervals by the argument in part (a). Thus Ek+1, in separated interval form, is Ek, with Jk replaced either by two separated intervals, the latter starting with ak+1, or by a single interval starting with the same starting point as Jk. Either way the starting point of Jk+1 is aj for some j, 1 ≤ j ≤ k+1, verifying the initial assumption by induction. (c) Each interval Jk created above starts with an interval starting point a1, . . . , and ends with an interval ending point b1, . . . , and therefore all the separated intervals start and end with such points. (d) Let I01 ∪ · · · ∪ I0` be the union of disjoint intervals arising from the above algorithm and let I100 ∪ · · · ∪ I00 i be any other ordered union of separated intervals. Let k be the smallest integer for which I0k 6= I00 k . Then the starting points or the ending points of these intervals are different, or one of the two intervals is open and one closed on one end. In all of these cases, there is at least one point that is in one of the unions and not the other. 0j Exercise 4.6: 0j (a) 0j 0jIf we assume that the intervals {Ij ; 1 ≤ j < 1} are ordered in terms of starting points, then the argument in Exercise 4.5 immediately shows that the set of separated intervals stays the same as each new new interval Ik+1 is added except for the possible addition of a new interval at the right or the expansion of the right most interval. However, with a countably infinite set of intervals, it is not necessarily possible to order the intervals in terms of starting points (e.g., suppose the left points are the set of rationals in (0,1)). However, in the general case, in going from Bk to Bk+1, a single interval Ik+1 is added to Bk. This can add a new separated interval, or extend one of the existing separated intervals, or combine two or more adjacent separated intervals. In each of these cases, each of the separated intervals in Bk (including Ij,k) either stays the same or is expanded. Thus Ij,k ⊆ Ij,k+1. (b) Since Ij,k ⊆ Ij,k+1, the left end points of the sequence {Ij,k; k ≥ j} is a monotonic decreasing sequence and thus has a limit (including the possibility of −1). Similarly the right end points are monotonically increasing, and thus have a limit (possibly +1). Thus limk→1 Ij,k exists as an interval Ithat might be infinite on either end. Note now that any point in the interior of Imust be in Ij,k for some k. The same is true for the left (right) end point of Iif Iis closed on the left (right). Thus I0j must be in B for each j. (c) From Exercise 4.5, we know that for each k ≥ 1, the set of intervals {I1,k, I2,k, . . . 
, Ik,k} is a separated set whose union is Bk. Thus, for each `, j ≤ k, either I`,k = Ij,k or I`,k and Ij,k are 22
  • 23. 0j separated. If 0` I`,k = Ij,k, then the fact that Ij,k ⊆ Ij,k+1 ensures that I`,k+1 = Ij,k+1, and thus, in the limit, I= I. If I`,k and Ij,k are separated, then, as explained in part (a), the addition 0j of Ik+1 either maintains the separation or combines I`,k and Ij,k into a single interval. Thus, as k increases, either I`,k and Ij,k remain separated or become equal. (d) The sequence {I; j ≥ 1} is countable, and after removing repetitions it is still countable. It is a separated sequence of intervals from (c). From (b), ∪1j =1 ⊆ B. Also, since B = ∪jIj ⊆ ∪jI0j , we see that B = ∪jI0j . (e) Let {I0j ; j ≥ 1} be the above sequence of separated intervals and let {I00 j ; j ≥ 1} be any other sequence of separated intervals such that ∪jI00 j = B. For each j ≥ 1, let c0j be the center point of I0j . Since c0j is in B, cj ∈ I00 k for some k ≥ 1. Assume first that I0j is open on the left. Letting a0j be the left end point of I0j , the interval (a0j , c0j ] must be contained in I00 k . Since a0j /∈ B, a0j must be the left end point of I00 k and I00 k must be open on the left. Similarly, if I0j is closed on 0j 0j the left, a0is the left end point of I00 and I00 is closed on the left. Using the same analysis on j k k the right end point of I, we see that I= I00 k . Thus the sequence {I00 j ; j ≥ 1} contains each interval in {I0j ; j ≥ 1}. The same analysis applied to each interval in {I00 j ; j ≥ 1} shows that {I0j ; j ≥ 1} contains each interval in {I00 j ; j ≥ 1}, and thus the two sequences are the same except for possibly different orderings. Exercise 4.7: (a) and (b) For any finite unions of intervals E1 and E2, (4.87) in the text states that μ(E1) + μ(E2) = μ(E1 ∪ E2) + μ(E1 ∩ E2) ≥ μ(E1 ∪ E2), kj where the final inequality follows from the non-negativity of measure and is satisfied with equality if E1 and E2 are disjoint. For part (a), let I1 = E1 and I2 = E2 and for part (b), let Bk = E1 and Ik+1 = E2. (c) For k = 2, part (a) shows that μ(Bk) ≤ μ(I1) + μ(I2). Using this Pas the initial step of the induction and using part (b) for the inductive step shows that μ(Bk) ≤ μ(Ij) with equality =1 in the disjoint case. (d) First assume that μ(B) is finite (this is always the case for measure over the interval [−T/2, T/2]). Then since Bk is non-decreasing in k, μ(B) = lim k→1 μ(Bk) ≤ lim k→1 Xk j=1 μ(Ik). Alternatively, if μ(B) = 1, then limk→1 Pkj =1 μ(Ik) = 1 also. Exercise 4.8: Let Bn = ∪1j =1In,j. Then B = ∪n,jIn,j. The collection of intervals {In,j; 1 ≤ n ≤ 1, 1 ≤ j ≤ 1} is a countable collection of intervals since the set of pairs of positive integers is countable. Exercise 4.12: (a) By combining parts (a) and (c) of Exercise 4.11, {t : u(t) > β} is measurable for all β. Thus, {t : −u(t) < −β} is measurable for all β, so −u(t) is measurable. Next, for β > 0, {t : |u(t)| < β} = {t : u(t) < β} ∩ {t : u(t) > −β}, which is measurable. 23
  • 24. (b) {t : u(t) < β} = {t : g(u(t)) < g(β), so if u(t) is measurable, then g(u(t) is also. (c) Since exp(·) is increasing, exp[u(t)] is measurable by part (b). Part (a) shows that |u(t)| is measurable if u(t) is. Both the squaring function and the log function are increasing for positive values, so u2(t) = |u(t)|2 and log(|u(t)| are measurable. Exercise 4.13: (a) Let y(t) = u(t) + v(t). We will show that {t : y(t) < β) is measurable for all real β. Let ≤ > 0 be arbitrary and k ∈ Z be arbitrary. Then, for any given t, (k − 1)≤ ≤ u(t) < k≤ and v(t) < β − k≤) =⇒ y(t) < β. This means that the set of t for which the left side holds is included in the set of t for which the right side holds, so {t : (k − 1)≤ ≤ u(t) < k≤} ∩ {t : v(t) < β − k≤)} ⊆ {t : y(t) < β}. This subset inequality holds for each integer k and thus must hold for the union over k, [ k h {t : (k − 1)≤ ≤ u(t) < k≤} ∩ {t : v(t) < β − k≤)} i ⊆ {t : y(t) < β}. Finally this must hold for all ≤ > 0, so we choose a sequence 1/n for n ≥ 1, yielding [ n≥1 [ k h {t : (k − 1)/n ≤ u(t) < k/n} ∩ {t : v(t) < β − k/n)} i ⊆ {t : y(t) < β}. The set on the left is a countable union of measurable sets and thus is measurable. It is also equal to {t : y(t) < β}, since any t in this set also satisfies y(t) < β−1/n for sufficiently large n. (b) This can be shown by an adaptation of the argument in (a). If u(t) and v(t) are positive functions, it can also be shown by observing that ln u(t) and ln v(t) are measurable. Thus the sum is measurable by part (a) and exp[ln u(t) + ln v(t)] is measurable. Exercise 4.14: The hint says it all. Exercise 4.15: (a) Restrict attention to t ∈ [−T/2, T/2] throughout. First we show that vm(t) = inf1n=m un(t) is measurable for all m ≥ 1. For any given t, if un(t) ≥ V for all n ≥ m, then V is a lower bound to un(t) over n ≥ m, and thus V is greater than or equal to the greatest such lower bound, i.e., V ≥ vm(t). Similarly, vm(t) ≥ V implies that un(t) ≥ V for all n ≥ m. Thus, {t : vm(t) ≥ V } = 1 n=m{t : un(t) ≥ V }. Using Exercise 4.11, the measurability of un implies that {t : un(t) ≥ V } is measurable for each n. The countable intersection above is therefore measurable, and thus, using the result of Exercise 4.11 again, vm(t) is measurable for each m. 24
  • 25. Next, if vm(t) ≥ V then vm0(t) ≥ V for all m0 > m. This means that vm(t) is a non-decreasing function of m for each t, and thus limm vm(t) exists for each t. This also means that {t : lim m→1 vm(t) ≥ V } = 1[ m=1 " 1 # . n=m{t : un(t) ≥ V } This is a countable union of measurable sets and is thus measurable, showing that lim inf un(t) is measurable. (b) If lim infn un(t) = V1 for a given t, then limm vm(t) = V1, which implies that for the given t, the sequence {un(t); n ≥ 1} has a subsequence that approaches V1 as a limit. Similarly, if lim supn un(t) = V2 for that t, then the sequence {un(t), n ≥ 1} has a subsequence approaching V2. If V1 < V2, then limn un(t) does not exist for that t, since the sequence oscillates infinitely between V1 and V2. If V1 = V2, the limit does exist and equals V1. (c) Using the same argument as in part (a), with inf and sup interchanged, {t : lim sup un(t) ≤ V } = 1 m=1 " 1[ # n=m{t : un(t) ≤ V } is also measurable, and thus lim sup un(t) is measurable. It follows from this, with the help of Exercise 4.13 (a), that lim supn un(t) − lim infn un(t) is measurable. Using part (b), limn un(t) exists if and only if this difference equals 0. Thus the set of points on which limn un(t) exists is measurable and the function that is this limit when it exists and 0 otherwise is measurable. Exercise 4.16: As seen below, un(t) is a rectangular pulse taking the value 2n from 1 2n+1 to 3 2n+1 . It follows that for any t ≤ 0, un(t) = 0 for all n. For any fixed t > 0, we can visually see that for n large enough, un(t) = 0. Since un(t) is 0 for all t greater than 3 2n+1 , then for any fixed t > 0, un(t) = 0 for all n > log2 3t − 1. Thus limn→1 un(t) = 0 for all t. Since limn→1 un(t) = 0 for all t, it follows that R R R limn→1 un(t)dt = 0. On the other hand, un(t)dt = 1 for all n so limn→1 un(t)dt = 1. 1/16 3/16 1/4 3/8 3/4 0 1/8 Exercise 4.17: (a) Since u(t) is real valued, ØØØØ ØØØØ Z u(t)dt = ØØØØ Z u+(t)dt − ØØØØ Z u−(t)dt ≤ ØØØØ ØØØØ Z u+(t)dt + ØØØØ ØØØØ Z u−(t)dt = Z ØØ ØØ u+(t) dt + Z ØØ ØØ u−(t) dt = Z u+(t)dt + Z u−(t)dt = Z |u(t)|dt. 25
  • 26. (b) As in the hint we select α such that α R u(t)dt is non-negative and real and |α| = 1. Now let R αu(t) = v(t) + jw(t) where R v(t) and w(t) R are the real and imaginary part of αu(t). Since α u(t)dt is real, we have w(t)dt = 0 and α u(t)dt = R v(t)dt. Note also that |v(t)| ≤ |αu(t)|. Hence ØØØØ ØØØØ Z u(t)dt = ØØØØ α ØØØØ Z u(t)dt = ØØØØ ØØØØ Z v(t)dt Z ≤ |v(t)| dt (part a) Z ≤ |αu(t)| dt = Z |α| |u(t)| dt = Z |u(t)| dt. Exercise 4.18: (a) The meaning of u(t) = v(t) a.e. is that μ{t : R |u(t) − v(t)| > 0} = 0. It follows that |u(t) − v(t)|2dt = 0. Thus u(t) and v(t) are L2 equivalent. (b) If u(t) and v(t) are L2 equivalent, then R |u(R t)−v(t)|2dt = 0. Now suppose that μ{t : |u(t)− v(t)|2 > ≤} is non-zero for some ≤ > 0. Then |u(t) − v(t)|2dt ≥ ≤μ{t : |u(t) − v(t)|2 > ≤} > 0 which contradicts the assumption that u(t) and v(t) are L2 equivalent. (c) The set {t : |u(t) − v(t)| > 0} can be expressed as {t : |u(t) − v(t)| > 0} = [ n≥1 {t : |u(t) − v(t)| > 1/n}. Since each term on the right has zero measure, the countable union also has zero measure. Thus {t : |u(t) − v(t)| > 0} has zero measure and u(t) = v(t) a.e. Exercise 4.21: (a) By expanding the magnitude squared within the given integral as a product of the function and its complex conjugate, we get Z ØØ Øu(t) − Xn m=−n X` k=−` ØØØ ˆuk,mθk,m(t) 2 dt = Z |u(t)|2 dt − Xn m=−n X` k=−` T|ˆuk,m|2. (14) Since each increase in n (or similarly in `) subtracts additional non-negative terms, the given integral is non-increasing in n and `. (b) and (c) The set of terms T|ˆuk,m|2 for k ∈ Z and m ∈ Z is a countable set of non-negative terms with a sum bounded by R |u(t)|2 dt, which is finite since u(t) is L2. Thus, using the result of Exercise 4.4, the sum over this set of terms is independent of the ordering of the summation. Any scheme for increasing n and ` in any order relative to each other in (14) is just an example of this more general ordering and must converge to the same quantity. 26
  • 27. Since um(t) = u(t)rect(t/T − m) satisfies R |um(t)|2 dt = T P k |uk,m|2 by Theorem 4.4.1 of the text, it is clear that the limit of (14) as n, ` → 1 is 0, so the limit is the same for any ordering. There is a subtlety above which is important to understand, but not so important as far as developing the notation to avoid the subtlety. The easiest way to understand (14) is by under-standing that R |um(t)|2 dt = T P k |uk,m|2, which suggests taking the limit k → ±1 for each value of m in (14). This does not correspond to a countable ordering of (k,m). This can be straightened out with epsilons and deltas, but is better left to the imagination of the reader. Exercise 4.22: (a) First note that: Xn m=−n um(t) =   0 |t| > (n + 1/2)T 2u(t) t = (m+ 1/2)T, |m| < n u(t) otherwise. Z ØØØØØ u(t) − Xn m=−n ØØØØØ um(t) 2 dt = Z (−n−1/2)T −1 |u(t)|2 + Z 1 (n+1/2)T |u(t)|2 dt. By the definition of an L2 function over an infinite time interval, each of the integrals on the right approach 0 with increasing n. (b) Let u` m(t) = P` k=−` ˆuk,mθk,m(t). Note that Pn m=−n u` m(t) = 0 for |t| > (n + 1/2)T. We can now write the given integral as: Z |t|>(n+1/2)T |u(t)|2 dt + Z (n+1/2)T −(n+1/2)T ØØØØØ u(t) − Xn m=−n ØØØØØ u` m(t) 2 dt. (15) As in part (a), the first integral vanishes as n → 1. (c) Since ˆuk,m are the Fourier series coefficients of um(t) we know um(t) = l.i.m`→1u` m(t). Hence, for each n, the second integral goes to zero as ` → 1. Thus, for any ≤ > 0, we can choose n so that the first term is less than ≤/2 and then choose ` large enough that the second term is less than ≤/2. Thus the limit of (15) as n, ` → 1 is 0. Exercise 4.23: The Fourier transform of the LHS of (4.40) is a function of t, so its Fourier transform is F(u(t) ∗ v(t)) = Z 1 −1 μZ 1 −1 u(τ )v(t − τ )dτ ∂ e−2πiftdt = Z 1 −1 u(τ ) μZ 1 −1 ∂ v(t − τ )e−2πift dt dτ = Z 1 −1 u(τ ) μZ 1 −1 v(r)e−2πif(t+r) dr ∂ dτ = Z 1 −1 u(τ )e−2πifτdτ Z 1 −1 v(r)e−2πifrdr = ˆu(f)ˆv(f). 27
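Exercise 4.23's identity has an exact discrete counterpart that is easy to check numerically (a sketch in Python with numpy; the sequence lengths are arbitrary): a linear convolution of two finite sequences equals the inverse DFT of the product of their zero-padded DFTs.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(50)
v = rng.standard_normal(70)

direct = np.convolve(u, v)                     # linear convolution, length 119
N = len(u) + len(v) - 1
via_fft = np.fft.ifft(np.fft.fft(u, N) * np.fft.fft(v, N)).real

# The transform of a convolution is the product of the transforms
print(np.max(np.abs(direct - via_fft)))        # about 1e-13
```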
  • 28. Exercise 4.24: (a) Z |t|>T ØØ Øu(t)e−2πift − u(t)e−2πi(f−δ)t ØØØ dt = Z |t|>T ØØ Øu(t)e−2πift ≥ 1 − e2πiδt ¥ØØØ dt = Z |t|>T |u(t)| ØØØ 1 − e2πiδt ØØØ dt ≤ 2 Z |t|>T |u(t)| dt for all f > 0, δ > 0. Since u(t) is L1, R 1 −1 |u(t)| dt is finite. Thus, for T large enough, we can make R |t|>T |u(t)| dt as small as we wish. In particular, we can let T be sufficiently large that 2 R |t|>T |u(t)| dt is less than ≤/2. The result follows. (b) For all f, Z |t|≤T ØØ Øu(t)e−2πift − u(t)e−2πi(f−δ)t ØØØ dt = Z |t|≤T |u(t)| ØØØ 1 − e2πiδt ØØØ dt. For the T selected in part a), we can make ØØ 1 − e2πiδt ØØ arbitrarily small for all |t| ≤ T by choosing δ to be small enough. Also, since u(t) is L1, R |t|≤T |u(t)| dt is finite. Thus, by choosing δ small enough, we can make R |t|≤T |u(t)| ØØ 1 − e2πiδt ØØ dt < ≤/2. Exercise 4.26: Exercise 4.11 shows that the sum of two measurable functions is measurable, so the question concerns the energy in R au(t) + bv(t). Note that R for each t, |au(t) + bv(t)|2 ≤ 2|a|2|u(t)|2 + 2|b|2|v(t)|2. Thus since |u(t)|2 dt < 1 and R |v(t)|2 dt < 1, it follows that |au(t) + bv(t)|2 dt < 1. If {t : u(t) ≤ β} is a union of disjoint intervals, then {t : u(t − T) ≤ β} is that same union of intervals each shifted to the left by T, and therefore it has the same measure. In the general case, any cover of {t : u(t) ≤ β}, if shifted to the left by T, is a cover of {t : u(t−T) ≤ β}. Thus, for all β, μ{t : u(t) ≤ β} = μ{t : u(t−T) ≤ β}. Similarly if {t : u(t) ≤ β} is a union of intervals, then {t : u(t/T ) ≤ β} is that same set of intervals expanded by a factor of T. This generalizes to arbitrary measurable sets as before. Thus μ{t : u(t) ≤ β} = (1/T)μ{t : u(t/T ) ≤ β}. Exercise 4.29: The statement of the exercise contains a misprint — the transform ˆu(f) is limited to |f| ≤ 1/2 (thus making the sampling theorem applicable) rather than the function being time-limited. For the given sampling coefficients, we have u(t) = X k u(k)sinc(t − k) = Xn (−1)ksinc(t − k) k=−n u(n + 1 2 ) = Xn (−1)ksinc(n + k=−n 1 2 − k) = Xn k=−n 12 (−1)k(−1)n−k π[n − k + ] . (16) Since n is even, (−1)k(−1)n−k = (−1)n = 1. Substituting j for n − k, we then have u(n + 1 2 ) = X2n k=0 1 π(k + 12 ) . (17) 28
  • 29. The approximation Pm2 k=m1 1 ln m2+1 k+1/2 ≈ m1 comes from approximating the sum by an integral and is quite accurate for m1 >> 0 To apply this approximation to (17), we must at least omit the term k = 0 and this gives us the approximation u(n + 1 2 ) ≈ 2 π + 1 π ln(2n + 1). This goes to infinity logarithmically in n as n → 1. The approximation can be improved by removing the first few terms from (17) before applying the approximation, but the term ln(2n + 1) remains. We can evaluate u(n+m+12 ) and u(n−m−12 ) by the same procedure as in (16). In particular, u(n+m+ 1 2 ) = Xn (−1)ksinc(n+m+ k=−n 1 2−k) = Xn k=−n 12 (−1)k(−1)n+m−k π[n+m−k + ] = 2Xn+m j=m 12 (−1)n+m π[j + ] . . u(n−m− 1 2 ) = Xn k=−n 12 (−1)k(−1)n−m−k π[n−m−k − ] = 2Xn−m j=−m 12 (−1)n−m π[j − ] . Taking magnitudes, ØØØØ u(n+m+ 1 2 ØØØØ ) = 2Xn+m j=m 1 π[j + 12 ] ; ØØØØ u(n−m− 1 2 ØØØØ ) = 2Xn−m j=−m 1 π[j − 12 ] . All terms in the first expression above are positive, whereas those in the second expression are negative for j ≤ 0. We break this second expression into positive and negative terms: ØØØØ u(n−m− 1 2 ØØØØ ) = X0 j=−m 1 π[j − 12 ] + 2Xn−m j=1 1 π[j − 12 ] = X0 k=−m 1 π[k − 12 ] + 2nX−m−1 j=0 1 π[j + 12 ] . For each j, 0 ≤ j ≤ m, the term in the second sum above is the negative of the term in the first sum with j = −k. Cancelling these terms out, ØØØØu(n−m− 1 2 ØØØØ ) = 2nX−m−1 j=m+1 1 π[j + 12 ] . This is a sum of positive terms and is a subset of the positive terms in |u(n+m+ 12 |, establishing that |u(n − m − 12 | ≤ |u(n + m + 12 )|. What is happening here is that for points inside [−n, n], the sinc functions from the samples on one side of the point cancel out the sinc functions from the samples on the other side. The particular samples in this exercise have been chosen to illustrate that truncating the samples of a bandlimited function and truncating the function can have very different effects. Here the function with truncated samples oscillates wildly (at least logarithmically in n), with the oscillations larger outside of the interval than inside. Thus most of the energy in the function resides outside of the region where the samples are nonzero. 29
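The logarithmic growth described above is easy to check numerically; a sketch in plain Python (the helper name u_half is mine) evaluates the sum in (17) directly and compares it with the estimate 2/π + ln(2n+1)/π.

```python
import math

def u_half(n):
    """u(n + 1/2) for the alternating-sample example, from equation (17)."""
    return sum(1.0 / (math.pi * (k + 0.5)) for k in range(2 * n + 1))

for n in (10, 100, 1000, 10000):
    approx = 2 / math.pi + math.log(2 * n + 1) / math.pi
    print(n, u_half(n), approx)   # grows like ln(2n+1)/pi; the estimate tracks it closely
```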
  • 30. 1W Exercise 4.31: (a) Note that g(t) = p2(t) where p(t) = sinc(Wt). Thus g(ˆf) is the convolution of p(ˆf) with itself. Since p(ˆf) = rect( f ), we can convolve graphically to get the triangle function below. W ° ° °❅ ° ˆg(f) g(t) ❅ ❅ ❅ 1 W −W W= 1 2T 1 1W (b) Since u(t) = P k u(kT)sinc(2Wt − k), it follows that v(t) = P k u(kT)sinc(2Wt − k) ∗ g(t). Letting h(t) = sinc(t/T ) ∗ g(t), we see that ˆh (f) = Trect(Tf)ˆg(f). Since rect(Tf) = 1 over the range where ˆg(f) is non-zero, ˆh (f) = T ˆg(f). Thus h(t) = Tg(t). It follows that v(t) = X k Tu(kT)g(t − kT). (18) (c) Note that g(t) ≥ 0 for all t. This is the feature of g(t) that makes it useful in generating amplitude limited pulses. Thus, since u(kT) ≥ 0 for each k, each term in the sum is non-negative, and v(t) is non-negative. P (d) The obvious but incomplete way to see that k sinc(t/T − k) = 1 is to observe that each sample of the constant function 1 is 1, so this is just the sampling expansion of a constant. Unfortunately, u(t) = 1 is not L2, so the sampling theorem does not apply. The problem is more than nit-picking, since, for example, the sampling expansion of a sequence of alternating 1’s and -1’s does not converge (as can be seen from Exercise 4.29). The desired result follows here from noting that both the sampling expansion and the constant function 1 are periodic in T and both are L2 over one period. Taking the Fourier series over a period establishes the equality. P (e) To evaluate k g(t−kT), consider P (18) with each u(kT) = 1. For this choice, it follows that k g(t−kT) = v(t)/T. To evaluate v(t) for this choice, note that u(t) = 1 and v(t) = u(t)∗g(t), so that v(t) can be regarded as the output when the constant 1 is passed through the filter g(t). The output is then constant also and equal to R g(t) dt = ˆg(0) = 1W . Thus P k g(t − kT) = 1/TW = 2. (f) Note that v(t) = P k u(kT)Tg(t − kT) is non-decreasing, for each t, in each sample u(kT). Thus v(t) ≤ P k Tg(t − kT), which as we have seen is simply 2T. (h) Since g is real and non-negative and each |u(kT)| ≤ 1, |v(t)| ≤ X k |u(kT)|Tg(t − kT) ≤ 2T for all t. We will find in Chapter 6 that g(t) is not a very good modulation waveform at a sample separation T, but it could be used at a sample separation 2T. Exercise 4.33: Consider the sequence of functions vm(t) = rect(t − m) for m ∈ Z+, i.e., time spaced rectangular pulses. For every t, limm→1 rect(t − m) = 0 so this sequence converges pointwise to 0. However, R |(rect(t −m) − rect(t − n)|2 dt = 2 for all n6= m, so L2 convergence is impossible. 30
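Returning to Exercise 4.31(e) for a moment, the identity Σ_k g(t − kT) = 1/(TW) = 2 can be checked numerically; a sketch in Python with numpy (the test points and the truncation range are arbitrary; the tail of the sum falls off only like 1/k, so a long truncation is used).

```python
import numpy as np

W = 1.0
T = 1 / (2 * W)

def g(t):
    return np.sinc(W * t) ** 2        # np.sinc(x) = sin(pi x)/(pi x)

t = np.linspace(-0.25, 0.25, 5)       # arbitrary test points
k = np.arange(-2000, 2001)            # truncated sum over shifts
vals = np.array([np.sum(g(ti - k * T)) for ti in t])
print(vals)                            # each value close to 1/(T*W) = 2
```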
  • 31. Exercise 4.37: (a) Z |ˆs(f)| df = Z | X m ˆu(f + m T )rect(fT)| df ≤ Z X |u(ˆf + m T m )rect(fT)| df = Z |ˆu(f)| df, which shows that s(f) is L1 if ˆu(f) is. (b) The following sketch makes it clear that ˆu(f) is L1 and L2. In particular, Z |ˆu(f)| df = Z |ˆu(f)|2 df = 2 X k≥1 1 k2 < 1. 0 1 2 1 ˆu(f) ✲ 1✛/4 ✲ ✛1/9 ˆs(f) 2 6 4 0 1/2 It can be seen from the sketch of ˆs(f) that ˆs(f) = 2 from 18 to 12 and from −12 to −18, which is a set of measure 3/4. In general, for arbitrary integer k > 0, it can be seen that ˆs(f) = 2k from 1 to 1 and from 1 2(k+1)2 2k2 − 2k2 to − 1 2(k+1)2 . Thus ˆs(f) = 2k over a set of measure 2k+1 k2(k+1)2 . It follows that Z ˆ |s(f)|2 df = lim n→1 Xn k=1 (2k)2 2k + 1 k2(k + 1)2 = lim n→1 Xn k=1 4(2k + 1) (k + 1)2 ≥ lim n→1 Xn k=1 4(k + 1) (k + 1)2 = Xn k=1 4 k + 1 = 1. (c) Note that ˆu(f) = 1 for every positive integer value of f, and thus (for positive ≤) ˆu(f)f1+≤ approaches 1. It is 0 for other arbitrarily large values of f, and thus no limit exists. Exercise 4.38: Z 1 −1 |u(t)|2dt = 2 μ 1 + 1 22 + ∂ . 1 32 + ... This sum is finite so u(t) is L2. Now we’ll show that s(t) = X k u(k)sinc(t − k) = X k sinc(t − k) is neither L1 nor L2. Taking the Fourier Transform of s(t), ˆs(f) = X k rect(f)e−2πifk = rect(f) X k e−2πifk. To show that s(t) is not L1, Z 1 −1 |s(t)|dt = Z 1 −1 s(t)dt since s(t) ≥ 0 for all t = ˆs(0) = X k 1 = 1. 31
  • 32. To show that s(t) is not L2, Z 1 −1 |s(t)|2dt = Z 1 −1 | X k sinc(t − k)|2df = 1. Since u(k) is equal to 1 for every integer k, P k u2(k) = 1. The sampling theorem energy equation does not apply here °R |u(t)|2dt6= T P k |u(kT)|2 ¢ because ˆu(f) is not band-limited. 32
  • 33. Chapter 5 Exercise 5.1: The first algorithm starts with a set of vectors S = {v1, . . . , vm} that span V but are dependent. A vector vk ∈ S is selected that is a linear combination of the other vectors in S. vk is removed from S, forming a reduced set S0. Now S0 still spans V since each v ∈ V is a linear combination of vectors in S, and vk in that expansion can be replaced by its representation using the other vectors. If S0 is independent, we are done, and if not, the previous step is repeated with S0 replacing S. Since the size of S is reduced by 1 on each such step, the algorithm terminates with an independent spanning set, i.e., a basis. The second algorithm starts with an independent set S = {v1, . . . , vm} of vectors that do not span the space. An arbitrary nonzero vector vm+1 ∈ V is then selected that is not a linear combination of S (this is possible since S does not span V). It can be seen that S0 = {v1, . . . , vm+1} is an independent set. If S0 spans V, we are done, and if not, the previous step is repeated with S0 replacing S. With each repetition of this step, the independent set is increased by 1 vector until it eventually spans V. It is not immediately clear that the second algorithm ever terminates. To prove this and also prove that all bases of a finite dimensional vector space have the same number of elements, we describe a third algorithm. Let Sind = v1, . . . , vm be an arbitrary set of independent vectors and let Ssp = {u1, . . . , un} be a finite spanning set for V (which must exist by the finite dimensional assumption). Then, for k = 1, . . .m, successively add vk to Ssp and remove one of the original vectors uj of Ssp so that the remaining set, say S0sp is still a spanning set. This is always possible since the added element must be a linear combination of a spanning set, so the augmented set is linearly dependent. One of the original elements of Ssp can be removed (while maintaining the spanning property) since the newly added vector is not a linear combination of the previously added vectors. A contradiction occurs if m > n, i.e., if the independent set is larger than the spanning set, since no more than the n original vectors in the spanning set can be removed. We have just shown that every spanning set contains at least as many members as any inde-pendent set. Since every basis is both a spanning set and an independent set, this means that every basis contains the same number of elements, say b. Since every independent set contains at most b elements, algorithm 2 must terminate as a basis when S reaches b vectors. Exercise 5.3: Let the n vectors that uniquely span a vector space V be called v1, v2, . . . , vn. We will prove that the n vectors are linearly independent using proof by contradiction. Assume v1, v2, . . . , vn are linearly dependent. Then Pnj =1 αjvj = 0 for some set of scalars α1, α2, .., αn where not all the αjs equal zero. Say αk6= 0. We can express vk as a linear combination of the other n − 1 vectors {vj}j6=k: vk = X j6=k −αj αk vj . Thus vk P has two representations in terms of {v1, . . . , vn}. One is that above, and the other is vk = j βjvj where βk = 1 and βj = 0 for j6= k. Thus the representation is non-unique, demonstrating the contradiction. It follows that if n vectors uniquely span a vector space, they are also independent and thus form a basis. From Theorem 5.1.1, the dimension of V is then n. 33
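The first algorithm of Exercise 5.1, repeatedly deleting a vector that is a linear combination of the others until an independent spanning set remains, can be sketched directly (Python with numpy; using matrix rank as the dependence test is my choice, not the text's, and is only a finite-dimensional stand-in).

```python
import numpy as np

def reduce_to_basis(vectors):
    """Algorithm 1 of Exercise 5.1: delete dependent vectors until a basis remains."""
    vecs = [np.array(v, dtype=float) for v in vectors]
    span_dim = np.linalg.matrix_rank(np.vstack(vecs))   # unchanged by the deletions below
    removed = True
    while removed:
        removed = False
        for i in range(len(vecs)):
            others = vecs[:i] + vecs[i + 1:]
            if others and np.linalg.matrix_rank(np.vstack(others)) == span_dim:
                vecs.pop(i)        # vecs[i] was a linear combination of the others
                removed = True
                break
    return vecs

spanning = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
basis = reduce_to_basis(spanning)
print(len(basis), basis)           # 3 independent vectors that still span R^3
```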
  • 34. Exercise 5.6: kv + uk2 = hv + u, v + ui = hv, v + ui + hu, v + ui by axiom (b) = hv, vi + hv, ui + hu, vi + hu, ui by axiom (b) ≤ |hv, vi > | + |hv, ui| + |hu, vi| + |hu, ui| ≤ kvk2 + kvkkuk + kukkvk + kuk2 = (kvk + kvk)2. So kv + uk ≤ kvk + kuk. Exercise 5.8: (a) By direct substitution of u(t) = P k,m ˆuk,mθk,m(t) and v∗(t) = P k,m ˆv∗k,mθ∗k,m(t) into the inner product definition hu, vi = Z 1 −1 u(t)v∗(t) dt = Z 1 −1 X k,m ˆuk,mθk,m(t) X k0,m0 ˆv∗k0,m0θ∗k0,m0(t) dt = X k,m ˆuk,m X k0,m0 ˆv∗k0,m0 Z 1 −1 θk,m(t)θ∗k0,m0(t) dt = T X k,m ˆuk,mˆv∗k,m. (b) For any real numbers a and b, 0 ≤ (a + b)2 = a2 + 2ab + b2. It follows that ab ≤ 12 a2 + 12 b2. Applying this to |ˆuk,m| and |ˆvk,m|, we see that |uk,mv∗k,m| = |uk,m| |v∗k,m| ≤ 1 2|uk,m|2 + 1 2|vk,m|2. Thus, using part (a), |hu, vi| ≤ T X k,m |uk,mv∗k,m| ≤ T 2 X k,m |uk,m|2 + T 2 X k,m |vk,m|2. Since u and v are L2, the latter sums above are finite, so |hu, vi| is also finite. (c) It is necessary for inner products in an inner-product space to be finite since, by definition of a complex inner-product space, the inner product must be a complex number, and the set of complex numbers (just like the set of real numbers) does not include 1. This seems like a technicality, but it is central to the special properties held by finite energy functions. Exercise 5.9: (a) For V to be a vector subspace, it is necessary for v = 0 to be an element of V, and this is only possible in the special case where ku1k = ku2k. Even in this case, however, V is not a vector space. This will be shown at the end of part (b). It will be seen in studying detection in Chapter 8 that V is an important set of vectors, subspace or not. 34
  • 35. (b) V can be rewritten as V = {v : kv −u1k2 = kv −u2k2}. Expanding these energy differences for k = 1, 2, kv − ukk2 = kvk2 − hv, uki − huk, vi + kukk2 = kvk2 + kukk2 − 2<(hv, uki). It follows that v ∈ V if and only if kvk2 + ku1k2 − 2<(hv, u1i) = kvk2 + ku2k2 − 2<(hv, u2i). Rearranging terms, v ∈ V if and only if <(hv, u2 − u1i) = ku2k2 − ku1k2 2 . (19) Now to complete part (a), assume ku2k2 = ku1k2 (which is necessary for V to be a vector space) and assume u16= u2 to avoid the trivial case where V is all of L2. Now let v = i(u2 − u1). Thus hv, u2 − u1i is pure imaginary so that v ∈ V. But iv is not in V since hiv, u2 − u1i = −ku2 −u1k26= 0. In a vector subspace, multiplication by a scalar (in this case i) yields another element of the subspace, so V is not a subspace except in the trivial case where u1 = u2¿ (c) Substituting (u1 + u2)/2 for v, we see that kv − u1k = k(u2 − u2)k and kv − u2k = k(−u2 + u2)k, so kv − u1k = kv − u2k and consequently v ∈ V. (d) The geometric situation is more clear if the underlying class of functions is the class of real L2 functions. In that case V is a subspace whenever ku1k = ku2k. If ku1k6= ku2k, then V is a hyperplane. In general, a hyperplane H is defined in terms of a vector u and a subspace S as H = {v : v = u + s for some s ∈ S}. In R2 a hyperplane is a straight line, not necessarily through the origin, and in R3, a hyperplane is either a plane or a line, neither necessarily including the origin. For complex L2, V is not a hyperplane. Part of the reason for this exercise is to see that real L2 and complex L2, while similar in may aspects, are very different in other aspects, especially those involving vector subspaces. Exercise 5.12: (a) To show that S⊥ is a subspace of V, we need to show that for any v1, v2 ∈ S⊥, αv1+βv2 ∈ S⊥ for all scalars α, β. If v1, v2 ∈ S⊥, then for all w ∈ S, hαv1 + βv2,wi = αhv1,wi+βhv2,wi = 0 + 0. Thus αv1 + βv2 ∈ S⊥ and S⊥ is a subspace of V. (b) By the Projection Theorem, for any u ∈ V, there is a unique vector u|S ∈ S such that hu − u|S, si = 0 for all s ∈ S. So u⊥S = u − u|S ∈ S⊥S and we have a unique decomposition of u into u = u|S + u⊥S. (c) Let V and S (where S < V ) denote the dimensions of V and S respectively. Start with a set of V independent vectors s1, s2 · · · sV ∈ V. This set is chosen so that the first S of these i.e. s1, s2 · · · sS are in S. The first S orthonormal vectors obtained by Gram-Schmidt procedure will be a basis for S. The next V − S orthonormal vectors obtained by the procedure will be a basis for S⊥. Exercise 5.14: (a) Assume throughout this part that m, n are positive integers, m > n. We will show, as case 1, that if the left end, am, of the pulse gm(t) satisfies am < an, then am + 2−m−1 < an, i.e., 35
  • 36. the pulses do not overlap at all. As case 2, we will show that if am ∈ (an, , an + 2−n−1), then am + 2−m−1 ∈ [an, an + 2−n−1], i.e., the pulses overlap completely. Case 1: Let dm be the denominator of the rational number am (in reduced form). Thus (since andn and amdm are integers), it follows that if am < an, then also am + 1 dndm ≤ an. Since dn ≤ dm ≤ m for m ≥ 3, we have am + 1 m2 ≤ an for m ≥ 3. Since 1/m2 > 2−m−1 for m ≥ 3, it follows that am + 2−m−1 ≤ an. Thus, if am < an, gm and gn do not overlap for any m > 3. Since g2 does not overlap g1 by inspection, there can be no partial overlap for any am < an. Case 2: Apologies! This is very tedious. Assume that am ∈ (an, an + 2−n−1). By the same argument as above, am ≥ an + 1 dndm and am + 1 dmd0n ≤ an + 2−n−1 (20) where d0n is the denominator of an + 2−n−1. Combining these inequalities, 1 dndm < 2−n−1. (21) We now separate case 2 into three subcases. First, from inspection of Figure 5.3 in the text, there are no partial overlaps for m < 8. Next consider m ≥ 8 and n ≤ 4. From the right side of (20), there can be no partial overlap if 2−m−1 ≤ 1 dmd0n condition for no partial overlap. (22) From direct evaluation, we see that d0n ≤ 48 for n ≤ 4. Now dm2−m−1 is 5/512 for m = 8 and is decreasing for m ≥ 8. Since 5/512 < 1/48, there is no partial overlap for n ≤ 4,m ≥ 8. Next we consider the general case where n ≥ 5. From (21), we now derive a general condition on how small m can be for m, n pairs that satisfy the conditions of case 2. Since m ≥ dm for m ≥ 3, we have m > 2n+1 dn (23) For n ≥ 5, 2n+1/dn ≥ 2n + 2, so the general case reduces to n ≥ 5 and m ≥ 2n + 2. Next consider the condition for no partial overlap in (22). Since d0n ≤ 2n+1dn ≤ 2n+1n and dm ≤ m, the following condition also implies no partial overlap: m2−m−1 ≤ 2−n−1 n (24) The left side of (24) is decreasing in m, so if we can establish (24) for m = 2n+2, it is established for all m ≥ 2n+2. The left side, for m = 2n+2 is (2n+2)2−2m−3. Thus all that remains is to show that (2n + 2)n ≤ 2n+2. This, however, is obvious for n ≥ 5. Exercise 5.15: Using the same notation as in the proof of Theorem 4.5.1, u(n)(t) = Xn m=−n Xn k=−n ˆuk,mθk,m(t) ˆu(n)(f) = Xn m=−n Xn k=−n ˆuk,m√k,m(f). 36
  • 37. Since √k,m(f) is the Fourier transform of θk,m(t) for each k,m, the coefficients ˆuk,m are the same in each expansion. In the same way, v(n)(t) = Xn m=−n Xn k=−n ˆvk,mθk,m(t) ˆv(n)(f) = Xn m=−n Xn k=−n ˆvk,m√k,m(f). It is elementary, using the orthonormality of the θk,m and the orthonormality of the √k,m to see that for all n > 0, hu(n), v(n)i = Xn m=−n Xn k=−n ˆuk,mv∗k,m = hˆu (n), ˆv(n)i. (25) Thus our problem is to show that this same relationship holds in the limit n → 1. We know (from Theorem 4.5.1) that l.i.m.n→1u(n) = u, with the corresponding limits for v(n),ˆu (n), and ˆv(n). Using the Schwarz inequality on the second line below, and Bessel’s inequality on the third, |hu(n), vi − hu(n), v(n)i| = |hu(n), v − v(n)i| ≤ ku(n)kkv − v(n)k ≤ kukkv − v(n)k. Since limn→1 kv −v(n)k = 0, we see that limn→1 |hu(n), vi−hu(n), v(n)i| = 0. In the same way, limn→1hu(n), vi − hu, vi| = 0. Combining these limits, and going through the same operations on the transform side, lim n→1hu(n), v(n)i = hu, vi lim n→1hˆu (n), ˆv(n)i = hˆu , ˆvi. (26) Combining (25) and (26), we get Parsevals relation for L2 functions, hu, vi = hˆu , ˆvi. Exercise 5.16: (a) Colloquially, lim|f|→1 ˆu(f)|f|1+≤ = 0 means that ØØ ˆu(f)||f|1+≤ ØØ becomes and stays increas-ingly small as |f| becomes large. More technically, it means that for any δ > 0, there is an A(δ) such that ØØ ˆu(f)||f|1+≤ ØØ ≤ δ for all f such that |f| ≥ A(δ). Choosing δ = 1 and A = A(1), we see that |ˆu(f)| ≤ |f|−1−≤ for |f| ≥ A. (b) Z 1 −1 |ˆu(f)| df = Z |f|>A |ˆu(f)| df + Z |f|≤A |ˆu(f)| df ≤ 2 Z 1 A f−1−≤ df + Z A −A |ˆu(f)| df = 2A−≤ ≤ + Z A −A |ˆu(f)| df. Since ˆu(f) is L2, its truncated version to [−A,A] is also L1, so the second integral is finite, showing that ˆu(f) (untruncated) is also L1. In other words, one role of the ≤ above is to make ˆu(f) decreases quickly enough with increasing f to maintain the L1 property. 37
  • 38. (c) Recall that ˆs(n)(f) = P |m|≤n ˆsm(f) where ˆsm(f) = ˆu(f − m)rect†(f). Assuming A to be integer and m0 > A, |ˆsm0(f)| ≤ (m0 − 1)−1−≤. Thus for f ∈ (−1/2, 1/2] |ˆs(n)(f)| ≤ ØØØØØØ X |m|≤A ØØØØØØ ˆu(f −m) + X (|m0| − 1)−1−≤ |m0|>A = ØØØØØØ X |m|≤A ØØØØØØ ˆu(f −m) + X m≥A 2m−1−≤. (27) 1 The factor →of 2 1 →above was omitted by error from the exercise statement. Note that since the final sum converges, this is independent of n and is thus an upper bound on |s(ˆf)|. Now visualize the 2A + 1 terms in the first sum above as a vector, say a. Let be the vector of 2A + 1 ones, so P P that ha,i = ak. Applying the Schwarz inequality to this, | k ak| ≤ kakk→1 k. Substituting this into (27), |ˆs(f)| ≤ s (2A + 1) X |m|≤A |ˆu(f +m)|2 + X m≥A 2m−1−≤. (28) (d) Note that for any complex numbers a and b, |a + b|2 ≤ |a + b|2 + |a − b|2 = 2|a|2 + 2|b|2. Applying this to (28), |ˆs(f)|2 ≤ (4A + 2) X |m|≤A |ˆu(f +m)|2 +   X m≥A 2m−1−≤   2 . Since ˆs(f) is nonzero only in [1/2, 1/2] we can demonstrate that ˆs(f) is L2 by showing that the integral of |ˆs(f)|2 over [−1/2, 1/2] is finite. The integral of the first term above is 4A + 2 times the integral of |ˆu(f)|2 from −A − 1/2 to A + 1/2 and is finite since ˆu(f) is L2. The integral of the second term is simply the second term itself, which is finite. 38
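The tail bound used in part (b) of Exercise 5.16 can be checked numerically. The following short Python sketch (not part of the original solutions; the values of A and ε below are arbitrary test choices) confirms that 2∫_A^∞ f^(−1−ε) df = 2A^(−ε)/ε, which is what forces û(f) to be L1 once it decays at least as fast as |f|^(−1−ε) beyond A.

import numpy as np
from scipy.integrate import quad

for A in (1.0, 2.0, 10.0):
    for eps in (0.1, 0.5, 1.0):
        # numerical tail integral 2 * int_A^inf f^{-1-eps} df
        tail, _ = quad(lambda f: f**(-1 - eps), A, np.inf)
        closed = A**(-eps) / eps                      # closed form A^{-eps}/eps
        print(f"A={A:5.1f} eps={eps:3.1f}  2*quad={2*tail:.6f}  2*A^-eps/eps={2*closed:.6f}")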
• 39. Chapter 6
Exercise 6.1: Let U_k be a standard M-PAM random variable where the M points each have probability 1/M. Consider the analogy with a uniform M-level quantizer used on a uniformly distributed rv U over the interval [−Md/2, Md/2].
[Figure: the interval is divided into M = 6 regions R_1, ..., R_6, each of width d, with the quantization points a_1, ..., a_6 at the region centers.]
Let Q be the quantization error for the quantizer and U_k be the quantization point. Thus U = U_k + Q. Observe that for each quantization point the quantization error is uniformly distributed over [−d/2, d/2]. This means that Q is zero mean and statistically independent of the quantization point U_k. It follows that
E[U^2] = E[(Q + U_k)^2] = E[U_k^2] + E[Q^2] = E[U_k^2] + d^2/12.
On the other hand, since U is uniformly distributed, E[U^2] = (dM)^2/12. It then follows that
E[U_k^2] = d^2(M^2 − 1)/12.
Verifying the formula for M = 4:
E_S = [2(d/2)^2 + 2(3d/2)^2]/4 = (5/4) d^2;   d^2(M^2 − 1)/12 = (5/4) d^2.
Verifying the formula for M = 8:
E_S = [2(d/2)^2 + 2(3d/2)^2 + 2(5d/2)^2 + 2(7d/2)^2]/8 = (21/4) d^2;   d^2(M^2 − 1)/12 = (21/4) d^2.
Exercise 6.3: (a) Since the received signal is decoded to the closest PAM signal, the intervals decoded to each signal are indicated below.
[Figure: decision regions R_1, ..., R_4, each of width d, around the 4-PAM points a_1 = −3d/2, a_2 = −d/2, a_3 = d/2, a_4 = 3d/2.]
39
  • 40. Thus if Uk = a1 is transmitted, an error occurs if Zk ≥ d/2. The probability of this is Q(d/2) where Q(x) = Z 1 x 1 √2π exp(−z2/2). If Uk = a2 is transmitted, an error occurs if either Zk ≥ d/2 or Zk < −d/2, so, using the symmetry of the Gaussian density, the probability of an error in this case is 2Q(d/2). In the same way, the error probability is 2Q(d/2) for a3 and Q(d/2) for a4. Thus the overall error probability is (3/2)Q(d/2). (b) Now suppose the third point is moved to d/2+≤. This moves the decision boundary between R3 and R4 by ≤/2 and similarly moves the decision boundary between R2 and R3 by ≤/2. The error probability then becomes Pe(≤) = 1 2 Σ Q(d/2) + Q(d + ≤ 2 ) + Q(d − ≤ 2 Π . ) dP≤(e) d≤ = 1 4 Σ 1 √2π exp(−(d + ≤)2/2) − 1 √2π Π . exp(−(d − ≤)2/2) This is equal to 0 at ≤ = 0, as can be seen by symmetry without actually taking the derivitive. (c) With the third signal point at d/2 + ≤, the signal energy is ES = 1 4 "μ d 2 ∂2 + μ d + ≤ 2 ∂2 + 2 μ 3d 2 ∂2 # . The derivitive of this with respect to ≤ is (d + ≤)/8. (d) This means that to first order in ≤, the energy can be reduced by reducing a3 without changing Pe. Thus moving the two inner points slightly inward provides better energy efficiency for 4-PAM. This is quite counter-intuitive. The difference between optimizing the points in 4-PAM and using standard PAM is not very significant, however. At 10 dB signal to noise ratio, the optimal placement of points (which requires considerably more computation) makes the ratio of outer points to inner points 3.15 instead of 3, but it reduces error probability by less than 1%. Exercise 6.4: (a) If for each j, Z 1 −1 u(t)dj(t) dt = Z 1 −1 1X k=1 ukp(t−kT)dj(t) dt = 1X k=1 uk Z 1 −1 p(t−kT)dj(t) dt = uj , then it must be that R 1 −1 p(t−kT)dj(t) dt = hp(t − kT), dj(t)i has the value one for k = j and the value zero for all k6= j. That is, dj(t) must be orthogonal to p(t − kT) for all k6= j. 40
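Returning to Exercise 6.3: the first-order claims in parts (b)-(d) are easy to check numerically. The sketch below (not part of the original solution; d = 2 and the step sizes are arbitrary) estimates dPe/dε and dES/dε at ε = 0 by central differences, confirming that the error probability is unchanged to first order while the energy derivative is (d + ε)/8 > 0, so moving a3 slightly inward saves energy without changing Pe to first order.

import numpy as np
from scipy.stats import norm

d = 2.0                        # arbitrary spacing; it only sets the scale
Q = lambda x: norm.sf(x)       # Q(x) = Pr(N(0,1) > x)

def Pe(eps):
    # Error probability from part (b) with the third point at d/2 + eps
    return 0.5 * (Q(d/2) + Q((d + eps)/2) + Q((d - eps)/2))

def Es(eps):
    # Average signal energy from part (c)
    return 0.25 * ((d/2)**2 + ((d + eps)/2)**2 + 2*(3*d/2)**2)

for eps in (1e-2, 1e-3):
    dPe = (Pe(eps) - Pe(-eps)) / (2*eps)   # central difference ~ dPe/d eps at 0
    dEs = (Es(eps) - Es(-eps)) / (2*eps)   # should be close to d/8
    print(f"eps={eps:g}: dPe/deps ~ {dPe:.2e}, dEs/deps ~ {dEs:.4f} (d/8 = {d/8})")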
  • 41. (b) Since hp(t − kT), d0(t)i = 1 for k = 0 and equals zero for k6= 0, it follows by shifting each function by jT that hp(t − (k − j)T), d0(t)i equals 1 for j = k and 0 for j6= k. It follows that dj(t) = d0(t − jT). (c) In this exercise, to avoid ISI (intersymbol interference), we pass u(t) through a bank of filters d0(−t), d1(−t) . . . dj(−t) . . . , and the output of each filter at time t = 0 is u0, u1 . . . uj . . . respectively. To see this, note that the output of the j-th filter in the filter bank is rj(t) = 1X k=1 uk Z 1 −1 p(τ−kT)dj(−t + τ ) dτ. At time t = 0, rj(0) = 1X k=1 uk Z 1 −1 p(τ−kT)dj(τ ) dτ = uj . Thus, for every j, to retrieve uj from u(t), we filter u(t) through dj(−t) and look at the output at t = 0. However, from part (b), since dj(t) = d0(t−jT) (the j-th filter is just the first filter delayed by jT). Rather than processing in parallel through a filter bank and looking at the value at t = 0, we can process serially by filtering u(t) through d0(−t) and looking at the output every T. To verify this, note that the output after filtering u(t) through d0(−t) is r(t) = 1X k=1 uk Z 1 −1 p(τ−kT)d0(−t + τ ) dτ, and so for every j, r(jT) = 1X k=1 uk Z 1 −1 p(τ−kT)d0(τ − jT) dτ = uj . Filtering the received signal through d0(−t) and looking at the values at jT for every j is the same operation as filtering the signal through q(t) and then sampling at jT. Thus, q(t) = d0(−t). Exercise 6.6: (a) g(t) must be ideal Nyquist, i.e., g(0) = 1 and g(kT) = 0 for all non-zero integer k. The existence of the channel filter does not change the requirement for the overall cascade of filters. The Nyquist criterion is given in the previous problem as Eq. (??). P (b) It is possible, as shown below. There is no ISI if the Nyquist criterion m ˆg(f +2m) = 12 for |f| ≤ 1 is satisfied. Since ˆg(f) = ˆp(f)ˆh (f)ˆq(f), we know that ˆg(f) is zero where ever ˆh (f) = 0. In particular, ˆg(f) must be 0 for |f| > 5/4 (and thus for f ≥ 2). Thus we can use the band edge symmetry condition, ˆg(f) + ˆg(2 − f) = 1/2 over 0 ≤ f ≤ 1. Since ˆg(f) = 0 for 3/4 < f ≤ 1, it is necessary that ˆg(f) = 1/2 for 1 < f ≤ 5/4. Similarly, since ˆg(f) = 0 for f > 5/4, we must satisfy ˆg(f) = 1/2 for |f| < 3/4. Thus, to satisfy the Nyquist criterion, ˆg(f) is uniquely specified as below. 41
  • 42. ˆg(f 0.5 ) 0 34 54 In the regions where ˆg(f) = 1/2, we must choose ˆq(f) = 1/[2ˆp(f)ˆh (f)]. Elsewhere ˆg(f) = 0 because ˆh (f) = 0, and thus ˆq(f) is arbitrary. More specifically, we must choose ˆq(f) to satisfy ˆq(f) =   0.5, |f| ≤ 0.5; 1 3−2|f| , 0.5 < |f| ≤ 0.75 1 3−2|f| , 1 ≤ |f| ≤ 5/4 It makes no difference what ˆq(f) is elsewhere as it will be multiplied by zero there. (c) Sinceˆh (f) = 0 for f > 3/4, it is necessary that ˆg(f) = 0 for |f| > 3/4. Thus, for all integers m, ˆg(f + 2m) is 0 for 3/4 < f < 1 and the Nyquist criterion cannot be met. (d) If for some frequency f, ˆp(f)ˆh (f)6= 0, it is possible for ˆg(f) to have an arbitrary value by choosing ˆq(f) appropriately. On the other hand, if ˆp(f)ˆh (f) = 0 for some f, then ˆg(f) = 0. Thus, to avoid ISI, it is necessary that for each 0 ≤ f ≤ 1/(2T), there is some integer m such that ˆh (f+m/T)ˆp(f+m/T)6= 0. Equivalently, it is necessary that P m ˆh (f+m/T)ˆp(f+m/T)6= 0 for all f. There is one peculiarity here that you were not expected to deal with. If ˆp(f)ˆh (f) goes through zero at f0 with some given slope, and that is the only f that can be used to satisfy the Nyquist criterion, then even if we ignore the point f0, the response ˆq(f) would approach infinity fast enough in the vicinity of f0 that ˆq(f) would not be L2. This overall problem shows that under ordinary conditions (i.e.non-zero filter transforms), there is no problem in choosing ˆq(f) to avoid intersymbol interference. Later, when noise is taken into account, it will be seen that it is undesirable for ˆq(f) be very large where ˆp(f) is small, since this amplifies the noise in frequency regions where there is very little signal. Exercise 6.8: (a) With α = 1, the flat part of ˆg(f) disappears. Using T = 1 and using the familiar formula cos2 x = (1 + cos 2x)/2, ˆg1(f) becomes ˆg1(f) = 1 2 Σ 1 + cos(πf 2 Π rect(f ) 2 ). Writing cos x = (1/2)[eix + e−ix] and using the frequency shift rule for Fourier transforms, we 42
  • 43. get g1(t) = sinc(2t) + 1 2 sinc(2t + 1) + 1 2 sinc(2t − 1) = sin(2πt) 2πt + 1 2 sin(π(2t + 1)) π(2t + 1) + 1 2 sin(π(2t − 1)) π(2t − 1) = sin(2πt) 2πt − 1 2 sin(2πt) π(2t + 1) − 1 2 sin(2πt) π(2t − 1) = sin(2πt) 2π Σ 1 t − 1 2t + 1 − 1 2t − 1 Π = sin(2πt) 2πt(1 − 4t2) = sin(πt) cos(πt) πt(1 − 4t2) = sinc(t) cos(πt) (1 − 4t2) . This agrees with (6.18) in the text for α = 1, T = 1. Note that the denominator is 0 at t = ±0.5. The numerator is also 0, and it can be seen from the first equation above that the limiting value as t → ±0.5 is 1/2. Note also that this approaches 0 with increasing t as 1/t3, much faster than sinc(t). (b) It is necessary to use the result of Exercise 6.6 here. As shown there, the inverse transform of a real symmetric waveform gα(f) that satisfies the Nyquist criterion for T = 1 and has a rolloff of α ≤ 1 is equal to sinc(t)v(t). Here v(t) is lowpass limited to α/2 and its transform ˆv(f) is given by the following: ˆv(f + 1/2) = dˆg(f) df for −(1 + α) 2 < f < (1 − α) 2 . That is, we take the derivitive of the leading edge of ˆg(f), from −(1 + α)/2 to −(1 − α)/2 and shift by 1/2 to get ˆv(f). Using the middle expression in (6.17) of the text, and using the fact that cos2(x) = (1 + cos 2x)/2, ˆv(f + 1/2) = 1 2 d df Σ 1 + cos μ π(−f − (1 − α)/2) α ∂Π for f in the interval (−(1 + α)/2,−(1 − α)/2). Shifting by letting s = f + 12 , ˆv(s) = 1 2 d ds cos Σ −πs α − π 2 Π = 1 2 d ds sin hπs α i = π 2α cos hπs α i 12 for s ∈ (−α/2, α/2). Multiplying this by rect(s/α) gives us an expression for v(ˆs) everywhere. Using cos x = (eix + e−ix) allows us to take the inverse transform of v(ˆs), getting v(t) = π 4 [sinc(αt + 1/2) + sinc(αt − 1/2)] = π 4 Σ sin(παt + π/2) παt + π/2 + sin(παt − π/2) παt − π/2 Π . Using the identity sin(x + π/2) = cos x again, this becomes v(t) = 1 4 Σ cos(παt) αt + 1/2 − cos(παt) αt − 1/2 Π = cos(παt) 1 − 4α2t2 . Since g(t) = sinc(t/a)v(t) the above result for v(t) corresponds with (6.18) for T = 1. 43
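As a numerical aside (not part of the original solution), the closed form just derived for α = 1, T = 1 can be checked directly: g1(0) = 1, g1(k) vanishes at the other integers (the ideal Nyquist property), and the removable singularity at t = ±1/2 has the limit 1/2 noted above.

import numpy as np

def g1(t):
    # g1(t) = sinc(t) cos(pi t) / (1 - 4 t^2); np.sinc(t) = sin(pi t)/(pi t)
    t = np.asarray(t, dtype=float)
    return np.sinc(t) * np.cos(np.pi * t) / (1 - 4 * t**2)

print("g1(0) =", g1(0.0))
print("g1(k), k = 1..5:", np.round(g1(np.arange(1.0, 6.0)), 12))   # essentially zero
print("g1(0.5 - 1e-6) =", round(float(g1(0.5 - 1e-6)), 6))          # limit is 1/2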
  • 44. (c) The result for arbitrary T follows simply by scaling. Exercise 6.9: (a) The figure is incorrectly drawn in the exercise statement and should be as follows: 1 1 2 −1 − 2 3 − 4 3 − 2 7 4 −1 4 0 1 4 3 4 1 2 7 4 3 2 In folding these pulses together to check the Nyquist criterion, note that each pulse on the positive side of the figure folds onto the interval from −1/2 to −1/4, and each pulse of the left folds onto 1/4 to 1/2, Since there are k of them, each of height 1/k, they add up to satisfy the Nyquist criterion. (b) In the limit k → 1, the height of each pulse goes to 0, so the pointwise limit is simply the middle pulse. Since there are 2k pulses, each of energy 1/(4k2), the energy difference between that pointwise limit and ˆgk(f) is 1/(2k), which goes to 0 with k. Thus the pointwise limit and the L2 limit both converge to a function that does not satisfy the Nyquist criterion for T = 1 and is not remotely close to a function satisfying the Nyquist condition. Note also that one could start with any central pulse and construct a similar example such that the limit satisfies the Nyquist criterion. Exercise 6.11: (a) Note that xk(t) = 2<{exp(2πi(fk + fc)t)} = cos[2π(fk + fc)t]. The cosine function is even, and thus x1(t) = x2(t) if f1 + fc = −f2 − fc. This is the only possibility for equality unless f1 = f2. Thus, the only f26= f1 for which x1(t) = x2(t) is f2 = −2fc −f1. Since f1 > −fc, this requires f2 < −fc, which is why this situation cannot arise when fk ∈ [−fc, fc) for each k. (b) For any ˆu1(f), one can find a function ˆu2(f) by the transformation f2 = −2fc − f1 in (a). Thus without the knowledge that u1(t) is lowpass limited to some B < fc, the ambiguous frequency components in u1(t) cannot be differentiated from those of u2(t) by observing x(t). If u(t) is known only to be bandlimited to some band B greater than fc, then the frequenices between −B and B − 2fc are ambiguous. An easy way to see the problem here is to visualize ˆu(f) both moved up by fc and down by fc. The bands overlap if B > fc and the overlapped portion can not be retrieved without additional knowledge about u(t). (c) The ambiguity is obvious by repeating the argument in (a). Now, since y(t) has some nonzero bandwidth, ambiguity might be possible in other ways also. We have already seen, however, that if u(t) has a bandwidth less than fc, then u(t) can be uniquely retrieved from x(t) in the absence of noise. (d) For u(t) real, x(t) = 2u(t) cos(2πfct), so u(t) can be retrieved by dividing x(t) by 2 cos(2πfct) except at those points of measure 0 where the cosine function is 0. This is not a reasonable approach, especially in the presence of noise, but it points out that the PAM case is essentially different from the QAM case 44
  • 45. (e) Since u∗(t) exp(2πifct) has energy at positive frequencies, the use of a Hilbert filter does not have an output equal to u(t) exp(2πifct), and thus u(t) does not result from shifting this output down by fc. In the same way, the bands at 2fc and −2fc that result from DSB-QC demodulation mix with those at 0 frequency, so cannot be removed by an ordinary LTI filter. For QAM, this problem is to be expected since u(t) cannot be uniquely generated by any means at all. For PAM it is surprising, since it says that these methods are not general. Since all time-limited waveforms are unbounded in frequency, it says that there is a fundamental theoretical problem with the standard methods of demodulation. This is not a problem in practice, since fc is usually so much larger than the nominal bandwidth of u(t) that this problem is of no significance. Exercise 6.13: (a) Since u(t) is real, φ1(t) = <{u(t)e2πifct} = u(t) cos(2πfct), and since v(t) is pure imaginary, φ2(t) = <{v(t)e2πifct} = [iv(t)] sin(2πifct). Note that [iv(t)] is real. Thus we must show that Z u(t) cos(2πfct)[iv(t)] sin(2πfct) dt = Z u(t)[iv(t)] sin(4πfct) dt = 0. Since u(t) and v(t) are lowpass limited to B/2, their product (which corresponds to convolution in the frequency domain) is lowpass limited to B < 2fc. Rewriting the sin(4πfct) above in terms of complex exponentials, and recognizing the resulting integral as the Fourier transform of u(t)[iv(t)] at ±2fc, we see that the above integral is indeed zero. (b) Almost anything works here, and a simple choice is u(t) = [iv(t)] = rect(8fct − 1/2). Exercise 6.15: (a) Z 1 −1 √2p(t − jT) cos(2πfct)√2p∗(t − kT) cos(2πfct)dt = Z 1 −1 p(t − jT)p∗(t − kT)[1 + cos(4πfct)]dt = Z 1 −1 p(t − jT)p∗(t − kT)dt + Z 1 −1 p(t − jT)p∗(t − kT) cos(4πfct)dt = δjk + 1 2 Z 1 −1 p(t − jT)p∗(t − kT) h e4πifct + e−4πifct i dt. The remaining task is to show that the integral above is 0. Let gjk(t) = p(t − jT)g∗(t − kT). Note that ˆgjk(f) is the convolution of the transform of p(t − jT) and that of p∗(t − kT). Since p is lowpass limited to fc, gjk is lowpass limited to 2fc, and thus the integral (which calculates the Fourier transform of gjk at 2fc and −2fc) is zero. 45
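A numerical illustration of part (a) of Exercise 6.15 (not in the original solution): with p(t) = sinc(t/T)/√T, T = 1 and fc = 5, the inner products of the modulated pulses √2 p(t − jT) cos(2πfct) come out as approximately δjk, and the corresponding sine-modulated pulse is approximately orthogonal to them. The integration interval and step are arbitrary choices; the small residuals come from truncating the slowly decaying sinc tails.

import numpy as np

T, fc = 1.0, 5.0                      # carrier well above the band [-1/(2T), 1/(2T)]
dt = 0.01
t = np.arange(-400.0, 400.0, dt)

def p(tt):
    # orthonormal shifts: <p(t - jT), p(t - kT)> = delta_jk
    return np.sinc(tt / T) / np.sqrt(T)

def phi_cos(j):
    return np.sqrt(2) * p(t - j*T) * np.cos(2*np.pi*fc*t)

psi_sin0 = np.sqrt(2) * p(t) * np.sin(2*np.pi*fc*t)

for j in range(3):
    row = [np.sum(phi_cos(j) * phi_cos(k)) * dt for k in range(3)]
    print("cos-cos inner products, j =", j, ":", np.round(row, 3))
print("cos-sin inner product, j = k = 0:", round(np.sum(phi_cos(0) * psi_sin0) * dt, 3))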
  • 46. (b) Similar to part (a) we get, Z 1 −1 √2p(t − jT) sin(2πfct)√2p∗(t − kT) sin(2πfct)dt = Z 1 −1 p(t − jT)p∗(t − kT)[1 − cos(4πfct)]dt = Z 1 −1 p(t − jT)p∗(t − kT)dt − Z 1 −1 p(t − jT)p∗(t − kT) cos(4πfct)dt = δjk − 1 2 Z 1 −1 gjk h e4πifct + e−4πifct i dt. Again, the integral is 0 and orthonormality is proved. Now for any j, k Z 1 −1 √2p(t − jT) sin(2πfct)√2p∗(t − kT) cos(2πfct)dt = Z 1 −1 p(t − jT)p∗(t − kT) sin(4πfct)dt = 1 2i Z 1 −1 gjk h e4πifct − e−4πifct i dt, which is zero as before. Thus all sine terms are orthonormal to all cosine terms. Exercise 6.16: Let √k(t) = θk(t)e2πifct. Since Z √k(t)√∗j ((t) dt = Z θk(t)e2πifctθ∗j (t)e−2πifctdt = δkj , the set {√k(t)} is an orthonormal set. The set {√∗k(t)} is also an orthonormal set by the same reason. Also, since each √k(t) is bandlimited to [fc−B/2, fc+B/2] and each √∗k(t) is bandlimited to [−fc − B/2, −fc + B/2], the frequency bands do not overlap, so by Parsival’s relation, each √k(t) is orthonormal to each √∗j (t). This is where the constraint B/2 < fc has been used. Next note that the sets of functions √k,1(t) = <{2√k(t)} and √k,2(t) = ={2√k(t)} are given by √k,1(t) = √k(t) + √∗k(t) and i√k,2(t) = √k(t) − √∗k(t). It follows that the set {√k,1(t)} is an orthogonal set, each of energy 2, and the set {√k,2(t)} is an orthogonal set, each of energy 2. By the same reason, for each k, j with k6= j, √k,1 and √j,2 are orthogonal. Finally, for each k, and for each ` = {1, 2}, Z √k,`√k,` dt = Z |√k(t)|2 + |√∗k(t)|2 dt = 2. Exercise 6.17: (a) This expression is given in (6.25) of the text. (b) Note that the hypothesized expression for x(t) is 2|u(t)| cos[2πfct + φ(t)] = 2|u(t)| cos[φ(t)] cos(2πfct) − 2|u(t)| sin[φ(t)] sin(2πfct). 46
• 47. Since u(t) = |u(t)|e^{iφ(t)}, ℜ{u(t)} = |u(t)| cos[φ(t)] and ℑ{u(t)} = |u(t)| sin[φ(t)], so the hypothesized expression agrees with (6.25). Assuming that φ(t) varies slowly with respect to f_c t, x(t) varies between 2|u(t)| and −2|u(t)|, touching each of them once per cycle.
(c) Since |exp(2πif t)| = 1 for any real f, |u'(t)| = |u(t)| = |x^+(t)|. Thus this envelope varies with the baseband waveform and is defined independent of the carrier. The phase modulation (as well as x(t)) does depend on the carrier.
(d) Since 2πf_c t + φ(t) = 2πf_c' t + φ'(t), the phases φ(t) and φ'(t) are related by φ'(t) = φ(t) + 2π(f_c − f_c')t.
(e) There are two reasonable approaches to this. First,
x^2(t) = 4|u(t)|^2 cos^2[2πf_c t + φ(t)] = 2|u(t)|^2 + 2|u(t)|^2 cos[4πf_c t + 2φ(t)].
Filtering out the term at 2f_c, we are left with 2|u(t)|^2. The filtering has the effect of forming a short-term average. The trouble with this approach is that it is not quite obvious that all of the high-frequency terms get filtered out. The other approach is more tedious and involves squaring x(t) using (6.25). After numerous trigonometric identities, left to the imagination of the reader, the same result as above is derived.
47
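A minimal simulation sketch of the squaring approach in part (e) above (not part of the original solution; the carrier frequency, sample rate, and baseband waveform below are arbitrary choices): square x(t), lowpass filter by averaging over a few carrier cycles, and rescale to recover the envelope 2|u(t)| approximately.

import numpy as np

fc = 1000.0                                   # carrier (Hz), much larger than the baseband bandwidth
fs = 50_000.0                                 # sample rate
t = np.arange(0, 0.2, 1/fs)

u_mag = 1.0 + 0.5*np.cos(2*np.pi*5*t)         # slowly varying |u(t)|
phi = 0.3*np.sin(2*np.pi*3*t)                 # slowly varying phase
x = 2*u_mag*np.cos(2*np.pi*fc*t + phi)        # passband waveform, as in (6.25)

win = int(round(5*fs/fc))                     # average over 5 carrier cycles
lp = np.convolve(x**2, np.ones(win)/win, mode='same')   # ~ 2|u(t)|^2
envelope_est = np.sqrt(2*lp)                              # ~ 2|u(t)|

mid = slice(win, len(t) - win)                # ignore edge effects of the moving average
err = np.max(np.abs(envelope_est[mid] - 2*u_mag[mid]))
print("max envelope error over the middle of the record:", round(float(err), 4))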
• 48. Chapter 7
Exercise 7.1: (a) and (b) Since X, Y are independent,
f_{X,Y}(x, y) = αe^{−x^2/2} · αe^{−y^2/2} = α^2 e^{−(x^2+y^2)/2}.
Hence, the joint density has circular symmetry and the contours of equal probability density in the plane are concentric circles. Define
F_S(s) = Pr{S ≤ s} = Pr{X^2 + Y^2 ≤ s} = ∫∫_{(x,y): x^2+y^2 ≤ s} f_{X,Y}(x, y) dx dy.
Thus we need to integrate the joint density over a circle of radius √s centered at the origin. Consider dividing the region of integration into concentric annular rings of thickness dr. An annulus of radius r then contributes α^2 e^{−r^2/2} (2πr) dr to the integral (since the value of f_{X,Y} at distance r is α^2 e^{−r^2/2}). Hence,
F_S(s) = ∫∫_{(x,y): x^2+y^2 ≤ s} f_{X,Y}(x, y) dx dy = ∫_0^{√s} α^2 e^{−r^2/2} (2πr) dr = α^2 2π (1 − e^{−s/2}).
This must approach 1 as s → ∞, so α^2 2π = 1 and α = 1/√(2π). Differentiating F_S(s),
f_S(s) = α^2 2π (1/2) e^{−s/2} = (1/2) e^{−s/2}.
Recognizing S as an exponential rv with parameter 1/2, E[S] = 2. Also S = X^2 + Y^2, and since X and Y are iid, E[X^2] = E[Y^2]. Thus E[X^2] = 1. Finally, since f_X(x) is symmetric about 0, E[X] = 0.
The point of parts (a) and (b) is to see that the spherical symmetry of the joint density of two (or more) iid Gaussian rvs allows us to prove properties that are cumbersome to prove directly. For instance, proving α = 1/√(2π) or E[X^2] = 1 by considering the Gaussian pdf directly would be rather tedious.
(c) Since F_R(r) = Pr(R < r) = Pr(√S < r) = Pr(S < r^2) = F_S(r^2),
f_R(r) = f_S(r^2) 2r = r e^{−r^2/2}.
48
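A quick Monte Carlo check of parts (a)-(c) above, not part of the original solution (the sample size and seed are arbitrary): S = X^2 + Y^2 should be exponential with mean 2, and R = √S should have the Rayleigh complementary distribution e^{−r^2/2}.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x, y = rng.standard_normal(n), rng.standard_normal(n)
s = x**2 + y**2
r = np.sqrt(s)

print("E[S]   (should be 2):   ", round(float(s.mean()), 4))
print("E[X^2] (should be 1):   ", round(float((x**2).mean()), 4))
print("Pr(S > 3) vs exp(-3/2): ", round(float((s > 3).mean()), 4), round(float(np.exp(-1.5)), 4))
print("Pr(R > 1) vs exp(-1/2): ", round(float((r > 1).mean()), 4), round(float(np.exp(-0.5)), 4))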
  • 49. 2Y Exercise 7.2: 2X (a) Let Z = X + Y . Since X and Y are independent, the density of Z is the convolution of the X and Y densities. To make the principles stand out, assume σ= σ= 1. fZ(z) = fX(z) ∗ fY (z) = Z 1 −1 fX(x)fY (z − x) dx = Z 1 −1 1 √2π e−x2/2 1 √2π e−(z−x)2/2 dx = 1 2π Z 1 −1 e−(x2−xz+z2 2 ) dx = 1 2π Z 1 −1 e−(x2−xz+z2 4 )−z2 4 dx = 1 2√π e−z2/4 Z 1 −1 1 √π e−(x−z 2 )2 dx = 1 2√π e−z2/4 , 2Y 2X since the last integral integrates a Gaussian pdf with mean z/2 and variance 1/2, which evaluates to 1. As expected, Z is Gaussian with zero mean and variance 2. The ‘trick’ used here in the fourth equation above is called completing the square. The idea is to take a quadratic expression such as x2 + αz + βz2 and to add and subtract α2z2/4. Then x2 + αxz + αz2/4 is (x + αz/2)2, which leads to a Gaussian form that can be integrated. Repeating the same steps for arbitrary σand σ, we get the Gaussian density with mean 0 and variance σ2X + σ2Y . (b) and (c) We can also find the density of the sum of independent rvs by taking the product of the Fourier transforms of the densities and then taking the inverse transform. Since e−πt2 ↔ e−πf2 are a Fourier transform pair, the scaling property leads to 1 √2πσ2 exp (− x2 2σ2 ) ↔ exp (−π(2πσ2)f2) = exp[−2π2σ2θ2]. (29) Since convolution for densities corresponds to multiplication for their transforms, the transform of Z = X + Y is given by ˆ fZ(θ) = ˆ fX(θ) ˆ fY (θ) = exp £ −2π2θ2(σ2X + σ2Y § . ) Recognizing this, with σ2 = σ2X+ σ2Y , as the same transform as in (29), the density of Z is the inverse transform fZ(z) = 1 q 2π(σ2X + σ2Y ) exp μ −z2 2(σ2X + σ2Y ) ∂ . (30) (d) Note that αkWk is a zero-mean Gaussian rv with variance α2k . Thus for n = 2, the density of V is given by (30) as fV (v) = 1 p 2π(α2 1 + α2 2) exp μ v2 2(α2 1 + α2 2) ∂ . 49
  • 50. The general formula, for arbitrary n, follows by iteration, viewing the sum of n variables as the sum of n − 1 variables (which is Gaussian) plus one new Gaussian rv. Thus fV (v) = 1 q 2π( P α2k ) exp μ −v2 Pnk =1 α2k 2( ) ∂ . Exercise 7.3: (a) Note that fX is twice the density of a normal N(0, 1) rv for x ≥ 0 and thus that X is the magnitude of a normal rv. Multiplying by U simply converts X into a N(0, 1) rv Y1 that can be positive or negative. The mean and variance of Y1 are then 0 and 1 respectively. (b) Let Z be independent of X with the same density as X and let Y2 = UZ. Then Y2 is also a normal Gaussian rv. Note that Y1 and Y2 are each nonnegative if U = 1 and each negative if U = −1. Furthermore, given U = 1 , Y1 and Y2 are equal to X and Z respectively and thus have an iid Gaussian density conditional on being in the first quadrant. Given U = −1, Y1 and Y2 are each negative with the density of −X and −Z. The unconditional density is then positive only in the first and third quadrant, with the conditional density in each quadrant multiplied by 1/2. Note that Y1 and Y2 are individually Gaussian, but clearly not jointly Gaussian, since their joint density is 0 in the second and fourth quadrant, and thus the contours of equi-probability density are not the ellipses of Figure 7.2. (c) E[Y1Y2] = E[UX UZ] = E[U2]E[X]E[Z] = (E[X])2, where we used the independence of U,X, and Z and also used the fact that U2 is deterministic with value 1. Now, E[X] = Z 1 0 r 2 x π exp μ −x2 2 ∂ dx = r 2 π , where we have integrated by using the fact that x dx = d(x2/2). Combining the above equations, E[Y1Y2] = 2/π. For jointly Gaussian rvs, the mean and covariance specify the joint density (given by (7.20) in the text for 2 rvs). Since this density is different from that of Y1 and Y2, this provides a very detailed proof that Y1 and Y2 are not jointly Gaussian. (d) In order for the joint probability of two individually Gaussian rvs V1, V2 to be concentrated on the diagonal axes, v2 = v1 and v2 = −v1, we arrange that v2 = ±v1 for each joint sample value. To achieve this, let X be the same as above and let U1 and U2 be iid binary rvs, each taking on the values +1 and -1 with equal probability. Then we let V1 = U1X and V2 = U2X. That is V1 and V2 are individually Gaussian but have identical magnitudes. Thus their sample values lie on the diagonal axes and they are not jointly Gaussian. Exercise 7.5: This exercise is needed to to justify that {V (τ ); τ ∈ R} in (7.34) of the text is L2 R with probability 1, but contains nothing explicit about random processes. Let g(τ ) = φ(t)h(τ − t) dt. Since convolution in the time domain corresponds to multiplication in the 50
  • 51. frequency domain, ˆg(f) = ˆφ(f)ˆh (f). Since h is L1, |ˆh (f)| ≤ R |h(t)| dt .= h◦ < 1. Thus Z |g(τ )|2 dτ = Z |ˆg(f)|2 df = Z |ˆh (f)|2|φ(f)|2 df ≤ (h◦)2 Z |φ(f)|2 df = (h◦)2. To see that the assumption that h is L1 (or some similar condition) is necessary, let ˆφ(f) =ˆh (f) each be f−1/4 for 0 < |f| ≤ 1 and each be zero elsewhere. Then R 1 0 |ˆφ(f)|2 df = 2. Thus h and θ are L2, but ˆh is unbounded and hence h is not L1. Also, R 1 0 |ˆφ(f)ˆh (f)|2 df = R 1 0 (1/f) df = 1. What this problem shows in general is that the convolution of two L2 functions is not necessarily L2, but that the additional condition that one of the functions is L1 is enough to guarantee that the convolution is L2. Exercise 7.7: (a) For each t ∈ <, Z(t) is a finite sum of zero-mean independent Gaussian rvs {φk(t)Zk; 1 ≤ k ≤ n} and is thus itself zero-mean Gaussian. var(Z(t)) = E[Z2(t)] = E "√ Xn k=1 φk(t)Zk !√ Xn i=1 φi(t)Zi !# = Xn i=1 Xn k=1 φi(t)φk(t)E[ZiZk] = Xn k=1 φ2 k(t)σ2 k, where E[ZiZk] = σ2 kδik since the Zk are zero mean and independent. (b) Let Y = (Z(t1),Z(t2) · · ·Z(t`))T and let Z = (Z1, . . . ,Zn)T be the underlying Gaussian rvs defining the process. We can write Y = BZ, where B is an ` × n matrix whose (m, k)th entry is B(m, k) = φk(tm). Since Z1, . . . ,Zn are independent, Y is jointly Gaussian. Thus {Z(t1),Z(t2) · · · ,Z(t`)} are jointly Gaussian. By definition then, Z(t) is a Gaussian process. P(c) Note that Z(j)(t) − Z(n)(t) = j Zkφk(t). By the same analysis as in part (a), k=n+1 var(Z(j)(t) − Z(n)(t)) = Xj k=n+1 φ2 k(t)σ2 k. Since |φk(t)| < A for all k and t, we have var(Z(j)(t) − Z(n)(t)) ≤ Xj k=n+1 σ2 kA2. Because P 1k=1 σ2 k < 1, P 1k k converges to 0 as n → 1. Thus, var(Z(j)(t) − Z(n)(t)) =n+1 σ2 approaches 0 for all j as n → 1, i.e., for any δ > 0, there is an nδ such that var(Z(j)(t) − Z(n)(t)) ≤ δ for all j, n ≥ nδ. By the Chebyshev inequality, for any ≤ > 0, Pr[Z(j)(t) − Z(n)(t) ≥ ≤] ≤ δ/≤2. 51
  • 52. Choosing ≤2 = √δ, and letting δ go to 0 (with nδ → 1), we see that lim n,j→1 Pr[Z(j)(t) − Z(n)(t) > 0] = 0. Thus we have a sequence of Gaussian random variables that, in the limit, take on the same sample values with probability 1. This is a strong (but not completely rigorous) justification for saying that there is a limiting rv that is Gaussian. (d) Since the set of functions {φk(t)} are orthonormal, kzk2 = Z 1 −1 X ( k X zkφk(t))( i ziφi(t))dt = X i, k zkzihφk(t), φi(t)i = X k z2 k. Thus the expected energy in the process is E[k{Z(t)}k2] = P k E[Z2 k] = P k σ2 k (e) Using the Markov inequality, Pr{k{Z(t)}k2 > α} ≤ X k σ2 k/α. Since P k < 1, limα→1 Pr{k{Z(t)}k2 > α} = 0. This says that sample functions of Z(t) are k σ2 L2 with probability 1. Exercise 7.8: (a) Let t1, t2, . . . , tn be an arbitrary set of epochs. We will show that Z(t1), . . . ,Z(tn) are jointly Gaussian. Each of these rvs are linear combinations of the iid Gaussian set {Zk; k ∈ Z}, and, more specifically are linear combinations of a finite subset of {Zk; k ∈ Z}. Thus they are jointly Gaussian and {Z(t); t ∈ R} is a Gaussian random process. It is hard to take the need for a finite set above very seriously, but for the mathematically pure of heart, it follows from the fact that when t is not an integer plus 1/2, Z(t) is equal to Zk for k equal to the integer part of t + 1/2. In the special case where t is an integer plus 1/2, Z(t) is the sum of Zt−1/2 and Zt+1/2. (b) The covariance of {Z(t); t ∈ R} (neglecting the unimportant case where t or τ are equal to an integer plus 1/2) is: KZ(t, τ ) = E[Z(t)Z(τ )] = E[Zbt+0.5cZbτ+0.5c] = Ω 1 , bt + 0.5c = bτ + 0.5c 0 , otherwise. (c) Since KZ(t, τ ) is a function of both t and τ rather than just t − τ (for example, 0 = KZ(4/3, 2/3)6= KZ(2/3, 0) = 1 ), the process is not WSS. Hence it is not stationary. 52
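A small Monte Carlo illustration of part (c) of Exercise 7.8, not in the original solution: under the ⌊t + 1/2⌋ convention of part (a), two pairs of epochs with the same spacing can have different covariances, so K_Z cannot be a function of t − τ alone. The particular epochs below are arbitrary choices with spacing 0.6.

import numpy as np

rng = np.random.default_rng(1)
trials = 500_000
z = rng.standard_normal((trials, 3))          # Z_0, Z_1, Z_2 for each sample path

def Z(t, z):
    # Z(t) = Z_k with k = floor(t + 1/2), as in part (a)
    return z[:, int(np.floor(t + 0.5))]

for (t, tau) in [(0.6, 0.0), (1.2, 0.6)]:     # same spacing 0.6 in both pairs
    k_est = np.mean(Z(t, z) * Z(tau, z))
    print(f"K_Z({t}, {tau}) ~ {k_est:+.3f}")   # one pair gives ~0, the other ~1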
  • 53. (d) Let V (0) = V0 and V (0.5) = V1. We see that V0 and V1 will be independent if 0 < φ ≤ 0.5 and V0 = V1 if 0.5 < φ ≤ 1. Hence, fV1|V0,Φ(v1|v0, φ) = Ω N(0, 1) , 0 ≤ φ ≤ 0.5 δ(v1 − v0) , otherwise Recognizing that Pr(0 ≤ Φ ≤ 0.5) = 1/2, the conditioning on Φ can be eliminated to get fV1|V0(v1|v0) = 1 2 1 √2π exp μ − v2 1 2 ∂ + 1 2δ(v1 − v0). 12 12 The main observation to be made from this is that V0 and V1 cannot be jointly Gaussian since this conditional distribution is not Gaussian. (e) Ignore the zero-probability event that φ = 1/2. Note that, given Φ < 1/2, V0 = Z0 and, given Φ > 1/2, V0 = Z−1. Since Φ is independent of Z0 and Z−1 and Z0 and Z−1 are iid, this means that V0 is N(0, 1) and, by the same argument, V1 is N(0, 1). Thus V (0) and V (0.5) are individually Gaussian but not jointly Gaussian. This is a less artificial example than those in Exercises 7.3 and 7.4 of Gaussian rvs that are not jointly Gaussian. The same argument applies to Z(t),Z(τ ) for any t, τ for which |t − τ | < 1, so {V (t); t ∈ R} is not a Gaussian process even though V (t) is a Gaussian rv for each t. (f) Consider the event that V (t) and V (τ ) are both given by the same Zi. This is impossible for |t − τ | > 1, so consider the case t ≤ τ < t + 1. Let V (t) = Zi (where i is such that −< t−i−Φ ≤ . Since Φ is uniform in [0, 1], Pr[V (τ ) = Zi] = 1−(τ −t). For t−1 < τ ≤ t, Pr[V (τ ) = Zi] = 1−(t−τ ). Given the event that V (t) and V (τ ) are both given by the same Zi the conditional expected value of V (t)V (τ ) is 1, and otherwise it is 0. Thus E[V (t)V (τ )] = Ω 1 − |t − τ | , |t − τ | ≤ 1 0 , otherwise. (g) Since the covariance function is a function of |t−τ | only, the process is WSS. For stationarity, we need to show that for all integers n, for all sets of epochs t1, . . . , tn ∈ R and for all shifts τ ∈ R, the joint distribution of V (t1), . . . , V (tn) is the same as the one of V (t1+τ ), . . . , V (tn+τ ). Let us consider n = 2. As in parts (d)-(e), one can show that the joint distribution of V (t1), V (t2) depends only on whether t1 −t2 is smaller than the value of φ or not; thus shifting both epochs by the same amount does not modify the joint distribution. For arbitrary n, one can observe that only the spacing between the epochs matters, hence a common shift does not affect the joint distribution and the process is stationary. Exercise 7.10: (a) Since X∗ −k = Xk for all k, we see that E[XkX∗ −k] = E[X2 k ], Thus we are to show that E[X2 k ] = 0 implies the given relations. E[X2 k ] = E[(<(Xk) + i=(Xk))2] = E[<(Xk)2] − E[=(Xk)2] + 2iE[<(Xk)=(Xk)] Since both the real and imaginary parts of this must be 0, E[<(Xk)2] = E[=(Xk)2] and E[<(Xk)=(Xk)] = 0. Note: If Xk is Gaussian, this shows that it is circularly symmetric. 53
  • 54. (b) The statement of this part of the exercise is somewhat mangled. The intent (which is clear in Section 7.11.3, where this result is required) is to show that if E[XkX∗m ] = 0 for all m6= k, then E[<(Xk)<(Xm)], E[<(Xk)=(Xm)], and E[=(Xk)=(Xm)] are all zero for all m6= ±k. To show this, note that 2i=(Xm) = [Xm − X∗m ]. Thus, for m6= ±k, 0 = E[XkX∗m ] − E[XkX∗ −m] = E[Xk(X∗m − Xm)] = 2iE[Xk=(Xm)]. The real and imaginary part of this show that E[<(Xk)=(Xm)] = 0 and E[=(Xk)=(Xm)] = 0. The same argument, using 2<(Xm) = Xm + X∗m shows that E<(Xk)<(Xm)] = 0. Exercise 7.12: (a) We view Y as a linear functional of {Z(t)} by expressing it as Y = Z T 0 Z(t) dt = Z 1 −1 Z(t)g(t) dt. where g(t) = ( 1 if t ∈ [0, T] 0 otherwise . Since Y is a linear functional of a Gaussian process, it is a Gaussian rv. Hence, we only need to find its mean and variance. Since {Z(t)} is zero-mean, E[Y ] = E ΣZ 1 −1 Π = Z(t)g(t) dt Z 1 −1 E[Z(t)]g(t) dt = 0. E[Y 2] = E ΣZ T 0 Z(t)g(t) dt Z T 0 Z(τ )g(τ ) dτ Π = Z T 0 Z T 0 E[Z(t)Z(τ )]g(t)g(τ ) dτ dt = Z T 0 Z T 0 N0 2 δ(τ − t)g(t)g(τ ) dτ dt = N0 2 Z 1 −1 g2(τ ) dτ. For g(t) in this problem, R 1 −1 g2(τ )dτ = T so E[Y 2] = N0T 2 . (b) The ideal filter h(t), normalized to unit energy, is given by h(t) = √2Wsinc(2Wt) ↔ ˆh (f) = 1 √2W rect( f 2W ). The input process {Z(t)} is WGN of spectral density N0/2, as interpreted in the text. Thus the output is a stationary Gaussian process with the spectral density (f)|2 = N0 SY (f) = Sz(f)|ˆh 2 1 2W rect( f 2W ). The covariance function is then given by Y (τ ) = N0 ˜K 2 sinc(2Wτ ). 54
  • 55. The covariance matrix for Y1 = Y (0) and Y2 = Y ( 1 4W ) is then K = N0 2 Σ 1 2/π 2/π 1 Π . Using (7.20), the resulting joint probability density for Y1, Y2 is fY1Y2(y1y2) = 1 πN0 p 1 − (2/π)2 exp μ −y2 + (4/π)y2 1 y1y2 − 2 N0(1 − (2/π)2) ∂ . (c) V = Z 1 0 e−tZ(t) dt = Z 1 −1 g(t)Z(t) dt, where g(t) = ( e−t for t ≥ 0 0 otherwise. Thus, V is a zero-mean Gaussian rv with variance, E[V 2] = Z0 2 Z 1 −1 g2(t) dt = Z0 4 , i.e. V ∼ N(0,Z0/4). Exercise 7.14: (a) Following the hint, let Z1 be CN(0, 1) and let Z2 = UZ1 where U is independent of Z1 and is ±1 with equal probability. Then since <{Z1} and ={Z1} are iid and N(0, 1/2), it follows that <{Z2} = U<{Z1} and ={Z2} = U={Z1} are also iid and Gaussian and thus CN(0, 1). However <{Z1} and <{Z2} are not jointly Gaussian and ={Z1} and ={Z2} are not jointly Gaussian (see Exercise 7.3 (d)), and thus Z1 and Z2 are not jointly Gaussian. However, since Z1 and Z2 are circularly symmetric, Z and eiθZ have the same joint distribution for all θ. (b) We are given that <{Z} and ={Z} are Gaussian and that for each choice of φ, eiφZ has the same distribution as Z. Thus <{eiφZ} = cos(φ)<{Z} − sin(φ)={Z} must be Gaussian also. Scaling this, α cos(φ)<{Z} − α sin(φ)={Z} is also Gaussian for all α and all φ. Thus α1<{Z}+α2={Z} is Gaussian for all α1 and α2, which implies that <{Z} and ={Z} are jointly Gaussian. This, combined with the circular symmetry, implies that Z is circularly symmetric Gaussian. 55
  • 56. Chapter 8 Exercise 8.1: (a) Conditional on observation v, the probability that hypothesis i is correct is pU|V (i|v). Thus for decision ˜U = 0, the cost is C00 if the decision is correct (an event of probability pU|V (0|v)) and C10 if the decision is incorrect (an event of probability pU|V (1|v)). Thus the expected cost for decision ˜U = 0 is C00pU|V (0|v) + C10pU|V (1|v). Combining this with the corresponding result for decision 1, ˜U mincost = arg min j C0jpU|V (0|v) + C1jpU|V (1|v). (b) We have C01pU|V (0|v) + C11pU|V (1|v) ≥˜U=0 <˜U=1 C00pU|V (0|v) + C10pU|V (1|v) (C01 − C00)pU|V (0|v) ≥˜U=0 <˜U=1 (C10 − C11)pU|V (1|v), and using pU|V (j|v) = pjfV |U(v|j) fV (v) , j = 0, 1, we get the desired threshold test. (c) The MAP threshold test takes the form Λ(v) = ˜fV |U(v|0) ≥U=0 fV |U(v|1) <U=1 ˜p1 p0 , so only the RHS of the minimum cost and the MAP tests are different. We can interpret the RHS of the cost detection problem as follows: the relative cost of an error (i.e., the difference in costs between the two hypotheses) given that 1 is correct is given by C10 −C11. This relative cost is then weighted by the a priori probability p1 for the same reason as in the MAP case. The important thing to observe here, however, is that both tests are threshold tests on the likelihood ratio. Thus the receiver structure in both cases computes the likelihood ratio (or LLR) and then makes the decision according to a threhold. The calculation of the likelihood ratio is usually the major task to be accomplished. Note also that the MAP test is the same as the minimum cost test when the relative cost is the same for each hypothesis. Exercise 8.3: Conditional on U, the Vi’s are independent since the Xi’s and Zi’s are inde-pendent. Under hypothesis U = 0, V1 and V2 are i.i.d. N(0, 1 + σ2) (because the sum of two independent Gaussian random variables forms a Gaussian random variable whose variance is the sum of the two individual variances) and V3 and V4 are i.i.d. N(0, σ2). Under hypothesis U = 1, V1 and V2 are i.i.d. N(0, σ2) and V3 and V4 are i.i.d. N(0, 1 + σ2). 56
  • 57. Note that by symmetry, the probability of error conditioned on U = 0 is same as that conditioned on U = 1. Hence the average probability of error is same as the probability of error conditioned on U = 0. (a), (b) Since the Vi’s are independent of each other under either hypothesis, we have fV|U(v|u) = fV1|U(v1|u)fV2|U(v2|u)fV3|U(v3|u)fV4|U(v4|u). Thus, LLR(v) = ln   exp ≥ − v2 1+σ2 − v2 1 1+σ2 − v2 2 3 σ2 − v2 4 σ2 ¥ exp ≥ −v2 1 σ2 − v2 2 σ2 − v2 1+σ2 − v2 3 4 1+σ2 ¥   = (v2 1 + v2 2 − v2 4) · 3 − v2 μ 1 σ2 − 1 1 + σ2 ∂ = Ea − Eb σ2(1 + σ2) . ˜U This shows that the log-likelihood ratio depends only on the difference between Ea and Eb. Hence, the pair (Ea, Eb) is a sufficient statistic for this problem. Actually the difference Ea − Eb is also a sufficient statistic. (c) For ML detection, the LLR threshold is 0, i.e. the decision is U = 0 if LLR(v) is greater than or equal to zero and U = 1 if LLR(v) is less than zero. Thus the ML detection rule reduces ≥=0 to Ea <˜U =1 Eb. The threshold would shift to ln( Pr(U=1) Pr(U=0)) for MAP detection. (d) As described before, we only need to find the error probability conditioned on U = 0. Conditioning on U = 0 throughout below, the ML detector will make an error if Ea < Eb. Here (as shown in Exercise 7.1), Ea is an exponentially distributed random variable of mean 2 + 2σ2 and Eb is an independent exponential rv of mean 2σ2, fEa(x) = 1 2 + 2σ2 exp( −x 2 + 2σ2 ); fEb(x) = 1 2σ2 exp( −x 2σ2 ). Thus, conditional on Ea = x (as well as U = 0), an error is made if Eb > x. Pr(Eb > x) = Z 1 x 1 2σ2 exp( −x 2σ2 ) dx = exp( −x 2σ2 ). The overall error probability is then Pr(e) = Z 1 0 fEb(x) exp( −x 2σ2 ) dx = Z 1 0 1 2 + 2σ2 exp( −x 2 + 2σ2 ) exp( −x 2σ2 ) dx = 1 2 + 1/σ2 . We can make a quick sanity check of this answer by checking that it equals 0 for σ2 = 0 and equals 1/2 for σ2 = 1. In the next chapter, it will be seen that this is the probability of error for binary signaling in flat Rayleigh fading. 57
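A Monte Carlo check of the closed form just derived (not part of the original solution; σ^2 = 0.5 is an arbitrary test value). By the symmetry noted above it suffices to condition on U = u0.

import numpy as np

rng = np.random.default_rng(3)
trials = 1_000_000
sigma2 = 0.5                                   # noise variance per component (arbitrary)

# Hypothesis U = 0: V1, V2 ~ N(0, 1 + sigma2), V3, V4 ~ N(0, sigma2)
v12 = rng.standard_normal((trials, 2)) * np.sqrt(1 + sigma2)
v34 = rng.standard_normal((trials, 2)) * np.sqrt(sigma2)
Ea = (v12**2).sum(axis=1)                      # v1^2 + v2^2
Eb = (v34**2).sum(axis=1)                      # v3^2 + v4^2

pe_sim = np.mean(Eb > Ea)                      # ML decides U = 1 when Eb > Ea
pe_formula = 1.0 / (2.0 + 1.0/sigma2)
print(f"simulated Pr(e) = {pe_sim:.4f},  1/(2 + 1/sigma^2) = {pe_formula:.4f}")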
• 58. Exercise 8.5: Expanding y(t) and b(t) in terms of the orthogonal functions {ψ_{k,j}(t)},
∫ y(t) b(t) dt = ∫ [Σ_{k,j} y_{k,j} ψ_{k,j}(t)] [Σ_{k',j'} b_{k',j'} ψ_{k',j'}(t)] dt = Σ_{k,j} y_{k,j} b_{k,j} ∫ [ψ_{k,j}(t)]^2 dt = 2 Σ_{k,j} y_{k,j} b_{k,j},
where we first used the orthogonality of the functions ψ_{k,j}(t) and next the fact that they each have energy 2. Dividing both sides by 2, we get (8.36) of the text.
Exercise 8.6: (a)
Q(x) = (1/√(2π)) ∫_x^∞ e^{−z^2/2} dz
     = (1/√(2π)) ∫_0^∞ e^{−(x+y)^2/2} dy        (where y = z − x)
     = (e^{−x^2/2}/√(2π)) ∫_0^∞ e^{−y^2/2 − xy} dy.
(b) The upper bound, exp(−y^2/2) ≤ 1, is trivial, since exp v ≤ 1 whenever v ≤ 0. For the lower bound, 1 − y^2/2 ≤ exp(−y^2/2) is equivalent (by taking the logarithm of both sides) to ln(1 − y^2/2) ≤ −y^2/2, which is the familiar log inequality we have used many times.
(c) For the upper bound, use the upper bound of part (b):
Q(x) = (e^{−x^2/2}/√(2π)) ∫_0^∞ e^{−y^2/2 − xy} dy ≤ (e^{−x^2/2}/√(2π)) ∫_0^∞ e^{−xy} dy = e^{−x^2/2}/(x√(2π)).
For the lower bound, use the lower bound of part (b) and then substitute z for xy:
Q(x) ≥ (e^{−x^2/2}/√(2π)) ∫_0^∞ e^{−xy} dy − (e^{−x^2/2}/√(2π)) ∫_0^∞ (y^2/2) e^{−xy} dy
     = e^{−x^2/2}/(x√(2π)) − (e^{−x^2/2}/√(2π)) (1/x^3) ∫_0^∞ (z^2/2) e^{−z} dz
     = (1/x)(1 − 1/x^2) e^{−x^2/2}/√(2π).
Thus,
(1 − 1/x^2) (1/(x√(2π))) e^{−x^2/2} ≤ Q(x) ≤ (1/(x√(2π))) e^{−x^2/2}   for x > 0.   (31)
58
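The bounds in (31) are easy to confirm numerically. The sketch below (not in the original solution; the values of x are arbitrary) evaluates Q(x) as erfc(x/√2)/2 and compares it with the two sides of (31).

import numpy as np
from scipy.special import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2))

for x in (1.0, 2.0, 4.0, 6.0):
    upper = np.exp(-x**2/2) / (x*np.sqrt(2*np.pi))     # right side of (31)
    lower = (1 - 1/x**2) * upper                        # left side of (31)
    print(f"x={x}: lower={lower:.3e}  Q={Q(x):.3e}  upper={upper:.3e}")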
  • 59. Exercise 8.7: (a) We are to show that Q(∞ + η) ≤ Q(∞) exp[−η∞ − η2/2] for ∞ ≥ 0 and η ≥ 0. Using the hint, Q(∞ + η) = 1 √2π Z 1 ∞+η exp(− x2 2 ) dx = 1 √2π Z 1 ∞ exp(− (y + η)2 2 ) dy = 1 √2π Z 1 ∞ exp(− y2 2 − ηy − η2 2 ) dy ≤ 1 √2π Z 1 ∞ exp(− y2 2 − η∞ − η2 2 ) dy = exp[−η∞ − η2 2 ] Q(∞), where, in the third step, we used the fact that y ≥ ∞ over the range of the integration. (b) Setting ∞ = 0 and recognizing that Q(0) = 1/2, Q(η) ≤ 1 2 exp[− η2 2 ]. This is tighter than the standard upper bound of (8.98) when 0 < η < p 2/π. (c) Part (a) can be rewritten by adding and subtracting ∞2/2 inside the exponent. Then Q(∞ + η) ≤ exp Σ − (η + ∞)2 2 + ∞2 2 Π Q(∞). Substituting w for ∞ + η yields the required result. Exercise 8.8: (a) An M-dimensional orthogonal signal set has M signal points and can transmit log2M bits per M-dimensions. Hence, ρ = 2 log2M M bits per 2 dimensions. The energy per symbol is E and there are log2M bits per symbol. Hence, the energy per bit is, Eb = E log2M . (b) The squared distance between two orthogonal signal points aj and ak is given by, ||aj − ak||2 = haj − ak, aj − aki = haj , aji + hak, aki − 2haj , aki = 2E − 2Eδjk = ( 2E if j6= k, 0 otherwise Clearly, each point is equidistant from every other point and this distance is √2E. Hence, d2 min(A) = 2E. 59
  • 60. Also every signal point has M − 1 nearest neighbors. (c) The ML rule chooses the ai that minimizes ky −aik2. This is equivalent to choosing the ai that maximizes hy, aii. This can be easily derived from the fact that ky −aik2 ≥ ky −ajk2 ⇔ hy, aii ≤ hy, aji. The ML rule is to project y on each of the signals and choose one with the largest projection. In a coordinate system where each signal waveform is collinear with a coordinate, this simply means choosing the hypothesis with the largest received coordinate. Exercise 8.10: (a) Conditional on A = a0, the normalized outputs W0, . . . ,WM−1 are independent and, except for W0, are iid, N(0, 1). Thus, conditional on W0 = w0, W1, . . .WM−1 are still iid. (b) Either at least one of the Wm, 1 ≤ m ≤ M − 1 are greater than or equal to w0 (this is the event whose probability is on the left of the first equation) or all are less than w0 (an event of probability [1 − Q(w0)]M−1. This verifies the first equation. The second inequality is obvious for M = 2, so we now verify it for M ≥ 3. Let x be Q(w0), 0 ≤ x ≤ 1 and let M − 1 be n. We then want to show that (1 − x)n ≤ 1 − nx + n2x2/2 for 0 ≤ x ≤ 1 and n ≥ 2. This is clearly satisfied for x = 0, so it can be verified for 0 ≤ x ≤ 1 by showing that the same inequality is satisfied by the first derivitive of each side, i.e., if −n(1−x)n−1 ≤ −n+n2x. This again is satisfied for x = 0, so it will be verified in general if its derivitive satisfies the inequality, i.e., if n(n − 1)(1 − x)n−2 ≤ n2, which is clearly satisfied for 0 ≤ x ≤ 1. (c) Let y(w0) = (M − 1)Q(w0). Then y(w0) is decreasing in w0 and reaches the value 1 when w0 = ∞1. Using part (b), we then have Pr √ M[−1 m=1 ! ≥ y(w0) − y2(w0)/2 ≥ (Wm ≥ w0|A = a0) Ω y(w0)/2 for w0 > ∞1 1/2 for w0 = ∞1 The upper part of the second inequality is valid because, if w0 > ∞1, then y(w0) is less than 1 and y2 < y. The lower part of the second inequality follows because y(∞1) = 1. The lower bound of 1/2 is also valid for all w0 < ∞1 because the probability on the left of the equation is increasing with decreasing w0. Note that these lower bounds differ by a factor of 2 from the corresponding union bound (and the fact that probabilities are at most 1). This might seem very loose, but since y(w0) is exponential in w0 over the range of interest, a factor of 2 is not very significant. (d) Using part (c) in part (a), Pr(e) = Z 1 −1 fW0|A(w0|a0) Pr √ M[−1 m=1 ! dw0 (Wm ≥ w0|A = a0) ≥ Z ∞1 −1 fW0|A(w0|a0) 2 dw0 + Z 1 ∞1 fW0|A(w0|a0)(M − 1)Q(w0) 2 dw0 (32) ≥ 1 2Q(α − ∞1), (33) 60
  • 61. where, in the last step, the first integral in (32) is evaluated using the fact that W0, conditional on A = a0, is N(α, 1). The second integral in (32) has been lower bounded by 0. (e) From the upper and lower bounds to Q(x) in Exercise 8.6, we see that Q(x) ≈ 1 √2πx exp(−x2/2) for large x. The coefficient 1/√2πx here is unimportant in the sense that lim x→1 ln[Q(x)] −x2/2 = 1. This can be verified in greater detail by taking the log of each term in (31). Next, substituting ∞1 for x and noting that limM→1 ∞1 = 1, this becomes lim M→1 Σ ln(M − 1) ∞2 1/2 Π = 1. Note that the limit is not affected by replacing M − 1 by M. Taking the square root of the resulting term in brackets, we get limM→1 ∞1/∞ = 1. Associating ∞ with ∞1, the upper bound to error probability in (8.57) of the text is substantially the same as the lower bound in part (d) (again in the sense of ignoring the coefficient in the bound to the Q function). The result that Pr(e) ≥ 1/4 for ∞1 = α is immediate from (33), and the result for ∞1 > α follows from the monotonicity of Q. (f) The problem statement was not very precise here. What was intended was to note that the lower bound here is close to the upper bound in (8.57), but not close to the upper bound in (8.59). The intent was to strengthen the lower bound in the case corresponding to (8.59), which is the low rate region where ∞ ≤ α/2. This can be done by using Exercise 8.6 to lower bound Q(w0) in the second integral in (32). After completing the square in the exponent, this integral agrees, in the exponential terms, with (8.59) in the text. What has been accomplished in this exercise is to show that the upper bound to error probability in Section 8.5.3 is essentially the actual error probability for orthogonal signals. Frequently it is very messy to evaluate error probability for codes and for sequences of codes of increasing block length, and much more insight can be achieved by finding upper and lower bounds that are almost the same. Exercise 8.12: (a) By definition, a vector space is closed under scalar multiplication. In this case, i is a scalar in the complex space spanned by the vector a and thus ia is in this space. (b) We must show that the inner product (in R2n) of D(a) and D(ia) is 0. Note that D(ia) = (−=(a1), . . . ,−=(an),<(a1), . . . ,<(an)), which is a real 2n-tuple. Using the inner product of R2n, hD(a),D(ia)i = X j −<(aj)=(aj) + =(aj)<(aj) = 0. (c) We first express hv, ai (the inner product in Cn), in terms of real and imaginary parts. hv, ai = [h<v,<ai + h=v,=ai] + i[h=v,<ai − h<v,=ai] = hD(v),D(a)i + ihD(v),D(ia)i. (34) 61
  • 62. Using this, kak2 = hD(v),D(a)ia + hD(v),D(ia)iia kak2 . v|a = hv, aia Since the inner products above are real and since kak = kD(ak, this can be converted to R2n as D(v|a ) = hD(v),D(a)iD(a) + hD(v),D(ia)iD(ia) kD(a)k2 . (35) This is the projection of D(v) onto the space spanned by D(a) and D(ia). It would be con-structive for the reader to trace through what this means in the special (but typical) case where the components of a are all real. (d) Note from (34) that <hv, ai = hD(v),D(a)i. Thus D μ <[hv, ai]a kak2 ∂ = hD(v),D(a)iD(a) kD(a)k2 . From (35), this is the projection of D(v|a ) onto D(a). Exercise 8.15: Theorem 8.4.1, generalized to MAP detection is as follows: Let U(t) = Pnk =1 Ukp(t − kT) be a QAM (or PAM) baseband input to a WGN channel and assume that {p(t−kT; 1 ≤ k ≤ n} is an orthonormal sequence. Assume that U = (U1, . . . ,Un)T is an n-vector of iid random symbols, each with the pmf p(0), . . . , p(M − 1). Then the Mn-ary MAP decision on U = (U1, . . . ,Un)T is equivalent to making separate M-ary MAP decisions on each Uk, 1 ≤ k ≤ n, where the decision on each Uk can be based either on the observation v(t) or the observation of vk. To see why this is true, view the detection as considering all binary MAP detections between each pair u, u0 of Mn-ary possible inputs and choose the MAP decision over all. From (8.42) and (8.43) of the text, LLRu,u0(v) = Xn k=1 −(vk − uk)2 + (vk + u0k)2 N0 . The MAP test is to compare this with the MAP threshold, ln Σ PrU (u0) PrU (u) Π = Xn k=1 ln p(u0k) p(uk) . It can be seen that this comparison is equivalent to n single letter comparisons. The sequence u that satisfies these tests for all u0 is the sequence for which each single letter test is satisfied. Exercise 8.18: (a) The two codewords 00 and 01 are mapped into the signals (a, a) and (a, -a). In R2, this appears as (a, a) (a,−a) ° ❅ ❅ s ° s 62
  • 63. Thus the first bit contributes nothing to the distance between the two signals, but achieves orthogonality using only ±a as signal values. As mentioned in Section 8.6.1 of the text, this is a trivial example of binary orthogonal codewords (binary sequences that differ from each other in half their positions). (b) Any row u in the first half of Hb+1 can be represented as (u1, u1) where u1 ∈ Hb is repeated as the first and second 2b entries of u. Similarly any other row u0 in the first half of Hb+1 can be represented as (u01, u01). The mod 2 sum of these two rows is thus u ⊕ u0 = (u1, u1) ⊕ (u01, u01) = (u1 ⊕ u01, u1 ⊕ u01). 1 →Since the mod 2 1 →sum of any two rows of Hb is another row of Hb, u1 ⊕ u01 = u010 is a row of Hb. Thus u ⊕ u0 = (u010, u010) is a row in the first half of Hb+1. Any row in the second half of Hb+1 can be represented as (u1, u1⊕ ) where is a vector of 2b ones and u1 is a row of Hb. Letting u0 be another vector in the second half of Hb+1 with the same form of representation, u ⊕ u0 = (u1, u1⊕ →1 ) ⊕ (u01, u01⊕ →1 ) = (u1 ⊕ u01, u1 ⊕ u01), where we have used the fact that →1 ⊕ →1 is the zero vector. Thus u ⊕ u0 is a row in the first half of Hb+1. Finally if u = (u1, u1) and u0 = (u01, u01⊕ →1 ), then u ⊕ u0 = (u1, u1) ⊕ (u01, u01⊕ →1 ) = (u1 ⊕ u01, u1 ⊕ u01⊕ →1 )), so that u ⊕ u0 is a row in the second half of Hb+1. Since H1 clearly has the property that each mod 2 sum of rows is another row, it follows by induction that Hb has this same property for all b ≥ 1. Exercise 8.19: As given in the hint, Prj =0 °m j ¢ is the number of binary m-tuples with at most r ones. In this formula, as in most other formulas using factorials, 0! must be defined to be 1, and thus °m 0 ¢ = 1. Each m-tuple with at most r ones that ends in a one has at most r − 1 ones in the first m − 1 positions. There are Pr−1 j=0 °m−1 j ¢ such m − 1 tuples with at most r − 1 ones, so this is the number of binary m-tuples ending in one. The number ending in 0 is similarly the number of binary m− 1 tuples containing r or fewer ones. Thus Xr j=0 μ m j ∂ = Xr−1 j=0 μ m−1 j ∂ + Xr j=0 μ m−1 j ∂ . (36) (b) Starting with m = 1, the code RM(0,1) consists of two codewords, 00 and 11, so k(0, 1) (which is the base 2 log of the number of codewords) is 1. Similarly, RM(1,1) consists of four codewords so k(1, 1) = 2. Since °1 0 ¢ = 1 and °1 1 ¢ = 1, the formula k(r,m) = Xr j=0 μ m j ∂ (37) is satisfied for m = 1, forming the basis for the induction. Next, for any m ≥ 2, assume that (37) is satisfied for m0 = m − 1 and each r, 0 ≤ r ≤ m0. For 0 < r < m, each codeword x has 63
  • 64. the form x = (u, u ⊕ v) where u ∈ RM(r,m0) and v ∈ RM(r − 1,m0). Since each choice of u and v leads to a distinct codeword x, the number of codewords in RM(r,m) is the product of the number in RM(r,m0) and that in RM(r − 1,m0). Taking logs to get the number of binary information digits, RM(r,m) = RM(r,m − 1) + RM(r − 1,m − 1). Using (36), k(r,m) satisfies (37). Finally, k(0,m) = 1 and k(m,m) = m, also satisfying (37). 64
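A small check, not in the original solution, that the formula (37) satisfies the recursion used in the induction above. The example codes named in the comments are standard identifications, e.g. k(2,4) = 11 for the (16, 11) extended Hamming code RM(2,4) and k(1,3) = 4 for the (8, 4) first-order code RM(1,3).

from math import comb

def k(r, m):
    # Number of information bits in RM(r, m), eq. (37): sum_{j=0}^{r} C(m, j)
    return sum(comb(m, j) for j in range(r + 1))

# Check the Pascal-type recursion k(r, m) = k(r, m-1) + k(r-1, m-1) used in the induction
for m in range(2, 9):
    for r in range(1, m):
        assert k(r, m) == k(r, m - 1) + k(r - 1, m - 1)

print("k(1,3) =", k(1, 3), "  k(2,4) =", k(2, 4), "  k(1,5) =", k(1, 5))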
  • 65. Chapter 9 Exercise 9.1: ✭✭✭✭✭✭✭✭✭✭✭✭✭✭✭✭✭✭ ✻ r0 ° ° °✒ sending antenna receiving antenna ✻ r(t) φ vt d(t) d(0) Line of sight Let v be the velocity of the receiving antenna, r0 be the distance from the sending antenna at time 0, and r(t) be the distance at time t. Let φ be the angle between the line of sight and the direction of motion of the receiving antenna. Assume (contrary to the figure) that vt is very small compared with r0. By plane geometry, r2(t) = (r0 + vt cos φ)2 + (vt sin φ)2 = r2 0 μ 1 + 2vt cos φ r0 + v2t2 r2 0 ∂ . Taking the square root of both sides, ignoring the term in t2, and using the approximation √1 + x ≈ 1 + x/2 for small x, r(t) ≈ r0 + vt cos φ. A more precise way of saying this is that the derivitive of r(t) at t = 0 is v cos φ. Note that this result is independent of the orientation of the antennas and of the two angles specifying the receiver location. The received waveform may therefore be approximated as Er(f, t) ≈ < h α(θ0, √0, f) exp n 2πif ≥ t − r0+vt cos φ c ¥oi r0 + vt cos φ . (b) Along with the approximations above for small t, there is also the assumption that the combined antenna pattern α(θ0, √0, f) does not change appreciably with t. The angles θ0 and √0 will change slowly with t, and the assumption here is that vt is small enough that α does not change significantly. Exercise 9.3: ❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤ r1 r2 ③ Receiving Antenna ✘✘✘✘✘✿ ③ ✻ hs ❄ ✛ r ✲ ❄✻ hr Sending Antenna Ground Plane θ1 θ2 65
  • 66. (a) Let hs and hr be the heights of the sending and receiving antennas, respectively. By the reflection principle, θ1 = θ2. As shown in the figure, the length r2 of the reflected path is thus equal to the length of the path in that direction in the absence of a reflection. Thus r1 = p (hs − hr)2 + r2, r2 = p (hs + hr)2 + r2. Then, r2 − r1 = r r (hs + hr)2 r2 + 1 − r r (hs − hr)2 r2 + 1 ≈ r μ 1 + (hs + hr)2 2r2 ∂ − r μ 1 + (hs − hr)2 2r2 ∂ = 2hshr r , where the approximation is good for (hs+hr)2 r2 ø 1, i.e., when r is large relative to the heights of the antennas. Thus, r2 − r1 is asymptotically equal to b/r where b = 2hshr. This is quite surprising - the difference between the two path lengths is going to zero as 1/r. (b) and (c) Using the approximation r2 ≈ r1 + b/r from part (a), we write Er(f, t) = < £ αe2πi[ft−fr1/c] § r1 − < £ αe2πi[ft−fr2/c] § r2 ≈ < Σ αe2πi[ft−fr1/c] μ 1 r1 − 1 r2 e−2πifb/rc ∂Π . For sufficiently large r, b/r ø c/f). For example, if hs = 10m, hr = 1m, and r = 1km, then b/r = 0.02, which is much smaller than the wavelength c/f at f = 1gH. With this assumption, the second exponent above is close to zero and can be approximated by its first order expansion, Er(f, t) ≈ < Σ αe2πi[ft−fr1/c] μ 1 r1 − 1 r2 ∂Π (1 − 2πifb/rc) . Note that 1/r1 − 1/r2 is approximately b/r3, so it can be approximated by zero. Thus Er(f, t) ≈ < Σ αe2πi[ft−fr1/c] 2πifb r2rc Π ≈ −2παfb cr2 sin[2π(ft − fr1/c)], (38) since r2 ≈ r for r ¿ 1. Thus, Er ≈ β/r2 where β = −(2πα/c) sin[2π(ft − fr1/c)]. The main point here is that the difference between r1 and r2 gets so small with increasing r that there is no Doppler spread, but simply a cancellation of path responses which causes the 1/r2 decay with r. Viewed another way, the ground plane is gradually absorbing the radiated power. The student should now go back and look at the various approximations, observing that terms in 1/r3 were set to zero, and terms in 1/r2 were kept. Exercise 9.5: (a) Since there is only one path and it has Doppler shift D1, the Doppler Spread D is 0 and Δ = D1. ˆh (f, t) = e2πiD1t. 66
  • 67. √(f, t) = e−2πitΔˆh (f, t) = 1. The envelope of the output when the input is cos(2πft) is |ˆh (f, t)| = |√(f, t)| = 1. Note that in this case, there is Doppler shift but no fading, and the envelope, which is one, captures this lack of fading. (b) Here ˆh (f, t) = e2πiD1t + 1 and there are two paths: one with Doppler shift D1 and the other with zero Doppler shift. Thus D = D1 and Δ = D1/2. √(f, t) = e−2πitΔˆh (f, t) = e−2πitD1/2(e2πiD1t + 1) = eπiD1t + e−πiD1t = 2 cos(πD1t). The envelope of the output when the input is cos(2πft) is |ˆh (f, t)| = |√(f, t)| = 2| cos(πD1t)|. Note that the fading here occurs at a frequency D1/2 Note that this is the same as if there were two paths with Doppler shifts D1/2 and −D1/2. In other words, the fading frequency depends on the Doppler spread rather than the individual shifts. Exercise 9.6: (a) The envelope of <[yf (t)] is defined to be |ˆh (f, t)|. Then |yf (t)| = |e2πiftˆh (f, t)| = |e2πift| · |ˆh (f, t)| = |ˆh (f, t)|. This is true no matter what f is, but the definition of |ˆh (f, t)| as an envelope only corresponds to our intuitive idea of envelope when f is large. (b) (<[yf (t)])2 = |ˆh (f, t)|2 cos2(2πft + ∠ˆh (f, t)) = 1 2|ˆh (f, t)|2(1 + cos(4πft + 2∠ˆh (f, t))). The result of lowpass filtering this power waveform is 1 2|ˆh (f, t)|2. With the assumption of large f, the angle of ˆh (f, t) is approximately constant over a cycle, so the short-time time-average of the power is 12 |ˆh (f, t)|2. Thus the output of lowpass filtering the power waveform can be interpreted as the short-time time-average of the power. 67
Exercise 9.9:

(a) Since θn and φn are independent, the mean of G0,0 is given by

E[G_{0,0}] = \sum_{n=1}^{N} E[\theta_n]E[\phi_n] = \sum_{n=1}^{N} \frac{2}{N}\cdot 0 = 0.

The variance of G0,0 is given by

\mathrm{var}(G_{0,0}) = E[G_{0,0}^2] = E\Big[\Big(\sum_{n=1}^{N}\theta_n\phi_n\Big)^2\Big] = \sum_{n=1}^{N} E[\theta_n^2]E[\phi_n^2] = \sum_{n=1}^{N}\frac{2}{N} = 2,

where we used the fact that each θnφn is zero mean and independent of θiφi for all i ≠ n, and then the fact that each θn is independent of φn. Since G0,0 has mean 0 and variance 2 for all N, this is also the mean and variance in the limit N → ∞.

(b) This is a non-negative, integer-valued rv and thus is obviously non-Gaussian. In the Central Limit Theorem (CLT), one adds N iid rvs Y1, . . . , YN of given mean and variance, divides by √N to normalize, and then goes to the limit N → ∞. Here we are adding N iid rvs (Xn = θnφn) but normalizing by changing the distribution of each Xn as N changes. This might seem like a small difference, but in the CLT case a typical normalized sample sequence y1/√N, . . . , yN/√N is a sequence of many small numbers; here a sample sequence x1, . . . , xN is a sequence of numbers, almost all of which are 0, with a small subset, not growing with N, of ±1's.
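A quick Monte Carlo check of parts (a) and (b) is sketched below (Python with numpy; N and the number of trials are arbitrary choices). It simulates only the products θnφn, each ±1 with probability 1/N and 0 otherwise, which is all the moment calculation uses. The sample mean and variance should be near 0 and 2, and the printed pmf makes the small-integer, decidedly non-Gaussian character of G0,0 visible:

import numpy as np

rng = np.random.default_rng(0)
N, trials = 1000, 200000

# For each trial, count how many of the N products theta_n*phi_n are +1, -1, or 0.
counts = rng.multinomial(N, [1/N, 1/N, 1 - 2/N], size=trials)
g = counts[:, 0] - counts[:, 1]        # samples of G_{0,0}

print("mean     :", g.mean())          # should be near 0
print("variance :", g.var())           # should be near 2
vals, freq = np.unique(g, return_counts=True)
for v, c in zip(vals, freq):
    print(f"Pr(G = {v:+d}) ~ {c/trials:.4f}")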
(c) As mentioned above, G0,0 is the sum of a small number of ±1's, so it will have an integer distribution where the integer will be small with high probability. To work this out analytically, let G0,0 = V_N^{(1)} − V_N^{(−1)}, where V_N^{(1)} is the number of values of n, 1 ≤ n ≤ N, for which θnφn = 1 and, similarly, V_N^{(−1)} is the number of values of n for which θnφn = −1. Since θnφn is 1 with probability 1/N, V_N^{(1)} has the binomial pmf

\Pr\big(V_N^{(1)} = k\big) = \frac{N!}{k!(N-k)!}\,(1/N)^k (1 - 1/N)^{N-k}.

Note that N!/(N − k)! = N(N − 1) · · · (N − k + 1). Thus, as N → ∞ for any fixed k,

\lim_{N\to\infty} \frac{N!}{(N-k)!\,N^k} = 1.

Also, (1 − 1/N)^{N−k} = exp[(N − k) ln(1 − 1/N)], so for any fixed k,

\lim_{N\to\infty} (1 - 1/N)^{N-k} = e^{-1}.

Putting these relations together,

\lim_{N\to\infty} \Pr\big(V_N^{(1)} = k\big) = \Pr\big(V^{(1)} = k\big) = \frac{e^{-1}}{k!}.

In other words, the limiting rv V^{(1)} is a Poisson rv with mean 1. By the same argument, V^{(−1)}, the limiting number of occurrences of θnφn = −1, is a Poisson rv of mean 1. The limiting rvs V^{(1)} and V^{(−1)} are independent.³ Finally, the limiting rv G0,0 is equal to V^{(1)} − V^{(−1)}. Convolving the pmf's,

\Pr(G_{0,0} = \ell) = \sum_{k=0}^{\infty} \frac{e^{-1}}{(k+\ell)!}\cdot\frac{e^{-1}}{k!} \qquad \text{for } \ell \ge 0.

This goes to 0 rapidly with increasing ℓ. The pmf is symmetric around 0, so the same formula applies for G0,0 = −ℓ.

³ This is mathematically subtle since V_N^{(1)} and V_N^{(−1)} are dependent. A cleaner mathematical approach would be to find the pmf of the number of values of n for which θnφn is either ±1 and then to find the conditional number that are +1 and −1. The answer is the same.

Exercise 9.11:

(a) From equation (9.59) of the text,

\Pr[e \mid (|G| = \tilde g)] = \frac{1}{2}\exp\Big(-\frac{a^2\tilde g^2}{2WN_0}\Big).

Since |G| has the Rayleigh density

f_{|G|}(\tilde g) = 2\tilde g\, e^{-\tilde g^2},     (39)

the probability of error is given by

\Pr(e) = \int_0^{\infty} \Pr[e \mid (|G| = \tilde g)]\, f_{|G|}(\tilde g)\, d\tilde g
= \int_0^{\infty} \tilde g \exp\Big[-\tilde g^2\Big(1 + \frac{a^2}{2WN_0}\Big)\Big]\, d\tilde g
= \Big[2 + \frac{a^2}{WN_0}\Big]^{-1} = \frac{1}{2 + E_b/N_0},

which is the same result as that derived in equation (9.56).
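The averaging over the Rayleigh fade in part (a) can be confirmed numerically. The Python sketch below is illustrative only: the Eb/N0 values are arbitrary, and Eb/N0 stands in for a²/(WN0). It integrates Pr[e | |G| = g̃] against the density (39) and compares the result with 1/(2 + Eb/N0):

import numpy as np

dg = 1e-4
g = np.arange(dg/2, 10, dg)               # |G| grid; the density is negligible beyond 10
for ebn0 in [1.0, 10.0, 100.0]:           # arbitrary example values of Eb/N0 = a^2/(W N0)
    integrand = 0.5*np.exp(-ebn0*g**2/2) * 2*g*np.exp(-g**2)   # Pr[e | g] * f_|G|(g)
    pe_numeric = np.sum(integrand) * dg   # midpoint-rule approximation of the integral
    pe_closed = 1.0/(2.0 + ebn0)
    print(f"Eb/N0 = {ebn0:6.1f}: numeric {pe_numeric:.6f}   closed form {pe_closed:.6f}")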
(b) We want to find E[(a/|G|)²]. Since (a/|G|)² goes to infinity quadratically as |G| → 0 and the probability density of |G| goes to 0 only linearly as |G| → 0, we expect that this expected energy might be infinite. To show this,

E[(a/|G|)^2] = \int_0^{\infty} \frac{a^2}{\tilde g^2}\, f_{|G|}(\tilde g)\, d\tilde g
= \int_0^{\infty} \frac{a^2}{\tilde g^2}\, 2\tilde g\, e^{-\tilde g^2}\, d\tilde g
\ge \int_0^{1} \frac{2a^2}{\tilde g}\, e^{-1}\, d\tilde g = \infty.

Exercise 9.13:

(a) Since X0, X1, X2, X3 are independent and, under u0, the first two have densities αe^{−αx} and the second two have densities βe^{−βx},

f_{X|U}(x \mid u_0) = \alpha^2\beta^2 \exp[-\alpha(x_0 + x_1) - \beta(x_2 + x_3)],
f_{X|U}(x \mid u_1) = \alpha^2\beta^2 \exp[-\beta(x_0 + x_1) - \alpha(x_2 + x_3)].

(b) Taking the log of the ratio,

\mathrm{LLR}(x) = (\beta - \alpha)(x_0 + x_1) - (\beta - \alpha)(x_2 + x_3) = (\beta - \alpha)(x_0 + x_1 - x_2 - x_3).

(c) Convolving the density for X0 (conditional on u0) with the conditional density for X1,

f_{Y_0|U}(y_0 \mid u_0) = \alpha^2 y_0 e^{-\alpha y_0}.

Similarly,

f_{Y_1|U}(y_1 \mid u_0) = \beta^2 y_1 e^{-\beta y_1}.

(d) Given U = u0, an error occurs if Y1 ≥ Y0. This is somewhat tedious, but probably the simplest approach is to first find Pr(e) conditional on u0 and Y0 = y0, and then multiply by the conditional density of y0 and integrate the answer over y0:

\Pr(e \mid U = u_0, Y_0 = y_0) = \int_{y_0}^{\infty} \beta^2 y_1 e^{-\beta y_1}\, dy_1 = (1 + \beta y_0)\, e^{-\beta y_0}.

Performing the final tedious integration,

\Pr(e) = \Pr(e \mid U = u_0) = \frac{\alpha^3 + 3\alpha^2\beta}{(\alpha + \beta)^3} = \frac{1 + 3\beta/\alpha}{(1 + \beta/\alpha)^3}.

Since β/α = 1 + Eb/(2N0), this yields the final expression for Pr(e). The next exercise generalizes this and derives the result in a more insightful way.

(e) Technically, we never used any assumption about this correlation. The reason is the same as with the flat-fading case. G0,0 and G1,0 affect the result under one hypothesis and G0,2 and G1,2 under the other, but they never enter anything together.
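As a sanity check on part (d) of Exercise 9.13, the Monte Carlo sketch below (the value of β/α is an arbitrary example) draws Y0 and Y1 as sums of two independent exponentials and compares the empirical frequency of Y1 ≥ Y0 with (1 + 3β/α)/(1 + β/α)³:

import numpy as np

rng = np.random.default_rng(1)
alpha, ratio = 1.0, 3.0                 # beta/alpha = 1 + Eb/(2 N0); example value 3
beta = ratio * alpha
trials = 1_000_000

# Conditional on u0: Y0 = X0 + X1 (rate alpha), Y1 = X2 + X3 (rate beta).
y0 = rng.exponential(1/alpha, size=(trials, 2)).sum(axis=1)
y1 = rng.exponential(1/beta, size=(trials, 2)).sum(axis=1)

pe_sim = np.mean(y1 >= y0)              # error event given u0
pe_formula = (1 + 3*ratio) / (1 + ratio)**3
print(f"simulated {pe_sim:.5f}   formula {pe_formula:.5f}")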
Exercise 9.14:

(a) Under hypothesis H = 1, we see that Vm = Zm for 0 ≤ m < L and Vm = √Eb Gm−L,m + Zm for L ≤ m < 2L. Thus (under H = 1), for 0 ≤ m < L, Vm is circularly symmetric complex Gaussian with variance N0/2 per real and imaginary part, and for L ≤ m < 2L, Vm is similarly circularly symmetric complex Gaussian with variance Eb/(2L) + N0/2 per real and imaginary part. That is, given H = 1, Vm ∼ CN(0, N0) for 0 ≤ m < L and Vm ∼ CN(0, Eb/L + N0) for L ≤ m < 2L. In the same way, conditional on H = 0, Vm ∼ CN(0, Eb/L + N0) for 0 ≤ m < L and Vm ∼ CN(0, N0) for L ≤ m < 2L. Conditional on either hypothesis, the random variables V0, . . . , V2L−1 are statistically independent. Thus, the log likelihood ratio is

\mathrm{LLR}(v_0, \ldots, v_{2L-1}) = \ln\left[\frac{f(v_0,\ldots,v_{2L-1} \mid H=0)}{f(v_0,\ldots,v_{2L-1} \mid H=1)}\right]
= \ln\left[\frac{A\exp\Big(-\sum_{m=0}^{L-1}\frac{|v_m|^2}{E_b/L + N_0} - \sum_{m=L}^{2L-1}\frac{|v_m|^2}{N_0}\Big)}{A\exp\Big(-\sum_{m=0}^{L-1}\frac{|v_m|^2}{N_0} - \sum_{m=L}^{2L-1}\frac{|v_m|^2}{E_b/L + N_0}\Big)}\right]
= \Big(\sum_{m=0}^{L-1}|v_m|^2 - \sum_{m=L}^{2L-1}|v_m|^2\Big)\,\frac{E_b/L}{(E_b/L + N_0)N_0},

where A denotes the coefficient of the exponential in each of the Gaussian densities above; these terms cancel out of the LLR. Note that |vm|² is the sample value of the energy Xm in the mth received symbol. The ML rule is then to select H = 0 if Σ_{m=0}^{L−1} Xm ≥ Σ_{m=L}^{2L−1} Xm and H = 1 otherwise.

Conditional on H = 0, we know that Xm = |Vm|² for 0 ≤ m < L is exponential with density α exp(−αXm) for Xm ≥ 0, where α = 1/(Eb/L + N0). Also, Xm for L ≤ m < 2L is exponential with density β exp(−βXm) for Xm ≥ 0, where β = 1/N0. Thus, we can view Σ_{m=0}^{L−1} Xm as the time of the Lth arrival in a Poisson process of rate α. Similarly, we can view Σ_{m=L}^{2L−1} Xm as the time of the Lth arrival in an independent Poisson process of rate β.

Given H = 0, then, the probability of error is the probability that the Lth arrival of the first process (that of rate α) occurs before the Lth arrival of the second process (that of rate β). This is the probability that at least L arrivals from the first process precede the Lth arrival from the second process, i.e., that the first 2L − 1 arrivals from the two processes together contain at least L arrivals from process 1. By symmetry, the same result applies conditional on H = 1.

(b) A basic fact about Poisson processes is that the sum of two independent Poisson processes, one of rate α and the other of rate β, can be viewed as a single process of rate α + β which has two types of arrivals. Each arrival of the combined process is a type 1 arrival with probability p = α/(α + β) and is otherwise a type 2 arrival. The types are independent between combined arrivals. Thus, if we look at the first 2L − 1 arrivals from the combined process, Pr(e | H = 0) = Pr(e | H = 1) is the probability that L or more of these independent events are type 1 events. From the binomial expansion,

\Pr(e) = \sum_{\ell = L}^{2L-1} \binom{2L-1}{\ell}\, p^{\ell} (1-p)^{2L-1-\ell}.     (40)
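The Poisson-race argument in part (b) can be checked directly: simulate the Lth arrival epochs of the two independent processes as sums of L iid exponentials and compare the empirical probability that the rate-α process finishes first with the binomial tail in (40). The Python sketch below uses arbitrary example values of L, α, and β:

import numpy as np
from math import comb

rng = np.random.default_rng(2)
L, alpha, beta = 3, 1.0, 4.0            # arbitrary example values
trials = 1_000_000

# L-th arrival epochs: sums of L iid exponentials of the given rates.
t_alpha = rng.exponential(1/alpha, size=(trials, L)).sum(axis=1)
t_beta = rng.exponential(1/beta, size=(trials, L)).sum(axis=1)
pe_sim = np.mean(t_alpha <= t_beta)     # rate-alpha process "wins" the race

p = alpha / (alpha + beta)              # probability a combined-process arrival is type 1
pe_binom = sum(comb(2*L - 1, l) * p**l * (1 - p)**(2*L - 1 - l) for l in range(L, 2*L))
print(f"simulated {pe_sim:.5f}   binomial tail (40) {pe_binom:.5f}")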
(c) Expressing p as (α/β)/(1 + α/β) and 1 − p as 1/(1 + α/β), (40) becomes

\Pr(e) = \sum_{\ell = L}^{2L-1} \binom{2L-1}{\ell} \Big(\frac{1}{1 + \alpha/\beta}\Big)^{2L-1} (\alpha/\beta)^{\ell}.     (41)

Since β/α = 1 + Eb/(LN0), this becomes

\Pr(e) = \sum_{\ell = L}^{2L-1} \binom{2L-1}{\ell} \Big(\frac{1 + E_b/(LN_0)}{2 + E_b/(LN_0)}\Big)^{2L-1} \Big(1 + \frac{E_b}{LN_0}\Big)^{-\ell}.     (42)

(d) For L = 1, note that 2L − 1 = 1 and the above sum has only one term, with ℓ = 1. The combinatorial coefficient is then 1 and the result is Pr(e) = 1/(2 + Eb/N0), as expected. For L = 2, ℓ goes from 2 to 3 with combinatorial coefficients 3 and 1 respectively. Thus for L = 2,

\Pr(e) = \Big(\frac{1 + E_b/(2N_0)}{2 + E_b/(2N_0)}\Big)^{3}\Big(\frac{3}{(1 + E_b/(2N_0))^2} + \frac{1}{(1 + E_b/(2N_0))^3}\Big) = \frac{4 + 3E_b/(2N_0)}{(2 + E_b/(2N_0))^3},

which agrees with the result in Exercise 9.13.

(e) The final term in (42) is geometrically decreasing with ℓ and the binomial coefficient is also decreasing. When Eb/(LN0) is large, the decrease of the final term is so fast that all but the first term, with ℓ = L, can be ignored. For large L, we can use Stirling's approximation for each of the factorials in the binomial term,

\binom{2L-1}{L} \approx \frac{2^{2L-1}}{\sqrt{\pi L}}.

Thus, for L and Eb/(LN0) large relative to 1, the middle factor in (42) is approximately 1 and we have

\Pr(e) \approx \frac{1}{\sqrt{4\pi L}}\Big(\frac{E_b}{4LN_0}\Big)^{-L}.

The conventional wisdom is that Pr(e) decreases as (Eb/(4N0))^{−L} with increasing L, and this is true if Eb is the energy per bit for each degree of diversity (i.e., for each channel tap). Here we have assumed that Eb is shared between the different degrees of diversity, which explains the additional factor of L. The viewpoint here is appropriate if one achieves diversity by spreading the available power over L frequency bands (modeled here by L channel taps), thus achieving frequency diversity at the expense of less power per degree of freedom. If one achieves diversity by additional antennas at the receiver, for example, it is more appropriate to take the conventional view.

The analysis above covers a very special case of diversity in which each diversity path has the same expected strength, each suffers flat Rayleigh fading, and detection is performed without measurement of the dynamically varying channel strengths. One gets very different answers when the detection makes use of such measurements and also when the transmitter is able to adjust the energy in the different paths. The point of the exercise, however, is to show that diversity can be advantageous even in the case here.
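The sketch below (Eb/N0 is an arbitrary example value) evaluates the exact sum (42) for a few diversity orders L and compares it with the large-L, large-Eb/(LN0) approximation above; the agreement is only rough at small L, as expected:

import numpy as np
from math import comb, sqrt, pi

ebn0 = 100.0                                   # arbitrary example value of Eb/N0
for L in [1, 2, 4, 8]:
    x = ebn0 / L                               # Eb/(L N0)
    exact = sum(comb(2*L - 1, l) * ((1 + x)/(2 + x))**(2*L - 1) * (1 + x)**(-l)
                for l in range(L, 2*L))        # equation (42)
    approx = (1/sqrt(4*pi*L)) * (x/4)**(-L)    # large-L, large-x approximation
    print(f"L={L}:  exact {exact:.3e}   approx {approx:.3e}")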
(f) For the system analyzed above, with Lth-order diversity, consider the result of a receiver that first makes hard decisions on each diversity path. That is, for each k, 0 ≤ k ≤ L − 1, the receiver looks at outputs k and L + k and makes a decision independent of the other received symbols. The probability of error for a single value of k is simply the single-branch (no-diversity) error probability with energy Eb/L, i.e.,

\Pr(e, \text{branch}) = \frac{1}{2 + E_b/(LN_0)}.

Note that this is equal to p = 1/(1 + β/α) as defined in part (b). Now suppose the diversity order is changed from L to 2L − 1, i.e., the discrete-time model has 2L − 1 taps instead of L. We also assume that each mean-square tap value remains at 1/L (this was not made clear in the problem statement). Thus the receiver now makes 2L − 1 hard decisions, each in error with probability p and, given the hypothesis, each independent of the others. Thus, using these 2L − 1 hard decisions to make a final decision, the ML rule for the final decision is majority rule on the 2L − 1 local hard decisions. This means that the final decision is in error if L or more of the local decisions are in error. Since p is the probability of a local decision error, (40) gives the probability of a final decision error. This is the same as the error probability with Lth-order diversity making an optimal decision on the raw received sequence.

We note that in situations such as that assumed here, where diversity is gained by dissipating available power, there is an additional 3 dB power advantage in using 'soft decisions' beyond that exhibited here.

Exercise 9.16: See the solution to Exercise 8.10, which is virtually identical.

Exercise 9.17:

(a) Note that the kth term of u ∗ u† is

(u * u^\dagger)_k = \sum_{\ell} u_\ell\, u^*_{\ell + k} = 2a^2 n\,\delta_k.

We assume from the context that n is the length of the sequence and assume from the factor of 2 that this is an ideal 4-QAM PN sequence. Since ||u||² is the center term of u ∗ u†, i.e., (u ∗ u†)_0, it follows that ||u||² = 2a²n. Similarly, ||b||² is the center term (b ∗ b†)_0 of b ∗ b†. Using the commutativity and associativity of convolution,

b * b^\dagger = u * g * u^\dagger * g^\dagger = g * u * u^\dagger * g^\dagger = 2a^2 n\; g * g^\dagger.

Finally, since ||g||² is the center term of g ∗ g†, i.e., (g ∗ g†)_0,

\|b\|^2 = 2a^2 n \|g\|^2 = \|u\|^2\|g\|^2.

(b) If u0 and u1 are ideal PN sequences as given in part (a), then ||u0||² = ||u1||² = 2a²n. Using part (a), then,

\|b_0\|^2 = \|u_0\|^2\|g\|^2 = \|u_1\|^2\|g\|^2 = \|b_1\|^2.
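The identity ||b||² = ||u||²||g||² relies on u having an ideal, impulse-like autocorrelation. The Python sketch below is purely illustrative: a randomly chosen 4-QAM sequence satisfies u ∗ u† = 2a²nδk only approximately, so the two sides agree only approximately; the sequence length and channel taps are arbitrary choices:

import numpy as np

rng = np.random.default_rng(3)
n, a = 256, 1.0

# Random 4-QAM sequence u (each component a*(+-1 +- i)) and an arbitrary channel g.
u = a * (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n))
g = np.array([1.0, 0.5 - 0.3j, 0.2j, -0.1])

b = np.convolve(u, g)                                 # b = u * g
lhs = np.sum(np.abs(b)**2)                            # ||b||^2
rhs = np.sum(np.abs(u)**2) * np.sum(np.abs(g)**2)     # ||u||^2 ||g||^2
print(f"||b||^2 = {lhs:.1f},  ||u||^2 ||g||^2 = {rhs:.1f},  ratio = {lhs/rhs:.3f}")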