UNEQUAL-COST PREFIX-FREE CODES
DANIEL BULHOSA
Abstract. We demonstrate that the average word length of a binary unequal-
cost prefix-free code obeys a fundamental lower bound analogous to that of
equal-cost prefix-free D-ary codes. The costs of the characters are taken to
be 1 and 2 respectively. Furthermore, we show that prefix-free codes of this
type can always be created whose average word length is within 2 units of the
fundamental bound.
Introduction
Prefix-free codes are very important in the context of information theory as the
uniqueness of their words allows for a bijective correspondence between these and
the symbols they represent. From a practical standpoint, these codes are attractive
since their prefix-free property allows for the decoding of an incoming message as it
arrives. By contrast, although the more general uniquely decodable codes also yield
messages that can always be decoded uniquely, they do not share this property:
one often needs to wait for the full message to arrive before beginning to interpret
it, wasting time and resources.
The prefix-free codes that are usually considered utilize an alphabet for which
the cost associated with transmitting any particular character is the same for any
character. Though an understanding of this case is sufficient for most practical ap-
plications, it is by no means the only possible case. One can consider the variation
of this problem in which different characters cost different amounts to transmit.
In this situation the standard results proven in the literature do not apply, and it
is necessary to generalize them. The goal of this paper is to extend some central
results of equal-cost prefix-free codes to unequal-cost prefix-free codes with binary
alphabets.
Generalization of Fundamental Bound
In this section we determine a fundamental lower bound on the average code-
word length of a prefix-free code with binary alphabet {., −}, whose characters have
length costs 1 and 2 respectively. The proof will not assume that the
characters have equal cost, leading to a more general result that will be applicable
to unequal-cost prefix-free codes. First we generalize the Kraft Inequality:
Date: March 25, 2015.
Theorem 1 (Kraft Inequality): Suppose that C is a finite set of binary
codewords that form a prefix-free code, with symbols . and − costing 1 and 2 units
of length respectively. Then, for φ equal to the golden ratio:

$$\sum_{\text{words in } C} \varphi^{-l(\text{word})} \le 1$$
Proof: First note that the only code of maximal length 1 that fits this description
is {.}, and it obeys the inequality. The only codes of maximal length 2 fitting this
description are {..}, {., −}, {.., −}, and {−}, and they also obey the inequality.
Now we induct over the maximal length of the code. Let lmax be the length of the
longest word in the code C. Assume that the statement to be proved is true for all
codes with maximal length l < lmax. We can create two codes with maximal lengths
less than lmax as follows: take all the words starting with ., remove the ., and let the
set of these words form a code. This code C. inherits the prefix-free property from
C, and its maximal length is at most lmax − 1. If we do the same thing for all words
starting with − we end up with a code C− with maximal length of at most lmax − 2.
Now, by the inductive hypothesis:

$$\sum_{\text{words in } C_.} \varphi^{-l_.(\text{word})} \le 1 \qquad \text{and} \qquad \sum_{\text{words in } C_-} \varphi^{-l_-(\text{word})} \le 1$$

And by construction,

$$\sum_{\text{words in } C_.} \varphi^{-l_.(\text{word})-1} + \sum_{\text{words in } C_-} \varphi^{-l_-(\text{word})-2} = \sum_{\text{words in } C} \varphi^{-l(\text{word})}$$
since . and − have lengths 1 and 2 respectively. Combining this equation with the
inequalities we find that:

$$\sum_{\text{words in } C} \varphi^{-l(\text{word})} = \sum_{\text{words in } C_.} \varphi^{-l_.(\text{word})-1} + \sum_{\text{words in } C_-} \varphi^{-l_-(\text{word})-2} \le \varphi^{-1} + \varphi^{-2} = 1$$
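As a quick sanity check of Theorem 1, the following sketch (an illustrative Python snippet, not part of the original argument) computes the cost-weighted Kraft sum for the small prefix-free codes used as base cases above; each sum comes out at most 1.

```python
# Minimal numerical check of the generalized Kraft inequality (illustrative only).
PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def cost(word):
    """Total cost of a word: '.' costs 1 unit, '-' costs 2 units."""
    return sum(1 if ch == '.' else 2 for ch in word)

def kraft_sum(code):
    """Sum of phi^{-l(word)} over all words in the code."""
    return sum(PHI ** -cost(w) for w in code)

# A few prefix-free codes from the proof's base cases.
for code in [{'.'}, {'..', '-'}, {'.', '-'}, {'-'}]:
    print(sorted(code), round(kraft_sum(code), 4))  # each value is <= 1
```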
Now we prove that the fundamental bound holds for this type of code.
Theorem 2: Suppose that C is a code as described in Theorem 1, describing
the outcomes of some random variable X with probability distribution p. Then the
average length L of the codewords obeys L ≥ Hφ(p).
Proof: We follow the template of Theorem 5.3.1 of Cover and Thomas. Let
$r(x) = \varphi^{-l(x)} / \sum_x \varphi^{-l(x)}$ and $c = \sum_x \varphi^{-l(x)}$. The difference of the average length
L and the entropy can be written as:
$$\begin{aligned}
L - H_\varphi(p) &= \sum_x p(x)\, l(x) + \sum_x p(x) \log_\varphi p(x) \\
&= -\sum_x p(x) \log_\varphi \varphi^{-l(x)} + \sum_x p(x) \log_\varphi p(x) \\
&= -\sum_x p(x) \log_\varphi \frac{\varphi^{-l(x)}}{\sum_x \varphi^{-l(x)}} + \sum_x p(x) \log_\varphi p(x) + \sum_x p(x) \log_\varphi \frac{1}{\sum_x \varphi^{-l(x)}} \\
&= \sum_x p(x) \log_\varphi \frac{p(x)}{r(x)} + \log_\varphi \frac{1}{c} \\
&= D(p\,\|\,r) + \log_\varphi \frac{1}{c} \\
&\ge 0
\end{aligned}$$
Here the inequality follows from the fact that the relative entropy is non-negative,
and the fact that c ≤ 1 by the Kraft inequality, so that logφ(1/c) ≥ 0. Note that the
bound is saturated if and only if the Kraft inequality is saturated (c = 1) and p = r.
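To see the bound of Theorem 2 numerically, the sketch below compares the average word length of a small prefix-free code against Hφ(p); the distribution and the code assignment are hypothetical examples chosen only for illustration.

```python
import math

PHI = (1 + 5 ** 0.5) / 2

def cost(word):
    """Total cost of a word: '.' costs 1 unit, '-' costs 2 units."""
    return sum(1 if ch == '.' else 2 for ch in word)

def entropy_phi(p):
    """Entropy of the distribution p measured in base phi."""
    return sum(px * math.log(1 / px, PHI) for px in p.values())

# Hypothetical source distribution and an arbitrary prefix-free code for it.
p = {'A': 0.5, 'B': 0.3, 'C': 0.2}
code = {'A': '..', 'B': '.-', 'C': '-'}

L = sum(p[x] * cost(code[x]) for x in p)
print(round(L, 3), round(entropy_phi(p), 3))  # L should be >= H_phi(p)
```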
Achieving the Fundamental Bound
Our goal now is to demonstrate the existence of prefix-free codes of this type
whose average word length is within 2 units of the fundamental bound. Our ap-
proach will be to prove a version of the converse Kraft inequality and then use a
set of lengths motivated by Theorem 2 to generate the desired code.
Before doing this, however, we will prove a useful lemma. First we make a few
useful definitions. Let C be a prefix-free code formed with the alphabet {., −};
then we define T(C) as the tree representation of C. In this representation, every
childless node of T(C) corresponds to a unique word in C. The word corresponding
to a given childless node can be constructed by following the path to the node
starting from the root, adding a . for every left edge (child) that is taken and a −
for every right edge that is taken. Once we arrive at the childless node all necessary
characters will have been added and the word will be complete. Now, for the lemma:
Lemma: Let C be a prefix-free code such that T(C) is a full binary tree. Let .
have length 1 and − have length 2. Then:

$$\sum_{\text{words in } C} \varphi^{-l(\text{word})} = 1$$
Proof: First note that given a generic complete and full tree T we can create
a prefix-free code C by taking all of the words with N letters (independent of
cost), where N is the depth of the childless nodes of T. A simple combinatorial
argument based on counting the number n of −'s in a word shows that:

$$\sum_{\text{words in } C} \varphi^{-l(\text{word})} = \sum_{0 \le n \le N} \binom{N}{n} \varphi^{-(N+n)}$$
Now we show that this sum is equal to 1. First note that when N = 1 this sum
reduces to $\varphi^{-1} + \varphi^{-2} = 1$, so the base case holds. Now, assume that the postulated
equality holds for N. Multiplying each of $\varphi^{-1}$ and $\varphi^{-2}$ by the sum for N, which
equals 1 by hypothesis, gives:

$$\begin{aligned}
1 = \varphi^{-1} + \varphi^{-2} &= \sum_{0 \le n \le N} \binom{N}{n} \varphi^{-[(N+1)+n]} + \sum_{0 \le n \le N} \binom{N}{n} \varphi^{-[(N+1)+(n+1)]} \\
&= \sum_{0 \le n \le N} \binom{N}{n} \varphi^{-[(N+1)+n]} + \sum_{1 \le n \le N+1} \binom{N}{n-1} \varphi^{-[(N+1)+n]} \\
&= \varphi^{-(N+1)} + \sum_{1 \le n \le N} \left[ \binom{N}{n} + \binom{N}{n-1} \right] \varphi^{-[(N+1)+n]} + \varphi^{-2(N+1)} \\
&= \varphi^{-(N+1)} + \sum_{1 \le n \le N} \binom{N+1}{n} \varphi^{-[(N+1)+n]} + \varphi^{-2(N+1)} \\
&= \sum_{0 \le n \le N+1} \binom{N+1}{n} \varphi^{-[(N+1)+n]}
\end{aligned}$$
Thus by induction the sum is equal to 1 for all N ≥ 1. So for a code with a complete
and full tree the Kraft inequality is saturated.
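The binomial identity established by this induction is also easy to confirm numerically; the following sketch (illustrative only) evaluates the sum for several depths N and finds 1 each time, up to floating-point error.

```python
from math import comb

PHI = (1 + 5 ** 0.5) / 2

def full_tree_kraft_sum(N):
    """Kraft sum over the words of a complete, full depth-N tree,
    grouped by the number n of '-' characters in a word."""
    return sum(comb(N, n) * PHI ** -(N + n) for n in range(N + 1))

for N in range(1, 8):
    print(N, round(full_tree_kraft_sum(N), 12))  # each value should be 1.0
```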
Now consider the tree T(C). Let N be the depth of the deepest childless node
of T(C). We can create a complete and full tree T′(C) from T(C) by appending
complete and full subtrees to all the childless nodes that have depth less than N.
Note that if w is some word in C and D(w) is the set of its childless descendants
in T′(C), then:

$$\sum_{\text{descendants in } D(w)} \varphi^{-l(\text{descendant})} = \varphi^{-l(w)} \cdot \sum_{\text{descendants in } D(w)} \varphi^{-(l(\text{descendant}) - l(w))} = \varphi^{-l(w)}$$

This follows from the fact that the subtree of T′(C) rooted at w is complete and
full (by construction), that its childless nodes have lengths l − l(w) relative to w,
and the fact that complete and full trees saturate the Kraft inequality as shown above. The
implication of this equation is that:
$$\sum_{w \in C} \varphi^{-l(w)} = \sum_{w \in C} \; \sum_{\text{descendants in } D(w)} \varphi^{-l(\text{descendant})} = \sum_{\text{cn of } T'(C)} \varphi^{-l(\text{cn})} = 1$$

Here cn stands for childless node. The second equality follows from the fact that
the sets of childless descendants D(w) are disjoint and their union encompasses all
childless nodes of the tree T′(C); the last equality holds because T′(C) is complete
and full.
Now we prove the converse of the Kraft inequality:
Theorem 3 (Converse Kraft Inequality): Suppose that L = {l1, ..., ln} is a
set of lengths (which may repeat) satisfying the Kraft inequality:

$$\sum_{l \in L} \varphi^{-l} \le 1$$

Then there exists some prefix-free code C with alphabet {., −}, where l(.) = 1 and
l(−) = 2, such that the length of each word is either equal to li or li + 1 for a unique
1 ≤ i ≤ n.
Proof: We proceed by induction on the size of the set of lengths. Note that the
statement is trivial when there is only one length l1 in L, as then there is only one
word and a code with one word is trivially prefix-free. Note also that this length
can be arbitrary. This covers the base case.
Now suppose that L contains n lengths, some of which may be equal to one
another; without loss of generality we index them in non-decreasing order, so that
ln is a largest length. Suppose that these lengths satisfy the Kraft inequality, and
assume that the theorem holds for all length sets of size n − 1. Then the set
L′ = L − {ln} that we create by removing the last length must also obey the Kraft
inequality:

$$\sum_{l \in L'} \varphi^{-l} < \sum_{l \in L} \varphi^{-l} \le 1$$

The set L′ contains n − 1 lengths, so by the inductive hypothesis there exists some
prefix-free code C′ such that the length of each word is either equal to li or li + 1
for a unique 1 ≤ i ≤ n − 1.
Now, note that since the lengths of C′ are li or li + 1, their Kraft sum is at most
that of L′, which obeys the Kraft inequality strictly; the contrapositive of the Lemma
therefore implies that the tree T(C′) is not full. This means some node has a child
available to which a codeword has not been assigned. Furthermore, the other child
of this node must lead to codewords, so the path of characters leading to this node
must have length of at most ln−1. Thus if ln > ln−1 we can create the code C from
the code C′ by simply creating a path of the appropriate length starting from this
node and going through its unassigned child.
The remaining concern is the case ln = ln−1, for if this is the case and the characters
leading to the node in question have length ln, the word assigned through this node
may be forced to have length ln + 1, and the remaining unassigned child of the node
then leads to a word of length ln + 2, which is larger than our goal.

However, if in fact ln = ln−1 and the length of the path to the node is ln−1, we
can do the following: imagine that we do create a codeword w1 of length ln + 2
for this node, and create a code C′′ by adding this word to the code C′. Since
φ−(ln+2) < φ−ln, the set of lengths for C′′ must obey the Kraft inequality strictly, so
there is some node in the tree T(C′′) with an unassigned child. Furthermore, the
characters leading to this node must have length of at most ln−2. We can take out
the word w1 from C′′ and then add a new word w2 that is equal to the path to this
unassigned child. Again we must determine whether ln = ln−2 and whether the length
of the path leading to this new node is ln−2. If this is not the case, adding w2 to C′
gives the desired code C.

If this is the case, however, we can repeat the process, adding words
w1, w2, ... to the code C′ to get a new code C(n) strictly obeying the Kraft inequality.
We are guaranteed that ln > lk for some 0 < k < n, for otherwise the Kraft inequality
would be contradicted.
The converse Kraft inequality must be stated in this manner (allowing lengths li
or li + 1) because for some choices of lengths one may obey the Kraft inequality but
not have enough words of a given length to create a code. For example, consider the
set of lengths L = {3, 3, 3, 3}. This set obeys the Kraft inequality, but there are only
three words we can create from this alphabet that have length three: namely ... (three .'s),
−., and .−. There are still words of length four available after we use these three, and
increasing any of the lengths can only decrease the Kraft sum, so compliance with
the Kraft inequality is preserved.
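The counting claim in this example can be verified by brute force; the sketch below (illustrative only) enumerates all words over the alphabet with a given total cost.

```python
def words_of_cost(target):
    """All words over {'.', '-'} whose total cost (. = 1, - = 2) equals target."""
    if target == 0:
        return ['']
    words = []
    if target >= 1:
        words += ['.' + w for w in words_of_cost(target - 1)]
    if target >= 2:
        words += ['-' + w for w in words_of_cost(target - 2)]
    return words

print(words_of_cost(3))  # exactly three words of cost 3: '...', '.-', '-.'
```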
Having proved the converse Kraft inequality we can now prove the principal result
of this section. Returning to the result of Theorem 2, we found that if r = p and the
Kraft inequality is saturated then the entropy bound is saturated. Imposing these
conditions, namely r = p along with c = 1, implies that the lengths must be:

$$p(x) = \varphi^{-l(x)} \Big/ \sum_x \varphi^{-l(x)} = \varphi^{-l(x)} \implies l(x) = \log_\varphi \frac{1}{p(x)}$$
These may not be valid lengths because they may be non-integer. However, the
lengths

$$l'(x) = \left\lceil \log_\varphi \frac{1}{p(x)} \right\rceil$$

still obey the Kraft inequality and they are integers. We use these lengths to prove
the main result.
Theorem 4: Let p be the distribution for a random variable X. Let l′(x) be
the length associated with the outcome X = x. Then there exists a prefix-free code
C with alphabet {., −} such that:

$$H_\varphi(p) \le L \le H_\varphi(p) + 2$$

It follows that the optimal code C∗ for this probability distribution must be at least
as good as C.
Proof: Since the lengths l′(x) obey the Kraft inequality, Theorem 3 implies
that there is a code C such that the length of the word corresponding to x is either
l′(x) or l′(x) + 1. Let lC(x) denote the lengths of the words in this code. Then the
average word length L satisfies:

$$\begin{aligned}
L &= \sum_x p(x)\, l_C(x) \\
&\le \sum_x p(x)\, [\, l'(x) + 1 \,] && \text{worst case for Theorem 3} \\
&\le \sum_x p(x) \left[ \log_\varphi \frac{1}{p(x)} + 2 \right] && \text{property of the ceiling function} \\
&= H_\varphi(p) + 2 \sum_x p(x) \\
&= H_\varphi(p) + 2
\end{aligned}$$
L must also obey the fundamental bound from Theorem 2, so the result follows.
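The construction behind Theorem 4 can be checked numerically for any distribution. The sketch below (an illustrative check with a hypothetical distribution) computes the ceiling lengths l′(x), verifies that they obey the Kraft inequality, and confirms that even the pessimistic average Σx p(x)[l′(x) + 1] lies within two units of Hφ(p).

```python
import math

PHI = (1 + 5 ** 0.5) / 2

def ceiling_lengths(p):
    """Shannon-style lengths l'(x) = ceil(log_phi(1/p(x)))."""
    return {x: math.ceil(math.log(1 / px, PHI)) for x, px in p.items()}

p = {'A': 0.4, 'B': 0.3, 'C': 0.2, 'D': 0.1}   # hypothetical distribution
lengths = ceiling_lengths(p)

kraft = sum(PHI ** -l for l in lengths.values())
H = sum(px * math.log(1 / px, PHI) for px in p.values())
L_upper = sum(p[x] * (lengths[x] + 1) for x in p)  # worst case allowed by Theorem 3

print(lengths)
print(round(kraft, 4))                  # <= 1, so Theorem 3 applies
print(round(H, 4), round(L_upper, 4))   # H_phi(p) <= L_upper <= H_phi(p) + 2
```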
Conclusion
We have seen how the principal results for equal-cost prefix-free codes can be
generalized to the class of unequal-cost prefix-free codes with a binary alphabet
and costs 1 and 2. Although the results may not have any obvious practical ap-
plications, they do motivate a roadmap for the interesting, more general case of
non-binary codes with a different set of costs for the characters.
To illustrate the point, the key observation in proving Theorem 1 was to note
that:
$$\varphi^{-2} + \varphi^{-1} = 1$$
It is simple to see, however, that this condition is equivalent to the condition
$\varphi^2 - \varphi - 1 = 0$, which is the equation that defines the golden ratio. Thus it is apparent
that for a general unequal-cost code with costs c1, c2, ..., cn the kind of relationship
that we would exploit to generalize the Kraft inequality would take the form:
$$\xi^{-c_1} + \xi^{-c_2} + \cdots + \xi^{-c_n} = 1$$

This equation defines an algebraic number ξ which, if real, we can use to generalize
the Kraft inequality in the form:

$$\sum_{\text{words in } C} \xi^{-l(\text{word})} \le 1$$
The generalization would likely follow a prescription identical to that of Theorem 1.
It is worth noting that every polynomial with odd degree and real coefficients has
at least one real root, which means that if the highest cost cmax is an odd number
such a ξ is guaranteed to exist. Thus at least certain classes of general unequal-
cost codes are promising for this type of generalization. This is an interesting
generalization from the standpoint of pure mathematics.
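As a rough illustration of this program, the sketch below numerically solves the defining equation for ξ for a given list of costs by bisection; it assumes at least two positive costs, in which case a real root greater than 1 exists.

```python
def solve_xi(costs, tol=1e-12):
    """Numerically find xi > 1 with sum(xi**-c for c in costs) == 1, by bisection.
    Assumes at least two positive costs, so the sum at xi = 1 exceeds 1 and a root > 1 exists."""
    f = lambda x: sum(x ** -c for c in costs) - 1
    lo, hi = 1.0, 2.0
    while f(hi) > 0:        # expand the bracket until the sum drops below 1
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

print(solve_xi([1, 2]))      # ~1.618..., the golden ratio, recovering Theorem 1
print(solve_xi([1, 2, 3]))   # root for three characters with costs 1, 2, 3
```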