A Course on Large Deviations with an
Introduction to Gibbs Measures
Firas Rassoul-Agha
Timo Seppäläinen
Department of Mathematics, University of Utah, 155 South
1400 East, Salt Lake City, UT 84112, USA
E-mail address: firas@math.utah.edu
Mathematics Department, University of Wisconsin-Madison,
419 Van Vleck Hall, Madison, WI 53706, USA
E-mail address: seppalai@math.wisc.edu
© 2014 Firas Rassoul-Agha and Timo Seppäläinen
2000 Mathematics Subject Classification. Primary 60F10, 82B20
Key words and phrases. convex analysis, Gibbs measure, Ising model, large
deviations, Markov chain, percolation, phase transition, random cluster
model, random walk in a random medium, relative entropy, statistical
mechanics, variational principle
To Alla, Maxim, and Kirill
To Celeste, David, Ansa, and Timo
Contents

Preface

Part I. Large deviations: general theory and i.i.d. processes

Chapter 1. Introductory discussion
§1.1. Information-theoretic entropy
§1.2. Thermodynamic entropy
§1.3. Large deviations as useful estimates

Chapter 2. The large deviation principle
§2.1. Precise asymptotics on an exponential scale
§2.2. Lower semicontinuous and tight rate functions
§2.3. Weak large deviation principle
§2.4. Aspects of Cramér’s theorem
§2.5. Limits, deviations, and fluctuations

Chapter 3. Large deviations and asymptotics of integrals
§3.1. Contraction principle
§3.2. Varadhan’s theorem
§3.3. Bryc’s theorem
§3.4. Curie-Weiss model of ferromagnetism

Chapter 4. Convex analysis in large deviation theory
§4.1. Some elementary convex analysis
§4.2. Rate function as a convex conjugate
§4.3. Multidimensional Cramér’s theorem

Chapter 5. Relative entropy and large deviations for empirical measures
§5.1. Relative entropy
§5.2. Sanov’s theorem
§5.3. Maximum entropy principle

Chapter 6. Process level large deviations for i.i.d. fields
§6.1. Setting
§6.2. Specific relative entropy
§6.3. Pressure and the large deviation principle

Part II. Statistical mechanics

Chapter 7. Formalism for classical lattice systems
§7.1. Finite volume model
§7.2. Potentials and Hamiltonians
§7.3. Specifications
§7.4. Phase transition
§7.5. Extreme Gibbs measures
§7.6. Uniqueness for small potentials

Chapter 8. Large deviations and equilibrium statistical mechanics
§8.1. Thermodynamic limit of the pressure
§8.2. Entropy and large deviations under Gibbs measures
§8.3. Dobrushin-Lanford-Ruelle (DLR) variational principle

Chapter 9. Phase transition in the Ising model
§9.1. One-dimensional Ising model
§9.2. Phase transition at low temperature
§9.3. Case of no external field
§9.4. Case of nonzero external field

Chapter 10. Percolation approach to phase transition
§10.1. Bernoulli bond percolation and random cluster measures
§10.2. Ising phase transition revisited

Part III. Additional large deviation topics

Chapter 11. Further asymptotics for i.i.d. random variables
§11.1. Refinement of Cramér’s theorem
§11.2. Moderate deviations

Chapter 12. Large deviations through the limiting generating function
§12.1. Essential smoothness and exposed points
§12.2. Gärtner-Ellis theorem
§12.3. Large deviations for the current of particles

Chapter 13. Large deviations for Markov chains
§13.1. Relative entropy for kernels
§13.2. Countable Markov chains
§13.3. Finite Markov chains

Chapter 14. Convexity criterion for large deviations

Chapter 15. Nonstationary independent variables
§15.1. Generalization of relative entropy and Sanov’s theorem
§15.2. Proof of the large deviation principle

Chapter 16. Random walk in a dynamical random environment
§16.1. Quenched large deviation principles
§16.2. Proofs via the Baxter-Jain theorem

Appendixes

Appendix A. Analysis
§A.1. Metric spaces and topology
§A.2. Measure and integral
§A.3. Product spaces
§A.4. Separation theorem
§A.5. Minimax theorem

Appendix B. Probability
§B.1. Independence
§B.2. Existence of stochastic processes
§B.3. Conditional expectation
§B.4. Weak topology of probability measures
§B.5. First limit theorems
§B.6. Ergodic theory
§B.7. Stochastic ordering

Appendix C. Inequalities from statistical mechanics
§C.1. Griffiths’ inequality
§C.2. Griffiths-Hurst-Sherman inequality

Appendix D. Nonnegative matrices

Bibliography
Notation index
Author index
General index
Preface
This book arose from courses on large deviations and related topics given by
the authors in the Departments of Mathematics at the Ohio State University
(1993), at the University of Wisconsin-Madison (2006, 2013), and at the
University of Utah (2008, 2013).
Our goal has been to create an attractive collection of material for a
semester’s course which would also serve the broader needs of students from
different fields. This goal has had two implications for the book.
(1) We have not aimed at anything like an encyclopedic coverage of dif-
ferent techniques for proving large deviation principles (LDPs). Part I of the
book focuses on one classic line of reasoning: (i) upper bound by an exponen-
tial Markov-Chebyshev inequality, (ii) lower bound by a change of measure,
and (iii) an argument to match the rates from the first two steps. Beyond
this technique Part I covers Bryc’s theorem and proves Cramér’s theorem
with the subadditive method. Part III of the book covers the Gärtner-Ellis
theorem and an approach based on the convexity of a local rate function
due to Baxter and Jain.
(2) We have not felt obligated to stay within the boundaries of large
deviation theory but instead follow the trail of interesting material. Large
deviation theory is a natural gateway to statistical mechanics. Discussion
of statistical mechanics would be incomplete without some study of phase
transitions. We prove the phase transition of the Ising model in two differ-
ent ways: (i) first with classical techniques: Peierls argument, Dobrushin’s
uniqueness condition, and correlation inequalities, and (ii) the second time
with random cluster measures. This means leaving large deviation theory
completely behind. Along the way we have the opportunity to learn cou-
pling methods which are central to modern probability theory but do not
get serious application in the typical first graduate course in probability.
Here is a brief overview of the contents of the book.
Part I covers core general large deviation theory, the relevant convex
analysis, and the large deviations of i.i.d. processes on three levels: Cramér’s
theorem, Sanov’s theorem, and the process level LDP for i.i.d. variables
indexed by a multidimensional square lattice.
Part II introduces Gibbs measures and proves the Dobrushin-Lanford-
Ruelle variational principle that characterizes translation-invariant Gibbs
measures. After this we study the phase transition of the Ising model. Part
II ends with a chapter on the Fortuin-Kasteleyn random cluster model and
the percolation approach to Ising phase transition.
Part III develops the large deviation themes of Part I in several direc-
tions. Large deviations of i.i.d. variables are complemented with moderate
deviations and with more precise large deviation asymptotics. The Gärtner-
Ellis theorem is developed carefully, together with the necessary additional
convex analysis beyond the basics covered in Part I. From large deviations
of i.i.d. processes we move on to Markov chains, to nonstationary indepen-
dent random variables, and finally to random walk in a dynamical random
environment. The last two topics give us an opportunity to apply another
approach to proving large deviation principles, namely the Baxter-Jain the-
orem. The Baxter-Jain theorem has not previously appeared in textbooks,
and its application to random walk in random environment is new.
The ideal background for reading this book would be some familiarity
with the language of measure-theoretic probability. Large deviation theory
does also require a little analysis, point set topology and functional analysis.
For example, readers should be comfortable with lower semicontinuity and
the weak topology on probability measures. It should be possible for an
instructor to accommodate students with quick lectures on technical pre-
requisites whenever needed. It is also possible to consider everything in
the framework of concrete finite spaces, in which case probability measures
become simply probability vectors.
In practice our courses have been populated by students with very di-
verse backgrounds, many with less than ideal knowledge of analysis and
probability. This has turned out less problematic than one might initially
fear. Mathematics students are typically fully satisfied only after every the-
oretical point is rigorously justified. But engineering students are content to
set aside much of the theory and focus on the essentials of the phenomenon
in question. There is great interest in probability theory among students
of economics, engineering and sciences. This interest should be encouraged
and nurtured with accessible courses.
The appendixes in the back of the book serve two purposes. There is
a quick overview of some basic results of analysis and probability without
proofs for the reader who wants a quick refresher. In particular, here the
reader can look up textbook tools such as convergence theorems and inequal-
ities that are referenced in the text. The other material in the appendixes
consists of specialized results used in the text, such as a minimax theorem
and inequalities from statistical mechanics. These are proved.
Since this book evolved in courses where we tried to actively engage the
students, the development of the material relies on frequent exercises. We
realize that this feature may not appeal to some readers. On the other hand,
spelling out all the technical details left as exercises might make for tedious
reading. Hopefully an instructor can fill in those details fairly easily if she
wants to present full details in class. Exercises that are referred to in the
text are marked with an asterisk.
One of us (TS) first learned large deviations from a course taught by
Steven Orey in 1988-89 at the University of Minnesota. We are greatly
indebted to the existing books on the subject, especially those by Amir
Dembo and Ofer Zeitouni [15], Frank den Hollander [16], Jean-Dominique
Deuschel and Daniel Stroock [18], Richard Ellis [32] and Srinivasa Varadhan
[79].
As a text that combines large deviations with equilibrium statistical
mechanics, [32] is a predecessor of ours. There is obviously a good degree
of overlap but the books are different. Ours is a textbook with a lighter
touch while [32] is closer to a research monograph, covers more models in
detail and explains much of the physics. We recommend [32] to our readers
and students for further study. Our phase transition discussion covers the
nearest-neighbor Ising model while [32] covers also long-range Ising models.
On the other hand, [32] does not cover Dobrushin’s uniqueness theorem,
random cluster models, general lattice systems, or their large deviations.
Our literature references are sparse and sometimes do not assign credit to
the originators of the ideas. We encourage the reader to consult the superb
historical notes and references in the monographs of Dembo-Zeitouni, Ellis,
and Georgii.
Here is a guide to the dependencies between the parts of the book.
Sections 2.1-2.3 and 3.1-3.2 are foundational for all discussions of large devi-
ations. In addition, we have the following links. Chapter 5 relies on Sections
4.1-4.2, and Chapter 6 relies on Chapter 5. Chapter 8 relies on Chapters
6 and 7. Chapter 9 can be read independently of large deviations after
Sections 7.1-7.3 and 7.6. Section 10.2 makes sense only in the context of
Chapter 9. Chapters 12 and 14 are independent of each other and both rely
on Sections 4.1-4.2. Chapter 13 relies on Chapter 5. Chapter 15 relies on
Section 13.1 and Chapter 14. Chapter 16 relies on Chapter 14.
We thank Jeff Steif for lecture notes that helped shape the proof of The-
orem 9.2, Jim Kuelbs for material for Chapter 11, and Chuck Newman for
helpful discussions on the liquid-gas phase transition for Chapter 7. We
also thank Davar Khoshnevisan for several valuable suggestions. We thank
the team at AMS and especially Ed Dunne for patience in the face of serial
breaches of agreed deadlines, and the several reviewers for valuable sugges-
tions.
Support from the National Science Foundation and the Wisconsin Alumni
Research Foundation is gratefully acknowledged.
Firas Rassoul-Agha
Timo Seppäläinen
Part I
Large deviations:
general theory and
i.i.d. processes
Chapter 1
Introductory discussion
Toss a fair coin n times. When n is small there is nothing to say beyond enu-
merating all the outcomes and their probabilities. With a large number of
tosses patterns and order emerge from the randomness: heads appear about
50% of the time and the histogram approaches a bell curve. As the number
of tosses increases these patterns become more and more pronounced. But
from time to time a random fluctuation might break the pattern: perhaps
10,000 tosses of a fair coin give 6000 heads. In fact, we know that there
is a chance of (1/2)^{10,000} that all tosses yield heads. The point is that to
understand the system well one cannot be satisfied with understanding only
the most likely outcomes. One also needs to understand rare events. But
why care about an event that has a chance of (1/2)^{10,000}?
Here is a simplified example to illustrate the importance of probabilities
of rare events. Imagine that an insurance company collects premiums at
a steady rate of c per month. Let Xk be the random amount that the
insurance company pays out in month k to cover customer claims. Let
Sn = X1 + · · · + Xn be the total pay-out in n months. Naturally the
premiums must cover the average outlays, so c > E[Xk]. The company
stays solvent as long as Sn ≤ cn. Quantifying the chances of the rare event
Sn > cn is then of obvious interest.
This is an introductory book on the methods of computing asymptotics
of probabilities of rare events: the theory of large deviations. Let us start
with a basic computation.
Example 1.1. Let {Xk}k∈N be a sequence of independent and identically
distributed (i.i.d.) Bernoulli random variables with success probability p
(each Xk = 1 with probability p and 0 with probability 1 − p). Denote the
partial sum by Sn = X1 + · · · + Xn. The strong law of large numbers says
that, as n → ∞, the sample mean Sn/n converges to p almost surely. But
at any given n there is a chance p^n for all heads (Sn = n) and also a chance
(1 − p)^n for all tails (Sn = 0). In fact, for any s ∈ (0, 1) there is a positive
chance of a fraction of heads close to s. Let us compute the asymptotics of
this probability.
Denote the integer part of x ∈ R by ⌊x⌋, that is, ⌊x⌋ is the largest integer
less than or equal to x. From binomial probabilities

P{Sn = ⌊ns⌋} = n!/(⌊ns⌋! (n − ⌊ns⌋)!) · p^{⌊ns⌋} (1 − p)^{n−⌊ns⌋}
            ∼ [n^n p^{⌊ns⌋} (1 − p)^{n−⌊ns⌋} / (⌊ns⌋^{⌊ns⌋} (n − ⌊ns⌋)^{n−⌊ns⌋})] · √( n / (2π⌊ns⌋(n − ⌊ns⌋)) ).
We used Stirling’s formula (Exercise 3.5)

(1.1)  n! ∼ e^{−n} n^n √(2πn).
Notation an ∼ bn means that an/bn → 1. Abbreviate

βn = √( n / (2π⌊ns⌋(n − ⌊ns⌋)) )

and to get rid of integer parts, let also

γn = [(ns)^{ns} (n − ns)^{n−ns} / (⌊ns⌋^{⌊ns⌋} (n − ⌊ns⌋)^{n−⌊ns⌋})] · [p^{⌊ns⌋}(1 − p)^{n−⌊ns⌋} / (p^{ns}(1 − p)^{n−ns})].
Then

P{Sn = ⌊ns⌋} ∼ βnγn exp{ ns log(p/s) + n(1 − s) log((1 − p)/(1 − s)) }.
∗Exercise 1.2. Show that there exists a constant C such that

1/(C√n) ≤ βn ≤ C/√n  and  1/(Cn) ≤ γn ≤ Cn

for large enough n. By being a little more careful you can improve the
second statement to C^{−1} ≤ γn ≤ C.
The asymptotics above gives the limit

(1.2)  lim_{n→∞} (1/n) log P{Sn = ⌊ns⌋} = −Ip(s)
       with Ip(s) = s log(s/p) + (1 − s) log((1 − s)/(1 − p)).
Note the minus sign introduced in front of Ip(s). This is a convention of
large deviation theory.
It is instructive to look at the graph of Ip (Figure 1.1). Ip extends
continuously to [0, 1] with values Ip(0) = log(1/(1 − p)) and Ip(1) = log(1/p) that
match the exponential decay of the probabilities of the events {Sn = 0} and
{Sn = n}. The unique zero of Ip is at the law of large numbers limit p
which we would regard as the “typical” behavior of Sn/n. Increasing values
of Ip correspond to less likely outcomes. For s ∉ [0, 1] it is natural to set
Ip(s) = ∞.

Figure 1.1. The rate function for coin tosses.
The function Ip in (1.2) is a large deviation rate function. We shall
understand later that Ip(s) is also the relative entropy of the coin with
success probability s relative to the one with success probability p. The
choice of terminology is not a coincidence. This quantity is related to both
information-theoretic and thermodynamic entropy.
For this reason we go on a brief detour to discuss these well-known
notions of entropy and to point out the link with the large deviation rate
function Ip. The relative entropy that appears in large deviation theory will
take center stage in Chapters 5–6, and again in Chapter 8 when we discuss
statistical mechanics of lattice systems.
Limit (1.2) is our first large deviation result. One of the very last ones
in the book is limit (16.12) which is the analogue of (1.2) for a random walk
in a dynamical random environment, that is, in a setting where the success
probability of the coin also fluctuates randomly.
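Limit (1.2) is also easy to watch numerically. Here is a quick illustrative Python sketch (the helper names are ours) that computes n^{−1} log P{Sn = ⌊ns⌋} exactly through log-gamma and compares it with −Ip(s); as n grows the first column approaches the second.

```python
import math

def log_binom_pmf(n, k, p):
    # log P{S_n = k} for S_n ~ Binomial(n, p), computed via log-gamma
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def rate_Ip(s, p):
    # I_p(s) = s log(s/p) + (1-s) log((1-s)/(1-p)), with 0 log 0 = 0
    def xlogy(x, y):
        return 0.0 if x == 0.0 else x * math.log(x / y)
    return xlogy(s, p) + xlogy(1 - s, 1 - p)

p, s = 0.5, 0.6
for n in [10, 100, 1000, 10000]:
    k = int(n * s)  # floor of ns
    print(n, log_binom_pmf(n, k, p) / n, -rate_Ip(s, p))
```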
1.1. Information-theoretic entropy
A coin that always comes up heads is not random at all, and the same of
course for a coin that always comes up tails. On the other hand, we should
probably regard a fair coin as the “most random” coin because we cannot
predict whether we see more heads or tails in a sequence of tosses with better
than even odds. We discuss here briefly the quantification of the degree of
randomness of a sequence of coin flips. We take the point of view that the
degree of randomness of the coin is reflected in the average number of bits
needed to encode a sequence of tosses. This section is inspired by Chapter
2 of Ash [4].
Let Ω = {0, 1}n be the space of words ω ∈ Ω of length n. A message
is a concatenation of words. The message made of words ω1, ω2, . . . , ωm is
written ω1ω2 · · · ωm. A code is a map C : Ω → ∪_{ℓ≥1} {0, 1}^ℓ that assigns to
each word ω ∈ Ω a code word C(ω) which is a finite sequence of 0s and
1s. |C(ω)| denotes the length of code word C(ω). A concatenation of code
words is a code message. Thus, a message is encoded by concatenating the
code words of its individual words to make a code message: C(ω1 · · · ωm) =
C(ω1) · · · C(ωm). A code should be uniquely decipherable. That is, for every
finite sequence c1 · · · cℓ of 0s and 1s there exists at most one message ω1 · · · ωm
such that C(ω1) · · · C(ωm) = c1 · · · cℓ.
Now sample words at random under a probability distribution P on
the space Ω. In this discussion we employ the base 2 logarithm log2 x =
log x/ log 2.
Noiseless coding theorem. If C is a uniquely decipherable code, then its
average length satisfies

(1.3)  Σ_{ω∈Ω} P(ω) |C(ω)| ≥ − Σ_{ω∈Ω} P(ω) log2 P(ω)

with equality if and only if P(ω) = 2^{−|C(ω)|}.
In information theory the quantity on the right of (1.3) is called the
Shannon entropy of the probability distribution P. For a simple proof of
the theorem see [4, Theorem 2.5.1, page 37].
Consider the case where the n characters of the word ω are chosen in-
dependently, and let s ∈ [0, 1] be the probability that a character is a 1.
Then P(ω) = s^{N(ω)}(1 − s)^{n−N(ω)}, where N(ω) is the number of ones in ω.
(As usual, 0^0 = 1.) By the noiseless coding theorem, the average length of
a decipherable code C satisfies

Σ_{ω∈Ω} |C(ω)| s^{N(ω)}(1 − s)^{n−N(ω)} ≥ − Σ_{ω∈Ω} s^{N(ω)}(1 − s)^{n−N(ω)} log2[ s^{N(ω)}(1 − s)^{n−N(ω)} ].
Since Σ_ω s^{N(ω)}(1 − s)^{n−N(ω)} = 1 and Σ_ω N(ω) s^{N(ω)}(1 − s)^{n−N(ω)} = ns, the
right-hand side equals nh(s) where

h(s) = −s log2 s − (1 − s) log2(1 − s) = 1 − I_{1/2}(s)/log 2,

and we see the large deviations rate function from (1.2) appear. Thus we
have the lower bound

(1.4)  Σ_{ω∈Ω} |C(ω)| s^{N(ω)}(1 − s)^{n−N(ω)} ≥ nh(s).
In other words, any uniquely decipherable code for independent and identi-
cally distributed characters with probability s for a 1 must use, on average,
at least h(s) bits per character. In this case the Shannon entropy and the
rate function I_{1/2} are related by

− Σ_{ω∈Ω} P(ω) log2 P(ω) = nh(s) = n(1 − I_{1/2}(s)/log 2).
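As a quick numerical sanity check of the identity h(s) = 1 − I_{1/2}(s)/log 2, here is a small illustrative Python sketch:

```python
import math

def h(s):
    # binary Shannon entropy in bits, with the convention 0 log 0 = 0
    return -sum(t * math.log2(t) for t in (s, 1 - s) if t > 0)

def I_half(s):
    # rate function (1.2) with p = 1/2: I_{1/2}(s) = s log(2s) + (1-s) log(2(1-s))
    return sum(t * math.log(2 * t) for t in (s, 1 - s) if t > 0)

for s in [0.0, 0.1, 0.25, 0.5, 0.9]:
    print(s, h(s), 1 - I_half(s) / math.log(2))  # the last two columns agree
```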
Here is a simplistic way to see the lower bound nh(s) on the number of
bits needed that makes an indirect appeal to large deviations in the sense
that deviant words are ignored. With probability s for symbol 1, the typical
word of length n has about ns ones. Suppose we use code words of length
L to code these typical words. Then
2L
≥

n
bnsc

and the lower bound L ≥ nh(s) + O(log n) follows from Stirling’s formula.
The values h(0) = h(1) = 0 make asymptotic sense. For example, if
s = 0, then a word of any length n is all zeroes and can be encoded by a
single bit, which in the n → ∞ limit gives 0 bits per character. This is the
case of complete order. At the other extreme of complete disorder is the
case s = 1/2 of fair coin tosses where all n bits are needed because all words
of a given length are equally likely. For s ≠ 1/2 a 1 is either more or less
likely than a 0 and by exploiting this bias one can encode with less than 1
bit per character on average.
David A. Huffman [48], while a Ph.D. student at MIT, developed an
optimal decipherable code; that is, a code C whose average length cannot
be improved upon. As n → ∞, the average length of the code generated by
this algorithm is exactly h(s) per character and so the lower bound (1.4) is
achieved asymptotically. We illustrate the algorithm through an example.
For a proof of its optimality and asymptotic average length see page 42 of
[4].
Example 1.3 (Huffman’s algorithm). Consider the case n = 3 and s = 1/4.
There are 8 words. Word 111 comes with probability 1/4³, words 110, 101,
and 011 come each with probability 3/4³, words 100, 010, and 001 come
with probability 3²/4³ each, and word 000 comes with probability (3/4)³.
These 8 words are the terminal leaves of a binary tree that we build.
Figure 1.2. The tree for Huffman’s algorithm in the case n = 3 and
s = 1/4. The leftmost column shows the resulting codes.
First, find the two leaves with the smallest probabilities. Ties can be
resolved arbitrarily. Give these two leaves a and b a common ancestor
labeled with a probability that is the sum of the probabilities of a and b. In
our example, leaves 111 and 110 are given a common parent labeled with
probability 4/4³.

Now leaves a and b are done with and their parent is regarded as a new
leaf. Repeat the step. Continue until there is one leaf left. In our example,
the second step gives a common ancestor to leaves 101 and 011. This new
node is labeled 3/4³ + 3/4³ = 6/4³. And so on. Figure 1.2 presents the final
tree.
To produce the code of a word, start at the root and follow the tree to
the leaf of that word. At each fork encode a down step with a 0 and an up
step with a 1 (in our figure). For instance, word 101 is reached from the root
by three successive up steps followed by a single down step then another up
step. Thus word 101 is encoded as 11101.
The average length of the code is

(5·1 + 5·3 + 5·3 + 5·3 + 3·3² + 3·3² + 3·3² + 1·3³)/4³ = 158/64.
This is 158/192 ≈ 0.8229 bits per character. As the number of characters n
grows, the average length of the encoding per character will converge to the
information-theoretic entropy h(1/4) ≈ 0.811.
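For experimentation, here is a minimal Python sketch of the merging step described above (the helper huffman_lengths is our illustrative name); it tracks only code lengths, which is all the average length requires. On the example above it prints 2.46875 = 158/64 bits per word, about 0.8229 per character.

```python
import heapq
from itertools import product

def huffman_lengths(probs):
    # Repeatedly merge the two least probable subtrees; every leaf in a
    # merged subtree moves one level deeper, i.e. its code grows by one bit.
    heap = [(q, i, [sym]) for i, (sym, q) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    count = len(heap)  # unique tie-breaker so tuples never compare lists
    while len(heap) > 1:
        qa, _, a = heapq.heappop(heap)
        qb, _, b = heapq.heappop(heap)
        for sym in a + b:
            lengths[sym] += 1
        heapq.heappush(heap, (qa + qb, count, a + b))
        count += 1
    return lengths

n, s = 3, 0.25
words = ["".join(w) for w in product("01", repeat=n)]
probs = {w: s ** w.count("1") * (1 - s) ** (n - w.count("1")) for w in words}
L = huffman_lengths(probs)
avg = sum(probs[w] * L[w] for w in words)
print(avg, avg / n)  # 2.46875 bits per word, ~0.8229 bits per character
```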
1.2. Thermodynamic entropy
The next discussion of thermodynamics is inspired by Schrödinger’s lectures
[70]. After some preliminary computations we use the first and second laws
of thermodynamics to derive an expression for entropy. In the simplest case
of a system with two energy levels this expression can be related to the
rate function (1.2). The reader should be aware that this section is not
mathematically rigorous.
Let A denote a physical system whose possible energy levels are {εℓ :
ℓ ∈ N}. Then consider a larger system An made up of n identical physically
independent copies of A. By physically independent we mean that these
components of An do not communicate with each other. Each component
can be at any energy level εℓ. Immerse the whole system in a large heat
bath at fixed absolute temperature T which gives the system total energy
E = nU.
Let aℓ be the number of components in state εℓ. These numbers must
satisfy the constraints

(1.5)  Σ_ℓ aℓ = n  and  Σ_ℓ aℓ εℓ = E.

For given values aℓ, the total number of possible arrangements of the
components at different energy levels is n!/(a1! · · · aℓ! · · ·). When n is large,
it is reasonable to assume that the values aℓ that appear are the ones that
maximize the number of arrangements, subject to the constraints (1.5).
To find these optimal aℓ values, maximize the logarithm of the number
of arrangements and introduce Lagrange multipliers α and β. Thus we wish
to differentiate

(1.6)  log[ n!/(a1! · · · aℓ! · · ·) ] − α( Σ_ℓ aℓ − n ) − β( Σ_ℓ aℓ εℓ − E )

with respect to aℓ and set the derivative equal to zero. To use calculus,
pretend that the unknowns aℓ are continuous variables and use Stirling’s
formula (1.1) in the form log n! ∼ n(log n − 1). We arrive at

log aℓ + α + βεℓ = 0, for all ℓ.
Thus aℓ = Ce^{−βεℓ}. Since the total number of components is n = Σ aℓ,

(1.7)  aℓ = n e^{−βεℓ} / Σ_j e^{−βεj}.

The second constraint gives

E = n Σ_ℓ εℓ e^{−βεℓ} / Σ_ℓ e^{−βεℓ}.
These equations should be understood to hold only asymptotically. Di-
vide both equations by n and take n → ∞. We interpret the limit as saying
that when a typical system A is immersed in a heat bath at temperature T
the system takes energy εℓ with probability

(1.8)  pℓ = e^{−βεℓ} / Σ_j e^{−βεj}

and then has average energy

(1.9)  U = Σ_ℓ εℓ e^{−βεℓ} / Σ_ℓ e^{−βεℓ}.
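Equations (1.8) and (1.9) are easy to explore numerically. A small illustrative Python sketch, with three hypothetical energy levels of our choosing:

```python
import math

def gibbs(levels, beta):
    # weights e^{-beta * eps_l}; normalize to get (1.8), then average for (1.9)
    w = [math.exp(-beta * e) for e in levels]
    Z = sum(w)
    p = [x / Z for x in w]
    U = sum(pi * e for pi, e in zip(p, levels))
    return p, U

levels = [0.0, 1.0, 2.0]
for beta in [0.1, 1.0, 10.0]:
    p, U = gibbs(levels, beta)
    print(beta, [round(x, 4) for x in p], round(U, 4))
# As beta grows (temperature drops) the mass concentrates on the lowest
# level and U decreases; as beta -> 0 the distribution becomes uniform.
```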
Expression (1.6) suggests that β is a function of {εℓ} and U. We argue with
physical reasoning that β is in fact a universal function of T alone.

Consider another system B with energy levels {ε̄m}. Let Bn denote a
composite system of n identical and independent copies of B, also physi-
cally independent of An. Immersing An in a heat bath with temperature T
specifies a value of β for it. Since β can a priori depend on {εℓ}, which is a
characteristic of system A, we denote this value by βA. Similarly, immersing
Bn in the same heat bath leads to value βB.

We can also immerse An and Bn together in the heat bath and consider
them together as consisting of n independent and identical copies of a system
AB. This system acquires its own value βAB which depends on the tem-
perature T and on the energies a system AB can take. Since A and B are
physically independent, AB can take energies in the set {εℓ + ε̄m : ℓ, m ∈ N}.
Let aℓ,m be the number of AB-components whose A-part is at energy
level εℓ and whose B-part is at energy level ε̄m, when An and Bn are im-
mersed together in the heat bath. Solving the Lagrange multipliers problem
for the AB-system gives

aℓ,m = n e^{−βAB(εℓ+ε̄m)} / Σ_{i,j} e^{−βAB(εj+ε̄i)}
     = n · [ e^{−βAB εℓ} / Σ_j e^{−βAB εj} ] · [ e^{−βAB ε̄m} / Σ_i e^{−βAB ε̄i} ].

To obtain aℓ, the number of A-components at energy εℓ, sum over m:

aℓ = Σ_m aℓ,m = n e^{−βAB εℓ} / Σ_j e^{−βAB εj}.
Since An and Bn do not interact, this must agree with the earlier outcome
(1.7):

aℓ = n e^{−βA εℓ} / Σ_j e^{−βA εj} = n e^{−βAB εℓ} / Σ_j e^{−βAB εj}  for all ℓ ∈ N.

It is reasonable to assume that system A can take at least two different
energies εℓ ≠ εℓ′, for otherwise the discussion is trivial. Then the above gives
e^{−βA(εℓ−εℓ′)} = e^{−βAB(εℓ−εℓ′)} and so βA = βAB. Switching the roles of A and
B leads to βB = βAB = βA. Since system B was arbitrary, we conclude that
β is a universal function of T.
We regard β as the more fundamental quantity and view T as a universal
function of β. The state of the system is then determined by the energy levels
{εℓ} and β by equations (1.8) and (1.9).

Next we derive the precise formula for the dependence of T on β. Work-
ing with fixed energies εℓ and considering β to be the only variable will not
help, since we can replace β by any monotone function of it and nothing in
the above reasoning changes. We need to make energies εℓ vary which leads
to the notion of work done by the system.
The first law of thermodynamics states that if the parameters of the
system (i.e. its energies εℓ) change it will absorb an average amount of heat
dQ = dE + dW, where dW is the work done by the system. If the energies
change by dεℓ, then dW = −Σ_ℓ aℓ dεℓ and

dQ = dE − Σ_ℓ aℓ dεℓ.
Let nS be the entropy of the system An. By the second law of thermody-
namics
dQ = nT dS.
Define the free energy F = log Σ_j e^{−βεj}. Divide the two displays above
by n to write

(1.10)  dS = (1/T)[ dU − Σ pℓ dεℓ ]
           = (1/(Tβ))[ d(βU) − U dβ − β Σ pℓ dεℓ ]
           = (1/(Tβ))[ d(βU) + (∂F/∂β) dβ + Σ (∂F/∂εℓ) dεℓ ]
           = (1/(Tβ)) d(βU + F).
Abbreviate G = βU + F which, by the display above, has to be a function
f(S) such that f′(S) = Tβ.
Recall that the three systems A, B, and AB acquire the same β when
immersed in the heat bath. Consequently FA + FB = FAB. Since U = −∂F/∂β,
the same additivity holds for the function G, and so

f(SA) + f(SB) = f(SAB).

Then by (1.10), since T is a universal function of β, dSAB = dSA + dSB
which implies SAB = SA + SB + c. Now we have

f(SA) + f(SB) = f(SA + SB + c).

Differentiate in SA and SB to see that f′(SA) = f′(SB). Since the system
B was chosen arbitrarily, entropy SB can be made equal to any number
regardless of the value of temperature T. Therefore f′(S) must be a universal
constant which we call 1/k. (This constant cannot be zero because T and β
vary with each other.) This implies β = 1/(kT) and G = k^{−1}S. Constant k is
called Boltzmann’s constant. If k < 0, (1.8) would imply that as T → 0 the
system chooses the highest energy state which goes against physical sense.
Hence k > 0.
Let us compute S for a system with two energy levels ε0 and ε1. By
symmetry, recentering, and a change of units, we can assume that ε0 = 0
and ε1 = 1. The system takes energy 0 with probability p0 and energy
1 with probability p1. The average energy U = p1 and from (1.8) p1 =
e^{−β}/(1 + e^{−β}). Then

S = kG = k(βU + F) = k[ p1(β + F) + (1 − p1)F ]
  = −k[ p1 log p1 + (1 − p1) log(1 − p1) ]
  = k log 2 − k I_{1/2}(p1).
Thus rate function I1/2 of Example 1.1 is, up to a universal positive
multiplicative factor and an additive constant, the negative thermodynamic
entropy of a two-energy system. In the previous section we saw that −I1/2
is a linear function (with positive slope) of information-theoretic entropy.
Together these observations imply that the thermodynamic entropy of a
physical system represents the amount of information needed to describe
the system or, equivalently, the amount of uncertainty remaining in it.
The identity (kβ)^{−1}S = U + β^{−1}F expresses an energy-entropy balance
and reappears several times later in various guises. It can be found in
Exercise 5.19, as equation (7.8) for the Curie-Weiss model, and in Section
8.3 as part (c) of the Dobrushin-Lanford-Ruelle variational principle for
lattice systems.
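The two-level computation above can be verified numerically. A brief illustrative Python check that βU + F agrees with log 2 − I_{1/2}(p1), that is, S/k = G:

```python
import math

def I_half(s):
    # I_{1/2}(s) = s log(2s) + (1-s) log(2(1-s)), with 0 log 0 = 0
    return sum(t * math.log(2 * t) for t in (s, 1 - s) if t > 0)

for beta in [0.5, 1.0, 2.0, 5.0]:
    p1 = math.exp(-beta) / (1 + math.exp(-beta))  # occupation of level 1, from (1.8)
    F = math.log(1 + math.exp(-beta))             # free energy for levels 0 and 1
    print(beta, beta * p1 + F, math.log(2) - I_half(p1))  # the two columns agree
```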
1.3. Large deviations as useful estimates
The subject of large deviations is about controlling probabilities of atypical
events. There are two somewhat different forms of this activity.
(i) Proofs of limit theorems in probability require estimates to rule out
atypical behavior. Such estimates could be called “ad-hoc large
deviations”.
(ii) Precise limits of vanishing probabilities on an exponential scale are
stated as large deviation principles.
The subject of our book is the second kind of large deviations. The next
chapter begins a systematic development of large deviation principles. Be-
fore that, let us look at two textbook examples to illustrate the use of inde-
pendence in the derivation of estimates to prove limit theorems.
Example 1.4. Let {Xn} be an i.i.d. sequence with E[X] = 0. (Common
device: X is a random variable that has the same distribution as all the
Xn’s.) We wish to show that, under a suitable hypothesis,

(1.11)  Sn/n^p → 0  P-almost surely, for any p > 1/2.

In order to illustrate a method, we make a strong assumption. Assume the
existence of δ > 0 such that E[e^{θX}] < ∞ for |θ| ≤ δ. When p ≥ 1 limit (1.11)
follows from the strong law of large numbers. So let us assume p ∈ (1/2, 1).
For t ≥ 0 Chebyshev’s inequality gives

P{Sn ≥ εn^p} ≤ E[e^{tSn − εtn^p}] = exp{−εtn^p + n log E[e^{tX}]}.

The exponential moment assumption implies that E[|X|^k] t^k/k! is summable
for t ∈ [0, δ]. Recalling that E[X] = 0,

E[e^{tX}] = E[e^{tX} − tX] ≤ 1 + Σ_{k=2}^∞ (t^k/k!) E[|X|^k]
         ≤ 1 + t² δ^{−2} Σ_{k=2}^∞ (δ^k/k!) E[|X|^k] ≤ 1 + ct²  for t ∈ [0, δ].
Then, taking t = εn^p/(2nc) and n large enough,

(1.12)  P{Sn ≥ εn^p} ≤ exp{−εtn^p + n log(1 + ct²)}
                    ≤ exp{−εtn^p + nct²} = exp{ −(ε²/(4c)) n^{2p−1} }.

Applying this to the sequence {−Xn} gives the matching bound on the left:

(1.13)  P{Sn ≤ −εn^p} ≤ exp{ −(ε²/(4c)) n^{2p−1} }.
Inequalities (1.12)–(1.13) can be regarded as large deviation estimates. (Al-
though later we see that since the scale is n^p for 1/2 < p < 1, technically
these are called moderate deviations. But that distinction is not relevant
here.) These estimates imply the summability

Σ_n P{|Sn| ≥ εn^p} < ∞.

The Borel-Cantelli lemma implies that for any ε > 0

P{∃n0 : n ≥ n0 ⇒ |Sn/n^p| ≤ ε} = 1.

A countable intersection over ε = 1/k for k ∈ N gives

P{∀k ∃n0 : n ≥ n0 ⇒ |Sn/n^p| ≤ 1/k} = 1,

which says that Sn/n^p → 0, P-a.s.
We used an unnecessarily strong assumption to illustrate the exponential
Chebyshev method. We can achieve the same result with martingales under
the assumption E[|X|²] < ∞. Since Sn is a martingale (relative to the
filtration σ(X1, . . . , Xn)), Doob’s inequality (Theorem 5.4.2 of [27] or (8.26)
of [54]) gives

P{ max_{k≤n} |Sk| ≥ εn^p } ≤ (1/(ε²n^{2p})) E[|Sn|²] = nE[|X|²]/(ε²n^{2p}) = (c/ε²) n^{−(2p−1)}.
Pick r > 0 such that r(2p − 1) > 1. Then,

P{ max_{k≤m^r} |Sk| ≥ εm^{pr} } ≤ c1/m^{r(2p−1)}.

Hence, P{max_{k≤m^r} |Sk| ≥ εm^{pr}} is summable over m and the Borel-Cantelli
lemma implies that m^{−rp} max_{k≤m^r} |Sk| → 0 P-a.s. as m → ∞.

To get the result for the full sequence pick mn such that (mn − 1)^r ≤
n < mn^r. Then,

n^{−p} max_{k≤n} |Sk| ≤ (mn^r/n)^p · mn^{−rp} max_{k≤mn^r} |Sk| −→ 0  as n → ∞

because mn^r/n → 1.
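A simple simulation illustrates conclusion (1.11). The illustrative Python sketch below uses Rademacher steps, P{X = ±1} = 1/2, one convenient choice satisfying the assumptions; the printed maxima drift toward 0, slowly, at rate about n^{1/2−p}.

```python
import random

random.seed(1)
p = 0.6  # any exponent p > 1/2

def S(n):
    # partial sum of n i.i.d. centered +/-1 steps
    return sum(1 if random.random() < 0.5 else -1 for _ in range(n))

for n in [10**2, 10**3, 10**4, 10**5]:
    print(n, max(abs(S(n)) / n**p for _ in range(20)))
```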
Example 1.5 (Longest run of heads). Let {Xn} be an i.i.d. sequence of
Bernoulli random variables with success probability p. For each n ≥ 1 let
Rn be the length of the longest success run among (X1, . . . , Xn). We derive
estimates to prove a result of Rényi [66] that

(1.14)  P{ lim_{n→∞} Rn/log n = −1/log p } = 1.
Fix b > 1 and r such that r(b − 1) > 1. Let ℓm = ⌈−br log m/ log p⌉.
(⌈x⌉ is the smallest integer larger than or equal to x.) If R_{m^r} ≥ ℓm, then
there is an i ≤ m^r such that Xi = Xi+1 = · · · = X_{i+ℓm−1} = 1. Therefore

(1.15)  P{R_{m^r} ≥ ℓm} ≤ m^r p^{ℓm} ≤ 1/m^{r(b−1)}.
By the Borel-Cantelli lemma, with probability one, R_{m^r} ≤ ℓm for large
enough m. (Though how large m needs to be is random.) Consequently,
with probability one,

limsup_{m→∞} R_{m^r}/log m^r ≤ −b/log p.
Given n, let mn be such that mn^r ≤ n < (mn + 1)^r. Then Rn ≤ R_{(mn+1)^r}
and

limsup_{n→∞} Rn/log n ≤ limsup_{n→∞} [ log (mn + 1)^r / log mn^r ] · [ R_{(mn+1)^r} / log (mn + 1)^r ] ≤ −b/log p.

Taking b ↘ 1 along a sequence shows that

P{ limsup_{n→∞} Rn/log n ≤ −1/log p } = 1.
We have the upper bound for the goal (1.14).
Fix a ∈ (0, 1) and let ℓn = ⌊−a log n/ log p⌋. Let Ai be the event that
X_{iℓn+1} = · · · = X_{(i+1)ℓn} = 1. Then

{Rn < ℓn} ⊂ ∩_{i=0}^{⌊n/ℓn⌋−1} Ai^c.

By the independence of the Ai’s

(1.16)  P{Rn < ℓn} ≤ (1 − p^{ℓn})^{n/ℓn − 1} ≤ e^{−p^{ℓn}(n/ℓn − 1)} ≤ e^{1 − n^{1−a}/ℓn}.
Once again, by the Borel-Cantelli lemma, Rn < ℓn happens only finitely
often, with probability one, and thus liminf_{n→∞} Rn/log n ≥ −a/log p. Taking
a ↗ 1 proves that

P{ liminf_{n→∞} Rn/log n ≥ −1/log p } = 1.
Looking back, the proof relied again on a right-tail estimate (1.15) and a
left-tail estimate (1.16). It might be a stretch to call (1.15) a large devia-
tion bound since it is not exponential, but (1.16) can be viewed as a large
deviation bound.
Remark 1.6. Combining the limit theorem above with the fact that the
variance of Rn remains bounded as n → ∞ (see [10, 42]) provides a very
accurate test of the hypothesis that the sequence {Xn} is i.i.d. Bernoulli
with probability of success p.
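A short simulation makes limit (1.14) visible. The following Python sketch is purely illustrative:

```python
import math, random

random.seed(2)
p = 0.5

def longest_run(n):
    # length of the longest run of successes among n Bernoulli(p) trials
    best = run = 0
    for _ in range(n):
        run = run + 1 if random.random() < p else 0
        best = max(best, run)
    return best

for n in [10**3, 10**4, 10**5, 10**6]:
    print(n, longest_run(n) / math.log(n), -1 / math.log(p))
# the ratio R_n / log n settles near -1/log p (= 1/log 2 ~ 1.4427 here)
```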
Chapter 2
The large deviation
principle
2.1. Precise asymptotics on an exponential scale
Since the 1960’s a standard formalism has been employed to express limits
of probabilities of rare events on an exponential scale. The term for these
statements is large deviation principle (LDP). We introduce this in a fairly
abstract setting and then return to the Bernoulli example.
There is a sequence {µn} of probability measures whose asymptotics we
are interested in. These measures exist on some measurable space (X, B).
Throughout our general discussion we take X to be a Hausdorff topological
space, unless further assumptions are placed on it. B = BX is the Borel
σ-algebra of X, and M1(X) is the space of probability measures on the
measurable space (X, BX ). Thus {µn} is a sequence in the space M1(X).
In Example 1.1 X = R and µn is the probability distribution of Sn/n:
µn(A) = P{Sn/n ∈ A} for Borel subsets A ⊂ R.
Remark on mathematical generality. A reader not familiar with point-
set topology can assume that X is a metric space without any harm. Even
taking X = R or Rd will do for a while. However, later we will study large
deviations on spaces of probability measures, and the more abstract point
of view becomes a necessity. If the notion of a Borel set is not familiar, it is
safe to think of Borel sets as “all the reasonable sets for which a probability
can be defined.”
To formulate a general large deviation statement, let us look at result
(1.2) of Example 1.1 for guidance. The first ingredient of interest in (1.2) is
the normalization n^{−1} in front of the logarithm. Obviously this can change
in a different example. Thus we should consider probabilities µn(A) that
decay roughly like e^{−rn C(A)} for some normalization rn ↗ ∞ and a constant
C(A) ∈ [0, ∞] that depends on the event A.
In (1.2) we identified a rate function. How should the constant C(A)
relate to a rate function? Consider a finite set A = {x1, . . . , xn}. Then
asymptotically

rn^{−1} log µn(A) = rn^{−1} log Σ_i µn{xi} ≈ max_i rn^{−1} log µn{xi}

so that C(A) = min_i C(xi). This suggests that in general C(A) should be
the infimum of a rate function I over A.
The final technical point is that it is in general unrealistic to expect
rn^{−1} log µn(A) to actually converge on account of boundary effects, even if A
is a nice set. A reasonable goal is to expect statements in terms of limsup
and liminf.
From these considerations we arrive at the following tentative formula-
tion of a large deviation principle: for Borel subsets A of the space X,

(2.1)  − inf_{x∈A°} I(x) ≤ liminf_{n→∞} (1/rn) log µn(A)
       ≤ limsup_{n→∞} (1/rn) log µn(A) ≤ − inf_{x∈Ā} I(x),

where A° and Ā are, respectively, the topological interior and closure of A.
This statement is basically what we want, except that we need to address
the uniqueness of the rate function.
Example 2.1. Let us return to the i.i.d. Bernoulli sequence {Xn} of Exam-
ple 1.1. We claim that probability measures µn(A) = P{Sn/n ∈ A} satisfy
(2.1) with normalization rn = n and rate Ip of (1.2). This follows from (1.2)
with a small argument.

For an open set G and s ∈ G ∩ [0, 1], ⌊ns⌋/n ∈ G for large enough n. So

liminf_{n→∞} (1/n) log P{Sn/n ∈ G} ≥ lim_{n→∞} (1/n) log P{Sn = ⌊ns⌋} = −Ip(s).

This holds also for s ∈ G∖[0, 1] because Ip(s) = ∞. Taking supremum over
s ∈ G on the right gives the inequality

liminf_{n→∞} (1/n) log P{Sn/n ∈ G} ≥ sup_{s∈G} (−Ip(s)) = − inf_{s∈G} Ip(s).
With G = A◦ this gives the lower bound in (2.1).
Split a closed set F into F1 = F ∩ (−∞, p] and F2 = F ∩ [p, ∞). First
prove the upper bound in (2.1) for F1 and F2 separately. Let a = sup F1 ≤ p
and b = inf F2 ≥ p. (If F1 is empty then a = −∞ and if F2 is empty then
b = ∞.) Assume first that a ≥ 0. Then

(1/n) log P{Sn/n ∈ F1} ≤ (1/n) log P{Sn/n ∈ [0, a]} = (1/n) log Σ_{k=0}^{⌊na⌋} P{Sn = k}.
∗Exercise 2.2. Prove that P{Sn = k} increases with k ≤ ⌊na⌋.
By the exercise above,

limsup_{n→∞} (1/n) log P{Sn/n ∈ F1} ≤ lim_{n→∞} (1/n) log[ (⌊na⌋ + 1) P{Sn = ⌊na⌋} ] = −Ip(a).

This formula is still valid even when a < 0 because the probability vanishes.
A similar upper bound works for F2. Next write

(1/n) log P{Sn/n ∈ F} ≤ (1/n) log[ P{Sn/n ∈ F1} + P{Sn/n ∈ F2} ]
≤ (1/n) log 2 + max{ (1/n) log P{Sn/n ∈ F1}, (1/n) log P{Sn/n ∈ F2} }.
Ip is decreasing on [0, p] and increasing on [p, 1]. Hence, inf_{F1} Ip = Ip(a),
inf_{F2} Ip = Ip(b), and inf_F Ip = min(Ip(a), Ip(b)). Finally,

limsup_{n→∞} (1/n) log P{Sn/n ∈ F} ≤ − min(Ip(a), Ip(b)) = − inf_F Ip.

If we now take F = Ā, the upper bound in (2.1) follows.

We have shown that (2.1) holds with Ip defined in (1.2). This is our first
example of a full-fledged large deviation principle.
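The bounds in (2.1) can also be observed numerically. The illustrative Python sketch below evaluates n^{−1} log P{Sn/n ∈ F} exactly for the closed set F = [a, 1] with a > p and compares it with − inf_F Ip = −Ip(a):

```python
import math

def log_binom_pmf(n, k, p):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_tail(n, a, p):
    # log P{S_n >= na} by numerically stable summation of binomial terms
    k0 = round(n * a)  # na is an integer for the n used below
    logs = [log_binom_pmf(n, k, p) for k in range(k0, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))

p, a = 0.5, 0.6
Ip = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
for n in [10, 100, 1000, 10000]:
    print(n, log_tail(n, a, p) / n, -Ip)  # first column approaches the second
```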
Remark 2.3. The limsup for closed sets and liminf for open sets in (2.1) re-
mind us of weak convergence of probability measures where the same bound-
ary issue arises. Section B.4 gives the definition of weak convergence.
These exercises contain other instances where the rate function can be
derived by hand.
Exercise 2.4. Prove (2.1) for the distribution of the sample mean of an
i.i.d. sequence of real-valued normal random variables. Identifying I is part
of the task.
Hint: The density of Sn/n can be written down explicitly. This suggests
I(x) = (x − µ)²/(2σ²), where µ is the mean and σ² is the variance of X1.
Exercise 2.5. Prove (2.1) for the distribution of the sample mean of an i.i.d.
sequence of exponential random variables and compute the rate function
explicitly.
Hint: Use Stirling’s formula.
2.2. Lower semicontinuous and tight rate functions
We continue with some general facts and then in Definition 2.12 state pre-
cisely what is meant by a large deviation principle. We recall the definition
of a lower semicontinuous function.
Definition 2.6. A function f : X → [−∞, ∞] is lower semicontinuous if
{f ≤ c} = {x ∈ X : f(x) ≤ c} is a closed subset of X for all c ∈ R.
∗Exercise 2.7. Prove that if X is a metric space then f is lower semicon-
tinuous if and only if liminf_{y→x} f(y) ≥ f(x) for all x.
An important transformation produces a lower semicontinuous function
flsc from an arbitrary function f : X → [−∞, ∞]. This lower semicontinuous
regularization of f is defined by

(2.2)  flsc(x) = sup{ inf_{y∈G} f(y) : G ∋ x and G is open }.
This turns out to be the maximal lower semicontinuous minorant of f.
Lemma 2.8. flsc is lower semicontinuous and flsc(x) ≤ f(x) for all x. If
g is lower semicontinuous and satisfies g(x) ≤ f(x) for all x, then g(x) ≤
flsc(x) for all x. In particular, if f is lower semicontinuous, then f = flsc.
Proof. flsc ≤ f is clear. To show flsc is lower semicontinuous, let x ∈
{flsc > c}. Then there is an open set G containing x such that inf_G f > c.
Hence by the supremum in the definition of flsc, flsc(y) ≥ inf_G f > c for all
y ∈ G. Thus G is an open neighborhood of x contained in {flsc > c}. So
{flsc > c} is open.

To show g ≤ flsc one just needs to show that glsc = g. For then

g(x) = sup{ inf_G g : x ∈ G and G is open }
     ≤ sup{ inf_G f : x ∈ G and G is open } = flsc(x).

We already know that glsc ≤ g. To show the other direction let c be such that
g(x) > c. Then, G = {g > c} is an open set containing x and inf_G g ≥ c.
Thus glsc(x) ≥ c. Now increase c to g(x). □
The above can be reinterpreted in terms of epigraphs. The epigraph of
a function f is the set epi f = {(x, t) ∈ X × R : f(x) ≤ t}. For the next
lemma we endow X × R with its product topology.
Lemma 2.9. The epigraph of flsc is the closure of epi f.
Proof. Note that the epigraph of flsc is closed. That it contains the epigraph
of f (and thus also the closure of the epigraph of f) is immediate because
flsc ≤ f. For the other inclusion we need to show that any open set outside
the epigraph of f is also outside the epigraph of flsc. Let A be such a set
and let (x, t) ∈ A. By the definition of the product topology, there is an
open neighborhood G of x and an ε > 0 such that G × (t − ε, t + ε) ⊂ A.
So for any y ∈ G and any s ∈ (t − ε, t + ε), s < f(y). In particular,
t + ε/2 ≤ inf_G f ≤ flsc(x). So (x, t) is outside the epigraph of flsc. □
Lower semicontinuous regularization can also be expressed in terms of
pointwise alterations of the values of f.
Exercise 2.10. Assume X is a metric space. Show that if xn → x, then
flsc(x) ≤ liminf f(xn). Prove that for each x ∈ X there is a sequence xn → x
such that f(xn) → flsc(x). (The constant sequence xn = x is allowed here.)
This gives the alternate definition flsc(x) = min(f(x), liminf_{y→x} f(y)).
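A crude numerical approximation of (2.2) illustrates the regularization: replace the infimum over open G ∋ x with a minimum over a fine grid in a small interval around x (valid only when the interval is small relative to the features of f). A Python sketch of ours, with a step function that fails lower semicontinuity at 0:

```python
def f(x):
    # not lower semicontinuous at 0: the set {f <= 1/2} = (-inf, 0) is not closed
    return 1.0 if x >= 0 else 0.0

def flsc(x, f, delta=1e-3, grid=2000):
    # approximate inf of f over the neighborhood (x - delta, x + delta)
    return min(f(x - delta + 2 * delta * i / grid) for i in range(grid + 1))

for x in [-0.5, -0.01, 0.0, 0.01, 0.5]:
    print(x, f(x), flsc(x, f))
# flsc agrees with f except at x = 0, where the value drops to 0,
# matching flsc(x) = min(f(x), liminf_{y -> x} f(y))
```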
Now we apply this to large deviation rate functions. The next lemma
shows that rate functions can be assumed to be lower semicontinuous.
Lemma 2.11. Suppose I is a function such that (2.1) holds for all measur-
able sets A. Then (2.1) continues to hold if I is replaced by Ilsc.
Proof. Ilsc ≤ I and the upper bound is immediate. For the lower bound
observe that inf_G Ilsc = inf_G I when G is open. □
Due to Lemma 2.11 we will call a [0, ∞]-valued function I a rate function
only when it is lower semicontinuous. Here is the precise definition of a large
deviation principle (LDP) for the remainder of the text.
Definition 2.12. Let I : X → [0, ∞] be a lower semicontinuous function
and rn ↗ ∞ a sequence of positive real constants. A sequence of probability
measures {µn} ⊂ M1(X) is said to satisfy a large deviation principle with
rate function I and normalization rn if the following inequalities hold:

(2.3)  limsup_{n→∞} (1/rn) log µn(F) ≤ − inf_{x∈F} I(x)  ∀ closed F ⊂ X,

(2.4)  liminf_{n→∞} (1/rn) log µn(G) ≥ − inf_{x∈G} I(x)  ∀ open G ⊂ X.
We will abbreviate LDP(µn, rn, I) if all of the above holds. When the sets
{I ≤ c} are compact for all c ∈ R, we say I is a tight rate function.
Lower semicontinuity makes a rate function unique. For this we assume
of X a little bit more than Hausdorff. A topological space is regular if
points and closed sets can be separated by disjoint open neighborhoods. In
particular, metric spaces are regular topological spaces.
22 2. The large deviation principle
Theorem 2.13. If X is a regular topological space, then there is at most one
(lower semicontinuous) rate function satisfying the large deviation bounds
(2.3) and (2.4).
Proof. We show that I satisfies

I(x) = sup{ − limsup_{n→∞} (1/rn) log µn(B) : B ∋ x and B is open }.

One direction is easy: for all open B ∋ x

− limsup (1/rn) log µn(B) ≤ inf_B I ≤ I(x).

For the other direction, fix x and let c < I(x). One can separate x from {I ≤
c} by disjoint neighborhoods. Thus, there exists an open set G containing
x and such that Ḡ ⊂ {I > c}. (Note that this is true also for c < 0, which
is relevant in case I(x) = 0.) Then

sup{ − limsup (1/rn) log µn(B) : B ∋ x and B is open }
  ≥ − limsup (1/rn) log µn(G) ≥ − limsup (1/rn) log µn(Ḡ) ≥ inf_{Ḡ} I ≥ c.

Increasing c to I(x) concludes the proof. □
Remark 2.14. Tightness of a rate function is a very useful property, as
illustrated by the two exercises below. In a large part of the large deviation
literature a rate function I is called good when the sets {I ≤ c} are compact
for c ∈ R. We prefer the term tight as more descriptive and because of the
connection with exponential tightness: see Theorem 2.19 below.
∗Exercise 2.15. Suppose X is a Hausdorff topological space and let E ⊂ X
be a closed set. Assume that the relative topology on E is metrized by the
metric d. Let I : E → [0, ∞] be a tight rate function and fix an arbitrary
closed set F ⊂ E. Prove that

lim_{ε↘0} inf_{Fε} I = inf_F I,

where Fε = {x ∈ E : ∃y ∈ F such that d(x, y) < ε}.

∗Exercise 2.16. X and E as in the exercise above. Suppose ξn and ηn are
E-valued random variables defined on (Ω, F, P), and for any δ > 0 there
exists an n0 < ∞ such that d(ξn(ω), ηn(ω)) < δ for all n ≥ n0 and ω ∈ Ω.
(a) Show that if the distributions of ξn satisfy the lower large deviation
bound (2.4) with some rate function I : E → [0, ∞], then so do the
distributions of ηn.
(b) Show that if the distributions of ξn satisfy the upper large deviation
bound (2.3) with some tight rate function I : E → [0, ∞], then so
do the distributions of ηn.
2.3. Weak large deviation principle
It turns out that it is sometimes difficult to satisfy the upper bound (2.3)
for all closed sets. A useful weakening of the LDP requires the upper bound
only for compact sets.
Definition 2.17. A sequence of probability measures {µn} ⊂ M1(X) satis-
fies a weak large deviation principle with lower semicontinuous rate function
I : X → [0, ∞] and normalization {rn} if the lower large deviation bound
(2.4) holds for all open sets G ⊂ X and the upper large deviation bound
(2.3) holds for all compact sets F ⊂ X.
With enough control on the tails of the measures µn, a weak LDP is
sufficient for the full LDP.
Definition 2.18. We say {µn} ⊂ M1(X) is exponentially tight with nor-
malization rn if for each 0 < b < ∞ there exists a compact set Kb such
that

(2.5)  limsup_{n→∞} (1/rn) log µn(Kb^c) ≤ −b.
Theorem 2.19. Assume the upper bound (2.3) holds for compact sets and
{µn} is exponentially tight with normalization rn. Then the upper bound
(2.3) holds for all closed sets with the same rate function I.
If the weak LDP(µn, rn, I) holds and {µn} is exponentially tight with
normalization rn, then the full LDP(µn, rn, I) holds and I is a tight rate
function.
Proof. Let F be a closed set. Then

limsup_{n→∞} (1/rn) log µn(F) ≤ limsup_{n→∞} (1/rn) log[ µn(F ∩ Kb) + µn(Kb^c) ]
≤ max{ −b , limsup_{n→∞} (1/rn) log µn(F ∩ Kb) }
≤ max{ −b , − inf_{F∩Kb} I } ≤ max{ −b , − inf_F I }.

Letting b ↗ ∞ proves the upper large deviation bound (2.3).

The weak LDP already contains the lower large deviation bound (2.4)
and so we have both bounds. From the lower bound and exponential tight-
ness follows

inf_{K_{b+1}^c} I ≥ − liminf_{n→∞} (1/rn) log µn(K_{b+1}^c) ≥ b + 1.

This implies that {I ≤ b} ⊂ K_{b+1}. As a closed subset of a compact set
{I ≤ b} is compact. □
The connection between a tight rate function and exponential tightness
is an equivalence if we assume a little more of the space. To prove the
other implication in Theorem 2.21 below we give an equivalent reformulation
of exponential tightness in terms of open balls. In a metric space (X, d),
B(x, r) = {y ∈ X : d(x, y) < r} is the open r-ball centered at x.
Lemma 2.20. Let {µn} be a sequence of probability measures on a Polish
space X. (A Polish space is a complete and separable metric space.) Then
{µn} is exponentially tight if and only if for every b < ∞ and δ > 0 there
exist finitely many δ-balls B1, . . . , Bm such that

µn[ (∪_{i=1}^m Bi)^c ] ≤ e^{−rn b}  ∀ n ∈ N.
Proof. Ulam’s theorem (page 280) says that on a Polish space an individual
probability measure ν is tight, which means that ∀ε > 0 there exists a com-
pact set A such that ν(A^c) < ε. Consequently on such a space exponential
tightness is equivalent to the stronger statement that for all b < ∞ there
exists a compact set Kb such that µn(Kb^c) ≤ e^{−rn b} for all n ∈ N.

Since a compact set can be covered by finitely many δ-balls, the ball
condition is a consequence of this stronger form of exponential tightness.

Conversely, assume the ball condition and let 1 ≤ b < ∞. We need to
produce the compact set Kb. For each k ∈ N, find mk balls B_{k,1}, . . . , B_{k,mk}
of radius k^{−1} such that

µn[ (∪_{i=1}^{mk} B_{k,i})^c ] ≤ e^{−2k rn b}  ∀ n ∈ N.

Let K = ∩_{k=1}^∞ ∪_{i=1}^{mk} B̄_{k,i}. As a closed subset of X, K is complete. By its
construction K is totally bounded. This means that for any ε > 0 it can
be covered by finitely many ε-balls. Completeness and total boundedness
are equivalent to compactness in a metric space [26, Theorem 2.3.1]. By
explicitly evaluating the geometric series and some elementary estimation,

µn(K^c) ≤ Σ_{k=1}^∞ e^{−2k rn b} ≤ e^{−rn b}

as long as rn ≥ 1. Exponential tightness has been verified. □
Theorem 2.21. Suppose X is a Polish space. Assume probability measures
{µn} satisfy the upper large deviation bound (2.3) with a tight rate function
I. Then {µn} is exponentially tight.
Proof. Let {xi}_{i∈N} be a countable dense set in X. Suppose we can show
that for every b < ∞ and ε > 0 there exists m ∈ N such that

(2.6)  limsup_{n→∞} rn^{−1} log µn[ (∪_{i=1}^m B(xi, ε))^c ] ≤ −b.

This is sufficient for exponential tightness by Lemma 2.20. (See Exercise
2.22 below.)

To show (2.6), take m large enough so that the compact set {I ≤ b} is
covered by G = B(x1, ε) ∪ · · · ∪ B(xm, ε). (Since {xi} is dense, the entire
space is covered by ∪_{i≥1} B(xi, ε), and by compactness {I ≤ b} has a finite
subcover.) By the upper large deviation bound,

limsup_{n→∞} rn^{−1} log µn(G^c) ≤ − inf_{x∈G^c} I(x) ≤ −b. □
Here is the missing detail from the proof.
∗Exercise 2.22. Show that the condition of Lemma 2.20 follows from the
condition established in the proof above. The fact that the balls B(xi, ε)
cover the entire space is again crucial.
The results of this section offer a strategy for proving an LDP. First
prove a weak LDP and then verify exponential tightness. A weak LDP may
be easier to prove because it reduces entirely to analyzing asymptotics of
rn^{−1} log µn(B(x, ε)) for small neighborhoods. This idea already appeared in
the proof of Example 2.1 where we reduced the proof to asymptotics of point
probabilities. Here is an example where this method applies.
Exercise 2.23. Prove the large deviation principle for the distribution of
the sample mean Sn/n of an i.i.d. sequence of Rd-valued normal random
variables with mean m and nonsingular covariance matrix A.
Hint: The density of Sn/n suggests I(x) = (1/2)(x − m) · A^{−1}(x − m). Note
that this is different from the one-dimensional case in Exercise 2.4 because
one cannot use monotonicity of I and split closed sets F into a part below
m and a part above m.
We end the section with an important theoretical exercise.
∗Exercise 2.24. For x ∈ X, define upper and lower local rate functions by

(2.7)  κ̄(x) = − inf_{G ⊂ X: G open, x ∈ G} limsup_{n→∞} (1/rn) log µn(G)

and

(2.8)  κ̲(x) = − inf_{G ⊂ X: G open, x ∈ G} liminf_{n→∞} (1/rn) log µn(G).

Show that if κ̄ = κ̲ = κ then the weak LDP holds with rate function κ. Note
that, by monotonicity, the same infimum in (2.7) and (2.8) can be taken over
any base of open neighborhoods at x.
2.4. Aspects of Cramér’s theorem
Cramér’s theorem is the LDP for the sample mean Sn/n = (X1 + · · · +
Xn)/n of i.i.d. random variables {Xn} with values in R or Rd. Discussion
around this theorem raises several basic themes of large deviation theory:
moment generating functions, compactness, convexity, minimax theorems,
and the change of measure argument. We prove partial results here, and
formulate many statements as exercises with hints for hands-on practice.
The important themes appear again later, so this section can be skipped,
though we recommend that the reader at least skim the main points.
A complete proof of Cramér’s theorem in Rd is given in Section 4.3.
We start by stating the one-dimensional theorem. Let {Xn} be i.i.d.
real-valued random variables, and X another random variable with the same
distribution. The moment generating function is M(θ) = E[e^{θX}] for θ ∈ R.
M(θ) > 0 always and M(θ) = ∞ is possible. Define
(2.9)  I(x) = sup_{θ∈R} {θx − log M(θ)}.
Since M(0) = 1, I : R → [0, ∞] is a well-defined function.
Cramér’s theorem on R. Let {Xn} be a sequence of i.i.d. real-valued
random variables. Let µn be the distribution of the sample mean Sn/n.
Then the large deviation principle LDP(µn, n, I) is satisfied with I defined
in (2.9).
A proof of this general one-dimensional Cramér theorem that applies to
all i.i.d. sequences can be found in [15]. The case where M is finite in a
neighborhood of 0 is covered by our multidimensional Cramér theorem in
Section 4.3. Here we develop the upper bound and some related facts as a
series of exercises. Then we turn to discuss parts of the multidimensional
Cramér theorem under stronger assumptions.
Using Chebyshev’s inequality,
(2.10)  P{S_n ≥ nb} ≤ e^{−nθb} E[e^{θS_n}] = e^{−nθb} M(θ)^n  for θ ≥ 0,
and
(2.11)  P{S_n ≤ na} ≤ e^{−nθa} E[e^{θS_n}] = e^{−nθa} M(θ)^n  for θ ≤ 0.
From the above we get immediately the upper bounds
limsup_{n→∞} (1/n) log P{S_n ≥ nb} ≤ − sup_{θ≥0} {θb − log M(θ)}
and
limsup_{n→∞} (1/n) log P{S_n ≤ na} ≤ − sup_{θ≤0} {θa − log M(θ)}.
∗Exercise 2.25. Suppose X has a finite mean x̄ = E[X]. Prove that if
a ≤ x̄ ≤ b, then
sup_{θ≥0} {θb − log M(θ)} = sup_{θ∈R} {θb − log M(θ)}
and
sup_{θ≤0} {θa − log M(θ)} = sup_{θ∈R} {θa − log M(θ)}.
Hint: Use Jensen’s inequality to show that θb − log M(θ) ≤ 0 for θ < 0 and
θa − log M(θ) ≤ 0 for θ > 0.
Definition 2.26. A subset A of a vector space X is convex if for all x, y ∈ A
and t ∈ [0, 1], tx + (1 − t)y ∈ A. A function f : X → [−∞, ∞] is convex if
f(tx +(1 − t)y) ≤ tf(x)+ (1 − t)f(y) for all x, y ∈ X and t ∈ [0, 1] such that
the right-hand side of the inequality is well-defined (that is, not ∞ − ∞).
∗Exercise 2.27. Prove that I is lower semicontinuous, convex, and that if
x̄ = E[X] is finite then I achieves its minimum at x̄ with I(x̄) = 0.
Hint: I is a supremum of lower semicontinuous convex functions. I(x) ≥ 0
for all x, but by Jensen’s inequality I(x̄) ≤ 0.
∗Exercise 2.28. Suppose M(θ) < ∞ in some open neighborhood around
the origin. Show that then x̄ is the unique zero of I: that is, x ≠ x̄ implies
I(x) > 0.
Hint: For any x > x̄, (log M(θ))′ < x for θ in some interval (0, δ).
Exercise 2.29. Check that the rate functions found in Example 1.1 and
Exercises 2.4 and 2.5 match (2.9).
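As a quick numerical sanity check on (2.9), one can approximate the supremum over θ on a grid and compare with a known closed form. The sketch below (Python; the success probability p = 0.3 and the grid bounds are arbitrary illustrative choices, not from the text) does this for the Bernoulli distribution, whose Cramér rate is the relative entropy x log(x/p) + (1 − x) log((1 − x)/(1 − p)).

    import numpy as np

    p = 0.3                                      # assumed success probability (illustrative)
    theta = np.linspace(-30.0, 30.0, 200001)     # grid standing in for the sup over theta
    log_M = np.log(1.0 - p + p * np.exp(theta))  # log M(theta) for Bernoulli(p)

    def I_grid(x):
        # Legendre transform (2.9), approximated by a maximum over the grid
        return np.max(theta * x - log_M)

    def I_exact(x):
        # relative entropy of Bernoulli(x) with respect to Bernoulli(p)
        return x * np.log(x / p) + (1.0 - x) * np.log((1.0 - x) / (1.0 - p))

    for x in (0.1, 0.3, 0.5, 0.9):
        print(x, I_grid(x), I_exact(x))          # the two values should nearly agree

Near x = 0 or x = 1 the maximizing θ runs off to ±∞, so a finite grid understates I there.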
Exercise 2.27 together with the earlier observations shows that when x̄
is finite I(x) is nonincreasing for x < x̄ and nondecreasing for x > x̄. In
particular, if a ≤ x̄ ≤ b, then I(a) = inf_{x≤a} I(x) and I(b) = inf_{x≥b} I(x).
This proves the upper bound for the sets F = (−∞, a] and F = [b, ∞) in
the case where the mean is finite.
Exercise 2.30. Prove that the sample mean Sn/n of i.i.d. real-valued ran-
dom variables satisfies the upper large deviation bound (2.3) with normal-
ization n and rate I defined in (2.9), with no further assumptions on the
distribution.
Hint: The case of finite mean is almost done above. Then consider sep-
arately the cases where the mean is infinite and where the mean does not
exist.
While Cramér’s theorem is valid in general, it does not give much infor-
mation unless the variables have exponentially decaying tails. This point is
explored in the next exercise.
Exercise 2.31. Let {X_i} be an i.i.d. real-valued sequence. Assume E[X_1^2] <
∞ but, for any ε > 0, P{X_1 > b} ≥ e^{−εb} for all large enough b. Show that
(a) lim_{n→∞} (1/n) log P{S_n/n > E[X_1] + δ} = 0 for any δ > 0.
(b) The rate function is identically 0 on [E(X_1), ∞).
Hint: For (a), deduce
P{S_n/n ≥ E[X_1] + δ} ≥ P{S_{n−1} ≥ (n − 1)E[X_1]} P{X_1 ≥ nδ + E[X_1]}
and apply the central limit theorem. For (b), first find M(θ) for θ > 0.
Then observe that for θ ≤ 0 and x ≥ E[X_1],
θx − log M(θ) ≤ θ(x − E[X_1]) ≤ 0.
Exercise 2.32. Let {X_i} be an i.i.d. real-valued sequence. Prove that the
closure of the set {I < ∞} is the same as the closure of the convex hull
of the support of the distribution of X. (The convex hull of a set is the
intersection of all convex sets containing it.)
Hint: Let K be the latter set and y ∉ K. To show that I(y) = ∞, find θ ∈ R
such that θy − ε > sup_{x∈K} θx. For the other direction, take y in the interior of
{I = ∞}. To get y ∉ K, show first that there exists a sequence θ_n converging
to either ∞ or −∞ such that φ_y(θ_n) = θ_n y − log M(θ_n) converges to infinity.
Assume θ_n → ∞. Show that for some ε, |x − y| ≤ ε implies φ_x(θ) → ∞ as
θ → ∞. Then, for θ > 0, θ(y − ε) − log M(θ) ≤ − log µ{x : |x − y| ≤ ε}
where µ is the distribution of X. Let θ → ∞.
Cramér’s theorem is quite crude because only the exponentially decaying
terms of a full expansion affect the result. In some cases one can derive much
more precise asymptotics.
Exercise 2.33. Prove that if {X_k} are i.i.d. standard normal, then for any
k ∈ N and a > 0,
log P{S_n ≥ an} ∼ − (a²n)/2 − ½ log(2πna²)
    + log( 1 − 1/(a²n) + (1·3)/(a⁴n²) − · · · + (−1)^k (1·3·5 · · · (2k−1))/(a^{2k} n^k) ).
Hint: Observe that
d/dx [ e^{−x²/2} ∑_{k=0}^n (−1)^k (1·3 · · · (2k−1)) x^{−2k−1} ]
    { ≤ −e^{−x²/2} if n is even,
      ≥ −e^{−x²/2} if n is odd.
Exercise 2.34. Continuing Exercise 2.29, derive Cramér rate functions for
further basic distributions.
(a) For real α > 0, the rate α exponential distribution has density f(x) =
αe^{−αx} on R₊. Derive the Cramér rate function
I(x) = αx − 1 − log αx  for x > 0.
(b) For real λ > 0, the mean λ Poisson distribution has probability mass
function p(k) = e^{−λ} λ^k / k! for k ∈ Z₊. Derive the Cramér rate function
I(x) = x log(x/λ) − x + λ  for x ≥ 0.
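For part (b), for instance, the computation runs as follows (a sketch of the standard route): M(θ) = E[e^{θX}] = exp(λ(e^θ − 1)) for all θ ∈ R, so θx − log M(θ) = θx − λ(e^θ − 1). For x > 0 the maximizing θ solves x = λe^θ, that is θ = log(x/λ), and substituting back gives I(x) = x log(x/λ) − x + λ. Part (a) goes the same way, using M(θ) = α/(α − θ) for θ < α.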
We turn to Cramér’s theorem in multiple dimensions. When {Xn} are
Rd-valued, the moment generating function is given by M(θ) = E[e^{θ·X}] for
θ ∈ Rd. Again, M(θ) ∈ (0, ∞]. Define
(2.12)  I(x) = sup_{θ∈Rd} {θ · x − log M(θ)}.
Exercise 2.35. Check that Exercises 2.27 and 2.28 apply to the multidi-
mensional case as well.
Hölder’s inequality implies that log M(θ) is a convex function: with
t ∈ (0, 1), p = 1/t and q = 1/(1 − t),
(2.13)  M(tθ₁ + (1 − t)θ₂) = E[e^{tθ₁·X} e^{(1−t)θ₂·X}]
            ≤ E[e^{θ₁·X}]^t E[e^{θ₂·X}]^{1−t} = M(θ₁)^t M(θ₂)^{1−t}.
The full LDP of the one-dimensional Cramér theorem does not generalize
to multiple dimensions without an additional assumption. Counterexamples
appear in [20].
Cramér’s theorem on Rd. Let {Xn} be a sequence of i.i.d. Rd-valued
random variables and let µn be the distribution of the sample mean Sn/n.
Then without further assumptions weak LDP(µn, n, I) holds with I defined in
(2.12). If, moreover, M(θ) < ∞ in a neighborhood of 0, then LDP(µn, n, I)
holds and I is a tight rate function.
At this point we prove the upper bound for compact sets without as-
sumptions on M and then exponential tightness assuming that M is finite
near the origin. Then we give a proof of the lower bound under the restrictive
assumption
(2.14)  M(θ) < ∞ for all θ ∈ Rd, and |θ|^{−1} log M(θ) → ∞ as |θ| → ∞.
Both proofs introduce important techniques. Assumption (2.14) ensures
that the supremum in (2.12) is achieved. This is precisely the issue that
needs to be overcome when no assumptions on M are present. In Section
4.3 we revisit the theorem and prove its final version.
Proof of the upper bound for compacts and exponential tightness.
For any Borel set C and θ ∈ Rd,
P{Sn/n ∈ C} = E[1{Sn/n ∈ C}] ≤ e^{−n inf_{y∈C} θ·y} E[e^{θ·S_n}]
    = e^{−n inf_{y∈C} θ·y} M(θ)^n.
This shows that
(2.15)  (1/n) log P{Sn/n ∈ C} ≤ − sup_θ inf_{y∈C} {θ · y − log M(θ)}.
We would like to interchange the sup and the inf to find I(y) on the right-
hand side. This can be done if C is a compact convex set.
Minimax theorem on Rd. Let C ⊂ Rd be compact and convex. Let D ⊂
Rd be convex. Let f : C × D → R be such that for each θ ∈ D, f(y, θ) is
convex and continuous in y ∈ C, and for each y ∈ C, f(y, θ) is concave in
θ ∈ D. Then
sup_{θ∈D} inf_{y∈C} f(y, θ) = inf_{y∈C} sup_{θ∈D} f(y, θ).
This theorem is a special case of the more general minimax theorem
proved in Appendix A.5. To have a feeling for the theorem above think
of a horse saddle in R3. We have a smooth function that is convex in one
direction and concave in the other. Taking sup in the concave direction and
inf in the convex one will result in the saddle point regardless of the order.
The set D = {θ : M(θ) < ∞} is convex by (2.13), C is a compact convex
set by assumption, and f(y, θ) = θ · y − log M(θ) satisfies the assumptions
of the minimax theorem. Thus the sup and the inf can be switched in (2.15)
to give
(2.16)  (1/n) log P{Sn/n ∈ C} ≤ − inf_{y∈C} sup_θ {θ · y − log M(θ)} = − inf_{y∈C} I(y).
We have the upper bound with rate function I of (2.12) for compact convex
sets, even without taking the n → ∞ limit.
We extend the upper bound to an arbitrary compact set K. Let α <
inf_K I. Since I is lower semicontinuous {I > α} is open. For each x ∈ K ⊂
{I > α} pick a compact ball C_x centered at x with nonempty interior and
such that C_x ⊂ {I > α}. Cover K with a finite collection C_{x₁}, . . . , C_{x_N} of
such balls. The upper bound for compact convex sets gives
P{Sn/n ∈ K} ≤ ∑_{i=1}^N P{Sn/n ∈ C_{x_i}} ≤ ∑_{i=1}^N e^{−n inf_{C_{x_i}} I} ≤ N e^{−nα}.
Taking n ↗ ∞ and then α ↗ inf_K I gives the upper bound (2.3) in weak
LDP(µn, n, I).
Last, we verify exponential tightness under the assumption that M is
finite near the origin. Theorem 2.19 then implies the upper bound for closed
sets. To this end, from (2.10) and (2.11) it follows that for any b > 0 we can
find a large enough a = a(b) > 0 such that
P{|S_n^{(i)}| ≥ na} ≤ e^{−bn}  for i = 1, 2, . . . , d, and all n ∈ N.
Here y^{(i)} denotes the ith coordinate of a vector y ∈ Rd. Definition 2.18
of exponential tightness is satisfied with r_n = n and K_b = {y : |y^{(i)}| ≤
a(b) for all i = 1, . . . , d}. □
Exercise 2.36. The minimax theorem was used above to turn (2.15) into
the non-asymptotic upper bound (2.16) for compact convex sets. This was
done to illustrate the minimax trick and because bounds that are valid for
finite n are useful. However, we can proceed directly from (2.15) to the
upper large deviation bound for a general compact set K. Fill in the details
in the following outline. With notation as above, for each x ∈ K find θ_x
such that θ_x · x − log M(θ_x) > α. Pick a compact convex ball U_x centered
at x and with nonempty interior such that θ_x · y − log M(θ_x) > α − ε for
y ∈ U_x. Proceed as in the proof above.
Proof of Cramér’s lower bound under (2.14). We introduce the clas-
sical change of measure argument for the lower bound. Let our random
variables {Xk} be defined on a probability space (Ω, F, P).
On any open set where M(θ) is finite it is differentiable and ∇M(θ) =
E[X e^{θ·X}]. This is by dominated convergence. Thus θ · x − log M(θ) is a
concave differentiable function of θ that, by (2.14), achieves its maximum
I(x) at some θ_x. Then ∇M(θ_x) = x M(θ_x).
Define the probability measure ν_x on Rd by
ν_x(B) = M(θ_x)^{−1} E[e^{θ_x·X} 1{X ∈ B}],  B ∈ B_{Rd}.
The mean of ν_x is
∫_{Rd} y ν_x(dy) = E[X e^{θ_x·X}] / M(θ_x) = ∇M(θ_x) / M(θ_x) = x.
Let Q_{x,n} be the probability measure on Ω defined by
Q_{x,n}(A) = E[1_A e^{θ_x·S_n}] / E[e^{θ_x·S_n}]  for A ∈ F.
Now for the lower bound. Take an open set G ⊂ Rd, x ∈ G, and ε > 0
such that {y : |y − x| < ε} ⊂ G.
P{Sn/n ∈ G} ≥ P{|S_n − nx| < εn}
    ≥ e^{−nθ_x·x − nε|θ_x|} E[e^{θ_x·S_n} 1{|S_n − nx| < εn}]
    = e^{−nθ_x·x − nε|θ_x|} M(θ_x)^n Q_{x,n}{|S_n − nx| < εn}.
The key observation is that under Q_{x,n} the variables X₁, X₂, . . . , X_n are i.i.d.
ν_x-distributed: for B₁, . . . , B_n ∈ B_{Rd},
Q_{x,n}( ⋂_{k=1}^n {X_k ∈ B_k} ) = ∏_{k=1}^n E[1_{B_k}(X) e^{θ_x·X}] / E[e^{θ_x·X}] = ∏_{k=1}^n ν_x(B_k).
By the law of large numbers Q_{x,n}{|S_n − nx| < εn} → 1, and we get the
bound
liminf_{n→∞} (1/n) log P{Sn/n ∈ G} ≥ −I(x) − ε|θ_x|.
Taking ε → 0 and sup over x ∈ G on the right proves the lower bound
(2.4). □
The measure ν_x is called the tilted measure. The dependence on n in Q_{x,n}
is an artifact we can eliminate by using a single infinite product measure on
a sequence space. This is what we do in Section 5.2 on Sanov’s theorem.
The change of measure argument replaced the original measure P by a
new measure Q_{x,n} under which outcome x became typical rather than rare.
In the proof this appears to be merely a trick, but we shall see later that there
is more to it. Namely, to produce the deviation S_n ≈ nx the process {X_k}
actually behaves like an i.i.d. ν_x-sequence. This is an interesting conclusion.
A priori one could also imagine that the system prefers to deviate a small
number of variables while letting most Xk’s behave in a typical fashion.
(See Exercises 2.38 and 6.19 and the related maximum entropy principle in
Section 5.3.) A lesson of large deviation theory is that a deviation is not
produced in an arbitrary manner, but rather in the most probable way, and
this can be captured by the rate function.
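The tilted measure is also the engine behind importance sampling of rare events: simulate under ν_x, where the deviation is typical, and correct with the likelihood ratio. Here is a minimal sketch (Python; the parameters p = 0.5, a = 0.7, n = 100 and the sample size are illustrative choices, not from the text), using the fact from Exercise 2.37 below that for Bernoulli variables the tilted measure ν_a is Bernoulli(a).

    import numpy as np

    rng = np.random.default_rng(0)
    p, a, n, reps = 0.5, 0.7, 100, 100000     # illustrative parameters

    # Sample i.i.d. Bernoulli(a) rows: under the tilted measure, S_n/n ≈ a is typical.
    X = rng.random((reps, n)) < a
    S = X.sum(axis=1)

    # Likelihood ratio dP/dQ depends only on the number of successes s:
    # (p/a)^s * ((1-p)/(1-a))^(n-s).
    log_w = S * np.log(p / a) + (n - S) * np.log((1.0 - p) / (1.0 - a))
    estimate = np.mean(np.exp(log_w) * (S >= a * n))

    I_a = a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))  # Cramér rate at a
    print(estimate, np.exp(-n * I_a))         # estimate vs. the Chernoff bound e^{-n I(a)}

Under P the event {Sn/n ≥ 0.7} has probability of rough order 10^{−5} here, so naive simulation with the same budget would see it only a handful of times, while the tilted sampler lands in the relevant region constantly.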
Exercise 2.37. Let {Xn} be i.i.d. Bernoulli random variables with success
probability p ∈ [0, 1]. Show that for s ∈ [0, 1] the measure ν_s in the proof
above is the Bernoulli measure with success probability s. Investigate ν_x for
your other favorite distributions.
Exercise 2.38. Let Sn = X₁ + · · · + X_n be simple symmetric random walk
on Z. That is, {X_k} are i.i.d. with distribution P(X_k = ±1) = 1/2. Let
a ∈ [0, 1]. With elementary calculation find the limit of the process {X_k}
conditioned on |S_n − ⌊na⌋| ≤ 1, as n → ∞.
Hint: Fix x₁, . . . , x_m ∈ {±1}, write the probability P(X₁ = x₁, . . . , X_m =
x_m | |S_n − ⌊na⌋| ≤ 1) in terms of factorials and observe the asymptotics.
Note that the conditioning event cannot always be written S_n = ⌊na⌋ be-
cause S_n must have the parity of n.
2.5. Limits, deviations, and fluctuations
Let {Yn} be a sequence of random variables with values in a metric space
(X, d) and let µn be the distribution of Yn, that is, µn(B) = P{Yn ∈ B}
for B ∈ B_X. Naturally an LDP for the sequence {µn} is related to the
asymptotic behavior of Yn. Suppose LDP(µn, rn, I) holds and Yn → ȳ in
probability. Then the limit ȳ does not represent a deviation. The rate
function I recognizes this with the value I(ȳ) = 0 that follows from the
upper bound. For any open neighborhood G of ȳ we have µn(G) → 1.
Consequently, for the closure Ḡ,
0 ≤ inf_{Ḡ} I ≤ − limsup_{n→∞} r_n^{−1} log µ_n(Ḡ) = 0.
Let G shrink down to ȳ. Lower semicontinuity forces I(ȳ) = 0.
Every LDP satisfies inf I = 0, as can be seen by taking F = X in the
upper bound (2.3). But the zero set of I does not necessarily represent limit
values. It may simply be that the probability of a deviation decays more slowly
than exponentially in rn, which leads to I = 0.
Exercise 2.39. In case the reader prefers an off-the-shelf example rather
than playing with his own examples, here is one. Fix a sequence 0 < a_n ↗
∞, let m denote Lebesgue measure, and define {µn} on R by
µ_n(A) = (1 − a_n^{−1}) 1_A(0) + a_n^{−1} m(A ∩ (0, 1]).
Clearly µn → δ₀ weakly, or equivalently, if Yn has distribution µn then
Yn → 0 in probability. Given any c ∈ [0, ∞], show that by an appropriate
choice of rn we can have the LDP with rate function
I(x) = 0 for x = 0,  c for x ∈ (0, 1],  and ∞ for x ∉ [0, 1].
Returning to the general discussion, an LDP can imply convergence
of the random variables if the rate function has good properties. Assume
that I is a tight rate function and has a unique zero I(ȳ) = 0. Let A =
{y : d(y, ȳ) ≥ ε}. Compactness and lower semicontinuity ensure that the
infimum u = inf_A I is achieved. Since ȳ ∉ A, it must be that u > 0. Then,
for n large enough, the upper large deviation bound (2.3) implies
P{d(Yn, ȳ) ≥ ε} ≤ e^{−r_n(inf_A I − u/2)} = e^{−r_n u/2}.
Thus, Yn → ȳ in probability. If, moreover, rn grows fast enough so that
∑_n e^{−c r_n} < ∞ ∀c > 0, then the Borel-Cantelli lemma implies that Yn → ȳ
almost surely.
For i.i.d. variables Cramér’s theorem should also be understood in rela-
tion to the central limit theorem (CLT). Consider the case where M(θ) is
finite in a neighborhood of the origin so that X has finite mean x̄ = E[X]
and finite variance σ², and I(x) > 0 for x ≠ x̄ (Exercise 2.28). Then for
each δ > 0 we have the large deviation bound
(2.17)  P{Sn/n − x̄ ≥ δ} ≤ e^{−nI(x̄+δ)}.
(Recall (2.10) and Exercise 2.25.)
By contrast, the CLT tells us that small deviations of order n^{−1/2} con-
verge to a limit distribution: for r ∈ R,
P{Sn/n − x̄ ≥ r n^{−1/2}} → ∫_r^∞ (2πσ²)^{−1/2} e^{−s²/(2σ²)} ds  as n → ∞.
This distinction is sometimes expressed by saying that the CLT describes
fluctuations as opposed to deviations. There is a significant qualitative dif-
ference between Cramér’s theorem and the CLT. The CLT is an example
of universality: the Gaussian limit is valid for all distributions with finite
variance. The Cramér rate function I on the other hand depends on the
entire distribution. (From convex analysis we will learn that I determines
M.)
There are also results on moderate deviations that fall between large
deviations and CLT fluctuations. For example, if d = 1 and M is finite in a
neighborhood of 0, then for any α ∈ (0, 1/2)
n^{−2α} log P{|Sn/n − x̄| ≥ δ n^{−1/2+α}} → −δ²/(2σ²)  as n → ∞.
Note that this limit picks the leading exponential factor from the Gaussian.
In Chapter 11 we discuss refinements to Cramér’s theorem and moderate
deviations.
Chapter 3
Large deviations and asymptotics of integrals
This chapter takes up two general issues: transferring LDPs from one space
to another by a mapping, and asymptotics of integrals. In the last section
we discuss our first example from statistical mechanics.
3.1. Contraction principle
When f : X → Y is a measurable mapping, a measure µ on X can be
“pushed forward” to a measure ν on Y by the definition ν(B) = µ(f^{−1}(B))
for measurable subsets B ⊂ Y. This definition is abbreviated as ν = µ ◦ f^{−1}.
It preserves total mass so it transforms probability measures into probability
measures. The contraction or push-forward principle applies this same idea
to transfer an LDP from X to Y. In formula (3.1) below note that by
convention the infimum of an empty set is infinite. Recall also the definition
of the lower semicontinuous regularization u_lsc of a function u:
u_lsc(y) = sup{ inf_G u : open G ∋ y }.
Contraction principle. Let X and Y be Hausdorff spaces and f : X → Y
a continuous mapping. Assume LDP(µn, rn, I) on X. Let νn = µn ◦ f^{−1}.
Set
(3.1)  J̃(y) = inf_{x : f(x)=y} I(x),  for y ∈ Y,
and J = J̃_lsc. Then
(a) LDP(νn, rn, J) holds on Y.
(b) If I is tight, then J = J̃ and J is tight as well.
Proof. By Lemma 2.11 it suffices to prove that J̃ satisfies the large deviation
bounds (2.3) and (2.4). Take a closed set F ⊂ Y. Then
limsup_{n→∞} r_n^{−1} log µ_n(f^{−1}(F)) ≤ − inf_{x∈f^{−1}(F)} I(x)
    = − inf_{y∈F} inf_{f(x)=y} I(x) = − inf_{y∈F} J̃(y).
The lower bound is proved similarly and (a) follows.
Assume now that I is tight. Observe that if J̃(y) < ∞, then f^{−1}(y) is
nonempty and closed, and the nested nonempty compact sets {I ≤ J̃(y) +
1/n} ∩ f^{−1}(y) have a nonempty intersection. Hence J̃(y) = I(x) for some
x ∈ f^{−1}(y). Consequently, {J̃ ≤ c} = f({I ≤ c}) is a compact subset of Y.
In particular, {J̃ ≤ c} is closed and J̃ is lower semicontinuous and is hence
identical to J. □
If the rate function I is not tight, then J̃ may fail to be lower semicon-
tinuous (and hence J ≠ J̃).
Exercise 3.1. Let X = [0, ∞) and µn(dx) = φ_n(x) dx where
φ_n(x) = n x^{−2} e^{1−n/x} 1_{(0,n)}(x).
Show that {µn} are not tight on [0, ∞) but LDP(µn, n, I) holds with I(x) =
x^{−1} for x > 0 and infinite otherwise. (Tightness is discussed in Appendix
B.4.) Note that I is not a tight rate function.
∗Exercise 3.2. Let f : [0, ∞) → S¹ = {y ∈ C : |y| = 1} be f(x) =
e^{2πi x/(x+1)} and νn = µn ◦ f^{−1}, with µn defined in the previous exercise. Prove
that J̃(z) = inf_{f(x)=z} I(x) is not lower semicontinuous and that {νn} are
tight and converge weakly to δ₁. Prove also that LDP(νn, n, J) holds with
J(e^{2πit}) = (1 − t)/t for t ∈ (0, 1].
The simplest situation when the contraction principle is applied is when
X is a subspace of Y.
∗Exercise 3.3. Suppose LDP(µn, rn, I) holds on X and that X is a Haus-
dorff space contained in the larger Hausdorff space Y. Find J so that
LDP(µn, rn, J) holds on Y. What happens when I is tight on X?
Hint: A natural way to extend I is to simply set it to infinity outside X.
However, this may fail to be lower semicontinuous.
The next example is basically a tautology, but it is an example of Sanov’s
theorem, which comes in Section 5.2.
Example 3.4. Fix 0 < p < 1 and let {Xn} be an i.i.d. Bernoulli sequence
with success probability p. Take X = [0, 1] and express the common distri-
bution of {Xn} as p δ₁ + (1 − p)δ₀. Here δ_x is the probability measure that
puts all mass at the single point x, equivalently
δ_x(A) = 1_A(x) = 1 if x ∈ A, and 0 if x ∉ A.
Example 2.1 gave LDP(µn, n, Ip) for the distribution µn of Sn/n = (X1 +
· · · + Xn)/n with rate Ip from (1.2). Consider now the empirical measures
L_n = (1/n) ∑_{k=1}^n δ_{X_k}.
L_n is a random variable with values in Y = M₁({0, 1}). Let νn be its
distribution. The empirical measure usually contains more information than
the sample mean Sn/n, but in the Bernoulli case
L_n = (S_n/n) δ₁ + (1 − S_n/n) δ₀.
Hence νn = µn ◦ f^{−1} for f : X → Y defined by f(s) = s δ₁ + (1 − s)δ₀.
The contraction principle gives LDP(νn, n, H) with rate function defined for
α ∈ M₁({0, 1}) by
H(α) = I_p(s) for α = s δ₁ + (1 − s)δ₀ with s ∈ [0, 1].
In Chapter 5 we see that H(α) is a relative entropy and that the LDP for
the empirical measure holds in general for i.i.d. processes.
3.2. Varadhan’s theorem
For a measurable function f : X → R bounded above, a probability measure
µ, and a sequence rn → ∞, the moment generating function obeys these
asymptotics:
lim_{n→∞} r_n^{−1} log ∫ e^{r_n f} dµ = µ-ess sup f.
With c = µ-ess sup f the argument is
c ≥ r_n^{−1} log ∫ e^{r_n f} dµ ≥ r_n^{−1} log ∫_{f>c−ε} e^{r_n f} dµ ≥ r_n^{−1} log µ{f > c − ε} + c − ε.
Let us replace µ by a sequence µn. If {µn} satisfies a large deviation
principle with normalization rn, then the rate function I comes into the
picture. The result is known as Varadhan’s theorem. It is a probabilistic
analogue of the well-known Laplace method for asymptotics of integrals
illustrated by the next simple exercise.
Exercise 3.5. (Stirling’s formula) Use induction to show that
n! = ∫_0^∞ e^{−x} x^n dx.
Observe that e^{−x} x^n has a unique maximum at x = n. Prove that
lim_{n→∞} n! / (√(2πn) e^{−n} n^n) = 1.
Hint: Change variables y = x/n to reduce the problem to one of estimating
an integral of the form ∫_0^∞ e^{n f(y)} dy. Show that the main contribution to
this integral comes from y ∈ [1 − ε, 1 + ε] and use Taylor expansion of f near
y = 1.
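A quick numerical check of the limit, for the skeptical reader (Python; the sample values of n are arbitrary):

    import math

    for n in (1, 5, 10, 50, 100):
        stirling = math.sqrt(2 * math.pi * n) * math.exp(-n) * n**n
        print(n, math.factorial(n) / stirling)   # ratio approaches 1 as n grows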
Varadhan’s theorem. Suppose LDP(µn, rn, I) holds, f : X → [−∞, ∞] is
a continuous function, and
(3.2)  lim_{b→∞} limsup_{n→∞} r_n^{−1} log ∫_{f≥b} e^{r_n f} dµ_n = −∞.
Then
lim_{n→∞} r_n^{−1} log ∫ e^{r_n f} dµ_n = sup_{x : f(x) ∧ I(x) < ∞} {f(x) − I(x)}.
Exercise 3.6. A function f bounded above is the trivial case that satisfies
(3.2). More generally, (3.2) follows if there exists α > 1 such that
(3.3)  sup_n ( ∫ e^{α r_n f} dµ_n )^{1/r_n} < ∞.
Note that even though f(x) = ∞ is allowed, condition (3.2) forces
µn{f = ∞} = 0 for large enough n.
Informally, here is the idea behind Varadhan’s theorem. Suppose that
we can partition the space into small sets U_i with points x_i ∈ U_i such that
f ≈ f(x_i) on U_i and µ_n(U_i) ≈ e^{−r_n I(x_i)}. Then for large n the following
approximations are valid:
r_n^{−1} log ∫ e^{r_n f} dµ_n ≈ r_n^{−1} log ∑_i e^{r_n f(x_i)} µ_n(U_i)
    ≈ r_n^{−1} log ∑_i e^{r_n (f(x_i) − I(x_i))} ≈ max_i [f(x_i) − I(x_i)].
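The heuristic can be tested numerically in a case where everything is computable. The sketch below (Python with NumPy/SciPy; the choices p = 0.5, f(x) = x², and the grid are illustrative, not from the text) evaluates n^{−1} log E[e^{n f(Sn/n)}] exactly from binomial probabilities for Bernoulli sample means and compares it with sup_x {f(x) − I_p(x)}.

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import binom

    p = 0.5
    f = lambda x: x ** 2                      # a bounded continuous f on [0, 1]

    for n in (50, 500, 5000):
        k = np.arange(n + 1)
        # log E[exp(n f(S_n/n))] computed exactly from the binomial distribution
        lhs = logsumexp(n * f(k / n) + binom.logpmf(k, n, p)) / n
        print(n, lhs)

    # Varadhan's value: sup over [0,1] of f(x) - I_p(x), with I_p the Bernoulli rate
    x = np.linspace(1e-9, 1 - 1e-9, 200001)
    I_p = x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))
    print(np.max(f(x) - I_p))

Since this f is bounded on [0, 1], condition (3.2) holds trivially (cf. Exercise 3.6), so Varadhan’s theorem applies and the printed values converge to the final line.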
We can get stronger statements for separate upper and lower bounds.
Hence we split the proof into two parts.
Random documents with unrelated
content Scribd suggests to you:
broad-brimmed hat, addressing a group of idlers and half-naked
children. I could furnish your correspondent S. A. S. with more
information if needful.
T. J.
Chester.
Blue Bells of Scotland (Vol. viii., p. 388. Vol. ix., p. 209.).—Surely
of Philadelphia is right in supposing that the Blue Bell of
Scotland, in the ballad which goes by that name, is a bell painted
blue, and used as the sign of an inn, and not the flower so called, as
asserted by Henry Stephens, unless indeed there be an older ballad
than the one commonly sung, which, as many of your readers must
be aware, contains this line,—
He dwells in merry Scotland,
At the sign of the Blue Bell.
I remember to have heard that the popularity of this song dates
from the time when it was sung on the stage by Mrs. Jordan.
Can any one inform me whether the air is ancient or modern?
Honoré de Mareville.
Guernsey.
De male quæsitis gaudet non tertius hæres (Vol. ii., p. 167.).—The
quotation here wanted has hitherto been neglected. The words may
be found, with a slight variation, in Bellochii Praxis Moralis
Theologiæ, de casibus reservatis, c., Venetiis, 1627, 4to. As the
work is not common, I send the passage for insertion, which I know
will be acceptable to other correspondents as well as to the querist:
Divino judicio permittitur ut tales surreptores rerum sacrarum
diu ipsis rebus furtivis non lætentur, sed imo ab aliis nequioribus
furibus præfatæ res illis abripiantur, ut de se ipso fassus est ille,
qui in suis ædibus hoc distichon inscripsit, ut refert Jo. Bonif.,
lib. de furt., § contrectatio, num. 134. in fin.:
'Congeries lapidum variis constructa rapinis,
Aut uret, aut ruet, aut raptor alter habebit.'
Et juxta illud:
'De rebus male acquisitis, non gaudebit tertius hæres.'
Lazar (de monitorio), sect. 4. 9. 4., num. 16., imo nec secundus,
ut ingenuè et perbellè fatetur in suo poemate, nostro idiomate
Jerusalem celeste acquistata, cant. x. num. 88. Pater Frater
Augustinus Gallutius de Mandulcho, ita canendo:
'D'un' acquisto sacrilego e immondo,
Gode di rado il successor secondo,
Pero che il primo e mal' accorto herede
Senza discretion li da di piedi.'
Bibliothecar. Chetham.
Mawkin (Vol. ix., pp. 303. 385.).—Is not mawkin merely a corruption
for mannikin? I strongly suspect it to be so, though Forby, in his
Vocabulary of East Anglia, gives the word maukin as if peculiar to
Norfolk and Suffolk, and derives it, like L., from Mal, for Moll or Mary.
F. C. H.
This word, in the Scottish dialect spelt maukin, means a hare. It
occurs in the following verse of Burns in Tam Samson's Elegy:
Rejoice, ye birring paitricks a';
Ye cootie moorcocks, crousely craw;
Ye maukins, cock your fud fu' braw,
Withouten dread;
Your mortal fae is now awa',
Tam Samson's dead!
Kennedy M‘Nab.
Putting a spoke in his wheel (Vol. viii., pp. 269. 351. 576.).—There
is no doubt that putting a spoke in his wheel is offering an
obstruction. But I have always understood the spoke to be, not a
radius of the wheel, but a bar put between the spokes at right
angles, so as to prevent the turning of the wheel; a rude mode of
locking, which I have often seen practised. The correctness of the
metaphor is thus evident.
Wm. Hazel.
Dog Latin (Vol. viii., p. 523.).—The return of a sheriff to a writ which
he had not been able to serve, owing to the defendant's secreting
himself in a swamp, will be new to English readers. It was Non
come-at-ibus in swampo.
Since the adoption of the Federal Constitution, the motto of the
United States has been E pluribus unum. A country sign-painter in
Bucks county, Pennsylvania, painted E pluribur unibus, instead of it
on a sign.
Uneda.
Philadelphia.
Swedish Words current in England (Vol. vii., pp. 231. 366.).—Very
many Swedish words are current in the north of England, e. gr. barn
or bearn (Scotticè bairn), Sw. barn; bleit or blate, bashful, Sw. blöd;
to cleam, to fasten, to spread thickly over, Sw. klemma; cod, pillow,
Sw. kudde; to gly, to squint, Sw. glo; to lope, to leap, Sw. löpa; to
late (Cumberland), to seek, Sw. leta; sackless, without crime, Sw.
saklös; sark, shirt, Sw. särk; to thole (Derbyshire), to endure, Sw.
tala; to walt, to totter, to overthrow, Sw. wälta; to warp, to lay eggs,
Sw. wärpa; wogh (Lancashire), wall, Sw. wägg, c. It is a fact very
little known, that the Swedish language bears the closest
resemblance of all modern languages to the English as regards
grammatical structure, not even the Danish excepted.
Suecas.
Mob (Vol. viii., p. 524.).—I have always understood that this word
was derived from the Latin expression mobile vulgus, which is, I
believe, in Virgil.
Uneda.
Philadelphia.
Days of my Youth (Vol. viii., p. 467.).—In answer to the inquiry
made a few months since, whether Judge St. George Tucker, of
Virginia, was the author of the lines beginning—
Days of my youth.
the undersigned states that he was a friend and relative of Judge
Tucker, and knows him to have been the author. They had a great
run at the time, and found their way not only into the newspapers,
but even into the almanacs of the day.
G. T.
Philadelphia.
Encore (Vol. viii., pp. 387. 524.).—A writer in an English magazine, a
few years ago, proposed that the Latin word repetitus should be
used instead of encore. Among other advantages he suggested that
the people in the gallery of a theatre would pronounce it repeat-it-
us, and thus make English of it.
Uneda.
Philadelphia.
Richard Plantagenet, Earl of Cambridge (Vol. ix., p. 493.).—Your
correspondent will find his question answered by referring to the
History of the Royal Family, 8vo., Lond., 1741, pp. 119. 156. For an
account of this book, which is founded upon the well-known
Sandford's Genealogical History, see Clarke's Bibliotheca Legum,
edit. 1819, p. 174.
T. E. T.
Islington.
Right of redeeming Property (Vol. viii., p. 516.).—This right formerly
existed in Normandy, and, I believe, in other parts of France. In the
bailiwick of Guernsey, the laws of which are based on the ancient
custom of Normandy, the right is still exercised, although it has been
abolished for some years in the neighbouring island of Jersey.
The law only applies to real property, which, by the Norman custom,
was divided in certain proportions among all the children; and this
right of retrait, as it is technically termed, was doubtless intended
to counteract in some measure the too minute division of land, and
to preserve inheritances in families. It must be exercised within a
year of the purchase. For farther information on the subject, Berry's
History of Guernsey, p. 176., may be consulted.
Honoré de Mareville.
Guernsey.
Latin Inscription on Lindsey Court-house (Vol. ix., pp. 492. 552.).—I
cannot but express my surprise at the learned (?) trifling of some of
your correspondents on the inscription upon Lindsey Court-house.
Try it thus:
Fiat Justitia,
1619,
Hæc domus
Odit, amat, punit, conservat, honorat,
Nequitiam, pacem, crimina, jura, bonos.
which will make two lines, an hexameter and a pentameter, the first
letters, O and N, having perhaps been effaced by time or accident.
Neglectus.
[That this emendation is the right one is clear from the communication of another
correspondent, B. R. A. Y., who makes the same, and adds in confirmation, The
following lines existed formerly (and do, perhaps, now) on the Market-house at
Much Wenlock, Shropshire, which will explain their meaning:
'Hic locus
Odit, amat, punit, conservat, honorat,
Nequitiam, pacem, crimina, jura, bonos.'
The O and N, being at the beginning of the lines as given by your correspondent,
were doubtless obliterated by age.]
The restoration of this inscription proposed by me is erroneous, and
must be corrected from the perfect inscription as preserved at
Pistoia and Much Wenlock, cited by another correspondent in p. 552.
The three inscriptions are slightly varied. Perhaps amat pacem is
better than amat leges, on account of the tautology with
conservat jura.
L.
Myrtle Bee (Vol. ix., p. 205. c.).—I have carefully read and reread
the articles on the myrtle bee, and I can come to no other
conclusion than that it is not a bird at all, but an insect, one of the
hawkmoths, and probably the humming-bird hawkmoth. We have so
many indefatigable genuine field naturalists, picking up every
straggler which is blown to our coasts, that I cannot think it possible
there is a bird at all common to any district of England, and yet
totally unknown to science. Now, insects are often exceedingly
abundant in particular localities, yet scarcely known beyond them.
The size C. Brown describes as certainly not larger than half that of
the common wren. The humming-bird (H. M.) is scarcely so large as
this, but its vibratory motion would make it look somewhat larger
than it really is. Its breadth, from tip to tip of the wings, is twenty to
twenty-four lines. The myrtle bee's short flight is rapid, steady, and
direct, exactly that of the hawkmoth. The tongue of the myrtle bee
is round, sharp, and pointed at the end, appearing capable of
penetration, not a bad popular description of the suctorial trunk of
the hawkmoth, from which it gains its generic name, Macroglossa.
Its second pair of wings are of a rusty yellow colour, which, when
closed, would give it it the appearance of being tinged with yellow
about the vent. It has also a tuft of scaly hairs at the extremity of
the abdomen, which would suggest the idea of a tail. In fact, on the
wing, it appears very like a little bird, as attested by its common
name. In habit it generally retires from the mid-day sun, which
would account for its being put up by the dogs. The furze-chat,
mentioned by C. Brown, is the Saxicola rubetra, commonly also
called the whinchat.
Wm. Hazel.
Mousehunt (Vol. ix., p. 65. c.).—G. Tennyson identifies the
mousehunt with the beechmartin, the very largest of our Mustelidæ,
on the authority of Henley the dramatic commentator. Was he a
naturalist too? I never heard of him as such.
Now, Mr. W. R. D. Salmon, who first asked the question, speaks of it
as less than the common weasel, and quotes Mr. Colquhoun's
opinion, that it is only the young of the year. I have no doubt at all
that this is correct. The young of all the Mustelidæ hunt, and to a
casual observer exhibit all the actions of full-grown animals, when
not more than half the size of their parents. There seems no reason
to suppose that there are more than four species known in England,
the weasel, the stoat or ermine, the polecat, and the martin. The
full-grown female of the weasel is much smaller than the male. Go
to any zealous gamekeeper's exhibition, and you will see them of
many gradations in size.
Wm. Hazel.
Longfellow's Hyperion (Vol. ix., p. 495.).—I would offer the
following rather as a suggestion than as an answer to Mordan
Gillott. But it has always appeared to me that Longfellow has
himself explained, by a simple allusion in the work, the reason which
dictated the name of his Hyperion. As the ancients fabled Hyperion
to be the offspring of the heavens and the earth; so, in his
aspirations, and his weakness and sorrows, Flemming (the hero of
the work) personifies, as it were, the mingling of heaven and earth
in the heart and mind of a man of true nobility. The passage to
which I allude is the following:
Noble examples of a high purpose, and a fixed will! Do they
not move, Hyperion-like, on high? Were they not likewise sons
of heaven and earth?—Book iv. ch. 1.
Seleucus.
Benjamin Rush (Vol. ix., p. 451.).—Inquirer asks Why the freedom of
Edinburgh was conferred upon him? I have looked into the Records
of the Town Council, and found the following entry:
4th March, 1767. The Council admit and receive Richard
Stocktoun, Esquire, of New Jersey, Councillour at Law, and
Benjamin Rush, Esquire, of Philadelphia, to be burgesses and
gild brethren of this city, in the most ample form.
But there is no reason assigned.
James Laurie, Conjoint Town Clerk.
Quakers executed in North America (Vol. ix., p. 305.).—A fuller
account of these nefarious proceedings is detailed in an abstract of
the sufferings of the people called Quakers, in 2 vols., 1733; vol. i.
(Appendix) pp. 491-514., and in vol. iii. pp. 195-232.
E. D.
Notices to Correspondents.
For the purpose of inserting as many Replies as possible in this, the
closing Number of our Ninth Volume, we have this week omitted our
usual Notes on Books and Lists of Books wanted to purchase.
W. W. (Malta). Received with many thanks.
R. H. (Oxford). For Kentish Men and Men of Kent, see N.  Q., Vol.
v., pp. 321. 615.
Mr. Long's easy Calotype Process reached us too late for insertion
this week. It shall appear in our next.
Notes and Queries is published at noon on Friday, so that the
Country Booksellers may receive Copies in that night's parcels, and
deliver them to their Subscribers on the Saturday.
Notes and Queries is also issued in Monthly Parts, for the
convenience of those who may either have a difficulty in procuring
the unstamped weekly Numbers, or prefer receiving it monthly.
While parties resident in the country or abroad, who may be
desirous of receiving the weekly Numbers, may have stamped copies
forwarded direct from the Publisher. The subscription for the
stamped edition of Notes and Queries (including a very copious
Index) is eleven shillings and fourpence for six months, which may
be paid by Post-Office Order, drawn in favour of the Publisher, Mr.
George Bell, No. 186. Fleet Street.
DR. DE JONGH'S LIGHT BROWN COD LIVER OIL. Prepared for
medicinal use in the Loffoden Isles, Norway, and put to the test of
chemical analysis. The most effectual remedy for Consumption,
Bronchitis, Asthma, Gout, Chronic Rheumatism, and all Scrofulous
Diseases.
Approved of and recommended by Berzelius, Liebig, Woehler, Jonathan
Pereira, Fouquier, and numerous other eminent medical men and
scientific chemists in Europe.
Specially rewarded with medals by the Governments of Belgium and
the Netherlands.
Has almost entirely superseded all other kinds on the Continent, in
consequence of its proved superior power and efficacy—effecting a
cure much more rapidly.
Contains iodine, phosphate of chalk, volatile acid, and the elements
of the bile—in short, all its most active and essential principles—in
larger quantities than the pale oils made in England and
Newfoundland, deprived mainly of these by their mode of
preparation.
Sold Wholesale and Retail, in bottles, labelled with Dr. de Jongh's
Stamp and Signature, by
ANSAR, HARFORD,  CO., 77. Strand,
Sole Consignees and Agents for the United Kingdom and British
Possessions; and by all respectable Chemists and Vendors of
Medicine in Town and Country, at the following prices:—
Imperial Measure, Half-pints, 2s. 6d.; Pints, 4s. 9d.
BENNETT'S MODEL WATCH, as shown at the GREAT EXHIBITION.
No. 1. Class X., in Gold and Silver Cases, in five qualities, and
adapted to all Climates, may now be had at the MANUFACTORY, 65.
CHEAPSIDE. Superior Gold London-made Patent Levers, 17, 15, and
12 guineas. Ditto, in Silver Cases, 8, 6, and 4 guineas. First-rate
Geneva Levers, in Gold Cases, 12, 10, and 8 guineas. Ditto, in Silver
Cases, 8, 6, and 5 guineas. Superior Lever, with Chronometer
Balance, Gold, 27, 23, and 19 guineas. Bennett's Pocket
Chronometer, Gold, 50 guineas; Silver, 40 guineas. Every Watch
skilfully examine, timed, and its performance guaranteed.
Barometers, 2l., 3l., and 4l. Thermometers from 1s. each.
BENNET, Watch, Clock, and Instrument Maker to the Royal
Observatory, the Board of Ordnance, the Admiralty, and the Queen,
65. CHEAPSIDE.
Patronised by the Royal Family.
TWO THOUSAND POUNDS for any person producing Articles superior to
the following:
THE HAIR RESTORED AND GREYNESS PREVENTED.
BEETHAM'S CAPILLARY FLUID is acknowledged to be the most
effectual article for Restoring the Hair in Baldness, strengthening
when weak and fine, effectually preventing falling or turning grey,
and for restoring its natural colour without the use of dye. The rich
glossy appearance it imparts is the admiration of every person.
Thousands have experienced its astonishing efficacy. Bottles 2s. 6d.;
double size, 4s. 6d.; 7s. 6d. equal to 4 small; 11s. to 6 small; 21s. to
13 small. The most perfect beautifier ever invented.
SUPERFLUOUS HAIR REMOVED.
BEETHAM'S VEGETABLE EXTRACT does not cause pain or injury to
the skin. Its effect is unerring, and it is now patronised by royalty
and hundreds of the first families. Bottles, 5s.
BEETHAM'S PLASTER is the only effectual remover of Corns and
Bunions. It also reduces enlarged Great Toe Joints in an astonishing
manner. If space allowed, the testimony of upwards of twelve
thousand individuals, during the last five years, might be inserted.
Packets, 1s.; Boxes, 2s. 6d. Sent Free by BEETHAM, Chemist,
Cheltenham, for 14 or 36 Post Stamps.
Sold by PRING, 30. Westmorland Street; JACKSON, 9. Westland
Row; BEWLEY  EVANS, Dublin; GOULDING, 108. Patrick
Street, Cork; BARRY, 9. Main Street, Kinsale; GRATTAN, Belfast;
MURDOCK, BROTHERS, Glasgow; DUNCAN  FLOCKHART,
Edinburgh. SANGER, 150. Oxford Street; PROUT, 229. Strand;
KEATING, St. Paul's Churchyard; SAVORY  MOORE, Bond
Street; HANNAY, 63. Oxford Street; London. All Chemists and
Perfumers will procure them.
ROSS  SONS' INSTANTANEOUS HAIR DYE, without Smell, the best
and cheapest extant.—ROSS  SONS have several private
apartments devoted entirely to Dyeing the Hair, and particularly
request a visit, especially from the incredulous, as they will
undertake to dye a portion of their hair, without charging, of any
colour required, from the lightest brown to the darkest black, to
convince them of its effect.
Sold in cases at 3s. 6d., 5s. 6d., 10s., 15s., and 20s. each case.
Likewise wholesale to the Trade by the pint, quart, or gallon.
Address, ROSS  SONS, 119. and 120. Bishopsgate Street, Six Doors
from Cornhill, London.
ALLEN'S ILLUSTRATED CATALOGUE, containing Size, Price, and
Description of upwards of 100 articles, consisting of
PORTMANTEAUS, TRAVELLING-BAGS, Ladies' Portmanteaus,
DESPATCH-BOXES, WRITING-DESKS, DRESSING-CASES, and other
travelling requisites, Gratis on application, or sent free by Post on
receipt of Two Stamps.
MESSRS. ALLEN'S registered Despatch-box and Writing-desk, their
Travelling-bag with the opening as large as the bag, and the new
Portmanteau containing four compartments, are undoubtedly the
best articles of the kind ever produced.
J. W.  T. ALLEN, 18.  22. West Strand.
ONE THOUSAND BEDSTEADS TO CHOOSE FROM.—HEAL  SON'S
Stock comprises handsomely Japanned and Brass-mounted Iron
Bedsteads, Children's Cribs and Cots of new and elegant designs,
Mahogany, Birch, and Walnut-tree Bedsteads, of the soundest and
best Manufacture, many of them fitted with Furnitures, complete. A
large Assortment of Servants' and Portable Bedsteads. They have
also every variety of Furniture for the complete furnishing of a Bed
Room.
HEAL  SON'S ILLUSTRATED AND PRICED CATALOGUE OF
BEDSTEADS AND BEDDING, sent Free by Post.
HEAL  SON, 196. Tottenham Court Road.
PHOTOGRAPHIC APPARATUS, MATERIALS, and PURE CHEMICAL
PREPARATIONS.
KNIGHT  SONS' Illustrated Catalogue, containing Description and
Price of the best forms of Cameras and other Apparatus.
Voightlander and Son's Lenses for Portraits and Views, together with
the various Materials, and pure Chemical Preparations required in
practising the Photographic Art. Forwarded free on receipt of Six
Postage Stamps.
Instructions given in every branch of the Art.
An extensive Collection of Stereoscopic and other Photographic
Specimens.
GEORGE KNIGHT  SONS, Foster Lane, London.
PHOTOGRAPHIC INSTITUTION.
THE EXHIBITION OF PHOTOGRAPHS, by the most eminent English
and Continental Artists, is OPEN DAILY from Ten till Five. Free
Admission.
£ s. d.
A Portrait by Mr. Talbot's Patent Process 1 1 0
Additional Copies (each) 0 5 0
A Coloured Portrait, highly finished (small size) 3 3 0
A Coloured Portrait, highly finished (larger size) 5 5 0
Miniatures, Oil Paintings, Water-Colour, and Chalk Drawings,
Photographed and Coloured in imitation of the Originals. Views of
Country Mansions, Churches, c., taken at a short notice.
Cameras, Lenses, and all the necessary Photographic Apparatus and
Chemicals, are supplied, tested, and guaranteed.
Gratuitous Instruction is given to Purchasers of Sets of Apparatus.
PHOTOGRAPHIC INSTITUTION,
168. New Bond Street.
THE LONDON SCHOOL OF PHOTOGRAPHY, 78. Newgate Street.—At
this Institution, Ladies and Gentlemen may learn in One Hour to take
Portraits and Landscapes, and purchase the necessary Apparatus for
Five Pounds. No charge is made for the Instruction.
IMPROVEMENT IN COLLODION.—J. B. HOCKIN  CO., Chemists,
289. Strand, have, by an improved mode of Iodizing, succeeded in
producing a Collodion equal, they may say superior, in sensitiveness
and density of Negative, to any other hitherto published; without
diminishing the keeping properties and appreciation of half-tint for
which their manufacture has been esteemed.
Apparatus, pure Chemicals, and all the requirements for the practice
of Photography. Instruction in the Art.
THE COLLODION AND POSITIVE PAPER PROCESS. By J. B. HOCKIN.
Price 1s., per Post, 1s. 2d.
WHOLESALE PHOTOGRAPHIC DEPOT: DANIEL M‘MILLAN, 132. Fleet
Street, London. The Cheapest House in Town for every Description
of Photographic Apparatus, Materials, and Chemicals.
*
*
*
Price List Free on Application.
COCOA-NUT FIBRE MATTING and MATS, of the best quality.—The
Jury of Class 28, Great Exhibition, awarded the Prize Medal to T.
TRELOAR, Cocoa-Nut Fibre Manufacturer, 42. Ludgate Hill, London.
COLLODION PORTRAITS AND VIEWS obtained with the greatest
ease and certainty by using BLAND  LONG'S preparation of Soluble
Cotton; certainty and uniformity of action over a lengthened period,
combined with the most faithful rendering of the half-tones,
constitute this a most valuable agent in the hands of the
photographer.
Albumenized paper, for printing from glass or paper negatives, giving
a minuteness of detail unattained by any other method, 5s. per
Quire.
Waxed and Iodized Papers of tried quality.
Instruction in the Processes.
BLAND  LONG, Opticians and Photographical Instrument Makers,
and Operative Chemists, 153. Fleet Street. London.
*
*
*
Catalogues sent on application.
THE SIGHT preserved by the Use of SPECTACLES adapted to suit
every variety of Vision by means of SMEE'S OPTOMETER, which
effectually prevents Injury to the Eyes from the Selection of
Improper Glasses, and is extensively employed by
BLAND  LONG, Opticians, 153. Fleet Street, London.
PHOTOGRAPHIC CAMERAS.
OTTEWILL AND MORGAN'S
Manufactory, 24.  25. Charlotte Terrace, Caledonian Road,
Islington.
OTTEWILL'S Registered Double Body Folding Camera, adapted for
Landscapes or Portraits, may be had of A. ROSS, Featherstone
Buildings, Holborn; the Photographic Institution, Bond Street; and at
the Manufactory as above, where every description of Cameras,
Slides, and Tripods may be had. The The Trade supplied.
PHOTOGRAPHY.—HORNE  CO.'S Iodized Collodion, for obtaining
Instantaneous Views, and Portraits in from three to thirty seconds,
according to light.
Portraits obtained by the above, for delicacy of detail rival the
choicest Daguerreotypes, specimens of which may be seen at their
Establishment.
Also every description of Apparatus, Chemicals, c. c. used in this
beautiful Art.—123. and 121. Newgate Street.
PIANOFORTES, 25 Guineas each.—D'ALMAINE  CO., 20. Soho
Square (established A.D. 1785), sole manufacturers of the ROYAL
PIANOFORTES, at 25 Guineas each. Every instrument warranted.
The peculiar advantages of these pianofortes are best described in
the following professional testimonial, signed by the majority of the
leading musicians of the age:—We, the undersigned members of
the musical profession, having carefully examined the Royal
Pianofortes manufactured by MESSRS. D'ALMAINE  CO., have great
pleasure in bearing testimony to their merits and capabilities. It
appears to us impossible to produce instruments of the same size
possessing a richer and finer tone, more elastic touch, or more equal
temperament, while the elegance of their construction renders them
a handsome ornament for the library, boudoir, or drawing-room.
(Signed) J. L. Abel, F. Benedict, H. R. Bishop, J. Blewitt, J. Brizzi, T. P.
Chipp, P. Delavanti, C. H. Dolby, E. F. Fitzwilliam, W. Forde, Stephen
Glover, Henri Herz, E. Harrison, H. F. Hassé, J. L. Hatton, Catherine
Hayes, W. H. Holmes, W. Kuhe, G. F. Kiallmark, E. Land, G. Lanza,
Alexander Lee. A. Leffler, E. J. Loder, W. H. Montgomery, S. Nelson,
G. A. Osborne, John Parry, H. Panofka, Henry Phillips, F. Praegar, E.
F. Rimbault, Frank Romer, G. H. Rodwell, E. Rockel, Sims Reeves, J.
Templeton, F. Weber, H. Westrop, T. E. Wright, c.
D'ALMAINE  CO., 20. Soho Square. Lists and Designs Gratis.
WESTERN LIFE ASSURANCE AND ANNUITY
SOCIETY.
3. PARLIAMENT STREET, LONDON.
Founded A.D. 1842.
Directors.
H. E. Bicknell, Esq.
T. S. Cocks, Jun. Esq., M.P.
G. H. Drew, Esq.
W. Evans, Esq.
W. Freeman, Esq.
F. Fuller, Esq.
J. H. Goodhart, Esq.
T. Grissell, Esq.
J. Hunt, Esq.
J. A. Lethbridge, Esq.
E. Lucas, Esq.
J. Lys Seager, Esq.
J. B. White, Esq.
J. Carter Wood, Esq.
Trustees.—W. Whateley, Esq., Q.C.; George Drew, Esq.,
T. Grissell, Esq.
Physician.—William Rich. Basham, M.D.
Bankers.—Messrs. Cocks, Biddulph, and Co., Charing Cross.
VALUABLE PRIVILEGE.
POLICIES effected in this Office do not become void through
temporary difficulty in paying a Premium, as permission is given
upon application to suspend the payment at interest, according to
the conditions detailed in the Prospectus.
Specimens of Rates of Premium for Assuring 100l., with a Share in
three-fourths of the Profits:—
Age £ s. d. Age £ s. d.
17
1 14 4
32
2 10 8
22
1 18 8
37
2 18 6
27
2 4 5
42
3 8 2
ARTHUR SCRATCHLEY, M.A., F.R.A.S., Actuary.
Now ready, price 10s. 6d., Second Edition, with material additions,
INDUSTRIAL INVESTMENT and EMIGRATION: being a TREATISE ON
BENEFIT BUILDING SOCIETIES, and on the General Principles of
Land Investment, exemplified in the Cases of Freehold Land
Societies, Building Companies, c. With a Mathematical Appendix on
Compound Interest and Life Assurance. By ARTHUR SCRATCHLEY,
M.A., Actuary to the Western Life Assurance Society, 3. Parliament
Street, London.
ALLSOPP'S PALE or BITTER ALE.—MESSRS. S. ALLSOPP  SONS beg
to inform the TRADE that they are now registering Orders for the
March Brewings of their PALE ALE in Casks of 18 Gallons and
upwards, at the BREWERY, Burton-on-Trent; and at the under-
mentioned Branch Establishments:
LONDON, at 61. King William Street, City.
LIVERPOOL, at Cook Street.
MANCHESTER, at Ducie Place.
DUDLEY, at the Burnt Tree.
GLASGOW, at 115. St. Vincent Street.
DUBLIN, at 1. Crampton Quay.
BIRMINGHAM, at Market Hall.
SOUTH WALES, at 13. King Street, Bristol.
MESSRS. ALLSOPP  SONS take the opportunity of announcing to
PRIVATE FAMILIES that their ALES, so strongly recommended by the
Medical Profession, may be procured in DRAUGHT and BOTTLES
GENUINE from all the most RESPECTABLE LICENSED VICTUALLERS,
on ALLSOPP'S PALE ALE being specially asked for.
When in bottle, the genuineness of the label can be ascertained by
its having ALLSOPP  SONS written across it.
CHUBB'S LOCKS, with all the recent improvements. Strong fire-proof
safes, cash and deed boxes. Complete lists of sizes and prices may
be had on application.
CHUBB  SON, 57. St. Paul's Churchyard, London; 28. Lord Street,
Liverpool; 16. Market Street, Manchester; and Horseley Fields,
Wolverhampton.
Printed by Thomas Clark Shaw, of No. 10. Stonefield Street, in the
Parish of St. Mary, Islington, at No. 5. New Street Square, in the
Parish of St. Bride, in the City of London; and published by George
Bell, of No. 186. Fleet Street, in the Parish of St. Dunstan in the
West, in the City of London, Publisher, at No. 186. Fleet Street
aforesaid.—Saturday, June 24. 1854.
*** END OF THE PROJECT GUTENBERG EBOOK NOTES AND
QUERIES, NUMBER 243, JUNE 24, 1854 ***
Updated editions will replace the previous one—the old editions
will be renamed.
Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States
copyright in these works, so the Foundation (and you!) can copy
and distribute it in the United States without permission and
without paying copyright royalties. Special rules, set forth in the
General Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
Welcome to our website – the perfect destination for book lovers and
knowledge seekers. We believe that every book holds a new world,
offering opportunities for learning, discovery, and personal growth.
That’s why we are dedicated to bringing you a diverse collection of
books, ranging from classic literature and specialized publications to
self-development guides and children's books.
More than just a book-buying platform, we strive to be a bridge
connecting you with timeless cultural and intellectual values. With an
elegant, user-friendly interface and a smart search system, you can
quickly find the books that best suit your interests. Additionally,
our special promotions and home delivery services help you save time
and fully enjoy the joy of reading.
Join us on a journey of knowledge exploration, passion nurturing, and
personal growth every day!
ebookbell.com
A Course On Large Deviations With An Introduction To Gibbs Measures Firas Rassoulagha
Contents (continued)

Chapter 3. Large deviations and asymptotics of integrals 35
§3.1. Contraction principle 35
§3.2. Varadhan’s theorem 37
§3.3. Bryc’s theorem 41
§3.4. Curie-Weiss model of ferromagnetism 43

Chapter 4. Convex analysis in large deviation theory 49
§4.1. Some elementary convex analysis 49
§4.2. Rate function as a convex conjugate 58
§4.3. Multidimensional Cramér’s theorem 60

Chapter 5. Relative entropy and large deviations for empirical measures 67
§5.1. Relative entropy 67
§5.2. Sanov’s theorem 73
§5.3. Maximum entropy principle 78

Chapter 6. Process level large deviations for i.i.d. fields 83
§6.1. Setting 83
§6.2. Specific relative entropy 85
§6.3. Pressure and the large deviation principle 91

Part II. Statistical mechanics

Chapter 7. Formalism for classical lattice systems 99
§7.1. Finite volume model 99
§7.2. Potentials and Hamiltonians 101
§7.3. Specifications 103
§7.4. Phase transition 108
§7.5. Extreme Gibbs measures 110
§7.6. Uniqueness for small potentials 112

Chapter 8. Large deviations and equilibrium statistical mechanics 119
§8.1. Thermodynamic limit of the pressure 119
§8.2. Entropy and large deviations under Gibbs measures 122
§8.3. Dobrushin-Lanford-Ruelle (DLR) variational principle 125

Chapter 9. Phase transition in the Ising model 131
§9.1. One-dimensional Ising model 134
§9.2. Phase transition at low temperature 136
§9.3. Case of no external field 139
§9.4. Case of nonzero external field 143

Chapter 10. Percolation approach to phase transition 147
§10.1. Bernoulli bond percolation and random cluster measures 147
§10.2. Ising phase transition revisited 151

Part III. Additional large deviation topics

Chapter 11. Further asymptotics for i.i.d. random variables 159
§11.1. Refinement of Cramér’s theorem 159
§11.2. Moderate deviations 162

Chapter 12. Large deviations through the limiting generating function 165
§12.1. Essential smoothness and exposed points 165
§12.2. Gärtner-Ellis theorem 173
§12.3. Large deviations for the current of particles 177

Chapter 13. Large deviations for Markov chains 185
§13.1. Relative entropy for kernels 185
§13.2. Countable Markov chains 189
§13.3. Finite Markov chains 201

Chapter 14. Convexity criterion for large deviations 211

Chapter 15. Nonstationary independent variables 219
§15.1. Generalization of relative entropy and Sanov’s theorem 219
§15.2. Proof of the large deviation principle 221

Chapter 16. Random walk in a dynamical random environment 231
§16.1. Quenched large deviation principles 232
§16.2. Proofs via the Baxter-Jain theorem 237

Appendixes

Appendix A. Analysis 257
§A.1. Metric spaces and topology 257
§A.2. Measure and integral 260
§A.3. Product spaces 265
§A.4. Separation theorem 266
§A.5. Minimax theorem 267

Appendix B. Probability 271
§B.1. Independence 272
§B.2. Existence of stochastic processes 273
§B.3. Conditional expectation 274
§B.4. Weak topology of probability measures 276
§B.5. First limit theorems 280
§B.6. Ergodic theory 280
§B.7. Stochastic ordering 285

Appendix C. Inequalities from statistical mechanics 291
§C.1. Griffiths’ inequality 291
§C.2. Griffiths-Hurst-Sherman inequality 292

Appendix D. Nonnegative matrices 295

Bibliography 297
Notation index 305
Author index 311
General index 313
Preface

This book arose from courses on large deviations and related topics given by the authors in the Departments of Mathematics at the Ohio State University (1993), at the University of Wisconsin-Madison (2006, 2013), and at the University of Utah (2008, 2013). Our goal has been to create an attractive collection of material for a semester’s course which would also serve the broader needs of students from different fields. This goal has had two implications for the book.

(1) We have not aimed at anything like an encyclopedic coverage of different techniques for proving large deviation principles (LDPs). Part I of the book focuses on one classic line of reasoning: (i) upper bound by an exponential Markov-Chebyshev inequality, (ii) lower bound by a change of measure, and (iii) an argument to match the rates from the first two steps. Beyond this technique Part I covers Bryc’s theorem and proves Cramér’s theorem with the subadditive method. Part III of the book covers the Gärtner-Ellis theorem and an approach based on the convexity of a local rate function due to Baxter and Jain.

(2) We have not felt obligated to stay within the boundaries of large deviation theory but instead follow the trail of interesting material. Large deviation theory is a natural gateway to statistical mechanics. Discussion of statistical mechanics would be incomplete without some study of phase transitions. We prove the phase transition of the Ising model in two different ways: (i) first with classical techniques: Peierls argument, Dobrushin’s uniqueness condition, and correlation inequalities, and (ii) the second time with random cluster measures. This means leaving large deviation theory completely behind. Along the way we have the opportunity to learn coupling methods which are central to modern probability theory but do not get serious application in the typical first graduate course in probability.

Here is a brief overview of the contents of the book.

Part I covers core general large deviation theory, the relevant convex analysis, and the large deviations of i.i.d. processes on three levels: Cramér’s theorem, Sanov’s theorem, and the process level LDP for i.i.d. variables indexed by a multidimensional square lattice.

Part II introduces Gibbs measures and proves the Dobrushin-Lanford-Ruelle variational principle that characterizes translation-invariant Gibbs measures. After this we study the phase transition of the Ising model. Part II ends with a chapter on the Fortuin-Kasteleyn random cluster model and the percolation approach to Ising phase transition.

Part III develops the large deviation themes of Part I in several directions. Large deviations of i.i.d. variables are complemented with moderate deviations and with more precise large deviation asymptotics. The Gärtner-Ellis theorem is developed carefully, together with the necessary additional convex analysis beyond the basics covered in Part I. From large deviations of i.i.d. processes we move on to Markov chains, to nonstationary independent random variables, and finally to random walk in a dynamical random environment. The last two topics give us an opportunity to apply another approach to proving large deviation principles, namely the Baxter-Jain theorem. The Baxter-Jain theorem has not previously appeared in textbooks, and its application to random walk in random environment is new.

The ideal background for reading this book would be some familiarity with the language of measure-theoretic probability. Large deviation theory does also require a little analysis, point set topology and functional analysis. For example, readers should be comfortable with lower semicontinuity and the weak topology on probability measures. It should be possible for an instructor to accommodate students with quick lectures on technical prerequisites whenever needed. It is also possible to consider everything in the framework of concrete finite spaces, in which case probability measures become simply probability vectors.

In practice our courses have been populated by students with very diverse backgrounds, many with less than ideal knowledge of analysis and probability. This has turned out less problematic than one might initially fear. Mathematics students are typically fully satisfied only after every theoretical point is rigorously justified. But engineering students are content to set aside much of the theory and focus on the essentials of the phenomenon in question. There is great interest in probability theory among students of economics, engineering and sciences. This interest should be encouraged and nurtured with accessible courses.

The appendixes in the back of the book serve two purposes. There is a quick overview of some basic results of analysis and probability without proofs for the reader who wants a quick refresher. In particular, here the reader can look up textbook tools such as convergence theorems and inequalities that are referenced in the text. The other material in the appendixes consists of specialized results used in the text, such as a minimax theorem and inequalities from statistical mechanics. These are proved.

Since this book evolved in courses where we tried to actively engage the students, the development of the material relies on frequent exercises. We realize that this feature may not appeal to some readers. On the other hand, spelling out all the technical details left as exercises might make for tedious reading. Hopefully an instructor can fill in those details fairly easily if she wants to present full details in class. Exercises that are referred to in the text are marked with an asterisk.

One of us (TS) first learned large deviations from a course taught by Steven Orey in 1988-89 at the University of Minnesota. We are greatly indebted to the existing books on the subject, especially those by Amir Dembo and Ofer Zeitouni [15], Frank den Hollander [16], Jean-Dominique Deuschel and Daniel Stroock [18], Richard Ellis [32] and Srinivasa Varadhan [79]. As a text that combines large deviations with equilibrium statistical mechanics, [32] is a predecessor of ours. There is obviously a good degree of overlap but the books are different. Ours is a textbook with a lighter touch while [32] is closer to a research monograph, covers more models in detail and explains much of the physics. We recommend [32] to our readers and students for further study. Our phase transition discussion covers the nearest-neighbor Ising model while [32] covers also long-range Ising models. On the other hand, [32] does not cover Dobrushin’s uniqueness theorem, random cluster models, general lattice systems, or their large deviations. Our literature references are sparse and sometimes do not assign credit to the originators of the ideas. We encourage the reader to consult the superb historical notes and references in the monographs of Dembo-Zeitouni, Ellis, and Georgii.

Here is a guide to the dependencies between the parts of the book. Sections 2.1-2.3 and 3.1-3.2 are foundational for all discussions of large deviations. In addition, we have the following links. Chapter 5 relies on Sections 4.1-4.2, and Chapter 6 relies on Chapter 5. Chapter 8 relies on Chapters 6 and 7. Chapter 9 can be read independently of large deviations after Sections 7.1-7.3 and 7.6. Section 10.2 makes sense only in the context of Chapter 9. Chapters 12 and 14 are independent of each other and both rely on Sections 4.1-4.2. Chapter 13 relies on Chapter 5. Chapter 15 relies on Section 13.1 and Chapter 14. Chapter 16 relies on Chapter 14.

We thank Jeff Steif for lecture notes that helped shape the proof of Theorem 9.2, Jim Kuelbs for material for Chapter 11, and Chuck Newman for helpful discussions on the liquid-gas phase transition for Chapter 7. We also thank Davar Khoshnevisan for several valuable suggestions. We thank the team at AMS and especially Ed Dunne for patience in the face of serial breaches of agreed deadlines, and the several reviewers for valuable suggestions.

Support from the National Science Foundation and the Wisconsin Alumni Research Foundation is gratefully acknowledged.

Firas Rassoul-Agha
Timo Seppäläinen
Part I

Large deviations: general theory and i.i.d. processes
Chapter 1

Introductory discussion

Toss a fair coin $n$ times. When $n$ is small there is nothing to say beyond enumerating all the outcomes and their probabilities. With a large number of tosses patterns and order emerge from the randomness: heads appear about 50% of the time and the histogram approaches a bell curve. As the number of tosses increases these patterns become more and more pronounced. But from time to time a random fluctuation might break the pattern: perhaps 10,000 tosses of a fair coin give 6000 heads. In fact, we know that there is a chance of $(1/2)^{10{,}000}$ that all tosses yield heads. The point is that to understand the system well one cannot be satisfied with understanding only the most likely outcomes. One also needs to understand rare events.

But why care about an event that has a chance of $(1/2)^{10{,}000}$? Here is a simplified example to illustrate the importance of probabilities of rare events. Imagine that an insurance company collects premiums at a steady rate of $c$ per month. Let $X_k$ be the random amount that the insurance company pays out in month $k$ to cover customer claims. Let $S_n = X_1 + \cdots + X_n$ be the total pay-out in $n$ months. Naturally the premiums must cover the average outlays, so $c > E[X_k]$. The company stays solvent as long as $S_n \le cn$. Quantifying the chances of the rare event $S_n > cn$ is then of obvious interest.

This is an introductory book on the methods of computing asymptotics of probabilities of rare events: the theory of large deviations. Let us start with a basic computation.

Example 1.1. Let $\{X_k\}_{k\in\mathbb{N}}$ be a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables with success probability $p$ (each $X_k = 1$ with probability $p$ and $0$ with probability $1-p$). Denote the partial sum by $S_n = X_1 + \cdots + X_n$. The strong law of large numbers says that, as $n \to \infty$, the sample mean $S_n/n$ converges to $p$ almost surely. But at any given $n$ there is a chance $p^n$ for all heads ($S_n = n$) and also a chance $(1-p)^n$ for all tails ($S_n = 0$). In fact, for any $s \in (0,1)$ there is a positive chance of a fraction of heads close to $s$. Let us compute the asymptotics of this probability.

Denote the integer part of $x \in \mathbb{R}$ by $\lfloor x\rfloor$, that is, $\lfloor x\rfloor$ is the largest integer less than or equal to $x$. From binomial probabilities
$$P\{S_n = \lfloor ns\rfloor\} = \frac{n!}{\lfloor ns\rfloor!\,(n-\lfloor ns\rfloor)!}\, p^{\lfloor ns\rfloor}(1-p)^{n-\lfloor ns\rfloor} \sim \frac{n^n\, p^{\lfloor ns\rfloor}(1-p)^{n-\lfloor ns\rfloor}}{\lfloor ns\rfloor^{\lfloor ns\rfloor}\,(n-\lfloor ns\rfloor)^{n-\lfloor ns\rfloor}}\, \sqrt{\frac{n}{2\pi\lfloor ns\rfloor(n-\lfloor ns\rfloor)}}\,.$$
We used Stirling's formula (Exercise 3.5)
$$(1.1)\qquad n! \sim e^{-n} n^n \sqrt{2\pi n}\,.$$
Notation $a_n \sim b_n$ means that $a_n/b_n \to 1$. Abbreviate
$$\beta_n = \sqrt{\frac{n}{2\pi\lfloor ns\rfloor(n-\lfloor ns\rfloor)}}$$
and, to get rid of integer parts, let also
$$\gamma_n = \frac{(ns)^{ns}(n-ns)^{n-ns}}{\lfloor ns\rfloor^{\lfloor ns\rfloor}(n-\lfloor ns\rfloor)^{n-\lfloor ns\rfloor}} \cdot \frac{p^{\lfloor ns\rfloor}(1-p)^{n-\lfloor ns\rfloor}}{p^{ns}(1-p)^{n-ns}}\,.$$
Then
$$P\{S_n = \lfloor ns\rfloor\} \sim \beta_n\gamma_n \exp\Big\{ ns\log\frac{p}{s} + n(1-s)\log\frac{1-p}{1-s} \Big\}.$$

*Exercise 1.2. Show that there exists a constant $C$ such that
$$\frac{1}{C\sqrt{n}} \le \beta_n \le \frac{C}{\sqrt{n}} \quad\text{and}\quad \frac{1}{Cn} \le \gamma_n \le Cn$$
for large enough $n$. By being a little more careful you can improve the second statement to $C^{-1} \le \gamma_n \le C$.

The asymptotics above gives the limit
$$(1.2)\qquad \lim_{n\to\infty} \frac1n \log P\{S_n = \lfloor ns\rfloor\} = -I_p(s) \quad\text{with}\quad I_p(s) = s\log\frac{s}{p} + (1-s)\log\frac{1-s}{1-p}\,.$$
Note the minus sign introduced in front of $I_p(s)$. This is a convention of large deviation theory.

It is instructive to look at the graph of $I_p$ (Figure 1.1). $I_p$ extends continuously to $[0,1]$ with values $I_p(0) = \log\frac{1}{1-p}$ and $I_p(1) = \log\frac{1}{p}$ that match the exponential decay of the probabilities of the events $\{S_n = 0\}$ and $\{S_n = n\}$.
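Limit (1.2) lends itself to a quick numerical check. The following sketch (our Python illustration, not part of the original text; all names are ours) evaluates the exact binomial probability through log-gamma functions and compares $n^{-1}\log P\{S_n = \lfloor ns\rfloor\}$ with $-I_p(s)$:

```python
# Numerical sanity check of limit (1.2).
import math

def log_binom_pmf(n, k, p):
    """log P{S_n = k} for S_n ~ Binomial(n, p), computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def rate_I(s, p):
    """The rate function I_p(s) of (1.2) for 0 < s < 1."""
    return s * math.log(s / p) + (1 - s) * math.log((1 - s) / (1 - p))

p, s = 0.5, 0.6
for n in (10**2, 10**4, 10**6):
    print(n, log_binom_pmf(n, int(n * s), p) / n, -rate_I(s, p))
# The first column of values approaches -I_p(s) as n grows.
```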
[Figure 1.1. The rate function for coin tosses: $I_p$ on $[0,1]$, with value $\log\frac{1}{1-p}$ at $s=0$, value $\log\frac{1}{p}$ at $s=1$, and unique zero at $s=p$.]

The unique zero of $I_p$ is at the law of large numbers limit $p$ which we would regard as the “typical” behavior of $S_n/n$. Increasing values of $I_p$ correspond to less likely outcomes. For $s \notin [0,1]$ it is natural to set $I_p(s) = \infty$.

The function $I_p$ in (1.2) is a large deviation rate function. We shall understand later that $I_p(s)$ is also the relative entropy of the coin with success probability $s$ relative to the one with success probability $p$. The choice of terminology is not a coincidence. This quantity is related to both information-theoretic and thermodynamic entropy. For this reason we go on a brief detour to discuss these well-known notions of entropy and to point out the link with the large deviation rate function $I_p$. The relative entropy that appears in large deviation theory will take center stage in Chapters 5–6, and again in Chapter 8 when we discuss statistical mechanics of lattice systems.

Limit (1.2) is our first large deviation result. One of the very last ones in the book is limit (16.12) which is the analogue of (1.2) for a random walk in a dynamical random environment, that is, in a setting where the success probability of the coin also fluctuates randomly.

1.1. Information-theoretic entropy

A coin that always comes up heads is not random at all, and the same of course for a coin that always comes up tails. On the other hand, we should probably regard a fair coin as the “most random” coin because we cannot predict whether we see more heads or tails in a sequence of tosses with better than even odds. We discuss here briefly the quantification of the degree of randomness of a sequence of coin flips. We take the point of view that the degree of randomness of the coin is reflected in the average number of bits needed to encode a sequence of tosses. This section is inspired by Chapter 2 of Ash [4].

Let $\Omega = \{0,1\}^n$ be the space of words $\omega \in \Omega$ of length $n$. A message is a concatenation of words. The message made of words $\omega_1, \omega_2, \ldots, \omega_m$ is written $\omega_1\omega_2\cdots\omega_m$. A code is a map $C : \Omega \to \bigcup_{\ell\ge1}\{0,1\}^\ell$ that assigns to each word $\omega \in \Omega$ a code word $C(\omega)$ which is a finite sequence of 0s and 1s. $|C(\omega)|$ denotes the length of code word $C(\omega)$. A concatenation of code words is a code message. Thus, a message is encoded by concatenating the code words of its individual words to make a code message: $C(\omega_1\cdots\omega_m) = C(\omega_1)\cdots C(\omega_m)$. A code should be uniquely decipherable. That is, for every finite sequence $c_1\cdots c_\ell$ of 0s and 1s there exists at most one message $\omega_1\cdots\omega_m$ such that $C(\omega_1)\cdots C(\omega_m) = c_1\cdots c_\ell$.

Now sample words at random under a probability distribution $P$ on the space $\Omega$. In this discussion we employ the base 2 logarithm $\log_2 x = \log x/\log 2$.

Noiseless coding theorem. If $C$ is a uniquely decipherable code, then its average length satisfies
$$(1.3)\qquad \sum_{\omega\in\Omega} P(\omega)\,|C(\omega)| \ \ge\ -\sum_{\omega\in\Omega} P(\omega)\log_2 P(\omega)$$
with equality if and only if $P(\omega) = 2^{-|C(\omega)|}$.

In information theory the quantity on the right of (1.3) is called the Shannon entropy of the probability distribution $P$. For a simple proof of the theorem see [4, Theorem 2.5.1, page 37].

Consider the case where the $n$ characters of the word $\omega$ are chosen independently, and let $s \in [0,1]$ be the probability that a character is a 1. Then $P(\omega) = s^{N(\omega)}(1-s)^{n-N(\omega)}$, where $N(\omega)$ is the number of ones in $\omega$. (As usual, $0^0 = 1$.) By the noiseless coding theorem, the average length of a decipherable code $C$ satisfies
$$\sum_{\omega\in\Omega} |C(\omega)|\, s^{N(\omega)}(1-s)^{n-N(\omega)} \ \ge\ -\sum_{\omega\in\Omega} s^{N(\omega)}(1-s)^{n-N(\omega)}\log_2\big( s^{N(\omega)}(1-s)^{n-N(\omega)} \big).$$
Since $\sum_\omega s^{N(\omega)}(1-s)^{n-N(\omega)} = 1$ and $\sum_\omega N(\omega)\,s^{N(\omega)}(1-s)^{n-N(\omega)} = ns$, the right-hand side equals $nh(s)$ where
$$h(s) = -s\log_2 s - (1-s)\log_2(1-s) = 1 - \frac{I_{1/2}(s)}{\log 2}\,,$$
and we see the large deviations rate function from (1.2) appear.
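Before continuing, note that the identity $h(s) = 1 - I_{1/2}(s)/\log 2$ is easy to confirm numerically. A small check (ours, not the book's):

```python
# Check of the identity h(s) = 1 - I_{1/2}(s)/log 2 at a few values of s.
import math

def shannon_h(s):
    """Binary Shannon entropy in bits."""
    return -s * math.log2(s) - (1 - s) * math.log2(1 - s)

def I_half(s):
    """Rate function I_{1/2}(s) = s log(2s) + (1-s) log(2(1-s))."""
    return s * math.log(2 * s) + (1 - s) * math.log(2 * (1 - s))

for s in (0.1, 0.25, 0.5, 0.9):
    print(s, shannon_h(s), 1 - I_half(s) / math.log(2))  # two equal columns
```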
Thus we have the lower bound
$$(1.4)\qquad \sum_{\omega\in\Omega} |C(\omega)|\, s^{N(\omega)}(1-s)^{n-N(\omega)} \ \ge\ nh(s).$$
In other words, any uniquely decipherable code for independent and identically distributed characters with probability $s$ for a 1 must use, on average, at least $h(s)$ bits per character. In this case the Shannon entropy and the rate function $I_{1/2}$ are related by
$$-\frac1n \sum_{\omega\in\Omega} P(\omega)\log_2 P(\omega) = 1 - \frac{I_{1/2}(s)}{\log 2}\,.$$

Here is a simplistic way to see the lower bound $nh(s)$ on the number of bits needed that makes an indirect appeal to large deviations in the sense that deviant words are ignored. With probability $s$ for symbol 1, the typical word of length $n$ has about $ns$ ones. Suppose we use code words of length $L$ to code these typical words. Then $2^L \ge \binom{n}{\lfloor ns\rfloor}$ and the lower bound $L \ge nh(s) + O(\log n)$ follows from Stirling's formula.

The values $h(0) = h(1) = 0$ make asymptotic sense. For example, if $s = 0$, then a word of any length $n$ is all zeroes and can be encoded by a single bit, which in the $n\to\infty$ limit gives 0 bits per character. This is the case of complete order. At the other extreme of complete disorder is the case $s = 1/2$ of fair coin tosses where all $n$ bits are needed because all words of a given length are equally likely. For $s \ne 1/2$ a 1 is either more or less likely than a 0 and by exploiting this bias one can encode with less than 1 bit per character on average.

David A. Huffman [48], while a Ph.D. student at MIT, developed an optimal decipherable code; that is, a code $C$ whose average length cannot be improved upon. As $n \to \infty$, the average length of the code generated by this algorithm is exactly $h(s)$ per character and so the lower bound (1.4) is achieved asymptotically. We illustrate the algorithm through an example. For a proof of its optimality and asymptotic average length see page 42 of [4].

Example 1.3 (Huffman’s algorithm). Consider the case $n = 3$ and $s = 1/4$. There are 8 words. Word 111 comes with probability $1/4^3$, words 110, 101, and 011 come each with probability $3/4^3$, words 100, 010, and 001 come with probability $3^2/4^3$ each, and word 000 comes with probability $(3/4)^3$. These 8 words are the terminal leaves of a binary tree that we build.

[Figure 1.2. The tree for Huffman’s algorithm in the case $n = 3$ and $s = 1/4$. The leftmost column shows the resulting codes (lengths 5, 5, 5, 5, 3, 3, 3, and 1).]

First, find the two leaves with the smallest probabilities. Ties can be resolved arbitrarily. Give these two leaves $a$ and $b$ a common ancestor labeled with a probability that is the sum of the probabilities of $a$ and $b$. In our example, leaves 111 and 110 are given a common parent labeled with probability $4/4^3$. Now leaves $a$ and $b$ are done with and their parent is regarded as a new leaf. Repeat the step. Continue until there is one leaf left. In our example, the second step gives a common ancestor to leaves 101 and 011. This new node is labeled $3/4^3 + 3/4^3 = 6/4^3$. And so on. Figure 1.2 presents the final tree.

To produce the code of a word, start at the root and follow the tree to the leaf of that word. At each fork encode a down step with a 0 and an up step with a 1 (in our figure). For instance, word 101 is reached from the root by three successive up steps followed by a single down step then another up step. Thus word 101 is encoded as 11101.

The average length of the code is
$$\frac{5\times1 + 5\times3 + 5\times3 + 5\times3 + 3\times3^2 + 3\times3^2 + 3\times3^2 + 1\times3^3}{4^3} = \frac{158}{64}\,.$$
This is $158/192 \approx 0.8229$ bits per character. As the number of characters $n$ grows, the average length of the encoding per character will converge to the information-theoretic entropy $h(1/4) \approx 0.811$.
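Huffman’s algorithm is short enough to implement and test against Example 1.3. The sketch below (our code; the helper names are not from the book) merges the two lightest leaves repeatedly, exactly as described above, and recovers the average length $158/64$ per word:

```python
# Huffman code lengths via repeated merging of the two lightest leaves.
import heapq, itertools

def huffman_lengths(probs):
    """Return the code word length for each symbol probability."""
    counter = itertools.count()  # unique tie-breaker so tuples never compare lists
    heap = [(w, next(counter), [i]) for i, w in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:   # every merged symbol moves one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(counter), group1 + group2))
    return lengths

n, s = 3, 0.25
words = list(itertools.product((0, 1), repeat=n))
probs = [s**sum(w) * (1 - s)**(n - sum(w)) for w in words]
lengths = huffman_lengths(probs)
avg = sum(L * q for L, q in zip(lengths, probs))
print(avg, avg / n)   # 2.46875 = 158/64 bits per word, ~0.8229 per character
```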
1.2. Thermodynamic entropy

The next discussion of thermodynamics is inspired by Schrödinger’s lectures [70]. After some preliminary computations we use the first and second laws of thermodynamics to derive an expression for entropy. In the simplest case of a system with two energy levels this expression can be related to the rate function (1.2). The reader should be aware that this section is not mathematically rigorous.

Let $A$ denote a physical system whose possible energy levels are $\{\varepsilon_\ell : \ell \in \mathbb{N}\}$. Then consider a larger system $A_n$ made up of $n$ identical physically independent copies of $A$. By physically independent we mean that these components of $A_n$ do not communicate with each other. Each component can be at any energy level $\varepsilon_\ell$. Immerse the whole system in a large heat bath at fixed absolute temperature $T$ which gives the system total energy $E = nU$.

Let $a_\ell$ be the number of components in state $\varepsilon_\ell$. These numbers must satisfy the constraints
$$(1.5)\qquad \sum_\ell a_\ell = n \quad\text{and}\quad \sum_\ell a_\ell\varepsilon_\ell = E.$$
For given values $a_\ell$, the total number of possible arrangements of the components at different energy levels is $\frac{n!}{a_1!\cdots a_\ell!\cdots}$. When $n$ is large, it is reasonable to assume that the values $a_\ell$ that appear are the ones that maximize the number of arrangements, subject to the constraints (1.5). To find these optimal $a_\ell$ values, maximize the logarithm of the number of arrangements and introduce Lagrange multipliers $\alpha$ and $\beta$. Thus we wish to differentiate
$$(1.6)\qquad \log\frac{n!}{a_1!\cdots a_\ell!\cdots} - \alpha\Big(\sum_\ell a_\ell - n\Big) - \beta\Big(\sum_\ell a_\ell\varepsilon_\ell - E\Big)$$
with respect to $a_\ell$ and set the derivative equal to zero. To use calculus, pretend that the unknowns $a_\ell$ are continuous variables and use Stirling's formula (1.1) in the form $\log n! \sim n(\log n - 1)$. We arrive at
$$\log a_\ell + \alpha + \beta\varepsilon_\ell = 0, \quad\text{for all } \ell.$$
Thus $a_\ell = Ce^{-\beta\varepsilon_\ell}$. Since the total number of components is $n = \sum a_\ell$,
$$(1.7)\qquad a_\ell = \frac{n e^{-\beta\varepsilon_\ell}}{\sum_j e^{-\beta\varepsilon_j}}\,.$$
The second constraint gives
$$E = n\,\frac{\sum_\ell \varepsilon_\ell e^{-\beta\varepsilon_\ell}}{\sum_\ell e^{-\beta\varepsilon_\ell}}\,.$$
These equations should be understood to hold only asymptotically. Divide both equations by $n$ and take $n \to \infty$. We interpret the limit as saying that when a typical system $A$ is immersed in a heat bath at temperature $T$ the system takes energy $\varepsilon_\ell$ with probability
$$(1.8)\qquad p_\ell = \frac{e^{-\beta\varepsilon_\ell}}{\sum_j e^{-\beta\varepsilon_j}}$$
and then has average energy
$$(1.9)\qquad U = \frac{\sum_\ell \varepsilon_\ell e^{-\beta\varepsilon_\ell}}{\sum_\ell e^{-\beta\varepsilon_\ell}}\,.$$

Expression (1.6) suggests that $\beta$ is a function of $\{\varepsilon_\ell\}$ and $U$. We argue with physical reasoning that $\beta$ is in fact a universal function of $T$ alone.

Consider another system $B$ with energy levels $\{\bar\varepsilon_m\}$. Let $B_n$ denote a composite system of $n$ identical and independent copies of $B$, also physically independent of $A_n$. Immersing $A_n$ in a heat bath with temperature $T$ specifies a value of $\beta$ for it. Since $\beta$ can a priori depend on $\{\varepsilon_\ell\}$, which is a characteristic of system $A$, we denote this value by $\beta_A$. Similarly, immersing $B_n$ in the same heat bath leads to value $\beta_B$.

We can also immerse $A_n$ and $B_n$ together in the heat bath and consider them together as consisting of $n$ independent and identical copies of a system $AB$. This system acquires its own value $\beta_{AB}$ which depends on the temperature $T$ and on the energies a system $AB$ can take. Since $A$ and $B$ are physically independent, $AB$ can take energies in the set $\{\varepsilon_\ell + \bar\varepsilon_m : \ell, m \in \mathbb{N}\}$.

Let $a_{\ell,m}$ be the number of $AB$-components whose $A$-part is at energy level $\varepsilon_\ell$ and whose $B$-part is at energy level $\bar\varepsilon_m$, when $A_n$ and $B_n$ are immersed together in the heat bath. Solving the Lagrange multipliers problem for the $AB$-system gives
$$a_{\ell,m} = \frac{n e^{-\beta_{AB}(\varepsilon_\ell + \bar\varepsilon_m)}}{\sum_{i,j} e^{-\beta_{AB}(\varepsilon_j + \bar\varepsilon_i)}} = n \cdot \frac{e^{-\beta_{AB}\varepsilon_\ell}}{\sum_j e^{-\beta_{AB}\varepsilon_j}} \cdot \frac{e^{-\beta_{AB}\bar\varepsilon_m}}{\sum_i e^{-\beta_{AB}\bar\varepsilon_i}}\,.$$
To obtain $a_\ell$, the number of $A$-components at energy $\varepsilon_\ell$, sum over $m$:
$$a_\ell = \sum_m a_{\ell,m} = \frac{n e^{-\beta_{AB}\varepsilon_\ell}}{\sum_j e^{-\beta_{AB}\varepsilon_j}}\,.$$
Since $A_n$ and $B_n$ do not interact, this must agree with the earlier outcome (1.7):
$$a_\ell = \frac{n e^{-\beta_A\varepsilon_\ell}}{\sum_j e^{-\beta_A\varepsilon_j}} = \frac{n e^{-\beta_{AB}\varepsilon_\ell}}{\sum_j e^{-\beta_{AB}\varepsilon_j}} \quad\text{for all } \ell \in \mathbb{N}.$$
It is reasonable to assume that system $A$ can take at least two different energies $\varepsilon_\ell \ne \varepsilon_{\ell'}$ for otherwise the discussion is trivial. Then the above gives $e^{-\beta_A(\varepsilon_\ell - \varepsilon_{\ell'})} = e^{-\beta_{AB}(\varepsilon_\ell - \varepsilon_{\ell'})}$ and so $\beta_A = \beta_{AB}$. Switching the roles of $A$ and $B$ leads to $\beta_B = \beta_{AB} = \beta_A$. Since system $B$ was arbitrary, we conclude that $\beta$ is a universal function of $T$.
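Equations (1.8) and (1.9) are straightforward to compute for any finite list of energy levels. Here is a small illustration (ours, with arbitrarily chosen levels) that also shows the expected low-temperature behavior:

```python
# Gibbs weights (1.8) and average energy (1.9) for a finite set of levels.
import math

def gibbs(levels, beta):
    weights = [math.exp(-beta * e) for e in levels]
    Z = sum(weights)                                  # normalizing sum in (1.8)
    probs = [w / Z for w in weights]
    U = sum(p * e for p, e in zip(probs, levels))     # average energy (1.9)
    return probs, U

levels = [0.0, 1.0, 2.0]
for beta in (0.1, 1.0, 10.0):
    probs, U = gibbs(levels, beta)
    print(beta, [round(p, 4) for p in probs], round(U, 4))
# As beta grows (temperature drops), the mass concentrates on the lowest level.
```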
We regard $\beta$ as the more fundamental quantity and view $T$ as a universal function of $\beta$. The state of the system is then determined by the energy levels $\{\varepsilon_\ell\}$ and $\beta$ by equations (1.8) and (1.9). Next we derive the precise formula for the dependence of $T$ on $\beta$. Working with fixed energies $\varepsilon_\ell$ and considering $\beta$ to be the only variable will not help, since we can replace $\beta$ by any monotone function of it and nothing in the above reasoning changes. We need to make energies $\varepsilon_\ell$ vary, which leads to the notion of work done by the system.

The first law of thermodynamics states that if the parameters of the system (i.e. its energies $\varepsilon_\ell$) change it will absorb an average amount of heat $dQ = dE + dW$, where $dW$ is the work done by the system. If the energies change by $d\varepsilon_\ell$, then $dW = -\sum_\ell a_\ell\, d\varepsilon_\ell$ and
$$dQ = dE - \sum_\ell a_\ell\, d\varepsilon_\ell\,.$$
Let $nS$ be the entropy of the system $A_n$. By the second law of thermodynamics $dQ = nT\, dS$. Define the free energy $F = \log\sum_j e^{-\beta\varepsilon_j}$. Divide the two displays above by $n$ to write
$$(1.10)\qquad \begin{aligned} dS &= \frac{1}{T}\Big( dU - \sum p_\ell\, d\varepsilon_\ell \Big) = \frac{1}{T\beta}\Big( d(\beta U) - U\, d\beta - \beta\sum p_\ell\, d\varepsilon_\ell \Big) \\ &= \frac{1}{T\beta}\Big( d(\beta U) + \frac{\partial F}{\partial\beta}\, d\beta + \sum \frac{\partial F}{\partial\varepsilon_\ell}\, d\varepsilon_\ell \Big) = \frac{1}{T\beta}\, d(\beta U + F). \end{aligned}$$
Abbreviate $G = \beta U + F$ which, by the display above, has to be a function $f(S)$ such that $f'(S) = T\beta$.

Recall that the three systems $A$, $B$, and $AB$ acquire the same $\beta$ when immersed in the heat bath. Consequently $F_A + F_B = F_{AB}$. Since $U = -\frac{\partial F}{\partial\beta}$, the same additivity holds for the function $G$, and so $f(S_A) + f(S_B) = f(S_{AB})$. Then by (1.10), since $T$ is a universal function of $\beta$, $dS_{AB} = dS_A + dS_B$ which implies $S_{AB} = S_A + S_B + c$. Now we have
$$f(S_A) + f(S_B) = f(S_A + S_B + c).$$
Differentiate in $S_A$ and $S_B$ to see that $f'(S_A) = f'(S_B)$. Since the system $B$ was chosen arbitrarily, entropy $S_B$ can be made equal to any number regardless of the value of temperature $T$. Therefore $f'(S)$ must be a universal constant which we call $1/k$. (This constant cannot be zero because $T$ and $\beta$ vary with each other.) This implies
$$\beta = \frac{1}{kT} \quad\text{and}\quad G = k^{-1}S.$$
Constant $k$ is called Boltzmann’s constant. If $k < 0$, (1.8) would imply that as $T \to 0$ the system chooses the highest energy state, which goes against physical sense. Hence $k > 0$.

Let us compute $S$ for a system with two energy levels $\varepsilon_0$ and $\varepsilon_1$. By symmetry, recentering, and a change of units, we can assume that $\varepsilon_0 = 0$ and $\varepsilon_1 = 1$. The system takes energy 0 with probability $p_0$ and energy 1 with probability $p_1$. The average energy $U = p_1$ and from (1.8) $p_1 = e^{-\beta}/(1 + e^{-\beta})$. Then
$$\begin{aligned} S = kG = k(\beta U + F) &= k\big( p_1(\beta + F) + (1 - p_1)F \big) \\ &= -k\big( p_1\log p_1 + (1 - p_1)\log(1 - p_1) \big) = k\log 2 - kI_{1/2}(p_1). \end{aligned}$$
Thus rate function $I_{1/2}$ of Example 1.1 is, up to a universal positive multiplicative factor and an additive constant, the negative thermodynamic entropy of a two-energy system. In the previous section we saw that $-I_{1/2}$ is a linear function (with positive slope) of information-theoretic entropy. Together these observations imply that the thermodynamic entropy of a physical system represents the amount of information needed to describe the system or, equivalently, the amount of uncertainty remaining in it.

The identity $(k\beta)^{-1}S = U + \beta^{-1}F$ expresses an energy-entropy balance and reappears several times later in various guises. It can be found in Exercise 5.19, as equation (7.8) for the Curie-Weiss model, and in Section 8.3 as part (c) of the Dobrushin-Lanford-Ruelle variational principle for lattice systems.
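The two-level identity $S/k = \beta U + F = \log 2 - I_{1/2}(p_1)$ can be confirmed numerically. A check (ours, not the book's), with energies 0 and 1 as above:

```python
# Check of S/k = beta*U + F = log 2 - I_{1/2}(p_1) for the two-level system.
import math

def I_half(s):
    return s * math.log(2 * s) + (1 - s) * math.log(2 * (1 - s))

for beta in (0.5, 1.0, 2.0):
    p1 = math.exp(-beta) / (1 + math.exp(-beta))  # probability of energy 1, from (1.8)
    U = p1                                        # average energy
    F = math.log(1 + math.exp(-beta))             # free energy log(sum of e^{-beta*eps})
    print(beta, beta * U + F, math.log(2) - I_half(p1))  # two equal columns
```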
1.3. Large deviations as useful estimates

The subject of large deviations is about controlling probabilities of atypical events. There are two somewhat different forms of this activity.

(i) Proofs of limit theorems in probability require estimates to rule out atypical behavior. Such estimates could be called “ad-hoc large deviations”.

(ii) Precise limits of vanishing probabilities on an exponential scale are stated as large deviation principles.

The subject of our book is the second kind of large deviations. The next chapter begins a systematic development of large deviation principles. Before that, let us look at two textbook examples to illustrate the use of independence in the derivation of estimates to prove limit theorems.

Example 1.4. Let $\{X_n\}$ be an i.i.d. sequence with $E[X] = 0$. (Common device: $X$ is a random variable that has the same distribution as all the $X_n$’s.) We wish to show that, under a suitable hypothesis,
$$(1.11)\qquad S_n/n^p \to 0 \quad P\text{-almost surely, for any } p > 1/2.$$
In order to illustrate a method, we make a strong assumption. Assume the existence of $\delta > 0$ such that $E[e^{\theta X}] < \infty$ for $|\theta| \le \delta$.

When $p \ge 1$ limit (1.11) follows from the strong law of large numbers. So let us assume $p \in (1/2, 1)$. For $t \ge 0$ Chebyshev’s inequality gives
$$P\{S_n \ge \varepsilon n^p\} \le E[e^{tS_n - \varepsilon t n^p}] = \exp\{-\varepsilon t n^p + n\log E[e^{tX}]\}.$$
The exponential moment assumption implies that $E[|X|^k]\,t^k/k!$ is summable for $t \in [0,\delta]$. Recalling that $E[X] = 0$,
$$E[e^{tX}] = E[e^{tX} - tX] \le 1 + \sum_{k=2}^\infty \frac{t^k}{k!}\,E[|X|^k] \le 1 + t^2\delta^{-2}\sum_{k=2}^\infty \frac{\delta^k}{k!}\,E[|X|^k] \le 1 + ct^2$$
for $t \in [0,\delta]$. Then, taking $t = \frac{\varepsilon n^p}{2nc}$ and $n$ large enough,
$$(1.12)\qquad P\{S_n \ge \varepsilon n^p\} \le \exp\{-\varepsilon t n^p + n\log(1 + ct^2)\} \le \exp\{-\varepsilon t n^p + nct^2\} = \exp\Big\{ -\frac{\varepsilon^2}{4c}\, n^{2p-1} \Big\}.$$
Applying this to the sequence $\{-X_n\}$ gives the matching bound on the left:
$$(1.13)\qquad P\{S_n \le -\varepsilon n^p\} \le \exp\Big\{ -\frac{\varepsilon^2}{4c}\, n^{2p-1} \Big\}.$$
Inequalities (1.12)–(1.13) can be regarded as large deviation estimates. (Although later we see that since the scale is $n^p$ for $1/2 < p < 1$, technically these are called moderate deviations. But that distinction is not relevant here.) These estimates imply the summability
$$\sum_n P\{|S_n| \ge \varepsilon n^p\} < \infty.$$
The Borel-Cantelli lemma implies that for any $\varepsilon > 0$
$$P\{\exists n_0 : n \ge n_0 \Rightarrow |S_n/n^p| \le \varepsilon\} = 1.$$
A countable intersection over $\varepsilon = 1/k$ for $k \in \mathbb{N}$ gives
$$P\{\forall k\ \exists n_0 : n \ge n_0 \Rightarrow |S_n/n^p| \le 1/k\} = 1,$$
which says that $S_n/n^p \to 0$, $P$-a.s.

We used an unnecessarily strong assumption to illustrate the exponential Chebyshev method. We can achieve the same result with martingales under the assumption $E[|X|^2] < \infty$. Since $S_n$ is a martingale (relative to the filtration $\sigma(X_1,\ldots,X_n)$), Doob’s inequality (Theorem 5.4.2 of [27] or (8.26) of [54]) gives
$$P\Big\{ \max_{k\le n} |S_k| \ge \varepsilon n^p \Big\} \le \frac{1}{\varepsilon^2 n^{2p}}\,E[|S_n|^2] = \frac{nE[|X|^2]}{\varepsilon^2 n^{2p}} = \frac{c}{\varepsilon^2}\, n^{-(2p-1)}.$$
Pick $r > 0$ such that $r(2p-1) > 1$. Then,
$$P\Big\{ \max_{k\le m^r} |S_k| \ge \varepsilon m^{pr} \Big\} \le \frac{c_1}{m^{r(2p-1)}}\,.$$
Hence, $P\{\max_{k\le m^r} |S_k| \ge \varepsilon m^{pr}\}$ is summable over $m$ and the Borel-Cantelli lemma implies that $m^{-rp}\max_{k\le m^r}|S_k| \to 0$ $P$-a.s. as $m \to \infty$. To get the result for the full sequence pick $m_n$ such that $(m_n - 1)^r \le n < m_n^r$. Then,
$$n^{-p}\max_{k\le n}|S_k| \le \Big(\frac{m_n^r}{n}\Big)^{p}\, m_n^{-rp} \max_{k\le m_n^r}|S_k| \longrightarrow 0 \quad\text{as } n\to\infty$$
because $m_n^r/n \to 1$.
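A quick simulation (ours) illustrates (1.11) for centered $\pm 1$ coin flips, for which the exponential moment assumption certainly holds:

```python
# Simulation of S_n / n^p -> 0 for centered +-1 steps and p = 0.6.
import random

random.seed(1)
p = 0.6
for n in (10**2, 10**4, 10**6):
    S = sum(random.choice((-1, 1)) for _ in range(n))
    print(n, S / n**p)   # shrinks toward 0 as n grows, in line with (1.11)
```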
Example 1.5 (Longest run of heads). Let $\{X_n\}$ be an i.i.d. sequence of Bernoulli random variables with success probability $p$. For each $n \ge 1$ let $R_n$ be the length of the longest success run among $(X_1,\ldots,X_n)$. We derive estimates to prove a result of Rényi [66] that
$$(1.14)\qquad P\Big\{ \lim_{n\to\infty} \frac{R_n}{\log n} = -\frac{1}{\log p} \Big\} = 1.$$
Fix $b > 1$ and $r$ such that $r(b-1) > 1$. Let $\ell_m = \lceil -br\log m/\log p\rceil$. ($\lceil x\rceil$ is the smallest integer larger than or equal to $x$.) If $R_{m^r} \ge \ell_m$, then there is an $i \le m^r$ such that $X_i = X_{i+1} = \cdots = X_{i+\ell_m-1} = 1$. Therefore
$$(1.15)\qquad P\{R_{m^r} \ge \ell_m\} \le m^r p^{\ell_m} \le 1/m^{r(b-1)}.$$
By the Borel-Cantelli lemma, with probability one, $R_{m^r} \le \ell_m$ for large enough $m$. (Though how large $m$ needs to be is random.) Consequently, with probability one,
$$\varlimsup_{m\to\infty} \frac{R_{m^r}}{\log m^r} \le -\frac{b}{\log p}\,.$$
Given $n$, let $m_n$ be such that $m_n^r \le n < (m_n+1)^r$. Then $R_n \le R_{(m_n+1)^r}$ and
$$\varlimsup_{n\to\infty} \frac{R_n}{\log n} \le \varlimsup_{n\to\infty} \frac{\log(m_n+1)^r}{\log m_n^r} \cdot \frac{R_{(m_n+1)^r}}{\log(m_n+1)^r} \le -\frac{b}{\log p}\,.$$
Taking $b \downarrow 1$ along a sequence shows that
$$P\Big\{ \varlimsup_{n\to\infty} \frac{R_n}{\log n} \le -\frac{1}{\log p} \Big\} = 1.$$
We have the upper bound for the goal (1.14).

Fix $a \in (0,1)$ and let $\ell_n = \lfloor -a\log n/\log p\rfloor$. Let $A_i$ be the event that $X_{i\ell_n+1} = \cdots = X_{(i+1)\ell_n} = 1$. Then
$$\{R_n < \ell_n\} \subset \bigcap_{i=0}^{\lfloor n/\ell_n\rfloor - 1} A_i^c.$$
By the independence of the $A_i$’s
$$(1.16)\qquad P\{R_n < \ell_n\} \le (1 - p^{\ell_n})^{\frac{n}{\ell_n}-1} \le e^{-p^{\ell_n}(\frac{n}{\ell_n}-1)} \le e^{1 - n^{1-a}/\ell_n}.$$
Once again, by the Borel-Cantelli lemma, $R_n < \ell_n$ happens only finitely often, with probability one, and thus $\varliminf_{n\to\infty} R_n/\log n \ge -a/\log p$. Taking $a \uparrow 1$ proves that
$$P\Big\{ \varliminf_{n\to\infty} \frac{R_n}{\log n} \ge -\frac{1}{\log p} \Big\} = 1.$$
Looking back, the proof relied again on a right-tail estimate (1.15) and a left-tail estimate (1.16). It might be a stretch to call (1.15) a large deviation bound since it is not exponential, but (1.16) can be viewed as a large deviation bound.

Remark 1.6. Combining the limit theorem above with the fact that the variance of $R_n$ remains bounded as $n \to \infty$ (see [10, 42]) provides a very accurate test of the hypothesis that the sequence $\{X_n\}$ is i.i.d. Bernoulli with probability of success $p$.
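Rényi’s limit (1.14) shows up clearly in simulation. The sketch below (our code) computes the longest run $R_n$ for fair coin flips and compares $R_n/\log n$ with $-1/\log p = 1/\log 2 \approx 1.4427$:

```python
# Simulation of Example 1.5: longest run of heads versus -log n / log p.
import math, random

def longest_run(bits):
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b else 0
        best = max(best, cur)
    return best

random.seed(0)
p = 0.5
for n in (10**4, 10**6):
    bits = [random.random() < p for _ in range(n)]
    print(n, longest_run(bits) / math.log(n), -1 / math.log(p))
```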
Chapter 2

The large deviation principle

2.1. Precise asymptotics on an exponential scale

Since the 1960’s a standard formalism has been employed to express limits of probabilities of rare events on an exponential scale. The term for these statements is large deviation principle (LDP). We introduce this in a fairly abstract setting and then return to the Bernoulli example.

There is a sequence $\{\mu_n\}$ of probability measures whose asymptotics we are interested in. These measures exist on some measurable space $(\mathcal{X}, \mathcal{B})$. Throughout our general discussion we take $\mathcal{X}$ to be a Hausdorff topological space, unless further assumptions are placed on it. $\mathcal{B} = \mathcal{B}_{\mathcal{X}}$ is the Borel $\sigma$-algebra of $\mathcal{X}$, and $\mathcal{M}_1(\mathcal{X})$ is the space of probability measures on the measurable space $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$. Thus $\{\mu_n\}$ is a sequence in the space $\mathcal{M}_1(\mathcal{X})$. In Example 1.1 $\mathcal{X} = \mathbb{R}$ and $\mu_n$ is the probability distribution of $S_n/n$: $\mu_n(A) = P\{S_n/n \in A\}$ for Borel subsets $A \subset \mathbb{R}$.

Remark on mathematical generality. A reader not familiar with point-set topology can assume that $\mathcal{X}$ is a metric space without any harm. Even taking $\mathcal{X} = \mathbb{R}$ or $\mathbb{R}^d$ will do for a while. However, later we will study large deviations on spaces of probability measures, and the more abstract point of view becomes a necessity. If the notion of a Borel set is not familiar, it is safe to think of Borel sets as “all the reasonable sets for which a probability can be defined.”

To formulate a general large deviation statement, let us look at result (1.2) of Example 1.1 for guidance. The first ingredient of interest in (1.2) is the normalization $n^{-1}$ in front of the logarithm. Obviously this can change in a different example. Thus we should consider probabilities $\mu_n(A)$ that decay roughly like $e^{-r_n C(A)}$ for some normalization $r_n \nearrow \infty$ and a constant $C(A) \in [0,\infty]$ that depends on the event $A$.

In (1.2) we identified a rate function. How should the constant $C(A)$ relate to a rate function? Consider a finite set $A = \{x_1,\ldots,x_n\}$. Then asymptotically
$$r_n^{-1}\log\mu_n(A) = r_n^{-1}\log\sum_i \mu_n\{x_i\} \approx \max_i\, r_n^{-1}\log\mu_n\{x_i\}$$
so that $C(A) = \min_i C(x_i)$. This suggests that in general $C(A)$ should be the infimum of a rate function $I$ over $A$.

The final technical point is that it is in general unrealistic to expect $r_n^{-1}\log\mu_n(A)$ to actually converge on account of boundary effects, even if $A$ is a nice set. A reasonable goal is to expect statements in terms of limsup and liminf. From these considerations we arrive at the following tentative formulation of a large deviation principle: for Borel subsets $A$ of the space $\mathcal{X}$,
$$(2.1)\qquad -\inf_{x\in A^\circ} I(x) \le \varliminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(A) \le \varlimsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(A) \le -\inf_{x\in\bar A} I(x),$$
where $A^\circ$ and $\bar A$ are, respectively, the topological interior and closure of $A$. This statement is basically what we want, except that we need to address the uniqueness of the rate function.

Example 2.1. Let us return to the i.i.d. Bernoulli sequence $\{X_n\}$ of Example 1.1. We claim that the probability measures $\mu_n(A) = P\{S_n/n \in A\}$ satisfy (2.1) with normalization $r_n = n$ and rate $I_p$ of (1.2). This follows from (1.2) with a small argument.

For an open set $G$ and $s \in G \cap [0,1]$, $\lfloor ns\rfloor/n \in G$ for large enough $n$. So
$$\varliminf_{n\to\infty} \frac1n \log P\{S_n/n \in G\} \ge \lim_{n\to\infty} \frac1n \log P\{S_n = \lfloor ns\rfloor\} = -I_p(s).$$
This holds also for $s \in G\smallsetminus[0,1]$ because then $I_p(s) = \infty$. Taking supremum over $s \in G$ on the right gives the inequality
$$\varliminf_{n\to\infty} \frac1n \log P\{S_n/n \in G\} \ge \sup_{s\in G}\,(-I_p(s)) = -\inf_{s\in G} I_p(s).$$
With $G = A^\circ$ this gives the lower bound in (2.1).

Split a closed set $F$ into $F_1 = F \cap (-\infty, p]$ and $F_2 = F \cap [p, \infty)$. First prove the upper bound in (2.1) for $F_1$ and $F_2$ separately. Let $a = \sup F_1 \le p$ and $b = \inf F_2 \ge p$. (If $F_1$ is empty then $a = -\infty$ and if $F_2$ is empty then $b = \infty$.)
Assume first that $a \ge 0$. Then
$$\frac1n \log P\{S_n/n \in F_1\} \le \frac1n \log P\{S_n/n \in [0,a]\} = \frac1n \log\sum_{k=0}^{\lfloor na\rfloor} P\{S_n = k\}.$$

*Exercise 2.2. Prove that $P\{S_n = k\}$ increases with $k \le \lfloor na\rfloor$.

By the exercise above,
$$\varlimsup_{n\to\infty} \frac1n \log P\{S_n/n \in F_1\} \le \varlimsup_{n\to\infty} \frac1n \log\big( (\lfloor na\rfloor + 1)\, P\{S_n = \lfloor na\rfloor\} \big) = -I_p(a).$$
This formula is still valid even when $a < 0$ because then the probability vanishes. A similar upper bound works for $F_2$. Next write
$$\begin{aligned} \frac1n \log P\{S_n/n \in F\} &\le \frac1n \log\big( P\{S_n/n \in F_1\} + P\{S_n/n \in F_2\} \big) \\ &\le \frac1n \log 2 + \max\Big( \frac1n \log P\{S_n/n \in F_1\},\ \frac1n \log P\{S_n/n \in F_2\} \Big). \end{aligned}$$
$I_p$ is decreasing on $[0,p]$ and increasing on $[p,1]$. Hence, $\inf_{F_1} I_p = I_p(a)$, $\inf_{F_2} I_p = I_p(b)$, and $\inf_F I_p = \min(I_p(a), I_p(b))$. Finally,
$$\varlimsup_{n\to\infty} \frac1n \log P\{S_n/n \in F\} \le -\min(I_p(a), I_p(b)) = -\inf_F I_p.$$
If we now take $F = \bar A$, the upper bound in (2.1) follows. We have shown that (2.1) holds with $I_p$ defined in (1.2). This is our first example of a full-fledged large deviation principle.

Remark 2.3. The limsup for closed sets and liminf for open sets in (2.1) remind us of weak convergence of probability measures, where the same boundary issue arises. Section B.4 gives the definition of weak convergence.

These exercises contain other instances where the rate function can be derived by hand.

Exercise 2.4. Prove (2.1) for the distribution of the sample mean of an i.i.d. sequence of real-valued normal random variables. Identifying $I$ is part of the task. Hint: The density of $S_n/n$ can be written down explicitly. This suggests $I(x) = (x-\mu)^2/(2\sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance of $X_1$.

Exercise 2.5. Prove (2.1) for the distribution of the sample mean of an i.i.d. sequence of exponential random variables and compute the rate function explicitly. Hint: Use Stirling’s formula.
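Exercise 2.4 can be previewed numerically: for standard normal summands, $S_n/n$ is $N(0, 1/n)$, so the tail probability is explicit and $n^{-1}\log P\{S_n/n \ge a\}$ should approach $-a^2/2$, the infimum of the suggested rate function over $[a,\infty)$. A check (ours, with $\mu = 0$ and $\sigma^2 = 1$):

```python
# Numerical preview of Exercise 2.4 with mu = 0 and sigma^2 = 1.
import math

def log_tail(a, n):
    """log P{S_n/n >= a} = log P{Z >= a*sqrt(n)} for Z ~ N(0, 1)."""
    z = a * math.sqrt(n)
    return math.log(math.erfc(z / math.sqrt(2)) / 2)   # Gaussian tail via erfc

a = 0.5
for n in (10, 100, 1000):
    print(n, log_tail(a, n) / n, -a**2 / 2)
# The middle column approaches -a^2/2 = -0.125.
```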
2.2. Lower semicontinuous and tight rate functions

We continue with some general facts and then in Definition 2.12 state precisely what is meant by a large deviation principle. We recall the definition of a lower semicontinuous function.

Definition 2.6. A function $f : \mathcal{X} \to [-\infty,\infty]$ is lower semicontinuous if $\{f \le c\} = \{x \in \mathcal{X} : f(x) \le c\}$ is a closed subset of $\mathcal{X}$ for all $c \in \mathbb{R}$.

*Exercise 2.7. Prove that if $\mathcal{X}$ is a metric space then $f$ is lower semicontinuous if and only if $\varliminf_{y\to x} f(y) \ge f(x)$ for all $x$.

An important transformation produces a lower semicontinuous function $f_{\mathrm{lsc}}$ from an arbitrary function $f : \mathcal{X} \to [-\infty,\infty]$. This lower semicontinuous regularization of $f$ is defined by
$$(2.2)\qquad f_{\mathrm{lsc}}(x) = \sup\Big\{ \inf_{y\in G} f(y) : G \ni x \text{ and } G \text{ is open} \Big\}.$$
This turns out to be the maximal lower semicontinuous minorant of $f$.

Lemma 2.8. $f_{\mathrm{lsc}}$ is lower semicontinuous and $f_{\mathrm{lsc}}(x) \le f(x)$ for all $x$. If $g$ is lower semicontinuous and satisfies $g(x) \le f(x)$ for all $x$, then $g(x) \le f_{\mathrm{lsc}}(x)$ for all $x$. In particular, if $f$ is lower semicontinuous, then $f = f_{\mathrm{lsc}}$.

Proof. $f_{\mathrm{lsc}} \le f$ is clear. To show $f_{\mathrm{lsc}}$ is lower semicontinuous, let $x \in \{f_{\mathrm{lsc}} > c\}$. Then there is an open set $G$ containing $x$ such that $\inf_G f > c$. Hence by the supremum in the definition of $f_{\mathrm{lsc}}$, $f_{\mathrm{lsc}}(y) \ge \inf_G f > c$ for all $y \in G$. Thus $G$ is an open neighborhood of $x$ contained in $\{f_{\mathrm{lsc}} > c\}$. So $\{f_{\mathrm{lsc}} > c\}$ is open.

To show $g \le f_{\mathrm{lsc}}$ one just needs to show that $g_{\mathrm{lsc}} = g$. For then
$$g(x) = \sup\Big\{ \inf_G g : x \in G \text{ and } G \text{ is open} \Big\} \le \sup\Big\{ \inf_G f : x \in G \text{ and } G \text{ is open} \Big\} = f_{\mathrm{lsc}}(x).$$
We already know that $g_{\mathrm{lsc}} \le g$. To show the other direction let $c$ be such that $g(x) > c$. Then $G = \{g > c\}$ is an open set containing $x$ and $\inf_G g \ge c$. Thus $g_{\mathrm{lsc}}(x) \ge c$. Now increase $c$ to $g(x)$. □

The above can be reinterpreted in terms of epigraphs. The epigraph of a function $f$ is the set
$$\mathrm{epi}\, f = \{(x,t) \in \mathcal{X}\times\mathbb{R} : f(x) \le t\}.$$
For the next lemma we endow $\mathcal{X}\times\mathbb{R}$ with its product topology.

Lemma 2.9. The epigraph of $f_{\mathrm{lsc}}$ is the closure of $\mathrm{epi}\, f$.

Proof. Note that the epigraph of $f_{\mathrm{lsc}}$ is closed. That it contains the epigraph of $f$ (and thus also the closure of the epigraph of $f$) is immediate because $f_{\mathrm{lsc}} \le f$. For the other inclusion we need to show that any open set outside the epigraph of $f$ is also outside the epigraph of $f_{\mathrm{lsc}}$. Let $A$ be such a set and let $(x,t) \in A$. By the definition of the product topology, there is an open neighborhood $G$ of $x$ and an $\varepsilon > 0$ such that $G\times(t-\varepsilon, t+\varepsilon) \subset A$. So for any $y \in G$ and any $s \in (t-\varepsilon, t+\varepsilon)$, $s < f(y)$. In particular, $t + \varepsilon/2 \le \inf_G f \le f_{\mathrm{lsc}}(x)$. So $(x,t)$ is outside the epigraph of $f_{\mathrm{lsc}}$. □
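The regularization (2.2) can be visualized with a crude one-dimensional approximation: sample $f$ on shrinking balls around $x$ and take the sup of the infima. The sketch below (our code, with an ad hoc sampling scheme) shows how $f_{\mathrm{lsc}}$ lowers the value of a function at its one “bad” point:

```python
# Discrete approximation of f_lsc(x) = sup over balls G of inf_G f, as in (2.2).
def flsc(f, x, radii=(1.0, 0.1, 0.01), pts=400):
    best = float('-inf')
    for r in radii:
        samples = [x + r * (2 * k / pts - 1) for k in range(pts + 1)] + [x]
        best = max(best, min(f(y) for y in samples))  # inf over the ball of radius r
    return best

step = lambda x: 1.0 if x >= 0 else 0.0   # fails lower semicontinuity only at 0
print(flsc(step, 0.0), step(0.0))   # 0.0 versus 1.0: the value at 0 drops
print(flsc(step, 0.5), step(0.5))   # 1.0 versus 1.0: elsewhere nothing changes
```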
flsc ≤ f. For the other inclusion we need to show that any open set outside the epigraph of f is also outside the epigraph of flsc. Let A be such a set and let (x, t) ∈ A. By the definition of the product topology, there is an open neighborhood G of x and an ε > 0 such that G × (t − ε, t + ε) ⊂ A. So for any y ∈ G and any s ∈ (t − ε, t + ε), s < f(y). In particular, t + ε/2 ≤ inf_G f ≤ flsc(x). So (x, t) is outside the epigraph of flsc.

Lower semicontinuous regularization can also be expressed in terms of pointwise alterations of the values of f.

Exercise 2.10. Assume X is a metric space. Show that if xn → x, then flsc(x) ≤ lim inf f(xn). Prove that for each x ∈ X there is a sequence xn → x such that f(xn) → flsc(x). (The constant sequence xn = x is allowed here.) This gives the alternate definition
\[
f_{\mathrm{lsc}}(x) = \min\bigl( f(x),\ \liminf_{y\to x} f(y) \bigr).
\]

Now we apply this to large deviation rate functions. The next lemma shows that rate functions can be assumed to be lower semicontinuous.

Lemma 2.11. Suppose I is a function such that (2.1) holds for all measurable sets A. Then (2.1) continues to hold if I is replaced by Ilsc.

Proof. Ilsc ≤ I and the upper bound is immediate. For the lower bound observe that inf_G Ilsc = inf_G I when G is open.

Due to Lemma 2.11 we will call a [0, ∞]-valued function I a rate function only when it is lower semicontinuous. Here is the precise definition of a large deviation principle (LDP) for the remainder of the text.

Definition 2.12. Let I : X → [0, ∞] be a lower semicontinuous function and rn ↗ ∞ a sequence of positive real constants. A sequence of probability measures {µn} ⊂ M1(X) is said to satisfy a large deviation principle with rate function I and normalization rn if the following inequalities hold:
\[
\limsup_{n\to\infty} \frac{1}{r_n}\log \mu_n(F) \le -\inf_{x\in F} I(x) \quad \forall \text{ closed } F \subset X, \tag{2.3}
\]
\[
\liminf_{n\to\infty} \frac{1}{r_n}\log \mu_n(G) \ge -\inf_{x\in G} I(x) \quad \forall \text{ open } G \subset X. \tag{2.4}
\]
We will abbreviate LDP(µn, rn, I) if all of the above holds. When the sets {I ≤ c} are compact for all c ∈ R, we say I is a tight rate function.

Lower semicontinuity makes a rate function unique. For this we assume of X a little bit more than Hausdorff. A topological space is regular if points and closed sets can be separated by disjoint open neighborhoods. In particular, metric spaces are regular topological spaces.
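Looking back at Exercise 2.10, the alternate formula suggests a simple discrete approximation: on a grid, the infimum of f over small neighborhoods recovers flsc up to grid resolution. A minimal sketch; the grid, window width, and the example f (which has an artificial upward jump at 0) are my choices.

```python
# Grid approximation of the regularization (2.2): at each grid point,
# take the infimum of f over a small neighborhood.
import numpy as np

x = np.linspace(-1.0, 1.0, 2001)
f = np.abs(x)
f[1000] = 1.0                         # upward jump at x = 0 (index 1000)

def lsc_regularization(values, radius):
    out = np.empty_like(values)
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        out[i] = values[lo:hi].min()  # inf over the neighborhood G
    return out

g = lsc_regularization(f, radius=1)
# f(0) = 1, while the regularized value is ~0 = min(f(0), liminf_{y->0} f(y))
print(f[1000], g[1000])
```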
Theorem 2.13. If X is a regular topological space, then there is at most one (lower semicontinuous) rate function satisfying the large deviation bounds (2.3) and (2.4).

Proof. We show that I satisfies
\[
I(x) = \sup\Bigl\{\, -\limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(B) : B \ni x \text{ and } B \text{ is open} \,\Bigr\}.
\]
One direction is easy: for all open B ∋ x,
\[
-\limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(B) \le \inf_B I \le I(x).
\]
For the other direction, fix x and let c < I(x). One can separate x from {I ≤ c} by disjoint neighborhoods. Thus, there exists an open set G containing x and such that Ḡ ⊂ {I > c}. (Note that this is true also for c < 0, which is relevant in case I(x) = 0.) Then
\[
\sup\Bigl\{\, -\limsup_{n} \frac{1}{r_n}\log\mu_n(B) : B \ni x \text{ and } B \text{ is open} \Bigr\}
\ge -\limsup_{n} \frac{1}{r_n}\log\mu_n(G)
\ge -\limsup_{n} \frac{1}{r_n}\log\mu_n(\bar G)
\ge \inf_{\bar G} I \ge c.
\]
Increasing c to I(x) concludes the proof.

Remark 2.14. Tightness of a rate function is a very useful property, as illustrated by the two exercises below. In a large part of the large deviation literature a rate function I is called good when the sets {I ≤ c} are compact for c ∈ R. We prefer the term tight as more descriptive and because of the connection with exponential tightness: see Theorem 2.19 below.

∗Exercise 2.15. Suppose X is a Hausdorff topological space and let E ⊂ X be a closed set. Assume that the relative topology on E is metrized by the metric d. Let I : E → [0, ∞] be a tight rate function and fix an arbitrary closed set F ⊂ E. Prove that
\[
\lim_{\varepsilon \searrow 0}\, \inf_{F_\varepsilon} I = \inf_F I, \quad\text{where } F_\varepsilon = \{x \in E : \exists y \in F \text{ such that } d(x, y) < \varepsilon\}.
\]

∗Exercise 2.16. X and E as in the exercise above. Suppose ξn and ηn are E-valued random variables defined on (Ω, F, P), and for any δ > 0 there exists an n0 < ∞ such that d(ξn(ω), ηn(ω)) < δ for all n ≥ n0 and ω ∈ Ω.
(a) Show that if the distributions of ξn satisfy the lower large deviation bound (2.4) with some rate function I : E → [0, ∞], then so do the distributions of ηn.
(b) Show that if the distributions of ξn satisfy the upper large deviation bound (2.3) with some tight rate function I : E → [0, ∞], then so do the distributions of ηn.

2.3. Weak large deviation principle

It turns out that it is sometimes difficult to satisfy the upper bound (2.3) for all closed sets. A useful weakening of the LDP requires the upper bound only for compact sets.

Definition 2.17. A sequence of probability measures {µn} ⊂ M1(X) satisfies a weak large deviation principle with lower semicontinuous rate function I : X → [0, ∞] and normalization {rn} if the lower large deviation bound (2.4) holds for all open sets G ⊂ X and the upper large deviation bound (2.3) holds for all compact sets F ⊂ X.

With enough control on the tails of the measures µn, a weak LDP is sufficient for the full LDP.

Definition 2.18. We say {µn} ⊂ M1(X) is exponentially tight with normalization rn if for each 0 < b < ∞ there exists a compact set Kb such that
\[
\limsup_{n\to\infty} \frac{1}{r_n}\log \mu_n(K_b^c) \le -b. \tag{2.5}
\]

Theorem 2.19. Assume the upper bound (2.3) holds for compact sets and {µn} is exponentially tight with normalization rn. Then the upper bound (2.3) holds for all closed sets with the same rate function I. If the weak LDP(µn, rn, I) holds and {µn} is exponentially tight with normalization rn, then the full LDP(µn, rn, I) holds and I is a tight rate function.

Proof. Let F be a closed set. Then
\[
\limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(F)
\le \limsup_{n\to\infty} \frac{1}{r_n}\log\bigl[\mu_n(F\cap K_b) + \mu_n(K_b^c)\bigr]
\le \max\Bigl\{-b,\ \limsup_{n\to\infty}\frac{1}{r_n}\log\mu_n(F\cap K_b)\Bigr\}
\le \max\bigl\{-b,\ -\inf_{F\cap K_b} I\bigr\}
\le \max\bigl\{-b,\ -\inf_F I\bigr\}.
\]
Letting b ↗ ∞ proves the upper large deviation bound (2.3).
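As an aside, Definition 2.18 can be made concrete in a case where tails are explicit: for sample means of standard normals, Sn/n is N(0, 1/n), so µn([−a, a]^c) is available in closed form and the choice a(b) = √(2b) works. A small sketch, assuming scipy; the parameters are illustrative.

```python
# Exponential tightness (2.5) for Gaussian sample means: with K_b = [-a, a],
# (1/n) log mu_n(K_b^c) is computable exactly and tends to -a^2/2 = -b.
import numpy as np
from scipy.stats import norm

b = 2.0
a = np.sqrt(2 * b)                     # a(b) chosen so that a^2/2 = b
for n in [10, 100, 1000]:
    log_out = np.log(2) + norm.logsf(a, scale=1 / np.sqrt(n))  # log mu_n(K_b^c)
    print(n, log_out / n, -b)
```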
The weak LDP already contains the lower large deviation bound (2.4) and so we have both bounds. From the lower bound and exponential tightness follows
\[
\inf_{K_{b+1}^c} I \ge -\limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(K_{b+1}^c) \ge b + 1.
\]
This implies that {I ≤ b} ⊂ K_{b+1}. As a closed subset of a compact set, {I ≤ b} is compact.

The connection between a tight rate function and exponential tightness is an equivalence if we assume a little more of the space. To prove the other implication in Theorem 2.21 below we give an equivalent reformulation of exponential tightness in terms of open balls. In a metric space (X, d), B(x, r) = {y ∈ X : d(x, y) < r} is the open r-ball centered at x.

Lemma 2.20. Let {µn} be a sequence of probability measures on a Polish space X. (A Polish space is a complete and separable metric space.) Then {µn} is exponentially tight if and only if for every b < ∞ and δ > 0 there exist finitely many δ-balls B1, . . . , Bm such that
\[
\mu_n\Bigl[\Bigl(\bigcup_{i=1}^m B_i\Bigr)^{\!c}\,\Bigr] \le e^{-r_n b} \quad \forall n \in \mathbb{N}.
\]

Proof. Ulam's theorem (page 280) says that on a Polish space an individual probability measure ν is tight, which means that ∀ε > 0 there exists a compact set A such that ν(A^c) < ε. Consequently on such a space exponential tightness is equivalent to the stronger statement that for all b < ∞ there exists a compact set Kb such that µn(K_b^c) ≤ e^{−r_n b} for all n ∈ N. Since a compact set can be covered by finitely many δ-balls, the ball condition is a consequence of this stronger form of exponential tightness.

Conversely, assume the ball condition and let 1 ≤ b < ∞. We need to produce the compact set Kb. For each k ∈ N, find mk balls B_{k,1}, . . . , B_{k,m_k} of radius k^{−1} such that
\[
\mu_n\Bigl[\Bigl(\bigcup_{i=1}^{m_k} B_{k,i}\Bigr)^{\!c}\,\Bigr] \le e^{-2k r_n b} \quad \forall n \in \mathbb{N}.
\]
Let K = ⋂_{k=1}^∞ ⋃_{i=1}^{m_k} B̄_{k,i}. As a closed subset of X, K is complete. By its construction K is totally bounded. This means that for any ε > 0 it can be covered by finitely many ε-balls. Completeness and total boundedness are equivalent to compactness in a metric space [26, Theorem 2.3.1]. By explicitly evaluating the geometric series and some elementary estimation,
\[
\mu_n(K^c) \le \sum_{k=1}^{\infty} e^{-2k r_n b} \le e^{-r_n b}
\]
as long as rn ≥ 1. Exponential tightness has been verified.

Theorem 2.21. Suppose X is a Polish space. Assume probability measures {µn} satisfy the upper large deviation bound (2.3) with a tight rate function I. Then {µn} is exponentially tight.

Proof. Let {xi}_{i∈N} be a countable dense set in X. Suppose we can show that for every b < ∞ and ε > 0 there exists m ∈ N such that
\[
\limsup_{n\to\infty} r_n^{-1} \log \mu_n\Bigl[\Bigl(\bigcup_{i=1}^m B(x_i, \varepsilon)\Bigr)^{\!c}\,\Bigr] \le -b. \tag{2.6}
\]
This is sufficient for exponential tightness by Lemma 2.20. (See Exercise 2.22 below.)

To show (2.6), take m large enough so that the compact set {I ≤ b} is covered by G = B(x1, ε) ∪ · · · ∪ B(xm, ε). (Since {xi} is dense, the entire space is covered by ⋃_{i≥1} B(xi, ε), and by compactness {I ≤ b} has a finite subcover.) By the upper large deviation bound,
\[
\limsup_{n\to\infty} r_n^{-1}\log\mu_n(G^c) \le -\inf_{x\in G^c} I(x) \le -b.
\]

Here is the missing detail from the proof.

∗Exercise 2.22. Show that the condition of Lemma 2.20 follows from the condition established in the proof above. The fact that the balls B(xi, ε) cover the entire space is again crucial.

The results of this section offer a strategy for proving an LDP. First prove a weak LDP and then verify exponential tightness. A weak LDP may be easier to prove because it reduces entirely to analyzing asymptotics of r_n^{-1} log µn(B(x, ε)) for small neighborhoods. This idea already appeared in the proof of Example 2.1 where we reduced the proof to asymptotics of point probabilities. Here is an example where this method applies.

Exercise 2.23. Prove the large deviation principle for the distribution of the sample mean Sn/n of an i.i.d. sequence of Rd-valued normal random variables with mean m and nonsingular covariance matrix A.
Hint: The density of Sn/n suggests I(x) = ½ (x − m) · A^{−1}(x − m). Note that this is different from the one-dimensional case in Exercise 2.4 because one cannot use monotonicity of I and split closed sets F into a part below m and a part above m.

We end the section with an important theoretical exercise.

∗Exercise 2.24. For x ∈ X, define upper and lower local rate functions by
\[
\bar\kappa(x) = - \inf_{G \subset X:\ G \text{ open},\ x \in G}\ \liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(G) \tag{2.7}
\]
and
\[
\underline\kappa(x) = - \inf_{G \subset X:\ G \text{ open},\ x \in G}\ \limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(G). \tag{2.8}
\]
Show that if κ̄ = κ̲ = κ, then the weak LDP holds with rate function κ. Note that, by monotonicity, the same infimum in (2.7) and (2.8) can be taken over any base of open neighborhoods at x.

2.4. Aspects of Cramér's theorem

Cramér's theorem is the LDP for the sample mean Sn/n = (X1 + · · · + Xn)/n of i.i.d. random variables {Xn} with values in R or Rd. Discussion around this theorem raises several basic themes of large deviation theory: moment generating functions, compactness, convexity, minimax theorems, and the change of measure argument. We prove partial results here, and formulate many statements as exercises with hints for hands-on practice. The important themes appear again later, so this section can be skipped, though we would recommend that the reader at least skim the main points. A complete proof of Cramér's theorem in Rd is given in Section 4.3.

We start by stating the one-dimensional theorem. Let {Xn} be i.i.d. real-valued random variables, and X another random variable with the same distribution. The moment generating function is M(θ) = E[e^{θX}] for θ ∈ R. M(θ) > 0 always and M(θ) = ∞ is possible. Define
\[
I(x) = \sup_{\theta\in\mathbb{R}} \{\theta x - \log M(\theta)\}. \tag{2.9}
\]
Since M(0) = 1, I : R → [0, ∞] is a well-defined function.

Cramér's theorem on R. Let {Xn} be a sequence of i.i.d. real-valued random variables. Let µn be the distribution of the sample mean Sn/n. Then the large deviation principle LDP(µn, n, I) is satisfied with I defined in (2.9).

A proof of this general one-dimensional Cramér theorem that applies to all i.i.d. sequences can be found in [15]. The case where M is finite in a neighborhood of 0 is covered by our multidimensional Cramér theorem in Section 4.3. Here we develop the upper bound and some related facts as a series of exercises. Then we turn to discuss parts of the multidimensional Cramér theorem under stronger assumptions.

By Chebyshev's inequality,
\[
P\{S_n \ge nb\} \le e^{-n\theta b}\, E[e^{\theta S_n}] = e^{-n\theta b} M(\theta)^n \quad \text{for } \theta \ge 0, \tag{2.10}
\]
and
\[
P\{S_n \le na\} \le e^{-n\theta a}\, E[e^{\theta S_n}] = e^{-n\theta a} M(\theta)^n \quad \text{for } \theta \le 0. \tag{2.11}
\]
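Before drawing consequences from these bounds, note that the supremum in (2.9) is a one-dimensional optimization that can be evaluated numerically for any distribution with known M; for Bernoulli(p) the result should recover Ip of (1.2). A minimal grid-search sketch; the parameters and grid are my choices.

```python
# Numerical Legendre transform (2.9) for the Bernoulli(p) distribution.
import numpy as np

p, x = 0.3, 0.6                              # illustrative values
theta = np.linspace(-30, 30, 200001)
log_M = np.log(1 - p + p * np.exp(theta))    # log E[e^{theta X}]
I_numeric = np.max(theta * x - log_M)
I_exact = x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))
print(I_numeric, I_exact)                    # both ~0.1920
```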
From (2.10) and (2.11) we get immediately the upper bounds
\[
\limsup_{n\to\infty} \frac{1}{n}\log P\{S_n \ge nb\} \le -\sup_{\theta\ge 0}\{\theta b - \log M(\theta)\}
\]
and
\[
\limsup_{n\to\infty} \frac{1}{n}\log P\{S_n \le na\} \le -\sup_{\theta\le 0}\{\theta a - \log M(\theta)\}.
\]

∗Exercise 2.25. Suppose X has a finite mean x̄ = E[X]. Prove that if a ≤ x̄ ≤ b, then
\[
\sup_{\theta\ge 0}\{\theta b - \log M(\theta)\} = \sup_{\theta\in\mathbb{R}}\{\theta b - \log M(\theta)\}
\]
and
\[
\sup_{\theta\le 0}\{\theta a - \log M(\theta)\} = \sup_{\theta\in\mathbb{R}}\{\theta a - \log M(\theta)\}.
\]
Hint: Use Jensen's inequality to show that θb − log M(θ) ≤ 0 for θ < 0 and θa − log M(θ) ≤ 0 for θ > 0.

Definition 2.26. A subset A of a vector space X is convex if for all x, y ∈ A and t ∈ [0, 1], tx + (1 − t)y ∈ A. A function f : X → [−∞, ∞] is convex if f(tx + (1 − t)y) ≤ tf(x) + (1 − t)f(y) for all x, y ∈ X and t ∈ [0, 1] such that the right-hand side of the inequality is well-defined (that is, not ∞ − ∞).

∗Exercise 2.27. Prove that I is lower semicontinuous, convex, and that if x̄ = E[X] is finite then I achieves its minimum at x̄ with I(x̄) = 0.
Hint: I is a supremum of lower semicontinuous convex functions. I(x) ≥ 0 for all x, but by Jensen's inequality I(x̄) ≤ 0.

∗Exercise 2.28. Suppose M(θ) < ∞ in some open neighborhood around the origin. Show that then x̄ is the unique zero of I: that is, x ≠ x̄ implies I(x) > 0.
Hint: For any x > x̄, (log M(θ))′ < x for θ in some interval (0, δ).

Exercise 2.29. Check that the rate functions found in Example 1.1 and Exercises 2.4 and 2.5 match (2.9).

Exercise 2.27 together with the earlier observations shows that when x̄ is finite, I(x) is nonincreasing for x < x̄ and nondecreasing for x > x̄. In particular, if a ≤ x̄ ≤ b, then I(a) = inf_{x≤a} I(x) and I(b) = inf_{x≥b} I(x). This proves the upper bound for the sets F = (−∞, a] and F = [b, ∞) in the case where the mean is finite.

Exercise 2.30. Prove that the sample mean Sn/n of i.i.d. real-valued random variables satisfies the upper large deviation bound (2.3) with normalization n and rate I defined in (2.9), with no further assumptions on the distribution.
Hint: The case of finite mean is almost done above. Then consider separately the cases where the mean is infinite and where the mean does not exist.

While Cramér's theorem is valid in general, it does not give much information unless the variables have exponentially decaying tails. This point is explored in the next exercise.

Exercise 2.31. Let {Xi} be an i.i.d. real-valued sequence. Assume E[X₁²] < ∞ but, for any ε > 0, P{X1 > b} ≥ e^{−εb} for all large enough b. Show that
(a) lim_{n→∞} (1/n) log P{Sn/n > E[X1] + δ} = 0 for any δ > 0.
(b) The rate function is identically 0 on [E(X1), ∞).
Hint: For (a), deduce
\[
P\{S_n/n \ge E[X_1] + \delta\} \ge P\{S_{n-1} \ge (n-1)E[X_1]\}\; P\{X_1 \ge n\delta + E[X_1]\}
\]
and apply the central limit theorem. For (b), first find M(θ) for θ > 0. Then observe that for θ ≤ 0 and x ≥ E[X1], θx − log M(θ) ≤ θ(x − E[X1]) ≤ 0.

Exercise 2.32. Let {Xi} be an i.i.d. real-valued sequence. Prove that the closure of the set {I < ∞} is the same as the closure of the convex hull of the support of the distribution of X. (The convex hull of a set is the intersection of all convex sets containing it.)
Hint: Let K be the latter set and y ∉ K. To show that I(y) = ∞, find θ ∈ R such that θy − ε > sup_{x∈K} xθ. For the other direction, take y in the interior of {I = ∞}. To get y ∉ K, show first that there exists a sequence θn converging to either ∞ or −∞ such that φ_y(θn) = θn y − log M(θn) converges to infinity. Assume θn → ∞. Show that for some ε > 0, |x − y| ≤ ε implies φ_x(θ) → ∞ as θ → ∞. Then, for θ > 0,
\[
\theta(y - \varepsilon) - \log M(\theta) \le -\log \mu\{x : |x - y| \le \varepsilon\}
\]
where µ is the distribution of X. Let θ → ∞.

Cramér's theorem is quite crude because only the exponentially decaying terms of a full expansion affect the result. In some cases one can derive much more precise asymptotics.

Exercise 2.33. Prove that if {Xk} are i.i.d. standard normal, then for any k ∈ N and a > 0,
\[
\log P\{S_n \ge an\} \sim -\frac{a^2 n}{2} - \frac{1}{2}\log(2\pi n a^2) + \log\Bigl[\,1 - \frac{1}{a^2 n} + \frac{1\cdot 3}{a^4 n^2} - \cdots + (-1)^k\, \frac{1\cdot 3\cdot 5\cdots(2k-1)}{a^{2k} n^k}\,\Bigr].
\]
Hint: Observe that
\[
\frac{d}{dx}\Bigl[\, e^{-x^2/2} \sum_{k=0}^{n} (-1)^k \bigl(1\cdot 3\cdots(2k-1)\bigr)\, x^{-2k-1} \Bigr]
\ \begin{cases} \le -e^{-x^2/2} & \text{if } n \text{ is even}, \\[2pt] \ge -e^{-x^2/2} & \text{if } n \text{ is odd}. \end{cases}
\]

Exercise 2.34. Continuing Exercise 2.29, derive Cramér rate functions for further basic distributions.
(a) For real α > 0, the rate α exponential distribution has density f(x) = αe^{−αx} on R+. Derive the Cramér rate function I(x) = αx − 1 − log(αx) for x > 0.
(b) For real λ > 0, the mean λ Poisson distribution has probability mass function p(k) = e^{−λ}λ^k/k! for k ∈ Z+. Derive the Cramér rate function I(x) = x log(x/λ) − x + λ for x ≥ 0.

We turn to Cramér's theorem in multiple dimensions. When {Xn} are Rd-valued, the moment generating function is given by M(θ) = E[e^{θ·X}] for θ ∈ Rd. Again, M(θ) ∈ (0, ∞]. Define
\[
I(x) = \sup_{\theta\in\mathbb{R}^d} \{\theta\cdot x - \log M(\theta)\}. \tag{2.12}
\]

Exercise 2.35. Check that Exercises 2.27 and 2.28 apply to the multidimensional case as well.

Hölder's inequality implies that log M(θ) is a convex function: with t ∈ (0, 1), p = 1/t and q = 1/(1 − t),
\[
M(t\theta_1 + (1-t)\theta_2) = E\bigl[e^{t\theta_1\cdot X}\, e^{(1-t)\theta_2\cdot X}\bigr] \le E[e^{\theta_1\cdot X}]^t\, E[e^{\theta_2\cdot X}]^{1-t} = M(\theta_1)^t\, M(\theta_2)^{1-t}. \tag{2.13}
\]

The full LDP of the one-dimensional Cramér theorem does not generalize to multiple dimensions without an additional assumption. Counterexamples appear in [20].

Cramér's theorem on Rd. Let {Xn} be a sequence of i.i.d. Rd-valued random variables and let µn be the distribution of the sample mean Sn/n. Then without further assumptions weak LDP(µn, n, I) holds with I defined in (2.12). If, moreover, M(θ) < ∞ in a neighborhood of 0, then LDP(µn, n, I) holds and I is a tight rate function.

At this point we prove the upper bound for compact sets without assumptions on M and then exponential tightness assuming that M is finite near the origin. Then we give a proof of the lower bound under the restrictive assumption
\[
M(\theta) < \infty \text{ for all } \theta \in \mathbb{R}^d \quad\text{and}\quad |\theta|^{-1}\log M(\theta) \to \infty \text{ as } |\theta| \to \infty. \tag{2.14}
\]
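Returning for a moment to Exercise 2.34, the closed forms there can be sanity-checked by maximizing θx − log M(θ) on a grid, using M(θ) = α/(α − θ) for θ < α in the exponential case and log M(θ) = λ(e^θ − 1) in the Poisson case. A sketch with illustrative parameters:

```python
# Grid checks of the Cramér rate functions in Exercise 2.34.
import numpy as np

alpha, lam, x = 2.0, 3.0, 1.5

theta = np.linspace(-50, alpha - 1e-6, 400001)   # M finite only for theta < alpha
I_exp = np.max(theta * x - np.log(alpha / (alpha - theta)))
print(I_exp, alpha * x - 1 - np.log(alpha * x))  # exponential(alpha), x > 0

theta = np.linspace(-50, 50, 400001)
I_poi = np.max(theta * x - lam * np.expm1(theta))
print(I_poi, x * np.log(x / lam) - x + lam)      # Poisson(lam), x >= 0
```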
Both proofs introduce important techniques. Assumption (2.14) ensures that the supremum in (2.12) is achieved. This is precisely the issue that needs to be overcome when no assumptions on M are present. In Section 4.3 we revisit the theorem and prove its final version.

Proof of the upper bound for compacts and exponential tightness. For any Borel set C and θ ∈ Rd,
\[
P\{S_n/n \in C\} = E[\mathbf{1}\{S_n/n \in C\}] \le e^{-n \inf_{y\in C} \theta\cdot y}\, E[e^{\theta\cdot S_n}] = e^{-n \inf_{y\in C} \theta\cdot y}\, M(\theta)^n.
\]
This shows that
\[
\frac{1}{n}\log P\{S_n/n \in C\} \le -\sup_{\theta}\, \inf_{y\in C}\, \{\theta\cdot y - \log M(\theta)\}. \tag{2.15}
\]
We would like to interchange the sup and the inf to find I(y) on the right-hand side. This can be done if C is a compact convex set.

Minimax theorem on Rd. Let C ⊂ Rd be compact and convex. Let D ⊂ Rd be convex. Let f : C × D → R be such that for each θ ∈ D, f(y, θ) is convex and continuous in y ∈ C, and for each y ∈ C, f(y, θ) is concave in θ ∈ D. Then
\[
\sup_{\theta\in D}\, \inf_{y\in C} f(y, \theta) = \inf_{y\in C}\, \sup_{\theta\in D} f(y, \theta).
\]

This theorem is a special case of the more general minimax theorem proved in Appendix A.5. To have a feeling for the theorem above, think of a horse saddle in R3. We have a smooth function that is convex in one direction and concave in the other. Taking sup in the concave direction and inf in the convex one will result in the saddle point regardless of the order.

The set D = {θ : M(θ) < ∞} is convex by (2.13), C is a compact convex set by assumption, and f(y, θ) = θ · y − log M(θ) satisfies the assumptions of the minimax theorem. Thus the sup and the inf can be switched in (2.15) to give
\[
\frac{1}{n}\log P\{S_n/n \in C\} \le -\inf_{y\in C}\, \sup_{\theta}\, \{\theta\cdot y - \log M(\theta)\} = -\inf_{y\in C} I(y). \tag{2.16}
\]
We have the upper bound with rate function I of (2.12) for compact convex sets, even without taking the n → ∞ limit.

We extend the upper bound to an arbitrary compact set K. Let α < inf_K I. Since I is lower semicontinuous, {I > α} is open. For each x ∈ K ⊂ {I > α} pick a compact ball Cx centered at x with nonempty interior and such that Cx ⊂ {I > α}. Cover K with a finite collection C_{x1}, . . . , C_{xN} of
such balls. The upper bound for compact convex sets gives
\[
P\{S_n/n \in K\} \le \sum_{i=1}^{N} P\{S_n/n \in C_{x_i}\} \le \sum_{i=1}^{N} e^{-n \inf_{C_{x_i}} I} \le N e^{-n\alpha}.
\]
Taking n ↗ ∞ and then α ↗ inf_K I gives the upper bound (2.3) in weak LDP(µn, n, I).

Last, we verify exponential tightness under the assumption that M is finite near the origin. Theorem 2.19 then implies the upper bound for closed sets. To this end, from (2.10) and (2.11) it follows that for any b > 0 we can find a large enough a = a(b) > 0 such that
\[
P\{|S_n^{(i)}| \ge na\} \le e^{-bn} \quad \text{for } i = 1, 2, \dots, d, \text{ and all } n \in \mathbb{N}.
\]
Here y^{(i)} denotes the ith coordinate of a vector y ∈ Rd. Definition 2.18 of exponential tightness is satisfied with rn = n and Kb = {y : |y^{(i)}| ≤ a(b) for all i = 1, . . . , d}.

Exercise 2.36. The minimax theorem was used above to turn (2.15) into the non-asymptotic upper bound (2.16) for compact convex sets. This was done to illustrate the minimax trick and because bounds that are valid for finite n are useful. However, we can proceed directly from (2.15) to the upper large deviation bound for a general compact set K. Fill in the details in the following outline. With notation as above, for each x ∈ K find θx such that θx · x − log M(θx) > α. Pick a compact convex ball Ux centered at x and with nonempty interior such that θx · y − log M(θx) > α − ε for y ∈ Ux. Proceed as in the proof above.

Proof of Cramér's lower bound under (2.14). We introduce the classical change of measure argument for the lower bound. Let our random variables {Xk} be defined on a probability space (Ω, F, P). On any open set where M(θ) is finite it is differentiable and ∇M(θ) = E[X e^{θ·X}]. This is by dominated convergence. Thus θ · x − log M(θ) is a concave differentiable function of θ that, by (2.14), achieves its maximum I(x) at some θx. Then ∇M(θx) = x M(θx). Define the probability measure νx on Rd by
\[
\nu_x(B) = \frac{1}{M(\theta_x)}\, E\bigl[e^{\theta_x\cdot X}\, \mathbf{1}\{X \in B\}\bigr], \quad B \in \mathcal{B}_{\mathbb{R}^d}.
\]
The mean of νx is
\[
\int_{\mathbb{R}^d} y\, \nu_x(dy) = \frac{E[X e^{\theta_x\cdot X}]}{M(\theta_x)} = \frac{\nabla M(\theta_x)}{M(\theta_x)} = x.
\]
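As a brief aside before the argument continues: the minimax interchange behind (2.16) can be eyeballed numerically. For f(y, θ) = θy − log M(θ) in one dimension, sup inf and inf sup computed on a grid should agree. A sketch with Bernoulli(1/2) and C = [0.6, 0.9]; all choices are illustrative.

```python
# Grid check of sup_theta inf_y f = inf_y sup_theta f for
# f(y, theta) = theta*y - log M(theta), M the Bernoulli(1/2) mgf.
import numpy as np

y = np.linspace(0.6, 0.9, 301)[:, None]          # compact convex set C
theta = np.linspace(-10.0, 10.0, 2001)[None, :]
f = theta * y - np.log(0.5 + 0.5 * np.exp(theta))
sup_inf = np.max(np.min(f, axis=0))              # sup over theta of inf over y
inf_sup = np.min(np.max(f, axis=1))              # inf over y of sup over theta
print(sup_inf, inf_sup)                          # both ~ I(0.6) ~ 0.0201
```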
Let Qx,n be the probability measure on Ω defined by
\[
Q_{x,n}(A) = \frac{E[\mathbf{1}_A\, e^{\theta_x\cdot S_n}]}{E[e^{\theta_x\cdot S_n}]} \quad\text{for } A \in \mathcal{F}.
\]
Now for the lower bound. Take an open set G ⊂ Rd, x ∈ G, and ε > 0 such that {y : |y − x| < ε} ⊂ G. Then
\[
P\{S_n/n \in G\} \ge P\{|S_n - nx| < \varepsilon n\}
\ge e^{-n\theta_x\cdot x - n\varepsilon|\theta_x|}\, E\bigl[e^{\theta_x\cdot S_n}\, \mathbf{1}\{|S_n - nx| < \varepsilon n\}\bigr]
= e^{-n\theta_x\cdot x - n\varepsilon|\theta_x|}\, M(\theta_x)^n\, Q_{x,n}\{|S_n - nx| < \varepsilon n\}.
\]
The key observation is that under Qx,n the variables X1, X2, . . . , Xn are i.i.d. νx-distributed: for B1, . . . , Bn ∈ B_{Rd},
\[
Q_{x,n}\Bigl(\bigcap_{k=1}^{n} \{X_k \in B_k\}\Bigr) = \prod_{k=1}^{n} \frac{E[\mathbf{1}_{B_k}(X)\, e^{\theta_x\cdot X}]}{E[e^{\theta_x\cdot X}]} = \prod_{k=1}^{n} \nu_x(B_k).
\]
By the law of large numbers Qx,n{|Sn − nx| < εn} → 1, and we get the bound
\[
\liminf_{n\to\infty} \frac{1}{n}\log P\{S_n/n \in G\} \ge -I(x) - \varepsilon|\theta_x|.
\]
Taking ε → 0 and sup over x ∈ G on the right proves the lower bound (2.4).

The measure νx is called the tilted measure. The dependence on n in Qx,n is an artifact we can eliminate by using a single infinite product measure on a sequence space. This is what we do in Section 5.2 on Sanov's theorem.

The change of measure argument replaced the original measure P by a new measure Qx,n under which outcome x became typical rather than rare. In the proof this appears to be merely a trick, but we shall see later that there is more to it. Namely, to produce the deviation Sn ≈ nx the process {Xk} actually behaves like an i.i.d. νx-sequence. This is an interesting conclusion. A priori one could also imagine that the system prefers to deviate a small number of variables while letting most Xk's behave in a typical fashion. (See Exercises 2.38 and 6.19 and the related maximum entropy principle in Section 5.3.) A lesson of large deviation theory is that a deviation is not produced in an arbitrary manner, but rather in the most probable way, and this can be captured by the rate function.

Exercise 2.37. Let {Xn} be i.i.d. Bernoulli random variables with success probability p ∈ [0, 1]. Show that for s ∈ [0, 1] the measure νs in the proof above is the Bernoulli measure with success probability s. Investigate νx for your other favorite distributions.
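Exercise 2.37 can be verified in a few lines: with the maximizing tilt θs = log[s(1 − p)/((1 − s)p)], the tilted success probability p e^{θs}/M(θs) simplifies to s, and sampling under the tilt concentrates the sample mean near s rather than p. A simulation sketch; p, s, n are illustrative.

```python
# Exponential tilting of Bernoulli(p): the tilted measure nu_s is Bernoulli(s).
import numpy as np

rng = np.random.default_rng(0)
p, s, n = 0.5, 0.8, 100000
theta = np.log(s * (1 - p) / ((1 - s) * p))   # maximizer of theta*s - log M(theta)
tilted_p = p * np.exp(theta) / (1 - p + p * np.exp(theta))
print(tilted_p)                               # simplifies to s exactly
print(rng.binomial(n, tilted_p) / n)          # sample mean near s, not p
```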
Exercise 2.38. Let Sn = X1 + · · · + Xn be simple symmetric random walk on Z. That is, {Xk} are i.i.d. with distribution P(Xk = ±1) = 1/2. Let a ∈ [0, 1]. With elementary calculation find the limit of the process {Xk} conditioned on |Sn − ⌊na⌋| ≤ 1, as n → ∞.
Hint: Fix x1, . . . , xm ∈ {±1}, write the probability P(X1 = x1, . . . , Xm = xm | |Sn − ⌊na⌋| ≤ 1) in terms of factorials and observe the asymptotics. Note that the conditioning event cannot always be written Sn = ⌊na⌋ because Sn must have the parity of n.

2.5. Limits, deviations, and fluctuations

Let {Yn} be a sequence of random variables with values in a metric space (X, d) and let µn be the distribution of Yn, that is, µn(B) = P{Yn ∈ B} for B ∈ B_X. Naturally an LDP for the sequence {µn} is related to the asymptotic behavior of Yn.

Suppose LDP(µn, rn, I) holds and Yn → ȳ in probability. Then the limit ȳ does not represent a deviation. The rate function I recognizes this with the value I(ȳ) = 0 that follows from the upper bound. For any open neighborhood G of ȳ we have µn(G) → 1. Consequently for the closure Ḡ,
\[
0 \le \inf_{\bar G} I \le -\limsup_{n\to\infty} r_n^{-1}\log \mu_n(\bar G) = 0.
\]
Let G shrink down to ȳ. Lower semicontinuity forces I(ȳ) = 0.

Every LDP satisfies inf I = 0, as can be seen by taking F = X in the upper bound (2.3). But the zero set of I does not necessarily represent limit values. It may simply be that the probability of a deviation decays slower than exponentially in rn, which leads to I = 0.

Exercise 2.39. In case the reader prefers an off-the-shelf example rather than playing with his own examples, here is one. Fix a sequence 0 < an ↗ ∞, let m denote Lebesgue measure, and define {µn} on R by
\[
\mu_n(A) = (1 - a_n^{-1})\, \mathbf{1}_A(0) + a_n^{-1}\, m(A \cap (0, 1]).
\]
Clearly µn → δ0 weakly, or equivalently, if Yn has distribution µn then Yn → 0 in probability. Given any c ∈ [0, ∞], show that by an appropriate choice of rn we can have the LDP with rate function
\[
I(x) = \begin{cases} 0, & x = 0 \\ c, & x \in (0, 1] \\ \infty, & x \notin [0, 1]. \end{cases}
\]

Returning to the general discussion, an LDP can imply convergence of the random variables if the rate function has good properties. Assume that I is a tight rate function and has a unique zero I(ȳ) = 0. Let A =
{y : d(y, ȳ) ≥ ε}. Compactness and lower semicontinuity ensure that the infimum u = inf_A I is achieved. Since ȳ ∉ A, it must be that u > 0. Then, for n large enough, the upper large deviation bound (2.3) implies
\[
P\{d(Y_n, \bar y) \ge \varepsilon\} \le e^{-r_n(\inf_A I - u/2)} = e^{-r_n u/2}.
\]
Thus, Yn → ȳ in probability. If, moreover, rn grows fast enough so that Σ_n e^{−c r_n} < ∞ for all c > 0, then the Borel–Cantelli lemma implies that Yn → ȳ almost surely.

For i.i.d. variables Cramér's theorem should also be understood in relation to the central limit theorem (CLT). Consider the case where M(θ) is finite in a neighborhood of the origin, so that X has finite mean x̄ = E[X] and finite variance σ², and I(x) > 0 for x ≠ x̄ (Exercise 2.28). Then for each δ > 0 we have the large deviation bound
\[
P\{S_n/n - \bar x \ge \delta\} \le e^{-n I(\bar x + \delta)}. \tag{2.17}
\]
(Recall (2.10) and Exercise 2.25.)

By contrast, the CLT tells us that small deviations of order n^{−1/2} converge to a limit distribution: for r ∈ R,
\[
P\{S_n/n - \bar x \ge r n^{-1/2}\} \xrightarrow[n\to\infty]{} \int_r^{\infty} \frac{e^{-s^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\, ds.
\]
This distinction is sometimes expressed by saying that the CLT describes fluctuations as opposed to deviations. There is a significant qualitative difference between Cramér's theorem and the CLT. The CLT is an example of universality: the Gaussian limit is valid for all distributions with finite variance. The Cramér rate function I on the other hand depends on the entire distribution. (From convex analysis we will learn that I determines M.)

There are also results on moderate deviations that fall between large deviations and CLT fluctuations. For example, if d = 1 and M is finite in a neighborhood of 0, then for any α ∈ (0, 1/2),
\[
n^{-2\alpha} \log P\bigl\{|S_n/n - \bar x| \ge \delta n^{-1/2+\alpha}\bigr\} \xrightarrow[n\to\infty]{} -\frac{\delta^2}{2\sigma^2}.
\]
Note that this limit picks the leading exponential factor from the Gaussian. In Chapter 11 we discuss refinements to Cramér's theorem and moderate deviations.
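The moderate deviation limit can be observed numerically for fair coin flips (Xi ∈ {0, 1}, x̄ = 1/2, σ² = 1/4), where the two-sided tail is an exact binomial quantity. A sketch, assuming scipy; α, δ, and the values of n are illustrative, and the convergence is slow.

```python
# Numerical look at the moderate deviation limit for Bernoulli(1/2) sums.
import numpy as np
from scipy.stats import binom

alpha, delta, var = 0.25, 1.0, 0.25
for n in [10**3, 10**5, 10**7]:
    t = delta * n ** (-0.5 + alpha)                  # deviation scale n^{-1/2+alpha}
    hi = binom.logsf(np.ceil(n * (0.5 + t)) - 1, n, 0.5)
    lo = binom.logcdf(np.floor(n * (0.5 - t)), n, 0.5)
    log_p = np.logaddexp(hi, lo)                     # log P{|S_n/n - 1/2| >= t}
    print(n, n ** (-2 * alpha) * log_p, -delta**2 / (2 * var))
```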
Chapter 3

Large deviations and asymptotics of integrals

This chapter takes up two general issues: transferring LDPs from one space to another by a mapping, and asymptotics of integrals. In the last section we discuss our first example from statistical mechanics.

3.1. Contraction principle

When f : X → Y is a measurable mapping, a measure µ on X can be "pushed forward" to a measure ν on Y by the definition ν(B) = µ(f^{-1}(B)) for measurable subsets B ⊂ Y. This definition is abbreviated as ν = µ ◦ f^{-1}. It preserves total mass so it transforms probability measures into probability measures. The contraction or push-forward principle applies this same idea to transfer an LDP from X to Y. In formula (3.1) below note that by convention the infimum of an empty set is infinite. Recall also the definition of the lower semicontinuous regularization ulsc of a function u: ulsc(y) = sup{ inf_G u : open G ∋ y }.

Contraction principle. Let X and Y be Hausdorff spaces and f : X → Y a continuous mapping. Assume LDP(µn, rn, I) on X. Let νn = µn ◦ f^{-1}. Set
\[
\tilde J(y) = \inf_{x\,:\,f(x) = y} I(x) \quad \text{for } y \in Y, \tag{3.1}
\]
and J = J̃lsc. Then
(a) LDP(νn, rn, J) holds on Y.
(b) If I is tight, then J = J̃ and J is tight as well.

Proof. By Lemma 2.11 it suffices to prove that J̃ satisfies the large deviation bounds (2.3) and (2.4). Take a closed set F ⊂ Y. Then
\[
\limsup_{n\to\infty} \frac{1}{r_n}\log \mu_n(f^{-1}(F)) \le -\inf_{x\in f^{-1}(F)} I(x) = -\inf_{y\in F}\, \inf_{f(x)=y} I(x) = -\inf_{y\in F} \tilde J(y).
\]
The lower bound is proved similarly and (a) follows.

Assume now that I is tight. Observe that if J̃(y) < ∞, then f^{-1}(y) is nonempty and closed, and the nested nonempty compact sets {I ≤ J̃(y) + 1/n} ∩ f^{-1}(y) have a nonempty intersection. Hence J̃(y) = I(x) for some x ∈ f^{-1}(y). Consequently, {J̃ ≤ c} = f({I ≤ c}) is a compact subset of Y. In particular, {J̃ ≤ c} is closed and J̃ is lower semicontinuous and is hence identical to J.

If the rate function I is not tight, then J̃ may fail to be lower semicontinuous (and hence J ≠ J̃).

Exercise 3.1. Let X = [0, ∞) and µn(dx) = φn(x) dx where
\[
\varphi_n(x) = n x^{-2}\, e^{1 - n/x}\, \mathbf{1}_{(0,n)}(x).
\]
Show that {µn} are not tight on [0, ∞) but LDP(µn, n, I) holds with I(x) = x^{−1} for x > 0 and infinite otherwise. (Tightness is discussed in Appendix B.4.) Note that I is not a tight rate function. (A closed-form check of these claims appears after Exercise 3.3 below.)

∗Exercise 3.2. Let f : [0, ∞) → S¹ = {y ∈ C : |y| = 1} be f(x) = e^{2πi x/(x+1)} and νn = µn ◦ f^{-1}, with µn defined in the previous exercise. Prove that J̃(z) = inf_{f(x)=z} I(x) is not lower semicontinuous and that {νn} are tight and converge weakly to δ1. Prove also that LDP(νn, n, J) holds with J(e^{2πit}) = (1 − t)/t for t ∈ (0, 1].

The simplest situation when the contraction principle is applied is when X is a subspace of Y.

∗Exercise 3.3. Suppose LDP(µn, rn, I) holds on X and that X is a Hausdorff space contained in the larger Hausdorff space Y. Find J so that LDP(µn, rn, J) holds on Y. What happens when I is tight on X?
Hint: A natural way to extend I is to simply set it to infinity outside X. However, this may fail to be lower semicontinuous.

The next example is basically a tautology but it is an example of Sanov's theorem that comes from Section 5.2.
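The promised check of Exercise 3.1: integrating φn gives total mass 1, and µn((0, x]) = e^{1 − n/x} exactly, so −(1/n) log µn((0, x]) → 1/x = inf_{(0,x]} I. A small sketch, assuming scipy for the quadrature; n and x are illustrative.

```python
# Closed-form check for Exercise 3.1: phi_n is a density and the left tail
# mu_n((0, x]) equals e^{1 - n/x}, giving the rate 1/x on the exponential scale.
import numpy as np
from scipy.integrate import quad

n, x = 50, 2.0
total, _ = quad(lambda u: n * u**-2 * np.exp(1 - n / u), 0, n)
print(total)                          # ~1.0
left_tail = np.exp(1 - n / x)         # mu_n((0, x]) in closed form
print(-np.log(left_tail) / n, 1 / x)  # (n/x - 1)/n -> 1/x
```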
Example 3.4. Fix 0 < p < 1 and let {Xn} be an i.i.d. Bernoulli sequence with success probability p. Take X = [0, 1] and express the common distribution of {Xn} as p δ1 + (1 − p)δ0. Here δx is the probability measure that puts all mass at the single point x, equivalently
\[
\delta_x(A) = \mathbf{1}_A(x) = \begin{cases} 1, & x \in A \\ 0, & x \notin A. \end{cases}
\]
Example 2.1 gave LDP(µn, n, Ip) for the distribution µn of Sn/n = (X1 + · · · + Xn)/n with rate Ip from (1.2). Consider now the empirical measures
\[
L_n = \frac{1}{n} \sum_{k=1}^{n} \delta_{X_k}.
\]
Ln is a random variable with values in Y = M1({0, 1}). Let νn be its distribution. The empirical measure usually contains more information than the sample mean Sn/n, but in the Bernoulli case
\[
L_n = \frac{S_n}{n}\, \delta_1 + \Bigl(1 - \frac{S_n}{n}\Bigr)\, \delta_0.
\]
Hence νn = µn ◦ f^{-1} for f : X → Y defined by f(s) = sδ1 + (1 − s)δ0. The contraction principle gives LDP(νn, n, H) with rate function defined for α ∈ M1({0, 1}) by H(α) = Ip(s) for α = sδ1 + (1 − s)δ0 with s ∈ [0, 1]. In Chapter 5 we see that H(α) is a relative entropy and that the LDP for the empirical measure holds in general for i.i.d. processes.

3.2. Varadhan's theorem

For a measurable function f : X → R bounded above, a probability measure µ, and a sequence rn → ∞, the moment generating function obeys these asymptotics:
\[
\lim_{n\to\infty} \frac{1}{r_n}\log \int e^{r_n f}\, d\mu = \mu\text{-}\operatorname{ess\,sup} f.
\]
With c = µ-ess sup f the argument is
\[
c \ge \frac{1}{r_n}\log \int e^{r_n f}\, d\mu \ge \frac{1}{r_n}\log \int_{\{f > c - \varepsilon\}} e^{r_n f}\, d\mu \ge \frac{1}{r_n}\log \mu\{f > c - \varepsilon\} + c - \varepsilon.
\]
Let us now replace µ by a sequence µn. If {µn} satisfies a large deviation principle with normalization rn, then the rate function I comes into the picture. The result is known as Varadhan's theorem. It is a probabilistic analogue of the well-known Laplace method for asymptotics of integrals illustrated by the next simple exercise.
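First, the fixed-measure limit above can be checked in closed form for µ uniform on (0, 1) and f(x) = x, where ∫ e^{rf} dµ = (e^r − 1)/r and ess sup f = 1. A tiny sketch, computed in log form to avoid overflow; the example is my choice.

```python
# Closed-form check that (1/r) log of the integral of e^{r f} dmu tends to
# ess sup f = 1, for mu = Uniform(0,1) and f(x) = x.
import numpy as np

for r in [10, 100, 1000]:
    log_integral = r + np.log1p(-np.exp(-r)) - np.log(r)   # log[(e^r - 1)/r]
    print(r, log_integral / r)      # approaches 1, at speed (log r)/r
```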
Exercise 3.5. (Stirling's formula) Use induction to show that
\[
n! = \int_0^{\infty} e^{-x} x^n\, dx.
\]
Observe that e^{−x}x^n has a unique maximum at x = n. Prove that
\[
\lim_{n\to\infty} \frac{n!}{\sqrt{2\pi n}\; e^{-n} n^n} = 1.
\]
Hint: Change variables y = x/n to reduce the problem to one of estimating an integral of the form ∫_0^∞ e^{n f(y)} dy. Show that the main contribution to this integral comes from y ∈ [1 − ε, 1 + ε] and use a Taylor expansion of f near y = 1.

Varadhan's theorem. Suppose LDP(µn, rn, I) holds, f : X → [−∞, ∞] is a continuous function, and
\[
\lim_{b\to\infty}\, \limsup_{n\to\infty} \frac{1}{r_n}\log \int_{\{f \ge b\}} e^{r_n f}\, d\mu_n = -\infty. \tag{3.2}
\]
Then
\[
\lim_{n\to\infty} \frac{1}{r_n}\log \int e^{r_n f}\, d\mu_n = \sup_{x\,:\, f(x)\wedge I(x) < \infty} \{f(x) - I(x)\}.
\]

Exercise 3.6. A function f bounded above is the trivial case that satisfies (3.2). More generally, (3.2) follows if there exists α > 1 such that
\[
\sup_n\, \Bigl(\int e^{\alpha r_n f}\, d\mu_n\Bigr)^{1/r_n} < \infty. \tag{3.3}
\]
Note that even though f(x) = ∞ is allowed, condition (3.2) forces µn{f = ∞} = 0 for large enough n.

Informally, here is the idea behind Varadhan's theorem. Suppose that we can partition the space into small sets Ui with points xi ∈ Ui such that f ≈ f(xi) on Ui and µn(Ui) ≈ e^{−rn I(xi)}. Then for large n the following approximations are valid:
\[
\frac{1}{r_n}\log \int e^{r_n f}\, d\mu_n \approx \frac{1}{r_n}\log \sum_i e^{r_n f(x_i)}\, \mu_n(U_i) \approx \frac{1}{r_n}\log \sum_i e^{r_n (f(x_i) - I(x_i))} \approx \max_i\, \bigl[f(x_i) - I(x_i)\bigr].
\]

We can get stronger statements for separate upper and lower bounds. Hence we split the proof into two parts.
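The Laplace-method picture behind Exercise 3.5 can be confirmed numerically: after the substitution y = x/n, n! = n^{n+1} ∫_0^∞ e^{n(log y − y)} dy, and the integral is dominated by a neighborhood of the maximizer y = 1. A sketch, assuming scipy; n is an illustrative value.

```python
# Laplace-method check of Stirling's formula (Exercise 3.5).
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

n = 30
integral, _ = quad(lambda y: np.exp(n * (np.log(y) - y)), 0, np.inf)
log_factorial = (n + 1) * np.log(n) + np.log(integral)   # log n! via y = x/n
print(log_factorial, gammaln(n + 1))                     # agree to high accuracy
# ratio n! / (sqrt(2 pi n) e^{-n} n^n), which tends to 1:
print(np.exp(gammaln(n + 1) - 0.5 * np.log(2 * np.pi * n) - n * (np.log(n) - 1)))
```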
  • 55. Random documents with unrelated content Scribd suggests to you:
  • 56. broad-brimmed hat, addressing a group of idlers and half-naked children. I could furnish your correspondent S. A. S. with more information if needful. T. J. Chester. Blue Bells of Scotland (Vol. viii., p. 388. Vol. ix., p. 209.).—Surely of Philadelphia is right in supposing that the Blue Bell of Scotland, in the ballad which goes by that name, is a bell painted blue, and used as the sign of an inn, and not the flower so called, as asserted by Henry Stephens, unless indeed there be an older ballad than the one commonly sung, which, as many of your readers must be aware, contains this line,—
  • 57. He dwells in merry Scotland, At the sign of the Blue Bell. I remember to have heard that the popularity of this song dates from the time when it was sung on the stage by Mrs. Jordan. Can any one inform me whether the air is ancient or modern? Honoré de Mareville. Guernsey. De male quæsitis gaudet non tertius hæres (Vol. ii., p. 167.).—The quotation here wanted has hitherto been neglected. The words may be found, with a slight variation, in Bellochii Praxis Moralis Theologiæ, de casibus reservatis, c., Venetiis, 1627, 4to. As the work is not common, I send the passage for insertion, which I know will be acceptable to other correspondents as well as to the querist: Divino judicio permittitur ut tales surreptores rerum sacrarum diu ipsis rebus furtivis non lætentur, sed imo ab aliis nequioribus furibus præfatæ res illis abripiantur, ut de se ipso fassus est ille, qui in suis ædibus hoc distichon inscripsit, ut refert Jo. Bonif., lib. de furt., § contrectatio, num. 134. in fin.: 'Congeries lapidum variis constructa rapinis, Aut uret, aut ruet, aut raptor alter habebit.' Et juxta illud: 'De rebus male acquisitis, non gaudebit tertius hæres.' Lazar (de monitorio), sect. 4. 9. 4., num. 16., imo nec secundus, ut ingenuè et perbellè fatetur in suo poemate, nostro idiomate Jerusalem celeste acquistata, cant. x. num. 88. Pater Frater Augustinus Gallutius de Mandulcho, ita canendo: 'D'un' acquisto sacrilego e immondo,
  • 58. Gode di rado il successor secondo, Pero che il primo e mal' accorto herede Senza discretion li da di piedi.' Bibliothecar. Chetham. Mawkin (Vol. ix., pp. 303. 385.).—Is not mawkin merely a corruption for mannikin? I strongly suspect it to be so, though Forby, in his Vocabulary of East Anglia, gives the word maukin as if peculiar to Norfolk and Suffolk, and derives it, like L., from Mal, for Moll or Mary. F. C. H. This word, in the Scottish dialect spelt maukin, means a hare. It occurs in the following verse of Burns in Tam Samson's Elegy: Rejoice, ye birring paitricks a'; Ye cootie moorcocks, crousely craw; Ye maukins, cock your fud fu' braw, Withouten dread; Your mortal fae is now awa', Tam Samson's dead! Kennedy M‘Nab. Putting a spoke in his wheel (Vol. viii., pp. 269. 351. 576.).—There is no doubt that putting a spoke in his wheel is offering an obstruction. But I have always understood the spoke to be, not a radius of the wheel, but a bar put between the spokes at right angles, so as to prevent the turning of the wheel; a rude mode of locking, which I have often seen practised. The correctness of the metaphor is thus evident. Wm. Hazel. Dog Latin (Vol. viii., p. 523.).—The return of a sheriff to a writ which he had not been able to serve, owing to the defendant's secreting himself in a swamp, will be new to English readers. It was Non come-at-ibus in swampo.
  • 59. Since the adoption of the Federal Constitution, the motto of the United States has been E pluribus unum. A country sign-painter in Bucks county, Pennsylvania, painted E pluribur unibus, instead of it on a sign. Uneda. Philadelphia. Swedish Words current in England (Vol. vii., pp. 231. 366.).—Very many Swedish words are current in the north of England, e. gr. barn or bearn (Scotticè bairn), Sw. barn; bleit or blate, bashful, Sw. blöd; to cleam, to fasten, to spread thickly over, Sw. klemma; cod, pillow, Sw. kudde; to gly, to squint, Sw. glo; to lope, to leap, Sw. löpa; to late (Cumberland), to seek, Sw. leta; sackless, without crime, Sw. saklös; sark, shirt, Sw. särk; to thole (Derbyshire), to endure, Sw. tala; to walt, to totter, to overthrow, Sw. wälta; to warp, to lay eggs, Sw. wärpa; wogh (Lancashire), wall, Sw. wägg, c. It is a fact very little known, that the Swedish language bears the closest resemblance of all modern languages to the English as regards grammatical structure, not even the Danish excepted. Suecas. Mob (Vol. viii., p. 524.).—I have always understood that this word was derived from the Latin expression mobile vulgus, which is, I believe, in Virgil. Uneda. Philadelphia. Days of my Youth (Vol. viii., p. 467.).—In answer to the inquiry made a few months since, whether Judge St. George Tucker, of Virginia, was the author of the lines beginning— Days of my youth. the undersigned states that he was a friend and relative of Judge Tucker, and knows him to have been the author. They had a great
  • 60. run at the time, and found their way not only into the newspapers, but even into the almanacs of the day. G. T. Philadelphia. Encore (Vol. viii., pp. 387. 524.).—A writer in an English magazine, a few years ago, proposed that the Latin word repetitus should be used instead of encore. Among other advantages he suggested that the people in the gallery of a theatre would pronounce it repeat-it- us, and thus make English of it. Uneda. Philadelphia. Richard Plantagenet, Earl of Cambridge (Vol. ix., p. 493.).—Your correspondent will find his question answered by referring to the History of the Royal Family, 8vo., Lond., 1741, pp. 119. 156. For an account of this book, which is founded upon the well-known Sandford's Genealogical History, see Clarke's Bibliotheca Legum, edit. 1819, p. 174. T. E. T. Islington. Right of redeeming Property (Vol. viii., p. 516.).—This right formerly existed in Normandy, and, I believe, in other parts of France. In the bailiwick of Guernsey, the laws of which are based on the ancient custom of Normandy, the right is still exercised, although it has been abolished for some years in the neighbouring island of Jersey. The law only applies to real property, which, by the Norman custom, was divided in certain proportions among all the children; and this right of retrait, as it is technically termed, was doubtless intended to counteract in some measure the too minute division of land, and to preserve inheritances in families. It must be exercised within a year of the purchase. For farther information on the subject, Berry's History of Guernsey, p. 176., may be consulted.
  • 61. Honoré de Mareville. Guernsey. Latin Inscription on Lindsey Court-house (Vol. ix., pp. 492. 552.).—I cannot but express my surprise at the learned (?) trifling of some of your correspondents on the inscription upon Lindsey Court-house. Try it thus: Fiat Justitia, 1619, Hæc domus Odit, amat, punit, conservat, honorat, Nequitiam, pacem, crimina, jura, bonos. which will make two lines, an hexameter and a pentameter, the first letters, O and N, having perhaps been effaced by time or accident. Neglectus. [That this emendation is the right one is clear from the communication of another correspondent, B. R. A. Y., who makes the same, and adds in confirmation, The following lines existed formerly (and do, perhaps, now) on the Market-house at Much Wenlock, Shropshire, which will explain their meaning: 'Hic locus Odit, amat, punit, conservat, honorat, Nequitiam, pacem, crimina, jura, bonos.' The O and N, being at the beginning of the lines as given by your correspondent, were doubtless obliterated by age.] The restoration of this inscription proposed by me is erroneous, and must be corrected from the perfect inscription as preserved at Pistoia and Much Wenlock, cited by another correspondent in p. 552. The three inscriptions are slightly varied. Perhaps amat pacem is better than amat leges, on account of the tautology with conservat jura. L.
  • 62. Myrtle Bee (Vol. ix., p. 205. c.).—I have carefully read and reread the articles on the myrtle bee, and I can come to no other conclusion than that it is not a bird at all, but an insect, one of the hawkmoths, and probably the humming-bird hawkmoth. We have so many indefatigable genuine field naturalists, picking up every straggler which is blown to our coasts, that I cannot think it possible there is a bird at all common to any district of England, and yet totally unknown to science. Now, insects are often exceedingly abundant in particular localities, yet scarcely known beyond them. The size C. Brown describes as certainly not larger than half that of the common wren. The humming-bird (H. M.) is scarcely so large as this, but its vibratory motion would make it look somewhat larger than it really is. Its breadth, from tip to tip of the wings, is twenty to twenty-four lines. The myrtle bee's short flight is rapid, steady, and direct, exactly that of the hawkmoth. The tongue of the myrtle bee is round, sharp, and pointed at the end, appearing capable of penetration, not a bad popular description of the suctorial trunk of the hawkmoth, from which it gains its generic name, Macroglossa. Its second pair of wings are of a rusty yellow colour, which, when closed, would give it it the appearance of being tinged with yellow about the vent. It has also a tuft of scaly hairs at the extremity of the abdomen, which would suggest the idea of a tail. In fact, on the wing, it appears very like a little bird, as attested by its common name. In habit it generally retires from the mid-day sun, which would account for its being put up by the dogs. The furze-chat, mentioned by C. Brown, is the Saxicola rubetra, commonly also called the whinchat. Wm. Hazel. Mousehunt (Vol. ix., p. 65. c.).—G. Tennyson identifies the mousehunt with the beechmartin, the very largest of our Mustelidæ, on the authority of Henley the dramatic commentator. Was he a naturalist too? I never heard of him as such. Now, Mr. W. R. D. Salmon, who first asked the question, speaks of it as less than the common weasel, and quotes Mr. Colquhoun's
  • 63. opinion, that it is only the young of the year. I have no doubt at all that this is correct. The young of all the Mustelidæ hunt, and to a casual observer exhibit all the actions of full-grown animals, when not more than half the size of their parents. There seems no reason to suppose that there are more than four species known in England, the weasel, the stoat or ermine, the polecat, and the martin. The full-grown female of the weasel is much smaller than the male. Go to any zealous gamekeeper's exhibition, and you will see them of many gradations in size. Wm. Hazel. Longfellow's Hyperion (Vol. ix., p. 495.).—I would offer the following rather as a suggestion than as an answer to Mordan Gillott. But it has always appeared to me that Longfellow has himself explained, by a simple allusion in the work, the reason which dictated the name of his Hyperion. As the ancients fabled Hyperion to be the offspring of the heavens and the earth; so, in his aspirations, and his weakness and sorrows, Flemming (the hero of the work) personifies, as it were, the mingling of heaven and earth in the heart and mind of a man of true nobility. The passage to which I allude is the following: Noble examples of a high purpose, and a fixed will! Do they not move, Hyperion-like, on high? Were they not likewise sons of heaven and earth?—Book iv. ch. 1. Seleucus. Benjamin Rush (Vol. ix., p. 451.).—Inquirer asks Why the freedom of Edinburgh was conferred upon him? I have looked into the Records of the Town Council, and found the following entry: 4th March, 1767. The Council admit and receive Richard Stocktoun, Esquire, of New Jersey, Councillour at Law, and Benjamin Rush, Esquire, of Philadelphia, to be burgesses and gild brethren of this city, in the most ample form.
  • 64. But there is no reason assigned. James Laurie, Conjoint Town Clerk. Quakers executed in North America (Vol. ix., p. 305.).—A fuller account of these nefarious proceedings is detailed in an abstract of the sufferings of the people called Quakers, in 2 vols., 1733; vol. i. (Appendix) pp. 491-514., and in vol. iii. pp. 195-232. E. D.
  • 65. Notices to Correspondents. For the purpose of inserting as many Replies as possible in this, the closing Number of our Ninth Volume, we have this week omitted our usual Notes on Books and Lists of Books wanted to purchase. W. W. (Malta). Received with many thanks. R. H. (Oxford). For Kentish Men and Men of Kent, see N. Q., Vol. v., pp. 321. 615. Mr. Long's easy Calotype Process reached us too late for insertion this week. It shall appear in our next. Notes and Queries is published at noon on Friday, so that the Country Booksellers may receive Copies in that night's parcels, and deliver them to their Subscribers on the Saturday. Notes and Queries is also issued in Monthly Parts, for the convenience of those who may either have a difficulty in procuring the unstamped weekly Numbers, or prefer receiving it monthly. While parties resident in the country or abroad, who may be desirous of receiving the weekly Numbers, may have stamped copies forwarded direct from the Publisher. The subscription for the stamped edition of Notes and Queries (including a very copious Index) is eleven shillings and fourpence for six months, which may be paid by Post-Office Order, drawn in favour of the Publisher, Mr. George Bell, No. 186. Fleet Street. DR. DE JONGH'S LIGHT BROWN COD LIVER OIL. Prepared for medicinal use in the Loffoden Isles, Norway, and put to the test of chemical analysis. The most effectual remedy for Consumption,
  • 66. Bronchitis, Asthma, Gout, Chronic Rheumatism, and all Scrofulous Diseases. Approved of and recommended by Berzelius, Liebig, Woehler, Jonathan Pereira, Fouquier, and numerous other eminent medical men and scientific chemists in Europe. Specially rewarded with medals by the Governments of Belgium and the Netherlands. Has almost entirely superseded all other kinds on the Continent, in consequence of its proved superior power and efficacy—effecting a cure much more rapidly. Contains iodine, phosphate of chalk, volatile acid, and the elements of the bile—in short, all its most active and essential principles—in larger quantities than the pale oils made in England and Newfoundland, deprived mainly of these by their mode of preparation. Sold Wholesale and Retail, in bottles, labelled with Dr. de Jongh's Stamp and Signature, by ANSAR, HARFORD, CO., 77. Strand, Sole Consignees and Agents for the United Kingdom and British Possessions; and by all respectable Chemists and Vendors of Medicine in Town and Country, at the following prices:— Imperial Measure, Half-pints, 2s. 6d.; Pints, 4s. 9d. BENNETT'S MODEL WATCH, as shown at the GREAT EXHIBITION. No. 1. Class X., in Gold and Silver Cases, in five qualities, and adapted to all Climates, may now be had at the MANUFACTORY, 65. CHEAPSIDE. Superior Gold London-made Patent Levers, 17, 15, and 12 guineas. Ditto, in Silver Cases, 8, 6, and 4 guineas. First-rate
  • 67. Geneva Levers, in Gold Cases, 12, 10, and 8 guineas. Ditto, in Silver Cases, 8, 6, and 5 guineas. Superior Lever, with Chronometer Balance, Gold, 27, 23, and 19 guineas. Bennett's Pocket Chronometer, Gold, 50 guineas; Silver, 40 guineas. Every Watch skilfully examine, timed, and its performance guaranteed. Barometers, 2l., 3l., and 4l. Thermometers from 1s. each. BENNET, Watch, Clock, and Instrument Maker to the Royal Observatory, the Board of Ordnance, the Admiralty, and the Queen, 65. CHEAPSIDE. Patronised by the Royal Family. TWO THOUSAND POUNDS for any person producing Articles superior to the following: THE HAIR RESTORED AND GREYNESS PREVENTED. BEETHAM'S CAPILLARY FLUID is acknowledged to be the most effectual article for Restoring the Hair in Baldness, strengthening when weak and fine, effectually preventing falling or turning grey, and for restoring its natural colour without the use of dye. The rich glossy appearance it imparts is the admiration of every person. Thousands have experienced its astonishing efficacy. Bottles 2s. 6d.; double size, 4s. 6d.; 7s. 6d. equal to 4 small; 11s. to 6 small; 21s. to 13 small. The most perfect beautifier ever invented. SUPERFLUOUS HAIR REMOVED. BEETHAM'S VEGETABLE EXTRACT does not cause pain or injury to the skin. Its effect is unerring, and it is now patronised by royalty and hundreds of the first families. Bottles, 5s.
  • 68. BEETHAM'S PLASTER is the only effectual remover of Corns and Bunions. It also reduces enlarged Great Toe Joints in an astonishing manner. If space allowed, the testimony of upwards of twelve thousand individuals, during the last five years, might be inserted. Packets, 1s.; Boxes, 2s. 6d. Sent Free by BEETHAM, Chemist, Cheltenham, for 14 or 36 Post Stamps. Sold by PRING, 30. Westmorland Street; JACKSON, 9. Westland Row; BEWLEY EVANS, Dublin; GOULDING, 108. Patrick Street, Cork; BARRY, 9. Main Street, Kinsale; GRATTAN, Belfast; MURDOCK, BROTHERS, Glasgow; DUNCAN FLOCKHART, Edinburgh. SANGER, 150. Oxford Street; PROUT, 229. Strand; KEATING, St. Paul's Churchyard; SAVORY MOORE, Bond Street; HANNAY, 63. Oxford Street; London. All Chemists and Perfumers will procure them. ROSS SONS' INSTANTANEOUS HAIR DYE, without Smell, the best and cheapest extant.—ROSS SONS have several private apartments devoted entirely to Dyeing the Hair, and particularly request a visit, especially from the incredulous, as they will undertake to dye a portion of their hair, without charging, of any colour required, from the lightest brown to the darkest black, to convince them of its effect. Sold in cases at 3s. 6d., 5s. 6d., 10s., 15s., and 20s. each case. Likewise wholesale to the Trade by the pint, quart, or gallon. Address, ROSS SONS, 119. and 120. Bishopsgate Street, Six Doors from Cornhill, London. ALLEN'S ILLUSTRATED CATALOGUE, containing Size, Price, and Description of upwards of 100 articles, consisting of PORTMANTEAUS, TRAVELLING-BAGS, Ladies' Portmanteaus, DESPATCH-BOXES, WRITING-DESKS, DRESSING-CASES, and other
  • 69. travelling requisites, Gratis on application, or sent free by Post on receipt of Two Stamps. MESSRS. ALLEN'S registered Despatch-box and Writing-desk, their Travelling-bag with the opening as large as the bag, and the new Portmanteau containing four compartments, are undoubtedly the best articles of the kind ever produced. J. W. T. ALLEN, 18. 22. West Strand. ONE THOUSAND BEDSTEADS TO CHOOSE FROM.—HEAL SON'S Stock comprises handsomely Japanned and Brass-mounted Iron Bedsteads, Children's Cribs and Cots of new and elegant designs, Mahogany, Birch, and Walnut-tree Bedsteads, of the soundest and best Manufacture, many of them fitted with Furnitures, complete. A large Assortment of Servants' and Portable Bedsteads. They have also every variety of Furniture for the complete furnishing of a Bed Room. HEAL SON'S ILLUSTRATED AND PRICED CATALOGUE OF BEDSTEADS AND BEDDING, sent Free by Post. HEAL SON, 196. Tottenham Court Road. PHOTOGRAPHIC APPARATUS, MATERIALS, and PURE CHEMICAL PREPARATIONS. KNIGHT SONS' Illustrated Catalogue, containing Description and Price of the best forms of Cameras and other Apparatus. Voightlander and Son's Lenses for Portraits and Views, together with the various Materials, and pure Chemical Preparations required in practising the Photographic Art. Forwarded free on receipt of Six Postage Stamps. Instructions given in every branch of the Art.
  • 70. An extensive Collection of Stereoscopic and other Photographic Specimens. GEORGE KNIGHT SONS, Foster Lane, London. PHOTOGRAPHIC INSTITUTION. THE EXHIBITION OF PHOTOGRAPHS, by the most eminent English and Continental Artists, is OPEN DAILY from Ten till Five. Free Admission. £ s. d. A Portrait by Mr. Talbot's Patent Process 1 1 0 Additional Copies (each) 0 5 0 A Coloured Portrait, highly finished (small size) 3 3 0 A Coloured Portrait, highly finished (larger size) 5 5 0 Miniatures, Oil Paintings, Water-Colour, and Chalk Drawings, Photographed and Coloured in imitation of the Originals. Views of Country Mansions, Churches, c., taken at a short notice. Cameras, Lenses, and all the necessary Photographic Apparatus and Chemicals, are supplied, tested, and guaranteed. Gratuitous Instruction is given to Purchasers of Sets of Apparatus. PHOTOGRAPHIC INSTITUTION, 168. New Bond Street. THE LONDON SCHOOL OF PHOTOGRAPHY, 78. Newgate Street.—At this Institution, Ladies and Gentlemen may learn in One Hour to take Portraits and Landscapes, and purchase the necessary Apparatus for Five Pounds. No charge is made for the Instruction.
  • 71. IMPROVEMENT IN COLLODION.—J. B. HOCKIN CO., Chemists, 289. Strand, have, by an improved mode of Iodizing, succeeded in producing a Collodion equal, they may say superior, in sensitiveness and density of Negative, to any other hitherto published; without diminishing the keeping properties and appreciation of half-tint for which their manufacture has been esteemed. Apparatus, pure Chemicals, and all the requirements for the practice of Photography. Instruction in the Art. THE COLLODION AND POSITIVE PAPER PROCESS. By J. B. HOCKIN. Price 1s., per Post, 1s. 2d. WHOLESALE PHOTOGRAPHIC DEPOT: DANIEL M‘MILLAN, 132. Fleet Street, London. The Cheapest House in Town for every Description of Photographic Apparatus, Materials, and Chemicals. * * * Price List Free on Application. COCOA-NUT FIBRE MATTING and MATS, of the best quality.—The Jury of Class 28, Great Exhibition, awarded the Prize Medal to T. TRELOAR, Cocoa-Nut Fibre Manufacturer, 42. Ludgate Hill, London. COLLODION PORTRAITS AND VIEWS obtained with the greatest ease and certainty by using BLAND LONG'S preparation of Soluble Cotton; certainty and uniformity of action over a lengthened period, combined with the most faithful rendering of the half-tones, constitute this a most valuable agent in the hands of the photographer. Albumenized paper, for printing from glass or paper negatives, giving a minuteness of detail unattained by any other method, 5s. per Quire.
  • 72. Waxed and Iodized Papers of tried quality. Instruction in the Processes. BLAND LONG, Opticians and Photographical Instrument Makers, and Operative Chemists, 153. Fleet Street. London. * * * Catalogues sent on application. THE SIGHT preserved by the Use of SPECTACLES adapted to suit every variety of Vision by means of SMEE'S OPTOMETER, which effectually prevents Injury to the Eyes from the Selection of Improper Glasses, and is extensively employed by BLAND LONG, Opticians, 153. Fleet Street, London. PHOTOGRAPHIC CAMERAS. OTTEWILL AND MORGAN'S Manufactory, 24. 25. Charlotte Terrace, Caledonian Road, Islington. OTTEWILL'S Registered Double Body Folding Camera, adapted for Landscapes or Portraits, may be had of A. ROSS, Featherstone Buildings, Holborn; the Photographic Institution, Bond Street; and at the Manufactory as above, where every description of Cameras, Slides, and Tripods may be had. The The Trade supplied. PHOTOGRAPHY.—HORNE CO.'S Iodized Collodion, for obtaining Instantaneous Views, and Portraits in from three to thirty seconds, according to light.
  • 73. Portraits obtained by the above, for delicacy of detail rival the choicest Daguerreotypes, specimens of which may be seen at their Establishment. Also every description of Apparatus, Chemicals, &c. &c. used in this beautiful Art.—123. and 121. Newgate Street.

PIANOFORTES, 25 Guineas each.—D'ALMAINE & CO., 20. Soho Square (established A.D. 1785), sole manufacturers of the ROYAL PIANOFORTES, at 25 Guineas each. Every instrument warranted. The peculiar advantages of these pianofortes are best described in the following professional testimonial, signed by the majority of the leading musicians of the age:—"We, the undersigned members of the musical profession, having carefully examined the Royal Pianofortes manufactured by MESSRS. D'ALMAINE & CO., have great pleasure in bearing testimony to their merits and capabilities. It appears to us impossible to produce instruments of the same size possessing a richer and finer tone, more elastic touch, or more equal temperament, while the elegance of their construction renders them a handsome ornament for the library, boudoir, or drawing-room." (Signed) J. L. Abel, F. Benedict, H. R. Bishop, J. Blewitt, J. Brizzi, T. P. Chipp, P. Delavanti, C. H. Dolby, E. F. Fitzwilliam, W. Forde, Stephen Glover, Henri Herz, E. Harrison, H. F. Hassé, J. L. Hatton, Catherine Hayes, W. H. Holmes, W. Kuhe, G. F. Kiallmark, E. Land, G. Lanza, Alexander Lee, A. Leffler, E. J. Loder, W. H. Montgomery, S. Nelson, G. A. Osborne, John Parry, H. Panofka, Henry Phillips, F. Praegar, E. F. Rimbault, Frank Romer, G. H. Rodwell, E. Rockel, Sims Reeves, J. Templeton, F. Weber, H. Westrop, T. E. Wright, &c. D'ALMAINE & CO., 20. Soho Square. Lists and Designs Gratis.

WESTERN LIFE ASSURANCE AND ANNUITY SOCIETY.
  • 74. 3. PARLIAMENT STREET, LONDON. Founded A.D. 1842.

Directors.—H. E. Bicknell, Esq.; T. S. Cocks, Jun., Esq., M.P.; G. H. Drew, Esq.; W. Evans, Esq.; W. Freeman, Esq.; F. Fuller, Esq.; J. H. Goodhart, Esq.; T. Grissell, Esq.; J. Hunt, Esq.; J. A. Lethbridge, Esq.; E. Lucas, Esq.; J. Lys Seager, Esq.; J. B. White, Esq.; J. Carter Wood, Esq.

Trustees.—W. Whateley, Esq., Q.C.; George Drew, Esq.; T. Grissell, Esq.

Physician.—William Rich. Basham, M.D.

Bankers.—Messrs. Cocks, Biddulph, and Co., Charing Cross.

VALUABLE PRIVILEGE.—POLICIES effected in this Office do not become void through temporary difficulty in paying a Premium, as permission is given upon application to suspend the payment at interest, according to the conditions detailed in the Prospectus.

Specimens of Rates of Premium for Assuring 100l., with a Share in three-fourths of the Profits:—

  Age   £  s. d.  |  Age   £  s. d.
  17    1 14  4   |  32    2 10  8
  22    1 18  8   |  37    2 18  6
  27    2  4  5   |  42    3  8  2

  • 75. ARTHUR SCRATCHLEY, M.A., F.R.A.S., Actuary.

Now ready, price 10s. 6d., Second Edition, with material additions, INDUSTRIAL INVESTMENT and EMIGRATION: being a TREATISE ON BENEFIT BUILDING SOCIETIES, and on the General Principles of Land Investment, exemplified in the Cases of Freehold Land Societies, Building Companies, &c. With a Mathematical Appendix on Compound Interest and Life Assurance. By ARTHUR SCRATCHLEY, M.A., Actuary to the Western Life Assurance Society, 3. Parliament Street, London.

ALLSOPP'S PALE or BITTER ALE.—MESSRS. S. ALLSOPP & SONS beg to inform the TRADE that they are now registering Orders for the March Brewings of their PALE ALE in Casks of 18 Gallons and upwards, at the BREWERY, Burton-on-Trent; and at the under-mentioned Branch Establishments: LONDON, at 61. King William Street, City. LIVERPOOL, at Cook Street. MANCHESTER, at Ducie Place. DUDLEY, at the Burnt Tree. GLASGOW, at 115. St. Vincent Street. DUBLIN, at 1. Crampton Quay. BIRMINGHAM, at Market Hall. SOUTH WALES, at 13. King Street, Bristol.

MESSRS. ALLSOPP & SONS take the opportunity of announcing to PRIVATE FAMILIES that their ALES, so strongly recommended by the Medical Profession, may be procured in DRAUGHT and BOTTLES GENUINE from all the most RESPECTABLE LICENSED VICTUALLERS, on ALLSOPP'S PALE ALE being specially asked for.
  • 76. When in bottle, the genuineness of the label can be ascertained by its having ALLSOPP & SONS written across it.

CHUBB'S LOCKS, with all the recent improvements. Strong fire-proof safes, cash and deed boxes. Complete lists of sizes and prices may be had on application. CHUBB & SON, 57. St. Paul's Churchyard, London; 28. Lord Street, Liverpool; 16. Market Street, Manchester; and Horseley Fields, Wolverhampton.

Printed by Thomas Clark Shaw, of No. 10. Stonefield Street, in the Parish of St. Mary, Islington, at No. 5. New Street Square, in the Parish of St. Bride, in the City of London; and published by George Bell, of No. 186. Fleet Street, in the Parish of St. Dunstan in the West, in the City of London, Publisher, at No. 186. Fleet Street aforesaid.—Saturday, June 24. 1854.