Undergraduate Texts in Mathematics
Jeffrey Hoffstein
Jill Pipher
Joseph H. Silverman
An Introduction to Mathematical Cryptography
Second Edition
Undergraduate Texts in Mathematics
Series Editors:
Sheldon Axler
San Francisco State University, San Francisco, CA, USA
Kenneth Ribet
University of California, Berkeley, CA, USA
Advisory Board:
Colin Adams, Williams College, Williamstown, MA, USA
Alejandro Adem, University of British Columbia, Vancouver, BC, Canada
Ruth Charney, Brandeis University, Waltham, MA, USA
Irene M. Gamba, The University of Texas at Austin, Austin, TX, USA
Roger E. Howe, Yale University, New Haven, CT, USA
David Jerison, Massachusetts Institute of Technology, Cambridge, MA, USA
Jeffrey C. Lagarias, University of Michigan, Ann Arbor, MI, USA
Jill Pipher, Brown University, Providence, RI, USA
Fadil Santosa, University of Minnesota, Minneapolis, MN, USA
Amie Wilkinson, University of Chicago, Chicago, IL, USA
Undergraduate Texts in Mathematics are generally aimed at third- and fourth-
year undergraduate mathematics students at North American universities. These texts
strive to provide students and teachers with new perspectives and novel approaches.
The books include motivation that guides the reader to an appreciation of interre-
lations among different aspects of the subject. They feature examples that illustrate
key concepts as well as exercises that strengthen understanding.
More information about this series at http://www.springer.com/series/666
Jeffrey Hoffstein • Jill Pipher
Joseph H. Silverman
An Introduction to Mathematical Cryptography
Second Edition
Jeffrey Hoffstein
Department of Mathematics
Brown University
Providence, RI, USA
Joseph H. Silverman
Department of Mathematics
Brown University
Providence, RI, USA
Jill Pipher
Department of Mathematics
Brown University
Providence, RI, USA
ISSN 0172-6056 ISSN 2197-5604 (electronic)
ISBN 978-1-4939-1710-5 ISBN 978-1-4939-1711-2 (eBook)
DOI 10.1007/978-1-4939-1711-2
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2014946354
© Springer Science+Business Media New York 2008, 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and
executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this pub-
lication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s
location, in its current version, and permission for use must always be obtained from Springer. Permis-
sions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable
to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publica-
tion, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors
or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the
material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The creation of public key cryptography by Diffie and Hellman in 1976 and the
subsequent invention of the RSA public key cryptosystem by Rivest, Shamir,
and Adleman in 1978 are watershed events in the long history of secret com-
munications. It is hard to overestimate the importance of public key cryp-
tosystems and their associated digital signature schemes in the modern world
of computers and the Internet. This book provides an introduction to the
theory of public key cryptography and to the mathematical ideas underlying
that theory.
Public key cryptography draws on many areas of mathematics, including
number theory, abstract algebra, probability, and information theory. Each
of these topics is introduced and developed in sufficient detail so that this
book provides a self-contained course for the beginning student. The only
prerequisite is a first course in linear algebra. On the other hand, students
with stronger mathematical backgrounds can move directly to cryptographic
applications and still have time for advanced topics such as elliptic curve
pairings and lattice-reduction algorithms.
Among the many facets of modern cryptography, this book chooses to con-
centrate primarily on public key cryptosystems and digital signature schemes.
This allows for an in-depth development of the necessary mathematics re-
quired for both the construction of these schemes and an analysis of their
security. The reader who masters the material in this book will not only be
well prepared for further study in cryptography, but will have acquired a real
understanding of the underlying mathematical principles on which modern
cryptography is based.
Topics covered in this book include Diffie–Hellman key exchange, discrete
logarithm based cryptosystems, the RSA cryptosystem, primality testing, fac-
torization algorithms, digital signatures, probability theory, information the-
ory, collision algorithms, elliptic curves, elliptic curve cryptography, pairing-
based cryptography, lattices, lattice-based cryptography, and the NTRU cryp-
tosystem. A final chapter very briefly describes some of the many other aspects
of modern cryptography (hash functions, pseudorandom number generators,
zero-knowledge proofs, digital cash, AES, etc.) and serves to point the reader
toward areas for further study.
Electronic Resources: The interested reader will find additional material
and a list of errata on the Mathematical Cryptography home page:
www.math.brown.edu/~jhs/MathCryptoHome.html
This web page includes many of the numerical exercises in the book, allowing
the reader to cut and paste them into other programs, rather than having to
retype them.
No book is ever free from error or incapable of being improved. We would
be delighted to receive comments, good or bad, and corrections from our
readers. You can send mail to us at
mathcrypto@math.brown.edu
Acknowledgments: We, the authors, would like to thank the following
individuals for test-driving this book and for the many corrections and helpful
suggestions that they and their students provided: Liat Berdugo, Alexander
Collins, Samuel Dickman, Michael Gartner, Nicholas Howgrave-Graham, Su-
Ion Ih, Saeja Kim, Yuji Kosugi, Yesem Kurt, Michelle Manes, Victor Miller,
David Singer, William Whyte. In addition, we would like to thank the many
students at Brown University who took Math 158 and helped us improve the
exposition of this book.
Acknowledgments for the Second Edition: We would like to thank
the following individuals for corrections and suggestions that have been
incorporated into the second edition: Stefanos Aivazidis, Nicole Andre,
John B. Baena, Carlo Beenakker, Robert Bond, Reinier Broker, Camp-
bell Hewett, Rebecca Constantine, Stephen Constantine, Christopher Davis,
Maria Fox, Steven Galbraith, Motahhareh Gharahi, David Hartz, Jeremy
Huddleston, Calvin Jongsma, Maya Kaczorowski, Yamamoto Kato, Jonathan
Katz, Chan-Ho Kim, Ariella Kirsch, Martin M. Lauridsen, Kelly McNeilly,
Ryo Masuda, Shahab Mirzadeh, Kenneth Ribet, Jeremy Roach, Hemlal
Sahum, Ghassan Sarkis, Frederick Schmitt, Christine Schwartz, Wei Shen,
David Singer, Michael Soltys, David Spies, Bruce Stephens, Paulo Tanimoto,
Patrick Vogt, Sebastian Welsch, Ralph Wernsdorf, Edward
White, Pomona College Math 113 (Spring 2009), University of California at
Berkeley Math 116 (Spring 2009, 2010).
Providence, USA Jeffrey Hoffstein
Jill Pipher
Joseph H. Silverman
Contents
Preface v
Introduction xiii
1 An Introduction to Cryptography 1
1.1 Simple Substitution Ciphers . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Cryptanalysis of Simple Substitution Ciphers . . . . . . 4
1.2 Divisibility and Greatest Common Divisors . . . . . . . . . . . 10
1.3 Modular Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.1 Modular Arithmetic and Shift Ciphers . . . . . . . . . . 23
1.3.2 The Fast Powering Algorithm . . . . . . . . . . . . . . . 24
1.4 Prime Numbers, Unique Factorization, and Finite Fields . . . . 26
1.5 Powers and Primitive Roots in Finite Fields . . . . . . . . . . . 29
1.6 Cryptography Before the Computer Age . . . . . . . . . . . . 34
1.7 Symmetric and Asymmetric Ciphers . . . . . . . . . . . . . . . 37
1.7.1 Symmetric Ciphers . . . . . . . . . . . . . . . . . . . . . 37
1.7.2 Encoding Schemes . . . . . . . . . . . . . . . . . . . . . 39
1.7.3 Symmetric Encryption of Encoded Blocks . . . . . . . . 40
1.7.4 Examples of Symmetric Ciphers . . . . . . . . . . . . . 41
1.7.5 Random Bit Sequences and Symmetric Ciphers . . . . . 44
1.7.6 Asymmetric Ciphers Make a First Appearance . . . . . 46
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2 Discrete Logarithms and Diffie–Hellman 61
2.1 The Birth of Public Key Cryptography . . . . . . . . . . . . . 61
2.2 The Discrete Logarithm Problem . . . . . . . . . . . . . . . . . 64
2.3 Diffie–Hellman Key Exchange . . . . . . . . . . . . . . . . . . . 67
2.4 The Elgamal Public Key Cryptosystem . . . . . . . . . . . . . 70
2.5 An Overview of the Theory of Groups . . . . . . . . . . . . . . 74
2.6 How Hard Is the Discrete Logarithm Problem? . . . . . . . . . 77
2.7 A Collision Algorithm for the DLP . . . . . . . . . . . . . . . 81
2.8 The Chinese Remainder Theorem . . . . . . . . . . . . . . . . 83
2.8.1 Solving Congruences with Composite Moduli . . . . . . 86
2.9 The Pohlig–Hellman Algorithm . . . . . . . . . . . . . . . . . 88
2.10 Rings, Quotients, Polynomials, and Finite Fields . . . . . . . . 94
2.10.1 An Overview of the Theory of Rings . . . . . . . . . . . 95
2.10.2 Divisibility and Quotient Rings . . . . . . . . . . . . . . 96
2.10.3 Polynomial Rings and the Euclidean Algorithm . . . . . 98
2.10.4 Polynomial Ring Quotients and Finite Fields . . . . . . 102
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3 Integer Factorization and RSA 117
3.1 Euler’s Formula and Roots Modulo pq . . . . . . . . . . . . . . 117
3.2 The RSA Public Key Cryptosystem . . . . . . . . . . . . . . . 123
3.3 Implementation and Security Issues . . . . . . . . . . . . . . . . 126
3.4 Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.4.1 The Distribution of the Set of Primes . . . . . . . . . . 133
3.4.2 Primality Proofs Versus Probabilistic Tests . . . . . . . 136
3.5 Pollard’s p − 1 Factorization Algorithm . . . . . . . . . . . . . 137
3.6 Factorization via Difference of Squares . . . . . . . . . . . . . 141
3.7 Smooth Numbers and Sieves . . . . . . . . . . . . . . . . . . . . 150
3.7.1 Smooth Numbers . . . . . . . . . . . . . . . . . . . . . . 150
3.7.2 The Quadratic Sieve . . . . . . . . . . . . . . . . . . . . 155
3.7.3 The Number Field Sieve . . . . . . . . . . . . . . . . . . 162
3.8 The Index Calculus and Discrete Logarithms . . . . . . . . . . 166
3.9 Quadratic Residues and Quadratic Reciprocity . . . . . . . . . 169
3.10 Probabilistic Encryption . . . . . . . . . . . . . . . . . . . . . . 177
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
4 Digital Signatures 193
4.1 What Is a Digital Signature? . . . . . . . . . . . . . . . . . . . 193
4.2 RSA Digital Signatures . . . . . . . . . . . . . . . . . . . . . . 196
4.3 Elgamal Digital Signatures and DSA . . . . . . . . . . . . . . . 198
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
5 Combinatorics, Probability, and Information Theory 207
5.1 Basic Principles of Counting . . . . . . . . . . . . . . . . . . . 208
5.1.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . 210
5.1.2 Combinations . . . . . . . . . . . . . . . . . . . . . . . . 211
5.1.3 The Binomial Theorem . . . . . . . . . . . . . . . . . . 213
5.2 The Vigenère Cipher . . . . . . . . . . . . . . . . . . . . . . . . 214
5.2.1 Cryptanalysis of the Vigenère Cipher: Theory . . . . . . 218
5.2.2 Cryptanalysis of the Vigenère Cipher: Practice . . . . . 223
5.3 Probability Theory . . . . . . . . . . . . . . . . . . . . . . . . 228
5.3.1 Basic Concepts of Probability Theory . . . . . . . . . . 228
5.3.2 Bayes’s Formula . . . . . . . . . . . . . . . . . . . . . . 233
5.3.3 Monte Carlo Algorithms . . . . . . . . . . . . . . . . . . 236
5.3.4 Random Variables . . . . . . . . . . . . . . . . . . . . . 238
5.3.5 Expected Value . . . . . . . . . . . . . . . . . . . . . . . 244
5.4 Collision Algorithms and Meet-in-the-Middle Attacks . . . . . . 246
5.4.1 The Birthday Paradox . . . . . . . . . . . . . . . . . . . 246
5.4.2 A Collision Theorem . . . . . . . . . . . . . . . . . . . . 247
5.4.3 A Discrete Logarithm Collision Algorithm . . . . . . . . 250
5.5 Pollard’s ρ Method . . . . . . . . . . . . . . . . . . . . . . . . . 253
5.5.1 Abstract Formulation of Pollard’s ρ Method . . . . . . . 254
5.5.2 Discrete Logarithms via Pollard’s ρ Method . . . . . . . 259
5.6 Information Theory . . . . . . . . . . . . . . . . . . . . . . . . 263
5.6.1 Perfect Secrecy . . . . . . . . . . . . . . . . . . . . . . . 263
5.6.2 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
5.6.3 Redundancy and the Entropy of Natural Language . . . . . . 275
5.6.4 The Algebra of Secrecy Systems . . . . . . . . . . . . . 277
5.7 Complexity Theory and P Versus NP . . . . . . . . . . . . . . 278
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
6 Elliptic Curves and Cryptography 299
6.1 Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
6.2 Elliptic Curves over Finite Fields . . . . . . . . . . . . . . . . . 306
6.3 The Elliptic Curve Discrete Logarithm Problem . . . . . . . . . 310
6.3.1 The Double-and-Add Algorithm . . . . . . . . . . . . . 312
6.3.2 How Hard Is the ECDLP? . . . . . . . . . . . . . . . . . 315
6.4 Elliptic Curve Cryptography . . . . . . . . . . . . . . . . . . . 316
6.4.1 Elliptic Diffie–Hellman Key Exchange . . . . . . . . . . 316
6.4.2 Elliptic Elgamal Public Key Cryptosystem . . . . . . . 319
6.4.3 Elliptic Curve Signatures . . . . . . . . . . . . . . . . . 321
6.5 The Evolution of Public Key Cryptography . . . . . . . . . . . 321
6.6 Lenstra’s Elliptic Curve Factorization Algorithm . . . . . . . . 324
6.7 Elliptic Curves over F2 and over F2k . . . . . . . . . . . . . . . 329
6.8 Bilinear Pairings on Elliptic Curves . . . . . . . . . . . . . . . . 336
6.8.1 Points of Finite Order on Elliptic Curves . . . . . . . . 337
6.8.2 Rational Functions and Divisors on Elliptic Curves . . . 338
6.8.3 The Weil Pairing . . . . . . . . . . . . . . . . . . . . . . 340
6.8.4 An Efficient Algorithm to Compute the Weil Pairing . . 343
6.8.5 The Tate Pairing . . . . . . . . . . . . . . . . . . . . . . 346
6.9 The Weil Pairing over Fields of Prime Power Order . . . . . . . 347
6.9.1 Embedding Degree and the MOV Algorithm . . . . . . 347
6.9.2 Distortion Maps and a Modified Weil Pairing . . . . . . 350
6.9.3 A Distortion Map on y^2 = x^3 + x . . . . . . . . . . . . . 352
6.10 Applications of the Weil Pairing . . . . . . . . . . . . . . . . . 356
6.10.1 Tripartite Diffie–Hellman Key Exchange . . . . . . . . . 356
6.10.2 ID-Based Public Key Cryptosystems . . . . . . . . . . . 358
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
7 Lattices and Cryptography 373
7.1 A Congruential Public Key Cryptosystem . . . . . . . . . . . . 373
7.2 Subset-Sum Problems and Knapsack Cryptosystems . . . . . . 377
7.3 A Brief Review of Vector Spaces . . . . . . . . . . . . . . . . . 384
7.4 Lattices: Basic Definitions and Properties . . . . . . . . . . . . 388
7.5 Short Vectors in Lattices . . . . . . . . . . . . . . . . . . . . . . 395
7.5.1 The Shortest and the Closest Vector Problems . . . . . 395
7.5.2 Hermite’s Theorem and Minkowski’s Theorem . . . . . 396
7.5.3 The Gaussian Heuristic . . . . . . . . . . . . . . . . . . 400
7.6 Babai’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 403
7.7 Cryptosystems Based on Hard Lattice Problems . . . . . . . . 407
7.8 The GGH Public Key Cryptosystem . . . . . . . . . . . . . . . 409
7.9 Convolution Polynomial Rings . . . . . . . . . . . . . . . . . . 412
7.10 The NTRU Public Key Cryptosystem . . . . . . . . . . . . . . 416
7.10.1 NTRUEncrypt . . . . . . . . . . . . . . . . . . . . . . . 417
7.10.2 Mathematical Problems for NTRUEncrypt . . . . . . . 422
7.11 NTRUEncrypt as a Lattice Cryptosystem . . . . . . . . . . . . 425
7.11.1 The NTRU Lattice . . . . . . . . . . . . . . . . . . . . . 425
7.11.2 Quantifying the Security of an NTRU Lattice . . . . . . 427
7.12 Lattice-Based Digital Signature Schemes . . . . . . . . . . . . . 428
7.12.1 The GGH Digital Signature Scheme . . . . . . . . . . . 428
7.12.2 Transcript Analysis . . . . . . . . . . . . . . . . . . . . . 430
7.12.3 Rejection Sampling . . . . . . . . . . . . . . . . . . . . . 431
7.12.4 Rejection Sampling Applied to an Abstract Signature Scheme . . 433
7.12.5 The NTRU Modular Lattice Signature Scheme . . . . . 434
7.13 Lattice Reduction Algorithms . . . . . . . . . . . . . . . . . . 436
7.13.1 Gaussian Lattice Reduction in Dimension 2 . . . . . . . 436
7.13.2 The LLL Lattice Reduction Algorithm . . . . . . . . . . 439
7.13.3 Using LLL to Solve apprCVP . . . . . . . . . . . . . . . 448
7.13.4 Generalizations of LLL . . . . . . . . . . . . . . . . . . . 449
7.14 Applications of LLL to Cryptanalysis . . . . . . . . . . . . . . . 450
7.14.1 Congruential Cryptosystems . . . . . . . . . . . . . . . . 451
7.14.2 Applying LLL to Knapsacks . . . . . . . . . . . . . . . . 451
7.14.3 Applying LLL to GGH . . . . . . . . . . . . . . . . . . . 452
7.14.4 Applying LLL to NTRU . . . . . . . . . . . . . . . . . . 453
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
8 Additional Topics in Cryptography 471
8.1 Hash Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
8.2 Random Numbers and Pseudorandom Number Generators . . . . . 474
8.3 Zero-Knowledge Proofs . . . . . . . . . . . . . . . . . . . . . . . 477
8.4 Secret Sharing Schemes . . . . . . . . . . . . . . . . . . . . . . 480
8.5 Identification Schemes . . . . . . . . . . . . . . . . . . . . . . . 481
8.6 Padding Schemes and the Random Oracle Model . . . . . . . . 482
8.7 Building Protocols from Cryptographic Primitives . . . . . . . 485
8.8 Blind Digital Signatures, Digital Cash, and Bitcoin . . . . . . . 487
8.9 Homomorphic Encryption . . . . . . . . . . . . . . . . . . . . . 490
8.10 Hyperelliptic Curve Cryptography . . . . . . . . . . . . . . . . 494
8.11 Quantum Computing . . . . . . . . . . . . . . . . . . . . . . . . 497
8.12 Modern Symmetric Cryptosystems: DES and AES . . . . . . . 499
List of Notation 503
References 507
Index 517
Introduction
Principal Goals of (Public Key) Cryptography
• Allow two people to exchange confidential information,
even if they have never met and can communicate only
via a channel that is being monitored by an adversary.
• Allow a person to attach a digital signature to a document,
so that any other person can verify the validity of the
signature, but no one can forge a signature on any other
document.
The security of communications and commerce in a digital age relies on the
modern incarnation of the ancient art of codes and ciphers. Underlying the
birth of modern cryptography is a great deal of fascinating mathematics,
some of which has been developed for cryptographic applications, but much
of which is taken from the classical mathematical canon. The principal goal
of this book is to introduce the reader to a variety of mathematical topics
while simultaneously integrating the mathematics into a description of modern
public key cryptography.
For thousands of years, all codes and ciphers relied on the assumption
that the people attempting to communicate, call them Bob and Alice, share
a secret key that their adversary, call her Eve, does not possess. Bob uses the
secret key to encrypt his message, Alice uses the same secret key to decrypt
the message, and poor Eve, not knowing the secret key, is unable to perform
the decryption. A disadvantage of these private key cryptosystems is that Bob
and Alice need to exchange the secret key before they can get started.
During the 1970s, the astounding idea of public key cryptography burst
upon the scene.1
In a public key cryptosystem, Alice has two keys, a public encryption
key KPub and a private (secret) decryption key KPri. Alice publishes her
public key KPub, and then Adam and Bob and Carl and everyone else can
use KPub to encrypt messages and send them to Alice. The idea underlying
public key cryptography is that although everyone in the world knows KPub
and can use it to encrypt messages, only Alice, who knows the private
key KPri, is able to decrypt messages.
1A brief history of cryptography is given in Sects. 1.6, 2.1, 6.5, and 7.7.
The advantages of a public key cryptosystem are manifold. For example,
Bob can send Alice an encrypted message even if they have never previously
been in direct contact. But although public key cryptography is a fascinating
theoretical concept, it is not at all clear how one might create a public key
cryptosystem. It turns out that public key cryptosystems can be based on
hard mathematical problems. More precisely, one looks for a mathematical
problem that is initially hard to solve, but that becomes easy to solve if one
knows some extra piece of information.
Of course, private key cryptosystems have not disappeared. Indeed, they
are more important than ever, since they tend to be significantly more effi-
cient than public key cryptosystems. Thus in practice, if Bob wants to send
Alice a long message, he first uses a public key cryptosystem to send Alice
the key for a private key cryptosystem, and then he uses the private key
cryptosystem to encrypt his message. The most efficient modern private key
cryptosystems, such as DES and AES, rely for their security on repeated ap-
plication of various mixing operations that are hard to unmix without the
private key. Thus although the subject of private key cryptography is of both
theoretical and practical importance, the connection with fundamental un-
derlying mathematical ideas is much less pronounced than it is with public
key cryptosystems. For that reason, this book concentrates almost exclusively
on public key cryptography, especially public key cryptosystems and digital
signatures.
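The hybrid pattern just described can be sketched in a few lines. The sketch below uses textbook RSA with tiny illustrative primes for the key-transport step and a repeating-XOR toy cipher standing in for a fast private key cipher such as AES; the parameters, names, and toy cipher are illustrative choices of ours, not anything from the text, and none of this is remotely secure.

```python
# Toy sketch of hybrid encryption: a public key system transports a
# symmetric key, then a symmetric cipher encrypts the actual message.

# Alice's RSA key pair (tiny textbook parameters, insecure on purpose).
p, q = 61, 53
n = p * q          # public modulus, 3233
e = 17             # public encryption exponent
d = 2753           # private exponent: e*d = 1 (mod (p-1)(q-1))

def xor_cipher(data: bytes, key: int) -> bytes:
    """Toy symmetric cipher: XOR every byte with the shared key byte."""
    return bytes(b ^ key for b in data)

# Bob picks a symmetric key, sends it under Alice's public key, and
# encrypts the long message with the (fast) symmetric cipher.
sym_key = 42
key_transport = pow(sym_key, e, n)            # public key step
ciphertext = xor_cipher(b"attack at dawn", sym_key)

# Alice recovers the symmetric key with her private key, then decrypts.
recovered_key = pow(key_transport, d, n)
plaintext = xor_cipher(ciphertext, recovered_key)
assert recovered_key == sym_key and plaintext == b"attack at dawn"
```

The expensive public key operation touches only the short symmetric key, while the cheap symmetric cipher handles the bulk of the data, which is exactly why the hybrid arrangement dominates in practice.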
Modern mathematical cryptography draws on many areas of mathematics,
including especially number theory, abstract algebra (groups, rings, fields),
probability, statistics, and information theory, so the prerequisites for studying
the subject can seem formidable. By way of contrast, the prerequisites for
reading this book are minimal, because we take the time to introduce each
required mathematical topic in sufficient depth as it is needed. Thus this
book provides a self-contained treatment of mathematical cryptography for
the reader with limited mathematical background. And for those readers who
have taken a course in, say, number theory or abstract algebra or probability,
we suggest briefly reviewing the relevant sections as they are reached and then
moving on directly to the cryptographic applications.
This book is not meant to be a comprehensive source for all things cryp-
tographic. In the first place, as already noted, we concentrate on public key
cryptography. But even within this domain, we have chosen to pursue a small
selection of topics to a reasonable mathematical depth, rather than provid-
ing a more superficial description of a wider range of subjects. We feel that
any reader who has mastered the material in this book will not only be well
prepared for further study in cryptography, but will have acquired a real
understanding of the underlying mathematical principles on which modern
cryptography is based.
However, this does not mean that the omitted topics are unimportant.
It simply means that there is a limit to the amount of material that can
be included in a book (or course) of reasonable length. As in any text, the
choice of particular topics reflects the authors’ tastes and interests. For the
convenience of the reader, the final chapter contains a brief survey of areas
for further study.
A Guide to Mathematical Topics: This book includes a significant amount
of mathematical material on a variety of topics that are useful in cryptography.
The following list is designed to help coordinate the mathematical topics that
we cover with subjects that the class or reader may have already studied.
Congruences, primes, and finite fields — Sects. 1.2, 1.3, 1.4, 1.5, 2.10.4
The Chinese remainder theorem — Sect. 2.8
Euler’s formula — Sect. 3.1
Primality testing — Sect. 3.4
Quadratic reciprocity — Sect. 3.9
Factorization methods — Sects. 3.5, 3.6, 3.7, 6.6
Discrete logarithms — Sects. 2.2, 3.8, 5.4, 5.5, 6.3
Group theory — Sect. 2.5
Rings, polynomials, and quotient rings — Sects. 2.10 and 7.9
Combinatorics and probability — Sects. 5.1 and 5.3
Information and complexity theory — Sects. 5.6 and 5.7
Elliptic curves — Sects. 6.1, 6.2, 6.7, 6.8
Linear algebra — Sect. 7.3
Lattices — Sects. 7.4, 7.5, 7.6, 7.13
Intended Audience and Prerequisites: This book provides a self-con-
tained introduction to public key cryptography and to the underlying math-
ematics that is required for the subject. It is suitable as a text for advanced
undergraduates and beginning graduate students. We provide enough back-
ground material so that the book can be used in courses for students with no
previous exposure to abstract algebra or number theory. For classes in which
the students have a stronger background, the basic mathematical material
may be omitted, leaving time for some of the more advanced topics.
The formal prerequisites for this book are few, beyond a facility with
high school algebra and, in Chap. 6, analytic geometry. Elementary calculus
is used here and there in a minor way, but is not essential, and linear alge-
bra is used in a small way in Chap. 3 and more extensively in Chap. 7. No
previous knowledge is assumed for mathematical topics such as number the-
ory, abstract algebra, and probability theory that play a fundamental role in
modern cryptography. They are covered in detail as needed.
However, it must be emphasized that this is a mathematics book with its
share of formal definitions and theorems and proofs. Thus it is expected that
the reader has a certain level of mathematical sophistication. In particular,
students who have previously taken a proof-based mathematics course will
find the material easier than those without such background. On the other
hand, the subject of cryptography is so appealing that this book makes a
good text for an introduction-to-proofs course, with the understanding that
the instructor will need to cover the material more slowly to allow the students
time to become comfortable with proof-based mathematics.
Suggested Syllabus: This book contains considerably more material than
can be comfortably covered by beginning students in a one semester course.
However, for more advanced students who have already taken courses in num-
ber theory and abstract algebra, it should be possible to do most of the remain-
ing material. We suggest covering the majority of the topics in Chaps. 1–4,
possibly omitting some of the more technical topics, the optional material
on the Vigenère cipher, and the section on ring theory, which is not used
until much later in the book. The next three chapters on information theory
(Chap. 5), elliptic curves (Chap. 6), and lattices (Chap. 7) are mostly indepen-
dent of one another, so the instructor has the choice of covering one or two
of them in detail or all of them in less depth. We offer the following syllabus
as an example of one of the many possibilities. We have indicated that some
sections are optional. Covering the optional material leaves less time for the
later chapters at the end of the course.
Chapter 1. An Introduction to Cryptography.
Cover all sections.
Chapter 2. Discrete Logarithms and Diffie–Hellman.
Cover Sects. 2.1–2.7. Optionally cover the more mathematically sophis-
ticated Sects. 2.8–2.9 on the Pohlig–Hellman algorithm. Omit Sect. 2.10
on first reading.
Chapter 3. Integer Factorization and RSA.
Cover Sects. 3.1–3.5 and 3.9–3.10. Optionally, cover the more mathemat-
ically sophisticated Sects. 3.6–3.8, dealing with smooth numbers, sieves,
and the index calculus.
Chapter 4. Digital Signatures.
Cover all sections.
Chapter 5. Probability Theory and Information Theory.
Cover Sects. 5.1, 5.3, and 5.4. Optionally cover the more mathemati-
cally sophisticated sections on Pollard’s ρ method (Sect. 5.5), informa-
tion theory (Sect. 5.6), and complexity theory (Sect. 5.7). The material
on the Vigenère cipher in Sect. 5.2 nicely illustrates the use of statistics
in cryptanalysis, but is somewhat off the main path.
Chapter 6. Elliptic Curves.
Cover Sects. 6.1–6.4. Cover other sections as time permits, but note that
Sects. 6.7–6.10 on pairings require finite fields of prime power order,
which are described in Sect. 2.10.4.
Chapter 7. Lattices and Cryptography.
Cover Sects. 7.1–7.8. (If time is short, one may omit either or both of
Sects. 7.1 and 7.2.) Cover either Sects. 7.13–7.14 on the LLL lattice re-
duction algorithm or Sects. 7.9–7.11 on the NTRU cryptosystem, or
both, as time permits. (The NTRU sections require the material on
polynomial rings and quotient rings covered in Sect. 2.10.)
Chapter 8. Additional Topics in Cryptography.
The material in this chapter points the reader toward other important
areas of cryptography. It provides a good list of topics and references
for student term papers and presentations.
Further Notes for the Instructor: Depending on how much of the harder
mathematical material in Chaps. 2–5 is covered, there may not be time to
delve into both Chaps. 6 and 7, so the instructor may need to omit either
elliptic curves or lattices in order to fit the other material into one semester.
We feel that it is helpful for students to gain an appreciation of the origins
of their subject, so we have scattered a handful of sections throughout the book
containing some brief comments on the history of cryptography. Instructors
who want to spend more time on mathematics may omit these sections without
affecting the mathematical narrative.
Changes in the Second Edition:
• The chapter on digital signatures has been moved, since we felt that
this important topic should be covered earlier in the course. More pre-
cisely, RSA, Elgamal, and DSA signatures are now described in the short
Chap. 4, while the material on elliptic curve signatures is covered in the
brief Sect. 6.4.3. The two sections on lattice-based signatures from the first
edition have been extensively rewritten and now appear as Sect. 7.12.
• Numerous new exercises have been included.
• Numerous typographical and minor mathematical errors have been cor-
rected, and notation has been made more consistent from chapter to
chapter.
• Various explanations have been rewritten or expanded for clarity, espe-
cially in Chaps. 5–7.
• New sections on digital cash and on homomorphic encryption have been
added to the additional topics in Chap. 8; see Sects. 8.8 and 8.9.
Chapter 1
An Introduction
to Cryptography
1.1 Simple Substitution Ciphers
As Julius Caesar surveys the unfolding battle from his hilltop outpost, an
exhausted and disheveled courier bursts into his presence and hands him a
sheet of parchment containing gibberish:
j s j r d k f q q n s l g f h p g w j f p y m w t z l m n r r n s j s y q z h n z x
Within moments, Julius sends an order for a reserve unit of charioteers to
speed around the left flank and exploit a momentary gap in the opponent’s
formation.
How did this string of seemingly random letters convey such important
information? The trick is easy, once it is explained. Simply take each letter in
the message and shift it five letters up the alphabet. Thus j in the ciphertext
becomes e in the plaintext,¹
because e is followed in the alphabet by f,g,h,i,j.
Applying this procedure to the entire ciphertext yields
j s j r d k f q q n s l g f h p g w j f p y m w t z l m n r r n s j s y q z h n z x
e n e m y f a l l i n g b a c k b r e a k t h r o u g h i m m i n e n t l u c i u s
The second line is the decrypted plaintext, and breaking it into words and
supplying the appropriate punctuation, Julius reads the message
Enemy falling back. Breakthrough imminent. Lucius.
¹The plaintext is the original message in readable form and the ciphertext is the
encrypted message.
© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography,
Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2_1
There remains one minor quirk that must be addressed. What happens when
Julius finds a letter such as d? There is no letter appearing five letters before d
in the alphabet. The answer is that he must wrap around to the end of the
alphabet. Thus d is replaced by y, since y is followed by z,a,b,c,d.
This wrap-around effect may be conveniently visualized by placing the
alphabet abcd...xyz around a circle, rather than in a line. If a second alpha-
bet circle is then placed within the first circle and the inner circle is rotated
five letters, as illustrated in Fig. 1.1, the resulting arrangement can be used
to easily encrypt and decrypt Caesar’s messages. To decrypt a letter, simply
find it on the inner wheel and read the corresponding plaintext letter from
the outer wheel. To encrypt, reverse this process: find the plaintext letter on
the outer wheel and read off the ciphertext letter from the inner wheel. And
note that if you build a cipherwheel whose inner wheel spins, then you are no
longer restricted to always shifting by exactly five letters. Cipher wheels of
this sort have been used for centuries.²
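The wrap-around rule is exactly arithmetic modulo 26, so the whole cipher fits in a few lines of code. The following Python sketch is our own illustration (the name shift_cipher is invented for this purpose, and the input is assumed to be lowercase); encryption uses a shift of five and decryption uses a shift of minus five:

```python
def shift_cipher(text, shift):
    """Shift each lowercase letter of text by `shift` places, wrapping around."""
    out = []
    for ch in text:
        if ch.isalpha():
            # Map a..z to 0..25, add the shift modulo 26, and map back to a letter.
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        else:
            out.append(ch)  # leave spaces and punctuation unchanged
    return ''.join(out)

ciphertext = shift_cipher('enemy falling back', 5)  # Caesar's shift of five
plaintext = shift_cipher(ciphertext, -5)            # decrypting inverts the shift
```

A different key is simply a different value of shift, which is what lets Eve, later in this section, try all 26 possibilities.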
Although the details of the preceding scene are entirely fictional, and in
any case it is unlikely that a message to a Roman general would have been
written in modern English(!), there is evidence that Caesar employed this
early method of cryptography, which is sometimes called the Caesar cipher
in his honor. It is also sometimes referred to as a shift cipher, since each
letter in the alphabet is shifted up or down. Cryptography, the methodology of
concealing the content of messages, comes from the Greek root words kryptos,
meaning hidden,³
and graphikos, meaning writing. The modern scientific study
of cryptography is sometimes referred to as cryptology.
In the Caesar cipher, each letter is replaced by one specific substitute
letter. However, if Bob encrypts a message for Alice⁴
using a Caesar cipher
and allows the encrypted message to fall into Eve’s hands, it will take Eve
very little time to decrypt it. All she needs to do is try each of the 26 possible
shifts.
Bob can make his message harder to attack by using a more complicated
replacement scheme. For example, he could replace every occurrence of a
by z and every occurrence of z by a, every occurrence of b by y and every
occurrence of y by b, and so on, exchanging each pair of letters c ↔ x, . . . ,
m ↔ n.
This is an example of a simple substitution cipher, that is, a cipher in which
each letter is replaced by another letter (or some other type of symbol). The
²A cipher wheel with mixed up alphabets and with encryption performed using different
offsets for different parts of the message is featured in a fifteenth century monograph by
Leon Battista Alberti [63].
³The word cryptic, meaning hidden or occult, appears in 1638, while crypto- as a prefix
for concealed or secret makes its appearance in 1760. The term cryptogram appears much
later, first occurring in 1880.
⁴In cryptography, it is traditional for Bob and Alice to exchange confidential messages
and for their adversary Eve, the eavesdropper, to intercept and attempt to read their
messages. This makes the field of cryptography much more personal than other areas of
mathematics and computer science, whose denizens are often X and Y!
[Figure 1.1 is a drawing of a cipher wheel. Reading around the wheel, each inner
(ciphertext) letter is paired with an outer (plaintext) letter: F–a, G–b, H–c, I–d,
J–e, K–f, L–g, M–h, N–i, O–j, P–k, Q–l, R–m, S–n, T–o, U–p, V–q, W–r, X–s, Y–t,
Z–u, A–v, B–w, C–x, D–y, E–z.]
Figure 1.1: A cipher wheel with an offset of five letters
Caesar cipher is an example of a simple substitution cipher, but there are
many simple substitution ciphers other than the Caesar cipher. In fact, a
simple substitution cipher may be viewed as a rule or function
{a,b,c,d,e,...,x,y,z} −→ {A,B,C,D,E,...,X,Y,Z}
assigning each plaintext letter in the domain a different ciphertext letter in the
range. (To make it easier to distinguish the plaintext from the ciphertext, we
write the plaintext using lowercase letters and the ciphertext using uppercase
letters.) Note that in order for decryption to work, the encryption function
must have the property that no two plaintext letters go to the same ciphertext
letter. A function with this property is said to be one-to-one or injective.
A convenient way to describe the encryption function is to create a table
by writing the plaintext alphabet in the top row and putting each ciphertext
letter below the corresponding plaintext letter.
Example 1.1. A simple substitution encryption table is given in Table 1.1. The
ciphertext alphabet (the uppercase letters in the bottom row) is a randomly
chosen permutation of the 26 letters in the alphabet. In order to encrypt the
plaintext message
Four score and seven years ago,
we run the words together, look up each plaintext letter in the encryption
table, and write the corresponding ciphertext letter below.
f o u r s c o r e a n d s e v e n y e a r s a g o
N U R B K S U B V C G Q K V E V G Z V C B K C F U
a b c d e f g h i j k l m n o p q r s t u v w x y z
C I S Q V N F O W A X M T G U H P B K L R E Y D Z J
Table 1.1: Simple substitution encryption table
j r a x v g n p b z s t l f h q d u c m o e i k w y
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Table 1.2: Simple substitution decryption table
It is then customary to write the ciphertext in five-letter blocks:
NURBK SUBVC GQKVE VGZVC BKCFU
Decryption is a similar process. Suppose that we receive the message
GVVQG VYKCM CQQBV KKWGF SCVKC B
and that we know that it was encrypted using Table 1.1. We can reverse
the encryption process by finding each ciphertext letter in the second row
of Table 1.1 and writing down the corresponding letter from the top row.
However, since the letters in the second row of Table 1.1 are all mixed up,
this is a somewhat inefficient process. It is better to make a decryption table
in which the ciphertext letters in the lower row are listed in alphabetical order
and the corresponding plaintext letters in the upper row are mixed up. We
have done this in Table 1.2. Using this table, we easily decrypt the message.
G V V Q G V Y K C M C Q Q B V K K W G F S C V K C B
n e e d n e w s a l a d d r e s s i n g c a e s a r
Putting in the appropriate word breaks and some punctuation reveals an
urgent request!
Need new salad dressing. -Caesar
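Encryption and decryption tables of this sort are easy to manipulate by machine. In the Python sketch below (an illustration of ours, not part of the text), the string transcribes the bottom row of Table 1.1, and inverting the resulting dictionary produces exactly the decryption table of Table 1.2:

```python
# Bottom row of Table 1.1: the ciphertext letter for each plaintext letter a..z.
CIPHER_ROW = "CISQVNFOWAXMTGUHPBKLREYDZJ"

encrypt_table = {chr(ord('a') + i): c for i, c in enumerate(CIPHER_ROW)}
# Inverting the dictionary gives the decryption table (Table 1.2); this works
# precisely because the encryption function is one-to-one.
decrypt_table = {c: p for p, c in encrypt_table.items()}

def encrypt(plaintext):
    return ''.join(encrypt_table[ch] for ch in plaintext)

def decrypt(ciphertext):
    return ''.join(decrypt_table[ch] for ch in ciphertext)
```

With this table, encrypt("fourscoreandsevenyearsago") reproduces the ciphertext of Example 1.1, and decrypt recovers Caesar's request for salad dressing.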
1.1.1 Cryptanalysis of Simple Substitution Ciphers
How many different simple substitution ciphers exist? We can count them by
enumerating the possible ciphertext values for each plaintext letter. First we
assign the plaintext letter a to one of the 26 possible ciphertext letters A–Z. So
there are 26 possibilities for a. Next, since we are not allowed to assign b to the
same letter as a, we may assign b to any one of the remaining 25 ciphertext
letters. So there are 26 · 25 = 650 possible ways to assign a and b. We have
now used up two of the ciphertext letters, so we may assign c to any one of
the remaining 24 ciphertext letters. And so on. . . . Thus the total number of
ways to assign the 26 plaintext letters to the 26 ciphertext letters, using each
ciphertext letter only once, is
26 · 25 · 24 · · · 4 · 3 · 2 · 1 = 26! = 403291461126605635584000000.
There are thus more than 10^26 different simple substitution ciphers. Each
associated encryption table is known as a key.
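Both the key count and the time estimate in the next paragraph are quick to verify by direct computation; a small Python sketch:

```python
import math

keys = math.factorial(26)                  # 26! possible simple substitution keys
seconds_per_year = 60 * 60 * 24 * 365
years = keys / (10**6 * seconds_per_year)  # at one million keys tried per second
```

This gives keys = 403291461126605635584000000 and years ≈ 1.28 · 10^13, in agreement with the estimates in the text.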
Suppose that Eve intercepts one of Bob’s messages and that she attempts
to decrypt it by trying every possible simple substitution cipher. The process
of decrypting a message without knowing the underlying key is called crypt-
analysis. If Eve (or her computer) is able to check one million cipher alphabets
per second, it would still take her more than 10^13 years to try them all.⁵
But
the age of the universe is estimated to be on the order of 10^10 years. Thus Eve
has almost no chance of decrypting Bob’s message, which means that Bob’s
message is secure and he has nothing to worry about!⁶
Or does he?
It is time for an important lesson in the practical side of the science of
cryptography:
Your opponent always uses her best strategy to defeat you,
not the strategy that you want her to use. Thus the secu-
rity of an encryption system depends on the best known
method to break it. As new and improved methods are
developed, the level of security can only get worse, never
better.
Despite the large number of possible simple substitution ciphers, they are
actually quite easy to break, and indeed many newspapers and magazines
feature them as a companion to the daily crossword puzzle. The reason that
Eve can easily cryptanalyze a simple substitution cipher is that the letters
in the English language (or any other human language) are not random. To
take an extreme example, the letter q in English is virtually always followed
by the letter u. More useful is the fact that certain letters such as e and t
appear far more frequently than other letters such as f and c. Table 1.3 lists
the letters with their typical frequencies in English text. As you can see, the
most frequent letter is e, followed by t, a, o, and n.
Thus if Eve counts the letters in Bob’s encrypted message and makes a
frequency table, it is likely that the most frequent letter will represent e, and
that t, a, o, and n will appear among the next most frequent letters. In this
way, Eve can try various possibilities and, after a certain amount of trial and
error, decrypt Bob’s message.
⁵Do you see how we got 10^13 years? There are 60 · 60 · 24 · 365 s in a year, and 26!
divided by 10^6 · 60 · 60 · 24 · 365 is approximately 10^13.107.
⁶The assertion that a large number of possible keys, in and of itself, makes a cryptosystem
secure, has appeared many times in history and has equally often been shown to be
fallacious.
By decreasing frequency
E 13.11 % M 2.54 %
T 10.47 % U 2.46 %
A 8.15 % G 1.99 %
O 8.00 % Y 1.98 %
N 7.10 % P 1.98 %
R 6.83 % W 1.54 %
I 6.35 % B 1.44 %
S 6.10 % V 0.92 %
H 5.26 % K 0.42 %
D 3.79 % X 0.17 %
L 3.39 % J 0.13 %
F 2.92 % Q 0.12 %
C 2.76 % Z 0.08 %
In alphabetical order
A 8.15 % N 7.10 %
B 1.44 % O 8.00 %
C 2.76 % P 1.98 %
D 3.79 % Q 0.12 %
E 13.11 % R 6.83 %
F 2.92 % S 6.10 %
G 1.99 % T 10.47 %
H 5.26 % U 2.46 %
I 6.35 % V 0.92 %
J 0.13 % W 1.54 %
K 0.42 % X 0.17 %
L 3.39 % Y 1.98 %
M 2.54 % Z 0.08 %
Table 1.3: Frequency of letters in English text
LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC
GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG
ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ
CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD
LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM
YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM
Table 1.4: A simple substitution cipher to cryptanalyze
In the remainder of this section we illustrate how to cryptanalyze a simple
substitution cipher by decrypting the message given in Table 1.4. Of course the
end result of defeating a simple substitution cipher is not our main goal here.
Our key point is to introduce the idea of statistical analysis, which will prove to
have many applications throughout cryptography. Although for completeness
we provide full details, the reader may wish to skim this material.
There are 298 letters in the ciphertext. The first step is to make a frequency
table listing how often each ciphertext letter appears (Table 1.5).
J L D G Y S O N M P E V Q C T W U K I X Z B A F R H
Freq 32 28 27 24 23 22 19 18 17 15 12 12 8 8 7 6 6 5 4 3 1 1 0 0 0 0
% 11 9 9 8 8 7 6 6 6 5 4 4 3 3 2 2 2 2 1 1 0 0 0 0 0 0
Table 1.5: Frequency table for Table 1.4—Ciphertext length: 298
The ciphertext letter J appears most frequently, so we make the provisional
guess that it corresponds to the plaintext letter e. The next most frequent
ciphertext letters are L (28 times) and D (27 times), so we might guess from
Table 1.3 that they represent t and a. However, the letter frequencies in a
th he an re er in on at nd st es en of te ed
168 132 92 91 88 86 71 68 61 53 52 51 49 46 46
(a) Most common English bigrams (frequency per 1000 words)
LO OJ GY DN VD YL DL DM SN KD LY NG OY JD SK EP JG SV JM JQ
9 7 6 each 5 each 4 each
(b) Most common bigrams appearing in the ciphertext in Table 1.4
Table 1.6: Bigram frequencies
short message are unlikely to exactly match the percentages in Table 1.3. All
that we can say is that among the ciphertext letters L, D, G, Y, and S are likely
to appear several of the plaintext letters t, a, o, n, and r.
There are several ways to proceed. One method is to look at bigrams,
which are pairs of consecutive letters. Table 1.6a lists the bigrams that most
frequently appear in English, and Table 1.6b lists the ciphertext bigrams that
appear most frequently in our message. The ciphertext bigrams LO and OJ
appear frequently. We have already guessed that J = e, and based on its fre-
quency we suspect that L is likely to represent one of the letters t, a, o, n,
or r. Since the two most frequent English bigrams are th and he, we make
the tentative identifications
LO = th and OJ = he.
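Counts like those in Tables 1.5 and 1.6b are tedious to make by hand but trivial by machine. In the Python sketch below (our own illustration), the string is the ciphertext of Table 1.4 with the spaces removed:

```python
from collections import Counter

ciphertext = (
    "LOJUMYLJMEPDYVJQXTDVSVJNLDMTJZWMJGGYSNDLUYLEOSKDVC"
    "GEPJSMDIPDNEJSKDNJTJLSKDLOSVDVDNGYNVSGLLOSCIOLGOYG"
    "ESNEPCGYSNGUJMJDGYNKDPPYXPJDGGSVDNTWMSWSGYLYSNGSKJ"
    "CEPYQGSGLDMLPYNIUSCPQOYGMJGCPLGDWWJDMLSLOJCNYNYLYD"
    "LJQLODLCNLYPLOJTPJDMNJQLOJWMSEJGGJGXTUOYEOOJODQDMM"
    "YBJQDLLOJVLOJTVYIOLUJPPESNGYQJMOYVDGDNJEMSVDNEJM"
)

# Single-letter counts (Table 1.5) and bigram counts (Table 1.6b).
letter_counts = Counter(ciphertext)
bigram_counts = Counter(a + b for a, b in zip(ciphertext, ciphertext[1:]))
```

Here letter_counts.most_common() begins with J (32 occurrences), and bigram_counts confirms that LO (9 occurrences) and OJ (7 occurrences) head the bigram list.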
We substitute the guesses J = e, L = t, and O = h, into the ciphertext,
writing the putative plaintext letter below the corresponding ciphertext letter.
LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC
the-- -te-- ----e ----- --e-t ---e- --e-- ----t --t-h -----
GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG
---e- ----- --e-- --e-e t---t h---- ----- ---tt h---h t-h--
ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ
----- ----- --e-e ----- ----- -e--- ----- ----- --t-- ----e
CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD
----- ---t- -t--- ----- -h--- e---t ----e --t-t he--- --t--
LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM
te-th -t--t --the --e-- -e-th e---- e--e- ---h- -hheh -----
YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM
--e-- tthe- the-- --ht- e---- ----e -h--- ---e- ----- -e-
At this point, we can look at the fragments of plaintext and attempt to
guess some common English words. For example, in the second line we see the
three blocks
VSGLL OSCIO LGOYG,
---tt h---h t-h--.
Looking at the fragment th---ht, we might guess that this is the word
thought, which gives three more equivalences,
S = o, C = u, I = g.
This yields
LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC
the-- -te-- ----e ----- o-e-t ---e- --e-- -o--t --t-h o---u
GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG
---eo --g-- --eo- --e-e to--t ho--- ----- -o-tt hough t-h--
ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ
-o--- u--o- --e-e ----- ----- -e--- o---- --o-o --t-o --o-e
CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD
u---- -o-t- -t--- g-ou- -h--- e-u-t ----e --tot heu-- --t--
LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM
te-th -tu-t --the --e-- -e-th e--o- e--e- ---h- -hheh -----
YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM
--e-- tthe- the-- -ght- e---o ----e -h--- ---e- -o--- -e-
Now look at the three letters ght in the last line. They must be preceded
by a vowel, and the only vowels left are a and i, so we guess that Y = i. Then
we find the letters itio in the third line, and we guess that they are followed
by an n, which gives N = n. (There is no reason that a letter cannot represent
itself, although this is often forbidden in the puzzle ciphers that appear in
newspapers.) We now have
LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC
the-- ite-- --i-e ----- o-ent ---e- --e-- ion-t -it-h o---u
GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG
---eo --g-- n-eo- -ne-e to--t ho--- -n-in -o-tt hough t-hi-
ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ
-on-- u-ion --e-e --in- ---i- -e--- o--n- --o-o -itio n-o-e
CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD
u--i- -o-t- -t-in g-ou- -hi-- e-u-t ----e --tot heuni niti-
LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM
te-th -tunt i-the --e-- ne-th e--o- e--e- ---hi -hheh -----
YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM
i-e-- tthe- the-- ight- e---o n-i-e -hi-- --ne- -o--n -e-
So far, we have reconstructed the following plaintext/ciphertext pairs:
J L D G Y S O N M P E V Q C T W U K I X Z B A F R H
e t - - i o h n - - - - - u - - - - g - - - - - - -
Freq 32 28 27 24 23 22 19 18 17 15 12 12 8 8 7 6 6 5 4 3 1 1 0 0 0 0
Recall that the most common letters in English (Table 1.3) are, in order of
decreasing frequency,
e, t, a, o, n, r, i, s, h.
We have already assigned ciphertext values to e, t, o, n, i, h, so we guess
that D and G represent two of the three letters a, r, s. In the third line we
notice that GYLYSN gives -ition, so clearly G must be s. Similarly, on the
fifth line we have LJQLO DLCNL equal to te-th -tunt, so D must be a, not r.
Substituting these new pairs G = s and D = a gives
LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC
the-- ite-- -ai-e ---a- o-ent a--e- --ess ionat -it-h o-a-u
GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG
s--eo -ag-a n-eo- ane-e to-at ho-a- ansin -ostt hough tshis
ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ
-on-- usion s-e-e asin- a--i- -eass o-an- --o-o sitio nso-e
CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD
u--i- sosta -t-in g-ou- -his- esu-t sa--e a-tot heuni nitia
LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM
te-th atunt i-the --ea- ne-th e--o- esses ---hi -hheh a-a--
YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM
i-e-a tthe- the-- ight- e---o nsi-e -hi-a sane- -o-an -e-
It is now easy to fill in additional pairs by inspection. For example, the
missing letter in the fragment atunt i-the on the fifth line must be l, which
gives P = l, and the missing letter in the fragment -osition on the third
line must be p, which gives W = p. Substituting these in, we find the fragment
e-p-ession on the first line, which gives Z = x and M = r, and the fragment
-on-lusion on the third line, which gives E = c. Then consi-er on the last
line gives Q = d and the initial words the-riterclai-e- must be the phrase
“the writer claimed,” yielding U = w and V = m. This gives
LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC
thewr iterc laime d--am oment ar-ex press ionat witch o-amu
GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG
scleo ragla nceo- ane-e to-at homam ansin mostt hough tshis
ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ
concl usion swere asin- alli- leass oman- propo sitio nso-e
CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD
uclid sosta rtlin gwoul dhisr esult sappe artot heuni nitia
LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM
tedth atunt ilthe -lear nedth eproc esses --whi chheh adarr
YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM
i-eda tthem the-m ightw ellco nside rhima sanec roman cer
It is now a simple matter to fill in the few remaining letters and put in
the appropriate word breaks, capitalization, and punctuation to recover the
plaintext:
The writer claimed by a momentary expression, a twitch of a mus-
cle or a glance of an eye, to fathom a man’s inmost thoughts. His
conclusions were as infallible as so many propositions of Euclid.
So startling would his results appear to the uninitiated that until
they learned the processes by which he had arrived at them they
might well consider him as a necromancer.⁷
1.2 Divisibility and Greatest Common Divisors
Much of modern cryptography is built on the foundations of algebra and
number theory. So before we explore the subject of cryptography, we need to
develop some important tools. In the next four sections we begin this devel-
opment by describing and proving fundamental results in these areas. If you
have already studied number theory in another course, a brief review of this
material will suffice. But if this material is new to you, then it is vital to study
it closely and to work out the exercises provided at the end of the chapter.
At the most basic level, Number Theory is the study of the natural numbers
1, 2, 3, 4, 5, 6, . . . ,
or slightly more generally, the study of the integers
. . . , −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, . . . .
The set of integers is denoted by the symbol Z. Integers can be added, sub-
tracted, and multiplied in the usual way, and they satisfy all the usual rules
of arithmetic (commutative law, associative law, distributive law, etc.). The
set of integers with their addition and multiplication rules are an example of
a ring. See Sect. 2.10.1 for more about the theory of rings.
If a and b are integers, then we can add them, a + b, subtract them, a − b,
and multiply them, a · b. In each case, we get an integer as the result. This
property of staying inside of our original set after applying operations to a
pair of elements is characteristic of a ring.
But if we want to stay within the integers, then we are not always able
to divide one integer by another. For example, we cannot divide 3 by 2, since
there is no integer that is equal to 3/2. This leads to the fundamental concept
of divisibility.
Definition. Let a and b be integers with b ≠ 0. We say that b divides a, or
that a is divisible by b, if there is an integer c such that
a = bc.
We write b | a to indicate that b divides a. If b does not divide a, then we
write b ∤ a.
⁷A Study in Scarlet (Chap. 2), Sir Arthur Conan Doyle.
Example 1.2. We have 847 | 485331, since 485331 = 847 · 573. On the other
hand, 355 ∤ 259943, since when we try to divide 259943 by 355, we get a
remainder of 83. More precisely, 259943 = 355 · 732 + 83, so 259943 is not an
exact multiple of 355.
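In code, divisibility is tested with a remainder computation; a small Python sketch (the helper name divides is ours, for illustration):

```python
def divides(b, a):
    """Return True if b | a, that is, if a = b * c for some integer c."""
    return a % b == 0

# divmod returns the quotient and remainder of division with remainder,
# so the failed division in Example 1.2 is visible in the nonzero remainder.
q, r = divmod(259943, 355)   # q = 732, r = 83, so 355 does not divide 259943
```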
Remark 1.3. Notice that every integer is divisible by 1. The integers that are
divisible by 2 are the even integers, and the integers that are not divisible
by 2 are the odd integers.
There are a number of elementary divisibility properties, some of which
we list in the following proposition.
Proposition 1.4. Let a, b, c ∈ Z be integers.
(a) If a | b and b | c, then a | c.
(b) If a | b and b | a, then a = ±b.
(c) If a | b and a | c, then a | (b + c) and a | (b − c).
Proof. We leave the proof as an exercise for the reader; see Exercise 1.6.
Definition. A common divisor of two integers a and b is a positive integer d
that divides both of them. The greatest common divisor of a and b is, as
its name suggests, the largest positive integer d such that d | a and d | b.
The greatest common divisor of a and b is denoted gcd(a, b). If there is no
possibility of confusion, it is also sometimes denoted by (a, b). (If a and b are
both 0, then gcd(a, b) is not defined.)
It is a curious fact that a concept as simple as the greatest common divisor
has many applications. We’ll soon see that there is a fast and efficient method
to compute the greatest common divisor of any two integers, a fact that has
powerful and far-reaching consequences.
Example 1.5. The greatest common divisor of 12 and 18 is 6, since 6 | 12
and 6 | 18 and there is no larger number with this property. Similarly,
gcd(748, 2024) = 44.
One way to check that this is correct is to make lists of all of the positive
divisors of 748 and of 2024.
Divisors of 748 = {1, 2, 4, 11, 17, 22, 34, 44, 68, 187, 374, 748},
Divisors of 2024 = {1, 2, 4, 8, 11, 22, 23, 44, 46, 88, 92, 184, 253,
506, 1012, 2024}.
Examining the two lists, we see that the largest common entry is 44. Even
from this small example, it is clear that this is not a very efficient method. If
we ever need to compute greatest common divisors of large numbers, we will
have to find a more efficient approach.
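The divisor-list method of Example 1.5 is easy to code, and coding it makes its inefficiency concrete: trial division up to n examines every candidate divisor, so the work grows with the size of the numbers themselves. A Python sketch (function names ours):

```python
def divisors(n):
    """All positive divisors of n, found by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]

def gcd_naive(a, b):
    """Greatest common divisor by intersecting the two divisor lists."""
    return max(set(divisors(a)) & set(divisors(b)))
```

For 748 and 2024 this reproduces the lists above and returns 44, but for numbers of the size used in cryptography the loop in divisors could never finish.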
The key to an efficient algorithm for computing greatest common divisors
is division with remainder, which is simply the method of “long division” that
you learned in elementary school. Thus if a and b are positive integers and if
you attempt to divide a by b, you will get a quotient q and a remainder r,
where the remainder r is smaller than b. For example,
        13 R 9
  17 ) 230
       17
        60
        51
         9
so 230 divided by 17 gives a quotient of 13 with a remainder of 9. What does
this last statement really mean? It means that 230 can be written as
230 = 17 · 13 + 9,
where the remainder 9 is strictly smaller than the divisor 17.
Definition. (Division With Remainder) Let a and b be positive integers.
Then we say that a divided by b has quotient q and remainder r if
a = b · q + r with 0 ≤ r < b.
The values of q and r are uniquely determined by a and b; see Exercise 1.14.
Suppose now that we want to find the greatest common divisor of a and b.
We first divide a by b to get
a = b · q + r with 0 ≤ r < b. (1.1)
If d is any common divisor of a and b, then it is clear from Eq. (1.1) that d
is also a divisor of r. (See Proposition 1.4(c).) Similarly, if e is a common
divisor of b and r, then (1.1) shows that e is a divisor of a. In other words, the
common divisors of a and b are the same as the common divisors of b and r;
hence
gcd(a, b) = gcd(b, r).
We repeat the process, dividing b by r to get another quotient and remainder,
say
b = r · q′ + r′ with 0 ≤ r′ < r.
Then the same reasoning shows that
gcd(b, r) = gcd(r, r′).
Continuing this process, the remainders become smaller and smaller, until
eventually we get a remainder of 0, at which point the final value gcd(s, 0) = s
is equal to the gcd of a and b.
We illustrate with an example and then describe the general method, which
goes by the name Euclidean algorithm.
Example 1.6. We compute gcd(2024, 748) using the Euclidean algorithm,
which is nothing more than repeated division with remainder. Notice how
the b and r values on each line become the new a and b values on the subse-
quent line:
2024 = 748 · 2 + 528
748 = 528 · 1 + 220
528 = 220 · 2 + 88
220 = 88 · 2 + 44 ← gcd = 44
88 = 44 · 2 + 0
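The whole computation is a short loop: on each pass the pair (a, b) is replaced by (b, r), just as the b and r values in Example 1.6 become the new a and b. A Python sketch of ours:

```python
def euclidean_gcd(a, b, verbose=False):
    """Compute gcd(a, b) by repeated division with remainder."""
    while b != 0:
        q, r = divmod(a, b)      # a = b * q + r with 0 <= r < b
        if verbose:
            print(f"{a} = {b} * {q} + {r}")
        a, b = b, r              # the divisor and remainder become the new pair
    return a
```

Calling euclidean_gcd(2024, 748, verbose=True) prints the five division steps of Example 1.6 and returns 44.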
Theorem 1.7 (The Euclidean Algorithm). Let a and b be positive integers
with a ≥ b. The following algorithm computes gcd(a, b) in a finite number of
steps.
(1) Let r0 = a and r1 = b.
(2) Set i = 1.
(3) Divide ri−1 by ri to get a quotient qi and remainder ri+1,
ri−1 = ri · qi + ri+1 with 0 ≤ ri+1 < ri.
(4) If the remainder ri+1 = 0, then ri = gcd(a, b) and the algorithm termi-
nates.
(5) Otherwise, ri+1 > 0, so set i = i + 1 and go to Step 3.
The division step (Step 3) is executed at most
2 log2(b) + 2 times.
Proof. The Euclidean algorithm consists of a sequence of divisions with
remainder as illustrated in Fig. 1.2 (remember that we set r0 = a and r1 = b).
a = b · q1 + r2 with 0 ≤ r2 < b,
b = r2 · q2 + r3 with 0 ≤ r3 < r2,
r2 = r3 · q3 + r4 with 0 ≤ r4 < r3,
r3 = r4 · q4 + r5 with 0 ≤ r5 < r4,
...
rt−2 = rt−1 · qt−1 + rt with 0 ≤ rt < rt−1,
rt−1 = rt · qt
Then rt = gcd(a, b).
Figure 1.2: The Euclidean algorithm step by step
The ri values are strictly decreasing, and as soon as they reach zero the
algorithm terminates, which proves that the algorithm does finish in a finite
number of steps. Further, at each iteration of Step 3 we have an equation of
the form
ri−1 = ri · qi + ri+1.
This equation implies that any common divisor of ri−1 and ri is also a divisor
of ri+1, and similarly it implies that any common divisor of ri and ri+1 is also
a divisor of ri−1. Hence
gcd(ri−1, ri) = gcd(ri, ri+1) for all i = 1, 2, 3, . . . . (1.2)
However, as noted earlier, we eventually get to an ri that is zero, say rt+1 = 0.
Then rt−1 = rt · qt, so
gcd(rt−1, rt) = gcd(rt · qt, rt) = rt.
But Eq. (1.2) says that this is equal to gcd(r0, r1), i.e., to gcd(a, b), which com-
pletes the proof that the last nonzero remainder in the Euclidean algorithm
is equal to the greatest common divisor of a and b.
It remains to estimate the efficiency of the algorithm. We noted above
that since the ri values are strictly decreasing, the algorithm terminates, and
indeed since r1 = b, it certainly terminates in at most b steps. However, this
upper bound is far from the truth. We claim that after every two iterations
of Step 3, the value of ri is at least cut in half. In other words:
Claim: ri+2 < (1/2) ri for all i = 0, 1, 2, . . . .
We prove the claim by considering two cases.
Case I: ri+1 ≤ (1/2) ri.
We know that the ri values are strictly decreasing, so
ri+2 < ri+1 ≤ (1/2) ri.
Case II: ri+1 > (1/2) ri.
Consider what happens when we divide ri by ri+1. The value of ri+1 is
so large that we get
ri = ri+1 · 1 + ri+2 with ri+2 = ri − ri+1 < ri − (1/2) ri = (1/2) ri.
We have now proven our claim that ri+2 < (1/2) ri for all i. Using this inequality
repeatedly, we find that
r2k+1 < (1/2) r2k−1 < (1/4) r2k−3 < (1/8) r2k−5 < (1/16) r2k−7 < · · · < (1/2^k) r1 = (1/2^k) b.
Hence if 2^k ≥ b, then r2k+1 < 1, which forces r2k+1 to equal 0 and the
algorithm to terminate. In terms of Fig. 1.2, the value of rt+1 is 0, so we have
t + 1 ≤ 2k + 1, and thus t ≤ 2k. Further, there are exactly t divisions per-
formed in Fig. 1.2, so the Euclidean algorithm terminates in at most 2k iter-
ations. Choose the smallest such k, so 2^k ≥ b > 2^(k−1). Then
# of iterations ≤ 2k = 2(k − 1) + 2 < 2 log2(b) + 2,
which completes the proof of Theorem 1.7.
Remark 1.8. We proved that the Euclidean algorithm applied to a and b with
a ≥ b requires no more than 2 log2(b) + 2 iterations to compute gcd(a, b). This
estimate can be somewhat improved. It has been proven that the Euclidean
algorithm takes no more than 1.45 log2(b) + 1.68 iterations, and that the
average number of iterations for randomly chosen a and b is approximately
0.85 log2(b) + 0.14; see [66].
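The bound of Theorem 1.7 can also be checked empirically by counting the divisions performed; a Python sketch of ours:

```python
import math

def gcd_steps(a, b):
    """Return (gcd(a, b), number of division steps in the Euclidean algorithm)."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return a, steps

# Every pair with a >= b >= 1 should satisfy steps <= 2*log2(b) + 2 (with
# b = 1 giving the bound 2).  The worst margin over a range of small inputs:
worst = max(gcd_steps(a, b)[1] - 2 * math.log2(b)
            for a in range(1, 300) for b in range(1, a + 1))
```

Consecutive Fibonacci numbers are known to be the slowest inputs, and even they stay comfortably inside the bound.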
Remark 1.9. One way to compute quotients and remainders is by long
division, as we did on page 12. You can speed up the process using a sim-
ple calculator. The first step is to divide a by b on your calculator, which will
give a real number. Throw away the part after the decimal point to get the
quotient q. Then the remainder r can be computed as
r = a − b · q.
For example, let a = 2387187 and b = 27573. Then a/b ≈ 86.57697748, so
q = 86 and
r = a − b · q = 2387187 − 27573 · 86 = 15909.
If you need just the remainder, you can instead take the decimal part (also
sometimes called the fractional part) of a/b and multiply it by b. Continuing
with our example, the decimal part of a/b ≈ 86.57697748 is 0.57697748, and
multiplying by b = 27573 gives
27573 · 0.57697748 = 15909.00005604.
Rounding this off gives r = 15909.
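The same computation is a one-liner in Python, where divmod returns the quotient and remainder together, with no rounding worries:

```python
# Quotient and remainder for a = 2387187, b = 27573, as in Remark 1.9.
a, b = 2387187, 27573
q, r = divmod(a, b)   # q = floor(a/b), r = a - b*q
print(q, r)           # 86 15909
```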
After performing the Euclidean algorithm on two numbers, we can work
our way back up the process to obtain an extremely interesting formula. Before
giving the general result, we illustrate with an example.
Example 1.10. Recall that in Example 1.6 we used the Euclidean algorithm
to compute gcd(2024, 748) as follows:
2024 = 748 · 2 + 528
748 = 528 · 1 + 220
528 = 220 · 2 + 88
220 = 88 · 2 + 44 ← gcd = 44
88 = 44 · 2 + 0
We let a = 2024 and b = 748, so the first line says that
528 = a − 2b.
We substitute this into the second line to get
b = (a − 2b) · 1 + 220, so 220 = −a + 3b.
We next substitute the expressions 528 = a − 2b and 220 = −a + 3b into the
third line to get
a − 2b = (−a + 3b) · 2 + 88, so 88 = 3a − 8b.
Finally, we substitute the expressions 220 = −a + 3b and 88 = 3a − 8b into
the penultimate line to get
−a + 3b = (3a − 8b) · 2 + 44, so 44 = −7a + 19b.
In other words,
−7 · 2024 + 19 · 748 = 44 = gcd(2024, 748),
so we have found a way to write gcd(a, b) as a linear combination of a and b
using integer coefficients.
In general, it is always possible to write gcd(a, b) as an integer linear combi-
nation of a and b, a simple sounding result with many important consequences.
Theorem 1.11 (Extended Euclidean Algorithm). Let a and b be positive
integers. Then the equation
au + bv = gcd(a, b)
always has a solution in integers u and v. (See Exercise 1.12 for an efficient
algorithm to find a solution.)
If (u0, v0) is any one solution, then every solution has the form
u = u0 + b·k/gcd(a, b) and v = v0 − a·k/gcd(a, b) for some k ∈ Z.
Proof. Look back at Fig. 1.2, which illustrates the Euclidean algorithm step
by step. We can solve the first line for r2 = a − b · q1 and substitute it into
the second line to get
b = (a − b · q1) · q2 + r3, so r3 = −a · q2 + b · (1 + q1q2).
Next substitute the expressions for r2 and r3 into the third line to get
a − b · q1 = (−a · q2 + b · (1 + q1q2)) · q3 + r4.
After rearranging the terms, this gives
r4 = a · (1 + q2q3) − b · (q1 + q3 + q1q2q3).
The key point is that r4 = a · u + b · v, where u and v are integers. It does
not matter that the expressions for u and v in terms of q1, q2, q3 are rather
messy. Continuing in this fashion, at each stage we find that ri is the sum of
an integer multiple of a and an integer multiple of b. Eventually, we get to
rt = a·u+b·v for some integers u and v. But rt = gcd(a, b), which completes
the proof of the first part of the theorem. We leave the second part as an
exercise (Exercise 1.11).
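Exercise 1.12 describes an efficient algorithm for computing u and v; one standard iterative version (a sketch, not necessarily the one in the exercise) maintains, alongside each remainder, its expression as an integer combination of a and b:

```python
def extended_gcd(a, b):
    """Return (g, u, v) with a*u + b*v = g = gcd(a, b)."""
    # Invariant: a0*u + b0*v equals the current a, and
    #            a0*x + b0*y equals the current b.
    u, v, x, y = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        u, v, x, y = x, y, u - q * x, v - q * y
    return a, u, v

g, u, v = extended_gcd(2024, 748)
print(g, u, v)   # 44 -7 19, matching Example 1.10
```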
An especially important case of the extended Euclidean algorithm arises
when the greatest common divisor of a and b is 1. In this case we give a and b
a special name.
Definition. Let a and b be integers. We say that a and b are relatively prime
if gcd(a, b) = 1.
More generally, any equation
Au + Bv = gcd(A, B)
can be reduced to the case of relatively prime numbers by dividing both sides
by gcd(A, B). Thus
(A/gcd(A, B)) · u + (B/gcd(A, B)) · v = 1,
where a = A/ gcd(A, B) and b = B/ gcd(A, B) are relatively prime and sat-
isfy au+bv = 1. For example, we found earlier that 2024 and 748 have greatest
common divisor 44 and satisfy
−7 · 2024 + 19 · 748 = 44.
Dividing both sides by 44, we obtain
−7 · 46 + 19 · 17 = 1.
Thus 2024/44 = 46 and 748/44 = 17 are relatively prime, and u = −7 and
v = 19 are the coefficients of a linear combination of 46 and 17 that equals 1.
In Example 1.10 we explained how to substitute the values from the
Euclidean algorithm in order to solve au + bv = gcd(a, b). Exercise 1.12
describes an efficient computer-oriented algorithm for computing u and v.
If a and b are relatively prime, we now describe a more conceptual version of
this substitution procedure. We first illustrate with the example a = 73 and
b = 25. The Euclidean algorithm gives
73 = 25 · 2 + 23
25 = 23 · 1 + 2
23 = 2 · 11 + 1
2 = 1 · 2 + 0.
We set up a box, using the sequence of quotients 2, 1, 11, and 2, as follows:
2 1 11 2
0 1 ∗ ∗ ∗ ∗
1 0 ∗ ∗ ∗ ∗
Then the rule to fill in the remaining entries is as follows:
New Entry = (Number at Top) · (Number to the Left)
+ (Number Two Spaces to the Left).
Thus the two leftmost ∗’s are
2 · 1 + 0 = 2 and 2 · 0 + 1 = 1,
so now our box looks like this:
2 1 11 2
0 1 2 ∗ ∗ ∗
1 0 1 ∗ ∗ ∗
Then the next two leftmost ∗’s are
1 · 2 + 1 = 3 and 1 · 1 + 0 = 1,
and then the next two are
11 · 3 + 2 = 35 and 11 · 1 + 1 = 12,
and the final entries are
2 · 35 + 3 = 73 and 2 · 12 + 1 = 25.
The completed box is
2 1 11 2
0 1 2 3 35 73
1 0 1 1 12 25
Notice that the last column repeats a and b. More importantly, the next to
last column gives the values of −v and u (in that order). Thus in this example
we find that 73 · 12 − 25 · 35 = 1. The general algorithm is given in Fig. 1.3.
In general, if a and b are relatively prime and if q1, q2, . . . , qt is the
sequence of quotients obtained from applying the Euclidean algorithm
to a and b as in Figure 1.2 on page 13, then the box has the form
q1 q2 . . . qt−1 qt
0 1 P1 P2 . . . Pt−1 a
1 0 Q1 Q2 . . . Qt−1 b
The entries in the box are calculated using the initial values
P1 = q1, Q1 = 1, P2 = q2 · P1 + 1, Q2 = q2 · Q1,
and then, for i ≥ 3, using the formulas
Pi = qi · Pi−1 + Pi−2 and Qi = qi · Qi−1 + Qi−2.
The final four entries in the box satisfy
a · Qt−1 − b · Pt−1 = (−1)^t.
Multiplying both sides by (−1)^t gives the solution u = (−1)^t · Qt−1
and v = (−1)^(t+1) · Pt−1 to the equation au + bv = 1.
Figure 1.3: Solving au + bv = 1 using the Euclidean algorithm
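The recurrences in Fig. 1.3 are straightforward to program. The following sketch (the function name is ours) collects the quotients, builds the Pi and Qi rows of the box, and reads off u and v from the next-to-last column:

```python
def solve_au_bv_1(a, b):
    """For relatively prime a > b, solve a*u + b*v = 1 using the
    box method of Fig. 1.3."""
    # Collect the quotients q1, ..., qt from the Euclidean algorithm.
    quotients, x, y = [], a, b
    while y != 0:
        quotients.append(x // y)
        x, y = y, x % y
    # The P and Q lists start with the columns under "0 1" and "1 0".
    P, Q = [0, 1], [1, 0]
    for q in quotients:
        P.append(q * P[-1] + P[-2])   # Pi = qi*P(i-1) + P(i-2)
        Q.append(q * Q[-1] + Q[-2])   # Qi = qi*Q(i-1) + Q(i-2)
    t = len(quotients)
    u = (-1) ** t * Q[-2]             # u = (-1)^t * Q(t-1)
    v = (-1) ** (t + 1) * P[-2]       # v = (-1)^(t+1) * P(t-1)
    return u, v

print(solve_au_bv_1(73, 25))   # (12, -35): 73*12 - 25*35 = 1
```

Running it on the text's other pair gives solve_au_bv_1(46, 17) = (−7, 19), matching the equation −7 · 46 + 19 · 17 = 1.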
1.3 Modular Arithmetic
You may have encountered “clock arithmetic” in grade school, where after you
get to 12, the next number is 1. This leads to odd-looking equations such as
6 + 9 = 3 and 2 − 3 = 11.
These look strange, but they are true using clock arithmetic, since for ex-
ample 11 o’clock is 3 h before 2 o’clock. So what we are really doing is first
computing 2 − 3 = −1 and then adding 12 to the answer. Similarly, 9 h af-
ter 6 o’clock is 3 o’clock, since 6 + 9 − 12 = 3.
The theory of congruences is a powerful method in number theory that is
based on the simple idea of clock arithmetic.
Definition. Let m ≥ 1 be an integer. We say that the integers a and b are
congruent modulo m if their difference a − b is divisible by m. We write
a ≡ b (mod m)
to indicate that a and b are congruent modulo m. The number m is called the
modulus.
Our clock examples may be written as congruences using the modulus
m = 12:
6 + 9 = 15 ≡ 3 (mod 12) and 2 − 3 = −1 ≡ 11 (mod 12).
Example 1.12. We have
17 ≡ 7 (mod 5), since 5 divides 10 = 17 − 7.
On the other hand,
19 ≢ 6 (mod 11), since 11 does not divide 13 = 19 − 6.
Notice that the numbers satisfying
a ≡ 0 (mod m)
are the numbers that are divisible by m, i.e., the multiples of m.
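The definition translates directly into a one-line test. A sketch in Python (the function name is ours):

```python
def congruent(a, b, m):
    """True if a ≡ b (mod m), i.e. if m divides a - b."""
    return (a - b) % m == 0

print(congruent(17, 7, 5))    # True:  5 divides 17 - 7 = 10
print(congruent(19, 6, 11))   # False: 11 does not divide 13
```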
The reason that congruence notation is so useful is that congruences be-
have much like equalities, as the following proposition indicates.
Proposition 1.13. Let m ≥ 1 be an integer.
(a) If a1 ≡ a2 (mod m) and b1 ≡ b2 (mod m), then
a1 ± b1 ≡ a2 ± b2 (mod m) and a1 · b1 ≡ a2 · b2 (mod m).
(b) Let a be an integer. Then
a · b ≡ 1 (mod m) for some integer b if and only if gcd(a, m) = 1.
Further, if a · b1 ≡ a · b2 ≡ 1 (mod m), then b1 ≡ b2 (mod m). We call b
the (multiplicative) inverse of a modulo m.
Proof. (a) We leave this as an exercise; see Exercise 1.15.
(b) Suppose first that gcd(a, m) = 1. Then Theorem 1.11 tells us that we can
find integers u and v satisfying au + mv = 1. This means that au − 1 = −mv
is divisible by m, so by definition, au ≡ 1 (mod m). In other words, we can
take b = u.
For the other direction, suppose that a has an inverse modulo m, say
a · b ≡ 1 (mod m). This means that ab − 1 = cm for some integer c. It follows
that gcd(a, m) divides ab − cm = 1, so gcd(a, m) = 1. This completes the
proof that a has an inverse modulo m if and only if gcd(a, m) = 1. It remains
to show that the inverse is unique modulo m.
So suppose that a · b1 ≡ a · b2 ≡ 1 (mod m). Then
b1 ≡ b1 · 1 ≡ b1 · (a · b2) ≡ (b1 · a) · b2 ≡ 1 · b2 ≡ b2 (mod m),
which completes the proof of Proposition 1.13.
Proposition 1.13(b) says that if gcd(a, m) = 1, then there exists an
inverse b of a modulo m. This has the curious consequence that the fraction
a^(−1) = 1/a has a meaningful interpretation in the world of integers modulo m,
namely a^(−1) modulo m is the unique number b modulo m satisfying the
congruence ab ≡ 1 (mod m).
Example 1.14. We take m = 5 and a = 2. Clearly gcd(2, 5) = 1, so there exists
an inverse to 2 modulo 5. The inverse of 2 modulo 5 is 3, since 2 · 3 ≡ 1 (mod 5),
so 2^(−1) ≡ 3 (mod 5). Similarly gcd(4, 15) = 1, so 4^(−1) exists modulo 15. In fact
4 · 4 ≡ 1 (mod 15), so 4 is its own inverse modulo 15.
We can even work with fractions a/d modulo m as long as the denominator
is relatively prime to m. For example, we can compute 5/7 modulo 11 by first
observing that 7 · 8 ≡ 1 (mod 11), so 7^(−1) ≡ 8 (mod 11). Then
5/7 = 5 · 7^(−1) ≡ 5 · 8 ≡ 40 ≡ 7 (mod 11).
Remark 1.15. In the preceding examples it was easy to find inverses modulo m
by trial and error. However, when m is large, it is more challenging to
compute a^(−1) modulo m. Note that we showed that inverses exist by using
the extended Euclidean algorithm (Theorem 1.11). In order to actually
compute the u and v that appear in the equation au + mv = gcd(a, m), we
can apply the Euclidean algorithm directly as we did in Example 1.10, or
we can use the somewhat more efficient box method described at the end of
the preceding section, or we can use the algorithm given in Exercise 1.12.
In any case, since the Euclidean algorithm takes at most 2 log2(b) + 2 itera-
tions to compute gcd(a, b), it takes only a small multiple of log2(m) steps to
compute a^(−1) modulo m.
We now continue our development of the theory of modular arithmetic.
If a divided by m has quotient q and remainder r, it can be written as
a = m · q + r with 0 ≤ r < m.
This shows that a ≡ r (mod m) for some integer r between 0 and m − 1, so
if we want to work with integers modulo m, it is enough to use the integers
0 ≤ r < m. This prompts the following definition.
Definition. We write
Z/mZ = {0, 1, 2, . . . , m − 1}
and call Z/mZ the ring of integers modulo m. We add and multiply elements
of Z/mZ by adding or multiplying them as integers and then dividing the
result by m and taking the remainder in order to obtain an element in Z/mZ.
Figure 1.4 illustrates the ring Z/5Z by giving complete addition and mul-
tiplication tables modulo 5.
+ 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3
· 0 1 2 3 4
0 0 0 0 0 0
1 0 1 2 3 4
2 0 2 4 1 3
3 0 3 1 4 2
4 0 4 3 2 1
Figure 1.4: Addition and multiplication tables modulo 5
Remark 1.16. If you have studied ring theory, you will recognize that Z/mZ
is the quotient ring of Z by the principal ideal mZ, and that the num-
bers 0, 1, . . . , m − 1 are actually coset representatives for the congruence
classes that comprise the elements of Z/mZ. For a discussion of congruence
classes and general quotient rings, see Sect. 2.10.2.
Definition. Proposition 1.13(b) tells us that a has an inverse modulo m if
and only if gcd(a, m) = 1. Numbers that have inverses are called units. We
denote the set of all units by
(Z/mZ)∗ = {a ∈ Z/mZ : gcd(a, m) = 1} = {a ∈ Z/mZ : a has an inverse modulo m}.
The set (Z/mZ)∗ is called the group of units modulo m.
Notice that if a1 and a2 are units modulo m, then so is a1a2. (Do you see
why this is true?) So when we multiply two units, we always get a unit. On
the other hand, if we add two units, we often do not get a unit.
Example 1.17. The group of units modulo 24 is
(Z/24Z)∗ = {1, 5, 7, 11, 13, 17, 19, 23}.
Similarly, the group of units modulo 7 is
(Z/7Z)∗ = {1, 2, 3, 4, 5, 6},
since every number between 1 and 6 is relatively prime to 7. The multiplication
tables for (Z/24Z)∗ and (Z/7Z)∗ are illustrated in Fig. 1.5.
In many of the cryptosystems that we will study, it is important to know
how many elements are in the unit group modulo m. This quantity is suffi-
ciently ubiquitous that we give it a name.
Definition. Euler’s phi function (also sometimes known as Euler’s totient
function) is the function φ(m) defined by the rule
φ(m) = #(Z/mZ)∗ = #{0 ≤ a < m : gcd(a, m) = 1}.
For example, we see from Example 1.17 that φ(24) = 8 and φ(7) = 6.
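A direct (if inefficient for large m) way to compute φ(m) straight from the definition, as a sketch:

```python
from math import gcd

def phi(m):
    """Euler's phi function: count the units in Z/mZ."""
    return sum(1 for a in range(m) if gcd(a, m) == 1)

print(phi(24), phi(7))   # 8 6, as in Example 1.17
```

For cryptographic sizes of m one instead computes φ(m) from the factorization of m, a point taken up later in the book.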
· 1 5 7 11 13 17 19 23
1 1 5 7 11 13 17 19 23
5 5 1 11 7 17 13 23 19
7 7 11 1 5 19 23 13 17
11 11 7 5 1 23 19 17 13
13 13 17 19 23 1 5 7 11
17 17 13 23 19 5 1 11 7
19 19 23 13 17 7 11 1 5
23 23 19 17 13 11 7 5 1
Unit group modulo 24
· 1 2 3 4 5 6
1 1 2 3 4 5 6
2 2 4 6 1 3 5
3 3 6 2 5 1 4
4 4 1 5 2 6 3
5 5 3 1 6 4 2
6 6 5 4 3 2 1
Unit group modulo 7
Figure 1.5: The unit groups (Z/24Z)∗ and (Z/7Z)∗
1.3.1 Modular Arithmetic and Shift Ciphers
Recall that the Caesar (or shift) cipher studied in Sect. 1.1 works by shifting
each letter in the alphabet a fixed number of letters. We can describe a shift
cipher mathematically by assigning a number to each letter as in Table 1.7.
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Table 1.7: Assigning numbers to letters
Then a shift cipher with shift k takes a plaintext letter corresponding to
the number p and assigns it to the ciphertext letter corresponding to the
number p + k mod 26. Notice how the use of modular arithmetic, in this case
modulo 26, simplifies the description of the shift cipher. The shift amount
serves as both the encryption key and the decryption key. Encryption is given
by the formula
(Ciphertext Letter) ≡ (Plaintext Letter) + (Secret Key) (mod 26),
and decryption works by shifting in the opposite direction,
(Plaintext Letter) ≡ (Ciphertext Letter) − (Secret Key) (mod 26).
More succinctly, if we let
p = Plaintext Letter, c = Ciphertext Letter, k = Secret Key,
then
c ≡ p + k (mod 26)   (Encryption)    and    p ≡ c − k (mod 26)   (Decryption).
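The encryption and decryption congruences translate into a few lines of Python (the helper names are ours; only lowercase letters are handled, matching Table 1.7):

```python
def shift_encrypt(plaintext, k):
    """Shift cipher: apply c ≡ p + k (mod 26) letter by letter."""
    return "".join(chr((ord(ch) - ord("a") + k) % 26 + ord("a"))
                   for ch in plaintext)

def shift_decrypt(ciphertext, k):
    """Decryption is the shift in the opposite direction."""
    return shift_encrypt(ciphertext, -k)

msg = shift_encrypt("attackatdawn", 5)
print(msg)                    # fyyfhpfyifbs
print(shift_decrypt(msg, 5))  # attackatdawn
```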
1.3.2 The Fast Powering Algorithm
In some cryptosystems that we will study, for example the RSA and Diffie–
Hellman cryptosystems, Alice and Bob are required to compute large powers
of a number g modulo another number N, where N may have hundreds of
digits. The naive way to compute g^A is by repeated multiplication by g. Thus
g1 ≡ g (mod N), g2 ≡ g · g1 (mod N), g3 ≡ g · g2 (mod N),
g4 ≡ g · g3 (mod N), g5 ≡ g · g4 (mod N), . . . .
It is clear that gA ≡ g^A (mod N), but if A is large, this algorithm is completely
impractical. For example, if A ≈ 2^1000, then the naive algorithm would take
longer than the estimated age of the universe! Clearly if it is to be useful, we
need to find a better way to compute g^A (mod N).
The idea is to use the binary expansion of the exponent A to convert
the calculation of g^A into a succession of squarings and multiplications. An
example will make the idea clear, after which we give a formal description of
the method.
Example 1.18. Suppose that we want to compute 3^218 (mod 1000). The first
step is to write 218 as a sum of powers of 2,
218 = 2 + 2^3 + 2^4 + 2^6 + 2^7.
Then 3^218 becomes
3^218 = 3^(2 + 2^3 + 2^4 + 2^6 + 2^7) = 3^2 · 3^(2^3) · 3^(2^4) · 3^(2^6) · 3^(2^7). (1.3)
Notice that it is relatively easy to compute the sequence of values
3, 3^2, 3^(2^2), 3^(2^3), 3^(2^4), . . . ,
since each number in the sequence is the square of the preceding one. Further,
since we only need these values modulo 1000, we never need to store more
than three digits. Table 1.8 lists the powers of 3 modulo 1000 up to 3^(2^7).
Creating Table 1.8 requires only 7 multiplications, despite the fact that the
number 3^(2^7) = 3^128 has quite a large exponent, because each successive entry
in the table is equal to the square of the previous entry.
i                    0   1   2    3     4     5     6     7
3^(2^i) (mod 1000)   3   9   81   561   721   841   281   961
Table 1.8: Successive square powers of 3 modulo 1000
We use (1.3) to decide which powers from Table 1.8 are needed to
compute 3^218. Thus
3^218 = 3^2 · 3^(2^3) · 3^(2^4) · 3^(2^6) · 3^(2^7)
≡ 9 · 561 · 721 · 281 · 961 (mod 1000)
≡ 489 (mod 1000).
We note that in computing the product 9 · 561 · 721 · 281 · 961, we may reduce
modulo 1000 after each multiplication, so we never need to deal with very
large numbers. We also observe that it has taken us only 11 multiplications
to compute 3^218 (mod 1000), a huge savings over the naive approach. And for
larger exponents we would save even more.
The general approach used in Example 1.18 goes by various names, includ-
ing the Fast Powering Algorithm and the Square-and-Multiply Algorithm.8
We
now describe the algorithm more formally.
The Fast Powering Algorithm
Step 1. Compute the binary expansion of A as
A = A0 + A1·2 + A2·2^2 + A3·2^3 + · · · + Ar·2^r with A0, . . . , Ar ∈ {0, 1},
where we may assume that Ar = 1.
Step 2. Compute the powers g^(2^i) (mod N) for 0 ≤ i ≤ r by successive
squaring,
a0 ≡ g (mod N)
a1 ≡ (a0)^2 ≡ g^2 (mod N)
a2 ≡ (a1)^2 ≡ g^(2^2) (mod N)
a3 ≡ (a2)^2 ≡ g^(2^3) (mod N)
. . .
ar ≡ (ar−1)^2 ≡ g^(2^r) (mod N).
Each term is the square of the previous one, so this requires r
multiplications.
8The first known recorded description of the fast powering algorithm appeared in India
before 200 BC, while the first reference outside India dates to around 950 AD. See [66,
page 441] for a brief discussion and further references.
Step 3. Compute g^A (mod N) using the formula
g^A = g^(A0 + A1·2 + A2·2^2 + A3·2^3 + ··· + Ar·2^r)
= g^A0 · (g^2)^A1 · (g^(2^2))^A2 · (g^(2^3))^A3 · · · (g^(2^r))^Ar
≡ (a0)^A0 · (a1)^A1 · (a2)^A2 · (a3)^A3 · · · (ar)^Ar (mod N). (1.4)
Note that the quantities a0, a1, . . . , ar were computed in Step 2. Thus the
product (1.4) can be computed by looking up the values of the ai's whose
exponent Ai is 1 and then multiplying them together. This requires at
most another r multiplications.
Running Time. It takes at most 2r multiplications modulo N to com-
pute g^A. Since A ≥ 2^r, we see that it takes at most 2 log2(A) mul-
tiplications9 modulo N to compute g^A. Thus even if A is very large,
say A ≈ 2^1000, it is easy for a computer to do the approximately 2000
multiplications needed to calculate g^A modulo N.
Efficiency Issues. There are various ways in which the square-and-multiply
algorithm can be made somewhat more efficient, in particular regarding
eliminating storage requirements; see Exercise 1.25 for an example.
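The three steps above combine into one short loop that keeps a running value g^(2^i) and multiplies it into the result whenever the binary digit Ai is 1. A sketch (this is essentially what Python's built-in pow(g, A, N) does, far more efficiently):

```python
def fast_power(g, A, N):
    """Compute g^A (mod N) by the square-and-multiply algorithm."""
    result = 1
    square = g % N           # holds g^(2^i) (mod N) at step i
    while A > 0:
        if A & 1:            # the current binary digit A_i is 1
            result = (result * square) % N
        square = (square * square) % N   # advance to g^(2^(i+1))
        A >>= 1              # move on to the next binary digit
    return result

print(fast_power(3, 218, 1000))   # 489, as in Example 1.18
```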
1.4 Prime Numbers, Unique Factorization,
and Finite Fields
In Sect. 1.3 we studied modular arithmetic and saw that it makes sense to
add, subtract, and multiply integers modulo m. Division, however, can be
problematic, since we can divide by a in Z/mZ only if gcd(a, m) = 1. But
notice that if the integer m is a prime, then we can divide by every nonzero
element of Z/mZ. We start with a brief discussion of prime numbers before
returning to the ring Z/pZ with p prime.
Definition. An integer p is called a prime if p ≥ 2 and if the only positive
integers dividing p are 1 and p.
For example, the first ten primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, while the
hundred thousandth prime is 1299709 and the millionth is 15485863. There are
infinitely many primes, a fact that was known in ancient Greece and appears
as a theorem in Euclid’s Elements. (See Exercise 1.28.)
A prime p is defined in terms of the numbers that divide p. So the following
proposition, which describes a useful property of numbers that are divisible
by p, is not obvious and needs to be carefully proved. Notice that the proposi-
tion is false for composite numbers. For example, 6 divides 3·10, but 6 divides
neither 3 nor 10.
9Note that log2(A) means the usual logarithm to the base 2, not the so-called discrete
logarithm that will be discussed in Chap. 2.
Proposition 1.19. Let p be a prime number, and suppose that p divides the
product ab of two integers a and b. Then p divides at least one of a and b.
More generally, if p divides a product of integers, say
p | a1a2 · · · an,
then p divides at least one of the individual ai.
Proof. Let g = gcd(a, p). Then g | p, so either g = 1 or g = p. If g = p, then
p | a (since g | a), so we are done. Otherwise, g = 1 and Theorem 1.11 tells us
that we can find integers u and v satisfying au + pv = 1. We multiply both
sides of the equation by b to get
abu + pbv = b. (1.5)
By assumption, p divides the product ab, and certainly p divides pbv, so p di-
vides both terms on the left-hand side of (1.5). Hence it divides the right-hand
side, which shows that p divides b and completes the proof of Proposition 1.19.
To prove the more general statement, we write the product as a1(a2 · · · an)
and apply the first statement with a = a1 and b = a2 · · · an. If p | a1, we’re
done. Otherwise, p | a2 · · · an, so writing this as a2(a3 · · · an), the first state-
ment tells us that either p | a2 or p | a3 · · · an. Continuing in this fashion, we
must eventually find some ai that is divisible by p.
As an application of Proposition 1.19, we prove that every positive integer
has an essentially unique factorization as a product of primes.
Theorem 1.20 (The Fundamental Theorem of Arithmetic). Let a ≥ 2 be an
integer. Then a can be factored as a product of prime numbers
a = p1^e1 · p2^e2 · p3^e3 · · · pr^er.
Further, other than rearranging the order of the primes, this factorization into
prime powers is unique.
Further, other than rearranging the order of the primes, this factorization into
prime powers is unique.
Proof. It is not hard to prove that every a ≥ 2 can be factored into a product
of primes. It is tempting to assume that the uniqueness of the factorization is
also obvious. However, this is not the case; unique factorization is a somewhat
subtle property of the integers. We will prove it using the general form of
Proposition 1.19. (For an example of a situation in which unique factorization
fails to be true, see the E-zone described in [137, Chapter 7].)
Suppose that a has two factorizations into products of primes,
a = p1p2 · · · ps = q1q2 · · · qt, (1.6)
where the pi and qj are all primes, not necessarily distinct, and s does not
necessarily equal t. Since p1 | a, we see that p1 divides the product q1q2q3 · · · qt.
Thus by the general form of Proposition 1.19, we find that p1 divides one of
the qi. Rearranging the order of the qi if necessary, we may assume that p1 | q1.
But p1 and q1 are both primes, so we must have p1 = q1. This allows us to
cancel them from both sides of (1.6), which yields
p2p3 · · · ps = q2q3 · · · qt.
Repeating this process s times, we ultimately reach an equation of the form
1 = qt−sqt−s+1 · · · qt.
It follows immediately that t = s and that the original factorizations of a
were identical up to rearranging the order of the factors. (For a more detailed
proof of the fundamental theorem of arithmetic, see any basic number theory
textbook, for example [35, 52, 59, 100, 111, 137].)
Definition. The fundamental theorem of arithmetic (Theorem 1.20) says that
in the factorization of a positive integer a into primes, each prime p appears
to a particular power. We denote this power by ordp(a) and call it the order
(or exponent) of p in a. (For convenience, we set ordp(1) = 0 for all primes.)
For example, the factorization of 1728 is 1728 = 2^6 · 3^3, so
ord2(1728) = 6, ord3(1728) = 3, and ordp(1728) = 0 for all primes p ≥ 5.
Using the ordp notation, the factorization of a can be succinctly written as
a = ∏_{primes p} p^(ordp(a)).
Note that this product makes sense, since ordp(a) is zero for all but finitely
many primes.
It is useful to view ordp as a function
ordp : {1, 2, 3, . . .} −→ {0, 1, 2, 3, . . .}. (1.7)
This function has a number of interesting properties, some of which are de-
scribed in Exercise 1.31.
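The function ordp is easy to compute by repeated division; a sketch:

```python
def ordp(p, a):
    """The exponent of the prime p in the factorization of a >= 1."""
    e = 0
    while a % p == 0:   # divide out copies of p until none remain
        a //= p
        e += 1
    return e

print(ordp(2, 1728), ordp(3, 1728), ordp(5, 1728))   # 6 3 0
```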
We now observe that if p is a prime, then every nonzero number modulo p
has a multiplicative inverse modulo p. This means that when we do arithmetic
modulo a prime p, not only can we add, subtract, multiply, but we can also
divide by nonzero numbers, just as we can with real numbers. This property
of primes is sufficiently important that we formally state it as a proposition.
Proposition 1.21. Let p be a prime. Then every nonzero element a in Z/pZ
has a multiplicative inverse, that is, there is a number b satisfying
ab ≡ 1 (mod p).
We denote this value of b by a^(−1) mod p, or if p has already been specified,
then simply by a^(−1).
Proof. This proposition is a special case of Proposition 1.13(b) using the prime
modulus p, since if a ∈ Z/pZ is not zero, then gcd(a, p) = 1.
Remark 1.22. The extended Euclidean algorithm (Theorem 1.11) gives us an
efficient computational method for computing a^(−1) mod p. We simply solve
the equation
au + pv = 1 in integers u and v,
and then u = a^(−1) mod p. For an alternative method of computing a^(−1) mod p,
see Remark 1.26.
Proposition 1.21 can be restated by saying that if p is prime, then
(Z/pZ)∗ = {1, 2, 3, 4, . . . , p − 1}.
In other words, when the 0 element is removed from Z/pZ, the remaining
elements are units and closed under multiplication.
Definition. If p is prime, then the set Z/pZ of integers modulo p with its
addition, subtraction, multiplication, and division rules is an example of a
field. If you have studied abstract algebra (or see Sect. 2.10), you know that
a field is the general name for a (commutative) ring in which every nonzero
element has a multiplicative inverse. You are already familiar with some other
fields, for example the field of real numbers R, the field of rational numbers
(fractions) Q, and the field of complex numbers C.
The field Z/pZ of integers modulo p has only finitely many elements. It
is a finite field and is often denoted by Fp. Thus Fp and Z/pZ are really just
two different notations for the same object.10 Similarly, we write Fp∗ inter-
changeably for the group of units (Z/pZ)∗. Finite fields are of fundamental
importance throughout cryptography, and indeed throughout all of mathe-
matics.
Remark 1.23. Although Z/pZ and Fp are used to denote the same concept,
equality of elements is expressed somewhat differently in the two settings. For
a, b ∈ Fp, the equality of a and b is denoted by a = b, while for a, b ∈ Z/pZ,
the equality of a and b is denoted by equivalence modulo p, i.e., a ≡ b (mod p).
1.5 Powers and Primitive Roots
in Finite Fields
The application of finite fields in cryptography often involves raising elements
of Fp to high powers. As a practical matter, we know how to do this effi-
ciently using the powering algorithm described in Sect. 1.3.2. In this section
10Finite fields are also sometimes called Galois fields, after Évariste Galois, who studied
them in the nineteenth century. Yet another notation for Fp is GF(p), in honor of Galois.
And yet one more notation for Fp that you may run across is Zp, although in number theory
the notation Zp is more commonly reserved for the ring of p-adic integers.
we investigate powers in Fp from a purely mathematical viewpoint, prove a
fundamental result due to Fermat, and state an important property of the
group of units Fp∗.
We begin with a simple example. Table 1.9 lists the powers of 1, 2, 3, . . . , 6
modulo the prime 7.
1^1 ≡ 1   1^2 ≡ 1   1^3 ≡ 1   1^4 ≡ 1   1^5 ≡ 1   1^6 ≡ 1
2^1 ≡ 2   2^2 ≡ 4   2^3 ≡ 1   2^4 ≡ 2   2^5 ≡ 4   2^6 ≡ 1
3^1 ≡ 3   3^2 ≡ 2   3^3 ≡ 6   3^4 ≡ 4   3^5 ≡ 5   3^6 ≡ 1
4^1 ≡ 4   4^2 ≡ 2   4^3 ≡ 1   4^4 ≡ 4   4^5 ≡ 2   4^6 ≡ 1
5^1 ≡ 5   5^2 ≡ 4   5^3 ≡ 6   5^4 ≡ 2   5^5 ≡ 3   5^6 ≡ 1
6^1 ≡ 6   6^2 ≡ 1   6^3 ≡ 6   6^4 ≡ 1   6^5 ≡ 6   6^6 ≡ 1
Table 1.9: Powers of numbers modulo 7
There are quite a few interesting patterns visible in Table 1.9, including
in particular the fact that the right-hand column consists entirely of ones. We
can restate this observation by saying that
a^6 ≡ 1 (mod 7) for every a = 1, 2, 3, . . . , 6.
Of course, this cannot be true for all values of a, since if a is a multiple
of 7, then so are all of its powers, so in that case a^n ≡ 0 (mod 7). On the
other hand, if a is not divisible by 7, then a is congruent to one of the val-
ues 1, 2, 3, . . . , 6 modulo 7. Hence
a^6 ≡ 1 (mod 7) if 7 ∤ a,   and   a^6 ≡ 0 (mod 7) if 7 | a.
Further experiments with other primes suggest that this example reflects a
general fact.
Theorem 1.24 (Fermat’s Little Theorem). Let p be a prime number and
let a be any integer. Then
a^(p−1) ≡ 1 (mod p) if p ∤ a,   and   a^(p−1) ≡ 0 (mod p) if p | a.
Proof. There are many proofs of Fermat’s little theorem. If you have studied
group theory, the quickest proof is to observe that the nonzero elements in Fp
form a group Fp∗ of order p − 1, so by Lagrange's theorem, every element of Fp∗
has order dividing p − 1. For those who have not yet taken a course in group
theory, we provide a direct proof.
If p | a, then it is clear that every power of a is divisible by p. So we only
need to consider the case that p ∤ a. We now look at the list of numbers
a, 2a, 3a, . . . , (p − 1)a reduced modulo p. (1.8)
There are p − 1 numbers in this list, and we claim that they are all different.
To see why, take any two of them, say ja mod p and ka mod p, and suppose
that they are the same. This means that
ja ≡ ka (mod p), and hence that (j − k)a ≡ 0 (mod p).
Thus p divides the product (j − k)a. Proposition 1.19 tells us that either p
divides j − k or p divides a. However, we have assumed that p does not di-
vide a, so we conclude that p divides j − k. But both j and k are between 1
and p − 1, so their difference j − k is between −(p − 2) and p − 2. There is
only one number between −(p − 2) and p − 2 that is divisible by p, and that
number is zero! This proves that j − k = 0, which means that ja = ka. We
have thus shown that the p − 1 numbers in the list (1.8) are all different. They
are also nonzero, since 1, 2, 3, . . . , p − 1 and a are not divisible by p.
To recapitulate, we have shown that the list of numbers (1.8) consists of
p − 1 distinct numbers between 1 and p − 1. But there are only p − 1 distinct
numbers between 1 and p − 1, so the list of numbers (1.8) must simply be the
list of numbers 1, 2, . . . , p − 1 in some mixed up order.
Now consider what happens when we multiply together all of the numbers
a, 2a, 3a, . . . , (p−1)a in the list (1.8) and reduce the product modulo p. This is
the same as multiplying together all of the numbers 1, 2, 3, . . . , p − 1 modulo p,
so we get a congruence
a · 2a · 3a · · · (p − 1)a ≡ 1 · 2 · 3 · · · (p − 1) (mod p).
There are p − 1 copies of a appearing on the left-hand side. We factor these
out and use factorial notation (p − 1)! = 1 · 2 · · · (p − 1) to obtain
a^(p−1) · (p − 1)! ≡ (p − 1)! (mod p).
Finally, we are allowed to cancel (p − 1)! from both sides, since it is not
divisible by p. (We are using the fact that Fp is a field, so we are allowed to
divide by any nonzero number.) This yields
a^(p−1) ≡ 1 (mod p),
which completes the proof of Fermat’s “little” theorem.11
11You may wonder why Theorem 1.24 is called a “little” theorem. The reason is to
distinguish it from Fermat’s “big” theorem, which is the famous assertion that xn +yn = zn
has no solutions in positive integers x, y, z if n ≥ 3. It is unlikely that Fermat himself could
prove this big theorem, but in 1996, more than three centuries after Fermat’s era, Andrew
Wiles finally found a proof.
Example 1.25. The number p = 15485863 is prime, so Fermat’s little theorem
(Theorem 1.24) tells us that
2^15485862 ≡ 1 (mod 15485863).
Thus without doing any computing, we know that the number 2^15485862 − 1,
a number having more than two million digits, is a multiple of 15485863.
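Example 1.25 is easy to check numerically; here is a short sketch in Python, whose three-argument pow implements fast powering modulo p (Sect. 1.3.2):

```python
# Check Fermat's little theorem for the prime of Example 1.25.
# pow(a, e, p) reduces modulo p at every step, so the computation is fast
# even though 2^15485862 itself has millions of digits.
p = 15485863                 # the prime from Example 1.25
print(pow(2, p - 1, p))      # prints 1, as Fermat's little theorem predicts
```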
Remark 1.26. Fermat’s little theorem (Theorem 1.24) and the fast power-
ing algorithm (Sect. 1.3.2) provide us with a reasonably efficient method of
computing inverses modulo p, namely
a^(−1) ≡ a^(p−2) (mod p).
This congruence is true because if we multiply a^(p−2) by a, then Fermat’s
theorem tells us that the product is equal to 1 modulo p. This gives an alternative
to the extended Euclidean algorithm method described in Remark 1.22. In
practice, the two algorithms tend to take about the same amount of time,
although there are variants of the Euclidean algorithm that are somewhat
faster in practice; see for example [66, Chapter 4.5.3, Theorem E].
Example 1.27. We compute the inverse of 7814 modulo 17449 in two ways.
First,
7814^(−1) ≡ 7814^17447 ≡ 1284 (mod 17449).
Second, we use the extended Euclidean algorithm to solve
7814u + 17449v = 1.
The solution is (u, v) = (1284, −575), so 7814^(−1) ≡ 1284 (mod 17449).
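Both computations of Example 1.27 can be sketched in Python: the Fermat method is a single call to pow, and the extended Euclidean method is a few lines of recursion.

```python
# Compute 7814^(-1) mod 17449 two ways, as in Example 1.27.
p, a = 17449, 7814

# Method 1: Fermat's little theorem, a^(-1) = a^(p-2) mod p (Remark 1.26).
inv1 = pow(a, p - 2, p)

# Method 2: extended Euclidean algorithm solving a*u + b*v = gcd(a, b).
def ext_gcd(a, b):
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

g, u, v = ext_gcd(a, p)      # here u = 1284, v = -575
inv2 = u % p

print(inv1, inv2)            # both methods give 1284
```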
Example 1.28. Consider the number m = 15485207. Using the powering al-
gorithm, it is not hard to compute (on a computer)
2^(m−1) = 2^15485206 ≡ 4136685 (mod 15485207).
We did not get the value 1, so it seems that Fermat’s little theorem is not true
for m. What does that tell us? If m were prime, then Fermat’s little theorem
says that we would have obtained 1. Hence the fact that we did not get 1
proves that the number m = 15485207 is not prime.
Think about this for a minute, because it’s actually a bit astonishing. By
a simple computation, we have conclusively proven that m is not prime, yet
we do not know any of its factors!12
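The compositeness argument of Example 1.28 can be packaged as a tiny test: if a^(m−1) is not congruent to 1 modulo m for some a not divisible by m, then m is certainly composite, even though no factor is produced.

```python
# Fermat compositeness test (Example 1.28): a single power computation
# can prove that m is composite without revealing any factor of m.
def fermat_witness(m, a=2):
    """Return True if a proves m composite via Fermat's little theorem."""
    return pow(a, m - 1, m) != 1

m = 15485207
print(pow(2, m - 1, m))      # 4136685, not 1
print(fermat_witness(m))     # True: m is definitely not prime
```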
Fermat’s little theorem tells us that if a is an integer not divisible by p,
then a^(p−1) ≡ 1 (mod p). However, for any particular value of a, there may
well be smaller powers of a that are congruent to 1. We define the order of a
modulo p to be the smallest exponent k ≥ 1 such that13
12The prime factorization of m is m = 15485207 = 3853 · 4019.
13We earlier defined the order of p in a to be the exponent of p when a is factored into
primes. Thus unfortunately, the word “order” has two different meanings. You will need to
judge which one is meant from the context.
1.5. Powers and Primitive Roots in Finite Fields 33
a^k ≡ 1 (mod p).
Proposition 1.29. Let p be a prime and let a be an integer not divisible
by p. Suppose that a^n ≡ 1 (mod p). Then the order of a modulo p divides n.
In particular, the order of a divides p − 1.
Proof. Let k be the order of a modulo p, so by definition a^k ≡ 1 (mod p),
and k is the smallest positive exponent with this property. We are given
that a^n ≡ 1 (mod p). We divide n by k to obtain
n = kq + r with 0 ≤ r < k.
Then
1 ≡ a^n ≡ a^(kq+r) ≡ (a^k)^q · a^r ≡ 1^q · a^r ≡ a^r (mod p).
But r < k, so the fact that k is the smallest positive power of a that is
congruent to 1 tells us that r must equal 0. Therefore n = kq, so k divides n.
Finally, Fermat’s little theorem tells us that a^(p−1) ≡ 1 (mod p), so k
divides p − 1.
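Proposition 1.29 is easy to verify experimentally; a brute-force sketch in Python that finds the order of a modulo p and checks that it divides p − 1:

```python
# Order of a modulo p: the smallest k >= 1 with a^k = 1 (mod p).
# By Proposition 1.29 the order always divides p - 1.
def order_mod(a, p):
    for k in range(1, p):
        if pow(a, k, p) == 1:
            return k

p = 17
for a in (2, 3):
    k = order_mod(a, p)
    print(a, k, (p - 1) % k)    # the last entry is always 0
```

For p = 17 this recovers the orders seen in Example 1.31: the order of 2 is 8, while the order of 3 is the full p − 1 = 16.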
Fermat’s little theorem describes a special property of the units (i.e., the
nonzero elements) in a finite field. We conclude this section with a brief dis-
cussion of another property that is quite important both theoretically and
practically.
Theorem 1.30 (Primitive Root Theorem). Let p be a prime number. Then
there exists an element g ∈ F∗p whose powers give every element of F∗p, i.e.,
F∗p = {1, g, g^2, g^3, . . . , g^(p−2)}.
Elements with this property are called primitive roots of Fp or generators
of F∗p. They are the elements of F∗p having order p − 1.
Proof. See [137, Chapter 20] or one of the texts [35, 52, 59, 100, 111].
Example 1.31. The field F11 has 2 as a primitive root, since in F11,
2^0 = 1    2^1 = 2    2^2 = 4    2^3 = 8    2^4 = 5
2^5 = 10   2^6 = 9    2^7 = 7    2^8 = 3    2^9 = 6.
Thus all 10 nonzero elements of F11 have been generated as powers of 2. On
the other hand, 2 is not a primitive root for F17, since in F17,
2^0 = 1    2^1 = 2    2^2 = 4    2^3 = 8    2^4 = 16
2^5 = 15   2^6 = 13   2^7 = 9    2^8 = 1,
so we get back to 1 before obtaining all 16 nonzero values modulo 17. However,
it turns out that 3 is a primitive root for 17, since in F17,
3^0 = 1    3^1 = 3    3^2 = 9    3^3 = 10   3^4 = 13   3^5 = 5
3^6 = 15   3^7 = 11   3^8 = 16   3^9 = 14   3^10 = 8   3^11 = 7
3^12 = 4   3^13 = 12  3^14 = 2   3^15 = 6.
Remark 1.32. If p is large, then the finite field Fp has quite a few primitive
roots. The precise formula says that Fp has exactly φ(p − 1) primitive roots,
where φ is Euler’s phi function (see page 22). For example, you can check that
the following is a complete list of the primitive roots for F29:
{2, 3, 8, 10, 11, 14, 15, 18, 19, 21, 26, 27}.
This agrees with the value φ(28) = 12. More generally, if k divides p − 1, then
there are exactly φ(k) elements of F∗p having order k.
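The count in Remark 1.32 can be checked by brute force; a sketch that lists the primitive roots of F29 and compares the count with φ(28):

```python
# Count the primitive roots of F_29 and compare with phi(28) (Remark 1.32).
from math import gcd

def is_primitive_root(g, p):
    return len({pow(g, i, p) for i in range(p - 1)}) == p - 1

def phi(n):
    # Euler's phi function by direct count (fine for small n)
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

p = 29
roots = [g for g in range(1, p) if is_primitive_root(g, p)]
print(roots)                      # the twelve primitive roots listed above
print(len(roots), phi(p - 1))     # 12 12
```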
1.6 Cryptography Before the Computer Age
We pause for a short foray into the history of pre-computer cryptography.
Our hope is that these brief notes will whet your appetite for further reading
on this fascinating subject, in which political intrigue, daring adventure, and
romantic episodes play an equal role with technical achievements.
The origins of cryptography are lost in the mists of time, but presumably
secret writing arose shortly after people started using some form of written
communication, since one imagines that the notion of confidential information
must date back to the dawn of civilization. There are early recorded descrip-
tions of ciphers being used in Roman times, including Julius Caesar’s shift
cipher from Sect. 1.1, and certainly from that time onward, many civilizations
have used both substitution ciphers, in which each letter is replaced by an-
other letter or symbol, and transposition ciphers, in which the order of the
letters is rearranged.
The invention of cryptanalysis, that is, the art of decrypting messages
without previous knowledge of the key, is more recent. The oldest surviving
texts, which include references to earlier lost volumes, are by Arab scholars
from the fourteenth and fifteenth centuries. These books describe not only
simple substitution and transposition ciphers, but also the first recorded in-
stance of a homophonic substitution cipher, which is a cipher in which a single
plaintext letter may be represented by any one of several possible ciphertext
letters. More importantly, they contain the first description of serious methods
of cryptanalysis, including the use of letter frequency counts and the likelihood
that certain pairs of letters will appear adjacent to one another. Unfortunately,
most of this knowledge seems to have disappeared by the seventeenth century.
Meanwhile, as Europe emerged from the Middle Ages, political states in
Italy and elsewhere required secure communications, and both cryptography
and cryptanalysis began to develop. The earliest known European homo-
phonic substitution cipher dates from 1401. The use of such a cipher suggests
contemporary knowledge of cryptanalysis via frequency analysis, since the
only reason to use a homophonic system is to make such cryptanalysis more
difficult.
In the fifteenth and sixteenth centuries there arose a variety of what are
known as polyalphabetic ciphers. (We will see an example of a polyalphabetic
cipher, called the Vigenère cipher, in Sect. 5.2.) The basic idea is that each
letter of the plaintext is enciphered using a different simple substitution ci-
pher. The name “polyalphabetic” refers to the use of many different cipher
alphabets, which were used according to some sort of key. If the key is rea-
sonably long, then it takes a long time for any given cipher alphabet to
be used a second time. It wasn’t until the nineteenth century that statistical
methods were developed to reliably solve such systems, although there are
earlier recorded instances of cryptanalysis via special tricks or lucky guesses
of part of the message or the key. Jumping forward several centuries, we note
that the machine ciphers that played a large role in World War II were, in
essence, extremely complicated polyalphabetic ciphers.
Ciphers and codes14 for both political and military purposes became
increasingly widespread during the eighteenth, nineteenth, and early twentieth
centuries, as did cryptanalytic methods, although the level of sophistication
varied widely from generation to generation and from country to country. For
example, as the United States prepared to enter World War I in 1917, the
U.S. Army was using ciphers, inferior to those invented in Italy in the 1600s,
that any trained cryptanalyst of the time would have been able to break in a
few hours!
The invention and widespread deployment of long-range communication
methods, especially the telegraph, opened the need for political, military, and
commercial ciphers, and there are many fascinating stories of intercepted and
decrypted telegraph messages playing a role in historical events. One exam-
ple, the infamous Zimmerman telegram, will suffice. With the United States
maintaining neutrality in 1917 as Germany battled France and Britain on
the Western Front, the Germans decided that their best hope for victory was
to tighten their blockade of Britain by commencing unrestricted submarine
warfare in the Atlantic. This policy, which meant sinking ships from neutral
countries, was likely to bring the United States into the war, so Germany de-
cided to offer an alliance to Mexico. In return for Mexico invading the United
States, and thus distracting it from the ground war in Europe, Germany pro-
posed giving Mexico, at the conclusion of the war, much of present-day Texas,
New Mexico, and Arizona. The British secret service intercepted this commu-
nication, and despite the fact that it was encrypted using one of Germany’s
14In classical terminology, a code is a system in which each word of the plaintext is
replaced with a code word. This requires sender and receiver to share a large dictionary in
which plaintext words are paired with their ciphertext equivalents. Ciphers operate on the
individual letters of the plaintext, either by substitution, transposition, or some combina-
tion. This distinction between the words “code” and “cipher” seems to have been largely
abandoned in today’s literature.
most secure cryptosystems, they were able to decipher the cable and pass its
contents on to the United States, thereby helping to propel the United States
into World War I.
The invention and development of radio communications around 1900
caused an even more striking change in the cryptographic landscape, espe-
cially in urgent military and political situations. A general could now instan-
taneously communicate with all of his troops, but unfortunately the enemy
could listen in on all of his broadcasts. The need for secure and efficient ci-
phers became paramount and led to the invention of machine ciphers, such as
Germany’s Enigma machine. This was a device containing a number of rotors,
each of which had many wires running through its center. Before a letter was
encrypted, the rotors would spin in a predetermined way, thereby altering the
paths of the wires and the resultant output. This created an immensely com-
plicated polyalphabetic cipher in which the number of cipher alphabets was
enormous. Further, the rotors could be removed and replaced in a vast number
of different starting configurations, so breaking the system involved knowing
both the circuits through the rotors and figuring out that day’s initial rotor
configuration.
Despite these difficulties, during World War II the British managed to
decipher a large number of messages encrypted on Enigma machines. They
were aided in this endeavor by Polish cryptographers who, just before hos-
tilities commenced, shared with Britain and France the methods that they
had developed for attacking Enigma. But determining daily rotor configura-
tions and analyzing rotor replacements was still an immensely difficult task,
especially after Germany introduced an improved Enigma machine having an
extra rotor. The existence of Britain’s ULTRA project to decrypt Enigma re-
mained secret until 1974, but there are now several popular accounts. Military
intelligence derived from ULTRA was of vital importance in the Allied war
effort.
Another WWII cryptanalytic success was obtained by United States cryp-
tographers against a Japanese cipher machine that they code-named Purple.
This machine used switches, rather than rotors, but again the effect was to
create an incredibly complicated polyalphabetic cipher. A team of cryptogra-
phers, led by William Friedman, managed to reconstruct the design of the Pur-
ple machine purely by analyzing intercepted encrypted messages. They then
built their own machine and proceeded to decrypt many important diplomatic
messages.
In this section we have barely touched the surface of the history of cryptog-
raphy from antiquity through the middle of the twentieth century. Good start-
ing points for further reading include Simon Singh’s light introduction [139]
and David Kahn’s massive and comprehensive, but fascinating and quite read-
able, book The Codebreakers [63].
1.7 Symmetric and Asymmetric Ciphers
We have now seen several different examples of ciphers, all of which have a
number of features in common. Bob wants to send a secret message to Alice.
He uses a secret key k to scramble his plaintext message m and turn it into a
ciphertext c. Alice, upon receiving c, uses the secret key k to unscramble c and
reconstitute m. If this procedure is to work properly, then both Alice and Bob
must possess copies of the secret key k, and if the system is to provide security,
then their adversary Eve must not know k, must not be able to guess k, and
must not be able to recover m from c without knowing k.
In this section we formulate the notion of a cryptosystem in abstract math-
ematical terms. There are many reasons why this is desirable. In particular,
it allows us to highlight similarities and differences between different systems,
while also providing a framework within which we can rigorously analyze the
security of a cryptosystem against various types of attacks.
1.7.1 Symmetric Ciphers
Returning to Bob and Alice, we observe that they must share knowledge of
the secret key k. Using that secret key, they can both encrypt and decrypt
messages, so Bob and Alice have equal (or symmetric) knowledge and abil-
ities. For this reason, ciphers of this sort are known as symmetric ciphers.
Mathematically, a symmetric cipher uses a key k chosen from a space (i.e.,
a set) of possible keys K to encrypt a plaintext message m chosen from a
space of possible messages M, and the result of the encryption process is a
ciphertext c belonging to a space of possible ciphertexts C.
Thus encryption may be viewed as a function
e : K × M → C
whose domain K×M is the set of pairs (k, m) consisting of a key k and a plain-
text m and whose range is the space of ciphertexts C. Similarly, decryption is
a function
d : K × C → M.
Of course, we want the decryption function to “undo” the results of the en-
cryption function. Mathematically, this is expressed by the formula
d(k, e(k, m)) = m for all k ∈ K and all m ∈ M.
It is sometimes convenient to write the dependence on k as a subscript.
Then for each key k, we get a pair of functions
ek : M −→ C and dk : C −→ M
satisfying the decryption property
dk(ek(m)) = m for all m ∈ M.
In other words, for every key k, the function dk is the inverse function of
the function ek. In particular, this means that ek must be one-to-one, since if
ek(m) = ek(m′), then
m = dk(ek(m)) = dk(ek(m′)) = m′.
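As a concrete toy instance of this formalism, the shift cipher of Sect. 1.1 fits the template exactly, with K = M = C = {0, 1, . . . , 25}; a sketch in Python:

```python
# The shift cipher as an instance of the abstract (K, M, C, e, d) setup:
# K = M = C = {0, 1, ..., 25}, e_k(m) = m + k mod 26, d_k(c) = c - k mod 26.
K = range(26)

def e(k, m):
    return (m + k) % 26

def d(k, c):
    return (c - k) % 26

# For every key k, d_k is the inverse function of e_k.
print(all(d(k, e(k, m)) == m for k in K for m in range(26)))   # True
```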
It is safest for Alice and Bob to assume that Eve knows the encryption
method that is being employed. In mathematical terms, this means that Eve
knows the functions e and d. What Eve does not know is the particular key k
that Alice and Bob are using. For example, if Alice and Bob use a simple
substitution cipher, they should assume that Eve is aware of this fact. This
illustrates a basic premise of modern cryptography called Kerckhoff’s princi-
ple, which says that the security of a cryptosystem should depend only on the
secrecy of the key, and not on the secrecy of the encryption algorithm itself.
If (K, M, C, e, d) is to be a successful cipher, it must have the following
properties:
1. For any key k ∈ K and plaintext m ∈ M, it must be easy to compute
the ciphertext ek(m).
2. For any key k ∈ K and ciphertext c ∈ C, it must be easy to compute the
plaintext dk(c).
3. Given one or more ciphertexts c1, c2, . . . , cn ∈ C encrypted using the
key k ∈ K, it must be very difficult to compute any of the corresponding
plaintexts dk(c1), . . . , dk(cn) without knowledge of k.
Here is another property that is desirable, although more difficult to
achieve.
4. Given one or more pairs of plaintexts and their corresponding cipher-
texts, (m1, c1), (m2, c2), . . . , (mn, cn), it must be very difficult to decrypt
any ciphertext c that is not in the given list without knowing k. This
property is called security against a known plaintext attack.
Even better is to achieve security while allowing the attacker to choose the
known plaintexts.
5. For any list of plaintexts m1, . . . , mn ∈ M chosen by the adversary, even
with knowledge of the corresponding ciphertexts ek(m1), . . . , ek(mn),
it is very difficult to decrypt any ciphertext c that is not in the
given list without knowing k. This is known as security against a cho-
sen plaintext attack. N.B. In this attack, the adversary is allowed to
choose m1, . . . , mn, as opposed to a known plaintext attack, where the
attacker is given a list of plaintext/ciphertext pairs not of his choosing.
Example 1.33. The simple substitution cipher does not have Property 4, since
even a single plaintext/ciphertext pair (m, c) reveals most of the encryption
table. Similarly, the Vigenère cipher discussed in Sect. 5.2 has the property
that a plaintext/ciphertext pair immediately reveals the keyword used for
encryption. Thus both simple substitution and Vigenère ciphers are vulnerable
to known plaintext attacks. See Exercise 1.43 for a further example.
In our list of desirable properties for a cryptosystem, we have left open the
question of what exactly is meant by the words “easy” and “hard.” We defer
a formal discussion of this profound question to Sect. 5.7; see also Sects. 2.1
and 2.6. For now, we informally take “easy” to mean computable in less than
a second on a typical desktop computer and “hard” to mean that all of the
computing power in the world would require several years (at least) to perform
the computation.
1.7.2 Encoding Schemes
It is convenient to view keys, plaintexts, and ciphertexts as numbers and to
write those numbers in binary form. For example, we could take strings of
8 bits,15
which give numbers from 0 to 255, and use them to represent the
letters of the alphabet via
a = 00000000, b = 00000001, c = 00000010, . . . , z = 00011001.
To distinguish lowercase from uppercase, we could let A = 00011011, B =
00011100, and so on. This encoding method allows up to 256 distinct symbols
to be translated into binary form.
Your computer may use a method of this type, called the ASCII code,16
to
store data, although for historical reasons the alphabetic characters are not as-
signed the lowest binary values. Part of the ASCII code is listed in Table 1.10.
For example, the phrase “Bed bug.” (including spacing and punctuation) is
encoded in ASCII as
B e d b u g .
66 101 100 32 98 117 103 46
01000010 01100101 01100100 00100000 01100010 01110101 01100111 00101110
Thus where you see the phrase “Bed bug.”, your computer sees the list of
bits
0100001001100101011001000010000001100010011101010110011100101110.
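The translation from text to bits shown above is mechanical; a sketch in Python that reproduces the encoding of “Bed bug.”:

```python
# Encode the phrase "Bed bug." into bits using the ASCII code, as in the text:
# each character becomes its ASCII value, written as an 8-bit binary number.
phrase = "Bed bug."
codes = [ord(ch) for ch in phrase]
bits = "".join(format(c, "08b") for c in codes)
print(codes)   # [66, 101, 100, 32, 98, 117, 103, 46]
print(bits)    # the 64-bit string shown above
```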
Definition. An encoding scheme is a method of converting one sort of
data into another sort of data, for example, converting text into numbers.
The distinction between an encoding scheme and an encryption scheme is one
15A bit is a 0 or a 1. The word “bit” is an abbreviation for binary digit.
16ASCII is an acronym for American Standard Code for Information Interchange.
(space)   32  00100000
   (      40  00101000
   )      41  00101001
   ,      44  00101100
   .      46  00101110
   A      65  01000001
   B      66  01000010
   C      67  01000011
   D      68  01000100
   ...
   X      88  01011000
   Y      89  01011001
   Z      90  01011010
   a      97  01100001
   b      98  01100010
   c      99  01100011
   d     100  01100100
   ...
   x     120  01111000
   y     121  01111001
   z     122  01111010
Table 1.10: The ASCII encoding scheme
of intent. An encoding scheme is assumed to be entirely public knowledge and
used by everyone for the same purposes. An encryption scheme is designed to
hide information from anyone who does not possess the secret key. Thus an
encoding scheme, like an encryption scheme, consists of an encoding function
and its inverse decoding function, but for an encoding scheme, both functions
are public knowledge and should be fast and easy to compute.
With the use of an encoding scheme, a plaintext or ciphertext may be
viewed as a sequence of binary blocks, where each block consists of 8 bits, i.e.,
of a sequence of eight ones and zeros. A block of 8 bits is called a byte. For
human comprehension, a byte is often written as a decimal number between 0
and 255, or as a two-digit hexadecimal (base 16) number between 00 and FF.
Computers often operate on more than 1 byte at a time. For example, a 64-bit
processor operates on 8 bytes at a time.
1.7.3 Symmetric Encryption of Encoded Blocks
In using an encoding scheme as described in Sect. 1.7.2, it is convenient to
view the elements of the plaintext space M as consisting of bit strings of
a fixed length B, i.e., strings of exactly B ones and zeros. We call B the
blocksize of the cipher. A general plaintext message then consists of a list of
message blocks chosen from M, and the encryption function transforms the
message blocks into a list of ciphertext blocks in C, where each block is a
sequence of B bits. If the plaintext ends with a block of fewer than B bits, we
pad the end of the block with zeros. Keep in mind that this encoding process,
which converts the original plaintext message into a sequence of blocks of bits
in M, is public knowledge.
Encryption and decryption are done one block at a time, so it suffices to
study the process for a single plaintext block, i.e., for a single m ∈ M. This,
of course, is why it is convenient to break a message up into blocks. A message
can be of arbitrary length, so it’s nice to be able to focus the cryptographic
process on a single piece of fixed length. The plaintext block m is a string
of B bits, which for concreteness we identify with the corresponding number
in binary form. In other words, we identify M with the set of integers m
satisfying 0 ≤ m < 2^B via

    m_(B−1) m_(B−2) · · · m_2 m_1 m_0  ←→  m_(B−1) · 2^(B−1) + · · · + m_2 · 2^2 + m_1 · 2 + m_0,

where the left-hand side is the list of B bits of m and the right-hand side is
the corresponding integer between 0 and 2^B − 1. Here m_0, m_1, . . . , m_(B−1) are
each 0 or 1.
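The identification of bit strings with integers can be sketched in a few lines of Python:

```python
# Identify a block of B bits with the integer it represents in binary.
def bits_to_int(bits):
    # bits = [m_(B-1), ..., m_1, m_0], most significant bit first
    return sum(b << i for i, b in enumerate(reversed(bits)))

def int_to_bits(m, B):
    return [(m >> i) & 1 for i in reversed(range(B))]

m = bits_to_int([0, 1, 0, 0, 0, 0, 1, 0])
print(m)                   # 66, the ASCII code for 'B'
print(int_to_bits(66, 8))  # [0, 1, 0, 0, 0, 0, 1, 0]
```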
Similarly, we identify the key space K and the ciphertext space C with sets
of integers corresponding to bit strings of a certain blocksize. For notational
convenience, we denote the blocksizes for keys, plaintexts, and ciphertexts
by Bk, Bm, and Bc. They need not be the same. Thus we have identified K, M,
and C with sets of positive integers
K = {k ∈ Z : 0 ≤ k < 2^Bk},
M = {m ∈ Z : 0 ≤ m < 2^Bm},
C = {c ∈ Z : 0 ≤ c < 2^Bc}.
An important question immediately arises: how large should Alice and Bob
make the set K, or equivalently, how large should they choose the key block-
size Bk? If Bk is too small, then Eve can check every number from 0 to 2^Bk − 1
until she finds Alice and Bob’s key. More precisely, since Eve is assumed to
know the decryption algorithm d (Kerckhoff’s principle), she takes each k ∈ K
and uses it to compute dk(c). Assuming that Eve is able to distinguish between
valid and invalid plaintexts, eventually she will recover the message.
This attack is known as an exhaustive search attack (also sometimes re-
ferred to as a brute-force attack), since Eve exhaustively searches through the
key space. With current technology, an exhaustive search is considered to be
infeasible if the space has at least 2^80 elements. Thus Bob and Alice should
definitely choose Bk ≥ 80.
For many cryptosystems, especially the public key cryptosystems that form
the core of this book, there are refinements on the exhaustive search attack
that effectively replace the size of the space with its square root. These meth-
ods are based on the principle that it is easier to find matching objects (col-
lisions) in a set than it is to find a particular object in the set. We describe
some of these meet-in-the-middle or collision attacks in Sects. 2.7, 5.4, 5.5, 7.2,
and 7.10. If meet-in-the-middle attacks are available, then Alice and Bob
should choose Bk ≥ 160.
1.7.4 Examples of Symmetric Ciphers
Before descending further into a morass of theory and notation, we pause to
give a mathematical description of some elementary symmetric ciphers.
Let p be a large prime,17 say 2^159 < p < 2^160. Alice and Bob take their
key space K, plaintext space M, and ciphertext space C to be the same set,
K = M = C = {1, 2, 3, . . . , p − 1}.
In fancier terminology, K = M = C = F∗p are all taken to be equal to the
group of units in the finite field Fp.
Alice and Bob randomly select a key k ∈ K, i.e., they select an integer k
satisfying 1 ≤ k  p, and they decide to use the encryption function ek de-
fined by
ek(m) ≡ k · m (mod p). (1.9)
Here we mean that ek(m) is set equal to the unique positive integer between 1
and p that is congruent to k · m modulo p. The corresponding decryption
function dk is
dk(c) ≡ k′ · c (mod p),
where k′ is the inverse of k modulo p. It is important to note that although p
is very large, the extended Euclidean algorithm (Remark 1.15) allows us to
calculate k′ in fewer than 2 log2 p + 2 steps. Thus finding k′ from k counts as
“easy” in the world of cryptography.
It is clear that Eve has a hard time guessing k, since there are approximately
2^160 possibilities from which to choose. Is it also difficult for Eve to
recover k if she knows the ciphertext c? The answer is yes, it is still difficult.
Notice that the encryption function
ek : M −→ C
is surjective (onto) for any choice of key k. This means that for every c ∈ C
and any k ∈ K there exists an m ∈ M such that ek(m) = c. Further, any
given ciphertext may represent any plaintext, provided that the plaintext is
encrypted by an appropriate key. Mathematically, this may be rephrased by
saying that given any ciphertext c ∈ C and any plaintext m ∈ M, there exists
a key k such that ek(m) = c. Specifically this is true for the key
k ≡ m^(−1) · c (mod p). (1.10)
This shows that Alice and Bob’s cipher has Properties 1–3 as listed on page 38,
since anyone who knows the key k can easily encrypt and decrypt, but it is
hard to decrypt if you do not know the value of k. However, this cipher does
not have Property 4, since even a single plaintext/ciphertext pair (m, c) allows
Eve to recover the private key k using the formula (1.10).
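A sketch of this cipher in Python, using a small illustrative prime rather than the 160-bit prime the text envisions, makes both decryption and the known plaintext attack (1.10) concrete:

```python
# The multiplication cipher e_k(m) = k*m mod p of (1.9), with a small
# prime for illustration only (the text calls for a 160-bit prime).
p = 7919                      # small prime, illustrative
k = 1001                      # secret key with 1 <= k < p

def e(m):                     # encryption
    return (k * m) % p

k_inv = pow(k, p - 2, p)      # k' = k^(-1) mod p, via Fermat (Remark 1.26)

def d(c):                     # decryption: d_k(c) = k' * c mod p
    return (k_inv * c) % p

m = 1234
c = e(m)
print(d(c) == m)              # True: decryption undoes encryption

# Known plaintext attack (1.10): one pair (m, c) reveals the key.
k_recovered = (pow(m, p - 2, p) * c) % p
print(k_recovered == k)       # True
```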
17There are in fact many primes in the interval 2^159 < p < 2^160. The prime number
theorem implies that almost 1 % of the numbers in this interval are prime. Of course, there
is also the question of identifying a number as prime or composite. There are efficient tests
that do this, even for very large numbers. See Sect. 3.4.
It is also interesting to observe that if Alice and Bob define their encryption
function to be simply multiplication of integers ek(m) = k · m with no reduc-
tion modulo p, then their cipher still has Properties 1 and 2, but Property 3
fails. If Eve tries to decrypt a single ciphertext c = k · m, she still faces the
(moderately) difficult task of factoring a large number. However, if she man-
ages to acquire several ciphertexts c1, c2, . . . , cn, then there is a good chance
that
gcd(c1, c2, . . . , cn) = gcd(k · m1, k · m2, . . . , k · mn)
= k · gcd(m1, m2, . . . , mn)
equals k itself or a small multiple of k. Note that it is an easy task to compute
the greatest common divisor.
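The gcd attack on the unreduced multiplication cipher is a one-liner; a sketch with an illustrative small key:

```python
# Without reduction mod p, the gcd of several ciphertexts exposes the key.
from math import gcd
from functools import reduce

k = 101                         # secret key (illustrative)
ms = [15, 28, 33, 49]           # plaintexts whose gcd happens to be 1
cs = [k * m for m in ms]        # ciphertexts c_i = k * m_i, no reduction
print(reduce(gcd, cs))          # 101: the key itself, since gcd(ms) = 1
```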
This observation provides our first indication of how reduction modulo p
has a wonderful “mixing” effect that destroys properties such as divisibility.
However, reduction is not by itself the ultimate solution. Consider the vulner-
ability of the cipher (1.9) to a known plaintext attack. As noted above, if Eve
can get her hands on both a ciphertext c and its corresponding plaintext m,
then she easily recovers the key by computing
k ≡ m^(−1) · c (mod p).
Thus even a single plaintext/ciphertext pair suffices to reveal the key, so the
encryption function ek given by (1.9) does not have Property 4 on page 38.
There are many variants of this “multiplication-modulo-p” cipher. For
example, since addition is more efficient than multiplication, there is an
“addition-modulo-p” cipher given by
ek(m) ≡ m + k (mod p) and dk(c) ≡ c − k (mod p),
which is nothing other than the shift or Caesar cipher that we studied in
Sect. 1.1. Another variant, called an affine cipher, is a combination of the shift
cipher and the multiplication cipher. The key for an affine cipher consists of
two integers k = (k1, k2) and encryption and decryption are defined by
ek(m) ≡ k1 · m + k2 (mod p),
dk(c) ≡ k′1 · (c − k2) (mod p),
(1.11)
where k′1 is the inverse of k1 modulo p.
The affine cipher has a further generalization called the Hill cipher, in
which the plaintext m, the ciphertext c, and the second part of the key k2 are
replaced by column vectors consisting of n numbers modulo p. The first part of
the key k1 is taken to be an n-by-n matrix with mod p integer entries. Encryption
and decryption are again given by (1.11), but now multiplication k1 · m
is the product of a matrix and a vector, and k′1 is the inverse matrix of k1
modulo p. Both the affine cipher and the Hill cipher are vulnerable to known
plaintext attacks; see Exercises 1.43 and 1.44.
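The affine cipher (1.11) and its known plaintext attack can be sketched in Python; as before, the prime and the key are small illustrative values:

```python
# The affine cipher (1.11): key k = (k1, k2), small prime for illustration.
p = 7919
k1, k2 = 2025, 3333

def e(m):
    return (k1 * m + k2) % p

k1_inv = pow(k1, p - 2, p)          # k1' = k1^(-1) mod p

def d(c):
    return (k1_inv * (c - k2)) % p

print(d(e(4321)) == 4321)           # True

# Known plaintext attack with two pairs: from
# c1 - c2 = k1*(m1 - m2) (mod p) we solve for k1, then k2 = c1 - k1*m1.
m1, m2 = 100, 200
c1, c2 = e(m1), e(m2)
k1_rec = ((c1 - c2) * pow(m1 - m2, p - 2, p)) % p
k2_rec = (c1 - k1_rec * m1) % p
print((k1_rec, k2_rec) == (k1, k2))  # True
```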
Example 1.34. As noted earlier, addition is generally faster than multiplica-
tion, but there is another basic computer operation that is even faster than
addition. It is called exclusive or and is denoted by XOR or ⊕. At the lowest
level, XOR takes two individual bits β ∈ {0, 1} and β′ ∈ {0, 1} and yields

    β ⊕ β′ =  0 if β and β′ are the same,
              1 if β and β′ are different.    (1.12)
If you think of a bit as a number that is 0 or 1, then XOR is the same as
addition modulo 2. More generally, the XOR of 2 bit strings is the result of
performing XOR on each corresponding pair of bits. For example,
10110 ⊕ 11010 = [1 ⊕ 1] [0 ⊕ 1] [1 ⊕ 0] [1 ⊕ 1] [0 ⊕ 0] = 01100.
Using this new operation, Alice and Bob have at their disposal yet another
basic cipher defined by
ek(m) = k ⊕ m and dk(c) = k ⊕ c.
Here K, M, and C are the sets of all binary strings of length B, or equivalently,
the set of all numbers between 0 and 2^B − 1.
This cipher has the advantage of being highly efficient and completely
symmetric in the sense that ek and dk are the same function. If k is chosen
randomly and is used only once, then this cipher is known as Vernam’s one-
time pad. In Sect. 5.57 we show that the one-time pad is provably secure.
Unfortunately, it requires a key that is as long as the plaintext, which makes
it too cumbersome for most practical applications. And if k is used to encrypt
more than one plaintext, then Eve may be able to exploit the fact that
c ⊕ c′ = (k ⊕ m) ⊕ (k ⊕ m′) = m ⊕ m′
to extract information about m or m′. It’s not obvious how Eve would proceed
to find k, m, or m′, but simply the fact that the key k can be removed so
easily, revealing the potentially less random quantity m ⊕ m′, should make a
cryptographer nervous. Further, this method is vulnerable in some situations
to a known plaintext attack; see Exercise 1.48.
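Both facts of Example 1.34, that XOR encryption is its own inverse and that reusing a key leaks m ⊕ m′, are visible in a few lines of Python (the key and messages below are arbitrary illustrative bit patterns):

```python
# XOR encryption e_k(m) = k XOR m (Example 1.34): e_k and d_k coincide,
# and reusing the key k leaks m1 XOR m2.
k  = 0b1011001110001111     # illustrative 16-bit key
m1 = 0b0100101001101011     # two plaintext blocks
m2 = 0b1110000110100101

c1, c2 = k ^ m1, k ^ m2
print(k ^ c1 == m1)         # True: applying e_k again decrypts
print(c1 ^ c2 == m1 ^ m2)   # True: the key cancels out of c1 XOR c2
```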
1.7.5 Random Bit Sequences and Symmetric Ciphers
We have arrived, at long last, at the fundamental question regarding the
creation of secure and efficient symmetric ciphers. Is it possible to use a single
relatively short key k (say consisting of 160 random bits) to securely and
efficiently send arbitrarily long messages? Here is one possible construction.
Suppose that we could construct a function
    R : K × Z −→ {0, 1}
with the following properties:
1. For all k ∈ K and all j ∈ Z, it is easy to compute R(k, j).
2. Given an arbitrarily long sequence of integers j1, j2, . . . , jn and given all
of the values R(k, j1), R(k, j2), . . . , R(k, jn), it is hard to determine k.
3. Given any list of integers j1, j2, . . . , jn and given all of the values
R(k, j1), R(k, j2), . . . , R(k, jn),
it is hard to guess the value of R(k, j) with better than a 50 % chance
of success for any value of j not already in the list.
If we could find a function R with these three properties, then we could
use it to turn an initial key k into a sequence of bits
R(k, 1), R(k, 2), R(k, 3), R(k, 4), . . . , (1.13)
and then we could use this sequence of bits as the key for a one-time pad as
described in Example 1.34.
The fundamental problem with this approach is that the sequence of
bits (1.13) is not truly random, since it is generated by the function R. In-
stead, we say that the sequence of bits (1.13) is a pseudorandom sequence and
we call R a pseudorandom number generator.
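To make the definition concrete, here is one standard (unproven) candidate for R, built from the HMAC-SHA256 function. This particular construction is our illustration, not a claim about what later sections use:

```python
import hmac, hashlib

def R(k: bytes, j: int) -> int:
    # Candidate pseudorandom bit R(k, j): the low bit of HMAC-SHA256(k, j).
    # It is easy to compute given k (property 1), and is believed, though
    # not proven, to be hard to predict without k (properties 2 and 3).
    mac = hmac.new(k, j.to_bytes(8, "big"), hashlib.sha256).digest()
    return mac[0] & 1

def keystream(k: bytes, n: int) -> list:
    # The sequence R(k,1), R(k,2), ..., R(k,n) of (1.13), usable as the
    # key bits of a one-time pad as in Example 1.34.
    return [R(k, j) for j in range(1, n + 1)]
```

The stream is deterministic in k, so Alice and Bob regenerate the same pad from the same short key.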
Do pseudorandom number generators exist? If so, they would provide ex-
amples of the one-way functions defined by Diffie and Hellman in their ground-
breaking paper [38], but despite more than a quarter century of work, no one
has yet proven the existence of even a single such function. We return to this
fascinating subject in Sects. 2.1 and 8.2. For now, we content ourselves with
a few brief remarks.
Although no one has yet conclusively proven that pseudorandom number
generators exist, many candidates have been suggested, and some of these
proposals have withstood the test of time. There are two basic approaches
to constructing candidates for R, and these two methods provide a good il-
lustration of the fundamental conflict in cryptography between security and
efficiency.
The first approach is to repeatedly apply an ad hoc collection of mixing
operations that are well suited to efficient computation and that appear to
be very hard to untangle. This method is, disconcertingly, the basis for most
practical symmetric ciphers, including the Data Encryption Standard (DES)
and the Advanced Encryption Standard (AES), which are the two systems
most widely used today. See Sect. 8.12 for a brief description of these modern
symmetric ciphers.
The second approach is to construct R using a function whose efficient
inversion is a well-known mathematical problem that is believed to be difficult.
This approach provides a far more satisfactory theoretical underpinning for a
symmetric cipher, but unfortunately, all known constructions of this sort are
far less efficient than the ad hoc constructions, and hence are less attractive
for real-world applications.
1.7.6 Asymmetric Ciphers Make a First Appearance
If Alice and Bob want to exchange messages using a symmetric cipher, they
must first mutually agree on a secret key k. This is fine if they have the oppor-
tunity to meet in secret or if they are able to communicate once over a secure
channel. But what if they do not have this opportunity and if every commu-
nication between them is monitored by their adversary Eve? Is it possible for
Alice and Bob to exchange a secret key under these conditions?
Most people’s first reaction is that it is not possible, since Eve sees every
piece of information that Alice and Bob exchange. It was the brilliant insight
of Diffie and Hellman^18 that under certain hypotheses, it is possible. The
search for efficient (and provable) solutions to this problem, which is called
public key (or asymmetric) cryptography, forms one of the most interesting
parts of mathematical cryptography and is the principal focus of this book.
We start by describing a nonmathematical way to visualize public key
cryptography. Alice buys a safe with a narrow slot in the top and puts her
safe in a public location. Everyone in the world is allowed to examine the safe
and see that it is securely made. Bob writes his message to Alice on a piece of
paper and slips it through the slot in the top of the safe. Now only a person
with the key to the safe, which presumably means only Alice, can retrieve
and read Bob’s message. In this scenario, Alice’s public key is the safe, the
encryption algorithm is the process of putting the message in the slot, and the
decryption algorithm is the process of opening the safe with the key. Note that
this setup is not far-fetched; it is used in the real world. For example, the night
deposit slot at a bank has this form, although in practice the “slot” must be
well protected to prevent someone from inserting a long thin pair of tongs and
extracting other people’s deposits!
A useful feature of our “safe-with-a-slot” cryptosystem, which it shares
with actual public key cryptosystems, is that Alice needs to put only one safe
in a public location, and then everyone in the world can use it repeatedly
to send encrypted messages to Alice. There is no need for Alice to provide
a separate safe for each of her correspondents. And there is also no need for
Alice to open the safe and remove Bob’s message before someone else such as
Carl or Dave uses it to send Alice a message.
We are now ready to give a mathematical formulation of an asymmetric
cipher. As usual, there are spaces of keys K, plaintexts M, and ciphertexts C.
However, an element k of the key space is really a pair of keys,
k = (kpriv, kpub),
called the private key and the public key, respectively. For each public key kpub
there is a corresponding encryption function
    e_kpub : M −→ C,
^18 The history is actually somewhat more complicated than this; see our brief discussion
in Sect. 2.1 and the references listed there for further reading.
and for each private key kpriv there is a corresponding decryption function
    d_kpriv : C −→ M.
These have the property that if the pair (kpriv, kpub) is in the key space K, then
    d_kpriv( e_kpub(m) ) = m    for all m ∈ M.
If an asymmetric cipher is to be secure, it must be difficult for Eve to
compute the decryption function d_kpriv(c), even if she knows the public key kpub.
Notice that under this assumption, Alice can send kpub to Bob using an insecure
communication channel, and Bob can send back the ciphertext e_kpub(m),
without worrying that Eve will be able to decrypt the message. To easily
decrypt, it is necessary to know the private key kpriv, and presumably Alice is
the only person with that information. The private key is sometimes called
Alice’s trapdoor information, because it provides a trapdoor (i.e., a shortcut)
for computing the inverse function of e_kpub. The fact that the encryption
and decryption keys kpub and kpriv are different makes the cipher asymmetric,
whence its moniker.
It is quite intriguing that Diffie and Hellman created this concept without
finding a candidate for an actual pair of functions, although they did propose
a similar method by which Alice and Bob can securely exchange a random
piece of data whose value is not known initially to either one. We describe
Diffie and Hellman’s key exchange method in Sect. 2.3 and then go on to
discuss a number of asymmetric ciphers, including Elgamal (Sect. 2.4), RSA
(Sect. 3.2), Goldwasser–Micali (Sect. 3.10), ECC (Sect. 6.4), GGH (Sect. 7.8),
and NTRU (Sect. 7.10), whose security relies on the presumed difficulty of a
variety of different mathematical problems.
Remark 1.35. In practice, asymmetric ciphers tend to be considerably slower
than symmetric ciphers such as DES and AES. For that reason, if Bob needs
to send Alice a large file, he might first use an asymmetric cipher to send
Alice the key to a symmetric cipher, which he would then use to transmit the
actual file.
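This hybrid pattern can be sketched in Python, using “textbook RSA” with absurdly small, insecure parameters as the asymmetric cipher (RSA itself is covered in Sect. 3.2) and a simple XOR keystream as the symmetric cipher; both choices are illustrative stand-ins, not recommendations:

```python
import random

# Toy asymmetric cipher: textbook RSA with tiny primes, n = 61 * 53.
# These numbers are for illustration only and are utterly insecure.
n, e = 3233, 17            # Alice's public key k_pub
d = 2753                   # Alice's private key k_priv (e*d ≡ 1 mod 3120)

def asym_encrypt(m: int) -> int:
    return pow(m, e, n)

def asym_decrypt(c: int) -> int:
    return pow(c, d, n)

def sym_cipher(key: int, data: bytes) -> bytes:
    # Fast symmetric cipher: XOR with a keystream seeded by the key.
    # random.Random is a stand-in keystream generator, not cryptographic.
    rng = random.Random(key)
    return bytes(b ^ rng.randrange(256) for b in data)

# Bob wraps a short symmetric key with the slow asymmetric cipher ...
sym_key = 1234                       # 0 <= sym_key < n
wrapped = asym_encrypt(sym_key)
# ... and sends the large file under the fast symmetric cipher.
body = sym_cipher(sym_key, b"the actual (large) file")

# Alice unwraps the symmetric key, then decrypts the file.
recovered = sym_cipher(asym_decrypt(wrapped), body)
```

Only the short key travels under the slow asymmetric cipher; the bulk data travels under the fast symmetric one.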
Exercises
Section 1.1. Simple Substitution Ciphers
1.1. Build a cipher wheel as illustrated in Fig. 1.1, but with an inner wheel that
rotates, and use it to complete the following tasks. (For your convenience, there
is a cipher wheel that you can print and cut out at www.math.brown.edu/~jhs/
MathCrypto/CipherWheel.pdf.)
(a) Encrypt the following plaintext using a rotation of 11 clockwise.
“A page of history is worth a volume of logic.”
(b) Decrypt the following message, which was encrypted with a rotation of 7 clock-
wise.
AOLYLHYLUVZLJYLAZILAALYAOHUAOLZLJYLAZAOHALCLYFIVKFNBLZZLZ
(c) Decrypt the following message, which was encrypted by rotating 1 clockwise
for the first letter, then 2 clockwise for the second letter, etc.
XJHRFTNZHMZGAHIUETXZJNBWNUTRHEPOMDNBJMAUGORFAOIZOCC
a b c d e f g h i j k l m n o p q r s t u v w x y z
S C J A X U F B Q K T P R W E Z H V L I G Y D N M O
Table 1.11: Simple substitution encryption table for Exercise 1.3
1.2. Decrypt each of the following Caesar encryptions by trying the various possible
shifts until you obtain readable text.
(a) LWKLQNWKDWLVKDOOQHYHUVHHDELOOERDUGORYHOBDVDWUHH
(b) UXENRBWXCUXENFQRLQJUCNABFQNWRCJUCNAJCRXWORWMB
(c) BGUTBMBGZTFHNLXMKTIPBMAVAXXLXTEPTRLEXTOXKHHFYHKMAXFHNLX
1.3. For this exercise, use the simple substitution table given in Table 1.11.
(a) Encrypt the plaintext message
The gold is hidden in the garden.
(b) Make a decryption table, that is, make a table in which the ciphertext alphabet
is in order from A to Z and the plaintext alphabet is mixed up.
(c) Use your decryption table from (b) to decrypt the following message.
IBXLX JVXIZ SLLDE VAQLL DEVAU QLB
1.4. Each of the following messages has been encrypted using a simple substitution
cipher. Decrypt them. For your convenience, we have given you a frequency table
and a list of the most common bigrams that appear in the ciphertext. (If you do not
want to recopy the ciphertexts by hand, they can be downloaded or printed from
the web site listed in the preface.)
(a) “A Piratical Treasure”
JNRZR BNIGI BJRGZ IZLQR OTDNJ GRIHT USDKR ZZWLG OIBTM NRGJN
IJTZJ LZISJ NRSBL QVRSI ORIQT QDEKJ JNRQW GLOFN IJTZX QLFQL
WBIMJ ITQXT HHTBL KUHQL JZKMM LZRNT OBIMI EURLW BLQZJ GKBJT
QDIQS LWJNR OLGRI EZJGK ZRBGS MJLDG IMNZT OIHRK MOSOT QHIJL
QBRJN IJJNT ZFIZL WIZTO MURZM RBTRZ ZKBNN LFRVR GIZFL KUHIM
MRIGJ LJNRB GKHRT QJRUU RBJLW JNRZI TULGI EZLUK JRUST QZLUK
EURFT JNLKJ JNRXR S
The ciphertext contains 316 letters. Here is a frequency table:
R J I L Z T N Q B G K U M O S H W F E D X V
Freq 33 30 27 25 24 20 19 16 15 15 13 12 12 10 9 8 7 6 5 5 3 2
The most frequent bigrams are: JN (11 times), NR (8 times), TQ (6 times), and
LW, RB, RZ, and JL (5 times each).
(b) “A Botanical Code”
KZRNK GJKIP ZBOOB XLCRG BXFAU GJBNG RIXRU XAFGJ BXRME MNKNG
BURIX KJRXR SBUER ISATB UIBNN RTBUM NBIGK EBIGR OCUBR GLUBN
JBGRL SJGLN GJBOR ISLRS BAFFO AZBUN RFAUS AGGBI NGLXM IAZRX
RMNVL GEANG CJRUE KISRM BOOAZ GLOKW FAUKI NGRIC BEBRI NJAWB
OBNNO ATBZJ KOBRC JKIRR NGBUE BRINK XKBAF QBROA LNMRG MALUF
BBG
The ciphertext contains 253 letters. Here is a frequency table:
B R G N A I U K O J L X M F S E Z C T W P V Q
Freq 32 28 22 20 16 16 14 13 12 11 10 10 8 8 7 7 6 5 3 2 1 1 1
The most frequent bigrams are: NG and RI (7 times each), BU (6 times), and BR
(5 times).
(c) In order to make this one a bit more challenging, we have removed all occur-
rences of the word “the” from the plaintext.
“A Brilliant Detective”
GSZES GNUBE SZGUG SNKGX CSUUE QNZOQ EOVJN VXKNG XGAHS AWSZZ
BOVUE SIXCQ NQESX NGEUG AHZQA QHNSP CIPQA OIDLV JXGAK CGJCG
SASUB FVQAV CIAWN VWOVP SNSXV JGPCV NODIX GJQAE VOOXC SXXCG
OGOVA XGNVU BAVKX QZVQD LVJXQ EXCQO VKCQG AMVAX VWXCG OOBOX
VZCSO SPPSN VAXUB DVVAX QJQAJ VSUXC SXXCV OVJCS NSJXV NOJQA
MVBSZ VOOSH VSAWX QHGMV GWVSX CSXXC VBSNV ZVNVN SAWQZ ORVXJ
CVOQE JCGUW NVA
The ciphertext contains 313 letters. Here is a frequency table:
V S X G A O Q C N J U Z E W B P I H K D M L R F
Freq 39 29 29 22 21 21 20 20 19 13 11 11 10 8 8 6 5 5 5 4 3 2 1 1
The most frequent bigrams are: XC (10 times), NV (7 times), and CS, OV, QA, and
SX (6 times each).
1.5. Suppose that you have an alphabet of 26 letters.
(a) How many possible simple substitution ciphers are there?
(b) A letter in the alphabet is said to be fixed if the encryption of the letter is the
letter itself. How many simple substitution ciphers are there that leave:
(i) No letters fixed?
(ii) At least one letter fixed?
(iii) Exactly one letter fixed?
(iv) At least two letters fixed?
(Part (b) is quite challenging! You might try doing the problem first with an alphabet
of four or five letters to get an idea of what is going on.)
Section 1.2. Divisibility and Greatest Common Divisors
1.6. Let a, b, c ∈ Z. Use the definition of divisibility to directly prove the following
properties of divisibility. (This is Proposition 1.4.)
(a) If a | b and b | c, then a | c.
(b) If a | b and b | a, then a = ±b.
(c) If a | b and a | c, then a | (b + c) and a | (b − c).
1.7. Use a calculator and the method described in Remark 1.9 to compute the
following quotients and remainders.
(a) 34787 divided by 353.
(b) 238792 divided by 7843.
(c) 9829387493 divided by 873485.
(d) 1498387487 divided by 76348.
1.8. Use a calculator and the method described in Remark 1.9 to compute the
following remainders, without bothering to compute the associated quotients.
(a) The remainder of 78745 divided by 127.
(b) The remainder of 2837647 divided by 4387.
(c) The remainder of 8739287463 divided by 18754.
(d) The remainder of 4536782793 divided by 9784537.
1.9. Use the Euclidean algorithm to compute the following greatest common divi-
sors.
(a) gcd(291, 252).
(b) gcd(16261, 85652).
(c) gcd(139024789, 93278890).
(d) gcd(16534528044, 8332745927).
1.10. For each of the gcd(a, b) values in Exercise 1.9, use the extended Euclidean
algorithm (Theorem 1.11) to find integers u and v such that au + bv = gcd(a, b).
1.11. Let a and b be positive integers.
(a) Suppose that there are integers u and v satisfying au + bv = 1. Prove that
gcd(a, b) = 1.
(b) Suppose that there are integers u and v satisfying au + bv = 6. Is it necessarily
true that gcd(a, b) = 6? If not, give a specific counterexample, and describe in
general all of the possible values of gcd(a, b).
(c) Suppose that (u1, v1) and (u2, v2) are two solutions in integers to the equation
au + bv = 1. Prove that a divides v2 − v1 and that b divides u2 − u1.
(d) More generally, let g = gcd(a, b) and let (u0, v0) be a solution in integers to
au + bv = g. Prove that every other solution has the form u = u0 + kb/g and
v = v0 − ka/g for some integer k. (This is the second part of Theorem 1.11.)
1.12. The method for solving au+bv = gcd(a, b) described in Sect. 1.2 is somewhat
inefficient. This exercise describes a method to compute u and v that is well suited
for computer implementation. In particular, it uses very little storage.
(a) Show that the following algorithm computes the greatest common divisor g of
the positive integers a and b, together with a solution (u, v) in integers to the
equation au + bv = gcd(a, b).
1. Set u = 1, g = a, x = 0, and y = b
2. If y = 0, set v = (g − au)/b and return the values (g, u, v)
3. Divide g by y with remainder, g = qy + t, with 0 ≤ t < y
4. Set s = u − qx
5. Set u = x and g = y
6. Set x = s and y = t
7. Go To Step (2)
(b) Implement the above algorithm on a computer using the computer language of
your choice.
(c) Use your program to compute g = gcd(a, b) and integer solutions to the equa-
tion au + bv = g for the following pairs (a, b).
(i) (527, 1258)
(ii) (228, 1056)
(iii) (163961, 167181)
(iv) (3892394, 239847)
(d) What happens to your program if b = 0? Fix the program so that it deals with
this case correctly.
(e) It is often useful to have a solution with u > 0. Modify your program so that
it returns a solution with u > 0 and u as small as possible. [Hint. If (u, v) is a
solution, then so is (u + b/g, v − a/g).] Redo (c) using your modified program.
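For reference, here is a direct Python transcription of the algorithm in part (a), which is one possible answer to part (b). As part (d) hints, it assumes b > 0:

```python
def ext_euclid(a: int, b: int):
    # Returns (g, u, v) with g = gcd(a, b) and a*u + b*v = g,
    # following Steps 1-7 of Exercise 1.12(a).  Assumes b > 0.
    u, g, x, y = 1, a, 0, b          # Step 1
    while y != 0:                    # Steps 2 and 7 (loop until y = 0)
        q, t = divmod(g, y)          # Step 3: g = q*y + t with 0 <= t < y
        s = u - q * x                # Step 4
        u, g = x, y                  # Step 5
        x, y = s, t                  # Step 6
    v = (g - a * u) // b             # Step 2's return computation
    return g, u, v
```

For example, ext_euclid(527, 1258) returns g = 17 together with integers u, v satisfying 527u + 1258v = 17.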
1.13. Let a1, a2, . . . , ak be integers with gcd(a1, a2, . . . , ak) = 1, i.e., the largest
positive integer dividing all of a1, . . . , ak is 1. Prove that the equation
a1u1 + a2u2 + · · · + akuk = 1
has a solution in integers u1, u2, . . . , uk. (Hint. Repeatedly apply the extended Eu-
clidean algorithm, Theorem 1.11. You may find it easier to prove a more general
statement in which gcd(a1, . . . , ak) is allowed to be larger than 1.)
1.14. Let a and b be integers with b > 0. We’ve been using the “obvious fact” that a
divided by b has a unique quotient and remainder. In this exercise you will give a
proof.
(a) Prove that the set
{a − bq : q ∈ Z}
contains at least one non-negative integer.
(b) Let r be the smallest non-negative integer in the set described in (a). Prove
that 0 ≤ r < b.
(c) Prove that there are integers q and r satisfying
    a = bq + r  and  0 ≤ r < b.
(d) Suppose that
    a = bq1 + r1 = bq2 + r2  with 0 ≤ r1 < b and 0 ≤ r2 < b.
Prove that q1 = q2 and r1 = r2.
Section 1.3. Modular Arithmetic
1.15. Let m ≥ 1 be an integer and suppose that
a1 ≡ a2 (mod m) and b1 ≡ b2 (mod m).
Prove that
a1 ± b1 ≡ a2 ± b2 (mod m) and a1 · b1 ≡ a2 · b2 (mod m).
(This is Proposition 1.13(a).)
1.16. Write out the following tables for Z/mZ and (Z/mZ)∗, as we did in Figs. 1.4
and 1.5.
(a) Make addition and multiplication tables for Z/3Z.
(b) Make addition and multiplication tables for Z/6Z.
(c) Make a multiplication table for the unit group (Z/9Z)∗.
(d) Make a multiplication table for the unit group (Z/16Z)∗.
1.17. Do the following modular computations. In each case, fill in the blank with an
integer between 0 and m − 1, where m is the modulus.
(a) 347 + 513 ≡ ___ (mod 763).
(b) 3274 + 1238 + 7231 + 6437 ≡ ___ (mod 9254).
(c) 153 · 287 ≡ ___ (mod 353).
(d) 357 · 862 · 193 ≡ ___ (mod 943).
(e) 5327 · 6135 · 7139 · 2187 · 5219 · 1873 ≡ ___ (mod 8157).
(Hint. After each multiplication, reduce modulo 8157 before doing the next
multiplication.)
(f) 137^2 ≡ ___ (mod 327).
(g) 373^6 ≡ ___ (mod 581).
(h) 23^3 · 19^5 · 11^4 ≡ ___ (mod 97).
1.18. Find all values of x between 0 and m − 1 that are solutions of the following
congruences. (Hint. If you can’t figure out a clever way to find the solution(s), you
can just substitute each value x = 1, x = 2,. . . , x = m − 1 and see which ones
work.)
(a) x + 17 ≡ 23 (mod 37).
(b) x + 42 ≡ 19 (mod 51).
(c) x^2 ≡ 3 (mod 11).
(d) x^2 ≡ 2 (mod 13).
(e) x^2 ≡ 1 (mod 8).
(f) x^3 − x^2 + 2x − 2 ≡ 0 (mod 11).
(g) x ≡ 1 (mod 5) and also x ≡ 2 (mod 7). (Find all solutions modulo 35, that is,
find the solutions satisfying 0 ≤ x ≤ 34.)
1.19. Suppose that g^a ≡ 1 (mod m) and that g^b ≡ 1 (mod m). Prove that
    g^gcd(a,b) ≡ 1 (mod m).
1.20. Prove that if a1 and a2 are units modulo m, then a1a2 is a unit modulo m.
1.21. Prove that m is prime if and only if φ(m) = m − 1, where φ is Euler’s phi
function.
1.22. Let m ∈ Z.
(a) Suppose that m is odd. What integer between 1 and m − 1 equals 2^−1 mod m?
(b) More generally, suppose that m ≡ 1 (mod b). What integer between 1 and m − 1
is equal to b^−1 mod m?
1.23. Let m be an odd integer and let a be any integer. Prove that 2m + a^2 can
never be a perfect square. (Hint. If a number is a perfect square, what are its possible
values modulo 4?)
1.24. (a) Find a single value x that simultaneously solves the two congruences
x ≡ 3 (mod 7) and x ≡ 4 (mod 9).
(Hint. Note that every solution of the first congruence looks like x = 3 + 7y for
some y. Substitute this into the second congruence and solve for y; then use
that to get x.)
(b) Find a single value x that simultaneously solves the two congruences
x ≡ 13 (mod 71) and x ≡ 41 (mod 97).
(c) Find a single value x that simultaneously solves the three congruences
x ≡ 4 (mod 7), x ≡ 5 (mod 8), and x ≡ 11 (mod 15).
(d) Prove that if gcd(m, n) = 1, then the pair of congruences
x ≡ a (mod m) and x ≡ b (mod n)
has a solution for any choice of a and b. Also give an example to show that the
condition gcd(m, n) = 1 is necessary.
1.25. Let N, g, and A be positive integers (note that N need not be prime).
Prove that the following algorithm, which is a low-storage variant of the square-
and-multiply algorithm described in Sect. 1.3.2, returns the value g^A (mod N). (In
Step 4 we use the notation ⌊x⌋ to denote the greatest integer function, i.e., round x
down to the nearest integer.)
Input. Positive integers N, g, and A.
1. Set a = g and b = 1.
2. Loop while A > 0.
3. If A ≡ 1 (mod 2), set b = b · a (mod N).
4. Set a = a^2 (mod N) and A = ⌊A/2⌋.
5. If A > 0, continue with loop at Step 2.
6. Return the number b, which equals g^A (mod N).
1.26. Use the square-and-multiply algorithm described in Sect. 1.3.2, or the more
efficient version in Exercise 1.25, to compute the following powers.
(a) 17^183 (mod 256).
(b) 2^477 (mod 1000).
(c) 11^507 (mod 1237).
1.27. Consider the congruence
ax ≡ c (mod m).
(a) Prove that there is a solution if and only if gcd(a, m) divides c.
(b) If there is a solution, prove that there are exactly gcd(a, m) distinct solutions
modulo m.
(Hint. Use the extended Euclidean algorithm (Theorem 1.11).)
Section 1.4. Prime Numbers, Unique Factorization, and Finite Fields
1.28. Let {p1, p2, . . . , pr} be a set of prime numbers, and let
N = p1p2 · · · pr + 1.
Prove that N is divisible by some prime not in the original set. Use this fact to
deduce that there must be infinitely many prime numbers. (This proof of the infini-
tude of primes appears in Euclid’s Elements. Prime numbers have been studied for
thousands of years.)
1.29. Without using the fact that every integer has a unique factorization into
primes, prove that if gcd(a, b) = 1 and if a | bc, then a | c. (Hint. Use the fact that
it is possible to find a solution to au + bv = 1.)
1.30. Compute the following ordp values:
(a) ord2(2816).
(b) ord7(2222574487).
(c) ordp(46375) for each of p = 3, 5, 7, and 11.
1.31. Let p be a prime number. Prove that ordp has the following properties.
(a) ordp(ab) = ordp(a) + ordp(b). (Thus ordp resembles the logarithm function,
since it converts multiplication into addition!)
(b) ordp(a + b) ≥ min( ordp(a), ordp(b) ).
(c) If ordp(a) ≠ ordp(b), then ordp(a + b) = min( ordp(a), ordp(b) ).
A function satisfying properties (a) and (b) is called a valuation.
Section 1.5. Powers and Primitive Roots in Finite Fields
1.32. For each of the following primes p and numbers a, compute a^−1 mod p in two
ways: (i) Use the extended Euclidean algorithm. (ii) Use the fast power algorithm
and Fermat’s little theorem. (See Example 1.27.)
(a) p = 47 and a = 11.
(b) p = 587 and a = 345.
(c) p = 104801 and a = 78467.
1.33. Let p be a prime and let q be a prime that divides p − 1.
(a) Let a ∈ F∗p and let b = a^((p−1)/q). Prove that either b = 1 or else b has
order q. (Recall that the order of b is the smallest k ≥ 1 such that b^k = 1 in F∗p.
Hint. Use Proposition 1.29.)
(b) Suppose that we want to find an element of F∗p of order q. Using (a), we can
randomly choose a value of a ∈ F∗p and check whether b = a^((p−1)/q) satisfies
b ≠ 1. How likely are we to succeed? In other words, compute the value of the
ratio
    #{a ∈ F∗p : a^((p−1)/q) ≠ 1} / #F∗p.
(Hint. Use Theorem 1.30.)
1.34. Recall that g is called a primitive root modulo p if the powers of g give all
nonzero elements of Fp.
(a) For which of the following primes is 2 a primitive root modulo p?
(i) p = 7 (ii) p = 13 (iii) p = 19 (iv) p = 23
(b) For which of the following primes is 3 a primitive root modulo p?
(i) p = 5 (ii) p = 7 (iii) p = 11 (iv) p = 17
(c) Find a primitive root for each of the following primes.
(i) p = 23 (ii) p = 29 (iii) p = 41 (iv) p = 43
(d) Find all primitive roots modulo 11. Verify that there are exactly φ(10) of them,
as asserted in Remark 1.32.
(e) Write a computer program to check for primitive roots and use it to find all
primitive roots modulo 229. Verify that there are exactly φ(228) of them.
(f) Use your program from (e) to find all primes less than 100 for which 2 is a
primitive root.
(g) Repeat the previous exercise to find all primes less than 100 for which 3 is a
primitive root. Ditto to find the primes for which 4 is a primitive root.
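A brute-force sketch of the program requested in part (e); it tests each candidate g by checking whether its powers fill up all of the nonzero residues modulo p:

```python
def is_primitive_root(g: int, p: int) -> bool:
    # g is a primitive root mod p iff the powers g, g^2, ..., g^(p-1)
    # run through all p - 1 nonzero residues modulo p.
    seen, x = set(), 1
    for _ in range(p - 1):
        x = (x * g) % p
        seen.add(x)
    return len(seen) == p - 1

def primitive_roots(p: int) -> list:
    # All primitive roots modulo the prime p.
    return [g for g in range(1, p) if is_primitive_root(g, p)]
```

For p = 11 this returns [2, 6, 7, 8], which indeed has φ(10) = 4 elements; a faster version would instead check g^((p−1)/q) for each prime q dividing p − 1, but the brute-force search is enough for p = 229.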
1.35. Let p be a prime such that q = (p − 1)/2 is also prime. Suppose that g is an
integer satisfying
    g ≢ 0 (mod p),  g ≢ ±1 (mod p),  and  g^q ≢ 1 (mod p).
Prove that g is a primitive root modulo p.
1.36. This exercise begins the study of squares and square roots modulo p.
(a) Let p be an odd prime number and let b be an integer with p ∤ b. Prove that
either b has two square roots modulo p or else b has no square roots modulo p.
In other words, prove that the congruence
    X^2 ≡ b (mod p)
has either two solutions or no solutions in Z/pZ. (What happens for p = 2?
What happens if p | b?)
(b) For each of the following values of p and b, find all of the square roots of b
modulo p.
(i) (p, b) = (7, 2) (ii) (p, b) = (11, 5)
(iii) (p, b) = (11, 7) (iv) (p, b) = (37, 3)
(c) How many square roots does 29 have modulo 35? Why doesn’t this contradict
the assertion in (a)?
(d) Let p be an odd prime and let g be a primitive root modulo p. Then any
number a is equal to some power of g modulo p, say a ≡ g^k (mod p). Prove
that a has a square root modulo p if and only if k is even.
1.37. Let p ≥ 3 be a prime and suppose that the congruence
    X^2 ≡ b (mod p)
has a solution.
(a) Prove that for every exponent e ≥ 1 the congruence
    X^2 ≡ b (mod p^e)    (1.14)
has a solution. (Hint. Use induction on e. Build a solution modulo p^(e+1) by
suitably modifying a solution modulo p^e.)
(b) Let X = α be a solution to X^2 ≡ b (mod p). Prove that in (a), we can find a
solution X = β to X^2 ≡ b (mod p^e) that also satisfies β ≡ α (mod p).
(c) Let β and β′ be two solutions as in (b). Prove that β ≡ β′ (mod p^e).
(d) Use Exercise 1.36 to deduce that the congruence (1.14) has either two solutions
or no solutions modulo p^e.
1.38. Compute the value of
    2^((p−1)/2) (mod p)
for every prime 3 ≤ p < 20. Make a conjecture as to the possible values of
2^((p−1)/2) (mod p) when p is prime, and prove that your conjecture is correct.
Section 1.6. Cryptography by Hand
1.39. Write a 2–5 page paper on one of the following topics, including both cryp-
tographic information and placing events in their historical context:
(a) Cryptography in the Arab world to the fifteenth century.
(b) European cryptography in the fifteenth and early sixteenth centuries.
(c) Cryptography and cryptanalysis in Elizabethan England.
(d) Cryptography and cryptanalysis in the nineteenth century.
(e) Cryptography and cryptanalysis during World War I.
(f) Cryptography and cryptanalysis during World War II.
(Most of these topics are too broad for a short term paper, so you should choose a
particular aspect on which to concentrate.)
1.40. A homophonic cipher is a substitution cipher in which there may be more than
one ciphertext symbol for each plaintext letter. Here is an example of a homophonic
cipher, where the more common letters have several possible replacements.
a b c d e f g h i j k l m n o p q r s t u v w x y z
! 4 # $ 1 %  * ( ) 3 2 = + [ 9 ] { } : ; 7   5 ?
♥ ◦  ℵ 6
♦ ∧  Δ ∇ 8 ♣ Ω ∨ ⊗ ♠ 
Θ ∞ ⇑  •  	 ⊕ ⇐
 ⇓ ⇒ 
Decrypt the following message.
( % Δ ♠ ⇒  # 4 ∞ : ♦ 6
[ ℵ 8 % 2 [ 7 ⇓ ♣  ♥ 5  ∇
1.41. A transposition cipher is a cipher in which the letters of the plaintext remain
the same, but their order is rearranged. Here is a simple example in which the
message is encrypted in blocks of 25 letters at a time.^19
Take the given 25 letters
and arrange them in a 5-by-5 block by writing the message horizontally on the lines.
For example, the first 25 letters of the message
Now is the time for all good men to come to the aid...
is written as
N O W I S
T H E T I
M E F O R
A L L G O
O D M E N
^19 If the number of letters in the message is not an even multiple of 25, then extra random
letters are appended to the end of the message.
Now the ciphertext is formed by reading the letters down the columns, which gives
the ciphertext
NTMAO OHELD WEFLM ITOGE SIRON.
(a) Use this transposition cipher to encrypt the first 25 letters of the message
Four score and seven years ago our fathers...
(b) The following message was encrypted using this transposition cipher.
Decrypt it.
WNOOA HTUFN EHRHE NESUV ICEME
(c) There are many variations on this type of cipher. We can form the letters into a
rectangle instead of a square, and we can use various patterns to place the letters
into the rectangle and to read them back out. Try to decrypt the following
ciphertext, in which the letters were placed horizontally into a rectangle of
some size and then read off vertically by columns.
WHNCE STRHT TEOOH ALBAT DETET SADHE
LEELL QSFMU EEEAT VNLRI ATUDR HTEEA
(For convenience, we’ve written the ciphertext in 5 letter blocks, but that
doesn’t necessarily mean that the rectangle has a side of length 5.)
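The 5-by-5 transposition described above can be sketched as follows. Since writing by rows and reading by columns transposes the square, applying the same function twice recovers the plaintext:

```python
def transpose25(block: str) -> str:
    # Write 25 letters into a 5-by-5 square by rows, then read by columns.
    assert len(block) == 25
    rows = [block[5 * i : 5 * i + 5] for i in range(5)]
    return "".join(rows[r][c] for c in range(5) for r in range(5))
```

For example, transpose25("NOWISTHETIMEFORALLGOODMEN") gives "NTMAOOHELDWEFLMITOGESIRON", matching the ciphertext shown above, and applying transpose25 to that result returns the original 25 letters.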
Section 1.7. Symmetric Ciphers and Asymmetric Ciphers
1.42. Encode the following phrase (including capitalization, spacing and punctua-
tion) into a string of bits using the ASCII encoding scheme given in Table 1.10.
Bad day, Dad.
1.43. Consider the affine cipher with key k = (k1, k2) whose encryption and de-
cryption functions are given by (1.11) on page 43.
(a) Let p = 541 and let the key be k = (34, 71). Encrypt the message m = 204.
Decrypt the ciphertext c = 431.
(b) Assuming that p is public knowledge, explain why the affine cipher is vulnerable
to a known plaintext attack. (See Property 4 on page 38.) How many plain-
text/ciphertext pairs are likely to be needed in order to recover the private
key?
(c) Alice and Bob decide to use the prime p = 601 for their affine cipher. The
value of p is public knowledge, and Eve intercepts the ciphertexts c1 = 324
and c2 = 381 and also manages to find out that the corresponding plaintexts
are m1 = 387 and m2 = 491. Determine the private key and then use it to
encrypt the message m3 = 173.
(d) Suppose now that p is not public knowledge. Is the affine cipher still vulnerable
to a known plaintext attack? If so, how many plaintext/ciphertext pairs are
likely to be needed in order to recover the private key?
1.44. Consider the Hill cipher defined by (1.11),
    ek(m) ≡ k1 · m + k2 (mod p)  and  dk(c) ≡ k1^−1 · (c − k2) (mod p),
where m, c, and k2 are column vectors of dimension n, and k1 is an n-by-n matrix.
(a) We use the vector Hill cipher with p = 7 and the key k1 = ( 1 3 ; 2 2 ) and
k2 = ( 5 ; 4 ), where a semicolon separates the rows of a matrix or vector.
(i) Encrypt the message m = ( 2 ; 1 ).
(ii) What is the matrix k1^−1 used for decryption?
(iii) Decrypt the message c = ( 3 ; 5 ).
(b) Explain why the Hill cipher is vulnerable to a known plaintext attack.
(c) The following plaintext/ciphertext pairs were generated using a Hill cipher with
the prime p = 11. Find the keys k1 and k2.
m1 = ( 5 ; 4 ),  c1 = ( 1 ; 8 ),  m2 = ( 8 ; 10 ),  c2 = ( 8 ; 5 ),  m3 = ( 7 ; 1 ),  c3 = ( 8 ; 7 ).
(d) Explain how any simple substitution cipher that involves a permutation of the
alphabet can be thought of as a special case of a Hill cipher.
1.45. Let N be a large integer and let K = M = C = Z/NZ. For each of the
functions
e : K × M −→ C
listed in (a)–(c), answer the following questions:
• Is e an encryption function?
• If e is an encryption function, what is its associated decryption function d?
• If e is not an encryption function, can you make it into an encryption function
by using some smaller, yet reasonably large, set of keys?
(a) ek(m) ≡ k − m (mod N).
(b) ek(m) ≡ k · m (mod N).
(c) ek(m) ≡ (k + m)^2 (mod N).
1.46. (a) Convert the 12 bit binary number 110101100101 into a decimal integer
between 0 and 2^12 − 1.
(b) Convert the decimal integer m = 37853 into a binary number.
(c) Convert the decimal integer m = 9487428 into a binary number.
(d) Use exclusive or (XOR) to “add” the bit strings 11001010 ⊕ 10011010.
(e) Convert the decimal numbers 8734 and 5177 into binary numbers, combine
them using XOR, and convert the result back into a decimal number.
1.47. Alice and Bob choose a key space K containing 2^56 keys. Eve builds a special-purpose computer that can check 10,000,000,000 keys per second.
(a) How many days does it take Eve to check half of the keys in K?
(b) Alice and Bob replace their key space with a larger set containing 2^B different keys. How large should Alice and Bob choose B in order to force Eve's computer to spend 100 years checking half the keys? (Use the approximation that there are 365.25 days in a year.)
For many years the United States government recommended a symmetric cipher
called DES that used 56 bit keys. During the 1990s, people built special purpose
computers demonstrating that 56 bits provided insufficient security. A new sym-
metric cipher called AES, with 128 bit keys, was developed to replace DES. See
Sect. 8.12 for further information about DES and AES.
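The arithmetic behind Exercise 1.47 is a one-line estimate; this Python sketch (ours) assumes the stated rate of 10^10 keys per second.

```python
import math

rate = 10**10                # keys checked per second
seconds_per_day = 86400

# Part (a): searching half of a 2^56-element key space.
days = (2**56 / 2) / rate / seconds_per_day

# Part (b) approach: solve 2^(B-1) / rate = 100 years for B,
# then round B up to the next integer.
seconds_per_year = 365.25 * seconds_per_day
B = 1 + math.log2(100 * seconds_per_year * rate)
```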
1.48. Explain why the cipher
    ek(m) = k ⊕ m and dk(c) = k ⊕ c
defined by XOR of bit strings is not secure against a known plaintext attack.
Demonstrate your attack by finding the private key used to encrypt the 16-bit ci-
phertext c = 1001010001010111 if you know that the corresponding plaintext is
m = 0010010000101100.
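The key observation for Exercise 1.48 is that c = k ⊕ m implies k = m ⊕ c, so a single known pair reveals the key. A small Python demonstration (ours, using an arbitrary 8-bit example rather than the exercise's strings):

```python
# Known-plaintext attack on the XOR cipher: the key is k = m XOR c.

def recover_key(plaintext_bits, ciphertext_bits):
    """XOR a known plaintext/ciphertext pair (as bit strings) to get the key."""
    m = int(plaintext_bits, 2)
    c = int(ciphertext_bits, 2)
    return format(m ^ c, "0{}b".format(len(plaintext_bits)))

# Toy 8-bit check: encrypt with a key k, then recover k from (m, c).
k, m = 0b10110100, 0b01101001
c = k ^ m
assert recover_key(format(m, "08b"), format(c, "08b")) == format(k, "08b")
```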
1.49. Alice and Bob create a symmetric cipher as follows. Their private key k is a large integer and their messages (plaintexts) are d-digit integers
    M = {m ∈ Z : 0 ≤ m < 10^d}.
To encrypt a message, Alice computes √k to d decimal places, throws away the part to the left of the decimal point, and keeps the remaining d digits. Let α be this d-digit number. (For example, if k = 87 and d = 6, then √87 = 9.32737905... and α = 327379.)
Alice encrypts a message m as
    c ≡ m + α (mod 10^d).
Since Bob knows k, he can also find α, and then he decrypts c by computing m ≡ c − α (mod 10^d).
(a) Alice and Bob choose the secret key k = 11 and use it to encrypt 6-digit integers
(i.e., d = 6). Bob wants to send Alice the message m = 328973. What is the
ciphertext that he sends?
(b) Alice and Bob use the secret key k = 23 and use it to encrypt 8-digit integers.
Alice receives the ciphertext c = 78183903. What is the plaintext m?
(c) Show that the number α used for encryption and decryption is given by the formula
    α = ⌊10^d (√k − ⌊√k⌋)⌋,
where ⌊t⌋ denotes the greatest integer that is less than or equal to t.
(d) (Challenge Problem) If Eve steals a plaintext/ciphertext pair (m, c), then it is clear that she can recover the number α, since α ≡ c − m (mod 10^d). If 10^d is large compared to k, can she also recover the number k? This might be useful, for example, if Alice and Bob use some of the other digits of √k to encrypt subsequent messages.
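For large d, floating-point square roots lack the precision to compute α, but the formula in part (c) can be evaluated exactly with integer square roots, since ⌊10^d · √k⌋ = isqrt(k · 10^(2d)). A Python sketch (ours, not the book's):

```python
from math import isqrt

def alpha(k, d):
    """The d digits of sqrt(k) just after the decimal point, computed
    exactly as floor(10^d * (sqrt(k) - floor(sqrt(k))))."""
    return isqrt(k * 10**(2 * d)) - isqrt(k) * 10**d

def encrypt(m, k, d):
    return (m + alpha(k, d)) % 10**d

def decrypt(c, k, d):
    return (c - alpha(k, d)) % 10**d

# Matches the book's example: k = 87, d = 6 gives alpha = 327379.
assert alpha(87, 6) == 327379
```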
1.50. Bob and Alice use a cryptosystem in which their private key is a (large)
prime k and their plaintexts and ciphertexts are integers. Bob encrypts a message m
by computing the product c = km. Eve intercepts the following two ciphertexts:
c1 = 12849217045006222, c2 = 6485880443666222.
Use the gcd method described in Sect. 1.7.4 to find Bob and Alice’s private key.
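Python's math.gcd carries out the computation directly; note that gcd(c1, c2) = k · gcd(m1, m2), so the private prime k is a factor of the result (our sketch):

```python
from math import gcd

c1 = 12849217045006222
c2 = 6485880443666222

g = gcd(c1, c2)   # k divides g, since g = k * gcd(m1, m2)
```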
Chapter 2
Discrete Logarithms
and Diffie–Hellman
2.1 The Birth of Public Key Cryptography
In 1976, Whitfield Diffie and Martin Hellman published their now famous
paper [38] entitled “New Directions in Cryptography.” In this paper they
formulated the concept of a public key encryption system and made several
groundbreaking contributions to this new field. A short time earlier, Ralph
Merkle had independently isolated one of the fundamental problems and in-
vented a public key construction for an undergraduate project in a computer
science class at Berkeley, but this was little understood at the time. Merkle’s
work “Secure communication over insecure channels” appeared in 1982 [83].
However, it turns out that the concept of public key encryption was orig-
inally discovered by James Ellis while working at the British Government
Communications Headquarters (GCHQ). Ellis’s discoveries in 1969 were clas-
sified as secret material by the British government and were not declassi-
fied and released until 1997, after his death. It is now known that two other
researchers at GCHQ, Malcolm Williamson and Clifford Cocks, discovered
the Diffie–Hellman key exchange algorithm and the RSA public key encryp-
tion system, respectively, before their rediscovery and public dissemination by
Diffie, Hellman, Rivest, Shamir, and Adleman. To learn more about the fas-
cinating history of public key cryptography, see for example [37, 42, 63, 139].
The Diffie–Hellman publication was an extremely important event—it set
forth the basic definitions and goals of a new field of mathematics/computer
science, a field whose existence was dependent on the then emerging age of
the digital computer. Indeed, their paper begins with a call to arms:
© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography,
Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2 2
We stand today on the brink of a revolution in cryptography.
An original or breakthrough scientific idea is often called revolutionary, but
in this instance, as the authors were fully aware, the term revolutionary was
relevant in another sense. Prior to the publication of “New Directions. . . ,”
encryption research in the United States was the domain of the National Se-
curity Agency, and all information in this area was classified. Indeed, until the
mid-1990s, the United States government treated cryptographic algorithms as
munitions, which meant that their export was prosecutable as a treasonable
offense. Eventually, the government realized the futility of trying to prevent
free and open discussion about abstract cryptographic algorithms and the
dubious legality of restricting domestic use of strong cryptographic methods.
However, in order to maintain some control, the government continued to re-
strict export of high security cryptographic algorithms if they were “machine
readable.” Their object, to prevent widespread global dissemination of so-
phisticated cryptography programs to potential enemies of the United States,
was laudable,1
but there were two difficulties that rendered the government’s
policy unworkable.
First, the existence of optical scanners creates a very blurry line between
“machine readable” and “human text.” To protest the government’s policy,
people wrote a three line version of the RSA algorithm in a programming
language called perl and printed it on tee shirts and soda cans, thereby making
these products into munitions. In principle, wearing an “RSA enabled” tee
shirt on a flight from New York to Europe subjected the wearer to a large
fine and a 10 year jail term. Even more amusing (or frightening, depending
on your viewpoint), tattoos of the RSA perl code made people’s bodies into
non-exportable munitions!
Second, although these and other more serious protests and legal chal-
lenges had some effect, the government’s policy was ultimately rendered moot
by a simple reality. Public key algorithms are quite simple, and although it
requires a certain expertise to implement them in a secure fashion, the world is
full of excellent mathematicians and computer scientists and engineers. Thus
government restrictions on the export of “strong crypto” simply encouraged
the creation of cryptographic industries in other parts of the world. The gov-
ernment was able to slow the adoption of strong crypto for a few years, but
it is now possible for anyone to purchase for a nominal sum cryptographic
software that allows completely secure communications.2
1It is surely laudable to keep potential weapons out of the hands of one’s enemies,
but many have argued, with considerable justification, that the government also had the
less benign objective of preventing other governments from using communication methods
secure from United States prying.
2Of course, one never knows what cryptanalytic breakthroughs have been made by the
scientists at the National Security Agency, since virtually all of their research is classified.
The NSA is reputed to be the world’s largest single employer of Ph.D.s in mathematics.
However, in contrast to the situation before the 1970s, there are now far more cryptographers
employed in academia and in the business world than there are in government agencies.
Figure 2.1: Illustration of a one-way trapdoor function. The function f from its domain to its range is easy to compute; the inverse f^(−1) is hard to compute without the trapdoor information, but easy to compute with it.
The first important contribution of Diffie and Hellman in [38] was the def-
inition of a Public Key Cryptosystem (PKC) and its associated components—
one-way functions and trapdoor information. A one-way function is an invert-
ible function that is easy to compute, but whose inverse is difficult to compute.
What does it mean to be “difficult to compute”? Intuitively, a function is dif-
ficult to compute if any algorithm that attempts to compute the inverse in
a “reasonable” amount of time, e.g., less than the age of the universe, will
almost certainly fail, where the phrase “almost certainly” must be defined
probabilistically. (For a more rigorous definition of “hardness,” see Sect. 2.6.)
Secure PKCs are built using one-way functions that have a trapdoor. The
trapdoor is a piece of auxiliary information that allows the inverse to be easily
computed. This idea is illustrated in Fig. 2.1, although it must be stressed
that there is a vast chasm separating the abstract idea of a one-way trapdoor
function and the actual construction of such a function.
As described in Sect. 1.7.6, the key for a public key (or asymmetric) cryptosystem consists of two pieces, a private key kpriv and a public key kpub, where in practice kpub is computed by applying some key-creation algorithm to kpriv. For each public/private key pair (kpriv, kpub) there is an encryption algorithm e_kpub and a corresponding decryption algorithm d_kpriv. The encryption algorithm e_kpub corresponding to kpub is public knowledge and easy to compute. Similarly, the decryption algorithm d_kpriv must be easily computable by someone who knows the private key kpriv, but it should be very difficult to compute for someone who knows only the public key kpub.
One says that the private key kpriv is trapdoor information for the function e_kpub, because without the trapdoor information it is very hard to compute the inverse function to e_kpub, but with the trapdoor information it is easy to compute the inverse. Notice that in particular, the function that is used to create kpub from kpriv must be difficult to invert, since kpub is public knowledge and kpriv allows efficient decryption.
It may come as a surprise to learn that despite years of research, it is
still not known whether one-way functions exist. In fact, a proof of the exis-
tence of one-way functions would simultaneously solve the famous P = NP
problem in complexity theory.3
Various candidates for one-way functions have
been proposed, and some of them are used by modern public key encryption
algorithms. But it must be stressed that the security of these cryptosystems
rests on the assumption that inverting the underlying function (or finding the
private key from the public one) is a hard problem.
The situation is somewhat analogous to theories in physics that gain cred-
ibility over time, as they fail to be disproved and continue to explain or gen-
erate interesting phenomena. Diffie and Hellman made several suggestions
in [38] for one-way functions, including knapsack problems and exponenti-
ation mod q, but they did not produce an example of a PKC, mainly for
lack of finding the right trapdoor information. They did, however, describe a
public key method by which certain material could be securely shared over
an insecure channel. Their method, which is now called Diffie–Hellman key
exchange, is based on the assumption that the discrete logarithm problem
(DLP) is difficult to solve. We discuss the DLP in Sect. 2.2, and then describe
Diffie–Hellman key exchange in Sect. 2.3. In their paper, Diffie and Hellman
also defined a variety of cryptanalytic attacks and introduced the important
concepts of digital signatures and one-way authentication, which we discuss
in Chap. 4 and Sect. 8.5.
With the publication of [38] in 1976, the race was on to invent a practical
public key cryptosystem. Within 2 years, two major papers describing public
key cryptosystems were published: the RSA scheme of Rivest, Shamir, and
Adleman [110] and the knapsack scheme of Merkle and Hellman [84]. Of these
two, only RSA has withstood the test of time, in the sense that its underly-
ing hard problem of integer factorization is still sufficiently computationally
difficult to allow RSA to operate efficiently. By way of contrast, the knap-
sack system of Merkle and Hellman was shown to be insecure at practical
computational levels [124]. However, the cryptanalysis of knapsack systems
introduces important links to hard computational problems in the theory of
integer lattices that we explore in Chap. 7.
2.2 The Discrete Logarithm Problem
The discrete logarithm problem is a mathematical problem that arises in many
settings, including the mod p version described in this section and the elliptic
curve version that will be studied later, in Chap. 6. The first published public
key construction, due to Diffie and Hellman [38], is based on the discrete log-
arithm problem in a finite field Fp, where recall that Fp is a field with a prime
number of elements. (See Sect. 1.4.) For convenience, we interchangeably use
the notations Fp and Z/pZ for this field, and we use equality notation for ele-
ments of Fp and congruence notation for elements of Z/pZ (cf. Remark 1.23).
3The P = NP problem is one of the so-called Millennium Prizes, each of which has a
$1,000,000 prize attached. See Sect. 5.7 for more on P versus NP.
Let p be a (large) prime. Theorem 1.30 tells us that there exists a primitive element g. This means that every nonzero element of Fp is equal to some power of g. In particular, g^(p−1) = 1 by Fermat's little theorem (Theorem 1.24), and no smaller positive power of g is equal to 1. Equivalently, the list of elements
    1, g, g^2, g^3, ..., g^(p−2) ∈ Fp*
is a complete list of the elements in Fp* in some order.
Definition. Let g be a primitive root for Fp and let h be a nonzero element of Fp. The Discrete Logarithm Problem (DLP) is the problem of finding an exponent x such that
    g^x ≡ h (mod p).
The number x is called the discrete logarithm of h to the base g and is denoted by log_g(h).
Remark 2.1. An older term for the discrete logarithm is the index, denoted by ind_g(h). The index terminology is still commonly used in number theory. It is also convenient if there is a danger of confusion between ordinary logarithms and discrete logarithms, since, for example, the quantity log 2 frequently occurs in both contexts.
Remark 2.2. The discrete logarithm problem is a well-posed problem, namely to find an integer exponent x such that g^x = h. However, if there is one solution, then there are infinitely many, because Fermat's little theorem (Theorem 1.24) tells us that g^(p−1) ≡ 1 (mod p). Hence if x is a solution to g^x = h, then x + k(p − 1) is also a solution for every value of k, because
    g^(x+k(p−1)) = g^x · (g^(p−1))^k ≡ h · 1^k ≡ h (mod p).
Thus log_g(h) is defined only up to adding or subtracting multiples of p − 1. In other words, log_g(h) is really defined modulo p − 1. It is not hard to verify (Exercise 2.3(a)) that log_g gives a well-defined function4
    log_g : Fp* −→ Z/(p − 1)Z.    (2.1)
Sometimes, for concreteness, we refer to “the” discrete logarithm as the integer x lying between 0 and p − 2 satisfying the congruence g^x ≡ h (mod p).
Remark 2.3. It is not hard to prove (see Exercise 2.3(b)) that
    log_g(ab) = log_g(a) + log_g(b) for all a, b ∈ Fp*.
4If you have studied complex analysis, you may have noticed an analogy with the complex logarithm, which is not actually well defined on C*. This is due to the fact that e^(2πi) = 1, so log(z) is well defined only up to adding or subtracting multiples of 2πi. The complex logarithm thus defines an isomorphism from C* to the quotient group C/2πiZ, analogous to (2.1).
  n   627^n mod 941 |  n   627^n mod 941 |  h   log_627(h) |  h   log_627(h)
  1        627      | 11        878      |  1        0     | 11       429
  2        732      | 12         21      |  2      183     | 12       835
  3        697      | 13        934      |  3      469     | 13       279
  4        395      | 14        316      |  4      366     | 14       666
  5        182      | 15        522      |  5      356     | 15       825
  6        253      | 16        767      |  6      652     | 16       732
  7        543      | 17         58      |  7      483     | 17       337
  8        760      | 18        608      |  8      549     | 18       181
  9        374      | 19        111      |  9      938     | 19        43
 10        189      | 20        904      | 10      539     | 20       722

Table 2.1: Powers and discrete logarithms for g = 627 modulo p = 941
Thus calling log_g a “logarithm” is reasonable, since it converts multiplication into addition in the same way as the usual logarithm function. In mathematical terminology, the discrete logarithm log_g is a group isomorphism from Fp* to Z/(p − 1)Z.
Example 2.4. The number p = 56509 is prime, and one can check that g = 2 is a primitive root modulo p. How would we go about calculating the discrete logarithm of h = 38679? The only method that is immediately obvious is to compute
    2^2, 2^3, 2^4, 2^5, 2^6, 2^7, ... (mod 56509)
until we find some power that equals 38679. It would be difficult to do this by hand, but using a computer, we find that log_2(h) = 11235. You can verify this by calculating 2^11235 mod 56509 and checking that it is equal to 38679.
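The trial-multiplication search of Example 2.4 takes only a few lines of Python (our sketch; feasible for toy primes, hopeless for cryptographic ones):

```python
def dlog_bruteforce(g, h, p):
    """Smallest x >= 0 with g^x = h (mod p), by successive multiplication."""
    x, power = 0, 1
    h = h % p
    while power != h:
        power = (power * g) % p
        x += 1
        if x == p:        # cycled through every residue: no solution exists
            return None
    return x
```

Running dlog_bruteforce(2, 38679, 56509) reproduces the value log_2(38679) = 11235 found in the example.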
Remark 2.5. It must be emphasized that the discrete logarithm bears lit-
tle resemblance to the continuous logarithm defined on the real or complex
numbers. The terminology is still reasonable, because in both instances the
process of exponentiation is inverted—but exponentiation modulo p varies in
a very irregular way with the exponent, contrary to the behavior of its contin-
uous counterpart. The random-looking behavior of exponentiation modulo p
is apparent from even a cursory glance at a table of values such as those in
Table 2.1, where we list the first few powers and the first few discrete loga-
rithms for the prime p = 941 and the base g = 627. The seeming randomness
is also illustrated by the scatter graph of 627^i mod 941 pictured in Fig. 2.2.
Remark 2.6. Our statement of the discrete logarithm problem includes the
assumption that the base g is a primitive root modulo p, but this is not strictly
necessary. In general, for any g ∈ Fp* and any h ∈ Fp*, the discrete logarithm problem is the determination of an exponent x satisfying g^x ≡ h (mod p), assuming that such an x exists.
Figure 2.2: Powers 627^i mod 941 for i = 1, 2, 3, ... (scatter plot not reproduced)

More generally, rather than taking nonzero elements of a finite field Fp and multiplying them together or raising them to powers, we can take elements of any group and use the group law instead of multiplication. This leads to the most general form of the discrete logarithm problem. (If you are unfamiliar with the theory of groups, we give a brief overview in Sect. 2.5.)
Definition. Let G be a group whose group law we denote by the symbol ⋆. The Discrete Logarithm Problem for G is to determine, for any two given elements g and h in G, an integer x satisfying
    g ⋆ g ⋆ · · · ⋆ g = h,
where g appears x times on the left-hand side.
2.3 Diffie–Hellman Key Exchange
The Diffie–Hellman key exchange algorithm solves the following dilemma.
Alice and Bob want to share a secret key for use in a symmetric cipher, but
their only means of communication is insecure. Every piece of information that
they exchange is observed by their adversary Eve. How is it possible for Alice
and Bob to share a key without making it available to Eve? At first glance it
appears that Alice and Bob face an impossible task. It was a brilliant insight
of Diffie and Hellman that the difficulty of the discrete logarithm problem
for F∗
p provides a possible solution.
The first step is for Alice and Bob to agree on a large prime p and a
nonzero integer g modulo p. Alice and Bob make the values of p and g public
knowledge; for example, they might post the values on their web sites, so Eve
knows them, too. For various reasons to be discussed later, it is best if they choose g such that its order in Fp* is a large prime. (See Exercise 1.33 for a way of finding such a g.)
The next step is for Alice to pick a secret integer a that she does not reveal
to anyone, while at the same time Bob picks an integer b that he keeps secret.
Bob and Alice use their secret integers to compute
    A ≡ g^a (mod p)  (Alice computes this)  and  B ≡ g^b (mod p)  (Bob computes this).
They next exchange these computed values, Alice sends A to Bob and Bob
sends B to Alice. Note that Eve gets to see the values of A and B, since they
are sent over the insecure communication channel.
Finally, Bob and Alice again use their secret integers to compute
    A′ ≡ B^a (mod p)  (Alice computes this)  and  B′ ≡ A^b (mod p)  (Bob computes this).
The values that they compute, A′ and B′ respectively, are actually the same, since
    A′ ≡ B^a ≡ (g^b)^a ≡ g^(ab) ≡ (g^a)^b ≡ A^b ≡ B′ (mod p).
This common value is their exchanged key. The Diffie–Hellman key exchange
algorithm is summarized in Table 2.2.
Public parameter creation
    A trusted party chooses and publishes a (large) prime p
    and an integer g having large prime order in Fp*.
Private computations
    Alice: Choose a secret integer a.  Compute A ≡ g^a (mod p).
    Bob:   Choose a secret integer b.  Compute B ≡ g^b (mod p).
Public exchange of values
    Alice sends A to Bob:  A →
    Bob sends B to Alice:  ← B
Further private computations
    Alice: Compute the number B^a (mod p).
    Bob:   Compute the number A^b (mod p).
    The shared secret value is B^a ≡ (g^b)^a ≡ g^(ab) ≡ (g^a)^b ≡ A^b (mod p).

Table 2.2: Diffie–Hellman key exchange
Example 2.7. Alice and Bob agree to use the prime p = 941 and the primitive root g = 627. Alice chooses the secret key a = 347 and computes A = 390 ≡ 627^347 (mod 941). Similarly, Bob chooses the secret key b = 781 and computes B = 691 ≡ 627^781 (mod 941). Alice sends Bob the number 390 and Bob sends Alice the number 691. Both of these transmissions are done over an insecure channel, so both A = 390 and B = 691 should be considered public knowledge. The numbers a = 347 and b = 781 are not transmitted and remain secret. Then Alice and Bob are both able to compute the number
    470 ≡ 627^(347·781) ≡ A^b ≡ B^a (mod 941),
so 470 is their shared secret.
Suppose that Eve sees this entire exchange. She can reconstitute Alice's and Bob's shared secret if she can solve either of the congruences
    627^a ≡ 390 (mod 941) or 627^b ≡ 691 (mod 941),
since then she will know one of their secret exponents. As far as is known,
this is the only way for Eve to find the secret shared value without Alice’s or
Bob’s assistance.
Of course, our example uses numbers that are much too small to afford Al-
ice and Bob any real security, since it takes very little time for Eve’s computer
to check all possible powers of 627 modulo 941. Current guidelines suggest
that Alice and Bob choose a prime p having approximately 1000 bits (i.e., p ≈ 2^1000) and an element g whose order is prime and approximately p/2.
Then Eve will face a truly difficult task.
In general, Eve’s dilemma is this. She knows the values of A and B, so she
knows the values of ga
and gb
. She also knows the values of g and p, so if she
can solve the DLP, then she can find a and b, after which it is easy for her to
compute Alice and Bob’s shared secret value gab
. It appears that Alice and
Bob are safe provided that Eve is unable to solve the DLP, but this is not
quite correct. It is true that one method of finding Alice and Bob’s shared
value is to solve the DLP, but that is not the precise problem that Eve needs
to solve. The security of Alice’s and Bob’s shared key rests on the difficulty
of the following, potentially easier, problem.
Definition. Let p be a prime number and g an integer. The Diffie–Hellman Problem (DHP) is the problem of computing the value of g^(ab) (mod p) from the known values of g^a (mod p) and g^b (mod p).
It is clear that the DHP is no harder than the DLP. If Eve can solve the DLP, then she can compute Alice and Bob's secret exponents a and b from the intercepted values A = g^a and B = g^b, and then it is easy for her to compute their shared key g^(ab). (In fact, Eve needs to compute only one of a and b.) But the converse is less clear. Suppose that Eve has an algorithm that efficiently solves the DHP. Can she use it to also efficiently solve the DLP? The answer is not known.
2.4 The Elgamal Public Key Cryptosystem
Although the Diffie–Hellman key exchange algorithm provides a method of
publicly sharing a random secret key, it does not achieve the full goal of being
a public key cryptosystem, since a cryptosystem permits exchange of specific
information, not just a random string of bits. The first public key cryptosys-
tem was the RSA system of Rivest, Shamir, and Adleman [110], which they
published in 1978. RSA was, and still is, a fundamentally important discovery,
and we discuss it in detail in Chap. 3. However, although RSA was historically
first, the most natural development of a public key cryptosystem following the
Diffie–Hellman paper [38] is a system described by Taher Elgamal in 1985 [41].
The Elgamal public key encryption algorithm is based on the discrete log prob-
lem and is closely related to Diffie–Hellman key exchange from Sect. 2.3. In
this section we describe the version of the Elgamal PKC that is based on the discrete logarithm problem for Fp*, but the construction works quite generally using the DLP in any group. In particular, in Sect. 6.4.2 we discuss a version of the Elgamal PKC based on elliptic curve groups.
The Elgamal PKC is our first example of a public key cryptosystem, so
we proceed slowly and provide all of the details. Alice begins by publishing
information consisting of a public key and an algorithm. The public key is
simply a number, and the algorithm is the method by which Bob encrypts
his messages using Alice’s public key. Alice does not disclose her private key,
which is another number. The private key allows Alice, and only Alice, to
decrypt messages that have been encrypted using her public key.
This is all somewhat vague and applies to any public key cryptosystem. For
the Elgamal PKC, Alice needs a large prime number p for which the discrete logarithm problem in Fp* is difficult, and she needs an element g modulo p of
large (prime) order. She may choose p and g herself, or they may have been
preselected by some trusted party such as an industry panel or government
agency.
Alice chooses a secret number a to act as her private key, and she computes
the quantity
    A ≡ g^a (mod p).
Notice the resemblance to Diffie–Hellman key exchange. Alice publishes her
public key A and she keeps her private key a secret.
Now suppose that Bob wants to encrypt a message using Alice’s pub-
lic key A. We will assume that Bob’s message m is an integer between 2
and p. (Recall that we discussed how to convert messages into numbers in
Sect. 1.7.2.) In order to encrypt m, Bob first randomly chooses another num-
ber k modulo p.5
Bob uses k to encrypt one, and only one, message, and then
5Most public key cryptosystems require the use of random numbers in order to operate
securely. The generation of random or random-looking integers is actually a delicate process.
We discuss the problem of generating pseudorandom numbers in Sect. 8.2, but for now we
ignore this issue and assume that Bob has no trouble generating random numbers modulo p.
he discards it. The number k is called a random element; it exists for the sole
purpose of encrypting a single message.
Bob takes his plaintext message m, his random element k, and Alice’s
public key A and uses them to compute the two quantities
    c1 ≡ g^k (mod p) and c2 ≡ m · A^k (mod p).
(Remember that g and p are public parameters, so Bob also knows their val-
ues.) Bob’s ciphertext, i.e., his encryption of m, is the pair of numbers (c1, c2),
which he sends to Alice.
How does Alice decrypt Bob's ciphertext (c1, c2)? Since Alice knows a, she can compute the quantity
    x ≡ (c1^a)^(−1) (mod p).
She can do this by first computing c1^a (mod p) using the fast power algorithm and then computing the inverse using the extended Euclidean algorithm. Alternatively, she can just use fast powering to compute c1^(p−1−a) (mod p). Alice next multiplies c2 by x, and lo and behold, the resulting value is the plaintext m.
To see why, we expand the value of x · c2 and find that
    x · c2 ≡ (c1^a)^(−1) · c2 (mod p),             since x ≡ (c1^a)^(−1) (mod p),
           ≡ (g^(ak))^(−1) · (m · A^k) (mod p),    since c1 ≡ g^k, c2 ≡ m · A^k (mod p),
           ≡ (g^(ak))^(−1) · (m · (g^a)^k) (mod p), since A ≡ g^a (mod p),
           ≡ m (mod p),                             since the g^(ak) terms cancel out.
The Elgamal public key cryptosystem is summarized in Table 2.3.
What is Eve’s task in trying to decrypt the message? Eve knows the pub-
lic parameters p and g, and she also knows the value of A ≡ ga
(mod p),
since Alice’s public key A is public knowledge. If Eve can solve the dis-
crete logarithm problem, then she can find a and decrypt the message. More
precisely, it’s enough for Eve to solve the Diffie–Hellman problem; see Exer-
cise 2.9. Otherwise it appears difficult for Eve to find the plaintext, although
there are subtleties, some of which we’ll discuss after doing an example with
small numbers.
Example 2.8. Alice uses the prime p = 467 and the primitive root g = 2. She chooses a = 153 to be her private key and computes her public key
    A ≡ g^a ≡ 2^153 ≡ 224 (mod 467).
Bob decides to send Alice the message m = 331. He chooses a random element, say he chooses k = 197, and he computes the two quantities
    c1 ≡ 2^197 ≡ 87 (mod 467) and c2 ≡ 331 · 224^197 ≡ 57 (mod 467).
The pair (c1, c2) = (87, 57) is the ciphertext that Bob sends to Alice.
Public parameter creation
    A trusted party chooses and publishes a large prime p
    and an element g modulo p of large (prime) order.
Key creation (Alice)
    Choose private key 1 ≤ a ≤ p − 1.
    Compute A = g^a (mod p).
    Publish the public key A.
Encryption (Bob)
    Choose plaintext m and a random element k.
    Use Alice's public key A to compute c1 = g^k (mod p) and c2 = m · A^k (mod p).
    Send ciphertext (c1, c2) to Alice.
Decryption (Alice)
    Compute (c1^a)^(−1) · c2 (mod p). This quantity is equal to m.

Table 2.3: Elgamal key creation, encryption, and decryption
Alice, knowing a = 153, first computes
    x ≡ (c1^a)^(−1) ≡ c1^(p−1−a) ≡ 87^313 ≡ 14 (mod 467).
Finally, she computes
    c2 · x ≡ 57 · 14 ≡ 331 (mod 467)
and recovers the plaintext message m.
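Example 2.8 can likewise be checked end to end with a short Python sketch (ours); again pow(·, ·, p) does the fast powering, and c1^(p−1−a) gives (c1^a)^(−1):

```python
# Elgamal encryption and decryption with the toy parameters of Example 2.8.
p, g = 467, 2                  # public parameters
a = 153                        # Alice's private key
A = pow(g, a, p)               # Alice's public key (224 in the example)

m, k = 331, 197                # Bob's plaintext and random element
c1 = pow(g, k, p)              # first ciphertext component
c2 = (m * pow(A, k, p)) % p    # second ciphertext component

x = pow(c1, p - 1 - a, p)      # Alice computes (c1^a)^(-1) by fast powering
assert (x * c2) % p == m       # decryption recovers the plaintext
```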
Remark 2.9. In the Elgamal cryptosystem, the plaintext is an integer m be-
tween 2 and p − 1, while the ciphertext consists of two integers c1 and c2 in
the same range. Thus in general it takes twice as many bits to write down the
ciphertext as it does to write down the plaintext. We say that Elgamal has a
2-to-1 message expansion.
It’s time to raise an important question. Is the Elgamal system as hard for
Eve to attack as the Diffie–Hellman problem? Or, by introducing a clever way
of encrypting messages, have we unwittingly opened a back door that makes
it easy to decrypt messages without solving the Diffie–Hellman problem? One
of the goals of modern cryptography is to identify an underlying hard problem
like the Diffie–Hellman problem and to prove that a given cryptographic con-
struction like Elgamal is at least as hard to attack as the underlying problem.
In this case we would like to prove that anyone who can decrypt arbitrary
ciphertexts created by Elgamal encryption, as summarized in Table 2.3, must
also be able to solve the Diffie–Hellman problem. Specifically, we would like
to prove the following:
Proposition 2.10. Fix a prime p and base g to use for Elgamal encryption.
Suppose that Eve has access to an oracle that decrypts arbitrary Elgamal ci-
phertexts encrypted using arbitrary Elgamal public keys. Then she can use the
oracle to solve the Diffie–Hellman problem described on page 69.
Conversely, if Eve can solve the Diffie–Hellman problem, then she can
break the Elgamal PKC.
Proof. Rather than giving a compact formal proof, we will be more discursive
and explain how one might approach the problem of using an Elgamal oracle to
solve the Diffie–Hellman problem. Recall that in the Diffie–Hellman problem,
Eve is given the two values
    A ≡ g^a (mod p)   and   B ≡ g^b (mod p),
and she is required to compute the value of g^(ab) (mod p). Keep in mind that
she knows both of the values A and B, but she does not know either of the
values a and b.
Now suppose that Eve can consult an Elgamal oracle. This means that
Eve can send the oracle a prime p, a base g, a purported public key A, and
a purported cipher text (c1, c2). Referring to Table 2.3, the oracle returns to
Eve the quantity
    (c1^a)^(-1) · c2 (mod p).
If Eve wants to solve the Diffie–Hellman problem, what values of c1 and c2
should she choose? A little thought shows that c1 = B = g^b and c2 = 1 are
good choices, since with this input, the oracle returns (g^(ab))^(-1) (mod p), and
then Eve can take the inverse modulo p to obtain g^(ab) (mod p), thereby solving
the Diffie–Hellman problem.
But maybe the oracle is smart enough to know that it should never decrypt
ciphertexts having c2 = 1. Eve can still fool the oracle by sending it random-
looking ciphertexts as follows. She chooses an arbitrary value for c2 and tells
the oracle that the public key is A and that the ciphertext is (B, c2). The
oracle returns to her the supposed plaintext m that satisfies
    m ≡ (c1^a)^(-1) · c2 ≡ (B^a)^(-1) · c2 ≡ (g^(ab))^(-1) · c2 (mod p).
After the oracle tells Eve the value of m, she simply computes
    m^(-1) · c2 ≡ g^(ab) (mod p)
to find the value of g^(ab) (mod p). It is worth noting that although, with the
oracle's help, Eve has computed g^(ab) (mod p), she has done so without
knowledge of a or b, so she has solved only the Diffie–Hellman problem, not the
discrete logarithm problem.
We leave the proof of the converse, i.e., that a Diffie–Hellman oracle breaks
the Elgamal PKC, as an exercise; see Exercise 2.9.
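The oracle attack in the proof can be simulated with toy numbers. The parameters below (p = 467, g = 2, and the secret exponents a and b) are illustrative choices, not values from the text; the secrets appear only to play the role of the decryption oracle.

```python
# Simulating the Elgamal-oracle attack from the proof of Proposition 2.10.
p, g = 467, 2
a, b = 153, 197                          # secret exponents, used only by the oracle
A, B = pow(g, a, p), pow(g, b, p)        # Eve knows A = g^a and B = g^b

def oracle(c1, c2):
    # Decryption oracle for Alice's key A: returns (c1^a)^(-1) * c2 mod p.
    return pow(c1, p - 1 - a, p) * c2 % p

c2 = 123                                 # arbitrary value chosen by Eve
m = oracle(B, c2)                        # m ≡ (g^(ab))^(-1) * c2 (mod p)
g_ab = c2 * pow(m, p - 2, p) % p         # Eve computes c2 * m^(-1) mod p
assert g_ab == pow(g, a * b, p)          # Eve has recovered g^(ab)
```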
2.5 An Overview of the Theory of Groups
For readers unfamiliar with the theory of groups, we briefly introduce a few
basic concepts that should help to place the study of discrete logarithms, both
here and in Chap. 6, into a broader context.
We’ve just spent some time talking about exponentiation of elements in F_p^*.
Since exponentiation is simply repeated multiplication, this seems like a good
place to start. What we’d like to do is to underline some important properties
of multiplication in F_p^* and to point out that these attributes appear in many
other contexts.
The properties are:
• There is an element 1 ∈ F_p^* satisfying 1 · a = a for every a ∈ F_p^*.
• Every a ∈ F_p^* has an inverse a^(-1) ∈ F_p^* satisfying a · a^(-1) = a^(-1) · a = 1.
• Multiplication is associative: a · (b · c) = (a · b) · c for all a, b, c ∈ F_p^*.
• Multiplication is commutative: a · b = b · a for all a, b ∈ F_p^*.
Suppose that instead of multiplication in F_p^*, we substitute addition in F_p. We
also use 0 in place of 1 and −a in place of a^(-1). Then all four properties are
still true:
• 0 + a = a for every a ∈ F_p.
• Every a ∈ F_p has an inverse −a ∈ F_p with a + (−a) = (−a) + a = 0.
• Addition is associative, a + (b + c) = (a + b) + c for all a, b, c ∈ F_p.
• Addition is commutative, a + b = b + a for all a, b ∈ F_p.
Sets and operations that behave similarly to multiplication or addition are
so widespread that it is advantageous to abstract the general concept and talk
about all such systems at once. This leads to the notion of a group.
Definition. A group consists of a set G and a rule, which we denote by ⋆,
for combining two elements a, b ∈ G to obtain an element a ⋆ b ∈ G. The
composition operation ⋆ is required to have the following three properties:
[Identity Law]     There is an e ∈ G such that
                   e ⋆ a = a ⋆ e = a for every a ∈ G.
[Inverse Law]      For every a ∈ G there is a (unique) a^(-1) ∈ G
                   satisfying a ⋆ a^(-1) = a^(-1) ⋆ a = e.
[Associative Law]  a ⋆ (b ⋆ c) = (a ⋆ b) ⋆ c for all a, b, c ∈ G.
If, in addition, composition satisfies the
[Commutative Law]  a ⋆ b = b ⋆ a for all a, b ∈ G,
then the group is called a commutative group or an abelian group.
If G has finitely many elements, we say that G is a finite group. The order
of G is the number of elements in G; it is denoted by |G| or #G.
Example 2.11. Groups are ubiquitous in mathematics and in the physical
sciences. Here are a few examples, the first two repeating those mentioned
earlier:
(a) G = F_p^* and ⋆ = multiplication. The identity element is e = 1. Proposition
1.21 tells us that inverses exist. Then G is a finite group of order p − 1.
(b) G = Z/NZ and ⋆ = addition. The identity element is e = 0 and the
inverse of a is −a. This G is a finite group of order N.
(c) G = Z and ⋆ = addition. The identity element is e = 0 and the inverse
of a is −a. This group G is an infinite group.
(d) Note that G = Z and ⋆ = multiplication is not a group, since most
elements do not have multiplicative inverses inside Z.
(e) However, G = R^* and ⋆ = multiplication is a group, since all elements
have multiplicative inverses inside R^*.
(f) An example of a noncommutative group is
        G = { [a b; c d] : a, b, c, d ∈ R and ad − bc ≠ 0 }
with operation ⋆ = matrix multiplication, where [a b; c d] denotes the 2-by-2
matrix with rows (a, b) and (c, d). The identity element is e = [1 0; 0 1] and
the inverse is given by the familiar formula
        [a b; c d]^(-1) = [d/(ad−bc)  −b/(ad−bc); −c/(ad−bc)  a/(ad−bc)].
Notice that G is noncommutative, since for example, [1 1; 0 1][1 1; 1 0] is not
equal to [1 1; 1 0][1 1; 0 1].
(g) More generally, we can use matrices of any size. This gives the general
linear group
        GL_n(R) = { n-by-n matrices A with real coefficients and det(A) ≠ 0 }
and operation ⋆ = matrix multiplication. We can form other groups
by replacing R with some other field, for example, the finite field F_p.
(See Exercise 2.15.) The group GL_n(F_p) is clearly a finite group, but
computing its order is an interesting exercise.
Let g be an element of a group G and let x be a positive integer. Then g^x
means that we apply the group operation to x copies of the element g,
        g^x = g ⋆ g ⋆ g ⋆ · · · ⋆ g   (x repetitions).
For example, exponentiation g^x in the group F_p^* has the usual meaning,
multiply x copies of g. But “exponentiation” g^x in the group Z/NZ means to add x
copies of g. Admittedly, it is more common to write the quantity “add x copies
of g” as x·g, but this is just a matter of notation. The key concept underlying
exponentiation in a group is repeated application of the group operation to
an element of the group.
It is also convenient to give a meaning to g^x when x is not positive. So if x
is a negative integer, we define g^x to be (g^(-1))^|x|. For x = 0, we set g^0 = e,
the identity element of G.
We now introduce a key concept used in the study of groups.
Definition. Let G be a group and let a ∈ G be an element of the group.
Suppose there exists a positive integer d with the property that a^d = e. The
smallest such d is called the order of a. If there is no such d, then a is said to
have infinite order.
We next prove two propositions describing important properties of the
orders of group elements. These are generalizations of Theorem 1.24 (Fermat’s
little theorem) and Proposition 1.29, which deal with the group G = F_p^*. The
proofs are essentially the same.
Proposition 2.12. Let G be a finite group. Then every element of G has
finite order. Further, if a ∈ G has order d and if a^k = e, then d | k.
Proof. Since G is finite, the sequence
        a, a^2, a^3, a^4, . . .
must eventually contain a repetition. That is, there exist positive integers i
and j with j < i such that a^i = a^j. Multiplying both sides by a^(-j) and applying
the group laws leads to a^(i−j) = e. Since i − j > 0, this proves that some power
of a is equal to e. We let d be the smallest positive exponent satisfying a^d = e.
Now suppose that k ≥ d also satisfies a^k = e. We divide k by d to obtain
        k = dq + r with 0 ≤ r < d.
Using the fact that a^k = a^d = e, we find that
        e = a^k = a^(dq+r) = (a^d)^q ⋆ a^r = e^q ⋆ a^r = a^r.
But d is the smallest positive power of a that is equal to e, so we must
have r = 0. Therefore k = dq, so d | k.
Proposition 2.13 (Lagrange’s Theorem). Let G be a finite group and let
a ∈ G. Then the order of a divides the order of G.
More precisely, let n = |G| be the order of G and let d be the order of a,
i.e., a^d is the smallest positive power of a that is equal to e. Then
        a^n = e and d | n.
Proof. We give a simple proof in the case that G is commutative. For a proof
in the general case, see any basic algebra textbook, for example [40, §3.2]
or [45, §2.3].
Since G is finite, we can list its elements as
        G = {g1, g2, . . . , gn}.
We now multiply each element of G by a to obtain a new set, which we call Sa,
        Sa = {a ⋆ g1, a ⋆ g2, . . . , a ⋆ gn}.
We claim that the elements of Sa are distinct. To see this, suppose that
a ⋆ gi = a ⋆ gj. Multiplying both sides by a^(-1) yields gi = gj.6 Thus Sa
contains n distinct elements, which is the same as the number of elements
of G. Therefore Sa = G, so if we multiply together all of the elements of Sa,
we get the same answer as multiplying together all of the elements of G. (Note
that we are using the assumption that G is commutative.) Thus
        (a ⋆ g1) ⋆ (a ⋆ g2) ⋆ · · · ⋆ (a ⋆ gn) = g1 ⋆ g2 ⋆ · · · ⋆ gn.
We can rearrange the order of the product on the left-hand side (again using
the commutativity) to obtain
        a^n ⋆ g1 ⋆ g2 ⋆ · · · ⋆ gn = g1 ⋆ g2 ⋆ · · · ⋆ gn.
Now multiplying by (g1 ⋆ g2 ⋆ · · · ⋆ gn)^(-1) yields a^n = e, which proves the
first statement, and then the divisibility of n by d follows immediately from
Proposition 2.12.
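Propositions 2.12 and 2.13 can be illustrated with a brute-force order computation in F_p^*. This is a minimal sketch; the element 9704 in F_17389^* (used later in Example 2.22) has order 1242, which, as Lagrange's theorem predicts, divides |F_17389^*| = 17388.

```python
# Brute-force computation of the order of an element of (Z/pZ)*: the order
# of a is the smallest d >= 1 with a^d ≡ 1 (mod p).
def element_order(a, p):
    d, power = 1, a % p
    while power != 1:
        power = power * a % p
        d += 1
    return d

d = element_order(9704, 17389)
assert d == 1242 and 17388 % d == 0     # order divides |G| (Lagrange)
```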
2.6 How Hard Is the Discrete Logarithm
Problem?
Given a group G and two elements g, h ∈ G, the discrete logarithm problem
asks for an exponent x such that g^x = h. What does it mean to talk
about the difficulty of this problem? How can we quantify “hard”? A natural
measure of hardness is the approximate number of operations necessary for
a person or a computer to solve the problem using the most efficient method
currently known. For example, we can solve the discrete logarithm problem
by computing the list of values g, g^2, g^3, . . . until we find one that is equal
to h. If g has order n, then this algorithm is guaranteed to find the solution
6 We are being somewhat informal here, as is usually done when one is working with
groups. Here is a more formal proof. We are given that a ⋆ gi = a ⋆ gj. We use this assumption
and the group law axioms to compute
    gi = e ⋆ gi = (a^(-1) ⋆ a) ⋆ gi = a^(-1) ⋆ (a ⋆ gi) = a^(-1) ⋆ (a ⋆ gj) = (a^(-1) ⋆ a) ⋆ gj = e ⋆ gj = gj.
in at most n multiplications, but if n is large, say n > 2^80, then it is not a
practical algorithm with the computing power available today.
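The brute-force method just described can be sketched as follows; the check uses the DLP instance that is solved in Example 2.22 below. The sketch assumes a solution exists (otherwise the loop would not terminate).

```python
# Brute-force discrete log: compute g, g^2, g^3, ... until we reach h.
# Practical only when the order of g is small; assumes a solution exists.
def dlp_bruteforce(g, h, p):
    x, power = 1, g % p
    while power != h % p:
        power = power * g % p
        x += 1
    return x

# Example 2.22: 9704^x = 13896 in F_17389 has the solution x = 1159.
assert dlp_bruteforce(9704, 13896, 17389) == 1159
```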
Alternatively, we might try choosing random values of x, compute g^x, and
check if g^x = h. Using the fast exponentiation method described in Sect. 1.3.2,
it takes a small multiple of log2(x) modular multiplications to compute g^x.
If n and x are k-bit numbers, that is, they are each approximately 2^k, then this
trial-and-error approach requires about k·2^k multiplications. If we are working
in the group F_p^* and if we treat modular addition as our basic operation,
then modular multiplication of two k-bit numbers takes (approximately) k^2
basic operations, so solving the DLP by trial-and-error takes a small multiple
of k^2 · 2^k basic operations.
We are being somewhat imprecise when we talk about “small multiples”
of 2^k or k · 2^k or k^2 · 2^k. This is because when we want to know whether a
computation is feasible, numbers such as 3 · 2^k and 10 · 2^k and 100 · 2^k mean
pretty much the same thing if k is large. The important property is that
the constant multiple is fixed as k increases. Order notation was invented
to make these ideas precise.7 It is prevalent throughout mathematics and
computer science and provides a handy way to get a grip on the magnitude
of quantities.
Definition (Order Notation). Let f(x) and g(x) be functions of x taking
values that are positive. We say that “f is big-O of g” and write
        f(x) = O(g(x))
if there are positive constants c and C such that
        f(x) ≤ c·g(x) for all x ≥ C.
In particular, we write f(x) = O(1) if f(x) is bounded for all x ≥ C.
The next proposition gives a method that can sometimes be used to prove
that f(x) = O(g(x)).
Proposition 2.14. If the limit
        lim_(x→∞) f(x)/g(x)
exists (and is finite), then f(x) = O(g(x)).
Proof. Let L be the limit. By definition of limit, for any ε > 0 there is a
constant C_ε such that
        |f(x)/g(x) − L| < ε for all x > C_ε.
7 Although we use the same word for the order of a finite group and the order of growth
of a function, they are two different concepts. Make sure that you don’t confuse them.
In particular, taking ε = 1, we find that
        f(x)/g(x) < L + 1 for all x > C_1.
Hence by definition, f(x) = O(g(x)) with c = L + 1 and C = C_1.
Example 2.15. We have 2x^3 − 3x^2 + 7 = O(x^3), since
        lim_(x→∞) (2x^3 − 3x^2 + 7) / x^3 = 2.
Similarly, we have x^2 = O(2^x), since
        lim_(x→∞) x^2 / 2^x = 0.
(If you don’t know the value of this limit, use L’Hôpital’s rule twice.)
However, note that we may have f(x) = O(g(x)) even if the limit of
f(x)/g(x) does not exist. For example, the limit
        lim_(x→∞) (x + 2)cos^2(x) / x
does not exist, but
        (x + 2)cos^2(x) = O(x), since (x + 2)cos^2(x) ≤ x + 2 ≤ 2x for all x ≥ 2.
Example 2.16. Here are a few more examples of big-O notation. We leave the
verification as an exercise.
(a) x^2 + √x = O(x^2).              (d) (ln k)^375 = O(k^0.001).
(b) 5 + 6x^2 − 37x^5 = O(x^5).      (e) k^2·2^k = O(e^(2k)).
(c) k^300 = O(2^k).                 (f) N^10·2^N = O(e^N).
Order notation allows us to define several fundamental concepts that are
used to get a rough handle on the computational complexity of mathematical
problems.
Definition. Suppose that we are trying to solve a certain type of mathemat-
ical problem, where the input to the problem is a number whose size may
vary. As an example, consider the Integer Factorization Problem, whose input
is a number N and whose output is a prime factor of N. We are interested
in knowing how long it takes to solve the problem in terms of the size of the
input. Typically, one measures the size of the input by its number of bits,
since that is how much storage it takes to record the input.
Suppose that there is a constant A ≥ 0, independent of the size of the
input, such that if the input is O(k) bits long, then it takes O(k^A) steps to
solve the problem. Then the problem is said to be solvable in polynomial time.
If we can take A = 1, then the problem is solvable in linear time, and if we can
take A = 2, then the problem is solvable in quadratic time. Polynomial-time
algorithms are considered to be fast algorithms.
On the other hand, if there is a constant c > 0 such that for inputs of
size O(k) bits, there is an algorithm to solve the problem in O(e^(ck)) steps,
then the problem is solvable in exponential time. Exponential-time algorithms
are considered to be slow algorithms.
Intermediate between polynomial-time algorithms and exponential-time
algorithms are subexponential-time algorithms. These have the property that
for every ε > 0, they solve the problem in O(e^(εk)) steps. This notation
means that the constants c and C appearing in the definition of order
notation are allowed to depend on ε. For example, in Chap. 3 we will study
a subexponential-time algorithm for the integer factorization problem whose
running time is O(e^(c√(k log k))) steps.
As a general rule of thumb in cryptography, problems solvable in polyno-
mial time are considered to be “easy” and problems that require exponential
time are viewed as “hard,” with subexponential time lying somewhere in be-
tween. However, bear in mind that these are asymptotic descriptions that are
applicable only as the variables become very large. Depending on the big-O
constants and on the size of the input, an exponential problem may be easier
than a polynomial problem. We illustrate these general concepts by consider-
ing the discrete logarithm problem in various groups.
Example 2.17. We start with our original discrete logarithm problem g^x = h
in G = F_p^*. If the prime p is chosen between 2^k and 2^(k+1), then g, h, and p
all require at most k bits, so the problem can be stated in O(k) bits. (Notice
that O(k) is the same as O(log2 p).)
If we try to solve the DLP using the trial-and-error method mentioned
earlier, then it takes O(p) steps to solve the problem. Since O(p) = O(2^k),
this algorithm takes exponential time. (If we consider instead multiplication
or addition to be the basic operation, then the algorithm takes O(k · 2^k)
or O(k^2 · 2^k) steps, but these distinctions are irrelevant; the running time is
still exponential, since for example it is O(3^k).)
However, there are faster ways to solve the DLP in F_p^*, some of which
are very fast but work only for some primes, while others are less fast, but
work for all primes. For example, the Pohlig–Hellman algorithm described in
Sect. 2.9 shows that if p − 1 factors entirely into a product of small primes,
then the DLP is quite easy. For arbitrary primes, the algorithm described in
Sect. 2.7 solves the DLP in O(√p · log p) steps, which is much faster than O(p),
but still exponential. Even better is the index calculus algorithm described in
Sect. 3.8. The index calculus solves the DLP in O(e^(c√((log p)(log log p)))) steps, so
it is a subexponential algorithm.
Example 2.18. We next consider the DLP in the group G = F_p, where now
the group operation is addition. The DLP in this context asks for a solution x
to the congruence
        x · g ≡ h (mod p),
where g and h are given elements of Z/pZ. As described in Sect. 1.3, we
can solve this congruence using the extended Euclidean algorithm (Theorem
1.11) to compute g^(-1) (mod p) and setting x ≡ g^(-1) · h (mod p). This
takes O(log p) steps (see Remark 1.15), so there is a linear-time algorithm to
solve the DLP in the additive group F_p. This is a very fast algorithm, so the
DLP in F_p with addition is not a good candidate for use as a one-way function
in cryptography.
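A one-line sketch of this linear-time solution follows; the numbers (g = 5, h = 8, p = 11) are illustrative choices, not from the text.

```python
# The additive DLP x·g ≡ h (mod p) is solved by multiplying by g^(-1) mod p.
def additive_dlp(g, h, p):
    g_inv = pow(g, p - 2, p)      # inverse of g modulo the prime p (Fermat)
    return g_inv * h % p

x = additive_dlp(5, 8, 11)
assert x * 5 % 11 == 8            # x solves x·5 ≡ 8 (mod 11)
```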
This is an important lesson to learn. The discrete logarithm problems in
different groups may display different levels of difficulty for their solution.
Thus the DLP in F_p with addition has a linear-time solution, while the best
known general algorithm to solve the DLP in F_p^* with multiplication is
subexponential. In Chap. 6 we discuss another sort of group called an elliptic curve.
The discrete logarithm problem for elliptic curves is believed to be even more
difficult than the DLP for F_p^*. In particular, if the elliptic curve group is
chosen carefully and has N elements, then the best known algorithm to solve the
DLP requires O(√N) steps. Thus it currently takes exponential time to solve
the elliptic curve discrete logarithm problem (ECDLP).
2.7 A Collision Algorithm for the DLP
In this section we describe a discrete logarithm algorithm due to Shanks. It
is an example of a collision, or meet-in-the-middle, algorithm. Algorithms of
this type are discussed in more detail in Sects. 5.4 and 5.5. Shanks’s algorithm
works in any group, not just F_p^*, and the proof that it works is no more difficult
for arbitrary groups, so we state and prove it in full generality.
We begin by recalling the running time of the trivial brute-force algorithm
to solve the DLP.
Proposition 2.19 (Trivial Bound for DLP). Let G be a group and let g ∈ G
be an element of order N. (Recall that this means that g^N = e and that
no smaller positive power of g is equal to the identity element e.) Then the
discrete logarithm problem
        g^x = h                                                  (2.2)
can be solved in O(N) steps and O(1) storage, where each step consists of
multiplication by g.
Proof. We simply compute g, g^2, g^3, . . . , where each successive value is
obtained by multiplying the previous value by g, so we only need to store two
values at a time. If a solution to g^x = h exists, then h will appear before we
reach g^N.
Remark 2.20. If we work in F_p^*, then each computation of g^x (mod p)
requires O((log p)^k) computer operations, where the constant k and the implied
big-O constant depend on the computer and the algorithm used for modular
multiplication. Then the total number of computer steps, or running time,
is O(N(log p)^k). In general, the factor contributed by the O((log p)^k) is
negligible, so we will suppress it and simply refer to the running time as O(N).
The idea behind a collision algorithm is to make two lists and look for
an element that appears in both lists. For the discrete logarithm problem
described in Proposition 2.19, the running time of a collision algorithm is a
little more than O(√N) steps, which is a huge savings over O(N) if N is
large.
Proposition 2.21 (Shanks’s Babystep–Giantstep Algorithm). Let G be a
group and let g ∈ G be an element of order N ≥ 2. The following algorithm
solves the discrete logarithm problem g^x = h in O(√N · log N) steps
using O(√N) storage.
(1) Let n = 1 + ⌊√N⌋, so in particular, n > √N.
(2) Create two lists,
        List 1: e, g, g^2, g^3, . . . , g^n,
        List 2: h, h · g^(−n), h · g^(−2n), h · g^(−3n), . . . , h · g^(−n^2).
(3) Find a match between the two lists, say g^i = h · g^(−jn).
(4) Then x = i + jn is a solution to g^x = h.
Proof. We begin with a couple of observations. First, when creating List 2,
we start by computing the quantity u = g^(−n) and then compile List 2 by
computing h, h · u, h · u^2, . . . , h · u^n. Thus creating the two lists takes
approximately 2n multiplications.8 Second, assuming that a match exists, we can
find a match in a small multiple of n log(n) steps using standard sorting and
searching algorithms, so Step (3) takes O(n log n) steps. Hence the total
running time for the algorithm is O(n log n) = O(√N log N). For this last step
we have used the fact that n ≈ √N, so
        n log n ≈ √N log √N = (1/2)√N log N.
Third, the lists in Step (2) have length n, so require O(√N) storage.
In order to prove that the algorithm works, we must show that Lists 1
and 2 always have a match. To see this, let x be the unknown solution to
g^x = h and write x as
        x = nq + r with 0 ≤ r < n.
8 Multiplication by g is a “baby step” and multiplication by u = g^(−n) is a “giant step,”
whence the name of the algorithm.
 k    g^k    h·u^k |  k    g^k    h·u^k |  k    g^k    h·u^k |  k    g^k    h·u^k
 1    9704     347 |  9   15774   16564 | 17   10137   10230 | 25    4970   12260
 2    6181   13357 | 10   12918   11741 | 18   17264    3957 | 26    9183    6578
 3    5763   12423 | 11   16360   16367 | 19    4230    9195 | 27   10596    7705
 4    1128   13153 | 12   13259    7315 | 20    9880   13628 | 28    2427    1425
 5    8431    7928 | 13    4125    2549 | 21    9963   10126 | 29    6902    6594
 6   16568    1139 | 14   16911   10221 | 22   15501    5416 | 30   11969   12831
 7   14567    6259 | 15    4351   16289 | 23    6854   13640 | 31    6045    4754
 8    2987   12013 | 16    1612    4062 | 24   15680    5276 | 32    7583   14567
Table 2.4: Babystep–giantstep to solve 9704^x ≡ 13896 (mod 17389)
We know that 1 ≤ x < N, so
        q = (x − r)/n < N/n < n    since n > √N.
Hence we can rewrite the equation g^x = h as
        g^r = h · g^(−qn) with 0 ≤ r < n and 0 ≤ q < n.
Thus g^r is in List 1 and h · g^(−qn) is in List 2, which shows that Lists 1 and 2
have a common element.
Example 2.22. We illustrate Shanks’s babystep–giantstep method by using it
to solve the discrete logarithm problem
        g^x = h in F_p^* with g = 9704, h = 13896, and p = 17389.
The number 9704 has order 1242 in F_17389^*.9 Set n = ⌊√1242⌋ + 1 = 36 and
u = g^(−n) = 9704^(−36) = 2494. Table 2.4 lists the values of g^k and h · u^k for
k = 1, 2, . . . . From the table we find the collision
        9704^7 = 14567 = 13896 · 2494^32 in F_17389.
Using the fact that 2494 = 9704^(−36), we compute
        13896 = 9704^7 · 2494^(−32) = 9704^7 · (9704^36)^32 = 9704^1159 in F_17389.
Hence x = 1159 solves the problem 9704^x = 13896 in F_17389.
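Example 2.22 can be reproduced with a short implementation of Proposition 2.21. This is a minimal sketch for F_p^*: the dictionary plays the role of the sorted List 1, and `bsgs` is an illustrative name, not the book's notation.

```python
# Shanks's babystep-giantstep algorithm in F_p^*, following Proposition 2.21.
import math

def bsgs(g, h, p, N):
    n = math.isqrt(N) + 1                 # n = 1 + floor(sqrt(N)) > sqrt(N)
    baby = {}                             # List 1: maps g^i -> i for 0 <= i <= n
    power = 1
    for i in range(n + 1):
        baby.setdefault(power, i)
        power = power * g % p
    u = pow(pow(g, n, p), p - 2, p)       # u = g^(-n), inverse via Fermat (p prime)
    value = h % p                         # List 2 entries h * u^j, built on the fly
    for j in range(n + 1):
        if value in baby:
            return baby[value] + j * n    # g^i = h * g^(-jn), so x = i + jn
        value = value * u % p
    return None                           # no match: no solution exists

# Example 2.22: g = 9704 has order 1242 in F_17389^*, and x = 1159.
assert bsgs(9704, 13896, 17389, 1242) == 1159
```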
2.8 The Chinese Remainder Theorem
The Chinese remainder theorem describes the solutions to a system of simul-
taneous linear congruences. The simplest situation is a system of two congru-
ences,
9 Lagrange’s theorem (Proposition 2.13) says that the order of g divides 17388 = 2^2 ·
3^3 · 7 · 23. So we can determine the order of g by computing g^n for the 48 distinct divisors
of 17388, although in practice there are more efficient methods.
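The strategy of footnote 9 can be sketched directly: test g^d for the divisors d of p − 1 in increasing order, and the first one giving 1 is the order. This naive enumeration of divisors is an illustration, not one of the "more efficient methods."

```python
# Finding the order of g in F_p^* by testing the divisors of p - 1 (footnote 9).
def order_via_divisors(g, p):
    n = p - 1                             # the order of g must divide p - 1
    for d in sorted(d for d in range(1, n + 1) if n % d == 0):
        if pow(g, d, p) == 1:
            return d

# 17388 = 2^2 * 3^3 * 7 * 23 has 48 divisors; the order of 9704 is 1242.
assert order_via_divisors(9704, 17389) == 1242
```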
x ≡ a (mod m) and x ≡ b (mod n), (2.3)
with gcd(m, n) = 1, in which case the Chinese remainder theorem says that
there is a unique solution modulo mn.
The first recorded instance of a problem of this type appears in a Chinese
mathematical work from the late third or early fourth century. It actually
deals with the harder problem of three simultaneous congruences.
We have a number of things, but we do not know exactly how
many. If we count them by threes, we have two left over. If we
count them by fives, we have three left over. If we count them by
sevens, we have two left over. How many things are there? [Sun Tzu
Suan Ching (Master Sun’s Mathematical Manual) circa 300 AD,
volume 3, problem 26.]
The Chinese remainder theorem and its generalizations have many appli-
cations in number theory and other areas of mathematics. In Sect. 2.9 we will
see how it can be used to solve certain instances of the discrete logarithm
problem. We begin with an example in which we solve two simultaneous con-
gruences. As you read this example, notice that it is not merely an abstract
statement that a solution exists. The method that we describe is really an
algorithm that allows us to find the solution.
Example 2.23. We look for an integer x that simultaneously solves both of
the congruences
x ≡ 1 (mod 5) and x ≡ 9 (mod 11). (2.4)
The first congruence tells us that x ≡ 1 (mod 5), so the full set of solutions
to the first congruence is the collection of integers
x = 1 + 5y, y ∈ Z. (2.5)
Substituting (2.5) into the second congruence in (2.4) gives
1 + 5y ≡ 9 (mod 11), and hence 5y ≡ 8 (mod 11). (2.6)
We solve for y by multiplying both sides of (2.6) by the inverse of 5 mod-
ulo 11. This inverse exists because gcd(5, 11) = 1 and can be computed using
the procedure described in Proposition 1.13 (see also Remark 1.15). How-
ever, in this case the modulus is so small that we find it by trial and error;
thus 5 · 9 = 45 ≡ 1 (mod 11).
In any case, multiplying both sides of (2.6) by 9 yields
y ≡ 9 · 8 ≡ 72 ≡ 6 (mod 11).
Finally, substituting this value of y into (2.5) gives the solution
x = 1 + 5 · 6 = 31
to the original problem.
The procedure outlined in Example 2.23 can be used to derive a general
formula for the solution of two simultaneous congruences (see Exercise 2.20),
but it is much better to learn the method, rather than memorizing a for-
mula. This is especially true because the Chinese remainder theorem applies
to systems of arbitrarily many simultaneous congruences.
Theorem 2.24 (Chinese Remainder Theorem). Let m1, m2, . . . , mk be a
collection of pairwise relatively prime integers. This means that
        gcd(mi, mj) = 1 for all i ≠ j.
Let a1, a2, . . . , ak be arbitrary integers. Then the system of simultaneous
congruences
        x ≡ a1 (mod m1), x ≡ a2 (mod m2), . . . , x ≡ ak (mod mk)    (2.7)
has a solution x = c. Further, if x = c and x = c′ are both solutions, then
        c ≡ c′ (mod m1m2 · · · mk).                                  (2.8)
Proof. Suppose that for some value of i we have already managed to find a
solution x = ci to the first i simultaneous congruences,
x ≡ a1 (mod m1), x ≡ a2 (mod m2), . . . , x ≡ ai (mod mi). (2.9)
For example, if i = 1, then c1 = a1 works. We are going to explain how to
find a solution to one more congruence,
x ≡ a1 (mod m1), x ≡ a2 (mod m2), . . . , x ≡ ai+1 (mod mi+1).
The idea is to look for a solution having the form
x = ci + m1m2 · · · miy.
Notice that this value of x still satisfies all of the congruences (2.9), so we
need merely choose y so that it also satisfies x ≡ ai+1 (mod mi+1). In other
words, we need to find a value of y satisfying
ci + m1m2 · · · miy ≡ ai+1 (mod mi+1).
Proposition 1.13(b) and the fact that gcd(mi+1, m1m2 · · · mi) = 1 imply that
we can always do this. This completes the proof of the existence of a solution.
We leave to you the task of proving that different solutions satisfy (2.8); see
Exercise 2.21.
The proof of the Chinese remainder theorem (Theorem 2.24) is easily con-
verted into an algorithm for finding the solution to a system of simultaneous
congruences. An example suffices to illustrate the general method.
Example 2.25. We solve the three simultaneous congruences
x ≡ 2 (mod 3), x ≡ 3 (mod 7), x ≡ 4 (mod 16). (2.10)
The Chinese remainder theorem says that there is a unique solution mod-
ulo 336, since 336 = 3 · 7 · 16. We start with the solution x = 2 to the first
congruence x ≡ 2 (mod 3). We use it to form the general solution x = 2 + 3y
and substitute it into the second congruence to get
2 + 3y ≡ 3 (mod 7).
This simplifies to 3y ≡ 1 (mod 7), and we multiply both sides by 5 (since 5 is
the inverse of 3 modulo 7) to get y ≡ 5 (mod 7). This gives the value
x = 2 + 3y = 2 + 3 · 5 = 17
as a solution to the first two congruences in (2.10).
The general solution to the first two congruences is thus x = 17 + 21z. We
substitute this into the third congruence to obtain
17 + 21z ≡ 4 (mod 16).
This simplifies to 5z ≡ 3 (mod 16). We multiply by 13, which is the inverse
of 5 modulo 16, to obtain
z ≡ 3 · 13 ≡ 39 ≡ 7 (mod 16).
Finally, we substitute this into x = 17 + 21z to get the solution
x = 17 + 21 · 7 = 164.
All other solutions are obtained by adding and subtracting multiples of 336
to this particular solution.
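The iterative procedure used in Examples 2.23 and 2.25 (and in the proof of Theorem 2.24) can be sketched as follows. The function name `crt` is an illustrative choice, and `pow(M, -1, m)` computes the modular inverse, which requires Python 3.8 or later.

```python
# Iterative CRT solver mirroring the proof of Theorem 2.24: extend a partial
# solution one congruence at a time by choosing y with c + M*y ≡ a (mod m),
# where M is the product of the moduli handled so far.
def crt(residues, moduli):
    c, M = residues[0] % moduli[0], moduli[0]
    for a, m in zip(residues[1:], moduli[1:]):
        y = (a - c) * pow(M, -1, m) % m   # solve c + M*y ≡ a (mod m)
        c, M = c + M * y, M * m
    return c

assert crt([1, 9], [5, 11]) == 31          # Example 2.23
assert crt([2, 3, 4], [3, 7, 16]) == 164   # Example 2.25
```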
2.8.1 Solving Congruences with Composite Moduli
It is usually easiest to solve a congruence with a composite modulus by first
solving several congruences modulo primes (or prime powers) and then fitting
together the solutions using the Chinese remainder theorem. We illustrate
the principle in this section by discussing the problem of finding square roots
modulo m. It turns out that it is relatively easy to compute square roots
modulo a prime. Indeed, for primes congruent to 3 modulo 4, it is extremely
easy to find square roots, as shown by the following proposition.
Proposition 2.26. Let p be a prime satisfying p ≡ 3 (mod 4). Let a be an
integer such that the congruence x^2 ≡ a (mod p) has a solution, i.e., such
that a has a square root modulo p. Then
        b ≡ a^((p+1)/4) (mod p)
is a solution, i.e., it satisfies b^2 ≡ a (mod p). (N.B. This formula is valid
only if a has a square root modulo p. In Sect. 3.9 we will describe an efficient
method for checking which numbers have square roots modulo p.)
Proof. Let g be a primitive root modulo p. Then a is equal to some power
of g, and the fact that a has a square root modulo p means that a is an even
power of g, say a ≡ g^(2k) (mod p). (See Exercise 2.5.) Now we compute
        b^2 ≡ a^((p+1)/2) (mod p)           definition of b,
            ≡ (g^(2k))^((p+1)/2) (mod p)    since a ≡ g^(2k) (mod p),
            ≡ g^((p+1)k) (mod p)
            ≡ g^(2k+(p−1)k) (mod p)
            ≡ a · (g^(p−1))^k (mod p)       since a ≡ g^(2k) (mod p),
            ≡ a (mod p)                     since g^(p−1) ≡ 1 (mod p).
Hence b is indeed a square root of a modulo p.
Example 2.27. A square root of a = 2201 modulo the prime p = 4127 is
        b ≡ a^((p+1)/4) = 2201^(4128/4) ≡ 2201^1032 ≡ 3718 (mod 4127).
To see that a does indeed have a square root modulo 4127, we simply square b
and check that 3718^2 = 13823524 ≡ 2201 (mod 4127).
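Proposition 2.26 translates directly into code. This is a minimal sketch that also verifies the answer by squaring; the `ValueError` check is an addition for safety, not part of the proposition.

```python
# Square roots modulo a prime p ≡ 3 (mod 4), following Proposition 2.26:
# if a has a square root mod p, then b = a^((p+1)/4) mod p is one.
def sqrt_mod_p(a, p):
    b = pow(a, (p + 1) // 4, p)
    if b * b % p != a % p:
        raise ValueError("a has no square root modulo p")
    return b

assert sqrt_mod_p(2201, 4127) == 3718      # Example 2.27
```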
Suppose now that we want to compute a square root modulo m, where m is
not necessarily a prime. An efficient method is to factor m, compute the square
root modulo each of the prime (or prime power) factors, and then combine
the solutions using the Chinese remainder theorem. An example makes the
idea clear.
Example 2.28. We look for a solution to the congruence
        x^2 ≡ 197 (mod 437).                                     (2.11)
The modulus factors as 437 = 19 · 23, so we first solve the two congruences
        y^2 ≡ 197 ≡ 7 (mod 19) and z^2 ≡ 197 ≡ 13 (mod 23).
Since both 19 and 23 are congruent to 3 modulo 4, we can find these square
roots using Proposition 2.26 (or by trial and error). In any case, we have
        y ≡ ±8 (mod 19) and z ≡ ±6 (mod 23).
We can pick either 8 or −8 for y and either 6 or −6 for z. Choosing the two
positive solutions, we next use the Chinese remainder theorem to solve the
simultaneous congruences
        x ≡ 8 (mod 19) and x ≡ 6 (mod 23).                       (2.12)
We find that x ≡ 236 (mod 437), which gives the desired solution to (2.11).
Remark 2.29. The solution to Example 2.28 is not unique. In the first place,
we can always take the negative,
−236 ≡ 201 (mod 437),
to get a second square root of 197 modulo 437. If the modulus were prime,
there would be only these two square roots (Exercise 1.36(a)). However,
since 437 = 19 · 23 is composite, there are two others. In order to find them,
we replace one of 8 and 6 with its negative in (2.12). This leads to the val-
ues x = 144 and x = 293, so 197 has four square roots modulo 437.
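The computation in Example 2.28 and Remark 2.29 can be automated: take one square root modulo each prime factor, then run through the four sign choices with the Chinese remainder theorem. A minimal sketch (function names are ours; it assumes p and q are distinct primes, both ≡ 3 mod 4, and that a is a square modulo each):

```python
def sqrt_mod_p(a, p):
    # Proposition 2.26: valid for p ≡ 3 (mod 4) when a is a square mod p.
    return pow(a, (p + 1) // 4, p)

def all_sqrts_mod_pq(a, p, q):
    """All square roots of a modulo n = p*q, via the Chinese remainder theorem."""
    rp, rq = sqrt_mod_p(a % p, p), sqrt_mod_p(a % q, q)
    roots = set()
    for y in (rp, p - rp):                     # ±root modulo p
        for z in (rq, q - rq):                 # ±root modulo q
            # CRT: find x with x ≡ y (mod p) and x ≡ z (mod q)
            t = ((z - y) * pow(p, -1, q)) % q
            roots.add((y + p * t) % (p * q))
    return sorted(roots)

# Example 2.28 / Remark 2.29: the four square roots of 197 modulo 437.
print(all_sqrts_mod_pq(197, 19, 23))          # [144, 201, 236, 293]
```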
Remark 2.30. It is clear from Example 2.28 (see also Exercises 2.23 and 2.24)
that it is relatively easy to compute square roots modulo m if one knows how
to factor m into a product of prime powers. However, suppose that m is so
large that we are not able to factor it. It is then a very difficult problem to
find square roots modulo m. Indeed, in a certain reasonably precise sense, it
is just as difficult to compute square roots modulo m as it is to factor m.
In fact, if m is a large composite number whose factorization is unknown,
then it is a difficult problem to determine whether a given integer a has a
square root modulo m, even without requiring that the square root be com-
puted. The Goldwasser–Micali public key cryptosystem, which is described in
Sect. 3.10, is based on the difficulty of identifying which numbers have square
roots modulo a composite modulus m. The trapdoor information is knowledge
of the factors of m.
2.9 The Pohlig–Hellman Algorithm
In addition to being a theorem and an algorithm, we would suggest to the
reader that the Chinese remainder theorem is also a state of mind. If
m = m_1 · m_2 · · · m_t
is a product of pairwise relatively prime integers, then the Chinese remainder
theorem says that solving an equation modulo m is more or less equivalent
to solving the equation modulo mi for each i, since it tells us how to knit the
solutions together to get a solution modulo m.
In the discrete logarithm problem (DLP), we need to solve the equation

g^x ≡ h (mod p).

In this case, the modulus p is prime, which suggests that the Chinese remainder
theorem is irrelevant. However, recall that the solution x is determined
only modulo p − 1, so we can think of the solution as living in Z/(p − 1)Z. This
hints that the factorization of p − 1 into primes may play a role in determining
the difficulty of the DLP in F_p^*. More generally, if G is any group and g ∈ G
is an element of order N, then solutions to g^x = h in G are determined only
modulo N, so the prime factorization of N would appear to be relevant. This
idea is at the core of the Pohlig–Hellman algorithm.
As in Sect. 2.7 we state and prove results in this section for an arbitrary
group G. But if you feel more comfortable working with integers modulo p,
you may simply replace G by F_p^*.
Theorem 2.31 (Pohlig–Hellman Algorithm). Let G be a group, and suppose
that we have an algorithm to solve the discrete logarithm problem in G for
any element whose order is a power of a prime. To be concrete, if g ∈ G has
order q^e, suppose that we can solve g^x = h in O(S_{q^e}) steps. (For example,
Proposition 2.21 says that we can take S_{q^e} to be q^{e/2}. See Remark 2.32 for a
further discussion.)
Now let g ∈ G be an element of order N, and suppose that N factors into
a product of prime powers as

N = q_1^{e_1} · q_2^{e_2} · · · q_t^{e_t}.

Then the discrete logarithm problem g^x = h can be solved in

O( Σ_{i=1}^{t} S_{q_i^{e_i}} + log N ) steps   (2.13)

using the following procedure:
(1) For each 1 ≤ i ≤ t, let

g_i = g^{N/q_i^{e_i}} and h_i = h^{N/q_i^{e_i}}.

Notice that g_i has prime power order q_i^{e_i}, so use the given algorithm to
solve the discrete logarithm problem

g_i^y = h_i.   (2.14)

Let y = y_i be a solution to (2.14).
(2) Use the Chinese remainder theorem (Theorem 2.24) to solve

x ≡ y_1 (mod q_1^{e_1}), x ≡ y_2 (mod q_2^{e_2}), . . . , x ≡ y_t (mod q_t^{e_t}).   (2.15)
Proof. The running time is clear, since Step (1) takes O(Σ S_{q_i^{e_i}}) steps, and
Step (2), via the Chinese remainder theorem, takes O(log N) steps. In practice,
the Chinese remainder theorem computation is usually negligible compared
to the discrete logarithm computations.
It remains to show that Steps (1) and (2) give a solution to g^x = h. Let x
be a solution to the system of congruences (2.15). Then for each i we can
write

x = y_i + q_i^{e_i} z_i for some z_i.   (2.16)
This allows us to compute

(g^x)^{N/q_i^{e_i}} = (g^{y_i + q_i^{e_i} z_i})^{N/q_i^{e_i}}   from (2.16),
 = (g^{N/q_i^{e_i}})^{y_i} · g^{N z_i}
 = (g^{N/q_i^{e_i}})^{y_i}   since g^N is the identity element,
 = g_i^{y_i}   by the definition of g_i,
 = h_i   from (2.14),
 = h^{N/q_i^{e_i}}   by the definition of h_i.
In terms of discrete logarithms to the base g, we can rewrite this as

(N/q_i^{e_i}) · x ≡ (N/q_i^{e_i}) · log_g(h) (mod N),   (2.17)

where recall that the discrete logarithm to the base g is defined only modulo N,
since g^N is the identity element.
Next we observe that the numbers

N/q_1^{e_1}, N/q_2^{e_2}, . . . , N/q_t^{e_t}

have no nontrivial common factor, i.e., their greatest common divisor is 1.
Repeated application of the extended Euclidean theorem (Theorem 1.11) (see
also Exercise 1.13) says that we can find integers c_1, c_2, . . . , c_t such that

(N/q_1^{e_1}) · c_1 + (N/q_2^{e_2}) · c_2 + · · · + (N/q_t^{e_t}) · c_t = 1.   (2.18)
Now multiply both sides of (2.17) by c_i and sum over i = 1, 2, . . . , t. This
gives

Σ_{i=1}^{t} (N/q_i^{e_i}) · c_i · x ≡ Σ_{i=1}^{t} (N/q_i^{e_i}) · c_i · log_g(h) (mod N),

and then (2.18) tells us that

x ≡ log_g(h) (mod N).

This completes the proof that x satisfies g^x ≡ h.
Remark 2.32. The Pohlig–Hellman algorithm more or less reduces the discrete
logarithm problem for elements of arbitrary order to the discrete logarithm
problem for elements of prime power order. A further refinement, which we
discuss later in this section, essentially reduces the problem to elements of
prime order. More precisely, in the notation of Theorem 2.31, the running
time S_{q^e} for elements of order q^e can be reduced to O(e S_q). This is the content
of Proposition 2.33.
The Pohlig–Hellman algorithm thus tells us that the discrete logarithm
problem in a group G is not secure if the order of the group is a product
of powers of small primes. More generally, g^x = h is easy to solve if the
order of the element g is a product of powers of small primes. This applies, in
particular, to the discrete logarithm problem in F_p if p − 1 factors into powers
of small primes. Since p − 1 is always even, the best that we can do is take p =
2q + 1 with q prime and use an element g of order q. Then the running time
of the collision algorithm described in Proposition 2.21 is O(√q) = O(√p).
However, the index calculus method described in Sect. 3.8 has running time
that is subexponential, so even if p = 2q + 1, the prime q must be chosen to
be quite large.
We now explain the algorithm that reduces the discrete logarithm problem
for elements of prime power order to the discrete logarithm problem for
elements of prime order. The idea is simple: if g has order q^e, then g^{q^{e−1}} has
order q. The trick is to repeat this process several times and then assemble
the information into the final answer.
Proposition 2.33. Let G be a group. Suppose that q is a prime, and suppose
that we know an algorithm that takes S_q steps to solve the discrete logarithm
problem g^x = h in G whenever g has order q. Now let g ∈ G be an element of
order q^e with e ≥ 1. Then we can solve the discrete logarithm problem

g^x = h in O(e S_q) steps.   (2.19)
Remark 2.34. Proposition 2.21 says that we can take S_q = O(√q), so Proposition
2.33 says that we can solve the DLP (2.19) in O(e√q) steps. Notice
that if we apply Proposition 2.21 directly to the DLP (2.19), the running time
is O(q^{e/2}), which is much slower if e ≥ 2.
Proof of Proposition 2.33. The key idea in proving the proposition is to write
the unknown exponent x in the form

x = x_0 + x_1 q + x_2 q^2 + · · · + x_{e−1} q^{e−1} with 0 ≤ x_i < q,   (2.20)

and then determine successively x_0, x_1, x_2, . . . . We begin by observing that
the element g^{q^{e−1}} is of order q. This allows us to compute

h^{q^{e−1}} = (g^x)^{q^{e−1}}   raising both sides of (2.19) to the q^{e−1} power
 = (g^{x_0 + x_1 q + x_2 q^2 + · · · + x_{e−1} q^{e−1}})^{q^{e−1}}   from (2.20)
 = g^{x_0 q^{e−1}} · (g^{q^e})^{x_1 + x_2 q + · · · + x_{e−1} q^{e−2}}
 = (g^{q^{e−1}})^{x_0}   since g^{q^e} = 1.
Since g^{q^{e−1}} is an element of order q in G, the equation

(g^{q^{e−1}})^{x_0} = h^{q^{e−1}}

is a discrete logarithm problem whose base is an element of order q. By assumption,
we can solve this problem in S_q steps. Once this is done, we know
an exponent x_0 with the property that

g^{x_0 q^{e−1}} = h^{q^{e−1}} in G.

We next do a similar computation, this time raising both sides of (2.19)
to the q^{e−2} power, which yields

h^{q^{e−2}} = (g^x)^{q^{e−2}}
 = (g^{x_0 + x_1 q + x_2 q^2 + · · · + x_{e−1} q^{e−1}})^{q^{e−2}}
 = g^{x_0 q^{e−2}} · g^{x_1 q^{e−1}} · (g^{q^e})^{x_2 + x_3 q + · · · + x_{e−1} q^{e−3}}
 = g^{x_0 q^{e−2}} · g^{x_1 q^{e−1}}.

Keep in mind that we have already determined the value of x_0 and that the
element g^{q^{e−1}} has order q in G. In order to find x_1, we must solve the discrete
logarithm problem

(g^{q^{e−1}})^{x_1} = (h · g^{−x_0})^{q^{e−2}}

for the unknown quantity x_1. Again applying the given algorithm, we can
solve this in S_q steps. Hence in O(2 S_q) steps, we have determined values
for x_0 and x_1 satisfying

g^{(x_0 + x_1 q) q^{e−2}} = h^{q^{e−2}} in G.
Similarly, we find x_2 by solving the discrete logarithm problem

(g^{q^{e−1}})^{x_2} = (h · g^{−x_0 − x_1 q})^{q^{e−3}},

and in general, after we have determined x_0, . . . , x_{i−1}, the value of x_i is
obtained by solving

(g^{q^{e−1}})^{x_i} = (h · g^{−x_0 − x_1 q − · · · − x_{i−1} q^{i−1}})^{q^{e−i−1}} in G.

Each of these is a discrete logarithm problem whose base is of order q, so each
of them can be solved in S_q steps. Hence after O(e S_q) steps, we obtain an
exponent x = x_0 + x_1 q + · · · + x_{e−1} q^{e−1} satisfying g^x = h, thus solving the
original discrete logarithm problem.
Example 2.35. We do an example to clarify the algorithm described in the
proof of Proposition 2.33. We solve

5448^x = 6909 in F_11251^*.   (2.21)

The prime p = 11251 has the property that p − 1 is divisible by 5^4, and it
is easy to check that 5448 has order exactly 5^4 in F_11251. The first step is to
solve

(5448^{5^3})^{x_0} = 6909^{5^3},

which reduces to 11089^{x_0} = 11089. This one is easy; the answer is x_0 = 1, so
our initial value of x is x = 1.
The next step is to solve

(5448^{5^3})^{x_1} = (6909 · 5448^{−x_0})^{5^2} = (6909 · 5448^{−1})^{5^2},

which reduces to 11089^{x_1} = 3742. Note that we only need to check values
of x_1 between 1 and 4, although if q were large, it would pay to use a faster
algorithm such as Proposition 2.21 to solve this discrete logarithm problem.
In any case, the solution is x_1 = 2, so the value of x is now x = 11 = 1 + 2 · 5.
Continuing, we next solve

(5448^{5^3})^{x_2} = (6909 · 5448^{−x_0 − x_1·5})^{5} = (6909 · 5448^{−11})^{5},

which reduces to 11089^{x_2} = 1. Thus x_2 = 0, which means that the value of x
remains at x = 11.
The final step is to solve

(5448^{5^3})^{x_3} = 6909 · 5448^{−x_0 − x_1·5 − x_2·5^2} = 6909 · 5448^{−11}.

This reduces to solving 11089^{x_3} = 6320, which has the solution x_3 = 4. Hence
our final answer is

x = 511 = 1 + 2 · 5 + 4 · 5^3.

As a check, we compute

5448^511 = 6909 in F_11251. □
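The digit-by-digit procedure of Proposition 2.33, as exercised in Example 2.35, can be sketched in code (our own illustrative names; requires Python 3.8+ for the negative modular exponent). The order-q subproblems are solved here by brute force, which suffices for small q but would be replaced by a collision algorithm such as Proposition 2.21 in practice.

```python
def dlp_prime_power(g, h, q, e, p):
    """Solve g^x ≡ h (mod p) when g has order q^e in F_p^* (Proposition 2.33)."""
    x = 0
    gamma = pow(g, q ** (e - 1), p)            # element of order q
    for i in range(e):
        # Reduce to an order-q problem: gamma^{x_i} = (h * g^{-x})^{q^{e-1-i}}
        target = pow(h * pow(g, -x, p) % p, q ** (e - 1 - i), p)
        digit = next(d for d in range(q) if pow(gamma, d, p) == target)
        x += digit * q ** i                    # record the base-q digit x_i
    return x

# Example 2.35: 5448 has order 5^4 in F_11251, and x = 511.
print(dlp_prime_power(5448, 6909, 5, 4, 11251))   # 511
```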
The Pohlig–Hellman algorithm (Theorem 2.31) for solving the discrete log-
arithm problem uses the Chinese remainder theorem (Theorem 2.24) to knot
together the solutions for prime powers from Proposition 2.33. The following
example illustrates the full Pohlig–Hellman algorithm.
Example 2.36. Consider the discrete logarithm problem

23^x = 9689 in F_11251.   (2.22)
The base 23 is a primitive root in F_11251, i.e., it has order 11250. Since
11250 = 2 · 3^2 · 5^4 is a product of small primes, the Pohlig–Hellman algorithm
should work well. In the notation of Theorem 2.31, we set

p = 11251, g = 23, h = 9689, N = p − 1 = 2 · 3^2 · 5^4.
The first step is to solve three subsidiary discrete logarithm problems, as
indicated in the following table.

q  e  g^{(p−1)/q^e}  h^{(p−1)/q^e}  Solve (g^{(p−1)/q^e})^x = h^{(p−1)/q^e} for x
2  1  11250          11250          1
3  2  5029           10724          4
5  4  5448           6909           511
Notice that the first problem is trivial, while the third one is the problem that
we solved in Example 2.35. In any case, the individual problems in this step
of the algorithm may be solved as described in the proof of Proposition 2.33.
The second step is to use the Chinese remainder theorem to solve the
simultaneous congruences

x ≡ 1 (mod 2), x ≡ 4 (mod 3^2), x ≡ 511 (mod 5^4).

The smallest solution is x = 4261. We check our answer by computing

23^4261 = 9689 in F_11251. □
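Putting the pieces together, the two steps of Example 2.36 can be sketched in code (illustrative names, not from the text): solve each prime-power subproblem, here by brute force since the subgroups are tiny, then knit the answers together with the Chinese remainder theorem.

```python
def pohlig_hellman(g, h, p, factors):
    """Solve g^x ≡ h (mod p); factors lists pairs (q, e) with p - 1 = ∏ q^e."""
    N = p - 1
    residues, moduli = [], []
    for q, e in factors:
        gi = pow(g, N // q ** e, p)            # g_i has order q^e
        hi = pow(h, N // q ** e, p)
        # Subsidiary DLP g_i^y = h_i, brute-forced (fine for small q^e).
        y = next(y for y in range(q ** e) if pow(gi, y, p) == hi)
        residues.append(y)
        moduli.append(q ** e)
    # Chinese remainder theorem: x ≡ y_i (mod q_i^{e_i}) for all i.
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        t = ((r - x) * pow(M, -1, m)) % m
        x, M = x + M * t, M * m
    return x

# Example 2.36: 23^x = 9689 in F_11251, with 11250 = 2 · 3^2 · 5^4.
print(pohlig_hellman(23, 9689, 11251, [(2, 1), (3, 2), (5, 4)]))   # 4261
```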
2.10 Rings, Quotient Rings, Polynomial
Rings, and Finite Fields
Note to the Reader: In this section we describe some topics that are typ-
ically covered in an introductory course in abstract algebra. This material
is somewhat more mathematically sophisticated than the material that we
have discussed up to this point. For cryptographic applications, the most im-
portant topics in this section are the theory of finite fields of prime power
order, which in this book are used primarily in Sects. 6.7 and 6.8 in studying
elliptic curve cryptography, and the theory of quotients of polynomial rings,
which are used in Sect. 7.10 to describe the lattice-based NTRU public key
cryptosystem. The reader interested in proceeding more rapidly to additional
cryptographic topics may wish to omit this section at first reading and return
to it when arriving at the relevant sections of Chaps. 6 and 7.
As we have seen, groups are fundamental objects that appear in many
areas of mathematics. A group G is a set and an operation that allows us to
“multiply” two elements to obtain a third element. We gave a brief overview
of the theory of groups in Sect. 2.5. Another fundamental object in mathe-
matics, called a ring, is a set having two operations. These two operations
are analogous to ordinary addition and multiplication, and they are linked by
the distributive law. In this section we begin with a brief discussion of the
general theory of rings, then we discuss how to form one ring from another
by taking quotients, and we conclude by examining in some detail the case of
polynomial rings.
2.10.1 An Overview of the Theory of Rings
You are already familiar with many rings, for example the ring of integers
with the operations of addition and multiplication. We abstract the funda-
mental properties of these operations and use them to formulate the following
fundamental definition.
Definition. A ring is a set R that has two operations, which we denote by +
and ⋆,10 having the following properties:
Properties of +
[Identity Law] There is an additive identity 0 ∈ R such that
0 + a = a + 0 = a for every a ∈ R.
[Inverse Law] For every element a ∈ R there is an additive
inverse b ∈ R such that a + b = b + a = 0.
[Associative Law] a + (b + c) = (a + b) + c for all a, b, c ∈ R.
[Commutative Law] a + b = b + a for all a, b ∈ R.
Briefly, if we look at R with only the operation +, then it is a commutative
group with (additive) identity element 0.
Properties of ⋆
[Identity Law] There is a multiplicative identity 1 ∈ R such that
1 ⋆ a = a ⋆ 1 = a for every a ∈ R.
[Associative Law] a ⋆ (b ⋆ c) = (a ⋆ b) ⋆ c for all a, b, c ∈ R.
[Commutative Law] a ⋆ b = b ⋆ a for all a, b ∈ R.
Thus if we look at R with only the operation ⋆, then it is almost a commutative
group with (multiplicative) identity element 1, except that elements
are not required to have multiplicative inverses.
Property Linking + and ⋆
[Distributive Law] a ⋆ (b + c) = a ⋆ b + a ⋆ c for all a, b, c ∈ R.
Remark 2.37. More generally, people sometimes work with rings that do not
contain a multiplicative identity, and also with rings for which ⋆ is not commutative,
i.e., a ⋆ b might not be equal to b ⋆ a. So to be formal, our rings
are really commutative rings with (multiplicative) identity. However, all of the
rings that we use will be of this type, so we will just call them rings.
10Addition in a ring is virtually always denoted by +, but there are many different
notations for multiplication. In this book we use a ⋆ b, a · b, or simply ab, depending on the
context.
Every element of a ring has an additive inverse, but there may be many
nonzero elements that do not have multiplicative inverses. For example, in the
ring of integers Z, the only elements that have multiplicative inverses are 1
and −1.
Definition. A (commutative) ring in which every nonzero element has a
multiplicative inverse is called a field.
Example 2.38. Here are a few examples of rings and fields with which you are
probably already familiar.
(a) R = Q, ⋆ = multiplication, and addition is as usual. The multiplicative
identity element is 1. Every nonzero element has a multiplicative inverse,
so Q is a field.
(b) R = Z, ⋆ = multiplication, and addition is as usual. The multiplicative
identity element is 1. The only elements that have multiplicative inverses
are 1 and −1, so Z is a ring, but it is not a field.
(c) R = Z/nZ, n is any positive integer, ⋆ = multiplication, and addition
is as usual. The multiplicative identity element is 1. Here R is always a
ring, and it is a field if and only if n is prime.
(d) R = F_p, p is any prime integer, ⋆ = multiplication, and addition is
as usual. The multiplicative identity element is 1. By Proposition 1.21,
every nonzero element has a multiplicative inverse, so F_p is a field.
(e) The collection of all polynomials with coefficients taken from Z forms a
ring under the usual operations of polynomial addition and multiplication.
This ring is denoted by Z[x]. Thus we write

Z[x] = {a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n : n ≥ 0 and a_0, a_1, . . . , a_n ∈ Z}.

For example, 1 + x^2 and 3 − 7x^4 + 23x^9 are polynomials in the ring Z[x],
as are 17 and −203.
(f) More generally, if R is any ring, we can form a ring of polynomials whose
coefficients are taken from the ring R. For example, the ring R might
be Z/qZ or a finite field F_p. We discuss these general polynomial rings,
denoted by R[x], in Sect. 7.9.
2.10.2 Divisibility and Quotient Rings
The concept of divisibility, originally introduced for the integers Z in Sect. 1.2,
can be generalized to any ring.
Definition. Let a and b be elements of a ring R with b ≠ 0. We say that b
divides a, or that a is divisible by b, if there is an element c ∈ R such that
a = b ⋆ c.
As before, we write b | a to indicate that b divides a. If b does not divide a,
then we write b ∤ a.
Remark 2.39. The basic properties of divisibility given in Proposition 1.4
apply to rings in general. The proof for Z works for any ring. Similarly, it is
true in every ring that b | 0 for any b ≠ 0. (See Exercise 2.30.) However, note
that not every ring is as nice as Z. For example, there are rings with nonzero
elements a and b whose product a ⋆ b is 0. An example of such a ring is Z/6Z,
in which 2 and 3 are nonzero, but 2 · 3 = 6 = 0.
Recall that an integer is called a prime if it has no nontrivial factors. What
is a trivial factor? We can “factor” any integer by writing it as a = 1 · a and
as a = (−1)(−a), so these are trivial factorizations. What makes them trivial
is the fact that 1 and −1 have multiplicative inverses. In general, if R is a ring
and if u ∈ R is an element that has a multiplicative inverse u^{−1} ∈ R, then we
can factor any element a ∈ R by writing it as a = u^{−1} · (ua). Elements that
have multiplicative inverses and elements that have only trivial factorizations
are special elements of a ring, so we give them special names.
have multiplicative inverses and elements that have only trivial factorizations
are special elements of a ring, so we give them special names.
Definition. Let R be a ring. An element u ∈ R is called a unit if it has a
multiplicative inverse, i.e., if there is an element v ∈ R such that u ⋆ v = 1.
An element a of a ring R is said to be irreducible if a is not itself a unit
and if in every factorization of a as a = b ⋆ c, either b is a unit or c is a unit.
Remark 2.40. The integers have the property that every integer factors
uniquely into a product of irreducible integers, up to rearranging the order
of the factors and throwing in some extra factors of 1 and −1. (Note that a
positive irreducible integer is simply another name for a prime.) Not every
ring has this important unique factorization property, but in the next section
we prove that the ring of polynomials with coefficients in a field is a unique
factorization ring.
We have seen that congruences are a very important and powerful mathe-
matical tool for working with the integers. Using the definition of divisibility,
we can extend the notion of congruence to arbitrary rings.
Definition. Let R be a ring and choose a nonzero element m ∈ R. We say
that two elements a and b of R are congruent modulo m if their difference
a − b is divisible by m. We write
a ≡ b (mod m)
to indicate that a and b are congruent modulo m.
Congruences for arbitrary rings satisfy the same equation-like properties
as they do in the original integer setting.
Proposition 2.41. Let R be a ring and let m ∈ R with m ≠ 0. If

a_1 ≡ a_2 (mod m) and b_1 ≡ b_2 (mod m),

then

a_1 ± b_1 ≡ a_2 ± b_2 (mod m) and a_1 ⋆ b_1 ≡ a_2 ⋆ b_2 (mod m).
Proof. We leave the proof as an exercise; see Exercise 2.32.
Remark 2.42. Our definition of congruence captures all of the properties that
we need in this book. However, we must observe that there exists a more
general notion of congruence modulo ideals. For our purposes, it is enough
to work with congruences modulo principal ideals, which are ideals that are
generated by a single element.
An important consequence of Proposition 2.41 is a method for creating new
rings from old rings, just as we created Z/qZ from Z by looking at congruences
modulo q.
Definition. Let R be a ring and let m ∈ R with m ≠ 0. For any a ∈ R,
we write ā for the set of all a′ ∈ R such that a′ ≡ a (mod m). The set ā is
called the congruence class of a, and we denote the collection of all congruence
classes by R/(m) or R/mR. Thus

R/(m) = R/mR = {ā : a ∈ R}.

We add and multiply congruence classes using the obvious rules

ā + b̄ = (a + b)¯ and ā ⋆ b̄ = (a ⋆ b)¯.   (2.23)
We call R/(m) the quotient ring of R by m. This name is justified by the next
proposition.
Proposition 2.43. The formulas (2.23) give well-defined addition and multi-
plication rules on the set of congruence classes R/(m), and they make R/(m)
into a ring.
Proof. We leave the proof as an exercise; see Exercise 2.43.
2.10.3 Polynomial Rings and the Euclidean Algorithm
In Example 2.38(f) we observed that if R is any ring, then we can create a
polynomial ring with coefficients taken from R. This ring is denoted by
R[x] = {a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n : n ≥ 0 and a_0, a_1, . . . , a_n ∈ R}.
The degree of a nonzero polynomial is the exponent of the highest power of x
that appears. Thus if

a(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n

with a_n ≠ 0, then a(x) has degree n. We denote the degree of a by deg(a),
and we call a_n the leading coefficient of a(x). A nonzero polynomial whose
leading coefficient is equal to 1 is called a monic polynomial. For example,
3 + x^2 is a monic polynomial, but 1 + 3x^2 is not.
Especially important are those polynomial rings in which the ring R is a
field; for example, R could be Q or R or C or a finite field Fp. (For cryptogra-
phy, by far the most important case is the last named one.) One reason why
it is so useful to take R to be a field F is because virtually all of the properties
of Z that we proved in Sect. 1.2 are also true for the polynomial ring F[x].
This section is devoted to a discussion of the properties of F[x].
Back in high school you undoubtedly learned how to divide one polynomial
by another. We recall the process by doing an example. Here is how one divides
x^5 + 2x^4 + 7 by x^3 − 5:

              x^2 + 2x
x^3 − 5 ) x^5 + 2x^4 + 7
          x^5        − 5x^2
               2x^4 + 5x^2 + 7
               2x^4        − 10x
                      5x^2 + 10x + 7

In other words, x^5 + 2x^4 + 7 divided by x^3 − 5 gives a quotient of x^2 + 2x with
a remainder of 5x^2 + 10x + 7. Another way to say this is to write11

x^5 + 2x^4 + 7 = (x^2 + 2x) · (x^3 − 5) + (5x^2 + 10x + 7).

Notice that the degree of the remainder 5x^2 + 10x + 7 is strictly smaller than
the degree of the divisor x^3 − 5.
We can do the same thing for any polynomial ring F[x] as long as F is a
field. Rings of this sort that have a “division with remainder” algorithm are
called Euclidean rings.
Proposition 2.44 (The ring F[x] is Euclidean). Let F be a field and let a
and b be polynomials in F[x] with b ≠ 0. Then it is possible to write

a = b · k + r with k and r polynomials, and either r = 0 or deg r < deg b.

We say that a divided by b has quotient k and remainder r.
Proof. We start with any values for k and r that satisfy

a = b · k + r.

(For example, we could start with k = 0 and r = a.) If deg r < deg b, then
we're done. Otherwise we write

b = b_0 + b_1 x + · · · + b_d x^d and r = r_0 + r_1 x + · · · + r_e x^e

with b_d ≠ 0 and r_e ≠ 0 and e ≥ d. We rewrite the equation a = b · k + r as

a = b · (k + (r_e/b_d) x^{e−d}) + (r − (r_e/b_d) x^{e−d} · b) = b · k′ + r′.

Notice that we have canceled the top degree term of r, so deg r′ < deg r.
If deg r′ < deg b, then we're done. If not, we repeat the process. We can do
this as long as the r term satisfies deg r ≥ deg b, and every time we apply this
process, the degree of our r term gets smaller. Hence eventually we arrive at
an r term whose degree is strictly smaller than the degree of b.

11For notational convenience, we drop the ⋆ for multiplication and just write a · b, or
even simply ab.
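The division-with-remainder loop in the proof of Proposition 2.44 is exactly what one implements in code. A sketch over F_p (our own representation: coefficient lists, lowest degree first, so x^3 − 5 is [-5, 0, 0, 1]):

```python
def poly_divmod(a, b, p):
    """Divide a by b in F_p[x], returning (quotient, remainder)."""
    a = [c % p for c in a]
    b = [c % p for c in b]
    binv = pow(b[-1], -1, p)              # leading coefficient of b is a unit
    quot = [0] * max(len(a) - len(b) + 1, 0)
    while a and a[-1] == 0:               # drop trailing zero coefficients
        a.pop()
    while len(a) >= len(b):
        d = len(a) - len(b)
        c = a[-1] * binv % p              # cancel the top term, as in the proof
        quot[d] = c
        for i, bc in enumerate(b):
            a[i + d] = (a[i + d] - c * bc) % p
        while a and a[-1] == 0:
            a.pop()
    return quot, a

# The worked example: (x^5 + 2x^4 + 7) ÷ (x^3 − 5), here over F_101.
q, r = poly_divmod([7, 0, 0, 0, 2, 1], [-5, 0, 0, 1], 101)
print(q, r)   # [0, 2, 1] and [7, 10, 5], i.e. x^2 + 2x and 5x^2 + 10x + 7
```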
We can now define common divisors and greatest common divisors in F[x].
Definition. A common divisor of two elements a, b ∈ F[x] is an element d ∈
F[x] that divides both a and b. We say that d is a greatest common divisor
of a and b if every common divisor of a and b also divides d.
We will see below that every pair of elements in F[x] has a greatest common
divisor,12 which is unique up to multiplying it by a nonzero element of F. We
write gcd(a, b) for the unique monic polynomial that is a greatest common
divisor of a and b.
Example 2.45. The greatest common divisor of x^2 − 1 and x^3 + 1 is x + 1.
Notice that

x^2 − 1 = (x + 1)(x − 1) and x^3 + 1 = (x + 1)(x^2 − x + 1),

so x + 1 is a common divisor. We leave it to you to check that it is the greatest
common divisor.
It is not clear, a priori, that every pair of elements has a greatest common
divisor. And indeed, there are many rings in which greatest common divisors
do not exist, for example in the ring Z[x]. But greatest common divisors do
exist in the polynomial ring F[x] when F is a field.
Proposition 2.46 (The extended Euclidean algorithm for F[x]). Let F be
a field and let a and b be polynomials in F[x] with b ≠ 0. Then the greatest
common divisor d of a and b exists, and there are polynomials u and v in F[x]
such that

a · u + b · v = d.
Proof. Just as in the proof of Theorem 1.7, the polynomial gcd(a, b) can
be computed by repeated application of Proposition 2.44, as described in
Fig. 2.3. Similarly, the polynomials u and v can be computed by substituting
one equation into another in Fig. 2.3, exactly as described in the proof of
Theorem 1.11.
12According to our definition, even if both a and b are 0, they have a greatest common
divisor, namely 0. However, some authors prefer to leave gcd(0, 0) undefined.
a = b · k_1 + r_2 with 0 ≤ deg r_2 < deg b,
b = r_2 · k_2 + r_3 with 0 ≤ deg r_3 < deg r_2,
r_2 = r_3 · k_3 + r_4 with 0 ≤ deg r_4 < deg r_3,
r_3 = r_4 · k_4 + r_5 with 0 ≤ deg r_5 < deg r_4,
  ⋮
r_{t−2} = r_{t−1} · k_{t−1} + r_t with 0 ≤ deg r_t < deg r_{t−1},
r_{t−1} = r_t · k_t

Then d = r_t = gcd(a, b).

Figure 2.3: The Euclidean algorithm for polynomials
Example 2.47. We use the Euclidean algorithm in the ring F_13[x] to compute
gcd(x^5 − 1, x^3 + 2x − 3):

x^5 − 1 = (x^3 + 2x − 3) · (x^2 + 11) + (3x^2 + 4x + 6)
x^3 + 2x − 3 = (3x^2 + 4x + 6) · (9x + 1) + (9x + 4) ← gcd = 9x + 4
3x^2 + 4x + 6 = (9x + 4) · (9x + 8) + 0

Thus 9x + 4 is a greatest common divisor of x^5 − 1 and x^3 + 2x − 3 in F_13[x].
In order to get a monic polynomial, we multiply by 3 ≡ 9^{−1} (mod 13). This
gives

gcd(x^5 − 1, x^3 + 2x − 3) = x − 1 in F_13[x].
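Figure 2.3's repeated division translates into a short gcd routine. The sketch below (our own helper names; polynomials as coefficient lists, lowest degree first) reproduces Example 2.47, including the final rescaling to a monic gcd:

```python
def poly_rem(a, b, p):
    """Remainder of a divided by b in F_p[x]."""
    a = [c % p for c in a]
    b = [c % p for c in b]
    binv = pow(b[-1], -1, p)
    while a and a[-1] == 0:
        a.pop()
    while len(a) >= len(b):
        c, d = a[-1] * binv % p, len(a) - len(b)
        for i, bc in enumerate(b):
            a[i + d] = (a[i + d] - c * bc) % p
        while a and a[-1] == 0:
            a.pop()
    return a

def poly_gcd(a, b, p):
    """Monic gcd in F_p[x], by the Euclidean algorithm of Fig. 2.3."""
    while b:
        a, b = b, poly_rem(a, b, p)
    inv = pow(a[-1], -1, p)               # rescale to make the gcd monic
    return [c * inv % p for c in a]

# Example 2.47: gcd(x^5 − 1, x^3 + 2x − 3) = x − 1 in F_13[x].
print(poly_gcd([-1, 0, 0, 0, 0, 1], [-3, 2, 0, 1], 13))   # [12, 1], i.e. x + 12 = x − 1
```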
We recall from Sect. 2.10.2 that an element u of a ring is a unit if it has
a multiplicative inverse u^{−1}, and that an element a of a ring is irreducible
if it is not a unit and if the only way to factor a is as a = bc with either b
or c a unit. It is not hard to see that the units in a polynomial ring F[x] are
precisely the nonzero constant polynomials, i.e., the nonzero elements of F;
see Exercise 2.34. The question of irreducibility is subtler, as shown by the
following examples.
Example 2.48. The polynomial x^5 − 4x^3 + 3x^2 − x + 2 is irreducible as a
polynomial in Z[x], but if we view it as an element of F_3[x], then it factors as

x^5 − 4x^3 + 3x^2 − x + 2 ≡ (x + 1)(x^4 + 2x^3 + 2) (mod 3).

It also factors if we view it as a polynomial in F_5[x], but this time as a product
of a quadratic polynomial and a cubic polynomial,

x^5 − 4x^3 + 3x^2 − x + 2 ≡ (x^2 + 4x + 2)(x^3 + x^2 + 1) (mod 5).

On the other hand, if we work in F_13[x], then x^5 − 4x^3 + 3x^2 − x + 2 is
irreducible.
Every integer has an essentially unique factorization as a product of
primes. The same is true of polynomials with coefficients in a field. And just
as for the integers, the key to proving unique factorization is the extended
Euclidean algorithm.
Proposition 2.49. Let F be a field. Then every nonzero polynomial in F[x]
can be uniquely factored as a product of monic irreducible polynomials, in the
following sense. If a ∈ F[x] is factored as

a = α p_1 · p_2 · · · p_m and a = β q_1 · q_2 · · · q_n,

where α, β ∈ F are constants and p_1, . . . , p_m, q_1, . . . , q_n are monic irreducible
polynomials, then after rearranging the order of q_1, . . . , q_n, we have

α = β, m = n, and p_i = q_i for all 1 ≤ i ≤ m.
Proof. The existence of a factorization into irreducibles follows easily from the
fact that if a = b·c, then deg a = deg b+deg c. (See Exercise 2.34.) The proof
that the factorization is unique is exactly the same as the proof for integers, cf.
Theorem 1.20. The key step in the proof is the statement that if p ∈ F[x] is
irreducible and divides the product a · b, then either p | a or p | b (or both).
This statement is the polynomial analogue of Proposition 1.19 and is proved
in the same way, using the polynomial version of the extended Euclidean
algorithm (Proposition 2.46).
2.10.4 Quotients of Polynomial Rings and Finite Fields
of Prime Power Order
In Sect. 2.10.3 we studied polynomial rings and in Sect. 2.10.2 we studied
quotient rings. In this section we combine these two constructions and consider
quotients of polynomial rings.
Recall that in working with the integers modulo m, it is often convenient
to represent each congruence class modulo m by an integer between 0 and m−
1. The division-with-remainder algorithm (Proposition 2.44) allows us to do
something similar for the quotient of a polynomial ring.
Proposition 2.50. Let F be a field and let m ∈ F[x] be a nonzero polynomial.
Then every nonzero congruence class ā ∈ F[x]/(m) has a unique representative
r satisfying

deg r < deg m and a ≡ r (mod m).

Proof. We use Proposition 2.44 to find polynomials k and r such that

a = m · k + r

with either r = 0 or deg r < deg m. If r = 0, then a ≡ 0 (mod m), so ā = 0.
Otherwise, reducing modulo m gives a ≡ r (mod m) with deg r < deg m.
This shows that r exists. To show that it is unique, suppose that r′ has the
same properties. Then

r − r′ ≡ a − a ≡ 0 (mod m),

so m divides r − r′. But r − r′ has degree strictly smaller than the degree
of m, so we must have r − r′ = 0.
2.10. Rings, Quotients, Polynomials, and Finite Fields 103
Example 2.51. Consider the ring F[x]/(x^2 + 1). Proposition 2.50 says that
every element of this quotient ring is uniquely represented by a polynomial of
the form
α + βx with α, β ∈ F.
Addition is performed in the obvious way,
(α1 + β1x) + (α2 + β2x) = (α1 + α2) + (β1 + β2)x.
Multiplication is similar, except that we have to divide the final result by
x^2 + 1 and take the remainder. Thus
(α1 + β1x) · (α2 + β2x) = α1α2 + (α1β2 + α2β1)x + β1β2x^2
= (α1α2 − β1β2) + (α1β2 + α2β1)x.
Notice that the effect of dividing by x^2 + 1 is the same as replacing x^2 with −1.
The intuition is that in the quotient ring F[x]/(x^2 + 1), we have made the
quantity x^2 + 1 equal to 0. Notice that if we take F = R in this example,
then R[x]/(x^2 + 1) is simply the field of complex numbers C.
We can use Proposition 2.50 to count the number of elements in a poly-
nomial quotient ring when F is a finite field.
Corollary 2.52. Let Fp be a finite field and let m ∈ Fp[x] be a nonzero poly-
nomial of degree d ≥ 1. Then the quotient ring Fp[x]/(m) contains exactly p^d
elements.
Proof. From Proposition 2.50 we know that every element of Fp[x]/(m) is
represented by a unique polynomial of the form
a0 + a1x + a2x^2 + · · · + a_{d−1}x^(d−1) with a0, a1, . . . , a_{d−1} ∈ Fp.
There are p choices for a0, and p choices for a1, and so on, leading to a total
of p^d choices for a0, a1, . . . , a_{d−1}.
We next give an important characterization of the units in a polynomial
quotient ring. This will allow us to construct new finite fields.
Proposition 2.53. Let F be a field and let a, m ∈ F[x] be polynomials
with m ≠ 0. Then a is a unit in the quotient ring F[x]/(m) if and only if
gcd(a, m) = 1.
Proof. Suppose first that a is a unit in F[x]/(m). By definition, this means
that we can find some b ∈ F[x]/(m) satisfying a·b = 1. In terms of congruences,
this means that a · b ≡ 1 (mod m), so there is some c ∈ F[x] such that
a · b − 1 = c · m.
104 2. Discrete Logarithms and Diffie–Hellman
It follows that any common divisor of a and m must also divide 1. There-
fore gcd(a, m) = 1.
Next suppose that gcd(a, m) = 1. Then Proposition 2.46 tells us that
there are polynomials u, v ∈ F[x] such that
a · u + m · v = 1.
Reducing modulo m yields
a · u ≡ 1 (mod m),
so u is an inverse for a in F[x]/(m).
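The proof is constructive: the extended Euclidean algorithm actually produces the inverse u. The following Python sketch (our own helper names; polynomials are lists of coefficients in Fp, lowest degree first) carries out the computation for p = 2, assuming gcd(a, m) = 1:

```python
p = 2  # work over F_2; any prime works if you change p

def trim(a):
    # drop leading (highest-degree) zero coefficients
    while a and a[-1] % p == 0:
        a = a[:-1]
    return a

def polymul(a, b):
    r = [0] * (len(a) + len(b) - 1) if a and b else []
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % p
    return trim(r)

def polysub(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])

def polydivmod(a, b):
    a, b = trim(a[:]), trim(b)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv_lead = pow(b[-1], p - 2, p)   # inverse of leading coefficient in F_p
    while len(a) >= len(b):
        shift = len(a) - len(b)
        c = (a[-1] * inv_lead) % p
        q[shift] = c
        for i, bi in enumerate(b):
            a[i + shift] = (a[i + shift] - c * bi) % p
        a = trim(a)
    return trim(q), a

def inverse_mod(a, m):
    # extended Euclid: maintain u with u*a = r (mod m); assumes gcd(a, m) = 1
    r0, r1 = trim(m[:]), trim(a[:])
    u0, u1 = [], [1]
    while r1:
        q, r = polydivmod(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, polysub(u0, polymul(q, u1))
    c = pow(r0[0], p - 2, p)          # scale so that a*u = 1 exactly
    return polydivmod(polymul([c], u0), m)[1]

m = [1, 1, 0, 1]                # x^3 + x + 1
print(inverse_mod([1, 1], m))   # inverse of 1 + x  ->  [0, 1, 1], i.e. x + x^2
```

Running it on m = x^3 + x + 1 and a = 1 + x returns x + x^2, which is indeed the inverse of 1 + x in this field (compare Example 2.57).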
An important instance of Proposition 2.53 is the case that the modulus is
an irreducible polynomial.
Corollary 2.54. Let F be a field and let m ∈ F[x] be an irreducible polyno-
mial. Then the quotient ring F[x]/(m) is a field, i.e., every nonzero element
of F[x]/(m) has a multiplicative inverse.
Proof. Replacing m by a constant multiple, we may assume that m is a monic
polynomial. Let a ∈ F[x]/(m). There are two cases to consider. First, sup-
pose that gcd(a, m) = 1. Then Proposition 2.53 tells us that a is a unit, so
we are done. Second, suppose that d = gcd(a, m) ≠ 1. Then in particular,
we know that d | m. But m is monic and irreducible, and d ≠ 1, so we must
have d = m. We also know that d | a, so m | a. Hence a = 0 in F[x]/(m).
This completes the proof that every nonzero element of F[x]/(m) has a mul-
tiplicative inverse.
Example 2.55. The polynomial x^2 + 1 is irreducible in R[x]. The quotient
ring R[x]/(x^2 + 1) is a field. Indeed, it is the field of complex numbers C, where
the “variable” x plays the role of i = √−1, since in the ring R[x]/(x^2 + 1) we
have x^2 = −1.
By way of contrast, the polynomial x^2 − 1 is clearly not irreducible in R[x].
The quotient ring R[x]/(x^2 − 1) is not a field. In fact,
(x − 1) · (x + 1) = 0 in R[x]/(x^2 − 1).
Thus the ring R[x]/(x^2 − 1) has nonzero elements whose product is 0, which means
that they certainly cannot be units. (Nonzero elements of a ring whose product
is 0 are called zero divisors.)
If we apply Corollary 2.54 to a polynomial ring with coefficients in a finite
field Fp, we can create new finite fields with a prime power number of elements.
Corollary 2.56. Let Fp be a finite field and let m ∈ Fp[x] be an irreducible
polynomial of degree d ≥ 1. Then Fp[x]/(m) is a field with p^d elements.
Proof. We combine Corollary 2.54, which says that Fp[x]/(m) is a field, with
Corollary 2.52, which says that Fp[x]/(m) has p^d elements.
Example 2.57. It is not hard to check that the polynomial x^3 + x + 1 is
irreducible in F2[x] (see Exercise 2.37), so F2[x]/(x^3 + x + 1) is a field with
eight elements. Proposition 2.50 tells us that the following are representatives
for the eight elements in this field:
0, 1, x, x^2, 1 + x, 1 + x^2, x + x^2, 1 + x + x^2.
Addition is easy as long as you remember to treat the coefficients modulo 2,
so for example,
(1 + x) + (x + x^2) = 1 + x^2.
Multiplication is also easy: just multiply the polynomials, divide by x^3 + x + 1,
and take the remainder. For example,
(1 + x) · (x + x^2) = x + 2x^2 + x^3 = 1,
so 1 + x and x + x^2 are multiplicative inverses. The complete multiplication
table for F2[x]/(x^3 + x + 1) is described in Exercise 2.38.
Example 2.58. When is the polynomial x^2 + 1 irreducible in the ring Fp[x]?
If it is reducible, then it factors as
x^2 + 1 = (x + α)(x + β) for some α, β ∈ Fp.
Comparing coefficients, we find that α + β = 0 and αβ = 1; hence
α^2 = α · (−β) = −αβ = −1.
In other words, the field Fp has an element whose square is −1. Conversely,
if α ∈ Fp satisfies α^2 = −1, then x^2 + 1 = (x − α)(x + α) factors in Fp[x].
This proves that
x^2 + 1 is irreducible in Fp[x] if and only if −1 is not a square in Fp.
Quadratic reciprocity, which we study later in Sect. 3.9, then tells us that
x^2 + 1 is irreducible in Fp[x] if and only if p ≡ 3 (mod 4).
Let p be a prime satisfying p ≡ 3 (mod 4). Then the quotient ring
Fp[x]/(x^2 + 1) is a field containing p^2 elements. It contains an element x that
is a square root of −1. So we can view Fp[x]/(x^2 + 1) as a sort of analogue of
the complex numbers and can write its elements in the form
a + bi with a, b ∈ Fp,
where i is simply a symbol with the property that i^2 = −1. Addition, sub-
traction, multiplication, and division are performed just as in the complex
numbers, with the understanding that instead of real numbers as coefficients,
we are using integers modulo p. So for example, division is done by the usual
“rationalizing the denominator” trick,
(a + bi)/(c + di) = ((a + bi)/(c + di)) · ((c − di)/(c − di)) = ((ac + bd) + (bc − ad)i)/(c^2 + d^2).
Note that there is never a problem of 0 in the denominator, since the assump-
tion that p ≡ 3 (mod 4) ensures that c^2 + d^2 ≠ 0 (as long as at least one of c
and d is nonzero). These fields of order p^2 will be used in Sect. 6.9.3.
In order to construct a field with p^d elements, we need to find an irreducible
polynomial of degree d in Fp[x]. It is proven in more advanced texts that there
is always such a polynomial, and indeed generally many such polynomials.
Further, in a certain abstract sense it doesn’t matter which irreducible poly-
nomial we choose: we always get the same field. However, in a practical sense
it does make a difference, because practical computations in Fp[x]/(m) are
more efficient if m does not have very many nonzero coefficients.
We summarize some of the principal properties of finite fields in the fol-
lowing theorem.
Theorem 2.59. Let Fp be a finite field.
(a) For every d ≥ 1 there exists an irreducible polynomial m ∈ Fp[x] of
degree d.
(b) For every d ≥ 1 there exists a finite field with p^d elements.
(c) If F and F′ are finite fields with the same number of elements, then there
is a way to match the elements of F with the elements of F′ so that
the addition and multiplication tables of F and F′ are the same. (The
mathematical terminology is that F and F′ are isomorphic.)
Proof. We know from Corollary 2.56 that (a) implies (b). For proofs of (a)
and (c), see any basic algebra or number theory text, for example [40,
§§13.5, 14.3], [53, Section 7.1], or [59, Chapter 7].
Definition. We write Fpd for a field with p^d elements. Theorem 2.59 assures
us that there is at least one such field and that any two fields with p^d elements
are essentially the same, up to relabeling their elements. These fields are
also sometimes called Galois fields and denoted by GF(p^d) in honor of the
nineteenth-century French mathematician Évariste Galois, who studied them.
Remark 2.60. It is not difficult to prove that if F is a finite field, then F has p^d
elements for some prime p and some d ≥ 1. (The proof uses linear algebra;
see Exercise 2.41.) So Theorem 2.59 describes all finite fields.
Remark 2.61. For cryptographic purposes, it is frequently advantageous to
work in a field F2d , rather than in a field Fp with p large. This is due to the
fact that the binary nature of computers often enables them to work more
efficiently with F2d . A second reason is that sometimes it is useful to have
a finite field that contains smaller fields. In the case of Fpd , one can show
that every field Fpe with e | d is a subfield of Fpd . Of course, if one is going to
use F2d for Diffie–Hellman key exchange or Elgamal encryption, it is necessary
to choose 2^d to be of approximately the same size as one typically chooses p.
Exercises 107
Let F be a finite field having q elements. Every nonzero element of F has an
inverse, so the group of units F∗ is a group of order q − 1. Lagrange's theorem
(Theorem 2.13) tells us that every element of F∗ has order dividing q − 1, so
a^(q−1) = 1 for all a ∈ F∗.
This is a generalization of Fermat's little theorem (Theorem 1.24) to arbitrary
finite fields. The primitive root theorem (Theorem 1.30) is also true for all
finite fields.
Theorem 2.62. Let F be a finite field having q elements. Then F has a
primitive root, i.e., there is an element g ∈ F such that
F∗ = {1, g, g^2, g^3, . . . , g^(q−2)}.
Proof. You can find a proof of this theorem in any basic number theory text-
book; see for example [59, §4.1] or [137, Chapter 28].
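Both facts are easy to test numerically in a prime field. A brute-force Python sketch for q = 13 (the helper name `is_primitive_root` is ours):

```python
q = 13

# the generalized Fermat statement: a^(q-1) = 1 for every nonzero a
assert all(pow(a, q - 1, q) == 1 for a in range(1, q))

def is_primitive_root(g):
    # g is primitive exactly when 1, g, ..., g^(q-2) are all distinct
    return len({pow(g, k, q) for k in range(q - 1)}) == q - 1

print([g for g in range(1, q) if is_primitive_root(g)])  # [2, 6, 7, 11]
```

The four primitive roots found are the powers 2^k of a fixed primitive root with gcd(k, 12) = 1, in agreement with the primitive root theorem.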
Exercises
Section 2.1. Diffie–Hellman and RSA
2.1. Write a one page essay giving arguments, both pro and con, for the following
assertion:
If the government is able to convince a court that there is a valid reason
for their request, then they should have access to an individual’s private
keys (even without the individual’s knowledge), in the same way that
the government is allowed to conduct court authorized secret wiretaps
in cases of suspected criminal activity or threats to national security.
Based on your arguments, would you support or oppose the government being given
this power? How about without court oversight? The idea that all private keys should
be stored at a secure central location and be accessible to government agencies (with
or without suitably stringent legal conditions) is called key escrow.
2.2. Research and write a one to two page essay on the classification of cryptographic
algorithms as munitions under ITAR (International Traffic in Arms Regulations).
How does that act define “export”? What are the potential fines and jail terms for
those convicted of violating the Arms Export Control Act? Would teaching non-
classified cryptographic algorithms to a college class that includes non-US citizens
be considered a form of export? How has US government policy changed from the
early 1990s to the present?
Section 2.2. The Discrete Logarithm Problem
2.3. Let g be a primitive root for Fp.
(a) Suppose that x = a and x = b are both integer solutions to the congruence
g^x ≡ h (mod p). Prove that a ≡ b (mod p − 1). Explain why this implies that
the map (2.1) on page 65 is well-defined.
(b) Prove that logg(h1h2) = logg(h1) + logg(h2) for all h1, h2 ∈ F∗p.
(c) Prove that logg(h^n) = n logg(h) for all h ∈ F∗p and n ∈ Z.
2.4. Compute the following discrete logarithms.
(a) log2(13) for the prime 23, i.e., p = 23, g = 2, and you must solve the congruence
2^x ≡ 13 (mod 23).
(b) log10(22) for the prime p = 47.
(c) log627(608) for the prime p = 941. (Hint. Look in the second column of Table 2.1
on page 66.)
2.5. Let p be an odd prime and let g be a primitive root modulo p. Prove that a
has a square root modulo p if and only if its discrete logarithm logg(a) modulo p−1
is even.
Section 2.3. Diffie–Hellman Key Exchange
2.6. Alice and Bob agree to use the prime p = 1373 and the base g = 2 for a
Diffie–Hellman key exchange. Alice sends Bob the value A = 974. Bob asks your
assistance, so you tell him to use the secret exponent b = 871. What value B should
Bob send to Alice, and what is their secret shared value? Can you figure out Alice’s
secret exponent?
2.7. Let p be a prime and let g be an integer. The Decision Diffie–Hellman Problem
is as follows. Suppose that you are given three numbers A, B, and C, and suppose
that A and B are equal to
A ≡ g^a (mod p) and B ≡ g^b (mod p),
but that you do not necessarily know the values of the exponents a and b. Determine
whether C is equal to g^(ab) (mod p). Notice that this is different from the Diffie–
Hellman problem described on page 69. The Diffie–Hellman problem asks you to
actually compute the value of g^(ab).
(a) Prove that an algorithm that solves the Diffie–Hellman problem can be used to
solve the decision Diffie–Hellman problem.
(b) Do you think that the decision Diffie–Hellman problem is hard or easy? Why?
See Exercise 6.40 for a related example in which the decision problem is easy, but
it is believed that the associated computational problem is hard.
Section 2.4. The Elgamal Public Key Cryptosystem
2.8. Alice and Bob agree to use the prime p = 1373 and the base g = 2 for
communications using the Elgamal public key cryptosystem.
(a) Alice chooses a = 947 as her private key. What is the value of her public key A?
(b) Bob chooses b = 716 as his private key, so his public key is
B ≡ 2^716 ≡ 469 (mod 1373).
Alice encrypts the message m = 583 using the random element k = 877. What
is the ciphertext (c1, c2) that Alice sends to Bob?
(c) Alice decides to choose a new private key a = 299 with associated public key
A ≡ 2^299 ≡ 34 (mod 1373). Bob encrypts a message using Alice's public key
and sends her the ciphertext (c1, c2) = (661, 1325). Decrypt the message.
(d) Now Bob chooses a new private key and publishes the associated public key B =
893. Alice encrypts a message using this public key and sends the ciphertext
(c1, c2) = (693, 793) to Bob. Eve intercepts the transmission. Help Eve by
solving the discrete logarithm problem 2^b ≡ 893 (mod 1373) and using the value
of b to decrypt the message.
2.9. Suppose that Eve is able to solve the Diffie–Hellman problem described on
page 69. More precisely, assume that if Eve is given two powers g^u and g^v mod p,
then she is able to compute g^(uv) mod p. Show that Eve can break the Elgamal PKC.
2.10. This exercise describes a public key cryptosystem that requires Bob and Alice
to exchange several messages. We illustrate the system with an example.
Bob and Alice fix a publicly known prime p = 32611, and all of the other numbers
used are private. Alice takes her message m = 11111, chooses a random exponent
a = 3589, and sends the number u = m^a (mod p) = 15950 to Bob. Bob chooses a
random exponent b = 4037 and sends v = u^b (mod p) = 15422 back to Alice. Al-
ice then computes w = v^15619 ≡ 27257 (mod 32611) and sends w = 27257 to Bob.
Finally, Bob computes w^31883 (mod 32611) and recovers the value 11111 of Alice's
message.
(a) Explain why this algorithm works. In particular, Alice uses the numbers
a = 3589 and 15619 as exponents. How are they related? Similarly, how are
Bob’s exponents b = 4037 and 31883 related?
(b) Formulate a general version of this cryptosystem, i.e., using variables, and show
that it works in general.
(c) What is the disadvantage of this cryptosystem over Elgamal? (Hint. How many
times must Alice and Bob exchange data?)
(d) Are there any advantages of this cryptosystem over Elgamal? In particular, can
Eve break it if she can solve the discrete logarithm problem? Can Eve break it
if she can solve the Diffie–Hellman problem?
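For experimentation, the message flow of the example can be replayed in a few lines of Python (a sketch with the numbers above; `pow(x, -1, n)` for the modular inverse requires Python 3.8+, and the variable names are ours). Comparing the printed inverses with the second exponents in the example suggests the relationship that part (a) asks about:

```python
p = 32611                      # the public prime of the example
m = 11111                      # Alice's message
a, b = 3589, 4037              # secret exponents, both coprime to p - 1
a_inv = pow(a, -1, p - 1)      # a * a_inv = 1 (mod p - 1)
b_inv = pow(b, -1, p - 1)

u = pow(m, a, p)               # Alice -> Bob
v = pow(u, b, p)               # Bob -> Alice
w = pow(v, a_inv, p)           # Alice -> Bob: removes Alice's exponent
recovered = pow(w, b_inv, p)   # Bob removes his own exponent

print(a_inv, b_inv, recovered == m)   # 15619 31883 True
```

Recovery works because the product of all four exponents is congruent to 1 modulo p − 1, so by Fermat's little theorem the net effect on m is the identity.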
Section 2.5. An Overview of the Theory of Groups
2.11. The group S3 consists of the following six distinct elements
e, σ, σ^2, τ, στ, σ^2τ,
where e is the identity element and multiplication is performed using the rules
σ^3 = e, τ^2 = e, τσ = σ^2τ.
Compute the following values in the group S3:
(a) τσ^2 (b) τ(στ) (c) (στ)(στ) (d) (στ)(σ^2τ).
Is S3 a commutative group?
2.12. Let G be a group, let d ≥ 1 be an integer, and define a subset of G by
G[d] = {g ∈ G : g^d = e}.
(a) Prove that if g is in G[d], then g^(−1) is in G[d].
(b) Suppose that G is commutative. Prove that if g1 and g2 are in G[d], then their
product g1 ⋆ g2 is in G[d].
(c) Deduce that if G is commutative, then G[d] is a group.
(d) Show by an example that if G is not a commutative group, then G[d] need not
be a group. (Hint. Use Exercise 2.11.)
2.13. Let G and H be groups. A function φ : G → H is called a (group) homomor-
phism if it satisfies
φ(g1 ⋆ g2) = φ(g1) ⋆ φ(g2) for all g1, g2 ∈ G.
(Note that the product g1 ⋆ g2 uses the group law in the group G, while the prod-
uct φ(g1) ⋆ φ(g2) uses the group law in the group H.)
(a) Let eG be the identity element of G, let eH be the identity element of H, and
let g ∈ G. Prove that
φ(eG) = eH and φ(g^(−1)) = φ(g)^(−1).
(b) Let G be a commutative group. Prove that the map φ : G → G defined
by φ(g) = g^2 is a homomorphism. Give an example of a noncommutative group
for which this map is not a homomorphism.
(c) Same question as (b) for the map φ(g) = g^(−1).
2.14. Prove that each of the following maps is a group homomorphism.
(a) The map φ : Z → Z/NZ that sends a ∈ Z to a mod N in Z/NZ.
(b) The map φ : R∗ → GL2(R) defined by φ(a) = [ a 0 ; 0 a^(−1) ].
(c) The discrete logarithm map logg : F∗p → Z/(p−1)Z, where g is a primitive root
modulo p.
2.15. (a) Prove that GL2(Fp) is a group.
(b) Show that GL2(Fp) is a noncommutative group for every prime p.
(c) Describe GL2(F2) completely. That is, list its elements and describe the multi-
plication table.
(d) How many elements are there in the group GL2(Fp)?
(e) How many elements are there in the group GLn(Fp)?
Section 2.6. How Hard Is the Discrete Logarithm Problem?
2.16. Verify the following assertions from Example 2.16.
(a) x^2 + √x = O(x^2).
(b) 5 + 6x^2 − 37x^5 = O(x^5).
(c) k^300 = O(2^k).
(d) (ln k)^375 = O(k^0.001).
(e) k^2 2^k = O(e^(2k)).
(f) N^10 2^N = O(e^N).
Section 2.7. A Collision Algorithm for the DLP
2.17. Use Shanks’s babystep–giantstep method to solve the following discrete log-
arithm problems. (For (b) and (c), you may want to write a computer program
implementing Shanks’s algorithm.)
(a) 11^x = 21 in F71.
(b) 156^x = 116 in F593.
(c) 650^x = 2213 in F3571.
Section 2.8. The Chinese Remainder Theorem
2.18. Solve each of the following simultaneous systems of congruences (or explain
why no solution exists).
(a) x ≡ 3 (mod 7) and x ≡ 4 (mod 9).
(b) x ≡ 137 (mod 423) and x ≡ 87 (mod 191).
(c) x ≡ 133 (mod 451) and x ≡ 237 (mod 697).
(d) x ≡ 5 (mod 9), x ≡ 6 (mod 10), and x ≡ 7 (mod 11).
(e) x ≡ 37 (mod 43), x ≡ 22 (mod 49), and x ≡ 18 (mod 71).
2.19. Solve the 1700-year-old Chinese remainder problem from the Sun Tzu Suan
Ching stated on page 84.
2.20. Let a, b, m, n be integers with gcd(m, n) = 1. Let
c ≡ (b − a) · m^(−1) (mod n).
Prove that x = a + cm is a solution to
x ≡ a (mod m) and x ≡ b (mod n), (2.24)
and that every solution to (2.24) has the form x = a + cm + ymn for some y ∈ Z.
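The formula of this exercise is easy to test numerically; a short Python sketch (the function name is ours; `pow(m, -1, n)` requires Python 3.8+):

```python
def crt2(a, m, b, n):
    # solve x = a (mod m) and x = b (mod n), assuming gcd(m, n) = 1
    c = ((b - a) * pow(m, -1, n)) % n
    return a + c * m

x = crt2(3, 7, 4, 9)
print(x, x % 7, x % 9)   # 31 3 4
```

The same two-modulus step, applied repeatedly, solves systems with any number of pairwise coprime moduli.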
2.21. (a) Let a, b, c be positive integers and suppose that
a | c, b | c, and gcd(a, b) = 1.
Prove that ab | c.
(b) Let x = c and x = c′ be two solutions to the system of simultaneous congru-
ences (2.7) in the Chinese remainder theorem (Theorem 2.24). Prove that
c ≡ c′ (mod m1m2 · · · mk).
2.22. For those who have studied ring theory, this exercise sketches a short
proof of the Chinese remainder theorem. Let m1, . . . , mk be integers and let
m = m1m2 · · · mk be their product.
(a) Prove that the map
Z/mZ −→ Z/m1Z × Z/m2Z × · · · × Z/mkZ,
a mod m −→ (a mod m1, a mod m2, . . . , a mod mk),      (2.25)
is a well-defined homomorphism of rings. (Hint. First define a homomorphism
from Z to the right-hand side of (2.25), and then show that mZ is in the kernel.)
(b) Assume that m1, . . . , mk are pairwise relatively prime. Prove that the map
given by (2.25) is one-to-one. (Hint. What is the kernel?)
(c) Continuing with the assumption that the numbers m1, . . . , mk are pairwise
relatively prime, prove that the map (2.25) is onto. (Hint. Use (b) and count
the size of both sides.)
(d) Explain why the Chinese remainder theorem (Theorem 2.24) is equivalent to
the assertion that (b) and (c) are true.
2.23. Use the method described in Sect. 2.8.1 to find square roots modulo the
following composite moduli.
(a) Find a square root of 340 modulo 437. (Note that 437 = 19 · 23.)
(b) Find a square root of 253 modulo 3143.
(c) Find four square roots of 2833 modulo 4189. (The modulus factors as 4189 =
59 · 71. Note that your four square roots should be distinct modulo 4189.)
(d) Find eight square roots of 813 modulo 868.
2.24. Let p be an odd prime, let a be an integer that is not divisible by p, and
let b be a square root of a modulo p. This exercise investigates the square root of a
modulo powers of p.
(a) Prove that for some choice of k, the number b + kp is a square root of a mod-
ulo p^2, i.e., (b + kp)^2 ≡ a (mod p^2).
(b) The number b = 537 is a square root of a = 476 modulo the prime p = 1291.
Use the idea in (a) to compute a square root of 476 modulo p^2.
(c) Suppose that b is a square root of a modulo p^n. Prove that for some choice of j,
the number b + jp^n is a square root of a modulo p^(n+1).
(d) Explain why (c) implies the following statement: If p is an odd prime and if a
has a square root modulo p, then a has a square root modulo p^n for every power
of p. Is this true if p = 2?
(e) Use the method in (c) to compute the square root of 3 modulo 13^3, given
that 9^2 ≡ 3 (mod 13).
2.25. Suppose n = pq with p and q distinct odd primes.
(a) Suppose that gcd(a, pq) = 1. Prove that if the equation x^2 ≡ a (mod n) has
any solutions, then it has four solutions.
(b) Suppose that you had a machine that could find all four solutions for some
given a. How could you use this machine to factor n?
Section 2.9. The Pohlig–Hellman Algorithm
2.26. Let Fp be a finite field and let N | p − 1. Prove that F∗p has an element of
order N. This is true in particular for any prime power that divides p − 1. (Hint.
Use the fact that F∗p has a primitive root.)
2.27. Write out your own proof that the Pohlig–Hellman algorithm works in the
particular case that p − 1 = q1 · q2 is a product of two distinct primes. This provides
a good opportunity for you to understand how the proof works and to get a feel for
how it was discovered.
2.28. Use the Pohlig–Hellman algorithm (Theorem 2.31) to solve the discrete log-
arithm problem
g^x = a in Fp
in each of the following cases.
(a) p = 433, g = 7, a = 166.
(b) p = 746497, g = 10, a = 243278.
(c) p = 41022299, g = 2, a = 39183497. (Hint. p = 2 · 29^5 + 1.)
(d) p = 1291799, g = 17, a = 192988. (Hint. p − 1 has a factor of 709.)
Section 2.10. Rings, Quotient Rings, Polynomial Rings, and Finite Fields
2.29. Let R be a ring with the property that the only way that a product a · b can
be 0 is if a = 0 or b = 0. (In the terminology of Example 2.55, the ring R has no zero
divisors.) Suppose further that R has only finitely many elements. Prove that R is a
field. (Hint. Let a ∈ R with a ≠ 0. What can you say about the map R → R defined
by b → a · b?)
2.30. Let R be a ring. Prove the following properties of R directly from the ring
axioms described in Sect. 2.10.1.
(a) Prove that the additive identity element 0 ∈ R is unique, i.e., prove that there
is only one element in R satisfying 0 + a = a + 0 = a for every a ∈ R.
(b) Prove that the multiplicative identity element 1 ∈ R is unique.
(c) Prove that every element of R has a unique additive inverse.
(d) Prove that 0 ⋆ a = a ⋆ 0 = 0 for all a ∈ R.
(e) We denote the additive inverse of a by −a. Prove that −(−a) = a.
(f) Let −1 be the additive inverse of the multiplicative identity element 1 ∈ R.
Prove that (−1) ⋆ (−1) = 1.
(g) Prove that b | 0 for every nonzero b ∈ R.
(h) Prove that an element of R has at most one multiplicative inverse.
2.31. Let R and S be rings. A function φ : R → S is called a (ring) homomorphism
if it satisfies
φ(a + b) = φ(a) + φ(b) and φ(a ⋆ b) = φ(a) ⋆ φ(b) for all a, b ∈ R.
(a) Let 0R, 0S, 1R and 1S denote the additive and multiplicative identities of R
and S, respectively. Prove that
φ(0R) = 0S, φ(1R) = 1S, φ(−a) = −φ(a), φ(a^(−1)) = φ(a)^(−1),
where the last equality holds for those a ∈ R that have a multiplicative inverse.
(b) Let p be a prime, and let R be a ring with the property that pa = 0 for
every a ∈ R. (Here pa means to add a to itself p times.) Prove that the map
φ : R −→ R, φ(a) = a^p,
is a ring homomorphism. It is called the Frobenius homomorphism.
2.32. Prove Proposition 2.41.
2.33. Prove Proposition 2.43. (Hint. First use Exercise 2.32 to prove that the con-
gruence classes a + b and a ⋆ b depend only on the congruence classes of a and b.)
2.34. Let F be a field and let a and b be nonzero polynomials in F[x].
(a) Prove that deg(a · b) = deg(a) + deg(b).
(b) Prove that a has a multiplicative inverse in F[x] if and only if a is in F, i.e., if
and only if a is a constant polynomial.
(c) Prove that every nonzero element of F[x] can be factored into a product of
irreducible polynomials. (Hint. Use (a), (b), and induction on the degree of the
polynomial.)
(d) Let R be the ring Z/6Z. Give an example to show that (a) is false for some
polynomials a and b in R[x].
2.35. Let a and b be the polynomials
a = x^5 + 3x^4 − 5x^3 − 3x^2 + 2x + 2,
b = x^5 + x^4 − 2x^3 + 4x^2 + x + 5.
Use the Euclidean algorithm to compute gcd(a, b) in each of the following rings.
(a) F2[x] (b) F3[x] (c) F5[x] (d) F7[x].
2.36. Continuing with the same polynomials a and b as in Exercise 2.35, for each
of the polynomial rings (a)–(d) in Exercise 2.35, find polynomials u and v satisfying
a · u + b · v = gcd(a, b).
2.37. Prove that the polynomial x^3 + x + 1 is irreducible in F2[x]. (Hint. Think
about what a factorization would have to look like.)
2.38. The multiplication table for the field F2[x]/(x^3 + x + 1) is given in Table 2.5,
but we have omitted fourteen entries. Fill in the missing entries. (This is the field
described in Example 2.57. You can download and print a copy of Table 2.5 at
www.math.brown.edu/~jhs/MathCrypto/Table2.5.pdf.)
          |   0  |    1    |    x    |   x^2   |   1+x   |  1+x^2  |  x+x^2  | 1+x+x^2
  0       |   0  |    0    |    0    |    0    |    0    |    0    |    0    |    0
  1       |   0  |    1    |    x    |         |         |  1+x^2  |  x+x^2  | 1+x+x^2
  x       |   0  |    x    |   x^2   |         |  x+x^2  |    1    |         |  1+x^2
  x^2     |   0  |         |         |  x+x^2  | 1+x+x^2 |    x    |  1+x^2  |    1
  1+x     |   0  |         |  x+x^2  | 1+x+x^2 |  1+x^2  |         |    1    |    x
  1+x^2   |   0  |  1+x^2  |    1    |    x    |         | 1+x+x^2 |   1+x   |
  x+x^2   |   0  |  x+x^2  |         |  1+x^2  |    1    |   1+x   |    x    |
  1+x+x^2 |   0  | 1+x+x^2 |  1+x^2  |    1    |    x    |         |         |   1+x
Table 2.5: Multiplication table for the field F2[x]/(x^3 + x + 1)
2.39. The field F7[x]/(x^2 + 1) is a field with 49 elements, which for the moment we
denote by F49. (See Example 2.58 for a convenient way to work with F49.)
(a) Is 2 + 5x a primitive root in F49?
(b) Is 2 + x a primitive root in F49?
(c) Is 1 + x a primitive root in F49?
(Hint. Lagrange's theorem says that the order of u ∈ F∗49 must divide 48. So if u^k ≠ 1
for all proper divisors k of 48, then u is a primitive root.)
2.40. Let p be a prime number and let e ≥ 2. The quotient ring Z/p^e Z and the
finite field Fpe are both rings and both have the same number of elements. Describe
some ways in which they are intrinsically different.
2.41. Let F be a finite field.
(a) Prove that there is an integer m ≥ 1 such that if we add 1 to itself m times,
1 + 1 + · · · + 1  (m ones),
then we get 0. Note that here 1 and 0 are the multiplicative and additive identity
elements of the field F. If the notation is confusing, you can let u and z be the
multiplicative and additive identity elements of F, and then you need to prove
that u + u + · · · + u = z. (Hint. Since F is finite, the numbers 1, 1 + 1,
1 + 1 + 1, . . . cannot all be different.)
(b) Let m be the smallest positive integer with the property described in (a). Prove
that m is prime. (Hint. If m factors, show that there are nonzero elements
in F whose product is zero, so F cannot be a field.) This prime is called the
characteristic of the field F.
(c) Let p be the characteristic of F. Prove that F is a finite-dimensional vector
space over the field Fp of p elements.
(d) Use (c) to deduce that F has p^d elements for some d ≥ 1.
Chapter 3
Integer Factorization
and RSA
3.1 Euler’s Formula and Roots Modulo pq
The Diffie–Hellman key exchange method and the Elgamal public key
cryptosystem studied in Sects. 2.3 and 2.4 rely on the fact that it is easy
to compute powers a^n mod p, but difficult to recover the exponent n if you
know only the values of a and a^n mod p. An essential result that we used to
analyze the security of Diffie–Hellman and Elgamal is Fermat's little theorem
(Theorem 1.24),
a^(p−1) ≡ 1 (mod p) for all a ≢ 0 (mod p).
Fermat’s little theorem expresses a beautiful property of prime numbers.
It is natural to ask what happens if we replace p with a number m that is
not prime. Is it still true that a^(m−1) ≡ 1 (mod m)? A few computations such
as Example 1.28 in Sect. 1.4 will convince you that the answer is no. In this
as Example 1.28 in Sect. 1.4 will convince you that the answer is no. In this
section we investigate the correct generalization of Fermat’s little theorem
when m = pq is a product of two distinct primes, since this is the case that is
most important for cryptographic applications. We leave the general case for
you to do in Exercises 3.4 and 3.5.
As usual, we begin with an example. What do powers modulo 15 look like?
If we make a table of squares and cubes modulo 15, they do not look very
interesting, but many fourth powers are equal to 1 modulo 15. More precisely,
we find that
a^4 ≡ 1 (mod 15) for a = 1, 2, 4, 7, 8, 11, 13, and 14;
a^4 ≢ 1 (mod 15) for a = 3, 5, 6, 9, 10, and 12.
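This little table is easy to reproduce with a one-line Python check:

```python
# which a in 1, 2, ..., 14 satisfy a^4 = 1 (mod 15)?
good = [a for a in range(1, 15) if pow(a, 4, 15) == 1]
print(good)   # [1, 2, 4, 7, 8, 11, 13, 14]
```

The values that appear are exactly the residues relatively prime to 15, which is the observation developed in the next paragraph.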
© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography,
Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2 3
118 3. Integer Factorization and RSA
What distinguishes the list of numbers 1, 2, 4, 7, 8, 11, 13, 14 whose fourth
power is 1 modulo 15 from the list of numbers 3, 5, 6, 9, 10, 12, 15 whose fourth
power is not 1 modulo 15? A moment’s reflection shows that each of the
numbers 3, 5, 6, 9, 10, 12, 15 has a nontrivial factor in common with the modu-
lus 15, while the numbers 1, 2, 4, 7, 8, 11, 13, 14 are relatively prime to 15. This
suggests that some version of Fermat’s little theorem should be true if the
number a is relatively prime to the modulus m, but the correct exponent to
use is not necessarily m − 1.
For m = 15 we found that the right exponent is 4. Why does 4 work? We
could simply check each value of a, but a more enlightening argument would
be better. In order to show that a^4 ≡ 1 (mod 15), it is enough to check the
two congruences
a^4 ≡ 1 (mod 3) and a^4 ≡ 1 (mod 5). (3.1)
This is because the two congruences (3.1) say that
3 divides a^4 − 1 and 5 divides a^4 − 1,
which in turn imply that 15 divides a^4 − 1.
The two congruences in (3.1) are modulo primes, so we can use Fermat’s
little theorem to check that they are true. Thus
a^4 = (a^2)^2 = (a^(3−1))^2 ≡ 1^2 ≡ 1 (mod 3),
a^4 = a^(5−1) ≡ 1 (mod 5).
If you think about these two congruences, you will see that the crucial property
of the exponent 4 is that it is a multiple of p − 1 for both p = 3 and p = 5.
Notice that this is not true of 14, which does not work as an exponent. With
this observation, we are ready to state the fundamental formula that underlies
the RSA public key cryptosystem.
Theorem 3.1 (Euler’s Formula for pq). Let p and q be distinct primes and let
g = gcd(p − 1, q − 1).
Then
a^((p−1)(q−1)/g) ≡ 1 (mod pq) for all a satisfying gcd(a, pq) = 1.
In particular, if p and q are odd primes, then
a^((p−1)(q−1)/2) ≡ 1 (mod pq) for all a satisfying gcd(a, pq) = 1.
Proof. By assumption we know that p does not divide a and that g divides
q − 1, so we can compute

    a^((p−1)(q−1)/g) = (a^(p−1))^((q−1)/g)   since (q − 1)/g is an integer,
                     ≡ 1^((q−1)/g) (mod p)   since a^(p−1) ≡ 1 (mod p)
                                             from Fermat's little theorem,
                     ≡ 1 (mod p)             since 1 to any power is 1!

The exact same computation, reversing the roles of p and q, shows that

    a^((p−1)(q−1)/g) ≡ 1 (mod q).

This proves that a^((p−1)(q−1)/g) − 1 is divisible by both p and q; hence it is
divisible by pq, which completes the proof of Theorem 3.1.
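Theorem 3.1 is easy to check numerically. The following sketch (plain Python; the helper name euler_exponent is ours) verifies the opening example p = 3, q = 5, where the exponent (p − 1)(q − 1)/g equals 4:

```python
# Numerical check of Euler's formula for pq (Theorem 3.1) with p = 3, q = 5.
from math import gcd

def euler_exponent(p, q):
    """Return (p-1)(q-1)/g, the exponent appearing in Theorem 3.1."""
    g = gcd(p - 1, q - 1)
    return (p - 1) * (q - 1) // g

p, q = 3, 5
N = p * q
e = euler_exponent(p, q)          # e = 4 for p = 3, q = 5
good = [a for a in range(1, N + 1) if pow(a, e, N) == 1]
# Exactly the residues relatively prime to 15 have fourth power 1 mod 15.
assert good == [a for a in range(1, N + 1) if gcd(a, N) == 1]
```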
Diffie–Hellman key exchange and the Elgamal public key cryptosystem
(Sects. 2.3 and 2.4) rely for their security on the difficulty of solving equations
of the form
    a^x ≡ b (mod p),
where a, b, and p are known quantities, p is a prime, and x is the unknown vari-
able. The RSA public key cryptosystem, which we study in the next section,
relies on the difficulty of solving equations of the form
    x^e ≡ c (mod N),
where now the quantities e, c, and N are known and x is the unknown. In
other words, the security of RSA relies on the assumption that it is difficult
to take eth roots modulo N.
Is this a reasonable assumption? If the modulus N is prime, then it turns
out that it is comparatively easy to compute eth roots modulo N, as described
in the next proposition.
Proposition 3.2. Let p be a prime and let e ≥ 1 be an integer satisfying
gcd(e, p−1) = 1. Proposition 1.13 tells us that e has an inverse modulo p − 1,
say
de ≡ 1 (mod p − 1).
Then the congruence
    x^e ≡ c (mod p)                                                  (3.2)

has the unique solution x ≡ c^d (mod p).
Proof. If c ≡ 0 (mod p), then x ≡ 0 (mod p) is the unique solution
and we are done. So we assume that c ≢ 0 (mod p). The proof is then an
easy application of Fermat’s little theorem (Theorem 1.24). The congruence
de ≡ 1 (mod p − 1) means that there is an integer k such that
de = 1 + k(p − 1).
Now we check that c^d is a solution to x^e ≡ c (mod p):

    (c^d)^e ≡ c^(de) (mod p)            law of exponents,
            ≡ c^(1+k(p−1)) (mod p)      since de = 1 + k(p − 1),
            ≡ c · (c^(p−1))^k (mod p)   law of exponents again,
            ≡ c · 1^k (mod p)           from Fermat's little theorem,
            ≡ c (mod p).

This completes the proof that x = c^d is a solution to x^e ≡ c (mod p).
In order to see that the solution is unique, suppose that x1 and x2 are both
solutions to the congruence (3.2). We've just proven that z^(de) ≡ z (mod p) for
any nonzero value z, so we find that

    x1 ≡ x1^(de) ≡ (x1^e)^d ≡ c^d ≡ (x2^e)^d ≡ x2^(de) ≡ x2 (mod p).
Thus x1 and x2 are the same modulo p, so (3.2) has at most one solution.
Example 3.3. We solve the congruence
    x^1583 ≡ 4714 (mod 7919),
where the modulus p = 7919 is prime. Proposition 3.2 says that first we need
to solve the congruence
1583d ≡ 1 (mod 7918).
The solution, using the extended Euclidean algorithm (Theorem 1.11; see
also Remark 1.15 and Exercise 1.12), is d ≡ 5277 (mod 7918). Then Proposi-
tion 3.2 tells us that
    x ≡ 4714^5277 ≡ 6059 (mod 7919)

is a solution to x^1583 ≡ 4714 (mod 7919).
Remark 3.4. Proposition 3.2 includes the assumption that gcd(e, p − 1) = 1.
If this assumption is omitted, then the congruence x^e ≡ c (mod p) will have a
solution for some, but not all, values of c. Further, if it does have a solution,
then it will have more than one. See Exercise 3.2 for further details.
Proposition 3.2 shows that it is easy to take eth roots if the modulus is a
prime p. The situation for a composite modulus N looks similar, but there is a
crucial difference. If we know how to factor N, then it is again easy to compute
eth roots. The following proposition explains how to do this if N = pq is a
product of two primes. The general case is left for you to do in Exercise 3.6.
Proposition 3.5. Let p and q be distinct primes and let e ≥ 1 satisfy

    gcd(e, (p − 1)(q − 1)) = 1.
Proposition 1.13 tells us that e has an inverse modulo (p − 1)(q − 1), say
de ≡ 1 (mod (p − 1)(q − 1)).
Then the congruence
    x^e ≡ c (mod pq)                                                 (3.3)

has the unique solution x ≡ c^d (mod pq).
Proof. We assume that gcd(c, pq) = 1; see Exercise 3.3 for the other cases. The
proof of Proposition 3.5 is almost identical to the proof of Proposition 3.2, but
instead of using Fermat’s little theorem, we use Euler’s formula (Theorem 3.1).
The congruence de ≡ 1 (mod (p − 1)(q − 1)) means that there is an integer k
such that
de = 1 + k(p − 1)(q − 1).
Now we check that c^d is a solution to x^e ≡ c (mod pq):

    (c^d)^e ≡ c^(de) (mod pq)                  law of exponents,
            ≡ c^(1+k(p−1)(q−1)) (mod pq)       since de = 1 + k(p − 1)(q − 1),
            ≡ c · (c^((p−1)(q−1)))^k (mod pq)  law of exponents again,
            ≡ c · 1^k (mod pq)                 from Euler's formula (Theorem 3.1),
            ≡ c (mod pq).

This completes the proof that x = c^d is a solution to the congruence (3.3). It
remains to show that the solution is unique. Suppose that x = u is a solution
to (3.3). Then

    u ≡ u^(de−k(p−1)(q−1)) (mod pq)               since de = 1 + k(p − 1)(q − 1),
      ≡ (u^e)^d · (u^((p−1)(q−1)))^(−k) (mod pq)
      ≡ (u^e)^d · 1^(−k) (mod pq)                 using Euler's formula (Theorem 3.1),
      ≡ c^d (mod pq)                              since u is a solution to (3.3).

Thus every solution to (3.3) is equal to c^d (mod pq), so this is the unique
solution.
Remark 3.6. Proposition 3.5 gives an algorithm for solving x^e ≡ c (mod pq)
that involves first solving de ≡ 1 (mod (p − 1)(q − 1)) and then computing
c^d mod pq. We can often make the computation faster by using a smaller
value of d. Let g = gcd(p − 1, q − 1) and suppose that we solve the following
congruence for d:

    de ≡ 1 (mod (p − 1)(q − 1)/g).

Euler's formula (Theorem 3.1) says that a^((p−1)(q−1)/g) ≡ 1 (mod pq). Hence
just as in the proof of Proposition 3.5, if we write de = 1 + k(p − 1)(q − 1)/g,
then

    (c^d)^e = c^(de) = c^(1+k(p−1)(q−1)/g) = c · (c^((p−1)(q−1)/g))^k ≡ c (mod pq).

Thus using this smaller value of d, we still find that c^d mod pq is a solution
to x^e ≡ c (mod pq).
Example 3.7. We solve the congruence

    x^17389 ≡ 43927 (mod 64349),

where the modulus N = 64349 = 229 · 281 is a product of the two primes
p = 229 and q = 281. The first step is to solve the congruence

    17389d ≡ 1 (mod 63840),

where 63840 = (p − 1)(q − 1) = 228 · 280. The solution, using the method
described in Remark 1.15 or Exercise 1.12, is d ≡ 53509 (mod 63840). Then
Proposition 3.5 tells us that

    x ≡ 43927^53509 ≡ 14458 (mod 64349)

is the solution to x^17389 ≡ 43927 (mod 64349).

We can save ourselves a little bit of work by using the idea described in
Remark 3.6. We have

    g = gcd(p − 1, q − 1) = gcd(228, 280) = 4,

so (p − 1)(q − 1)/g = (228)(280)/4 = 15960, which means that we can find a
value of d by solving the congruence

    17389d ≡ 1 (mod 15960).

The solution is d ≡ 5629 (mod 15960), and then

    x ≡ 43927^5629 ≡ 14458 (mod 64349)

is the solution to x^17389 ≡ 43927 (mod 64349). Notice that we obtained
the same solution, as we should, but that we needed to raise 43927 to
only the 5629th power, while using Proposition 3.5 directly required us to
raise 43927 to the 53509th power. This saves some time, although not quite
as much as it looks, since recall that computing c^d mod N takes time O(ln d).
Thus the faster method takes about 80 % as long as the slower method, since
ln(5629)/ln(53509) ≈ 0.793.
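Both computations of Example 3.7 can be replayed in a few lines. A sketch, with our own helper root_mod_pq covering Proposition 3.5 and the smaller exponent of Remark 3.6 (pow with exponent -1 requires Python 3.8+):

```python
from math import gcd

def root_mod_pq(e, c, p, q, small_d=True):
    """Solve x^e ≡ c (mod pq) via Proposition 3.5; with small_d=True use
    the smaller decryption exponent of Remark 3.6."""
    m = (p - 1) * (q - 1)
    if small_d:
        m //= gcd(p - 1, q - 1)
    d = pow(e, -1, m)              # inverse of e modulo m (Python 3.8+)
    return pow(c, d, p * q), d

p, q, e, c = 229, 281, 17389, 43927
x1, d1 = root_mod_pq(e, c, p, q, small_d=False)
x2, d2 = root_mod_pq(e, c, p, q, small_d=True)
assert (d1, d2) == (53509, 5629)   # the two exponents found in Example 3.7
assert x1 == x2 == 14458           # same root either way
assert pow(x1, e, p * q) == c
```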
Example 3.8. Alice challenges Eve to solve the congruence

    x^9843 ≡ 134872 (mod 30069476293).

The modulus 30069476293 is not prime, since (cf. Example 1.28)

    2^(30069476293−1) ≡ 18152503626 ≢ 1 (mod 30069476293).

It happens that 30069476293 is a product of two primes, but if Eve does
not know the prime factors, she cannot use Proposition 3.5 to solve Alice's
challenge. After accepting Eve's concession of defeat, Alice informs Eve
that 30069476293 is equal to 104729 · 287117. With this new knowledge,
Alice's challenge becomes easy. Eve computes 104728 · 287116 = 30069084448,
solves the congruence 9843d ≡ 1 (mod 30069084448) to find d ≡ 18472798299
(mod 30069084448), and computes the solution

    x ≡ 134872^18472798299 ≡ 25470280263 (mod 30069476293).
                Bob                                    Alice
  Key creation
  Choose secret primes p and q.
  Choose encryption exponent e
    with gcd(e, (p − 1)(q − 1)) = 1.
  Publish N = pq and e.
  Encryption
                                         Choose plaintext m.
                                         Use Bob's public key (N, e)
                                           to compute c ≡ m^e (mod N).
                                         Send ciphertext c to Bob.
  Decryption
  Compute d satisfying
    ed ≡ 1 (mod (p − 1)(q − 1)).
  Compute m′ ≡ c^d (mod N).
  Then m′ equals the plaintext m.

Table 3.1: RSA key creation, encryption, and decryption
3.2 The RSA Public Key Cryptosystem
Bob and Alice have the usual problem of exchanging sensitive information
over an insecure communication line. We have seen in Chap. 2 various ways
in which Bob and Alice can accomplish this task, based on the difficulty of
solving the discrete logarithm problem. In this section we describe the RSA
public key cryptosystem, the first invented and certainly best known such
system. RSA is named after its (public) inventors, Ron Rivest, Adi Shamir,
and Leonard Adleman.
The security of RSA depends on the following dichotomy:
• Setup. Let p and q be large primes, let N = pq, and let e and c be
  integers.
• Problem. Solve the congruence x^e ≡ c (mod N) for the variable x.
• Easy. Bob, who knows the values of p and q, can easily solve for x as
  described in Proposition 3.5.
• Hard. Eve, who does not know the values of p and q, cannot easily find x.
• Dichotomy. Solving x^e ≡ c (mod N) is easy for a person who possesses
  certain extra information, but it is apparently hard for all other people.
The RSA public key cryptosystem is summarized in Table 3.1. Bob’s secret
key is a pair of large primes p and q. His public key is the pair (N, e) consisting
of the product N = pq and an encryption exponent e that is relatively prime
to (p − 1)(q − 1). Alice takes her plaintext and converts it into an integer m
between 1 and N. She encrypts m by computing the quantity
    c ≡ m^e (mod N).
The integer c is her ciphertext, which she sends to Bob. It is then a simple
matter for Bob to solve the congruence x^e ≡ c (mod N) to recover Alice's
message m, because Bob knows the factorization N = pq. Eve, on the other
hand, may intercept the ciphertext c, but unless she knows how to factor N,
she presumably has a difficult time trying to solve x^e ≡ c (mod N).
Example 3.9. We illustrate the RSA public key cryptosystem with a small
numerical example. Of course, this example is not secure, since the numbers
are so small that it would be easy for Eve to factor the modulus N. Secure
implementations of RSA use moduli N with hundreds of digits.
RSA Key Creation
• Bob chooses two secret primes p = 1223 and q = 1987. Bob computes his
public modulus
N = p · q = 1223 · 1987 = 2430101.
• Bob chooses a public encryption exponent e = 948047 with the property
that
gcd(e, (p − 1)(q − 1)) = gcd(948047, 2426892) = 1.
RSA Encryption
• Alice converts her plaintext into an integer

    m = 1070777 satisfying 1 ≤ m < N.

• Alice uses Bob's public key (N, e) = (2430101, 948047) to compute

    c ≡ m^e (mod N),   c ≡ 1070777^948047 ≡ 1473513 (mod 2430101).
• Alice sends the ciphertext c = 1473513 to Bob.
RSA Decryption
• Bob knows (p − 1)(q − 1) = 1222 · 1986 = 2426892, so he can solve
ed ≡ 1 (mod (p − 1)(q − 1)), 948047 · d ≡ 1 (mod 2426892),
for d and find that d = 1051235.
• Bob takes the ciphertext c = 1473513 and computes

    c^d (mod N),   1473513^1051235 ≡ 1070777 (mod 2430101).
The value that he computes is Alice’s message m = 1070777.
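The whole exchange of Example 3.9 fits in a few lines of Python. This sketch recomputes Bob's decryption exponent rather than assuming it (pow with exponent -1 requires Python 3.8+):

```python
# The toy RSA exchange of Example 3.9; p, q, e, and m are the example's numbers.
p, q = 1223, 1987
N = p * q                              # 2430101, Bob's public modulus
e = 948047                             # Bob's public encryption exponent
m = 1070777                            # Alice's plaintext

c = pow(m, e, N)                       # Alice encrypts: c ≡ m^e (mod N)
d = pow(e, -1, (p - 1) * (q - 1))      # Bob's decryption exponent (3.8+)
assert c == 1473513 and d == 1051235   # values computed in Example 3.9
assert pow(c, d, N) == m               # Bob recovers the plaintext
```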
Remark 3.10. The quantities N and e that form Bob’s public key are called,
respectively, the modulus and the encryption exponent. The number d that
Bob uses to decrypt Alice’s message, that is, the number d satisfying
ed ≡ 1 (mod (p − 1)(q − 1)), (3.4)
is called the decryption exponent. It is clear that encryption can be done
more efficiently if the encryption exponent e is a small number, and similarly,
decryption is more efficient if the decryption exponent d is small. Of course,
Bob cannot choose both of them to be small, since once one of them is selected,
the other is determined by the congruence (3.4). (This is not strictly true, since
if Bob takes e = 1, then also d = 1, so both d and e are small. But then the
plaintext and the ciphertext are identical, so taking e = 1 is a very bad idea!)
Notice that Bob cannot take e = 2, since he needs e to be relatively prime
to (p − 1)(q − 1). Thus the smallest possible value for e is e = 3. As far as is
known, taking e = 3 is as secure as taking a larger value of e, although some
doubts are raised in [22]. People who want fast encryption, but are worried
that e = 3 is too small, often take e = 2^16 + 1 = 65537, since it takes only
sixteen squarings and one multiplication to compute m^65537 via the square-
and-multiply algorithm described in Sect. 1.3.2.
An alternative is for Bob to use a small value for d and use the congru-
ence (3.4) to determine e, so e would be large. However, it turns out that
this may lead to an insecure version of RSA. More precisely, if d is smaller
than N^(1/4), then the theory of continued fractions allows Eve to break RSA.
See [17, 18, 19, 149] for details.
Remark 3.11. Bob’s public key includes the number N = pq, which is a
product of two secret primes p and q. Proposition 3.5 says that if Eve knows
the value of (p − 1)(q − 1), then she can solve x^e ≡ c (mod N), and thus can
decrypt messages sent to Bob.
Expanding (p − 1)(q − 1) gives
(p − 1)(q − 1) = pq − p − q + 1 = N − (p + q) + 1. (3.5)
Bob has published the value of N, so Eve already knows N. Thus if Eve
can determine the value of the sum p + q, then (3.5) gives her the value
of (p − 1)(q − 1), which enables her to decrypt messages.
In fact, if Eve knows the values of p + q and pq, then it is easy for her to
compute the values of p and q. She simply uses the quadratic formula to find
the roots of the polynomial
    X^2 − (p + q)X + pq,
since this polynomial factors as (X −p)(X −q), so its roots are p and q. Thus
once Bob publishes the value of N = pq, it is no easier for Eve to find the
value of (p − 1)(q − 1) than it is for her to find p and q themselves.
We illustrate with an example. Suppose that Eve knows that
N = pq = 66240912547 and (p − 1)(q − 1) = 66240396760.
She first uses (3.5) to compute
p + q = N + 1 − (p − 1)(q − 1) = 515788.
Then she uses the quadratic formula to factor the polynomial

    X^2 − (p + q)X + N = X^2 − 515788X + 66240912547
                       = (X − 241511)(X − 274277).
This gives her the factorization N = 66240912547 = 241511 · 274277.
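The computation in this example is a direct application of the quadratic formula; a sketch (the helper name factor_from_phi is ours):

```python
from math import isqrt

def factor_from_phi(N, phi):
    """Recover p, q from N = pq and phi = (p-1)(q-1), as in Remark 3.11."""
    s = N + 1 - phi                    # s = p + q, by equation (3.5)
    t = isqrt(s * s - 4 * N)           # t = p - q, from the quadratic formula
    return (s + t) // 2, (s - t) // 2

p, q = factor_from_phi(66240912547, 66240396760)
assert {p, q} == {241511, 274277}      # the factorization found by Eve
assert p * q == 66240912547
```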
Remark 3.12. One final, but very important, observation. We have shown that
it is no easier for Eve to determine (p − 1)(q − 1) than it is for her to factor N.
But this does not prove that Eve must factor N in order to decrypt Bob's
messages. The point is that what Eve really needs to do is to solve congruences
of the form x^e ≡ c (mod N), and conceivably there is an efficient algorithm to
solve such congruences without knowing the value of (p − 1)(q − 1). No one
knows whether such a method exists, although see [22] for a suggestion that
computing roots modulo N may be easier than factoring N.
3.3 Implementation and Security Issues
Our principal focus in this book is the mathematics of the hard problems
underlying modern cryptography, but we would be remiss if we did not at
least briefly mention some of the security issues related to implementation.
The reader should be aware that we do not even scratch the surface of this
vast and fascinating subject, but simply describe some examples to show that
there is far more to creating a secure communications system than simply
using a cryptosystem based on an intractable mathematical problem.
Example 3.13 (Woman-in-the-Middle Attack). Suppose that Eve is not simply
an eavesdropper, but that she has full control over Alice and Bob’s commu-
nication network. In this case, she can institute what is known as a man-in-
the-middle attack. We describe this attack for Diffie–Hellman key exchange,
but it exists for most public key constructions. (See Exercise 3.12.)
Recall that in Diffie–Hellman key exchange (Table 2.2), Alice sends Bob
the value A = g^a and Bob sends Alice the value B = g^b, where the compu-
tations take place in the finite field Fp. What Eve does is to choose her own
secret exponent e and compute the value E = g^e. She then intercepts Alice
and Bob's communications, and instead of sending A to Bob and sending B to
Alice, she sends both of them the number E. Notice that Eve has exchanged
the value A^e with Alice and the value B^e with Bob, while Alice and Bob be-
lieve that they have exchanged values with each other. The man-in-the-middle
attack is illustrated in Fig. 3.1.
    Alice  --- A = g^a --->  Eve  --- E = g^e --->  Bob
    Alice  <-- E = g^e ----  Eve  <-- B = g^b ----  Bob

Figure 3.1: "Man-in-the-middle" attack on Diffie–Hellman key exchange
Suppose that Alice and Bob subsequently use their supposed secret shared
value as the key for a symmetric cipher and send each other messages. For
example, Alice encrypts a plaintext message m using E^a as the symmetric
cipher key. Eve intercepts this message and is able to decrypt it using A^e
as the symmetric cipher key, so she can read Alice's message. She then re-
encrypts it using B^e as the symmetric cipher key and sends it to Bob. Since
Bob is then able to decrypt it using E^b as the symmetric cipher key, he is
unaware that there is a breach in security.
Notice the insidious nature of this attack. Eve does not solve the underly-
ing hard problem (in this case, the discrete logarithm problem or the Diffie–
Hellman problem), yet she is able to read Alice and Bob’s communications,
and they are not aware of her success.
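A toy simulation makes the attack concrete. The sketch below uses a small illustrative prime p = 941 with base g = 627 and arbitrary secret exponents; all of these numbers are our own choices for illustration:

```python
# Sketch of the man-in-the-middle attack of Fig. 3.1 on Diffie-Hellman.
p, g = 941, 627
a, b, e = 347, 781, 111          # Alice's, Bob's, and Eve's secret exponents

A, B, E = pow(g, a, p), pow(g, b, p), pow(g, e, p)
# Eve delivers E to both parties in place of A and B.
alice_key = pow(E, a, p)         # Alice thinks this is shared with Bob
bob_key = pow(E, b, p)           # Bob thinks this is shared with Alice
# Eve can compute both "shared" keys from the intercepted A and B,
# since E^a = g^(ea) = A^e and E^b = g^(eb) = B^e:
assert alice_key == pow(A, e, p)
assert bob_key == pow(B, e, p)
```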
Example 3.14. Suppose that Eve is able to convince Alice to decrypt “random”
RSA messages using her (Alice’s) private key. This is a plausible scenario,
since one way for Alice to authenticate her identity as the owner of the public
key (N, e) is to show that she knows how to decrypt messages. (One says that
Eve has access to an RSA oracle.)
Eve can exploit Alice's generosity as follows. Suppose that Eve has in-
tercepted a ciphertext c that Bob has sent to Alice. Eve chooses a random
value k and sends Alice the "message"

    c′ ≡ k^e · c (mod N).

Alice decrypts c′ and returns the resulting m′ to Eve, where

    m′ ≡ (c′)^d ≡ (k^e · c)^d ≡ (k^e · m^e)^d ≡ k · m (mod N).

Thus Eve knows the quantity k · m (mod N), and since she knows k, she
immediately recovers Bob's plaintext m.
There are two important observations to make. First, Eve has decrypted
Bob’s message without knowing or gaining knowledge of how to factor N, so
the difficulty of the underlying mathematical problem is irrelevant. Second,
since Eve has used k to mask Bob’s ciphertext, Alice has no way to tell that
Eve’s message is in any way related to Bob’s message. Thus Alice sees only
the values k^e · c (mod N) and k · m (mod N), which to her look random when
compared to c and m.
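Here is the blinding attack as a sketch, reusing the toy key of Example 3.9; the oracle function and the blinding value k = 12345 are our own illustrative choices:

```python
# Sketch of the chosen-ciphertext "blinding" attack of Example 3.14.
p, q, e = 1223, 1987, 948047
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # Alice's private exponent (Python 3.8+)

def alice_oracle(ct):
    """Alice obligingly decrypts any ciphertext Eve hands her."""
    return pow(ct, d, N)

m = 1070777                        # Bob's plaintext
c = pow(m, e, N)                   # ciphertext intercepted by Eve
k = 12345                          # Eve's random blinding value
blinded = (pow(k, e, N) * c) % N   # c' ≡ k^e · c (mod N)
revealed = alice_oracle(blinded)   # Alice returns k · m (mod N)
recovered = (revealed * pow(k, -1, N)) % N
assert recovered == m              # Eve now has Bob's plaintext
```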
Example 3.15. Suppose that Alice publishes two different exponents e1 and e2
for use with her public modulus N and that Bob encrypts a single plaintext m
using both of Alice's exponents. If Eve intercepts the ciphertexts

    c1 ≡ m^e1 (mod N)   and   c2 ≡ m^e2 (mod N),

she can take a solution to the equation

    e1 · u + e2 · v = gcd(e1, e2)

and use it to compute

    c1^u · c2^v ≡ (m^e1)^u · (m^e2)^v ≡ m^(e1·u+e2·v) ≡ m^gcd(e1,e2) (mod N).
If it happens that gcd(e1, e2) = 1, Eve has recovered the plaintext. (See
Exercise 3.13 for a numerical example.) More generally, if Bob encrypts a
single message using several exponents e1, e2, . . . , er, then Eve can recover the
plaintext if gcd(e1, e2, . . . , er) = 1. The moral is that Alice should use at most
one encryption exponent for a given modulus.
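A small demonstration of this attack, with illustrative exponents e1 = 5, e2 = 7 (our choice) on the toy modulus of Example 3.9; pow with a negative exponent requires Python 3.8 or later:

```python
# Common-modulus attack of Example 3.15: one message, two coprime exponents.
from math import gcd

N = 1223 * 1987
e1, e2 = 5, 7                      # gcd(e1, e2) = 1
m = 1070777
c1, c2 = pow(m, e1, N), pow(m, e2, N)

# A solution of e1*u + e2*v = 1: u = 3, v = -2, since 15 - 14 = 1.
u, v = 3, -2
assert e1 * u + e2 * v == gcd(e1, e2) == 1
recovered = (pow(c1, u, N) * pow(c2, v, N)) % N   # pow handles v < 0 (3.8+)
assert recovered == m
```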
3.4 Primality Testing
Bob has finished reading Sects. 3.2 and 3.3 and is now ready to communicate
with Alice using his RSA public/private key pair. Or is he? In order to create
an RSA key pair, Bob needs to choose two very large primes p and q. It’s not
enough for him to choose two very large, but possibly composite, numbers p
and q. In the first place, if p and q are not prime, Bob will need to know how
to factor them in order to decrypt Alice’s message. But even worse, if p and q
have small prime factors, then Eve may be able to factor pq and break Bob’s
system.
Bob is thus faced with the task of finding large prime numbers. More pre-
cisely, he needs a way of distinguishing between prime numbers and composite
numbers, since if he knows how to do this, then he can choose large random
numbers until he hits one that is prime. We discuss later (Sect. 3.4.1) the like-
lihood that a randomly chosen number is prime, but for now it is enough to
know that he has a reasonably good chance of success. Hence what Bob really
needs is an efficient way to tell whether a very large number is prime.
For example, suppose that Bob chooses the rather large number
n = 31987937737479355332620068643713101490952335301
and he wants to know whether n is prime. First Bob searches for small factors,
but he finds that n is not divisible by any primes smaller than 1000000. So
he begins to suspect that maybe n is prime. Next he computes the quantity
2^(n−1) mod n and he finds that

    2^(n−1) ≡ 1281265953551359064133601216247151836053160074 (mod n).  (3.6)
The congruence (3.6) immediately tells Bob that n is a composite number,
although it does not give him any indication of how to factor n. Why? Recall
Fermat's little theorem, which says that if p is prime, then a^(p−1) ≡ 1 (mod p)
(unless p divides a). Thus if n were prime, then the right-hand side of (3.6)
would equal 1; since it does not equal 1, Bob concludes that n is not prime.
Before continuing the saga of Bob’s quest for large primes, we state a
convenient version of Fermat’s little theorem that puts no restrictions on a.
Theorem 3.16 (Fermat’s Little Theorem, Version 2). Let p be a prime num-
ber. Then

    a^p ≡ a (mod p) for every integer a.                             (3.7)
Proof. If p ∤ a, then the first version of Fermat's little theorem (Theorem 1.24)
implies that a^(p−1) ≡ 1 (mod p). Multiplying both sides by a proves that (3.7)
is true. On the other hand, if p | a, then both sides of (3.7) are 0 modulo p.
Returning to Bob’s quest, we find him undaunted as he randomly chooses
another large number,
n = 2967952985951692762820418740138329004315165131. (3.8)
After checking for divisibility by small primes, Bob computes 2^n mod n and
finds that

    2^n ≡ 2 (mod n).                                                 (3.9)
Does (3.9) combined with Fermat's little theorem 3.16 prove that n is prime?
The answer is NO! Fermat's theorem works in only one direction:

    If p is prime, then a^p ≡ a (mod p).

There is nothing to prevent an equality such as (3.9) being true for composite
values of n, and indeed a brief search reveals examples such as

    2^341 ≡ 2 (mod 341) with 341 = 11 · 31.
However, in some vague philosophical sense, the fact that 2^n ≡ 2 (mod n)
makes it more likely that n is prime, since if the value of 2^n mod n had turned
out differently, we would have known that n was composite. This leads us to
make the following definition.
Definition. Fix an integer n. We say that an integer a is a witness for (the
compositeness of ) n if

    a^n ≢ a (mod n).
As we observed earlier, a single witness for n combined with Fermat’s little
theorem (Theorem 3.16) is enough to prove beyond a shadow of a doubt that n
is composite.¹
Thus one way to assess the likelihood that n is prime is to try
a lot of numbers a1, a2, a3, . . . . If any one of them is a witness for n, then Bob
knows that n is composite; and if none of them is a witness for n, then Bob
suspects, but does not know for certain, that n is prime.
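The witness test above is one line of code. A sketch (the helper name is_fermat_witness is ours), checked against the example 341 = 11 · 31:

```python
def is_fermat_witness(a, n):
    """a is a witness for n exactly when a^n is not congruent to a mod n."""
    return pow(a, n, n) != a % n

assert not is_fermat_witness(2, 341)   # 2^341 ≡ 2 (mod 341), yet 341 = 11·31
assert is_fermat_witness(3, 341)       # a = 3 proves that 341 is composite
```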
Unfortunately, intruding on this idyllic scene are barbaric numbers such
as 561. The number 561 is composite, 561 = 3·11·17, yet 561 has no witnesses!
In other words,
    a^561 ≡ a (mod 561) for every integer a.
Composite numbers having no witnesses are called Carmichael numbers, after
R.D. Carmichael, who in 1910 published a paper listing 15 such numbers.
The fact that 561 is a Carmichael number can be verified by checking each
value a = 0, 1, 2, . . . , 560, but see Exercise 3.14 for an easier method and for
more examples of Carmichael numbers. Although Carmichael numbers are
rather rare, Alford, Granville, and Pomerance [5] proved in 1994 that there
are infinitely many of them. So Bob needs something stronger than Fermat’s
little theorem in order to test whether a number is (probably) prime. What
is needed is a better test for compositeness. The following property of prime
numbers is used to formulate the Miller–Rabin test, which has the agreeable
property that every composite number has a large number of witnesses.
Proposition 3.17. Let p be an odd prime and write

    p − 1 = 2^k q with q odd.

Let a be any number not divisible by p. Then one of the following two condi-
tions is true:
(i) a^q is congruent to 1 modulo p.
(ii) One of a^q, a^(2q), a^(4q), . . . , a^(2^(k−1) q) is congruent to −1 modulo p.
Proof. Fermat's little theorem (Theorem 1.24) tells us that a^(p−1) ≡ 1 (mod p).
This means that when we look at the list of numbers

    a^q, a^(2q), a^(4q), . . . , a^(2^(k−1) q), a^(2^k q),

we know that the last number in the list, which equals a^(p−1), is congruent
to 1 modulo p. Further, each number in the list is the square of the previous
number. Therefore one of the following two possibilities must occur:
¹In the great courthouse of mathematics, witnesses never lie!
(i) The first number in the list is congruent to 1 modulo p.
(ii) Some number in the list is not congruent to 1 modulo p, but when it
is squared, it becomes congruent to 1 modulo p. But the only number
satisfying both

    b ≢ 1 (mod p) and b^2 ≡ 1 (mod p)

is −1, so one of the numbers in the list is congruent to −1 modulo p.
This completes the proof of Proposition 3.17.
Input. Integer n to be tested, integer a as potential witness.
1. If n is even or 1 < gcd(a, n) < n, return Composite.
2. Write n − 1 = 2^k q with q odd.
3. Set a = a^q (mod n).
4. If a ≡ 1 (mod n), return Test Fails.
5. Loop i = 0, 1, 2, . . . , k − 1
6.   If a ≡ −1 (mod n), return Test Fails.
7.   Set a = a^2 mod n.
8. End i loop.
9. Return Composite.

Table 3.2: Miller–Rabin test for composite numbers
Definition. Let n be an odd number and write n − 1 = 2^k q with q odd. An
integer a satisfying gcd(a, n) = 1 is called a Miller–Rabin witness for (the
compositeness of ) n if both of the following conditions are true:
(a) a^q ≢ 1 (mod n).
(b) a^(2^i q) ≢ −1 (mod n) for all i = 0, 1, 2, . . . , k − 1.
It follows from Proposition 3.17 that if there exists an a that is a Miller–
Rabin witness for n, then n is definitely a composite number. This leads to
the Miller–Rabin test for composite numbers described in Table 3.2.
Now suppose that Bob wants to check whether a large number n is prob-
ably a prime. To do this, he runs the Miller–Rabin test using a bunch of
randomly selected values of a. Why is this better than using the Fermat’s
little theorem test? The answer is that there are no Carmichael-like numbers
for the Miller–Rabin test, and in fact, every composite number has a lot of
Miller–Rabin witnesses, as described in the following proposition.
Proposition 3.18. Let n be an odd composite number. Then at least 75 % of
the numbers a between 1 and n − 1 are Miller–Rabin witnesses for n.
Proof. The proof is not hard, but we will not give it here. See for example [132,
Theorem 10.6].
Consider now Bob’s quest to identify large prime numbers. He takes his
potentially prime number n and he runs the Miller–Rabin test on n for (say)
10 different values of a. If any a value is a Miller–Rabin witness for n, then
Bob knows that n is composite. But suppose that none of his a values is a
Miller–Rabin witness for n. Proposition 3.18 says that if n were composite,
then each time Bob tries a value for a, he has at least a 75 % chance of
getting a witness. Since Bob found no witnesses in 10 tries, it is reasonable²
to conclude that the probability of n being composite is at most (25 %)^10,
which is approximately 10^−6. And if this is not good enough, Bob can use
100 different values of a, and if none of them proves n to be composite, then
the probability that n is actually composite is less than (25 %)^100 ≈ 10^−60.
Example 3.19. We illustrate the Miller–Rabin test with a = 2 and the number
n = 561, which, you may recall, is a Carmichael number. We factor

    n − 1 = 560 = 2^4 · 35

and then compute

    2^35     ≡ 263          (mod 561),
    2^(2·35) ≡ 263^2 ≡ 166  (mod 561),
    2^(4·35) ≡ 166^2 ≡ 67   (mod 561),
    2^(8·35) ≡ 67^2 ≡ 1     (mod 561).

The first number 2^35 mod 561 is neither 1 nor −1, and the other numbers in
the list are not equal to −1, so 2 is a Miller–Rabin witness to the fact that 561
is composite.
Example 3.20. We do a second example, taking n = 172947529 and

    n − 1 = 172947528 = 2^3 · 21618441.

We apply the Miller–Rabin test with a = 17 and find that

    17^21618441 ≡ 1 (mod 172947529).
²Unfortunately, although this deduction seems reasonable, it is not quite accurate. In
the language of probability theory, we need to compute the conditional probability that n
is composite given that the Miller–Rabin test fails 10 times; and we know the conditional
probability that the Miller–Rabin test succeeds at least 75 % of the time if n is composite.
See Sect. 5.3.2 for a discussion of conditional probabilities and Exercise 5.30 for a derivation
of the correct formula, which says that the probability (25 %)^10 must be approximately
multiplied by ln(n).
Thus 17 is not a Miller–Rabin witness for n. Next we try a = 3, but unfortu-
nately

    3^21618441 ≡ −1 (mod 172947529),

so 3 also fails to be a Miller–Rabin witness. At this point we might suspect
that n is prime, but if we try another value, say a = 23, we find that

    23^21618441     ≡ 40063806 (mod 172947529),
    23^(2·21618441) ≡ 2257065  (mod 172947529),
    23^(4·21618441) ≡ 1        (mod 172947529).

Thus 23 is a Miller–Rabin witness and n is actually composite. In fact, n is a
Carmichael number, but it's not so easy to factor (by hand).
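Table 3.2 translates line by line into code. A sketch (the function name miller_rabin is ours), checked against Examples 3.19 and 3.20:

```python
# Direct transcription of the Miller-Rabin test of Table 3.2.
from math import gcd

def miller_rabin(n, a):
    """Return "Composite" if a is a Miller-Rabin witness for n."""
    if n % 2 == 0 or 1 < gcd(a, n) < n:   # Step 1
        return "Composite"
    k, q = 0, n - 1
    while q % 2 == 0:                      # Step 2: n - 1 = 2^k q, q odd
        k, q = k + 1, q // 2
    a = pow(a, q, n)                       # Step 3
    if a == 1:                             # Step 4
        return "Test Fails"
    for _ in range(k):                     # Steps 5-8
        if a == n - 1:                     # a ≡ -1 (mod n)
            return "Test Fails"
        a = a * a % n
    return "Composite"                     # Step 9

assert miller_rabin(561, 2) == "Composite"          # Example 3.19
assert miller_rabin(172947529, 17) == "Test Fails"  # Example 3.20, a = 17
assert miller_rabin(172947529, 23) == "Composite"   # Example 3.20, a = 23
```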
3.4.1 The Distribution of the Set of Primes
If Bob picks a number at random, what is the likelihood that it is prime? The
answer is provided by one of number theory’s most famous theorems. In order
to state the theorem, we need a definition.
Definition. For any number X, let
π(X) = (# of primes p satisfying 2 ≤ p ≤ X).
For example, π(10) = 4, since the primes between 2 and 10 are 2, 3, 5, and 7.
Theorem 3.21 (The Prime Number Theorem).

    lim_{X→∞} π(X) / (X/ln(X)) = 1.
Proof. The prime number theorem was proven independently by Hadamard
and de la Vallée Poussin in 1896. The proof is unfortunately far beyond the
scope of this book. The most direct proof uses complex analysis; see for ex-
ample [7, Chapter 13].
Example 3.22. How many primes would we expect to find between 900000
and 1000000? The prime number theorem says that

    (Number of primes between 900000 and 1000000)
        = π(1000000) − π(900000)
        ≈ 1000000/ln 1000000 − 900000/ln 900000 = 6737.62 . . . .

In fact, it turns out that there are exactly 7224 primes between 900000
and 1000000.
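The exact count is easy to reproduce with a sieve; a sketch (helper names are ours):

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes; returns a list of all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ... in one slice assignment.
            sieve[i * i :: i] = bytearray((limit - i * i) // i + 1)
    return [n for n in range(limit + 1) if sieve[n]]

primes = primes_up_to(1_000_000)

def pi(X):
    """The prime-counting function π(X), for X up to the sieve limit."""
    return sum(1 for p in primes if p <= X)

assert pi(10) == 4                             # the primes 2, 3, 5, 7
assert pi(1_000_000) - pi(900_000) == 7224     # exact count from Example 3.22
```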
For cryptographic purposes, we need even larger primes. For example,
we might want to use primes having approximately 300 decimal digits, or
almost equivalently, primes that are 1024 bits in length, since 2^1024 ≈ 10^308.25.
How many primes p satisfy 2^1023 < p < 2^1024? The prime number theorem
gives us an answer:

    # of 1024 bit primes = π(2^1024) − π(2^1023)
                         ≈ 2^1024/ln(2^1024) − 2^1023/ln(2^1023) ≈ 2^1013.53.
So there should be lots of primes in this interval.
Intuitively, the prime number theorem says that if we look at all of the
numbers between 1 and X, then the proportion of them that are prime is
approximately 1/ ln(X). Turning this statement around, the prime number
theorem says:
    A randomly chosen number N has
    probability 1/ln(N) of being prime.                              (3.10)
Of course, taken at face value, statement (3.10) is utter nonsense. A chosen
number either is prime or is not prime; it cannot be partially prime and
partially composite! A better interpretation of (3.10) is that it describes how
many primes one expects to find in an interval around N. See Exercise 3.19 for
a more precise statement of (3.10) that is both meaningful and mathematically
correct.
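One way to see what (3.10) is really asserting is to compare exact prime counts against X/ln(X). The following sketch is our illustration, not from the text; it counts primes with a simple sieve of Eratosthenes:

```python
from math import isqrt, log

def prime_pi(X):
    """pi(X): the number of primes p with 2 <= p <= X (sieve of Eratosthenes)."""
    if X < 2:
        return 0
    sieve = bytearray([1]) * (X + 1)    # sieve[n] = 1 while n is possibly prime
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(X) + 1):
        if sieve[p]:
            # cross out the multiples p*p, p*p + p, ...
            sieve[p * p :: p] = bytearray(len(range(p * p, X + 1, p)))
    return sum(sieve)

print(prime_pi(10))                     # 4, as in the definition above
for X in (10**4, 10**5, 10**6):
    print(X, prime_pi(X), round(prime_pi(X) / (X / log(X)), 3))
```

The ratio in the last column creeps toward 1, as the prime number theorem predicts, and prime_pi(1000000) − prime_pi(900000) reproduces the count 7224 of Example 3.22.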
Example 3.23. We illustrate statement (3.10) and the prime number theorem
by searching for 1024-bit primes, i.e., primes that are approximately 2^1024.
Statement (3.10) says that the probability that a random number N ≈ 2^1024 is prime is approximately 0.14 %. Thus on average, Bob checks about 700 randomly chosen numbers of this size before finding a prime.
If he is clever, Bob can do better. He knows that he doesn’t want a number
that is even, nor does he want a number that is divisible by 3, nor divisible
by 5, etc. Thus rather than choosing numbers completely at random, Bob
might restrict attention (say) to numbers that are relatively prime to 2, 3, 5, 7
and 11. To do this, he first chooses a random number that is relatively prime
to 2·3·5·7·11 = 2310, say he chooses 1139. Then he considers only numbers N
of the form
N = 2 · 3 · 5 · 7 · 11 · K + 1139 = 2310K + 1139. (3.11)
The probability that an N of this form is prime is approximately (see Exercise 3.20)
(2/1) · (3/2) · (5/4) · (7/6) · (11/10) · 1/ln(N) ≈ 4.8/ln(N).
So if Bob chooses a random number N of the form (3.11) with N ≈ 2^1024,
then the probability that it is prime is approximately 0.67 %. Thus he only
needs to check 150 numbers to have a good chance of finding a prime.
We used the Miller–Rabin test with 100 randomly chosen values of a to
check the primality of
2310K + 1139 for each 2^1013 ≤ K ≤ 2^1013 + 1000.
+ 1000.
We found that 2310(2^1013 + J) + 1139 is probably prime for the following 12 values of J:
J ∈ {41, 148, 193, 251, 471, 585, 606, 821, 851, 865, 910, 911}.
This is a bit better than the 7 values predicted by the prime number theorem.
The smallest probable prime that we found is 2310 · (2^1013 + 41) + 1139, which is equal to the following 308-digit number:
20276714558261473373313940384387925462194955182405899331133959349334105522983
75121272248938548639688519470034484877532500936544755670421865031628734263599742737518719
78241831537235413710389881550750303525056818030281312537212445925881220354174468221605146
327969430834440565497127875070636801598203824198219369.
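This experiment is easy to rerun. The sketch below is our code, not the book's: a standard Miller–Rabin test with randomly chosen bases (cf. Sect. 3.4), applied to the candidates 2310 · (2^1013 + J) + 1139 for 0 ≤ J ≤ 1000. It takes a few seconds in Python.

```python
import random

def is_probable_prime(n, num_trials=20):
    """Miller-Rabin test with num_trials randomly chosen bases a."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # write n - 1 = 2^k * q with q odd
    k, q = 0, n - 1
    while q % 2 == 0:
        k, q = k + 1, q // 2
    for _ in range(num_trials):
        a = random.randrange(2, n - 1)
        x = pow(a, q, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(k - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False        # a is a Miller-Rabin witness: n is composite
    return True                 # n passed every round: probably prime

js = [J for J in range(1001)
      if is_probable_prime(2310 * (2**1013 + J) + 1139)]
print(js)    # the 12 values of J listed in the text
```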
Remark 3.24. There are many deep open questions concerning the distribution
of prime numbers, of which the most important and famous is certainly the
Riemann hypothesis.³ The usual way to state the Riemann hypothesis requires
some complex analysis. The Riemann zeta function ζ(s) is defined by the
series
ζ(s) = Σ_{n=1}^{∞} 1/n^s,
which converges when s is a complex number with real part greater than 1. It
has an analytic continuation to the entire complex plane with a simple pole
at s = 1 and no other poles. The Riemann hypothesis says that if ζ(σ+it) = 0
with σ and t real and 0 ≤ σ ≤ 1, then in fact σ = 1/2.
At first glance, this somewhat bizarre statement appears to have little
relation to prime numbers. However, it is not hard to show that ζ(s) is also
equal to the product
ζ(s) = ∏_{p prime} (1 − 1/p^s)^{−1},
so ζ(s) incorporates information about the set of prime numbers.
There are many statements about prime numbers that are equivalent to
the Riemann hypothesis. For example, recall that the prime number theorem
(Theorem 3.21) says that π(X) is approximately equal to X/ ln(X) for large
values of X. The Riemann hypothesis is equivalent to the following more
accurate statement:
π(X) = ∫_2^X dt/ln(t) + O(√X · ln(X)). (3.12)
This conjectural formula is stronger than the prime number theorem, since
the integral is approximately equal to X/ ln(X). (See Exercise 3.21.)
³The Riemann hypothesis is another of the $1,000,000 Millennium Prize problems.
3.4.2 Primality Proofs Versus Probabilistic Tests
The Miller–Rabin test is a powerful and practical method for finding large
numbers that are “probably prime.” Indeed, Proposition 3.18 says that every
composite number has many Miller–Rabin witnesses, so 50 or 100 repetitions
of the Miller–Rabin test provide solid evidence that n is prime. However, there
is a difference between evidence for a statement and a rigorous proof that the
statement is correct. Suppose that Bob is not satisfied with mere evidence.
He wants to be completely certain that his chosen number n is prime.
In principle, nothing could be simpler. Bob checks to see whether n is divisible by any of the numbers 2, 3, 4, . . . up to √n. If none of these numbers divides n, then Bob knows, with complete certainty, that n is prime. Unfortunately, if n is large, say n ≈ 2^1000, then the sun will have burnt out before Bob finishes his task. Notice that the running time of this naive algorithm is O(√n), which means that it is an exponential-time algorithm according to the definition in Sect. 2.6, since √n is exponential in the number of bits required to write down the number n.
It would be nice if we could use the Miller–Rabin test to efficiently and
conclusively prove that a number n is prime. More precisely, we would like a
polynomial-time algorithm that proves primality. If a generalized version of
the Riemann hypothesis is true, then the following proposition says that this
can be done. (We discussed the Riemann hypothesis in Remark 3.24.)
Proposition 3.25. If a generalized version of the Riemann hypothesis is
true, then every composite number n has a Miller–Rabin witness a for its
compositeness satisfying
a ≤ 2(ln n)^2.
Proof. See [87] for a proof that every composite number n has a witness satisfying a = O((ln n)^2), and [9, 10] for the more precise estimate a ≤ 2(ln n)^2.
Thus if the generalized Riemann hypothesis is true, then we can prove
that n is prime by applying the Miller–Rabin test using every a smaller
than 2(ln n)^2. If some a proves that n is composite, then n is composite,
and otherwise, Proposition 3.25 tells us that n is prime. Unfortunately, the
proof of Proposition 3.25 assumes that the generalized Riemann hypothesis
is true, and no one has yet been able to prove even the original Riemann
hypothesis, despite almost 150 years of work on the problem.
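In code, this conditional primality proof looks as follows. The sketch is ours, with a hypothetical function name; a True answer is a proof of primality only if the generalized Riemann hypothesis holds, while a False answer is an unconditional proof of compositeness.

```python
from math import log

def is_prime_assuming_grh(n):
    """Deterministic Miller-Rabin: try every base a <= 2*(ln n)^2
    (conclusive by Proposition 3.25 if the generalized RH is true)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    k, q = 0, n - 1                     # n - 1 = 2^k * q with q odd
    while q % 2 == 0:
        k, q = k + 1, q // 2
    for a in range(2, min(n - 1, int(2 * log(n) ** 2)) + 1):
        x = pow(a, q, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(k - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                # a is a witness: n is composite
    return True                         # no witness below the GRH bound

print(is_prime_assuming_grh(2741), is_prime_assuming_grh(9788111))  # True False
```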
After the creation of public key cryptography, and especially after the
publication of the RSA cryptosystem in 1978, it became of great interest to
find a polynomial-time primality test that did not depend on any unproven
hypotheses. Many years of research culminated in 2002, when M. Agrawal, N.
Kayal, and N. Saxena [1] found such an algorithm. Subsequent improvements
to their algorithm have given the following result.
Theorem 3.26 (AKS Primality Test). For every ε > 0, there is an algorithm that conclusively determines whether a given number n is prime in no more than O((ln n)^{6+ε}) steps.
Proof. The original algorithm was published in [1]. Further analysis and refine-
ments may be found in [76]. The monograph [36] contains a nice description
of primality testing, including the AKS test.
Remark 3.27. The result described in Theorem 3.26 represents a triumph of
modern algorithmic number theory. The significance for practical cryptography is less clear, since the AKS algorithm is much slower than the Miller–
Rabin test. In practice, most people are willing to accept that a number is
prime if it passes the Miller–Rabin test for (say) 50–100 randomly chosen
values of a.
3.5 Pollard’s p − 1 Factorization Algorithm
We saw in Sect. 3.4 that it is relatively easy to check whether a large number
is (probably) prime. This is good, since the RSA cryptosystem needs large
primes in order to operate.
Conversely, the security of RSA relies on the apparent difficulty of factor-
ing large numbers. The study of factorization dates back at least to ancient
Greece, but it was only with the advent of computers that people started to
develop algorithms capable of factoring very large numbers. The paradox of
RSA is that in order to make RSA more efficient, we want to use a modu-
lus N = pq that is as small as possible. On the other hand, if an opponent
can factor N, then our encrypted messages are not secure. It is thus vital
to understand how hard it is to factor large numbers, and in particular, to
understand the capabilities of the different algorithms that are currently used
for factorization.
In the next few sections we discuss, with varying degrees of detail, some
of the known methods for factoring large integers. A further method using
elliptic curves is described in Sect. 6.6. Those readers interested in pursuing
this subject might consult [28, 34, 109, 150] and the references cited in those
works.
We begin with an algorithm called Pollard’s p − 1 method. Although not
useful for all numbers, there are certain types of numbers for which it is quite
efficient. Pollard’s method demonstrates that there are insecure RSA moduli
that at first glance appear to be secure. This alone warrants the study of
Pollard’s method. In addition, the p − 1 method provides the inspiration for
Lenstra’s elliptic curve factorization method, which we study later, in Sect. 6.6.
We are presented with a number N = pq and our task is to determine
the prime factors p and q. Suppose that by luck or hard work or some other
method, we manage to find an integer L with the property that
p − 1 divides L and q − 1 does not divide L.
This means that there are integers i, j, and k with k ≠ 0 satisfying
L = i(p − 1) and L = j(q − 1) + k.
Consider what happens if we take a randomly chosen integer a and compute a^L. Fermat's little theorem (Theorem 1.24) tells us that⁴
a^L = a^{i(p−1)} = (a^{p−1})^i ≡ 1^i ≡ 1 (mod p),
a^L = a^{j(q−1)+k} = a^k · (a^{q−1})^j ≡ a^k · 1^j ≡ a^k (mod q).
The exponent k is not equal to 0, so it is quite unlikely that a^k will be congruent to 1 modulo q. Thus with very high probability, i.e., for most choices of a, we find that
p divides a^L − 1 and q does not divide a^L − 1.
But this is wonderful, since it means that we can recover p via the simple gcd computation
p = gcd(a^L − 1, N).
This is all well and good, but where, you may ask, can we find an expo-
nent L that is divisible by p − 1 and not by q − 1? Pollard’s observation is
that if p − 1 happens to be a product of many small primes, then it will di-
vide n! for some not-too-large value of n. So here is the idea. For each number
n = 2, 3, 4, . . . we choose a value of a and compute
gcd(a^{n!} − 1, N).
(In practice, we might simply take a = 2.) If the gcd is equal to 1, then we
go on to the next value of n. If the gcd ever equals N, then we’ve been quite
unlucky, but a different a value will probably work. And if we get a number
strictly between 1 and N, then we have a nontrivial factor of N and we’re
done.
Remark 3.28. There are two important remarks to make before we put
Pollard's idea into practice. The first concerns the quantity a^{n!} − 1. Even for a = 2 and quite moderate values of n, say n = 100, it is not feasible to compute a^{n!} − 1 exactly. Indeed, the number 2^{100!} has more than 10^157 digits, which is larger than the number of elementary particles in the known universe! Luckily, there is no need to compute it exactly. We are interested only in the greatest common divisor of a^{n!} − 1 and N, so it suffices to compute
a^{n!} − 1 (mod N)
⁴We have assumed that p ∤ a and q ∤ a, since if p and q are very large, this will almost certainly be the case. Further, if by some chance p | a and q ∤ a, then we can recover p as p = gcd(a, N).
and then take the gcd with N. Thus we never need to work with numbers
larger than N.
Second, we do not even need to compute the exponent n!. Instead, assuming that we have already computed a^{n!} mod N in the previous step, we can compute the next value as
a^{(n+1)!} ≡ (a^{n!})^{n+1} (mod N).
This leads to the algorithm described in Table 3.3.
Remark 3.29. How long does it take to compute the value of a^{n!} mod N? The fast exponentiation algorithm described in Sect. 1.3.2 gives a method for computing a^k mod N in at most 2 log₂(k) steps, where each step is a multiplication modulo N. Stirling's formula⁵ says that if n is large, then n! is approximately equal to (n/e)^n. So we can compute a^{n!} mod N in 2n log₂(n) steps. Thus it is feasible to compute a^{n!} mod N for reasonably large values of n.
Input. Integer N to be factored.
1. Set a = 2 (or some other convenient value).
2. Loop j = 2, 3, 4, . . . up to a specified bound.
3. Set a = a^j mod N.
4. Compute d = gcd(a − 1, N).†
5. If 1 < d < N then success, return d.
6. Increment j and loop again at Step 2.
† For added efficiency, choose an appropriate k and compute the gcd in Step 4 only every kth iteration.
Table 3.3: Pollard's p − 1 factorization algorithm
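Table 3.3 translates almost line for line into code. A minimal sketch (ours, not the book's; for simplicity it computes the gcd on every iteration rather than every kth):

```python
from math import gcd

def pollard_p_minus_1(N, bound=100):
    """Pollard's p-1 algorithm (Table 3.3): after the j-th pass through the
    loop, a = 2^(j!) mod N, and we test gcd(a - 1, N) for a factor."""
    a = 2
    for j in range(2, bound + 1):
        a = pow(a, j, N)        # a = 2^(j!) mod N
        d = gcd(a - 1, N)
        if 1 < d < N:
            return d            # success: d is a nontrivial factor of N
    return None                 # failure up to the given bound
```

Calling pollard_p_minus_1(13927189) returns 3823 (found at j = 14), and pollard_p_minus_1(168441398857) returns 350437 (found at j = 53), matching Examples 3.30 and 3.31 below.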
Example 3.30. We use Pollard's p − 1 method to factor N = 13927189. Starting with gcd(2^{9!} − 1, N) and taking successively larger factorials in the exponent, we find that
2^{9!} − 1 ≡ 13867883 (mod 13927189), gcd(2^{9!} − 1, 13927189) = 1,
2^{10!} − 1 ≡ 5129508 (mod 13927189), gcd(2^{10!} − 1, 13927189) = 1,
2^{11!} − 1 ≡ 4405233 (mod 13927189), gcd(2^{11!} − 1, 13927189) = 1,
2^{12!} − 1 ≡ 6680550 (mod 13927189), gcd(2^{12!} − 1, 13927189) = 1,
2^{13!} − 1 ≡ 6161077 (mod 13927189), gcd(2^{13!} − 1, 13927189) = 1,
2^{14!} − 1 ≡ 879290 (mod 13927189), gcd(2^{14!} − 1, 13927189) = 3823.
The final line gives us a nontrivial factor p = 3823 of N. This factor is prime, and the other factor q = N/p = 13927189/3823 = 3643 is also prime.
⁵Stirling's formula says more precisely that ln(n!) = n ln(n) − n + (1/2) ln(2πn) + O(1/n).
The reason that an exponent of 14! worked in this instance is that p − 1 factors
into a product of small primes,
p − 1 = 3822 = 2 · 3 · 7^2 · 13.
The other factor satisfies q − 1 = 3642 = 2 · 3 · 607, which is not a product of
small primes.
Example 3.31. We present one further example using larger numbers. Let N = 168441398857. Then
2^{50!} − 1 ≡ 114787431143 (mod N), gcd(2^{50!} − 1, N) = 1,
2^{51!} − 1 ≡ 36475745067 (mod N), gcd(2^{51!} − 1, N) = 1,
2^{52!} − 1 ≡ 67210629098 (mod N), gcd(2^{52!} − 1, N) = 1,
2^{53!} − 1 ≡ 8182353513 (mod N), gcd(2^{53!} − 1, N) = 350437.
So using 2^{53!} − 1 yields the prime factor p = 350437 of N, and the other (prime) factor is 480661. We were lucky, of course, that p − 1 is a product of small factors,
p − 1 = 350436 = 2^2 · 3 · 19 · 29 · 53.
Remark 3.32. Notice that it is easy for Bob and Alice to avoid the dangers of
Pollard’s p − 1 method when creating RSA keys. They simply check that their
chosen secret primes p and q have the property that neither p − 1 nor q − 1
factors entirely into small primes. From a cryptographic perspective, the im-
portance of Pollard’s method lies in the following lesson. Most people would
not expect, at first glance, that factorization properties of p − 1 and q − 1
should have anything to do with the difficulty of factoring pq. The moral is
that even if we build a cryptosystem based on a seemingly hard problem such
as integer factorization, we must be wary of special cases of the problem that,
for subtle and nonobvious reasons, are easier to solve than the general case.
We have already seen an example of this in the Pohlig–Hellman algorithm for
the discrete logarithm problem (Sect. 2.9), and we will see it again later when
we discuss elliptic curves and the elliptic curve discrete logarithm problem.
Remark 3.33. We have not yet discussed the likelihood that Pollard’s p − 1
algorithm succeeds. Suppose that p and q are randomly chosen primes of about
the same size. Pollard’s method works if at least one of p − 1 or q − 1 factors
entirely into a product of small prime powers. Clearly p − 1 is even, so we
can pull off a factor of 2, but after that, the quantity (p − 1)/2 should behave more or less like a random number of size approximately p/2. This leads to the following question:
What is the probability that a randomly chosen integer of
size approximately n divides B! (B-factorial)?
Notice in particular that if n divides B!, then every prime ℓ dividing n must satisfy ℓ ≤ B. A number whose prime factors are all less than or equal to B
is called a B-smooth number. It is thus natural to ask for the probability that
a randomly chosen integer of size approximately n is a B-smooth number.
Turning this question around, we can also ask:
Given n, how large should we choose B so that a randomly
chosen integer of size approximately n has a reasonably
good probability of being a B-smooth number?
The efficiency (or lack thereof) of all modern methods of integer factoriza-
tion is largely determined by the answer to this question. We study smooth
numbers in Sect. 3.7.
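Smoothness is cheap to test on numbers of modest size. The following trial-division sketch is ours (fine for experiments, but far too slow at cryptographic sizes):

```python
def is_b_smooth(n, B):
    """True if every prime factor of n is at most B (trial division)."""
    d = 2
    while d * d <= n and d <= B:
        while n % d == 0:       # strip out every factor of d
            n //= d
        d += 1
    return n == 1 or n <= B     # whatever remains is prime; it must be <= B

print(is_b_smooth(3822, 13), is_b_smooth(3642, 13))   # True False
```

The pair 3822 = 2 · 3 · 7^2 · 13 and 3642 = 2 · 3 · 607 is exactly the split that let Pollard's method find p = 3823 but not q = 3643 in Example 3.30.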
3.6 Factorization via Difference of Squares
The most powerful factorization methods known today rely on one of the simplest identities in all of mathematics,
X^2 − Y^2 = (X + Y)(X − Y). (3.13)
This beautiful formula says that a difference of squares is equal to a product. The potential applicability to factorization is immediate. In order to factor a number N, we look for an integer b such that the quantity N + b^2 is a perfect square, say equal to a^2. Then N + b^2 = a^2, so
N = a^2 − b^2 = (a + b)(a − b),
and we have effected a factorization of N.
and we have effected a factorization of N.
Example 3.34. We factor N = 25217 by looking for an integer b making N + b^2 a perfect square:
25217 + 1^2 = 25218 not a square,
25217 + 2^2 = 25221 not a square,
25217 + 3^2 = 25226 not a square,
25217 + 4^2 = 25233 not a square,
25217 + 5^2 = 25242 not a square,
25217 + 6^2 = 25253 not a square,
25217 + 7^2 = 25266 not a square,
25217 + 8^2 = 25281 = 159^2 Eureka! A square!
Then we compute
25217 = 159^2 − 8^2 = (159 + 8)(159 − 8) = 167 · 151.
If N is large, then it is unlikely that a randomly chosen value of b will make N + b^2 into a perfect square. We need to find a clever way to select b. An important observation is that we don't necessarily need to write N itself as a difference of two squares. It often suffices to write some multiple kN of N as a difference of two squares, since if
kN = a^2 − b^2 = (a + b)(a − b),
then there is a reasonable chance that the factors of N are separated by the right-hand side of the equation, i.e., that N has a nontrivial factor in common with each of a + b and a − b. It is then a simple matter to recover the factors by computing gcd(N, a + b) and gcd(N, a − b). We illustrate with an example.
Example 3.35. We factor N = 203299. If we make a list of N + b^2 for values of b = 1, 2, 3, . . ., say up to b = 100, we do not find any square values. So next we try listing the values of 3N + b^2 and we find
3 · 203299 + 1^2 = 609898 not a square,
3 · 203299 + 2^2 = 609901 not a square,
3 · 203299 + 3^2 = 609906 not a square,
3 · 203299 + 4^2 = 609913 not a square,
3 · 203299 + 5^2 = 609922 not a square,
3 · 203299 + 6^2 = 609933 not a square,
3 · 203299 + 7^2 = 609946 not a square,
3 · 203299 + 8^2 = 609961 = 781^2 Eureka! A square!
Thus
3 · 203299 = 781^2 − 8^2 = (781 + 8)(781 − 8) = 789 · 773,
so when we compute
gcd(203299, 789) = 263 and gcd(203299, 773) = 773,
we find nontrivial factors of N. The numbers 263 and 773 are prime, so the full factorization of N is 203299 = 263 · 773.
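Both examples follow the same loop. Here is a sketch of ours (not the book's code) with a multiplier k, so that k = 1 reproduces Example 3.34 and k = 3 reproduces Example 3.35; math.isqrt computes integer square roots.

```python
from math import gcd, isqrt

def difference_of_squares(N, k=1, max_b=10**6):
    """Search for b with k*N + b^2 a perfect square a^2; then each of
    gcd(N, a - b) and gcd(N, a + b) has a good chance of being a
    nontrivial factor of N (for k = 1 they are exactly a - b and a + b)."""
    for b in range(1, max_b + 1):
        t = k * N + b * b
        a = isqrt(t)
        if a * a == t:          # t is a perfect square
            return gcd(N, a - b), gcd(N, a + b)
    return None

print(difference_of_squares(25217))        # (151, 167): b = 8, a = 159
print(difference_of_squares(203299, k=3))  # (773, 263): b = 8, a = 781
```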
Remark 3.36. In Example 3.35, we made a list of values of 3N + b^2. Why didn't we try 2N + b^2 first? The answer is that if N is odd, then 2N + b^2 can never be a square, so it would have been a waste of time to try it. The reason that 2N + b^2 can never be a square is as follows (cf. Exercise 1.23). We compute modulo 4,
2N + b^2 ≡ 2 + b^2 ≡ 2 + 0 ≡ 2 (mod 4) if b is even,
2N + b^2 ≡ 2 + 1 ≡ 3 (mod 4) if b is odd.
Thus 2N + b^2 is congruent to either 2 or 3 modulo 4. But squares are congruent to either 0 or 1 modulo 4. Hence if N is odd, then 2N + b^2 is never a square.
The multiples of N are the numbers that are congruent to 0 modulo N, so rather than searching for a difference of squares a^2 − b^2 that is a multiple of N, we may instead search for distinct numbers a and b satisfying
a^2 ≡ b^2 (mod N). (3.14)
This is exactly the same problem, of course, but the use of modular arithmetic
helps to clarify our task.
In practice it is not feasible to search directly for integers a and b satisfying (3.14). Instead we use a three-step process as described in Table 3.4.
This procedure, in one form or another, underlies most modern methods of
factorization.
1. Relation Building: Find many integers a_1, a_2, a_3, . . . , a_r with the property that the quantity c_i ≡ a_i^2 (mod N) factors as a product of small primes.
2. Elimination: Take a product c_{i1} c_{i2} · · · c_{is} of some of the c_i's so that every prime appearing in the product appears to an even power. Then c_{i1} c_{i2} · · · c_{is} = b^2 is a perfect square.
3. GCD Computation: Let a = a_{i1} a_{i2} · · · a_{is} and compute the greatest common divisor d = gcd(N, a − b). Since
a^2 = (a_{i1} a_{i2} · · · a_{is})^2 ≡ a_{i1}^2 a_{i2}^2 · · · a_{is}^2 ≡ c_{i1} c_{i2} · · · c_{is} ≡ b^2 (mod N),
there is a reasonable chance that d is a nontrivial factor of N.
Table 3.4: A three step factorization procedure
Example 3.37. We factor N = 914387 using the procedure described in Table 3.4. We first search for integers a with the property that a^2 mod N is a product of small primes. For this example, we ask that each a^2 mod N be a product of primes in the set {2, 3, 5, 7, 11}. Ignoring for now the question of how to find such a, we observe that
1869^2 ≡ 750000 (mod 914387) and 750000 = 2^4 · 3 · 5^6,
1909^2 ≡ 901120 (mod 914387) and 901120 = 2^14 · 5 · 11,
3387^2 ≡ 499125 (mod 914387) and 499125 = 3 · 5^3 · 11^3.
None of the numbers on the right is a square, but if we multiply them together, then we do get a square. Thus
1869^2 · 1909^2 · 3387^2 ≡ 750000 · 901120 · 499125 (mod 914387)
≡ (2^4 · 3 · 5^6)(2^14 · 5 · 11)(3 · 5^3 · 11^3) (mod 914387)
= (2^9 · 3 · 5^5 · 11^2)^2
= 580800000^2 ≡ 164255^2 (mod 914387).
We further note that 1869 · 1909 · 3387 ≡ 9835 (mod 914387), so we compute
gcd(914387, 9835 − 164255) = gcd(914387, 154420) = 1103.
Hooray! We have factored 914387 = 1103 · 829.
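The three steps just carried out by hand can be retraced in a few lines. This is our sketch of this particular example; math.isqrt extracts the root of the perfect-square product.

```python
from math import gcd, isqrt

N = 914387
a_vals = [1869, 1909, 3387]        # relations: each a_i^2 mod N is 11-smooth

c = 1                              # product of the c_i = a_i^2 mod N
for ai in a_vals:
    c *= ai * ai % N
b = isqrt(c)                       # elimination made the product a square
assert b * b == c

a = 1                              # a = a_1 * a_2 * a_3 mod N
for ai in a_vals:
    a = a * ai % N

d = gcd(N, a - b)
print(d, N // d)                   # 1103 829
```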
Example 3.38. We do a second example to illustrate a potential pitfall in this
method. We will factor N = 636683. After some searching, we find
1387^2 ≡ 13720 (mod 636683) and 13720 = 2^3 · 5 · 7^3,
2774^2 ≡ 54880 (mod 636683) and 54880 = 2^5 · 5 · 7^3.
Multiplying these two values gives a square,
1387^2 · 2774^2 ≡ 13720 · 54880 = (2^4 · 5 · 7^3)^2 = 27440^2.
Unfortunately, when we compute the gcd, we find that
gcd(636683, 1387 · 2774 − 27440) = gcd(636683, 3820098) = 636683.
Thus after all our work, we have made no progress! However, all is not lost.
We can gather more values of a and try to find a different relation. Extending
the above list, we discover that
3359^2 ≡ 459270 (mod 636683) and 459270 = 2 · 3^8 · 5 · 7.
Multiplying 1387^2 and 3359^2 gives
1387^2 · 3359^2 ≡ 13720 · 459270 = (2^2 · 3^4 · 5 · 7^2)^2 = 79380^2,
and now when we compute the gcd, we obtain
gcd(636683, 1387 · 3359 − 79380) = gcd(636683, 4579553) = 787.
This gives the factorization N = 787 · 809.
Remark 3.39. How many solutions to a^2 ≡ b^2 (mod N) are we likely to try before we find a factor of N? The most difficult case occurs when N = pq is a product of two primes that are of roughly the same size. (This is because the smallest prime factor is O(√N), while in any other case the smallest prime factor will be O(N^α), with α < 1/2. As α decreases, the difficulty of factoring N decreases.) Suppose that we can find more or less random values of a and b satisfying a^2 ≡ b^2 (mod N). What are our chances of finding a nontrivial factor of N when we compute gcd(a − b, N)? We know that
(a − b)(a + b) = a^2 − b^2 = kN = kpq for some value of k.
The prime p must divide at least one of a − b and a + b, and it has approximately equal probability of dividing each. Similarly for q. We win if a − b is divisible by exactly one of p and q, which happens approximately 50 % of the time. Hence if we can actually generate random a's and b's satisfying a^2 ≡ b^2 (mod N), then it won't take us long to find a factor of N. Of course this leaves us with the question of just how hard it is to find these a's and b's.
Having given a taste of the process through several examples, we now do a
more systematic analysis. The factorization procedure described in Table 3.4
consists of three steps:
1. Relation Building
2. Elimination
3. GCD Computation
There is really nothing to say about Step 3, since the Euclidean algorithm
(Theorem 1.7) tells us how to efficiently compute gcd(N, a − b) in O(ln N)
steps. On the other hand, there is so much to say about relation building
that we postpone our discussion until Sect. 3.7. Finally, what of Step 2, the
elimination step?
We suppose that each of the numbers a_1, . . . , a_r found in Step 1 has the property that c_i ≡ a_i^2 (mod N) factors into a product of small primes—say that each c_i is a product of primes chosen from the set of the first t primes {p_1, p_2, p_3, . . . , p_t}. This means that there are exponents e_ij such that
c_1 = p_1^{e_11} p_2^{e_12} p_3^{e_13} · · · p_t^{e_1t},
c_2 = p_1^{e_21} p_2^{e_22} p_3^{e_23} · · · p_t^{e_2t},
. . .
c_r = p_1^{e_r1} p_2^{e_r2} p_3^{e_r3} · · · p_t^{e_rt}.
Our goal is to take a product of some of the c_i's in order to make each prime on the right-hand side of the equation appear to an even power. In other words, our problem reduces to finding u_1, u_2, . . . , u_r ∈ {0, 1} such that
c_1^{u_1} · c_2^{u_2} · · · c_r^{u_r} is a perfect square.
Here we take u_i = 1 if we want to include c_i in the product, and we take u_i = 0 if we do not want to include c_i in the product.
Writing out the product in terms of the prime factorizations of c_1, . . . , c_r gives the rather messy expression
c_1^{u_1} · c_2^{u_2} · · · c_r^{u_r}
= (p_1^{e_11} p_2^{e_12} · · · p_t^{e_1t})^{u_1} · (p_1^{e_21} p_2^{e_22} · · · p_t^{e_2t})^{u_2} · · · (p_1^{e_r1} p_2^{e_r2} · · · p_t^{e_rt})^{u_r}
= p_1^{e_11 u_1 + e_21 u_2 + · · · + e_r1 u_r} · p_2^{e_12 u_1 + e_22 u_2 + · · · + e_r2 u_r} · · · p_t^{e_1t u_1 + e_2t u_2 + · · · + e_rt u_r}. (3.15)
You may find this clearer if it is written using summation and product notation,
∏_{i=1}^{r} c_i^{u_i} = ∏_{j=1}^{t} p_j^{Σ_{i=1}^{r} e_ij u_i}. (3.16)
In any case, our goal is to choose u_1, . . . , u_r such that all of the exponents in (3.15), or equivalently in (3.16), are even.
To recapitulate, we are given integers
e_11, e_12, . . . , e_1t, e_21, e_22, . . . , e_2t, . . . , e_r1, e_r2, . . . , e_rt
and we are searching for integers u_1, u_2, . . . , u_r such that
e_11 u_1 + e_21 u_2 + · · · + e_r1 u_r ≡ 0 (mod 2),
e_12 u_1 + e_22 u_2 + · · · + e_r2 u_r ≡ 0 (mod 2),
. . .
e_1t u_1 + e_2t u_2 + · · · + e_rt u_r ≡ 0 (mod 2). (3.17)
You have undoubtedly recognized that the system of congruences (3.17)
is simply a system of linear equations over the finite field F2. Hence standard
techniques from linear algebra, such as Gaussian elimination, can be used to
solve these equations. In fact, doing linear algebra in the field F2 is much
easier than doing linear algebra in the field R, since there is no need to worry
about round-off errors.
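A compact way to carry out that elimination is to pack each congruence of (3.17) into an integer bitmask, so that adding two equations modulo 2 is a single XOR. The following is our sketch, not the book's code:

```python
def nullspace_mod2(rows, ncols):
    """Basis of the solutions u in {0,1}^ncols of the homogeneous system
    (3.17); each row is the list of 0/1 coefficients of one congruence."""
    eqs = [sum(bit << j for j, bit in enumerate(row)) for row in rows]
    pivots = {}                        # pivot column -> fully reduced equation
    for col in range(ncols):
        piv = None
        for i, e in enumerate(eqs):
            if (e >> col) & 1:         # an equation with a 1 in this column
                piv = eqs.pop(i)
                break
        if piv is None:
            continue                   # no pivot: col is a free column
        # clear the pivot column everywhere else (addition mod 2 is XOR)
        eqs = [e ^ piv if (e >> col) & 1 else e for e in eqs]
        pivots = {c: p ^ piv if (p >> col) & 1 else p for c, p in pivots.items()}
        pivots[col] = piv
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        u = [0] * ncols
        u[f] = 1                       # set one free variable to 1
        for c, p in pivots.items():
            u[c] = (p >> f) & 1        # back-substitute the pivot variables
        basis.append(u)
    return basis

# tiny check: u1 + u2 ≡ 0 and u2 + u3 ≡ 0 (mod 2) force u1 = u2 = u3
print(nullspace_mod2([[1, 1, 0], [0, 1, 1]], 3))   # [[1, 1, 1]]
```

Feeding it the 15 × 20 matrix of Table 3.5 produces, up to choice of basis, the 8-dimensional solution space described in Example 3.40.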
Example 3.40. We illustrate the linear algebra elimination step by factoring the number
N = 9788111.
We look for numbers a with the property that a^2 mod N is 50-smooth, i.e., numbers a such that a^2 mod N is equal to a product of primes in the set
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47}.
The top part of Table 3.5 lists the 20 numbers a_1, a_2, . . . , a_20 between 3129 and 4700 having this property,⁶ together with the factorization of each c_i ≡ a_i^2 (mod N).
The bottom part of Table 3.5 translates the requirement that a product c_1^{u_1} c_2^{u_2} · · · c_20^{u_20} be a square into a system of linear equations for (u_1, u_2, . . . , u_20) as described by (3.17). For notational convenience, we have written the system of linear equations in Table 3.5 in matrix form.
⁶Why do we start with a = 3129? The answer is that unless a^2 is larger than N, then there is no reduction modulo N in a^2 mod N, so we cannot hope to gain any information. The value 3129 comes from the fact that √N = √9788111 ≈ 3128.6.
The next step is to solve the system of linear equations in Table 3.5. This
can be done by standard Gaussian elimination, always keeping in mind that
all computations are done modulo 2. The set of solutions turns out to be an
F2-vector space of dimension 8. A basis for the set of solutions is given by
the following 8 vectors, where we have written the vectors horizontally, rather
than vertically, in order to save space:
v1 = (0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
v2 = (0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
v3 = (0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
v4 = (1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
v5 = (1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
v6 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0),
v7 = (1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
v8 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1).
Each of the vectors v1, . . . , v8 gives a congruence a^2 ≡ b^2 (mod N) that has the potential to provide a factorization of N. For example, v1 says that if we multiply the 3rd, 5th, and 9th numbers in the list at the top of Table 3.5, we will get a square, and indeed we find that
3131^2 · 3174^2 · 3481^2 ≡ (2 · 5^2 · 7 · 43)(5 · 11^3 · 43)(2 · 5^3 · 7 · 11^3) (mod 9788111)
= (2 · 5^3 · 7 · 11^3 · 43)^2 = 100157750^2.
Next we compute
gcd(9788111, 3131 · 3174 · 3481 − 100157750) = 9788111,
which gives back the original number N. This is unfortunate, but all is not
lost, since we have seven more independent solutions to our system of linear
equations. Trying each of them in turn, we list the results in Table 3.6.
Seven of the eight solutions to the system of linear equations yield no
useful information about N, the resulting gcd being either 1 or N. However,
one solution, listed in the penultimate box of Table 3.6, leads to a nontrivial
factorization of N. Thus 2741 is a factor of N, and dividing by it we obtain
N = 9788111 = 2741 · 3571. Since both 2741 and 3571 are prime, this gives
the complete factorization of N.
Remark 3.41. In order to factor a large number N, it may be necessary to use
a set {p1, p2, p3, . . . , pt} containing hundreds of thousands, or even millions,
of primes. Then the system (3.17) contains millions of linear equations, and
even working in the field F2, it can be very difficult to solve general systems
3129^2 ≡ 2530 (mod 9788111) and 2530 = 2 · 5 · 11 · 23
3130^2 ≡ 8789 (mod 9788111) and 8789 = 11 · 17 · 47
3131^2 ≡ 15050 (mod 9788111) and 15050 = 2 · 5^2 · 7 · 43
3166^2 ≡ 235445 (mod 9788111) and 235445 = 5 · 7^2 · 31^2
3174^2 ≡ 286165 (mod 9788111) and 286165 = 5 · 11^3 · 43
3215^2 ≡ 548114 (mod 9788111) and 548114 = 2 · 7^3 · 17 · 47
3313^2 ≡ 1187858 (mod 9788111) and 1187858 = 2 · 7^2 · 17 · 23 · 31
3449^2 ≡ 2107490 (mod 9788111) and 2107490 = 2 · 5 · 7^2 · 11 · 17 · 23
3481^2 ≡ 2329250 (mod 9788111) and 2329250 = 2 · 5^3 · 7 · 11^3
3561^2 ≡ 2892610 (mod 9788111) and 2892610 = 2 · 5 · 7 · 31^2 · 43
4394^2 ≡ 9519125 (mod 9788111) and 9519125 = 5^3 · 7 · 11 · 23 · 43
4425^2 ≡ 4403 (mod 9788111) and 4403 = 7 · 17 · 37
4426^2 ≡ 13254 (mod 9788111) and 13254 = 2 · 3 · 47^2
4432^2 ≡ 66402 (mod 9788111) and 66402 = 2 · 3^2 · 7 · 17 · 31
4442^2 ≡ 155142 (mod 9788111) and 155142 = 2 · 3^3 · 13^2 · 17
4468^2 ≡ 386802 (mod 9788111) and 386802 = 2 · 3^3 · 13 · 19 · 29
4551^2 ≡ 1135379 (mod 9788111) and 1135379 = 7^2 · 17 · 29 · 47
4595^2 ≡ 1537803 (mod 9788111) and 1537803 = 3^2 · 17 · 19 · 23^2
4651^2 ≡ 2055579 (mod 9788111) and 2055579 = 3 · 23 · 31^3
4684^2 ≡ 2363634 (mod 9788111) and 2363634 = 2 · 3^3 · 7 · 13^2 · 37
Relation gathering step
In matrix form, the system reads
M · (u_1, u_2, . . . , u_20)^T ≡ (0, 0, . . . , 0)^T (mod 2),
where M is the 15 × 20 matrix of exponents modulo 2 (one row for each of the 15 primes in the factor base, one column for each of the 20 relations):

1 0 1 0 0 1 1 1 1 1 0 0 1 1 1 1 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1
1 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0 1 1 1 1 0 1 0 0 0 0 0 1
1 1 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0

Linear algebra elimination step
Table 3.5: Factorization of N = 9788111
v1 = (0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
3131² · 3174² · 3481² ≡ (2 · 5³ · 7 · 11³ · 43)² = 100157750² (mod 9788111)
gcd(9788111, 3131 · 3174 · 3481 − 100157750) = 9788111

v2 = (0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
3130² · 3131² · 3166² · 3174² · 3215² ≡ (2 · 5² · 7³ · 11² · 17 · 31 · 43 · 47)² = 2210173785050² (mod 9788111)
gcd(9788111, 3130 · 3131 · 3166 · 3174 · 3215 − 2210173785050) = 1

v3 = (0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
3131² · 3166² · 3561² ≡ (2 · 5² · 7² · 31² · 43)² = 101241350² (mod 9788111)
gcd(9788111, 3131 · 3166 · 3561 − 101241350) = 9788111

v4 = (1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
3129² · 3131² · 4394² ≡ (2 · 5³ · 7 · 11 · 23 · 43)² = 19038250² (mod 9788111)
gcd(9788111, 3129 · 3131 · 4394 − 19038250) = 9788111

v5 = (1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
3129² · 3131² · 3174² · 3313² · 4432² ≡ (2² · 3 · 5² · 7² · 11² · 17 · 23 · 31 · 43)² = 927063776100² (mod 9788111)
gcd(9788111, 3129 · 3131 · 3174 · 3313 · 4432 − 927063776100) = 1

v6 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0)
3129² · 3449² · 4426² · 4442² ≡ (2² · 3² · 5 · 7 · 11 · 13 · 17 · 23 · 47)² = 3311167860² (mod 9788111)
gcd(9788111, 3129 · 3449 · 4426 · 4442 − 3311167860) = 1

v7 = (1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
3129² · 3313² · 3449² · 4426² · 4651² ≡ (2² · 3 · 5 · 7² · 11 · 17 · 23² · 31² · 47)² = 13136082114540² (mod 9788111)
gcd(9788111, 3129 · 3313 · 3449 · 4426 · 4651 − 13136082114540) = 2741

v8 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1)
3129² · 3449² · 4425² · 4426² · 4684² ≡ (2² · 3² · 5 · 7² · 11 · 13 · 17 · 23 · 37 · 47)² = 857592475740² (mod 9788111)
gcd(9788111, 3129 · 3449 · 4425 · 4426 · 4684 − 857592475740) = 1
Table 3.6: Factorization of N = 9788111; computation of gcds
of this size. However, it turns out that the systems of linear equations used in
factorization are quite sparse, which means that most of their coefficients are
zero. (This is plausible because if a number A is a product of primes smaller
than B, then one expects A to be a product of approximately ln(A)/ ln(B) dis-
tinct primes.) There are special techniques for solving sparse systems of linear
equations that are much more efficient than ordinary Gaussian elimination;
see for example [31, 72].
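The successful combination v7 in Table 3.6 is easy to verify directly. The sketch below (our own illustration, with the factor base square root copied from the table) checks that the combined relations really do give a congruence of squares and a nontrivial factor:

```python
from math import gcd, prod

N = 9788111
# relation v7 combines the squares of these five a-values
a_vals = [3129, 3313, 3449, 4426, 4651]
A = prod(a_vals)
# square root of the combined factorizations: 2^2·3·5·7^2·11·17·23^2·31^2·47
B = 13136082114540

assert (A * A - B * B) % N == 0   # A^2 ≡ B^2 (mod N)
p = gcd(N, A - B)
print(p, N // p)                  # the factors 2741 and 3571 from Table 3.6
```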
3.7 Smooth Numbers, Sieves, and Building
Relations for Factorization
In this section we describe the two fastest known methods for doing “hard”
factorization problems, i.e., factoring numbers of the form N = pq, where p
and q are primes of approximately the same order of magnitude. We begin
with a discussion of smooth numbers, which form the essential tool for building
relations. Next we describe in some detail the quadratic sieve, which is a fast
method for finding the necessary smooth numbers. Finally, we briefly describe
the number field sieve, which is similar to the quadratic sieve in that it provides
a fast method for finding smooth numbers of a certain form. However, when N
is extremely large, the number field sieve is much faster than the quadratic
sieve, because by working in a ring larger than Z, it uses smaller auxiliary
numbers in its search for smooth numbers.
3.7.1 Smooth Numbers
The relation building step in the three step factorization procedure described
in Table 3.4 requires us to find many integers a with the property that a² mod N
factors as a product of small primes. As noted at the end of Sect. 3.5, these
highly factorizable numbers have a name.
Definition. An integer n is called B-smooth if all of its prime factors are less
than or equal to B.
Example 3.42. Here are the first few 5-smooth numbers and the first few
numbers that are not 5-smooth:
5-smooth: 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36, . . .
Not 5-smooth: 7, 11, 13, 14, 17, 19, 21, 22, 23, 26, 28, 29, 31, 33, 34, 35, 37, . . .
Definition. The function ψ(X, B) counts B-smooth numbers,
ψ(X, B) = Number of B-smooth integers n such that 1 < n ≤ X.
For example,
ψ(25, 5) = 15,
since the 5-smooth numbers between 1 and 25 are the 15 numbers
2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25.
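This count is easy to reproduce by brute force. The following sketch is a direct translation of the definitions (the function names are ours), testing smoothness by trial division:

```python
def is_smooth(n, B):
    # divide out every prime factor up to B; n is B-smooth iff nothing is left
    for p in range(2, B + 1):
        while n % p == 0:
            n //= p
    return n == 1

def psi(X, B):
    # psi(X, B) counts the B-smooth integers n with 1 < n <= X
    return sum(1 for n in range(2, X + 1) if is_smooth(n, B))

print(psi(25, 5))   # counts the 15 numbers listed above
```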
In order to evaluate the efficiency of the three step factorization method,
we need to understand how ψ(X, B) behaves for large values of X and B.
It turns out that in order to obtain useful results, the quantities B and X
must increase together in just the right way. An important theorem in this
direction was proven by Canfield, Erdős, and Pomerance [24].
Theorem 3.43 (Canfield, Erdős, Pomerance). Fix a number 0 < ε < 1/2, and
let X and B increase together while satisfying

(ln X)^ε < ln B < (ln X)^(1−ε).

For notational convenience, we let

u = ln X / ln B.

Then the number of B-smooth numbers less than X satisfies

ψ(X, B) = X · u^(−u(1+o(1))).
Remark 3.44. We’ve used little-o notation here for the first time. The ex-
pression o(1) denotes a function that tends to 0 as X tends to infinity. More
generally, we write
f(X) = o(g(X))

if the ratio f(X)/g(X) tends to 0 as X tends to infinity. Note that this is
different from the big-O notation introduced in Sect. 2.6, where recall that

f(X) = O(g(X))

means that f(X) is smaller than a multiple of g(X).
The question remains of how we should choose B in terms of X. It turns
out that the following curious-looking function L(X) is what we will need:
L(X) = e^√((ln X)(ln ln X)).    (3.18)
Then, as an immediate consequence of Theorem 3.43, we obtain a fundamental
estimate for ψ.
Corollary 3.45. For any fixed value of c with 0 < c < 1,

ψ(X, L(X)^c) = X · L(X)^(−(1/(2c))(1+o(1)))  as X → ∞.
Proof. Note that if B = L(X)^c and if we take any ε < 1/2, then

ln B = c ln L(X) = c √((ln X)(ln ln X))

satisfies (ln X)^ε < ln B < (ln X)^(1−ε). So we can apply Theorem 3.43 with

u = ln X / ln B = (1/c) · √(ln X / ln ln X)

to deduce that ψ(X, L(X)^c) = X · u^(−u(1+o(1))). It is easily checked (see Exercise 3.32) that this value of u satisfies

u^(−u(1+o(1))) = L(X)^(−(1/(2c))(1+o(1))),

which completes the proof of the corollary.
The function L(X) = e^√((ln X)(ln ln X)) and other similar functions appear
prominently in the theory of factorization due to their close relationship to
the distribution of smooth numbers. It is thus important to understand how
fast L(X) grows as a function of X.
Recall that in Sect. 2.6 we defined big-O notation and used it to discuss
the notions of polynomial, exponential, and subexponential running times.
What this meant was that the number of steps required to solve a problem
was, respectively, polynomial, exponential, and subexponential in the number
of bits required to describe the problem. As a supplement to big-O notation,
it is convenient to introduce two other ways of comparing the rate at which
functions grow.
Definition (Order Notation). Let f(X) and g(X) be functions of X whose
values are positive. Recall that we write

f(X) = O(g(X))

if there are positive constants c and C such that

f(X) ≤ c g(X) for all X ≥ C.

Similarly, we say that f is big-Ω of g and write

f(X) = Ω(g(X))

if there are positive constants c and C such that⁷

f(X) ≥ c g(X) for all X ≥ C.

Finally, if f is both big-O and big-Ω of g, we say that f is big-Θ of g and
write f(X) = Θ(g(X)).
7Note: Big-Ω notation as used by computer scientists and cryptographers does not
mean the same thing as the big-Ω notation of mathematicians. In mathematics, especially
in the field of analytic number theory, the expression f(n) = Ω(g(n)) means that there is
means that there is
a constant c such that there are infinitely many integers n such that f(n) ≥ cg(n). In this
book we use the computer science definition.
Remark 3.46. In analytic number theory there is an alternative version of
order notation that is quite intuitive. For functions f(X) and g(X), we write
f(X) ≪ g(X) if f(X) = O(g(X)),
f(X) ≫ g(X) if f(X) = Ω(g(X)),
f(X) ≍ g(X) if f(X) = Θ(g(X)).

The advantage of this notation is that it is transitive, just as the usual "greater
than" and "less than" relations are transitive. For example, if f ≪ g and g ≪
h, then f ≪ h.
Definition. With this notation in place, a function f(X) is said to grow
exponentially if there are positive constants α and β such that

Ω(X^α) = f(X) = O(X^β),

and it is said to grow polynomially if there are positive constants α and β
such that

Ω((ln X)^α) = f(X) = O((ln X)^β).

In the alternative notation of Remark 3.46, exponential growth and polyno-
mial growth are written, respectively, as

X^α ≪ f(X) ≪ X^β and (ln X)^α ≪ f(X) ≪ (ln X)^β.

A function that falls in between these two categories is called subexponen-
tial. Thus f(X) is subexponential if for every positive constant α, no matter
how large, and for every positive constant β, no matter how small,

Ω((ln X)^α) = f(X) = O(X^β).    (3.19)

(In the alternative notation, this becomes (ln X)^α ≪ f(X) ≪ X^β.)
Note that there is a possibility for confusion, since these definitions do
not correspond to the usual meaning of exponential and polynomial growth
that one finds in calculus. What is really happening is that “exponential” and
“polynomial” refer to growth rates in the number of bits that it takes to write
down X, i.e., exponential or polynomial functions of log2(X).
Remark 3.47. The function L(X) falls into the subexponential category. We
leave this for you to prove in Exercise 3.30. See Table 3.7 for a rough idea of
how fast L(X) grows as X increases.
Suppose that we attempt to factor N by searching for values a² (mod N)
that are B-smooth. In order to perform the linear equation elimination step,
we need (at least) as many B-smooth numbers as there are primes less than B.
We need this many because in the elimination step, the smooth numbers
correspond to the variables, while the primes less than B correspond to the
equations, and we need more variables than equations. In order to ensure that
X        ln L(X)   L(X)
2^100    17.141    2^24.73
2^250    29.888    2^43.12
2^500    45.020    2^64.95
2^1000   67.335    2^97.14
2^2000   100.145   2^144.48

Table 3.7: The growth of L(X) = e^√((ln X)(ln ln X))
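The entries of Table 3.7 can be reproduced with a few lines of floating-point arithmetic (a sketch; the helper name is ours):

```python
import math

def ln_L(log2_X):
    # ln L(X) = sqrt((ln X)(ln ln X)) for X = 2**log2_X
    ln_X = log2_X * math.log(2)
    return math.sqrt(ln_X * math.log(ln_X))

for k in (100, 250, 500, 1000, 2000):
    # print ln L(X) and the exponent e such that L(X) = 2**e
    print(k, round(ln_L(k), 3), round(ln_L(k) / math.log(2), 2))
```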
this is the case, we thus need there to be at least π(B) B-smooth numbers,
where π(B) is the number of primes up to B. It will turn out that we can take
B = L(N)^c for a suitable value of c. In the next proposition we use the prime
number theorem (Theorem 3.21) and the formula for ψ(X, L(X)^c) given in
Corollary 3.45 to choose the smallest value of c that gives us some chance of
factoring N using this method.
Proposition 3.48. Let L(X) = e^√((ln X)(ln ln X)) be as in Corollary 3.45, let N
be a large integer, and set B = L(N)^(1/√2).
(a) We expect to check approximately L(N)^√2 random numbers modulo N in
order to find π(B) numbers that are B-smooth.
(b) We expect to check approximately L(N)^√2 random numbers of the form
a² (mod N) in order to find enough B-smooth numbers to factor N.
Hence the factorization procedure described in Table 3.4 should have a subex-
ponential running time.
Proof. We already explained why (a) and (b) are equivalent, assuming that
the numbers a² (mod N) are sufficiently random. We now prove (a).
The probability that a randomly chosen number modulo N is B-smooth
is ψ(N, B)/N. In order to find π(B) numbers that are B-smooth, we need to
check approximately

π(B) / (ψ(N, B)/N)  numbers.    (3.20)

We want to choose B so as to minimize this function, since checking numbers
for smoothness is a time-consuming process.
Corollary 3.45 says that

ψ(N, L(N)^c)/N ≈ L(N)^(−1/(2c)),

so we set B = L(N)^c and search for the value of c that minimizes (3.20).
The prime number theorem (Theorem 3.21) tells us that π(B) ≈ B/ln(B),
so (3.20) is equal to

π(L(N)^c) / (ψ(N, L(N)^c)/N) ≈ (L(N)^c / (c ln L(N))) · (1 / L(N)^(−1/(2c))) = L(N)^(c+1/(2c)) · (1 / (c ln L(N))).

The factor L(N)^(c+1/(2c)) dominates this last expression, so we choose the value
of c that minimizes the quantity c + 1/(2c). This is an elementary calculus prob-
lem. It is minimized when c = 1/√2, and the minimum value is √2. Thus if we
choose B ≈ L(N)^(1/√2), then we need to check approximately L(N)^√2 values
in order to find π(B) numbers that are B-smooth, and hence to find enough
relations to factor N.
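The calculus step at the end of the proof is easy to confirm numerically (a sketch; the function name is ours):

```python
import math

def exponent(c):
    # exponent of L(N) in the expected number of checks
    return c + 1 / (2 * c)

# scan a fine grid of c in (0, 1); the minimum sits at c = 1/sqrt(2)
best = min((i / 10000 for i in range(1, 10000)), key=exponent)
print(best, exponent(best))
```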
Remark 3.49. Proposition 3.48 suggests that we need to check approxi-
mately L(N)^√2 randomly chosen numbers modulo N in order to find enough
smooth numbers to factor N. There are various ways to decrease the search
time. In particular, rather than using random values of a to compute numbers
of the form a² (mod N), we might instead select numbers a that are only a
little bit larger than √N. Then a² (mod N) is O(√N), so is more likely to
be B-smooth than is a number that is O(N). Reworking the calculation in
Proposition 3.48, one finds that it suffices to check approximately L(N) ran-
dom numbers of the form a² (mod N) with a close to √N. This is a significant
savings over L(N)^√2. See Exercise 3.33 for further details.
Remark 3.50. When estimating the effort needed to factor N, we have
completely ignored the work required to check whether a given number
is B-smooth. For example, if we check for B-smoothness using trial division,
i.e., dividing by each prime less than B, then it takes approximately π(B)
trial divisions to check for B-smoothness. Taking this additional effort into
account in the proof of Proposition 3.48, one finds that it takes approxi-
mately L(N)^√2 trial divisions to find enough smooth numbers to factor N,
even using values of a ≈ √N as in Remark 3.49.
The quadratic sieve, which we describe in Sect. 3.7.2, uses a more efficient
method for generating B-smooth numbers and thereby reduces the running
time down to L(N). (See Table 3.7 for a reminder of how L(N) grows and why
a running time of L(N) is much better than a running time of L(N)^√2.) In
Exercise 3.29 we ask you to estimate how long it takes to perform L(N) opera-
tions on a moderately fast computer. For a number of years it was thought that
no factorization algorithm could take fewer than a fixed power of L(N) steps,
but the invention of the number field sieve (Sect. 3.7.3) showed this to be
incorrect. The number field sieve, whose running time of e^(c·((ln N)(ln ln N)²)^(1/3)) is
faster than L(N)^ε for every ε > 0, achieves its speed by moving beyond the
realm of the ordinary integers.
3.7.2 The Quadratic Sieve
In this section we address the final piece of the puzzle that must be solved in
order to factor large numbers via the difference of squares method described
in Sect. 3.6:
How can we efficiently find many numbers a > √N
such that each a² (mod N) is B-smooth?

From the discussion in Sect. 3.7.1 and the proof of Proposition 3.48, we know
that we need to take B ≈ L(N)^(1/√2) in order to have a reasonable chance of
factoring N.
An early approach to finding B-smooth squares modulo N was to look for
fractions a/b that are as close as possible to √(kN) for k = 1, 2, 3, . . . . Then

a² ≈ b²kN,

so a² (mod N) is reasonably small, and thus is more likely to be B-smooth.
The theory of continued fractions gives an algorithm for finding such a/b.
See [28, §10.1] for details.
An alternative approach that turns out to be much faster in practice is
to allow slightly larger values of a and to use an efficient cancellation process
called a sieve to simultaneously create a large number of values a² (mod N)
that are B-smooth. We next describe Pomerance's quadratic sieve, which is
still the fastest known method for factoring large numbers N = pq up to
about 2^350. For numbers considerably larger than this, say larger than 2^450,
the more complicated number field sieve holds the world record for quickest
factorization. In the remainder of this section we describe the simplest version
of the quadratic sieve as an illustration of modern factorization methods. For
a description of the history of sieve methods and an overview of how they
work, see Pomerance's delightful essay "A Tale of Two Sieves" [105].
We start with the simpler problem of rapidly finding many B-smooth
numbers less than some bound X, without worrying whether the numbers have
the form a² (mod N). To do this, we adapt the Sieve of Eratosthenes, which
is an ancient Greek method for making lists of prime numbers. Eratosthenes’
idea for finding primes is as follows. Start by circling the first prime 2 and
crossing off every larger multiple of 2. Then circle the next number, 3 (which
must be prime) and cross off every larger multiple of 3. The smallest uncircled
number is 5, so circle 5 and cross off all larger multiples of 5, and so on. At
the end, the circled numbers are the primes.
This sieving process is illustrated in Fig. 3.2, where we have sieved all
primes less than 10. (These are the boxed primes in the figure.) The remaining
uncrossed numbers in the list are all remaining primes smaller than 100.
Figure 3.2: The sieve of Eratosthenes
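The procedure shown in Fig. 3.2 translates directly into a few lines of code (a sketch, not tied to any particular implementation in the text):

```python
def eratosthenes(limit):
    # classical sieve: circle each prime, cross off its larger multiples
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]

print(eratosthenes(100))   # the numbers left uncrossed in Fig. 3.2
```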
3.7. Smooth Numbers and Sieves 157
Notice that some numbers are crossed off several times. For example, 6,
12 and 18 are crossed off twice, once because they are multiples of 2 and once
because they are multiples of 3. Similarly, numbers such as 30 and 42 are
crossed off three times. Suppose that rather than crossing numbers off, we
instead divide. That is, we begin by dividing every even number by 2, then we
divide every multiple of 3 by 3, then we divide every multiple of 5 by 5, and
so on. If we do this for all primes less than B, which numbers end up being
divided all the way down to 1? The answer is that these are the numbers that
are a product of distinct primes less than B; in particular, they are B-smooth!
So we end up with a list of many B-smooth numbers.
Unfortunately, we miss some B-smooth numbers, namely those divisible by
powers of small primes, but it is easy to remedy this problem by sieving with
prime powers. Thus after sieving by 3, rather than proceeding to 5, we first
sieve by 4. To do this, we cancel an additional factor of 2 from every multiple
of 4. (Notice that we’ve already canceled 2 from these numbers, since they are
even, so we can cancel only one additional factor of 2.) If we do this, then at the
end, the B-smooth numbers less than X are precisely the numbers that have
been reduced to 1. One can show that the total number of divisions required
is approximately X ln(ln(B)). The double logarithm function ln(ln(B)) grows
extremely slowly, so the average number of divisions required to check each
individual number for smoothness is approximately constant.
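The division variant just described can be sketched as follows, sieving with prime powers so that numbers divisible by powers of small primes are not missed (the function name is ours):

```python
def smooth_by_sieving(X, B):
    # residue[n] starts at n; sieving by each prime power q = p, p^2, ...
    # cancels one more factor of p from every multiple of q
    residue = list(range(X + 1))
    for p in range(2, B + 1):
        if any(p % d == 0 for d in range(2, p)):
            continue                    # only sieve with primes p <= B
        q = p
        while q <= X:
            for n in range(q, X + 1, q):
                residue[n] //= p
            q *= p
    # the B-smooth numbers are exactly those reduced all the way down to 1
    return [n for n in range(2, X + 1) if residue[n] == 1]
```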
However, our goal is not to make a list of numbers from 1 to X that are
B-smooth. What we need is a list of numbers of the form a² (mod N) that
are B-smooth. Our strategy for accomplishing this uses the polynomial

F(T) = T² − N.

We want to start with a value of a that is slightly larger than √N, so we set

a = ⌊√N⌋ + 1,

where ⌊x⌋ denotes, as usual, the greatest integer less than or equal to x. We
then look at the list of numbers

F(a), F(a + 1), F(a + 2), . . . , F(b).    (3.21)
The idea is to find the B-smooth numbers in this list by sieving away the
primes smaller than B and seeing which numbers in the list get sieved all the
way down to 1. We choose B sufficiently large so that, by the end of the sieving
process, we are likely to have found enough B-smooth numbers to factor N.
The following definition is useful in describing this process.
Definition. The set of primes less than B (or sometimes the set of prime
powers less than B) is called the factor base.
Suppose that p is a prime in our factor base. Which of the numbers in the
list (3.21) are divisible by p? Equivalently, which numbers t between a and b
satisfy

t² ≡ N (mod p)?    (3.22)
If the congruence (3.22) has no solutions, then we discard the prime p, since p
divides none of the numbers in the list (3.21). Otherwise the congruence (3.22)
has two solutions (see Exercise 1.36 on page 55), which we denote by
t = αp and t = βp.
(If p = 2, there is only one solution αp.) It follows that each of the numbers
F(αp), F(αp + p), F(αp + 2p), F(αp + 3p), . . .
and each of the numbers
F(βp), F(βp + p), F(βp + 2p), F(βp + 3p), . . .
is divisible by p. Thus we can sieve away a factor of p from every pth entry in
the list (3.21), starting with the smallest a value satisfying a ≡ αp (mod p),
and similarly we can sieve away a factor of p from every pth entry in the
list (3.21), starting with the smallest a value satisfying a ≡ βp (mod p).
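For the small primes in a factor base, the roots αp and βp of (3.22) can be found by brute force (a sketch; for large factor bases one would use a proper modular square-root algorithm such as Tonelli–Shanks):

```python
def roots_mod_p(N, p):
    # all t in 0..p-1 with t^2 ≡ N (mod p); either 0 or 2 of them for odd p
    return [t for t in range(p) if (t * t - N) % p == 0]

# for N = 221, as in Example 3.51: no roots mod 3, two roots mod 5 and mod 7
print(roots_mod_p(221, 3), roots_mod_p(221, 5), roots_mod_p(221, 7))
```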
Example 3.51. We illustrate the quadratic sieve applied to the composite
number N = 221. The smallest number whose square is larger than N is
a = ⌊√221⌋ + 1 = 15. We set

F(T) = T² − 221

and sieve the numbers from F(15) = 4 up to F(30) = 679 using successively
the prime powers from 2 to 7. The initial list of numbers T² − N is⁸

4 35 68 103 140 179 220 263 308 355 404 455 508 563 620 679.
We first sieve by p = 2, which means that we cancel 2 from every second entry
in the list. This gives
4 35 68 103 140 179 220 263 308 355 404 455 508 563 620 679
↓2 ↓2 ↓2 ↓2 ↓2 ↓2 ↓2 ↓2
2 35 34 103 70 179 110 263 154 355 202 455 254 563 310 679
Next we sieve by p = 3. However, it turns out that the congruence

t² ≡ 221 ≡ 2 (mod 3)

has no solutions, so none of the entries in our list are divisible by 3.
⁸In practice when N is large, the t values used in the quadratic sieve are close enough
to √N that the value of t² − N is between 1 and N. For our small numerical example, this is
not the case, so it would be more efficient to reduce our values of t² modulo N, rather than
merely subtracting N from t². However, since our aim is illumination, not efficiency, we will
pretend that there is no advantage to subtracting additional multiples of N from t² − N.
We move on to the prime power 2². Every odd number is a solution of the
congruence

t² ≡ 221 ≡ 1 (mod 4),
which means that we can sieve another factor of 2 from every second entry
in our list. We put a small 4 next to the sieving arrows to indicate that in
this step we are sieving by 4, although we cancel only a factor of 2 from each
entry.
2 35 34 103 70 179 110 263 154 355 202 455 254 563 310 679
↓4 ↓4 ↓4 ↓4 ↓4 ↓4 ↓4 ↓4
1 35 17 103 35 179 55 263 77 355 101 455 127 563 155 679
Next we move on to p = 5. The congruence

t² ≡ 221 ≡ 1 (mod 5)
has two solutions, α5 = 1 and β5 = 4 modulo 5. The first t value in our list
that is congruent to 1 modulo 5 is t = 16, so starting with F(16), we find that
every fifth entry is divisible by 5. Sieving out these factors of 5 gives
1 35 17 103 35 179 55 263 77 355 101 455 127 563 155 679
↓5 ↓5 ↓5
1 7 17 103 35 179 11 263 77 355 101 91 127 563 155 679
Similarly, every fifth entry starting with F(19) is divisible by 5, so we sieve
out those factors
1 7 17 103 35 179 11 263 77 355 101 91 127 563 155 679
↓5 ↓5 ↓5
1 7 17 103 7 179 11 263 77 71 101 91 127 563 31 679
To conclude our example, we sieve the prime p = 7. The congruence

t² ≡ 221 ≡ 4 (mod 7)
has the two solutions α7 = 2 and β7 = 5. We can thus sieve 7 away from
every seventh entry starting with F(16), and also every seventh entry starting
with F(19). This yields
1 7 17 103 7 179 11 263 77 71 101 91 127 563 31 679
↓7 ↓7 ↓7
1 1 17 103 7 179 11 263 11 71 101 91 127 563 31 97
↓7 ↓7
1 1 17 103 1 179 11 263 11 71 101 13 127 563 31 97
Notice that the original entries
F(15) = 4, F(16) = 35, and F(19) = 140
have been sieved all the way down to 1. This tells us that
F(15) = 15² − 221, F(16) = 16² − 221, and F(19) = 19² − 221
are each a product of small primes, so we have discovered several squares
modulo 221 that are products of small primes:
15² ≡ 2² (mod 221),
16² ≡ 5 · 7 (mod 221),
19² ≡ 2² · 5 · 7 (mod 221).    (3.23)
We can use the congruences (3.23) to obtain various relations between
squares. For example,
(16 · 19)² ≡ (2 · 5 · 7)² (mod 221).
Computing
gcd(221, 16 · 19 − 2 · 5 · 7) = gcd(221, 234) = 13
gives a nontrivial factor of 221.⁹
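The whole computation fits in a few lines (a sketch reproducing the relations (3.23) and the gcd step):

```python
from math import gcd

N = 221
# relations found by the sieve: 16^2 ≡ 5·7 and 19^2 ≡ 2^2·5·7 (mod 221)
assert (16 * 16) % N == 5 * 7
assert (19 * 19) % N == 2 * 2 * 5 * 7
# combining them gives (16·19)^2 ≡ (2·5·7)^2 (mod 221)
factor = gcd(N, 16 * 19 - 2 * 5 * 7)
print(factor, N // factor)   # 13 and 17
```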
We have successfully factored N = 221, but to illustrate the sieving process
further, we continue sieving up to B = 11. The next prime power to sieve
is 3². However, the fact that t² ≡ 221 (mod 3) has no solutions means that
t² ≡ 221 (mod 9) also has no solutions, so we move on to the prime p = 11.
The congruence t² ≡ 221 ≡ 1 (mod 11) has the solutions α11 = 1 and
β11 = 10, which allows us to sieve a factor of 11 from F(23) and from F(21).
We recapitulate the entire sieving process in Fig. 3.3, where the top row gives
values of t and the subsequent rows sieve the values of F(t) = t² − 221 using
prime powers up to 11.
Notice that two more entries, F(21) and F(23), have been sieved down
to 1, which gives us two additional relations
to 1, which gives us two additional relations

F(21) ≡ 21² ≡ 2² · 5 · 11 (mod 221) and F(23) ≡ 23² ≡ 2² · 7 · 11 (mod 221).

We can combine these relations with the earlier relations (3.23) to obtain new
square equalities, for example

(19 · 21 · 23)² ≡ (2³ · 5 · 7 · 11)² (mod 221).

These give another way to factor 221:

gcd(221, 19 · 21 · 23 − 2³ · 5 · 7 · 11) = gcd(221, 6097) = 13.
⁹Looking back at the congruences (3.23), you may have noticed that it is even
easier to use the fact that 15² is itself congruent to a square modulo 221, yielding
gcd(15 − 2, 221) = 13. In practice, the true power of the quadratic sieve appears only when
it is applied to numbers much too large to use in a textbook example.
Remark 3.52. If p is an odd prime, then the congruence t² ≡ N (mod p) has
either 0 or 2 solutions modulo p. More generally, congruences

t² ≡ N (mod pᵉ)

modulo powers of p have either 0 or 2 solutions. (See Exercises 1.36 and 1.37.)
This makes sieving odd prime powers relatively straightforward. Sieving with
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
4 35 68 103 140 179 220 263 308 355 404 455 508 563 620 679
↓2 ↓2 ↓2 ↓2 ↓2 ↓2 ↓2 ↓2
2 35 34 103 70 179 110 263 154 355 202 455 254 563 310 679
↓4 ↓4 ↓4 ↓4 ↓4 ↓4 ↓4 ↓4
1 35 17 103 35 179 55 263 77 355 101 455 127 563 155 679
↓5 ↓5 ↓5
1 7 17 103 35 179 11 263 77 355 101 91 127 563 155 679
↓5 ↓5 ↓5
1 7 17 103 7 179 11 263 77 71 101 91 127 563 31 679
↓7 ↓7 ↓7
1 1 17 103 7 179 11 263 11 71 101 91 127 563 31 97
↓7 ↓7
1 1 17 103 1 179 11 263 11 71 101 13 127 563 31 97
↓11
1 1 17 103 1 179 11 263 1 71 101 13 127 563 31 97
↓11
1 1 17 103 1 179 1 263 1 71 101 13 127 563 31 97
Figure 3.3: Sieving N = 221 using prime powers up to B = 11
powers of 2 is a bit trickier, since the number of solutions may be different
modulo 2, modulo 4, and modulo higher powers of 2. Further, there may
be more than two solutions. For example, t² ≡ N (mod 8) has four different
solutions modulo 8 if N ≡ 1 (mod 8). So although sieving powers of 2 is not
intrinsically difficult, it must be dealt with as a special case.
Remark 3.53. There are many implementation ideas that can be used to
greatly increase the practical speed of the quadratic sieve. Although the run-
ning time of the sieve remains a constant multiple of L(N), the multiple can
be significantly reduced.
A time-consuming part of the sieve is the necessity of dividing every pth en-
try by p, since if the numbers are large, division by p is moderately compli-
cated. Of course, computers perform division quite rapidly, but the sieving
process requires approximately L(N) divisions, so anything that decreases
this time will have an immediate effect. A key idea to speed up this step is
to use approximate logarithms, which allows the slower division operations to
be replaced by faster subtraction operations.
We explain the basic idea. Instead of using the list of values
F(a), F(a + 1), F(a + 2), . . . ,
we use a list of integer values that are approximately equal to
log F(a), log F(a + 1), log F(a + 2), log F(a + 3), . . . .
In order to sieve p from F(t), we subtract an integer approximation of log p
from the integer approximation to log F(t), since by the rule of logarithms,
log F(t) − log p = log(F(t)/p).
If we were to use exact values for the logarithms, then at the end of the sieving
process, the entries that are reduced to 0 would be precisely the values of F(t)
that are B-smooth. However, since we use only approximate logarithm values,
at the end we look for entries that have been reduced to a small number. Then
we use division on only those few entries to find the ones that are actually
B-smooth.
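Here is a minimal sketch of the approximate-logarithm idea (our own toy version applied to the N = 221 example; real implementations tune the rounding and the threshold carefully):

```python
import math

def log_sieve(values, primes):
    # start from rounded log2 of each entry and subtract round(log2 p)
    # once for every prime power dividing it
    logs = [round(math.log2(v)) for v in values]
    top = max(values)
    for p in primes:
        q = p
        while q <= top:
            for i, v in enumerate(values):
                if v % q == 0:
                    logs[i] -= round(math.log2(p))
            q *= p
    # entries with a tiny leftover are smoothness candidates; only these
    # few then get confirmed by actual trial division
    return [i for i, rest in enumerate(logs) if rest <= 1]

vals = [t * t - 221 for t in range(15, 31)]
print(log_sieve(vals, [2, 5, 7, 11]))   # indices of t = 15, 16, 19, 21, 23
```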
A second idea that can be used to speed the quadratic sieve is to use the
polynomial F(t) = t² − N only until t reaches a certain size, and then replace
it with a new polynomial. For details of these two implementation ideas and
many others, see for example [28, §10.4], [34], or [109] and the references that
they list.
3.7.3 The Number Field Sieve
The number field sieve is a factorization method that works in a ring that is
larger than the ordinary integers. The full details are very complicated, so in
this section we are content to briefly explain some of the ideas that go into
making the number field sieve the fastest known method for factoring large
numbers of the form N = pq, where p and q are primes of approximately the
same order of magnitude.
In order to factor N, we start by finding a nonzero integer m and an
irreducible monic polynomial f(x) ∈ Z[x] of small degree satisfying
f(m) ≡ 0 (mod N).
Example 3.54. Suppose that we want to factor the number N = 2^512 + 1. Then
we could take m = 2^103 and f(x) = x⁵ + 8, since

f(m) = f(2^103) = 2^515 + 8 = 8(2^512 + 1) ≡ 0 (mod 2^512 + 1).
Let d be the degree of f(x) and let β be a root of f(x). (Note that β might
be a complex number.) We will work in the ring

Z[β] = {c0 + c1β + c2β² + · · · + cd−1β^(d−1) ∈ C : c0, c1, . . . , cd−1 ∈ Z}.
Note that although we have written Z[β] as a subring of the complex numbers,
it isn’t actually necessary to deal with real or complex numbers. We can work
with Z[β] purely algebraically, since it is equal to the quotient ring Z[x]/(f(x)).
(See Sect. 2.10.2 for information about quotient rings.)
Example 3.55. We give an example to illustrate how one performs addition
and multiplication in the ring Z[β]. Let f(x) = 1 + 3x − 2x³ + x⁴, let β be a
root of f(x), and consider the ring Z[β]. In order to add the elements

u = 2 − 4β + 7β² + 3β³ and v = 1 + 2β − 4β² − 2β³,

we simply add their coefficients,

u + v = 3 − 2β + 3β² + β³.

Multiplication is a bit more complicated. First we multiply u and v, treating β
as if it were a variable,

uv = 2 − 9β² + 29β³ − 14β⁴ − 26β⁵ − 6β⁶.

Then we divide by f(β) = 1 + 3β − 2β³ + β⁴, still treating β as a variable,
and keep the remainder,

uv = 92 + 308β + 111β² − 133β³ ∈ Z[β].
The next step in the number field sieve is to find a large number of pairs
of integers (a1, b1), . . . , (ak, bk) that simultaneously satisfy

∏_{i=1}^{k} (ai − bim) is a square in Z and ∏_{i=1}^{k} (ai − biβ) is a square in Z[β].

Thus there is an integer A ∈ Z and an element α ∈ Z[β] such that

∏_{i=1}^{k} (ai − bim) = A² and ∏_{i=1}^{k} (ai − biβ) = α².    (3.24)

By definition of Z[β], we can find an expression for α of the form

α = c0 + c1β + c2β² + · · · + cd−1β^(d−1) with c0, c1, . . . , cd−1 ∈ Z.    (3.25)
Recall our original assumption f(m) ≡ 0 (mod N). This means that we have
m ≡ β (mod N) in the ring Z[β].
So on the one hand, (3.24) becomes
A^2 ≡ α^2 (mod N) in the ring Z[β],
while on the other hand, (3.25) becomes
α ≡ c_0 + c_1 m + c_2 m^2 + · · · + c_{d−1} m^{d−1} (mod N) in the ring Z[β].
Hence
A^2 ≡ (c_0 + c_1 m + c_2 m^2 + · · · + c_{d−1} m^{d−1})^2 (mod N).
Thus we have created a congruence A^2 ≡ B^2 (mod N) that is valid in the ring of integers Z, and as usual, there is then a good chance that gcd(A − B, N) will yield a nontrivial factor of N.
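This final gcd step is the same one used by every difference-of-squares factoring method. Here is a toy illustration with numbers of our own choosing (not from the text): with N = 221 we have 30^2 ≡ 16 ≡ 4^2 (mod 221), and the gcd exposes a factor.

```python
from math import gcd

N = 221                      # toy modulus (13 * 17); not from the text
A, B = 30, 4                 # 30^2 = 900 ≡ 16 ≡ 4^2 (mod 221)
assert A**2 % N == B**2 % N  # the congruence A² ≡ B² (mod N)

factor = gcd(A - B, N)       # a nontrivial factor, if we are lucky
print(factor, N // factor)   # → 13 17
```

When gcd(A − B, N) comes out as 1 or N, the method simply tries again with a different congruence.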
How do we find the (ai, bi) pairs to make both of the products (3.24) into
squares? For the first product, we can use a sieve-type algorithm, similar to the
method used in the quadratic sieve, to find values of a − bm that are smooth,
and then use linear algebra to find a subset with the desired property.
Pollard’s idea is to simultaneously do something similar for the second
product while working in the ring Z[β]. Thus we look for pairs of integers (a, b)
such that the quantity a − bβ is “smooth” in Z[β]. There are many serious
issues that arise when we try to do this, including the following:
1. The ring Z[β] usually does not have unique factorization of elements into
primes or irreducible elements. So instead, we factor the ideal (a−bβ) into
a product of prime ideals. We say that a−bβ is smooth if the prime ideals
appearing in the factorization are small.
2. Unfortunately, even ideals in the ring Z[β] may not have unique factoriza-
tion as a product of prime ideals. However, there is a slightly larger ring,
called the ring of integers of Q(β), in which unique factorization of ideals
is true.
3. Suppose that we have managed to make the ideal (∏(a_i − b_i β)) into the square of an ideal in Z[β]. There are two further problems. First, it need not be the square of an ideal generated by a single element. Second, even if it is equal to an ideal of the form (γ)^2, we can conclude only that ∏(a_i − b_i β) = uγ^2 for some unit u ∈ Z[β]^*, and generally the ring Z[β] has infinitely many units.
It would take us too far afield to explain how to deal with these potential
difficulties. Suffice it to say that through a number of ingenious ideas due
to Adleman, Buhler, H. Lenstra, Pomerance, and others, the obstacles were
overcome, leading to a practical factorization method. (See [105] for a nice
overview of the number field sieve and some of the ideas used to turn it from
a theoretical construction into a working algorithm.)
However, we will comment further on the first step in the algorithm. In
order to get started, we need an integer m and a monic irreducible polyno-
mial f(x) of small degree such that f(m) ≡ 0 (mod N). The trick is first to
choose the desired degree d of f, next to choose an integer m satisfying
(N/2)^{1/d} < m < N^{1/d},
and then to write N as a number to the base m,
N = c_0 + c_1 m + c_2 m^2 + · · · + c_{d−1} m^{d−1} + c_d m^d with 0 ≤ c_i < m.
The condition on m ensures that c_d = 1, so we can take f to be the monic polynomial
f(x) = c_0 + c_1 x + c_2 x^2 + · · · + c_{d−1} x^{d−1} + x^d.
We also need f(x) to be irreducible, but if f(x) factors in Z[x], say f(x) =
g(x)h(x), then N = f(m) = g(m)h(m) gives a factorization of N and we are
done. So now we have an f(x) and an m, which allows us to get started using
the number field sieve.
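The base-m construction just described is easy to carry out in code. The sketch below (our own helper names; the simple decrement loop for the integer root is illustrative, not optimized) picks m as the integer d-th root of N and reads off the digits of N in base m. Provided m is large compared with d, the leading digit c_d comes out equal to 1, as the text explains.

```python
def integer_root(N, d):
    """Largest m with m**d <= N (float guess, then a decrement loop)."""
    m = int(round(N ** (1.0 / d))) + 2
    while m ** d > N:
        m -= 1
    return m

def base_m_polynomial(N, d):
    """Return m and the digits c_0, ..., c_d of N in base m."""
    m = integer_root(N, d)
    coeffs = []
    while N > 0:
        N, c = divmod(N, m)
        coeffs.append(c)
    return m, coeffs

# Toy example: N = 2^64 + 13 with d = 4 gives m = 2^16 and f(x) = x^4 + 13.
m, coeffs = base_m_polynomial(2**64 + 13, 4)
print(m, coeffs)   # → 65536 [13, 0, 0, 0, 1]
```

The digit list read highest degree first is exactly the coefficient list of the monic polynomial f(x), and f(m) = N ≡ 0 (mod N) automatically.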
There is no denying the fact that the number field sieve is much more
complicated than the quadratic sieve. So why is it useful? The reason has to
do with the size of the numbers that must be considered. Recall that for the
quadratic sieve, we sieved to find smooth numbers of the form
(⌊√N⌋ + k)^2 − N for k = 1, 2, 3, . . . .
So we needed to pick out the smooth numbers from a set of numbers whose size is a little larger than √N. For the number field sieve one ends up looking for smooth numbers of the form
(a − mb) · b^d f(a/b),    (3.26)
and it turns out that by a judicious choice of m and f, these numbers are much smaller than √N. In order to describe how much smaller, we use a generalization of the subexponential function L(N) that was so useful in describing the running time of the quadratic sieve.
Definition. For any 0 < ε < 1, we define the function
L_ε(X) = e^{(ln X)^ε (ln ln X)^{1−ε}}.
Notice that with this notation, the function L(X) defined in Sect. 3.7.1 is L_{1/2}(X).
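The function L_ε is a one-liner, and evaluating it for a cryptographically sized X shows how sharply the exponent falls as ε decreases from 1/2 (the quadratic sieve scale) to 1/3 (the number field sieve scale). A small sketch, with names of our own choosing:

```python
import math

def L(eps, X):
    """The subexponential function L_eps(X) = exp((ln X)^eps * (ln ln X)^(1-eps))."""
    lnX = math.log(X)
    return math.exp(lnX ** eps * math.log(lnX) ** (1 - eps))

X = 10 ** 100
for eps in (1/3, 1/2, 2/3):
    print(eps, L(eps, X))
```

For X = 10^100 the three values differ by many orders of magnitude, which is the whole point of pushing the exponent from 1/2 down to 1/3.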
Then one can show that the numbers (3.26) used by the number field sieve have size a small power of L_{2/3}(N). To put this into perspective, the quadratic sieve works with numbers having approximately half as many digits as N, while the number field sieve uses numbers K satisfying
(Number of digits of K) ≈ (Number of digits of N)^{2/3}.
This leads to a vastly improved running time for sufficiently large values of N.
Theorem 3.56. Under some reasonable assumptions, the expected running time of the number field sieve to factor the number N is L_{1/3}(N)^c for a small value of c.
For general numbers, the best known value of c in Theorem 3.56 is a bit less than 2, while for special numbers such as 2^{2^9} + 1 it is closer to 1.5. Of course, the number field sieve is sufficiently complicated that it becomes faster than other methods only when N is sufficiently large. As a practical matter, the quadratic sieve is faster for numbers smaller than 10^{100}, while the number field sieve is faster for numbers larger than 10^{130}.
3.8 The Index Calculus Method for Computing Discrete Logarithms in F_p
The index calculus is a method for solving the discrete logarithm problem in a
finite field F_p. The algorithm uses smooth numbers and bears some similarity
to the sieve methods that we have studied in this chapter, which is why we
cover it here, rather than in Chap. 2, where we originally discussed discrete
logarithms.
The idea behind the index calculus is fairly simple. We want to solve the discrete logarithm problem
g^x ≡ h (mod p),    (3.27)
where the prime p and the integers g and h are given. For simplicity, we will assume that g is a primitive root modulo p, so its powers give all of F_p^*.
Rather than solving (3.27) directly, we instead choose a value B and solve the discrete logarithm problem
g^x ≡ ℓ (mod p) for all primes ℓ ≤ B.
In other words, we compute the discrete logarithm log_g(ℓ) for every prime ℓ satisfying ℓ ≤ B.
Having done this, we next look at the quantities
h · g^{−k} (mod p) for k = 1, 2, . . .
until we find a value of k such that h · g^{−k} (mod p) is B-smooth. For this value of k we have
h · g^{−k} ≡ ∏_{ℓ≤B} ℓ^{e_ℓ} (mod p)    (3.28)
for certain exponents e_ℓ. We rewrite (3.28) in terms of discrete logarithms as
log_g(h) ≡ k + ∑_{ℓ≤B} e_ℓ · log_g(ℓ) (mod p − 1),    (3.29)
where we recall that discrete logarithms are defined only modulo p − 1. But we are assuming that we already computed log_g(ℓ) for all primes ℓ ≤ B. Hence (3.29) gives the value of log_g(h).
It remains to explain how to find log_g(ℓ) for small primes ℓ. Again the idea is simple. For a random selection of exponents i we compute
g_i ≡ g^i (mod p) with 0 < g_i < p.
If g_i is not B-smooth, then we discard it, while if g_i is B-smooth, then we can factor it as
g_i = ∏_{ℓ≤B} ℓ^{u_ℓ(i)}.
In terms of discrete logarithms, this gives the relation
i ≡ log_g(g_i) ≡ ∑_{ℓ≤B} u_ℓ(i) · log_g(ℓ) (mod p − 1).    (3.30)
Notice that the only unknown quantities in the formula (3.30) are the discrete logarithm values log_g(ℓ). So if we can find more than π(B) equations like (3.30), then we can use linear algebra to solve for the log_g(ℓ) "variables."
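A toy relation-finding loop is easy to write. The sketch below (our own code) trial-divides g^i mod p over the factor base {2, 3, 5} and collects relations of the form (3.30), using the parameters p = 18443, g = 37, B = 5 of Example 3.58.

```python
def smooth_factor(n, factor_base):
    """Exponent dict if n factors completely over factor_base, else None."""
    exps = {}
    for ell in factor_base:
        e = 0
        while n % ell == 0:
            n //= ell
            e += 1
        exps[ell] = e
    return exps if n == 1 else None

p, g, factor_base = 18443, 37, [2, 3, 5]

relations = []                 # pairs (i, exps) with g^i ≡ ∏ ℓ^e_ℓ (mod p)
for i in range(1, 5000):
    exps = smooth_factor(pow(g, i, p), factor_base)
    if exps is not None:
        relations.append((i, exps))

print(len(relations), relations[:4])
```

Each relation is one linear equation in the unknowns log_g(2), log_g(3), log_g(5) modulo p − 1; once more than π(B) = 3 of them are in hand, linear algebra takes over.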
This method of solving the discrete logarithm problem in F_p is called the index calculus, where we recall from Sect. 2.2 that index is an older name for
discrete logarithm. The index calculus first appears in work of Western and
Miller [148] in 1968, so it predates by a few years the invention of public key
cryptography. The method was independently rediscovered by several cryp-
tographers in the 1970s after the publication of the Diffie–Hellman paper [38].
Remark 3.57. A minor issue that we have ignored is the fact that the lin-
ear equations (3.30) are congruences modulo p − 1. Standard linear algebra
methods such as Gaussian elimination do not work well modulo composite
numbers, because there are numbers that do not have multiplicative inverses.
The Chinese remainder theorem (Theorem 2.24) solves this problem. First we
solve the congruences (3.30) modulo q for each prime q dividing p − 1. Then, if q appears in the factorization of p − 1 to a power q^e, we lift the solution from Z/qZ to Z/q^e Z. Finally, we use the Chinese remainder theorem to combine solutions modulo prime powers to obtain a solution modulo p − 1. In
cryptographic applications one should choose p such that p − 1 is divisible by
a large prime; otherwise, the Pohlig–Hellman algorithm (Sect. 2.9) solves the
discrete logarithm problem. For example, if we select p = 2q +1 with q prime,
then the index calculus requires us to solve simultaneous congruences (3.30)
modulo q and modulo 2.
There are many implementation issues that arise and tricks that have been
developed in practical applications of the index calculus. We do not pursue
these matters here, but are content to present a small numerical example
illustrating how the index calculus works.
Example 3.58. We let p be the prime p = 18443 and use the index calculus to solve the discrete logarithm problem
37^x ≡ 211 (mod 18443).
We note that g = 37 is a primitive root modulo p = 18443. We take B = 5, so our factor base is the set of primes {2, 3, 5}. We start by taking random powers of g = 37 modulo 18443 and pick out the ones that are B-smooth. A couple of hundred attempts gives four equations:
g^{12708} ≡ 2^3 · 3^4 · 5 (mod 18443),    g^{11311} ≡ 2^3 · 5^2 (mod 18443),
g^{15400} ≡ 2^3 · 3^3 · 5 (mod 18443),    g^{2731} ≡ 2^3 · 3 · 5^4 (mod 18443).    (3.31)
These in turn give linear relations for the discrete logarithms of 2, 3, and 5 to the base g. For example, the first one says that
12708 = 3 · log_g(2) + 4 · log_g(3) + log_g(5).
To ease notation, we let
x_2 = log_g(2), x_3 = log_g(3), and x_5 = log_g(5).
Then the four congruences (3.31) become the following four linear relations:
12708 ≡ 3x_2 + 4x_3 + x_5 (mod 18442),
11311 ≡ 3x_2 + 2x_5 (mod 18442),
15400 ≡ 3x_2 + 3x_3 + x_5 (mod 18442),
2731 ≡ 3x_2 + x_3 + 4x_5 (mod 18442).    (3.32)
Note that the formulas (3.32) are congruences modulo
p − 1 = 18442 = 2 · 9221,
since discrete logarithms are defined only modulo p − 1. The number 9221 is prime, so we need to solve the system of linear equations (3.32) modulo 2 and modulo 9221. This is easily accomplished by Gaussian elimination, i.e., by adding multiples of one equation to another to eliminate variables. The solutions are
(x_2, x_3, x_5) ≡ (1, 0, 1) (mod 2),
(x_2, x_3, x_5) ≡ (5733, 6529, 6277) (mod 9221).
Combining these solutions yields
(x_2, x_3, x_5) ≡ (5733, 15750, 6277) (mod 18442).
We check the solutions by computing
37^{5733} ≡ 2 (mod 18443),  37^{15750} ≡ 3 (mod 18443),  37^{6277} ≡ 5 (mod 18443).
Recall that our ultimate goal is to solve the discrete logarithm problem
37^x ≡ 211 (mod 18443).
We compute the value of 211 · 37^{−k} (mod 18443) for random values of k until we find a value that is B-smooth. After a few attempts we find that
211 · 37^{−9549} ≡ 2^5 · 3^2 · 5^2 (mod 18443).
Using the values of the discrete logs of 2, 3, and 5 from above, this yields
log_g(211) = 9549 + 5 log_g(2) + 2 log_g(3) + 2 log_g(5)
= 9549 + 5 · 5733 + 2 · 15750 + 2 · 6277 ≡ 8500 (mod 18442).
Finally, we check our answer log_g(211) = 8500 by computing
37^{8500} ≡ 211 (mod 18443).
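The last step of Example 3.58 can be replayed in a few lines. We take the discrete logs of 2, 3, and 5 found above as given, search for any exponent k making 211 · 37^{−k} mod p smooth, and assemble log_g(211) via (3.29). The helper names are ours; any smooth k gives the same answer, since the logarithm is unique modulo p − 1.

```python
p, g, h = 18443, 37, 211
logs = {2: 5733, 3: 15750, 5: 6277}   # log_g(ℓ) values from Example 3.58

def smooth_factor(n, factor_base):
    exps = {}
    for ell in factor_base:
        while n % ell == 0:
            n //= ell
            exps[ell] = exps.get(ell, 0) + 1
    return exps if n == 1 else None

g_inv = pow(g, p - 2, p)              # g^(-1) mod p, by Fermat's little theorem
val = h
for k in range(1, p):                 # a smooth value is guaranteed to appear
    val = val * g_inv % p             # val = h * g^(-k) mod p
    exps = smooth_factor(val, [2, 3, 5])
    if exps is not None:
        x = (k + sum(e * logs[ell] for ell, e in exps.items())) % (p - 1)
        break

print(x, pow(g, x, p))                # → 8500 211
```

The final `pow` call is the same sanity check the text performs: 37^8500 ≡ 211 (mod 18443).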
Remark 3.59. We can roughly estimate the running time of the index calculus as follows. Using a factor base consisting of primes less than B, we need to find approximately π(B) numbers of the form g^i (mod p) that are B-smooth. Proposition 3.48 suggests that we should take B = L(p)^{1/√2}, and then we will have to check approximately L(p)^{√2} values of i. There is also the issue of checking each value to see whether it is B-smooth, but sieve-type methods can be used to speed the process. Further, using ideas based on the number field sieve, the running time can be further reduced to a small power of L_{1/3}(p). In any case, the index calculus is a subexponential algorithm for solving the discrete logarithm problem in F_p^*. This stands in marked contrast to the discrete logarithm problem in elliptic curve groups, which we study in Chap. 6. Currently, the best known algorithms to solve the general discrete logarithm problem in elliptic curve groups are fully exponential.
3.9 Quadratic Residues and Quadratic
Reciprocity
Let p be a prime number. Here is a simple mathematical question:
How can Bob tell whether a given number a is
equal to a square modulo p?
For example, suppose that Alice asks Bob whether 181 is a square mod-
ulo 1223. One way for Bob to answer Alice’s question is by constructing a table
of squares modulo 1223 as illustrated in Table 3.8, but this is a lot of work, so he gave up after computing 96^2 mod 1223. Alice picked up the computation where Bob stopped and eventually found that 437^2 ≡ 181 (mod 1223). Thus the answer to her question is that 181 is indeed a square modulo 1223. Similarly, if Alice is sufficiently motivated to continue the table all the way up to 1222^2 mod 1223, she can verify that the number 385 is not a square modulo 1223, because it does not appear in her table. (In fact, Alice can save half her time by computing only up to 611^2 mod 1223, since a^2 and (p − a)^2 have the same values modulo p.)
Our goal in this section is to describe a much more efficient way to check whether a number is a square modulo a prime. We begin with a definition.
Definition. Let p be an odd prime number and let a be a number with p ∤ a. We say that a is a quadratic residue modulo p if a is a square modulo p, i.e., if there is a number c so that c^2 ≡ a (mod p). If a is not a square modulo p, i.e., if there exists no such c, then a is called a quadratic nonresidue modulo p.
Example 3.60. The numbers 968 and 1203 are both quadratic residues modulo 1223, since
453^2 ≡ 968 (mod 1223) and 375^2 ≡ 1203 (mod 1223).
1^2 ≡ 1     2^2 ≡ 4     3^2 ≡ 9     4^2 ≡ 16    5^2 ≡ 25    6^2 ≡ 36    7^2 ≡ 49    8^2 ≡ 64    9^2 ≡ 81
10^2 ≡ 100  11^2 ≡ 121  12^2 ≡ 144  13^2 ≡ 169  14^2 ≡ 196  15^2 ≡ 225  16^2 ≡ 256  17^2 ≡ 289  18^2 ≡ 324
19^2 ≡ 361  20^2 ≡ 400  21^2 ≡ 441  22^2 ≡ 484  23^2 ≡ 529  24^2 ≡ 576  25^2 ≡ 625  26^2 ≡ 676  27^2 ≡ 729
28^2 ≡ 784  29^2 ≡ 841  30^2 ≡ 900  31^2 ≡ 961  32^2 ≡ 1024 33^2 ≡ 1089 34^2 ≡ 1156 35^2 ≡ 2    36^2 ≡ 73
37^2 ≡ 146  38^2 ≡ 221  39^2 ≡ 298  40^2 ≡ 377  41^2 ≡ 458  42^2 ≡ 541  43^2 ≡ 626  44^2 ≡ 713  45^2 ≡ 802
46^2 ≡ 893  47^2 ≡ 986  48^2 ≡ 1081 49^2 ≡ 1178 50^2 ≡ 54   51^2 ≡ 155  52^2 ≡ 258  53^2 ≡ 363  54^2 ≡ 470
55^2 ≡ 579  56^2 ≡ 690  57^2 ≡ 803  58^2 ≡ 918  59^2 ≡ 1035 60^2 ≡ 1154 61^2 ≡ 52   62^2 ≡ 175  63^2 ≡ 300
64^2 ≡ 427  65^2 ≡ 556  66^2 ≡ 687  67^2 ≡ 820  68^2 ≡ 955  69^2 ≡ 1092 70^2 ≡ 8    71^2 ≡ 149  72^2 ≡ 292
73^2 ≡ 437  74^2 ≡ 584  75^2 ≡ 733  76^2 ≡ 884  77^2 ≡ 1037 78^2 ≡ 1192 79^2 ≡ 126  80^2 ≡ 285  81^2 ≡ 446
82^2 ≡ 609  83^2 ≡ 774  84^2 ≡ 941  85^2 ≡ 1110 86^2 ≡ 58   87^2 ≡ 231  88^2 ≡ 406  89^2 ≡ 583  90^2 ≡ 762
91^2 ≡ 943  92^2 ≡ 1126 93^2 ≡ 88   94^2 ≡ 275  95^2 ≡ 464  96^2 ≡ 655  . . .
Table 3.8: Bob's table of squares modulo 1223
On the other hand, the numbers 209 and 888 are quadratic nonresidues modulo 1223, since the congruences
c^2 ≡ 209 (mod 1223) and c^2 ≡ 888 (mod 1223)
have no solutions.
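Bob's table amounts to enumerating the set of nonzero squares modulo p, which a brute-force check does instantly for a small prime like 1223 (a sketch with our own variable names, exploiting the a^2 = (p − a)^2 symmetry noted above):

```python
p = 1223
# c and p - c have the same square, so c = 1, ..., 611 suffices
squares = {c * c % p for c in range(1, (p + 1) // 2)}

print(181 in squares)   # → True   (437² ≡ 181, as Alice found)
print(385 in squares)   # → False
print(968 in squares)   # → True
print(209 in squares)   # → False
```

Of course, this takes time proportional to p, which is exactly why the section goes on to develop quadratic reciprocity for large primes.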
The next proposition describes what happens when quadratic residues and
nonresidues are multiplied together.
Proposition 3.61. Let p be an odd prime number.
(a) The product of two quadratic residues modulo p is a quadratic residue
modulo p.
(b) The product of a quadratic residue and a quadratic nonresidue modulo p
is a quadratic nonresidue modulo p.
(c) The product of two quadratic nonresidues modulo p is a quadratic residue
modulo p.
Proof. It is easy to prove (a) and (b) directly from the definition of quadratic residue, but we use a different approach that gives all three parts simultaneously. Let g be a primitive root modulo p as described in Theorem 1.30. This means that the powers 1, g, g^2, . . . , g^{p−2} are all distinct modulo p.
Which powers of g are quadratic residues modulo p? Certainly if m = 2k is even, then g^m = g^{2k} = (g^k)^2 is a square.
On the other hand, let m be odd, say m = 2k + 1, and suppose that g^m is a quadratic residue, say g^m ≡ c^2 (mod p). Fermat's little theorem (Theorem 1.24) tells us that
c^{p−1} ≡ 1 (mod p).
However, c^{p−1} (mod p) is also equal to
c^{p−1} ≡ (c^2)^{(p−1)/2} ≡ (g^m)^{(p−1)/2} ≡ (g^{2k+1})^{(p−1)/2} ≡ g^{k(p−1)} · g^{(p−1)/2} (mod p).
Another application of Fermat's little theorem tells us that
g^{k(p−1)} ≡ (g^{p−1})^k ≡ 1^k ≡ 1 (mod p),
so we find that
g^{(p−1)/2} ≡ 1 (mod p).
This contradicts the fact that g is a primitive root, which proves that every odd power of g is a quadratic nonresidue.
We have proven an important dichotomy. If g is a primitive root modulo p, then
g^m is a quadratic residue if m is even, and a quadratic nonresidue if m is odd.
It is now a simple matter to prove Proposition 3.61. In each case we write a and b as powers of g, multiply a and b by adding their exponents, and read off the result.
(a) Suppose that a and b are quadratic residues. Then a = g^{2i} and b = g^{2j}, so ab = g^{2(i+j)} has even exponent, and hence ab is a quadratic residue.
(b) Let a be a quadratic residue and let b be a nonresidue. Then a = g^{2i} and b = g^{2j+1}, so ab = g^{2(i+j)+1} has odd exponent, and hence ab is a quadratic nonresidue.
(c) Finally, let a and b both be nonresidues. Then a = g^{2i+1} and b = g^{2j+1}, so ab = g^{2(i+j+1)} has even exponent, and hence ab is a quadratic residue.
If we write QR to denote a quadratic residue and NR to denote a quadratic
nonresidue, then Proposition 3.61 may be succinctly summarized by the three
equations
QR · QR = QR, QR · NR = NR, NR · NR = QR.
Do these equations look familiar? They resemble the rules for multiplying 1
and −1. This observation leads to the following definition.
Definition. Let p be an odd prime. The Legendre symbol of a is the quantity (a/p) defined by the rules
(a/p) = 1 if a is a quadratic residue modulo p,
(a/p) = −1 if a is a quadratic nonresidue modulo p,
(a/p) = 0 if p | a.
With this definition, Proposition 3.61 is summarized by the simple multiplication rule
(a/p)(b/p) = (ab/p).    (3.33)
(Proposition 3.61 deals only with the case that p ∤ a and p ∤ b. But if p divides a or b, then p also divides ab, so both sides of (3.33) are zero.)
We also make the obvious, but useful, observation that
if a ≡ b (mod p), then (a/p) = (b/p).    (3.34)
Thus in computing (a/p), we may reduce a modulo p into the interval from 0 to p − 1. It is worth adding a cautionary note: the notation for the Legendre symbol resembles a fraction, but it is not a fraction!
Returning to our original question of determining whether a given number
is a square modulo p, the following beautiful and powerful theorem provides
a method for determining the answer.
Theorem 3.62 (Quadratic Reciprocity). Let p and q be odd primes.
(a) (−1/p) = 1 if p ≡ 1 (mod 4), and (−1/p) = −1 if p ≡ 3 (mod 4).
(b) (2/p) = 1 if p ≡ 1 or 7 (mod 8), and (2/p) = −1 if p ≡ 3 or 5 (mod 8).
(c) (p/q) = (q/p) if p ≡ 1 (mod 4) or q ≡ 1 (mod 4), and (p/q) = −(q/p) if p ≡ 3 (mod 4) and q ≡ 3 (mod 4).
Proof. We do not give a proof of quadratic reciprocity, but you will find a proof in any introductory number theory textbook, such as [35, 52, 59, 100, 111].
The name "quadratic reciprocity" comes from property (c), which tells us how (p/q) is related to its "reciprocal" (q/p). It is worthwhile spending some time contemplating Theorem 3.62, because despite the simplicity of its statement, quadratic reciprocity is saying something quite unexpected and profound. The value of (p/q) tells us whether p is a square modulo q. Similarly, (q/p) tells us whether q is a square modulo p. There is no a priori reason to suspect that these questions should have anything to do with one another. Quadratic reciprocity tells us that they are intimately related, and indeed, related by a very simple rule.
Similarly, parts (a) and (b) of quadratic reciprocity give us some surprising information. The first part says that the question of whether −1 is a square modulo p is answered by the congruence class of p modulo 4, and the second part says that the question of whether 2 is a square modulo p is answered by the congruence class of p modulo 8.
We indicated earlier that quadratic reciprocity can be used to determine whether a is a square modulo p. The way to apply quadratic reciprocity is to use (c) to repeatedly flip the Legendre symbol, where each time that we flip, we're allowed to reduce the top number modulo the bottom number. This leads to a rapid reduction in the size of the numbers, as illustrated by the following example.
Example 3.63. We check whether −15750 is a quadratic residue modulo 37907 using quadratic reciprocity to compute the Legendre symbol (−15750/37907):

(−15750/37907) = (−1/37907)(15750/37907)                      Multiplication rule (3.33)
               = −(15750/37907)                               Quadratic Reciprocity 3.62(a)
               = −(2 · 3^2 · 5^3 · 7 / 37907)                 Factor 15750
               = −(2/37907)(3/37907)^2 (5/37907)^3 (7/37907)  Multiplication rule (3.33)
               = −(2/37907)(5/37907)(7/37907)                 since (−1)^2 = 1
               = (5/37907)(7/37907)                           Quadratic Reciprocity 3.62(b)
               = (37907/5) × −(37907/7)                       Quadratic Reciprocity 3.62(c)
               = −(2/5)(2/7)                                  since 37907 ≡ 2 (mod 5) and 37907 ≡ 2 (mod 7)
               = −(−1) × 1 = 1.                               Quadratic Reciprocity 3.62(b)

Thus (−15750/37907) = 1, so we conclude that −15750 is a square modulo 37907. Note that our computation using Legendre symbols does not tell us how to solve c^2 ≡ −15750 (mod 37907); it tells us only that there is a solution. For those who are curious, we mention that c = 10982 is a solution.
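There is also a direct way to compute a Legendre symbol without any flipping: Euler's criterion (a standard fact not stated in this section) says that (a/p) ≡ a^{(p−1)/2} (mod p) for an odd prime p. It makes a handy cross-check on hand computations like the one above; the function name below is our own.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    ls = pow(a % p, (p - 1) // 2, p)
    return -1 if ls == p - 1 else ls   # ls is 0, 1, or p - 1 ≡ -1

print(legendre(-15750, 37907))   # → 1, agreeing with Example 3.63
print(legendre(181, 1223))       # → 1   (181 is a square mod 1223)
print(legendre(385, 1223))       # → -1  (385 is not)
```

This costs one modular exponentiation, which is fast, but unlike reciprocity it offers no path to the Jacobi symbol for composite moduli developed next.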
Example 3.63 shows how quadratic reciprocity can be used to evaluate the Legendre symbol. However, you may have noticed that in the middle of our calculation, we needed to factor the number 15750. We were lucky that 15750 is easy to factor, but suppose that we were faced with a more difficult factorization problem. For example, suppose that we want to determine whether p = 228530738017 is a square modulo q = 9365449244297. It turns out that both p and q are prime. (If you don't believe that p and q are prime, use Miller–Rabin (Table 3.2) to check.) Hence we can use quadratic reciprocity to compute

(228530738017/9365449244297) = (9365449244297/228530738017)   since 228530738017 ≡ 1 (mod 4),
                             = (224219723617/228530738017)    reducing 9365449244297 modulo 228530738017.
Unfortunately, the number 224219723617 is not prime, so we cannot apply
quadratic reciprocity directly, and even more unfortunately, it is not an easy
number to factor (by hand). So it appears that quadratic reciprocity is useful
only if the intermediate calculations lead to numbers that we are able to factor.
Luckily, there is a fancier version of quadratic reciprocity that completely
eliminates this difficulty. In order to state it, we need to generalize the defini-
tion of the Legendre symbol.
Definition. Let a and b be integers and let b be odd and positive. Suppose that the factorization of b into primes is
b = p_1^{e_1} p_2^{e_2} p_3^{e_3} · · · p_t^{e_t}.
The Jacobi symbol (a/b) is defined by the formula
(a/b) = (a/p_1)^{e_1} (a/p_2)^{e_2} (a/p_3)^{e_3} · · · (a/p_t)^{e_t}.
Notice that if b is itself prime, then (a/b) is the original Legendre symbol, so the Jacobi symbol is a generalization of the Legendre symbol. Also note that we define the Jacobi symbol only for odd positive values of b.
Example 3.64. Here is a simple example of a Jacobi symbol, computed directly from the definition:
(123/323) = (123/17 · 19) = (123/17)(123/19) = (4/17)(9/19) = 1.
Here is a more complicated example:
(171337608/536134436237) = (171337608 / 29^3 · 59 · 67^2 · 83)
= (171337608/29)^3 (171337608/59)(171337608/67)^2 (171337608/83)
= (171337608/29)(171337608/59)(171337608/83)
= (11/29)(15/59)(44/83)
= (−1) · 1 · 1 = −1.
From the definition, it appears that we need to know how to factor b in order to compute the Jacobi symbol (a/b), so we haven't gained anything. However, it turns out that the Jacobi symbol inherits most of the properties of the Legendre symbol, which will allow us to compute (a/b) extremely rapidly without doing any factorization at all. We start with the basic multiplication and reduction properties.
Proposition 3.65. Let a, a_1, a_2, b, b_1, b_2 be integers with b, b_1, and b_2 positive and odd.
(a) (a_1 a_2 / b) = (a_1/b)(a_2/b) and (a / b_1 b_2) = (a/b_1)(a/b_2).
(b) If a_1 ≡ a_2 (mod b), then (a_1/b) = (a_2/b).
Proof. Both parts of Proposition 3.65 follow easily from the definition of the Jacobi symbol and the corresponding properties (3.33) and (3.34) of the Legendre symbol.
Now we come to the amazing fact that the Jacobi symbol satisfies exactly
the same reciprocity law as the Legendre symbol.
Theorem 3.66 (Quadratic Reciprocity: Version II). Let a and b be odd positive integers.
(a) (−1/b) = 1 if b ≡ 1 (mod 4), and (−1/b) = −1 if b ≡ 3 (mod 4).
(b) (2/b) = 1 if b ≡ 1 or 7 (mod 8), and (2/b) = −1 if b ≡ 3 or 5 (mod 8).
(c) (a/b) = (b/a) if a ≡ 1 (mod 4) or b ≡ 1 (mod 4), and (a/b) = −(b/a) if a ≡ 3 (mod 4) and b ≡ 3 (mod 4).
Proof. It is not hard to use the original version of quadratic reciprocity for the Legendre symbol (Theorem 3.62) to prove the more general version for the Jacobi symbol. See for example [59, Proposition 5.2.2] or [137, Theorem 22.2].
Example 3.67. When we tried to use the original version of quadratic reciprocity (Theorem 3.62) to compute (228530738017/9365449244297), we ran into the problem that we needed to factor the number 224219723617. Using the new and improved version of quadratic reciprocity (Theorem 3.66), we can perform the computation without doing any factoring:

(228530738017/9365449244297) = (9365449244297/228530738017) = (224219723617/228530738017)
= (228530738017/224219723617) = (4311014400/224219723617) = (2^10 · 4209975 / 224219723617)
= (224219723617/4209975) = (665092/4209975) = (2^2 · 166273 / 4209975)
= (4209975/166273) = (53150/166273) = (2 · 26575 / 166273)
= (26575/166273) = (166273/26575) = (6823/26575)
= −(26575/6823) = −(6106/6823) = −(2 · 3053 / 6823)
= −(3053/6823) = −(6823/3053) = −(717/3053)
= −(3053/717) = −(185/717) = −(717/185)
= −(162/185) = −(2 · 81 / 185) = −(81/185)
= −(185/81) = −(23/81) = −(81/23)
= −(12/23) = −(2^2 · 3 / 23) = (23/3) = (2/3) = −1.

Hence 228530738017 is not a square modulo 9365449244297.
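The flipping procedure of Example 3.67 is exactly the classical binary Jacobi algorithm, which runs about as fast as the Euclidean algorithm. A standard implementation (the code is ours, but it follows Theorem 3.66 step by step):

```python
def jacobi(a, b):
    """Jacobi symbol (a/b) for odd positive b, via quadratic reciprocity."""
    assert b > 0 and b % 2 == 1
    a %= b
    result = 1
    while a != 0:
        while a % 2 == 0:              # pull out factors of 2, Theorem 3.66(b)
            a //= 2
            if b % 8 in (3, 5):
                result = -result
        a, b = b, a                    # flip the symbol, Theorem 3.66(c)
        if a % 4 == 3 and b % 4 == 3:
            result = -result
        a %= b                         # reduce the top modulo the bottom
    return result if b == 1 else 0

print(jacobi(123, 323))                        # → 1,  as in Example 3.64
print(jacobi(228530738017, 9365449244297))     # → -1, as in Example 3.67
```

Note that no factorization is performed anywhere; the factor-base steps of the hand computation correspond only to stripping powers of 2.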
Remark 3.68. Suppose that (a/b) = 1, where b is some odd positive number. Does the fact that (a/b) = 1 tell us that a is a square modulo b? It does if b is prime, since that's how we defined the Legendre symbol, but what if b is composite? For example, suppose that b = pq is a product of two primes. Then by definition,
(a/b) = (a/pq) = (a/p)(a/q).
We see that there are two ways in which (a/b) can be equal to 1, namely 1 = 1 · 1 and 1 = (−1) · (−1). This leads to two different cases:
Case 1: (a/p) = (a/q) = 1, so a is a square modulo pq.
Case 2: (a/p) = (a/q) = −1, so a is not a square modulo pq.
We should justify our assertion that a is a square modulo pq in Case 1. Note that in Case 1, there are solutions to c_1^2 ≡ a (mod p) and c_2^2 ≡ a (mod q). We use the Chinese remainder theorem (Theorem 2.24) to find an integer c satisfying c ≡ c_1 (mod p) and c ≡ c_2 (mod q), and then c^2 ≡ a (mod pq).
Our conclusion is that if b = pq is a product of two primes, then although it is easy to compute the value of the Jacobi symbol (a/pq), this value does not tell us whether a is a square modulo pq. This dichotomy can be exploited for cryptographic purposes, as explained in the next section.
Example 3.69 (An application of quadratic reciprocity to the discrete logarithm problem). Let p be an odd prime, let g ∈ F_p^* be a primitive root, and let h ∈ F_p^*. As we have discussed, it is in general a difficult problem to compute the discrete logarithm log_g(h), i.e., to solve g^x = h. But one might ask if it is possible to easily extract some information about log_g(h). The answer is yes, since we claim that
(−1)^{log_g(h)} = (h/p).    (3.35)
Thus the Legendre symbol (h/p) determines whether log_g(h) is odd or even, and quadratic reciprocity gives a fast algorithm to compute the value of (h/p).
In order to prove (3.35), we note that while proving Proposition 3.61, we showed that g^r is a quadratic residue if r is even and that g^r is a quadratic nonresidue if r is odd. Taking r = log_g(h) gives (3.35). In fancier terminology, one says that the 0th bit of the discrete logarithm is insecure. See Exercise 3.40 for a generalization.
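We can see (3.35) in action with the numbers of Example 3.58, where discrete logs to the base g = 37 modulo 18443 were computed explicitly: log_g(211) = 8500 is even and log_g(2) = 5733 is odd, and a Legendre-symbol computation (here via Euler's criterion, computing h^{(p−1)/2} mod p) recovers exactly those parities. A sketch, with our own function name:

```python
p, g = 18443, 37

def legendre(h, p):
    """Legendre symbol via Euler's criterion: h^((p-1)/2) mod p is ±1."""
    ls = pow(h, (p - 1) // 2, p)
    return -1 if ls == p - 1 else ls

# log_37(211) = 8500 (even) and log_37(2) = 5733 (odd), from Example 3.58
print(legendre(211, p))   # → 1, so log_g(211) is even
print(legendre(2, p))     # → -1, so log_g(2) is odd
```

One bit of the discrete logarithm thus leaks for free, without solving the discrete logarithm problem itself.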
3.10 Probabilistic Encryption and the
Goldwasser–Micali Cryptosystem
Suppose that Alice wants to use a public key cryptosystem to encrypt and
send Bob 1 bit, i.e., Alice wants to send Bob one of the values 0 and 1. At
first glance such an arrangement seems inherently insecure. All that Eve has
to do is to encrypt the two possible plaintexts m = 0 and m = 1, and then
she compares the encryptions with Alice’s ciphertext. More generally, in any
cryptosystem for which the set of possible plaintexts is small, Eve can encrypt
every plaintext using Bob’s public key until she finds the one that is Alice’s.
Probabilistic encryption was invented by Goldwasser and Micali as a way
around this problem. The idea is that Alice chooses both a plaintext m and
a random string of data r, and then she uses Bob’s public key to encrypt the
pair (m, r). Ideally, as r varies over all of its possible values, the ciphertexts
for (m, r) will vary “randomly” over the possible ciphertexts. More precisely,
for any fixed m1 and m2 and for varying r, the distribution of values of the
two quantities
e(m1, r) = the ciphertext for plaintext m1 and random string r,
e(m2, r) = the ciphertext for plaintext m2 and random string r,
should be essentially indistinguishable. Note that it is not necessary that Bob
be able to recover the full pair (m, r) when he performs the decryption. He
needs to recover only the plaintext m.
This abstract idea is clear, but how might one create a probabilistic en-
cryption scheme in practice? Goldwasser and Micali describe one such scheme,
which, although impractical, since it encrypts only 1 bit at a time, has the
advantage of being quite simple to describe and analyze. The idea is based on
the difficulty of the following problem.
Let p and q be (secret) prime numbers and let N = pq be given. For a given integer a, determine whether a is a square modulo N, i.e., determine whether there exists an integer u satisfying u^2 ≡ a (mod N).
Note that Bob, who knows how to factor N = pq, is able to solve this problem very easily, since
a is a square modulo pq if and only if (a/p) = 1 and (a/q) = 1.
Eve, on the other hand, has a harder time, since she knows only the value of N. Eve can compute (a/N), but as we noted earlier (Remark 3.68), this does not tell her whether a is a square modulo N. Goldwasser and Micali exploit this fact to create the probabilistic public key cryptosystem described in Table 3.9.
Bob: Key creation. Choose secret primes p and q. Choose a with (a/p) = (a/q) = −1. Publish N = pq and a.
Alice: Encryption. Choose plaintext m ∈ {0, 1}. Choose random r with 1 < r < N. Use Bob's public key (N, a) to compute
c = r^2 mod N if m = 0, or c = ar^2 mod N if m = 1.
Send the ciphertext c to Bob.
Bob: Decryption. Compute (c/p). Decrypt to m = 0 if (c/p) = 1, and to m = 1 if (c/p) = −1.
Table 3.9: Goldwasser–Micali probabilistic public key cryptosystem
It is easy to check that the Goldwasser–Micali cryptosystem works as advertised, since

    (c/p) = (r^2/p)  = (r/p)^2 = 1                     if m = 0,
    (c/p) = (a·r^2/p) = (a/p)(r/p)^2 = (a/p) = −1      if m = 1.
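As a concrete illustration (my own sketch, not part of the text), the system in Table 3.9 fits in a few lines of Python, evaluating the Legendre symbol (c/p) by Euler's criterion c^((p−1)/2) mod p; the function names here are invented:

```python
import random
from math import gcd

def legendre(a, p):
    # Euler's criterion: a^((p-1)/2) mod p is 1, p-1, or 0;
    # report p-1 as the Legendre symbol value -1.
    s = pow(a, (p - 1) // 2, p)
    return -1 if s == p - 1 else s

def gm_encrypt(m, N, a):
    # Encrypt a single bit m in {0, 1} under Bob's public key (N, a).
    r = random.randrange(2, N)
    while gcd(r, N) != 1:          # avoid the (negligible) bad case gcd(r, N) > 1
        r = random.randrange(2, N)
    c = r * r % N
    return c if m == 0 else a * c % N

def gm_decrypt(c, p):
    # Bob decrypts with his secret prime p: m = 0 iff (c/p) = 1.
    return 0 if legendre(c, p) == 1 else 1

# Toy key from Example 3.70 (far too small for real security):
p, q, a = 2309, 5651, 6283665
N = p * q
assert all(gm_decrypt(gm_encrypt(m, N, a), p) == m for m in (0, 1, 1, 0))
```

Encrypting the same bit twice almost surely produces different ciphertexts, which is precisely the probabilistic behavior discussed above.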
Further, since Alice chooses r randomly, the set of values that Eve sees when Alice encrypts m = 0 consists of all possible squares modulo N, and the set of values that Eve sees when Alice encrypts m = 1 consists of all possible numbers c satisfying (c/N) = 1 that are not squares modulo N.
^12 Goldwasser and Micali were not the first to use the problem of squares modulo pq for cryptography. Indeed, an early public key cryptosystem due to Rabin that is provably secure against chosen plaintext attacks (assuming the hardness of factorization) relies on this problem.
What information does Eve obtain if she computes the Jacobi symbol (c/N), which she can do since N is a public quantity? If m = 0, then c ≡ r^2 (mod N), so

    (c/N) = (r^2/N) = (r/N)^2 = 1.
On the other hand, if m = 1, then c ≡ a·r^2 (mod N), so

    (c/N) = (a·r^2/N) = (a/N)(r/N)^2 = (a/N) = (a/pq) = (a/p)(a/q) = (−1) · (−1) = 1.

(Note that Bob chose a to satisfy (a/p) = (a/q) = −1.) Thus (c/N) is equal to 1, regardless of the value of m, so the Jacobi symbol gives Eve no useful information.
Example 3.70. Bob creates a Goldwasser–Micali public key by choosing
p = 2309, q = 5651, N = pq = 13048159, a = 6283665.
Note that a has the property that (a/p) = (a/q) = −1. He publishes the pair (N, a)
and keeps the values of the primes p and q secret.
Alice begins by sending Bob the plaintext bit m = 0. To do this, she
chooses r = 1642087 at random from the interval 1 to 13048158. She then
computes
    c ≡ r^2 ≡ 1642087^2 ≡ 8513742 (mod 13048159),
and sends the ciphertext c = 8513742 to Bob. Bob decrypts the ciphertext c = 8513742 by computing (8513742/2309) = 1, which gives the plaintext bit m = 0.
Next Alice decides to send Bob the plaintext bit m = 1. She chooses a
random value r = 11200984 and computes
    c ≡ a·r^2 ≡ 6283665 · 11200984^2 ≡ 2401627 (mod 13048159).
Bob decrypts c = 2401627 by computing (2401627/2309) = −1, which tells him that the plaintext bit m = 1.
Finally, Alice wants to send Bob another plaintext bit m = 1. She chooses
the random value r = 11442423 and computes
    c ≡ a·r^2 ≡ 6283665 · 11442423^2 ≡ 4099266 (mod 13048159).
Notice that the ciphertext for this encryption of m = 1 is completely unrelated to the previous encryption of m = 1. Bob decrypts c = 4099266 by computing (4099266/2309) = −1 to conclude that the plaintext bit is m = 1.
Remark 3.71. The Goldwasser–Micali public key cryptosystem is not practical, because each bit of the plaintext is encrypted with a number modulo N. For it to be secure, it is necessary that Eve be unable to factor the number N = pq, so in practice N will be (at least) a 1000-bit number. Thus if Alice wants to send k bits of plaintext to Bob, her ciphertext will be 1000k bits long.
Thus the Goldwasser–Micali public key cryptosystem has a message expansion ratio of 1000, since the ciphertext is 1000 times as long as the plaintext. In general, the Goldwasser–Micali public key cryptosystem expands a message by a factor of log2(N).
There are other probabilistic public key cryptosystems whose message expansion is much smaller. Indeed, we have already seen one: the random element k used by the Elgamal public key cryptosystem (Sect. 2.4) makes Elgamal a probabilistic cryptosystem. Elgamal has a message expansion ratio of 2, as explained in Remark 2.9. Later, in Sect. 7.10, we will see another probabilistic cryptosystem called NTRU. More generally, it is possible, and indeed usually desirable, to take a deterministic cryptosystem such as RSA and turn it into a probabilistic system, even at the cost of increasing its message expansion ratio. (See Exercise 3.43 and Sect. 8.6.)
Exercises
Section 3.1. Euler’s Theorem and Roots Modulo pq
3.1. Solve the following congruences.
(a) x^19 ≡ 36 (mod 97).
(b) x^137 ≡ 428 (mod 541).
(c) x^73 ≡ 614 (mod 1159).
(d) x^751 ≡ 677 (mod 8023).
(e) x^38993 ≡ 328047 (mod 401227). (Hint. 401227 = 607 · 661.)
3.2. This exercise investigates what happens if we drop the assumption that
gcd(e, p − 1) = 1 in Proposition 3.2. So let p be a prime, let c ≢ 0 (mod p), let
e ≥ 1, and consider the congruence
    x^e ≡ c (mod p).      (3.36)
(a) Prove that if (3.36) has one solution, then it has exactly gcd(e, p − 1) distinct solutions. (Hint. Use the primitive root theorem (Theorem 1.30), combined with the extended Euclidean algorithm (Theorem 1.11) or Exercise 1.27.)
(b) For how many non-zero values of c (mod p) does the congruence (3.36) have a
solution?
3.3. Let p and q be distinct primes and let e and d be positive integers satisfying
    de ≡ 1 (mod (p − 1)(q − 1)).
Suppose further that c is an integer with gcd(c, pq) > 1. Prove that
    x ≡ c^d (mod pq) is a solution to the congruence x^e ≡ c (mod pq),
thereby completing the proof of Proposition 3.5.
3.4. Recall from Sect. 1.3 that Euler's phi function φ(N) is the function defined by
    φ(N) = #{0 ≤ k < N : gcd(k, N) = 1}.
In other words, φ(N) is the number of integers between 0 and N − 1 that are relatively prime to N, or equivalently, the number of elements in Z/NZ that have inverses modulo N.
(a) Compute the values of φ(6), φ(9), φ(15), and φ(17).
(b) If p is prime, what is the value of φ(p)?
(c) Prove Euler's formula
    a^φ(N) ≡ 1 (mod N)   for all integers a satisfying gcd(a, N) = 1.
(Hint. Mimic the proof of Fermat's little theorem (Theorem 1.24), but instead of looking at all of the multiples of a as was done in (1.8), just take the multiples ka of a for values of k satisfying gcd(k, N) = 1.)
3.5. Euler’s phi function has many beautiful properties.
(a) If p and q are distinct primes, how is φ(pq) related to φ(p) and φ(q)?
(b) If p is prime, what is the value of φ(p^2)? How about φ(p^j)? Prove that your formula for φ(p^j) is correct. (Hint. Among the numbers between 0 and p^j − 1, remove the ones that have a factor of p. The ones that are left are relatively prime to p.)
(c) Let M and N be integers satisfying gcd(M, N) = 1. Prove the multiplication
formula
φ(MN) = φ(M)φ(N).
(d) Let p1, p2, . . . , pr be the distinct primes that divide N. Use your results from (b) and (c) to prove the following formula:
    φ(N) = N ∏_{i=1}^{r} (1 − 1/p_i).
(e) Use the formula in (d) to compute the following values of φ(N).
(i) φ(1728). (ii) φ(1575). (iii) φ(889056). (Hint. 889056 = 2^5 · 3^4 · 7^3.)
3.6. Let N, c, and e be positive integers satisfying the conditions gcd(N, c) = 1 and gcd(e, φ(N)) = 1.
(a) Explain how to solve the congruence
    x^e ≡ c (mod N),
assuming that you know the value of φ(N). (Hint. Use the formula in Exercise 3.4(c).)
(b) Solve the following congruences. (The formula in Exercise 3.5(d) may be helpful for computing the value of φ(N).)
(i) x^577 ≡ 60 (mod 1463).
(ii) x^959 ≡ 1583 (mod 1625).
(iii) x^133957 ≡ 224689 (mod 2134440).
Section 3.2. The RSA Public Key Cryptosystem
3.7. Alice publishes her RSA public key: modulus N = 2038667 and exponent
e = 103.
(a) Bob wants to send Alice the message m = 892383. What ciphertext does Bob
send to Alice?
(b) Alice knows that her modulus factors into a product of two primes, one of which
is p = 1301. Find a decryption exponent d for Alice.
(c) Alice receives the ciphertext c = 317730 from Bob. Decrypt the message.
3.8. Bob's RSA public key has modulus N = 12191 and exponent e = 37. Alice sends Bob the ciphertext c = 587. Unfortunately, Bob has chosen too small a modulus. Help Eve by factoring N and decrypting Alice's message. (Hint. N has a factor smaller than 100.)
3.9. For each of the given values of N = pq and (p − 1)(q − 1), use the method
described in Remark 3.11 to determine p and q.
(a) N = pq = 352717 and (p − 1)(q − 1) = 351520.
(b) N = pq = 77083921 and (p − 1)(q − 1) = 77066212.
(c) N = pq = 109404161 and (p − 1)(q − 1) = 109380612.
(d) N = pq = 172205490419 and (p − 1)(q − 1) = 172204660344.
3.10. A decryption exponent for an RSA public key (N, e) is an integer d with the property that a^{de} ≡ a (mod N) for all integers a that are relatively prime to N.
(a) Suppose that Eve has a magic box that creates decryption exponents for (N, e) for a fixed modulus N and for a large number of different encryption exponents e. Explain how Eve can use her magic box to try to factor N.
(b) Let N = 38749709. Eve's magic box tells her that the encryption exponent e = 10988423 has decryption exponent d = 16784693 and that the encryption exponent e = 25910155 has decryption exponent d = 11514115. Use this information to factor N.
(c) Let N = 225022969. Eve's magic box tells her the following three encryption/decryption pairs for N:
    (70583995, 4911157), (173111957, 7346999), (180311381, 29597249).
Use this information to factor N.
(d) Let N = 1291233941. Eve's magic box tells her the following three encryption/decryption pairs for N:
    (1103927639, 76923209), (1022313977, 106791263), (387632407, 7764043).
Use this information to factor N.
3.11. Here is an example of a public key system that was proposed at a cryptography
conference. It was designed to be more efficient than RSA.
Alice chooses two large primes p and q and she publishes N = pq. It is assumed
that N is hard to factor. Alice also chooses three random numbers g, r1, and r2
modulo N and computes
    g1 ≡ g^{r1(p−1)} (mod N)   and   g2 ≡ g^{r2(q−1)} (mod N).
Her public key is the triple (N, g1, g2) and her private key is the pair of primes (p, q).
Now Bob wants to send the message m to Alice, where m is a number modulo N.
He chooses two random integers s1 and s2 modulo N and computes
    c1 ≡ m·g1^{s1} (mod N)   and   c2 ≡ m·g2^{s2} (mod N).
Bob sends the ciphertext (c1, c2) to Alice.
Decryption is extremely fast and easy. Alice uses the Chinese remainder theorem
to solve the pair of congruences
x ≡ c1 (mod p) and x ≡ c2 (mod q).
(a) Prove that Alice’s solution x is equal to Bob’s plaintext m.
(b) Explain why this cryptosystem is not secure.
Section 3.3. Implementation and Security Issues
3.12. Formulate a man-in-the-middle attack, similar to the attack described in
Example 3.13 on page 126, for the following public key cryptosystems.
(a) The Elgamal public key cryptosystem (Table 2.3 on page 72).
(b) The RSA public key cryptosystem (Table 3.1 on page 123).
3.13. Alice decides to use RSA with the public key N = 1889570071. In order to
guard against transmission errors, Alice has Bob encrypt his message twice, once
using the encryption exponent e1 = 1021763679 and once using the encryption
exponent e2 = 519424709. Eve intercepts the two encrypted messages
c1 = 1244183534 and c2 = 732959706.
Assuming that Eve also knows N and the two encryption exponents e1 and e2, use
the method described in Example 3.15 to help Eve recover Bob’s plaintext without
finding a factorization of N.
Section 3.4. Primality Testing
3.14. We stated that the number 561 is a Carmichael number, but we never checked that a^561 ≡ a (mod 561) for every value of a.
(a) The number 561 factors as 3 · 11 · 17. First use Fermat's little theorem to prove that
    a^561 ≡ a (mod 3),   a^561 ≡ a (mod 11),   and   a^561 ≡ a (mod 17)
for every value of a. Then explain why these three congruences imply that a^561 ≡ a (mod 561) for every value of a.
(b) Mimic the idea used in (a) to prove that each of the following numbers is a
Carmichael number. (To assist you, we have factored each number into primes.)
(i) 1729 = 7 · 13 · 19
(ii) 10585 = 5 · 29 · 73
(iii) 75361 = 11 · 13 · 17 · 31
(iv) 1024651 = 19 · 199 · 271
(c) Prove that a Carmichael number must be odd.
(d) Prove that a Carmichael number must be a product of distinct primes.
(e) Look up Korselt’s criterion in a book or online, write a brief description of how it
works, and use it to show that 29341 = 13·37·61 and 172947529 = 307·613·919
are Carmichael numbers.
3.15. Use the Miller–Rabin test on each of the following numbers. In each case,
either provide a Miller–Rabin witness for the compositeness of n, or conclude that n
is probably prime by providing 10 numbers that are not Miller–Rabin witnesses
for n.
(a) n = 1105. (Yes, 5 divides n, but this is just a warm-up exercise!)
(b) n = 294409 (c) n = 294439
(d) n = 118901509 (e) n = 118901521
(f) n = 118901527 (g) n = 118915387
3.16. Looking back at Exercise 3.10, let’s suppose that for a given N, the magic box
can produce only one decryption exponent. Equivalently, suppose that an RSA key
pair has been compromised and that the private decryption exponent corresponding
to the public encryption exponent has been discovered. Show how the basic idea in
the Miller–Rabin primality test can be applied to use this information to factor N.
3.17. The function π(X) counts the number of primes between 2 and X.
(a) Compute the values of π(20), π(30), and π(100).
(b) Write a program to compute π(X) and use it to compute π(X) and the ratio
π(X)/(X/ ln(X)) for X = 100, X = 1000, X = 10000, and X = 100000. Does
your list of ratios make the prime number theorem plausible?
3.18. Let
π1(X) = (# of primes p between 2 and X satisfying p ≡ 1 (mod 4)),
π3(X) = (# of primes p between 2 and X satisfying p ≡ 3 (mod 4)).
Thus every prime other than 2 gets counted by either π1(X) or by π3(X).
(a) Compute the values of π1(X) and π3(X) for each of the following values of X.
(i) X = 10. (ii) X = 25. (iii) X = 100.
(b) Write a program to compute π1(X) and π3(X) and use it to compute their
values and the ratio π3(X)/π1(X) for X = 100, X = 1000, X = 10000, and
X = 100000.
(c) Based on your data from (b), make a conjecture about the relative sizes of π1(X)
and π3(X). Which one do you think is larger? What do you think is the limit
of the ratio π3(X)/π1(X) as X → ∞?
3.19. We noted in Sect. 3.4 that it really makes no sense to say that the number n
has probability 1/ ln(n) of being prime. Any particular number that you choose
either will be prime or will not be prime; there are no numbers that are 35 % prime
and 65 % composite! In this exercise you will prove a result that gives a more sensible
meaning to the statement that a number has a certain probability of being prime.
You may use the prime number theorem (Theorem 3.21) for this problem.
(a) Fix a (large) number N and suppose that Bob chooses a random number n in the interval (1/2)N ≤ n ≤ (3/2)N. If he repeats this process many times, prove that approximately 1/ln(N) of his numbers will be prime. More precisely, define
    P(N) = (number of primes between (1/2)N and (3/2)N) / (number of integers between (1/2)N and (3/2)N)
         = (Probability that an integer n in the interval (1/2)N ≤ n ≤ (3/2)N is a prime number),
and prove that
    lim_{N→∞} P(N) / (1/ln(N)) = 1.
This shows that if N is large, then P(N) is approximately 1/ln(N).
(b) More generally, fix two numbers c1 and c2 satisfying c2 > c1 > 0. Bob chooses random numbers n in the interval c1·N ≤ n ≤ c2·N. Keeping c1 and c2 fixed, let
    P(c1, c2; N) = (Probability that an integer n in the interval c1·N ≤ n ≤ c2·N is a prime number).
In the following formula, fill in the box with a simple function of N so that the statement is true:
    lim_{N→∞} P(c1, c2; N) / □ = 1.
3.20. Continuing with the previous exercise, explain how to make mathematical
sense of the following statements.
(a) A randomly chosen odd number N has probability 2/ ln(N) of being prime.
(What is the probability that a randomly chosen even number is prime?)
(b) A randomly chosen number N satisfying N ≡ 1 (mod 3) has probability
3/(2 ln(N)) of being prime.
(c) A randomly chosen number N satisfying N ≡ 1 (mod 6) has probability
3/ ln(N) of being prime.
(d) Let m = p1p2 · · · pr be a product of distinct primes and let k be a number
satisfying gcd(k, m) = 1. What number should go into the box to make state-
ment (3.37) correct? Why?
    A randomly chosen number N satisfying
    N ≡ k (mod m) has probability □/ln(N)
    of being prime.
    (3.37)
(e) Same question, but for arbitrary m, not just for m that are products of distinct
primes.
3.21. The logarithmic integral function Li(X) is defined to be
    Li(X) = ∫_2^X dt/ln t.
(a) Prove that
    Li(X) = X/ln X + ∫_2^X dt/(ln t)^2 + O(1).
(Hint. Integration by parts.)
(b) Compute the limit
    lim_{X→∞} Li(X) / (X/ln X).
(Hint. Break the integral in (a) into two pieces, 2 ≤ t ≤ √X and √X ≤ t ≤ X, and estimate each piece separately.)
(c) Use (b) to show that formula (3.12) on page 135 implies the prime number
theorem (Theorem 3.21).
Section 3.5. Pollard’s p − 1 Factorization Algorithm
3.22. Use Pollard’s p − 1 method to factor each of the following numbers.
(a) n = 1739 (b) n = 220459 (c) n = 48356747
Be sure to show your work and to indicate which prime factor p of n has the property
that p − 1 is a product of small primes.
3.23. A prime of the form 2^n − 1 is called a Mersenne prime.
(a) Factor each of the numbers 2^n − 1 for n = 2, 3, . . . , 10. Which ones are Mersenne primes?
(b) Find the first seven Mersenne primes. (You may need a computer.)
(c) If n is even and n > 2, prove that 2^n − 1 is not prime.
(d) If 3 | n and n > 3, prove that 2^n − 1 is not prime.
(e) More generally, prove that if n is a composite number, then 2^n − 1 is not prime. Thus all Mersenne primes have the form 2^p − 1 with p a prime number.
(f) What is the largest known Mersenne prime? Are there any larger primes known? (You can find out at the "Great Internet Mersenne Prime Search" web site www.mersenne.org/prime.htm.)
(g) Write a one page essay on Mersenne primes, starting with the discoveries of
Father Mersenne and ending with GIMPS.
Section 3.6. Factorization via Difference of Squares
3.24. For each of the following numbers N, compute the values of
    N + 1^2, N + 2^2, N + 3^2, N + 4^2, . . .
as we did in Example 3.34 until you find a value N + b^2 that is a perfect square a^2.
Then use the values of a and b to factor N.
(a) N = 53357 (b) N = 34571 (c) N = 25777 (d) N = 64213
3.25. For each of the listed values of N, k, and binit, factor N by making a list of values of k · N + b^2, starting at b = binit and incrementing b until k · N + b^2 is a perfect square. Then take greatest common divisors as we did in Example 3.35.
(a) N = 143041 k = 247 binit = 1
(b) N = 1226987 k = 3 binit = 36
(c) N = 2510839 k = 21 binit = 90
3.26. For each part, use the data provided to find values of a and b satisfying a^2 ≡ b^2 (mod N), and then compute gcd(N, a − b) in order to find a nontrivial factor of N, as we did in Examples 3.37 and 3.38.
(a) N = 61063
    1882^2 ≡ 270 (mod 61063)    and 270 = 2 · 3^3 · 5
    1898^2 ≡ 60750 (mod 61063)  and 60750 = 2 · 3^5 · 5^3
(b) N = 52907
    399^2 ≡ 480 (mod 52907)     and 480 = 2^5 · 3 · 5
    763^2 ≡ 192 (mod 52907)     and 192 = 2^6 · 3
    773^2 ≡ 15552 (mod 52907)   and 15552 = 2^6 · 3^5
    976^2 ≡ 250 (mod 52907)     and 250 = 2 · 5^3
(c) N = 198103
    1189^2 ≡ 27000 (mod 198103)  and 27000 = 2^3 · 3^3 · 5^3
    1605^2 ≡ 686 (mod 198103)    and 686 = 2 · 7^3
    2378^2 ≡ 108000 (mod 198103) and 108000 = 2^5 · 3^3 · 5^3
    2815^2 ≡ 105 (mod 198103)    and 105 = 3 · 5 · 7
(d) N = 2525891
    1591^2 ≡ 5390 (mod 2525891)    and 5390 = 2 · 5 · 7^2 · 11
    3182^2 ≡ 21560 (mod 2525891)   and 21560 = 2^3 · 5 · 7^2 · 11
    4773^2 ≡ 48510 (mod 2525891)   and 48510 = 2 · 3^2 · 5 · 7^2 · 11
    5275^2 ≡ 40824 (mod 2525891)   and 40824 = 2^3 · 3^6 · 7
    5401^2 ≡ 1386000 (mod 2525891) and 1386000 = 2^4 · 3^2 · 5^3 · 7 · 11
Section 3.7. Smooth Numbers, Sieves, and Building Relations for Factorization
3.27. Compute the following values of ψ(X, B), the number of B-smooth numbers
between 2 and X (see page 150).
(a) ψ(25, 3) (b) ψ(35, 5) (c) ψ(50, 7) (d) ψ(100, 5) (e) ψ(100, 7)
3.28. An integer M is called B-power-smooth if every prime power p^e dividing M satisfies p^e ≤ B. For example, 180 = 2^2 · 3^2 · 5 is 10-power-smooth, since the largest prime power dividing 180 is 9, which is smaller than 10.
(a) Suppose that M is B-power-smooth. Prove that M is also B-smooth.
(b) Suppose that M is B-smooth. Is it always true that M is also B-power-smooth?
Either prove that it is true or give an example for which it is not true.
(c) The following is a list of 20 randomly chosen numbers between 1 and 1000,
sorted from smallest to largest. Which of these numbers are 10-power-smooth?
Which of them are 10-smooth?
{84, 141, 171, 208, 224, 318, 325, 366, 378, 390, 420, 440,
504, 530, 707, 726, 758, 765, 792, 817}
(d) Prove that M is B-power-smooth if and only if M divides the least common multiple of 1, 2, . . . , B. (The least common multiple of a list of numbers k1, . . . , kr is the smallest number K that is divisible by every number in the list.)
3.29. Let L(N) = e^√((ln N)(ln ln N)) as usual. Suppose that a computer does one billion operations per second.
(a) How many seconds does it take to perform L(2^100) operations?
(b) How many hours does it take to perform L(2^250) operations?
(c) How many days does it take to perform L(2^350) operations?
(d) How many years does it take to perform L(2^500) operations?
(e) How many years does it take to perform L(2^750) operations?
(f) How many years does it take to perform L(2^1000) operations?
(g) How many years does it take to perform L(2^2000) operations?
(For simplicity, you may assume that there are 365.25 days in a year.)
3.30. Prove that the function L(X) = e^√((ln X)(ln ln X)) is subexponential. That is, prove the following two statements.
(a) For every positive constant α, no matter how large, L(X) = Ω((ln X)^α).
(b) For every positive constant β, no matter how small, L(X) = O(X^β).
3.31. For any fixed positive constants a and b, define the function
    F_{a,b}(X) = e^{(ln X)^{1/a} (ln ln X)^{1/b}}.
Prove the following properties of F_{a,b}(X).
(a) If a > 1, prove that F_{a,b}(X) is subexponential.
(b) If a = 1, prove that F_{a,b}(X) = Ω(X^α) for every α > 0. Thus F_{a,b}(X) grows faster than every exponential function, so one says that F_{a,b}(X) has superexponential growth.
(c) What happens if a < 1?
3.32. This exercise asks you to verify an assertion in the proof of Corollary 3.45. Let L(X) be the usual function L(X) = e^√((ln X)(ln ln X)).
(a) Prove that there is a value of ε > 0 such that
    (ln X)^ε < ln L(X) < (ln X)^{1−ε}   for all X > 10.
(b) Let c > 0, let Y = L(X)^c, and let u = (ln X)/(ln Y). Prove that
    u^{−u} = L(X)^{−(1/(2c))(1+o(1))}.
3.33. Proposition 3.48 assumes that we choose random numbers a modulo N, compute a^2 (mod N), and check whether the result is B-smooth. We can achieve better results if we take values for a of the form
    a = ⌊√N⌋ + k   for 1 ≤ k ≤ K.
(For simplicity, you may treat K as a fixed integer, independent of N. More rigor-
ously, it is necessary to take K equal to a power of L(N), which has a small effect
on the final answer.)
(a) Prove that a^2 − N ≤ 2K√N + K^2, so in particular, a^2 (mod N) is smaller than a multiple of √N.
(b) Prove that L(√N) ≈ L(N)^{1/√2} by showing that
    lim_{N→∞} (log L(√N)) / (log L(N)^{1/√2}) = 1.
More generally, prove that in the same sense, L(N^{1/r}) ≈ L(N)^{1/√r} for any fixed r > 0.
(c) Re-prove Proposition 3.48 using this better choice of values for a. Set B = L(N)^c and find the optimal value of c. Approximately how many relations are needed to factor N?
3.34. Illustrate the quadratic sieve, as was done in Fig. 3.3 (page 161), by sieving prime powers up to B on the values of F(T) = T^2 − N in the indicated range.
(a) Sieve N = 493 using prime powers up to B = 11 on values from F(23) to F(38). Use the relation(s) that you find to factor N.
Use the relation(s) that you find to factor N.
(b) Extend the computations in (a) by using prime powers up to B = 16 and
sieving values from F(23) to F(50). What additional value(s) are sieved down
to 1 and what additional relation(s) do they yield?
3.35. Let Z[β] be the ring described in Example 3.55, i.e., β is a root of f(x) = 1 + 3x − 2x^3 + x^4. For each of the following pairs of elements u, v ∈ Z[β], compute the sum u + v and the product uv. Your answers should involve only powers of β up to β^3.
(a) u = −5 − 2β + 9β^2 − 9β^3 and v = 2 + 9β − 7β^2 + 7β^3.
(b) u = 9 + 9β + 6β^2 − 5β^3 and v = −4 − 6β − 2β^2 − 5β^3.
(c) u = 6 − 5β + 3β^2 + 3β^3 and v = −2 + 7β + 6β^2.
Section 3.8. The Index Calculus and Discrete Logarithms
3.36. This exercise asks you to use the index calculus to solve a discrete logarithm problem. Let p = 19079 and g = 17.
(a) Verify that g^i (mod p) is 5-smooth for each of the values i = 3030, i = 6892, and i = 18312.
(b) Use your computations in (a) and linear algebra to compute the discrete logarithms log_g(2), log_g(3), and log_g(5). (Note that 19078 = 2 · 9539 and that 9539 is prime.)
(c) Verify that 19 · 17^{−12400} (mod p) is 5-smooth.
(d) Use the values from (b) and the computation in (c) to solve the discrete logarithm problem
    17^x ≡ 19 (mod 19079).
Section 3.9. Quadratic Residues and Quadratic Reciprocity
3.37. Let p be an odd prime and let a be an integer with p ∤ a.
(a) Prove that a^{(p−1)/2} is congruent to either 1 or −1 modulo p.
(b) Prove that a^{(p−1)/2} is congruent to 1 modulo p if and only if a is a quadratic residue modulo p. (Hint. Let g be a primitive root for p and use the fact, proven during the course of proving Proposition 3.61, that g^m is a quadratic residue if and only if m is even.)
(c) Prove that a^{(p−1)/2} ≡ (a/p) (mod p). (This holds even if p | a.)
(d) Use (c) to prove Theorem 3.62(a), that is, prove that
    (−1/p) = 1 if p ≡ 1 (mod 4),   and   (−1/p) = −1 if p ≡ 3 (mod 4).
3.38. Prove that the three parts of the quadratic reciprocity theorem (Theorem 3.62) are equivalent to the following three concise formulas, where p and q are odd primes:
(a) (−1/p) = (−1)^{(p−1)/2}
(b) (2/p) = (−1)^{(p^2−1)/8}
(c) (p/q)(q/p) = (−1)^{((p−1)/2)·((q−1)/2)}
3.39. Let p be a prime satisfying p ≡ 3 (mod 4).
(a) Let a be a quadratic residue modulo p. Prove that the number
    b ≡ a^{(p+1)/4} (mod p)
has the property that b^2 ≡ a (mod p). (Hint. Write (p+1)/2 as 1 + (p−1)/2 and use Exercise 3.37.) This gives an easy way to take square roots modulo p for primes that are congruent to 3 modulo 4.
(b) Use (a) to compute the following square roots modulo p. Be sure to check your answers.
(i) Solve b^2 ≡ 116 (mod 587).
(ii) Solve b^2 ≡ 3217 (mod 8627).
(iii) Solve b^2 ≡ 9109 (mod 10663).
3.40. Let p be an odd prime, let g ∈ F*_p be a primitive root, and let h ∈ F*_p. Write p − 1 = 2^s·m with m odd and s ≥ 1, and write the binary expansion of log_g(h) as
    log_g(h) = ε_0 + 2ε_1 + 4ε_2 + 8ε_3 + · · ·   with ε_0, ε_1, . . . ∈ {0, 1}.
Give an algorithm that generalizes Example 3.69 and allows you to rapidly compute ε_0, ε_1, . . . , ε_{s−1}, thereby proving that the first s bits of the discrete logarithm are insecure. You may assume that you have a fast algorithm to compute square roots in F*_p, as provided for example by Exercise 3.39(a) if p ≡ 3 (mod 4). (Hint. Use Example 3.69 to compute the 0th bit, take the square root of either h or g^{−1}h, and repeat.)
3.41. Let p be a prime satisfying p ≡ 1 (mod 3). We say that a is a cubic residue modulo p if p ∤ a and there is an integer c satisfying a ≡ c^3 (mod p).
(a) Let a and b be cubic residues modulo p. Prove that ab is a cubic residue modulo p.
(b) Give an example to show that (unlike the case with quadratic residues) it is
possible for none of a, b, and ab to be a cubic residue modulo p.
(c) Let g be a primitive root modulo p. Prove that a is a cubic residue modulo p
if and only if 3 | logg(a), where logg(a) is the discrete logarithm of a to the
base g.
(d) Suppose instead that p ≡ 2 (mod 3). Prove that for every integer a there is an integer c satisfying a ≡ c^3 (mod p). In other words, if p ≡ 2 (mod 3), show that every number is a cube modulo p.
Section 3.10. Probabilistic Encryption and the Goldwasser–Micali Cryptosystem
3.42. Perform the following encryptions and decryptions using the Goldwasser–
Micali public key cryptosystem (Table 3.9).
(a) Bob's public key is the pair N = 1842338473 and a = 1532411781. Alice encrypts 3 bits and sends Bob the ciphertext blocks
1794677960, 525734818, and 420526487.
Decrypt Alice’s message using the factorization
N = pq = 32411 · 56843.
(b) Bob’s public key is N = 3149 and a = 2013. Alice encrypts 3 bits and sends
Bob the ciphertext blocks 2322, 719, and 202. Unfortunately, Bob used primes
that are much too small. Factor N and decrypt Alice’s message.
(c) Bob’s public key is N = 781044643 and a = 568980706. Encrypt the
3 bits 1, 1, 0 using, respectively, the three random values
r = 705130839, r = 631364468, r = 67651321.
3.43. Suppose that the plaintext space M of a certain cryptosystem is the set of bit strings of length 2b. Let e_k and d_k be the encryption and decryption functions associated with a key k ∈ K. This exercise describes one method of turning the original cryptosystem into a probabilistic cryptosystem. Most practical cryptosystems that are currently in use rely on more complicated variants of this idea in order to thwart certain types of attacks. (See Sect. 8.6 for further details.)
Alice sends Bob an encrypted message by performing the following steps:
1. Alice chooses a b-bit message m′ to be encrypted.
2. Alice chooses a string r consisting of b random bits.
3. Alice sets m = r ∥ (r ⊕ m′), where ∥ denotes concatenation^13 and ⊕ denotes exclusive or (see Sect. 1.7.4). Notice that m has length 2b bits.
4. Alice computes c = e_k(m) and sends the ciphertext c to Bob.
(a) Explain how Bob decrypts Alice's message and recovers the plaintext m′. We assume, of course, that Bob knows the decryption function d_k.
(b) If the plaintexts and the ciphertexts of the original cryptosystem have the
same length, what is the message expansion ratio of the new probabilistic
cryptosystem?
(c) More generally, if the original cryptosystem has a message expansion ratio of μ,
what is the message expansion ratio of the new probabilistic cryptosystem?
^13 The concatenation of 2 bit strings is formed by placing the first string before the second string. For example, 1101 ∥ 1001 is the bit string 11011001.
Chapter 4
Digital Signatures
4.1 What Is a Digital Signature?
Encryption schemes, whether symmetric or asymmetric, solve the problem
of secure communications over an insecure network. Digital signatures solve
a different problem, analogous to the purpose of a pen-and-ink signature on
a physical document. It is thus interesting that the tools used to construct
digital signatures are very similar to the tools used to construct asymmetric
ciphers.
Here is the exact problem that a digital signature is supposed to solve. Samantha^1 has a (digital) document D, for example a computer file, and she wants to create some additional piece of information D^Sam that can be used to prove conclusively that Samantha herself approves of the document. So you might view Samantha's digital signature D^Sam as analogous to her actual signature on an ordinary paper document.
To contrast the purpose and functionality of public key (asymmetric) cryptosystems versus digital signatures, we consider an analogy using bank deposit vaults and signet rings. A bank deposit vault has a narrow slot (the "public encryption key") into which anyone can deposit an envelope, but only the owner of the combination (the "private decryption key") to the vault's lock is able to open the vault and read the message. Thus a public key cryptosystem is a digital version of a bank deposit vault. A signet ring (the "private signing key") is a ring that has a recessed image. The owner drips some wax from a candle onto his document and presses the ring into the wax to make an impression (the "public signature"). Anyone who looks at the document can verify that the wax impression was made by the owner of the signet ring, but
^1 In this chapter we give Alice and Bob a well deserved rest and let Samantha, the signer, and Victor, the verifier, take over cryptographic duties.
© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography,
Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2 4
[Figure 4.1 (diagram): a Signing Algorithm takes the digital document D to be signed together with the private key K^Pri and produces the digital signature D^sig; a Verification Algorithm takes the document D, the signature D^sig, and the public key K^Pub, and returns TRUE if D signed by K^Pri is D^sig, and FALSE otherwise.]
Figure 4.1: The two components of a digital signature scheme
only the owner of the ring is able to create valid impressions.^2 Thus one may view a digital signature system as a modern version of a signet ring.
Despite their different purposes, digital signature schemes are similar to
asymmetric cryptosystems in that they involve public and private keys and
invoke algorithms that use these keys. Here is an abstract description of the
pieces that make up a digital signature scheme:
K^Pri    A private signing key.
K^Pub    A public verification key.
Sign     A signing algorithm that takes as input a digital document D and a private key K^Pri and returns a signature D^sig for D.
Verify   A verification algorithm that takes as input a digital document D, a signature D^sig, and a public key K^Pub. The algorithm returns True if D^sig is a signature for D associated to the private key K^Pri, and otherwise it returns False.
The operation of a digital signature scheme is depicted in Fig. 4.1. An important point to observe in Fig. 4.1 is that the verification algorithm does not know the private key K^Pri when it determines whether D signed by K^Pri is equal to D^sig. The verification algorithm has access only to the public key K^Pub.
It is not difficult to produce (useless) algorithms that satisfy the digital signature properties. For example, let K^Pub = K^Pri. What is difficult is to
^2 Back in the days when interior illumination was by candlelight, sealing documents with signet rings was a common way to create unforgeable signatures. In today's world, with its plentiful machine tools, signet rings and wax images obviously would not provide much security.
4.1. What Is a Digital Signature? 195
create a digital signature scheme in which the owner of the private key K^Pri is able to create valid signatures, but knowledge of the public key K^Pub does not reveal the private key K^Pri. Necessary general conditions for a secure digital signature scheme include the following:
• Given K^Pub, an attacker cannot feasibly determine K^Pri, nor can she determine any other private key that produces the same signatures as K^Pri.
• Given K^Pub and a list of signed documents D_1, . . . , D_n with their signatures D^sig_1, . . . , D^sig_n, an attacker cannot feasibly determine a valid signature on any document D that is not in the list D_1, . . . , D_n.
The second condition is rather different from the situation for encryption
schemes. In public key encryption, an attacker can create as many cipher-
text/plaintext pairs as she wants, since she can create them using the known
public key. However, each time a digital signature scheme is used to sign a new document, it reveals a new document/signature pair, which provides
new information to an attacker. The second condition says that the attacker
gains nothing beyond knowledge of that new pair. An attack on a digital sig-
nature scheme that makes use of a large number of known signatures is called
a transcript attack. (See Sect. 7.12 for further discussion.)
Remark 4.1. Digital signatures are at least as important as public key cryptosystems for the conduct of business in a digital age, and indeed, one might argue that they are of greater importance. To take a significant instance, your computer undoubtedly receives program and system upgrades over the Internet. How can your computer tell that an upgrade comes from a legitimate
source, in this case the company that wrote the program in the first place?
The answer is a digital signature. The original program comes equipped with
the company’s public verification key. The company uses its private signing
key to sign the upgrade and sends your computer both the new program and
the signature. Your computer can use the public key to verify the signature,
thereby verifying that the program comes from a trusted source, before installing it on your system.
We must stress, however, that although this conveys the idea of how a digital signature might be used, it is a vastly oversimplified explanation. Real-world applications of digital signature schemes require considerable care to
avoid a variety of subtle, but fatal, security problems. In particular, as digital
signatures proliferate, it can become problematic to be sure that a purported
public verification key actually belongs to the supposed owner. And clearly
an adversary who tricks you into using her verification key, instead of the real
one, will then be able to convince you to accept all of her forged documents.
Remark 4.2. The natural capability of most digital signature schemes is to
sign only a small amount of data, say b bits, where b is between 80 and 1000.
It is thus quite inefficient to sign a large digital document D, both because it
takes a lot of time to sign each b bits of D and because the resulting digital
signature is likely to be as large as the original document.
The standard solution to this problem is to use a hash function, which is an easily computable function

Hash : (arbitrary size documents) → {0, 1}^k

that is very hard to invert. (More generally, one wants it to be very difficult to find two distinct inputs D and D′ whose outputs Hash(D) and Hash(D′) are the same.) Then, rather than signing her document D, Samantha instead computes and signs the hash Hash(D). For verification, Victor computes and verifies the signature on Hash(D).
There are also security advantages to signing a hash of D, including intrinsically linking the signature to the entire document, and preventing an adversary from choosing random signatures and determining which documents they sign. For a brief introduction to hash functions and references for further reading, see Sect. 8.1. We will not concern ourselves further with such issues in this chapter.
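For readers who want to experiment, the hash-then-sign idea can be sketched in a few lines of Python. Here SHA-256 (from the standard hashlib module) stands in for Hash, the signing step uses the RSA-style exponentiation described in Sect. 4.2, and hash_to_int is a hypothetical helper introduced only for this sketch; real systems use standardized padding (e.g. RSASSA-PSS) rather than a bare modular reduction, and primes of this size offer no actual security.

```python
import hashlib

def hash_to_int(document: bytes, modulus: int) -> int:
    # Hypothetical helper: hash an arbitrary-size document, then reduce
    # mod N so the result can be signed as a single number.
    digest = hashlib.sha256(document).digest()
    return int.from_bytes(digest, "big") % modulus

# Toy RSA signing parameters (far too small for real use).
p, q = 1223, 1987
N = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private signing exponent

D = b"Samantha approves this contract."

# Samantha signs Hash(D) instead of the (possibly huge) document D.
S = pow(hash_to_int(D, N), d, N)

# Victor recomputes Hash(D) and compares it with S^e mod N.
assert pow(S, e, N) == hash_to_int(D, N)
```

Note that Victor never sees the signing exponent d; he recomputes the hash himself and checks it against the verified value S^e mod N.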
Remark 4.3. There are many variants of the basic digital signature paradigm.
For example, a blinded signature is one in which the signer does not know
the contents of the document being signed. This could be useful, for example,
if voters want an election official to sign their votes without revealing what
those votes are. Further material on blinded signatures, with an RSA-style
example and applications to digital cash systems, is given in Sect. 8.8.
In this chapter we discuss digital signature schemes whose underlying hard problems are integer factorization and the discrete logarithm problem in F_p^*. Subsequent chapters include descriptions of digital signature schemes based on the discrete logarithm problem in elliptic curve groups (Sect. 6.4.3) and on hard lattice problems (Sect. 7.12).
4.2 RSA Digital Signatures
The original RSA paper described both the RSA encryption scheme and an RSA digital signature scheme. The idea is very simple. The setup is the same as for RSA encryption: Samantha chooses two large secret primes p and q, and she publishes their product N = pq and a public verification exponent e. Samantha uses her knowledge of the factorization of N to solve the congruence

de ≡ 1 (mod (p − 1)(q − 1)).    (4.1)
Note that if Samantha were doing RSA encryption, then e would be her
encryption exponent and d would be her decryption exponent. However, in
the present setup d is her signing exponent and e is her verification exponent.
In order to sign a digital document D, which we assume to be an integer in the range 1 ≤ D < N, Samantha computes

S ≡ D^d (mod N).
  Samantha: Key creation. Choose secret primes p and q. Choose verification exponent e with gcd(e, (p − 1)(q − 1)) = 1. Publish N = pq and e.
  Samantha: Signing. Compute d satisfying de ≡ 1 (mod (p − 1)(q − 1)). Sign document D by computing S ≡ D^d (mod N).
  Victor: Verification. Compute S^e mod N and verify that it is equal to D.

Table 4.1: RSA digital signatures
Victor verifies the validity of the signature S on D by computing S^e mod N and checking that it is equal to D. This process works because Euler's formula (Theorem 3.1) tells us that

S^e ≡ D^{de} ≡ D (mod N).

The RSA digital signature scheme is summarized in Table 4.1.
If Eve can factor N, then she can solve (4.1) for Samantha's secret signing key d. However, just as with RSA encryption, the hard problem underlying RSA digital signatures is not directly the problem of factorization. In order to forge a signature on a document D, Eve needs to find an eth root of D modulo N. This is identical to the hard problem underlying RSA decryption, in which the plaintext is the eth root of the ciphertext.
Remark 4.4. As with RSA encryption, one can gain a bit of efficiency by choosing d and e to satisfy

de ≡ 1 (mod (p − 1)(q − 1) / gcd(p − 1, q − 1)).

Theorem 3.1 ensures that the verification step still works.
Example 4.5. We illustrate the RSA digital signature scheme with a small
numerical example.
RSA Signature Key Creation
• Samantha chooses two secret primes p = 1223 and q = 1987 and computes her public modulus

    N = p · q = 1223 · 1987 = 2430101.

• Samantha chooses a public verification exponent e = 948047 with the property that

    gcd(e, (p − 1)(q − 1)) = gcd(948047, 2426892) = 1.
RSA Signing
• Samantha computes her private signing key d using the secret values of p and q to compute (p − 1)(q − 1) = 1222 · 1986 = 2426892 and then solving the congruence

    ed ≡ 1 (mod (p − 1)(q − 1)),    948047 · d ≡ 1 (mod 2426892).

  She finds that d = 1051235.
• Samantha selects a digital document to sign,

    D = 1070777 with 1 ≤ D < N.

  She computes the digital signature

    S ≡ D^d (mod N),    S ≡ 1070777^1051235 ≡ 153337 (mod 2430101).

• Samantha publishes the document and signature

    D = 1070777 and S = 153337.
RSA Verification
• Victor uses Samantha's public modulus N and verification exponent e to compute

    S^e mod N,    153337^948047 ≡ 1070777 (mod 2430101).

  He verifies that the value of S^e modulo N is the same as the value of the digital document D = 1070777.
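Example 4.5 is easy to reproduce in a few lines of Python; this is an illustrative sketch only, since primes of this size offer no real security. The three-argument pow performs modular exponentiation, and pow(e, -1, m) (Python 3.8+) computes the modular inverse.

```python
# RSA digital signatures, following Example 4.5's numbers.
p, q = 1223, 1987
N = p * q                        # public modulus 2430101
e = 948047                       # public verification exponent

# Samantha's signing key: the inverse of e modulo (p-1)(q-1).
d = pow(e, -1, (p - 1) * (q - 1))
assert d == 1051235              # the value found in Example 4.5

D = 1070777                      # document, with 1 <= D < N
S = pow(D, d, N)                 # signature S = D^d mod N
assert S == 153337

# Victor's check uses only the public pair (N, e).
assert pow(S, e, N) == D
```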
4.3 Elgamal Digital Signatures and DSA
The transition from RSA encryption to RSA digital signatures, as described
in Sect. 4.2, is quite straightforward. This is not true for discrete logarithm
based encryption schemes such as Elgamal (Sect. 2.4).
An Elgamal-style digital signature scheme was put forward in 1985, and a
modified version called the Digital Signature Algorithm (DSA), which allows
shorter signatures, was proposed in 1991 and officially published as a national
Digital Signature Standard (DSS) in 1994; see [98]. We start with the Elgamal
scheme, which is easier to understand, and then explain how DSA works.
Samantha, or some trusted third party, chooses a large prime p and a primitive root g modulo p. Samantha next chooses a secret signing exponent a and computes

A ≡ g^a (mod p).

The quantity A, together with the public parameters p and g, forms Samantha's public verification key.
Suppose now that Samantha wants to sign a digital document D, where D is an integer satisfying 1 ≤ D < p. She chooses a random element 1 < k < p satisfying gcd(k, p − 1) = 1 and computes the two quantities

S1 ≡ g^k (mod p)  and  S2 ≡ (D − aS1) k^{−1} (mod p − 1).    (4.2)

Notice that S2 is computed modulo p − 1, not modulo p. Samantha's digital signature on the document D is the pair (S1, S2).
Victor verifies the signature by checking that

A^{S1} · S1^{S2} mod p  is equal to  g^D mod p.

The Elgamal digital signature algorithm is illustrated in Table 4.2.
Why does Elgamal work? When Victor computes A^{S1} · S1^{S2}, he is actually computing

A^{S1} · S1^{S2} ≡ g^{aS1} · g^{kS2} ≡ g^{aS1 + kS2} ≡ g^{aS1 + k(D − aS1)k^{−1}} ≡ g^{aS1 + (D − aS1)} ≡ g^D (mod p),

so verification returns TRUE for a valid signature.
Notice the significance of choosing S2 modulo p − 1. The quantity S2 appears as an exponent of g, and we know that g^{p−1} ≡ 1 (mod p), so in the expression g^{S2} mod p, we may replace S2 by any quantity that is congruent to S2 modulo p − 1.
If Eve knows how to solve the discrete logarithm problem, then she can solve g^a ≡ A (mod p) for Samantha's private signing key a, and thence can forge Samantha's signature. However, it is not at all clear that this is the only way to forge an Elgamal signature. Eve's task is as follows. Given the values of A and g^D, Eve must find integers x and y satisfying

A^x · x^y ≡ g^D (mod p).    (4.3)

The congruence (4.3) is a rather curious one, because the variable x appears as both a base and an exponent. Using discrete logarithms to the base g, we can rewrite (4.3) as

log_g(A) · x + y · log_g(x) ≡ D (mod p − 1).    (4.4)
Public parameter creation: A trusted party chooses and publishes a large prime p and a primitive root g modulo p.

  Samantha: Key creation. Choose secret signing key 1 ≤ a ≤ p − 1. Compute A ≡ g^a (mod p). Publish the verification key A.
  Samantha: Signing. Choose document D mod p. Choose random element 1 < k < p satisfying gcd(k, p − 1) = 1. Compute signature S1 ≡ g^k (mod p) and S2 ≡ (D − aS1) k^{−1} (mod p − 1).
  Victor: Verification. Compute A^{S1} · S1^{S2} mod p. Verify that it is equal to g^D mod p.

Table 4.2: The Elgamal digital signature algorithm
If Eve can solve the discrete logarithm problem, she can take an arbitrary value for x, compute log_g(A) and log_g(x), and then solve (4.4) for y. At present, this is the only known method for finding a solution to (4.4).
Remark 4.6. There are many subtleties associated to using an ostensibly secure digital signature scheme such as Elgamal. See Exercises 4.7 and 4.8 for some examples of what can go wrong.
Example 4.7. Samantha chooses the prime p = 21739 and primitive root g = 7. She selects the secret signing key a = 15140 and computes her public verification key

A ≡ g^a ≡ 7^15140 ≡ 17702 (mod 21739).

She signs the digital document D = 5331 using the random element k = 10727 by computing

S1 ≡ g^k ≡ 7^10727 ≡ 15775 (mod 21739),
S2 ≡ (D − aS1) k^{−1} ≡ (5331 − 15140 · 15775) · 6353 ≡ 791 (mod 21738).
Samantha publishes the signature (S1, S2) = (15775, 791) and the digital document D = 5331. Victor verifies the signature by computing

A^{S1} · S1^{S2} ≡ 17702^15775 · 15775^791 ≡ 13897 (mod 21739)

and verifying that it agrees with

g^D ≡ 7^5331 ≡ 13897 (mod 21739).
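Example 4.7 can likewise be checked with a short Python sketch (again, parameters of this size offer no real security, and a real implementation would draw k from a cryptographically secure random source):

```python
# Elgamal signatures, following Example 4.7's numbers.
from math import gcd

p, g = 21739, 7
a = 15140                        # Samantha's secret signing key
A = pow(g, a, p)                 # public verification key
assert A == 17702

D, k = 5331, 10727               # document and random element
assert gcd(k, p - 1) == 1        # k must be invertible modulo p - 1

S1 = pow(g, k, p)
S2 = (D - a * S1) * pow(k, -1, p - 1) % (p - 1)
assert (S1, S2) == (15775, 791)

# Victor's check: A^S1 * S1^S2 should equal g^D (mod p).
assert pow(A, S1, p) * pow(S1, S2, p) % p == pow(g, D, p)
```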
An Elgamal signature (S1, S2) consists of one number modulo p and one number modulo p − 1, so has length approximately 2 log_2(p) bits. In order to be secure against index calculus attacks on the discrete logarithm problem, the prime p is generally taken to be between 1000 and 2000 bits, so signatures are between 2000 and 4000 bits.
The Digital Signature Algorithm (DSA) significantly shortens the signature by working in a subgroup of F_p^* of prime order q. The underlying assumption is that using the index calculus to solve the discrete logarithm problem in the subgroup is no easier than solving it in F_p^*. So it suffices to take a subgroup in which it is infeasible to solve the discrete logarithm problem using a collision algorithm. We now describe the details of DSA.
Samantha, or some trusted third party, chooses two primes p and q with

p ≡ 1 (mod q).

(In practice, typical choices satisfy 2^1000 < p < 2^2000 and 2^160 < q < 2^320.) She also chooses an element g ∈ F_p^* of exact order q. This is easy to do. For example, she can take

g = g_1^{(p−1)/q} for a primitive root g_1 in F_p.

Samantha chooses a secret exponent a and computes

A ≡ g^a (mod p).

The quantity A, together with the public parameters (p, q, g), forms Samantha's public verification key.
Suppose now that Samantha wants to sign a digital document D, where D is an integer satisfying 1 ≤ D < q. She chooses a random element k in the range 1 < k < q and computes the two quantities

S1 = (g^k mod p) mod q  and  S2 ≡ (D + aS1) k^{−1} (mod q).    (4.5)

Notice the similarity between (4.5) and the Elgamal signature (4.2). However, there is an important difference, since when computing S1 in (4.5), Samantha first computes g^k mod p as an integer in the range from 1 to p − 1, and then she reduces modulo q to obtain an integer in the range from 1 to q − 1. Samantha's digital signature on the document D is the pair (S1, S2), so the signature consists of two numbers modulo q.
Victor verifies the signature by first computing

V1 ≡ D · S2^{−1} (mod q)  and  V2 ≡ S1 · S2^{−1} (mod q).

He then checks that

(g^{V1} · A^{V2} mod p) mod q  is equal to  S1.

The digital signature algorithm (DSA) is illustrated in Table 4.3.

Public parameter creation: A trusted party chooses and publishes large primes p and q satisfying p ≡ 1 (mod q) and an element g of order q modulo p.

  Samantha: Key creation. Choose secret signing key 1 ≤ a ≤ q − 1. Compute A ≡ g^a (mod p). Publish the verification key A.
  Samantha: Signing. Choose document D mod q. Choose random element 1 < k < q. Compute signature S1 = (g^k mod p) mod q and S2 ≡ (D + aS1) k^{−1} (mod q).
  Victor: Verification. Compute V1 ≡ D · S2^{−1} (mod q) and V2 ≡ S1 · S2^{−1} (mod q). Verify that (g^{V1} · A^{V2} mod p) mod q = S1.

Table 4.3: The digital signature algorithm (DSA)
DSA seems somewhat complicated, but it is easy to check that it works. Thus Victor computes

g^{V1} · A^{V2} ≡ g^{D·S2^{−1}} · g^{a·S1·S2^{−1}} (mod p)   since V1 ≡ D·S2^{−1}, V2 ≡ S1·S2^{−1}, and A ≡ g^a,
             ≡ g^{(D + aS1)·S2^{−1}} (mod p)
             ≡ g^k (mod p)   since S2 ≡ (D + aS1)·k^{−1} (mod q).

Hence

(g^{V1} · A^{V2} mod p) mod q = (g^k mod p) mod q = S1.
Example 4.8. We illustrate DSA with a small numerical example. Samantha uses the public parameters

p = 48731, q = 443, and g = 5260.

(The element g was computed as g ≡ 7^{48730/443} (mod 48731), where 7 is a primitive root modulo 48731.) Samantha chooses the secret signing key a = 242 and publishes her public verification key

A ≡ 5260^242 ≡ 3438 (mod 48731).

She signs the document D = 343 using the random element k = 427 by computing the two quantities

S1 = (5260^427 mod 48731) mod 443 = 2717 mod 443 = 59,
S2 ≡ (343 + 242 · 59) · 427^{−1} ≡ 166 (mod 443).

Samantha publishes the signature (S1, S2) = (59, 166) for the document D = 343.
Victor verifies the signature by first computing

V1 ≡ 343 · 166^{−1} ≡ 357 (mod 443)  and  V2 ≡ 59 · 166^{−1} ≡ 414 (mod 443).

He then computes

g^{V1} · A^{V2} ≡ 5260^357 · 3438^414 ≡ 2717 (mod 48731)

and checks that

(g^{V1} · A^{V2} mod 48731) mod 443 = 2717 mod 443 = 59

is equal to S1 = 59.
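Example 4.8 can also be reproduced with a short Python sketch; as before, this is for illustration only, with parameters far too small for real security.

```python
# DSA, following Example 4.8's numbers.
p, q, g = 48731, 443, 5260
assert (p - 1) % q == 0 and pow(g, q, p) == 1   # g has order q in F_p*

a = 242                          # Samantha's secret signing key
A = pow(g, a, p)                 # public verification key
assert A == 3438

D, k = 343, 427                  # document and random element, both mod q
S1 = pow(g, k, p) % q            # (g^k mod p) mod q
S2 = (D + a * S1) * pow(k, -1, q) % q
assert (S1, S2) == (59, 166)

# Victor verifies using only the public data (p, q, g, A).
V1 = D * pow(S2, -1, q) % q
V2 = S1 * pow(S2, -1, q) % q
assert pow(g, V1, p) * pow(A, V2, p) % p % q == S1
```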
Both the Elgamal digital signature scheme and DSA can be adapted to other groups in which the discrete logarithm problem is ostensibly more difficult to solve. In particular, the use of elliptic curve groups leads to the Elliptic Curve Digital Signature Algorithm (ECDSA), which is described in Sect. 6.4.3.
Exercises
Section 4.2. RSA Digital Signatures
4.1. Samantha uses the RSA signature scheme with primes p = 541 and q = 1223
and public verification exponent e = 159853.
(a) What is Samantha’s public modulus? What is her private signing key?
(b) Samantha signs the digital document D = 630579. What is the signature?
4.2. Samantha uses the RSA signature scheme with public modulus N = 1562501 and public verification exponent e = 87953. Adam claims that Samantha has signed each of the documents

D = 119812, D′ = 161153, D′′ = 586036,

and that the associated signatures are

S = 876453, S′ = 870099, S′′ = 602754.

Which of these are valid signatures?
4.3. Samantha uses the RSA signature scheme with public modulus and public
verification exponent
N = 27212325191 and e = 22824469379.
Use whatever method you want to factor N, and then forge Samantha’s signature
on the document D = 12910258780.
4.4. Suppose that Alice and Bob communicate using the RSA PKC. This means that Alice has a public modulus N_A = p_A q_A, a public encryption exponent e_A, and a private decryption exponent d_A, where p_A and q_A are primes and e_A and d_A satisfy

e_A d_A ≡ 1 (mod (p_A − 1)(q_A − 1)).

Similarly, Bob has a public modulus N_B = p_B q_B, a public encryption exponent e_B, and a private decryption exponent d_B.
In this situation, Alice can simultaneously encrypt and sign a message in the following way. Alice chooses her plaintext m and computes the usual RSA ciphertext

c ≡ m^{e_B} (mod N_B).

She next applies a hash function to her plaintext and uses her private decryption key to compute

s ≡ Hash(m)^{d_A} (mod N_A).

She sends the pair (c, s) to Bob.
Bob first decrypts the ciphertext using his private decryption exponent d_B,

m ≡ c^{d_B} (mod N_B).

He then uses Alice's public encryption exponent e_A to verify that

Hash(m) ≡ s^{e_A} (mod N_A).

Explain why verification works, and why it would be difficult for anyone other than Alice to send Bob a validly signed message.
Section 4.3. Discrete Logarithm Digital Signatures
4.5. Samantha uses the Elgamal signature scheme with prime p = 6961 and primitive root g = 437.
(a) Samantha’s private signing key is a = 6104. What is her public verification
key?
(b) Samantha signs the digital document D = 5584 using the random element
k = 4451. What is the signature?
4.6. Samantha uses the Elgamal signature scheme with prime p = 6961 and primitive root g = 437. Her public verification key is A = 4250. Adam claims that Samantha has signed each of the documents

D = 1521, D′ = 1837, D′′ = 1614,

and that the associated signatures are

(S1, S2) = (4129, 5575), (S1′, S2′) = (3145, 1871), (S1′′, S2′′) = (2709, 2994).

Which of these are valid signatures?
4.7. Let p be a prime, let i and j be integers with gcd(j, p − 1) = 1, and let A be arbitrary. Set

S1 ≡ g^i A^j (mod p), S2 ≡ −S1 · j^{−1} (mod p − 1), D ≡ −S1 · i · j^{−1} (mod p − 1).

Prove that (S1, S2) is a valid Elgamal signature on the document D for the verification key A. Thus Eve can produce signatures on random documents.
4.8. Suppose that Samantha is using the Elgamal signature scheme and that she is careless and uses the same random element k to sign two documents D and D′.
(a) Explain how Eve can tell at a glance whether Samantha has made this mistake.
(b) If the signature on D is (S1, S2) and the signature on D′ is (S1′, S2′), explain how Eve can recover a, Samantha's private signing key.
(c) Apply your method from (b) to the following example and recover Samantha's signing key a, where Samantha is using the prime p = 348149, base g = 113459, and verification key A = 185149.

D = 153405, S1 = 208913, S2 = 209176,
D′ = 127561, S1′ = 208913, S2′ = 217800.
4.9. Samantha uses DSA with public parameters (p, q, g) = (22531, 751, 4488). She
chooses the secret signing key a = 674.
(a) What is Samantha’s public verification key?
(b) Samantha signs the document D = 244 using the random element k = 574.
What is the signature?
4.10. Samantha uses DSA with public parameters (p, q, g) = (22531, 751, 4488). Her
public verification key is A = 22476.
(a) Is (S1, S2) = (183, 260) a valid signature on the document D = 329?
(b) Is (S1, S2) = (211, 97) a valid signature on the document D = 432?
4.11. Samantha’s DSA public parameters are (p, q, g) = (103687, 1571, 21947), and
her public verification key is A = 31377. Use whatever method you prefer (brute-
force, collision, index calculus,. . . ) to solve the DLP and find Samantha’s private
signing key. Use her key to sign the document D = 510 using the random element
k = 1105.
Chapter 5
Combinatorics, Probability,
and Information Theory
In considering the usefulness and practicality of a cryptographic system, it is
necessary to measure its resistance to various forms of attack. Such attacks
include simple brute-force searches through the key or message space, somewhat faster searches via collision or meet-in-the-middle algorithms, and more
sophisticated methods that are used to compute discrete logarithms, factor
integers, and find short vectors in lattices. We have already studied some of
these algorithms in Chaps. 2 and 3, and we will see the others in this and later
chapters. In studying these algorithms, it is important to be able to analyze
how long they take to solve the targeted problem. Such an analysis generally
requires tools from combinatorics, probability theory, and information theory.
In this chapter we present, in a largely self-contained form, an introduction
to these topics.
We start with basic principles of counting, and continue with the development of the foundations of probability theory, primarily in the discrete setting. Subsequent sections introduce (discrete) random variables, probability density functions, conditional probability, and Bayes's formula. The applications of probability theory to cryptography are legion. We cover in some detail Monte Carlo algorithms and collision algorithms and their uses in cryptography. We also include a section on the statistical cryptanalysis of a historically
interesting polyalphabetic substitution cipher called the Vigenère cipher, but
we note that the material on the Vigenère cipher is not used elsewhere in the
book, so it may be omitted by the reader who wishes to proceed more rapidly
to the more modern cryptographic material.
The chapter concludes with a very short introduction to the concept of
complexity and the notions of polynomial-time and nondeterministic polynomial-time algorithms. This section, if properly developed, would be a book in
itself, and we can only give a hint of the powerful ideas and techniques used
in this subject.
5.1 Basic Principles of Counting
As I was going to St. Ives,
I met a man with seven wives,
Each wife had seven sacks,
Each sack had seven cats.
Each cat had seven kits.
Kits, cats, sacks, and wives,
How many were going to St. Ives?
The trick answer to this ancient riddle is that there is only one person going
to St. Ives, namely the narrator, since all of the other people and animals
and objects that he meets in the rhyme are not traveling to St. Ives, they are
traveling away from St. Ives! However, if we are in a pedantic, rather than a
clever, frame of mind, we might instead ask the natural question: How many
people, animals, and objects does the narrator meet?
The answer is
2801 = 1 (man) + 7 (wives) + 7^2 (sacks) + 7^3 (cats) + 7^4 (kits).
The computation of this number employs basic counting principles that are
fundamental to the probability calculations used in cryptography and in many
other areas of mathematics. We have already seen an example in Sect. 1.1.1,
where we computed the number of different simple substitution ciphers.
A cipher is said to be combinatorially secure if it is not feasible to break
the system by exhaustively checking every possible key.^1 This depends to
some extent on how long it takes to check each key, but more importantly,
it depends on the number of keys. In this section we develop some basic
counting techniques that are used in a variety of ways to analyze the security
of cryptographic constructions.
Example 5.1 (A Basic Counting Principle). Bob is at a restaurant that
features two appetizers, egg rolls and fried wontons, and 20 main dishes. Assuming that he plans to order one appetizer and one main dish, how many possible meals could Bob order?
We need to count the number of pairs (x, y), where x is either “egg roll” or
“fried wonton” and y is a main dish. The total number is obtained by letting x
^1 Sometimes the length of the search can be significantly shortened by matching pieces of keys taken from two or more lists. Such an attack is called a collision or meet-in-the-middle attack; see Sect. 5.4.
vary over the 2 possibilities and letting y vary over the 20 possibilities, and
then counting up the total number of pairs
(ER, 1), (ER, 2), . . . , (ER, 20), (FW, 1), (FW, 2), . . . , (FW, 20).
The answer is that there are 40 possibilities, which we compute as
40 = 2 (appetizers) · 20 (main dishes).
In this example, we first counted the number of ways of assigning an appetizer (egg roll or fried wonton) to the variable x. It is convenient to view this assignment as the outcome of an experiment. That is, we perform an experiment whose outcome is either "egg roll" or "fried wonton," and we assign the outcome's value to x. Similarly, we perform a second independent experiment
whose possible outcomes are any one of the 20 main courses, and we assign
that value to y. The total number of outcomes of the two experiments is the
product of the number of outcomes for each one individually. This leads to
the following basic counting principle:
Basic Counting Principle
If two experiments are performed, one of which
has n possible outcomes and the other of which
has m possible outcomes, then there are nm
possible outcomes of performing both experi-
ments.
More generally, if k independent experiments are performed and if the number of possible outcomes of the ith experiment is n_i, then the total number of outcomes for all of the experiments is the product n_1 n_2 · · · n_k. It is easy to derive this result by writing x_i for the outcome of the ith experiment. Then the outcome of all k experiments is the value of the k-tuple (x_1, x_2, . . . , x_k), and the total number of possible k-tuples is the product n_1 n_2 · · · n_k.
Example 5.2. Suppose that Bob also wants to order dessert, and that there
are five desserts on the menu. We are now counting triples (x, y, z), where x
is one of the two appetizers, y is one of the 20 main dishes, and z is one of
the five desserts. Hence the total number of meals is
200 = 2 (appetizers) · 20 (main courses) · 5 (desserts).
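The basic counting principle is easy to confirm by brute-force enumeration. The short Python sketch below (with hypothetical menu labels) reproduces the count of Example 5.2 by listing every meal explicitly.

```python
from itertools import product

# Hypothetical menu items standing in for the restaurant of Examples 5.1-5.2.
appetizers = ["egg roll", "fried wonton"]
mains = [f"main dish {i}" for i in range(1, 21)]
desserts = [f"dessert {i}" for i in range(1, 6)]

# Outcomes of independent experiments multiply: 2 * 20 * 5 = 200 meals.
meals = list(product(appetizers, mains, desserts))
assert len(meals) == 2 * 20 * 5 == 200
```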
The basic counting principle is used in the solution of the pedantic version
of the St. Ives problem. For example, the number of cats traveling from St.
Ives is
# of cats = 343 = 7^3 = 1 (man) · 7 (wives) · 7 (sacks) · 7 (cats).
The earliest published version of the St. Ives riddle dates to around 1730, but
similar problems date back to antiquity; see Exercise 5.1.
5.1.1 Permutations
The numbers 1, 2, . . . , 10 are typically listed in increasing order, but suppose
instead we allow the order to be mixed. Then how many different ways are
there to list these ten integers? Each possible configuration is called a permutation of 1, 2, . . . , 10. The problem of counting the number of possible permutations of a given list of objects occurs in many forms and contexts throughout
mathematics.
Each permutation of 1, 2, . . . , 10 is a sequence of all ten distinct integers
in some order. For example, here is a random choice: 8, 6, 10, 3, 9, 2, 4, 7, 5, 1.
How can we create all of the possibilities? It’s easiest to create them by listing
the numbers one at a time, say from left to right. We thus start by assigning a
number to the first position. There are ten choices. Next we assign a number
to the second position, but for the second position there are only nine choices,
because we already used up one of the integers in the first position. (Remember
that we are not allowed to use an integer twice.) Then there are eight integers
left as possibilities for the third position, because we already used two integers
in the first two positions. And so on. Hence the total number of permutations
of 1, 2, . . . , 10 is
10! = 10 · 9 · 8 · · · 2 · 1.
The value of 10! is 3628800, so between three and four million.
Notice how we are using the basic counting principle. The only subtlety
is that the outcome of the first experiment reduces the number of possible
outcomes of the second experiment, the results of the first two experiments
further reduce the number of possible outcomes of the third experiment, and
so on.
Definition. Let S be a set containing n distinct objects. A permutation of S
is an ordered list of the objects in S. A permutation of the set {1, 2, . . . , n} is
simply called a permutation of n.
Proposition 5.3. Let S be a set containing n distinct objects. Then there are
exactly n! different permutations of S.
Proof. Our discussion of the permutations of {1, . . . , 10} works in general.
Thus suppose that S contains n objects and that we want to create a permutation of S. There are n choices for the first entry, then n − 1 choices for the
second entry, then n − 2 choices for the third entry, etc. This leads to a total
of n · (n − 1) · (n − 2) · · · 2 · 1 possible permutations.
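Proposition 5.3 is easy to check for small n by enumeration; the following short Python sketch does so with the standard library's itertools module.

```python
from itertools import permutations
from math import factorial

# Proposition 5.3: a set of n distinct objects has exactly n! permutations.
for n in range(1, 8):
    assert len(list(permutations(range(n)))) == factorial(n)

# The count for n = 10 from the text, computed without listing them all.
assert factorial(10) == 3628800
```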
Remark 5.4 (Permutations and Simple Substitution Ciphers). By definition,
a permutation of the set {a1, a2, . . . , an} is a list consisting of the ai’s in some
order. We can also describe a permutation by using a bijective (i.e., one-to-one
and onto) function
π : {1, 2, . . . , n} −→ {1, 2, . . . , n}.
5.1. Basic Principles of Counting 211
The function π determines the permutation
(aπ(1), aπ(2), . . . , aπ(n)),
and given a permutation, it is easy to write down the corresponding function.
Now suppose that we take the set of letters {A, B, C, . . . , Z}. A permuta-
tion π of this set is just another name for a simple substitution cipher, where π
acts as the encryption function. Thus π tells us that A gets sent to the π(1)st
letter, and B gets sent to the π(2)nd letter, and so on. In order to decrypt, we
use the inverse function π⁻¹.
Example 5.5. Sometimes one needs to count the number of possible permuta-
tions of n objects when some of the objects are indistinguishable. For example,
there are six permutations of three distinct objects A, B, C,
ABC, CAB, BCA, ACB, BAC, and CBA,
but if two of them are indistinguishable, say A, A, B, then there are only three
different arrangements,
AAB, ABA, and BAA.
To illustrate the idea in a more complicated case, we count the number
of different letter arrangements of the five letters A, A, A, B, B. If the five
letters were distinguishable, say they were labeled A1, A2, A3, B1, B2, then
there would be 5! permutations. However, permutations such as
A1A2B1B2A3 and A2A3B2B1A1
become the same when the subscripts are dropped, so we have overcounted
in arriving at the number 5!. How many different arrangements have been
counted more than once?
For example, in any particular permutation, the two B’s have been placed
into specific positions, but we can always switch them and get the same un-
subscripted list. This means that we need to divide 5! by 2 to compensate
for overcounting the placement of the B’s. Similarly, once the three A’s have
been placed into specific positions, we can permute them among themselves
in 3! ways, so we need to divide 5! by 3! to compensate for overcounting the
placement of the A’s. Hence there are 5!/(3! · 2!) = 10 different letter arrangements
of the five letters A, A, A, B, B.
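One can check the count 5!/(3! · 2!) = 10 directly: generate all 5! orderings of the five labeled letters and collapse the indistinguishable ones. A quick Python sketch of ours:

```python
import itertools
from math import factorial

# All 5! orderings of A,A,A,B,B; a set collapses indistinguishable ones.
distinct = set(itertools.permutations("AAABB"))
assert len(distinct) == factorial(5) // (factorial(3) * factorial(2)) == 10

for word in sorted(distinct):
    print("".join(word))  # AABAB, AABBA, ... — the ten distinct arrangements
```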
5.1.2 Combinations
A permutation is a way of arranging a set of objects into a list. A combination
is similar, except that now the order of the list no longer matters. We start
with an example that is typical of problems involving combinations.
212 5. Combinatorics, Probability, and Information Theory
Example 5.6. Five people (Alice, Bob, Carl, Dave, and Eve2) are ordering
a meal at a Chinese restaurant. The menu contains 20 different items. Each
person gets to choose one dish, no dish may be ordered twice, and they plan
to share the food. How many different meals are possible?
Alice orders first and she has 20 choices for her dish. Then Bob orders from
the remaining 19 dishes, and then Carl chooses from the remaining 18 dishes,
and so on. It thus appears that there are 20 · 19 · 18 · 17 · 16 = 1860480 possi-
ble meals. However, the order in which the dishes are ordered is immaterial.
If Alice orders fried rice and Bob orders egg rolls, or if Alice orders egg rolls
and Bob orders fried rice, the meal is the same. Unfortunately, we did not
take this into account when we arrived at the number 1860480.
Let’s number the dishes D1, D2, . . . , D20. Then, for example, we want to
count the two possible dinners
D1, D5, D7, D18, D20 and D5, D18, D20, D7, D1
as being the same, although the order of the dishes is different. To correct the
overcount, note that in the computation 20 · 19 · 18 · 17 · 16 = 1860480, every
permutation of any set of five dishes was counted separately, but we really
want to count these permutations as giving the same meal. Thus we should
divide 1860480 by the number of ways to permute the five distinct dishes in
each possible order, i.e., we should divide by 5!. Hence the total number of
different meals is
(20 · 19 · 18 · 17 · 16)/5! = 15504.
It is often convenient to rewrite this quantity entirely in terms of factorials
by multiplying the numerator and the denominator by 15! to get
(20 · 19 · 18 · 17 · 16)/5! = ((20 · 19 · 18 · 17 · 16) · (15 · 14 · · · 3 · 2 · 1))/(5! · 15!) = 20!/(5! · 15!).
Definition. Let S be a set containing n distinct objects. A combination of r
objects of S is a subset consisting of exactly r distinct elements of S, where
the order of the objects in the subset does not matter.
Proposition 5.7. The number of possible combinations of r objects chosen
from a set of n objects is equal to
(n choose r) = n!/(r!(n − r)!).
Remark 5.8. The symbol (n choose r) is called a combinatorial symbol or a binomial
coefficient. It is read as “n choose r.” Note that by convention, zero factorial
is set equal to 1, so (n choose 0) = n!/(n! · 0!) = 1. This makes sense, since there is only one
way to choose zero objects from a set.
2You may wonder why Alice and Bob, those intrepid exchangers of encrypted secret
messages, are sitting down for a meal with their cryptographic adversary Eve. In the real
world, this happens all the time, especially at cryptography conferences!
Proof of Proposition 5.7. If you understand the discussion in Example 5.6,
then the proof of the general case is clear. The number of ways to make an
ordered list of r distinct elements from the set S is
n(n − 1)(n − 2) · · · (n − r + 1),
since there are n choices for the first element, then n − 1 choices for the second
element, and so on until we have selected r elements. Then we need to divide
by r! in order to compensate for the ways to permute the r elements in our
subset. Dividing by r! accounts for the fact that we do not care in which order
the r elements were chosen. Hence the total number of combinations is
n(n − 1)(n − 2) · · · (n − r + 1)/r! = n!/(r!(n − r)!).
Example 5.9. Returning to the five people ordering a meal at the Chinese
restaurant, suppose that they want the order to consist of two vegetarian
dishes and three meat dishes, and suppose that the menu contains 5 vege-
tarian choices and 15 meat choices. Now how many possible meals can they
order? There are (5 choose 2) possibilities for the two vegetarian dishes and there
are (15 choose 3) choices for the three meat dishes. Hence by our basic counting
principle, there are

(5 choose 2) · (15 choose 3) = 10 · 455 = 4550
possible meals.
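Both restaurant counts are easy to verify with Python, whose standard library function math.comb implements exactly the formula of Proposition 5.7 (a sketch of ours):

```python
from math import comb, factorial

def binomial(n, r):
    """n choose r via the factorial formula of Proposition 5.7."""
    return factorial(n) // (factorial(r) * factorial(n - r))

# Example 5.6: five distinct dishes chosen from a 20-item menu.
assert binomial(20, 5) == comb(20, 5) == 15504

# Example 5.9: two vegetarian and three meat dishes, by the counting principle.
assert comb(5, 2) * comb(15, 3) == 10 * 455 == 4550
```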
5.1.3 The Binomial Theorem
You may have seen the combinatorial numbers (n choose r) appearing in the binomial
theorem,3 which gives a formula for the nth power of the sum of two numbers.

Theorem 5.10 (The Binomial Theorem).

(x + y)^n = ∑_{j=0}^{n} (n choose j) x^j y^(n−j). (5.1)
Proof. Let’s start with a particular case, say n = 3. If we multiply out the
product
3The binomial theorem’s fame extends beyond mathematics. Moriarty, Sherlock
Holmes’s arch enemy, “wrote a treatise upon the Binomial Theorem,” on the strength of
which he won a mathematical professorship. And Major General Stanley, that very Model
of a Modern Major General, proudly informs the Pirate King and his cutthroat band:
About Binomial Theorem I’m teeming with a lot o’ news—
With many cheerful facts about the square of the hypotenuse.
(The Pirates of Penzance, W.S. Gilbert and A. Sullivan 1879)
(x + y)^3 = (x + y) · (x + y) · (x + y), (5.2)

the result is a sum of terms x^3, x^2 y, x y^2, and y^3. There is only one x^3 term,
since to get x^3 we must take x from each of the three factors in (5.2). How
many copies of x^2 y are there? We can get x^2 y in several ways. For example,
we could take x from the first two factors and y from the last factor. Or we
could take x from the first and third factors and take y from the second factor.
Thus we get x^2 y by choosing two of the three factors in (5.2) to give x (note
that the order doesn’t matter), and then the remaining factor gives y. There
are thus (3 choose 2) = 3 ways to get x^2 y. Similarly, there are (3 choose 1) = 3 ways to get x y^2
and only one way to get y^3. Hence

(x + y)^3 = (3 choose 3) x^3 + (3 choose 2) x^2 y + (3 choose 1) x y^2 + (3 choose 0) y^3
          = x^3 + 3x^2 y + 3x y^2 + y^3.
The general case is exactly the same. When multiplied out, the product

(x + y)^n = (x + y) · (x + y) · (x + y) · · · (x + y) (5.3)

is a sum of terms x^n, x^(n−1) y, . . . , x y^(n−1), y^n. We get copies of x^j y^(n−j) by choosing x from any j of the factors in (5.3) and then taking y from the other n − j factors. Thus we get (n choose j) copies of x^j y^(n−j). Summing over the possible values of j gives (5.1), which completes the proof of the binomial theorem.
Example 5.11. We use the binomial theorem to compute

(2t + 3)^4 = (4 choose 4)(2t)^4 + (4 choose 3)(2t)^3 · 3 + (4 choose 2)(2t)^2 · 3^2 + (4 choose 1)(2t) · 3^3 + (4 choose 0) · 3^4
           = 16t^4 + 4 · 8t^3 · 3 + 6 · 4t^2 · 9 + 4 · 2t · 27 + 81
           = 16t^4 + 96t^3 + 216t^2 + 216t + 81.
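The expansion can be checked numerically: by (5.1), the coefficient of t^j in (2t + 3)^4 is (4 choose j) · 2^j · 3^(4−j), and the resulting polynomial must agree with direct evaluation. A sketch of ours:

```python
from math import comb

n = 4
# Coefficient of t^j in (2t + 3)^4, read off from the binomial theorem (5.1).
coeffs = [comb(n, j) * 2**j * 3**(n - j) for j in range(n + 1)]
assert coeffs == [81, 216, 216, 96, 16]  # constant term first

# Cross-check against direct evaluation at several integer points.
for t in range(-3, 4):
    assert (2*t + 3)**n == sum(c * t**j for j, c in enumerate(coeffs))
```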
5.2 The Vigenère Cipher
The simple substitution ciphers that we studied in Sect. 1.1 are examples of
monoalphabetic ciphers, since every plaintext letter is encrypted using only
one cipher alphabet. As cryptanalytic methods became more sophisticated in
Renaissance Italy, correspondingly more sophisticated ciphers were invented
(although it seems that they were seldom used in practice). Consider how
much more difficult a task is faced by the cryptanalyst if every plaintext
letter is encrypted using a different ciphertext alphabet. This ideal resurfaces
in modern cryptography in the form of the one-time pad, which we discuss
in Sect. 5.6, but in this section we discuss a less complicated polyalphabetic
cipher called the Vigenère cipher4
dating back to the sixteenth century.
4This cipher is named after Blaise de Vigenère (1523–1596), whose 1586 book Traicté
des Chiffres describes the known ciphers of his time. These include polyalphabetic ciphers
such as the “Vigenère cipher,” which according to [63] Vigenère did not invent, and an
ingenious autokey system (see Exercise 5.19), which he did.
The Vigenère cipher works by using different shift ciphers to encrypt dif-
ferent letters. In order to decide how far to shift each letter, Bob and Alice
first agree on a keyword or phrase. Bob then uses the letters of the keyword,
one by one, to determine how far to shift each successive plaintext letter. If the
keyword letter is a, there is no shift, if the keyword letter is b, he shifts by 1,
if the keyword letter is c, he shifts by 2, and so on. An example illustrates the
process:
Example 5.12. Suppose that the keyword is dog and the plaintext is yellow.
The first letter of the keyword is d, which gives a shift of 3, so Bob shifts
the first plaintext letter y forward by 3, which gives the ciphertext letter b.
(Remember that a follows z.) The second letter of the keyword is o, which
gives a shift of 14, so Bob shifts the second plaintext letter e forward by 14,
which gives the ciphertext letter s. The third letter of the keyword is g, which
gives a shift of 6, so Bob shifts the third plaintext letter l forward by 6, which
gives the ciphertext letter r.
Bob has run out of keyword letters, so what does he do now? He simply
starts again with the first letter of the keyword. The first letter of the keyword
is d, which again gives a shift of 3, so Bob shifts the fourth plaintext letter l
forward by 3, which gives the ciphertext letter o. Then the second keyword
letter o tells him to shift the fifth plaintext letter o forward by 14, giving the
ciphertext letter c, and finally the third keyword letter g tells him to shift the
sixth plaintext letter w forward by 6, giving the ciphertext letter c.
In conclusion, Bob has encrypted the plaintext yellow using the keyword
dog and obtained the ciphertext bsrocc.
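Bob's procedure is mechanical enough to automate. A minimal Python sketch of ours (the book works by hand with the tableau) shifts each plaintext letter by the amount given by the corresponding keyword letter:

```python
def vigenere(text, key, decrypt=False):
    """Vigenere cipher on lowercase a-z, cycling through the keyword."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - ord("a")
        out.append(chr((ord(ch) - ord("a") + sign * shift) % 26 + ord("a")))
    return "".join(out)

assert vigenere("yellow", "dog") == "bsrocc"
assert vigenere("bsrocc", "dog", decrypt=True) == "yellow"
```

The same function reproduces Example 5.13 below: encrypting theraininspainstaysmainlyintheplain with the keyword flamingo yields ysedivtwsdpmqayhfjsyivtzdtnfprvzftn.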
Even this simple example illustrates two important characteristics of the
Vigenère cipher. First, the repeated letters ll in the plaintext lead to non-
identical letters ro in the ciphertext, and second, the repeated letters cc in the
ciphertext correspond to different letters ow of the plaintext. Thus a straight-
forward frequency analysis as we used to cryptanalyze simple substitution
ciphers (Sect. 1.1.1) is not going to work for the Vigenère cipher.
A useful tool for doing Vigenère encryption and decryption, at least if no
computer is available (as was typically the case in the sixteenth century!), is
the so-called Vigenère tableau illustrated in Table 5.1. The Vigenère tableau
consists of 26 alphabets arranged in a square, with each alphabet shifted one
further than the alphabet to its left. In order to use a given keyword letter
to encrypt a given plaintext letter, Bob finds the plaintext letter in the top
row and the keyword letter in the first column. He then looks for the letter in
the tableau lying below the plaintext letter and to the right of the keyword
letter. That is, he locates the encrypted letter at the intersection of the row
beginning with the keyword letter and the column with the plaintext letter
on top.
For example, if the keyword letter is d and the plaintext letter is y, Bob
looks in the fourth row (which is the one that starts with d) and in the next
a b c d e f g h i j k l m n o p q r s t u v w x y z
b c d e f g h i j k l m n o p q r s t u v w x y z a
c d e f g h i j k l m n o p q r s t u v w x y z a b
d e f g h i j k l m n o p q r s t u v w x y z a b c
e f g h i j k l m n o p q r s t u v w x y z a b c d
f g h i j k l m n o p q r s t u v w x y z a b c d e
g h i j k l m n o p q r s t u v w x y z a b c d e f
h i j k l m n o p q r s t u v w x y z a b c d e f g
i j k l m n o p q r s t u v w x y z a b c d e f g h
j k l m n o p q r s t u v w x y z a b c d e f g h i
k l m n o p q r s t u v w x y z a b c d e f g h i j
l m n o p q r s t u v w x y z a b c d e f g h i j k
m n o p q r s t u v w x y z a b c d e f g h i j k l
n o p q r s t u v w x y z a b c d e f g h i j k l m
o p q r s t u v w x y z a b c d e f g h i j k l m n
p q r s t u v w x y z a b c d e f g h i j k l m n o
q r s t u v w x y z a b c d e f g h i j k l m n o p
r s t u v w x y z a b c d e f g h i j k l m n o p q
s t u v w x y z a b c d e f g h i j k l m n o p q r
t u v w x y z a b c d e f g h i j k l m n o p q r s
u v w x y z a b c d e f g h i j k l m n o p q r s t
v w x y z a b c d e f g h i j k l m n o p q r s t u
w x y z a b c d e f g h i j k l m n o p q r s t u v
x y z a b c d e f g h i j k l m n o p q r s t u v w
y z a b c d e f g h i j k l m n o p q r s t u v w x
z a b c d e f g h i j k l m n o p q r s t u v w x y
• Find the plaintext letter in the top row.
• Find the keyword letter in the first column.
• The ciphertext letter lies below the plaintext letter and to the right of
the keyword letter.
Table 5.1: The Vigenère Tableau
to last column (which is the one headed by y). This row and column intersect
at the letter b, so the corresponding ciphertext letter is b.
Decryption is just as easy. Alice uses the row containing the keyword letter
and looks in that row for the ciphertext letter. Then the top of that column is
the plaintext letter. For example, if the keyword letter is g and the ciphertext
letter is r, Alice looks in the row starting with g until she finds r and then
she moves to the top of that column to find the plaintext letter l.
Example 5.13. We illustrate the use of the Vigenère tableau by encrypting
the plaintext message
The rain in Spain stays mainly in the plain,
using the keyword flamingo. Since the keyword has eight letters, the first
step is to split the plaintext into eight-letter blocks,
theraini | nspainst | aysmainl | yinthepl | ain.
Next we write the keyword beneath each block of plaintext, where for conve-
nience we label lines P, K, and C to indicate, respectively, the plaintext, the
keyword, and the ciphertext.
P t h e r a i n i n s p a i n s t a y s m a i n l y i n t h e p l a i n
K f l a m i n g o f l a m i n g o f l a m i n g o f l a m i n g o f l a
Finally, we encrypt each letter using the Vigenère tableau. The initial plaintext
letter t and initial keyword letter f combine in the Vigenère tableau to yield
the ciphertext letter y, the second plaintext letter h and second keyword
letter l combine in the Vigenère tableau to yield the ciphertext letter s, and
so on. Continuing in this fashion, we complete the encryption process.
P t h e r a i n i n s p a i n s t a y s m a i n l y i n t h e p l a i n
K f l a m i n g o f l a m i n g o f l a m i n g o f l a m i n g o f l a
C y s e d i v t w s d p m q a y h f j s y i v t z d t n f p r v z f t n
Splitting the ciphertext into convenient blocks of five letters each, we are ready
to transmit our encrypted message
ysedi vtwsd pmqay hfjsy ivtzd tnfpr vzftn.
Remark 5.14. As we already pointed out, the same plaintext letter in a
Vigenère cipher is represented in the ciphertext by many different letters.
However, if the keyword is short, there will be a tendency for repetitive parts
of the plaintext to end up aligned at the same point in the keyword, in which
case they will be identically enciphered. This occurs in Example 5.13, where
the ain in rain and in mainly are encrypted using the same three keyword
letters ing, so they yield the same ciphertext letters ivt. This repetition in
the ciphertext, which appears separated by 16 letters, suggests that the key-
word has length dividing 16. Of course, not every occurrence of ain in the
plaintext yields the same ciphertext. It is only when two occurrences line up
with the same part of the keyword that repetition occurs.
In the next section we develop the idea of using ciphertext repetitions to
guess the length of the keyword, but here we simply want to make the point
that short keywords are less secure than long keywords.5
On the other hand,
Bob and Alice find it easier to remember a short keyword than a long one.
We thus see the beginnings of the eternal struggle in practical (as opposed to
purely theoretical) cryptography, namely the battle between
Efficiency (and ease of use) ←−−− versus −−−→ Security.
As a further illustration of this dichotomy, we consider ways in which
Bob and Alice might make their Vigenère-type cipher more secure. They can
certainly make Eve’s job harder by mixing up the letters in the first row of their
Vigenère tableau and then rotating this “mixed alphabet” in the subsequent
rows. Unfortunately, a mixed alphabet makes encryption and decryption more
cumbersome, plus it means that Bob and Alice must remember (or write down
for safekeeping!) not only their keyword, but also the mixed alphabet. And
if they want to be even more secure, they can use different randomly mixed
alphabets in every row of their Vigenère tableau. But if they do that, then
they will certainly need to keep a written copy of the tableau, which is a
serious security risk.
5.2.1 Cryptanalysis of the Vigenère Cipher: Theory
At various times in history it has been claimed that Vigenère-type ciphers,
especially with mixed alphabets, are “unbreakable.” In fact, nothing could be
further from the truth. If Eve knows Bob and Alice, she may be able to guess
part of the keyword and proceed from there. (How many people do you know
who use some variation of their name and birthday as an Internet password?)
But even without lucky guesses, elementary statistical methods developed in
the nineteenth century allow for a straightforward cryptanalysis of Vigenère-
type ciphers. In the interest of simplicity, we stick with the original Vigenère,
i.e., we do not allow mixed alphabets in the tableau.
You may wonder why we take the time to cryptanalyze the Vigenère cipher,
since no one these days uses the Vigenère for secure communications. The
answer is that our exposition is designed principally to introduce you to the use
of statistical tools in cryptanalysis. This builds on and extends the elementary
application of frequency tables as we used them in Sect. 1.1.1 to cryptanalyze
simple substitution ciphers. In this section we describe the theoretical tools
used to cryptanalyze the Vigenère, and in the next section we apply those
tools to decrypt a sample ciphertext. If at any point you find that the theory
in this section becomes confusing, it may help to turn to Sect. 5.2.2 and see
how the theory is applied in practice.
5More typically one uses a key phrase consisting of several words, but for simplicity we
use the term “keyword” to cover both single keywords and longer key phrases.
The first goal in cryptanalyzing a Vigenère cipher is to find the length of
the keyword, which is sometimes called the blocksize or the period. We already
saw in Remark 5.14 how this might be accomplished by looking for repeated
fragments in the ciphertext. The point is that certain plaintext fragments
such as the occur quite frequently, while other plaintext fragments such as ugw
occur infrequently or not at all. Among the many occurrences of the letters the
in the plaintext, a certain percentage of them will line up with exactly the
same part of the keyword.
This leads to the Kasiski method, first described by a German military
officer named Friedrich Kasiski in his book Die Geheimschriften und die
Dechiffrir-kunst6
published in 1863. One looks for repeated fragments within
the ciphertext and compiles a list of the distances that separate the repeti-
tions. The key length is likely to divide many of these distances. Of course, a
certain number of repetitions will occur by pure chance, but these are random,
while the ones coming from repeated plaintext fragments are always divisible
by the key length. It is generally not hard to pick out the key length from this
data.
There is another method of guessing the key length that works with
individual letters, rather than with fragments consisting of several letters.
The underlying idea can be traced all the way back to the frequency table
of English letters (Table 1.3), which shows that some letters are more likely
to occur than others. Suppose now that you are presented with a ciphertext
encrypted using a Vigenère cipher and that you guess that it was encrypted
using a keyword of length 5. This means that every fifth letter was encrypted
using the same rotation, so if you pull out every fifth letter and form them into
a string, this entire string was encrypted using a single substitution cipher.
Hence the string’s letter frequencies should look more or less as they do in En-
glish, with some letters much more frequent and some much less frequent. And
the same will be true of the string consisting of the 2nd, 7th, 12th,. . . letters
of the ciphertext, and so on. On the other hand, if you guessed wrong and the
key length is not five, then the string consisting of every fifth letter should be
more or less random, so its letter frequencies should look different from the
frequencies in English.
How can we quantify the following two statements so as to be able to
distinguish between them?
String 1 has letter frequencies similar to those in Table 1.3. (5.4)
String 2 has letter frequencies that look more or less random. (5.5)
One method is to use the following device.
Definition. Let s = c1c2c3 · · · cn be a string of n alphabetic characters. The
index of coincidence of s, denoted by IndCo(s), is the probability that two
randomly chosen characters in the string s are identical.
6Cryptography and the Art of Decryption.
We are going to derive a formula for the index of coincidence. It is conve-
nient to identify the letters a,. . . ,z with the numbers 0, 1, . . . , 25 respectively.
For each value i = 0, 1, 2, . . . , 25, let Fi be the frequency with which letter i
appears in the string s. For example, if the letter h appears 23 times in the
string s, then F7 = 23, since h = 7 in our labeling of the alphabet.
For each i, there are (Fi choose 2) = Fi(Fi − 1)/2 ways to select two instances of the
ith letter of the alphabet from s, so the total number of ways to get a re-
peated letter is the sum of Fi(Fi − 1)/2 for i = 0, 1, . . . , 25. On the other hand,
there are (n choose 2) = n(n − 1)/2 ways to select two arbitrary characters from s. The
probability of selecting two identical letters is the total number of ways to
choose two identical letters divided by the total number of ways to choose
any two letters. That is,

IndCo(s) = (1/(n(n − 1))) · ∑_{i=0}^{25} Fi(Fi − 1). (5.6)
Example 5.15. Let s be the string
s = “A bird in hand is worth two in the bush.”
Ignoring the spaces between words, s consists of 30 characters. The following
table counts the frequencies of each letter that appears at least once:
A B D E H I N O R S T U W
i 0 1 3 4 7 8 13 14 17 18 19 20 22
Fi 2 2 2 1 4 4 3 2 2 2 3 1 2
Then the index of coincidence of s, as given by (5.6), is
IndCo(s) = (1/(30 · 29)) · (2·1 + 2·1 + 2·1 + 4·3 + 4·3 + 3·2 + · · · + 3·2 + 2·1) ≈ 0.0575.
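Formula (5.6) translates directly into code. The sketch below (ours) reproduces the computation of Example 5.15:

```python
def ind_co(s):
    """Index of coincidence of a string of letters, formula (5.6)."""
    letters = [c for c in s.lower() if c.isalpha()]
    n = len(letters)
    freqs = [letters.count(chr(ord("a") + i)) for i in range(26)]
    return sum(F * (F - 1) for F in freqs) / (n * (n - 1))

s = "A bird in hand is worth two in the bush."
assert abs(ind_co(s) - 0.0575) < 0.0001
```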
We return to our two statements (5.4) and (5.5). Suppose first that the
string s consists of random characters. Then the probability that ci = cj is
exactly 1/26, so we would expect IndCo(s) ≈ 1/26 ≈ 0.0385. On the other hand,
if s consists of English text, then we would expect the relative frequencies to
be as in Table 1.3. So for example, if s consists of 10,000 characters, we would
expect approximately 815 A’s, approximately 144 B’s, approximately 276 C’s,
and so on. Thus the index of coincidence for a string of English text should
be approximately
(815 · 814 + 144 · 143 + 276 · 275 + · · · + 8 · 7)/(10000 · 9999) ≈ 0.0685.
The disparity between 0.0385 and 0.0685, as small as it may seem, provides
the means to distinguish between Statement 5.4 and Statement 5.5. More
precisely:
If IndCo(s) ≈ 0.068, then s looks like simple substitution English. (5.7)
If IndCo(s) ≈ 0.038, then s looks like random letters. (5.8)
Of course, the value of IndCo(s) will tend to fluctuate, especially if s is fairly
short. But the moral of (5.7) and (5.8) is that larger values of IndCo(s) make it
more likely that s is English encrypted with some sort of simple substitution,
while smaller values of IndCo(s) make it more likely that s is random.
Now suppose that Eve intercepts a message s that she believes was en-
crypted using a Vigenère cipher and wants to check whether the keyword has
length k. Her first step is to break the string s into k pieces s1, s2, . . . , sk,
where s1 consists of every kth letter starting from the first letter, s2 consists
of every kth letter starting from the second letter, and so on. In mathematical
terms, if we write s = c1c2c3 . . . cn, then
si = ci ci+k ci+2k ci+3k · · · .
Notice that if Eve’s guess is correct and the keyword has length k, then each si
consists of characters that were encrypted using the same shift amount, so
although they do not decrypt to form actual words (remember that si is
every kth letter of the text), the pattern of their letter frequencies will look
like English. On the other hand, if Eve’s guess is incorrect, then the si strings
will be more or less random.
Thus for each k, Eve computes IndCo(si) for i = 1, 2, . . . , k and checks
whether these numbers are closer to 0.068 or closer to 0.038. She does this
for k = 3, 4, 5, . . . until she finds a value of k for which the average value
of IndCo(s1), IndCo(s2), . . . , IndCo(sk) is large, say greater than 0.06. Then
this k is probably the correct blocksize.
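Eve's key-length test is a short loop: split the ciphertext into k streams and average their indices of coincidence, repeating for each candidate k. A sketch of ours, with ind_co computing formula (5.6):

```python
def ind_co(s):
    """Index of coincidence of a string, formula (5.6)."""
    n = len(s)
    return sum(s.count(c) * (s.count(c) - 1) for c in set(s)) / (n * (n - 1))

def avg_ind_co(ciphertext, k):
    """Average IndCo of the streams s1, ..., sk for a guessed key length k."""
    return sum(ind_co(ciphertext[i::k]) for i in range(k)) / k

# Toy check: with key length 2, the two streams of "abababab" are
# "aaaa" and "bbbb", each monoalphabetic, so the average is maximal.
assert avg_ind_co("abababab", 2) == 1.0
```

In practice Eve runs avg_ind_co for k = 3, 4, 5, . . . and looks for the value of k whose average jumps toward 0.065 or higher.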
We assume now that Eve has used the Kasiski test or the index of coinci-
dence test to determine that the keyword has length k. That’s a good start,
but she’s still quite far from her goal of finding the plaintext. The next step
is to compare the strings s1, s2, . . . , sk to one another. The tool she uses to
compare different strings is called the mutual index of coincidence. The gen-
eral idea is that each of the k strings has been encrypted using a different
shift cipher. If the string si is shifted by βi and the string sj is shifted by βj,
then one would expect the frequencies of si to best match those of sj when
the symbols in si are shifted by an additional amount
σ ≡ βj − βi (mod 26).
This leads to the following useful definition.
Definition. Let
s = c1c2c3 . . . cn and t = d1d2d3 . . . dm
be strings of alphabetic characters. The mutual index of coincidence of s
and t, denoted by MutIndCo(s, t), is the probability that a randomly chosen
character from s and a randomly chosen character from t will be the same.
If we let Fi(s) denote the number of times the ith letter of the alphabet
appears in the string s, and similarly for Fi(t), then the probability of choosing
the ith letter from both is the product of the probabilities Fi(s)/n and Fi(t)/m.
In order to obtain a formula for the mutual index of coincidence of s and t,
we add these probabilities over all possible letters,

MutIndCo(s, t) = (1/(nm)) · ∑_{i=0}^{25} Fi(s)Fi(t). (5.9)
Example 5.16. Let s and t be the strings
s = “A bird in hand is worth two in the bush,”
t = “A stitch in time saves nine.”
Using formula (5.9) to compute the mutual index of coincidence of s and t
yields MutIndCo(s, t) = 0.0773.
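Formula (5.9) in code, reproducing Example 5.16 (a sketch of ours):

```python
def mut_ind_co(s, t):
    """Mutual index of coincidence of two strings, formula (5.9)."""
    s = [c for c in s.lower() if c.isalpha()]
    t = [c for c in t.lower() if c.isalpha()]
    total = sum(s.count(chr(ord("a") + i)) * t.count(chr(ord("a") + i))
                for i in range(26))
    return total / (len(s) * len(t))

s = "A bird in hand is worth two in the bush."
t = "A stitch in time saves nine."
assert abs(mut_ind_co(s, t) - 0.0773) < 0.0001
```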
The mutual index of coincidence has very similar properties to the index
of coincidence. For example, there are analogues of the two statements (5.7)
and (5.8). The value of MutIndCo(s, t) can be used to confirm that a guessed
shift amount is correct. Thus if two strings s and t are encrypted using the
same simple substitution cipher, then MutIndCo(s, t) tends to be large, be-
cause of the uneven frequency with which letters appear. On the other hand,
if s and t are encrypted using different substitution ciphers, then they have no
relation to one another, and the mutual index of coincidence MutIndCo(s, t)
will be much smaller.
We return now to Eve’s attack on a Vigenère cipher. She knows the key
length k and has split the ciphertext into k blocks, s1, s2, . . . , sk, as usual. The
characters in each block have been encrypted using the same shift amount,
say
βi = Amount that block si has been shifted.
Eve’s next step is to compare si with the string obtained by shifting the
characters in sj by different amounts. As a notational convenience, we write
sj + σ = the string sj with every character shifted σ spots down the alphabet.
Suppose that σ happens to equal βi−βj. Then sj +σ has been shifted a total of
βj + σ = βi from the plaintext, so sj + σ and si have been encrypted using the
same shift amount. Hence, as noted above, their mutual index of coincidence
will be fairly large. On the other hand, if σ is not equal to βi − βj, then sj + σ
and si have been encrypted using different shift amounts, so MutIndCo(si, sj + σ)
will tend to be small.
To put this concept into action, Eve computes all of the mutual indices of
coincidence
MutIndCo(si, sj + σ) for 1 ≤ i < j ≤ k and 0 ≤ σ ≤ 25.
Scanning the list of values, she picks out the ones that are large, say larger
than 0.065. Each large value of MutIndCo(si, sj + σ) makes it likely that
βi − βj ≡ σ (mod 26). (5.10)
(Note that (5.10) is only a congruence modulo 26, since a shift of 26 is the same
as a shift of 0.) This leads to a system of equations of the form (5.10) for the
variables β1, . . . , βk. In practice, some of these equations will be spurious, but
after a certain amount of trial and error, Eve will end up with values γ2, . . . , γk
satisfying
β2 = β1 + γ2, β3 = β1 + γ3, β4 = β1 + γ4, . . . , βk = β1 + γk.
Thus if the keyword happens to start with A, then the second letter of the
keyword would be A shifted by γ2, the third letter of the keyword would be A
shifted by γ3, and so on. Similarly, if the keyword happens to start with B,
then its second letter would be B shifted by γ2, its third letter would be B
shifted by γ3, etc. So all that Eve needs to do is try each of the 26 possible
starting letters and decrypt the message using each of the 26 corresponding
keywords. Looking at the first few characters of the 26 putative plaintexts, it
is easy for her to pick out the correct one.
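The search for the relative shifts σ can be sketched as follows (our illustration, with illustrative names): for each pair of streams, try all 26 shifts of sj and keep the σ giving the largest mutual index of coincidence.

```python
def shift(s, sigma):
    """The string s with every character shifted sigma spots down the alphabet."""
    return "".join(chr((ord(c) - ord("a") + sigma) % 26 + ord("a")) for c in s)

def mut_ind_co(s, t):
    """Mutual index of coincidence, formula (5.9)."""
    total = sum(s.count(c) * t.count(c) for c in set(s))
    return total / (len(s) * len(t))

def best_shift(si, sj):
    """The sigma maximizing MutIndCo(si, sj + sigma); likely beta_i - beta_j."""
    return max(range(26), key=lambda sigma: mut_ind_co(si, shift(sj, sigma)))

# Toy check: two shift-encryptions of the same English-like text.
u = "itwasthebestoftimesitwastheworstoftimes"
si, sj = shift(u, 3), shift(u, 10)
assert best_shift(si, sj) == (3 - 10) % 26  # recovers sigma = 19
```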
Remark 5.17. We make one final remark before doing an example. We noted
earlier that among the many occurrences of the letters the in the plaintext,
a certain percentage of them will line up with exactly the same part of the
keyword. It turns out that these repeated encryptions occur much more fre-
quently than one might guess. This is an example of the “birthday paradox,”
which says that the probability of getting a match (e.g. of trigrams or birth-
days or colors) is quite high. We discuss the birthday paradox and some of its
many applications to cryptography in Sect. 5.4.
5.2.2 Cryptanalysis of the Vigenère Cipher: Practice
In this section we illustrate how to cryptanalyze a Vigenère ciphertext by
decrypting the message given in Table 5.2.
zpgdl rjlaj kpylx zpyyg lrjgd lrzhz qyjzq repvm swrzy rigzh
zvreg kwivs saolt nliuw oldie aqewf iiykh bjowr hdogc qhkwa
jyagg emisr zqoqh oavlk bjofr ylvps rtgiu avmsw lzgms evwpc
dmjsv jqbrn klpcf iowhv kxjbj pmfkr qthtk ozrgq ihbmq sbivd
ardym qmpbu nivxm tzwqv gefjh ucbor vwpcd xuwft qmoow jipds
fluqm oeavl jgqea lrkti wvext vkrrg xani
Table 5.2: A Vigenère ciphertext to cryptanalyze
224 5. Combinatorics, Probability, and Information Theory
Trigram Appears at places Difference
avl 117 and 258 141 = 3 · 47
bjo 86 and 121 35 = 5 · 7
dlr 4 and 25 21 = 3 · 7
gdl 3 and 24 21 = 3 · 7
lrj 5 and 21 16 = 2^4
msw 40 and 138 98 = 2 · 7^2
pcd 149 and 233 84 = 2^2 · 3 · 7
qmo 241 and 254 13 = 13
vms 39 and 137 98 = 2 · 7^2
vwp 147 and 231 84 = 2^2 · 3 · 7
wpc 148 and 232 84 = 2^2 · 3 · 7
zhz 28 and 49 21 = 3 · 7
Table 5.3: Repeated trigrams in the ciphertext given in Table 5.2
Key Average Individual indices
length index of coincidence
4 0.038 0.034, 0.042, 0.039, 0.035
5 0.037 0.038, 0.039, 0.043, 0.027, 0.036
6 0.036 0.038, 0.038, 0.039, 0.038, 0.032, 0.033
7 0.062 0.062, 0.057, 0.065, 0.059, 0.060, 0.064, 0.064
8 0.038 0.037, 0.029, 0.038, 0.030, 0.034, 0.057, 0.040, 0.039
9 0.037 0.032, 0.036, 0.028, 0.030, 0.026, 0.032, 0.045, 0.047, 0.056
Table 5.4: Indices of coincidence of Table 5.2 for various key lengths
We begin by applying the Kasiski test. A list of repeated trigrams is given
in Table 5.3, together with their location within the ciphertext and the number
of letters that separates them. Most of the differences in the last column are
divisible by 7, and 7 is the largest number with this property, so we guess that
the keyword length is 7.
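A minimal Kasiski test is easy to script. In this Python sketch (function names are ours), kasiski maps each repeated trigram to the distances between its occurrences; on real ciphertexts a few repeats are coincidental, so in practice one looks for the number dividing most of the distances rather than trusting a raw gcd.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def kasiski(ciphertext, n=3):
    """Map each repeated n-gram to the distances between its occurrences."""
    where = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        where[ciphertext[i:i + n]].append(i + 1)   # 1-indexed, as in Table 5.3
    return {gram: [b - a for a, b in zip(locs, locs[1:])]
            for gram, locs in where.items() if len(locs) > 1}

def key_length_guess(repeats):
    """The gcd of all repeat distances, a first guess at the keyword length."""
    return reduce(gcd, [d for ds in repeats.values() for d in ds])
```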
Although the Kasiski test shows that the period is probably 7, we also
apply the index of coincidence test in order to illustrate how it works. Table 5.4
lists the indices of coincidence for various choices of key length and the average
index of coincidence for each key length. We see from Table 5.4 that key
length 7 has far higher average index of coincidence than the other potential
key lengths, which confirms the conclusion from the Kasiski test.
Now that Eve knows that the key length is 7, she compares the blocks
with one another as described in Sect. 5.2.1. She first breaks the ciphertext
into seven blocks by taking every seventh letter. (Notice how the first seven
letters of the ciphertext run down the first column, the second seven down
the second column, and so on.)
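Breaking the text into blocks is one line of Python (a sketch; block i collects exactly the letters that were enciphered with the i-th keyword letter):

```python
def split_blocks(ciphertext, key_len):
    """Block i is every key_len-th letter of the ciphertext, starting at i."""
    return [ciphertext[i::key_len] for i in range(key_len)]
```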
Blocks Shift amount
i j 0 1 2 3 4 5 6 7 8 9 10 11 12
1 2 0.025 0.034 0.045 0.049 0.025 0.032 0.037 0.042 0.049 0.031 0.032 0.037 0.043
1 3 0.023 0.067 0.055 0.022 0.034 0.049 0.036 0.040 0.040 0.046 0.025 0.031 0.046
1 4 0.032 0.041 0.027 0.040 0.045 0.037 0.045 0.028 0.049 0.042 0.042 0.030 0.039
1 5 0.043 0.021 0.031 0.052 0.027 0.049 0.037 0.050 0.033 0.033 0.035 0.044 0.030
1 6 0.037 0.036 0.030 0.037 0.037 0.055 0.046 0.038 0.035 0.031 0.032 0.037 0.032
1 7 0.054 0.063 0.034 0.030 0.034 0.040 0.035 0.032 0.042 0.025 0.019 0.061 0.054
2 3 0.041 0.029 0.036 0.041 0.045 0.038 0.060 0.031 0.020 0.045 0.056 0.029 0.030
2 4 0.028 0.043 0.042 0.032 0.032 0.047 0.035 0.048 0.037 0.040 0.028 0.051 0.037
2 5 0.047 0.037 0.032 0.044 0.059 0.029 0.017 0.044 0.060 0.034 0.037 0.046 0.039
2 6 0.033 0.035 0.052 0.040 0.032 0.031 0.031 0.029 0.055 0.052 0.043 0.028 0.023
2 7 0.038 0.037 0.035 0.046 0.046 0.054 0.037 0.018 0.029 0.052 0.041 0.026 0.037
3 4 0.029 0.039 0.033 0.048 0.044 0.043 0.030 0.051 0.033 0.034 0.034 0.040 0.038
3 5 0.021 0.041 0.041 0.037 0.051 0.035 0.036 0.038 0.025 0.043 0.034 0.039 0.036
3 6 0.037 0.034 0.042 0.034 0.051 0.029 0.027 0.041 0.034 0.040 0.037 0.046 0.036
3 7 0.046 0.023 0.028 0.040 0.031 0.040 0.045 0.039 0.020 0.030 0.069 0.042 0.037
4 5 0.041 0.033 0.041 0.038 0.036 0.031 0.056 0.032 0.026 0.034 0.049 0.029 0.054
4 6 0.035 0.037 0.032 0.039 0.041 0.033 0.032 0.039 0.042 0.031 0.049 0.039 0.058
4 7 0.031 0.032 0.046 0.038 0.039 0.042 0.033 0.056 0.046 0.027 0.027 0.036 0.036
5 6 0.048 0.036 0.026 0.031 0.033 0.039 0.037 0.027 0.037 0.045 0.032 0.040 0.041
5 7 0.030 0.051 0.043 0.031 0.034 0.041 0.048 0.032 0.053 0.037 0.024 0.029 0.045
6 7 0.032 0.033 0.030 0.038 0.032 0.035 0.047 0.050 0.049 0.033 0.057 0.050 0.021
Blocks Shift amount
i j 13 14 15 16 17 18 19 20 21 22 23 24 25
1 2 0.034 0.052 0.037 0.030 0.037 0.054 0.021 0.018 0.052 0.052 0.043 0.042 0.046
1 3 0.031 0.037 0.038 0.050 0.039 0.040 0.026 0.037 0.044 0.043 0.023 0.045 0.032
1 4 0.039 0.040 0.032 0.041 0.028 0.019 0.071 0.038 0.040 0.034 0.045 0.026 0.052
1 5 0.042 0.032 0.038 0.037 0.032 0.045 0.045 0.033 0.041 0.043 0.035 0.028 0.063
1 6 0.040 0.030 0.028 0.071 0.051 0.033 0.036 0.047 0.029 0.037 0.046 0.041 0.027
1 7 0.040 0.032 0.049 0.037 0.035 0.035 0.039 0.023 0.043 0.035 0.041 0.042 0.027
2 3 0.054 0.040 0.028 0.031 0.039 0.033 0.052 0.046 0.037 0.026 0.028 0.036 0.048
2 4 0.047 0.034 0.027 0.038 0.047 0.042 0.026 0.038 0.029 0.046 0.040 0.061 0.025
2 5 0.034 0.026 0.035 0.038 0.048 0.035 0.033 0.032 0.040 0.041 0.045 0.033 0.036
2 6 0.033 0.034 0.036 0.036 0.048 0.040 0.041 0.049 0.058 0.028 0.021 0.043 0.049
2 7 0.042 0.037 0.041 0.059 0.031 0.027 0.043 0.046 0.028 0.021 0.044 0.048 0.040
3 4 0.037 0.045 0.033 0.028 0.029 0.073 0.026 0.040 0.040 0.026 0.043 0.042 0.043
3 5 0.035 0.029 0.036 0.044 0.055 0.034 0.033 0.046 0.041 0.024 0.041 0.067 0.037
3 6 0.023 0.043 0.074 0.047 0.033 0.043 0.030 0.026 0.042 0.045 0.032 0.035 0.040
3 7 0.035 0.035 0.035 0.028 0.048 0.033 0.035 0.041 0.038 0.052 0.038 0.029 0.062
4 5 0.032 0.041 0.036 0.032 0.046 0.035 0.039 0.042 0.038 0.034 0.043 0.036 0.048
4 6 0.034 0.034 0.036 0.029 0.043 0.037 0.039 0.036 0.039 0.033 0.066 0.037 0.028
4 7 0.043 0.032 0.039 0.034 0.029 0.071 0.037 0.039 0.030 0.044 0.037 0.030 0.041
5 6 0.052 0.035 0.019 0.036 0.063 0.045 0.030 0.039 0.049 0.029 0.036 0.052 0.041
5 7 0.040 0.031 0.034 0.052 0.026 0.034 0.051 0.044 0.041 0.039 0.034 0.046 0.029
6 7 0.029 0.035 0.039 0.032 0.028 0.039 0.026 0.036 0.069 0.052 0.035 0.034 0.038
Table 5.5: Mutual indices of coincidence of Table 5.2 for shifted blocks
s1 = zlxrhrrhwloehdweoklilwvlhphqbynwhwfjulrxx
s2 = pazjzezzitlwboamqbvuzpjpvmtiimiquptiqjkta
s3 = gjpgqpyvvndfjgjihjpagcqckfkhvqvvccqpmgtvn
s4 = dkydyvrrsliiocysoosvmdbfxkobdmxgbdmdoqiki
s5 = lpyljmiesieiwqarafrmsmrijrzmapmeoxoseewr
s6 = rygrzsggauayrhgzvrtsejnobqrqrbtfruofaavr
s7 = jllzqwzkowqkhkgqlygwvskwjtgsduzjvwwlvleg
She then compares the ith block si to the jth block shifted by σ, which we
denote by sj + σ, taking successively σ = 0, 1, 2, . . . , 25. Table 5.5 gives a
complete list of the 546 mutual indices of coincidence
MutIndCo(si, sj + σ) for 1 ≤ i < j ≤ 7 and 0 ≤ σ ≤ 25.
In Table 5.5, the entry in the row corresponding to (i, j) and the column
corresponding to the shift σ is equal to
MutIndCo(si, sj + σ) = MutIndCo(Block si, Block sj shifted by σ). (5.11)
If this quantity is large, it suggests that si has been shifted σ further than sj.
As in Sect. 5.2.1 we let
i j Shift MutIndCo Shift relation
1 3 1 0.067 β1 − β3 = 1
3 7 10 0.069 β3 − β7 = 10
1 4 19 0.071 β1 − β4 = 19
1 6 16 0.071 β1 − β6 = 16
3 4 18 0.073 β3 − β4 = 18
3 5 24 0.067 β3 − β5 = 24
3 6 15 0.074 β3 − β6 = 15
4 6 23 0.066 β4 − β6 = 23
4 7 18 0.071 β4 − β7 = 18
6 7 21 0.069 β6 − β7 = 21
Table 5.6: Large indices of coincidence and shift relations
βi = Amount that the block si has been shifted.
Then a large value for (5.11) makes it likely that
βi − βj = σ. (5.12)
We have underlined the large values (those greater than 0.065) in Table 5.5
and compiled them, with the associated shift relation (5.12), in Table 5.6.
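Scanning for the large entries of Table 5.5 can be automated. A Python sketch (function names are ours; the 0.065 cutoff follows the text) that returns each triple (i, j, σ) whose mutual index of coincidence exceeds the cutoff, i.e., each candidate relation βi − βj ≡ σ (mod 26):

```python
from collections import Counter

def mut_ind_co(s, t):
    """Probability that a random letter of s equals a random letter of t."""
    fs, ft = Counter(s), Counter(t)
    return sum(fs[c] * ft[c] for c in fs) / (len(s) * len(t))

def shift(s, sigma):
    """Shift every letter of s forward by sigma places (mod 26)."""
    return "".join(chr((ord(c) - 97 + sigma) % 26 + 97) for c in s)

def shift_relations(blocks, threshold=0.065):
    """All (i, j, sigma) with MutIndCo(s_i, s_j + sigma) > threshold;
    each hit suggests the relation beta_i - beta_j ≡ sigma (mod 26)."""
    rels = []
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            for sigma in range(26):
                if mut_ind_co(blocks[i], shift(blocks[j], sigma)) > threshold:
                    rels.append((i + 1, j + 1, sigma))
    return rels
```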
Eve’s next step is to solve the system of linear equations appearing in the
final column of Table 5.6, keeping in mind that all values are modulo 26, since
a shift of 26 is the same as no shift at all. Notice that there are 10 equations
for the six variables β1, β3, β4, β5, β6, β7. (Unfortunately, β2 does not appear,
so we’ll deal with it later.) In general, a system of 10 equations in 6 variables
has no solutions,7 but in this case a little bit of algebra shows that not only
is there a solution, there is actually one solution for each value of β1. In
other words, the full set of solutions is obtained by expressing each of the
variables β3, . . . , β7 in terms of β1:
β3 = β1 + 25, β4 = β1 + 7, β5 = β1 + 1, β6 = β1 + 10, β7 = β1 + 15.
(5.13)
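The "little bit of algebra" can be mechanized: view each relation βi − βj ≡ σ (mod 26) as an edge of a graph and propagate offsets outward from β1 by breadth-first search. A sketch, seeded with the relations from Table 5.6:

```python
from collections import deque

# Relations beta_i - beta_j ≡ sigma (mod 26), taken from Table 5.6.
relations = [(1, 3, 1), (3, 7, 10), (1, 4, 19), (1, 6, 16), (3, 4, 18),
             (3, 5, 24), (3, 6, 15), (4, 6, 23), (4, 7, 18), (6, 7, 21)]

def solve_offsets(relations, base=1):
    """Express every reachable beta_i as beta_base plus an offset mod 26 by
    breadth-first propagation; raises if the relations are inconsistent."""
    adj = {}
    for i, j, s in relations:
        adj.setdefault(i, []).append((j, -s % 26))  # beta_j = beta_i - sigma
        adj.setdefault(j, []).append((i, s % 26))   # beta_i = beta_j + sigma
    offsets = {base: 0}
    queue = deque([base])
    while queue:
        u = queue.popleft()
        for v, d in adj.get(u, []):
            val = (offsets[u] + d) % 26
            if v not in offsets:
                offsets[v] = val
                queue.append(v)
            elif offsets[v] != val:
                raise ValueError("inconsistent relations")
    return offsets
```

Running solve_offsets on the Table 5.6 relations reproduces (5.13): offsets 25, 7, 1, 10, 15 for β3, β4, β5, β6, β7 relative to β1.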
What should Eve do about β2? She could just ignore it for now, but instead
she picks out the largest values in Table 5.5 that relate to block 2 and uses
those. The largest such values are (i, j) = (2, 3) with shift 6 and index 0.060
and (i, j) = (2, 4) with shift 24 and index 0.061, which give the relations
β2 − β3 = 6 and β2 − β4 = 24.
7We were a little lucky in that every relation in Table 5.6 is correct. Sometimes there
are erroneous relations, but it is not hard to eliminate them with some trial and error.
Substituting in from (5.13), these both yield β2 = β1 + 5, and the fact that
they give the same value gives Eve confidence that they are correct.
Shift Keyword Decrypted text
0 AFZHBKP zkhwkhulvkdoowxuqrxwwrehwkhkhurripbrzqolih
1 BGAICLQ yjgvjgtkujcnnvwtpqwvvqdgvjgjgtqqhoaqypnkhg
2 CHBJDMR xifuifsjtibmmuvsopvuupcfuififsppgnzpxomjgf
3 DICKENS whetherishallturnouttobetheheroofmyownlife
4 EJDLFOT vgdsgdqhrgzkkstqmntssnadsgdgdqnnelxnvmkhed
5 FKEMGPU ufcrfcpgqfyjjrsplmsrrmzcrfcfcpmmdkwmuljgdc
6 GLFNHQV tebqebofpexiiqroklrqqlybqebebollcjvltkifcb
7 HMGOIRW sdapdaneodwhhpqnjkqppkxapdadankkbiuksjheba
8 INHPJSX rczoczmdncvggopmijpoojwzoczczmjjahtjrigdaz
...
Table 5.7: Decryption of Table 5.2 using shifts of the keyword AFZHBKP
To summarize, Eve now knows that however much the first block s1 is
rotated, blocks s2, s3, . . . , s7 are rotated, respectively, 5, 25, 7, 1, 10, and 15
steps further than s1. So for example, if s1 is not rotated at all (i.e., if β1 = 0
and the first letter of the keyword is A), then the full keyword is AFZHBKP. Eve
uses the keyword AFZHBKP to decrypt the first few blocks of the ciphertext,
finding the “plaintext”
zkhwkhulvkdoowxuqrxwwrehwkhkhurripbrzqolihruzkhwkh.
That doesn’t look good! So next she tries β1 = 1 and a keyword starting with
the letter B. Continuing in this fashion, she need only check the 26 possibilities
for β1. The results are listed in Table 5.7.
Taking β1 = 3 yields the keyword DICKENS and an acceptable plaintext.
Completing the decryption using this keyword and supplying the appropriate
word breaks, punctuation, and capitalization, Eve recovers the full plaintext:
Whether I shall turn out to be the hero of my own life, or whether
that station will be held by anybody else, these pages must show.
To begin my life with the beginning of my life, I record that I was
born (as I have been informed and believe) on a Friday, at twelve
o’clock at night. It was remarked that the clock began to strike,
and I began to cry, simultaneously.8
8David Copperfield, 1850, Charles Dickens.
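Vigenère decryption itself just subtracts the repeating keyword, letter by letter. A short sketch that reproduces the recovery above:

```python
def vigenere_decrypt(ciphertext, keyword):
    """Subtract the repeating keyword from the ciphertext, letter by letter."""
    key = [ord(k) - 97 for k in keyword.lower()]
    return "".join(chr((ord(c) - 97 - key[i % len(key)]) % 26 + 97)
                   for i, c in enumerate(ciphertext))
```

Note that shifting each letter of AFZHBKP forward by 3 gives DICKENS, which is exactly the β1 = 3 row of Table 5.7.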
5.3 Probability Theory
5.3.1 Basic Concepts of Probability Theory
In this section we introduce the basic ideas of probability theory in the dis-
crete setting. A probability space consists of two pieces. The first is a finite
set Ω consisting of all possible outcomes of an experiment and the second is
a method for assigning a probability to each possible outcome. In mathemat-
ical terms, a probability space is a finite set of outcomes Ω, called the sample
space, and a function
Pr : Ω −→ R.
We want the function Pr to satisfy our intuition that
Pr(ω) = “probability that event ω occurred.”
In particular, the value of Pr(ω) should be between 0 and 1.
Example 5.18. Consider the toss of a single coin. There are two outcomes,
heads and tails, so we let Ω be the set {H, T}. Assuming that it is a fair coin,
each outcome is equally likely, so Pr(H) = Pr(T) = 1/2.
Example 5.19. Consider the roll of two dice. The sample space Ω is the fol-
lowing set of 36 pairs of numbers:
Ω = {(n, m) : n, m ∈ Z with 1 ≤ n, m ≤ 6}.
As in Example 5.18, each possible outcome is equally likely. For example,
the probability of rolling (6, 6) is the same as the probability of rolling (3, 4).
Hence
Pr((n, m)) = 1/36
for any choice of (n, m). Note that order matters in this scenario. We might
imagine that one die is red and the other is blue, so “red 3 and blue 5” is a
different outcome from “red 5 and blue 3.”
Example 5.20. Suppose that an urn contains 100 balls, of which 21 are red
and the rest are blue. If we pick 10 balls at random (without replacement),
what is the probability that exactly 3 of them are red?
The total number of ways of selecting 10 balls from among 100 is the binomial
coefficient C(100, 10). Similarly, there are C(21, 3) ways to select 3 red balls
from among the 21 that are red, and there are C(79, 7) ways to pick the other
7 balls from among the 79 that are blue. There are thus C(21, 3) · C(79, 7)
ways to select exactly 3 red balls and exactly 7 blue balls. Hence the
probability of picking exactly 3 red balls in 10 tries is

Pr(exactly 3 red balls in 10 attempts) = C(21, 3) · C(79, 7) / C(100, 10)
= 20271005/91015876 ≈ 0.223.
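The arithmetic can be checked exactly with Python's math.comb and fractions.Fraction:

```python
from fractions import Fraction
from math import comb

# Exactly 3 of the 10 drawn balls are red (21 red, 79 blue in the urn).
p = Fraction(comb(21, 3) * comb(79, 7), comb(100, 10))
```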
We are typically more interested in computing the probability of compound
events. These are subsets of the sample space that may include more than one
outcome. For example, in the roll of two dice in Example 5.19, we might be
interested in the probability that at least one of the dice shows a 6. This
compound event is the subset of Ω consisting of all outcomes that include the
number six, which is the set
{(1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (6, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5)}.
Suppose that we know the probability of each particular outcome. How
then do we compute the probability of compound events or of events consisting
of repeated independent trials of an experiment? Analyzing this problem leads
to the idea of independence of events, a concept that gives probability theory
much of its complexity and richness.
The formal theory of probability is an axiomatic theory. You have probably
seen such theories when you studied Euclidean geometry and when you studied
abstract vector spaces. In an axiomatic theory, one starts with a small list of
basic axioms and derives from them additional interesting facts and formulas.
The axiomatic theory of probability allows us to derive formulas to compute
the probabilities of compound events. In this book we are content with an
informal presentation of the theory, but for those who are interested in a more
rigorous axiomatic treatment of probability theory, see for example [112, §2.3].
We begin with some definitions.
Definition. A sample space (or set of outcomes) is a finite9 set Ω. Each
outcome ω ∈ Ω is assigned a probability Pr(ω), where we require that the
probability function
Pr : Ω −→ R
satisfy the following two properties:
(a) 0 ≤ Pr(ω) ≤ 1 for all ω ∈ Ω and (b) Σ_{ω∈Ω} Pr(ω) = 1. (5.14)
Notice that (5.14)(a) corresponds to our intuition that every outcome
has a probability between 0 (if it never occurs) and 1 (if it always occurs),
while (5.14)(b) says that some outcome must occur, so Ω contains all possible
outcomes for the experiment.
Definition. An event is any subset of Ω. We assign a probability to an event
E ⊂ Ω by setting
Pr(E) = Σ_{ω∈E} Pr(ω). (5.15)
In particular, Pr(∅) = 0 by convention, and Pr(Ω) = 1 from (5.14)(b).
9General (continuous) probability theory also deals with infinite sample spaces Ω, in
which case only certain subsets of Ω are allowed to be events and are assigned probabilities.
There are also further restrictions on the probability function Pr : Ω → R. For our study of
cryptography in this book, it suffices to use discrete (finite) sample spaces.
Definition. We say that two events E and F are disjoint if E ∩ F = ∅.
It is clear that
Pr(E ∪ F) = Pr(E) + Pr(F) if E and F are disjoint,
since then E ∪ F is the collection of all outcomes in either E or F. When
E and F are not disjoint, the probability of the event E ∪ F is not the sum
of Pr(E) and Pr(F), since the outcomes common to both E and F should
not be counted twice. Thus we need to subtract the outcomes common to E
and F, which gives the useful formula
Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(E ∩ F). (5.16)
(See Exercise 5.20.)
Definition. The complement of an event E is the event Ec consisting of all
outcomes that are not in E, i.e.,
Ec = {ω ∈ Ω : ω ∉ E}.
The probability of the complementary event is given by
Pr(Ec) = 1 − Pr(E). (5.17)
It is sometimes easier to compute the probability of the complement of an
event E and then use (5.17) to find Pr(E).
Example 5.21. We continue with Example 5.19 in which Ω consists of the
possible outcomes of rolling two dice. Let E be the event
E = {at least one six is rolled}.
We can write down E explicitly; it is the set
E = {(1, 6), (6, 1), (2, 6), (6, 2), (3, 6), (6, 3), (4, 6), (6, 4), (5, 6), (6, 5), (6, 6)}.
Each of these 11 outcomes has probability 1/36, so
Pr(E) = Σ_{ω∈E} Pr(ω) = 11/36.
We can then compute the probability of not rolling a six as
Pr(no sixes are rolled) = Pr(Ec) = 1 − Pr(E) = 25/36.
Next consider the event F defined by
F = {no number higher than two is rolled}.
Notice that
F = {(1, 1), (1, 2), (2, 1), (2, 2)}
is disjoint from E, so the probability of either rolling a six or else rolling no
number higher than two is
Pr(E ∪ F) = Pr(E) + Pr(F) = 11/36 + 4/36 = 15/36.
For nondisjoint events, the computation is more complicated, since we
need to avoid double counting outcomes. Consider the event G defined by
G = {doubles},
i.e., G = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}. Then E and G both contain
the outcome (6, 6), so their union E ∪ G only contains 16 outcomes, not 17.
Thus the probability of rolling either a six or doubles is 16/36. We can also
compute this probability using formula (5.16),
Pr(E ∪ G) = Pr(E) + Pr(G) − Pr(E ∩ G) = 11/36 + 6/36 − 1/36 = 16/36 = 4/9.
To conclude this example, let H be the event
H = {the sum of the two dice is at least 4}.
We could compute Pr(H) directly, but it is easier to compute the probability
of Hc. Indeed, there are only three outcomes that give a sum smaller than 4,
namely
Hc = {(1, 1), (1, 2), (2, 1)}.
Thus Pr(Hc) = 3/36 = 1/12, and then Pr(H) = 1 − Pr(Hc) = 11/12.
Suppose now that E and F are events. The event consisting of both E
and F is the intersection E ∩ F, so the probability that both E and F occur
is
Pr(E and F) = Pr(E ∩ F).
As the next example makes clear, the probability of the intersection of two
events is not a simple function of the probabilities of the individual events.
Example 5.22. Consider the experiment consisting of drawing two cards from
a deck of cards, where the second card is drawn without replacing the first
card. Let E and F be the following events:
E = {the first card drawn is a king},
F = {the second card drawn is a king}.
Clearly Pr(E) = 1/13. It is also true that Pr(F) = 1/13, since with no information
about the value of the first card, there’s no difference between events E and F.
(If this seems unclear, suppose instead that the deck of cards were dealt to 52
people. Then the probability that any particular person gets a king is 1/13,
regardless of whether they received the first card or the second card or. . . .)
However, it is also clear that if we know whether event E has occurred, then
that knowledge does affect the probability of F occurring. More precisely, if E
occurs, then there are only 3 kings left in the remaining 51 cards, so F is less
likely, while if E does not occur, then there are 4 kings left and F is more
likely. Mathematically we find that
Pr(F if E has occurred) = 3/51 and Pr(F if E has not occurred) = 4/51.
Thus the probability of both E and F occurring, i.e., the probability of draw-
ing two consecutive kings, is smaller than the product of Pr(E) and Pr(F),
because the occurrence of the event E makes the event F less likely. The
correct computation is
Pr(drawing two kings) = Pr(E ∩ F)
= Pr(E) · Pr(F given that E has occurred)
= (1/13) · (3/51) = 1/221 ≈ 0.0045.
Let
G = {the second card drawn is an ace}.
Then the occurrence of E makes G more likely, since if the first card is known
to be a king, then there are still four aces left. Thus if we know that E occurs,
then the probability of G increases from 4/52 to 4/51.
Notice, however, that if we change the experiment and require that the first
card be replaced in the deck before the second card is drawn, then whether E
occurs has no effect at all on F. Thus using this card replacement scenario,
the probability that E and F both occur is simply the product
Pr(E) Pr(F) = (1/13)^2 ≈ 0.006.
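Both card computations check out exactly in rational arithmetic:

```python
from fractions import Fraction

# Without replacement: P(first is a king) times P(second is a king | first was).
p_two_kings = Fraction(4, 52) * Fraction(3, 51)

# With replacement the two draws are independent.
p_with_replacement = Fraction(4, 52) ** 2
```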
We learn two things from the discussion in Example 5.22. First, we see
that the probability of one event can depend on whether another event has
occurred. Second, we develop some probabilistic intuitions that lead to the
mathematical definition of independence.
Definition. Two events E and F are said to be independent if
Pr(E ∩ F) = Pr(E) · Pr(F),
where recall that the probability of the intersection Pr(E ∩ F) is the proba-
bility that both E and F occur. In other words, E and F are independent
if the probability of their both occurring is the product of their individual
probabilities of occurring.
Example 5.23. A coin is tossed 10 times and the results recorded. What are
the probabilities of the following events?
E1 = {the first five tosses are all heads}.
E2 = {the first five tosses are heads and the rest are tails}.
E3 = {exactly five of the ten tosses are heads}.
The result of any one toss is independent of the result of any other toss,
so we can compute the probability of getting H on the first five tosses by
multiplying together the probability of getting H on any one of these tosses.
Assuming that it is a fair coin, the answer to our first question is thus
Pr(E1) = (1/2)^5 = 1/32 ≈ 0.031.
In order to compute the probability of E2, note that we are now asking for
the probability that our sequence of tosses is exactly HHHHHTTTTT. Again
using the independence of the individual tosses, we see that
Pr(E2) = (1/2)^10 = 1/1024 ≈ 0.00098.
The computation of Pr(E3) is a little trickier, because it asks for exactly
five H’s to occur, but places no restriction on when they occur. If we were to
specify exactly when the five H’s and the five T’s occur, then the probability
would be 1/2^10, just as it was for E2. So all that we need to do is to count
how many ways we can distribute five H’s and five T’s into ten spots, or
equivalently, how many different sequences we can form consisting of five H’s
and five T’s. This is simply the number of ways of choosing five locations
from ten possible locations, which is given by the combinatorial symbol
C(10, 5).
Hence dividing the number of outcomes satisfying E3 by the total number of
outcomes, we find that
Pr(E3) = C(10, 5) · 1/2^10 = 252/1024 = 63/256 ≈ 0.246.
Thus there is just under a 25 % chance of getting exactly five heads in ten
tosses of a coin.
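The three answers are quick to verify exactly (variable names are ours):

```python
from fractions import Fraction
from math import comb

p_e1 = Fraction(1, 2) ** 5                   # first five tosses all heads
p_e2 = Fraction(1, 2) ** 10                  # exactly the sequence HHHHHTTTTT
p_e3 = comb(10, 5) * Fraction(1, 2) ** 10    # exactly five heads, any order
```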
5.3.2 Bayes’s Formula
As we saw in Example 5.22, there is a connection between the probability
that two events E and F occur simultaneously and the probability that one of
them occurs if we know that the other one has occurred. The former quantity
is simply Pr(E ∩ F). The latter quantity is called the conditional probability
of F on E.
Definition. The conditional probability of F on E is denoted by
Pr(F | E) = Pr(F given that E has occurred).
The probability that both E and F occur is related to the conditional
probability of F on E by the formula
Pr(F | E) = Pr(F ∩ E) / Pr(E). (5.18)
The intuition behind (5.18), which is usually taken as the definition of the
conditional probability Pr(F | E), is simple. On the left-hand side, we are
assuming that E occurs, so our sample space or universe is now E instead
of Ω. We are asking for the probability that the event F occurs in this smaller
universe of outcomes, so we should compute the proportion of the event F
that is included in the event E, divided by the total size of the event E itself.
This gives the right-hand side of (5.18).
Formula (5.18) immediately implies that
Pr(F | E) Pr(E) = Pr(F ∩ E) = Pr(E ∩ F) = Pr(E | F) Pr(F).
Dividing both sides by Pr(F) gives a preliminary version of Bayes’s
formula:
Pr(E | F) = Pr(F | E) Pr(E) / Pr(F) (Bayes’s formula). (5.19)
This formula is useful if we know the conditional probability of F on E and
want to know the reverse conditional probability of E on F.
Sometimes it is easier to compute the probability of an event by dividing
it into a union of disjoint events, as in the next proposition, which includes
another version of Bayes’s formula.
Proposition 5.24. Let E and F be events.
(a) Pr(E) = Pr(E | F) Pr(F) + Pr(E | Fc) Pr(Fc). (5.20)
(b) Pr(E | F) = Pr(F | E) Pr(E) / (Pr(F | E) Pr(E) + Pr(F | Ec) Pr(Ec)) (Bayes’s formula). (5.21)
Proof. The proof of (a) illustrates how one manipulates basic probability
formulas.
Pr(E | F) Pr(F) + Pr(E | Fc) Pr(Fc)
= Pr(E ∩ F) + Pr(E ∩ Fc) from (5.18),
= Pr((E ∩ F) ∪ (E ∩ Fc)) since E ∩ F and E ∩ Fc are disjoint,
= Pr(E) since F ∪ Fc = Ω.
This completes the proof of (a).
In order to prove (b), we reverse the roles of E and F in (a) to get
Pr(F) = Pr(F | E) Pr(E) + Pr(F | Ec) Pr(Ec), (5.22)
and then substitute (5.22) into the denominator of (5.19) to obtain (5.21).
Here are some examples that illustrate the use of conditional probabilities.
Bayes’s formula will be applied in the next section.
Example 5.25. We are given two urns10
containing gold and silver coins.
Urn #1 contains 10 gold coins and 5 silver coins, and Urn #2 contains 2 gold
coins and 8 silver coins. An urn is chosen at random, and then a coin is picked
at random. What is the probability of choosing a gold coin?
Let
E = {a gold coin is chosen}.
The probability of E depends first on which urn was chosen, and then on
which coin is chosen in that urn. It is thus natural to break E up according
to the outcome of the event
F = {Urn #1 is chosen}.
Notice that Fc is the event that Urn #2 is chosen. The decomposition
formula (5.20) says that
Pr(E) = Pr(E | F) Pr(F) + Pr(E | Fc) Pr(Fc).
The key point here is that it is easy to compute the conditional probabilities
on the right-hand side, and similarly easy to compute Pr(F) and Pr(Fc). Thus
Pr(E | F) = 10/15 = 2/3, Pr(E | Fc) = 2/10 = 1/5, Pr(F) = Pr(Fc) = 1/2.
Using these values, we can compute
Pr(E) = Pr(E | F) Pr(F) + Pr(E | Fc) Pr(Fc) = (2/3) · (1/2) + (1/5) · (1/2) = 13/30 ≈ 0.433.
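The decomposition is easy to check exactly (variable names are ours):

```python
from fractions import Fraction

p_urn1 = Fraction(1, 2)             # each urn equally likely
p_gold_urn1 = Fraction(10, 15)      # 10 gold among 15 coins in Urn #1
p_gold_urn2 = Fraction(2, 10)       # 2 gold among 10 coins in Urn #2
p_gold = p_gold_urn1 * p_urn1 + p_gold_urn2 * (1 - p_urn1)
```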
Example 5.26 (The Three Prisoners Problem). The three prisoners problem is
a classical problem about conditional probability. Three prisoners, Alice, Bob,
and Carl, are informed by their jailer that the next day, one of them will be
released from prison, but that the other two will have to serve life sentences.
The jailer says that he will not tell any prisoner what will happen to him or
her. But Alice, who reasons that her chances of going free are now 1/3, asks
the jailer to give her the name of one prisoner, other than herself, who will
10The authors of [51, chapter 1] explain the ubiquity of urns in the field of probability
theory as being connected with the French phrase aller aux urnes (to vote).
not go free. The jailer tells Alice that Bob will remain in jail. Now what are
Alice’s chances of going free? Has the probability changed? Alice could argue
that she now has a 1/2 chance of going free, since Bob will definitely remain
behind. On the other hand, it also seems reasonable to argue that since one
of Bob or Carl had to stay in jail, this new information could not possibly
change the odds for Alice.
In fact, either answer may be correct. It depends on the strategy that the
jailer follows in deciding which name to give to Alice (assuming that Alice
knows which strategy is being used). If the jailer picks a name at random
whenever both Bob and Carl are possible choices, then Alice’s chances of
freedom have not changed. However, if the jailer names Bob whenever possi-
ble, and otherwise names Carl, then the new information does indeed change
Alice’s probability of release to 1/2. See Exercise 5.26.
There are many other versions of the three prisoners problem, including
the “Monty Hall problem” that is a staple of popular culture. Exercise 5.27
describes the Monty Hall problem and other fun applications of these ideas.
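The two jailer strategies can be compared by exact enumeration rather than by simulation. A sketch (the function and its argument are our own framing: the argument is the chance the jailer says "Bob" when Alice is the one going free, the only case in which both Bob and Carl qualify):

```python
from fractions import Fraction

def p_alice_free_given_bob_named(p_say_bob_if_alice_free):
    """Pr(Alice goes free | jailer names Bob), each prisoner equally likely
    to be freed and the jailer always naming one of Bob/Carl who stays."""
    third = Fraction(1, 3)
    p_bob_named = (third * p_say_bob_if_alice_free  # Alice free: jailer may choose
                   + third * 0                      # Bob free: jailer must say Carl
                   + third * 1)                     # Carl free: jailer must say Bob
    return third * p_say_bob_if_alice_free / p_bob_named
```

With the random strategy (argument 1/2) Alice's conditional probability stays at 1/3; with the Bob-whenever-possible strategy (argument 1) it rises to 1/2, exactly as claimed above.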
5.3.3 Monte Carlo Algorithms
There are many algorithms whose output is not guaranteed to be correct. For
example, Table 3.2 in Sect. 3.4 describes the Miller–Rabin algorithm, which
is used to check whether a given large number is prime. In practice, one runs
the algorithm many times to obtain an output that is “probably” correct.
In applying these so-called Monte Carlo or probabilistic algorithms, it is im-
portant to be able to compute a confidence level, which is the probability that
the output is indeed correct. In this section we describe how to use Bayes’s
formula to do such a computation.
The basic scenario consists of a large (possibly infinite) set of integers S
and an interesting property A. For example, S could be the set of all integers,
or more realistically S might be the set of all integers between, say, 2^1024
and 2^1025. An example of an interesting property A is the property of being
composite.
Now suppose that we are looking for numbers that do not have property A.
Using the Miller–Rabin test, we might be looking for integers between 2^1024
and 2^1025 that are not composite, i.e., that are prime. In general, suppose
that we are given an integer m in S and that we want to know whether m has
property A. Usually we know approximately how many of the integers in S
have property A. For example, we might know that 99 % of elements have
property A and that the other 1 % do not. However, it may be difficult to
determine with certainty that any particular m ∈ S does not have property A.
So instead we settle for a faster algorithm that is not absolutely certain to be
correct.
A Monte Carlo algorithm for property A takes as its input both a num-
ber m ∈ S to be tested and a randomly chosen number r and returns as
output either Yes or No according to the following rules:
(1) If the algorithm returns Yes, then m definitely has property A. In con-
ditional probability notation, this says that
Pr(m has property A | algorithm returns Yes) = 1.
(2) If m has property A, then the algorithm returns Yes for at least 50 % of
the choices for r.11 Using conditional probability notation,
Pr(algorithm returns Yes | m has property A) ≥ 1/2.
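For concreteness, here is a sketch of a single trial in this mold, modeled on the Miller–Rabin test of Sect. 3.4 with property A being "composite" (this is the standard witness computation; a True answer proves compositeness, while False is inconclusive):

```python
def is_composite_witness(n, a):
    """One Monte Carlo trial for an odd n > 2: True ('Yes') means the base a
    proves that n is composite; False ('No') is inconclusive."""
    q, k = n - 1, 0
    while q % 2 == 0:                # write n - 1 = 2^k * q with q odd
        q, k = q // 2, k + 1
    x = pow(a, q, n)                 # a^q mod n
    if x == 1 or x == n - 1:
        return False
    for _ in range(k - 1):           # square up to k - 1 more times
        x = pow(x, 2, n)
        if x == n - 1:
            return False
    return True
```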
Now suppose that we run the algorithm N times on an integer m ∈ S,
using N different randomly chosen values for r. If even a single trial re-
turns Yes, then we know that m has property A. But suppose instead that
all N trials return the answer No. How confident can we be that our integer
does not have property A? In probability terminology, we want to estimate
Pr(m does not have property A | algorithm returns No N times).
More precisely, we want to show that if N is large, then this probability is
close to 1.
We define two events:
E = {an integer in S does not have property A},
F = {the algorithm returns No N times in a row}.
We are interested in the conditional probability Pr(E | F), that is, the
probability that m does not have property A, given the fact that the al-
gorithm returned No N times. We can compute this probability using Bayes’s
formula (5.21),
    Pr(E | F) = Pr(F | E) Pr(E) / (Pr(F | E) Pr(E) + Pr(F | E^c) Pr(E^c)).
We are given that 99 % of the elements in S have property A, so
    Pr(E) = 0.01   and   Pr(E^c) = 0.99.
Next consider Pr(F | E). If m does not have property A, which is our assump-
tion on this conditional probability, then the algorithm always returns No,
since Property (1) of the Monte Carlo method tells us that a Yes output
forces m to have property A. In symbols, Property (1) says that
Pr(No | not A) = Pr(A | Yes) = 1.
11More generally, the success rate in a Monte Carlo algorithm need not be 50 %, but
may instead be any positive probability that is not too small. For the Miller–Rabin test
described in Sect. 3.4, the corresponding probability is 75 %. See Exercise 5.28 for details.
238 5. Combinatorics, Probability, and Information Theory
It follows that Pr(F | E) = Pr(No | not A)^N = 1.
Finally, we must compute the value of Pr(F | E^c). Since the algorithm is
run N independent times, we have

    Pr(F | E^c) = Pr(Output is No | m has property A)^N
                = (1 − Pr(Output is Yes | m has property A))^N
                ≤ (1 − 1/2)^N    from Property (2) of the Monte Carlo method,
                = 1/2^N.
Substituting these values into Bayes’s formula, we find that if the algorithm
returns No N times in a row, then the probability that the integer m does not
have property A is

    Pr(E | F) ≥ (1 · (0.01)) / (1 · (0.01) + 2^−N · (0.99)) = 1/(1 + 99 · 2^−N) = 1 − 99/(2^N + 99).
Notice that if N is large, the lower bound is very close to 1.
For example, if we run the algorithm 100 times and get 100 No answers,
then the probability that m does not have property A is at least

    1 − 99/(2^100 + 99) ≈ 1 − 10^−28.1.
So for most practical purposes, it is safe to conclude that m does not have
property A.
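As a small illustration (ours, not from the text), the exact Bayesian lower bound 1/(1 + 99 · 2^−N) is easy to evaluate with Python's exact rational arithmetic; the helper name no_answer_confidence is an assumption:

```python
from fractions import Fraction

def no_answer_confidence(N, frac_with_A=Fraction(99, 100)):
    """Exact lower bound on Pr(m lacks property A | N No answers in a row),
    from Bayes's formula with Pr(Yes | m has A) >= 1/2 per trial."""
    p_Ec = frac_with_A                  # Pr(E^c): m has property A
    p_E = 1 - p_Ec                      # Pr(E): m does not have property A
    p_F_given_Ec = Fraction(1, 2) ** N  # worst case Pr(F | E^c) = 2^-N
    return p_E / (p_E + p_F_given_Ec * p_Ec)

print(float(no_answer_confidence(10)))   # already above 0.91
print(float(no_answer_confidence(100)))  # within about 10^-28 of 1
```

Even N = 10 repetitions pushes the confidence above 91 %, and the gap to certainty shrinks by a factor of 2 with each additional trial.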
5.3.4 Random Variables
We are generally more interested in the consequences of an experiment, for
example the net loss or gain from a game of chance, than in the experiment
itself. Mathematically, this means that we are interested in functions that are
defined on events and that take values in some set.
Definition. A random variable is a function
X : Ω −→ R
whose domain is the sample space Ω and that takes values in the real numbers.
More generally, a random variable is a function X : Ω → S whose range may
be any set; for example, S could be a set of keys or a set of plaintexts.
We note that since our sample spaces are finite, a random variable takes
on only finitely many values. Random variables are useful for defining events.
For example, if X : Ω → R is a random variable, then any real number x
defines three interesting events,
{ω ∈ Ω : X(ω) ≤ x},   {ω ∈ Ω : X(ω) = x},   {ω ∈ Ω : X(ω) > x}.
Definition. Let X : Ω → R be a random variable. The probability density
function of X, denoted by fX(x), is defined to be
fX(x) = Pr(X = x).
In other words, fX (x) is the probability that X takes on the value x. Some-
times we write f(x) if the random variable is clear.
Remark 5.27. In probability theory, people often use the distribution function
of X, which is the function
FX(x) = Pr(X ≤ x),
instead of the density function. Indeed, when studying probability theory for
infinite sample spaces, it is essential to use FX. However, since our sample
spaces are finite, and thus our random variables are finite and discrete, the
two notions are essentially interchangeable. For simplicity, we will stick to
density functions.
There are a number of standard density functions that occur frequently
in discrete probability calculations. We briefly describe a few of the more
common ones.
Example 5.28 (Uniform Distribution). Let S be a set containing N elements;
for example, S could be the set S = {0, 1, . . . , N − 1}. Let X be a random
variable satisfying
    fX(j) = Pr(X = j) = 1/N if j ∈ S,   and   fX(j) = 0 if j ∉ S.
This random variable X is said to be uniformly distributed or to have uniform
density, since each of the outcomes in S is equally likely.
Example 5.29 (Binomial Distribution). Suppose that an experiment has two
outcomes, success or failure. Let p denote the probability of success. The
experiment is performed n times and the random variable X records the
number of successes. The sample space Ω consists of all binary strings ω =
b1b2 . . . bn of length n, where bi = 0 if the i’th experiment is a failure and bi = 1
if the i’th experiment is a success. Then the value of the random variable X
at ω is simply X(ω) = b1+b2+· · ·+bn, which is the number of successes. Using
the random variable X, we can express the probability of a single event ω as
    Pr({ω}) = p^X(ω) · (1 − p)^(n−X(ω)).
(Do you see why this is the correct formula?) This allows us to compute the
probability of exactly k successes as
    fX(k) = Pr(X = k)
          = Σ_{ω∈Ω, X(ω)=k} Pr({ω})
          = Σ_{ω∈Ω, X(ω)=k} p^X(ω) (1 − p)^(n−X(ω))
          = Σ_{ω∈Ω, X(ω)=k} p^k (1 − p)^(n−k)
          = #{ω ∈ Ω : X(ω) = k} · p^k (1 − p)^(n−k)
          = (n choose k) p^k (1 − p)^(n−k).
Here the last line follows from the fact that there are (n choose k) ways to select
exactly k of the n experiments to be successes. The function

    fX(k) = (n choose k) p^k (1 − p)^(n−k)    (5.23)

is called the binomial density function.
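As a sketch (ours), formula (5.23) is one line of Python using the standard binomial coefficient function; the helper name binomial_pdf is an assumption:

```python
from math import comb

def binomial_pdf(n, p, k):
    """Density (5.23): f_X(k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The values over k = 0, ..., n sum to 1, by the binomial theorem.
n, p = 10, 0.3
print(round(sum(binomial_pdf(n, p, k) for k in range(n + 1)), 10))  # 1.0
```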
Example 5.30 (Hypergeometric Distribution). An urn contains N balls of
which m are red and N − m are blue. From this collection, n balls are chosen
at random without replacement. Let X denote the number of red balls chosen.
Then X is a random variable taking on the integer values
0 ≤ X(ω) ≤ min{m, n}.
In the case that n ≤ m, an argument similar to the one that we gave in
Example 5.20 shows that the density function of X is given by the formula
    fX(i) = Pr(X = i) = (m choose i)(N − m choose n − i) / (N choose n).    (5.24)
This is called the hypergeometric density function.
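Formula (5.24) also translates directly into Python; this sketch (ours, with the hypothetical helper name hypergeometric_pdf) evaluates it for a small urn:

```python
from math import comb

def hypergeometric_pdf(N, m, n, i):
    """Density (5.24): probability of exactly i red among n draws without
    replacement from an urn of N balls, m of them red (assumes n <= m)."""
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

# Urn with N = 7 balls, m = 4 red, n = 2 draws: chance of exactly one red.
print(hypergeometric_pdf(7, 4, 2, 1))  # 12/21 = 0.5714...
```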
Example 5.31 (Geometric Distribution). We give one example of an infinite
probability space. Suppose that we repeatedly toss an unfair coin, where the
probability of getting heads is some number 0 < p < 1. Let X be the random
variable giving the total number of coin tosses required before heads appears
for the first time. Note that it is possible for X to take on any positive integer
value, since it is possible (although unlikely) that we could have a tremen-
dously long string of tails.12
The sample space Ω consists of all binary strings ω = b1b2b3 . . ., where
bi = 0 if the i’th toss is tails and bi = 1 if the i’th toss is heads. Note that Ω is
12For an amusing commentary on long strings of heads, see Act I of Tom Stoppard’s
Rosencrantz and Guildenstern Are Dead.
an infinite set. We assign probabilities to certain events, i.e. to certain subsets
of Ω, by specifying the values of some initial tosses. So for any given finite
binary string γ1γ2 . . . γn, we assign a probability
    Pr({ω ∈ Ω : ω starts γ1γ2 . . . γn}) = p^(# of γi equal to 1) · (1 − p)^(# of γi equal to 0).
The random variable X is defined by
X(ω) = X(b1b2b3 . . .) = (smallest i such that bi = 1).
Then

    {X = n} = {ω ∈ Ω : X(ω) = n} = {00 . . . 0 1 b_{n+1} b_{n+2} . . .}   (n − 1 zeros before the first 1),

which gives the formula

    fX(n) = Pr(X = n) = (1 − p)^(n−1) p    for n = 1, 2, 3, . . . .    (5.25)
A random variable with the density function (5.25) is said to have a geometric
density, because the sequence of probabilities fX(1), fX(2), fX(3), . . . forms a
geometric progression.13 Later, in Example 5.37, we compute the expected
value of this X by summing an infinite geometric series.
Earlier we studied aspects of probability theory involving two or more
events interacting in various ways. We now discuss material that allows us to
study the interaction of two or more random variables.
Definition. Let X and Y be two random variables. The joint density function
of X and Y , denoted by fX,Y (x, y), is the probability that X takes the value x
and Y takes the value y. Thus14
fX,Y (x, y) = Pr(X = x and Y = y).
Similarly, the conditional density function, denoted by fX|Y (x | y), is the prob-
ability that X takes the value x, given that Y takes the value y:
fX|Y (x | y) = Pr(X = x | Y = y).
We say that X and Y are independent if
13A sequence a1, a2, a3, . . . is called a geometric progression if all of the ratios an+1/an
are the same. Similarly, the sequence is an arithmetic progression if all of the differences
an+1 − an are the same.
14Note that the expression Pr(X = x and Y = y) is really shorthand for the probability
of the event {ω ∈ Ω : X(ω) = x and Y (ω) = y}.
If you find yourself becoming confused about probabilities expressed in terms of values of
random variables, it often helps to write them out explicitly in terms of an event, i.e., as
the probability of a certain subset of Ω.
fX,Y (x, y) = fX(x)fY (y) for all x and y.
This is equivalent to the events {X = x} and {Y = y} being independent in
the earlier sense of independence that is defined on page 232. If there is no
chance for confusion, we sometimes write f(x, y) and f(x | y) for fX,Y (x, y)
and fX|Y (x | y), respectively.
Example 5.32. An urn contains four gold coins and three silver coins. A coin is
drawn at random, examined, and returned to the urn, and then a second coin is
randomly drawn and examined. Let X be the number of gold coins drawn and
let Y be the number of silver ones. To find the joint density function fX,Y (x, y),
we need to compute the probability of the event {X = x and Y = y}. To help
explain the calculation, we define two additional random variables. Let
F = 1 if the first pick is gold and F = 0 if it is silver, and similarly S = 1
if the second pick is gold and S = 0 if it is silver. Notice that X = F + S
and Y = 2 − X = 2 − F − S. Further, the random variables F and S are
independent, and Pr(F = 1) = Pr(S = 1) = 4/7. We can
compute fX,Y (1, 1) as follows:
    fX,Y (1, 1) = Pr(X = 1 and Y = 1)
                = Pr(F = 1 and S = 0) + Pr(F = 0 and S = 1)
                = Pr(F = 1) · Pr(S = 0) + Pr(F = 0) · Pr(S = 1)
                = (4/7) · (3/7) + (3/7) · (4/7) = 24/49 ≈ 0.4898.
In other words, the probability of drawing one gold coin and one silver coin
is about 0.4898. The computation of the other values of fX,Y is similar.
These computations were easy because F and S are independent. How do
our computations change if the first coin is not replaced before the second
coin is selected? Then the probability of getting a silver coin on the second
pick depends on whether the first pick was gold or silver. For example, the
earlier computation of fX,Y (1, 1) changes to
    fX,Y (1, 1) = Pr(X = 1 and Y = 1)
                = Pr(F = 1 and S = 0) + Pr(F = 0 and S = 1)
                = Pr(S = 0 | F = 1) Pr(F = 1) + Pr(S = 1 | F = 0) Pr(F = 0)
                = (3/6) · (4/7) + (4/6) · (3/7) = 4/7 ≈ 0.5714.
Thus the chance of getting exactly one gold coin and exactly one silver coin
is somewhat larger if the coins are not replaced after each pick.
We remark that this last computation is a special case of the hypergeometric
distribution; see Example 5.30. Thus the value fX,Y (1, 1) = 4/7 may be
computed using (5.24) with N = 7, m = 4, n = 2, and i = 1, which yields

    (4 choose 1)(3 choose 1) / (7 choose 2) = 12/21 = 4/7.
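Both joint-density values can be checked by brute-force enumeration of ordered pairs of picks. A Python sketch (ours; the helper name one_gold_one_silver is an assumption):

```python
from fractions import Fraction
from itertools import product, permutations

coins = ["G"] * 4 + ["S"] * 3  # four gold, three silver

def one_gold_one_silver(draws):
    """Fraction of the equally likely ordered draws with exactly one gold."""
    hits = sum(1 for a, b in draws if (a == "G") != (b == "G"))
    return Fraction(hits, len(draws))

# With replacement: all 7 x 7 ordered pairs are equally likely.
print(one_gold_one_silver(list(product(coins, repeat=2))))   # 24/49
# Without replacement: all 7 x 6 ordered pairs of distinct coins.
print(one_gold_one_silver(list(permutations(coins, 2))))     # 4/7
```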
The following restatement of Bayes’s formula is often convenient for cal-
culations involving conditional probabilities.
Theorem 5.33 (Bayes’s formula). Let X and Y be random variables and
assume that fY (y) > 0. Then

    fX|Y (x | y) = fX(x) fY |X(y | x) / fY (y).
In particular,
X and Y are independent ⇐⇒ fX|Y (x | y) = fX(x) for all x and y.
Example 5.34. In this example we use Bayes’s formula to explore the inde-
pendence of pairs of random variables taken from a triple (X, Y, Z). Let X
and Y be independent random variables taking on values +1 and −1 with
probability 1/2 each, and let Z = XY . Then Z also takes on the values +1
and −1, and we have

    fZ(1) = Σ_{x∈{−1,+1}} Σ_{y∈{−1,+1}} Pr(Z = 1 | X = x and Y = y) · fX,Y (x, y).    (5.26)
If (X, Y ) = (+1, −1) or (X, Y ) = (−1, +1), then Z = −1, so only the two terms
with (x, y) = (1, 1) and (x, y) = (−1, −1) appear in the sum (5.26). For these
two terms, we have Pr(Z = 1 | X = x and Y = y) = 1, so
    fZ(1) = Pr(X = 1 and Y = 1) + Pr(X = −1 and Y = −1)
          = (1/2) · (1/2) + (1/2) · (1/2) = 1/2.

It follows that fZ(−1) = 1 − fZ(1) is also equal to 1/2.
Next we compute the joint probability density of Z and X. For example,
    fZ,X(1, 1) = Pr(Z = 1 and X = 1)
               = Pr(X = 1 and Y = 1)
               = 1/4    since X and Y are independent,
               = fZ(1) fX(1).
Similar computations show that
fZ,X(z, x) = fZ(z)fX (x) for all z, x ∈ {−1, +1},
so by Theorem 5.33, Z and X are independent. The argument works equally
well for Z and Y , so Z and Y are also independent. Thus among the three
random variables X, Y , and Z, any pair of them are independent. Yet we
would not want to call the three of them together an independent family,
since the value of Z is determined by the values of X and Y . This prompts
the following definition.
Definition. A family of two or more random variables {X1, X2, . . . , Xn} is
independent if the events
{X1 = x1 and X2 = x2 and · · · and Xn = xn}
are independent for every choice of x1, x2, . . . , xn.
Notice that the random variables X, Y and Z = XY in Example 5.34 are
not an independent family, since
Pr(Z = 1 and X = 1 and Y = −1) = 0,
while
    Pr(Z = 1) · Pr(X = 1) · Pr(Y = −1) = 1/8.
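The pairwise-but-not-family independence of X, Y, and Z = XY can be verified by enumerating the four equally likely outcomes. A Python sketch (ours; the names omega and prob are assumptions):

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes (x, y), with z = x * y.
omega = [(x, y, x * y) for x, y in product([1, -1], repeat=2)]

def prob(event):
    """Probability of an event, as a subset of the uniform sample space."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Each of the pairs (X, Z) and (Y, Z) is independent ...
for i in (0, 1):
    for a, b in product([1, -1], repeat=2):
        joint = prob(lambda w: w[i] == a and w[2] == b)
        assert joint == prob(lambda w: w[i] == a) * prob(lambda w: w[2] == b)

# ... but the triple (X, Y, Z) is not an independent family:
print(prob(lambda w: w == (1, -1, 1)))  # 0, not 1/8
```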
5.3.5 Expected Value
The expected value of a random variable X is the average of its values weighted
by their probability of occurrence. The expected value thus provides a rough
initial indication of the behavior of X.
Definition. Let X be a random variable that takes on the values x1, . . . , xn.
The expected value (or mean) of X is the quantity
    E(X) = Σ_{i=1}^{n} xi · fX(xi) = Σ_{i=1}^{n} xi · Pr(X = xi).    (5.27)
Example 5.35. Let X be the random variable whose value is the sum of the
numbers appearing on two tossed dice. The possible values of X are the inte-
gers between 2 and 12, so
    E(X) = Σ_{i=2}^{12} i · Pr(X = i).
There are 36 ways for the two dice to fall, as indicated in Table 5.8a. We read
off from that table the number of ways that the sum can equal i for each value
of i between 2 and 12 and compile the results in Table 5.8b. The probability
that X = i is 1/36 times the total number of ways that two dice can sum to i,
so we can use Table 5.8b to compute
    E(X) = 2·(1/36) + 3·(2/36) + 4·(3/36) + 5·(4/36) + 6·(5/36) + 7·(6/36)
         + 8·(5/36) + 9·(4/36) + 10·(3/36) + 11·(2/36) + 12·(1/36) = 7.
This answer makes sense, since the middle value is 7, and for any integer j,
the value of X is just as likely to be 7 + j as it is to be 7 − j.
         1   2   3   4   5   6
     1   2   3   4   5   6   7
     2   3   4   5   6   7   8
     3   4   5   6   7   8   9
     4   5   6   7   8   9  10
     5   6   7   8   9  10  11
     6   7   8   9  10  11  12
         (a) Sum of two dice

     Sum       # of ways
     2 or 12       1
     3 or 11       2
     4 or 10       3
     5 or 9        4
     6 or 8        5
     7             6
     (b) Number of ways to make a sum

Table 5.8: Outcome of rolling two dice
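The value E(X) = 7 can also be confirmed straight from definition (5.27) by enumerating all 36 outcomes; a short Python sketch (ours):

```python
from fractions import Fraction
from itertools import product

# E(X) for the sum of two fair dice, computed from definition (5.27).
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
density = {s: Fraction(sums.count(s), 36) for s in range(2, 13)}
print(sum(s * f for s, f in density.items()))  # 7
```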
The name “expected” value is somewhat misleading, since the fact that
the expectation E(X) is a weighted average means that it may take on a value
that is not actually attained, as the next example shows.
Example 5.36. Suppose that we choose an integer at random from among
the integers {1, 2, 3, 4, 5, 6} and let X be the value of our choice. Then
Pr(X = i) = 1/6 for each 1 ≤ i ≤ 6, i.e., X is uniformly distributed. The
expected value of X is

    E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 7/2.
Thus the expectation of X is a value that X does not actually attain. More
generally, the expected value of a random variable uniformly distributed
on {1, 2, . . . , N} is (N + 1)/2.
Example 5.37. We return to our coin tossing experiment (Example 5.31),
where the probability of getting H on any one coin toss is equal to p. Let X
be the random variable that is equal to n if H appears for the first time at the
nth coin toss. Then X has a geometric density, and its density function fX (n)
is given by the formula (5.25). We compute E(X), which is the expected
number of tosses before the first H appears:
    E(X) = Σ_{n=1}^{∞} n p (1 − p)^(n−1)
         = −p Σ_{n=1}^{∞} d/dp [(1 − p)^n]
         = −p d/dp [ Σ_{n=1}^{∞} (1 − p)^n ]
         = −p d/dp [ 1/p − 1 ]
         = p/p^2 = 1/p.
This answer seems plausible, since the smaller the value of p, the more tosses
we expect to need before obtaining our first H. The computation of E(X) uses
a very useful trick with derivatives followed by the summation of a geometric
series. See Exercise 5.33 for further applications of this method.
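The identity E(X) = 1/p can also be checked numerically by truncating the series; the following Python sketch (ours, with the hypothetical name truncated_geometric_mean) does so for several values of p:

```python
def truncated_geometric_mean(p, terms=10_000):
    """Partial sum of E(X) = sum over n >= 1 of n * p * (1 - p)^(n - 1)."""
    return sum(n * p * (1 - p) ** (n - 1) for n in range(1, terms + 1))

for p in (0.5, 0.1, 0.02):
    print(p, truncated_geometric_mean(p))  # close to 1/p: 2, 10, 50
```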
5.4 Collision Algorithms
and Meet-in-the-Middle Attacks
A simple, yet surprisingly powerful, search method is based on the observation
that it is usually much easier to find matching objects than it is to find a
particular object. Methods of this sort go by many names, including meet-in-
the-middle attacks and collision algorithms.
5.4.1 The Birthday Paradox
The fundamental idea behind collision algorithms is strikingly illustrated by
the famous birthday paradox. In a random group of 40 people, consider the
following two questions:
(1) What is the probability that someone has the same birthday as you?
(2) What is the probability that at least two people share the same birthday?
It turns out that the answers to (1) and (2) are very different. As a warm-up,
we start by answering the easier first question.
A rough answer is that since any one person has a 1-in-365 chance of
sharing your birthday, then in a crowd of 40 people, the probability of some-
one having your birthday is approximately 40/365 ≈ 11 %. However, this is an
overestimate, since it double counts the occurrences of more than one person
in the crowd sharing your birthday.15 The exact answer is obtained by
computing the probability that none of the people share your birthday and then
subtracting that value from 1.
    Pr(someone has your birthday)
       = 1 − Pr(none of the 40 people has your birthday)
       = 1 − ∏_{i=1}^{40} Pr(ith person does not have your birthday)
       = 1 − (364/365)^40 ≈ 10.4 %.
Thus among 40 strangers, there is only slightly better than a 10 % chance that
one of them shares your birthday.
Now consider the second question, in which you win if any two of the people
in the group have the same birthday. Again it is easier to compute the prob-
ability that all 40 people have different birthdays. However, the computation
15If you think that 40/365 is the right answer, think about the same situation with 366
people. The probability that someone shares your birthday cannot be 366/365, since that’s
larger than 1.
changes because we now require that the ith person have a birthday that is
different from all of the previous i − 1 people’s birthdays. Hence the calculation is

    Pr(two people have the same birthday)
       = 1 − Pr(all 40 people have different birthdays)
       = 1 − ∏_{i=1}^{40} Pr(ith person does not have the same birthday as any of the previous i − 1 people)
       = 1 − ∏_{i=1}^{40} (365 − (i − 1))/365
       = 1 − (365/365) · (364/365) · (363/365) · · · (326/365)
       ≈ 89.1 %.
Thus among 40 strangers, there is almost a 90 % chance that two of them
share a birthday.
The only part of this calculation that merits some comment is the formula
for the probability that the ith person has a birthday different from any of
the previous i − 1 people. Among the 365 possible birthdays, note that the
previous i − 1 people have taken up i − 1 of them. Hence the probability that
the ith person has his or her birthday among the remaining 365 − (i − 1)
days is (365 − (i − 1))/365.
Most people tend to assume that questions (1) and (2) have essentially the
same answer. The fact that they do not is called the birthday paradox. In fact,
it requires only 23 people to have a better than 50 % chance of a matched
birthday, while it takes 253 people to have better than a 50 % chance of
finding someone who has your birthday.
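Both calculations are easy to reproduce for any group size. A Python sketch (ours; the helper names are assumptions):

```python
def pr_shared_birthday(k):
    """Probability that some two of k people share a birthday (365-day year)."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

def pr_matches_yours(k):
    """Probability that at least one of k people has your birthday."""
    return 1 - (364 / 365) ** k

print(round(pr_shared_birthday(40), 3))  # 0.891
print(round(pr_matches_yours(40), 3))    # 0.104
print(pr_shared_birthday(23) > 0.5)      # True: 23 people suffice
print(pr_matches_yours(253) > 0.5)       # True: 253 people needed
```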
5.4.2 A Collision Theorem
Cryptographic applications of collision algorithms are generally based on the
following setup. Bob has a box that contains N numbers. He chooses n distinct
numbers from the box and puts them in a list. He then makes a second list by
choosing m (not necessarily distinct) numbers from the box. The remarkable
fact is that if n and m are each slightly larger than √N, then it is very likely
that the two lists contain a common element.
We start with an elementary result that illustrates the sort of calculation
that is used to quantify the probability of success of a collision algorithm.
Theorem 5.38 (Collision Theorem). An urn contains N balls, of which n
are red and N − n are blue. Bob randomly selects a ball from the urn, replaces
it in the urn, randomly selects a second ball, replaces it, and so on. He does
this until he has looked at a total of m balls.
(a) The probability that Bob selects at least one red ball is

    Pr(at least one red) = 1 − (1 − n/N)^m.    (5.28)

(b) A lower bound for the probability (5.28) is

    Pr(at least one red) ≥ 1 − e^(−mn/N).    (5.29)

If N is large and if m and n are not too much larger than √N (e.g.,
m, n < 10√N), then (5.29) is almost an equality.
Proof. Each time Bob selects a ball, his probability of choosing a red one is n/N,
so you might think that since he chooses m balls, his probability of getting
a red one is mn/N. However, a small amount of thought shows that this must
be incorrect. For example, if m is large, this would lead to a probability that
is larger than 1. The difficulty, just as in the birthday example in Sect. 5.4.1,
is that we are overcounting the times that Bob happens to select more than
one red ball. The correct way to calculate is to compute the probability that
Bob chooses only blue balls and then subtract this complementary probability
from 1. Thus

    Pr(at least one red ball in m attempts)
       = 1 − Pr(all m choices are blue)
       = 1 − ∏_{i=1}^{m} Pr(ith choice is blue)
       = 1 − ∏_{i=1}^{m} (N − n)/N
       = 1 − (1 − n/N)^m.
This completes the proof of (a).
For (b), we use the inequality

    e^(−x) ≥ 1 − x    for all x ∈ R.

(See Exercise 5.38(a) for a proof.) Setting x = n/N and raising both sides of
the inequality to the mth power shows that

    1 − (1 − n/N)^m ≥ 1 − (e^(−n/N))^m = 1 − e^(−mn/N),

which proves the important inequality in (b). We leave it to the reader (Ex-
ercise 5.38(b)) to prove that the inequality is close to being an equality if m
and n are not too large compared to √N.
In order to connect Theorem 5.38 with the problem of finding a match
in two lists of numbers, we view the list of numbers as an urn containing N
numbered blue balls. After making our first list of n different numbered balls,
we repaint those n balls with red paint and return them to the box. The
second list is constructed by drawing m balls out of the urn one at a time,
noting their number and color, and then replacing them. The probability of
selecting at least one red ball is the same as the probability of a matched
number on the two lists.
Example 5.39. A deck of cards is shuffled and eight cards are dealt face up.
Bob then takes a second deck of cards and chooses eight cards at random,
replacing each chosen card before making the next choice. What is Bob’s
probability of matching one of the cards from the first deck?
We view the eight dealt cards from the first deck as “marking” those same
cards in the second deck. So our “urn” is the second deck, the “red balls”
are the eight marked cards in the second deck, and the “blue balls” are the
other 44 cards in the second deck. Theorem 5.38(a) tells us that

    Pr(a match) = 1 − (1 − 8/52)^8 ≈ 73.7 %.

The approximation in Theorem 5.38(b) gives a lower bound of 70.8 %.
Suppose instead that Bob deals ten cards from the first deck and chooses
only five cards from the second deck. Then

    Pr(a match) = 1 − (1 − 10/52)^5 ≈ 65.6 %.
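Formulas (5.28) and (5.29) translate directly into code. This Python sketch (ours; the helper name pr_match is an assumption) reproduces the numbers of Example 5.39:

```python
from math import exp

def pr_match(N, n, m):
    """Exact probability (5.28) and lower bound (5.29) of at least one match."""
    return 1 - (1 - n / N) ** m, 1 - exp(-m * n / N)

# Example 5.39: 8 marked cards in a 52-card deck, 8 draws with replacement.
exact, bound = pr_match(52, 8, 8)
print(round(exact, 3), round(bound, 3))  # 0.737 0.708
# Ten marked cards, five draws.
print(round(pr_match(52, 10, 5)[0], 3))  # 0.656
```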
Example 5.40. A box contains 10 billion labeled objects. Bob randomly se-
lects 100,000 distinct objects from the box, makes a list of which objects
he’s chosen, and returns them to the box. If he next randomly selects an-
other 100,000 objects (with replacement) and makes a second list, what
is the probability that the two lists contain a match? Formula (5.28) in
Theorem 5.38(a) says that

    Pr(a match) = 1 − (1 − 100,000/10^10)^100,000 ≈ 0.632122.

The approximate lower bound given by the formula (5.29) in Theorem 5.38(b)
is 0.632121. As you can see, the approximation is quite accurate.
It is interesting to observe that if Bob doubles the number of objects in
his lists to 200,000, then his probability of getting a match increases quite
substantially to 98.2 %. And if he triples the number of elements in each list
to 300,000, then the probability of a match is 99.988 %. This rapid increase
reflects the fact that the exponential function in (5.29) decreases very rapidly
as soon as mn becomes larger than N.
Example 5.41. A set contains N objects. Bob randomly chooses n of them,
makes a list of his choices, replaces them, and then chooses another n of them.
How large should he choose n to give himself a 50 % chance of getting a match?
How about if he wants a 99.99 % chance of getting a match?
For the first question, Bob uses the reasonably accurate lower bound of
formula (5.29) to set

    Pr(match) ≈ 1 − e^(−n^2/N) = 1/2.

It is easy to solve this for n:

    e^(−n^2/N) = 1/2  ⟹  −n^2/N = ln(1/2)  ⟹  n = √(N · ln 2) ≈ 0.83√N.

Thus it is enough to create lists that are a bit shorter than √N in length.
The second question is similar, but now Bob solves

    Pr(match) ≈ 1 − e^(−n^2/N) = 0.9999 = 1 − 10^−4.

The solution is

    n = √(N · ln 10^4) ≈ 3.035 · √N.
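Solving 1 − e^(−n^2/N) = target for n, as in this example, is a one-liner; a Python sketch (ours, with the hypothetical helper name list_size):

```python
from math import log, sqrt

def list_size(N, target):
    """List length n with 1 - exp(-n^2 / N) = target, from bound (5.29)."""
    return sqrt(N * log(1 / (1 - target)))

# The multiples of sqrt(N) worked out above (take N = 1):
print(round(list_size(1, 0.5), 3))     # 0.833
print(round(list_size(1, 0.9999), 3))  # 3.035
```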
Remark 5.42. Algorithms that rely on finding matching elements from within
one or more lists go by a variety of names, including collision algorithm,
meet-in-the-middle algorithm, birthday paradox algorithm, and square root
algorithm. The last refers to the fact that the running time of a collision
algorithm is generally a small multiple of the square root of the running time
required by an exhaustive search. The connection with birthdays was briefly
discussed in Sect. 5.4.1; see also Exercise 5.36. When one of these algorithms is
used to break a cryptosystem, the word “algorithm” is often replaced by the
word “attack,” so cryptanalysts refer to meet-in-the-middle attacks, square
root attacks, etc.
Remark 5.43. Collision algorithms tend to take approximately √N steps in
order to find a collision among N objects. A drawback of these algorithms
is that they require creation of one or more lists of size approximately √N.
When N is large, providing storage for √N numbers may be more of an ob-
stacle than doing the computation. In Sect. 5.5 we describe a collision method
due to Pollard that, at the cost of a small amount of extra computation,
requires essentially no storage.
5.4.3 A Discrete Logarithm Collision Algorithm
There are many applications of collision algorithms to cryptography. These
may involve searching a space of keys or plaintexts or ciphertexts, or, for
public key cryptosystems, they may be aimed at solving the underlying hard
mathematical problem. In this section we illustrate the general theory by
formulating an abstract randomized collision algorithm to solve the discrete
logarithm problem. For the finite field Fp, it solves the discrete logarithm
problem (DLP) in approximately √p steps.
One may well ask why the probabilistic collision algorithm described in
Proposition 5.44 with expected running time O(√N) is interesting, since the
baby step–giant step algorithm from Sect. 2.7 is deterministic and solves the
same problem in the same amount of time. One answer is that both algorithms
also require O(√N) storage, which is a serious constraint if N is large. So
the collision algorithm in Proposition 5.44 may be viewed as a warm-up for
Pollard’s ρ algorithm, which is a collision algorithm taking O(√N) time, but
using only O(1) storage. We will discuss Pollard’s algorithm in Sect. 5.5.
One might also inquire why any of these O(√N) collision algorithms are
interesting, since the index calculus described in Sect. 3.8 solves the DLP in Fp
much more rapidly. But there are other groups, such as elliptic curve groups,
for which collision algorithms are the fastest known way to solve the DLP.
This explains why elliptic curve groups are used in cryptography; at present,
the DLP in an elliptic curve group is much harder than the DLP in F*p for
groups of about the same size. Elliptic curves and their use in cryptography
are the subject of Chap. 6.
Proposition 5.44. Let G be a group, and let g ∈ G be an element of order N,
i.e., g^N = e and no smaller power of g is equal to e. Then, assuming that the
discrete logarithm problem

    g^x = h    (5.30)

has a solution, a solution can be found in O(√N) steps, where each step is
an exponentiation in the group G. (Note that since g^N = e, the powering
algorithm from Sect. 1.3.2 lets us raise g to any power using fewer than 2 log2 N
group multiplications.)
Proof. The idea is to write x as x = y − z and look for a solution to

    g^y = h · g^z.

We do this by making a list of g^y values and a list of h · g^z values and looking
for a match between the two lists.
We begin by choosing random exponents y1, y2, . . . , yn between 1 and N
and computing the values

    g^(y1), g^(y2), g^(y3), . . . , g^(yn)    in G.    (5.31)
Note that all of the values (5.31) are in the set

    S = {1, g, g^2, g^3, . . . , g^(N−1)},

so (5.31) is a selection of (approximately) n elements of S. In terms of the
collision theorem (Theorem 5.38), we view S as an urn containing N balls and
the list (5.31) as a way of coloring n of those balls red.
Next we choose additional random exponents z1, z2, . . . , zn between 1 and N
and compute the quantities

    h · g^(z1), h · g^(z2), h · g^(z3), . . . , h · g^(zn)    in G.    (5.32)
Since we are assuming that (5.30) has a solution, i.e., h is equal to some
power of g, it follows that each of the values h · g^(zi) is also in the set S. Thus
the list (5.32) may be viewed as selecting n elements from the urn, and we
would like to know the probability of selecting at least one red ball, i.e., the
probability that at least one element in the list (5.32) matches an element in
the list (5.31). The collision theorem (Theorem 5.38) says that
    Pr(at least one match between (5.31) and (5.32)) ≈ 1 − (1 − n/N)^n ≈ 1 − e^(−n^2/N).

Thus if we choose (say) n ≈ 3√N, then our probability of getting a match
is greater than 99.98 %, so we are almost guaranteed a match. Or if that
is not good enough, take n ≈ 5√N to get a probability of success greater
than 1 − 10^−10. Notice that as soon as we find a match between the two lists,
say g^y = h · g^z, then we have solved the discrete logarithm problem (5.30) by
setting x = y − z.16
How long does it take us to find this solution? Each of the lists (5.31)
and (5.32) has n elements, so it takes approximately 2n steps to assemble each
list. More precisely, each element in each list requires us to compute g^i for some
value of i between 1 and N, and it takes approximately 2 log2(i) group mul-
tiplications to compute g^i using the fast exponentiation algorithm described
in Sect. 1.3.2. (Here log2 is the logarithm to the base 2.) Thus it takes ap-
proximately 4n log2(N) multiplications to assemble the two lists. In addition,
it takes about log2(n) steps to check whether an element of the second list is
in the first list (e.g., sort the first list), so n log2(n) comparisons altogether.
Hence the total computation time is approximately

    4n log2(N) + n log2(n) = n log2(N^4 · n) steps.

Taking n ≈ 3√N, which as we have seen gives us a 99.98 % chance of success,
we find that

    Computation Time ≈ 13.5 · √N · log2(1.3 · N).
16If this value of x happens to be negative and we want a positive solution, we can always
use the fact that g^N = 1 to replace it with x = y − z + N.
      t    g^t   h·g^t        t    g^t   h·g^t        t    g^t   h·g^t
    564    410    422        53     10    605        513    164     37
    469    357    181       332    651    175         71    597    203
    276    593    620       178    121    401        314    554    567
    601    416    126       477    450    206        581     47    537
      9    512      3       503    116    428        371    334    437
    350    445    233       198    426     72         83    422    489

Table 5.9: Solving 2^x = 390 in F659 with random exponent collisions
Example 5.45. We do an example with small numbers to illustrate the use of
collisions. We solve the discrete logarithm problem

    2^x = 390 in the finite field F_659.

The number 2 has order 658 modulo 659, so it is a primitive root. In this
example g = 2 and h = 390. We choose random exponents t and compute
the values of g^t and h · g^t until we get a match. The results are compiled in
Table 5.9. We see that

    2^83 = 390 · 2^564 = 422 in F_659.

Hence using two lists of length 18, we have solved a discrete logarithm problem
in F_659. (We had a 39 % chance of getting a match with lists of length 18, so
we were a little bit lucky.) The solution is

    2^83 · 2^{−564} = 2^{−481} = 2^177 = 390 in F_659.
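A sketch of this random-exponent collision attack in Python; the function name and the list length are our own choices, and `pow` handles the modular arithmetic:

```python
import random

def dlp_by_collision(g, h, p, N, n):
    """Look for a collision g^y = h * g^z in F_p; then x = y - z (mod N)
    solves g^x = h, where N is the order of g. Taking n around 3*sqrt(N)
    gives a success probability above 99.9 %."""
    first = {}                                  # maps g^y -> y
    for _ in range(n):
        y = random.randrange(1, N)
        first[pow(g, y, p)] = y
    for _ in range(n):
        z = random.randrange(1, N)
        val = (h * pow(g, z, p)) % p
        if val in first:
            return (first[val] - z) % N
    return None                                 # no match; retry with fresh lists

# Example 5.45: solve 2^x = 390 in F_659 (the order of 2 is 658)
x = None
while x is None:
    x = dlp_by_collision(2, 390, 659, 658, 80)
assert pow(2, x, 659) == 390
```

With n = 80 > 3√658 ≈ 77 random exponents per list, a single call almost always succeeds; the outer loop simply retries in the unlucky case.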
Remark 5.46. The algorithms described in Propositions 2.21 and 5.44 solve
the DLP in O(√N) steps. It is thus interesting that, in a certain sense, Victor
Shoup [130] has shown that there cannot exist a general algorithm to solve the
DLP in an arbitrary finite group in fewer than O(√p) steps, where p is the
largest prime dividing the order of the group. This is the so-called black box
DLP, in which you are given a box that instantaneously performs the group
operations, but you're not allowed to look inside the box to see how it is doing
the computations.
5.5 Pollard’s ρ Method
As we noted in Remark 5.43, collision algorithms tend to require a considerable
amount of storage. A beautiful idea of Pollard often allows one to use almost
no storage, at the cost of a small amount of extra computation. We explain
254 5. Combinatorics, Probability, and Information Theory
Figure 5.1: Pollard's ρ method (the points x0, x1, . . . , x_{T−1} form a tail of
length T leading into a loop of length M containing x_T, . . . , x_{T+M−1})
the basic idea behind Pollard’s method and then illustrate it by yet again
solving a small instance of the discrete logarithm problem in Fp. See also
Exercise 5.44 for a factorization algorithm based on the same ideas.
5.5.1 Abstract Formulation of Pollard’s ρ Method
We begin in an abstract setting. Let S be a finite set and let
f : S −→ S
be a function that does a good job at mixing up the elements of S. Suppose
that we start with some element x ∈ S and we repeatedly apply f to create
a sequence of elements
x0 = x, x1 = f(x0), x2 = f(x1), x3 = f(x2), x4 = f(x3), . . . .
In other words,
    x_i = (f ◦ f ◦ · · · ◦ f)(x),  with i iterations of f.
The map f from S to itself is an example of a discrete dynamical system.
The sequence
    x0, x1, x2, x3, x4, . . . (5.33)

is called the (forward) orbit of x by the map f and is denoted by O_f⁺(x).
The set S is finite, so eventually there must be some element of S that
appears twice in the orbit O_f⁺(x). We can illustrate the orbit as shown in
Fig. 5.1. For a while the points x0, x1, x2, x3, . . . travel along a “path” without
repeating until eventually they loop around to give a repeated element. Then
they continue moving around the loop. As illustrated, we let T be the number
of elements in the “tail” before getting to the loop, and we let M be the
number of elements in the loop. Mathematically, T and M are defined by the
conditions
    T = (the largest integer such that x_{T−1} appears only once in O_f⁺(x)),
    M = (the smallest integer such that x_{T+M} = x_T).
Remark 5.47. Look again at the illustration in Fig. 5.1. It may remind you of
a certain Greek letter. For this reason, collision algorithms based on following
the orbit of an element in a discrete dynamical system are called ρ algorithms.
The first ρ algorithm was invented by Pollard in 1974.
Suppose that S contains N elements. Later, in Theorem 5.48, we will
sketch a proof that the quantity T + M is usually no more than a small
multiple of √N. Since x_T = x_{T+M} by definition, this means that we obtain
a collision in O(√N) steps. However, since we don't know the values of T
and M, it appears that we need to make a list of x0, x1, x2, . . . , x_{T+M} in
order to detect the collision.
Pollard's clever idea is that it is possible to detect a collision in O(√N)
steps without storing all of the values. There are various ways to accomplish
this. We describe one such method. Although not of optimal efficiency, it
has the advantage of being easy to understand. (For more efficient methods,
see [23, 28, §8.5], or [90].) The idea is to compute not only the sequence x_i,
but also a second sequence y_i defined by

    y0 = x0 and y_{i+1} = f(f(y_i)) for i = 0, 1, 2, 3, . . . .

In other words, every time that we apply f to generate the next element of
the x_i sequence, we apply f twice to generate the next element of the y_i
sequence. It is clear that

    y_i = x_{2i}.
How long will it take to find an index i with x_{2i} = x_i? In general, for j > i
we have

    x_j = x_i if and only if i ≥ T and j ≡ i (mod M).

This is clear from the ρ-shaped picture in Fig. 5.1, since we get x_j = x_i pre-
cisely when we are past x_T, i.e., when i ≥ T, and x_j has gone around the loop
past x_i an integral number of times, i.e., when j − i is a multiple of M.
Thus x_{2i} = x_i if and only if i ≥ T and 2i ≡ i (mod M). The lat-
ter condition is equivalent to M | i, so we get x_{2i} = x_i exactly when i is
equal to the first multiple of M that is larger than T. Since one of the num-
bers T, T + 1, . . . , T + M − 1 is divisible by M, this proves that

    x_{2i} = x_i for some 1 ≤ i < T + M.
We show in the next theorem that the average value of T + M is approx-
imately 1.25 · √N, so we have a very good chance of getting a collision in
a small multiple of √N steps. This is more or less the same running time
as the collision algorithm described in Sect. 5.4.3, but notice that we need to
store only two numbers, namely the current values of the x_i sequence and
the y_i sequence.
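The two-sequence trick just described is Floyd's cycle-detection method. A minimal abstract sketch in Python (the names are ours; any map f on a finite set works):

```python
def rho_collision(f, x0):
    """Return (i, x_i) with x_{2i} = x_i, storing only two current values."""
    tortoise = f(x0)              # x_1
    hare = f(f(x0))               # x_2 = y_1
    i = 1
    while tortoise != hare:
        tortoise = f(tortoise)    # one application of f per step
        hare = f(f(hare))         # two applications of f per step
        i += 1
    return i, tortoise

# toy dynamical system on S = Z/1009 with f(x) = x^2 + 1
f = lambda x: (x * x + 1) % 1009
i, v = rho_collision(f, 0)

# check the collision directly: iterating f from x0 = 0 gives x_i = x_{2i} = v
x = 0
for _ in range(i):
    x = f(x)
assert x == v
for _ in range(i):
    x = f(x)
assert x == v
```

Note that only the two current values are ever stored, regardless of how long the tail and loop are.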
Theorem 5.48 (Pollard's ρ Method: abstract version). Let S be a finite set
containing N elements, let f : S → S be a map, and let x ∈ S be an initial
point.
(a) Suppose that the forward orbit O_f⁺(x) = {x0, x1, x2, . . .} of x has a tail of
length T and a loop of length M, as illustrated in Fig. 5.1. Then

    x_{2i} = x_i for some 1 ≤ i < T + M. (5.34)

(b) If the map f is sufficiently random, then the expected value of T + M is

    E(T + M) ≈ 1.2533 · √N.

Hence if N is large, then we are likely to find a collision as described
by (5.34) in O(√N) steps, where a "step" is one evaluation of the
function f.
Proof. (a) We proved this earlier in this section.
(b) We sketch the proof of (b) because it is an instructive blend of probability
theory and analysis of algorithms. However, the reader desiring a rigorous
proof will need to fill in some details. Suppose that we compute the first k
values x0, x1, x2, . . . , x_{k−1}. What is the probability that we do not get any
matches? If we assume that the successive x_i's are randomly chosen from the
set S, then we can compute this probability as

    Pr(x0, x1, . . . , x_{k−1} are all different)
        = ∏_{i=1}^{k−1} Pr(x_i ≠ x_j for all 0 ≤ j < i | x0, x1, . . . , x_{i−1} are all different)
        = ∏_{i=1}^{k−1} (N − i)/N    (5.35)
        = ∏_{i=1}^{k−1} (1 − i/N).   (5.36)

Note that the probability formula (5.35) comes from the fact that if the first i
choices x0, x1, . . . , x_{i−1} are distinct, then among the N possible choices for x_i,
exactly N − i of them are different from the previously chosen values. Hence
the probability of getting a new value, assuming that the earlier values were
distinct, is (N − i)/N.
We can approximate the product (5.36) using the estimate

    1 − t ≈ e^{−t}, valid for small values of t.

(Compare with the proof of Theorem 5.38(b), and see also Exercise 5.38.) In
practice, k will be approximately √N and N will be large, so i/N will indeed
be small for 1 ≤ i < k. Hence

    Pr(x0, x1, . . . , x_{k−1} are all different) ≈ ∏_{i=1}^{k−1} e^{−i/N} = e^{−(1+2+···+(k−1))/N} ≈ e^{−k²/2N}. (5.37)

For the last approximation we are using the fact that

    1 + 2 + · · · + (k − 1) = (k² − k)/2 ≈ k²/2 when k is large.
We now know the probability that x0, x1, . . . , x_{k−1} are all distinct.
Assuming that they are distinct, what is the probability that the next
choice x_k gives a match? There are k elements for it to match among the N
possible elements, so this conditional probability is

    Pr(x_k is a match | x0, . . . , x_{k−1} are distinct) = k/N. (5.38)
Hence

    Pr(x_k is the first match)
        = Pr(x_k is a match AND x0, . . . , x_{k−1} are distinct)
        = Pr(x_k is a match | x0, . . . , x_{k−1} are distinct) · Pr(x0, . . . , x_{k−1} are distinct)
        ≈ (k/N) · e^{−k²/2N}    from (5.37) and (5.38).
The expected number of steps before finding the first match is then given
by the formula

    E(first match) = Σ_{k≥1} k · Pr(x_k is the first match) ≈ Σ_{k≥1} (k²/N) · e^{−k²/2N}. (5.39)
We want to know what this series looks like as a function of N. The following
estimate, whose derivation uses elementary calculus, is helpful in estimating
series of this sort.

Lemma 5.49. Let F(t) be a "nicely behaved" real-valued function¹⁷ with the
property that ∫₀^∞ F(t) dt converges. Then for large values of n we have

    Σ_{k=1}^∞ F(k/n) ≈ n · ∫₀^∞ F(t) dt. (5.40)

¹⁷For example, it would suffice that F have a continuous derivative.
Proof. We start with the definite integral of F(t) over an interval 0 ≤ t ≤ A.
By definition, this integral is equal to a limit of Riemann sums,

    ∫₀^A F(t) dt = lim_{n→∞} Σ_{k=1}^{An} F(k/n) · (1/n),

where in the sum we have broken the interval [0, A] into An pieces. In partic-
ular, if n is large, then

    n · ∫₀^A F(t) dt ≈ Σ_{k=1}^{An} F(k/n).

Now letting A → ∞ yields (5.40). (We do not claim that this is a rigorous
argument. Our aim is merely to convey the underlying idea. The interested
reader may supply the details needed to complete the argument and to obtain
explicit upper and lower bounds.)
We use Lemma 5.49 to estimate

    E(first match) ≈ Σ_{k≥1} (k²/N) · e^{−k²/2N}    from (5.39),
                   = Σ_{k≥1} F(k/√N)                letting F(t) = t² e^{−t²/2},
                   ≈ √N · ∫₀^∞ t² e^{−t²/2} dt      from (5.40) with n = √N,
                   ≈ 1.2533 · √N                    by numerical integration.

For the last line, we used a numerical method to estimate the definite inte-
gral, although in fact the integral can be evaluated exactly. (Its value turns
out to be √(π/2); see Exercise 5.43.) This completes the proof of (b), and
combining (a) and (b) gives the final statement of Theorem 5.48.
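Both Lemma 5.49 and the value √(π/2) of the integral are easy to check numerically. In the sketch below, n plays the role of √N in the proof, and the truncation point of the series is our own (conservative) choice:

```python
import math

def F(t):
    """F(t) = t^2 * e^{-t^2/2}, the function appearing in the proof."""
    return t * t * math.exp(-t * t / 2)

n = 1000
# sum_{k>=1} F(k/n); terms beyond t = 20 are negligibly small, so truncate there
series = sum(F(k / n) for k in range(1, 20 * n + 1))
integral = math.sqrt(math.pi / 2)       # exact value of the integral, about 1.2533
# Lemma 5.49 predicts series ~ n * integral; the ratio should be close to 1
assert abs(series / (n * integral) - 1) < 0.01
```

Running this confirms that the Riemann-sum approximation (5.40) is accurate to well under a percent already for n = 1000.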
Remark 5.50. It is instructive to check numerically the accuracy of the esti-
mates used in the proof of Theorem 5.48. In that proof we claimed that for
large values of N, the expected number of steps before finding a match is
given by each of the following three formulas:

    E1 = Σ_{k≥1} (k²/N) ∏_{i=1}^{k−1} (1 − i/N),
    E2 = Σ_{k≥1} (k²/N) e^{−k²/2N},
    E3 = √N ∫₀^∞ t² e^{−t²/2} dt.

More precisely, E1 is the exact formula, but hard to compute exactly if N is
very large, while E2 and E3 are approximations. We have computed the values
of E1, E2, and E3 for some moderate sized values of N and compiled the results
in Table 5.10. As you can see, E2 and E3 are quite close to one another, and
once N gets reasonably large, they also provide a good approximation for E1.
Hence for very large values of N, say 2⁸⁰ < N < 2¹⁶⁰, it is quite reasonable
to estimate E1 using E3.
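The prediction E(T + M) ≈ 1.2533·√N can also be tested by direct simulation with genuinely random functions; the set size, trial count, and seed below are our own choices:

```python
import random

def tail_plus_loop(N, rng):
    """Follow the orbit of a random point under a random function on a set of
    size N, counting distinct values seen before the first repeat: this is T + M."""
    f = [rng.randrange(N) for _ in range(N)]   # a random function S -> S
    seen = set()
    x = rng.randrange(N)
    while x not in seen:
        seen.add(x)
        x = f[x]
    return len(seen)

rng = random.Random(1)
N, trials = 2500, 400
avg = sum(tail_plus_loop(N, rng) for _ in range(trials)) / trials
# Theorem 5.48(b) predicts about 1.2533 * sqrt(2500) = 62.665
print(avg)
```

With a few hundred trials the sample mean lands close to the predicted 62.7, in line with Table 5.10.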
5.5.2 Discrete Logarithms via Pollard’s ρ Method
In this section we describe how to use Pollard's ρ method to solve the discrete
logarithm problem

    g^t = h in F_p^*

when g is a primitive root modulo p.

      N      E1       E2       E3      E1/E3
    100    12.210   12.533   12.533   0.97421
    500    27.696   28.025   28.025   0.98827
   1000    39.303   39.633   39.633   0.99167
   5000    88.291   88.623   88.623   0.99626
  10000   124.999  125.331  125.331   0.99735
  20000   176.913  177.245  177.245   0.99812
  50000   279.917  280.250  280.250   0.99881

Table 5.10: Expected number of steps until a ρ collision

The idea is to find a collision between g^i h^j
and g^k h^ℓ for some known exponents i, j, k, ℓ. Then g^{i−k} = h^{ℓ−j}, and taking
roots in F_p will more or less solve the problem of expressing h as a power of g.
The difficulty is finding a function f : F_p → F_p that is complicated enough
to mix up the elements of F_p, yet simple enough to keep track of its orbits.
Pollard [104] suggests using the function

    f(x) = g · x   if 0 ≤ x < p/3,
           x²      if p/3 ≤ x < 2p/3,      (5.41)
           h · x   if 2p/3 ≤ x < p.

Note that x must be reduced modulo p into the range 0 ≤ x < p before (5.41)
is used to determine the value of f(x).
Remark 5.51. No one has proven that the function f(x) given by (5.41) is
sufficiently random to guarantee that Theorem 5.48 is true for f, but experi-
mentally, the function f works fairly well. However, Teske [144, 145] has shown
that f is not sufficiently random to give optimal results, and she gives exam-
ples of somewhat more complicated functions that work better in practice.
Consider what happens when we repeatedly apply the function f given
by (5.41) to the starting point x0 = 1. At each step, we either multiply by g,
multiply by h, or square the previous value. So after each step, we end up
with a power of g multiplied by a power of h; say after i steps we have

    x_i = (f ◦ f ◦ · · · ◦ f)(1) = g^{α_i} · h^{β_i},  with i iterations of f.
We cannot predict the values of α_i and β_i, but we can compute them at the
same time that we are computing the x_i's using the definition (5.41) of f.
Clearly α0 = β0 = 0, and then subsequent values are given by

    α_{i+1} = α_i + 1  if 0 ≤ x_i < p/3,        β_{i+1} = β_i      if 0 ≤ x_i < p/3,
              2α_i     if p/3 ≤ x_i < 2p/3,               2β_i     if p/3 ≤ x_i < 2p/3,
              α_i      if 2p/3 ≤ x_i < p,                 β_i + 1  if 2p/3 ≤ x_i < p.

In computing α_i and β_i, it suffices to keep track of their values modulo p − 1,
since g^{p−1} = 1 and h^{p−1} = 1. This is important, since otherwise the values
of α_i and β_i would become prohibitively large.
In a similar fashion we compute the sequence given by

    y0 = 1 and y_{i+1} = f(f(y_i)).

Then

    y_i = x_{2i} = g^{γ_i} · h^{δ_i},

where the exponents γ_i and δ_i can be computed by two repetitions of the
recursions used for α_i and β_i. Of course, the first time we use y_i to determine
which case of (5.41) to apply, and the second time we use f(y_i) to decide.
Applying the above procedure, we eventually find a collision in the x and
the y sequences, say y_i = x_i. This means that

    g^{α_i} · h^{β_i} = g^{γ_i} · h^{δ_i}.

So if we let

    u ≡ α_i − γ_i (mod p − 1) and v ≡ δ_i − β_i (mod p − 1),

then g^u = h^v in F_p. Equivalently,

    v · log_g(h) ≡ u (mod p − 1). (5.42)

If gcd(v, p − 1) = 1, then we can multiply both sides of (5.42) by the inverse
of v modulo p − 1 to solve the discrete logarithm problem.
More generally, if d = gcd(v, p − 1) ≥ 2, we use the extended Euclidean
algorithm (Theorem 1.11) to find an integer s such that

    s · v ≡ d (mod p − 1).

Multiplying both sides of (5.42) by s yields

    d · log_g(h) ≡ w (mod p − 1), (5.43)
where w ≡ s · u (mod p − 1). In this congruence we know all of the quantities
except for log_g(h). The fact that d divides p − 1 will force d to divide w,
so w/d is one solution to (5.43), but there are others. The full set of solutions
to (5.43) is obtained by starting with w/d and adding multiples of (p − 1)/d,

    log_g(h) ∈ { w/d + k · (p − 1)/d : k = 0, 1, 2, . . . , d − 1 }.

In practice, d will tend to be fairly small,¹⁸ so it suffices to check each of the d
possibilities for log_g(h) until the correct value is found.
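The whole procedure, with the mixing function (5.41), the exponent recursions, and the final gcd step, fits in a short Python routine. This is a direct sketch of the method just described (it needs Python 3.8+ for the modular inverse `pow(·, -1, ·)`):

```python
from math import gcd

def pollard_rho_dlp(g, h, p):
    """Solve g^t = h in F_p* for a primitive root g, via Pollard's rho."""
    n = p - 1                        # the order of g

    def step(x, a, b):
        """One application of (5.41), tracking x = g^a * h^b with a, b mod p-1."""
        if 3 * x < p:
            return (g * x) % p, (a + 1) % n, b
        elif 3 * x < 2 * p:
            return (x * x) % p, (2 * a) % n, (2 * b) % n
        else:
            return (h * x) % p, a, (b + 1) % n

    x, a, b = 1, 0, 0                # x_i = g^a * h^b
    y, c, d = 1, 0, 0                # y_i = x_{2i} = g^c * h^d
    while True:
        x, a, b = step(x, a, b)
        y, c, d = step(*step(y, c, d))
        if x == y:
            break

    u, v = (a - c) % n, (d - b) % n  # now g^u = h^v in F_p
    e = gcd(v, n)                    # v * t = u (mod n) has e solutions mod n
    base = (u // e) * pow(v // e, -1, n // e) % (n // e) if n > e else 0
    for k in range(e):               # try t = base + k*(n/e) as in the text
        t = (base + k * (n // e)) % n
        if pow(g, t, p) == h:
            return t

# Example 5.52: log_19(24717) in F_48611
assert pollard_rho_dlp(19, 24717, 48611) == 37869
```

Since the walk is deterministic, the call above retraces exactly the trajectory shown in Table 5.11 and finds the collision at i = 548.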
Example 5.52. We illustrate Pollard's ρ method by solving the discrete loga-
rithm problem

    19^t ≡ 24717 (mod 48611).

The first step is to compute the x and y sequences until we find a match
y_i = x_i, while also computing the exponent sequences α, β, γ, δ. The initial
stages of this process and the final few steps before a collision has been found
are given in Table 5.11.
    i     x_i    y_i = x_{2i}    α_i     β_i     γ_i     δ_i
    0       1        1             0       0       0       0
    1      19      361             1       0       2       0
    2     361    33099             2       0       4       0
    3    6859    13523             3       0       4       2
    4   33099    20703             4       0       6       2
    5   33464    14974             4       1      13       4
    6   13523    18931             4       2      14       5
    7   13882    30726             5       2      56      20
    8   20703     1000             6       2     113      40
    9   11022    14714            12       4     228      80
    ...
  542   21034    46993         13669    2519   27258   30257
  543   20445    37138         27338    5038   27259   30258
  544   40647    33210          6066   10076    5908   11908
  545   28362    21034          6066   10077    5909   11909
  546   36827    40647         12132   20154   23636   47636
  547   11984    36827         12132   20155   47272   46664
  548   33252    33252         12133   20155   47273   46665

Table 5.11: Pollard ρ computations to solve 19^t = 24717 in F_48611
18For most cryptographic applications, the prime p is chosen such that p−1 has precisely
one large prime factor, since otherwise, the Pohlig–Hellman algorithm (Theorem 2.31) may
be applicable. And it is unlikely that d will be divisible by the large prime factor of p − 1.
From the table we see that x_1096 = x_548 = 33252 in F_48611. The associated
exponent values are

    α_548 = 12133, β_548 = 20155, γ_548 = 47273, δ_548 = 46665,

so we know that

    19^12133 · 24717^20155 = 19^47273 · 24717^46665 in F_48611.

(Before proceeding, we should probably check this equality to make sure that
we didn't make an arithmetic error.) Moving the powers of 19 to one side
and the powers of 24717 to the other side yields 19^{−35140} = 24717^26510, and
adding 48610 = p − 1 to the exponent of 19 gives

    19^13470 = 24717^26510 in F_48611. (5.44)

We next observe that

    gcd(26510, 48610) = 10 and 970 · 26510 ≡ 10 (mod 48610).

Raising both sides of (5.44) to the 970th power yields

    19^{13470·970} = 19^13065900 = 19^38420 = 24717^10 in F_48611.
Hence

    10 · log_19(24717) ≡ 38420 (mod 48610),

which means that

    log_19(24717) ≡ 3842 (mod 4861).

The possible values for the discrete logarithm are obtained by adding multiples
of 4861 to 3842, so log_19(24717) is one of the numbers in the set

    {3842, 8703, 13564, 18425, 23286, 28147, 33008, 37869, 42730, 47591}.

To complete the solution, we compute 19 raised to each of these 10 values
until we find the one that is equal to 24717:

    19^3842 = 16580,   19^8703 = 29850,   19^13564 = 23894,  19^18425 = 20794,
    19^23286 = 10170,  19^28147 = 32031,  19^33008 = 18761,  19^37869 = 24717.

This gives the solution log_19(24717) = 37869. We check our answer:

    19^37869 = 24717 in F_48611.
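All of the arithmetic in this example can be double-checked with Python's built-in three-argument `pow`:

```python
from math import gcd

p, g, h = 48611, 19, 24717

# the collision from Table 5.11: 19^12133 * 24717^20155 = 19^47273 * 24717^46665
assert pow(g, 12133, p) * pow(h, 20155, p) % p == pow(g, 47273, p) * pow(h, 46665, p) % p

# the derived relation (5.44) and the gcd computation
assert pow(g, 13470, p) == pow(h, 26510, p)
assert gcd(26510, 48610) == 10 and 970 * 26510 % 48610 == 10

# the final answer
assert pow(g, 37869, p) == h
```

Each assertion passes, confirming there was no arithmetic slip.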
5.6 Information Theory
In 1948 and 1949, Claude Shannon published two papers [126, 127] that form
the mathematical foundation of modern cryptography. In these papers he
defines the concept of perfect (or unconditional) secrecy, introduces the idea of
entropy of natural language and statistical analysis, provides the first proofs
of security using probability theory, and gives precise connections between
provable security and the size of the key, plaintext, and ciphertext spaces.
In public key cryptography, one is interested in how computationally dif-
ficult it is to break the system. The issue of security is thus a relative one—a
given cryptosystem is hard to break if one assumes that some underlying
problem is hard to solve. It requires some care to formulate these concepts
properly. In this section we briefly introduce Shannon’s ideas and explain their
relevance to symmetric key systems. In [127], Shannon develops a theory of
security for cryptosystems that assumes that no bounds are placed on the com-
putational resources that may be brought to bear against them. For example,
symmetric ciphers such as the simple substitution cipher (Sect. 1.1) and the
Vigenère cipher (Sect. 5.2) are not computationally secure. With unlimited
resources—indeed with very limited resources—an adversary can easily break
these ciphers. If we seek unconditional security, we must either seek new algo-
rithms or modify the implementation of known algorithms. In fact, Shannon
shows that perfectly secure cryptosystems must have at least as many keys as
plaintexts and that every key must be used with equal probability. This means
that most practical cryptosystems are not unconditionally secure. We discuss
the notion of perfect security in Sect. 5.6.1.
In [126] Shannon develops a mathematical theory that measures the
amount of information that is revealed by a random variable. When the ran-
dom variable represents the possible plaintexts or ciphertexts or keys of a
cipher that is used to encrypt a natural language such as English, we obtain
a framework for the rigorous mathematical study of cryptographic security.
Shannon adopted the word entropy for this measure because of its formal
similarity to Boltzmann’s definition of entropy in statistical mechanics, and
also because Shannon viewed language as a stochastic process, i.e., as a sys-
tem governed by probabilities that produces a sequence of symbols. Later,
the physicist E.T. Jaynes [60] argued that thermodynamic entropy could be
interpreted as an application of a certain information-theoretic entropy. As
a measure of “uncertainty” of a system, the logarithmic formula for entropy
is determined, up to a constant, by requiring that it be continuous, mono-
tonic, and satisfy a certain additive property. We discuss information-theoretic
entropy and its application to cryptography in Sect. 5.6.2.
5.6.1 Perfect Secrecy
A cryptosystem has perfect secrecy if the interception of a ciphertext gives
the cryptanalyst no information about the underlying plaintext and no
information about any future encrypted messages. To formalize this concept,
we introduce random variables M, C, and K representing the finite number
of possible messages, ciphertexts, and keys. In other words, M is a random
variable whose values are the possible messages (plaintexts), C is a random
variable whose values are the possible ciphertexts, and K is a random vari-
able whose values are the possible keys used for encryption and decryption.
We let f_M, f_C, and f_K be the associated density functions.¹⁹ The density func-
tions f_M, f_K, and f_C are related to one another via the encryption/decryption
formula d_k(e_k(m)) = m, which we will exploit shortly to prove (5.47).
We also have the joint densities and the conditional densities of all pairs
of these random variables, such as f_{(C,M)}(c, m) and f_{C|M}(c | m), and so
forth. We will let the variable names simplify the notation. For example,
we write f(c | m) for f_{C|M}(c | m), the conditional probability density of the
random variables C and M, i.e.,

    f(c | m) = Pr(C = c given that M = m).

Similarly, we write f(m) for f_M(m), the probability that M = m.
Definition. A cryptosystem has perfect secrecy if
f(m | c) = f(m) for all m ∈ M and all c ∈ C. (5.45)
What does (5.45) mean? It says that the probability of any particular
plaintext, Pr(M = m), is independent of the ciphertext. Intuitively, this means
that the ciphertext reveals no knowledge of the plaintext.
Bayes’s formula (Theorem 5.33) says that
f(m | c)f(c) = f(c | m)f(m),
which implies that perfect secrecy is equivalent to the condition

    f(c | m) = f(c) for all c ∈ C and all m ∈ M with f(m) ≠ 0. (5.46)

Formula (5.46) says that the appearance of any particular ciphertext is equally
likely, independent of the plaintext.
If we know fK and fM , then fC is determined. To see this, we note that
for a given key k, the probability that the ciphertext equals c is the same as
the probability that the decryption of c is the plaintext, assuming of course
¹⁹As is typical, we have omitted reference to the underlying sample spaces. To be com-
pletely explicit, we have three probability spaces with sample spaces Ω_M, Ω_C, and Ω_K and
probability functions Pr_M, Pr_C, and Pr_K. Then M, C, and K are random variables

    M : Ω_M → M,  K : Ω_K → K,  C : Ω_C → C.

Then by definition, the density function f_M is

    f_M(m) = Pr(M = m) = Pr_M({ω ∈ Ω_M : M(ω) = m}),

and similarly for K and C.
that c is the encryption of some plaintext for key k. This allows us to com-
pute the total probability f_C(c) by summing over all possible keys and using
the decomposition formula (5.20) of Proposition 5.24, or more precisely, its
generalization described in Exercise 5.23. As usual, we let K denote the set
of all possible keys and e_k : M → C and d_k : C → M be the encryption
and decryption functions for the key k ∈ K. Then the probability that the
ciphertext is equal to c is given by the formula

    f_C(c) = Σ_{k ∈ K such that c = e_k(m) for some m ∈ M} f_K(k) · f_M(d_k(c)); (5.47)

see also Exercise 5.47. We note that if the encryption map e_k : M → C is onto
for all keys k, which is often true in practice, then the sum in (5.47) is over
all k ∈ K.

          m1   m2   m3
    k1    c2   c1   c3
    k2    c1   c3   c2

Table 5.12: Encryption of messages with keys k1 and k2
Example 5.53. Consider the Shift Cipher described in Sect. 1.1. Suppose that
each of the 26 possible keys (shift amounts) is chosen with equal probability
and that each plaintext character is encrypted using a new, randomly chosen,
shift amount. Then it is not hard to check that the resulting cryptosystem
has perfect secrecy; see Exercise 5.46.
Recall that an encryption function is one-to-one, meaning that each mes-
sage gives rise to a unique ciphertext. This implies that there are at least as
many ciphertexts as plaintexts (messages). Perfect secrecy gives additional
restrictions on the relative size of the key, message, and ciphertext spaces.
We first investigate an example of a (tiny) cryptosystem that does not have
perfect secrecy.
Example 5.54. Suppose that a cryptosystem has two keys k1 and k2, three
messages m1, m2, and m3, and three ciphertexts c1, c2, and c3. Assume that
the density function for the message random variable satisfies

    f_M(m1) = f_M(m2) = 1/4 and f_M(m3) = 1/2. (5.48)

Suppose further that Table 5.12 describes how the different keys act on the
messages to produce ciphertexts.
For example, the encryption of the plaintext m1 with the key k1 is the
ciphertext c2. Under the assumption that the keys are used with equal prob-
ability, we can use (5.47) to compute the probability that the ciphertext is
equal to c1:

    f(c1) = f(k1) f_M(d_{k1}(c1)) + f(k2) f_M(d_{k2}(c1))
          = f(k1) f(m2) + f(k2) f(m1)
          = (1/2) · (1/4) + (1/2) · (1/4) = 1/4.
On the other hand, we see from the table that f(c1 | m3) = 0. Hence this
cryptosystem does not have perfect secrecy.
This matches our intuition, since it is clear that seeing a ciphertext leaks
some information about the plaintext. For example, if we see the ciphertext c1,
then we know that the message was either m1 or m2; it cannot be m3.
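The computation above, and the failure of the perfect secrecy condition (5.45), can be reproduced exactly with rational arithmetic; the dictionary encoding of Table 5.12 is our own:

```python
from fractions import Fraction

# toy cryptosystem of Example 5.54 (Table 5.12): enc[key][message] = ciphertext
enc = {'k1': {'m1': 'c2', 'm2': 'c1', 'm3': 'c3'},
       'k2': {'m1': 'c1', 'm2': 'c3', 'm3': 'c2'}}
f_M = {'m1': Fraction(1, 4), 'm2': Fraction(1, 4), 'm3': Fraction(1, 2)}
f_K = {'k1': Fraction(1, 2), 'k2': Fraction(1, 2)}

def f_C(c):
    """Total probability of ciphertext c via formula (5.47)."""
    return sum(f_K[k] * f_M[m] for k in enc for m in f_M if enc[k][m] == c)

def f_C_given_M(c, m):
    """Conditional probability f(c | m): sum of f(k) over keys with e_k(m) = c."""
    return sum(f_K[k] for k in enc if enc[k][m] == c)

assert f_C('c1') == Fraction(1, 4)       # matches the hand computation above
assert f_C_given_M('c1', 'm3') == 0      # but f(c1 | m3) = 0 != f(c1): no perfect secrecy
```

Since f(c1 | m3) differs from f(c1), condition (5.46) fails, exactly as argued in the text.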
As noted earlier, the number of ciphertexts must be at least as large as
the number of plaintexts, since otherwise, decryption is not possible. It turns
out that one consequence of perfect secrecy is that the number of keys must
also be at least as large as the number of plaintexts.

Proposition 5.55. If a cryptosystem has perfect secrecy, then #K ≥ #C⁺,
where C⁺ = {m ∈ M : f(m) > 0} is the set of plaintexts that have a positive
probability of being selected.
Proof. We start by fixing some ciphertext c ∈ C with f(c) > 0. Perfect secrecy
in the form of (5.46) tells us that

    f(c | m) = f(c) > 0 for all m ∈ C⁺.

This says that there is a positive probability that m ∈ C⁺ encrypts to c,
so in particular there is at least one key k satisfying e_k(m) = c. Further, if
we start with a different plaintext m′ ∈ C⁺, then we get a different key k′,
since otherwise e_k(m) = c = e_k(m′), which would contradict the one-to-one
property of e_k.
To recapitulate, we have shown that for every m ∈ C⁺, the set

    {k ∈ K : e_k(m) = c}

is nonempty, and further, these sets are disjoint for different m's. Thus each
plaintext m ∈ C⁺ is matched with one or more keys, and different m's are
matched with different keys, which shows that the number of keys is at least
as large as the number of plaintexts in C⁺.
Given the restriction on the relative sizes of the key, ciphertext, and plain-
text spaces in systems with perfect secrecy, namely
#K ≥ #M and #C ≥ #M,
it is most efficient to assume that the key space, the plaintext space, and
the ciphertext space are all of equal size. Assuming this, Shannon proves a
theorem characterizing perfect secrecy.
Theorem 5.56. Suppose that a cryptosystem satisfies
#K = #M = #C,
i.e., the numbers of keys, plaintexts, and ciphertexts are all equal. Then the
system has perfect secrecy if and only if the following two conditions hold:
(a) Each key k ∈ K is used with equal probability.
(b) For a given message m ∈ M and ciphertext c ∈ C, there is exactly one
key k ∈ K that encrypts m to c.
Proof. Suppose first that a cryptosystem has perfect secrecy. We start by
verifying (b). For any plaintext m ∈ M and ciphertext c ∈ C, consider the
(possibly empty) set of keys that encrypt m to c,

    S_{m,c} = {k ∈ K : e_k(m) = c}.

We are going to prove that if the cryptosystem has perfect secrecy, then in
fact #S_{m,c} = 1 for every m ∈ M and every c ∈ C, which is equivalent to
statement (b) of the theorem. We do this in three steps.

Claim 1. If m ≠ m′, then S_{m,c} ∩ S_{m′,c} = ∅.
Suppose that k ∈ S_{m,c} ∩ S_{m′,c}. Then e_k(m) = c = e_k(m′), which implies that
m = m′, since the encryption function e_k is injective. This proves Claim 1.
Claim 2. If the cryptosystem has perfect secrecy, then S_{m,c} is nonempty
for every m and c.
We use the perfect secrecy assumption in the form f(m, c) = f(m)f(c). We
know that every m ∈ M is a valid plaintext for at least one key, so f(m) > 0.
Similarly, every c ∈ C appears as the encryption of at least one plaintext using
some key, so f(c) > 0. Hence perfect secrecy implies that

    f(m, c) > 0 for all m ∈ M and all c ∈ C. (5.49)

But the formula f(m, c) > 0 is simply another way of saying that c is a
possible encryption of m. Hence there must be at least one key k ∈ K satis-
fying e_k(m) = c, i.e., there is some key k ∈ S_{m,c}. This completes the proof of
Claim 2.
Claim 3. If the cryptosystem has perfect secrecy, then #S_{m,c} = 1.
Fix a ciphertext c ∈ C. Then

    #K ≥ #( ⋃_{m∈M} S_{m,c} )    since K contains every S_{m,c},
        = Σ_{m∈M} #S_{m,c}       since the S_{m,c} are disjoint from Claim 1,
        ≥ #M                     since #S_{m,c} ≥ 1 from Claim 2,
        = #K                     since #K = #M by assumption.

Thus all of these inequalities are equalities, so in particular,

    Σ_{m∈M} #S_{m,c} = #M.

Then the fact (Claim 2) that every #S_{m,c} is greater than or equal to 1 implies
that every #S_{m,c} must equal 1. This completes the proof of Claim 3.
As noted above, Claim 3 is equivalent to statement (b) of the theorem. We turn
now to statement (a). Consider the set of triples

    (k, m, c) ∈ K × M × C satisfying e_k(m) = c.

Clearly k and m determine a unique value for c, and (b) says that m and c
determine a unique value for k. It is also not hard, using a similar argument
and the assumption that #M = #C, to show that c and k determine a unique
value for m; see Exercise 5.48.
For any triple (k, m, c) satisfying e_k(m) = c, we compute

    f(m) = f(m | c)         by perfect secrecy,
         = f(m, c)/f(c)     definition of conditional probability,
         = f(m, k)/f(c)     since any two of m, k, c determine the third,
         = f(m)f(k)/f(c)    since M and K are independent.
(There are cryptosystems in which the message forms part of the key; see for
example Exercise 5.19, in which case M and K would not be independent.)
Canceling f(m) from both sides, we have shown that

    f(k) = f(c) for every k ∈ K and every c ∈ C. (5.50)

Note that our proof shows that (5.50) is true for every k and every c, because
Exercise 5.48 tells us that for every (k, c) there is a (unique) m satisfy-
ing e_k(m) = c.
We sum (5.50) over all c ∈ C and divide by #C to obtain

    f(k) = (1/#C) Σ_{c∈C} f(c) = 1/#C.

This shows that f(k) is constant, independent of the choice of k ∈ K, which
is precisely the assertion of (a). At the same time we have proven the useful
fact that f(c) is constant, i.e., every ciphertext is used with equal probability.
In the other direction, if a cryptosystem has properties (a) and (b), then
the steps outlined to prove perfect secrecy of the shift cipher in Exercise 5.46
can be applied in this more general setting. We leave the details to the reader.
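Conversely, conditions (a) and (b) can be checked by brute force on a small example. In this sketch (our own toy setup) we take the shift cipher on Z/5 with uniform keys and an arbitrary non-uniform message distribution, and verify the perfect secrecy condition (5.45) directly:

```python
from fractions import Fraction
from itertools import product

N = 5  # shift cipher on Z/5 with one uniformly random key per message
f_M = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8),
       3: Fraction(1, 16), 4: Fraction(1, 16)}   # any message distribution works
f_K = {k: Fraction(1, N) for k in range(N)}       # condition (a): uniform keys

def enc(k, m):
    return (m + k) % N   # condition (b) holds: the unique key for (m, c) is c - m

# check f(m | c) = f(m) for every pair, i.e. perfect secrecy (5.45)
for m, c in product(range(N), range(N)):
    joint = sum(f_K[k] * f_M[m] for k in range(N) if enc(k, m) == c)      # f(m, c)
    total = sum(f_K[k] * f_M[mm]
                for k, mm in product(range(N), range(N)) if enc(k, mm) == c)  # f(c)
    assert joint / total == f_M[m]
```

The loop passes for any message distribution, illustrating the "if" direction of Theorem 5.56 on this small case.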
Example 5.57 (The one-time pad). Vernam’s one-time pad, patented in 1917,
is an extremely simple, perfectly secret, albeit very inefficient, cryptosystem.
The key k consists of a string of binary digits k0k1 . . . kN . It is used to encrypt a
binary plaintext string m = m0m1 . . . mN by XOR’ing the two strings together
bit by bit. See (1.12) on page 44 for a description of the XOR operation, which
for convenience we will denote by ⊕. Then the ciphertext c = c0c1 . . . cN is
given by
ci = ki ⊕ mi for i = 0, 1, . . . , N.
Each key is used only once and then discarded, whence the name of the system.
Since every key is used with equal probability, and since there is exactly one
key that encrypts a given m to a given c, namely the key m ⊕ c, Theorem 5.56
shows that Vernam’s one-time pad has perfect secrecy.
Unfortunately, if Bob and Alice want to use a Vernam one-time pad to
exchange N bits of information, they must already know N bits of shared
secret information to use as the key. This makes one-time pads much too inef-
ficient for large-scale communication networks. However, there are situations
in which they have been used, such as top secret communications between
diplomatic offices or for short messages between spies and their home bases.
It is also worth noting that a one-time pad remains completely secure only
as long as its keys are never reused. When a key pad is used more than once,
either due to error or to the difficulty of providing enough key material, then
the cryptosystem may be vulnerable to cryptanalysis. This occurred in the
real world when the Soviet Union reused some one-time pads during World
War II. The United States mounted a massive cryptanalytic effort called the
VENONA project that successfully decrypted a number of documents.
5.6.2 Entropy
In efficient cryptosystems, a single key must be used to encrypt many differ-
ent plaintexts, so perfect secrecy is not possible. At best we can hope to build
cryptosystems that are computationally secure. Unfortunately, anything less
than perfect secrecy leaves open the possibility that a list of ciphertexts will
reveal significant information about the key. To study this phenomenon, Shan-
non introduced the concept of entropy, which is a measure of the uncertainty
in a system. Thus if we view fX(x) = Pr(X = x) as being the probability that
the outcome of a certain experiment is equal to x, then the entropy of X will
be small if the outcome of a single experiment reveals a significant amount of
information about the random variable X.
Let X be a random variable taking on finitely many values x1, x2, . . . , xn,
and let p1, p2, . . . , pn be the associated probabilities,
270 5. Combinatorics, Probability, and Information Theory
pi = fX (xi) = Pr(X = xi).
The entropy H(X) of X is a number that depends only on the probabili-
ties p1, . . . , pn of the possible outcomes of X, so we write²⁰
H(X) = H(p1, . . . , pn).
We would like to capture the idea that H is the expected value of a random
variable that measures the uncertainty that the outcome xi has occurred. Thus
the larger the value of H(X), the less information about X that is revealed
by the outcome of an experiment.
What properties should H possess?
Property H1 The function H should be continuous in the variables pi.
This reflects the intuition that a small change in pi should produce a small
change in the amount of information revealed by X.
Property H2 Let Xn be the random variable that is uniformly distributed
on a set {x1, . . . , xn}, i.e., the random variable Xn has n possible outcomes,
each occurring with probability 1/n. Then

H(Xn+1) > H(Xn) for all n ≥ 1.
This reflects the intuition that if all outcomes are equally likely, then the
uncertainty should increase as the number of outcomes increases.
Property H3 The third property is subtler. It says that if an outcome
of X is thought of as a choice, and if that choice can be broken down into
two successive choices, then the original value of H is a weighted sum of the
values of H for the successive choices. In order to quantify this intuition, we
consider random variables X, Y , and Z1, . . . , Zn taking values in the sets
X : Ω −→ {xij : 1 ≤ i ≤ n and 1 ≤ j ≤ mi},
Y : Ω −→ {Z1, . . . , Zn},
Zi : Ω −→ {xij : 1 ≤ j ≤ mi},
and satisfying
Pr(X = xij) = Pr(Y = Zi and Zi = xij).
This reflects the intuition that the outcome X = xij is being broken down
into the successive choices Y = Zi followed by Zi = xij. Then Property H3 is
the formula
H(X) = H(Y) + Σ_{i=1}^{n} Pr(Y = Zi) H(Zi).
²⁰Although this notation is useful, it is important to remember that the domain of H
is the set of random variables, not the set of n-tuples for some fixed value of n. Thus the
domain of H is itself a set of functions.
Example 5.58. Let Xn be a uniformly distributed random variable on
n objects. Then we claim that
H(Xn²) = 2H(Xn).
To see this, we view Xn² as choosing an element from {xij : 1 ≤ i, j ≤ n},
and we break this choice into two choices by first choosing an index i, and
then choosing an index j. Property H3 says that

H(Xn²) = H(Xn) + Σ_{i=1}^{n} (1/n) H(Xn) = 2H(Xn).
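Using the explicit entropy formula (5.51) of Theorem 5.60 below, this identity is easy to check numerically; a short Python sketch (the helper H is ours):

```python
from math import log2

def H(probs):
    # Shannon entropy (5.51), with the convention 0 * log2(0) = 0
    return -sum(p * log2(p) for p in probs if p > 0)

n = 7
H_n  = H([1/n] * n)            # uniform on n outcomes: log2(n)
H_n2 = H([1/n**2] * (n * n))   # uniform on n^2 outcomes: log2(n^2)
assert abs(H_n2 - 2 * H_n) < 1e-9
```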
Example 5.59. We illustrate Property H3 with a more elaborate example. Sup-
pose that X has five possible outcomes {x1, x2, x3, x4, x5} with probabilities
fX(x1) = 1/2,  fX(x2) = 1/4,  fX(x3) = 1/12,  fX(x4) = 1/8,  fX(x5) = 1/24.
The five outcomes for X are illustrated by the branched tree in Fig. 5.2a.
Now suppose that X is written as two successive choices, the first deciding
between the subsets {x1, x2, x3} and {x4, x5}, and the second choosing an
element of the designated subset. So we have random variables Y , Z1, Z2,
where
fY(Z1) = 5/6 and fY(Z2) = 1/6,
and
fZ1(x1) = 3/5,  fZ1(x2) = 3/10,  fZ1(x3) = 1/10,  fZ2(x4) = 3/4,  fZ2(x5) = 1/4,
as illustrated in Fig. 5.2b. Then Property H3 for this example says that

H(1/2, 1/4, 1/12, 1/8, 1/24) = H(5/6, 1/6) + (5/6) H(3/5, 3/10, 1/10) + (1/6) H(3/4, 1/4),

where the left-hand side is H(X) and the three terms on the right are H(Y),
the weighted H(Z1), and the weighted H(Z2), respectively.
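Both sides of this identity can be checked numerically with the entropy formula (5.51) of Theorem 5.60 below (a sketch; the helper H is ours):

```python
from math import log2

def H(probs):
    # Shannon entropy (5.51), with the convention 0 * log2(0) = 0
    return -sum(p * log2(p) for p in probs if p > 0)

lhs = H([1/2, 1/4, 1/12, 1/8, 1/24])          # H(X)
rhs = (H([5/6, 1/6])                          # H(Y)
       + 5/6 * H([3/5, 3/10, 1/10])           # Pr(Y = Z1) times H(Z1)
       + 1/6 * H([3/4, 1/4]))                 # Pr(Y = Z2) times H(Z2)
assert abs(lhs - rhs) < 1e-9                  # both sides agree (about 1.865)
```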
Theorem 5.60. Every function having Properties H1, H2, and H3 is a constant
multiple of the function

H(p1, . . . , pn) = −Σ_{i=1}^{n} pi log2 pi,     (5.51)

where log2 denotes the logarithm to the base 2, and if p = 0, then we set
p log2 p = 0.²¹
²¹This convention makes sense, since we want H to be continuous in the pi's, and it is
true that lim_{p→0} p log2 p = 0.
[Figure 5.2: Splitting X into Y followed by Z1 or Z2. (a) Five outcomes of a
choice. (b) Splitting into two choices.]
Proof. See Shannon’s paper [126].
To illustrate the notion of uncertainty, consider what happens when one
of the probabilities pi is one and the other probabilities are zero. In this case,
the formula (5.51) for entropy gives H(p1, . . . , pn) = 0, which makes sense,
since there is no uncertainty about the outcome of an experiment having only
one possible outcome.
It turns out that the other extreme, namely maximal uncertainty, occurs
when all of the probabilities pi are equal. In order to prove this, we use an
important inequality from real analysis known as Jensen’s inequality. Before
stating Jensen’s inequality, we first need a definition.
Definition. A function F on the real line is called concave (down) on an
interval I if the following inequality is true for all 0 ≤ α ≤ 1 and all s and t
in I:
(1 − α)F(s) + αF(t) ≤ F((1 − α)s + αt).     (5.52)
This definition may seem mysterious, but it has a simple geometric interpretation.
Notice that if we fix s and t and let α vary from 0 to 1, then the
points (1 − α)s + αt trace out the interval from s to t on the real line. So
inequality (5.52) is the geometric statement that the line segment connecting
any two points on the graph of F lies below the graph of F. For example, the
function F(t) = 1 − t² is concave. Illustrations of concave and nonconcave
functions, with representative line segments, are given in Fig. 5.3. If the function F
has a second derivative, then the second derivative test that you learned in
calculus can be used to test for concavity (see Exercise 5.54).
Theorem 5.61 (Jensen’s Inequality). Suppose that F is concave on an
interval I, and let α1, α2, . . . , αn be nonnegative numbers satisfying
α1 + α2 + · · · + αn = 1.
[Figure 5.3: An illustration of concavity. (a) A concave function. (b) A
nonconcave function.]
Then

Σ_{i=1}^{n} αi F(ti) ≤ F(Σ_{i=1}^{n} αi ti)     for all t1, t2, . . . , tn ∈ I. (5.53)

Further, equality holds in (5.53) if and only if either F is a linear function
or t1 = t2 = · · · = tn.
Proof. Notice that for n = 2, the desired inequality (5.53) is exactly the
definition of concavity (5.52). The general case is then proven by induction;
see Exercise 5.55.
Corollary 5.62. Let X be a random variable that takes on finitely many
possible values x1, . . . , xn.
(a) H(X) ≤ log2 n.
(b) H(X) = log2 n if and only if every event X = xi occurs with the same
probability 1/n.
Proof. Let pi = Pr(X = xi) for i = 1, 2, . . . , n. Then p1 + · · · + pn = 1, so we
may apply Jensen’s inequality to the function F(t) = log2 t with αi = pi and
ti = 1/pi. (See Exercise 5.54 for a proof that log2 t is a concave function.) The
left-hand side of (5.53) is exactly the formula for entropy (5.51), so we find
that
H(X) = −Σ_{i=1}^{n} pi log2 pi = Σ_{i=1}^{n} pi log2(1/pi) ≤ log2(Σ_{i=1}^{n} pi · (1/pi)) = log2 n.
This proves (a). Further, the function log2 t is not linear, so equality occurs if
and only if p1 = p2 = · · · = pn, i.e., if all of the probabilities satisfy pi = 1/n.
This proves (b).
Notice that Corollary 5.62 says that entropy is maximized when all of
the probabilities are equal. This conforms to our intuitive understanding that
uncertainty is maximized when every outcome is equally likely.
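Corollary 5.62 is easy to probe numerically: the uniform distribution attains log2 n exactly, while randomly chosen distributions never exceed it. A quick sketch (helper names ours):

```python
from math import log2
import random

def H(probs):
    # Shannon entropy (5.51), with the convention 0 * log2(0) = 0
    return -sum(p * log2(p) for p in probs if p > 0)

n = 6
assert abs(H([1/n] * n) - log2(n)) < 1e-9   # part (b): uniform attains log2(n)

random.seed(0)
for _ in range(1000):                       # part (a): nothing exceeds log2(n)
    w = [random.random() for _ in range(n)]
    p = [x / sum(w) for x in w]
    assert H(p) <= log2(n) + 1e-9
```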
The theory of entropy is applied to cryptography by computing the en-
tropy of random variables such as K, M, and C that are associated with
the cryptosystem and comparing the actual values with the maximum pos-
sible values. Clearly the more entropy there is, the better for the user, since
increased uncertainty makes the cryptanalyst’s job harder.
For instance, consider a shift cipher and the random variable K associ-
ated with its keys. The random variable K has 26 possible values, since the
shift may be any integer between 0 and 25, and each shift amount is equally
probable, so K has maximal entropy H(K) = log2(26).
Example 5.63. We consider the system with two keys described in Exam-
ple 5.54 on page 265. Each key is equally likely, so H(K) = log2(2) = 1. Simi-
larly, we can use the plaintext probabilities for this system as given by (5.48)
to compute the entropy of the random variable M associated to the plaintexts.
H(M) = −(1/4) log2(1/4) − (1/4) log2(1/4) − (1/2) log2(1/2) = 3/2 = 1.5.
Notice that H(M) is slightly smaller than log2(3) ≈ 1.585, which would be
the maximal possible entropy for M in a cryptosystem with three plaintexts.
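A quick check of this computation (the helper H is ours):

```python
from math import log2

def H(probs):
    # Shannon entropy (5.51), with the convention 0 * log2(0) = 0
    return -sum(p * log2(p) for p in probs if p > 0)

H_K = H([1/2, 1/2])        # two equally likely keys
H_M = H([1/4, 1/4, 1/2])   # plaintext probabilities from (5.48)
assert abs(H_K - 1.0) < 1e-9
assert abs(H_M - 1.5) < 1e-9
assert H_M < log2(3)       # strictly below the maximum log2(3), about 1.585
```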
We now introduce the concept of conditional entropy and its application
to secrecy systems. Suppose that a signal is sent over a noisy channel, which
means that the signal may be distorted during transmission. Shannon [126]
defines the equivocation to be the conditional entropy of the original signal,
given the received signal. He uses this quantity to measure the amount of un-
certainty in transmissions across a noisy channel. Shannon [127] later observed
that a noisy communication channel is also a model for a secrecy system. The
original signal (the plaintext) is “distorted” by applying the encryption pro-
cess, and the received signal (the ciphertext) is thus a noisy version of the
original signal. In this way, the notion of equivocation can be applied to cryp-
tography, where a large equivocation says that the ciphertext conceals most
information about the plaintext.
Definition. Let X and Y be random variables, and let x1, . . . , xn be the
possible values of X and y1, . . . , ym the possible values of Y . The equivocation,
or conditional entropy, of X on Y is the quantity H(X | Y ) defined by
H(X | Y) = −Σ_{i=1}^{n} Σ_{j=1}^{m} fY(yj) fX|Y(xi | yj) log2 fX|Y(xi | yj).
When X = K is the key random variable and Y = C is the cipher-
text random variable, the quantity H(K | C) is called the key equivocation.
It measures the total amount of information about the key revealed by the
ciphertext, or more precisely, it is the expected value of the conditional en-
tropy H(K | c) of K given a single observation c of C. The key equivocation
can be determined by computing all of the conditional probabilities f(k | c)
of the cryptosystem. Alternatively, one can use the following result.
Proposition 5.64. The key equivocation of a cryptosystem (K, M, C) is
related to the individual entropies of K, M, and C by the formula
H(K | C) = H(K) + H(M) − H(C). (5.54)
Proof. We leave the proof as an exercise; see Exercise 5.57.
Example 5.65. We compute the key equivocation of the cryptosystem described
in Examples 5.54 and 5.63. We already computed H(K) = 1 and
H(M) = 3/2, so it remains to compute H(C). To do this, we need the values
of f(c) for each ciphertext c ∈ C. We already computed f(c1) = 1/4, and a
similar computation using (5.48) and Table 5.12 yields
f(c2) = f(k1)f(m1) + f(k2)f(m3) = (1/2)(1/4) + (1/2)(1/2) = 3/8,
f(c3) = f(k1)f(m3) + f(k2)f(m2) = (1/2)(1/2) + (1/2)(1/4) = 3/8.
Therefore,
H(C) = −(1/4) log2(1/4) − 2 · (3/8) log2(3/8) ≈ 1.56,
and using (5.54), we find that
H(K | C) = H(K) + H(M) − H(C) ≈ 1 + 1.5 − 1.56 ≈ 0.94.
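The whole example can be replayed in a few lines of Python. The encryption table below is our reconstruction from the f(c) computations above (k1 sends m1, m2, m3 to c2, c1, c3 and k2 sends them to c1, c3, c2); with it we can compute H(K | C) directly from its definition and confirm formula (5.54).

```python
from math import log2

def H(probs):
    # Shannon entropy (5.51), with the convention 0 * log2(0) = 0
    return -sum(p * log2(p) for p in probs if p > 0)

f_K = {"k1": 1/2, "k2": 1/2}
f_M = {"m1": 1/4, "m2": 1/4, "m3": 1/2}
# Encryption table reconstructed from the f(c2), f(c3) computations above
enc = {("k1", "m1"): "c2", ("k1", "m2"): "c1", ("k1", "m3"): "c3",
       ("k2", "m1"): "c1", ("k2", "m2"): "c3", ("k2", "m3"): "c2"}

f_KC, f_C = {}, {}                       # joint f(k, c) and marginal f(c)
for (k, m), c in enc.items():
    p = f_K[k] * f_M[m]
    f_KC[(k, c)] = f_KC.get((k, c), 0) + p
    f_C[c] = f_C.get(c, 0) + p

H_K, H_M, H_C = H(f_K.values()), H(f_M.values()), H(f_C.values())
# H(K | C) from the definition, using f(k | c) = f(k, c)/f(c)
H_K_given_C = -sum(p * log2(p / f_C[c]) for (k, c), p in f_KC.items())

assert abs(H_K_given_C - (H_K + H_M - H_C)) < 1e-9   # confirms (5.54)
assert abs(H_K_given_C - 0.9387) < 1e-3              # about 0.94, as computed
```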
5.6.3 Redundancy and the Entropy
of Natural Language
Suppose that the plaintext is written in a natural language such as English.²²
Then nearby letters, or nearby bits if the letters are converted to ASCII, are
heavily dependent on one another, rather than looking random. For exam-
ple, correlations between successive letters (bigrams or trigrams) can aid the
cryptanalyst, as we saw when we cryptanalyzed a simple substitution cipher
in Sect. 1.1. In this section we use the notion of entropy to quantify the re-
dundancy inherent in a natural language.
We start by approximating the entropy of a single letter in English text.
Let L denote the random variable whose values are the letters of the English
language E with their associated probabilities as given in Table 1.3 on page 6.
For example, the table says that
fL(A) = 0.0815, fL(B) = 0.0144, fL(C) = 0.0276, . . . , fL(Z) = 0.0008.
²²It should be noted that when implementing a modern public key cipher, one generally
combines the plaintext with some random bits and then performs some sort of invertible
transformation so that the resulting secondary plaintext looks more like a string of random
bits. See Sect. 8.6.
We can use the values in Table 1.3 to compute the entropy of a single letter
in English text,
H(L) = −0.0815 log2(0.0815) − · · · − 0.0008 log2(0.0008) ≈ 4.132.
If every letter were equally likely, the entropy would be log2(26) ≈ 4.7. The
fact that the entropy is only 4.132 shows that some letters in English are more
prevalent than others.
The concept of entropy can be used to measure the amount of information
conveyed by a language. Shannon [126] shows that H(L) can be interpreted
as the average number of bits of information conveyed by a single letter of a
language. The value of H(L) that we computed does reveal some redundancy:
it says that a letter conveys only 4.132 bits of information on average, although
it takes 4.7 bits on average to specify a letter in the English alphabet.
The fact that natural languages contain redundancy is obvious. For ex-
ample, you will probably be able to read the following sentence, despite our
having removed almost 40 % of the letters:
Th prblms o crptgry nd scrcy sysms frnsh n ntrstng aplcatn o comm thry.
However, the entropy H(L) of a single letter does not take into account
correlations between nearby letters, so it alone does not give a good value for
the redundancy of the English language E. As a first step, we should take into
account the correlations between pairs of letters (bigrams). Let L² denote the
random variable whose values are pairs of English letters as they appear in
typical English text. Some bigrams appear fairly frequently, for example
fL²(TH) = 0.00315 and fL²(AN) = 0.00172.
Others, such as JX and ZQ, never occur. Just as Table 1.3 was created ex-
perimentally by counting the letters in a long sample text, we can create a
frequency table of bigrams and use it to obtain an experimental value for L².
This leads to a value of H(L²) ≈ 7.12, so on average, each letter of E has
entropy equal to half this value, namely 3.56. Continuing, we could
experimentally compute the entropy of L³, which is the random variable whose
values are trigrams (triples of letters), and then (1/3)H(L³) would be an even
better approximation to the entropy of E. Of course, we need to analyze a
great deal of text in order to obtain a reliable estimate for trigram frequencies,
and the problem becomes even harder as we look at L⁴, L⁵, L⁶, and so on.
However, this idea leads to the following important concept.
Definition. Let L be a language (e.g., English or French or C++), and for
each n ≥ 1, let Lⁿ denote the random variable whose values are strings
of n consecutive characters of L. The entropy of L is defined to be the
quantity²³
²³To be rigorous, one should really define upper and lower densities using lim inf and
lim sup, since it is not clear that the limit defining H(L) exists. We will not worry about such
niceties here.
H(L) = lim_{n→∞} H(Lⁿ)/n.
Although it is not possible to precisely determine the entropy of the English
language E, experimentally it appears that
1.0 ≤ H(E) ≤ 1.5.
This means that despite the fact that it requires almost five bits to represent
each of the 26 letters used in English, each letter conveys less than one and a
half bits of information. Thus English is approximately 70 % redundant!²⁴
5.6.4 The Algebra of Secrecy Systems
We make only a few brief remarks about the algebra of cryptosystems. In [127],
Shannon considers ways of building new cryptosystems by taking algebraic
combinations of old ones. The new systems are described in terms of linear
combinations and products of the original encryption transformations.
Example 5.66 (Summation Systems). If R and T are two secrecy systems,
then Shannon defines the weighted sum of R and T to be
S = pR + qT, where p + q = 1.
The meaning of this notation is as follows. First one chooses either R or T,
where the probability of choosing R is p and the probability of choosing T is q.
Imagine that the choice is made by flipping an unbalanced coin, but note that
both Bob and Alice need to have a copy of the output of the coin tosses. In
other words, the list of choices, or a method for generating the list of choices,
forms part of their private key.
The notion of summation extends to the sum of any number of secrecy
systems. The systems R and T need to have the same message space, but
they need not act on messages in a similar way. For example, the system R
could be a substitution cipher and the system T could be a shift cipher. As
another example, suppose that Ti is the shift cipher that encrypts a letter
of the alphabet by shifting it i places. Then the system that encrypts by
choosing a shift at random and encrypting according to the chosen shift is the
summation cipher
Σ_{i=0}^{25} (1/26) Ti.
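This random-shift summation cipher is simple to sketch in Python (helper names ours; a call to the random module stands in for the coin flips, and the drawn shift i forms part of the shared key, as discussed above):

```python
import random
import string

A = string.ascii_uppercase

def shift_encrypt(i, m):
    # The shift cipher T_i: move each letter i places forward (mod 26)
    return "".join(A[(A.index(ch) + i) % 26] for ch in m)

random.seed(1)
i = random.randrange(26)            # choose T_i with probability 1/26
c = shift_encrypt(i, "HELLO")
assert shift_encrypt((26 - i) % 26, c) == "HELLO"   # Bob undoes the shared shift
```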
Example 5.67 (Product Systems). In order to define the product of two cryp-
tosystems, it is necessary that the ciphertexts of the first system be plaintexts
for the second system. Thus let
²⁴This does not mean that one can remove 70 % of the letters and still have an intelligible
message. What it means is that in principle, it is possible to take a long message that
requires 4.7 bits to specify each letter and to compress it into a form that takes only 30 % as
many bits.
e : M → C and e′ : M′ → C′

be two encryption functions, and suppose that C = M′, or more generally,
that C ⊆ M′. Then the product system e′ · e is defined to be the composition
of e and e′,

e′ · e : M --e--> C ⊆ M′ --e′--> C′.
Product ciphers provide a means to strengthen security. They were used in the
development of DES, the Data Encryption Standard [97], the first national
standard for symmetric encryption. DES features several rounds of a cipher
called S-box encryption, so it is a multiple product of a cipher with itself.
Further, each round consists of the composition of several different transfor-
mations. The use of product ciphers continues to be of importance in the
development of new symmetric ciphers, including AES, the Advanced En-
cryption Standard. See Sect. 8.12 for a brief discussion of DES and AES.
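As a toy illustration of a product system e′ · e, the following sketch composes a shift cipher with the self-inverse "Atbash" substitution (the choice of component ciphers is our own):

```python
import string

A = string.ascii_uppercase
ATBASH = str.maketrans(A, A[::-1])    # a simple substitution, its own inverse

def shift(m, k):
    # The shift cipher: move each letter k places forward (mod 26)
    return "".join(A[(A.index(ch) + k) % 26] for ch in m)

def product_encrypt(m, k):
    # e' . e : first apply the shift e, then the substitution e'
    return shift(m, k).translate(ATBASH)

c = product_encrypt("SECRET", 5)
# To decrypt, invert the component ciphers in the opposite order:
assert shift(c.translate(ATBASH), (26 - 5) % 26) == "SECRET"
```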
5.7 Complexity Theory and P Versus NP
A decision problem is a problem in a formal system that has a yes or no
answer. For example, PRIME is the decision problem of determining whether
a given integer is a prime. We discussed this problem in Sect. 3.4. Another
example is the decision Diffie–Hellman problem (Exercise 2.7): given gᵃ mod p
and gᵇ mod p, determine whether a given number C is equal to gᵃᵇ mod p.
Complexity theory attempts to understand and quantify the difficulty of
solving particular decision problems.
The early history of this field is fascinating, as mathematicians tried to
come to grips with the limitations on provability within formal systems.
In 1936, Alan Turing proved that there is no algorithm that solves the halting
problem. That is, there is no algorithm to determine whether an arbitrary
computer program, given an arbitrary input, eventually halts execution. Such
a problem is called undecidable. Earlier in that same year, Alonzo Church
had published a proof of undecidability of a problem in the lambda calculus.
He and Turing then showed that the lambda calculus and the notion of Tur-
ing machine are essentially equivalent. The breakthroughs on the theory of
undecidability that appeared in the 1930s and 1940s began as a response to
Hilbert’s questions about the completeness of axiomatic systems and whether
there exist unsolvable mathematical problems. Indeed, both Church and Tur-
ing were influenced by Gödel’s discovery in 1930 that all sufficiently strong and
precise axiomatic systems are incomplete, i.e., they contain true statements
that are unprovable within the system.
There are uncountably many undecidable problems in mathematics, some
of which have simple and interesting formulations. Here is an example of an
easy to state undecidable problem called Post’s correspondence problem [106].
Suppose that you are given a sequence of pairs of strings,
(s1, t1), (s2, t2), (s3, t3), . . . , (sk, tk),
where a string is simply a list of characters from some alphabet containing
at least two letters. The correspondence problem asks you to decide whether
there is an integer r ≥ 1 and a list of indices
i1, i2, . . . , ir between 1 and k (5.55)
such that the concatenations
si1 ∥ si2 ∥ · · · ∥ sir and ti1 ∥ ti2 ∥ · · · ∥ tir are equal. (5.56)
Note that if we bound the value of r, say r ≤ r0, then the problem becomes
decidable, since there are only a finite number of concatenations to check. The
problem with r restricted in this way is called the bounded Post correspondence
problem.
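The bounded problem can be solved by exactly the exhaustive search just described; a short Python sketch (the instance below is a standard small example, not taken from the text):

```python
from itertools import product

def bounded_pcp(pairs, r0):
    # Try every index list of length r <= r0; exponential in r0, as noted below.
    for r in range(1, r0 + 1):
        for idx in product(range(len(pairs)), repeat=r):
            s = "".join(pairs[i][0] for i in idx)
            t = "".join(pairs[i][1] for i in idx)
            if s == t:
                return idx
    return None

pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
idx = bounded_pcp(pairs, 4)
assert idx == (2, 1, 2, 0)   # bba+ab+bba+a == bb+aa+bb+baa == "bbaabbbaa"
```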
On the other end of the spectrum are decision problems for which there
exist quick algorithms leading to their solutions. We have already talked about
algorithms being fast if they run in polynomial time and slow if they take
exponential time; see the discussion in Sects. 2.6 and 3.4.2.
Definition. A decision problem belongs to the class P if there exists a
polynomial-time algorithm that solves it. That is, given an input of length n,
the answer will be produced in a polynomial (in n) number of steps. One says
that the decision problems in P are those that can be solved in polynomial
time.
The concept of verification in polynomial time has some subtlety that
can be captured only by a more precise definition, which we do not give.
The class NP is defined by the concept of a polynomial-time algorithm on
a “nondeterministic” machine. This means, roughly speaking, that we are
allowed to guess a solution, but the verification time to check that the guessed
solution is correct must be polynomial in the length of the input.
An example of a decision problem in P is that of determining whether
two integers have a nontrivial common factor. This problem is in P because
the Euclidean algorithm takes fewer than O(n³) steps. (Note that in this
setting, the Euclidean algorithm takes more than O(n) steps, since we need
to take account of the time it takes to add, subtract, multiply, and divide n-bit
numbers.) Another decision problem in P is that of determining whether a
given integer is prime. The famous AKS algorithm, Theorem 3.26, takes fewer
than O(n⁷) steps to check primality.
Definition. A decision problem belongs to the class NP if a yes-instance of
the problem can be verified in polynomial time.
For example, the bounded Post correspondence problem is in NP. It is
clear that if you are given a list of indices (5.55) of bounded length such that
the concatenations (5.56) are alleged to be the same, it takes a polynomial
number of steps to verify that the concatenations are indeed the same. On the
other hand, exhaustively checking all possible concatenations of length up
to r0 takes an exponential (in r0) number of steps. It is not known whether
one can find a solution in a polynomial number of steps.
This brings us to one of the most famous open questions in all of mathe-
matics and computer science²⁵:
Does P = NP?
Since the status of P versus NP is currently unresolved, it is useful to
characterize problems in terms of their relative difficulty. We say that prob-
lem A can be (polynomial-time) reduced to problem B if there is a constructive
polynomial-time transformation that takes any instance of A and maps it to
an instance of B. Thus any algorithm for solving B can be transformed into
an algorithm for solving A. Hence if problem B belongs to P, and if A is
reducible to B, then A also belongs to P. The intuition is that if A can be
reduced to B, then solving A is no harder than solving B (up to a polynomial
amount of computation).
Stephen Cook’s 1971 paper [30] entitled “The Complexity of Theorem
Proving Procedures” laid the foundations of the theory of NP-completeness.
In this paper, Cook works with a certain NP problem called “Satisfiability”
(abbreviated SAT). The SAT problem asks, given a Boolean expression in-
volving only variables, parentheses, OR, AND and NOT, whether there exists
an assignment of truth values that makes the expression true. Cook proves
that SAT has the following properties:
1. Every NP problem is polynomial-time reducible to SAT.
2. If there exists any problem in NP that fails to be in P, then SAT is not
in P.
A problem that has these two properties is said to be NP-complete. Since
the publication of Cook’s paper, many other problems have been shown to
be NP-complete.
A related notion is that of NP-hardness. We say that a problem is NP-
hard if it has the reducibility property (1), although the problem itself need not
belong to NP. All NP-complete problems are NP-hard, but not conversely.
For example, the halting problem is NP-hard, but not NP-complete.
In order to put our informal discussion onto a firm mathematical footing,
it is necessary to introduce some formalism. We start with a finite set of
symbols Σ, and we denote by Σ∗ the set of all (finite) strings of these symbols.
A subset of Σ∗ is called a language. A decision problem is defined to be the
problem of deciding whether an input string belongs to a language. The precise
definitions of P and NP are then given within this formal framework, which
²⁵As mentioned in Sect. 2.1, the question of whether P = NP is one of the $1,000,000
Millennium Prize problems.
we shall not develop further here. For an excellent introduction to the theory
of complexity, see [46], and for additional material on complexity theory as it
relates to cryptography, see for example [143, Chapters 2 and 3].
Up to now we have been discussing the complexity theory of decision prob-
lems, but not every problem has a yes/no answer. For example, the problem of
integer factorization (given a composite number, find a nontrivial factor) has
a solution that is an integer, as does the discrete logarithm problem (given g
and h in F∗p, find an x such that gˣ = h). It is possible to formulate a
theory of complexity for general computational problems, but we are content
to give two examples. First, the integer factorization problem is in NP, since
given an integer N and a putative factor m, it can be verified in polynomial-
time that m divides N. Second, the discrete logarithm problem is in NP,
since given a supposed solution x, one can verify in polynomial time (using
the fast powering algorithm) that gˣ = h. It is not known whether either of
these computational problems is in P, i.e., there are no known polynomial-
time algorithms for either integer factorization or for discrete logarithms. The
current general consensus seems to be that they are probably not in P.
We turn now to the role of complexity theory in some of the problems
that arise in cryptography. The problems of factoring integers and finding
discrete logarithms are presumed to be difficult, since no one has yet discov-
ered polynomial-time algorithms to produce solutions. However, the problem
of producing a solution (this is called the function problem) may be different
from the decision problem of determining whether a solution exists. Here is a
version of the factoring problem phrased as a decision problem:
Does there exist a nontrivial factor of N that is less than k?
As we can see, a yes instance of this problem (i.e., N is composite) has a
(trivial) polynomial-time verification algorithm, and so this decision problem
belongs to NP. It can also be shown that the complementary problem belongs
to NP. That is, if N is a no instance (i.e., N is prime), then the primality
of N can be verified in polynomial time on a nondeterministic Turing machine.
When both the yes and no instances of a problem can be verified in polynomial
time, the decision problem is said to belong to the class co-NP. Since it is
widely believed that NP is not the same as co-NP, it was also believed
that factoring is not an NP-complete problem. In 2004, Agrawal, Kayal and
Saxena [1] showed that the decision problem of determining whether a number
is prime does indeed belong to P, settling the long-standing question whether
this decision problem could be NP-complete.
A cryptosystem is only as secure as its underlying hard problem, so it
would be desirable to construct cryptosystems based on NP-hard problems.
There has been a great deal of interest in building efficient public key cryp-
tosystems of this sort. A major difficulty is that one needs not only an NP-
hard problem, but also a trapdoor to the problem to use for decryption. This
has led to a number of cryptosystems that are based on special cases of NP-
hard problems, but it is not known whether these special cases are themselves
NP-hard.
The first example of a public key cryptosystem built around an NP-
complete problem was the knapsack cryptosystem of Merkle and Hellman.
More precisely, they based their cryptosystem on the subset-sum problem,
which asks the following:
Given n positive integers a1, . . . , an and a target
sum S, find a subset of the ai such that

ai1 + ai2 + · · · + ait = S.
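A brute-force search solves small instances but takes exponential time in n, which is why the problem is only hard in general. A sketch with a made-up instance (helper names ours):

```python
from itertools import combinations

def subset_sum(a, S):
    # Exhaustive search over all 2^n subsets; feasible only for small n.
    for t in range(1, len(a) + 1):
        for idx in combinations(range(len(a)), t):
            if sum(a[i] for i in idx) == S:
                return idx
    return None

a = [3, 7, 19, 24, 38, 51]
assert subset_sum(a, 46) == (0, 2, 3)   # 3 + 19 + 24 = 46
assert subset_sum(a, 2) is None         # no subset sums to 2
```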
The subset-sum problem is NP-complete, since one can show that any in-
stance of SAT can be reduced to an instance of the subset-sum problem, and
vice versa. In order to build a public key cryptosystem based on the (hard)
subset-sum problem, Merkle and Hellman needed to build a trapdoor into the
problem. They did this by using only certain special cases of the subset-sum
problem, but unfortunately it turned out that these special cases are signifi-
cantly easier than the general case and their cryptosystem was broken. And
despite further work by a number of cryptographers, no one has been able to
build a subset-sum cryptosystem that is both efficient and secure. See Sect. 7.2
for a detailed discussion of how subset-sum cryptosystems work and how they
are broken.
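To make the subset-sum problem concrete, here is a minimal brute-force solver (our own illustration, not from the text). It inspects all 2^n subsets, which is exactly the exponential behavior that makes the general problem hard.

```python
from itertools import combinations

def subset_sum(a, S):
    """Brute-force search for a subset of a summing to S.

    Tries subsets of every size t = 0, 1, ..., n, so the running
    time is exponential in n.
    """
    for t in range(len(a) + 1):
        for combo in combinations(range(len(a)), t):
            if sum(a[i] for i in combo) == S:
                return [a[i] for i in combo]
    return None

# Example: find a subset of {3, 7, 9, 13, 22} summing to 29.
print(subset_sum([3, 7, 9, 13, 22], 29))  # [7, 22]
```

Already at n = 40 or so this exhaustive search becomes infeasible, which is the point of the exercise sets in Sect. 7.2.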
Another cautionary note in going from theory to practice comes from the
fact that even if a certain collection of problems is NP-hard, that does not
mean that every problem in the collection is hard. In some sense, NP-hardness
measures the difficulty of the hardest problem in the collection, not the average
problem. It would not be good to base a cryptosystem on a problem for which
a few instances are very hard, but most instances are very easy. Ideally, we
want to use a collection of problems with the property that most instances are
NP-hard. An interesting example is the closest vector problem (CVP), which
involves finding a vector in a lattice that is close to a given vector. We discuss
lattices and CVP in Chap. 7, but for now we note that CVP is NP-hard.
Our interest in CVP stems from a famous result of Ajtai and Dwork [4] in
which they construct a cryptosystem based on CVP in a certain set of lattices.
They show that the average difficulty of solving CVP for their lattices can be
polynomially reduced to solving the hardest instance of CVP in a similar set of
lattices (of somewhat smaller dimension). Although not practical, their public
key cryptosystem was the first construction exhibiting worst-case/average-
case equivalence.
Exercises
Section 5.1. Basic Principles of Counting
5.1. The Rhind papyrus is an ancient Egyptian mathematical manuscript that is
more than 3500 years old. Problem 79 of the Rhind papyrus poses a problem that
can be paraphrased as follows: there are seven houses; in each house live seven cats;
each cat kills seven mice; each mouse has eaten seven spelt seeds26; each spelt seed
would have produced seven hekat27 of spelt. What is the sum of all of the named
items? Solve this 3500-year-old problem.
5.2. (a) How many n-tuples (x1, x2, . . . , xn) are there if the coordinates are required
to be integers satisfying 0 ≤ xi < q?
(b) Same question as (a), except now there are separate bounds 0 ≤ xi < qi for
each coordinate.
(c) How many n-by-n matrices are there if the entries xi,j of the matrix are integers
satisfying 0 ≤ xi,j < q?
(d) Same question as (a), except now the order of the coordinates does not matter.
So for example, (0, 0, 1, 3) and (1, 0, 3, 0) are considered the same. (This one is
rather tricky.)
(e) Twelve students are each taking four classes, for each class they need two loose-
leaf notebooks, for each notebook they need 100 sheets of paper, and each sheet
of paper has 32 lines on it. Altogether, how many students, classes, notebooks,
sheets, and lines are there? (Bonus. Make this or a similar problem of your own
devising into a rhyme like the St. Ives riddle.)
5.3. (a) List all of the permutations of the set {A, B, C}.
(b) List all of the permutations of the set {1, 2, 3, 4}.
(c) How many permutations are there of the set {1, 2, . . . , 20}?
(d) Seven students are to be assigned to seven dormitory rooms, each student
receiving his or her own room. In how many ways can this be done?
(e) How many different words can be formed with the four symbols A, A, B, C?
5.4. (a) List the 24 possible permutations of the letters A1, A2, B1, B2. If A1 is
indistinguishable from A2, and B1 is indistinguishable from B2, show how the
permutations become grouped into 6 distinct letter arrangements, each con-
taining 4 of the original 24 permutations.
(b) Using the seven symbols A, A, A, A, B, B, B, how many different seven letter
words can be formed?
(c) Using the nine symbols A, A, A, A, B, B, B, C, C, how many different nine letter
words can be formed?
(d) Using the seven symbols A, A, A, A, B, B, B, how many different five letter
words can be formed?
5.5. (a) There are 100 students eligible for an award, and the winner gets to
choose from among 5 different possible prizes. How many possible outcomes
are there?
(b) Same as in (a), but this time there is a first place winner, a second place winner,
and a third place winner, each of whom gets to select a prize. However, there
is only one of each prize. How many possible outcomes are there?
(c) Same as in (b), except that there are multiple copies of each prize, so each of the
three winners may choose any of the prizes. Now how many possible outcomes
are there? Is this larger or smaller than your answer from (b)?
26Spelt is an ancient type of wheat.
27A hekat is 1/30 of a cubic cubit, which is approximately 4.8 liters.
(d) Same as in (c), except that rather than specifying a first, second, and third place
winner, we just choose three winning students without differentiating between
them. Now how many possible outcomes are there? Compare the size of your
answers to (b), (c), and (d).
5.6. Use the binomial theorem (Theorem 5.10) to compute each of the following
quantities.
(a) (5z + 2)^3
(b) (2a − 3b)^4
(c) (x − 2)^5
5.7. The binomial coefficients satisfy many interesting identities. Give three proofs
of the identity
  \binom{n}{j} = \binom{n-1}{j-1} + \binom{n-1}{j}.
(a) For Proof #1, use the definition of \binom{n}{j} as n!/((n − j)! j!).
(b) For Proof #2, use the binomial theorem (Theorem 5.10) and compare the
coefficients of x^j y^{n−j} on the two sides of the identity
  (x + y)^n = (x + y)(x + y)^{n−1}.
(c) For Proof #3, argue directly that choosing j objects from a set of n objects
can be decomposed into either choosing j − 1 objects from n − 1 objects or
choosing j objects from n − 1 objects.
5.8. Let p be a prime number. This exercise sketches another proof of Fermat’s
little theorem (Theorem 1.24).
(a) If 1 ≤ j ≤ p − 1, prove that the binomial coefficient \binom{p}{j} is divisible by p.
(b) Use (a) and the binomial theorem (Theorem 5.10) to prove that
  (a + b)^p ≡ a^p + b^p (mod p) for all a, b ∈ Z.
(c) Use (b) with b = 1 and induction on a to prove that a^p ≡ a (mod p) for
all a ≥ 0.
(d) Use (c) to deduce that a^{p−1} ≡ 1 (mod p) for all a with gcd(p, a) = 1.
5.9. We know that there are n! different permutations of the set {1, 2, . . . , n}.
(a) How many of these permutations leave no number fixed?
(b) How many of these permutations leave at least one number fixed?
(c) How many of these permutations leave exactly one number fixed?
(d) How many of these permutations leave at least two numbers fixed?
For each part of this problem, give a formula or algorithm that can be used to
compute the answer for an arbitrary value of n, and then compute the value for n =
10 and n = 26. (This exercise generalizes Exercise 1.5.)
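Exercise 5.9 concerns permutations with no fixed point (derangements). As a sketch for checking hand computations (the helper names are ours), the inclusion–exclusion count can be cross-checked against brute-force enumeration for small n:

```python
from math import comb, factorial
from itertools import permutations

def derangements(n):
    """Count permutations of {1,...,n} with no fixed point, by
    inclusion-exclusion over the events 'position i is fixed'."""
    return sum((-1) ** j * comb(n, j) * factorial(n - j)
               for j in range(n + 1))

def derangements_brute(n):
    """Direct enumeration, feasible only for small n."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

print(derangements(10), derangements(6), derangements_brute(6))
```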
Section 5.2. The Vigenère Cipher
5.10. Encrypt each of the following Vigenère plaintexts using the given keyword
and the Vigenère tableau (Table 5.1).
(a) Keyword: hamlet
Plaintext: To be, or not to be, that is the question.
(b) Keyword: fortune
Plaintext: The treasure is buried under the big W.
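Looking a letter up in the Vigenère tableau is equivalent to shifting it by the key letter, so encryption can be sketched in a few lines (the function name and the convention of silently dropping non-letters are our own choices):

```python
def vigenere(text, keyword, decrypt=False):
    """Vigenere encryption/decryption by letter shifts, equivalent
    to tableau lookup (Table 5.1).  Non-letters are dropped."""
    key = [ord(k) - ord('a') for k in keyword.lower()]
    out = []
    i = 0
    for ch in text.lower():
        if ch.isalpha():
            shift = -key[i % len(key)] if decrypt else key[i % len(key)]
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
            i += 1
    return ''.join(out)

ct = vigenere("To be, or not to be", "hamlet")
print(ct)
print(vigenere(ct, "hamlet", decrypt=True))  # tobeornottobe
```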
5.11. Decrypt each of the following Vigenère ciphertexts using the given keyword
and the Vigenère tableau (Table 5.1).
(a) Keyword: condiment
Ciphertext: r s g h z b m c x t d v f s q h n i g q x r n b m
p d n s q s m b t r k u
(b) Keyword: rabbithole
Ciphertext: k h f e q y m s c i e t c s i g j v p w f f b s q
m o a p x z c s f x e p s o x y e n p k d a i c x
c e b s m t t p t x z o o e q l a f l g k i p o c
z s w q m t a u j w g h b o h v r j t q h u
5.12. Explain how a cipher wheel with rotating inner wheel (see Fig. 1.1 on page 3)
can be used in place of a Vigenère tableau (Table 5.1) to perform Vigenère encryption
and decryption. Illustrate by describing the sequence of rotations used to perform a
Vigenère encryption with the keyword mouse.
5.13. Let
s = “I am the very model of a modern major general.”
t = “I have information vegetable, animal, and mineral.”
(a) Make frequency tables for s and t.
(b) Compute IndCo(s) and IndCo(t).
(c) Compute MutIndCo(s, t).
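Assuming the standard definitions used in the text — IndCo(s) is the probability that two letters drawn from s without replacement agree, and MutIndCo(s, t) the probability that a random letter of s equals a random letter of t — the computations in Exercise 5.13 can be organized as follows (a sketch; strings are assumed to contain at least two letters):

```python
from collections import Counter

def ind_co(s):
    """Index of coincidence of the letters of s.  English text gives
    roughly 0.065, uniformly random text roughly 0.038."""
    s = [c for c in s.lower() if c.isalpha()]
    n = len(s)
    freq = Counter(s)
    return sum(f * (f - 1) for f in freq.values()) / (n * (n - 1))

def mut_ind_co(s, t):
    """Mutual index of coincidence of the letters of s and t."""
    s = [c for c in s.lower() if c.isalpha()]
    t = [c for c in t.lower() if c.isalpha()]
    fs, ft = Counter(s), Counter(t)
    return sum(fs[c] * ft[c] for c in fs) / (len(s) * len(t))

print(round(ind_co("I am the very model of a modern major general"), 4))
```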
5.14. The following strings are blocks from a Vigenère encryption. It turns out that
the keyword contains a repeated letter, so two of these blocks were encrypted with
the same shift. Compute MutIndCo(si, sj) for 1 ≤ i < j ≤ 3 and use these values
to deduce which two strings were encrypted using the same shift.
s1 = iwseesetftuonhdptbunnybioeatneghictdnsevi
s2 = qibfhroeqeickxmirbqlflgkrqkejbejpepldfjbk
s3 = iesnnciiheptevaireittuevmhooottrtaaflnatg
5.15. (a) One of the following two strings was encrypted using a simple substitution
cipher, while the other is a random string of letters. Compute the index of
coincidence of each string and use the results to guess which is which.
s1 = RCZBWBFHSLPSCPILHBGZJTGBIBJGLYIJIBFHCQQFZBYFP,
s2 = KHQWGIZMGKPOYRKHUITDUXLXCWZOTWPAHFOHMGFEVUEJJ.
(b) One of the following two strings was encrypted using a simple substitution
cipher, while the other is a random permutation of the same set of letters.
s1 = NTDCFVDHCTHKGUNGKEPGXKEWNECKEGWEWETWKUEVHDKK
CDGCWXKDEEAMNHGNDIWUVWSSCTUNIGDSWKE
nhqrk vvvfe fwgjo mzjgc kocgk lejrj wossy wgvkk hnesg kwebi
bkkcj vqazx wnvll zetjc zwgqz zwhah kwdxj fgnyw gdfgh bitig
mrkwn nsuhy iecru ljjvs qlvvw zzxyv woenx ujgyr kqbfj lvjzx
dxjfg nywus rwoar xhvvx ssmja vkrwt uhktm malcz ygrsz xwnvl
lzavs hyigh rvwpn ljazl nispv jahym ntewj jvrzg qvzcr estul
fkwis tfylk ysnir rddpb svsux zjgqk xouhs zzrjj kyiwc zckov
qyhdv rhhny wqhyi rjdqm iwutf nkzgd vvibg oenwb kolca mskle
cuwwz rgusl zgfhy etfre ijjvy ghfau wvwtn xlljv vywyj apgzw
trggr dxfgs ceyts tiiih vjjvt tcxfj hciiv voaro lrxij vjnok
mvrgw kmirt twfer oimsb qgrgc
Table 5.13: A Vigenère ciphertext for Exercise 5.16
togmg gbymk kcqiv dmlxk kbyif vcuek cuuis vvxqs pwwej koqgg
phumt whlsf yovww knhhm rcqfq vvhkw psued ugrsf ctwij khvfa
thkef fwptj ggviv cgdra pgwvm osqxg hkdvt whuev kcwyj psgsn
gfwsl jsfse ooqhw tofsh aciin gfbif gabgj adwsy topml ecqzw
asgvs fwrqs fsfvq rhdrs nmvmk cbhrv kblxk gzi
Table 5.14: A Vigenère ciphertext for Exercise 5.17
s2 = IGWSKGEHEXNGECKVWNKVWNKSUTEHTWHEKDNCDXWSIEKD
AECKFGNDCPUCKDNCUVWEMGEKWGEUTDGTWHD
Thus their Indices of Coincidence are identical. Develop a method to compute
a bigram index of coincidence, i.e., the frequency of pairs of letters, and use it
to determine which string is most likely the encrypted text.
(Bonus: Decrypt the encrypted texts in (a) and (b), but be forewarned that the
plaintexts are in Latin.)
5.16. Table 5.13 is a Vigenère ciphertext in which we have marked some of the
repeated trigrams for you. How long do you think the keyword is? Why?
Bonus: Complete the cryptanalysis and recover the plaintext.
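The Kasiski test rests on the observation that the distances between repeated trigrams tend to be multiples of the key length. The bookkeeping can be sketched as follows (our own helper, not the book's code):

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def kasiski(ciphertext, length=3):
    """Collect the gaps between consecutive occurrences of each
    repeated substring of the given length; their gcd (or the gcds
    of subsets of them) suggests the key length."""
    text = ''.join(c for c in ciphertext.lower() if c.isalpha())
    positions = defaultdict(list)
    for i in range(len(text) - length + 1):
        positions[text[i:i + length]].append(i)
    gaps = [q - p for pos in positions.values() if len(pos) > 1
            for p, q in zip(pos, pos[1:])]
    return gaps, reduce(gcd, gaps) if gaps else None

gaps, g = kasiski("abcdefabcdef" * 3)
print(g)  # 6 for this toy periodic example
```

With real ciphertext the overall gcd is often 1 because of accidental repeats, so in practice one looks for the value dividing most of the gaps.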
5.17. We applied a Kasiski test to the Vigenère ciphertext listed in Table 5.14
and found that the key length is probably 5. We then performed a mutual index of
coincidence test to each shift of each pair of blocks and listed the results for you in
Table 5.15. (This is the same type of table as Table 5.5 in the text, except that we
haven’t underlined the large values.) Use Table 5.15 to guess the relative rotations
of the blocks, as we did in Table 5.6. This will give you a rotated version of the
keyword. Try rotating it, as we did in Table 5.7, to find the correct keyword and
decrypt the text.
5.18. Table 5.16 gives a Vigenère ciphertext for you to analyze from scratch. It is
probably easiest to do so by writing a computer program, but you are welcome to
try to decrypt it with just paper and pencil.
(a) Make a list of matching trigrams as we did in Table 5.3. Use the Kasiski test
on matching trigrams to find the likely key length.
Blocks Shift amount
i j 0 1 2 3 4 5 6 7 8 9 10 11 12
1 2 0.044 0.047 0.021 0.054 0.046 0.038 0.022 0.034 0.057 0.035 0.040 0.023 0.038
1 3 0.038 0.031 0.027 0.037 0.045 0.036 0.034 0.032 0.039 0.039 0.047 0.038 0.050
1 4 0.025 0.039 0.053 0.043 0.023 0.035 0.032 0.043 0.029 0.040 0.041 0.050 0.027
1 5 0.050 0.050 0.025 0.031 0.038 0.045 0.037 0.028 0.032 0.038 0.063 0.033 0.034
2 3 0.035 0.037 0.039 0.031 0.031 0.035 0.047 0.048 0.034 0.031 0.031 0.067 0.053
2 4 0.040 0.033 0.046 0.031 0.033 0.023 0.052 0.027 0.031 0.039 0.078 0.034 0.029
2 5 0.042 0.040 0.042 0.029 0.033 0.035 0.035 0.038 0.037 0.057 0.039 0.038 0.040
3 4 0.032 0.033 0.035 0.049 0.053 0.027 0.030 0.022 0.047 0.036 0.040 0.036 0.052
3 5 0.043 0.043 0.040 0.034 0.033 0.034 0.043 0.035 0.026 0.030 0.050 0.068 0.044
4 5 0.045 0.033 0.044 0.046 0.021 0.032 0.030 0.038 0.047 0.040 0.025 0.037 0.068
Blocks Shift amount
i j 13 14 15 16 17 18 19 20 21 22 23 24 25
1 2 0.040 0.063 0.033 0.025 0.032 0.055 0.038 0.030 0.032 0.045 0.035 0.030 0.044
1 3 0.026 0.046 0.042 0.053 0.027 0.024 0.040 0.047 0.048 0.018 0.037 0.034 0.066
1 4 0.042 0.050 0.042 0.031 0.024 0.052 0.027 0.051 0.020 0.037 0.042 0.069 0.031
1 5 0.030 0.048 0.039 0.030 0.034 0.038 0.042 0.035 0.036 0.043 0.055 0.030 0.035
2 3 0.039 0.015 0.030 0.045 0.049 0.037 0.023 0.036 0.030 0.049 0.039 0.050 0.037
2 4 0.027 0.048 0.050 0.037 0.032 0.021 0.035 0.043 0.047 0.041 0.047 0.042 0.035
2 5 0.033 0.035 0.039 0.033 0.037 0.047 0.037 0.028 0.034 0.066 0.054 0.032 0.022
3 4 0.040 0.048 0.041 0.044 0.033 0.028 0.039 0.027 0.036 0.017 0.038 0.051 0.065
3 5 0.039 0.029 0.045 0.040 0.033 0.028 0.031 0.037 0.038 0.036 0.033 0.051 0.036
4 5 0.049 0.033 0.029 0.043 0.028 0.033 0.020 0.040 0.040 0.041 0.039 0.039 0.059
Table 5.15: Mutual indices of coincidence for Exercise 5.17
mgodt beida psgls akowu hxukc iawlr csoyh prtrt udrqh cengx
uuqtu habxw dgkie ktsnp sekld zlvnh wefss glzrn peaoy lbyig
uaafv eqgjo ewabz saawl rzjpv feyky gylwu btlyd kroec bpfvt
psgki puxfb uxfuq cvymy okagl sactt uwlrx psgiy ytpsf rjfuw
igxhr oyazd rakce dxeyr pdobr buehr uwcue ekfic zehrq ijezr
xsyor tcylf egcy
Table 5.16: A Vigenère ciphertext for Exercise 5.18
(b) Make a table of indices of coincidence for various key lengths, as we did in
Table 5.4. Use your results to guess the probable key length.
(c) Using the probable key length from (a) or (b), make a table of mutual indices
of coincidence between rotated blocks, as we did in Table 5.5. Pick the largest
indices from your table and use them to guess the relative rotations of the
blocks, as we did in Table 5.6.
(d) Use your results from (c) to guess a rotated version of the keyword, and then
try the different rotations as we did in Table 5.7 to find the correct keyword
and decrypt the text.
5.19. The autokey cipher is similar to the Vigenère cipher, except that rather than
repeating the key, it simply uses the key to encrypt the first few letters and then
uses the plaintext itself (shifted over) to continue the encryption. For example, in
order to encrypt the message “The autokey cipher is cool” using the keyword
random, we proceed as follows:
Plaintext t h e a u t o k e y c i p h e r i s c o o l
Key r a n d o m t h e a u t o k e y c i p h e r
Ciphertext k h r d i f h r i y w b d r i p k a r v s c
The autokey cipher has the advantage that different messages are encrypted using
different keys (except for the first few letters). Further, since the key does not repeat,
there is no key length, so the autokey is not directly susceptible to a Kasiski or index
of coincidence analysis. A disadvantage of the autokey is that a single mistake in
encryption renders the remainder of the message unintelligible. According to [63],
Vigenère invented the autokey cipher in 1586, but his invention was ignored and
forgotten before being reinvented in the 1800s.
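The encryption rule described above can be sketched directly; decryption recovers each plaintext letter and appends it to the key stream. The code below (function names are ours) reproduces the random/khrd... example from the table:

```python
def autokey_encrypt(plaintext, keyword):
    """Autokey encryption: the key stream is the keyword followed
    by the plaintext itself."""
    pt = [c for c in plaintext.lower() if c.isalpha()]
    key = list(keyword.lower()) + pt
    return ''.join(chr((ord(p) + ord(k) - 2 * ord('a')) % 26 + ord('a'))
                   for p, k in zip(pt, key))

def autokey_decrypt(ciphertext, keyword):
    """Recover plaintext letters one at a time, feeding each one
    back into the key stream.  Assumes the ciphertext is all letters."""
    key = list(keyword.lower())
    pt = []
    for i, c in enumerate(ciphertext.lower()):
        p = chr((ord(c) - ord(key[i])) % 26 + ord('a'))
        pt.append(p)
        key.append(p)
    return ''.join(pt)

print(autokey_encrypt("The autokey cipher is cool", "random"))
# khrdifhriywbdripkarvsc, matching the table above
```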
(a) Encrypt the following message using the autokey cipher:
Keyword: LEAR
Plaintext: Come not between the dragon and his wrath.
(b) Decrypt the following message using the autokey cipher:
Keyword: CORDELIA
Ciphertext: pckkm yowvz ejwzk knyzv vurux cstri tgac
(c) Eve intercepts an autokey ciphertext and manages to steal the accompanying
plaintext:
Plaintext ifmusicbethefoodofloveplayon
Ciphertext azdzwqvjjfbwnqphhmptjsszfjci
Help Eve to figure out the keyword that was used for encryption. Describe your
method in sufficient generality to show that the autokey cipher is susceptible
to known plaintext attacks.
(d) Bonus Problem: Try to formulate a statistical or algebraic attack on the autokey
cipher, assuming that you are given a large amount of ciphertext to analyze.
Section 5.3. Probability Theory
5.20. Use the definition (5.15) of the probability of an event to prove the following
basic facts about probability theory.
(a) Let E and F be disjoint events. Then
Pr(E ∪ F) = Pr(E) + Pr(F).
(b) Let E and F be events that need not be disjoint. Then
Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(E ∩ F).
(c) Let E be an event. Then Pr(E^c) = 1 − Pr(E).
(d) Let E1, E2, E3 be events. Prove that
Pr(E1 ∪ E2 ∪ E3) = Pr(E1) + Pr(E2) + Pr(E3) − Pr(E1 ∩ E2)
− Pr(E1 ∩ E3) − Pr(E2 ∩ E3) + Pr(E1 ∩ E2 ∩ E3).
The formulas in (b) and (d) and their generalization to n events are known as the
inclusion–exclusion principle.
5.21. We continue with the coin tossing scenario from Example 5.23, so our ex-
periment consists in tossing a fair coin ten times. Compute the probabilities of the
following events.
(a) The first and last tosses are both heads.
(b) Either the first toss or the last toss (or both) are heads.
(c) Either the first toss or the last toss (but not both) are heads.
(d) There are exactly k heads and 10 − k tails. Compute the probability for each
value of k between 0 and 10. (Hint. To save time, note that the probability of
exactly k heads is the same as the probability of exactly k tails.)
(e) There is an even number of heads.
(f) There is an odd number of heads.
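Parts (d)–(f) reduce to the binomial density of Sect. 5.3; a small helper for checking hand computations (the function name is ours):

```python
from math import comb

def binomial_pmf(n, k, p=0.5):
    """Pr(exactly k successes in n independent trials), as in (5.23)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Pr(exactly 5 heads in 10 tosses of a fair coin):
print(binomial_pmf(10, 5))  # 252/1024 = 0.24609375
# The eleven probabilities for k = 0,...,10 sum to 1, as they must:
print(sum(binomial_pmf(10, k) for k in range(11)))  # 1.0
```

For Exercise 5.22 the relevant quantity is binomial_pmf(14, 7).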
5.22. Alice offers to make the following bet with you. She will toss a fair coin 14
times. If exactly 7 heads come up, she will give you $4; otherwise you must give
her $1. Would you take this bet? If so, and if you repeated the bet 10000 times, how
much money would you expect to win or lose?
5.23. Let E and F be events.
(a) Prove that Pr(E | E) = 1. Explain in words why this is reasonable.
(b) If E and F are disjoint, prove that Pr(F | E) = 0. Explain in words why this
is reasonable.
(c) Let F1, . . . , Fn be events satisfying Fi ∩ Fj = ∅ for all i ≠ j. We say
that F1, . . . , Fn are pairwise disjoint. Prove then that
  Pr(⋃_{i=1}^{n} Fi) = Σ_{i=1}^{n} Pr(Fi).
(d) Let F1, . . . , Fn be pairwise disjoint as in (c), and assume further that
F1 ∪ · · · ∪ Fn = Ω,
where recall that Ω is the entire sample space. Prove the following general
version of the decomposition formula (5.20) in Proposition 5.24(a):
  Pr(E) = Σ_{i=1}^{n} Pr(E | Fi) Pr(Fi).
(e) Prove a general version of Bayes’s formula:
  Pr(Fi | E) = Pr(E | Fi) Pr(Fi) / ( Pr(E | F1) Pr(F1) + Pr(E | F2) Pr(F2) + · · · + Pr(E | Fn) Pr(Fn) ).
5.24. There are two urns containing pens and pencils. Urn #1 contains three pens
and seven pencils and Urn #2 contains eight pens and four pencils.
(a) An urn is chosen at random and an object is drawn. What is the probability
that it is a pencil?
(b) An urn is chosen at random and an object is drawn. If the object drawn is a
pencil, what is the probability that it came from Urn #1?
(c) If an urn is chosen at random and two objects are drawn simultaneously, what
is the probability that both are pencils?
5.25. An urn contains 20 silver coins and 10 gold coins. You are the sixth person
in line to randomly draw and keep a coin from the urn.
(a) What is the probability that you draw a gold coin?
(b) If you draw a gold coin, what is the probability that the five people ahead of
you all drew silver coins?
5.26. Consider the three prisoners scenario described in Example 5.26. Let A, B,
and C denote respectively the events that Alice is to be released, Bob is to be
released, and Carl is to be released, which we assume to be equally likely, so
Pr(A) = Pr(B) = Pr(C) = 1/3. Also let J be the event that the jailer tells Alice that Bob is to
stay in jail.
(a) Compute the values of Pr(B | J), Pr(J | B), and Pr(J | C).
(b) Compute the values of Pr(J | A^c) and Pr(J^c | A^c), where the event A^c is the
event that Alice stays in jail.
(c) Suppose that if Alice is the one who is to be released, then the jailer flips a fair
coin to decide whether to tell Alice that Bob stays in jail or that Carl stays in
jail. What is the value of Pr(A | J)?
(d) Suppose instead that if Alice is the one who is to be released, then the jailer
always tells her that Bob will stay in jail. Now what is the value of Pr(A | J)?
Other similar problems with counterintuitive conclusions include the Monty Hall
problem (Exercise 5.27), Bertrand’s box paradox, and the principle of restricted
choice in contract bridge.
5.27. (The Monty Hall Problem) Monty Hall gives Dan the choice of three curtains.
Behind one curtain is a car, while behind the other two curtains are goats. Dan
chooses a curtain, but before it is opened, Monty Hall opens one of the other curtains
and reveals a goat. He then offers Dan the option of keeping his original curtain or
switching to the remaining closed curtain. The Monty Hall problem is to figure out
Dan’s best strategy: “To stick or to switch?”
(a) What is the probability that Dan wins the car if he always sticks to his first
choice of curtain? What is the probability that Dan wins the car if he always
switches curtains? Which is his best strategy? (If the answer seems counter-
intuitive, suppose instead that there are 1000 curtains and that Monty Hall
opens 998 goat curtains. Now what are the winning probabilities for the two
strategies?)
(b) Suppose that we give Monty Hall another option, namely he’s allowed to force
Dan to stick with his first choice of curtain. Assuming that Monty Hall dislikes
giving away cars, now what is Dan’s best strategy, and what is his probability
of winning a car?
(c) More generally, suppose that there are N curtains and M cars, and suppose
that Monty Hall opens K curtains that have goats behind them. Compute the
probabilities
Pr(Dan wins a car | Dan sticks), Pr(Dan wins a car | Dan switches).
Which is the better strategy?
5.28. Let S be a set, let A be a property of interest, and suppose that for m ∈ S,
we have
Pr(m does not have property A) = δ.
Suppose further that a Monte Carlo algorithm applied to m and a random number r
satisfy:
(1) If the algorithm returns Yes, then m definitely has property A.
(2) If m has property A, then the probability that the algorithm returns Yes is at
least p.
Notice that we can restate (1) and (2) as conditional probabilities:
(1) Pr(m has property A | algorithm returns Yes) = 1,
(2) Pr(algorithm returns Yes | m has property A) ≥ p.
Suppose that we run the algorithm N times on the number m, and suppose that
the algorithm returns No every single time. Derive a lower bound, in terms of δ, p,
and N, for the probability that m does not have property A. (This generalizes the
version of the Monte Carlo method that we studied in Sect. 5.3.3 with δ = 0.01 and
p = 1/2. Be careful to distinguish p from 1 − p in your calculations.)
5.29. We continue with the setup described in Exercise 5.28.
(a) Suppose that δ = 9/10 and p = 3/4. If we run the algorithm 25 times on the
input m and always get back No, what is the probability that m does not have
property A?
(b) Same question as (a), but this time we run the algorithm 100 times.
(c) Suppose that δ = 99/100 and p = 1/2. How many times should we run the algorithm
on m to be 99 % confident that m does not have property A, assuming that
every output is No?
(d) Same question as (c), except now we want to be 99.9999 % confident.
5.30. If an integer n is composite, then the Miller–Rabin test has at least a 75 %
chance of succeeding in proving that n is composite, while it never misidentifies a
prime as being composite. (See Table 3.2 in Sect. 3.4 for a description of the Miller–
Rabin test.) Suppose that we run the Miller–Rabin test N times on the integer n
and that it fails to prove that n is composite. Show that the probability that n is
prime satisfies (approximately)
  Pr(n is prime | the Miller–Rabin test fails N times) ≥ 1 − ln(n)/4^N.
(Hint. Use Exercise 5.28 with appropriate choices of A, S, δ, and p. You may also
use the estimate from Sect. 3.4.1 that the probability that n is prime is
approximately 1/ln(n).)
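For readers who want to experiment, here is a hedged sketch of a single Miller–Rabin trial in the spirit of Table 3.2 (consult Sect. 3.4 for the authoritative description; the function names are ours):

```python
import random

def miller_rabin_trial(n, a):
    """One Miller-Rabin trial with base a: True means a is a witness,
    i.e., n is provably composite; False is inconclusive."""
    if n % 2 == 0:
        return n != 2
    # Write n - 1 = 2^k * q with q odd.
    q, k = n - 1, 0
    while q % 2 == 0:
        q //= 2
        k += 1
    a = pow(a, q, n)
    if a == 1 or a == n - 1:
        return False
    for _ in range(k - 1):
        a = pow(a, 2, n)
        if a == n - 1:
            return False
    return True  # witness found: n is definitely composite

def probably_prime(n, N=20):
    """Run N random-base trials; False means provably composite."""
    if n < 4:
        return n in (2, 3)
    return not any(miller_rabin_trial(n, random.randrange(2, n - 1))
                   for _ in range(N))

# 561 = 3 * 11 * 17 is a Carmichael number; 1299709 is prime.
print(probably_prime(561), probably_prime(1299709))
```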
5.31. It is natural to assume that if Pr(E | F) is significantly larger than Pr(E),
then somehow F is causing E. Bayes's formula illustrates the fallacy of this sort of
reasoning, since it says that
  Pr(E | F) / Pr(E) = Pr(F | E) / Pr(F).
So if F is “causing” E, then the same reasoning shows that E is “causing” F. All
that one can really say is that E and F are correlated with one another, in the sense
that either one of them being true makes it more likely that the other one is true.
It is incorrect to deduce a cause-and-effect relation.
Here is a concrete example. Testing shows that first graders are more likely to be
good spellers if their shoe sizes are larger than average. This is an experimental fact.
Hence if we stretch a child’s foot, it will make them a better speller! Alternatively,
by Baye’s formula, if we give them extra spelling lessons, then their feet will grow
faster! Explain why these last two assertions are nonsense, and describe what’s really
going on.
5.32. Let fX(k) be the binomial density function (5.23). Prove directly, using the
binomial theorem, that Σ_{k=0}^{n} fX(k) = 1.
5.33. In Example 5.37 we used a differentiation trick to compute the value of the
infinite series Σ_{n=1}^{∞} n p(1 − p)^{n−1}. This exercise further develops this useful technique.
The starting point is the formula for the geometric series
  Σ_{n=0}^{∞} x^n = 1/(1 − x)   for |x| < 1,   (5.57)
and the differential operator
  D = x d/dx.
(a) Using the fact that D(x^n) = n x^n, prove that
  Σ_{n=1}^{∞} n x^n = x/(1 − x)^2   (5.58)
by applying D to both sides of (5.57). For which x does the left-hand side
of (5.58) converge? (Hint. Use the ratio test.)
(b) Applying D again, prove that
  Σ_{n=0}^{∞} n^2 x^n = (x + x^2)/(1 − x)^3.   (5.59)
(c) More generally, prove that for every value of k there is a polynomial Fk(x) such that
  Σ_{n=0}^{∞} n^k x^n = Fk(x)/(1 − x)^{k+1}.   (5.60)
(Hint. Use induction on k.)
(d) The first few polynomials Fk(x) in (c) are F0(x) = 1, F1(x) = x, and
F2(x) = x + x^2. These follow from (5.57), (5.58), and (5.59). Compute F3(x) and F4(x).
(e) Prove that the polynomial Fk(x) in (c) has degree k.
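Identities such as (5.58) and (5.59) are easy to sanity-check numerically before proving them, since the partial sums converge quickly for |x| < 1 (a throwaway check of ours):

```python
def power_sum(k, x, terms=2000):
    """Partial sum of sum_{n>=1} n^k x^n, enough terms that the
    tail is negligible for |x| < 1."""
    return sum(n ** k * x ** n for n in range(1, terms))

x = 0.3
# (5.58): sum n x^n should equal x/(1-x)^2
print(power_sum(1, x), x / (1 - x) ** 2)
# (5.59): sum n^2 x^n should equal (x + x^2)/(1-x)^3
print(power_sum(2, x), (x + x ** 2) / (1 - x) ** 3)
```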
5.34. In each case, compute the expectation of the random variable X.
(a) The values of X are uniformly distributed on the set {0, 1, 2, . . . , N − 1}. (See
Example 5.28.)
(b) The values of X are uniformly distributed on the set {1, 2, . . . , N}.
(c) The values of X are uniformly distributed on the set {1, 3, 7, 11, 19, 23}.
(d) X is a random variable with a binomial density function; see formula (5.23) in
Example 5.29 on page 240.
5.35. Let X be a random variable on the probability space Ω. It might seem more
natural to define the expected value of X by the formula
  Σ_{ω∈Ω} X(ω) · Pr(ω).   (5.61)
Prove that the formula (5.61) gives the same value as Eq. (5.27) on page 244, which
we used in the text to define E(X).
Section 5.4. Collision Algorithms and the Birthday Paradox
5.36. (a) In a group of 23 strangers, what is the probability that at least two of
them have the same birthday? How about if there are 40 strangers? In a group
of 200 strangers, what is the probability that one of them has the same birthday
as your birthday? (Hint. See the discussion in Sect. 5.4.1.)
(b) Suppose that there are N days in a year (where N could be any number) and
that there are n people. Develop a general formula, analogous to (5.28), for the
probability that at least two of them have the same birthday. (Hint. Do a cal-
culation similar to the proof of (5.28) in the collision theorem (Theorem 5.38),
but note that the formula is a bit different because the birthdays are being
selected from a single list of N days.)
(c) Find a lower bound of the form
  Pr(at least one match) ≥ 1 − e^{−(some function of n and N)}
for the probability in (b), analogous to the estimate (5.29).
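For part (a), the exact matching probability is a product of the same shape as in the collision theorem; a short computation (our own helper):

```python
from math import exp, prod

def birthday_match(n, N=365):
    """Exact probability that among n people with N equally likely
    birthdays, at least two share a birthday."""
    return 1 - prod((N - i) / N for i in range(n))

print(round(birthday_match(23), 4))  # about 0.5073
print(round(birthday_match(40), 4))
# Heuristic lower bound of the requested exponential shape, n = 23:
print(1 - exp(-23 * 22 / (2 * 365)))
```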
5.37. A deck of cards is shuffled and the top eight cards are turned over.
(a) What is the probability that the king of hearts is visible?
(b) A second deck is shuffled and its top eight cards are turned over. What is the
probability that a visible card from the first deck matches a visible card from
the second deck? (Note that this is slightly different from Example 5.39 because
the cards in the second deck are not being replaced.)
5.38. (a) Prove that
  e^{−x} ≥ 1 − x for all values of x.
(Hint. Look at the graphs of e^{−x} and 1 − x, or use calculus to compute the
minimum of the function f(x) = e^{−x} − (1 − x).)
(b) Prove that for all a ≥ 1, the inequality
  e^{−ax} ≤ (1 − x)^a + (1/2) a x^2
is valid for all 0 ≤ x ≤ 1.
(This is a challenging problem.)
(c) We used the inequality in (a) during the proof of the lower bound (5.29) in the
collision theorem (Theorem 5.38). Use (b) to prove that
  Pr(at least one red) ≤ 1 − e^{−mn/N} + mn^2/(2N^2).
Thus if N is large and m and n are not much larger than √N, then the estimate
  Pr(at least one red) ≈ 1 − e^{−mn/N}
is quite accurate. (Hint. Use (b) with a = m and x = n/N.)
5.39. Solve the discrete logarithm problem 10^x = 106 in the finite field F811 by
finding a collision among the random powers 10^i and 106 · 10^i that are listed in
Table 5.17.
   i   g^i  h·g^i        i   g^i  h·g^i
 116    96   444       791   496   672
 497   326   494       385   437    95
 225   757   764       178   527   714
 233   517   465       471   117   237
 677   787   700        42   448   450
 622   523   290       258   413   795
 519   291    28       406   801   562
 286   239   193       745   194   289
 298   358   642       234   304   595
 500   789   101       556   252   760
 272    24   111       326   649   670
 307   748   621       399   263   304
Table 5.17: Data for Exercise 5.39, g = 10, h = 106, p = 811
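One way to organize the collision search of Exercise 5.39 is to hash the powers g^i and then scan the values h·g^j; the exponent list below is transcribed from Table 5.17, but the powers themselves are recomputed rather than trusted (skip this sketch if you prefer to search the table by hand):

```python
def collision_dlog(g, h, p, exponents):
    """Look for a collision g^i = h * g^j among the listed exponents;
    then x = i - j (mod p - 1) solves g^x = h in F_p."""
    gpow = {pow(g, i, p): i for i in exponents}
    for j in exponents:
        value = h * pow(g, j, p) % p
        if value in gpow:
            return (gpow[value] - j) % (p - 1)
    return None

# Exponents i transcribed from Table 5.17:
exponents = [116, 497, 225, 233, 677, 622, 519, 286, 298, 500, 272, 307,
             791, 385, 178, 471, 42, 258, 406, 745, 234, 556, 326, 399]
x = collision_dlog(10, 106, 811, exponents)
print(x, pow(10, x, 811))  # 645 106
```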
Section 5.5. Pollard’s ρ Method
5.40. Table 5.18 gives some of the computations for the solution of the discrete
logarithm problem
  11^t = 41387 in F81799   (5.62)
using Pollard's ρ method. (It is similar to Table 5.11 in Example 5.52.) Use the data
in Table 5.18 to solve (5.62).
i xi yi αi βi γi δi
0 1 1 0 0 0 0
1 11 121 1 0 2 0
2 121 14641 2 0 4 0
3 1331 42876 3 0 12 2
4 14641 7150 4 0 25 4
...
151 4862 33573 40876 45662 29798 73363
152 23112 53431 81754 9527 37394 48058
153 8835 23112 81755 9527 67780 28637
154 15386 15386 81756 9527 67782 28637
Table 5.18: Computations to solve 11^t = 41387 in F81799 for Exercise 5.40
5.41. Table 5.19 gives some of the computations for the solution of the discrete
logarithm problem
  7^t = 3018 in F7963   (5.63)
using Pollard's ρ method. (It is similar to Table 5.11 in Example 5.52.) Extend
Table 5.19 until you find a collision (we promise that it won't take too long) and
then solve (5.63).
5.42. Write a computer program implementing Pollard’s ρ method for solving the
discrete logarithm problem and use it to solve each of the following:
(a) 2^t = 2495 in F5011.
(b) 17^t = 14226 in F17959.
(c) 29^t = 5953042 in F15239131.
i xi yi αi βi γi δi
0 1 1 0 0 0 0
1 7 49 1 0 2 0
2 49 2401 2 0 4 0
3 343 6167 3 0 6 0
4 2401 1399 4 0 7 1
...
87 1329 1494 6736 7647 3148 3904
88 1340 1539 6737 7647 3150 3904
89 1417 4767 6738 7647 6302 7808
90 1956 1329 6739 7647 4642 7655
Table 5.19: Computations to solve 7^t = 3018 in F7963 for Exercise 5.41
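For Exercise 5.42, one possible program structure is sketched below (our own code, with the usual three-way branching and Floyd cycle detection; at a collision the congruence t(b − B) ≡ A − a (mod p − 1) is solved via the gcd and every candidate is verified):

```python
import math
import random

def pollard_rho_dlog(g, h, p):
    """Solve g^t = h in F_p by Pollard's rho.  Each walk element is
    x = g^a * h^b mod p; a collision of the slow and fast walkers
    gives a congruence for t modulo p - 1."""
    n = p - 1

    def step(x, a, b):
        # Pseudorandom three-way branch, as in Sect. 5.5.
        if x % 3 == 0:
            return x * x % p, 2 * a % n, 2 * b % n
        if x % 3 == 1:
            return x * g % p, (a + 1) % n, b
        return x * h % p, a, (b + 1) % n

    while True:
        a0 = random.randrange(n)
        slow = (pow(g, a0, p), a0, 0)
        fast = step(*slow)
        while slow[0] != fast[0]:
            slow = step(*slow)
            fast = step(*step(*fast))
        _, a, b = slow
        _, A, B = fast
        u = (b - B) % n
        d = math.gcd(u, n)
        if (A - a) % d != 0:
            continue               # uninformative collision; restart
        m = n // d
        base = 0 if m == 1 else (A - a) // d % m * pow(u // d, -1, m) % m
        for k in range(d):         # check every candidate solution
            t = (base + k * m) % n
            if pow(g, t, p) == h:
                return t

random.seed(0)
t = pollard_rho_dlog(2, 2495, 5011)
print(t, pow(2, t, 5011) == 2495)
```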
5.43. Evaluate the integral I = ∫_0^∞ t^2 e^{−t^2/2} dt appearing in the proof of
Theorem 5.48. (Hint. Write I^2 as an iterated integral,
  I^2 = ∫_0^∞ ∫_0^∞ x^2 e^{−x^2/2} · y^2 e^{−y^2/2} dx dy,
and switch to polar coordinates.)
5.44. This exercise describes Pollard’s ρ factorization algorithm. It is particularly
good at factoring numbers N that have a prime factor p with the property that p
is considerably smaller than N/p. Later we will study an even faster, albeit more
complicated, factorization algorithm with this property that is based on the theory
of elliptic curves; see Sect. 6.6.
Let N be an integer that is not prime, and let
f : Z/NZ −→ Z/NZ
be a mixing function, for example f(x) = x^2 + 1 mod N. As in the abstract version
of Pollard’s ρ method (Theorem 5.48), let x0 = y0 be an initial value, and generate
sequences by setting xi+1 = f(xi) and yi+1 = f(f(yi)). At each step, also compute
the greatest common divisor
gi = gcd

|xi − yi|, N
	
.
(a) Let p be the smallest prime divisor of N. If the function f is sufficiently random,
show that with high probability we have

    gk = p for some k = O(√p).

Hence the algorithm factors N in O(√p) steps.
(b) Program Pollard's ρ algorithm with f(x) = x^2 + 1 and x0 = y0 = 0, and use it
to factor the following numbers. In each case, give the smallest value of k such
that gk is a nontrivial factor of N and print the ratio k/√N.
(i) N = 2201. (ii) N = 9409613. (iii) N = 1782886219.
(c) Repeat your computations in (b) using the function f(x) = x^2 + 2. Do the
running times change?
(d) Explain what happens if you run Pollard’s ρ algorithm and N is prime.
(e) Explain what happens if you run Pollard's ρ algorithm with f(x) = x^2 and any
initial values for x0.
(f) Try running Pollard's ρ algorithm with the function f(x) = x^2 − 2. Explain
what is happening. (Hint. This part is more challenging. It may help to use the
identity f^n(u + u^(−1)) = u^(2^n) + u^(−2^n), which you can prove by induction.)
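A direct transcription of the algorithm of this exercise into code might look as follows (the function name is ours; parts (c)–(f) amount to changing the arguments):

```python
import math

def pollard_rho_factor(N, c=1, max_steps=10**6):
    """Pollard's rho factorization with f(x) = x^2 + c (mod N) and
    x_0 = y_0 = 0, as in Exercise 5.44(b).  Returns (k, g_k) for the
    first index k with 1 < g_k < N, or None if no factor is found."""
    f = lambda t: (t * t + c) % N
    x = y = 0
    for k in range(1, max_steps):
        x = f(x)            # x_{i+1} = f(x_i)
        y = f(f(y))         # y_{i+1} = f(f(y_i))
        g = math.gcd(abs(x - y), N)
        if 1 < g < N:
            return k, g
        if g == N:          # the two walks collided modulo N itself
            return None     # retry with a different constant c
    return None
```

For N = 2201 this finds the factor 31 at step k = 4.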
Section 5.6. Information Theory
5.45. Consider the cipher that has three keys, three plaintexts, and four ciphertexts
that are combined using the following encryption table (which is similar to Table 5.12
used in Example 5.54 on page 265).

         m1   m2   m3
    k1   c2   c4   c1
    k2   c1   c3   c2
    k3   c3   c1   c2

Suppose further that the plaintexts and keys are used with the following probabilities:

    f(m1) = f(m2) = 2/5,   f(m3) = 1/5,   f(k1) = f(k2) = f(k3) = 1/3.
(a) Compute f(c1), f(c2), f(c3), and f(c4).
(b) Compute f(c1 | m1), f(c1 | m2), and f(c1 | m3). Does this cryptosystem have
perfect secrecy?
(c) Compute f(c2 | m1) and f(c3 | m1).
(d) Compute f(k1 | c3) and f(k2 | c3).
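Parts (a) and (b) can be checked mechanically. A sketch using exact rational arithmetic follows; the dictionary encoding of the table and the helper name `f_c_given_m` are our own.

```python
from fractions import Fraction as F

# Encryption table of Exercise 5.45: enc[k][m] is the ciphertext e_k(m).
enc = {'k1': {'m1': 'c2', 'm2': 'c4', 'm3': 'c1'},
       'k2': {'m1': 'c1', 'm2': 'c3', 'm3': 'c2'},
       'k3': {'m1': 'c3', 'm2': 'c1', 'm3': 'c2'}}
fM = {'m1': F(2, 5), 'm2': F(2, 5), 'm3': F(1, 5)}
fK = {'k1': F(1, 3), 'k2': F(1, 3), 'k3': F(1, 3)}

# (a)  f(c) = sum of f(k) f(m) over the pairs (k, m) with e_k(m) = c.
fC = {}
for k in enc:
    for m, c in enc[k].items():
        fC[c] = fC.get(c, F(0)) + fK[k] * fM[m]

# (b)  f(c | m) = sum of f(k) over the keys k with e_k(m) = c.
def f_c_given_m(c, m):
    return sum((fK[k] for k in enc if enc[k][m] == c), F(0))
```

Comparing f(c2 | m1) = 1/3 with f(c2) = 4/15 already shows that the system does not have perfect secrecy.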
5.46. Suppose that a shift cipher is employed such that each key, i.e., each shift
amount from 0 to 25, is used with equal probability and such that a new key is
chosen to encrypt each successive letter. Show that this cryptosystem has perfect
secrecy by filling in the details of the following steps.
(a) Show that Σ_{k∈K} fM(dk(c)) = 1 for every ciphertext c ∈ C.
(b) Compute the ciphertext density function fC using (5.47), which in this case
says that

    fC(c) = Σ_{k∈K} fK(k) fM(dk(c)).

(c) Compare fC(c) to fC|M(c | m).
5.47. Give the details of the proof of (5.47), which says that

    fC(c) = Σ_{k ∈ K such that c ∈ ek(M)} fK(k) fM(dk(c)).

(Hint. Use the decomposition formula from Exercise 5.23(d).)
5.48. Suppose that a cryptosystem has the same number of plaintexts as it does
ciphertexts (#M = #C). Prove that for any given key k ∈ K and any given
ciphertext c ∈ C, there is a unique plaintext m ∈ M that encrypts to c using the
key k. (We used this fact during the proof of Theorem 5.56. Notice that the proof
does not require the cryptosystem to have perfect secrecy; all that is needed is that
#M = #C.)
5.49. Let Sm,c = { k ∈ K : ek(m) = c } be the set used during the proof of
Theorem 5.56. Prove that if c ≠ c′, then Sm,c ∩ Sm,c′ = ∅. (Prove this for any
cryptosystem; it is not necessary to assume perfect secrecy.)
5.50. Suppose that a cryptosystem satisfies #K = #M = #C and that it has
perfect secrecy. Prove that every ciphertext is used with equal probability and that
every plaintext is used with equal probability. (Hint. We proved one of these during
the course of proving Theorem 5.56. The proof of the other is similar.)
5.51. Prove the “only if” part of Theorem 5.56, i.e., prove that if a cryptosystem
with an equal number of keys, plaintexts, and ciphertexts satisfies conditions (a)
and (b) of Theorem 5.56, then it has perfect secrecy.
5.52. Let Xn be a uniformly distributed random variable on n objects, and let r ≥ 1.
Prove directly from Property H3 of entropy that

    H(X_{n^r}) = r·H(Xn).

This generalizes Example 5.58.
5.53. Let X, Y, and Z1, . . . , Zm be random variables as described in Property H3
on page 270. Let

    pi = Pr(Y = Zi) and qij = Pr(Zi = xij), so Pr(X = xij) = pi·qij.

With this notation, Property H3 says that

    H((pi·qij)_{1≤i≤n, 1≤j≤mi}) = H((pi)_{1≤i≤n}) + Σ_{i=1}^n pi·H((qij)_{1≤j≤mi}).

(See Example 5.59.) Then the formula (5.51) for entropy given in Theorem 5.60
implies that

    Σ_{i=1}^n Σ_{j=1}^{mi} pi·qij·log2(pi·qij) = Σ_{i=1}^n pi·log2(pi) + Σ_{i=1}^n pi · Σ_{j=1}^{mi} qij·log2(qij).    (5.64)

Prove directly that (5.64) is true. (Hint. Remember that the probabilities satisfy
Σ_i pi = 1 and Σ_j qij = 1.)
5.54. Let F(x) be a twice differentiable function with the property that F′′(x) < 0
for all x in its domain. Prove that F is concave in the sense of (5.52). Conclude in
particular that the function F(x) = log x is concave for all x > 0.
5.55. Use induction to prove Jensen’s inequality (Theorem 5.61).
5.56. Let X and Y be independent random variables.
(a) Prove that the equivocation H(X | Y ) is equal to the entropy H(X).
(b) If H(X | Y ) = H(X), is it necessarily true that X and Y are independent?
5.57. Prove that key equivocation satisfies the formula
H(K | C) = H(K) + H(M) − H(C)
as described in Proposition 5.64.
5.58. We continue with the cipher described in Exercise 5.45.
(a) Compute the entropies H(K), H(M), and H(C).
(b) Compute the key equivocation H(K | C).
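For the cipher of Exercise 5.45, these entropies can be computed numerically, and the result also checks the formula of Proposition 5.64 against a direct computation of H(K | C). A sketch (the encoding of the table is ours):

```python
import math
from fractions import Fraction as F

enc = {'k1': {'m1': 'c2', 'm2': 'c4', 'm3': 'c1'},
       'k2': {'m1': 'c1', 'm2': 'c3', 'm3': 'c2'},
       'k3': {'m1': 'c3', 'm2': 'c1', 'm3': 'c2'}}
fM = {'m1': F(2, 5), 'm2': F(2, 5), 'm3': F(1, 5)}
fK = {k: F(1, 3) for k in enc}

def H(probs):
    """Entropy -sum p log2 p of a list of probabilities (zeros skipped)."""
    return -sum(float(q) * math.log2(float(q)) for q in probs if q)

# Joint distribution f(k, c) and the ciphertext marginal f(c).
joint, fC = {}, {}
for k in enc:
    for m, c in enc[k].items():
        joint[(k, c)] = joint.get((k, c), F(0)) + fK[k] * fM[m]
        fC[c] = fC.get(c, F(0)) + fK[k] * fM[m]

HK, HM, HC = H(fK.values()), H(fM.values()), H(fC.values())
# Key equivocation directly: H(K|C) = -sum f(k,c) log2 f(k|c).
HKC = -sum(float(q) * math.log2(float(q / fC[c])) for (k, c), q in joint.items())
```

Proposition 5.64 predicts HKC == HK + HM − HC, which the computation confirms.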
5.59. Suppose that the key equivocation of a certain cryptosystem vanishes, i.e.,
suppose that H(K | C) = 0. Prove that even a single observed ciphertext uniquely
determines which key was used.
5.60. Write a computer program that reads a text file and performs the following
tasks:
[1] Convert all alphabetic characters to lowercase and convert all strings of
consecutive nonalphabetic characters to a single space. (The reason for leaving in
a space is that when you count bigrams and trigrams, you will want to know
where words begin and end.)
[2] Count the frequency of each letter a-to-z, print a frequency table, and use your
frequency table to estimate the entropy of a single letter in English, as we did
in Sect. 5.6.3 using Table 1.3.
[3] Count the frequency of each bigram aa, ab, . . . , zz, being careful to include
only bigrams that appear within words. (As an alternative, also allow bigrams
that either start or end with a space, in which case there are 27^2 − 1 = 728
possible bigrams.) Print a frequency table of the 25 most common bigrams and
their probabilities, and use your full frequency table to estimate the entropy of
bigrams in English. In the notation of Sect. 5.6.3, this is the quantity H(L^2).
Compare (1/2)H(L^2) with the value of H(L) from step [2].
[4] Repeat [3], but this time with trigrams. Compare (1/3)H(L^3) with the values
of H(L) and (1/2)H(L^2) from [2] and [3]. (Note that for this part, you will need a
large quantity of text in order to get some reasonable frequencies.)
Try running your program on some long blocks of text. For example, the following
noncopyrighted material is available in the form of ordinary text files from Project
Gutenberg at http://www.gutenberg.org/. To what extent are the letter frequencies
similar and to what extent do they differ in these different texts?
(a) Alice’s Adventures in Wonderland by Lewis Carroll,
http://www.gutenberg.org/etext/11
(b) Relativity: the Special and General Theory by Albert Einstein,
http://www.gutenberg.org/etext/5001
(c) The Old Testament (translated from the original Hebrew, of course!),
http://www.gutenberg.org/etext/1609
(d) 20000 Lieues Sous Les Mers (20000 Leagues Under the Sea) by Jules Verne,
http://www.gutenberg.org/etext/5097. Note that this one is a little trickier,
since first you will need to convert all of the letters to their unaccented forms.
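Steps [1] and [2] of this exercise fit in a few lines; bigrams and trigrams follow the same pattern with `Counter` over pairs and triples. The function name is ours.

```python
import math
import re
from collections import Counter

def letter_entropy(text):
    """Steps [1]-[2] of Exercise 5.60: normalize the text, count
    single-letter frequencies, and estimate H(L) = -sum p log2 p."""
    # [1] lowercase; collapse every run of non-letters to a single space
    cleaned = re.sub(r'[^a-z]+', ' ', text.lower()).strip()
    letters = cleaned.replace(' ', '')
    if not letters:
        return 0.0
    counts = Counter(letters)          # [2] single-letter frequency table
    n = len(letters)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Running it on a long text file (e.g. one of the Project Gutenberg texts below) gives an estimate of the entropy of a single letter of English.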
Chapter 6
Elliptic Curves and Cryptography
The subject of elliptic curves encompasses a vast amount of mathematics.¹
Our aim in this section is to summarize just enough of the basic theory for
cryptographic applications. For additional reading, there are a number of
survey articles and books devoted to elliptic curve cryptography [14, 68, 81, 135],
and many others that describe the number theoretic aspects of the theory of
elliptic curves, including [25, 65, 73, 74, 136, 134, 138].
6.1 Elliptic Curves
An elliptic curve² is the set of solutions to an equation of the form

    Y^2 = X^3 + AX + B.

Equations of this type are called Weierstrass equations after the mathematician
who studied them extensively during the nineteenth century. Two
examples of elliptic curves,

    E1 : Y^2 = X^3 − 3X + 3 and E2 : Y^2 = X^3 − 6X + 5,

are illustrated in Fig. 6.1.
¹Indeed, even before elliptic curves burst into cryptographic prominence, a well-known
mathematician [73] opined that "it is possible to write endlessly on elliptic curves!"
²A word of warning. You may recall from high school geometry that an ellipse is a
geometric object that looks like a squashed circle. Elliptic curves are not ellipses, and
indeed, despite their somewhat unfortunate name, elliptic curves and ellipses have only the
most tenuous connection with one another.
© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography,
Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2_6
Figure 6.1: Two examples of elliptic curves, E1 : Y^2 = X^3 − 3X + 3 and E2 : Y^2 = X^3 − 6X + 5
An amazing feature of elliptic curves is that there is a natural way to take
two points on an elliptic curve and “add” them to produce a third point. We
put quotation marks around “add” because we are referring to an operation
that combines two points in a manner analogous to addition in some respects
(it is commutative and associative, and there is an identity), but very unlike
addition in other ways. The most natural way to describe the “addition law”
on elliptic curves is to use geometry.
Let P and Q be two points on an elliptic curve E, as illustrated in Fig. 6.2.
We start by drawing the line L through P and Q. This line L intersects E at
three points, namely P, Q, and one other point R. We take that point R and
reflect it across the x-axis (i.e., we multiply its Y-coordinate by −1) to get a
new point R′. The point R′ is called the "sum of P and Q," although as you
can see, this process is nothing like ordinary addition. For now, we denote this
strange addition law by the symbol ⊕. Thus we write³

    P ⊕ Q = R′.
Example 6.1. Let E be the elliptic curve

    Y^2 = X^3 − 15X + 18.    (6.1)

The points P = (7, 16) and Q = (1, 2) are on the curve E. The line L
connecting them is given by the equation⁴

    L : Y = (7/3)X − (1/3).    (6.2)

In order to find the points where E and L intersect, we substitute (6.2)
into (6.1) and solve for X. Thus

³Not to be confused with the identical symbol ⊕ that we used to denote the XOR
operation in a different context!
⁴Recall that the equation of the line through two points (x1, y1) and (x2, y2) is given
by the point–slope formula Y − y1 = λ · (X − x1), where the slope λ is equal to (y2 − y1)/(x2 − x1).
Figure 6.2: The addition law on an elliptic curve (the line L through P and Q meets E at a third point R, and P ⊕ Q = R′ is the reflection of R)

    ((7/3)X − (1/3))^2 = X^3 − 15X + 18,
    (49/9)X^2 − (14/9)X + (1/9) = X^3 − 15X + 18,
    0 = X^3 − (49/9)X^2 − (121/9)X + (161/9).

We need to find the roots of this cubic polynomial. In general, finding the
roots of a cubic is difficult. However, in this case we already know two of the
roots, namely X = 7 and X = 1, since we know that P and Q are in the
intersection E ∩ L. It is then easy to find the other factor,

    X^3 − (49/9)X^2 − (121/9)X + (161/9) = (X − 7) · (X − 1) · (X + 23/9),
so the third point of intersection of L and E has X-coordinate equal to −23/9.
Next we find the Y-coordinate by substituting X = −23/9 into Eq. (6.2). This
gives R = (−23/9, −170/27). Finally, we reflect across the X-axis to obtain

    P ⊕ Q = (−23/9, 170/27).
There are a few subtleties to elliptic curve addition that need to be
addressed. First, what happens if we want to add a point P to itself? Imagine
what happens to the line L connecting P and Q if the point Q slides along
the curve and gets closer and closer to P. In the limit, as Q approaches P, the
line L becomes the tangent line to E at P. Thus in order to add P to itself,
we simply take L to be the tangent line to E at P, as illustrated in Fig. 6.3.
Then L intersects E at P and at one other point R, so we can proceed as
Figure 6.3: Adding a point P to itself (L is tangent to E at P, and 2P = P ⊕ P = R′)
before. In some sense, L still intersects E at three points, but P counts as two
of them.
Example 6.2. Continuing with the curve E and point P from Example 6.1, we
compute P ⊕ P. The slope of E at P is computed by implicitly differentiating
equation (6.1). Thus

    2Y · (dY/dX) = 3X^2 − 15,  so  dY/dX = (3X^2 − 15)/(2Y).

Substituting the coordinates of P = (7, 16) gives slope λ = 33/8, so the tangent
line to E at P is given by the equation

    L : Y = (33/8)X − (103/8).    (6.3)
Now we substitute (6.3) into Eq. (6.1) for E, simplify, and factor:

    ((33/8)X − (103/8))^2 = X^3 − 15X + 18,
    X^3 − (1089/64)X^2 + (2919/32)X − (9457/64) = 0,
    (X − 7)^2 · (X − 193/64) = 0.

Notice that the X-coordinate of P, which is X = 7, appears as a double root
of the cubic polynomial, so it was easy for us to factor the cubic. Finally, we
substitute X = 193/64 into Eq. (6.3) for L to get Y = −223/512, and then we
switch the sign on Y to get

    P ⊕ P = (193/64, 223/512).
A second potential problem with our "addition law" arises if we try to add
a point P = (a, b) to its reflection about the X-axis P′ = (a, −b). The line L
through P and P′ is the vertical line x = a, and this line intersects E in only
the two points P and P′. (See Fig. 6.4.) There is no third point of intersection,
so it appears that we are stuck! But there is a way out. The solution is to
create an extra point O that lives "at infinity." More precisely, the point O
does not exist in the XY-plane, but we pretend that it lies on every vertical
line. We then set

    P ⊕ P′ = O.

We also need to figure out how to add O to an ordinary point P = (a, b)
on E. The line L connecting P to O is the vertical line through P, since O
lies on vertical lines, and that vertical line intersects E at the points P, O,
and P′ = (a, −b). To add P to O, we reflect P′ across the X-axis, which gets
us back to P. In other words, P ⊕ O = P, so O acts like zero for elliptic curve
addition.
Example 6.3. Continuing with the curve E from Example 6.1, notice that the
point T = (3, 0) is on the curve E and that the tangent line to E at T is the
vertical line X = 3. Thus if we add T to itself, we get T ⊕ T = O.
Definition. An elliptic curve E is the set of solutions to a Weierstrass
equation

    E : Y^2 = X^3 + AX + B,

together with an extra point O, where the constants A and B must satisfy

    4A^3 + 27B^2 ≠ 0.

The addition law on E is defined as follows. Let P and Q be two points
on E. Let L be the line connecting P and Q, or the tangent line to E at P
if P = Q. Then the intersection of E and L consists of three points P, Q,
and R, counted with appropriate multiplicities and with the understanding
that O lies on every vertical line. Writing R = (a, b), the sum of P and Q is
defined to be the reflection R′ = (a, −b) of R across the X-axis. This sum is
denoted by P ⊕ Q, or simply by P + Q.
Further, if P = (a, b), we denote the reflected point by ⊖P = (a, −b), or
simply by −P; and we define P ⊖ Q (or P − Q) to be P ⊕ (⊖Q). Similarly,
repeated addition is represented as multiplication of a point by an integer,

    nP = P + P + P + · · · + P  (n copies).
Remark 6.4. What is this extra condition 4A^3 + 27B^2 ≠ 0? The quantity
ΔE = 4A^3 + 27B^2 is called the discriminant of E. The condition ΔE ≠ 0
is equivalent to the condition that the cubic polynomial X^3 + AX + B have
no repeated roots, i.e., if we factor X^3 + AX + B completely as
Figure 6.4: The vertical line L through P = (a, b) and P′ = (a, −b); vertical lines have no third intersection point with E
    X^3 + AX + B = (X − e1)(X − e2)(X − e3),

where e1, e2, e3 are allowed to be complex numbers, then

    4A^3 + 27B^2 ≠ 0 if and only if e1, e2, e3 are distinct.

(See Exercise 6.3.) Curves with ΔE = 0 have singular points (see Exercise 6.4).
The addition law does not work well on these curves. That is why we include
the requirement that ΔE ≠ 0 in our definition of an elliptic curve.
Theorem 6.5. Let E be an elliptic curve. Then the addition law on E has
the following properties:
(a) P + O = O + P = P for all P ∈ E. [Identity]
(b) P + (−P) = O for all P ∈ E. [Inverse]
(c) (P + Q) + R = P + (Q + R) for all P, Q, R ∈ E. [Associative]
(d) P + Q = Q + P for all P, Q ∈ E. [Commutative]
In other words, the addition law makes the points of E into an abelian group.
(See Sect. 2.5 for a general discussion of groups and their axioms.)
Proof. As we explained earlier, the identity law (a) and inverse law (b) are
true because O lies on all vertical lines. The commutative law (d) is easy to
verify, since the line that goes through P and Q is the same as the line that
goes through Q and P, so the order of the points does not matter.
The remaining piece of Theorem 6.5 is the associative law (c). One might
not think that this would be hard to prove, but if you draw a picture and
start to put in all of the lines needed to verify (c), you will see that it is quite
complicated. There are many ways to prove the associative law, but none of
the proofs are easy. After we develop explicit formulas for the addition law
on E (Theorem 6.6), you can use those formulas to check the associative law
by a direct (but painful) calculation. More perspicacious, but less elementary,
proofs may be found in [74, 136, 138] and other books on elliptic curves.
Our next task is to find explicit formulas to enable us to easily add and
subtract points on an elliptic curve. The derivation of these formulas uses
elementary analytic geometry, a little bit of differential calculus to find a
tangent line, and a certain amount of algebraic manipulation. We state the
results in the form of an algorithm, and then briefly indicate the proof.
Theorem 6.6 (Elliptic Curve Addition Algorithm). Let

    E : Y^2 = X^3 + AX + B

be an elliptic curve and let P1 and P2 be points on E.
(a) If P1 = O, then P1 + P2 = P2.
(b) Otherwise, if P2 = O, then P1 + P2 = P1.
(c) Otherwise, write P1 = (x1, y1) and P2 = (x2, y2).
(d) If x1 = x2 and y1 = −y2, then P1 + P2 = O.
(e) Otherwise, define λ by

    λ = (y2 − y1)/(x2 − x1)   if P1 ≠ P2,
    λ = (3x1^2 + A)/(2y1)     if P1 = P2,

and let

    x3 = λ^2 − x1 − x2 and y3 = λ(x1 − x3) − y1.

Then P1 + P2 = (x3, y3).
Proof. Parts (a) and (b) are clear, and (d) is the case that the line through P1
and P2 is vertical, so P1 + P2 = O. (Note that if y1 = y2 = 0, then the tangent
line is vertical, so that case works, too.) For (e), we note that if P1 ≠ P2,
then λ is the slope of the line through P1 and P2, and if P1 = P2, then λ is
the slope of the tangent line at P1 = P2. In either case the line L is given by
the equation Y = λX + ν with ν = y1 − λx1. Substituting the equation for L
into the equation for E gives

    (λX + ν)^2 = X^3 + AX + B,

so

    X^3 − λ^2 X^2 + (A − 2λν)X + (B − ν^2) = 0.
We know that this cubic has x1 and x2 as two of its roots. If we call the third
root x3, then it factors as

    X^3 − λ^2 X^2 + (A − 2λν)X + (B − ν^2) = (X − x1)(X − x2)(X − x3).

Now multiply out the right-hand side and look at the coefficient of X^2 on each
side. The coefficient of X^2 on the right-hand side is −x1 − x2 − x3, which must
equal −λ^2, the coefficient of X^2 on the left-hand side. This allows us to solve
for x3 = λ^2 − x1 − x2, and then the Y-coordinate of the third intersection
point of E and L is given by λx3 + ν. Finally, in order to get P1 + P2, we
must reflect across the X-axis, which means replacing the Y-coordinate with
its negative.
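The algorithm of Theorem 6.6 translates directly into code. A sketch over Q using exact rational arithmetic (only the coefficient A enters the formulas; `None` plays the role of the extra point O):

```python
from fractions import Fraction as F

def ec_add(P, Q, A):
    """Add points on E: Y^2 = X^3 + A*X + B by the algorithm of
    Theorem 6.6.  Points are pairs (x, y); None represents O."""
    if P is None:                       # (a) O + P2 = P2
        return Q
    if Q is None:                       # (b) P1 + O = P1
        return P
    x1, y1 = (F(v) for v in P)          # (c)
    x2, y2 = (F(v) for v in Q)
    if x1 == x2 and y1 == -y2:          # (d) vertical line: sum is O
        return None
    if (x1, y1) == (x2, y2):            # (e) tangent slope
        lam = (3 * x1 * x1 + A) / (2 * y1)
    else:                               # (e) chord slope
        lam = (y2 - y1) / (x2 - x1)
    x3 = lam * lam - x1 - x2
    y3 = lam * (x1 - x3) - y1           # the reflection is built in here
    return (x3, y3)
```

It reproduces Examples 6.1–6.3: adding (7, 16) and (1, 2) on Y^2 = X^3 − 15X + 18 gives (−23/9, 170/27), doubling (7, 16) gives (193/64, 223/512), and doubling (3, 0) returns None (that is, O).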
6.2 Elliptic Curves over Finite Fields
In the previous section we developed the theory of elliptic curves geometrically.
For example, the sum of two distinct points P and Q on an elliptic curve E
is defined by drawing the line L connecting P to Q and then finding the third
point where L and E intersect, as illustrated in Fig. 6.2. However, in order to
apply the theory of elliptic curves to cryptography, we need to look at elliptic
curves whose points have coordinates in a finite field Fp. This is easy to do.
Definition. Let p ≥ 3 be a prime. An elliptic curve over Fp is an equation
of the form

    E : Y^2 = X^3 + AX + B with A, B ∈ Fp satisfying 4A^3 + 27B^2 ≠ 0.

The set of points on E with coordinates in Fp is the set

    E(Fp) = { (x, y) : x, y ∈ Fp satisfy y^2 = x^3 + Ax + B } ∪ {O}.
Remark 6.7. Elliptic curves over F2 are actually quite important in cryptography,
but they require more complicated equations, so we delay our discussion
of them until Sect. 6.7.
Example 6.8. Consider the elliptic curve

    E : Y^2 = X^3 + 3X + 8 over the field F13.

We can find the points of E(F13) by substituting in all possible values X = 0, 1,
2, . . . , 12 and checking for which X values the quantity X^3 + 3X + 8 is a
square modulo 13. For example, putting X = 0 gives 8, and 8 is not a square
modulo 13. Next we try X = 1, which gives 1 + 3 + 8 = 12. It turns out that 12
is a square modulo 13; in fact, it has two square roots,

    5^2 ≡ 12 (mod 13) and 8^2 ≡ 12 (mod 13).

This gives two points (1, 5) and (1, 8) in E(F13). Continuing in this fashion,
we end up with a complete list,

    E(F13) = {O, (1, 5), (1, 8), (2, 3), (2, 10), (9, 6), (9, 7), (12, 2), (12, 11)}.

Thus E(F13) consists of nine points.
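The brute-force search of Example 6.8 is a one-screen program (the function name is ours; the work is O(p), which is fine for small fields):

```python
def curve_points(A, B, p):
    """List the points of E(F_p) for E: Y^2 = X^3 + A*X + B by testing,
    for every x, which y satisfy the curve equation mod p."""
    points = ['O']                       # the extra point at infinity
    for x in range(p):
        rhs = (x**3 + A*x + B) % p
        for y in range(p):
            if (y * y) % p == rhs:
                points.append((x, y))
    return points
```

Calling `curve_points(3, 8, 13)` returns O together with the eight affine points listed above.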
Suppose now that P and Q are two points in E(Fp) and that we want to
“add” the points P and Q. One possibility is to develop a theory of geometry
using the field Fp instead of R. Then we could mimic our earlier constructions
to define P + Q. This can be done, and it leads to a fascinating field of
mathematics called algebraic geometry. However, in the interests of brevity
of exposition, we instead use the explicit formulas given in Theorem 6.6 to
add points in E(Fp). But we note that if one wants to gain a deeper
understanding of the theory of elliptic curves, then it is necessary to use some of
the machinery and some of the formalism of algebraic geometry.
Let P = (x1, y1) and Q = (x2, y2) be points in E(Fp). We define the sum
P + Q to be the point (x3, y3) obtained by applying the elliptic curve addition
algorithm (Theorem 6.6). Notice that in this algorithm, the only operations
used are addition, subtraction, multiplication, and division involving the
coefficients of E and the coordinates of P and Q. Since those coefficients and
coordinates are in the field Fp, we end up with a point (x3, y3) whose
coordinates are in Fp. Of course, it is not completely clear that (x3, y3) is a point
in E(Fp).
Theorem 6.9. Let E be an elliptic curve over Fp and let P and Q be points
in E(Fp).
(a) The elliptic curve addition algorithm (Theorem 6.6) applied to P and Q
yields a point in E(Fp). We denote this point by P + Q.
(b) This addition law on E(Fp) satisfies all of the properties listed in
Theorem 6.5. In other words, this addition law makes E(Fp) into a
finite group.
Proof. The formulas in Theorem 6.6(e) are derived by substituting the equation
of a line into the equation for E and solving for X, so the resulting point
is automatically a point on E, i.e., it is a solution to the equation defining E.
This shows why (a) is true, although when P = Q, a small additional argument
is needed to indicate why the resulting cubic polynomial has a double root.
For (b), the identity law follows from the addition algorithm steps (a) and (b),
the inverse law is clear from the addition algorithm Step (d), and the
commutative law is easy, since a brief examination of the addition algorithm shows
that switching the two points leads to the same result. Unfortunately, the
associative law is not so clear. It is possible to verify the associative law directly
using the addition algorithm formulas, although there are many special cases
to consider. The alternative is to develop more of the general theory of elliptic
curves, as is done in the references cited in the proof of Theorem 6.5.
Example 6.10. We continue with the elliptic curve

    E : Y^2 = X^3 + 3X + 8 over F13

from Example 6.8, and we use the addition algorithm (Theorem 6.6) to add
the points P = (9, 7) and Q = (1, 8) in E(F13). Step (e) of that algorithm
tells us to first compute

    λ = (y2 − y1)/(x2 − x1) = (8 − 7)/(1 − 9) = 1/(−8) = 1/5 = 8,

where recall that all computations⁵ are being performed in the field F13, so
−8 = 5 and 1/5 = 5^(−1) = 8. Next we compute

    ν = y1 − λx1 = 7 − 8 · 9 = −65 = 0.

Finally, the addition algorithm tells us to compute

    x3 = λ^2 − x1 − x2 = 64 − 9 − 1 = 54 = 2,
    y3 = −(λx3 + ν) = −8 · 2 = −16 = 10.

This completes the computation of

    P + Q = (1, 8) + (9, 7) = (2, 10) in E(F13).

Similarly, we can use the addition algorithm to add P = (9, 7) to itself.
Keeping in mind that all calculations are in F13, we find that

    λ = (3x1^2 + A)/(2y1) = (3 · 9^2 + 3)/(2 · 7) = 246/14 = 12 and ν = y1 − λx1 = 7 − 12 · 9 = 3.

Then

    x3 = λ^2 − x1 − x2 = 12^2 − 9 − 9 = 9 and y3 = −(λx3 + ν) = −(12 · 9 + 3) = 6,

so P + P = (9, 7) + (9, 7) = (9, 6) in E(F13). In a similar fashion, we can
compute the sum of every pair of points in E(F13). The results are listed in
Table 6.1.
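The same formulas run over F13 once division is replaced by multiplication by a modular inverse; in Python, `pow(d, -1, p)` computes d^(−1) mod p. A sketch (the function name is ours) that reproduces the computations of Example 6.10:

```python
def ec_add_modp(P, Q, A, p):
    """Addition on E(F_p) via Theorem 6.6, with all arithmetic mod p;
    None represents the point O at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                       # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)
```

Two nested loops over the nine points of E(F13) then regenerate all of Table 6.1.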
It is clear that the set of points E(Fp) is a finite set, since there are only
finitely many possibilities for the X- and Y-coordinates. More precisely, there
are p possibilities for X, and then for each X, the equation

    Y^2 = X^3 + AX + B

shows that there are at most two possibilities for Y. (See Exercise 1.36.)
Adding in the extra point O, this shows that E(Fp) has at most 2p + 1
points. However, this estimate is considerably larger than the true size.
⁵This is a good time to learn that 1/5 is a symbol for a solution to the equation 5x = 1.
In order to assign a value to the symbol 1/5, you must know where that value lives. In Q, the
value of 1/5 is the usual number with which you are familiar, but in F13 the value of 1/5 is 8,
while in F11 the value of 1/5 is 9. And in F5 the symbol 1/5 is not assigned a value.
O (1, 5) (1, 8) (2, 3) (2, 10) (9, 6) (9, 7) (12, 2) (12, 11)
O O (1, 5) (1, 8) (2, 3) (2, 10) (9, 6) (9, 7) (12, 2) (12, 11)
(1, 5) (1, 5) (2, 10) O (1, 8) (9, 7) (2, 3) (12, 2) (12, 11) (9, 6)
(1, 8) (1, 8) O (2, 3) (9, 6) (1, 5) (12, 11) (2, 10) (9, 7) (12, 2)
(2, 3) (2, 3) (1, 8) (9, 6) (12, 11) O (12, 2) (1, 5) (2, 10) (9, 7)
(2, 10) (2, 10) (9, 7) (1, 5) O (12, 2) (1, 8) (12, 11) (9, 6) (2, 3)
(9, 6) (9, 6) (2, 3) (12, 11) (12, 2) (1, 8) (9, 7) O (1, 5) (2, 10)
(9, 7) (9, 7) (12, 2) (2, 10) (1, 5) (12, 11) O (9, 6) (2, 3) (1, 8)
(12, 2) (12, 2) (12, 11) (9, 7) (2, 10) (9, 6) (1, 5) (2, 3) (1, 8) O
(12, 11) (12, 11) (9, 6) (12, 2) (9, 7) (2, 3) (2, 10) (1, 8) O (1, 5)
Table 6.1: Addition table for E : Y^2 = X^3 + 3X + 8 over F13
When we plug in a value for X, there are three possibilities for the value
of the quantity X^3 + AX + B. First, it may be a quadratic residue modulo p,
in which case it has two square roots and we get two points in E(Fp). This
happens about 50% of the time. Second, it may be a nonresidue modulo p,
in which case we discard X. This also happens about 50% of the time. Third,
it might equal 0, in which case we get one point in E(Fp), but this case happens
very rarely.⁶ Thus we might expect that the number of points in E(Fp) is
approximately

    #E(Fp) ≈ 50% · 2 · p + 1 = p + 1.

A famous theorem of Hasse, later vastly generalized by Weil and Deligne, says
that this is true up to random fluctuations.
Theorem 6.11 (Hasse). Let E be an elliptic curve over Fp. Then

    #E(Fp) = p + 1 − tp with tp satisfying |tp| ≤ 2√p.
Definition. The quantity
tp = p + 1 − #E(Fp)
appearing in Theorem 6.11 is called the trace of Frobenius for E/Fp. We will
not explain the somewhat technical reasons for this name, other than to say
that tp appears as the trace of a certain 2-by-2 matrix that acts as a linear
transformation on a certain two-dimensional vector space associated to E/Fp.
Example 6.12. Let E be given by the equation

    E : Y^2 = X^3 + 4X + 6.

We can think of E as an elliptic curve over Fp for different finite fields Fp and
count the number of points in E(Fp). Table 6.2 lists the results for the first
few primes, together with the value of tp and, for comparison purposes, the
value of 2√p.
⁶The congruence X^3 + AX + B ≡ 0 (mod p) has at most three solutions, and if p is
large, the chance of randomly choosing one of them is very small.
 p   #E(Fp)   tp    2√p
 3      4      0    3.46
 5      8     −2    4.47
 7     11     −3    5.29
11     16     −4    6.63
13     14      0    7.21
17     15      3    8.25

Table 6.2: Number of points and trace of Frobenius for E : Y^2 = X^3 + 4X + 6
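Table 6.2 can be regenerated with the naive O(p) count that Remark 6.13 goes on to mention (a sketch; the function name is ours):

```python
def count_points(A, B, p):
    """Compute #E(F_p) for E: Y^2 = X^3 + A*X + B by brute force."""
    n = 1                                # start at 1 to count O
    for x in range(p):
        rhs = (x**3 + A*x + B) % p
        n += sum(1 for y in range(p) if (y * y) % p == rhs)
    return n

# The trace of Frobenius is then t_p = p + 1 - count_points(A, B, p),
# and Hasse's theorem (Theorem 6.11) promises |t_p| <= 2*sqrt(p).
```

For the curve of Example 6.12 this reproduces each row of Table 6.2.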
Remark 6.13. Hasse's theorem (Theorem 6.11) gives a bound for #E(Fp), but
it does not provide a method for calculating this quantity. In principle, one can
substitute in each value for X and check the value of X^3 + AX + B against
a table of squares modulo p, but this takes time O(p), so is very inefficient.
Schoof [120] found an algorithm to compute #E(Fp) in time O((log p)^6), i.e.,
he found a polynomial-time algorithm. Schoof's algorithm was improved and
made practical by Elkies and Atkin, so it is now known as the SEA algorithm.
We will not describe SEA, which uses advanced techniques from the theory
of elliptic curves, but see [121]. Also see Remark 6.32 in Sect. 6.7 for another
counting algorithm due to Satoh that is designed for a different type of finite
field.
6.3 The Elliptic Curve Discrete Logarithm Problem (ECDLP)

In Chap. 2 we talked about the discrete logarithm problem (DLP) in the finite
field Fp*. In order to create a cryptosystem based on the DLP for Fp*, Alice
publishes two numbers g and h, and her secret is the exponent x that solves
the congruence

    h ≡ g^x (mod p).

Let's consider how Alice can do something similar with an elliptic curve E
over Fp. If Alice views g and h as being elements of the group Fp*, then the
discrete logarithm problem requires Alice's adversary Eve to find an x such
that

    h ≡ g · g · g · · · g (mod p),  with x multiplications of g.

In other words, Eve needs to determine how many times g must be multiplied
by itself in order to get to h.

With this formulation, it is clear that Alice can do the same thing with the
group of points E(Fp) of an elliptic curve E over a finite field Fp. She chooses
and publishes two points P and Q in E(Fp), and her secret is an integer n
that makes

    Q = P + P + P + · · · + P  (n additions on E)  = nP.

Then Eve needs to find out how many times P must be added to itself in
order to get Q. Keep in mind that although the "addition law" on an elliptic
curve is conventionally written with a plus sign, addition on E is actually a
very complicated operation, so this elliptic analogue of the discrete logarithm
problem may be quite difficult to solve.

Definition. Let E be an elliptic curve over the finite field Fp and let P and Q
be points in E(Fp). The Elliptic Curve Discrete Logarithm Problem (ECDLP)
is the problem of finding an integer n such that Q = nP. By analogy with the
discrete logarithm problem for Fp*, we denote this integer n by

    n = logP(Q)

and we call n the elliptic discrete logarithm of Q with respect to P.
Remark 6.14. Our definition of logP (Q) is not quite precise. The first difficulty
is that there may be points P, Q ∈ E(Fp) such that Q is not a multiple of P. In
this case, logP (Q) is not defined. However, for cryptographic purposes, Alice
starts out with a public point P and a private integer n and she computes
and publishes the value of Q = nP. So in practical applications, logP (Q) exists
and its value is Alice’s secret.
The second difficulty is that if there is one value of n satisfying Q = nP,
then there are many such values. To see this, we first note that there exists a
positive integer s such that sP = O. We recall the easy proof of this fact (cf.
Proposition 2.12). Since E(Fp) is finite, the points in the list P, 2P, 3P, 4P, . . .
cannot all be distinct. Hence there are integers k  j such that kP = jP,
and we can take s = k − j. The smallest such s ≥ 1 is called the order of P.
(Proposition 2.13 tells us that the order of P divides #E(Fp).) Thus if s is
the order of P and if n0 is any integer such that Q = n0P, then the solutions
to Q = nP are the integers n = n0 + is with i ∈ Z. (See Exercise 6.9.)
This means that the value of logP (Q) is really an element of Z/sZ, i.e.,
logP (Q) is an integer modulo s, where s is the order of P. For concreteness we
could set logP (Q) equal to n0. However the advantage of defining the values
to be in Z/sZ is that the elliptic discrete logarithm then satisfies
logP (Q1 + Q2) = logP (Q1) + logP (Q2) for all Q1, Q2 ∈ E(Fp). (6.4)
Notice the analogy with the ordinary logarithm log(αβ) = log(α) + log(β)
and the discrete logarithm for F∗p (cf. Remark 2.2). The fact that the discrete
logarithm for E(Fp) satisfies (6.4) means that it respects the addition law
when the group E(Fp) is mapped to the group Z/sZ. We say that the map logP
defines a group homomorphism (cf. Exercise 2.13)
logP : E(Fp) −→ Z/sZ.
312 6. Elliptic Curves and Cryptography
Example 6.15. Consider the elliptic curve
E : Y^2 = X^3 + 8X + 7 over F73.
The points P = (32, 53) and Q = (39, 17) are both in E(F73), and it is easy
to verify (by hand if you’re patient and with a computer if not) that
Q = 11P, so logP (Q) = 11.
Similarly, R = (35, 47) ∈ E(F73) and S = (58, 4) ∈ E(F73), and after some
computation we find that they satisfy R = 37P and S = 28P, so
logP (R) = 37 and logP (S) = 28.
Finally, we mention that #E(F73) = 82, but P satisfies 41P = O. Thus P
has order 41 = 82/2, so only half of the points in E(F73) are multiples of P.
For example, (20, 65) is in E(F73), but it does not equal a multiple of P.
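The multiples in Example 6.15 are small enough to check by machine. Here is a minimal Python sketch (the function names are ours; the addition formulas are the standard affine ones from Theorem 6.6) that recovers the elliptic discrete logs by brute force:

```python
# E : Y^2 = X^3 + 8X + 7 over F_73, with P = (32, 53) as in Example 6.15.
p, A = 73, 8
O = None  # the point at infinity

def ec_add(P1, P2):
    """Affine point addition on E(F_p); the point O is represented by None."""
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                          # P + (-P) = O
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p         # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def elliptic_log(P, Q):
    """Smallest n >= 1 with nP = Q, by listing P, 2P, 3P, ... (fine on a tiny curve)."""
    R, n = P, 1
    while R != Q:
        R, n = ec_add(R, P), n + 1
    return n

P = (32, 53)
print(elliptic_log(P, (39, 17)))  # 11
print(elliptic_log(P, (35, 47)))  # 37
print(elliptic_log(P, (58, 4)))   # 28
print(elliptic_log(P, O))         # 41, the order of P
```

The last call confirms that 41P = O, so only the 41 multiples of P are reachable and logP is undefined at the remaining points, such as (20, 65).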
6.3.1 The Double-and-Add Algorithm
It appears to be quite difficult to recover the value of n from the two points P
and Q = nP in E(Fp), i.e., it is difficult to solve the ECDLP. We will say
more about the difficulty of the ECDLP in later sections. However, in order
to use the function
Z −→ E(Fp), n −→ nP,
for cryptography, we need to efficiently compute nP from the known values n
and P. If n is large, we certainly do not want to compute nP by comput-
ing P, 2P, 3P, 4P, . . . .
The most efficient way to compute nP is very similar to the method that we
described in Sect. 1.3.2 for computing powers a^n (mod N), which we needed
for Diffie–Hellman key exchange (Sect. 2.3) and for the Elgamal and RSA
public key cryptosystems (Sects. 2.4 and 3.2). However, since the operation
on an elliptic curve is written as addition instead of as multiplication, we call
it “double-and-add” instead of “square-and-multiply.”
The underlying idea is the same as before. We first write n in binary
form as
n = n0 + n1 · 2 + n2 · 4 + n3 · 8 + · · · + nr · 2^r
with n0, n1, . . . , nr ∈ {0, 1}.
(We also assume that nr = 1.) Next we compute the following quantities:
Q0 = P, Q1 = 2Q0, Q2 = 2Q1, . . . , Qr = 2Qr−1.
Notice that Qi is simply twice the previous Qi−1, so
Qi = 2^i P.
Input. Point P ∈ E(Fp) and integer n ≥ 1.
1. Set Q = P and R = O.
2. Loop while n > 0.
3. If n ≡ 1 (mod 2), set R = R + Q.
4. Set Q = 2Q and n = ⌊n/2⌋.
5. If n > 0, continue with loop at Step 2.
6. Return the point R, which equals nP.
Table 6.3: The double-and-add algorithm for elliptic curves
These points are referred to as 2-power multiples of P, and computing them
requires r doublings. Finally, we compute nP using at most r additional
additions,
nP = n0Q0 + n1Q1 + n2Q2 + · · · + nrQr.
We’ll refer to the addition of two points in E(Fp) as a point operation. Thus
the total time to compute nP is at most 2r point operations in E(Fp). Notice
that n ≥ 2^r, so it takes no more than 2 log2(n) point operations to compute nP. This makes it feasible to compute nP even for very large values
of n. We have summarized the double-and-add algorithm in Table 6.3.
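The loop of Table 6.3 translates almost line for line into code. Here is a Python sketch (toy affine arithmetic over Fp, not a hardened or constant-time implementation), applied to the parameters of Example 6.16 below:

```python
# Double-and-add (Table 6.3) on E : Y^2 = X^3 + 14X + 19 over F_3623.
p, A = 3623, 14

def ec_add(P1, P2):
    """Affine addition on E(F_p); None plays the role of the point O."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def double_and_add(n, P):
    Q, R = P, None            # Step 1: Q = P, R = O
    while n > 0:              # Step 2: loop while n > 0
        if n % 2 == 1:        # Step 3: if n is odd ...
            R = ec_add(R, Q)  #         ... set R = R + Q
        Q = ec_add(Q, Q)      # Step 4: Q = 2Q,
        n //= 2               #         n = floor(n/2)
    return R                  # Step 6: R = nP

print(double_and_add(947, (6, 730)))  # (3492, 60), as in Example 6.16
```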
Example 6.16. We use the Double-and-Add Algorithm as described in
Table 6.3 to compute nP in E(Fp) for
n = 947, E : Y^2 = X^3 + 14X + 19, p = 3623, P = (6, 730).
The binary expansion of n is
n = 947 = 1 + 2 + 2^4 + 2^5 + 2^7 + 2^8 + 2^9.
The step by step calculation, which requires nine doublings and six additions,
is given in Table 6.4. The final result is 947P = (3492, 60). (The n column in
Table 6.4 refers to the n used in the algorithm described in Table 6.3.)
Remark 6.17. There is an additional technique that can be used to further
reduce the time required to compute nP. The idea is to write n using sums and
differences of powers of 2. The reason that this is advantageous is because there
are generally fewer terms, so fewer point additions are needed to compute nP.
It is important to observe that subtracting two points on an elliptic curve is as
easy as adding them, since −(x, y) = (x, −y). This is rather different from F∗p, where computing a^−1 takes significantly more time than it takes to multiply two elements.
An example will help to illustrate the idea. We saw in Example 6.16 that
947 = 1 + 2 + 2^4 + 2^5 + 2^7 + 2^8 + 2^9, so it takes 15 point operations
(9 doublings and 6 additions) to compute 947P. But if we instead write
947 = 1 + 2 − 2^4 − 2^6 + 2^10,
Step i |  n  | Q = 2^i P    | R
   0   | 947 | (6, 730)     | O
   1   | 473 | (2521, 3601) | (6, 730)
   2   | 236 | (2277, 502)  | (2149, 196)
   3   | 118 | (3375, 535)  | (2149, 196)
   4   |  59 | (1610, 1851) | (2149, 196)
   5   |  29 | (1753, 2436) | (2838, 2175)
   6   |  14 | (2005, 1764) | (600, 2449)
   7   |   7 | (2425, 1791) | (600, 2449)
   8   |   3 | (3529, 2158) | (3247, 2849)
   9   |   1 | (2742, 3254) | (932, 1204)
  10   |   0 | (1814, 3480) | (3492, 60)
Table 6.4: Computing 947 · (6, 730) on Y^2 = X^3 + 14X + 19 modulo 3623
then we can compute
947P = P + 2P − 2^4 P − 2^6 P + 2^10 P
using 10 doublings and 4 additions, for a total of 14 point operations. Writing
a number n as a sum of positive and negative powers of 2 is called a ternary
expansion of n.
How much savings can we expect? Suppose that n is a large number and
let k = ⌊log2 n⌋ + 1. In the worst case, if n has the form 2^k − 1, then computing nP using a binary expansion of n requires 2k point operations (k doublings
and k additions), since
2^k − 1 = 1 + 2 + 2^2 + · · · + 2^(k−1).
But if we allow ternary expansions, then we prove below (Proposition 6.18)
that computing nP never requires more than (3/2)k + 1 point operations (k + 1
doublings and (1/2)k additions).
This is the worst-case scenario, but it's also important to know what happens on average. The binary expansion of a random number has approximately
the same number of 1's and 0's, so for most n, computing nP using the binary
expansion of n takes about (3/2)k steps (k doublings and (1/2)k additions). But if we
allow sums and differences of powers of 2, then one can show that most n have
an expansion with 2/3 of the terms being 0. So for most n, we can compute nP
in about (4/3)k + 1 steps (k + 1 doublings and (1/3)k additions).
Proposition 6.18. Let n be a positive integer and let k = ⌊log2 n⌋ + 1, which
means that 2^k > n. Then we can always write
n = u0 + u1 · 2 + u2 · 4 + u3 · 8 + · · · + uk · 2^k    (6.5)
with u0, u1, . . . , uk ∈ {−1, 0, 1} and at most (1/2)k of the ui nonzero.
Proof. The proof is essentially an algorithm for writing n in the desired form.
We start by writing n in binary,
n = n0 + n1 · 2 + n2 · 4 + · · · + nk−1 · 2^(k−1) with n0, . . . , nk−1 ∈ {0, 1}.
Working from left to right, we look for the first occurrence of two or more
consecutive nonzero ni coefficients. For example, suppose that
ns = ns+1 = · · · = ns+t−1 = 1 and ns+t = 0
for some t ≥ 2. In other words, the quantity
2^s + 2^(s+1) + · · · + 2^(s+t−1) + 0 · 2^(s+t)    (6.6)
appears in the binary expansion of n. We observe that
2^s + 2^(s+1) + · · · + 2^(s+t−1) + 0 · 2^(s+t) = 2^s (1 + 2 + 4 + · · · + 2^(t−1)) = 2^s (2^t − 1),
so we can replace (6.6) with
−2^s + 2^(s+t).
Repeating this procedure, we end up with an expansion of n of the form (6.5)
in which no two consecutive ui are nonzero. (Note that although the original
binary expansion went up to only 2^(k−1), the new expansion might go up to 2^k.)
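The rewriting step in this proof is, in essence, the computation of a signed binary ("non-adjacent form") expansion. A Python sketch (the function name is ours; it scans from the low bit upward rather than left to right, so on n = 947 it yields −1 + 2^2 − 2^4 − 2^6 + 2^10, an equally short variant of the expansion used in Remark 6.17):

```python
def signed_expansion(n):
    """Digits u_i in {-1, 0, 1} with n = sum u_i * 2^i and no two adjacent nonzero."""
    digits = []
    while n > 0:
        if n % 2 == 1:
            u = 2 - (n % 4)   # u = +1 if n = 1 (mod 4), u = -1 if n = 3 (mod 4)
            n -= u            # now n is divisible by 4 whenever u != 0
        else:
            u = 0
        digits.append(u)
        n //= 2
    return digits

d = signed_expansion(947)
print(d)  # [-1, 0, 1, 0, -1, 0, -1, 0, 0, 0, 1]: 947 = -1 + 2^2 - 2^4 - 2^6 + 2^10
```

This expansion has five nonzero digits, just like 1 + 2 − 2^4 − 2^6 + 2^10 above, so it also computes 947P with 10 doublings and 4 additions.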
6.3.2 How Hard Is the ECDLP?
The collision algorithms described in Sect. 5.4 are easily adapted to any group,
for example to the group of points E(Fp) on an elliptic curve. In order to
solve Q = nP, Eve chooses random integers j1, . . . , jr and k1, . . . , kr between 1
and p and makes two lists of points:
List #1. j1P, j2P, j3P, . . . , jrP,
List #2. k1P + Q, k2P + Q, k3P + Q, . . . , krP + Q.
As soon as she finds a match (collision) between the two lists, she is done,
since if she finds juP = kvP + Q, then Q = (ju − kv)P provides the solution.
As we saw in Sect. 5.4, if r is somewhat larger than √p, say r ≈ 3√p, then
there is a very good chance that there will be a collision.
This naive collision algorithm requires quite a lot of storage for the two
lists. However, it is not hard to adapt Pollard’s ρ method from Sect. 5.5 to
devise a storage-free collision algorithm with a similar running time. (See Ex-
ercise 6.13.) In any case, there are certainly algorithms that solve the ECDLP
for E(Fp) in O(√p) steps.
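The two-list attack is easy to simulate on the tiny curve of Example 6.15, where P = (32, 53) has order 41. A sketch (the function names are ours, and we keep drawing random multipliers until the lists collide):

```python
import random

# Collision attack on Q = nP for E : Y^2 = X^3 + 8X + 7 over F_73 (Example 6.15).
p, A = 73, 8

def ec_add(P1, P2):
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mult(n, P):
    R, Q = None, P
    while n > 0:
        if n & 1:
            R = ec_add(R, Q)
        Q, n = ec_add(Q, Q), n >> 1
    return R

def collision_log(P, Q, order):
    """Find n with Q = nP: grow the lists {jP} and {kP + Q} until they collide."""
    rng = random.Random(0)
    list1, list2 = {}, {}                            # point -> multiplier
    while True:
        j, k = rng.randrange(1, p), rng.randrange(1, p)
        list1[mult(j, P)] = j
        list2[ec_add(mult(k, P), Q)] = k
        for pt in list1.keys() & list2.keys():
            return (list1[pt] - list2[pt]) % order   # jP = kP + Q => Q = (j - k)P

print(collision_log((32, 53), (39, 17), 41))  # 11
```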
We have seen that there are much faster ways to solve the discrete logarithm
problem for F∗p. In particular, the index calculus described in Sect. 3.8
has a subexponential running time, i.e., the running time is O(p^ε) for every
ε > 0. The principal reason that elliptic curves are used in cryptography
is the fact that there are no index calculus algorithms known for the ECDLP,
and indeed, there are no general algorithms known that solve the ECDLP in
fewer than O(√p) steps. In other words, despite the highly structured nature
of the group E(Fp), the fastest known algorithms to solve the ECDLP are no
better than the generic algorithm that works equally well to solve the discrete
logarithm problem in any group. This fact is sufficiently important that it
bears highlighting.
The fastest known algorithm to solve the ECDLP in E(Fp) takes approximately √p steps.
Thus the ECDLP appears to be much more difficult than the DLP. Recall,
however, that there are some primes p for which the DLP in F∗p is comparatively
easy. For example, if p − 1 is a product of small primes, then the Pohlig–
Hellman algorithm (Theorem 2.31) gives a quick solution to the DLP in F∗p.
In a similar fashion, there are some elliptic curves and some primes for which
the ECDLP in E(Fp) is comparatively easy. We discuss some of these special
cases, which must be avoided in the construction of secure cryptosystems, in
Sect. 6.9.1.
6.4 Elliptic Curve Cryptography
It is finally time to apply elliptic curves to cryptography. We start with the
easiest application, Diffie–Hellman key exchange, which involves little more
than replacing the discrete logarithm problem for the finite field Fp with
the discrete logarithm problem for an elliptic curve E(Fp). We then describe
elliptic analogues of the Elgamal public key cryptosystem and the digital
signature algorithm (DSA).
6.4.1 Elliptic Diffie–Hellman Key Exchange
Alice and Bob agree to use a particular elliptic curve E(Fp) and a particular
point P ∈ E(Fp). Alice chooses a secret integer nA and Bob chooses a secret
integer nB. They compute the associated multiples
QA = nAP (Alice computes this) and QB = nBP (Bob computes this),
and they exchange the values of QA and QB. Alice then uses her secret multi-
plier to compute nAQB, and Bob similarly computes nBQA. They now have
the shared secret value
Public parameter creation
A trusted party chooses and publishes a (large) prime p,
an elliptic curve E over Fp, and a point P in E(Fp).
Private computations
Alice Bob
Chooses a secret integer nA. Chooses a secret integer nB.
Computes the point QA = nAP. Computes the point QB = nBP.
Public exchange of values
Alice sends QA to Bob −−−−−→ QA
QB ←−−−−− Bob sends QB to Alice
Further private computations
Alice Bob
Computes the point nAQB. Computes the point nBQA.
The shared secret value is nAQB = nA(nBP) = nB(nAP) = nBQA.
Table 6.5: Diffie–Hellman key exchange using elliptic curves
nAQB = (nAnB)P = nBQA,
which they can use as a key to communicate privately via a symmetric cipher.
Table 6.5 summarizes elliptic Diffie–Hellman key exchange.
Example 6.19. Alice and Bob decide to use elliptic Diffie–Hellman with the
following prime, curve, and point:
p = 3851, E : Y^2 = X^3 + 324X + 1287, P = (920, 303) ∈ E(F3851).
Alice and Bob choose respective secret values nA = 1194 and nB = 1759, and
then
Alice computes QA = 1194P = (2067, 2178) ∈ E(F3851),
Bob computes QB = 1759P = (3684, 3125) ∈ E(F3851).
Alice sends QA to Bob and Bob sends QB to Alice. Finally,
Alice computes nAQB = 1194(3684, 3125) = (3347, 1242) ∈ E(F3851),
Bob computes nBQA = 1759(2067, 2178) = (3347, 1242) ∈ E(F3851).
Bob and Alice have exchanged the secret point (3347, 1242). As will be ex-
plained in Remark 6.20, they should discard the y-coordinate and treat only
the value x = 3347 as a secret shared value.
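Example 6.19 can be replayed directly in a few lines of Python (toy affine arithmetic; the helper names are ours):

```python
# Replaying Example 6.19: p = 3851, E : Y^2 = X^3 + 324X + 1287, P = (920, 303).
p, A = 3851, 324

def ec_add(P1, P2):
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mult(n, P):
    R, Q = None, P
    while n > 0:
        if n & 1:
            R = ec_add(R, Q)
        Q, n = ec_add(Q, Q), n >> 1
    return R

P = (920, 303)
nA, nB = 1194, 1759                 # Alice's and Bob's secret multipliers
QA, QB = mult(nA, P), mult(nB, P)   # the publicly exchanged points
print(QA, QB)                       # (2067, 2178) (3684, 3125)
print(mult(nA, QB), mult(nB, QA))   # both (3347, 1242); the shared secret is x = 3347
```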
One way for Eve to discover Alice and Bob’s secret is to solve the ECDLP
nP = QA,
since if Eve can solve this problem, then she knows nA and can use it to
compute nAQB. Of course, there might be some other way for Eve to com-
pute their secret without actually solving the ECDLP. The precise problem
that Eve needs to solve is the elliptic analogue of the Diffie–Hellman problem
described on page 69.
Definition. Let E(Fp) be an elliptic curve over a finite field and let P ∈
E(Fp). The Elliptic Curve Diffie–Hellman Problem is the problem of comput-
ing the value of n1n2P from the known values of n1P and n2P.
Remark 6.20. Elliptic Diffie–Hellman key exchange requires Alice and Bob
to exchange points on an elliptic curve. A point Q in E(Fp) consists of two
coordinates Q = (xQ, yQ), where xQ and yQ are elements of the finite field Fp,
so it appears that Alice must send Bob two numbers in Fp. However, those
two numbers modulo p do not contain as much information as two arbitrary
numbers, since they are related by the formula
yQ^2 = xQ^3 + A·xQ + B in Fp.
Note that Eve knows A and B, so if she can guess the correct value of xQ,
then there are only two possible values for yQ, and in practice it is not too
hard for her to actually compute the two values of yQ.
There is thus little reason for Alice to send both coordinates of QA to Bob,
since the y-coordinate contains so little additional information. Instead, she
sends Bob only the x-coordinate of QA. Bob then computes and uses one of
the two possible y-coordinates. If he happens to choose the “correct” y, then
he is using QA, and if he chooses the “incorrect” y (which is the negative of
the correct y), then he is using −QA. In any case, Bob ends up computing
one of
±nBQA = ±(nAnB)P.
Similarly, Alice ends up computing one of ±(nAnB)P. Then Alice and Bob
use the x-coordinate as their shared secret value, since that x-coordinate is
the same regardless of which y they use.
Example 6.21. Alice and Bob decide to exchange another secret value using
the same public parameters as in Example 6.19:
p = 3851, E : Y^2 = X^3 + 324X + 1287, P = (920, 303) ∈ E(F3851).
However, this time they want to send fewer bits to one another. Alice and
Bob respectively choose new secret values nA = 2489 and nB = 2286, and as
before,
Alice computes QA = nAP = 2489(920, 303) = (593, 719) ∈ E(F3851),
Bob computes QB = nBP = 2286(920, 303) = (3681, 612) ∈ E(F3851).
However, rather than sending both coordinates, Alice sends only xA = 593 to
Bob and Bob sends only xB = 3681 to Alice.
Alice substitutes xB = 3681 into the equation for E and finds that
yB^2 = xB^3 + 324xB + 1287 = 3681^3 + 324 · 3681 + 1287 = 997.
(Recall that all calculations are performed in F3851.) Alice needs to compute a
square root of 997 modulo 3851. This is not hard to do, especially for primes
satisfying p ≡ 3 (mod 4), since Proposition 2.26 tells her that b^((p+1)/4) is a
square root of b modulo p. So Alice sets
yB = 997^((3851+1)/4) = 997^963 ≡ 612 (mod 3851).
It happens that she gets the same point QB = (xB, yB) = (3681, 612) that
Bob used, and she computes nAQB = 2489(3681, 612) = (509, 1108).
Similarly, Bob substitutes xA = 593 into the equation for E and takes a
square root,
yA^2 = xA^3 + 324xA + 1287 = 593^3 + 324 · 593 + 1287 = 927,
yA = 927^((3851+1)/4) = 927^963 ≡ 3132 (mod 3851).
Bob then uses the point Q′A = (593, 3132), which is not Alice's point QA, to
compute nBQ′A = 2286(593, 3132) = (509, 2743). Bob and Alice end up with
points that are negatives of one another in E(Fp), but that is all right, since
their shared secret value is the x-coordinate x = 509, which is the same for
both points.
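The square-root step that Alice and Bob perform is one line of code when p ≡ 3 (mod 4). A sketch (the function name is ours):

```python
p = 3851  # p = 3 (mod 4), so b^((p+1)/4) is a square root of b whenever one exists

def sqrt_mod(b):
    """Return a square root of b modulo p, or None if b is not a square."""
    r = pow(b, (p + 1) // 4, p)
    return r if r * r % p == b % p else None

print(sqrt_mod(997))  # 612  (Alice recovers Bob's y-coordinate exactly)
print(sqrt_mod(927))  # 3132 (Bob gets the "wrong" root: 3132 = -719 mod 3851)
```

Bob's root 3132 ≡ −719 (mod 3851) is the negative of Alice's actual y-coordinate 719, which is exactly why the protocol keeps only the shared x-coordinate.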
6.4.2 Elliptic Elgamal Public Key Cryptosystem
It is easy to create a direct analogue of the Elgamal public key cryptosystem
described in Sect. 2.4. Briefly, Alice and Bob agree to use a particular prime p,
elliptic curve E, and point P ∈ E(Fp). Alice chooses a secret multiplier nA
and publishes the point QA = nAP as her public key. Bob’s plaintext is a
point M ∈ E(Fp). He chooses an integer k to be his random element and
computes
C1 = kP and C2 = M + kQA.
He sends the two points (C1, C2) to Alice, who computes
C2 − nAC1 = (M + kQA) − nA(kP) = M + k(nAP) − nA(kP) = M
Public parameter creation
A trusted party chooses and publishes a (large) prime p,
an elliptic curve E over Fp, and a point P in E(Fp).
Alice Bob
Key creation
Choose a private key nA.
Compute QA = nAP in E(Fp).
Publish the public key QA.
Encryption
Choose plaintext M ∈ E(Fp).
Choose a random element k.
Use Alice’s public key QA to
compute C1 = kP ∈ E(Fp).
and C2 = M + kQA ∈ E(Fp).
Send ciphertext (C1, C2)
to Alice.
Decryption
Compute C2 − nAC1 ∈ E(Fp).
This quantity is equal to M.
Table 6.6: Elliptic Elgamal key creation, encryption, and decryption
to recover the plaintext. The elliptic Elgamal public key cryptosystem is sum-
marized in Table 6.6.
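A round trip through Table 6.6 on the toy curve of Example 6.19 is short enough to show in full; a Python sketch (the plaintext point M and the values nA and k below are our own toy choices, and the helper names are ours):

```python
# Elliptic Elgamal (Table 6.6) on E : Y^2 = X^3 + 324X + 1287 over F_3851.
p, A = 3851, 324

def ec_add(P1, P2):
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mult(n, P):
    R, Q = None, P
    while n > 0:
        if n & 1:
            R = ec_add(R, Q)
        Q, n = ec_add(Q, Q), n >> 1
    return R

def neg(P1):
    return None if P1 is None else (P1[0], -P1[1] % p)

G = (920, 303)
nA = 1194                       # Alice's private key (our toy choice)
QA = mult(nA, G)                # Alice's public key

M = mult(7, G)                  # a sample plaintext point on E (our toy choice)
k = 77                          # Bob's random element (our toy choice)
C1 = mult(k, G)                 # encryption: C1 = kP
C2 = ec_add(M, mult(k, QA))     #             C2 = M + k*QA

decrypted = ec_add(C2, neg(mult(nA, C1)))  # decryption: C2 - nA*C1
print(decrypted == M)           # True
```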
In principle, the elliptic Elgamal cryptosystem works fine, but there are
some practical difficulties.
1. There is no obvious way to attach plaintext messages to points in E(Fp).
2. The elliptic Elgamal cryptosystem has 4-to-1 message expansion, as
compared to the 2-to-1 expansion ratio of Elgamal using Fp. (See
Remark 2.9.)
The reason that elliptic Elgamal has a 4-to-1 message expansion lies in
the fact that the plaintext M is a single point in E(Fp). By Hasse’s theorem
(Theorem 6.11) there are approximately p different points in E(Fp), hence
only about p different plaintexts. However, the ciphertext (C1, C2) consists of
four numbers modulo p, since each point in E(Fp) has two coordinates.
Various methods have been proposed to solve these problems. The diffi-
culty of associating plaintexts to points can be circumvented by choosing M
randomly and using it as a mask for the actual plaintext. One such method,
which also decreases message expansion, is described in Exercise 6.17.
Another natural way to improve message expansion is to send only the x-
coordinates of C1 and C2, as was suggested for Diffie–Hellman key exchange
in Remark 6.20. Unfortunately, since Alice must compute the difference
C2 − nAC1, she needs the correct values of both the x- and y-coordinates of C1
and C2. (Note that the points C2 − nAC1 and C2 + nAC1 are quite different!)
However, the x-coordinate of a point determines the y-coordinate up to change
of sign, so Bob can send one extra bit, for example
Extra bit = 0 if 0 ≤ y < p/2, and 1 if p/2 < y < p.
(See Exercise 6.16.) In this way, Bob needs to send only the x-coordinates
of C1 and C2, plus two extra bits. This idea is sometimes referred to as point
compression.
6.4.3 Elliptic Curve Signatures
The Elliptic Curve Digital Signature Algorithm (ECDSA), which is described
in Table 6.7, is a straightforward analogue of the digital signature algorithm
(DSA) described in Table 4.3 of Sect. 4.3. ECDSA is in widespread use, es-
pecially, but not only, in situations where signature size is important. Offi-
cial specifications for implementing ECDSA are described in [6, 142]. (See
also Sect. 8.8 for an amusing real-world implementation of digital cash using
ECDSA.)
In order to prove that ECDSA works, i.e., that the verification step suc-
ceeds in verifying a valid signature, we compute
v1G + v2V = d·s2^−1·G + s1·s2^−1·(sG)
          = (d + s·s1)·s2^−1·G
          = (e·s2)·s2^−1·G
          = eG ∈ E(Fp).
Hence
x(v1G + v2V) mod q = x(eG) mod q = s1,
so the signature is accepted as valid.
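ECDSA as in Table 6.7 can be exercised end to end on the toy curve of Example 6.15, where G = (32, 53) has prime order q = 41. A sketch (the document d, signing key s, and the random seed below are our own toy choices, as are the function names):

```python
import random

# Toy ECDSA (Table 6.7): E : Y^2 = X^3 + 8X + 7 over F_73, G = (32, 53), order q = 41.
p, A, q = 73, 8, 41

def ec_add(P1, P2):
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mult(n, P):
    R, Q = None, P
    while n > 0:
        if n & 1:
            R = ec_add(R, Q)
        Q, n = ec_add(Q, Q), n >> 1
    return R

G = (32, 53)

def sign(d, s, rng):
    """Sign document d with key s; retry until s1 and s2 are nonzero."""
    while True:
        e = rng.randrange(1, q)              # random ephemeral element
        s1 = mult(e, G)[0] % q               # s1 = x(eG) mod q
        if s1 == 0:
            continue
        s2 = (d + s * s1) * pow(e, -1, q) % q
        if s2 != 0:
            return s1, s2

def verify(d, V, sig):
    s1, s2 = sig
    v1 = d * pow(s2, -1, q) % q
    v2 = s1 * pow(s2, -1, q) % q
    R = ec_add(mult(v1, G), mult(v2, V))     # should equal eG
    return R is not None and R[0] % q == s1

s = 7                      # Samantha's secret signing key (our toy choice)
V = mult(s, G)             # her public verification key
sig = sign(20, s, random.Random(2))
print(verify(20, V, sig))  # True
```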
6.5 The Evolution of Public Key Cryptography
The invention of RSA in the late 1970s catapulted the problem of factoring
large integers into prominence, leading to improved factorization methods
such as the quadratic and number field sieves described in Sect. 3.7. In 1984,
Hendrik Lenstra Jr. circulated a manuscript describing a new factorization
method using elliptic curves. Lenstra’s algorithm [75], which we describe in
Sect. 6.6, is an elliptic analogue of Pollard’s p − 1 factorization algorithm
Public parameter creation
A trusted party chooses a finite field Fp, an elliptic curve E/Fp,
and a point G ∈ E(Fp) of large prime order q.
Samantha Victor
Key creation
Choose secret signing key
1 < s < q − 1.
Compute V = sG ∈ E(Fp).
Publish the verification key V .
Signing
Choose document d mod q.
Choose random element e mod q.
Compute eG ∈ E(Fp) and then,
s1 = x(eG) mod q and
s2 ≡ (d + ss1)e^−1 (mod q).
Publish the signature (s1, s2).
Verification
Compute v1 ≡ d·s2^−1 (mod q) and v2 ≡ s1·s2^−1 (mod q).
Compute v1G + v2V ∈ E(Fp) and verify that
x(v1G + v2V) mod q = s1.
Table 6.7: The elliptic curve digital signature algorithm (ECDSA)
(Sect. 3.5) and exploits the fact that the number of points in E(Fp) varies as
one chooses different elliptic curves. Although less efficient than sieve methods
for the factorization problems that occur in cryptography, Lenstra’s algorithm
helped introduce elliptic curves to the cryptographic community.
The importance of factorization algorithms for cryptography is that they
are used to break RSA and other similar cryptosystems. In 1985, Neal Koblitz
and Victor Miller independently proposed using elliptic curves to create cryp-
tosystems. They suggested that the elliptic curve discrete logarithm problem
might be more difficult than the classical discrete logarithm problem mod-
ulo p. Thus Diffie–Hellman key exchange and the Elgamal public key cryp-
tosystem, implemented using elliptic curves as described in Sect. 6.4, might
require smaller keys and run more efficiently than RSA because one could use
smaller numbers.
Koblitz [67] and Miller [88] each published their ideas as academic papers,
but neither of them pursued the commercial aspects of elliptic curve cryptog-
raphy. Indeed, at the time, there was virtually no research on the ECDLP,
so it was difficult to say with any confidence that the ECDLP was indeed
significantly more difficult than the classical DLP. However, the potential of
what became known as elliptic curve cryptography (ECC) was noted by Scott
Vanstone and Ron Mullin, who had started a cryptographic company called
Certicom in 1985. They joined with other researchers in both academia and
the business world to promote ECC as an alternative to RSA and Elgamal.
All was not smooth sailing. For example, during the late 1980s, various
cryptographers proposed using so-called supersingular elliptic curves for added
efficiency, but in 1990, the MOV algorithm (see Sect. 6.9.1) showed that su-
persingular curves are vulnerable to attack. Some saw this as an indictment of
ECC as a whole, while others pointed out that RSA also has weak instances
that must be avoided, e.g., RSA must avoid using numbers that can be easily
factored by Pollard’s p − 1 method.
The purely mathematical question of whether ECC provided a secure and
efficient alternative to RSA was clouded by the fact that there were com-
mercial and financial issues at stake. In order to be commercially successful,
cryptographic methods must be standardized for use in areas such as commu-
nications and banking. RSA had the initial lead, since it was invented first,
but RSA was patented, and some companies resisted the idea that standards
approved by trade groups or government bodies should mandate the use of
a patented technology. Elgamal, after it was invented in 1985, provided a
royalty-free alternative, so many standards specified Elgamal as an alterna-
tive to RSA. In the meantime, ECC was growing in stature, but even as late
as 1997, more than a decade after its introduction, leading experts indicated
their doubts about the security of ECC.7
A major dilemma pervading the field of cryptography is that no one knows
the actual difficulty of the supposedly hard problems on which it is based.
Currently, the security of public key cryptosystems depends on the percep-
tion and consensus of experts as to the difficulty of problems such as integer
factorization and discrete logarithms. All that can be said is that “such-and-
such a problem has been extensively studied for N years, and here is the
fastest known method for solving it.” Proponents of factorization-based cryp-
tosystems point to the fact that, in some sense, people have been trying to
factor numbers since antiquity; but in truth, the modern theory of factor-
ization requires high-speed computing devices and barely predates the inven-
tion of RSA. Serious study of the elliptic curve discrete logarithm problem
started in the late 1980s, so modern factorization methods have a 10–15 year
head start on ECDLP. In Chap. 7 we will describe public key cryptosystems
(NTRU, GGH) whose security is based on certain hard problems in the the-
ory of lattices. Lattices have been extensively investigated since the nineteenth
century, but again the invention and analysis of modern computational algo-
rithms is much more recent, having been initiated by fundamental work of
7 In 1997, the RSA corporation posted the following quote by RSA co-inventor Ron
Rivest on its website: “But the security of cryptosystems based on elliptic curves is not well
understood, due in large part to the abstruse nature of elliptic curves. . . .
Over time, this may change, but for now trying to get an evaluation of the security of
an elliptic-curve cryptosystem is a bit like trying to get an evaluation of some recently
discovered Chaldean poetry. Until elliptic curves have been further studied and evaluated,
I would advise against fielding any large-scale applications based on them.”
Lenstra, Lenstra, and Lovász in the early 1980s. Lattices appeared as a tool
for cryptanalysis during the 1980s and as a means of creating cryptosystems
in the 1990s.
RSA, the first public key cryptosystem, was patented by its inventors.
The issue of patents in cryptography is fraught with controversy. One might
argue that the RSA patent, which ran from 1983 to 2000, set back the use
of cryptography by requiring users to pay licensing fees. However, it is also
true that in order to build a company, an inventor needs investors willing to
risk their money, and it is much easier to raise funds if there is an exclusive
product to offer. Further, the fact that RSA was originally the “only game
in town” meant that it automatically received extensive scrutiny from the
academic community, which helped to validate its security.
The invention and eventual commercial implementation of ECC followed a
different path. Since neither Koblitz nor Miller applied for a patent, the basic
underlying idea of ECC became freely available for all to use. This led Cer-
ticom and other companies to apply for patents giving improvements to the
basic ECC idea. Some of these improvements were based on significant new
research ideas, while others were less innovative and might almost be char-
acterized as routine homework problems.8
Unfortunately, the United States
Patents and Trademark Office (USPTO) does not have the expertise to effec-
tively evaluate the flood of cryptographic patent applications that it receives.
The result has been a significant amount of uncertainty in the marketplace as
to which versions of ECC are free and which require licenses, even assuming
that all of the issued patents can withstand a legal challenge.
6.6 Lenstra’s Elliptic Curve Factorization
Algorithm
Pollard’s p − 1 factorization method, which we discussed in Sect. 3.5, finds
factors of N = pq by searching for a power a^L with the property that
a^L ≡ 1 (mod p) and a^L ≢ 1 (mod q).
Fermat’s little theorem tells us that this is likely to work if p − 1 divides L
and q − 1 does not divide L. So what we do is to take L = n! for some moderate
value of n. Then we hope that p − 1 or q − 1, but not both, is a product of
small primes, hence divides n!. Clearly Pollard’s method works well for some
numbers, but not for all numbers. The determining factor is whether p − 1
or q − 1 is a product of small primes.
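Before we replace it with elliptic curve arithmetic, the p − 1 method itself fits in a few lines. A Python sketch (the base a = 3 and the bound max_n are our own choices):

```python
from math import gcd

def pollard_p_minus_1(N, a=3, max_n=100):
    """Pollard's p-1 method with L = n! for growing n (a sketch)."""
    for n in range(2, max_n + 1):
        a = pow(a, n, N)          # now a = (base)^(n!) mod N
        d = gcd(a - 1, N)
        if 1 < d < N:
            return d              # p - 1 divides n! but q - 1 does not
        if d == N:
            return None           # both primes detected at once; retry another base
    return None

print(pollard_p_minus_1(187))     # 11, since p = 11 has p - 1 = 10, which divides 5! = 120
```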
What is it about the quantity p − 1 that makes it so important for Pollard’s
method? The answer lies in Fermat’s little theorem. Intrinsically, p − 1 is
8 For example, at the end of Sect. 6.4.2 we described how to save bandwidth in elliptic
Elgamal by sending the x-coordinate and one additional bit to specify the y-coordinate.
This idea is called “point compression” and is covered by US Patent 6,141,420.
important because there are p − 1 elements in F∗p, so every element α of F∗p
satisfies α^(p−1) = 1. Now consider that last statement as it relates to the theme
of this chapter, which is that the points and the addition law for an elliptic
curve E(Fp) are very much analogous to the elements and the multiplication
law for F∗p. Hendrik Lenstra [75] made this analogy precise by devising a
factorization algorithm that uses the group law on an elliptic curve E in place
of multiplication modulo N.
In order to describe Lenstra’s algorithm, we need to work with an elliptic
curve modulo N, where the integer N is not prime, so the ring Z/NZ is not
a field. However, suppose that we start with an equation
E : Y^2 = X^3 + AX + B
and suppose that P = (a, b) is a point on E modulo N, by which we mean
that
b^2 ≡ a^3 + A · a + B (mod N).
Then we can apply the elliptic curve addition algorithm (Theorem 6.6) to
compute 2P, 3P, 4P, . . ., since the only operations required by that algorithm
are addition, subtraction, multiplication, and division (by numbers relatively
prime to N).
Example 6.22. Let N = 187 and consider the elliptic curve
E : Y^2 = X^3 + 3X + 7
modulo 187 and the point P = (38, 112), which is on E modulo 187. In order
to compute 2P mod 187, we follow the elliptic curve addition algorithm and
compute
1/(2y(P)) = 1/224 ≡ 91 (mod 187),
λ = (3x(P)^2 + A)/(2y(P)) = 4335/224 ≡ 34 · 91 ≡ 102 (mod 187),
x(2P) = λ^2 − 2x(P) = 10328 ≡ 43 (mod 187),
y(2P) = λ(x(P) − x(2P)) − y(P) = 102(38 − 43) − 112 ≡ 126 (mod 187).
Thus 2P = (43, 126) as a point on the curve E modulo 187.
For clarity, we have written x(P) and y(P) for the x- and y-coordinates
of P, and similarly for 2P. Also, during the calculation we needed to find the
reciprocal of 224 modulo 187, i.e., we needed to solve the congruence
224d ≡ 1 (mod 187).
This was easily accomplished using the extended Euclidean algorithm (The-
orem 1.11; see also Remark 1.15 and Exercise 1.12), since it turns out that
gcd(224, 187) = 1.
We next compute 3P = 2P + P in a similar fashion. In this case, we are
adding distinct points, so the formula for λ is different, but the computation
is virtually the same:
1/(x(2P) − x(P)) = 1/5 ≡ 75 (mod 187),
λ = (y(2P) − y(P))/(x(2P) − x(P)) = 14/5 ≡ 14 · 75 ≡ 115 (mod 187),
x(3P) = λ^2 − x(2P) − x(P) = 13144 ≡ 54 (mod 187),
y(3P) = λ(x(P) − x(3P)) − y(P) = 115(38 − 54) − 112 ≡ 105 (mod 187).
Thus 3P = (54, 105) on the curve E modulo 187. Again we needed to compute
a reciprocal, in this case, the reciprocal of 5 modulo 187. We leave it to you to
continue the calculations. For example, it is instructive to check that P + 3P
and 2P + 2P give the same answer, namely 4P = (93, 64).
Example 6.23. Continuing with Example 6.22, we attempt to compute 5P for
the point P = (38, 112) on the elliptic curve
E : Y^2 = X^3 + 3X + 7 modulo 187.
We already computed 2P = (43, 126) and 3P = (54, 105). The first step in
computing 5P = 3P + 2P is to compute the reciprocal of
x(3P) − x(2P) = 54 − 43 = 11 modulo 187.
However, when we apply the extended Euclidean algorithm to 11 and 187, we
find that gcd(11, 187) = 11, so 11 does not have a reciprocal modulo 187.
It seems that we have hit a dead end, but in fact, we have struck it rich!
Notice that since the quantity gcd(11, 187) is greater than 1, it gives us a
divisor of 187. So our failure to compute 5P also tells us that 11 divides 187,
which allows us to factor 187 as 187 = 11 · 17. This idea underlies Lenstra’s
elliptic curve factorization algorithm.
We examine more closely why we were not able to compute 5P modulo 187.
If we instead look at the elliptic curve E modulo 11, then a quick computation
shows that the point
P = (38, 112) ≡ (5, 2) (mod 11) satisfies 5P = O in E(F11).
This means that when we attempt to compute 5P modulo 11, we end up with
the point O at infinity, so at some stage of the calculation we have tried to
divide by zero. But here “zero” means zero in F11, so we actually end up
trying to find the reciprocal modulo 11 of some integer that is divisible by 11.
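The computations in Examples 6.22 and 6.23 are easy to reproduce in code. The following sketch (ours, not the book's) implements the addition algorithm on E modulo N = 187 and raises the offending gcd when a denominator fails to be invertible:

```python
from math import gcd

# A sketch (ours) of Examples 6.22 and 6.23: the addition algorithm on
# E: Y^2 = X^3 + 3X + 7 modulo N = 187, where a failed division hands us
# a nontrivial gcd.

N, A = 187, 3

def inv_mod(d, n):
    g = gcd(d % n, n)
    if g != 1:
        raise ValueError(g)              # d is not invertible; g divides n
    return pow(d % n, -1, n)             # Python 3.8+: modular inverse

def ec_add(P, Q):
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        lam = (3 * x1 * x1 + A) * inv_mod(2 * y1, N) % N
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, N) % N
    x3 = (lam * lam - x1 - x2) % N
    return (x3, (lam * (x1 - x3) - y1) % N)

P  = (38, 112)
P2 = ec_add(P, P)                        # 2P = (43, 126)
P3 = ec_add(P2, P)                       # 3P = (54, 105)
P4 = ec_add(P2, P2)                      # 4P = (93, 64)
try:
    ec_add(P3, P2)                       # 5P needs 1/11 mod 187 -- impossible
except ValueError as e:
    factor = e.args[0]
    print(factor, 187 // factor)         # 11 17
```

Running the sketch reproduces 2P, 3P, and 4P from Example 6.22, and the attempted computation of 5P = 3P + 2P surfaces the factor 11 of 187 exactly as in Example 6.23.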
Following the lead from Examples 6.22 and 6.23, we replace multiplication
modulo N in Pollard’s factorization method with addition modulo N on an
elliptic curve. We start with an elliptic curve E and a point P on E modulo N
and we compute
6.6. Lenstra’s Elliptic Curve Factorization Algorithm 327
2! · P, 3! · P, 4! · P, 5! · P, . . . (mod N).
Notice that once we have computed Q = (n − 1)! · P, it is easy to com-
pute n! · P, since it equals nQ. At each stage, there are three things that
may happen. First, we may be able to compute n! · P. Second, during the
computation we may need to find the reciprocal of a number d that is a
multiple of N, which would not be helpful, but luckily this situation is quite
unlikely to occur. Third, we may need to find the reciprocal of a number d
that satisfies 1 < gcd(d, N) < N, in which case the computation of n! · P fails,
but gcd(d, N) is a nontrivial factor of N, so we are happy.
Input. Integer N to be factored.
1. Choose random values A, a, and b modulo N.
2. Set P = (a, b) and B ≡ b^2 − a^3 − A · a (mod N).
Let E be the elliptic curve E : Y^2 = X^3 + AX + B.
3. Loop j = 2, 3, 4, . . . up to a specified bound.
4. Compute Q ≡ jP (mod N) and set P = Q.
5. If computation in Step 4 fails,
then we have found a d > 1 with d | N.
6. If d < N, then success, return d.
7. If d = N, go to Step 1 and choose a new curve and point.
8. Increment j and loop again at Step 4.
Table 6.8: Lenstra’s elliptic curve factorization algorithm
This completes the description of Lenstra’s elliptic curve factorization al-
gorithm, other than the minor problem of finding an initial point P on an
elliptic curve E modulo N. The obvious method is to fix an equation for the
curve E, plug in values of X, and check whether the quantity X^3 + AX + B is
a square modulo N. Unfortunately, this is difficult to do unless we know how
to factor N. The solution to this dilemma is to first choose the point P = (a, b)
at random, second choose a random value for A, and third set
B ≡ b^2 − a^3 − A · a (mod N).
Then the point P is automatically on the curve E : Y^2 = X^3 + AX + B
modulo N. Lenstra's algorithm is summarized in Table 6.8.
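Table 6.8 translates almost line for line into code. The sketch below is our illustration, not the book's code; it follows the table, treating a failed inversion as the signal (Step 5) that a factor has been found:

```python
from math import gcd
from random import randrange

# A sketch of Table 6.8 (ours).  Divisions are done modulo the composite N;
# a denominator with 1 < gcd(d, N) < N aborts the computation and hands back
# the factor, exactly as in Steps 5-6.

class FoundFactor(Exception):
    pass

def ec_add_mod(P, Q, A, N):
    """Addition law of Theorem 6.6, computed modulo N; None plays the role of O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % N == 0:
        return None                              # P + (-P) = O
    num, den = (3*x1*x1 + A, 2*y1) if P == Q else (y2 - y1, x2 - x1)
    g = gcd(den % N, N)
    if g != 1:
        raise FoundFactor(g)
    lam = num * pow(den % N, -1, N) % N
    x3 = (lam*lam - x1 - x2) % N
    return (x3, (lam*(x1 - x3) - y1) % N)

def ec_mul_mod(j, P, A, N):
    """jP by double-and-add (Sect. 6.3.1), modulo N."""
    R = None
    while j:
        if j & 1:
            R = ec_add_mod(R, P, A, N)
        P = ec_add_mod(P, P, A, N)
        j >>= 1
    return R

def lenstra(N, bound=100):
    while True:                                  # Step 7: new curve on failure
        a, b, A = randrange(N), randrange(N), randrange(N)   # Steps 1-2
        P = (a, b)            # B = b^2 - a^3 - A*a is implicit: the addition
        try:                  # formulas never use B
            for j in range(2, bound):            # Steps 3-4: P becomes j! * P
                P = ec_mul_mod(j, P, A, N)
                if P is None:
                    break                        # reached O modulo N itself; retry
        except FoundFactor as e:
            if e.args[0] < N:                    # Step 6
                return e.args[0]

d = lenstra(6887)
print(sorted([d, 6887 // d]))                    # [71, 97]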
Example 6.24. We illustrate Lenstra’s algorithm by factoring N = 6887. We
begin by randomly selecting a point P = (1512, 3166) and a number A = 14
and computing
B ≡ 3166^2 − 1512^3 − 14 · 1512 ≡ 19 (mod 6887).
We let E be the elliptic curve
E : Y^2 = X^3 + 14X + 19,
so by construction, the point P is automatically on E modulo 6887. Now we
start computing multiples of P modulo 6887. First we find that
2P ≡ (3466, 2996) (mod 6887).
Next we compute
3! · P = 3 · (2P) = 3 · (3466, 2996) ≡ (3067, 396) (mod 6887).
n n! · P mod 6887
1 P = (1512, 3166)
2 2! · P = (3466, 2996)
3 3! · P = (3067, 396)
4 4! · P = (6507, 2654)
5 5! · P = (2783, 6278)
6 6! · P = (6141, 5581)
Table 6.9: Multiples of P = (1512, 3166) on Y^2 ≡ X^3 + 14X + 19 (mod 6887)
And so on. The values up to 6! · P are listed in Table 6.9. These values are
not, in and of themselves, interesting. It is only when we try, and fail, to
compute 7! · P, that something interesting happens.
From Table 6.9 we read off the value of Q = 6! · P = (6141, 5581), and we
want to compute 7Q. First we compute
2Q ≡ (5380, 174) (mod 6887),
4Q ≡ 2 · 2Q ≡ (203, 2038) (mod 6887).
Then we compute 7Q as
7Q ≡ (Q + 2Q) + 4Q (mod 6887)
   ≡ ((6141, 5581) + (5380, 174)) + (203, 2038) (mod 6887)
   ≡ (984, 589) + (203, 2038) (mod 6887).
When we attempt to perform the final step, we need to compute the reciprocal
of 203 − 984 modulo 6887, but we find that
gcd(203 − 984, 6887) = gcd(−781, 6887) = 71.
Thus we have discovered a nontrivial divisor of 6887, namely 71, which gives
the factorization 6887 = 71 · 97.
It turns out that in E(F71), the point P satisfies 63P ≡ O (mod 71), while
in E(F97), the point P satisfies 107P ≡ O (mod 97). The reason that we suc-
ceeded in factoring 6887 using 7! · P, but not with a smaller multiple of P, is
precisely because 7! is the smallest factorial that is divisible by 63.
Remark 6.25. In Sect. 3.7 we discussed the speed of sieve factorization meth-
ods and saw that the average running time of the quadratic sieve to factor a
composite number N is approximately
O( e^√((log N)(log log N)) ) steps.   (6.7)
Notice that the running time depends on the size of the integer N.
On the other hand, the most naive possible factorization method, namely
trying each possible divisor 2, 3, 4, 5, . . ., has a running time that depends on
the smallest prime factor of N. More precisely, this trial division algorithm
takes exactly p steps, where p is the smallest prime factor of N. If it happens
that N = pq with p and q approximately the same size, then the running time
is approximately √N, which is much slower than sieve methods; but if N
happens to have a very small prime factor, trial division may be helpful in
finding it.
It is an interesting and useful property of the elliptic curve factorization
algorithm that its expected running time depends on the smallest prime factor
of N, rather than on N itself. (See Exercise 5.44 for another, albeit slower,
factorization algorithm with this property.) More precisely, if p is the smallest
factor of N, then the elliptic curve factorization algorithm has average running
time approximately
O( e^√(2(log p)(log log p)) ) steps.   (6.8)
If N = pq is a product of two primes with p ≈ q, the running times
in (6.7) and (6.8) are approximately equal, and then the fact that a sieve step is
much faster than an elliptic curve step makes sieve methods faster in practice.
However, the elliptic curve method is quite useful for finding moderately large
factors of extremely large numbers, because its running time depends on the
smallest prime factor.
6.7 Elliptic Curves over F2 and over F2k
Computers speak binary, so they are especially well suited to doing calcu-
lations modulo 2. This suggests that it might be more efficient to use ellip-
tic curves modulo 2. Unfortunately, if E is an elliptic curve defined over F2,
then E(F2) contains at most 5 points, so E(F2) is not useful for cryptographic
purposes.
However, there are other finite fields in which 2 = 0. These are the
fields F2k containing 2^k elements. Recall from Sect. 2.10.4 that for every prime
power p^k there exists a field Fpk with p^k elements; and further, up to
relabeling the elements, there is exactly one such field. So we can take an elliptic
curve whose Weierstrass equation has coefficients in a field Fpk and look at
the group of points on that curve having coordinates in Fpk . Hasse’s theorem
(Theorem 6.11) is true in this more general setting.
Theorem 6.26 (Hasse). Let E be an elliptic curve over Fpk . Then
#E(Fpk ) = p^k + 1 − t_{p^k} with t_{p^k} satisfying |t_{p^k}| ≤ 2p^{k/2}.
Example 6.27. We work with the field
F9 = {a + bi : a, b ∈ F3}, where i^2 = −1.
(See Example 2.58 for a discussion of Fp2 for primes p ≡ 3 (mod 4).) Let E
be the elliptic curve over F9 defined by the equation
E : Y^2 = X^3 + (1 + i)X + (2 + i).
By trial and error we find that there are 10 points in E(F9),
(2i, 1 + 2i), (2i, 2 + i), (1 + i, 1 + i), (1 + i, 2 + 2i), (2, 0),
(2 + i, i), (2 + i, 2i), (2 + 2i, 1), (2 + 2i, 2), O.
Points can be doubled or added to one another using the formulas for the
addition of points, always keeping in mind that i2
= −1 and that we are
working modulo 3. For example, you can check that
(2, 0) + (2 + i, 2i) = (2i, 1 + 2i) and 2(1 + i, 2 + 2i) = (2 + i, i).
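These F9 computations can be checked mechanically. In the sketch below (ours, not the book's), an element a + bi of F9 is stored as a pair (a, b) of integers modulo 3; for brevity the code omits the point O and the case P = −Q, which the two checks do not need:

```python
# A sketch (ours) of the F_9 arithmetic in Example 6.27.  An element a + bi
# is stored as a pair (a, b) with a, b in F_3 and i^2 = -1.

def f9_add(u, v): return ((u[0] + v[0]) % 3, (u[1] + v[1]) % 3)
def f9_sub(u, v): return ((u[0] - v[0]) % 3, (u[1] - v[1]) % 3)

def f9_mul(u, v):                  # (a+bi)(c+di) = (ac-bd) + (ad+bc)i
    (a, b), (c, d) = u, v
    return ((a*c - b*d) % 3, (a*d + b*c) % 3)

def f9_inv(u):                     # brute force: the field has only 9 elements
    return next(w for w in [(a, b) for a in range(3) for b in range(3)]
                if f9_mul(u, w) == (1, 0))

A = (1, 1)                         # the coefficient 1 + i of E

def ec_add(P, Q):
    if P == Q:                     # tangent slope (3x^2 + A)/(2y); 3x^2 = 0
        lam = f9_mul(A, f9_inv(f9_mul((2, 0), P[1])))   # in characteristic 3
    else:
        lam = f9_mul(f9_sub(Q[1], P[1]), f9_inv(f9_sub(Q[0], P[0])))
    x3 = f9_sub(f9_sub(f9_mul(lam, lam), P[0]), Q[0])
    return (x3, f9_sub(f9_mul(lam, f9_sub(P[0], x3)), P[1]))

# (2, 0) + (2+i, 2i) = (2i, 1+2i)  and  2(1+i, 2+2i) = (2+i, i)
print(ec_add(((2, 0), (0, 0)), ((2, 1), (0, 2))))   # ((0, 2), (1, 2))
print(ec_add(((1, 1), (2, 2)), ((1, 1), (2, 2))))   # ((2, 1), (0, 1))
```

The two printed points are (2i, 1 + 2i) and (2 + i, i), confirming the additions stated in the example.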
Our goal is to use elliptic curves over F2k for cryptography, but there is
one difficulty that we must first address. The problem is that we cheated a
little bit when we defined an elliptic curve as a curve given by a Weierstrass
equation Y^2 = X^3 + AX + B satisfying Δ = 4A^3 + 27B^2 ≠ 0. In fact, the
correct definition of the discriminant Δ is
Δ = −16(4A^3 + 27B^2).
As long as we work in a field where 2 ≠ 0, then the condition Δ ≠ 0 is
the same with either definition, but for fields such as F2k where 2 = 0, we
have Δ = 0 for every standard Weierstrass equation. The solution is to enlarge
the collection of allowable Weierstrass equations.
Definition. An elliptic curve E is the set of solutions to a generalized Weier-
strass equation
E : Y^2 + a_1XY + a_3Y = X^3 + a_2X^2 + a_4X + a_6,
together with an extra point O. The coefficients a_1, . . . , a_6 are required to
satisfy Δ ≠ 0, where the discriminant Δ is defined in terms of certain
quantities b_2, b_4, b_6, b_8 as follows:
b_2 = a_1^2 + 4a_2,  b_4 = 2a_4 + a_1a_3,  b_6 = a_3^2 + 4a_6,
b_8 = a_1^2 a_6 + 4a_2a_6 − a_1a_3a_4 + a_2a_3^2 − a_4^2,
Δ = −b_2^2 b_8 − 8b_4^3 − 27b_6^2 + 9b_2b_4b_6.
(Although these formulas look complicated, they are easy enough to compute,
and the condition Δ ≠ 0 is exactly what is required to ensure that the curve E
is nonsingular.)
The geometric definition of the addition law on E is similar to our earlier
definition, the only change being that the old reflection step (x, y) → (x, −y)
is replaced by the slightly more complicated reflection step
(x, y) −→ (x, −y − a_1x − a_3).
This is also the formula for the negative of a point.
Working with generalized Weierstrass equations, it is not hard to derive
an addition algorithm similar to the algorithm described in Theorem 6.6; see
Exercise 6.22 for details. For example, if P1 = (x1, y1) and P2 = (x2, y2) are
points with P1 ≠ ±P2, then the x-coordinate of their sum is given by
x(P1 + P2) = λ^2 + a_1λ − a_2 − x_1 − x_2,  with  λ = (y_2 − y_1)/(x_2 − x_1).
Similarly, the x-coordinate of twice a point P = (x, y) is given by the dupli-
cation formula
x(2P) = (x^4 − b_4x^2 − 2b_6x − b_8) / (4x^3 + b_2x^2 + 4b_4x + b_6).
Example 6.28. The polynomial T^3 + T + 1 is irreducible in F2[T], so as
explained in Sect. 2.10.4, the quotient ring F2[T]/(T^3 + T + 1) is a field F8 with
eight elements. Every element in F8 can be represented by an expression of
the form
a + bT + cT^2 with a, b, c ∈ F2,
with the understanding that when we multiply two elements, we divide the
product by T^3 + T + 1 and take the remainder.
Now consider the elliptic curve E defined over the field F8 by the general-
ized Weierstrass equation
E : Y^2 + (1 + T)Y = X^3 + (1 + T^2)X + T.
The discriminant of E is Δ = 1 + T + T^2. There are nine points in E(F8),
(0, T), (0, 1), (T, 0), (T, 1 + T), (1 + T, T),
(1 + T, 1), (1 + T^2, T + T^2), (1 + T^2, 1 + T^2), O.
Using the group law described in Exercise 6.22, we can add and double points,
for example
(1 + T^2, T + T^2) + (1 + T, T) = (1 + T^2, 1 + T^2) and 2(T, 1 + T) = (T, 0).
There are some computational advantages to working with elliptic curves
defined over F2k , rather than over Fp. We already mentioned the first: the
binary nature of computers tends to make them operate more efficiently in
situations in which 2 = 0. A second advantage is the option to take k composite, in
which case F2k contains other finite fields intermediate between F2 and F2k .
(The precise statement is that F2j is a subfield of F2k if and only if j | k.)
These intermediate fields can sometimes be used to speed up computations,
but there are also situations in which they cause security problems. So as is
often the case, increased efficiency may come at the cost of decreased security;
to avoid potential problems, it is often safest to use fields F2k with k prime.
The third, and most important, advantage of working over F2k lies in a
suggestion of Neal Koblitz to use an elliptic curve E over F2, while taking
points on E with coordinates in F2k . As we now explain, this allows the use
of the Frobenius map instead of the doubling map and leads to a significant
gain in efficiency.
Definition. The (p-power) Frobenius map τ is the map from the field Fpk to
itself defined by the simple rule
τ : Fpk −→ Fpk ,  α −→ α^p.
The Frobenius map has the surprising property that it preserves addition
and multiplication,9
τ(α + β) = τ(α) + τ(β) and τ(α · β) = τ(α) · τ(β).
The multiplication rule is obvious, since
τ(α · β) = (α · β)^p = α^p · β^p = τ(α) · τ(β).
In general, the addition rule is a consequence of the binomial theorem (see
Exercise 6.24). For p = 2, which is what we will need, the proof is easy,
τ(α + β) = (α + β)^2 = α^2 + 2α · β + β^2 = α^2 + β^2 = τ(α) + τ(β),
where we have used the fact that 2 = 0 in F2k . We also note that τ(α) = α
for every α ∈ F2, which is clear, since F2 = {0, 1}.
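For p = 2 this additivity is easy to confirm exhaustively on a small field. The sketch below (ours) checks it for every pair of elements of F8, encoded as 3-bit integers:

```python
# Exhaustive check (ours) that the Frobenius map tau(a) = a^2 respects addition
# in F_8 = F_2[T]/(T^3 + T + 1).  Bit i of an element is the coefficient of T^i;
# addition is XOR, multiplication is carry-less and reduced mod T^3 + T + 1.

def f8_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011                  # reduce by T^3 + T + 1
        b >>= 1
    return r

tau = lambda a: f8_mul(a, a)             # the 2-power Frobenius

ok = all(tau(a ^ b) == tau(a) ^ tau(b) for a in range(8) for b in range(8))
print(ok)   # True
```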
Now let E be an elliptic curve defined over F2, i.e., given by a generalized
Weierstrass equation with coefficients in F2, and let P = (x, y) ∈ E(F2k ) be a
point on E with coordinates in some larger field F2k . We define a Frobenius
map on points in E(F2k ) by applying τ to each coordinate,
τ(P) = (τ(x), τ(y)).   (6.9)
We are going to show that the map τ has some nice properties. For example,
we claim that
τ(P) ∈ E(F2k ). (6.10)
9In mathematical terminology, the Frobenius map τ is a field automorphism of Fpk .
It also fixes Fp. One can show that the Galois group of Fpk /Fp is cyclic of order k and is
generated by τ.
Further, if P, Q ∈ E(F2k ), then we claim that
τ(P + Q) = τ(P) + τ(Q). (6.11)
In other words, τ maps E(F2k ) to itself, and it respects the addition law.
(In mathematical terminology, the Frobenius map is a group homomorphism
of E(F2k ) to itself.)
It is easy to check (6.10). We are given that P = (x, y) ∈ E(F2k ), so
y^2 + a_1xy + a_3y − x^3 − a_2x^2 − a_4x − a_6 = 0.
Applying τ to both sides and using the fact that τ respects addition and
multiplication in F2k , we find that
τ(y)^2 + τ(a_1)τ(x)τ(y) + τ(a_3)τ(y) − τ(x)^3 − τ(a_2)τ(x)^2 − τ(a_4)τ(x) − τ(a_6) = 0.
By assumption, the Weierstrass equation has coefficients in F2, and we know
that τ fixes elements of F2, so
τ(y)^2 + a_1τ(x)τ(y) + a_3τ(y) − τ(x)^3 − a_2τ(x)^2 − a_4τ(x) − a_6 = 0.
Hence τ(P) = (τ(x), τ(y)) is a point of E(F2k ).
A similar computation, which we omit, shows that (6.11) is true. The
key fact is that the addition law on E requires only addition, subtraction,
multiplication, and division of the coordinates of points and the coefficients
of the Weierstrass equation.
Our next result shows that the Frobenius map is closely related to the
number of points in E(Fp).
Theorem 6.29. Let E be an elliptic curve over Fp and let
t = p + 1 − #E(Fp).
Notice that Hasse's theorem (Theorem 6.11) says that |t| ≤ 2√p.
(a) Let α and β be the complex roots of the quadratic polynomial Z^2 − tZ + p.
Then |α| = |β| = √p, and for every k ≥ 1 we have
#E(Fpk ) = p^k + 1 − α^k − β^k.
(b) Let
τ : E(Fpk ) −→ E(Fpk ),  (x, y) −→ (x^p, y^p),
be the Frobenius map. Then for every point Q ∈ E(Fpk ) we have
τ^2(Q) − t · τ(Q) + p · Q = O,
where τ^2(Q) denotes the composition τ(τ(Q)).
Proof. The proof requires more tools than we have at our disposal; see for
example [136, V §2] or [147].
Recall from Sect. 6.3.1 that to compute a multiple nP of a point P, we first
expressed n as a sum of powers of 2 and then used a double-and-add method to
compute nP. For random values of n, this required approximately log n dou-
blings and 1
2 log n additions. A refinement of this method using both positive
and negative powers of 2 reduces the time to approximately log n doublings
and 1
3 log n additions. Notice that the number of doublings remains at log n.
Koblitz’s idea is to replace the doubling map with the Frobenius map. This
leads to a large savings, because it takes much less time to compute τ(P)
than it does to compute 2P. The key to the approach is Theorem 6.29, which
tells us that the action of the Frobenius map on E(F2k ) satisfies a quadratic
equation.
Definition. A Koblitz curve is an elliptic curve defined over F2 by an equation
of the form
Ea : Y^2 + XY = X^3 + aX^2 + 1
with a ∈ {0, 1}. The discriminant of Ea is Δ = 1.
For concreteness we restrict attention to the curve
E0 : Y^2 + XY = X^3 + 1.
It is easy to check that
E0(F2) = { (0, 1), (1, 0), (1, 1), O },
so #E0(F2) = 4 and
t = 2 + 1 − #E0(F2) = −1.
To apply Theorem 6.29, we use the quadratic formula to find the roots of the
polynomial Z^2 + Z + 2. The roots are
(−1 + √−7)/2  and  (−1 − √−7)/2.
Then Theorem 6.29(a) tells us that
#E0(F2k ) = 2^k + 1 − ((−1 + √−7)/2)^k − ((−1 − √−7)/2)^k.   (6.12)
This formula easily allows us to compute the number of points in E0(F2k ),
even for very large values of k. For example,
#E0(F_{2^97}) = 158456325028528296935114828764.
(See also Exercise 6.25.)
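Formula (6.12) can also be evaluated exactly, with no floating-point square roots: since α and β are the roots of Z^2 + Z + 2, the integer sums t_k = α^k + β^k satisfy the recurrence t_k = −t_{k−1} − 2t_{k−2} with t_0 = 2 and t_1 = −1, and #E0(F2k ) = 2^k + 1 − t_k. A sketch (ours):

```python
# Exact evaluation of (6.12): t_k = alpha^k + beta^k obeys
# t_k = -t_{k-1} - 2*t_{k-2} because alpha, beta are roots of Z^2 + Z + 2,
# so #E0(F_{2^k}) = 2^k + 1 - t_k in pure integer arithmetic.

def count_E0(k):
    t_prev, t = 2, -1               # t_0 = 2, t_1 = alpha + beta = -1
    for _ in range(k - 1):
        t_prev, t = t, -t - 2 * t_prev
    return 2**k + 1 - t

print(count_E0(1))    # 4, matching #E0(F_2) = 4
print(count_E0(97))   # 158456325028528296935114828764
```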
Further, Theorem 6.29(b) says that the Frobenius map τ satisfies the
equation τ^2 + τ + 2 = 0 when it acts on points of E(F2k ), i.e.,
τ^2(P) + τ(P) + 2P = O for all P ∈ E(F2k ).
The idea now is to write an arbitrary integer n as a sum of powers of τ, subject
to the assumption that τ^2 = −2 − τ. Say we have written n as
n = v_0 + v_1τ + v_2τ^2 + · · · + v_ℓτ^ℓ with v_i ∈ {−1, 0, 1}.
Then we can compute nP efficiently using the formula
nP = (v_0 + v_1τ + v_2τ^2 + · · · + v_ℓτ^ℓ)P
   = v_0P + v_1τ(P) + v_2τ^2(P) + · · · + v_ℓτ^ℓ(P).
This takes less time than using the binary or ternary method because it is far
easier to compute τ^i(P) than it is to compute 2^i P.
Proposition 6.30. Let n be a positive integer. Then n can be written in the
form
n = v_0 + v_1τ + v_2τ^2 + · · · + v_ℓτ^ℓ with v_i ∈ {−1, 0, 1},   (6.13)
under the assumption that τ satisfies τ^2 = −2 − τ. Further, this can always
be done with ℓ ≈ 2 log n and with at most 1/3 of the v_i nonzero.
Proof. The proof is similar to Proposition 6.18, the basic idea being that we
write integers as 2a + b with b ∈ {0, 1, −1} and replace 2 with −τ − τ^2;
see Exercise 6.27. With more work, it is possible to find an expansion (6.13)
with ℓ ≈ log n and approximately 1/3 of the v_i nonzero; see [29, §15.1].
Example 6.31. We illustrate Proposition 6.30 with a numerical example.
Let n = 7. Then
7 = 1 + 3 · 2 = 1 + 3 · (−τ − τ^2) = 1 − 3τ − 3τ^2
  = 1 − τ − τ^2 − 2τ − 2τ^2
  = 1 − τ − τ^2 − (−τ − τ^2)τ − (−τ − τ^2)τ^2
  = 1 − τ + 2τ^3 + τ^4
  = 1 − τ + (−τ − τ^2)τ^3 + τ^4
  = 1 − τ − τ^5.
Thus 7 = 1 − τ − τ^5.
Remark 6.32. As we have seen, computing #E(F2k ) for Koblitz curves is
very easy. However, for general elliptic curves over F2k , this is a more difficult
task. The SEA algorithm and its variants [120, 121] that we mentioned in Re-
mark 6.13 are reasonably efficient at counting the number of points in E(Fq)
for any fields with a large number of elements. Satoh [113] devised an alter-
native method that is often faster than SEA when q = p^e for a small prime p
and (moderately) large exponent e. Satoh's original paper dealt only with the
case p ≥ 3, but subsequent work [44, 140] covers also the cryptographically
important case of p = 2.
6.8 Bilinear Pairings on Elliptic Curves
You have probably seen examples of bilinear pairings in a linear algebra class.
For example, the dot product is a bilinear pairing on the vector space R^n,
β(v, w) = v · w = v_1w_1 + v_2w_2 + · · · + v_nw_n.
It is a pairing in the sense that it takes a pair of vectors and returns a num-
ber, and it is bilinear in the sense that it is a linear transformation in each
of its variables. In other words, for any vectors v1, v2, w1, w2 and any real
numbers a1, a2, b1, b2, we have
β(a_1v_1 + a_2v_2, w) = a_1β(v_1, w) + a_2β(v_2, w),
β(v, b_1w_1 + b_2w_2) = b_1β(v, w_1) + b_2β(v, w_2).   (6.14)
More generally, if A is any n-by-n matrix, then the function β(v, w) = vAw^t
is a bilinear pairing on R^n, where we write v as a row vector and we write w^t,
the transpose of w, as a column vector.
Another bilinear pairing that you have seen is the determinant map on R^2.
Thus if v = (v_1, v_2) and w = (w_1, w_2), then
δ(v, w) = det( v_1  v_2 ; w_1  w_2 ) = v_1w_2 − v_2w_1
is a bilinear map. The determinant map has the further property that it is
alternating, which means that if we switch the vectors, the value changes sign,
δ(v, w) = −δ(w, v).
Notice that the alternating property implies that δ(v, v) = 0 for every
vector v.
The bilinear pairings that we discuss in this section are similar in that they
take as input two points on an elliptic curve and give as output a number.
However, the bilinearity condition is slightly different, because the output
value is a nonzero element of a finite field, so the sum on the right-hand side
of (6.14) is replaced by a product.
Bilinear pairings on elliptic curves have a number of important crypto-
graphic applications. For most of these applications it is necessary to work
with finite fields Fpk of prime power order. Fields of prime power order are
discussed in Sect. 2.10.4, but even if you have not covered that material, you
can just imagine a field that is similar to Fp, but that has p^k elements. (N.B.
The field Fpk is very different from the ring Z/p^kZ; see Exercise 2.40.)
Standard references for the material used in this section are [136] and [147].
6.8.1 Points of Finite Order on Elliptic Curves
We begin by briefly describing the points of finite order on an elliptic curve.
Definition. Let m ≥ 1 be an integer. A point P ∈ E satisfying mP = O
is called a point of order m in the group E. We denote the set of points of
order m by
E[m] = { P ∈ E : mP = O }.
Such points are called points of finite order or torsion points.
It is easy to see that if P and Q are in E[m], then P + Q and −P are also
in E[m], so E[m] is a subgroup of E. If we want the coordinates of P to lie in a
particular field K, for example in Q or R or C or Fp, then we write E(K)[m].
(See Exercise 2.12.)
The group of points of order m has a fairly simple structure, at least if we
allow the coordinates of the points to be in a sufficiently large field.
Proposition 6.33. Let m ≥ 1 be an integer.
(a) Let E be an elliptic curve over Q or R or C. Then
E(C)[m] ≅ Z/mZ × Z/mZ
is a product of two cyclic groups of order m.
(b) Let E be an elliptic curve over Fp and assume that p does not divide m.
Then there exists a value of k such that
E(F_{p^{jk}})[m] ≅ Z/mZ × Z/mZ for all j ≥ 1.
Proof. For the proof, which is beyond the scope of this book, see any standard
text on elliptic curves, for example [136, Corollary III.6.4].
Remark 6.34. Notice that if ℓ is prime and if K is a field such that
E(K)[ℓ] = Z/ℓZ × Z/ℓZ,
then we may view E[ℓ] as a 2-dimensional vector space over the field Z/ℓZ.
And even if m is not prime,
E(K)[m] = Z/mZ × Z/mZ
still has a "basis" {P1, P2} in the sense that every point P ∈ E[m] can be
written as a linear combination
P = aP1 + bP2
for a unique choice of coefficients a, b ∈ Z/mZ. Of course, if m is large, it may
be very difficult to find a and b. Indeed, if P is a multiple of P1, then finding
the value of a is the same as solving the ECDLP for P and P1.
6.8.2 Rational Functions and Divisors on Elliptic Curves
In order to define the Weil and Tate pairings, we need to explain how a rational
function on an elliptic curve is related to its zeros and poles. We start with
the simpler case of a rational function of one variable. A rational function is
a ratio of polynomials
f(X) = (a_0 + a_1X + a_2X^2 + · · · + a_nX^n) / (b_0 + b_1X + b_2X^2 + · · · + b_mX^m).
Any nonzero polynomial can be factored completely if we allow complex num-
bers, so a nonzero rational function can be factored as
f(X) = a(X − α_1)^{e_1}(X − α_2)^{e_2} · · · (X − α_r)^{e_r} / ( b(X − β_1)^{d_1}(X − β_2)^{d_2} · · · (X − β_s)^{d_s} ).
We may assume that α1, . . . , αr, β1, . . . , βs are distinct numbers, since
otherwise we can cancel some of the terms in the numerator with some
of the terms in the denominator. The numbers α1, . . . , αr are called the zeros
of f(X) and the numbers β1, . . . , βs are called the poles of f(X). The expo-
nents e1, . . . , er, d1, . . . , ds are the associated multiplicities. We keep track of
the zeros and poles of f(X) and their multiplicities by defining the divisor
of f(X) to be the formal sum
div( f(X) ) = e_1[α_1] + e_2[α_2] + · · · + e_r[α_r] − d_1[β_1] − d_2[β_2] − · · · − d_s[β_s].
Note that this is simply a convenient shorthand way of saying that f(X) has
a zero of multiplicity e1 at α1, a zero of multiplicity e2 at α2, etc.
If E is an elliptic curve,
E : Y^2 = X^3 + AX + B,
and if f(X, Y ) is a nonzero rational function of two variables, we may view f
as defining a function on E by writing points as P = (x, y) and setting f(P) =
f(x, y). Then just as for rational functions of one variable, there are points
of E where the numerator of f vanishes and there are points of E where the
denominator of f vanishes, so f has zeros and poles on E. Further, one can
assign multiplicities to the zeros and poles, so f has an associated divisor
div(f) = Σ_{P ∈ E} nP [P].
In this formal sum, the coefficients nP are integers, and only finitely many of
the nP are nonzero, so div(f) is a finite sum. Of course, the coordinates of
the zeros and poles of f may require moving to a larger field. For example,
if E is defined over Fp, then the poles and zeros of f have coordinates in Fpk
for some k, but the value of k will, in general, depend on the function f.
Example 6.35. Suppose that the cubic polynomial used to define E factors as
X^3 + AX + B = (X − α1)(X − α2)(X − α3).
Then the points P1 = (α1, 0), P2 = (α2, 0), and P3 = (α3, 0) are distinct (see
Remark 6.4) and satisfy 2P1 = 2P2 = 2P3 = O, i.e., they are points of order 2.
The function Y , which remember is defined by
Y (P) = (the y-coordinate of the point P),
vanishes at these three points and at no other points P = (x, y). The divisor
of Y has the form [P1] + [P2] + [P3] − n[O] for some integer n, and it follows
from Theorem 6.36 that n = 3, so
div(Y ) = [P1] + [P2] + [P3] − 3[O].
More generally, we define a divisor on E to be any formal sum
D = Σ_{P ∈ E} nP [P] with nP ∈ Z and nP = 0 for all but finitely many P.
The degree of a divisor is the sum of its coefficients,
deg(D) = deg( Σ_{P ∈ E} nP [P] ) = Σ_{P ∈ E} nP .
We define the sum of a divisor by dropping the square brackets; thus
Sum(D) = Sum( Σ_{P ∈ E} nP [P] ) = Σ_{P ∈ E} nP P.
Note that nP P means to add P to itself nP times using the addition law
on E. It is natural to ask which divisors are divisors of functions, and to what
extent the divisor of a function determines the function. These questions are
answered by the following theorem.
Theorem 6.36. Let E be an elliptic curve.
(a) Let f and g be nonzero rational functions on E. If div(f) = div(g), then
there is a nonzero constant c such that f = cg.
(b) Let D = Σ_{P ∈ E} nP [P] be a divisor on E. Then D is the divisor of a
rational function on E if and only if
deg(D) = 0 and Sum(D) = O.
In particular, if a rational function on E has no zeros or no poles, then it is
constant.
Proof. Again we refer the reader to any elliptic curve textbook such as [136,
Propositions II.3.1 and III.3.4].
Example 6.37. Suppose that P ∈ E[m] is a point of order m. By defini-
tion, mP = O, so the divisor
m[P] − m[O]
satisfies the conditions of Theorem 6.36(b). Hence there is a rational func-
tion fP (X, Y ) on E satisfying
div(fP ) = m[P] − m[O].
The case m = 2 is particularly simple. A point P ∈ E has order 2 if
and only if its Y -coordinate vanishes. If we let P = (α, 0) ∈ E[2], then the
function fP = X − α satisfies
div(X − α) = 2[P] − 2[O];
see Exercise 6.30.
6.8.3 The Weil Pairing
The Weil pairing, which is denoted by em, takes as input a pair of points
P, Q ∈ E[m] and gives as output an mth root of unity em(P, Q). The bilin-
earity of the Weil pairing is expressed by the equations
em(P1 + P2, Q) = em(P1, Q) em(P2, Q),
em(P, Q1 + Q2) = em(P, Q1) em(P, Q2).   (6.15)
This is similar to the vector space bilinearity described in (6.14), but note that
the bilinearity in (6.15) is multiplicative, in the sense that the quantities on
the right-hand side are multiplied, while the bilinearity in (6.14) is additive,
in the sense that the quantities on the right-hand side are added.
Definition. Let P, Q ∈ E[m], i.e., P and Q are points of order m in the
group E. Let fP and fQ be rational functions on E satisfying
div(fP ) = m[P] − m[O] and div(fQ) = m[Q] − m[O].
(See Example 6.37.) The Weil pairing of P and Q is the quantity
em(P, Q) = ( fP (Q + S)/fP (S) ) / ( fQ(P − S)/fQ(−S) ),   (6.16)
where S ∈ E is any point satisfying S ∉ {O, P, −Q, P − Q}. (This ensures
that all of the quantities on the right-hand side of (6.16) are defined and
nonzero.) One can check that the value of em(P, Q) does not depend on the
choice of fP , fQ, and S; see Exercise 6.32.
Despite its somewhat arcane definition, the Weil pairing em has many
useful properties.
Theorem 6.38. (a) The values of the Weil pairing satisfy
em(P, Q)^m = 1 for all P, Q ∈ E[m].
In other words, em(P, Q) is an mth root of unity.
(b) The Weil pairing is bilinear, which means that
em(P1 + P2, Q) = em(P1, Q)em(P2, Q) for all P1, P2, Q ∈ E[m],
and
em(P, Q1 + Q2) = em(P, Q1)em(P, Q2) for all P, Q1, Q2 ∈ E[m].
(c) The Weil pairing is alternating, which means that
em(P, P) = 1 for all P ∈ E[m].
This implies that em(P, Q) = em(Q, P)^{−1} for all P, Q ∈ E[m]; see
Exercise 6.31.
(d) The Weil pairing is nondegenerate, which means that
if em(P, Q) = 1 for all Q ∈ E[m], then P = O.
Proof. Some parts of Theorem 6.38 are easy to prove, while other parts are
not so easy. For a complete proof, see for example [136, Section III.8].
Remark 6.39. Where does the Weil pairing come from? According to
Proposition 6.33 (see also Remark 6.34), if we allow points with coordi-
nates in a sufficiently large field, then E[m] looks like a 2-dimensional “vector
space” over the “field” Z/mZ. So if we choose a basis P1, P2 ∈ E[m], then
any element P ∈ E[m] can be written in terms of this basis as
P = aP P1 + bP P2 for unique aP , bP ∈ Z/mZ,
and then we can define an alternating bilinear pairing by using the
determinant,
E[m] × E[m] −→ Z/mZ,  (P, Q) −→ det( aP  aQ ; bP  bQ ) = aP bQ − aQbP .
But there are two problems with this pairing. First, it depends on choosing
a basis, and second, there’s no easy way to compute it other than writing P
and Q in terms of the basis. However, it should come as no surprise that the
determinant and the Weil pairing are closely related to one another. To be pre-
cise, if we let ζ = em(P1, P2), then it is easy to check that (see Exercise 6.33)
em(P, Q) = ζ^{det( aP aQ ; bP bQ )} = ζ^{aP bQ − aQbP}.
The glory10
of the Weil pairing is that it can be computed quite efficiently
without first expressing P and Q in terms of any particular basis of E[m].
(See Sect. 6.8.4 for a double-and-add algorithm to compute em(P, Q).) This is
good, since expressing a point in terms of the basis P1 and P2 is at least as
difficult as solving the ECDLP; see Exercise 6.10.
Example 6.40. We are going to compute e2 directly from the definition. Let E
be given by the equation
Y^2 = X^3 + AX + B = (X − α1)(X − α2)(X − α3).
Note that α1 + α2 + α3 = 0, since the left-hand side has no X^2 term. The
points
P1 = (α1, 0), P2 = (α2, 0), P3 = (α3, 0),
are points of order 2, and as noted in Example 6.37 (see also Exercise 6.30),
div(X − αi) = 2[Pi] − 2[O].
In order to compute e2(P1, P2), we can take an arbitrary point S = (x, y) ∈ E.
Using the addition formula, we find that the x-coordinate of P1 −S is equal to
X(P1 − S) = ( −y/(x − α1) )^2 − x − α1

          = ( y^2 − (x − α1)^2 (x + α1) ) / (x − α1)^2

          = ( (x − α1)(x − α2)(x − α3) − (x − α1)^2 (x + α1) ) / (x − α1)^2
                               since y^2 = (x − α1)(x − α2)(x − α3),

          = ( (x − α2)(x − α3) − (x − α1)(x + α1) ) / (x − α1)

          = ( (−α2 − α3)x + α2α3 + α1^2 ) / (x − α1)

          = ( α1x + α2α3 + α1^2 ) / (x − α1)     since α1 + α2 + α3 = 0.
Similarly,
X(P2 + S) = ( α2x + α1α3 + α2^2 ) / (x − α2).
^10 For those who have taken a course in abstract algebra, we mention that the other
glorious property of the Weil pairing is that it interacts well with Galois theory. Thus let E
be an elliptic curve over a field K, let L/K be a Galois extension, and let P, Q ∈ E(L)[m].
Then for every element g ∈ Gal(L/K), the Weil pairing obeys the rule
em( g(P), g(Q) ) = g( em(P, Q) ).
Using the rational functions fPi = X − αi and assuming that P1 and P2 are
distinct nonzero points in E[2], we find directly from the definition of em that

e2(P1, P2) = ( fP1(P2 + S) / fP1(S) ) ÷ ( fP2(P1 − S) / fP2(−S) )

           = ( (X(P2 + S) − α1) / (X(S) − α1) ) ÷ ( (X(P1 − S) − α2) / (X(−S) − α2) )

           = ( (α2x + α1α3 + α2^2)/(x − α2) − α1 ) / (x − α1)
             ÷ ( (α1x + α2α3 + α1^2)/(x − α1) − α2 ) / (x − α2)

           = ( (α2 − α1)x + α1α3 + α2^2 + α1α2 ) / ( (α1 − α2)x + α2α3 + α1^2 + α1α2 )

           = ( (α2 − α1)x + α2^2 − α1^2 ) / ( (α1 − α2)x + α1^2 − α2^2 )
                               since α1 + α2 + α3 = 0,
           = −1.
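The conclusion e2(P1, P2) = −1 holds for any valid choice of S, so it can also be verified numerically. The sketch below uses the illustrative curve y^2 = x^3 − x over F11, whose 2-torsion points are (0, 0), (1, 0), and (10, 0), together with the auxiliary point S = (4, 4):

```python
# Numerical check that e2(P1, P2) = -1, on the illustrative curve
# y^2 = x^3 - x over F_11 with alpha1 = 0, alpha2 = 1, alpha3 = -1.
p = 11
A = -1  # curve y^2 = x^3 + A*x with A = -1

def ec_add(P, Q):
    """Affine addition on y^2 = x^3 + A*x over F_p; None represents O."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3*x1*x1 + A) * pow(2*y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam*lam - x1 - x2) % p
    return (x3, (lam*(x1 - x3) - y1) % p)

a1, a2 = 0, 1                  # roots alpha1, alpha2
P1, P2 = (a1, 0), (a2, 0)      # 2-torsion points
S = (4, 4)                     # auxiliary point, not in {O, P1, -P2, P1 - P2}
negS = (S[0], (-S[1]) % p)

# e2(P1,P2) = [f_P1(P2+S)/f_P1(S)] / [f_P2(P1-S)/f_P2(-S)], f_Pi = x - alpha_i
num = (ec_add(P2, S)[0] - a1) * pow(S[0] - a1, -1, p) % p
den = (ec_add(P1, negS)[0] - a2) * pow(negS[0] - a2, -1, p) % p
e2 = num * pow(den, -1, p) % p
assert e2 == p - 1             # e2(P1, P2) = -1 in F_11
print("e2(P1, P2) =", e2 - p)  # prints -1
```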
6.8.4 An Efficient Algorithm to Compute
the Weil Pairing
In this section we describe a double-and-add method that can be used to
efficiently compute the Weil pairing. The key idea, which is due to Victor
Miller [89], is an algorithm to rapidly evaluate certain functions with specified
divisors, as explained in the next theorem. (For further material on Miller’s
algorithm, see [136, Section XI.8].)
Theorem 6.41. Let E be an elliptic curve and let P = (xP , yP ) and Q =
(xQ, yQ) be nonzero points on E.
(a) Let λ be the slope of the line connecting P and Q, or the slope of the
tangent line to E at P if P = Q. (If the line is vertical, we let λ = ∞.)
Define a function gP,Q on E as follows:
gP,Q = ( y − yP − λ(x − xP) ) / ( x + xP + xQ − λ^2 )   if λ ≠ ∞,
gP,Q = x − xP                                            if λ = ∞.
Then
div(gP,Q) = [P] + [Q] − [P + Q] − [O]. (6.17)
(b) (Miller’s Algorithm) Let m ≥ 1 and write the binary expansion of m as
m = m0 + m1 · 2 + m2 · 22
+ · · · + mn−12n−1
with mi ∈ {0, 1} and mn−1 = 0. The following algorithm returns a func-
tion fP whose divisor satisfies
div(fP ) = m[P] − [mP] − (m − 1)[O],
where the functions gT,T and gT,P used by the algorithm are as defined
in (a).
[1]  Set T = P and f = 1
[2]  Loop i = n − 2 down to 0
[3]    Set f = f^2 · gT,T
[4]    Set T = 2T
[5]    If mi = 1
[6]      Set f = f · gT,P
[7]      Set T = T + P
[8]    End If
[9]  End i Loop
[10] Return the value f
In particular, if P ∈ E[m], then div(fP ) = m[P] − m[O].
Proof. (a) Suppose first that λ ≠ ∞ and let y = λx + ν be the line through P
and Q, or the tangent line at P if P = Q. This line intersects E at the three
points P, Q, and −P − Q, so
div(y − λx − ν) = [P] + [Q] + [−P − Q] − 3[O].
Vertical lines intersect E at points and their negatives, so
div(x − xP+Q) = [P + Q] + [−P − Q] − 2[O].
It follows that
gP,Q = ( y − λx − ν ) / ( x − xP+Q )
has the desired divisor (6.17). Finally, the addition formula (Theorem 6.6) tells
us that xP+Q = λ^2 − xP − xQ, and we can eliminate ν from the numerator
of gP,Q using yP = λxP + ν.
If λ = ∞, then P+Q = O, so we want gP,Q to have divisor [P]+[−P]−2[O].
The function x − xP has this divisor.
(b) This is a standard double-and-add algorithm, similar to others that we
have seen in the past. The key to the algorithm comes from (a), which tells
us that the functions gT,T and gT,P used in Steps 3 and 6 have divisors
div(gT,T ) = 2[T] − [2T] − [O] and div(gT,P ) = [T] + [P] − [T + P] − [O].
We leave to the reader the remainder of the proof, which is a simple induction
using these relations.
Let P ∈ E[m]. The algorithm described in Theorem 6.41 tells us how to
compute a function fP with divisor m[P] − m[O]. Further, if R is any point
of E, then we can compute fP (R) directly by evaluating the functions gT,T (R)
and gT,P (R) each time we execute Steps 3 and 6 of the algorithm. Notice that
quantities of the form fP (R) are exactly what are needed in order to evaluate
the Weil pairing em(P, Q). More precisely, given nonzero points P, Q ∈ E[m],
we choose a point S ∉ {O, P, −Q, P − Q} and use Theorem 6.41 to evaluate
em(P, Q) = ( fP(Q + S) / fP(S) ) ÷ ( fQ(P − S) / fQ(−S) )
by computing each of the functions at the indicated point.
Remark 6.42. For added efficiency, one can compute fP (Q + S) and fP (S)
simultaneously, and similarly for fQ(P − S) and fQ(−S). Further savings are
available using the Tate pairing, which is a variant of the Weil pairing that
we describe briefly in Sect. 6.8.5.
Example 6.43. We take the elliptic curve
y^2 = x^3 + 30x + 34 over the finite field F631.
The curve has #E(F631) = 650 = 2 · 5^2 · 13 points, and it turns out that it
has 25 points of order 5. The points
P = (36, 60) and Q = (121, 387)
generate the points of order 5 in E(F631). In order to compute the Weil pairing
using Miller’s algorithm, we want a point S that is not in the subgroup spanned
by P and Q. We take S = (0, 36). The point S has order 130. Then Miller’s
algorithm gives
fP(Q + S) / fP(S) = 103/219 = 473 ∈ F631.
Reversing the roles of P and Q and replacing S by −S, Miller's algorithm
also gives
fQ(P − S) / fQ(−S) = 284/204 = 88 ∈ F631.
Finally, taking the ratio of these two values yields
e5(P, Q) = 473/88 = 242 ∈ F631.
We check that 242^5 = 1, so e5(P, Q) is a fifth root of unity in F631.
Continuing to work on the same curve, we take P′ = (617, 5) and Q′ =
(121, 244). Then a similar calculation gives
fP′(Q′ + S) / fP′(S) = 326/523 = 219 and fQ′(P′ − S) / fQ′(−S) = 483/576 = 83,
and taking the ratio of these two values yields
e5(P′, Q′) = 219/83 = 512 ∈ F631.
It turns out that P′ = 3P and Q′ = 4Q. We check that
e5(P, Q)^12 = 242^12 = 512 = e5(P′, Q′) = e5(3P, 4Q),
which illustrates the bilinearity property of the Weil pairing.
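The whole of Example 6.43 can be reproduced with a short script. The sketch below implements the function gP,Q and Miller's algorithm from Theorem 6.41 and then evaluates the pairing formula; individual values such as fP(S) depend on how fP is normalized, but the two ratios, and hence the pairing value, do not:

```python
# Reproducing Example 6.43: the pairing e5 on y^2 = x^3 + 30x + 34 over F_631.
p, A, B = 631, 30, 34

def ec_add(P, Q):
    """Affine point addition; None represents O."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3*x1*x1 + A) * pow(2*y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam*lam - x1 - x2) % p
    return (x3, (lam*(x1 - x3) - y1) % p)

def ec_mul(k, P):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def g(P, Q, R):
    """Evaluate the function g_{P,Q} of Theorem 6.41(a) at R = (x, y)."""
    x1, y1 = P; x2, y2 = Q
    x, y = R
    if x1 == x2 and (y1 + y2) % p == 0:   # vertical line: lambda = infinity
        return (x - x1) % p
    if P == Q:
        lam = (3*x1*x1 + A) * pow(2*y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    num = (y - y1 - lam * (x - x1)) % p
    den = (x + x1 + x2 - lam*lam) % p
    return num * pow(den, -1, p) % p

def miller(P, R, m):
    """Miller's algorithm: evaluate f_P at R, div(f_P) = m[P] - [mP] - (m-1)[O]."""
    T, f = P, 1
    for bit in bin(m)[3:]:                # bits m_{n-2}, ..., m_0
        f = f * f * g(T, T, R) % p
        T = ec_add(T, T)
        if bit == '1':
            f = f * g(T, P, R) % p
            T = ec_add(T, P)
    return f

def weil(P, Q, S, m):
    """e_m(P, Q) computed as (f_P(Q+S)/f_P(S)) / (f_Q(P-S)/f_Q(-S))."""
    negS = (S[0], (-S[1]) % p)
    num = miller(P, ec_add(Q, S), m) * pow(miller(P, S, m), -1, p) % p
    den = miller(Q, ec_add(P, negS), m) * pow(miller(Q, negS, m), -1, p) % p
    return num * pow(den, -1, p) % p

P, Q, S = (36, 60), (121, 387), (0, 36)
e = weil(P, Q, S, 5)
assert e == 242 and pow(e, 5, p) == 1    # the value found in Example 6.43
assert weil(ec_mul(3, P), ec_mul(4, Q), S, 5) == pow(e, 12, p) == 512
print("e5(P, Q) =", e)
```

Running it confirms e5(P, Q) = 242 and the bilinearity check e5(3P, 4Q) = 242^12 = 512.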
6.8.5 The Tate Pairing
The Weil pairing is a nondegenerate bilinear form on elliptic curves defined
over any field. For elliptic curves over finite fields there is another pairing,
called the Tate pairing (or sometimes the Tate–Lichtenbaum pairing), that
is often used in cryptography because it is computationally somewhat more
efficient than the Weil pairing. In this section we briefly describe the Tate
pairing. (For further material on the Tate pairing, see [136, Section XI.9].)
Definition. Let E be an elliptic curve over Fq, let ℓ be a prime, let P ∈
E(Fq)[ℓ], and let Q ∈ E(Fq). Choose a rational function fP on E with
div(fP) = ℓ[P] − ℓ[O].
The Tate pairing of P and Q is the quantity
τ(P, Q) = fP(Q + S) / fP(S) ∈ F∗q,
where S is any point in E(Fq) such that fP(Q + S) and fP(S) are defined and
nonzero. It turns out that the value of the Tate pairing is well-defined only
up to multiplying it by the ℓth power of an element of F∗q. If q ≡ 1 (mod ℓ),
we define the (modified) Tate pairing of P and Q to be
τ̂(P, Q) = τ(P, Q)^((q−1)/ℓ) = ( fP(Q + S) / fP(S) )^((q−1)/ℓ) ∈ F∗q.
Theorem 6.44. Let E be an elliptic curve over Fq and let ℓ be a prime with
q ≡ 1 (mod ℓ) and E(Fq)[ℓ] ≅ Z/ℓZ.
Then the modified Tate pairing gives a well-defined map
τ̂ : E(Fq)[ℓ] × E(Fq)[ℓ] −→ F∗q
having the following properties:
(a) Bilinearity:
τ̂(P1 + P2, Q) = τ̂(P1, Q)τ̂(P2, Q) and τ̂(P, Q1 + Q2) = τ̂(P, Q1)τ̂(P, Q2).
(b) Nondegeneracy:
τ̂(P, P) is a primitive ℓth root of unity for all nonzero P ∈ E(Fq)[ℓ].
(A primitive ℓth root of unity is a number ζ ≠ 1 such that ζ^ℓ = 1.)
In applications such as tripartite Diffie–Hellman (Sect. 6.10.1) and ID-based
cryptography (Sect. 6.10.2), one may use the Tate pairing in place of
the Weil pairing. Note that Miller's algorithm gives an efficient way to compute
the Tate pairing, since Theorem 6.41(b) explains how to rapidly compute the
value of fP.
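Since Miller's algorithm computes fP, the modified Tate pairing takes only a few extra lines on top of the Weil pairing code. The sketch below evaluates τ̂ with ℓ = 5 on the curve of Example 6.43, where q = 631 ≡ 1 (mod 5). Note that this curve has full 5-torsion over F631, so the cyclicity hypothesis of Theorem 6.44 does not hold here; the sketch only illustrates the definition, in particular that the value is an ℓth root of unity and does not depend on the auxiliary point S:

```python
# Sketch of the modified Tate pairing on y^2 = x^3 + 30x + 34 over F_631
# (the curve of Example 6.43), with l = 5 and q = 631 ≡ 1 (mod 5).
q, A, B = 631, 30, 34
l = 5

def ec_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % q == 0:
        return None
    if P == Q:
        lam = (3*x1*x1 + A) * pow(2*y1, -1, q) % q
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, q) % q
    x3 = (lam*lam - x1 - x2) % q
    return (x3, (lam*(x1 - x3) - y1) % q)

def g(P, Q, R):
    x1, y1 = P; x2, y2 = Q; x, y = R
    if x1 == x2 and (y1 + y2) % q == 0:   # vertical line
        return (x - x1) % q
    if P == Q:
        lam = (3*x1*x1 + A) * pow(2*y1, -1, q) % q
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, q) % q
    return (y - y1 - lam*(x - x1)) * pow(x + x1 + x2 - lam*lam, -1, q) % q

def miller(P, R, m):
    T, f = P, 1
    for bit in bin(m)[3:]:
        f = f * f * g(T, T, R) % q
        T = ec_add(T, T)
        if bit == '1':
            f = f * g(T, P, R) % q
            T = ec_add(T, P)
    return f

def tate(P, Q, S):
    """Modified Tate pairing tau_hat(P, Q) = (f_P(Q+S)/f_P(S))^((q-1)/l)."""
    t = miller(P, ec_add(Q, S), l) * pow(miller(P, S, l), -1, q) % q
    return pow(t, (q - 1)//l, q)

P, Q, S = (36, 60), (121, 387), (0, 36)
t1 = tate(P, Q, S)
# Well-definedness: any other valid auxiliary point gives the same tau_hat.
S2 = ec_add(S, S)
while True:
    if S2 is None:                 # skip the point at infinity
        S2 = ec_add(S2, S)
        continue
    try:
        t2 = tate(P, Q, S2)
        break
    except ValueError:             # hit a zero denominator; try another S2
        S2 = ec_add(S2, S)
assert t1 == t2
assert pow(t1, l, q) == 1          # tau_hat(P, Q) is an l-th root of unity
print("tate(P, Q) =", t1)
```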
6.9 The Weil Pairing over Fields of Prime
Power Order
There are many applications of the Weil pairing in which it is necessary
to work in fields Fpk of prime power order. In this section we discuss the
m-embedding degree, which is the smallest value of k such that E(Fpk)[m]
is as large as possible, and we give an application called the MOV algorithm
that reduces the ECDLP in E(Fp) to the DLP in F∗pk. We then describe
distortion maps on E and use them to define a modified Weil pairing êm for
which êm(P, P) is nontrivial.
6.9.1 Embedding Degree and the MOV Algorithm
Let E be an elliptic curve over Fp and let m ≥ 1 be an integer with p ∤ m.
In order to obtain nontrivial values of the Weil pairing em, we need to use
independent points of order m on E. According to Proposition 6.33(b), the
curve E has m^2 points of order m, but their coordinates may lie in a larger
finite field.
Definition. Let E be an elliptic curve over Fp and let m ≥ 1 be an integer
with p ∤ m. The embedding degree of E with respect to m is the smallest value
of k such that
E(Fpk)[m] ≅ Z/mZ × Z/mZ.
For cryptographic applications, the most interesting case occurs when m
is a (large) prime, in which case there are alternative characterizations of the
embedding degree, as in the following result.
Proposition 6.45. Let E be an elliptic curve over Fp and let ℓ ≠ p be a
prime. Assume that E(Fp) contains a point of order ℓ. Then the embedding
degree of E with respect to ℓ is given by one of the following cases:
(i) The embedding degree of E is 1. (This cannot happen if ℓ > √p + 1; see
Exercise 6.39.)
(ii) p ≡ 1 (mod ℓ) and the embedding degree is ℓ.
(iii) p ≢ 1 (mod ℓ) and the embedding degree is the smallest value of k ≥ 2
such that p^k ≡ 1 (mod ℓ).
Proof. The proof uses more advanced methods than we have at our disposal.
See [147, Proposition 5.9] for a proof of case (iii), which is the case that most
often occurs in practice.
The significance of the embedding degree k is that the Weil pairing embeds
the ECDLP on the elliptic curve E(Fp) into the DLP in the field Fpk. The
basic setup is as follows. Let E be an elliptic curve over Fp and let P ∈ E(Fp)
be a point of order ℓ, where ℓ is a large prime, say ℓ > √p + 1. Let k be the
embedding degree with respect to ℓ and suppose that we know how to solve
the discrete logarithm problem in the field Fpk. Let Q ∈ E(Fp) be a point
that is a multiple of P. Then the following algorithm of Menezes, Okamoto,
and Vanstone [82] solves the elliptic curve discrete logarithm problem for P
and Q.
The MOV Algorithm
1. Compute the number of points N = #E(Fpk). This is feasible if k is
not too large, since there are polynomial-time algorithms to count the
number of points on an elliptic curve; see Remarks 6.13 and 6.32. Note
that ℓ | N, since by assumption E(Fp) has a point of order ℓ.
2. Choose a random point T ∈ E(Fpk) with T ∉ E(Fp).
3. Compute T′ = (N/ℓ)T. If T′ = O, go back to Step 2. Otherwise, T′ is
a point of order ℓ, so proceed to Step 4.
4. Compute the Weil pairing values
α = eℓ(P, T′) ∈ F∗pk and β = eℓ(Q, T′) ∈ F∗pk.
This can be done quite efficiently, in time proportional to log(p^k); see
Sect. 6.8.4. If α = 1, return to Step 2.
5. Solve the DLP for α and β in F∗pk, i.e., find an exponent n such that
β = α^n. If p^k is not too large, this can be done using the index calculus.
Note that the index calculus (Sect. 3.8) is a subexponential algorithm,
so it is considerably faster than collision algorithms such as Pollard's ρ
method (Sects. 5.4 and 5.5).
6. Then also Q = nP, so the ECDLP has been solved.
The MOV algorithm is summarized in Table 6.10. A few comments are in
order.
Remark 6.46. How does one generate a random point T ∈ E(Fpk) with
T ∉ E(Fp) in Step 2? One method is to choose random values x ∈ Fpk and check
whether x^3 + Ax + B is a square in Fpk, which is easy to do, since z is a
square in Fpk if and only if z^((p^k−1)/2) = 1. (We are assuming that p is an
odd prime.) There then exist practical (i.e., polynomial time) algorithms to
compute square roots in finite fields, but to describe them would take us too
far afield; see [28, §§1.5.1, 1.5.2].
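For the special case p ≡ 3 (mod 4), which is the case used in Sect. 6.9.3, no general square root algorithm is needed: if z is a square, then z^((p+1)/4) is a square root of it. The sketch below generates a random point this way, reusing the curve of Example 6.43 as an illustrative example (631 ≡ 3 (mod 4)):

```python
# Generating a random point on y^2 = x^3 + Ax + B over F_p, following
# Remark 6.46, in the easy case p ≡ 3 (mod 4), where a square root of a
# square z is given by the one-line formula z^((p+1)/4).
import random

p, A, B = 631, 30, 34   # the curve of Example 6.43; 631 ≡ 3 (mod 4)

def random_point():
    while True:
        x = random.randrange(p)
        z = (x**3 + A*x + B) % p
        if z == 0:
            return (x, 0)
        if pow(z, (p - 1)//2, p) == 1:            # Euler's criterion: z is a square
            return (x, pow(z, (p + 1)//4, p))     # square root since p ≡ 3 (mod 4)

x, y = random_point()
assert (y*y - (x**3 + A*x + B)) % p == 0          # the point lies on the curve
print("random point:", (x, y))
```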
Remark 6.47. Why does the MOV algorithm solve the ECDLP? The point T′
constructed by the algorithm is generally independent of P, so the pair of
points {P, T′} forms a basis for the 2-dimensional vector space
E[ℓ] = Z/ℓZ × Z/ℓZ.
It follows from the nondegeneracy of the Weil pairing that eℓ(P, T′) is a non-
trivial ℓth root of unity in F∗pk. In other words,
eℓ(P, T′)^r = 1 if and only if ℓ | r.
Suppose now that Q = jP and that our goal is to find the value of j mod-
ulo ℓ. The MOV algorithm finds an integer n satisfying eℓ(Q, T′) = eℓ(P, T′)^n.
The linearity of the Weil pairing implies that
eℓ(P, T′)^n = eℓ(Q, T′) = eℓ(jP, T′) = eℓ(P, T′)^j,
so eℓ(P, T′)^(n−j) = 1. Hence n ≡ j (mod ℓ), which shows that n solves the
ECDLP for P and Q.
Remark 6.48. How practical is the MOV algorithm? The answer, obviously,
depends on the size of k. If k is large, say k > (ln p)^2, then the MOV algorithm
is completely infeasible. For example, if p ≈ 2^160, then we would have to solve
the DLP in Fpk with k > 4000. Since a randomly chosen elliptic curve over Fp
almost always has embedding degree that is much larger than (ln p)^2, it would
seem that the MOV algorithm is not useful. However, there are certain special
sorts of curves whose embedding degree is small. An important class of such
curves consists of those satisfying
#E(Fp) = p + 1.
These supersingular elliptic curves generally have embedding degree k = 2,
and in any case k ≤ 6. For example,
E : y^2 = x^3 + x
is supersingular for any prime p ≡ 3 (mod 4), and it has embedding degree 2
for any ℓ > √p + 1. This means that solving the ECDLP in E(Fp) is no
harder than solving the DLP in F∗p2, which makes E a very poor choice for use in
cryptography.^11
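The defining property #E(Fp) = p + 1 of these supersingular curves is easy to confirm for small primes by counting points naively, using Euler's criterion to test whether x^3 + x is a square:

```python
# Naive point count verifying #E(F_p) = p + 1 for E: y^2 = x^3 + x
# when p ≡ 3 (mod 4), for a few small primes.
def count_points(p):
    total = 1                              # the point at infinity O
    for x in range(p):
        z = (x**3 + x) % p
        if z == 0:
            total += 1                     # one point (x, 0)
        elif pow(z, (p - 1)//2, p) == 1:   # z is a nonzero square
            total += 2                     # two points (x, ±y)
    return total

for p in [7, 11, 19, 23, 31]:              # primes ≡ 3 (mod 4)
    assert count_points(p) == p + 1
print("verified #E(F_p) = p + 1 for these p ≡ 3 (mod 4)")
```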
Remark 6.49. An elliptic curve E over a finite field Fp is called anomalous if
#E(Fp) = p. A number of people [114, 122, 141] more or less simultaneously
observed that there is a very fast (linear time) algorithm to solve the ECDLP
on anomalous elliptic curves, so such curves must be avoided in cryptographic
constructions.
There are also some cases in which the ECDLP is easier than expected for
elliptic curves E over finite fields F2m when m is composite. (A reason to use
such fields is that field operations can sometimes be done more efficiently.)
This attack uses a tool called Weil descent and was originally suggested by
Gerhard Frey. The idea is to transfer an ECDLP in E(F2m ) to a discrete
logarithm problem on a hyperelliptic curve (see Sect. 8.10) over a smaller
field F2k , where k divides m. The details are complicated and beyond the
scope of this book. See [29, §22.3] for details.
^11 Or so it would seem, but we will see in Sect. 6.9.3 that the ECDLP on E does have its
uses in cryptography!
1. Compute the number of points N = #E(Fpk).
2. Choose a random point T ∈ E(Fpk) with T ∉ E(Fp).
3. Let T′ = (N/ℓ)T. If T′ = O, go back to Step 2. Otherwise T′
   is a point of order ℓ, so proceed to Step 4.
4. Compute the Weil pairing values
   α = eℓ(P, T′) ∈ F∗pk and β = eℓ(Q, T′) ∈ F∗pk.
   If α = 1, go to Step 2.
5. Solve the DLP for α and β in F∗pk, i.e., find an exponent n
   such that β = α^n.
6. Then also Q = nP, so the ECDLP has been solved.
Table 6.10: The MOV algorithm to solve the ECDLP
6.9.2 Distortion Maps and a Modified Weil Pairing
The Weil pairing is alternating, which means that em(P, P) = 1 for all P.
In cryptographic applications we generally want to evaluate the pairing at
points P1 = aP and P2 = bP, but using the Weil pairing directly is not
helpful, since
em(P1, P2) = em(aP, bP) = em(P, P)^ab = 1^ab = 1.
One way around this dilemma is to choose an elliptic curve that has a “nice”
map φ : E → E with the property that P and φ(P) are “independent”
in E[m]. Then we can evaluate
em( P1, φ(P2) ) = em( aP, φ(bP) ) = em( aP, bφ(P) ) = em( P, φ(P) )^ab.
For cryptographic applications one generally takes m to be prime, so we re-
strict our attention to this case.
Definition. Let ℓ ≥ 3 be a prime, let E be an elliptic curve, let P ∈ E[ℓ] be
a point of order ℓ, and let φ : E → E be a map from E to itself. We say that φ
is an ℓ-distortion map for P if it has the following two properties^12:
(i) φ(nP) = nφ(P) for all n ≥ 1.
(ii) The number eℓ( P, φ(P) ) is a primitive ℓth root of unity. This means that
eℓ( P, φ(P) )^r = 1 if and only if r is a multiple of ℓ.
The next proposition gives various ways to check condition (ii).
Proposition 6.50. Let E be an elliptic curve, let ℓ ≥ 3 be a prime, and
view E[ℓ] = Z/ℓZ × Z/ℓZ as a 2-dimensional vector space over the field Z/ℓZ.
Let P, Q ∈ E[ℓ]. Then the following are equivalent:
^12 There are various definitions of distortion maps in the literature. The one that we give
distills the essential properties needed for most cryptographic applications. In practice, one
also requires an efficient algorithm to compute φ.
(a) P and Q form a basis for the vector space E[ℓ].
(b) P ≠ O and Q is not a multiple of P.
(c) eℓ(P, Q) is a primitive ℓth root of unity.
(d) eℓ(P, Q) ≠ 1.
Proof. It is clear that (a) implies (b), since a basis consists of independent
vectors. Conversely, suppose that (a) is false. This means that there is a linear
relation
uP + vQ = O with u, v ∈ Z/ℓZ not both 0.
If v = 0, then P = O, so (b) is false. And if v ≠ 0, then v has an inverse
in Z/ℓZ, so Q = −v^(−1)uP is a multiple of P, again showing that (b) is false.
This completes the proof that (a) and (b) are equivalent.
To ease notation, we let
ζ = eℓ(P, Q).
From the definition of the Weil pairing, we know that ζ^ℓ = 1. Let r ≥ 1 be
the smallest integer such that ζ^r = 1. Use the extended Euclidean algorithm
(Theorem 1.11) to write the greatest common divisor of r and ℓ as
sr + tℓ = gcd(r, ℓ) for some s, t ∈ Z.
Then
ζ^gcd(r,ℓ) = ζ^(sr+tℓ) = (ζ^r)^s (ζ^ℓ)^t = 1.
The minimality of r tells us that r = gcd(r, ℓ), so r | ℓ. Since ℓ is prime, it
follows that either r = 1, so ζ = 1, or else r = ℓ. This proves that (c) and (d)
are equivalent.
We next verify that (a) implies (d). So we are given that P and Q are a
basis for E[ℓ]. In particular, P ≠ O, so the nondegeneracy of the Weil pairing
tells us that there is a point R ∈ E[ℓ] with eℓ(P, R) ≠ 1. Since P and Q are a
basis for E[ℓ], we can write R as a linear combination of P and Q, say
R = uP + vQ.
Then the bilinearity and alternating properties of the Weil pairing yield
1 ≠ eℓ(P, R) = eℓ(P, uP + vQ) = eℓ(P, P)^u eℓ(P, Q)^v = eℓ(P, Q)^v.
Hence eℓ(P, Q) ≠ 1, which shows that (d) is true.
Finally, we show that (d) implies (b) by assuming that (b) is false and
deducing that (d) is false. The assumption that (b) is false means that
either P = O or Q = uP for some u ∈ Z/ℓZ. But if P = O, then eℓ(P, Q) =
eℓ(O, Q) = 1 by bilinearity, while if Q = uP, then
eℓ(P, Q) = eℓ(P, uP) = eℓ(P, P)^u = 1^u = 1
by the alternating property of eℓ. Thus in both cases we find that eℓ(P, Q) = 1,
so (d) is false.
Definition. Let E be an elliptic curve, let P ∈ E[ℓ], and let φ be an ℓ-
distortion map for P. The modified Weil pairing êℓ on E[ℓ] (relative to φ) is
defined by
êℓ(Q, Q′) = eℓ( Q, φ(Q′) ).
In cryptographic applications, the modified Weil pairing is evaluated at
points that are multiples of P. The crucial property of the modified Weil
pairing is its nondegeneracy, as described in the next result.
Proposition 6.51. Let E be an elliptic curve, let P ∈ E[ℓ], let φ be an ℓ-
distortion map for P, and let êℓ be the modified Weil pairing relative to φ.
Let Q and Q′ be multiples of P. Then
êℓ(Q, Q′) = 1 if and only if Q = O or Q′ = O.
Proof. We are given that Q and Q′ are multiples of P, so we can write them
as Q = sP and Q′ = tP. The definition of distortion map and the linearity of
the Weil pairing imply that
êℓ(Q, Q′) = êℓ(sP, tP) = eℓ( sP, φ(tP) ) = eℓ( sP, tφ(P) ) = eℓ( P, φ(P) )^st.
The quantity eℓ( P, φ(P) ) is a primitive ℓth root of unity, so
êℓ(Q, Q′) = 1 ⇐⇒ ℓ | st
          ⇐⇒ ℓ | s or ℓ | t
          ⇐⇒ Q = O or Q′ = O.
6.9.3 A Distortion Map on y^2 = x^3 + x
In order to use the modified Weil pairing for cryptographic purposes, we need
to give at least one example of an elliptic curve with a distortion map. In this
section we give such an example for the elliptic curve y^2 = x^3 + x over the
field Fp with p ≡ 3 (mod 4). (See Exercise 6.43 for another example.) We start
by describing the map φ.
Proposition 6.52. Let E be the elliptic curve
E : y^2 = x^3 + x
over a field K and suppose that K has an element α ∈ K satisfying α^2 = −1.
Define a map φ by
φ(x, y) = (−x, αy) and φ(O) = O.
(a) Let P ∈ E(K). Then φ(P) ∈ E(K), so φ is a map from E(K) to itself.
(b) The map φ respects the addition law on E,^13
φ(P1 + P2) = φ(P1) + φ(P2) for all P1, P2 ∈ E(K).
In particular, φ(nP) = nφ(P) for all P ∈ E(K) and all n ≥ 1.
Proof. (a) Let P = (x, y) ∈ E(K). Then
(αy)^2 = −y^2 = −(x^3 + x) = (−x)^3 + (−x),
so φ(P) = (−x, αy) ∈ E(K).
(b) Suppose that P1 = (x1, y1) and P2 = (x2, y2) are distinct points. Then
using the elliptic curve addition algorithm (Theorem 6.6), we find that the
x-coordinate of φ(P1) + φ(P2) is

x( φ(P1) + φ(P2) ) = ( (αy2 − αy1) / ((−x2) − (−x1)) )^2 − (−x1) − (−x2)

                   = α^2 ( (y2 − y1)/(x2 − x1) )^2 + x1 + x2

                   = −( ( (y2 − y1)/(x2 − x1) )^2 − x1 − x2 )

                   = −x(P1 + P2).

Similarly, the y-coordinate of φ(P1) + φ(P2) is

y( φ(P1) + φ(P2) ) = ( (αy2 − αy1) / ((−x2) − (−x1)) ) ( −x1 − x( φ(P1) + φ(P2) ) ) − αy1

                   = −α ( (y2 − y1)/(x2 − x1) ) ( −x1 + x(P1 + P2) ) − αy1

                   = α ( ( (y2 − y1)/(x2 − x1) ) ( x1 − x(P1 + P2) ) − y1 )
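The homomorphism property of φ can also be checked numerically. Proposition 6.52 only requires a field containing an element α with α^2 = −1; over F13 one may take α = 5, so no extension field is needed (unlike the p ≡ 3 (mod 4) case of this section, where α lives in Fp2). The two points below are illustrative choices:

```python
# Numerical check of Proposition 6.52(b) on E: y^2 = x^3 + x over F_13.
# Here alpha = 5 satisfies alpha^2 = 25 ≡ -1 (mod 13), so the distortion
# map phi(x, y) = (-x, alpha*y) can be tested without extension fields.
p, A = 13, 1
alpha = 5
assert (alpha * alpha) % p == p - 1        # alpha^2 = -1

def ec_add(P, Q):
    """Affine addition on y^2 = x^3 + A*x over F_p; None represents O."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3*x1*x1 + A) * pow(2*y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam*lam - x1 - x2) % p
    return (x3, (lam*(x1 - x3) - y1) % p)

def phi(P):
    """The distortion map phi(x, y) = (-x, alpha*y), phi(O) = O."""
    if P is None:
        return None
    x, y = P
    return ((-x) % p, alpha * y % p)

P1, P2 = (2, 6), (3, 2)                    # points on y^2 = x^3 + x over F_13
for pt in (P1, P2):
    x, y = pt
    assert (y*y - (x**3 + x)) % p == 0     # both points lie on the curve
    xf, yf = phi(pt)
    assert (yf*yf - (xf**3 + xf)) % p == 0 # phi(pt) lies on the curve too

# phi respects the addition law (addition and doubling):
assert phi(ec_add(P1, P2)) == ec_add(phi(P1), phi(P2))
assert phi(ec_add(P1, P1)) == ec_add(phi(P1), phi(P1))
print("phi(P1 + P2) =", phi(ec_add(P1, P2)))  # prints (2, 7)
```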
An Introduction to Mathematical Cryptography-Springer-.pdf

  • 1. UndergraduateTexts in Mathematics Jeffrey Hoffstein Jill Pipher Joseph H. Silverman An Introduction to Mathematical Cryptography SecondEdition
  • 3. Undergraduate Texts in Mathematics Series Editors: Sheldon Axler San Francisco State University, San Francisco, CA, USA Kenneth Ribet University of California, Berkeley, CA, USA Advisory Board: Colin Adams, Williams College, Williamstown, MA, USA Alejandro Adem, University of British Columbia, Vancouver, BC, Canada Ruth Charney, Brandeis University, Waltham, MA, USA Irene M. Gamba, The University of Texas at Austin, Austin, TX, USA Roger E. Howe, Yale University, New Haven, CT, USA David Jerison, Massachusetts Institute of Technology, Cambridge, MA, USA Jeffrey C. Lagarias, University of Michigan, Ann Arbor, MI, USA Jill Pipher, Brown University, Providence, RI, USA Fadil Santosa, University of Minnesota, Minneapolis, MN, USA Amie Wilkinson, University of Chicago, Chicago, IL, USA Undergraduate Texts in Mathematics are generally aimed at third- and fourth- year undergraduate mathematics students at North American universities. These texts strive to provide students and teachers with new perspectives and novel approaches. The books include motivation that guides the reader to an appreciation of interre- lations among different aspects of the subject. They feature examples that illustrate key concepts as well as exercises that strengthen understanding. More information about this series at http://www.springer.com/series/666
  • 4. Jeffrey Hoffstein • Jill Pipher Joseph H. Silverman An Introduction to Mathematical Cryptography Second Edition 123
  • 5. Jeffrey Hoffstein Department of Mathematics Brown University Providence, RI, USA Joseph H. Silverman Department of Mathematics Brown University Providence, RI, USA Jill Pipher Department of Mathematics Brown University Providence, RI, USA ISSN 0172-6056 ISSN 2197-5604 (electronic) ISBN 978-1-4939-1710-5 ISBN 978-1-4939-1711-2 (eBook) DOI 10.1007/978-1-4939-1711-2 Springer New York Heidelberg Dordrecht London Library of Congress Control Number: 2014946354 © Springer Science+Business Media New York 2008, 2014 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this pub- lication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permis- sions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. 
While the advice and information in this book are believed to be true and accurate at the date of publica- tion, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
  • 6. Preface The creation of public key cryptography by Diffie and Hellman in 1976 and the subsequent invention of the RSA public key cryptosystem by Rivest, Shamir, and Adleman in 1978 are watershed events in the long history of secret com- munications. It is hard to overestimate the importance of public key cryp- tosystems and their associated digital signature schemes in the modern world of computers and the Internet. This book provides an introduction to the theory of public key cryptography and to the mathematical ideas underlying that theory. Public key cryptography draws on many areas of mathematics, including number theory, abstract algebra, probability, and information theory. Each of these topics is introduced and developed in sufficient detail so that this book provides a self-contained course for the beginning student. The only prerequisite is a first course in linear algebra. On the other hand, students with stronger mathematical backgrounds can move directly to cryptographic applications and still have time for advanced topics such as elliptic curve pairings and lattice-reduction algorithms. Among the many facets of modern cryptography, this book chooses to con- centrate primarily on public key cryptosystems and digital signature schemes. This allows for an in-depth development of the necessary mathematics re- quired for both the construction of these schemes and an analysis of their security. The reader who masters the material in this book will not only be well prepared for further study in cryptography, but will have acquired a real understanding of the underlying mathematical principles on which modern cryptography is based. 
Topics covered in this book include Diffie–Hellman key exchange, discrete logarithm based cryptosystems, the RSA cryptosystem, primality testing, factorization algorithms, digital signatures, probability theory, information theory, collision algorithms, elliptic curves, elliptic curve cryptography, pairing-based cryptography, lattices, lattice-based cryptography, and the NTRU cryptosystem. A final chapter very briefly describes some of the many other aspects of modern cryptography (hash functions, pseudorandom number generators, zero-knowledge proofs, digital cash, AES, etc.) and serves to point the reader toward areas for further study.

Electronic Resources: The interested reader will find additional material and a list of errata on the Mathematical Cryptography home page:

www.math.brown.edu/~jhs/MathCryptoHome.html

This web page includes many of the numerical exercises in the book, allowing the reader to cut and paste them into other programs, rather than having to retype them.

No book is ever free from error or incapable of being improved. We would be delighted to receive comments, good or bad, and corrections from our readers. You can send mail to us at

mathcrypto@math.brown.edu

Acknowledgments: We, the authors, would like to thank the following individuals for test-driving this book and for the many corrections and helpful suggestions that they and their students provided: Liat Berdugo, Alexander Collins, Samuel Dickman, Michael Gartner, Nicholas Howgrave-Graham, Su-Ion Ih, Saeja Kim, Yuji Kosugi, Yesem Kurt, Michelle Manes, Victor Miller, David Singer, William Whyte. In addition, we would like to thank the many students at Brown University who took Math 158 and helped us improve the exposition of this book.

Acknowledgments for the Second Edition: We would like to thank the following individuals for corrections and suggestions that have been incorporated into the second edition: Stefanos Aivazidis, Nicole Andre, John B. Baena, Carlo Beenakker, Robert Bond, Reinier Broker, Campbell Hewett, Rebecca Constantine, Stephen Constantine, Christopher Davis, Maria Fox, Steven Galbraith, Motahhareh Gharahi, David Hartz, Jeremy Huddleston, Calvin Jongsma, Maya Kaczorowski, Yamamoto Kato, Jonathan Katz, Chan-Ho Kim, Ariella Kirsch, Martin M. Lauridsen, Kelly McNeilly, Ryo Masuda, Shahab Mirzadeh, Kenneth Ribet, Jeremy Roach, Hemlal Sahum, Ghassan Sarkis, Frederick Schmitt, Christine Schwartz, Wei Shen, David Singer, Michael Soltys, David Spies, Bruce Stephens, Paulo Tanimoto, Patrick Vogt, Sebastian Welsch, Ralph Wernsdorf, Edward White, Pomona College Math 113 (Spring 2009), University of California at Berkeley Math 116 (Spring 2009, 2010).

Providence, USA
Jeffrey Hoffstein
Jill Pipher
Joseph H. Silverman
Contents

Preface v
Introduction xiii

1 An Introduction to Cryptography 1
  1.1 Simple Substitution Ciphers 1
    1.1.1 Cryptanalysis of Simple Substitution Ciphers 4
  1.2 Divisibility and Greatest Common Divisors 10
  1.3 Modular Arithmetic 19
    1.3.1 Modular Arithmetic and Shift Ciphers 23
    1.3.2 The Fast Powering Algorithm 24
  1.4 Prime Numbers, Unique Factorization, and Finite Fields 26
  1.5 Powers and Primitive Roots in Finite Fields 29
  1.6 Cryptography Before the Computer Age 34
  1.7 Symmetric and Asymmetric Ciphers 37
    1.7.1 Symmetric Ciphers 37
    1.7.2 Encoding Schemes 39
    1.7.3 Symmetric Encryption of Encoded Blocks 40
    1.7.4 Examples of Symmetric Ciphers 41
    1.7.5 Random Bit Sequences and Symmetric Ciphers 44
    1.7.6 Asymmetric Ciphers Make a First Appearance 46
  Exercises 47

2 Discrete Logarithms and Diffie–Hellman 61
  2.1 The Birth of Public Key Cryptography 61
  2.2 The Discrete Logarithm Problem 64
  2.3 Diffie–Hellman Key Exchange 67
  2.4 The Elgamal Public Key Cryptosystem 70
  2.5 An Overview of the Theory of Groups 74
  2.6 How Hard Is the Discrete Logarithm Problem? 77
  2.7 A Collision Algorithm for the DLP 81
  2.8 The Chinese Remainder Theorem 83
    2.8.1 Solving Congruences with Composite Moduli 86
  2.9 The Pohlig–Hellman Algorithm 88
  2.10 Rings, Quotients, Polynomials, and Finite Fields 94
    2.10.1 An Overview of the Theory of Rings 95
    2.10.2 Divisibility and Quotient Rings 96
    2.10.3 Polynomial Rings and the Euclidean Algorithm 98
    2.10.4 Polynomial Ring Quotients and Finite Fields 102
  Exercises 107

3 Integer Factorization and RSA 117
  3.1 Euler's Formula and Roots Modulo pq 117
  3.2 The RSA Public Key Cryptosystem 123
  3.3 Implementation and Security Issues 126
  3.4 Primality Testing 128
    3.4.1 The Distribution of the Set of Primes 133
    3.4.2 Primality Proofs Versus Probabilistic Tests 136
  3.5 Pollard's p − 1 Factorization Algorithm 137
  3.6 Factorization via Difference of Squares 141
  3.7 Smooth Numbers and Sieves 150
    3.7.1 Smooth Numbers 150
    3.7.2 The Quadratic Sieve 155
    3.7.3 The Number Field Sieve 162
  3.8 The Index Calculus and Discrete Logarithms 166
  3.9 Quadratic Residues and Quadratic Reciprocity 169
  3.10 Probabilistic Encryption 177
  Exercises 180

4 Digital Signatures 193
  4.1 What Is a Digital Signature? 193
  4.2 RSA Digital Signatures 196
  4.3 Elgamal Digital Signatures and DSA 198
  Exercises 203

5 Combinatorics, Probability, and Information Theory 207
  5.1 Basic Principles of Counting 208
    5.1.1 Permutations 210
    5.1.2 Combinations 211
    5.1.3 The Binomial Theorem 213
  5.2 The Vigenère Cipher 214
    5.2.1 Cryptanalysis of the Vigenère Cipher: Theory 218
    5.2.2 Cryptanalysis of the Vigenère Cipher: Practice 223
  5.3 Probability Theory 228
    5.3.1 Basic Concepts of Probability Theory 228
    5.3.2 Bayes's Formula 233
    5.3.3 Monte Carlo Algorithms 236
    5.3.4 Random Variables 238
    5.3.5 Expected Value 244
  5.4 Collision Algorithms and Meet-in-the-Middle Attacks 246
    5.4.1 The Birthday Paradox 246
    5.4.2 A Collision Theorem 247
    5.4.3 A Discrete Logarithm Collision Algorithm 250
  5.5 Pollard's ρ Method 253
    5.5.1 Abstract Formulation of Pollard's ρ Method 254
    5.5.2 Discrete Logarithms via Pollard's ρ Method 259
  5.6 Information Theory 263
    5.6.1 Perfect Secrecy 263
    5.6.2 Entropy 269
    5.6.3 Redundancy and the Entropy of Natural Language 275
    5.6.4 The Algebra of Secrecy Systems 277
  5.7 Complexity Theory and P Versus NP 278
  Exercises 282

6 Elliptic Curves and Cryptography 299
  6.1 Elliptic Curves 299
  6.2 Elliptic Curves over Finite Fields 306
  6.3 The Elliptic Curve Discrete Logarithm Problem 310
    6.3.1 The Double-and-Add Algorithm 312
    6.3.2 How Hard Is the ECDLP? 315
  6.4 Elliptic Curve Cryptography 316
    6.4.1 Elliptic Diffie–Hellman Key Exchange 316
    6.4.2 Elliptic Elgamal Public Key Cryptosystem 319
    6.4.3 Elliptic Curve Signatures 321
  6.5 The Evolution of Public Key Cryptography 321
  6.6 Lenstra's Elliptic Curve Factorization Algorithm 324
  6.7 Elliptic Curves over F_2 and over F_{2^k} 329
  6.8 Bilinear Pairings on Elliptic Curves 336
    6.8.1 Points of Finite Order on Elliptic Curves 337
    6.8.2 Rational Functions and Divisors on Elliptic Curves 338
    6.8.3 The Weil Pairing 340
    6.8.4 An Efficient Algorithm to Compute the Weil Pairing 343
    6.8.5 The Tate Pairing 346
  6.9 The Weil Pairing over Fields of Prime Power Order 347
    6.9.1 Embedding Degree and the MOV Algorithm 347
    6.9.2 Distortion Maps and a Modified Weil Pairing 350
    6.9.3 A Distortion Map on y^2 = x^3 + x 352
  6.10 Applications of the Weil Pairing 356
    6.10.1 Tripartite Diffie–Hellman Key Exchange 356
    6.10.2 ID-Based Public Key Cryptosystems 358
  Exercises 361

7 Lattices and Cryptography 373
  7.1 A Congruential Public Key Cryptosystem 373
  7.2 Subset-Sum Problems and Knapsack Cryptosystems 377
  7.3 A Brief Review of Vector Spaces 384
  7.4 Lattices: Basic Definitions and Properties 388
  7.5 Short Vectors in Lattices 395
    7.5.1 The Shortest and the Closest Vector Problems 395
    7.5.2 Hermite's Theorem and Minkowski's Theorem 396
    7.5.3 The Gaussian Heuristic 400
  7.6 Babai's Algorithm 403
  7.7 Cryptosystems Based on Hard Lattice Problems 407
  7.8 The GGH Public Key Cryptosystem 409
  7.9 Convolution Polynomial Rings 412
  7.10 The NTRU Public Key Cryptosystem 416
    7.10.1 NTRUEncrypt 417
    7.10.2 Mathematical Problems for NTRUEncrypt 422
  7.11 NTRUEncrypt as a Lattice Cryptosystem 425
    7.11.1 The NTRU Lattice 425
    7.11.2 Quantifying the Security of an NTRU Lattice 427
  7.12 Lattice-Based Digital Signature Schemes 428
    7.12.1 The GGH Digital Signature Scheme 428
    7.12.2 Transcript Analysis 430
    7.12.3 Rejection Sampling 431
    7.12.4 Rejection Sampling Applied to an Abstract Signature Scheme 433
    7.12.5 The NTRU Modular Lattice Signature Scheme 434
  7.13 Lattice Reduction Algorithms 436
    7.13.1 Gaussian Lattice Reduction in Dimension 2 436
    7.13.2 The LLL Lattice Reduction Algorithm 439
    7.13.3 Using LLL to Solve apprCVP 448
    7.13.4 Generalizations of LLL 449
  7.14 Applications of LLL to Cryptanalysis 450
    7.14.1 Congruential Cryptosystems 451
    7.14.2 Applying LLL to Knapsacks 451
    7.14.3 Applying LLL to GGH 452
    7.14.4 Applying LLL to NTRU 453
  Exercises 454

8 Additional Topics in Cryptography 471
  8.1 Hash Functions 472
  8.2 Random Numbers and Pseudorandom Number Generators 474
  8.3 Zero-Knowledge Proofs 477
  8.4 Secret Sharing Schemes 480
  8.5 Identification Schemes 481
  8.6 Padding Schemes and the Random Oracle Model 482
  8.7 Building Protocols from Cryptographic Primitives 485
  8.8 Blind Digital Signatures, Digital Cash, and Bitcoin 487
  8.9 Homomorphic Encryption 490
  8.10 Hyperelliptic Curve Cryptography 494
  8.11 Quantum Computing 497
  8.12 Modern Symmetric Cryptosystems: DES and AES 499

List of Notation 503
References 507
Index 517
Introduction

Principal Goals of (Public Key) Cryptography
• Allow two people to exchange confidential information, even if they have never met and can communicate only via a channel that is being monitored by an adversary.
• Allow a person to attach a digital signature to a document, so that any other person can verify the validity of the signature, but no one can forge a signature on any other document.

The security of communications and commerce in a digital age relies on the modern incarnation of the ancient art of codes and ciphers. Underlying the birth of modern cryptography is a great deal of fascinating mathematics, some of which has been developed for cryptographic applications, but much of which is taken from the classical mathematical canon. The principal goal of this book is to introduce the reader to a variety of mathematical topics while simultaneously integrating the mathematics into a description of modern public key cryptography.

For thousands of years, all codes and ciphers relied on the assumption that the people attempting to communicate, call them Bob and Alice, share a secret key that their adversary, call her Eve, does not possess. Bob uses the secret key to encrypt his message, Alice uses the same secret key to decrypt the message, and poor Eve, not knowing the secret key, is unable to perform the decryption. A disadvantage of these private key cryptosystems is that Bob and Alice need to exchange the secret key before they can get started.

During the 1970s, the astounding idea of public key cryptography burst upon the scene.1 In a public key cryptosystem, Alice has two keys, a public encryption key KPub and a private (secret) decryption key KPri. Alice publishes her public key KPub, and then Adam and Bob and Carl and everyone else can use KPub to encrypt messages and send them to Alice.
The idea underlying public key cryptography is that although everyone in the world knows KPub and can use it to encrypt messages, only Alice, who knows the private key KPri, is able to decrypt messages.

1A brief history of cryptography is given in Sects. 1.6, 2.1, 6.5, and 7.7.
The advantages of a public key cryptosystem are manifold. For example, Bob can send Alice an encrypted message even if they have never previously been in direct contact. But although public key cryptography is a fascinating theoretical concept, it is not at all clear how one might create a public key cryptosystem. It turns out that public key cryptosystems can be based on hard mathematical problems. More precisely, one looks for a mathematical problem that is initially hard to solve, but that becomes easy to solve if one knows some extra piece of information.

Of course, private key cryptosystems have not disappeared. Indeed, they are more important than ever, since they tend to be significantly more efficient than public key cryptosystems. Thus in practice, if Bob wants to send Alice a long message, he first uses a public key cryptosystem to send Alice the key for a private key cryptosystem, and then he uses the private key cryptosystem to encrypt his message. The most efficient modern private key cryptosystems, such as DES and AES, rely for their security on repeated application of various mixing operations that are hard to unmix without the private key. Thus although the subject of private key cryptography is of both theoretical and practical importance, the connection with fundamental underlying mathematical ideas is much less pronounced than it is with public key cryptosystems. For that reason, this book concentrates almost exclusively on public key cryptography, especially public key cryptosystems and digital signatures.

Modern mathematical cryptography draws on many areas of mathematics, including especially number theory, abstract algebra (groups, rings, fields), probability, statistics, and information theory, so the prerequisites for studying the subject can seem formidable.
By way of contrast, the prerequisites for reading this book are minimal, because we take the time to introduce each required mathematical topic in sufficient depth as it is needed. Thus this book provides a self-contained treatment of mathematical cryptography for the reader with limited mathematical background. And for those readers who have taken a course in, say, number theory or abstract algebra or probability, we suggest briefly reviewing the relevant sections as they are reached and then moving on directly to the cryptographic applications.

This book is not meant to be a comprehensive source for all things cryptographic. In the first place, as already noted, we concentrate on public key cryptography. But even within this domain, we have chosen to pursue a small selection of topics to a reasonable mathematical depth, rather than providing a more superficial description of a wider range of subjects. We feel that any reader who has mastered the material in this book will not only be well prepared for further study in cryptography, but will have acquired a real understanding of the underlying mathematical principles on which modern cryptography is based.

However, this does not mean that the omitted topics are unimportant. It simply means that there is a limit to the amount of material that can be included in a book (or course) of reasonable length. As in any text, the
choice of particular topics reflects the authors' tastes and interests. For the convenience of the reader, the final chapter contains a brief survey of areas for further study.

A Guide to Mathematical Topics: This book includes a significant amount of mathematical material on a variety of topics that are useful in cryptography. The following list is designed to help coordinate the mathematical topics that we cover with subjects that the class or reader may have already studied.

Congruences, primes, and finite fields — Sects. 1.2, 1.3, 1.4, 1.5, 2.10.4
The Chinese remainder theorem — Sect. 2.8
Euler's formula — Sect. 3.1
Primality testing — Sect. 3.4
Quadratic reciprocity — Sect. 3.9
Factorization methods — Sects. 3.5, 3.6, 3.7, 6.6
Discrete logarithms — Sects. 2.2, 3.8, 5.4, 5.5, 6.3
Group theory — Sect. 2.5
Rings, polynomials, and quotient rings — Sects. 2.10 and 7.9
Combinatorics and probability — Sects. 5.1 and 5.3
Information and complexity theory — Sects. 5.6 and 5.7
Elliptic curves — Sects. 6.1, 6.2, 6.7, 6.8
Linear algebra — Sect. 7.3
Lattices — Sects. 7.4, 7.5, 7.6, 7.13

Intended Audience and Prerequisites: This book provides a self-contained introduction to public key cryptography and to the underlying mathematics that is required for the subject. It is suitable as a text for advanced undergraduates and beginning graduate students. We provide enough background material so that the book can be used in courses for students with no previous exposure to abstract algebra or number theory. For classes in which the students have a stronger background, the basic mathematical material may be omitted, leaving time for some of the more advanced topics.

The formal prerequisites for this book are few, beyond a facility with high school algebra and, in Chap. 6, analytic geometry. Elementary calculus is used here and there in a minor way, but is not essential, and linear algebra is used in a small way in Chap.
3 and more extensively in Chap. 7. No previous knowledge is assumed for mathematical topics such as number theory, abstract algebra, and probability theory that play a fundamental role in modern cryptography. They are covered in detail as needed. However, it must be emphasized that this is a mathematics book with its share of formal definitions and theorems and proofs. Thus it is expected that the reader has a certain level of mathematical sophistication.

In particular, students who have previously taken a proof-based mathematics course will find the material easier than those without such background. On the other hand, the subject of cryptography is so appealing that this book makes a good text for an introduction-to-proofs course, with the understanding that
the instructor will need to cover the material more slowly to allow the students time to become comfortable with proof-based mathematics.

Suggested Syllabus: This book contains considerably more material than can be comfortably covered by beginning students in a one semester course. However, for more advanced students who have already taken courses in number theory and abstract algebra, it should be possible to do most of the remaining material. We suggest covering the majority of the topics in Chaps. 1–4, possibly omitting some of the more technical topics, the optional material on the Vigenère cipher, and the section on ring theory, which is not used until much later in the book. The next three chapters on information theory (Chap. 5), elliptic curves (Chap. 6), and lattices (Chap. 7) are mostly independent of one another, so the instructor has the choice of covering one or two of them in detail or all of them in less depth. We offer the following syllabus as an example of one of the many possibilities. We have indicated that some sections are optional. Covering the optional material leaves less time for the later chapters at the end of the course.

Chapter 1. An Introduction to Cryptography. Cover all sections.

Chapter 2. Discrete Logarithms and Diffie–Hellman. Cover Sects. 2.1–2.7. Optionally cover the more mathematically sophisticated Sects. 2.8–2.9 on the Pohlig–Hellman algorithm. Omit Sect. 2.10 on first reading.

Chapter 3. Integer Factorization and RSA. Cover Sects. 3.1–3.5 and 3.9–3.10. Optionally, cover the more mathematically sophisticated Sects. 3.6–3.8, dealing with smooth numbers, sieves, and the index calculus.

Chapter 4. Digital Signatures. Cover all sections.

Chapter 5. Probability Theory and Information Theory. Cover Sects. 5.1, 5.3, and 5.4. Optionally cover the more mathematically sophisticated sections on Pollard's ρ method (Sect. 5.5), information theory (Sect. 5.6), and complexity theory (Sect. 5.7).
The material on the Vigenère cipher in Sect. 5.2 nicely illustrates the use of statistics in cryptanalysis, but is somewhat off the main path.

Chapter 6. Elliptic Curves. Cover Sects. 6.1–6.4. Cover other sections as time permits, but note that Sects. 6.7–6.10 on pairings require finite fields of prime power order, which are described in Sect. 2.10.4.

Chapter 7. Lattices and Cryptography. Cover Sects. 7.1–7.8. (If time is short, one may omit either or both of Sects. 7.1 and 7.2.) Cover either Sects. 7.13–7.14 on the LLL lattice reduction algorithm or Sects. 7.9–7.11 on the NTRU cryptosystem, or
both, as time permits. (The NTRU sections require the material on polynomial rings and quotient rings covered in Sect. 2.10.)

Chapter 8. Additional Topics in Cryptography. The material in this chapter points the reader toward other important areas of cryptography. It provides a good list of topics and references for student term papers and presentations.

Further Notes for the Instructor: Depending on how much of the harder mathematical material in Chaps. 2–5 is covered, there may not be time to delve into both Chaps. 6 and 7, so the instructor may need to omit either elliptic curves or lattices in order to fit the other material into one semester. We feel that it is helpful for students to gain an appreciation of the origins of their subject, so we have scattered a handful of sections throughout the book containing some brief comments on the history of cryptography. Instructors who want to spend more time on mathematics may omit these sections without affecting the mathematical narrative.

Changes in the Second Edition:
• The chapter on digital signatures has been moved, since we felt that this important topic should be covered earlier in the course. More precisely, RSA, Elgamal, and DSA signatures are now described in the short Chap. 4, while the material on elliptic curve signatures is covered in the brief Sect. 6.4.3. The two sections on lattice-based signatures from the first edition have been extensively rewritten and now appear as Sect. 7.12.
• Numerous new exercises have been included.
• Numerous typographical and minor mathematical errors have been corrected, and notation has been made more consistent from chapter to chapter.
• Various explanations have been rewritten or expanded for clarity, especially in Chaps. 5–7.
• New sections on digital cash and on homomorphic encryption have been added to the additional topics in Chap. 8; see Sects. 8.8 and 8.9.
Chapter 1

An Introduction to Cryptography

1.1 Simple Substitution Ciphers

As Julius Caesar surveys the unfolding battle from his hilltop outpost, an exhausted and disheveled courier bursts into his presence and hands him a sheet of parchment containing gibberish:

j s j r d k f q q n s l g f h p g w j f p y m w t z l m n r r n s j s y q z h n z x

Within moments, Julius sends an order for a reserve unit of charioteers to speed around the left flank and exploit a momentary gap in the opponent's formation. How did this string of seemingly random letters convey such important information?

The trick is easy, once it is explained. Simply take each letter in the message and shift it five letters up the alphabet. Thus j in the ciphertext becomes e in the plaintext,1 because e is followed in the alphabet by f,g,h,i,j. Applying this procedure to the entire ciphertext yields

j s j r d k f q q n s l g f h p g w j f p y m w t z l m n r r n s j s y q z h n z x
e n e m y f a l l i n g b a c k b r e a k t h r o u g h i m m i n e n t l u c i u s

The second line is the decrypted plaintext, and breaking it into words and supplying the appropriate punctuation, Julius reads the message

Enemy falling back. Breakthrough imminent. Lucius.

1The plaintext is the original message in readable form and the ciphertext is the encrypted message.

© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography, Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2_1
There remains one minor quirk that must be addressed. What happens when Julius finds a letter such as d? There is no letter appearing five letters before d in the alphabet. The answer is that he must wrap around to the end of the alphabet. Thus d is replaced by y, since y is followed by z,a,b,c,d.

This wrap-around effect may be conveniently visualized by placing the alphabet abcd...xyz around a circle, rather than in a line. If a second alphabet circle is then placed within the first circle and the inner circle is rotated five letters, as illustrated in Fig. 1.1, the resulting arrangement can be used to easily encrypt and decrypt Caesar's messages. To decrypt a letter, simply find it on the inner wheel and read the corresponding plaintext letter from the outer wheel. To encrypt, reverse this process: find the plaintext letter on the outer wheel and read off the ciphertext letter from the inner wheel. And note that if you build a cipherwheel whose inner wheel spins, then you are no longer restricted to always shifting by exactly five letters. Cipher wheels of this sort have been used for centuries.2

Although the details of the preceding scene are entirely fictional, and in any case it is unlikely that a message to a Roman general would have been written in modern English(!), there is evidence that Caesar employed this early method of cryptography, which is sometimes called the Caesar cipher in his honor. It is also sometimes referred to as a shift cipher, since each letter in the alphabet is shifted up or down. Cryptography, the methodology of concealing the content of messages, comes from the Greek root words kryptos, meaning hidden,3 and graphikos, meaning writing. The modern scientific study of cryptography is sometimes referred to as cryptology.

In the Caesar cipher, each letter is replaced by one specific substitute letter.
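The shift rule with its wrap-around effect is exactly addition modulo 26: identify the letters a–z with the numbers 0–25, add the shift, and reduce mod 26. A minimal Python sketch (the function names here are our own, not the book's):

```python
# Caesar (shift) cipher: identify a..z with 0..25 and work modulo 26.
def shift_encrypt(plaintext, shift=5):
    """Shift each lowercase letter 'shift' places up the alphabet, wrapping around."""
    return "".join(chr((ord(c) - ord("a") + shift) % 26 + ord("a")) for c in plaintext)

def shift_decrypt(ciphertext, shift=5):
    # Decryption is just encryption with the opposite shift.
    return shift_encrypt(ciphertext, -shift)

# Caesar's message from the opening scene, decrypted with a shift of five.
cipher = "jsjrdkfqqnslgfhpgwjfpymwtzlmnrrnsjsyqzhnzx"
print(shift_decrypt(cipher))  # enemyfallingbackbreakthroughimminentlucius
```

Note that decrypting with shift k is the same as encrypting with shift 26 − k, so this one function family describes both directions of the cipher wheel.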
However, if Bob encrypts a message for Alice4 using a Caesar cipher and allows the encrypted message to fall into Eve's hands, it will take Eve very little time to decrypt it. All she needs to do is try each of the 26 possible shifts.

Bob can make his message harder to attack by using a more complicated replacement scheme. For example, he could replace every occurrence of a by z and every occurrence of z by a, every occurrence of b by y and every occurrence of y by b, and so on, exchanging each pair of letters c ↔ x, ..., m ↔ n. This is an example of a simple substitution cipher, that is, a cipher in which each letter is replaced by another letter (or some other type of symbol).

2A cipher wheel with mixed up alphabets and with encryption performed using different offsets for different parts of the message is featured in a fifteenth century monograph by Leon Batista Alberti [63].
3The word cryptic, meaning hidden or occult, appears in 1638, while crypto- as a prefix for concealed or secret makes its appearance in 1760. The term cryptogram appears much later, first occurring in 1880.
4In cryptography, it is traditional for Bob and Alice to exchange confidential messages and for their adversary Eve, the eavesdropper, to intercept and attempt to read their messages. This makes the field of cryptography much more personal than other areas of mathematics and computer science, whose denizens are often X and Y!
[Figure 1.1: A cipher wheel with an offset of five letters. The inner (ciphertext) and outer (plaintext) wheels pair up as F–a, G–b, H–c, I–d, J–e, K–f, L–g, M–h, N–i, O–j, P–k, Q–l, R–m, S–n, T–o, U–p, V–q, W–r, X–s, Y–t, Z–u, A–v, B–w, C–x, D–y, E–z.]

The Caesar cipher is an example of a simple substitution cipher, but there are many simple substitution ciphers other than the Caesar cipher. In fact, a simple substitution cipher may be viewed as a rule or function

{a,b,c,d,e,...,x,y,z} −→ {A,B,C,D,E,...,X,Y,Z}

assigning each plaintext letter in the domain a different ciphertext letter in the range. (To make it easier to distinguish the plaintext from the ciphertext, we write the plaintext using lowercase letters and the ciphertext using uppercase letters.) Note that in order for decryption to work, the encryption function must have the property that no two plaintext letters go to the same ciphertext letter. A function with this property is said to be one-to-one or injective.

A convenient way to describe the encryption function is to create a table by writing the plaintext alphabet in the top row and putting each ciphertext letter below the corresponding plaintext letter.

Example 1.1. A simple substitution encryption table is given in Table 1.1. The ciphertext alphabet (the uppercase letters in the bottom row) is a randomly chosen permutation of the 26 letters in the alphabet. In order to encrypt the plaintext message Four score and seven years ago, we run the words together, look up each plaintext letter in the encryption table, and write the corresponding ciphertext letter below.

f o u r s c o r e a n d s e v e n y e a r s a g o
N U R B K S U B V C G Q K V E V G Z V C B K C F U
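In code, such an encryption table is just a lookup structure pairing each lowercase plaintext letter with its uppercase ciphertext letter, and the decryption table is its inverse. A sketch using the ciphertext alphabet of Example 1.1 (the helper names are ours):

```python
import string

# Ciphertext alphabet of Example 1.1 (Table 1.1), listed under a..z.
CIPHER_ALPHABET = "CISQVNFOWAXMTGUHPBKLREYDZJ"

ENCRYPT_TABLE = dict(zip(string.ascii_lowercase, CIPHER_ALPHABET))
# The decryption table (Table 1.2) is simply the inverted dictionary.
DECRYPT_TABLE = {c: p for p, c in ENCRYPT_TABLE.items()}

def substitution_encrypt(plaintext):
    """Encrypt, dropping spaces and punctuation, and group into five-letter blocks."""
    letters = [ENCRYPT_TABLE[c] for c in plaintext.lower() if c.isalpha()]
    return " ".join("".join(letters[i:i + 5]) for i in range(0, len(letters), 5))

def substitution_decrypt(ciphertext):
    return "".join(DECRYPT_TABLE[c] for c in ciphertext if c.isalpha())

print(substitution_encrypt("Four score and seven years ago"))  # NURBK SUBVC GQKVE VGZVC BKCFU
```

Because the table is one-to-one, inverting the dictionary always succeeds; a table that mapped two plaintext letters to the same ciphertext letter would lose information and could not be decrypted.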
  • 22. 4 1. An Introduction to Cryptography a b c d e f g h i j k l m n o p q r s t u v w x y z C I S Q V N F O W A X M T G U H P B K L R E Y D Z J Table 1.1: Simple substitution encryption table j r a x v g n p b z s t l f h q d u c m o e i k w y A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Table 1.2: Simple substitution decryption table It is then customary to write the ciphertext in five-letter blocks: NURBK SUBVC GQKVE VGZVC BKCFU Decryption is a similar process. Suppose that we receive the message GVVQG VYKCM CQQBV KKWGF SCVKC B and that we know that it was encrypted using Table 1.1. We can reverse the encryption process by finding each ciphertext letter in the second row of Table 1.1 and writing down the corresponding letter from the top row. However, since the letters in the second row of Table 1.1 are all mixed up, this is a somewhat inefficient process. It is better to make a decryption table in which the ciphertext letters in the lower row are listed in alphabetical order and the corresponding plaintext letters in the upper row are mixed up. We have done this in Table 1.2. Using this table, we easily decrypt the message. G V V Q G V Y K C M C Q Q B V K K W G F S C V K C B n e e d n e w s a l a d d r e s s i n g c a e s a r Putting in the appropriate word breaks and some punctuation reveals an urgent request! Need new salad dressing. -Caesar 1.1.1 Cryptanalysis of Simple Substitution Ciphers How many different simple substitution ciphers exist? We can count them by enumerating the possible ciphertext values for each plaintext letter. First we assign the plaintext letter a to one of the 26 possible ciphertext letters A–Z. So there are 26 possibilities for a. Next, since we are not allowed to assign b to the same letter as a, we may assign b to any one of the remaining 25 ciphertext letters. So there are 26 · 25 = 650 possible ways to assign a and b. We have now used up two of the ciphertext letters, so we may assign c to any one of
the remaining 24 ciphertext letters. And so on. . . . Thus the total number of ways to assign the 26 plaintext letters to the 26 ciphertext letters, using each ciphertext letter only once, is

26 · 25 · 24 · · · 4 · 3 · 2 · 1 = 26! = 403291461126605635584000000.

There are thus more than 10^26 different simple substitution ciphers. Each associated encryption table is known as a key. Suppose that Eve intercepts one of Bob’s messages and that she attempts to decrypt it by trying every possible simple substitution cipher. The process of decrypting a message without knowing the underlying key is called cryptanalysis. If Eve (or her computer) is able to check one million cipher alphabets per second, it would still take her more than 10^13 years to try them all.⁵ But the age of the universe is estimated to be on the order of 10^10 years. Thus Eve has almost no chance of decrypting Bob’s message, which means that Bob’s message is secure and he has nothing to worry about!⁶ Or does he?

It is time for an important lesson in the practical side of the science of cryptography: Your opponent always uses her best strategy to defeat you, not the strategy that you want her to use. Thus the security of an encryption system depends on the best known method to break it. As new and improved methods are developed, the level of security can only get worse, never better.

Despite the large number of possible simple substitution ciphers, they are actually quite easy to break, and indeed many newspapers and magazines feature them as a companion to the daily crossword puzzle. The reason that Eve can easily cryptanalyze a simple substitution cipher is that the letters in the English language (or any other human language) are not random. To take an extreme example, the letter q in English is virtually always followed by the letter u.
More useful is the fact that certain letters such as e and t appear far more frequently than other letters such as f and c. Table 1.3 lists the letters with their typical frequencies in English text. As you can see, the most frequent letter is e, followed by t, a, o, and n. Thus if Eve counts the letters in Bob’s encrypted message and makes a frequency table, it is likely that the most frequent letter will represent e, and that t, a, o, and n will appear among the next most frequent letters. In this way, Eve can try various possibilities and, after a certain amount of trial and error, decrypt Bob’s message.

⁵Do you see how we got 10^13 years? There are 60 · 60 · 24 · 365 s in a year, and 26! divided by 10^6 · 60 · 60 · 24 · 365 is approximately 10^13.107.
⁶The assertion that a large number of possible keys, in and of itself, makes a cryptosystem secure, has appeared many times in history and has equally often been shown to be fallacious.
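The arithmetic behind footnote 5 takes only a line or two to check (a quick sanity check, not part of the book’s text):

```python
import math

keys = math.factorial(26)                  # number of simple substitution ciphers, 26!
seconds_per_year = 60 * 60 * 24 * 365
years = keys / (10**6 * seconds_per_year)  # at one million keys tested per second
# math.log10(years) is about 13.1, i.e. more than 10^13 years.
```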
  • 24. 6 1. An Introduction to Cryptography By decreasing frequency E 13.11 % M 2.54 % T 10.47 % U 2.46 % A 8.15 % G 1.99 % O 8.00 % Y 1.98 % N 7.10 % P 1.98 % R 6.83 % W 1.54 % I 6.35 % B 1.44 % S 6.10 % V 0.92 % H 5.26 % K 0.42 % D 3.79 % X 0.17 % L 3.39 % J 0.13 % F 2.92 % Q 0.12 % C 2.76 % Z 0.08 % In alphabetical order A 8.15 % N 7.10 % B 1.44 % O 8.00 % C 2.76 % P 1.98 % D 3.79 % Q 0.12 % E 13.11 % R 6.83 % F 2.92 % S 6.10 % G 1.99 % T 10.47 % H 5.26 % U 2.46 % I 6.35 % V 0.92 % J 0.13 % W 1.54 % K 0.42 % X 0.17 % L 3.39 % Y 1.98 % M 2.54 % Z 0.08 % Table 1.3: Frequency of letters in English text LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM Table 1.4: A simple substitution cipher to cryptanalyze In the remainder of this section we illustrate how to cryptanalyze a simple substitution cipher by decrypting the message given in Table 1.4. Of course the end result of defeating a simple substitution cipher is not our main goal here. Our key point is to introduce the idea of statistical analysis, which will prove to have many applications throughout cryptography. Although for completeness we provide full details, the reader may wish to skim this material. There are 298 letters in the ciphertext. The first step is to make a frequency table listing how often each ciphertext letter appears (Table 1.5). 
J L D G Y S O N M P E V Q C T W U K I X Z B A F R H Freq 32 28 27 24 23 22 19 18 17 15 12 12 8 8 7 6 6 5 4 3 1 1 0 0 0 0 % 11 9 9 8 8 7 6 6 6 5 4 4 3 3 2 2 2 2 1 1 0 0 0 0 0 0 Table 1.5: Frequency table for Table 1.4—Ciphertext length: 298 The ciphertext letter J appears most frequently, so we make the provisional guess that it corresponds to the plaintext letter e. The next most frequent ciphertext letters are L (28 times) and D (27 times), so we might guess from Table 1.3 that they represent t and a. However, the letter frequencies in a
  • 25. 1.1. Simple Substitution Ciphers 7 th he an re er in on at nd st es en of te ed 168 132 92 91 88 86 71 68 61 53 52 51 49 46 46 (a) Most common English bigrams (frequency per 1000 words) LO OJ GY DN VD YL DL DM SN KD LY NG OY JD SK EP JG SV JM JQ 9 7 6 each 5 each 4 each (b) Most common bigrams appearing in the ciphertext in Table 1.4 Table 1.6: Bigram frequencies short message are unlikely to exactly match the percentages in Table 1.3. All that we can say is that among the ciphertext letters L, D, G, Y, and S are likely to appear several of the plaintext letters t, a, o, n, and r. There are several ways to proceed. One method is to look at bigrams, which are pairs of consecutive letters. Table 1.6a lists the bigrams that most frequently appear in English, and Table 1.6b lists the ciphertext bigrams that appear most frequently in our message. The ciphertext bigrams LO and OJ appear frequently. We have already guessed that J = e, and based on its fre- quency we suspect that L is likely to represent one of the letters t, a, o, n, or r. Since the two most frequent English bigrams are th and he, we make the tentative identifications LO = th and OJ = he. We substitute the guesses J = e, L = t, and O = h, into the ciphertext, writing the putative plaintext letter below the corresponding ciphertext letter. 
LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC the-- -te-- ----e ----- --e-t ---e- --e-- ----t --t-h ----- GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG ---e- ----- --e-- --e-e t---t h---- ----- ---tt h---h t-h-- ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ ----- ----- --e-e ----- ----- -e--- ----- ----- --t-- ----e CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD ----- ---t- -t--- ----- -h--- e---t ----e --t-t he--- --t-- LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM te-th -t--t --the --e-- -e-th e---- e--e- ---h- -hheh ----- YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM --e-- tthe- the-- --ht- e---- ----e -h--- ---e- ----- -e- At this point, we can look at the fragments of plaintext and attempt to guess some common English words. For example, in the second line we see the three blocks
  • 26. 8 1. An Introduction to Cryptography VSGLL OSCIO LGOYG, ---tt h---h t-h--. Looking at the fragment th---ht, we might guess that this is the word thought, which gives three more equivalences, S = o, C = u, I = g. This yields LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC the-- -te-- ----e ----- o-e-t ---e- --e-- -o--t --t-h o---u GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG ---eo --g-- --eo- --e-e to--t ho--- ----- -o-tt hough t-h-- ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ -o--- u--o- --e-e ----- ----- -e--- o---- --o-o --t-o --o-e CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD u---- -o-t- -t--- g-ou- -h--- e-u-t ----e --tot heu-- --t-- LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM te-th -tu-t --the --e-- -e-th e--o- e--e- ---h- -hheh ----- YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM --e-- tthe- the-- -ght- e---o ----e -h--- ---e- -o--- -e- Now look at the three letters ght in the last line. They must be preceded by a vowel, and the only vowels left are a and i, so we guess that Y = i. Then we find the letters itio in the third line, and we guess that they are followed by an n, which gives N = n. (There is no reason that a letter cannot represent itself, although this is often forbidden in the puzzle ciphers that appear in newspapers.) 
We now have LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC the-- ite-- --i-e ----- o-ent ---e- --e-- ion-t -it-h o---u GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG ---eo --g-- n-eo- -ne-e to--t ho--- -n-in -o-tt hough t-hi- ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ -on-- u-ion --e-e --in- ---i- -e--- o--n- --o-o -itio n-o-e CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD u--i- -o-t- -t-in g-ou- -hi-- e-u-t ----e --tot heuni niti- LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM te-th -tunt i-the --e-- ne-th e--o- e--e- ---hi -hheh ----- YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM i-e-- tthe- the-- ight- e---o n-i-e -hi-- --ne- -o--n -e- So far, we have reconstructed the following plaintext/ciphertext pairs: J L D G Y S O N M P E V Q C T W U K I X Z B A F R H e t - - i o h n - - - - - u - - - - g - - - - - - - Freq 32 28 27 24 23 22 19 18 17 15 12 12 8 8 7 6 6 5 4 3 1 1 0 0 0 0
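Rewriting the message after each new guess is also easy to automate. A small helper (names are ours) applies the guesses accumulated so far, writing a dash for each still-unknown letter, exactly as in the displays above:

```python
# The guesses accumulated so far: J=e, L=t, O=h, S=o, C=u, I=g, Y=i, N=n.
guesses = {"J": "e", "L": "t", "O": "h", "S": "o",
           "C": "u", "I": "g", "Y": "i", "N": "n"}

def partial_decrypt(block, key):
    """Replace guessed ciphertext letters, writing '-' for the unknown ones."""
    return "".join(key.get(ch, "-") for ch in block)

first_blocks = "LOJUM YLJME PDYVJ QXTDV SVJNL".split()
putative = [partial_decrypt(b, guesses) for b in first_blocks]
```

Applied to the first five blocks of the ciphertext, this reproduces the putative plaintext `the-- ite-- --i-e ----- o-ent` shown above.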
  • 27. 1.1. Simple Substitution Ciphers 9 Recall that the most common letters in English (Table 1.3) are, in order of decreasing frequency, e, t, a, o, n, r, i, s, h. We have already assigned ciphertext values to e, t, o, n, i, h, so we guess that D and G represent two of the three letters a, r, s. In the third line we notice that GYLYSN gives -ition, so clearly G must be s. Similarly, on the fifth line we have LJQLO DLCNL equal to te-th -tunt, so D must be a, not r. Substituting these new pairs G = s and D = a gives LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC the-- ite-- -ai-e ---a- o-ent a--e- --ess ionat -it-h o-a-u GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG s--eo -ag-a n-eo- ane-e to-at ho-a- ansin -ostt hough tshis ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ -on-- usion s-e-e asin- a--i- -eass o-an- --o-o sitio nso-e CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD u--i- sosta -t-in g-ou- -his- esu-t sa--e a-tot heuni nitia LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM te-th atunt i-the --ea- ne-th e--o- esses ---hi -hheh a-a-- YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM i-e-a tthe- the-- ight- e---o nsi-e -hi-a sane- -o-an -e- It is now easy to fill in additional pairs by inspection. For example, the missing letter in the fragment atunt i-the on the fifth line must be l, which gives P = l, and the missing letter in the fragment -osition on the third line must be p, which gives W = p. Substituting these in, we find the fragment e-p-ession on the first line, which gives Z = x and M = r, and the fragment -on-lusion on the third line, which gives E = c. Then consi-er on the last line gives Q = d and the initial words the-riterclai-e- must be the phrase “the writer claimed,” yielding U = w and V = m. 
This gives LOJUM YLJME PDYVJ QXTDV SVJNL DMTJZ WMJGG YSNDL UYLEO SKDVC thewr iterc laime d--am oment ar-ex press ionat witch o-amu GEPJS MDIPD NEJSK DNJTJ LSKDL OSVDV DNGYN VSGLL OSCIO LGOYG scleo ragla nceo- ane-e to-at homam ansin mostt hough tshis ESNEP CGYSN GUJMJ DGYNK DPPYX PJDGG SVDNT WMSWS GYLYS NGSKJ concl usion swere asin- alli- leass oman- propo sitio nso-e CEPYQ GSGLD MLPYN IUSCP QOYGM JGCPL GDWWJ DMLSL OJCNY NYLYD uclid sosta rtlin gwoul dhisr esult sappe artot heuni nitia LJQLO DLCNL YPLOJ TPJDM NJQLO JWMSE JGGJG XTUOY EOOJO DQDMM tedth atunt ilthe -lear nedth eproc esses --whi chheh adarr YBJQD LLOJV LOJTV YIOLU JPPES NGYQJ MOYVD GDNJE MSVDN EJM i-eda tthem the-m ightw ellco nside rhima sanec roman cer It is now a simple matter to fill in the few remaining letters and put in the appropriate word breaks, capitalization, and punctuation to recover the plaintext:
  • 28. 10 1. An Introduction to Cryptography The writer claimed by a momentary expression, a twitch of a mus- cle or a glance of an eye, to fathom a man’s inmost thoughts. His conclusions were as infallible as so many propositions of Euclid. So startling would his results appear to the uninitiated that until they learned the processes by which he had arrived at them they might well consider him as a necromancer.7 1.2 Divisibility and Greatest Common Divisors Much of modern cryptography is built on the foundations of algebra and number theory. So before we explore the subject of cryptography, we need to develop some important tools. In the next four sections we begin this devel- opment by describing and proving fundamental results in these areas. If you have already studied number theory in another course, a brief review of this material will suffice. But if this material is new to you, then it is vital to study it closely and to work out the exercises provided at the end of the chapter. At the most basic level, Number Theory is the study of the natural numbers 1, 2, 3, 4, 5, 6, . . . , or slightly more generally, the study of the integers . . . , −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, . . . . The set of integers is denoted by the symbol Z. Integers can be added, sub- tracted, and multiplied in the usual way, and they satisfy all the usual rules of arithmetic (commutative law, associative law, distributive law, etc.). The set of integers with their addition and multiplication rules are an example of a ring. See Sect. 2.10.1 for more about the theory of rings. If a and b are integers, then we can add them, a + b, subtract them, a − b, and multiply them, a · b. In each case, we get an integer as the result. This property of staying inside of our original set after applying operations to a pair of elements is characteristic of a ring. But if we want to stay within the integers, then we are not always able to divide one integer by another. 
For example, we cannot divide 3 by 2, since there is no integer that is equal to 3/2. This leads to the fundamental concept of divisibility.

Definition. Let a and b be integers with b ≠ 0. We say that b divides a, or that a is divisible by b, if there is an integer c such that a = bc. We write b | a to indicate that b divides a. If b does not divide a, then we write b ∤ a.

⁷A Study in Scarlet (Chap. 2), Sir Arthur Conan Doyle.
Example 1.2. We have 847 | 485331, since 485331 = 847 · 573. On the other hand, 355 ∤ 259943, since when we try to divide 259943 by 355, we get a remainder of 83. More precisely, 259943 = 355 · 732 + 83, so 259943 is not an exact multiple of 355.

Remark 1.3. Notice that every integer is divisible by 1. The integers that are divisible by 2 are the even integers, and the integers that are not divisible by 2 are the odd integers.

There are a number of elementary divisibility properties, some of which we list in the following proposition.

Proposition 1.4. Let a, b, c ∈ Z be integers.
(a) If a | b and b | c, then a | c.
(b) If a | b and b | a, then a = ±b.
(c) If a | b and a | c, then a | (b + c) and a | (b − c).

Proof. We leave the proof as an exercise for the reader; see Exercise 1.6.

Definition. A common divisor of two integers a and b is a positive integer d that divides both of them. The greatest common divisor of a and b is, as its name suggests, the largest positive integer d such that d | a and d | b. The greatest common divisor of a and b is denoted gcd(a, b). If there is no possibility of confusion, it is also sometimes denoted by (a, b). (If a and b are both 0, then gcd(a, b) is not defined.)

It is a curious fact that a concept as simple as the greatest common divisor has many applications. We’ll soon see that there is a fast and efficient method to compute the greatest common divisor of any two integers, a fact that has powerful and far-reaching consequences.

Example 1.5. The greatest common divisor of 12 and 18 is 6, since 6 | 12 and 6 | 18 and there is no larger number with this property. Similarly, gcd(748, 2024) = 44. One way to check that this is correct is to make lists of all of the positive divisors of 748 and of 2024.

Divisors of 748 = {1, 2, 4, 11, 17, 22, 34, 44, 68, 187, 374, 748},
Divisors of 2024 = {1, 2, 4, 8, 11, 22, 23, 44, 46, 88, 92, 184, 253, 506, 1012, 2024}.
Examining the two lists, we see that the largest common entry is 44. Even from this small example, it is clear that this is not a very efficient method. If we ever need to compute greatest common divisors of large numbers, we will have to find a more efficient approach.
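The divisor-list method is easy to express directly in code. The sketch below (names are ours) is deliberately naive and only practical for small numbers, which is exactly the point:

```python
def divisors(n):
    """All positive divisors of n, found by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]

def gcd_naive(a, b):
    """gcd as the largest common entry of the two divisor lists."""
    common = set(divisors(a)) & set(divisors(b))
    return max(common)
```

For `a = 748` and `b = 2024` this reproduces the lists above and returns 44, but the amount of work grows with the size of the numbers, which is why an efficient method is needed.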
The key to an efficient algorithm for computing greatest common divisors is division with remainder, which is simply the method of “long division” that you learned in elementary school. Thus if a and b are positive integers and if you attempt to divide a by b, you will get a quotient q and a remainder r, where the remainder r is smaller than b. For example, working out the long division of 230 by 17 shows that 230 divided by 17 gives a quotient of 13 with a remainder of 9. What does this last statement really mean? It means that 230 can be written as

230 = 17 · 13 + 9,

where the remainder 9 is strictly smaller than the divisor 17.

Definition. (Division With Remainder) Let a and b be positive integers. Then we say that a divided by b has quotient q and remainder r if

a = b · q + r with 0 ≤ r < b.

The values of q and r are uniquely determined by a and b; see Exercise 1.14.

Suppose now that we want to find the greatest common divisor of a and b. We first divide a by b to get

a = b · q + r with 0 ≤ r < b. (1.1)

If d is any common divisor of a and b, then it is clear from Eq. (1.1) that d is also a divisor of r. (See Proposition 1.4(c).) Similarly, if e is a common divisor of b and r, then (1.1) shows that e is a divisor of a. In other words, the common divisors of a and b are the same as the common divisors of b and r; hence

gcd(a, b) = gcd(b, r).

We repeat the process, dividing b by r to get another quotient and remainder, say

b = r · q′ + r′ with 0 ≤ r′ < r.

Then the same reasoning shows that

gcd(b, r) = gcd(r, r′).

Continuing this process, the remainders become smaller and smaller, until eventually we get a remainder of 0, at which point the final value gcd(s, 0) = s is equal to the gcd of a and b. We illustrate with an example and then describe the general method, which goes by the name Euclidean algorithm.
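The repeated-division idea translates almost line for line into code. This sketch (names are ours) also records each division step, so a run can be compared with the worked computation that follows:

```python
def euclidean_gcd(a, b):
    """gcd(a, b) by repeated division with remainder, keeping each step."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)        # a = b * q + r with 0 <= r < b
        steps.append((a, b, q, r))
        a, b = b, r                # the old b and r become the new a and b
    return a, steps

g, steps = euclidean_gcd(2024, 748)
```

Here `g` is the last nonzero remainder, and `steps` lists the same divisions that appear in the hand computation for gcd(2024, 748).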
Example 1.6. We compute gcd(2024, 748) using the Euclidean algorithm, which is nothing more than repeated division with remainder. Notice how the b and r values on each line become the new a and b values on the subsequent line:

2024 = 748 · 2 + 528
748 = 528 · 1 + 220
528 = 220 · 2 + 88
220 = 88 · 2 + 44 ← gcd = 44
88 = 44 · 2 + 0

Theorem 1.7 (The Euclidean Algorithm). Let a and b be positive integers with a ≥ b. The following algorithm computes gcd(a, b) in a finite number of steps.
(1) Let r0 = a and r1 = b.
(2) Set i = 1.
(3) Divide ri−1 by ri to get a quotient qi and remainder ri+1,
ri−1 = ri · qi + ri+1 with 0 ≤ ri+1 < ri.
(4) If the remainder ri+1 = 0, then ri = gcd(a, b) and the algorithm terminates.
(5) Otherwise, ri+1 > 0, so set i = i + 1 and go to Step 3.
The division step (Step 3) is executed at most 2 log2(b) + 2 times.

Proof. The Euclidean algorithm consists of a sequence of divisions with remainder as illustrated in Fig. 1.2 (remember that we set r0 = a and r1 = b).

a = b · q1 + r2 with 0 ≤ r2 < b,
b = r2 · q2 + r3 with 0 ≤ r3 < r2,
r2 = r3 · q3 + r4 with 0 ≤ r4 < r3,
r3 = r4 · q4 + r5 with 0 ≤ r5 < r4,
. . .
rt−2 = rt−1 · qt−1 + rt with 0 ≤ rt < rt−1,
rt−1 = rt · qt.
Then rt = gcd(a, b).

Figure 1.2: The Euclidean algorithm step by step

The ri values are strictly decreasing, and as soon as they reach zero the algorithm terminates, which proves that the algorithm does finish in a finite
number of steps. Further, at each iteration of Step 3 we have an equation of the form

ri−1 = ri · qi + ri+1.

This equation implies that any common divisor of ri−1 and ri is also a divisor of ri+1, and similarly it implies that any common divisor of ri and ri+1 is also a divisor of ri−1. Hence

gcd(ri−1, ri) = gcd(ri, ri+1) for all i = 1, 2, 3, . . . . (1.2)

However, as noted earlier, we eventually get to an ri that is zero, say rt+1 = 0. Then rt−1 = rt · qt, so

gcd(rt−1, rt) = gcd(rt · qt, rt) = rt.

But Eq. (1.2) says that this is equal to gcd(r0, r1), i.e., to gcd(a, b), which completes the proof that the last nonzero remainder in the Euclidean algorithm is equal to the greatest common divisor of a and b.

It remains to estimate the efficiency of the algorithm. We noted above that since the ri values are strictly decreasing, the algorithm terminates, and indeed since r1 = b, it certainly terminates in at most b steps. However, this upper bound is far from the truth. We claim that after every two iterations of Step 3, the value of ri is at least cut in half. In other words:

Claim: ri+2 < (1/2) ri for all i = 0, 1, 2, . . . .

We prove the claim by considering two cases.

Case I: ri+1 ≤ (1/2) ri. We know that the ri values are strictly decreasing, so ri+2 < ri+1 ≤ (1/2) ri.

Case II: ri+1 > (1/2) ri. Consider what happens when we divide ri by ri+1. The value of ri+1 is so large that we get ri = ri+1 · 1 + ri+2 with

ri+2 = ri − ri+1 < ri − (1/2) ri = (1/2) ri.

We have now proven our claim that ri+2 < (1/2) ri for all i. Using this inequality repeatedly, we find that

r2k+1 < (1/2) r2k−1 < (1/4) r2k−3 < (1/8) r2k−5 < (1/16) r2k−7 < · · · < (1/2^k) r1 = b/2^k.

Hence if 2^k ≥ b, then r2k+1 < 1, which forces r2k+1 to equal 0 and the algorithm to terminate. In terms of Fig. 1.2, the value of rt+1 is 0, so we have
t + 1 ≤ 2k + 1, and thus t ≤ 2k. Further, there are exactly t divisions performed in Fig. 1.2, so the Euclidean algorithm terminates in at most 2k iterations. Choose the smallest such k, so 2^k ≥ b > 2^(k−1). Then

# of iterations ≤ 2k = 2(k − 1) + 2 < 2 log2(b) + 2,

which completes the proof of Theorem 1.7.

Remark 1.8. We proved that the Euclidean algorithm applied to a and b with a ≥ b requires no more than 2 log2(b) + 2 iterations to compute gcd(a, b). This estimate can be somewhat improved. It has been proven that the Euclidean algorithm takes no more than 1.45 log2(b) + 1.68 iterations, and that the average number of iterations for randomly chosen a and b is approximately 0.85 log2(b) + 0.14; see [66].

Remark 1.9. One way to compute quotients and remainders is by long division, as we did on page 12. You can speed up the process using a simple calculator. The first step is to divide a by b on your calculator, which will give a real number. Throw away the part after the decimal point to get the quotient q. Then the remainder r can be computed as

r = a − b · q.

For example, let a = 2387187 and b = 27573. Then a/b ≈ 86.57697748, so q = 86 and

r = a − b · q = 2387187 − 27573 · 86 = 15909.

If you need just the remainder, you can instead take the decimal part (also sometimes called the fractional part) of a/b and multiply it by b. Continuing with our example, the decimal part of a/b ≈ 86.57697748 is 0.57697748, and multiplying by b = 27573 gives

27573 · 0.57697748 = 15909.00005604.

Rounding this off gives r = 15909.

After performing the Euclidean algorithm on two numbers, we can work our way back up the process to obtain an extremely interesting formula. Before giving the general result, we illustrate with an example.

Example 1.10.
Recall that in Example 1.6 we used the Euclidean algorithm to compute gcd(2024, 748) as follows: 2024 = 748 · 2 + 528 748 = 528 · 1 + 220 528 = 220 · 2 + 88 220 = 88 · 2 + 44 ← gcd = 44 88 = 44 · 2 + 0
We let a = 2024 and b = 748, so the first line says that

528 = a − 2b.

We substitute this into the second line to get

b = (a − 2b) · 1 + 220, so 220 = −a + 3b.

We next substitute the expressions 528 = a − 2b and 220 = −a + 3b into the third line to get

a − 2b = (−a + 3b) · 2 + 88, so 88 = 3a − 8b.

Finally, we substitute the expressions 220 = −a + 3b and 88 = 3a − 8b into the penultimate line to get

−a + 3b = (3a − 8b) · 2 + 44, so 44 = −7a + 19b.

In other words,

−7 · 2024 + 19 · 748 = 44 = gcd(2024, 748),

so we have found a way to write gcd(a, b) as a linear combination of a and b using integer coefficients. In general, it is always possible to write gcd(a, b) as an integer linear combination of a and b, a simple sounding result with many important consequences.

Theorem 1.11 (Extended Euclidean Algorithm). Let a and b be positive integers. Then the equation

au + bv = gcd(a, b)

always has a solution in integers u and v. (See Exercise 1.12 for an efficient algorithm to find a solution.) If (u0, v0) is any one solution, then every solution has the form

u = u0 + (b / gcd(a, b)) · k and v = v0 − (a / gcd(a, b)) · k for some k ∈ Z.

Proof. Look back at Fig. 1.2, which illustrates the Euclidean algorithm step by step. We can solve the first line for r2 = a − b · q1 and substitute it into the second line to get

b = (a − b · q1) · q2 + r3, so r3 = −a · q2 + b · (1 + q1q2).

Next substitute the expressions for r2 and r3 into the third line to get

a − b · q1 = (−a · q2 + b · (1 + q1q2)) · q3 + r4.
After rearranging the terms, this gives

r4 = a · (1 + q2q3) − b · (q1 + q3 + q1q2q3).

The key point is that r4 = a · u + b · v, where u and v are integers. It does not matter that the expressions for u and v in terms of q1, q2, q3 are rather messy. Continuing in this fashion, at each stage we find that ri is the sum of an integer multiple of a and an integer multiple of b. Eventually, we get to rt = a · u + b · v for some integers u and v. But rt = gcd(a, b), which completes the proof of the first part of the theorem. We leave the second part as an exercise (Exercise 1.11).

An especially important case of the extended Euclidean algorithm arises when the greatest common divisor of a and b is 1. In this case we give a and b a special name.

Definition. Let a and b be integers. We say that a and b are relatively prime if gcd(a, b) = 1.

More generally, any equation

Au + Bv = gcd(A, B)

can be reduced to the case of relatively prime numbers by dividing both sides by gcd(A, B). Thus

(A / gcd(A, B)) · u + (B / gcd(A, B)) · v = 1,

where a = A/gcd(A, B) and b = B/gcd(A, B) are relatively prime and satisfy au + bv = 1. For example, we found earlier that 2024 and 748 have greatest common divisor 44 and satisfy

−7 · 2024 + 19 · 748 = 44.

Dividing both sides by 44, we obtain

−7 · 46 + 19 · 17 = 1.

Thus 2024/44 = 46 and 748/44 = 17 are relatively prime, and u = −7 and v = 19 are the coefficients of a linear combination of 46 and 17 that equals 1.

In Example 1.10 we explained how to substitute the values from the Euclidean algorithm in order to solve au + bv = gcd(a, b). Exercise 1.12 describes an efficient computer-oriented algorithm for computing u and v. If a and b are relatively prime, we now describe a more conceptual version of this substitution procedure. We first illustrate with the example a = 73 and b = 25. The Euclidean algorithm gives

73 = 25 · 2 + 23
25 = 23 · 1 + 2
23 = 2 · 11 + 1
2 = 1 · 2 + 0.

We set up a box, using the sequence of quotients 2, 1, 11, and 2, as follows:

        2   1   11   2
0   1   ∗   ∗   ∗    ∗
1   0   ∗   ∗   ∗    ∗

Then the rule to fill in the remaining entries is as follows:

New Entry = (Number at Top) · (Number to the Left) + (Number Two Spaces to the Left).

Thus the two leftmost ∗’s are 2 · 1 + 0 = 2 and 2 · 0 + 1 = 1, so now our box looks like this:

        2   1   11   2
0   1   2   ∗   ∗    ∗
1   0   1   ∗   ∗    ∗

Then the next two leftmost ∗’s are 1 · 2 + 1 = 3 and 1 · 1 + 0 = 1, and then the next two are 11 · 3 + 2 = 35 and 11 · 1 + 1 = 12, and the final entries are 2 · 35 + 3 = 73 and 2 · 12 + 1 = 25. The completed box is

        2   1   11   2
0   1   2   3   35   73
1   0   1   1   12   25

Notice that the last column repeats a and b. More importantly, the next to last column gives the values of −v and u (in that order). Thus in this example we find that

73 · 12 − 25 · 35 = 1.

The general algorithm is given in Fig. 1.3.
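The box-filling rule is easy to automate. In the sketch below (function and variable names are ours), the two rows are seeded with the starting columns 0, 1 and 1, 0, and each new entry is (number at top) · (number to the left) + (number two spaces to the left):

```python
def box_method(quotients):
    """Fill in the two rows of the box from the sequence of quotients."""
    top = [0, 1]     # row that ends in a
    bottom = [1, 0]  # row that ends in b
    for q in quotients:
        top.append(q * top[-1] + top[-2])
        bottom.append(q * bottom[-1] + bottom[-2])
    return top, bottom

top, bottom = box_method([2, 1, 11, 2])  # quotients for a = 73, b = 25
```

For the quotients 2, 1, 11, 2 this reproduces the completed box, and the next-to-last column gives 35 and 12 with 73 · 12 − 25 · 35 = 1.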
In general, if a and b are relatively prime and if q1, q2, . . . , qt is the sequence of quotients obtained from applying the Euclidean algorithm to a and b as in Figure 1.2 on page 13, then the box has the form

        q1   q2   . . .   qt−1   qt
0   1   P1   P2   . . .   Pt−1   a
1   0   Q1   Q2   . . .   Qt−1   b

The entries in the box are calculated using the initial values

P1 = q1, Q1 = 1, P2 = q2 · P1 + 1, Q2 = q2 · Q1,

and then, for i ≥ 3, using the formulas

Pi = qi · Pi−1 + Pi−2 and Qi = qi · Qi−1 + Qi−2.

The final four entries in the box satisfy

a · Qt−1 − b · Pt−1 = (−1)^t.

Multiplying both sides by (−1)^t gives the solution

u = (−1)^t · Qt−1 and v = (−1)^(t+1) · Pt−1

to the equation au + bv = 1.

Figure 1.3: Solving au + bv = 1 using the Euclidean algorithm

1.3 Modular Arithmetic

You may have encountered “clock arithmetic” in grade school, where after you get to 12, the next number is 1. This leads to odd-looking equations such as

6 + 9 = 3 and 2 − 3 = 11.

These look strange, but they are true using clock arithmetic, since for example 11 o’clock is 3 hours before 2 o’clock. So what we are really doing is first computing 2 − 3 = −1 and then adding 12 to the answer. Similarly, 9 hours after 6 o’clock is 3 o’clock, since 6 + 9 − 12 = 3. The theory of congruences is a powerful method in number theory that is based on the simple idea of clock arithmetic.

Definition. Let m ≥ 1 be an integer. We say that the integers a and b are congruent modulo m if their difference a − b is divisible by m. We write

a ≡ b (mod m)

to indicate that a and b are congruent modulo m. The number m is called the modulus.
20 1. An Introduction to Cryptography

Our clock examples may be written as congruences using the modulus m = 12:

6 + 9 = 15 ≡ 3 (mod 12) and 2 − 3 = −1 ≡ 11 (mod 12).

Example 1.12. We have 17 ≡ 7 (mod 5), since 5 divides 10 = 17 − 7. On the other hand, 19 ≢ 6 (mod 11), since 11 does not divide 13 = 19 − 6. Notice that the numbers satisfying a ≡ 0 (mod m) are the numbers that are divisible by m, i.e., the multiples of m.

The reason that congruence notation is so useful is that congruences behave much like equalities, as the following proposition indicates.

Proposition 1.13. Let m ≥ 1 be an integer.
(a) If a_1 ≡ a_2 (mod m) and b_1 ≡ b_2 (mod m), then
a_1 ± b_1 ≡ a_2 ± b_2 (mod m) and a_1 · b_1 ≡ a_2 · b_2 (mod m).
(b) Let a be an integer. Then
a · b ≡ 1 (mod m) for some integer b if and only if gcd(a, m) = 1.
Further, if a · b_1 ≡ a · b_2 ≡ 1 (mod m), then b_1 ≡ b_2 (mod m). We call b the (multiplicative) inverse of a modulo m.

Proof. (a) We leave this as an exercise; see Exercise 1.15.
(b) Suppose first that gcd(a, m) = 1. Then Theorem 1.11 tells us that we can find integers u and v satisfying au + mv = 1. This means that au − 1 = −mv is divisible by m, so by definition, au ≡ 1 (mod m). In other words, we can take b = u.
For the other direction, suppose that a has an inverse modulo m, say a · b ≡ 1 (mod m). This means that ab − 1 = cm for some integer c. It follows that gcd(a, m) divides ab − cm = 1, so gcd(a, m) = 1. This completes the proof that a has an inverse modulo m if and only if gcd(a, m) = 1.
It remains to show that the inverse is unique modulo m. So suppose that a · b_1 ≡ a · b_2 ≡ 1 (mod m). Then
b_1 ≡ b_1 · 1 ≡ b_1 · (a · b_2) ≡ (b_1 · a) · b_2 ≡ 1 · b_2 ≡ b_2 (mod m),
which completes the proof of Proposition 1.13.
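Proposition 1.13(b) is constructive: the proof obtains the inverse as b = u from a solution of au + mv = 1. A sketch of that computation, using a standard iterative extended Euclidean algorithm (an assumed implementation, not code from the book):

```python
def extended_gcd(a, b):
    """Return (g, u, v) with a*u + b*v = g = gcd(a, b)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v

def mod_inverse(a, m):
    """Return b with a*b ≡ 1 (mod m), mirroring Proposition 1.13(b):
    solve a*u + m*v = 1, then b = u reduced modulo m."""
    g, u, v = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return u % m

print(mod_inverse(2, 5))    # 3, since 2*3 = 6 ≡ 1 (mod 5)
print(mod_inverse(4, 15))   # 4, since 4*4 = 16 ≡ 1 (mod 15)
```

Both outputs match Example 1.14 in the text.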
1.3. Modular Arithmetic 21

Proposition 1.13(b) says that if gcd(a, m) = 1, then there exists an inverse b of a modulo m. This has the curious consequence that the fraction a^{−1} = 1/a has a meaningful interpretation in the world of integers modulo m, namely a^{−1} modulo m is the unique number b modulo m satisfying the congruence ab ≡ 1 (mod m).

Example 1.14. We take m = 5 and a = 2. Clearly gcd(2, 5) = 1, so there exists an inverse to 2 modulo 5. The inverse of 2 modulo 5 is 3, since 2 · 3 ≡ 1 (mod 5), so 2^{−1} ≡ 3 (mod 5). Similarly gcd(4, 15) = 1, so 4^{−1} exists modulo 15. In fact 4 · 4 ≡ 1 (mod 15), so 4 is its own inverse modulo 15.
We can even work with fractions a/d modulo m as long as the denominator is relatively prime to m. For example, we can compute 5/7 modulo 11 by first observing that 7 · 8 ≡ 1 (mod 11), so 7^{−1} ≡ 8 (mod 11). Then
5/7 = 5 · 7^{−1} ≡ 5 · 8 ≡ 40 ≡ 7 (mod 11).

Remark 1.15. In the preceding examples it was easy to find inverses modulo m by trial and error. However, when m is large, it is more challenging to compute a^{−1} modulo m. Note that we showed that inverses exist by using the extended Euclidean algorithm (Theorem 1.11). In order to actually compute the u and v that appear in the equation au + mv = gcd(a, m), we can apply the Euclidean algorithm directly as we did in Example 1.10, or we can use the somewhat more efficient box method described at the end of the preceding section, or we can use the algorithm given in Exercise 1.12. In any case, since the Euclidean algorithm takes at most 2 log_2(b) + 2 iterations to compute gcd(a, b), it takes only a small multiple of log_2(m) steps to compute a^{−1} modulo m.

We now continue our development of the theory of modular arithmetic. If a divided by m has quotient q and remainder r, it can be written as
a = m · q + r with 0 ≤ r < m.
This shows that a ≡ r (mod m) for some integer r between 0 and m − 1, so if we want to work with integers modulo m, it is enough to use the integers 0 ≤ r < m.
This prompts the following definition.

Definition. We write
Z/mZ = {0, 1, 2, . . . , m − 1}
and call Z/mZ the ring of integers modulo m. We add and multiply elements of Z/mZ by adding or multiplying them as integers and then dividing the result by m and taking the remainder in order to obtain an element in Z/mZ.

Figure 1.4 illustrates the ring Z/5Z by giving complete addition and multiplication tables modulo 5.
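The tables of Figure 1.4 can be generated mechanically, since every result is just the remainder after division by the modulus; a small sketch:

```python
m = 5  # the modulus; Z/5Z = {0, 1, 2, 3, 4}

# Addition and multiplication tables for Z/mZ, as in Figure 1.4:
# compute in the integers, then reduce modulo m.
add_table = [[(i + j) % m for j in range(m)] for i in range(m)]
mul_table = [[(i * j) % m for j in range(m)] for i in range(m)]

for row in mul_table:
    print(row)
# e.g. the multiplication row for 2 is [0, 2, 4, 1, 3],
# since 2*3 = 6 ≡ 1 and 2*4 = 8 ≡ 3 (mod 5)
```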
22 1. An Introduction to Cryptography

+   0  1  2  3  4        ·   0  1  2  3  4
0   0  1  2  3  4        0   0  0  0  0  0
1   1  2  3  4  0        1   0  1  2  3  4
2   2  3  4  0  1        2   0  2  4  1  3
3   3  4  0  1  2        3   0  3  1  4  2
4   4  0  1  2  3        4   0  4  3  2  1

Figure 1.4: Addition and multiplication tables modulo 5

Remark 1.16. If you have studied ring theory, you will recognize that Z/mZ is the quotient ring of Z by the principal ideal mZ, and that the numbers 0, 1, . . . , m − 1 are actually coset representatives for the congruence classes that comprise the elements of Z/mZ. For a discussion of congruence classes and general quotient rings, see Sect. 2.10.2.

Definition. Proposition 1.13(b) tells us that a has an inverse modulo m if and only if gcd(a, m) = 1. Numbers that have inverses are called units. We denote the set of all units by
(Z/mZ)∗ = {a ∈ Z/mZ : gcd(a, m) = 1} = {a ∈ Z/mZ : a has an inverse modulo m}.
The set (Z/mZ)∗ is called the group of units modulo m.

Notice that if a_1 and a_2 are units modulo m, then so is a_1 a_2. (Do you see why this is true?) So when we multiply two units, we always get a unit. On the other hand, if we add two units, we often do not get a unit.

Example 1.17. The group of units modulo 24 is
(Z/24Z)∗ = {1, 5, 7, 11, 13, 17, 19, 23}.
Similarly, the group of units modulo 7 is
(Z/7Z)∗ = {1, 2, 3, 4, 5, 6},
since every number between 1 and 6 is relatively prime to 7. The multiplication tables for (Z/24Z)∗ and (Z/7Z)∗ are illustrated in Fig. 1.5.

In many of the cryptosystems that we will study, it is important to know how many elements are in the unit group modulo m. This quantity is sufficiently ubiquitous that we give it a name.

Definition. Euler's phi function (also sometimes known as Euler's totient function) is the function φ(m) defined by the rule
φ(m) = #(Z/mZ)∗ = #{0 ≤ a < m : gcd(a, m) = 1}.
For example, we see from Example 1.17 that φ(24) = 8 and φ(7) = 6.
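The unit group and Euler's phi function translate directly into code; a sketch (the function names are ours):

```python
from math import gcd

def unit_group(m):
    """The units (Z/mZ)*: residues a with gcd(a, m) = 1."""
    return [a for a in range(m) if gcd(a, m) == 1]

def phi(m):
    """Euler's phi function: the number of units modulo m."""
    return len(unit_group(m))

print(unit_group(24))   # [1, 5, 7, 11, 13, 17, 19, 23]
print(phi(24), phi(7))  # 8 6

# A product of two units is again a unit: 5 * 7 = 35 ≡ 11 (mod 24).
print(5 * 7 % 24 in unit_group(24))  # True
```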
1.3. Modular Arithmetic 23

·    1   5   7  11  13  17  19  23
1    1   5   7  11  13  17  19  23
5    5   1  11   7  17  13  23  19
7    7  11   1   5  19  23  13  17
11  11   7   5   1  23  19  17  13
13  13  17  19  23   1   5   7  11
17  17  13  23  19   5   1  11   7
19  19  23  13  17   7  11   1   5
23  23  19  17  13  11   7   5   1
Unit group modulo 24

·   1  2  3  4  5  6
1   1  2  3  4  5  6
2   2  4  6  1  3  5
3   3  6  2  5  1  4
4   4  1  5  2  6  3
5   5  3  1  6  4  2
6   6  5  4  3  2  1
Unit group modulo 7

Figure 1.5: The unit groups (Z/24Z)∗ and (Z/7Z)∗

1.3.1 Modular Arithmetic and Shift Ciphers

Recall that the Caesar (or shift) cipher studied in Sect. 1.1 works by shifting each letter in the alphabet a fixed number of letters. We can describe a shift cipher mathematically by assigning a number to each letter as in Table 1.7.

a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Table 1.7: Assigning numbers to letters

Then a shift cipher with shift k takes a plaintext letter corresponding to the number p and assigns it to the ciphertext letter corresponding to the number p + k mod 26. Notice how the use of modular arithmetic, in this case modulo 26, simplifies the description of the shift cipher. The shift amount serves as both the encryption key and the decryption key. Encryption is given by the formula
(Ciphertext Letter) ≡ (Plaintext Letter) + (Secret Key) (mod 26),
and decryption works by shifting in the opposite direction,
24 1. An Introduction to Cryptography

(Plaintext Letter) ≡ (Ciphertext Letter) − (Secret Key) (mod 26).

More succinctly, if we let
p = Plaintext Letter,   c = Ciphertext Letter,   k = Secret Key,
then
c ≡ p + k (mod 26)   (Encryption)
p ≡ c − k (mod 26)   (Decryption).

1.3.2 The Fast Powering Algorithm

In some cryptosystems that we will study, for example the RSA and Diffie–Hellman cryptosystems, Alice and Bob are required to compute large powers of a number g modulo another number N, where N may have hundreds of digits. The naive way to compute g^A is by repeated multiplication by g. Thus
g_1 ≡ g (mod N),   g_2 ≡ g · g_1 (mod N),   g_3 ≡ g · g_2 (mod N),
g_4 ≡ g · g_3 (mod N),   g_5 ≡ g · g_4 (mod N),   . . . .
It is clear that g_A ≡ g^A (mod N), but if A is large, this algorithm is completely impractical. For example, if A ≈ 2^1000, then the naive algorithm would take longer than the estimated age of the universe! Clearly if it is to be useful, we need to find a better way to compute g^A (mod N).
The idea is to use the binary expansion of the exponent A to convert the calculation of g^A into a succession of squarings and multiplications. An example will make the idea clear, after which we give a formal description of the method.

Example 1.18. Suppose that we want to compute 3^218 (mod 1000). The first step is to write 218 as a sum of powers of 2,
218 = 2 + 2^3 + 2^4 + 2^6 + 2^7.
Then 3^218 becomes
3^218 = 3^(2 + 2^3 + 2^4 + 2^6 + 2^7) = 3^2 · 3^(2^3) · 3^(2^4) · 3^(2^6) · 3^(2^7). (1.3)
Notice that it is relatively easy to compute the sequence of values
3, 3^2, 3^(2^2), 3^(2^3), 3^(2^4), . . . ,
since each number in the sequence is the square of the preceding one. Further, since we only need these values modulo 1000, we never need to store more than three digits. Table 1.8 lists the powers of 3 modulo 1000 up to 3^(2^7).
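The shift cipher of Sect. 1.3.1 can be sketched in a few lines, using the letter-to-number assignment of Table 1.7 (the helper names here are hypothetical, not from the text):

```python
A_ORD = ord('a')  # Table 1.7: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25

def shift_encrypt(plaintext, k):
    """Shift cipher encryption: each letter p becomes p + k (mod 26)."""
    return ''.join(chr((ord(ch) - A_ORD + k) % 26 + A_ORD) for ch in plaintext)

def shift_decrypt(ciphertext, k):
    """Decryption shifts in the opposite direction: c - k (mod 26)."""
    return shift_encrypt(ciphertext, -k)

ct = shift_encrypt("attackatdawn", 5)
print(ct)                    # fyyfhpfyifbs
print(shift_decrypt(ct, 5))  # attackatdawn
```

The single key k serves for both encryption and decryption, exactly as in the congruences above.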
Creating Table 1.8 requires only 7 multiplications, despite the fact that the number 3^(2^7) = 3^128 has quite a large exponent, because each successive entry in the table is equal to the square of the previous entry.
1.3. Modular Arithmetic 25

i                    0   1   2    3    4    5    6    7
3^(2^i) (mod 1000)   3   9   81   561  721  841  281  961

Table 1.8: Successive square powers of 3 modulo 1000

We use (1.3) to decide which powers from Table 1.8 are needed to compute 3^218. Thus
3^218 = 3^2 · 3^(2^3) · 3^(2^4) · 3^(2^6) · 3^(2^7)
≡ 9 · 561 · 721 · 281 · 961 (mod 1000)
≡ 489 (mod 1000).
We note that in computing the product 9 · 561 · 721 · 281 · 961, we may reduce modulo 1000 after each multiplication, so we never need to deal with very large numbers. We also observe that it has taken us only 11 multiplications to compute 3^218 (mod 1000), a huge savings over the naive approach. And for larger exponents we would save even more.

The general approach used in Example 1.18 goes by various names, including the Fast Powering Algorithm and the Square-and-Multiply Algorithm.8 We now describe the algorithm more formally.

The Fast Powering Algorithm

Step 1. Compute the binary expansion of A as
A = A_0 + A_1 · 2 + A_2 · 2^2 + A_3 · 2^3 + · · · + A_r · 2^r with A_0, . . . , A_r ∈ {0, 1},
where we may assume that A_r = 1.

Step 2. Compute the powers g^(2^i) (mod N) for 0 ≤ i ≤ r by successive squaring,
a_0 ≡ g (mod N)
a_1 ≡ a_0^2 ≡ g^2 (mod N)
a_2 ≡ a_1^2 ≡ g^(2^2) (mod N)
a_3 ≡ a_2^2 ≡ g^(2^3) (mod N)
. . .
a_r ≡ a_{r−1}^2 ≡ g^(2^r) (mod N).
Each term is the square of the previous one, so this requires r multiplications.

8The first known recorded description of the fast powering algorithm appeared in India before 200 BC, while the first reference outside India dates to around 950 AD. See [66, page 441] for a brief discussion and further references.
26 1. An Introduction to Cryptography

Step 3. Compute g^A (mod N) using the formula
g^A = g^(A_0 + A_1·2 + A_2·2^2 + A_3·2^3 + ··· + A_r·2^r)
= g^(A_0) · (g^2)^(A_1) · (g^(2^2))^(A_2) · (g^(2^3))^(A_3) · · · (g^(2^r))^(A_r)
≡ a_0^(A_0) · a_1^(A_1) · a_2^(A_2) · a_3^(A_3) · · · a_r^(A_r) (mod N). (1.4)
Note that the quantities a_0, a_1, . . . , a_r were computed in Step 2. Thus the product (1.4) can be computed by looking up the values of the a_i's whose exponent A_i is 1 and then multiplying them together. This requires at most another r multiplications.

Running Time. It takes at most 2r multiplications modulo N to compute g^A. Since A ≥ 2^r, we see that it takes at most 2 log_2(A) multiplications9 modulo N to compute g^A. Thus even if A is very large, say A ≈ 2^1000, it is easy for a computer to do the approximately 2000 multiplications needed to calculate g^A modulo N.

Efficiency Issues. There are various ways in which the square-and-multiply algorithm can be made somewhat more efficient, in particular regarding eliminating storage requirements; see Exercise 1.25 for an example.

1.4 Prime Numbers, Unique Factorization, and Finite Fields

In Sect. 1.3 we studied modular arithmetic and saw that it makes sense to add, subtract, and multiply integers modulo m. Division, however, can be problematic, since we can divide by a in Z/mZ only if gcd(a, m) = 1. But notice that if the integer m is a prime, then we can divide by every nonzero element of Z/mZ. We start with a brief discussion of prime numbers before returning to the ring Z/pZ with p prime.

Definition. An integer p is called a prime if p ≥ 2 and if the only positive integers dividing p are 1 and p.

For example, the first ten primes are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
while the hundred thousandth prime is 1299709 and the millionth is 15485863. There are infinitely many primes, a fact that was known in ancient Greece and appears as a theorem in Euclid's Elements. (See Exercise 1.28.)

A prime p is defined in terms of the numbers that divide p.

9Note that log_2(A) means the usual logarithm to the base 2, not the so-called discrete logarithm that will be discussed in Chap. 2.
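The three steps of the fast powering algorithm can be combined into a single loop that processes one bit of A per iteration; a sketch (Python's built-in three-argument pow implements the same idea):

```python
def fast_power(g, A, N):
    """Square-and-multiply: compute g^A (mod N).

    Walks through the binary digits A_0, A_1, ... of A (Step 1),
    squaring a running power of g (Step 2) and multiplying it into
    the result whenever the corresponding bit is 1 (Step 3).
    """
    result = 1
    square = g % N       # a_0 = g; then a_1 = a_0^2, a_2 = a_1^2, ...
    while A > 0:
        if A & 1:        # current bit A_i is 1
            result = (result * square) % N
        square = (square * square) % N
        A >>= 1
    return result

print(fast_power(3, 218, 1000))   # 489, as in Example 1.18
```

Reducing modulo N after every multiplication keeps all intermediate values small, just as in the worked example.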
So the following proposition, which describes a useful property of numbers that are divisible by p, is not obvious and needs to be carefully proved. Notice that the proposi- tion is false for composite numbers. For example, 6 divides 3·10, but 6 divides neither 3 nor 10. 9Note that log2(A) means the usual logarithm to the base 2, not the so-called discrete logarithm that will be discussed in Chap. 2.
1.4. Prime Numbers, Unique Factorization, and Finite Fields 27

Proposition 1.19. Let p be a prime number, and suppose that p divides the product ab of two integers a and b. Then p divides at least one of a and b. More generally, if p divides a product of integers, say
p | a_1 a_2 · · · a_n,
then p divides at least one of the individual a_i.

Proof. Let g = gcd(a, p). Then g | p, so either g = 1 or g = p. If g = p, then p | a (since g | a), so we are done. Otherwise, g = 1, and Theorem 1.11 tells us that we can find integers u and v satisfying au + pv = 1. We multiply both sides of the equation by b to get
abu + pbv = b. (1.5)
By assumption, p divides the product ab, and certainly p divides pbv, so p divides both terms on the left-hand side of (1.5). Hence it divides the right-hand side, which shows that p divides b and completes the proof of Proposition 1.19.
To prove the more general statement, we write the product as a_1(a_2 · · · a_n) and apply the first statement with a = a_1 and b = a_2 · · · a_n. If p | a_1, we're done. Otherwise, p | a_2 · · · a_n, so writing this as a_2(a_3 · · · a_n), the first statement tells us that either p | a_2 or p | a_3 · · · a_n. Continuing in this fashion, we must eventually find some a_i that is divisible by p.

As an application of Proposition 1.19, we prove that every positive integer has an essentially unique factorization as a product of primes.

Theorem 1.20 (The Fundamental Theorem of Arithmetic). Let a ≥ 2 be an integer. Then a can be factored as a product of prime numbers
a = p_1^{e_1} · p_2^{e_2} · p_3^{e_3} · · · p_r^{e_r}.
Further, other than rearranging the order of the primes, this factorization into prime powers is unique.

Proof. It is not hard to prove that every a ≥ 2 can be factored into a product of primes. It is tempting to assume that the uniqueness of the factorization is also obvious. However, this is not the case; unique factorization is a somewhat subtle property of the integers.
We will prove it using the general form of Proposition 1.19. (For an example of a situation in which unique factorization fails to be true, see the E-zone described in [137, Chapter 7].) Suppose that a has two factorizations into products of primes, a = p1p2 · · · ps = q1q2 · · · qt, (1.6) where the pi and qj are all primes, not necessarily distinct, and s does not necessarily equal t. Since p1 | a, we see that p1 divides the product q1q2q3 · · · qt. Thus by the general form of Proposition 1.19, we find that p1 divides one of
28 1. An Introduction to Cryptography

the q_i. Rearranging the order of the q_i if necessary, we may assume that p_1 | q_1. But p_1 and q_1 are both primes, so we must have p_1 = q_1. This allows us to cancel them from both sides of (1.6), which yields
p_2 p_3 · · · p_s = q_2 q_3 · · · q_t.
Repeating this process s times, we ultimately reach an equation of the form
1 = q_{t−s} q_{t−s+1} · · · q_t.
It follows immediately that t = s and that the original factorizations of a were identical up to rearranging the order of the factors. (For a more detailed proof of the fundamental theorem of arithmetic, see any basic number theory textbook, for example [35, 52, 59, 100, 111, 137].)

Definition. The fundamental theorem of arithmetic (Theorem 1.20) says that in the factorization of a positive integer a into primes, each prime p appears to a particular power. We denote this power by ord_p(a) and call it the order (or exponent) of p in a. (For convenience, we set ord_p(1) = 0 for all primes.) For example, the factorization of 1728 is 1728 = 2^6 · 3^3, so
ord_2(1728) = 6,   ord_3(1728) = 3,   and ord_p(1728) = 0 for all primes p ≥ 5.
Using the ord_p notation, the factorization of a can be succinctly written as
a = ∏_{primes p} p^{ord_p(a)}.
Note that this product makes sense, since ord_p(a) is zero for all but finitely many primes. It is useful to view ord_p as a function
ord_p : {1, 2, 3, . . .} → {0, 1, 2, 3, . . .}. (1.7)
This function has a number of interesting properties, some of which are described in Exercise 1.31.

We now observe that if p is a prime, then every nonzero number modulo p has a multiplicative inverse modulo p. This means that when we do arithmetic modulo a prime p, not only can we add, subtract, multiply, but we can also divide by nonzero numbers, just as we can with real numbers. This property of primes is sufficiently important that we formally state it as a proposition.

Proposition 1.21. Let p be a prime.
Then every nonzero element a in Z/pZ has a multiplicative inverse, that is, there is a number b satisfying ab ≡ 1 (mod p). We denote this value of b by a−1 mod p, or if p has already been specified, then simply by a−1 .
1.5. Powers and Primitive Roots in Finite Fields 29

Proof. This proposition is a special case of Proposition 1.13(b) using the prime modulus p, since if a ∈ Z/pZ is not zero, then gcd(a, p) = 1.

Remark 1.22. The extended Euclidean algorithm (Theorem 1.11) gives us an efficient computational method for computing a^{−1} mod p. We simply solve the equation
au + pv = 1
in integers u and v, and then u = a^{−1} mod p. For an alternative method of computing a^{−1} mod p, see Remark 1.26.

Proposition 1.21 can be restated by saying that if p is prime, then
(Z/pZ)∗ = {1, 2, 3, 4, . . . , p − 1}.
In other words, when the 0 element is removed from Z/pZ, the remaining elements are units and closed under multiplication.

Definition. If p is prime, then the set Z/pZ of integers modulo p with its addition, subtraction, multiplication, and division rules is an example of a field. If you have studied abstract algebra (or see Sect. 2.10), you know that a field is the general name for a (commutative) ring in which every nonzero element has a multiplicative inverse. You are already familiar with some other fields, for example the field of real numbers R, the field of rational numbers (fractions) Q, and the field of complex numbers C.

The field Z/pZ of integers modulo p has only finitely many elements. It is a finite field and is often denoted by F_p. Thus F_p and Z/pZ are really just two different notations for the same object.10 Similarly, we write F_p* interchangeably for the group of units (Z/pZ)∗. Finite fields are of fundamental importance throughout cryptography, and indeed throughout all of mathematics.

Remark 1.23. Although Z/pZ and F_p are used to denote the same concept, equality of elements is expressed somewhat differently in the two settings. For a, b ∈ F_p, the equality of a and b is denoted by a = b, while for a, b ∈ Z/pZ, the equality of a and b is denoted by equivalence modulo p, i.e., a ≡ b (mod p).
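Proposition 1.21 is easy to check for a small prime. A sketch using Python's built-in modular inverse, available as pow(a, -1, p) since Python 3.8:

```python
p = 7

# Every nonzero a in Z/pZ has an inverse when p is prime (Proposition 1.21).
inverses = {a: pow(a, -1, p) for a in range(1, p)}
print(inverses)  # {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}

# Sanity check: a * a^{-1} ≡ 1 (mod p) for every nonzero a.
print(all(a * b % p == 1 for a, b in inverses.items()))  # True
```

For a composite modulus the same call raises ValueError whenever gcd(a, m) > 1, matching Proposition 1.13(b).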
1.5 Powers and Primitive Roots in Finite Fields The application of finite fields in cryptography often involves raising elements of Fp to high powers. As a practical matter, we know how to do this effi- ciently using the powering algorithm described in Sect. 1.3.2. In this section 10Finite fields are also sometimes called Galois fields, after Évariste Galois, who studied them in the nineteenth century. Yet another notation for Fp is GF(p), in honor of Galois. And yet one more notation for Fp that you may run across is Zp, although in number theory the notation Zp is more commonly reserved for the ring of p-adic integers.
30 1. An Introduction to Cryptography

we investigate powers in F_p from a purely mathematical viewpoint, prove a fundamental result due to Fermat, and state an important property of the group of units F_p*. We begin with a simple example. Table 1.9 lists the powers of 1, 2, 3, . . . , 6 modulo the prime 7.

1^1 ≡ 1   1^2 ≡ 1   1^3 ≡ 1   1^4 ≡ 1   1^5 ≡ 1   1^6 ≡ 1
2^1 ≡ 2   2^2 ≡ 4   2^3 ≡ 1   2^4 ≡ 2   2^5 ≡ 4   2^6 ≡ 1
3^1 ≡ 3   3^2 ≡ 2   3^3 ≡ 6   3^4 ≡ 4   3^5 ≡ 5   3^6 ≡ 1
4^1 ≡ 4   4^2 ≡ 2   4^3 ≡ 1   4^4 ≡ 4   4^5 ≡ 2   4^6 ≡ 1
5^1 ≡ 5   5^2 ≡ 4   5^3 ≡ 6   5^4 ≡ 2   5^5 ≡ 3   5^6 ≡ 1
6^1 ≡ 6   6^2 ≡ 1   6^3 ≡ 6   6^4 ≡ 1   6^5 ≡ 6   6^6 ≡ 1

Table 1.9: Powers of numbers modulo 7

There are quite a few interesting patterns visible in Table 1.9, including in particular the fact that the right-hand column consists entirely of ones. We can restate this observation by saying that
a^6 ≡ 1 (mod 7) for every a = 1, 2, 3, . . . , 6.
Of course, this cannot be true for all values of a, since if a is a multiple of 7, then so are all of its powers, so in that case a^n ≡ 0 (mod 7). On the other hand, if a is not divisible by 7, then a is congruent to one of the values 1, 2, 3, . . . , 6 modulo 7. Hence
a^6 ≡ 1 (mod 7) if 7 ∤ a,   and   a^6 ≡ 0 (mod 7) if 7 | a.
Further experiments with other primes suggest that this example reflects a general fact.

Theorem 1.24 (Fermat's Little Theorem). Let p be a prime number and let a be any integer. Then
a^{p−1} ≡ 1 (mod p) if p ∤ a,   and   a^{p−1} ≡ 0 (mod p) if p | a.

Proof. There are many proofs of Fermat's little theorem. If you have studied group theory, the quickest proof is to observe that the nonzero elements in F_p form a group F_p* of order p − 1, so by Lagrange's theorem, every element of F_p* has order dividing p − 1. For those who have not yet taken a course in group theory, we provide a direct proof.
1.5. Powers and Primitive Roots in Finite Fields 31

If p | a, then it is clear that every power of a is divisible by p. So we only need to consider the case that p ∤ a. We now look at the list of numbers
a, 2a, 3a, . . . , (p − 1)a reduced modulo p. (1.8)
There are p − 1 numbers in this list, and we claim that they are all different. To see why, take any two of them, say ja mod p and ka mod p, and suppose that they are the same. This means that
ja ≡ ka (mod p), and hence that (j − k)a ≡ 0 (mod p).
Thus p divides the product (j − k)a. Proposition 1.19 tells us that either p divides j − k or p divides a. However, we have assumed that p does not divide a, so we conclude that p divides j − k. But both j and k are between 1 and p − 1, so their difference j − k is between −(p − 2) and p − 2. There is only one number between −(p − 2) and p − 2 that is divisible by p, and that number is zero! This proves that j − k = 0, which means that ja = ka. We have thus shown that the p − 1 numbers in the list (1.8) are all different. They are also nonzero, since 1, 2, 3, . . . , p − 1 and a are not divisible by p.
To recapitulate, we have shown that the list of numbers (1.8) consists of p − 1 distinct numbers between 1 and p − 1. But there are only p − 1 distinct numbers between 1 and p − 1, so the list of numbers (1.8) must simply be the list of numbers 1, 2, . . . , p − 1 in some mixed-up order.
Now consider what happens when we multiply together all of the numbers
a, 2a, 3a, . . . , (p − 1)a
in the list (1.8) and reduce the product modulo p. This is the same as multiplying together all of the numbers 1, 2, 3, . . . , p − 1 modulo p, so we get a congruence
a · 2a · 3a · · · (p − 1)a ≡ 1 · 2 · 3 · · · (p − 1) (mod p).
There are p − 1 copies of a appearing on the left-hand side. We factor these out and use factorial notation
(p − 1)! = 1 · 2 · · · (p − 1)
to obtain
a^{p−1} · (p − 1)! ≡ (p − 1)! (mod p).
Finally, we are allowed to cancel (p − 1)!
from both sides, since it is not divisible by p. (We are using the fact that Fp is a field, so we are allowed to divide by any nonzero number.) This yields ap−1 ≡ 1 (mod p), which completes the proof of Fermat’s “little” theorem.11 11You may wonder why Theorem 1.24 is called a “little” theorem. The reason is to distinguish it from Fermat’s “big” theorem, which is the famous assertion that xn +yn = zn has no solutions in positive integers x, y, z if n ≥ 3. It is unlikely that Fermat himself could prove this big theorem, but in 1996, more than three centuries after Fermat’s era, Andrew Wiles finally found a proof.
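Fermat's little theorem is easy to test numerically; a small sketch (the specific primes are taken from the surrounding examples):

```python
p = 15485863  # the millionth prime, as in Example 1.25

# Fermat's little theorem: a^(p-1) ≡ 1 (mod p) whenever p does not divide a.
# Python's built-in three-argument pow does fast modular exponentiation,
# so this finishes instantly even though 2^(p-1) has millions of digits.
print(pow(2, p - 1, p))  # 1

# The right-hand column of Table 1.9: a^6 ≡ 1 (mod 7) for a = 1, ..., 6.
print(all(pow(a, 6, 7) == 1 for a in range(1, 7)))  # True

# The contrapositive (used in Example 1.28 below): if 2^(m-1) is not
# congruent to 1 modulo m, then m cannot be prime.
m = 15485207
print(pow(2, m - 1, m) == 1)  # False, so m is composite
```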
32 1. An Introduction to Cryptography

Example 1.25. The number p = 15485863 is prime, so Fermat's little theorem (Theorem 1.24) tells us that
2^15485862 ≡ 1 (mod 15485863).
Thus without doing any computing, we know that the number 2^15485862 − 1, a number having more than two million digits, is a multiple of 15485863.

Remark 1.26. Fermat's little theorem (Theorem 1.24) and the fast powering algorithm (Sect. 1.3.2) provide us with a reasonably efficient method of computing inverses modulo p, namely
a^{−1} ≡ a^{p−2} (mod p).
This congruence is true because if we multiply a^{p−2} by a, then Fermat's theorem tells us that the product is equal to 1 modulo p. This gives an alternative to the extended Euclidean algorithm method described in Remark 1.22. In practice, the two algorithms tend to take about the same amount of time, although there are variants of the Euclidean algorithm that are somewhat faster in practice; see for example [66, Chapter 4.5.3, Theorem E].

Example 1.27. We compute the inverse of 7814 modulo 17449 in two ways. First,
7814^{−1} ≡ 7814^17447 ≡ 1284 (mod 17449).
Second, we use the extended Euclidean algorithm to solve
7814u + 17449v = 1.
The solution is (u, v) = (1284, −575), so 7814^{−1} ≡ 1284 (mod 17449).

Example 1.28. Consider the number m = 15485207. Using the powering algorithm, it is not hard to compute (on a computer)
2^{m−1} = 2^15485206 ≡ 4136685 (mod 15485207).
We did not get the value 1, so it seems that Fermat's little theorem is not true for m. What does that tell us? If m were prime, then Fermat's little theorem says that we would have obtained 1. Hence the fact that we did not get 1 proves that the number m = 15485207 is not prime. Think about this for a minute, because it's actually a bit astonishing. By a simple computation, we have conclusively proven that m is not prime, yet we do not know any of its factors!12

Fermat's little theorem tells us that if a is an integer not divisible by p, then a^{p−1} ≡ 1 (mod p).
However, for any particular value of a, there may well be smaller powers of a that are congruent to 1. We define the order of a modulo p to be the smallest exponent k ≥ 1 such that13 12The prime factorization of m is m = 15485207 = 3853 · 4019. 13We earlier defined the order of p in a to be the exponent of p when a is factored into primes. Thus unfortunately, the word “order” has two different meanings. You will need to judge which one is meant from the context.
1.5. Powers and Primitive Roots in Finite Fields 33

a^k ≡ 1 (mod p).

Proposition 1.29. Let p be a prime and let a be an integer not divisible by p. Suppose that a^n ≡ 1 (mod p). Then the order of a modulo p divides n. In particular, the order of a divides p − 1.

Proof. Let k be the order of a modulo p, so by definition a^k ≡ 1 (mod p), and k is the smallest positive exponent with this property. We are given that a^n ≡ 1 (mod p). We divide n by k to obtain
n = kq + r with 0 ≤ r < k.
Then
1 ≡ a^n ≡ a^{kq+r} ≡ (a^k)^q · a^r ≡ 1^q · a^r ≡ a^r (mod p).
But r < k, so the fact that k is the smallest positive power of a that is congruent to 1 tells us that r must equal 0. Therefore n = kq, so k divides n.
Finally, Fermat's little theorem tells us that a^{p−1} ≡ 1 (mod p), so k divides p − 1.

Fermat's little theorem describes a special property of the units (i.e., the nonzero elements) in a finite field. We conclude this section with a brief discussion of another property that is quite important both theoretically and practically.

Theorem 1.30 (Primitive Root Theorem). Let p be a prime number. Then there exists an element g ∈ F_p* whose powers give every element of F_p*, i.e.,
F_p* = {1, g, g^2, g^3, . . . , g^{p−2}}.
Elements with this property are called primitive roots of F_p or generators of F_p*. They are the elements of F_p* having order p − 1.

Proof. See [137, Chapter 20] or one of the texts [35, 52, 59, 100, 111].

Example 1.31. The field F_11 has 2 as a primitive root, since in F_11,
2^0 = 1   2^1 = 2   2^2 = 4   2^3 = 8   2^4 = 5
2^5 = 10  2^6 = 9   2^7 = 7   2^8 = 3   2^9 = 6.
Thus all 10 nonzero elements of F_11 have been generated as powers of 2. On the other hand, 2 is not a primitive root for F_17, since in F_17,
2^0 = 1   2^1 = 2   2^2 = 4   2^3 = 8   2^4 = 16   2^5 = 15   2^6 = 13   2^7 = 9   2^8 = 1,
so we get back to 1 before obtaining all 16 nonzero values modulo 17. However, it turns out that 3 is a primitive root for 17, since in F_17,
34 1. An Introduction to Cryptography

3^0 = 1    3^1 = 3    3^2 = 9    3^3 = 10   3^4 = 13   3^5 = 5    3^6 = 15   3^7 = 11
3^8 = 16   3^9 = 14   3^10 = 8   3^11 = 7   3^12 = 4   3^13 = 12  3^14 = 2   3^15 = 6.

Remark 1.32. If p is large, then the finite field F_p has quite a few primitive roots. The precise formula says that F_p has exactly φ(p − 1) primitive roots, where φ is Euler's phi function (see page 22). For example, you can check that the following is a complete list of the primitive roots for F_29:
{2, 3, 8, 10, 11, 14, 15, 18, 19, 21, 26, 27}.
This agrees with the value φ(28) = 12. More generally, if k divides p − 1, then there are exactly φ(k) elements of F_p* having order k.

1.6 Cryptography Before the Computer Age

We pause for a short foray into the history of pre-computer cryptography. Our hope is that these brief notes will whet your appetite for further reading on this fascinating subject, in which political intrigue, daring adventure, and romantic episodes play an equal role with technical achievements.

The origins of cryptography are lost in the mists of time, but presumably secret writing arose shortly after people started using some form of written communication, since one imagines that the notion of confidential information must date back to the dawn of civilization. There are early recorded descriptions of ciphers being used in Roman times, including Julius Caesar's shift cipher from Sect. 1.1, and certainly from that time onward, many civilizations have used both substitution ciphers, in which each letter is replaced by another letter or symbol, and transposition ciphers, in which the order of the letters is rearranged.

The invention of cryptanalysis, that is, the art of decrypting messages without previous knowledge of the key, is more recent. The oldest surviving texts, which include references to earlier lost volumes, are by Arab scholars from the fourteenth and fifteenth centuries.
These books describe not only simple substitution and transposition ciphers, but also the first recorded in- stance of a homophonic substitution cipher, which is a cipher in which a single plaintext letter may be represented by any one of several possible ciphertext letters. More importantly, they contain the first description of serious methods of cryptanalysis, including the use of letter frequency counts and the likelihood that certain pairs of letters will appear adjacent to one another. Unfortunately, most of this knowledge seems to have disappeared by the seventeenth century. Meanwhile, as Europe emerged from the Middle Ages, political states in Italy and elsewhere required secure communications, and both cryptography and cryptanalysis began to develop. The earliest known European homo- phonic substitution cipher dates from 1401. The use of such a cipher suggests
  • 53. 1.6. Cryptography Before the Computer Age 35 contemporary knowledge of cryptanalysis via frequency analysis, since the only reason to use a homophonic system is to make such cryptanalysis more difficult. In the fifteenth and sixteenth centuries there arose a variety of what are known as polyalphabetic ciphers. (We will see an example of a polyalphabetic cipher, called the Vigenère cipher, in Sect. 5.2.) The basic idea is that each letter of the plaintext is enciphered using a different simple substitution ci- pher. The name “polyalphabetic” refers to the use of many different cipher alphabets, which were used according to some sort of key. If the key is rea- sonably long, then it takes a long time for the any given cipher alphabet to be used a second time. It wasn’t until the nineteenth century that statistical methods were developed to reliably solve such systems, although there are earlier recorded instances of cryptanalysis via special tricks or lucky guesses of part of the message or the key. Jumping forward several centuries, we note that the machine ciphers that played a large role in World War II were, in essence, extremely complicated polyalphabetic ciphers. Ciphers and codes14 for both political and military purposes become increasingly widespread during the eighteenth, nineteenth, and early twentieth centuries, as did cryptanalytic methods, although the level of sophistication varied widely from generation to generation and from country to country. For example, as the United States prepared to enter World War I in 1917, the U.S. Army was using ciphers, inferior to those invented in Italy in the 1600s, that any trained cryptanalyst of the time would have been able to break in a few hours! 
The invention and widespread deployment of long-range communication methods, especially the telegraph, opened the need for political, military, and commercial ciphers, and there are many fascinating stories of intercepted and decrypted telegraph messages playing a role in historical events. One example, the infamous Zimmermann telegram, will suffice. With the United States maintaining neutrality in 1917 as Germany battled France and Britain on the Western Front, the Germans decided that their best hope for victory was to tighten their blockade of Britain by commencing unrestricted submarine warfare in the Atlantic. This policy, which meant sinking ships from neutral countries, was likely to bring the United States into the war, so Germany decided to offer an alliance to Mexico. In return for Mexico invading the United States, and thus distracting it from the ground war in Europe, Germany proposed giving Mexico, at the conclusion of the war, much of present-day Texas, New Mexico, and Arizona. The British secret service intercepted this communication, and despite the fact that it was encrypted using one of Germany's

14In classical terminology, a code is a system in which each word of the plaintext is replaced with a code word. This requires sender and receiver to share a large dictionary in which plaintext words are paired with their ciphertext equivalents. Ciphers operate on the individual letters of the plaintext, either by substitution, transposition, or some combination. This distinction between the words "code" and "cipher" seems to have been largely abandoned in today's literature.
most secure cryptosystems, they were able to decipher the cable and pass its contents on to the United States, thereby helping to propel the United States into World War I. The invention and development of radio communications around 1900 caused an even more striking change in the cryptographic landscape, especially in urgent military and political situations. A general could now instantaneously communicate with all of his troops, but unfortunately the enemy could listen in on all of his broadcasts. The need for secure and efficient ciphers became paramount and led to the invention of machine ciphers, such as Germany's Enigma machine. This was a device containing a number of rotors, each of which had many wires running through its center. Before a letter was encrypted, the rotors would spin in a predetermined way, thereby altering the paths of the wires and the resultant output. This created an immensely complicated polyalphabetic cipher in which the number of cipher alphabets was enormous. Further, the rotors could be removed and replaced in a vast number of different starting configurations, so breaking the system involved knowing both the circuits through the rotors and figuring out that day's initial rotor configuration. Despite these difficulties, during World War II the British managed to decipher a large number of messages encrypted on Enigma machines. They were aided in this endeavor by Polish cryptographers who, just before hostilities commenced, shared with Britain and France the methods that they had developed for attacking Enigma. But determining daily rotor configurations and analyzing rotor replacements was still an immensely difficult task, especially after Germany introduced an improved Enigma machine having an extra rotor. The existence of Britain's ULTRA project to decrypt Enigma remained secret until 1974, but there are now several popular accounts.
Military intelligence derived from ULTRA was of vital importance in the Allied war effort. Another WWII cryptanalytic success was obtained by United States cryptographers against a Japanese cipher machine that they code-named Purple. This machine used switches, rather than rotors, but again the effect was to create an incredibly complicated polyalphabetic cipher. A team of cryptographers, led by William Friedman, managed to reconstruct the design of the Purple machine purely by analyzing intercepted encrypted messages. They then built their own machine and proceeded to decrypt many important diplomatic messages. In this section we have barely touched the surface of the history of cryptography from antiquity through the middle of the twentieth century. Good starting points for further reading include Simon Singh's light introduction [139] and David Kahn's massive and comprehensive, but fascinating and quite readable, book The Codebreakers [63].
1.7 Symmetric and Asymmetric Ciphers

We have now seen several different examples of ciphers, all of which have a number of features in common. Bob wants to send a secret message to Alice. He uses a secret key k to scramble his plaintext message m and turn it into a ciphertext c. Alice, upon receiving c, uses the secret key k to unscramble c and reconstitute m. If this procedure is to work properly, then both Alice and Bob must possess copies of the secret key k, and if the system is to provide security, then their adversary Eve must not know k, must not be able to guess k, and must not be able to recover m from c without knowing k. In this section we formulate the notion of a cryptosystem in abstract mathematical terms. There are many reasons why this is desirable. In particular, it allows us to highlight similarities and differences between different systems, while also providing a framework within which we can rigorously analyze the security of a cryptosystem against various types of attacks.

1.7.1 Symmetric Ciphers

Returning to Bob and Alice, we observe that they must share knowledge of the secret key k. Using that secret key, they can both encrypt and decrypt messages, so Bob and Alice have equal (or symmetric) knowledge and abilities. For this reason, ciphers of this sort are known as symmetric ciphers. Mathematically, a symmetric cipher uses a key k chosen from a space (i.e., a set) of possible keys K to encrypt a plaintext message m chosen from a space of possible messages M, and the result of the encryption process is a ciphertext c belonging to a space of possible ciphertexts C. Thus encryption may be viewed as a function

e : K × M → C

whose domain K × M is the set of pairs (k, m) consisting of a key k and a plaintext m and whose range is the space of ciphertexts C. Similarly, decryption is a function

d : K × C → M.
Of course, we want the decryption function to "undo" the results of the encryption function. Mathematically, this is expressed by the formula

d(k, e(k, m)) = m  for all k ∈ K and all m ∈ M.

It is sometimes convenient to write the dependence on k as a subscript. Then for each key k, we get a pair of functions

ek : M −→ C  and  dk : C −→ M

satisfying the decryption property

dk(ek(m)) = m  for all m ∈ M.
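The function pair (ek, dk) can be sketched directly in code. Purely as an illustration, the sketch below uses the shift (Caesar) cipher of Sect. 1.1 as the concrete function family, with K = {0, 1, . . . , 25} and with M and C the strings of lowercase letters; the function names mirror the notation e and d above.

```python
# A minimal sketch of the abstract pair (e_k, d_k), using the shift
# cipher on the 26-letter lowercase alphabet as the concrete family.

def e(k: int, m: str) -> str:
    """Encryption function e : K x M -> C (shift each letter by k)."""
    return "".join(chr((ord(ch) - ord("a") + k) % 26 + ord("a")) for ch in m)

def d(k: int, c: str) -> str:
    """Decryption function d : K x C -> M, satisfying d(k, e(k, m)) = m."""
    return e(26 - k, c)  # decrypting is shifting back the other way

m, k = "attackatdawn", 11
c = e(k, m)
assert d(k, c) == m  # d_k undoes e_k for every key k
```

Note how decryption is just another member of the same function family, which is exactly the symmetry that gives symmetric ciphers their name.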
In other words, for every key k, the function dk is the inverse function of the function ek. In particular, this means that ek must be one-to-one, since if ek(m) = ek(m′), then

m = dk(ek(m)) = dk(ek(m′)) = m′.

It is safest for Alice and Bob to assume that Eve knows the encryption method that is being employed. In mathematical terms, this means that Eve knows the functions e and d. What Eve does not know is the particular key k that Alice and Bob are using. For example, if Alice and Bob use a simple substitution cipher, they should assume that Eve is aware of this fact. This illustrates a basic premise of modern cryptography called Kerckhoff's principle, which says that the security of a cryptosystem should depend only on the secrecy of the key, and not on the secrecy of the encryption algorithm itself. If (K, M, C, e, d) is to be a successful cipher, it must have the following properties:

1. For any key k ∈ K and plaintext m ∈ M, it must be easy to compute the ciphertext ek(m).
2. For any key k ∈ K and ciphertext c ∈ C, it must be easy to compute the plaintext dk(c).
3. Given one or more ciphertexts c1, c2, . . . , cn ∈ C encrypted using the key k ∈ K, it must be very difficult to compute any of the corresponding plaintexts dk(c1), . . . , dk(cn) without knowledge of k.

Here is another property that is desirable, although more difficult to achieve.

4. Given one or more pairs of plaintexts and their corresponding ciphertexts, (m1, c1), (m2, c2), . . . , (mn, cn), it must be very difficult to decrypt any ciphertext c that is not in the given list without knowing k. This property is called security against a known plaintext attack.

Even better is to achieve security while allowing the attacker to choose the known plaintexts.

5. For any list of plaintexts m1, . . . , mn ∈ M chosen by the adversary, even with knowledge of the corresponding ciphertexts ek(m1), . . .
, ek(mn), it is very difficult to decrypt any ciphertext c that is not in the given list without knowing k. This is known as security against a chosen plaintext attack. N.B. In this attack, the adversary is allowed to choose m1, . . . , mn, as opposed to a known plaintext attack, where the attacker is given a list of plaintext/ciphertext pairs not of his choosing.
Example 1.33. The simple substitution cipher does not have Property 4, since even a single plaintext/ciphertext pair (m, c) reveals most of the encryption table. Similarly, the Vigenère cipher discussed in Sect. 5.2 has the property that a plaintext/ciphertext pair immediately reveals the keyword used for encryption. Thus both simple substitution and Vigenère ciphers are vulnerable to known plaintext attacks. See Exercise 1.43 for a further example.

In our list of desirable properties for a cryptosystem, we have left open the question of what exactly is meant by the words "easy" and "hard." We defer a formal discussion of this profound question to Sect. 5.7; see also Sects. 2.1 and 2.6. For now, we informally take "easy" to mean computable in less than a second on a typical desktop computer and "hard" to mean that all of the computing power in the world would require several years (at least) to perform the computation.

1.7.2 Encoding Schemes

It is convenient to view keys, plaintexts, and ciphertexts as numbers and to write those numbers in binary form. For example, we could take strings of 8 bits,15 which give numbers from 0 to 255, and use them to represent the letters of the alphabet via

a = 00000000, b = 00000001, c = 00000010, . . . , z = 00011001.

To distinguish lowercase from uppercase, we could let A = 00011011, B = 00011100, and so on. This encoding method allows up to 256 distinct symbols to be translated into binary form. Your computer may use a method of this type, called the ASCII code,16 to store data, although for historical reasons the alphabetic characters are not assigned the lowest binary values. Part of the ASCII code is listed in Table 1.10. For example, the phrase "Bed bug." (including spacing and punctuation) is encoded in ASCII as

B e d b u g .
66       101      100      32       98       117      103      46
01000010 01100101 01100100 00100000 01100010 01110101 01100111 00101110

Thus where you see the phrase "Bed bug.", your computer sees the list of bits

0100001001100101011001000010000001100010011101010110011100101110.

Definition. An encoding scheme is a method of converting one sort of data into another sort of data, for example, converting text into numbers. The distinction between an encoding scheme and an encryption scheme is one

15A bit is a 0 or a 1. The word "bit" is an abbreviation for binary digit.
16ASCII is an acronym for American Standard Code for Information Interchange.
of intent. An encoding scheme is assumed to be entirely public knowledge and used by everyone for the same purposes. An encryption scheme is designed to hide information from anyone who does not possess the secret key. Thus an encoding scheme, like an encryption scheme, consists of an encoding function and its inverse decoding function, but for an encoding scheme, both functions are public knowledge and should be fast and easy to compute.

  32 00100000    ( 40 00101000    ) 41 00101001    , 44 00101100    . 46 00101110
A 65 01000001    B 66 01000010    C 67 01000011    D 68 01000100    . . .
X 88 01011000    Y 89 01011001    Z 90 01011010
a 97 01100001    b 98 01100010    c 99 01100011    d 100 01100100   . . .
x 120 01111000   y 121 01111001   z 122 01111010

Table 1.10: The ASCII encoding scheme

With the use of an encoding scheme, a plaintext or ciphertext may be viewed as a sequence of binary blocks, where each block consists of 8 bits, i.e., of a sequence of eight ones and zeros. A block of 8 bits is called a byte. For human comprehension, a byte is often written as a decimal number between 0 and 255, or as a two-digit hexadecimal (base 16) number between 00 and FF. Computers often operate on more than 1 byte at a time. For example, a 64-bit processor operates on 8 bytes at a time.

1.7.3 Symmetric Encryption of Encoded Blocks

In using an encoding scheme as described in Sect. 1.7.2, it is convenient to view the elements of the plaintext space M as consisting of bit strings of a fixed length B, i.e., strings of exactly B ones and zeros. We call B the blocksize of the cipher. A general plaintext message then consists of a list of message blocks chosen from M, and the encryption function transforms the message blocks into a list of ciphertext blocks in C, where each block is a sequence of B bits. If the plaintext ends with a block of fewer than B bits, we pad the end of the block with zeros.
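The "Bed bug." computation above, together with the splitting of a message into B-bit blocks and the zero-padding of the final block, can be sketched as follows. (The helper names `to_bits` and `to_blocks` are our own, chosen for this illustration.)

```python
# Encoding text into bits via ASCII, as in the "Bed bug." example, then
# splitting the bit string into B-bit blocks, zero-padding the last one.

def to_bits(text: str) -> str:
    # ord(ch) gives the ASCII value; format(..., "08b") writes it in 8 bits
    return "".join(format(ord(ch), "08b") for ch in text)

def to_blocks(bits: str, B: int) -> list:
    blocks = [bits[i:i + B] for i in range(0, len(bits), B)]
    blocks[-1] = blocks[-1].ljust(B, "0")  # pad the final block with zeros
    return blocks

bits = to_bits("Bed bug.")
assert bits == ("01000010011001010110010000100000"
                "01100010011101010110011100101110")  # the 64 bits in the text

# 64 bits split into B = 24 bit blocks: two full blocks plus a padded one.
blocks = to_blocks(bits, 24)
assert len(blocks) == 3 and all(len(b) == 24 for b in blocks)
```

This is pure encoding, not encryption: both functions are public and easily inverted by anyone.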
Keep in mind that this encoding process, which converts the original plaintext message into a sequence of blocks of bits in M, is public knowledge. Encryption and decryption are done one block at a time, so it suffices to study the process for a single plaintext block, i.e., for a single m ∈ M. This, of course, is why it is convenient to break a message up into blocks. A message can be of arbitrary length, so it’s nice to be able to focus the cryptographic process on a single piece of fixed length. The plaintext block m is a string of B bits, which for concreteness we identify with the corresponding number
in binary form. In other words, we identify M with the set of integers m satisfying 0 ≤ m < 2^B via

list of B bits of m:  mB−1 mB−2 · · · m2 m1 m0
←→  integer between 0 and 2^B − 1:  mB−1 · 2^(B−1) + · · · + m2 · 2^2 + m1 · 2 + m0.

Here m0, m1, . . . , mB−1 are each 0 or 1. Similarly, we identify the key space K and the ciphertext space C with sets of integers corresponding to bit strings of a certain blocksize. For notational convenience, we denote the blocksizes for keys, plaintexts, and ciphertexts by Bk, Bm, and Bc. They need not be the same. Thus we have identified K, M, and C with sets of positive integers

K = {k ∈ Z : 0 ≤ k < 2^(Bk)},
M = {m ∈ Z : 0 ≤ m < 2^(Bm)},
C = {c ∈ Z : 0 ≤ c < 2^(Bc)}.

An important question immediately arises: how large should Alice and Bob make the set K, or equivalently, how large should they choose the key blocksize Bk? If Bk is too small, then Eve can check every number from 0 to 2^(Bk) − 1 until she finds Alice and Bob's key. More precisely, since Eve is assumed to know the decryption algorithm d (Kerckhoff's principle), she takes each k ∈ K and uses it to compute dk(c). Assuming that Eve is able to distinguish between valid and invalid plaintexts, eventually she will recover the message. This attack is known as an exhaustive search attack (also sometimes referred to as a brute-force attack), since Eve exhaustively searches through the key space. With current technology, an exhaustive search is considered to be infeasible if the space has at least 2^80 elements. Thus Bob and Alice should definitely choose Bk ≥ 80. For many cryptosystems, especially the public key cryptosystems that form the core of this book, there are refinements on the exhaustive search attack that effectively replace the size of the space with its square root. These methods are based on the principle that it is easier to find matching objects (collisions) in a set than it is to find a particular object in the set.
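An exhaustive search is easy to demonstrate when Bk really is too small. The toy sketch below (our own construction, not one from the text) uses a 16-bit key with the XOR cipher of Example 1.34 and takes "valid plaintext" to mean printable ASCII bytes; Eve simply tries all 2^16 keys.

```python
# A toy exhaustive search (brute-force) attack: Bk = 16, so trying every
# key in K = {0, ..., 2^16 - 1} is instantaneous on any computer.
Bk = 16

def decrypt(k: int, c: int) -> int:
    return k ^ c  # toy XOR cipher on 16-bit blocks

def is_valid(m: int) -> bool:
    """Eve's crude test for a plausible plaintext: printable ASCII bytes."""
    hi, lo = m >> 8, m & 0xFF
    return all(32 <= byte <= 126 for byte in (hi, lo))

secret_key = 0x2A7C
c = secret_key ^ int.from_bytes(b"no", "big")  # Alice's 16-bit ciphertext

candidates = [k for k in range(2**Bk) if is_valid(decrypt(k, c))]
assert secret_key in candidates  # Eve's search is guaranteed to find the key
```

With a single ciphertext block many keys pass the validity test, but each additional block encrypted under the same key shrinks the candidate list sharply; the point is that the cost of the search grows like 2^(Bk), which is why Bk ≥ 80 is demanded.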
We describe some of these meet-in-the-middle or collision attacks in Sects. 2.7, 5.4, 5.5, 7.2, and 7.10. If meet-in-the-middle attacks are available, then Alice and Bob should choose Bk ≥ 160.

1.7.4 Examples of Symmetric Ciphers

Before descending further into a morass of theory and notation, we pause to give a mathematical description of some elementary symmetric ciphers.
Let p be a large prime,17 say 2^159 < p < 2^160. Alice and Bob take their key space K, plaintext space M, and ciphertext space C to be the same set,

K = M = C = {1, 2, 3, . . . , p − 1}.

In fancier terminology, K = M = C = F_p^* are all taken to be equal to the group of units in the finite field F_p. Alice and Bob randomly select a key k ∈ K, i.e., they select an integer k satisfying 1 ≤ k < p, and they decide to use the encryption function ek defined by

ek(m) ≡ k · m (mod p).    (1.9)

Here we mean that ek(m) is set equal to the unique positive integer between 1 and p that is congruent to k · m modulo p. The corresponding decryption function dk is

dk(c) ≡ k′ · c (mod p),

where k′ is the inverse of k modulo p. It is important to note that although p is very large, the extended Euclidean algorithm (Remark 1.15) allows us to calculate k′ in fewer than 2 log2 p + 2 steps. Thus finding k′ from k counts as "easy" in the world of cryptography. It is clear that Eve has a hard time guessing k, since there are approximately 2^160 possibilities from which to choose. Is it also difficult for Eve to recover k if she knows the ciphertext c? The answer is yes, it is still difficult. Notice that the encryption function ek : M −→ C is surjective (onto) for any choice of key k. This means that for every c ∈ C and any k ∈ K there exists an m ∈ M such that ek(m) = c. Further, any given ciphertext may represent any plaintext, provided that the plaintext is encrypted by an appropriate key. Mathematically, this may be rephrased by saying that given any ciphertext c ∈ C and any plaintext m ∈ M, there exists a key k such that ek(m) = c. Specifically this is true for the key

k ≡ m^(−1) · c (mod p).    (1.10)

This shows that Alice and Bob's cipher has Properties 1–3 as listed on page 38, since anyone who knows the key k can easily encrypt and decrypt, but it is hard to decrypt if you do not know the value of k.
However, this cipher does not have Property 4, since even a single plaintext/ciphertext pair (m, c) allows Eve to recover the private key k using the formula (1.10).

17There are in fact many primes in the interval 2^159 < p < 2^160. The prime number theorem implies that almost 1 % of the numbers in this interval are prime. Of course, there is also the question of identifying a number as prime or composite. There are efficient tests that do this, even for very large numbers. See Sect. 3.4.
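The multiplication-modulo-p cipher (1.9) and the key-recovery formula (1.10) can be sketched in a few lines. For readability the sketch uses the small Mersenne prime p = 2^19 − 1 rather than a 160-bit prime; Python's three-argument `pow` with exponent −1 computes the modular inverse via the extended Euclidean algorithm.

```python
# The multiplication-modulo-p cipher of Eq. (1.9), together with Eve's
# known-plaintext key recovery from Eq. (1.10). A real instance would use
# a 160-bit prime; p = 2^19 - 1 is used here purely for illustration.
p = 2**19 - 1  # 524287, a Mersenne prime

def e(k, m):
    return (k * m) % p

def d(k, c):
    k_inv = pow(k, -1, p)  # k' = k^(-1) mod p, via the extended Euclidean algorithm
    return (k_inv * c) % p

k, m = 123456, 98765
c = e(k, m)
assert d(k, c) == m  # decryption undoes encryption

# Eve's known-plaintext attack: one pair (m, c) reveals the key (Eq. 1.10).
k_recovered = (pow(m, -1, p) * c) % p
assert k_recovered == k
```

The last two lines are exactly the failure of Property 4 described in the text: a single plaintext/ciphertext pair hands Eve the key.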
It is also interesting to observe that if Alice and Bob define their encryption function to be simply multiplication of integers ek(m) = k · m with no reduction modulo p, then their cipher still has Properties 1 and 2, but Property 3 fails. If Eve tries to decrypt a single ciphertext c = k · m, she still faces the (moderately) difficult task of factoring a large number. However, if she manages to acquire several ciphertexts c1, c2, . . . , cn, then there is a good chance that

gcd(c1, c2, . . . , cn) = gcd(k · m1, k · m2, . . . , k · mn) = k · gcd(m1, m2, . . . , mn)

equals k itself or a small multiple of k. Note that it is an easy task to compute the greatest common divisor. This observation provides our first indication of how reduction modulo p has a wonderful "mixing" effect that destroys properties such as divisibility. However, reduction is not by itself the ultimate solution. Consider the vulnerability of the cipher (1.9) to a known plaintext attack. As noted above, if Eve can get her hands on both a ciphertext c and its corresponding plaintext m, then she easily recovers the key by computing k ≡ m^(−1) · c (mod p). Thus even a single plaintext/ciphertext pair suffices to reveal the key, so the encryption function ek given by (1.9) does not have Property 4 on page 38. There are many variants of this "multiplication-modulo-p" cipher. For example, since addition is more efficient than multiplication, there is an "addition-modulo-p" cipher given by

ek(m) ≡ m + k (mod p)  and  dk(c) ≡ c − k (mod p),

which is nothing other than the shift or Caesar cipher that we studied in Sect. 1.1. Another variant, called an affine cipher, is a combination of the shift cipher and the multiplication cipher. The key for an affine cipher consists of two integers k = (k1, k2) and encryption and decryption are defined by

ek(m) ≡ k1 · m + k2 (mod p),  dk(c) ≡ k1^(−1) · (c − k2) (mod p),    (1.11)

where k1^(−1) is the inverse of k1 modulo p.
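The affine cipher (1.11) is equally short to sketch; again a small prime stands in for a realistic one, and the specific key values are arbitrary choices for the example.

```python
# A sketch of the affine cipher of Eq. (1.11). The key is the pair
# k = (k1, k2); the small prime p is for illustration only.
p = 2**19 - 1  # a Mersenne prime

def e(k, m):
    k1, k2 = k
    return (k1 * m + k2) % p

def d(k, c):
    k1, k2 = k
    return (pow(k1, -1, p) * (c - k2)) % p  # multiply by k1^(-1) mod p

k = (31337, 271828)  # arbitrary example key (k1, k2) with 1 <= k1 < p
m = 424242
assert d(k, e(k, m)) == m
```

Taking k1 = 1 recovers the shift cipher and taking k2 = 0 recovers the multiplication cipher, which is the sense in which the affine cipher combines the two.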
The affine cipher has a further generalization called the Hill cipher, in which the plaintext m, the ciphertext c, and the second part of the key k2 are replaced by column vectors consisting of n numbers modulo p. The first part of the key k1 is taken to be an n-by-n matrix with mod p integer entries. Encryption and decryption are again given by (1.11), but now multiplication k1 · m is the product of a matrix and a vector, and k1^(−1) is the inverse matrix of k1 modulo p. Both the affine cipher and the Hill cipher are vulnerable to known plaintext attacks; see Exercises 1.43 and 1.44.
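For concreteness, here is a sketch of the Hill cipher with n = 2, inverting the 2-by-2 key matrix modulo p via the adjugate formula. The prime, key, and message below are arbitrary illustrative choices, and the helper names are our own.

```python
# The Hill cipher with n = 2: k1 is a 2x2 matrix mod p and m, c, k2 are
# length-2 column vectors mod p. Encryption/decryption follow Eq. (1.11).
p = 257  # a small prime, for illustration only

def mat_vec(A, v):
    return [(A[0][0]*v[0] + A[0][1]*v[1]) % p,
            (A[1][0]*v[0] + A[1][1]*v[1]) % p]

def mat_inv2(A):
    """Inverse of a 2x2 matrix mod p via the adjugate; needs det invertible."""
    det = (A[0][0]*A[1][1] - A[0][1]*A[1][0]) % p
    d = pow(det, -1, p)
    return [[( A[1][1]*d) % p, (-A[0][1]*d) % p],
            [(-A[1][0]*d) % p, ( A[0][0]*d) % p]]

def e(k, m):
    k1, k2 = k
    v = mat_vec(k1, m)
    return [(v[0] + k2[0]) % p, (v[1] + k2[1]) % p]

def d(k, c):
    k1, k2 = k
    v = [(c[0] - k2[0]) % p, (c[1] - k2[1]) % p]
    return mat_vec(mat_inv2(k1), v)

k = ([[3, 5], [1, 2]], [7, 11])  # det = 1, so k1 is invertible mod p
m = [66, 101]                    # e.g. the ASCII bytes of "Be"
assert d(k, e(k, m)) == m
```

The only new requirement compared with the affine cipher is that the determinant of k1 must be invertible modulo p, so that the inverse matrix exists.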
Example 1.34. As noted earlier, addition is generally faster than multiplication, but there is another basic computer operation that is even faster than addition. It is called exclusive or and is denoted by XOR or ⊕. At the lowest level, XOR takes two individual bits β ∈ {0, 1} and β′ ∈ {0, 1} and yields

β ⊕ β′ = 0 if β and β′ are the same, 1 if β and β′ are different.    (1.12)

If you think of a bit as a number that is 0 or 1, then XOR is the same as addition modulo 2. More generally, the XOR of 2 bit strings is the result of performing XOR on each corresponding pair of bits. For example,

10110 ⊕ 11010 = [1 ⊕ 1] [0 ⊕ 1] [1 ⊕ 0] [1 ⊕ 1] [0 ⊕ 0] = 01100.

Using this new operation, Alice and Bob have at their disposal yet another basic cipher defined by

ek(m) = k ⊕ m  and  dk(c) = k ⊕ c.

Here K, M, and C are the sets of all binary strings of length B, or equivalently, the set of all numbers between 0 and 2^B − 1. This cipher has the advantage of being highly efficient and completely symmetric in the sense that ek and dk are the same function. If k is chosen randomly and is used only once, then this cipher is known as Vernam's one-time pad. In Sect. 5.57 we show that the one-time pad is provably secure. Unfortunately, it requires a key that is as long as the plaintext, which makes it too cumbersome for most practical applications. And if k is used to encrypt more than one plaintext, then Eve may be able to exploit the fact that

c ⊕ c′ = (k ⊕ m) ⊕ (k ⊕ m′) = m ⊕ m′

to extract information about m or m′. It's not obvious how Eve would proceed to find k, m, or m′, but simply the fact that the key k can be removed so easily, revealing the potentially less random quantity m ⊕ m′, should make a cryptographer nervous. Further, this method is vulnerable in some situations to a known plaintext attack; see Exercise 1.48.
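Viewing B-bit blocks as integers, the XOR cipher of Example 1.34 and the "key drops out" identity c ⊕ c′ = m ⊕ m′ can both be checked directly; Python's `^` operator is bitwise XOR, and `secrets.randbits` stands in for a randomly chosen pad.

```python
# The XOR cipher e_k(m) = k XOR m on B-bit blocks viewed as integers,
# together with the two-time-pad identity c XOR c' = m XOR m'.
import secrets

B = 5  # blocksize, matching the 5-bit example in the text

def e(k, m):
    return k ^ m  # encryption and decryption are the same function

m1, m2 = 0b10110, 0b11010
assert m1 ^ m2 == 0b01100  # the example computed in the text

k = secrets.randbits(B)    # a random one-time key
c1, c2 = e(k, m1), e(k, m2)
assert e(k, c1) == m1      # d_k = e_k undoes itself

# Reusing k removes it entirely, leaking m1 XOR m2 to Eve:
assert c1 ^ c2 == m1 ^ m2
```

Run with a fresh random k each time, the final assertion always holds: whatever the key was, the ciphertext pair reveals m1 ⊕ m2, which is exactly why the pad must never be reused.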
1.7.5 Random Bit Sequences and Symmetric Ciphers

We have arrived, at long last, at the fundamental question regarding the creation of secure and efficient symmetric ciphers. Is it possible to use a single relatively short key k (say consisting of 160 random bits) to securely and efficiently send arbitrarily long messages? Here is one possible construction. Suppose that we could construct a function

R : K × Z −→ {0, 1}
with the following properties:

1. For all k ∈ K and all j ∈ Z, it is easy to compute R(k, j).
2. Given an arbitrarily long sequence of integers j1, j2, . . . , jn and given all of the values R(k, j1), R(k, j2), . . . , R(k, jn), it is hard to determine k.
3. Given any list of integers j1, j2, . . . , jn and given all of the values R(k, j1), R(k, j2), . . . , R(k, jn), it is hard to guess the value of R(k, j) with better than a 50 % chance of success for any value of j not already in the list.

If we could find a function R with these three properties, then we could use it to turn an initial key k into a sequence of bits

R(k, 1), R(k, 2), R(k, 3), R(k, 4), . . . ,    (1.13)

and then we could use this sequence of bits as the key for a one-time pad as described in Example 1.34. The fundamental problem with this approach is that the sequence of bits (1.13) is not truly random, since it is generated by the function R. Instead, we say that the sequence of bits (1.13) is a pseudorandom sequence and we call R a pseudorandom number generator. Do pseudorandom number generators exist? If so, they would provide examples of the one-way functions defined by Diffie and Hellman in their groundbreaking paper [38], but despite more than a quarter century of work, no one has yet proven the existence of even a single such function. We return to this fascinating subject in Sects. 2.1 and 8.2. For now, we content ourselves with a few brief remarks. Although no one has yet conclusively proven that pseudorandom number generators exist, many candidates have been suggested, and some of these proposals have withstood the test of time. There are two basic approaches to constructing candidates for R, and these two methods provide a good illustration of the fundamental conflict in cryptography between security and efficiency.
The first approach is to repeatedly apply an ad hoc collection of mixing operations that are well suited to efficient computation and that appear to be very hard to untangle. This method is, disconcertingly, the basis for most practical symmetric ciphers, including the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES), which are the two systems most widely used today. See Sect. 8.12 for a brief description of these modern symmetric ciphers. The second approach is to construct R using a function whose efficient inversion is a well-known mathematical problem that is believed to be difficult. This approach provides a far more satisfactory theoretical underpinning for a symmetric cipher, but unfortunately, all known constructions of this sort are far less efficient than the ad hoc constructions, and hence are less attractive for real-world applications.
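To make the construction (1.13) concrete, here is a purely illustrative candidate for R(k, j) built from the SHA-256 hash function as an ad hoc "mixing" operation, in the spirit of the first approach above. This is a sketch of the idea only, not one of the vetted constructions used in practice.

```python
# An illustrative (unvetted) candidate R : K x Z -> {0, 1}, and its use
# as the pad generator for a one-time-pad-style cipher, as in Eq. (1.13).
import hashlib

def R(k: bytes, j: int) -> int:
    """Return one pseudorandom bit derived from the key k and index j."""
    digest = hashlib.sha256(k + j.to_bytes(8, "big")).digest()
    return digest[0] & 1  # take the low bit of the first digest byte

def keystream_xor(k: bytes, message_bits: list) -> list:
    """XOR the message with the pad R(k, 1), R(k, 2), ... bit by bit."""
    return [b ^ R(k, j + 1) for j, b in enumerate(message_bits)]

k = b"a 160-bit (or longer) secret key"  # illustrative key material
bits = [1, 0, 1, 1, 0]
cipher_bits = keystream_xor(k, bits)
assert keystream_xor(k, cipher_bits) == bits  # XOR with the same pad decrypts
```

The short key k now encrypts arbitrarily long messages, which is the whole point; whether Properties 2 and 3 actually hold for any such construction is precisely the open question discussed above.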
1.7.6 Asymmetric Ciphers Make a First Appearance

If Alice and Bob want to exchange messages using a symmetric cipher, they must first mutually agree on a secret key k. This is fine if they have the opportunity to meet in secret or if they are able to communicate once over a secure channel. But what if they do not have this opportunity and if every communication between them is monitored by their adversary Eve? Is it possible for Alice and Bob to exchange a secret key under these conditions? Most people's first reaction is that it is not possible, since Eve sees every piece of information that Alice and Bob exchange. It was the brilliant insight of Diffie and Hellman18 that under certain hypotheses, it is possible. The search for efficient (and provable) solutions to this problem, which is called public key (or asymmetric) cryptography, forms one of the most interesting parts of mathematical cryptography and is the principal focus of this book. We start by describing a nonmathematical way to visualize public key cryptography. Alice buys a safe with a narrow slot in the top and puts her safe in a public location. Everyone in the world is allowed to examine the safe and see that it is securely made. Bob writes his message to Alice on a piece of paper and slips it through the slot in the top of the safe. Now only a person with the key to the safe, which presumably means only Alice, can retrieve and read Bob's message. In this scenario, Alice's public key is the safe, the encryption algorithm is the process of putting the message in the slot, and the decryption algorithm is the process of opening the safe with the key. Note that this setup is not far-fetched; it is used in the real world. For example, the night deposit slot at a bank has this form, although in practice the "slot" must be well protected to prevent someone from inserting a long thin pair of tongs and extracting other people's deposits!
A useful feature of our "safe-with-a-slot" cryptosystem, which it shares with actual public key cryptosystems, is that Alice needs to put only one safe in a public location, and then everyone in the world can use it repeatedly to send encrypted messages to Alice. There is no need for Alice to provide a separate safe for each of her correspondents. And there is also no need for Alice to open the safe and remove Bob's message before someone else such as Carl or Dave uses it to send Alice a message. We are now ready to give a mathematical formulation of an asymmetric cipher. As usual, there are spaces of keys K, plaintexts M, and ciphertexts C. However, an element k of the key space is really a pair of keys,

k = (kpriv, kpub),

called the private key and the public key, respectively. For each public key kpub there is a corresponding encryption function

ekpub : M −→ C,

18The history is actually somewhat more complicated than this; see our brief discussion in Sect. 2.1 and the references listed there for further reading.
and for each private key kpriv there is a corresponding decryption function

dkpriv : C −→ M.

These have the property that if the pair (kpriv, kpub) is in the key space K, then

dkpriv(ekpub(m)) = m  for all m ∈ M.

If an asymmetric cipher is to be secure, it must be difficult for Eve to compute the decryption function dkpriv(c), even if she knows the public key kpub. Notice that under this assumption, Alice can send kpub to Bob using an insecure communication channel, and Bob can send back the ciphertext ekpub(m), without worrying that Eve will be able to decrypt the message. To easily decrypt, it is necessary to know the private key kpriv, and presumably Alice is the only person with that information. The private key is sometimes called Alice's trapdoor information, because it provides a trapdoor (i.e., a shortcut) for computing the inverse function of ekpub. The fact that the encryption and decryption keys kpub and kpriv are different makes the cipher asymmetric, whence its moniker. It is quite intriguing that Diffie and Hellman created this concept without finding a candidate for an actual pair of functions, although they did propose a similar method by which Alice and Bob can securely exchange a random piece of data whose value is not known initially to either one. We describe Diffie and Hellman's key exchange method in Sect. 2.3 and then go on to discuss a number of asymmetric ciphers, including Elgamal (Sect. 2.4), RSA (Sect. 3.2), Goldwasser–Micali (Sect. 3.10), ECC (Sect. 6.4), GGH (Sect. 7.8), and NTRU (Sect. 7.10), whose security relies on the presumed difficulty of a variety of different mathematical problems.

Remark 1.35. In practice, asymmetric ciphers tend to be considerably slower than symmetric ciphers such as DES and AES. For that reason, if Bob needs to send Alice a large file, he might first use an asymmetric cipher to send Alice the key to a symmetric cipher, which he would then use to transmit the actual file.
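As a preview of the shape of an asymmetric key pair (kpriv, kpub), here is a toy instance of RSA, which the book develops properly in Sect. 3.2. The primes below are absurdly small and chosen only for illustration; real keys use numbers hundreds of digits long.

```python
# A toy preview of the asymmetric setup (k_priv, k_pub) using RSA with
# tiny primes. Everything except (p, q, d_exp) may be made public.
p, q = 1009, 2003                          # secret primes (part of k_priv)
n = p * q                                  # public modulus
e_exp = 65537                              # public encryption exponent
d_exp = pow(e_exp, -1, (p - 1) * (q - 1))  # private decryption exponent

def e_kpub(m):    # anyone who knows (n, e_exp) can encrypt
    return pow(m, e_exp, n)

def d_kpriv(c):   # only the holder of d_exp can easily decrypt
    return pow(c, d_exp, n)

m = 123456
assert m < n
assert d_kpriv(e_kpub(m)) == m  # d_kpriv(e_kpub(m)) = m
```

The trapdoor here is the factorization n = p · q: knowing it makes computing d_exp easy, while Eve, who sees only (n, e_exp) and the ciphertext, would have to recover it the hard way.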
Exercises

Section 1.1. Simple Substitution Ciphers

1.1. Build a cipher wheel as illustrated in Fig. 1.1, but with an inner wheel that rotates, and use it to complete the following tasks. (For your convenience, there is a cipher wheel that you can print and cut out at www.math.brown.edu/~jhs/MathCrypto/CipherWheel.pdf.)
(a) Encrypt the following plaintext using a rotation of 11 clockwise. "A page of history is worth a volume of logic."
(b) Decrypt the following message, which was encrypted with a rotation of 7 clockwise.
AOLYLHYLUVZLJYLAZILAALYAOHUAOLZLJYLAZAOHALCLYFIVKFNBLZZLZ (c) Decrypt the following message, which was encrypted by rotating 1 clockwise for the first letter, then 2 clockwise for the second letter, etc. XJHRFTNZHMZGAHIUETXZJNBWNUTRHEPOMDNBJMAUGORFAOIZOCC
a b c d e f g h i j k l m n o p q r s t u v w x y z
S C J A X U F B Q K T P R W E Z H V L I G Y D N M O
Table 1.11: Simple substitution encryption table for Exercise 1.3
1.2. Decrypt each of the following Caesar encryptions by trying the various possible shifts until you obtain readable text. (a) LWKLQNWKDWLVKDOOQHYHUVHHDELOOERDUGORYHOBDVDWUHH (b) UXENRBWXCUXENFQRLQJUCNABFQNWRCJUCNAJCRXWORWMB (c) BGUTBMBGZTFHNLXMKTIPBMAVAXXLXTEPTRLEXTOXKHHFYHKMAXFHNLX
1.3. For this exercise, use the simple substitution table given in Table 1.11. (a) Encrypt the plaintext message The gold is hidden in the garden. (b) Make a decryption table, that is, make a table in which the ciphertext alphabet is in order from A to Z and the plaintext alphabet is mixed up. (c) Use your decryption table from (b) to decrypt the following message. IBXLX JVXIZ SLLDE VAQLL DEVAU QLB
1.4. Each of the following messages has been encrypted using a simple substitution cipher. Decrypt them. For your convenience, we have given you a frequency table and a list of the most common bigrams that appear in the ciphertext. (If you do not want to recopy the ciphertexts by hand, they can be downloaded or printed from the web site listed in the preface.) (a) “A Piratical Treasure” JNRZR BNIGI BJRGZ IZLQR OTDNJ GRIHT USDKR ZZWLG OIBTM NRGJN IJTZJ LZISJ NRSBL QVRSI ORIQT QDEKJ JNRQW GLOFN IJTZX QLFQL WBIMJ ITQXT HHTBL KUHQL JZKMM LZRNT OBIMI EURLW BLQZJ GKBJT QDIQS LWJNR OLGRI EZJGK ZRBGS MJLDG IMNZT OIHRK MOSOT QHIJL QBRJN IJJNT ZFIZL WIZTO MURZM RBTRZ ZKBNN LFRVR GIZFL KUHIM MRIGJ LJNRB GKHRT QJRUU RBJLW JNRZI TULGI EZLUK JRUST QZLUK EURFT JNLKJ JNRXR S The ciphertext contains 316 letters.
Here is a frequency table: R J I L Z T N Q B G K U M O S H W F E D X V Freq 33 30 27 25 24 20 19 16 15 15 13 12 12 10 9 8 7 6 5 5 3 2 The most frequent bigrams are: JN (11 times), NR (8 times), TQ (6 times), and LW, RB, RZ, and JL (5 times each).
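Tallies like the one above are tedious to produce by hand. A short Python sketch (the function names are our own) that counts letter and bigram frequencies, together with a rotation decrypter for the shift ciphers of Exercises 1.1 and 1.2:

```python
from collections import Counter

def frequencies(ciphertext):
    """Letter and bigram counts of a ciphertext, ignoring spaces."""
    text = ciphertext.replace(" ", "")
    letters = Counter(text)
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return letters, bigrams

def shift_decrypt(ciphertext, shift):
    """Undo a Caesar cipher that rotated each letter `shift` places clockwise."""
    return "".join(chr((ord(c) - ord("A") - shift) % 26 + ord("A"))
                   for c in ciphertext if c.isalpha())

# Count the opening groups of the "Piratical Treasure" ciphertext.
letters, bigrams = frequencies("JNRZR BNIGI BJRGZ")

# A rotation of 3 clockwise turns HELLO into KHOOR; undo it:
print(shift_decrypt("KHOOR", 3))  # → HELLO
```

For Exercise 1.2 one can simply loop `shift` over 1–25 and eyeball which output is readable English.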
  • 67. Exercises 49 (b) “A Botanical Code” KZRNK GJKIP ZBOOB XLCRG BXFAU GJBNG RIXRU XAFGJ BXRME MNKNG BURIX KJRXR SBUER ISATB UIBNN RTBUM NBIGK EBIGR OCUBR GLUBN JBGRL SJGLN GJBOR ISLRS BAFFO AZBUN RFAUS AGGBI NGLXM IAZRX RMNVL GEANG CJRUE KISRM BOOAZ GLOKW FAUKI NGRIC BEBRI NJAWB OBNNO ATBZJ KOBRC JKIRR NGBUE BRINK XKBAF QBROA LNMRG MALUF BBG The ciphertext contains 253 letters. Here is a frequency table: B R G N A I U K O J L X M F S E Z C T W P V Q Freq 32 28 22 20 16 16 14 13 12 11 10 10 8 8 7 7 6 5 3 2 1 1 1 The most frequent bigrams are: NG and RI (7 times each), BU (6 times), and BR (5 times). (c) In order to make this one a bit more challenging, we have removed all occur- rences of the word “the” from the plaintext. “A Brilliant Detective” GSZES GNUBE SZGUG SNKGX CSUUE QNZOQ EOVJN VXKNG XGAHS AWSZZ BOVUE SIXCQ NQESX NGEUG AHZQA QHNSP CIPQA OIDLV JXGAK CGJCG SASUB FVQAV CIAWN VWOVP SNSXV JGPCV NODIX GJQAE VOOXC SXXCG OGOVA XGNVU BAVKX QZVQD LVJXQ EXCQO VKCQG AMVAX VWXCG OOBOX VZCSO SPPSN VAXUB DVVAX QJQAJ VSUXC SXXCV OVJCS NSJXV NOJQA MVBSZ VOOSH VSAWX QHGMV GWVSX CSXXC VBSNV ZVNVN SAWQZ ORVXJ CVOQE JCGUW NVA The ciphertext contains 313 letters. Here is a frequency table: V S X G A O Q C N J U Z E W B P I H K D M L R F Freq 39 29 29 22 21 21 20 20 19 13 11 11 10 8 8 6 5 5 5 4 3 2 1 1 The most frequent bigrams are: XC (10 times), NV (7 times), and CS, OV, QA, and SX (6 times each). 1.5. Suppose that you have an alphabet of 26 letters. (a) How many possible simple substitution ciphers are there? (b) A letter in the alphabet is said to be fixed if the encryption of the letter is the letter itself. How many simple substitution ciphers are there that leave: (i) No letters fixed? (ii) At least one letter fixed? (iii) Exactly one letter fixed? (iv) At least two letters fixed? (Part (b) is quite challenging! You might try doing the problem first with an alphabet of four or five letters to get an idea of what is going on.) Section 1.2. 
Divisibility and Greatest Common Divisors 1.6. Let a, b, c ∈ Z. Use the definition of divisibility to directly prove the following properties of divisibility. (This is Proposition 1.4.) (a) If a | b and b | c, then a | c. (b) If a | b and b | a, then a = ±b. (c) If a | b and a | c, then a | (b + c) and a | (b − c). 1.7. Use a calculator and the method described in Remark 1.9 to compute the following quotients and remainders. (a) 34787 divided by 353.
  • 68. 50 Exercises (b) 238792 divided by 7843. (c) 9829387493 divided by 873485. (d) 1498387487 divided by 76348. 1.8. Use a calculator and the method described in Remark 1.9 to compute the following remainders, without bothering to compute the associated quotients. (a) The remainder of 78745 divided by 127. (b) The remainder of 2837647 divided by 4387. (c) The remainder of 8739287463 divided by 18754. (d) The remainder of 4536782793 divided by 9784537. 1.9. Use the Euclidean algorithm to compute the following greatest common divi- sors. (a) gcd(291, 252). (b) gcd(16261, 85652). (c) gcd(139024789, 93278890). (d) gcd(16534528044, 8332745927). 1.10. For each of the gcd(a, b) values in Exercise 1.9, use the extended Euclidean algorithm (Theorem 1.11) to find integers u and v such that au + bv = gcd(a, b). 1.11. Let a and b be positive integers. (a) Suppose that there are integers u and v satisfying au + bv = 1. Prove that gcd(a, b) = 1. (b) Suppose that there are integers u and v satisfying au + bv = 6. Is it necessarily true that gcd(a, b) = 6? If not, give a specific counterexample, and describe in general all of the possible values of gcd(a, b)? (c) Suppose that (u1, v1) and (u2, v2) are two solutions in integers to the equation au + bv = 1. Prove that a divides v2 − v1 and that b divides u2 − u1. (d) More generally, let g = gcd(a, b) and let (u0, v0) be a solution in integers to au + bv = g. Prove that every other solution has the form u = u0 + kb/g and v = v0 − ka/g for some integer k. (This is the second part of Theorem 1.11.) 1.12. The method for solving au+bv = gcd(a, b) described in Sect. 1.2 is somewhat inefficient. This exercise describes a method to compute u and v that is well suited for computer implementation. In particular, it uses very little storage. (a) Show that the following algorithm computes the greatest common divisor g of the positive integers a and b, together with a solution (u, v) in integers to the equation au + bv = gcd(a, b). 1. 
Set u = 1, g = a, x = 0, and y = b
2. If y = 0, set v = (g − au)/b and return the values (g, u, v)
3. Divide g by y with remainder, g = qy + t, with 0 ≤ t < y
4. Set s = u − qx
5. Set u = x and g = y
6. Set x = s and y = t
7. Go to Step (2)
(b) Implement the above algorithm on a computer using the computer language of your choice.
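As a starting point for (b), here is one possible Python transcription of the algorithm in (a); it deliberately inherits the b = 0 flaw that part (d) asks about:

```python
def ext_gcd(a, b):
    """Low-storage extended Euclidean algorithm from Exercise 1.12(a).
    Returns (g, u, v) with g = gcd(a, b) and a*u + b*v = g.
    Note: fails when b == 0, exactly the defect examined in part (d)."""
    u, g, x, y = 1, a, 0, b
    while y != 0:
        q, t = divmod(g, y)   # Step 3: g = q*y + t with 0 <= t < y
        s = u - q * x         # Step 4
        u, g = x, y           # Step 5
        x, y = s, t           # Step 6
    v = (g - a * u) // b      # Step 2's return branch
    return g, u, v

# A small example (not one of the pairs from part (c)):
print(ext_gcd(2024, 748))     # → (44, -7, 19)
```

Indeed 2024 · (−7) + 748 · 19 = −14168 + 14212 = 44 = gcd(2024, 748).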
(c) Use your program to compute g = gcd(a, b) and integer solutions to the equation au + bv = g for the following pairs (a, b). (i) (527, 1258) (ii) (228, 1056) (iii) (163961, 167181) (iv) (3892394, 239847) (d) What happens to your program if b = 0? Fix the program so that it deals with this case correctly. (e) It is often useful to have a solution with u > 0. Modify your program so that it returns a solution with u > 0 and u as small as possible. [Hint. If (u, v) is a solution, then so is (u + b/g, v − a/g).] Redo (c) using your modified program.
1.13. Let a1, a2, . . . , ak be integers with gcd(a1, a2, . . . , ak) = 1, i.e., the largest positive integer dividing all of a1, . . . , ak is 1. Prove that the equation a1u1 + a2u2 + · · · + akuk = 1 has a solution in integers u1, u2, . . . , uk. (Hint. Repeatedly apply the extended Euclidean algorithm, Theorem 1.11. You may find it easier to prove a more general statement in which gcd(a1, . . . , ak) is allowed to be larger than 1.)
1.14. Let a and b be integers with b > 0. We’ve been using the “obvious fact” that a divided by b has a unique quotient and remainder. In this exercise you will give a proof. (a) Prove that the set {a − bq : q ∈ Z} contains at least one non-negative integer. (b) Let r be the smallest non-negative integer in the set described in (a). Prove that 0 ≤ r < b. (c) Prove that there are integers q and r satisfying a = bq + r and 0 ≤ r < b. (d) Suppose that a = bq1 + r1 = bq2 + r2 with 0 ≤ r1 < b and 0 ≤ r2 < b. Prove that q1 = q2 and r1 = r2.
Section 1.3. Modular Arithmetic
1.15. Let m ≥ 1 be an integer and suppose that a1 ≡ a2 (mod m) and b1 ≡ b2 (mod m). Prove that a1 ± b1 ≡ a2 ± b2 (mod m) and a1 · b1 ≡ a2 · b2 (mod m). (This is Proposition 1.13(a).)
1.16. Write out the following tables for Z/mZ and (Z/mZ)∗, as we did in Figs. 1.4 and 1.5.
(a) Make addition and multiplication tables for Z/3Z. (b) Make addition and multiplication tables for Z/6Z. (c) Make a multiplication table for the unit group (Z/9Z)∗. (d) Make a multiplication table for the unit group (Z/16Z)∗.
1.17. Do the following modular computations. In each case, fill in the box with an integer between 0 and m − 1, where m is the modulus. (a) 347 + 513 ≡ (mod 763). (b) 3274 + 1238 + 7231 + 6437 ≡ (mod 9254). (c) 153 · 287 ≡ (mod 353). (d) 357 · 862 · 193 ≡ (mod 943). (e) 5327 · 6135 · 7139 · 2187 · 5219 · 1873 ≡ (mod 8157). (Hint. After each multiplication, reduce modulo 8157 before doing the next multiplication.) (f) 137^2 ≡ (mod 327). (g) 373^6 ≡ (mod 581). (h) 23^3 · 19^5 · 11^4 ≡ (mod 97).
1.18. Find all values of x between 0 and m − 1 that are solutions of the following congruences. (Hint. If you can’t figure out a clever way to find the solution(s), you can just substitute each value x = 1, x = 2,. . . , x = m − 1 and see which ones work.) (a) x + 17 ≡ 23 (mod 37). (b) x + 42 ≡ 19 (mod 51). (c) x^2 ≡ 3 (mod 11). (d) x^2 ≡ 2 (mod 13). (e) x^2 ≡ 1 (mod 8). (f) x^3 − x^2 + 2x − 2 ≡ 0 (mod 11). (g) x ≡ 1 (mod 5) and also x ≡ 2 (mod 7). (Find all solutions modulo 35, that is, find the solutions satisfying 0 ≤ x ≤ 34.)
1.19. Suppose that g^a ≡ 1 (mod m) and that g^b ≡ 1 (mod m). Prove that g^gcd(a,b) ≡ 1 (mod m).
1.20. Prove that if a1 and a2 are units modulo m, then a1a2 is a unit modulo m.
1.21. Prove that m is prime if and only if φ(m) = m − 1, where φ is Euler’s phi function.
1.22. Let m ∈ Z. (a) Suppose that m is odd. What integer between 1 and m − 1 equals 2^−1 mod m? (b) More generally, suppose that m ≡ 1 (mod b). What integer between 1 and m − 1 is equal to b^−1 mod m?
1.23. Let m be an odd integer and let a be any integer. Prove that 2m + a^2 can never be a perfect square. (Hint. If a number is a perfect square, what are its possible values modulo 4?)
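Computations like those in Exercises 1.17 and 1.18 are easy to check by machine; a few illustrative Python one-liners (the congruence in the last snippet is our own example, not one of the exercise parts):

```python
# Three-argument pow computes a^b (mod m) efficiently, e.g. 137^2 (mod 327):
c = pow(137, 2, 327)

# Long products, as in Exercise 1.17(e): reduce after each multiplication
# so that intermediate values never grow large.
prod = 1
for factor in [5327, 6135, 7139, 2187, 5219, 1873]:
    prod = prod * factor % 8157

# Small congruences can be brute-forced, as the hint to Exercise 1.18
# suggests; e.g. the solutions of x^2 ≡ 2 (mod 7):
sols = [x for x in range(7) if (x * x - 2) % 7 == 0]
print(sols)  # → [3, 4]
```

The same brute-force pattern handles every part of Exercise 1.18 by swapping in the appropriate polynomial and modulus.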
1.24. (a) Find a single value x that simultaneously solves the two congruences x ≡ 3 (mod 7) and x ≡ 4 (mod 9). (Hint. Note that every solution of the first congruence looks like x = 3 + 7y for some y. Substitute this into the second congruence and solve for y; then use that to get x.) (b) Find a single value x that simultaneously solves the two congruences x ≡ 13 (mod 71) and x ≡ 41 (mod 97). (c) Find a single value x that simultaneously solves the three congruences x ≡ 4 (mod 7), x ≡ 5 (mod 8), and x ≡ 11 (mod 15). (d) Prove that if gcd(m, n) = 1, then the pair of congruences x ≡ a (mod m) and x ≡ b (mod n) has a solution for any choice of a and b. Also give an example to show that the condition gcd(m, n) = 1 is necessary.
1.25. Let N, g, and A be positive integers (note that N need not be prime). Prove that the following algorithm, which is a low-storage variant of the square-and-multiply algorithm described in Sect. 1.3.2, returns the value g^A (mod N). (In Step 4 we use the notation ⌊x⌋ to denote the greatest integer function, i.e., round x down to the nearest integer.)
Input. Positive integers N, g, and A.
1. Set a = g and b = 1.
2. Loop while A > 0.
3. If A ≡ 1 (mod 2), set b = b · a (mod N).
4. Set a = a^2 (mod N) and A = ⌊A/2⌋.
5. If A > 0, continue with loop at Step 2.
6. Return the number b, which equals g^A (mod N).
1.26. Use the square-and-multiply algorithm described in Sect. 1.3.2, or the more efficient version in Exercise 1.25, to compute the following powers. (a) 17^183 (mod 256). (b) 2^477 (mod 1000). (c) 11^507 (mod 1237).
1.27. Consider the congruence ax ≡ c (mod m). (a) Prove that there is a solution if and only if gcd(a, m) divides c. (b) If there is a solution, prove that there are exactly gcd(a, m) distinct solutions modulo m. (Hint. Use the extended Euclidean algorithm (Theorem 1.11).)
Section 1.4. Prime Numbers, Unique Factorization, and Finite Fields
1.28. Let {p1, p2, . . . , pr} be a set of prime numbers, and let N = p1p2 · · · pr + 1. Prove that N is divisible by some prime not in the original set. Use this fact to deduce that there must be infinitely many prime numbers. (This proof of the infinitude of primes appears in Euclid’s Elements. Prime numbers have been studied for thousands of years.)
1.29. Without using the fact that every integer has a unique factorization into primes, prove that if gcd(a, b) = 1 and if a | bc, then a | c. (Hint. Use the fact that it is possible to find a solution to au + bv = 1.)
1.30. Compute the following ord_p values: (a) ord_2(2816). (b) ord_7(2222574487). (c) ord_p(46375) for each of p = 3, 5, 7, and 11.
1.31. Let p be a prime number. Prove that ord_p has the following properties. (a) ord_p(ab) = ord_p(a) + ord_p(b). (Thus ord_p resembles the logarithm function, since it converts multiplication into addition!) (b) ord_p(a + b) ≥ min(ord_p(a), ord_p(b)). (c) If ord_p(a) ≠ ord_p(b), then ord_p(a + b) = min(ord_p(a), ord_p(b)). A function satisfying properties (a) and (b) is called a valuation.
Section 1.5. Powers and Primitive Roots in Finite Fields
1.32. For each of the following primes p and numbers a, compute a^−1 mod p in two ways: (i) Use the extended Euclidean algorithm. (ii) Use the fast power algorithm and Fermat’s little theorem. (See Example 1.27.) (a) p = 47 and a = 11. (b) p = 587 and a = 345. (c) p = 104801 and a = 78467.
1.33. Let p be a prime and let q be a prime that divides p − 1. (a) Let a ∈ F_p^* and let b = a^((p−1)/q). Prove that either b = 1 or else b has order q. (Recall that the order of b is the smallest k ≥ 1 such that b^k = 1 in F_p^*. Hint. Use Proposition 1.29.) (b) Suppose that we want to find an element of F_p^* of order q. Using (a), we can randomly choose a value of a ∈ F_p^* and check whether b = a^((p−1)/q) satisfies b ≠ 1. How likely are we to succeed? In other words, compute the value of the ratio #{a ∈ F_p^* : a^((p−1)/q) ≠ 1} / #F_p^*. (Hint.
Use Theorem 1.30.) 1.34. Recall that g is called a primitive root modulo p if the powers of g give all nonzero elements of Fp. (a) For which of the following primes is 2 a primitive root modulo p? (i) p = 7 (ii) p = 13 (iii) p = 19 (iv) p = 23
(b) For which of the following primes is 3 a primitive root modulo p? (i) p = 5 (ii) p = 7 (iii) p = 11 (iv) p = 17 (c) Find a primitive root for each of the following primes. (i) p = 23 (ii) p = 29 (iii) p = 41 (iv) p = 43 (d) Find all primitive roots modulo 11. Verify that there are exactly φ(10) of them, as asserted in Remark 1.32. (e) Write a computer program to check for primitive roots and use it to find all primitive roots modulo 229. Verify that there are exactly φ(228) of them. (f) Use your program from (e) to find all primes less than 100 for which 2 is a primitive root. (g) Repeat the previous exercise to find all primes less than 100 for which 3 is a primitive root. Ditto to find the primes for which 4 is a primitive root.
1.35. Let p be a prime such that q = (p − 1)/2 is also prime. Suppose that g is an integer satisfying g ≢ 0 (mod p) and g ≢ ±1 (mod p) and g^q ≢ 1 (mod p). Prove that g is a primitive root modulo p.
1.36. This exercise begins the study of squares and square roots modulo p. (a) Let p be an odd prime number and let b be an integer with p ∤ b. Prove that either b has two square roots modulo p or else b has no square roots modulo p. In other words, prove that the congruence X^2 ≡ b (mod p) has either two solutions or no solutions in Z/pZ. (What happens for p = 2? What happens if p | b?) (b) For each of the following values of p and b, find all of the square roots of b modulo p. (i) (p, b) = (7, 2) (ii) (p, b) = (11, 5) (iii) (p, b) = (11, 7) (iv) (p, b) = (37, 3) (c) How many square roots does 29 have modulo 35? Why doesn’t this contradict the assertion in (a)? (d) Let p be an odd prime and let g be a primitive root modulo p. Then any number a is equal to some power of g modulo p, say a ≡ g^k (mod p). Prove that a has a square root modulo p if and only if k is even.
1.37. Let p ≥ 3 be a prime and suppose that the congruence X^2 ≡ b (mod p) has a solution.
(a) Prove that for every exponent e ≥ 1 the congruence X^2 ≡ b (mod p^e) (1.14) has a solution. (Hint. Use induction on e. Build a solution modulo p^(e+1) by suitably modifying a solution modulo p^e.) (b) Let X = α be a solution to X^2 ≡ b (mod p). Prove that in (a), we can find a solution X = β to X^2 ≡ b (mod p^e) that also satisfies β ≡ α (mod p).
(c) Let β and β′ be two solutions as in (b). Prove that β ≡ β′ (mod p^e). (d) Use Exercise 1.36 to deduce that the congruence (1.14) has either two solutions or no solutions modulo p^e.
1.38. Compute the value of 2^((p−1)/2) (mod p) for every prime 3 ≤ p < 20. Make a conjecture as to the possible values of 2^((p−1)/2) (mod p) when p is prime and prove that your conjecture is correct.
Section 1.6. Cryptography by Hand
1.39. Write a 2–5 page paper on one of the following topics, including both cryptographic information and placing events in their historical context: (a) Cryptography in the Arab world to the fifteenth century. (b) European cryptography in the fifteenth and early sixteenth centuries. (c) Cryptography and cryptanalysis in Elizabethan England. (d) Cryptography and cryptanalysis in the nineteenth century. (e) Cryptography and cryptanalysis during World War I. (f) Cryptography and cryptanalysis during World War II. (Most of these topics are too broad for a short term paper, so you should choose a particular aspect on which to concentrate.)
1.40. A homophonic cipher is a substitution cipher in which there may be more than one ciphertext symbol for each plaintext letter. Here is an example of a homophonic cipher, where the more common letters have several possible replacements.
a b c d e f g h i j k l m n o p q r s t u v w x y z
! 4 # $ 1 % * ( ) 3 2 = + [ 9 ] { } : ; 7 5 ? ♥ ◦ ℵ 6
  • 75. ♦ ∧ Δ ∇ 8 ♣ Ω ∨ ⊗ ♠ Θ ∞ ⇑ • ⊕ ⇐ ⇓ ⇒ Decrypt the following message. ( % Δ ♠ ⇒ # 4 ∞ : ♦ 6
[ ℵ 8 % 2 [ 7 ⇓ ♣ ♥ 5 ∇
1.41. A transposition cipher is a cipher in which the letters of the plaintext remain the same, but their order is rearranged. Here is a simple example in which the message is encrypted in blocks of 25 letters at a time.[19] Take the given 25 letters and arrange them in a 5-by-5 block by writing the message horizontally on the lines. For example, the first 25 letters of the message Now is the time for all good men to come to the aid... is written as
N O W I S
T H E T I
M E F O R
A L L G O
O D M E N
[19] If the number of letters in the message is not an even multiple of 25, then extra random letters are appended to the end of the message.
Now the ciphertext is formed by reading the letters down the columns, which gives the ciphertext NTMAO OHELD WEFLM ITOGE SIRON. (a) Use this transposition cipher to encrypt the first 25 letters of the message Four score and seven years ago our fathers... (b) The following message was encrypted using this transposition cipher. Decrypt it. WNOOA HTUFN EHRHE NESUV ICEME (c) There are many variations on this type of cipher. We can form the letters into a rectangle instead of a square, and we can use various patterns to place the letters into the rectangle and to read them back out. Try to decrypt the following ciphertext, in which the letters were placed horizontally into a rectangle of some size and then read off vertically by columns. WHNCE STRHT TEOOH ALBAT DETET SADHE LEELL QSFMU EEEAT VNLRI ATUDR HTEEA (For convenience, we’ve written the ciphertext in 5 letter blocks, but that doesn’t necessarily mean that the rectangle has a side of length 5.)
Section 1.7. Symmetric Ciphers and Asymmetric Ciphers
1.42. Encode the following phrase (including capitalization, spacing and punctuation) into a string of bits using the ASCII encoding scheme given in Table 1.10. Bad day, Dad.
1.43. Consider the affine cipher with key k = (k1, k2) whose encryption and decryption functions are given by (1.11) on page 43. (a) Let p = 541 and let the key be k = (34, 71). Encrypt the message m = 204. Decrypt the ciphertext c = 431. (b) Assuming that p is public knowledge, explain why the affine cipher is vulnerable to a known plaintext attack. (See Property 4 on page 38.) How many plaintext/ciphertext pairs are likely to be needed in order to recover the private key? (c) Alice and Bob decide to use the prime p = 601 for their affine cipher. The value of p is public knowledge, and Eve intercepts the ciphertexts c1 = 324 and c2 = 381 and also manages to find out that the corresponding plaintexts are m1 = 387 and m2 = 491.
Determine the private key and then use it to encrypt the message m3 = 173. (d) Suppose now that p is not public knowledge. Is the affine cipher still vulnerable to a known plaintext attack? If so, how many plaintext/ciphertext pairs are likely to be needed in order to recover the private key?
1.44. Consider the Hill cipher defined by (1.11), ek(m) ≡ k1 · m + k2 (mod p) and dk(c) ≡ k1^−1 · (c − k2) (mod p), where m, c, and k2 are column vectors of dimension n, and k1 is an n-by-n matrix.
(a) We use the vector Hill cipher with p = 7 and the key k1 = ( 1 3 2 2 ) and k2 = ( 5 4 ). (i) Encrypt the message m = ( 2 1 ). (ii) What is the matrix k1^−1 used for decryption? (iii) Decrypt the message c = ( 3 5 ). (b) Explain why the Hill cipher is vulnerable to a known plaintext attack. (c) The following plaintext/ciphertext pairs were generated using a Hill cipher with the prime p = 11. Find the keys k1 and k2. m1 = ( 5 4 ), c1 = ( 1 8 ), m2 = ( 8 10 ), c2 = ( 8 5 ), m3 = ( 7 1 ), c3 = ( 8 7 ). (d) Explain how any simple substitution cipher that involves a permutation of the alphabet can be thought of as a special case of a Hill cipher.
1.45. Let N be a large integer and let K = M = C = Z/NZ. For each of the functions e : K × M −→ C listed in (a)–(c), answer the following questions: • Is e an encryption function? • If e is an encryption function, what is its associated decryption function d? • If e is not an encryption function, can you make it into an encryption function by using some smaller, yet reasonably large, set of keys? (a) ek(m) ≡ k − m (mod N). (b) ek(m) ≡ k · m (mod N). (c) ek(m) ≡ (k + m)^2 (mod N).
1.46. (a) Convert the 12 bit binary number 110101100101 into a decimal integer between 0 and 2^12 − 1. (b) Convert the decimal integer m = 37853 into a binary number. (c) Convert the decimal integer m = 9487428 into a binary number. (d) Use exclusive or (XOR) to “add” the bit strings 11001010 ⊕ 10011010. (e) Convert the decimal numbers 8734 and 5177 into binary numbers, combine them using XOR, and convert the result back into a decimal number.
1.47. Alice and Bob choose a key space K containing 2^56 keys. Eve builds a special-purpose computer that can check 10,000,000,000 keys per second. (a) How many days does it take Eve to check half of the keys in K? (b) Alice and Bob replace their key space with a larger set containing 2^B different keys.
How large should Alice and Bob choose B in order to force Eve’s computer to spend 100 years checking half the keys? (Use the approximation that there are 365.25 days in a year.) For many years the United States government recommended a symmetric cipher called DES that used 56 bit keys. During the 1990s, people built special purpose computers demonstrating that 56 bits provided insufficient security. A new sym- metric cipher called AES, with 128 bit keys, was developed to replace DES. See Sect. 8.12 for further information about DES and AES. 1.48. Explain why the cipher ek(m) = k ⊕ m and dk(c) = k ⊕ c
defined by XOR of bit strings is not secure against a known plaintext attack. Demonstrate your attack by finding the private key used to encrypt the 16-bit ciphertext c = 1001010001010111 if you know that the corresponding plaintext is m = 0010010000101100.
1.49. Alice and Bob create a symmetric cipher as follows. Their private key k is a large integer and their messages (plaintexts) are d-digit integers M = {m ∈ Z : 0 ≤ m < 10^d}. To encrypt a message, Alice computes √k to d decimal places, throws away the part to the left of the decimal point, and keeps the remaining d digits. Let α be this d-digit number. (For example, if k = 87 and d = 6, then √87 = 9.32737905 . . . and α = 327379.) Alice encrypts a message m as c ≡ m + α (mod 10^d). Since Bob knows k, he can also find α, and then he decrypts c by computing m ≡ c − α (mod 10^d). (a) Alice and Bob choose the secret key k = 11 and use it to encrypt 6-digit integers (i.e., d = 6). Bob wants to send Alice the message m = 328973. What is the ciphertext that he sends? (b) Alice and Bob use the secret key k = 23 and use it to encrypt 8-digit integers. Alice receives the ciphertext c = 78183903. What is the plaintext m? (c) Show that the number α used for encryption and decryption is given by the formula α = ⌊10^d(√k − ⌊√k⌋)⌋, where ⌊t⌋ denotes the greatest integer that is less than or equal to t. (d) (Challenge Problem) If Eve steals a plaintext/ciphertext pair (m, c), then it is clear that she can recover the number α, since α ≡ c − m (mod 10^d). If 10^d is large compared to k, can she also recover the number k? This might be useful, for example, if Alice and Bob use some of the other digits of √k to encrypt subsequent messages.
1.50. Bob and Alice use a cryptosystem in which their private key is a (large) prime k and their plaintexts and ciphertexts are integers. Bob encrypts a message m by computing the product c = km.
Eve intercepts the following two ciphertexts: c1 = 12849217045006222, c2 = 6485880443666222. Use the gcd method described in Sect. 1.7.4 to find Bob and Alice’s private key.
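Both of these attacks are quick to carry out by machine. A possible sketch (not the book's worked solution) for Exercises 1.48 and 1.50:

```python
from math import gcd

# Exercise 1.48: XOR is its own inverse, so a single known
# plaintext/ciphertext pair reveals the key directly: k = m XOR c.
c = int("1001010001010111", 2)
m = int("0010010000101100", 2)
k = c ^ m
print(format(k, "016b"))  # → 1011000001111011

# Exercise 1.50: both ciphertexts are multiples of the prime key k,
# so k divides gcd(c1, c2); it equals the gcd unless m1 and m2 happen
# to share a common factor as well.
c1 = 12849217045006222
c2 = 6485880443666222
g = gcd(c1, c2)
assert c1 % g == 0 and c2 % g == 0
```

In the second attack, factoring out any small common factor of g leaves the large prime k, which is the "gcd method" of Sect. 1.7.4 referred to in the exercise.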
  • 80. Chapter 2 Discrete Logarithms and Diffie–Hellman 2.1 The Birth of Public Key Cryptography In 1976, Whitfield Diffie and Martin Hellman published their now famous paper [38] entitled “New Directions in Cryptography.” In this paper they formulated the concept of a public key encryption system and made several groundbreaking contributions to this new field. A short time earlier, Ralph Merkle had independently isolated one of the fundamental problems and in- vented a public key construction for an undergraduate project in a computer science class at Berkeley, but this was little understood at the time. Merkle’s work “Secure communication over insecure channels” appeared in 1982 [83]. However, it turns out that the concept of public key encryption was orig- inally discovered by James Ellis while working at the British Government Communications Headquarters (GCHQ). Ellis’s discoveries in 1969 were clas- sified as secret material by the British government and were not declassi- fied and released until 1997, after his death. It is now known that two other researchers at GCHQ, Malcolm Williamson and Clifford Cocks, discovered the Diffie–Hellman key exchange algorithm and the RSA public key encryp- tion system, respectively, before their rediscovery and public dissemination by Diffie, Hellman, Rivest, Shamir, and Adleman. To learn more about the fas- cinating history of public key cryptography, see for example [37, 42, 63, 139]. The Diffie–Hellman publication was an extremely important event—it set forth the basic definitions and goals of a new field of mathematics/computer science, a field whose existence was dependent on the then emerging age of the digital computer. Indeed, their paper begins with a call to arms: © Springer Science+Business Media New York 2014 J. Hoffstein et al., An Introduction to Mathematical Cryptography, Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2 2 61
  • 81. 62 2. Discrete Logarithms and Diffie–Hellman We stand today on the brink of a revolution in cryptography. An original or breakthrough scientific idea is often called revolutionary, but in this instance, as the authors were fully aware, the term revolutionary was relevant in another sense. Prior to the publication of “New Directions. . . ,” encryption research in the United States was the domain of the National Se- curity Agency, and all information in this area was classified. Indeed, until the mid-1990s, the United States government treated cryptographic algorithms as munitions, which meant that their export was prosecutable as a treasonable offense. Eventually, the government realized the futility of trying to prevent free and open discussion about abstract cryptographic algorithms and the dubious legality of restricting domestic use of strong cryptographic methods. However, in order to maintain some control, the government continued to re- strict export of high security cryptographic algorithms if they were “machine readable.” Their object, to prevent widespread global dissemination of so- phisticated cryptography programs to potential enemies of the United States, was laudable,1 but there were two difficulties that rendered the government’s policy unworkable. First, the existence of optical scanners creates a very blurry line between “machine readable” and “human text.” To protest the government’s policy, people wrote a three line version of the RSA algorithm in a programming language called perl and printed it on tee shirts and soda cans, thereby making these products into munitions. In principle, wearing an “RSA enabled” tee shirt on a flight from New York to Europe subjected the wearer to a large fine and a 10 year jail term. Even more amusing (or frightening, depending on your viewpoint), tattoos of the RSA perl code made people’s bodies into non-exportable munitions! 
Second, although these and other more serious protests and legal chal- lenges had some effect, the government’s policy was ultimately rendered moot by a simple reality. Public key algorithms are quite simple, and although it requires a certain expertise to implement them in a secure fashion, the world is full of excellent mathematicians and computer scientists and engineers. Thus government restrictions on the export of “strong crypto” simply encouraged the creation of cryptographic industries in other parts of the world. The gov- ernment was able to slow the adoption of strong crypto for a few years, but it is now possible for anyone to purchase for a nominal sum cryptographic software that allows completely secure communications.2 1It is surely laudable to keep potential weapons out of the hands of one’s enemies, but many have argued, with considerable justification, that the government also had the less benign objective of preventing other governments from using communication methods secure from United States prying. 2Of course, one never knows what cryptanalytic breakthroughs have been made by the scientists at the National Security Agency, since virtually all of their research is classified. The NSA is reputed to be the world’s largest single employer of Ph.D.s in mathematics. However, in contrast to the situation before the 1970s, there are now far more cryptographers employed in academia and in the business world than there are in government agencies.
Figure 2.1: Illustration of a one-way trapdoor function. (f : Domain → Range is easy to compute; the inverse f^−1 is hard to compute without the trapdoor information, but easy to compute with it.)
The first important contribution of Diffie and Hellman in [38] was the definition of a Public Key Cryptosystem (PKC) and its associated components—one-way functions and trapdoor information. A one-way function is an invertible function that is easy to compute, but whose inverse is difficult to compute. What does it mean to be “difficult to compute”? Intuitively, a function is difficult to compute if any algorithm that attempts to compute the inverse in a “reasonable” amount of time, e.g., less than the age of the universe, will almost certainly fail, where the phrase “almost certainly” must be defined probabilistically. (For a more rigorous definition of “hardness,” see Sect. 2.6.) Secure PKCs are built using one-way functions that have a trapdoor. The trapdoor is a piece of auxiliary information that allows the inverse to be easily computed. This idea is illustrated in Fig. 2.1, although it must be stressed that there is a vast chasm separating the abstract idea of a one-way trapdoor function and the actual construction of such a function. As described in Sect. 1.7.6, the key for a public key (or asymmetric) cryptosystem consists of two pieces, a private key kpriv and a public key kpub, where in practice kpub is computed by applying some key-creation algorithm to kpriv. For each public/private key pair (kpriv, kpub) there is an encryption algorithm ekpub and a corresponding decryption algorithm dkpriv. The encryption algorithm ekpub corresponding to kpub is public knowledge and easy to compute. Similarly, the decryption algorithm dkpriv must be easily computable by someone who knows the private key kpriv, but it should be very difficult to compute for someone who knows only the public key kpub.
One says that the private key kpriv is trapdoor information for the func- tion ekpub , because without the trapdoor information it is very hard to compute the inverse function to ekpub , but with the trapdoor information it is easy to compute the inverse. Notice that in particular, the function that is used to create kpub from kpriv must be difficult to invert, since kpub is public knowledge and kpriv allows efficient decryption. It may come as a surprise to learn that despite years of research, it is still not known whether one-way functions exist. In fact, a proof of the exis- tence of one-way functions would simultaneously solve the famous P = NP
  • 83. 64 2. Discrete Logarithms and Diffie–Hellman problem in complexity theory.3 Various candidates for one-way functions have been proposed, and some of them are used by modern public key encryption algorithms. But it must be stressed that the security of these cryptosystems rests on the assumption that inverting the underlying function (or finding the private key from the public one) is a hard problem. The situation is somewhat analogous to theories in physics that gain cred- ibility over time, as they fail to be disproved and continue to explain or gen- erate interesting phenomena. Diffie and Hellman made several suggestions in [38] for one-way functions, including knapsack problems and exponenti- ation mod q, but they did not produce an example of a PKC, mainly for lack of finding the right trapdoor information. They did, however, describe a public key method by which certain material could be securely shared over an insecure channel. Their method, which is now called Diffie–Hellman key exchange, is based on the assumption that the discrete logarithm problem (DLP) is difficult to solve. We discuss the DLP in Sect. 2.2, and then describe Diffie–Hellman key exchange in Sect. 2.3. In their paper, Diffie and Hellman also defined a variety of cryptanalytic attacks and introduced the important concepts of digital signatures and one-way authentication, which we discuss in Chap. 4 and Sect. 8.5. With the publication of [38] in 1976, the race was on to invent a practical public key cryptosystem. Within 2 years, two major papers describing public key cryptosystems were published: the RSA scheme of Rivest, Shamir, and Adleman [110] and the knapsack scheme of Merkle and Hellman [84]. Of these two, only RSA has withstood the test of time, in the sense that its underly- ing hard problem of integer factorization is still sufficiently computationally difficult to allow RSA to operate efficiently. 
By way of contrast, the knap- sack system of Merkle and Hellman was shown to be insecure at practical computational levels [124]. However, the cryptanalysis of knapsack systems introduces important links to hard computational problems in the theory of integer lattices that we explore in Chap. 7. 2.2 The Discrete Logarithm Problem The discrete logarithm problem is a mathematical problem that arises in many settings, including the mod p version described in this section and the elliptic curve version that will be studied later, in Chap. 6. The first published public key construction, due to Diffie and Hellman [38], is based on the discrete log- arithm problem in a finite field Fp, where recall that Fp is a field with a prime number of elements. (See Sect. 1.4.) For convenience, we interchangeably use the notations Fp and Z/pZ for this field, and we use equality notation for ele- ments of Fp and congruence notation for elements of Z/pZ (cf. Remark 1.23). 3The P = NP problem is one of the so-called Millennium Prizes, each of which has a $1,000,000 prize attached. See Sect. 5.7 for more on P versus NP.
Let p be a (large) prime. Theorem 1.30 tells us that there exists a primitive element g. This means that every nonzero element of F_p is equal to some power of g. In particular, g^(p−1) = 1 by Fermat's little theorem (Theorem 1.24), and no smaller positive power of g is equal to 1. Equivalently, the list of elements

    1, g, g^2, g^3, . . . , g^(p−2) ∈ F_p^*

is a complete list of the elements in F_p^* in some order.

Definition. Let g be a primitive root for F_p and let h be a nonzero element of F_p. The Discrete Logarithm Problem (DLP) is the problem of finding an exponent x such that

    g^x ≡ h (mod p).

The number x is called the discrete logarithm of h to the base g and is denoted by log_g(h).

Remark 2.1. An older term for the discrete logarithm is the index, denoted by ind_g(h). The index terminology is still commonly used in number theory. It is also convenient if there is a danger of confusion between ordinary logarithms and discrete logarithms, since, for example, the quantity log 2 frequently occurs in both contexts.

Remark 2.2. The discrete logarithm problem is a well-posed problem, namely to find an integer exponent x such that g^x = h. However, if there is one solution, then there are infinitely many, because Fermat's little theorem (Theorem 1.24) tells us that g^(p−1) ≡ 1 (mod p). Hence if x is a solution to g^x = h, then x + k(p − 1) is also a solution for every value of k, because

    g^(x+k(p−1)) = g^x · (g^(p−1))^k ≡ h · 1^k ≡ h (mod p).

Thus log_g(h) is defined only up to adding or subtracting multiples of p − 1. In other words, log_g(h) is really defined modulo p − 1. It is not hard to verify (Exercise 2.3(a)) that log_g gives a well-defined function⁴

    log_g : F_p^* −→ Z/(p − 1)Z.    (2.1)

Sometimes, for concreteness, we refer to "the" discrete logarithm as the integer x lying between 0 and p − 2 satisfying the congruence g^x ≡ h (mod p).

Remark 2.3.
It is not hard to prove (see Exercise 2.3(b)) that logg(ab) = logg(a) + logg(b) for all a, b ∈ F∗ p. 4If you have studied complex analysis, you may have noticed an analogy with the com- plex logarithm, which is not actually well defined on C∗. This is due to the fact that e2πi = 1, so log(z) is well defined only up to adding or subtracting multiples of 2πi. The complex logarithm thus defines an isomorphism from C∗ to the quotient group C/2πiZ, analogous to (2.1).
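The DLP as defined above can always be attacked by exhaustive search, and for small primes this is entirely practical. A minimal sketch in Python (the function name is our own choice) simply walks through the powers g, g^2, g^3, . . . until it meets h:

```python
def naive_dlog(g, h, p):
    """Exhaustive-search discrete logarithm: return x in [0, p-2] with
    g^x = h in F_p^*, or None if h is not a power of g modulo p."""
    power = 1
    for x in range(p - 1):       # x runs through 0, 1, ..., p-2
        if power == h % p:
            return x
        power = (power * g) % p  # advance from g^x to g^(x+1)
    return None
```

With the parameters used in Table 2.1 (p = 941, g = 627), this search reproduces the tabulated logarithms, and it also confirms log_2(38679) = 11235 modulo the prime 56509 of Example 2.4, although the running time is already noticeable there and hopeless at cryptographic sizes.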
Table 2.1: Powers and discrete logarithms for g = 627 modulo p = 941

     n   g^n mod p |  n   g^n mod p |  h   log_g(h) |  h   log_g(h)
     1      627    | 11      878    |  1      0     | 11     429
     2      732    | 12       21    |  2    183     | 12     835
     3      697    | 13      934    |  3    469     | 13     279
     4      395    | 14      316    |  4    366     | 14     666
     5      182    | 15      522    |  5    356     | 15     825
     6      253    | 16      767    |  6    652     | 16     732
     7      543    | 17       58    |  7    483     | 17     337
     8      760    | 18      608    |  8    549     | 18     181
     9      374    | 19      111    |  9    938     | 19      43
    10      189    | 20      904    | 10    539     | 20     722

Thus calling log_g a "logarithm" is reasonable, since it converts multiplication into addition in the same way as the usual logarithm function. In mathematical terminology, the discrete logarithm log_g is a group isomorphism from F_p^* to Z/(p − 1)Z.

Example 2.4. The number p = 56509 is prime, and one can check that g = 2 is a primitive root modulo p. How would we go about calculating the discrete logarithm of h = 38679? The only method that is immediately obvious is to compute 2^2, 2^3, 2^4, 2^5, 2^6, 2^7, . . . (mod 56509) until we find some power that equals 38679. It would be difficult to do this by hand, but using a computer, we find that log_2(h) = 11235. You can verify this by calculating 2^11235 mod 56509 and checking that it is equal to 38679.

Remark 2.5. It must be emphasized that the discrete logarithm bears little resemblance to the continuous logarithm defined on the real or complex numbers. The terminology is still reasonable, because in both instances the process of exponentiation is inverted, but exponentiation modulo p varies in a very irregular way with the exponent, contrary to the behavior of its continuous counterpart. The random-looking behavior of exponentiation modulo p is apparent from even a cursory glance at a table of values such as those in Table 2.1, where we list the first few powers and the first few discrete logarithms for the prime p = 941 and the base g = 627. The seeming randomness is also illustrated by the scatter graph of 627^i mod 941 pictured in Fig. 2.2.

Remark 2.6.
Our statement of the discrete logarithm problem includes the assumption that the base g is a primitive root modulo p, but this is not strictly necessary. In general, for any g ∈ F∗ p and any h ∈ F∗ p, the discrete logarithm problem is the determination of an exponent x satisfying gx ≡ h (mod p), assuming that such an x exists. More generally, rather than taking nonzero elements of a finite field Fp and multiplying them together or raising them to powers, we can take elements of
[Figure 2.2: Powers 627^i mod 941 for i = 1, 2, 3, . . .]

any group and use the group law instead of multiplication. This leads to the most general form of the discrete logarithm problem. (If you are unfamiliar with the theory of groups, we give a brief overview in Sect. 2.5.)

Definition. Let G be a group whose group law we denote by the symbol ⋆. The Discrete Logarithm Problem for G is to determine, for any two given elements g and h in G, an integer x satisfying

    g ⋆ g ⋆ g ⋆ · · · ⋆ g = h,   where the left-hand side contains x copies of g.

2.3 Diffie–Hellman Key Exchange

The Diffie–Hellman key exchange algorithm solves the following dilemma. Alice and Bob want to share a secret key for use in a symmetric cipher, but their only means of communication is insecure. Every piece of information that they exchange is observed by their adversary Eve. How is it possible for Alice and Bob to share a key without making it available to Eve? At first glance it appears that Alice and Bob face an impossible task. It was a brilliant insight of Diffie and Hellman that the difficulty of the discrete logarithm problem for F_p^* provides a possible solution.

The first step is for Alice and Bob to agree on a large prime p and a nonzero integer g modulo p. Alice and Bob make the values of p and g public
knowledge; for example, they might post the values on their web sites, so Eve knows them, too. For various reasons to be discussed later, it is best if they choose g such that its order in F_p^* is a large prime. (See Exercise 1.33 for a way of finding such a g.)

The next step is for Alice to pick a secret integer a that she does not reveal to anyone, while at the same time Bob picks an integer b that he keeps secret. Bob and Alice use their secret integers to compute

    A ≡ g^a (mod p)  (Alice computes this)   and   B ≡ g^b (mod p)  (Bob computes this).

They next exchange these computed values: Alice sends A to Bob and Bob sends B to Alice. Note that Eve gets to see the values of A and B, since they are sent over the insecure communication channel. Finally, Bob and Alice again use their secret integers to compute

    A′ ≡ B^a (mod p)  (Alice computes this)   and   B′ ≡ A^b (mod p)  (Bob computes this).

The values that they compute, A′ and B′ respectively, are actually the same, since

    A′ ≡ B^a ≡ (g^b)^a ≡ g^(ab) ≡ (g^a)^b ≡ A^b ≡ B′ (mod p).

This common value is their exchanged key. The Diffie–Hellman key exchange algorithm is summarized in Table 2.2.

    Public parameter creation: A trusted party chooses and publishes a (large)
        prime p and an integer g having large prime order in F_p^*.
    Private computations: Alice chooses a secret integer a and computes
        A ≡ g^a (mod p); Bob chooses a secret integer b and computes
        B ≡ g^b (mod p).
    Public exchange of values: Alice sends A to Bob; Bob sends B to Alice.
    Further private computations: Alice computes B^a (mod p); Bob computes
        A^b (mod p).
    The shared secret value is B^a ≡ (g^b)^a ≡ g^(ab) ≡ (g^a)^b ≡ A^b (mod p).

Table 2.2: Diffie–Hellman key exchange
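The exchange summarized in Table 2.2 is short enough to run directly. Here is a sketch in Python, using the small parameters that appear in Example 2.7; the variable names are our own choices.

```python
# Diffie-Hellman key exchange with the toy parameters of Example 2.7.
p, g = 941, 627              # public: prime and element of large order

a = 347                      # Alice's secret integer
b = 781                      # Bob's secret integer

A = pow(g, a, p)             # Alice sends A to Bob (Eve sees it)
B = pow(g, b, p)             # Bob sends B to Alice (Eve sees it)

shared_alice = pow(B, a, p)  # Alice computes B^a (mod p)
shared_bob = pow(A, b, p)    # Bob computes A^b (mod p)

# Both arrive at g^(ab) (mod p), their shared secret.
assert shared_alice == shared_bob
```

Note that only A and B cross the channel; the secrets a and b never leave the parties' own machines.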
Example 2.7. Alice and Bob agree to use the prime p = 941 and the primitive root g = 627. Alice chooses the secret key a = 347 and computes A = 390 ≡ 627^347 (mod 941). Similarly, Bob chooses the secret key b = 781 and computes B = 691 ≡ 627^781 (mod 941). Alice sends Bob the number 390 and Bob sends Alice the number 691. Both of these transmissions are done over an insecure channel, so both A = 390 and B = 691 should be considered public knowledge. The numbers a = 347 and b = 781 are not transmitted and remain secret. Then Alice and Bob are both able to compute the number

    470 ≡ 627^(347·781) ≡ A^b ≡ B^a (mod 941),

so 470 is their shared secret.

Suppose that Eve sees this entire exchange. She can reconstitute Alice's and Bob's shared secret if she can solve either of the congruences

    627^a ≡ 390 (mod 941)   or   627^b ≡ 691 (mod 941),

since then she will know one of their secret exponents. As far as is known, this is the only way for Eve to find the secret shared value without Alice's or Bob's assistance.

Of course, our example uses numbers that are much too small to afford Alice and Bob any real security, since it takes very little time for Eve's computer to check all possible powers of 627 modulo 941. Current guidelines suggest that Alice and Bob choose a prime p having approximately 1000 bits (i.e., p ≈ 2^1000) and an element g whose order is prime and approximately p/2. Then Eve will face a truly difficult task.

In general, Eve's dilemma is this. She knows the values of A and B, so she knows the values of g^a and g^b. She also knows the values of g and p, so if she can solve the DLP, then she can find a and b, after which it is easy for her to compute Alice and Bob's shared secret value g^(ab). It appears that Alice and Bob are safe provided that Eve is unable to solve the DLP, but this is not quite correct.
It is true that one method of finding Alice and Bob’s shared value is to solve the DLP, but that is not the precise problem that Eve needs to solve. The security of Alice’s and Bob’s shared key rests on the difficulty of the following, potentially easier, problem. Definition. Let p be a prime number and g an integer. The Diffie–Hellman Problem (DHP) is the problem of computing the value of gab (mod p) from the known values of ga (mod p) and gb (mod p). It is clear that the DHP is no harder than the DLP. If Eve can solve the DLP, then she can compute Alice and Bob’s secret exponents a and b from the intercepted values A = ga and B = gb , and then it is easy for her to compute their shared key gab . (In fact, Eve needs to compute only one of a and b.) But the converse is less clear. Suppose that Eve has an algorithm that efficiently solves the DHP. Can she use it to also efficiently solve the DLP? The answer is not known.
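The first half of this observation, that a DLP solver yields a DHP solver, can be made concrete with a toy sketch; exhaustive search stands in for the hypothetical discrete-log algorithm, and all names are our own.

```python
# The DHP is no harder than the DLP: a discrete-log solver immediately
# yields g^(ab) from the public values A = g^a and B = g^b.
p, g = 941, 627

a, b = 347, 781                      # secrets; Eve sees only A and B below
A, B = pow(g, a, p), pow(g, b, p)

def solve_dlp(h):
    # Brute-force stand-in for a DLP algorithm: find x with g^x = h (mod p).
    for x in range(p - 1):
        if pow(g, x, p) == h:
            return x

a_recovered = solve_dlp(A)           # Eve solves the DLP for A ...
dhp_answer = pow(B, a_recovered, p)  # ... and the DHP answer follows
assert dhp_answer == pow(g, a * b, p)
```

As the text notes, Eve needs to recover only one of the two exponents.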
  • 89. 70 2. Discrete Logarithms and Diffie–Hellman 2.4 The Elgamal Public Key Cryptosystem Although the Diffie–Hellman key exchange algorithm provides a method of publicly sharing a random secret key, it does not achieve the full goal of being a public key cryptosystem, since a cryptosystem permits exchange of specific information, not just a random string of bits. The first public key cryptosys- tem was the RSA system of Rivest, Shamir, and Adleman [110], which they published in 1978. RSA was, and still is, a fundamentally important discovery, and we discuss it in detail in Chap. 3. However, although RSA was historically first, the most natural development of a public key cryptosystem following the Diffie–Hellman paper [38] is a system described by Taher Elgamal in 1985 [41]. The Elgamal public key encryption algorithm is based on the discrete log prob- lem and is closely related to Diffie–Hellman key exchange from Sect. 2.3. In this section we describe the version of the Elgamal PKC that is based on the discrete logarithm problem for F∗ p, but the construction works quite generally using the DLP in any group. In particular, in Sect. 6.4.2 we discuss a version of the Elgamal PKC based on elliptic curve groups. The Elgamal PKC is our first example of a public key cryptosystem, so we proceed slowly and provide all of the details. Alice begins by publishing information consisting of a public key and an algorithm. The public key is simply a number, and the algorithm is the method by which Bob encrypts his messages using Alice’s public key. Alice does not disclose her private key, which is another number. The private key allows Alice, and only Alice, to decrypt messages that have been encrypted using her public key. This is all somewhat vague and applies to any public key cryptosystem. For the Elgamal PKC, Alice needs a large prime number p for which the discrete logarithm problem in F∗ p is difficult, and she needs an element g modulo p of large (prime) order. 
She may choose p and g herself, or they may have been preselected by some trusted party such as an industry panel or government agency. Alice chooses a secret number a to act as her private key, and she computes the quantity A ≡ ga (mod p). Notice the resemblance to Diffie–Hellman key exchange. Alice publishes her public key A and she keeps her private key a secret. Now suppose that Bob wants to encrypt a message using Alice’s pub- lic key A. We will assume that Bob’s message m is an integer between 2 and p. (Recall that we discussed how to convert messages into numbers in Sect. 1.7.2.) In order to encrypt m, Bob first randomly chooses another num- ber k modulo p.5 Bob uses k to encrypt one, and only one, message, and then 5Most public key cryptosystems require the use of random numbers in order to operate securely. The generation of random or random-looking integers is actually a delicate process. We discuss the problem of generating pseudorandom numbers in Sect. 8.2, but for now we ignore this issue and assume that Bob has no trouble generating random numbers modulo p.
he discards it. The number k is called a random element; it exists for the sole purpose of encrypting a single message. Bob takes his plaintext message m, his random element k, and Alice's public key A and uses them to compute the two quantities

    c_1 ≡ g^k (mod p)   and   c_2 ≡ m·A^k (mod p).

(Remember that g and p are public parameters, so Bob also knows their values.) Bob's ciphertext, i.e., his encryption of m, is the pair of numbers (c_1, c_2), which he sends to Alice.

How does Alice decrypt Bob's ciphertext (c_1, c_2)? Since Alice knows a, she can compute the quantity

    x ≡ (c_1^a)^(−1) (mod p).

She can do this by first computing c_1^a (mod p) using the fast power algorithm, and then computing the inverse using the extended Euclidean algorithm. Alternatively, she can just use fast powering to compute c_1^(p−1−a) (mod p). Alice next multiplies c_2 by x, and lo and behold, the resulting value is the plaintext m. To see why, we expand the value of x · c_2 and find that

    x · c_2 ≡ (c_1^a)^(−1) · c_2 (mod p),          since x ≡ (c_1^a)^(−1) (mod p),
            ≡ (g^(ak))^(−1) · (m·A^k) (mod p),     since c_1 ≡ g^k, c_2 ≡ m·A^k (mod p),
            ≡ (g^(ak))^(−1) · (m·(g^a)^k) (mod p), since A ≡ g^a (mod p),
            ≡ m (mod p),                           since the g^(ak) terms cancel out.

The Elgamal public key cryptosystem is summarized in Table 2.3.

What is Eve's task in trying to decrypt the message? Eve knows the public parameters p and g, and she also knows the value of A ≡ g^a (mod p), since Alice's public key A is public knowledge. If Eve can solve the discrete logarithm problem, then she can find a and decrypt the message. More precisely, it's enough for Eve to solve the Diffie–Hellman problem; see Exercise 2.9. Otherwise it appears difficult for Eve to find the plaintext, although there are subtleties, some of which we'll discuss after doing an example with small numbers.

Example 2.8. Alice uses the prime p = 467 and the primitive root g = 2.
She chooses a = 153 to be her private key and computes her public key A ≡ ga ≡ 2153 ≡ 224 (mod 467). Bob decides to send Alice the message m = 331. He chooses a random element, say he chooses k = 197, and he computes the two quantities c1 ≡ 2197 ≡ 87 (mod 467) and c2 ≡ 331 · 224197 ≡ 57 (mod 467). The pair (c1, c2) = (87, 57) is the ciphertext that Bob sends to Alice.
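The steps of Example 2.8 can be traced in a few lines of Python; the variable names are our own, and Python's three-argument pow does the fast powering.

```python
# Elgamal key creation, encryption, and decryption with the numbers
# of Example 2.8.
p, g = 467, 2                 # public parameters

# Key creation (Alice)
a = 153                       # private key
A = pow(g, a, p)              # public key

# Encryption (Bob)
m, k = 331, 197               # plaintext and one-time random element
c1 = pow(g, k, p)
c2 = (m * pow(A, k, p)) % p

# Decryption (Alice): x = (c1^a)^(-1) = c1^(p-1-a) (mod p)
x = pow(c1, p - 1 - a, p)
recovered = (x * c2) % p
assert recovered == m
```

Running this reproduces the ciphertext pair (87, 57) and recovers the plaintext 331.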
    Public parameter creation: A trusted party chooses and publishes a large
        prime p and an element g modulo p of large (prime) order.
    Key creation (Alice): Choose private key 1 ≤ a ≤ p − 1. Compute
        A = g^a (mod p). Publish the public key A.
    Encryption (Bob): Choose plaintext m and a random element k. Use Alice's
        public key A to compute c_1 = g^k (mod p) and c_2 = m·A^k (mod p).
        Send the ciphertext (c_1, c_2) to Alice.
    Decryption (Alice): Compute (c_1^a)^(−1) · c_2 (mod p). This quantity is
        equal to m.

Table 2.3: Elgamal key creation, encryption, and decryption

Alice, knowing a = 153, first computes

    x ≡ (c_1^a)^(−1) ≡ c_1^(p−1−a) ≡ 87^313 ≡ 14 (mod 467).

Finally, she computes c_2·x ≡ 57 · 14 ≡ 331 (mod 467) and recovers the plaintext message m.

Remark 2.9. In the Elgamal cryptosystem, the plaintext is an integer m between 2 and p − 1, while the ciphertext consists of two integers c_1 and c_2 in the same range. Thus in general it takes twice as many bits to write down the ciphertext as it does to write down the plaintext. We say that Elgamal has a 2-to-1 message expansion.

It's time to raise an important question. Is the Elgamal system as hard for Eve to attack as the Diffie–Hellman problem? Or, by introducing a clever way of encrypting messages, have we unwittingly opened a back door that makes it easy to decrypt messages without solving the Diffie–Hellman problem? One of the goals of modern cryptography is to identify an underlying hard problem like the Diffie–Hellman problem and to prove that a given cryptographic construction like Elgamal is at least as hard to attack as the underlying problem. In this case we would like to prove that anyone who can decrypt arbitrary ciphertexts created by Elgamal encryption, as summarized in Table 2.3, must
  • 92. 2.4. The Elgamal Public Key Cryptosystem 73 also be able to solve the Diffie–Hellman problem. Specifically, we would like to prove the following: Proposition 2.10. Fix a prime p and base g to use for Elgamal encryption. Suppose that Eve has access to an oracle that decrypts arbitrary Elgamal ci- phertexts encrypted using arbitrary Elgamal public keys. Then she can use the oracle to solve the Diffie–Hellman problem described on page 69. Conversely, if Eve can solve the Diffie–Hellman problem, then she can break the Elgamal PKC. Proof. Rather than giving a compact formal proof, we will be more discursive and explain how one might approach the problem of using an Elgamal oracle to solve the Diffie–Hellman problem. Recall that in the Diffie–Hellman problem, Eve is given the two values A ≡ ga (mod p) and B ≡ gb (mod p), and she is required to compute the value of gab (mod p). Keep in mind that she knows both of the values of A and B, but she does not know either of the values a and b. Now suppose that Eve can consult an Elgamal oracle. This means that Eve can send the oracle a prime p, a base g, a purported public key A, and a purported cipher text (c1, c2). Referring to Table 2.3, the oracle returns to Eve the quantity (ca 1)−1 · c2 (mod p). If Eve wants to solve the Diffie–Hellman problem, what values of c1 and c2 should she choose? A little thought shows that c1 = B = gb and c2 = 1 are good choices, since with this input, the oracle returns (gab )−1 (mod p), and then Eve can take the inverse modulo p to obtain gab (mod p), thereby solving the Diffie–Hellman problem. But maybe the oracle is smart enough to know that it should never decrypt ciphertexts having c2 = 1. Eve can still fool the oracle by sending it random- looking ciphertexts as follows. She chooses an arbitrary value for c2 and tells the oracle that the public key is A and that the ciphertext is (B, c2). 
The oracle returns to her the supposed plaintext m that satisfies m ≡ (ca 1)−1 · c2 ≡ (Ba )−1 · c2 ≡ (gab )−1 · c2 (mod p). After the oracle tells Eve the value of m, she simply computes m−1 · c2 ≡ gab (mod p) to find the value of gab (mod p). It is worth noting that although, with the oracle’s help, Eve has computed gab (mod p), she has done so without knowl- edge of a or b, so she has solved only the Diffie–Hellman problem, not the discrete logarithm problem. We leave the proof of the converse, i.e., that a Diffie–Hellman oracle breaks the Elgamal PKC, as an exercise; see Exercise 2.9.
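Eve's trick in this argument can be simulated in a few lines. In the toy sketch below the oracle is simply a function that happens to know Alice's key a (just as the proposition's oracle does); the parameters and names are illustrative choices, not taken from the text's examples.

```python
# Toy version of the reduction in Proposition 2.10: Eve uses an Elgamal
# decryption oracle to solve the Diffie-Hellman problem.
p, g = 467, 2

a = 153                  # Alice's secret; known to the oracle, not to Eve
b = 197                  # Bob's secret; not known to Eve

A = pow(g, a, p)         # Eve knows A = g^a
B = pow(g, b, p)         # Eve knows B = g^b

def elgamal_oracle(c1, c2):
    # Stand-in for the decryption oracle: returns (c1^a)^(-1) * c2 (mod p).
    return (pow(c1, p - 1 - a, p) * c2) % p

# Eve submits the bogus ciphertext (B, c2) for an arbitrary c2 != 1.
c2 = 99
m = elgamal_oracle(B, c2)           # m = (g^(ab))^(-1) * c2 (mod p)

# She then recovers g^(ab) as m^(-1) * c2 (mod p).
g_ab = (pow(m, -1, p) * c2) % p
assert g_ab == pow(g, a * b, p)
```

Note that Eve never learns a or b; she obtains only the Diffie–Hellman value g^(ab), exactly as the proposition asserts.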
2.5 An Overview of the Theory of Groups

For readers unfamiliar with the theory of groups, we briefly introduce a few basic concepts that should help to place the study of discrete logarithms, both here and in Chap. 6, into a broader context. We've just spent some time talking about exponentiation of elements in F_p^*. Since exponentiation is simply repeated multiplication, this seems like a good place to start. What we'd like to do is to underline some important properties of multiplication in F_p^* and to point out that these attributes appear in many other contexts. The properties are:

• There is an element 1 ∈ F_p^* satisfying 1 · a = a for every a ∈ F_p^*.
• Every a ∈ F_p^* has an inverse a^(−1) ∈ F_p^* satisfying a · a^(−1) = a^(−1) · a = 1.
• Multiplication is associative: a · (b · c) = (a · b) · c for all a, b, c ∈ F_p^*.
• Multiplication is commutative: a · b = b · a for all a, b ∈ F_p^*.

Suppose that instead of multiplication in F_p^*, we substitute addition in F_p. We also use 0 in place of 1 and −a in place of a^(−1). Then all four properties are still true:

• 0 + a = a for every a ∈ F_p.
• Every a ∈ F_p has an inverse −a ∈ F_p with a + (−a) = (−a) + a = 0.
• Addition is associative: a + (b + c) = (a + b) + c for all a, b, c ∈ F_p.
• Addition is commutative: a + b = b + a for all a, b ∈ F_p.

Sets and operations that behave similarly to multiplication or addition are so widespread that it is advantageous to abstract the general concept and talk about all such systems at once. This leads to the notion of a group.

Definition. A group consists of a set G and a rule, which we denote by ⋆, for combining two elements a, b ∈ G to obtain an element a ⋆ b ∈ G. The composition operation is required to have the following three properties:

[Identity Law] There is an e ∈ G such that e ⋆ a = a ⋆ e = a for every a ∈ G.
[Inverse Law] For every a ∈ G there is a (unique) a^(−1) ∈ G satisfying a ⋆ a^(−1) = a^(−1) ⋆ a = e.
[Associative Law] a ⋆ (b ⋆ c) = (a ⋆ b) ⋆ c for all a, b, c ∈ G.

If, in addition, composition satisfies the

[Commutative Law] a ⋆ b = b ⋆ a for all a, b ∈ G,

then the group is called a commutative group or an abelian group. If G has finitely many elements, we say that G is a finite group. The order of G is the number of elements in G; it is denoted by |G| or #G.
Example 2.11. Groups are ubiquitous in mathematics and in the physical sciences. Here are a few examples, the first two repeating those mentioned earlier:
(a) G = F_p^* and ⋆ = multiplication. The identity element is e = 1. Proposition 1.21 tells us that inverses exist. Then G is a finite group of order p − 1.
(b) G = Z/NZ and ⋆ = addition. The identity element is e = 0 and the inverse of a is −a. This G is a finite group of order N.
(c) G = Z and ⋆ = addition. The identity element is e = 0 and the inverse of a is −a. This group G is an infinite group.
(d) Note that G = Z and ⋆ = multiplication is not a group, since most elements do not have multiplicative inverses inside Z.
(e) However, G = R^* and ⋆ = multiplication is a group, since all elements have multiplicative inverses inside R^*.
(f) An example of a noncommutative group is

    G = { (a b; c d) : a, b, c, d ∈ R and ad − bc ≠ 0 }

with operation ⋆ = matrix multiplication. The identity element is e = (1 0; 0 1) and the inverse is given by the familiar formula

    (a b; c d)^(−1) = (1/(ad − bc)) · (d −b; −c a).

Notice that G is noncommutative, since for example, (1 1; 0 1) ⋆ (1 1; 1 0) is not equal to (1 1; 1 0) ⋆ (1 1; 0 1).
(g) More generally, we can use matrices of any size. This gives the general linear group

    GL_n(R) = { n-by-n matrices A with real coefficients and det(A) ≠ 0 }

with operation ⋆ = matrix multiplication. We can form other groups by replacing R with some other field, for example, the finite field F_p. (See Exercise 2.15.) The group GL_n(F_p) is clearly a finite group, but computing its order is an interesting exercise.

Let g be an element of a group G and let x be a positive integer. Then g^x means that we apply the group operation to x copies of the element g,

    g^x = g ⋆ g ⋆ g ⋆ · · · ⋆ g   (x repetitions).
  • 98. 76 2. Discrete Logarithms and Diffie–Hellman For example, exponentiation gx in the group F∗ p has the usual meaning, multi- ply x copies of g. But “exponentiation” gx in the group Z/NZ means to add x copies of g. Admittedly, it is more common to write the quantity “add x copies of g” as x·g, but this is just a matter of notation. The key concept underlying exponentiation in a group is repeated application of the group operation to an element of the group. It is also convenient to give a meaning to gx when x is not positive. So if x is a negative integer, we define gx to be (g−1 )|x| . For x = 0, we set g0 = e, the identity element of G. We now introduce a key concept used in the study of groups. Definition. Let G be a group and let a ∈ G be an element of the group. Suppose there exists a positive integer d with the property that ad = e. The smallest such d is called the order of a. If there is no such d, then a is said to have infinite order. We next prove two propositions describing important properties of the orders of group elements. These are generalizations of Theorem 1.24 (Fermat’s little theorem) and Proposition 1.29, which deal with the group G = F∗ p. The proofs are essentially the same. Proposition 2.12. Let G be a finite group. Then every element of G has finite order. Further, if a ∈ G has order d and if ak = e, then d | k. Proof. Since G is finite, the sequence a, a2 , a3 , a4 , . . . must eventually contain a repetition. That is, there exist positive integers i and j with j i such that ai = aj . Multiplying both sides by a−j and applying the group laws leads to ai−j = e. Since i − j 0, this proves that some power of a is equal to e. We let d be the smallest positive exponent satisfying ad = e. Now suppose that k ≥ d also satisfies ak = e. We divide k by d to obtain k = dq + r with 0 ≤ r d. Using the fact that ak = ad = e, we find that e = ak = adq+r = (ad )q ar = eq ar = ar . 
But d is the smallest positive power of a that is equal to e, so we must have r = 0. Therefore k = dq, so d | k. Proposition 2.13 (Lagrange’s Theorem). Let G be a finite group and let a ∈ G. Then the order of a divides the order G. More precisely, let n = |G| be the order of G and let d be the order of a, i.e., ad is the smallest positive power of a that is equal to e. Then an = e and d | n.
  • 99. 2.6. How Hard Is the Discrete Logarithm Problem? 77 Proof. We give a simple proof in the case that G is commutative. For a proof in the general case, see any basic algebra textbook, for example [40, §3.2] or [45, §2.3]. Since G is finite, we can list its elements as G = {g1, g2, . . . , gn}. We now multiply each element of G by a to obtain a new set, which we call Sa, Sa = {a g1, a g2, . . . , a gn}. We claim that the elements of Sa are distinct. To see this, suppose that a gi = a gj. Multiplying both sides by a−1 yields gi = gj.6 Thus Sa contains n distinct elements, which is the same as the number of elements of G. Therefore Sa = G, so if we multiply together all of the elements of Sa, we get the same answer as multiplying together all of the elements of G. (Note that we are using the assumption that G is commutative.) Thus (a g1) (a g2) · · · (a gn) = g1 g2 · · · gn. We can rearrange the order of the product on the left-hand side (again using the commutativity) to obtain an g1 g2 · · · gn = g1 g2 · · · gn. Now multiplying by (g1 g2 · · · gn)−1 yields an = e, which proves the first statement, and then the divisibility of n by d follows immediately from Proposition 2.12. 2.6 How Hard Is the Discrete Logarithm Problem? Given a group G and two elements g, h ∈ G, the discrete logarithm prob- lem asks for an exponent x such that gx = h. What does it mean to talk about the difficulty of this problem? How can we quantify “hard”? A natural measure of hardness is the approximate number of operations necessary for a person or a computer to solve the problem using the most efficient method currently known. For example, we can solve the discrete logarithm problem by computing the list of values g, g2 , g3 , . . . until we find one that is equal to h. If g has order n, then this algorithm is guaranteed to find the solution 6We are being somewhat informal here, as is usually done when one is working with groups. Here is a more formal proof. We are given that agi = agj. 
We use this assumption and the group law axioms to compute g_i = e·g_i = (a^{−1}·a)·g_i = a^{−1}·(a·g_i) = a^{−1}·(a·g_j) = (a^{−1}·a)·g_j = e·g_j = g_j.
in at most n multiplications, but if n is large, say n > 2^80, then it is not a practical algorithm with the computing power available today. Alternatively, we might try choosing random values of x, computing g^x, and checking whether g^x = h. Using the fast exponentiation method described in Sect. 1.3.2, it takes a small multiple of log2(x) modular multiplications to compute g^x. If n and x are k-bit numbers, that is, they are each approximately 2^k, then this trial-and-error approach requires about k · 2^k multiplications. If we are working in the group F*_p and if we treat modular addition as our basic operation, then modular multiplication of two k-bit numbers takes (approximately) k^2 basic operations, so solving the DLP by trial and error takes a small multiple of k^2 · 2^k basic operations. We are being somewhat imprecise when we talk about “small multiples” of 2^k or k · 2^k or k^2 · 2^k. This is because when we want to know whether a computation is feasible, numbers such as 3 · 2^k and 10 · 2^k and 100 · 2^k mean pretty much the same thing if k is large. The important property is that the constant multiple is fixed as k increases. Order notation was invented to make these ideas precise.7 It is prevalent throughout mathematics and computer science and provides a handy way to get a grip on the magnitude of quantities.

Definition (Order Notation). Let f(x) and g(x) be functions of x taking positive values. We say that “f is big-O of g” and write f(x) = O(g(x)) if there are positive constants c and C such that f(x) ≤ c·g(x) for all x ≥ C. In particular, we write f(x) = O(1) if f(x) is bounded for all x ≥ C.

The next proposition gives a method that can sometimes be used to prove that f(x) = O(g(x)).

Proposition 2.14. If the limit lim_{x→∞} f(x)/g(x) exists (and is finite), then f(x) = O(g(x)).

Proof. Let L be the limit. By definition of limit, for any ε > 0 there is a constant C_ε such that |f(x)/g(x) − L| < ε for all x > C_ε.
7Although we use the same word for the order of a finite group and the order of growth of a function, they are two different concepts. Make sure that you don’t confuse them.
In particular, taking ε = 1, we find that f(x)/g(x) < L + 1 for all x > C_1. Hence by definition, f(x) = O(g(x)) with c = L + 1 and C = C_1.

Example 2.15. We have 2x^3 − 3x^2 + 7 = O(x^3), since lim_{x→∞} (2x^3 − 3x^2 + 7)/x^3 = 2. Similarly, we have x^2 = O(2^x), since lim_{x→∞} x^2/2^x = 0. (If you don’t know the value of this limit, use L’Hôpital’s rule twice.) However, note that we may have f(x) = O(g(x)) even if the limit of f(x)/g(x) does not exist. For example, the limit lim_{x→∞} (x + 2)cos^2(x)/x does not exist, but (x + 2)cos^2(x) = O(x), since (x + 2)cos^2(x) ≤ x + 2 ≤ 2x for all x ≥ 2.

Example 2.16. Here are a few more examples of big-O notation. We leave the verification as an exercise.
(a) x^2 + √x = O(x^2).          (d) (ln k)^375 = O(k^{0.001}).
(b) 5 + 6x^2 − 37x^5 = O(x^5).  (e) k^2 · 2^k = O(e^{2k}).
(c) k^300 = O(2^k).             (f) N^10 · 2^N = O(e^N).

Order notation allows us to define several fundamental concepts that are used to get a rough handle on the computational complexity of mathematical problems.

Definition. Suppose that we are trying to solve a certain type of mathematical problem, where the input to the problem is a number whose size may vary. As an example, consider the Integer Factorization Problem, whose input is a number N and whose output is a prime factor of N. We are interested in knowing how long it takes to solve the problem in terms of the size of the input. Typically, one measures the size of the input by its number of bits, since that is how much storage it takes to record the input. Suppose that there is a constant A ≥ 0, independent of the size of the input, such that if the input is O(k) bits long, then it takes O(k^A) steps to solve the problem. Then the problem is said to be solvable in polynomial time.
If we can take A = 1, then the problem is solvable in linear time, and if we can take A = 2, then the problem is solvable in quadratic time. Polynomial-time algorithms are considered to be fast algorithms. On the other hand, if there is a constant c > 0 such that for inputs of size O(k) bits, there is an algorithm to solve the problem in O(e^{ck}) steps, then the problem is solvable in exponential time. Exponential-time algorithms are considered to be slow algorithms. Intermediate between polynomial-time algorithms and exponential-time algorithms are subexponential-time algorithms. These have the property that for every ε > 0, they solve the problem in O(e^{εk}) steps. This notation means that the constants c and C appearing in the definition of order notation are allowed to depend on ε. For example, in Chap. 3 we will study a subexponential-time algorithm for the integer factorization problem whose running time is O(e^{c√(k log k)}) steps.

As a general rule of thumb in cryptography, problems solvable in polynomial time are considered to be “easy” and problems that require exponential time are viewed as “hard,” with subexponential time lying somewhere in between. However, bear in mind that these are asymptotic descriptions that are applicable only as the variables become very large. Depending on the big-O constants and on the size of the input, an exponential problem may be easier than a polynomial problem. We illustrate these general concepts by considering the discrete logarithm problem in various groups.

Example 2.17. We start with our original discrete logarithm problem g^x = h in G = F*_p. If the prime p is chosen between 2^k and 2^{k+1}, then g, h, and p all require at most k bits, so the problem can be stated in O(k) bits. (Notice that O(k) is the same as O(log2 p).) If we try to solve the DLP using the trial-and-error method mentioned earlier, then it takes O(p) steps to solve the problem.
Since O(p) = O(2k ), this algorithm takes exponential time. (If we consider instead multiplication or addition to be the basic operation, then the algorithm takes O(k · 2k ) or O(k2 · 2k ) steps, but these distinctions are irrelevant; the running time is still exponential, since for example it is O(3k ).) However, there are faster ways to solve the DLP in F∗ p, some of which are very fast but work only for some primes, while others are less fast, but work for all primes. For example, the Pohlig–Hellman algorithm described in Sect. 2.9 shows that if p − 1 factors entirely into a product of small primes, then the DLP is quite easy. For arbitrary primes, the algorithm described in Sect. 2.7 solves the DLP in O( √ p log p) steps, which is much faster than O(p), but still exponential. Even better is the index calculus algorithm described in Sect. 3.8. The index calculus solves the DLP in O(ec √ (log p)(log log p) ) steps, so it is a subexponential algorithm. Example 2.18. We next consider the DLP in the group G = Fp, where now the group operation is addition. The DLP in this context asks for a solution x to the congruence
  • 103. 2.7. A Collision Algorithm for the DLP 81 x · g ≡ h (mod p), where g and h are given elements of Z/pZ. As described in Sect. 1.3, we can solve this congruence using the extended Euclidean algorithm (Theo- rem 1.11) to compute g−1 (mod p) and setting x ≡ g−1 · h (mod p). This takes O(log p) steps (see Remark 1.15), so there is a linear-time algorithm to solve the DLP in the additive group Fp. This is a very fast algorithm, so the DLP in Fp with addition is not a good candidate for use as a one-way function in cryptography. This is an important lesson to learn. The discrete logarithm problems in different groups may display different levels of difficulty for their solution. Thus the DLP in Fp with addition has a linear-time solution, while the best known general algorithm to solve the DLP in F∗ p with multiplication is subex- ponential. In Chap. 6 we discuss another sort of group called an elliptic curve. The discrete logarithm problem for elliptic curves is believed to be even more difficult than the DLP for F∗ p. In particular, if the elliptic curve group is cho- sen carefully and has N elements, then the best known algorithm to solve the DLP requires O( √ N) steps. Thus it currently takes exponential time to solve the elliptic curve discrete logarithm problem (ECDLP). 2.7 A Collision Algorithm for the DLP In this section we describe a discrete logarithm algorithm due to Shanks. It is an example of a collision, or meet-in-the-middle, algorithm. Algorithms of this type are discussed in more detail in Sects. 5.4 and 5.5. Shanks’s algorithm works in any group, not just F∗ p, and the proof that it works is no more difficult for arbitrary groups, so we state and prove it in full generality. We begin by recalling the running time of the trivial brute-force algorithm to solve the DLP. Proposition 2.19 (Trivial Bound for DLP). Let G be a group and let g ∈ G be an element of order N. 
(Recall that this means that gN = e and that no smaller positive power of g is equal to the identity element e.) Then the discrete logarithm problem gx = h (2.2) can be solved in O(N) steps and O(1) storage, where each step consists of multiplication by g. Proof. We simply compute g, g2 , g3 , . . ., where each successive value is ob- tained by multiplying the previous value by g, so we only need to store two values at a time. If a solution to gx = h exists, then h will appear before we reach gN . Remark 2.20. If we work in F∗ p, then each computation of gx (mod p) re- quires O((log p)k ) computer operations, where the constant k and the implied
  • 104. 82 2. Discrete Logarithms and Diffie–Hellman big-O constant depend on the computer and the algorithm used for modular multiplication. Then the total number of computer steps, or running time, is O(N(log p)k ). In general, the factor contributed by the O((log p)k ) is neg- ligible, so we will suppress it and simply refer to the running time as O(N). The idea behind a collision algorithm is to make two lists and look for an element that appears in both lists. For the discrete logarithm problem described in Proposition 2.19, the running time of a collision algorithm is a little more than O( √ N ) steps, which is a huge savings over O(N) if N is large. Proposition 2.21 (Shanks’s Babystep–Giantstep Algorithm). Let G be a group and let g ∈ G be an element of order N ≥ 2. The following algo- rithm solves the discrete logarithm problem gx = h in O( √ N · log N) steps using O( √ N) storage. (1) Let n = 1 + √ N , so in particular, n √ N. (2) Create two lists, List 1: e, g, g2 , g3 , . . . , gn , List 2: h, h · g−n , h · g−2n , h · g−3n , . . . , h · g−n2 . (3) Find a match between the two lists, say gi = hg−jn . (4) Then x = i + jn is a solution to gx = h. Proof. We begin with a couple of observations. First, when creating List 2, we start by computing the quantity u = g−n and then compile List 2 by computing h, h · u, h · u2 , . . . , h · un . Thus creating the two lists takes approx- imately 2n multiplications.8 Second, assuming that a match exists, we can find a match in a small multiple of n log(n) steps using standard sorting and searching algorithms, so Step (3) takes O(n log n) steps. Hence the total run- ning time for the algorithm is O(n log n) = O( √ N log N). For this last step we have used the fact that n ≈ √ N, so n log n ≈ √ N log √ N = 1 2 √ N log N. Third, the lists in Step (2) have length n, so require O( √ N) storage. In order to prove that the algorithm works, we must show that Lists 1 and 2 always have a match. 
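Before completing the proof, here is how Steps (1)–(4) of the algorithm might look in code. This is a sketch of our own, not the authors' implementation: the name `bsgs` is ours, the match in Step (3) uses a dictionary (hash table) rather than sorted lists, and `pow(g, -n, p)` for g^{−n} requires Python 3.8 or later.

```python
import math

def bsgs(g, h, p, N):
    """Sketch of Shanks's babystep-giantstep algorithm: solve g^x = h in F_p*,
    where g has order N.  Returns x, or None if no solution exists."""
    n = 1 + math.isqrt(N)                 # n > sqrt(N)
    # List 1 (baby steps): store g^i for i = 0, 1, ..., n in a dictionary.
    baby = {pow(g, i, p): i for i in range(n + 1)}
    # List 2 (giant steps): h, h*u, h*u^2, ... for u = g^{-n}.
    u = pow(g, -n, p)
    gamma = h % p
    for j in range(n + 1):
        if gamma in baby:                 # match g^i = h * g^{-jn}
            return baby[gamma] + j * n    # x = i + j*n
        gamma = (gamma * u) % p
    return None
```

With a hash table, the lookup in Step (3) takes expected constant time per giant step, so the sort is avoided; the storage is still O(√N).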
To see this, let x be the unknown solution to gx = h and write x as x = nq + r with 0 ≤ r n. 8Multiplication by g is a “baby step” and multiplication by u = g−n is a “giant step,” whence the name of the algorithm.
  • 105. 2.8. The Chinese Remainder Theorem 83 k gk h · uk 1 9704 347 2 6181 13357 3 5763 12423 4 1128 13153 5 8431 7928 6 16568 1139 7 14567 6259 8 2987 12013 k gk h · uk 9 15774 16564 10 12918 11741 11 16360 16367 12 13259 7315 13 4125 2549 14 16911 10221 15 4351 16289 16 1612 4062 k gk h · uk 17 10137 10230 18 17264 3957 19 4230 9195 20 9880 13628 21 9963 10126 22 15501 5416 23 6854 13640 24 15680 5276 k gk h · uk 25 4970 12260 26 9183 6578 27 10596 7705 28 2427 1425 29 6902 6594 30 11969 12831 31 6045 4754 32 7583 14567 Table 2.4: Babystep–giantstep to solve 9704x ≡ 13896 (mod 17389) We know that 1 ≤ x N, so q = x − r n N n n since n √ N. Hence we can rewrite the equation gx = h as gr = h · g−qn with 0 ≤ r n and 0 ≤ q n. Thus gr is in List 1 and h · g−qn is in List 2, which shows that Lists 1 and 2 have a common element. Example 2.22. We illustrate Shanks’s babystep–giantstep method by using it to solve the discrete logarithm problem gx = h in F∗ p with g = 9704, h = 13896, and p = 17389. The number 9704 has order 1242 in F∗ 17389.9 Set n = √ 1242 + 1 = 36 and u = g−n = 9704−36 = 2494. Table 2.4 lists the values of gk and h · uk for k = 1, 2, . . . . From the table we find the collision 97047 = 14567 = 13896 · 249432 in F17389. Using the fact that 2494 = 9704−36 , we compute 13896 = 97047 · 2494−32 = 97047 · (970436 )32 = 97041159 in F17389. Hence x = 1159 solves the problem 9704x = 13896 in F17389. 2.8 The Chinese Remainder Theorem The Chinese remainder theorem describes the solutions to a system of simul- taneous linear congruences. The simplest situation is a system of two congru- ences, 9Lagrange’s theorem (Proposition 2.13) says that the order of g divides 17388 = 22 · 33 · 7 · 23. So we can determine the order of g by computing gn for the 48 distinct divisors of 17388, although in practice there are more efficient methods.
  • 106. 84 2. Discrete Logarithms and Diffie–Hellman x ≡ a (mod m) and x ≡ b (mod n), (2.3) with gcd(m, n) = 1, in which case the Chinese remainder theorem says that there is a unique solution modulo mn. The first recorded instance of a problem of this type appears in a Chinese mathematical work from the late third or early fourth century. It actually deals with the harder problem of three simultaneous congruences. We have a number of things, but we do not know exactly how many. If we count them by threes, we have two left over. If we count them by fives, we have three left over. If we count them by sevens, we have two left over. How many things are there? [Sun Tzu Suan Ching (Master Sun’s Mathematical Manual) circa 300 AD, volume 3, problem 26.] The Chinese remainder theorem and its generalizations have many appli- cations in number theory and other areas of mathematics. In Sect. 2.9 we will see how it can be used to solve certain instances of the discrete logarithm problem. We begin with an example in which we solve two simultaneous con- gruences. As you read this example, notice that it is not merely an abstract statement that a solution exists. The method that we describe is really an algorithm that allows us to find the solution. Example 2.23. We look for an integer x that simultaneously solves both of the congruences x ≡ 1 (mod 5) and x ≡ 9 (mod 11). (2.4) The first congruence tells us that x ≡ 1 (mod 5), so the full set of solutions to the first congruence is the collection of integers x = 1 + 5y, y ∈ Z. (2.5) Substituting (2.5) into the second congruence in (2.4) gives 1 + 5y ≡ 9 (mod 11), and hence 5y ≡ 8 (mod 11). (2.6) We solve for y by multiplying both sides of (2.6) by the inverse of 5 mod- ulo 11. This inverse exists because gcd(5, 11) = 1 and can be computed using the procedure described in Proposition 1.13 (see also Remark 1.15). How- ever, in this case the modulus is so small that we find it by trial and error; thus 5 · 9 = 45 ≡ 1 (mod 11). 
In any case, multiplying both sides of (2.6) by 9 yields y ≡ 9 · 8 ≡ 72 ≡ 6 (mod 11). Finally, substituting this value of y into (2.5) gives the solution x = 1 + 5 · 6 = 31 to the original problem.
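The substitution method of Example 2.23 is easy to mechanize. A minimal sketch, assuming gcd(m, n) = 1 (the name `crt2` is ours, and `pow(m, -1, n)` for the modular inverse requires Python 3.8 or later):

```python
def crt2(a, m, b, n):
    """Solve x = a (mod m), x = b (mod n) for coprime m and n,
    following the substitution method of Example 2.23."""
    # Every solution of the first congruence has the form x = a + m*y.
    # Substituting into the second congruence gives m*y = b - a (mod n).
    y = (pow(m, -1, n) * (b - a)) % n
    return a + m * y
```

For the congruences (2.4), `crt2(1, 5, 9, 11)` returns 31, matching the example.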
  • 107. 2.8. The Chinese Remainder Theorem 85 The procedure outlined in Example 2.23 can be used to derive a general formula for the solution of two simultaneous congruences (see Exercise 2.20), but it is much better to learn the method, rather than memorizing a for- mula. This is especially true because the Chinese remainder theorem applies to systems of arbitrarily many simultaneous congruences. Theorem 2.24 (Chinese Remainder Theorem). Let m1, m2, . . . , mk be a col- lection of pairwise relatively prime integers. This means that gcd(mi, mj) = 1 for all i = j. Let a1, a2, . . . , ak be arbitrary integers. Then the system of simultaneous con- gruences x ≡ a1 (mod m1), x ≡ a2 (mod m2), . . . , x ≡ ak (mod mk) (2.7) has a solution x = c. Further, if x = c and x = c are both solutions, then c ≡ c (mod m1m2 · · · mk). (2.8) Proof. Suppose that for some value of i we have already managed to find a solution x = ci to the first i simultaneous congruences, x ≡ a1 (mod m1), x ≡ a2 (mod m2), . . . , x ≡ ai (mod mi). (2.9) For example, if i = 1, then c1 = a1 works. We are going to explain how to find a solution to one more congruence, x ≡ a1 (mod m1), x ≡ a2 (mod m2), . . . , x ≡ ai+1 (mod mi+1). The idea is to look for a solution having the form x = ci + m1m2 · · · miy. Notice that this value of x still satisfies all of the congruences (2.9), so we need merely choose y so that it also satisfies x ≡ ai+1 (mod mi+1). In other words, we need to find a value of y satisfying ci + m1m2 · · · miy ≡ ai+1 (mod mi+1). Proposition 1.13(b) and the fact that gcd(mi+1, m1m2 · · · mi) = 1 imply that we can always do this. This completes the proof of the existence of a solution. We leave to you the task of proving that different solutions satisfy (2.8); see Exercise 2.21. The proof of the Chinese remainder theorem (Theorem 2.24) is easily con- verted into an algorithm for finding the solution to a system of simultaneous congruences. An example suffices to illustrate the general method.
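The inductive step in the proof translates directly into an algorithm: solve the congruences one at a time, enlarging the modulus as you go. A minimal sketch (the function name is ours; `pow(M, -1, m)` needs Python 3.8+):

```python
def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime moduli m_i,
    by the one-congruence-at-a-time method in the proof of Theorem 2.24."""
    x, M = residues[0] % moduli[0], moduli[0]
    for a, m in zip(residues[1:], moduli[1:]):
        # Seek x + M*y = a (mod m); the inverse exists since gcd(M, m) = 1.
        y = (pow(M, -1, m) * (a - x)) % m
        x, M = x + M * y, M * m
    return x % M
```

Master Sun's problem quoted above is `crt([2, 3, 2], [3, 5, 7])`, which gives 23.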
  • 108. 86 2. Discrete Logarithms and Diffie–Hellman Example 2.25. We solve the three simultaneous congruences x ≡ 2 (mod 3), x ≡ 3 (mod 7), x ≡ 4 (mod 16). (2.10) The Chinese remainder theorem says that there is a unique solution mod- ulo 336, since 336 = 3 · 7 · 16. We start with the solution x = 2 to the first congruence x ≡ 2 (mod 3). We use it to form the general solution x = 2 + 3y and substitute it into the second congruence to get 2 + 3y ≡ 3 (mod 7). This simplifies to 3y ≡ 1 (mod 7), and we multiply both sides by 5 (since 5 is the inverse of 3 modulo 7) to get y ≡ 5 (mod 7). This gives the value x = 2 + 3y = 2 + 3 · 5 = 17 as a solution to the first two congruences in (2.10). The general solution to the first two congruences is thus x = 17 + 21z. We substitute this into the third congruence to obtain 17 + 21z ≡ 4 (mod 16). This simplifies to 5z ≡ 3 (mod 16). We multiply by 13, which is the inverse of 5 modulo 16, to obtain z ≡ 3 · 13 ≡ 39 ≡ 7 (mod 16). Finally, we substitute this into x = 17 + 21z to get the solution x = 17 + 21 · 7 = 164. All other solutions are obtained by adding and subtracting multiples of 336 to this particular solution. 2.8.1 Solving Congruences with Composite Moduli It is usually easiest to solve a congruence with a composite modulus by first solving several congruences modulo primes (or prime powers) and then fitting together the solutions using the Chinese remainder theorem. We illustrate the principle in this section by discussing the problem of finding square roots modulo m. It turns out that it is relatively easy to compute square roots modulo a prime. Indeed, for primes congruent to 3 modulo 4, it is extremely easy to find square roots, as shown by the following proposition. Proposition 2.26. Let p be a prime satisfying p ≡ 3 (mod 4). Let a be an integer such that the congruence x2 ≡ a (mod p) has a solution, i.e., such that a has a square root modulo p. Then
  • 109. 2.8. The Chinese Remainder Theorem 87 b ≡ a(p+1)/4 (mod p) is a solution, i.e., it satisfies b2 ≡ a (mod p). (N.B. This formula is valid only if a has a square root modulo p. In Sect. 3.9 we will describe an efficient method for checking which numbers have square roots modulo p.) Proof. Let g be a primitive root modulo p. Then a is equal to some power of g, and the fact that a has a square root modulo p means that a is an even power of g, say a ≡ g2k (mod p). (See Exercise 2.5.) Now we compute b2 ≡ a p+1 2 (mod p) definition of b, ≡ (g2k ) p+1 2 (mod p) since a ≡ g2k (mod p), ≡ g(p+1)k (mod p) ≡ g2k+(p−1)k (mod p) ≡ a · (gp−1 )k (mod p) since a ≡ g2k (mod p), ≡ a (mod p) since gp−1 ≡ 1 (mod p). Hence b is indeed a square root of a modulo p. Example 2.27. A square root of a = 2201 modulo the prime p = 4127 is b ≡ a(p+1)/4 = 22014128/4 ≡ 22011032 ≡ 3718 (mod 4127). To see that a does indeed have a square root modulo 4127, we simply square b and check that 37182 = 13823524 ≡ 2201 (mod 4127). Suppose now that we want to compute a square root modulo m, where m is not necessarily a prime. An efficient method is to factor m, compute the square root modulo each of the prime (or prime power) factors, and then combine the solutions using the Chinese remainder theorem. An example makes the idea clear. Example 2.28. We look for a solution to the congruence x2 ≡ 197 (mod 437). (2.11) The modulus factors as 437 = 19 · 23, so we first solve the two congruences y2 ≡ 197 ≡ 7 (mod 19) and z2 ≡ 197 ≡ 13 (mod 23). Since both 19 and 23 are congruent to 3 modulo 4, we can find these square roots using Proposition 2.26 (or by trial and error). In any case, we have y ≡ ±8 (mod 19) and z ≡ ±6 (mod 23). We can pick either 8 or −8 for y and either 6 or −6 for z. Choosing the two positive solutions, we next use the Chinese remainder theorem to solve the simultaneous congruences x ≡ 8 (mod 19) and x ≡ 6 (mod 23). 
(2.12)

We find that x ≡ 236 (mod 437), which gives the desired solution to (2.11).
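The two ingredients of Example 2.28, the exponentiation formula of Proposition 2.26 and the Chinese remainder theorem, fit together in a few lines. This sketch (the function names are ours) returns one square root of a modulo p·q; flipping the signs of y and z yields the others, as discussed in Remark 2.29.

```python
def sqrt_p(a, p):
    """Square root of a modulo a prime p = 3 (mod 4), by Proposition 2.26.
    Valid only when a actually has a square root modulo p."""
    b = pow(a, (p + 1) // 4, p)
    assert b * b % p == a % p, "a is not a square modulo p"
    return b

def sqrt_pq(a, p, q):
    """One square root of a modulo m = p*q, for primes p, q = 3 (mod 4):
    take roots modulo p and q separately, then combine with the CRT."""
    y, z = sqrt_p(a, p), sqrt_p(a, q)
    # Solve x = y (mod p), x = z (mod q) by substitution, as in Sect. 2.8.
    t = (pow(p, -1, q) * (z - y)) % q
    return y + p * t
```

For Example 2.28, `sqrt_pq(197, 19, 23)` produces one of the four square roots 144, 201, 236, 293 of 197 modulo 437.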
  • 110. 88 2. Discrete Logarithms and Diffie–Hellman Remark 2.29. The solution to Example 2.28 is not unique. In the first place, we can always take the negative, −236 ≡ 201 (mod 437), to get a second square root of 197 modulo 437. If the modulus were prime, there would be only these two square roots (Exercise 1.36(a)). However, since 437 = 19 · 23 is composite, there are two others. In order to find them, we replace one of 8 and 6 with its negative in (2.12). This leads to the val- ues x = 144 and x = 293, so 197 has four square roots modulo 437. Remark 2.30. It is clear from Example 2.28 (see also Exercises 2.23 and 2.24) that it is relatively easy to compute square roots modulo m if one knows how to factor m into a product of prime powers. However, suppose that m is so large that we are not able to factor it. It is then a very difficult problem to find square roots modulo m. Indeed, in a certain reasonably precise sense, it is just as difficult to compute square roots modulo m as it is to factor m. In fact, if m is a large composite number whose factorization is unknown, then it is a difficult problem to determine whether a given integer a has a square root modulo m, even without requiring that the square root be com- puted. The Goldwasser–Micali public key cryptosystem, which is described in Sect. 3.10, is based on the difficulty of identifying which numbers have square roots modulo a composite modulus m. The trapdoor information is knowledge of the factors of m. 2.9 The Pohlig–Hellman Algorithm In addition to being a theorem and an algorithm, we would suggest to the reader that the Chinese remainder theorem is also a state of mind. If m = m1 · m2 · · · mt is a product of pairwise relatively prime integers, then the Chinese remainder theorem says that solving an equation modulo m is more or less equivalent to solving the equation modulo mi for each i, since it tells us how to knit the solutions together to get a solution modulo m. 
In the discrete logarithm problem (DLP), we need to solve the equation gx ≡ h (mod p). In this case, the modulus p is prime, which suggests that the Chinese remain- der theorem is irrelevant. However, recall that the solution x is determined only modulo p−1, so we can think of the solution as living in Z/(p−1)Z. This hints that the factorization of p−1 into primes may play a role in determining the difficulty of the DLP in F∗ p. More generally, if G is any group and g ∈ G is an element of order N, then solutions to gx = h in G are determined only
modulo N, so the prime factorization of N would appear to be relevant. This idea is at the core of the Pohlig–Hellman algorithm. As in Sect. 2.7 we state and prove results in this section for an arbitrary group G. But if you feel more comfortable working with integers modulo p, you may simply replace G by F*_p.

Theorem 2.31 (Pohlig–Hellman Algorithm). Let G be a group, and suppose that we have an algorithm to solve the discrete logarithm problem in G for any element whose order is a power of a prime. To be concrete, if g ∈ G has order q^e, suppose that we can solve g^x = h in O(S_{q^e}) steps. (For example, Proposition 2.21 says that we can take S_{q^e} to be q^{e/2}. See Remark 2.32 for a further discussion.) Now let g ∈ G be an element of order N, and suppose that N factors into a product of prime powers as

N = q_1^{e_1} · q_2^{e_2} · · · q_t^{e_t}.

Then the discrete logarithm problem g^x = h can be solved in

O( Σ_{i=1}^{t} S_{q_i^{e_i}} + log N )
steps (2.13)

using the following procedure:
(1) For each 1 ≤ i ≤ t, let g_i = g^{N/q_i^{e_i}} and h_i = h^{N/q_i^{e_i}}. Notice that g_i has prime power order q_i^{e_i}, so use the given algorithm to solve the discrete logarithm problem

g_i^y = h_i. (2.14)

Let y = y_i be a solution to (2.14).
(2) Use the Chinese remainder theorem (Theorem 2.24) to solve

x ≡ y_1 (mod q_1^{e_1}), x ≡ y_2 (mod q_2^{e_2}), . . . , x ≡ y_t (mod q_t^{e_t}). (2.15)

Proof. The running time is clear, since Step (1) takes O(Σ S_{q_i^{e_i}}) steps, and Step (2), via the Chinese remainder theorem, takes O(log N) steps. In practice, the Chinese remainder theorem computation is usually negligible compared to the discrete logarithm computations. It remains to show that Steps (1) and (2) give a solution to g^x = h. Let x be a solution to the system of congruences (2.15). Then for each i we can write

x = y_i + q_i^{e_i}·z_i for some z_i. (2.16)
  • 113. 90 2. Discrete Logarithms and Diffie–Hellman This allows us to compute gx N/q ei i = gyi+q ei i zi N/q ei i from (2.16), = gN/q ei i yi · gNzi = gN/q ei i yi since gN is the identity element, = gyi i by the definition of gi, = hi from (2.14) = hN/q ei i by the definition of hi. In terms of discrete logarithms to the base g, we can rewrite this as N qei i · x ≡ N qei i · logg(h) (mod N), (2.17) where recall that the discrete logarithm to the base g is defined only modulo N, since gN is the identity element. Next we observe that the numbers N qe1 1 , N qe2 2 , . . . N qet t have no nontrivial common factor, i.e., their greatest common divisor is 1. Repeated application of the extended Euclidean theorem (Theorem 1.11) (see also Exercise 1.13) says that we can find integers c1, c2, . . . , ct such that N qe1 1 · c1 + N qe2 2 · c2 + · · · + N qet t · ct = 1. (2.18) Now multiply both sides of (2.17) by ci and sum over i = 1, 2, . . . , t. This gives t i=1 N qei i · ci · x ≡ t i=1 N qei i · ci · logg(h) (mod N), and then (2.18) tells us that x = logg(h) (mod N). This completes the proof that x satisfies gx ≡ h. Remark 2.32. The Pohlig–Hellman algorithm more or less reduces the discrete logarithm problem for elements of arbitrary order to the discrete logarithm problem for elements of prime power order. A further refinement, which we discuss later in this section, essentially reduces the problem to elements of prime order. More precisely, in the notation of Theorem 2.31, the running time Sqe for elements of order qe can be reduced to O(eSq). This is the content of Proposition 2.33.
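Putting the two steps of Theorem 2.31 together gives a short program. In this sketch (the function names are ours) the prime-power discrete logarithms of Step (1) are found by brute force, standing in for whatever faster solver is available, and Step (2) is the Chinese remainder computation of Sect. 2.8 (`pow(M, -1, m)` needs Python 3.8+).

```python
def dlog_naive(g, h, p, order):
    """Brute-force DLP: smallest x with g^x = h (mod p), 0 <= x < order."""
    t = 1
    for x in range(order):
        if t == h % p:
            return x
        t = t * g % p
    raise ValueError("no solution")

def pohlig_hellman(g, h, p, factors):
    """Solve g^x = h (mod p), where g has order N = prod(q**e for q, e in
    factors), following Steps (1) and (2) of Theorem 2.31."""
    N = 1
    for q, e in factors:
        N *= q ** e
    ys, ms = [], []
    for q, e in factors:
        m = q ** e
        gi, hi = pow(g, N // m, p), pow(h, N // m, p)   # gi has order q**e
        ys.append(dlog_naive(gi, hi, p, m))             # Step (1)
        ms.append(m)
    x, M = ys[0], ms[0]                                 # Step (2): CRT
    for a, m in zip(ys[1:], ms[1:]):
        x += M * ((pow(M, -1, m) * (a - x)) % m)
        M *= m
    return x % M
```

For the problem 23^x = 9689 in F_11251 worked out in Example 2.36, `pohlig_hellman(23, 9689, 11251, [(2, 1), (3, 2), (5, 4)])` returns 4261.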
  • 114. 2.9. The Pohlig–Hellman Algorithm 91 The Pohlig–Hellman algorithm thus tells us that the discrete logarithm problem in a group G is not secure if the order of the group is a product of powers of small primes. More generally, gx = h is easy to solve if the order of the element g is a product of powers of small primes. This applies, in particular, to the discrete logarithm problem in Fp if p−1 factors into powers of small primes. Since p−1 is always even, the best that we can do is take p = 2q + 1 with q prime and use an element g of order q. Then the running time of the collision algorithm described in Proposition 2.21 is O( √ q ) = O( √ p ). However, the index calculus method described in Sect. 3.8 has running time that is subexponential, so even if p = 2q + 1, the prime q must be chosen to be quite large. We now explain the algorithm that reduces the discrete logarithm prob- lem for elements of prime power order to the discrete logarithm problem for elements of prime order. The idea is simple: if g has order qe , then gqe−1 has order q. The trick is to repeat this process several times and then assemble the information into the final answer. Proposition 2.33. Let G be a group. Suppose that q is a prime, and suppose that we know an algorithm that takes Sq steps to solve the discrete logarithm problem gx = h in G whenever g has order q. Now let g ∈ G be an element of order qe with e ≥ 1. Then we can solve the discrete logarithm problem gx = h in O(eSq) steps. (2.19) Remark 2.34. Proposition 2.21 says that we can take Sq = O( √ q ), so Propo- sition 2.33 says that we can solve the DLP (2.19) in O(e √ q ) steps. Notice that if we apply Proposition 2.21 directly to the DLP (2.19), the running time is O(qe/2 ), which is much slower if e ≥ 2. Proof of Proposition 2.33. The key idea to proving the proposition is to write the unknown exponent x in the form x = x0 + x1q + x2q2 + · · · + xe−1qe−1 with 0 ≤ xi q, (2.20) and then determine successively x0, x1, x2, . . 
We begin by observing that the element g^{q^{e−1}} is of order q. This allows us to compute

h^{q^{e−1}} = (g^x)^{q^{e−1}}     raising both sides of (2.19) to the q^{e−1} power
           = (g^{x_0 + x_1 q + x_2 q^2 + ··· + x_{e−1} q^{e−1}})^{q^{e−1}}     from (2.20)
           = g^{x_0 q^{e−1}} · (g^{q^e})^{x_1 + x_2 q + ··· + x_{e−1} q^{e−2}}
           = (g^{q^{e−1}})^{x_0}     since g^{q^e} = 1.
  • 115. 92 2. Discrete Logarithms and Diffie–Hellman Since gqe−1 is an element of order q in G, the equation gqe−1 x0 = hqe−1 is a discrete logarithm problem whose base is an element of order q. By as- sumption, we can solve this problem in Sq steps. Once this is done, we know an exponent x0 with the property that gx0qe−1 = hqe−1 in G. We next do a similar computation, this time raising both sides of (2.19) to the qe−2 power, which yields hqe−2 = (gx )qe−2 = gx0+x1q+x2q2 +···+xe−1qe−1 qe−2 = gx0qe−2 · gx1qe−1 · gqe x2+x3q+···+xe−1qe−3 = gx0qe−2 · gx1qe−1 . Keep in mind that we have already determined the value of x0 and that the element gqe−1 has order q in G. In order to find x1, we must solve the discrete logarithm problem gqe−1 x1 = h · g−x0 qe−2 for the unknown quantity x1. Again applying the given algorithm, we can solve this in Sq steps. Hence in O(2Sq) steps, we have determined values for x0 and x1 satisfying g(x0+x1q)qe−2 = hqe−2 in G. Similarly, we find x2 by solving the discrete logarithm problem gqe−1 x2 = h · g−x0−x1q qe−3 , and in general, after we have determined x0, . . . , xi−1, then the value of xi is obtained by solving gqe−1 xi = h · g−x0−x1q−···−xi−1qi−1 qe−i−1 in G. Each of these is a discrete logarithm problem whose base is of order q, so each of them can be solved in Sq steps. Hence after O(eSq) steps, we obtain an exponent x = x0 + x1q + · · · + xe−1qe−1 satisfying gx = h, thus solving the original discrete logarithm problem.
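The digit-by-digit computation in this proof can be sketched as follows. The function name is ours, the order-q subproblems are solved here by trial (Proposition 2.21 would be used when q is large), and `pow(g, -x, p)` for a negative exponent needs Python 3.8+.

```python
def dlog_prime_power(g, h, p, q, e):
    """Solve g^x = h (mod p), where g has order q**e, by finding the base-q
    digits x_0, x_1, ..., x_{e-1} of x one at a time (Proposition 2.33)."""
    gq = pow(g, q ** (e - 1), p)        # element of order q
    x = 0
    for i in range(e):
        # x_i solves gq^{x_i} = (h * g^{-x})^{q^{e-1-i}},
        # where x = x_0 + x_1 q + ... + x_{i-1} q^{i-1} so far.
        target = pow(h * pow(g, -x, p), q ** (e - 1 - i), p)
        for xi in range(q):             # order-q DLP, solved by trial here
            if pow(gq, xi, p) == target:
                break
        x += xi * q ** i
    return x
```

Applied to the problem of Example 2.35, `dlog_prime_power(5448, 6909, 11251, 5, 4)` recovers x = 511.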
  • 116. 2.9. The Pohlig–Hellman Algorithm 93 Example 2.35. We do an example to clarify the algorithm described in the proof of Proposition 2.33. We solve 5448x = 6909 in F∗ 11251. (2.21) The prime p = 11251 has the property that p − 1 is divisible by 54 , and it is easy to check that 5448 has order exactly 54 in F11251. The first step is to solve 544853 x0 = 690953 , which reduces to 11089x0 = 11089. This one is easy; the answer is x0 = 1, so our initial value of x is x = 1. The next step is to solve 544853 x1 = (6909 · 5448−x0 )52 = (6909 · 5448−1 )52 , which reduces to 11089x1 = 3742. Note that we only need to check values of x1 between 1 and 4, although if q were large, it would pay to use a faster algorithm such as Proposition 2.21 to solve this discrete logarithm problem. In any case, the solution is x1 = 2, so the value of x is now x = 11 = 1 + 2 · 5. Continuing, we next solve 544853 x2 = 6909 · 5448−x0−x1·5 5 = 6909 · 5448−11 5 , which reduces to 11089x2 = 1. Thus x2 = 0, which means that the value of x remains at x = 11. The final step is to solve 544853 x3 = 6909 · 5448−x0−x1·5−x2·52 = 6909 · 5448−11 . This reduces to solving 11089x3 = 6320, which has the solution x3 = 4. Hence our final answer is x = 511 = 1 + 2 · 5 + 4 · 53 . As a check, we compute 5448511 = 6909 in F11251. The Pohlig–Hellman algorithm (Theorem 2.31) for solving the discrete log- arithm problem uses the Chinese remainder theorem (Theorem 2.24) to knot together the solutions for prime powers from Proposition 2.33. The following example illustrates the full Pohlig–Hellman algorithm. Example 2.36. Consider the discrete logarithm problem 23x = 9689 in F11251. (2.22)
The base 23 is a primitive root in F_11251, i.e., it has order 11250. Since 11250 = 2 · 3^2 · 5^4 is a product of small primes, the Pohlig–Hellman algorithm should work well. In the notation of Theorem 2.31, we set

    p = 11251,  g = 23,  h = 9689,  N = p − 1 = 2 · 3^2 · 5^4.

The first step is to solve three subsidiary discrete logarithm problems, as indicated in the following table.

    q   e   g^((p−1)/q^e)   h^((p−1)/q^e)   Solve (g^((p−1)/q^e))^x = h^((p−1)/q^e) for x
    2   1   11250           11250           1
    3   2   5029            10724           4
    5   4   5448            6909            511

Notice that the first problem is trivial, while the third one is the problem that we solved in Example 2.35. In any case, the individual problems in this step of the algorithm may be solved as described in the proof of Proposition 2.33. The second step is to use the Chinese remainder theorem to solve the simultaneous congruences

    x ≡ 1 (mod 2),  x ≡ 4 (mod 3^2),  x ≡ 511 (mod 5^4).

The smallest solution is x = 4261. We check our answer by computing 23^4261 = 9689 in F_11251.

2.10 Rings, Quotient Rings, Polynomial Rings, and Finite Fields

Note to the Reader: In this section we describe some topics that are typically covered in an introductory course in abstract algebra. This material is somewhat more mathematically sophisticated than the material that we have discussed up to this point. For cryptographic applications, the most important topics in this section are the theory of finite fields of prime power order, which in this book are used primarily in Sects. 6.7 and 6.8 in studying elliptic curve cryptography, and the theory of quotients of polynomial rings, which are used in Sect. 7.10 to describe the lattice-based NTRU public key cryptosystem. The reader interested in proceeding more rapidly to additional cryptographic topics may wish to omit this section at first reading and return to it when arriving at the relevant sections of Chaps. 6 and 7.

As we have seen, groups are fundamental objects that appear in many areas of mathematics.
A group G is a set and an operation that allows us to “multiply” two elements to obtain a third element. We gave a brief overview of the theory of groups in Sect. 2.5. Another fundamental object in mathematics, called a ring, is a set having two operations. These two operations
are analogous to ordinary addition and multiplication, and they are linked by the distributive law. In this section we begin with a brief discussion of the general theory of rings, then we discuss how to form one ring from another by taking quotients, and we conclude by examining in some detail the case of polynomial rings.

2.10.1 An Overview of the Theory of Rings

You are already familiar with many rings, for example the ring of integers with the operations of addition and multiplication. We abstract the fundamental properties of these operations and use them to formulate the following fundamental definition.

Definition. A ring is a set R that has two operations, which we denote by + and ⋆,10 having the following properties:

Properties of +
[Identity Law] There is an additive identity 0 ∈ R such that 0 + a = a + 0 = a for every a ∈ R.
[Inverse Law] For every element a ∈ R there is an additive inverse b ∈ R such that a + b = b + a = 0.
[Associative Law] a + (b + c) = (a + b) + c for all a, b, c ∈ R.
[Commutative Law] a + b = b + a for all a, b ∈ R.

Briefly, if we look at R with only the operation +, then it is a commutative group with (additive) identity element 0.

Properties of ⋆
[Identity Law] There is a multiplicative identity 1 ∈ R such that 1 ⋆ a = a ⋆ 1 = a for every a ∈ R.
[Associative Law] a ⋆ (b ⋆ c) = (a ⋆ b) ⋆ c for all a, b, c ∈ R.
[Commutative Law] a ⋆ b = b ⋆ a for all a, b ∈ R.

Thus if we look at R with only the operation ⋆, then it is almost a commutative group with (multiplicative) identity element 1, except that elements are not required to have multiplicative inverses.

Property Linking + and ⋆
[Distributive Law] a ⋆ (b + c) = a ⋆ b + a ⋆ c for all a, b, c ∈ R.

Remark 2.37. More generally, people sometimes work with rings that do not contain a multiplicative identity, and also with rings for which ⋆ is not commutative, i.e., a ⋆ b might not be equal to b ⋆ a.
So to be formal, our rings are really commutative rings with (multiplicative) identity. However, all of the rings that we use will be of this type, so we will just call them rings.

10 Addition in a ring is virtually always denoted by +, but there are many different notations for multiplication. In this book we use a ⋆ b, a · b, or simply ab, depending on the context.
Every element of a ring has an additive inverse, but there may be many nonzero elements that do not have multiplicative inverses. For example, in the ring of integers Z, the only elements that have multiplicative inverses are 1 and −1.

Definition. A (commutative) ring in which every nonzero element has a multiplicative inverse is called a field.

Example 2.38. Here are a few examples of rings and fields with which you are probably already familiar.
(a) R = Q, ⋆ = multiplication, and addition is as usual. The multiplicative identity element is 1. Every nonzero element has a multiplicative inverse, so Q is a field.
(b) R = Z, ⋆ = multiplication, and addition is as usual. The multiplicative identity element is 1. The only elements that have multiplicative inverses are 1 and −1, so Z is a ring, but it is not a field.
(c) R = Z/nZ, n is any positive integer, ⋆ = multiplication, and addition is as usual. The multiplicative identity element is 1. Here R is always a ring, and it is a field if and only if n is prime.
(d) R = F_p, p is any prime integer, ⋆ = multiplication, and addition is as usual. The multiplicative identity element is 1. By Proposition 1.21, every nonzero element has a multiplicative inverse, so F_p is a field.
(e) The collection of all polynomials with coefficients taken from Z forms a ring under the usual operations of polynomial addition and multiplication. This ring is denoted by Z[x]. Thus we write

    Z[x] = {a_0 + a_1x + a_2x^2 + ··· + a_nx^n : n ≥ 0 and a_0, a_1, …, a_n ∈ Z}.

For example, 1 + x^2 and 3 − 7x^4 + 23x^9 are polynomials in the ring Z[x], as are 17 and −203.
(f) More generally, if R is any ring, we can form a ring of polynomials whose coefficients are taken from the ring R. For example, the ring R might be Z/qZ or a finite field F_p. We discuss these general polynomial rings, denoted by R[x], in Sect. 7.9.
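Example 2.38(c) can be checked computationally: Z/nZ is a field exactly when every nonzero congruence class is a unit, and a class a is a unit precisely when gcd(a, n) = 1. A small Python sketch (the helper name is ours, not the book's):

```python
from math import gcd

def is_field_zn(n):
    """Return True if Z/nZ (for n >= 2) is a field, by testing that
    every nonzero congruence class has a multiplicative inverse.
    An element a is a unit exactly when gcd(a, n) = 1."""
    return all(gcd(a, n) == 1 for a in range(1, n))
```

Running this over small n picks out exactly the primes: `is_field_zn(7)` is True, while `is_field_zn(6)` is False, since 2 and 3 are nonzero but 2 · 3 = 0 in Z/6Z.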
2.10.2 Divisibility and Quotient Rings

The concept of divisibility, originally introduced for the integers Z in Sect. 1.2, can be generalized to any ring.

Definition. Let a and b be elements of a ring R with b ≠ 0. We say that b divides a, or that a is divisible by b, if there is an element c ∈ R such that a = b ⋆ c.
As before, we write b | a to indicate that b divides a. If b does not divide a, then we write b ∤ a.

Remark 2.39. The basic properties of divisibility given in Proposition 1.4 apply to rings in general. The proof for Z works for any ring. Similarly, it is true in every ring that b | 0 for any b ≠ 0. (See Exercise 2.30.) However, note that not every ring is as nice as Z. For example, there are rings with nonzero elements a and b whose product a ⋆ b is 0. An example of such a ring is Z/6Z, in which 2 and 3 are nonzero, but 2 · 3 = 6 = 0.

Recall that an integer is called a prime if it has no nontrivial factors. What is a trivial factor? We can “factor” any integer by writing it as a = 1 · a and as a = (−1)(−a), so these are trivial factorizations. What makes them trivial is the fact that 1 and −1 have multiplicative inverses. In general, if R is a ring and if u ∈ R is an element that has a multiplicative inverse u^(−1) ∈ R, then we can factor any element a ∈ R by writing it as a = u^(−1) · (ua). Elements that have multiplicative inverses and elements that have only trivial factorizations are special elements of a ring, so we give them special names.

Definition. Let R be a ring. An element u ∈ R is called a unit if it has a multiplicative inverse, i.e., if there is an element v ∈ R such that u ⋆ v = 1. An element a of a ring R is said to be irreducible if a is not itself a unit and if in every factorization of a as a = b ⋆ c, either b is a unit or c is a unit.

Remark 2.40. The integers have the property that every integer factors uniquely into a product of irreducible integers, up to rearranging the order of the factors and throwing in some extra factors of 1 and −1. (Note that a positive irreducible integer is simply another name for a prime.)
Not every ring has this important unique factorization property, but in the next section we prove that the ring of polynomials with coefficients in a field is a unique factorization ring.

We have seen that congruences are a very important and powerful mathematical tool for working with the integers. Using the definition of divisibility, we can extend the notion of congruence to arbitrary rings.

Definition. Let R be a ring and choose a nonzero element m ∈ R. We say that two elements a and b of R are congruent modulo m if their difference a − b is divisible by m. We write

    a ≡ b (mod m)

to indicate that a and b are congruent modulo m.

Congruences for arbitrary rings satisfy the same equation-like properties as they do in the original integer setting.

Proposition 2.41. Let R be a ring and let m ∈ R with m ≠ 0. If

    a_1 ≡ a_2 (mod m)  and  b_1 ≡ b_2 (mod m),

then

    a_1 ± b_1 ≡ a_2 ± b_2 (mod m)  and  a_1 ⋆ b_1 ≡ a_2 ⋆ b_2 (mod m).
Proof. We leave the proof as an exercise; see Exercise 2.32.

Remark 2.42. Our definition of congruence captures all of the properties that we need in this book. However, we must observe that there exists a more general notion of congruence modulo ideals. For our purposes, it is enough to work with congruences modulo principal ideals, which are ideals that are generated by a single element.

An important consequence of Proposition 2.41 is a method for creating new rings from old rings, just as we created Z/qZ from Z by looking at congruences modulo q.

Definition. Let R be a ring and let m ∈ R with m ≠ 0. For any a ∈ R, we write ā for the set of all a′ ∈ R such that a′ ≡ a (mod m). The set ā is called the congruence class of a, and we denote the collection of all congruence classes by R/(m) or R/mR. Thus

    R/(m) = R/mR = {ā : a ∈ R}.

We add and multiply congruence classes using the obvious rules

    ā + b̄ = (a + b)‾  and  ā ⋆ b̄ = (a ⋆ b)‾.   (2.23)

We call R/(m) the quotient ring of R by m. This name is justified by the next proposition.

Proposition 2.43. The formulas (2.23) give well-defined addition and multiplication rules on the set of congruence classes R/(m), and they make R/(m) into a ring.

Proof. We leave the proof as an exercise; see Exercise 2.43.

2.10.3 Polynomial Rings and the Euclidean Algorithm

In Example 2.38(f) we observed that if R is any ring, then we can create a polynomial ring with coefficients taken from R. This ring is denoted by

    R[x] = {a_0 + a_1x + a_2x^2 + ··· + a_nx^n : n ≥ 0 and a_0, a_1, …, a_n ∈ R}.

The degree of a nonzero polynomial is the exponent of the highest power of x that appears. Thus if a(x) = a_0 + a_1x + a_2x^2 + ··· + a_nx^n with a_n ≠ 0, then a(x) has degree n. We denote the degree of a by deg(a), and we call a_n the leading coefficient of a(x). A nonzero polynomial whose leading coefficient is equal to 1 is called a monic polynomial. For example, 3 + x^2 is a monic polynomial, but 1 + 3x^2 is not.
Especially important are those polynomial rings in which the ring R is a field; for example, R could be Q or R or C or a finite field F_p. (For cryptography, by far the most important case is the last named one.) One reason why it is so useful to take R to be a field F is because virtually all of the properties of Z that we proved in Sect. 1.2 are also true for the polynomial ring F[x]. This section is devoted to a discussion of the properties of F[x].

Back in high school you undoubtedly learned how to divide one polynomial by another. We recall the process by doing an example. Here is how one divides x^5 + 2x^4 + 7 by x^3 − 5:

                  x^2 + 2x
             _________________________
    x^3 − 5 ) x^5 + 2x^4          + 7
              x^5          − 5x^2
              -----------------------
                    2x^4 + 5x^2   + 7
                    2x^4          − 10x
                    -------------------
                           5x^2 + 10x + 7

In other words, x^5 + 2x^4 + 7 divided by x^3 − 5 gives a quotient of x^2 + 2x with a remainder of 5x^2 + 10x + 7. Another way to say this is to write11

    x^5 + 2x^4 + 7 = (x^2 + 2x) · (x^3 − 5) + (5x^2 + 10x + 7).

Notice that the degree of the remainder 5x^2 + 10x + 7 is strictly smaller than the degree of the divisor x^3 − 5. We can do the same thing for any polynomial ring F[x] as long as F is a field. Rings of this sort that have a “division with remainder” algorithm are called Euclidean rings.

Proposition 2.44 (The ring F[x] is Euclidean). Let F be a field and let a and b be polynomials in F[x] with b ≠ 0. Then it is possible to write

    a = b · k + r

with k and r polynomials, and either r = 0 or deg r < deg b. We say that a divided by b has quotient k and remainder r.

Proof. We start with any values for k and r that satisfy a = b · k + r. (For example, we could start with k = 0 and r = a.) If deg r < deg b, then we're done. Otherwise we write

    b = b_0 + b_1x + ··· + b_dx^d  and  r = r_0 + r_1x + ··· + r_ex^e

11 For notational convenience, we drop the ⋆ for multiplication and just write a · b, or even simply ab.
with b_d ≠ 0 and r_e ≠ 0 and e ≥ d. We rewrite the equation a = b · k + r as

    a = b · (k + (r_e/b_d)·x^(e−d)) + (r − (r_e/b_d)·x^(e−d)·b) = b · k′ + r′.

Notice that we have canceled the top degree term of r, so deg r′ < deg r. If deg r′ < deg b, then we're done. If not, we repeat the process. We can do this as long as the r term satisfies deg r ≥ deg b, and every time we apply this process, the degree of our r term gets smaller. Hence eventually we arrive at an r term whose degree is strictly smaller than the degree of b.

We can now define common divisors and greatest common divisors in F[x].

Definition. A common divisor of two elements a, b ∈ F[x] is an element d ∈ F[x] that divides both a and b. We say that d is a greatest common divisor of a and b if every common divisor of a and b also divides d.

We will see below that every pair of elements in F[x] has a greatest common divisor,12 which is unique up to multiplying it by a nonzero element of F. We write gcd(a, b) for the unique monic polynomial that is a greatest common divisor of a and b.

Example 2.45. The greatest common divisor of x^2 − 1 and x^3 + 1 is x + 1. Notice that

    x^2 − 1 = (x + 1)(x − 1)  and  x^3 + 1 = (x + 1)(x^2 − x + 1),

so x + 1 is a common divisor. We leave it to you to check that it is the greatest common divisor.

It is not clear, a priori, that every pair of elements has a greatest common divisor. And indeed, there are many rings in which greatest common divisors do not exist, for example in the ring Z[x]. But greatest common divisors do exist in the polynomial ring F[x] when F is a field.

Proposition 2.46 (The extended Euclidean algorithm for F[x]). Let F be a field and let a and b be polynomials in F[x] with b ≠ 0. Then the greatest common divisor d of a and b exists, and there are polynomials u and v in F[x] such that

    a · u + b · v = d.

Proof. Just as in the proof of Theorem 1.7, the polynomial gcd(a, b) can be computed by repeated application of Proposition 2.44, as described in Fig. 2.3. Similarly, the polynomials u and v can be computed by substituting one equation into another in Fig.
2.3, exactly as described in the proof of Theorem 1.11.

12 According to our definition, even if both a and b are 0, they have a greatest common divisor, namely 0. However, some authors prefer to leave gcd(0, 0) undefined.
    a = b · k_1 + r_2           with 0 ≤ deg r_2 < deg b,
    b = r_2 · k_2 + r_3         with 0 ≤ deg r_3 < deg r_2,
    r_2 = r_3 · k_3 + r_4       with 0 ≤ deg r_4 < deg r_3,
    r_3 = r_4 · k_4 + r_5       with 0 ≤ deg r_5 < deg r_4,
        ...                         ...
    r_{t−2} = r_{t−1} · k_{t−2} + r_t   with 0 ≤ deg r_t < deg r_{t−1},
    r_{t−1} = r_t · k_t

Then d = r_t = gcd(a, b).

Figure 2.3: The Euclidean algorithm for polynomials

Example 2.47. We use the Euclidean algorithm in the ring F_13[x] to compute gcd(x^5 − 1, x^3 + 2x − 3):

    x^5 − 1 = (x^3 + 2x − 3) · (x^2 + 11) + (3x^2 + 4x + 6)
    x^3 + 2x − 3 = (3x^2 + 4x + 6) · (9x + 1) + (9x + 4)   ← gcd = 9x + 4
    3x^2 + 4x + 6 = (9x + 4) · (9x + 8) + 0

Thus 9x + 4 is a greatest common divisor of x^5 − 1 and x^3 + 2x − 3 in F_13[x]. In order to get a monic polynomial, we multiply by 3 ≡ 9^(−1) (mod 13). This gives

    gcd(x^5 − 1, x^3 + 2x − 3) = x − 1  in F_13[x].

We recall from Sect. 2.10.2 that an element u of a ring is a unit if it has a multiplicative inverse u^(−1), and that an element a of a ring is irreducible if it is not a unit and if the only way to factor a is as a = bc with either b or c a unit. It is not hard to see that the units in a polynomial ring F[x] are precisely the nonzero constant polynomials, i.e., the nonzero elements of F; see Exercise 2.34. The question of irreducibility is subtler, as shown by the following examples.

Example 2.48. The polynomial x^5 − 4x^3 + 3x^2 − x + 2 is irreducible as a polynomial in Z[x], but if we view it as an element of F_3[x], then it factors as

    x^5 − 4x^3 + 3x^2 − x + 2 ≡ (x + 1)(x^4 + 2x^3 + 2) (mod 3).

It also factors if we view it as a polynomial in F_5[x], but this time as a product of a quadratic polynomial and a cubic polynomial,

    x^5 − 4x^3 + 3x^2 − x + 2 ≡ (x^2 + 4x + 2)(x^3 + x^2 + 1) (mod 5).

On the other hand, if we work in F_13[x], then x^5 − 4x^3 + 3x^2 − x + 2 is irreducible.

Every integer has an essentially unique factorization as a product of primes. The same is true of polynomials with coefficients in a field.
And just as for the integers, the key to proving unique factorization is the extended Euclidean algorithm.
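The division-with-remainder step of Proposition 2.44 and the Euclidean loop of Fig. 2.3 translate directly into code. Here is a minimal Python sketch (the helper names are ours; polynomials are coefficient lists with the constant term first) that reproduces the computation of Example 2.47:

```python
def trim(f, p):
    """Reduce coefficients mod p and drop trailing zeros ([] means 0)."""
    f = [c % p for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_mod(a, b, p):
    """Remainder of a divided by b in F_p[x] (Proposition 2.44)."""
    r, b = trim(a, p), trim(b, p)
    inv = pow(b[-1], -1, p)               # inverse of the leading coefficient
    while len(r) >= len(b):
        c = r[-1] * inv % p               # cancel the top-degree term of r
        d = len(r) - len(b)
        for i, bi in enumerate(b):
            r[i + d] = (r[i + d] - c * bi) % p
        r = trim(r, p)
    return r

def poly_gcd(a, b, p):
    """Monic gcd of a and b in F_p[x], via the Euclidean algorithm."""
    a, b = trim(a, p), trim(b, p)
    while b:
        a, b = b, poly_mod(a, b, p)       # the loop of Fig. 2.3
    inv = pow(a[-1], -1, p)               # normalize to a monic polynomial
    return [c * inv % p for c in a]

# Example 2.47: gcd(x^5 - 1, x^3 + 2x - 3) in F_13[x].
print(poly_gcd([-1, 0, 0, 0, 0, 1], [-3, 2, 0, 1], 13))  # [12, 1], i.e. x - 1
```

The printed list [12, 1] encodes 12 + x, which equals x − 1 in F_13 since 12 ≡ −1 (mod 13).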
Proposition 2.49. Let F be a field. Then every nonzero polynomial in F[x] can be uniquely factored as a product of monic irreducible polynomials, in the following sense. If a ∈ F[x] is factored as

    a = α·p_1 · p_2 ··· p_m  and  a = β·q_1 · q_2 ··· q_n,

where α, β ∈ F are constants and p_1, …, p_m, q_1, …, q_n are monic irreducible polynomials, then after rearranging the order of q_1, …, q_n, we have α = β, m = n, and p_i = q_i for all 1 ≤ i ≤ m.

Proof. The existence of a factorization into irreducibles follows easily from the fact that if a = b · c, then deg a = deg b + deg c. (See Exercise 2.34.) The proof that the factorization is unique is exactly the same as the proof for integers, cf. Theorem 1.20. The key step in the proof is the statement that if p ∈ F[x] is irreducible and divides the product a · b, then either p | a or p | b (or both). This statement is the polynomial analogue of Proposition 1.19 and is proved in the same way, using the polynomial version of the extended Euclidean algorithm (Proposition 2.46).

2.10.4 Quotients of Polynomial Rings and Finite Fields of Prime Power Order

In Sect. 2.10.3 we studied polynomial rings and in Sect. 2.10.2 we studied quotient rings. In this section we combine these two constructions and consider quotients of polynomial rings. Recall that in working with the integers modulo m, it is often convenient to represent each congruence class modulo m by an integer between 0 and m − 1. The division-with-remainder algorithm (Proposition 2.44) allows us to do something similar for the quotient of a polynomial ring.

Proposition 2.50. Let F be a field and let m ∈ F[x] be a nonzero polynomial. Then every nonzero congruence class ā ∈ F[x]/(m) has a unique representative r satisfying

    deg r < deg m  and  a ≡ r (mod m).

Proof. We use Proposition 2.44 to find polynomials k and r such that

    a = m · k + r  with either r = 0 or deg r < deg m.

If r = 0, then a ≡ 0 (mod m), so ā = 0̄. Otherwise, reducing modulo m gives a ≡ r (mod m) with deg r < deg m. This shows that r exists. To show that it is unique, suppose that r′ has the same properties. Then r − r′ ≡ a − a ≡ 0 (mod m), so m divides r − r′. But r − r′ has degree strictly smaller than the degree of m, so we must have r − r′ = 0.
Example 2.51. Consider the ring F[x]/(x^2 + 1). Proposition 2.50 says that every element of this quotient ring is uniquely represented by a polynomial of the form α + βx with α, β ∈ F. Addition is performed in the obvious way,

    (α_1 + β_1x) + (α_2 + β_2x) = (α_1 + α_2) + (β_1 + β_2)x.

Multiplication is similar, except that we have to divide the final result by x^2 + 1 and take the remainder. Thus

    (α_1 + β_1x) · (α_2 + β_2x) = α_1α_2 + (α_1β_2 + α_2β_1)x + β_1β_2x^2
                                = (α_1α_2 − β_1β_2) + (α_1β_2 + α_2β_1)x.

Notice that the effect of dividing by x^2 + 1 is the same as replacing x^2 with −1. The intuition is that in the quotient ring F[x]/(x^2 + 1), we have made the quantity x^2 + 1 equal to 0. Notice that if we take F = R in this example, then R[x]/(x^2 + 1) is simply the field of complex numbers C.

We can use Proposition 2.50 to count the number of elements in a polynomial quotient ring when F is a finite field.

Corollary 2.52. Let F_p be a finite field and let m ∈ F_p[x] be a nonzero polynomial of degree d ≥ 1. Then the quotient ring F_p[x]/(m) contains exactly p^d elements.

Proof. From Proposition 2.50 we know that every element of F_p[x]/(m) is represented by a unique polynomial of the form

    a_0 + a_1x + a_2x^2 + ··· + a_{d−1}x^(d−1)  with a_0, a_1, …, a_{d−1} ∈ F_p.

There are p choices for a_0, and p choices for a_1, and so on, leading to a total of p^d choices for a_0, a_1, …, a_{d−1}.

We next give an important characterization of the units in a polynomial quotient ring. This will allow us to construct new finite fields.

Proposition 2.53. Let F be a field and let a, m ∈ F[x] be polynomials with m ≠ 0. Then a is a unit in the quotient ring F[x]/(m) if and only if

    gcd(a, m) = 1.

Proof. Suppose first that a is a unit in F[x]/(m). By definition, this means that we can find some b ∈ F[x]/(m) satisfying a · b = 1. In terms of congruences, this means that a · b ≡ 1 (mod m), so there is some c ∈ F[x] such that

    a · b − 1 = c · m.
It follows that any common divisor of a and m must also divide 1. Therefore gcd(a, m) = 1.

Next suppose that gcd(a, m) = 1. Then Proposition 2.46 tells us that there are polynomials u, v ∈ F[x] such that a · u + m · v = 1. Reducing modulo m yields

    a · u ≡ 1 (mod m),

so u is an inverse for a in F[x]/(m).

An important instance of Proposition 2.53 is the case that the modulus is an irreducible polynomial.

Corollary 2.54. Let F be a field and let m ∈ F[x] be an irreducible polynomial. Then the quotient ring F[x]/(m) is a field, i.e., every nonzero element of F[x]/(m) has a multiplicative inverse.

Proof. Replacing m by a constant multiple, we may assume that m is a monic polynomial. Let a ∈ F[x]/(m). There are two cases to consider. First, suppose that gcd(a, m) = 1. Then Proposition 2.53 tells us that a is a unit, so we are done. Second, suppose that d = gcd(a, m) ≠ 1. Then in particular, we know that d | m. But m is monic and irreducible, and d ≠ 1, so we must have d = m. We also know that d | a, so m | a. Hence a = 0 in F[x]/(m). This completes the proof that every nonzero element of F[x]/(m) has a multiplicative inverse.

Example 2.55. The polynomial x^2 + 1 is irreducible in R[x]. The quotient ring R[x]/(x^2 + 1) is a field. Indeed, it is the field of complex numbers C, where the “variable” x plays the role of i = √−1, since in the ring R[x]/(x^2 + 1) we have x^2 = −1.

By way of contrast, the polynomial x^2 − 1 is clearly not irreducible in R[x]. The quotient ring R[x]/(x^2 − 1) is not a field. In fact,

    (x − 1) · (x + 1) = 0  in R[x]/(x^2 − 1).

Thus the ring R[x]/(x^2 − 1) has nonzero elements whose product is 0, which means that they certainly cannot be units. (Nonzero elements of a ring whose product is 0 are called zero divisors.)

If we apply Corollary 2.54 to a polynomial ring with coefficients in a finite field F_p, we can create new finite fields with a prime power number of elements.

Corollary 2.56.
Let F_p be a finite field and let m ∈ F_p[x] be an irreducible polynomial of degree d ≥ 1. Then F_p[x]/(m) is a field with p^d elements.

Proof. We combine Corollary 2.54, which says that F_p[x]/(m) is a field, with Corollary 2.52, which says that F_p[x]/(m) has p^d elements.
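Corollary 2.56 can be made concrete in code. In the eight-element field F_2[x]/(x^3 + x + 1), an element fits in a 3-bit integer whose bit i holds the coefficient of x^i, so multiplication becomes bit twiddling. A minimal Python sketch (the function name is ours):

```python
def gf8_mul(a, b):
    """Multiply two elements of F_2[x]/(x^3 + x + 1), each encoded
    as a 3-bit integer (bit i = coefficient of x^i)."""
    prod = 0
    for i in range(3):                 # schoolbook multiplication over F_2
        if (b >> i) & 1:
            prod ^= a << i             # XOR is addition of coefficients mod 2
    for i in range(4, 2, -1):          # reduce degrees 4 and 3 via x^3 = x + 1
        if (prod >> i) & 1:
            prod ^= 0b1011 << (i - 3)  # 0b1011 encodes x^3 + x + 1
    return prod
```

For instance, `gf8_mul(0b011, 0b110)` returns 1, reflecting that (1 + x)(x + x^2) = 1 in this field, so 1 + x and x + x^2 are multiplicative inverses.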
Example 2.57. It is not hard to check that the polynomial x^3 + x + 1 is irreducible in F_2[x] (see Exercise 2.37), so F_2[x]/(x^3 + x + 1) is a field with eight elements. Proposition 2.50 tells us that the following are representatives for the eight elements in this field:

    0, 1, x, x^2, 1 + x, 1 + x^2, x + x^2, 1 + x + x^2.

Addition is easy as long as you remember to treat the coefficients modulo 2, so for example,

    (1 + x) + (x + x^2) = 1 + x^2.

Multiplication is also easy: just multiply the polynomials, divide by x^3 + x + 1, and take the remainder. For example,

    (1 + x) · (x + x^2) = x + 2x^2 + x^3 = 1,

so 1 + x and x + x^2 are multiplicative inverses. The complete multiplication table for F_2[x]/(x^3 + x + 1) is described in Exercise 2.38.

Example 2.58. When is the polynomial x^2 + 1 irreducible in the ring F_p[x]? If it is reducible, then it factors as

    x^2 + 1 = (x + α)(x + β)  for some α, β ∈ F_p.

Comparing coefficients, we find that α + β = 0 and αβ = 1; hence

    α^2 = α · (−β) = −αβ = −1.

In other words, the field F_p has an element whose square is −1. Conversely, if α ∈ F_p satisfies α^2 = −1, then

    x^2 + 1 = (x − α)(x + α)

factors in F_p[x]. This proves that x^2 + 1 is irreducible in F_p[x] if and only if −1 is not a square in F_p. Quadratic reciprocity, which we study later in Sect. 3.9, then tells us that

    x^2 + 1 is irreducible in F_p[x] if and only if p ≡ 3 (mod 4).

Let p be a prime satisfying p ≡ 3 (mod 4). Then the quotient ring F_p[x]/(x^2 + 1) is a field containing p^2 elements. It contains an element x that is a square root of −1. So we can view F_p[x]/(x^2 + 1) as a sort of analogue of the complex numbers and can write its elements in the form a + bi with a, b ∈ F_p, where i is simply a symbol with the property that i^2 = −1.
Addition, subtraction, multiplication, and division are performed just as in the complex numbers, with the understanding that instead of real numbers as coefficients, we are using integers modulo p. So for example, division is done by the usual “rationalizing the denominator” trick,

    (a + bi)/(c + di) = ((a + bi)/(c + di)) · ((c − di)/(c − di)) = ((ac + bd) + (bc − ad)i)/(c^2 + d^2).

Note that there is never a problem of 0 in the denominator, since the assumption that p ≡ 3 (mod 4) ensures that c^2 + d^2 ≠ 0 (as long as at least one of c and d is nonzero). These fields of order p^2 will be used in Sect. 6.9.3.

In order to construct a field with p^d elements, we need to find an irreducible polynomial of degree d in F_p[x]. It is proven in more advanced texts that there is always such a polynomial, and indeed generally many such polynomials. Further, in a certain abstract sense it doesn't matter which irreducible polynomial we choose: we always get the same field. However, in a practical sense it does make a difference, because practical computations in F_p[x]/(m) are more efficient if m does not have very many nonzero coefficients. We summarize some of the principal properties of finite fields in the following theorem.

Theorem 2.59. Let F_p be a finite field.
(a) For every d ≥ 1 there exists an irreducible polynomial m ∈ F_p[x] of degree d.
(b) For every d ≥ 1 there exists a finite field with p^d elements.
(c) If F and F′ are finite fields with the same number of elements, then there is a way to match the elements of F with the elements of F′ so that the addition and multiplication tables of F and F′ are the same. (The mathematical terminology is that F and F′ are isomorphic.)

Proof. We know from Corollary 2.56 that (a) implies (b). For proofs of (a) and (c), see any basic algebra or number theory text, for example [40, §§13.5, 14.3], [53, Section 7.1], or [59, Chapter 7].

Definition. We write F_{p^d} for a field with p^d elements. Theorem 2.59 assures us that there is at least one such field and that any two fields with p^d elements are essentially the same, up to relabeling their elements.
These fields are also sometimes called Galois fields and denoted by GF(p^d) in honor of the nineteenth-century French mathematician Évariste Galois, who studied them.

Remark 2.60. It is not difficult to prove that if F is a finite field, then F has p^d elements for some prime p and some d ≥ 1. (The proof uses linear algebra; see Exercise 2.41.) So Theorem 2.59 describes all finite fields.

Remark 2.61. For cryptographic purposes, it is frequently advantageous to work in a field F_{2^d}, rather than in a field F_p with p large. This is due to the fact that the binary nature of computers often enables them to work more efficiently with F_{2^d}. A second reason is that sometimes it is useful to have a finite field that contains smaller fields. In the case of F_{p^d}, one can show that every field F_{p^e} with e | d is a subfield of F_{p^d}. Of course, if one is going to use F_{2^d} for Diffie–Hellman key exchange or Elgamal encryption, it is necessary to choose 2^d to be of approximately the same size as one typically chooses p.
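Returning to Example 2.58, arithmetic in F_p[x]/(x^2 + 1) for a prime p ≡ 3 (mod 4) mirrors complex arithmetic with coefficients reduced mod p. A minimal Python sketch (our own names; an element a + bi is a pair (a, b), and P is an illustrative choice of prime):

```python
P = 11  # any prime with P ≡ 3 (mod 4), so x^2 + 1 is irreducible mod P

def mul(z, w):
    """(a + bi)(c + di) in F_P[x]/(x^2 + 1), using i^2 = -1."""
    (a, b), (c, d) = z, w
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def div(z, w):
    """Divide by "rationalizing the denominator": multiply by c - di."""
    (a, b), (c, d) = z, w
    inv = pow(c * c + d * d, -1, P)   # c^2 + d^2 != 0 since P ≡ 3 (mod 4)
    return ((a * c + b * d) * inv % P, (b * c - a * d) * inv % P)
```

As a sanity check, `div(mul(z, w), w)` recovers z for any nonzero w, and `mul((0, 1), (0, 1))` gives (P − 1, 0), i.e., i^2 = −1.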
Let F be a finite field having q elements. Every nonzero element of F has an inverse, so the group of units F* is a group of order q − 1. Lagrange's theorem (Theorem 2.13) tells us that every element of F* has order dividing q − 1, so

    a^(q−1) = 1  for all a ∈ F*.

This is a generalization of Fermat's little theorem (Theorem 1.24) to arbitrary finite fields. The primitive root theorem (Theorem 1.30) is also true for all finite fields.

Theorem 2.62. Let F be a finite field having q elements. Then F has a primitive root, i.e., there is an element g ∈ F such that

    F* = {1, g, g^2, g^3, …, g^(q−2)}.

Proof. You can find a proof of this theorem in any basic number theory textbook; see for example [59, §4.1] or [137, Chapter 28].

Exercises

Section 2.1. Diffie–Hellman and RSA

2.1. Write a one page essay giving arguments, both pro and con, for the following assertion: If the government is able to convince a court that there is a valid reason for their request, then they should have access to an individual's private keys (even without the individual's knowledge), in the same way that the government is allowed to conduct court authorized secret wiretaps in cases of suspected criminal activity or threats to national security. Based on your arguments, would you support or oppose the government being given this power? How about without court oversight? The idea that all private keys should be stored at a secure central location and be accessible to government agencies (with or without suitably stringent legal conditions) is called key escrow.

2.2. Research and write a one to two page essay on the classification of cryptographic algorithms as munitions under ITAR (International Traffic in Arms Regulations). How does that act define “export”? What are the potential fines and jail terms for those convicted of violating the Arms Export Control Act?
Would teaching non-classified cryptographic algorithms to a college class that includes non-US citizens be considered a form of export? How has US government policy changed from the early 1990s to the present?

Section 2.2. The Discrete Logarithm Problem

2.3. Let g be a primitive root for F_p.
(a) Suppose that x = a and x = b are both integer solutions to the congruence g^x ≡ h (mod p). Prove that a ≡ b (mod p − 1). Explain why this implies that the map (2.1) on page 65 is well-defined.
(b) Prove that log_g(h_1h_2) = log_g(h_1) + log_g(h_2) for all h_1, h_2 ∈ F*_p.
(c) Prove that log_g(h^n) = n·log_g(h) for all h ∈ F*_p and n ∈ Z.

2.4. Compute the following discrete logarithms.
(a) log_2(13) for the prime 23, i.e., p = 23, g = 2, and you must solve the congruence 2^x ≡ 13 (mod 23).
(b) log_10(22) for the prime p = 47.
(c) log_627(608) for the prime p = 941. (Hint. Look in the second column of Table 2.1 on page 66.)

2.5. Let p be an odd prime and let g be a primitive root modulo p. Prove that a has a square root modulo p if and only if its discrete logarithm log_g(a) modulo p − 1 is even.

Section 2.3. Diffie–Hellman Key Exchange

2.6. Alice and Bob agree to use the prime p = 1373 and the base g = 2 for a Diffie–Hellman key exchange. Alice sends Bob the value A = 974. Bob asks your assistance, so you tell him to use the secret exponent b = 871. What value B should Bob send to Alice, and what is their secret shared value? Can you figure out Alice's secret exponent?

2.7. Let p be a prime and let g be an integer. The Decision Diffie–Hellman Problem is as follows. Suppose that you are given three numbers A, B, and C, and suppose that A and B are equal to

    A ≡ g^a (mod p)  and  B ≡ g^b (mod p),

but that you do not necessarily know the values of the exponents a and b. Determine whether C is equal to g^(ab) (mod p). Notice that this is different from the Diffie–Hellman problem described on page 69. The Diffie–Hellman problem asks you to actually compute the value of g^(ab).
(a) Prove that an algorithm that solves the Diffie–Hellman problem can be used to solve the decision Diffie–Hellman problem.
(b) Do you think that the decision Diffie–Hellman problem is hard or easy? Why? See Exercise 6.40 for a related example in which the decision problem is easy, but it is believed that the associated computational problem is hard.

Section 2.4. The Elgamal Public Key Cryptosystem

2.8.
Alice and Bob agree to use the prime p = 1373 and the base g = 2 for communications using the Elgamal public key cryptosystem.
(a) Alice chooses a = 947 as her private key. What is the value of her public key A?
(b) Bob chooses b = 716 as his private key, so his public key is B ≡ 2^716 ≡ 469 (mod 1373). Alice encrypts the message m = 583 using the random element k = 877. What is the ciphertext (c1, c2) that Alice sends to Bob?
(c) Alice decides to choose a new private key a = 299 with associated public key A ≡ 2^299 ≡ 34 (mod 1373). Bob encrypts a message using Alice's public key and sends her the ciphertext (c1, c2) = (661, 1325). Decrypt the message.
(d) Now Bob chooses a new private key and publishes the associated public key B = 893. Alice encrypts a message using this public key and sends the ciphertext (c1, c2) = (693, 793) to Bob. Eve intercepts the transmission. Help Eve by solving the discrete logarithm problem 2^b ≡ 893 (mod 1373) and using the value of b to decrypt the message.

2.9. Suppose that Eve is able to solve the Diffie–Hellman problem described on page 69. More precisely, assume that if Eve is given two powers g^u and g^v mod p, then she is able to compute g^(uv) mod p. Show that Eve can break the Elgamal PKC.

2.10. This exercise describes a public key cryptosystem that requires Bob and Alice to exchange several messages. We illustrate the system with an example. Bob and Alice fix a publicly known prime p = 32611, and all of the other numbers used are private. Alice takes her message m = 11111, chooses a random exponent a = 3589, and sends the number u = m^a (mod p) = 15950 to Bob. Bob chooses a random exponent b = 4037 and sends v = u^b (mod p) = 15422 back to Alice. Alice then computes w = v^15619 ≡ 27257 (mod 32611) and sends w = 27257 to Bob. Finally, Bob computes w^31883 (mod 32611) and recovers the value 11111 of Alice's message.
(a) Explain why this algorithm works. In particular, Alice uses the numbers a = 3589 and 15619 as exponents. How are they related? Similarly, how are Bob's exponents b = 4037 and 31883 related?
(b) Formulate a general version of this cryptosystem, i.e., using variables, and show that it works in general.
(c) What is the disadvantage of this cryptosystem over Elgamal? (Hint. How many times must Alice and Bob exchange data?)
(d) Are there any advantages of this cryptosystem over Elgamal? In particular, can Eve break it if she can solve the discrete logarithm problem? Can Eve break it if she can solve the Diffie–Hellman problem?

Section 2.5. An Overview of the Theory of Groups

2.11.
The group S3 consists of the following six distinct elements:
e, σ, σ^2, τ, στ, σ^2τ,
where e is the identity element and multiplication is performed using the rules
σ^3 = e, τ^2 = e, τσ = σ^2τ.
Compute the following values in the group S3:
(a) τσ^2 (b) τ(στ) (c) (στ)(στ) (d) (στ)(σ^2τ).
Is S3 a commutative group?

2.12. Let G be a group, let d ≥ 1 be an integer, and define a subset of G by
G[d] = {g ∈ G : g^d = e}.
(a) Prove that if g is in G[d], then g^(−1) is in G[d].
(b) Suppose that G is commutative. Prove that if g1 and g2 are in G[d], then their product g1 g2 is in G[d].
(c) Deduce that if G is commutative, then G[d] is a group.
(d) Show by an example that if G is not a commutative group, then G[d] need not be a group. (Hint. Use Exercise 2.11.)

2.13. Let G and H be groups. A function φ : G → H is called a (group) homomorphism if it satisfies
φ(g1 g2) = φ(g1) φ(g2) for all g1, g2 ∈ G.
(Note that the product g1 g2 uses the group law in the group G, while the product φ(g1) φ(g2) uses the group law in the group H.)
(a) Let eG be the identity element of G, let eH be the identity element of H, and let g ∈ G. Prove that
φ(eG) = eH and φ(g^(−1)) = φ(g)^(−1).
(b) Let G be a commutative group. Prove that the map φ : G → G defined by φ(g) = g^2 is a homomorphism. Give an example of a noncommutative group for which this map is not a homomorphism.
(c) Same question as (b) for the map φ(g) = g^(−1).

2.14. Prove that each of the following maps is a group homomorphism.
(a) The map φ : Z → Z/NZ that sends a ∈ Z to a mod N in Z/NZ.
(b) The map φ : R^* → GL2(R) defined by φ(a) = [[a, 0], [0, a^(−1)]] (the 2 × 2 diagonal matrix with entries a and a^(−1)).
(c) The discrete logarithm map log_g : F_p^* → Z/(p − 1)Z, where g is a primitive root modulo p.

2.15. (a) Prove that GL2(F_p) is a group.
(b) Show that GL2(F_p) is a noncommutative group for every prime p.
(c) Describe GL2(F_2) completely. That is, list its elements and describe the multiplication table.
(d) How many elements are there in the group GL2(F_p)?
(e) How many elements are there in the group GLn(F_p)?

Section 2.6. How Hard Is the Discrete Logarithm Problem?

2.16. Verify the following assertions from Example 2.16.
(a) x^2 + √x = O(x^2).
(b) 5 + 6x^2 − 37x^5 = O(x^5).
(c) k^300 = O(2^k).
(d) (ln k)^375 = O(k^0.001).
(e) k^2 2^k = O(e^(2k)).
(f) N^10 2^N = O(e^N).

Section 2.7. A Collision Algorithm for the DLP

2.17. Use Shanks's babystep–giantstep method to solve the following discrete logarithm problems. (For (b) and (c), you may want to write a computer program implementing Shanks's algorithm.)
(a) 11^x = 21 in F_71.
(b) 156^x = 116 in F_593.
(c) 650^x = 2213 in F_3571.

Section 2.8. The Chinese Remainder Theorem
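Exercise 2.17 is well suited to a short program. Below is a sketch of Shanks's babystep–giantstep algorithm from Sect. 2.7; the helper name `bsgs` is ours, not the text's.

```python
# Sketch of Shanks's babystep-giantstep collision algorithm: solve g^x = h
# in F_p in roughly sqrt(p) multiplications instead of p.
from math import isqrt

def bsgs(g, h, p):
    n = isqrt(p - 1) + 1
    baby = {}                    # babysteps: g^j -> j for 0 <= j < n
    value = 1
    for j in range(n):
        baby.setdefault(value, j)
        value = (value * g) % p
    giant = pow(pow(g, n, p), p - 2, p)   # g^(-n) mod p, via Fermat's little theorem
    gamma = h % p
    for i in range(n):           # giantsteps: look for h * g^(-in) in the table
        if gamma in baby:
            return i * n + baby[gamma]
        gamma = (gamma * giant) % p
    return None                  # no solution: h is not a power of g

# Exercise 2.17(a): solve 11^x = 21 in F_71.
x = bsgs(11, 21, 71)
```

Parts (b) and (c) run the same way with the triples (156, 116, 593) and (650, 2213, 3571).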
2.18. Solve each of the following simultaneous systems of congruences (or explain why no solution exists).
(a) x ≡ 3 (mod 7) and x ≡ 4 (mod 9).
(b) x ≡ 137 (mod 423) and x ≡ 87 (mod 191).
(c) x ≡ 133 (mod 451) and x ≡ 237 (mod 697).
(d) x ≡ 5 (mod 9), x ≡ 6 (mod 10), and x ≡ 7 (mod 11).
(e) x ≡ 37 (mod 43), x ≡ 22 (mod 49), and x ≡ 18 (mod 71).

2.19. Solve the 1700-year-old Chinese remainder problem from the Sun Tzu Suan Ching stated on page 84.

2.20. Let a, b, m, n be integers with gcd(m, n) = 1. Let
c ≡ (b − a) · m^(−1) (mod n).
Prove that x = a + cm is a solution to
x ≡ a (mod m) and x ≡ b (mod n), (2.24)
and that every solution to (2.24) has the form x = a + cm + ymn for some y ∈ Z.

2.21. (a) Let a, b, c be positive integers and suppose that a | c, b | c, and gcd(a, b) = 1. Prove that ab | c.
(b) Let x = c and x = c′ be two solutions to the system of simultaneous congruences (2.7) in the Chinese remainder theorem (Theorem 2.24). Prove that c ≡ c′ (mod m1 m2 · · · mk).

2.22. For those who have studied ring theory, this exercise sketches a short proof of the Chinese remainder theorem. Let m1, . . . , mk be integers and let m = m1 m2 · · · mk be their product.
(a) Prove that the map
Z/mZ → Z/m1Z × Z/m2Z × · · · × Z/mkZ,
a mod m ↦ (a mod m1, a mod m2, . . . , a mod mk), (2.25)
is a well-defined homomorphism of rings. (Hint. First define a homomorphism from Z to the right-hand side of (2.25), and then show that mZ is in the kernel.)
(b) Assume that m1, . . . , mk are pairwise relatively prime. Prove that the map given by (2.25) is one-to-one. (Hint. What is the kernel?)
(c) Continuing with the assumption that the numbers m1, . . . , mk are pairwise relatively prime, prove that the map (2.25) is onto. (Hint. Use (b) and count the size of both sides.)
(d) Explain why the Chinese remainder theorem (Theorem 2.24) is equivalent to the assertion that (b) and (c) are true.

2.23. Use the method described in Sect.
2.8.1 to find square roots modulo the following composite moduli.
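The Chinese remainder computations in Exercises 2.18, 2.20, and 2.23 can be automated with a short sketch of the construction from Exercise 2.20, iterated over several pairwise coprime moduli; the helper name `crt` is ours.

```python
# Sketch of the CRT construction of Exercise 2.20: at each step solve
# c = (b - a) * m^(-1) (mod n) and replace (a, m) by (a + c*m, m*n).

def crt(residues, moduli):
    x, m = residues[0], moduli[0]
    for a, n in zip(residues[1:], moduli[1:]):
        c = ((a - x) * pow(m, -1, n)) % n
        x, m = x + c * m, m * n
    return x % m

# Exercise 2.18(a): x = 3 (mod 7) and x = 4 (mod 9).
x = crt([3, 4], [7, 9])

# A square root of 340 modulo 437 = 19 * 23, assembled from square roots
# modulo each prime factor (6^2 = 340 (mod 19) and 8^2 = 340 (mod 23)).
r = crt([6, 8], [19, 23])
```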
(a) Find a square root of 340 modulo 437. (Note that 437 = 19 · 23.)
(b) Find a square root of 253 modulo 3143.
(c) Find four square roots of 2833 modulo 4189. (The modulus factors as 4189 = 59 · 71. Note that your four square roots should be distinct modulo 4189.)
(d) Find eight square roots of 813 modulo 868.

2.24. Let p be an odd prime, let a be an integer that is not divisible by p, and let b be a square root of a modulo p. This exercise investigates the square root of a modulo powers of p.
(a) Prove that for some choice of k, the number b + kp is a square root of a modulo p^2, i.e., (b + kp)^2 ≡ a (mod p^2).
(b) The number b = 537 is a square root of a = 476 modulo the prime p = 1291. Use the idea in (a) to compute a square root of 476 modulo p^2.
(c) Suppose that b is a square root of a modulo p^n. Prove that for some choice of j, the number b + jp^n is a square root of a modulo p^(n+1).
(d) Explain why (c) implies the following statement: If p is an odd prime and if a has a square root modulo p, then a has a square root modulo p^n for every power of p. Is this true if p = 2?
(e) Use the method in (c) to compute the square root of 3 modulo 13^3, given that 9^2 ≡ 3 (mod 13).

2.25. Suppose n = pq with p and q distinct odd primes.
(a) Suppose that gcd(a, pq) = 1. Prove that if the equation x^2 ≡ a (mod n) has any solutions, then it has four solutions.
(b) Suppose that you had a machine that could find all four solutions for some given a. How could you use this machine to factor n?

Section 2.9. The Pohlig–Hellman Algorithm

2.26. Let F_p be a finite field and let N | p − 1. Prove that F_p^* has an element of order N. This is true in particular for any prime power that divides p − 1. (Hint. Use the fact that F_p^* has a primitive root.)

2.27. Write out your own proof that the Pohlig–Hellman algorithm works in the particular case that p − 1 = q1 · q2 is a product of two distinct primes.
This provides a good opportunity for you to understand how the proof works and to get a feel for how it was discovered.

2.28. Use the Pohlig–Hellman algorithm (Theorem 2.31) to solve the discrete logarithm problem g^x = a in F_p in each of the following cases.
(a) p = 433, g = 7, a = 166.
(b) p = 746497, g = 10, a = 243278.
(c) p = 41022299, g = 2, a = 39183497. (Hint. p = 2 · 29^5 + 1.)
(d) p = 1291799, g = 17, a = 192988. (Hint. p − 1 has a factor of 709.)

Section 2.10. Rings, Quotient Rings, Polynomial Rings, and Finite Fields
2.29. Let R be a ring with the property that the only way that a product a · b can be 0 is if a = 0 or b = 0. (In the terminology of Example 2.55, the ring R has no zero divisors.) Suppose further that R has only finitely many elements. Prove that R is a field. (Hint. Let a ∈ R with a ≠ 0. What can you say about the map R → R defined by b ↦ a · b?)

2.30. Let R be a ring. Prove the following properties of R directly from the ring axioms described in Sect. 2.10.1.
(a) Prove that the additive identity element 0 ∈ R is unique, i.e., prove that there is only one element in R satisfying 0 + a = a + 0 = a for every a ∈ R.
(b) Prove that the multiplicative identity element 1 ∈ R is unique.
(c) Prove that every element of R has a unique additive inverse.
(d) Prove that 0 · a = a · 0 = 0 for all a ∈ R.
(e) We denote the additive inverse of a by −a. Prove that −(−a) = a.
(f) Let −1 be the additive inverse of the multiplicative identity element 1 ∈ R. Prove that (−1) · (−1) = 1.
(g) Prove that b | 0 for every nonzero b ∈ R.
(h) Prove that an element of R has at most one multiplicative inverse.

2.31. Let R and S be rings. A function φ : R → S is called a (ring) homomorphism if it satisfies
φ(a + b) = φ(a) + φ(b) and φ(ab) = φ(a) φ(b) for all a, b ∈ R.
(a) Let 0_R, 0_S, 1_R, and 1_S denote the additive and multiplicative identities of R and S, respectively. Prove that
φ(0_R) = 0_S, φ(1_R) = 1_S, φ(−a) = −φ(a), φ(a^(−1)) = φ(a)^(−1),
where the last equality holds for those a ∈ R that have a multiplicative inverse.
(b) Let p be a prime, and let R be a ring with the property that pa = 0 for every a ∈ R. (Here pa means to add a to itself p times.) Prove that the map
φ : R → R, φ(a) = a^p,
is a ring homomorphism. It is called the Frobenius homomorphism.

2.32. Prove Proposition 2.41.

2.33. Prove Proposition 2.43. (Hint. First use Exercise 2.32 to prove that the congruence classes a + b and a · b depend only on the congruence classes of a and b.)

2.34.
Let F be a field and let a and b be nonzero polynomials in F[x].
(a) Prove that deg(a · b) = deg(a) + deg(b).
(b) Prove that a has a multiplicative inverse in F[x] if and only if a is in F, i.e., if and only if a is a constant polynomial.
(c) Prove that every nonzero element of F[x] can be factored into a product of irreducible polynomials. (Hint. Use (a), (b), and induction on the degree of the polynomial.)
(d) Let R be the ring Z/6Z. Give an example to show that (a) is false for some polynomials a and b in R[x].
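The Euclidean algorithm works in F_p[x] just as in Z, dividing out the leading coefficient at each step. Below is a sketch (polynomials as coefficient lists, highest degree first; the helper names are ours), applied to the polynomials of Exercise 2.35 in F_2[x].

```python
# Sketch of division with remainder and gcd in F_p[x].

def poly_mod(a, b, p):
    """Remainder of a divided by b in F_p[x] (b has nonzero leading coefficient)."""
    a = a[:]
    inv = pow(b[0], -1, p)                  # inverse of b's leading coefficient
    while len(a) >= len(b) and any(a):
        coeff = (a[0] * inv) % p            # cancel a's leading term
        for i in range(len(b)):
            a[i] = (a[i] - coeff * b[i]) % p
        while a and a[0] == 0:              # strip leading zeros
            a.pop(0)
    return a

def poly_gcd(a, b, p):
    while b:
        a, b = b, poly_mod(a, b, p)
    return [(c * pow(a[0], -1, p)) % p for c in a]   # normalize to be monic

# Exercise 2.35(a): gcd of a and b in F_2[x].
a = [1, 3, -5, -3, 2, 2]   # x^5 + 3x^4 - 5x^3 - 3x^2 + 2x + 2
b = [1, 1, -2, 4, 1, 5]    # x^5 + x^4 - 2x^3 + 4x^2 + x + 5
g = poly_gcd([c % 2 for c in a], [c % 2 for c in b], 2)
```

Here `g` comes out as the coefficient list of x^3 + x^2 + x + 1; parts (b)–(d) run the same way with p = 3, 5, 7.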
2.35. Let a and b be the polynomials
a = x^5 + 3x^4 − 5x^3 − 3x^2 + 2x + 2,
b = x^5 + x^4 − 2x^3 + 4x^2 + x + 5.
Use the Euclidean algorithm to compute gcd(a, b) in each of the following rings.
(a) F_2[x] (b) F_3[x] (c) F_5[x] (d) F_7[x].

2.36. Continuing with the same polynomials a and b as in Exercise 2.35, for each of the polynomial rings (a)–(d) in Exercise 2.35, find polynomials u and v satisfying
a · u + b · v = gcd(a, b).

2.37. Prove that the polynomial x^3 + x + 1 is irreducible in F_2[x]. (Hint. Think about what a factorization would have to look like.)

2.38. The multiplication table for the field F_2[x]/(x^3 + x + 1) is given in Table 2.5, but we have omitted fourteen entries. Fill in the missing entries. (This is the field described in Example 2.57. You can download and print a copy of Table 2.5 at www.math.brown.edu/~jhs/MathCrypto/Table2.5.pdf.)

[Table 2.5: Multiplication table for the field F_2[x]/(x^3 + x + 1). Rows and columns are indexed by 0, 1, x, x^2, 1 + x, 1 + x^2, x + x^2, 1 + x + x^2, and fourteen entries are left blank for the reader to fill in.]

2.39. The field F_7[x]/(x^2 + 1) is a field with 49 elements, which for the moment we denote by F_49. (See Example 2.58 for a convenient way to work with F_49.)
(a) Is 2 + 5x a primitive root in F_49?
(b) Is 2 + x a primitive root in F_49?
(c) Is 1 + x a primitive root in F_49?
(Hint. Lagrange's theorem says that the order of u ∈ F_49 must divide 48. So if u^k ≠ 1 for all proper divisors k of 48, then u is a primitive root.)

2.40. Let p be a prime number and let e ≥ 2. The quotient ring Z/p^eZ and the finite field F_(p^e) are both rings and both have the same number of elements. Describe some ways in which they are intrinsically different.

2.41. Let F be a finite field.
(a) Prove that there is an integer m ≥ 1 such that if we add 1 to itself m times, 1 + 1 + · · · + 1
then we get 0. Note that here 1 and 0 are the multiplicative and additive identity elements of the field F. If the notation is confusing, you can let u and z be the multiplicative and additive identity elements of F, and then you need to prove that
u + u + · · · + u = z.
(Hint. Since F is finite, the numbers 1, 1 + 1, 1 + 1 + 1, . . . cannot all be different.)
(b) Let m be the smallest positive integer with the property described in (a). Prove that m is prime. (Hint. If m factors, show that there are nonzero elements in F whose product is zero, so F cannot be a field.) This prime is called the characteristic of the field F.
(c) Let p be the characteristic of F. Prove that F is a finite-dimensional vector space over the field F_p of p elements.
(d) Use (c) to deduce that F has p^d elements for some d ≥ 1.
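Several of the discrete logarithm exercises above (notably Exercise 2.28) can be checked with a short program. Below is a sketch of the Pohlig–Hellman reduction (Theorem 2.31) that brute-forces the logarithm in each prime-power subgroup and recombines with the CRT; it assumes the logarithm exists and that p − 1 has small prime-power factors, and all helper names are ours.

```python
# Sketch of Pohlig-Hellman for g^x = a in F_p^*, adequate for small examples.

def factor_prime_powers(n):
    """Factor n as {q: e} by trial division (fine for small n)."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def pohlig_hellman(g, a, p):
    residues, moduli = [], []
    for q, e in factor_prime_powers(p - 1).items():
        qe = q ** e
        gq = pow(g, (p - 1) // qe, p)   # element of order q^e
        aq = pow(a, (p - 1) // qe, p)
        value = 1
        for x in range(qe):             # brute-force the DLP in the subgroup
            if value == aq:
                residues.append(x)
                moduli.append(qe)
                break
            value = (value * gq) % p
    x, m = 0, 1                         # recombine with the CRT
    for r, n in zip(residues, moduli):
        c = ((r - x) * pow(m, -1, n)) % n
        x, m = x + c * m, m * n
    return x

# Exercise 2.28(a): p = 433, g = 7, a = 166; here p - 1 = 2^4 * 3^3.
x = pohlig_hellman(7, 166, 433)
```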
Chapter 3
Integer Factorization and RSA

3.1 Euler's Formula and Roots Modulo pq

The Diffie–Hellman key exchange method and the Elgamal public key cryptosystem studied in Sects. 2.3 and 2.4 rely on the fact that it is easy to compute powers a^n mod p, but difficult to recover the exponent n if you know only the values of a and a^n mod p. An essential result that we used to analyze the security of Diffie–Hellman and Elgamal is Fermat's little theorem (Theorem 1.24),
a^(p−1) ≡ 1 (mod p) for all a ≢ 0 (mod p).
Fermat's little theorem expresses a beautiful property of prime numbers. It is natural to ask what happens if we replace p with a number m that is not prime. Is it still true that a^(m−1) ≡ 1 (mod m)? A few computations such as Example 1.28 in Sect. 1.4 will convince you that the answer is no. In this section we investigate the correct generalization of Fermat's little theorem when m = pq is a product of two distinct primes, since this is the case that is most important for cryptographic applications. We leave the general case for you to do in Exercises 3.4 and 3.5.
As usual, we begin with an example. What do powers modulo 15 look like? If we make a table of squares and cubes modulo 15, they do not look very interesting, but many fourth powers are equal to 1 modulo 15. More precisely, we find that
a^4 ≡ 1 (mod 15) for a = 1, 2, 4, 7, 8, 11, 13, and 14;
a^4 ≢ 1 (mod 15) for a = 3, 5, 6, 9, 10, and 12.
© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography, Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2_3
117
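The two lists of fourth powers above are easy to verify by machine:

```python
# Check which a with 1 <= a <= 14 satisfy a^4 = 1 (mod 15); per the text,
# these should be exactly the a that are relatively prime to 15.
from math import gcd

fourth_power_is_one = [a for a in range(1, 15) if pow(a, 4, 15) == 1]
coprime_to_15 = [a for a in range(1, 15) if gcd(a, 15) == 1]
```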
What distinguishes the list of numbers 1, 2, 4, 7, 8, 11, 13, 14 whose fourth power is 1 modulo 15 from the list of numbers 3, 5, 6, 9, 10, 12, 15 whose fourth power is not 1 modulo 15? A moment's reflection shows that each of the numbers 3, 5, 6, 9, 10, 12, 15 has a nontrivial factor in common with the modulus 15, while the numbers 1, 2, 4, 7, 8, 11, 13, 14 are relatively prime to 15. This suggests that some version of Fermat's little theorem should be true if the number a is relatively prime to the modulus m, but the correct exponent to use is not necessarily m − 1. For m = 15 we found that the right exponent is 4.
Why does 4 work? We could simply check each value of a, but a more enlightening argument would be better. In order to show that a^4 ≡ 1 (mod 15), it is enough to check the two congruences
a^4 ≡ 1 (mod 3) and a^4 ≡ 1 (mod 5). (3.1)
This is because the two congruences (3.1) say that 3 divides a^4 − 1 and 5 divides a^4 − 1, which in turn imply that 15 divides a^4 − 1. The two congruences in (3.1) are modulo primes, so we can use Fermat's little theorem to check that they are true. Thus
a^4 = (a^2)^2 = (a^(3−1))^2 ≡ 1^2 ≡ 1 (mod 3),
a^4 = a^(5−1) ≡ 1 (mod 5).
If you think about these two congruences, you will see that the crucial property of the exponent 4 is that it is a multiple of p − 1 for both p = 3 and p = 5. Notice that this is not true of 14, which does not work as an exponent. With this observation, we are ready to state the fundamental formula that underlies the RSA public key cryptosystem.

Theorem 3.1 (Euler's Formula for pq). Let p and q be distinct primes and let
g = gcd(p − 1, q − 1).
Then
a^((p−1)(q−1)/g) ≡ 1 (mod pq) for all a satisfying gcd(a, pq) = 1.
In particular, if p and q are odd primes, then
a^((p−1)(q−1)/2) ≡ 1 (mod pq) for all a satisfying gcd(a, pq) = 1.

Proof.
By assumption we know that p does not divide a and that g divides q − 1, so we can compute
a^((p−1)(q−1)/g) = (a^(p−1))^((q−1)/g) since (q − 1)/g is an integer,
≡ 1^((q−1)/g) (mod p) since a^(p−1) ≡ 1 (mod p) from Fermat's little theorem,
≡ 1 (mod p) since 1 to any power is 1!
The exact same computation, reversing the roles of p and q, shows that
a^((p−1)(q−1)/g) ≡ 1 (mod q).
This proves that a^((p−1)(q−1)/g) − 1 is divisible by both p and by q; hence it is divisible by pq, which completes the proof of Theorem 3.1.

Diffie–Hellman key exchange and the Elgamal public key cryptosystem (Sects. 2.3 and 2.4) rely for their security on the difficulty of solving equations of the form
a^x ≡ b (mod p),
where a, b, and p are known quantities, p is a prime, and x is the unknown variable. The RSA public key cryptosystem, which we study in the next section, relies on the difficulty of solving equations of the form
x^e ≡ c (mod N),
where now the quantities e, c, and N are known and x is the unknown. In other words, the security of RSA relies on the assumption that it is difficult to take eth roots modulo N. Is this a reasonable assumption?
If the modulus N is prime, then it turns out that it is comparatively easy to compute eth roots modulo N, as described in the next proposition.

Proposition 3.2. Let p be a prime and let e ≥ 1 be an integer satisfying gcd(e, p − 1) = 1. Proposition 1.13 tells us that e has an inverse modulo p − 1, say
de ≡ 1 (mod p − 1).
Then the congruence
x^e ≡ c (mod p) (3.2)
has the unique solution x ≡ c^d (mod p).

Proof. If c ≡ 0 (mod p), then x ≡ 0 (mod p) is the unique solution and we are done. So we assume that c ≢ 0 (mod p). The proof is then an easy application of Fermat's little theorem (Theorem 1.24). The congruence de ≡ 1 (mod p − 1) means that there is an integer k such that
de = 1 + k(p − 1).
Now we check that c^d is a solution to x^e ≡ c (mod p):
(c^d)^e ≡ c^(de) (mod p) law of exponents,
≡ c^(1+k(p−1)) (mod p) since de = 1 + k(p − 1),
≡ c · (c^(p−1))^k (mod p) law of exponents again,
≡ c · 1^k (mod p) from Fermat's little theorem,
≡ c (mod p).
This completes the proof that x = c^d is a solution to x^e ≡ c (mod p).
In order to see that the solution is unique, suppose that x1 and x2 are both solutions to the congruence (3.2). We've just proven that z^(de) ≡ z (mod p) for any nonzero value z, so we find that
x1 ≡ x1^(de) ≡ (x1^e)^d ≡ c^d ≡ (x2^e)^d ≡ x2^(de) ≡ x2 (mod p).
Thus x1 and x2 are the same modulo p, so (3.2) has at most one solution.

Example 3.3. We solve the congruence
x^1583 ≡ 4714 (mod 7919),
where the modulus p = 7919 is prime. Proposition 3.2 says that first we need to solve the congruence
1583d ≡ 1 (mod 7918).
The solution, using the extended Euclidean algorithm (Theorem 1.11; see also Remark 1.15 and Exercise 1.12), is d ≡ 5277 (mod 7918). Then Proposition 3.2 tells us that
x ≡ 4714^5277 ≡ 6059 (mod 7919)
is a solution to x^1583 ≡ 4714 (mod 7919).

Remark 3.4. Proposition 3.2 includes the assumption that gcd(e, p − 1) = 1. If this assumption is omitted, then the congruence x^e ≡ c (mod p) will have a solution for some, but not all, values of c. Further, if it does have a solution, then it will have more than one. See Exercise 3.2 for further details.
Proposition 3.2 shows that it is easy to take eth roots if the modulus is a prime p. The situation for a composite modulus N looks similar, but there is a crucial difference. If we know how to factor N, then it is again easy to compute eth roots. The following proposition explains how to do this if N = pq is a product of two primes. The general case is left for you to do in Exercise 3.6.

Proposition 3.5. Let p and q be distinct primes and let e ≥ 1 satisfy
gcd(e, (p − 1)(q − 1)) = 1.
Proposition 1.13 tells us that e has an inverse modulo (p − 1)(q − 1), say
de ≡ 1 (mod (p − 1)(q − 1)).
Then the congruence
x^e ≡ c (mod pq) (3.3)
has the unique solution x ≡ c^d (mod pq).

Proof. We assume that gcd(c, pq) = 1; see Exercise 3.3 for the other cases. The proof of Proposition 3.5 is almost identical to the proof of Proposition 3.2, but instead of using Fermat's little theorem, we use Euler's formula (Theorem 3.1). The congruence de ≡ 1 (mod (p − 1)(q − 1)) means that there is an integer k such that
de = 1 + k(p − 1)(q − 1).
Now we check that c^d is a solution to x^e ≡ c (mod pq):
(c^d)^e ≡ c^(de) (mod pq) law of exponents,
≡ c^(1+k(p−1)(q−1)) (mod pq) since de = 1 + k(p − 1)(q − 1),
≡ c · (c^((p−1)(q−1)))^k (mod pq) law of exponents again,
≡ c · 1^k (mod pq) from Euler's formula (Theorem 3.1),
≡ c (mod pq).
This completes the proof that x = c^d is a solution to the congruence (3.3).
It remains to show that the solution is unique. Suppose that x = u is a solution to (3.3). Then
u ≡ u^(de−k(p−1)(q−1)) (mod pq) since de = 1 + k(p − 1)(q − 1),
≡ (u^e)^d · (u^((p−1)(q−1)))^(−k) (mod pq)
≡ (u^e)^d · 1^(−k) (mod pq) using Euler's formula (Theorem 3.1),
≡ c^d (mod pq) since u is a solution to (3.3).
Thus every solution to (3.3) is equal to c^d (mod pq), so this is the unique solution.

Remark 3.6. Proposition 3.5 gives an algorithm for solving x^e ≡ c (mod pq) that involves first solving de ≡ 1 (mod (p − 1)(q − 1)) and then computing c^d mod pq. We can often make the computation faster by using a smaller value of d. Let g = gcd(p − 1, q − 1) and suppose that we solve the following congruence for d:
de ≡ 1 (mod (p − 1)(q − 1)/g)
  • 147. . Euler’s formula (Theorem 3.1) says that a(p−1)(q−1)/g ≡ 1 (mod pq). Hence just as in the proof of Proposition 3.5, if we write de = 1 + k(p − 1)(q − 1)/g, then (cd )e = cde = c1+k(p−1)(q−1)/g = c · (c(p−1)(q−1)/g )k ≡ c (mod pq).
Thus using this smaller value of d, we still find that c^d mod pq is a solution to x^e ≡ c (mod pq).

Example 3.7. We solve the congruence
x^17389 ≡ 43927 (mod 64349),
where the modulus N = 64349 = 229 · 281 is a product of the two primes p = 229 and q = 281. The first step is to solve the congruence
17389d ≡ 1 (mod 63840), where 63840 = (p − 1)(q − 1) = 228 · 280.
The solution, using the method described in Remark 1.15 or Exercise 1.12, is d ≡ 53509 (mod 63840). Then Proposition 3.5 tells us that
x ≡ 43927^53509 ≡ 14458 (mod 64349)
is the solution to x^17389 ≡ 43927 (mod 64349).
We can save ourselves a little bit of work by using the idea described in Remark 3.6. We have
g = gcd(p − 1, q − 1) = gcd(228, 280) = 4, so (p − 1)(q − 1)/g = (228)(280)/4 = 15960,
which means that we can find a value of d by solving the congruence
17389d ≡ 1 (mod 15960).
The solution is d ≡ 5629 (mod 15960), and then
x ≡ 43927^5629 ≡ 14458 (mod 64349)
is the solution to x^17389 ≡ 43927 (mod 64349). Notice that we obtained the same solution, as we should, but that we needed to raise 43927 to only the 5629th power, while using Proposition 3.5 directly required us to raise 43927 to the 53509th power. This saves some time, although not quite as much as it looks, since recall that computing c^d mod N takes time O(ln d). Thus the faster method takes about 80% as long as the slower method, since ln(5629)/ln(53509) ≈ 0.793.

Example 3.8. Alice challenges Eve to solve the congruence
x^9843 ≡ 134872 (mod 30069476293).
The modulus 30069476293 is not prime, since (cf. Example 1.28)
2^(30069476293−1) ≡ 18152503626 ≢ 1 (mod 30069476293).
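The two computations of Example 3.7 above are easy to replay in code; Python's three-argument `pow` and `pow(e, -1, m)` (modular inverse, Python 3.8+) do all the work.

```python
# Recompute Example 3.7: the e-th root of c modulo N = pq, with both the
# decryption exponent d from Proposition 3.5 and the smaller d from Remark 3.6.
from math import gcd

p, q, e, c = 229, 281, 17389, 43927
N = p * q

d_full = pow(e, -1, (p - 1) * (q - 1))                        # Proposition 3.5
d_small = pow(e, -1, (p - 1) * (q - 1) // gcd(p - 1, q - 1))  # Remark 3.6

x1 = pow(c, d_full, N)   # 43927^53509 mod 64349
x2 = pow(c, d_small, N)  # 43927^5629 mod 64349, the same root
```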
It happens that 30069476293 is a product of two primes, but if Eve does not know the prime factors, she cannot use Proposition 3.5 to solve Alice's challenge. After accepting Eve's concession of defeat, Alice informs Eve that 30069476293 is equal to 104729 · 287117. With this new knowledge, Alice's challenge becomes easy. Eve computes 104728 · 287116 = 30069084448, solves the congruence
9843d ≡ 1 (mod 30069084448)
to find d ≡ 18472798299 (mod 30069084448), and computes the solution
x ≡ 134872^18472798299 ≡ 25470280263 (mod 30069476293).

Key creation (Bob): Choose secret primes p and q. Choose encryption exponent e with gcd(e, (p − 1)(q − 1)) = 1. Publish N = pq and e.
Encryption (Alice): Choose plaintext m. Use Bob's public key (N, e) to compute c ≡ m^e (mod N). Send ciphertext c to Bob.
Decryption (Bob): Compute d satisfying ed ≡ 1 (mod (p − 1)(q − 1)). Compute m′ ≡ c^d (mod N). Then m′ equals the plaintext m.
Table 3.1: RSA key creation, encryption, and decryption

3.2 The RSA Public Key Cryptosystem

Bob and Alice have the usual problem of exchanging sensitive information over an insecure communication line. We have seen in Chap. 2 various ways in which Bob and Alice can accomplish this task, based on the difficulty of solving the discrete logarithm problem. In this section we describe the RSA public key cryptosystem, the first invented and certainly best known such system. RSA is named after its (public) inventors, Ron Rivest, Adi Shamir, and Leonard Adleman.
The security of RSA depends on the following dichotomy:
• Setup. Let p and q be large primes, let N = pq, and let e and c be integers.
• Problem. Solve the congruence x^e ≡ c (mod N) for the variable x.
• Easy. Bob, who knows the values of p and q, can easily solve for x as described in Proposition 3.5.
• Hard. Eve, who does not know the values of p and q, cannot easily find x.
• Dichotomy. Solving x^e ≡ c (mod N) is easy for a person who possesses certain extra information, but it is apparently hard for all other people.
The RSA public key cryptosystem is summarized in Table 3.1. Bob's secret key is a pair of large primes p and q. His public key is the pair (N, e) consisting of the product N = pq and an encryption exponent e that is relatively prime to (p − 1)(q − 1). Alice takes her plaintext and converts it into an integer m between 1 and N. She encrypts m by computing the quantity
c ≡ m^e (mod N).
The integer c is her ciphertext, which she sends to Bob. It is then a simple matter for Bob to solve the congruence x^e ≡ c (mod N) to recover Alice's message m, because Bob knows the factorization N = pq. Eve, on the other hand, may intercept the ciphertext c, but unless she knows how to factor N, she presumably has a difficult time trying to solve x^e ≡ c (mod N).

Example 3.9. We illustrate the RSA public key cryptosystem with a small numerical example. Of course, this example is not secure, since the numbers are so small that it would be easy for Eve to factor the modulus N. Secure implementations of RSA use moduli N with hundreds of digits.

RSA Key Creation
• Bob chooses two secret primes p = 1223 and q = 1987. Bob computes his public modulus
N = p · q = 1223 · 1987 = 2430101.
• Bob chooses a public encryption exponent e = 948047 with the property that
gcd(e, (p − 1)(q − 1)) = gcd(948047, 2426892) = 1.

RSA Encryption
• Alice converts her plaintext into an integer m = 1070777 satisfying 1 ≤ m < N.
• Alice uses Bob's public key (N, e) = (2430101, 948047) to compute
c ≡ m^e (mod N), i.e., c ≡ 1070777^948047 ≡ 1473513 (mod 2430101).
• Alice sends the ciphertext c = 1473513 to Bob.
RSA Decryption
• Bob knows (p − 1)(q − 1) = 1222 · 1986 = 2426892, so he can solve
ed ≡ 1 (mod (p − 1)(q − 1)), i.e., 948047 · d ≡ 1 (mod 2426892),
for d and find that d = 1051235.
• Bob takes the ciphertext c = 1473513 and computes
c^d (mod N), i.e., 1473513^1051235 ≡ 1070777 (mod 2430101).
The value that he computes is Alice's message m = 1070777.

Remark 3.10. The quantities N and e that form Bob's public key are called, respectively, the modulus and the encryption exponent. The number d that Bob uses to decrypt Alice's message, that is, the number d satisfying
ed ≡ 1 (mod (p − 1)(q − 1)), (3.4)
is called the decryption exponent. It is clear that encryption can be done more efficiently if the encryption exponent e is a small number, and similarly, decryption is more efficient if the decryption exponent d is small. Of course, Bob cannot choose both of them to be small, since once one of them is selected, the other is determined by the congruence (3.4). (This is not strictly true, since if Bob takes e = 1, then also d = 1, so both d and e are small. But then the plaintext and the ciphertext are identical, so taking e = 1 is a very bad idea!) Notice that Bob cannot take e = 2, since he needs e to be relatively prime to (p − 1)(q − 1). Thus the smallest possible value for e is e = 3. As far as is known, taking e = 3 is as secure as taking a larger value of e, although some doubts are raised in [22]. People who want fast encryption, but are worried that e = 3 is too small, often take e = 2^16 + 1 = 65537, since it takes only sixteen squarings and one multiplication to compute m^65537 via the square-and-multiply algorithm described in Sect. 1.3.2.
An alternative is for Bob to use a small value for d and use the congruence (3.4) to determine e, so e would be large. However, it turns out that this may lead to an insecure version of RSA.
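The key creation, encryption, and decryption steps of Example 3.9 fit in a few lines of code; the numbers are the book's, while the variable names are ours.

```python
# Replay Example 3.9: RSA key creation, encryption, and decryption.

# Key creation (Bob)
p, q = 1223, 1987
N = p * q                          # public modulus 2430101
e = 948047                         # public encryption exponent
d = pow(e, -1, (p - 1) * (q - 1))  # secret decryption exponent

# Encryption (Alice): m -> c = m^e (mod N)
m = 1070777
c = pow(m, e, N)

# Decryption (Bob): c -> c^d (mod N), which recovers m by Proposition 3.5
recovered = pow(c, d, N)
```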
More precisely, if d is smaller than N^(1/4), then the theory of continued fractions allows Eve to break RSA. See [17, 18, 19, 149] for details.

Remark 3.11. Bob's public key includes the number N = pq, which is a product of two secret primes p and q. Proposition 3.5 says that if Eve knows the value of (p − 1)(q − 1), then she can solve x^e ≡ c (mod N), and thus can decrypt messages sent to Bob. Expanding (p − 1)(q − 1) gives
(p − 1)(q − 1) = pq − p − q + 1 = N − (p + q) + 1. (3.5)
Bob has published the value of N, so Eve already knows N. Thus if Eve can determine the value of the sum p + q, then (3.5) gives her the value of (p − 1)(q − 1), which enables her to decrypt messages.
  • 152. 126 3. Integer Factorization and RSA In fact, if Eve knows the values of p + q and pq, then it is easy for her to compute the values of p and q. She simply uses the quadratic formula to find the roots of the polynomial X^2 − (p + q)X + pq, since this polynomial factors as (X − p)(X − q), so its roots are p and q. Thus once Bob publishes the value of N = pq, it is no easier for Eve to find the value of (p − 1)(q − 1) than it is for her to find p and q themselves. We illustrate with an example. Suppose that Eve knows that N = pq = 66240912547 and (p − 1)(q − 1) = 66240396760. She first uses (3.5) to compute p + q = N + 1 − (p − 1)(q − 1) = 515788. Then she uses the quadratic formula to factor the polynomial X^2 − (p + q)X + N = X^2 − 515788X + 66240912547 = (X − 241511)(X − 274277). This gives her the factorization N = 66240912547 = 241511 · 274277. Remark 3.12. One final, but very important, observation. We have shown that it is no easier for Eve to determine (p − 1)(q − 1) than it is for her to factor N. But this does not prove that Eve must factor N in order to decrypt Bob's messages. The point is that what Eve really needs to do is to solve congruences of the form x^e ≡ c (mod N), and conceivably there is an efficient algorithm to solve such congruences without knowing the value of (p − 1)(q − 1). No one knows whether such a method exists, although see [22] for a suggestion that computing roots modulo N may be easier than factoring N. 3.3 Implementation and Security Issues Our principal focus in this book is the mathematics of the hard problems underlying modern cryptography, but we would be remiss if we did not at least briefly mention some of the security issues related to implementation.
The reader should be aware that we do not even scratch the surface of this vast and fascinating subject, but simply describe some examples to show that there is far more to creating a secure communications system than simply using a cryptosystem based on an intractable mathematical problem. Example 3.13 (Woman-in-the-Middle Attack). Suppose that Eve is not simply an eavesdropper, but that she has full control over Alice and Bob's communication network. In this case, she can institute what is known as a man-in-the-middle attack. We describe this attack for Diffie–Hellman key exchange, but it exists for most public key constructions. (See Exercise 3.12.)
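The attack is easy to simulate. A toy sketch: the prime p = 941 and base g = 627, as well as the three secret exponents, are made-up illustrative values, not parameters from the text.

```python
p, g = 941, 627   # toy public Diffie-Hellman parameters (illustrative only)

a, b = 347, 781   # Alice's and Bob's secret exponents
e = 500           # Eve's secret exponent

A = pow(g, a, p)  # Alice sends A = g^a, but Eve intercepts it ...
B = pow(g, b, p)  # ... likewise Bob's B = g^b
E = pow(g, e, p)  # ... and Eve forwards E = g^e to both parties instead

# Each party computes a "shared" key from the value actually received.
key_alice = pow(E, a, p)   # Alice believes this is shared with Bob
key_bob   = pow(E, b, p)   # Bob believes this is shared with Alice

# Eve reproduces both keys from the intercepted A and B,
# since E^a = g^(ea) = A^e and E^b = g^(eb) = B^e.
assert key_alice == pow(A, e, p)
assert key_bob == pow(B, e, p)
# In general key_alice != key_bob, so Eve must decrypt and re-encrypt
# every message to keep Alice and Bob from noticing.
```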
  • 153. 3.3. Implementation and Security Issues 127 Recall that in Diffie–Hellman key exchange (Table 2.2), Alice sends Bob the value A = g^a and Bob sends Alice the value B = g^b, where the computations take place in the finite field Fp. What Eve does is to choose her own secret exponent e and compute the value E = g^e. She then intercepts Alice and Bob's communications, and instead of sending A to Bob and sending B to Alice, she sends both of them the number E. Notice that Eve has exchanged the value A^e with Alice and the value B^e with Bob, while Alice and Bob believe that they have exchanged values with each other. The man-in-the-middle attack is illustrated in Fig. 3.1: Alice → Eve: A = g^a; Eve → Bob: E = g^e; Bob → Eve: B = g^b; Eve → Alice: E = g^e. Figure 3.1: "Man-in-the-middle" attack on Diffie–Hellman key exchange Suppose that Alice and Bob subsequently use their supposed secret shared value as the key for a symmetric cipher and send each other messages. For example, Alice encrypts a plaintext message m using E^a as the symmetric cipher key. Eve intercepts this message and is able to decrypt it using A^e as the symmetric cipher key, so she can read Alice's message. She then re-encrypts it using B^e as the symmetric cipher key and sends it to Bob. Since Bob is then able to decrypt it using E^b as the symmetric cipher key, he is unaware that there is a breach in security. Notice the insidious nature of this attack. Eve does not solve the underlying hard problem (in this case, the discrete logarithm problem or the Diffie–Hellman problem), yet she is able to read Alice and Bob's communications, and they are not aware of her success. Example 3.14. Suppose that Eve is able to convince Alice to decrypt "random" RSA messages using her (Alice's) private key. This is a plausible scenario, since one way for Alice to authenticate her identity as the owner of the public key (N, e) is to show that she knows how to decrypt messages.
(One says that Eve has access to an RSA oracle.) Eve can exploit Alice's generosity as follows. Suppose that Eve has intercepted a ciphertext c that Bob has sent to Alice. Eve chooses a random value k and sends Alice the "message" c′ ≡ k^e · c (mod N). Alice decrypts c′ and returns the resulting m′ to Eve, where m′ ≡ (c′)^d ≡ (k^e · c)^d ≡ (k^e · m^e)^d ≡ k · m (mod N). Thus Eve knows the quantity k · m (mod N), and since she knows k, she immediately recovers Bob's plaintext m.
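This blinding attack can be demonstrated with the RSA numbers from the example of Sect. 3.2 (N = 2430101, e = 948047, d = 1051235); the masking value k = 54321 is an arbitrary choice made here for illustration:

```python
N, e, d = 2430101, 948047, 1051235   # Alice's RSA key from the earlier example
c = 1473513                          # intercepted ciphertext of m = 1070777

k = 54321                            # Eve's random masking value
c_blind = (pow(k, e, N) * c) % N     # the "random-looking" message sent to Alice

m_blind = pow(c_blind, d, N)         # Alice obligingly decrypts: this is k*m mod N
m = (m_blind * pow(k, -1, N)) % N    # Eve strips off k with a modular inverse

print(m)   # -> 1070777, Bob's plaintext
```

Note that `c_blind` and `m_blind` look unrelated to `c` and `m`, which is exactly why Alice cannot detect the attack.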
  • 154. 128 3. Integer Factorization and RSA There are two important observations to make. First, Eve has decrypted Bob's message without knowing or gaining knowledge of how to factor N, so the difficulty of the underlying mathematical problem is irrelevant. Second, since Eve has used k to mask Bob's ciphertext, Alice has no way to tell that Eve's message is in any way related to Bob's message. Thus Alice sees only the values k^e · c (mod N) and k · m (mod N), which to her look random when compared to c and m. Example 3.15. Suppose that Alice publishes two different exponents e1 and e2 for use with her public modulus N and that Bob encrypts a single plaintext m using both of Alice's exponents. If Eve intercepts the ciphertexts c1 ≡ m^{e1} (mod N) and c2 ≡ m^{e2} (mod N), she can take a solution to the equation e1 · u + e2 · v = gcd(e1, e2) and use it to compute c1^u · c2^v ≡ (m^{e1})^u · (m^{e2})^v ≡ m^{e1·u+e2·v} ≡ m^{gcd(e1,e2)} (mod N). If it happens that gcd(e1, e2) = 1, Eve has recovered the plaintext. (See Exercise 3.13 for a numerical example.) More generally, if Bob encrypts a single message using several exponents e1, e2, . . . , er, then Eve can recover the plaintext if gcd(e1, e2, . . . , er) = 1. The moral is that Alice should use at most one encryption exponent for a given modulus. 3.4 Primality Testing Bob has finished reading Sects. 3.2 and 3.3 and is now ready to communicate with Alice using his RSA public/private key pair. Or is he? In order to create an RSA key pair, Bob needs to choose two very large primes p and q. It's not enough for him to choose two very large, but possibly composite, numbers p and q. In the first place, if p and q are not prime, Bob will need to know how to factor them in order to decrypt Alice's message. But even worse, if p and q have small prime factors, then Eve may be able to factor pq and break Bob's system. Bob is thus faced with the task of finding large prime numbers.
More precisely, he needs a way of distinguishing between prime numbers and composite numbers, since if he knows how to do this, then he can choose large random numbers until he hits one that is prime. We discuss later (Sect. 3.4.1) the likelihood that a randomly chosen number is prime, but for now it is enough to know that he has a reasonably good chance of success. Hence what Bob really needs is an efficient way to tell whether a very large number is prime.
  • 155. 3.4. Primality Testing 129 For example, suppose that Bob chooses the rather large number n = 31987937737479355332620068643713101490952335301 and he wants to know whether n is prime. First Bob searches for small factors, but he finds that n is not divisible by any primes smaller than 1000000. So he begins to suspect that maybe n is prime. Next he computes the quantity 2^{n−1} mod n and he finds that 2^{n−1} ≡ 1281265953551359064133601216247151836053160074 (mod n). (3.6) The congruence (3.6) immediately tells Bob that n is a composite number, although it does not give him any indication of how to factor n. Why? Recall Fermat's little theorem, which says that if p is prime, then a^{p−1} ≡ 1 (mod p) (unless p divides a). Thus if n were prime, then the right-hand side of (3.6) would equal 1; since it does not equal 1, Bob concludes that n is not prime. Before continuing the saga of Bob's quest for large primes, we state a convenient version of Fermat's little theorem that puts no restrictions on a. Theorem 3.16 (Fermat's Little Theorem, Version 2). Let p be a prime number. Then a^p ≡ a (mod p) for every integer a. (3.7) Proof. If p ∤ a, then the first version of Fermat's little theorem (Theorem 1.24) implies that a^{p−1} ≡ 1 (mod p). Multiplying both sides by a proves that (3.7) is true. On the other hand, if p | a, then both sides of (3.7) are 0 modulo p. Returning to Bob's quest, we find him undaunted as he randomly chooses another large number, n = 2967952985951692762820418740138329004315165131. (3.8) After checking for divisibility by small primes, Bob computes 2^n mod n and finds that 2^n ≡ 2 (mod n). (3.9) Does (3.9) combined with Fermat's little theorem 3.16 prove that n is prime? The answer is NO! Fermat's theorem works in only one direction: If p is prime, then a^p ≡ a (mod p). There is nothing to prevent an equality such as (3.9) being true for composite values of n, and indeed a brief search reveals examples such as 2^341 ≡ 2 (mod 341) with 341 = 11 · 31.
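Bob's computation is a one-liner with fast modular exponentiation; a sketch, using the 46-digit number from (3.6):

```python
def fermat_composite(n, a=2):
    """Return True if a proves n composite via Fermat's little theorem.

    If n is prime, then a^(n-1) ≡ 1 (mod n) whenever gcd(a, n) = 1, so any
    other residue is a proof of compositeness.  A residue of 1 proves nothing.
    """
    return pow(a, n - 1, n) != 1

# The number from (3.6) is exposed as composite ...
n = 31987937737479355332620068643713101490952335301
print(fermat_composite(n))     # -> True: n is composite

# ... but the test can be fooled: 341 = 11 * 31 passes it for a = 2.
print(fermat_composite(341))   # -> False, yet 341 is composite
```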
However, in some vague philosophical sense, the fact that 2^n ≡ 2 (mod n) makes it more likely that n is prime, since if the value of 2^n mod n had turned out differently, we would have known that n was composite. This leads us to make the following definition.
  • 156. 130 3. Integer Factorization and RSA Definition. Fix an integer n. We say that an integer a is a witness for (the compositeness of) n if a^n ≢ a (mod n). As we observed earlier, a single witness for n combined with Fermat's little theorem (Theorem 3.16) is enough to prove beyond a shadow of a doubt that n is composite.1 Thus one way to assess the likelihood that n is prime is to try a lot of numbers a1, a2, a3, . . . . If any one of them is a witness for n, then Bob knows that n is composite; and if none of them is a witness for n, then Bob suspects, but does not know for certain, that n is prime. Unfortunately, intruding on this idyllic scene are barbaric numbers such as 561. The number 561 is composite, 561 = 3 · 11 · 17, yet 561 has no witnesses! In other words, a^561 ≡ a (mod 561) for every integer a. Composite numbers having no witnesses are called Carmichael numbers, after R.D. Carmichael, who in 1910 published a paper listing 15 such numbers. The fact that 561 is a Carmichael number can be verified by checking each value a = 0, 1, 2, . . . , 560, but see Exercise 3.14 for an easier method and for more examples of Carmichael numbers. Although Carmichael numbers are rather rare, Alford, Granville, and Pomerance [5] proved in 1994 that there are infinitely many of them. So Bob needs something stronger than Fermat's little theorem in order to test whether a number is (probably) prime. What is needed is a better test for compositeness. The following property of prime numbers is used to formulate the Miller–Rabin test, which has the agreeable property that every composite number has a large number of witnesses. Proposition 3.17. Let p be an odd prime and write p − 1 = 2^k q with q odd. Let a be any number not divisible by p. Then one of the following two conditions is true: (i) a^q is congruent to 1 modulo p. (ii) One of a^q, a^{2q}, a^{4q}, . . . , a^{2^{k−1}q} is congruent to −1 modulo p. Proof. Fermat's little theorem (Theorem 1.24) tells us that a^{p−1} ≡ 1 (mod p).
This means that when we look at the list of numbers a^q, a^{2q}, a^{4q}, . . . , a^{2^{k−1}q}, a^{2^k q}, we know that the last number in the list, which equals a^{p−1}, is congruent to 1 modulo p. Further, each number in the list is the square of the previous number. Therefore one of the following two possibilities must occur: 1In the great courthouse of mathematics, witnesses never lie!
  • 157. 3.4. Primality Testing 131 (i) The first number in the list is congruent to 1 modulo p. (ii) Some number in the list is not congruent to 1 modulo p, but when it is squared, it becomes congruent to 1 modulo p. But the only number satisfying both b ≢ 1 (mod p) and b^2 ≡ 1 (mod p) is −1, so one of the numbers in the list is congruent to −1 modulo p. This completes the proof of Proposition 3.17. Input. Integer n to be tested, integer a as potential witness. 1. If n is even or 1 < gcd(a, n) < n, return Composite. 2. Write n − 1 = 2^k q with q odd. 3. Set a = a^q (mod n). 4. If a ≡ 1 (mod n), return Test Fails. 5. Loop i = 0, 1, 2, . . . , k − 1 6. If a ≡ −1 (mod n), return Test Fails. 7. Set a = a^2 mod n. 8. End i loop. 9. Return Composite. Table 3.2: Miller–Rabin test for composite numbers Definition. Let n be an odd number and write n − 1 = 2^k q with q odd. An integer a satisfying gcd(a, n) = 1 is called a Miller–Rabin witness for (the compositeness of) n if both of the following conditions are true: (a) a^q ≢ 1 (mod n). (b) a^{2^i q} ≢ −1 (mod n) for all i = 0, 1, 2, . . . , k − 1. It follows from Proposition 3.17 that if there exists an a that is a Miller–Rabin witness for n, then n is definitely a composite number. This leads to the Miller–Rabin test for composite numbers described in Table 3.2. Now suppose that Bob wants to check whether a large number n is probably a prime. To do this, he runs the Miller–Rabin test using a bunch of randomly selected values of a. Why is this better than using the Fermat's little theorem test? The answer is that there are no Carmichael-like numbers for the Miller–Rabin test, and in fact, every composite number has a lot of Miller–Rabin witnesses, as described in the following proposition. Proposition 3.18. Let n be an odd composite number. Then at least 75 % of the numbers a between 1 and n − 1 are Miller–Rabin witnesses for n.
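The procedure of Table 3.2 translates directly into a short routine; a sketch, using the computations of Examples 3.19 and 3.20 below as checks:

```python
from math import gcd

def miller_rabin_witness(n, a):
    """Return True if a is a Miller-Rabin witness for n (Table 3.2)."""
    if n % 2 == 0 or 1 < gcd(a, n) < n:
        return True                  # composite for trivial reasons
    # Write n - 1 = 2^k * q with q odd.
    k, q = 0, n - 1
    while q % 2 == 0:
        k, q = k + 1, q // 2
    a = pow(a, q, n)
    if a == 1:
        return False                 # test fails: condition (i) holds
    for _ in range(k):
        if a == n - 1:               # some a^(2^i * q) ≡ -1 (mod n)
            return False             # test fails
        a = (a * a) % n
    return True                      # a witnesses that n is composite

# 2 is a Miller-Rabin witness for the Carmichael number 561 (Example 3.19).
print(miller_rabin_witness(561, 2))          # -> True
# 17 fails to witness 172947529, but 23 succeeds (Example 3.20).
print(miller_rabin_witness(172947529, 17))   # -> False
print(miller_rabin_witness(172947529, 23))   # -> True
```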
  • 158. 132 3. Integer Factorization and RSA Proof. The proof is not hard, but we will not give it here. See for example [132, Theorem 10.6]. Consider now Bob's quest to identify large prime numbers. He takes his potentially prime number n and he runs the Miller–Rabin test on n for (say) 10 different values of a. If any a value is a Miller–Rabin witness for n, then Bob knows that n is composite. But suppose that none of his a values is a Miller–Rabin witness for n. Proposition 3.18 says that if n were composite, then each time Bob tries a value for a, he has at least a 75 % chance of getting a witness. Since Bob found no witnesses in 10 tries, it is reasonable2 to conclude that the probability of n being composite is at most (25 %)^10, which is approximately 10^{−6}. And if this is not good enough, Bob can use 100 different values of a, and if none of them proves n to be composite, then the probability that n is actually composite is less than (25 %)^100 ≈ 10^{−60}. Example 3.19. We illustrate the Miller–Rabin test with a = 2 and the number n = 561, which, you may recall, is a Carmichael number. We factor n − 1 = 560 = 2^4 · 35 and then compute 2^35 ≡ 263 (mod 561), 2^{2·35} ≡ 263^2 ≡ 166 (mod 561), 2^{4·35} ≡ 166^2 ≡ 67 (mod 561), 2^{8·35} ≡ 67^2 ≡ 1 (mod 561). The first number 2^35 mod 561 is neither 1 nor −1, and the other numbers in the list are not equal to −1, so 2 is a Miller–Rabin witness to the fact that 561 is composite. Example 3.20. We do a second example, taking n = 172947529 and n − 1 = 172947528 = 2^3 · 21618441. We apply the Miller–Rabin test with a = 17 and find that 17^21618441 ≡ 1 (mod 172947529). 2Unfortunately, although this deduction seems reasonable, it is not quite accurate. In the language of probability theory, we need to compute the conditional probability that n is composite given that the Miller–Rabin test fails 10 times; and we know the conditional probability that the Miller–Rabin test succeeds at least 75 % of the time if n is composite. See Sect.
5.3.2 for a discussion of conditional probabilities and Exercise 5.30 for a derivation of the correct formula, which says that the probability (25 %)^10 must be approximately multiplied by ln(n).
  • 159. 3.4. Primality Testing 133 Thus 17 is not a Miller–Rabin witness for n. Next we try a = 3, but unfortunately 3^21618441 ≡ −1 (mod 172947529), so 3 also fails to be a Miller–Rabin witness. At this point we might suspect that n is prime, but if we try another value, say a = 23, we find that 23^21618441 ≡ 40063806 (mod 172947529), 23^{2·21618441} ≡ 2257065 (mod 172947529), 23^{4·21618441} ≡ 1 (mod 172947529). Thus 23 is a Miller–Rabin witness and n is actually composite. In fact, n is a Carmichael number, but it's not so easy to factor (by hand). 3.4.1 The Distribution of the Set of Primes If Bob picks a number at random, what is the likelihood that it is prime? The answer is provided by one of number theory's most famous theorems. In order to state the theorem, we need a definition. Definition. For any number X, let π(X) = (# of primes p satisfying 2 ≤ p ≤ X). For example, π(10) = 4, since the primes between 2 and 10 are 2, 3, 5, and 7. Theorem 3.21 (The Prime Number Theorem). lim_{X→∞} π(X) / (X/ln(X)) = 1. Proof. The prime number theorem was proven independently by Hadamard and de la Vallée Poussin in 1896. The proof is unfortunately far beyond the scope of this book. The most direct proof uses complex analysis; see for example [7, Chapter 13]. Example 3.22. How many primes would we expect to find between 900000 and 1000000? The prime number theorem says that Number of primes between 900000 and 1000000 = π(1000000) − π(900000) ≈ 1000000/ln(1000000) − 900000/ln(900000) = 6737.62 . . . . In fact, it turns out that there are exactly 7224 primes between 900000 and 1000000. For cryptographic purposes, we need even larger primes. For example, we might want to use primes having approximately 300 decimal digits, or almost equivalently, primes that are 1024 bits in length, since 2^1024 ≈ 10^308.25.
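Example 3.22's exact count is easy to verify with a sieve of Eratosthenes, alongside the prime number theorem's estimate of about 6737.6:

```python
from math import log

def primes_up_to(X):
    """Sieve of Eratosthenes: return a list of all primes p <= X."""
    sieve = bytearray([1]) * (X + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(X ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, is_p in enumerate(sieve) if is_p]

primes = primes_up_to(1000000)
print(len([p for p in primes if p <= 10]))       # -> 4, i.e. pi(10) = 4
count = len([p for p in primes if p > 900000])   # primes between 900000 and 1000000
print(count)                                     # -> 7224
print(1000000 / log(1000000) - 900000 / log(900000))   # PNT estimate, ~6737.6
```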
  • 160. 134 3. Integer Factorization and RSA How many primes p satisfy 2^1023 < p < 2^1024? The prime number theorem gives us an answer: # of 1024 bit primes = π(2^1024) − π(2^1023) ≈ 2^1024/ln(2^1024) − 2^1023/ln(2^1023) ≈ 2^1013.53. So there should be lots of primes in this interval. Intuitively, the prime number theorem says that if we look at all of the numbers between 1 and X, then the proportion of them that are prime is approximately 1/ln(X). Turning this statement around, the prime number theorem says: A randomly chosen number N has probability 1/ln(N) of being prime. (3.10) Of course, taken at face value, statement (3.10) is utter nonsense. A chosen number either is prime or is not prime; it cannot be partially prime and partially composite! A better interpretation of (3.10) is that it describes how many primes one expects to find in an interval around N. See Exercise 3.19 for a more precise statement of (3.10) that is both meaningful and mathematically correct. Example 3.23. We illustrate statement (3.10) and the prime number theorem by searching for 1024-bit primes, i.e., primes that are approximately 2^1024. Statement (3.10) says that the probability that a random number N ≈ 2^1024 is prime is approximately 0.14 %. Thus on average, Bob checks about 700 randomly chosen numbers of this size before finding a prime. If he is clever, Bob can do better. He knows that he doesn't want a number that is even, nor does he want a number that is divisible by 3, nor divisible by 5, etc. Thus rather than choosing numbers completely at random, Bob might restrict attention (say) to numbers that are relatively prime to 2, 3, 5, 7 and 11. To do this, he first chooses a random number that is relatively prime to 2 · 3 · 5 · 7 · 11 = 2310, say he chooses 1139. Then he considers only numbers N of the form N = 2 · 3 · 5 · 7 · 11 · K + 1139 = 2310K + 1139.
(3.11) The probability that an N of this form is prime is approximately (see Exercise 3.20) (2/1) · (3/2) · (5/4) · (7/6) · (11/10) · 1/ln(N) ≈ 4.8/ln(N). So if Bob chooses a random number N of the form (3.11) with N ≈ 2^1024, then the probability that it is prime is approximately 0.67 %. Thus he only needs to check 150 numbers to have a good chance of finding a prime.
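The arithmetic in Example 3.23 is easy to reproduce (the exact values round slightly differently from the approximations quoted in the text):

```python
from math import log

lnN = 1024 * log(2)            # ln(N) for N ~ 2^1024

p_random = 1 / lnN             # (3.10): probability a random N ~ 2^1024 is prime
print(f"{p_random:.2%}")       # -> 0.14%

# Restricting to N = 2310*K + 1139 removes the multiples of 2, 3, 5, 7, 11,
# boosting the density by (2/1)(3/2)(5/4)(7/6)(11/10) = 4.8125.
boost = (2 / 1) * (3 / 2) * (5 / 4) * (7 / 6) * (11 / 10)
p_sieved = boost / lnN
print(f"{p_sieved:.2%}")       # -> 0.68% (the text rounds 4.8/ln(N) to 0.67 %)

print(round(1 / p_sieved))     # -> 147 numbers to test on average ("about 150")
```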
  • 161. 3.4. Primality Testing 135 We used the Miller–Rabin test with 100 randomly chosen values of a to check the primality of 2310K + 1139 for each 2^1013 ≤ K ≤ 2^1013 + 1000. We found that 2310(2^1013 + J) + 1139 is probably prime for the following 12 values of J: J ∈ {41, 148, 193, 251, 471, 585, 606, 821, 851, 865, 910, 911}. This is a bit better than the 7 values predicted by the prime number theorem. The smallest probable prime that we found is 2310 · (2^1013 + 41) + 1139, which is equal to the following 308 digit number: 202767145582614733733139403843879254621949551824058993311339593493341055229837512127224893854863968851947003448487753250093654475567042186503162873426359974273751871978241831537235413710389881550750303525056818030281312537212445925881220354174468221605146327969430834440565497127875070636801598203824198219369. Remark 3.24. There are many deep open questions concerning the distribution of prime numbers, of which the most important and famous is certainly the Riemann hypothesis.3 The usual way to state the Riemann hypothesis requires some complex analysis. The Riemann zeta function ζ(s) is defined by the series ζ(s) = Σ_{n=1}^{∞} 1/n^s, which converges when s is a complex number with real part greater than 1. It has an analytic continuation to the entire complex plane with a simple pole at s = 1 and no other poles. The Riemann hypothesis says that if ζ(σ + it) = 0 with σ and t real and 0 ≤ σ ≤ 1, then in fact σ = 1/2. At first glance, this somewhat bizarre statement appears to have little relation to prime numbers. However, it is not hard to show that ζ(s) is also equal to the product ζ(s) = ∏_{p prime} (1 − 1/p^s)^{−1},
  • 162. so ζ(s) incorporates information about the set of prime numbers. There are many statements about prime numbers that are equivalent to the Riemann hypothesis. For example, recall that the prime number theorem (Theorem 3.21) says that π(X) is approximately equal to X/ln(X) for large values of X. The Riemann hypothesis is equivalent to the following more accurate statement: π(X) = ∫_2^X dt/ln(t) + O(√X · ln(X)). (3.12) This conjectural formula is stronger than the prime number theorem, since the integral is approximately equal to X/ln(X). (See Exercise 3.21.) 3The Riemann hypothesis is another of the $1,000,000 Millennium Prize problems.
  • 163. 136 3. Integer Factorization and RSA 3.4.2 Primality Proofs Versus Probabilistic Tests The Miller–Rabin test is a powerful and practical method for finding large numbers that are "probably prime." Indeed, Proposition 3.18 says that every composite number has many Miller–Rabin witnesses, so 50 or 100 repetitions of the Miller–Rabin test provide solid evidence that n is prime. However, there is a difference between evidence for a statement and a rigorous proof that the statement is correct. Suppose that Bob is not satisfied with mere evidence. He wants to be completely certain that his chosen number n is prime. In principle, nothing could be simpler. Bob checks to see whether n is divisible by any of the numbers 1, 2, 3, 4, . . . up to √n. If none of these numbers divides n, then Bob knows, with complete certainty, that n is prime. Unfortunately, if n is large, say n ≈ 2^1000, then the sun will have burnt out before Bob finishes his task. Notice that the running time of this naive algorithm is O(√n), which means that it is an exponential-time algorithm according to the definition in Sect. 2.6, since √n is exponential in the number of bits required to write down the number n. It would be nice if we could use the Miller–Rabin test to efficiently and conclusively prove that a number n is prime. More precisely, we would like a polynomial-time algorithm that proves primality. If a generalized version of the Riemann hypothesis is true, then the following proposition says that this can be done. (We discussed the Riemann hypothesis in Remark 3.24.) Proposition 3.25. If a generalized version of the Riemann hypothesis is true, then every composite number n has a Miller–Rabin witness a for its compositeness satisfying a ≤ 2(ln n)^2. Proof. See [87] for a proof that every composite number n has a witness satisfying a = O((ln n)^2), and [9, 10] for the more precise estimate a ≤ 2(ln n)^2.
Thus if the generalized Riemann hypothesis is true, then we can prove that n is prime by applying the Miller–Rabin test using every a smaller than 2(ln n)^2. If some a proves that n is composite, then n is composite, and otherwise, Proposition 3.25 tells us that n is prime. Unfortunately, the proof of Proposition 3.25 assumes that the generalized Riemann hypothesis is true, and no one has yet been able to prove even the original Riemann hypothesis, despite almost 150 years of work on the problem. After the creation of public key cryptography, and especially after the publication of the RSA cryptosystem in 1978, it became of great interest to find a polynomial-time primality test that did not depend on any unproven hypotheses. Many years of research culminated in 2002, when M. Agrawal, N. Kayal, and N. Saxena [1] found such an algorithm. Subsequent improvements to their algorithm have given the following result.
  • 164. 3.5. Pollard’s p − 1 Factorization Algorithm 137 Theorem 3.26 (AKS Primality Test). For every 0, there is an algorithm that conclusively determines whether a given number n is prime in no more than O (ln n)6+ steps. Proof. The original algorithm was published in [1]. Further analysis and refine- ments may be found in [76]. The monograph [36] contains a nice description of primality testing, including the AKS test. Remark 3.27. The result described in Theorem 3.26 represents a triumph of modern algorithmic number theory. The significance for practical cryptogra- phy is less clear, since the AKS algorithm is much slower than the Miller– Rabin test. In practice, most people are willing to accept that a number is prime if it passes the Miller–Rabin test for (say) 50–100 randomly chosen values of a. 3.5 Pollard’s p − 1 Factorization Algorithm We saw in Sect. 3.4 that it is relatively easy to check whether a large number is (probably) prime. This is good, since the RSA cryptosystem needs large primes in order to operate. Conversely, the security of RSA relies on the apparent difficulty of factor- ing large numbers. The study of factorization dates back at least to ancient Greece, but it was only with the advent of computers that people started to develop algorithms capable of factoring very large numbers. The paradox of RSA is that in order to make RSA more efficient, we want to use a modu- lus N = pq that is as small as possible. On the other hand, if an opponent can factor N, then our encrypted messages are not secure. It is thus vital to understand how hard it is to factor large numbers, and in particular, to understand the capabilities of the different algorithms that are currently used for factorization. In the next few sections we discuss, with varying degrees of detail, some of the known methods for factoring large integers. A further method using elliptic curves is described in Sect. 6.6. 
Those readers interested in pursuing this subject might consult [28, 34, 109, 150] and the references cited in those works. We begin with an algorithm called Pollard’s p − 1 method. Although not useful for all numbers, there are certain types of numbers for which it is quite efficient. Pollard’s method demonstrates that there are insecure RSA moduli that at first glance appear to be secure. This alone warrants the study of Pollard’s method. In addition, the p − 1 method provides the inspiration for Lenstra’s elliptic curve factorization method, which we study later, in Sect. 6.6. We are presented with a number N = pq and our task is to determine the prime factors p and q. Suppose that by luck or hard work or some other method, we manage to find an integer L with the property that
  • 165. 138 3. Integer Factorization and RSA p − 1 divides L and q − 1 does not divide L. This means that there are integers i, j, and k with k ≠ 0 satisfying L = i(p − 1) and L = j(q − 1) + k. Consider what happens if we take a randomly chosen integer a and compute a^L. Fermat's little theorem (Theorem 1.24) tells us that4 a^L = a^{i(p−1)} = (a^{p−1})^i ≡ 1^i ≡ 1 (mod p), a^L = a^{j(q−1)+k} = a^k (a^{q−1})^j ≡ a^k · 1^j ≡ a^k (mod q). The exponent k is not equal to 0, so it is quite unlikely that a^k will be congruent to 1 modulo q. Thus with very high probability, i.e., for most choices of a, we find that p divides a^L − 1 and q does not divide a^L − 1. But this is wonderful, since it means that we can recover p via the simple gcd computation p = gcd(a^L − 1, N). This is all well and good, but where, you may ask, can we find an exponent L that is divisible by p − 1 and not by q − 1? Pollard's observation is that if p − 1 happens to be a product of many small primes, then it will divide n! for some not-too-large value of n. So here is the idea. For each number n = 2, 3, 4, . . . we choose a value of a and compute gcd(a^{n!} − 1, N). (In practice, we might simply take a = 2.) If the gcd is equal to 1, then we go on to the next value of n. If the gcd ever equals N, then we've been quite unlucky, but a different a value will probably work. And if we get a number strictly between 1 and N, then we have a nontrivial factor of N and we're done. Remark 3.28. There are two important remarks to make before we put Pollard's idea into practice. The first concerns the quantity a^{n!} − 1. Even for a = 2 and quite moderate values of n, say n = 100, it is not feasible to compute a^{n!} − 1 exactly. Indeed, the number 2^{100!} has more than 10^{157} digits, which is larger than the number of elementary particles in the known universe! Luckily, there is no need to compute it exactly. We are interested only in the greatest common divisor of a^{n!} − 1 and N, so it suffices to compute a^{n!}
− 1 (mod N). 4We have assumed that p ∤ a and q ∤ a, since if p and q are very large, this will almost certainly be the case. Further, if by some chance p | a and q ∤ a, then we can recover p as p = gcd(a, N).
  • 166. 3.5. Pollard’s p − 1 Factorization Algorithm 139 and then take the gcd with N. Thus we never need to work with numbers larger than N. Second, we do not even need to compute the exponent n!. Instead, assum- ing that we have already computed an! mod N in the previous step, we can compute the next value as a(n+1)! ≡ an! n+1 (mod N). This leads to the algorithm described in Table 3.3. Remark 3.29. How long does it take to compute the value of an! mod N? The fast exponentiation algorithm described in Sect. 1.3.2 gives a method for com- puting ak mod N in at most 2 log2 k steps, where each step is a multiplication modulo N. Stirling’s formula5 says that if n is large, then n! is approximately equal to (n/e)n . So we can compute an! mod N in 2n log2(n) steps. Thus it is feasible to compute an! mod N for reasonably large values of n. Input. Integer N to be factored. 1. Set a = 2 (or some other convenient value). 2. Loop j = 2, 3, 4, . . . up to a specified bound. 3. Set a = aj mod N. 4. Compute d = gcd(a − 1, N)† . 5. If 1 d N then success, return d. 6. Increment j and loop again at Step 2. † For added efficiency, choose an appropriate k and compute the gcd in Step 4 only every kth iteration. Table 3.3: Pollard’s p − 1 factorization algorithm Example 3.30. We use Pollard’s p−1 method to factor N = 13927189. Starting with gcd(29! − 1, N) and taking successively larger factorials in the exponent, we find that 29! − 1 ≡ 13867883 (mod 13927189), gcd(29! − 1, 13927189) = 1, 210! − 1 ≡ 5129508 (mod 13927189), gcd(210! − 1, 13927189) = 1, 211! − 1 ≡ 4405233 (mod 13927189), gcd(211! − 1, 13927189) = 1, 212! − 1 ≡ 6680550 (mod 13927189), gcd(212! − 1, 13927189) = 1, 213! − 1 ≡ 6161077 (mod 13927189), gcd(213! − 1, 13927189) = 1, 214! − 1 ≡ 879290 (mod 13927189), gcd(214! − 1, 13927189) = 3823. The final line gives us a nontrivial factor p = 3823 of N. This factor is prime, and the other factor q = N/p = 13927189/3823 = 3643 is also prime. 
⁵Stirling’s formula says more precisely that ln(n!) = n ln(n) − n + (1/2) ln(2πn) + O(1/n).
140 3. Integer Factorization and RSA

The reason that an exponent of 14! worked in this instance is that p − 1 factors into a product of small primes,
p − 1 = 3822 = 2 · 3 · 7² · 13.
The other factor satisfies q − 1 = 3642 = 2 · 3 · 607, which is not a product of small primes.

Example 3.31. We present one further example using larger numbers. Let N = 168441398857. Then
2^{50!} − 1 ≡ 114787431143 (mod N), gcd(2^{50!} − 1, N) = 1,
2^{51!} − 1 ≡ 36475745067 (mod N), gcd(2^{51!} − 1, N) = 1,
2^{52!} − 1 ≡ 67210629098 (mod N), gcd(2^{52!} − 1, N) = 1,
2^{53!} − 1 ≡ 8182353513 (mod N), gcd(2^{53!} − 1, N) = 350437.
So using 2^{53!} − 1 yields the prime factor p = 350437 of N, and the other (prime) factor is 480661. We were lucky, of course, that p − 1 is a product of small factors,
p − 1 = 350436 = 2² · 3 · 19 · 29 · 53.

Remark 3.32. Notice that it is easy for Bob and Alice to avoid the dangers of Pollard’s p − 1 method when creating RSA keys. They simply check that their chosen secret primes p and q have the property that neither p − 1 nor q − 1 factors entirely into small primes. From a cryptographic perspective, the importance of Pollard’s method lies in the following lesson. Most people would not expect, at first glance, that factorization properties of p − 1 and q − 1 should have anything to do with the difficulty of factoring pq. The moral is that even if we build a cryptosystem based on a seemingly hard problem such as integer factorization, we must be wary of special cases of the problem that, for subtle and nonobvious reasons, are easier to solve than the general case. We have already seen an example of this in the Pohlig–Hellman algorithm for the discrete logarithm problem (Sect. 2.9), and we will see it again later when we discuss elliptic curves and the elliptic curve discrete logarithm problem.

Remark 3.33. We have not yet discussed the likelihood that Pollard’s p − 1 algorithm succeeds. Suppose that p and q are randomly chosen primes of about the same size.
Pollard’s method works if at least one of p − 1 or q − 1 factors entirely into a product of small prime powers. Clearly p − 1 is even, so we can pull off a factor of 2, but after that, the quantity (p − 1)/2 should behave more or less like a random number of size approximately p/2. This leads to the following question: What is the probability that a randomly chosen integer of size approximately n divides B! (B-factorial)? Notice in particular that if n divides B!, then every prime ℓ dividing n must satisfy ℓ ≤ B. A number whose prime factors are all less than or equal to B
3.6. Factorization via Difference of Squares 141

is called a B-smooth number. It is thus natural to ask for the probability that a randomly chosen integer of size approximately n is a B-smooth number. Turning this question around, we can also ask: Given n, how large should we choose B so that a randomly chosen integer of size approximately n has a reasonably good probability of being a B-smooth number? The efficiency (or lack thereof) of all modern methods of integer factorization is largely determined by the answer to this question. We study smooth numbers in Sect. 3.7.

3.6 Factorization via Difference of Squares

The most powerful factorization methods known today rely on one of the simplest identities in all of mathematics,
X² − Y² = (X + Y)(X − Y). (3.13)
This beautiful formula says that a difference of squares is equal to a product. The potential applicability to factorization is immediate. In order to factor a number N, we look for an integer b such that the quantity N + b² is a perfect square, say equal to a². Then N + b² = a², so
N = a² − b² = (a + b)(a − b),
and we have effected a factorization of N.

Example 3.34. We factor N = 25217 by looking for an integer b making N + b² a perfect square:
25217 + 1² = 25218 not a square,
25217 + 2² = 25221 not a square,
25217 + 3² = 25226 not a square,
25217 + 4² = 25233 not a square,
25217 + 5² = 25242 not a square,
25217 + 6² = 25253 not a square,
25217 + 7² = 25266 not a square,
25217 + 8² = 25281 = 159² Eureka! ** square **.
Then we compute
25217 = 159² − 8² = (159 + 8)(159 − 8) = 167 · 151.
142 3. Integer Factorization and RSA

If N is large, then it is unlikely that a randomly chosen value of b will make N + b² into a perfect square. We need to find a clever way to select b. An important observation is that we don’t necessarily need to write N itself as a difference of two squares. It often suffices to write some multiple kN of N as a difference of two squares, since if
kN = a² − b² = (a + b)(a − b),
then there is a reasonable chance that the factors of N are separated by the right-hand side of the equation, i.e., that N has a nontrivial factor in common with each of a + b and a − b. It is then a simple matter to recover the factors by computing gcd(N, a + b) and gcd(N, a − b). We illustrate with an example.

Example 3.35. We factor N = 203299. If we make a list of N + b² for values of b = 1, 2, 3, . . ., say up to b = 100, we do not find any square values. So next we try listing the values of 3N + b², and we find
3 · 203299 + 1² = 609898 not a square,
3 · 203299 + 2² = 609901 not a square,
3 · 203299 + 3² = 609906 not a square,
3 · 203299 + 4² = 609913 not a square,
3 · 203299 + 5² = 609922 not a square,
3 · 203299 + 6² = 609933 not a square,
3 · 203299 + 7² = 609946 not a square,
3 · 203299 + 8² = 609961 = 781² Eureka! ** square **.
Thus
3 · 203299 = 781² − 8² = (781 + 8)(781 − 8) = 789 · 773,
so when we compute
gcd(203299, 789) = 263 and gcd(203299, 773) = 773,
we find nontrivial factors of N. The numbers 263 and 773 are prime, so the full factorization of N is 203299 = 263 · 773.

Remark 3.36. In Example 3.35, we made a list of values of 3N + b². Why didn’t we try 2N + b² first? The answer is that if N is odd, then 2N + b² can never be a square, so it would have been a waste of time to try it. The reason that 2N + b² can never be a square is as follows (cf. Exercise 1.23). We compute modulo 4,
2N + b² ≡ 2 + b² ≡ 2 + 0 ≡ 2 (mod 4) if b is even,
2N + b² ≡ 2 + b² ≡ 2 + 1 ≡ 3 (mod 4) if b is odd.
Thus 2N + b² is congruent to either 2 or 3 modulo 4.
But squares are congruent to either 0 or 1 modulo 4. Hence if N is odd, then 2N + b² is never a square.
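Both examples can be checked with a few lines of code. The sketch below (the function name and the search bound are ours, not from the text) tries b = 1, 2, 3, . . . until kN + b² is a perfect square:

```python
from math import gcd, isqrt

def difference_of_squares(N, k=1, b_max=100):
    """Search for b with k*N + b^2 a perfect square a^2; then
    k*N = (a+b)(a-b), and gcd(N, a-b), gcd(N, a+b) may split N."""
    for b in range(1, b_max + 1):
        target = k * N + b * b
        a = isqrt(target)
        if a * a == target:               # Eureka! a square
            return gcd(N, a - b), gcd(N, a + b)
    return None                           # no square found in range

print(difference_of_squares(25217))        # (151, 167)  Example 3.34
print(difference_of_squares(203299, k=3))  # (773, 263)  Example 3.35
```

Using `isqrt` (integer square root) avoids floating-point rounding problems that `N ** 0.5` would cause for large N.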
3.6. Factorization via Difference of Squares 143

The multiples of N are the numbers that are congruent to 0 modulo N, so rather than searching for a difference of squares a² − b² that is a multiple of N, we may instead search for distinct numbers a and b satisfying
a² ≡ b² (mod N). (3.14)
This is exactly the same problem, of course, but the use of modular arithmetic helps to clarify our task. In practice it is not feasible to search directly for integers a and b satisfying (3.14). Instead we use a three-step process as described in Table 3.4. This procedure, in one form or another, underlies most modern methods of factorization.

1. Relation Building: Find many integers a₁, a₂, a₃, . . . , a_r with the property that the quantity c_i ≡ a_i² (mod N) factors as a product of small primes.
2. Elimination: Take a product c_{i1} c_{i2} · · · c_{is} of some of the c_i’s so that every prime appearing in the product appears to an even power. Then c_{i1} c_{i2} · · · c_{is} = b² is a perfect square.
3. GCD Computation: Let a = a_{i1} a_{i2} · · · a_{is} and compute the greatest common divisor
d = gcd(N, a − b).
Since
a² = (a_{i1} a_{i2} · · · a_{is})² ≡ a_{i1}² a_{i2}² · · · a_{is}² ≡ c_{i1} c_{i2} · · · c_{is} ≡ b² (mod N),
there is a reasonable chance that d is a nontrivial factor of N.
Table 3.4: A three step factorization procedure

Example 3.37. We factor N = 914387 using the procedure described in Table 3.4. We first search for integers a with the property that a² mod N is a product of small primes. For this example, we ask that each a² mod N be a product of primes in the set {2, 3, 5, 7, 11}. Ignoring for now the question of how to find such a, we observe that
1869² ≡ 750000 (mod 914387) and 750000 = 2⁴ · 3 · 5⁶,
1909² ≡ 901120 (mod 914387) and 901120 = 2¹⁴ · 5 · 11,
3387² ≡ 499125 (mod 914387) and 499125 = 3 · 5³ · 11³.
None of the numbers on the right is a square, but if we multiply them together, then we do get a square.
Thus
1869² · 1909² · 3387² ≡ 750000 · 901120 · 499125 (mod 914387)
≡ (2⁴ · 3 · 5⁶)(2¹⁴ · 5 · 11)(3 · 5³ · 11³) (mod 914387)
144 3. Integer Factorization and RSA

= (2⁹ · 3 · 5⁵ · 11²)² = 580800000² ≡ 164255² (mod 914387).
We further note that 1869 · 1909 · 3387 ≡ 9835 (mod 914387), so we compute
gcd(914387, 9835 − 164255) = gcd(914387, 154420) = 1103.
Hooray! We have factored 914387 = 1103 · 829.

Example 3.38. We do a second example to illustrate a potential pitfall in this method. We will factor N = 636683. After some searching, we find
1387² ≡ 13720 (mod 636683) and 13720 = 2³ · 5 · 7³,
2774² ≡ 54880 (mod 636683) and 54880 = 2⁵ · 5 · 7³.
Multiplying these two values gives a square,
1387² · 2774² ≡ 13720 · 54880 = (2⁴ · 5 · 7³)² = 27440².
Unfortunately, when we compute the gcd, we find that
gcd(636683, 1387 · 2774 − 27440) = gcd(636683, 3820098) = 636683.
Thus after all our work, we have made no progress! However, all is not lost. We can gather more values of a and try to find a different relation. Extending the above list, we discover that
3359² ≡ 459270 (mod 636683) and 459270 = 2 · 3⁸ · 5 · 7.
Multiplying 1387² and 3359² gives
1387² · 3359² ≡ 13720 · 459270 = (2² · 3⁴ · 5 · 7²)² = 79380²,
and now when we compute the gcd, we obtain
gcd(636683, 1387 · 3359 − 79380) = gcd(636683, 4579553) = 787.
This gives the factorization N = 787 · 809.

Remark 3.39. How many solutions to a² ≡ b² (mod N) are we likely to try before we find a factor of N? The most difficult case occurs when N = pq is a product of two primes that are of roughly the same size. (This is because the smallest prime factor is O(√N), while in any other case the smallest prime factor will be O(N^α), with α < 1/2. As α decreases, the difficulty of factoring N decreases.) Suppose that we can find more or less random values of a and b satisfying a² ≡ b² (mod N). What are our chances of finding a nontrivial factor of N when we compute gcd(a − b, N)? We know that
3.6. Factorization via Difference of Squares 145

(a − b)(a + b) = a² − b² = kN = kpq
for some value of k. The prime p must divide at least one of a − b and a + b, and it has approximately equal probability of dividing each. Similarly for q. We win if a − b is divisible by exactly one of p and q, which happens approximately 50 % of the time. Hence if we can actually generate random a’s and b’s satisfying a² ≡ b² (mod N), then it won’t take us long to find a factor of N. Of course this leaves us with the question of just how hard it is to find these a’s and b’s. Having given a taste of the process through several examples, we now do a more systematic analysis.

The factorization procedure described in Table 3.4 consists of three steps:
1. Relation Building
2. Elimination
3. GCD Computation
There is really nothing to say about Step 3, since the Euclidean algorithm (Theorem 1.7) tells us how to efficiently compute gcd(N, a − b) in O(ln N) steps. On the other hand, there is so much to say about relation building that we postpone our discussion until Sect. 3.7. Finally, what of Step 2, the elimination step?

We suppose that each of the numbers a₁, . . . , a_r found in Step 1 has the property that c_i ≡ a_i² (mod N) factors into a product of small primes, say that each c_i is a product of primes chosen from the set of the first t primes {p₁, p₂, p₃, . . . , p_t}. This means that there are exponents e_ij such that
c₁ = p₁^{e11} p₂^{e12} p₃^{e13} · · · p_t^{e1t},
c₂ = p₁^{e21} p₂^{e22} p₃^{e23} · · · p_t^{e2t},
. . .
c_r = p₁^{er1} p₂^{er2} p₃^{er3} · · · p_t^{ert}.
Our goal is to take a product of some of the c_i’s in order to make each prime on the right-hand side of the equation appear to an even power. In other words, our problem reduces to finding u₁, u₂, . . . , u_r ∈ {0, 1} such that
c₁^{u1} · c₂^{u2} · · · c_r^{ur} is a perfect square.
Here we take u_i = 1 if we want to include c_i in the product, and we take u_i = 0 if we do not want to include c_i in the product.
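Steps 2 and 3 of Table 3.4 can be verified numerically for Example 3.37. The sketch below (variable names are ours) multiplies the three relations, checks that the product of the c_i really is a perfect square, and computes the gcd:

```python
from math import gcd, isqrt

N = 914387
a_list = [1869, 1909, 3387]            # the a_i from Example 3.37
c_list = [a * a % N for a in a_list]   # c_i = a_i^2 mod N

prod_c = 1
for c in c_list:
    prod_c *= c                        # c_1 * c_2 * c_3 (not reduced mod N)
b = isqrt(prod_c)
assert b * b == prod_c                 # the product really is b^2

a = 1
for x in a_list:
    a = a * x % N                      # a = a_1 a_2 a_3 mod N
d = gcd(a - b, N)                      # math.gcd ignores the sign
print(d, N // d)                       # 1103 829
```

Note that Python’s `math.gcd` works with the absolute value of its arguments, so there is no need to reduce a − b modulo N before taking the gcd.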
Writing out the product in terms of the prime factorizations of c₁, . . . , c_r gives the rather messy expression
c₁^{u1} · c₂^{u2} · · · c_r^{ur} = (p₁^{e11} p₂^{e12} p₃^{e13} · · · p_t^{e1t})^{u1} · (p₁^{e21} p₂^{e22} p₃^{e23} · · · p_t^{e2t})^{u2} · · · (p₁^{er1} p₂^{er2} p₃^{er3} · · · p_t^{ert})^{ur}
146 3. Integer Factorization and RSA

= p₁^{e11·u1 + e21·u2 + · · · + er1·ur} · p₂^{e12·u1 + e22·u2 + · · · + er2·ur} · · · p_t^{e1t·u1 + e2t·u2 + · · · + ert·ur}. (3.15)
You may find this clearer if it is written using summation and product notation,
∏_{i=1}^{r} c_i^{u_i} = ∏_{j=1}^{t} p_j^{∑_{i=1}^{r} e_ij · u_i}. (3.16)
In any case, our goal is to choose u₁, . . . , u_r such that all of the exponents in (3.15), or equivalently in (3.16), are even. To recapitulate, we are given integers
e11, e12, . . . , e1t, e21, e22, . . . , e2t, . . . , er1, er2, . . . , ert,
and we are searching for integers u₁, u₂, . . . , u_r such that
e11·u1 + e21·u2 + · · · + er1·ur ≡ 0 (mod 2),
e12·u1 + e22·u2 + · · · + er2·ur ≡ 0 (mod 2),
. . .
e1t·u1 + e2t·u2 + · · · + ert·ur ≡ 0 (mod 2). (3.17)
You have undoubtedly recognized that the system of congruences (3.17) is simply a system of linear equations over the finite field F₂. Hence standard techniques from linear algebra, such as Gaussian elimination, can be used to solve these equations. In fact, doing linear algebra in the field F₂ is much easier than doing linear algebra in the field R, since there is no need to worry about round-off errors.

Example 3.40. We illustrate the linear algebra elimination step by factoring the number N = 9788111. We look for numbers a with the property that a² mod N is 50-smooth, i.e., numbers a such that a² mod N is equal to a product of primes in the set
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47}.
The top part of Table 3.5 lists the 20 numbers a₁, a₂, . . . , a₂₀ between 3129 and 4700 having this property,⁶ together with the factorization of each c_i ≡ a_i² (mod N). The bottom part of Table 3.5 translates the requirement that a product c₁^{u1} c₂^{u2} · · · c₂₀^{u20} be a square into a system of linear equations for (u₁, u₂, . . . , u₂₀) as described by (3.17). For notational convenience, we have written the system of linear equations in Table 3.5 in matrix form.

⁶Why do we start with a = 3129?
The answer is that unless a² is larger than N, there is no reduction modulo N in a² mod N, so we cannot hope to gain any information. The value 3129 comes from the fact that √N = √9788111 ≈ 3128.6.
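All three steps of Table 3.4 can be combined into one short program. The sketch below is our own code, not from the text: it gathers relations by trial division over a factor base (Step 1), performs the elimination over F₂ with bitmask Gaussian elimination (Step 2), and tries the gcd whenever a dependency appears (Step 3). Applied to N = 636683 from Example 3.38, it recovers a nontrivial factor:

```python
from math import gcd, isqrt
from itertools import count

def smooth_exponents(n, primes):
    """Trial-divide n over the factor base; return the exponent
    vector if n is smooth over `primes`, else None."""
    exps = []
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
    return exps if n == 1 else None

def three_step_factor(N, primes):
    rels = []    # list of (a_i, exponent vector of c_i)
    basis = {}   # pivot bit -> (parity mask, relation-subset mask)
    for a in count(isqrt(N) + 1):
        c = a * a % N
        if c == 0:
            continue
        v = smooth_exponents(c, primes)
        if v is None:
            continue                      # c_i not smooth; keep searching
        rels.append((a, v))
        mask = sum(1 << j for j, e in enumerate(v) if e % 2)
        track = 1 << (len(rels) - 1)
        while mask:                       # reduce the parity vector
            pivot = mask & -mask
            if pivot not in basis:
                basis[pivot] = (mask, track)
                break
            bm, bt = basis[pivot]
            mask ^= bm
            track ^= bt
        else:
            # mask is zero: the subset marked by `track` multiplies
            # to a square c_{i1} ... c_{is} = b^2
            aa, exps = 1, [0] * len(primes)
            for i, (ai, vi) in enumerate(rels):
                if track >> i & 1:
                    aa = aa * ai % N
                    exps = [x + y for x, y in zip(exps, vi)]
            bb = 1
            for p, e in zip(primes, exps):
                bb = bb * pow(p, e // 2, N) % N
            d = gcd(aa - bb, N)
            if 1 < d < N:
                return d                  # nontrivial factor found

print(three_step_factor(636683, [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]))  # 787
```

The bitmask trick stores each row of the F₂ system as an integer, so row reduction is a single XOR; the `track` integer records which relations were combined, which is exactly the vector (u₁, . . . , u_r).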
3.6. Factorization via Difference of Squares 147

The next step is to solve the system of linear equations in Table 3.5. This can be done by standard Gaussian elimination, always keeping in mind that all computations are done modulo 2. The set of solutions turns out to be an F₂-vector space of dimension 8. A basis for the set of solutions is given by the following 8 vectors, where we have written the vectors horizontally, rather than vertically, in order to save space:
v1 = (0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
v2 = (0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
v3 = (0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
v4 = (1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
v5 = (1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
v6 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0),
v7 = (1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
v8 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1).
Each of the vectors v1, . . . , v8 gives a congruence a² ≡ b² (mod N) that has the potential to provide a factorization of N. For example, v1 says that if we multiply the 3rd, 5th, and 9th numbers in the list at the top of Table 3.5, we will get a square, and indeed we find that
3131² · 3174² · 3481² ≡ (2 · 5² · 7 · 43)(5 · 11³ · 43)(2 · 5³ · 7 · 11³) (mod 9788111)
= (2 · 5³ · 7 · 11³ · 43)² = 100157750².
Next we compute
gcd(9788111, 3131 · 3174 · 3481 − 100157750) = 9788111,
which gives back the original number N. This is unfortunate, but all is not lost, since we have seven more independent solutions to our system of linear equations. Trying each of them in turn, we list the results in Table 3.6. Seven of the eight solutions to the system of linear equations yield no useful information about N, the resulting gcd being either 1 or N. However, one solution, listed in the penultimate box of Table 3.6, leads to a nontrivial factorization of N.
Thus 2741 is a factor of N, and dividing by it we obtain N = 9788111 = 2741 · 3571. Since both 2741 and 3571 are prime, this gives the complete factorization of N.

Remark 3.41. In order to factor a large number N, it may be necessary to use a set {p₁, p₂, p₃, . . . , p_t} containing hundreds of thousands, or even millions, of primes. Then the system (3.17) contains millions of linear equations, and even working in the field F₂, it can be very difficult to solve general systems
148 3. Integer Factorization and RSA

3129² ≡ 2530 (mod 9788111) and 2530 = 2 · 5 · 11 · 23
3130² ≡ 8789 (mod 9788111) and 8789 = 11 · 17 · 47
3131² ≡ 15050 (mod 9788111) and 15050 = 2 · 5² · 7 · 43
3166² ≡ 235445 (mod 9788111) and 235445 = 5 · 7² · 31²
3174² ≡ 286165 (mod 9788111) and 286165 = 5 · 11³ · 43
3215² ≡ 548114 (mod 9788111) and 548114 = 2 · 7³ · 17 · 47
3313² ≡ 1187858 (mod 9788111) and 1187858 = 2 · 7² · 17 · 23 · 31
3449² ≡ 2107490 (mod 9788111) and 2107490 = 2 · 5 · 7² · 11 · 17 · 23
3481² ≡ 2329250 (mod 9788111) and 2329250 = 2 · 5³ · 7 · 11³
3561² ≡ 2892610 (mod 9788111) and 2892610 = 2 · 5 · 7 · 31² · 43
4394² ≡ 9519125 (mod 9788111) and 9519125 = 5³ · 7 · 11 · 23 · 43
4425² ≡ 4403 (mod 9788111) and 4403 = 7 · 17 · 37
4426² ≡ 13254 (mod 9788111) and 13254 = 2 · 3 · 47²
4432² ≡ 66402 (mod 9788111) and 66402 = 2 · 3² · 7 · 17 · 31
4442² ≡ 155142 (mod 9788111) and 155142 = 2 · 3³ · 13² · 17
4468² ≡ 386802 (mod 9788111) and 386802 = 2 · 3³ · 13 · 19 · 29
4551² ≡ 1135379 (mod 9788111) and 1135379 = 7² · 17 · 29 · 47
4595² ≡ 1537803 (mod 9788111) and 1537803 = 3² · 17 · 19 · 23²
4651² ≡ 2055579 (mod 9788111) and 2055579 = 3 · 23 · 31³
4684² ≡ 2363634 (mod 9788111) and 2363634 = 2 · 3³ · 7 · 13² · 37
Relation gathering step

The coefficient matrix of (3.17), one row for each prime 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47 (exponents reduced modulo 2) and one column for each relation:

⎛ 1 0 1 0 0 1 1 1 1 1 0 0 1 1 1 1 0 0 0 1 ⎞
⎜ 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 ⎟
⎜ 1 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 ⎟
⎜ 0 0 1 0 0 1 0 0 1 1 1 1 0 1 0 0 0 0 0 1 ⎟
⎜ 1 1 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 ⎟
⎜ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 ⎟
⎜ 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1 1 0 0 ⎟
⎜ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 ⎟
⎜ 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 ⎟
⎜ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 ⎟
⎜ 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 ⎟
⎜ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ⎟
⎜ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ⎟
⎜ 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 ⎟
⎝ 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 ⎠
(u1, u2, . . . , u20)ᵗ ≡ (0, 0, . . . , 0)ᵗ (mod 2)
Linear algebra elimination step

Table 3.5: Factorization of N = 9788111
3.6. Factorization via Difference of Squares 149

v1 = (0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
3131² · 3174² · 3481² ≡ (2 · 5³ · 7 · 11³ · 43)² = 100157750²
gcd(9788111, 3131 · 3174 · 3481 − 100157750) = 9788111

v2 = (0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
3130² · 3131² · 3166² · 3174² · 3215² ≡ (2 · 5² · 7³ · 11² · 17 · 31 · 43 · 47)² = 2210173785050²
gcd(9788111, 3130 · 3131 · 3166 · 3174 · 3215 − 2210173785050) = 1

v3 = (0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
3131² · 3166² · 3561² ≡ (2 · 5² · 7² · 31² · 43)² = 101241350²
gcd(9788111, 3131 · 3166 · 3561 − 101241350) = 9788111

v4 = (1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
3129² · 3131² · 4394² ≡ (2 · 5³ · 7 · 11 · 23 · 43)² = 19038250²
gcd(9788111, 3129 · 3131 · 4394 − 19038250) = 9788111

v5 = (1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
3129² · 3131² · 3174² · 3313² · 4432² ≡ (2² · 3 · 5² · 7² · 11² · 17 · 23 · 31 · 43)² = 927063776100²
gcd(9788111, 3129 · 3131 · 3174 · 3313 · 4432 − 927063776100) = 1

v6 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0)
3129² · 3449² · 4426² · 4442² ≡ (2² · 3² · 5 · 7 · 11 · 13 · 17 · 23 · 47)² = 3311167860²
gcd(9788111, 3129 · 3449 · 4426 · 4442 − 3311167860) = 1

v7 = (1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
3129² · 3313² · 3449² · 4426² · 4651² ≡ (2² · 3 · 5 · 7² · 11 · 17 · 23² · 31² · 47)² = 13136082114540²
gcd(9788111, 3129 · 3313 · 3449 · 4426 · 4651 − 13136082114540) = 2741

v8 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1)
3129² · 3449² · 4425² · 4426² · 4684² ≡ (2² · 3² · 5 · 7² · 11 · 13 · 17 · 23 · 37 · 47)² = 857592475740²
gcd(9788111, 3129 · 3449 · 4425 · 4426 · 4684 − 857592475740) = 1

Table 3.6: Factorization of N = 9788111; computation of gcds
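The successful v7 row of Table 3.6 is easy to confirm numerically (variable names are ours):

```python
from math import gcd, isqrt

N = 9788111
subset = [3129, 3313, 3449, 4426, 4651]   # relations selected by v7

prod_c = 1
for x in subset:
    prod_c *= x * x % N                   # multiply the c_i = a_i^2 mod N
b = isqrt(prod_c)
assert b * b == prod_c                    # the product of the c_i is b^2
print(b)                                  # 13136082114540

a = 1
for x in subset:
    a = a * x % N                         # product of the a_i mod N
d = gcd(a - b, N)
print(d, N // d)                          # 2741 3571
```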
150 3. Integer Factorization and RSA

of this size. However, it turns out that the systems of linear equations used in factorization are quite sparse, which means that most of their coefficients are zero. (This is plausible because if a number A is a product of primes smaller than B, then one expects A to be a product of approximately ln(A)/ln(B) distinct primes.) There are special techniques for solving sparse systems of linear equations that are much more efficient than ordinary Gaussian elimination; see for example [31, 72].

3.7 Smooth Numbers, Sieves, and Building Relations for Factorization

In this section we describe the two fastest known methods for doing “hard” factorization problems, i.e., factoring numbers of the form N = pq, where p and q are primes of approximately the same order of magnitude. We begin with a discussion of smooth numbers, which form the essential tool for building relations. Next we describe in some detail the quadratic sieve, which is a fast method for finding the necessary smooth numbers. Finally, we briefly describe the number field sieve, which is similar to the quadratic sieve in that it provides a fast method for finding smooth numbers of a certain form. However, when N is extremely large, the number field sieve is much faster than the quadratic sieve, because by working in a ring larger than Z, it uses smaller auxiliary numbers in its search for smooth numbers.

3.7.1 Smooth Numbers

The relation building step in the three step factorization procedure described in Table 3.4 requires us to find many integers with the property that a² mod N factors as a product of small primes. As noted at the end of Sect. 3.5, these highly factorizable numbers have a name.

Definition. An integer n is called B-smooth if all of its prime factors are less than or equal to B.

Example 3.42.
Here are the first few 5-smooth numbers and the first few numbers that are not 5-smooth:
5-smooth: 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36, . . .
Not 5-smooth: 7, 11, 13, 14, 17, 19, 21, 22, 23, 26, 28, 29, 31, 33, 34, 35, 37, . . .

Definition. The function ψ(X, B) counts B-smooth numbers,
ψ(X, B) = Number of B-smooth integers n such that 1 ≤ n ≤ X.
For example, ψ(25, 5) = 15,
3.7. Smooth Numbers and Sieves 151

since the 5-smooth numbers between 1 and 25 are the 15 numbers
2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25.
In order to evaluate the efficiency of the three step factorization method, we need to understand how ψ(X, B) behaves for large values of X and B. It turns out that in order to obtain useful results, the quantities B and X must increase together in just the right way. An important theorem in this direction was proven by Canfield, Erdős, and Pomerance [24].

Theorem 3.43 (Canfield, Erdős, Pomerance). Fix a number ε with 0 < ε < 1/2, and let X and B increase together while satisfying
(ln X)^ε < ln B < (ln X)^{1−ε}.
For notational convenience, we let
u = ln X / ln B.
Then the number of B-smooth numbers less than X satisfies
ψ(X, B) = X · u^{−u(1+o(1))}.

Remark 3.44. We’ve used little-o notation here for the first time. The expression o(1) denotes a function that tends to 0 as X tends to infinity. More generally, we write f(X) = o(g(X)) if the ratio f(X)/g(X) tends to 0 as X tends to infinity. Note that this is different from the big-O notation introduced in Sect. 2.6, where recall that f(X) = O(g(X)) means that f(X) is smaller than a multiple of g(X).

The question remains of how we should choose B in terms of X. It turns out that the following curious-looking function L(X) is what we will need:
L(X) = e^{√((ln X)(ln ln X))}. (3.18)
Then, as an immediate consequence of Theorem 3.43, we obtain a fundamental estimate for ψ.

Corollary 3.45. For any fixed value of c with 0 < c < 1,
ψ(X, L(X)^c) = X · L(X)^{−(1/2c)(1+o(1))} as X → ∞.

Proof. Note that if B = L(X)^c and if we take any ε < 1/2, then
ln B = c ln L(X) = c √((ln X)(ln ln X))
satisfies (ln X)^ε < ln B < (ln X)^{1−ε}. So we can apply Theorem 3.43 with
152 3. Integer Factorization and RSA

u = ln X / ln B = (1/c) · √(ln X / ln ln X)
to deduce that ψ(X, L(X)^c) = X · u^{−u(1+o(1))}. It is easily checked (see Exercise 3.32) that this value of u satisfies
u^{−u(1+o(1))} = L(X)^{−(1/2c)(1+o(1))},
which completes the proof of the corollary.

The function L(X) = e^{√((ln X)(ln ln X))} and other similar functions appear prominently in the theory of factorization due to their close relationship to the distribution of smooth numbers. It is thus important to understand how fast L(X) grows as a function of X. Recall that in Sect. 2.6 we defined big-O notation and used it to discuss the notions of polynomial, exponential, and subexponential running times. What this meant was that the number of steps required to solve a problem was, respectively, polynomial, exponential, and subexponential in the number of bits required to describe the problem. As a supplement to big-O notation, it is convenient to introduce two other ways of comparing the rate at which functions grow.

Definition (Order Notation). Let f(X) and g(X) be functions of X whose values are positive. Recall that we write
f(X) = O(g(X))
if there are positive constants c and C such that
f(X) ≤ c g(X) for all X ≥ C.
Similarly, we say that f is big-Ω of g and write f(X) = Ω(g(X)) if there are positive constants c and C such that⁷
f(X) ≥ c g(X) for all X ≥ C.
Finally, if f is both big-O and big-Ω of g, we say that f is big-Θ of g and write f(X) = Θ(g(X)).

⁷Note: Big-Ω notation as used by computer scientists and cryptographers does not mean the same thing as the big-Ω notation of mathematicians. In mathematics, especially in the field of analytic number theory, the expression f(n) = Ω(g(n)) means that there is a constant c such that there are infinitely many integers n such that f(n) ≥ c g(n). In this book we use the computer science definition.
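The growth of L(X) is easy to tabulate numerically. The following sketch (the function name is ours) evaluates ln L(X) and log₂ L(X) for powers of 2, matching the rows of Table 3.7:

```python
from math import log

def ln_L(X):
    """Natural log of L(X) = exp(sqrt((ln X)(ln ln X)))."""
    return (log(X) * log(log(X))) ** 0.5

for k in (100, 250, 500, 1000, 2000):
    X = 2 ** k
    print(k, ln_L(X), ln_L(X) / log(2))
# for X = 2^100 this gives ln L(X) ≈ 17.14, i.e. L(X) ≈ 2^24.73,
# in agreement with the first row of Table 3.7
```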
3.7. Smooth Numbers and Sieves 153

Remark 3.46. In analytic number theory there is an alternative version of order notation that is quite intuitive. For functions f(X) and g(X), we write
f(X) ≪ g(X) if f(X) = O(g(X)),
f(X) ≫ g(X) if f(X) = Ω(g(X)),
f(X) ≍ g(X) if f(X) = Θ(g(X)).
The advantage of this notation is that it is transitive, just as the usual “greater than” and “less than” relations are transitive. For example, if f ≪ g and g ≪ h, then f ≪ h.

Definition. With this notation in place, a function f(X) is said to grow exponentially if there are positive constants α and β such that
Ω(X^α) = f(X) = O(X^β),
and it is said to grow polynomially if there are positive constants α and β such that
Ω((ln X)^α) = f(X) = O((ln X)^β).
In the alternative notation of Remark 3.46, exponential growth and polynomial growth are written, respectively, as
X^α ≪ f(X) ≪ X^β and (ln X)^α ≪ f(X) ≪ (ln X)^β.
A function that falls in between these two categories is called subexponential. Thus f(X) is subexponential if for every positive constant α, no matter how large, and for every positive constant β, no matter how small,
Ω((ln X)^α) = f(X) = O(X^β). (3.19)
(In the alternative notation, this becomes (ln X)^α ≪ f(X) ≪ X^β.) Note that there is a possibility for confusion, since these definitions do not correspond to the usual meaning of exponential and polynomial growth that one finds in calculus. What is really happening is that “exponential” and “polynomial” refer to growth rates in the number of bits that it takes to write down X, i.e., exponential or polynomial functions of log₂(X).

Remark 3.47. The function L(X) falls into the subexponential category. We leave this for you to prove in Exercise 3.30. See Table 3.7 for a rough idea of how fast L(X) grows as X increases.

Suppose that we attempt to factor N by searching for values a² (mod N) that are B-smooth. In order to perform the linear equation elimination step, we need (at least) as many B-smooth numbers as there are primes less than B.
We need this many because in the elimination step, the smooth numbers correspond to the variables, while the primes less than B correspond to the equations, and we need more variables than equations. In order to ensure that
154 3. Integer Factorization and RSA

X        ln L(X)   L(X)
2^100    17.141    2^24.73
2^250    29.888    2^43.12
2^500    45.020    2^64.95
2^1000   67.335    2^97.14
2^2000   100.145   2^144.48
Table 3.7: The growth of L(X) = e^{√((ln X)(ln ln X))}

this is the case, we thus need there to be at least π(B) B-smooth numbers, where π(B) is the number of primes up to B. It will turn out that we can take B = L(N)^c for a suitable value of c. In the next proposition we use the prime number theorem (Theorem 3.21) and the formula for ψ(X, L(X)^c) given in Corollary 3.45 to choose the smallest value of c that gives us some chance of factoring N using this method.

Proposition 3.48. Let L(X) = e^{√((ln X)(ln ln X))} be as in Corollary 3.45, let N be a large integer, and set B = L(N)^{1/√2}.
(a) We expect to check approximately L(N)^{√2} random numbers modulo N in order to find π(B) numbers that are B-smooth.
(b) We expect to check approximately L(N)^{√2} random numbers of the form a² (mod N) in order to find enough B-smooth numbers to factor N.
Hence the factorization procedure described in Table 3.4 should have a subexponential running time.

Proof. We already explained why (a) and (b) are equivalent, assuming that the numbers a² (mod N) are sufficiently random. We now prove (a). The probability that a randomly chosen number modulo N is B-smooth is ψ(N, B)/N. In order to find π(B) numbers that are B-smooth, we need to check approximately
π(B) / (ψ(N, B)/N) numbers. (3.20)
We want to choose B so as to minimize this function, since checking numbers for smoothness is a time-consuming process. Corollary 3.45 says that
ψ(N, L(N)^c)/N ≈ L(N)^{−1/2c},
so we set B = L(N)^c and search for the value of c that minimizes (3.20). The prime number theorem (Theorem 3.21) tells us that π(B) ≈ B/ln(B), so (3.20) is equal to
3.7. Smooth Numbers and Sieves 155

π(L(N)^c) / (ψ(N, L(N)^c)/N) ≈ (L(N)^c / (c ln L(N))) · (1 / L(N)^{−1/2c}) = L(N)^{c+1/2c} · (1 / (c ln L(N))).
The factor L(N)^{c+1/2c} dominates this last expression, so we choose the value of c that minimizes the quantity c + 1/(2c). This is an elementary calculus problem. It is minimized when c = 1/√2, and the minimum value is √2. Thus if we choose B ≈ L(N)^{1/√2}, then we need to check approximately L(N)^{√2} values in order to find π(B) numbers that are B-smooth, and hence to find enough relations to factor N.

Remark 3.49. Proposition 3.48 suggests that we need to check approximately L(N)^{√2} randomly chosen numbers modulo N in order to find enough smooth numbers to factor N. There are various ways to decrease the search time. In particular, rather than using random values of a to compute numbers of the form a² (mod N), we might instead select numbers a that are only a little bit larger than √N. Then a² (mod N) is O(√N), so is more likely to be B-smooth than is a number that is O(N). Reworking the calculation in Proposition 3.48, one finds that it suffices to check approximately L(N) random numbers of the form a² (mod N) with a close to √N. This is a significant savings over L(N)^{√2}. See Exercise 3.33 for further details.

Remark 3.50. When estimating the effort needed to factor N, we have completely ignored the work required to check whether a given number is B-smooth. For example, if we check for B-smoothness using trial division, i.e., dividing by each prime less than B, then it takes approximately π(B) trial divisions to check for B-smoothness. Taking this additional effort into account in the proof of Proposition 3.48, one finds that it takes approximately L(N)^{√2} trial divisions to find enough smooth numbers to factor N, even using values of a ≈ √N as in Remark 3.49. The quadratic sieve, which we describe in Sect.
3.7.2, uses a more efficient method for generating B-smooth numbers and thereby reduces the running time down to L(N). (See Table 3.7 for a reminder of how L(N) grows and why a running time of L(N) is much better than a running time of L(N)^{√2}.) In Exercise 3.29 we ask you to estimate how long it takes to perform L(N) operations on a moderately fast computer. For a number of years it was thought that no factorization algorithm could take fewer than a fixed power of L(N) steps, but the invention of the number field sieve (Sect. 3.7.3) showed this to be incorrect. The number field sieve, whose running time of e^{c((ln N)(ln ln N)²)^{1/3}} is faster than L(N)^ε for every ε > 0, achieves its speed by moving beyond the realm of the ordinary integers.

3.7.2 The Quadratic Sieve

In this section we address the final piece of the puzzle that must be solved in order to factor large numbers via the difference of squares method described in Sect. 3.6:
How can we efficiently find many numbers a > √N such that each a² (mod N) is B-smooth?

From the discussion in Sect. 3.7.1 and the proof of Proposition 3.48, we know that we need to take B ≈ L(N)^{1/√2} in order to have a reasonable chance of factoring N. An early approach to finding B-smooth squares modulo N was to look for fractions a/b that are as close as possible to √(kN) for k = 1, 2, 3, . . . . Then a² ≈ b²kN, so a² (mod N) is reasonably small, and thus is more likely to be B-smooth. The theory of continued fractions gives an algorithm for finding such a/b. See [28, §10.1] for details. An alternative approach that turns out to be much faster in practice is to allow slightly larger values of a and to use an efficient cancellation process called a sieve to simultaneously create a large number of values a² (mod N) that are B-smooth. We next describe Pomerance's quadratic sieve, which is still the fastest known method for factoring large numbers N = pq up to about 2^350. For numbers considerably larger than this, say larger than 2^450, the more complicated number field sieve holds the world record for quickest factorization. In the remainder of this section we describe the simplest version of the quadratic sieve as an illustration of modern factorization methods. For a description of the history of sieve methods and an overview of how they work, see Pomerance's delightful essay "A Tale of Two Sieves" [105].

We start with the simpler problem of rapidly finding many B-smooth numbers less than some bound X, without worrying whether the numbers have the form a² (mod N). To do this, we adapt the Sieve of Eratosthenes, which is an ancient Greek method for making lists of prime numbers. Eratosthenes' idea for finding primes is as follows. Start by circling the first prime 2 and crossing off every larger multiple of 2. Then circle the next number, 3 (which must be prime) and cross off every larger multiple of 3.
The smallest uncircled number is 5, so circle 5 and cross off all larger multiples of 5, and so on. At the end, the circled numbers are the primes. This sieving process is illustrated in Fig. 3.2, where we have sieved all primes less than 10. (These are the boxed primes in the figure.) The remaining uncrossed numbers in the list are all remaining primes smaller than 100.

[Figure 3.2: The sieve of Eratosthenes — the numbers 2 through 99, with the primes less than 10 boxed, their larger multiples crossed off, and the remaining primes up to 100 left unmarked.]
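Eratosthenes' crossing-off process is easy to express in code. The following is a minimal sketch (function name ours); as is standard, crossing off for each prime p begins at p², since smaller multiples of p were already crossed off by smaller primes:

```python
def eratosthenes(limit):
    """Return the list of primes <= limit via the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross off every larger multiple of p, starting at p*p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]

print(eratosthenes(100))   # the 25 primes below 100, ending with 97
```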
  • 184. 3.7. Smooth Numbers and Sieves 157 Notice that some numbers are crossed off several times. For example, 6, 12 and 18 are crossed off twice, once because they are multiples of 2 and once because they are multiples of 3. Similarly, numbers such as 30 and 42 are crossed off three times. Suppose that rather than crossing numbers off, we instead divide. That is, we begin by dividing every even number by 2, then we divide every multiple of 3 by 3, then we divide every multiple of 5 by 5, and so on. If we do this for all primes less than B, which numbers end up being divided all the way down to 1? The answer is that these are the numbers that are a product of distinct primes less than B; in particular, they are B-smooth! So we end up with a list of many B-smooth numbers. Unfortunately, we miss some B-smooth numbers, namely those divisible by powers of small primes, but it is easy to remedy this problem by sieving with prime powers. Thus after sieving by 3, rather than proceeding to 5, we first sieve by 4. To do this, we cancel an additional factor of 2 from every multiple of 4. (Notice that we’ve already canceled 2 from these numbers, since they are even, so we can cancel only one additional factor of 2.) If we do this, then at the end, the B-smooth numbers less than X are precisely the numbers that have been reduced to 1. One can show that the total number of divisions required is approximately X ln(ln(B)). The double logarithm function ln(ln(B)) grows extremely slowly, so the average number of divisions required to check each individual number for smoothness is approximately constant. However, our goal is not to make a list of numbers from 1 to X that are B-smooth. What we need is a list of numbers of the form a2 (mod N) that are B-smooth. Our strategy for accomplishing this uses the polynomial F(T) = T2 − N. 
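The division sieve described above — for each prime power q = p^e, cancel one factor of p from every qth entry, and keep the entries that reach 1 — can be sketched as follows. This is a minimal illustration with trial-division primality testing, not an optimized implementation; the function name is ours, and we sieve prime powers up to X so that "reduced to 1" exactly characterizes the B-smooth entries:

```python
def smooth_numbers(X, B):
    """Return the B-smooth numbers in 2..X via the division sieve."""
    vals = list(range(X + 1))              # vals[n] starts as n
    for p in range(2, B + 1):
        if any(p % d == 0 for d in range(2, p)):
            continue                       # skip composites: p must be prime
        q = p
        while q <= X:                      # sieve by p, p^2, p^3, ...
            for n in range(q, X + 1, q):
                vals[n] //= p              # cancel one more factor of p
            q *= p
    return [n for n in range(2, X + 1) if vals[n] == 1]

print(smooth_numbers(30, 5))   # numbers up to 30 with all prime factors <= 5
```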
We want to start with a value of a that is slightly larger than √N, so we set a = ⌊√N⌋ + 1, where ⌊x⌋ denotes, as usual, the greatest integer less than or equal to x. We then look at the list of numbers

    F(a), F(a + 1), F(a + 2), . . . , F(b).    (3.21)

The idea is to find the B-smooth numbers in this list by sieving away the primes smaller than B and seeing which numbers in the list get sieved all the way down to 1. We choose B sufficiently large so that, by the end of the sieving process, we are likely to have found enough B-smooth numbers to factor N. The following definition is useful in describing this process.

Definition. The set of primes less than B (or sometimes the set of prime powers less than B) is called the factor base.

Suppose that p is a prime in our factor base. Which of the numbers in the list (3.21) are divisible by p? Equivalently, which numbers t between a and b satisfy

    t² ≡ N (mod p)?    (3.22)
If the congruence (3.22) has no solutions, then we discard the prime p, since p divides none of the numbers in the list (3.21). Otherwise the congruence (3.22) has two solutions (see Exercise 1.36 on page 55), which we denote by t = α_p and t = β_p. (If p = 2, there is only one solution α_p.) It follows that each of the numbers

    F(α_p), F(α_p + p), F(α_p + 2p), F(α_p + 3p), . . .

and each of the numbers

    F(β_p), F(β_p + p), F(β_p + 2p), F(β_p + 3p), . . .

is divisible by p. Thus we can sieve away a factor of p from every pth entry in the list (3.21), starting with the smallest a value satisfying a ≡ α_p (mod p), and similarly we can sieve away a factor of p from every pth entry in the list (3.21), starting with the smallest a value satisfying a ≡ β_p (mod p).

Example 3.51. We illustrate the quadratic sieve applied to the composite number N = 221. The smallest number whose square is larger than N is a = ⌊√221⌋ + 1 = 15. We set F(T) = T² − 221 and sieve the numbers from F(15) = 4 up to F(30) = 679 using successively the prime powers from 2 to 7. The initial list of numbers T² − N is⁸

    4 35 68 103 140 179 220 263 308 355 404 455 508 563 620 679.

We first sieve by p = 2, which means that we cancel 2 from every second entry in the list. This gives

    4 35 68 103 140 179 220 263 308 355 404 455 508 563 620 679
    ↓2 ↓2 ↓2 ↓2 ↓2 ↓2 ↓2 ↓2
    2 35 34 103 70 179 110 263 154 355 202 455 254 563 310 679

Next we sieve by p = 3. However, it turns out that the congruence t² ≡ 221 ≡ 2 (mod 3) has no solutions, so none of the entries in our list are divisible by 3.

⁸In practice when N is large, the t values used in the quadratic sieve are close enough to √N that the value of t² − N is between 1 and N. For our small numerical example, this is not the case, so it would be more efficient to reduce our values of t² modulo N, rather than merely subtracting N from t².
However, since our aim is illumination, not efficiency, we will pretend that there is no advantage to subtracting additional multiples of N from t² − N.
We move on to the prime power 2². Every odd number is a solution of the congruence t² ≡ 221 ≡ 1 (mod 4), which means that we can sieve another factor of 2 from every second entry in our list. We put a small 4 next to the sieving arrows to indicate that in this step we are sieving by 4, although we cancel only a factor of 2 from each entry.

    2 35 34 103 70 179 110 263 154 355 202 455 254 563 310 679
    ↓4 ↓4 ↓4 ↓4 ↓4 ↓4 ↓4 ↓4
    1 35 17 103 35 179 55 263 77 355 101 455 127 563 155 679

Next we move on to p = 5. The congruence t² ≡ 221 ≡ 1 (mod 5) has two solutions, α₅ = 1 and β₅ = 4 modulo 5. The first t value in our list that is congruent to 1 modulo 5 is t = 16, so starting with F(16), we find that every fifth entry is divisible by 5. Sieving out these factors of 5 gives

    1 35 17 103 35 179 55 263 77 355 101 455 127 563 155 679
    ↓5 ↓5 ↓5
    1 7 17 103 35 179 11 263 77 355 101 91 127 563 155 679

Similarly, every fifth entry starting with F(19) is divisible by 5, so we sieve out those factors

    1 7 17 103 35 179 11 263 77 355 101 91 127 563 155 679
    ↓5 ↓5 ↓5
    1 7 17 103 7 179 11 263 77 71 101 91 127 563 31 679

To conclude our example, we sieve the prime p = 7. The congruence t² ≡ 221 ≡ 4 (mod 7) has the two solutions α₇ = 2 and β₇ = 5. We can thus sieve 7 away from every seventh entry starting with F(16), and also every seventh entry starting with F(19). This yields

    1 7 17 103 7 179 11 263 77 71 101 91 127 563 31 679
    ↓7 ↓7 ↓7
    1 1 17 103 7 179 11 263 11 71 101 91 127 563 31 97
    ↓7 ↓7
    1 1 17 103 1 179 11 263 11 71 101 13 127 563 31 97

Notice that the original entries F(15) = 4, F(16) = 35, and F(19) = 140
have been sieved all the way down to 1. This tells us that F(15) = 15² − 221, F(16) = 16² − 221, and F(19) = 19² − 221 are each a product of small primes, so we have discovered several squares modulo 221 that are products of small primes:

    15² ≡ 2² (mod 221), 16² ≡ 5 · 7 (mod 221), 19² ≡ 2² · 5 · 7 (mod 221).    (3.23)

We can use the congruences (3.23) to obtain various relations between squares. For example,

    (16 · 19)² ≡ (2 · 5 · 7)² (mod 221).

Computing

    gcd(221, 16 · 19 − 2 · 5 · 7) = gcd(221, 234) = 13

gives a nontrivial factor of 221.⁹ We have successfully factored N = 221, but to illustrate the sieving process further, we continue sieving up to B = 11. The next prime power to sieve is 3². However, the fact that t² ≡ 221 (mod 3) has no solutions means that t² ≡ 221 (mod 9) also has no solutions, so we move on to the prime p = 11. The congruence t² ≡ 221 ≡ 1 (mod 11) has the solutions α₁₁ = 1 and β₁₁ = 10, which allows us to sieve a factor of 11 from F(23) and from F(21). We recapitulate the entire sieving process in Fig. 3.3, where the top row gives values of t and the subsequent rows sieve the values of F(t) = t² − 221 using prime powers up to 11. Notice that two more entries, F(21) and F(23), have been sieved down to 1, which gives us two additional relations

    F(21) ≡ 21² ≡ 2² · 5 · 11 (mod 221) and F(23) ≡ 23² ≡ 2² · 7 · 11 (mod 221).

We can combine these relations with the earlier relations (3.23) to obtain new square equalities, for example

    (19 · 21 · 23)² ≡ (2³ · 5 · 7 · 11)² (mod 221).

These give another way to factor 221:

    gcd(221, 19 · 21 · 23 − 2³ · 5 · 7 · 11) = gcd(221, 6097) = 13.

⁹Looking back at the congruences (3.23), you may have noticed that it is even easier to use the fact that 15² is itself congruent to a square modulo 221, yielding gcd(15 − 2, 221) = 13. In practice, the true power of the quadratic sieve appears only when it is applied to numbers much too large to use in a textbook example.
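The relations discovered by the sieve, and the gcd computation that extracts a factor, can be checked mechanically. A short sketch of the combination step for N = 221:

```python
from math import gcd

N = 221
# Relations (3.23) found by the sieve: each F(t) = t^2 - N is a product
# of small primes.
assert 15**2 - N == 2**2
assert 16**2 - N == 5 * 7
assert 19**2 - N == 2**2 * 5 * 7

# Combining the last two relations gives (16*19)^2 ≡ (2*5*7)^2 (mod N).
A = 16 * 19
B = 2 * 5 * 7
assert (A**2 - B**2) % N == 0
p = gcd(A - B, N)        # gcd(221, 234)
print(p, N // p)         # 13 17
```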
Remark 3.52. If p is an odd prime, then the congruence t² ≡ N (mod p) has either 0 or 2 solutions modulo p. More generally, congruences t² ≡ N (mod p^e) modulo powers of p have either 0 or 2 solutions. (See Exercises 1.36 and 1.37.) This makes sieving odd prime powers relatively straightforward. Sieving with powers of 2 is a bit trickier, since the number of solutions may be different modulo 2, modulo 4, and modulo higher powers of 2. Further, there may be more than two solutions. For example, t² ≡ N (mod 8) has four different solutions modulo 8 if N ≡ 1 (mod 8). So although sieving powers of 2 is not intrinsically difficult, it must be dealt with as a special case.

    15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
    4 35 68 103 140 179 220 263 308 355 404 455 508 563 620 679
    ↓2 ↓2 ↓2 ↓2 ↓2 ↓2 ↓2 ↓2
    2 35 34 103 70 179 110 263 154 355 202 455 254 563 310 679
    ↓4 ↓4 ↓4 ↓4 ↓4 ↓4 ↓4 ↓4
    1 35 17 103 35 179 55 263 77 355 101 455 127 563 155 679
    ↓5 ↓5 ↓5
    1 7 17 103 35 179 11 263 77 355 101 91 127 563 155 679
    ↓5 ↓5 ↓5
    1 7 17 103 7 179 11 263 77 71 101 91 127 563 31 679
    ↓7 ↓7 ↓7
    1 1 17 103 7 179 11 263 11 71 101 91 127 563 31 97
    ↓7 ↓7
    1 1 17 103 1 179 11 263 11 71 101 13 127 563 31 97
    ↓11
    1 1 17 103 1 179 11 263 1 71 101 13 127 563 31 97
    ↓11
    1 1 17 103 1 179 1 263 1 71 101 13 127 563 31 97

Figure 3.3: Sieving N = 221 using prime powers up to B = 11

Remark 3.53. There are many implementation ideas that can be used to greatly increase the practical speed of the quadratic sieve. Although the running time of the sieve remains a constant multiple of L(N), the multiple can be significantly reduced. A time-consuming part of the sieve is the necessity of dividing every pth entry by p, since if the numbers are large, division by p is moderately complicated. Of course, computers perform division quite rapidly, but the sieving process requires approximately L(N) divisions, so anything that decreases this time will have an immediate effect.
A key idea to speed up this step is to use approximate logarithms, which allows the slower division operations to be replaced by faster subtraction operations. We explain the basic idea. Instead of using the list of values F(a), F(a + 1), F(a + 2), . . . ,
  • 189. 162 3. Integer Factorization and RSA we use a list of integer values that are approximately equal to log F(a), log F(a + 1), log F(a + 2), log F(a + 3), . . . . In order to sieve p from F(t), we subtract an integer approximation of log p from the integer approximation to log F(t), since by the rule of logarithms, log F(t) − log p = log F(t) p . If we were to use exact values for the logarithms, then at the end of the sieving process, the entries that are reduced to 0 would be precisely the values of F(t) that are B-smooth. However, since we use only approximate logarithm values, at the end we look for entries that have been reduced to a small number. Then we use division on only those few entries to find the ones that are actually B-smooth. A second idea that can be used to speed the quadratic sieve is to use the polynomial F(t) = t2 − N only until t reaches a certain size, and then replace it with a new polynomial. For details of these two implementation ideas and many others, see for example [28, §10.4], [34], or [109] and the references that they list. 3.7.3 The Number Field Sieve The number field sieve is a factorization method that works in a ring that is larger than the ordinary integers. The full details are very complicated, so in this section we are content to briefly explain some of the ideas that go into making the number field sieve the fastest known method for factoring large numbers of the form N = pq, where p and q are primes of approximately the same order of magnitude. In order to factor N, we start by finding a nonzero integer m and an irreducible monic polynomial f(x) ∈ Z[x] of small degree satisfying f(m) ≡ 0 (mod N). Example 3.54. Suppose that we want to factor the number N = 229 +1. Then we could take m = 2103 and f(x) = x5 + 8, since f(m) = f(2103 ) = 2515 + 8 = 8(2512 + 1) ≡ 0 (mod 229 + 1). Let d be the degree of f(x) and let β be a root of f(x). (Note that β might be a complex number.) 
We will work in the ring

    Z[β] = {c₀ + c₁β + c₂β² + · · · + c_{d−1}β^{d−1} ∈ C : c₀, c₁, . . . , c_{d−1} ∈ Z}.

Note that although we have written Z[β] as a subring of the complex numbers, it isn't actually necessary to deal with real or complex numbers. We can work with Z[β] purely algebraically, since it is equal to the quotient ring Z[x]/(f(x)). (See Sect. 2.10.2 for information about quotient rings.)
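Since Z[β] = Z[x]/(f(x)), arithmetic in Z[β] is ordinary polynomial arithmetic followed by reduction using β^d = −(f₀ + f₁β + · · · + f_{d−1}β^{d−1}) for a monic modulus f. A minimal sketch (the coefficient-list representation and function name are ours), which can be checked against the multiplication worked out in Example 3.55 below:

```python
def mul_mod_f(u, v, f):
    """Multiply u, v in Z[β] = Z[x]/(f(x)), with f monic of degree d.
    u, v, f are integer coefficient lists, lowest degree first."""
    d = len(f) - 1                      # degree of the monic modulus
    prod = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):           # ordinary polynomial product
        for j, b in enumerate(v):
            prod[i + j] += a * b
    # Reduce: replace β^k (k >= d) using β^d = -(f0 + f1·β + ... + f_{d-1}·β^{d-1})
    for k in range(len(prod) - 1, d - 1, -1):
        c = prod[k]
        prod[k] = 0
        for j in range(d):
            prod[k - d + j] -= c * f[j]
    return prod[:d]

# Example 3.55: f(x) = 1 + 3x - 2x^3 + x^4
f = [1, 3, 0, -2, 1]
u = [2, -4, 7, 3]       # 2 - 4β + 7β² + 3β³
v = [1, 2, -4, -2]      # 1 + 2β - 4β² - 2β³
print(mul_mod_f(u, v, f))   # [92, 308, 111, -133]
```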
Example 3.55. We give an example to illustrate how one performs addition and multiplication in the ring Z[β]. Let f(x) = 1 + 3x − 2x³ + x⁴, let β be a root of f(x), and consider the ring Z[β]. In order to add the elements u = 2 − 4β + 7β² + 3β³ and v = 1 + 2β − 4β² − 2β³, we simply add their coefficients,

    u + v = 3 − 2β + 3β² + β³.

Multiplication is a bit more complicated. First we multiply u and v, treating β as if it were a variable,

    uv = 2 − 9β² + 29β³ − 14β⁴ − 26β⁵ − 6β⁶.

Then we divide by f(β) = 1 + 3β − 2β³ + β⁴, still treating β as a variable, and keep the remainder,

    uv = 92 + 308β + 111β² − 133β³ ∈ Z[β].

The next step in the number field sieve is to find a large number of pairs of integers (a₁, b₁), . . . , (a_k, b_k) that simultaneously satisfy

    ∏_{i=1}^{k} (aᵢ − bᵢm) is a square in Z and ∏_{i=1}^{k} (aᵢ − bᵢβ) is a square in Z[β].

Thus there is an integer A ∈ Z and an element α ∈ Z[β] such that

    ∏_{i=1}^{k} (aᵢ − bᵢm) = A² and ∏_{i=1}^{k} (aᵢ − bᵢβ) = α².    (3.24)

By definition of Z[β], we can find an expression for α of the form

    α = c₀ + c₁β + c₂β² + · · · + c_{d−1}β^{d−1} with c₀, c₁, . . . , c_{d−1} ∈ Z.    (3.25)

Recall our original assumption f(m) ≡ 0 (mod N). This means that we have m ≡ β (mod N) in the ring Z[β]. So on the one hand, (3.24) becomes

    A² ≡ α² (mod N) in the ring Z[β],

while on the other hand, (3.25) becomes

    α ≡ c₀ + c₁m + c₂m² + · · · + c_{d−1}m^{d−1} (mod N) in the ring Z[β].
Hence

    A² ≡ (c₀ + c₁m + c₂m² + · · · + c_{d−1}m^{d−1})² (mod N).

Thus we have created a congruence A² ≡ B² (mod N) that is valid in the ring of integers Z, and as usual, there is then a good chance that gcd(A − B, N) will yield a nontrivial factor of N.

How do we find the (aᵢ, bᵢ) pairs to make both of the products (3.24) into squares? For the first product, we can use a sieve-type algorithm, similar to the method used in the quadratic sieve, to find values of a − bm that are smooth, and then use linear algebra to find a subset with the desired property. Pollard's idea is to simultaneously do something similar for the second product while working in the ring Z[β]. Thus we look for pairs of integers (a, b) such that the quantity a − bβ is "smooth" in Z[β]. There are many serious issues that arise when we try to do this, including the following:

1. The ring Z[β] usually does not have unique factorization of elements into primes or irreducible elements. So instead, we factor the ideal (a − bβ) into a product of prime ideals. We say that a − bβ is smooth if the prime ideals appearing in the factorization are small.

2. Unfortunately, even ideals in the ring Z[β] may not have unique factorization as a product of prime ideals. However, there is a slightly larger ring, called the ring of integers of Q(β), in which unique factorization of ideals is true.

3. Suppose that we have managed to make the ideal (∏(aᵢ − bᵢβ)) into the square of an ideal in Z[β]. There are two further problems. First, it need not be the square of an ideal generated by a single element. Second, even if it is equal to an ideal of the form (γ)², we can conclude only that ∏(aᵢ − bᵢβ) = uγ² for some unit u ∈ Z[β]*, and generally the ring Z[β] has infinitely many units.

It would take us too far afield to explain how to deal with these potential difficulties. Suffice it to say that through a number of ingenious ideas due to Adleman, Buhler, H.
Lenstra, Pomerance, and others, the obstacles were overcome, leading to a practical factorization method. (See [105] for a nice overview of the number field sieve and some of the ideas used to turn it from a theoretical construction into a working algorithm.)

However, we will comment further on the first step in the algorithm. In order to get started, we need an integer m and a monic irreducible polynomial f(x) of small degree such that f(m) ≡ 0 (mod N). The trick is first to choose the desired degree d of f, next to choose an integer m satisfying

    (N/2)^{1/d} < m < N^{1/d},

and then to write N as a number to the base m,

    N = c₀ + c₁m + c₂m² + · · · + c_{d−1}m^{d−1} + c_d m^d with 0 ≤ cᵢ < m.
The condition on m ensures that c_d = 1, so we can take f to be the monic polynomial

    f(x) = c₀ + c₁x + c₂x² + · · · + c_{d−1}x^{d−1} + x^d.

We also need f(x) to be irreducible, but if f(x) factors in Z[x], say f(x) = g(x)h(x), then N = f(m) = g(m)h(m) gives a factorization of N and we are done. So now we have an f(x) and an m, which allows us to get started using the number field sieve.

There is no denying the fact that the number field sieve is much more complicated than the quadratic sieve. So why is it useful? The reason has to do with the size of the numbers that must be considered. Recall that for the quadratic sieve, we sieved to find smooth numbers of the form

    (⌊√N⌋ + k)² − N for k = 1, 2, 3, . . . .

So we needed to pick out the smooth numbers from a set of numbers whose size is a little larger than √N. For the number field sieve one ends up looking for smooth numbers of the form

    (a − mb) · b^d f(a/b),    (3.26)

and it turns out that by a judicious choice of m and f, these numbers are much smaller than √N. In order to describe how much smaller, we use a generalization of the subexponential function L(N) that was so useful in describing the running time of the quadratic sieve.

Definition. For any 0 < ε < 1, we define the function

    L_ε(X) = e^{(ln X)^ε (ln ln X)^{1−ε}}.

Notice that with this notation, the function L(X) defined in Sect. 3.7.1 is L_{1/2}(X). Then one can show that the numbers (3.26) used by the number field sieve have size a small power of L_{2/3}(N). To put this into perspective, the quadratic sieve works with numbers having approximately half as many digits as N, while the number field sieve uses numbers K satisfying

    (Number of digits of K) ≈ (Number of digits of N)^{2/3}.

This leads to a vastly improved running time for sufficiently large values of N.

Theorem 3.56. Under some reasonable assumptions, the expected running time of the number field sieve to factor the number N is L_{1/3}(N)^c for a small value of c.
For general numbers, the best known value of c in Theorem 3.56 is a bit less than 2, while for special numbers such as 2^(2⁹) + 1 it is closer to 1.5. Of course, the number field sieve is sufficiently complicated that it becomes faster than other methods only when N is sufficiently large. As a practical matter, the quadratic sieve is faster for numbers smaller than 10^100, while the number field sieve is faster for numbers larger than 10^130.
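The base-m polynomial selection described above is easy to sketch in code. The choice of N = 2³⁰ + 1 and d = 3 below is our own toy illustration; for this N one gets m = 1024 and f(x) = x³ + 1 (which happens to be reducible, so by the remark above its factorization already splits N):

```python
def nfs_polynomial(N, d):
    """Choose m ≈ N^(1/d) and the base-m digits of N, giving a monic
    degree-d polynomial f with f(m) = N ≡ 0 (mod N)."""
    m = int(round(N ** (1.0 / d)))      # integer d-th root of N
    while m ** d > N:
        m -= 1
    while (m + 1) ** d <= N:
        m += 1
    coeffs, n = [], N                   # digits of N in base m, lowest first
    while n:
        coeffs.append(n % m)
        n //= m
    return m, coeffs                    # f(x) = sum(coeffs[i] * x^i)

N = 2**30 + 1
m, coeffs = nfs_polynomial(N, 3)
print(m, coeffs)                        # 1024 [1, 0, 0, 1], i.e. f(x) = x^3 + 1
assert sum(c * m**i for i, c in enumerate(coeffs)) % N == 0
# Here f(x) = x^3 + 1 = (x + 1)(x^2 - x + 1) factors, so N = g(m)h(m):
assert (m + 1) * (m * m - m + 1) == N
```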
3.8 The Index Calculus Method for Computing Discrete Logarithms in F_p

The index calculus is a method for solving the discrete logarithm problem in a finite field F_p. The algorithm uses smooth numbers and bears some similarity to the sieve methods that we have studied in this chapter, which is why we cover it here, rather than in Chap. 2, where we originally discussed discrete logarithms. The idea behind the index calculus is fairly simple. We want to solve the discrete logarithm problem

    g^x ≡ h (mod p),    (3.27)

where the prime p and the integers g and h are given. For simplicity, we will assume that g is a primitive root modulo p, so its powers give all of F_p^*. Rather than solving (3.27) directly, we instead choose a value B and solve the discrete logarithm problem

    g^x ≡ ℓ (mod p) for all primes ℓ ≤ B.

In other words, we compute the discrete logarithm log_g(ℓ) for every prime ℓ satisfying ℓ ≤ B. Having done this, we next look at the quantities

    h · g^{−k} (mod p) for k = 1, 2, . . .

until we find a value of k such that h · g^{−k} (mod p) is B-smooth. For this value of k we have

    h · g^{−k} ≡ ∏_{ℓ≤B} ℓ^{e_ℓ} (mod p)    (3.28)

for certain exponents e_ℓ. We rewrite (3.28) in terms of discrete logarithms as

    log_g(h) ≡ k + Σ_{ℓ≤B} e_ℓ · log_g(ℓ) (mod p − 1),    (3.29)

where recall that discrete logarithms are defined only modulo p − 1. But we are assuming that we already computed log_g(ℓ) for all primes ℓ ≤ B. Hence (3.29) gives the value of log_g(h). It remains to explain how to find log_g(ℓ) for small primes ℓ. Again the idea is simple. For a random selection of exponents i we compute

    g_i ≡ g^i (mod p) with 0 < g_i < p.

If g_i is not B-smooth, then we discard it, while if g_i is B-smooth, then we can factor it as

    g_i = ∏_{ℓ≤B} ℓ^{u_ℓ(i)}.
In terms of discrete logarithms, this gives the relation

    i ≡ log_g(g_i) ≡ Σ_{ℓ≤B} u_ℓ(i) · log_g(ℓ) (mod p − 1).    (3.30)

Notice that the only unknown quantities in the formula (3.30) are the discrete logarithm values log_g(ℓ). So if we can find more than π(B) equations like (3.30), then we can use linear algebra to solve for the log_g(ℓ) "variables." This method of solving the discrete logarithm problem in F_p is called the index calculus, where recall from Sect. 2.2 that index is an older name for discrete logarithm. The index calculus first appears in work of Western and Miller [148] in 1968, so it predates by a few years the invention of public key cryptography. The method was independently rediscovered by several cryptographers in the 1970s after the publication of the Diffie–Hellman paper [38].

Remark 3.57. A minor issue that we have ignored is the fact that the linear equations (3.30) are congruences modulo p − 1. Standard linear algebra methods such as Gaussian elimination do not work well modulo composite numbers, because there are numbers that do not have multiplicative inverses. The Chinese remainder theorem (Theorem 2.24) solves this problem. First we solve the congruences (3.30) modulo q for each prime q dividing p − 1. Then, if q appears in the factorization of p − 1 to a power q^e, we lift the solution from Z/qZ to Z/q^eZ. Finally, we use the Chinese remainder theorem to combine solutions modulo prime powers to obtain a solution modulo p − 1. In cryptographic applications one should choose p such that p − 1 is divisible by a large prime; otherwise, the Pohlig–Hellman algorithm (Sect. 2.9) solves the discrete logarithm problem. For example, if we select p = 2q + 1 with q prime, then the index calculus requires us to solve simultaneous congruences (3.30) modulo q and modulo 2.
There are many implementation issues that arise and tricks that have been developed in practical applications of the index calculus. We do not pursue these matters here, but are content to present a small numerical example illustrating how the index calculus works.

Example 3.58. We let p be the prime p = 18443 and use the index calculus to solve the discrete logarithm problem

    37^x ≡ 211 (mod 18443).

We note that g = 37 is a primitive root modulo p = 18443. We take B = 5, so our factor base is the set of primes {2, 3, 5}. We start by taking random powers of g = 37 modulo 18443 and pick out the ones that are B-smooth. A couple of hundred attempts gives four equations:

    g^12708 ≡ 2³ · 3⁴ · 5 (mod 18443),
    g^11311 ≡ 2³ · 5² (mod 18443),
    g^15400 ≡ 2³ · 3³ · 5 (mod 18443),
    g^2731 ≡ 2³ · 3 · 5⁴ (mod 18443).    (3.31)
These in turn give linear relations for the discrete logarithms of 2, 3, and 5 to the base g. For example, the first one says that

    12708 = 3 · log_g(2) + 4 · log_g(3) + log_g(5).

To ease notation, we let x₂ = log_g(2), x₃ = log_g(3), and x₅ = log_g(5). Then the four congruences (3.31) become the following four linear relations:

    12708 ≡ 3x₂ + 4x₃ + x₅ (mod 18442),
    11311 ≡ 3x₂ + 2x₅ (mod 18442),
    15400 ≡ 3x₂ + 3x₃ + x₅ (mod 18442),
    2731 ≡ 3x₂ + x₃ + 4x₅ (mod 18442).    (3.32)

Note that the formulas (3.32) are congruences modulo p − 1 = 18442 = 2 · 9221, since discrete logarithms are defined only modulo p − 1. The number 9221 is prime, so we need to solve the system of linear equations (3.32) modulo 2 and modulo 9221. This is easily accomplished by Gaussian elimination, i.e., by adding multiples of one equation to another to eliminate variables. The solutions are

    (x₂, x₃, x₅) ≡ (1, 0, 1) (mod 2),
    (x₂, x₃, x₅) ≡ (5733, 6529, 6277) (mod 9221).

Combining these solutions yields

    (x₂, x₃, x₅) ≡ (5733, 15750, 6277) (mod 18442).

We check the solutions by computing

    37^5733 ≡ 2 (mod 18443), 37^15750 ≡ 3 (mod 18443), 37^6277 ≡ 5 (mod 18443).

Recall that our ultimate goal is to solve the discrete logarithm problem 37^x ≡ 211 (mod 18443). We compute the value of 211 · 37^{−k} (mod 18443) for random values of k until we find a value that is B-smooth. After a few attempts we find that

    211 · 37^{−9549} ≡ 2⁵ · 3² · 5² (mod 18443).

Using the values of the discrete logs of 2, 3, and 5 from above, this yields

    log_g(211) ≡ 9549 + 5 log_g(2) + 2 log_g(3) + 2 log_g(5)
              = 9549 + 5 · 5733 + 2 · 15750 + 2 · 6277 ≡ 8500 (mod 18442).

Finally, we check our answer log_g(211) = 8500 by computing 37^8500 ≡ 211 (mod 18443).
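The numerical claims in Example 3.58 can be checked directly with Python's three-argument pow (a sketch; note that pow(g, -k, p) computes the power of the modular inverse and requires Python 3.8 or later):

```python
p, g = 18443, 37

# Discrete logs of the factor base, found by the linear algebra step.
x2, x3, x5 = 5733, 15750, 6277
assert pow(g, x2, p) == 2
assert pow(g, x3, p) == 3
assert pow(g, x5, p) == 5

# 211 * g^(-9549) ≡ 2^5 * 3^2 * 5^2 (mod p), so combine the logs.
k = 9549
assert 211 * pow(g, -k, p) % p == 2**5 * 3**2 * 5**2
x = (k + 5 * x2 + 2 * x3 + 2 * x5) % (p - 1)
print(x)                    # 8500
assert pow(g, x, p) == 211  # confirms log_37(211) = 8500
```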
Remark 3.59. We can roughly estimate the running time of the index calculus as follows. Using a factor base consisting of primes less than B, we need to find approximately π(B) numbers of the form g^i (mod p) that are B-smooth. Proposition 3.48 suggests that we should take B = L(p)^{1/√2}, and then we will have to check approximately L(p)^{√2} values of i. There is also the issue of checking each value to see whether it is B-smooth, but sieve-type methods can be used to speed the process. Further, using ideas based on the number field sieve, the running time can be further reduced to a small power of L_{1/3}(p). In any case, the index calculus is a subexponential algorithm for solving the discrete logarithm problem in F_p^*. This stands in marked contrast to the discrete logarithm problem in elliptic curve groups, which we study in Chap. 6. Currently, the best known algorithms to solve the general discrete logarithm problem in elliptic curve groups are fully exponential.

3.9 Quadratic Residues and Quadratic Reciprocity

Let p be a prime number. Here is a simple mathematical question: How can Bob tell whether a given number a is equal to a square modulo p? For example, suppose that Alice asks Bob whether 181 is a square modulo 1223. One way for Bob to answer Alice's question is by constructing a table of squares modulo 1223 as illustrated in Table 3.8, but this is a lot of work, so he gave up after computing 96² mod 1223. Alice picked up the computation where Bob stopped and eventually found that 437² ≡ 181 (mod 1223). Thus the answer to her question is that 181 is indeed a square modulo 1223. Similarly, if Alice is sufficiently motivated to continue the table all the way up to 1222² mod 1223, she can verify that the number 385 is not a square modulo 1223, because it does not appear in her table.
(In fact, Alice can save half her time by computing only up to 611² mod 1223, since a² and (p − a)² have the same values modulo p.) Our goal in this section is to describe a much more efficient way to check whether a number is a square modulo a prime. We begin with a definition.

Definition. Let p be an odd prime number and let a be a number with p ∤ a. We say that a is a quadratic residue modulo p if a is a square modulo p, i.e., if there is a number c so that c² ≡ a (mod p). If a is not a square modulo p, i.e., if there exists no such c, then a is called a quadratic nonresidue modulo p.

Example 3.60. The numbers 968 and 1203 are both quadratic residues modulo 1223, since

    453² ≡ 968 (mod 1223) and 375² ≡ 1203 (mod 1223).
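The brute-force table test that Alice and Bob used amounts to one set comprehension. This is fine for a small prime like 1223 but hopeless at cryptographic sizes, which is what motivates the better methods developed in this section:

```python
p = 1223
# Alice's full table of squares mod p, built in one pass.
squares = {c * c % p for c in range(1, p)}

print(181 in squares)   # True:  437^2 ≡ 181 (mod 1223)
print(385 in squares)   # False: 385 is a quadratic nonresidue
print(968 in squares)   # True:  453^2 ≡ 968 (mod 1223)
```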
3. Integer Factorization and RSA

 1² ≡ 1      2² ≡ 4      3² ≡ 9      4² ≡ 16     5² ≡ 25     6² ≡ 36     7² ≡ 49     8² ≡ 64
 9² ≡ 81    10² ≡ 100   11² ≡ 121   12² ≡ 144   13² ≡ 169   14² ≡ 196   15² ≡ 225   16² ≡ 256
17² ≡ 289   18² ≡ 324   19² ≡ 361   20² ≡ 400   21² ≡ 441   22² ≡ 484   23² ≡ 529   24² ≡ 576
25² ≡ 625   26² ≡ 676   27² ≡ 729   28² ≡ 784   29² ≡ 841   30² ≡ 900   31² ≡ 961   32² ≡ 1024
33² ≡ 1089  34² ≡ 1156  35² ≡ 2     36² ≡ 73    37² ≡ 146   38² ≡ 221   39² ≡ 298   40² ≡ 377
41² ≡ 458   42² ≡ 541   43² ≡ 626   44² ≡ 713   45² ≡ 802   46² ≡ 893   47² ≡ 986   48² ≡ 1081
49² ≡ 1178  50² ≡ 54    51² ≡ 155   52² ≡ 258   53² ≡ 363   54² ≡ 470   55² ≡ 579   56² ≡ 690
57² ≡ 803   58² ≡ 918   59² ≡ 1035  60² ≡ 1154  61² ≡ 52    62² ≡ 175   63² ≡ 300   64² ≡ 427
65² ≡ 556   66² ≡ 687   67² ≡ 820   68² ≡ 955   69² ≡ 1092  70² ≡ 8     71² ≡ 149   72² ≡ 292
73² ≡ 437   74² ≡ 584   75² ≡ 733   76² ≡ 884   77² ≡ 1037  78² ≡ 1192  79² ≡ 126   80² ≡ 285
81² ≡ 446   82² ≡ 609   83² ≡ 774   84² ≡ 941   85² ≡ 1110  86² ≡ 58    87² ≡ 231   88² ≡ 406
89² ≡ 583   90² ≡ 762   91² ≡ 943   92² ≡ 1126  93² ≡ 88    94² ≡ 275   95² ≡ 464   96² ≡ 655
  .            .            .            .            .            .            .            .

Table 3.8: Bob's table of squares modulo 1223

On the other hand, the numbers 209 and 888 are quadratic nonresidues modulo 1223, since the congruences

c² ≡ 209 (mod 1223) and c² ≡ 888 (mod 1223)

have no solutions.

The next proposition describes what happens when quadratic residues and nonresidues are multiplied together.

Proposition 3.61. Let p be an odd prime number.
(a) The product of two quadratic residues modulo p is a quadratic residue modulo p.
(b) The product of a quadratic residue and a quadratic nonresidue modulo p is a quadratic nonresidue modulo p.
(c) The product of two quadratic nonresidues modulo p is a quadratic residue modulo p.

Proof. It is easy to prove (a) and (b) directly from the definition of quadratic residue, but we use a different approach that gives all three parts simultaneously.
Let g be a primitive root modulo p as described in Theorem 1.30. This means that the powers 1, g, g², . . . , g^(p−2) are all distinct modulo p. Which powers of g are quadratic residues modulo p? Certainly if m = 2k is even, then g^m = g^(2k) = (g^k)² is a square. On the other hand, let m be odd, say m = 2k + 1, and suppose that g^m is a quadratic residue, say g^m ≡ c² (mod p). Fermat's little theorem (Theorem 1.24) tells us that c^(p−1) ≡ 1 (mod p). However, c^(p−1) (mod p) is also equal to
c^(p−1) ≡ (c²)^((p−1)/2) ≡ (g^m)^((p−1)/2) ≡ (g^(2k+1))^((p−1)/2) ≡ g^(k(p−1)) · g^((p−1)/2) (mod p).

Another application of Fermat's little theorem tells us that

g^(k(p−1)) ≡ (g^(p−1))^k ≡ 1^k ≡ 1 (mod p),

so we find that g^((p−1)/2) ≡ 1 (mod p). This contradicts the fact that g is a primitive root, which proves that every odd power of g is a quadratic nonresidue. We have proven an important dichotomy. If g is a primitive root modulo p, then

g^m is a quadratic residue if m is even,
g^m is a quadratic nonresidue if m is odd.

It is now a simple matter to prove Proposition 3.61. In each case we write a and b as powers of g, multiply a and b by adding their exponents, and read off the result.
(a) Suppose that a and b are quadratic residues. Then a = g^(2i) and b = g^(2j), so ab = g^(2(i+j)) has even exponent, and hence ab is a quadratic residue.
(b) Let a be a quadratic residue and let b be a nonresidue. Then a = g^(2i) and b = g^(2j+1), so ab = g^(2(i+j)+1) has odd exponent, and hence ab is a quadratic nonresidue.
(c) Finally, let a and b both be nonresidues. Then a = g^(2i+1) and b = g^(2j+1), so ab = g^(2(i+j+1)) has even exponent, and hence ab is a quadratic residue.

If we write QR to denote a quadratic residue and NR to denote a quadratic nonresidue, then Proposition 3.61 may be succinctly summarized by the three equations

QR · QR = QR,  QR · NR = NR,  NR · NR = QR.

Do these equations look familiar? They resemble the rules for multiplying 1 and −1. This observation leads to the following definition.

Definition. Let p be an odd prime. The Legendre symbol of a is the quantity (a/p) defined by the rules

(a/p) =  1  if a is a quadratic residue modulo p,
        −1  if a is a quadratic nonresidue modulo p,
         0  if p | a.

With this definition, Proposition 3.61 is summarized by the simple multiplication rule10

(a/p)(b/p) = (ab/p).  (3.33)

10 Proposition 3.61 deals only with the case that p ∤ a and p ∤ b. But if p divides a or b, then p also divides ab, so both sides of (3.33) are zero.

We also make the obvious, but useful, observation that

if a ≡ b (mod p), then (a/p) = (b/p).  (3.34)

Thus in computing (a/p), we may reduce a modulo p into the interval from 0 to p − 1. It is worth adding a cautionary note: The notation for the Legendre symbol resembles a fraction, but it is not a fraction!

Returning to our original question of determining whether a given number is a square modulo p, the following beautiful and powerful theorem provides a method for determining the answer.

Theorem 3.62 (Quadratic Reciprocity). Let p and q be odd primes.
(a) (−1/p) = 1 if p ≡ 1 (mod 4), and −1 if p ≡ 3 (mod 4).
(b) (2/p) = 1 if p ≡ 1 or 7 (mod 8), and −1 if p ≡ 3 or 5 (mod 8).
(c) (p/q) = (q/p) if p ≡ 1 (mod 4) or q ≡ 1 (mod 4), and
    (p/q) = −(q/p) if p ≡ 3 (mod 4) and q ≡ 3 (mod 4).

Proof. We do not give a proof of quadratic reciprocity, but you will find a proof in any introductory number theory textbook, such as [35, 52, 59, 100, 111].

The name "quadratic reciprocity" comes from property (c), which tells us how (p/q) is related to its "reciprocal" (q/p). It is worthwhile spending some time contemplating Theorem 3.62, because despite the simplicity of its statement, quadratic reciprocity is saying something quite unexpected and profound. The value of (p/q) tells us whether p is a square modulo q. Similarly, (q/p) tells us whether q is a square modulo p. There is no a priori reason to suspect that these questions should have anything to do with one another. Quadratic reciprocity tells us that they are intimately related, and indeed related by a very simple rule. Similarly, parts (a) and (b) of quadratic reciprocity give us some surprising information. The first part says that the question of whether −1 is a square modulo p is answered by the congruence class of p modulo 4, and the second part says that the question of whether 2 is a square modulo p is answered by the congruence class of p modulo 8.

We indicated earlier that quadratic reciprocity can be used to determine whether a is a square modulo p. The way to apply quadratic reciprocity is to use (c) to repeatedly flip the Legendre symbol, where each time that we flip, we're allowed to reduce the top number modulo the bottom number. This
leads to a rapid reduction in the size of the numbers, as illustrated by the following example.

Example 3.63. We check whether −15750 is a quadratic residue modulo 37907 by using quadratic reciprocity to compute the Legendre symbol (−15750/37907):

(−15750/37907) = (−1/37907)(15750/37907)            Multiplication rule (3.33)
  = −(15750/37907)                                  Quadratic Reciprocity 3.62(a), since 37907 ≡ 3 (mod 4)
  = −(2 · 3² · 5³ · 7 / 37907)                      factoring 15750
  = −(2/37907)(3/37907)²(5/37907)³(7/37907)         Multiplication rule (3.33)
  = −(2/37907)(5/37907)(7/37907)                    since (3/37907)² = 1 and (5/37907)³ = (5/37907)
  = (5/37907)(7/37907)                              Quadratic Reciprocity 3.62(b), since 37907 ≡ 3 (mod 8)
  = −(37907/5)(37907/7)                             Quadratic Reciprocity 3.62(c)
  = −(2/5)(2/7)                                     since 37907 ≡ 2 (mod 5) and 37907 ≡ 2 (mod 7)
  = −(−1) × 1                                       Quadratic Reciprocity 3.62(b)
  = 1.

Thus (−15750/37907) = 1, so we conclude that −15750 is a square modulo 37907. Note that our computation using Legendre symbols does not tell us how to solve c² ≡ −15750 (mod 37907); it tells us only that there is a solution. For those who are curious, we mention that c = 10982 is a solution.

Example 3.63 shows how quadratic reciprocity can be used to evaluate the Legendre symbol. However, you may have noticed that in the middle of our calculation, we needed to factor the number 15750. We were lucky that 15750 is easy to factor, but suppose that we were faced with a more difficult factorization problem. For example, suppose that we want to determine whether p = 228530738017 is a square modulo q = 9365449244297. It turns out that both p and q are prime.11 Hence we can use quadratic reciprocity to compute
(228530738017/9365449244297) = (9365449244297/228530738017)   since 228530738017 ≡ 1 (mod 4),
  = (224219723617/228530738017)   reducing 9365449244297 modulo 228530738017.

11 If you don't believe that p and q are prime, use Miller–Rabin (Table 3.2) to check.
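The footnote's suggestion is easy to carry out by machine. Here is a minimal sketch of the Miller–Rabin test (the function name is ours; Table 3.2 in the book gives its own formulation), which declares n composite as soon as a Miller–Rabin witness is found:

```python
import random

def is_probable_prime(n, rounds=20):
    # Trial division by a few small primes first.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a Miller-Rabin witness, so n is composite
    return True           # no witness found; n is probably prime
```

On the numbers of this example it confirms that 228530738017 and 9365449244297 are (probably) prime, while 224219723617, the number we are about to get stuck on, is composite.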
Unfortunately, the number 224219723617 is not prime, so we cannot apply quadratic reciprocity directly, and even more unfortunately, it is not an easy number to factor (by hand). So it appears that quadratic reciprocity is useful only if the intermediate calculations lead to numbers that we are able to factor. Luckily, there is a fancier version of quadratic reciprocity that completely eliminates this difficulty. In order to state it, we need to generalize the definition of the Legendre symbol.

Definition. Let a and b be integers and let b be odd and positive. Suppose that the factorization of b into primes is

b = p1^e1 · p2^e2 · p3^e3 · · · pt^et.

The Jacobi symbol (a/b) is defined by the formula

(a/b) = (a/p1)^e1 (a/p2)^e2 (a/p3)^e3 · · · (a/pt)^et.

Notice that if b is itself prime, then (a/b) is the original Legendre symbol, so the Jacobi symbol is a generalization of the Legendre symbol. Also note that we define the Jacobi symbol only for odd positive values of b.

Example 3.64. Here is a simple example of a Jacobi symbol, computed directly from the definition:

(123/323) = (123/17)(123/19) = (4/17)(9/19) = 1.

Here is a more complicated example:

(171337608/536134436237) = (171337608/(29³ · 59 · 67² · 83))
  = (171337608/29)³ (171337608/59)(171337608/67)²(171337608/83)
  = (171337608/29)(171337608/59)(171337608/83)
  = (11/29)(15/59)(44/83)
  = (−1) · 1 · 1 = −1.

From the definition, it appears that we need to know how to factor b in order to compute the Jacobi symbol (a/b), so we haven't gained anything. However, it turns out that the Jacobi symbol inherits most of the properties of the Legendre symbol, which will allow us to compute (a/b) extremely rapidly, without doing any factorization at all. We start with the basic multiplication and reduction properties.
Proposition 3.65. Let a, a1, a2, b, b1, b2 be integers with b, b1, and b2 positive and odd.
(a) (a1a2/b) = (a1/b)(a2/b) and (a/b1b2) = (a/b1)(a/b2).
(b) If a1 ≡ a2 (mod b), then (a1/b) = (a2/b).

Proof. Both parts of Proposition 3.65 follow easily from the definition of the Jacobi symbol and the corresponding properties (3.33) and (3.34) of the Legendre symbol.

Now we come to the amazing fact that the Jacobi symbol satisfies exactly the same reciprocity law as the Legendre symbol.

Theorem 3.66 (Quadratic Reciprocity: Version II). Let a and b be odd positive integers.
(a) (−1/b) = 1 if b ≡ 1 (mod 4), and −1 if b ≡ 3 (mod 4).
(b) (2/b) = 1 if b ≡ 1 or 7 (mod 8), and −1 if b ≡ 3 or 5 (mod 8).
(c) (a/b) = (b/a) if a ≡ 1 (mod 4) or b ≡ 1 (mod 4), and
    (a/b) = −(b/a) if a ≡ 3 (mod 4) and b ≡ 3 (mod 4).

Proof. It is not hard to use the original version of quadratic reciprocity for the Legendre symbol (Theorem 3.62) to prove the more general version for the Jacobi symbol. See for example [59, Proposition 5.2.2] or [137, Theorem 22.2].

Example 3.67. When we tried to use the original version of quadratic reciprocity (Theorem 3.62) to compute (228530738017/9365449244297), we ran into the problem that we needed to factor the number 224219723617. Using the new and improved version of quadratic reciprocity (Theorem 3.66), we can perform the computation without doing any factoring:

(228530738017/9365449244297) = (9365449244297/228530738017) = (224219723617/228530738017)
  = (228530738017/224219723617) = (4311014400/224219723617) = (2^10 · 4209975/224219723617)
  = (224219723617/4209975) = (665092/4209975) = (2² · 166273/4209975)
  = (4209975/166273) = (53150/166273) = (2 · 26575/166273) = (26575/166273)
  = (166273/26575) = (6823/26575) = −(26575/6823) = −(6106/6823) = −(2 · 3053/6823)
  = −(3053/6823) = −(6823/3053) = −(717/3053) = −(3053/717) = −(185/717)
  = −(717/185) = −(162/185) = −(2 · 81/185) = −(81/185) = −(185/81)
  = −(23/81) = −(81/23) = −(12/23) = −(2² · 3/23) = (23/3) = (2/3) = −1.

Hence 228530738017 is not a square modulo 9365449244297.

Remark 3.68. Suppose that (a/b) = 1, where b is some odd positive number. Does the fact that (a/b) = 1 tell us that a is a square modulo b? It does if b is prime, since that's how we defined the Legendre symbol, but what if b is composite? For example, suppose that b = pq is a product of two primes. Then by definition,

(a/b) = (a/pq) = (a/p)(a/q).

We see that there are two ways in which (a/b) can be equal to 1, namely 1 = 1 · 1 and 1 = (−1) · (−1). This leads to two different cases:

Case 1: (a/p) = (a/q) = 1, so a is a square modulo pq.
Case 2: (a/p) = (a/q) = −1, so a is not a square modulo pq.

We should justify our assertion that a is a square modulo pq in Case 1. Note that in Case 1, there are solutions to c1² ≡ a (mod p) and c2² ≡ a (mod q). We use the Chinese remainder theorem (Theorem 2.24) to find an integer c satisfying c ≡ c1 (mod p) and c ≡ c2 (mod q), and then c² ≡ a (mod pq). Our conclusion is that if b = pq is a product of two primes, then although it is easy to compute the value of the Jacobi symbol (a/pq), this value does not tell us whether a is a square modulo pq. This dichotomy can be exploited for cryptographic purposes, as explained in the next section.

Example 3.69 (An application of quadratic reciprocity to the discrete logarithm problem). Let p be an odd prime, let g ∈ F_p^* be a primitive root, and let h ∈ F_p^*. As we have discussed, it is in general a difficult problem to compute the discrete logarithm log_g(h), i.e., to solve g^x = h. But one might ask whether it is possible to easily extract some information about log_g(h). The answer is yes, since we claim that

(−1)^(log_g(h)) = (h/p).  (3.35)

Thus the Legendre symbol (h/p) determines whether log_g(h) is odd or even, and quadratic reciprocity gives a fast algorithm to compute the value of (h/p).
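The flip-and-reduce evaluation of Theorem 3.66 is easy to mechanize. Here is a minimal sketch (the function name is ours); since it works with Jacobi symbols throughout, no factoring is needed, only the power-of-2 rule (b), the reciprocity flip (c), and reduction (Proposition 3.65(b)):

```python
def jacobi(a, n):
    # Jacobi symbol (a/n) for n an odd positive integer.
    assert n > 0 and n % 2 == 1
    a %= n
    t = 1
    while a != 0:
        # Pull out factors of 2 using Theorem 3.66(b): (2/n) = -1 iff n ≡ 3, 5 (mod 8).
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                t = -t
        # Reciprocity flip, Theorem 3.66(c): sign changes iff both ≡ 3 (mod 4).
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            t = -t
        a %= n
    return t if n == 1 else 0

print(jacobi(123, 323))                      # 1, as in Example 3.64
print(jacobi(228530738017, 9365449244297))   # -1, as in Example 3.67
```

Each loop iteration at least halves the numbers involved, so the running time is logarithmic in the inputs, matching the "rapid reduction in size" seen in the worked examples.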
In order to prove (3.35), we note that while proving Proposition 3.61, we showed that g^r is a quadratic residue if r is even and that g^r is a quadratic nonresidue if r is odd. Taking r = log_g(h) gives (3.35). In fancier terminology, one says that the 0th bit of the discrete logarithm is insecure. See Exercise 3.40 for a generalization.

3.10 Probabilistic Encryption and the Goldwasser–Micali Cryptosystem

Suppose that Alice wants to use a public key cryptosystem to encrypt and send Bob 1 bit, i.e., Alice wants to send Bob one of the values 0 and 1. At first glance such an arrangement seems inherently insecure. All that Eve has to do is to encrypt the two possible plaintexts m = 0 and m = 1, and then she compares the encryptions with Alice's ciphertext. More generally, in any cryptosystem for which the set of possible plaintexts is small, Eve can encrypt every plaintext using Bob's public key until she finds the one that is Alice's.

Probabilistic encryption was invented by Goldwasser and Micali as a way around this problem. The idea is that Alice chooses both a plaintext m and a random string of data r, and then she uses Bob's public key to encrypt the pair (m, r). Ideally, as r varies over all of its possible values, the ciphertexts for (m, r) will vary "randomly" over the possible ciphertexts. More precisely, for any fixed m1 and m2 and for varying r, the distributions of values of the two quantities

e(m1, r) = the ciphertext for plaintext m1 and random string r,
e(m2, r) = the ciphertext for plaintext m2 and random string r,

should be essentially indistinguishable. Note that it is not necessary that Bob be able to recover the full pair (m, r) when he performs the decryption. He needs to recover only the plaintext m. This abstract idea is clear, but how might one create a probabilistic encryption scheme in practice?
Goldwasser and Micali describe one such scheme, which, although impractical, since it encrypts only 1 bit at a time, has the advantage of being quite simple to describe and analyze. The idea is based on the difficulty of the following problem. Let p and q be (secret) prime numbers and let N = pq be given. For a given integer a, determine whether a is a square modulo N, i.e., determine whether there exists an integer u satisfying u² ≡ a (mod N). Note that Bob, who knows how to factor N = pq, is able to solve this problem very easily, since a is a square modulo pq if and only if (a/p) = 1 and (a/q) = 1.
Eve, on the other hand, has a harder time, since she knows only the value of N. Eve can compute (a/N), but as we noted earlier (Remark 3.68), this does not tell her whether a is a square modulo N. Goldwasser and Micali exploit this fact12 to create the probabilistic public key cryptosystem described in Table 3.9.

Key creation (Bob):
  Choose secret primes p and q.
  Choose a with (a/p) = (a/q) = −1.
  Publish N = pq and a.
Encryption (Alice):
  Choose plaintext m ∈ {0, 1}.
  Choose random r with 1 < r < N.
  Use Bob's public key (N, a) to compute
    c = r² mod N    if m = 0,
    c = ar² mod N   if m = 1.
  Send ciphertext c to Bob.
Decryption (Bob):
  Compute (c/p). Decrypt to
    m = 0 if (c/p) = 1,
    m = 1 if (c/p) = −1.

Table 3.9: Goldwasser–Micali probabilistic public key cryptosystem

It is easy to check that the Goldwasser–Micali cryptosystem works as advertised, since

(c/p) = (r/p)² = 1                       if m = 0,
(c/p) = (ar²/p) = (a/p)(r/p)² = −1       if m = 1.

Further, since Alice chooses r randomly, the set of values that Eve sees when Alice encrypts m = 0 consists of all possible squares modulo N, and the set of values that Eve sees when Alice encrypts m = 1 consists of all possible numbers c satisfying (c/N) = 1 that are not squares modulo N.

12 Goldwasser and Micali were not the first to use the problem of squares modulo pq for cryptography. Indeed, an early public key cryptosystem due to Rabin that is provably secure against chosen plaintext attacks (assuming the hardness of factorization) relies on this problem.
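The scheme in Table 3.9 takes only a few lines to implement. A minimal sketch (function names are ours), where Bob's decryption computes the Legendre symbol (c/p) via Euler's criterion, c^((p−1)/2) ≡ (c/p) (mod p), a standard fact not proved in this excerpt:

```python
def legendre(a, p):
    # Euler's criterion: a^((p-1)/2) mod p is 1 for a quadratic residue
    # and p - 1 (i.e., -1) for a nonresidue, for an odd prime p with p ∤ a.
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def gm_encrypt(m, r, N, a):
    # Alice encrypts the single bit m with her random value 1 < r < N.
    return (r * r) % N if m == 0 else (a * r * r) % N

def gm_decrypt(c, p):
    # Bob uses the secret prime p: squares decrypt to 0, nonsquares to 1.
    return 0 if legendre(c, p) == 1 else 1

# The key from Example 3.70 below: (a/p) = (a/q) = -1.
p, q, a = 2309, 5651, 6283665
N = p * q
print(gm_decrypt(gm_encrypt(0, 1642087, N, a), p))    # 0
print(gm_decrypt(gm_encrypt(1, 11200984, N, a), p))   # 1
```

Note that the decryption function never looks at r; as remarked above, Bob only needs to recover the plaintext bit m, not the pair (m, r).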
What information does Eve obtain if she computes the Jacobi symbol (c/N), which she can do since N is a public quantity? If m = 0, then c ≡ r² (mod N), so

(c/N) = (r²/N) = (r/N)² = 1.

On the other hand, if m = 1, then c ≡ ar² (mod N), so

(c/N) = (ar²/N) = (a/N)(r/N)² = (a/N) = (a/pq) = (a/p)(a/q) = (−1) · (−1)

is also equal to 1. (Note that Bob chose a to satisfy (a/p) = (a/q) = −1.) Thus (c/N) is equal to 1 regardless of the value of m, so the Jacobi symbol gives Eve no useful information.

Example 3.70. Bob creates a Goldwasser–Micali public key by choosing

p = 2309, q = 5651, N = pq = 13048159, a = 6283665.

Note that a has the property that (a/p) = (a/q) = −1. He publishes the pair (N, a) and keeps the values of the primes p and q secret.

Alice begins by sending Bob the plaintext bit m = 0. To do this, she chooses r = 1642087 at random from the interval 1 to 13048158. She then computes

c ≡ r² ≡ 1642087² ≡ 8513742 (mod 13048159),

and sends the ciphertext c = 8513742 to Bob. Bob decrypts the ciphertext c = 8513742 by computing (8513742/2309) = 1, which gives the plaintext bit m = 0.

Next Alice decides to send Bob the plaintext bit m = 1. She chooses a random value r = 11200984 and computes

c ≡ ar² ≡ 6283665 · 11200984² ≡ 2401627 (mod 13048159).

Bob decrypts c = 2401627 by computing (2401627/2309) = −1, which tells him that the plaintext bit is m = 1.

Finally, Alice wants to send Bob another plaintext bit m = 1. She chooses the random value r = 11442423 and computes

c ≡ ar² ≡ 6283665 · 11442423² ≡ 4099266 (mod 13048159).

Notice that the ciphertext for this encryption of m = 1 is completely unrelated to the previous encryption of m = 1. Bob decrypts c = 4099266 by computing (4099266/2309) = −1 to conclude that the plaintext bit is m = 1.

Remark 3.71. The Goldwasser–Micali public key cryptosystem is not practical, because each bit of the plaintext is encrypted with a number modulo N. For it to be secure, it is necessary that Eve be unable to factor the number N = pq, so in practice N will be (at least) a 1000-bit number. Thus if Alice wants to send k bits of plaintext to Bob, her ciphertext will be 1000k bits long.
Thus the Goldwasser–Micali public key cryptosystem has a message expansion ratio of 1000, since the ciphertext is 1000 times as long as the plaintext. In general, the Goldwasser–Micali public key cryptosystem expands a message by a factor of log2(N).

There are other probabilistic public key cryptosystems whose message expansion is much smaller. Indeed, we have already seen one: the random element k used by the Elgamal public key cryptosystem (Sect. 2.4) makes Elgamal a probabilistic cryptosystem. Elgamal has a message expansion ratio of 2, as explained in Remark 2.9. Later, in Sect. 7.10, we will see another probabilistic cryptosystem called NTRU. More generally, it is possible, and indeed usually desirable, to take a deterministic cryptosystem such as RSA and turn it into a probabilistic system, even at the cost of increasing its message expansion ratio. (See Exercise 3.43 and Sect. 8.6.)

Exercises

Section 3.1. Euler's Theorem and Roots Modulo pq

3.1. Solve the following congruences.
(a) x^19 ≡ 36 (mod 97).
(b) x^137 ≡ 428 (mod 541).
(c) x^73 ≡ 614 (mod 1159).
(d) x^751 ≡ 677 (mod 8023).
(e) x^38993 ≡ 328047 (mod 401227). (Hint. 401227 = 607 · 661.)

3.2. This exercise investigates what happens if we drop the assumption that gcd(e, p − 1) = 1 in Proposition 3.2. So let p be a prime, let c ≢ 0 (mod p), let e ≥ 1, and consider the congruence

x^e ≡ c (mod p).  (3.36)

(a) Prove that if (3.36) has one solution, then it has exactly gcd(e, p − 1) distinct solutions. (Hint. Use the primitive root theorem (Theorem 1.30), combined with the extended Euclidean algorithm (Theorem 1.11) or Exercise 1.27.)
(b) For how many nonzero values of c (mod p) does the congruence (3.36) have a solution?

3.3. Let p and q be distinct primes and let e and d be positive integers satisfying

de ≡ 1 (mod (p − 1)(q − 1)).

Suppose further that c is an integer with gcd(c, pq) > 1.
Prove that x ≡ c^d (mod pq) is a solution to the congruence x^e ≡ c (mod pq), thereby completing the proof of Proposition 3.5.
3.4. Recall from Sect. 1.3 that Euler's phi function φ(N) is the function defined by

φ(N) = #{0 ≤ k < N : gcd(k, N) = 1}.

In other words, φ(N) is the number of integers between 0 and N − 1 that are relatively prime to N, or equivalently, the number of elements in Z/NZ that have inverses modulo N.
(a) Compute the values of φ(6), φ(9), φ(15), and φ(17).
(b) If p is prime, what is the value of φ(p)?
(c) Prove Euler's formula

a^φ(N) ≡ 1 (mod N) for all integers a satisfying gcd(a, N) = 1.

(Hint. Mimic the proof of Fermat's little theorem (Theorem 1.24), but instead of looking at all of the multiples of a as was done in (1.8), just take the multiples ka of a for values of k satisfying gcd(k, N) = 1.)

3.5. Euler's phi function has many beautiful properties.
(a) If p and q are distinct primes, how is φ(pq) related to φ(p) and φ(q)?
(b) If p is prime, what is the value of φ(p²)? How about φ(p^j)? Prove that your formula for φ(p^j) is correct. (Hint. Among the numbers between 0 and p^j − 1, remove the ones that have a factor of p. The ones that are left are relatively prime to p.)
(c) Let M and N be integers satisfying gcd(M, N) = 1. Prove the multiplication formula

φ(MN) = φ(M)φ(N).

(d) Let p1, p2, . . . , pr be the distinct primes that divide N. Use your results from (b) and (c) to prove the following formula:

φ(N) = N ∏_{i=1}^{r} (1 − 1/p_i).

(e) Use the formula in (d) to compute the following values of φ(N).
(i) φ(1728). (ii) φ(1575). (iii) φ(889056) (Hint. 889056 = 2^5 · 3^4 · 7^3).

3.6. Let N, c, and e be positive integers satisfying the conditions gcd(N, c) = 1 and gcd(e, φ(N)) = 1.
(a) Explain how to solve the congruence x^e ≡ c (mod N), assuming that you know the value of φ(N). (Hint. Use the formula in Exercise 3.4(c).)
(b) Solve the following congruences. (The formula in Exercise 3.5(d) may be helpful for computing the value of φ(N).)
(i) x^577 ≡ 60 (mod 1463).
(ii) x^959 ≡ 1583 (mod 1625).
(iii) x^133957 ≡ 224689 (mod 2134440).
Section 3.2. The RSA Public Key Cryptosystem

3.7. Alice publishes her RSA public key: modulus N = 2038667 and exponent e = 103.
(a) Bob wants to send Alice the message m = 892383. What ciphertext does Bob send to Alice?
(b) Alice knows that her modulus factors into a product of two primes, one of which is p = 1301. Find a decryption exponent d for Alice.
(c) Alice receives the ciphertext c = 317730 from Bob. Decrypt the message.

3.8. Bob's RSA public key has modulus N = 12191 and exponent e = 37. Alice sends Bob the ciphertext c = 587. Unfortunately, Bob has chosen too small a modulus. Help Eve by factoring N and decrypting Alice's message. (Hint. N has a factor smaller than 100.)

3.9. For each of the given values of N = pq and (p − 1)(q − 1), use the method described in Remark 3.11 to determine p and q.
(a) N = pq = 352717 and (p − 1)(q − 1) = 351520.
(b) N = pq = 77083921 and (p − 1)(q − 1) = 77066212.
(c) N = pq = 109404161 and (p − 1)(q − 1) = 109380612.
(d) N = pq = 172205490419 and (p − 1)(q − 1) = 172204660344.

3.10. A decryption exponent for an RSA public key (N, e) is an integer d with the property that a^de ≡ a (mod N) for all integers a that are relatively prime to N.
(a) Suppose that Eve has a magic box that creates decryption exponents for (N, e) for a fixed modulus N and for a large number of different encryption exponents e. Explain how Eve can use her magic box to try to factor N.
(b) Let N = 38749709. Eve's magic box tells her that the encryption exponent e = 10988423 has decryption exponent d = 16784693 and that the encryption exponent e = 25910155 has decryption exponent d = 11514115. Use this information to factor N.
(c) Let N = 225022969. Eve's magic box tells her the following three encryption/decryption pairs for N: (70583995, 4911157), (173111957, 7346999), (180311381, 29597249). Use this information to factor N.
(d) Let N = 1291233941.
Eve’s magic box tells her the following three encryp- tion/decryption pairs for N: (1103927639, 76923209), (1022313977, 106791263), (387632407, 7764043). Use this information to factor N. 3.11. Here is an example of a public key system that was proposed at a cryptography conference. It was designed to be more efficient than RSA. Alice chooses two large primes p and q and she publishes N = pq. It is assumed that N is hard to factor. Alice also chooses three random numbers g, r1, and r2 modulo N and computes g1 ≡ gr1(p−1) (mod N) and g2 ≡ gr2(q−1) (mod N).
Her public key is the triple (N, g1, g2) and her private key is the pair of primes (p, q). Now Bob wants to send the message m to Alice, where m is a number modulo N. He chooses two random integers s1 and s2 modulo N and computes

c1 ≡ m · g1^s1 (mod N) and c2 ≡ m · g2^s2 (mod N).

Bob sends the ciphertext (c1, c2) to Alice. Decryption is extremely fast and easy. Alice uses the Chinese remainder theorem to solve the pair of congruences

x ≡ c1 (mod p) and x ≡ c2 (mod q).

(a) Prove that Alice's solution x is equal to Bob's plaintext m.
(b) Explain why this cryptosystem is not secure.

Section 3.3. Implementation and Security Issues

3.12. Formulate a man-in-the-middle attack, similar to the attack described in Example 3.13 on page 126, for the following public key cryptosystems.
(a) The Elgamal public key cryptosystem (Table 2.3 on page 72).
(b) The RSA public key cryptosystem (Table 3.1 on page 123).

3.13. Alice decides to use RSA with the public key N = 1889570071. In order to guard against transmission errors, Alice has Bob encrypt his message twice, once using the encryption exponent e1 = 1021763679 and once using the encryption exponent e2 = 519424709. Eve intercepts the two encrypted messages c1 = 1244183534 and c2 = 732959706. Assuming that Eve also knows N and the two encryption exponents e1 and e2, use the method described in Example 3.15 to help Eve recover Bob's plaintext without finding a factorization of N.

Section 3.4. Primality Testing

3.14. We stated that the number 561 is a Carmichael number, but we never checked that a^561 ≡ a (mod 561) for every value of a.
(a) The number 561 factors as 3 · 11 · 17. First use Fermat's little theorem to prove that

a^561 ≡ a (mod 3), a^561 ≡ a (mod 11), and a^561 ≡ a (mod 17)

for every value of a. Then explain why these three congruences imply that a^561 ≡ a (mod 561) for every value of a.
(b) Mimic the idea used in (a) to prove that each of the following numbers is a Carmichael number.
(To assist you, we have factored each number into primes.) (i) 1729 = 7 · 13 · 19 (ii) 10585 = 5 · 29 · 73 (iii) 75361 = 11 · 13 · 17 · 31 (iv) 1024651 = 19 · 199 · 271 (c) Prove that a Carmichael number must be odd.
(d) Prove that a Carmichael number must be a product of distinct primes.
(e) Look up Korselt's criterion in a book or online, write a brief description of how it works, and use it to show that 29341 = 13 · 37 · 61 and 172947529 = 307 · 613 · 919 are Carmichael numbers.

3.15. Use the Miller–Rabin test on each of the following numbers. In each case, either provide a Miller–Rabin witness for the compositeness of n, or conclude that n is probably prime by providing 10 numbers that are not Miller–Rabin witnesses for n.
(a) n = 1105. (Yes, 5 divides n, but this is just a warm-up exercise!)
(b) n = 294409
(c) n = 294439
(d) n = 118901509
(e) n = 118901521
(f) n = 118901527
(g) n = 118915387

3.16. Looking back at Exercise 3.10, let's suppose that for a given N, the magic box can produce only one decryption exponent. Equivalently, suppose that an RSA key pair has been compromised and that the private decryption exponent corresponding to the public encryption exponent has been discovered. Show how the basic idea in the Miller–Rabin primality test can be applied to use this information to factor N.

3.17. The function π(X) counts the number of primes between 2 and X.
(a) Compute the values of π(20), π(30), and π(100).
(b) Write a program to compute π(X) and use it to compute π(X) and the ratio π(X)/(X/ln(X)) for X = 100, X = 1000, X = 10000, and X = 100000. Does your list of ratios make the prime number theorem plausible?

3.18. Let

π1(X) = (# of primes p between 2 and X satisfying p ≡ 1 (mod 4)),
π3(X) = (# of primes p between 2 and X satisfying p ≡ 3 (mod 4)).

Thus every prime other than 2 gets counted by either π1(X) or by π3(X).
(a) Compute the values of π1(X) and π3(X) for each of the following values of X.
(i) X = 10. (ii) X = 25. (iii) X = 100.
(b) Write a program to compute π1(X) and π3(X) and use it to compute their values and the ratio π3(X)/π1(X) for X = 100, X = 1000, X = 10000, and X = 100000.
(c) Based on your data from (b), make a conjecture about the relative sizes of π1(X) and π3(X). Which one do you think is larger? What do you think is the limit of the ratio π3(X)/π1(X) as X → ∞?

3.19. We noted in Sect. 3.4 that it really makes no sense to say that the number n has probability 1/ln(n) of being prime. Any particular number that you choose either will be prime or will not be prime; there are no numbers that are 35 % prime and 65 % composite! In this exercise you will prove a result that gives a more sensible meaning to the statement that a number has a certain probability of being prime. You may use the prime number theorem (Theorem 3.21) for this problem.
(a) Fix a (large) number N and suppose that Bob chooses a random number n in the interval (1/2)N ≤ n ≤ (3/2)N. If he repeats this process many times, prove that approximately 1/ln(N) of his numbers will be prime. More precisely, define
P(N) = (number of primes between (1/2)N and (3/2)N) / (number of integers between (1/2)N and (3/2)N)
     = (probability that an integer n in the interval (1/2)N ≤ n ≤ (3/2)N is a prime number),

and prove that

lim_{N→∞} P(N) / (1/ln(N)) = 1.

This shows that if N is large, then P(N) is approximately 1/ln(N).
(b) More generally, fix two numbers c1 and c2 satisfying c2 > c1 > 0. Bob chooses random numbers n in the interval c1N ≤ n ≤ c2N. Keeping c1 and c2 fixed, let

P(c1, c2; N) = (probability that an integer n in the interval c1N ≤ n ≤ c2N is a prime number).

In the following formula, fill in the box with a simple function of N so that the statement is true:

lim_{N→∞} P(c1, c2; N) · ▢ = 1.

3.20. Continuing with the previous exercise, explain how to make mathematical sense of the following statements.
(a) A randomly chosen odd number N has probability 2/ln(N) of being prime. (What is the probability that a randomly chosen even number is prime?)
(b) A randomly chosen number N satisfying N ≡ 1 (mod 3) has probability 3/(2 ln(N)) of being prime.
(c) A randomly chosen number N satisfying N ≡ 1 (mod 6) has probability 3/ln(N) of being prime.
(d) Let m = p1p2 · · · pr be a product of distinct primes and let k be a number satisfying gcd(k, m) = 1. What number should go into the box to make statement (3.37) correct? Why?

A randomly chosen number N satisfying N ≡ k (mod m) has probability ▢/ln(N) of being prime.  (3.37)

(e) Same question, but for arbitrary m, not just for m that are products of distinct primes.

3.21. The logarithmic integral function Li(X) is defined to be

Li(X) = ∫₂^X dt/ln t.

(a) Prove that

Li(X) = X/ln X + ∫₂^X dt/(ln t)² + O(1).

(Hint. Integration by parts.)
(b) Compute the limit

  lim_{X→∞} Li(X) / (X/ln X).

(Hint. Break the integral in (a) into two pieces, 2 ≤ t ≤ √X and √X ≤ t ≤ X, and estimate each piece separately.)
(c) Use (b) to show that formula (3.12) on page 135 implies the prime number theorem (Theorem 3.21).

Section 3.5. Pollard's p − 1 Factorization Algorithm

3.22. Use Pollard's p − 1 method to factor each of the following numbers.
(a) n = 1739   (b) n = 220459   (c) n = 48356747
Be sure to show your work and to indicate which prime factor p of n has the property that p − 1 is a product of small primes.

3.23. A prime of the form 2^n − 1 is called a Mersenne prime.
(a) Factor each of the numbers 2^n − 1 for n = 2, 3, …, 10. Which ones are Mersenne primes?
(b) Find the first seven Mersenne primes. (You may need a computer.)
(c) If n is even and n > 2, prove that 2^n − 1 is not prime.
(d) If 3 | n and n > 3, prove that 2^n − 1 is not prime.
(e) More generally, prove that if n is a composite number, then 2^n − 1 is not prime. Thus all Mersenne primes have the form 2^p − 1 with p a prime number.
(f) What is the largest known Mersenne prime? Are there any larger primes known? (You can find out at the "Great Internet Mersenne Prime Search" web site www.mersenne.org/prime.htm.)
(g) Write a one page essay on Mersenne primes, starting with the discoveries of Father Mersenne and ending with GIMPS.

Section 3.6. Factorization via Difference of Squares

3.24. For each of the following numbers N, compute the values of

  N + 1^2, N + 2^2, N + 3^2, N + 4^2, …

as we did in Example 3.34 until you find a value N + b^2 that is a perfect square a^2. Then use the values of a and b to factor N.
(a) N = 53357   (b) N = 34571   (c) N = 25777   (d) N = 64213

3.25. For each of the listed values of N, k, and b_init, factor N by making a list of values of k·N + b^2, starting at b = b_init and incrementing b until k·N + b^2 is a perfect square. Then take greatest common divisors as we did in Example 3.35.
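For readers who want to experiment, Exercise 3.22 can be attacked with a short script. The following is a minimal sketch of Pollard's p − 1 method (the function name and the exponent bound are our own choices, not from the text): it computes a = 2^(j!) mod n for increasing j and watches for a nontrivial gcd.

```python
from math import gcd

def pollard_p_minus_1(n, bound=100):
    # Compute a = 2^(j!) mod n for j = 2, 3, ..., bound.
    # If p | n and p - 1 divides j!, then a ≡ 1 (mod p),
    # so gcd(a - 1, n) picks up the factor p.
    a = 2
    for j in range(2, bound + 1):
        a = pow(a, j, n)
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
    return None
```

For instance, pollard_p_minus_1(1739) returns the factor 37, whose predecessor 36 = 2^2 · 3^2 is a product of small primes, while the cofactor 47 has 47 − 1 = 2 · 23.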
(a) N = 143041, k = 247, b_init = 1
(b) N = 1226987, k = 3, b_init = 36
(c) N = 2510839, k = 21, b_init = 90

3.26. For each part, use the data provided to find values of a and b satisfying a^2 ≡ b^2 (mod N), and then compute gcd(N, a − b) in order to find a nontrivial factor of N, as we did in Examples 3.37 and 3.38.
(a) N = 61063
  1882^2 ≡ 270 (mod 61063) and 270 = 2 · 3^3 · 5
  1898^2 ≡ 60750 (mod 61063) and 60750 = 2 · 3^5 · 5^3
(b) N = 52907
  399^2 ≡ 480 (mod 52907) and 480 = 2^5 · 3 · 5
  763^2 ≡ 192 (mod 52907) and 192 = 2^6 · 3
  773^2 ≡ 15552 (mod 52907) and 15552 = 2^6 · 3^5
  976^2 ≡ 250 (mod 52907) and 250 = 2 · 5^3
(c) N = 198103
  1189^2 ≡ 27000 (mod 198103) and 27000 = 2^3 · 3^3 · 5^3
  1605^2 ≡ 686 (mod 198103) and 686 = 2 · 7^3
  2378^2 ≡ 108000 (mod 198103) and 108000 = 2^5 · 3^3 · 5^3
  2815^2 ≡ 105 (mod 198103) and 105 = 3 · 5 · 7
(d) N = 2525891
  1591^2 ≡ 5390 (mod 2525891) and 5390 = 2 · 5 · 7^2 · 11
  3182^2 ≡ 21560 (mod 2525891) and 21560 = 2^3 · 5 · 7^2 · 11
  4773^2 ≡ 48510 (mod 2525891) and 48510 = 2 · 3^2 · 5 · 7^2 · 11
  5275^2 ≡ 40824 (mod 2525891) and 40824 = 2^3 · 3^6 · 7
  5401^2 ≡ 1386000 (mod 2525891) and 1386000 = 2^4 · 3^2 · 5^3 · 7 · 11

Section 3.7. Smooth Numbers, Sieves, and Building Relations for Factorization

3.27. Compute the following values of ψ(X, B), the number of B-smooth numbers between 2 and X (see page 150).
(a) ψ(25, 3)   (b) ψ(35, 5)   (c) ψ(50, 7)   (d) ψ(100, 5)   (e) ψ(100, 7)

3.28. An integer M is called B-power-smooth if every prime power p^e dividing M satisfies p^e ≤ B. For example, 180 = 2^2 · 3^2 · 5 is 10-power-smooth, since the largest prime power dividing 180 is 9, which is smaller than 10.
(a) Suppose that M is B-power-smooth. Prove that M is also B-smooth.
(b) Suppose that M is B-smooth. Is it always true that M is also B-power-smooth? Either prove that it is true or give an example for which it is not true.
(c) The following is a list of 20 randomly chosen numbers between 1 and 1000, sorted from smallest to largest. Which of these numbers are 10-power-smooth? Which of them are 10-smooth?

  {84, 141, 171, 208, 224, 318, 325, 366, 378, 390, 420, 440, 504, 530, 707, 726, 758, 765, 792, 817}
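Returning to Exercise 3.26: once the exponent vectors of the chosen relations sum to an even vector, multiplying the congruences gives a^2 ≡ b^2 (mod N), and gcd(N, a − b) exposes a factor. A small sketch of that combining step (the helper name is ours, not from the text):

```python
from math import gcd, isqrt

def factor_from_relations(N, relations):
    # relations: list of pairs (x, y) with x^2 ≡ y (mod N),
    # chosen so that the product of the y's is a perfect square.
    a = 1
    y_prod = 1
    for x, y in relations:
        a = a * x % N
        y_prod *= y
    b = isqrt(y_prod)
    assert b * b == y_prod, "product of y's is not a perfect square"
    return gcd(N, a - b)
```

With the data of part (a), both exponent vectors involve only 2, 3, 5 and their product 270 · 60750 = 2^2 · 3^8 · 5^4 is a square, so factor_from_relations(61063, [(1882, 270), (1898, 60750)]) yields the factor 227 of 61063 = 227 · 269.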
(d) Prove that M is B-power-smooth if and only if M divides the least common multiple of 1, 2, …, B. (The least common multiple of a list of numbers k_1, …, k_r is the smallest number K that is divisible by every number in the list.)

3.29. Let L(N) = e^√((ln N)(ln ln N)) as usual. Suppose that a computer does one billion operations per second.
(a) How many seconds does it take to perform L(2^100) operations?
(b) How many hours does it take to perform L(2^250) operations?
(c) How many days does it take to perform L(2^350) operations?
(d) How many years does it take to perform L(2^500) operations?
(e) How many years does it take to perform L(2^750) operations?
(f) How many years does it take to perform L(2^1000) operations?
(g) How many years does it take to perform L(2^2000) operations?
(For simplicity, you may assume that there are 365.25 days in a year.)

3.30. Prove that the function L(X) = e^√((ln X)(ln ln X)) is subexponential. That is, prove the following two statements.
(a) For every positive constant α, no matter how large, L(X) = Ω((ln X)^α).
(b) For every positive constant β, no matter how small, L(X) = O(X^β).

3.31. For any fixed positive constants a and b, define the function

  F_{a,b}(X) = e^((ln X)^(1/a) · (ln ln X)^(1/b)).

Prove the following properties of F_{a,b}(X).
(a) If a > 1, prove that F_{a,b}(X) is subexponential.
(b) If a = 1, prove that F_{a,b}(X) = Ω(X^α) for every α > 0. Thus F_{a,b}(X) grows faster than every power of X, so one says that F_{a,b}(X) has superpolynomial growth.
(c) What happens if a < 1?

3.32. This exercise asks you to verify an assertion in the proof of Corollary 3.45. Let L(X) be the usual function L(X) = e^√((ln X)(ln ln X)).
(a) Prove that there is a value of ε > 0 such that

  (ln X)^ε < ln L(X) < (ln X)^(1−ε)   for all X > 10.

(b) Let c > 0, let Y = L(X)^c, and let u = (ln X)/(ln Y). Prove that

  u^(−u) = L(X)^(−(1/(2c))(1+o(1))).

3.33. Proposition 3.48 assumes that we choose random numbers a modulo N, compute a^2 (mod N), and check whether the result is B-smooth. We can achieve better results if we take values for a of the form a = ⌊√N⌋ + k for 1 ≤ k ≤ K. (For simplicity, you may treat K as a fixed integer, independent of N. More rigorously, it is necessary to take K equal to a power of L(N), which has a small effect on the final answer.)
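The function L(N) appearing throughout Exercises 3.29–3.33 is easy to explore numerically. A possible sketch for the running-time questions of Exercise 3.29 (function names are ours; we assume, as the exercise does, a machine performing 10^9 operations per second; working with ln L avoids floating-point overflow for large exponents):

```python
import math

def lnL_of_pow2(bits):
    # ln L(2^bits), where L(N) = exp(sqrt(ln N * ln ln N)).
    ln_n = bits * math.log(2.0)
    return math.sqrt(ln_n * math.log(ln_n))

def seconds_needed(bits, rate=1e9):
    # Time to perform L(2^bits) operations at `rate` operations per second.
    return math.exp(lnL_of_pow2(bits)) / rate
```

Dividing by 3600, by 86400, or by 86400 · 365.25 converts the answers to hours, days, or years, illustrating how slowly a subexponential function grows compared with 2^bits itself.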
(a) Prove that a^2 − N ≤ 2K√N + K^2, so in particular, a^2 (mod N) is smaller than a multiple of √N.
(b) Prove that L(√N) ≈ L(N)^(1/√2) by showing that

  lim_{N→∞} log L(√N) / log L(N)^(1/√2) = 1.

More generally, prove that in the same sense, L(N^(1/r)) ≈ L(N)^(1/√r) for any fixed r > 0.
(c) Re-prove Proposition 3.48 using this better choice of values for a. Set B = L(N)^c and find the optimal value of c. Approximately how many relations are needed to factor N?

3.34. Illustrate the quadratic sieve, as was done in Fig. 3.3 (page 161), by sieving prime powers up to B on the values of F(T) = T^2 − N in the indicated range.
(a) Sieve N = 493 using prime powers up to B = 11 on values from F(23) to F(38). Use the relation(s) that you find to factor N.
(b) Extend the computations in (a) by using prime powers up to B = 16 and sieving values from F(23) to F(50). What additional value(s) are sieved down to 1, and what additional relation(s) do they yield?

3.35. Let Z[β] be the ring described in Example 3.55, i.e., β is a root of f(x) = 1 + 3x − 2x^3 + x^4. For each of the following pairs of elements u, v ∈ Z[β], compute the sum u + v and the product uv. Your answers should involve only powers of β up to β^3.
(a) u = −5 − 2β + 9β^2 − 9β^3 and v = 2 + 9β − 7β^2 + 7β^3.
(b) u = 9 + 9β + 6β^2 − 5β^3 and v = −4 − 6β − 2β^2 − 5β^3.
(c) u = 6 − 5β + 3β^2 + 3β^3 and v = −2 + 7β + 6β^2.

Section 3.8. The Index Calculus and Discrete Logarithms

3.36. This exercise asks you to use the index calculus to solve a discrete logarithm problem. Let p = 19079 and g = 17.
(a) Verify that g^i (mod p) is 5-smooth for each of the values i = 3030, i = 6892, and i = 18312.
(b) Use your computations in (a) and linear algebra to compute the discrete logarithms log_g(2), log_g(3), and log_g(5). (Note that 19078 = 2 · 9539 and that 9539 is prime.)
(c) Verify that 19 · 17^(−12400) (mod p) is 5-smooth.
(d) Use the values from (b) and the computation in (c) to solve the discrete logarithm problem 17^x ≡ 19 (mod 19079).

Section 3.9. Quadratic Residues and Quadratic Reciprocity

3.37. Let p be an odd prime and let a be an integer with p ∤ a.
(a) Prove that a^((p−1)/2) is congruent to either 1 or −1 modulo p.
(b) Prove that a^((p−1)/2) is congruent to 1 modulo p if and only if a is a quadratic residue modulo p. (Hint. Let g be a primitive root for p and use the fact, proven during the course of proving Proposition 3.61, that g^m is a quadratic residue if and only if m is even.)
(c) Prove that a^((p−1)/2) ≡ (a/p) (mod p), where (a/p) denotes the Legendre symbol. (This holds even if p | a.)
(d) Use (c) to prove Theorem 3.62(a), that is, prove that

  (−1/p) = 1 if p ≡ 1 (mod 4),  and  (−1/p) = −1 if p ≡ 3 (mod 4).

3.38. Prove that the three parts of the quadratic reciprocity theorem (Theorem 3.62) are equivalent to the following three concise formulas, where p and q are odd primes:
(a) (−1/p) = (−1)^((p−1)/2)
(b) (2/p) = (−1)^((p^2−1)/8)
(c) (p/q)·(q/p) = (−1)^(((p−1)/2)·((q−1)/2))

3.39. Let p be a prime satisfying p ≡ 3 (mod 4).
(a) Let a be a quadratic residue modulo p. Prove that the number

  b ≡ a^((p+1)/4) (mod p)

has the property that b^2 ≡ a (mod p). (Hint. Write (p+1)/2 as 1 + (p−1)/2 and use Exercise 3.37.) This gives an easy way to take square roots modulo p for primes that are congruent to 3 modulo 4.
(b) Use (a) to compute the following square roots modulo p. Be sure to check your answers.
  (i) Solve b^2 ≡ 116 (mod 587).
  (ii) Solve b^2 ≡ 3217 (mod 8627).
  (iii) Solve b^2 ≡ 9109 (mod 10663).

3.40. Let p be an odd prime, let g ∈ F_p^* be a primitive root, and let h ∈ F_p^*. Write p − 1 = 2^s·m with m odd and s ≥ 1, and write the binary expansion of log_g(h) as

  log_g(h) = ε_0 + 2·ε_1 + 4·ε_2 + 8·ε_3 + ···   with ε_0, ε_1, … ∈ {0, 1}.

Give an algorithm that generalizes Example 3.69 and allows you to rapidly compute ε_0, ε_1, …, ε_{s−1}, thereby proving that the first s bits of the discrete logarithm are insecure. You may assume that you have a fast algorithm to compute square roots in F_p^*, as provided for example by Exercise 3.39(a) if p ≡ 3 (mod 4). (Hint. Use Example 3.69 to compute the 0th bit, take the square root of either h or g^(−1)·h, and repeat.)

3.41. Let p be a prime satisfying p ≡ 1 (mod 3). We say that a is a cubic residue modulo p if p ∤ a and there is an integer c satisfying a ≡ c^3 (mod p).
(a) Let a and b be cubic residues modulo p. Prove that ab is a cubic residue modulo p.
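The square-root recipe of Exercise 3.39(a) is a single modular exponentiation. A sketch (valid only for p ≡ 3 (mod 4); the function name is ours):

```python
def sqrt_mod(a, p):
    # For p ≡ 3 (mod 4): if a is a quadratic residue mod p,
    # then b = a^((p+1)/4) mod p satisfies b^2 ≡ a (mod p).
    assert p % 4 == 3
    b = pow(a, (p + 1) // 4, p)
    if b * b % p != a % p:
        raise ValueError("a is not a quadratic residue modulo p")
    return b
```

For Exercise 3.39(b)(i), sqrt_mod(116, 587) produces one square root of 116 modulo 587; the other root is its negative modulo 587.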
(b) Give an example to show that (unlike the case with quadratic residues) it is possible for none of a, b, and ab to be a cubic residue modulo p.
(c) Let g be a primitive root modulo p. Prove that a is a cubic residue modulo p if and only if 3 | log_g(a), where log_g(a) is the discrete logarithm of a to the base g.
(d) Suppose instead that p ≡ 2 (mod 3). Prove that for every integer a there is an integer c satisfying a ≡ c^3 (mod p). In other words, if p ≡ 2 (mod 3), show that every number is a cube modulo p.

Section 3.10. Probabilistic Encryption and the Goldwasser–Micali Cryptosystem

3.42. Perform the following encryptions and decryptions using the Goldwasser–Micali public key cryptosystem (Table 3.9).
(a) Bob's public key is the pair N = 1842338473 and a = 1532411781. Alice encrypts 3 bits and sends Bob the ciphertext blocks 1794677960, 525734818, and 420526487. Decrypt Alice's message using the factorization N = pq = 32411 · 56843.
(b) Bob's public key is N = 3149 and a = 2013. Alice encrypts 3 bits and sends Bob the ciphertext blocks 2322, 719, and 202. Unfortunately, Bob used primes that are much too small. Factor N and decrypt Alice's message.
(c) Bob's public key is N = 781044643 and a = 568980706. Encrypt the 3 bits 1, 1, 0 using, respectively, the three random values r = 705130839, r = 631364468, and r = 67651321.

3.43. Suppose that the plaintext space M of a certain cryptosystem is the set of bit strings of length 2b. Let e_k and d_k be the encryption and decryption functions associated with a key k ∈ K. This exercise describes one method of turning the original cryptosystem into a probabilistic cryptosystem. Most practical cryptosystems that are currently in use rely on more complicated variants of this idea in order to thwart certain types of attacks. (See Sect. 8.6 for further details.) Alice sends Bob an encrypted message by performing the following steps:
1. Alice chooses a b-bit message m′ to be encrypted.
2. Alice chooses a string r consisting of b random bits.
3. Alice sets m = r ∥ (r ⊕ m′), where ∥ denotes concatenation¹³ and ⊕ denotes exclusive or (see Sect. 1.7.4). Notice that m has length 2b bits.
4. Alice computes c = e_k(m) and sends the ciphertext c to Bob.
(a) Explain how Bob decrypts Alice's message and recovers the plaintext m′. We assume, of course, that Bob knows the decryption function d_k.
(b) If the plaintexts and the ciphertexts of the original cryptosystem have the same length, what is the message expansion ratio of the new probabilistic cryptosystem?
(c) More generally, if the original cryptosystem has a message expansion ratio of μ, what is the message expansion ratio of the new probabilistic cryptosystem?

¹³The concatenation of two bit strings is formed by placing the first string before the second string. For example, 1101 ∥ 1001 is the bit string 11011001.
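The construction in Exercise 3.43 is easy to demonstrate with integers standing in for bit strings (a sketch; the function names are ours):

```python
import secrets

def wrap(m_prime, b):
    # Build the 2b-bit plaintext m = r || (r XOR m'),
    # where r is a fresh random b-bit string.
    r = secrets.randbits(b)
    return (r << b) | (r ^ m_prime)

def unwrap(m, b):
    # Recover m': the top b bits of m are r, the bottom b bits are r XOR m'.
    r = m >> b
    return r ^ (m & ((1 << b) - 1))
```

Encrypting wrap(m', b) with the original e_k and applying unwrap after decryption recovers m'; each encryption of the same m' produces a different plaintext m, which is exactly the point of the construction.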
Chapter 4
Digital Signatures

4.1 What Is a Digital Signature?

Encryption schemes, whether symmetric or asymmetric, solve the problem of secure communications over an insecure network. Digital signatures solve a different problem, analogous to the purpose of a pen-and-ink signature on a physical document. It is thus interesting that the tools used to construct digital signatures are very similar to the tools used to construct asymmetric ciphers.

Here is the exact problem that a digital signature is supposed to solve. Samantha¹ has a (digital) document D, for example a computer file, and she wants to create some additional piece of information D_Sam that can be used to prove conclusively that Samantha herself approves of the document. So you might view Samantha's digital signature D_Sam as analogous to her actual signature on an ordinary paper document.

To contrast the purpose and functionality of public key (asymmetric) cryptosystems versus digital signatures, we consider an analogy using bank deposit vaults and signet rings. A bank deposit vault has a narrow slot (the "public encryption key") into which anyone can deposit an envelope, but only the owner of the combination (the "private decryption key") to the vault's lock is able to open the vault and read the message. Thus a public key cryptosystem is a digital version of a bank deposit vault. A signet ring (the "private signing key") is a ring that has a recessed image. The owner drips some wax from a candle onto his document and presses the ring into the wax to make an impression (the "public signature"). Anyone who looks at the document can verify that the wax impression was made by the owner of the signet ring, but only the owner of the ring is able to create valid impressions.² Thus one may view a digital signature system as a modern version of a signet ring.

¹In this chapter we give Alice and Bob a well deserved rest and let Samantha, the signer, and Victor, the verifier, take over cryptographic duties.

© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography, Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2_4

[Figure 4.1: The two components of a digital signature scheme. The signing algorithm takes a digital document D and the private key K_Pri and produces a digital signature D^sig; the verification algorithm takes D, D^sig, and the public key K_Pub and returns TRUE if D signed by K_Pri is D^sig, and FALSE otherwise.]

Despite their different purposes, digital signature schemes are similar to asymmetric cryptosystems in that they involve public and private keys and invoke algorithms that use these keys. Here is an abstract description of the pieces that make up a digital signature scheme:

K_Pri   A private signing key.
K_Pub   A public verification key.
Sign    A signing algorithm that takes as input a digital document D and a private key K_Pri and returns a signature D^sig for D.
Verify  A verification algorithm that takes as input a digital document D, a signature D^sig, and a public key K_Pub. The algorithm returns True if D^sig is a signature for D associated to the private key K_Pri, and otherwise it returns False.

The operation of a digital signature scheme is depicted in Fig. 4.1. An important point to observe in Fig. 4.1 is that the verification algorithm does not know the private key K_Pri when it determines whether D signed by K_Pri is equal to D^sig. The verification algorithm has access only to the public key K_Pub.

²Back in the days when interior illumination was by candlelight, sealing documents with signet rings was a common way to create unforgeable signatures. In today's world, with its plentiful machine tools, signet rings and wax images obviously would not provide much security.

It is not difficult to produce (useless) algorithms that satisfy the digital signature properties. For example, let K_Pub = K_Pri. What is difficult is to
create a digital signature scheme in which the owner of the private key K_Pri is able to create valid signatures, but knowledge of the public key K_Pub does not reveal the private key K_Pri. Necessary general conditions for a secure digital signature scheme include the following:
• Given K_Pub, an attacker cannot feasibly determine K_Pri, nor can she determine any other private key that produces the same signatures as K_Pri.
• Given K_Pub and a list of signed documents D_1, …, D_n with their signatures D_1^sig, …, D_n^sig, an attacker cannot feasibly determine a valid signature on any document D that is not in the list D_1, …, D_n.
The second condition is rather different from the situation for encryption schemes. In public key encryption, an attacker can create as many ciphertext/plaintext pairs as she wants, since she can create them using the known public key. However, each time a digital signature scheme is used to sign a new document, it reveals a new document/signature pair, which provides new information to an attacker. The second condition says that the attacker gains nothing beyond knowledge of that new pair. An attack on a digital signature scheme that makes use of a large number of known signatures is called a transcript attack. (See Sect. 7.12 for further discussion.)

Remark 4.1. Digital signatures are at least as important as public key cryptosystems for the conduct of business in a digital age, and indeed, one might argue that they are of greater importance. To take a significant instance, your computer undoubtedly receives program and system upgrades over the Internet. How can your computer tell that an upgrade comes from a legitimate source, in this case the company that wrote the program in the first place? The answer is a digital signature. The original program comes equipped with the company's public verification key.
The company uses its private signing key to sign the upgrade and sends your computer both the new program and the signature. Your computer can use the public key to verify the signature, thereby verifying that the program comes from a trusted source, before installing it on your system.

We must stress, however, that although this conveys the idea of how a digital signature might be used, it is a vastly oversimplified explanation. Real-world applications of digital signature schemes require considerable care to avoid a variety of subtle, but fatal, security problems. In particular, as digital signatures proliferate, it can become problematic to be sure that a purported public verification key actually belongs to the supposed owner. And clearly an adversary who tricks you into using her verification key, instead of the real one, will then be able to convince you to accept all of her forged documents.

Remark 4.2. The natural capability of most digital signature schemes is to sign only a small amount of data, say b bits, where b is between 80 and 1000. It is thus quite inefficient to sign a large digital document D, both because it takes a lot of time to sign each b bits of D and because the resulting digital signature is likely to be as large as the original document.
The standard solution to this problem is to use a hash function, which is an easily computable function

  Hash : (arbitrary size documents) → {0, 1}^k

that is very hard to invert. (More generally, one wants it to be very difficult to find two distinct inputs D and D′ whose outputs Hash(D) and Hash(D′) are the same.) Then, rather than signing her document D, Samantha instead computes and signs the hash Hash(D). For verification, Victor computes and verifies the signature on Hash(D).

There are also security advantages to signing a hash of D, including intrinsically linking the signature to the entire document, and preventing an adversary from choosing random signatures and determining which documents they sign. For a brief introduction to hash functions and references for further reading, see Sect. 8.1. We will not concern ourselves further with such issues in this chapter.

Remark 4.3. There are many variants of the basic digital signature paradigm. For example, a blinded signature is one in which the signer does not know the contents of the document being signed. This could be useful, for example, if voters want an election official to sign their votes without revealing what those votes are. Further material on blinded signatures, with an RSA-style example and applications to digital cash systems, is given in Sect. 8.8.

In this chapter we discuss digital signature schemes whose underlying hard problems are integer factorization and the discrete logarithm problem in F_p^*. Subsequent chapters include descriptions of digital signature schemes based on the discrete logarithm problem in elliptic curve groups (Sect. 6.4.3) and on hard lattice problems (Sect. 7.12).

4.2 RSA Digital Signatures

The original RSA paper described both the RSA encryption scheme and an RSA digital signature scheme. The idea is very simple.
The setup is the same as for RSA encryption: Samantha chooses two large secret primes p and q, and she publishes their product N = pq and a public verification exponent e. Samantha uses her knowledge of the factorization of N to solve the congruence

  de ≡ 1 (mod (p − 1)(q − 1)).   (4.1)

Note that if Samantha were doing RSA encryption, then e would be her encryption exponent and d would be her decryption exponent. However, in the present setup d is her signing exponent and e is her verification exponent. In order to sign a digital document D, which we assume to be an integer in the range 1 < D < N, Samantha computes

  S ≡ D^d (mod N).

Table 4.1: RSA digital signatures
  Key creation (Samantha): Choose secret primes p and q. Choose verification exponent e with gcd(e, (p − 1)(q − 1)) = 1. Publish N = pq and e.
  Signing (Samantha): Compute d satisfying de ≡ 1 (mod (p − 1)(q − 1)). Sign document D by computing S ≡ D^d (mod N).
  Verification (Victor): Compute S^e mod N and verify that it is equal to D.

Victor verifies the validity of the signature S on D by computing S^e mod N and checking that it is equal to D. This process works because Euler's formula (Theorem 3.1) tells us that

  S^e ≡ D^(de) ≡ D (mod N).

The RSA digital signature scheme is summarized in Table 4.1.

If Eve can factor N, then she can solve (4.1) for Samantha's secret signing key d. However, just as with RSA encryption, the hard problem underlying RSA digital signatures is not directly the problem of factorization. In order to forge a signature on a document D, Eve needs to find an eth root of D modulo N. This is identical to the hard problem underlying RSA decryption, in which the plaintext is the eth root of the ciphertext.

Remark 4.4. As with RSA encryption, one can gain a bit of efficiency by choosing d and e to satisfy

  de ≡ 1 (mod (p − 1)(q − 1) / gcd(p − 1, q − 1)).

Theorem 3.1 ensures that the verification step still works.

Example 4.5. We illustrate the RSA digital signature scheme with a small numerical example.
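In code, the scheme of Table 4.1 looks as follows (a toy sketch: real implementations sign Hash(D) rather than D and use far larger primes; the helper names are ours):

```python
from math import gcd

def rsa_sig_keys(p, q, e):
    # Return (N, d): the public modulus and the private signing exponent
    # solving d*e ≡ 1 (mod (p-1)(q-1)), congruence (4.1).
    phi = (p - 1) * (q - 1)
    assert gcd(e, phi) == 1
    return p * q, pow(e, -1, phi)

def rsa_sign(D, d, N):
    return pow(D, d, N)           # S ≡ D^d (mod N)

def rsa_verify(D, S, e, N):
    return pow(S, e, N) == D % N  # check S^e ≡ D (mod N)
```

Running it on the parameters of Example 4.5 below (p = 1223, q = 1987, e = 948047, D = 1070777) reproduces Samantha's computation, and changing a single digit of D makes verification fail.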
RSA Signature Key Creation
• Samantha chooses two secret primes p = 1223 and q = 1987 and computes her public modulus

  N = p · q = 1223 · 1987 = 2430101.

• Samantha chooses a public verification exponent e = 948047 with the property that

  gcd(e, (p − 1)(q − 1)) = gcd(948047, 2426892) = 1.

RSA Signing
• Samantha computes her private signing key d using the secret values of p and q to compute (p − 1)(q − 1) = 1222 · 1986 = 2426892 and then solving the congruence

  ed ≡ 1 (mod (p − 1)(q − 1)),  i.e.,  948047 · d ≡ 1 (mod 2426892).

She finds that d = 1051235.
• Samantha selects a digital document to sign, D = 1070777 with 1 ≤ D < N. She computes the digital signature

  S ≡ D^d (mod N),  S ≡ 1070777^1051235 ≡ 153337 (mod 2430101).

• Samantha publishes the document and signature D = 1070777 and S = 153337.

RSA Verification
• Victor uses Samantha's public modulus N and verification exponent e to compute S^e mod N,

  153337^948047 ≡ 1070777 (mod 2430101).

He verifies that the value of S^e modulo N is the same as the value of the digital document D = 1070777.

4.3 Elgamal Digital Signatures and DSA

The transition from RSA encryption to RSA digital signatures, as described in Sect. 4.2, is quite straightforward. This is not true for discrete logarithm based encryption schemes such as Elgamal (Sect. 2.4). An Elgamal-style digital signature scheme was put forward in 1985, and a modified version called the Digital Signature Algorithm (DSA), which allows shorter signatures, was proposed in 1991 and officially published as a national Digital Signature Standard (DSS) in 1994; see [98]. We start with the Elgamal scheme, which is easier to understand, and then explain how DSA works.

Samantha, or some trusted third party, chooses a large prime p and a primitive root g modulo p. Samantha next chooses a secret signing exponent a and computes A ≡ g^a (mod p). The quantity A, together with the public parameters p and g, forms Samantha's public verification key. Suppose now that Samantha wants to sign a digital document D, where D is an integer satisfying 1 ≤ D < p. She chooses a random element 1 < k < p satisfying gcd(k, p − 1) = 1 and computes the two quantities

  S_1 ≡ g^k (mod p)  and  S_2 ≡ (D − a·S_1)·k^(−1) (mod p − 1).   (4.2)

Notice that S_2 is computed modulo p − 1, not modulo p. Samantha's digital signature on the document D is the pair (S_1, S_2). Victor verifies the signature by checking that

  A^(S_1) · S_1^(S_2) mod p  is equal to  g^D mod p.

The Elgamal digital signature algorithm is illustrated in Table 4.2. Why does Elgamal work? When Victor computes A^(S_1) · S_1^(S_2), he is actually computing

  A^(S_1) · S_1^(S_2) ≡ g^(a·S_1) · g^(k·S_2) ≡ g^(a·S_1 + k·S_2) ≡ g^(a·S_1 + k·(D − a·S_1)·k^(−1)) ≡ g^(a·S_1 + (D − a·S_1)) ≡ g^D (mod p),

so verification returns TRUE for a valid signature. Notice the significance of choosing S_2 modulo p − 1. The quantity S_2 appears as an exponent of g, and we know that g^(p−1) ≡ 1 (mod p), so in the expression g^(S_2) mod p, we may replace S_2 by any quantity that is congruent to S_2 modulo p − 1.

If Eve knows how to solve the discrete logarithm problem, then she can solve g^a ≡ A (mod p) for Samantha's private signing key a, and thence can forge Samantha's signature. However, it is not at all clear that this is the only way to forge an Elgamal signature. Eve's task is as follows. Given the values of A and g^D, Eve must find integers x and y satisfying

  A^x · x^y ≡ g^D (mod p).   (4.3)

The congruence (4.3) is a rather curious one, because the variable x appears as both a base and an exponent. Using discrete logarithms to the base g, we can rewrite (4.3) as

  log_g(A)·x + y·log_g(x) ≡ D (mod p − 1).   (4.4)
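The signing equations (4.2) and the verification check A^(S_1) · S_1^(S_2) ≡ g^D (mod p) translate directly into code (a sketch; the function names are ours):

```python
def elgamal_sign(D, a, k, p, g):
    # Equation (4.2): S1 = g^k mod p,
    # S2 = (D - a*S1) * k^{-1} mod (p - 1); requires gcd(k, p-1) = 1.
    S1 = pow(g, k, p)
    S2 = (D - a * S1) * pow(k, -1, p - 1) % (p - 1)
    return S1, S2

def elgamal_verify(D, S1, S2, A, p, g):
    # Accept iff A^S1 * S1^S2 ≡ g^D (mod p).
    return pow(A, S1, p) * pow(S1, S2, p) % p == pow(g, D, p)
```

Checking it against Example 4.7 below (p = 21739, g = 7, a = 15140, k = 10727, D = 5331), the verification succeeds, and it fails for any other document.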
Table 4.2: The Elgamal digital signature algorithm
  Public parameter creation: A trusted party chooses and publishes a large prime p and a primitive root g modulo p.
  Key creation (Samantha): Choose secret signing key 1 ≤ a ≤ p − 1. Compute A ≡ g^a (mod p). Publish the verification key A.
  Signing (Samantha): Choose document D mod p. Choose random element 1 < k < p satisfying gcd(k, p − 1) = 1. Compute signature S_1 ≡ g^k (mod p) and S_2 ≡ (D − a·S_1)·k^(−1) (mod p − 1).
  Verification (Victor): Compute A^(S_1) · S_1^(S_2) mod p. Verify that it is equal to g^D mod p.

If Eve can solve the discrete logarithm problem, she can take an arbitrary value for x, compute log_g(A) and log_g(x), and then solve (4.4) for y. At present, this is the only known method for finding a solution to (4.4).

Remark 4.6. There are many subtleties associated to using an ostensibly secure digital signature scheme such as Elgamal. See Exercises 4.7 and 4.8 for some examples of what can go wrong.

Example 4.7. Samantha chooses the prime p = 21739 and primitive root g = 7. She selects the secret signing key a = 15140 and computes her public verification key

  A ≡ g^a ≡ 7^15140 ≡ 17702 (mod 21739).

She signs the digital document D = 5331 using the random element k = 10727 by computing

  S_1 ≡ g^k ≡ 7^10727 ≡ 15775 (mod 21739),
  S_2 ≡ (D − a·S_1)·k^(−1) ≡ (5331 − 15140 · 15775) · 6353 ≡ 791 (mod 21738).

Samantha publishes the signature (S_1, S_2) = (15775, 791) and the digital document D = 5331. Victor verifies the signature by computing

  A^(S_1) · S_1^(S_2) ≡ 17702^15775 · 15775^791 ≡ 13897 (mod 21739)

and verifying that it agrees with

  g^D ≡ 7^5331 ≡ 13897 (mod 21739).

An Elgamal signature (S_1, S_2) consists of one number modulo p and one number modulo p − 1, so has length approximately 2·log_2(p) bits. In order to be secure against index calculus attacks on the discrete logarithm problem, the prime p is generally taken to be between 1000 and 2000 bits, so signatures are between 2000 and 4000 bits.

The Digital Signature Algorithm (DSA) significantly shortens the signature by working in a subgroup of F_p^* of prime order q. The underlying assumption is that using the index calculus to solve the discrete logarithm problem in the subgroup is no easier than solving it in F_p^*. So it suffices to take a subgroup in which it is infeasible to solve the discrete logarithm problem using a collision algorithm.

We now describe the details of DSA. Samantha, or some trusted third party, chooses two primes p and q with p ≡ 1 (mod q). (In practice, typical choices satisfy 2^1000 < p < 2^2000 and 2^160 < q < 2^320.) She also chooses an element g ∈ F_p^* of exact order q. This is easy to do. For example, she can take g = g_1^((p−1)/q) for a primitive root g_1 in F_p. Samantha chooses a secret exponent a and computes A ≡ g^a (mod p). The quantity A, together with the public parameters (p, q, g), forms Samantha's public verification key.

Suppose now that Samantha wants to sign a digital document D, where D is an integer satisfying 1 ≤ D < q. She chooses a random element k in the range 1 < k < q and computes the two quantities

  S_1 = (g^k mod p) mod q  and  S_2 ≡ (D + a·S_1)·k^(−1) (mod q).   (4.5)

Notice the similarity between (4.5) and the Elgamal signature (4.2). However, there is an important difference, since when computing S_1 in (4.5), Samantha first computes g^k mod p as an integer in the range from 1 to p − 1, and then she reduces modulo q to obtain an integer in the range from 1 to q − 1. Samantha's digital signature on the document D is the pair (S_1, S_2), so the signature consists of two numbers modulo q. Victor verifies the signature by first computing

  V_1 ≡ D·S_2^(−1) (mod q)  and  V_2 ≡ S_1·S_2^(−1) (mod q).

He then checks that
  (g^(V_1) · A^(V_2) mod p) mod q  is equal to  S_1.

Table 4.3: The digital signature algorithm (DSA)
  Public parameter creation: A trusted party chooses and publishes large primes p and q satisfying p ≡ 1 (mod q) and an element g of order q modulo p.
  Key creation (Samantha): Choose secret signing key 1 ≤ a ≤ q − 1. Compute A ≡ g^a (mod p). Publish the verification key A.
  Signing (Samantha): Choose document D mod q. Choose random element 1 < k < q. Compute signature S_1 ≡ (g^k mod p) mod q and S_2 ≡ (D + a·S_1)·k^(−1) (mod q).
  Verification (Victor): Compute V_1 ≡ D·S_2^(−1) (mod q) and V_2 ≡ S_1·S_2^(−1) (mod q). Verify that (g^(V_1)·A^(V_2) mod p) mod q = S_1.

The digital signature algorithm (DSA) is illustrated in Table 4.3. DSA seems somewhat complicated, but it is easy to check that it works. Thus Victor computes

  g^(V_1) · A^(V_2) ≡ g^(D·S_2^(−1)) · g^(a·S_1·S_2^(−1)) (mod p)   since V_1 ≡ D·S_2^(−1), V_2 ≡ S_1·S_2^(−1), and A ≡ g^a,
                    ≡ g^((D + a·S_1)·S_2^(−1)) (mod p)
                    ≡ g^k (mod p)   since S_2 ≡ (D + a·S_1)·k^(−1).

Hence

  (g^(V_1) · A^(V_2) mod p) mod q = (g^k mod p) mod q = S_1.

Example 4.8. We illustrate DSA with a small numerical example. Samantha uses the public parameters p = 48731, q = 443, and g = 5260.
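The DSA computations can be checked mechanically. A sketch following Table 4.3 (the function names are ours; a real implementation would choose k freshly and secretly for each signature):

```python
def dsa_sign(D, a, k, p, q, g):
    # Equation (4.5): S1 = (g^k mod p) mod q,
    # S2 = (D + a*S1) * k^{-1} mod q.
    S1 = pow(g, k, p) % q
    S2 = (D + a * S1) * pow(k, -1, q) % q
    return S1, S2

def dsa_verify(D, S1, S2, A, p, q, g):
    # V1 = D/S2 mod q, V2 = S1/S2 mod q; accept iff
    # (g^V1 * A^V2 mod p) mod q equals S1.
    V1 = D * pow(S2, -1, q) % q
    V2 = S1 * pow(S2, -1, q) % q
    return pow(g, V1, p) * pow(A, V2, p) % p % q == S1
```

Running it with the parameters of Example 4.8 (p = 48731, q = 443, g built from the primitive root 7 as g = 7^((p−1)/q) mod p, a = 242, k = 427, D = 343) reproduces a signature that verifies.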
(The element g was computed as g ≡ 7^(48730/443) ≡ 7^110 (mod 48731), where 7 is a primitive root modulo 48731.) Samantha chooses the secret signing key a = 242 and publishes her public verification key A ≡ 5260^242 ≡ 3438 (mod 48731). She signs the document D = 343 using the random element k = 427 by computing the two quantities S1 = (5260^427 mod 48731) mod 443 = 2717 mod 443 = 59, S2 ≡ (343 + 242 · 59) · 427^−1 ≡ 166 (mod 443). Samantha publishes the signature (S1, S2) = (59, 166) for the document D = 343. Victor verifies the signature by first computing V1 ≡ 343 · 166^−1 ≡ 357 (mod 443) and V2 ≡ 59 · 166^−1 ≡ 414 (mod 443). He then computes g^V1 · A^V2 ≡ 5260^357 · 3438^414 ≡ 2717 (mod 48731) and checks that (g^V1 · A^V2 mod 48731) mod 443 = 2717 mod 443 = 59 is equal to S1 = 59. Both the Elgamal digital signature scheme and DSA can be adapted to other groups in which the discrete logarithm problem is ostensibly more difficult to solve. In particular, the use of elliptic curve groups leads to the Elliptic Curve Digital Signature Algorithm (ECDSA), which is described in Sect. 6.4.3.

Exercises

Section 4.2. RSA Digital Signatures

4.1. Samantha uses the RSA signature scheme with primes p = 541 and q = 1223 and public verification exponent e = 159853.
(a) What is Samantha’s public modulus? What is her private signing key?
(b) Samantha signs the digital document D = 630579. What is the signature?

4.2. Samantha uses the RSA signature scheme with public modulus N = 1562501 and public verification exponent e = 87953. Adam claims that Samantha has signed each of the documents D = 119812, D′ = 161153, D″ = 586036, and that the associated signatures are S = 876453, S′ = 870099, S″ = 602754. Which of these are valid signatures?
4.3. Samantha uses the RSA signature scheme with public modulus and public verification exponent N = 27212325191 and e = 22824469379. Use whatever method you want to factor N, and then forge Samantha’s signature on the document D = 12910258780.

4.4. Suppose that Alice and Bob communicate using the RSA PKC. This means that Alice has a public modulus NA = pA·qA, a public encryption exponent eA, and a private decryption exponent dA, where pA and qA are primes and eA and dA satisfy eA·dA ≡ 1 (mod (pA − 1)(qA − 1)). Similarly, Bob has a public modulus NB = pB·qB, a public encryption exponent eB, and a private decryption exponent dB. In this situation, Alice can simultaneously encrypt and sign a message in the following way. Alice chooses her plaintext m and computes the usual RSA ciphertext c ≡ m^eB (mod NB). She next applies a hash function to her plaintext and uses her private decryption key to compute s ≡ Hash(m)^dA (mod NA). She sends the pair (c, s) to Bob. Bob first decrypts the ciphertext using his private decryption exponent dB, m ≡ c^dB (mod NB). He then uses Alice’s public encryption exponent eA to verify that Hash(m) ≡ s^eA (mod NA). Explain why verification works, and why it would be difficult for anyone other than Alice to send Bob a validly signed message.

Section 4.3. Discrete Logarithm Digital Signatures

4.5. Samantha uses the Elgamal signature scheme with prime p = 6961 and primitive root g = 437.
(a) Samantha’s private signing key is a = 6104. What is her public verification key?
(b) Samantha signs the digital document D = 5584 using the random element k = 4451. What is the signature?

4.6. Samantha uses the Elgamal signature scheme with prime p = 6961 and primitive root g = 437. Her public verification key is A = 4250. Adam claims that Samantha has signed each of the documents D = 1521, D′ = 1837, D″ = 1614, and that the associated signatures are (S1, S2) = (4129, 5575), (S′1, S′2) = (3145, 1871), (S″1, S″2) = (2709, 2994).
Which of these are valid signatures?
4.7. Let p be a prime, let i and j be integers with gcd(j, p − 1) = 1, and let A be arbitrary. Set
S1 ≡ g^i · A^j (mod p), S2 ≡ −S1 · j^−1 (mod p − 1), D ≡ −S1 · i · j^−1 (mod p − 1).
Prove that (S1, S2) is a valid Elgamal signature on the document D for the verification key A. Thus Eve can produce signatures on random documents.

4.8. Suppose that Samantha is using the Elgamal signature scheme and that she is careless and uses the same random element k to sign two documents D and D′.
(a) Explain how Eve can tell at a glance whether Samantha has made this mistake.
(b) If the signature on D is (S1, S2) and the signature on D′ is (S′1, S′2), explain how Eve can recover a, Samantha’s private signing key.
(c) Apply your method from (b) to the following example and recover Samantha’s signing key a, where Samantha is using the prime p = 348149, base g = 113459, and verification key A = 185149.
D = 153405, S1 = 208913, S2 = 209176, D′ = 127561, S′1 = 208913, S′2 = 217800.

4.9. Samantha uses DSA with public parameters (p, q, g) = (22531, 751, 4488). She chooses the secret signing key a = 674.
(a) What is Samantha’s public verification key?
(b) Samantha signs the document D = 244 using the random element k = 574. What is the signature?

4.10. Samantha uses DSA with public parameters (p, q, g) = (22531, 751, 4488). Her public verification key is A = 22476.
(a) Is (S1, S2) = (183, 260) a valid signature on the document D = 329?
(b) Is (S1, S2) = (211, 97) a valid signature on the document D = 432?

4.11. Samantha’s DSA public parameters are (p, q, g) = (103687, 1571, 21947), and her public verification key is A = 31377. Use whatever method you prefer (brute-force, collision, index calculus, . . . ) to solve the DLP and find Samantha’s private signing key. Use her key to sign the document D = 510 using the random element k = 1105.
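The DSA computations in Table 4.3 and Example 4.8 are easy to check by machine. The following Python sketch (a toy for checking hand computations, not hardened cryptographic code) implements the signing and verification steps and runs them on the small parameters of Example 4.8; real DSA uses far larger primes and signs a hash of the document.

```python
# Toy DSA following Table 4.3, run on the parameters of Example 4.8.
# For illustration only; not secure for real use.

def dsa_sign(p, q, g, a, D, k):
    """Sign document D (1 <= D < q) with secret key a and ephemeral k."""
    S1 = pow(g, k, p) % q                      # S1 = (g^k mod p) mod q
    S2 = (D + a * S1) * pow(k, -1, q) % q      # S2 = (D + a*S1) * k^-1 mod q
    return S1, S2

def dsa_verify(p, q, g, A, D, S1, S2):
    """Check that (S1, S2) is a valid signature on D for public key A."""
    S2inv = pow(S2, -1, q)
    V1 = D * S2inv % q
    V2 = S1 * S2inv % q
    return (pow(g, V1, p) * pow(A, V2, p) % p) % q == S1

p, q, g = 48731, 443, 5260    # public parameters of Example 4.8
a = 242                       # Samantha's secret signing key
A = pow(g, a, p)              # public verification key, A = 3438

S1, S2 = dsa_sign(p, q, g, a, D=343, k=427)
print(S1, S2)                                  # prints: 59 166
print(dsa_verify(p, q, g, A, 343, S1, S2))     # prints: True
```

The modular inverse `pow(k, -1, q)` requires Python 3.8 or later; on older versions one would substitute an extended-Euclidean-algorithm helper.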
  • 316. Chapter 5 Combinatorics, Probability, and Information Theory In considering the usefulness and practicality of a cryptographic system, it is necessary to measure its resistance to various forms of attack. Such attacks include simple brute-force searches through the key or message space, some- what faster searches via collision or meet-in-the-middle algorithms, and more sophisticated methods that are used to compute discrete logarithms, factor integers, and find short vectors in lattices. We have already studied some of these algorithms in Chaps. 2 and 3, and we will see the others in this and later chapters. In studying these algorithms, it is important to be able to analyze how long they take to solve the targeted problem. Such an analysis generally requires tools from combinatorics, probability theory, and information theory. In this chapter we present, in a largely self-contained form, an introduction to these topics. We start with basic principles of counting, and continue with the devel- opment of the foundations of probability theory, primarily in the discrete set- ting. Subsequent sections introduce (discrete) random variables, probability density functions, conditional probability and Bayes’s formula. The applica- tions of probability theory to cryptography are legion. We cover in some detail Monte Carlo algorithms and collision algorithms and their uses in cryptogra- phy. We also include a section on the statistical cryptanalysis of a historically interesting polyalphabetic substitution cipher called the Vigenère cipher, but we note that the material on the Vigenère cipher is not used elsewhere in the book, so it may be omitted by the reader who wishes to proceed more rapidly to the more modern cryptographic material. © Springer Science+Business Media New York 2014 J. Hoffstein et al., An Introduction to Mathematical Cryptography, Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2 5 207
The chapter concludes with a very short introduction to the concept of complexity and the notions of polynomial-time and nondeterministic polynomial-time algorithms. This section, if properly developed, would be a book in itself, and we can only give a hint of the powerful ideas and techniques used in this subject.

5.1 Basic Principles of Counting

As I was going to St. Ives, I met a man with seven wives, Each wife had seven sacks, Each sack had seven cats. Each cat had seven kits. Kits, cats, sacks, and wives, How many were going to St. Ives?

The trick answer to this ancient riddle is that there is only one person going to St. Ives, namely the narrator, since all of the other people and animals and objects that he meets in the rhyme are not traveling to St. Ives, they are traveling away from St. Ives! However, if we are in a pedantic, rather than a clever, frame of mind, we might instead ask the natural question: How many people, animals, and objects does the narrator meet? The answer is 2801 = 1 man + 7 wives + 7^2 sacks + 7^3 cats + 7^4 kits. The computation of this number employs basic counting principles that are fundamental to the probability calculations used in cryptography and in many other areas of mathematics. We have already seen an example in Sect. 1.1.1, where we computed the number of different simple substitution ciphers. A cipher is said to be combinatorially secure if it is not feasible to break the system by exhaustively checking every possible key.1 This depends to some extent on how long it takes to check each key, but more importantly, it depends on the number of keys. In this section we develop some basic counting techniques that are used in a variety of ways to analyze the security of cryptographic constructions.

Example 5.1 (A Basic Counting Principle). Bob is at a restaurant that features two appetizers, egg rolls and fried wontons, and 20 main dishes.
As- suming that he plans to order one appetizer and one main dish, how many possible meals could Bob order? We need to count the number of pairs (x, y), where x is either “egg roll” or “fried wonton” and y is a main dish. The total number is obtained by letting x 1Sometimes the length of the search can be significantly shortened by matching pieces of keys taken from two or more lists. Such an attack is called a collision or meet-in-the-middle attack; see Sect. 5.4.
  • 318. 5.1. Basic Principles of Counting 209 vary over the 2 possibilities and letting y vary over the 20 possibilities, and then counting up the total number of pairs (ER, 1), (ER, 2), . . . , (ER, 20), (FW, 1), (FW, 2), . . . , (FW, 20). The answer is that there are 40 possibilities, which we compute as 40 = 2 appetizers · 20 main dishes . In this example, we first counted the number of ways of assigning an appe- tizer (egg roll or fried wonton) to the variable x. It is convenient to view this assignment as the outcome of an experiment. That is, we perform an experi- ment whose outcome is either “egg roll” or “fried wonton,” and we assign the outcome’s value to x. Similarly, we perform a second independent experiment whose possible outcomes are any one of the 20 main courses, and we assign that value to y. The total number of outcomes of the two experiments is the product of the number of outcomes for each one individually. This leads to the following basic counting principle: Basic Counting Principle If two experiments are performed, one of which has n possible outcomes and the other of which has m possible outcomes, then there are nm possible outcomes of performing both experi- ments. More generally, if k independent experiments are performed and if the number of possible outcomes of the ith experiment is ni, then the total number of outcomes for all of the experiments is the product n1n2 · · · nk. It is easy to derive this result by writing xi for the outcome of the ith experiment. Then the outcome of all k experiments is the value of the k-tuple (x1, x2, . . . , xk), and the total number of possible k-tuples is the product n1n2 · · · nk. Example 5.2. Suppose that Bob also wants to order dessert, and that there are five desserts on the menu. We are now counting triples (x, y, z), where x is one of the two appetizers, y is one of the 20 main dishes, and z is one of the five desserts. 
Hence the total number of meals is 200 = 2 appetizers · 20 main courses · 5 desserts. The basic counting principle is used in the solution of the pedantic version of the St. Ives problem. For example, the number of cats traveling from St. Ives is # of cats = 343 = 7^3 = 1 man · 7 wives · 7 sacks · 7 cats. The earliest published version of the St. Ives riddle dates to around 1730, but similar problems date back to antiquity; see Exercise 5.1.
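These counts are easy to reproduce mechanically. The short Python sketch below (illustrative only; the menu names are stand-ins for the examples above) enumerates the tuples described by the basic counting principle and also tallies the pedantic St. Ives total:

```python
from itertools import product

appetizers = ["egg roll", "fried wonton"]           # 2 outcomes
mains = [f"dish {i}" for i in range(1, 21)]         # 20 outcomes
desserts = [f"dessert {i}" for i in range(1, 6)]    # 5 outcomes

# Basic counting principle: outcomes of independent experiments multiply.
meals2 = list(product(appetizers, mains))             # Example 5.1: pairs (x, y)
meals3 = list(product(appetizers, mains, desserts))   # Example 5.2: triples (x, y, z)
print(len(meals2), len(meals3))     # prints: 40 200

# Pedantic St. Ives count: 1 man + 7 wives + 7^2 sacks + 7^3 cats + 7^4 kits.
print(sum(7**i for i in range(5)))  # prints: 2801
```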
  • 319. 210 5. Combinatorics, Probability, and Information Theory 5.1.1 Permutations The numbers 1, 2, . . . , 10 are typically listed in increasing order, but suppose instead we allow the order to be mixed. Then how many different ways are there to list these ten integers? Each possible configuration is called a permu- tation of 1, 2, . . . , 10. The problem of counting the number of possible permu- tations of a given list of objects occurs in many forms and contexts throughout mathematics. Each permutation of 1, 2, . . . , 10 is a sequence of all ten distinct integers in some order. For example, here is a random choice: 8, 6, 10, 3, 9, 2, 4, 7, 5, 1. How can we create all of the possibilities? It’s easiest to create them by listing the numbers one at a time, say from left to right. We thus start by assigning a number to the first position. There are ten choices. Next we assign a number to the second position, but for the second position there are only nine choices, because we already used up one of the integers in the first position. (Remember that we are not allowed to use an integer twice.) Then there are eight integers left as possibilities for the third position, because we already used two integers in the first two positions. And so on. Hence the total number of permutations of 1, 2, . . . , 10 is 10! = 10 · 9 · 8 · · · 2 · 1. The value of 10! is 3628800, so between three and four million. Notice how we are using the basic counting principle. The only subtlety is that the outcome of the first experiment reduces the number of possible outcomes of the second experiment, the results of the first two experiments further reduce the number of possible outcomes of the third experiment, and so on. Definition. Let S be a set containing n distinct objects. A permutation of S is an ordered list of the objects in S. A permutation of the set {1, 2, . . . , n} is simply called a permutation of n. Proposition 5.3. Let S be a set containing n distinct objects. 
Then there are exactly n! different permutations of S. Proof. Our discussion of the permutations of {1, . . . , 10} works in general. Thus suppose that S contains n objects and that we want to create a permu- tation of S. There are n choices for the first entry, then n − 1 choices for the second entry, then n − 2 choices for the third entry, etc. This leads to a total of n · (n − 1) · (n − 2) · · · 2 · 1 possible permutations. Remark 5.4 (Permutations and Simple Substitution Ciphers). By definition, a permutation of the set {a1, a2, . . . , an} is a list consisting of the ai’s in some order. We can also describe a permutation by using a bijective (i.e., one-to-one and onto) function π : {1, 2, . . . , n} −→ {1, 2, . . . , n}.
The function π determines the permutation (a_π(1), a_π(2), . . . , a_π(n)), and given a permutation, it is easy to write down the corresponding function. Now suppose that we take the set of letters {A, B, C, . . . , Z}. A permutation π of this set is just another name for a simple substitution cipher, where π acts as the encryption function. Thus π tells us that A gets sent to the π(1)st letter, and B gets sent to the π(2)nd letter, and so on. In order to decrypt, we use the inverse function π^−1.

Example 5.5. Sometimes one needs to count the number of possible permutations of n objects when some of the objects are indistinguishable. For example, there are six permutations of three distinct objects A, B, C, namely ABC, CAB, BCA, ACB, BAC, and CBA, but if two of them are indistinguishable, say A, A, B, then there are only three different arrangements, AAB, ABA, and BAA. To illustrate the idea in a more complicated case, we count the number of different letter arrangements of the five letters A, A, A, B, B. If the five letters were distinguishable, say they were labeled A1, A2, A3, B1, B2, then there would be 5! permutations. However, permutations such as A1A2B1B2A3 and A2A3B2B1A1 become the same when the subscripts are dropped, so we have overcounted in arriving at the number 5!. How many different arrangements have been counted more than once? For example, in any particular permutation, the two B’s have been placed into specific positions, but we can always switch them and get the same unsubscripted list. This means that we need to divide 5! by 2 to compensate for overcounting the placement of the B’s. Similarly, once the three A’s have been placed into specific positions, we can permute them among themselves in 3! ways, so we need to divide 5! by 3! to compensate for overcounting the placement of the A’s. Hence there are 5!/(3! · 2!) = 10 different letter arrangements of the five letters A, A, A, B, B.
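Both counts in this subsection can be checked by brute force in a few lines of Python (a sketch for illustration; explicit enumeration is feasible only for small n):

```python
from itertools import permutations
from math import factorial

# Proposition 5.3: a set of n distinct objects has n! permutations.
# Brute-force check for a small n; 10! itself is computed directly.
assert sum(1 for _ in permutations(range(6))) == factorial(6) == 720
print(factorial(10))    # prints: 3628800

# Example 5.5: arrangements of A, A, A, B, B with repeated letters.
# Collecting the distinct orderings in a set collapses duplicates,
# leaving 5!/(3! * 2!) arrangements.
arrangements = set(permutations("AAABB"))
print(len(arrangements), factorial(5) // (factorial(3) * factorial(2)))  # prints: 10 10
```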
5.1.2 Combinations A permutation is a way of arranging a set of objects into a list. A combination is similar, except that now the order of the list no longer matters. We start with an example that is typical of problems involving combinations.
Example 5.6. Five people (Alice, Bob, Carl, Dave, and Eve2) are ordering a meal at a Chinese restaurant. The menu contains 20 different items. Each person gets to choose one dish, no dish may be ordered twice, and they plan to share the food. How many different meals are possible? Alice orders first and she has 20 choices for her dish. Then Bob orders from the remaining 19 dishes, and then Carl chooses from the remaining 18 dishes, and so on. It thus appears that there are 20 · 19 · 18 · 17 · 16 = 1860480 possible meals. However, the order in which the dishes are ordered is immaterial. If Alice orders fried rice and Bob orders egg rolls, or if Alice orders egg rolls and Bob orders fried rice, the meal is the same. Unfortunately, we did not take this into account when we arrived at the number 1860480. Let’s number the dishes D1, D2, . . . , D20. Then, for example, we want to count the two possible dinners D1, D5, D7, D18, D20 and D5, D18, D20, D7, D1 as being the same, although the order of the dishes is different. To correct the overcount, note that in the computation 20 · 19 · 18 · 17 · 16 = 1860480, every permutation of any set of five dishes was counted separately, but we really want to count these permutations as giving the same meal. Thus we should divide 1860480 by the number of ways to permute the five distinct dishes in each possible order, i.e., we should divide by 5!. Hence the total number of different meals is (20 · 19 · 18 · 17 · 16)/5! = 15504. It is often convenient to rewrite this quantity entirely in terms of factorials by multiplying the numerator and the denominator by 15! to get (20 · 19 · 18 · 17 · 16)/5! = ((20 · 19 · 18 · 17 · 16) · (15 · 14 · · · 3 · 2 · 1))/(5! · 15!) = 20!/(5! · 15!).

Definition. Let S be a set containing n distinct objects.
A combination of r objects of S is a subset consisting of exactly r distinct elements of S, where the order of the objects in the subset does not matter.

Proposition 5.7. The number of possible combinations of r objects chosen from a set of n objects is equal to (n choose r) = n!/(r!(n − r)!).

Remark 5.8. The symbol (n choose r) is called a combinatorial symbol or a binomial coefficient. It is read as “n choose r.” Note that by convention, zero factorial is set equal to 1, so (n choose 0) = n!/(n! · 0!) = 1. This makes sense, since there is only one way to choose zero objects from a set.

2You may wonder why Alice and Bob, those intrepid exchangers of encrypted secret messages, are sitting down for a meal with their cryptographic adversary Eve. In the real world, this happens all the time, especially at cryptography conferences!
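Python’s standard library computes these binomial coefficients directly via `math.comb` (a quick check of Proposition 5.7 and the meal count of Example 5.6; illustrative only):

```python
from math import comb, factorial

# Example 5.6: 5 shared dishes chosen from a 20-item menu.
print(comb(20, 5))   # prints: 15504

# Proposition 5.7: C(n, r) = n!/(r!(n - r)!), checked for small n and r.
for n in range(10):
    for r in range(n + 1):
        assert comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r))

# Remark 5.8: by convention 0! = 1, so there is one way to choose zero objects.
print(comb(20, 0))   # prints: 1
```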
Proof of Proposition 5.7. If you understand the discussion in Example 5.6, then the proof of the general case is clear. The number of ways to make an ordered list of r distinct elements from the set S is n(n − 1)(n − 2) · · · (n − r + 1), since there are n choices for the first element, then n − 1 choices for the second element, and so on until we have selected r elements. Then we need to divide by r! in order to compensate for the ways to permute the r elements in our subset. Dividing by r! accounts for the fact that we do not care in which order the r elements were chosen. Hence the total number of combinations is n(n − 1)(n − 2) · · · (n − r + 1)/r! = n!/(r!(n − r)!).

Example 5.9. Returning to the five people ordering a meal at the Chinese restaurant, suppose that they want the order to consist of two vegetarian dishes and three meat dishes, and suppose that the menu contains 5 vegetarian choices and 15 meat choices. Now how many possible meals can they order? There are (5 choose 2) possibilities for the two vegetarian dishes and there are (15 choose 3) choices for the three meat dishes. Hence by our basic counting principle, there are (5 choose 2) · (15 choose 3) = 10 · 455 = 4550 possible meals.

5.1.3 The Binomial Theorem

You may have seen the combinatorial numbers (n choose r) appearing in the binomial theorem,3 which gives a formula for the nth power of the sum of two numbers.

Theorem 5.10 (The Binomial Theorem). (x + y)^n = Σ_{j=0}^{n} (n choose j) x^j y^(n−j). (5.1)

Proof. Let’s start with a particular case, say n = 3. If we multiply out the product

3The binomial theorem’s fame extends beyond mathematics. Moriarty, Sherlock Holmes’s arch enemy, “wrote a treatise upon the Binomial Theorem,” on the strength of which he won a mathematical professorship. And Major General Stanley, that very Model of a Modern Major General, proudly informs the Pirate King and his cutthroat band: About Binomial Theorem I’m teeming with a lot o’ news— With many cheerful facts about the square of the hypotenuse. (The Pirates of Penzance, W.S. Gilbert and A. Sullivan 1879)
(x + y)^3 = (x + y) · (x + y) · (x + y), (5.2)

the result is a sum of terms x^3, x^2 y, x y^2, and y^3. There is only one x^3 term, since to get x^3 we must take x from each of the three factors in (5.2). How many copies of x^2 y are there? We can get x^2 y in several ways. For example, we could take x from the first two factors and y from the last factor. Or we could take x from the first and third factors and take y from the second factor. Thus we get x^2 y by choosing two of the three factors in (5.2) to give x (note that the order doesn’t matter), and then the remaining factor gives y. There are thus (3 choose 2) = 3 ways to get x^2 y. Similarly, there are (3 choose 1) = 3 ways to get x y^2 and only one way to get y^3. Hence

(x + y)^3 = (3 choose 3) x^3 + (3 choose 2) x^2 y + (3 choose 1) x y^2 + (3 choose 0) y^3 = x^3 + 3x^2 y + 3x y^2 + y^3.

The general case is exactly the same. When multiplied out, the product

(x + y)^n = (x + y) · (x + y) · (x + y) · · · (x + y) (5.3)

is a sum of terms x^n, x^(n−1) y, . . . , x y^(n−1), y^n. We get copies of x^j y^(n−j) by choosing x from any j of the factors in (5.3) and then taking y from the other n − j factors. Thus we get (n choose j) copies of x^j y^(n−j). Summing over the possible values of j gives (5.1), which completes the proof of the binomial theorem.

Example 5.11. We use the binomial theorem to compute

(2t + 3)^4 = (4 choose 4)(2t)^4 + (4 choose 3)(2t)^3 · 3 + (4 choose 2)(2t)^2 · 3^2 + (4 choose 1)(2t) · 3^3 + (4 choose 0) · 3^4 = 16t^4 + 4 · 8t^3 · 3 + 6 · 4t^2 · 9 + 4 · 2t · 27 + 81 = 16t^4 + 96t^3 + 216t^2 + 216t + 81.

5.2 The Vigenère Cipher

The simple substitution ciphers that we studied in Sect. 1.1 are examples of monoalphabetic ciphers, since every plaintext letter is encrypted using only one cipher alphabet. As cryptanalytic methods became more sophisticated in Renaissance Italy, correspondingly more sophisticated ciphers were invented (although it seems that they were seldom used in practice). Consider how much more difficult a task is faced by the cryptanalyst if every plaintext letter is encrypted using a different ciphertext alphabet. This ideal resurfaces in modern cryptography in the form of the one-time pad, which we discuss in Sect. 5.6, but in this section we discuss a less complicated polyalphabetic cipher called the Vigenère cipher4 dating back to the sixteenth century.

4This cipher is named after Blaise de Vigenère (1523–1596), whose 1586 book Traicté des Chiffres describes the known ciphers of his time. These include polyalphabetic ciphers such as the “Vigenère cipher,” which according to [63] Vigenère did not invent, and an ingenious autokey system (see Exercise 5.19), which he did.
  • 337. 5.2. The Vigenère Cipher 215 The Vigenère cipher works by using different shift ciphers to encrypt dif- ferent letters. In order to decide how far to shift each letter, Bob and Alice first agree on a keyword or phrase. Bob then uses the letters of the keyword, one by one, to determine how far to shift each successive plaintext letter. If the keyword letter is a, there is no shift, if the keyword letter is b, he shifts by 1, if the keyword letter is c, he shifts by 2, and so on. An example illustrates the process: Example 5.12. Suppose that the keyword is dog and the plaintext is yellow. The first letter of the keyword is d, which gives a shift of 3, so Bob shifts the first plaintext letter y forward by 3, which gives the ciphertext letter b. (Remember that a follows z.) The second letter of the keyword is o, which gives a shift of 14, so Bob shifts the second plaintext letter e forward by 14, which gives the ciphertext letter s. The third letter of the keyword is g, which gives a shift of 6, so Bob shifts the third plaintext letter l forward by 6, which gives the ciphertext letter r. Bob has run out of keyword letters, so what does he do now? He simply starts again with the first letter of the keyword. The first letter of the keyword is d, which again gives a shift of 3, so Bob shifts the fourth plaintext letter l forward by 3, which gives the ciphertext letter o. Then the second keyword letter o tells him to shift the fifth plaintext letter o forward by 14, giving the ciphertext letter c, and finally the third keyword letter g tells him to shift the sixth plaintext letter w forward by 6, giving the ciphertext letter c. In conclusion, Bob has encrypted the plaintext yellow using the keyword dog and obtained the ciphertext bsrocc. Even this simple example illustrates two important characteristics of the Vigenère cipher. 
First, the repeated letters ll in the plaintext lead to non- identical letters ro in the ciphertext, and second, the repeated letters cc in the ciphertext correspond to different letters ow of the plaintext. Thus a straight- forward frequency analysis as we used to cryptanalyze simple substitution ciphers (Sect. 1.1.1) is not going to work for the Vigenère cipher. A useful tool for doing Vigenère encryption and decryption, at least if no computer is available (as was typically the case in the sixteenth century!), is the so-called Vigenère tableau illustrated in Table 5.1. The Vigenère tableau consists of 26 alphabets arranged in a square, with each alphabet shifted one further than the alphabet to its left. In order to use a given keyword letter to encrypt a given plaintext letter, Bob finds the plaintext letter in the top row and the keyword letter in the first column. He then looks for the letter in the tableau lying below the plaintext letter and to the right of the keyword letter. That is, he locates the encrypted letter at the intersection of the row beginning with the keyword letter and the column with the plaintext letter on top. For example, if the keyword letter is d and the plaintext letter is y, Bob looks in the fourth row (which is the one that starts with d) and in the next
  • 338. 216 5. Combinatorics, Probability, and Information Theory a b c d e f g h i j k l m n o p q r s t u v w x y z b c d e f g h i j k l m n o p q r s t u v w x y z a c d e f g h i j k l m n o p q r s t u v w x y z a b d e f g h i j k l m n o p q r s t u v w x y z a b c e f g h i j k l m n o p q r s t u v w x y z a b c d f g h i j k l m n o p q r s t u v w x y z a b c d e g h i j k l m n o p q r s t u v w x y z a b c d e f h i j k l m n o p q r s t u v w x y z a b c d e f g i j k l m n o p q r s t u v w x y z a b c d e f g h j k l m n o p q r s t u v w x y z a b c d e f g h i k l m n o p q r s t u v w x y z a b c d e f g h i j l m n o p q r s t u v w x y z a b c d e f g h i j k m n o p q r s t u v w x y z a b c d e f g h i j k l n o p q r s t u v w x y z a b c d e f g h i j k l m o p q r s t u v w x y z a b c d e f g h i j k l m n p q r s t u v w x y z a b c d e f g h i j k l m n o q r s t u v w x y z a b c d e f g h i j k l m n o p r s t u v w x y z a b c d e f g h i j k l m n o p q s t u v w x y z a b c d e f g h i j k l m n o p q r t u v w x y z a b c d e f g h i j k l m n o p q r s u v w x y z a b c d e f g h i j k l m n o p q r s t v w x y z a b c d e f g h i j k l m n o p q r s t u w x y z a b c d e f g h i j k l m n o p q r s t u v x y z a b c d e f g h i j k l m n o p q r s t u v w y z a b c d e f g h i j k l m n o p q r s t u v w x z a b c d e f g h i j k l m n o p q r s t u v w x y • Find the plaintext letter in the top row. • Find the keyword letter in the first column. • The ciphertext letter lies below the plaintext letter and to the right of the keyword letter. Table 5.1: The Vigenère Tableau
  • 339. 5.2. The Vigenère Cipher 217 to last column (which is the one headed by y). This row and column intersect at the letter b, so the corresponding ciphertext letter is b. Decryption is just as easy. Alice uses the row containing the keyword letter and looks in that row for the ciphertext letter. Then the top of that column is the plaintext letter. For example, if the keyword letter is g and the ciphertext letter is r, Alice looks in the row starting with g until she finds r and then she moves to the top of that column to find the plaintext letter l. Example 5.13. We illustrate the use of the Vigenère tableau by encrypting the plaintext message The rain in Spain stays mainly in the plain, using the keyword flamingo. Since the key word has eight letters, the first step is to split the plaintext into eight-letter blocks, theraini | nspainst | aysmainl | yinthepl | ain. Next we write the keyword beneath each block of plaintext, where for conve- nience we label lines P, K, and C to indicate, respectively, the plaintext, the keyword, and the ciphertext. P t h e r a i n i n s p a i n s t a y s m a i n l y i n t h e p l a i n K f l a m i n g o f l a m i n g o f l a m i n g o f l a m i n g o f l a Finally, we encrypt each letter using the Vigenère tableau. The initial plaintext letter t and initial keyword letter f combine in the Vigenère tableau to yield the ciphertext letter y, the second plaintext letter h and second keyword letter l combine in the Vigenère tableau to yield the ciphertext letter s, and so on. Continuing in this fashion, we complete the encryption process. P t h e r a i n i n s p a i n s t a y s m a i n l y i n t h e p l a i n K f l a m i n g o f l a m i n g o f l a m i n g o f l a m i n g o f l a C y s e d i v t w s d p m q a y h f j s y i v t z d t n f p r v z f t n Splitting the ciphertext into convenient blocks of five letters each, we are ready to transmit our encrypted message ysedi vtwsd pmqay hfjsy ivtzd tnfpr vzftn. Remark 5.14. 
As we already pointed out, the same plaintext letter in a Vigenère cipher is represented in the ciphertext by many different letters. However, if the keyword is short, there will be a tendency for repetitive parts of the plaintext to end up aligned at the same point in the keyword, in which case they will be identically enciphered. This occurs in Example 5.13, where the ain in rain and in mainly are encrypted using the same three keyword letters ing, so they yield the same ciphertext letters ivt. This repetition in the ciphertext, which appears separated by 16 letters, suggests that the key- word has length dividing 16. Of course, not every occurrence of ain in the
  • 340. 218 5. Combinatorics, Probability, and Information Theory plaintext yields the same ciphertext. It is only when two occurrences line up with the same part of the keyword that repetition occurs. In the next section we develop the idea of using ciphertext repetitions to guess the length of the keyword, but here we simply want to make the point that short keywords are less secure than long keywords.5 On the other hand, Bob and Alice find it easier to remember a short keyword than a long one. We thus see the beginnings of the eternal struggle in practical (as opposed to purely theoretical) cryptography, namely the battle between Efficiency (and ease of use) ← − − − − versus − − − − → Security. As a further illustration of this dichotomy, we consider ways in which Bob and Alice might make their Vigenère-type cipher more secure. They can certainly make Eve’s job harder by mixing up the letters in the first row of their Vigenère tableau and then rotating this “mixed alphabet” in the subsequent rows. Unfortunately, a mixed alphabet makes encryption and decryption more cumbersome, plus it means that Bob and Alice must remember (or write down for safekeeping!) not only their keyword, but also the mixed alphabet. And if they want to be even more secure, they can use different randomly mixed alphabets in every row of their Vigenère tableau. But if they do that, then they will certainly need to keep a written copy of the tableau, which is a serious security risk. 5.2.1 Cryptanalysis of the Vigenère Cipher: Theory At various times in history it has been claimed that Vigenère-type ciphers, especially with mixed alphabets, are “unbreakable.” In fact, nothing could be further from the truth. If Eve knows Bob and Alice, she may be able to guess part of the keyword and proceed from there. (How many people do you know who use some variation of their name and birthday as an Internet password?) 
But even without lucky guesses, elementary statistical methods developed in the nineteenth century allow for a straightforward cryptanalysis of Vigenère-type ciphers. In the interest of simplicity, we stick with the original Vigenère, i.e., we do not allow mixed alphabets in the tableau. You may wonder why we take the time to cryptanalyze the Vigenère cipher, since no one these days uses the Vigenère for secure communications. The answer is that our exposition is designed principally to introduce you to the use of statistical tools in cryptanalysis. This builds on and extends the elementary application of frequency tables as we used them in Sect. 1.1.1 to cryptanalyze simple substitution ciphers. In this section we describe the theoretical tools used to cryptanalyze the Vigenère, and in the next section we apply those tools to decrypt a sample ciphertext. If at any point you find that the theory in this section becomes confusing, it may help to turn to Sect. 5.2.2 and see how the theory is applied in practice.

5More typically one uses a key phrase consisting of several words, but for simplicity we use the term “keyword” to cover both single keywords and longer key phrases.
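Before developing the statistical tools, it may help to see the mechanics of Example 5.13 in code. The following is a minimal Python sketch of our own (standard A–Z alphabet, no mixed rows; the function name `vigenere` is our choice, not the book's):

```python
def vigenere(text, keyword, decrypt=False):
    """Shift each plaintext letter by the paired keyword letter (a=0, ..., z=25)."""
    sign = -1 if decrypt else 1
    out = []
    for i, c in enumerate(text.lower()):
        shift = ord(keyword.lower()[i % len(keyword)]) - ord('a')
        out.append(chr((ord(c) - ord('a') + sign * shift) % 26 + ord('a')))
    return ''.join(out)

# Example 5.13: "The rain in Spain stays mainly in the plain" under keyword flamingo.
ciphertext = vigenere("theraininspainstaysmainlyintheplain", "flamingo")
print(ciphertext)  # ysedivtwsdpmqayhfjsyivtzdtnfprvzftn
```

Decryption reverses the shifts: `vigenere(ciphertext, "flamingo", decrypt=True)` recovers the plaintext.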
The first goal in cryptanalyzing a Vigenère cipher is to find the length of the keyword, which is sometimes called the blocksize or the period. We already saw in Remark 5.14 how this might be accomplished by looking for repeated fragments in the ciphertext. The point is that certain plaintext fragments such as the occur quite frequently, while other plaintext fragments such as ugw occur infrequently or not at all. Among the many occurrences of the letters the in the plaintext, a certain percentage of them will line up with exactly the same part of the keyword. This leads to the Kasiski method, first described by a German military officer named Friedrich Kasiski in his book Die Geheimschriften und die Dechiffrir-kunst6 published in 1863. One looks for repeated fragments within the ciphertext and compiles a list of the distances that separate the repetitions. The key length is likely to divide many of these distances. Of course, a certain number of repetitions will occur by pure chance, but these are random, while the ones coming from repeated plaintext fragments are always divisible by the key length. It is generally not hard to pick out the key length from this data.

There is another method of guessing the key length that works with individual letters, rather than with fragments consisting of several letters. The underlying idea can be traced all the way back to the frequency table of English letters (Table 1.3), which shows that some letters are more likely to occur than others. Suppose now that you are presented with a ciphertext encrypted using a Vigenère cipher and that you guess that it was encrypted using a keyword of length 5. This means that every fifth letter was encrypted using the same rotation, so if you pull out every fifth letter and form them into a string, this entire string was encrypted using a single substitution cipher.
Hence the string’s letter frequencies should look more or less as they do in English, with some letters much more frequent and some much less frequent. And the same will be true of the string consisting of the 2nd, 7th, 12th, . . . letters of the ciphertext, and so on. On the other hand, if you guessed wrong and the key length is not five, then the string consisting of every fifth letter should be more or less random, so its letter frequencies should look different from the frequencies in English. How can we quantify the following two statements so as to be able to distinguish between them?

String 1 has letter frequencies similar to those in Table 1.3. (5.4)
String 2 has letter frequencies that look more or less random. (5.5)

One method is to use the following device.

Definition. Let s = c1c2c3 · · · cn be a string of n alphabetic characters. The index of coincidence of s, denoted by IndCo(s), is the probability that two randomly chosen characters in the string s are identical.

6Cryptography and the Art of Decryption.
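The definition can be computed by direct counting: tally each letter's frequency and count matching pairs among all pairs of positions. A short Python sketch of ours (letters only, case-insensitive; the test string is the one used in Example 5.15 below):

```python
from collections import Counter

def ind_co(s):
    """Index of coincidence: probability that two randomly chosen
    positions of s hold the same letter."""
    letters = [c for c in s.lower() if c.isalpha()]
    n = len(letters)
    pairs = sum(f * (f - 1) for f in Counter(letters).values())
    return pairs / (n * (n - 1))

print(round(ind_co("A bird in hand is worth two in the bush"), 4))  # 0.0575
```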
We are going to derive a formula for the index of coincidence. It is convenient to identify the letters a, . . . , z with the numbers 0, 1, . . . , 25, respectively. For each value i = 0, 1, 2, . . . , 25, let F_i be the frequency with which letter i appears in the string s. For example, if the letter h appears 23 times in the string s, then F_7 = 23, since h = 7 in our labeling of the alphabet. For each i, there are

(F_i choose 2) = F_i(F_i − 1)/2

ways to select two instances of the ith letter of the alphabet from s, so the total number of ways to get a repeated letter is the sum of F_i(F_i − 1)/2 for i = 0, 1, . . . , 25. On the other hand, there are

(n choose 2) = n(n − 1)/2

ways to select two arbitrary characters from s. The probability of selecting two identical letters is the total number of ways to choose two identical letters divided by the total number of ways to choose any two letters. That is,

IndCo(s) = (1 / (n(n − 1))) · Σ_{i=0}^{25} F_i(F_i − 1). (5.6)

Example 5.15. Let s be the string

s = “A bird in hand is worth two in the bush.”

Ignoring the spaces between words, s consists of 30 characters. The following table counts the frequencies of each letter that appears at least once:

Letter   A  B  D  E  H  I  N  O  R  S  T  U  W
i        0  1  3  4  7  8  13 14 17 18 19 20 22
F_i      2  2  2  1  4  4  3  2  2  2  3  1  2

Then the index of coincidence of s, as given by (5.6), is

IndCo(s) = (1 / (30 · 29)) · (2·1 + 2·1 + 2·1 + 4·3 + 4·3 + 3·2 + · · · + 3·2 + 2·1) ≈ 0.0575.

We return to our two statements (5.4) and (5.5). Suppose first that the string s consists of random characters. Then the probability that c_i = c_j is exactly 1/26, so we would expect IndCo(s) ≈ 1/26 ≈ 0.0385. On the other hand, if s consists of English text, then we would expect the relative frequencies to be as in Table 1.3. So for example, if s consists of 10,000 characters, we would expect approximately 815 A’s, approximately 144 B’s, approximately 276 C’s, and so on.
Thus the index of coincidence for a string of English text should be approximately

(815 · 814 + 144 · 143 + 276 · 275 + · · · + 8 · 7) / (10000 · 9999) ≈ 0.0685.

The disparity between 0.0385 and 0.0685, as small as it may seem, provides the means to distinguish between statements (5.4) and (5.5). More precisely:
If IndCo(s) ≈ 0.068, then s looks like simple substitution English. (5.7)
If IndCo(s) ≈ 0.038, then s looks like random letters. (5.8)

Of course, the value of IndCo(s) will tend to fluctuate, especially if s is fairly short. But the moral of (5.7) and (5.8) is that larger values of IndCo(s) make it more likely that s is English encrypted with some sort of simple substitution, while smaller values of IndCo(s) make it more likely that s is random.

Now suppose that Eve intercepts a message s that she believes was encrypted using a Vigenère cipher and wants to check whether the keyword has length k. Her first step is to break the string s into k pieces s_1, s_2, . . . , s_k, where s_1 consists of every kth letter starting from the first letter, s_2 consists of every kth letter starting from the second letter, and so on. In mathematical terms, if we write s = c_1 c_2 c_3 . . . c_n, then

s_i = c_i c_{i+k} c_{i+2k} c_{i+3k} . . . .

Notice that if Eve’s guess is correct and the keyword has length k, then each s_i consists of characters that were encrypted using the same shift amount, so although they do not decrypt to form actual words (remember that s_i is every kth letter of the text), the pattern of their letter frequencies will look like English. On the other hand, if Eve’s guess is incorrect, then the s_i strings will be more or less random. Thus for each k, Eve computes IndCo(s_i) for i = 1, 2, . . . , k and checks whether these numbers are closer to 0.068 or closer to 0.038. She does this for k = 3, 4, 5, . . . until she finds a value of k for which the average value of IndCo(s_1), IndCo(s_2), . . . , IndCo(s_k) is large, say greater than 0.06. Then this k is probably the correct blocksize.

We assume now that Eve has used the Kasiski test or the index of coincidence test to determine that the keyword has length k. That’s a good start, but she’s still quite far from her goal of finding the plaintext. The next step is to compare the strings s_1, s_2, . . .
, s_k to one another. The tool she uses to compare different strings is called the mutual index of coincidence. The general idea is that each of the k strings has been encrypted using a different shift cipher. If the string s_i is shifted by β_i and the string s_j is shifted by β_j, then one would expect the frequencies of s_i to best match those of s_j when the symbols in s_i are shifted by an additional amount σ ≡ β_j − β_i (mod 26). This leads to the following useful definition.

Definition. Let s = c_1 c_2 c_3 . . . c_n and t = d_1 d_2 d_3 . . . d_m be strings of alphabetic characters. The mutual index of coincidence of s and t, denoted by MutIndCo(s, t), is the probability that a randomly chosen character from s and a randomly chosen character from t will be the same.
If we let F_i(s) denote the number of times the ith letter of the alphabet appears in the string s, and similarly for F_i(t), then the probability of choosing the ith letter from both is the product of the probabilities F_i(s)/n and F_i(t)/m. In order to obtain a formula for the mutual index of coincidence of s and t, we add these probabilities over all possible letters,

MutIndCo(s, t) = (1 / (nm)) · Σ_{i=0}^{25} F_i(s) F_i(t). (5.9)

Example 5.16. Let s and t be the strings

s = “A bird in hand is worth two in the bush,”
t = “A stitch in time saves nine.”

Using formula (5.9) to compute the mutual index of coincidence of s and t yields MutIndCo(s, t) = 0.0773.

The mutual index of coincidence has very similar properties to the index of coincidence. For example, there are analogues of the two statements (5.7) and (5.8). The value of MutIndCo(s, t) can be used to confirm that a guessed shift amount is correct. Thus if two strings s and t are encrypted using the same simple substitution cipher, then MutIndCo(s, t) tends to be large, because of the uneven frequency with which letters appear. On the other hand, if s and t are encrypted using different substitution ciphers, then they have no relation to one another, and the mutual index of coincidence MutIndCo(s, t) will be much smaller.

We return now to Eve’s attack on a Vigenère cipher. She knows the key length k and has split the ciphertext into k blocks, s_1, s_2, . . . , s_k, as usual. The characters in each block have been encrypted using the same shift amount, say

β_i = Amount that block s_i has been shifted.

Eve’s next step is to compare s_i with the string obtained by shifting the characters in s_j by different amounts. As a notational convenience, we write

s_j + σ = The string s_j with every character shifted σ spots down the alphabet.
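Both formula (5.9) and the shift notation s_j + σ translate directly into code. A Python sketch of ours, reproducing the value from Example 5.16:

```python
from collections import Counter

def only_letters(s):
    return [c for c in s.lower() if c.isalpha()]

def mut_ind_co(s, t):
    """Formula (5.9): probability that random letters drawn from s and t agree."""
    s, t = only_letters(s), only_letters(t)
    fs, ft = Counter(s), Counter(t)
    return sum(fs[c] * ft[c] for c in fs) / (len(s) * len(t))

def shift(s, sigma):
    """The string s + sigma: every letter moved sigma spots down the alphabet."""
    return ''.join(chr((ord(c) - ord('a') + sigma) % 26 + ord('a'))
                   for c in only_letters(s))

s = "A bird in hand is worth two in the bush"
t = "A stitch in time saves nine"
print(round(mut_ind_co(s, t), 4))  # 0.0773
```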
Suppose that σ happens to equal β_i − β_j. Then s_j + σ has been shifted a total of β_j + σ = β_i from the plaintext, so s_j + σ and s_i have been encrypted using the same shift amount. Hence, as noted above, their mutual index of coincidence will be fairly large. On the other hand, if σ is not equal to β_i − β_j, then s_j + σ and s_i have been encrypted using different shift amounts, so MutIndCo(s_i, s_j + σ) will tend to be small. To put this concept into action, Eve computes all of the mutual indices of coincidence
MutIndCo(s_i, s_j + σ) for 1 ≤ i < j ≤ k and 0 ≤ σ ≤ 25.

Scanning the list of values, she picks out the ones that are large, say larger than 0.065. Each large value of MutIndCo(s_i, s_j + σ) makes it likely that

β_i − β_j ≡ σ (mod 26). (5.10)

(Note that (5.10) is only a congruence modulo 26, since a shift of 26 is the same as a shift of 0.) This leads to a system of equations of the form (5.10) for the variables β_1, . . . , β_k. In practice, some of these equations will be spurious, but after a certain amount of trial and error, Eve will end up with values γ_2, . . . , γ_k satisfying

β_2 = β_1 + γ_2, β_3 = β_1 + γ_3, β_4 = β_1 + γ_4, . . . , β_k = β_1 + γ_k.

Thus if the keyword happens to start with A, then the second letter of the keyword would be A shifted by γ_2, the third letter of the keyword would be A shifted by γ_3, and so on. Similarly, if the keyword happens to start with B, then its second letter would be B shifted by γ_2, its third letter would be B shifted by γ_3, etc. So all that Eve needs to do is try each of the 26 possible starting letters and decrypt the message using each of the 26 corresponding keywords. Looking at the first few characters of the 26 putative plaintexts, it is easy for her to pick out the correct one.

Remark 5.17. We make one final remark before doing an example. We noted earlier that among the many occurrences of the letters the in the plaintext, a certain percentage of them will line up with exactly the same part of the keyword. It turns out that these repeated encryptions occur much more frequently than one might guess. This is an example of the “birthday paradox,” which says that the probability of getting a match (e.g. of trigrams or birthdays or colors) is quite high. We discuss the birthday paradox and some of its many applications to cryptography in Sect. 5.4.
5.2.2 Cryptanalysis of the Vigenère Cipher: Practice In this section we illustrate how to cryptanalyze a Vigenère ciphertext by decrypting the message given in Table 5.2. zpgdl rjlaj kpylx zpyyg lrjgd lrzhz qyjzq repvm swrzy rigzh zvreg kwivs saolt nliuw oldie aqewf iiykh bjowr hdogc qhkwa jyagg emisr zqoqh oavlk bjofr ylvps rtgiu avmsw lzgms evwpc dmjsv jqbrn klpcf iowhv kxjbj pmfkr qthtk ozrgq ihbmq sbivd ardym qmpbu nivxm tzwqv gefjh ucbor vwpcd xuwft qmoow jipds fluqm oeavl jgqea lrkti wvext vkrrg xani Table 5.2: A Vigenère ciphertext to cryptanalyze
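The Kasiski test of Sect. 5.2.1 is easy to mechanize. This Python sketch of ours collects the repeated trigrams of the Table 5.2 ciphertext and the gaps between their first two occurrences, i.e. the raw data behind Table 5.3:

```python
from collections import defaultdict

# The ciphertext of Table 5.2, spaces removed.
ciphertext = (
    "zpgdlrjlajkpylxzpyyglrjgdlrzhzqyjzqrepvmswrzyrigzh"
    "zvregkwivssaoltnliuwoldieaqewfiiykhbjowrhdogcqhkwa"
    "jyaggemisrzqoqhoavlkbjofrylvpsrtgiuavmswlzgmsevwpc"
    "dmjsvjqbrnklpcfiowhvkxjbjpmfkrqthtkozrgqihbmqsbivd"
    "ardymqmpbunivxmtzwqvgefjhucborvwpcdxuwftqmoowjipds"
    "fluqmoeavljgqealrktiwvextvkrrgxani"
)

positions = defaultdict(list)          # trigram -> 1-based starting positions
for i in range(len(ciphertext) - 2):
    positions[ciphertext[i:i + 3]].append(i + 1)

# Gap between the first two occurrences of each repeated trigram.
gaps = {tri: pos[1] - pos[0] for tri, pos in positions.items() if len(pos) > 1}
for tri in sorted(gaps):
    print(tri, positions[tri], gaps[tri])
```

Most of the gaps turn out to be divisible by 7, pointing at a keyword of length 7.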
Trigram  Appears at places  Difference
avl      117 and 258        141 = 3 · 47
bjo      86 and 121         35 = 5 · 7
dlr      4 and 25           21 = 3 · 7
gdl      3 and 24           21 = 3 · 7
lrj      5 and 21           16 = 2^4
msw      40 and 138         98 = 2 · 7^2
pcd      149 and 233        84 = 2^2 · 3 · 7
qmo      241 and 254        13 = 13
vms      39 and 137         98 = 2 · 7^2
vwp      147 and 231        84 = 2^2 · 3 · 7
wpc      148 and 232        84 = 2^2 · 3 · 7
zhz      28 and 49          21 = 3 · 7

Table 5.3: Repeated trigrams in the ciphertext given in Table 5.2

Key length  Average index  Individual indices of coincidence
4           0.038          0.034, 0.042, 0.039, 0.035
5           0.037          0.038, 0.039, 0.043, 0.027, 0.036
6           0.036          0.038, 0.038, 0.039, 0.038, 0.032, 0.033
7           0.062          0.062, 0.057, 0.065, 0.059, 0.060, 0.064, 0.064
8           0.038          0.037, 0.029, 0.038, 0.030, 0.034, 0.057, 0.040, 0.039
9           0.037          0.032, 0.036, 0.028, 0.030, 0.026, 0.032, 0.045, 0.047, 0.056

Table 5.4: Indices of coincidence of Table 5.2 for various key lengths

We begin by applying the Kasiski test. A list of repeated trigrams is given in Table 5.3, together with their location within the ciphertext and the number of letters that separates them. Most of the differences in the last column are divisible by 7, and 7 is the largest number with this property, so we guess that the keyword length is 7. Although the Kasiski test shows that the period is probably 7, we also apply the index of coincidence test in order to illustrate how it works. Table 5.4 lists the indices of coincidence for various choices of key length and the average index of coincidence for each key length. We see from Table 5.4 that key length 7 has a far higher average index of coincidence than the other potential key lengths, which confirms the conclusion from the Kasiski test.

Now that Eve knows that the key length is 7, she compares the blocks with one another as described in Sect. 5.2.1. She first breaks the ciphertext into seven blocks by taking every seventh letter.
(Notice how the first seven letters of the ciphertext run down the first column, the second seven down the second column, and so on.)
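The block-splitting and scoring step can also be scripted. A Python sketch of ours, reusing the Table 5.2 ciphertext and the index-of-coincidence function of Sect. 5.2.1, computes the average index of coincidence for each candidate key length:

```python
from collections import Counter

ciphertext = (
    "zpgdlrjlajkpylxzpyyglrjgdlrzhzqyjzqrepvmswrzyrigzh"
    "zvregkwivssaoltnliuwoldieaqewfiiykhbjowrhdogcqhkwa"
    "jyaggemisrzqoqhoavlkbjofrylvpsrtgiuavmswlzgmsevwpc"
    "dmjsvjqbrnklpcfiowhvkxjbjpmfkrqthtkozrgqihbmqsbivd"
    "ardymqmpbunivxmtzwqvgefjhucborvwpcdxuwftqmoowjipds"
    "fluqmoeavljgqealrktiwvextvkrrgxani"
)

def ind_co(s):
    n = len(s)
    return sum(f * (f - 1) for f in Counter(s).values()) / (n * (n - 1))

def blocks(text, k):
    """Block i is every kth letter starting at position i."""
    return [text[i::k] for i in range(k)]

for k in range(4, 10):
    avg = sum(ind_co(b) for b in blocks(ciphertext, k)) / k
    print(k, round(avg, 3))  # the average stands out at k = 7, as in Table 5.4
```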
  • 348. 5.2. The Vigenère Cipher 225 Blocks Shift amount i j 0 1 2 3 4 5 6 7 8 9 10 11 12 1 2 0.025 0.034 0.045 0.049 0.025 0.032 0.037 0.042 0.049 0.031 0.032 0.037 0.043 1 3 0.023 0.067 0.055 0.022 0.034 0.049 0.036 0.040 0.040 0.046 0.025 0.031 0.046 1 4 0.032 0.041 0.027 0.040 0.045 0.037 0.045 0.028 0.049 0.042 0.042 0.030 0.039 1 5 0.043 0.021 0.031 0.052 0.027 0.049 0.037 0.050 0.033 0.033 0.035 0.044 0.030 1 6 0.037 0.036 0.030 0.037 0.037 0.055 0.046 0.038 0.035 0.031 0.032 0.037 0.032 1 7 0.054 0.063 0.034 0.030 0.034 0.040 0.035 0.032 0.042 0.025 0.019 0.061 0.054 2 3 0.041 0.029 0.036 0.041 0.045 0.038 0.060 0.031 0.020 0.045 0.056 0.029 0.030 2 4 0.028 0.043 0.042 0.032 0.032 0.047 0.035 0.048 0.037 0.040 0.028 0.051 0.037 2 5 0.047 0.037 0.032 0.044 0.059 0.029 0.017 0.044 0.060 0.034 0.037 0.046 0.039 2 6 0.033 0.035 0.052 0.040 0.032 0.031 0.031 0.029 0.055 0.052 0.043 0.028 0.023 2 7 0.038 0.037 0.035 0.046 0.046 0.054 0.037 0.018 0.029 0.052 0.041 0.026 0.037 3 4 0.029 0.039 0.033 0.048 0.044 0.043 0.030 0.051 0.033 0.034 0.034 0.040 0.038 3 5 0.021 0.041 0.041 0.037 0.051 0.035 0.036 0.038 0.025 0.043 0.034 0.039 0.036 3 6 0.037 0.034 0.042 0.034 0.051 0.029 0.027 0.041 0.034 0.040 0.037 0.046 0.036 3 7 0.046 0.023 0.028 0.040 0.031 0.040 0.045 0.039 0.020 0.030 0.069 0.042 0.037 4 5 0.041 0.033 0.041 0.038 0.036 0.031 0.056 0.032 0.026 0.034 0.049 0.029 0.054 4 6 0.035 0.037 0.032 0.039 0.041 0.033 0.032 0.039 0.042 0.031 0.049 0.039 0.058 4 7 0.031 0.032 0.046 0.038 0.039 0.042 0.033 0.056 0.046 0.027 0.027 0.036 0.036 5 6 0.048 0.036 0.026 0.031 0.033 0.039 0.037 0.027 0.037 0.045 0.032 0.040 0.041 5 7 0.030 0.051 0.043 0.031 0.034 0.041 0.048 0.032 0.053 0.037 0.024 0.029 0.045 6 7 0.032 0.033 0.030 0.038 0.032 0.035 0.047 0.050 0.049 0.033 0.057 0.050 0.021 Blocks Shift amount i j 13 14 15 16 17 18 19 20 21 22 23 24 25 1 2 0.034 0.052 0.037 0.030 0.037 0.054 0.021 0.018 0.052 0.052 0.043 0.042 0.046 1 3 0.031 0.037 0.038 0.050 0.039 0.040 
0.026 0.037 0.044 0.043 0.023 0.045 0.032 1 4 0.039 0.040 0.032 0.041 0.028 0.019 0.071 0.038 0.040 0.034 0.045 0.026 0.052 1 5 0.042 0.032 0.038 0.037 0.032 0.045 0.045 0.033 0.041 0.043 0.035 0.028 0.063 1 6 0.040 0.030 0.028 0.071 0.051 0.033 0.036 0.047 0.029 0.037 0.046 0.041 0.027 1 7 0.040 0.032 0.049 0.037 0.035 0.035 0.039 0.023 0.043 0.035 0.041 0.042 0.027 2 3 0.054 0.040 0.028 0.031 0.039 0.033 0.052 0.046 0.037 0.026 0.028 0.036 0.048 2 4 0.047 0.034 0.027 0.038 0.047 0.042 0.026 0.038 0.029 0.046 0.040 0.061 0.025 2 5 0.034 0.026 0.035 0.038 0.048 0.035 0.033 0.032 0.040 0.041 0.045 0.033 0.036 2 6 0.033 0.034 0.036 0.036 0.048 0.040 0.041 0.049 0.058 0.028 0.021 0.043 0.049 2 7 0.042 0.037 0.041 0.059 0.031 0.027 0.043 0.046 0.028 0.021 0.044 0.048 0.040 3 4 0.037 0.045 0.033 0.028 0.029 0.073 0.026 0.040 0.040 0.026 0.043 0.042 0.043 3 5 0.035 0.029 0.036 0.044 0.055 0.034 0.033 0.046 0.041 0.024 0.041 0.067 0.037 3 6 0.023 0.043 0.074 0.047 0.033 0.043 0.030 0.026 0.042 0.045 0.032 0.035 0.040 3 7 0.035 0.035 0.035 0.028 0.048 0.033 0.035 0.041 0.038 0.052 0.038 0.029 0.062 4 5 0.032 0.041 0.036 0.032 0.046 0.035 0.039 0.042 0.038 0.034 0.043 0.036 0.048 4 6 0.034 0.034 0.036 0.029 0.043 0.037 0.039 0.036 0.039 0.033 0.066 0.037 0.028 4 7 0.043 0.032 0.039 0.034 0.029 0.071 0.037 0.039 0.030 0.044 0.037 0.030 0.041 5 6 0.052 0.035 0.019 0.036 0.063 0.045 0.030 0.039 0.049 0.029 0.036 0.052 0.041 5 7 0.040 0.031 0.034 0.052 0.026 0.034 0.051 0.044 0.041 0.039 0.034 0.046 0.029 6 7 0.029 0.035 0.039 0.032 0.028 0.039 0.026 0.036 0.069 0.052 0.035 0.034 0.038 Table 5.5: Mutual indices of coincidence of Table 5.2 for shifted blocks s1 = zlxrhrrhwloehdweoklilwvlhphqbynwhwfjulrxx s2 = pazjzezzitlwboamqbvuzpjpvmtiimiquptiqjkta s3 = gjpgqpyvvndfjgjihjpagcqckfkhvqvvccqpmgtvn s4 = dkydyvrrsliiocysoosvmdbfxkobdmxgbdmdoqiki s5 = lpyljmiesieiwqarafrmsmrijrzmapmeoxoseewr s6 = rygrzsggauayrhgzvrtsejnobqrqrbtfruofaavr s7 = jllzqwzkowqkhkgqlygwvskwjtgsduzjvwwlvleg 
She then compares the ith block s_i to the jth block shifted by σ, which we denote by s_j + σ, taking successively σ = 0, 1, 2, . . . , 25. Table 5.5 gives a complete list of the 546 mutual indices of coincidence MutIndCo(s_i, s_j + σ) for 1 ≤ i < j ≤ 7 and 0 ≤ σ ≤ 25. In Table 5.5, the entry in the row corresponding to (i, j) and the column corresponding to the shift σ is equal to

MutIndCo(s_i, s_j + σ) = MutIndCo(Block s_i, Block s_j shifted by σ). (5.11)
If this quantity is large, it suggests that s_j has been shifted σ further than s_i. As in Sect. 5.2.1 we let

β_i = Amount that the block s_i has been shifted.

Then a large value for (5.11) makes it likely that

β_i − β_j = σ. (5.12)

We have underlined the large values (those greater than 0.065) in Table 5.5 and compiled them, with the associated shift relation (5.12), in Table 5.6.

i  j  Shift  MutIndCo  Shift relation
1  3  1      0.067     β_1 − β_3 = 1
3  7  10     0.069     β_3 − β_7 = 10
1  4  19     0.071     β_1 − β_4 = 19
1  6  16     0.071     β_1 − β_6 = 16
3  4  18     0.073     β_3 − β_4 = 18
3  5  24     0.067     β_3 − β_5 = 24
3  6  15     0.074     β_3 − β_6 = 15
4  6  23     0.066     β_4 − β_6 = 23
4  7  18     0.071     β_4 − β_7 = 18
6  7  21     0.069     β_6 − β_7 = 21

Table 5.6: Large indices of coincidence and shift relations

Eve’s next step is to solve the system of linear equations appearing in the final column of Table 5.6, keeping in mind that all values are modulo 26, since a shift of 26 is the same as no shift at all. Notice that there are 10 equations for the six variables β_1, β_3, β_4, β_5, β_6, β_7. (Unfortunately, β_2 does not appear, so we’ll deal with it later.) In general, a system of 10 equations in 6 variables has no solutions,7 but in this case a little bit of algebra shows that not only is there a solution, there is actually one solution for each value of β_1. In other words, the full set of solutions is obtained by expressing each of the variables β_3, . . . , β_7 in terms of β_1:

β_3 = β_1 + 25, β_4 = β_1 + 7, β_5 = β_1 + 1, β_6 = β_1 + 10, β_7 = β_1 + 15. (5.13)

What should Eve do about β_2? She could just ignore it for now, but instead she picks out the largest values in Table 5.5 that relate to block 2 and uses those. The largest such values are (i, j) = (2, 3) with shift 6 and index 0.060 and (i, j) = (2, 4) with shift 24 and index 0.061, which give the relations

β_2 − β_3 = 6 and β_2 − β_4 = 24.

7We were a little lucky in that every relation in Table 5.6 is correct.
Sometimes there are erroneous relations, but it is not hard to eliminate them with some trial and error.
  • 350. 5.2. The Vigenère Cipher 227 Substituting in from (5.13), these both yield β2 = β1 + 5, and the fact that they give the same value gives Eve confidence that they are correct. Shift Keyword Decrypted text 0 AFZHBKP zkhwkhulvkdoowxuqrxwwrehwkhkhurripbrzqolih 1 BGAICLQ yjgvjgtkujcnnvwtpqwvvqdgvjgjgtqqhoaqypnkhg 2 CHBJDMR xifuifsjtibmmuvsopvuupcfuififsppgnzpxomjgf 3 DICKENS whetherishallturnouttobetheheroofmyownlife 4 EJDLFOT vgdsgdqhrgzkkstqmntssnadsgdgdqnnelxnvmkhed 5 FKEMGPU ufcrfcpgqfyjjrsplmsrrmzcrfcfcpmmdkwmuljgdc 6 GLFNHQV tebqebofpexiiqroklrqqlybqebebollcjvltkifcb 7 HMGOIRW sdapdaneodwhhpqnjkqppkxapdadankkbiuksjheba 8 INHPJSX rczoczmdncvggopmijpoojwzoczczmjjahtjrigdaz . . . . . . . . . Table 5.7: Decryption of Table 5.2 using shifts of the keyword AFZHBKP To summarize, Eve now knows that however much the first block s1 is rotated, blocks s2, s3, . . . , s7 are rotated, respectively, 5, 25, 7, 1, 10, and 15 steps further than s1. So for example, if s1 is not rotated at all (i.e., if β1 = 0 and the first letter of the keyword is A), then the full keyword is AFZHBKP. Eve uses the keyword AFZHBKP to decrypt the first few blocks of the ciphertext, finding the “plaintext” zkhwkhulvkdoowxuqrxwwrehwkhkhurripbrzqolihruzkhwkh. That doesn’t look good! So next she tries β1 = 1 and a keyword starting with the letter B. Continuing in this fashion, she need only check the 26 possibilities for β1. The results are listed in Table 5.7. Taking β1 = 3 yields the keyword DICKENS and an acceptable plaintext. Completing the decryption using this keyword and supplying the appropriate word breaks, punctuation, and capitalization, Eve recovers the full plaintext: Whether I shall turn out to be the hero of my own life, or whether that station will be held by anybody else, these pages must show. To begin my life with the beginning of my life, I record that I was born (as I have been informed and believe) on a Friday, at twelve o’clock at night. 
It was remarked that the clock began to strike, and I began to cry, simultaneously.8 8David Copperfield, 1850, Charles Dickens.
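The final step (trying all 26 rotations of the recovered keyword pattern) is also easily scripted. A Python sketch of ours, hard-coding the relative shifts γ = (0, 5, 25, 7, 1, 10, 15) found above:

```python
ciphertext = (
    "zpgdlrjlajkpylxzpyyglrjgdlrzhzqyjzqrepvmswrzyrigzh"
    "zvregkwivssaoltnliuwoldieaqewfiiykhbjowrhdogcqhkwa"
    "jyaggemisrzqoqhoavlkbjofrylvpsrtgiuavmswlzgmsevwpc"
    "dmjsvjqbrnklpcfiowhvkxjbjpmfkrqthtkozrgqihbmqsbivd"
    "ardymqmpbunivxmtzwqvgefjhucborvwpcdxuwftqmoowjipds"
    "fluqmoeavljgqealrktiwvextvkrrgxani"
)

gammas = [0, 5, 25, 7, 1, 10, 15]  # shifts of blocks 1..7 relative to block 1

def keyword_for(beta1):
    """Candidate keyword whose first letter is beta1 steps past A."""
    return ''.join(chr((beta1 + g) % 26 + ord('A')) for g in gammas)

def decrypt(ct, keyword):
    key = keyword.lower()
    return ''.join(chr((ord(c) - ord(key[i % len(key)])) % 26 + ord('a'))
                   for i, c in enumerate(ct))

for beta1 in range(26):
    print(keyword_for(beta1), decrypt(ciphertext, keyword_for(beta1))[:20])
```

Only β_1 = 3, i.e. the keyword DICKENS, produces readable English, as in Table 5.7.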
5.3 Probability Theory

5.3.1 Basic Concepts of Probability Theory

In this section we introduce the basic ideas of probability theory in the discrete setting. A probability space consists of two pieces. The first is a finite set Ω consisting of all possible outcomes of an experiment and the second is a method for assigning a probability to each possible outcome. In mathematical terms, a probability space is a finite set of outcomes Ω, called the sample space, and a function Pr : Ω −→ R. We want the function Pr to satisfy our intuition that

Pr(ω) = “probability that event ω occurred.”

In particular, the value of Pr(ω) should be between 0 and 1.

Example 5.18. Consider the toss of a single coin. There are two outcomes, heads and tails, so we let Ω be the set {H, T}. Assuming that it is a fair coin, each outcome is equally likely, so Pr(H) = Pr(T) = 1/2.

Example 5.19. Consider the roll of two dice. The sample space Ω is the following set of 36 pairs of numbers:

Ω = {(n, m) : n, m ∈ Z with 1 ≤ n, m ≤ 6}.

As in Example 5.18, each possible outcome is equally likely. For example, the probability of rolling (6, 6) is the same as the probability of rolling (3, 4). Hence Pr((n, m)) = 1/36 for any choice of (n, m). Note that order matters in this scenario. We might imagine that one die is red and the other is blue, so “red 3 and blue 5” is a different outcome from “red 5 and blue 3.”

Example 5.20. Suppose that an urn contains 100 balls, of which 21 are red and the rest are blue. If we pick 10 balls at random (without replacement), what is the probability that exactly 3 of them are red? The total number of ways of selecting 10 balls from among 100 is (100 choose 10). Similarly, there are (21 choose 3) ways to select 3 red balls from among the 21 that are red, and there are (79 choose 7) ways to pick the other 7 balls from among the 79 that are blue. There are thus (21 choose 3) · (79 choose 7) ways to select exactly 3 red balls and exactly 7 blue balls.
Hence the probability of picking exactly 3 red balls in 10 tries is

Pr(exactly 3 red balls in 10 attempts) = (21 choose 3) · (79 choose 7) / (100 choose 10).
We are typically more interested in computing the probability of compound events. These are subsets of the sample space that may include more than one outcome. For example, in the roll of two dice in Example 5.19, we might be interested in the probability that at least one of the dice shows a 6. This compound event is the subset of Ω consisting of all outcomes that include the number six, which is the set

{(1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (6, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5)}.

Suppose that we know the probability of each particular outcome. How then do we compute the probability of compound events or of events consisting of repeated independent trials of an experiment? Analyzing this problem leads to the idea of independence of events, a concept that gives probability theory much of its complexity and richness.

The formal theory of probability is an axiomatic theory. You have probably seen such theories when you studied Euclidean geometry and when you studied abstract vector spaces. In an axiomatic theory, one starts with a small list of basic axioms and derives from them additional interesting facts and formulas. The axiomatic theory of probability allows us to derive formulas to compute the probabilities of compound events. In this book we are content with an informal presentation of the theory, but for those who are interested in a more rigorous axiomatic treatment of probability theory, see for example [112, §2.3]. We begin with some definitions.

Definition. A sample space (or set of outcomes) is a finite9 set Ω. Each outcome ω ∈ Ω is assigned a probability Pr(ω), where we require that the probability function Pr : Ω −→ R satisfy the following two properties:

(a) 0 ≤ Pr(ω) ≤ 1 for all ω ∈ Ω and (b) Σ_{ω∈Ω} Pr(ω) = 1.
(5.14)

Notice that (5.14)(a) corresponds to our intuition that every outcome has a probability between 0 (if it never occurs) and 1 (if it always occurs), while (5.14)(b) says that some outcome must occur, so Ω contains all possible outcomes for the experiment.

Definition. An event is any subset of Ω. We assign a probability to an event E ⊂ Ω by setting

Pr(E) = Σ_{ω∈E} Pr(ω). (5.15)

In particular, Pr(∅) = 0 by convention, and Pr(Ω) = 1 from (5.14)(b).

9General (continuous) probability theory also deals with infinite sample spaces Ω, in which case only certain subsets of Ω are allowed to be events and are assigned probabilities. There are also further restrictions on the probability function Pr : Ω → R. For our study of cryptography in this book, it suffices to use discrete (finite) sample spaces.
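These definitions translate directly into code. A Python sketch of ours models the two-dice sample space of Example 5.19 and computes an event probability via (5.15), using exact rational arithmetic:

```python
from fractions import Fraction

# Sample space of Example 5.19: ordered pairs of dice, each with probability 1/36.
omega = [(n, m) for n in range(1, 7) for m in range(1, 7)]
pr = {w: Fraction(1, 36) for w in omega}

def prob(event):
    """Formula (5.15): Pr(E) is the sum of Pr(w) over outcomes w in E."""
    return sum(pr[w] for w in event)

at_least_one_six = [w for w in omega if 6 in w]
print(prob(at_least_one_six))  # 11/36
```

Using `Fraction` instead of floats keeps probabilities exact, which matches how the examples in this section are computed by hand.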
Definition. We say that two events E and F are disjoint if E ∩ F = ∅. It is clear that Pr(E ∪ F) = Pr(E) + Pr(F) if E and F are disjoint, since then E ∪ F is the collection of all outcomes in either E or F. When E and F are not disjoint, the probability of the event E ∪ F is not the sum of Pr(E) and Pr(F), since the outcomes common to both E and F should not be counted twice. Thus we need to subtract the outcomes common to E and F, which gives the useful formula

Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(E ∩ F). (5.16)

(See Exercise 5.20.)

Definition. The complement of an event E is the event E^c consisting of all outcomes that are not in E, i.e., E^c = {ω ∈ Ω : ω ∉ E}. The probability of the complementary event is given by

Pr(E^c) = 1 − Pr(E). (5.17)

It is sometimes easier to compute the probability of the complement of an event E and then use (5.17) to find Pr(E).

Example 5.21. We continue with Example 5.19 in which Ω consists of the possible outcomes of rolling two dice. Let E be the event E = {at least one six is rolled}. We can write down E explicitly; it is the set

E = {(1, 6), (6, 1), (2, 6), (6, 2), (3, 6), (6, 3), (4, 6), (6, 4), (5, 6), (6, 5), (6, 6)}.

Each of these 11 outcomes has probability 1/36, so

Pr(E) = Σ_{ω∈E} Pr(ω) = 11/36.

We can then compute the probability of not rolling a six as

Pr(no sixes are rolled) = Pr(E^c) = 1 − Pr(E) = 25/36.

Next consider the event F defined by F = {no number higher than two is rolled}.
  • 355. 5.3. Probability Theory 231 Notice that F = (1, 1), (1, 2), (2, 1), (2, 2) is disjoint from E, so the probability of either rolling a six or else rolling no number higher than two is Pr(E ∪ F) = Pr(E) + Pr(F) = 11 36 + 4 36 = 15 36 . For nondisjoint events, the computation is more complicated, since we need to avoid double counting outcomes. Consider the event G defined by G = {doubles}, i.e., G = (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6) . Then E and G both contain the outcome (6, 6), so their union E ∪ G only contains 16 outcomes, not 17. Thus the probability of rolling either a six or doubles is 16 36 . We can also compute this probability using formula (5.16), Pr(E ∪ G) = Pr(E) + Pr(G) − Pr(E ∩ G) = 11 36 + 6 36 − 1 36 = 16 36 = 4 9 . To conclude this example, let H be the event H = {the sum of the two dice is at least 4}. We could compute Pr(H) directly, but it is easier to compute the probability of Hc . Indeed, there are only three outcomes that give a sum smaller than 4, namely Hc = (1, 1), (1, 2), (2, 1) . Thus Pr(Hc ) = 3 36 = 1 12 , and then Pr(H) = 1 − Pr(Hc ) = 11 12 . Suppose now that E and F are events. The event consisting of both E and F is the intersection E ∩ F, so the probability that both E and F occur is Pr(E and F) = Pr(E ∩ F). As the next example makes clear, the probability of the intersection of two events is not a simple function of the probabilities of the individual events. Example 5.22. Consider the experiment consisting of drawing two cards from a deck of cards, where the second card is drawn without replacing the first card. Let E and F be the following events: E = {the first card drawn is a king}, F = {the second card drawn is a king}. Clearly Pr(E) = 1 13 . It is also true that Pr(F) = 1 13 , since with no information about the value of the first card, there’s no difference between events E and F. (If this seems unclear, suppose instead that the deck of cards were dealt to 52
people. Then the probability that any particular person gets a king is 1/13, regardless of whether they received the first card or the second card or . . . .) However, it is also clear that if we know whether event E has occurred, then that knowledge does affect the probability of F occurring. More precisely, if E occurs, then there are only 3 kings left in the remaining 51 cards, so F is less likely, while if E does not occur, then there are 4 kings left and F is more likely. Mathematically we find that

Pr(F if E has occurred) = 3/51 and Pr(F if E has not occurred) = 4/51.

Thus the probability of both E and F occurring, i.e., the probability of drawing two consecutive kings, is smaller than the product of Pr(E) and Pr(F), because the occurrence of the event E makes the event F less likely. The correct computation is

Pr(drawing two kings) = Pr(E ∩ F) = Pr(E) · Pr(F given that E has occurred) = (1/13) · (3/51) = 1/221 ≈ 0.0045.

Let G = {the second card drawn is an ace}. Then the occurrence of E makes G more likely, since if the first card is known to be a king, then there are still four aces left. Thus if we know that E occurs, then the probability of G increases from 4/52 to 4/51. Notice, however, that if we change the experiment and require that the first card be replaced in the deck before the second card is drawn, then whether E occurs has no effect at all on F. Thus using this card replacement scenario, the probability that E and F both occur is simply the product

Pr(E) Pr(F) = (1/13)^2 ≈ 0.006.

We learn two things from the discussion in Example 5.22. First, we see that the probability of one event can depend on whether another event has occurred. Second, we develop some probabilistic intuitions that lead to the mathematical definition of independence.

Definition. Two events E and F are said to be independent if

Pr(E ∩ F) = Pr(E) · Pr(F),

where recall that the probability of the intersection Pr(E ∩ F) is the probability that both E and F occur. In other words, E and F are independent if the probability of their both occurring is the product of their individual probabilities of occurring.
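The numbers in Example 5.22 can be verified by brute-force enumeration of ordered pairs of distinct cards. This sketch is ours, not the authors'; it encodes each card by rank only, and the rank labels chosen for king and ace are arbitrary.

```python
from fractions import Fraction
from itertools import permutations

# A 52-card deck, encoded by rank only: 4 copies each of 13 ranks.
deck = [rank for rank in range(13) for _ in range(4)]
KING, ACE = 12, 0  # arbitrary rank labels (our choice)

# Ordered draws without replacement: 52 * 51 equally likely outcomes.
pairs = list(permutations(range(52), 2))
total = len(pairs)

both_kings = sum(1 for i, j in pairs if deck[i] == KING and deck[j] == KING)
print(Fraction(both_kings, total))  # Pr(E ∩ F), two consecutive kings

# Conditional probability Pr(G | E): second card an ace, given first is a king.
first_king = [(i, j) for i, j in pairs if deck[i] == KING]
second_ace = sum(1 for i, j in first_king if deck[j] == ACE)
print(Fraction(second_ace, len(first_king)))
```

The enumeration reproduces 1/221 for two kings and 4/51 for an ace following a king, in agreement with the conditional-probability reasoning above.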
Example 5.23. A coin is tossed 10 times and the results recorded. What are the probabilities of the following events?

E1 = {the first five tosses are all heads}.
E2 = {the first five tosses are heads and the rest are tails}.
E3 = {exactly five of the ten tosses are heads}.

The result of any one toss is independent of the result of any other toss, so we can compute the probability of getting H on the first five tosses by multiplying together the probability of getting H on any one of these tosses. Assuming that it is a fair coin, the answer to our first question is thus

Pr(E1) = (1/2)^5 = 1/32 ≈ 0.031.

In order to compute the probability of E2, note that we are now asking for the probability that our sequence of tosses is exactly HHHHHTTTTT. Again using the independence of the individual tosses, we see that

Pr(E2) = (1/2)^10 = 1/1024 ≈ 0.00098.

The computation of Pr(E3) is a little trickier, because it asks for exactly five H's to occur, but places no restriction on when they occur. If we were to specify exactly when the five H's and the five T's occur, then the probability would be 1/2^10, just as it was for E2. So all that we need to do is to count how many ways we can distribute five H's and five T's into ten spots, or equivalently, how many different sequences we can form consisting of five H's and five T's. This is simply the number of ways of choosing five locations from ten possible locations, which is given by the combinatorial symbol C(10, 5). Hence dividing the number of outcomes satisfying E3 by the total number of outcomes, we find that

Pr(E3) = C(10, 5) · 1/2^10 = 252/1024 = 63/256 ≈ 0.246.

Thus there is just under a 25 % chance of getting exactly five heads in ten tosses of a coin.

5.3.2 Bayes's Formula

As we saw in Example 5.22, there is a connection between the probability that two events E and F occur simultaneously and the probability that one of them occurs if we know that the other one has occurred. The former quantity is simply Pr(E ∩ F). The latter quantity is called the conditional probability of F on E.
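Since the sample space in Example 5.23 has only 2^10 outcomes, all three probabilities can be checked by direct enumeration. The sketch below is ours, not from the text:

```python
from fractions import Fraction
from itertools import product
from math import comb

# All 2^10 equally likely sequences of ten coin tosses.
outcomes = list(product("HT", repeat=10))

def pr(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

print(pr(lambda w: w[:5] == ("H",) * 5))           # E1 = 1/32
print(pr(lambda w: w == ("H",) * 5 + ("T",) * 5))  # E2 = 1/1024
print(pr(lambda w: w.count("H") == 5))             # E3 = 63/256

# E3 agrees with the counting argument C(10, 5) / 2^10.
assert pr(lambda w: w.count("H") == 5) == Fraction(comb(10, 5), 2**10)
```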
Definition. The conditional probability of F on E is denoted by

Pr(F | E) = Pr(F given that E has occurred).

The probability that both E and F occur is related to the conditional probability of F on E by the formula

Pr(F | E) = Pr(F ∩ E) / Pr(E).   (5.18)

The intuition behind (5.18), which is usually taken as the definition of the conditional probability Pr(F | E), is simple. On the left-hand side, we are assuming that E occurs, so our sample space or universe is now E instead of Ω. We are asking for the probability that the event F occurs in this smaller universe of outcomes, so we should compute the proportion of the event F that is included in the event E, divided by the total size of the event E itself. This gives the right-hand side of (5.18). Formula (5.18) immediately implies that

Pr(F | E) Pr(E) = Pr(F ∩ E) = Pr(E ∩ F) = Pr(E | F) Pr(F).

Dividing both sides by Pr(F) gives a preliminary version of Bayes's formula:

Pr(E | F) = Pr(F | E) Pr(E) / Pr(F)   (Bayes's formula). (5.19)

This formula is useful if we know the conditional probability of F on E and want to know the reverse conditional probability of E on F. Sometimes it is easier to compute the probability of an event by dividing it into a union of disjoint events, as in the next proposition, which includes another version of Bayes's formula.

Proposition 5.24. Let E and F be events.
(a) Pr(E) = Pr(E | F) Pr(F) + Pr(E | F^c) Pr(F^c).   (5.20)
(b) Pr(E | F) = Pr(F | E) Pr(E) / (Pr(F | E) Pr(E) + Pr(F | E^c) Pr(E^c))   (Bayes's formula). (5.21)

Proof. The proof of (a) illustrates how one manipulates basic probability formulas.

Pr(E | F) Pr(F) + Pr(E | F^c) Pr(F^c)
  = Pr(E ∩ F) + Pr(E ∩ F^c)          from (5.18),
  = Pr((E ∩ F) ∪ (E ∩ F^c))          since E ∩ F and E ∩ F^c are disjoint,
  = Pr(E)                            since F ∪ F^c = Ω.
  • 363. 5.3. Probability Theory 235 This completes the proof of (a). In order to prove (b), we reverse the roles of E and F in (a) to get Pr(F) = Pr(F | E) Pr(E) + Pr(F | Ec ) Pr(Ec ), (5.22) and then substitute (5.22) into the denominator of (5.19) to obtain (5.21). Here are some examples that illustrate the use of conditional probabilities. Bayes’s formula will be applied in the next section. Example 5.25. We are given two urns10 containing gold and silver coins. Urn #1 contains 10 gold coins and 5 silver coins, and Urn #2 contains 2 gold coins and 8 silver coins. An urn is chosen at random, and then a coin is picked at random. What is the probability of choosing a gold coin? Let E = {a gold coin is chosen}. The probability of E depends first on which urn was chosen, and then on which coin is chosen in that urn. It is thus natural to break E up according to the outcome of the event F = {Urn #1 is chosen}. Notice that Fc is the event that Urn #2 is chosen. The decomposition formula (5.20) says that Pr(E) = Pr(E | F) Pr(F) + Pr(E | Fc ) Pr(Fc ). The key point here is that it is easy to compute the conditional probabilities on the right-hand side, and similarly easy to compute Pr(F) and Pr(Fc ). Thus Pr(E | F) = 10 15 = 2 3 , Pr(E | Fc ) = 2 10 = 1 5 , Pr(F) = Pr(Fc ) = 1 2 . Using these values, we can compute Pr(E) = Pr(E | F) Pr(F) + Pr(E | Fc ) Pr(Fc ) = 2 3 · 1 2 + 1 5 · 1 2 = 13 30 ≈ 0.433. Example 5.26 (The Three Prisoners Problem). The three prisoners problem is a classical problem about conditional probability. Three prisoners, Alice, Bob, and Carl, are informed by their jailer that the next day, one of them will be released from prison, but that the other two will have to serve life sentences. The jailer says that he will not tell any prisoner what will happen to him or her. 
But Alice, who reasons that her chances of going free are now 1/3, asks the jailer to give her the name of one prisoner, other than herself, who will

10 The authors of [51, chapter 1] explain the ubiquity of urns in the field of probability theory as being connected with the French phrase aller aux urnes (to vote).
  • 364. 236 5. Combinatorics, Probability, and Information Theory not go free. The jailer tells Alice that Bob will remain in jail. Now what are Alice’s chances of going free? Has the probability changed? Alice could argue that she now has a 1 2 chance of going free, since Bob will definitely remain behind. On the other hand, it also seems reasonable to argue that since one of Bob or Carl had to stay in jail, this new information could not possibly change the odds for Alice. In fact, either answer may be correct. It depends on the strategy that the jailer follows in deciding which name to give to Alice (assuming that Alice knows which strategy is being used). If the jailer picks a name at random whenever both Bob and Carl are possible choices, then Alice’s chances of freedom have not changed. However, if the jailer names Bob whenever possi- ble, and otherwise names Carl, then the new information does indeed change Alice’s probability of release to 1 2 . See Exercise 5.26. There are many other versions of the three prisoners problem, including the “Monty Hall problem” that is a staple of popular culture. Exercise 5.27 describes the Monty Hall problem and other fun applications of these ideas. 5.3.3 Monte Carlo Algorithms There are many algorithms whose output is not guaranteed to be correct. For example, Table 3.2 in Sect. 3.4 describes the Miller–Rabin algorithm, which is used to check whether a given large number is prime. In practice, one runs the algorithm many times to obtain an output that is “probably” correct. In applying these so-called Monte Carlo or probabilistic algorithms, it is im- portant to be able to compute a confidence level, which is the probability that the output is indeed correct. In this section we describe how to use Bayes’s formula to do such a computation. The basic scenario consists of a large (possibly infinite) set of integers S and an interesting property A. 
For example, S could be the set of all integers, or more realistically S might be the set of all integers between, say, 21024 and 21025 . An example of an interesting property A is the property of being composite. Now suppose that we are looking for numbers that do not have property A. Using the Miller–Rabin test, we might be looking for integers between 21024 and 21025 that are not composite, i.e., that are prime. In general, suppose that we are given an integer m in S and that we want to know whether m has property A. Usually we know approximately how many of the integers in S have property A. For example, we might know that 99 % of elements have property A and that the other 1 % do not. However, it may be difficult to determine with certainty that any particular m ∈ S does not have property A. So instead we settle for a faster algorithm that is not absolutely certain to be correct. A Monte Carlo algorithm for property A takes as its input both a num- ber m ∈ S to be tested and a randomly chosen number r and returns as output either Yes or No according to the following rules:
  • 365. 5.3. Probability Theory 237 (1) If the algorithm returns Yes, then m definitely has property A. In con- ditional probability notation, this says that Pr(m has property A | algorithm returns Yes) = 1. (2) If m has property A, then the algorithm returns Yes for at least 50 % of the choices for r.11 Using conditional probability notation, Pr(algorithm returns Yes | m has property A) ≥ 1 2 . Now suppose that we run the algorithm N times on an integer m ∈ S, using N different randomly chosen values for r. If even a single trial re- turns Yes, then we know that m has property A. But suppose instead that all N trials return the answer No. How confident can we be that our integer does not have property A? In probability terminology, we want to estimate Pr(m does not have property A | algorithm returns No N times). More precisely, we want to show that if N is large, then this probability is close to 1. We define two events: E = {an integer in S does not have property A}, F = {the algorithm returns No N times in a row}. We are interested in the conditional probability Pr(E | F), that is, the probability that m does not have property A, given the fact that the al- gorithm returned No N times. We can compute this probability using Bayes’s formula (5.21), Pr(E | F) = Pr(F | E) Pr(E) Pr(F | E) Pr(E) + Pr(F | Ec) Pr(Ec) . We are given that 99 % of the elements in S have property A, so Pr(E) = 0.01 and Pr(Ec ) = 0.99. Next consider Pr(F | E). If m does not have property A, which is our assump- tion on this conditional probability, then the algorithm always returns No, since Property (1) of the Monte Carlo method tells us that a Yes output forces m to have property A. In symbols, Property (1) says that Pr(No | not A) = Pr(A | Yes) = 1. 11More generally, the success rate in a Monte Carlo algorithm need not be 50 %, but may instead be any positive probability that is not too small. For the Miller–Rabin test described in Sect. 3.4, the corresponding probability is 75 %. 
See Exercise 5.28 for details.
It follows that

Pr(F | E) = Pr(No | not A)^N = 1.

Finally, we must compute the value of Pr(F | E^c). Since the algorithm is run N independent times, we have

Pr(F | E^c) = Pr(Output is No | m has property A)^N
            = (1 − Pr(Output is Yes | m has property A))^N
            ≤ (1 − 1/2)^N   from Property (2) of the Monte Carlo method,
            = 1/2^N.

Substituting these values into Bayes's formula, we find that if the algorithm returns No N times in a row, then the probability that the integer m does not have property A is

Pr(E | F) ≥ (1 · (0.01)) / (1 · (0.01) + 2^−N · (0.99)) = 1 / (1 + 99 · 2^−N) = 1 − 99/(2^N + 99).

Notice that if N is large, the lower bound is very close to 1. For example, if we run the algorithm 100 times and get 100 No answers, then the probability that m does not have property A is at least

1 − 99/(2^100 + 99) ≈ 1 − 10^−28.1.

So for most practical purposes, it is safe to conclude that m does not have property A.

5.3.4 Random Variables

We are generally more interested in the consequences of an experiment, for example the net loss or gain from a game of chance, than in the experiment itself. Mathematically, this means that we are interested in functions that are defined on events and that take values in some set.

Definition. A random variable is a function X : Ω → R whose domain is the sample space Ω and that takes values in the real numbers. More generally, a random variable is a function X : Ω → S whose range may be any set; for example, S could be a set of keys or a set of plaintexts. We note that since our sample spaces are finite, a random variable takes on only finitely many values.

Random variables are useful for defining events. For example, if X : Ω → R is a random variable, then any real number x defines three interesting events,

{ω ∈ Ω : X(ω) ≤ x},  {ω ∈ Ω : X(ω) = x},  {ω ∈ Ω : X(ω) > x}.
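Before going further, the Monte Carlo confidence computation from Sect. 5.3.3 is easy to tabulate exactly. The sketch below is ours, not the authors'; the 1 %/99 % split between integers lacking and having property A, and the worst-case Yes-rate of 1/2, are the assumptions used in the discussion above.

```python
from fractions import Fraction

def confidence(N, p_not_A=Fraction(1, 100), yes_rate=Fraction(1, 2)):
    """Lower bound on Pr(E | F): m lacks property A, given N No answers.
    Bayes: Pr(F | E) = 1 and Pr(F | E^c) <= (1 - yes_rate)^N,
    so Pr(E | F) >= p / (p + (1 - yes_rate)^N * (1 - p))."""
    p_A = 1 - p_not_A
    return p_not_A / (p_not_A + (1 - yes_rate) ** N * p_A)

for N in (1, 10, 50, 100):
    print(N, float(confidence(N)))

# Agrees with the closed form 1 - 99/(2^N + 99) derived via Bayes's formula.
assert confidence(100) == 1 - Fraction(99, 2**100 + 99)
```

Exact rational arithmetic matters here: at N = 100 the bound differs from 1 by far less than floating-point precision can represent.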
  • 368. 5.3. Probability Theory 239 Definition. Let X : Ω → R be a random variable. The probability density function of X, denoted by fX(x), is defined to be fX(x) = Pr(X = x). In other words, fX (x) is the probability that X takes on the value x. Some- times we write f(x) if the random variable is clear. Remark 5.27. In probability theory, people often use the distribution function of X, which is the function FX(x) = Pr(X ≤ x), instead of the density function. Indeed, when studying probability theory for infinite sample spaces, it is essential to use FX. However, since our sample spaces are finite, and thus our random variables are finite and discrete, the two notions are essentially interchangeable. For simplicity, we will stick to density functions. There are a number of standard density functions that occur frequently in discrete probability calculations. We briefly describe a few of the more common ones. Example 5.28 (Uniform Distribution). Let S be a set containing N elements; for example, S could be the set S = {0, 1, . . . , N − 1}. Let X be a random variable satisfying fX(j) = Pr(X = j) = ⎧ ⎨ ⎩ 1 N if j ∈ S, 0 if j / ∈ S. This random variable X is said to be uniformly distributed or to have uniform density, since each of the outcomes in S is equally likely. Example 5.29 (Binomial Distribution). Suppose that an experiment has two outcomes, success or failure. Let p denote the probability of success. The experiment is performed n times and the random variable X records the number of successes. The sample space Ω consists of all binary strings ω = b1b2 . . . bn of length n, where bi = 0 if the i’th experiment is a failure and bi = 1 if the i’th experiment is a success. Then the value of the random variable X at ω is simply X(ω) = b1+b2+· · ·+bn, which is the number of successes. Using the random variable X, we can express the probability of a single event ω as Pr({ω}) = pX(ω) (1 − p)n−X(ω) . (Do you see why this is the correct formula?) 
This allows us to compute the probability of exactly k successes as

f_X(k) = Pr(X = k) = Σ_{ω∈Ω, X(ω)=k} Pr({ω})
       = Σ_{ω∈Ω, X(ω)=k} p^X(ω) (1 − p)^(n−X(ω))
       = Σ_{ω∈Ω, X(ω)=k} p^k (1 − p)^(n−k)
       = #{ω ∈ Ω : X(ω) = k} · p^k (1 − p)^(n−k)
       = C(n, k) p^k (1 − p)^(n−k).

Here the last line follows from the fact that there are C(n, k) ways to select exactly k of the n experiments to be successes. The function

f_X(k) = C(n, k) p^k (1 − p)^(n−k)   (5.23)

is called the binomial density function.

Example 5.30 (Hypergeometric Distribution). An urn contains N balls of which m are red and N − m are blue. From this collection, n balls are chosen at random without replacement. Let X denote the number of red balls chosen. Then X is a random variable taking on the integer values 0 ≤ X(ω) ≤ min{m, n}. In the case that n ≤ m, an argument similar to the one that we gave in Example 5.20 shows that the density function of X is given by the formula

f_X(i) = Pr(X = i) = C(m, i) · C(N − m, n − i) / C(N, n).   (5.24)

This is called the hypergeometric density function.

Example 5.31 (Geometric Distribution). We give one example of an infinite probability space. Suppose that we repeatedly toss an unfair coin, where the probability of getting heads is some number 0 < p < 1. Let X be the random variable giving the total number of coin tosses required before heads appears for the first time. Note that it is possible for X to take on any positive integer value, since it is possible (although unlikely) that we could have a tremendously long string of tails.12 The sample space Ω consists of all binary strings ω = b1b2b3 . . ., where bi = 0 if the i'th toss is tails and bi = 1 if the i'th toss is heads. Note that Ω is

12 For an amusing commentary on long strings of heads, see Act I of Tom Stoppard's Rosencrantz and Guildenstern Are Dead.
an infinite set. We assign probabilities to certain events, i.e., to certain subsets of Ω, by specifying the values of some initial tosses. So for any given finite binary string γ1γ2 . . . γn, we assign a probability

Pr({ω ∈ Ω : ω starts γ1γ2 . . . γn}) = p^(# of γi equal to 1) · (1 − p)^(# of γi equal to 0).

The random variable X is defined by

X(ω) = X(b1b2b3 . . .) = (smallest i such that bi = 1).

Then

{X = n} = {ω ∈ Ω : X(ω) = n} = {00 . . . 0 1 bn+1 bn+2 . . . , with n − 1 initial zeros},

which gives the formula

f_X(n) = Pr(X = n) = (1 − p)^(n−1) p   for n = 1, 2, 3, . . . .   (5.25)

A random variable with the density function (5.25) is said to have a geometric density, because the sequence of probabilities f_X(1), f_X(2), f_X(3), . . . forms a geometric progression.13 Later, in Example 5.37, we compute the expected value of this X by summing an infinite geometric series.

Earlier we studied aspects of probability theory involving two or more events interacting in various ways. We now discuss material that allows us to study the interaction of two or more random variables.

Definition. Let X and Y be two random variables. The joint density function of X and Y, denoted by f_X,Y(x, y), is the probability that X takes the value x and Y takes the value y. Thus14

f_X,Y(x, y) = Pr(X = x and Y = y).

Similarly, the conditional density function, denoted by f_X|Y(x | y), is the probability that X takes the value x, given that Y takes the value y:

f_X|Y(x | y) = Pr(X = x | Y = y).

We say that X and Y are independent if

13 A sequence a1, a2, a3, . . . is called a geometric progression if all of the ratios an+1/an are the same. Similarly, the sequence is an arithmetic progression if all of the differences an+1 − an are the same.

14 Note that the expression Pr(X = x and Y = y) is really shorthand for the probability of the event {ω ∈ Ω : X(ω) = x and Y(ω) = y}. If you find yourself becoming confused about probabilities expressed in terms of values of random variables, it often helps to write them out explicitly in terms of an event, i.e., as the probability of a certain subset of Ω.
  • 376. 242 5. Combinatorics, Probability, and Information Theory fX,Y (x, y) = fX(x)fY (y) for all x and y. This is equivalent to the events {X = x} and {Y = y} being independent in the earlier sense of independence that is defined on page 232. If there is no chance for confusion, we sometimes write f(x, y) and f(x | y) for fX,Y (x, y) and fX|Y (x | y), respectively. Example 5.32. An urn contains four gold coins and three silver coins. A coin is drawn at random, examined, and returned to the urn, and then a second coin is randomly drawn and examined. Let X be the number of gold coins drawn and let Y be the number of silver ones. To find the joint density function fX,Y (x, y), we need to compute the probability of the event {X = x and Y = y}. To help explain the calculation, we define two additional random variables. Let F = 1 if first pick is gold, 0 if first pick is silver, and S = 1 if second pick is gold, 0 if second pick is silver. Notice that X = F + S and Y = 2 − X = 2 − F − S. Further, the random variables F and S are independent, and Pr(F = 1) = Pr(S = 1) = 4 7 . We can compute fX,Y (1, 1) as follows: fX,Y (1, 1) = Pr(X = 1 and Y = 1) = Pr(F = 1 and S = 0) + Pr(F = 0 and S = 1) = Pr(F = 1) · Pr(S = 0) + Pr(F = 0) · Pr(S = 1) = 4 7 · 3 7 + 3 7 · 4 7 = 24 49 ≈ 0.4898. In other words, the probability of drawing one gold coin and one silver coin is about 0.4898. The computation of the other values of fX,Y is similar. These computations were easy because F and S are independent. How do our computations change if the first coin is not replaced before the second coin is selected? Then the probability of getting a silver coin on the second pick depends on whether the first pick was gold or silver. For example, the earlier computation of fX,Y (1, 1) changes to fX,Y (1, 1) = Pr(X = 1 and Y = 1) = Pr(F = 1 and S = 0) + Pr(F = 0 and S = 1) = Pr(S = 0 | F = 1) Pr(F = 1) + Pr(S = 1 | F = 0) Pr(F = 0) = 3 6 · 4 7 + 4 6 · 3 7 = 4 7 ≈ 0.5714. 
Thus the chance of getting exactly one gold coin and exactly one silver coin is somewhat larger if the coins are not replaced after each pick. We remark that this last computation is a special case of the hypergeometric distribution; see Example 5.30. Thus the value f_X,Y(1, 1) = 4/7 may be computed using (5.24) with N = 7, m = 4, n = 2, and i = 1, which yields

C(4, 1) · C(3, 1) / C(7, 2) = 12/21 = 4/7.
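The values computed in Example 5.32 can be confirmed by enumerating ordered pairs of picks, both with and without replacement. This sketch is ours, not the authors':

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb

coins = "GGGGSSS"  # 4 gold coins, 3 silver coins

# With replacement: ordered pairs of independent picks, 49 outcomes.
with_rep = list(product(coins, repeat=2))
mixed_rep = sum(1 for a, b in with_rep if {a, b} == {"G", "S"})
print(Fraction(mixed_rep, len(with_rep)))  # f_{X,Y}(1,1) with replacement

# Without replacement: ordered pairs of distinct coins, 42 outcomes.
no_rep = list(permutations(range(7), 2))
mixed = sum(1 for i, j in no_rep if {coins[i], coins[j]} == {"G", "S"})
print(Fraction(mixed, len(no_rep)))        # f_{X,Y}(1,1) without replacement

# The without-replacement value also follows from the hypergeometric
# formula (5.24) with N = 7, m = 4, n = 2, i = 1.
assert Fraction(comb(4, 1) * comb(3, 1), comb(7, 2)) == Fraction(mixed, len(no_rep))
```

The enumeration reproduces 24/49 with replacement and 4/7 without, matching the two computations in the example.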
The following restatement of Bayes's formula is often convenient for calculations involving conditional probabilities.

Theorem 5.33 (Bayes's formula). Let X and Y be random variables and assume that f_Y(y) > 0. Then

f_X|Y(x | y) = f_X(x) f_Y|X(y | x) / f_Y(y).

In particular,

X and Y are independent ⟺ f_X|Y(x | y) = f_X(x) for all x and y.

Example 5.34. In this example we use Bayes's formula to explore the independence of pairs of random variables taken from a triple (X, Y, Z). Let X and Y be independent random variables taking on values +1 and −1 with probability 1/2 each, and let Z = XY. Then Z also takes on the values +1 and −1, and we have

f_Z(1) = Σ_{x∈{−1,+1}} Σ_{y∈{−1,+1}} Pr(Z = 1 | X = x and Y = y) · f_X,Y(x, y).   (5.26)

If (X, Y) = (+1, −1) or (X, Y) = (−1, +1), then Z = −1, so only the two terms with (x, y) = (1, 1) and (x, y) = (−1, −1) appear in the sum (5.26). For these two terms, we have Pr(Z = 1 | X = x and Y = y) = 1, so

f_Z(1) = Pr(X = 1 and Y = 1) + Pr(X = −1 and Y = −1) = (1/2)·(1/2) + (1/2)·(1/2) = 1/2.

It follows that f_Z(−1) = 1 − f_Z(1) is also equal to 1/2. Next we compute the joint probability density of Z and X. For example,

f_Z,X(1, 1) = Pr(Z = 1 and X = 1)
            = Pr(X = 1 and Y = 1)
            = 1/4   since X and Y are independent,
            = f_Z(1) f_X(1).

Similar computations show that f_Z,X(z, x) = f_Z(z) f_X(x) for all z, x ∈ {−1, +1}, so by Theorem 5.33, Z and X are independent. The argument works equally well for Z and Y, so Z and Y are also independent. Thus among the three random variables X, Y, and Z, any pair of them are independent. Yet we would not want to call the three of them together an independent family, since the value of Z is determined by the values of X and Y. This prompts the following definition.
Definition. A family of two or more random variables {X1, X2, . . . , Xn} is independent if the events

{X1 = x1}, {X2 = x2}, . . . , {Xn = xn}

are independent for every choice of x1, x2, . . . , xn. Notice that the random variables X, Y and Z = XY in Example 5.34 are not an independent family, since

Pr(Z = 1 and X = 1 and Y = −1) = 0, while Pr(Z = 1) · Pr(X = 1) · Pr(Y = −1) = 1/8.

5.3.5 Expected Value

The expected value of a random variable X is the average of its values weighted by their probability of occurrence. The expected value thus provides a rough initial indication of the behavior of X.

Definition. Let X be a random variable that takes on the values x1, . . . , xn. The expected value (or mean) of X is the quantity

E(X) = Σ_{i=1}^{n} x_i · f_X(x_i) = Σ_{i=1}^{n} x_i · Pr(X = x_i).   (5.27)

Example 5.35. Let X be the random variable whose value is the sum of the numbers appearing on two tossed dice. The possible values of X are the integers between 2 and 12, so

E(X) = Σ_{i=2}^{12} i · Pr(X = i).

There are 36 ways for the two dice to fall, as indicated in Table 5.8a. We read off from that table the number of ways that the sum can equal i for each value of i between 2 and 12 and compile the results in Table 5.8b. The probability that X = i is 1/36 times the total number of ways that two dice can sum to i, so we can use Table 5.8b to compute

E(X) = 2·(1/36) + 3·(2/36) + 4·(3/36) + 5·(4/36) + 6·(5/36) + 7·(6/36) + 8·(5/36) + 9·(4/36) + 10·(3/36) + 11·(2/36) + 12·(1/36) = 7.

This answer makes sense, since the middle value is 7, and for any integer j, the value of X is just as likely to be 7 + j as it is to be 7 − j.
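The computation in Example 5.35 is easy to reproduce straight from definition (5.27) by building the density function of the sum. The sketch is ours, not from the text:

```python
from fractions import Fraction
from itertools import product

# Density function of X = sum of two fair dice: 36 equally likely rolls.
density = {}
for a, b in product(range(1, 7), repeat=2):
    density[a + b] = density.get(a + b, 0) + Fraction(1, 36)

# E(X) = sum over values x of x * f_X(x), as in (5.27).
expected = sum(x * f for x, f in density.items())
print(expected)  # 7
```

Since the 36 outcomes are exhaustive, the density values necessarily sum to 1, and the weighted average lands exactly on 7.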
Table 5.8: Outcome of rolling two dice

(a) Sum of two dice:

        1   2   3   4   5   6
    1   2   3   4   5   6   7
    2   3   4   5   6   7   8
    3   4   5   6   7   8   9
    4   5   6   7   8   9  10
    5   6   7   8   9  10  11
    6   7   8   9  10  11  12

(b) Number of ways to make a sum:

    Sum        # of ways
    2 or 12        1
    3 or 11        2
    4 or 10        3
    5 or 9         4
    6 or 8         5
    7              6

The name "expected" value is somewhat misleading, since the fact that the expectation E(X) is a weighted average means that it may take on a value that is not actually attained, as the next example shows.

Example 5.36. Suppose that we choose an integer at random from among the integers {1, 2, 3, 4, 5, 6} and let X be the value of our choice. Then Pr(X = i) = 1/6 for each 1 ≤ i ≤ 6, i.e., X is uniformly distributed. The expected value of X is

E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 7/2.

Thus the expectation of X is a value that X does not actually attain. More generally, the expected value of a random variable uniformly distributed on {1, 2, . . . , N} is (N + 1)/2.

Example 5.37. We return to our coin tossing experiment (Example 5.31), where the probability of getting H on any one coin toss is equal to p. Let X be the random variable that is equal to n if H appears for the first time at the nth coin toss. Then X has a geometric density, and its density function f_X(n) is given by the formula (5.25). We compute E(X), which is the expected number of tosses before the first H appears:

E(X) = Σ_{n=1}^{∞} n p (1 − p)^(n−1)
     = −p Σ_{n=1}^{∞} d/dp (1 − p)^n
     = −p d/dp Σ_{n=1}^{∞} (1 − p)^n
     = −p d/dp (1/p − 1)
     = p/p^2
     = 1/p.

This answer seems plausible, since the smaller the value of p, the more tosses we expect to need before obtaining our first H. The computation of E(X) uses a very useful trick with derivatives followed by the summation of a geometric series. See Exercise 5.33 for further applications of this method.
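The identity E(X) = 1/p for the geometric density can be checked numerically by truncating the infinite series, since the terms decay geometrically. A sketch (ours; the value p = 1/4 is an arbitrary choice for illustration):

```python
from fractions import Fraction

# Partial sums of E(X) = sum_{n>=1} n * p * (1-p)^(n-1) approach 1/p.
p = Fraction(1, 4)
partial = Fraction(0)
for n in range(1, 200):
    partial += n * p * (1 - p) ** (n - 1)

print(float(partial), float(1 / p))  # both ≈ 4.0
```

With 199 terms the truncation error is far below 10^-6, so the partial sum is numerically indistinguishable from 1/p = 4.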
5. Combinatorics, Probability, and Information Theory

5.4 Collision Algorithms and Meet-in-the-Middle Attacks

A simple, yet surprisingly powerful, search method is based on the observation that it is usually much easier to find matching objects than it is to find a particular object. Methods of this sort go by many names, including meet-in-the-middle attacks and collision algorithms.

5.4.1 The Birthday Paradox

The fundamental idea behind collision algorithms is strikingly illustrated by the famous birthday paradox. In a random group of 40 people, consider the following two questions:

(1) What is the probability that someone has the same birthday as you?
(2) What is the probability that at least two people share the same birthday?

It turns out that the answers to (1) and (2) are very different. As a warm-up, we start by answering the easier first question.

A rough answer is that since any one person has a 1-in-365 chance of sharing your birthday, then in a crowd of 40 people, the probability of someone having your birthday is approximately 40/365 ≈ 11 %. However, this is an overestimate, since it double counts the occurrences of more than one person in the crowd sharing your birthday.¹⁵ The exact answer is obtained by computing the probability that none of the people share your birthday and then subtracting that value from 1:

  Pr(someone has your birthday)
    = 1 − Pr(none of the 40 people has your birthday)
    = 1 − Π_{i=1}^{40} Pr(ith person does not have your birthday)
    = 1 − (364/365)^40
    ≈ 10.4 %.

Thus among 40 strangers, there is only slightly better than a 10 % chance that one of them shares your birthday.

Now consider the second question, in which you win if any two of the people in the group have the same birthday. Again it is easier to compute the probability that all 40 people have different birthdays. However, the computation changes because we now require that the ith person have a birthday that is different from all of the previous i − 1 people's birthdays. Hence the calculation is

  Pr(two people have the same birthday)
    = 1 − Pr(all 40 people have different birthdays)
    = 1 − Π_{i=1}^{40} Pr(ith person does not have the same birthday as any of the previous i − 1 people)
    = 1 − Π_{i=1}^{40} (365 − (i − 1))/365
    = 1 − (365/365) · (364/365) · (363/365) · · · (326/365)
    ≈ 89.1 %.

Thus among 40 strangers, there is almost a 90 % chance that two of them share a birthday. The only part of this calculation that merits some comment is the formula for the probability that the ith person has a birthday different from any of the previous i − 1 people. Among the 365 possible birthdays, note that the previous i − 1 people have taken up i − 1 of them. Hence the probability that the ith person has his or her birthday among the remaining 365 − (i − 1) days is (365 − (i − 1))/365.

Most people tend to assume that questions (1) and (2) have essentially the same answer. The fact that they do not is called the birthday paradox. In fact, it requires only 23 people to have a better than 50 % chance of a matched birthday, while it takes 253 people to have better than a 50 % chance of finding someone who has your birthday.

¹⁵If you think that 40/365 is the right answer, think about the same situation with 366 people. The probability that someone shares your birthday cannot be 366/365, since that's larger than 1.

5.4.2 A Collision Theorem

Cryptographic applications of collision algorithms are generally based on the following setup. Bob has a box that contains N numbers. He chooses n distinct numbers from the box and puts them in a list. He then makes a second list by choosing m (not necessarily distinct) numbers from the box. The remarkable fact is that if n and m are each slightly larger than √N, then it is very likely that the two lists contain a common element. We start with an elementary result that illustrates the sort of calculation that is used to quantify the probability of success of a collision algorithm.

Theorem 5.38 (Collision Theorem). An urn contains N balls, of which n are red and N − n are blue. Bob randomly selects a ball from the urn, replaces it in the urn, randomly selects a second ball, replaces it, and so on. He does this until he has looked at a total of m balls.
(a) The probability that Bob selects at least one red ball is

  Pr(at least one red) = 1 − (1 − n/N)^m.   (5.28)

(b) A lower bound for the probability (5.28) is

  Pr(at least one red) ≥ 1 − e^{−mn/N}.   (5.29)

If N is large and if m and n are not too much larger than √N (e.g., m, n ≤ 10√N), then (5.29) is almost an equality.

Proof. Each time Bob selects a ball, his probability of choosing a red one is n/N, so you might think that since he chooses m balls, his probability of getting a red one is mn/N. However, a small amount of thought shows that this must be incorrect. For example, if m is large, this would lead to a probability that is larger than 1. The difficulty, just as in the birthday example in Sect. 5.4.1, is that we are overcounting the times that Bob happens to select more than one red ball. The correct way to calculate is to compute the probability that Bob chooses only blue balls and then subtract this complementary probability from 1. Thus

  Pr(at least one red ball in m attempts)
    = 1 − Pr(all m choices are blue)
    = 1 − Π_{i=1}^{m} Pr(ith choice is blue)
    = 1 − Π_{i=1}^{m} (N − n)/N
    = 1 − (1 − n/N)^m.

This completes the proof of (a). For (b), we use the inequality

  e^{−x} ≥ 1 − x for all x ∈ ℝ.

(See Exercise 5.38(a) for a proof.) Setting x = n/N and raising both sides of the inequality to the mth power shows that

  1 − (1 − n/N)^m ≥ 1 − (e^{−n/N})^m = 1 − e^{−mn/N},

which proves the important inequality in (b). We leave it to the reader (Exercise 5.38(b)) to prove that the inequality is close to being an equality if m and n are not too large compared to √N.
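The birthday numbers quoted above are easy to reproduce; here is a short Python check (ours, not the book's):

```python
from math import prod

# Question (1): someone in a group of 40 shares *your* birthday.
p_yours = 1 - (364 / 365) ** 40
# Question (2): some two of the 40 share a birthday with each other.
p_pair = 1 - prod((365 - i) / 365 for i in range(40))
print(round(p_yours, 3), round(p_pair, 3))   # 0.104 0.891

# The crossover points mentioned in the text:
p23 = 1 - prod((365 - i) / 365 for i in range(23))    # 23 people, pair match
p253 = 1 - (364 / 365) ** 253                         # 253 people, your birthday
print(p23 > 0.5, p253 > 0.5)   # True True
```

Both thresholds barely clear 50 %, which is why the paradox feels so counterintuitive.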
In order to connect Theorem 5.38 with the problem of finding a match in two lists of numbers, we view the list of numbers as an urn containing N numbered blue balls. After making our first list of n different numbered balls, we repaint those n balls with red paint and return them to the box. The second list is constructed by drawing m balls out of the urn one at a time, noting their number and color, and then replacing them. The probability of selecting at least one red ball is the same as the probability of a matched number on the two lists.

Example 5.39. A deck of cards is shuffled and eight cards are dealt face up. Bob then takes a second deck of cards and chooses eight cards at random, replacing each chosen card before making the next choice. What is Bob's probability of matching one of the cards from the first deck? We view the eight dealt cards from the first deck as "marking" those same cards in the second deck. So our "urn" is the second deck, the "red balls" are the eight marked cards in the second deck, and the "blue balls" are the other 44 cards in the second deck. Theorem 5.38(a) tells us that

  Pr(a match) = 1 − (1 − 8/52)^8 ≈ 73.7 %.

The approximation in Theorem 5.38(b) gives a lower bound of 70.8 %. Suppose instead that Bob deals ten cards from the first deck and chooses only five cards from the second deck. Then

  Pr(a match) = 1 − (1 − 10/52)^5 ≈ 65.6 %.

Example 5.40. A box contains 10 billion labeled objects. Bob randomly selects 100,000 distinct objects from the box, makes a list of which objects he's chosen, and returns them to the box. If he next randomly selects another 100,000 objects (with replacement) and makes a second list, what is the probability that the two lists contain a match? Formula (5.28) in Theorem 5.38(a) says that

  Pr(a match) = 1 − (1 − 100,000/10^10)^100,000 ≈ 0.632122.

The approximate lower bound given by the formula (5.29) in Theorem 5.38(b) is 0.632121. As you can see, the approximation is quite accurate. It is interesting to observe that if Bob doubles the number of objects in his lists to 200,000, then his probability of getting a match increases quite substantially to 98.2 %. And if he triples the number of elements in each list to 300,000, then the probability of a match is 99.988 %. This rapid increase reflects the fact that the exponential function in (5.29) decreases very rapidly as soon as mn becomes larger than N.
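Both examples can be verified directly from formulas (5.28) and (5.29); a Python sketch (ours, not the book's):

```python
from math import exp

def pr_match(N, n, m):
    """Exact match probability, formula (5.28)."""
    return 1 - (1 - n / N) ** m

def pr_match_lb(N, n, m):
    """Lower bound, formula (5.29)."""
    return 1 - exp(-m * n / N)

# Example 5.39: 8 marked cards in a 52-card deck, 8 draws with replacement.
print(pr_match(52, 8, 8), pr_match_lb(52, 8, 8))   # ≈ 0.737 and ≈ 0.708

# Example 5.40: N = 10^10 with both lists of size 100,000, then 200,000, 300,000.
for m in (10**5, 2 * 10**5, 3 * 10**5):
    print(pr_match(10**10, m, m))   # ≈ 0.632, then ≈ 0.982, then ≈ 0.99988
```

Note how quickly the probability climbs once mn exceeds N, exactly as the exponential in (5.29) predicts.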
Example 5.41. A set contains N objects. Bob randomly chooses n of them, makes a list of his choices, replaces them, and then chooses another n of them. How large should he choose n to give himself a 50 % chance of getting a match? How about if he wants a 99.99 % chance of getting a match? For the first question, Bob uses the reasonably accurate lower bound of formula (5.29) to set

  Pr(match) ≈ 1 − e^{−n²/N} = 1/2.

It is easy to solve this for n:

  e^{−n²/N} = 1/2  ⟹  −n²/N = ln(1/2)  ⟹  n = √(N · ln 2) ≈ 0.83 √N.

Thus it is enough to create lists that are a bit shorter than √N in length. The second question is similar, but now Bob solves

  Pr(match) ≈ 1 − e^{−n²/N} = 0.9999 = 1 − 10^{−4}.

The solution is n = √(N · ln 10^4) ≈ 3.035 · √N.

Remark 5.42. Algorithms that rely on finding matching elements from within one or more lists go by a variety of names, including collision algorithm, meet-in-the-middle algorithm, birthday paradox algorithm, and square root algorithm. The last refers to the fact that the running time of a collision algorithm is generally a small multiple of the square root of the running time required by an exhaustive search. The connection with birthdays was briefly discussed in Sect. 5.4.1; see also Exercise 5.36. When one of these algorithms is used to break a cryptosystem, the word "algorithm" is often replaced by the word "attack," so cryptanalysts refer to meet-in-the-middle attacks, square root attacks, etc.

Remark 5.43. Collision algorithms tend to take approximately √N steps in order to find a collision among N objects. A drawback of these algorithms is that they require creation of one or more lists of size approximately √N. When N is large, providing storage for √N numbers may be more of an obstacle than doing the computation. In Sect. 5.5 we describe a collision method due to Pollard that, at the cost of a small amount of extra computation, requires essentially no storage.

5.4.3 A Discrete Logarithm Collision Algorithm

There are many applications of collision algorithms to cryptography. These may involve searching a space of keys or plaintexts or ciphertexts, or for public key cryptosystems, they may be aimed at solving the underlying hard mathematical problem. In this section we illustrate the general theory by formulating an abstract randomized collision algorithm to solve the discrete logarithm problem. For the finite field F_p, it solves the discrete logarithm problem (DLP) in approximately √p steps.

One may well ask why the probabilistic collision algorithm described in Proposition 5.44, with expected running time O(√N), is interesting, since the baby step–giant step algorithm from Sect. 2.7 is deterministic and solves the same problem in the same amount of time. One answer is that both algorithms also require O(√N) storage, which is a serious constraint if N is large. So the collision algorithm in Proposition 5.44 may be viewed as a warm-up for Pollard's ρ algorithm, which is a collision algorithm taking O(√N) time, but using only O(1) storage. We will discuss Pollard's algorithm in Sect. 5.5. One might also inquire why any of these O(√N) collision algorithms are interesting, since the index calculus described in Sect. 3.8 solves the DLP in F_p much more rapidly. But there are other groups, such as elliptic curve groups, for which collision algorithms are the fastest known way to solve the DLP. This explains why elliptic curve groups are used in cryptography; at present, the DLP in an elliptic curve group is much harder than the DLP in F*_p for groups of about the same size. Elliptic curves and their use in cryptography are the subject of Chap. 6.

Proposition 5.44. Let G be a group, and let g ∈ G be an element of order N, i.e., g^N = e and no smaller power of g is equal to e. Then, assuming that the discrete logarithm problem

  g^x = h   (5.30)

has a solution, a solution can be found in O(√N) steps, where each step is an exponentiation in the group G. (Note that since g^N = e, the powering algorithm from Sect. 1.3.2 lets us raise g to any power using fewer than 2 log₂ N group multiplications.)

Proof. The idea is to write x as x = y − z and look for a solution to

  g^y = h · g^z.

We do this by making a list of g^y values and a list of h · g^z values and looking for a match between the two lists. We begin by choosing random exponents y_1, y_2, . . . , y_n between 1 and N and computing the values

  g^{y_1}, g^{y_2}, g^{y_3}, . . . , g^{y_n} in G.   (5.31)

Note that all of the values (5.31) are in the set

  S = {1, g, g², g³, . . . , g^{N−1}},

so (5.31) is a selection of (approximately) n elements of S. In terms of the collision theorem (Theorem 5.38), we view S as an urn containing N balls and the list (5.31) as a way of coloring n of those balls red. Next we choose additional random exponents z_1, z_2, . . . , z_n between 1 and N and compute the quantities

  h · g^{z_1}, h · g^{z_2}, h · g^{z_3}, . . . , h · g^{z_n} in G.   (5.32)

Since we are assuming that (5.30) has a solution, i.e., h is equal to some power of g, it follows that each of the values h · g^{z_i} is also in the set S. Thus the list (5.32) may be viewed as selecting n elements from the urn, and we would like to know the probability of selecting at least one red ball, i.e., the probability that at least one element in the list (5.32) matches an element in the list (5.31). The collision theorem (Theorem 5.38) says that

  Pr(at least one match between (5.31) and (5.32)) ≈ 1 − (1 − n/N)^n ≈ 1 − e^{−n²/N}.

Thus if we choose (say) n ≈ 3√N, then our probability of getting a match is greater than 99.98 %, so we are almost guaranteed a match. Or if that is not good enough, take n ≈ 5√N to get a probability of success greater than 1 − 10^{−10}. Notice that as soon as we find a match between the two lists, say g^y = h · g^z, then we have solved the discrete logarithm problem (5.30) by setting x = y − z.¹⁶

How long does it take us to find this solution? Each of the lists (5.31) and (5.32) has n elements, so it takes approximately 2n steps to assemble each list. More precisely, each element in each list requires us to compute g^i for some value of i between 1 and N, and it takes approximately 2 log₂(i) group multiplications to compute g^i using the fast exponentiation algorithm described in Sect. 1.3.2. (Here log₂ is the logarithm to the base 2.) Thus it takes approximately 4n log₂(N) multiplications to assemble the two lists. In addition, it takes about log₂(n) steps to check whether an element of the second list is in the first list (e.g., sort the first list), so n log₂(n) comparisons altogether. Hence the total computation time is approximately

  4n log₂(N) + n log₂(n) = n log₂(N⁴n)

steps. Taking n ≈ 3√N, which as we have seen gives us a 99.98 % chance of success, we find that

  Computation Time ≈ 13.5 · √N · log₂(1.3 · N).

¹⁶If this value of x happens to be negative and we want a positive solution, we can always use the fact that g^N = 1 to replace it with x = y − z + N.
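The two-list algorithm of Proposition 5.44 is only a few lines of code. Here is our Python sketch (not from the text), applied to the toy problem 2^x = 390 in F_659 that the next example works by hand; the match-finding uses a hash table instead of sorting:

```python
import random

def collision_dlp(g, h, p, N, rng=random.Random(1)):
    """Solve g^x = h in F_p, where g has order N, by matching a list of
    g^y values against a list of h*g^z values (Proposition 5.44)."""
    n = 3 * int(N ** 0.5) + 1                # list length ~ 3*sqrt(N)
    while True:                              # retry in the (rare) no-match case
        ys = {pow(g, y, p): y for y in (rng.randrange(1, N) for _ in range(n))}
        for _ in range(n):
            z = rng.randrange(1, N)
            y = ys.get(h * pow(g, z, p) % p)  # does h*g^z match some g^y?
            if y is not None:
                return (y - z) % N            # x = y - z (mod N)

x = collision_dlp(2, 390, 659, 658)
print(x, pow(2, x, 659))   # 177 390
```

Since 2 is a primitive root modulo 659, the solution is unique modulo 658, so the algorithm always recovers x = 177.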
    t   g^t  390·g^t |   t   g^t  390·g^t |   t   g^t  390·g^t
  564   410    422   |  53    10    605   | 513   164     37
  469   357    181   | 332   651    175   |  71   597    203
  276   593    620   | 178   121    401   | 314   554    567
  601   416    126   | 477   450    206   | 581    47    537
    9   512      3   | 503   116    428   | 371   334    437
  350   445    233   | 198   426     72   |  83   422    489

Table 5.9: Solving 2^x = 390 in F_659 with random exponent collisions

Example 5.45. We do an example with small numbers to illustrate the use of collisions. We solve the discrete logarithm problem

  2^x = 390 in the finite field F_659.

The number 2 has order 658 modulo 659, so it is a primitive root. In this example g = 2 and h = 390. We choose random exponents t and compute the values of g^t and h · g^t until we get a match. The results are compiled in Table 5.9. We see that

  2^{83} = 390 · 2^{564} = 422 in F_659.

Hence using two lists of length 18, we have solved a discrete logarithm problem in F_659. (We had a 39 % chance of getting a match with lists of length 18, so we were a little bit lucky.) The solution is

  2^{83} · 2^{−564} = 2^{−481} = 2^{177} = 390 in F_659.

Remark 5.46. The algorithms described in Propositions 2.21 and 5.44 solve the DLP in O(√N) steps. It is thus interesting that, in a certain sense, Victor Shoup [130] has shown that there cannot exist a general algorithm to solve the DLP in an arbitrary finite group in fewer than O(√p) steps, where p is the largest prime dividing the order of the group. This is the so-called black box DLP, in which you are given a box that instantaneously performs the group operations, but you're not allowed to look inside the box to see how it is doing the computations.

5.5 Pollard's ρ Method

As we noted in Remark 5.43, collision algorithms tend to require a considerable amount of storage. A beautiful idea of Pollard often allows one to use almost no storage, at the cost of a small amount of extra computation. We explain
the basic idea behind Pollard's method and then illustrate it by yet again solving a small instance of the discrete logarithm problem in F_p. See also Exercise 5.44 for a factorization algorithm based on the same ideas.

5.5.1 Abstract Formulation of Pollard's ρ Method

We begin in an abstract setting. Let S be a finite set and let

  f : S → S

be a function that does a good job at mixing up the elements of S. Suppose that we start with some element x ∈ S and we repeatedly apply f to create a sequence of elements

  x_0 = x, x_1 = f(x_0), x_2 = f(x_1), x_3 = f(x_2), x_4 = f(x_3), . . . .

In other words, x_i = (f ∘ f ∘ f ∘ · · · ∘ f)(x), with i iterations of f. The map f from S to itself is an example of a discrete dynamical system. The sequence

  x_0, x_1, x_2, x_3, x_4, . . .   (5.33)

is called the (forward) orbit of x by the map f and is denoted by O⁺_f(x). The set S is finite, so eventually there must be some element of S that appears twice in the orbit O⁺_f(x). We can illustrate the orbit as shown in Fig. 5.1.

[Figure 5.1: Pollard's ρ method. The points x_0, x_1, . . . , x_{T−1} form a tail of length T; the points x_T, x_{T+1}, . . . , x_{T+M−1} form a loop of length M, with x_{T+M} = x_T.]

For a while the points x_0, x_1, x_2, x_3, . . . travel along a "path" without repeating until eventually they loop around to give a repeated element. Then they continue moving around the loop. As illustrated, we let T be the number of elements in the "tail" before getting to the loop, and we let M be the number of elements in the loop. Mathematically, T and M are defined by the conditions

  T = largest integer such that x_{T−1} appears only once in O⁺_f(x),
  M = smallest integer such that x_{T+M} = x_T.

Remark 5.47. Look again at the illustration in Fig. 5.1. It may remind you of a certain Greek letter. For this reason, collision algorithms based on following the orbit of an element in a discrete dynamical system are called ρ algorithms. The first ρ algorithm was invented by Pollard in 1974.

Suppose that S contains N elements. Later, in Theorem 5.48, we will sketch a proof that the quantity T + M is usually no more than a small multiple of √N. Since x_T = x_{T+M} by definition, this means that we obtain a collision in O(√N) steps. However, since we don't know the values of T and M, it appears that we need to make a list of x_0, x_1, x_2, x_3, . . . , x_{T+M} in order to detect the collision. Pollard's clever idea is that it is possible to detect a collision in O(√N) steps without storing all of the values. There are various ways to accomplish this. We describe one such method. Although not of optimal efficiency, it has the advantage of being easy to understand. (For more efficient methods, see [23, 28, §8.5], or [90].)

The idea is to compute not only the sequence x_i, but also a second sequence y_i defined by

  y_0 = x_0 and y_{i+1} = f(f(y_i)) for i = 0, 1, 2, 3, . . . .

In other words, every time that we apply f to generate the next element of the x_i sequence, we apply f twice to generate the next element of the y_i sequence. It is clear that y_i = x_{2i}. How long will it take to find an index i with x_{2i} = x_i? In general, for j > i we have

  x_j = x_i if and only if i ≥ T and j ≡ i (mod M).

This is clear from the ρ-shaped picture in Fig. 5.1, since we get x_j = x_i precisely when we are past x_T, i.e., when i ≥ T, and x_j has gone around the loop past x_i an integral number of times, i.e., when j − i is a multiple of M. Thus

  x_{2i} = x_i if and only if i ≥ T and 2i ≡ i (mod M).

The latter condition is equivalent to M | i, so we get x_{2i} = x_i exactly when i is equal to the first multiple of M that is larger than T. Since one of the numbers T, T + 1, . . . , T + M − 1 is divisible by M, this proves that x_{2i} = x_i for some 1 ≤ i < T + M.
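The two-sequence trick above is often called Floyd's "tortoise and hare" cycle detection. A generic Python sketch (ours, not the book's; the mixing map f is an arbitrary choice for illustration):

```python
def floyd(f, x0):
    """Return (i, x_i) with x_{2i} = x_i, storing only two current values."""
    tortoise, hare, i = f(x0), f(f(x0)), 1
    while tortoise != hare:
        tortoise = f(tortoise)    # advances to x_{i+1}
        hare = f(f(hare))         # advances to x_{2(i+1)}
        i += 1
    return i, tortoise

# A toy dynamical system on S = {0, 1, ..., 1000}.
f = lambda x: (x * x + 1) % 1001
i, xi = floyd(f, 0)

# Verify the collision by direct iteration.
def iterate(f, x, n):
    for _ in range(n):
        x = f(x)
    return x

assert iterate(f, 0, i) == iterate(f, 0, 2 * i)
print(i, xi)
```

Only the two current values are ever stored, which is the whole point of the ρ method.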
We show in the next theorem that the average value of T + M is approximately 1.25 · √N, so we have a very good chance of getting a collision in a small multiple of √N steps. This is more or less the same running time as the collision algorithm described in Sect. 5.4.3, but notice that we need to store only two numbers, namely the current values of the x_i sequence and the y_i sequence.

Theorem 5.48 (Pollard's ρ Method: abstract version). Let S be a finite set containing N elements, let f : S → S be a map, and let x ∈ S be an initial point.
(a) Suppose that the forward orbit O⁺_f(x) = {x_0, x_1, x_2, . . .} of x has a tail of length T and a loop of length M, as illustrated in Fig. 5.1. Then

  x_{2i} = x_i for some 1 ≤ i < T + M.   (5.34)

(b) If the map f is sufficiently random, then the expected value of T + M is

  E(T + M) ≈ 1.2533 · √N.

Hence if N is large, then we are likely to find a collision as described by (5.34) in O(√N) steps, where a "step" is one evaluation of the function f.

Proof. (a) We proved this earlier in this section.
(b) We sketch the proof of (b) because it is an instructive blend of probability theory and analysis of algorithms. However, the reader desiring a rigorous proof will need to fill in some details. Suppose that we compute the first k values x_0, x_1, x_2, . . . , x_{k−1}. What is the probability that we do not get any matches? If we assume that the successive x_i's are randomly chosen from the set S, then we can compute this probability as

  Pr(x_0, x_1, . . . , x_{k−1} are all different)
    = Π_{i=1}^{k−1} Pr(x_i ≠ x_j for all 0 ≤ j < i | x_0, x_1, . . . , x_{i−1} are all different)   (5.35)
    = Π_{i=1}^{k−1} (N − i)/N.   (5.36)

Note that the probability formula (5.35) comes from the fact that if the first i choices x_0, x_1, . . . , x_{i−1} are distinct, then among the N possible choices for x_i, exactly N − i of them are different from the previously chosen values. Hence the probability of getting a new value, assuming that the earlier values were distinct, is (N − i)/N.

We can approximate the product (5.36) using the estimate 1 − t ≈ e^{−t}, valid for small values of t. (Compare with the proof of Theorem 5.38(b), and see also Exercise 5.38.) In practice, k will be approximately √N and N will be large, so i/N will indeed be small for 1 ≤ i < k. Hence

  Pr(x_0, x_1, . . . , x_{k−1} are all different) ≈ Π_{i=1}^{k−1} e^{−i/N} = e^{−(1+2+···+(k−1))/N} ≈ e^{−k²/2N}.   (5.37)

For the last approximation we are using the fact that

  1 + 2 + · · · + (k − 1) = (k² − k)/2 ≈ k²/2 when k is large.

We now know the probability that x_0, x_1, . . . , x_{k−1} are all distinct. Assuming that they are distinct, what is the probability that the next choice x_k gives a match? There are k elements for it to match among the N possible elements, so this conditional probability is

  Pr(x_k is a match | x_0, . . . , x_{k−1} are distinct) = k/N.   (5.38)

Hence

  Pr(x_k is the first match)
    = Pr(x_k is a match AND x_0, . . . , x_{k−1} are distinct)
    = Pr(x_k is a match | x_0, . . . , x_{k−1} are distinct) · Pr(x_0, . . . , x_{k−1} are distinct)
    ≈ (k/N) · e^{−k²/2N}

from (5.37) and (5.38). The expected number of steps before finding the first match is then given by the formula

  E(first match) = Σ_{k≥1} k · Pr(x_k is the first match) ≈ Σ_{k≥1} (k²/N) · e^{−k²/2N}.   (5.39)

We want to know what this series looks like as a function of N. The following estimate, whose derivation uses elementary calculus, is helpful in estimating series of this sort.

Lemma 5.49. Let F(t) be a "nicely behaved" real valued function¹⁷ with the property that ∫₀^∞ F(t) dt converges. Then for large values of n we have

  Σ_{k=1}^∞ F(k/n) ≈ n · ∫₀^∞ F(t) dt.   (5.40)

¹⁷For example, it would suffice that F have a continuous derivative.
Proof. We start with the definite integral of F(t) over an interval 0 ≤ t ≤ A. By definition, this integral is equal to a limit of Riemann sums,

  ∫₀^A F(t) dt = lim_{n→∞} Σ_{k=1}^{An} F(k/n) · (1/n),

where in the sum we have broken the interval [0, A] into An pieces. In particular, if n is large, then

  n · ∫₀^A F(t) dt ≈ Σ_{k=1}^{An} F(k/n).

Now letting A → ∞ yields (5.40). (We do not claim that this is a rigorous argument. Our aim is merely to convey the underlying idea. The interested reader may supply the details needed to complete the argument and to obtain explicit upper and lower bounds.)

We use Lemma 5.49 to estimate

  E(first match) ≈ Σ_{k≥1} (k²/N) · e^{−k²/2N}        from (5.39),
                 = Σ_{k≥1} F(k/√N)                    letting F(t) = t² e^{−t²/2},
                 ≈ √N · ∫₀^∞ t² e^{−t²/2} dt          from (5.40) with n = √N,
                 ≈ 1.2533 · √N                        by numerical integration.

For the last line, we used a numerical method to estimate the definite integral, although in fact the integral can be evaluated exactly. (Its value turns out to be √(π/2); see Exercise 5.43.) This completes the proof of (b), and combining (a) and (b) gives the final statement of Theorem 5.48.

Remark 5.50. It is instructive to check numerically the accuracy of the estimates used in the proof of Theorem 5.48. In that proof we claimed that for large values of N, the expected number of steps before finding a match is given by each of the following three formulas:

  E₁ = Σ_{k≥1} (k²/N) Π_{i=1}^{k−1} (1 − i/N),
  E₂ = Σ_{k≥1} (k²/N) e^{−k²/2N},
  E₃ = √N · ∫₀^∞ t² e^{−t²/2} dt.

More precisely, E₁ is the exact formula, but hard to compute exactly if N is very large, while E₂ and E₃ are approximations. We have computed the values of E₁, E₂, and E₃ for some moderate sized values of N and compiled the results in Table 5.10. As you can see, E₂ and E₃ are quite close to one another, and once N gets reasonably large, they also provide a good approximation for E₁. Hence for very large values of N, say 2^80 < N < 2^160, it is quite reasonable to estimate E₁ using E₃.
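The entries of Table 5.10 can be regenerated with a few lines of Python (our sketch, not the book's; for E₃ we use the exact value ∫₀^∞ t²e^{−t²/2} dt = √(π/2)):

```python
from math import exp, pi, sqrt

def E1(N):
    """Exact expectation: sum of k * (k/N) * prod_{i<k} (1 - i/N)."""
    total, p_distinct = 0.0, 1.0
    for k in range(1, N + 2):
        total += k * (k / N) * p_distinct
        p_distinct *= max(0.0, 1 - k / N)   # extend the running product
    return total

def E2(N):
    """Approximation: sum of (k^2/N) * exp(-k^2 / 2N)."""
    return sum(k * k / N * exp(-k * k / (2 * N))
               for k in range(1, 12 * int(sqrt(N))))   # tail is negligible

def E3(N):
    """sqrt(N) times the integral, i.e. sqrt(N * pi / 2)."""
    return sqrt(N * pi / 2)

for N in (100, 1000, 10000):
    print(N, round(E1(N), 3), round(E2(N), 3), round(E3(N), 3))
```

The printed rows reproduce the corresponding rows of Table 5.10.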
5.5.2 Discrete Logarithms via Pollard's ρ Method

In this section we describe how to use Pollard's ρ method to solve the discrete logarithm problem

  g^t = h in F*_p

when g is a primitive root modulo p.

    N      E₁       E₂       E₃      E₁/E₃
   100     12.210   12.533   12.533  0.97421
   500     27.696   28.025   28.025  0.98827
  1000     39.303   39.633   39.633  0.99167
  5000     88.291   88.623   88.623  0.99626
  10000   124.999  125.331  125.331  0.99735
  20000   176.913  177.245  177.245  0.99812
  50000   279.917  280.250  280.250  0.99881

Table 5.10: Expected number of steps until a ρ collision

The idea is to find a collision between g^i h^j and g^k h^ℓ for some known exponents i, j, k, ℓ. Then g^{i−k} = h^{ℓ−j}, and taking roots in F_p will more or less solve the problem of expressing h as a power of g. The difficulty is finding a function f : F_p → F_p that is complicated enough to mix up the elements of F_p, yet simple enough to keep track of its orbits. Pollard [104] suggests using the function

  f(x) = { g·x  if 0 ≤ x < p/3,
           x²   if p/3 ≤ x < 2p/3,
           h·x  if 2p/3 ≤ x < p.     (5.41)

Note that x must be reduced modulo p into the range 0 ≤ x < p before (5.41) is used to determine the value of f(x).

Remark 5.51. No one has proven that the function f(x) given by (5.41) is sufficiently random to guarantee that Theorem 5.48 is true for f, but experimentally, the function f works fairly well. However, Teske [144, 145] has shown that f is not sufficiently random to give optimal results, and she gives examples of somewhat more complicated functions that work better in practice.

Consider what happens when we repeatedly apply the function f given by (5.41) to the starting point x_0 = 1. At each step, we either multiply by g, multiply by h, or square the previous value. So after each step, we end up with a power of g multiplied by a power of h; say after i steps we have

  x_i = (f ∘ f ∘ f ∘ · · · ∘ f)(1) = g^{α_i} · h^{β_i},

with i iterations of f. We cannot predict the values of α_i and β_i, but we can compute them at the same time that we are computing the x_i's using the definition (5.41) of f. Clearly α_0 = β_0 = 0, and then subsequent values are given by

  α_{i+1} = { α_i + 1  if 0 ≤ x_i < p/3,        β_{i+1} = { β_i      if 0 ≤ x_i < p/3,
              2α_i     if p/3 ≤ x_i < 2p/3,                2β_i     if p/3 ≤ x_i < 2p/3,
              α_i      if 2p/3 ≤ x_i < p,                  β_i + 1  if 2p/3 ≤ x_i < p.

In computing α_i and β_i, it suffices to keep track of their values modulo p − 1, since g^{p−1} = 1 and h^{p−1} = 1. This is important, since otherwise the values of α_i and β_i would become prohibitively large. In a similar fashion we compute the sequence given by

  y_0 = 1 and y_{i+1} = f(f(y_i)).

Then

  y_i = x_{2i} = g^{γ_i} · h^{δ_i},

where the exponents γ_i and δ_i can be computed by two repetitions of the recursions used for α_i and β_i. Of course, the first time we use y_i to determine which case of (5.41) to apply, and the second time we use f(y_i) to decide.

Applying the above procedure, we eventually find a collision in the x and the y sequences, say y_i = x_i. This means that

  g^{α_i} · h^{β_i} = g^{γ_i} · h^{δ_i}.

So if we let

  u ≡ α_i − γ_i (mod p − 1) and v ≡ δ_i − β_i (mod p − 1),

then g^u = h^v in F_p. Equivalently,

  v · log_g(h) ≡ u (mod p − 1).   (5.42)

If gcd(v, p − 1) = 1, then we can multiply both sides of (5.42) by the inverse of v modulo p − 1 to solve the discrete logarithm problem. More generally, if d = gcd(v, p − 1) ≥ 2, we use the extended Euclidean algorithm (Theorem 1.11) to find an integer s such that s · v ≡ d (mod p − 1). Multiplying both sides of (5.42) by s yields

  d · log_g(h) ≡ w (mod p − 1),   (5.43)

where w ≡ s · u (mod p − 1). In this congruence we know all of the quantities except for log_g(h). The fact that d divides p − 1 will force d to divide w, so w/d is one solution to (5.43), but there are others. The full set of solutions to (5.43) is obtained by starting with w/d and adding multiples of (p − 1)/d:

  log_g(h) ∈ { w/d + k · (p − 1)/d : k = 0, 1, 2, . . . , d − 1 }.

In practice, d will tend to be fairly small,¹⁸ so it suffices to check each of the d possibilities for log_g(h) until the correct value is found.

Example 5.52. We illustrate Pollard's ρ method by solving the discrete logarithm problem

  19^t ≡ 24717 (mod 48611).

The first step is to compute the x and y sequences until we find a match y_i = x_i, while also computing the exponent sequences α, β, γ, δ. The initial stages of this process and the final few steps before a collision is found are given in Table 5.11.

    i     x_i    y_i = x_{2i}    α_i     β_i     γ_i     δ_i
    0       1        1            0       0       0       0
    1      19      361            1       0       2       0
    2     361    33099            2       0       4       0
    3    6859    13523            3       0       4       2
    4   33099    20703            4       0       6       2
    5   33464    14974            4       1      13       4
    6   13523    18931            4       2      14       5
    7   13882    30726            5       2      56      20
    8   20703     1000            6       2     113      40
    9   11022    14714           12       4     228      80
    ...
  542   21034    46993        13669    2519   27258   30257
  543   20445    37138        27338    5038   27259   30258
  544   40647    33210         6066   10076    5908   11908
  545   28362    21034         6066   10077    5909   11909
  546   36827    40647        12132   20154   23636   47636
  547   11984    36827        12132   20155   47272   46664
  548   33252    33252        12133   20155   47273   46665

Table 5.11: Pollard ρ computations to solve 19^t = 24717 in F_48611

¹⁸For most cryptographic applications, the prime p is chosen such that p − 1 has precisely one large prime factor, since otherwise the Pohlig–Hellman algorithm (Theorem 2.31) may be applicable. And it is unlikely that d will be divisible by the large prime factor of p − 1.
From the table we see that

  x_{1096} = x_{548} = 33252 in F_48611.

The associated exponent values are

  α_548 = 12133, β_548 = 20155, γ_548 = 47273, δ_548 = 46665,

so we know that

  19^{12133} · 24717^{20155} = 19^{47273} · 24717^{46665} in F_48611.

(Before proceeding, we should probably check this equality to make sure that we didn't make an arithmetic error.) Moving the powers of 19 to one side and the powers of 24717 to the other side yields

  19^{−35140} = 24717^{26510},

and adding 48610 = p − 1 to the exponent of 19 gives

  19^{13470} = 24717^{26510} in F_48611.   (5.44)

We next observe that

  gcd(26510, 48610) = 10 and 970 · 26510 ≡ 10 (mod 48610).

Raising both sides of (5.44) to the 970th power yields

  19^{13470·970} = 19^{13065900} = 19^{38420} = 24717^{10} in F_48611.

Hence

  10 · log_19(24717) ≡ 38420 (mod 48610),

which means that

  log_19(24717) ≡ 3842 (mod 4861).

The possible values for the discrete logarithm are obtained by adding multiples of 4861 to 3842, so log_19(24717) is one of the numbers in the set

  {3842, 8703, 13564, 18425, 23286, 28147, 33008, 37869, 42730, 47591}.

To complete the solution, we compute 19 raised to each of these 10 values until we find the one that is equal to 24717:

  19^3842 = 16580,   19^8703 = 29850,   19^13564 = 23894,   19^18425 = 20794,
  19^23286 = 10170,  19^28147 = 32031,  19^33008 = 18761,   19^37869 = 24717.

This gives the solution log_19(24717) = 37869. We check our answer:

  19^37869 = 24717 in F_48611.
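The whole computation of Example 5.52 can be automated in a few lines. Here is our Python sketch of Pollard's ρ for the DLP (not from the text): the step function implements (5.41) together with the exponent recursions, Floyd's two-sequence trick finds the collision, and the final loop tries the d candidate solutions of (5.43).

```python
from math import gcd

def pollard_rho_dlp(g, h, p):
    """Solve g^t = h in F_p (g a primitive root) via Pollard's rho."""
    n = p - 1                                  # order of g

    def step(x, a, b):                         # one application of (5.41)
        if 3 * x < p:                          # 0 <= x < p/3: multiply by g
            return g * x % p, (a + 1) % n, b
        elif 3 * x < 2 * p:                    # p/3 <= x < 2p/3: square
            return x * x % p, 2 * a % n, 2 * b % n
        else:                                  # 2p/3 <= x < p: multiply by h
            return h * x % p, a, (b + 1) % n

    x, a, b = step(1, 0, 0)                    # x_i = g^a * h^b
    y, c, d = step(*step(1, 0, 0))             # y_i = x_{2i} = g^c * h^d
    while x != y:
        x, a, b = step(x, a, b)
        y, c, d = step(*step(y, c, d))

    u, v = (a - c) % n, (d - b) % n            # collision gives g^u = h^v
    e = gcd(v, n)                              # v*t ≡ u (mod n) with e | u
    t0 = (u // e) * pow(v // e, -1, n // e) % (n // e)
    for k in range(e):                         # try the e candidate solutions
        t = t0 + k * (n // e)
        if pow(g, t, p) == h:
            return t

print(pollard_rho_dlp(19, 24717, 48611))       # 37869, matching Example 5.52
```

Only the current (x, a, b) and (y, c, d) are ever stored, in contrast with the √N-sized lists of Sect. 5.4.3.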
5.6 Information Theory

In 1948 and 1949, Claude Shannon published two papers [126, 127] that form the mathematical foundation of modern cryptography. In these papers he defines the concept of perfect (or unconditional) secrecy, introduces the idea of entropy of natural language and statistical analysis, provides the first proofs of security using probability theory, and gives precise connections between provable security and the size of the key, plaintext, and ciphertext spaces.

In public key cryptography, one is interested in how computationally difficult it is to break the system. The issue of security is thus a relative one—a given cryptosystem is hard to break if one assumes that some underlying problem is hard to solve. It requires some care to formulate these concepts properly. In this section we briefly introduce Shannon's ideas and explain their relevance to symmetric key systems. In [127], Shannon develops a theory of security for cryptosystems that assumes that no bounds are placed on the computational resources that may be brought to bear against them. For example, symmetric ciphers such as the simple substitution cipher (Sect. 1.1) and the Vigenère cipher (Sect. 5.2) are not computationally secure. With unlimited resources—indeed with very limited resources—an adversary can easily break these ciphers. If we seek unconditional security, we must either seek new algorithms or modify the implementation of known algorithms. In fact, Shannon shows that perfectly secure cryptosystems must have at least as many keys as plaintexts and that every key must be used with equal probability. This means that most practical cryptosystems are not unconditionally secure. We discuss the notion of perfect security in Sect. 5.6.1.

In [126] Shannon develops a mathematical theory that measures the amount of information that is revealed by a random variable.
When the random variable represents the possible plaintexts or ciphertexts or keys of a cipher that is used to encrypt a natural language such as English, we obtain a framework for the rigorous mathematical study of cryptographic security. Shannon adopted the word entropy for this measure because of its formal similarity to Boltzmann's definition of entropy in statistical mechanics, and also because Shannon viewed language as a stochastic process, i.e., as a system governed by probabilities that produces a sequence of symbols. Later, the physicist E.T. Jaynes [60] argued that thermodynamic entropy could be interpreted as an application of a certain information-theoretic entropy. As a measure of "uncertainty" of a system, the logarithmic formula for entropy is determined, up to a constant, by requiring that it be continuous, monotonic, and satisfy a certain additive property. We discuss information-theoretic entropy and its application to cryptography in Sect. 5.6.2.

5.6.1 Perfect Secrecy

A cryptosystem has perfect secrecy if the interception of a ciphertext gives the cryptanalyst no information about the underlying plaintext and no
information about any future encrypted messages. To formalize this concept, we introduce random variables M, C, and K representing the finite number of possible messages, ciphertexts, and keys. In other words, M is a random variable whose values are the possible messages (plaintexts), C is a random variable whose values are the possible ciphertexts, and K is a random variable whose values are the possible keys used for encryption and decryption. We let f_M, f_C, and f_K be the associated density functions.^19 The density functions f_M, f_K, and f_C are related to one another via the encryption/decryption formula d_k(e_k(m)) = m, which we will exploit shortly to prove (5.47). We also have the joint densities and the conditional densities of all pairs of these random variables, such as f_(C,M)(c, m) and f_(C|M)(c | m), and so forth. We will let the variable names simplify the notation. For example, we write f(c | m) for f_(C|M)(c | m), the conditional probability density of the random variables C and M, i.e.,

    f(c | m) = Pr(C = c given that M = m).

Similarly, we write f(m) for f_M(m), the probability that M = m.

Definition. A cryptosystem has perfect secrecy if

    f(m | c) = f(m)  for all m ∈ M and all c ∈ C.    (5.45)

What does (5.45) mean? It says that the probability of any particular plaintext, Pr(M = m), is independent of the ciphertext. Intuitively, this means that the ciphertext reveals no knowledge of the plaintext. Bayes's formula (Theorem 5.33) says that

    f(m | c) f(c) = f(c | m) f(m),

which implies that perfect secrecy is equivalent to the condition

    f(c | m) = f(c)  for all c ∈ C and all m ∈ M with f(m) ≠ 0.    (5.46)

Formula (5.46) says that the appearance of any particular ciphertext is equally likely, independent of the plaintext.

If we know f_K and f_M, then f_C is determined.
To see this, we note that for a given key k, the probability that the ciphertext equals c is the same as the probability that the decryption of c is the plaintext, assuming of course

^19 As is typical, we have omitted reference to the underlying sample spaces. To be completely explicit, we have three probability spaces with sample spaces Ω_M, Ω_C, and Ω_K and probability functions Pr_M, Pr_C, and Pr_K. Then M, C, and K are random variables

    M : Ω_M → M,  K : Ω_K → K,  C : Ω_C → C.

Then by definition, the density function f_M is

    f_M(m) = Pr(M = m) = Pr_M({ω ∈ Ω_M : M(ω) = m}),

and similarly for K and C.
that c is the encryption of some plaintext for key k. This allows us to compute the total probability f_C(c) by summing over all possible keys and using the decomposition formula (5.20) of Proposition 5.24, or more precisely, its generalization described in Exercise 5.23. As usual, we let K denote the set of all possible keys and e_k : M → C and d_k : C → M be the encryption and decryption functions for the key k ∈ K. Then the probability that the ciphertext is equal to c is given by the formula

    f_C(c) = Σ_{k ∈ K such that c = e_k(m) for some m ∈ M} f_K(k) f_M(d_k(c));    (5.47)

see also Exercise 5.47. We note that if the encryption map e_k : M → C is onto for all keys k, which is often true in practice, then the sum in (5.47) is over all k ∈ K.

           m1   m2   m3
    k1     c2   c1   c3
    k2     c1   c3   c2

    Table 5.12: Encryption of messages with keys k1 and k2

Example 5.53. Consider the Shift Cipher described in Sect. 1.1. Suppose that each of the 26 possible keys (shift amounts) is chosen with equal probability and that each plaintext character is encrypted using a new, randomly chosen, shift amount. Then it is not hard to check that the resulting cryptosystem has perfect secrecy; see Exercise 5.46.

Recall that an encryption function is one-to-one, meaning that each message gives rise to a unique ciphertext. This implies that there are at least as many ciphertexts as plaintexts (messages). Perfect secrecy gives additional restrictions on the relative size of the key, message, and ciphertext spaces. We first investigate an example of a (tiny) cryptosystem that does not have perfect secrecy.

Example 5.54. Suppose that a cryptosystem has two keys k1 and k2, three messages m1, m2, and m3, and three ciphertexts c1, c2, and c3. Assume that the density function for the message random variable satisfies

    f_M(m1) = f_M(m2) = 1/4  and  f_M(m3) = 1/2.    (5.48)

Suppose further that Table 5.12 describes how the different keys act on the messages to produce ciphertexts.
For example, the encryption of the plaintext m1 with the key k1 is the ciphertext c2. Under the assumption that the keys are used with equal probability, we can use (5.47) to compute the probability that the ciphertext is equal to c1:
    f(c1) = f(k1) f_M(d_k1(c1)) + f(k2) f_M(d_k2(c1))
          = f(k1) f(m2) + f(k2) f(m1)
          = (1/2) · (1/4) + (1/2) · (1/4) = 1/4.

On the other hand, we see from the table that f(c1 | m3) = 0. Hence this cryptosystem does not have perfect secrecy. This matches our intuition, since it is clear that seeing a ciphertext leaks some information about the plaintext. For example, if we see the ciphertext c1, then we know that the message was either m1 or m2; it cannot be m3.

As noted earlier, the number of ciphertexts must be at least as large as the number of plaintexts, since otherwise decryption is not possible. It turns out that one consequence of perfect secrecy is that the number of keys must also be at least as large as the number of plaintexts.

Proposition 5.55. If a cryptosystem has perfect secrecy, then

    #K ≥ #C+,  where C+ = {m ∈ M : f(m) > 0}

is the set of plaintexts that have a positive probability of being selected.

Proof. We start by fixing some ciphertext c ∈ C with f(c) > 0. Perfect secrecy in the form of (5.46) tells us that

    f(c | m) = f(c) > 0  for all m ∈ C+.

This says that there is a positive probability that m ∈ C+ encrypts to c, so in particular there is at least one key k satisfying e_k(m) = c. Further, if we start with a different plaintext m′ ∈ C+, then we get a different key k′, since otherwise e_k(m) = c = e_k(m′), which would contradict the one-to-one property of e_k. To recapitulate, we have shown that for every m ∈ C+, the set

    {k ∈ K : e_k(m) = c}

is nonempty, and further, these sets are disjoint for different m's. Thus each plaintext m ∈ C+ is matched with one or more keys, and different m's are matched with different keys, which shows that the number of keys is at least as large as the number of plaintexts in C+.
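Formula (5.47) and the failure of perfect secrecy in Example 5.54 are easy to check by direct enumeration. In this sketch the dictionary encoding of Table 5.12 and the function names are our own:

```python
from fractions import Fraction as Fr

# Table 5.12: enc[k][m] is the ciphertext e_k(m)
enc = {"k1": {"m1": "c2", "m2": "c1", "m3": "c3"},
       "k2": {"m1": "c1", "m2": "c3", "m3": "c2"}}
fM = {"m1": Fr(1, 4), "m2": Fr(1, 4), "m3": Fr(1, 2)}   # plaintext density (5.48)
fK = {"k1": Fr(1, 2), "k2": Fr(1, 2)}                   # equiprobable keys

def f_C(c):
    # Formula (5.47): sum f_K(k) * f_M(d_k(c)) over keys for which c is hit
    return sum(fK[k] * fM[m] for k in enc for m in enc[k] if enc[k][m] == c)

def f_C_given_M(c, m):
    # f(c | m): total probability of the keys that encrypt m to c
    return sum(fK[k] for k in enc if enc[k][m] == c)

assert f_C("c1") == Fr(1, 4)          # matches the computation above
assert f_C_given_M("c1", "m3") == 0   # f(c1 | m3) = 0 != f(c1), so by (5.46)
                                      # the system lacks perfect secrecy
```

Exact rational arithmetic via `fractions.Fraction` avoids any floating-point fuzz in the equality tests.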
Given the restriction on the relative sizes of the key, ciphertext, and plaintext spaces in systems with perfect secrecy, namely #K ≥ #M and #C ≥ #M, it is most efficient to assume that the key space, the plaintext space, and the ciphertext space are all of equal size. Assuming this, Shannon proves a theorem characterizing perfect secrecy.
Theorem 5.56. Suppose that a cryptosystem satisfies #K = #M = #C, i.e., the numbers of keys, plaintexts, and ciphertexts are all equal. Then the system has perfect secrecy if and only if the following two conditions hold:
(a) Each key k ∈ K is used with equal probability.
(b) For a given message m ∈ M and ciphertext c ∈ C, there is exactly one key k ∈ K that encrypts m to c.

Proof. Suppose first that a cryptosystem has perfect secrecy. We start by verifying (b). For any plaintext m ∈ M and ciphertext c ∈ C, consider the (possibly empty) set of keys that encrypt m to c,

    S_{m,c} = {k ∈ K : e_k(m) = c}.

We are going to prove that if the cryptosystem has perfect secrecy, then in fact #S_{m,c} = 1 for every m ∈ M and every c ∈ C, which is equivalent to statement (b) of the theorem. We do this in three steps.

Claim 1. If m ≠ m′, then S_{m,c} ∩ S_{m′,c} = ∅.

Suppose that k ∈ S_{m,c} ∩ S_{m′,c}. Then e_k(m) = c = e_k(m′), which implies that m = m′, since the encryption function e_k is injective. This proves Claim 1.

Claim 2. If the cryptosystem has perfect secrecy, then S_{m,c} is nonempty for every m and c.

We use the perfect secrecy assumption in the form f(m, c) = f(m)f(c). We know that every m ∈ M is a valid plaintext for at least one key, so f(m) > 0. Similarly, every c ∈ C appears as the encryption of at least one plaintext using some key, so f(c) > 0. Hence perfect secrecy implies that

    f(m, c) > 0  for all m ∈ M and all c ∈ C.    (5.49)

But the formula f(m, c) > 0 is simply another way of saying that c is a possible encryption of m. Hence there must be at least one key k ∈ K satisfying e_k(m) = c, i.e., there is some key k ∈ S_{m,c}. This completes the proof of Claim 2.

Claim 3. If the cryptosystem has perfect secrecy, then #S_{m,c} = 1.

Fix a ciphertext c ∈ C. Then

    #K ≥ #( ⋃_{m∈M} S_{m,c} )   since K contains every S_{m,c},
       = Σ_{m∈M} #S_{m,c}       since the S_{m,c} are disjoint from Claim 1,
       ≥ #M                     since #S_{m,c} ≥ 1 from Claim 2,
       = #K                     since #K = #M by assumption.

Thus all of these inequalities are equalities, so in particular,

    Σ_{m∈M} #S_{m,c} = #M.

Then the fact (Claim 2) that every #S_{m,c} is greater than or equal to 1 implies that every #S_{m,c} must equal 1. This completes the proof of Claim 3.

As noted above, Claim 3 is equivalent to statement (b) of the theorem. We turn now to statement (a). Consider the set of triples

    (k, m, c) ∈ K × M × C  satisfying  e_k(m) = c.

Clearly k and m determine a unique value for c, and (b) says that m and c determine a unique value for k. It is also not hard, using a similar argument and the assumption that #M = #C, to show that c and k determine a unique value for m; see Exercise 5.48. For any triple (k, m, c) satisfying e_k(m) = c, we compute

    f(m) = f(m | c)          by perfect secrecy,
         = f(m, c) / f(c)    definition of conditional probability,
         = f(m, k) / f(c)    since any two of m, k, c determine the third,
         = f(m)f(k) / f(c)   since M and K are independent.

(There are cryptosystems in which the message forms part of the key; see for example Exercise 5.19, in which case M and K would not be independent.) Canceling f(m) from both sides, we have shown that

    f(k) = f(c)  for every k ∈ K and every c ∈ C.    (5.50)

Note that our proof shows that (5.50) is true for every k and every c, because Exercise 5.48 tells us that for every (k, c) there is a (unique) m satisfying e_k(m) = c. We sum (5.50) over all c ∈ C and divide by #C to obtain

    f(k) = (1/#C) Σ_{c∈C} f(c) = 1/#C.

This shows that f(k) is constant, independent of the choice of k ∈ K, which is precisely the assertion of (a). At the same time we have proven the useful fact that f(c) is constant, i.e., every ciphertext is used with equal probability.
In the other direction, if a cryptosystem has properties (a) and (b), then the steps outlined to prove perfect secrecy of the shift cipher in Exercise 5.46 can be applied in this more general setting. We leave the details to the reader.

Example 5.57 (The one-time pad). Vernam's one-time pad, patented in 1917, is an extremely simple, perfectly secret, albeit very inefficient, cryptosystem. The key k consists of a string of binary digits k_0 k_1 . . . k_N. It is used to encrypt a binary plaintext string m = m_0 m_1 . . . m_N by XOR'ing the two strings together bit by bit. See (1.12) on page 44 for a description of the XOR operation, which for convenience we will denote by ⊕. Then the ciphertext c = c_0 c_1 . . . c_N is given by

    c_i = k_i ⊕ m_i  for i = 0, 1, . . . , N.

Each key is used only once and then discarded, whence the name of the system. Since every key is used with equal probability, and since there is exactly one key that encrypts a given m to a given c, namely the key m ⊕ c, Theorem 5.56 shows that Vernam's one-time pad has perfect secrecy.

Unfortunately, if Bob and Alice want to use a Vernam one-time pad to exchange N bits of information, they must already know N bits of shared secret information to use as the key. This makes one-time pads much too inefficient for large-scale communication networks. However, there are situations in which they have been used, such as top secret communications between diplomatic offices or for short messages between spies and their home bases.

It is also worth noting that a one-time pad remains completely secure only as long as its keys are never reused. When a key pad is used more than once, either due to error or to the difficulty of providing enough key material, then the cryptosystem may be vulnerable to cryptanalysis. This occurred in the real world when the Soviet Union reused some one-time pads during World War II.
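The XOR mechanics of Vernam's pad take only a few lines; the sketch below works byte-wise rather than bit-wise, with the standard-library `secrets` module supplying the uniformly random key that perfect secrecy requires (an illustration only, not a key-management scheme):

```python
import secrets

def otp(xs: bytes, key: bytes) -> bytes:
    """XOR a message (or ciphertext) with a same-length key; XOR is its own inverse."""
    assert len(key) == len(xs), "the pad must be exactly as long as the message"
    return bytes(a ^ b for a, b in zip(xs, key))

m = b"ATTACK AT DAWN"
k = secrets.token_bytes(len(m))   # fresh uniform key, used once and then discarded
c = otp(m, k)

assert otp(c, k) == m             # decryption recovers the plaintext
assert otp(m, c) == k             # the unique key sending m to c is m XOR c,
                                  # the fact used in invoking Theorem 5.56(b)
```

The last assertion is exactly why reusing a pad is fatal: an attacker who learns one plaintext/ciphertext pair recovers the key by a single XOR.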
The United States mounted a massive cryptanalytic effort called the VENONA project that successfully decrypted a number of documents.

5.6.2 Entropy

In efficient cryptosystems, a single key must be used to encrypt many different plaintexts, so perfect secrecy is not possible. At best we can hope to build cryptosystems that are computationally secure. Unfortunately, anything less than perfect secrecy leaves open the possibility that a list of ciphertexts will reveal significant information about the key. To study this phenomenon, Shannon introduced the concept of entropy, which is a measure of the uncertainty in a system. Thus if we view f_X(x) = Pr(X = x) as being the probability that the outcome of a certain experiment is equal to x, then the entropy of X will be small if the outcome of a single experiment reveals a significant amount of information about the random variable X.

Let X be a random variable taking on finitely many values x_1, x_2, . . . , x_n, and let p_1, p_2, . . . , p_n be the associated probabilities,
    p_i = f_X(x_i) = Pr(X = x_i).

The entropy H(X) of X is a number that depends only on the probabilities p_1, . . . , p_n of the possible outcomes of X, so we write^20

    H(X) = H(p_1, . . . , p_n).

We would like to capture the idea that H is the expected value of a random variable that measures the uncertainty that the outcome x_i has occurred. Thus the larger the value of H(X), the less information about X that is revealed by the outcome of an experiment. What properties should H possess?

Property H1. The function H should be continuous in the variables p_i. This reflects the intuition that a small change in p_i should produce a small change in the amount of information revealed by X.

Property H2. Let X_n be the random variable that is uniformly distributed on a set {x_1, . . . , x_n}, i.e., the random variable X_n has n possible outcomes, each occurring with probability 1/n. Then

    H(X_{n+1}) > H(X_n)  for all n ≥ 1.

This reflects the intuition that if all outcomes are equally likely, then the uncertainty should increase as the number of outcomes increases.

Property H3. The third property is subtler. It says that if an outcome of X is thought of as a choice, and if that choice can be broken down into two successive choices, then the original value of H is a weighted sum of the values of H for the successive choices. In order to quantify this intuition, we consider random variables X, Y, and Z_1, . . . , Z_n taking values in the sets

    X : Ω → {x_ij : 1 ≤ i ≤ n and 1 ≤ j ≤ m_i},
    Y : Ω → {Z_1, . . . , Z_n},
    Z_i : Ω → {x_ij : 1 ≤ j ≤ m_i},

and satisfying

    Pr(X = x_ij) = Pr(Y = Z_i and Z_i = x_ij).

This reflects the intuition that the outcome X = x_ij is being broken down into the successive choices Y = Z_i followed by Z_i = x_ij. Then Property H3 is the formula

    H(X) = H(Y) + Σ_{i=1}^{n} Pr(Y = Z_i) H(Z_i).
^20 Although this notation is useful, it is important to remember that the domain of H is the set of random variables, not the set of n-tuples for some fixed value of n. Thus the domain of H is itself a set of functions.
Example 5.58. Let X_n be a uniformly distributed random variable on n objects. Then we claim that

    H(X_{n^2}) = 2 H(X_n).

To see this, we view X_{n^2} as choosing an element from {x_ij : 1 ≤ i, j ≤ n}, and we break this choice into two choices by first choosing an index i, and then choosing an index j. Property H3 says that

    H(X_{n^2}) = H(X_n) + Σ_{i=1}^{n} (1/n) H(X_n) = 2 H(X_n).

Example 5.59. We illustrate Property H3 with a more elaborate example. Suppose that X has five possible outcomes {x_1, x_2, x_3, x_4, x_5} with probabilities

    f_X(x_1) = 1/2,  f_X(x_2) = 1/4,  f_X(x_3) = 1/12,  f_X(x_4) = 1/8,  f_X(x_5) = 1/24.

The five outcomes for X are illustrated by the branched tree in Fig. 5.2a. Now suppose that X is written as two successive choices, the first deciding between the subsets {x_1, x_2, x_3} and {x_4, x_5}, and the second choosing an element of the designated subset. So we have random variables Y, Z_1, Z_2, where

    f_Y(Z_1) = 5/6  and  f_Y(Z_2) = 1/6,

and

    f_{Z_1}(x_1) = 3/5,  f_{Z_1}(x_2) = 3/10,  f_{Z_1}(x_3) = 1/10,  f_{Z_2}(x_4) = 3/4,  f_{Z_2}(x_5) = 1/4,

as illustrated in Fig. 5.2b. Then Property H3 for this example says that

    H(1/2, 1/4, 1/12, 1/8, 1/24) = H(5/6, 1/6) + (5/6) H(3/5, 3/10, 1/10) + (1/6) H(3/4, 1/4),

where the left-hand side is H(X) and the terms on the right involve H(Y), H(Z_1), and H(Z_2), respectively.

Theorem 5.60. Every function having Properties H1, H2, and H3 is a constant multiple of the function

    H(p_1, . . . , p_n) = − Σ_{i=1}^{n} p_i log_2 p_i,    (5.51)

where log_2 denotes the logarithm to the base 2, and if p = 0, then we set p log_2 p = 0.^21

^21 This convention makes sense, since we want H to be continuous in the p_i's, and it is true that lim_{p→0} p log_2 p = 0.
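Formula (5.51) is a one-liner to implement, and the grouping identity of Example 5.59 (Property H3) can then be checked numerically:

```python
from math import log2

def H(*p):
    """Shannon entropy (5.51), with p*log2(p) taken to be 0 when p = 0."""
    assert abs(sum(p) - 1) < 1e-12, "probabilities must sum to 1"
    return -sum(x * log2(x) for x in p if x > 0)

# Example 5.59: splitting X into Y followed by Z1 or Z2
lhs = H(1/2, 1/4, 1/12, 1/8, 1/24)                              # H(X)
rhs = H(5/6, 1/6) + 5/6 * H(3/5, 3/10, 1/10) + 1/6 * H(3/4, 1/4)
assert abs(lhs - rhs) < 1e-9

# Entropy is maximized by the uniform distribution (cf. Corollary 5.62 below)
assert H(1/4, 1/4, 1/4, 1/4) == log2(4) == 2.0
assert H(1/2, 1/4, 1/8, 1/8) < 2.0
```

The final two assertions preview the bound H(X) ≤ log_2 n proved later via Jensen's inequality.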
[Figure 5.2: Splitting X into Y followed by Z_1 or Z_2. (a) Five outcomes of a choice. (b) Splitting into two choices]

Proof. See Shannon's paper [126].

To illustrate the notion of uncertainty, consider what happens when one of the probabilities p_i is one and the other probabilities are zero. In this case, the formula (5.51) for entropy gives H(p_1, . . . , p_n) = 0, which makes sense, since there is no uncertainty about the outcome of an experiment having only one possible outcome. It turns out that the other extreme, namely maximal uncertainty, occurs when all of the probabilities p_i are equal. In order to prove this, we use an important inequality from real analysis known as Jensen's inequality. Before stating Jensen's inequality, we first need a definition.

Definition. A function F on the real line is called concave (down) on an interval I if the following inequality is true for all 0 ≤ α ≤ 1 and all s and t in I:

    (1 − α)F(s) + αF(t) ≤ F((1 − α)s + αt).    (5.52)

This definition may seem mysterious, but it has a simple geometric interpretation. Notice that if we fix s and t and let α vary from 0 to 1, then the points (1 − α)s + αt trace out the interval from s to t on the real line. So inequality (5.52) is the geometric statement that the line segment connecting any two points on the graph of F lies below the graph of F. For example, the function F(t) = 1 − t^2 is concave. Illustrations of concave and nonconcave functions, with representative line segments, are given in Fig. 5.3. If the function F has a second derivative, then the second derivative test that you learned in calculus can be used to test for concavity (see Exercise 5.54).

Theorem 5.61 (Jensen's Inequality). Suppose that F is concave on an interval I, and let α_1, α_2, . . .
, α_n be nonnegative numbers satisfying α_1 + α_2 + · · · + α_n = 1.
[Figure 5.3: An illustration of concavity. (a) A concave function. (b) A nonconcave function]

Then

    Σ_{i=1}^{n} α_i F(t_i) ≤ F( Σ_{i=1}^{n} α_i t_i )  for all t_1, t_2, . . . , t_n ∈ I.    (5.53)

Further, equality holds in (5.53) if and only if either F is a linear function or t_1 = t_2 = · · · = t_n.

Proof. Notice that for n = 2, the desired inequality (5.53) is exactly the definition of concavity (5.52). The general case is then proven by induction; see Exercise 5.55.

Corollary 5.62. Let X be a random variable that takes on finitely many possible values x_1, . . . , x_n.
(a) H(X) ≤ log_2 n.
(b) H(X) = log_2 n if and only if every event X = x_i occurs with the same probability 1/n.

Proof. Let p_i = Pr(X = x_i) for i = 1, 2, . . . , n. Then p_1 + · · · + p_n = 1, so we may apply Jensen's inequality to the function F(t) = log_2 t with α_i = p_i and t_i = 1/p_i. (See Exercise 5.54 for a proof that log_2 t is a concave function.) The left-hand side of (5.53) is exactly the formula for entropy (5.51), so we find that

    H(X) = − Σ_{i=1}^{n} p_i log_2 p_i = Σ_{i=1}^{n} p_i log_2(1/p_i) ≤ log_2( Σ_{i=1}^{n} p_i · (1/p_i) ) = log_2 n.

This proves (a). Further, the function log_2 t is not linear, so equality occurs if and only if p_1 = p_2 = · · · = p_n, i.e., if all of the probabilities satisfy p_i = 1/n. This proves (b).

Notice that Corollary 5.62 says that entropy is maximized when all of the probabilities are equal. This conforms to our intuitive understanding that uncertainty is maximized when every outcome is equally likely.

The theory of entropy is applied to cryptography by computing the entropy of random variables such as K, M, and C that are associated with
the cryptosystem and comparing the actual values with the maximum possible values. Clearly the more entropy there is, the better for the user, since increased uncertainty makes the cryptanalyst's job harder.

For instance, consider a shift cipher and the random variable K associated with its keys. The random variable K has 26 possible values, since the shift may be any integer between 0 and 25, and each shift amount is equally probable, so K has maximal entropy H(K) = log_2(26).

Example 5.63. We consider the system with two keys described in Example 5.54 on page 265. Each key is equally likely, so H(K) = log_2(2) = 1. Similarly, we can use the plaintext probabilities for this system as given by (5.48) to compute the entropy of the random variable M associated to the plaintexts:

    H(M) = −(1/4) log_2(1/4) − (1/4) log_2(1/4) − (1/2) log_2(1/2) = 3/2 = 1.5.

Notice that H(M) is slightly smaller than log_2(3) ≈ 1.585, which would be the maximal possible entropy for M in a cryptosystem with three plaintexts.

We now introduce the concept of conditional entropy and its application to secrecy systems. Suppose that a signal is sent over a noisy channel, which means that the signal may be distorted during transmission. Shannon [126] defines the equivocation to be the conditional entropy of the original signal, given the received signal. He uses this quantity to measure the amount of uncertainty in transmissions across a noisy channel. Shannon [127] later observed that a noisy communication channel is also a model for a secrecy system. The original signal (the plaintext) is "distorted" by applying the encryption process, and the received signal (the ciphertext) is thus a noisy version of the original signal. In this way, the notion of equivocation can be applied to cryptography, where a large equivocation says that the ciphertext conceals most information about the plaintext.

Definition. Let X and Y be random variables, and let x_1, . . . , x_n be the possible values of X and y_1, . . . , y_m the possible values of Y. The equivocation, or conditional entropy, of X on Y is the quantity H(X | Y) defined by

    H(X | Y) = − Σ_{i=1}^{n} Σ_{j=1}^{m} f_Y(y_j) f_{X|Y}(x_i | y_j) log_2 f_{X|Y}(x_i | y_j).

When X = K is the key random variable and Y = C is the ciphertext random variable, the quantity H(K | C) is called the key equivocation. It measures the total amount of information about the key revealed by the ciphertext, or more precisely, it is the expected value of the conditional entropy H(K | c) of K given a single observation c of C. The key equivocation can be determined by computing all of the conditional probabilities f(k | c) of the cryptosystem. Alternatively, one can use the following result.
Proposition 5.64. The key equivocation of a cryptosystem (K, M, C) is related to the individual entropies of K, M, and C by the formula

    H(K | C) = H(K) + H(M) − H(C).    (5.54)

Proof. We leave the proof as an exercise; see Exercise 5.57.

Example 5.65. We compute the key equivocation of the cryptosystem described in Examples 5.54 and 5.63. We already computed H(K) = 1 and H(M) = 3/2, so it remains to compute H(C). To do this, we need the values of f(c) for each ciphertext c ∈ C. We already computed f(c1) = 1/4, and a similar computation using (5.48) and Table 5.12 yields

    f(c2) = f(k1)f(m1) + f(k2)f(m3) = (1/2)(1/4) + (1/2)(1/2) = 3/8,
    f(c3) = f(k1)f(m3) + f(k2)f(m2) = (1/2)(1/2) + (1/2)(1/4) = 3/8.

Hence

    H(C) = −(1/4) log_2(1/4) − (3/8) log_2(3/8) − (3/8) log_2(3/8)
≈ 1.56, and using (5.54), we find that

    H(K | C) = H(K) + H(M) − H(C) ≈ 1 + 1.5 − 1.56 ≈ 0.94.

5.6.3 Redundancy and the Entropy of Natural Language

Suppose that the plaintext is written in a natural language such as English.^22 Then nearby letters, or nearby bits if the letters are converted to ASCII, are heavily dependent on one another, rather than looking random. For example, correlations between successive letters (bigrams or trigrams) can aid the cryptanalyst, as we saw when we cryptanalyzed a simple substitution cipher in Sect. 1.1. In this section we use the notion of entropy to quantify the redundancy inherent in a natural language.

We start by approximating the entropy of a single letter in English text. Let L denote the random variable whose values are the letters of the English language E with their associated probabilities as given in Table 1.3 on page 6. For example, the table says that

    f_L(A) = 0.0815,  f_L(B) = 0.0144,  f_L(C) = 0.0276,  . . . ,  f_L(Z) = 0.0008.

^22 It should be noted that when implementing a modern public key cipher, one generally combines the plaintext with some random bits and then performs some sort of invertible transformation so that the resulting secondary plaintext looks more like a string of random bits. See Sect. 8.6.
We can use the values in Table 1.3 to compute the entropy of a single letter in English text,

    H(L) = −(0.0815 log_2(0.0815) + · · · + 0.0008 log_2(0.0008)) ≈ 4.132.

If every letter were equally likely, the entropy would be log_2(26) ≈ 4.7. The fact that the entropy is only 4.132 shows that some letters in English are more prevalent than others.

The concept of entropy can be used to measure the amount of information conveyed by a language. Shannon [126] shows that H(L) can be interpreted as the average number of bits of information conveyed by a single letter of a language. The value of H(L) that we computed does reveal some redundancy: it says that a letter conveys only 4.132 bits of information on average, although it takes 4.7 bits on average to specify a letter in the English alphabet.

The fact that natural languages contain redundancy is obvious. For example, you will probably be able to read the following sentence, despite our having removed almost 40 % of the letters:

    Th prblms o crptgry nd scrcy sysms frnsh n ntrstng aplcatn o comm thry.

However, the entropy H(L) of a single letter does not take into account correlations between nearby letters, so it alone does not give a good value for the redundancy of the English language E. As a first step, we should take into account the correlations between pairs of letters (bigrams). Let L^2 denote the random variable whose values are pairs of English letters as they appear in typical English text. Some bigrams appear fairly frequently, for example

    f_{L^2}(TH) = 0.00315  and  f_{L^2}(AN) = 0.00172.

Others, such as JX and ZQ, never occur. Just as Table 1.3 was created experimentally by counting the letters in a long sample text, we can create a frequency table of bigrams and use it to obtain an experimental value for L^2. This leads to a value of H(L^2) ≈ 7.12, so on average, each letter of E has entropy equal to half this value, namely 3.56.
Continuing, we could experimentally compute the entropy of L^3, which is the random variable whose values are trigrams (triples of letters), and then (1/3)H(L^3) would be an even better approximation to the entropy of E. Of course, we need to analyze a great deal of text in order to obtain a reliable estimate for trigram frequencies, and the problem becomes even harder as we look at L^4, L^5, L^6, and so on. However, this idea leads to the following important concept.

Definition. Let L be a language (e.g., English or French or C++), and for each n ≥ 1, let L^n denote the random variable whose values are strings of n consecutive characters of L. The entropy of L is defined to be the quantity^23

^23 To be rigorous, one should really define upper and lower densities using liminf and limsup, since it is not clear that the limit defining H(L) exists. We will not worry about such niceties here.
    H(L) = lim_{n→∞} H(L^n)/n.

Although it is not possible to precisely determine the entropy of the English language E, experimentally it appears that

    1.0 ≤ H(E) ≤ 1.5.

This means that despite the fact that it requires almost five bits to represent each of the 26 letters used in English, each letter conveys less than one and a half bits of information. Thus English is approximately 70 % redundant!^24

5.6.4 The Algebra of Secrecy Systems

We make only a few brief remarks about the algebra of cryptosystems. In [127], Shannon considers ways of building new cryptosystems by taking algebraic combinations of old ones. The new systems are described in terms of linear combinations and products of the original encryption transformations.

Example 5.66 (Summation Systems). If R and T are two secrecy systems, then Shannon defines the weighted sum of R and T to be

    S = pR + qT,  where p + q = 1.

The meaning of this notation is as follows. First one chooses either R or T, where the probability of choosing R is p and the probability of choosing T is q. Imagine that the choice is made by flipping an unbalanced coin, but note that both Bob and Alice need to have a copy of the output of the coin tosses. In other words, the list of choices, or a method for generating the list of choices, forms part of their private key. The notion of summation extends to the sum of any number of secrecy systems.

The systems R and T need to have the same message space, but they need not act on messages in a similar way. For example, the system R could be a substitution cipher and the system T could be a shift cipher. As another example, suppose that T_i is the shift cipher that encrypts a letter of the alphabet by shifting it i places. Then the system that encrypts by choosing a shift at random and encrypting according to the chosen shift is the summation cipher

    Σ_{i=0}^{25} (1/26) T_i.

Example 5.67 (Product Systems).
In order to define the product of two cryptosystems, it is necessary that the ciphertexts of the first system be plaintexts for the second system. Thus let

^24 This does not mean that one can remove 70% of the letters and still have an intelligible message. What it means is that in principle, it is possible to take a long message that requires 4.7 bits to specify each letter and to compress it into a form that takes only 30% as many bits.
e : M → C and e′ : M′ → C′ be two encryption functions, and suppose that C = M′, or more generally, that C ⊆ M′. Then the product system e′ · e is defined to be the composition of e and e′,

e′ · e : M --e--> C ⊆ M′ --e′--> C′.

Product ciphers provide a means to strengthen security. They were used in the development of DES, the Data Encryption Standard [97], the first national standard for symmetric encryption. DES features several rounds of a cipher called S-box encryption, so it is a multiple product of a cipher with itself. Further, each round consists of the composition of several different transformations. The use of product ciphers continues to be of importance in the development of new symmetric ciphers, including AES, the Advanced Encryption Standard. See Sect. 8.12 for a brief discussion of DES and AES.

5.7 Complexity Theory and P Versus NP

A decision problem is a problem in a formal system that has a yes or no answer. For example, PRIME is the decision problem of determining whether a given integer is prime. We discussed this problem in Sect. 3.4. Another example is the decision Diffie–Hellman problem (Exercise 2.7): given g^a mod p and g^b mod p, determine whether a given number C is equal to g^ab mod p. Complexity theory attempts to understand and quantify the difficulty of solving particular decision problems. The early history of this field is fascinating, as mathematicians tried to come to grips with the limitations on provability within formal systems. In 1936, Alan Turing proved that there is no algorithm that solves the halting problem. That is, there is no algorithm to determine whether an arbitrary computer program, given an arbitrary input, eventually halts execution. Such a problem is called undecidable. Earlier in that same year, Alonzo Church had published a proof of the undecidability of a problem in the lambda calculus. 
He and Turing then showed that the lambda calculus and the notion of a Turing machine are essentially equivalent. The breakthroughs on the theory of undecidability that appeared in the 1930s and 1940s began as a response to Hilbert's questions about the completeness of axiomatic systems and whether there exist unsolvable mathematical problems. Indeed, both Church and Turing were influenced by Gödel's discovery in 1930 that all sufficiently strong and precise axiomatic systems are incomplete, i.e., they contain true statements that are unprovable within the system. There are uncountably many undecidable problems in mathematics, some of which have simple and interesting formulations. Here is an example of an easy-to-state undecidable problem called Post's correspondence problem [106]. Suppose that you are given a sequence of pairs of strings,

(s_1, t_1), (s_2, t_2), (s_3, t_3), . . . , (s_k, t_k),
where a string is simply a list of characters from some alphabet containing at least two letters. The correspondence problem asks you to decide whether there is an integer r ≥ 1 and a list of indices

i_1, i_2, . . . , i_r between 1 and k    (5.55)

such that the concatenations

s_{i_1} s_{i_2} · · · s_{i_r} and t_{i_1} t_{i_2} · · · t_{i_r} are equal.    (5.56)

Note that if we bound the value of r, say r ≤ r_0, then the problem becomes decidable, since there are only a finite number of concatenations to check. The problem with r restricted in this way is called the bounded Post correspondence problem. On the other end of the spectrum are decision problems for which there exist quick algorithms leading to their solutions. We have already talked about algorithms being fast if they run in polynomial time and slow if they take exponential time; see the discussion in Sects. 2.6 and 3.4.2.

Definition. A decision problem belongs to the class P if there exists a polynomial-time algorithm that solves it. That is, given an input of length n, the answer will be produced in a polynomial (in n) number of steps. One says that the decision problems in P are those that can be solved in polynomial time.

The concept of verification in polynomial time has some subtlety that can be captured only by a more precise definition, which we do not give. The class NP is defined by the concept of a polynomial-time algorithm on a "nondeterministic" machine. This means, roughly speaking, that we are allowed to guess a solution, but the verification time to check that the guessed solution is correct must be polynomial in the length of the input. An example of a decision problem in P is that of determining whether two integers have a nontrivial common factor. This problem is in P because the Euclidean algorithm takes fewer than O(n^3) steps. 
(Note that in this setting, the Euclidean algorithm takes more than O(n) steps, since we need to take account of the time it takes to add, subtract, multiply, and divide n-bit numbers.) Another decision problem in P is that of determining whether a given integer is prime. The famous AKS algorithm, Theorem 3.26, takes fewer than O(n^7) steps to check primality.

Definition. A decision problem belongs to the class NP if a yes-instance of the problem can be verified in polynomial time.

For example, the bounded Post correspondence problem is in NP. It is clear that if you are given a list of indices (5.55) of bounded length such that the concatenations (5.56) are alleged to be the same, it takes a polynomial number of steps to verify that the concatenations are indeed the same. On the
  • 458. 280 5. Combinatorics, Probability, and Information Theory other hand, exhaustively checking all possible concatenations of length up to r0 takes an exponential (in r0) number of steps. It is less clear, but can be proven, that one cannot find a solution in a polynomial number of steps. This brings us to one of the most famous open questions in all of mathe- matics and computer science25 : Does P = NP? Since the status of P versus NP is currently unresolved, it is useful to characterize problems in terms of their relative difficulty. We say that prob- lem A can be (polynomial-time) reduced to problem B if there is a constructive polynomial-time transformation that takes any instance of A and maps it to an instance of B. Thus any algorithm for solving B can be transformed into an algorithm for solving A. Hence if problem B belongs to P, and if A is reducible to B, then A also belongs to P. The intuition is that if A can be reduced to B, then solving A is no harder than solving B (up to a polynomial amount of computation). Stephen Cook’s 1971 paper [30] entitled “The Complexity of Theorem Proving Procedures” laid the foundations of the theory of NP-completeness. In this paper, Cook works with a certain NP problem called “Satisfiability” (abbreviated SAT). The SAT problem asks, given a Boolean expression in- volving only variables, parentheses, OR, AND and NOT, whether there exists an assignment of truth values that makes the expression true. Cook proves that SAT has the following properties: 1. Every NP problem is polynomial-time reducible to SAT. 2. If there exists any problem in NP that fails to be in P, then SAT is not in P. A problem that has these two properties is said to be NP-complete. Since the publication of Cook’s paper, many other problems have been shown to be NP-complete. A related notion is that of NP-hardness. We say that a problem is NP- hard if it has the reducibility property (1), although the problem itself need not belong to NP. 
All NP-complete problems are NP-hard, but not conversely. For example, the halting problem is NP-hard, but not NP-complete. In order to put our informal discussion onto a firm mathematical footing, it is necessary to introduce some formalism. We start with a finite set of symbols Σ, and we denote by Σ* the set of all (finite) strings of these symbols. A subset of Σ* is called a language. A decision problem is defined to be the problem of deciding whether an input string belongs to a language. The precise definitions of P and NP are then given within this formal framework, which

^25 As mentioned in Sect. 2.1, the question of whether P = NP is one of the $1,000,000 Millennium Prize problems.
we shall not develop further here. For an excellent introduction to the theory of complexity, see [46], and for additional material on complexity theory as it relates to cryptography, see for example [143, Chapters 2 and 3]. Up to now we have been discussing the complexity theory of decision problems, but not every problem has a yes/no answer. For example, the problem of integer factorization (given a composite number, find a nontrivial factor) has a solution that is an integer, as does the discrete logarithm problem (given g and h in F_p^*, find an x such that g^x = h). It is possible to formulate a theory of complexity for general computational problems, but we are content to give two examples. First, the integer factorization problem is in NP, since given an integer N and a putative factor m, it can be verified in polynomial time that m divides N. Second, the discrete logarithm problem is in NP, since given a supposed solution x, one can verify in polynomial time (using the fast powering algorithm) that g^x = h. It is not known whether either of these computational problems is in P, i.e., there are no known polynomial-time algorithms for either integer factorization or for discrete logarithms. The current general consensus seems to be that they are probably not in P. We turn now to the role of complexity theory in some of the problems that arise in cryptography. The problems of factoring integers and finding discrete logarithms are presumed to be difficult, since no one has yet discovered polynomial-time algorithms to produce solutions. However, the problem of producing a solution (this is called the function problem) may be different from the decision problem of determining whether a solution exists. Here is a version of the factoring problem phrased as a decision problem:

Does there exist a nontrivial factor of N that is less than k?
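The two polynomial-time verifications mentioned above, a single trial division for a claimed factor and fast powering for a claimed discrete logarithm, can be sketched in a few lines of Python; pow(g, x, p) is Python's built-in fast modular powering. The function names and sample numbers are illustrative choices of ours.

```python
def verify_factor_instance(N, k, m):
    """Certificate check for the decision problem
    'does N have a nontrivial factor less than k?':
    one divisibility test, polynomial in the bit length of N."""
    return 1 < m < k and N % m == 0

def verify_dlp(g, h, p, x):
    """Certificate check for the discrete logarithm problem in F_p*:
    confirm g^x = h (mod p) using fast powering."""
    return pow(g, x, p) == h % p

print(verify_factor_instance(91, 10, 7))  # True: 91 = 7 * 13
print(verify_dlp(2, 11, 13, 7))           # True: 2^7 = 128 = 11 (mod 13)
```

Finding m or x is, as the text notes, not known to be possible in polynomial time; only the checking step is easy.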
As we can see, a yes instance of this problem (i.e., N is composite) has a (trivial) polynomial-time verification algorithm, and so this decision problem belongs to NP. It can also be shown that the complementary problem belongs to NP. That is, if N is a no instance (i.e., N is prime), then the primality of N can be verified in polynomial time on a nondeterministic Turing machine. When both the yes and no instances of a problem can be verified in polynomial time, the decision problem is said to belong to the class co-NP. Since it is widely believed that NP is not the same as co-NP, it is also believed that factoring is not an NP-complete problem. In 2004, Agrawal, Kayal, and Saxena [1] showed that the decision problem of determining whether a number is prime does indeed belong to P, settling the long-standing question of whether this decision problem could be NP-complete. A cryptosystem is only as secure as its underlying hard problem, so it would be desirable to construct cryptosystems based on NP-hard problems. There has been a great deal of interest in building efficient public key cryptosystems of this sort. A major difficulty is that one needs not only an NP-hard problem, but also a trapdoor to the problem to use for decryption. This has led to a number of cryptosystems that are based on special cases of NP-hard problems, but it is not known whether these special cases are themselves NP-hard.
The first example of a public key cryptosystem built around an NP-complete problem was the knapsack cryptosystem of Merkle and Hellman. More precisely, they based their cryptosystem on the subset-sum problem, which asks the following: Given n positive integers a_1, . . . , a_n and a target sum S, find a subset of the a_i such that

a_{i_1} + a_{i_2} + · · · + a_{i_t} = S.

The subset-sum problem is NP-complete, since one can show that any instance of SAT can be reduced to an instance of the subset-sum problem, and vice versa. In order to build a public key cryptosystem based on the (hard) subset-sum problem, Merkle and Hellman needed to build a trapdoor into the problem. They did this by using only certain special cases of the subset-sum problem, but unfortunately it turned out that these special cases are significantly easier than the general case, and their cryptosystem was broken. Despite further work by a number of cryptographers, no one has been able to build a subset-sum cryptosystem that is both efficient and secure. See Sect. 7.2 for a detailed discussion of how subset-sum cryptosystems work and how they are broken. Another cautionary note in going from theory to practice comes from the fact that even if a certain collection of problems is NP-hard, that does not mean that every problem in the collection is hard. In some sense, NP-hardness measures the difficulty of the hardest problem in the collection, not the average problem. It would not be good to base a cryptosystem on a problem for which a few instances are very hard, but most instances are very easy. Ideally, we want to use a collection of problems with the property that most instances are NP-hard. An interesting example is the closest vector problem (CVP), which involves finding a vector in a lattice that is close to a given vector. We discuss lattices and CVP in Chap. 7, but for now we note that CVP is NP-hard. 
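Returning to the subset-sum problem stated above, a short Python sketch makes its NP character concrete: a claimed solution is checked with a single sum, while the naive solver examines up to 2^n subsets. The function names and the sample instance are illustrative choices, not taken from the text.

```python
from itertools import combinations

def verify_subset_sum(a, S, indices):
    """Polynomial-time verification of a claimed solution:
    add up the selected entries and compare with S."""
    return sum(a[i] for i in indices) == S

def solve_subset_sum(a, S):
    """Exhaustive search over all nonempty subsets; exponential time."""
    for r in range(1, len(a) + 1):
        for idx in combinations(range(len(a)), r):
            if verify_subset_sum(a, S, idx):
                return idx
    return None

a = [3, 7, 9, 13, 22]
print(solve_subset_sum(a, 29))  # (1, 4), since 7 + 22 = 29
```

The Merkle–Hellman trapdoor replaces this exponential search, for specially structured a_i, by an easy greedy computation; Sect. 7.2 explains how, and why that structure also let attackers in.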
Our interest in CVP stems from a famous result of Ajtai and Dwork [4] in which they construct a cryptosystem based on CVP in a certain set of lattices. They show that the average difficulty of solving CVP for their lattices can be polynomially reduced to solving the hardest instance of CVP in a similar set of lattices (of somewhat smaller dimension). Although not practical, their public key cryptosystem was the first construction exhibiting worst-case/average-case equivalence.

Exercises

Section 5.1. Basic Principles of Counting

5.1. The Rhind papyrus is an ancient Egyptian mathematical manuscript that is more than 3500 years old. Problem 79 of the Rhind papyrus poses a problem that can be paraphrased as follows: there are seven houses; in each house live seven cats;
each cat kills seven mice; each mouse has eaten seven spelt seeds^26; each spelt seed would have produced seven hekat^27 of spelt. What is the sum of all of the named items? Solve this 3500-year-old problem.

5.2. (a) How many n-tuples (x_1, x_2, . . . , x_n) are there if the coordinates are required to be integers satisfying 0 ≤ x_i < q? (b) Same question as (a), except now there are separate bounds 0 ≤ x_i < q_i for each coordinate. (c) How many n-by-n matrices are there if the entries x_{i,j} of the matrix are integers satisfying 0 ≤ x_{i,j} < q? (d) Same question as (a), except now the order of the coordinates does not matter. So for example, (0, 0, 1, 3) and (1, 0, 3, 0) are considered the same. (This one is rather tricky.) (e) Twelve students are each taking four classes, for each class they need two loose-leaf notebooks, for each notebook they need 100 sheets of paper, and each sheet of paper has 32 lines on it. Altogether, how many students, classes, notebooks, sheets, and lines are there? (Bonus. Make this or a similar problem of your own devising into a rhyme like the St. Ives riddle.)

5.3. (a) List all of the permutations of the set {A, B, C}. (b) List all of the permutations of the set {1, 2, 3, 4}. (c) How many permutations are there of the set {1, 2, . . . , 20}? (d) Seven students are to be assigned to seven dormitory rooms, each student receiving his or her own room. In how many ways can this be done? (e) How many different words can be formed with the four symbols A, A, B, C?

5.4. (a) List the 24 possible permutations of the letters A1, A2, B1, B2. If A1 is indistinguishable from A2, and B1 is indistinguishable from B2, show how the permutations become grouped into 6 distinct letter arrangements, each containing 4 of the original 24 permutations. (b) Using the seven symbols A, A, A, A, B, B, B, how many different seven letter words can be formed? 
(c) Using the nine symbols A, A, A, A, B, B, B, C, C, how many different nine letter words can be formed? (d) Using the seven symbols A, A, A, A, B, B, B, how many different five letter words can be formed?

5.5. (a) There are 100 students eligible for an award, and the winner gets to choose from among 5 different possible prizes. How many possible outcomes are there? (b) Same as in (a), but this time there is a first place winner, a second place winner, and a third place winner, each of whom gets to select a prize. However, there is only one of each prize. How many possible outcomes are there? (c) Same as in (b), except that there are multiple copies of each prize, so each of the three winners may choose any of the prizes. Now how many possible outcomes are there? Is this larger or smaller than your answer from (b)?

^26 Spelt is an ancient type of wheat.
^27 A hekat is 1/30 of a cubic cubit, which is approximately 4.8 liters.
(d) Same as in (c), except that rather than specifying a first, second, and third place winner, we just choose three winning students without differentiating between them. Now how many possible outcomes are there? Compare the size of your answers to (b), (c), and (d).

5.6. Use the binomial theorem (Theorem 5.10) to compute each of the following quantities. (a) (5z + 2)^3 (b) (2a − 3b)^4 (c) (x − 2)^5

5.7. The binomial coefficients satisfy many interesting identities. Give three proofs of the identity

(n choose j) = (n−1 choose j−1) + (n−1 choose j).

(a) For Proof #1, use the definition of (n choose j) as n!/((n−j)! j!). (b) For Proof #2, use the binomial theorem (Theorem 5.10) and compare the coefficients of x^j y^{n−j} on the two sides of the identity (x + y)^n = (x + y)(x + y)^{n−1}. (c) For Proof #3, argue directly that choosing j objects from a set of n objects can be decomposed into either choosing j − 1 objects from n − 1 objects or choosing j objects from n − 1 objects.

5.8. Let p be a prime number. This exercise sketches another proof of Fermat's little theorem (Theorem 1.24). (a) If 1 ≤ j ≤ p − 1, prove that the binomial coefficient (p choose j) is divisible by p. (b) Use (a) and the binomial theorem (Theorem 5.10) to prove that (a + b)^p ≡ a^p + b^p (mod p) for all a, b ∈ Z. (c) Use (b) with b = 1 and induction on a to prove that a^p ≡ a (mod p) for all a ≥ 0. (d) Use (c) to deduce that a^{p−1} ≡ 1 (mod p) for all a with gcd(p, a) = 1.

5.9. We know that there are n! different permutations of the set {1, 2, . . . , n}. (a) How many of these permutations leave no number fixed? (b) How many of these permutations leave at least one number fixed? (c) How many of these permutations leave exactly one number fixed? (d) How many of these permutations leave at least two numbers fixed? For each part of this problem, give a formula or algorithm that can be used to compute the answer for an arbitrary value of n, and then compute the value for n = 10 and n = 26. 
(This exercise generalizes Exercise 1.5.) Section 5.2. The Vigenère Cipher 5.10. Encrypt each of the following Vigenère plaintexts using the given keyword and the Vigenère tableau (Table 5.1). (a) Keyword: hamlet Plaintext: To be, or not to be, that is the question.
(b) Keyword: fortune Plaintext: The treasure is buried under the big W.

5.11. Decrypt each of the following Vigenère ciphertexts using the given keyword and the Vigenère tableau (Table 5.1). (a) Keyword: condiment Ciphertext: r s g h z b m c x t d v f s q h n i g q x r n b m p d n s q s m b t r k u (b) Keyword: rabbithole Ciphertext: k h f e q y m s c i e t c s i g j v p w f f b s q m o a p x z c s f x e p s o x y e n p k d a i c x c e b s m t t p t x z o o e q l a f l g k i p o c z s w q m t a u j w g h b o h v r j t q h u

5.12. Explain how a cipher wheel with rotating inner wheel (see Fig. 1.1 on page 3) can be used in place of a Vigenère tableau (Table 5.1) to perform Vigenère encryption and decryption. Illustrate by describing the sequence of rotations used to perform a Vigenère encryption with the keyword mouse.

5.13. Let s = "I am the very model of a modern major general." t = "I have information vegetable, animal, and mineral." (a) Make frequency tables for s and t. (b) Compute IndCo(s) and IndCo(t). (c) Compute MutIndCo(s, t).

5.14. The following strings are blocks from a Vigenère encryption. It turns out that the keyword contains a repeated letter, so two of these blocks were encrypted with the same shift. Compute MutIndCo(si, sj) for 1 ≤ i < j ≤ 3 and use these values to deduce which two strings were encrypted using the same shift. s1 = iwseesetftuonhdptbunnybioeatneghictdnsevi s2 = qibfhroeqeickxmirbqlflgkrqkejbejpepldfjbk s3 = iesnnciiheptevaireittuevmhooottrtaaflnatg
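Exercises 5.13 through 5.18 all revolve around the index of coincidence IndCo and the mutual index of coincidence MutIndCo, and Exercise 5.18 explicitly suggests writing a program. Here is a minimal Python sketch, assuming the definitions IndCo(s) = sum_c f_c(f_c − 1)/(n(n − 1)) and MutIndCo(s, t) = sum_c f_c g_c/(nm), where f_c and g_c count occurrences of the letter c and n, m are the string lengths; these should be checked against the statements in Sect. 5.2 before relying on them.

```python
from collections import Counter

def letters(s):
    """Keep alphabetic characters only, uppercased."""
    return [c for c in s.upper() if c.isalpha()]

def ind_co(s):
    """Index of coincidence: the probability that two letters drawn
    from distinct positions of s are equal."""
    s = letters(s)
    n = len(s)
    f = Counter(s)
    return sum(k * (k - 1) for k in f.values()) / (n * (n - 1))

def mut_ind_co(s, t):
    """Mutual index of coincidence: the probability that a letter
    drawn from s equals a letter drawn from t."""
    s, t = letters(s), letters(t)
    f, g = Counter(s), Counter(t)
    return sum(f[c] * g[c] for c in f) / (len(s) * len(t))

# English text typically gives IndCo near 0.065, while a uniformly
# random letter string gives about 1/26, roughly 0.0385.
```

For Exercise 5.14, one would compute mut_ind_co(si, sj) for the three pairs of blocks and look for the one value close to 0.065 rather than to 0.0385.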
(b) One of the following two strings was encrypted using a simple substitution cipher, while the other is a random permutation of the same set of letters. s1 = NTDCFVDHCTHKGUNGKEPGXKEWNECKEGWEWETWKUEVHDKK CDGCWXKDEEAMNHGNDIWUVWSSCTUNIGDSWKE
nhqrk vvvfe fwgjo mzjgc kocgk lejrj wossy wgvkk hnesg kwebi bkkcj vqazx wnvll zetjc zwgqz zwhah kwdxj fgnyw gdfgh bitig mrkwn nsuhy iecru ljjvs qlvvw zzxyv woenx ujgyr kqbfj lvjzx dxjfg nywus rwoar xhvvx ssmja vkrwt uhktm malcz ygrsz xwnvl lzavs hyigh rvwpn ljazl nispv jahym ntewj jvrzg qvzcr estul fkwis tfylk ysnir rddpb svsux zjgqk xouhs zzrjj kyiwc zckov qyhdv rhhny wqhyi rjdqm iwutf nkzgd vvibg oenwb kolca mskle cuwwz rgusl zgfhy etfre ijjvy ghfau wvwtn xlljv vywyj apgzw trggr dxfgs ceyts tiiih vjjvt tcxfj hciiv voaro lrxij vjnok mvrgw kmirt twfer oimsb qgrgc

Table 5.13: A Vigenère ciphertext for Exercise 5.16

togmg gbymk kcqiv dmlxk kbyif vcuek cuuis vvxqs pwwej koqgg phumt whlsf yovww knhhm rcqfq vvhkw psued ugrsf ctwij khvfa thkef fwptj ggviv cgdra pgwvm osqxg hkdvt whuev kcwyj psgsn gfwsl jsfse ooqhw tofsh aciin gfbif gabgj adwsy topml ecqzw asgvs fwrqs fsfvq rhdrs nmvmk cbhrv kblxk gzi

Table 5.14: A Vigenère ciphertext for Exercise 5.17

s2 = IGWSKGEHEXNGECKVWNKVWNKSUTEHTWHEKDNCDXWSIEKD AECKFGNDCPUCKDNCUVWEMGEKWGEUTDGTWHD

Thus their Indices of Coincidence are identical. Develop a method to compute a bigram index of coincidence, i.e., the frequency of pairs of letters, and use it to determine which string is most likely the encrypted text. (Bonus: Decrypt the encrypted texts in (a) and (b), but be forewarned that the plaintexts are in Latin.)

5.16. Table 5.13 is a Vigenère ciphertext in which we have marked some of the repeated trigrams for you. How long do you think the keyword is? Why? Bonus: Complete the cryptanalysis and recover the plaintext.

5.17. We applied a Kasiski test to the Vigenère ciphertext listed in Table 5.14 and found that the key length is probably 5. We then performed a mutual index of coincidence test to each shift of each pair of blocks and listed the results for you in Table 5.15. (This is the same type of table as Table 5.5 in the text, except that we haven't underlined the large values.) 
Use Table 5.15 to guess the relative rotations of the blocks, as we did in Table 5.6. This will give you a rotated version of the keyword. Try rotating it, as we did in Table 5.7, to find the correct keyword and decrypt the text. 5.18. Table 5.16 gives a Vigenère ciphertext for you to analyze from scratch. It is probably easiest to do so by writing a computer program, but you are welcome to try to decrypt it with just paper and pencil. (a) Make a list of matching trigrams as we did in Table 5.3. Use the Kasiski test on matching trigrams to find the likely key length.
Blocks (i, j) and mutual indices of coincidence, shift amounts 0 through 12:

i j   0     1     2     3     4     5     6     7     8     9     10    11    12
1 2 0.044 0.047 0.021 0.054 0.046 0.038 0.022 0.034 0.057 0.035 0.040 0.023 0.038
1 3 0.038 0.031 0.027 0.037 0.045 0.036 0.034 0.032 0.039 0.039 0.047 0.038 0.050
1 4 0.025 0.039 0.053 0.043 0.023 0.035 0.032 0.043 0.029 0.040 0.041 0.050 0.027
1 5 0.050 0.050 0.025 0.031 0.038 0.045 0.037 0.028 0.032 0.038 0.063 0.033 0.034
2 3 0.035 0.037 0.039 0.031 0.031 0.035 0.047 0.048 0.034 0.031 0.031 0.067 0.053
2 4 0.040 0.033 0.046 0.031 0.033 0.023 0.052 0.027 0.031 0.039 0.078 0.034 0.029
2 5 0.042 0.040 0.042 0.029 0.033 0.035 0.035 0.038 0.037 0.057 0.039 0.038 0.040
3 4 0.032 0.033 0.035 0.049 0.053 0.027 0.030 0.022 0.047 0.036 0.040 0.036 0.052
3 5 0.043 0.043 0.040 0.034 0.033 0.034 0.043 0.035 0.026 0.030 0.050 0.068 0.044
4 5 0.045 0.033 0.044 0.046 0.021 0.032 0.030 0.038 0.047 0.040 0.025 0.037 0.068

Shift amounts 13 through 25:

i j   13    14    15    16    17    18    19    20    21    22    23    24    25
1 2 0.040 0.063 0.033 0.025 0.032 0.055 0.038 0.030 0.032 0.045 0.035 0.030 0.044
1 3 0.026 0.046 0.042 0.053 0.027 0.024 0.040 0.047 0.048 0.018 0.037 0.034 0.066
1 4 0.042 0.050 0.042 0.031 0.024 0.052 0.027 0.051 0.020 0.037 0.042 0.069 0.031
1 5 0.030 0.048 0.039 0.030 0.034 0.038 0.042 0.035 0.036 0.043 0.055 0.030 0.035
2 3 0.039 0.015 0.030 0.045 0.049 0.037 0.023 0.036 0.030 0.049 0.039 0.050 0.037
2 4 0.027 0.048 0.050 0.037 0.032 0.021 0.035 0.043 0.047 0.041 0.047 0.042 0.035
2 5 0.033 0.035 0.039 0.033 0.037 0.047 0.037 0.028 0.034 0.066 0.054 0.032 0.022
3 4 0.040 0.048 0.041 0.044 0.033 0.028 0.039 0.027 0.036 0.017 0.038 0.051 0.065
3 5 0.039 0.029 0.045 0.040 0.033 0.028 0.031 0.037 0.038 0.036 0.033 0.051 0.036
4 5 0.049 0.033 0.029 0.043 0.028 0.033 0.020 0.040 0.040 0.041 0.039 0.039 0.059

Table 5.15: Mutual indices of coincidence for Exercise 5.17

mgodt beida psgls akowu hxukc iawlr csoyh prtrt udrqh cengx uuqtu habxw dgkie ktsnp sekld zlvnh wefss glzrn peaoy lbyig uaafv eqgjo ewabz saawl rzjpv feyky 
gylwu btlyd kroec bpfvt psgki puxfb uxfuq cvymy okagl sactt uwlrx psgiy ytpsf rjfuw igxhr oyazd rakce dxeyr pdobr buehr uwcue ekfic zehrq ijezr xsyor tcylf egcy

Table 5.16: A Vigenère ciphertext for Exercise 5.18

(b) Make a table of indices of coincidence for various key lengths, as we did in Table 5.4. Use your results to guess the probable key length. (c) Using the probable key length from (a) or (b), make a table of mutual indices of coincidence between rotated blocks, as we did in Table 5.5. Pick the largest indices from your table and use them to guess the relative rotations of the blocks, as we did in Table 5.6. (d) Use your results from (c) to guess a rotated version of the keyword, and then try the different rotations as we did in Table 5.7 to find the correct keyword and decrypt the text.

5.19. The autokey cipher is similar to the Vigenère cipher, except that rather than repeating the key, it simply uses the key to encrypt the first few letters and then uses the plaintext itself (shifted over) to continue the encryption. For example, in order to encrypt the message "The autokey cipher is cool" using the keyword random, we proceed as follows:
Plaintext:  t h e a u t o k e y c i p h e r i s c o o l
Key:        r a n d o m t h e a u t o k e y c i p h e r
Ciphertext: k h r d i f h r i y w b d r i p k a r v s c

The autokey cipher has the advantage that different messages are encrypted using different keys (except for the first few letters). Further, since the key does not repeat, there is no key length, so the autokey is not directly susceptible to a Kasiski or index of coincidence analysis. A disadvantage of the autokey is that a single mistake in encryption renders the remainder of the message unintelligible. According to [63], Vigenère invented the autokey cipher in 1586, but his invention was ignored and forgotten before being reinvented in the 1800s. (a) Encrypt the following message using the autokey cipher: Keyword: LEAR Plaintext: Come not between the dragon and his wrath. (b) Decrypt the following message using the autokey cipher: Keyword: CORDELIA Ciphertext: pckkm yowvz ejwzk knyzv vurux cstri tgac (c) Eve intercepts an autokey ciphertext and manages to steal the accompanying plaintext: Plaintext ifmusicbethefoodofloveplayon Ciphertext azdzwqvjjfbwnqphhmptjsszfjci Help Eve to figure out the keyword that was used for encryption. Describe your method in sufficient generality to show that the autokey cipher is susceptible to known plaintext attacks. (d) Bonus Problem: Try to formulate a statistical or algebraic attack on the autokey cipher, assuming that you are given a large amount of ciphertext to analyze.

Section 5.3. Probability Theory

5.20. Use the definition (5.15) of the probability of an event to prove the following basic facts about probability theory. (a) Let E and F be disjoint events. Then Pr(E ∪ F) = Pr(E) + Pr(F). (b) Let E and F be events that need not be disjoint. Then Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(E ∩ F). (c) Let E be an event. Then Pr(E^c) = 1 − Pr(E). (d) Let E1, E2, E3 be events. 
Prove that

Pr(E1 ∪ E2 ∪ E3) = Pr(E1) + Pr(E2) + Pr(E3) − Pr(E1 ∩ E2) − Pr(E1 ∩ E3) − Pr(E2 ∩ E3) + Pr(E1 ∩ E2 ∩ E3).

The formulas in (b) and (d) and their generalization to n events are known as the inclusion–exclusion principle.

5.21. We continue with the coin tossing scenario from Example 5.23, so our experiment consists of tossing a fair coin ten times. Compute the probabilities of the following events.
(a) The first and last tosses are both heads. (b) Either the first toss or the last toss (or both) are heads. (c) Either the first toss or the last toss (but not both) are heads. (d) There are exactly k heads and 10 − k tails. Compute the probability for each value of k between 0 and 10. (Hint. To save time, note that the probability of exactly k heads is the same as the probability of exactly k tails.) (e) There is an even number of heads. (f) There is an odd number of heads.

5.22. Alice offers to make the following bet with you. She will toss a fair coin 14 times. If exactly 7 heads come up, she will give you $4; otherwise you must give her $1. Would you take this bet? If so, and if you repeated the bet 10000 times, how much money would you expect to win or lose?

5.23. Let E and F be events. (a) Prove that Pr(E | E) = 1. Explain in words why this is reasonable. (b) If E and F are disjoint, prove that Pr(F | E) = 0. Explain in words why this is reasonable. (c) Let F1, . . . , Fn be events satisfying Fi ∩ Fj = ∅ for all i ≠ j. We say that F1, . . . , Fn are pairwise disjoint. Prove then that

Pr(F1 ∪ F2 ∪ · · · ∪ Fn) = Pr(F1) + Pr(F2) + · · · + Pr(Fn).

(d) Let F1, . . . , Fn be pairwise disjoint as in (c), and assume further that F1 ∪ · · · ∪ Fn = Ω, where recall that Ω is the entire sample space. Prove the following general version of the decomposition formula (5.20) in Proposition 5.24(a):

Pr(E) = Pr(E | F1) Pr(F1) + Pr(E | F2) Pr(F2) + · · · + Pr(E | Fn) Pr(Fn).

(e) Prove a general version of Bayes's formula:

Pr(Fi | E) = Pr(E | Fi) Pr(Fi) / [ Pr(E | F1) Pr(F1) + Pr(E | F2) Pr(F2) + · · · + Pr(E | Fn) Pr(Fn) ].

5.24. There are two urns containing pens and pencils. Urn #1 contains three pens and seven pencils and Urn #2 contains eight pens and four pencils. (a) An urn is chosen at random and an object is drawn. What is the probability that it is a pencil? (b) An urn is chosen at random and an object is drawn. If the object drawn is a pencil, what is the probability that it came from Urn #1? 
(c) If an urn is chosen at random and two objects are drawn simultaneously, what is the probability that both are pencils? 5.25. An urn contains 20 silver coins and 10 gold coins. You are the sixth person in line to randomly draw and keep a coin from the urn. (a) What is the probability that you draw a gold coin?
(b) If you draw a gold coin, what is the probability that the five people ahead of you all drew silver coins?
5.26. Consider the three prisoners scenario described in Example 5.26. Let A, B, and C denote respectively the events that Alice is to be released, Bob is to be released, and Carl is to be released, which we assume to be equally likely, so Pr(A) = Pr(B) = Pr(C) = 1/3. Also let J be the event that the jailer tells Alice that Bob is to stay in jail.
(a) Compute the values of Pr(B | J), Pr(J | B), and Pr(J | C).
(b) Compute the values of Pr(J | A^c) and Pr(J^c | A^c), where A^c is the event that Alice stays in jail.
(c) Suppose that if Alice is the one who is to be released, then the jailer flips a fair coin to decide whether to tell Alice that Bob stays in jail or that Carl stays in jail. What is the value of Pr(A | J)?
(d) Suppose instead that if Alice is the one who is to be released, then the jailer always tells her that Bob will stay in jail. Now what is the value of Pr(A | J)?
Other similar problems with counterintuitive conclusions include the Monty Hall problem (Exercise 5.27), Bertrand's box paradox, and the principle of restricted choice in contract bridge.
5.27. (The Monty Hall Problem) Monty Hall gives Dan the choice of three curtains. Behind one curtain is a car, while behind the other two curtains are goats. Dan chooses a curtain, but before it is opened, Monty Hall opens one of the other curtains and reveals a goat. He then offers Dan the option of keeping his original curtain or switching to the remaining closed curtain. The Monty Hall problem is to figure out Dan's best strategy: "To stick or to switch?"
(a) What is the probability that Dan wins the car if he always sticks to his first choice of curtain? What is the probability that Dan wins the car if he always switches curtains? Which is his best strategy?
(If the answer seems counter- intuitive, suppose instead that there are 1000 curtains and that Monty Hall opens 998 goat curtains. Now what are the winning probabilities for the two strategies?) (b) Suppose that we give Monty Hall another option, namely he’s allowed to force Dan to stick with his first choice of curtain. Assuming that Monty Hall dislikes giving away cars, now what is Dan’s best strategy, and what is his probability of winning a car? (c) More generally, suppose that there are N curtains and M cars, and suppose that Monty Hall opens K curtains that have goats behind them. Compute the probabilities Pr(Dan wins a car | Dan sticks), Pr(Dan wins a car | Dan switches). Which is the better strategy? 5.28. Let S be a set, let A be a property of interest, and suppose that for m ∈ S, we have Pr(m does not have property A) = δ. Suppose further that a Monte Carlo algorithm applied to m and a random number r satisfy:
(1) If the algorithm returns Yes, then m definitely has property A.
(2) If m has property A, then the probability that the algorithm returns Yes is at least p.
Notice that we can restate (1) and (2) as conditional probabilities:
(1) Pr(m has property A | algorithm returns Yes) = 1,
(2) Pr(algorithm returns Yes | m has property A) ≥ p.
Suppose that we run the algorithm N times on the number m, and suppose that the algorithm returns No every single time. Derive a lower bound, in terms of δ, p, and N, for the probability that m does not have property A. (This generalizes the version of the Monte Carlo method that we studied in Sect. 5.3.3 with δ = 0.01 and p = 1/2. Be careful to distinguish p from 1 − p in your calculations.)
5.29. We continue with the setup described in Exercise 5.28.
(a) Suppose that δ = 9/10 and p = 3/4. If we run the algorithm 25 times on the input m and always get back No, what is the probability that m does not have property A?
(b) Same question as (a), but this time we run the algorithm 100 times.
(c) Suppose that δ = 99/100 and p = 1/2. How many times should we run the algorithm on m to be 99% confident that m does not have property A, assuming that every output is No?
(d) Same question as (c), except now we want to be 99.9999% confident.
5.30. If an integer n is composite, then the Miller–Rabin test has at least a 75% chance of succeeding in proving that n is composite, while it never misidentifies a prime as being composite. (See Table 3.2 in Sect. 3.4 for a description of the Miller–Rabin test.) Suppose that we run the Miller–Rabin test N times on the integer n and that it fails to prove that n is composite. Show that the probability that n is prime satisfies (approximately)

Pr(n is prime | the Miller–Rabin test fails N times) ≥ 1 − ln(n)/4^N.

(Hint. Use Exercise 5.28 with appropriate choices of A, S, δ, and p. You may also use the estimate from Sect.
3.4.1 that the probability that n is prime is approximately 1/ln(n).)
5.31. It is natural to assume that if Pr(E | F) is significantly larger than Pr(E), then somehow F is causing E. Bayes's formula illustrates the fallacy of this sort of reasoning, since it says that

Pr(E | F) / Pr(E) = Pr(F | E) / Pr(F).

So if F is "causing" E, then the same reasoning shows that E is "causing" F. All that one can really say is that E and F are correlated with one another, in the sense that either one of them being true makes it more likely that the other one is true. It is incorrect to deduce a cause-and-effect relation. Here is a concrete example. Testing shows that first graders are more likely to be good spellers if their shoe sizes are larger than average. This is an experimental fact. Hence if we stretch a child's foot, it will make them a better speller! Alternatively,
  • 470. 292 Exercises by Baye’s formula, if we give them extra spelling lessons, then their feet will grow faster! Explain why these last two assertions are nonsense, and describe what’s really going on. 5.32. Let fX (k) be the binomial density function (5.23). Prove directly, using the binomial theorem, that n k=0 fX (k) = 1. 5.33. In Example 5.37 we used a differentiation trick to compute the value of the infinite series ∞ n=1 np(1−p)n−1 . This exercise further develops this useful technique. The starting point is the formula for the geometric series ∞ n=0 xn = 1 1 − x for |x| 1 (5.57) and the differential operator D = x d dx . (a) Using the fact that D(xn ) = nxn , prove that ∞ n=1 nxn = x (1 − x)2 (5.58) by applying D to both sides of (5.57). For which x does the left-hand side of (5.58) converge? (Hint. Use the ratio test.) (b) Applying D again, prove that ∞ n=0 n2 xn = x + x2 (1 − x)3 . (5.59) (c) More generally, prove that for every value of k there is a polynomial Fk(x) such that ∞ n=0 nk xn = Fk(x) (1 − x)k+1 . (5.60) (Hint. Use induction on k.) (d) The first few polynomials Fk(x) in (c) are F0(x) = 1, F1(x) = x, and F2(x) = x + x2 . These follow from (5.57), (5.58), and (5.59). Compute F3(x) and F4(x). (e) Prove that the polynomial Fk(x) in (c) has degree k. 5.34. In each case, compute the expectation of the random variable X. (a) The values of X are uniformly distributed on the set {0, 1, 2, . . . , N − 1}. (See Example 5.28.) (b) The values of X are uniformly distributed on the set {1, 2, . . . , N}. (c) The values of X are uniformly distributed on the set {1, 3, 7, 11, 19, 23}. (d) X is a random variable with a binomial density function; see formula (5.23) in Example 5.29 on page 240. 5.35. Let X be a random variable on the probability space Ω. It might seem more natural to define the expected value of X by the formula
Σ_{ω∈Ω} X(ω) · Pr(ω).   (5.61)

Prove that the formula (5.61) gives the same value as Eq. (5.27) on page 244, which we used in the text to define E(X).
Section 5.4. Collision Algorithms and the Birthday Paradox
5.36. (a) In a group of 23 strangers, what is the probability that at least two of them have the same birthday? How about if there are 40 strangers? In a group of 200 strangers, what is the probability that one of them has the same birthday as your birthday? (Hint. See the discussion in Sect. 5.4.1.)
(b) Suppose that there are N days in a year (where N could be any number) and that there are n people. Develop a general formula, analogous to (5.28), for the probability that at least two of them have the same birthday. (Hint. Do a calculation similar to the proof of (5.28) in the collision theorem (Theorem 5.38), but note that the formula is a bit different because the birthdays are being selected from a single list of N days.)
(c) Find a lower bound of the form

Pr(at least one match) ≥ 1 − e^{−(some function of n and N)}

for the probability in (b), analogous to the estimate (5.29).
5.37. A deck of cards is shuffled and the top eight cards are turned over.
(a) What is the probability that the king of hearts is visible?
(b) A second deck is shuffled and its top eight cards are turned over. What is the probability that a visible card from the first deck matches a visible card from the second deck? (Note that this is slightly different from Example 5.39 because the cards in the second deck are not being replaced.)
5.38. (a) Prove that e^{−x} ≥ 1 − x for all values of x. (Hint. Look at the graphs of e^{−x} and 1 − x, or use calculus to compute the minimum of the function f(x) = e^{−x} − (1 − x).)
(b) Prove that for all a ≥ 1, the inequality e^{−ax} ≤ (1 − x)^a + (1/2)a x^2 is valid for all 0 ≤ x ≤ 1. (This is a challenging problem.)
(c) We used the inequality in (a) during the proof of the lower bound (5.29) in the collision theorem (Theorem 5.38).
Use (b) to prove that

Pr(at least one red) ≤ 1 − e^{−mn/N} + mn^2/(2N^2).

Thus if N is large and m and n are not much larger than √N, then the estimate Pr(at least one red) ≈ 1 − e^{−mn/N} is quite accurate. (Hint. Use (b) with a = m and x = n/N.)
5.39. Solve the discrete logarithm problem 10^x = 106 in the finite field F811 by finding a collision among the random powers 10^i and 106 · 10^i that are listed in Table 5.17.
  • 472. 294 Exercises i gi h · gi 116 96 444 497 326 494 225 757 764 233 517 465 677 787 700 622 523 290 i gi h · gi 519 291 28 286 239 193 298 358 642 500 789 101 272 24 111 307 748 621 i gi h · gi 791 496 672 385 437 95 178 527 714 471 117 237 42 448 450 258 413 795 i gi h · gi 406 801 562 745 194 289 234 304 595 556 252 760 326 649 670 399 263 304 Table 5.17: Data for Exercise 5.39, g = 10, h = 106, p = 811 Section 5.5. Pollard’s ρ Method 5.40. Table 5.18 gives some of the computations for the solution of the discrete logarithm problem 11t = 41387 in F81799 (5.62) using Pollard’s ρ method. (It is similar to Table 5.11 in Example 5.52.) Use the data in Table 5.18 to solve (5.62). i xi yi αi βi γi δi 0 1 1 0 0 0 0 1 11 121 1 0 2 0 2 121 14641 2 0 4 0 3 1331 42876 3 0 12 2 4 14641 7150 4 0 25 4 . . . 151 4862 33573 40876 45662 29798 73363 152 23112 53431 81754 9527 37394 48058 153 8835 23112 81755 9527 67780 28637 154 15386 15386 81756 9527 67782 28637 Table 5.18: Computations to solve 11t = 41387 in F81799 for Exercise 5.40 5.41. Table 5.19 gives some of the computations for the solution of the discrete logarithm problem 7t = 3018 in F7963 (5.63) using Pollard’s ρ method. (It is similar to Table 5.11 in Example 5.52.) Extend Table 5.19 until you find a collision (we promise that it won’t take too long) and then solve (5.63). 5.42. Write a computer program implementing Pollard’s ρ method for solving the discrete logarithm problem and use it to solve each of the following: (a) 2t = 2495 in F5011. (b) 17t = 14226 in F17959. (c) 29t = 5953042 in F15239131.
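For Exercise 5.42, the random-walk idea of Sect. 5.5 can be turned into a short program. The following is a minimal Python sketch, not the book's reference implementation: the function name and the three-way partition of the walk by x mod 3 are our choices. It runs Floyd's tortoise-and-hare until the walk collides, then solves the resulting congruence t(β1 − β2) ≡ α2 − α1 (mod p − 1), restarting from a fresh random point if the collision is degenerate.

```python
from math import gcd
import random

def pollard_rho_dlp(g, h, p, max_tries=100):
    """Solve g^t = h (mod p) by Pollard's rho with Floyd cycle detection.
    Exponent arithmetic is done mod n = p - 1 (cleanest when g generates F_p^*)."""
    n = p - 1
    def step(x, a, b):
        # Pseudorandom walk on triples (x, a, b) with x = g^a * h^b (mod p),
        # partitioned by x mod 3 as in the walks of Sect. 5.5.
        if x % 3 == 0:
            return (x * g) % p, (a + 1) % n, b
        elif x % 3 == 1:
            return (x * x) % p, (2 * a) % n, (2 * b) % n
        else:
            return (x * h) % p, a, (b + 1) % n
    for _ in range(max_tries):
        a0, b0 = random.randrange(n), random.randrange(n)
        x0 = pow(g, a0, p) * pow(h, b0, p) % p
        tort = hare = (x0, a0, b0)
        while True:
            tort = step(*tort)
            hare = step(*step(*hare))
            if tort[0] == hare[0]:
                break
        # g^a1 h^b1 = g^a2 h^b2  =>  t*(b1 - b2) = a2 - a1 (mod n)
        db = (tort[2] - hare[2]) % n
        da = (hare[1] - tort[1]) % n
        d = gcd(db, n)
        if db == 0 or da % d != 0:
            continue  # degenerate collision; restart the walk
        m = n // d
        t0 = (da // d) * pow(db // d, -1, m) % m
        for k in range(d):  # try all d candidate solutions mod n
            t = t0 + k * m
            if pow(g, t, p) == h:
                return t
    raise ValueError("no solution found")
```

Run on Exercise 5.39's instance, `pollard_rho_dlp(10, 106, 811)` returns an exponent t with 10^t ≡ 106 (mod 811); the same routine handles the instances of Exercise 5.42 for larger primes.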
  • 473. Exercises 295 i xi yi αi βi γi δi 0 1 1 0 0 0 0 1 7 49 1 0 2 0 2 49 2401 2 0 4 0 3 343 6167 3 0 6 0 4 2401 1399 4 0 7 1 . . . 87 1329 1494 6736 7647 3148 3904 88 1340 1539 6737 7647 3150 3904 89 1417 4767 6738 7647 6302 7808 90 1956 1329 6739 7647 4642 7655 Table 5.19: Computations to solve 7t = 3018 in F7963 for Exercise 5.41 5.43. Evaluate the integral I = ∞ 0 t2 e−t2 /2 dt appearing in the proof of Theo- rem 5.48. (Hint. Write I2 as an iterated integral, I2 = ∞ 0 ∞ 0 x2 e−x2 /2 · y2 e−y2 /2 dx dy, and switch to polar coordinates.) 5.44. This exercise describes Pollard’s ρ factorization algorithm. It is particularly good at factoring numbers N that have a prime factor p with the property that p is considerably smaller than N/p. Later we will study an even faster, albeit more complicated, factorization algorithm with this property that is based on the theory of elliptic curves; see Sect. 6.6. Let N be an integer that is not prime, and let f : Z/NZ −→ Z/NZ be a mixing function, for example f(x) = x2 + 1 mod N. As in the abstract version of Pollard’s ρ method (Theorem 5.48), let x0 = y0 be an initial value, and generate sequences by setting xi+1 = f(xi) and yi+1 = f(f(yi)). At each step, also compute the greatest common divisor gi = gcd |xi − yi|, N . (a) Let p be the smallest prime divisor of N. If the function f is sufficiently random, show that with high probability we have gk = p for some k = O( √ p). Hence the algorithm factors N in O( √ p) steps. (b) Program Pollard’s ρ algorithm with f(x) = x2 + 1 and x0 = y0 = 0, and use it to factor the following numbers. In each case, give the smallest value of k such that gk is a nontrivial factor of N and print the ratio k/ √ N. (i) N = 2201. (ii) N = 9409613. (iii) N = 1782886219.
  • 474. 296 Exercises (c) Repeat your computations in (b) using the function f(x) = x2 + 2. Do the running times change? (d) Explain what happens if you run Pollard’s ρ algorithm and N is prime. (e) Explain what happens if you run Pollard’s ρ algorithm with f(x) = x2 and any initial values for x0. (f) Try running Pollard’s ρ algorithm with the function f(x) = x2 − 2. Explain what is happening. (Hint. This part is more challenging. It may help to use the identity fn (u + u−1 ) = u2n + u−2n , which you can prove by induction.) Section 5.6. Information Theory 5.45. Consider the cipher that has three keys, three plaintexts, and four ciphertexts that are combined using the following encryption table (which is similar to Table 5.12 used in Example 5.54 on page 265). m1 m2 m3 k1 c2 c4 c1 k2 c1 c3 c2 k3 c3 c1 c2 Suppose further that the plaintexts and keys are used with the following probabili- ties: f(m1) = f(m2) = 2 5 , f(m3) = 1 5 , f(k1) = f(k2) = f(k3) = 1 3 . (a) Compute f(c1), f(c2), f(c3), and f(c4). (b) Compute f(c1 | m1), f(c1 | m2), and f(c1 | m3). Does this cryptosystem have perfect secrecy? (c) Compute f(c2 | m1) and f(c3 | m1). (d) Compute f(k1 | c3) and f(k2 | c3). 5.46. Suppose that a shift cipher is employed such that each key, i.e., each shift amount from 0 to 25, is used with equal probability and such that a new key is chosen to encrypt each successive letter. Show that this cryptosystem has perfect secrecy by filling in the details of the following steps. (a) Show that k∈K fM (dk(c)) = 1 for every ciphertext c ∈ C. (b) Compute the ciphertext density function fC using (5.47), which in this case says that fC (c) = k∈K fK (k)fM (dk(c)). (c) Compare fC (c) to fC|M (c | m). 5.47. Give the details of the proof of (5.47), which says that fC (c) = k ∈ K such that c ∈ ek(M) fK (k)fM dk(c) . (Hint. Use the decomposition formula from Exercise 5.23(d)).
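The factorization loop described in Exercise 5.44 above can be sketched in a few lines of Python. This is a minimal sketch rather than the book's reference implementation; the outer retry loop over the constant c in f(x) = x^2 + c is our addition, to restart the walk in the rare case that the gcd jumps straight to N.

```python
from math import gcd

def pollard_rho_factor(N, max_tries=20):
    """Pollard's rho factorization (Exercise 5.44): iterate f(x) = x^2 + c mod N
    and watch for a nontrivial gcd.  Returns a nontrivial factor of composite N."""
    for c in range(1, max_tries + 1):   # try a new constant if a walk collapses
        f = lambda x: (x * x + c) % N
        x = y = 0                       # x0 = y0 = 0 as in part (b)
        while True:
            x = f(x)                    # x_{i+1} = f(x_i)
            y = f(f(y))                 # y_{i+1} = f(f(y_i))
            g = gcd(abs(x - y), N)
            if g == N:
                break                   # gcd hit N; restart with a different c
            if g > 1:
                return g
    raise ValueError("no factor found; is N prime?")
```

For example, `pollard_rho_factor(2201)` returns a nontrivial factor of 2201; by part (a), the expected number of iterations grows like the square root of the smallest prime factor of N.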
5.48. Suppose that a cryptosystem has the same number of plaintexts as it does ciphertexts (#M = #C). Prove that for any given key k ∈ K and any given ciphertext c ∈ C, there is a unique plaintext m ∈ M that encrypts to c using the key k. (We used this fact during the proof of Theorem 5.56. Notice that the proof does not require the cryptosystem to have perfect secrecy; all that is needed is that #M = #C.)
5.49. Let S_{m,c} = {k ∈ K : ek(m) = c} be the set used during the proof of Theorem 5.56. Prove that if c ≠ c′, then S_{m,c} ∩ S_{m,c′} = ∅. (Prove this for any cryptosystem; it is not necessary to assume perfect secrecy.)
5.50. Suppose that a cryptosystem satisfies #K = #M = #C and that it has perfect secrecy. Prove that every ciphertext is used with equal probability and that every plaintext is used with equal probability. (Hint. We proved one of these during the course of proving Theorem 5.56. The proof of the other is similar.)
5.51. Prove the "only if" part of Theorem 5.56, i.e., prove that if a cryptosystem with an equal number of keys, plaintexts, and ciphertexts satisfies conditions (a) and (b) of Theorem 5.56, then it has perfect secrecy.
5.52. Let Xn be a uniformly distributed random variable on n objects, and let r ≥ 1. Prove directly from Property H3 of entropy that H(X_{n^r}) = rH(Xn). This generalizes Example 5.58.
5.53. Let X, Y , and Z1, . . . , Zm be random variables as described in Property H3 on page 270. Let pi = Pr(Y = Zi) and qij = Pr(Zi = xij), so Pr(X = xij) = pi qij. With this notation, Property H3 says that

H((pi qij)_{1≤i≤n, 1≤j≤mi}) = H((pi)_{1≤i≤n}) + Σ_{i=1}^{n} pi · H((qij)_{1≤j≤mi}).

(See Example 5.59.) Then the formula (5.51) for entropy given in Theorem 5.60 implies that

Σ_{i=1}^{n} Σ_{j=1}^{mi} pi qij log2(pi qij) = Σ_{i=1}^{n} pi log2(pi) + Σ_{i=1}^{n} pi Σ_{j=1}^{mi} qij log2(qij).   (5.64)

Prove directly that (5.64) is true. (Hint. Remember that the probabilities satisfy Σ_i pi = 1 and Σ_j qij = 1.)
5.54.
Let F(x) be a twice differentiable function with the property that F″(x) < 0 for all x in its domain. Prove that F is concave in the sense of (5.52). Conclude in particular that the function F(x) = log x is concave for all x > 0.
5.55. Use induction to prove Jensen's inequality (Theorem 5.61).
5.56. Let X and Y be independent random variables.
  • 476. 298 Exercises (a) Prove that the equivocation H(X | Y ) is equal to the entropy H(X). (b) If H(X | Y ) = H(X), is it necessarily true that X and Y are independent? 5.57. Prove that key equivocation satisfies the formula H(K | C) = H(K) + H(M) − H(C) as described in Proposition 5.64. 5.58. We continue with the cipher described in Exercise 5.45. (a) Compute the entropies H(K), H(M), and H(C). (b) Compute the key equivocation H(K | C). 5.59. Suppose that the key equivocation of a certain cryptosystem vanishes, i.e., suppose that H(K | C) = 0. Prove that even a single observed ciphertext uniquely determines which key was used. 5.60. Write a computer program that reads a text file and performs the following tasks: [1] Convert all alphabetic characters to lowercase and convert all strings of con- secutive nonalphabetic characters to a single space. (The reason for leaving in a space is that when you count bigrams and trigrams, you will want to know where words begin and end.) [2] Count the frequency of each letter a-to-z, print a frequency table, and use your frequency table to estimate the entropy of a single letter in English, as we did in Sect. 5.6.3 using Table 1.3. [3] Count the frequency of each bigram aa, ab,. . . ,zz, being careful to include only bigrams that appear within words. (As an alternative, also allow bigrams that either start or end with a space, in which case there are 272 − 1 = 728 possible bigrams.) Print a frequency table of the 25 most common bigrams and their probabilities, and use your full frequency table to estimate the entropy of bigrams in English. In the notation of Sect. 5.6.3, this is the quantity H(L2 ). Compare 1 2 H(L2 ) with the value of H(L) from step [1]. [4] Repeat [3], but this time with trigrams. Compare 1 3 H(L3 ) with the values of H(L) and 1 2 H(L2 ) from [2] and [3]. (Note that for this part, you will need a large quantity of text in order to get some reasonable frequencies.) 
Try running your program on some long blocks of text. For example, the following noncopyrighted material is available in the form of ordinary text files from Project Gutenberg at http://www.gutenberg.org/. To what extent are the letter frequencies similar and to what extent do they differ in these different texts?
(a) Alice's Adventures in Wonderland by Lewis Carroll, http://www.gutenberg.org/etext/11
(b) Relativity: the Special and General Theory by Albert Einstein, http://www.gutenberg.org/etext/5001
(c) The Old Testament (translated from the original Hebrew, of course!), http://www.gutenberg.org/etext/1609
(d) 20000 Lieues Sous Les Mers (20000 Leagues Under the Sea) by Jules Verne, http://www.gutenberg.org/etext/5097. Note that this one is a little trickier, since first you will need to convert all of the letters to their unaccented forms.
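Steps [1] and [2] of Exercise 5.60 can be sketched as follows (a minimal Python sketch; the function name letter_entropy is ours). Bigram and trigram counts for steps [3] and [4] follow the same pattern, e.g. by counting pairs with Counter(zip(words_text, words_text[1:])).

```python
import re
from collections import Counter
from math import log2

def letter_entropy(text):
    """Steps [1]-[2] of Exercise 5.60: normalize the text, count single-letter
    frequencies, and estimate the entropy H(L) of a single letter."""
    # [1] lowercase, and collapse every run of non-letters to a single space
    text = re.sub(r"[^a-z]+", " ", text.lower())
    counts = Counter(ch for ch in text if ch != " ")
    total = sum(counts.values())
    freqs = {ch: n / total for ch, n in counts.items()}
    # [2] H(L) = -sum of p*log2(p) over the observed letter probabilities
    H = -sum(p * log2(p) for p in freqs.values())
    return freqs, H

freqs, H = letter_entropy("The quick brown fox jumps over the lazy dog!")
```

On a pangram like the one above, all 26 letters appear and H is close to (but below) the maximum possible value log2(26) ≈ 4.7; on a long English text, H drops toward the ≈ 4.1 bits estimated in Sect. 5.6.3.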
Chapter 6
Elliptic Curves and Cryptography
The subject of elliptic curves encompasses a vast amount of mathematics.1 Our aim in this section is to summarize just enough of the basic theory for cryptographic applications. For additional reading, there are a number of survey articles and books devoted to elliptic curve cryptography [14, 68, 81, 135], and many others that describe the number theoretic aspects of the theory of elliptic curves, including [25, 65, 73, 74, 136, 134, 138].
6.1 Elliptic Curves
An elliptic curve2 is the set of solutions to an equation of the form

Y^2 = X^3 + AX + B.

Equations of this type are called Weierstrass equations after the mathematician who studied them extensively during the nineteenth century. Two examples of elliptic curves,

E1 : Y^2 = X^3 − 3X + 3   and   E2 : Y^2 = X^3 − 6X + 5,

are illustrated in Fig. 6.1.
1Indeed, even before elliptic curves burst into cryptographic prominence, a well-known mathematician [73] opined that "it is possible to write endlessly on elliptic curves!"
2A word of warning. You may recall from high school geometry that an ellipse is a geometric object that looks like a squashed circle. Elliptic curves are not ellipses, and indeed, despite their somewhat unfortunate name, elliptic curves and ellipses have only the most tenuous connection with one another.
© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography, Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2_6
299
Figure 6.1: Two examples of elliptic curves (E1 : Y^2 = X^3 − 3X + 3 and E2 : Y^2 = X^3 − 6X + 5)

An amazing feature of elliptic curves is that there is a natural way to take two points on an elliptic curve and "add" them to produce a third point. We put quotation marks around "add" because we are referring to an operation that combines two points in a manner analogous to addition in some respects (it is commutative and associative, and there is an identity), but very unlike addition in other ways. The most natural way to describe the "addition law" on elliptic curves is to use geometry.
Let P and Q be two points on an elliptic curve E, as illustrated in Fig. 6.2. We start by drawing the line L through P and Q. This line L intersects E at three points, namely P, Q, and one other point R. We take that point R and reflect it across the x-axis (i.e., we multiply its Y-coordinate by −1) to get a new point R′. The point R′ is called the "sum of P and Q," although as you can see, this process is nothing like ordinary addition. For now, we denote this strange addition law by the symbol ⊕. Thus we write3

P ⊕ Q = R′.

Example 6.1. Let E be the elliptic curve

Y^2 = X^3 − 15X + 18.   (6.1)

The points P = (7, 16) and Q = (1, 2) are on the curve E. The line L connecting them is given by the equation4

L : Y = (7/3)X − 1/3.   (6.2)

In order to find the points where E and L intersect, we substitute (6.2) into (6.1) and solve for X. Thus
3Not to be confused with the identical symbol ⊕ that we used to denote the XOR operation in a different context!
4Recall that the equation of the line through two points (x1, y1) and (x2, y2) is given by the point–slope formula Y − y1 = λ · (X − x1), where the slope λ is equal to (y2 − y1)/(x2 − x1).
Figure 6.2: The addition law on an elliptic curve

((7/3)X − 1/3)^2 = X^3 − 15X + 18,
(49/9)X^2 − (14/9)X + 1/9 = X^3 − 15X + 18,
0 = X^3 − (49/9)X^2 − (121/9)X + 161/9.

We need to find the roots of this cubic polynomial. In general, finding the roots of a cubic is difficult. However, in this case we already know two of the roots, namely X = 7 and X = 1, since we know that P and Q are in the intersection E ∩ L. It is then easy to find the other factor,

X^3 − (49/9)X^2 − (121/9)X + 161/9 = (X − 7) · (X − 1) · (X + 23/9),

so the third point of intersection of L and E has X-coordinate equal to −23/9. Next we find the Y-coordinate by substituting X = −23/9 into Eq. (6.2). This gives R = (−23/9, −170/27). Finally, we reflect across the X-axis to obtain

P ⊕ Q = (−23/9, 170/27).

There are a few subtleties to elliptic curve addition that need to be addressed. First, what happens if we want to add a point P to itself? Imagine what happens to the line L connecting P and Q if the point Q slides along the curve and gets closer and closer to P. In the limit, as Q approaches P, the line L becomes the tangent line to E at P. Thus in order to add P to itself, we simply take L to be the tangent line to E at P, as illustrated in Fig. 6.3. Then L intersects E at P and at one other point R, so we can proceed as
Figure 6.3: Adding a point P to itself

before. In some sense, L still intersects E at three points, but P counts as two of them.
Example 6.2. Continuing with the curve E and point P from Example 6.1, we compute P ⊕ P. The slope of E at P is computed by implicitly differentiating equation (6.1). Thus

2Y · dY/dX = 3X^2 − 15,   so   dY/dX = (3X^2 − 15)/(2Y).

Substituting the coordinates of P = (7, 16) gives slope λ = 33/8, so the tangent line to E at P is given by the equation

L : Y = (33/8)X − 103/8.   (6.3)

Now we substitute (6.3) into Eq. (6.1) for E, simplify, and factor:

((33/8)X − 103/8)^2 = X^3 − 15X + 18,
X^3 − (1089/64)X^2 + (2919/32)X − 9457/64 = 0,
(X − 7)^2 · (X − 193/64) = 0.

Notice that the X-coordinate of P, which is X = 7, appears as a double root of the cubic polynomial, so it was easy for us to factor the cubic. Finally, we substitute X = 193/64 into Eq. (6.3) for L to get Y = −223/512, and then we switch the sign on Y to get

P ⊕ P = (193/64, 223/512).
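The computations in Examples 6.1 and 6.2 can be checked with exact rational arithmetic. The sketch below is our own helper (not from the book); it applies the chord and tangent slopes derived above, and assumes P is not the reflection of Q, so no vertical line occurs.

```python
from fractions import Fraction as Fr

def ec_add(P, Q, A):
    """Chord-and-tangent addition on Y^2 = X^3 + A*X + B for affine points,
    assuming P is not the reflection of Q (cf. Examples 6.1 and 6.2)."""
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        lam = (3 * x1 * x1 + A) / (2 * y1)  # slope of the tangent line at P
    else:
        lam = (y2 - y1) / (x2 - x1)         # slope of the chord through P and Q
    x3 = lam * lam - x1 - x2                # third root of the cubic
    y3 = lam * (x1 - x3) - y1               # then reflect across the X-axis
    return (x3, y3)

# E: Y^2 = X^3 - 15X + 18 with P = (7, 16) and Q = (1, 2)
P, Q = (Fr(7), Fr(16)), (Fr(1), Fr(2))
print(ec_add(P, Q, -15))   # P ⊕ Q = (-23/9, 170/27)
print(ec_add(P, P, -15))   # P ⊕ P = (193/64, 223/512)
```

Because Fraction arithmetic is exact, the two printed points agree with the book's answers with no rounding error; the same eight lines also serve over Fp once division is replaced by multiplication by a modular inverse.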
A second potential problem with our "addition law" arises if we try to add a point P = (a, b) to its reflection about the X-axis, P′ = (a, −b). The line L through P and P′ is the vertical line x = a, and this line intersects E in only the two points P and P′. (See Fig. 6.4.) There is no third point of intersection, so it appears that we are stuck! But there is a way out. The solution is to create an extra point O that lives "at infinity." More precisely, the point O does not exist in the XY-plane, but we pretend that it lies on every vertical line. We then set

P ⊕ P′ = O.

We also need to figure out how to add O to an ordinary point P = (a, b) on E. The line L connecting P to O is the vertical line through P, since O lies on vertical lines, and that vertical line intersects E at the points P, O, and P′ = (a, −b). To add P to O, we reflect P′ across the X-axis, which gets us back to P. In other words, P ⊕ O = P, so O acts like zero for elliptic curve addition.
Example 6.3. Continuing with the curve E from Example 6.1, notice that the point T = (3, 0) is on the curve E and that the tangent line to E at T is the vertical line X = 3. Thus if we add T to itself, we get T ⊕ T = O.
Definition. An elliptic curve E is the set of solutions to a Weierstrass equation

E : Y^2 = X^3 + AX + B,

together with an extra point O, where the constants A and B must satisfy

4A^3 + 27B^2 ≠ 0.

The addition law on E is defined as follows. Let P and Q be two points on E. Let L be the line connecting P and Q, or the tangent line to E at P if P = Q. Then the intersection of E and L consists of three points P, Q, and R, counted with appropriate multiplicities and with the understanding that O lies on every vertical line. Writing R = (a, b), the sum of P and Q is defined to be the reflection R′ = (a, −b) of R across the X-axis. This sum is denoted by P ⊕ Q, or simply by P + Q.
Further, if P = (a, b), we denote the reflected point by ⊖P = (a, −b), or simply by −P; and we define P ⊖ Q (or P − Q) to be P ⊕ (−Q). Similarly, repeated addition is represented as multiplication of a point by an integer,

nP = P + P + P + · · · + P   (n copies).

Remark 6.4. What is this extra condition 4A^3 + 27B^2 ≠ 0? The quantity ΔE = 4A^3 + 27B^2 is called the discriminant of E. The condition ΔE ≠ 0 is equivalent to the condition that the cubic polynomial X^3 + AX + B have no repeated roots, i.e., if we factor X^3 + AX + B completely as
Figure 6.4: The vertical line L through P = (a, b) and P′ = (a, −b)

X^3 + AX + B = (X − e1)(X − e2)(X − e3),

where e1, e2, e3 are allowed to be complex numbers, then 4A^3 + 27B^2 ≠ 0 if and only if e1, e2, e3 are distinct. (See Exercise 6.3.) Curves with ΔE = 0 have singular points (see Exercise 6.4). The addition law does not work well on these curves. That is why we include the requirement that ΔE ≠ 0 in our definition of an elliptic curve.
Theorem 6.5. Let E be an elliptic curve. Then the addition law on E has the following properties:
(a) P + O = O + P = P for all P ∈ E. [Identity]
(b) P + (−P) = O for all P ∈ E. [Inverse]
(c) (P + Q) + R = P + (Q + R) for all P, Q, R ∈ E. [Associative]
(d) P + Q = Q + P for all P, Q ∈ E. [Commutative]
In other words, the addition law makes the points of E into an abelian group. (See Sect. 2.5 for a general discussion of groups and their axioms.)
Proof. As we explained earlier, the identity law (a) and inverse law (b) are true because O lies on all vertical lines. The commutative law (d) is easy to verify, since the line that goes through P and Q is the same as the line that goes through Q and P, so the order of the points does not matter. The remaining piece of Theorem 6.5 is the associative law (c). One might not think that this would be hard to prove, but if you draw a picture and start to put in all of the lines needed to verify (c), you will see that it is quite
complicated. There are many ways to prove the associative law, but none of the proofs are easy. After we develop explicit formulas for the addition law on E (Theorem 6.6), you can use those formulas to check the associative law by a direct (but painful) calculation. More perspicacious, but less elementary, proofs may be found in [74, 136, 138] and other books on elliptic curves.
Our next task is to find explicit formulas to enable us to easily add and subtract points on an elliptic curve. The derivation of these formulas uses elementary analytic geometry, a little bit of differential calculus to find a tangent line, and a certain amount of algebraic manipulation. We state the results in the form of an algorithm, and then briefly indicate the proof.
Theorem 6.6 (Elliptic Curve Addition Algorithm). Let E : Y^2 = X^3 + AX + B be an elliptic curve and let P1 and P2 be points on E.
(a) If P1 = O, then P1 + P2 = P2.
(b) Otherwise, if P2 = O, then P1 + P2 = P1.
(c) Otherwise, write P1 = (x1, y1) and P2 = (x2, y2).
(d) If x1 = x2 and y1 = −y2, then P1 + P2 = O.
(e) Otherwise, define λ by

λ = (y2 − y1)/(x2 − x1)   if P1 ≠ P2,
λ = (3x1^2 + A)/(2y1)     if P1 = P2,

and let x3 = λ^2 − x1 − x2 and y3 = λ(x1 − x3) − y1. Then P1 + P2 = (x3, y3).
Proof. Parts (a) and (b) are clear, and (d) is the case that the line through P1 and P2 is vertical, so P1 + P2 = O. (Note that if y1 = y2 = 0, then the tangent line is vertical, so that case works, too.) For (e), we note that if P1 ≠ P2, then λ is the slope of the line through P1 and P2, and if P1 = P2, then λ is the slope of the tangent line at P1 = P2. In either case the line L is given by the equation Y = λX + ν with ν = y1 − λx1. Substituting the equation for L into the equation for E gives

(λX + ν)^2 = X^3 + AX + B,   so   X^3 − λ^2 X^2 + (A − 2λν)X + (B − ν^2) = 0.
We know that this cubic has x1 and x2 as two of its roots. If we call the third root x3, then it factors as

X^3 − λ^2 X^2 + (A − 2λν)X + (B − ν^2) = (X − x1)(X − x2)(X − x3).

Now multiply out the right-hand side and look at the coefficient of X^2 on each side. The coefficient of X^2 on the right-hand side is −x1 − x2 − x3, which must equal −λ^2, the coefficient of X^2 on the left-hand side. This allows us to solve for x3 = λ^2 − x1 − x2, and then the Y-coordinate of the third intersection point of E and L is given by λx3 + ν. Finally, in order to get P1 + P2, we must reflect across the X-axis, which means replacing the Y-coordinate with its negative.
6.2 Elliptic Curves over Finite Fields
In the previous section we developed the theory of elliptic curves geometrically. For example, the sum of two distinct points P and Q on an elliptic curve E is defined by drawing the line L connecting P to Q and then finding the third point where L and E intersect, as illustrated in Fig. 6.2. However, in order to apply the theory of elliptic curves to cryptography, we need to look at elliptic curves whose points have coordinates in a finite field Fp. This is easy to do.
Definition. Let p ≥ 3 be a prime. An elliptic curve over Fp is an equation of the form

E : Y^2 = X^3 + AX + B   with A, B ∈ Fp satisfying 4A^3 + 27B^2 ≠ 0.

The set of points on E with coordinates in Fp is the set

E(Fp) = {(x, y) : x, y ∈ Fp satisfy y^2 = x^3 + Ax + B} ∪ {O}.

Remark 6.7. Elliptic curves over F2 are actually quite important in cryptography, but they require more complicated equations, so we delay our discussion of them until Sect. 6.7.
Example 6.8. Consider the elliptic curve

E : Y^2 = X^3 + 3X + 8

over the field F13. We can find the points of E(F13) by substituting in all possible values X = 0, 1, 2, . . . , 12 and checking for which X values the quantity X^3 + 3X + 8 is a square modulo 13. For example, putting X = 0 gives 8, and 8 is not a square modulo 13.
Next we try X = 1, which gives 1 + 3 + 8 = 12. It turns out that 12 is a square modulo 13; in fact, it has two square roots, since 5^2 ≡ 12 (mod 13) and 8^2 ≡ 12 (mod 13). This gives two points (1, 5) and (1, 8) in E(F13). Continuing in this fashion, we end up with a complete list,
    E(F13) = {O, (1, 5), (1, 8), (2, 3), (2, 10), (9, 6), (9, 7), (12, 2), (12, 11)}.

Thus E(F13) consists of nine points.

Suppose now that P and Q are two points in E(Fp) and that we want to "add" the points P and Q. One possibility is to develop a theory of geometry using the field Fp instead of R. Then we could mimic our earlier constructions to define P + Q. This can be done, and it leads to a fascinating field of mathematics called algebraic geometry. However, in the interests of brevity of exposition, we instead use the explicit formulas given in Theorem 6.6 to add points in E(Fp). But we note that if one wants to gain a deeper understanding of the theory of elliptic curves, then it is necessary to use some of the machinery and some of the formalism of algebraic geometry.

Let P = (x1, y1) and Q = (x2, y2) be points in E(Fp). We define the sum P + Q to be the point (x3, y3) obtained by applying the elliptic curve addition algorithm (Theorem 6.6). Notice that in this algorithm, the only operations used are addition, subtraction, multiplication, and division involving the coefficients of E and the coordinates of P and Q. Since those coefficients and coordinates are in the field Fp, we end up with a point (x3, y3) whose coordinates are in Fp. Of course, it is not completely clear that (x3, y3) is a point in E(Fp).

Theorem 6.9. Let E be an elliptic curve over Fp and let P and Q be points in E(Fp).
(a) The elliptic curve addition algorithm (Theorem 6.6) applied to P and Q yields a point in E(Fp). We denote this point by P + Q.
(b) This addition law on E(Fp) satisfies all of the properties listed in Theorem 6.5. In other words, this addition law makes E(Fp) into a finite group.

Proof.
The formulas in Theorem 6.6(e) are derived by substituting the equation of a line into the equation for E and solving for X, so the resulting point is automatically a point on E, i.e., it is a solution to the equation defining E. This shows why (a) is true, although when P = Q, a small additional argument is needed to indicate why the resulting cubic polynomial has a double root. For (b), the identity law follows from the addition algorithm Steps (a) and (b), the inverse law is clear from the addition algorithm Step (d), and the commutative law is easy, since a brief examination of the addition algorithm shows that switching the two points leads to the same result. Unfortunately, the associative law is not so clear. It is possible to verify the associative law directly using the addition algorithm formulas, although there are many special cases to consider. The alternative is to develop more of the general theory of elliptic curves, as is done in the references cited in the proof of Theorem 6.5.

Example 6.10. We continue with the elliptic curve

    E : Y^2 = X^3 + 3X + 8   over F13
from Example 6.8, and we use the addition algorithm (Theorem 6.6) to add the points P = (9, 7) and Q = (1, 8) in E(F13). Step (e) of that algorithm tells us to first compute

    λ = (y2 − y1)/(x2 − x1) = (8 − 7)/(1 − 9) = 1/(−8) = 1/5 = 8,

where recall that all computations [Footnote 5] are being performed in the field F13, so −8 = 5 and 1/5 = 5^(−1) = 8. Next we compute

    ν = y1 − λx1 = 7 − 8 · 9 = −65 = 0.

Finally, the addition algorithm tells us to compute

    x3 = λ^2 − x1 − x2 = 64 − 9 − 1 = 54 = 2,
    y3 = −(λx3 + ν) = −8 · 2 = −16 = 10.

This completes the computation of

    P + Q = (1, 8) + (9, 7) = (2, 10)   in E(F13).

Similarly, we can use the addition algorithm to add P = (9, 7) to itself. Keeping in mind that all calculations are in F13, we find that

    λ = (3·x1^2 + A)/(2·y1) = (3 · 9^2 + 3)/(2 · 7) = 246/14 = 12   and   ν = y1 − λx1 = 7 − 12 · 9 = 3.

Then x3 = λ^2 − x1 − x2 = 12^2 − 9 − 9 = 9 and y3 = −(λx3 + ν) = −(12 · 9 + 3) = 6, so

    P + P = (9, 7) + (9, 7) = (9, 6)   in E(F13).

In a similar fashion, we can compute the sum of every pair of points in E(F13). The results are listed in Table 6.1.

It is clear that the set of points E(Fp) is a finite set, since there are only finitely many possibilities for the X- and Y-coordinates. More precisely, there are p possibilities for X, and then for each X, the equation Y^2 = X^3 + AX + B shows that there are at most two possibilities for Y. (See Exercise 1.36.) Adding in the extra point O, this shows that E(Fp) contains at most 2p + 1 points. However, this estimate is considerably larger than the true size.

Footnote 5: This is a good time to learn that 1/5 is a symbol for a solution to the equation 5x = 1. In order to assign a value to the symbol 1/5, you must know where that value lives. In Q, the value of 1/5 is the usual number with which you are familiar, but in F13 the value of 1/5 is 8, while in F11 the value of 1/5 is 9. And in F5 the symbol 1/5 is not assigned a value.
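The point search of Example 6.8 is a two-line loop, and it doubles as a check on the computations of Example 6.10. The sketch below (our own helper names) enumerates E(F13) exactly as the text describes, by testing every X value.

```python
# Reproduce Example 6.8: list the points of E: Y^2 = X^3 + 3X + 8 over F_13
# by trying every X and every Y.  "None" denotes the point at infinity O.

p, A, B = 13, 3, 8

points = [None]                       # start with O
for x in range(p):
    rhs = (x**3 + A * x + B) % p      # the quantity X^3 + 3X + 8 mod 13
    for y in range(p):
        if (y * y) % p == rhs:        # y is a square root of the right-hand side
            points.append((x, y))

# Footnote 5 in action: the symbol 1/5 depends on where it lives.
# In F_13, pow(5, -1, 13) == 8; in F_11, pow(5, -1, 11) == 9.
print(len(points))                    # 9 points, matching the list in the text
print(sorted(pt for pt in points if pt is not None))
```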
    +        | O        (1,5)    (1,8)    (2,3)    (2,10)   (9,6)    (9,7)    (12,2)   (12,11)
    ---------+--------------------------------------------------------------------------------
    O        | O        (1,5)    (1,8)    (2,3)    (2,10)   (9,6)    (9,7)    (12,2)   (12,11)
    (1,5)    | (1,5)    (2,10)   O        (1,8)    (9,7)    (2,3)    (12,2)   (12,11)  (9,6)
    (1,8)    | (1,8)    O        (2,3)    (9,6)    (1,5)    (12,11)  (2,10)   (9,7)    (12,2)
    (2,3)    | (2,3)    (1,8)    (9,6)    (12,11)  O        (12,2)   (1,5)    (2,10)   (9,7)
    (2,10)   | (2,10)   (9,7)    (1,5)    O        (12,2)   (1,8)    (12,11)  (9,6)    (2,3)
    (9,6)    | (9,6)    (2,3)    (12,11)  (12,2)   (1,8)    (9,7)    O        (1,5)    (2,10)
    (9,7)    | (9,7)    (12,2)   (2,10)   (1,5)    (12,11)  O        (9,6)    (2,3)    (1,8)
    (12,2)   | (12,2)   (12,11)  (9,7)    (2,10)   (9,6)    (1,5)    (2,3)    (1,8)    O
    (12,11)  | (12,11)  (9,6)    (12,2)   (9,7)    (2,3)    (2,10)   (1,8)    O        (1,5)

Table 6.1: Addition table for E : Y^2 = X^3 + 3X + 8 over F13

When we plug in a value for X, there are three possibilities for the value of the quantity X^3 + AX + B. First, it may be a quadratic residue modulo p, in which case it has two square roots and we get two points in E(Fp). This happens about 50 % of the time. Second, it may be a nonresidue modulo p, in which case we discard X. This also happens about 50 % of the time. Third, it might equal 0, in which case we get one point in E(Fp), but this case happens very rarely. [Footnote 6] Thus we might expect that the number of points in E(Fp) is approximately

    #E(Fp) ≈ 50 % · 2 · p + 1 = p + 1.

A famous theorem of Hasse, later vastly generalized by Weil and Deligne, says that this is true up to random fluctuations.

Theorem 6.11 (Hasse). Let E be an elliptic curve over Fp. Then

    #E(Fp) = p + 1 − tp   with tp satisfying |tp| ≤ 2√p.

Definition. The quantity tp = p + 1 − #E(Fp) appearing in Theorem 6.11 is called the trace of Frobenius for E/Fp. We will not explain the somewhat technical reasons for this name, other than to say that tp appears as the trace of a certain 2-by-2 matrix that acts as a linear transformation on a certain two-dimensional vector space associated to E/Fp.

Example 6.12.
Let E be given by the equation

    E : Y^2 = X^3 + 4X + 6.

We can think of E as an elliptic curve over Fp for different finite fields Fp and count the number of points in E(Fp). Table 6.2 lists the results for the first few primes, together with the value of tp and, for comparison purposes, the value of 2√p.

Footnote 6: The congruence X^3 + AX + B ≡ 0 (mod p) has at most three solutions, and if p is large, the chance of randomly choosing one of them is very small.
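For primes this small, the counts in Table 6.2 can be recomputed by the brute-force substitution the text describes. A short sketch (our own function name; real point counting uses the SEA or Satoh algorithms mentioned in Remark 6.13):

```python
# Recompute Table 6.2: #E(F_p) and the trace of Frobenius t_p = p + 1 - #E(F_p)
# for E: Y^2 = X^3 + 4X + 6, by testing every (x, y) pair.  Feasible only for tiny p.

def count_points(A, B, p):
    total = 1                          # count the point O
    for x in range(p):
        rhs = (x**3 + A * x + B) % p
        total += sum(1 for y in range(p) if (y * y) % p == rhs)
    return total

for p in [3, 5, 7, 11, 13, 17]:
    N = count_points(4, 6, p)
    print(p, N, p + 1 - N)             # columns p, #E(F_p), t_p of Table 6.2
```

Note how each t_p stays within the Hasse bound |t_p| ≤ 2√p.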
    p    #E(Fp)   tp    2√p
    3       4      0    3.46
    5       8     −2    4.47
    7      11     −3    5.29
    11     16     −4    6.63
    13     14      0    7.21
    17     15      3    8.25

Table 6.2: Number of points and trace of Frobenius for E : Y^2 = X^3 + 4X + 6

Remark 6.13. Hasse's theorem (Theorem 6.11) gives a bound for #E(Fp), but it does not provide a method for calculating this quantity. In principle, one can substitute in each value for X and check the value of X^3 + AX + B against a table of squares modulo p, but this takes time O(p), so is very inefficient. Schoof [120] found an algorithm to compute #E(Fp) in time O((log p)^6), i.e., he found a polynomial-time algorithm. Schoof's algorithm was improved and made practical by Elkies and Atkin, so it is now known as the SEA algorithm. We will not describe SEA, which uses advanced techniques from the theory of elliptic curves, but see [121]. Also see Remark 6.32 in Sect. 6.7 for another counting algorithm due to Satoh that is designed for a different type of finite field.

6.3 The Elliptic Curve Discrete Logarithm Problem (ECDLP)

In Chap. 2 we talked about the discrete logarithm problem (DLP) in the finite field Fp*. In order to create a cryptosystem based on the DLP for Fp*, Alice publishes two numbers g and h, and her secret is the exponent x that solves the congruence h ≡ g^x (mod p). Let's consider how Alice can do something similar with an elliptic curve E over Fp.

If Alice views g and h as being elements of the group Fp*, then the discrete logarithm problem requires Alice's adversary Eve to find an x such that

    h ≡ g · g · g ··· g (mod p),   with x multiplications of g.

In other words, Eve needs to determine how many times g must be multiplied by itself in order to get to h. With this formulation, it is clear that Alice can do the same thing with the group of points E(Fp) of an elliptic curve E over a finite field Fp. She chooses
and publishes two points P and Q in E(Fp), and her secret is an integer n that makes

    Q = P + P + P + ··· + P (n additions on E) = nP.

Then Eve needs to find out how many times P must be added to itself in order to get Q. Keep in mind that although the "addition law" on an elliptic curve is conventionally written with a plus sign, addition on E is actually a very complicated operation, so this elliptic analogue of the discrete logarithm problem may be quite difficult to solve.

Definition. Let E be an elliptic curve over the finite field Fp and let P and Q be points in E(Fp). The Elliptic Curve Discrete Logarithm Problem (ECDLP) is the problem of finding an integer n such that Q = nP. By analogy with the discrete logarithm problem for Fp*, we denote this integer n by

    n = logP(Q)

and we call n the elliptic discrete logarithm of Q with respect to P.

Remark 6.14. Our definition of logP(Q) is not quite precise. The first difficulty is that there may be points P, Q ∈ E(Fp) such that Q is not a multiple of P. In this case, logP(Q) is not defined. However, for cryptographic purposes, Alice starts out with a public point P and a private integer n and she computes and publishes the value of Q = nP. So in practical applications, logP(Q) exists and its value is Alice's secret.

The second difficulty is that if there is one value of n satisfying Q = nP, then there are many such values. To see this, we first note that there exists a positive integer s such that sP = O. We recall the easy proof of this fact (cf. Proposition 2.12). Since E(Fp) is finite, the points in the list P, 2P, 3P, 4P, . . . cannot all be distinct. Hence there are integers k > j such that kP = jP, and we can take s = k − j. The smallest such s ≥ 1 is called the order of P. (Proposition 2.13 tells us that the order of P divides #E(Fp).)
Thus if s is the order of P and if n0 is any integer such that Q = n0P, then the solutions to Q = nP are the integers n = n0 + is with i ∈ Z. (See Exercise 6.9.) This means that the value of logP(Q) is really an element of Z/sZ, i.e., logP(Q) is an integer modulo s, where s is the order of P. For concreteness we could set logP(Q) equal to n0. However the advantage of defining the values to be in Z/sZ is that the elliptic discrete logarithm then satisfies

    logP(Q1 + Q2) = logP(Q1) + logP(Q2)   for all Q1, Q2 ∈ E(Fp).   (6.4)

Notice the analogy with the ordinary logarithm log(αβ) = log(α) + log(β) and the discrete logarithm for Fp* (cf. Remark 2.2). The fact that the discrete logarithm for E(Fp) satisfies (6.4) means that it respects the addition law when the group E(Fp) is mapped to the group Z/sZ. We say that the map logP defines a group homomorphism (cf. Exercise 2.13)

    logP : E(Fp) → Z/sZ.
Example 6.15. Consider the elliptic curve

    E : Y^2 = X^3 + 8X + 7   over F73.

The points P = (32, 53) and Q = (39, 17) are both in E(F73), and it is easy to verify (by hand if you're patient and with a computer if not) that Q = 11P, so logP(Q) = 11. Similarly, R = (35, 47) ∈ E(F73) and S = (58, 4) ∈ E(F73), and after some computation we find that they satisfy R = 37P and S = 28P, so logP(R) = 37 and logP(S) = 28. Finally, we mention that #E(F73) = 82, but P satisfies 41P = O. Thus P has order 41 = 82/2, so only half of the points in E(F73) are multiples of P. For example, (20, 65) is in E(F73), but it is not a multiple of P.

6.3.1 The Double-and-Add Algorithm

It appears to be quite difficult to recover the value of n from the two points P and Q = nP in E(Fp), i.e., it is difficult to solve the ECDLP. We will say more about the difficulty of the ECDLP in later sections. However, in order to use the function

    Z → E(Fp),   n ↦ nP,

for cryptography, we need to efficiently compute nP from the known values n and P. If n is large, we certainly do not want to compute nP by computing P, 2P, 3P, 4P, . . . . The most efficient way to compute nP is very similar to the method that we described in Sect. 1.3.2 for computing powers a^n (mod N), which we needed for Diffie–Hellman key exchange (Sect. 2.3) and for the Elgamal and RSA public key cryptosystems (Sects. 2.4 and 3.2). However, since the operation on an elliptic curve is written as addition instead of as multiplication, we call it "double-and-add" instead of "square-and-multiply." The underlying idea is the same as before. We first write n in binary form as

    n = n0 + n1 · 2 + n2 · 4 + n3 · 8 + ··· + nr · 2^r   with n0, n1, . . . , nr ∈ {0, 1}.

(We also assume that nr = 1.) Next we compute the following quantities:

    Q0 = P,  Q1 = 2Q0,  Q2 = 2Q1,  . . . ,  Qr = 2Qr−1.

Notice that Qi is simply twice the previous Qi−1, so Qi = 2^i P.
Input. Point P ∈ E(Fp) and integer n ≥ 1.
1. Set Q = P and R = O.
2. Loop while n > 0.
3.   If n ≡ 1 (mod 2), set R = R + Q.
4.   Set Q = 2Q and n = ⌊n/2⌋.
5.   If n > 0, continue with loop at Step 2.
6. Return the point R, which equals nP.

Table 6.3: The double-and-add algorithm for elliptic curves

These points are referred to as 2-power multiples of P, and computing them requires r doublings. Finally, we compute nP using at most r additional additions,

    nP = n0Q0 + n1Q1 + n2Q2 + ··· + nrQr.

We'll refer to the addition of two points in E(Fp) as a point operation. Thus the total time to compute nP is at most 2r point operations in E(Fp). Notice that n ≥ 2^r, so it takes no more than 2 log2(n) point operations to compute nP. This makes it feasible to compute nP even for very large values of n. We have summarized the double-and-add algorithm in Table 6.3.

Example 6.16. We use the double-and-add algorithm as described in Table 6.3 to compute nP in E(Fp) for

    n = 947,  E : Y^2 = X^3 + 14X + 19,  p = 3623,  P = (6, 730).

The binary expansion of n is

    n = 947 = 1 + 2 + 2^4 + 2^5 + 2^7 + 2^8 + 2^9.

The step-by-step calculation, which requires nine doublings and six additions, is given in Table 6.4. The final result is 947P = (3492, 60). (The n column in Table 6.4 refers to the n used in the algorithm described in Table 6.3.)

Remark 6.17. There is an additional technique that can be used to further reduce the time required to compute nP. The idea is to write n using sums and differences of powers of 2. This is advantageous because there are generally fewer terms, so fewer point additions are needed to compute nP. It is important to observe that subtracting two points on an elliptic curve is as easy as adding them, since −(x, y) = (x, −y). This is rather different from Fp*, where computing a^(−1) takes significantly more time than it takes to multiply two elements. An example will help to illustrate the idea.
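The loop of Table 6.3 is only a few lines of code once point addition is available. The sketch below (our own helper names) reruns Example 6.16; the self-contained `ec_add` is the algorithm of Theorem 6.6.

```python
# Double-and-add (Table 6.3) on E: Y^2 = X^3 + 14X + 19 over F_3623,
# recomputing Example 6.16: 947 * (6, 730).  "None" denotes O.

def ec_add(P1, P2, A, p):
    if P1 is None: return P2
    if P2 is None: return P1
    x1, y1 = P1; x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    lam = ((3 * x1 * x1 + A) * pow(2 * y1, -1, p) if P1 == P2
           else (y2 - y1) * pow(x2 - x1, -1, p)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def double_and_add(n, P, A, p):
    Q, R = P, None                    # Step 1: Q = P, R = O
    while n > 0:                      # Step 2
        if n % 2 == 1:                # Step 3: low bit of n is set
            R = ec_add(R, Q, A, p)
        Q = ec_add(Q, Q, A, p)        # Step 4: double Q and halve n
        n //= 2
    return R                          # Step 6: R = nP

print(double_and_add(947, (6, 730), 14, 3623))
```

The successive values of Q and R in the loop are exactly the columns of Table 6.4.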
We saw in Example 6.16 that

    947 = 1 + 2 + 2^4 + 2^5 + 2^7 + 2^8 + 2^9,

so it takes 15 point operations (9 doublings and 6 additions) to compute 947P. But if we instead write

    947 = 1 + 2 − 2^4 − 2^6 + 2^10,
    Step i     n     Q = 2^i P       R
      0       947    (6, 730)        O
      1       473    (2521, 3601)    (6, 730)
      2       236    (2277, 502)     (2149, 196)
      3       118    (3375, 535)     (2149, 196)
      4        59    (1610, 1851)    (2149, 196)
      5        29    (1753, 2436)    (2838, 2175)
      6        14    (2005, 1764)    (600, 2449)
      7         7    (2425, 1791)    (600, 2449)
      8         3    (3529, 2158)    (3247, 2849)
      9         1    (2742, 3254)    (932, 1204)
     10         0    (1814, 3480)    (3492, 60)

Table 6.4: Computing 947 · (6, 730) on Y^2 = X^3 + 14X + 19 modulo 3623

then we can compute

    947P = P + 2P − 2^4 P − 2^6 P + 2^10 P

using 10 doublings and 4 additions, for a total of 14 point operations. Writing a number n as a sum of positive and negative powers of 2 is called a ternary expansion of n.

How much savings can we expect? Suppose that n is a large number and let k = ⌊log2 n⌋ + 1. In the worst case, if n has the form 2^k − 1, then computing nP using a binary expansion of n requires 2k point operations (k doublings and k additions), since

    2^k − 1 = 1 + 2 + 2^2 + ··· + 2^(k−1).

But if we allow ternary expansions, then we prove below (Proposition 6.18) that computing nP never requires more than (3/2)k + 1 point operations (k + 1 doublings and (1/2)k additions).

This is the worst case scenario, but it's also important to know what happens on average. The binary expansion of a random number has approximately the same number of 1's and 0's, so for most n, computing nP using the binary expansion of n takes about (3/2)k steps (k doublings and (1/2)k additions). But if we allow sums and differences of powers of 2, then one can show that most n have an expansion with 2/3 of the terms being 0. So for most n, we can compute nP in about (4/3)k + 1 steps (k + 1 doublings and (1/3)k additions).

Proposition 6.18. Let n be a positive integer and let k = ⌊log2 n⌋ + 1, which means that 2^k > n. Then we can always write

    n = u0 + u1 · 2 + u2 · 4 + u3 · 8 + ··· + uk · 2^k   (6.5)

with u0, u1, . . . , uk ∈ {−1, 0, 1} and at most (1/2)k of the ui nonzero.
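An expansion of the kind guaranteed by Proposition 6.18 can be computed with the standard "non-adjacent form" (NAF) recurrence, a close relative of the rewriting procedure in the proof below. One caveat: such expansions are not unique, so the NAF of 947 differs digit-by-digit from the expansion 1 + 2 − 2^4 − 2^6 + 2^10 displayed in the text, although it too has exactly five nonzero terms.

```python
# Signed-digit (NAF) expansion: digits u_i in {-1, 0, 1}, no two consecutive
# digits nonzero.  Sketch code; the function name is ours.

def naf(n):
    """Return digits u_i, least significant first, with n = sum of u_i * 2^i."""
    digits = []
    while n > 0:
        if n % 2 == 1:
            d = 2 - (n % 4)      # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d               # makes n divisible by 4, forcing a 0 digit next
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

d = naf(947)
print(d)
print(sum(u * 2**i for i, u in enumerate(d)))
```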
Proof. The proof is essentially an algorithm for writing n in the desired form. We start by writing n in binary,

    n = n0 + n1 · 2 + n2 · 4 + ··· + nk−1 · 2^(k−1)   with n0, . . . , nk−1 ∈ {0, 1}.

Working from left to right, we look for the first occurrence of two or more consecutive nonzero ni coefficients. For example, suppose that

    ns = ns+1 = ··· = ns+t−1 = 1   and   ns+t = 0   for some t ≥ 2.

In other words, the quantity

    2^s + 2^(s+1) + ··· + 2^(s+t−1) + 0 · 2^(s+t)   (6.6)

appears in the binary expansion of n. We observe that

    2^s + 2^(s+1) + ··· + 2^(s+t−1) + 0 · 2^(s+t) = 2^s (1 + 2 + 4 + ··· + 2^(t−1)) = 2^s (2^t − 1),

so we can replace (6.6) with −2^s + 2^(s+t). Repeating this procedure, we end up with an expansion of n of the form (6.5) in which no two consecutive ui are nonzero. (Note that although the original binary expansion went up to only 2^(k−1), the new expansion might go up to 2^k.)

6.3.2 How Hard Is the ECDLP?

The collision algorithms described in Sect. 5.4 are easily adapted to any group, for example to the group of points E(Fp) on an elliptic curve. In order to solve Q = nP, Eve chooses random integers j1, . . . , jr and k1, . . . , kr between 1 and p and makes two lists of points:

    List #1.  j1P, j2P, j3P, . . . , jrP,
    List #2.  k1P + Q, k2P + Q, k3P + Q, . . . , krP + Q.

As soon as she finds a match (collision) between the two lists, she is done, since if she finds juP = kvP + Q, then Q = (ju − kv)P provides the solution. As we saw in Sect. 5.4, if r is somewhat larger than √p, say r ≈ 3√p, then there is a very good chance that there will be a collision.

This naive collision algorithm requires quite a lot of storage for the two lists. However, it is not hard to adapt Pollard's ρ method from Sect. 5.5 to devise a storage-free collision algorithm with a similar running time. (See Exercise 6.13.) In any case, there are certainly algorithms that solve the ECDLP for E(Fp) in O(√p) steps.
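The collision idea is easy to run on a toy curve. The sketch below uses the baby-step giant-step variant of the two-list search (function names ours) to recover logP(Q) = 11 on the curve of Example 6.15, where P = (32, 53) has order 41 in E(F73). On real curves the ~√p cost of building the lists is exactly what makes this attack infeasible.

```python
# A small collision (baby-step / giant-step) ECDLP solver for Q = nP on
# E: Y^2 = X^3 + 8X + 7 over F_73.  Toy-sized sketch only.

def ec_add(P1, P2, A, p):
    if P1 is None: return P2
    if P2 is None: return P1
    x1, y1 = P1; x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    lam = ((3 * x1 * x1 + A) * pow(2 * y1, -1, p) if P1 == P2
           else (y2 - y1) * pow(x2 - x1, -1, p)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ecdlp_bsgs(P, Q, order, A, p):
    m = int(order ** 0.5) + 1
    baby = {}                          # list #1: the points jP for j = 0..m-1
    R = None
    for j in range(m):
        baby[R] = j
        R = ec_add(R, P, A, p)
    mP = R                             # after the loop, R = mP
    neg_mP = None if mP is None else (mP[0], (-mP[1]) % p)
    S = Q                              # list #2: the points Q - i*(mP)
    for i in range(m + 1):
        if S in baby:                  # collision: Q - i*mP = jP, so n = i*m + j
            return (i * m + baby[S]) % order
        S = ec_add(S, neg_mP, A, p)
    return None

print(ecdlp_bsgs((32, 53), (39, 17), 41, 8, 73))
```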
We have seen that there are much faster ways to solve the discrete logarithm problem for Fp*. In particular, the index calculus described in Sect. 3.8
has a subexponential running time, i.e., the running time is O(p^ε) for every ε > 0. The principal reason that elliptic curves are used in cryptography is the fact that there are no index calculus algorithms known for the ECDLP, and indeed, there are no general algorithms known that solve the ECDLP in fewer than O(√p) steps. In other words, despite the highly structured nature of the group E(Fp), the fastest known algorithms to solve the ECDLP are no better than the generic algorithm that works equally well to solve the discrete logarithm problem in any group. This fact is sufficiently important that it bears highlighting.

    The fastest known algorithm to solve the ECDLP in E(Fp) takes approximately √p steps.

Thus the ECDLP appears to be much more difficult than the DLP. Recall, however, that there are some primes p for which the DLP in Fp* is comparatively easy. For example, if p − 1 is a product of small primes, then the Pohlig–Hellman algorithm (Theorem 2.31) gives a quick solution to the DLP in Fp*. In a similar fashion, there are some elliptic curves and some primes for which the ECDLP in E(Fp) is comparatively easy. We discuss some of these special cases, which must be avoided in the construction of secure cryptosystems, in Sect. 6.9.1.

6.4 Elliptic Curve Cryptography

It is finally time to apply elliptic curves to cryptography. We start with the easiest application, Diffie–Hellman key exchange, which involves little more than replacing the discrete logarithm problem for the finite field Fp with the discrete logarithm problem for an elliptic curve E(Fp). We then describe elliptic analogues of the Elgamal public key cryptosystem and the digital signature algorithm (DSA).

6.4.1 Elliptic Diffie–Hellman Key Exchange

Alice and Bob agree to use a particular elliptic curve E(Fp) and a particular point P ∈ E(Fp). Alice chooses a secret integer nA and Bob chooses a secret integer nB.
They compute the associated multiples

    QA = nAP (computed by Alice)   and   QB = nBP (computed by Bob),

and they exchange the values of QA and QB. Alice then uses her secret multiplier to compute nAQB, and Bob similarly computes nBQA. They now have the shared secret value
Public parameter creation
    A trusted party chooses and publishes a (large) prime p, an elliptic curve E over Fp, and a point P in E(Fp).
Private computations
    Alice: Chooses a secret integer nA. Computes the point QA = nAP.
    Bob: Chooses a secret integer nB. Computes the point QB = nBP.
Public exchange of values
    Alice sends QA to Bob; Bob sends QB to Alice.
Further private computations
    Alice: Computes the point nAQB.
    Bob: Computes the point nBQA.
    The shared secret value is nAQB = nA(nBP) = nB(nAP) = nBQA.

Table 6.5: Diffie–Hellman key exchange using elliptic curves

    nAQB = (nAnB)P = nBQA,

which they can use as a key to communicate privately via a symmetric cipher. Table 6.5 summarizes elliptic Diffie–Hellman key exchange.

Example 6.19. Alice and Bob decide to use elliptic Diffie–Hellman with the following prime, curve, and point:

    p = 3851,  E : Y^2 = X^3 + 324X + 1287,  P = (920, 303) ∈ E(F3851).

Alice and Bob choose respective secret values nA = 1194 and nB = 1759, and then

    Alice computes QA = 1194P = (2067, 2178) ∈ E(F3851),
    Bob computes QB = 1759P = (3684, 3125) ∈ E(F3851).

Alice sends QA to Bob and Bob sends QB to Alice. Finally,

    Alice computes nAQB = 1194(3684, 3125) = (3347, 1242) ∈ E(F3851),
    Bob computes nBQA = 1759(2067, 2178) = (3347, 1242) ∈ E(F3851).

Bob and Alice have exchanged the secret point (3347, 1242). As will be explained in Remark 6.20, they should discard the y-coordinate and treat only the value x = 3347 as a secret shared value.

One way for Eve to discover Alice and Bob's secret is to solve the ECDLP

    nP = QA,
since if Eve can solve this problem, then she knows nA and can use it to compute nAQB. Of course, there might be some other way for Eve to compute their secret without actually solving the ECDLP. The precise problem that Eve needs to solve is the elliptic analogue of the Diffie–Hellman problem described on page 69.

Definition. Let E(Fp) be an elliptic curve over a finite field and let P ∈ E(Fp). The Elliptic Curve Diffie–Hellman Problem is the problem of computing the value of n1n2P from the known values of n1P and n2P.

Remark 6.20. Elliptic Diffie–Hellman key exchange requires Alice and Bob to exchange points on an elliptic curve. A point Q in E(Fp) consists of two coordinates Q = (xQ, yQ), where xQ and yQ are elements of the finite field Fp, so it appears that Alice must send Bob two numbers in Fp. However, those two numbers modulo p do not contain as much information as two arbitrary numbers, since they are related by the formula

    yQ^2 = xQ^3 + A·xQ + B   in Fp.

Note that Eve knows A and B, so if she can guess the correct value of xQ, then there are only two possible values for yQ, and in practice it is not too hard for her to actually compute the two values of yQ. There is thus little reason for Alice to send both coordinates of QA to Bob, since the y-coordinate contains so little additional information. Instead, she sends Bob only the x-coordinate of QA. Bob then computes and uses one of the two possible y-coordinates. If he happens to choose the "correct" y, then he is using QA, and if he chooses the "incorrect" y (which is the negative of the correct y), then he is using −QA. In any case, Bob ends up computing one of

    ±nBQA = ±(nAnB)P.

Similarly, Alice ends up computing one of ±(nAnB)P. Then Alice and Bob use the x-coordinate as their shared secret value, since that x-coordinate is the same regardless of which y they use.

Example 6.21.
Alice and Bob decide to exchange another secret value using the same public parameters as in Example 6.19:

    p = 3851,  E : Y^2 = X^3 + 324X + 1287,  P = (920, 303) ∈ E(F3851).
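With these public parameters in hand, the full exchange of Example 6.19 can be replayed in a few lines. The following is a sketch with our own helper names; the curve arithmetic is the addition algorithm of Theorem 6.6 combined with double-and-add.

```python
# Elliptic Diffie-Hellman on E: Y^2 = X^3 + 324X + 1287 over F_3851,
# replaying Example 6.19 (secrets nA = 1194, nB = 1759).  "None" denotes O.

def ec_add(P1, P2, A, p):
    if P1 is None: return P2
    if P2 is None: return P1
    x1, y1 = P1; x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    lam = ((3 * x1 * x1 + A) * pow(2 * y1, -1, p) if P1 == P2
           else (y2 - y1) * pow(x2 - x1, -1, p)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(n, P, A, p):                   # double-and-add (Table 6.3)
    R = None
    while n > 0:
        if n % 2 == 1:
            R = ec_add(R, P, A, p)
        P = ec_add(P, P, A, p)
        n //= 2
    return R

p, A = 3851, 324
P = (920, 303)
nA, nB = 1194, 1759                    # the secret multipliers
QA = mul(nA, P, A, p)                  # Alice sends QA to Bob
QB = mul(nB, P, A, p)                  # Bob sends QB to Alice
shared_A = mul(nA, QB, A, p)           # Alice's computation
shared_B = mul(nB, QA, A, p)           # Bob's computation
print(QA, QB, shared_A == shared_B, shared_A)
```

As Remark 6.20 advises, only the x-coordinate of the shared point would be kept as the secret value.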
However, this time they want to send fewer bits to one another. Alice and Bob respectively choose new secret values nA = 2489 and nB = 2286, and as before,

    Alice computes QA = nAP = 2489(920, 303) = (593, 719) ∈ E(F3851),
    Bob computes QB = nBP = 2286(920, 303) = (3681, 612) ∈ E(F3851).

However, rather than sending both coordinates, Alice sends only xA = 593 to Bob and Bob sends only xB = 3681 to Alice. Alice substitutes xB = 3681 into the equation for E and finds that

    yB^2 = xB^3 + 324xB + 1287 = 3681^3 + 324 · 3681 + 1287 = 997.

(Recall that all calculations are performed in F3851.) Alice needs to compute a square root of 997 modulo 3851. This is not hard to do, especially for primes satisfying p ≡ 3 (mod 4), since Proposition 2.26 tells her that b^((p+1)/4) is a square root of b modulo p. So Alice sets

    yB = 997^((3851+1)/4) = 997^963 ≡ 612 (mod 3851).

It happens that she gets the same point QB = (xB, yB) = (3681, 612) that Bob used, and she computes

    nAQB = 2489(3681, 612) = (509, 1108).

Similarly, Bob substitutes xA = 593 into the equation for E and takes a square root,

    yA^2 = xA^3 + 324xA + 1287 = 593^3 + 324 · 593 + 1287 = 927,
    yA = 927^((3851+1)/4) = 927^963 ≡ 3132 (mod 3851).

Bob then uses the point Q′A = (593, 3132), which is not Alice's point QA, to compute

    nBQ′A = 2286(593, 3132) = (509, 2743).

Bob and Alice end up with points that are negatives of one another in E(Fp), but that is all right, since their shared secret value is the x-coordinate x = 509, which is the same for both points.

6.4.2 Elliptic Elgamal Public Key Cryptosystem

It is easy to create a direct analogue of the Elgamal public key cryptosystem described in Sect. 2.4. Briefly, Alice and Bob agree to use a particular prime p, elliptic curve E, and point P ∈ E(Fp). Alice chooses a secret multiplier nA and publishes the point QA = nAP as her public key. Bob's plaintext is a point M ∈ E(Fp).
He chooses an integer k to be his random element and computes

    C1 = kP   and   C2 = M + kQA.

He sends the two points (C1, C2) to Alice, who computes

    C2 − nAC1 = (M + kQA) − nA(kP) = M + k(nAP) − nA(kP) = M
Public parameter creation
    A trusted party chooses and publishes a (large) prime p, an elliptic curve E over Fp, and a point P in E(Fp).
Key creation (Alice)
    Choose a private key nA.
    Compute QA = nAP in E(Fp).
    Publish the public key QA.
Encryption (Bob)
    Choose plaintext M ∈ E(Fp) and a random element k.
    Use Alice's public key QA to compute C1 = kP ∈ E(Fp) and C2 = M + kQA ∈ E(Fp).
    Send the ciphertext (C1, C2) to Alice.
Decryption (Alice)
    Compute C2 − nAC1 ∈ E(Fp). This quantity is equal to M.

Table 6.6: Elliptic Elgamal key creation, encryption, and decryption

to recover the plaintext. The elliptic Elgamal public key cryptosystem is summarized in Table 6.6.

In principle, the elliptic Elgamal cryptosystem works fine, but there are some practical difficulties.

1. There is no obvious way to attach plaintext messages to points in E(Fp).
2. The elliptic Elgamal cryptosystem has 4-to-1 message expansion, as compared to the 2-to-1 expansion ratio of Elgamal using Fp. (See Remark 2.9.)

The reason that elliptic Elgamal has a 4-to-1 message expansion lies in the fact that the plaintext M is a single point in E(Fp). By Hasse's theorem (Theorem 6.11) there are approximately p different points in E(Fp), hence only about p different plaintexts. However, the ciphertext (C1, C2) consists of four numbers modulo p, since each point in E(Fp) has two coordinates.

Various methods have been proposed to solve these problems. The difficulty of associating plaintexts to points can be circumvented by choosing M randomly and using it as a mask for the actual plaintext. One such method, which also decreases message expansion, is described in Exercise 6.17.

Another natural way to improve message expansion is to send only the x-coordinates of C1 and C2, as was suggested for Diffie–Hellman key exchange
in Remark 6.20. Unfortunately, since Alice must compute the difference C2 − nAC1, she needs the correct values of both the x- and y-coordinates of C1 and C2. (Note that the points C2 − nAC1 and C2 + nAC1 are quite different!) However, the x-coordinate of a point determines the y-coordinate up to change of sign, so Bob can send one extra bit, for example

    Extra bit = 0 if 0 ≤ y < (1/2)p,   Extra bit = 1 if (1/2)p < y < p.

(See Exercise 6.16.) In this way, Bob needs to send only the x-coordinates of C1 and C2, plus two extra bits. This idea is sometimes referred to as point compression.

6.4.3 Elliptic Curve Signatures

The Elliptic Curve Digital Signature Algorithm (ECDSA), which is described in Table 6.7, is a straightforward analogue of the digital signature algorithm (DSA) described in Table 4.3 of Sect. 4.3. ECDSA is in widespread use, especially, but not only, in situations where signature size is important. Official specifications for implementing ECDSA are described in [6, 142]. (See also Sect. 8.8 for an amusing real-world implementation of digital cash using ECDSA.)

In order to prove that ECDSA works, i.e., that the verification step succeeds in verifying a valid signature, we compute

    v1G + v2V = d·s2^(−1)·G + s1·s2^(−1)·(sG) = (d + s·s1)·s2^(−1)·G = (e·s2)·s2^(−1)·G = eG ∈ E(Fp).

Hence x(v1G + v2V) mod q = x(eG) mod q = s1, so the signature is accepted as valid.

6.5 The Evolution of Public Key Cryptography

The invention of RSA in the late 1970s catapulted the problem of factoring large integers into prominence, leading to improved factorization methods such as the quadratic and number field sieves described in Sect. 3.7. In 1984, Hendrik Lenstra Jr. circulated a manuscript describing a new factorization method using elliptic curves. Lenstra's algorithm [75], which we describe in Sect. 6.6, is an elliptic analogue of Pollard's p − 1 factorization algorithm
Public parameter creation
    A trusted party chooses a finite field Fp, an elliptic curve E/Fp,
    and a point G ∈ E(Fp) of large prime order q.

Samantha: Key creation
    Choose a secret signing key 1 < s < q − 1.
    Compute V = sG ∈ E(Fp).
    Publish the verification key V.

Samantha: Signing
    Choose a document d mod q.
    Choose a random element e mod q.
    Compute eG ∈ E(Fp) and then
        s1 = x(eG) mod q  and  s2 ≡ (d + s s1) e^(-1) (mod q).
    Publish the signature (s1, s2).

Victor: Verification
    Compute v1 ≡ d s2^(-1) (mod q) and v2 ≡ s1 s2^(-1) (mod q).
    Compute v1 G + v2 V ∈ E(Fp) and verify that
        x(v1 G + v2 V) mod q = s1.

Table 6.7: The elliptic curve digital signature algorithm (ECDSA)

(Sect. 3.5) and exploits the fact that the number of points in E(Fp) varies as one chooses different elliptic curves. Although less efficient than sieve methods for the factorization problems that occur in cryptography, Lenstra's algorithm helped introduce elliptic curves to the cryptographic community. The importance of factorization algorithms for cryptography is that they are used to break RSA and other similar cryptosystems. In 1985, Neal Koblitz and Victor Miller independently proposed using elliptic curves to create cryptosystems. They suggested that the elliptic curve discrete logarithm problem might be more difficult than the classical discrete logarithm problem modulo p. Thus Diffie–Hellman key exchange and the Elgamal public key cryptosystem, implemented using elliptic curves as described in Sect. 6.4, might require smaller keys and run more efficiently than RSA because one could use smaller numbers.
Koblitz [67] and Miller [88] each published their ideas as academic papers, but neither of them pursued the commercial aspects of elliptic curve cryptography. Indeed, at the time, there was virtually no research on the ECDLP, so it was difficult to say with any confidence that the ECDLP was indeed significantly more difficult than the classical DLP.
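The steps of Table 6.7 can be sketched in a few lines. The toy curve y² = x³ + 2x + 2 over F17 with base point G = (5, 1) of prime order q = 19 is a standard classroom example, not one taken from this book, and the secret key s, document d, and random element e below are arbitrary illustrative choices; real systems use curves with q on the order of 2^256.

```python
# Toy ECDSA following Table 6.7 (a sketch on a tiny made-up parameter set).
p, A, B = 17, 2, 2
G, q = (5, 1), 19          # G has prime order q = 19 on y^2 = x^3 + 2x + 2 over F_17

def add(P, Q):
    """Elliptic curve addition in E(F_p); None plays the role of O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(n, P):
    """Double-and-add computation of nP."""
    R = None
    while n:
        if n & 1: R = add(R, P)
        P = add(P, P)
        n >>= 1
    return R

# Key creation: secret s, verification key V = sG.
s = 7
V = mul(s, G)

# Signing document d with random element e.
d, e = 12, 10
s1 = mul(e, G)[0] % q                     # s1 = x(eG) mod q
s2 = (d + s * s1) * pow(e, -1, q) % q     # s2 = (d + s*s1) e^{-1} mod q

# Verification: v1*G + v2*V should equal eG.
v1 = d * pow(s2, -1, q) % q
v2 = s1 * pow(s2, -1, q) % q
R = add(mul(v1, G), mul(v2, V))
assert R[0] % q == s1
print("signature", (s1, s2), "verifies")
```

In a serious implementation one would also re-choose e whenever s1 = 0 and would hash the document down to d mod q.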
However, the potential of what became known as elliptic curve cryptography (ECC) was noted by Scott
Vanstone and Ron Mullin, who had started a cryptographic company called Certicom in 1985. They joined with other researchers in both academia and the business world to promote ECC as an alternative to RSA and Elgamal.
All was not smooth sailing. For example, during the late 1980s, various cryptographers proposed using so-called supersingular elliptic curves for added efficiency, but in 1990, the MOV algorithm (see Sect. 6.9.1) showed that supersingular curves are vulnerable to attack. Some saw this as an indictment of ECC as a whole, while others pointed out that RSA also has weak instances that must be avoided, e.g., RSA must avoid using numbers that can be easily factored by Pollard's p − 1 method.
The purely mathematical question of whether ECC provided a secure and efficient alternative to RSA was clouded by the fact that there were commercial and financial issues at stake. In order to be commercially successful, cryptographic methods must be standardized for use in areas such as communications and banking. RSA had the initial lead, since it was invented first, but RSA was patented, and some companies resisted the idea that standards approved by trade groups or government bodies should mandate the use of a patented technology. Elgamal, after it was invented in 1985, provided a royalty-free alternative, so many standards specified Elgamal as an alternative to RSA. In the meantime, ECC was growing in stature, but even as late as 1997, more than a decade after its introduction, leading experts indicated their doubts about the security of ECC.[7]
A major dilemma pervading the field of cryptography is that no one knows the actual difficulty of the supposedly hard problems on which it is based. Currently, the security of public key cryptosystems depends on the perception and consensus of experts as to the difficulty of problems such as integer factorization and discrete logarithms.
All that can be said is that "such-and-such a problem has been extensively studied for N years, and here is the fastest known method for solving it." Proponents of factorization-based cryptosystems point to the fact that, in some sense, people have been trying to factor numbers since antiquity; but in truth, the modern theory of factorization requires high-speed computing devices and barely predates the invention of RSA. Serious study of the elliptic curve discrete logarithm problem started in the late 1980s, so modern factorization methods have a 10–15 year head start on the ECDLP. In Chap. 7 we will describe public key cryptosystems (NTRU, GGH) whose security is based on certain hard problems in the theory of lattices. Lattices have been extensively investigated since the nineteenth century, but again the invention and analysis of modern computational algorithms is much more recent, having been initiated by fundamental work of

Footnote 7: In 1997, the RSA corporation posted the following quote by RSA co-inventor Ron Rivest on its website: "But the security of cryptosystems based on elliptic curves is not well understood, due in large part to the abstruse nature of elliptic curves. . . . Over time, this may change, but for now trying to get an evaluation of the security of an elliptic-curve cryptosystem is a bit like trying to get an evaluation of some recently discovered Chaldean poetry. Until elliptic curves have been further studied and evaluated, I would advise against fielding any large-scale applications based on them."
Lenstra, Lenstra, and Lovász in the early 1980s. Lattices appeared as a tool for cryptanalysis during the 1980s and as a means of creating cryptosystems in the 1990s.
RSA, the first public key cryptosystem, was patented by its inventors. The issue of patents in cryptography is fraught with controversy. One might argue that the RSA patent, which ran from 1983 to 2000, set back the use of cryptography by requiring users to pay licensing fees. However, it is also true that in order to build a company, an inventor needs investors willing to risk their money, and it is much easier to raise funds if there is an exclusive product to offer. Further, the fact that RSA was originally the "only game in town" meant that it automatically received extensive scrutiny from the academic community, which helped to validate its security.
The invention and eventual commercial implementation of ECC followed a different path. Since neither Koblitz nor Miller applied for a patent, the basic underlying idea of ECC became freely available for all to use. This led Certicom and other companies to apply for patents giving improvements to the basic ECC idea. Some of these improvements were based on significant new research ideas, while others were less innovative and might almost be characterized as routine homework problems.[8] Unfortunately, the United States Patents and Trademark Office (USPTO) does not have the expertise to effectively evaluate the flood of cryptographic patent applications that it receives. The result has been a significant amount of uncertainty in the marketplace as to which versions of ECC are free and which require licenses, even assuming that all of the issued patents can withstand a legal challenge.

6.6 Lenstra's Elliptic Curve Factorization Algorithm

Pollard's p − 1 factorization method, which we discussed in Sect.
3.5, finds factors of N = pq by searching for a power a^L with the property that

    a^L ≡ 1 (mod p)  and  a^L ≢ 1 (mod q).

Fermat's little theorem tells us that this is likely to work if p − 1 divides L and q − 1 does not divide L. So what we do is to take L = n! for some moderate value of n. Then we hope that p − 1 or q − 1, but not both, is a product of small primes, hence divides n!. Clearly Pollard's method works well for some numbers, but not for all numbers. The determining factor is whether p − 1 or q − 1 is a product of small primes. What is it about the quantity p − 1 that makes it so important for Pollard's method? The answer lies in Fermat's little theorem. Intrinsically, p − 1 is

Footnote 8: For example, at the end of Sect. 6.4.2 we described how to save bandwidth in elliptic Elgamal by sending the x-coordinate and one additional bit to specify the y-coordinate. This idea is called "point compression" and is covered by US Patent 6,141,420.
important because there are p − 1 elements in F*p, so every element α of F*p satisfies α^(p−1) = 1. Now consider that last statement as it relates to the theme of this chapter, which is that the points and the addition law for an elliptic curve E(Fp) are very much analogous to the elements and the multiplication law for F*p. Hendrik Lenstra [75] made this analogy precise by devising a factorization algorithm that uses the group law on an elliptic curve E in place of multiplication modulo N.
In order to describe Lenstra's algorithm, we need to work with an elliptic curve modulo N, where the integer N is not prime, so the ring Z/NZ is not a field. However, suppose that we start with an equation

    E : Y^2 = X^3 + AX + B

and suppose that P = (a, b) is a point on E modulo N, by which we mean that

    b^2 ≡ a^3 + A·a + B (mod N).

Then we can apply the elliptic curve addition algorithm (Theorem 6.6) to compute 2P, 3P, 4P, . . . , since the only operations required by that algorithm are addition, subtraction, multiplication, and division (by numbers relatively prime to N).

Example 6.22. Let N = 187 and consider the elliptic curve

    E : Y^2 = X^3 + 3X + 7 modulo 187

and the point P = (38, 112), which is on E modulo 187. In order to compute 2P mod 187, we follow the elliptic curve addition algorithm and compute

    1/(2y(P)) = 1/224 ≡ 91 (mod 187),
    λ = (3x(P)^2 + A)/(2y(P)) = 4335/224 ≡ 34 · 91 ≡ 102 (mod 187),
    x(2P) = λ^2 − 2x(P) = 10328 ≡ 43 (mod 187),
    y(2P) = λ(x(P) − x(2P)) − y(P) = 102(38 − 43) − 112 ≡ 126 (mod 187).

Thus 2P = (43, 126) as a point on the curve E modulo 187. For clarity, we have written x(P) and y(P) for the x- and y-coordinates of P, and similarly for 2P. Also, during the calculation we needed to find the reciprocal of 224 modulo 187, i.e., we needed to solve the congruence 224d ≡ 1 (mod 187).
This was easily accomplished using the extended Euclidean algorithm (Theorem 1.11; see also Remark 1.15 and Exercise 1.12), since it turns out that gcd(224, 187) = 1.
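The doubling step in Example 6.22 is easy to reproduce; this sketch uses Python's built-in `pow(·, -1, N)`, which computes the modular reciprocal via the extended Euclidean algorithm.

```python
# Reproducing Example 6.22: doubling P = (38, 112) on
# Y^2 = X^3 + 3X + 7 modulo N = 187, even though 187 is not prime.
N, A = 187, 3
x1, y1 = 38, 112

lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, N) % N   # 4335/224 ≡ 102 (mod 187)
x2 = (lam * lam - 2 * x1) % N                      # 43
y2 = (lam * (x1 - x2) - y1) % N                    # 126
print((x2, y2))   # 2P = (43, 126)
```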
We next compute 3P = 2P + P in a similar fashion. In this case, we are adding distinct points, so the formula for λ is different, but the computation is virtually the same:

    1/(x(2P) − x(P)) = 1/5 ≡ 75 (mod 187),
    λ = (y(2P) − y(P))/(x(2P) − x(P)) = 14/5 ≡ 14 · 75 ≡ 115 (mod 187),
    x(3P) = λ^2 − x(2P) − x(P) = 13144 ≡ 54 (mod 187),
    y(3P) = λ(x(P) − x(3P)) − y(P) = 115(38 − 54) − 112 ≡ 105 (mod 187).

Thus 3P = (54, 105) on the curve E modulo 187. Again we needed to compute a reciprocal, in this case, the reciprocal of 5 modulo 187. We leave it to you to continue the calculations. For example, it is instructive to check that P + 3P and 2P + 2P give the same answer, namely 4P = (93, 64).

Example 6.23. Continuing with Example 6.22, we attempt to compute 5P for the point P = (38, 112) on the elliptic curve E : Y^2 = X^3 + 3X + 7 modulo 187. We already computed 2P = (43, 126) and 3P = (54, 105). The first step in computing 5P = 3P + 2P is to compute the reciprocal of

    x(3P) − x(2P) = 54 − 43 = 11

modulo 187. However, when we apply the extended Euclidean algorithm to 11 and 187, we find that gcd(11, 187) = 11, so 11 does not have a reciprocal modulo 187. It seems that we have hit a dead end, but in fact, we have struck it rich! Notice that since the quantity gcd(11, 187) is greater than 1, it gives us a divisor of 187. So our failure to compute 5P also tells us that 11 divides 187, which allows us to factor 187 as 187 = 11 · 17. This idea underlies Lenstra's elliptic curve factorization algorithm.
We examine more closely why we were not able to compute 5P modulo 187. If we instead look at the elliptic curve E modulo 11, then a quick computation shows that the point

    P = (38, 112) ≡ (5, 2) (mod 11)

satisfies 5P = O in E(F11). This means that when we attempt to compute 5P modulo 11, we end up with the point O at infinity, so at some stage of the calculation we have tried to divide by zero.
But here “zero” means zero in F11, so we actually end up trying to find the reciprocal modulo 11 of some integer that is divisible by 11. Following the lead from Examples 6.22 and 6.23, we replace multiplication modulo N in Pollard’s factorization method with addition modulo N on an elliptic curve. We start with an elliptic curve E and a point P on E modulo N and we compute
2!·P, 3!·P, 4!·P, 5!·P, . . . (mod N).

Notice that once we have computed Q = (n − 1)!·P, it is easy to compute n!·P, since it equals nQ. At each stage, there are three things that may happen. First, we may be able to compute n!·P. Second, during the computation we may need to find the reciprocal of a number d that is a multiple of N, which would not be helpful, but luckily this situation is quite unlikely to occur. Third, we may need to find the reciprocal of a number d that satisfies

    1 < gcd(d, N) < N,

in which case the computation of n!·P fails, but gcd(d, N) is a nontrivial factor of N, so we are happy.

Input. Integer N to be factored.
1. Choose random values A, a, and b modulo N.
2. Set P = (a, b) and B ≡ b^2 − a^3 − A·a (mod N).
   Let E be the elliptic curve E : Y^2 = X^3 + AX + B.
3. Loop j = 2, 3, 4, . . . up to a specified bound.
4.   Compute Q ≡ jP (mod N) and set P = Q.
5.   If the computation in Step 4 fails, then we have found a d > 1 with d | N.
6.     If d < N, then success, return d.
7.     If d = N, go to Step 1 and choose a new curve and point.
8.   Increment j and loop again at Step 4.

Table 6.8: Lenstra's elliptic curve factorization algorithm

This completes the description of Lenstra's elliptic curve factorization algorithm, other than the minor problem of finding an initial point P on an elliptic curve E modulo N. The obvious method is to fix an equation for the curve E, plug in values of X, and check whether the quantity X^3 + AX + B is a square modulo N. Unfortunately, this is difficult to do unless we know how to factor N. The solution to this dilemma is to first choose the point P = (a, b) at random, second choose a random value for A, and third set

    B ≡ b^2 − a^3 − A·a (mod N).

Then the point P is automatically on the curve E : Y^2 = X^3 + AX + B modulo N. Lenstra's algorithm is summarized in Table 6.8.

Example 6.24. We illustrate Lenstra's algorithm by factoring N = 6887.
We begin by randomly selecting a point P = (1512, 3166) and a number A = 14 and computing

    B ≡ 3166^2 − 1512^3 − 14 · 1512 ≡ 19 (mod 6887).

We let E be the elliptic curve
E : Y^2 = X^3 + 14X + 19,

so by construction, the point P is automatically on E modulo 6887. Now we start computing multiples of P modulo 6887. First we find that

    2P ≡ (3466, 2996) (mod 6887).

Next we compute

    3!·P = 3 · (2P) = 3 · (3466, 2996) ≡ (3067, 396) (mod 6887).

    n    n!·P mod 6887
    1    P = (1512, 3166)
    2    2!·P = (3466, 2996)
    3    3!·P = (3067, 396)
    4    4!·P = (6507, 2654)
    5    5!·P = (2783, 6278)
    6    6!·P = (6141, 5581)

Table 6.9: Multiples of P = (1512, 3166) on Y^2 ≡ X^3 + 14X + 19 (mod 6887)

And so on. The values up to 6!·P are listed in Table 6.9. These values are not, in and of themselves, interesting. It is only when we try, and fail, to compute 7!·P that something interesting happens. From Table 6.9 we read off the value of Q = 6!·P = (6141, 5581), and we want to compute 7Q. First we compute

    2Q ≡ (5380, 174) (mod 6887),
    4Q ≡ 2 · 2Q ≡ (203, 2038) (mod 6887).

Then we compute 7Q as

    7Q ≡ (Q + 2Q) + 4Q (mod 6887)
       ≡ (6141, 5581) + (5380, 174) + (203, 2038) (mod 6887)
       ≡ (984, 589) + (203, 2038) (mod 6887).

When we attempt to perform the final step, we need to compute the reciprocal of 203 − 984 modulo 6887, but we find that

    gcd(203 − 984, 6887) = gcd(−781, 6887) = 71.

Thus we have discovered a nontrivial divisor of 6887, namely 71, which gives the factorization 6887 = 71 · 97. It turns out that in E(F71) the point P satisfies 63P ≡ O (mod 71), while in E(F97) the point P satisfies 107P ≡ O (mod 97). The reason that we succeeded in factoring 6887 using 7!·P, but not with a smaller multiple of P, is precisely because 7! is the smallest factorial that is divisible by 63.
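A minimal sketch of the procedure in Table 6.8, specialized to the fixed curve and point of Example 6.24 (the random choices of Step 1 are replaced by the example's values). The failed reciprocal is surfaced as an exception carrying the gcd:

```python
from math import gcd

class FactorFound(Exception):
    def __init__(self, d): self.d = d

def inv_or_factor(a, N):
    """Reciprocal of a mod N; a failure to invert reveals a factor of N."""
    d = gcd(a % N, N)
    if 1 < d < N:
        raise FactorFound(d)
    return pow(a, -1, N)

def ec_add(P, Q, A, N):
    """Addition on Y^2 = X^3 + AX + B modulo N; None stands for O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % N == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + A) * inv_or_factor(2 * y1, N) % N
    else:
        lam = (y2 - y1) * inv_or_factor(x2 - x1, N) % N
    x3 = (lam * lam - x1 - x2) % N
    return (x3, (lam * (x1 - x3) - y1) % N)

def ec_mul(n, P, A, N):
    R = None
    while n:
        if n & 1: R = ec_add(R, P, A, N)
        P = ec_add(P, P, A, N)
        n >>= 1
    return R

def lenstra(N, A, P, bound=100):
    try:
        for j in range(2, bound):
            P = ec_mul(j, P, A, N)    # after this step, P = j! * (original P)
    except FactorFound as e:
        return e.d
    return None

print(lenstra(6887, 14, (1512, 3166)))   # 71, since 6887 = 71 * 97
```

As in the example, the failure occurs while computing 7!·P, when the denominator 203 − 984 shares the factor 71 with N.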
Remark 6.25. In Sect. 3.7 we discussed the speed of sieve factorization methods and saw that the average running time of the quadratic sieve to factor a composite number N is approximately

    O( e^sqrt((log N)(log log N)) ) steps.     (6.7)

Notice that the running time depends on the size of the integer N. On the other hand, the most naive possible factorization method, namely trying each possible divisor 2, 3, 4, 5, . . . , has a running time that depends on the smallest prime factor of N. More precisely, this trial division algorithm takes exactly p steps, where p is the smallest prime factor of N. If it happens that N = pq with p and q approximately the same size, then the running time is approximately sqrt(N), which is much slower than sieve methods; but if N happens to have a very small prime factor, trial division may be helpful in finding it.
It is an interesting and useful property of the elliptic curve factorization algorithm that its expected running time depends on the smallest prime factor of N, rather than on N itself. (See Exercise 5.44 for another, albeit slower, factorization algorithm with this property.) More precisely, if p is the smallest factor of N, then the elliptic curve factorization algorithm has average running time approximately

    O( e^sqrt(2(log p)(log log p)) ) steps.     (6.8)

If N = pq is a product of two primes with p ≈ q, the running times in (6.7) and (6.8) are approximately equal, and then the fact that a sieve step is much faster than an elliptic curve step makes sieve methods faster in practice. However, the elliptic curve method is quite useful for finding moderately large factors of extremely large numbers, because its running time depends on the smallest prime factor.

6.7 Elliptic Curves over F2 and over F_{2^k}

Computers speak binary, so they are especially well suited to doing calculations modulo 2. This suggests that it might be more efficient to use elliptic curves modulo 2.
Unfortunately, if E is an elliptic curve defined over F2, then E(F2) contains at most 5 points, so E(F2) is not useful for cryptographic purposes. However, there are other finite fields in which 2 = 0. These are the fields F_{2^k} containing 2^k elements. Recall from Sect. 2.10.4 that for every prime power p^k there exists a field F_{p^k} with p^k elements; and further, up to relabeling the elements, there is exactly one such field. So we can take an elliptic curve whose Weierstrass equation has coefficients in a field F_{p^k} and look at the group of points on that curve having coordinates in F_{p^k}. Hasse's theorem (Theorem 6.11) is true in this more general setting.
Theorem 6.26 (Hasse). Let E be an elliptic curve over F_{p^k}. Then

    #E(F_{p^k}) = p^k + 1 − t_{p^k}   with t_{p^k} satisfying |t_{p^k}| ≤ 2p^(k/2).

Example 6.27. We work with the field F9 = {a + bi : a, b ∈ F3}, where i^2 = −1. (See Example 2.58 for a discussion of F_{p^2} for primes p ≡ 3 (mod 4).) Let E be the elliptic curve over F9 defined by the equation

    E : Y^2 = X^3 + (1 + i)X + (2 + i).

By trial and error we find that there are 10 points in E(F9):

    (2i, 1 + 2i), (2i, 2 + i), (1 + i, 1 + i), (1 + i, 2 + 2i), (2, 0),
    (2 + i, i), (2 + i, 2i), (2 + 2i, 1), (2 + 2i, 2), O.

Points can be doubled or added to one another using the formulas for the addition of points, always keeping in mind that i^2 = −1 and that we are working modulo 3. For example, you can check that

    (2, 0) + (2 + i, 2i) = (2i, 1 + 2i)  and  2(1 + i, 2 + 2i) = (2 + i, i).

Our goal is to use elliptic curves over F_{2^k} for cryptography, but there is one difficulty that we must first address. The problem is that we cheated a little bit when we defined an elliptic curve as a curve given by a Weierstrass equation Y^2 = X^3 + AX + B satisfying Δ = 4A^3 + 27B^2 ≠ 0. In fact, the correct definition of the discriminant Δ is

    Δ = −16(4A^3 + 27B^2).

As long as we work in a field where 2 ≠ 0, then the condition Δ ≠ 0 is the same with either definition, but for fields such as F_{2^k} where 2 = 0, we have Δ = 0 for every standard Weierstrass equation. The solution is to enlarge the collection of allowable Weierstrass equations.

Definition. An elliptic curve E is the set of solutions to a generalized Weierstrass equation

    E : Y^2 + a1 XY + a3 Y = X^3 + a2 X^2 + a4 X + a6,

together with an extra point O. The coefficients a1, . . . , a6 are required to satisfy Δ ≠ 0, where the discriminant Δ is defined in terms of certain quantities b2, b4, b6, b8 as follows:

    b2 = a1^2 + 4a2,
    b4 = 2a4 + a1 a3,
    b6 = a3^2 + 4a6,
    b8 = a1^2 a6 + 4a2 a6 − a1 a3 a4 + a2 a3^2 − a4^2,
    Δ = −b2^2 b8 − 8b4^3 − 27b6^2 + 9 b2 b4 b6.

(Although these formulas look complicated, they are easy enough to compute, and the condition Δ ≠ 0 is exactly what is required to ensure that the curve E is nonsingular.)
The geometric definition of the addition law on E is similar to our earlier definition, the only change being that the old reflection step (x, y) → (x, −y) is replaced by the slightly more complicated reflection step

    (x, y) −→ (x, −y − a1 x − a3).

This is also the formula for the negative of a point. Working with generalized Weierstrass equations, it is not hard to derive an addition algorithm similar to the algorithm described in Theorem 6.6; see Exercise 6.22 for details. For example, if P1 = (x1, y1) and P2 = (x2, y2) are points with P1 ≠ ±P2, then the x-coordinate of their sum is given by

    x(P1 + P2) = λ^2 + a1 λ − a2 − x1 − x2   with   λ = (y2 − y1)/(x2 − x1).

Similarly, the x-coordinate of twice a point P = (x, y) is given by the duplication formula

    x(2P) = (x^4 − b4 x^2 − 2b6 x − b8) / (4x^3 + b2 x^2 + 4b4 x + b6).

Example 6.28. The polynomial T^3 + T + 1 is irreducible in F2[T], so as explained in Sect. 2.10.4, the quotient ring F2[T]/(T^3 + T + 1) is a field F8 with eight elements. Every element in F8 can be represented by an expression of the form a + bT + cT^2 with a, b, c ∈ F2, with the understanding that when we multiply two elements, we divide the product by T^3 + T + 1 and take the remainder. Now consider the elliptic curve E defined over the field F8 by the generalized Weierstrass equation

    E : Y^2 + (1 + T)Y = X^3 + (1 + T^2)X + T.

The discriminant of E is Δ = 1 + T + T^2. There are nine points in E(F8):

    (0, T), (0, 1), (T, 0), (T, 1 + T), (1 + T, T), (1 + T, 1),
    (1 + T^2, T + T^2), (1 + T^2, 1 + T^2), O.

Using the group law described in Exercise 6.22, we can add and double points, for example

    (1 + T^2, T + T^2) + (1 + T, T) = (1 + T^2, 1 + T^2)  and  2(T, 1 + T) = (T, 0).
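The point count in Example 6.28 is easy to verify by brute force. In this sketch the elements of F8 are encoded as 3-bit masks (bit i holding the coefficient of T^i), an implementation convention of ours rather than the book's notation:

```python
# Counting points on Y^2 + (1+T)Y = X^3 + (1+T^2)X + T over
# F_8 = F_2[T]/(T^3 + T + 1), with field elements as bitmasks.

MOD = 0b1011          # the reduction polynomial T^3 + T + 1

def f8_mul(a, b):
    """Carry-less multiplication followed by reduction mod T^3 + T + 1."""
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1
        if a & 0b1000: a ^= MOD   # clear the degree-3 term
        b >>= 1
    return r

a3, a4, a6 = 0b011, 0b101, 0b010   # 1+T, 1+T^2, T

points = [None]                    # None stands for the point O
for x in range(8):
    for y in range(8):
        lhs = f8_mul(y, y) ^ f8_mul(a3, y)                  # Y^2 + (1+T)Y
        rhs = f8_mul(f8_mul(x, x), x) ^ f8_mul(a4, x) ^ a6  # X^3 + (1+T^2)X + T
        if lhs == rhs:
            points.append((x, y))
print(len(points))   # 9, matching the list in Example 6.28
```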
There are some computational advantages to working with elliptic curves defined over F_{2^k}, rather than over Fp. We already mentioned the first: the binary nature of computers tends to make them operate more efficiently in situations in which 2 = 0. A second advantage is the option to take k composite, in which case F_{2^k} contains other finite fields intermediate between F2 and F_{2^k}. (The precise statement is that F_{2^j} is a subfield of F_{2^k} if and only if j | k.) These intermediate fields can sometimes be used to speed up computations, but there are also situations in which they cause security problems. So as is often the case, increased efficiency may come at the cost of decreased security; to avoid potential problems, it is often safest to use fields F_{2^k} with k prime.
The third, and most important, advantage of working over F_{2^k} lies in a suggestion of Neal Koblitz to use an elliptic curve E over F2, while taking points on E with coordinates in F_{2^k}. As we now explain, this allows the use of the Frobenius map instead of the doubling map and leads to a significant gain in efficiency.

Definition. The (p-power) Frobenius map τ is the map from the field F_{p^k} to itself defined by the simple rule

    τ : F_{p^k} −→ F_{p^k},   α −→ α^p.

The Frobenius map has the surprising property that it preserves addition and multiplication,[9]

    τ(α + β) = τ(α) + τ(β)  and  τ(α · β) = τ(α) · τ(β).

The multiplication rule is obvious, since

    τ(α · β) = (α · β)^p = α^p · β^p = τ(α) · τ(β).

In general, the addition rule is a consequence of the binomial theorem (see Exercise 6.24). For p = 2, which is what we will need, the proof is easy:

    τ(α + β) = (α + β)^2 = α^2 + 2α · β + β^2 = α^2 + β^2 = τ(α) + τ(β),

where we have used the fact that 2 = 0 in F_{2^k}. We also note that τ(α) = α for every α ∈ F2, which is clear, since F2 = {0, 1}.
Now let E be an elliptic curve defined over F2, i.e., given by a generalized Weierstrass equation with coefficients in F2, and let P = (x, y) ∈ E(F_{2^k}) be a point on E with coordinates in some larger field F_{2^k}. We define a Frobenius map on points in E(F_{2^k}) by applying τ to each coordinate,

    τ(P) = (τ(x), τ(y)).     (6.9)

We are going to show that the map τ has some nice properties. For example, we claim that

    τ(P) ∈ E(F_{2^k}).     (6.10)

Footnote 9: In mathematical terminology, the Frobenius map τ is a field automorphism of F_{p^k}. It also fixes Fp. One can show that the Galois group of F_{p^k}/Fp is cyclic of order k and is generated by τ.
Further, if P, Q ∈ E(F_{2^k}), then we claim that

    τ(P + Q) = τ(P) + τ(Q).     (6.11)

In other words, τ maps E(F_{2^k}) to itself, and it respects the addition law. (In mathematical terminology, the Frobenius map is a group homomorphism of E(F_{2^k}) to itself.)
It is easy to check (6.10). We are given that P = (x, y) ∈ E(F_{2^k}), so

    y^2 + a1 xy + a3 y − x^3 − a2 x^2 − a4 x − a6 = 0.

Applying τ to both sides and using the fact that τ respects addition and multiplication in F_{2^k}, we find that

    τ(y)^2 + τ(a1)τ(x)τ(y) + τ(a3)τ(y) − τ(x)^3 − τ(a2)τ(x)^2 − τ(a4)τ(x) − τ(a6) = 0.

By assumption, the Weierstrass equation has coefficients in F2, and we know that τ fixes elements of F2, so

    τ(y)^2 + a1 τ(x)τ(y) + a3 τ(y) − τ(x)^3 − a2 τ(x)^2 − a4 τ(x) − a6 = 0.

Hence τ(P) = (τ(x), τ(y)) is a point of E(F_{2^k}). A similar computation, which we omit, shows that (6.11) is true. The key fact is that the addition law on E requires only addition, subtraction, multiplication, and division of the coordinates of points and the coefficients of the Weierstrass equation.
Our next result shows that the Frobenius map is closely related to the number of points in E(Fp).

Theorem 6.29. Let E be an elliptic curve over Fp and let

    t = p + 1 − #E(Fp).

Notice that Hasse's theorem (Theorem 6.11) says that |t| ≤ 2 sqrt(p).
(a) Let α and β be the complex roots of the quadratic polynomial Z^2 − tZ + p. Then |α| = |β| = sqrt(p), and for every k ≥ 1 we have

    #E(F_{p^k}) = p^k + 1 − α^k − β^k.

(b) Let

    τ : E(F_{p^k}) −→ E(F_{p^k}),   (x, y) −→ (x^p, y^p),

be the Frobenius map. Then for every point Q ∈ E(F_{p^k}) we have

    τ^2(Q) − t · τ(Q) + p · Q = O,

where τ^2(Q) denotes the composition τ(τ(Q)).

Proof. The proof requires more tools than we have at our disposal; see for example [136, V §2] or [147].
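Theorem 6.29(a) can be spot-checked on a small curve. The curve y² = x³ + x + 1 over F3 below is an illustrative choice of ours, not an example from the text: counting gives #E(F3) = 4, so t = 0, the roots of Z² − tZ + 3 satisfy α² + β² = t² − 2·3 = −6, and the theorem predicts #E(F9) = 9 + 1 − (−6) = 16. Since 3 ≡ 3 (mod 4) we may realize F9 as {u + v·i : u, v ∈ F3} with i² = −1, as in Example 6.27.

```python
# Spot-check of Theorem 6.29(a) for E: y^2 = x^3 + x + 1 over F_3 and F_9.
p = 3

def add9(a, b): return ((a[0] + b[0]) % 3, (a[1] + b[1]) % 3)
def mul9(a, b):
    # (u + v*i)(s + t*i) with i^2 = -1, coefficients mod 3
    return ((a[0]*b[0] - a[1]*b[1]) % 3, (a[0]*b[1] + a[1]*b[0]) % 3)

def rhs(x):   # x^3 + x + 1 evaluated in F_9
    return add9(add9(mul9(mul9(x, x), x), x), (1, 0))

F9 = [(u, v) for u in range(3) for v in range(3)]

count3 = 1 + sum(1 for x in range(3) for y in range(3)
                 if (y*y - (x**3 + x + 1)) % 3 == 0)     # +1 for the point O
count9 = 1 + sum(1 for x in F9 for y in F9 if mul9(y, y) == rhs(x))

t = p + 1 - count3
print(count3, count9, p**2 + 1 - (t*t - 2*p))   # the last two should agree
```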
Recall from Sect. 6.3.1 that to compute a multiple nP of a point P, we first expressed n as a sum of powers of 2 and then used a double-and-add method to compute nP. For random values of n, this required approximately log n doublings and (1/2) log n additions. A refinement of this method using both positive and negative powers of 2 reduces the time to approximately log n doublings and (1/3) log n additions. Notice that the number of doublings remains at log n. Koblitz's idea is to replace the doubling map with the Frobenius map. This leads to a large savings, because it takes much less time to compute τ(P) than it does to compute 2P. The key to the approach is Theorem 6.29, which tells us that the action of the Frobenius map on E(F_{2^k}) satisfies a quadratic equation.

Definition. A Koblitz curve is an elliptic curve defined over F2 by an equation of the form

    Ea : Y^2 + XY = X^3 + aX^2 + 1   with a ∈ {0, 1}.

The discriminant of Ea is Δ = 1. For concreteness we restrict attention to the curve

    E0 : Y^2 + XY = X^3 + 1.

It is easy to check that

    E0(F2) = {(0, 1), (1, 0), (1, 1), O},

so #E0(F2) = 4 and t = 2 + 1 − #E0(F2) = −1. To apply Theorem 6.29, we use the quadratic formula to find the roots of the polynomial Z^2 + Z + 2. The roots are

    (−1 + sqrt(−7))/2  and  (−1 − sqrt(−7))/2.

Then Theorem 6.29(a) tells us that

    #E0(F_{2^k}) = 2^k + 1 − ((−1 + sqrt(−7))/2)^k − ((−1 − sqrt(−7))/2)^k.     (6.12)

This formula easily allows us to compute the number of points in E0(F_{2^k}), even for very large values of k. For example,

    #E0(F_{2^97}) = 158456325028528296935114828764.

(See also Exercise 6.25.) Further, Theorem 6.29(b) says that the Frobenius map τ satisfies the equation τ^2 + τ + 2 = 0 when it acts on points of E(F_{2^k}), i.e.,
    τ^2(P) + τ(P) + 2P = O   for all P ∈ E(F_{2^k}).

The idea now is to write an arbitrary integer n as a sum of powers of τ, subject to the assumption that τ^2 = −2 − τ. Say we have written n as

    n = v0 + v1 τ + v2 τ^2 + · · · + vℓ τ^ℓ   with vi ∈ {−1, 0, 1}.

Then we can compute nP efficiently using the formula

    nP = (v0 + v1 τ + v2 τ^2 + · · · + vℓ τ^ℓ)P = v0 P + v1 τ(P) + v2 τ^2(P) + · · · + vℓ τ^ℓ(P).

This takes less time than using the binary or ternary method because it is far easier to compute τ^i(P) than it is to compute 2^i P.

Proposition 6.30. Let n be a positive integer. Then n can be written in the form

    n = v0 + v1 τ + v2 τ^2 + · · · + vℓ τ^ℓ   with vi ∈ {−1, 0, 1},     (6.13)

under the assumption that τ satisfies τ^2 = −2 − τ. Further, this can always be done with ℓ ≈ 2 log n and with at most 1/3 of the vi nonzero.

Proof. The proof is similar to Proposition 6.18, the basic idea being that we write integers as 2a + b with b ∈ {0, 1, −1} and replace 2 with −τ − τ^2; see Exercise 6.27. With more work, it is possible to find an expansion (6.13) with ℓ ≈ log n and approximately 1/3 of the vi nonzero; see [29, §15.1].

Example 6.31. We illustrate Proposition 6.30 with a numerical example. Let n = 7. Then

    7 = 1 + 3 · 2
      = 1 + 3 · (−τ − τ^2)
      = 1 − 3τ − 3τ^2
      = 1 − τ − τ^2 − 2τ − 2τ^2
      = 1 − τ − τ^2 − (−τ − τ^2)τ − (−τ − τ^2)τ^2
      = 1 − τ + 2τ^3 + τ^4
      = 1 − τ + (−τ − τ^2)τ^3 + τ^4
      = 1 − τ − τ^5.

Thus 7 = 1 − τ − τ^5.

Remark 6.32. As we have seen, computing #E(F_{2^k}) for Koblitz curves is very easy. However, for general elliptic curves over F_{2^k}, this is a more difficult task. The SEA algorithm and its variants [120, 121] that we mentioned in Remark 6.13 are reasonably efficient at counting the number of points in E(Fq) for any fields with a large number of elements. Satoh [113] devised an alternative method that is often faster than SEA when q = p^e for a small prime p and (moderately) large exponent e.
Satoh's original paper dealt only with the case p ≥ 3, but subsequent work [44, 140] also covers the cryptographically important case of p = 2.
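The identity of Example 6.31 can be checked symbolically in the ring Z[τ] subject to τ² = −2 − τ: every element reduces to the form a + bτ with integers a, b, a representation of our own choosing for this sketch.

```python
# Verifying Example 6.31 in Z[tau] with tau^2 = -2 - tau.
# Elements are pairs (a, b) standing for a + b*tau.

def tau_mul(x):
    """Multiply a + b*tau by tau: a*tau + b*tau^2 = -2b + (a - b)*tau."""
    a, b = x
    return (-2 * b, a - b)

def eval_expansion(coeffs):
    """Evaluate sum(v_i * tau^i) for coefficient list [v0, v1, ...]."""
    power, total = (1, 0), (0, 0)
    for v in coeffs:
        total = (total[0] + v * power[0], total[1] + v * power[1])
        power = tau_mul(power)
    return total

# 7 = 1 - tau - tau^5, i.e. coefficients [1, -1, 0, 0, 0, -1]:
print(eval_expansion([1, -1, 0, 0, 0, -1]))   # (7, 0), i.e. the integer 7
```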
6.8 Bilinear Pairings on Elliptic Curves

You have probably seen examples of bilinear pairings in a linear algebra class. For example, the dot product is a bilinear pairing on the vector space R^n,

    β(v, w) = v · w = v1 w1 + v2 w2 + · · · + vn wn.

It is a pairing in the sense that it takes a pair of vectors and returns a number, and it is bilinear in the sense that it is a linear transformation in each of its variables. In other words, for any vectors v1, v2, w1, w2 and any real numbers a1, a2, b1, b2, we have

    β(a1 v1 + a2 v2, w) = a1 β(v1, w) + a2 β(v2, w),
    β(v, b1 w1 + b2 w2) = b1 β(v, w1) + b2 β(v, w2).     (6.14)

More generally, if A is any n-by-n matrix, then the function β(v, w) = v A w^t is a bilinear pairing on R^n, where we write v as a row vector and we write w^t, the transpose of w, as a column vector.
Another bilinear pairing that you have seen is the determinant map on R^2. Thus if v = (v1, v2) and w = (w1, w2), then

    δ(v, w) = det( v1  v2
                   w1  w2 ) = v1 w2 − v2 w1

is a bilinear map. The determinant map has the further property that it is alternating, which means that if we switch the vectors, the value changes sign,

    δ(v, w) = −δ(w, v).

Notice that the alternating property implies that δ(v, v) = 0 for every vector v.
The bilinear pairings that we discuss in this section are similar in that they take as input two points on an elliptic curve and give as output a number. However, the bilinearity condition is slightly different, because the output value is a nonzero element of a finite field, so the sum on the right-hand side of (6.14) is replaced by a product.
Bilinear pairings on elliptic curves have a number of important cryptographic applications. For most of these applications it is necessary to work with finite fields F_{p^k} of prime power order. Fields of prime power order are discussed in Sect. 2.10.4, but even if you have not covered that material, you can just imagine a field that is similar to Fp, but that has p^k elements. (N.B. The field F_{p^k} is very different from the ring Z/p^k Z; see Exercise 2.40.) Standard references for the material used in this section are [136] and [147].
6.8.1 Points of Finite Order on Elliptic Curves

We begin by briefly describing the points of finite order on an elliptic curve.

Definition. Let m ≥ 1 be an integer. A point P ∈ E satisfying mP = O is called a point of order m in the group E. We denote the set of points of order m by

E[m] = {P ∈ E : mP = O}.

Such points are called points of finite order or torsion points.

It is easy to see that if P and Q are in E[m], then P + Q and −P are also in E[m], so E[m] is a subgroup of E. If we want the coordinates of P to lie in a particular field K, for example in Q or R or C or F_p, then we write E(K)[m]. (See Exercise 2.12.)

The group of points of order m has a fairly simple structure, at least if we allow the coordinates of the points to be in a sufficiently large field.

Proposition 6.33. Let m ≥ 1 be an integer.
(a) Let E be an elliptic curve over Q or R or C. Then

E(C)[m] ≅ Z/mZ × Z/mZ

is a product of two cyclic groups of order m.
(b) Let E be an elliptic curve over F_p and assume that p does not divide m. Then there exists a value of k such that

E(F_{p^{jk}})[m] ≅ Z/mZ × Z/mZ for all j ≥ 1.

Proof. For the proof, which is beyond the scope of this book, see any standard text on elliptic curves, for example [136, Corollary III.6.4].

Remark 6.34. Notice that if ℓ is prime and if K is a field such that E(K)[ℓ] ≅ Z/ℓZ × Z/ℓZ, then we may view E[ℓ] as a 2-dimensional vector space over the field Z/ℓZ. And even if m is not prime, E(K)[m] = Z/mZ × Z/mZ still has a "basis" {P₁, P₂} in the sense that every point P ∈ E(K)[m] can be written as a linear combination

P = aP₁ + bP₂

for a unique choice of coefficients a, b ∈ Z/mZ. Of course, if m is large, it may be very difficult to find a and b. Indeed, if P is a multiple of P₁, then finding the value of a is the same as solving the ECDLP for P and P₁.
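Proposition 6.33(b) can be seen in miniature by brute force. For the toy curve E : y² = x³ + x over F₇ (our choice of example, not the book's), the cubic x³ + x = x(x² + 1) has only the root x = 0 in F₇, because −1 is not a square modulo 7. So E(F₇)[2] is cyclic of order 2, and the full group Z/2Z × Z/2Z only appears over the larger field F₄₉. A short Python check, assuming nothing beyond the addition law:

```python
p = 7   # small prime with p = 3 (mod 4); curve y^2 = x^3 + x

# all points of E(F_7): the point at infinity O (None) plus the affine points
points = [None] + [(x, y) for x in range(p) for y in range(p)
                   if (y * y - (x**3 + x)) % p == 0]

def add(P, Q):
    # elliptic curve addition on y^2 = x^3 + x over F_p
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                       # vertical line: P + Q = O
    if P == Q:
        lam = (3 * x1 * x1 + 1) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

two_torsion = [P for P in points if add(P, P) is None]
print(len(points), two_torsion)   # 8 points in all; E(F_7)[2] = {O, (0,0)}
```

So #E(F₇) = 8 and E(F₇)[2] has only two elements, consistent with part (b): the missing 2-torsion lives in an extension field.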
6.8.2 Rational Functions and Divisors on Elliptic Curves

In order to define the Weil and Tate pairings, we need to explain how a rational function on an elliptic curve is related to its zeros and poles. We start with the simpler case of a rational function of one variable. A rational function is a ratio of polynomials

f(X) = (a₀ + a₁X + a₂X² + · · · + aₙXⁿ) / (b₀ + b₁X + b₂X² + · · · + bₘX^m).

Any nonzero polynomial can be factored completely if we allow complex numbers, so a nonzero rational function can be factored as

f(X) = a(X − α₁)^{e₁}(X − α₂)^{e₂} · · · (X − α_r)^{e_r} / ( b(X − β₁)^{d₁}(X − β₂)^{d₂} · · · (X − β_s)^{d_s} ).

We may assume that α₁, . . . , α_r, β₁, . . . , β_s are distinct numbers, since otherwise we can cancel some of the terms in the numerator with some of the terms in the denominator. The numbers α₁, . . . , α_r are called the zeros of f(X) and the numbers β₁, . . . , β_s are called the poles of f(X). The exponents e₁, . . . , e_r, d₁, . . . , d_s are the associated multiplicities. We keep track of the zeros and poles of f(X) and their multiplicities by defining the divisor of f(X) to be the formal sum

div f(X) = e₁[α₁] + e₂[α₂] + · · · + e_r[α_r] − d₁[β₁] − d₂[β₂] − · · · − d_s[β_s].

Note that this is simply a convenient shorthand way of saying that f(X) has a zero of multiplicity e₁ at α₁, a zero of multiplicity e₂ at α₂, etc.

If E is an elliptic curve,

E : Y² = X³ + AX + B,

and if f(X, Y) is a nonzero rational function of two variables, we may view f as defining a function on E by writing points as P = (x, y) and setting f(P) = f(x, y). Then just as for rational functions of one variable, there are points of E where the numerator of f vanishes and there are points of E where the denominator of f vanishes, so f has zeros and poles on E. Further, one can assign multiplicities to the zeros and poles, so f has an associated divisor

div(f) = Σ_{P∈E} n_P [P].
In this formal sum, the coefficients nP are integers, and only finitely many of the nP are nonzero, so div(f) is a finite sum. Of course, the coordinates of the zeros and poles of f may require moving to a larger field. For example, if E is defined over Fp, then the poles and zeros of f have coordinates in Fpk for some k, but the value of k will, in general, depend on the function f.
Example 6.35. Suppose that the cubic polynomial used to define E factors as

X³ + AX + B = (X − α₁)(X − α₂)(X − α₃).

Then the points P₁ = (α₁, 0), P₂ = (α₂, 0), and P₃ = (α₃, 0) are distinct (see Remark 6.4) and satisfy

2P₁ = 2P₂ = 2P₃ = O,

i.e., they are points of order 2. The function Y, which remember is defined by

Y(P) = (the y-coordinate of the point P),

vanishes at these three points and at no other points P = (x, y). The divisor of Y has the form [P₁] + [P₂] + [P₃] − n[O] for some integer n, and it follows from Theorem 6.36 that n = 3, so

div(Y) = [P₁] + [P₂] + [P₃] − 3[O].

More generally, we define a divisor on E to be any formal sum

D = Σ_{P∈E} n_P [P] with n_P ∈ Z and n_P = 0 for all but finitely many P.

The degree of a divisor is the sum of its coefficients,

deg(D) = deg( Σ_{P∈E} n_P [P] ) = Σ_{P∈E} n_P.

We define the sum of a divisor by dropping the square brackets; thus

Sum(D) = Sum( Σ_{P∈E} n_P [P] ) = Σ_{P∈E} n_P P.

Note that n_P P means to add P to itself n_P times using the addition law on E.

It is natural to ask which divisors are divisors of functions, and to what extent the divisor of a function determines the function. These questions are answered by the following theorem.

Theorem 6.36. Let E be an elliptic curve.
(a) Let f and g be nonzero rational functions on E. If div(f) = div(g), then there is a nonzero constant c such that f = cg.
(b) Let D = Σ_{P∈E} n_P [P] be a divisor on E. Then D is the divisor of a rational function on E if and only if

deg(D) = 0 and Sum(D) = O.

In particular, if a rational function on E has no zeros or no poles, then it is constant.

Proof. Again we refer the reader to any elliptic curve textbook such as [136, Propositions II.3.1 and III.3.4].
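The description of div(Y) in Example 6.35 can be checked numerically over a small field where the cubic splits completely. For y² = x³ − x over F₇ (our illustrative choice, not the book's curve), the cubic is x(x − 1)(x + 1), so the function Y should vanish exactly at the three points of order 2:

```python
p = 7
# E : y^2 = x^3 - x = x(x - 1)(x + 1) over F_7 -- the cubic splits completely
pts = [(x, y) for x in range(p) for y in range(p)
       if (y * y - (x**3 - x)) % p == 0]

# zeros of the function Y on E are the points with y-coordinate 0,
# i.e. the three points of order 2 from Example 6.35
zeros_of_Y = sorted(P for P in pts if P[1] == 0)
print(zeros_of_Y)   # the roots of the cubic: x = 0, 1, and -1 = 6
```

Together with the triple pole of Y at O (which brute force over affine points cannot see), this is exactly div(Y) = [P₁] + [P₂] + [P₃] − 3[O].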
Example 6.37. Suppose that P ∈ E[m] is a point of order m. By definition, mP = O, so the divisor

m[P] − m[O]

satisfies the conditions of Theorem 6.36(b). Hence there is a rational function f_P(X, Y) on E satisfying

div(f_P) = m[P] − m[O].

The case m = 2 is particularly simple. A point P ∈ E has order 2 if and only if its Y-coordinate vanishes. If we let P = (α, 0) ∈ E[2], then the function f_P = X − α satisfies

div(X − α) = 2[P] − 2[O];

see Exercise 6.30.

6.8.3 The Weil Pairing

The Weil pairing, which is denoted by e_m, takes as input a pair of points P, Q ∈ E[m] and gives as output an mth root of unity e_m(P, Q). The bilinearity of the Weil pairing is expressed by the equations

e_m(P₁ + P₂, Q) = e_m(P₁, Q) e_m(P₂, Q),
e_m(P, Q₁ + Q₂) = e_m(P, Q₁) e_m(P, Q₂).   (6.15)

This is similar to the vector space bilinearity described in (6.14), but note that the bilinearity in (6.15) is multiplicative, in the sense that the quantities on the right-hand side are multiplied, while the bilinearity in (6.14) is additive, in the sense that the quantities on the right-hand side are added.

Definition. Let P, Q ∈ E[m], i.e., P and Q are points of order m in the group E. Let f_P and f_Q be rational functions on E satisfying

div(f_P) = m[P] − m[O] and div(f_Q) = m[Q] − m[O].

(See Example 6.37.) The Weil pairing of P and Q is the quantity

e_m(P, Q) = ( f_P(Q + S) / f_P(S) ) / ( f_Q(P − S) / f_Q(−S) ),   (6.16)

where S ∈ E is any point satisfying S ∉ {O, P, −Q, P − Q}. (This ensures that all of the quantities on the right-hand side of (6.16) are defined and nonzero.) One can check that the value of e_m(P, Q) does not depend on the choice of f_P, f_Q, and S; see Exercise 6.32.

Despite its somewhat arcane definition, the Weil pairing e_m has many useful properties.
Theorem 6.38. (a) The values of the Weil pairing satisfy

e_m(P, Q)^m = 1 for all P, Q ∈ E[m].

In other words, e_m(P, Q) is an mth root of unity.
(b) The Weil pairing is bilinear, which means that

e_m(P₁ + P₂, Q) = e_m(P₁, Q) e_m(P₂, Q) for all P₁, P₂, Q ∈ E[m],

and

e_m(P, Q₁ + Q₂) = e_m(P, Q₁) e_m(P, Q₂) for all P, Q₁, Q₂ ∈ E[m].

(c) The Weil pairing is alternating, which means that

e_m(P, P) = 1 for all P ∈ E[m].

This implies that e_m(P, Q) = e_m(Q, P)⁻¹ for all P, Q ∈ E[m]; see Exercise 6.31.
(d) The Weil pairing is nondegenerate, which means that

if e_m(P, Q) = 1 for all Q ∈ E[m], then P = O.

Proof. Some parts of Theorem 6.38 are easy to prove, while other parts are not so easy. For a complete proof, see for example [136, Section III.8].

Remark 6.39. Where does the Weil pairing come from? According to Proposition 6.33 (see also Remark 6.34), if we allow points with coordinates in a sufficiently large field, then E[m] looks like a 2-dimensional "vector space" over the "field" Z/mZ. So if we choose a basis P₁, P₂ ∈ E[m], then any element P ∈ E[m] can be written in terms of this basis as

P = a_P P₁ + b_P P₂ for unique a_P, b_P ∈ Z/mZ,

and then we can define an alternating bilinear pairing by using the determinant,

E[m] × E[m] → Z/mZ,   (P, Q) ⟼ det( a_P a_Q ; b_P b_Q ) = a_P b_Q − a_Q b_P.

But there are two problems with this pairing. First, it depends on choosing a basis, and second, there's no easy way to compute it other than writing P and Q in terms of the basis. However, it should come as no surprise that the determinant and the Weil pairing are closely related to one another. To be precise, if we let ζ = e_m(P₁, P₂), then it is easy to check that (see Exercise 6.33)

e_m(P, Q) = ζ^{det( a_P a_Q ; b_P b_Q )} = ζ^{a_P b_Q − a_Q b_P}.
The glory¹⁰ of the Weil pairing is that it can be computed quite efficiently without first expressing P and Q in terms of any particular basis of E[m]. (See Sect. 6.8.4 for a double-and-add algorithm to compute e_m(P, Q).) This is good, since expressing a point in terms of the basis P₁ and P₂ is at least as difficult as solving the ECDLP; see Exercise 6.10.

Example 6.40. We are going to compute e₂ directly from the definition. Let E be given by the equation

Y² = X³ + AX + B = (X − α₁)(X − α₂)(X − α₃).

Note that α₁ + α₂ + α₃ = 0, since the left-hand side has no X² term. The points

P₁ = (α₁, 0), P₂ = (α₂, 0), P₃ = (α₃, 0),

are points of order 2, and as noted in Example 6.37 (see also Exercise 6.30),

div(X − αᵢ) = 2[Pᵢ] − 2[O].

In order to compute e₂(P₁, P₂), we can take an arbitrary point S = (x, y) ∈ E. Using the addition formula, we find that the x-coordinate of P₁ − S is equal to

X(P₁ − S) = ( −y/(x − α₁) )² − x − α₁
 = ( y² − (x − α₁)²(x + α₁) ) / (x − α₁)²
 = ( (x − α₁)(x − α₂)(x − α₃) − (x − α₁)²(x + α₁) ) / (x − α₁)²
   since y² = (x − α₁)(x − α₂)(x − α₃),
 = ( (x − α₂)(x − α₃) − (x − α₁)(x + α₁) ) / (x − α₁)
 = ( (−α₂ − α₃)x + α₂α₃ + α₁² ) / (x − α₁)
 = ( α₁x + α₂α₃ + α₁² ) / (x − α₁)   since α₁ + α₂ + α₃ = 0.

Similarly,

X(P₂ + S) = ( α₂x + α₁α₃ + α₂² ) / (x − α₂).

¹⁰For those who have taken a course in abstract algebra, we mention that the other glorious property of the Weil pairing is that it interacts well with Galois theory. Thus let E be an elliptic curve over a field K, let L/K be a Galois extension, and let P, Q ∈ E(L)[m]. Then for every element g ∈ Gal(L/K), the Weil pairing obeys the rule e_m(g(P), g(Q)) = g(e_m(P, Q)).
Using the rational functions f_{Pᵢ} = X − αᵢ and assuming that P₁ and P₂ are distinct nonzero points in E[2], we find directly from the definition of e_m that

e₂(P₁, P₂) = ( f_{P₁}(P₂ + S)/f_{P₁}(S) ) / ( f_{P₂}(P₁ − S)/f_{P₂}(−S) )
 = ( (X(P₂ + S) − α₁)/(X(S) − α₁) ) / ( (X(P₁ − S) − α₂)/(X(−S) − α₂) )
 = ( ( (α₂x + α₁α₃ + α₂²)/(x − α₂) − α₁ ) / (x − α₁) ) / ( ( (α₁x + α₂α₃ + α₁²)/(x − α₁) − α₂ ) / (x − α₂) )
 = ( (α₂ − α₁)x + α₁α₃ + α₂² + α₁α₂ ) / ( (α₁ − α₂)x + α₂α₃ + α₁² + α₁α₂ )
 = ( (α₂ − α₁)x + α₂² − α₁² ) / ( (α₁ − α₂)x + α₁² − α₂² )   since α₁ + α₂ + α₃ = 0,
 = −1.

6.8.4 An Efficient Algorithm to Compute the Weil Pairing

In this section we describe a double-and-add method that can be used to efficiently compute the Weil pairing. The key idea, which is due to Victor Miller [89], is an algorithm to rapidly evaluate certain functions with specified divisors, as explained in the next theorem. (For further material on Miller's algorithm, see [136, Section XI.8].)

Theorem 6.41. Let E be an elliptic curve and let P = (x_P, y_P) and Q = (x_Q, y_Q) be nonzero points on E.
(a) Let λ be the slope of the line connecting P and Q, or the slope of the tangent line to E at P if P = Q. (If the line is vertical, we let λ = ∞.) Define a function g_{P,Q} on E as follows:

g_{P,Q} = (y − y_P − λ(x − x_P)) / (x + x_P + x_Q − λ²)   if λ ≠ ∞,
g_{P,Q} = x − x_P                                          if λ = ∞.

Then

div(g_{P,Q}) = [P] + [Q] − [P + Q] − [O].   (6.17)

(b) (Miller's Algorithm) Let m ≥ 1 and write the binary expansion of m as

m = m₀ + m₁ · 2 + m₂ · 2² + · · · + m_{n−1} · 2^{n−1}

with mᵢ ∈ {0, 1} and m_{n−1} ≠ 0. The following algorithm returns a function f_P whose divisor satisfies

div(f_P) = m[P] − [mP] − (m − 1)[O],
where the functions g_{T,T} and g_{T,P} used by the algorithm are as defined in (a).

[1] Set T = P and f = 1
[2] Loop i = n − 2 down to 0
[3]   Set f = f² · g_{T,T}
[4]   Set T = 2T
[5]   If mᵢ = 1
[6]     Set f = f · g_{T,P}
[7]     Set T = T + P
[8]   End If
[9] End i Loop
[10] Return the value f

In particular, if P ∈ E[m], then div(f_P) = m[P] − m[O].

Proof. (a) Suppose first that λ ≠ ∞ and let y = λx + ν be the line through P and Q, or the tangent line at P if P = Q. This line intersects E at the three points P, Q, and −P − Q, so

div(y − λx − ν) = [P] + [Q] + [−P − Q] − 3[O].

Vertical lines intersect E at points and their negatives, so

div(x − x_{P+Q}) = [P + Q] + [−P − Q] − 2[O].

It follows that

g_{P,Q} = (y − λx − ν) / (x − x_{P+Q})

has the desired divisor (6.17). Finally, the addition formula (Theorem 6.6) tells us that x_{P+Q} = λ² − x_P − x_Q, and we can eliminate ν from the numerator of g_{P,Q} using y_P = λx_P + ν.

If λ = ∞, then P + Q = O, so we want g_{P,Q} to have divisor [P] + [−P] − 2[O]. The function x − x_P has this divisor.

(b) This is a standard double-and-add algorithm, similar to others that we have seen in the past. The key to the algorithm comes from (a), which tells us that the functions g_{T,T} and g_{T,P} used in Steps 3 and 6 have divisors

div(g_{T,T}) = 2[T] − [2T] − [O] and div(g_{T,P}) = [T] + [P] − [T + P] − [O].

We leave to the reader the remainder of the proof, which is a simple induction using these relations.

Let P ∈ E[m]. The algorithm described in Theorem 6.41 tells us how to compute a function f_P with divisor m[P] − m[O]. Further, if R is any point of E, then we can compute f_P(R) directly by evaluating the functions g_{T,T}(R) and g_{T,P}(R) each time we execute Steps 3 and 6 of the algorithm. Notice that quantities of the form f_P(R) are exactly what are needed in order to evaluate
the Weil pairing e_m(P, Q). More precisely, given nonzero points P, Q ∈ E[m], we choose a point S ∉ {O, P, −Q, P − Q} and use Theorem 6.41 to evaluate

e_m(P, Q) = ( f_P(Q + S)/f_P(S) ) / ( f_Q(P − S)/f_Q(−S) )

by computing each of the functions at the indicated point.

Remark 6.42. For added efficiency, one can compute f_P(Q + S) and f_P(S) simultaneously, and similarly for f_Q(P − S) and f_Q(−S). Further savings are available using the Tate pairing, which is a variant of the Weil pairing that we describe briefly in Sect. 6.8.5.

Example 6.43. We take the elliptic curve

y² = x³ + 30x + 34

over the finite field F₆₃₁. The curve has

#E(F₆₃₁) = 650 = 2 · 5² · 13

points, and it turns out that it has 25 points of order 5. The points P = (36, 60) and Q = (121, 387) generate the points of order 5 in E(F₆₃₁). In order to compute the Weil pairing using Miller's algorithm, we want a point S that is not in the subgroup spanned by P and Q. We take S = (0, 36). The point S has order 130. Then Miller's algorithm gives

f_P(Q + S)/f_P(S) = 103/219 = 473 ∈ F₆₃₁.

Reversing the roles of P and Q and replacing S by −S, Miller's algorithm also gives

f_Q(P − S)/f_Q(−S) = 284/204 = 88 ∈ F₆₃₁.

Finally, taking the ratio of these two values yields

e₅(P, Q) = 473/88 = 242 ∈ F₆₃₁.

We check that (242)⁵ = 1, so e₅(P, Q) is a fifth root of unity in F₆₃₁.

Continuing to work on the same curve, we take P′ = (617, 5) and Q′ = (121, 244). Then a similar calculation gives

f_{P′}(Q′ + S)/f_{P′}(S) = 326/523 = 219 and f_{Q′}(P′ − S)/f_{Q′}(−S) = 483/576 = 83,

and taking the ratio of these two values yields

e₅(P′, Q′) = 219/83 = 512 ∈ F₆₃₁.

It turns out that P′ = 3P and Q′ = 4Q. We check that

e₅(P, Q)¹² = 242¹² = 512 = e₅(P′, Q′) = e₅(3P, 4Q),

which illustrates the bilinearity property of the Weil pairing.
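Theorem 6.41 and Example 6.43 can be checked with a short Python sketch. This is our own implementation, not code from the book: it implements g_{P,Q}, Miller's double-and-add loop, and the Weil pairing formula (6.16), specialized to the curve y² = x³ + 30x + 34 over F₆₃₁. (Modular inverses use three-argument `pow` with exponent −1, which needs Python 3.8+.)

```python
p, A = 631, 30          # curve y^2 = x^3 + 30x + 34 over F_631; O is None

def add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(n, P):
    R = None
    while n:
        if n & 1: R = add(R, P)
        P = add(P, P); n >>= 1
    return R

def g(P, Q, R):
    # evaluate g_{P,Q} of Theorem 6.41(a) at the point R = (x, y)
    (x1, y1), (x2, y2), (x, y) = P, Q, R
    if x1 == x2 and (y1 + y2) % p == 0:      # vertical line (lambda = infinity)
        return (x - x1) % p
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    num = (y - y1 - lam * (x - x1)) % p
    den = (x + x1 + x2 - lam * lam) % p
    return num * pow(den, -1, p) % p

def miller(P, m, R):
    # Miller's algorithm: evaluate f_P at R, div(f_P) = m[P] - [mP] - (m-1)[O]
    T, f = P, 1
    for b in bin(m)[3:]:                      # bits of m below the leading 1
        f = f * f * g(T, T, R) % p
        T = add(T, T)
        if b == '1':
            f = f * g(T, P, R) % p
            T = add(T, P)
    return f

def weil(P, Q, m, S):
    # e_m(P,Q) = (f_P(Q+S)/f_P(S)) / (f_Q(P-S)/f_Q(-S)), formula (6.16)
    a = miller(P, m, add(Q, S)) * pow(miller(P, m, S), -1, p) % p
    negS = (S[0], -S[1] % p)
    b = miller(Q, m, add(P, negS)) * pow(miller(Q, m, negS), -1, p) % p
    return a * pow(b, -1, p) % p

P, Q, S = (36, 60), (121, 387), (0, 36)
print(weil(P, Q, 5, S))   # Example 6.43: 242, a fifth root of unity
```

The same function reproduces the second computation in Example 6.43, since e₅((617, 5), (121, 244)) = e₅(3P, 4Q) = 512.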
6.8.5 The Tate Pairing

The Weil pairing is a nondegenerate bilinear form on elliptic curves defined over any field. For elliptic curves over finite fields there is another pairing, called the Tate pairing (or sometimes the Tate–Lichtenbaum pairing), that is often used in cryptography because it is computationally somewhat more efficient than the Weil pairing. In this section we briefly describe the Tate pairing. (For further material on the Tate pairing, see [136, Section XI.9].)

Definition. Let E be an elliptic curve over F_q, let ℓ be a prime, let P ∈ E(F_q)[ℓ], and let Q ∈ E(F_q). Choose a rational function f_P on E with

div(f_P) = ℓ[P] − ℓ[O].

The Tate pairing of P and Q is the quantity

τ(P, Q) = f_P(Q + S)/f_P(S) ∈ F_q*,

where S is any point in E(F_q) such that f_P(Q + S) and f_P(S) are defined and nonzero. It turns out that the value of the Tate pairing is well-defined only up to multiplying it by the ℓth power of an element of F_q*. If q ≡ 1 (mod ℓ), we define the (modified) Tate pairing of P and Q to be

τ̂(P, Q) = τ(P, Q)^{(q−1)/ℓ} = ( f_P(Q + S)/f_P(S) )^{(q−1)/ℓ} ∈ F_q*.

Theorem 6.44. Let E be an elliptic curve over F_q and let ℓ be a prime with q ≡ 1 (mod ℓ) and E(F_q)[ℓ] ≅ Z/ℓZ. Then the modified Tate pairing gives a well-defined map

τ̂ : E(F_q)[ℓ] × E(F_q)[ℓ] → F_q*

having the following properties:
(a) Bilinearity: τ̂(P₁ + P₂, Q) = τ̂(P₁, Q)τ̂(P₂, Q) and τ̂(P, Q₁ + Q₂) = τ̂(P, Q₁)τ̂(P, Q₂).
(b) Nondegeneracy: τ̂(P, P) is a primitive ℓth root of unity for all nonzero P ∈ E(F_q)[ℓ]. (A primitive ℓth root of unity is a number ζ ≠ 1 such that ζ^ℓ = 1.)

In applications such as tripartite Diffie–Hellman (Sect. 6.10.1) and ID-based cryptography (Sect. 6.10.2), one may use the Tate pairing in place of the Weil pairing. Note that Miller's algorithm gives an efficient way to compute the Tate pairing, since Theorem 6.41(b) explains how to rapidly compute the value of f_P.
6.9 The Weil Pairing over Fields of Prime Power Order

There are many applications of the Weil pairing in which it is necessary to work in fields F_{p^k} of prime power order. In this section we discuss the m-embedding degree, which is the smallest value of k such that E(F_{p^k})[m] is as large as possible, and we give an application called the MOV algorithm that reduces the ECDLP in E(F_p) to the DLP in F_{p^k}*. We then describe distortion maps on E and use them to define a modified Weil pairing ê_m for which ê_m(P, P) is nontrivial.

6.9.1 Embedding Degree and the MOV Algorithm

Let E be an elliptic curve over F_p and let m ≥ 1 be an integer with p ∤ m. In order to obtain nontrivial values of the Weil pairing e_m, we need to use independent points of order m on E. According to Proposition 6.33(b), the curve E has m² points of order m, but their coordinates may lie in a larger finite field.

Definition. Let E be an elliptic curve over F_p and let m ≥ 1 be an integer with p ∤ m. The embedding degree of E with respect to m is the smallest value of k such that

E(F_{p^k})[m] ≅ Z/mZ × Z/mZ.

For cryptographic applications, the most interesting case occurs when m is a (large) prime, in which case there are alternative characterizations of the embedding degree, as in the following result.

Proposition 6.45. Let E be an elliptic curve over F_p and let ℓ ≠ p be a prime. Assume that E(F_p) contains a point of order ℓ. Then the embedding degree of E with respect to ℓ is given by one of the following cases:
(i) The embedding degree of E is 1. (This cannot happen if ℓ > √p + 1; see Exercise 6.39.)
(ii) p ≡ 1 (mod ℓ) and the embedding degree is ℓ.
(iii) p ≢ 1 (mod ℓ) and the embedding degree is the smallest value of k ≥ 2 such that pᵏ ≡ 1 (mod ℓ).

Proof. The proof uses more advanced methods than we have at our disposal. See [147, Proposition 5.9] for a proof of case (iii), which is the case that most often occurs in practice.
The significance of the embedding degree k is that the Weil pairing embeds the ECDLP on the elliptic curve E(F_p) into the DLP in the field F_{p^k}. The basic setup is as follows. Let E be an elliptic curve over F_p and let P ∈ E(F_p) be a point of order ℓ, where ℓ is a large prime, say ℓ > √p + 1. Let k be the
embedding degree with respect to ℓ and suppose that we know how to solve the discrete logarithm problem in the field F_{p^k}. Let Q ∈ E(F_p) be a point that is a multiple of P. Then the following algorithm of Menezes, Okamoto, and Vanstone [82] solves the elliptic curve discrete logarithm problem for P and Q.

The MOV Algorithm
1. Compute the number of points N = #E(F_{p^k}). This is feasible if k is not too large, since there are polynomial-time algorithms to count the number of points on an elliptic curve; see Remarks 6.13 and 6.32. Note that ℓ | N, since by assumption E(F_p) has a point of order ℓ.
2. Choose a random point T ∈ E(F_{p^k}) with T ∉ E(F_p).
3. Compute T′ = (N/ℓ)T. If T′ = O, go back to Step 2. Otherwise, T′ is a point of order ℓ, so proceed to Step 4.
4. Compute the Weil pairing values

α = e_ℓ(P, T′) ∈ F_{p^k}* and β = e_ℓ(Q, T′) ∈ F_{p^k}*.

This can be done quite efficiently, in time proportional to log(pᵏ); see Sect. 6.8.4. If α = 1, return to Step 2.
5. Solve the DLP for α and β in F_{p^k}*, i.e., find an exponent n such that β = αⁿ. If pᵏ is not too large, this can be done using the index calculus. Note that the index calculus (Sect. 3.8) is a subexponential algorithm, so it is considerably faster than collision algorithms such as Pollard's ρ method (Sects. 5.4 and 5.5).
6. Then also Q = nP, so the ECDLP has been solved.

The MOV algorithm is summarized in Table 6.10. A few comments are in order.

Remark 6.46. How does one generate a random point T ∈ E(F_{p^k}) with T ∉ E(F_p) in Step 2? One method is to choose random values x ∈ F_{p^k} and check whether x³ + Ax + B is a square in F_{p^k}, which is easy to do, since z is a square in F_{p^k} if and only if z^{(pᵏ−1)/2} = 1. (We are assuming that p is an odd prime.) There then exist practical (i.e., polynomial-time) algorithms to compute square roots in finite fields, but to describe them would take us too far afield; see [28, §§1.5.1, 1.5.2].

Remark 6.47.
Why does the MOV algorithm solve the ECDLP? The point T′ constructed by the algorithm is generally independent of P, so the pair of points {P, T′} forms a basis for the 2-dimensional vector space

E[ℓ] = Z/ℓZ × Z/ℓZ.
It follows from the nondegeneracy of the Weil pairing that e_ℓ(P, T′) is a nontrivial ℓth root of unity in F_{p^k}*. In other words, e_ℓ(P, T′)^r = 1 if and only if ℓ | r. Suppose now that Q = jP and that our goal is to find the value of j modulo ℓ. The MOV algorithm finds an integer n satisfying e_ℓ(Q, T′) = e_ℓ(P, T′)ⁿ. The linearity of the Weil pairing implies that

e_ℓ(P, T′)ⁿ = e_ℓ(Q, T′) = e_ℓ(jP, T′) = e_ℓ(P, T′)ʲ,

so e_ℓ(P, T′)^{n−j} = 1. Hence n ≡ j (mod ℓ), which shows that n solves the ECDLP for P and Q.

Remark 6.48. How practical is the MOV algorithm? The answer, obviously, depends on the size of k. If k is large, say k > (ln p)², then the MOV algorithm is completely infeasible. For example, if p ≈ 2¹⁶⁰, then we would have to solve the DLP in F_{p^k} with k > 4000. Since a randomly chosen elliptic curve over F_p almost always has embedding degree that is much larger than (ln p)², it would seem that the MOV algorithm is not useful. However, there are certain special sorts of curves whose embedding degree is small. An important class of such curves consists of those satisfying #E(F_p) = p + 1. These supersingular elliptic curves generally have embedding degree k = 2, and in any case k ≤ 6. For example,

E : y² = x³ + x

is supersingular for any prime p ≡ 3 (mod 4), and it has embedding degree 2 for any ℓ > √p + 1. This means that solving the ECDLP in E(F_p) is no harder than solving the DLP in F_{p²}*, which makes E a very poor choice for use in cryptography.¹¹

Remark 6.49. An elliptic curve E over a finite field F_p is called anomalous if #E(F_p) = p. A number of people [114, 122, 141] more or less simultaneously observed that there is a very fast (linear-time) algorithm to solve the ECDLP on anomalous elliptic curves, so such curves must be avoided in cryptographic constructions.

There are also some cases in which the ECDLP is easier than expected for elliptic curves E over finite fields F_{2^m} when m is composite.
(A reason to use such fields is that field operations can sometimes be done more efficiently.) This attack uses a tool called Weil descent and was originally suggested by Gerhard Frey. The idea is to transfer an ECDLP in E(F2m ) to a discrete logarithm problem on a hyperelliptic curve (see Sect. 8.10) over a smaller field F2k , where k divides m. The details are complicated and beyond the scope of this book. See [29, §22.3] for details. 11Or so it would seem, but we will see in Sect. 6.9.3 that the ECDLP on E does have its uses in cryptography!
1. Compute the number of points N = #E(F_{p^k}).
2. Choose a random point T ∈ E(F_{p^k}) with T ∉ E(F_p).
3. Let T′ = (N/ℓ)T. If T′ = O, go back to Step 2. Otherwise T′ is a point of order ℓ, so proceed to Step 4.
4. Compute the Weil pairing values α = e_ℓ(P, T′) ∈ F_{p^k}* and β = e_ℓ(Q, T′) ∈ F_{p^k}*. If α = 1, go to Step 2.
5. Solve the DLP for α and β in F_{p^k}*, i.e., find an exponent n such that β = αⁿ.
6. Then also Q = nP, so the ECDLP has been solved.

Table 6.10: The MOV algorithm to solve the ECDLP

6.9.2 Distortion Maps and a Modified Weil Pairing

The Weil pairing is alternating, which means that e_m(P, P) = 1 for all P. In cryptographic applications we generally want to evaluate the pairing at points P₁ = aP and P₂ = bP, but using the Weil pairing directly is not helpful, since

e_m(P₁, P₂) = e_m(aP, bP) = e_m(P, P)^{ab} = 1^{ab} = 1.

One way around this dilemma is to choose an elliptic curve that has a "nice" map φ : E → E with the property that P and φ(P) are "independent" in E[m]. Then we can evaluate

e_m(P₁, φ(P₂)) = e_m(aP, φ(bP)) = e_m(aP, bφ(P)) = e_m(P, φ(P))^{ab}.

For cryptographic applications one generally takes m to be prime, so we restrict our attention to this case.

Definition. Let ℓ ≥ 3 be a prime, let E be an elliptic curve, let P ∈ E[ℓ] be a point of order ℓ, and let φ : E → E be a map from E to itself. We say that φ is an ℓ-distortion map for P if it has the following two properties¹²:
(i) φ(nP) = nφ(P) for all n ≥ 1.
(ii) The number e_ℓ(P, φ(P)) is a primitive ℓth root of unity. This means that e_ℓ(P, φ(P))^r = 1 if and only if r is a multiple of ℓ.

The next proposition gives various ways to check condition (ii).

Proposition 6.50. Let E be an elliptic curve, let ℓ ≥ 3 be a prime, and view E[ℓ] = Z/ℓZ × Z/ℓZ as a 2-dimensional vector space over the field Z/ℓZ. Let P, Q ∈ E[ℓ]. Then the following are equivalent:

¹²There are various definitions of distortion maps in the literature.
The one that we give distills the essential properties needed for most cryptographic applications. In practice, one also requires an efficient algorithm to compute φ.
(a) P and Q form a basis for the vector space E[ℓ].
(b) P ≠ O and Q is not a multiple of P.
(c) e_ℓ(P, Q) is a primitive ℓth root of unity.
(d) e_ℓ(P, Q) ≠ 1.

Proof. It is clear that (a) implies (b), since a basis consists of independent vectors. Conversely, suppose that (a) is false. This means that there is a linear relation

uP + vQ = O with u, v ∈ Z/ℓZ not both 0.

If v = 0, then P = O, so (b) is false. And if v ≠ 0, then v has an inverse in Z/ℓZ, so Q = −v⁻¹uP is a multiple of P, again showing that (b) is false. This completes the proof that (a) and (b) are equivalent.

To ease notation, we let ζ = e_ℓ(P, Q). From the definition of the Weil pairing, we know that ζ^ℓ = 1. Let r ≥ 1 be the smallest integer such that ζ^r = 1. Use the extended Euclidean algorithm (Theorem 1.11) to write the greatest common divisor of r and ℓ as

sr + tℓ = gcd(r, ℓ) for some s, t ∈ Z.

Then

ζ^{gcd(r,ℓ)} = ζ^{sr+tℓ} = (ζ^r)^s (ζ^ℓ)^t = 1.

The minimality of r tells us that r = gcd(r, ℓ), so r | ℓ. Since ℓ is prime, it follows that either r = 1, so ζ = 1, or else r = ℓ. This proves that (c) and (d) are equivalent.

We next verify that (a) implies (d). So we are given that P and Q are a basis for E[ℓ]. In particular, P ≠ O, so the nondegeneracy of the Weil pairing tells us that there is a point R ∈ E[ℓ] with e_ℓ(P, R) ≠ 1. Since P and Q are a basis for E[ℓ], we can write R as a linear combination of P and Q, say R = uP + vQ. Then the bilinearity and alternating properties of the Weil pairing yield

1 ≠ e_ℓ(P, R) = e_ℓ(P, uP + vQ) = e_ℓ(P, P)^u e_ℓ(P, Q)^v = e_ℓ(P, Q)^v.

Hence e_ℓ(P, Q) ≠ 1, which shows that (d) is true.

Finally, we show that (d) implies (b) by assuming that (b) is false and deducing that (d) is false. The assumption that (b) is false means that either P = O or Q = uP for some u ∈ Z/ℓZ. But if P = O, then

e_ℓ(P, Q) = e_ℓ(O, Q) = 1 by bilinearity,

while if Q = uP, then

e_ℓ(P, Q) = e_ℓ(P, uP) = e_ℓ(P, P)^u = 1^u = 1 by the alternating property of e_ℓ.
Thus in both cases we find that e_ℓ(P, Q) = 1, so (d) is false.
Definition. Let E be an elliptic curve, let P ∈ E[ℓ], and let φ be an ℓ-distortion map for P. The modified Weil pairing ê_ℓ on E[ℓ] (relative to φ) is defined by

ê_ℓ(Q, Q′) = e_ℓ(Q, φ(Q′)).

In cryptographic applications, the modified Weil pairing is evaluated at points that are multiples of P. The crucial property of the modified Weil pairing is its nondegeneracy, as described in the next result.

Proposition 6.51. Let E be an elliptic curve, let P ∈ E[ℓ], let φ be an ℓ-distortion map for P, and let ê_ℓ be the modified Weil pairing relative to φ. Let Q and Q′ be multiples of P. Then

ê_ℓ(Q, Q′) = 1 if and only if Q = O or Q′ = O.

Proof. We are given that Q and Q′ are multiples of P, so we can write them as Q = sP and Q′ = tP. The definition of distortion map and the linearity of the Weil pairing imply that

ê_ℓ(Q, Q′) = ê_ℓ(sP, tP) = e_ℓ(sP, φ(tP)) = e_ℓ(sP, tφ(P)) = e_ℓ(P, φ(P))^{st}.

The quantity e_ℓ(P, φ(P)) is a primitive ℓth root of unity, so

ê_ℓ(Q, Q′) = 1 ⟺ ℓ | st ⟺ ℓ | s or ℓ | t ⟺ Q = O or Q′ = O.

6.9.3 A Distortion Map on y² = x³ + x

In order to use the modified Weil pairing for cryptographic purposes, we need to give at least one example of an elliptic curve with a distortion map. In this section we give such an example for the elliptic curve y² = x³ + x over the field F_p with p ≡ 3 (mod 4). (See Exercise 6.43 for another example.) We start by describing the map φ.

Proposition 6.52. Let E be the elliptic curve

E : y² = x³ + x

over a field K and suppose that K has an element α ∈ K satisfying α² = −1. Define a map φ by

φ(x, y) = (−x, αy) and φ(O) = O.

(a) Let P ∈ E(K). Then φ(P) ∈ E(K), so φ is a map from E(K) to itself.
6.9. The Weil Pairing over Fields of Prime Power Order 353

(b) The map φ respects the addition law on E,¹³

φ(P1 + P2) = φ(P1) + φ(P2) for all P1, P2 ∈ E(K).

In particular, φ(nP) = nφ(P) for all P ∈ E(K) and all n ≥ 1.

Proof. (a) Let P = (x, y) ∈ E(K). Then

(αy)² = −y² = −(x³ + x) = (−x)³ + (−x),

so φ(P) = (−x, αy) ∈ E(K).

(b) Suppose that P1 = (x1, y1) and P2 = (x2, y2) are distinct points. Then using the elliptic curve addition algorithm (Theorem 6.6), we find that the x-coordinate of φ(P1) + φ(P2) is

x(φ(P1) + φ(P2)) = ((αy2 − αy1)/((−x2) − (−x1)))² − (−x1) − (−x2)
                 = α²((y2 − y1)/(x2 − x1))² + x1 + x2
                 = −[((y2 − y1)/(x2 − x1))² − x1 − x2]
                 = −x(P1 + P2).

Similarly, the y-coordinate of φ(P1) + φ(P2) is

y(φ(P1) + φ(P2)) = ((αy2 − αy1)/((−x2) − (−x1)))((−x1) − x(φ(P1) + φ(P2))) − αy1
                 = −α((y2 − y1)/(x2 − x1))(−x1 + x(P1 + P2)) − αy1
                 = α[((y2 − y1)/(x2 − x1))(x1 − x(P1 + P2)) − y1]
                 = αy(P1 + P2).

Hence

φ(P1) + φ(P2) = (−x(P1 + P2), αy(P1 + P2)) = φ(P1 + P2).

This handles the case that P1 ≠ P2. We leave the case P1 = P2 for the reader; see Exercise 6.38.

We now have the tools needed to construct a distortion map on the curve y² = x³ + x over certain finite fields.

¹³In the language of abstract algebra, the map φ is a homomorphism of the group E(K) to itself; see Exercise 2.13. In the language of algebraic geometry, a homomorphism from an elliptic curve to itself is called an isogeny.
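Part (a) of the proposition is easy to check numerically. The sketch below takes the sample prime p = 547 (p ≡ 3 (mod 4), an illustrative choice), represents F_{p²} as pairs a + bi with i² = −1, and verifies that φ carries every affine point of E(Fp) to a point on E over F_{p²}:

```python
# Numerical check of Proposition 6.52(a): if P = (x, y) lies on
# E : y^2 = x^3 + x, then phi(P) = (-x, alpha*y) lies on E as well,
# where alpha^2 = -1.  Sample field: F_{p^2} with p = 547, elements
# represented as pairs (a, b) <-> a + b*i with i^2 = -1.
p = 547

def fmul(u, v):
    """Multiply a + bi and c + di in F_{p^2}."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def fadd(u, v):
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

alpha = (0, 1)                           # alpha = i satisfies alpha^2 = -1
assert fmul(alpha, alpha) == (p - 1, 0)  # i^2 = -1 in F_{p^2}

def on_curve(x, y):
    """Test y^2 = x^3 + x with coordinates in F_{p^2}."""
    return fmul(y, y) == fadd(fmul(fmul(x, x), x), x)

# Apply phi to every affine point of E(F_p) and confirm phi(P) is on E.
count = 0
for x in range(p):
    for y in range(p):
        if (y * y - x * x * x - x) % p == 0:   # (x, y) in E(F_p)
            count += 1
            phiP = (((-x) % p, 0), fmul(alpha, (y, 0)))   # (-x, alpha*y)
            assert on_curve(*phiP)
print(count)  # 547 affine points, so #E(F_547) = 548 counting O
```

Note that α = i itself is not an element of Fp (as the next proposition proves for p ≡ 3 (mod 4)), which is exactly why φ moves points of E(Fp) off of E(Fp).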
354 6. Elliptic Curves and Cryptography

Proposition 6.53. Fix the following quantities.
• A prime p satisfying p ≡ 3 (mod 4).
• The elliptic curve E : y² = x³ + x.
• An element α ∈ Fp² satisfying α² = −1.
• The map φ(x, y) = (−x, αy).
• A prime ℓ ≥ 3 such that there exists a nonzero point P ∈ E(Fp)[ℓ].

Then φ is an ℓ-distortion map for P, i.e., the quantity ê(P, P) = eℓ(P, φ(P)) is a primitive ℓth root of unity.

Proof. We first note that Fp does not contain an element satisfying α² = −1. This is part of quadratic reciprocity (Theorem 3.62), but it is also easy to prove directly from the fact that F∗p is a group of order p − 1, so it cannot have any elements of order 4, since p ≡ 3 (mod 4). However, the field Fp² of order p² does contain a square root of −1, since if g is a primitive root for F∗p² (see Theorem 2.62), then α = g^((p²−1)/4) satisfies α⁴ = 1 and α² ≠ 1, so α² = −1.

Since P ≠ O, it is clear that φ(P) ≠ O. Since P is a point of order ℓ, Proposition 6.52(b) says that

ℓφ(P) = φ(ℓP) = φ(O) = O,

so φ(P) is a point of order ℓ. We are going to prove that φ(P) is not a multiple of P, and then Proposition 6.50 tells us that eℓ(P, φ(P)) is a primitive ℓth root of unity.

Suppose to the contrary that φ(P) is a multiple of P. We write P = (x, y) ∈ E(Fp). The coordinates of P are in Fp, so the coordinates of any multiple of P are also in Fp. Thus the coordinates of φ(P) = (−x, αy) would be in Fp. But α ∉ Fp, since Fp does not contain a square root of −1, so we must have y = 0. Then P = (x, 0) is a point of order 2, which is not possible, since P is a point of order ℓ with ℓ ≥ 3. Hence φ(P) is not a multiple of P and we are done.

Remark 6.54. We recall from Example 2.58 that if p ≡ 3 (mod 4), then the field with p² elements looks like

Fp² = {a + bi : a, b ∈ Fp}, where i satisfies i² = −1.

This makes it quite easy to work with the field Fp² in the context of Proposition 6.53.

Example 6.55. We take E : y² = x³ + x and the prime p = 547. Then

#E(F547) = 548 = 2² · 137.
6.9. The Weil Pairing over Fields of Prime Power Order 355

By trial and error we find the point P0 = (2, 253) ∈ E(F547), and then

P = (67, 481) = 4P0 = 4(2, 253) ∈ E(F547)

is a point of order 137. In order to find more points of order 137, we go to the larger field

F547² = {a + bi : a, b ∈ F547}, where i² = −1.

The distortion map gives φ(P) = (−67, 481i) ∈ E(F547²). In order to compute the Weil pairing of P and φ(P), we randomly choose a point

S = (256 + 110i, 441 + 15i) ∈ E(F547²)

and use Miller's algorithm to compute

fP(φ(P) + S)/fP(S) = (376 + 138i)/(384 + 76i) = 510 + 96i,
fφ(P)(P − S)/fφ(P)(−S) = (498 + 286i)/(393 + 120i) = 451 + 37i.

Then

ê137(P, P) = e137(P, φ(P)) = (510 + 96i)/(451 + 37i) = 37 + 452i ∈ F547².

We check that (37 + 452i)^137 = 1, so ê137(P, P) is indeed a primitive 137th root of unity in F547².

Example 6.56. Continuing with the curve E, prime p = 547, and point P = (67, 481) from Example 6.55, we use the MOV method to solve the ECDLP for the point Q = (167, 405) ∈ E(F547). The distortion map gives φ(Q) = (380, 405i), and we use the randomly chosen point S = (402 + 397i, 271 + 205i) ∈ E(F547²) to compute

ê137(P, Q) = e137(P, φ(Q)) = ((368 + 305i)/(348 + 66i)) / ((320 + 206i)/(175 + 351i)) = 530 + 455i ∈ F547².

From the previous example we have ê137(P, P) = 37 + 452i, so we need to solve the DLP

(37 + 452i)^n = 530 + 455i in F547².

The solution to this DLP is n = 83, and the MOV algorithm tells us that n = 83 is also a solution to the ECDLP. We check by verifying that Q = 83P.
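The final check Q = 83P is ordinary elliptic curve arithmetic over F547, so it is easy to reproduce. The sketch below implements the addition law for E : y² = x³ + x and double-and-add scalar multiplication, then confirms the order of P and the MOV answer from Example 6.56:

```python
# Sanity check of Examples 6.55/6.56 over F_547: the point P = (67, 481)
# on E : y^2 = x^3 + x has order 137, and the MOV answer n = 83
# satisfies Q = 83 P for Q = (167, 405).
# Points are (x, y) pairs; None stands for the identity O.
p = 547

def add(P, Q):
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                       # P + (-P) = O
    if P == Q:                            # doubling; curve is y^2 = x^3 + x
        lam = (3 * x1 * x1 + 1) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def smul(n, P):
    """Double-and-add scalar multiplication nP."""
    R = None
    while n:
        if n & 1:
            R = add(R, P)
        P = add(P, P)
        n >>= 1
    return R

P, Q = (67, 481), (167, 405)
assert smul(137, P) is None      # 137P = O; 137 prime, so ord(P) = 137
assert smul(83, P) == Q          # the MOV solution n = 83
print("Q = 83P confirmed")
```

(The modular inverse `pow(x, -1, p)` requires Python 3.8 or later.) The pairing computation itself needs Miller's algorithm over F547² and is not reproduced here.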
356 6. Elliptic Curves and Cryptography

6.10 Applications of the Weil Pairing

In Sect. 6.9.1 we described a negative application of the Weil pairing to cryptography, namely the MOV algorithm to solve the ECDLP for an elliptic curve over Fp by reducing the problem to the DLP in Fq, where q is a certain power of p. In this section we describe two positive applications of the Weil pairing to cryptography. The first is a version of Diffie–Hellman key exchange involving three people, and the second is an ID-based public key cryptosystem in which the public keys can be selected by their owners.

6.10.1 Tripartite Diffie–Hellman Key Exchange

We have seen in Sect. 6.4.1 how two people can perform a Diffie–Hellman key exchange using elliptic curves. Suppose that three people, Alice, Bob, and Carl, want to perform a triple exchange of keys with only one pass of information between each pair of people. This is possible using a clever pairing-based construction due to Antoine Joux [61, 62].

The first step is for Alice, Bob, and Carl to agree on an elliptic curve E and a point P ∈ E(Fq)[ℓ] of prime order ℓ such that there is an ℓ-distortion map for P. Let ê be the associated modified Weil pairing. As in ordinary Diffie–Hellman, they each choose a secret integer, say Alice chooses nA, Bob chooses nB, and Carl chooses nC. They compute the associated multiples

Alice computes QA = nAP, Bob computes QB = nBP, Carl computes QC = nCP.

They now publish the values of QA, QB, and QC. In order to compute the shared value, Alice computes the modified pairing of the public points QB and QC and then raises the result to the nA power, where nA is her secret integer. Thus Alice computes ê(QB, QC)^nA. The points QB and QC are certain multiples of P, and although Alice doesn't know what multiples, the bilinearity of the modified Weil pairing implies that the value computed by Alice is equal to

ê(QB, QC)^nA = ê(nBP, nCP)^nA = (ê(P, P)^(nB·nC))^nA.
Bob and Carl use their secret integers and the public points to perform similar computations:

Bob computes: ê(QA, QC)^nB = ê(nAP, nCP)^nB = (ê(P, P)^(nA·nC))^nB,
Carl computes: ê(QA, QB)^nC = ê(nAP, nBP)^nC = ê(P, P)^(nA·nB·nC).

Alice, Bob, and Carl have now shared the secret value ê(P, P)^(nA·nB·nC). Tripartite (three-person) Diffie–Hellman key exchange is summarized in Table 6.11.
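The bookkeeping above can be sketched in a toy model in which the modified Weil pairing is replaced by modular exponentiation, e(uP, vP) := g^(u·v), which has the same bilinearity. All concrete numbers (q = 1097, ℓ = 137) are illustrative assumptions, not values from the text, and in this scalar model everything is trivially invertible; the point is only to display how the three shared-value computations agree:

```python
# Toy model of tripartite Diffie-Hellman: the pairing is replaced by
# modular exponentiation e(uP, vP) := g^(u*v) mod q, and the "points"
# Q = nP are modeled by the scalars n themselves.
import random

q, l = 1097, 137               # q prime, l prime, l divides q - 1
g = pow(3, (q - 1) // l, q)    # element of order l; stands in for e(P, P)

def pair(u, v):                # bilinear stand-in for e(uP, vP)
    return pow(g, u * v, q)

nA, nB, nC = (random.randrange(1, l) for _ in range(3))
QA, QB, QC = nA, nB, nC        # the published "points"

shared_A = pow(pair(QB, QC), nA, q)    # Alice: e(QB, QC)^nA
shared_B = pow(pair(QA, QC), nB, q)    # Bob:   e(QA, QC)^nB
shared_C = pow(pair(QA, QB), nC, q)    # Carl:  e(QA, QB)^nC

assert shared_A == shared_B == shared_C == pow(g, nA * nB * nC, q)
print("shared value agreed")
```

A real implementation would of course compute ê with Miller's algorithm on actual curve points, where recovering the secret exponents is (conjecturally) hard.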
6.10. Applications of the Weil Pairing 357

Public parameter creation: A trusted authority publishes a finite field Fq, an elliptic curve E/Fq, a point P ∈ E(Fq) of prime order ℓ, and an ℓ-distortion map φ for P.
Private computations: Alice chooses a secret nA and computes QA = nAP. Bob chooses a secret nB and computes QB = nBP. Carl chooses a secret nC and computes QC = nCP.
Publication of values: Alice, Bob, and Carl publish their points QA, QB, and QC.
Further private computations: Alice computes ê(QB, QC)^nA. Bob computes ê(QA, QC)^nB. Carl computes ê(QA, QB)^nC.
The shared secret value is ê(P, P)^(nA·nB·nC).

Table 6.11: Tripartite Diffie–Hellman key exchange using elliptic curves

If Eve can solve the ECDLP, then clearly she can break tripartite Diffie–Hellman key exchange, since she will be able to recover the secret integers nA, nB, and nC. (Recovering any one of them would suffice.) But the security of tripartite DH does not rely solely on the difficulty of the ECDLP. Eve can use Alice's public point QA and the public point P to compute both ê(P, P) and

ê(QA, P) = ê(nAP, P) = ê(P, P)^nA.

Thus Eve can recover nA if she can solve the equation a^n = b in Fq, where she knows the values of a = ê(P, P) and b = ê(QA, P). In other words, the security of tripartite Diffie–Hellman also rests on the difficulty of solving the classical discrete logarithm problem for a subgroup of F∗q of order ℓ. (See also Exercise 6.48.) Since there are subexponential algorithms to solve the DLP in Fq (see Sect. 3.8), using tripartite Diffie–Hellman securely requires a larger field than does two-person elliptic curve Diffie–Hellman. This is a drawback, to be sure, but since there are no other methods known to do tripartite Diffie–Hellman, one accepts half a loaf in preference to going hungry.

Example 6.57. We illustrate tripartite Diffie–Hellman with a numerical example using the curve E : y² = x³ + x over the field F1303. This curve has

#E(F1303) = 1304 = 2³ · 163 points.
The point P = (334, 920) ∈ E(F1303) has order 163. Alice, Bob, and Carl choose the secret values nA = 71, nB = 3, nC = 126.
358 6. Elliptic Curves and Cryptography

They use their secret values to compute and publish:

Alice publishes the point QA = nAP = (1279, 1171),
Bob publishes the point QB = nBP = (872, 515),
Carl publishes the point QC = nCP = (196, 815).

Finally, Alice, Bob, and Carl use their own secret integers and the public points to compute:

Alice computes ê163(QB, QC)^71 = (172 + 256i)^71 = 768 + 662i,
Bob computes ê163(QA, QC)^3 = (1227 + 206i)^3 = 768 + 662i,
Carl computes ê163(QA, QB)^126 = (282 + 173i)^126 = 768 + 662i.

Their shared secret value is 768 + 662i.

6.10.2 ID-Based Public Key Cryptosystems

The goal of ID-based cryptography is very simple. One would like a public key cryptosystem in which the user's public key can be chosen by the user. For example, Alice might use her email address alice@liveshere.com as her identity-based public key, and then anyone who knows how to send her email automatically knows her public key. Of course, this idea is too simplistic; Alice must have some secret information that is used for decryption, and somehow that secret information must be used during the encryption process.

Here is a more sophisticated version of the same idea. We assume that there is a trusted authority Tom who is available to perform computations and distribute information. Tom publishes a master public key TomPub and keeps secret an associated private key TomPri. When Bob wants to send Alice a message, he uses the master public key TomPub and Alice's ID-based public key AlicePub (which, recall, could simply be her email address) in some sort of cryptographic algorithm to encrypt his message.

In the meantime, Alice tells Tom that she wants to use AlicePub as her ID-based public key. Tom uses the master private key TomPri and Alice's ID-based public key AlicePub to create a private key AlicePri for Alice. Alice then uses AlicePri to decrypt and read Bob's message.
The principle of ID-based cryptography is clear, but it is not easy to see how one might create a practical and secure ID-based public key cryptosystem. Remark 6.58. The trusted authority Tom needs to keep track of which public keys he has assigned, since otherwise Eve could send Alice’s public key to Tom and ask him to create and send her the associated private key, which would be the same as Alice’s private key. But there is another threat that must be countered. Eve is allowed to send Tom a large number of public keys of her choice (other than ones that have already been assigned to other people) and ask Tom to create the associated private keys. It is essential that
6.10. Applications of the Weil Pairing 359

knowledge of these additional private keys not allow Eve to recover Tom's master private key TomPri, since otherwise Eve would be able to reconstitute everyone's private keys! Further, Eve's possession of a large number of public–private key pairs should not allow her to create any additional public–private key pairs.

The idea of ID-based cryptography was initially described by Shamir in 1984 [125], and a practical ID-based system was devised by Boneh and Franklin in 2001 [20, 21]. This system, which we now describe, uses pairings on elliptic curves.

The first step is for Tom, the trusted authority, to select a finite field Fq, an elliptic curve E, and a point P ∈ E(Fq)[ℓ] of prime order ℓ such that there is an ℓ-distortion map for P. Let ê be the modified Weil pairing relative to the map. Tom also needs to publish two hash functions H1 and H2. (A hash function is a function that is easy to compute, but hard to invert. See Sect. 8.1 for a discussion of hash functions.) The first one assigns a point in E(Fq)[ℓ] to each possible user ID,¹⁴

H1 : {User IDs} −→ E(Fq)[ℓ].

The second hash function assigns to each element of F∗q a binary string of length B,

H2 : F∗q −→ {bit strings of length B},

where the set of plaintexts M is the set of all binary strings of length B.

Tom creates his master key by choosing a secret (nonzero) integer s modulo ℓ and computing the point

PTom = sP ∈ E(Fq)[ℓ].

Tom's master private key is the integer s and his master public key is the point PTom.

Now suppose that Bob wants to send Alice a message M ∈ M using her ID-based public key AlicePub. He uses her public key and the hash function H1 to compute the point

PAlice = H1(AlicePub) ∈ E(Fq)[ℓ].

He also chooses a random number (a random element) 1 ≤ r < q and computes the two quantities

C1 = rP and C2 = M xor H2(ê(PAlice, PTom)^r).   (6.18)

Here, to avoid confusion with addition of points on the elliptic curve, we write xor for the XOR operation on bit strings; see (1.12) on page 44. The ciphertext is the pair C = (C1, C2).

¹⁴There are various ways to define a hash function H1 with values in E(Fq)[ℓ]. For example, take a given User ID I, convert it to a binary string β, apply a hash function to β that takes values uniformly in {1, 2, . . . , ℓ − 1} to get an integer m, and set H1(I) = mP.
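The whole flow can be sketched in the same toy model used for tripartite Diffie–Hellman, with the pairing modeled multiplicatively as e(uP, vP) := g^(u·v) and the hashes built from SHA-256. All concrete values here (q = 1097, ℓ = 137, the email address) are illustrative assumptions; the sketch only shows why Bob's mask and Alice's mask coincide:

```python
# Toy sketch of the Boneh-Franklin identity-based flow, with the pairing
# modeled as e(uP, vP) := g^(u*v) mod q and "points" modeled by scalars.
import hashlib
import random

q, l, B = 1097, 137, 128
g = pow(3, (q - 1) // l, q)            # stands in for e(P, P), order l

def pair(u, v):                        # bilinear stand-in
    return pow(g, u * v, q)

def H1(identity):                      # ID -> "point" (scalar multiple of P)
    h = hashlib.sha256(identity.encode()).digest()
    return 1 + int.from_bytes(h, "big") % (l - 1)

def H2(x):                             # pairing value -> B-bit mask
    h = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(h[: B // 8], "big")

s = random.randrange(1, l)             # Tom's master private key
P_tom = s                              # models PTom = sP

# Bob encrypts M to Alice's identity:
M = random.getrandbits(B)
p_alice = H1("alice@liveshere.com")    # models PAlice = H1(AlicePub)
r = random.randrange(1, l)
C1 = r                                 # models C1 = rP
C2 = M ^ H2(pow(pair(p_alice, P_tom), r, q))   # Eq. (6.18)

# Tom extracts Alice's private key, and Alice decrypts:
q_alice = s * p_alice % l              # models QAlice = s * PAlice
recovered = C2 ^ H2(pair(q_alice, C1))
assert recovered == M
print("decryption recovers M")
```

Both masks equal H2(g^(s·p_alice·r)), which is exactly the bilinearity chain computed for ê(QAlice, C1) below.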
360 6. Elliptic Curves and Cryptography

Public parameter creation: A trusted authority Tom publishes a finite field Fq, an elliptic curve E/Fq, a point P ∈ E(Fq)[ℓ] of prime order ℓ, and an ℓ-distortion map φ for P. Tom also chooses hash functions H1 : {IDs} → E(Fq)[ℓ] and H2 : F∗q → {0, 1}^B.
Master key creation: Tom chooses a secret integer s modulo ℓ and publishes the point PTom = sP ∈ E(Fq)[ℓ].
Private key extraction: Alice chooses an ID-based public key AlicePub. Tom computes the point PAlice = H1(AlicePub) ∈ E(Fq)[ℓ] and sends the point QAlice = sPAlice ∈ E(Fq)[ℓ] to Alice.
Encryption: Bob chooses a plaintext M and a random number r modulo q − 1. Bob computes the point PAlice = H1(AlicePub) ∈ E(Fq)[ℓ]. Bob's ciphertext is the pair (C1, C2) = (rP, M xor H2(ê(PAlice, PTom)^r)).
Decryption: Alice decrypts the ciphertext (C1, C2) by computing C2 xor H2(ê(QAlice, C1)).

Table 6.12: Identity-based encryption using pairings on elliptic curves

In order to decrypt Bob's message, Alice needs to request that Tom give her the private key AlicePri associated to her ID-based public key AlicePub. She can do this ahead of time, or she can wait until she has received Bob's message. In any case, the private key that Tom gives to Alice is the point

QAlice = sPAlice = sH1(AlicePub) ∈ E(Fq)[ℓ].

In other words, Tom feeds Alice's public key to the hash function H1 to get a point in E(Fq)[ℓ], and then he multiplies that point by his secret key s.

Alice is finally ready to decrypt Bob's message (C1, C2). She first computes ê(QAlice, C1), which, by a chain of calculations using bilinearity, is equal to

ê(QAlice, C1) = ê(sPAlice, rP) = ê(PAlice, P)^(rs) = ê(PAlice, sP)^r = ê(PAlice, PTom)^r.

Notice that this is exactly the quantity that Bob used in (6.18) to create the second part of his ciphertext. Hence Alice can recover the plaintext by computing

C2 xor H2(ê(QAlice, C1))
Exercises 361

= M xor H2(ê(PAlice, PTom)^r) xor H2(ê(PAlice, PTom)^r) = M.

The last step follows because M xor N xor N = M for any bit strings M and N. The full process of ID-based encryption is summarized in Table 6.12.

Exercises

Section 6.1. Elliptic Curves

6.1. Let E be the elliptic curve E : Y² = X³ − 2X + 4 and let P = (0, 2) and Q = (3, −5). (You should check that P and Q are on the curve E.)
(a) Compute P ⊕ Q.
(b) Compute P ⊕ P and Q ⊕ Q.
(c) Compute P ⊕ P ⊕ P and Q ⊕ Q ⊕ Q.

6.2. Check that the points P = (−1, 4) and Q = (2, 5) are points on the elliptic curve E : Y² = X³ + 17.
(a) Compute the points P ⊕ Q and P ⊖ Q.
(b) Compute the points 2P and 2Q.
(Bonus. How many points with integer coordinates can you find on E?)

6.3. Suppose that the cubic polynomial X³ + AX + B factors as

X³ + AX + B = (X − e1)(X − e2)(X − e3).

Prove that 4A³ + 27B² = 0 if and only if two (or more) of e1, e2, and e3 are the same. (Hint. Multiply out the right-hand side and compare coefficients to relate A and B to e1, e2, and e3.)

6.4. Sketch each of the following curves, as was done in Fig. 6.1 on page 300.
(a) E : Y² = X³ − 7X + 3.
(b) E : Y² = X³ − 7X + 9.
(c) E : Y² = X³ − 7X − 12.
(d) E : Y² = X³ − 3X + 2.
(e) E : Y² = X³.
Notice that the curves in (d) and (e) have ΔE = 0, so they are not elliptic curves. How do their pictures differ from the pictures in (a), (b), and (c)? Each of the curves (d) and (e) has one point that is somewhat unusual. These unusual points are called singular points.

Section 6.2. Elliptic Curves over Finite Fields

6.5. For each of the following elliptic curves E and finite fields Fp, make a list of the set of points E(Fp).
(a) E : Y² = X³ + 3X + 2 over F7.
(b) E : Y² = X³ + 2X + 7 over F11.
362 Exercises

(c) E : Y² = X³ + 4X + 5 over F11.
(d) E : Y² = X³ + 9X + 5 over F11.
(e) E : Y² = X³ + 9X + 5 over F13.

6.6. Make an addition table for E over Fp, as we did in Table 6.1.
(a) E : Y² = X³ + X + 2 over F5.
(b) E : Y² = X³ + 2X + 3 over F7.
(c) E : Y² = X³ + 2X + 5 over F11.
You may want to write a computer program for (c), since E(F11) has a lot of points!

6.7. Let E be the elliptic curve E : y² = x³ + x + 1. Compute the number of points in the group E(Fp) for each of the following primes:
(a) p = 3. (b) p = 5. (c) p = 7. (d) p = 11.
In each case, also compute the trace of Frobenius tp = p + 1 − #E(Fp) and verify that |tp| is smaller than 2√p.

Section 6.3. The Elliptic Curve Discrete Logarithm Problem

6.8. Let E be the elliptic curve E : y² = x³ + x + 1 and let P = (4, 2) and Q = (0, 1) be points on E modulo 5. Solve the elliptic curve discrete logarithm problem for P and Q, that is, find a positive integer n such that Q = nP.

6.9. Let E be an elliptic curve over Fp and let P and Q be points in E(Fp). Assume that Q is a multiple of P and let n0 > 0 be the smallest solution to Q = nP. Also let s > 0 be the smallest solution to sP = O. Prove that every solution to Q = nP looks like n0 + is for some i ∈ Z. (Hint. Write n as n = is + r for some 0 ≤ r < s and determine the value of r.)

6.10. Let {P1, P2} be a basis for E[m]. The Basis Problem for {P1, P2} is to express an arbitrary point P ∈ E[m] as a linear combination of the basis vectors, i.e., to find n1 and n2 so that P = n1P1 + n2P2. Prove that an algorithm that solves the basis problem for {P1, P2} can be used to solve the ECDLP for points in E[m].

6.11. Use the double-and-add algorithm (Table 6.3) to compute nP in E(Fp) for each of the following curves and points, as we did in Fig. 6.4.
(a) E : Y² = X³ + 23X + 13, p = 83, P = (24, 14), n = 19;
(b) E : Y² = X³ + 143X + 367, p = 613, P = (195, 9), n = 23;
(c) E : Y² = X³ + 1828X + 1675, p = 1999, P = (1756, 348), n = 11;
(d) E : Y² = X³ + 1541X + 1335, p = 3221, P = (2898, 439), n = 3211.
Exercises 363

6.12. Convert the proof of Proposition 6.18 into an algorithm and use it to write each of the following numbers n as a sum of positive and negative powers of 2 with at most ½ log n + 1 nonzero terms. Compare the number of nonzero terms in the binary expansion of n with the number of nonzero terms in the ternary expansion of n.
(a) 349. (b) 9337. (c) 38728. (d) 8379483273489.

6.13. In Sect. 5.5 we gave an abstract description of Pollard's ρ method, and in Sect. 5.5.2 we gave an explicit version to solve the discrete logarithm problem in Fp. Adapt this material to create a Pollard ρ algorithm to solve the ECDLP.

Section 6.4. Elliptic Curve Cryptography

6.14. Alice and Bob agree to use elliptic Diffie–Hellman key exchange with the prime, elliptic curve, and point

p = 2671, E : Y² = X³ + 171X + 853, P = (1980, 431) ∈ E(F2671).

(a) Alice sends Bob the point QA = (2110, 543). Bob decides to use the secret multiplier nB = 1943. What point should Bob send to Alice?
(b) What is their secret shared value?
(c) How difficult is it for Eve to figure out Alice's secret multiplier nA? If you know how to program, use a computer to find nA.
(d) Alice and Bob decide to exchange a new piece of secret information using the same prime, curve, and point. This time Alice sends Bob only the x-coordinate xA = 2 of her point QA. Bob decides to use the secret multiplier nB = 875. What single number modulo p should Bob send to Alice, and what is their secret shared value?

6.15. Exercise 2.10 on page 109 describes a multistep public key cryptosystem based on the discrete logarithm problem for Fp. Describe a version of this cryptosystem that uses the elliptic curve discrete logarithm problem. (You may assume that Alice and Bob know the order of the point P in the group E(Fp), i.e., they know the smallest integer N ≥ 1 with the property that NP = O.)

6.16.
A shortcoming of using an elliptic curve E(Fp) for cryptography is the fact that it takes two coordinates to specify a point in E(Fp). However, as discussed briefly at the end of Sect. 6.4.2, the second coordinate actually conveys very little additional information.
(a) Suppose that Bob wants to send Alice the value of a point R ∈ E(Fp). Explain why it suffices for Bob to send Alice the x-coordinate of R = (xR, yR) together with the single bit

βR = 0 if 0 ≤ yR < ½p, or βR = 1 if ½p < yR < p.

(You may assume that Alice is able to efficiently compute square roots modulo p. This is certainly true, for example, if p ≡ 3 (mod 4); see Proposition 2.26.)
(b) Alice and Bob decide to use the prime p = 1123 and the elliptic curve

E : Y² = X³ + 54X + 87.
364 Exercises

Bob sends Alice the x-coordinate x = 278 and the bit β = 0. What point is Bob trying to convey to Alice? What about if instead Bob had sent β = 1?

6.17. The Menezes–Vanstone variant of the elliptic Elgamal public key cryptosystem improves message expansion while avoiding the difficulty of directly attaching plaintexts to points in E(Fp). The MV-Elgamal cryptosystem is described in Table 6.13 on page 365.
(a) The last line of Table 6.13 claims that m′1 = m1 and m′2 = m2. Prove that this is true, so the decryption process does work.
(b) What is the message expansion of MV-Elgamal?
(c) Alice and Bob agree to use

p = 1201, E : Y² = X³ + 19X + 17, P = (278, 285) ∈ E(Fp),

for MV-Elgamal. Alice's secret value is nA = 595. What is her public key? Bob sends Alice the encrypted message ((1147, 640), 279, 1189). What is the plaintext?

6.18. This exercise continues the discussion of the MV-Elgamal cryptosystem described in Table 6.13 on page 365.
(a) Eve knows the elliptic curve E and the ciphertext values c1 and c2. Show how Eve can use this knowledge to write down a polynomial equation (modulo p) that relates the two pieces m1 and m2 of the plaintext. In particular, if Eve can figure out one piece of the plaintext, then she can recover the other piece by finding the roots of a certain polynomial modulo p.
(b) Alice and Bob exchange a message using MV-Elgamal with the prime, elliptic curve, and point in Exercise 6.17(c). Eve intercepts the ciphertext ((269, 339), 814, 1050), and through other sources she discovers that the first part of the plaintext is m1 = 1050. Use your algorithm in (a) to recover the second part of the plaintext.

6.19. Section 6.4.3 describes ECDSA, an elliptic analogue of DSA. Formulate an elliptic analogue of the simpler Elgamal digital signature algorithm described in Table 4.2 in Sect. 4.3.

6.20.
This exercise asks you to compute some numerical instances of the elliptic curve digital signature algorithm described in Table 6.7 for the public parameters

E : y² = x³ + 231x + 473, p = 17389, q = 1321, G = (11259, 11278) ∈ E(Fp).

You should begin by verifying that G is a point of order q in E(Fp).
(a) Samantha's private signing key is s = 542. What is her public verification key? What is her digital signature on the document d = 644 using the random element e = 847?
(b) Tabitha's public verification key is V = (11017, 14637). Is (s1, s2) = (907, 296) a valid signature on the document d = 993?
(c) Umberto's public verification key is V = (14594, 308). Use any method that you want to find Umberto's private signing key, and then use the private key to forge his signature on the document d = 516 using the random element e = 365.
Exercises 365

Public Parameter Creation: A trusted party chooses and publishes a (large) prime p, an elliptic curve E over Fp, and a point P in E(Fp).
Key Creation (Alice): Alice chooses a secret multiplier nA, computes QA = nAP, and publishes the public key QA.
Encryption (Bob): Bob chooses plaintext values m1 and m2 modulo p and a random number k. He computes R = kP and S = kQA, writing S = (xS, yS). He sets c1 ≡ xS·m1 (mod p) and c2 ≡ yS·m2 (mod p), and sends the ciphertext (R, c1, c2) to Alice.
Decryption (Alice): Alice computes T = nAR and writes it as T = (xT, yT). She sets m′1 ≡ xT⁻¹·c1 (mod p) and m′2 ≡ yT⁻¹·c2 (mod p). Then m′1 = m1 and m′2 = m2.

Table 6.13: Menezes–Vanstone variant of Elgamal (Exercises 6.17, 6.18)

Section 6.6. Lenstra's Elliptic Curve Factorization Algorithm

6.21. Use the elliptic curve factorization algorithm to factor each of the numbers N using the given elliptic curve E and point P.
(a) N = 589, E : Y² = X³ + 4X + 9, P = (2, 5).
(b) N = 26167, E : Y² = X³ + 4X + 128, P = (2, 12).
(c) N = 1386493, E : Y² = X³ + 3X − 3, P = (1, 1).
(d) N = 28102844557, E : Y² = X³ + 18X − 453, P = (7, 4).

Section 6.7. Elliptic Curves over F2 and over F2k

6.22. Let E be an elliptic curve given by a generalized Weierstrass equation

E : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6.

Let P1 = (x1, y1) and P2 = (x2, y2) be points on E. Prove that the following algorithm computes their sum P3 = P1 + P2.
366 Exercises

First, if x1 = x2 and y1 + y2 + a1x2 + a3 = 0, then P1 + P2 = O. Otherwise define quantities λ and ν as follows:

[If x1 ≠ x2]  λ = (y2 − y1)/(x2 − x1),  ν = (y1x2 − y2x1)/(x2 − x1),
[If x1 = x2]  λ = (3x1² + 2a2x1 + a4 − a1y1)/(2y1 + a1x1 + a3),  ν = (−x1³ + a4x1 + 2a6 − a3y1)/(2y1 + a1x1 + a3).

Then

P3 = P1 + P2 = (λ² + a1λ − a2 − x1 − x2, −(λ + a1)x3 − ν − a3).

6.23. Let F8 = F2[T]/(T³ + T + 1) be as in Example 6.28, and let E be the elliptic curve

E : Y² + XY + Y = X³ + TX + (T + 1).

(a) Calculate the discriminant of E.
(b) Verify that the points

P = (1 + T + T², 1 + T), Q = (T², T), R = (1 + T + T², 1 + T²),

are in E(F8) and compute the values of P + Q and 2R.
(c) Find all of the points in E(F8).
(d) Find a point P ∈ E(F8) such that every point in E(F8) is a multiple of P.

6.24. Let τ(α) = α^p be the Frobenius map on Fpk.
(a) Prove that τ(α + β) = τ(α) + τ(β) and τ(α · β) = τ(α) · τ(β) for all α, β ∈ Fpk. (Hint. For the addition formula, use the binomial theorem (Theorem 5.10).)
(b) Prove that τ(α) = α for all α ∈ Fp.
(c) Let E be an elliptic curve over Fp and let τ(x, y) = (x^p, y^p) be the Frobenius map from E(Fpk) to itself. Prove that τ(P + Q) = τ(P) + τ(Q) for all P, Q ∈ E(Fpk).

6.25. Let E0 be the Koblitz curve Y² + XY = X³ + 1 over the field F2, and for every k ≥ 1, let

tk = 2^k + 1 − #E(F2k).
Exercises 367

(a) Prove that t1 = −1 and t2 = −3.
(b) Prove that tk satisfies the recursion tk = t1tk−1 − 2tk−2 for all k ≥ 3. (You may use the formula (6.12) that we stated, but did not prove, on page 334.)
(c) Use the recursion in (b) to compute #E(F16).
(d) Program a computer to calculate the recursion and use it to compute the values of #E(F2^11), #E(F2^31), and #E(F2^101).

6.26. Let E be an elliptic curve over Fp, and for k ≥ 1, let tk = p^k + 1 − #E(Fpk).
(a) Prove that tk = t1tk−1 − ptk−2 for all k ≥ 2, where by convention we set t0 = 2.
(b) Use (a) to express t2, t3, and t4 in terms of p and t1. (Hint. Use Theorem 6.29(a). This generalizes Exercise 6.25.)

6.27. Let τ satisfy τ² = −2 − τ. Prove that the following algorithm gives coefficients vi ∈ {−1, 0, 1} such that the positive integer n is equal to

n = v0 + v1τ + v2τ² + · · · + vℓτ^ℓ. (6.19)

Further prove that ℓ ≤ 2⌈log₂(n)⌉ + 1.

[1] Set n0 = n and n1 = 0 and i = 0
[2] Loop while n0 ≠ 0 or n1 ≠ 0
[3]   If n0 is odd
[4]     Set vi = 2 − ((n0 − 2n1) mod 4)
[5]     Set n0 = n0 − vi
[6]   Else
[7]     Set vi = 0
[8]   End If
[9]   Set i = i + 1
[10]  Set (n0, n1) = (n1 − ½n0, −½n0)
[11] End Loop

6.28. Implement the algorithm in Exercise 6.27 and use it to compute the τ-expansion (6.19) of the following integers. What is the highest power of τ that appears and how many nonzero terms are there?
368 Exercises

(a) n = 931 (b) n = 32755 (c) n = 82793729188

Section 6.8. Bilinear Pairings on Elliptic Curves

6.29. Let R(x) and S(x) be rational functions. Prove that the divisor of a product is the sum of the divisors, i.e.,

div(R(x)S(x)) = div(R(x)) + div(S(x)).

6.30. This exercise sketches a proof that if P = (α, 0) ∈ E, then div(X − α) = 2[P] − 2[O].
(a) Prove that div(X − α) = m[P] − m[O] for some integer m ≥ 1.
(b) Prove that the Weierstrass equation of E can be written in the form

E : Y² = (X − α)(X² + aX + b),

and that the polynomials X − α and X² + aX + b have no common roots.
(c) Prove that div(X − α) = 2n[P] − 2n[O] for some integer n ≥ 1. (Hint. Take the divisor of both sides of Y² = (X − α)(X² + aX + b) and use (b).)
(d) Prove that div(X − α) = 2[P] − 2[O]. (Warning. This part requires some knowledge of discrete valuation rings that is not developed in this book.)

6.31. Prove that the Weil pairing satisfies em(P, Q) = em(Q, P)⁻¹ for all P, Q ∈ E[m]. (Hint. Use the fact that em(P + Q, P + Q) = 1 and expand using bilinearity.)

6.32. This exercise asks you to verify that the Weil pairing em is well-defined.
(a) Prove that the value of em(P, Q) is independent of the choice of rational functions fP and fQ.
(b) Prove that the value of em(P, Q) is independent of the auxiliary point S. (Hint. Fix the points P and Q and consider the quantity

F(S) = (fP(Q + S)/fP(S)) / (fQ(P − S)/fQ(−S))

as a function of S. Compute the divisor of F and use the fact that every nonconstant function on E has at least one zero.)
You might also try to prove that the Weil pairing is bilinear, but do not be discouraged if you do not succeed, since the standard proofs use more tools than we have developed in the text.
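The τ-adic algorithm of Exercise 6.27 above is easy to check numerically: represent elements a + bτ of Z[τ] as pairs (a, b), use the relation τ² = −2 − τ to multiply by τ, and confirm that the emitted coefficients re-evaluate to n. A minimal sketch (function names are my own):

```python
# Implementation and self-check of the coefficient algorithm in
# Exercise 6.27: n = v0 + v1*tau + ... + v_l*tau^l with tau^2 = -2 - tau
# and each v_i in {-1, 0, 1}.
def tau_expand(n):
    n0, n1, v = n, 0, []
    while n0 != 0 or n1 != 0:
        if n0 % 2 != 0:                      # steps [3]-[5]
            vi = 2 - ((n0 - 2 * n1) % 4)
            n0 -= vi
        else:                                # step [7]
            vi = 0
        v.append(vi)
        n0, n1 = n1 - n0 // 2, -n0 // 2      # step [10]: divide by tau
    return v

def evaluate(v):
    """Evaluate sum v_i * tau^i in Z[tau], returning (a, b) = a + b*tau."""
    a, b = 0, 0
    for vi in reversed(v):                   # Horner's rule
        a, b = -2 * b + vi, a - b            # (a + b*tau)*tau = -2b + (a - b)*tau
    return a, b

for n in (931, 32755, 82793729188):          # the values from Exercise 6.28
    v = tau_expand(n)
    assert set(v) <= {-1, 0, 1}
    assert evaluate(v) == (n, 0)             # the expansion really equals n
print("tau-expansions verified")
```

The evaluation step uses only the defining relation τ² = −2 − τ, so the check is independent of the expansion algorithm itself.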
Exercises 369

6.33. Choose a basis {P1, P2} for E[m] and write each P ∈ E[m] as a linear combination P = aP·P1 + bP·P2. (See Remark 6.39.) Use the basic properties of the Weil pairing described in Theorem 6.38 to prove that

em(P, Q) = em(P1, P2)^det(aP aQ; bP bQ) = em(P1, P2)^(aP·bQ − aQ·bP).

6.34. Complete the proof of Proposition 6.52 by proving that φ(2P) = 2φ(P).

6.35. For each of the following elliptic curves E, finite fields Fp, points P and Q of order m, and auxiliary points S, use Miller's algorithm to compute the Weil pairing em(P, Q). (See Example 6.43.)
(a) E : y² = x³ + 23, p = 1051, P = (109, 203), Q = (240, 203), m = 5, S = (1, 554).
(b) E : y² = x³ − 35x − 9, p = 883, P = (5, 66), Q = (103, 602), m = 7, S = (1, 197).
(c) E : y² = x³ + 37x, p = 1009, P = (8, 703), Q = (49, 20), m = 7, S = (0, 0).
(d) E : y² = x³ + 37x, p = 1009, P = (417, 952), Q = (561, 153), m = 7, S = (0, 0).
Notice that (c) and (d) use the same elliptic curve. Letting P′ and Q′ denote the points in (d), verify that P′ = 2P, Q′ = 3Q, and e7(P′, Q′) = e7(P, Q)⁶.

6.36. Let E over Fq and ℓ be as described in Theorem 6.44. Prove that the modified Tate pairing is symmetric, in the sense that τ̂(P, Q) = τ̂(Q, P) for all P, Q ∈ E(Fq)[ℓ].

6.37. Let E be an elliptic curve over Fq and let P, Q ∈ E(Fq)[ℓ]. Prove that the Weil pairing and the Tate pairing are related by the formula

eℓ(P, Q) = τ(P, Q)/τ(Q, P),

provided that the Tate pairings on the right-hand side are computed consistently. Thus the Weil pairing requires approximately twice as much work to compute as does the Tate pairing.

Section 6.9. The Weil Pairing over Fields of Prime Power Order

6.38. Prove Proposition 6.52(b) in the case P1 = P2.

6.39. Let E be an elliptic curve over Fp and let ℓ be a prime. Suppose that E(Fp) contains a point of order ℓ and that ℓ > √p + 1. Prove that E(Fp)[ℓ] ≅ Z/ℓZ.

6.40. Let E be an elliptic curve over a finite field Fq and let ℓ be a prime. Suppose that we are given four points P, aP, bP, cP ∈ E(Fq)[ℓ]. The (elliptic) decision Diffie–Hellman problem is to determine whether cP is equal to abP. Of course, if we could solve the Diffie–Hellman problem itself, then we could compute abP and compare it with cP, but the Diffie–Hellman problem is often difficult to solve. Suppose that there exists a distortion map φ for E[ℓ].
Show how to use the modified Weil pairing to solve the elliptic decision Diffie–Hellman problem without actually having to compute abP.
6.41. Let E be the elliptic curve E : y² = x³ + x and let φ(x, y) = (−x, αy) be the map described in Proposition 6.52. Prove that φ(φ(P)) = −P for all P ∈ E. (Intuitively, φ behaves like multiplication by √−1 when it is applied to points of E.)

6.42. Let p ≡ 3 (mod 4), let E : y² = x³ + x, let P ∈ E(Fp)[ℓ], and let φ(x, y) = (−x, αy) be the ℓ-distortion map for P described in Proposition 6.53. Suppose further that ℓ ≡ 3 (mod 4). Prove that φ is an ℓ-distortion map for every point in E[ℓ]. In other words, if Q ∈ E is any point of order ℓ, prove that eℓ(Q, φ(Q)) is a primitive ℓth root of unity.

6.43. Let E be the elliptic curve E : y² = x³ + 1 over a field K, and suppose that K contains an element β ≠ 1 satisfying β³ = 1. (We say that β is a primitive cube root of unity.) Define a map φ by φ(x, y) = (βx, y) and φ(O) = O.
(a) Let P ∈ E(K). Prove that φ(P) ∈ E(K).
(b) Prove that φ respects the addition law on E, i.e., φ(P1 + P2) = φ(P1) + φ(P2) for all P1, P2 ∈ E(K).

6.44. Let E : y² = x³ + 1 be the elliptic curve in Exercise 6.43.
(a) Let p ≥ 3 be a prime with p ≡ 2 (mod 3). Prove that Fp does not contain a primitive cube root of unity, but that Fp² does contain a primitive cube root of unity.
(b) Let β ∈ Fp² be a primitive cube root of unity and define a map φ(x, y) = (βx, y) as in Exercise 6.43. Suppose that E(Fp) contains a point P of prime order ℓ ≥ 5. Prove that φ is an ℓ-distortion map for P.

6.45. Let E be the elliptic curve E : y² = x³ + x over the field F691. The point P = (301, 14) ∈ E(F691) has order 173. Use the distortion map on E from Exercise 6.42 to compute ê173(P, P) (cf. Example 6.55). Verify that the value is a primitive 173rd root of unity.

6.46. Continuing with the curve E, prime p = 691, and point P = (301, 14) from Exercise 6.45, let Q = (143, 27) ∈ E(F691). Use the MOV method to solve the ECDLP for P and Q, i.e., compute ê173(P, Q) and express it as the nth power of ê173(P, P). Check your answer by verifying that nP is equal to Q.
Section 6.10. Applications of the Weil Pairing

6.47. Alice, Bob, and Carl use tripartite Diffie–Hellman with the curve E : y² = x³ + x over the field F1723. They use the point P = (668, 995) of order 431.
(a) Alice chooses the secret value nA = 278. What is Alice's public point QA?
(b) Bob's public point is QB = (1275, 1550) and Carl's public point is QC = (897, 1323). What is the value of ê431(QB, QC)?
(c) What is their shared value?
(d) Bob's secret value is nB = 224. Verify that ê431(QA, QC)^nB is the same as the value that you got in (c).
(e) Figure out Carl's secret value nC. (Since P has order 431, you can do this on a computer by trying all possible values.)

6.48. Show that Eve can break tripartite Diffie–Hellman key exchange as described in Table 6.10.1 if she knows how to solve the Diffie–Hellman problem (page 69) for the field Fq.

6.49. In this exercise we consider what is required to break the identity-based encryption scheme described in Table 6.12 on page 360.
(a) Show that if Eve can solve the discrete logarithm problem in either E(Fq) or in F*q, then she can recover Tom's secret key s, which means that she can do anything that Tom can do, including decrypting everyone's ciphertexts.
(b) Suppose that Eve only knows how to solve the elliptic curve Diffie–Hellman problem in E(Fq), as described on page 318. Show that she can decrypt all ciphertexts.
(c) What if Eve only knows how to solve the Diffie–Hellman problem in F*q? Can she still decrypt all ciphertexts?
Chapter 7

Lattices and Cryptography

© Springer Science+Business Media New York 2014. J. Hoffstein et al., An Introduction to Mathematical Cryptography, Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2_7

The security of all of the public key cryptosystems that we have previously studied has been based, either directly or indirectly, on either the difficulty of factoring large numbers or the difficulty of finding discrete logarithms in a finite group. In this chapter we investigate a new type of hard problem arising in the theory of lattices that can be used as the basis for a public key cryptosystem. Lattice-based cryptosystems offer several potential advantages over earlier systems, including faster encryption/decryption and so-called quantum resistance. The latter means that at present there are no known quantum algorithms to rapidly solve hard lattice problems; see Sect. 8.11. Further, we will see that the theory of lattices has applications in cryptography beyond simply providing a new source of hard problems.

Recall that a vector space V over the real numbers R is a set of vectors, where two vectors can be added together and a vector can be multiplied by a real number. A lattice is similar to a vector space, except that we are restricted to multiplying the vectors in a lattice by integers. This seemingly minor restriction leads to many interesting and subtle questions. Since the subject of lattices can appear somewhat abstruse and removed from the everyday reality of cryptography, we begin this chapter with two motivating examples in which lattices are not mentioned, but where they are lurking in the background, waiting to be used for cryptanalysis. We then review the theory of vector spaces in Sect. 7.3 and formally introduce lattices in Sect. 7.4.

7.1 A Congruential Public Key Cryptosystem

In this section we describe a toy model of a real public key cryptosystem. This version turns out to have an unexpected connection with lattices of dimension 2, and hence a fatal vulnerability, since the dimension is so low. However,
it is instructive as an example of how lattices may appear in cryptanalysis even when the underlying hard problem appears to have nothing to do with lattices. Further, it provides a lowest-dimensional introduction to the NTRU public key cryptosystem, which will be described in Sect. 7.10.

Alice begins by choosing a large positive integer q, which is a public parameter, and two other secret positive integers f and g satisfying

f < √(q/2),   √(q/4) < g < √(q/2),   and   gcd(f, qg) = 1.

She then computes the quantity

h ≡ f⁻¹g (mod q)   with 0 < h < q.

Notice that f and g are small compared to q, since they are O(√q), while the quantity h will generally be O(q), which is considerably larger. Alice's private key is the pair of small integers f and g, and her public key is the large integer h.

In order to send a message, Bob chooses a plaintext m and a random integer r (a random element) satisfying the inequalities

0 < m < √(q/4)   and   0 < r < √(q/2).

He computes the ciphertext

e ≡ rh + m (mod q)   with 0 < e < q

and sends it to Alice. Alice decrypts the message by first computing

a ≡ fe (mod q)   with 0 < a < q,

and then computing

b ≡ f⁻¹a (mod g)   with 0 ≤ b < g.   (7.1)

Note that f⁻¹ in (7.1) is the inverse of f modulo g. We now verify that b = m, which will show that Alice has recovered Bob's plaintext. We first observe that the quantity a satisfies

a ≡ fe ≡ f(rh + m) ≡ frf⁻¹g + fm ≡ rg + fm (mod q).

The size restrictions on f, g, r, m imply that the integer rg + fm is small,

rg + fm < √(q/2)·√(q/2) + √(q/2)·√(q/4) < q.

Thus when Alice computes a ≡ fe (mod q) with 0 < a < q, she gets the exact value
a = rg + fm.   (7.2)

This is the key point: the formula (7.2) is an equality of integers and not merely a congruence modulo q. Finally Alice computes

b ≡ f⁻¹a ≡ f⁻¹(rg + fm) ≡ f⁻¹fm ≡ m (mod g)   with 0 ≤ b < g.

Since m < √(q/4) < g, it follows that b = m. The congruential cryptosystem is summarized in Table 7.1.

Table 7.1: A congruential public key cryptosystem
Alice (key creation): Choose a large integer modulus q. Choose secret integers f and g with f < √(q/2), √(q/4) < g < √(q/2), and gcd(f, qg) = 1. Compute h ≡ f⁻¹g (mod q). Publish the public key (q, h).
Bob (encryption): Choose plaintext m with m < √(q/4) and a random r < √(q/2). Use Alice's public key (q, h) to compute e ≡ rh + m (mod q). Send the ciphertext e to Alice.
Alice (decryption): Compute a ≡ fe (mod q) with 0 < a < q. Compute b ≡ f⁻¹a (mod g) with 0 ≤ b < g. Then b is the plaintext m.

Example 7.1. Alice chooses

q = 122430513841,   f = 231231,   and   g = 195698.

Here f ≈ 0.66√q and g ≈ 0.56√q are allowable values. Alice computes

f⁻¹ ≡ 49194372303 (mod q)   and   h ≡ f⁻¹g ≡ 39245579300 (mod q).

Alice's public key is the pair (q, h) = (122430513841, 39245579300).

Bob decides to send Alice the plaintext m = 123456 using the random value r = 101010. He uses Alice's public key to compute the ciphertext

e ≡ rh + m ≡ 18357558717 (mod q),

which he sends to Alice. In order to decrypt e, Alice first uses her secret value f to compute

a ≡ fe ≡ 48314309316 (mod q).
(Note that a = 48314309316 < 122430513841 = q.) She then uses the value f⁻¹ ≡ 193495 (mod g) to compute

f⁻¹a ≡ 193495 · 48314309316 ≡ 123456 (mod g),

and, as predicted by the theory, this is Bob's plaintext m.

How might Eve attack this system? She might try doing a brute-force search through all possible private keys or through all possible plaintexts, but this takes O(q) operations. Let's consider in more detail Eve's task if she tries to find the private key (f, g) from the known public key (q, h). It is not hard to see that if Eve can find any pair of positive integers F and G satisfying

Fh ≡ G (mod q)   and   F = O(√q)   and   G = O(√q),   (7.3)

then (F, G) is likely to serve as a decryption key. Rewriting the congruence (7.3) as Fh = G + qR, we reformulate Eve's task as that of finding a pair of comparatively small integers (F, G) with the property that

F·(1, h) − R·(0, q) = (F, G),

where (1, h) and (0, q) are known vectors, F and R are unknown integers, and (F, G) is the unknown small vector. Thus Eve knows two vectors v1 = (1, h) and v2 = (0, q), each of which has length O(q), and she wants to find a linear combination w = a1v1 + a2v2 such that w has length O(√q), but keep in mind that the coefficients a1 and a2 are required to be integers. Thus Eve needs to find a short nonzero vector in the set of vectors

L = {a1v1 + a2v2 : a1, a2 ∈ Z}.

This set L is an example of a two-dimensional lattice. Notice that it looks sort of like a two-dimensional vector space with basis {v1, v2}, except that we are allowed to take only integer linear combinations of v1 and v2. Unfortunately for Bob and Alice, there is an extremely rapid method for finding short vectors in two-dimensional lattices. This method, which is due to Gauss, is described in Sect. 7.13.1 and used to break the congruential cryptosystem in Sect. 7.14.1.
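The toy scheme of this section is easy to experiment with. The following Python sketch (function names are ours, not the book's) implements key creation, encryption, and decryption using the parameters of Example 7.1:

```python
from math import gcd

def keygen(q, f, g):
    """Key creation: h = f^{-1} g (mod q)."""
    # Require f < sqrt(q/2), sqrt(q/4) < g < sqrt(q/2), gcd(f, qg) = 1.
    assert 2 * f * f < q and 4 * g * g > q and 2 * g * g < q
    assert gcd(f, q * g) == 1
    return (pow(f, -1, q) * g) % q

def encrypt(q, h, m, r):
    """Bob: e = rh + m (mod q), with 0 < m < sqrt(q/4) and 0 < r < sqrt(q/2)."""
    return (r * h + m) % q

def decrypt(q, f, g, e):
    """Alice: a = fe (mod q) equals rg + fm exactly, then m = f^{-1} a (mod g)."""
    a = (f * e) % q
    return (pow(f, -1, g) * a) % g

q, f, g = 122430513841, 231231, 195698   # parameters of Example 7.1
h = keygen(q, f, g)
e = encrypt(q, h, 123456, 101010)
m = decrypt(q, f, g, e)
print(h, e, m)   # 39245579300 18357558717 123456
```

The three-argument `pow` with exponent −1 (Python 3.8+) computes the modular inverses f⁻¹ (mod q) and f⁻¹ (mod g) used in the text.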
7.2 Subset-Sum Problems and Knapsack Cryptosystems

The first attempt to base a cryptosystem on an NP-complete problem¹ was made by Merkle and Hellman in the late 1970s [84]. They used a version of the following mathematical problem, which generalizes the classical knapsack problem.

The Subset-Sum Problem. Suppose that you are given a list of positive integers (M1, M2, . . . , Mn) and another integer S. Find a subset of the elements in the list whose sum is S. (You may assume that there is at least one such subset.)

Example 7.2. Let M = (2, 3, 4, 9, 14, 23) and S = 21. Then a bit of trial and error yields the subset {3, 4, 14} whose sum is 21, and it is not hard to check that this is the only subset that sums to 21. Similarly, if we take S = 29, then we find that {2, 4, 23} has the desired sum. But in this case there is a second solution, since {2, 4, 9, 14} also sums to 29.

Here is another way to describe the subset-sum problem. The list M = (M1, M2, . . . , Mn) of positive integers is public knowledge. Bob chooses a secret binary vector x = (x1, x2, . . . , xn), i.e., each xi may be either 0 or 1. Bob computes the sum

S = Σ_{i=1}^{n} xi·Mi

and sends S to Alice. The subset-sum problem asks Alice to find either the original vector x or another binary vector giving the same sum. Notice that the vector x tells Alice which Mi to include in S, since Mi is in the sum S if and only if xi = 1. Thus specifying the binary vector x is the same as specifying a subset of M. It is clear that Alice can find x by checking all 2^n binary vectors of length n. A simple collision algorithm allows Alice to cut the exponent in half.

Proposition 7.3. Let M = (M1, M2, . . . , Mn) and let (M, S) be a subset-sum problem. For all sets of integers I and J satisfying

I ⊂ {i : 1 ≤ i ≤ n/2}   and   J ⊂ {j : n/2 < j ≤ n},

compute and make a list of the values

¹NP-complete problems are discussed in Sect. 5.7.
However, if you have not read that section, suffice it to say that NP-complete problems are considered to be very hard to solve in a computational sense.
AI = Σ_{i∈I} Mi   and   BJ = S − Σ_{j∈J} Mj.

Then these lists include a pair of sets I0 and J0 satisfying AI0 = BJ0, and the sets I0 and J0 give a solution to the subset-sum problem,

S = Σ_{i∈I0} Mi + Σ_{j∈J0} Mj.

The number of entries in each list is at most 2^{n/2}, so the running time of the algorithm is O(2^{n/2+ε}), where ε is some small value that accounts for sorting and comparing the lists.

Proof. It suffices to note that if x is a binary vector giving a solution to the given subset-sum problem, then we can write the solution as

Σ_{1 ≤ i ≤ n/2} xi·Mi = S − Σ_{n/2 < i ≤ n} xi·Mi.

The number of subsets I and J is O(2^{n/2}), since they are subsets of sets of order n/2.

If n is large, then in general it is difficult to solve a random instance of a subset-sum problem. Suppose, however, that Alice possesses some secret knowledge or trapdoor information about M that enables her to guarantee that the solution x is unique and that allows her to easily find x. Then Alice can use the subset-sum problem as a public key cryptosystem. Bob's plaintext is the vector x, his encrypted message is the sum S = Σ xi·Mi, and only Alice can easily recover x from knowledge of S.

But what sort of sneaky trick can Alice use to ensure that she can solve this particular subset-sum problem, but that nobody else can? One possibility is to use a subset-sum problem that is extremely easy to solve, but somehow to disguise the easy solution from other people.

Definition. A superincreasing sequence of integers is a list of positive integers r = (r1, r2, . . . , rn) with the property that

ri+1 ≥ 2ri   for all 1 ≤ i ≤ n − 1.

The following estimate explains the name of such sequences.

Lemma 7.4. Let r = (r1, r2, . . . , rn) be a superincreasing sequence. Then

rk > rk−1 + · · · + r2 + r1   for all 2 ≤ k ≤ n.

Proof. We give a proof by induction on k. For k = 2 we have r2 ≥ 2r1 > r1, which gets the induction started. Now suppose that the lemma is true for
some 2 ≤ k < n. Then first using the superincreasing property and next the induction hypothesis, we find that

rk+1 ≥ 2rk = rk + rk > rk + (rk−1 + · · · + r2 + r1).

This shows that the lemma is also true for k + 1.

A subset-sum problem in which the integers in M form a superincreasing sequence is very easy to solve.

Proposition 7.5. Let (M, S) be a subset-sum problem in which the integers in M form a superincreasing sequence. Assuming that a solution x exists, it is unique and may be computed by the following fast algorithm:

Loop i from n down to 1
  If S ≥ Mi, set xi = 1 and subtract Mi from S
  Else set xi = 0
End Loop

Proof. The assumption that M is a superincreasing sequence means that Mi+1 ≥ 2Mi. We are given that a solution exists, so to distinguish it from the vector x produced by the algorithm, we call the actual solution y. Thus we are assuming that y · M = S, and we need to show that x = y.

We prove by downward induction that xk = yk for all 1 ≤ k ≤ n. Our inductive hypothesis is that xi = yi for all k < i ≤ n, and we need to prove that xk = yk. (Note that we allow k = n, in which case our inductive hypothesis is vacuously true.) The hypothesis means that when we performed the algorithm from i = n down to i = k + 1, we had xi = yi at each stage. So before executing the loop with i = k, the value of S has been reduced to

Sk = S − Σ_{i=k+1}^{n} xi·Mi = Σ_{i=1}^{n} yi·Mi − Σ_{i=k+1}^{n} xi·Mi = Σ_{i=1}^{k} yi·Mi.

Now consider what happens when we execute the loop with i = k. There are two possibilities:

(1) yk = 1 ⟹ Sk ≥ Mk ⟹ xk = 1,
(2) yk = 0 ⟹ Sk ≤ Mk−1 + · · · + M1 < Mk ⟹ xk = 0.

(Note that in Case (2) we have used Lemma 7.4 to deduce that Mk−1 + · · · + M1 is strictly smaller than Mk.) In both cases we get xk = yk, which completes the proof that x = y.
Further, it shows that the solution is unique, since we have shown that any solution agrees with the output of the algorithm, which by its nature returns a unique vector x for any given input S.
Example 7.6. The set M = (3, 11, 24, 50, 115) is superincreasing. We write S = 142 as a sum of elements in M by following the algorithm. First S ≥ 115, so x5 = 1 and we replace S with S − 115 = 27. Next 27 < 50, so x4 = 0. Continuing, 27 ≥ 24, so x3 = 1 and S becomes 27 − 24 = 3. Then 3 < 11, so x2 = 0, and finally 3 ≥ 3, so x1 = 1. Notice that S is reduced to 3 − 3 = 0, which tells us that x = (1, 0, 1, 0, 1) is a solution. We check our answer,

1 · 3 + 0 · 11 + 1 · 24 + 0 · 50 + 1 · 115 = 142.

Merkle and Hellman proposed a public key cryptosystem based on a superincreasing subset-sum problem that is disguised using congruences. In order to create the public/private key pair, Alice starts with a superincreasing sequence r = (r1, . . . , rn). She also chooses two large secret integers A and B satisfying

B > 2rn   and   gcd(A, B) = 1.

Alice creates a new sequence M that is not superincreasing by setting

Mi ≡ A·ri (mod B)   with 0 ≤ Mi < B.

The sequence M is Alice's public key. In order to encrypt a message, Bob chooses a plaintext x that is a binary vector and computes and sends to Alice the ciphertext

S = x · M = Σ_{i=1}^{n} xi·Mi.

Alice decrypts S by first computing

S′ ≡ A⁻¹·S (mod B)   with 0 ≤ S′ < B.

Then Alice solves the subset-sum problem for S′ using the superincreasing sequence r and the fast algorithm described in Proposition 7.5. The reason that decryption works is because S′ is congruent to

S′ ≡ A⁻¹S ≡ A⁻¹ Σ xi·Mi ≡ A⁻¹ Σ xi·A·ri ≡ Σ xi·ri (mod B).

The assumption that B > 2rn and Lemma 7.4 tell Alice that

Σ_{i=1}^{n} xi·ri ≤ Σ_{i=1}^{n} ri < 2rn < B,

so by choosing S′ in the range from 0 to B − 1, she ensures that she gets an exact equality S′ = Σ xi·ri, rather than just a congruence. The Merkle–Hellman cryptosystem is summarized in Table 7.2.
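The key creation, encryption, and decryption steps just described fit in a few lines of Python (toy parameters from Example 7.7 below; the function names are ours, and of course this system is insecure in practice, as Sect. 7.14.2 explains):

```python
def mh_public_key(r, A, B):
    # Disguise the superincreasing sequence r: M_i = A * r_i (mod B).
    assert B > 2 * r[-1]
    return [(A * ri) % B for ri in r]

def mh_encrypt(M, x):
    # The ciphertext is the subset sum S = x . M.
    return sum(xi * Mi for xi, Mi in zip(x, M))

def mh_decrypt(r, A, B, S):
    # S' = A^{-1} S (mod B) equals x . r exactly; solve the easy problem.
    Sp = (pow(A, -1, B) * S) % B
    x = [0] * len(r)
    for i in reversed(range(len(r))):   # greedy algorithm of Proposition 7.5
        if Sp >= r[i]:
            x[i] = 1
            Sp -= r[i]
    return x

r = (3, 11, 24, 50, 115)   # Alice's secret superincreasing sequence
A, B = 113, 250            # secret multiplier and modulus
M = mh_public_key(r, A, B)
S = mh_encrypt(M, [1, 0, 1, 0, 1])
print(M, S, mh_decrypt(r, A, B, S))
# [89, 243, 212, 150, 245] 546 [1, 0, 1, 0, 1]
```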
Table 7.2: The Merkle–Hellman subset-sum cryptosystem
Alice (key creation): Choose a superincreasing sequence r = (r1, . . . , rn). Choose A and B with B > 2rn and gcd(A, B) = 1. Compute Mi ≡ A·ri (mod B) for 1 ≤ i ≤ n. Publish the public key M = (M1, . . . , Mn).
Bob (encryption): Choose binary plaintext x. Use Alice's public key M to compute S = x · M. Send the ciphertext S to Alice.
Alice (decryption): Compute S′ ≡ A⁻¹·S (mod B). Solve the subset-sum problem for S′ using the superincreasing sequence r. The plaintext x satisfies x · r = S′.

Example 7.7. Let r = (3, 11, 24, 50, 115) be Alice's secret superincreasing sequence, and suppose that she chooses A = 113 and B = 250. Then her disguised sequence is

M ≡ (113 · 3, 113 · 11, 113 · 24, 113 · 50, 113 · 115) (mod 250)
  = (89, 243, 212, 150, 245).

Notice that M is not even close to being superincreasing (even if she rearranges the terms so that they are increasing). Bob decides to send Alice the secret message x = (1, 0, 1, 0, 1). He encrypts x by computing

S = x · M = 1 · 89 + 0 · 243 + 1 · 212 + 0 · 150 + 1 · 245 = 546.

Upon receiving S, Alice multiplies by 177, the inverse of 113 modulo 250, to obtain

S′ ≡ 177 · 546 ≡ 142 (mod 250).

Then Alice uses the algorithm in Proposition 7.5 to solve S′ = x · r for the superincreasing sequence r. (See Example 7.6.) In this way she recovers the plaintext x.

Cryptosystems based on disguised subset-sum problems are known as subset-sum cryptosystems or knapsack cryptosystems. The general idea is to start with a secret superincreasing sequence, disguise it using secret modular
linear operations, and publish the disguised sequence as the public key. The original Merkle and Hellman system suggested applying a secret permutation to the entries of A·r (mod B) as an additional layer of security. Later versions, proposed by a number of people, involved multiple multiplications and reductions modulo several different moduli. For an excellent survey of knapsack cryptosystems, see the article by Odlyzko [103].

Remark 7.8. An important question that must be considered concerning knapsack systems is the size of the various parameters required to obtain a desired level of security. There are 2^n binary vectors x = (x1, . . . , xn), and we have seen in Proposition 7.3 that there is a collision algorithm, so it is possible to break a knapsack cryptosystem in O(2^{n/2}) operations. Thus in order to obtain security on the order of 2^k, it is necessary to take n > 2k, so for example, 2^80 security requires n > 160. But although this provides security against a collision attack, it does not preclude the existence of other, more efficient attacks, which, as we will see in Sect. 7.14.2, actually do exist. (See also Remark 7.10.)

Remark 7.9. Assuming that we have chosen a value for n, how large must we take the other parameters? It turns out that if r1 is too small, then there are easy attacks, so we must insist that r1 > 2^n. The superincreasing nature of the sequence implies that

rn > 2rn−1 > 4rn−2 > · · · > 2^{n−1}·r1 > 2^{2n−1}.

Then B > 2rn > 2^{2n}, so we find that the entries Mi in the public key and the ciphertext S satisfy

Mi = O(2^{2n})   and   S = O(2^{2n}).

Thus the public key M is a list of n integers, each approximately 2n bits long, while the plaintext x consists of n bits of information, and the ciphertext is approximately 2n bits. Notice that the message expansion ratio is 2-to-1. For example, suppose that n = 160. Then the public key size is about 2n² = 51200 bits.
Compare this to RSA or Diffie–Hellman, where, for security on the order of 2^80, the public key size is only about 1000 bits. This larger key size might seem to be a major disadvantage, but it is compensated for by the tremendous speed of the knapsack systems. Indeed, a knapsack decryption requires only one (or a very few) modular multiplications, and a knapsack encryption requires none at all. This is far more efficient than the large number of computationally intensive modular exponentiations used by RSA and Diffie–Hellman. Historically, this made knapsack cryptosystems quite appealing.

Remark 7.10. The best known algorithms to solve a randomly chosen subset-sum problem are versions of the collision algorithm such as Proposition 7.3. Unfortunately, a randomly chosen subset-sum problem has no trapdoor, hence cannot be used to create a cryptosystem. And it turns out that the use of
a disguised superincreasing subset-sum problem allows other, more efficient, algorithms. The first such attacks, by Shamir, Odlyzko, Lagarias, and others, used various ad hoc methods, but after the publication of the famous LLL² lattice reduction paper [77] in 1985, it became clear that knapsack-based cryptosystems have a fundamental weakness. Roughly speaking, if n is smaller than around 300, then lattice reduction allows an attacker to recover the plaintext x from the ciphertext S in a disconcertingly short amount of time. Hence a secure system requires n > 300, in which case the private key length is greater than 2n² = 180000 bits ≈ 176 kbits. This is so large as to make secure knapsack systems impractical.

We now briefly describe how Eve can reformulate the subset-sum problem using vectors. Suppose that she wants to write S as a subset-sum from the set M = (m1, . . . , mn). Her first step is to form the matrix

[ 2 0 0 · · · 0 m1 ]
[ 0 2 0 · · · 0 m2 ]
[ 0 0 2 · · · 0 m3 ]
[ .       .      . ]
[ 0 0 0 · · · 2 mn ]
[ 1 1 1 · · · 1 S  ]   (7.4)

The relevant vectors are the rows of the matrix (7.4), which we label as

v1 = (2, 0, 0, . . . , 0, m1),
v2 = (0, 2, 0, . . . , 0, m2),
. . .
vn = (0, 0, 0, . . . , 2, mn),
vn+1 = (1, 1, 1, . . . , 1, S).

Just as in the 2-dimensional example described at the end of Sect. 7.1, Eve looks at the set of all integer linear combinations of v1, . . . , vn+1,

L = {a1v1 + a2v2 + · · · + an+1vn+1 : a1, a2, . . . , an+1 ∈ Z}.

The set L is another example of a lattice. Suppose now that x = (x1, . . . , xn) is a solution to the given subset-sum problem. Then the lattice L contains the vector

t = Σ_{i=1}^{n} xi·vi − vn+1 = (2x1 − 1, 2x2 − 1, . . . , 2xn − 1, 0),

where the last coordinate of t is 0 because S = x1m1 + · · · + xnmn.

²The three L's are A.K. Lenstra, H.W. Lenstra, and L. Lovász.
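The construction of the matrix (7.4) and the short vector t is easy to check numerically. The following Python sketch (using the toy knapsack of Example 7.7 as stand-in data; the helper name is ours) builds the lattice basis and verifies that t has entries ±1 with last coordinate 0:

```python
def knapsack_lattice_rows(m, S):
    """Rows v1, ..., v_{n+1} of the matrix (7.4) for the subset-sum problem (m, S)."""
    n = len(m)
    rows = [[2 * (i == j) for j in range(n)] + [m[i]] for i in range(n)]
    rows.append([1] * n + [S])   # the last row v_{n+1} = (1, ..., 1, S)
    return rows

# Toy data: the disguised knapsack of Example 7.7, solution x = (1, 0, 1, 0, 1).
m = [89, 243, 212, 150, 245]
S = 546
x = [1, 0, 1, 0, 1]
rows = knapsack_lattice_rows(m, S)

# t = sum_i x_i v_i - v_{n+1}
t = [sum(x[i] * rows[i][j] for i in range(len(m))) - rows[-1][j]
     for j in range(len(m) + 1)]
print(t)                       # [1, -1, 1, -1, 1, 0]: entries are 2x_i - 1, last is 0
print(sum(c * c for c in t))   # squared length is n = 5
```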
We now come to the crux of the matter. Since the xi are all 0 or 1, all of the 2xi − 1 values are ±1, so the vector t is quite short, ‖t‖ = √n. On the other hand, we have seen that mi = O(2^{2n}) and S = O(2^{2n}), so the vectors generating L all have lengths ‖vi‖ = O(2^{2n}). Thus it is unlikely that L contains any nonzero vectors, other than t, whose length is as small as √n. If we postulate that Eve knows an algorithm that can find small nonzero vectors in lattices, then she will be able to find t, and hence to recover the plaintext x.

Algorithms that find short vectors in lattices are called lattice reduction algorithms. The most famous of these is the LLL algorithm, to which we alluded earlier, and its variants such as LLL-BKZ. The remainder of this chapter is devoted to describing lattices, cryptosystems based on lattices, the LLL algorithm, and cryptographic applications of LLL. A more detailed analysis of knapsack cryptosystems is given in Sect. 7.14.2; see also Example 7.33.

7.3 A Brief Review of Vector Spaces

Before starting our discussion of lattices, we pause to remind the reader of some important definitions and ideas from linear algebra. Vector spaces can be defined in vast generality,³ but for our purposes in this chapter, it is enough to consider vector spaces that are contained in R^m for some positive integer m. We start with the basic definitions that are essential for studying vector spaces.

Vector Spaces. A vector space V is a subset of R^m with the property that

α1v1 + α2v2 ∈ V   for all v1, v2 ∈ V and all α1, α2 ∈ R.

Equivalently, a vector space is a subset of R^m that is closed under addition and under scalar multiplication by elements of R.

Linear Combinations. Let v1, v2, . . . , vk ∈ V. A linear combination of v1, v2, . . . , vk is any vector of the form

w = α1v1 + α2v2 + · · · + αkvk   with α1, . . . , αk ∈ R.

The collection of all such linear combinations, {α1v1 + · · · + αkvk : α1, . . .
, αk ∈ R}, is called the span of {v1, . . . , vk}.

Independence. A set of vectors v1, v2, . . . , vk ∈ V is (linearly) independent if the only way to get

α1v1 + α2v2 + · · · + αkvk = 0   (7.5)

is to have α1 = α2 = · · · = αk = 0. The set is (linearly) dependent if we can make (7.5) true with at least one αi nonzero.

³For example, we saw in Sect. 3.6 a nice application of vector spaces over the field F2.
Bases. A basis for V is a set of linearly independent vectors v1, . . . , vn that span V. This is equivalent to saying that every vector w ∈ V can be written in the form

w = α1v1 + α2v2 + · · · + αnvn

for a unique choice of α1, . . . , αn ∈ R.

We next describe the relationship between different bases and the important concept of dimension.

Proposition 7.11. Let V ⊂ R^m be a vector space.
(a) There exists a basis for V.
(b) Any two bases for V have the same number of elements. The number of elements in a basis for V is called the dimension of V.
(c) Let v1, . . . , vn be a basis for V and let w1, . . . , wn be another set of n vectors in V. Write each wj as a linear combination of the vi,

w1 = α11v1 + α12v2 + · · · + α1nvn,
w2 = α21v1 + α22v2 + · · · + α2nvn,
. . .
wn = αn1v1 + αn2v2 + · · · + αnnvn.

Then w1, . . . , wn is also a basis for V if and only if the determinant of the matrix

[ α11 α12 · · · α1n ]
[ α21 α22 · · · α2n ]
[  .    .         . ]
[ αn1 αn2 · · · αnn ]

is not equal to 0.

We next explain how to measure lengths of vectors in R^m and the angles between pairs of vectors. These important concepts are tied up with the notions of dot product and Euclidean norm.

Definition. Let v, w ∈ V ⊂ R^m and write v and w using coordinates as

v = (x1, x2, . . . , xm)   and   w = (y1, y2, . . . , ym).

The dot product of v and w is the quantity

v · w = x1y1 + x2y2 + · · · + xmym.

We say that v and w are orthogonal to one another if v · w = 0.
The length, or Euclidean norm, of v is the quantity

‖v‖ = √(x1² + x2² + · · · + xm²).

Notice that dot products and norms are related by the formula

v · v = ‖v‖².

Proposition 7.12. Let v, w ∈ V ⊂ R^m.
(a) Let θ be the angle between the vectors v and w, where we place the starting points of v and w at the origin 0. Then

v · w = ‖v‖ ‖w‖ cos(θ).   (7.6)

(b) (Cauchy–Schwarz inequality)

|v · w| ≤ ‖v‖ ‖w‖.   (7.7)

Proof. For (a), see any standard linear algebra textbook. We observe that the Cauchy–Schwarz inequality (b) follows immediately from (a), since |cos(θ)| ≤ 1, but we feel that it is of sufficient importance to warrant a direct proof. If w = 0, there is nothing to prove, so we may assume that w ≠ 0. We consider the function

f(t) = ‖v − tw‖² = (v − tw) · (v − tw) = v · v − 2t(v · w) + t²(w · w) = ‖v‖² − 2t(v · w) + t²‖w‖².

We know that f(t) ≥ 0 for all t ∈ R, so we choose the value of t that minimizes f(t) and see what it gives. This minimizing value is t = (v · w)/‖w‖². Hence

0 ≤ f((v · w)/‖w‖²) = ‖v‖² − (v · w)²/‖w‖².

Simplifying this expression and taking square roots gives the desired result.

Definition. An orthogonal basis for a vector space V is a basis v1, . . . , vn with the property that

vi · vj = 0   for all i ≠ j.

The basis is orthonormal if in addition, ‖vi‖ = 1 for all i.

There are many formulas that become much simpler using an orthogonal or orthonormal basis. In particular, if v1, . . . , vn is an orthogonal basis and if v = a1v1 + · · · + anvn is a linear combination of the basis vectors, then

‖v‖² = ‖a1v1 + · · · + anvn‖²
     = (a1v1 + · · · + anvn) · (a1v1 + · · · + anvn)
     = Σ_{i=1}^{n} Σ_{j=1}^{n} ai·aj (vi · vj)
     = Σ_{i=1}^{n} ai² ‖vi‖²   since vi · vj = 0 for i ≠ j.

If the basis is orthonormal, then this further simplifies to ‖v‖² = Σ ai².

There is a standard method, called the Gram–Schmidt algorithm, for creating an orthonormal basis. We describe a variant of the usual algorithm that gives an orthogonal basis, since it is this version that is most relevant for our later applications.

Theorem 7.13 (Gram–Schmidt Algorithm). Let v1, . . . , vn be a basis for a vector space V ⊂ R^m. The following algorithm creates an orthogonal basis v*1, . . . , v*n for V:

Set v*1 = v1.
Loop i = 2, 3, . . . , n.
  Compute μij = (vi · v*j)/‖v*j‖² for 1 ≤ j < i.
  Set v*i = vi − Σ_{j=1}^{i−1} μij·v*j.
End Loop

The two bases have the property that

Span{v1, . . . , vi} = Span{v*1, . . . , v*i}   for all i = 1, 2, . . . , n.

Proof. The proof of orthogonality is by induction, so we suppose that the vectors v*1, . . . , v*(i−1) are pairwise orthogonal, and we need to prove that v*i is orthogonal to all of the previous starred vectors. To do this, we take any k < i and compute

v*i · v*k = (vi − Σ_{j=1}^{i−1} μij·v*j) · v*k
         = vi · v*k − μik‖v*k‖²   since v*k · v*j = 0 for j ≠ k,
         = 0   from the definition of μik.

To prove the final statement about the spans, we note first that it is clear from the definition of v*i that vi is in the span of v*1, . . . , v*i. We prove the other inclusion by induction, so we suppose that v*1, . . . , v*(i−1) are in the span of v1, . . . , vi−1, and we need to prove that v*i is in the span of v1, . . . , vi. But from the definition of v*i, we see that it is in the span of v*1, . . . , v*(i−1), vi, so we are done by the induction hypothesis.
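The algorithm of Theorem 7.13 translates directly into code. Here is a short Python sketch (the function name is ours; we use exact rational arithmetic via `fractions.Fraction` so the orthogonality check is exact, with the vectors of Example 7.15 as sample input):

```python
from fractions import Fraction

def gram_schmidt(basis):
    """Orthogonal (not orthonormal) Gram-Schmidt, as in Theorem 7.13."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    ortho = []
    for v in basis:
        v = [Fraction(c) for c in v]
        for w in ortho:                        # subtract mu_ij * v*_j for j < i
            mu = Fraction(dot(v, w), dot(w, w))
            v = [a - mu * b for a, b in zip(v, w)]
        ortho.append(v)
    return ortho

vs = gram_schmidt([(2, 1, 3), (1, 2, 0), (2, -3, -5)])
# The starred vectors are pairwise orthogonal:
print(all(sum(a * b for a, b in zip(vs[i], vs[j])) == 0
          for i in range(3) for j in range(i)))   # True
```

For this input, v*1 = (2, 1, 3), v*2 = (3/7, 12/7, −6/7), and v*3 = (4, −2, −2), which is why exact rationals are preferable to floating point here.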
7.4 Lattices: Basic Definitions and Properties

After seeing the examples in Sects. 7.1 and 7.2 and being reminded of the fundamental properties of vector spaces in Sect. 7.3, the reader will not be surprised by the formal definitions of a lattice and its properties.

Definition. Let v1, . . . , vn ∈ R^m be a set of linearly independent vectors. The lattice L generated by v1, . . . , vn is the set of linear combinations of v1, . . . , vn with coefficients in Z,

    L = {a1v1 + a2v2 + · · · + anvn : a1, a2, . . . , an ∈ Z}.

A basis for L is any set of linearly independent vectors that generates L. Any two such sets have the same number of elements. The dimension of L is the number of vectors in a basis for L.

Suppose that v1, . . . , vn is a basis for a lattice L and that w1, . . . , wn ∈ L is another collection of vectors in L. Just as we did for vector spaces, we can write each wj as a linear combination of the basis vectors,

    w1 = a11v1 + a12v2 + · · · + a1nvn,
    w2 = a21v1 + a22v2 + · · · + a2nvn,
        . . .
    wn = an1v1 + an2v2 + · · · + annvn,

but since now we are dealing with lattices, we know that all of the aij coefficients are integers. Suppose that we try to express the vi in terms of the wj. This involves inverting the matrix

    A = [ a11 a12 · · · a1n ]
        [ a21 a22 · · · a2n ]
        [  .   .   .    .  ]
        [ an1 an2 · · · ann ].

We need the vi to be linear combinations of the wj using integer coefficients, so we need A^(−1) to have integer entries. Hence

    1 = det(I) = det(AA^(−1)) = det(A) det(A^(−1)),

where det(A) and det(A^(−1)) are integers, so we must have det(A) = ±1. Conversely, if det(A) = ±1, then the theory of the adjoint matrix tells us that A^(−1) does indeed have integer entries. (See Exercise 7.10.) This proves the following useful result.

Proposition 7.14. Any two bases for a lattice L are related by a matrix having integer coefficients and determinant equal to ±1.
For computational purposes, it is often convenient to work with lattices whose vectors have integer coordinates. For example,

    Z^n = {(x1, x2, . . . , xn) : x1, . . . , xn ∈ Z}

is the lattice consisting of all vectors with integer coordinates.

Definition. An integral (or integer) lattice is a lattice all of whose vectors have integer coordinates. Equivalently, an integral lattice is an additive subgroup of Z^m for some m ≥ 1.

Example 7.15. Consider the three-dimensional lattice L ⊂ R^3 generated by the three vectors

    v1 = (2, 1, 3),   v2 = (1, 2, 0),   v3 = (2, −3, −5).

It is convenient to form a matrix using v1, v2, v3 as the rows of the matrix,

    A = [ 2  1  3 ]
        [ 1  2  0 ]
        [ 2 −3 −5 ].

We create three new vectors in L by the formulas

    w1 = v1 + v3,   w2 = v1 − v2 + 2v3,   w3 = v1 + 2v2.

This is equivalent to multiplying the matrix A on the left by the matrix

    U = [ 1  0  1 ]
        [ 1 −1  2 ]
        [ 1  2  0 ],

and we find that w1, w2, w3 are the rows of the matrix

    B = UA = [ 4 −2 −2 ]
             [ 5 −7 −7 ]
             [ 4  5  3 ].

The matrix U has determinant −1, so the vectors w1, w2, w3 are also a basis for L. The inverse of U is

    U^(−1) = [  4 −2 −1 ]
             [ −2  1  1 ]
             [ −3  2  1 ],

and the rows of U^(−1) tell us how to express the vi as linear combinations of the wj,

    v1 = 4w1 − 2w2 − w3,   v2 = −2w1 + w2 + w3,   v3 = −3w1 + 2w2 + w3.
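The claims of Example 7.15 are easy to verify mechanically. The sketch below (the helper names are our own, for illustration) checks that B = UA, that det(U) = −1, and that U·U^(−1) is the identity with integer entries, so the wi are indeed a basis for the same lattice.

```python
def mat_mul(U, A):
    """Multiply two 3x3 integer matrices."""
    return [[sum(U[i][k] * A[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

A    = [[2, 1, 3], [1, 2, 0], [2, -3, -5]]
U    = [[1, 0, 1], [1, -1, 2], [1, 2, 0]]
Uinv = [[4, -2, -1], [-2, 1, 1], [-3, 2, 1]]

assert mat_mul(U, A) == [[4, -2, -2], [5, -7, -7], [4, 5, 3]]  # B = UA
assert det3(U) == -1                                           # det U = -1
assert mat_mul(U, Uinv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # integer inverse
```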
Remark 7.16. If L ⊂ R^m is a lattice of dimension n, then a basis for L may be written as the rows of an n-by-m matrix A, that is, a matrix with n rows and m columns. A new basis for L may be obtained by multiplying the matrix A on the left by an n-by-n matrix U such that U has integer entries and determinant ±1. The set of such matrices U is called the general linear group (over Z) and is denoted by GLn(Z); cf. Example 2.11(g). It is the group of matrices with integer entries whose inverses also have integer entries.

There is an alternative, more abstract, way to define lattices that intertwines geometry and algebra.

Definition. A subset L of R^m is an additive subgroup if it is closed under addition and subtraction. It is called a discrete additive subgroup if there is a positive constant ε > 0 with the following property: for every v ∈ L,

    L ∩ {w ∈ R^m : ||v − w|| < ε} = {v}.                   (7.8)

In other words, if you take any vector v in L and draw a solid ball of radius ε around v, then there are no other points of L inside the ball.

Theorem 7.17. A subset of R^m is a lattice if and only if it is a discrete additive subgroup.

Proof. We leave the proof for the reader; see Exercise 7.9.

A lattice is similar to a vector space, except that it is generated by all linear combinations of its basis vectors using integer coefficients, rather than using arbitrary real coefficients. It is often useful to view a lattice as an orderly arrangement of points in R^m, where we put a point at the tip of each vector. An example of a lattice in R^2 is illustrated in Fig. 7.1.

Definition. Let L be a lattice of dimension n and let v1, v2, . . . , vn be a basis for L. The fundamental domain (or fundamental parallelepiped) for L corresponding to this basis is the set

    F(v1, . . . , vn) = {t1v1 + t2v2 + · · · + tnvn : 0 ≤ ti < 1}.   (7.9)

The shaded area in Fig. 7.1 illustrates a fundamental domain in dimension 2.
The next result indicates one reason why fundamental domains are important in studying lattices.

Proposition 7.18. Let L ⊂ R^n be a lattice of dimension n and let F be a fundamental domain for L. Then every vector w ∈ R^n can be written in the form

    w = t + v   for a unique t ∈ F and a unique v ∈ L.

Equivalently, the union of the translated fundamental domains

    F + v = {t + v : t ∈ F}

as v ranges over the vectors in the lattice L exactly covers R^n; see Fig. 7.2.
[Figure 7.1: A lattice L and a fundamental domain F]

Proof. Let v1, . . . , vn be a basis of L that gives the fundamental domain F. Then v1, . . . , vn are linearly independent in R^n, so they are a basis of R^n. This means that any w ∈ R^n can be written in the form

    w = α1v1 + α2v2 + · · · + αnvn

for some α1, . . . , αn ∈ R. We now write each αi as

    αi = ti + ai   with 0 ≤ ti < 1 and ai ∈ Z.

Then

    w = (t1v1 + t2v2 + · · · + tnvn) + (a1v1 + a2v2 + · · · + anvn),

where the first parenthesized sum is a vector t ∈ F and the second is a vector v ∈ L. This shows that w can be written in the desired form.

Next suppose that w = t + v = t′ + v′ has two representations as a sum of a vector in F and a vector in L. Then

    (t1 + a1)v1 + · · · + (tn + an)vn = (t′1 + a′1)v1 + · · · + (t′n + a′n)vn.

Since v1, . . . , vn are independent, it follows that

    ti + ai = t′i + a′i   for all i = 1, 2, . . . , n.

Hence
[Figure 7.2: Translations of F by vectors in L exactly cover R^n]

    ti − t′i = a′i − ai ∈ Z

is an integer. But we also know that ti and t′i are greater than or equal to 0 and strictly smaller than 1, so the only way for ti − t′i to be an integer is if ti = t′i. Therefore t = t′, and then also v = w − t = w − t′ = v′. This completes the proof that t ∈ F and v ∈ L are uniquely determined by w.

It turns out that all fundamental domains of a lattice L have the same volume. We prove this later (Corollary 7.22) for lattices of dimension n in R^n. The volume of a fundamental domain turns out to be an extremely important invariant of the lattice.

Definition. Let L be a lattice of dimension n and let F be a fundamental domain for L. Then the n-dimensional volume of F is called the determinant of L (or sometimes the covolume of L). It is denoted by det(L). (Note that the lattice L itself has no volume, since it is a countable collection of points. If L ⊂ R^n has dimension n, then the covolume of L is defined to be the volume of the quotient group R^n/L.)

If you think of the basis vectors v1, . . . , vn as being vectors of a given length that describe the sides of the parallelepiped F, then for basis vectors
of given lengths, the largest volume is obtained when the vectors are pairwise orthogonal to one another. This leads to the following important upper bound for the determinant of a lattice.

Proposition 7.19 (Hadamard's Inequality). Let L be a lattice, take any basis v1, . . . , vn for L, and let F be a fundamental domain for L. Then

    det L = Vol(F) ≤ ||v1|| ||v2|| · · · ||vn||.            (7.10)

The closer that the basis is to being orthogonal, the closer that Hadamard's inequality (7.10) comes to being an equality.

It is fairly easy to compute the determinant of a lattice L if its dimension is the same as that of its ambient space, i.e., if L is contained in R^n and L has dimension n. Luckily, this is the case that is of most interest to us; the formula is described in the next proposition. See Exercise 7.14 to learn how to compute the determinant of a lattice in the general case.

Proposition 7.20. Let L ⊂ R^n be a lattice of dimension n, let v1, v2, . . . , vn be a basis for L, and let F = F(v1, . . . , vn) be the associated fundamental domain as defined by (7.9). Write the coordinates of the ith basis vector as

    vi = (ri1, ri2, . . . , rin)

and use the coordinates of the vi as the rows of a matrix,

    F = F(v1, . . . , vn) = [ r11 r12 · · · r1n ]
                           [ r21 r22 · · · r2n ]
                           [  .   .   .    .  ]
                           [ rn1 rn2 · · · rnn ].           (7.11)

Then the volume of F is given by the formula

    Vol(F(v1, . . . , vn)) = |det F(v1, . . . , vn)|.

Proof. The proof uses multivariable calculus. We can compute the volume of F as the integral of the constant function 1 over the region F,

    Vol(F) = ∫_F dx1 dx2 · · · dxn.

The fundamental domain F is the set described by (7.9), so we make a change of variables from x = (x1, . . . , xn) to t = (t1, . . . , tn) according to the formula

    (x1, x2, . . . , xn) = t1v1 + t2v2 + · · · + tnvn.
In terms of the matrix F = F(v1, . . . , vn) defined by (7.11), the change of variables is given by the matrix equation x = tF. The Jacobian matrix of this change of variables is F, and the fundamental domain F is the image under F of the unit cube C^n = [0, 1]^n, so the change of variables formula for integrals yields

    ∫_F dx1 dx2 · · · dxn = ∫_{C^n} |det F| dt1 dt2 · · · dtn = |det F| Vol(C^n) = |det F|.

Example 7.21. The lattice in Example 7.15 has determinant

    det L = |det A| = |−36| = 36,

where A is the matrix with rows v1 = (2, 1, 3), v2 = (1, 2, 0), v3 = (2, −3, −5).

Corollary 7.22. Let L ⊂ R^n be a lattice of dimension n. Then every fundamental domain for L has the same volume. Hence det(L) is an invariant of the lattice L, independent of the particular fundamental domain used to compute it.

Proof. Let v1, . . . , vn and w1, . . . , wn be two bases for L, and let F(v1, . . . , vn) and F(w1, . . . , wn) be the associated matrices (7.11) obtained by using the coordinates of the vectors as the rows of the matrices. Then Proposition 7.14 tells us that

    F(v1, . . . , vn) = A F(w1, . . . , wn)                 (7.12)

for some n-by-n matrix A with integer entries and det(A) = ±1. Now applying Proposition 7.20 twice yields

    Vol(F(v1, . . . , vn)) = |det F(v1, . . . , vn)|          from Proposition 7.20,
                           = |det(A F(w1, . . . , wn))|       from (7.12),
                           = |det(A)| |det F(w1, . . . , wn)|  since det(AB) = det(A) det(B),
                           = |det F(w1, . . . , wn)|          since det(A) = ±1,
                           = Vol(F(w1, . . . , wn))           from Proposition 7.20.
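Proposition 7.20 and Example 7.21 can be checked with a few lines of exact rational arithmetic. The `det` helper below is a plain Gaussian elimination of our own devising (an illustration, not code from the text); the final assertion checks Hadamard's inequality (7.10) for the basis of Example 7.15.

```python
from fractions import Fraction

def det(rows):
    """Exact determinant of a square matrix via Gaussian elimination
    over the rationals."""
    M = [[Fraction(x) for x in r] for r in rows]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)          # singular matrix
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d                      # row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d

A = [(2, 1, 3), (1, 2, 0), (2, -3, -5)]
assert abs(det(A)) == 36                # det L = 36, as in Example 7.21

# Hadamard's inequality (7.10), squared to stay in integer arithmetic:
# (det L)^2 <= ||v1||^2 ||v2||^2 ||v3||^2 = 14 * 5 * 38
norms_sq = [sum(x * x for x in v) for v in A]
assert 36 ** 2 <= norms_sq[0] * norms_sq[1] * norms_sq[2]
```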
7.5 Short Vectors in Lattices

The fundamental computational problems associated to a lattice are those of finding a shortest nonzero vector in the lattice and of finding a vector in the lattice that is closest to a given nonlattice vector. In this section we discuss these problems, mainly from a theoretical perspective. Section 7.13 is devoted to a practical method for finding short and close vectors in a lattice.

7.5.1 The Shortest Vector Problem and the Closest Vector Problem

We begin with a description of two fundamental lattice problems.

The Shortest Vector Problem (SVP): Find a shortest nonzero vector in a lattice L, i.e., find a nonzero vector v ∈ L that minimizes the Euclidean norm ||v||.

The Closest Vector Problem (CVP): Given a vector w ∈ R^m that is not in L, find a vector v ∈ L that is closest to w, i.e., find a vector v ∈ L that minimizes the Euclidean norm ||w − v||.

Remark 7.23. Note that there may be more than one shortest nonzero vector in a lattice. For example, in Z^2, all four of the vectors (0, ±1) and (±1, 0) are solutions to SVP. This is why SVP asks for "a" shortest vector and not "the" shortest vector. A similar remark applies to CVP.

We have seen in Sects. 7.1 and 7.2 that a solution to SVP can be used to break various cryptosystems. We will see more examples later in this chapter. Both SVP and CVP are profound problems, and both become computationally difficult as the dimension n of the lattice grows. On the other hand, even approximate solutions to SVP and CVP turn out to have surprisingly many applications in different fields of pure and applied mathematics. In full generality, CVP is known to be NP-hard, and SVP is NP-hard under a certain "randomized reduction hypothesis."

In practice, CVP is considered to be "a little bit harder" than SVP, since CVP can often be reduced to SVP in a slightly higher dimension. For example, the (n + 1)-dimensional SVP used to solve the knapsack cryptosystem in Sect.
7.2 can be naturally formulated as an n-dimensional CVP. For a proof that SVP is no harder than CVP, see [50], and for a thorough discussion of the complexity of different types of lattice problems, see [86].

(Footnote: the "randomized reduction hypothesis" means that the class of polynomial-time algorithms is enlarged to include those that are not deterministic, but will, with high probability, terminate in polynomial time with a correct result. See Ajtai [3] for details.)

Remark 7.24. In full generality, both SVP and CVP are considered to be extremely hard problems, but in practice it is difficult to achieve this idealized "full generality." In real world scenarios, cryptosystems based on NP-hard
or NP-complete problems tend to rely on a particular subclass of problems, either to achieve efficiency or to allow the creation of a trapdoor. When this is done, there is always the possibility that some special property of the chosen subclass of problems makes them easier to solve than the general case. We have already seen this with the knapsack cryptosystem in Sect. 7.2. The general knapsack problem is NP-complete, but the disguised superincreasing knapsack problem that was suggested for use in cryptography is much easier to solve than the general knapsack problem.

There are many important variants of SVP and CVP that arise both in theory and in practice. We describe a few of them here.

Shortest Basis Problem (SBP): Find a basis v1, . . . , vn for a lattice that is shortest in some sense. For example, we might require that

    max_{1≤i≤n} ||vi||   or   Σ_{i=1}^{n} ||vi||^2

be minimized. There are thus many different versions of SBP, depending on how one decides to measure the "size" of a basis.

Approximate Shortest Vector Problem (apprSVP): Let ψ(n) be a function of n. In a lattice L of dimension n, find a nonzero vector that is no more than ψ(n) times longer than a shortest nonzero vector. In other words, if v_shortest is a shortest nonzero vector in L, find a nonzero vector v ∈ L satisfying

    ||v|| ≤ ψ(n) ||v_shortest||.

Each choice of function ψ(n) gives a different apprSVP. As specific examples, one might ask for an algorithm that finds a nonzero v ∈ L satisfying

    ||v|| ≤ 3√n ||v_shortest||   or   ||v|| ≤ 2^(n/2) ||v_shortest||.

Clearly an algorithm that solves the former is much stronger than one that solves the latter, but even the latter may be useful if the dimension is not too large.

Approximate Closest Vector Problem (apprCVP): This is the same as apprSVP, but now we are looking for a vector that is an approximate solution to CVP, instead of an approximate solution to SVP.
7.5.2 Hermite's Theorem and Minkowski's Theorem

How long is the shortest nonzero vector in a lattice L? The answer depends to some extent on the dimension and the determinant of L. The next result gives an explicit upper bound, in terms of dim(L) and det(L), for the length of a shortest nonzero vector in L.
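Before stating the theorem, it may help to see the quantities involved concretely. The sketch below does a naive exhaustive search over small integer coefficient vectors, which is feasible only in tiny dimension and is only a heuristic: a shortest vector could in principle require coefficients outside the search box, especially for a "bad" basis. It compares the result with the bound √n · det(L)^(1/n) of the theorem below, using the lattice of Example 7.15 (det L = 36).

```python
import itertools
import math

def brute_force_svp(basis, box=3):
    """Heuristically search integer combinations a in [-box, box]^n for a
    shortest nonzero lattice vector.  Exponential in n; for illustration only."""
    n = len(basis)
    best, best_len = None, float("inf")
    for coeffs in itertools.product(range(-box, box + 1), repeat=n):
        if all(c == 0 for c in coeffs):
            continue                       # skip the zero vector
        v = [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        if length < best_len:
            best, best_len = v, length
    return best, best_len

basis = [(2, 1, 3), (1, 2, 0), (2, -3, -5)]
v, length = brute_force_svp(basis)
# Hermite's bound for this lattice: sqrt(3) * 36^(1/3) ≈ 5.72
assert length <= math.sqrt(3) * 36 ** (1 / 3)
```

Here the search finds the basis vector (1, 2, 0) of length √5 ≈ 2.24, comfortably inside the Hermite bound.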
Theorem 7.25 (Hermite's Theorem). Every lattice L of dimension n contains a nonzero vector v ∈ L satisfying

    ||v|| ≤ √n · det(L)^(1/n).

Remark 7.26. For a given dimension n, Hermite's constant γn is the smallest value such that every lattice L of dimension n contains a nonzero vector v ∈ L satisfying

    ||v||^2 ≤ γn det(L)^(2/n).

Our version of Hermite's theorem (Theorem 7.25) says that γn ≤ n. The exact value of γn is known only for 1 ≤ n ≤ 8 and for n = 24:

    γ2^2 = 4/3,   γ3^3 = 2,    γ4^4 = 4,     γ5^5 = 8,
    γ6^6 = 64/3,  γ7^7 = 64,   γ8^8 = 256,   γ24^24 = 4^24.

For cryptographic purposes, we are mainly interested in the value of γn when n is large. For large values of n it is known that Hermite's constant satisfies

    n/(2πe) ≤ γn ≤ n/(πe),                                 (7.13)

where π = 3.14159 . . . and e = 2.71828 . . . are the usual constants.

Remark 7.27. There are versions of Hermite's theorem that deal with more than one vector. For example, one can prove that an n-dimensional lattice L always has a basis v1, . . . , vn satisfying

    ||v1|| ||v2|| · · · ||vn|| ≤ n^(n/2) det(L).

This complements Hadamard's inequality (Proposition 7.19), which says that every basis satisfies

    ||v1|| ||v2|| · · · ||vn|| ≥ det L.

We define the Hadamard ratio of the basis B = {v1, . . . , vn} to be the quantity

    H(B) = ( det L / (||v1|| ||v2|| · · · ||vn||) )^(1/n).

Thus 0 < H(B) ≤ 1, and the closer that the value is to 1, the more orthogonal are the vectors in the basis. (The reciprocal of the Hadamard ratio is sometimes called the orthogonality defect. We also note that some authors define the Hadamard ratio without taking the nth root.)

The proof of Hermite's theorem uses a result of Minkowski that is important in its own right. In order to state Minkowski's theorem, we set one piece of useful notation and give some basic definitions.

Definition. For any a ∈ R^n and any R > 0, the (closed) ball of radius R centered at a is the set

    B_R(a) = {x ∈ R^n : ||x − a|| ≤ R}.
Definition. Let S be a subset of R^n.
(a) S is bounded if the lengths of the vectors in S are bounded. Equivalently, S is bounded if there is a radius R such that S is contained within the ball B_R(0).
(b) S is symmetric if for every point a in S, the negation −a is also in S.
(c) S is convex if whenever two points a and b are in S, then the entire line segment connecting a to b lies completely in S.
(d) S is closed if it has the following property: if a ∈ R^n is a point such that every ball B_R(a) contains a point of S, then a is in S.

Theorem 7.28 (Minkowski's Theorem). Let L ⊂ R^n be a lattice of dimension n and let S ⊂ R^n be a bounded symmetric convex set whose volume satisfies

    Vol(S) > 2^n det(L).

Then S contains a nonzero lattice vector. If S is also closed, then it suffices to take Vol(S) ≥ 2^n det(L).

Proof. Let F be a fundamental domain for L. Proposition 7.18 tells us that every vector a ∈ S can be written uniquely in the form

    a = v_a + w_a   with v_a ∈ L and w_a ∈ F.

(See Fig. 7.2 for an illustration.) We dilate S by a factor of 1/2, i.e., shrink S by a factor of 2,

    (1/2)S = {(1/2)a : a ∈ S},

and consider the map

    (1/2)S −→ F,   (1/2)a −→ w_{(1/2)a}.                   (7.14)

Shrinking S by a factor of 2 changes its volume by a factor of 2^n, so

    Vol((1/2)S) = (1/2^n) Vol(S) > det(L) = Vol(F).

(Here is where we are using our assumption that the volume of S is larger than 2^n det(L).) The map (7.14) is given by a finite collection of translation maps (this is where we are using the assumption that S is bounded), so the map (7.14) is volume preserving. Hence the fact that the domain (1/2)S has volume strictly larger than the volume of the range F implies that there exist distinct points (1/2)a1 and (1/2)a2 with the same image in F. We have thus found distinct points in S satisfying

    (1/2)a1 = v1 + w   and   (1/2)a2 = v2 + w   with v1, v2 ∈ L and w ∈ F.

Subtracting them yields a nonzero vector

    (1/2)a1 − (1/2)a2 = v1 − v2 ∈ L.

We now observe that −a2 is in S, since S is symmetric, and that

    (1/2)a1 − (1/2)a2 = (1/2)(a1 + (−a2))

is the midpoint of the line segment from a1 to −a2, so it is in S by convexity. Therefore

    0 ≠ v1 − v2 ∈ S ∩ L,

so we have constructed a nonzero lattice point in S. This completes the proof of Minkowski's theorem assuming that the volume of S is strictly larger than 2^n det(L).

We now assume that S is closed and allow Vol(S) = 2^n det(L). For every k ≥ 1, we expand S by a factor of 1 + 1/k and apply the earlier result to find a nonzero vector

    0 ≠ v_k ∈ (1 + 1/k)S ∩ L.

Each of the lattice vectors v1, v2, . . . is in the bounded set 2S, so the discreteness of L tells us that the sequence contains only finitely many distinct vectors. Thus we can choose some v that appears infinitely often in the sequence, so we have found a nonzero lattice vector v ∈ L in the intersection

    ∩_{k=1}^{∞} (1 + 1/k)S.                                (7.15)

The assumption that S is closed implies that the intersection (7.15) is equal to S, so 0 ≠ v ∈ S ∩ L.

Proof of Hermite's theorem (Theorem 7.25). The proof is a simple application of Minkowski's theorem. Let L ⊂ R^n be a lattice and let S be the hypercube in R^n, centered at 0, whose sides have length 2B,

    S = {(x1, . . . , xn) ∈ R^n : −B ≤ xi ≤ B for all 1 ≤ i ≤ n}.

The set S is symmetric, closed, and bounded, and its volume is Vol(S) = (2B)^n.
So if we set B = det(L)^(1/n), then Vol(S) = 2^n det(L) and we can apply Minkowski's theorem to deduce that there is a vector 0 ≠ a ∈ S ∩ L. Writing the coordinates of a as (a1, . . . , an), by definition of S we have

    ||a|| = sqrt(a1^2 + · · · + an^2) ≤ √n · B = √n · det(L)^(1/n).

This completes the proof of Theorem 7.25.

7.5.3 The Gaussian Heuristic

It is possible to improve the constant appearing in Hermite's theorem (Theorem 7.25) by applying Minkowski's theorem (Theorem 7.28) to a hypersphere, rather than a hypercube. In order to do this, we need to know the volume of a ball in R^n. The following material is generally covered in advanced calculus classes.

Definition. The gamma function Γ(s) is defined for s > 0 by the integral

    Γ(s) = ∫_0^∞ t^s e^(−t) dt/t.                          (7.16)

The gamma function is a very important function that appears in many mathematical formulas. We list a few of its basic properties.

Proposition 7.29.
(a) The integral (7.16) defining Γ(s) is convergent for all s > 0.
(b) Γ(1) = 1 and Γ(s + 1) = sΓ(s). This allows us to extend Γ(s) to all s ∈ R with s ≠ 0, −1, −2, . . . .
(c) For all integers n ≥ 1 we have Γ(n + 1) = n!. Thus Γ(s) interpolates the values of the factorial function to all real (and even complex) numbers.
(d) Γ(1/2) = √π.
(e) (Stirling's formula) For large values of s we have

    Γ(1 + s)^(1/s) ≈ s/e.                                  (7.17)

(More precisely, ln Γ(1 + s) = ln (s/e)^s + (1/2) ln(2πs) + O(1) as s → ∞.)

Proof. The properties of the gamma function are described in real and complex analysis textbooks; see for example [2] or [43].

The formula for the volume of a ball in n-dimensional space involves the gamma function.

Theorem 7.30. Let B_R(a) be a ball of radius R in R^n. Then the volume of B_R(a) is

    Vol(B_R(a)) = π^(n/2) R^n / Γ(1 + n/2).                (7.18)
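Formula (7.18) is easy to check numerically against the familiar low-dimensional formulas, and the large-n approximation (7.19) that follows can be compared with the exact value. A minimal sketch, using the standard-library `math.gamma`:

```python
import math

def ball_volume(n, R=1.0):
    """Vol(B_R) = pi^(n/2) R^n / Gamma(1 + n/2), formula (7.18)."""
    return math.pi ** (n / 2) * R ** n / math.gamma(1 + n / 2)

# sanity checks against the familiar formulas pi*r^2 and (4/3)*pi*r^3
assert abs(ball_volume(2) - math.pi) < 1e-12
assert abs(ball_volume(3) - 4 * math.pi / 3) < 1e-12

# large-n approximation (7.19): Vol(B_R)^(1/n) ~ sqrt(2*pi*e/n) * R
n = 200
exact = ball_volume(n) ** (1 / n)
approx = math.sqrt(2 * math.pi * math.e / n)
assert abs(exact - approx) / exact < 0.02   # within 2% at n = 200
```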
For large values of n, the volume of the ball B_R(a) ⊂ R^n is approximately given by

    Vol(B_R(a))^(1/n) ≈ sqrt(2πe/n) · R.                   (7.19)

Proof. See [43, §5.9], for example, for a proof of the formula (7.18) giving the volume of a ball. We can use (7.18) and Stirling's formula (7.17) to prove (7.19). Thus

    Vol(B_R(a))^(1/n) = π^(1/2) R / Γ(1 + n/2)^(1/n) ≈ π^(1/2) R / (n/2e)^(1/2) = sqrt(2πe/n) · R.

Remark 7.31. Theorem 7.30 allows us to improve Theorem 7.25 for large values of n. The ball B_R(0) is bounded, closed, convex, and symmetric, so Minkowski's theorem (Theorem 7.28) says that if we choose R such that

    Vol(B_R(0)) ≥ 2^n det(L),

then the ball B_R(0) contains a nonzero lattice point. Assuming that n is large, we can use (7.19) to approximate the volume of B_R(0), so we need to choose R to satisfy

    sqrt(2πe/n) · R ≥ 2 det(L)^(1/n).

Hence for large n there exists a nonzero vector v ∈ L satisfying

    ||v|| ≤ sqrt(2n/(πe)) · det(L)^(1/n).

This improves the estimate in Theorem 7.25 by a factor of sqrt(2/(πe)) ≈ 0.484.

Although exact bounds for the size of a shortest vector are unknown when the dimension n is large, we can estimate its size by a probabilistic argument that is based on the following principle: Let B_R(0) be a large ball centered at 0. Then the number of lattice points in B_R(0) is approximately equal to the volume of B_R(0) divided by the volume of a fundamental domain F. This is reasonable, since #(B_R(0) ∩ L) should be approximately the number of copies of F that fit into B_R(0). (See Exercise 7.15 for a more rigorous justification.) For example, if we let L = Z^2, then this principle says that the area of a circle is approximately the number of integer points inside the circle. The problem of estimating the error term in

    #{(x, y) ∈ Z^2 : x^2 + y^2 ≤ R^2} = πR^2 + (error term)
is a famous classical problem. In higher dimensions, the problem becomes more difficult because, as n increases, the error created by lattice points near the boundary of the ball can be quite large until R becomes very large. Thus the estimate

    #{v ∈ L : ||v|| ≤ R} ≈ Vol(B_R(0)) / Vol(F)            (7.20)

is somewhat problematic when n is large and R is not too large. Still, one can ask for the value of R that makes the right-hand side of (7.20) equal to 1, since in some sense this is the value of R for which we might expect to first find a nonzero lattice point in the ball. Assuming that n is large, we use the estimate (7.19) from Theorem 7.30. We set

    Vol(B_R(0)) ≈ (2πe/n)^(n/2) R^n

equal to Vol(F) = det(L), and we solve for

    R ≈ sqrt(n/(2πe)) · (det L)^(1/n).

This leads to the following heuristic.

Definition. Let L be a lattice of dimension n. The Gaussian expected shortest length is

    σ(L) = sqrt(n/(2πe)) · (det L)^(1/n).                  (7.21)

The Gaussian heuristic says that a shortest nonzero vector in a "randomly chosen lattice" will satisfy

    ||v_shortest|| ≈ σ(L).

More precisely, if ε > 0 is fixed, then for all sufficiently large n, a randomly chosen lattice of dimension n will satisfy

    (1 − ε)σ(L) ≤ ||v_shortest|| ≤ (1 + ε)σ(L).

(See [133] for some mathematical justification of this heuristic principle.)

Remark 7.32. For small values of n, it is better to use the exact formula (7.18) for the volume of B_R(0), so the Gaussian expected shortest length for small n is

    σ(L) = (Γ(1 + n/2) det(L))^(1/n) / √π.                 (7.22)

For example, when n = 6, then (7.21) gives σ(L) = 0.5927(det L)^(1/6), while (7.22) gives σ(L) = 0.7605(det L)^(1/6), which is a significant difference. On the other hand, if n = 100, then they give σ(L) = 2.420(det L)^(1/100) and σ(L) = 2.490(det L)^(1/100), respectively, so the difference is much smaller.
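The two versions of σ(L) in Remark 7.32 are one-liners, and the constants quoted there can be reproduced directly (a small sketch; the function names are ours):

```python
import math

def sigma_large_n(n, detL=1.0):
    """Gaussian expected shortest length, formula (7.21)."""
    return math.sqrt(n / (2 * math.pi * math.e)) * detL ** (1 / n)

def sigma_exact(n, detL=1.0):
    """Small-n version using the exact ball volume, formula (7.22)."""
    return (math.gamma(1 + n / 2) * detL) ** (1 / n) / math.sqrt(math.pi)

# reproduce the constants quoted in Remark 7.32
assert abs(sigma_large_n(6) - 0.5927) < 1e-3
assert abs(sigma_exact(6) - 0.7605) < 1e-3
assert abs(sigma_large_n(100) - 2.420) < 1e-2
assert abs(sigma_exact(100) - 2.490) < 1e-2
```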
Example 7.33. Let (m1, . . . , mn, S) be a knapsack problem. The associated lattice L_{M,S} is generated by the rows of the matrix (7.4) given on page 383. The lattice has dimension n + 1 and

    det L_{M,S} = 2^n S.

As explained in Sect. 7.2, the number S satisfies S = O(2^(2n)), so S^(1/n) ≈ 4. This allows us to approximate the Gaussian shortest length as

    σ(L_{M,S}) = sqrt((n + 1)/(2πe)) · (det L_{M,S})^(1/(n+1))
               = sqrt((n + 1)/(2πe)) · (2^n S)^(1/(n+1))
               ≈ sqrt(n/(2πe)) · 2S^(1/n)
               ≈ sqrt(n/(2πe)) · 8 ≈ 1.936√n.

On the other hand, as explained in Sect. 7.2, the lattice L_{M,S} contains a vector t of length √n, and knowledge of t reveals the solution to the subset-sum problem. Hence solving SVP for the lattice L_{M,S} is very likely to solve the subset-sum problem. For a further discussion of the use of lattice methods to solve subset-sum problems, see Sect. 7.14.2.

We will find that the Gaussian heuristic is useful in quantifying the difficulty of locating short vectors in lattices. In particular, if the actual shortest vector of a particular lattice L is significantly shorter than σ(L), then lattice reduction algorithms such as LLL seem to have a much easier time locating the shortest vector.

A similar argument leads to a Gaussian heuristic for CVP. Thus if L ⊂ R^n is a random lattice of dimension n and w ∈ R^n is a random point, then we expect that the lattice vector v ∈ L closest to w satisfies

    ||v − w|| ≈ σ(L).

And just as for SVP, if L contains a point that is significantly closer than σ(L) to w, then lattice reduction algorithms have an easier time solving CVP.

7.6 Babai's Algorithm and Using a "Good" Basis to Solve apprCVP

If a lattice L ⊂ R^n has a basis v1, . . . , vn consisting of vectors that are pairwise orthogonal, i.e., such that

    vi · vj = 0 for all i ≠ j,

then it is easy to solve both SVP and CVP. Thus to solve SVP, we observe that the length of any vector in L is given by the formula

    ||a1v1 + a2v2 + · · · + anvn||^2 = a1^2 ||v1||^2 + a2^2 ||v2||^2 + · · · + an^2 ||vn||^2.

Since a1, . . . , an ∈ Z, we see that the shortest nonzero vector(s) in L are simply the shortest vector(s) in the set {±v1, . . . , ±vn}.
Similarly, suppose that we want to find the vector in L that is closest to a given vector w ∈ R^n. We first write

    w = t1v1 + t2v2 + · · · + tnvn   with t1, . . . , tn ∈ R.

Then for v = a1v1 + · · · + anvn ∈ L, we have

    ||v − w||^2 = (a1 − t1)^2 ||v1||^2 + (a2 − t2)^2 ||v2||^2 + · · · + (an − tn)^2 ||vn||^2.   (7.23)

The ai are required to be integers, so (7.23) is minimized if we take each ai to be the integer closest to the corresponding ti.

[Figure 7.3: Using a given fundamental domain to try to solve CVP. The vertex of F + v that is closest to w is a candidate for the (approximate) closest vector.]

It is tempting to try a similar procedure with an arbitrary basis of L. If the vectors in the basis are reasonably orthogonal to one another, then we are likely to be successful in solving CVP; but if the basis vectors are highly non-orthogonal, then the algorithm does not work well. We briefly discuss the underlying geometry, then describe the general method, and conclude with a 2-dimensional example.

A basis {v1, . . . , vn} for L determines a fundamental domain F in the usual way, see (7.9). Proposition 7.18 says that the translates of F by the elements of L fill up the entire space R^n, so any w ∈ R^n is in a unique translate F + v of F by an element v ∈ L. We take the vertex of the parallelepiped F + v that is closest to w as our hypothetical solution to CVP. This procedure is illustrated in Fig. 7.3. It is easy to find the closest vertex, since

    w = v + ε1v1 + ε2v2 + · · · + εnvn   for some 0 ≤ ε1, ε2, . . . , εn < 1,

so we simply replace εi by 0 if it is less than 1/2 and replace it by 1 if it is greater than or equal to 1/2.

Looking at Fig. 7.3 makes it seem that this procedure is bound to work, but that's because the basis vectors in the picture are reasonably orthogonal to one another. Figure 7.4 illustrates two different bases for the same lattice. The first basis is "good" in the sense that the vectors are fairly orthogonal; the second basis is "bad" because the angle between the basis vectors is small. If we try to solve CVP using a bad basis, we are likely to run into problems, as illustrated in Fig. 7.5: the nonlattice target point is actually quite close to a lattice point, but the parallelogram is so elongated that the closest vertex to the target point is quite far away.

[Figure 7.4: Two different bases for the same lattice: a "good basis" and a "bad basis"]

And it is important to note that the difficulties get much worse as the dimension of the lattice increases. Examples visualized in dimension 2 or 3, or even dimension 4 or 5, do not convey the extent to which the following closest vertex algorithm generally fails to solve even apprCVP unless the basis is quite orthogonal.

Theorem 7.34 (Babai's Closest Vertex Algorithm). Let L ⊂ R^n be a lattice with basis v1, . . . , vn, and let w ∈ R^n be an arbitrary vector. If the vectors in the basis are sufficiently orthogonal to one another, then the following algorithm solves CVP.

    Write w = t1v1 + t2v2 + · · · + tnvn with t1, . . . , tn ∈ R.
    Set ai = ⌊ti⌉ (the integer nearest to ti) for i = 1, 2, . . . , n.
    Return the vector v = a1v1 + a2v2 + · · · + anvn.

In general, if the vectors in the basis are reasonably orthogonal to one another, then the algorithm solves some version of apprCVP, but if the basis vectors are highly nonorthogonal, then the vector returned by the algorithm is generally far from the lattice vector that is closest to w.

Example 7.35.
Let L ⊂ R2 be the lattice given by the basis v1 = (137, 312) and v2 = (215, −187).
We are going to use Babai's algorithm (Theorem 7.34) to find a vector in L that is close to the vector w = (53172, 81743). The first step is to express w as a linear combination of v1 and v2 using real coordinates. We do this using linear algebra. Thus we need to find t1, t2 ∈ R such that w = t1v1 + t2v2.

Figure 7.5: Babai's algorithm works poorly if the basis is "bad" (the closest vertex of the translated fundamental domain may be far from the lattice point closest to the target point)

This gives the two linear equations

53172 = 137t1 + 215t2 and 81743 = 312t1 − 187t2,   (7.24)

or, for those who prefer matrix notation,

(53172, 81743) = (t1, t2) [ 137  312 ]
                          [ 215 −187 ]   (7.25)
It is easy to solve for (t1, t2), either by solving the system (7.24) or by inverting the matrix in (7.25). We find that t1 ≈ 296.85 and t2 ≈ 58.15. Babai's algorithm tells us to round t1 and t2 to the nearest integer and then compute

v = ⌊t1⌉v1 + ⌊t2⌉v2 = 297(137, 312) + 58(215, −187) = (53159, 81818).
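The computation above is easy to script. The following sketch (plain Python with exact rational arithmetic; the function names are ours, not the book's) implements Babai's rounding algorithm for a two-dimensional lattice and reproduces the vector (53159, 81818) found here; applied to the highly nonorthogonal second basis of this same example, it returns the far-away vector (56405, 82444).

```python
from fractions import Fraction
import math

def round_nearest(t):
    """Nearest integer to the rational t, with halves rounded up."""
    return math.floor(t + Fraction(1, 2))

def babai_2d(v1, v2, w):
    """Babai's closest vertex algorithm (Theorem 7.34) for a 2-dim lattice.

    Solve w = t1*v1 + t2*v2 exactly by Cramer's rule, round each t_i to
    the nearest integer, and return the resulting lattice vector.
    """
    det = v1[0] * v2[1] - v1[1] * v2[0]
    t1 = Fraction(w[0] * v2[1] - w[1] * v2[0], det)
    t2 = Fraction(v1[0] * w[1] - v1[1] * w[0], det)
    a1, a2 = round_nearest(t1), round_nearest(t2)
    return (a1 * v1[0] + a2 * v2[0], a1 * v1[1] + a2 * v2[1])

w = (53172, 81743)
print(babai_2d((137, 312), (215, -187), w))    # good basis: (53159, 81818)
print(babai_2d((1975, 438), (7548, 1627), w))  # bad basis:  (56405, 82444)
```

Exact arithmetic (`Fraction`) avoids any floating-point rounding question when t_i is very close to a half-integer.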
Then v is in L and v should be close to w. We find that ‖v − w‖ ≈ 76.12 is indeed quite small. This is to be expected, since the vectors in the given basis are fairly orthogonal to one another, as is seen by the fact that the Hadamard ratio

H(v1, v2) = ( det(L) / (‖v1‖ ‖v2‖) )^(1/2) ≈ 0.977

is reasonably close to 1.

We now try to solve the same closest vector problem in the same lattice, but using the new basis

v1′ = (1975, 438) = 5v1 + 6v2 and v2′ = (7548, 1627) = 19v1 + 23v2.

The system of linear equations

(53172, 81743) = (t1, t2) [ 1975  438 ]
                          [ 7548 1627 ]   (7.26)

has the solution (t1, t2) ≈ (5722.66, −1490.34), so we set

v′ = 5723v1′ − 1490v2′ = (56405, 82444).

Then v′ ∈ L, but v′ is not particularly close to w, since ‖v′ − w‖ ≈ 3308.12. The nonorthogonality of the basis {v1′, v2′} is shown by the smallness of the Hadamard ratio

H(v1′, v2′) = ( det(L) / (‖v1′‖ ‖v2′‖) )^(1/2) ≈ 0.077.

7.7 Cryptosystems Based on Hard Lattice Problems

During the mid-1990s, several cryptosystems were introduced whose underlying hard problem was SVP and/or CVP in a lattice L of large dimension n. The most important of these, in alphabetical order, were the Ajtai–Dwork cryptosystem [4], the GGH cryptosystem of Goldreich, Goldwasser, and Halevi [49], and the NTRU cryptosystem proposed by Hoffstein, Pipher, and Silverman [54].

The motivation for the introduction of these cryptosystems was twofold. First, it is certainly of interest to have cryptosystems based on a variety of hard mathematical problems, since then a breakthrough in solving one mathematical problem does not compromise the security of all systems. Second,
lattice-based cryptosystems are frequently much faster than factorization or discrete logarithm based systems such as Elgamal, RSA, and ECC. Roughly speaking, in order to achieve k bits of security, encryption and decryption for Elgamal, RSA, and ECC require O(k^3) operations, while encryption and decryption for lattice-based systems require only O(k^2) operations.⁶ Further, the simple linear algebra operations used by lattice-based systems are very easy to implement in hardware and software. However, it must be noted that the security analysis of lattice-based cryptosystems is not nearly as well understood as it is for factorization and discrete logarithm based systems. So although lattice-based systems are the subject of much current research, their real-world implementations are few in comparison with older systems.

The Ajtai–Dwork system is particularly interesting because Ajtai and Dwork showed that their system is provably secure unless a worst-case lattice problem can be solved in polynomial time. Offsetting this important theoretical result is the practical limitation that the key size turns out to be O(n^4), which leads to enormous keys. Nguyen and Stern [95] subsequently showed that any practical and efficient implementation of the Ajtai–Dwork system is insecure.

The basic GGH cryptosystem, which we explain in more detail in Sect. 7.8, is a straightforward application of the ideas that we have already discussed. Alice's private key is a good basis B_good for a lattice L and her public key is a bad basis B_bad for L. Bob's message is a binary vector m, which he uses to form the linear combination Σ mi·vi^bad of the vectors in B_bad. He then perturbs the sum by adding a small random vector r. The resulting vector w differs from a lattice vector v by the vector r. Since Alice knows a good basis for L, she can use Babai's algorithm to find v, and then she expresses v in terms of the bad basis to recover m.
Eve, on the other hand, knows only the bad basis B_bad, so she is unable to solve CVP in L.

A public key in the GGH cryptosystem is a bad basis for the lattice L, so it consists of n^2 (large) numbers. In the original proposal, the key size was O(n^3 log n) bits, but using an idea of Micciancio [85], it is possible to reduce the key size to O(n^2 log n) bits. Goldreich, Goldwasser, and Halevi conjectured that for n ≥ 300, the CVP underlying GGH would be intractable. However, the effectiveness of LLL-type lattice reduction algorithms on lattices of high dimension had not, at that time, been closely studied. Nguyen [92] showed that a transformation of the original GGH encryption scheme reduced the problem to an easier CVP. This enabled him to solve the proposed GGH challenge problems in dimensions up to 350. For n ≈ 400, the public key is approximately 128 kB.

The NTRU public key cryptosystem [54], whose original public presentation took place at the Crypto '96 rump session, is most naturally described in terms of quotients of polynomial rings. However, the hard problem underlying

⁶There are various tricks that one can use to reduce these estimates. For example, using a small encryption exponent reduces RSA encryption to O(k^2) operations, while using product-form polynomials reduces NTRU encryption to O(k log k) operations.
NTRU is easily transformed into an SVP (for key recovery) or a CVP (for plaintext recovery) in a special class of lattices. The NTRU lattices, which are described in Sect. 7.11, are lattices of even dimension n = 2N consisting of all vectors (x, y) ∈ Z^{2N} satisfying y ≡ xH (mod q) for some fixed positive integer q that is a public parameter. (In practice, q = O(n).) The matrix H, which is the public key, is an N-by-N circulant matrix. This means that each successive row of H is a rotation of the previous row, so in order to describe H, it suffices to specify its first row. Thus the public key has size O(n log n), which is significantly smaller than GGH. The NTRU private key is a single short vector (f, g) ∈ L. The set consisting of the short vector (f, g), together with its partial rotations, gives N = (1/2) dim(L) independent short vectors in L. This allows the owner of (f, g) to solve certain instances of CVP in L and thereby recover the encrypted plaintext. (For details, see Sect. 7.11 and Exercise 7.36.) Thus the security of the plaintext relies on the difficulty of solving CVP in the NTRU lattice. Further, the vector (f, g) and its rotations are almost certainly the shortest nonzero vectors in L, so NTRU is also vulnerable to a solution of SVP.

7.8 The GGH Public Key Cryptosystem

Alice begins by choosing a set of linearly independent vectors v1, v2, . . . , vn ∈ Z^n that are reasonably orthogonal to one another. One way to do this is to fix a parameter d and choose the coordinates of v1, . . . , vn randomly between −d and d. Alice can check that her choice of vectors is good by computing the Hadamard ratio (Remark 7.27) of her basis and verifying that it is not too small. The vectors v1, . . . , vn are Alice's private key. For convenience, we let V be the n-by-n matrix whose rows are the vectors v1, . . . , vn, and we let L be the lattice generated by these vectors.
Alice next chooses an n-by-n matrix U with integer coefficients and det(U) = ±1. One way to create U is as a product of a large number of randomly chosen elementary matrices. She then computes

W = UV.

The row vectors w1, . . . , wn of W are a new basis for L. They are Alice's public key.

When Bob wants to send a message to Alice, he selects a small vector m with integer coordinates as his plaintext, e.g., m might be a binary vector. Bob also chooses a small random perturbation vector r that acts as a random element. For example, he might choose the coordinates of r randomly
between −δ and δ, where δ is a fixed public parameter. He then computes the vector

e = mW + r = Σ_{i=1}^{n} mi·wi + r,

which is his ciphertext. Notice that e is not a lattice point, but it is close to the lattice point mW, since r is small.

Table 7.3: The GGH cryptosystem

  Key creation (Alice): Choose a good basis v1, . . . , vn. Choose an integer matrix U satisfying det(U) = ±1. Compute a bad basis w1, . . . , wn as the rows of W = UV. Publish the public key w1, . . . , wn.
  Encryption (Bob): Choose small plaintext vector m. Choose random small vector r. Use Alice's public key to compute e = m1w1 + · · · + mnwn + r. Send the ciphertext e to Alice.
  Decryption (Alice): Use Babai's algorithm to compute the vector v ∈ L closest to e. Compute vW^{−1} to recover m.

Decryption is straightforward. Alice uses Babai's algorithm, as described in Theorem 7.34, with the good basis v1, . . . , vn to find a vector in L that is close to e. Since she is using a good basis and r is small, the lattice vector that she finds is mW. She then multiplies by W^{−1} to recover m. The GGH cryptosystem is summarized in Table 7.3.

Example 7.36. We illustrate the GGH cryptosystem with a 3-dimensional example. For Alice's private good basis we take

v1 = (−97, 19, 19), v2 = (−36, 30, 86), v3 = (−184, −64, 78).

The lattice L spanned by v1, v2, and v3 has determinant det(L) = 859516, and the Hadamard ratio of the basis is

H(v1, v2, v3) = ( det(L) / (‖v1‖ ‖v2‖ ‖v3‖) )^(1/3) ≈ 0.74620.
Alice multiplies her private basis by the matrix

U = [ 4327 −15447 23454 ]
    [ 3297 −11770 17871 ]
    [ 5464 −19506 29617 ],

which has determinant det(U) = −1, to create her public basis

w1 = (−4179163, −1882253, 583183),
w2 = (−3184353, −1434201, 444361),
w3 = (−5277320, −2376852, 736426).

The Hadamard ratio of the public basis is very small,

H(w1, w2, w3) = ( det(L) / (‖w1‖ ‖w2‖ ‖w3‖) )^(1/3) ≈ 0.0000208.

Bob decides to send Alice the plaintext m = (86, −35, −32) using the random element r = (−4, −3, 2). The corresponding ciphertext is

e = (86, −35, −32) [ −4179163 −1882253 583183 ]
                   [ −3184353 −1434201 444361 ] + (−4, −3, 2)
                   [ −5277320 −2376852 736426 ]
  = (−79081427, −35617462, 11035473).

Alice uses Babai's algorithm to decrypt. She first writes e as a linear combination of her private basis with real coefficients,

e ≈ 81878.97v1 − 292300.00v2 + 443815.04v3.

She rounds the coefficients to the nearest integer and computes a lattice vector

v = 81879v1 − 292300v2 + 443815v3 = (−79081423, −35617459, 11035471)

that is close to e. She then recovers m by expressing v as a linear combination of the public basis and reading off the coefficients,

v = 86w1 − 35w2 − 32w3.

Now suppose that Eve tries to decrypt Bob's message, but she knows only the public basis w1, w2, w3. If she applies Babai's algorithm using the public basis, she finds that

e ≈ 75.76w1 − 34.52w2 − 24.18w3.

Rounding, she obtains a lattice vector

v′ = 76w1 − 35w2 − 24w3 = (−79508353, −35809745, 11095049)

that is somewhat close to e. However, this lattice vector gives the incorrect plaintext (76, −35, −24), not the correct plaintext m = (86, −35, −32). It is instructive to compare how well Babai's algorithm did for the different bases. We find that

‖e − v‖ ≈ 5.39 and ‖e − v′‖ ≈ 472004.09.
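The integer arithmetic in Example 7.36 can be checked mechanically. The following sketch (plain Python; the helper name is ours) recomputes the ciphertext and confirms that Alice's rounded coefficients for the good basis land on the same lattice point as 86w1 − 35w2 − 32w3, so that e − v is exactly the small perturbation r.

```python
def lin_comb(coeffs, basis):
    """Integer linear combination sum(c_i * b_i) of row vectors."""
    n = len(basis[0])
    return tuple(sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(n))

V = [(-97, 19, 19), (-36, 30, 86), (-184, -64, 78)]   # Alice's private basis
W = [(-4179163, -1882253, 583183),
     (-3184353, -1434201, 444361),
     (-5277320, -2376852, 736426)]                     # Alice's public basis

m, r = (86, -35, -32), (-4, -3, 2)
e = tuple(x + y for x, y in zip(lin_comb(m, W), r))    # ciphertext e = mW + r
print(e)                                               # (-79081427, -35617462, 11035473)

# Babai with the good basis rounds to the coefficients (81879, -292300, 443815);
# the resulting lattice point equals 86*w1 - 35*w2 - 32*w3, so Alice recovers m.
v = lin_comb((81879, -292300, 443815), V)
print(v == lin_comb(m, W))                             # True, and e - v = r
```

Since the difference e − v is the tiny vector r = (−4, −3, 2), its length is √29 ≈ 5.39, matching the comparison above.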
Of course, the GGH cryptosystem is not secure in dimension 3, since even if we use numbers that are large enough to make an exhaustive search impractical, there are efficient algorithms to find good bases in low dimension. In dimension 2, an algorithm for finding a good basis dates back to Gauss. A powerful generalization to arbitrary dimension, known as the LLL algorithm, is covered in Sect. 7.13.

Remark 7.37. We observe that GGH is an example of a probabilistic cryptosystem (see Sect. 3.10), since a single plaintext leads to many different ciphertexts due to the choice of the random perturbation r. This leads to a potential danger if Bob sends the same message twice using different random perturbations, or sends different messages using the same random perturbation. One possible solution is to choose the random perturbation r deterministically by applying a hash function (Sect. 8.1) to the plaintext m, but this causes other security issues. See Exercises 7.20 and 7.21 for a further discussion.

Remark 7.38. An alternative version of GGH reverses the roles of m and r, so the ciphertext has the form e = rW + m. Alice finds rW by computing the lattice vector closest to e, and then she recovers the plaintext as m = e − rW.

7.9 Convolution Polynomial Rings

In this section we describe the special sort of polynomial quotient rings that are used by the NTRU public key cryptosystem, which is the topic of Sects. 7.10 and 7.11. The reader who is unfamiliar with basic ring theory should read Sect. 2.10 before continuing.

Definition. Fix a positive integer N. The ring of convolution polynomials (of rank N) is the quotient ring

R = Z[x] / (x^N − 1).

Similarly, the ring of convolution polynomials (modulo q) is the quotient ring

Rq = (Z/qZ)[x] / (x^N − 1).

Proposition 2.50 tells us that every element of R or Rq has a unique representative of the form

a0 + a1x + a2x^2 + · · · + a_{N−1}x^{N−1}

with the coefficients in Z or Z/qZ, respectively.
We observe that it is easier to do computations in the rings R and Rq than it is in more general polynomial quotient rings, because the polynomial x^N − 1 has such a simple form. The
point is that when we mod out by x^N − 1, we are simply requiring x^N to equal 1. So any time x^N appears, we replace it by 1. For example, if we have a term x^k, then we write k = iN + j with 0 ≤ j < N and set

x^k = x^{iN+j} = (x^N)^i · x^j = 1^i · x^j = x^j.

In brief, the exponents on the powers of x may be reduced modulo N.

It is often convenient to identify a polynomial

a(x) = a0 + a1x + a2x^2 + · · · + a_{N−1}x^{N−1} ∈ R

with its vector of coefficients

(a0, a1, a2, . . . , a_{N−1}) ∈ Z^N,

and similarly with polynomials in Rq. Addition of polynomials corresponds to the usual addition of vectors,

a(x) + b(x) ←→ (a0 + b0, a1 + b1, a2 + b2, . . . , a_{N−1} + b_{N−1}).

The rule for multiplication in R is a bit more complicated. We write ⋆ for multiplication in R and Rq, to distinguish it from standard multiplication of polynomials.

Proposition 7.39. The product of two polynomials a(x), b(x) ∈ R is given by the formula

a(x) ⋆ b(x) = c(x) with ck = Σ_{i+j≡k (mod N)} ai·bj,   (7.27)

where the sum defining ck is over all i and j between 0 and N − 1 satisfying the condition i + j ≡ k (mod N). The product of two polynomials a(x), b(x) ∈ Rq is given by the same formula, except that the value of ck is reduced modulo q.

Proof. We first compute the usual polynomial product of a(x) and b(x), after which we use the relation x^N = 1 to combine the terms. Thus

a(x) ⋆ b(x) = ( Σ_{i=0}^{N−1} ai·x^i ) ( Σ_{j=0}^{N−1} bj·x^j )
            = Σ_{k=0}^{2N−2} ( Σ_{i+j=k} ai·bj ) x^k
            = Σ_{k=0}^{N−1} ( Σ_{i+j=k} ai·bj ) x^k + Σ_{k=N}^{2N−2} ( Σ_{i+j=k} ai·bj ) x^{k−N}
            = Σ_{k=0}^{N−1} ( Σ_{i+j=k} ai·bj ) x^k + Σ_{k=0}^{N−2} ( Σ_{i+j=k+N} ai·bj ) x^k
            = Σ_{k=0}^{N−1} ( Σ_{i+j≡k (mod N)} ai·bj ) x^k.

Example 7.40. We illustrate multiplication in the convolution rings R and Rq with an example. We take N = 5 and let a(x), b(x) ∈ R be the polynomials

a(x) = 1 − 2x + 4x^3 − x^4 and b(x) = 3 + 4x − 2x^2 + 5x^3 + 2x^4.

Then

a(x) ⋆ b(x) = 3 − 2x − 10x^2 + 21x^3 + 5x^4 − 16x^5 + 22x^6 + 3x^7 − 2x^8
            = 3 − 2x − 10x^2 + 21x^3 + 5x^4 − 16 + 22x + 3x^2 − 2x^3
            = −13 + 20x − 7x^2 + 19x^3 + 5x^4 in R = Z[x]/(x^5 − 1).

If we work instead in the ring R11, then we reduce the coefficients modulo 11 to obtain

a(x) ⋆ b(x) = 9 + 9x + 4x^2 + 8x^3 + 5x^4 in R11 = (Z/11Z)[x]/(x^5 − 1).

Remark 7.41. The convolution product of two vectors is given by

(a0, a1, a2, . . . , a_{N−1}) ⋆ (b0, b1, b2, . . . , b_{N−1}) = (c0, c1, c2, . . . , c_{N−1}),

where the ck are defined by (7.27). We use ⋆ interchangeably to denote convolution multiplication in the rings R and Rq and the convolution product of vectors.

There is a natural map from R to Rq in which we simply reduce the coefficients of a polynomial modulo q. This reduction modulo q map satisfies

(a(x) + b(x)) mod q = (a(x) mod q) + (b(x) mod q),   (7.28)
(a(x) ⋆ b(x)) mod q = (a(x) mod q) ⋆ (b(x) mod q).   (7.29)

(In mathematical terminology, the map R → Rq is a ring homomorphism.)

It is often convenient to have a consistent way of going in the other direction. Among the many ways of lifting, we choose the following.

Definition. Let a(x) ∈ Rq. The center-lift of a(x) to R is the unique polynomial a′(x) ∈ R satisfying a′(x) mod q = a(x) whose coefficients are chosen in the interval

−q/2 < a′_i ≤ q/2.

For example, if q = 2, then the center-lift of a(x) is a binary polynomial.
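The convolution product (7.27) and the center-lift are both a few lines of code. The following sketch (plain Python; the function names are ours) reproduces the products of Example 7.40 and the center-lift of the polynomial appearing in Example 7.43 below.

```python
def conv(a, b, q=None):
    """Convolution product (7.27): c_k = sum of a_i*b_j over i + j ≡ k (mod N).

    Polynomials are coefficient lists of length N, indexed by degree.
    If q is given, the result is reduced modulo q (the ring R_q).
    """
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] += a[i] * b[j]
    return [x % q for x in c] if q is not None else c

def center_lift(a, q):
    """Lift coefficients mod q into the interval -q/2 < a_i <= q/2."""
    return [(x % q) - q if 2 * (x % q) > q else x % q for x in a]

a = [1, -2, 0, 4, -1]       # a(x) = 1 - 2x + 4x^3 - x^4
b = [3, 4, -2, 5, 2]        # b(x) = 3 + 4x - 2x^2 + 5x^3 + 2x^4
print(conv(a, b))           # [-13, 20, -7, 19, 5]   (Example 7.40, in R)
print(conv(a, b, 11))       # [9, 9, 4, 8, 5]        (in R_11)
print(center_lift([5, 3, -6, 2, 4], 7))   # [-2, 3, 1, 2, -3]
```

The double loop makes the O(N^2) cost of a general convolution product explicit, a point that comes up again when the speed of NTRUEncrypt is discussed in Sect. 7.10.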
Remark 7.42. It is important to observe that the lifting map does not satisfy the analogs of (7.28) and (7.29). In other words, the sum or product of the lifts need not be equal to the lift of the sum or product.

Example 7.43. Let N = 5 and q = 7, and consider the polynomial

a(x) = 5 + 3x − 6x^2 + 2x^3 + 4x^4 ∈ R7.

The coefficients of the center-lift of a(x) are chosen from {−3, −2, . . . , 2, 3}, so

Center-lift of a(x) = −2 + 3x + x^2 + 2x^3 − 3x^4 ∈ R.

Similarly, the lift of b(x) = 3 + 5x^2 − 6x^3 + 3x^4 is 3 − 2x^2 + x^3 + 3x^4. Notice that

(Lift of a) ⋆ (Lift of b) = 20x + 10x^2 − 11x^3 − 14x^4

and

(Lift of a ⋆ b) = −x + 3x^2 + 3x^3

are not equal to one another, although they are congruent modulo 7.

Example 7.44. Very few polynomials in R have multiplicative inverses, but the situation is quite different in Rq. For example, let N = 5 and q = 2. Then the polynomial 1 + x + x^4 has an inverse in R2, since in R2 we have

(1 + x + x^4) ⋆ (1 + x^2 + x^3) = 1 + x + x^2 + 2x^3 + 2x^4 + x^6 + x^7 = 1.

(Since N = 5, we have x^6 = x and x^7 = x^2.) When q is a prime, the extended Euclidean algorithm for polynomials (Proposition 2.46) tells us which polynomials are units and how to compute their inverses in Rq.

Proposition 7.45. Let q be prime. Then a(x) ∈ Rq has a multiplicative inverse if and only if

gcd(a(x), x^N − 1) = 1 in (Z/qZ)[x].   (7.30)

If (7.30) is true, then the inverse a(x)^{−1} ∈ Rq can be computed using the extended Euclidean algorithm (Proposition 2.46) to find polynomials u(x), v(x) ∈ (Z/qZ)[x] satisfying

a(x)u(x) + (x^N − 1)v(x) = 1.

Then a(x)^{−1} = u(x) in Rq.

Proof. Proposition 2.46 says that we can find polynomials u(x) and v(x) in the polynomial ring (Z/qZ)[x] satisfying

a(x)u(x) + (x^N − 1)v(x) = gcd(a(x), x^N − 1).
If the gcd is equal to 1, then reducing modulo x^N − 1 yields a(x) ⋆ u(x) = 1 in Rq. Conversely, if a(x) is a unit in Rq, then we can find a polynomial u(x) such that a(x) ⋆ u(x) = 1 in Rq. By definition of Rq, this means that a(x)u(x) ≡ 1 (mod (x^N − 1)), so by definition of congruences, there is a polynomial v(x) satisfying

a(x)u(x) − 1 = (x^N − 1)v(x) in (Z/qZ)[x].

Example 7.46. We let N = 5 and q = 2 and give the full details for computing (1 + x + x^4)^{−1} in R2. First we use the Euclidean algorithm to compute the greatest common divisor of 1 + x + x^4 and 1 − x^5 in (Z/2Z)[x]. (Note that since we are working modulo 2, we have 1 − x^5 = 1 + x^5.) Thus

x^5 + 1 = x · (x^4 + x + 1) + (x^2 + x + 1),
x^4 + x + 1 = (x^2 + x)(x^2 + x + 1) + 1.

So the gcd is equal to 1, and using the usual substitution method yields

1 = (x^4 + x + 1) + (x^2 + x)(x^2 + x + 1)
  = (x^4 + x + 1) + (x^2 + x)(x^5 + 1 + x(x^4 + x + 1))
  = (x^4 + x + 1)(x^3 + x^2 + 1) + (x^5 + 1)(x^2 + x).

Hence

(1 + x + x^4)^{−1} = 1 + x^2 + x^3 in R2.

(See Exercise 1.12 for an efficient computer algorithm and Fig. 1.3 for the "magic box method" to compute a(x)^{−1} in Rq.)

Remark 7.47. The ring Rq makes perfect sense regardless of whether q is prime, and indeed there are situations in which it can be advantageous to take q composite, for example q = 2^k. In general, if q is a power of a prime p, then in order to compute the inverse of a(x) in Rq, one first computes the inverse in Rp, then "lifts" this value to an inverse in R_{p^2}, and then lifts to an inverse in R_{p^4}, and so on. (See Exercise 7.27.) Similarly, if q = q1·q2 · · · qr, where each qi = pi^{ki} is a prime power, one first computes inverses in R_{qi} and then combines the inverses using the Chinese remainder theorem.

7.10 The NTRU Public Key Cryptosystem

Cryptosystems based on the difficulty of integer factorization or the discrete logarithm problem are group-based cryptosystems, because the underlying hard problem involves only one operation.
For RSA, Diffie–Hellman, and Elgamal, the group is the group of units modulo m for some modulus m that
may be prime or composite, and the group operation is multiplication modulo m. For ECC, the group is the set of points on an elliptic curve modulo p and the group operation is elliptic curve addition.

Rings are algebraic objects that have two operations, addition and multiplication, which are connected via the distributive law. In this section we describe NTRUEncrypt, the NTRU public key cryptosystem. NTRUEncrypt is most naturally described using convolution polynomial rings, but the underlying hard mathematical problem can also be interpreted as SVP or CVP in a lattice. We discuss the connection with lattices in Sect. 7.11.

7.10.1 NTRUEncrypt

In this section we describe NTRUEncrypt, the NTRU (pronounced en-trū) public key cryptosystem. We begin by fixing an integer N ≥ 1 and two moduli p and q, and we let R, Rp, and Rq be the convolution polynomial rings

R = Z[x]/(x^N − 1), Rp = (Z/pZ)[x]/(x^N − 1), Rq = (Z/qZ)[x]/(x^N − 1),

described in Sect. 7.9. As usual, we may view a polynomial a(x) ∈ R as an element of Rp or Rq by reducing its coefficients modulo p or q. In the other direction, we use center-lifts to move elements from Rp or Rq to R. We make various assumptions on the parameters N, p, and q; in particular, we require that N be prime and that gcd(N, q) = gcd(p, q) = 1. (The reasons for these assumptions are explained in Exercises 7.32 and 7.37.)

We need one more piece of notation before describing NTRUEncrypt.

Definition. For any positive integers d1 and d2, we let

T(d1, d2) = { a(x) ∈ R : a(x) has d1 coefficients equal to 1, d2 coefficients equal to −1, and all other coefficients equal to 0 }.

Polynomials in T(d1, d2) are called ternary (or trinary) polynomials. They are analogous to binary polynomials, which have only 0's and 1's as coefficients.

We are now ready to describe NTRUEncrypt.
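The key-generation steps below require random elements of T(d1, d2). A minimal sampler (plain Python; the function name is ours, and a real implementation would use a cryptographically secure source of randomness rather than the `random` module) can be written as follows.

```python
import random

def ternary(N, d1, d2):
    """Sample a uniformly random element of T(d1, d2).

    Returns a length-N coefficient list with exactly d1 entries equal to 1,
    d2 entries equal to -1, and the remaining N - d1 - d2 entries equal to 0.
    """
    if d1 + d2 > N:
        raise ValueError("need d1 + d2 <= N")
    coeffs = [1] * d1 + [-1] * d2 + [0] * (N - d1 - d2)
    random.shuffle(coeffs)  # uniform over all arrangements
    return coeffs

f = ternary(11, 4, 3)
print(f.count(1), f.count(-1), f.count(0))  # 4 3 4
```

Shuffling a fixed multiset of coefficients guarantees the exact coefficient counts that the definition of T(d1, d2) requires.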
Alice (or some trusted authority) chooses public parameters (N, p, q, d) satisfying the guidelines described earlier (or see Table 7.4). Alice's private key consists of two randomly chosen polynomials

f(x) ∈ T(d + 1, d) and g(x) ∈ T(d, d).   (7.31)

Alice computes the inverses

Fq(x) = f(x)^{−1} in Rq and Fp(x) = f(x)^{−1} in Rp.   (7.32)
(If either inverse fails to exist, she discards this f(x) and chooses a new one. We mention that Alice chooses f(x) in T(d + 1, d), rather than in T(d, d), because elements in T(d, d) never have inverses in Rq; see Exercise 7.24.) Alice next computes

h(x) = Fq(x) ⋆ g(x) in Rq.   (7.33)

The polynomial h(x) is Alice's public key. Her private key, which she'll need to decrypt messages, is the pair (f(x), Fp(x)). Alternatively, Alice can just store f(x) and recompute Fp(x) when she needs it.

Bob's plaintext is a polynomial m(x) ∈ R whose coefficients satisfy −p/2 < mi ≤ p/2, i.e., the plaintext m is a polynomial in R that is the center-lift of a polynomial in Rp. Bob chooses a random polynomial (a random element) r(x) ∈ T(d, d) and computes⁷

e(x) ≡ p·h(x) ⋆ r(x) + m(x) (mod q).   (7.34)

Bob's ciphertext e(x) is in the ring Rq.

On receiving Bob's ciphertext, Alice starts the decryption process by computing

a(x) ≡ f(x) ⋆ e(x) (mod q).   (7.35)

She then center-lifts a(x) to an element of R and does a mod p computation,

b(x) ≡ Fp(x) ⋆ a(x) (mod p).   (7.36)

Assuming that the parameters have been chosen properly, we now verify that the polynomial b(x) is equal to the plaintext m(x). NTRUEncrypt, the NTRU public key cryptosystem, is summarized in Table 7.4.

Proposition 7.48. If the NTRUEncrypt parameters (N, p, q, d) are chosen to satisfy

q > (6d + 1)p,   (7.37)

then the polynomial b(x) computed by Alice in (7.36) is equal to Bob's plaintext m(x).

Proof. We first determine more precisely the shape of Alice's preliminary calculation of a(x). Thus

a(x) ≡ f(x) ⋆ e(x) (mod q)   from (7.35),
     ≡ f(x) ⋆ (p·h(x) ⋆ r(x) + m(x)) (mod q)   from (7.34),
     ≡ p·f(x) ⋆ Fq(x) ⋆ g(x) ⋆ r(x) + f(x) ⋆ m(x) (mod q)   from (7.33),
     ≡ p·g(x) ⋆ r(x) + f(x) ⋆ m(x) (mod q)   from (7.32).

⁷Note that when we write a congruence of polynomials modulo q, we really mean that the computation is being done in Rq.
Public parameter creation: A trusted party chooses public parameters (N, p, q, d) with N and p prime, gcd(p, q) = gcd(N, q) = 1, and q > (6d + 1)p.
  Key creation (Alice): Choose private f ∈ T(d + 1, d) that is invertible in Rq and Rp. Choose private g ∈ T(d, d). Compute Fq, the inverse of f in Rq. Compute Fp, the inverse of f in Rp. Publish the public key h = Fq ⋆ g.
  Encryption (Bob): Choose plaintext m ∈ Rp. Choose a random r ∈ T(d, d). Use Alice's public key h to compute e ≡ p·r ⋆ h + m (mod q). Send ciphertext e to Alice.
  Decryption (Alice): Compute f ⋆ e ≡ p·g ⋆ r + f ⋆ m (mod q). Center-lift to a ∈ R and compute m ≡ Fp ⋆ a (mod p).

Table 7.4: NTRUEncrypt: the NTRU public key cryptosystem

Consider the polynomial

p·g(x) ⋆ r(x) + f(x) ⋆ m(x),   (7.38)

computed exactly in R, rather than modulo q. We need to bound its largest possible coefficient. The polynomials g(x) and r(x) are in T(d, d), so if, in the convolution product g(x) ⋆ r(x), all of their 1's match up and all of their −1's match up, the largest possible coefficient of g(x) ⋆ r(x) is 2d. Similarly, f(x) ∈ T(d + 1, d) and the coefficients of m(x) are between −p/2 and p/2, so the largest possible coefficient of f(x) ⋆ m(x) is (2d + 1) · p/2. So even if the largest coefficient of g(x) ⋆ r(x) happens to coincide with the largest coefficient of f(x) ⋆ m(x), the largest coefficient of (7.38) has magnitude at most

p · 2d + (2d + 1) · p/2 = (3d + 1/2)p.

Thus our assumption (7.37) ensures that every coefficient of (7.38) has magnitude strictly smaller than q/2. Hence when Alice computes a(x) modulo q
(i.e., in Rq) and then lifts it to R, she recovers the exact value (7.38). In other words,

a(x) = p·g(x) ⋆ r(x) + f(x) ⋆ m(x)   (7.39)

exactly in R, and not merely modulo q.

The rest is easy. Alice multiplies a(x) by Fp(x), the inverse of f(x) modulo p, and reduces the result modulo p to obtain

b(x) ≡ Fp(x) ⋆ a(x) (mod p)   from (7.36),
     ≡ Fp(x) ⋆ (p·g(x) ⋆ r(x) + f(x) ⋆ m(x)) (mod p)   from (7.39),
     ≡ Fp(x) ⋆ f(x) ⋆ m(x) (mod p)   reducing mod p,
     ≡ m(x) (mod p)   from (7.32).

Hence b(x) and m(x) are the same modulo p.

Remark 7.49. The condition q > (6d + 1)p in Proposition 7.48 ensures that decryption never fails. However, an examination of the proof shows that decryption is likely to succeed even for considerably smaller values of q, since it is highly unlikely that the positive and negative coefficients of g(x) and r(x) will exactly line up, and similarly for f(x) and m(x). So for additional efficiency and to reduce the size of the public key, it may be advantageous to choose a smaller value of q. It then becomes a delicate problem to estimate the probability of decryption failure. It is important that the probability of decryption failure be very small (e.g., smaller than 2^{−80}), since decryption failures have the potential to reveal private key information to an attacker.

Remark 7.50. Notice that NTRUEncrypt is an example of a probabilistic cryptosystem (Sect. 3.10), since a single plaintext m(x) has many different encryptions p·h(x) ⋆ r(x) + m(x) corresponding to different choices of the random element r(x). As is common for such systems, cf. Remark 7.37 for GGH, it is a bad idea for Bob to send the same message twice using different random elements, just as it is inadvisable for Bob to use the same random element to send two different plaintexts; see Exercise 7.34. Various ways of ameliorating this danger for GGH, which also apply mutatis mutandis to NTRUEncrypt, are described in Exercises 7.20 and 7.21.

Remark 7.51.
The polynomial f(x) ∈ T(d + 1, d) has small coefficients, but the coefficients of its inverse Fq(x) ∈ Rq tend to be randomly and uniformly distributed modulo q. (This is not a theorem, but it is an experimentally observed fact.) For example, let N = 11 and q = 73 and take a random polynomial

f(x) = x^{10} + x^8 − x^3 + x^2 − 1 ∈ T(3, 2).

Then f(x) is invertible in Rq, and its inverse

Fq(x) = 22x^{10} + 33x^9 + 15x^8 + 33x^7 − 10x^6 + 36x^5 − 33x^4 − 30x^3 + 12x^2 − 32x + 28
has random-looking coefficients. Similarly, in practice the coefficients of the public key and the ciphertext,

h(x) ≡ Fq(x) ⋆ g(x) (mod q) and e(x) ≡ p·r(x) ⋆ h(x) + m(x) (mod q),

also appear to be randomly distributed modulo q.

Remark 7.52. As noted in Sect. 7.7, a motivation for using lattice-based cryptosystems is their high speed compared to discrete logarithm and factorization-based cryptosystems. How fast is NTRUEncrypt? The most time-consuming part of encryption and decryption is the convolution product. In general, a convolution product a ⋆ b requires N^2 multiplications, since each coefficient is essentially the dot product of two vectors. However, the convolution products required by NTRUEncrypt have the form r ⋆ h, f ⋆ e, and Fp ⋆ a, where r, f, and Fp are ternary polynomials. Thus these convolution products can be computed without any multiplications; they each require approximately (2/3)N^2 additions and subtractions. (If d is smaller than N/3, the first two require only (2/3)dN additions and subtractions.) Thus NTRUEncrypt encryption and decryption take O(N^2) steps, where each step is extremely fast.

Example 7.53. We present a small numerical example of NTRUEncrypt with public parameters

(N, p, q, d) = (7, 3, 41, 2).

We have 41 = q > (6d + 1)p = 39, so Proposition 7.48 ensures that decryption will work. Alice chooses

f(x) = x^6 − x^4 + x^3 + x^2 − 1 ∈ T(3, 2) and g(x) = x^6 + x^4 − x^2 − x ∈ T(2, 2).

She computes the inverses

Fq(x) = f(x)^{−1} mod q = 8x^6 + 26x^5 + 31x^4 + 21x^3 + 40x^2 + 2x + 37 ∈ Rq,
Fp(x) = f(x)^{−1} mod p = x^6 + 2x^5 + x^3 + x^2 + x + 1 ∈ Rp.

She stores (f(x), Fp(x)) as her private key and computes and publishes her public key

h(x) = Fq(x) ⋆ g(x) = 20x^6 + 40x^5 + 2x^4 + 38x^3 + 8x^2 + 26x + 30 ∈ Rq.

Bob decides to send Alice the message

m(x) = −x^5 + x^3 + x^2 − x + 1

using the random element

r(x) = x^6 − x^5 + x − 1.
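The whole of Example 7.53 can be replayed with a few lines of code. The sketch below (plain Python; coefficient lists are indexed by degree, and `conv`/`center_lift` are our helper names) takes the keys printed above, recomputes the public key and the ciphertext, and runs Alice's two-step decryption.

```python
def conv(a, b, q):
    """Convolution product in R_q: c_k = sum of a_i*b_j over i + j ≡ k (mod N)."""
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] += a[i] * b[j]
    return [x % q for x in c]

def center_lift(a, q):
    """Lift coefficients mod q into the interval -q/2 < a_i <= q/2."""
    return [(x % q) - q if 2 * (x % q) > q else x % q for x in a]

N, p, q = 7, 3, 41
f  = [-1, 0, 1, 1, -1, 0, 1]        # f(x)  = x^6 - x^4 + x^3 + x^2 - 1
g  = [0, -1, -1, 0, 1, 0, 1]        # g(x)  = x^6 + x^4 - x^2 - x
Fq = [37, 2, 40, 21, 31, 26, 8]     # F_q(x) = f(x)^(-1) in R_q
Fp = [1, 1, 1, 1, 0, 2, 1]          # F_p(x) = f(x)^(-1) in R_p
m  = [1, -1, 1, 1, 0, -1, 0]        # Bob's plaintext
r  = [-1, 1, 0, 0, 0, -1, 1]        # Bob's random element

assert conv(f, Fq, q) == [1, 0, 0, 0, 0, 0, 0]   # F_q really inverts f in R_q
h = conv(Fq, g, q)                                # public key (7.33)
e = [(p * x + y) % q for x, y in zip(conv(r, h, q), m)]   # ciphertext (7.34)

a = center_lift(conv(f, e, q), q)                 # steps (7.35) and the lift
b = center_lift(conv(Fp, a, p), p)                # step (7.36), lifted mod p
print(b == m)                                     # True: decryption recovers m
```

The intermediate values agree with the ones printed in the example: h comes out as [30, 26, 8, 38, 2, 40, 20] and e as [25, 3, 40, 2, 4, 19, 31], listed from the constant term upward.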
Bob computes and sends to Alice the ciphertext

e(x) ≡ p·r(x) ⋆ h(x) + m(x) ≡ 31x^6 + 19x^5 + 4x^4 + 2x^3 + 40x^2 + 3x + 25 (mod q).

Alice's decryption of Bob's message proceeds smoothly. First she computes

f(x) ⋆ e(x) ≡ x^6 + 10x^5 + 33x^4 + 40x^3 + 40x^2 + x + 40 (mod q).   (7.40)

She then center-lifts (7.40) modulo q to obtain

a(x) = x^6 + 10x^5 − 8x^4 − x^3 − x^2 + x − 1 ∈ R.

Finally, she reduces a(x) modulo p and computes

Fp(x) ⋆ a(x) ≡ 2x^5 + x^3 + x^2 + 2x + 1 (mod p).   (7.41)

Center-lifting (7.41) modulo p retrieves Bob's plaintext

m(x) = −x^5 + x^3 + x^2 − x + 1.

7.10.2 Mathematical Problems for NTRUEncrypt

As noted in Remark 7.51, the coefficients of the public key h(x) appear to be random integers modulo q, but there is a hidden relationship

f(x) ⋆ h(x) ≡ g(x) (mod q),   (7.42)

where f(x) and g(x) have very small coefficients. Thus breaking NTRUEncrypt by finding the private key comes down to solving the following problem:

The NTRU Key Recovery Problem: Given h(x), find ternary polynomials f(x) and g(x) satisfying f(x) ⋆ h(x) ≡ g(x) (mod q).

Remark 7.54. The solution to the NTRU key recovery problem is not unique, because if (f(x), g(x)) is one solution, then (x^k ⋆ f(x), x^k ⋆ g(x)) is also a solution for every 0 ≤ k < N. The polynomial x^k ⋆ f(x) is called a rotation of f(x) because the coefficients have been cyclically rotated k positions. Rotations act as private decryption keys in the sense that decryption with x^k ⋆ f(x) yields the rotated plaintext x^k ⋆ m(x).

More generally, any pair of polynomials (f(x), g(x)) with sufficiently small coefficients and satisfying (7.42) serves as an NTRU decryption key. For example, if f(x) is the original decryption key and if θ(x) has tiny coefficients, then θ(x) ⋆ f(x) may also work as a decryption key.

Remark 7.55. Why would one expect the NTRU key recovery problem to be a hard mathematical problem? A first necessary requirement is that the problem not be practically solvable by a brute-force or collision search.
We discuss such searches later in this section. More importantly, in Sect. 7.11.2
we prove that solving the NTRU key recovery problem is (almost certainly) equivalent to solving SVP in a certain class of lattices. This relates the NTRU problem to a well-studied problem, albeit for a special collection of lattices.

The use of lattice reduction is currently the best known method to recover an NTRU private key from the public key. Is lattice reduction the best possible method? Just as with integer factorization and the various discrete logarithm problems underlying other cryptosystems, no one knows for certain whether faster algorithms exist. So the only way to judge the difficulty of the NTRU key recovery problem is to note that it has been well studied by the mathematical and cryptographic community. A quantitative estimate of the difficulty of the problem is then obtained by applying the fastest algorithm currently known.

How hard is Eve's task if she tries a brute-force search of all possible private keys? Note that Eve can determine whether she has found the private key f(x) by verifying that f(x) ⋆ h(x) (mod q) is a ternary polynomial. (In all likelihood, the only polynomials with this property are the rotations of f(x), but if Eve happens to find another ternary polynomial with this property, it will serve as a decryption key.) So we need to compute the size of the set of ternary polynomials. In general, we can specify an element of T(d1, d2) by first choosing d1 coefficients to be 1 and then choosing d2 of the remaining N − d1 coefficients to be −1. Hence

#T(d1, d2) = C(N, d1) · C(N − d1, d2) = N! / (d1! d2! (N − d1 − d2)!),  (7.43)

where C(n, k) denotes the binomial coefficient. We remark that this number is maximized if d1 and d2 are both approximately N/3.

For a brute-force search, Eve must try each polynomial in T(d + 1, d) until she finds a decryption key, but note that all of the rotations of f(x) are decryption keys, so there are N winning choices. Hence it will take Eve approximately #T(d + 1, d)/N tries to find some rotation of f(x).

Example 7.56. We consider the set of NTRUEncrypt parameters

(N, p, q, d) = (251, 3, 257, 83).

(This set does not satisfy the q > (6d + 1)p requirement, so there may be a rare decryption failure; see Remark 7.49.) Eve expects to check approximately

#T(84, 83)/251 = (1/251) · C(251, 84) · C(167, 83) ≈ 2^381.6

polynomials before finding a decryption key.

Remark 7.57. Not surprisingly, if Eve has a sufficient amount of storage, she can use a collision algorithm to search for the private key. (This was
first observed by Andrew Odlyzko.) We describe the basic idea. Eve searches through pairs of ternary polynomials

f1(x) = Σ_{0 ≤ i < N/2} a_i x^i  and  f2(x) = Σ_{N/2 ≤ i < N} a_i x^i

having the property that f1(x) + f2(x) ∈ T(d + 1, d). She computes f1(x) ⋆ h(x) (mod q) and −f2(x) ⋆ h(x) (mod q) and puts them into bins depending on their coefficients. The bins are set up so that when a polynomial from each list lands in the same bin, the quantity (f1(x) + f2(x)) ⋆ h(x) (mod q) has small coefficients, and hence f1(x) + f2(x) is a decryption key. For further details, see [101].

The net effect of the collision algorithm is, as usual, to more or less take the square root of the number of steps required to find a key, so the collision-search security is approximately the square root of (7.43). Returning to Example 7.56, a collision search takes on the order of √(2^381.6) ≈ 2^190.8 steps. In general, if we maximize the size of T(d + 1, d) by setting d ≈ N/3, then we can use Stirling's formula (Proposition 7.29) to estimate

#T(d + 1, d) ≈ N! / ((N/3)!)^3 ≈ (N/e)^N · ((N/3)/e)^{−N} = 3^N.

So a collision search in this case takes O(3^{N/2} / √N) steps.

Remark 7.58. We claimed earlier that f(x) and its rotations are probably the only decryption keys in T(d + 1, d). To see why this is true, we ask for the probability that some random f(x) ∈ T(d + 1, d) has the property that

f(x) ⋆ h(x) (mod q) is a ternary polynomial.  (7.44)

Treating the coefficients of (7.44) as independent^8 random variables that are uniformly distributed modulo q, the probability that any particular coefficient is ternary is 3/q, and hence the probability that every coefficient is ternary is approximately (3/q)^N. Hence

(Expected number of decryption keys in T(d + 1, d))
    ≈ Pr(f(x) ∈ T(d + 1, d) is a decryption key) × #T(d + 1, d)
    = (3/q)^N · C(N, d + 1) · C(N − d − 1, d).

^8 The coefficients of f(x) ⋆ h(x) (mod q) are not entirely independent, but they are sufficiently independent for this to be a good approximation.
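These counts are easy to reproduce with exact integer arithmetic. The following sketch (our code, using Python's `math.comb`) evaluates the key-space size (7.43) for the parameters of Example 7.56 and the resulting brute-force and collision-search estimates.

```python
from math import comb, log2

N, p, q, d = 251, 3, 257, 83

# #T(d+1, d) = C(N, d+1) * C(N - d - 1, d), as in (7.43)
T = comb(N, d + 1) * comb(N - d - 1, d)

brute = log2(T) - log2(N)       # brute force tries ~ #T(84,83)/251
collision = brute / 2           # collision search ~ square root of that
# log2 of the expected number of extra keys, (3/q)^N * #T(d+1, d)
expected = N * (log2(3) - log2(q)) + log2(T)

print(f"brute force  ~ 2^{brute:.1f}")      # about 2^381.6
print(f"collision    ~ 2^{collision:.1f}")  # about 2^190.8
print(f"extra keys   ~ 2^{expected:.1f}")   # about 2^-1222
```

The exact binomial products are only ~120-digit integers, so `log2` of them is well within floating-point range.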
Returning to Example 7.56, we see that the expected number of decryption keys in T(84, 83) for N = 251 and q = 257 is

(3/257)^251 · C(251, 84) · C(167, 83) ≈ 2^−1222.02.  (7.45)

Of course, if h(x) is an NTRUEncrypt public key, then there do exist decryption keys, since we built the decryption key f(x) into the construction of h(x). But the probability calculation (7.45) makes it unlikely that there are any additional decryption keys beyond f(x) and its rotations.

7.11 NTRUEncrypt as a Lattice Cryptosystem

In this section we explain how NTRU key recovery can be formulated as a shortest vector problem in a certain special sort of lattice. Exercise 7.36 sketches a similar description of NTRU plaintext recovery as a closest vector problem.

7.11.1 The NTRU Lattice

Let h(x) = h0 + h1 x + · · · + h_{N−1} x^{N−1} be an NTRUEncrypt public key. The NTRU lattice L^NTRU_h associated to h(x) is the 2N-dimensional lattice spanned by the rows of the matrix

M^NTRU_h =
    ( 1  0  ...  0    h0       h1   ...  h_{N-1} )
    ( 0  1  ...  0    h_{N-1}  h0   ...  h_{N-2} )
    ( .  .  ...  .    .        .    ...  .       )
    ( 0  0  ...  1    h1       h2   ...  h0      )
    ( 0  0  ...  0    q        0    ...  0       )
    ( 0  0  ...  0    0        q    ...  0       )
    ( .  .  ...  .    .        .    ...  .       )
    ( 0  0  ...  0    0        0    ...  q       )

Notice that M^NTRU_h is composed of four N-by-N blocks:

Upper left block = identity matrix,
Lower left block = zero matrix,
Lower right block = q times the identity matrix,
Upper right block = cyclical permutations of the coefficients of h(x).
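The block matrix is easy to assemble from the coefficient list of h(x). The sketch below (our code) builds M^NTRU_h for the toy key of Example 7.53 and checks that the vector (f, g) really is an integer combination of its rows, via the row vector (f, −u) with u = (f ⋆ h − g)/q; this is the content of Proposition 7.59 below.

```python
def conv(a, b, N):
    """Convolution product in Z[x]/(x^N - 1), exact integer arithmetic."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] += a[i] * b[j]
    return c

N, q = 7, 41
f = [-1, 0, 1, 1, -1, 0, 1]
g = [0, -1, -1, 0, 1, 0, 1]
h = [30, 26, 8, 38, 2, 40, 20]    # public key from Example 7.53

# rows of M: (I | cyclic shifts of h) on top, (0 | qI) below
M = [[int(i == j) for j in range(N)] + [h[(j - i) % N] for j in range(N)]
     for i in range(N)]
M += [[0] * N + [q * int(i == j) for j in range(N)] for i in range(N)]

fh = conv(f, h, N)
assert all((fh[k] - g[k]) % q == 0 for k in range(N))   # f*h = g (mod q)
u = [(fh[k] - g[k]) // q for k in range(N)]             # f*h = g + q*u exactly

row = f + [-ui for ui in u]                             # the vector (f, -u)
prod = [sum(row[i] * M[i][j] for i in range(2 * N)) for j in range(2 * N)]
assert prod == f + g                                    # (f, -u) M = (f, g)
```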
It is often convenient to abbreviate the NTRU matrix as

M^NTRU_h = ( I   h  )
           ( 0   qI ),  (7.46)

where we view (7.46) as a 2-by-2 matrix with coefficients in R. We are going to identify each pair of polynomials

a(x) = a0 + a1 x + · · · + a_{N−1} x^{N−1}  and  b(x) = b0 + b1 x + · · · + b_{N−1} x^{N−1}

in R with a 2N-dimensional vector

(a, b) = (a0, a1, . . . , a_{N−1}, b0, b1, . . . , b_{N−1}) ∈ Z^{2N}.

We now suppose that the NTRUEncrypt public key h(x) was created using the private polynomials f(x) and g(x) and compute what happens when we multiply the NTRU matrix by a carefully chosen vector.

Proposition 7.59. Assuming that f(x) ⋆ h(x) ≡ g(x) (mod q), let u(x) ∈ R be the polynomial satisfying

f(x) ⋆ h(x) = g(x) + q u(x).  (7.47)

Then

(f, −u) M^NTRU_h = (f, g),  (7.48)

so the vector (f, g) is in the NTRU lattice L^NTRU_h.

Proof. It is clear that the first N coordinates of the product (7.48) are the vector f, since the left-hand side of M^NTRU_h is the identity matrix atop the zero matrix. Next consider what happens when we multiply the column of M^NTRU_h whose top entry is h_k by the vector (f, −u). We get the quantity

h_k f_0 + h_{k−1} f_1 + · · · + h_{k+1} f_{N−1} − q u_k,

which is the kth entry of the vector f(x) ⋆ h(x) − q u(x). From (7.47), this is the kth entry of the vector g, so the second N coordinates of the product (7.48) form the vector g. Finally, (7.48) says that we can get the vector (f, g) by taking a certain linear combination of the rows of M^NTRU_h. Hence (f, g) ∈ L^NTRU_h.

Remark 7.60. Using the abbreviation (7.46) and multiplying 2-by-2 matrices having coefficients in R, the proof of Proposition 7.59 becomes the succinct computation

(f, −u) ( 1  h )  =  (f, f ⋆ h − qu)  =  (f, g).
        ( 0  q )

Proposition 7.61. Let (N, p, q, d) be NTRUEncrypt parameters, where for simplicity we will assume that p = 3 and d ≈ N/3 and q ≈ 6pd ≈ 2pN. Let L^NTRU_h be an NTRU lattice associated to the private key (f, g).
(a) det(L^NTRU_h) = q^N.
(b) ‖(f, g)‖ ≈ √(4d) ≈ √(4N/3) ≈ 1.155 √N.
(c) The Gaussian heuristic predicts that the shortest nonzero vector in the NTRU lattice has length

σ(L^NTRU_h) ≈ √(Nq/πe) ≈ 0.838 N.

Hence if N is large, then there is a high probability that the shortest nonzero vectors in L^NTRU_h are (f, g) and its rotations. Further,

‖(f, g)‖ / σ(L) ≈ 1.38 / √N,

so the vector (f, g) is a factor of O(1/√N) shorter than predicted by the Gaussian heuristic.

Proof. (a) Proposition 7.20 says that det(L^NTRU_h) is equal to the determinant of the matrix M^NTRU_h. The matrix is upper triangular, so its determinant is the product of the diagonal entries, which equals q^N.
(b) Each of f and g has (approximately) d coordinates equal to 1 and d coordinates equal to −1.
(c) Using (a) and keeping in mind that L^NTRU_h has dimension 2N, we estimate the Gaussian expected shortest length using the formula (7.21),

σ(L^NTRU_h) = √(2N/(2πe)) (det L)^{1/2N} = √(Nq/πe) ≈ √(6/πe) · N.

7.11.2 Quantifying the Security of an NTRU Lattice

Proposition 7.61 says that Eve can determine Alice's private NTRU key if she can find a shortest vector in the NTRU lattice L^NTRU_h. Thus the security of NTRUEncrypt depends at least on the difficulty of solving SVP in L^NTRU_h. More generally, if Eve can solve apprSVP in L^NTRU_h to within a factor of approximately N^ε for some ε < 1/2, then the short vector that she finds will probably serve as a decryption key. This leads to the question of how to estimate the difficulty of finding a short, or shortest, vector in an NTRU lattice. The LLL algorithm that we describe in Sect. 7.13.2 runs in polynomial time and solves apprSVP to within a factor of 2^N, but if N is large, LLL does not find very small vectors in L^NTRU_h. In Sect. 7.13.4 we describe a generalization of the LLL algorithm, called BKZ-LLL, that is able to find very small vectors.
The BKZ-LLL algorithm includes a blocksize parameter β, and it solves apprSVP to within a factor of β^{2N/β}, but its running time is exponential in β.
Unfortunately, the operating characteristics of standard lattice reduction algorithms such as BKZ-LLL are not nearly as well understood as are the operating characteristics of sieves, the index calculus, or Pollard's ρ method. This makes it difficult to predict theoretically how well a lattice reduction algorithm will perform on any given class of lattices. Thus in practice, the security of a lattice-based cryptosystem such as NTRUEncrypt must be determined experimentally.

Roughly, one takes a sequence of parameters (N, q, d) in which N grows and such that certain ratios involving N, q, and d are held approximately constant. For each set of parameters, one runs many experiments using BKZ-LLL with increasing block size β until the algorithm finds a short vector in L^NTRU_h. Then one plots the logarithm of the average running time against N, verifies that the points approximately lie on a line, and computes the best-fitting line

log(Running Time) = AN + B.  (7.49)

After doing this for many values of N up to the point at which the computations become infeasible, one can use the line (7.49) to extrapolate the expected amount of time it would take to find a private key vector in an NTRU lattice L^NTRU_h for larger values of N. Such experiments suggest that values of N in the range from 250 to 1000 yield security levels comparable to currently secure implementations of RSA, Elgamal, and ECC. Details of such experiments are described in [102].

Remark 7.62. Proposition 7.61 says that the short target vectors in an NTRU lattice are O(√N) shorter than predicted by the Gaussian heuristic. Theoretically and experimentally, it is true that if a lattice of dimension n has a vector that is extremely small, say O(2^n) shorter than the Gaussian prediction, then lattice reduction algorithms such as LLL and its variants are very good at finding the tiny vector.
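The fitting and extrapolation step described above is ordinary least squares on pairs (N, log T). The sketch below (our code, with invented running times rather than real BKZ-LLL data) fits the line (7.49) and extrapolates it to a larger N.

```python
def fit_line(xs, ys):
    """Least-squares fit ys ~ A * xs + B; returns (A, B)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    A = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return A, my - A * mx

# hypothetical data: log2 of average running time for increasing N
Ns   = [100, 120, 140, 160, 180]
logT = [4.1, 9.8, 16.2, 22.1, 27.9]   # invented, for illustration only

A, B = fit_line(Ns, logT)
print(f"log2(T) ~ {A:.3f} * N + {B:.2f}")
print(f"extrapolated log2(T) at N = 500: {A * 500 + B:.1f}")
```

The quality of the linear fit on the measured range is what justifies (or refutes) the extrapolation to parameter sizes where the experiment itself is infeasible.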
It is a natural and extremely interesting question to ask whether vectors that are only O(n^ε) shorter than the Gaussian prediction might similarly be easier to find. At this time, no one knows the answer to this question.

7.12 Lattice-Based Digital Signature Schemes

We have already seen digital signature schemes whose security depends on the integer factorization problem (Sect. 4.2) and on the discrete logarithm problem in the multiplicative group (Sect. 4.3) or in an elliptic curve (Sect. 6.4.3). In this section we briefly discuss how digital signature schemes may be constructed from hard lattice problems.

7.12.1 The GGH Digital Signature Scheme

It is easy to convert the CVP idea underlying GGH encryption into a lattice-based digital signature scheme. Samantha knows a good (i.e., short and
reasonably orthogonal) private basis B for a lattice L, so she can use Babai's algorithm (Theorem 7.34) to solve, at least approximately, the closest vector problem in L for a given vector d ∈ R^n. She expresses her solution s ∈ L in terms of a bad public basis B′. The vector s is Samantha's signature on the document d. Victor can easily check that s is in L and is close to d. The GGH digital signature scheme is summarized in Table 7.5.

Table 7.5: The GGH digital signature scheme

Key creation (Samantha): Choose a good basis v1, . . . , vn and a bad basis w1, . . . , wn for L. Publish the public key w1, . . . , wn.
Signing (Samantha): Choose a document d ∈ Z^n to sign. Use Babai's algorithm with the good basis to compute a vector s ∈ L that is close to d. Write s = a1 w1 + · · · + an wn. Publish the signature (a1, . . . , an).
Verification (Victor): Compute s = a1 w1 + · · · + an wn. Verify that s is sufficiently close to d.

Notice the tight fit between the digital signature and the underlying hard problem. The signature s ∈ L is a solution to apprCVP for the vector d ∈ R^n, so signing a document is equivalent to solving apprCVP.

Remark 7.63. In a lattice-based digital signature scheme, the digital document to be signed is a vector in R^n. Just as with other signature schemes, in practice Samantha applies a hash function to her actual document in order to create a short document of just a few hundred bits, which is then signed. (See Remark 4.2.) For lattice-based signatures, one uses a hash function whose output is a vector in Z^n having coordinates in some specified range.

Example 7.64. We illustrate the GGH digital signature scheme using the lattice and the good and bad bases from Example 7.36 on page 410. Samantha decides to sign the document

d = (678846, 651685, 160467) ∈ Z^3.

She uses Babai's algorithm to find a vector

s = 2213 v1 + 7028 v2 − 6231 v3 = (678835, 651671, 160437) ∈ L
that is quite close to d,

‖s − d‖ ≈ 34.89.

Samantha next uses linear algebra to express s in terms of the bad basis,

s = 1531010 w1 − 553385 w2 − 878508 w3,

where w1, w2, w3 are the vectors on page 410. She publishes

(1531010, −553385, −878508)

as her signature for the document d. Victor verifies the signature by using the public basis to compute

s = 1531010 w1 − 553385 w2 − 878508 w3 = (678835, 651671, 160437),

which is automatically a vector in L, and then verifying that ‖s − d‖ ≈ 34.89 is small. We observe that if Eve attempts to sign d using Babai's algorithm with the bad basis {w1, w2, w3}, then the signature that she obtains is

s′ = (2773584, 1595134, −131844) ∈ L.

This vector is not a good solution to apprCVP, since ‖s′ − d‖ > 10^6.

Remark 7.65 (Key Size Issues). The GGH signature scheme suffers the same drawback as the GGH cryptosystem, namely security requires lattices of high dimension, which in turn leads to very large public verification keys; cf. Sect. 7.7. It is thus tempting to use an NTRU lattice L^NTRU as the public key, but there is an initial difficulty because L^NTRU has dimension 2N, so the known (secret) short vector (f, g) and its rotations (x^i ⋆ f, x^i ⋆ g) for 0 ≤ i < N give only half of a very short basis for L^NTRU. Using a technique described in [55], it is possible to extend the half-basis to a full basis that is short enough to make an NTRU signature scheme feasible. However, both GGH and NTRU signature schemes have a more serious shortcoming, which we now describe.

7.12.2 Transcript Analysis

In any digital signature scheme, each document/signature pair (d, s) reveals some information about the private signing key v, since at the very least, it reveals that the document d signed with the private key v yields the signature s. Hence a sufficiently long transcript of signed documents

(d1, s1), (d2, s2), (d3, s3), . . . , (dr, sr)  (7.50)

may reveal information about either the signing key or how to sign additional documents.
We illustrate with the GGH signature scheme. By construction, the signature s is created using Babai's algorithm to solve apprCVP with the good basis v1, . . . , vn and target vector d. It follows that the difference d − s has the form

d − s = Σ_{i=1}^{n} ε_i(d, s) v_i  with |ε_i(d, s)| ≤ 1/2.

As d and s vary, the ε_i(d, s) values are more or less randomly distributed between −1/2 and 1/2. Hence the transcript (7.50) reveals to an adversary a large number of points that are randomly scattered in the fundamental domain

F = { ε1 v1 + ε2 v2 + · · · + εn vn : −1/2 ≤ ε1, . . . , εn ≤ 1/2 }

spanned by the good secret basis v1, . . . , vn. Using this collection of points, it may be possible to (approximately) recover the basis vectors spanning the fundamental domain F. An algorithm to perform this task was given by Nguyen and Regev [93, 94]. They used their algorithm to break instances of GGH in dimension n with a transcript consisting of roughly n^2 signatures, and they gave similar applications to NTRU signatures. It is possible to blunt these attacks by introducing small biased perturbations into each signature [55, 56], but the process is inefficient and may still be subject to transcript attacks [39].

7.12.3 Rejection Sampling

An alternative method of thwarting transcript attacks was proposed by Lyubashevsky [80, 79, 78]. It is based on an idea from statistics called rejection sampling, in which one generates samples from a desired probability distribution by using samples from another distribution. There are now a number of proposed digital signature schemes that use rejection sampling to achieve transcript security. In this section we discuss rejection sampling as a general technique, after which we apply rejection sampling to an abstract signature scheme (Sect. 7.12.4) and illustrate the method with a specific lattice-based scheme (Sect. 7.12.5). The notion of rejection sampling was introduced by J. von Neumann in 1951 [146].
His aim was to produce samples from a distribution F(x), which is itself hard to sample, by using another distribution G(x) whose samples are easy to produce.^9 The use of rejection sampling in the context of foiling a transcript attack on a digital signature scheme amounts to a clever reversal of this situation. Imagine that the signature scheme somehow generates samples that can be used to produce a distribution G(x). The signature scheme is vulnerable to a transcript attack if this distribution G(x) possesses features that provide information about the private key, since then sufficiently many samples may reveal the key. In order to foil a transcript attack, one wants to hide the unique identifying features of G(x). Under certain circumstances,

^9 Distribution functions are discussed in Sect. 5.3.4.
there is a Monte Carlo type algorithm that does this.^10 It works by rejecting certain samples so that the resulting collection is disguised as a generic desired distribution F(x).

Let F(x) and G(x) be probability distribution functions having the property that F(x) ≤ M G(x) for some constant M. The goal is to generate samples that are distributed according to F(x) from samples generated from G(x). To do this, let U(x) be the uniform distribution on the unit interval [0, 1]. One repeatedly takes samples x from G(x) and samples u from U(x). The pair (x, u) is accepted if

u < F(x) / (M G(x)),

and rejected otherwise.

[Figure 7.6: Rejection sampling on the circle — the unit circle inscribed in the square with corners (±1, ±1).]

Suppose that (x1, u1), (x2, u2), . . . is the list of accepted pairs. Then one can show, using Bayes's formula, that the collection of points

(x_i, u_i M G(x_i)), i = 1, 2, 3, . . . ,

is uniformly distributed under the graph of F(x). We do not give the proof, but instead consider the following example, where the situation is particularly intuitive. Suppose that we have a way of uniformly choosing numbers in the square

S = { (x, y) : −1 ≤ x ≤ 1, −1 ≤ y ≤ 1 }

and that we want to choose points that are uniformly distributed in the circle

C = { (x, y) : x^2 + y^2 ≤ 1 }

^10 Monte Carlo algorithms are discussed in Sect. 5.3.3.
as illustrated in Fig. 7.6. So our samples are points (x, y) in the plane, and our uniform distribution functions on the circle and the square are, respectively,^11

F_C(x, y) = 1/π if x^2 + y^2 ≤ 1, and 0 otherwise;
G_S(x, y) = 1/4 if −1 ≤ x ≤ 1 and −1 ≤ y ≤ 1, and 0 otherwise.

For all (x, y) we clearly have F_C(x, y) ≤ M G_S(x, y) with M = 4/π, since F_C(x, y)/G_S(x, y) = M if (x, y) is in the circle, and equals 0 otherwise. Now the pair ((x, y), u), where (x, y) is uniformly sampled from the square and u is uniformly sampled from the interval [0, 1], will be accepted if and only if F_C(x, y)/(M G_S(x, y)) = 1, which means it is accepted if and only if (x, y) is in the circle. In brief, rejection sampling amounts to choosing points uniformly in the square and rejecting those points that do not lie in the circle. The result is a collection of points uniformly distributed in the circle.

7.12.4 Rejection Sampling Applied to an Abstract Signature Scheme

In this section we describe, abstractly, how rejection sampling can be used to protect a digital signature scheme from transcript attacks. Note that we are describing the properties that such a scheme should have, without giving any indication of how one might create such a scheme. (Just as Diffie and Hellman described what a public key cryptosystem should do, without providing an example of such a system.)

We consider an abstract digital signature scheme (K_Pri, K_Pub, Sign, Verify) as described in Sect. 4.1. We assume further that the signing algorithm Sign uses three inputs: the private key K_Pri, the document hash D being signed, and a random number R. Rejection sampling introduces a conditional property P. In order to sign D, Samantha chooses a random R and computes the signature S = Sign(K_Pri, D, R).

^11 Why does F_C have the 1/π and G_S have the 1/4? It's because the total probabilities ∫∫ F_C(x, y) dx dy and ∫∫ G_S(x, y) dx dy, taken over all of R^2, must equal 1.
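The circle example of Sect. 7.12.3 is a two-line Monte Carlo experiment. The sketch below (our code) draws points uniformly from the square, rejects those outside the circle (the acceptance test u < F_C/(M G_S) degenerates to simple membership here), and checks that the acceptance rate approaches the area ratio π/4 = 1/M.

```python
import random

random.seed(0)
trials = 100_000
accepted = []
for _ in range(trials):
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1:        # accept iff the point lies in the circle
        accepted.append((x, y))

rate = len(accepted) / trials
print(f"acceptance rate {rate:.3f}, pi/4 = {3.14159265 / 4:.3f}")
assert all(x * x + y * y <= 1 for x, y in accepted)
assert abs(rate - 3.14159265 / 4) < 0.01
```

With 100,000 samples the statistical error in the rate is about 0.001, so the 0.01 tolerance is comfortable.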
If S has property P, then she publishes S as the signature on D; but if S does not have property P, then she rejects S, chooses a new value for R, and repeats the process. This means that in any transcript of Samantha's signatures (D1, S1), (D2, S2), . . . , every Si has property P.

Now for the tricky part. We want the attacker Eve, using only the public key K_Pub, to be able to create a list of pairs (D′1, S′1), (D′2, S′2), . . . satisfying:
(i) S′i is a valid signature on D′i for the key K_Pub, i.e.,

Verify(K_Pub, D′i, S′i) = TRUE for all i.

(ii) The distribution of Eve's fake transcript (D′1, S′1), (D′2, S′2), . . . is indistinguishable from a transcript that Samantha creates using her private key K_Pri.

Property (i) may seem problematic, since it says that Eve can produce an unlimited number of valid document/signature pairs (D, S). However, recall that D is really a hash of the actual document being signed. (See Sects. 4.2 and 8.1 for a discussion of hash functions and their uses.) So although we want Eve to be able to easily create valid (D, S) pairs, she will not know what document she has signed, because she is not able to invert the hash function. In other words, although Eve can create valid pairs (D, S), if someone hands her a particular D, she will not be able to find an associated S. Thus security, as always, relies on various (reasonable) assumptions, in this case that we have a sufficiently cryptographically secure hash function.

7.12.5 The NTRU Modular Lattice Signature Scheme

Lyubashevsky gave an example of a transcript-secure signature scheme based on the learning with errors (LWE) problem. We briefly sketch a new rejection-sampling signature scheme called NTRUMLS (NTRU modular lattice signature scheme) that uses NTRU lattices [57].^12 We set one piece of notation. The sup norm of a polynomial a(x) = a0 + a1 x + · · · + a_{N−1} x^{N−1} is denoted

‖a‖∞ = max{ |a0|, |a1|, . . . , |a_{N−1}| }.

The basic set-up for NTRUMLS is similar to the set-up for NTRUEncrypt in Sect.
7.10, with parameters (N, p, q), private key polynomials f and g with small coefficients, and public key polynomial h ≡ f^{−1} ⋆ g (mod q).^13 NTRUMLS also uses a public rejection parameter B and a public hash function that takes a digital document μ and a public key h and creates a pair of mod p polynomials:

Hash : {documents} × {public keys} → {pairs of mod p polynomials},
Hash(μ, h) = (sp, tp).

An NTRUMLS signature on the document μ for the public key h is a pair of polynomials (s, t) satisfying the following three conditions:

^12 NTRUMLS was released in 2014, so it is very new. We present it as an illustration of how rejection sampling might work in practice, but as with all new systems, NTRUMLS will require years of scrutiny before it can be deemed secure.
^13 There are some further minor requirements that we omit, since our aim is to illustrate the idea of rejection sampling. See Exercise 7.42.
(a) t ≡ s ⋆ h (mod q).
(b) s ≡ sp (mod p) and t ≡ tp (mod p), where (sp, tp) = Hash(μ, h).
(c) ‖s‖∞ ≤ (1/2)q − B and ‖t‖∞ ≤ (1/2)q − B.

Here are some further remarks on the three signing conditions:

(a) This ties the signature to the signing key. It is equivalent to the assertion that (s, t) is in the lattice L^NTRU_h associated to h; cf. Sect. 7.11.1.
(b) This ties the signature (s, t) to the document hash (sp, tp). It is equivalent to the assertion that the difference (s, t) − (sp, tp) is in the lattice (pZ)^{2N}.
(c) This is the rejection sampling condition, since it says that we reject the signature (s, t) if it is too large. Note the tension inherent in this condition. If B is too large, then it will be difficult to generate signatures, while one can show that if B is not large enough, then transcripts leak private key information.

Using the private key (f, g), it is not hard to create a pair (s, t) satisfying conditions (a) and (b). Further, for appropriately chosen values of p, q, and B, one can show that it will not take too many tries to find an (s, t) that also satisfies condition (c). (See Exercise 7.42 for details.)

The transcript security analysis relies on the following two facts. The proof, which we omit (see [57]), relies on various reasonable randomness assumptions.

• When the signing algorithm is applied to a given document hash (sp, tp), each pair (s, t) satisfying conditions (a), (b), (c) has an equal probability of being chosen as the signature.

• Suppose that an attacker creates a list of (s, t) pairs by randomly choosing s's satisfying ‖s‖∞ ≤ (1/2)q − B, computing t ≡ s ⋆ h (mod q), and keeping the pair (s, t) if ‖t‖∞ ≤ (1/2)q − B. Then the reduction of his list modulo p is uniformly randomly distributed among all pairs of mod p polynomials. (Note that each of the attacker's (s, t) pairs is a valid signature on the document hash (s mod p, t mod p) for the verification key h.
He is thus able to create an arbitrarily long transcript of valid signatures, but he is not able to specify, a priori, the tp parts of the document hashes that he is signing.) These two facts show that an attacker, using only the public key h, can create a transcript of signed document hashes that is indistinguishable from a transcript created using the private key (f, g). Hence the latter transcript contains no information about the private key. We refer the reader to the references [57, 80, 79, 78] for further details on NTRUMLS and other transcript-secure lattice-based signature schemes.
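The three verification conditions are mechanical to check. The sketch below (our code; the signing algorithm itself is omitted, as in the text) implements only the verifier's test for conditions (a)–(c) and exercises it on a contrived pair (s, t) built to satisfy them. All parameter values are illustrative toy numbers, not proposed NTRUMLS parameters, and the integer bound q//2 − B is a slightly conservative reading of (1/2)q − B for odd q.

```python
def conv(a, b, N, q):
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] = (c[(i + j) % N] + a[i] * b[j]) % q
    return c

def center_lift(a, q):
    return [x - q if x > q // 2 else x for x in [ai % q for ai in a]]

def verify(s, t, h, sp, tp, N, p, q, B):
    """Check conditions (a), (b), (c) for a claimed signature (s, t)."""
    cond_a = center_lift(conv(s, h, N, q), q) == center_lift(t, q)   # (a)
    cond_b = (all((si - x) % p == 0 for si, x in zip(s, sp)) and
              all((ti - x) % p == 0 for ti, x in zip(t, tp)))        # (b)
    bound = q // 2 - B
    cond_c = (max(abs(x) for x in center_lift(s, q)) <= bound and
              max(abs(x) for x in center_lift(t, q)) <= bound)       # (c)
    return cond_a and cond_b and cond_c

# toy, illustrative numbers (NOT real NTRUMLS parameters)
N, p, q, B = 7, 3, 41, 1
h = [30, 26, 8, 38, 2, 40, 20]
s = [2, -1, 0, 1, 1, 0, -2]                  # a small, arbitrary s
t = center_lift(conv(s, h, N, q), q)         # t = s * h mod q, so (a) holds
sp = [x % p for x in s]                      # hash output consistent with s
tp = [x % p for x in t]                      # hash output consistent with t
assert verify(s, t, h, sp, tp, N, p, q, B)
```

A real signer, of course, starts from (sp, tp) = Hash(μ, h) and searches for (s, t), rejecting candidates until condition (c) holds.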
7.13 Lattice Reduction Algorithms

We have now seen several cryptosystems whose security depends on the difficulty of solving apprSVP and/or apprCVP in various types of lattices. In this section we describe an algorithm called LLL that solves these problems to within a factor of C^n, where C is a small constant and n is the dimension of the lattice. Thus in small dimensions, the LLL algorithm comes close to solving SVP and CVP, but in large dimensions it does not do as well. Ultimately, the security of lattice-based cryptosystems depends on the inability of LLL and other lattice reduction algorithms to efficiently solve apprSVP and apprCVP to within a factor of, say, O(√n). We begin in Sect. 7.13.1 with Gauss's lattice reduction algorithm, which rapidly solves SVP in lattices of dimension 2. Next, in Sect. 7.13.2, we describe and analyze the LLL algorithm. Section 7.13.3 explains how to combine LLL and Babai's algorithm to solve apprCVP, and we conclude in Sect. 7.13.4 by briefly describing some generalizations of LLL.

7.13.1 Gaussian Lattice Reduction in Dimension 2

The algorithm for finding an optimal basis in a lattice of dimension 2 is essentially due to Gauss. The underlying idea is to alternately subtract multiples of one basis vector from the other until further improvement is not possible. So suppose that L ⊂ R^2 is a 2-dimensional lattice with basis vectors v1 and v2. Swapping v1 and v2 if necessary, we may assume that ‖v1‖ < ‖v2‖. We now try to make v2 smaller by subtracting a multiple of v1. If we were allowed to subtract an arbitrary multiple of v1, then we could replace v2 with the vector

v2* = v2 − ((v1 · v2)/‖v1‖^2) v1,

which is orthogonal to v1. The vector v2* is the projection of v2 onto the orthogonal complement of v1. (See Fig. 7.7.)

[Figure 7.7: v2* is the projection of v2 onto the orthogonal complement of v1.]

Of course, this is cheating, since the vector v2* is unlikely to be in L.
In reality we are allowed to subtract only integer multiples of v1 from v2. So we do the best that we can and replace v2 with the vector v2 − m v1, where

m = ⌊(v1 · v2)/‖v1‖^2⌉

is the nearest integer to (v1 · v2)/‖v1‖^2.
  • 665. 7.13. Lattice Reduction Algorithms 437 If v2 is still longer than v1, then we stop. Otherwise, we swap v1 and v2 and repeat the process. Gauss proved that this process terminates and that the resulting basis for L is extremely good. The next proposition makes this precise. Proposition 7.66 (Gaussian Lattice Reduction). Let L ⊂ R2 be a 2- dimensional lattice with basis vectors v1 and v2. The following algorithm ter- minates and yields a good basis for L. Loop If v2 v1, swap v1 and v2. Compute m = 2 v1 · v2 v12 3 . If m = 0, return the basis vectors v1 and v2. Replace v2 with v2 − mv1. Continue Loop More precisely, when the algorithm terminates, the vector v1 is a shortest nonzero vector in L, so the algorithm solves SVP. Further, the angle θ be- tween v1 and v2 satisfies | cos θ| ≤ v1/2v2, so in particular, π 3 ≤ θ ≤ 2π 3 . Proof. We prove that v1 is a smallest nonzero lattice vector and leave the other parts of the proof to the reader. So we suppose that the algorithm has terminated and returned the vectors v1 and v2. This means that v2 ≥ v1 and that |v1 · v2| v12 ≤ 1 2 . (7.51) (Geometrically, condition (7.51) says that we cannot make v2 smaller by sub- tracting an integral multiple of v1 from v2.) Now suppose that v ∈ L is any nonzero vector in L. Writing v = a1v1 + a2v2 with a1, a2 ∈ Z, we find that v2 = a1v1 + a2v22 = a2 1v12 + 2a1a2(v1 · v2) + a2 2v22 ≥ a2 1v12 − 2|a1a2| |v1 · v2| + a2 2v22 ≥ a2 1v12 − |a1a2|v12 + a2 2v22 from (7.51), ≥ a2 1v12 − |a1a2|v12 + a2 2v12 since v2 ≥ v1, = a2 1 − |a1| |a2| + a2 2 v12 . For any real numbers t1 and t2, the quantity t2 1 − t2t2 + t2 2 = t1 − 1 2 t2
² + (3/4)t2², i.e., t1² − t1t2 + t2² = (t1 − (1/2)t2)² + (3/4)t2², which
is not zero unless t1 = t2 = 0. So the fact that a1 and a2 are integers and not both 0 tells us that ‖v‖² ≥ ‖v1‖². This proves that v1 is a smallest nonzero vector in L.

Example 7.67. We illustrate Gauss's lattice reduction algorithm (Proposition 7.66) with the lattice L having basis v1 = (66586820, 65354729) and v2 = (6513996, 6393464). We first compute ‖v1‖² ≈ 8.71 · 10^15 and ‖v2‖² ≈ 8.33 · 10^13. Since v2 is shorter than v1, we swap them, so now v1 = (6513996, 6393464) and v2 = (66586820, 65354729). Next we subtract a multiple of v1 from v2. The multiplier is m = ⌊(v1 · v2)/‖v1‖²⌉ = ⌊10.2221⌉ = 10, so we replace v2 with v2 − mv1 = (1446860, 1420089). This new vector has norm ‖v2‖² ≈ 4.11 · 10^12, which is smaller than ‖v1‖² ≈ 8.33 · 10^13, so again we swap, v1 = (1446860, 1420089) and v2 = (6513996, 6393464). We repeat the process with m = ⌊(v1 · v2)/‖v1‖²⌉ = ⌊4.502⌉ = 5, which gives the new vector v2 − mv1 = (−720304, −706981) having norm ‖v2‖² ≈ 1.01 · 10^12, so again we swap v1 and v2. Continuing this process leads to smaller and smaller bases until, finally, the algorithm terminates. The step-by-step results of the algorithm, including the value of m used at each stage, are listed in the following table:

Step    v1                      v2                        m
1       (6513996, 6393464)      (66586820, 65354729)      10
2       (1446860, 1420089)      (6513996, 6393464)        5
3       (−720304, −706981)      (1446860, 1420089)        −2
4       (6252, 6127)            (−720304, −706981)        −115
5       (−1324, −2376)          (6252, 6127)              −3
6       (2280, −1001)           (−1324, −2376)            0

The final basis is quite small, and (2280, −1001) is a solution to SVP for the lattice L.
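The reduction loop of Proposition 7.66 is short enough to sketch directly in code. The following is a minimal Python sketch (the function name `gauss_reduce` is our own choice, not from the text); it uses exact rational arithmetic for the rounding step and reproduces the computation of Example 7.67.

```python
from fractions import Fraction
from math import floor

def gauss_reduce(v1, v2):
    """Gaussian lattice reduction in dimension 2 (Proposition 7.66).

    Repeatedly swaps so that ||v1|| <= ||v2||, then subtracts the
    nearest-integer multiple m = round((v1.v2)/||v1||^2) of v1 from v2.
    Terminates when m = 0; v1 is then a shortest nonzero vector.
    """
    v1, v2 = list(v1), list(v2)
    while True:
        if v2[0] ** 2 + v2[1] ** 2 < v1[0] ** 2 + v1[1] ** 2:
            v1, v2 = v2, v1                       # keep ||v1|| <= ||v2||
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        normsq = v1[0] ** 2 + v1[1] ** 2
        m = floor(Fraction(dot, normsq) + Fraction(1, 2))  # nearest integer
        if m == 0:
            return v1, v2
        v2 = [v2[0] - m * v1[0], v2[1] - m * v1[1]]

# The basis of Example 7.67:
b1, b2 = gauss_reduce((66586820, 65354729), (6513996, 6393464))
```

Each step is either a swap or an integral subtraction, so the determinant of the basis is unchanged up to sign, and the run on Example 7.67 follows the table above step for step.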
7.13.2 The LLL Lattice Reduction Algorithm

Gauss's lattice reduction algorithm (Proposition 7.66) gives an efficient way to find a shortest nonzero vector in a lattice of dimension 2, but as the dimension increases, the shortest vector problem becomes much harder. A major advance came in 1982 with the publication of the LLL algorithm [77]. In this section we give a full description of the LLL algorithm, and in the next section we briefly describe some of its generalizations.

Suppose that we are given a basis {v1, v2, . . . , vn} for a lattice L. Our object is to transform the given basis into a "better" basis. But what do we mean by a better basis? We would like the vectors in the better basis to be as short as possible, beginning with the shortest vector that we can find, and then with vectors whose lengths increase as slowly as possible until we reach the last vector in the basis. Alternatively, we would like the vectors in the better basis to be as orthogonal as possible to one another, i.e., so that the dot products vi · vj are as close to zero as possible. Recall that Hadamard's inequality (Proposition 7.19) says that

det L = Vol(F) ≤ ‖v1‖ ‖v2‖ · · · ‖vn‖, (7.52)

where Vol(F) is the volume of a fundamental domain for L. The closer that the basis comes to being orthogonal, the closer that the inequality (7.52) comes to being an equality.

To assist us in creating an improved basis, we begin by constructing a Gram–Schmidt orthogonal basis as described in Theorem 7.13. Thus we start with v∗_1 = v1, and then for i ≥ 2 we let

v∗_i = vi − Σ_{j=1}^{i−1} μ_{i,j} v∗_j, where μ_{i,j} = (vi · v∗_j)/‖v∗_j‖² for 1 ≤ j ≤ i − 1. (7.53)

The collection of vectors B∗ = {v∗_1, v∗_2, . . . , v∗_n} is an orthogonal basis for the vector space spanned by B = {v1, v2, . . . , vn}, but note that B∗ is not a basis for the lattice L spanned by B, because the Gram–Schmidt process (7.53) involves taking linear combinations with nonintegral coefficients.
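The Gram–Schmidt computation (7.53) can be sketched in a few lines of Python. The sketch below (the function name `gram_schmidt` is our own) uses exact rational arithmetic so that the coefficients μ_{i,j} come out exactly rather than as floating point approximations.

```python
from fractions import Fraction

def gram_schmidt(basis):
    """Compute the orthogonalized vectors v*_i and the coefficients
    mu[i][j] of (7.53) for an integer basis, using exact arithmetic."""
    n = len(basis)
    ortho = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        w = [Fraction(x) for x in basis[i]]
        for j in range(i):
            normsq = sum(x * x for x in ortho[j])
            mu[i][j] = sum(Fraction(a) * b
                           for a, b in zip(basis[i], ortho[j])) / normsq
            w = [a - mu[i][j] * b for a, b in zip(w, ortho[j])]
        ortho.append(w)
    return ortho, mu

ortho, mu = gram_schmidt([[1, 3, 2], [2, -1, 3], [1, 0, 2]])
```

For this sample basis the orthogonalized vectors are pairwise orthogonal and the product of the squared norms ‖v∗_1‖²‖v∗_2‖²‖v∗_3‖² equals 9, the square of the determinant of the basis matrix, in agreement with the determinant formula established next.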
However, as we now prove, it turns out that the two bases span parallelepipeds of the same volume, so they determine the same determinant. Proposition 7.68. Let B = {v1, v2, . . . , vn} be a basis for a lattice L and let B∗ = {v∗_1, v∗_2, . . . , v∗_n} be the associated Gram–Schmidt orthogonal basis as described in Theorem 7.13. Then det(L) = ∏_{i=1}^{n} ‖v∗_i‖. Proof. Let F = F(v1, . . . , vn) be the matrix (7.11) described in Proposition 7.20. This is the matrix whose rows are the coordinates of v1, . . . , vn. The proposition tells us that det(L) = | det F|.
  • 670. 440 7. Lattices and Cryptography Let F∗ = F(v∗ 1, . . . , v∗ n) be the analogous matrix whose rows are the vec- tors v∗ 1, . . . , v∗ n. Then (7.53) tells us that the matrices F and F∗ are related by MF∗ = F, where M is the change of basis matrix M = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 1 0 0 · · · 0 0 μ2,1 1 0 · · · 0 0 μ3,1 μ3,2 1 · · · 0 0 . . . . . . . . . ... . . . μn−1,1 μn−1,2 μn−1,3 · · · 1 0 μn,1 μn,2 μn,3 · · · μn,n−1 1 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . Note that M is lower diagonal with 1’s on the diagonal, so det(M) = 1. Hence det(L) = | det F| = | det(MF∗ )| = |(det M)(det F∗ )| = | det F∗ | = n i=1 v∗ i . (The last equality follows from the fact that the v∗ i , which are the rows of F∗ , are pairwise orthogonal.) Definition. Let V be a vector space, and let W ⊂ V be a vector subspace of V . The orthogonal complement of W (in V ) is W⊥ = v ∈ V : v · w = 0 for all w ∈ W}. It is not hard to see that W⊥ is also a vector subspace of V and that every vector v ∈ V can be written as a sum v = w + w for unique vectors w ∈ W and w ∈ W⊥ . (See Exercise 7.46.) Using the notion of orthogonal complement, we can describe the intuition behind the Gram–Schmidt construction as follows: v∗ i = Projection of vi onto Span(v1, . . . , vi−1)⊥ . Although B∗ = {v∗ 1, v∗ 2, . . . , v∗ n} is not a basis for the original lattice L, we use the set B∗ of associated Gram–Schmidt vectors to define a concept that is crucial for the LLL algorithm. Definition. Let B = {v1, v2, . . . , vn} be a basis for a lattice L and let B∗ = {v∗ 1, v∗ 2, . . . , v∗ n} be the associated Gram–Schmidt orthogonal basis as described in Theorem 7.13. The basis B is said to be LLL reduced if it satisfies the following two conditions: (Size Condition) |μi,j| = |vi · v∗ j | v∗ j 2 ≤ 1 2 for all 1 ≤ j i ≤ n. (Lovász Condition) v∗ i 2 ≥ 3 4 − μ2 i,i−1
  • 672. 7.13. Lattice Reduction Algorithms 441 There are several different ways to state the Lovász condition. For example, it is equivalent to the inequality v∗ i + μi,i−1v∗ i−12 ≥ 3 4 v∗ i−12 , and it is also equivalent to the statement that / /Projection of vi onto Span(v1, . . . , vi−2)⊥ / / ≥ 3 4 / /Projection of vi−1 onto Span(v1, . . . , vi−2)⊥ / /. The fundamental result of Lenstra, Lenstra, and Lovász [77] says that an LLL reduced basis is a good basis and that it is possible to compute an LLL reduced basis in polynomial time. We start by showing that an LLL reduced basis has desirable properties, after which we describe the LLL lattice reduction algorithm. Theorem 7.69. Let L be a lattice of dimension n. Any LLL reduced basis {v1, v2, . . . , vn} for L has the following two properties: n i=1 vi ≤ 2n(n−1)/4 det L, (7.54) vj ≤ 2(i−1)/2 v∗ i for all 1 ≤ j ≤ i ≤ n. (7.55) Further, the initial vector in an LLL reduced basis satisfies v1 ≤ 2(n−1)/4 | det L|1/n and v1 ≤ 2(n−1)/2 min 0
≠v∈L ‖v‖. (7.56)

Thus an LLL reduced basis solves apprSVP to within a factor of 2^{(n−1)/2}.

Proof. The Lovász condition and the fact that |μ_{i,i−1}| ≤ 1/2 imply that

‖v∗_i‖² ≥ (3/4 − μ²_{i,i−1}) ‖v∗_{i−1}‖² ≥ (1/2) ‖v∗_{i−1}‖². (7.57)

Applying (7.57) repeatedly yields the useful estimate

‖v∗_j‖² ≤ 2^{i−j} ‖v∗_i‖². (7.58)

We now compute

‖v_i‖² = ‖v∗_i + Σ_{j=1}^{i−1} μ_{i,j} v∗_j‖²  from (7.53),
       = ‖v∗_i‖² + Σ_{j=1}^{i−1} μ²_{i,j} ‖v∗_j‖²  since v∗_1, . . . , v∗_n are orthogonal,
  • 675. 442 7. Lattices and Cryptography ≤ v∗ i 2 + i−1 j=1 1 4 v∗ j 2 since |μi,j| ≤ 1 2 , ≤ v∗ i 2 + i−1 j=1 2i−j−2 v∗ i 2 from (7.58), = 1 + 2i−1 2 v∗ i 2 ≤ 2i−1 v∗ i 2 since 1 ≤ 2i−1 for all i ≥ 1. (7.59) Multiplying (7.59) by itself for 1 ≤ i ≤ n yields n i=1 vi2 ≤ n i=1 2i−1 v∗ i 2 = 2n(n−1)/2 n i=1 v∗ i 2 = 2n(n−1)/2 (det L)2 , where for the last equality we have used Proposition 7.68. Taking square roots completes the proof of (7.54). Next, for any j ≤ i, we use (7.59) (with i = j) and (7.58) to estimate vj2 ≤ 2j−1 v∗ j 2 ≤ 2j−1 · 2i−j v∗ i 2 = 2i−1 v∗ i 2 . Taking square roots gives (7.55). Now we set j = 1 in (7.55), multiply over 1 ≤ i ≤ n, and use Proposi- tion 7.68 to obtain v1n ≤ n i=1 2(i−1)/2 v∗ i = 2n(n−1)/4 n i=1 v∗ i = 2n(n−1)/4 det L. Taking nth roots gives the first estimate in (7.56). To prove the second estimate, let v ∈ L be a nonzero lattice vector and write v = i j=1 ajvj = i j=1 bjv∗ j with ai = 0. Note that a1, . . . , ai are integers, while bi, . . . , bi are real numbers. In particular, |ai| ≥ 1. By construction, for any k we know that the vectors v∗ 1, . . . , v∗ k are pairwise orthogonal, and we proved (Theorem 7.13) that they span the same space as the vectors v1, . . . , vk. Hence v · v∗ i = aivi · v∗ i = biv∗ i · v∗ i and vi · v∗ i = v∗ i · v∗ i , from which we conclude that ai = bi. Therefore |bi| = |ai| ≥ 1, and using this and (7.55) (with j = 1) gives the estimate v2 = i j=1 b∗ j v∗ j 2 ≥ b2 i v∗ i 2 ≥ v∗ i 2 ≥ 2−(i−1) v12 ≥ 2−(n−1) v12 . Taking square roots gives the second estimate in (7.56).
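An LLL reduced basis can be recognized mechanically. The following Python sketch (an illustrative helper of our own, not an algorithm from the text) checks the Size and Lovász conditions of the definition for an integer basis, using exact rational arithmetic to avoid round-off.

```python
from fractions import Fraction

def _gram_schmidt(basis):
    # Gram-Schmidt orthogonalization (7.53) with exact rationals.
    n = len(basis)
    ortho, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        w = [Fraction(x) for x in basis[i]]
        for j in range(i):
            nsq = sum(x * x for x in ortho[j])
            mu[i][j] = sum(Fraction(a) * b
                           for a, b in zip(basis[i], ortho[j])) / nsq
            w = [a - mu[i][j] * b for a, b in zip(w, ortho[j])]
        ortho.append(w)
    return ortho, mu

def is_lll_reduced(basis):
    """Check the Size Condition |mu[i][j]| <= 1/2 for j < i and the
    Lovasz Condition ||v*_i||^2 >= (3/4 - mu[i][i-1]^2) ||v*_{i-1}||^2."""
    ortho, mu = _gram_schmidt(basis)
    n = len(basis)
    half = Fraction(1, 2)
    for i in range(n):
        for j in range(i):
            if abs(mu[i][j]) > half:
                return False
    for i in range(1, n):
        lhs = sum(x * x for x in ortho[i])
        rhs = (Fraction(3, 4) - mu[i][i - 1] ** 2) * sum(x * x for x in ortho[i - 1])
        if lhs < rhs:
            return False
    return True
```

The Gauss-reduced basis {(2280, −1001), (−1324, −2376)} from Example 7.67 passes both conditions, while a badly skewed basis such as {(1, 0), (1000, 1)} fails the size condition.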
  • 676. 7.13. Lattice Reduction Algorithms 443 Remark 7.70. Before describing the technicalities of the LLL algorithm, we make some brief remarks indicating the general underlying idea. Given a basis {v1, v2, . . . , vn}, it is easy to form a new basis that satisfies the Size Condition. Roughly speaking, we do this by subtracting from vk appropriate integer multiples of the previous vectors v1, . . . , vk−1 so as to make vk smaller. In the LLL algorithm, we do this in stages, rather than all at once, and we’ll see that the size reduction condition depends on the ordering of the vectors. After doing size reduction, we check to see whether the Lovász condition is satisfied. If it is, then we have a (nearly) optimal ordering of the vectors. If not, then we reorder the vectors and do further size reduction. For simplicity, and because it is the case that we need, we state and analyze the LLL algorithm for lattices in Zn . See Exercise 7.54 for the general case. Theorem 7.71 (LLL Algorithm). Let {v1, . . . , vn} be a basis for a lattice L that is contained in Zn . The algorithm described in Fig. 7.8 terminates in a finite number of steps and returns an LLL reduced basis for L. More precisely, let B = max vi. Then the algorithm executes the main k loop (Steps [4–14]) no more than O(n2 log n+n2 log B) times. In particular, the LLL algorithm is a polynomial-time algorithm. Remark 7.72. The problem of efficiently implementing the LLL algorithm presents many challenges. First, size reduction and the Lovász condition use the Gram–Schmidt orthogonalized basis v∗ 1, . . . , v∗ n and the associated projec- tion factors μi,j = vi · v∗ j /v∗ j 2 . In an efficient implementation of the LLL algorithm, one should compute these quantities as needed and store them for future use, recomputing only when necessary. We have not addressed this is- sue in Fig. 
7.8, since it is not relevant for understanding the LLL algorithm, nor for proving that it returns an LLL reduced basis in polynomial time. See Exercise 7.50 for a more efficient version of the LLL algorithm. Another major challenge arises from the fact that if one attempts to perform LLL reduction on an integer lattice using exact values, the intermediate calculations involve enormous numbers. Thus in working with lattices of high dimension, it is generally necessary to use floating point approximations, which leads to problems with round-off errors. We do not have space here to discuss this practical difficulty, but the reader should be aware that it exists. Remark 7.73. Before embarking on the somewhat technical proof of Theorem 7.71, we discuss the intuition behind the swap step (Step [11]). The swap step is executed when the Lovász condition fails for vk, so

‖Projection of vk onto Span(v1, . . . , vk−2)⊥‖² < (3/4) ‖Projection of vk−1 onto Span(v1, . . . , vk−2)⊥‖². (7.60)

The goal of LLL is to produce a list of short vectors in increasing order of length. For each 1 ≤ ℓ ≤ n, let L_ℓ denote the lattice spanned by v1, . . . , v_ℓ.
[1]  Input a basis {v1, . . . , vn} for a lattice L
[2]  Set k = 2
[3]  Set v∗_1 = v1
[4]  Loop while k ≤ n
[5]    Loop Down j = k − 1, k − 2, . . . , 2, 1
[6]      Set vk = vk − ⌊μ_{k,j}⌉ vj                         [Size Reduction]
[7]    End j Loop
[8]    If ‖v∗_k‖² ≥ (3/4 − μ²_{k,k−1}) ‖v∗_{k−1}‖²          [Lovász Condition]
[9]      Set k = k + 1
[10]   Else
[11]     Swap v_{k−1} and v_k                               [Swap Step]
[12]     Set k = max(k − 1, 2)
[13]   End If
[14] End k Loop
[15] Return LLL reduced basis {v1, . . . , vn}

Note: At each step, v∗_1, . . . , v∗_k is the orthogonal set of vectors obtained by applying Gram–Schmidt (Theorem 7.13) to the current values of v1, . . . , vk, and μ_{i,j} is the associated quantity (v_i · v∗_j)/‖v∗_j‖².

Figure 7.8: The LLL lattice reduction algorithm

Note that as LLL progresses, the sublattices L_ℓ change due to the swap step; only L_n remains the same, since it is the entire lattice. What LLL attempts to do is to find an ordering of the basis vectors (combined with size reductions whenever possible) that minimizes the determinants det(L_ℓ), i.e., LLL attempts to minimize the volumes of the fundamental domains of the sublattices L_1, . . . , L_n. If the number 3/4 in (7.60) is replaced by the number 1, then the LLL algorithm does precisely this; it swaps vk and vk−1 whenever doing so reduces the value of det L_{k−1}. Unfortunately, if we use 1 instead of 3/4, then it is an open problem whether the LLL algorithm terminates in polynomial time. If we use 3/4, or any other constant strictly less than 1, then LLL runs in polynomial time, but we may miss an opportunity to reduce the size of a determinant by passing up a swap. For example, in the very first step, we swap only if ‖v2‖ < (3/4)‖v1‖, while we could reduce the determinant by swapping whenever ‖v2‖ < ‖v1‖. In practice, one often takes a constant larger than 3/4, but less than 1, in the Lovász condition. (See Exercise 7.51.) Note that an immediate effect of swapping at stage k is (usually) to make the new value of μ_{k,k−1} larger.
This generally allows us to size reduce the
  • 678. 7.13. Lattice Reduction Algorithms 445 new vk using the new vk−1, so swapping results in additional size reduction among the basis vectors, making them more orthogonal. Proof (sketch) of Theorem 7.71. For simplicity, and because it is the case that we need, we will assume that L ⊂ Zn is a lattice whose vectors have integral coordinates. It is clear that if the LLL algorithm terminates, then it terminates with an LLL reduced basis, since the j-loop (Steps [5–7]) ensures that the basis satisfies the size condition, and the fact that k = n + 1 on termination means that every vector in the basis has passed the Lovász condition test in Step [8]. However, it is not clear that the algorithm actually terminates, because the k-increment in Step [9] is offset by the k-decrement in Step [12]. What we will do is show that Step [12] is executed only a finite number of times. Since either Step [9] or Step [12] is executed on each iteration of the k-loop, this ensures that k eventually becomes larger than n and the algorithm terminates. Let v1, . . . , vn be a basis of L and let v∗ 1, . . . , v∗ n be the associated Gram– Schmidt orthogonalized basis from Theorem 7.13. For each = 1, 2, . . . , n, we let L = lattice spanned by v1,. . . ,v, and we define quantities d = i=1 v∗ i 2 and D = n =1 d = n i=1 v∗ i 2(n+1−i) . Using an argument similar to the proof of Theorem 7.68, one can show that det(L)2 = d; see Exercise 7.14(b,d). During the LLL algorithm, the value of D changes only when we execute the swap step (Step [11]). More precisely, when [11] is executed, the only d that changes is dk−1, since if k − 1, then d involves neither v∗ k−1 nor v∗ k, while if ≥ k, then the product defining d includes both v∗ k−1 and v∗ k, so the product doesn’t change if we swap them. We can estimate the change in dk−1 by noting that when [11] is executed, the Lovász condition in Step [8] is false, so we have v∗ k2 3 4 − μ2 k,k−1
‖v∗_{k−1}‖² ≤ (3/4) ‖v∗_{k−1}‖². Hence the effect of swapping v∗_k and v∗_{k−1} in Step [11] is to change the value of d_{k−1} as follows:

d_{k−1}^{new} = ‖v∗_1‖² · ‖v∗_2‖² · · · ‖v∗_{k−2}‖² · ‖v∗_k‖²
             = ‖v∗_1‖² · ‖v∗_2‖² · · · ‖v∗_{k−2}‖² · ‖v∗_{k−1}‖² · (‖v∗_k‖² / ‖v∗_{k−1}‖²)
             = d_{k−1}^{old} · ‖v∗_k‖² / ‖v∗_{k−1}‖²
             ≤ (3/4) d_{k−1}^{old}.
  • 680. 446 7. Lattices and Cryptography Hence if the swap step [11] is executed N times, then the value of D is reduced by a factor of at least (3/4)N , since each swap reduces the value of some d by at least a factor of 3/4 and D is the product of all of the d’s. Since we have assumed that the lattice L is contained in Zn , the basis vectors v1, . . . , v of L have integer coordinates. It follows from the definition of d and Exercise 7.14(d) that d = i=1 v∗ i 2 = det vi · vj 1≤i,j≤ , which shows d is a positive integer. Hence D = n =1 d ≥ 1. (7.61) Hence D is bounded away from 0 by a constant depending only on the di- mension of the lattice L, so it can be multiplied by 3/4 only a finite number of times. This proves that the LLL algorithm terminates. In order to give an upper bound on the running time, we do some fur- ther estimations. Let Dinit denote the initial value of D for the original basis, let Dfinal denote the value of D for the basis when the LLL algorithm termi- nates, and as above, let N denote the number of times that the swap step (Step [11]) is executed. (Note that the k loop is executed at most 2N + n times, so it suffices to find a bound for N.) The lower bound for D is valid for every basis produced during the execution of the algorithm, so by our earlier results we know that 1 ≤ Dfinal ≤ (3/4)N Dinit. Taking logarithms yields (note that log(3/4) 1) N = O(log Dinit). To complete the proof, we need to estimate the size of Dinit. But this is easy, since by the Gram–Schmidt construction we certainly have v∗ i ≤ vi, so Dinit = n i=1 v∗ i n+1−i ≤ n i=1 vin+1−i ≤ max 1≤i≤n vi 2(1+2+···+n) = Bn2 +n . Hence log Dinit = O(n2 log B). Remark 7.74. Rather than counting the number of times that the main loop is executed, we might instead count the number of basic arithmetic operations required by LLL. This means counting how many times the internal j-loop is executed and also how many times we perform operations on the coordi- nates of a vector. 
For example, adding two vectors or multiplying a vector by a constant is n basic operations. Counted in this way, it is proven in [77] that the LLL algorithm (if efficiently implemented) terminates after no more than O n6 (log B)3 basic operations.
  • 681. 7.13. Lattice Reduction Algorithms 447 Example 7.75. We illustrate the LLL algorithm on the 6-dimensional lattice L with (ordered) basis given by the rows of the matrix M = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 19 2 32 46 3 33 15 42 11 0 3 24 43 15 0 24 4 16 20 44 44 0 18 15 0 48 35 16 31 31 48 33 32 9 1 29 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . The smallest vector in this basis is v2 = 51.913. The output from LLL is the basis consisting of the rows of the matrix MLLL = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 7 −12 −8 4 19 9 −20 4 −9 16 13 16 5 2 33 0 15 −9 −6 −7 −20 −21 8 −12 −10 −24 21 −15 −6 −11 7 4 −9 −11 1 31 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . We check that both matrices have the same determinant, det(M) = det(MLLL ) = ±777406251. Further, as expected, the LLL reduced matrix has a much better (i.e., larger) Hadamard ratio than the original matrix, H(M) = 0.46908 and H(MLLL ) = 0.88824, so the vectors in the LLL basis are more orthogonal. (The Hadamard ratio is defined in Remark 7.27.) The smallest vector in the LLL reduced basis is v1 = 26.739, which is a significant improvement over the original basis. This may be compared with the Gaussian expected shortest length (Remark 7.32) of σ(L) = (3! det L)1/3 / √ π = 23.062. The LLL algorithm executed 19 swap steps (Step [11] in Fig. 7.8). The sequence of k values from start to finish was 2, 2, 3, 2, 3, 4, 3, 2, 2, 3, 4, 5, 4, 3, 2, 3, 4, 5, 4, 3, 4, 5, 6, 5, 4, 3, 4, 5, 6, 5, 4, 3, 2, 2, 3, 2, 3, 4, 5, 6. Notice how the algorithm almost finished twice (it got to k = 6) before finally terminating the third time. This illustrates how the value of k moves up and down as the algorithm proceeds. We next reverse the order of the rows of M and apply LLL. Then LLL executes only 11 swap steps and gives the basis MLLL = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ −7 12 8 −4 −19 −9 20 −4 9 −16 −13 −16 −28 11 12 −9 17 −14 −6 −7 −20 −21 8 −12 −7 −4 9 11 −1 −31 10 24 −21 15 6 11 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ .
  • 682. 448 7. Lattices and Cryptography We find the same smallest vector, but the Hadamard ratio H(MLLL ) = 0.878973 is a bit lower, so the basis isn’t quite as good. This illustrates the fact that the output from LLL is dependent on the order of the basis vectors. We also ran LLL with the original matrix, but using 0.99 instead of 3 4 in the Lovász Step [8]. The algorithm did 22 swap steps, which is more than the 19 swap steps required using 3 4 . This is not surprising, since increasing the constant makes the Lovász condition more stringent, so it is harder for the al- gorithm to get to the k-increment step. Using 0.99, the LLL algorithm returns the basis MLLL = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ −7 12 8 −4 −19 −9 −20 4 −9 16 13 16 6 7 20 21 −8 12 −28 11 12 −9 17 −14 −7 −4 9 11 −1 −31 −10 −24 21 −15 −6 −11 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . Again we get the same smallest vector, but now the basis has H(MLLL ) = 0.87897. This is actually slightly worse than the basis obtained using 3 4 , again illustrating the unpredictable dependence of the LLL algorithm’s output on its parameters. 7.13.3 Using LLL to Solve apprCVP We explained in Sect. 7.6 that if a lattice L has an orthogonal basis, then it is very easy to solve both SVP and CVP. The LLL algorithm does not return an orthogonal basis, but it does produce a basis in which the basis vectors are quasi-orthogonal, i.e., they are reasonably orthogonal to one another. Thus we can combine the LLL algorithm (Fig. 7.8) with Babai’s algorithm (Theo- rem 7.34) to form an algorithm that solves apprCVP. Theorem 7.76 (LLL apprCVP Algorithm). There is a constant C such that for any lattice L of dimension n given by a basis v1, . . . , vn, the following algorithm solves apprCVP to within a factor of Cn . Apply LLL to v1, . . . , vn to find an LLL reduced basis. Apply Babai’s algorithm using the LLL reduced basis. Proof. We leave the proof for the reader; see Exercise 7.52. Remark 7.77. In [8], Babai suggested two ways to use LLL as part of an ap- prCVP algorithm. 
The first method uses the closest vertex algorithm that we described in Theorem 7.34. The second method uses the closest plane algo- rithm. Combining the closest plane method with an LLL reduced basis tends to give a better result than using the closest vertex method. See Exercise 7.53 for further details.
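To make Theorem 7.76 concrete, here is a Python sketch of the closest vertex step (Babai's rounding algorithm, Theorem 7.34) applied to a given basis; the real coordinates of the target are found by exact rational Gaussian elimination. The function names are our own, and the numbers in the example are those of the GGH attack in Sect. 7.14.3.

```python
import math
from fractions import Fraction

def solve(A, b):
    """Solve t . A = b exactly, where the rows of A are the basis vectors."""
    n = len(A)
    # Equations indexed by coordinate i: sum_j t_j * A[j][i] = b[i].
    M = [[Fraction(A[j][i]) for j in range(n)] + [Fraction(b[i])]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def babai_closest_vertex(basis, target):
    """Babai's rounding algorithm: write target = sum t_i v_i over R,
    round each t_i to the nearest integer, and return the lattice point."""
    t = solve(basis, target)
    m = [math.floor(x + Fraction(1, 2)) for x in t]
    return [sum(m[i] * basis[i][j] for i in range(len(basis)))
            for j in range(len(target))]
```

Run with the quasi-orthogonal basis {(36, −30, −86), (61, 11, 67), (−10, 102, −40)} and the GGH ciphertext e = (−79081427, −35617462, 11035473), it recovers the nearby lattice point (−79081423, −35617459, 11035471) found in Sect. 7.14.3.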
  • 683. 7.13. Lattice Reduction Algorithms 449 7.13.4 Generalizations of LLL There have been many improvements to and generalizations of the LLL al- gorithm. Most of these methods involve trading increased running time for improved output. We briefly describe two of these improvements in order to give the reader some idea of how they work and the trade-offs involved. For further reading, see [71, 115, 116, 117, 118, 119]. The first variant of LLL is called the deep insertion method. In standard LLL, the swap step involves switching vk and vk−1, which then usually allows some further size reduction of the new vk. In the deep insertion method, one instead inserts vk between vi−1 and vi, where i is chosen to allow a large amount of size reduction. In the worst case, the resulting algorithm may no longer terminate in polynomial time, but in practice, when run on most lattices, LLL with deep insertions runs quite rapidly and often returns a significantly better basis than basic LLL. The second variant of LLL is based on the notion of a Korkin–Zolotarev reduced basis. For any list of vectors v1, v2, . . . and any i ≥ 1, let v∗ 1, v∗ 2, . . . denote the associated Gram–Schmidt orthogonalized vectors and define a map π : L −→ Rn , πi(v) = v − i j=1 v · v∗ j v∗ j 2 v∗ j . (We also define π0 to be the identity map, π0(v) = v.) Geometrically, we may describe πi as the projection map πi : L −→ Span(v1, . . . , vi)⊥ ⊂ Rn from L onto the orthogonal complement of the space spanned by v1, . . . , vi. Definition. Let L be a lattice. A basis v1, . . . , vn for L is called Korkin– Zolotarev (KZ) reduced if it satisfies the following three conditions: 1. v1 is a shortest nonzero vector in L. 2. For i = 2, 3, . . . , n, the vector vi is chosen such that πi−1(vi) is the shortest nonzero vector in πi−1(L). 3. For all 1 ≤ i j ≤ n, we have πi−1(vi) · πi−1(vj) ≤ 1 2 / /πi−1(vi) / /2 . A KZ-reduced basis is generally much better than an LLL-reduced basis. 
In particular, the first vector in a KZ-reduced basis is always a solution to SVP. Not surprisingly, the fastest known methods to find a KZ-reduced basis take time that is exponential in the dimension. The block Korkin–Zolotarev variant of the LLL algorithm, which is abbre- viated BKZ-LLL, replaces the swap step in the standard LLL algorithm by a block reduction step. One way to view the “swap and size reduction” pro- cess in LLL is Gaussian lattice reduction on the 2-dimensional lattice spanned
  • 684. 450 7. Lattices and Cryptography by vk−1 and vk. In BKZ-LLL, one works instead with a block of vectors of length β, say vk, vk+1, . . . , vk+β−1, and one replaces the vectors in this block with a KZ-reduced basis spanning the same sublattice. If β is large, there is an obvious disadvantage in that it takes a long time to compute a KZ-reduced basis. Compensating for this extra time is the fact that the eventual output of the algorithm is improved, both in theory and in practice. Theorem 7.78. If the BKZ-LLL algorithm is run on a lattice L of dimen- sion n using blocks of size β, then the algorithm is guaranteed to terminate in no more than O(βcβ nd ) steps, where c and d are small constants. Further, the smallest vector v1 found by the algorithm is guaranteed to satisfy v1 ≤ β πe
  • 686. =v∈L v. Remark 7.79. Theorem 7.78 says that BKZ-LLL solves apprSVP to within a factor of approximately βn/β . This may be compared with standard LLL, which solves apprSVP to within a factor of approximately 2n/2 . As β increases, the accuracy of BKZ-LLL increases, at the cost of increased running time. However, if we want to solve apprSVP to within, say, O(nδ ) for some fixed exponent δ and large dimension n, then we need to take β ≈ n/δ, so the running time of BKZ-LLL becomes exponential in n. And although these are just worst-case running time estimates, experimental evidence also leads to the conclusion that using BKZ-LLL to solve apprSVP to within O(nδ ) requires a block size that grows linearly with n, and hence has a running time that grows exponentially in n. 7.14 Applications of LLL to Cryptanalysis The LLL algorithm has many applications to cryptanalysis, ranging from attacks on knapsack public key cryptosystems to more recent analysis of lattice-based cryptosystems such as Ajtai–Dwork, GGH, and NTRU. There are also lattice reduction attacks on RSA in certain situations, see for exam- ple [19, 18, 32, 33, 58]. Finally, we want to stress that LLL and its general- izations have a wide variety of applications in pure and applied mathematics outside of their uses in cryptography. In this section we illustrate the use of LLL in the cryptanalysis of the four cryptosystems (congruential, knapsack, GGH, NTRU) described earlier in this chapter. We note that LLL has no trouble breaking the examples in this section because the dimensions that we use are so small. In practice, secure instances of these cryptosystems require lattices of dimension 500–1000, which, except for NTRUEncrypt, lead to impractical key lengths.
  • 687. 7.14. Applications of LLL to Cryptanalysis 451 7.14.1 Congruential Cryptosystems Recall the congruential cipher described in Sect. 7.1. Alice chooses a modulus q and two small secret integers f and g, and her public key is the integer h ≡ f−1 g (mod q). Eve knows the public values of q and h, and she wants to recover the private key f. One way for Eve to find the private key is to look for small vectors in the lattice L generated by v1 = (1, h) and v2 = (0, q), since as we saw, the vector (f, g) is in L, and given the size constraints on f and g, it is likely to be the shortest nonzero vector in L. We illustrate by breaking Example 7.1. In that example, q = 122430513841 and h = 39245579300. We apply Gaussian lattice reduction (Proposition 7.66) to the lattice gener- ated by (1, 39245579300) and (0, 122430513841). The algorithm takes 11 iterations to find the short basis (−231231, −195698) and (−368222, 217835). Up to an irrelevant change of sign, this gives Alice’s private key f = 231231 and g = 195698. 7.14.2 Applying LLL to Knapsacks In Sect. 7.2 we described how to reformulate a knapsack (subset-sum) prob- lem described by M = (m1, . . . , mn) and S as a lattice problem using the lattice LM,S with basis given by the rows of the matrix (7.4) on page 383. We further explained in Example 7.33 why the target vector t ∈ LM,S, which has length t = √ n, is probably about half the size of all other nonzero vectors in LM,S. We illustrate the use of the LLL algorithm to solve the knapsack problem M = (89, 243, 212, 150, 245) and S = 546 considered in Example 7.7. We apply LLL to the lattice generated by the rows of the matrix AM,S = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 2 0 0 0 0 89 0 2 0 0 0 243 0 0 2 0 0 212 0 0 0 2 0 150 0 0 0 0 2 245 1 1 1 1 1 546 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ .
  • 688. 452 7. Lattices and Cryptography LLL performs 21 swaps and returns the reduced basis ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ −1 1 −1 1 −1 0 1 −1 −1 1 −1 −1 −1 −1 −1 1 1 2 1 −1 −1 −1 −1 2 −2 −2 4 0 −2 0 −6 −4 −6 −6 0 −3 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . We write the short vector (−1, 1, −1, 1, −1, 0) in the top row as a linear combination of the original basis vectors given by the rows of the matrix AM,S, (−1, 1, −1, 1, −1, 0) = (−1, 0, −1, 0, −1, 1)AM,S. The vector (−1, 0, −1, 0, −1, 1) gives the solution to the knapsack problem, −89 − 212 − 245 + 546 = 0. Remark 7.80. When using LLL to solve subset-sum problems, it is often help- ful to multiply m1, . . . , mn, S by a large constant C. This has the effect of multiplying the last column of the matrix (7.4) by C, so the determinant is multiplied by C and the Gaussian expected shortest vector is multiplied by C1/(n+1) . The target vector t still has length √ n, so if C is large, the target vector becomes much smaller than the likely next shortest vector. This tends to make it easier for LLL to find t. 7.14.3 Applying LLL to GGH We apply LLL to Example 7.36, in which the Alice’s public lattice L is gen- erated by the rows w1, w2, w3 of the matrix ⎛ ⎝ −4179163 −1882253 583183 −3184353 −1434201 444361 −5277320 −2376852 736426 ⎞ ⎠ and Bob’s encrypted message is e = (−79081427, −35617462, 11035473). Eve wants to find a vector in L that is close to e. She first applies LLL (Theorem 7.71) to the lattice L and finds the quasi-orthogonal basis ⎛ ⎝ 36 −30 −86 61 11 67 −10 102 −40 ⎞ ⎠ .
  • 689. 7.14. Applications of LLL to Cryptanalysis 453 This basis has Hadamard ratio H = 0.956083, which is even better than Alice’s good basis. Eve next applies Babai’s algorithm (Theorem 7.34) to find a lattice vector v = (−79081423, −35617459, 11035471) that is very close to e. Finally she writes v in terms of the original lattice vectors, v = 86w1 − 35w2 − 32w3, which retrieves Bob’s plaintext m = (86, −35, −32). 7.14.4 Applying LLL to NTRU We apply LLL to the NTRU cryptosystem described in Example 7.53. Thus N = 7, q = 41, and the public key is the polynomial h(x) = 30 + 26x + 8x2 + 38x3 + 2x4 + 40x5 + 20x6 . As explained in Sect. 7.11, the associated NTRU lattice is generated by the rows of the matrix MNTRU h = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 1 0 0 0 0 0 0 30 26 8 38 2 40 20 0 1 0 0 0 0 0 20 30 26 8 38 2 40 0 0 1 0 0 0 0 40 20 30 26 8 38 2 0 0 0 1 0 0 0 2 40 20 30 26 8 38 0 0 0 0 1 0 0 38 2 40 20 30 26 8 0 0 0 0 0 1 0 8 38 2 40 20 30 26 0 0 0 0 0 0 1 26 8 38 2 40 20 30 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 41 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . Eve applies LLL reduction to MNTRU h . The algorithm performs 96 swap steps and returns the LLL reduced matrix MNTRU red = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 1 0 −1 1 0 −1 −1 −1 0 −1 0 1 1 0 0 1 1 −1 0 1 −1 −1 −1 0 1 0 1 0 −1 1 0 −1 −1 1 0 −1 0 1 1 0 −1 0 −1 −1 1 0 −1 1 0 1 0 −1 0 −1 0 1 −1 1 0 −1 1 0 −1 0 −1 0 −1 0 1 1 −1 −1 −1 −1 −1 −1 −1 0 0 0 0 0 0 0 0 1 0 1 0 −1 1 −1 −1 0 0 2 0 0 −8 −1 0 9 0 −1 0 −4 2 6 0 −4 7 −7 8 1 0 0 −8 −1 2 0 −5 8 −7 −3 1 6 0 −9 −2 1 9 −1 0 −6 −3 2 5 0 −5 7 0 8 0 −9 −1 −8 8 2 7 −11 3 −5 2 2 1 0 0 9 2 −1 −9 5 −7 6 3 −2 −5 0 −2 1 9 −1 0 0 −9 2 5 0 −5 7 −6 −3 3 2 3 3 −6 2 −6 11 6 8 0 9 5 2 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . We can compare the relative quasi-orthogonality of the original and the reduced bases by computing the Hadamard ratios,
  • 690. 454 Exercises H(MNTRU h ) = 0.1184 and H(MNTRU red ) = 0.8574. The smallest vector in the reduced basis is the top row of the reduced matrix, (1, 0, −1, 1, 0, −1, −1, −1, 0, −1, 0, 1, 1, 0). Splitting this vector into two pieces gives polynomials f (x) = 1 − x2 + x3 − x5 − x6 and g (x) = −1 − x2 + x4 + x5 . Note that f (x) and g (x) are not the same as Alice’s original private key polynomials f(x) and g(x) from Example 7.53. However, they are simple rotations of Alice’s key, f (x) = −x3 f(x) and g (x) = −x3 g(x), so Eve can use f (x) and g (x) to decrypt messages. Exercises Section 7.1. A Congruential Public Key Cryptosystem 7.1. Alice uses the congruential cryptosystem with q = 918293817 and private key (f, g) = (19928, 18643). (a) What is Alice’s public key h? (b) Alice receives the ciphertext e = 619168806 from Bob. What is the plaintext? (c) Bob sends Alice a second message by encrypting the plaintext m = 10220 using the random element r = 19564. What is the ciphertext that Bob sends to Alice? Section 7.2. Subset-Sum Problems and Knapsack Cryptosystems 7.2. Use the algorithm described in Proposition 7.5 to solve each of the following subset-sum problems. If the “solution” that you get is not correct, explain what went wrong. (a) M = (3, 7, 19, 43, 89, 195), S = 260. (b) M = (5, 11, 25, 61, 125, 261), S = 408. (c) M = (2, 5, 12, 28, 60, 131, 257), S = 334. (d) M = (4, 12, 15, 36, 75, 162), S = 214. 7.3. Alice’s public key for a knapsack cryptosystem is M = (5186, 2779, 5955, 2307, 6599, 6771, 6296, 7306, 4115, 637). Eve intercepts the encrypted message S = 4398. She also breaks into Alice’s com- puter and steals Alice’s secret multiplier A = 4392 and secret modulus B = 8387. Use this information to find Alice’s superincreasing private sequence r and then decrypt the message.
  • 691. Exercises 455 7.4. Proposition 7.3 gives an algorithm that solves an n-dimensional knapsack problem in O(2n/2 ) steps, but it requires O(2n/2 ) storage. Devise an algorithm, similar to Pollard’s ρ algorithm (Sect. 5.5), that takes O(2n/2 ) steps, but requires only O(1) storage. Section 7.3. A Brief Review of Vector Spaces 7.5. (a) Let B = {(1, 3, 2), (2, −1, 3), (1, 0, 2)}, B = {(−1, 0, 2), (3, 1, −1), (1, 0, 1)}. Each of the sets B and B is a basis for R3 . Find the change of basis matrix that transforms B into B. (b) Let v = (2, 3, 1) and w = (−1, 4, −2). Compute the lengths v and w and the dot product v · w. Compute the angle between v and w. 7.6. Use the Gram–Schmidt algorithm (Theorem 7.13) to find an orthogonal basis from the given basis. (a) v1 = (1, 3, 2), v2 = (4, 1, −2), v3 = (−2, 1, 3). (b) v1 = (4, 1, 3, −1), v2 = (2, 1, −3, 4), v3 = (1, 0, −2, 7). Section 7.4. Lattices: Basic Definitions and Properties 7.7. Let L be the lattice generated by {(1, 3, −2), (2, 1, 0), (−1, 2, 5)}. Draw a picture of a fundamental domain for L and find its volume. 7.8. Let L ⊂ Rm be an additive subgroup with the property that there is a positive constant 0 such that L ∩ w ∈ Rm : w = {0}. Prove that L is discrete, and hence is a lattice. (In other words, show that in the definition of discrete subgroup, it suffices to check that (7.8) is true for the single vector v = 0.) 7.9. Prove that a subset of Rm is a lattice if and only if it is a discrete additive subgroup. 7.10. This exercise describes a result that you may have seen in your linear algebra course. Let A be an n-by-n matrix with entries aij, and for each pair of indices i and j, let Aij denote the (n − 1)-by-(n − 1) matrix obtained by deleting the ith row of A and the jth column of A. Define a new matrix B whose ijth entry bij is given by the formula bij = (−1)i+j det(Aji). (Note that bij is the determinant of the submatrix Aji, i.e., the indices are reversed.) The matrix B is called the adjoint of A. 
(a) Prove that AB = BA = det(A)In, where In is the n-by-n identity matrix.
  • 692. 456 Exercises (b) Deduce that if det(A) = 0, then A−1 = 1 det(A) B. (c) Suppose that A has integer entries. Prove that A−1 exists and has integer entries if and only if det(A) = ±1. (d) For those who know ring theory from Sect. 2.10 or from some other source, suppose that A has entries in a ring R. Prove that A−1 exists and has entries in R if and only if det(A) is a unit in R. 7.11. Recall from Remark 7.16 that the general linear group GLn(Z) is the group of n-by-n matrices with integer coefficients and determinant ±1. Let A and B be matrices in GLn(Z). (a) Prove that AB ∈ GLn(Z). (b) Prove that A−1 ∈ GLn(Z). (c) Prove that the n-by-n identity matrix is in GLn(Z). (d) Prove that GLn(Z) is a group. (Hint. You have already done most of the work in proving (a), (b), and (c). For the associative law, either prove it directly or use the fact that you know that it is true for matrices with real coefficients.) (e) Is GLn(Z) a commutative group? 7.12. Which of the following matrices are in GLn(Z)? Find the inverses of those matrices that are in GLn(Z). (a) A1 = 3 1 2 2 (b) A2 = 3 −2 2 −1 (c) A3 = ⎛ ⎝ 3 2 2 2 1 2 −1 3 1 ⎞ ⎠ (d) A4 = ⎛ ⎝ −3 −1 2 1 −3 −1 3 0 −2 ⎞ ⎠ 7.13. Let L be the lattice given by the basis B = (3, 1, −2), (1, −3, 5), (4, 2, 1) . Which of the following sets of vectors are also bases for L? For those that are, express the new basis in terms of the basis B, i.e., find the change of basis matrix. (a) B1 = {(5, 13, −13), (0, −4, 2), (−7, −13, 18)}. (b) B2 = {(4, −2, 3), (6, 6, −6), (−2, −4, 7)}. 7.14. Let L ⊂ Rm be a lattice of dimension n and let v1, . . . , vn be a basis for L. Note that we are allowing n to be smaller than m. The Gram matrix of v1, . . . , vn is the matrix Gram(v1, . . . , vn) = vi · vj 1≤i,j≤n . (a) Let F(v1, . . . , vn) be the matrix (7.11) described in Proposition (7.20), except that now F(v1, . . . , vn) is an n-by-m matrix, so it need not be square. Prove that Gram(v1, . . . , vn) = F(v1, . . . , vn)F(v1, . . . 
, vn)^t, where F(v1, . . . , vn)^t denotes the transpose matrix, i.e., the matrix with rows and columns interchanged.
  • 693. Exercises 457 (b) Prove that det Gram(v1, . . . , vn) = det(L)2 , (7.62) where note that det(L) is the volume of the parallelepiped spanned by any basis for L. (You may find it easier to first do the case n = m.) (c) Let L ⊂ R4 be the 3-dimensional lattice with basis v1 = (1, 0, 1, −1), v2 = (1, 2, 0, 4), v3 = (1, −1, 2, 1). Compute the Gram matrix of this basis and use it to compute det(L). (d) Let v∗ 1, . . . , v∗ n be the Gram–Schmidt orthogonalized vectors (Theorem 7.13) associated to v1, . . . , vn. Prove that det Gram(v1, . . . , vn) = v∗ 12 v∗ 22 · · · v∗ n2 . Section 7.5. The Shortest and Closest Vector Problems 7.15. Let L be a lattice and let F be a fundamental domain for L. This exercise sketches a proof that lim R→∞ # BR(0) ∩ L Vol BR(0) = 1 Vol(F) . (7.63) (a) Consider the translations of F that are entirely contained within BR(0), and also those that have nontrivial intersection with BR(0). Prove the inclusion of sets v∈L F+v⊂BR(0) (F + v) ⊂ BR(0) ⊂ v∈L (F+v)∩BR(0)=∅ (F + v). (b) Take volumes in (a) and prove that # v ∈ L : F + v ⊂ BR(0) · Vol(F) ≤ Vol BR(0) ≤ # v ∈ L : (F + v) ∩ BR(0) = ∅ · Vol(F). (Hint. Proposition 7.18 says that the different translates of F are disjoint.) (c) Prove that the number of translates F + v that intersect BR(0) without being entirely contained within BR(0) is comparatively small compared to the number of translates Fv that are entirely contained within BR(0). (This is the hardest part of the proof.) (d) Use (b) and (c) to prove that Vol BR(0) = # BR(0) ∩ L · Vol(F) + (smaller term). Divide by Vol BR(0) and let R → ∞ to complete the proof of (7.63). 7.16. A lattice L of dimension n = 251 has determinant det(L) ≈ 22251.58 . With no further information, approximately how large would you expect the shortest nonzero vector to be? Section 7.6. Babai’s Algorithm and Solving CVP with a “Good” Basis 7.17. Let L ⊂ R2 be the lattice given by the basis v1 = (213, −437) and v2 = (312, 105), and let w = (43127, 11349).
  • 694. 458 Exercises (a) Use Babai’s algorithm to find a vector v ∈ L that is close to w. Compute the distance v − w. (b) What is the value of the Hadamard ratio det(L)/v1v2 1/2 ? Is the ba- sis {v1, v2} a “good” basis? (c) Show that the vectors v 1 = (2937, −1555) and v 2 = (11223, −5888) are also a basis for L by expressing them as linear combinations of v1 and v2 and checking that the change-of-basis matrix has integer coefficients and determinant ±1. (d) Use Babai’s algorithm with the basis {v 1, v 2} to find a vector v ∈ L. Compute the distance v − w and compare it to your answer from (a). (e) Compute the Hadamard ratio using v 1 and v 2. Is {v 1, v 2} a good basis? Section 7.8. The GGH Public Key Cryptosystem 7.18. Alice uses the GGH cryptosystem with private basis v1 = (4, 13), v2 = (−57, −45), and public basis w1 = (25453, 9091), w2 = (−16096, −5749). (a) Compute the determinant of Alice’s lattice and the Hadamard ratio of the private and public bases. (b) Bob sends Alice the encrypted message e = (155340, 55483). Use Alice’s private basis to decrypt the message and recover the plaintext. Also determine Bob’s random perturbation r. (c) Try to decrypt Bob’s message using Babai’s algorithm with the public ba- sis {w1, w2}. Is the output equal to the plaintext? 7.19. Alice uses the GGH cryptosystem with private basis v1 = (58, 53, −68), v2 = (−110, −112, 35), v3 = (−10, −119, 123) and public basis w1 = (324850, −1625176, 2734951), w2 = (165782, −829409, 1395775), w3 = (485054, −2426708, 4083804). (a) Compute the determinant of Alice’s lattice and the Hadamard ratio of the private and public bases. (b) Bob sends Alice the encrypted message e = (8930810, −44681748, 75192665). Use Alice’s private basis to decrypt the message and recover the plaintext. Also determine Bob’s random perturbation r. (c) Try to decrypt Bob’s message using Babai’s algorithm with the public ba- sis {w1, w2, w3}. Is the output equal to the plaintext? 7.20. 
Bob uses the GGH cryptosystem to send some messages to Alice. (a) Suppose that Bob sends the same message m twice, using different random elements r and r′. Explain what sort of information Eve can deduce from the ciphertexts e = mW + r and e′ = mW + r′.
  • 695. Exercises 459 (b) For example, suppose that n = 5 and that random permutations are chosen with coordinates in the set {−2, −1, 0, 1, 2}. This means that there are 55 = 3125 possibilities for r. Suppose further that Eve intercepts two ciphertexts e = (−9, −29, −48, 18, 48) and e = (−6, −26, −51, 20, 47) having the same plaintext. With this information, how many possibilities are there for r? (c) Suppose that Bob is lazy and uses the same perturbation to send two different messages. Explain what sort of information Eve can deduce from the ciphertexts e = mW + r and e = m W + r. 7.21. The previous exercise shows the danger of using GGH to send a single mes- sage m twice using different values of r. (a) In order to guard against this danger, suppose that Bob generates r by applying a publicly available hash function Hash to m, i.e., Bob’s encrypted message is e = mW + Hash(m). (See Sect. 8.1 for a discussion of hash functions.) If Eve guesses that Bob’s message might be m , explain why she can check whether her guess is correct. (b) Explain why the following algorithm eliminates both the problem with repeated messages and the problem described in (a), while still allowing Alice to decrypt Bob’s message. Bob chooses an message m0 and a random string r0. He then computes m = (m0 xor r0) r0, r = Hash(m), e = mW + r. (c) In (b), the advantage of constructing m from m0 xor r0 is that none of the bits of the actual plaintext m0 appear unaltered in m. In practice, people replace (m0 xor r0) r0 with more complicated mixing functions M(m0, r0) having the following two properties: (1) M is easily invertible. (2) If even one bit of either m0 or r0 changes, then the value of every bit of M(m0, r0) changes in an unpredictable manner. Try to construct a mixing function M having these properties. Section 7.9. Convolution Polynomial Rings 7.22. Compute (by hand!) the polynomial convolution product c = a b using the given value of N. 
(a) N = 3, a(x) = −1 + 4x + 5x^2, b(x) = −1 − 3x − 2x^2;
(b) N = 5, a(x) = 2 − x + 3x^3 − 3x^4, b(x) = 1 − 3x^2 − 3x^3 − x^4;
(c) N = 6, a(x) = x + x^2 + x^3, b(x) = 1 + x + x^5;
(d) N = 10, a(x) = x + x^2 + x^3 + x^4 + x^6 + x^7 + x^9, b(x) = x^2 + x^3 + x^6 + x^8.
7.23. Compute the polynomial convolution product c = a ⋆ b modulo q using the given values of q and N.
  • 696. 460 Exercises (a) N = 3, q = 7, a(x) = 1 + x, b(x) = −5 + 4x + 2x2 ; (b) N = 5, q = 4, a(x) = 2 + 2x − 2x2 + x3 − 2x4 , b(x) = −1 + 3x − 3x2 − 3x3 − 3x4 ; (c) N = 7, q = 3, a(x) = x + x3 , b(x) = x + x2 + x4 + x6 ; (d) N = 10, q = 2, a(x) = x2 + x5 + x7 + x8 + x9 , b(x) = 1 + x + x3 + x4 + x5 + x7 + x8 + x9 . 7.24. Let a(x) ∈ (Z/qZ)[x], where q is a prime. (a) Prove that a(1) ≡ 0 (mod q) if and only if (x − 1) | a(x) in (Z/qZ)[x]. (b) Suppose that a(1) ≡ 0 (mod q). Prove that a(x) is not invertible in Rq. 7.25. Let N = 5 and q = 3 and consider the two polynomials a(x) = 1 + x2 + x3 ∈ R3 and b(x) = 1 + x2 − x3 ∈ R3. One of these polynomials has an inverse in R3 and the other does not. Compute the inverse that exists, and explain why the other doesn’t exist. 7.26. For each of the following values of N, q, and a(x), either find a(x)−1 in Rq or show that the inverse does not exist. (a) N = 5, q = 11, and a(x) = x4 + 8x + 3; (b) N = 5, q = 13, and a(x) = x3 + 2x − 3. (c) N = 7, q = 23, and a(x) = 20x6 + 8x5 + 4x4 + 15x3 + 19x2 + x + 8. 7.27. This exercise illustrates how to find inverses in Rm = (Z/mZ)[x] (xN − 1) when m is a prime power pe . (a) Let f(x) ∈ Z[x]/(XN − 1) be a polynomial, and suppose that we have already found a polynomial F(x) such that f(x) F(x) ≡ 1 (mod pi ) for some i ≥ 1. Prove that the polynomial G(x) = F(x) 2 − f(x) F(x) satisfies f(x) G(x) ≡ 1 (mod p2i ). (b) Suppose that we know an inverse of f(x) modulo p. Using (a) repeatedly, how many convolution multiplications does it take to compute the inverse of f(x) modulo pe ?
  • 697. Exercises 461 (c) Use the method in (a) to compute the following inverses modulo m = pe , where to ease your task, we have given you the inverse modulo p. (i) N = 5, m = 24 , f(x) = 7 + 3x + x2 , f(x)−1 ≡ 1 + x2 + x3 (mod 2). (ii) N = 5, m = 27 , f(x) = 22 + 11x + 5x2 + 7x3 , f(x)−1 ≡ 1 + x2 + x3 (mod 2). (iii) N = 7, m = 55 , f(x) = 112 + 34x + 239x2 + 234x3 + 105x4 + 180x5 + 137x6 , f(x)−1 ≡ 1 + 3x2 + 2x4 (mod 5). 7.28. Let a ∈ RN be a fixed vector. (a) Suppose that b is an N-dimensional vector whose coefficients are chosen randomly from the set {−1, 0, 1}. Prove that the expected values of b2 and a b2 are given by E b2 = 2 3 N and E a b2 = a2 E b2 . (b) More generally, suppose that the coefficients of b are chosen at random from the set of integers {−T, −T + 1, . . . , T − 1, T}. Compute the expected values of b2 and a b2 as in (a). (c) Suppose now that the coefficients of b are real numbers that are chosen uni- formly and independently in the interval from −R to R. Prove that E b2 = R2 N 3 and E a b2 = a2 E b2 . (Hint. The most direct way to do (c) is to use continuous probability theory. As an alternative, let the coefficients of b be chosen uniformly and independently from the set {jR/T : −T ≤ j ≤ T}, redo the computation from (b), and then let T → ∞.) (d) For each of the scenarios described in (a), (b), and (c), prove that E a + b2 = a2 + E b2 . Section 7.10. The NTRU Public Key Cryptosystem 7.29. Alice and Bob agree to communicate using NTRUEncrypt with (N, p, q) = (7, 3, 37). Alice’s private key is f(x) = −1 + X − X3 + X4 + X5 , F 3(x) = 1 + X − X2 + X4 + X5 + X6 . (You can check that f F 3 ≡ 1 (mod 3).) Alice receives the ciphertext e(x) = 2 + 8X2 − 16X3 − 9X4 − 18X5 − 3X6 . from Bob. Decipher the message and find the plaintext.
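Convolution products like those in Exercises 7.22–7.29 are easy to script. A minimal sketch, using the key pair from Exercise 7.29 to confirm the stated fact that f ⋆ F3 ≡ 1 (mod 3), and then carrying out the two-step NTRU decryption:

```python
N, p, q = 7, 3, 37

def conv(a, b, N):
    """Convolution in Z[x]/(x^N - 1): c_k = sum over i+j ≡ k (mod N) of a_i*b_j."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % N] += ai * bj
    return c

def center_lift(c, q):
    """Lift residues mod q to representatives centered at 0."""
    return [((x + q // 2) % q) - q // 2 for x in c]

f  = [-1, 1, 0, -1, 1, 1, 0]      # f(x)  = -1 + x - x^3 + x^4 + x^5
F3 = [1, 1, -1, 0, 1, 1, 1]       # F3(x) = 1 + x - x^2 + x^4 + x^5 + x^6
e  = [2, 0, 8, -16, -9, -18, -3]  # ciphertext from Exercise 7.29

# the stated key check: f * F3 ≡ 1 (mod 3)
assert [x % p for x in conv(f, F3, N)] == [1, 0, 0, 0, 0, 0, 0]

# NTRU decryption: center-lift f * e (mod q), then multiply by F3 (mod p)
a = center_lift(conv(f, e, N), q)
m = center_lift(conv(F3, a, N), p)
print(m)   # the plaintext of Exercise 7.29, with ternary coefficients
```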
  • 698. 462 Exercises 7.30. Alice and Bob decide to communicate using NTRUEncrypt with parameters (N, p, q) = (7, 3, 29). Alice’s public key is h(x) = 3 + 14X − 4X2 + 13X3 − 6X4 + 2X5 + 7X6 . Bob sends Alice the plaintext message m(x) = 1 + X − X2 − X3 − X6 using the random element r(x) = −1 + X2 − X5 + X6 . (a) What ciphertext does Bob send to Alice? (b) Alice’s private key is f(x) = −1 + X − X2 + X4 + X6 and F 3(x) = 1 + X + X2 + X4 + X5 − X6 . Check your answer in (a) by using f and F 3 to decrypt the message. 7.31. What is the message expansion of NTRUEncrypt in terms of N, p, and q? 7.32. The guidelines for choosing NTRUEncrypt public parameters (N, p, q, d) re- quire that gcd(p, q) = 1. Prove that if p | q, then it is very easy for Eve to decrypt the message without knowing the private key. (Hint. First do the case that p = q.) 7.33. The guidelines for choosing NTRUEncrypt public parameters (N, p, q, d) in- clude the assumption that gcd(N, q) = 1. Suppose instead that Alice takes q = N, where as always, N is an odd prime. (a) Make a change of variables x = y + 1 in the ring Z[x]/(xN − 1), and show that the NTRU lattice takes a simpler form. (b) Can you find an efficient way to break NTRU in the case that q = N that does involve lattice reduction? (This appears to be an open problem.) 7.34. Alice uses NTRUEncrypt with p = 3 to send messages to Bob. (a) Suppose that Alice uses the same random element r(x) to encrypt two dif- ferent plaintexts m1(x) and m2(x). Explain how Eve can use the two cipher- texts e1(x) and e2(x) to determine approximately 2 9 of the coefficients of m1(x). (See Exercise 7.38 for a way to exploit this information.) (b) For example, suppose that N = 8, so there are 38 possibilities for m1(x). Suppose that Eve intercepts two ciphertexts e1(x) = 32 + 21x − 9x2 − 20x3 − 29x4 − 29x5 − 19x6 + 38x7 , e2(x) = 33 + 21x − 7x2 − 19x3 − 31x4 − 27x5 − 19x6 + 38x7 , that were encrypted using the same random element r(x). 
How many coefficients of m1(x) can she determine exactly? How many possibilities are there for m1(x)? (c) Formulate a similar attack if Alice uses two different random elements r1(x) and r2(x) to encrypt the same plaintext m(x). (Hint. Do it first assuming that h(x) has an inverse in Rq. The problem is harder without this assumption.) 7.35. This exercise describes a variant of NTRUEncrypt that eliminates a step in the decryption algorithm at the cost of requiring slightly larger parameters. Suppose that the NTRUEncrypt private key polynomials f(x) and g(x) are chosen to satisfy f(x) = 1 + p·f0(x) ≡ 1 (mod p) and g(x) = p·g0(x) ≡ 0 (mod p), and that NTRU encryption is changed to e(x) ≡ h(x) ⋆ r(x) + m(x) (mod q). (The change is the omission of p before h(x).)
(a) Prove that if q is sufficiently large, then the following algorithm correctly decrypts the message:
• Compute a(x) ≡ f(x) ⋆ e(x) (mod q) and center-lift to an element of R.
• Compute a(x) (mod p). The result is m(x).
Note that this eliminates the necessity to multiply a(x) by f(x)^−1 (mod p).
(b) Suppose that we choose f0, g0 ∈ T (d, d), and that we also assume that m is ternary. Prove that decryption works provided q > 8dp + 2. (Hint. Mimic the proof of Proposition 7.48.)
Section 7.11. NTRU as a Lattice Cryptosystem
7.36. This exercise explains how to formulate NTRU message recovery as a closest vector problem. Let h(x) be an NTRU public key and let e(x) ≡ p·r(x) ⋆ h(x) + m(x) (mod q) be a message encrypted using h(x). (a) Prove that the vector (pr, e − m) is in L^NTRU_h. (b) Prove that the lattice vector in (a) is almost certainly the closest lattice vector to the known vector (0, e). Hence solving CVP reveals the plaintext m. (For simplicity, you may assume that d ≈ N/3 and q ≈ 2N, as we did in Proposition 7.61.) (c) Show how one can reduce the lattice-to-target distance, without affecting the determinant, by using instead a modified NTRU lattice of the form
( 1  ph )
( 0   q ).
7.37. The guidelines for choosing NTRUEncrypt public parameters (N, p, q, d) include the requirement that N be prime. To see why, suppose (say) that N is even. Explain how Eve can recover the private key by solving a lattice problem in dimension N, rather than in dimension 2N. (Hint. Use the natural map Z[x]/(x^N − 1) → Z[x]/(x^(N/2) − 1).) 7.38. Suppose that Bob and Alice are using NTRUEncrypt to exchange messages and that Eve intercepts a ciphertext e(x) for which she already knows part of the plaintext m(x). (This is not a ludicrous assumption; see Exercise 7.34, for example.) More precisely, suppose that Eve knows t of the coefficients of m(x). Explain how to set up a CVP to find m(x) using a lattice of dimension 2N − 2t. Section 7.12.
Lattice-Based Digital Signature Schemes 7.39. Samantha uses the GGH digital signature scheme with private and public bases v1 = (−20, −8, 1), w1 = (−248100, 220074, 332172), v2 = (14, 11, 23), w2 = (−112192, 99518, 150209), v3 = (−18, 1, −12), w3 = (−216150, 191737, 289401). What is her signature on the document d = (834928, 123894, 7812738)?
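Signing in the GGH scheme is Babai's rounding with the private basis. A sketch on the data of Exercise 7.39; the final comparison is Babai's general guarantee ‖s − d‖ ≤ ½(‖v1‖ + ‖v2‖ + ‖v3‖), not the exercise's specific answer, and the 3-by-3 solver is our own helper:

```python
from fractions import Fraction
from math import floor, sqrt

V = [[-20, -8, 1], [14, 11, 23], [-18, 1, -12]]   # Samantha's private basis (rows)
d = [834928, 123894, 7812738]                     # document to be signed

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(B, t):
    """Solve x*B = t exactly over Q: x_i = det(B with row i replaced by t)/det(B)."""
    dB = det3(B)
    out = []
    for i in range(3):
        Bi = [row[:] for row in B]
        Bi[i] = list(t)
        out.append(Fraction(det3(Bi), dB))
    return out

# Babai rounding with the private basis gives the signature vector s.
c = [floor(xi + Fraction(1, 2)) for xi in solve3(V, d)]
s = [sum(c[i] * V[i][j] for i in range(3)) for j in range(3)]

dist = sqrt(sum((si - di) ** 2 for si, di in zip(s, d)))
bound = sum(sqrt(sum(x * x for x in row)) for row in V) / 2
print(s, round(dist, 2))   # dist <= bound always holds for Babai rounding
```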
  • 700. 464 Exercises 7.40. Samantha uses the GGH digital signature scheme with public basis w1 = (3712318934, −14591032252, 11433651072), w2 = (−1586446650, 6235427140, −4886131219), w3 = (305711854, −1201580900, 941568527). She publishes the signature (6987814629, 14496863295, −9625064603) on the document d = (5269775, 7294466, 1875937). If the maximum allowed distance from the signature to the document is 60, verify that Samantha’s signature is valid. 7.41. Samantha uses the GGH digital signature scheme with public basis w1 = (−1612927239, 1853012542, 1451467045), w2 = (−2137446623, 2455606985, 1923480029), w3 = (2762180674, −3173333120, −2485675809). Use LLL or some other lattice reduction algorithm to find a good basis for Saman- tha’s lattice, and then use the good basis to help Eve forge a signature on the document d = (87398273893, 763829184, 118237397273). What is the distance from your forged signature lattice vector to the target vector? (You should be able to get a distance smaller than 100.) 7.42. This exercise gives further details of the NTRUMLS signature scheme. We fix parameters (N, p, q) and set B = p2 N 4 # and A = $ q 2p − 1 2 % . We choose private key polynomials f and g as follows. For f we first choose a polynomial F whose coefficients are randomly selected from the set {−1, 0, 1} and then let f = pF . For g we choose a polynomial whose coefficients are randomly selected to lie between −p/2 and p/2. We further assume that both F and g are invertible modulo p and that f is invertible modulo q, otherwise we discard them and choose new polynomials. (a) If a and b are polynomials whose coefficients lie between −p/2 and p/2, prove that a b∞ ≤ B. (b) Prove that the following algorithm outputs a pair of polynomials (s, t) satisfying t ≡ h s (mod q) and s ≡ sp (mod p) and t ≡ tp (mod p). 0: Input polynomials sp and tp with coefficients between −1 2 p and 1 2 p. 1: Choose a random polynomial r with coefficients between −A and A. 2: Set s0 = sp + pr. 
3: Set t0 ≡ h ⋆ s0 (mod q) with ‖t0‖∞ ≤ (1/2)q.
  • 701. Exercises 465 4: Set a ≡ g−1 (tp − t0) (mod p) with a∞ ≤ 1 2 p. 5: Set s = s0 + a f and t = t0 + a g. (c) Prove that the output from the algorithm in (b) satisfies s∞ ≤ q 2 + B and t∞ ≤ q 2 + B. (d) Make the simplifying assumption that the output produces polynomials whose coefficients are uniformly and independently distributed between −1 2 q − B and 1 2 q + B. Assume further that k := q/NB is not too large, say 2 ≤ k ≤ 50. Prove that the probability that the algorithm in (b) produces a valid signature is approximately e−8/k . (Note that according to (b), the output (s, t) will be a valid signature if it satisfies the size criteria s∞ ≤ 1 2 q−B and t∞ ≤ 1 2 q−B.) Section 7.13. Lattice Reduction Algorithms 7.43. Let b1 and b2 be vectors, and set t = b1 · b2/b12 and b∗ 2 = b2 − tb1. Prove that b∗ 2 · b1 = 0 and that b∗ 2 is the projection of b2 onto the orthogonal complement of b1. 7.44. Let a and b be nonzero vectors in Rn . (a) What value of t ∈ R minimizes the distance a − tb? (Hint. It’s easier to minimize the value of a − tb2 .) (b) What is the minimum distance in (a)? (c) If t is chosen as in (a), show that a−tb is the projection of a onto the orthogonal complement of b. (d) If the angle between a and b is θ, use your answer in (b) to show that the minimum distance is a sin θ. Draw a picture illustrating this result. 7.45. Apply Gauss’s lattice reduction algorithm (Proposition 7.66) to solve SVP for the following two dimensional lattices having the indicated basis vectors. How many steps does the algorithm take? (a) v1 = (120670, 110521) and v2 = (323572, 296358). (b) v1 = (174748650, 45604569) and v2 = (35462559, 9254748). (c) v1 = (725734520, 613807887) and v2 = (3433061338, 2903596381). 7.46. Let V be a vector space, let W ⊂ V be a vector subspace of V , and let W⊥ be the orthogonal complement of W in V . (a) Prove that W⊥ is also a vector subspace of V . 
(b) Prove that every vector v ∈ V can be written as a sum v = w + w′ for unique vectors w ∈ W and w′ ∈ W⊥. (One says that V is the direct sum of the subspaces W and W⊥.) (c) Let w ∈ W and w′ ∈ W⊥ and let v = aw + bw′. Prove that ‖v‖² = a²‖w‖² + b²‖w′‖².
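The projections in Exercises 7.43–7.46 are exactly the steps of the Gram–Schmidt algorithm (Theorem 7.13). A short exact-arithmetic version, run on the basis of Exercise 7.6(a):

```python
from fractions import Fraction

def dot(u, v):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def gram_schmidt(basis):
    """v*_i = v_i minus its projection onto span(v*_1, ..., v*_{i-1})."""
    ortho = []
    for v in basis:
        w = [Fraction(x) for x in v]
        for u in ortho:
            t = dot(v, u) / dot(u, u)   # projection coefficient, as in Exercise 7.43
            w = [wi - t * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho

vs = [(1, 3, 2), (4, 1, -2), (-2, 1, 3)]   # Exercise 7.6(a)
ostar = gram_schmidt(vs)

# orthogonality is exact over the rationals
for i in range(3):
    for j in range(i):
        assert dot(ostar[i], ostar[j]) == 0
```

The first vector is untouched (`ostar[0] == [1, 3, 2]`), as Gram–Schmidt requires.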
  • 702. 466 Exercises 7.47. Let L be a lattice with basis vectors v1 = (161, 120) and v2 = (104, 77). (a) Is (0, 1) in the lattice? (b) Find an LLL reduced basis. (c) Use the reduced basis to find the closest lattice vector to −9 2 , 11 . 7.48. Use the LLL algorithm to reduce the lattice with basis v1 = (20, 16, 3), v2 = (15, 0, 10), v3 = (0, 18, 9). You should do this exercise by hand, writing out each step. 7.49. Let L be the lattice generated by the rows of the matrix M = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 20 51 35 59 73 73 14 48 33 61 47 83 95 41 48 84 30 45 0 42 74 79 20 21 6 41 49 11 70 67 23 36 6 1 46 4 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . Implement the LLL algorithm (Fig. 7.8) on a computer and use your program to answer the following questions. (a) Compute det(L) and H(M). What is the shortest basis vector? (b) Apply LLL to M. How many swaps (Step [11]) are required? What is the value of H(MLLL )? What is the shortest basis vector in the LLL reduced basis? How does it compare with the Gaussian expected shortest length? (c) Reverse the order of the rows of M and apply LLL to the new matrix. How many swaps are required? What is the value of H(MLLL ) and what is the shortest basis vector? (d) Apply LLL to the original matrix M, but in the Lovász condition (Step [8]), use 0.99 instead of 3 4 . How many swaps are required? What is the value of H(MLLL ) and what is the shortest basis vector? 7.50. A more efficient way to implement the LLL algorithm is described in Fig. 7.9, with Reduce and Swap subroutines given in Fig. 7.10. (This implementation of LLL follows [28, Algorithm 2.6.3]. We thank Henri Cohen for his permission to include it here.) (a) Prove that the algorithm described in Figs. 7.9 and 7.10 returns an LLL reduced basis. (b) For any given N and q, let LN,q be the N-dimensional lattice with ba- sis v1, . . . , vN described by the formulas vi = (ri1, ri2, . . . , riN ), rij ≡ (i + N)j (mod q), 0 ≤ rij q. 
Implement the LLL algorithm and use it to LLL reduce L_{N,q} for each of the following values of N and q: (i) (N, q) = (10, 541) (ii) (N, q) = (20, 863) (iii) (N, q) = (30, 1223) (iv) (N, q) = (40, 3571) In each case, compare the Hadamard ratio of the original basis to the Hadamard ratio of the LLL reduced basis, and compare the length of the shortest vector found by LLL to the Gaussian expected shortest length.
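For Exercises 7.48–7.50, here is a compact, unoptimized LLL over exact rationals. It recomputes Gram–Schmidt at each step rather than updating incrementally as in the efficient subroutine version, trading speed for clarity; it is run on the basis of Exercise 7.48:

```python
import math
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gso(b):
    """Gram-Schmidt: orthogonalized vectors b* and the coefficients mu[i][j]."""
    n = len(b)
    bstar, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in b[i]]
        for j in range(i):
            mu[i][j] = dot(b[i], bstar[j]) / dot(bstar[j], bstar[j])
            v = [vi - mu[i][j] * uj for vi, uj in zip(v, bstar[j])]
        bstar.append(v)
    return bstar, mu

def lll(basis, delta=Fraction(3, 4)):
    b = [[Fraction(x) for x in row] for row in basis]
    n, k = len(b), 1
    while k < n:
        for j in range(k - 1, -1, -1):          # size-reduce b_k against b_j
            _, mu = gso(b)
            q = math.floor(mu[k][j] + Fraction(1, 2))
            if q:
                b[k] = [bi - q * bj for bi, bj in zip(b[k], b[j])]
        bstar, mu = gso(b)
        lhs = dot(bstar[k], bstar[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(bstar[k - 1], bstar[k - 1])
        if lhs >= rhs:                           # Lovász condition holds
            k += 1
        else:
            b[k - 1], b[k] = b[k], b[k - 1]      # swap and back up
            k = max(k - 1, 1)
    return [[int(x) for x in row] for row in b]

red = lll([(20, 16, 3), (15, 0, 10), (0, 18, 9)])   # Exercise 7.48
print(red)
```

On exit every |mu[i][j]| ≤ 1/2 and every Lovász condition holds, and the product of the ‖b*_i‖² equals det(L)² (here 4950²), so the output generates the same lattice.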
  • 703. Exercises 467 {v1, . . . , vn} for a lattice L k = 2, kmax = 1, v∗ 1 = v1, and B1 = v1 2 k ≤ kmax go to Step [9] kmax = k and v∗ k = vk j = 1, 2, . . . , k − 1 μk,j = vk · v∗ j /Bj and v∗ k = v∗ k − μk,jv∗ j j Loop Bk = v∗ k 2 (k, k − 1) Bk 3 4 − μ2 k,k−1 Bk−1 (k) k = max(2, k − 1) and go to Step [9] = k − 2, k − 3, . . . , 2, 1 (k, ) Loop k = k + 1 k ≤ n go to Step [3] {v1, . . . , vn} [1] Input a basis [2] Set [3] If [4] Set [5] Loop [6] Set [7] End [8] Set [9] Execute Subroutine RED [10] If [11] Execute Subroutine SWAP [12] Set [13] Else [14] Loop [15] Execute Subroutine RED [16] End [17] Set [18] End If [19] If [20] Return LLL reduced basis Figure 7.9: The LLL algorithm—main routine 7.51. Let 1 4 α 1 and suppose that we replace the Lovász condition with the condition v∗ i 2 ≥ α − μ2 i,i−1 v∗ i−12 for all 1 i ≤ n. (7.64) (a) Prove a version of Theorem 7.69 assuming the alternative Lovász condi- tion (7.64). What quantity, depending on α, replaces the 2 that appears in the estimates (7.54)–(7.56)? (b) Prove a version of Theorem 7.71 assuming the alternative Lovász condi- tion (7.64). In particular, how does the upper bound for the number of swap steps depend on α? What happens as α → 1? 7.52. Let v1, . . . , vn be an LLL reduced basis for a lattice L. (a) Prove that there are constants C1 1 C2 0 such that for all y1, . . . , yn ∈ R we have Cn 1 n i=1 y2 i vi2 ≥ ( ( ( ( n i=1 yivi ( ( ( ( 2 ≥ Cn 2 n i=1 y2 i vi2 . (7.65) (This is a hard exercise.) We observe that the inequality (7.65) is another way of saying that the basis v1, . . . , vn is quasi-orthogonal, since if it were truly orthogonal, then we would have an equality yivi2 = y2 i vi2 .
  • 704. 468 Exercises —— Subroutine RED(k, ) —— |μk,| ≤ 1 2 , return to Main Routine m = μk, vk = vk − mv and μk, = μk, − m i = 1, 2, . . . , − 1 μk,i = μk,i − mμ,i i Loop —— Subroutine SWAP(k) —— vk−1 and vk j = 1, 2, . . . , k − 2 μk−1,j and μk,j j Loop μ = μk,k−1 and B = Bk + μ2 Bk−1 μk,k−1 = μBk−1/B and Bk = Bk−1Bk/B and Bk−1 = B i = k + 1, k + 2, . . . , kmax m = μi,k and μi,k = μi,k−1 − μm and μi,k−1 = m + μk,k−1μi,k i Loop [1] If [2] Set [3] Set [4] Loop [5] Set [6] End [7] Return to Main Routine [1] Exchange [2] Loop [3] Exchange [4] End [5] Set [6] Set [7] Loop [8] Set [9] End [10] Return to Main Routine Figure 7.10: The LLL algorithm—RED and SWAP subroutines (b) Prove that there is a constant C such that for any target vector w ∈ Rn , Babai’s algorithm (Theorem 7.34) finds a lattice vector v ∈ L satisfying w − v ≤ Cn min u∈L w − u. Thus Babai’s algorithm applied with an LLL reduced basis solves apprCVP to within a factor of Cn . This is Theorem 7.76. (c) Find explicit values for the constants C1, C2, and C in (a) and (b). 7.53. Babai’s Closest Plane Algorithm, which is described in Fig. 7.11, is an alter- native rounding method that uses a given basis to solve apprCVP. As usual, the more orthogonal the basis, the better the solution, so generally people first use LLL to create a quasi-orthogonal basis and then apply one of Babai’s methods. In both theory and practice, Babai’s closest plane algorithm seems to yield better results than Babai’s closest vertex algorithm. Implement both of Babai’s algorithms (Theorem 7.34 and Fig. 7.11) and use them to solve apprCVP for each of the following lattices and target vectors. Which one gives the better result? (a) L is the lattice generated by the rows of the matrix ML = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ −5 16 25 25 13 8 26 −3 −11 14 5 −26 15 −28 16 −7 −21 −4 32 −3 7 −30 −6 26 15 −32 −17 32 −3 11 5 24 0 −13 −46 15 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠
  • 705. Exercises 469 Input a basis v1, . . . , vn of a lattice L. Input a target vector t. Compute Gram–Schmidt orthogonalized vectors v∗ 1, . . . , v∗ n (Theorem 7.13). Set w = t. Loop i = n, n− 1, . . . , 2, 1 Set w = w− 2 w · v∗ i / v∗ i 2 3 vi. End i Loop Return the lattice vector t− w. Figure 7.11: Babai’s closest plane algorithm and the target vector is t = (−178, 117, −407, 419, −4, 252). (Notice that the matrix ML is LLL reduced.) (b) L is the lattice generated by the rows of the matrix ML = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ −33 −15 22 −34 −32 41 10 9 45 10 −6 −3 −32 −17 43 37 29 −30 26 13 −35 −41 42 −15 −50 32 18 35 48 45 2 −5 −2 −38 38 41 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ and the target vector is t = (−126, −377, −196, 455, −200, −234). (Notice that the matrix ML is not LLL reduced.) (c) Apply LLL reduction to the basis in (b), and then use both of Babai’s methods to solve apprCVP. Do you get better solutions? 7.54. We proved that the LLL algorithm terminates and has polynomial running time under the assumption that L ⊂ Zn ; see Theorem 7.71. Show that this assump- tion is not necessary by proving that LLL terminates in polynomial time for any lattice L ⊂ Rn . You may assume that your computer can do exact computations in R, although in practice one does need to worry about round-off errors. (Hint. Use Hermite’s theorem to derive a lower bound, depending on the length of the shortest vector in L, for the quantity D that appears in the proof of Theorem 7.71.) Section 7.14. Applications of LLL to Cryptanalysis 7.55. You have been spying on George for some time and overhear him receiv- ing a ciphertext e = 83493429501 that has been encrypted using the congruen- tial cryptosystem described in Sect. 7.1. You also know that George’s public key is h = 24201896593 and the public modulus is q = 148059109201. Use Gaussian lattice reduction to recover George’s private key (f, g) and the message m. 7.56. Let M = (81946, 80956, 58407, 51650, 38136, 17032, 39658, 67468, 49203, 9546) and let S = 168296. 
Use the LLL algorithm to solve the subset-sum problem for M and S, i.e., find a subset of the elements of M whose sum is S.
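The standard way to attack a subset-sum instance such as Exercise 7.56 is to embed it into an (n + 1)-dimensional lattice whose short vectors encode solutions. The following is a minimal sketch of that basis construction in Python; the function name is ours, and the normalization (the 2's on the diagonal) is one common choice among several variants:

```python
def subset_sum_lattice(M, S):
    """Build an (n+1) x (n+1) lattice basis for the subset-sum
    instance with weights M and target sum S.

    Rows v1..vn: a 2 in slot i and the weight M[i] in the last slot.
    Final row: (1, 1, ..., 1, S).
    If sum of M[i] over a subset I equals S, then
        (final row) - sum of vi over I
    has entries +-1 followed by a 0 -- a very short lattice vector,
    with -1 exactly in the slots i belonging to I.
    """
    n = len(M)
    rows = []
    for i, m in enumerate(M):
        row = [0] * (n + 1)
        row[i] = 2
        row[-1] = m
        rows.append(row)
    rows.append([1] * n + [S])
    return rows
```

One would then run LLL on these rows (e.g., with an external library such as fpylll; not shown) and scan the reduced basis for a vector whose first n entries are ±1 and whose last entry is 0.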
7.57. Alice and Bob communicate using the GGH cryptosystem. Alice's public key is the lattice generated by the rows of the matrix

    (  10305608   −597165    45361210    39600006   12036060 )
    ( −71672908   4156981  −315467761  −275401230  −83709146 )
    ( −46304904   2685749  −203811282  −177925680  −54081387 )
    ( −68449642   3969419  −301282167  −263017213  −79944525 )
    ( −46169690   2677840  −203215644  −177405867  −53923216 )

Bob sends her the encrypted message

    e = (388120266, −22516188, 1708295783, 1491331246, 453299858).

Use LLL to find a reduced basis for Alice's lattice, and then use Babai's algorithm to decrypt Bob's message.

7.58. Alice and Bob communicate using NTRUEncrypt with public parameters (N, p, q, d) = (11, 3, 67, 3). Alice's public key is

    h = 39 + 9x + 33x² + 52x³ + 58x⁴ + 11x⁵ + 38x⁶ + 6x⁷ + x⁸ + 48x⁹ + 41x¹⁰.

Apply the LLL algorithm to the associated NTRU lattice to find an NTRU private key (f, g) for h. Check your answer by verifying that g ≡ f ⋆ h (mod q). Use the private key to decrypt the ciphertext

    e = 52 + 50x + 50x² + 61x³ + 61x⁴ + 7x⁵ + 53x⁶ + 46x⁷ + 24x⁸ + 17x⁹ + 50x¹⁰.

7.59. (a) Suppose that k is a 10-digit integer, and suppose that when √k is computed, the first 15 digits after the decimal point are 418400286617716. Find the number k. (Hint. Reformulate it as a lattice problem.)
(b) More generally, suppose that you know the first d digits after the decimal point of √K. Explain how to set up a lattice problem to find K. See Exercise 1.49 for a cryptosystem associated to this problem.
Chapter 8
Additional Topics in Cryptography

The emphasis of this book has been on the mathematical underpinnings of public key cryptography. We have developed most of the mathematics from scratch and in sufficient depth to enable the reader to understand both the underlying mathematical principles and how they are applied in cryptographic constructions. Unfortunately, in achieving this laudable goal, we have now reached the end of a hefty textbook with many important cryptographic topics left untouched. This final chapter contains a few brief words about some of these additional topics. The reader should keep in mind that each of these areas is important and that the brevity of our coverage reflects only a lack of space, not a lack of interest. We hope that you will view this chapter as a challenge to go out and learn more about mathematical cryptography. In particular, each section in this chapter provides a good starting point for a term paper or class project. We also note that we have made no attempt to provide a full history of the topics covered, nor have we tried to give credit to all of the researchers working in these areas.

For the convenience of the reader and the instructor, here is a list of the topics introduced in this chapter:

Section 8.1 Hash Functions
Section 8.2 Random Numbers and Pseudorandom Number Generators
Section 8.3 Zero-Knowledge Proofs
Section 8.4 Secret Sharing Schemes
Section 8.5 Identification Schemes
Section 8.6 Padding Schemes, the Random Oracle Model, and Provable Security

© Springer Science+Business Media New York 2014
J. Hoffstein et al., An Introduction to Mathematical Cryptography, Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4939-1711-2_8
Section 8.7 Building Protocols from Cryptographic Primitives
Section 8.8 Blind Digital Signatures, Digital Cash, and Bitcoin
Section 8.9 Homomorphic Encryption