Theory of Computation 
Lectures delivered by Michael Sipser 
Notes by Holden Lee 
Fall 2012, MIT 
Last updated Tue. 12/11/2012 
Contents 
Lecture 1 Thu. 9/6/12
S1 Overview. S2 Finite Automata. S3 Formalization.
Lecture 2 Tue. 9/11/12
S1 Regular expressions. S2 Nondeterminism. S3 Using nondeterminism to show closure. S4 Converting a finite automaton into a regular expression.
Lecture 3 Thu. 9/13/12
S1 Converting a DFA to a regular expression. S2 Non-regular languages.
Lecture 4 Tue. 9/18/12
S1 Context-Free Grammars. S2 Pushdown automata. S3 Comparing pushdown and finite automata.
Lecture 5 Thu. 9/20/12
S1 CFG’s and PDA’s recognize the same language. S2 Converting CFG→PDA. S3 Non-CFLs. S4 Turing machines.
Lecture 6 Tue. 9/25/12
S1 Turing machines. S2 Philosophy: Church-Turing Thesis.
Lecture 7 Thu. 9/27/12
S1 Examples of decidable problems: problems on FA’s. S2 Problems on grammars.
Lecture 8 Tue. 10/2/12
S1 Languages. S2 Diagonalization. S3 퐴푇푀: Turing-recognizable but not decidable. S4 Showing a specific language is not recognizable.
Lecture 9 Thu. 10/4/12
S1 Reducibility. S2 Mapping reducibility.
Lecture 10 Thu. 10/11/12
S1 Post Correspondence Problem. S2 Computation Histories and Linearly Bounded Automata. S3 Proof of undecidability of PCP.
Lecture 11 Tue. 10/16/12
S1 Computation history method. S2 Recursion theorem. S3 Logic.
Lecture 12 Thu. 10/18/12
S1 Introduction to complexity theory. S2 Time Complexity: formal definition.
Lecture 13 Tue. 10/23/12
Lecture 14 Tue. 10/30/12
S1 P vs. NP. S2 Polynomial reducibility. S3 NP completeness.
Lecture 15 Thu. 11/1/12
S1 Cook-Levin Theorem. S2 Subset sum problem.
Lecture 16 Tue. 11/6/12
S1 Space complexity. S2 Savitch’s Theorem.
Lecture 17 Thu. 11/8/12
S1 Savitch’s Theorem. S2 PSPACE-completeness.
Lecture 18 Thu. 10/11/12
S1 Games: Generalized Geography. S2 Log space. S3 퐿, 푁퐿 ⊆ 푃.
Lecture 19 Thu. 11/15/12
S1 L vs. NL. S2 NL-completeness. S3 NL = coNL.
Lecture 20 Tue. 11/20/12
S1 Space hierarchy. S2 Time Hierarchy Theorem.
Lecture 21 Tue. 11/27/12
S1 Intractable problems. S2 Oracles.
Lecture 22 Thu. 11/29/12
S1 Primality testing. S2 Probabilistic Turing Machines. S3 Branching programs.
Lecture 23 Thu. 10/11/12
S1 EQROBP.
Lecture 24 Thu. 12/6/12
S1 Interactive proofs. S2 IP = PSPACE.
Lecture 25 Tue. 12/11/2012
S1 coNP ⊆ IP. S2 A summary of complexity classes.
Introduction 
Michael Sipser taught a course (18.404J) on Theory of Computation at MIT in Fall 2012. 
These are my “live-TEXed” notes from the course. The template is borrowed from Akhil 
Mathew. 
Please email corrections to holden1@mit.edu.
Lecture 1 
Thu. 9/6/12 
Course information: Michael Sipser teaches the course. Alex Arkhipov and Zack Rumscrim 
teach the recitations. The website is http://math.mit.edu/~sipser/18404.
The 3rd edition of the textbook has an extra lesson on parsing and deterministic context-free
languages (which we will not cover), and some additional problems. The 2nd edition is okay.
S1 Overview 
1.1 Computability theory 
In the first part of the course we will cover computability theory: what kinds of things 
can you solve with a computer and what kinds of things can you not solve with a computer? 
Computers are so powerful that you may think they can do anything. That’s false. For 
example, consider the question 
Does a given program meet a given specification? 
Is it possible to build a computer that will answer this question, when you feed it any 
program and any specification? No; this problem is uncomputable, impossible to solve with 
a computer. 
Is it possible to build a computer so that when I feed it math statements (for instance, 
Fermat’s last theorem or the twin primes conjecture), it will output true or false? Again, no. 
No algorithm can tell whether a math statement is true or false, not even in principle (given 
sufficiently long time and large space). 
We’ll have to introduce a formal model of computation—what do we mean by a computer?— 
to talk about the subject in a mathematical way. There are several different models of 
computation. We’ll talk about the simplest of these—finite automata—today. 
Computability theory had its heyday from the 1930s to the 1950s; it is pretty much finished as a field 
of research. 
1.2 Complexity theory 
By contrast, the second half of the course focuses on complexity theory. This picks up 
where computability left off, from 1960’s to the present. It is still a major area of research, 
and focuses not on whether we can solve a problem using a computer, but on how hard it is to 
solve. The classical example is factoring. You can easily multiply 2 large numbers (e.g., 500- 
digit numbers) quickly on a laptop. No one knows how or if you can do the opposite—factor 
a large number (ex. 1000-digit number)—easily. The state of the art is 200 digits right now. 
250 digits and up is way beyond what we can do in general. 
We’ll define different ways of measuring hardness: time and space (memory). We’ll look 
at models that are specific to complexity theory, such as probabilistic models and interactive 
proof systems. A famous unsolved problem is the P vs. NP problem, which will be a 
theme throughout our lessons on complexity. In these problems, some kind of “searching” 
is inevitable. 
1.3 Why theory? 
What is the value of studying computer science theory? People question why a computer science 
department should invest in theoretical computer science. This used to be a big issue. 
Firstly, theory has proved its value to other parts of computer science. Many technologists 
and companies grew up out of the work of theorists, for example, RSA cryptography. Akamai 
came out of looking at distributed systems from a theoretical point of view. Many key 
personnel in Google were theorists, because search has a theoretical side to it. Theorists 
have played a role in building the computer science industry. 
Second, theory is a part of science. It’s not just driven by its applications, but by 
curiosity. For example, “how hard is factoring?” is a natural question that is intrinsically 
worthwhile to answer. Our curiosity makes us human. 
Computer science theory may also help us understand the brain in the future. We 
understand the heart and most of our other organs pretty well, but we have only the faintest 
idea how the brain works. Because the brain has a computation aspect to it, it’s entirely 
possible that some theory of computation will help solve this problem. 
Is there more stuff to do? Certainly. We’re still at the very beginnings of computer 
science and theory. Whole realms out there are worthy of exploration. 
That’s what and why we do this stuff. 
S2 Finite Automata 
2.1 An example 
We need to set up our models for computability theory. The first one will be finite automata. 
An example of a finite automaton is given by the following picture. 
The states are {푞1, 푞2, 푞3}. The transitions are arrows labeled 0 or 1, such as an arrow labeled 0. The 
start state is 푞1 (it has a plain arrow leading into it) and the set of accept states is {푞3} (marked with a 
double circle). Note each state has 2 arrows exiting it, one for 0 and one for 1. 
How does this automaton work when we feed it a string such as 010110? We start at 
the start state 푞1. Read in the input symbols one at a time, and follow the transition arrow 
given by the next bit. 
∙ 0: take the arrow from 푞1 back to 푞1. 
∙ 1: take the arrow from 푞1 to 푞2. 
∙ 0: take the arrow back to 푞1. 
∙ 1: get to 푞2 
∙ 1: get to 푞3 
∙ 0: stay at 푞3. 
Since 푞3 is an accept state, the output is “accept.” By contrast, the input string 101 ends at 
푞2; the machine does not accept, i.e. it rejects the input. 
Problem 1.1: What strings does the machine accept? 
The machine accepts exactly the strings with two consecutive 1’s. The language of 퐴, 
denoted 퐿(퐴), is the set of accepted strings, i.e. the language that the machine recognizes. 
(This term comes from linguistics.) 
We say that the language of 퐴 is 
퐿(퐴) = {푤 : 푤 has substring 11} . 
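To make the informal acceptance procedure concrete, here is a minimal Python sketch that simulates this automaton on an input string. The transition table is reconstructed from the trace above (the state names 푞1, 푞2, 푞3 come from the example); it is an illustration, not part of the original notes.

```python
# The example DFA: accepts exactly the strings containing the substring "11".
delta = {
    ('q1', '0'): 'q1', ('q1', '1'): 'q2',
    ('q2', '0'): 'q1', ('q2', '1'): 'q3',
    ('q3', '0'): 'q3', ('q3', '1'): 'q3',
}
start, accept_states = 'q1', {'q3'}

def accepts(w):
    state = start
    for symbol in w:                     # read the input one symbol at a time
        state = delta[(state, symbol)]   # follow the transition arrow
    return state in accept_states        # accept iff we end at an accept state

assert accepts('010110')       # ends at q3: accepted
assert not accepts('101')      # ends at q2: rejected
```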
S3 Formalization 
We now give a formal definition of a finite automaton. 
Definition 1.1: A finite automaton is a tuple 푀 = (푄,Σ, 훿, 푞0, 퐹) where 
∙ 푄 is a finite set of states, 
∙ Σ is a finite alphabet (collection of symbols, for instance {0, 1}), 
∙ 훿 is the transition function that takes a state and input symbol and gives another state 
훿 : 푄 × Σ → 푄 
(푞, 푎) ↦→ 푟. 
We denote this with a circle 푞 and an arrow labeled 푎 leading to a circle 푟. 
∙ 푞0 ∈ 푄 is a start state. 
∙ 퐹 ⊆ 푄 is a set of accept states. 
To take this further, we’re going to define the language of an automaton. (We did this 
informally by following our finger on a path. We’re just doing this formally now.) 
Definition 1.2: Say 푀 accepts an input string 푤 = 푤1 · · ·푤푛, where each 푤푖 ∈ Σ, if 
there is a sequence 푟0, . . . , 푟푛 of states from 푄 (the states gone through) where 
∙ 푟0 = 푞0 (start at the start state), 
∙ 푟푛 ∈ 퐹 (end at an accept state), 
∙ and for each 푖 > 0, 푟푖 = 훿(푟푖−1,푤푖) (each next state is obtained from the 
previous state by reading the next symbol and using the transition function). 
The language of 푀 is 
퐿(푀) = {푤 : 푀 accepts 푤} . 
Note 푀 accepts certain strings and rejects certain strings, but 푀 recognizes just one 
language, the collection of all accepted strings.1 
Note there is a special string, the empty string of length 0, denoted 휀. By contrast, the 
empty language is denoted by 휑. 
Definition 1.3: A language is regular if some finite automaton recognizes it. 
For instance {푤 : 푤 has substring 11} is a regular language because we exhibited an automaton 
that recognizes it. 
3.1 Building automata 
Problem 1.2: Build an automaton to recognize... 
∙ The set of strings with an even number of 1’s. 
∙ The set of strings that start and end with the same symbol. 
When we have a finite automaton, and we want to design an automaton for a certain 
task, think as follows: 
1If 퐿′ is a subset of 퐿 and 푀 recognizes 퐿, we don’t say 푀 recognizes 퐿′. 
The states of the automaton represent its memory. Use different states for different 
possibilities. 
For example, 
1. an automaton that accepts iff the string has an even number of 1’s will have to count 
the number of 1’s mod 2. You want to have one state for each possibility. 
2. an automaton that accepts iff the first equals the last symbol will have to keep track 
of what the first symbol is. It should have different states for different possibilities of 
the first symbol. 
In the next lecture and a half we’ll seek to understand the regular languages. There are 
simple languages that are not regular, for example, the language that has an equal number 
of 0’s and 1’s is not regular. 
Proof sketch. Such an automaton would have to keep track of the difference between number 
of 0’s and 1’s so far, and there are an infinite number of possibilities to track; a finite 
automaton has only finitely many states and can keep track of finitely many possibilities. 
3.2 Closure properties of languages 
Definition 1.4: We call the following 3 operations on languages regular operations. 
∙ ∪ union: 퐴 ∪ 퐵 = {푤 : 푤 ∈ 퐴 or 푤 ∈ 퐵} 
∙ ∘ concatenation: 
퐴 ∘ 퐵 = 퐴퐵 = {푤 : 푤 = 푥푦, 푥 ∈ 퐴, 푦 ∈ 퐵} . 
∙ * Kleene star (unary operation) 
퐴* = {푤 : 푤 = 푥1푥2 · · · 푥푘, 푘 ≥ 0, 푥푖 ∈ 퐴} . 
These are traditionally called the regular operations: they are in a sense minimal, because 
starting from a simple set of regular languages and applying these three operations we can 
get to all regular languages. 
Example 1.5: If 퐴 = {good, bad} and 퐵 = {boy, girl} we get 
퐴 ∘ 퐵 = {good boy, good girl, bad boy, bad girl}. 
Note for *, we stick together words in any way we want to get longer string. We get an 
infinite language unless 퐴 ⊆ {휀}. Note 휀 ∈ 퐴*; in particular, 휑* = {휀}. 
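For intuition, the regular operations are easy to compute on finite languages. The small Python sketch below is purely illustrative: it computes 퐴 ∪ 퐵 and 퐴 ∘ 퐵, and only a finite slice of 퐴*, since 퐴* itself is infinite here.

```python
A = {'good', 'bad'}
B = {'boy', 'girl'}

union = A | B                                    # A ∪ B
concat = {x + y for x in A for y in B}           # A ∘ B (as strings, without spaces)

def star_up_to(A, k):
    """Concatenations of at most k members of A: a finite piece of A*."""
    result = {''}                                # ε is always in A*
    for _ in range(k):
        result |= {x + y for x in result for y in A}
    return result

print(sorted(concat))            # ['badboy', 'badgirl', 'goodboy', 'goodgirl']
print(sorted(star_up_to(A, 2)))  # ε, single words, and all two-word concatenations
```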
Theorem 1.6: The collection of regular languages is closed under regular operations. In 
other words, if we take 2 regular languages (or 1 regular language, for *) and apply a regular 
operation, we get another regular language. 
We say the integers are “closed” under multiplication and addition, but not “closed” 
under division, because if you divide one by another, you might not get an integer. Closed 
means “you can’t get out” by using the operation. 
Proof of closure under ∪. We show that if 퐴 and 퐵 are regular, then so is 퐴 ∪ 퐵. 
We have to show how to construct the automaton for the union language given the 
automata that recognize 퐴 and 퐵, i.e. given 
푀1 = (푄1, Σ, 훿1, 푞1, 퐹1) recognizing 퐴 
푀2 = (푄2, Σ, 훿2, 푞2, 퐹2) recognizing 퐵 
construct 푀 = (푄,Σ, 훿, 푞0, 퐹) recognizing 퐴 ∪ 퐵. (For simplicity, let Σ1 = Σ2 = Σ.) 
You might think: run the string through 푀1, see whether 푀1 accepts it, then run the 
string through 푀2 and see whether 푀2 accepts it. But you can’t try something on the whole 
input string, and try another thing on the whole input string! You get only 1 pass. 
Imagine yourself in the role of 푀. 
The solution is to run both 푀1 and 푀2 at the same time. Imagine putting two fingers 
on the diagrams of the automata for 푀1 and 푀2, and moving them around. At the end, if 
either finger is on an accept state, then we accept. This strategy we can implement in 푀. 
We now formalize this idea. 
We should keep track of a state in 푀1 and a state in 푀2 as a single state in 푀. So each 
state in 푀 corresponds to a pair of states, one in 푀1 and one in 푀2; let 
푄 = 푄1 × 푄2 = {(푞, 푟) : 푞 ∈ 푄1, 푟 ∈ 푄2} . 
How do we define 훿? When a new symbol comes in, we go to wherever 푞 goes and 
wherever 푟 goes, individually. 
훿((푞, 푟), 푎) = (훿1(푞, 푎), 훿2(푟, 푎)). 
The start state is 푞0 = (푞1, 푞2). The accept set is 
퐹 = (퐹1 × 푄2) ∪ (푄1 × 퐹2). 
(Note 퐹1 × 퐹2 gives intersection.) 
It is clear by induction that after reading 푘 symbols, the state of 푀 is just the pair consisting 
of the states of 푀1 and 푀2 after reading those 푘 symbols. 
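Here is a sketch of this product construction in Python. Representing a DFA as a tuple (states, transition dict, start, accept set) is a hypothetical convention chosen for illustration, not something from the notes.

```python
from itertools import product

def union_dfa(M1, M2, alphabet):
    """Product construction: a DFA recognizing L(M1) ∪ L(M2)."""
    Q1, d1, s1, F1 = M1
    Q2, d2, s2, F2 = M2
    Q = set(product(Q1, Q2))                               # Q = Q1 × Q2
    delta = {((q, r), a): (d1[(q, a)], d2[(r, a)])         # move both fingers at once
             for (q, r) in Q for a in alphabet}
    start = (s1, s2)
    F = {(q, r) for (q, r) in Q if q in F1 or r in F2}     # (F1 × Q2) ∪ (Q1 × F2)
    return Q, delta, start, F
```

Replacing `or` by `and` in the accept set gives 퐹1 × 퐹2, i.e. the construction for intersection remarked on above.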
Problem 1.3: Prove that the collection of regular languages is closed under concatenation 
and Kleene star. 
Note: The following is my solution. See the next lecture for an easier way to phrase it. 
Proof of closure under ∘. To know whether a string 푤 is in 퐴 ∘ 퐵, we think as follows: 
Suppose reading from the beginning of 푤 we see a string in 퐴, say 푥1 · · · 푥푎1 . In other words, 
we get to an accept state in 푄1. Then maybe we have 
푤 = (푥1 · · · 푥푎1)(푥푎1+1 · · · 푥푛), where 푥1 · · · 푥푎1 ∈ 퐴 and 푥푎1+1 · · · 푥푛 ∈ 퐵. 
But maybe we should keep reading until next time we get to an accept state in 푄1, say step 
푎2, and 
푤 = (푥1 · · · 푥푎2)(푥푎2+1 · · · 푥푛), where 푥1 · · · 푥푎2 ∈ 퐴 and 푥푎2+1 · · · 푥푛 ∈ 퐵. 
But maybe we have 
푤 = (푥1 · · · 푥푎3)(푥푎3+1 · · · 푥푛), where 푥1 · · · 푥푎3 ∈ 퐴 and 푥푎3+1 · · · 푥푛 ∈ 퐵! 
So the possibilities “branch”—imagine putting one more finger on the diagram each time we 
get to an accept state; one finger then goes to 푄2 and the other stays at 푄1. Our fingers will 
occupy a subset of the union 푄1 ∪ 푄2, so let 
푄 = 풫(푄1 ∪ 푄2), the set of subsets of 푄1 ∪ 푄2. 
Now define 
훿(푆, 푎) = {훿(푠, 푎) : 푠 ∈ 푆} if 퐹1 ∩ 푆 = 휑, and 
훿(푆, 푎) = {훿(푠, 푎) : 푠 ∈ 푆} ∪ {훿(푞2, 푎)} if 퐹1 ∩ 푆 ≠ 휑. 
The start state is {푞1} and the accept set is 
퐹 = {푆 ∈ 푄 : 퐹2 ∩ 푆 ≠ 휑} , 
i.e. the set of subsets that contain at least one element of 퐹2. Details of checking this works 
left to you! 
Note this solution involves “keeping track of multiple possibilities.” We’ll need to do this 
often, so we’ll develop some machinery—namely, a type of finite automaton that can keep 
track of multiple possibilities—that simplifies the writing of these proofs. 
Lecture 2 
Tue. 9/11/12 
The first problem set is out. Turn in the homework in 2-285. About homeworks: The 
optional problems are only for A+’s; we count how many optional problems you solved 
correctly. 
Look at the homework before the day before it’s due! The problems aren’t tedious lemmas 
that Sipser doesn’t want to do in lectures. He chose them for creativity, the “aha” moment. 
They encourage you to play with examples, and don’t have overly long writeups. Write each 
problem on a separate sheet, and turn them in in separate boxes in 2-285. 
Last time we talked about 
∙ finite automata 
∙ regular languages 
∙ regular operations, and 
∙ closure under ∪. 
Today we’ll talk about 
∙ regular expressions, 
∙ nondeterminism, 
∙ closure under ∘ and *, and 
∙ 퐹퐴 → regular expressions. 
S1 Regular expressions 
Recall that the regular operations are ∪, ∘, and *. 
Definition 2.1: A regular expression is an expression built up from members of Σ (the 
alphabet) and 휑, 휀 using ∪, ∘, and *. 
For example, if Σ = {푎, 푏}, we can build up regular expressions such as 
(푎* ∪ 푎푏) = (푎* ∪ 푎 ∘ 푏). 
Here we consider 푎 as a single string of length 1, so 푎 is shorthand for {푎}. 휀 might also 
appear, so we might have something like 푎* ∪ 푎푏 ∪ 휀 (which is the same since 휀 ∈ 푎*; the 
language that the expression describes is the same). We also write 퐿(푎*∪푎푏∪휀) to emphasize 
that the regular expression describes a language. 
Regular expressions are often used in text editors in string matching. 
Our goal for the next 1½ lectures is to prove the following. 
Theorem 2.2: Regular expressions and finite automata describe the same 
class of languages. In other words, 
1. Every finite automaton can be converted to a regular expression which generates the 
same language and 
2. every regular expression can be converted to finite automaton that recognizes the same 
language. 
Even though these 2 methods of computation (regular expressions and finite automata) 
seem very different, they capture the same language! To prove this, we’ll first have to develop 
some technology. 
S2 Nondeterminism 
First, let’s think about how to prove the closure properties from last time. We showed that 
if 퐴1 and 퐴2 are regular, so is 퐴1 ∪ 퐴2. To do this, given a machine 푀1 recognizing 퐴1 and 
a machine 푀2 recognizing 퐴2, we built a machine 푀 that recognizes 퐴1 ∪ 퐴2 by simulating 
퐴1 and 퐴2 in parallel. 
Now let’s prove closure under concatenation: If 퐴1 and 퐴2 are regular, then so is 퐴1퐴2. 
We start off the same way. Suppose 푀1 recognizes 퐴1 and 푀2 recognizes 퐴2; we want to 
construct 푀 recognizing 퐴1퐴2. 
What does 푀 need to do? Imagine a string 푤 going into 푀... Pretend like you are 푀; 
you have to answer if 푤 is in the concatenation 퐴1퐴2 or not, i.e. you have to determine if it 
is possible to cut 푤 into 2 pieces, the first of which is in 퐴1 and the second of which is in 퐴2. 
[Diagram: the string 푊, cut into two pieces, the first in 퐴1 and the second in 퐴2.] 
Why don’t we feed 푊 into 푀1 until we get to an accept state, and then transition control 
to 푀2 by going to the start state of 푀2? 
The problem with this approach is that just because you found an initial piece of 푊 in 
퐴1 does not necessarily mean you found the right place to cut 푊! It’s possible that the 
remainder is not in 퐴2, and you wrongly reject the string. Maybe you should wait until later 
time to switch to 퐴2. There are many possible ways of cutting. 
[Diagram: 푊 again, with several possible places to cut it into a piece in 퐴1 followed by a piece in 퐴2.] 
We introduce the idea of nondeterminism to give an elegant solution to this problem. 
2.1 Nondeterministic Finite Automata 
Consider, for example, the following automaton, which we’ll call 퐵. 
[Diagram of NFA 퐵: states 푞1, 푞2, 푞3, 푞4, with start state 푞1 and accept state 푞4; a self-loop labeled 0,1 at 푞1; an arrow labeled 1 from 푞1 to 푞2; an arrow labeled 1 from 푞2 to 푞3; and an arrow labeled 0, 휀 from 푞3 to 푞4.] 
How is this different from a finite automaton? Note that there are two “1” arrows from 
푞1. In a nondeterministic finite automaton there may be several ways to proceed. The 
present state does NOT determine the next state; there are several possible futures. We also 
permit 휀 to be a label, as a matter of convenience. 
How does this automaton work? 
We have multiple alternative computations on the input. When there is more than 1 
possible way to proceed, we take all of them. Imagine a parallel computer following each of 
the paths independently. When the machine comes to point of nondeterminism, imagine it 
forking into multiple copies of itself, each going like a separate thread in a computer program. 
An 휀 label means that you can take the transition for free, without reading input. The other 
transitions are taken by reading one input symbol. (In some cases there is no arrow to follow. In those cases 
the thread just dies off.) 
What do we do when parallel branches differ in their output? One choice might end up 
at 푞4, and another may end up not at 푞4. Only one path needs to lead to an accept state, for 
the entire machine to accept. If any computational branch leads to an accepting state, we 
say the machine accepts the input. Acceptance overrules rejection. We reject only if every 
possible way to proceed leads to rejection. 
Although this seems more complicated than the finite automata we’ve studied, we’ll prove 
that it doesn’t give anything new. We’ll show that anything you can do with nondeterministic 
finite automata, you can also do with (deterministic) finite automata. 
[The NFA 퐵 again, as pictured above.] 
Let’s look at a specific example. Take 01011 as the input. Point your finger at the start 
state 푞1. 
∙ Read 0. We follow the loop back to 푞1. 
∙ Read 1. There are 2 arrows with “1” starting at 푞1, so split your finger into 2 fingers, 
to represent the 2 different places machine could be: 푞1 and 푞2. 
∙ 0. Now each finger proceeds independently, because they represent different threads of 
computation. The finger at 푞1 goes back to 푞1. There is no place for the finger at 푞2 
to go (because there is no arrow with 0 from 푞2), so remove that finger. We just have 
{푞1} left. 
∙ 1. We branch into 푞1, 푞2. 
∙ 1. Following “1” arrows from 푞1 and 푞2, we can get to 푞1, 푞2, 푞3. But note there is an 
휀 transition from 푞3 to 푞4. This means we can take that transition for free. From a 
finger being on 푞3, a new thread gets opened on to 푞4. We end up with all states 푞1, 
푞2, 푞3, and 푞4. 
Each finger represents a different thread of the computation. Overall the machine accepts 
because at least 1 finger (thread of computation) ended up at an accepting state, 푞4. The 
NFA accepts this string, i.e. 01011 ∈ 퐿(퐵). By contrast 0101 ∉ 퐿(퐵), because at this point 
we only have fingers on 푞1, 푞2; all possibilities are reject states. 
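The “fingers” bookkeeping can be written out as a short sketch: keep the set of currently occupied states, follow 휀-arrows for free, and accept if an accept state is occupied at the end. The transition table below encodes the NFA 퐵 as reconstructed from the trace above (the diagram itself is a figure in the original), so treat it as an assumption.

```python
# NFA B: start state q1, accept state q4; '' is used as the label for ε-arrows.
delta = {
    ('q1', '0'): {'q1'}, ('q1', '1'): {'q1', 'q2'},
    ('q2', '1'): {'q3'},
    ('q3', '0'): {'q4'}, ('q3', ''): {'q4'},
    # a missing entry means "no arrow": that thread of computation dies
}
start, accept_states = 'q1', {'q4'}

def eps_closure(states):
    """Add every state reachable from `states` by following ε-arrows."""
    todo, closed = list(states), set(states)
    while todo:
        q = todo.pop()
        for r in delta.get((q, ''), set()) - closed:
            closed.add(r)
            todo.append(r)
    return closed

def nfa_accepts(w):
    current = eps_closure({start})                  # the set of fingers
    for a in w:
        current = eps_closure({r for q in current for r in delta.get((q, a), set())})
    return bool(current & accept_states)            # is some finger on an accept state?

assert nfa_accepts('01011')      # one thread reaches q4
assert not nfa_accepts('0101')   # fingers end up only on q1 and q2
```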
We now make a formal definition. 
Definition 2.3: Define a nondeterministic finite automaton (NFA) 푀 = (푄, Σ, 훿, 푞0, 퐹) 
as follows. 푄, Σ, 푞0, and 퐹 are the same as in a finite automaton. Here 
훿 : 푄 × Σ휀 → 풫(푄), 
where 풫(푄) = {푅 : 푅 ⊆ 푄} is the power set of 푄, the collection of subsets of 푄 (all the 
different states you can get to from the input symbol), and Σ휀 = Σ ∪ {휀}. 
In our example, 훿(푞1, 1) = {푞1, 푞2} and 훿(푞3, 휀) = {푞4}. Note 훿 may give you back the 
empty set, 훿(푞2, 0) = 휑. 
The only thing that has a different form from a finite automaton is the transition function 
훿. 훿 might give you back several states, i.e. whole set of states. 
2.2 Comparing NFA’s with DFA’s 
We now show that any language recognized by a NFA is also recognized by a DFA (deterministic 
finite automaton), i.e. is regular. This means they recognize the same class of 
languages. 
Theorem 2.4 (NFA’s and DFA’s recognize the same languages): If 퐴 = 퐿(퐵) for a NFA 
퐵, then 퐴 is regular. 
Proof. The idea is to convert a NFA 퐵 to DFA 퐶. 
Pretend to be a DFA. How would we simulate a NFA? In the NFA 퐵 we put our fingers 
on some collection of states. Each possibility corresponds not to a single state, but to a 
subset of states of 퐵. 
What should the states of 퐶 be? The states of 퐶 should be the power set of 퐵, i.e. the 
set of subsets of 퐵. In other words, each state of 퐶 corresponds to some 푅 ⊆ 푄. 
[Diagram: a set of states 푅 ⊆ 푄 of the NFA 퐵 corresponds to a single state of the DFA 퐶.] 
Let 퐵 = (푄, Σ, 훿, 푞0, 퐹); we need to define 퐶 = (푄′, Σ, 훿′, 푞′0, 퐹′). Let 푄′ = 풫(푄) (the 
power set of 푄), so that if 퐵 has 푛 states, then 퐶 has 2^푛 states. For 푅 ⊆ 푄 (i.e. 푅 ∈ 푄′), 
define 
훿′(푅, 푎) = {푞 ∈ 푄 : 푞 ∈ 훿(푟, 푎) for some 푟 ∈ 푅, or 푞 is reachable by following 휀-arrows from such a state} . 
(The textbook says it more precisely.) 
[Diagram: reading a 1 in the NFA 퐵 moves the set of fingers to a new set of states; correspondingly the DFA 퐶 moves from the state 푅 ⊆ 푄 to the new subset.] 
The start state of 퐶 is the set consisting of the start state of 퐵 together with anything you can 
get to from it by 휀-transitions. The accept states of 퐶 are the subsets containing at least one accept state 
of 퐵. 
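The subset construction itself fits in a few lines. The sketch below builds only the subsets actually reachable from the start set (equivalent to the full powerset construction described above, just without the unreachable states). The NFA is assumed to be given as a transition dict like the one in the earlier NFA sketch, with '' as the 휀 label — again an illustrative convention.

```python
def nfa_to_dfa(delta, start, accept_states, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    def closure(states):                            # follow ε-arrows (label '')
        todo, closed = list(states), set(states)
        while todo:
            q = todo.pop()
            for r in delta.get((q, ''), set()) - closed:
                closed.add(r)
                todo.append(r)
        return frozenset(closed)

    start_set = closure({start})
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        R = todo.pop()
        for a in alphabet:
            S = closure({r for q in R for r in delta.get((q, a), set())})
            dfa_delta[(R, a)] = S
            if S not in seen:
                seen.add(S)
                todo.append(S)
    dfa_accepts = {R for R in seen if R & accept_states}   # subsets containing an accept state
    return seen, dfa_delta, start_set, dfa_accepts
```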
NFA’s and DFA’s describe the same class of languages. Thus to show a language is 
regular, you can just build an NFA that recognizes it, rather than a DFA. 
Many times it is more convenient to build an NFA rather than a DFA, especially if 
you want to keep track of multiple possibilities. 
S3 Using nondeterminism to show closure 
Nondeterminism is exactly what we need to show that the concatenation of two regular 
languages is regular. As we said, maybe we don’t want to exit the first machine the first 
time we get to an accept state; maybe we want to stay in 푀1 and jump later. We want 
multiple possibilities. 
Proof of closure under ∘. Given 푀1 recognizing 퐴1 and 푀2 recognizing 퐴2, define 푀 as 
follows. Put the two machines 푀1 and 푀2 together. Every time you enter an accept state 
in 푀1, you are allowed to branch by an 휀-transition to the start state of 푀2—this represents 
the fact that you can either start looking for a word in 퐴2, or continue looking for a word in 
퐴1. Now declassify the accepting states of 푀1, so that only the accept states of 푀2 remain accepting. We’re done! 
[Diagram: 푀1 and 푀2 combined, with 휀-transitions from each accept state of 푀1 to the start state of 푀2.] 
Nondeterminism keeps track of parallelism of possibilities. Maybe you got to an accepting 
state but you should have waited until a subsequent state. We have a thread for every possible 
place to transition from 퐴1 to 퐴2; we’re basically trying all possible break points in parallel. 
Another way to think of NFA’s is that they enable “guessing.” Our new machine 푀 
simulates 푀1 until it guesses that it found the right transition point. We “guess” this is the 
right place to jump to 푀2. This is just another way of saying we make a different thread. 
We’re not sure which is the right thread, so we make a guess. We accept if there is at least one 
correct guess. 
Next we show that if 퐴1 is regular, then so is 퐴1*. 
Proof of closure under *. Suppose 푀1 recognizes 퐴1. We construct 푀 recognizing 퐴1*. We 
will do a proof by picture. 
What does it mean for a word 푊 to be in 퐴1*? 푊 is in 퐴1* if we can break it up into 
pieces that are in the original language 퐴1. 
[Diagram: 푊 divided into pieces, each of which is in 퐴1.] 
Every time we get to an accept state of 푀1, we’ve read a word in 퐴1 and we 
might want to start over. So we put an 휀-transition leading from each accept state to the start 
state. 
As in the case with concatenation, we may not want to reset at the first cut point, because 
maybe there is no way to cut the remaining piece into words in 퐴1. So every time we get to an 
accept state, we have the choice to restart—we split into 2 threads, one that continues the 
current word, and one that restarts. 
There is a slight problem: we need to accept the empty string as well. 
To do this we add a new start state, and add an 휀-transition to the old start state. Then 
we’re good. 
NFA’s also give us an easier way to prove closure under union. 
Proof of closure under ∪. Suppose we’re given 푀1 recognizing 퐴1 and 푀2 recognizing 퐴2. 
To build 푀 recognizing 퐴1 ∪ 퐴2, it needs to go through 푀1 and 푀2 in parallel. So we 
put the two machines together, add a new start state, and have it branch by 휀-transitions 
to the start states of both 푀1 and 푀2. This way we’ll have a finger in 푀1 and a finger in 푀2 
at the same time. 
S4 Converting a finite automaton into a regular expression 
The proof of the closure properties gives us a procedure for converting a regular expression 
into a finite automaton. This procedure comes right out of the construction of machines for 
∪, ∘, and *. This will prove part 2 of Theorem 2.2. 
We do a proof by example: consider (푎푏 ∪ 푎*). We convert this to a finite automaton as 
follows. For 푎, 푏 we make the following automata. 
We build up our expression from small pieces and then combine. Let’s make an automaton 
for 푎푏. We use our construction for closure under concatenation. 
This machine recognizes 푎푏. Now we do 푎*. 
Finally we put the FA’s for 푎푏 and 푎* together, using the ∪ construction, to get the FA 
recognizing 푎푏 ∪ 푎*. 
The constructions for ∪, ∘, and * give a way to construct a FA for any regular 
expression. 
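The same build-up can be written as a short recursive sketch. The regular expression is assumed to be given as a nested tuple (a hypothetical representation, since the notes work by picture), and each of the ∪, ∘, and * cases wires in fresh states and 휀-arrows exactly as in the constructions above.

```python
import itertools

_fresh = itertools.count()          # generator of fresh state names

def regex_to_nfa(regex):
    """Return (start, accept, moves), where moves is a set of (state, label, state)
    and the label '' means an ε-arrow.  regex is a symbol such as 'a', or
    ('union', r1, r2), ('concat', r1, r2), ('star', r)."""
    if isinstance(regex, str):                      # a single symbol (or '' for ε)
        s, f = next(_fresh), next(_fresh)
        return s, f, {(s, regex, f)}
    op = regex[0]
    if op == 'concat':                              # ε-arrow from accept of first to start of second
        s1, f1, m1 = regex_to_nfa(regex[1])
        s2, f2, m2 = regex_to_nfa(regex[2])
        return s1, f2, m1 | m2 | {(f1, '', s2)}
    s, f = next(_fresh), next(_fresh)               # fresh start/accept for union and star
    if op == 'union':                               # branch by ε to both machines
        s1, f1, m1 = regex_to_nfa(regex[1])
        s2, f2, m2 = regex_to_nfa(regex[2])
        return s, f, m1 | m2 | {(s, '', s1), (s, '', s2), (f1, '', f), (f2, '', f)}
    if op == 'star':                                # loop back to the start, and accept ε
        s1, f1, m1 = regex_to_nfa(regex[1])
        return s, f, m1 | {(s, '', s1), (s, '', f), (f1, '', s1), (f1, '', f)}
    raise ValueError(f"unknown operation {op!r}")

# NFA for (ab ∪ a*):
start, accept, moves = regex_to_nfa(('union', ('concat', 'a', 'b'), ('star', 'a')))
```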
Lecture 3 
Thu. 9/13/12 
Last time we talked about 
∙ nondeterminism and NFA’s 
∙ NFA→DFA 
∙ Regular expression→ NFA 
Today we’ll talk about 
∙ DFA→regular expression 
∙ Non-regular languages 
About the homework: By the end of today, you should have everything you need to solve 
all the homework problems except problem 6. Problem 3 (1.45) has a 1 line answer. As a 
hint, it’s easier to show there exists a finite automaton; you don’t have to give a procedure 
to construct it. 
We will finish our discussion of finite automata today. We introduced deterministic and 
nondeterministic automata. Nondeterminism is a theme throughout the course, so get used 
to it. 
We gave a procedure—the subset construction—to convert NFA to DFA. NFA helped 
achieve part of our goal to show regular expressions and NFAs recognize the same languages. 
We showed how to convert regular expressions to NFA, and NFA can be converted to DFA. 
To convert regular expressions, we used the constructions for closure under ∪, ∘, and *; 
we start with the atoms of the expression, and build up using more and more complex 
subexpressions, until we get the language recognized by the whole expression. This is a 
recursive construction, i.e. a proof by induction, a proof that calls on itself on smaller 
values. 
Today we’ll do the reverse, showing how to convert a DFA to a regular expression, 
finishing our goal. 
S1 Converting a DFA to a regular expression 
Theorem 3.1 (Theorem 2.2, again): 퐴 is a regular language iff 퐴 = 퐿(푟) for some regular 
expression 푟. 
Proof. ⇐: Show how to convert 푟 to an equivalent NFA. We did this last time. 
⇒: We need to convert a DFA to an equivalent 푟. This is harder, and will be the focus 
of this section. 
We’ll digress and introduce another model of an automaton, which is useful just for the 
purposes of this proof. 
A generalized nondeterministic finite automaton (GNFA) has states, some accepting, 
one of which is the start state. We have transitions as well. What’s different is that we 
can write not just members of the alphabet and the empty string but any regular expression 
as a label for a transition. So for instance we could write 푎푏. 
Start at the start state. During a transition, the machine gets to read an entire chunk of 
the input in a single step, provided that the string is in the language described by the label 
on the associated transition. 
There may be several ways to process the input string. The machine accepts if some 
possibility ends up at an accept state, i.e. there is some way to cut and read the input 
string. If all paths fail then the machine rejects the input. 
Although GNFA’s look more complicated, they still recognize the same languages as 
DFA’s! 
It looks harder to convert a GNFA to a regular expression, GNFA→r. However, for 
inductive proofs, it is often helpful to prove something stronger along the way, so we can 
carry through the statement. In other words, we strengthen the induction hypothesis. 
To make life easier, we make a few assumptions about the GNFA. 
∙ First, there is only 1 accept state. To achieve this, we can declassify accept states, and 
add 휀-transitions from them to a single new accept state. 
∙ The accept state and start states are different (taken care of by 1st bullet). 
∙ No incoming transitions come to the start state. To achieve this, make a new start 
state with an 휀-transition going to the previous start state. 
∙ There are only transitions to, not from, the accept state (taken care of by 1st bullet). 
∙ Add all possible transitions between states except the start and end states. If we are 
lacking a transition, add a transition labeled 휑. We can go along this transition only by reading a 
string in the language described by 휑. This means we can never go along this transition, since 휑 
describes the empty language. 
For instance, we can modify our example to satisfy these conditions as follows. 
Lemma 3.2: For every 푘 ≥ 2, every GNFA with 푘 states has an equivalent regular expression 
푅. 
Proof. We induct on 푘. 
The base case is 푘 = 2. We know what the states are: the machine has a start state 
(no incoming arrows) and an accept state. Assuming the conditions above, the only possible 
arrow is from the start to end, so the machine looks like the following. There are no return 
arrows or self-loops. 
[Diagram: start state 푞1 and accept state 푞2, with the single arrow from 푞1 to 푞2 labeled 푅.] 
The only way to accept is to read a string in 푅; the machine can only process the input in 
its entirety in one bite, so the language is just that of the regular expression 푅. This is the easy 
part. 
Now for the induction step. Assume the lemma is true for 푘; we prove it for 푘 + 1. Suppose 
we’re given a (푘 + 1)-state GNFA. We need to show this has a corresponding regular 
expression. We know how to convert 푘-state GNFA to a regular expression. Thus, if we can 
convert the (푘 + 1)-state to a 푘-state GNFA, then we’re done. You can think of this as an 
iterative process: convert (푘 +1) to 푘 to 푘 −1 states and so on, wiping out state after state, 
and keeping the language the same, until we get to just 2 states, where we can read off the 
regular expression from the single arrow. 
We’ll pick one state 푥 (that is not the start or accept state) and remove it. Since 푘+1 ≥ 3, 
there is a state other than the start and accept state. But now the machine doesn’t recognize 
the same language anymore. We broke the machine! 
We have to repair the machine, by putting back the computation paths that got lost by 
removing the state. 
This is where the magic of regular expressions comes in. 
Suppose we have arrows 푖 → 푥 → 푗. We can’t follow this path because 푥 is gone. In the 
arrow from 푖 to 푗, we have to put back the strings that got lost. So if the arrow from 푖 to 푥 is labeled 푟1 and the arrow from 푥 to 푗 is labeled 푟3, 
then we add in 푟1푟3 from 푖 to 푗, so we can go directly from 푖 to 푗 via 푟1푟3. However, letting 
the self-loop at 푥 be 푟2, we might go along 푟1, repeat 푟2 for a while, and then go to 푟3, 
so we actually want 푟1(푟2)*푟3. Now take the union with the regular expression 푟4 already labeling the arrow from 푖 to 푗, giving 
푟1(푟2)*푟3 ∪ 푟4. 
So the construction is as follows. For each pair of states 푖, 푗, where the arrow from 푖 to 푗 is labeled 푟4, replace 푟4 with 
푟1(푟2)*푟3 ∪ 푟4 
where 푟1, 푟2, 푟3 are as above. All arrows are adjusted in the same way. The computations that 
go from 푖 to 푗 via 푥 in the old machine are still present in the new machine, and go directly 
from 푖 to 푗. 
Our modified machine is equivalent to the original machine. Taking any computation in 
first machine, there is a corresponding computation in second machine on the same input 
string, and vice versa. This finishes the proof. 
Theorem 2.2 now follows, since a DFA is a GNFA. 
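The state-elimination argument translates directly into an algorithm: keep a table of regular-expression labels between pairs of states (built here as plain strings with |, concatenation, and *), and rip out one interior state at a time, repairing each affected pair with 푟1(푟2)*푟3 ∪ 푟4. This is an illustrative Python sketch under assumed input conventions, not the notes' own procedure; the expression it returns is correct but not simplified.

```python
def dfa_to_regex(states, delta, start, accept_states):
    """State elimination on a GNFA.  delta maps (state, symbol) -> state."""
    S, A = '_start', '_accept'                       # fresh start and accept states
    R = {}                                           # R[(i, j)] = label; missing = φ

    def get(i, j):
        return R.get((i, j))

    def add(i, j, r):                                # union a new expression into a label
        R[(i, j)] = r if get(i, j) is None else '(' + get(i, j) + '|' + r + ')'

    add(S, start, '')                                # ε from the new start state
    for q in accept_states:
        add(q, A, '')                                # ε into the new accept state
    for (q, a), p in delta.items():
        add(q, p, a)

    for x in states:                                 # rip out one original state at a time
        loop = get(x, x)                             # r2: the self-loop at x (may be absent)
        mid = '(' + loop + ')*' if loop else ''
        ins = [(i, get(i, x)) for i in [S] + list(states)
               if i != x and get(i, x) is not None]
        outs = [(j, get(x, j)) for j in list(states) + [A]
                if j != x and get(x, j) is not None]
        for i, r1 in ins:
            for j, r3 in outs:
                add(i, j, r1 + mid + r3)             # r1 (r2)* r3, unioned with the old label
        R = {k: v for k, v in R.items() if x not in k}
    return get(S, A) if get(S, A) is not None else '∅'

# e.g. the one-state DFA with a self-loop on 0, whose single state accepts:
print(dfa_to_regex(['q'], {('q', '0'): 'q'}, 'q', {'q'}))   # (0)*
```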
S2 Non-regular languages 
There are lots of languages that are not recognized by any finite automaton. We’ll see how to 
prove a specific language is non-regular. 
Let 
퐶 = {푤 : 푤 has equal number of 0s and 1s} . 
As we’ve said, it seems like 퐶 is not regular because it has to keep track of the difference 
between the number of 0s and 1s, and that would require infinitely many states. But be 
careful when you claim a machine can’t do something—maybe the machine just can’t do it 
following the method you came up with! 
! “I can’t think of a way; if I try to come up with one I fail” doesn’t hold water as a proof! 
As an example, consider 
퐵 = {푤 : 푤 has equal number of 01 and 10 substrings} . 
For example 1010 ∉ 퐵, but 101101 ∈ 퐵. This language may look nonregular because it 
looks like we have to count. But it is regular, because there is an alternative way to describe 
it that avoids counting. 
Problem 3.1: Show that 퐵 is regular. 
2.1 Pumping Lemma 
We give a general method that works in a large number of cases for showing a language is not 
regular, called the Pumping Lemma. It is a formal method for proving languages not 
regular. Later on, we will see similar methods for proving that problems cannot be solved 
by other kinds of machines. 
Lemma 3.3 (Pumping Lemma): For any regular language 퐴, there is a number 
푝 where if 푠 ∈ 퐴 and |푠| ≥ 푝 then 푠 = 푥푦푧 where 
1. 푥푦푖푧 ∈ 퐴 for any 푖 ≥ 0 (We can repeat the middle and stay in the language.) 
2. 푦 ≠ 휀 (Condition 1 is nontrivial.) 
3. |푥푦| ≤ 푝 (Useful for applications.) 
What is this doing for us? The Pumping Lemma gives a property of regular languages. 
To show a language is not regular, we just need to show it doesn’t have the property. 
The property is that the language has a pumping length, or cutoff 푝. For any string 푠 
longer than the cutoff, we can repeat some middle piece (푦푖) as much as we want and stay in 
the language. We call this pumping up 푠. Every long enough string in the regular language 
can be pumped up as much as we want and the string remains in the language. 
Before we give a proof, let’s see an example. 
Example 3.4: Let 
퐷 = {0푚1푚 : 푚 ≥ 0} . 
Show that 퐷 is not regular using the Pumping Lemma. 
To show a language 퐷 is not regular, proceed by contradiction: If 퐷 is regular, then 
it must have the pumping property. Exhibit a string of 퐷 that cannot be pumped 
no matter how we cut it up. This shows 퐷 does not have the pumping property, so 
it can’t be regular. 
Assume 퐷 is regular. The pumping lemma gives a pumping length 푝. We find a string 
longer than 푝 that can’t be pumped: let 푠 = 0푝1푝 ∈ 퐷. 
푠 = 0 · · · 0 1 · · · 1, consisting of 푝 zeros followed by 푝 ones. 
There must be some way to divide 푠 into 3 pieces, so that if we repeat 푦 we stay in the 
same language. 
But we can’t pump 푠 no matter where 푦 is. One of the following cases holds: 
1. 푦 is all 0’s 
2. 푦 is all 1’s 
3. 푦 has both 0’s and 1’s. 
If 푦 is all 0’s, then repeating 푦 gives too many 0’s, and takes us out of the language. If 푦 is 
all 1’s, repeating gives too many 1’s. If 푦 has both 0’s and 1’s, they are out of order when 
we repeat. In each case, we are taken out of the language, so the pumping property fails, and 퐷 is 
not regular. 
If we use condition 3 of the Pumping Lemma we get a simpler proof: 푥푦 is entirely in 
the first half of 푠, so 푦 must be all 0’s (case 1). Then 푥푦푦푧 has excess 0’s and so 푥푦2푧 ∉ 퐷. 
Now we prove the Pumping Lemma. 
Proof of Lemma 3.3. Let 푀 be the DFA for 퐴. Let 푝 be the number of states of 푀. This 
will be our pumping length. 
Suppose we have a string of length at least 푝. Something special has to happen when the 
machine reads the string: We have to repeat a state! We have to repeat a state within the 
first 푝 steps (because after 푝 steps we’ve made 푝 + 1 visits to states, including the starting 
state). Consider the first repeated state, drawn in the diagram below. 
Divide the path into 3 parts: 푥, 푦, and 푧. Note we can choose 푦 nonempty because we’re 
saying the state is repeated. From this we see that we can repeat 푦 as many times as we 
want. 
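The proof is constructive: running the DFA on 푠 and cutting at the first repeated state yields the decomposition. Here is a sketch, reusing the dictionary representation of a DFA from the earlier sketches (an assumed convention, not from the notes).

```python
def pumping_decomposition(delta, start, s):
    """Return (x, y, z) with s = xyz, y nonempty and |xy| at most the number of
    states, by cutting s at the first repeated state along the DFA's run."""
    seen = {start: 0}                  # state -> position at which it was first visited
    state = start
    for pos, symbol in enumerate(s, 1):
        state = delta[(state, symbol)]
        if state in seen:              # first repetition: the loop in between reads y
            i, j = seen[state], pos
            return s[:i], s[i:j], s[j:]
        seen[state] = pos
    raise ValueError("|s| must be at least the number of states of the DFA")

# With the "contains 11" DFA from the first sketch, the string '00110' splits as
# x = '', y = '0', z = '0110', and x y^i z contains 11 for every i, as promised:
# pumping_decomposition(delta, 'q1', '00110')
```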
Example 3.5: Now we show 
퐶 = {푤 : 푤 has equal number of 0s and 1s} . 
is not regular. There are two ways to proceed. One is to use the Pumping Lemma directly 
(this time we need to use condition 3) and the other way is to use the fact that we already 
know 퐷 is not regular. 
What is wrong with the following proof? Because 퐷 is not regular and 퐷 ⊆ 퐶, 퐶 is not 
regular. 
! Regular languages can have nonregular languages as subsets, and vice versa. Subsets 
tell you nothing about regularity. 
However, if we combine the fact that 퐷 ⊆ 퐶 with some extra features of 퐶, then we 
can come up with a proof. Note 
퐷 = 퐶 ∩ 0*1*. 
Note 0*1* is regular. If 퐶 were regular, then 퐷 would be regular, because the intersection 
of 2 regular languages is regular. Since 퐷 is not regular, neither is 퐶. 
The Pumping Lemma is a powerful tool for showing languages are nonregular, especially 
when we combine it with the observation that regular languages are closed 
under regular operations. 
Lecture 4 
Tue. 9/18/12 
Last time we talked about 
∙ Regular expressions← DFA 
∙ Pumping lemma 
Today we’ll talk about CFG’s, CFL’s, and PDA’s. 
Homework 1 is due Thursday. 
∙ Use separate sheets. 
∙ No bibles, online solutions, etc. 
∙ Office hours 
– Michael Sipser: Monday 3-5 
– Zack: Tuesday 4-6 32-6598 
– Alex: Wednesday 2-4 32-6604 
S0 Homework hints 
Problem 2 (1.67, rotational closure): 
If 퐴 is a language, 푤 = 푥푦 ∈ 퐴, then put 푦푥 ∈ 푅퐶(퐴). Prove that if 퐴 is regular, then 
푅퐶(퐴) is also regular. If 푀 is a finite automaton and 퐿(푀) = 퐴, then you need to come 
up with a finite automaton that recognizes the rotational closure of 퐴. The new automaton 
must be able to deal with inputs that look like 푦푥. 
Don’t just try to twiddle 푀. 
If you were pretending to be a finite automaton yourself, how would you go about determining 
whether a string is in the rotational closure of the original language? 
Recall, for 푦푥 to be in the rotational closure, the original automaton should accept 푥푦. 
How would you run the original automaton to see whether the string is a rearranged input 
of something the original automaton would have accepted? 
If only you could see 푥 in advance, you would know what state you get to after running 
푥! Then you could start there, run 푦, then run 푥, and see if you get back where you started. 
But you have to pretend to be a finite automaton, so you can’t see 푥 first. 
The magic of nondeterminism will be helpful here! You could guess all possible starting 
states, and see if any guess results in accept. “Guess and check” is a typical pattern in 
nondeterminism. 
Problem 3 (1.45, 퐴/퐵 is regular, where 퐴 is regular and 퐵 is any): We get 퐴/퐵 as follows: 
start with 퐴 and remove all the endings that can be in 퐵. In other words, 퐴/퐵 consists of 
all strings such that if you append some member of 퐵, you get a member of 퐴. 
Note you don’t necessarily have a finite automaton for 퐵 because 퐵 is not necessarily 
regular! This might be surprising. Think about how you would simulate a machine for 퐴/퐵. 
If a string leads to one of the original accepting states, you might want to accept it early. You 
don’t want to see the rest of the string if the rest of the string is in 퐵. 
Looked at the right way, the solution is transparent and short. 
Again, think of what you would do if you were given the input and wanted to test if it 
was in the language. 
Problem 4 (1.46d): When you’re using the pumping lemma, you have to be very careful. 
The language you’re supposed to work with consists of strings 푤푡푤 where |푤|, |푡| ≥ 1. For 
example, 0001000 is in the language, because we can let 푤 = 000 and 푡 = 1, so that 0001000 = 푤푡푤. 
If we add another 0 to the front, it’s tempting to say we’re now out of the language. But 
we’re still in the language, because we can write 푤 = 000 and 푡 = 01, so that 00001000 = 푤푡푤. 
You don’t get to say what 푤 and 푡 are. As long as there is some way of choosing 푤 and 푡, 
it’s in the language. 
S1 Context-Free Grammars 
We now talk about more powerful ways of describing languages than finite automata: context-free 
grammars and pushdown automata. Context free grammars and pushdown automata 
have practical applications: we can use them to design controllers, and we can use them to 
describe languages, both natural languages and programming languages. 
1.1 Example 
We introduce context-free grammars with an example. 
A context-free grammar has variables, terminals, and rules (or productions). 
푆 → 0푆1 
푆 → 푅 
푅 → 휀 
In the above, the three statements are rules, 푅 is a variable, and the 1 at the end of 
0푆1 is a terminal. The symbols on the left-hand side are variables. The symbols that only 
appear on the right hand side are called terminals. 
We use a grammar to generate a language as follows. Start out with the symbol on the 
LHS of the topmost rule, 푆 here. The rules represent possibilities for substitution. Look for 
a variable in our current expression that appears on the LHS of a rule, substitute it with the 
RHS. For instance, in the following we replace each bold string by the string that is in blue 
in the next step. 
S 
0S1 
00S11 
00R11 
00휀11 
0011. 
When we have a string with only terminal symbols, we declare that string to be in the 
langage of 퐺. So here 
0011 ∈ 퐿(퐺). 
Problem 4.1: What is the language of 퐺? 
We can repeat the first rule until we get tired, and then terminate by the 2nd and 3rd 
rules. We find that 
퐿(퐺) = {0푘1푘 : 푘 ≥ 0}. 
The typical shorthand combines all rules that have the same left hand side into a single line, 
using the symbol | to mean “or.” So we can rewrite our example as 
푆 → 0푆1 | 푅 
푅 → 휀. 
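Mimicking derivations mechanically: the sketch below stores this grammar as a Python dictionary and repeatedly substitutes the leftmost variable, collecting every terminal string up to a length bound. It is a brute-force illustration of what “derives” means, not an efficient algorithm.

```python
from collections import deque

grammar = {'S': ['0S1', 'R'], 'R': ['']}        # S → 0S1 | R,  R → ε

def generate(grammar, start='S', max_len=6):
    """All terminal strings of length <= max_len derivable from `start`."""
    def terminal_length(u):                      # count only the terminal symbols
        return sum(1 for c in u if c not in grammar)

    results, queue, seen = set(), deque([start]), {start}
    while queue:
        u = queue.popleft()
        positions = [i for i, c in enumerate(u) if c in grammar]
        if not positions:                        # no variables left: u ∈ L(G)
            results.add(u)
            continue
        i = positions[0]                         # substitute the leftmost variable
        for rhs in grammar[u[i]]:
            v = u[:i] + rhs + u[i + 1:]
            if terminal_length(v) <= max_len and v not in seen:
                seen.add(v)
                queue.append(v)
    return results

print(sorted(generate(grammar), key=len))        # ['', '01', '0011', '000111']
```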
Example 4.1: Define 퐺2 to be the grammar 
퐸 → 퐸 + 푇|푇 
푇 → 푇 × 퐹|퐹 
퐹 → (퐸)|푎. 
The variables are 푉 = {퐸, 푇, 퐹}; the terminals are Σ = {푎, +, ×, (, )}. 
(We think of these as symbols.) This grammar represents arithmetical expressions in 푎 using 
+, ×, and parentheses; for instance, (푎 + 푎) × 푎 ∈ 퐿(퐺2). 
This might appear as part of a larger grammar of a programming language. 
Here is the parse tree for (푎 + 푎) × 푎. 
[Parse tree for (푎 + 푎) × 푎: the root 퐸 derives 푇, which derives 푇 × 퐹; the left 푇 derives 퐹 → (퐸), with the inner 퐸 deriving 퐸 + 푇 and then 푎 + 푎; the rightmost 퐹 derives 푎.] 
A derivation is a list of steps in linear form: when 푢, 푣 ∈ (푉 ∪ Σ)*, we write 푢 =⇒ 푣 if 
we get to 푣 from 푢 in one substitution. For instance we write 퐹 × 퐹 =⇒ (퐸) × 퐹. 
We write 푢 =⇒* 푣 if we can get from 푢 to 푣 in 0, 1, or more substitution steps. 
1.2 Formal definition 
We now give a formal definition, just like we did for a finite automaton. 
Definition 4.2: A context-free grammar (CFG) is a tuple 퐺 = (푉, Σ, 푆, 푅) where 
∙ 푉 is the set of variables, 
∙ Σ is the set of terminals, 
∙ 푆 ∈ 푉 is the start variable, and 
∙ 푅 is the set of rules, in the form 
variable → string of variables and terminals. 
We say 푆 derives 푤 if we can repeatedly make substitutions according to the rules to get 
from 푆 to 푤. We write a derivation as 
푆 =⇒ 푢1 =⇒ 푢2 =⇒ · · · =⇒ 푢ℓ =⇒ 푤, written 푆 =⇒* 푤 for short. 
(푤 only has terminals, but the other strings have variables too.) We say that 퐺 recognizes 
the language 
퐿(퐺) = {푤 ∈ Σ* : 푆 =⇒* 푤}. 
There is a natural correspondence between a derivation and a parse tree. A parse tree may 
be more relevant to particular applications. 
Note 푎 + 푎 × 푎 ∈ 퐿(퐺2). Take a look back at the parse tree for 푎 + 푎 × 푎. Reading it 
from the bottom up, the parse tree first groups 푎 × 푎 into a subtree, and then puts in the +. 
There is no way to put the + first, unless we put in parentheses. 
This is important in a programming language! Sometimes we can have multiple parse 
trees for the same string—an undesirable feature in general. That means we have two 
different interpretations for a particular string, that can give rise to two different semantic 
meanings. In a programming language, we do not want two different meanings for the same 
expression. 
Definition 4.3: A string is derived ambiguously if it has two different parse trees. A 
grammar or language is ambiguous if some string can be derived ambiguously. 
We won’t discuss this further, but look at the section in the book for more. 
1.3 Why care? 
To describe a programming language in a formal way, we can write it down in terms of a 
grammar. We can specify the whole syntax of any programming language with context-free 
grammars. If we understand grammars well enough, we can generate a parser—the part 
of a compiler which will take the grammar representing the programming language, process a program, 
and group the pieces of code into recognizable expressions. The parser would then feed the 
expressions into another device. 
The key point is that we need to write down a grammar that represents the programming 
language. 
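As a tiny illustration of the parser idea, here is a hypothetical recursive-descent recognizer for the arithmetic grammar 퐸, 푇, 퐹 above, with '*' standing in for ×. Recursive descent cannot use the left-recursive rules directly, so the sketch uses the standard equivalent forms 퐸 → 푇 ('+' 푇)* and 푇 → 퐹 ('*' 퐹)*; this is an assumption-laden sketch, not code from the course.

```python
def recognize(s):
    """Recognize the language of E → E+T | T,  T → T×F | F,  F → (E) | a."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def eat(c):
        nonlocal pos
        if peek() != c:
            raise SyntaxError(f"expected {c!r} at position {pos}")
        pos += 1

    def E():                      # E → T ('+' T)*
        T()
        while peek() == '+':
            eat('+'); T()

    def T():                      # T → F ('*' F)*
        F()
        while peek() == '*':
            eat('*'); F()

    def F():                      # F → (E) | a
        if peek() == '(':
            eat('('); E(); eat(')')
        else:
            eat('a')

    try:
        E()
        return pos == len(s)      # the whole input must be consumed
    except SyntaxError:
        return False

assert recognize('(a+a)*a') and recognize('a+a*a') and not recognize('a+')
```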
Context-free grammars had their origin in the study of natural languages. For instance, 
푆 might represent a sentence, and we may have rules such as 
푆 → (noun phrase) (verb phrase) , 
(verb) → (adverb) (verb) , 
(noun) → (adjective) (noun) , 
and so forth. We can gain insight into the way a language works by specifying it in this fashion. 
This is a gross oversimplification, but both the study of programming languages and of natural 
languages benefit from the study of grammars. 
We’re going to shift gears now, and then put everything together in the next lecture. 
S2 Pushdown automata 
Recall that we had 2 different ways of describing regular languages, using a 
∙ computational device, a finite automaton, which recognizes members of regular languages 
when it runs. 
∙ descriptive device, a regular expression, which generates members of regular languages. 
We found that finite automata and regular expressions recognize the same class of languages 
(Theorem 2.2). A CFG is a descriptive device, like a regular expression. We will find a 
computational device that recognizes the same languages as CFG’s. First, a definition. 
Definition 4.4: A context-free language (CFL) is one generated by a CFG. 
We’ve already observed that there is a CFL that is not regular: we found a CFG generating 
the language {0푘1푘}, which is not regular. We will show in fact that the CFL’s include 
all regular languages. More on this later. 
S3 Comparing pushdown and finite automata 
We now introduce a computational device that recognizes exactly the context-free languages: 
a pushdown automaton (PDA). A pushdown automaton is like a finite automaton with an 
extra feature called a stack. 
In a finite automaton, we have a finite control, i.e. different states with rules of how 
to transition between them. We draw a schematic version of a finite automaton, as above. 
A head starts at the beginning of the input string, and at each step, it moves to the next 
symbol to the right. 
A pushdown automaton has an extra feature. It is allowed to write symbols on the stack, 
not just read symbols. 
However, there are some limitations. A pushdown automaton can only look at the topmost 
symbol of a stack. When it writes a symbol to the stack, what’s presently there gets pushed 
down, like a stack of plates in a cafeteria. When reading, the reverse happens. In one step 
the automaton can only pop off the topmost symbol; then the remaining symbols all move 
back up. We use the following terminology: 
∙ push means “add to stack,” and 
∙ pop means “read and remove from stack.” 
When we looked at FA’s, we considered deterministic and nondeterministic variants. For 
PDA’s, we’ll only consider the nondeterministic variant. A deterministic version has been 
studied, but in the case of pushdown automata they are not equivalent. Some languages 
require nondeterministic PDA’s. 
Deterministic pushdown automata have practical applications to programming languages, 
because the process of testing whether a language is valid is especially efficient if the PDA 
is deterministic. This is covered in the 3rd edition of the textbook. 
Let’s give an example. 
Example 4.5: We give a PDA for 퐴 = {0푘1푘 : 푘 ≥ 0}. 
As we’ve said, a PDA is a device that looks like a FA but also has a stack it can write on. 
Our PDA is supposed to test whether a string is in 퐴. 
If we used an ordinary FA, without a stack, then we’re out of luck. Intuitively, a FA 
has finite memory, and we can’t do this language with finite memory. The stack in a PDA, 
however, is just enough to allow us to “remember stuff.” 
Problem 4.2: How would we design a PDA that recognizes 퐴? (How would you use 
the stack?) 
We can use the stack to record information. The idea is that every time we read a 0, 
stick a 0 in; every time we read a 1, pop a 0 out. If the stack becomes empty exactly when 
we reach the end of the input (and not before), then we accept. The 0’s match off with the 1’s that come later. 
We have to modify this idea a little bit, because what if the 0’s and 1’s are out of order? 
We don’t want to accept strings where the 0’s and 1’s are out of order. If we insist that 0’s 
come before 1’s, we need a finite control mechanism. 
We have a state for reading 0’s and another state when reading 1’s. In the “1” state 
the PDA no longer takes 0’s and adds them to the stack. We see that a PDA combines the 
elements of FA with the power of a stack. 
Now we ask: how do we know when to transition from reading 0’s to reading 1’s? We’d 
like to consider different possibilities for when to transition, i.e. let several parallel threads 
operate independently, and if any thread gets to an accept state, then have the machine 
accepts the input. Hence we turn to nondeterminism: every time there’s a choice, the 
machine splits into different machines which operate independently, each on its own stack. 
At every step when the machine is reading 0’s, we give it a nondeterministic choice: in 
the next step the machine can continue to push 0’s on the stack, or transition into reading 
1’s and popping 0’s off the stack. 
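Here is a sketch of this PDA's behavior in Python. The only nondeterministic choice is when to switch from the "reading 0's" state to the "reading 1's" state, so the simulation simply tries every possible switch point and runs the stack discipline on each guess; it is an illustration of the construction above, not code from the notes.

```python
def pda_accepts(w):
    """PDA for {0^k 1^k}: push a 0 for each 0 read, pop one for each 1 read.
    Nondeterministically guess when to switch states; accept if some guess
    reads the whole input and ends with an empty stack."""
    for switch in range(len(w) + 1):        # guess the point where the 1's begin
        stack, ok = [], True
        for i, c in enumerate(w):
            if i < switch:                  # state "reading 0's": push
                if c == '0':
                    stack.append('0')
                else:
                    ok = False
                    break
            else:                           # state "reading 1's": pop a matching 0
                if c == '1' and stack:
                    stack.pop()
                else:
                    ok = False
                    break
        if ok and not stack:                # end of input with an empty stack
            return True
    return False

assert pda_accepts('') and pda_accepts('0011')
assert not pda_accepts('0101') and not pda_accepts('001')
```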
3.1 Formal definition 
Let’s write down the formal definition. 
Definition 4.6: A pushdown automaton (PDA) is a 6-tuple 푃 = (푄,Σ, Γ, 훿, 푞0, 퐹) 
where 
∙ 푄 is the set of states, 
∙ Σ is the input alphabet, 
∙ Γ is the stack alphabet, 
∙ 훿 is the transition function, 
∙ 푞0 is the start state, and 
∙ 퐹 is the set of accept states. 
Here 푄, Σ, 푞0, and 퐹 are as in a finite automata, but the transition function is a function 
훿 : 푄 × Σ휀 × Γ휀 → 풫(푄 × Γ휀) (we explain this below). 
On first thought, we may think to define the transition function as a function 
훿 : 푄 × Σ × Γ → 푄 × Γ. 
The function takes as input 
∙ a state in 푄—the current state of the machine, 
∙ a symbol from Σ—the next symbol to read, and 
∙ a symbol from Γ—the top-of-stack symbol. 
It outputs another state in 푄 to transition to, and a symbol from Γ—the next symbol to 
push on the stack. 
However, we have to modify this: we want nondeterminism, so we allow the machine to 
transition to an entire set of possible next states and next symbols, and we represent this by 
having 훿 output a subset: 
훿 : 푄 × Σ × Γ → 풫(푄 × Γ). 
We also allow 훿 to read an empty string, or read without popping a string on the stack, and 
proceed without writing a symbol, so we actually want 
훿 : 푄 × Σ휀 × Γ휀 → 풫(푄 × Γ휀), where Σ휀 = Σ ∪ {휀} and Γ휀 = Γ ∪ {휀}. 
We’ll do one more example and save proofs to next time. 
Example 4.7: Consider the language 
{푤푤ℛ : 푤 ∈ {0, 1}*}, 
where ℛ means “reverse the word.” This is the language of even-length palindromes such as 
0110110110. A PDA recognizing this language uses nondeterminism in an essential way. We 
give a sketch of how to construct a PDA to recognize it. (See the book for details.) 
The PDA has to answer: does the 2nd half of the word match the first half? 
We should push the first half of the word on the stack. When we pop it off, the string 
comes out backwards, and we can match it with the second half of the word. This is exactly 
what we need. 
But how do we know we’re at the middle? When do you shift from pushing to popping 
and matching? 
Can we find the length of the word? No. Instead, we guess nondeterministically at every 
point that we’re at the middle! If the word is a palindrome, one of the threads will guess 
correctly the middle, and the machine will accept. 
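Again, the nondeterministic guess of the middle can be simulated by trying every possibility. A sketch that uses the stack exactly as the PDA does — push the guessed first half, then pop while matching the second half:

```python
def is_even_palindrome(w):
    """Membership in {w w^R}: guess the middle, push the first half, pop and match."""
    for mid in range(len(w) + 1):            # guess where the middle is
        stack = list(w[:mid])                # push the first half onto the stack
        rest = w[mid:]
        if len(rest) != len(stack):
            continue                         # this guess cannot work
        if all(stack.pop() == c for c in rest):   # popping reverses the first half
            return True
    return False

assert is_even_palindrome('0110110110')
assert not is_even_palindrome('0110110')     # odd length: no middle can work
```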
Lecture 5 
Thu. 9/20/12 
Problem set 2 is out. 
Last time we talked about CFG’s, CFL’s, and PDA’s. Today we will talk about 
∙ CFG→PDA, 
∙ non-CFL’s 
∙ Turing machines 
Recall what nondeterminism means: every time there are multiple possibilities, the whole 
machine splits into independent parts. As long as one thread lands in an accept state, then 
we accept. Nondeterminism is a kind of guessing and checking that the guess is correct. 
When we define the model for a PDA, the PDA can pop something from the stack. There 
is no hardware (bulit-in function) to test if the stack is empty, but we can use “software” 
(i.e. clever programming) to test if the stack is empty: to start off, write a $, and when 
the machine sees $, it knows that the stack is empty. Thus we can allow any PDA to test 
whether the stack is empty. We’ll use this in many of our examples. 
To jog your memory, a CFG is made up of a set of rules like the following: 
퐸 → 퐸 + 푇|푇 
푇 → 푇 × 퐹|퐹 
퐹 → (퐸)|푎. 
We saw that this CFG recognizes 푎 × 푎 + 푎: we had a derivation of 푎 × 푎 + 푎 given by 
퐸 =⇒ 퐸 + 푇 =⇒ 푇 + 푇 =⇒ 푇 + 퐹 =⇒ · · · =⇒ 푎 × 푎 + 푎. 
S1 CFG’s and PDA’s recognize the same language 
Our main theorem of today is the following. 
Theorem 5.1: 퐴 is a CFL iff some PDA recognizes 퐴. 
In other words, CFG’s and PDA’s have exactly the same computing power; they generate 
the same class of languages. To prove this we’ll simulate one kind of computation with 
another kind of computation. 
Proof. We need to prove 
1. CFG→PDA (we’ll do this today) 
2. PDA→CFG (skip, see the book. This is more technical.) 
Corollary 5.2: 1. Every regular language is a CFL. 
2. The intersection of a context-free language and a regular language is a context-free 
language. 
CFL ∩ regular = CFL. 
Proof. 1. A finite automaton is a pushdown automaton that just doesn’t use the stack. 
2. Omitted. (See Exercise 2.18a in the book—basically just take the states to be the 
product set.) 
Note 2 is weaker than the statement that the intersection of two context-free languages 
is a context-free language, which is not true. 
Proposition 5.3: The class of CFL’s is closed under ∪, ∘ (concatenation), and *, but not 
under ∩ or complementation. 
Proof. For ∪, ∘, and *, just give a construction using grammars or pushdown automata. 
S2 Converting CFG→PDA 
Proof sketch. We convert a CFG into a PDA. 
The input is a string that may or may not be in the language; our PDA has to say 
whether it is. 
Recall that the derivation is the sequence of strings we go through to get to a string in 
the language. We use nondeterminism to guess the derivation. 
We first write down the start variable on the top of the stack. Take whatever string is 
written down on the stack to be the current working string. Take one of the variables on the 
stack and in the next step, replace it using one of the rules. There may be several possible 
steps; we consider all possibilities using nondeterminism. 
For instance, we’d want the machine to operate as follows (the figure showing the successive stack contents is omitted here). 
At the end we pop the symbols and compare them with the input string, and then test the 
stack for emptiness. 
However, there’s a problem: what if we want to replace some symbol not at the top? 
The idea is the following: if the top of the stack has a terminal symbol (which can’t be 
replaced by anything), let’s match it against the next symbol in the input word immediately. 
Whenever we have a terminal symbol at the top of the stack, we pop and compare until a 
variable (such as 퐹) is at the top. Sooner or later, we’ll have a variable at the top, and then 
we can try one of the substitutions. 
See the textbook for details. 
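Here is a rough Python sketch of this idea (ours, with a hypothetical toy grammar for {푎푘푏푘 : 푘 ≥ 0}). The stack holds the part of the working string not yet matched against the input: a variable on top gets replaced by one of its right-hand sides, a terminal on top gets matched against the next input symbol. The depth cutoff is only there because this depth-first simulation of nondeterminism could otherwise run forever on some grammars; the PDA itself merely has some nonterminating threads, which is harmless.

GRAMMAR = {"S": ["aSb", ""]}        # hypothetical grammar for { a^k b^k : k >= 0 }

def cfg_pda_accepts(s: str, start: str = "S", fuel: int = 10000) -> bool:
    def run(i, stack, fuel):
        if fuel == 0:
            return False                      # cut off a runaway branch
        if not stack:
            return i == len(s)                # empty stack: accept iff the input is used up
        top, rest = stack[0], stack[1:]
        if top in GRAMMAR:                    # variable on top: try every rule
            return any(run(i, rule + rest, fuel - 1) for rule in GRAMMAR[top])
        # terminal on top: must match the next input symbol
        return i < len(s) and s[i] == top and run(i + 1, rest, fuel - 1)
    return run(0, start, fuel)

# cfg_pda_accepts("aabb") is True; cfg_pda_accepts("aab") is False.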
S3 Non-CFLs 
CFG’s are powerful but there are still many languages they can’t recognize! 
We will show the language {푎푘푏푘푐푘 : 푘 ≥ 0} is not a CFL. Note by contrast that {푎푘푏푘 : 푘 ≥ 0} is a CFL (Example 4.5). 
An intuitive argument is the following: we can push the 푎’s, compare with the 푏’s by 
popping the 푎’s, but when we get to the 푐’s we’re out of luck: the 푎’s were all popped off, 
and the system has not remembered any information about the 푎’s. However, as we’ve said, 
we have to be careful with any argument that says “I can’t think of a way; thus it can’t be 
done.” How do you know the machine doesn’t proceed in some other tricky way? 
By contrast, if we look at the strings 푎푘푏푙푐푚 where either the number of 푎’s equals the 
number of 푏’s, or the number of 푎’s equals the number of 푐’s, this can be done with a pushdown 
automaton. (Use nondeterminism; left as an exercise.) 
We’ll give a technique to show that a language is not a CFL, a pumping lemma in the 
spirit of the pumping lemma for regular languages, changed to make it apply to CFL’s. 
Our notion of pumping is different. It is the same general notion: all long strings can be 
“pumped” up and stay in the language. However, we’ll have to cut our string into 5 rather 
than 3 parts. 
Lemma 5.4 (Pumping lemma for CFL’s): For a CFL 퐴, there is a pumping 
length 푝 where if 푠 ∈ 퐴 and |푠| ≥ 푝, then 푠 can be broken up into 푠 = 푢푣푥푦푧 such that 
1. 푢푣^푖푥푦^푖푧 ∈ 퐴 for all 푖 ≥ 0. (We have to pump 푣 and 푦 by the same amount.) The picture 
is as follows. 
S = u v x y z 
2. 푣푦 ≠ 휀. (We can’t break it up so that the second and fourth strings are both empty, because 
in this case we won’t be saying anything!) 
3. |푣푥푦| ≤ 푝 
Example 5.5: Let’s show {푎푘푏푘 : 푘 ≥ 0} satisfies the pumping lemma. For instance, we can 
let 푠 = 푎푎푎푎 · 푎 · 푎푏 · 푏 · 푏푏푏푏, with 푢 = 푎푎푎푎, 푣 = 푎, 푥 = 푎푏, 푦 = 푏, 푧 = 푏푏푏푏. 
Example 5.6: If {푎푘푏푘푐푘 : 푘 ≥ 0} were a CFL, it would satisfy the pumping lemma. We 
show this is not true, so it is not a CFL. Again, this is a proof by contradiction. 
Suppose {푎푘푏푘푐푘 : 푘 ≥ 0} satisfies the conclusion of the pumping lemma. Take the string 
푠 = 푎^푝푏^푝푐^푝 (푝 copies of 푎, then 푝 copies of 푏, then 푝 copies of 푐) 
and let 푢, 푣, 푥, 푦, 푧 satisfy the conclusions of the pumping lemma. First note that 푣 can 
only have one kind of symbol, otherwise when we pump we would have letters out of order 
(instead of all 푎’s before 푏’s and all 푏’s before 푐’s), and the same is true of 푦. Thus when we 
pump up 푣 and 푦, the count of at most 2 symbols will increase (rather than all 3 symbols), 
and we will not have an equal number of 푎’s, 푏’s, and 푐’s. 
Thus {푎푘푏푘푐푘 : 푘 ≥ 0} fails the pumping lemma, and hence is not context-free. 
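To see the argument concretely, the following Python check (ours, not from the lecture) takes 푠 = 푎^4푏^4푐^4, tries every split 푠 = 푢푣푥푦푧 with |푣푥푦| ≤ 4 and 푣푦 ≠ 휀, and confirms that pumping with 푖 = 0, 1, 2 always produces a string outside the language.

from itertools import combinations_with_replacement

def in_lang(s):                      # membership in { a^k b^k c^k : k >= 0 }
    k = len(s) // 3
    return s == "a" * k + "b" * k + "c" * k

def pumpable(s, p):
    # Is there a split s = uvxyz with |vxy| <= p and vy != '' such that
    # u v^i x y^i z stays in the language for i = 0, 1, 2?
    n = len(s)
    for a, b, c, d in combinations_with_replacement(range(n + 1), 4):
        u, v, x, y, z = s[:a], s[a:b], s[b:c], s[c:d], s[d:]
        if len(v) + len(x) + len(y) <= p and v + y:
            if all(in_lang(u + v * i + x + y * i + z) for i in (0, 1, 2)):
                return True
    return False

# pumpable("a"*4 + "b"*4 + "c"*4, 4) is False, matching the argument above.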
Proof of Pumping Lemma 5.4. We’ll sketch the higher-level idea. 
Qualitatively, the pumping lemma says that every long enough string can be pumped 
and stay in the language. 
Let 푠 be a really long string. We’ll figure out what “really long” means later. 
Let’s look at the parse tree; suppose 푇 is the start variable. 
What do we know about the parse tree? It’s really tall, because 푠 is long, and a short 
parse tree can’t generate a really wide tree (which corresponds to a long string). 
More precisely, the amount of “fan-out” is determined by the size of the longest right-hand 
string in the grammar. We determine what “long” and “tall” mean after we look at the 
grammar. 
What does it mean when we have a really tall parse tree? A tall tree has a long path 
from the root to a leaf, with lots of nodes—so many nodes that we have to repeat one of the 
variables, say 푅. Let 푢, 푣, 푥, 푦, 푧 be as follows. 
Look at the subtree that comes from the lower and upper instances of the repeated 
variable. Now let’s make a “surgery”: take the subtree under the higher 푅 and stick it in 
place of the lower subtree. We now get another valid parse tree for 푢푣푣푥푦푦푧. We can repeat 
this as many times as we’d like. 
We get the 푖 = 0 case by sticking the lower tree on the upper 푅. 
There are some details to get the conditions to work. 
∙ How do we know that 푣 and 푦 are not both empty? If they are, we’ve shown nothing. 
Let’s start off with the parse tree with the fewest nodes. If 푣 and 푦 were both empty, 
then when we stick the lower 푅-subtree higher up as in the last picture above, we get 
fewer nodes, contradicting our minimality assumption. Hence 푣 and 푦 can’t both be 
empty; this gives condition 2. 
∙ Let’s figure out 푝. Let 푏 be the size of the largest right-hand side of a rule. We want 
the tallness to be at least |푉 | + 1 (|푉 | is the number of variables.) At each level, the 
number of nodes multiplies by at most 푏. If we set 푝 = 푏^(|푉|+1), then the tree would have 
at least |푉 | + 1 levels, so one of the symbols would repeat, as needed. 
∙ To satisfy item 3 we take the lowest repetition of a symbol, so that there can be no 
repetitions below. This will give the bound |푣푥푦| ≤ 푝. 
S4 Turing machines 
Everything we’ve done so far is a warm-up. We’ve given two models of computations that 
are deficient in a certain sense because they can’t even do what we think computers can do, 
such as test whether a string is of the form 푎푘푏푘푐푘. 
A Turing machine is vastly more powerful; it is a much closer model to what we think 
about when we think about a general-purpose computer. 
The input tape of a Turing machine combines the features of both the input and stack. 
It is a place where we can both read and write. 
∙ We can read and write on the tape. 
This is the key difference. The following are other differences. 
∙ We are able to both move the tape forward and back, so we can read what we wrote 
before. (It’s a two way head.) 
∙ The tape is infinite to the right. At the beginning, it is filled with a finite string, and 
the rest of the tape is filled with special symbols called blanks. The head starts on the 
leftmost tape cell. 
∙ The machine accepts by entering an “accept” state anywhere. (It no longer makes 
sense to say the Turing machine accepts only at the end of the string—it might have 
erased or changed the last symbol!) 
∙ There is a “reject” state; if the machine visits that state, stop and reject (reject by 
halting). 
∙ A Turing machine can also reject by entering an infinite loop (“looping”).2 
For the time being we’ll just allow the deterministic variant. 
Example 5.7: We outline how to build a Turing machine that recognizes {푎푛푏푛푐푛}. 
Let’s assume we can test when we’re at the beginning. We go through the string and 
cross out the first 푎, 푏, and 푐 that appear. 
If we find letters that are out of order, we reject. Otherwise we go back to the beginning 
and continue to cross off symbols 푎, 푏, and 푐 one at a time. If we cross out the last 푎, 푏, and 
푐 on the same run, then accept. 
When we cross a symbol off, write the symbol 푥 to remember that we crossed out something 
there. 
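Here is the crossing-off procedure written as ordinary Python over a list standing in for the tape (our sketch of the idea, not a literal 7-tuple machine):

def crosses_off_to_accept(s: str) -> bool:
    tape = list(s)
    if any(ch not in "abc" for ch in tape):
        return False
    if "".join(sorted(tape)) != s:        # letters out of order: reject
        return False
    while "a" in tape or "b" in tape or "c" in tape:
        for ch in "abc":                  # one pass: cross off one a, one b, one c
            if ch in tape:
                tape[tape.index(ch)] = "x"
            else:
                return False              # one symbol ran out first: counts differ
    return True

# crosses_off_to_accept("aabbcc") is True; crosses_off_to_accept("aabbc") is False.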
We’ll write down the formal definition next time. Our transition function will depend 
on both the state and the tape symbol. 
2How does the Turing machine know it has entered an infinite loop? Mathematically being able to define 
when the machine rejects is different from what we can tell from the machine’s operation. We’ll talk more 
about this later. 
Lecture 6 
Tue. 9/25/12 
Last time we talked about 
∙ CFG↔PDA (we only proved →) 
∙ Pumping lemma for CFL’s 
∙ Turing machines 
Turing machines are an important model for us because they capture what we think of when 
we think of a general-purpose computer, without constraints like only having a stack memory 
or a finite memory. Instead, it has an unlimited memory. 
Today we’ll talk about 
∙ Turing machine variants 
– Multitape 
– Nondeterministic 
– Enumerators 
∙ Church-Turing thesis 
In the first half of today’s lecture, we’ll develop some intuition for Turing machines. We will 
prove that several variations of Turing machines are actually all equivalent. As we’ll see, 
this has philosophically important implications. 
S1 Turing machines 
We now give a formal definition of a Turing machine. 
Definition 6.1: A Turing machine (TM) is a 7-tuple 
푀 = (푄,Σ, Γ, 훿, 푞0, 푞acc, 푞rej) 
where 
∙ 푄 is the set of states, 
∙ Σ is the input alphabet, 
∙ Γ is the tape alphabet, 
∙ 훿 is a function 푄 × Γ → 푄 × Γ × {퐿,푅}. Here 퐿 or 푅 denote the movement of the 
head. 
∙ 푞0 is the start state, 
∙ 푞acc is the accept state, and 
∙ 푞rej is the reject state. 
If the machine tries to move off the left extreme of the tape, the machine instead just stays 
where it is.3 
A Turing machine may halt (accept or reject) or loop (reject). If the machine loops we 
say the machine rejects, but think of it as rejecting after “infinite time”; we don’t know at 
any finite time that the machine has rejected. 
Definition 6.2: Let 
퐿(푀) = {푤 : 푀 on input 푤 accepts} . 
If 퐴 = 퐿(푀) for some Turing Machine 푀, we say that 퐴 is Turing-recognizable (also 
called recursively enumerable). 
An important subclass of Turing Machines are those that always halt. 
Definition 6.3: A TM 푀 is a decider if 푀 halts on every input. If 퐴 = 퐿(푀) for some 
decider, we say 퐴 is decidable. 
Turing Machines which reject by halting are more desirable than those that reject by 
looping. 
We just introduced 2 new classes of languages, Turing-recognizable languages and decidable 
languages. We have 
CFL’s ⊂ decidable ⊂ T-recognizable 
where the inclusions are proper (we’ll show the right-hand inclusion is proper). We need to 
show containment in the left-hand side and nonequality in the RHS. 
1.1 A brief history of Turing machines 
Why are Turing machines so important, and why do we use them as a model for a general-purpose 
computer? 
The concept of a Turing machine dates back to the 1930s. It was one of a number of 
different models of computation that tried to capture effective computability, or algorithm, 
as we would now say. Other researchers came up with other models to capture computation; 
for example, Alonzo Church developed lambda calculus. 
3Other treatments do different things. However, minor details don’t make any difference. We get the 
same computing power, and the same class of languages is recognized. The model of Turing machine is 
robust; it’s not sensitive to little details. 
It wasn’t obvious that these different models are equivalent, i.e., that they captured the 
same class of computations. However, they did. 
Nowadays we have programming languages. Can today’s more “advanced” programming 
languages (Python) do more than a FORTRAN program? It has a lot of new features 
compared to old boring “do” loops. It is conceivable that as we add more constructs to a 
programming language, it becomes more powerful, in the sense of computing functions and 
recognizing languages. 
However, anything you can do with one language you can do with another. (It might 
simply be easier to program in one than another, or one might run faster.) How can we show 
we can do the same thing with Python as FORTRAN? We can convert python programs 
into FORTRAN, or convert FORTRAN programs to python. We can simulate one language 
with the other. This “proves” that they have the same computational power. 
That’s what the researchers of computation theory did. They gave ways to simulate 
Turing machines by lambda calculus, lambda calculus by Turing machines, as well as different 
variations of these models. 
They found that all these models were doing the same thing! 
We’ll see this for ourselves, as we show that several variations of Turing machines all 
have the same computational power. 
1.2 Multitape Turing machines 
Definition 6.4: A multitape Turing machine is a Turing machine with multiple tapes. 
The input is written on the first tape. 
The transition function can now look at all the symbols under each of the heads, and 
write and move on each tape. 
We could define all this rigorously, if we wanted to. 
Theorem 6.5: 퐴 is Turing-recognizable iff some multitape TM recognizes 
퐴. 
In other words, Turing-recognizability with respect to one-tape Turing machines is the 
same as Turing-recognizability with respect to multi-tape Turing machines. 
Proof. If 퐴 is Turing recognizable, then clearly a multitape TM recognizes 퐴, because a 
single-tape TM is a multitape TM. 
Suppose we have a language recognizable with a multitape TM. We need something like 
a compiler to convert a multitape TM to a one-tape TM, so that we can use a one-tape TM 
to simulate a multi-tape TM. 
Let 푀 be a multitape TM. 푀 can do stuff with primitive operations that a single-tape 
TM 푆 can’t do. It can write to a 2nd tape, which 푆 doesn’t have! We need to use some data 
structure on 푆’s tape to represent what appears on the multiple tapes of 푀. 
푆 initially formats its tape by writing separators # after the input string, one for each 
tape that 푀 has. The string between two separators will represent the string on one of 푀’s 
tapes. 
Next 푆 moves into simulation phase. Every time 푀 does one step, 푆 simulates it with 
many steps. (This is just “programming,” in single-tape TM code.) 푆 has to remember 
where the heads are on the multiple tapes of 푀. We enhance the alphabet on 푆 to have 
symbols with dots on them, like ˙푎, to represent the positions of the heads. 푆 updates the locations 
of these markers to indicate the locations of the heads. 
(Figure 3.14 from the textbook.) 
There are details. For example, suppose 푀 decides to move a head onto an initially blank 
part of its tape. 푆 has only allocated a finite amount of space to each tape! 푆 has to go into an “interrupt” phase 
and move everything down one symbol, before carrying on. 
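As a toy illustration of the data structure (ours, not a Turing machine program), here is how 푆’s single tape might represent 푀’s tapes, with ‘#’ separators and the symbol under each head marked:

def encode_tapes(tapes, heads):
    # tapes: list of strings (one per tape of M); heads: head position on each tape.
    pieces = []
    for tape, h in zip(tapes, heads):
        cells = list(tape) if tape else ["_"]      # '_' stands in for the blank symbol
        cells[h] = "(" + cells[h] + ")"            # "dot" the symbol under the head
        pieces.append("".join(cells))
    return "#" + "#".join(pieces) + "#"

# encode_tapes(["0101", "ab"], [2, 0]) == "#01(0)1#(a)b#"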
A lot of models for computation turn out to be equivalent (especially variants of 
Turing machines). To show they are equivalent, give a way to simulate one model 
with the other. 
The same proof carries through for deciders: A language is decidable by a multitape TM 
iff it is decidable by a single-tape TM. 
Let’s look at another example, similar but important for us down the road. 
1.3 Nondeterministic TM 
Definition 6.6: A nondeterministic Turing machine (NTM) is like a Turing machine 
except that the transition function now allows several possibilities at each step, i.e., it is a 
function 
훿 : 푄 × Γ → 풫(푄 × Γ × {퐿,푅}). 
If any thread of the computation accepts, then we say the Turing machine accepts. (Accepting 
overrules rejecting.) 
We say that a nondeterministic TM is a decider if every branch halts on every input. 
Theorem 6.7: 퐴 is Turing-recognizable iff some NTM recognizes 퐴. 
As we will see in the second half of the course, nondeterministic TM’s are very important. 
For our purposes now, they have the same power as deterministic TM’s, because they 
recognize the same class of languages. 
Proof. Any deterministic Turing machine is a NTM, so this direction is obvious. 
We want to convert a NTM 푁 to a DTM 푀. 푀 is supposed to accept exactly the same 
input 푁 accepts, by simulating 푁. How does this work? This is trickier. 
푁 may have made a nondeterministic move, resulting in 2 or more options. 푀 doesn’t 
know which to follow. If there are multiple ways to go, then take that piece of tape, make 
several copies of the tape separated by #, and carry on the simulation. This is just like the 
proof of Theorem 6.5, except that different segments of tape don’t represent different tapes; they 
represent different threads. 
We have to represent both the head and the state for each of the threads. The number of 
threads may grow in some unbounded way. 푀 can’t keep track of all the different states in 
finite memory, so we had better write them all down. To do this, allow a composite symbol 
푞 over 푎 (a tape symbol 푎 tagged with a state 푞) to mean that, in that thread, the head is on this cell 
and the machine is in state 푞. 푀 proceeds 
by taking a thread, seeing what 푁 would do, and updating that thread. 
One of threads may again fork into multiple possibilities. In that case we have to open 
up room to write down copies of a thread, by moving stuff down. 
푀 goes through each thread and repeats. The only thing we have to take note of is when 
푀 should end up accepting. If 푁 enters an accept state on any thread, then 푀 enters 
its accept state. If 푀 notices some thread of 푁 enters a reject state, then 푀 collapses the 
thread down or marks it as rejecting, so it doesn’t proceed with that thread further, and carries on 
with the other threads. 
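The bookkeeping is easier to see in a short Python sketch (ours). Here `step` is a hypothetical function returning the list of configurations 푁 could move to in one step, and "ACCEPT"/"REJECT" are hypothetical markers for halting threads; the queue plays the role of the list of threads written on 푀’s tape. If some thread runs forever and none accepts, the loop never ends, which is fine for a recognizer.

from collections import deque

def simulate_ntm(start_config, step):
    threads = deque([start_config])
    while threads:
        cfg = threads.popleft()           # pick up one thread
        for nxt in step(cfg):             # all of N's possible next moves
            if nxt == "ACCEPT":
                return True               # some thread accepts: M accepts
            if nxt == "REJECT":
                continue                  # collapse rejecting threads
            threads.append(nxt)
    return False                          # every thread halted and rejected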
Question: When does nondeterminism help in a model of computation? In the second 
half of the course, when we care about how much time computation takes, the big question is 
whether NTM and TM are equivalent. It is not obvious when nondeterminism is equivalent 
to determinism. If we can answer this question for polynomial time TM’s, then we’ve just 
solved a famous problem (P vs. NP). 
Let’s just do one more model, which has a different flavor than what we’ve done, and is 
slightly more interesting. 
1.4 Turing enumerators 
Instead of recognition, can you just list the members of a language? 
Definition 6.8: A Turing enumerator is a Turing machine with a “printer” (output 
device). Start the Turing machine on an empty tape (all blanks). The Turing enumerator 
has a special feature that when it goes into “print” mode, it sends out a marked section of 
the tape to the printer to write out. 
(Figure 3.20 in textbook) 
The strings that are written out by an enumerator 퐸 are considered to be its language: 
퐿(퐸) = {푤 : 퐸 outputs 푤 at some point when started on blanks} . 
If 퐸 halts, then the list is finite. It could also go on forever, so it can enumerate an 
infinite language. 
Again, Turing enumerators capture the same class of languages. 
Theorem 6.9: 퐴 is Turing-recognizable iff 퐴 = 퐿(퐸) for some enumerator 퐸. 
Proof. Here we need to prove both directions. 
(←) Convert 퐸 to an ordinary recognizer 푀. Given 퐸, we construct a TM 푀 with 퐸 
built inside it. 
Have 푀 leave the input string 푤 alone. 푀 moves to the blank portion of the tape and 
runs 퐸. When 퐸 decides to print something out, 푀 takes a look to see if the string is 푤. If 
not, then 푀 keeps simulating 퐸. If the string is 푤, then 푀 accepts. 
Note that if 푀 doesn’t find a match, it may go on forever—this is okay; 푀 rejects by 
looping. We are allowed to take advantage of 푀 being able to go on forever. 
(→) Convert 푀 to enumerator 퐸. The idea is to feed all possible strings to 푀 in some 
reasonable order, for instance, lexicographic order 휀, 0, 1, 00, 01, 10, 11. 
However, we have to be careful. Suppose 푀 is running on 101. If 푀 accepts 101, then 
we print it out. If 푀 halts and rejects 101, then 퐸 should move on to the next string. The 
only problem is when 푀 runs forever. What is 퐸 supposed to do? 퐸 doesn’t know 푀 is 
going on forever! We can’t get hung up running 푀 on 101. We need to check 110 too! The 
solution is to run 푀 for a few steps on any given string, and if it hasn’t halted then move on, 
and come back to it later. 
We share time among all strings where computation hasn’t ended. Run more and more 
strings for longer and longer. More precisely, for 푘 = 1, 2, 3, . . ., 퐸 runs 푀 on the first 푘 
strings for 푘 steps. If 푀 ever accepts some string 푠, then print 푠. 
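In Python this dovetailing looks as follows (our sketch); `run_for_steps(s, k)` is a hypothetical helper that returns True exactly when 푀 accepts 푠 within 푘 steps.

from itertools import count, islice, product

def all_strings():
    # strings over {0,1} in string order: "", "0", "1", "00", "01", ...
    yield ""
    for n in count(1):
        for tup in product("01", repeat=n):
            yield "".join(tup)

def enumerate_language(run_for_steps):
    printed = set()
    for k in count(1):                       # k = 1, 2, 3, ...
        for s in islice(all_strings(), k):   # the first k strings
            if s not in printed and run_for_steps(s, k):
                printed.add(s)
                yield s                      # "print" s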
S2 Philosophy: Church-Turing Thesis 
The Church-Turing Thesis was important in the history of math. After proposing all these 
different models to capture what we can compute, people saw how they were all equivalent 
(in a non-obvious way). 
Axiom 6.10 (Church-Turing Thesis): Our perception of what we can 
do with a computer (an algorithm, an effective procedure) is exactly captured by Turing 
machines. 
Our intuitive notion of “algorithm” is captured precisely by the notion of a “Turing machine.” 
It might seem arbitrary for us to focus on Turing machines, when this is just one model 
of computation. But the Church-Turing Thesis tells us the models are all equivalent! The 
notion of algorithm is a natural, robust notion. This was a major step forward in our 
understanding of what computation is. 
It’s almost saying something about the physical universe: there’s nothing we can build 
in the physical world that is more powerful than a Turing machine. 
David Hilbert gave an address at the International Congress of Mathematicians in 1900. 
He was probably the last mathematician who knew what was going on in every field of 
mathematics at the same time. He knew the big questions in each of those fields. He made a 
list of 23 unsolved problems that he felt were a good challenge for the coming century; they 
are called the Hilbert problems. 
Some of them are solved. Some of them are fuzzy, so it’s not clear whether they are 
solved. Some of them have multiple parts, just like homework. 
One of the questions was about algorithms—Hilbert’s tenth problem, which I’ll describe. 
Suppose we want to solve a polynomial equation 3푥^2 + 17푥 − 22 = 0. This is easily done. 
But suppose we don’t want to know whether a polynomial equation has a root, but whether it has 
a root where the variables are integers. Furthermore, we allow polynomials with several variables. 
This makes things a lot harder. For instance, we could have 
17푥푦^2 + 2푥 − 21푧^5 + 푥푦 + 1 = 0. 
Is there an assignment of integers to 푥, 푦, 푧 such that this equation is satisfied? 
Hilbert asked: Is there a finite procedure which concludes after some finite number of 
steps, that tells us whether a given polynomial has an integer root? 
We can put this in our modern framework. Hilbert didn’t know what a “procedure” was 
in a mathematical sense. These days, this is how we would phrase the question. 
Problem 6.1 (Hilbert’s Tenth Problem): Let 
퐷 = {푝 : 푝 is a multivariable polynomial that has a solution (root) in integers} . 
Is 퐷 decidable?4 
The answer is no, as Russian mathematician Matiasevich found when he was 20 years 
old. 
Without a precise notion of procedure, there was no hope of answering the question. 
Hilbert originally said, give a finite procedure. There was no notion that there might not be 
a procedure! It took 35 years before the problem could be addressed because we needed a 
formal notion of procedure to prove there is none. 
Here, the Church-Turing Thesis played a fundamental role. 
Lecture 7 
Thu. 9/27/12 
Last time we talked about 
∙ TM variants 
∙ Church-Turing Thesis 
Today we’ll give examples of decidable problems about automata and grammars. 
S0 Hints 
Problem 1: Prove some language is not context-free. Use the pumping lemma! The trick is to 
find the right string to use for pumping: choose a string longer than the pumping length such that, 
no matter how you try to pump it up, you get out of the language. The first string you think 
of pumping may not work; probably the second one will. 
Problem 2: Show a language is context-free. Give a grammar or a pushdown automaton. At first glance 
it doesn’t look like a context-free language: the language is 
written in terms of having to satisfy two conditions, each of which seems to need the stack. 
The problem seems to be that if you use the stack for the first condition, it’s empty for the second 
condition. Instead of thinking of it as an AND of two conditions, think of it as an OR of 
several conditions. 
Problems 3 and 4: easy. Problem 4 is about enumerators. 
Problem 5: a variant of a Turing machine. Practice with programming an automaton. 
Problem 6: (4.17 in the 2nd edition and 4.18 in the 3rd edition) Let 퐶 be a language. 
Prove that 퐶 is Turing-recognizable iff a decidable language 퐷 exists such that 
퐶 = {푥 : for some 푦, ⟨푥, 푦⟩ ∈ 퐷} . 
We’ll talk about this notation below. 
4Note 퐷 is Turing recognizable. Just start plugging in all possible tuples of integers, in a systematic list 
that covers all tuples. If any one is a root, accept, otherwise carry on. 
0.1 Encodings 
We want to feed more complicated objects into Turing machines—but a Turing machine can 
only read strings. 
If we want to feed a fancy object into a program we have to write it as a string. We need 
some way of encoding objects, and we’d like some notation for it. 
For any formal finite object 퐵, for instance, a polynomial, automaton, string, grammar, 
etc., we use ⟨퐵⟩ to denote a reasonable encoding of 퐵 into a binary string. ⟨퐵1, . . . ,퐵푘⟩ encodes several objects into a string. 
For example, to encode an automaton, write out the list of states and list of transitions, 
and convert it into a long binary string, suitable for feeding into a TM. 
Problem 6 links recognizability and decidability in a nice way. You can think of it as 
saying: “a language is recognizable iff it is the projection of a decidable language.” Imagine we 
have a coordinate system with axes 푥 and 푦. Any point corresponds to some pair (푥, 푦). 
Look at all 푥 such that for some 푦, ⟨푥, 푦⟩ is in 퐷. So 퐶 consists of those points 푥 
lying underneath some element of 퐷. We’re taking all the (푥, 푦) pairs and removing the 푦. This shrinks a 
2-dimensional shape into its 1-dimensional shadow; this is why we call it the projection. This will reappear later in 
the course when we talk about complexity theory! 
You need to prove an “if and only if.” Reverse direction is easy. If 퐷 is decidable, and 
you can write 퐶 like this, we want 퐶 to be recognizable. We need to make a recognizer for 퐶. 
It accepts strings in the language but may go on forever for strings not in the language. 
It should accept if the input is in 퐶, even though we don’t know what 푦 is. Well, let’s not give away more! 
The other direction is harder. Given a T-recognizable 퐶, we must show that a suitable decidable 퐷 exists; we 
don’t even know what 퐷 is! We have to find an “easier” language, where 푦 sort of helps you 
determine whether 푥 ∈ 퐶. If 퐶 were decidable this would be easy: just ignore 푦. Which 푦 should you use? 
Make membership a decidable test. 
The 푦 somehow proves that 푥 ∈ 퐶. For each 푥 ∈ 퐶 there has to be some 푦 up there 
somewhere. What does 푦 do? The nice thing about the 푦’s is that if the “proof” fails, the 
decider can see that the proof fails. (Whatever I mean by “proof”; conceptually, a test for 
validity.) Go from recognizer to decider. Nice problem! 
S1 Examples of decidable problems: problems on FA’s 
By the Church-Turing Thesis 6.10, algorithms are exactly captured by Turing machines. 
We’ll talk about algorithms and Turing machines interchangeably (so we’ll be a lot less 
formal about putting stuff in Turing machine language). 
Theorem 7.1: Let 
퐴DFA = {⟨퐵,푤⟩ : 퐵 is a DFA and 퐵 accepts 푤} . 
Then 퐴DFA is decidable. 
The idea is to just run the DFA! We’ll do some easy things to start. 
Proof. We’ll give the proof in high level descriptive language (like pseudocode), rather than 
explicitly draw out state diagrams. We’ll write the proof in quotes to emphasize that our 
description is informal but there is a precise mathematical formulation we can make. 
Let 퐶=“on input string 푥 
1. Test if 푥 legally encodes ⟨퐵,푤⟩ for some DFA 퐵 and string 푤, i.e., does it actually encode a 
finite automaton and a string? If not, reject (it’s a garbage string). 
2. Now we know it’s of the correct form. Run 퐵 on 푤. We’ll give some details. We’ll use 
a multi-tape Turing machine. 
Find the start state, and write it on the working tape. Symbol by symbol, read 푤. At 
each step, see what the current state is, and transition to the next state based on the 
symbol read, until we get to the end of 푤. Look up the state in 퐵 to see whether it is an 
accept state; if so accept, and otherwise reject. 
3. Accept if 퐵 accepts. Reject if 퐵 rejects.” 
Under the high-level ideas, the details are there. From now on, we’ll just give the high-level 
proof. This is the degree of formality that we’ll provide and that you should provide in 
your proofs. 
Brackets mean we agree on some encoding. We don’t go through the gory details of 
spelling out exactly what it is; we just agree it’s reasonable. 
We go through some details here, so you can develop a feeling for how intuition can be 
turned into simulations. Each stage should be obviously doable in finite time. 
Turing machines are “powerful” enough: trust me or play around with them a bit to see 
they have the power any programming language has. 
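As an illustration of step 2, here is the simulation written in Python (ours), with a DFA encoded as a hypothetical tuple (states, start, accepting set, transition table); the real machine 퐶 would of course work directly on the string encoding ⟨퐵,푤⟩.

def dfa_accepts(dfa, w):
    states, start, accepting, delta = dfa
    state = start
    for symbol in w:
        state = delta[(state, symbol)]       # one transition per input symbol
    return state in accepting

# A DFA over {0,1} accepting strings with an even number of 1's:
EVEN_ONES = ({"even", "odd"}, "even", {"even"},
             {("even", "0"): "even", ("even", "1"): "odd",
              ("odd", "0"): "odd",  ("odd", "1"): "even"})
# dfa_accepts(EVEN_ONES, "1010") is True; dfa_accepts(EVEN_ONES, "1011") is False.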
We’ll do a bunch of examples, and then move into some harder ones. Let’s do the same 
thing for NFA’s. 
Theorem 7.2: Let 
퐴NFA = {⟨퐵,푤⟩ : 퐵 is an NFA and 퐵 accepts 푤} . 
Then 퐴NFA is decidable. 
We can say exactly what we did before for NFA’s instead of DFA’s. However, we’ll say 
it a slightly different way, to make a point. 
Proof. We’re going to use the fact that we already solved the problem for DFA’s. 
Turing machine 퐷 =“on input ⟨퐵,푤⟩, (By this we mean that we’ll check at the beginning 
whether the input is of this form, and reject if not.) 
1. Convert the NFA 퐵 to an equivalent DFA 퐵′ (using the subset construction). 
All of those constructions can be implemented with Turing machines. 
2. Run TM 퐶 (from the proof of Theorem 7.1) on input ⟨퐵′,푤⟩. 
3. Accept if 퐶 accepts, reject if 퐶 rejects. 
We see that in this type of problem, it doesn’t matter whether we use NFA or DFA, 
or whether we use CFG or PDA, because each in the pair recognizes the same class of 
languages. In the future we won’t spell out all equivalent automata; we’ll just choose one 
representative (DFA and CFG). 
Let’s do a slightly harder problem. 
Theorem 7.3: Let 
퐸DFA = {⟨퐵⟩ : 퐵 is a DFA and 퐿(퐵) = 휑} . 
Then 퐸DFA is decidable. 
This is the emptiness testing problem for DFA’s: Is there one string out there that the 
DFA accepts? 
Proof. How would you test if a DFA 퐵 has an empty language? Naively we could test all 
strings. That is not a good idea, because this is not something we can do in finite time. 
Instead we test whether there is a path from the start state to any of the accept states: 
Mark the start state, mark any state reachable by a transition from a previously marked state, and so 
forth, until you can’t mark anything new. We eventually get to all states that are reachable 
under some input. 
If we’ve marked all reachable states, and haven’t marked any accept state, then 퐵 has 
empty language. 
With this idea, let’s describe the Turing machine that decides 퐸DFA. 
Let 푆 =“ on input ⟨퐵⟩. 
1. Mark the start state. 
2. Repeat until nothing new is marked: mark all states reachable by a transition from previously marked 
states. 
3. Accept if no accept state is marked. Reject otherwise. 
This is detailed enough for us to build the Turing machine if we had the time, but high-level 
enough so that the focus is on big details and not on fussing with minor things. (This 
is how much detail I expect in your solutions.) 
Note this applies to NFA’s as well, because we can convert NFA’s to DFA’s and carry out 
the algorithm we just described. 
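In Python, the marking algorithm looks like this (our sketch, using the same hypothetical DFA encoding as in the earlier sketch):

def dfa_language_empty(dfa):
    states, start, accepting, delta = dfa
    marked = {start}
    changed = True
    while changed:                              # repeat until nothing new is marked
        changed = False
        for (state, _symbol), target in delta.items():
            if state in marked and target not in marked:
                marked.add(target)
                changed = True
    return marked.isdisjoint(accepting)         # accept iff no accept state is marked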
Theorem 7.4 (Equivalence problem for DFA’s): 
퐸푄DFA = {⟨퐴,퐵⟩ : 퐴,퐵 are DFA’s and 퐿(퐴) = 퐿(퐵)} 
is decidable. 
Proof. Look at all the places where 퐿(퐴) and 퐿(퐵) are not the same. Another way to phrase 
the equivalence problem (is 퐿(퐴) = 퐿(퐵)?) is as follows: is the symmetric difference 
퐿(퐴) △ 퐿(퐵) = (퐿(퐴) ∖ 퐿(퐵)) ∪ (퐿(퐵) ∖ 퐿(퐴)) 
empty? 
Let 퐸 =“ on input ⟨퐴,퐵⟩. 
∙ Construct a DFA 퐶 which recognizes 퐿(퐴) △ 퐿(퐵) (using the constructions for union, intersection, and complement of regular languages). Test if 퐿(퐶) = 휑 using the TM 푆 that 
tested for emptiness (Theorem 7.3). 
∙ Accept if it is 휑, reject if not. 
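A Python sketch of 퐸 (ours, same hypothetical DFA encoding as above): build the product automaton, whose state is a pair of states and which accepts exactly when one DFA accepts and the other does not, then check whether any such pair is reachable.

from collections import deque

def dfa_equivalent(d1, d2, alphabet):
    _states1, start1, acc1, delta1 = d1
    _states2, start2, acc2, delta2 = d2
    start = (start1, start2)
    marked, frontier = {start}, deque([start])
    while frontier:                              # mark all reachable product states
        p, q = frontier.popleft()
        for a in alphabet:
            nxt = (delta1[(p, a)], delta2[(q, a)])
            if nxt not in marked:
                marked.add(nxt)
                frontier.append(nxt)
    # The symmetric difference is empty iff no reachable pair is accepted by
    # exactly one of the two DFAs.
    return all((p in acc1) == (q in acc2) for (p, q) in marked)

# dfa_equivalent(EVEN_ONES, EVEN_ONES, "01") is True (using the earlier example DFA).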
S2 Problems on grammars 
Let’s shift gears and talk about grammars. 
Theorem 7.5: Let 
퐴CFG = {⟨퐺,푤⟩ : 퐺 is a CFG and 푤 ∈ 퐿(퐺)} . 
Then 퐴CFG is decidable. 
Proof. We want to know: does 퐺 generate 푤? 
We need an outside fact. We can try derivations coming from the start variable, and 
see if any of them lead to 푤. Unfortunately, without extra work, there are infinitely many 
things to test. For example, a word 푤 may have infinitely many parse trees generating it, if 
we had a rule like 푅 → 휀|푅. 
Definition 7.6: A CFG is in Chomsky normal form if all rules are of the form 푆 → 휀, 
퐴 → 퐵퐶, or 퐴 → 푎, where 푆 is the start variable, 퐴,퐵,퐶 are variables, 퐵,퐶 are not 푆, 
and 푎 is a terminal. 
The Chomsky normal form assures us that we don’t have loops (like 푅 → 푅 would cause). 
A variable 퐴 can only be converted to something longer, so that the length can only increase. 
We need two facts about Chomsky normal form. 
Theorem 7.7: 
1. Any context-free language is generated by a CFG in Chomsky normal form. 
2. For a grammar in Chomsky normal form, all derivations of a length 푛 string have at 
most a certain number of steps, 2푛 − 1. 
Let 퐹 =“on ⟨퐺,푤⟩. 
1. Convert 퐺 to Chomsky normal form. 
2. Try all derivations with at most 2푛 − 1 steps, where 푛 = |푤|. 
3. Accept if any yields 푤, and reject otherwise. 
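The bounded search can be organized as a recursion on substrings. Here is a Python sketch (ours, with a hypothetical Chomsky normal form grammar for {푎푘푏푘 : 푘 ≥ 1}); it terminates because each rule either produces a single terminal or splits the string into two shorter nonempty pieces.

from functools import lru_cache

CNF = {                       # hypothetical CNF grammar for { a^k b^k : k >= 1 }
    "S": [("A", "T"), ("A", "B")],
    "T": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}

def derives(var: str, w: str) -> bool:
    @lru_cache(maxsize=None)
    def gen(v, s):
        for rhs in CNF[v]:
            if len(rhs) == 1:                 # rule V -> a
                if s == rhs[0]:
                    return True
            else:                             # rule V -> B C: split s into two pieces
                if any(gen(rhs[0], s[:i]) and gen(rhs[1], s[i:])
                       for i in range(1, len(s))):
                    return True
        return False
    return bool(w) and gen(var, w)

# derives("S", "aabb") is True; derives("S", "aab") is False.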
Corollary 7.8: Every CFL is decidable. 
This is a different kind of theorem from what we’ve shown. We need to show every 
context-free language is decidable, and there are infinitely many CFL’s. 
Proof. Suppose 퐴 is a CFL generated by CFG 퐺. We build a machine 푀퐺 (depending on 
the grammar 퐺) deciding 퐴: 푀퐺=“on input 푤, 
1. Run TM 퐹 deciding 퐴CFG (from Theorem 7.5) on ⟨퐺,푤⟩. Accept if 퐹 does and reject 
if not. 
Theorem 7.9 (Emptiness problem for CFG’s): 
퐸CFG = {⟨퐺⟩ : 퐺 is a CFG and 퐿(퐺) = 휑} 
is decidable. 
Proof. Define a Turing machine by the following. “On input ⟨퐺⟩, 
1. First mark all the terminal symbols (indicated here with dots), say in the example grammar 
푆 → ˙푎푆˙푏 | 푇˙푏 
푇 → ˙푎 | 푇˙푎. 
2. Now mark every variable that has a rule whose right-hand side consists only of marked symbols: 
푆 → ˙푎푆˙푏 | ˙푇˙푏 
˙푇 → ˙푎 | ˙푇˙푎, 
and repeat until we can’t mark any more: 
˙푆 → ˙푎˙푆˙푏 | ˙푇˙푏 
˙푇 → ˙푎 | ˙푇˙푎. 
3. If 푆 is marked at the end, accept, otherwise reject. 
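Here is the marking procedure in Python (our sketch): a grammar is a dictionary mapping each variable to its right-hand sides, and we repeatedly mark variables that have a rule whose right-hand side consists only of terminals and already-marked variables.

def cfg_language_empty(grammar, start="S"):
    marked = set()                     # variables known to generate a string of terminals
    changed = True
    while changed:
        changed = False
        for var, rules in grammar.items():
            if var in marked:
                continue
            for rhs in rules:
                if all(sym in marked or sym not in grammar for sym in rhs):
                    marked.add(var)
                    changed = True
                    break
    return start not in marked         # accept (empty) iff the start variable is unmarked

# The grammar used above, S -> aSb | Tb and T -> a | Ta:
G = {"S": [["a", "S", "b"], ["T", "b"]], "T": [["a"], ["T", "a"]]}
# cfg_language_empty(G) is False, i.e. L(G) is nonempty.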
Turing machines can decide a lot of properties of DFA’s and CFG’s because DFA’s 
and CFG’s have only finitely many states and variables. Thus we can say things like, “mark certain 
states/variables, then mark the states/variables connected to them, and so on, and 
accept if we eventually get to...” 
In contrast to the previous theorems, however, we have the following. 
Theorem 7.10 (Equivalence problem for CFG’s): 
퐸푄CFG = {⟨퐴,퐵⟩ : 퐴,퐵 CFG’s and 퐿(퐴) = 퐿(퐵)} 
is undecidable. 
(Note that the complement of 퐸푄CFG is recognizable. The class of decidable languages is closed under complement, 
but the class of recognizable languages is not closed under 
complement.) 
We will prove this later. We also have the following theorem. 
Theorem 7.11 (Acceptance problem for TM’s): 
퐴TM = {⟨푀,푤⟩ : TM 푀 accepts 푤} 
is undecidable. However it is 푇-recognizable. 
To see that 퐴TM is 푇-recognizable, let 푈 =“on input ⟨푀,푤⟩: simulate 푀 on 푤; if it accepts, accept.” Note this 
may not stop. 
This is a famous Turing machine. It is the “universal machine,” and the inspiration for 
von Neumann architecture. It is a machine that one can program, without having to rewire 
it each time, so it can do the work of any other machine. 
Lecture 8 
Tue. 10/2/12 
Today Zack Remscrim is filling in for Michael Sipser. 
We summarize the relationships between the classes of languages we’ve seen so far. 
S1 Languages 
Proposition 8.1: Each of the following classes of languages is a proper subset 
of the next. 
1. Regular 
2. CFL 
3. Decidable 
4. Turing-recognizable 
Proof. We’ve already shown that each class is a subset of the next. 
We have that {푎푛푏푛 : 푛 ≥ 0} is a CFL but not a regular language, and {푎푛푏푛푐푛 : 푛 ≥ 0} is decidable but not CFL. 
Today we’ll finish the proof by showing that decidable languages are a proper subset of 
T-recognizable languages, by showing that 
퐴푇푀 = {⟨푀,푤⟩ : 푀 is a TM that accepts 푤} 
is Turing-recognizable but not decidable. 
We’ll also show there is a language that is not even Turing-recognizable. 
Theorem 8.2: 퐴푇푀 is Turing-recognizable. 
Proof. Let 푈 =“on input ⟨푀,푤⟩, 
1. Run 푀 on 푤. 
2. If 푀 accepts, then accept. 
If 푀 halts and rejects, then reject.” 
푀 doesn’t have to be a decider, it may reject by looping. Then 푈 also rejects by looping. 
We can’t do something stronger, namely make a test for membership and be certain that 
it halts (i.e., make a decider). 
S2 Diagonalization 
Diagonalization is a technique originally introduced to compare the sizes of sets. We have 
a well-defined notion of size for finite sets. For infinite sets, it’s not interesting just to call 
them all “infinite.” We’d also like to define the size of an infinite set, so that we can say one 
infinite set is larger than or the same size as another. 
Definition 8.3: Two sets 퐴 and 퐵 have the same size if there exists a one-to-one (injective) 
and onto (surjective) function 푓 : 퐴 → 퐵. Here, 
∙ “one-to-one” means if 푥 ≠ 푦, then 푓(푥) ≠ 푓(푦). 
∙ “onto” means for all 푦 ∈ 퐵 there exists 푥 ∈ 퐴 such that 푓(푥) = 푦. 
We also say that 푓 : 퐴 → 퐵 is a 1-1 correspondence, or a bijection. 
This agrees with our notion of size for finite sets: we can pair off elements in 퐴 and 퐵 
(make a bijection) iff 퐴 and 퐵 have the same number of elements. 
This might seem like an excessive definition but it’s more interesting when applied to 
infinite sets. 
Example 8.4: Let 
N = {1, 2, 3, 4, . . .} 
E = {2, 4, 6, 8, . . .}. 
Then N and E have the same size, because the function 푓(푛) = 2푛 gives a bijection N → E. 
푛 푓(푛) 
1 2 
2 4 
3 6 
4 8 
... 
... 
Note N and E have the same size even though E is a proper subset of N. 
This will usefully separate different kinds of infinities. We’re setting the definition to be 
useful for us. We want to distinguish sets that are much much bigger than N, such as the 
real numbers. 
Definition 8.5: A set is countable if it is finite or has the same size as N. 
Example 8.6: The set of positive rationals 
Q+ = {푚/푛 : 푚, 푛 ∈ N} 
is countable. 
To see this, we’ll build up a grid of rational numbers in the following way. 
1/1 1/2 1/3 1/4 · · · 
2/1 2/2 2/3 2/4 · · · 
3/1 3/2 3/3 3/4 · · · 
4/1 4/2 4/3 4/4 · · · 
... 
(The entry in row 푚 and column 푛 is the fraction 푚/푛.) 
Every rational number certainly appears in the table. We’ll snake our way through the grid. 
(The same grid, with arrows snaking through it along the diagonals: 1/1 → 2/1 → 1/2 → 3/1 → 2/2 → 1/3 → 1/4 → · · · , going through the diagonals in turn.) 
Now put the numbers in this order in a list next to 1, 2, 3, . . . 
푛 푓(푛) 
1 1/1 
2 2/1 
3 1/2 
4 3/1 
5 1/3 (we skip over 2/2) 
... 
Note some rational numbers appear multiple times in the grid; for instance, 1 appears as 1/1, 2/2, . . .. 
In the correspondence we don’t want to repeat these, we just go to the next value. This 
creates a bijection between N and Q+, showing Q+ is countable. 
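An enumeration in this spirit is easy to write down in Python (our sketch): walk the grid one anti-diagonal (푚 + 푛 constant) at a time and skip fractions that are not in lowest terms, so that each positive rational is produced exactly once.

from math import gcd
from itertools import count, islice

def positive_rationals():
    for total in count(2):                  # walk the diagonals m + n = 2, 3, 4, ...
        for m in range(1, total):
            n = total - m
            if gcd(m, n) == 1:              # skip repeats such as 2/2, 2/4, ...
                yield (m, n)                # the rational m/n

# list(islice(positive_rationals(), 6)) == [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (1, 4)]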
A lot of infinite sets seem to have the same size, so is this a completely useless definition? 
No, there are infinite sets bigger than others, so it is useful for us. Are the real numbers of 
the same size as rational numbers? 
Theorem 8.7: The set of real numbers R is not countable. 
Our proof uses the technique of diagonalization, which will also help us with the proof 
for 퐴푇푀. 
Proof. Assume by contradiction that R is countable; there exists a bijection 푓 : N → R. 
We’re going to prove it’s not a bijection, by showing that it misses some 푦. 
Let’s illustrate with a potential bijection 푓. 
푛 푓(푛) 
1 1.4142 
2 3.1415 
3 2.7182 
4 1.6108 
... 
... 
We’ll construct a number 푦 that is missed by 푓 in the following way: Let 푦 differ from 
푓(푖) at the 푖th place to the right of the decimal point. 
(Looking at the same table again, the relevant digits are the 푖th digit after the decimal point of 푓(푖): 4, 4, 8, 8, . . . .) 
For instance, let 
푦 = 0.3725 . . . 
We claim 푦 can’t show up in the image of 푓. Indeed, this is by construction: it differs from 
푓(푖) in the 푖th place, so it can’t be 푓(푖) for any 푖. 
There’s one little detail: 1 and .999 . . . are equal even though their decimal representations 
are different. To remedy this, we’ll just never use a 0 or 9 in 푦 to the right of the decimal. 
This is just to get around a little issue, though. The main idea is that given an alleged 
bijection, I can show it’s not a bijection by constructing a value it misses. 
We’ve shown that there can’t be a bijection N → R; therefore R is uncountable. 
Use diagonalization when you want to construct an element that is different from 
every element on a given list. This is used in proofs by contradiction, for example, 
when you want to show a function can’t hit every element of a set. 
Theorem 8.8: Let 
ℒ = {퐿 : 퐿 is a language} . 
Then ℒ is uncountable. 
The proof uses the same diagonalization idea. 
Proof. It’s enough to show ℒ is uncountable when the alphabet is just {0}, because every 
alphabet contains at least 1 symbol. The set of possible strings is 
{0}* = {휀, 0, 00, 000, . . .}. 
For a language 퐿, define the characteristic vector 휒퐿 of 퐿 by 휒퐿(푣) = 0 if 푣 ∉ 퐿 and 1 if 푣 ∈ 퐿. 
휒퐿 simply records whether each word is in 퐿 or not. 
There is a correspondence between each language and its characteristic vector. All we 
have to show is that the set of characteristic vectors (infinite binary sequences) is uncountable. 
Assume by contradiction that {휒퐿 : 퐿 is a language over {0}} is 
countable. 
Suppose we have some bijection from N to the set of characteristic vectors 휒퐿, 
푛 푓(푛) 
1 1011 · · · 
2 0000 · · · 
3 1111 · · · 
4 1010 · · · 
... 
Again, there has to be some binary string that is missed by 푓. We choose 푦 so it differs 
from 푓(푖) at the 푖th place. 
푛 푓(푛) 
1 1011 
2 0000 
3 1111 
4 1010 
... 
... 
푦 = 0101 · · · 
This 푦 can’t ever appear in the table: Suppose 푓(푛) = 푦. This is impossible since we said 
푦 differs from 푓(푛) in the 푛th place. This shows the set of languages really is uncountable. 
Note that our proof works no matter what the alleged bijection 푓 looks like. Whatever 
푓 does, it pairs up each 푛 with one binary number. All I have to do is construct a 푦 that 
differs from every single 푓(푛). It’s constructed so it differs from every 푓(푖) somewhere. 
This shows that no function 푓 can work. 
S3 퐴푇푀: Turing-recognizable but not decidable 
Consider 
ℳ= {푀 : 푀 is a Turing machine} . 
(We fix the tape alphabet.) This is countable because there is a way to encode a Turing 
machine using a finite alphabet, with a finite-length word. Now only some words represent valid 
Turing machines. 
Now pair the first valid string representing a Turing machine with 1, the second valid 
string representing a Turing machine with 2, and so forth. This shows ℳ is countable. 
The set of all languages is uncountable, but the set of Turing machines is countable. This 
implies the following fact. 
Theorem 8.9: There exists a language 퐿 such that 퐿 is not Turing-recognizable. 
Proof. If every language were Turing-recognizable, we could map every language to a Turing 
machine that recognizes it; this would give a one-to-one map from an uncountable set into a 
countable set, which is impossible. 
We’re now ready to prove that 퐴푇푀 is undecidable. 
Theorem 8.10: 퐴푇푀 is undecidable. 
Proof. We’ll proceed by contradiction using diagonalization. 
Assume for sake of contradiction that 퐴푇푀 is decidable. Then there exists a decider 퐻, 
such that 
퐻(⟨푀,푤⟩) = accept if 푀 accepts 푤, and reject if 푀 does not accept 푤. 
(Because 퐻 is a decider, it is guaranteed to halt.) Using this machine 퐻 we’re going to make 
a machine 퐷 that does something utterly impossible. This will be our contradiction. 
Let 퐷=“On input ⟨푀⟩, 
1. Run 퐻 on ⟨푀, ⟨푀⟩⟩.5 퐻 answers the 퐴푇푀 problem, so it answers: does machine 푀 
accept its own description?6 
2. If 퐻 accepts, reject. 
If 퐻 rejects, accept. 
Now for any Turing machine 푀, 퐷 accepts ⟨푀⟩ iff 푀 doesn’t accept ⟨푀⟩. 
What happens if we feed ⟨퐷⟩ to 퐷? We get that 퐷 accepts ⟨퐷⟩ iff 퐷 doesn’t accept ⟨퐷⟩. 
This is a contradiction! 
Let’s look at what we’ve done. Let’s say 퐴푇푀 were decidable. Let 퐻 decide the 퐴푇푀 
problem. We construct 퐷 that uses 퐻 as a subroutine, that does the opposite of what a 
machine 푀 does when fed the description of 푀. Then when we feed ⟨퐷⟩ to 퐷, 퐷 is now 
completely confused! We get a contradiction, hence 퐴푇푀 can’t be decidable. 
(If you’re completely confused, there’s more explanation in the next lecture.) 
This completes the picture in Proposition 8.1. 
5Compare this to looking at the 푖th symbol of 푓(푖). 
6This is a valid question, because we can encode the machine in a string, and the machine accepts strings. 
We can feed the code of a program to the program itself. For instance, we could have an optimizing compiler 
for 퐶, written in 퐶. Once we have the compiler, we might compile the compiler. 
S4 Showing a specific language is not recognizable 
So far we know that there are nonrecognizable languages, but we haven’t given an explicit 
description of one. Now we’ll show a specific language is not recognizable. For this the 
following lemma is useful. 
Lemma 8.11: 퐴 is decidable iff 퐴 is 푇-recognizable and the complement of 퐴 is 푇-recognizable (we say that 
퐴 is co-T-recognizable). 
This immediately implies that the complement of 퐴푇푀 is not recognizable. 
Proof. ( =⇒ ): Suppose 퐴 is decidable. Then 퐴 is T-recognizable. For the second part, if 
퐴 is decidable, then the complement of 퐴 is decidable (decidable languages are closed under complementation: 
just run the decider and do the opposite. You’re allowed to do the opposite because the 
decider is guaranteed to halt). Hence the complement of 퐴 is also 푇-recognizable. 
(⇐): Suppose 푅 recognizes 퐴 and 푆 recognizes the complement of 퐴. We construct a decider 푇 for 퐴. If 
we can do this, we’re done. 
Construct 푇 as follows. 푇 =“on input 푤, 
1. Run 푅 and 푆 on 푤 in parallel until one accepts. (We can’t run 푅 and see what it does, 
and then run 푆, because 푅 and 푆 may not be decidable—푅 might run forever, but 푇 
needs to be a decider.) This won’t take forever: either 푅 or 푆 might run forever on 
a particular input, but at least one of them will accept eventually, because a string is 
either in 퐴 or in the complement of 퐴. 
2. If 푅 accepts, then accept. If 푆 accepts (i.e., 푤 ∉ 퐴), then reject. 
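In Python, with hypothetical helpers R_accepts_within(w, k) and S_accepts_within(w, k) returning True exactly when the corresponding recognizer accepts 푤 within 푘 steps, the decider 푇 is just the following loop (our sketch); it terminates because one of 푅, 푆 accepts 푤 eventually.

def decide_A(w, R_accepts_within, S_accepts_within):
    k = 1
    while True:
        if R_accepts_within(w, k):
            return True          # R accepted: w is in A
        if S_accepts_within(w, k):
            return False         # S accepted: w is in the complement of A
        k += 1                   # run both machines a little longer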
Lecture 9 
Thu. 10/4/12 
Last time we saw 
∙ 퐴푇푀 is undecidable. 
∙ The complement of 퐴푇푀 is T-unrecognizable. 
∙ Diagonalization method 
We showed how the diagonalization method proved the reals were uncountable, and also 
applied the same idea to decidability. We’ll give a quick recap, and highlight why the idea 
behind the two diagonalization arguments is the same. 
Theorem 9.1: R is uncountable. 
Proof. Assume for contradiction that R is countable. Suppose we’re given a bijection. 
푛 푓(푛) 
1 2.71828 . . . 
2 3.14159 . . . 
3 0.11111 . . . 
4 
... 
... 
... 
Take a number differing from 푓(푖) in the 푖th place. For instance, take 푥 = 0.654 . . . where 
6 ≠ 7, 5 ≠ 4, and 4 ≠ 1. 
Then 푥 can’t be on the list. For instance, it can’t be the 17th number because it’s 
different in 17th place. Thus 푓 fails to be a bijection. This is Cantor’s proof. 
We applied diagonalization to decidability problems. 
Theorem 9.2: 퐴푇푀 is undecidable. 
Proof. Assume 퐴푇푀 is decidable by a Turing machine 퐻. Use 퐻 to get a TM 퐷, that does the 
following. 
1. 퐷 on input ⟨푀⟩ rejects if 푀 accepts ⟨푀⟩, and accepts if 푀 rejects ⟨푀⟩ (by halting or by looping). 
Then 퐷 accepts ⟨푀⟩ iff 푀 doesn’t accept ⟨푀⟩; hence 퐷 accepts ⟨퐷⟩ if 퐷 doesn’t accept 
⟨퐷⟩, contradiction. 
This is the same idea as Cantor’s diagonalization argument! To see this, let’s make a 
table of how Turing machines respond to descriptions of Turing machines as inputs: 
⟨푀1⟩ ⟨푀2⟩ ⟨푀3⟩ · · · ⟨퐷⟩ 
푀1 accept reject reject · · · 
푀2 reject reject reject · · · 
푀3 accept accept accept · · · 
... 
퐷 reject accept reject · · · ? 
We programmed 퐷 so that it differed from what 푀푖 decided on ⟨푀푖⟩. However we get a 
contradiction because nothing can go in the box labeled “?”, hence 퐷 can’t be on the list of 
all Turing machines. 
Today we’ll show a lot of other problems are undecidable. There’s now a shortcut: by 
proving that 퐴푇푀 is undecidable, we will show a lot of other problems inherit 퐴푇푀’s 
undecidability. Then we won’t need the diagonalization argument again. 
Today we’ll use 
1. Reducibility to show undecidability 
2. Mapping reducibility to show T-unrecognizability. 
S1 Reducibility 
Let 
HALT푇푀 = {⟨푀,푤⟩ : TM 푀 halts on input 푤} . 
Theorem 9.3: HALT푇푀 is undecidable. 
We can go back and use the diagonalization method. But we’ll give a different technique. 
Proof. Suppose we can decide the halting problem by some Turing machine. We’re going 
to use that to decide 퐴푇푀, which we know is not decidable. Hence our assumption that 
HALT푇푀 is decidable must be false, i.e., the halting problem cannot be decided. 
Assume for sake of contradiction that TM 푅 decides HALT푇푀. We will construct a TM 
푆 deciding 퐴TM. 
Let 푆 =“on input ⟨푀,푤⟩. 
1. Use 푅 to test if 푀 halts on 푤. If not, reject. If yes, run 푀 on 푤 until it halts, and accept if 푀 accepts, reject if 푀 rejects.” 
Why does this work? If 푀 doesn’t halt on 푤, then we know 푀 doesn’t accept, so reject. 
Suppose 푅 says 푀 does halt. We don’t know whether it accepts right off. Our algorithm 
says to run 푀 on 푤. We don’t have to worry about 푀 going forever, because 푅 has told us 
that 푀 halts! We’ll eventually come to the end, 푀 will accept or reject, and we can give 
our answer about 퐴TM. 
Thus we can use our HALTTM machine to decide 퐴TM. 
This is called reducing 퐴푇푀 to the HALT푇푀 problem. 
Reducibility: One way to show a problem is undecidable is by reducing it from a 
problem we already know is undecidable, such as 퐴TM. 
Concretely, to show a problem 푃1 is undecidable, suppose it had a decider. Use the 
decider for 푃1 to decide an undecidable problem (e.g. 퐴TM). This gives a contradiction. 
If some problem has already been solved, and we reduce a new problem to an old problem, 
then we’ve solved it too. For instance, consider the acceptance problem for DFA’s. We 
showed that 퐴퐷퐹퐴 is decidable (Theorem 7.1). Then it immediately follows that 퐴푁퐹퐴 is 
decidable, because we can reduce the 퐴푁퐹퐴 problem to an 퐴퐷퐹퐴 problem (Theorem 7.2). We 
converted the new problem into the solved problem. 
Definition 9.4: We say 퐴 is reducible to 퐵 if a solution to 퐵 gives a solution to 퐴. 
Here we used reducibility in a twisted way. If 퐴 is reducible to 퐵, we know that if we 
can solve 퐵 then we can solve 퐴. Hence if we can’t solve 퐴 then we can’t solve 퐵. 
We used a HALT푇푀 machine to decide 퐴푇푀, so we reduced 퐴푇푀 to HALT푇푀. 
All “natural” problems which are undecidable can be shown to be undecidable by reducing 
퐴푇푀 to them or their complement. 
! When trying to show problems are undecidable, reduce from 퐴푇푀 (not to 퐴푇푀).a 
aOn an undecidability problem on the exam, if you just write “reduction from 퐴푇푀” you will 
get partial credit. If you write “reduction to 퐴푇푀” you will get less credit. 
Let 
퐸TM = {⟨푀⟩ : 푀 is a TM and 퐿(푀) = 휑} . 
Theorem 9.5: 퐸푇푀 is undecidable. 
Proof. Use reduction from 퐴푇푀 to 퐸푇푀. 
Here’s the idea. Assume 푅 decides 퐸푇푀. We construct 푆 deciding 퐴푇푀. How do we do 
this? 푆 wants to decide whether a certain string is accepted; 푅 only tells whether the entire 
language is empty. We’re going to trick 푅 into giving us the answer we’re looking for. 
Instead of feeding the TM 푀 into 푅, we’re going to modify 푀. The modified version 
of 푀 has 푤 built in: call it 푀푤. When we start up 푀푤 on any input, it ignores that 
input and just runs 푀 on 푤. It doesn’t matter what we feed it; it will run as if the input were 
푤. The first thing it does is erase the input and write 푤. 푀푤 will always do the same 
thing: always accept or always reject, depending on what 푀 does to 푤. (Its language is 
everything or nothing.) 
Now we feed 푀푤 into 푅. The only way the language can be nonempty is if 푀 accepts 
푤. We’ve forced 푅 to give us the answer we’re looking for, i.e., we’ve converted acceptance 
into an emptiness problem. Now we’re ready to write out the proof. 
푆 =“On input ⟨푀,푤⟩, 
1. Construct 푀푤 =“ignore input. 
(a) Run 푀 on 푤. 
(b) Accept if 푀 accepts.” 
2. Run 푅 on ⟨푀푤⟩. 
3. Give opposite answer. (푅 is a decider. If 푅 accepts ⟨푀푤⟩, then 푀푤’s language is 
empty, so 푀 did not accept 푤, so reject.) 
This machine decides 퐴TM, which is a contradiction. Hence our assumption was incorrect; 
퐸TM is undecidable. 
S2 Mapping reducibility 
We gave a general notion of reducibility, but not a specific definition. In this section we 
introduce a specific method called mapping reducibility. 
Definition 9.6: Let 퐴 and 퐵 be languages. We say that 퐴 is mapping reducible to 퐵, 
and write7 
퐴 ≤푚 퐵 
if there is a computable function 푓 : Σ* → Σ* such that for all 푤, 푤 ∈ 퐴 iff 푓(푤) ∈ 퐵. 
We say 푓 : Σ* → Σ* is computable if some TM 퐹 halts with 푓(푤) on the tape when 
started on input 푤. 
Why is mapping reducibility useful? Suppose we have a decider for 퐵, and we have 푓, 
computable. We can use the decider for 퐵 together with the machine computing 푓 to decide whether a string is 
in 퐴! 
Proposition 9.7: If 퐴 ≤푚 퐵 and 퐵 is decidable (respectively, recognizable), then so is 퐴. 
Proof. Say 푅 decides 퐵. Let 푆 =“On 푤, 
1. Compute 푓(푤). 
2. Run 푅 on 푓(푤); accept if 푅 accepts, and reject otherwise.” 
For 퐵 recognizable, just remove the last part “reject otherwise.” (We don’t know that 푅 
halts.) 
Think of 푓 as a “transformer”: it transforms a problem in 퐴 to a problem in 퐵. If 퐴 is 
reducible to 퐵, and 퐴 is not decidable, then neither is 퐵. 
This will also help us prove a problem is non-T-recognizable. 
Let’s recast our previous results in the language of mapping reducibility. 
In the proof of Theorem 9.5 we showed that 
퐴푇푀 ≤푚 the complement of 퐸푇푀. 
We converted a problem about 퐴푇푀 to a problem about 퐸푇푀. Given ⟨푀,푤⟩, let 푓 
map it to ⟨푀푤⟩. We have ⟨푀,푤⟩ ∈ 퐴푇푀 iff ⟨푀푤⟩ ∉ 퐸푇푀. 
7Think of the notation as saying 퐴 is “easier” than 퐵 
A useful fact is that 퐴 ≤푚 퐵 if and only if the complement of 퐴 is mapping reducible to the complement of 퐵, 
by using the same 푓. 
We make one more observation, then prove another theorem. 
We actually have the following strengthened version of Theorem 9.5. 
Theorem 9.8: 퐸푇푀 is not recognizable. 
Proof. We showed 퐴푇푀 ≤푚 퐸푇푀, so 퐴푇푀 ≤푚 퐸푇푀. Since 퐴푇푀 is not recognizable, 퐸TM is 
not recognizable. 
We’ll now use mapping reducibility to give an example of a language such that neither 
it nor its complement is recognizable. We will prove this by reduction from 퐴푇푀. 
Theorem 9.9: 퐸푄푇푀 and its complement 퐸푄̄푇푀 are both Turing-unrecognizable.
Recall that the equality problem is that given 2 Turing machines, we want to know 
whether they recognize the same language. 
Proof. We show that 
1. 퐴푇푀 ≤푚 퐸푄̄푇푀, or equivalently,
퐴̄푇푀 ≤푚 퐸푄푇푀. (Since 퐴̄푇푀 is not recognizable, this shows 퐸푄푇푀 is not recognizable.)
We have to give a function 
푓 : ⟨푀,푤⟩↦→ ⟨푀1,푀2⟩ . 
We let 푀2 be the machine that always rejects. Let 푀1 = 푀푤, the machine that
simulates 푀 on 푤. If 푀 accepts 푤 then 푀1 accepts everything while 푀2 rejects everything;
if 푀 does not accept 푤 then both languages are empty. So ⟨푀,푤⟩ ∈ 퐴푇푀 iff ⟨푀1,푀2⟩ ∉ 퐸푄푇푀.
2. 퐴푇푀 ≤푚 퐸푄푇푀, or equivalently,
퐴̄푇푀 ≤푚 퐸푄̄푇푀. (This shows 퐸푄̄푇푀 is not recognizable.)
We have to give a function 
푓 : ⟨푀,푤⟩↦→ ⟨푀1,푀2⟩ . 
We let 푀2 be the machine that always accepts. Again let 푀1 = 푀푤, the machine that
simulates 푀 on 푤. If 푀 accepts 푤 then both machines accept everything, so ⟨푀1,푀2⟩ ∈ 퐸푄푇푀;
if not, 푀1 accepts nothing while 푀2 accepts everything, so ⟨푀1,푀2⟩ ∉ 퐸푄푇푀. Thus
⟨푀,푤⟩ ∈ 퐴푇푀 iff ⟨푀1,푀2⟩ ∈ 퐸푄푇푀.
In the remaining 2 minutes, we’ll look at a cool fact that we’ll continue next time. 
Lots of undecidable problems appearing throughout math have nothing to do with Turing
machines.
We'll give the simplest example. Let's define dominoes as pairs of strings of 푎's and 푏's,
written as a top string over a bottom string, such as
[aba/ab], [aa/ab], . . .
Given a set of dominoes, can we construct a match, which is an ordering of dominoes such 
that the string along the top is the same as the string along the bottom? One little point: 
each domino can be reused as many times as we want. This means we have a potentially 
unlimited set of dominoes. 
Is it possible to construct a match? 
This is an undecidable problem! 
And it has nothing to do with automata. But next time we will show we can reduce 퐴푇푀 
to this problem; therefore it’s undecidable. 
Lecture 10 
Thu. 10/11/12 
Midterm Thu. 10/25 in Walker (up through next week's material: everything on computability
theory but not complexity theory).
Homework due next week. 
Handout: sample problems for midterm 
Last time we talked about 
∙ reducibility 
∙ mapping reducibility 
Today we will talk about 
∙ Post Correspondence Problem 
∙ LBA’s 
∙ Computation history method 
We have been proving undecidability. First we proved 퐴푇푀 is undecidable by diagonalization.
Next, by reducing 퐴푇푀 to another problem, we show that if the other problem were decidable
then so would be 퐴푇푀; hence the other problem must also be undecidable.
Today we’ll look at a fancier undecidable problem. It is a prototype for undecidable 
problems that are not superficially related to computability theory. All proofs for these 
problems use the method we’ll introduce today, the computation history method. 
Ours is a toy problem, with no independent interest. But it is nice to illustrate the 
method, and it is relatively clean. 
Even the solution to Hilbert’s tenth problem uses the computation history method 
(though there are many complications involved). 
S1 Post Correspondence Problem 
Given a finite collection of dominoes 
푃 = { [푢1/푣1], [푢2/푣2], . . . , [푢푘/푣푘] },
a match is a sequence of dominoes [푢푖1/푣푖1], . . . , [푢푖ℓ/푣푖ℓ] from 푃 (repetitions allowed) where
푢푖1 · · · 푢푖ℓ = 푣푖1 · · · 푣푖ℓ.
The question is: Is there a match in 푃? 
For example, if our collection of dominoes is 
푃 = { [aa/aba], [ab/aba], [ba/aa], [abab/b] }
then we do have a match, because the sequence
[ab/aba], [aa/aba], [ba/aa], [aa/aba], [abab/b]
reads
top:    ab | aa | ba | aa | abab
bottom: aba | aba | aa | aba | b
and both spell out the same string abaabaaaabab.
Formally, define 
푃퐶푃 = {⟨푃⟩ : 푃 has a match} . 
We will show PCP is undecidable. (Note it is Turing recognizable because for a given 
arrangement it’s easy to see if it’s a match; just enumerate all possible arrangements. If we 
find a match, accept.) 
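As an informal illustration (not from the notes), here is a hedged Python sketch of this recognizer; the encoding of dominoes as (top, bottom) pairs and the length cap are my own choices.

    from itertools import product

    # Check whether a sequence of domino indices is a match.
    def is_match(dominoes, indices):
        top = "".join(dominoes[i][0] for i in indices)
        bottom = "".join(dominoes[i][1] for i in indices)
        return len(indices) > 0 and top == bottom

    # A recognizer, not a decider: the true recognizer enumerates ever longer
    # sequences and may run forever; here the search is capped for illustration.
    def pcp_recognizer(dominoes, max_len=8):
        for length in range(1, max_len + 1):
            for indices in product(range(len(dominoes)), repeat=length):
                if is_match(dominoes, indices):
                    return list(indices)
        return None

    # The example from the text: the match found is [1, 0, 2, 0, 3].
    P = [("aa", "aba"), ("ab", "aba"), ("ba", "aa"), ("abab", "b")]
    print(pcp_recognizer(P))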
This is an old problem, the first shown undecidable by Post in 1950’s.8 
Let’s modify the PCP so that the match has to start with the starting domino (to see 
how to fix this, see the book). 
Theorem 10.1: PCP is undecidable.
8Don’t confuse the Post Correspondence Problem with probabilistic checkable proofs, also abbreviated 
PCP. 
The proof has two ideas, and each takes some time to introduce. Instead of doing them at
once (they intertwine), we'll defer the proof and first prove a different theorem that uses only one
of the ideas. Then we'll come back and prove this theorem using both ideas.
To introduce the first idea we’ll go back to a problem in computation. 
S2 Computation Histories and Linearly Bounded Automata 
Definition 10.2: A linearly bounded automaton (LBA) is a modified Turing machine, 
where the tape is only as long as the input string.9 
The head doesn't move off the left or right ends. The machine is limited in how much
memory it has: the amount of memory is linearly bounded
in terms of the size of the input (you might gain a constant factor because you're
allowed a larger tape alphabet, but you can't, for instance, have 푛² of memory). Now let the
acceptance and emptiness problems be
퐴퐿퐵퐴 = {⟨푀,푤⟩ : 푀 is an LBA that accepts 푤}
퐸퐿퐵퐴 = {⟨푀⟩ : 푀 is an LBA and 퐿(푀) = ∅}
Even though 퐴푇푀 was undecidable, 퐴퐿퐵퐴 is decidable. 
Theorem 10.3: 퐴퐿퐵퐴 is decidable.
This is a dramatic change in what we can do computationally! 
The key difference that makes 퐴퐿퐵퐴 decidable is that linearly bounded automata have 
finitely many configurations (see below). 
As a side remark, 퐴퐿퐵퐴 is not decidable by LBA’s. In the theorem, by decidable we 
mean it’s decidable by ordinary Turing machines. It is decidable but only by using a lot of 
memory. 
Proof. Define a configuration to be a total snapshot of a machine at a given time: 
(푞, 푡, 푝) 
where 푞 is the state, 푡 is the tape contents, and 푝 is the head position. For a given input size, the
number of configurations of an LBA is finite.
If we run the LBA for a certain amount of time 푇 (the number of configurations) without halting,
then it must have repeated a configuration. If it has halted by time 푇, then we know the answer.
If the machine hasn't halted, it's in a loop and will run forever; then reject.
(We don't even have to remember which configurations we've seen.)
For a string 푤 of length 푛, the number of configurations is
|푄| · |Γ|^푛 · 푛.
9The LBA's tape isn't of some fixed size independent of the input (then it would be a finite automaton).
The tape is allowed to grow just enough to fit the input, but it can't grow any further.
We now write down the TM that decides 퐴퐿퐵퐴. 
"On input ⟨푀,푤⟩,
1. Compute |푄| · |Γ|^푛 · 푛.
2. Run 푀 on 푤 for that many steps.
3. Accept if it has accepted; reject if it has not accepted within that many steps."
Note that writing the number |푄| · |Γ|^푛 · 푛 down requires on the order of 푛 ln 푛 length tape,
so intuitively 퐴퐿퐵퐴 is not decidable by LBA's. The same diagonalization method can prove
that 퐴퐿퐵퐴 is indeed not decidable by LBA's. In general, we can't have a class of automata which
decides whether automata of that class accept.
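As a concrete (if wildly impractical) illustration, here is a hedged Python sketch of this decider. The encoding of the LBA as a transition table, and all names, are my own, not the notes'.

    # Hedged sketch of the A_LBA decider.  The LBA is given by a transition table
    # delta: (state, symbol) -> (state, symbol, move), with move in {-1, +1}.
    # The bound is astronomically large; the point is decidability, not efficiency.
    def decides_A_LBA(states, gamma, delta, start, accept, reject, w):
        n = max(len(w), 1)
        bound = len(states) * len(gamma) ** n * n    # number of configurations
        tape, state, head = (list(w) or ['_']), start, 0
        for _ in range(bound):
            if state == accept:
                return True                          # M accepted w
            if state == reject:
                return False
            trans = delta.get((state, tape[head]))
            if trans is None:
                return False                         # no move defined: halt and reject
            state, tape[head], move = trans
            head = min(max(head + move, 0), n - 1)   # the head stays on the tape
        return False    # exceeded the number of configurations: M is looping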
In contrast with 퐴퐿퐵퐴, 퐸퐿퐵퐴 is still undecidable. 
Theorem 10.4: 퐸퐿퐵퐴 is undecidable.
We prove this in a totally different way. Before, to prove 퐸푇푀 is undecidable (Theorem 9.5),
we showed 퐴푇푀 reduces to 퐸푇푀. We also have that 퐴퐿퐵퐴 reduces to 퐸퐿퐵퐴, but this doesn't
tell us anything, because 퐴퐿퐵퐴 is decidable!
Instead we reduce 퐴푇푀 to 퐸퐿퐵퐴. This is not obvious! Assume 푅 decides 퐸퐿퐵퐴. We’ll 
construct 푆 deciding 퐴푇푀. (This is our standard reduction framework.) This is hard because 
we can’t feed Turing machines into 퐸퐿퐵퐴: it only accepts LBA’s as input, not general Turing 
machines. 
We use the idea of computation history. 
Definition 10.5: Define an accepting computation history of a TM 푇 on input 푤 to 
be 
퐶1,퐶2, . . . ,퐶accept 
where 퐶1 is the start configuration, each 퐶푖 leads to 퐶푖+1, and 퐶accept is in an accepting 
state. 10 
If a Turing machine does not accept its input, it does not have an accepting computation
history. An accepting computation history exists iff the machine accepts.
It's convenient to have a format for writing this down (this will come up later in complexity
theory). Write down (푞, 푡, 푝) as
푡1푞푡2. 
10The computation history stores a record of all motions the machine goes through, just as a debugger 
stores a snapshot of what all the registers contain at each moment. 
This means: split the tape into 2 parts, the part before the head is 푡1, the part after the 
head is 푡2 and 푞 points to the first symbol in 푡2. All I’m doing here is indicating the position 
of the head by inserting a symbol representing the state in between the two parts. 
Write the computation history as a sequence of strings like this separated by pound signs. 
퐶1#퐶2#· · ·#퐶accept. 
Here 퐶1 is represented by 푞0푤1 · · ·푤푛. 
Proof of Theorem 10.4. Let 푆 = "On input ⟨푇,푤⟩,
1. Construct the LBA 푀푇,푤 = "On input 푧,
(a) Test if 푧 is an accepting computation history for 푇 on 푤.
(b) Accept if yes; reject if not."
2. Run 푅 on ⟨푀푇,푤⟩.
3. Accept if 푅 rejects; reject if 푅 accepts."
Note 푀푇,푤 does not simulate 푇. It simply checks if the input is a valid computation of 푇, 
in the form 퐶1#퐶2#· · ·#퐶accept. 
Why is this a LBA? It doesn’t actually simulate 푇, it just checks the computation; this 
doesn’t require running off the tape. 
What does 퐶1 need to look like? It must look like 푞0푤1 · · ·푤푛. How do we check 퐶2?
This is a delicate point: we must see whether 퐶1 was updated correctly to 퐶2. The machine
zigzags back and forth between 퐶1 and 퐶2 to check that everything follows legally (it can put
little markers on the input). If anything is wrong, reject. It repeats this all the way to 퐶accept,
then checks that 퐶accept contains an accept state.
Now the only string our LBA 푀푇,푤 could possibly accept, by design, is an accepting 
computation history. The point is that checking a computation is a lot easier than doing it 
yourself; a LBA is enough to check a TM’s computation. 
Now 퐿(푀푇,푤) contains either 0 or 1 strings. If 푇 does not accept 푤, then 퐿(푀푇,푤) is empty. If 푇 accepts,
there is exactly one accepting computation history, namely the correct 퐶1# · · · #퐶accept.
Each configuration forces the next all the way to the accepting configuration. (We’ve built 
the Turing machine based on the particular Turing machine 푇 and string 푤.) Hence 푀푇,푤 
is empty if and only if 푇 does not accept 푤. 
We’ve reduced 퐴푇푀 to 퐸퐿퐵퐴. This proves 퐸퐿퐵퐴 is undecidable. 
Note the computational history method is especially useful to show the undecidability of 
a problem that has to do with weaker models of computation. 
S3 Proof of undecidability of PCP 
Proof. (This is slightly rushed; see the book for a complete treatment.) The idea is to 
construct a collection of dominoes where a match has to be an accepting computation history. 
(For simplicity, we’ll deal with the modified PCP problem where we designate a certain 
domino to be the starting domino.) 
Given 푇, 푤, we construct a PCP problem 푃푇,푤 where a match corresponds to an accepting 
computation history. We construct the dominoes in order to force any match to simulate a 
computation history. 
∙ Let the start domino be [# / #푞0푤1 · · ·푤푛#].
∙ If in 푇 we have 훿(푞, 푎) = (푟, 푏, 푅), put in the domino [푞푎 / 푏푟]. Similarly we have a
domino for left transitions (omitted).
∙ For all tape symbols 푎 ∈ Γ, put in the copy domino [푎 / 푎].
Consider a concrete example: 푤 = 011, and 훿(푞0, 0) = (푞5, 2, 푅). We have the domino
[푞00 / 2푞5]. (The construction has one simple idea really: the transition dominoes, like [푞푎 / 푏푟],
force each configuration to lead to the next configuration.) We start with
top:    # | 푞00
bottom: #푞0011# | 2푞5
We've managed to push the match forward by one more domino. Now we have to copy
everything else. We use the copy dominoes [1/1], [0/0], [2/2], [#/#]:
top:    # | 푞00 | 1 | 1 | #
bottom: #푞0011# | 2푞5 | 1 | 1 | #
At the end the computation history is done, but the match isn't: the bottom is still one
configuration ahead of the top. We add the dominoes [푞accept푐 / 푞accept] and [푐푞accept / 푞accept]
for each tape symbol 푐, and [푞accept## / #].
These dominoes "eat" the tape one symbol at a time, around the accept state. Putting in
one last domino finishes it off. 
The start domino is a technicality. 
We’ve reduced 퐴푇푀 to PCP. Therefore, PCP is undecidable. 
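As a rough illustration (not from the notes), one can write down the domino set in code. The sketch below assumes states and tape symbols are single characters, takes the transition table as a Python dict, and omits some boundary details such as the blank-symbol dominoes.

    # Hedged sketch of the domino set P_{T,w} for the modified PCP (start domino
    # first).  delta maps (state, symbol) -> (state, symbol, 'R' or 'L'); gamma is
    # the set of tape symbols.  Encoding and names are mine.
    def dominoes_for(delta, gamma, q0, q_accept, w):
        P = [("#", "#" + q0 + w + "#")]              # start domino
        for (q, a), (r, b, move) in delta.items():
            if move == 'R':
                P.append((q + a, b + r))             # [qa / br]
            else:
                for c in gamma:
                    P.append((c + q + a, r + c + b)) # [cqa / rcb]
        for a in gamma | {"#"}:
            P.append((a, a))                         # copy dominoes
        for c in gamma:
            P.append((q_accept + c, q_accept))       # eat tape to the right
            P.append((c + q_accept, q_accept))       # eat tape to the left
        P.append((q_accept + "##", "#"))             # the final domino
        return P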
A Turing machine accepts an input if and only if it has an accepting computation 
history for that input. Thus the problem of whether 푇 accepts 푤 can be formulated 
as: does 푇 have an accepting computation history for 푤? 
This formulation is more concrete, and it is much easier to check whether a computation
history is correct than to ask directly whether 푇 accepts 푤.
To show a problem that isn’t related to computability theory is undecidable, find a 
way to simulate/encode an undecidable problem (such as 퐴푇푀) with that problem. 
It is useful to encode computation histories. 
Lecture 11 
Tue. 10/16/12 
Last time we talked about 
∙ the computation history method 
∙ 퐸퐿퐵퐴, PCP undecidable. 
Today we’ll do the following. 
∙ Review above, ALL푃퐷퐴 undecidable 
∙ Recursion theorem 
∙ Introduction to logic 
S0 Homework 
Some quick hints: For the problem about enumerating a collection of deciders that hits every
decidable language: it's impossible. Argue that your reduction works as claimed.
Problem 4 uses the computation history method.
Problems 5–6 use today's material; problem 6 asks for a model for a particular sentence (if you
haven't seen logic before, read that section of the book). There's a hint built into the problem if
you read it carefully.
S1 Computation history method 
The state doesn't give us all the information about a Turing machine. Recall that a configuration
of a Turing machine 푀 consists of the state, tape contents, and head position. We
have a convenient representation, where 푞 is written at the head position. 
A computation history of 푀 on 푤 is a sequence 퐶1, . . . , 퐶halt of configurations that 푀
enters. It's much easier to check these than to simulate a Turing machine outright. It is
possible to check computation history with many kinds of automata or combinatorial objects. 
We started with Turing machine 푀 and 푤, and the problem of whether 푀 accepts 푤. We 
found we can convert this into an instance of PCP where the only possible match corresponds 
to an accepting computation history. The undecidability of Hilbert’s tenth problem is shown 
using the same idea. Here one has to construct a polynomial in several variables (originally it
was 13 variables). One variable, say 푥, plays the role of the input to the polynomial. The only way
for the polynomial to have an integral solution is for the assignment to 푥 to be an accepting
computation history suitably encoded as an integer. The other variables are helpers, to
make sure the polynomial evaluates to 0 exactly when 푥 encodes an accepting computation
history. Polynomials are a rather restricted computational model, so the polynomial is rather
painful to present. (It would take an entire semester.)
Let 
ALL푃퐷퐴 = {⟨푃⟩ : 푃 is a PDA and 퐿(푃) = Σ*} .
It is the problem: does a pushdown automaton accept all strings? We use the computational 
history method to show the following. 
Theorem 11.1: ALL푃퐷퐴 is undecidable.
Proof. We reduce 퐴푇푀 to ALL푃퐷퐴. We take ⟨푀,푤⟩ and convert it to a pushdown automaton
푃푀,푤, such that if we can tell whether 푃푀,푤 accepts all inputs, then we can tell whether 푀 accepts 푤.
We construct 푃푀,푤 by having it operate on computation histories. However, instead of 
having 푃푀,푤 accept an accepting computation history, we have it accept every string except 
for this string. It is the sanitation engineer that accepts all the bad strings, junk, garbage. 
If 푀 doesn’t accept 푤, then there is no accepting history, so everything is junk, and 푃푀,푤 
accepts everything. If 푀 accepts 푤, then 푃푀,푤 accepts everything except one string. We 
feed 푃푀,푤 into a machine for ALL푃퐷퐴 to decide 퐴푇푀. 
How can we make a PDA to accept all the junk? It will use nondeterminism. It checks 
the configuration history to see if it 
∙ fails to start correctly, 
∙ fails to end correctly, 
∙ or fails to go from one step to the next correctly. 
푃푀,푤 has the starting configuration built in, so it can check whether the history starts 
correctly. If not, accept. One branch looks at the last configuration; if that is not right, 
accept. 
Now 푃푀,푤 scans through the computation history (nondeterministically). At a place
where it guesses there may be an error, it pushes 퐶푖 onto the stack, then pops 퐶푖 off as it
compares it to 퐶푖+1, checking that everything matches except that the stuff near the head is updated
correctly. However, 퐶푖 comes out of the stack in the reverse of the order it was pushed. The trick is to
require every other configuration to be written in reverse: 퐶1#퐶2^푅#퐶3#퐶4^푅# · · ·. If 푃푀,푤 finds a bug, then it accepts.
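A tiny hedged illustration of why the reversal helps (Python standing in for the PDA's stack; this simplified version checks plain equality of configurations, while the real PDA also allows the legal update near the head):

    # If C_{i+1} is written reversed on the input, a stack can compare it against
    # C_i, since popping undoes the push order.
    def stacks_can_compare(c_i, c_next_reversed):
        stack = list(c_i)                    # push C_i symbol by symbol
        for symbol in c_next_reversed:       # read C_{i+1}, which was written in reverse
            if not stack or stack.pop() != symbol:
                return False                 # found a mismatch ("a bug")
        return not stack

    # Identical configurations match only when the second copy is written in reverse:
    assert stacks_can_compare("abq3a", "abq3a"[::-1])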
Remark: Why don’t we do the original thing, accept only the accepting computation history 
and nothing else? But that would only prove 퐸푃퐷퐴 is undecidable. 
And in fact, we can't do that, because 퐸푃퐷퐴 is decidable! We would have to check that each configuration
legally yields the next. For instance, if we want to check that 퐶3 legally yields 퐶4, we
have a problem because we’ve already read 퐶3 when comparing it to 퐶2. We can’t push 퐶3 
and match it with 퐶4. This is an unfixable problem. 
S2 Recursion theorem 
The recursion theorem is an amazing theorem that gives a fitting end to the computability 
part of the course. 
It does some things that seem counter-intuitive. 
Can we make a Turing machine (or a program in any reasonable computation model)
such that when we turn it on, it prints out its own description? I.e., can
we make a self-reproducing Turing machine? Can we write a piece of code which outputs an 
exact copy of itself? 
We might argue as follows: We’d have to take a copy of the program and put it inside 
itself, and then another copy inside that copy, and so forth. We’d have to have an infinite 
program, with copies of itself down forever. 
But in fact we can make such a program, and it is useful in many cases. 
This theorem answers one paradox of life: Living things reproduce—make copies of 
themselves. Does that mean each living thing had its descendants inside, descendants of 
descendants inside those, down forever? No. Today, thoughts like that are so absurd they 
don’t even bear consideration. We don’t need to do that. 
Let’s make the paradox more concrete. Can we make a machine to build other machines? 
We can make a factory (suppose it’s fully automated) that makes cars. The factory is more 
complicated than the cars: It is at least as complicated because it has instructions for building 
the cars. It’s more complicated because it has all the machinery (robots and so forth). What 
if we wanted to make a factory that builds factories, identical copies of itself? It has robots 
which assemble a factory; it seems the factory would have to be more complicated than itself! 
But the Recursion Theorem says our intuition is false. We can make a factory-producing 
factory. 
There are practical situations where a program would produce copies of itself. Generally
these are malicious programs (depending on whose side you're on). This is one way to make a
computer virus: the virus obtains an exact copy of itself, transmits it to the victim computer,
installs the virus, and continues spreading. One way for a virus to obtain a copy of itself is the Recursion Theorem.
(The other way is to use the special nature of the machine to find the address of its own executable
and read the code.)
Theorem 11.2 (Recursion Theorem): We can make a Turing machine SELF
where, on blank input, SELF outputs ⟨SELF⟩.
We can do this in any programming language! 
The proof relies on the following lemma. 
Lemma 11.3: There is a computable function 푞 : Σ* → Σ* such that for every 푥, 
푞(푥) = ⟨푃푥⟩ 
where 푃푥 is a Turing machine that prints 푥 (on any input). Moreover, we can determine 
that Turing machine from 푥 in a computable way. 
Proof. Let 푃푥 =“print 푥.” 
(Note the function is called 푞 for quote, because in LISP, this function is represented by 
sending 푥 to “푥.) 
Proof. The TM SELF will have 2 phases 퐴 and 퐵 . Control passes from 퐴 to 퐵 after 퐴 is 
done. 
퐴 is very simple: 퐴 = 푃⟨퐵⟩. Now we have to say what 퐵 is. 
Why don’t we do the same thing to get the 퐴 part? Try to set 퐵 = 푃⟨퐴⟩. This is not 
possible. 퐴 is much bigger than 퐵. 퐴 is a 퐵-factory. You can’t take the description of 퐴 
and stuff it into 퐵; the same circular reasoning got us into trouble in first place. 
We don’t print out print out 퐴 by having a copy of 퐴 inside 퐵. So how does 퐵 find 
what 퐴 is? It computes 푞 of the string on the tape. 푞 of that string is 퐴! Let 퐵=“compute 
푞(tape) and prepend to tape.” 
We can do this in any programming language. Let’s do it in English. 
Print out this sentence. 
If you execute this code, out comes a copy of itself: it tells you, as the executor, to print out a
copy of itself. However, it cheats, because "this" is a pointer to itself. In general, there is
no pointer referring to the code. We show how to get the same effect, in software, without
self-reference, achieving the same goal without "this" referring to itself.
Here is the legit version. 
Print out two copies of the following, the second one in quotes. 
“Print out two copies of the following, the second one in quotes.” 
If you execute this command, you write the same thing. The A part is below, the B part is 
above. A is slightly bigger than B by virtue of quotes. 
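Here is the same A/B trick written as a short Python program (an illustration of mine, not from the notes). The string b plays the role of what A writes on the tape (a description of B), and the final line plays the role of B, which computes q of the tape contents and prepends it. The two code lines print exactly themselves; the comment lines, of course, are not reproduced.

    # The two lines below form a quine: running them prints exactly those two lines.
    # The assignment is the A part (it puts <B> on the "tape"); the print is the
    # B part (compute q of the tape and prepend it).
    b = 'b = %r\nprint(b %% b)'
    print(b % b)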
Why is the Recursion Theorem more than just a curiosity? Besides being philosophically 
interesting, it has applications in computability theory and logic. 
The Recursion Theorem in full generality says that a machine can obtain its own complete description
and then process that copy however it likes. Sometimes this is very helpful.
Theorem 11.4 (Full recursion theorem): For any Turing machine 푇, there is a TM 푅 where 
푅(푥) behaves the same as 푇(⟨푅, 푥⟩). 
Think of 푅 as a compiler which computes its own description. 
Proof. Figure 7. 
푅 has 3 pieces now, 퐴, 퐵, and 푇, where 퐴 = 푃⟨퐵푇⟩ and 퐵 is as before. 
Moral of story: 
We can use “get own description” in Turing machine code. 
Why would we want to do that? We give a new proof that 퐴푇푀 is undecidable. 
Proof. Assume for the sake of contradiction that 퐻 decides 퐴푇푀. Construct the TM 푅 = "On input 푥,
1. Get own description ⟨푅⟩.
2. Run 퐻 on ⟨푅, 푥⟩ to see whether 푅 accepts 푥.
3. Do the opposite of what 퐻 says: accept if 퐻 rejects, and reject if 퐻 accepts."
This is a contradiction because 푅 accepts 푥 iff 퐻 says 푅 doesn't accept 푥.
In a sense, the recursion method is standing in for the diagonalization argument here. 
Let’s give another application to something we haven’t proved yet. Let 
MIN = {⟨푀⟩ : 푀 is a TM with the shortest description among all equivalent TM’s} . 
Theorem 11.5: MIN is not Turing-recognizable.
Proof. Recognizable means enumerable. 
Assume by way of contradiction that 퐸 enumerates MIN. Make 푅 = "On input 푥,
1. Get own description ⟨푅⟩.
2. Run 퐸 until some machine 푀 appears whose description ⟨푀⟩ is longer than ⟨푅⟩.
3. Simulate 푀 on 푥."
(Such an 푀 exists because MIN contains machines with arbitrarily long descriptions.) Then 푅 is
equivalent to 푀 but has a shorter description, so 푀 is not a shortest machine for its language,
which contradicts the definition of MIN.
As a summary, here are a list of problems we’ve proven to be decidable, undecidable, and 
unrecognizable. (Keep in mind CFG=PDA for the purposes of computation.) 
∙ Decidable: 퐴퐷퐹퐴 (Theorem 7.1), 퐸퐷퐹퐴 (Theorem 7.3), 퐸푄퐷퐹퐴 (Theorem 7.4), 퐴퐶퐹퐺 
(Theorem 7.5), 퐸푃퐷퐴 (exercise), 퐴퐿퐵퐴 (Theorem 10.3). 
∙ Undecidable: 퐴푇푀 (Theorem 8.10), HALT푇푀 (Theorem 9.3), ALL푃퐷퐴 (Theorem 11.1), 
퐸푄퐶퐹퐺 (Theorem 7.10), 퐸퐿퐵퐴 (Theorem 10.4), PCP (Theorem 10.1). (Note: I haven’t 
checked whether these are recognizable.) 
∙ Unrecognizable: 퐴̄푇푀, 퐸푇푀 (Theorem 9.8), 퐸푄푇푀 and 퐸푄̄푇푀 (Theorem 9.9), MIN
(Theorem 11.5).
S3 Logic 
We’ll finish today with a quick tour of logic. This is something that takes weeks in logic 
course; we’ll do it in 10 minutes. 
Logic is the math of math itself. Logic formalizes what we mean by mathematical state-ments. 
For instance, 
휑 : ∀푥 ∃푦 [푦 < 푥].
We all believe we can formalize math and define what quantifiers mean. (This is several
weeks in a logic course.) This statement has meaning, but its truth depends on what universe the quantifiers
are quantifying over. For the natural numbers with the usual interpretation, this is false. If we instead
interpret it over R or Z, then it is true.
We have to give a universe for the quantifiers to range over and define all relation
symbols (here, "<"). Ordinary Boolean logic allows us to combine statements. We get a meaning
for the sentence, and it is either true or false in a given model.
Definition 11.6: A model is a universe with all relation symbols defined. 
For instance, a model of a statement 휑 is a model in which 휑 is true.
Let the universe be N with the relations + and ×. Let
Th(N, +, ×) = {all true sentences for this model}.
Skipping over details, you can imagine what we mean: some sentences are true, others
are not. Considering the sentences as strings, is this set decidable? Gödel and others showed
it is not decidable. We can write down sentences describing what Turing machines do; +
and × are expressive enough to describe Turing machines.
There are two notions, truth and provability. What does it mean to give a proof of a true 
statement? We mean that from the axioms, and simple rules of implication, you can have a 
chain of reasoning that gets to that statement. 
Consider the famous axioms called the Peano axioms. Can you prove all true things from
the Peano axioms? You cannot! You can make a recognizer for all provable statements: search
through all possible proofs until you find a proof of the statement in question. If every true
statement were provable, you could search in parallel for a proof of the statement or of its negation;
one search would succeed, and this would give a decider for truth. But such a decider doesn't exist, so some true statements are unprovable.
Can we exhibit a statement which is unprovable? Try: "This statement is unprovable."
If the sentence were false, it would be provable, and (assuming the axioms prove only true things)
provable statements are true, a contradiction; so it must be true, hence unprovable. So this
statement is true and unprovable. We've actually cheated by using self-reference,
but one can fix this using the recursion theorem.
Lecture 12 
Thu. 10/18/12 
Now that we’ve wrapped up the first half on computability theory, we have a midterm next 
Thursday, on the 3rd floor of Walker, at the usual time 11:00–12:30. It is open book (the 
book for this course only)/handouts/notes and covers everything through the last lecture. 
The test will contain about 4 problems. 
Last time we talked about 
∙ the recursion theorem, and 
∙ an introduction to logic. 
Today we’ll talk about 
∙ an introduction to complexity theory, 
∙ TIME (푡(푛)), and 
∙ 푃. 
We’re shifting gears to talk about complexity theory. Today we will introduce the subject 
and set up the basic model and definitions. 
S1 Introduction to complexity theory 
We have to go slow at the beginning to get the framework clear. We’ll begin by a simple 
example. In computability theory a typical question is whether a problem is decidable or 
not. As a research area that was mostly finished in the 50’s. Now we’ll restrict our attention 
to decidable languages. The question now becomes 
how much time or what resources do we need to decide?
This has been an ongoing area of research since the 60's.
Let 퐴 = {0^푘1^푘 : 푘 ≥ 0}. This is a decidable language (in fact, context-free). We want to
know how hard it is to see whether a string is in 퐴. We could measure hardness in terms of
the number of steps, but the number of steps depends on the input. Longer strings may take
more time, and within strings of the same length, some strings may take longer than others. For
instance, if the string starts with 1, we can reject immediately. The picture is a bit messy,
so to simplify (other treatments do different things), we'll only consider how much time
is necessary as a function of the length 푛 of the input. Among all inputs of a given length,
we'll try to determine the worst case; that is, we consider the worst-case complexity.
Summarizing, we consider how the number of Turing machine steps depends on the input 
length 푛, and look at the worst case. 
Recall that no matter whether we considered single tape, multi-tape, or nondeterministic 
Turing machines, what is computable remains invariant. This is not true for complexity 
theory: The picture will change depending on what model you use. 
We’re not seeking to develop a theory of one-tape Turing machines. Turing machines are 
a stand-in for computation. We want to try to understand computation: what can we do in
principle, in a reasonable amount of time? We don't want to just focus on Turing machines.
The fact that complexity depends on the model is a problem. Which should we pick? 
But as we will see, although it depends on model, it doesn’t depend too much, and we can 
recover useful theorems. 
1.1 An example 
Example 12.1: We analyze how much time it takes to decide 퐴 = {0^푘1^푘 : 푘 ≥ 0}.
Let 푀1=“(1-tape Turing machine) 
1. Scan the input to test whether 푤 ∈ 0*1*. We’ll make a pass over the input just to see 
it’s of the right form: no 1’s before 0’s. 
2. Go back to the beginning. Mark off 0 (turn it into another symbol), mark off a 1, then 
go back to mark off the next 0, then the next 1, and so forth. Continue until we run 
out of 0’s, 1’s, or both. If we run out of 0’s or 1’s first, then reject. If they run out at 
the same time, accept. 
(We needed to spell this out to know how much time the machine is using.) 
In summary, repeat until all symbols are crossed off: 
(a) Pass over input, cross off 0’s and 1’s. 
(b) If either finishes before other, reject. 
3. If all symbols are crossed off, accept. 
For this course, we won't care about the constant factors in the time used: 10푛² steps and
20푛² steps are equivalent for us. (We could have the machine do some work on the reverse pass, but
we won't bother.)
How much time does 푀1 take? 
Theorem 12.2: A 1-tape Turing machine can decide 퐴 using 푐푛² steps for all inputs of
length 푛, for some fixed constant 푐.
We specify the number of steps only up to constants, so it's convenient to have notation for this.
We'll refer to 푐푛² as 푂(푛²). This means at most a constant times 푛², where the constant is
independent of 푛.
The definition is spelled out in the book; see the definition there. 
Proof. Construct 푀1 is above. How long does each step take? 
1. Scan input to test 푤 ∈ 0*1*. This takes 푂(푛) time: 푛 steps to go forward and 푛 steps 
to go back. 
2. Repeat until all symbols are crossed off: we need at most 푛/2 passes.
(a) Pass over input, cross off 0’s and 1’s. This takes 푂(푛) time. 
(b) If either finishes before other, reject. 
3. If all crossed off, accept. 
Thus 푀1 takes time
푂(푛) + (푛/2) · 푂(푛) = 푂(푛²).
(the nice thing about 푂 notation is that we only have to look at the dominant term when 
adding. We can throw away smaller terms because we can absorb them into the constant in 
the dominant term.) 
Is this the best possible, or can we do better? Let's still stick to one-tape Turing machines. This is
not the best algorithm out there; we can find a better one. Here is a suggestion: zigzagging
over the input costs us a lot of time, so what if we cross off more 0's and 1's on each pass?
We could cross off two 0's and two 1's at a time, which takes half as much time. But we ignore constant
factors, so for our purposes this isn't really an improvement.
We ignore these improvements not because they're unimportant; in the real world,
saving a factor of 2 is good. However, we choose to ignore such questions
because we are looking at a different realm: questions that don't depend on constant factors,
or even on larger variations.
By ignoring some things, other things come out more visibly. For example, everything 
reduces to quarks, but it would not benefit biologists to study everything on the level of 
quarks. 
1.2 An improvement 
We can improve the running time to 푂(푛 log 푛); this is significant from our standpoint. Instead
of crossing out a fixed number of 0's and 1's, we'll cross off every other 0 and every other
1 (Fig. 2), remember the even/odd parity of the number of 0's and of 1's, and make sure
the parities agree on every pass. After every pass we go back to the beginning and repeat,
ignoring the crossed-off symbols. We check the parities on each pass; if they ever
disagree, which can only happen if the numbers of 0's and 1's are different, we reject. If
the parities agree on every pass, then there must be the same number of 0's as 1's. This is
because the parities read off the binary representations of the numbers of 0's and 1's (details
omitted).
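A hedged Python rendering of this idea (mine, not the notes'): comparing the counts of 0's and 1's one binary digit at a time, exactly as the crossing-off passes do.

    def in_A(s):
        if "10" in s:                    # not of the form 0*1*
            return False
        zeros, ones = s.count("0"), s.count("1")
        while zeros > 0 or ones > 0:
            if zeros % 2 != ones % 2:    # parities disagree on some pass
                return False
            zeros //= 2                  # crossing off every other 0 ...
            ones //= 2                   # ... and every other 1 halves the counts
        return True

    assert in_A("000111") and not in_A("00011") and not in_A("010")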
This is a more efficient algorithm, but less obvious than the original one. It looks like
there is still room for improvement, because we only need 푛 steps to read the input.
Could an algorithm run in 푂(푛)?
We can do 푂(푛) with 2 tapes as follows. Read across the 0’s, and copy them onto the 
2nd tape. Then read 1’s on the 1st tape and match the 0’s against the 1’s on the second 
tape. We don’t need to zigzag, and we can finish in 2푛 steps. 
In fact there is a theorem that we cannot decide 퐴 in 푂(푛) steps with a 1-tape Turing 
machine. If we can do a problem with 푂(푛) steps on a 1-tape Turing machine, then it is a 
regular language! (This is not obvious.) If the machine can only use order 푛 time, then the 
only thing it can do is regular languages: the ability to write doesn’t help. 
In fact, anything that takes time 표(푛 log 푛) must be a regular language as well. 
S2 Time Complexity: formal definition 
Which model should we pick to see how much time it takes? In computability theory we had 
model independence, the Church-Turing Thesis. Any model that we pick captures the same 
class of languages. Unfortunately, in complexity theory we have model dependence. Fortunately,
for reasonable models, the dependence is not very big. Some interesting questions
don’t depend (much) on the choice of “reasonable” models. 
In the meantime, we’ll fix a particular model, set things up using that model, and show 
things don’t change too much if we choose a different model. For convenience we’ll choose 
the same model we had all along, a 1-tape Turing machine, then show that it doesn’t change 
too much for other models. 
Definition 12.3: For 푡 : N → N, we say a Turing machine 푀 runs in time 푡(푛) if for all 
inputs 푤 of length 푛, 푀 on 푤 halts in at most 푡(푛) steps. 
For instance, we say 푀 runs in 푛² time if 푀 always halts within 푛² steps when we give it an
input of length 푛.
We now define the class of languages we can do in a certain number of steps. 
Definition 12.4: Define 
TIME(푡(푛)) := {퐴 : some TM decides 퐴 and runs in 푂(푡(푛)) time} .
This is called a time complexity class. 
We defined the time complexity classes using 1-tape Turing machines. For a 2-tape TM, what
is in the classes could change. Once we draw the picture, we can ask: is there a language
we can do in 푛² time but can't do in 푛 log 푛 time? We'll look at questions like this later on,
and be able to answer some of them.
2.1 Polynomial equivalence 
Even though the theory depends on model, it doesn’t depend too much. This comes from 
the following statement. 
Theorem 12.5: Let 푡(푛) ≥ 푛. Then every multi-tape TM 푀 that runs in 푡(푛) time has an
equivalent 1-tape Turing machine 푆 that runs in 푂(푡²(푛)) time.
In other words, converting a multi-tape TM to a single tape TM can only blow up the 
amount of time by squaring; the single tape TM can polynomially simulate the multi-tape 
TM. 
You might think this is bad, but for a computer, this is not too bad. It could be worse 
(exponential). 
Proof. We analyze the standard simulation (from the proof of Theorem 6.7). 
The conversion only ends up squaring the amount of time used. Indeed, it took the tapes 
and wrote them down next to each other on the tape. Every time the multitape machine 
푀 did one step, the single-tape machine 푆 had to do a lot of steps, and then do an update. 
One step of 푀 might have 푆 pass over the entire used portion of its tape. Each of 푀's tapes can be at most
푡(푛) symbols long, because there are only 푡(푛) steps in which it can write symbols, and there are a
constant number of tapes. Thus one pass takes at most 푂(푡(푛)) steps. The machine makes at
most 푡(푛) passes. Thus the total is 푂(푡(푛)²).
Here is an informal definition. 
Definition 12.6: Two computational models are polynomially equivalent if each can 
simulate the other with at most polynomial increase (푡(푛) can go to 푂(푡(푛)^푘) for some 푘).
All reasonable deterministic models of computation turn out to be polynomially equivalent.
This is the complexity analogue of the Church-Turing Thesis.
Axiom 12.7 (Church-Turing Thesis, complexity version): All reasonable deterministic
models are polynomially equivalent.
This includes one-tape TM’s, multi-tape TM’s, 2-dimensional TM’s, and random access 
machines (which are closer to a real computer) which can write an index and grab the 
memory cell at that location (the address). 
A real computer is a messy thing to discuss mathematically. It doesn't have an infinite
amount of memory; from some points of view, it is like a finite automaton. The most useful
way to abstract it is as a random access machine (RAM) or a parallel RAM (PRAM). If
the machine only has polynomial parallelism, then it is also polynomially equivalent.
The analogous question with nondeterministic TM's is hard. No one knows a polynomial
simulation. It is a famous open problem whether we can convert a nondeterministic TM to a
deterministic TM with only a polynomial increase in time.
2.2 P 
The complexity version of the Church-Turing Thesis 12.7 tells us the following. 
All reasonable deterministic models are polynomially equivalent. Thus, if we ignore 
polynomial differences, we can recover a complexity class independent of the model. 
Definition 12.8: Let 
푃 = ⋃_푘 TIME(푛^푘) = TIME(poly(푛)).
In other words, 푃 consists of all languages solvable in 푂(푛^푘) time for some 푘. Why is 푃
important?
1. The class 푃 is invariant under the choice of reasonable deterministic model. Time classes
change when we go from 1-tape to multi-tape TM's. But by using polynomial equivalence—
taking the union over all 푂(푛^푘)—the class 푃 is not going to change from model to
model. We get the same class 푃.
Mathematically speaking, this invariance is natural: 푃 is not a class about Turing
machines; it's about the nature of computation.
2. Polynomial time computability roughly corresponds to practical computability. It is a 
good litmus test: a good way of capturing what it means for a problem to be solvable 
practically. 
Of course, practicality depends on context. There is a continuum between practical 
and unpractical algorithms, but polynomial computability is a good dividing line. 
One feature of 푃 makes it mathematically nice, and one feature tells you something practical
about the real world. A mathematical notion with both these aspects is very good.
This is why 푃 is such an influential notion in complexity theory and throughout math. 
2.3 Examples 
Let’s look at something we can solve in polynomial time. Let 
PATH = {⟨퐺, 푠, 푡⟩ : 퐺 is a directed graph with a path from 푠 to 푡} . 
Theorem 12.9: PATH∈ 푃. 
The way to prove something like this is to give an algorithm that runs in polynomial 
time. 
Proof. "On input ⟨퐺, 푠, 푡⟩,
1. Mark the node 푠.
Repeat until nothing new is marked:
∙ Mark any node pointed to by a previously marked node.
2. Accept if 푡 is marked, and reject if not."
We start at 푠, mark everything we can get to in 1 step by marking nodes adjacent to 푠; 
then we mark nodes adjacent to those... This is a simple breadth-first search, not the best, 
but it runs in polynomial time. 
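A hedged Python sketch of this breadth-first search (the adjacency-list encoding of the graph is my own choice):

    from collections import deque

    # graph[u] is the list of nodes u points to.
    def path(graph, s, t):
        marked = {s}
        queue = deque([s])
        while queue:                      # repeat until nothing new is marked
            u = queue.popleft()
            for v in graph.get(u, []):    # mark nodes pointed to by marked nodes
                if v not in marked:
                    marked.add(v)
                    queue.append(v)
        return t in marked                # accept iff t got marked

    print(path({1: [2], 2: [3], 3: []}, 1, 3))   # True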
We will often omit time analyses unless it is not obvious. If each step runs in polynomial 
time, and all repetitions involve a polynomial number of repeats, then the problem is solvable 
in 푃. 
If we look at a similar problem, however, everything changes. 
Definition 12.10: A Hamiltonian path in a directed graph is a path that goes through every node exactly once. Let
HAMPATH = {⟨퐺, 푠, 푡⟩ : 퐺 is a directed graph with a Hamiltonian path from 푠 to 푡}.
Is HAMPATH ∈ 푃? The algorithm above doesn't answer this question. It's a decidable
problem because we can try every possible path, but there can be an exponential number of 
paths (in terms of the size of the graph). 
The answer is not known! This is a very famous unsolved problem. 
Lecture 13 
Tue. 10/23/12 
Absent because of sickness. 
Lecture 14 
Tue. 10/30/12 
Last time we talked about 
∙ NTIME(푡(푛)) 
∙ NP 
Today we’ll talk about NP-completeness. 
S1 P vs. NP 
Recall that P is the class of problems (languages) where we can test membership quickly (in 
polynomial time in the size of the input). NP is the class of problems where we can verify 
membership quickly. We verify via a “short certificate” or a “short proof.” The verifier 
would be convinced that the string is in the language. Hamiltonian path is a good example: 
membership is easily verified by giving the path. Nonmembership is trickier: No one knows 
whether there is a way to exhibit a short proof of the non-existence of a Hamiltonian path. The
complement of HAMPATH is not known to be in NP. 
We can always flip the answer in P. However, we can't do so easily in NP: the acceptance
structure can't be complemented easily in a nondeterministic Turing machine.
! The complement of a language in P is in P (coP = P). However, the complement of
a language in NP may not be in NP, because an NTM can't easily do the opposite of
what another NTM does.
The big problem in theoretical computer science is P versus NP. Most people believe 
P̸=NP: there is a larger class of languages that can be verified in polynomial time than can 
be solved in polynomial time. The other alternative is that 푃 = 푁푃. We’ve seen that SAT, 
HAMPATH, CLIQUE, etc. are in NP. 
This problem was posed in the early 1970's, though it had precursors in the literature
10–15 years prior. There is an amazing letter Kurt Gödel sent to John von Neumann in
1955–1956 about the problem, using different language: do we have to look for proofs by
brute force, or is there some quicker way? The problem has spread outside the computer
science community to the math community. P vs. NP is one of the Millennium problems, put
together by a committee in 2000 as the analogue of Hilbert's problems in 1900. Landon
Clay put in prize money for a solution: one million dollars.
S2 Polynomial reducibility 
Early progress on the P vs. NP problem gave the amazing theorem. 
Theorem 14.1: SAT ∈ P iff P = NP.
This would be important in a proof of the 푃 vs. 푁푃 problem. It might seem that you
have to find an algorithm for all NP problems. If you believe P = NP, all you have to do is
find an algorithm for SAT. On the flip side, to show 푃 ≠ 푁푃, all you have to do is pick
one problem and show it's in NP but not in P. But you might pick the wrong problem, for
instance compositeness (primality testing), which is actually in 푃. This theorem tells you
that you can just focus on SAT.
This is an application of the theorem to understanding the P vs. NP problem. If you 
think of problems in P as being easy, and problems outside being hard, and if you assume 
that P̸=NP, then this theorem tells you that SAT is not easy. This gives evidence that SAT 
does not have a polynomial time algorithm. 
Enough philosophical musings; let’s do math. We’ll work our way towards the proof of 
Theorem 14.1 today and finish next time. 
We use a notion that we’ve seen before—reducibility. 
Definition 14.2: 퐴 is polynomial time mapping reducible to 퐵 (퐴 ≤푃 퐵) if 퐴 ≤푚 퐵 
(퐴 is mapping reducible to 퐵) and the reduction is computable in polynomial time. 
In other words, the thing that does the mapping can be done quickly. Not only can you 
translate 퐴-questions to 퐵-questions, you can do so by a polynomial time algorithm. 
Just as we proved Proposition 9.7, we can show the following. 
Theorem 14.3: If 퐴 ≤푃 퐵 and 퐵 ∈ 푃, then 퐴 ∈ 푃. 
Let’s do an example. 
2.1 An example: 3SAT reduces to CLIQUE 
Example 14.4: 3SAT≤푃CLIQUE. 
Recall that 
SAT = {⟨휑⟩ : 휑 is a satisfiable Boolean formula} . 
In other words, it is the set of formulas 휑 that are true under some truth assignment to their
variables.
It’s convenient to consider Boolean formulas in a special form, 3CNF (conjunctive normal 
form). This means the formula looks something like 
(푥 ∨ 푦 ∨ 푧) ∧ (푥 ∨ 푤 ∨ 푦) ∧ · · · ∧ (푢 ∨ 푤 ∨ 푥). 
It is written as a bunch of clauses and'd together, and each clause is an "or" of 3 literals
(variables or negated variables). That's all we're allowed to do. The "3" means that we have
3 literals in each clause.
call 3SAT. 
3SAT = {⟨휑⟩ : 휑 is a satisfiable 3CNF formula} . 
We’ll focus on the 3SAT problem and the CLIQUE problem. 
The CLIQUE problem is very different. Given an undirected graph with nodes and edges, 
a 푘-clique is 푘 vertices all connected to one another. 
Define 
CLIQUE = {⟨퐺, 푘⟩ : 퐺 contains a 푘–clique} . 
I’m going to give a way to convert problem about whether or not a formula is in the 
3SAT language to whether a graph contains a 푘-clique. This is surprising! We’ll see that 
such conversions (reductions) are not just an interesting curiosity, but very important. 
We’ll do a proof by example. Suppose 
휑 = (푥1 ∨ 푥2 ∨ 푥3) ∧ (푥2 ∨ 푥3 ∨ 푥4) ∧ · · · ∧ (· · · ). 
A satisfying assignment is an assignment that makes the whole thing true. Because 휑 is 
made up of clauses and’d together, each clause has to be true. What does it mean for each 
clause to be true? We have to make at least one of the literals true. 
We have to pick out one literal and make it true. Thinking of the problem this way will 
be helpful to understanding the reduction to the CLIQUE problem. 
We will now convert 휑 to ⟨퐺, 푘⟩. We will have one node for each literal occurrence (three nodes per clause). It's
helpful to think of each node as being labeled by the associated literal. Now we put in the
edges. We put in all possible edges, with two exceptions.
1. Don’t put edges inside a clause (internal to one of the triples associated to a clause). 
Thus edges can only go from one clause to another clause. 
2. Never join two nodes that are associated to contradictory labels. 
All other edges will be there. 
As long as two literals are not contradictory in different clauses, they are connected by 
an edge. 
Let 푘 be the number of clauses. 
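A hedged Python sketch of this construction (the encoding is mine: a clause is a tuple of 3 literals, and a literal is a signed integer, so 2 means 푥2 and -2 means its negation):

    from itertools import combinations

    def sat_to_clique(clauses):
        # one node per literal occurrence, tagged by its clause index
        nodes = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
        edges = set()
        for u, v in combinations(nodes, 2):
            same_clause = u[0] == v[0]          # exception 1: same triple
            contradictory = u[1] == -v[1]       # exception 2: x and not-x
            if not same_clause and not contradictory:
                edges.add((u, v))
        k = len(clauses)                        # clique size = number of clauses
        return nodes, edges, k

    # (x1 v x2 v -x3) ^ (-x1 v x2 v x3) is satisfiable, so a 2-clique exists.
    nodes, edges, k = sat_to_clique([(1, 2, -3), (-1, 2, 3)])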
We just have to show that this is actually a reduction. That this can be done in polynomial
time is clear: by looking at the formula, we can easily write down the graph.
We have to show two directions. Now is where the interesting stuff happens; we'll understand
what's going on and why we drew this strange graph.
1. 휑 ∈3SAT =⇒ ⟨퐺, 푘⟩ ∈CLIQUE. 
Suppose 휑 is satisfiable; we have to exhibit a 푘-clique. Under a satisfying assignment, each clause has at least one
true literal. Pick out a true literal in each clause. Maybe the assignment makes 푥2
true; obviously it cannot also make 푥̄2 true. Maybe it makes 푥̄3 true. Now pick out the
associated nodes.
I claim those nodes form a clique. I have to show that every pair of nodes I’ve picked 
are connected by an edge. We put in all possible edges with 2 exceptions. We have to
show we don't run into either of the exceptions.
1. We only pick 1 node from each clause. 
2. We never pick two nodes with contradictory labels. We can’t pick two nodes with 
contradictory labels because they can’t be both true; we could not have picked 
both of them as the true literal in the clauses. One will be true and the other 
false in any assignment. 
We started with the certificate from 3SAT and produced a certificate for CLIQUE. 
2. 휑 ∈3SAT⇐ ⟨퐺, 푘⟩ ∈CLIQUE. 
Now we start with a 푘-clique. We reverse the argument. Look at the nodes we picked 
out as being in the same clique. Every node has to be from a different clause, because 
nodes in the same clause are not connected (1). Since there are 푘 clauses, we took one 
node from each clause. 
Take the nodes in the clique and make each corresponding literal true. For instance,
if 푥2 and 푥̄3 are in the clique, make 푥2 true and 푥̄3 true, i.e., 푥3 false. If a variable
is unassigned, assign it any which way. How do we know we didn't run into trouble?
We won't assign both a variable and its complement true, because contradictory nodes
can't be in the same clique (exception 2).
This makes at least one literal true in each clause, so the assignment is satisfying.
We're done, but note that we had to show both directions.
This means that if we find a polynomial time algorithm for CLIQUE, then we can solve 
3SAT quickly. We can convert 3SAT into a special CLIQUE problem. If you can solve 
general CLIQUE problems, then you can solve these special CLIQUE problems too, using 
our magical polynomial time algorithm to CLIQUE. 
Let’s lay out our game plan. We’ll show next lecture that every NP problem can be 
reduced to SAT. We’ll show 
SAT ≤푃 3SAT ≤푃 CLIQUE,HAMPATH, . . . 
(we just did 3SAT≤푃CLIQUE). What we did for 1 problem we’ll have to do for infinitely 
many problems. We’ll use the Boolean logic of SAT to simulate a Turing machine. This is 
similar to the proof of undecidability of PCP: we use combinatorial structure to simulate a 
Turing machine. 
Note that polynomial time reducibility is preserved by composition (exercise). 
S3 NP completeness 
We have a special name for problems that every NP problem can reduce to. 
Definition 14.5: A language 퐵 is NP-complete if 
1. 퐵 ∈NP. 
2. For every 퐴 ∈NP, 퐴 ≤푃 퐵 (퐴 is reducible to 퐵 in polynomial time). 
If we can reduce everything else in NP to 퐵, then 퐵 is an NP-complete problem. Condition
2 by itself is called NP-hardness. Rephrasing, 퐵 is NP-complete if 퐵 ∈ NP and 퐵 is NP-hard. (A
problem that is just NP-hard may be even harder than everything in NP.)
The picture is that NP-complete problems are at the "top" of the NP problems.
Proving the non-existence of reductions within NP is tricky business. A common question
is to give an example of an NP problem which is not NP-complete. But if 푃 = 푁푃, then all
problems in NP are essentially reducible to each other. If you can prove some NP problem
is not reducible to another NP problem, then you have a good result—you've just shown
푃 ≠ 푁푃. We're not going to show that in class; otherwise, I'd be off celebrating somewhere
in the Caribbean.
There is a special analogy between P and decidability and NP and recognizability. One 
key element is not in place, though. We don’t know whether the classes are different. Still, 
there are a lot of similarities. 
As we will show, everything in NP is reducible to SAT, so SAT is NP-complete (the Cook-Levin
Theorem).
Theorem 14.6 (Cook-Levin): SAT is NP-complete.
(This is equivalent to Theorem 14.1.) 
By composition of reductions, if SAT reduces to some other problem in NP, that problem is also
NP-complete. This will show that 3SAT, CLIQUE, HAMPATH, etc. are also NP-complete,
provided that we have the reductions.
Because 3SAT is NP-complete, to show another problem 퐵 is NP-complete, you just
have to do two things:
∙ Show 퐵 is in NP.
∙ Give a polynomial-time reduction from 3SAT to 퐵: 3SAT ≤푃 퐵.
When we’re doing reductions, we’re trying to find a way to simulate Boolean variables 
with structures in the target problems. 
To reduce from 3SAT to another language, design features or structures in the target problem that
behave like a variable or a clause in 3SAT. (Think of this as "learning
to program" using the CLIQUE, HAMPATH, etc. languages.) These features are called
gadgets: substructures in the target language which operate in the same way a variable
or clause does.
The best way to understand this is through example. 
Theorem 14.7: 3SAT≤푃HAMPATH. 
Proof. Start with a 3CNF, say 휑 = (푥1 ∨ 푥2 ∨ 푥3) ∧ (푥2 ∨ 푥3 ∨ 푥4) · · · . We construct ⟨퐺, 푠, 푡⟩. 
We build a graph that has a Hamiltonian path in it exactly when 휑 is satisfiable. (fig 6). 
We put in a bunch of nodes; all edges are directed downwards or horizontally. The diamond
structures will be associated to the variables: there will be one structure corresponding
to each variable (a bit different from last time, where we had one structure for each appearance
of a literal). The bottom node of a diamond is the same as the top node of the next.
For each diamond we have horizontal connections.
We have a Hamiltonian path right now: we could zig-zag or zag-zig
independently through each of the variable gadgets; we pick up all the nodes, and there's
nothing else we could do. Zig-zag is going to correspond to "true" and zag-zig is going to
correspond to "false." The Hamiltonian path is going to correspond to the truth assignment.
An important feature we haven't done yet is the clauses. We have to have an assignment
which makes at least one literal in each clause true. We let each clause gadget be a single node; a
Hamiltonian path has to go through each one. If 푥1 ∈ 퐶1 (clause 1), then we put in arrows as
in the diagram, allowing a detour from 푥1's diamond to visit 퐶1 if we're zig-zagging (going from left to right in that
diamond), but not if we're zag-zigging (going from right to left): (figure from
textbook)
This corresponds to 푥1 appearing as a positive literal. How do we implement the fact that
푥̄3 ∈ 퐶1? We allow the detour only in the right-to-left direction.
We leave a space before putting the next node, to give an opportunity to make several 
detours. 
Suppose an assignment has 2 true literals in some clause 퐶1. But that gives 2 detours to 
퐶1. We can only visit 퐶1 once. Is that a problem? No. A detour is an option—it’s not a 
broken-road detour, it’s a rest-stop type detour, if you don’t have to go, don’t. 
We have to prove that if we have a satisfying assignment, then we have a Hamiltonian
path. We zig-zag or zag-zig through each diamond according to the assignment, and take a detour to each clause node from one of its true literals.
For the converse, if the path is nice (consisting of zig-zags and zag-zigs), then we read off
a satisfying assignment, and we're done. If the path is not nice, i.e., at some stage it returns to a different
diamond from the one it came from, then the path cannot be Hamiltonian because
of the spacer nodes.
Lecture 15 
Thu. 11/1/12 
Last time we talked about 
∙ NP-completeness 
∙ 3SAT≤푃CLIQUE 
∙ 3SAT≤푃HAMPATH 
Today we’ll prove the Cook-Levin Theorem: SAT is NP-complete. 
We have 
(Every NP problem) ≤푃 SAT ≤푃 3SAT ≤푃 CLIQUE, HAMPATH, many others 
We’ll show the first inequality today and the second inequality in recitation. We know 
every problem on the right is NP-complete. (We don’t necessarily have to start with SAT 
or 3SAT. Sometimes it’s easier to study another NP-complete problem. For instance, to 
show UHAMPATH, the undirected version of Hamiltonian path, is NP-complete, we can 
just reduce the directed to the undirected version, HAMPATH≤푃UHAMPATH.) 
If we assume P ≠ NP, and if we show a problem is NP-complete, then it cannot be solved
in polynomial time. Thus being NP-complete is very strong evidence of intractability: the
problem is too hard to solve in practice. What is remarkable (and not well understood) is
that typical problems in NP, with few exceptions, turn out to be in P or NP-complete. This 
is mysterious and currently has no theoretical basis. 
Thus, given a problem, researchers often spend part of the time showing it’s solvable in 
polynomial time and part of the time showing it’s NP-complete. This works well most of 
the time. 
There are some problems, though, that seem to be outside of P, but we don't know
how to prove they are NP-complete. For instance, the problem of testing if 2 graphs are
isomorphic—whether they are the same graph but labeled differently—is in NP: the short
proof is the mapping of the vertices. No one knows whether the graph isomorphism problem
is solvable in polynomial time, nor has anyone shown it is NP-complete. It’s one of few 
problems that seem to be hovering in between. Another example is factoring integers. 
Define
coNP = {퐴 : 퐴̄ ∈ NP}.
We have P ⊆ NP ∩ coNP. (P is closed under complement, so P = coP.) It's generally believed
that NP-complete problems cannot be in coNP, because otherwise NP = coNP.
There are problems in the intersection, for instance, factoring is a problem in NP∩coNP. 
Naively it’s a function, but we can turn it into a decision problem. Think of numbers as 
written in binary, and call it the bit factoring problem: 
BIT-Factoring = {⟨푥, 푖⟩ : 푖th bit of largest prime factor of 푥 is 1} . 
BIT-Factoring is in NP because nondeterministically we can guess the prime factorization 
of 푥 and check that the largest prime factor has a 1 in the 푖th place. 
The complement is also an NP problem: the 푖th bit is a 0. We can check that in exactly
the same way. 
If BIT-Factoring is in 푃, then we can factor numbers in polynomial time. We believe 
that factoring is not in P, so this problem seems to not be in P. This suggests the problem 
is not NP-complete. 
S0 Homework 
The first four questions are clear. For one of them keep in mind dynamic programming as a 
technique. (Context-free languages are testable in polynomial time. It is in a sense the most 
basic polynomial time algorithm.) 
Problem 5 asks you to show that under the assumption P=NP, there exists an algorithm
that operates in polynomial time which not only tests whether a formula is satisfiable,
but produces a satisfying assignment. A tempting argument: there is a nondeterministic
algorithm which finds the assignment, and because P=NP, there is a deterministic polynomial
time algorithm which finds the assignment. But it is conceivable that the polynomial time
algorithm for satisfiability operates not by finding the assignment, but only by saying whether
the formula is satisfiable.
You have to show that if the program operates by some other way, you can turn it into 
an algorithm to find the assignment. 
In order to produce a satisfying assignment, you will end up testing whether multiple 
formulas are satisfiable. Out of the decisions from those tests, you can assemble the satisfying
assignment to the original formula. How can you at least get a little bit of information about 
the satisfying assignment? 
Problem 6 says that minimizing NFA's cannot be done in polynomial time unless P=NP. By
contrast, it is known that minimization for DFA's can be done in polynomial time.
S1 Cook-Levin Theorem 
Theorem 15.1 (Cook-Levin, Theorem 14.6 again): SAT is NP-complete.
Proof. 
1. SAT∈NP: This is easy: guess a satisfying assignment. 
2. Let 퐴 ∈NP. We have to show 퐴 ≤푃SAT. Assume we have an NTM 푀 for 퐴 so that 푀
runs in 푛푘 time. 
The idea is as follows. We have to give a polynomial time reduction 푓 : 퐴 →SAT. It will take 
a string 푤 and convert it to some formula 휑푤. The function 푓 maps a membership question 
in 퐴 to a membership question in SAT; we will have 푤 ∈ 퐴 exactly when 휑푤 is satisfiable. 
푓 : 퐴 → SAT 
푤↦→ 휑푤 
푤 ∈ 퐴 iff 휑푤 is satisfiable. 
Think of 휑푤 as saying whether 푀 accepts 푤. 
The construction of 휑푤 is as follows. It will be 4 pieces AND'd together:
휑푤 = 휑cell ∧ 휑start ∧ 휑move ∧ 휑accept 
We’ll describe the computation of 푀 on 푤 in a certain way. 
Define a tableaux for 푀 on 푤 to be a table where the rows are configurations of 푀 
on 푤. Write down the tape with the head symbol to the left of the symbol it’s looking 
at (cf. the PCP proof 10.1). Each row is a configuration. The sequence of rows you 
get is a computation history. Remember 푀 is nondeterministic, so there may be multiple 
computation histories. If 푀 accepts 푤, there is an accepting branch, and we can write 
down an accepting computation history with the starting configuration at the top and the 
accepting configuration at the bottom. 
Does there exist such a tableaux? If 푀 does not accept 푤, there is no accepting computation
history, so there is no tableaux. The question we're trying to answer is whether a
tableaux exists.
We’re trying to make a formula which says a tableaux exists. Is there some way of setting 
cells to symbols such that the whole thing is a legitimate tableaux? We make indicator 
variables for each cell: think of each cell as having a bunch of little lights; one light for each 
possible setting the cell could be: 푎, 푏, 푞0, etc. If the light for 푎 is on, then the cell has an 푎 
in it. 
The variables of 휑푤 are 푥푖푗휎 where 1 ≤ 푖, 푗 ≤ 푛^푘 (we're assuming the machine runs for 푛^푘
steps; the maximum number of cells it could use is 푛^푘)11 and 휎 ∈ Γ ∪ 푄 (휎 is in the tape alphabet
or 휎 is a state). There are |Γ ∪ 푄| · 푛^{2푘} variables 푥푖푗휎, which is polynomial in 푛.
11 Technically we may need 푐푛^푘 just to cover 푛 = 1, but this is a minor issue.
휑cell: In order for the variables to correspond to a valid tableaux, exactly 1 symbol per cell has to
be assigned. If we turn on several lights for some cell, this would correspond to multiple
symbols, and we don’t want that. We have to make sure we’re turning on exactly one light; 
exactly one variable becomes true for each (푖, 푗). This is the first piece 휑cell. 
휑cell says that there is exactly one symbol per cell or equivalently, exactly one 푥푖푗휎 is true 
for each 푖, 푗: 
휑cell := ⋀_{1≤푖,푗≤푛^푘} [ ( ⋁_{휎∈Γ∪푄} 푥푖푗휎 ) ∧ ( ⋀_{휎̸=휏} ( ¬푥푖푗휎 ∨ ¬푥푖푗휏 ) ) ].
The first part ensures that at least one of these lights is “on,” and the second ensures that at
most one of the lights is on (for every pair of distinct lights, at least one of
them is off). Together they say exactly 1 variable is true. The assignment has to correspond
to one symbol in each cell of the tableaux. 
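To make the bookkeeping concrete, here is a minimal sketch in Python (mine, not from the lecture) of how the 휑cell constraints could be generated; the encoding of a variable 푥푖푗휎 as a triple (i, j, s) and the helper name phi_cell are assumptions of the sketch.

from itertools import combinations

def phi_cell(nk, symbols):
    # Generate the "exactly one symbol per cell" clauses for an nk-by-nk tableaux.
    # A variable is a triple (i, j, s) meaning "cell (i, j) contains symbol s";
    # a literal is (variable, True) for positive and (variable, False) for negated.
    clauses = []
    for i in range(1, nk + 1):
        for j in range(1, nk + 1):
            # at least one light is on in cell (i, j)
            clauses.append([((i, j, s), True) for s in symbols])
            # at most one: for every pair of distinct symbols, at least one is off
            for s, t in combinations(symbols, 2):
                clauses.append([((i, j, s), False), ((i, j, t), False)])
    return clauses

# tiny example: a 2-by-2 tableaux over a toy alphabet
example = phi_cell(2, ["a", "b", "q0"])

The number of clauses produced is polynomial in nk and in the alphabet size, matching the size estimate above.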
휑start: Now we want to say in the very first row, the variables are set to be the start 
configuration. 휑start says that the start configuration is 
푞0 푤1 푤2 · · · 푤푛 ␣ · · · ␣, a row of total length 푛^푘 padded out with blank symbols ␣.
Hence we let 
휑start = 푥1,1,푞0 ∧ 푥1,2,푤1 ∧ 푥1,3,푤2 ∧ · · · ∧ 푥1,푛+1,푤푛 ∧ 푥1,푛+2,␣ ∧ · · · ∧ 푥1,푛^푘,␣ .
휑accept: Now let’s do 휑accept. The very last row is an accepting configuration; namely the 
machine is in the accept state. (What if the machine stops sometime earlier? We assume that 
the rules of the machine say it stays in the accepting state for the “pseudo-steps” afterward.) 
We let 
휑accept = ⋁_{1≤푗≤푛^푘} 푥푛^푘,푗,푞accept ,
which says that the accept state appears somewhere in the last row.
휑move: Finally, we need to say the machine moves correctly. To do this out in full gory
detail is a bit of a mess (like the PCP problem). I’ll just convince you that you can do it. 
We pick out a 2 × 3 neighborhood, or window, from the tableaux, and specify what it
means for it to be a legal neighborhood (figure from textbook). For any given setting of symbols in the
2×3 neighborhood, we can ask whether it could possibly arise according to the rules of the 
machine. There are certain legal settings and certain illegal settings. For instance, if the
machine, when in state 푞3 reading an 푎, can write 푐, move to the right, and go to state 푞5 as
one of its possible nondeterministic steps, then
푞3 푎 푏 
푐 푞5 푏 
is legal, whereas 
푞3 푎 푏 
푐 푞5 푑 
is illegal. 
There are some subtleties, for instance, 
푎 푏 푐 
푑 푏 푐 
may be legal: the state symbol could sit just to the left of the window, so the head was reading 푎 and changed it; but something
like 
푎 푏 푐 
푎 푑 푐 
is never possible. By looking at the transition function of 푀, we can determine which of 
the 6-symbol settings are legal and which are not. We need to check whether every single 
window is legal. If every single window is legal then all moves are legal. 
This depends critically on the window being 2 × 3. If it were just a 2 × 2 window it 
wouldn’t work. The tableaux can be globally wrong but locally right if we only look at 2×2 
windows. If the machine is in state 푞2, and it can go to 푞3 and go left, or 푞5 and go right, 
then you have to make sure you exclude things like 
푎 푞2 푎 
푞3 푎 푞5 
. 
A 2 × 3 window is just big enough to catch this; this is the only thing that can go wrong.
Thus we let 
휑move = ⋀_{1≤푖,푗≤푛^푘} ((푖, 푗) neighborhood is legal),
i.e., more precisely,
휑move = ⋀_{1<푖<푛^푘, 1≤푗<푛^푘} ⋁_{(푎,푏,푐,푑,푒,푓) a legal window} (푥푖−1,푗,푎 ∧ 푥푖,푗,푏 ∧ 푥푖+1,푗,푐 ∧ 푥푖−1,푗+1,푑 ∧ 푥푖,푗+1,푒 ∧ 푥푖+1,푗+1,푓).
We “or” over all possible ways to set cells to symbols to get a legal window. That can be a 
lot but it’s a fixed number. 
We have 2 things that remain: first, we need to show this is correct, i.e., 푤 is in the 
language iff 휑푤 is satisfiable. Now 푤 being in the language means there is some accepting
computation history, i.e., some valid tableaux, i.e., some setting of variables that satisfies 
휑푤. This should be clear from the construction. The pieces of the formula are designed to 
force the variables to be set according to some valid accepting tableaux. 
We also have to check the reduction can be done in polynomial time. This is easy to 
confirm. First, how large is 휑푤? Ignoring constant factors, the size is about as large as 
the number of cells in the tableaux, which is polynomial in 푛. Actually, writing down the 
formula can be done in about the same time as the size of the formula. The steps themselves 
are simple. It’s just a lot of output, but still polynomial. The actual thinking to produce 
the output is simple. 
S2 Subset sum problem 
Let's look at the subset sum problem:
SubSum = {(푎1, . . . , 푎푘, 푡) : some subset of 푎1, . . . , 푎푘 sums to 푡} . 
This is an NP problem because you can just guess the subset that sums to 푡.
Theorem 15.2: The subset sum problem is NP-complete. 
Proof. We show that 3SAT reduces to SubSum. Suppose we are given a 3-cnf 휑 = (푥1 ∨ 푥2 ∨ 푥3) ∧ (· · · ) · · · (· · · ). How do we make gadgets in SubSum that simulate the variables and
clauses of the 3SAT problem? 
In the choice of what the subset looks like, there are some binary choices: pick or not 
pick. We want to make them correspond to binary choices for the variables. 
A binary choice is whether or not 푎1 is in the subset. We modify this a bit: 푥1 set to true
or false is symmetrical, while 푎1 being in the subset or not is less symmetrical, so we'll
do something in the same spirit. Each variable is represented by 2 values. The
target sum is designed in such a way that exactly one of the two values has to appear in the subset.
Here’s the construction. We’ll write the values in decimal. Having 1’s in 푡 forces exactly 
one of 푎1, 푎2 to appear, and similarly for each pair 푎2푘−1, 푎2푘. 푎1, 푎2 is the 푥1 gadget, 푎3, 푎4 
is the 푥2 gadget, and so forth; 푎1 corresponds to 푥1 true and 푎2 corresponds to 푥1 false, and 
so forth. In the table (figure from textbook), we write 푎2푘−1, 푎2푘 as 푦푘, 푧푘. We have columns corresponding
to each clause, and put 1’s in cells when the literal corresponding to the row is in the clause 
corresponding to the column. 
Now we put 2 extra 1's in each clause column (two slack values per clause); the target has a
3 in each clause column. If the chosen literals contribute no 1's in some clause column, then even
with the 2 extra 1's we are not going to get 3. If we have at least one 1 from the chosen literals,
then we can add extra 1's to get 3, and we are done.
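The table itself is in the textbook; as a hedge against the missing figure, here is a sketch in Python (mine) of the standard construction under the conventions above: literals written as +i and -i, two slack values per clause, target digits 1 for variables and 3 for clauses. The helper names are assumptions of the sketch.

def three_sat_to_subset_sum(num_vars, clauses):
    # clauses: list of clauses, each a list of nonzero ints, +i for x_i and -i for its negation.
    # Numbers are built digit by digit in decimal: first the variable digits, then the clause digits.
    k = len(clauses)

    def make_number(var_digit, clause_cols):
        digits = [0] * (num_vars + k)
        if var_digit is not None:
            digits[var_digit - 1] = 1
        for j in clause_cols:
            digits[num_vars + j] = 1
        return int("".join(map(str, digits)))

    numbers = []
    for i in range(1, num_vars + 1):
        numbers.append(make_number(i, [j for j, c in enumerate(clauses) if i in c]))   # y_i: x_i true
        numbers.append(make_number(i, [j for j, c in enumerate(clauses) if -i in c]))  # z_i: x_i false
    for j in range(k):                          # two slack values per clause column
        numbers += [make_number(None, [j]), make_number(None, [j])]

    target = int("1" * num_vars + "3" * k)      # one of y_i, z_i per variable; a 3 per clause column
    return numbers, target

# (x1 or x2 or x3) and (not x1 or not x2 or not x3)
nums, t = three_sat_to_subset_sum(3, [[1, 2, 3], [-1, -2, -3]])

No decimal carries can occur, since each clause column receives at most 3 + 2 ones.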
Lecture 16 
Tue. 11/6/12 
We’re going to shift gears a little bit. Having finished our discussion of time complexity—the 
number of steps it needs to solve one problem—we’re going to look at how much memory 
(space) is needed to solve various problems. We’ll introduce complexity levels for space 
complexity analogous to time complexity, and complete problems for these classes. 
Last time we proved the Cook-Levin Theorem: SAT is NP-complete. 
Today we’ll do 
∙ space complexity 
∙ SPACE(푠(푛)), NSPACE(푠(푛)) 
∙ PSPACE, NPSPACE 
∙ Examples: TQBF, LADDERDFA 
∙ Savitch’s Theorem. 
S0 Homework 
Problem 1: 
On exponentiation modulo a number. We can do the test even though the numbers are 
very big, say all 푛-bit numbers. The naive algorithm—just multiplying over and over—takes 
exponential time, because the magnitude of the number is exponential in the size of the 
number. 
If you want to raise a number to the 4th power, you can multiply it 3 times or square it 
twice. Using this squaring trick you can raise a number to high powers, even exponents that are not
powers of two. 
There are real applications of raising numbers to powers in modular arithmetic, for instance
in cryptography.
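Here is a minimal sketch (mine) of the squaring trick; the function name mod_exp is an assumption, and Python's built-in three-argument pow does the same job.

def mod_exp(base, exponent, modulus):
    # Repeated squaring: O(log exponent) multiplications, and every intermediate
    # value stays below modulus**2, so the work is polynomial in the bit lengths.
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                        # current low-order bit of the exponent is 1
            result = (result * base) % modulus
        base = (base * base) % modulus          # square for the next bit
        exponent >>= 1
    return result

assert mod_exp(7, 2**50 + 3, 1000003) == pow(7, 2**50 + 3, 1000003)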
Problem 2 (Unary subset sum problem): 
A number in unary is much bigger to write down than, say, in binary. The straightforward 
algorithm—looking through all possible subsets—doesn’t give a polynomial time algorithm 
because there are exponentially many subsets. Instead, use dynamic programming. The key 
observation is that you can ignore the target. Just calculate all possible values you can get by 
looking at the subsets. There are exponentially many subsets, but only polynomially many 
different values you can obtain for their sums. Think about how to organize your progress 
carefully. Dynamic programming gives you a way to organize your progress. 
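A minimal sketch (mine) of the dynamic program: track the set of achievable sums rather than the subsets themselves.

def unary_subset_sum(values, target):
    # With unary inputs, the number of distinct achievable sums up to the target
    # is polynomial in the input length, even though there are exponentially many subsets.
    achievable = {0}
    for v in values:
        achievable |= {s + v for s in achievable if s + v <= target}
    return target in achievable

assert unary_subset_sum([3, 5, 7], 12) is True
assert unary_subset_sum([3, 5, 7], 11) is False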
Problem 3: 
This is an important problem. If P=NP, then everything in NP is NP-complete. This
is important for 2 reasons. It shows that proving a problem is not NP-complete is pretty
hopeless: there can be no simple way of showing a problem is not NP-complete, because then
we would get the amazing consequence P̸=NP.
The fact really comes from the fact that all problems in P are polynomial time reducible 
to one another. This is important to understand, because the issue comes up repeatedly in 
different guises. This is a nice exam-type question that can be asked in a variety of different 
ways. 
This is similar to the fact that all decidable problems are mapping-reducible to one another.
This is a basic concept to understand in putting together the theory the way we do 
it. 
Problem 4: 
The 3-coloring problem is NP-complete. The book gives gadgets you might use. The 
palette is a structure you might want to use in your reduction. If you imagine trying to 
color your graph in 3 colors, and you have this structure, the 3 colors must all appear in
the palette. (The palette is like the set of colors the artist has to work with.) When you 
color the graph with 3 colors, we don’t know what colors they are but we can arbitrarily 
assign them names, say True, False, and Red. Thinking of colors as truth values helps you 
understand the rest of the connection. 
In the variable gadget, a node of the palette (the red node) happens to be connected to 
2 other nodes connected to each other. If it is 3-colorable, then we know the 2 nodes are 
not red, so are either true-false or false-true. That binary choice mirrors the choice of truth 
assignment to some variable. That’s why this structure is called a variable gadget. It has 
two possibilities. 
But you have to make sure the coloring corresponds to a satisfying assignment. That's
what the other gadgets help you to do. Play with the or-gadget. Try assigning values at the 
bottom and see what values are forced elsewhere. 
Problem 5: 
If P=NP then you can not only test formulas, you can find the assignment. Find the 
assignment a little bit at a time. 
Problem 6 (Minimizing NFA’s): 
Find an automaton with the fewest number of states possible equivalent to the
original one. For DFA's, there is a polynomial time algorithm. No such algorithm is known for
NFA's. In fact, if you could minimize NFA's in polynomial time, then P=NP. Imagine what would happen if you could
minimize the automaton you ended up constructing. That would turn out to be useful. 
S1 Space complexity 
1.1 Definitions 
Definition 16.1: A Turing machine runs in space 푠(푛), where 푠 : N → N, if it halts using 
at most 푠(푛) tape cells on every input of length 푛, for every 푛. 
The machine uses a tape cell if its head moves over that position on the tape at some 
time. We’re only going to consider the case 푠(푛) ≥ 푛, so we at least read the entire input. 
The head has at least passed over the input; it might use additional space beyond the input. 
We assume the machine halts on input of every length. 
This is entirely analogous to time complexity. There, instead of measuring space used, 
we measured time used. 
We can define space use for deterministic and nondeterministic machines. For a nondeterministic
machine to run in space 푠(푛), it has to use at most 푠(푛) tape cells in every branch. We
treat each branch independently, seeing how many tape cells are used on that branch alone. 
We now define space complexity classes. 
Definition 16.2: Define 
SPACE(푠(푛)) = {퐴 : some TM decides 퐴 running in 푂(푠(푛)) space} 
NSPACE(푠(푛)) = {퐴 : some NTM decides 퐴 running in 푂(푠(푛)) space} . 
Think of these as the collection of languages some machine can do within 푠(푛) space. 
1.2 Basic facts 
Let’s show some easy facts, some relationships between space and time complexity. 
Proposition 16.3: For 푠(푛) ≥ 푛, 
TIME(푠(푛)) ⊆ SPACE(푠(푛)). 
This also works for NSPACE and NTIME. 
Proof. Suppose we can do some problem with 푠(푛) time. Then there is a TM that can 
solve that problem with at most 푠(푛) steps on any input of length 푛. I claim that
language is also solvable in space 푠(푛). If you can do something with 푠(푛) steps you can do 
it in 푠(푛) space, by using the same algorithm. The machine can only use at most 푠(푛) tape 
cells because in each additional step it uses at most 1 more tape cell. 
Let’s do containment in the other direction. Space seems to be more powerful than time: 
the amount of stuff doable in space 푛 might take a lot more time. 
Proposition 16.4: For 푠(푛) ≥ 푛, 
SPACE(푠(푛)) ⊆ TIME(2^{푂(푠(푛))}) = ⋃_{푐>0} TIME(푐^{푠(푛)}).
This also works for NSPACE and NTIME. 
Think of 푐 as the size of the tape alphabet. 
Proof. Consider a machine running in space 푠(푛). 
It can’t go on too long without repeating a configuration; if it halts it can’t repeat a 
configuration. The number of configurations is at most exponential in 푠(푛), so the time is 
at most exponential in 푠(푛). 
Definition 16.5: Define 
PSPACE = ⋃_푘 SPACE(푛^푘),
NPSPACE = ⋃_푘 NSPACE(푛^푘).
We define these because they’re model independent like P and NP. 
Corollary 16.6: P⊆PSPACE and NP⊆NPSPACE. 
Proof. This follows from TIME(푠(푛)) ⊆ SPACE(푠(푛)). 
The following starts to show you why space is more powerful than time. 
Theorem 16.7: NP⊆PSPACE. 
Now we have to do something nontrivial. All we know is that we have a nondeterministic 
polynomial time algorithm for the language. That's not going to tell you that you can decide the
same language with a polynomial time algorithm on a deterministic machine.
Proof. 1. We first show SAT∈PSPACE: Use a brute force algorithm. You wouldn’t want 
to write down the whole truth table. But you can cycle through all truth assignments 
one by one, reusing space to check whether they are satisfying assignments. If you go 
through all assignments and there are no satisfying assignments, then you can reject. 
The total space used is just enough to write down the current assignment. Thus 
SAT∈SPACE(푛). (A brute-force sketch in code appears right after this proof.)
2. If 퐴 ∈NP then 퐴 ≤푃SAT. The polynomial time reduction can be carried out in polynomial
space. If you have an instance of an NP problem, then you can map it to SAT
in polynomial time, and use the fact that the SAT problem can be done in polynomial 
space. 
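Here is the brute-force sketch promised in step 1 (my own illustration, with literals written +i and -i); only the current assignment is ever stored.

from itertools import product

def sat_brute_force(num_vars, clauses):
    # Cycle through assignments one at a time, reusing the space for each one.
    # Exponential time, but the space is only what is needed to hold one assignment.
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            return True
    return False

assert sat_brute_force(2, [[1, 2], [-1, 2]]) is True
assert sat_brute_force(1, [[1], [-1]]) is False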
This theorem illustrates the power of completeness. Note that we had to make sure the
reduction is capable of being computed by algorithms within the class (PSPACE).
Then we showed that a complete problem for NP (namely SAT) is in PSPACE,
so all problems reducible to it are also in PSPACE. Thus the whole class (NP)
becomes a subset of the class you're working with (PSPACE).
(You can also give a more direct proof.) 
Theorem 16.8: CoNP⊆PSPACE. 
Proof. When you have deterministic machines, and you want the complementary language,
you can just flip the answer at the end. Deterministic complexity classes are closed under 
complement. Just solve the NP problem and take the complement. 
For instance, the unsatisfiability problem is in coNP, hence is in PSPACE. The complement of
HAMPATH is in coNP, hence is in PSPACE. In fact, UNSAT and the complement of HAMPATH are coNP-complete.
1.3 Examples 
Let’s do a slightly less trivial example of a problem in PSPACE. Then we’ll give an example 
of a problem in NPSPACE. 
Example 16.9: Here is a Boolean formula: 
(푥 ∨ 푦) ∧ (푥 ∨ 푦 ∨ 푧). 
We put quantifiers in front. Quantifiers range over boolean values. 
∀푥∃푦∀푧[(푥 ∨ 푦) ∧ (푥 ∨ 푦 ∨ 푧)]. 
This formula says: For every truth assignment to 푥 there exists a truth assignment to 푦 such 
that for every truth assignment to 푧 the statement is true. This is a quantified Boolean 
formula. We assume every variable gets quantified. 
We formulate the general computational problem as a language: TQBF, true quantified
Boolean formulas.
TQBF = {⟨휑⟩ : 휑 is a true quantified Boolean formula} . 
This problem is in a sense a generalization of satisfiability. The satisfiability problem is
the special case where all quantifiers out front are ∃: is there a setting to all variables that 
makes the formula true. 
TQBF seems to be harder. It is in polynomial space, but not known to be in NP. Why 
is it solvable in polynomial space? 
It turns out TQBF is PSPACE-complete. We first have to show it’s in PSPACE. This 
isn’t too hard. 
Theorem 16.10: TQBF∈PSPACE.
Let’s assume that you can plug in constant values (trues/falses) in certain locations. 
Proof. Break into cases. On input ⟨휑⟩, 
1. If there are no quantifiers, then there are no variables, so evaluate 휑 and accept if true. 
2. We give a recursion. If 휑 starts with ∃푥, evaluate recursively for 푥 true and false. 
Accept if either accepts. 
3. If 휑 starts with ∀푥, evaluate recursively for 푥 true and false. Accept if both accept. 
In this way the machine evaluates all possibilities while reusing the space!
This uses space 푂(푛). 
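A minimal sketch (mine) of this recursion, with quantifiers written 'E'/'A' and literals +i/-i; the single assignment array is reused, mirroring how the machine reuses its space.

def eval_tqbf(quantifiers, matrix):
    # quantifiers: e.g. ('A', 'E') means "forall x1 exists x2"; matrix is a CNF over x1..xn.
    n = len(quantifiers)
    assignment = [False] * n

    def evaluate(i):
        if i == n:                       # no quantifiers left: evaluate the formula
            return all(any(assignment[abs(l) - 1] == (l > 0) for l in clause)
                       for clause in matrix)
        results = []
        for value in (False, True):
            assignment[i] = value        # reuse the same cells for both recursive calls
            results.append(evaluate(i + 1))
        return any(results) if quantifiers[i] == 'E' else all(results)

    return evaluate(0)

# forall x1 exists x2 [(x1 or x2) and (not x1 or not x2)] is true; swapping the quantifiers makes it false
assert eval_tqbf(('A', 'E'), [[1, 2], [-1, -2]]) is True
assert eval_tqbf(('E', 'A'), [[1, 2], [-1, -2]]) is False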
Now let's look at nondeterministic space complexity.
Here’s a word puzzle: convert one word to another by changing one word at a time, and 
staying in the language. For instance, suppose we want to convert ROCK to ROLL. We 
can't convert ROCK to ROCL because ROCL is not an English word. It may be helpful
to change the R to something else to enable us to change last letters: ROCK, SOCK, SULK, 
BULK, BULL, BOLL, ROLL. We’ll consider a similar problem. 
Define the language as the set of strings some finite automaton accepts. 
Definition 16.11: Define
LADDERDFA 
= {⟨퐵, 푠, 푡⟩ : there is a sequence 푠 = 푠0, 푠1, . . . , 푠푘 = 푡, 푠푖, 푠푖+1 differ in one character, all 푠푖 ∈ 퐿(퐵)} 
What’s the complexity of testing this? This problem is solvable in nondeterministic 
polynomial space. 
Theorem 16.12: LADDERDFA ∈ NPSPACE.
Proof. (by example) Nondeterministically change one letter, check to see if the word is still 
in the language. We test at every stage if we ended up at ROLL. We have to be careful not 
to end up in a loop. The machine cannot remember everything it's done. Instead, it counts
how many words it has looked at so far. If the number is too high, it must have looped. 
1.4 PSPACE vs. NPSPACE 
There is a rather surprising general theorem that tells you PSPACE and NPSPACE are the 
same. The analogue to P vs. NP for space complexity is solved. 
PSPACE = NPSPACE. 
This is not obvious! If you try a backtracking algorithm in the obvious way, then it blows 
up the space to be exponential. 
Is this an NP problem? The certificate (the ladder) could be exponentially long! The NPSPACE
machine is allowed to guess on the fly, and the number of steps is potentially exponential. It's not known
to be in NP. (The input consists of the automaton, the starting string, and the ending string. The
automaton is not the dominant piece; the starting and ending strings are.)
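To see why the obvious deterministic approach is wasteful, here is a sketch (mine, not the algorithm from the lecture) of a breadth-first search for LADDERDFA: the seen set can grow to exponentially many strings of the given length, which is exactly the space blowup mentioned above.

from collections import deque
import string

def ladder_bfs(accepts, s, t, alphabet=string.ascii_uppercase):
    # accepts(word) plays the role of the DFA B; we search over words reachable
    # from s by single-character changes, all of which must be accepted by B.
    if len(s) != len(t) or not (accepts(s) and accepts(t)):
        return False
    seen, queue = {s}, deque([s])
    while queue:
        word = queue.popleft()
        if word == t:
            return True
        for i in range(len(word)):
            for c in alphabet:
                nxt = word[:i] + c + word[i + 1:]
                if nxt not in seen and accepts(nxt):
                    seen.add(nxt)            # this set is the exponential-space culprit
                    queue.append(nxt)
    return False

words = {"COLD", "CORD", "CARD", "WARD", "WARM"}
assert ladder_bfs(lambda w: w in words, "COLD", "WARM") is True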
S2 Savitch’s Theorem 
We have the following remarkable theorem. 
Theorem 16.13 (Savitch): For 푠(푛) ≥ 푛,
NSPACE(푠(푛)) ⊆ SPACE(푠(푛)2) 
Corollary 16.14: PSPACE=NPSPACE. 
This is because we have just a squaring. 
Proof. Given an 푆(푛)-space NTM 푁, we construct an equivalent deterministic TM 푀 that uses 푂(푆(푛)²)
space. 
Imagine a tableaux of 푁 on 푤, corresponding to some accepting computation branch. 
This time, the dimensions are different: the width is 푆(푛), how much space we have, and the
height is 푐^{푆(푛)} for some 푐. We want to test if there's a tableaux for 푁 on 푤, but we want to
do it deterministically. 
Can we fill it in somehow? It’s an exponentially big object, and we’ll be in trouble if we 
have to keep it all in memory—we don’t have that much memory. 
The deterministic machine tries every middle configuration sequentially. (This takes a 
horrendous amount of time but we only care about space.) 
For each candidate middle configuration, ask: can you get from the top to the middle in time
(1/2)푐^{푆(푛)}, and from the middle to the bottom in time (1/2)푐^{푆(푛)}? Now ask this recursively,
until we get down to adjacent configurations.
How deep is the recursion going to be? The depth of the recursion is log2(푐^{푆(푛)}) = 푂(푆(푛)).
What do we have to remember every time we recurse? The working midpoint configurations. 
For each level of the recursion we have to write down an entire configuration. The configuration
takes 푆(푛) space, so each level costs 푂(푆(푛)) space. Hence the total is 푂(푆(푛)²)
space. 
You can implement this in the word-ladder problem: write down a conjecture for the 
intermediate string. See if you can get from the start to it, and from it to the end, in half as much time. This is slow, but it runs
in relatively small space. 
Lecture 17 
Thu. 11/8/12 
Problem set 5 is out today. It is due after Thanksgiving, so you can think about it while 
you’re digesting. 
Last time we talked about 
∙ space complexity 
∙ SPACE(푠(푛)), NSPACE(푠(푛)) 
∙ PSPACE, NPSPACE 
∙ Savitch’s Theorem says that PSPACE=NPSPACE. 
Today we will 
∙ finish Savitch’s Theorem. 
∙ Show TQBF is PSPACE-complete. 
S1 Savitch’s Theorem 
Recall the following. 
Theorem (Savitch, Theorem 16.13 again): For 푠(푛) ≥ 푛, 
NSPACE(푠(푛)) ⊆ SPACE(푠(푛)2). 
Savitch’s Theorem says that if we have a nondeterministic machine, we can convert it to 
a deterministic machine using at most the square of the amount of space.
Nondeterminism only blows up space by a square, not an exponential. The proof is not 
super hard but it is not immediately obvious. 
Proof. For an NTM 푁 using space 푆(푛), with configurations 퐶1, 퐶2, write 퐶1 →^푡 퐶2 (“퐶1 yields
퐶2 in 푡 steps”) if 푁 can go from 퐶1 to 퐶2 in at most 푡 steps. We give a recursive, deterministic
algorithm to test 퐶1 →^푡 퐶2 without using too much space.
We will apply the algorithm to 퐶1 = 퐶start, 퐶2 = 퐶accept, and 푡 = 푑^{푆(푛)}.
We may assume 푁 has a single accepting configuration, by requiring the machine to clean 
up the space when it is done (just like children have to clean up their room). It puts blanks 
back, moves its tape head to the left, and only then does it enter the accept state. 
The basic plan is to make a recursive algorithm. 
We will inevitably have to try all possibilities, but we can do so without using too much 
space. The machine zooms to the middle and tries configurations sequentially one after another as candidates for the midpoint—think of it
as cycling like an odometer through all possible configurations (of symbols and the tape 
head). This is horrendously slow, but it can reuse space. Once it has a candidate, it solves 
2 problems of the same kind recursively: can we get from the top to the middle in half the 
time, and once we’ve found a path to the middle, we ask can we get from the middle to the 
bottom? Note that in the second half of the procedure, the machine can reuse space from the
first half. 
The machine continues recursively on the top half, splitting it into two halves and asking 
whether it can get between the configurations in a quarter of the original time. 
The recursion only goes down a logarithmic number of steps, until it gets to 푡 = 1. There 
are on the order of 푆(푛) levels. To check whether one configuration follows another in 1 step, 
just simulate the machine. How much do we have to write down every time we recurse? We 
have to write down the candidate for the middle. Each time we recurse we have a new 
configuration to write down. 
We summarize the algorithm below. 
On input 퐶1,퐶2, 푡, do the following. 
1. For 푡 > 1, for each configuration 퐶MID test if 퐶1 →^{푡/2} 퐶MID and 퐶MID →^{푡/2} 퐶2, reusing
the space.
Accept if both accept (for some 퐶MID). (Then 퐶1 can get to 퐶2 in 푡 steps.)
2. If 푡 = 1, accept if 퐶1 can legally yield 퐶2 in 1 step of 푁 or if 퐶1 = 퐶2.
The number of levels of recursion is log2(푑^{푆(푛)}) = 푂(푆(푛)). Each level requires storing a
configuration 퐶MID and uses 푂(푆(푛)) space. The total space used is
푂(푆(푛)) · 푆(푛) = 푂(푆(푛)²).
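Here is a minimal sketch (mine) of the recursive test; step(c) and configs stand in for the machine's one-step relation and its set of configurations, and only the chain of midpoints currently being tried is stored.

def can_yield(c1, c2, t, step, configs):
    # Can c1 reach c2 in at most t steps? Decided deterministically by trying
    # every candidate midpoint, odometer style, and recursing on both halves.
    if t <= 1:
        return c1 == c2 or c2 in step(c1)
    for c_mid in configs:
        if (can_yield(c1, c_mid, t // 2, step, configs) and
                can_yield(c_mid, c2, t - t // 2, step, configs)):
            return True
    return False

# toy example: reachability within 4 steps in a tiny directed graph
edges = {1: {2}, 2: {3}, 3: {4}, 4: set()}
assert can_yield(1, 4, 4, lambda c: edges[c], edges) is True
assert can_yield(4, 1, 4, lambda c: edges[c], edges) is False

The recursion depth is log2 t, and each level remembers one midpoint, which is where the 푂(푆(푛)²) bound comes from.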
This tells us PSPACE=NPSPACE. Let’s draw the picture of the world. (If 푆(푛) is 
polynomial, 푆(푛)² is still polynomial.)
Let’s move to the second topic for today. 
S2 PSPACE–completeness 
It is a famous problem whether P=NP. We know NP⊆PSPACE; we can also ask whether 
P=PSPACE. If a language can be decided in polynomial space, can it also be decided in polynomial time? Unbelievably,
we don't know the answer to that either. For all we know, the whole picture
collapses down to P! 
A few (wacky) members of the community believe P=NP. No one believes P=PSPACE. That
would be incredible. 
What we do have is the notion of NP–complete. There is a companion notion of PSPACE– 
completeness. Every problem in PSPACE is reducible to a PSPACE–complete problem. 
This is interesting for some of the same reasons that NP–complete problems are interesting. 
Showing a problem is PSPACE–complete is even more compelling evidence that it is outside P,
because otherwise P=PSPACE. Complete problems for a class give you insight into what that class
is about, and how hard its problems are.
PSPACE–completeness has something to do with determining who has a winning strategy 
in a game. There is a tree of possibilities in a game, and a structure to that tree: I win if 
for every move you make there exists a move I can make such that... This is the essence of 
what PSPACE is about. While we don’t know P̸=PSPACE, we do know that P is not equal 
to the next one up: EXPTIME. You can prove P̸=EXPTIME. That is the first point up the hierarchy
where our techniques allow us to prove that two classes are actually different.
Note there is a tradeoff: more time, less space vs. more space, less time. There are 
results in these directions, but we won’t do them. For instance, there are Savitch’s Theorem 
variants which trade off time for space by cutting the recursion at different points.
2.1 Definitions 
This should look familiar, but there’s one point we have to make clear. 
Definition 17.1: We say that 퐵 is PSPACE–complete if 
1. 퐵 ∈PSPACE. 
2. For every 퐴 ∈PSPACE, 퐴 ≤푃 퐵. 
We are still using polynomial time reducibility. Why polynomial time? It’s important to 
realize if we put PSPACE, that would be stupid. If 퐴 is polynomial space reducible to 퐵, 
what would happen? This is related to the homework due today. The reduction could solve
the problem itself and then just output a fixed instance of 퐵 with the right answer.
Thus, every 2 problems in P are polynomial time reducible to one another, and every 2 problems
in PSPACE are polynomial space reducible to one another. If we used polynomial space
reducibility, every problem in PSPACE would be PSPACE–complete. This is not interesting. You have to use a
reduction less powerful than the class you're studying. A reduction is a transformer of problems,
not a solver of problems. 
If you have a PSPACE–complete problem, and you can solve it in polynomial time by 
virtue of some miracle, then every other PSPACE problem can be solved in polynomial time, 
and we’ve pulled down all of PSPACE into P. 
It’s important to understand why we set it up this way! 
2.2 TQBF is PSPACE–complete 
An example of a PSPACE problem is TQBF (true quantified boolean formulas, where all 
variables are quantified by ∀’s and ∃): 
TQBF = {⟨휑⟩ : 휑 is a true quantified Boolean formula} . 
For instance, ∀푥∃푦(푥 ∨ 푦). 
Theorem 17.2: TQBF is PSPACE–complete.
The proof will be a recap of stuff we’ve seen plus 1 new idea. 
Proof. 1. TQBF∈PSPACE: We saw last time that recursing on assignments gives a linear 
space algorithm (Theorem 16.10). 
2. Let 퐴 ∈PSPACE be decided by a TM 푀 in space 푛푘. 
We give a polynomial time reduction 푓 from 퐴 to TQBF, 푓 : 푤↦→ 휑푤, such that 휑푤 
“says” 푀 accepts 푤. 휑푤 captures 푀 running on 푤; so far this is the same idea as that 
in the Cook-Levin Theorem. 
Consider a tableaux of 푀 on 푤, with width 푆(푛) and height 푑^{푆(푛)}. 푀 is deterministic,
so there is just 1 possibility for the rows to be the computation history. As in Cook-Levin, 
we can try to build 휑푤 the same way. That gives us a correct formula. 
The difference is that before we were talking about satisfiability. We can just put ∃ quantifiers out front to make it a TQBF. This doesn’t work; why? How big is the formula? 
It’s as big as the tableaux, exponentially big! You can’t write down an exponentially big 
formula in polynomial time. We need a shorter formula which expresses the same thing. 
The 휑푤 from Cook-Levin is too big.
This is why the idea from Cook-Levin by itself is not enough. 
First we solve a more general problem. Let’s solve the problem for 퐶1,퐶2, 푡: give 휑퐶1,퐶2,푡 
which says 퐶1 →^푡 퐶2. It will be helpful to talk about any 2 configurations, and being able to
go from one to another in a certain amount of time. Even Cook-Levin would give you that: 
just use 퐶1 and 퐶2 in place of the start and end configuration. But this viewpoint allows us 
to talk about the problem in a different way. 
As a first attempt, we can construct 휑퐶1,퐶2,푡 saying 퐶1 →^푡 퐶2 by writing
휑퐶1,퐶2,푡 = ∃퐶MID(휑퐶1,퐶MID,푡/2 ∧ 휑퐶MID,퐶2,푡/2) 
and constructing subformulas recursively. 
Why can we write down ∃퐶MID? Really a configuration is represented by a bunch of variables,
so ∃퐶MID is shorthand for ∃푥1∃푥2 · · · ∃푥ℓ. If 푡 = 1, then we can write 휑퐶1,퐶2,1 directly as in
Cook-Levin.
But have we done anything? The semantics of the formula are correct. All this is saying 
is that we can get from 퐶1 to 퐶2 in 푡 steps iff there is some midpoint such that we can get 
from 퐶1 to the midpoint in half the time and from the midpoint to 퐶2 in half the time. (This 
smells like Savitch’s theorem. There is more than meets the eye!) We cut 푡 in half at the 
expense of creating 2 subproblems. The number of levels of the recursion is fortunately only 
푑. Here 푆(푛) = 푛푘. 
We end up with polynomial time steps, but we double the size of the formula each time, 
so it’s still exponential. 
We ended up not doing anything! This shouldn’t come as a total surprise. We’re still 
only using the ∃ quantifier. This is still a SAT-problem! We haven’t used the full power of 
TQBF, which uses ∃ and ∀’s. 
Now and’s and ∀’s are 2 flavors of the same thing. ∃’s are like or’s. We’re going to get 
rid of the “and.” This looks like cheating but it’s not: 
휑퐶1,퐶2,푡 = ∃퐶MID ∀(퐶3,퐶4) ∈ {(퐶1,퐶MID), (퐶MID,퐶2)} (휑퐶3,퐶4,푡/2).
There is a fixed cost out front, and a single new subformula at each level, not doubled formulas, so
there is no blowup. We need to show this is legitimate. Note that ∃퐶MID stands for a string 
that is 푂(푛^푘) = 푂(푆(푛)) long. The same is true of the ∀ quantifier. Let's rewrite the ∀ in more
legal language: 
∀(퐶3,퐶4) ∈ {(퐶1,퐶MID), (퐶MID,퐶2)} (휑퐶3,퐶4,푡/2)
= ∀퐶3 ∀퐶4 [((퐶3,퐶4) = (퐶1,퐶MID) ∨ (퐶3,퐶4) = (퐶MID,퐶2)) → 휑퐶3,퐶4,푡/2]
This is the trick! This was done at MIT by Larry Stockmeyer in his Ph.D. thesis. It is called 
the Meyer-Stockmeyer Theorem. 
How big is this formula? We start off with an exponential number of steps 푑^{푆(푛)} = 푑^{푛^푘},
so the number of recursions is 푂(푛^푘). Each level adds 푂(푛^푘) stuff out front, so the size of
the formula is 푂(푛^{2푘}). Its size is polynomial, but it does have a squaring.
We see in both Savitch’s Theorem 16.13 and Theorem 17.2 the following concept. 
Recursion using middle configurations makes things polynomial, not exponential! 
In fact, the proof of Theorem 17.2 implies Savitch's Theorem: 푀 could have been a nondeterministic
Turing machine and the proof would still work! Hence, every nondeterministic polynomial-space
computation can be reduced to TQBF.
If a nondeterministic machine is reduced to TQBF, there is a squaring in the formula size. Note TQBF can
be done in linear space: a deterministic machine goes through all assignments, and solves
TQBF in linear space. This gives a different proof of Savitch's Theorem.
2.3 PSPACE–completeness and games 
PSPACE–complete problems can look like games. TQBF doesn’t look like a game, but we’ll 
see it really does. 
We’ll see other PSPACE–complete problems that are more strictly “games.” My son, all 
he does is XBox. There is a kind of game we used to play before XBox, called geography. 
Choose some starting city, like Cambridge. Two players take turns. You have to pick a place 
whose first letter is the same as the last letter of the previous place. Cambridge ends with E, so
you could pick Edinburgh; Edinburgh ends with H, so I can pick Hartford, you Denver, I Raleigh, and so on. The first person who gets stuck
loses. One more rule: no repetitions. 
We can model this game as a graph. All cities are nodes. 
Arrows correspond to legal moves. Let’s abstract the game, and forget the labels. We 
take turns picking some path through the graph. It has to be simple: no repeats. If you get 
stuck somewhere with no place to go you lose. Depending on how you play, you might win 
or lose. The question is, if you play optimally, who wins? Given one of these graphs, which 
side has the win? We’ll show this problem is PSPACE–complete by reducing TQBF to this 
problem. 
Lecture 18 
Tue. 11/13/12
Last time we showed TQBF is PSPACE-complete, analogous to how SAT was complete for 
NP. 
Today we’ll talk about 
∙ generalized geography 
∙ games 
∙ log space: L and NL 
S1 Games: Generalized Geography 
Recall our generalized geography: Boston goes to New York City, Newark, etc., Newark goes 
to Kalamazoo, etc. 
One important class of PSPACE problems are these games: Given an initial configuration
of a game, the moves allowed, and a rule for who has won, which player has the upper
hand? If both sides play the best possible strategy, who will win? Many of these problems 
are in PSPACE. 
We’ll look at an example, generalized geography, and show that deciding who has a 
winning strategy is a PSPACE-complete problem. 
In generalized geography, we give a bunch of geographical names, for instance cities; each 
city called out has to start with the letter that the previous one ended with. The starting 
person picks Boston; the second player has to pick a place starting with 푁. Say Newark. 
The first person then starts with 퐾: Kalamazoo. The person who gets stuck because there is no place
to move to loses. You can draw a graph that shows the possible moves. 
Abstracting, we erase the names and just remember the graph. Two players I and II take 
turns picking nodes of a simple path. The first one unable to move loses. Let 
GG = {⟨퐺, 푎⟩ : Player I has a winning strategy (forced win) in 퐺, starting at 푎} . 
Here’s an example. 
In general, figuring out who has winning strategy is not so easy: it is PSPACE–complete. 
The proof is nice: it reveals connections between quantifiers and games. 
Theorem 18.1: GG is PSPACE-complete. 
Proof. Like showing a problem is NP-complete, we start off with a problem we already know 
to be PSPACE-complete. We have to show two things. 
1. GG∈PSPACE. (This is easy, a straightforward recursive algorithm.) 
2. TQBF≤푃GG. (This is the interesting, fun part.)
To make sense of this reduction, we look at the TQBF problem in a different way, as a game. 
Let 휑 be a quantified Boolean formula, for instance 
휑 = ∃푥1 ∀푥2 ∃푥3 ∀푥4[Ψ]. 
We know how to test whether this is true: Calculate and see if it works out. Put this aside 
for a moment. 
Let’s create a game for the formula. There are two players: one of them is called ∃ and 
the other is called ∀. This is how you play the game. The players take a look at a formula. 
Start with ∃’s turn. ∃ gets to pick the value of 푥1. Then it is ∀’s turn. ∀ gets to pick the 
value of the next variable 푥2, and so forth. (There may be several variables in a row with the
same quantifier, but we can always throw in dummy variables so they alternate.) 
∃ picks values of the ∃ variables; ∀ picks values of the ∀ variables.
The two players have opposing desires. ∃ is trying to pick values of variables to make 
the formula true at the end, to make variables satisfy the formula. ∀ is trying to do the 
opposite: make the formula false. 
∃ wins if the chosen values of 푥1, . . . , 푥4 satisfy Ψ, and ∀ wins if the chosen values don't
satisfy Ψ. 
I don’t know if this game will be a big seller. However, it is a valid game: each player is 
trying to achieve an objective, and at end, we know who won. 
Who has the winning strategy? 
The cool thing is that we’ve already run into this problem. This is exactly the same as 
the TQBF problem. ∃ has a winning strategy exactly when it is a true quantified boolean 
formula. What does it mean to have winning strategy? It means that under optimal play, 
the ∃ player can make the formula true. In other words, there exists some move, such that 
no matter what the ∀ player does for 푥2, there exists some move 푥3... Whether ∃ has
a winning strategy is the same as the truth value of the formula: whether there exists
some value, such that for all... 
This is just a different view of the truth value. 
With this representation, we can reduce from TQBF to GG, i.e., show TQBF≤푃GG. The 
technique is reminiscent of SAT reductions: We make gadgets, etc. The way you put them 
together, though, is different because there is a dynamic game component to it. Playing the 
geography game simulates playing the formula game. The gadgets work a little differently.
We send 휑 ↦→ ⟨퐺, 푎⟩, with ∃ and ∀ corresponding to players 퐼 and 퐼퐼.
Player I will be like ∃ and player II will be ∀. 
The graph will have a sequence of diamonds. Let’s look at a fragment and think about 
how it proceeds. ∀ starts at the top. ∀ has no choice. ∃ player has a choice. Now ∀ has a 
choice. This simulates the choice of the variables. The first diamond is the gadget for 푥1, 
the second for 푥2. (Figure from book) 
If ∀ appeared twice in a row, then we wouldn’t have an extra node, which just served to 
switch whose turn it is. 
After the diamond at the very bottom, all truth values for variables have been chosen. 
Let’s assume going left corresponds to T and right corresponds to F. In the variable game, 
the game is over. In generalized geography, we’re not finished because we want to arrange 
more structure—an endgame—so that the ∃ player wins iff the formula is satisfied. 
There is one node for each of the clauses. The ∀ player picks a clause. The ∀ player is 
claiming, or hoping, that the clause is unsatisfied. (We can assume it is in CNF, just like we 
reduced SAT to 3SAT.) ∀ is trying to demonstrate that the formula not satisfied, by picking 
the unsatisfied clause. “You didn’t win because clause 2 is not satisfied.” 
(The one who tells the truth is going to be the one who ultimately wins.) Each clause 
points to all its literals. Psychologically, ∀ claims 푐1 not satisfied. ∃ says it is satisfied because 
푥1 is true. Now it’s the moment of truth. It is ∀’s turn. The positive literal is connected 
to the true side of the variable gadget; negated literals get connected to the false side. ∃ claims “this
clause is satisfied by 푥1.” If ∃ was right, and earlier the game had gone through the true side 
of 푥1, then the ∀ player can’t move. If the ∀ player is right, then play went down the other 
way, ∀ can move, and now ∃ is stuck. 
All these nodes and arrows are laid down before the play begins. We build gadgets up 
front, one for each variable, and lay down nodes for each clause. We connect the nodes 
corresponding to the literals in the clauses to the left or right side depending on whether they are
positive or negative. Playing the generalized geography game is just playing the formula game.
Player I has a winning strategy exactly when its counterpart ∃ has a winning strategy in the
formula game.
This shows TQBF is PSPACE-complete, and hence probably a hard problem. 
Similar results have been proven for natural games: the game of Go is played on a 19×19 
board; 2 players have 2 colors of stones, each trying to surround the other person’s stones. 
Determining which side has a winning strategy in Go from some preset board configuration 
is PSPACE–hard: you can reduce GG to the Go problem. There are structures in Go which 
correspond to moving through the GG configuration, and playing the Go game corresponds to playing
GG. There are 2 remarks in order. We actually have to generalize Go: the 19×19 game is a finite problem;
we consider an 푛 × 푛 board. All results are asymptotic. (If we only considered 19 × 19, then
the problem is just a big table lookup.)
Go is at least PSPACE-hard. Whether it's in PSPACE depends on details of how the
game is set up. A special rule might let the game go on for a very long time, so this depends
on details of the definition of the game. PSPACE–hardness has been proven for other games:
푛 × 푛 checkers, and 푛 × 푛 chess (which is less natural). 
We now shift to a different set of classes, still in space complexity. 
S2 Log space 
Instead of talking about polynomial space, we’ll talk about a whole different regime, called 
log space. It has its own associated complexity classes and natural problems. 
We look at SPACE(log 푛) and NSPACE(log 푛). Now the space bound is smaller than the size
of the input. In order to make sense of this, we need to introduce a different model:
just by reading the entire input, the machine would use space 푛, so that is no sensible way to talk
about log space. Instead, we allow the machine to read the entire input, but have a limited
amount of work space. 
Thus we consider a multitape Turing machine, with a 
1. input (read-only) tape, and a 
2. work (read/write) tape. 
We will only count the space used on the work tape. The input is given for free.
We can talk about (log 푛)-bounded work tapes. As usual, a constant factor is allowed.
Definition 18.2: Define 
L = SPACE(log 푛), 
NL = NSPACE(log 푛), 
Why 푂(log 푛)? Why not 푂(√푛), etc? 
log 푛 is a natural amount of space to provide. It is just enough to keep track of a pointer 
into the input. A constant times log 푛, for instance 7 log 푛, is enough to keep track of 7 pointers
into your input. 
Using log space, a machine can keep track of a finite number of pointers. 
We’ll do a couple of examples. 
Example 18.3: We have that the set of palindromes is in log-space. 
{푤푤ℛ : 푤 ∈ {0, 1}*} ∈ L.
The machine zigzags back and forth on the input. It can’t make any marks on the input, 
only keep track of stuff on the work tape. This is still good enough. The machine keeps 
track of how many symbols are already matched off; a fixed number of pointers enable this. 
For instance, it could record that it has already matched off 3 symbols, and is now looking 
at the 4th on the left or right. The machine uses a log-space work tape. 
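A minimal sketch (mine) of the pointer-based check: read(i) models probing the read-only input tape, and the work tape only ever holds the two indices.

def is_even_palindrome(read, n):
    # Accept inputs of the form w w^R; only the two pointers i, j (O(log n) bits) are stored.
    if n % 2 == 1:
        return False
    i, j = 0, n - 1
    while i < j:
        if read(i) != read(j):       # compare matched-off symbols
            return False
        i, j = i + 1, j - 1
    return True

w = "0110"
assert is_even_palindrome(lambda i: w[i], len(w)) is True
assert is_even_palindrome(lambda i: "0100"[i], 4) is False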
We’re considering machines with separate read-only input. The input may be enormous: 
for example, input from a CD-ROM or DVD-rom, onto your small little laptop. The laptop 
doesn’t have enough internal memory to store all of it. 
A better analogy is that the read-only input tape is the Internet, huge. You can only 
store addresses of stuff and probe things. What kinds of problems can you solve, if you have 
just enough memory to write down the index of things? 
Example 18.4: We have
PATH = {⟨퐺, 푠, 푡⟩ : 퐺 has a directed 푠, 푡 path} ∈ NL. 
A nondeterministic machine can put a pointer on the start node, then nondeterministically 
choose one of the outgoing edges from the start node. It remembers only the current node 
it moved to. It forgets where it came from. The machine repeats. The machine jumps node 
by node nondeterministically, and accepts if it hits 푡. 
The machine has enough space to remember a node, which is logarithmic space, and also
space to count how many nodes it has visited, so it can quit if it has visited too many 
vertices. 
Can we solve PATH deterministically in log-space? Consider an enormous graph written 
down over the surface of the US, with trillions and trillions of nodes. Can you with 20 friends 
(or however many facebook friends you have), each just keeping track of where you are (a 
pointer into a location), operating deterministically, figure out whether you can get from 
some location to another? You can communicate by walkie-talkie (or by Internet). 
Nobody knows the answer. Whether PATH is solvable deterministically (PATH ∈ 퐿?) is 
an unsolved problem. 
In fact the L vs. NL problem is open just as P vs. NP is open. There are NL-complete 
problems. If you can solve any of them in L, then you bring all of NL to L. PATH turns out 
to be complete for NL. We’ll start to prove that. 
S3 퐿,푁퐿 ⊆ 푃 
Before that, let’s look at the connection between L, NL, and the classes we’ve already seen. 
Theorem 18.5: L⊆P. 
Proof. If 퐴 ∈L and TM 푀 decides 퐴 using 푂(log 푛) space, we have to show there is a 
deterministic machine that solves 퐴 in polynomial time. 
How many configurations does the machine have? This tells us how long the machine 
can go for. Fix the input 푤. A configuration of 푀 on 푤 is (푞, ℎ1, ℎ2,work tape contents). 
We don’t include 푤 because it is read-only. The number of configurations is 
|푄| · 푛 · 푑 log 푛 · 푐^{푑 log 푛} = |푄| · 푛 · 푑 log 푛 · 푛^푘 = 푂(푛^ℓ)
for some 푘, ℓ. No configuration can repeat, because no looping is allowed. Since the machine 
can have at most a polynomial number of configurations, it runs in polynomial
time. We get 퐴 ∈P. 
The following is trickier. 
Theorem 18.6: thm:nl-p NL⊆P. 
The previous proof would only give NL⊆NP. To get a deterministic polynomial time 
algorithm we need to construct a different machine. 
Proof. Given a NL TM 푁, we convert it to an equivalent polynomial-time TM 푀. 
How many configurations does 푁 have? When we count the number of configurations,
it doesn't matter if the machine is deterministic or not! A configuration is simply a snapshot.
푀 takes all configurations and writes them down; there are only polynomially many.
푀 = “On input 푤,
1. Write all configurations of 푁 on 푤. We will treat these as the nodes of a graph, called 
the configuration graph. 
2. Put an edge from one configuration 푐1 to another 푐2 when 푐1 leads to 푐2 in one step. 
Now we have a big graph of all possible configurations. We have 푐start and 푐accept (we
can assume there is a single accepting configuration, that the machine clears the work 
tape and moves its head to the left). Now we test if there is a path from the starting 
to the accepting configuration. 
If there is a path, the nondeterministic machine accepts its input. The path gives a 
sequence of configurations that the nondeterministic machine goes on some path from 
start to accept. 
Conversely, if the machine does accept, there has to be a path of configurations from 
start to accept, so there is a sequence of edges go from start to accept. 
A polynomial time machine can do this test because it’s the PATH problem! Depth 
or breadth first search works fine. This answers whether the NL machine accepts the 
input. 
Lecture 19 
Thu. 11/15/12 
Last time we talked about... 
∙ GG is PSPACE-complete 
∙ L and NL 
We reduced from TQBF to GG to show GG is PSPACE-complete. Then we turned our 
attention to a different regime: what if we consider logarithmic space instead of polynomial 
space? Log space is enough to give you pointers into the input. This has a certain power 
which we can describe; it fits nicely into our framework.
Today we’ll talk about 
∙ NL-completeness 
∙ NL=coNL (this differs from what we think is true for NP) 
Recall that L=SPACE(log 푛) and NL=NSPACE(log 푛). We have a nice hierarchy: 
L ⊆ NL ⊆ P ⊆ NP ⊆ PSPACE. 
We don’t know whether these containments are proper. We can show that PSPACE and 
NL are different (and will eventually do so), so not everything in the picture collapses down. 
Most people believe that these spaces are all different; however, we don’t know adjacent 
inclusions are proper. 
However, NL=coNL shows that surprising things do happen, and we do have unexpected 
collapses. 
First let’s review a theorem from last time. 
Theorem (Theorem 18.6): NL⊆P. 
Proof. For an NL-machine 푁, a configuration of 푁 on 푤 is (푞, 푝1, 푝2, 푡), where 푡 is the contents of the work tape. The number of
configurations of 푁 on 푤 is polynomial in 푛 where 푛 = |푤| (푤 is fixed). The computation 
graph is the graph where 
∙ nodes are configurations, and 
∙ edges show how 푁 can move. 
Here is a polynomial time algorithm that simulates 푁. “On input 푤, 
1. Construct the computation graph. 
2. Test if there is a path from start to accept (using any polynomial time algorithm for 
PATH). 
3. Accept if yes and reject if no.” 
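The test in step 2 is just graph search; here is a minimal sketch (mine) of it by breadth-first search, where edges(c) lists the configurations reachable from c in one step of 푁.

from collections import deque

def path_exists(nodes, edges, s, t):
    # Standard BFS over the polynomially large configuration graph: polynomial time.
    seen, queue = {s}, deque([s])
    while queue:
        c = queue.popleft()
        if c == t:
            return True
        for d in edges(c):
            if d in nodes and d not in seen:
                seen.add(d)
                queue.append(d)
    return False

graph = {"start": ["a", "b"], "a": ["accept"], "b": [], "accept": []}
assert path_exists(set(graph), lambda c: graph[c], "start", "accept") is True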
S1 L vs. NL 
Now we turn our attention to L vs. NL. We’ll show that the situation is analogous to the 
situation of P vs. NP. How much deterministic space do we actually need for an NL
problem? We can do it with polynomial space but that’s pretty crude. We can do much 
better. 
We have using Savitch’s Theorem that 
NL = NSPACE(log 푛) ⊆ SPACE(log² 푛)
We stated Savitch’s Theorem for space bounds ≥ 푛; with space bounds of ≥ log 푛 the same 
argument goes through. No one knows whether we can reduce the exponent, or whether 
L=NL. 
(We will show that SPACE(log 푛) is provably different from SPACE(log² 푛), using the
hierarchy theorem 20.1. When we increase the amount of space/time, we actually get new 
stuff. But maybe some other argument could show NL⊆SPACE(log 푛).) 
We will show that there are NL-complete problems, an example of which is PATH. If 
you can solve PATH or any other NL-complete problems in deterministic log space, then it 
brings down everything with it to L. We’ll show everything in NL is reducible to the PATH 
problem. This shouldn’t be a surprise because it’s what we did in the previous theorem: 
whether a machine accepts is equivalent to whether there’s a path. We’ll just need to define 
NL-completeness in the appropriate way and then we’ll be done by the argument given in 
the NL⊆P theorem. 
Definition 19.1: 퐵 is NL-complete if 
1. 퐵 ∈NL 
2. Every NL-problem is log-space reducible to 퐵: for every 퐴 ∈NL, 퐴 ≤퐿 퐵. 
We need to define what it means to be log-space reducible. We have to be careful because 
the input has size roughly 푛, and the output can have size roughly 푛 as well. We don't want to count the output of
the machine in the space bound. The input and output should be kept separate.
Definition 19.2: A log-space transducer is a Turing machine with 3 tapes, 
1. input tape (read only), 
2. work tape (read-write), and 
3. output tape (write only), 
such that the space used by the work tape is 푂(log 푛) with 푛 the size of the input. 
We say that 푓 : Σ* → Σ* is computable in log-space if there is a log-space transducer
that, on input 푤, halts with 푓(푤) on the output tape.
We don’t use polynomial reducibility because once we have polynomial time we can solve 
NL problems. The reducer could figure out whether a string is in 퐴, then map it to a fixed
yes- or no-instance of 퐵. It would not transform the problem; it would just get the answer and dump it into 퐵. If we used
polynomial reducibility, everything in NL would be NL-complete except ∅ and Σ*.
We need a reduction that the L machine could compute. If we used polynomial reduction, 
an L machine couldn't necessarily make the reduction. But if we use log-space reductions, then
an L machine can compute the reduction. 
We have the following analogous theorem. 
Theorem 19.3: If 퐴 ≤퐿 퐵 and 퐵 ∈ 퐿 then 퐴 ∈ 퐿. 
Why doesn’t the same argument for P work for L? It’s a bit tricky because we can’t write 
all of the output of 푓 down on an L-machine.
Proof. The algorithm for 퐴 is the following. “On 푤, 
1. Compute 푓(푤). 
2. Test if 푓(푤) ∈ 퐵. 
3. Accept or reject accordingly. 
But we can’t write 푓(푤)! 
There's a trick that fixes this. We run the machine for 퐵 without having 푓(푤) available. Every time we
need a bit of 푓(푤), we rerun the whole reduction, throw away all of the output except the bit we're
looking for, and plug that into the machine.
Recomputation allows us to get by with logarithmic memory. 
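A minimal sketch (mine) of the recomputation trick: the machine for 퐵 reads its input through a callback, and every probe reruns the transducer for 푓 from scratch instead of storing 푓(푤). The helper names are assumptions of the sketch.

def compose_logspace(decide_B, transduce_symbol):
    # decide_B(read) decides B, reading its input one position at a time via read(i);
    # transduce_symbol(w, i) reruns the reduction f and returns only the i-th symbol
    # of f(w) (None past the end). Nothing of f(w) is ever written down in full.
    def decide_A(w):
        return decide_B(lambda i: transduce_symbol(w, i))
    return decide_A

# toy example: f reverses w, and B asks whether the first symbol is 1
def transduce_symbol(w, i):
    return w[len(w) - 1 - i] if i < len(w) else None

def decide_B(read):
    return read(0) == "1"

decide_A = compose_logspace(decide_B, transduce_symbol)
assert decide_A("0101") is True      # reverse is 1010
assert decide_A("0110") is False     # reverse is 0110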
Proposition 19.4: If 퐴 ≤퐿 퐵 and 퐵 ≤퐿 퐶 then 퐴 ≤퐿 퐶. 
Proof. Use the same idea, doing computation on the fly. 
Now let’s turn to NL-completeness. 
S2 NL-completeness 
Theorem 19.5: PATH is NL-complete. 
Proof. We have to show 
1. PATH∈NL: We already proved this (Example 18.4).
2. For 퐴 ∈NL, 퐴 ≤퐿PATH. We give a generic reduction. Say that the NL-machine 푁 
decides 퐴. We give the reduction 푓. 
Given 푤, let 푓(푤) be ⟨퐺, 푠, 푡⟩ where 퐺 is the computation graph for 푁 on 푤, 푠 is 퐶start, 
and 푡 is 퐶accept (again, we assume 푁 cleans up its tape at the end, so that there is just 
one accept state). Testing whether there is a path from 퐶start to 퐶accept is an instance 
of the PATH problem. We have 푤 ∈ 퐴 iff there is a path from 푠 to 푡. 
This machine does the right thing. We have to show we can do the reduction in log 
space, i.e., 푓 is log-space computable. 
푓(푤) is supposed to be a description of the nodes and edges, and which is the starting 
and ending node. Split the work tape into 2 pieces, representing 2 configurations of 푁 
on 푤, say 퐶1 and 퐶2. 
We’ll go through all possible pairs of configurations sequentially, just like an odometer. 
For each possibility of 퐶2 look at all possibilities of 퐶1. We cycle through all possible 
pairs of configurations, testing whether 퐶1 legally yields 퐶2 in 1 step according to the 
rules of 푁. If so, take the pair and output an edge between them. The whole thing 
takes log-space, because writing 퐶1,퐶2 takes log space. 
This proves 푓 is a log-space computable function, so the reduction takes log-space. 
Note that the output depends on 푤. How? Which configurations lead to which others—it might seem like this depends only on the machine. But 푓(푤) should depend on 푤. The start configuration doesn't depend on 푤, and doesn't have 푤 built in. When you look at whether
you can transition from 퐶1 to 퐶2, they have head positions as part of the configuration. In 
order to see whether 퐶1 leads to 퐶2 we have to see what’s in the cell that the head is at. 
Thus the edges of the graph do depend on 푤. 
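Here is a rough Python sketch of this generic reduction, purely illustrative. The one-step relation of the NL-machine is taken as a given (hypothetical) predicate yields(퐶1, 퐶2, 푤); the reduction enumerates all pairs of configurations like an odometer and outputs the legal edges, which is why it only ever needs room for two configurations at a time.

    from itertools import product

    def reduction_to_path(w, alphabet, conf_len, yields, c_start, c_accept):
        # Nodes are all strings of length conf_len over the configuration
        # alphabet; output an edge (C1, C2) whenever C1 yields C2 on input w.
        edges = []
        for c1 in product(alphabet, repeat=conf_len):       # odometer over C1
            for c2 in product(alphabet, repeat=conf_len):   # odometer over C2
                if yields(c1, c2, w):
                    edges.append((c1, c2))
        return edges, c_start, c_accept

    # Tiny made-up "machine": configurations are 2-bit counters, and the
    # machine can step from i to i+1 exactly when the i-th input bit is '1'.
    def toy_yields(c1, c2, w):
        i, j = int("".join(c1), 2), int("".join(c2), 2)
        return j == i + 1 and i < len(w) and w[i] == '1'

    edges, s, t = reduction_to_path("110", "01", 2, toy_yields, ('0','0'), ('1','1'))
    print(edges)   # the edge set depends on w, as emphasized above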
For homework, you need to show other problems are NL-complete. To show other problems are NL-complete, we reduce from PATH, just like we reduced from 3SAT.
Let's move on to this amazing problem.
S3 NL=coNL 
Let’s look at the picture. 20 years ago we thought NL̸=coNL, with L in the intersection, 
much like we still think the picture for P vs. NP still looks like this. However, actually 
NL=coNL. 
Theorem 19.6: NL=coNL. 
Proof. 퐴 reduces to 퐵 exactly when the complement of 퐴 reduces to the complement of 퐵, so all we need to do is show that the complement of PATH is in NL. How do we give an NL-algorithm that recognizes the nonexistence of a path? What could we guess, so that if we accept at the end, there's no path?
Perhaps we could guess a cut. But writing down a cut requires more than log-space. 
The algorithm is very nonobvious. This was a prize-winning paper. 
We’ll give the algorithm in pictures. We have our graph 퐺, with starting and ending 
nodes 푠 and 푡. The idea came a little out of left field. The guy’s advisor asked: if you’re 
trying to solve problems and you’re given information for free, what happens? What happens 
if you’re given for free the number of nodes you can get to from 푠? 
We first give an NL-algorithm for the complement of PATH, given the number of nodes reachable from 푠. Let 푅 be the set of nodes reachable from 푠. Let 푐 be the size of 푅.
Our algorithm goes through all nodes of 퐺 one by one. Every time it gets to a new 
node, it guesses whether the node is reachable. If it guesses the node is reachable, it will 
prove it’s right by guessing the path. (If can’t find the path, that branch dies.) If the node 
is reachable, some branch will guess the right path and then move on. We keep track of 
how many reachable nodes we’ve found. When we’re done, if the count equals 푐, then we’ve 
guessed right all the way along; we’ve found all the reachable ones. All the ones that we’ve 
guessed to be nonreachable are really nonreachable. If 푡 wasn’t guessed, 푡 is nonreachable! 
Now we’ve reduced the problem to computing 푐. We compute it using the same technique. 
We layer the graph into 푅0,푅1,푅2, . . . where
푅푖 = nodes reachable from 푠 by a path of length ≤ 푖.
Note
푅0 ⊆ 푅1 ⊆ · · · ⊆ 푅푚 = 푅
because 푚 (the number of nodes) is the maximal possible number of steps you need to reach any node. Let 퐶푖 = |푅푖|.
We will show how to compute 퐶푖+1 from 퐶푖. Then we can throw away the previous 
퐶-value. So we can get the count of nodes reachable in any number of steps, and we’re done. 
We need a way of testing whether a node is in 푅푖+1: nondeterminism will either get 
the value correctly, or that branch of the nondeterminism will fail. Some branch will have 
guessed everything correctly along the way. 
Each time we’re testing whether a node 푣 is in 푅푖+1, we go through all the nodes, guessing 
which are in 푅푖 and which are not. If we guess it’s in 푅푖, we prove it is in by guessing a path 
of length at most 푖. We keep a count of the number of nodes in 푅푖 and make sure it equals 
퐶푖 at the end. Along the way we check if any of these nodes connect to 푣. 
Now iterate over 푖 = 0, . . . ,푚 − 1. 
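The following Python sketch is only meant to make the layering concrete; it computes the sets 푅푖 and counts 퐶푖 deterministically (so it uses far more than log space), whereas the actual NL algorithm keeps only the current count and replaces the set computation by the nondeterministic guessing described above.

    def layer_counts(nodes, edges, s):
        # R_i = nodes reachable from s by a path of length <= i;  C_i = |R_i|.
        # Note R_0 ⊆ R_1 ⊆ ... ⊆ R_m = R, where m = number of nodes.
        R = {s}
        counts = [len(R)]
        for _ in range(len(nodes)):
            R = R | {v for (u, v) in edges if u in R}
            counts.append(len(R))
        return counts, R

    nodes = ['s', 'a', 'b', 't']
    edges = [('s', 'a'), ('a', 'b')]
    counts, R = layer_counts(nodes, edges, 's')
    print(counts)        # [1, 2, 3, 3, 3]: the counts C_0, ..., C_m
    print('t' not in R)  # True: with the final count in hand, a verifier can
                         # be convinced that t is unreachable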
Lecture 20 
Tue. 11/20/12 
Last time we showed: 
∙ PATH is NL-complete 
∙ NL=coNL 
Today we’ll talk about the time and space hierarchy theorems. 
S0 Homework 
Problem 3: 
Show a language is in L. If you just try to do it with your bare hands, it’s a mess. But 
if you use the methodology we talked about in lecture, it’s a 1-2 line proof. Don’t just dive 
in but use a technique we introduced to make it simpler. 
Problem 6: 
Here the satisfiability problem is made easier: the clauses have at most 1 negated literal per clause (for instance, (¬푥 ∨ 푦1 ∨ · · · ∨ 푦푘)), and that negated literal cannot appear anywhere else. This turns out to be solvable in NL, and to be NL-complete. As a hint, (¬푎 ∨ 푏) is equivalent to 푎 → 푏. Thus we can rewrite (¬푥 ∨ 푦1 ∨ · · · ∨ 푦푘) as
(푥 → (푦1 ∨ · · · ∨ 푦푘)).
This suggests you think of the problem as a graph which represents the formula in some 
way. The nodes are the clauses, and there is an edge going from (푥 → 푦1 ∨ · · · ∨ 푦푘) to a clause containing 푦1 on the left-hand side of its implication. If 푥 is true, one of 푦1, . . . , 푦푘 is true; then following an edge we get to the clause of one of the 푦푖. Think of this as a connectivity problem in a
graph. 
For the reduction, we want to reduce the graph to the restricted satisfiability problem. 
We can just reduce from graphs that don’t have any cycles in them. Reduce a path problem 
to the satisfiability problem, using a construction inspired by the above. The construction 
requires the starting graph not to have cycles. You have to remove the cycles because they 
cause problems. The acyclic path problem is still NL-complete; this is in the textbook. Use 
the technique of level graphs, explained below. 
To show 
PATH ≤퐿 acyclic-PATH, 
take your graph and convert it to a bunch of copies of the vertex set, where an edge from 푎 
to 푏 now goes from 푎 in one layer to 푏 in the next layer down. There are no backward-going 
edges so there are no cycles. But if there was a path from 푠 to 푡 in the original graph, there is still a path from 푠 to (a copy of) 푡 in the modified graph.
S1 Space hierarchy 
We’ve wrapped up the basics on complexity classes for time and space. We’ll now talk about 
a pair of theorems that relate to both time and space. The hierarchy theorems have a very 
simple message. With respect to time and space (let's think of time for the moment), if you have a certain amount of time you're allowing the machine and then you increase the time, you'd expect there's more stuff the machine could do (say 푛³ instead of 푛²). For
this question the answer is known: if you give the machine a little more time or space, it can 
do more things. 
In particular, the theorem tells you there are decidable languages that are not in P. 
So far we have 
L ⊆ NL ⊆ P ⊆ NP ⊆ PSPACE. 
Even 퐿 ?= 푃 is open, and 푃 ?= PSPACE is open. However, 퐿 and PSPACE are provably different, so we can't have both 퐿 = 푃 and 푃 = PSPACE.
There are separations out there, which we don’t know how to prove. The general belief 
is that all of these are separate. We can actually prove something stronger. We have by 
Savitch’s Theorem that 
NL ⊆ SPACE(log² 푛) ⊂ PSPACE,
the inclusion proper by Space Hierarchy. We know 푁퐿̸= PSPACE, but nothing stronger is 
known. 
Theorem 20.1 (Space Hierarchy Theorem): For functions 푓, 푔 : N → N where
1. 푓 is space constructible: it can be computed in 푓(푛) space. (This is a technical 
condition that all normal functions will satisfy.) 
2. 푔(푛) = 표(푓(푛)), 
then there is a language 
퐴 ∈ SPACE(푓(푛)) 
with 
퐴̸∈ SPACE(푔(푛)). 
(Note 푔(푛) = 표(푓(푛)) means 푔(푛) < 푐푓(푛) for any constant 푐 > 0, if you make 푛 large enough. In other words, 푓(푛) dominates 푔(푛) for large enough 푛.)
We will find some language 퐴 in SPACE(푓(푛)) and not in SPACE(푔(푛)), to prove this. 
For instance take 푔(푛) ∼ 푛² and 푓(푛) ∼ 푛³: we can do something in 푛³ space that we can't do in 푛² space.
The space hierarchy theorem has a slightly easier proof than the time hierarchy theorem. 
What are we going to do? I’ll tell you what we’re not going to do. It would be nice 
if the language was some nice language, understandable as a string manipulation, with 푓 
as a parameter somewhere. Rather, it will be an artificial concocted language designed 
specifically to meet the conditions that we want; we won’t be able to understand it simply 
otherwise. Later on we’ll find more natural languages that take a lot of space. 
The machine operates in space 푓(푛), and by design, makes sure its language can’t be 
done in less space. It simulates all smaller space machines and acts differently from them. 
Really it amounts to a diagonalization. We build something different from everything in 
some list. 
Let’s review diagonalization. To prove R is uncountable, given a list of real numbers, we 
make a number differing from everything in the list by at least one digit (Theorem 8.7). To 
show 퐴푇푀 is undecidable, we make a machine that looks at what 푀푖 does on ⟨푀푖⟩ and does the opposite (Theorem 8.10). Its language 퐷 is a new thing, and can't be the language of any Turing machine on the list, a contradiction.
Our proof is similar in spirit. Think of 푀푖 as the machines that operate in space 푔(푛) 
where 푔(푛) = 표(푓(푛)), the small space machines. 퐷 does something different from what each 
푀푖 does, so 퐷 can’t be a small space machine. 
However, 퐷 is decidable: simulating the 푀푖 on their inputs takes only small space, so our language is decidable in space just a little more. We have to be careful in the analysis to show 퐷 can carry out the diagonalization in just a little more space; by construction its language can't be decided in small space, but 퐷 itself can do all the tests in the larger space 푓(푛).
Proof. We give a Turing machine (decider) 퐷 where 퐴 = 퐿(퐷) and 퐷 is a decider running 
in space 푂(푓(푛)). This gives 퐴 ∈ SPACE(푓(푛)). Our algorithm for 퐷 will show that 퐷 is 
not solvable in smaller space, 퐴̸∈ SPACE(푔(푛)). 
Our first try is the following. Let 퐷 =“on input 푤 (of length 푛): 
1. Compute 푓(푛) and mark off 푓(푛) tape cells. (If the machine ever needs to use more 
space, then reject.) 12 
2. If 푤̸= ⟨푀⟩ for some TM 푀, then reject. If 푤 = ⟨푀⟩, continue; 퐷 will try to be 
different from 푀. 
3. Run 푀 on 푤 and do the opposite (this is an oversimplication; we have to make some 
adjustments).” 
Modulo a little engineering, this is our description of 퐷. Conceptually, this is the whole 
proof. 
But we might not finish: 푀 might take more space than we allocated, in which case 퐷 ends up rejecting. Is that a problem? We only have an obligation to be different from the small-space machines. We'll be able to run the small-space machines to completion, and our language is different from their languages.
This is a bit of a cheat. There are 2 critical flaws in the argument. I claimed that if 
푀’s computation doesn’t fit in the space, I don’t have to worry about it. That’s not true. 
It could be the machine uses lots of space on small input, but on large input, it uses space 
표(푓(푛)). We could have 푔(푛) > 푓(푛) for a particular 푤 (but not asymptotically)—we had one chance to be different from that machine, and we've blown it. No one tells us the constant
factor. This problem seems more serious! 
We want to run 푀 on a bigger 푤. We don't know what 푤 we need, but big enough so the
asymptotics kick in. Thus we’ll pad it in all possible ways; we’ll have infinitely many chances 
to be different. 
We change the above as follows: let's strip off trailing 0's and see if the remainder is a Turing machine description. We could have a string with billions of 0's, run on some relatively small
Turing machine. Let 퐷 =“on input 푤 (of length 푛): 
1. Compute 푓(푛) and mark off 푓(푛) tape cells. (If the machine ever needs to use more 
space, then reject.) 
2. If 푤̸= ⟨푀⟩ 0* for some TM 푀, then reject. If 푤 = ⟨푀⟩ 0*, continue; 퐷 will try to be different from 푀.
12We use the technical condition that 푓(푛) can be computed in 푓(푛) space; the machine needs to understand 
how much extra space it got in order to do something new to it. There is a counterpart to the theorem: 
we can construct gaps in the hierarchy where nothing new appears from 푔 up to 푓, by constructing 푓 so complicated,
that we can’t compute 푓 in 푓 space. This is the gap theorem. There is one gap you can describe easily; 
log-log-space. There is a gap between constant space and log-log-space. Nothing nonregular is in 표(log log 푛) 
space. 
3. Run 푀 on 푤 and do the opposite.” 
This allows 퐷 to run 푀 on very long inputs. 
This solves one problem. But it's possible that 푀 on 푤 goes forever. It can only do so in a particular way: using a small amount of space. If 퐷 blindly simulates, it is going to loop. The amount of time a machine can take without looping is at most exponential in the amount of space. Thus, we run a counter to count up the amount of time a machine can run without getting into a loop on that amount of space; it's roughly 2^푓(푛).
The counter takes a constant factor more space; put the counter out to the right, or think 
of it running on a separate track below. If we exceed the time bound, then reject. Using 
asymptotics, for large enough 푛, we will run to completion on some input and be different. 
Let 퐷 =“on input 푤 (of length 푛): 
1. Compute 푓(푛) and mark off 푓(푛) tape cells. (If the machine ever needs to use more 
space, then reject.) 
2. If 푤̸= ⟨푀⟩ 0* for some TM 푀, then reject. If 푤 = ⟨푀⟩ 0*, continue; 퐷 will try to be different from 푀.
3. Run 푀 on 푤 and do the opposite. 
(a) Reject if it exceeds 2^푓(푛) time.”
Constructibility works down to log 푛 (we have to work with the special model for sublinear 
space). 
S2 Time Hierarchy Theorem 
The issue of the overhead becomes more of a concern. 
Theorem 20.2: If 푓 is time-constructible and 푔(푛) = 표(푓(푛)/ log 푛), then there exists 퐴 ∈ TIME(푓(푛)) with 퐴̸∈ TIME(푔(푛)).
In the interest of time (pun intended) we’ll just sketch the proof. The idea is the same. 
Proof. Let 퐷 =“on input 푤 of length 푛, 
1. Compute 푓(푛). Start counter (“timer”) to 푓(푛). 
2. If 푤̸= ⟨푀⟩ 0* for some TM 푀, then reject.
3. Run 푀 on 푤 and do the opposite (provided it runs within the time on the counter). 
We have to be careful. Every time we do a step, we refer back to 푀. The overhead, if we're not careful, will be bad. We only have an extra factor of log 푛 to work with. We extend our tape alphabet so that every tape cell has enough room to hold 2 symbols. We'll keep the description of 푀 on the tape: like checking out a book from the library, we'll take 푀 and carry it with us. More complicated is the counter.
푀 is a constant-size thing. The counter is not constant in size; it grows with 푛, hence is logarithmic in size. This contributes the log 푛 overhead.
Lecture 21 
Tue. 11/27/12 
Last time we talked about hierarchy theorems. If we allow a bit more time or space, then 
there are more things we can do. 
Today we’ll talk about 
∙ natural intractable problems 
∙ Relativization, oracles 
S1 Intractable problems 
Definition 21.1: Define
EXPTIME = ⋃_푘 TIME(2^(푛^푘)),
EXPSPACE = ⋃_푘 SPACE(2^(푛^푘)).
(Think of it as 2^poly(푛).)
The hierarchy theorems show that there are things we can do in exponential time that 
we can’t do in polynomial time, and the same is true for space. We have proper inclusions. 
푃 ⊂ EXPTIME, PSPACE ⊂ EXPSPACE. 
We found 퐴 ∈ EXPSPACE∖PSPACE. This was a language that the hierarchy machine produced for us. It decides in such a way that makes its language provably different. 퐴 is by design not doable in polynomial space, because it diagonalizes over all polynomial space machines.
But 퐴 is an unnatural language; it has no independent interest; it just came out for sake 
of proving the hierarchy theorem. 
We’d like to prove some more natural language is in EXPSPACE∖PSPACE. To do this 
we turn to completeness. 
We’ll introduce an exponential space complete problem, in the same spirit as our other 
complete problems. Everything in the class reduces to it. It cannot be in polynomial space 
because otherwise PSPACE=EXPSPACE. Because 퐴 is outside PSPACE, the classes are 
different, and the exponential space complete problem must also be outside. 
The language is a describable language. We can write it in an intelligible way. It’s a toy 
language. There are languages that people are more interested in that have completeness 
properties. Our problem will illustrate the method, which we care about more than the 
results. This is like the Post Correspondence Problem. Other languages are less convenient 
to work with. 
Definition 21.2: A language is intractable if it is provably outside of 푃. 
Example 21.3: Here’s a problem that mathematicians care about. Remember that we 
talked about number theory: we can write down statements. 
Consider a statement of number theory with quantifiers and statements involving only +. Chapter 6 gives an algorithm for testing whether such statements are true or false. It's a beautiful application of finite automata. The algorithm is very slow; it repeatedly involves converting an NFA to a DFA, which is an exponential blowup. The algorithm runs in time
2^(2^(···)),
a tower of exponentials whose height is about the length of the formula. It can be improved to double exponential.
Is there a polynomial time algorithm? No, it’s complete for double exponential time; it 
provably cannot be improved. We’ll give the flavor of how these things go, by giving an 
exponential problem that’s more tailored to showing it’s complete. That’s the game plan. 
1.1 EQREX 
First we consider a simpler problem. 
Definition 21.4: Define 
EQREX = {⟨푅1,푅2⟩ : 푅1,푅2 are regular expressions and 퐿(푅1) = 퐿(푅2)} . 
This can be solved in polynomial space (it's a good exercise). We can convert regular expressions to NFAs of about the same size. Thus we can convert the problem to testing whether two NFA's are equivalent. We'll look at the complementary problem, the inequivalence problem, and show that it is in PSPACE.
We show the inequivalence problem is in NPSPACE and use Savitch's Theorem 16.13. The machine has to
accept if the strings are not equivalent. We’ll guess the string on which they give a different 
answer. 
If one machine is in an accepting state on one and the other machine not in an accepting 
state on any possibility, we know the machines are not equivalent, and we can accept. 
Does this also show the inequivalence problem is in NP? Why not? Why can't we use as the witness a string that's accepted by one machine but not the other? The mismatch could be a huge string that is not polynomially long: the first string on which the machines differ could be exponentially long.
To use polynomial space, we modify our machine so it guesses symbol by symbol, and 
simulates the machine on the guessed symbols. 
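To make the symbol-by-symbol idea concrete, here is a small Python sketch. It is the deterministic cousin of the argument above: instead of nondeterministically guessing the distinguishing string one symbol at a time in polynomial space, it searches over pairs of reachable state sets (so it may use exponential space); the NFA encoding is made up for the example.

    from collections import deque

    def nfa_step(states, symbol, delta):
        # delta maps (state, symbol) -> set of next states
        return {q for p in states for q in delta.get((p, symbol), set())}

    def inequivalent(nfa1, nfa2, alphabet):
        # Each NFA is (start_states, accept_states, delta).  Track, symbol by
        # symbol, the pair of state sets the two NFAs could be in.
        (s1, f1, d1), (s2, f2, d2) = nfa1, nfa2
        start = (frozenset(s1), frozenset(s2))
        seen, queue = {start}, deque([start])
        while queue:
            a, b = queue.popleft()
            if bool(a & f1) != bool(b & f2):
                return True        # some string is accepted by exactly one NFA
            for c in alphabet:
                nxt = (frozenset(nfa_step(a, c, d1)), frozenset(nfa_step(b, c, d2)))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    # Toy NFAs: n1 accepts strings ending in 'a'; n2 accepts every string.
    n1 = ({0}, {1}, {(0, 'a'): {0, 1}, (0, 'b'): {0}})
    n2 = ({0}, {0}, {(0, 'a'): {0}, (0, 'b'): {0}})
    print(inequivalent(n1, n2, "ab"))   # True (the empty string already differs)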
A variant of this problem is not in PSPACE. 
For a regular expression 푅, let 푅^푘 = 푅 · · · 푅 (푘 copies of 푅 concatenated). Imagining 푘 is written down as a binary number, we could potentially save a lot of room (save exponential space) by using exponentiation. We'll talk about regular expressions with exponentiation.
Definition 21.5: Define 
EQREX↑ = {⟨푅1,푅2⟩ : 푅1,푅2 are regular expressions with exponentiation and 퐿(푅1) = 퐿(푅2)} . 
Definition 21.6: We say 퐵 is EXPSPACE–complete if 
1. 퐵 ∈EXPSPACE. 
2. for all 퐴 ∈EXPSPACE, 퐴 ≤푃 퐵.13 
We show the following. 
Theorem 21.7: EQREX↑ is EXPSPACE–complete. 
Proof. First we have to show EQREX↑ is in EXPSPACE. For ordinary regular expressions (without exponentiation), we argued at the beginning of class, using the Savitch's Theorem trick, that equivalence is doable in polynomial space. For regular expressions with exponentiation, expand each exponentiated piece by writing out the repeated concatenation. This blows up the expression by at most an exponential factor. Now use the polynomial-space algorithm on the exponentially larger input. The claim follows immediately. 14
13Why polynomial time reduction, not polynomial space reduction? Reductions are usually doable in log 
space; they are very simple transformations relying on repeated structure. Cook-Levin could be done in 
log-space reduction. If weaker reductions already work, there’s no reason to define a stronger one. 
14If we allow complements in the expression, we’re in trouble. The algorithm doesn’t work for complements. 
If we have complementation we have to repeatly convert NFA’s to DFA’s to make everything work out. 
Now we show EQREX↑ is EXPSPACE–complete. Let 퐴 ∈EXPSPACE be decided by TM 푀 in space 2^(푛^푘). We give a reduction 푓 from 퐴 to EQREX↑ sending 푤 to 푓(푤) = ⟨푅1,푅2⟩,
defined below. 
Let Δ be the computation history alphabet. Let 
∙ 푅1 be just all possible strings over some alphabet, Δ*, and 
∙ 푅2 be all strings except rejecting computation histories for 푀 on 푤. 
If 푀 rejects 푤, there is a rejecting computation history. Then 푅2 will be all strings except for that one string, and the regular expressions will not be equivalent, 푅1̸= 푅2.
If 푀 accepts 푤, then there are no rejecting computation histories, and 푅1 = 푅2. 
How big are 푅1 and 푅2 allowed to be? They have to be polynomial in the size of 푤. How big can the strings they describe be? 푀 on 푤 already uses exponential space, so a computation history is a double-exponentially big string. The challenge is how to encode: how to represent the enormous objects even though you yourself are very small.
We construct 푅2 as follows. 푅2 is supposed to describe all the junk: every string which fails to be a rejecting computation history (because it's scribble). We look at all the ways a string can fail; we have to describe all failure modalities. We'll write
푅2 = 푅bad-start ∪ 푅bad-reject ∪ 푅bad-compute. 
The beginning is bad, the end is bad, or somewhere along the line we made a mistake moving 
from one configuration to the next. 
A computation history looks like
퐶start#퐶1# · · · #퐶reject.
The configurations are 2^(푛^푘) symbols long, because we have to write down the tape of the machine. Assume they are padded out to the same length, so 퐶start = 푞0푤1 · · ·푤푛␣ · · ·␣.
푅bad-start: We describe 푅bad-start as all words which don't have first symbol 푞0, or don't have 2nd symbol 푤1, and so forth, so everything that doesn't start with 푞0푤1 · · ·푤푛␣ · · ·␣#. To start, let
푅bad-start = (Δ − 푞0)Δ* ∪ Δ(Δ − 푤1)Δ* ∪ Δ²(Δ − 푤2)Δ* ∪ · · · ∪ Δ^푛(Δ − 푤푛)Δ* ∪ · · ·
(Technically we have to write out Δ − 푞0 as a union of symbols. This is shorthand. It's not a regular expression as we wrote it, but we can easily convert it.) Now we have to deal with the blanks. This is a bit of a pain. Naively we would have to write down an exponential number of expressions Δ^푖(Δ − ␣)Δ*. We do a bit of regular expression hacking. We let
푅bad-start = (Δ − 푞0)Δ* ∪ Δ(Δ − 푤1)Δ* ∪ Δ²(Δ − 푤2)Δ* ∪ · · · ∪ Δ^푛(Δ − 푤푛)Δ*
∪ Δ^(푛+1)(Δ ∪ 휀)^(2^(푛^푘)−(푛+1))(Δ − ␣)Δ* ∪ Δ^(2^(푛^푘))(Δ − #)Δ*.
(Any string that starts with between 푛 + 1 and 2^(푛^푘) symbols followed by a non-blank is a bad starting string.) Note that 2^(푛^푘) can be written down with 푛^푘 + 1 bits.
푅bad-reject: Let 
푅bad-reject = (Δ − 푞rej)*. 
푅bad-compute: For 푅bad-compute, we need to describe all possible errors that can happen, Δ*(error)Δ*. An error means we have a bad window: an incorrect window 푑푒푓 following 푎푏푐 in the same position. Thus we let
푅bad-compute = ⋃_{푎푏푐푑푒푓 an illegal window} Δ*(푎푏푐 Δ^(2^(푛^푘)−2) 푑푒푓)Δ*.
Note that this is a constant-size union independent of 푛; it has at most |Δ ∪ 푄|⁶ terms.
We’re done! 푅 is a polynomial time regular expression with exponentiation. 
We proved this language is not in PSPACE, hence not in P, hence truly intractable. 
Can we use the same method to show the satisfiability problem is not in 푃? That would show P̸=NP. There is a heuristic argument that shows this method will not solve the P vs.
NP problem. This has to do with oracles. 
The moral of the story is that this method, which is very successful in showing a language 
outside of P, is not going to show SAT is outside of P. 
S2 Oracles 
Sometimes we want to think of a Turing machine that operates normally, but is allowed to 
get a certain free language. The machine is hooked up to a black box, the oracle, which is 
going to answer questions whenever the machine decides to ask one, about whether a string 
is in the language. 
Definition 21.8: An oracle for a language 퐴 is a machine (black box) that answers questions about what is in 퐴 for free (in constant time/space).
푀퐴 is a TM with access to an oracle for 퐴. 
Let P퐴 be the languages decidable in polynomial time with oracle 퐴, and define NP퐴 as the languages decidable in nondeterministic polynomial time with oracle 퐴.
Let’s look at some examples. A handy oracle is an oracle for SAT. 
Example 21.9: 푃SAT is the class of languages that you can solve in polynomial time, with 
the ability to ask whether any expression is in SAT. 
Because SAT is NP–complete, this allows you to solve any NP problem: 
NP ⊆ 푃SAT. 
Given a language in NP, first compute a polynomial reduction to SAT, and then ask the 
oracle whether the formula is true. 
We also have 
coNP ⊆ 푃SAT, 
because 푃SAT, a deterministic class, is closed under complement. 
This is called computation relative to SAT. The general concept is called relativization. 
Whether NPSAT ?= 푃SAT is open. However, we do know the following.
Theorem 21.10: For some 퐴, 
P퐴 = NP퐴. 
For some other 퐵, 
P퐵̸= NP퐵. 
We’ll prove the first fact, and then see the important implications. 
Proof. Let 퐴 = TQBF (or any PSPACE–complete problem). Because TQBF is in PSPACE, the machine can answer the oracle's questions on its own, so we can eliminate the oracle:
NPTQBF ⊆ NPSPACE = PSPACE ⊆ 푃TQBF,
where the equality is Savitch's Theorem and the last inclusion holds because TQBF is PSPACE–complete. Hence 푃TQBF = NPTQBF.
Here is the whole point of why this is interesting. 
Suppose we could prove P̸=NP using essentially the technique from the first 2/3 of the lecture: a hierarchy theorem and a reduction. At first glance that's possible. But diagonalization at
hierarchy theorem and a reduction. At first glance that’s possible. But diagonalization at 
its core is one machine simulating another machine, or a variety of different machines. 
Notice that simulation arguments would still work in the presence of an oracle. We give 
both the simulating machine and simulated machine the same oracle; the proof goes through. 
The simulating machine can also ask the same oracle. 
Suppose we have a way of proving P̸=NP by simulation alone. Then we could prove 푃퐴̸= 푁푃퐴 for every oracle 퐴. But this is false! We know 푃퐴 = 푁푃퐴 for certain oracles! This
simple-minded approach doesn’t work. 
A solution to 푃 ?= 푁푃 cannot rely on simulating machines alone, because if it did, by relativization the proof would show that the same is true with any oracle.
Lecture 22 
Thu. 11/29/12 
Last time we talked about 
∙ EQREX↑ is EXPSPACE-complete. 
∙ Oracles 
We gave an example of a provably intractable language, and concluded the same technique can't be used to prove P ?= NP (relativization).
Today we’ll look at a different model of computation that has important applications. 
We allow Turing machines to access a source of randomness to compute things more quickly than we might otherwise be able to. We'll talk about
∙ Probabilistic computation and BPP 
∙ Primality and branching programs 
S1 Primality testing 
We’ll use primality testing as an example of a probabilistic algorithm. 
Let 
PRIMES = {푝 : 푝 is a prime number in binary} . 
We have PRIMES∈coNP (easy). There is also a short proof in elementary number theory that PRIMES∈NP. A big breakthrough in 2002 showed PRIMES∈P.
We’ll give a probabilistic, polynomial-time algorithm for PRIMES. We’ll just sketch the 
idea, without going through the details. It is probabilistic in the sense that for each input 
the running time is polynomial, but there is a small chance that it will be wrong. 
We need the following. 
Theorem 22.1 (Fermat’s little theorem): For any prime 푝 and 푎 relatively prime to 푝, 
푎푝−1 ≡ 1 (mod 푝). 
This comes from the abstract algebra fact that if you raise the element of a finite group 
to the size of the group you get the identity. 
For example, if 푝 = 7 and 푎 = 2, then 2⁶ = 64 ≡ 1 (mod 7).
In contrast, if you take 푝 = 9, 푎 = 2, then 2⁸ ≡ 256 ≡ 4̸≡ 1 (mod 9). We have just given
a proof that 9 is not a prime number: 9 does not have a property that all prime numbers 
are. However, this proof does not tell you what the factors are. (So primality testing may 
not help you do any factoring.) 
Suppose 푎^(푝−1) (mod 푝)̸= 1 for 푝 not prime. This would give an easy test for primality. Unfortunately, this is false. An example is 푝 = 561 = 3 · 11 · 17. We have 2⁵⁶⁰ ≡ 1 (mod 561).
We're going to look at something which is still false, but closer to being true. Suppose for 푝 not prime, 푎^(푝−1) (mod 푝)̸= 1 for most 푎 < 푝. This would not necessarily give a polynomial
time algorithm, because it might give the wrong answer. But you can pick random 푎’s; each 
time you pick an 푎, you have a 50-50 chance of getting a result which is not 1. 
To test if 푝 is a prime number, test a hundred random 푎's. If all 100 runs give result 1, the number is probably prime. But this is also false: for 561, the test gives 1 for all 푎 relatively prime to 푝. This simple test is fooled by Carmichael numbers, which masquerade as primes.
But let’s assume our heuristic is true. 
Then this test works. Let’s write the algorithm down. Here is a probabilistic algorithm 
assuming the heuristic. “On input 푝, 
1. Pick 푎1, . . . , 푎푘 < 푝 at random. (푘 is the amplification parameter, which allows us to adjust the probability of error.)
2. Compute 푎푖^(푝−1) (mod 푝) for each 푖.
3. Accept if all results equal 1, and reject if any result is not 1.”
With our assumption, if 푝 is prime, 
푃(accept) = 1. 
If we have a prime number we always get 1 by Fermat's little theorem. But if 푝 is composite, then the probability is going to be small (under the false assumption):
푃(accept) ≤ 2^(−푘).
It’s like flipping a coin each time you pick an 푎. This is our motivating example for making 
our definition. 
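Here is a short Python sketch of this heuristic algorithm. It implements only the test described above, so, as just noted, Carmichael numbers like 561 will fool it; the function name and the choice 푘 = 100 are ours.

    import random

    def fermat_probably_prime(p, k=100):
        # Heuristic primality test based on Fermat's little theorem.
        if p < 4:
            return p in (2, 3)
        for _ in range(k):                    # k = amplification parameter
            a = random.randrange(2, p - 1)
            if pow(a, p - 1, p) != 1:         # a^(p-1) mod p by fast modular exponentiation
                return False                  # definitely composite
        return True                           # "probably prime" (fooled by 561, etc.)

    print(fermat_probably_prime(101))   # True
    print(fermat_probably_prime(9))     # False: e.g. 2^8 mod 9 = 4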
S2 Probabilistic Turing Machines 
We set up a model of computation—probabilistic Turing machines—which allows us to talk 
about complexity classes for algorithms like this. 
Definition 22.2: A probabilistic Turing machine is a type of NTM where we always 
have 1 or 2 possible moves at each point. If there is 1 move, we call it a deterministic 
move, and if there are 2 moves, we call it a coin toss. We have accept or reject possibilities 
as before. 
We consider machines which run in time poly(푛) on all branches of its computation. 
Definition 22.3: For a branch 푏 of 푀 on 푤, we say the probability of 푏 is
푃(푏) = 2^(−ℓ)
where ℓ is the number of coin-toss moves in 푏. We have
푃(푀 accepts 푤) = Σ_{푏 an accepting branch} 푃(푏).
This is the obvious definition: what is the probability of following 푏 if we actually tossed coins at each coin-toss step? At each coin toss there is a 1/2 chance of going off 푏.
The machine accepts each input with some probability: some inputs with 99%, others with 0%, 2%, 50%. We want the machine to do the right thing on every input, with only a small probability of failing (the error).
Definition 22.4: For a language 퐴, we say that probabilistic TM 푀 decides 퐴 with error 
probability 휀 if for 푤 ∈ 퐴, 
푃(푀 accepts 푤) ≥ 1 − 휀. 
If 푤̸∈ 퐴, then 
푃(푀 rejects 푤) ≥ 1 − 휀 
(i.e., it accepts with small probability, 푃(푀 accepts푤) ≤ 휀.) 
For instance if a machine accept with 1% error, then it accept things in the language 
with 99% probability. 
There is a forbidden behavior: the machine is not allowed to be unsure, for instance accepting and rejecting an input each with probability 1/2. It has to lean overwhelmingly one way or the other. How overwhelming do you want to be? We have a parameter 푘, which we can apply universally to adjust the error probability. By repeating an algorithm many times, we can decrease the error.
Lemma 22.5 (Amplification lemma): For a probabilistic Turing machine 푀 with error probability 휀, where 0 ≤ 휀 < 1/2, and any polynomial 푝(푛), there is a probabilistic Turing machine 푀′ equivalent to 푀 that has error probability 2^(−푝(푛)).
Not only can we get the error probability small, we can get the probability decreasing 
rapidly in terms of 푛. 
Proof sketch. 푀′ on 푤 runs 푀 on 푤 poly(푛) times and outputs the majority answer. 
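A minimal Python sketch of that idea (the noisy base machine here is invented just to have something to amplify):

    import random

    def amplified(machine, w, runs=101):
        # Run the base probabilistic machine many times and take the majority.
        votes = sum(1 for _ in range(runs) if machine(w))
        return votes > runs // 2

    def noisy(w):
        # Toy base machine: answers "len(w) is even" correctly 80% of the time.
        correct = (len(w) % 2 == 0)
        return correct if random.random() < 0.8 else not correct

    print(amplified(noisy, "abcd"))   # True with overwhelming probability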
This motivates the following important definition of a complexity class. 
Definition 22.6: Define
BPP = {퐴 : some probabilistic poly-time TM decides 퐴 with error probability 1/3}.
BPP stands for bounded-error probabilistic polynomial time.
Here, bounded means bounded below 1/2. The 1/3 looks like an arbitrary number, but it doesn't matter. Once you have such a TM you can make the error probability 1/10¹⁰⁰ if you want. All you need about 1/3 is that 1/3 < 1/2.
We can prove PRIMES∈BPP by souping up the algorithm we described appropriately. Now we know PRIMES∈P. Obviously P⊆BPP. (A P-algorithm gives the right answer with error 0.) We still don't know P ?= BPP.
In fact most people believe P=BPP, because of pseudorandomness. If there were some way to compute values for the coin tosses that act as well as truly random coin tosses, then with a bit more work one could prove P=BPP. A lot of progress has been made constructing pseudo-random generators, but they require assumptions such as P̸=NP.
S3 Branching programs 
We turn to a bigger example of a problem in BPP that has a beautiful proof. It has an 
important idea that turned out to be revolutionary in complexity theory. 
We need to define our problem. 
Definition 22.7: A branching program (BP) is a directed graph labeled with variable 
names (possibly repeated) such that the following hold. 
∙ Every node has a label and has 2 outgoing edges 0 and 1, except for two special nodes 
at the end. 
∙ The 2 special nodes are 0 and 1. (Think of them as the output.) 
∙ There is a special start node, and no cycles. 
To use a branching program, make an assignment of the variables to 0’s and 1’s. Once 
you have the assignment, put your finger on the start node. Look at the variable at the node. 
Read the variable’s value, and follow 0 or 1 out. An assignment of variables will eventually 
take you to an output node 0 or 1; that is the output of the program. 
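As a small illustration (the encoding below is ours, not from the lecture), a branching program can be stored as a map from each internal node to its variable and its two outgoing edges, and run by following edges:

    # A read-once branching program for XOR: node -> (variable, 0-edge, 1-edge).
    # '0' and '1' are the two special output nodes.
    xor_bp = {
        'start': ('x1', 'left', 'right'),
        'left':  ('x2', '0', '1'),     # reached when x1 = 0
        'right': ('x2', '1', '0'),     # reached when x1 = 1
    }

    def eval_bp(bp, assignment, start='start'):
        # Put your finger on the start node and follow 0/1 edges until an output.
        node = start
        while node not in ('0', '1'):
            var, edge0, edge1 = bp[node]
            node = edge1 if assignment[var] else edge0
        return int(node)

    print(eval_bp(xor_bp, {'x1': 1, 'x2': 0}))   # 1
    print(eval_bp(xor_bp, {'x1': 1, 'x2': 1}))   # 0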
Here is a branching program. It computes the exclusive or function. 
We want to test whether two different-looking branching programs are equivalent: whether 
they compute the same function. 
Definition 22.8: Define the equality problem for BP's by
EQBP = {⟨퐵1,퐵2⟩ : 퐵1,퐵2 are BP's and compute the same function}.
This is in coNP: when two BP’s are not equivalent, then we can give an assignment on 
which they differ. 
EQBP ∈ coNP. 
In fact it is coNP–complete. There’s not much more we can say without radical consequences 
to other things. 
We consider a special case that disallows a feature that our first BP has. We disallow 
reading the same variable twice on any path. Once we’ve read 푥1, we can’t read 푥1 again. 
Definition 22.9: In a read-once BP, each 푥푖 can appear at most once on a path from the 
start to the output. 
Let’s look at the problem 
EQROBP = {⟨퐵1,퐵2⟩ : 퐵1,퐵2 are read-once BP's and compute the same function}.
This is in coNP, but it’s not known to be complete. (It is not known to be P, but known to 
be in BPP. It would probably not be coNP–complete.) 
Our main theorem is the following. 
Theorem 22.10: EQROBP ∈BPP. 
Our first approach is to run the 2 BP’s on random inputs. But that’s not good enough 
to give a BPP algorithm: we can only run on polynomially many out of exponentially many 
input values, and see if they ever do something different. But you can construct branching 
programs 퐵1 and 퐵2 that agree everywhere except at 1 place. They are obviously not 
equivalent. But if you run them on random input, the chance of finding that disagreement 
is low. Even if you run polynomially many times, you’re likely not to see the disagreement, 
and you would think they’re not equivalent. 
We need to make the chance of finding the disagreement at least 1 
2 3, or some fixed value 
greater than 1 
2 . 
Instead we’ll do something totally crazy. Instead of setting the 푥푖’s to 0’s and 1’s, we’ll 
set them to other values: 2, 3’s, etc. What does that mean? The whole problem is to define 
it. We extend in some algebraic way to apply to nonboolean input, and a single difference 
gets magnified into an overwhelming difference. 
This is worth doing, because the math ideas behind the proof are important.
We’ll give a taste of the proof, and finish it next time. 
Now 푥1 could be given the value 2. We’ll blend 0’s and 1’s together. It uses the following 
important technique, called arithmetization. We want to convert a Boolean model of 
computation into arithmetic operations that simulate boolean ones. 
For instance consider ∧,∨. We want to simulate these using arithmetic operations that 
operate on boolean variables the same way. We want to use +,× but get the same answer. 
푎 ∧ 푏 → 푎푏 
¬푎 → 1 − 푎 
푎 ∨ 푏 → 푎 + 푏 − 푎푏. 
Our first step is to convert the branching program, writing it out in terms of and's, or's, and negations. We express the program as a circuit in terms of and's and or's. Then we convert to +'s and ×'s, so that the program still simulates faithfully when given boolean inputs, but now has meaning for nonboolean inputs. That's the whole point. There is analysis that we have to work through, but this sets the stage.
Lecture 23 
Thu. 10/11/12 
We are going into more advanced topics. Last time we talked about 
∙ probabilistic computation 
∙ BPP 
Today we’ll see that 
∙ EQROBP ∈BPP. 
Unlike PRIMES, this is not known to be in 푃. A read-once branching program looks like 
this. (Ignore the blue 1’s for now.) 
S0 Homework 
Problem 1: 
Using padding, we can relate different unsolved problems: EXP vs. NEXP to P vs. NP. 
Problem 2: 
This is on the nondeterministic time hierarchy. It has a short answer, but you have to see what's going on in the proof to see why it doesn't work. There is a nondeterministic time
hierarchy, but you need a fancier method of proof, to overcome this problem. A famous 
paper by Steve Cook shows how to overcome it. 
S1 EQROBP 
In the figure, 퐵1 is the constant-1 branching program. The only way 퐵2 can output 0 is if 
everything is 0. It computes the OR function. 퐵1 and 퐵2 almost compute the same function; 
they agree everywhere except the place where everything is 0. 
If we run the programs on polynomially many inputs, the chance that we land on the
single input where they differ is exponentially small. We need exponentially many of them 
to have a shot. The straightforward algorithm for testing equivalence is not going to run in 
polynomial time. 
The key idea is to run on other numbers, using arithmetization. This technique is also 
used in error-correcting codes and other fancy things. 
We will simulate ∧,∨ with +,×. 
푎 ∧ 푏 → 푎푏
¬푎 → 1 − 푎
푎 ∨ 푏 → 푎 + 푏 − 푎푏 → 푎 + 푏 if 푎, 푏 not both 1.
(For 푎 ∨ 푏, we can use 푎 + 푏 if 푎 and 푏 are never both 1.)
We first re-represent the branching program with and's and or's.
Let’s think about running the branching program slightly differently. It is a boolean 
evaluation: we give a boolean assignment to every one of the nodes and edges. Put a 1 on 
every node and edge that you follow and 0 on everything you don’t. 
Every path starts at 푥1, so we assign that node a 1. Let's say 푥1 = 1; we write 1 on the edge going to 푥3 and 0 on the edge going to 푥2, to say we didn't go that way. Put 1 on 푥3. Let's say 푥3 is 0. Then we put 1 on the 0-edge out of 푥3. Everything else is 0.
The advantage is that we can write a boolean expression corresponding to what 1’s we 
write down. Suppose we have a node 푥푖. 
We need a boolean expression to say which edge we went down. Say the node labeled 푥푖 has value 푎. On its 1-edge we'll put 푎 ∧ 푥푖. Why does that make sense? The only way we'll go down that edge is if we went through the node and 푥푖 = 1. On the 0-edge we write 푎 ∧ ¬푥푖.
This tells us how to label the edges. How do we label the nodes? Suppose 푎1, 푎2, 푎3 label 
edges going to a node. Then we label the node with 푎1 ∨ 푎2 ∨ 푎3. The start gets 1 and the 
output is the value of the 1 node (푎 =output). 
Now let’s redo it with + and × instead of ∨ and ∧. There are no cycles; the path can 
enter every node in exactly one way. Thus we never have more than 1 푎푖 set to 1. Thus for 
the “or” we don’t need the correction term, and we can just add, 푎 + 푏. 
Using the arithmetization, we can assign non-Boolean values, and there is perhaps some 
nonsensical result that comes out. Remember that we wrote down the branching program for 
parity (exclusive or), for instance, 푥1⊕푥2. Have you ever wondered what 2⊕3 is? Probably 
not. Let’s plug in 푥1 = 2 and 푥2 = 3 into the arithmetized version of this branching program. 
Let’s see what happens. Plug in 1 at the start node. If we assigned 푥1 = 0 and 푥2 = 1, 
everything work out as before. But now can give values even if we don’t have 0’s and 1’s 
coming in. We get the following values. 
We get 2 ⊕ 3 = −7. (We haven’t discovered some fundamental fact about exclusive or. 
There’s no fundamental meaning to this.) 
Let’s summarize. Originally we thought of a running a BP as following some path. That 
way of thinking doesn’t lend itself to arithmetization. Instead of thinking about taking a 
path, think about evaluating a branching program by assigning values to all nodes by the 
procedure, and looking at 1 node. There is no path, but this way of thinking is equivalent. 
We can look at the value on the output node even if the input nodes didn’t have 0/1 values 
coming in. 
If we had a different branching program representation of the same boolean formula (say, xor), would we get a different value? No. As you will see from the coming proof, if we have
a different representation that is still read-once, and it agrees on the boolean case, then it 
agrees on the non-boolean case. This is not true with a general branching program! 
As an example, if we flip 푥1, 푥2 in the xor program we get the same value for 2 ⊕ 3. 
Finally, here is the probabilistic algorithm. 
Proof. Let 푀 =“on ⟨퐵1,퐵2⟩, 
1. Randomly select non-Boolean values for 푥1, . . . , 푥푚 from the finite field F푞 = {0, 1, . . . , 푞 − 1} where 푞 is prime (this is modular arithmetic modulo 푞). Choose 푞 > 3푚.
2. Compute 퐵1,퐵2 (arithmetized) on 푥1, . . . , 푥푚. 
3. Accept if we get the same output value. Reject if we do not.” 
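Below is a Python sketch of this algorithm on the toy XOR program from before (the encoding and names are ours). It evaluates a read-once BP by the node/edge labeling rule of this lecture: the start node gets 1, a node with value 푎 labeled 푥푖 sends 푎 · 푥푖 along its 1-edge and 푎 · (1 − 푥푖) along its 0-edge, each node's value is the sum of its incoming edge values, and the answer is the value of output node 1. It reproduces 2 ⊕ 3 = −7 and then runs the randomized test over F푞.

    import random

    xor_bp = {
        'start': ('x1', 'left', 'right'),
        'left':  ('x2', '0', '1'),
        'right': ('x2', '1', '0'),
    }

    def topo_order(bp, start):
        # Reverse DFS postorder of a DAG gives a topological order.
        order, seen = [], set()
        def visit(n):
            if n in seen or n not in bp:
                return
            seen.add(n)
            _, e0, e1 = bp[n]
            visit(e0); visit(e1)
            order.append(n)
        visit(start)
        return reversed(order)

    def eval_arith(bp, values, q=None, start='start'):
        val = {n: 0 for n in list(bp) + ['0', '1']}
        val[start] = 1
        for node in topo_order(bp, start):
            var, e0, e1 = bp[node]
            a, x = val[node], values[var]
            val[e1] += a * x            # 1-edge carries a * x_i
            val[e0] += a * (1 - x)      # 0-edge carries a * (1 - x_i)
        out = val['1']
        return out % q if q else out

    def probably_equivalent(bp1, bp2, variables, q=31):
        # The BPP test: pick random values in F_q (q prime, q > 3m) and compare.
        vals = {v: random.randrange(q) for v in variables}
        return eval_arith(bp1, vals, q) == eval_arith(bp2, vals, q)

    print(eval_arith(xor_bp, {'x1': 2, 'x2': 3}))        # -7, as in the lecture
    xor_swapped = {'start': ('x2', 'left', 'right'),
                   'left':  ('x1', '0', '1'),
                   'right': ('x1', '1', '0')}
    print(eval_arith(xor_swapped, {'x1': 2, 'x2': 3}))   # -7 again
    print(probably_equivalent(xor_bp, xor_swapped, ['x1', 'x2']))   # True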
Now we have to prove this works. 
We claim the following. 
1. If 퐵1,퐵2 are equivalent then 푃(푀 accepts) = 1. (If they agree then they agree on 
boolean values. We’ll prove they agree even on nonboolean values.) 
2. If 퐵1,퐵2 are not equivalent then 푃(푀 rejects) ≥ 2/3.
We prove statement 1. This is the hard part. 
We take the input variables and keep them as variables 푥푖; do the calculation symbolically. 
We’ll write down expressions like 푥1, 1 − 푥1, and so forth. At every step we’re multiplying 
things like 푥푖 or (1 − 푥푖), or adding together terms. At the output node 1 we have some 
polynomial in 푥1, . . . , 푥푚. 
Evaluating 퐵1,퐵2 symbolically in the arithmetized version, we get polynomials 푃1, 푃2 on 
푥1, . . . , 푥푚. These polynomials have a very nice form: They all look like products of 푥푖’s and 
(1 − 푥푖)’s, added up, for instance 
푥1(1 − 푥2)푥3푥4(1 − 푥5) · · · 푥푚 
+ (1 − 푥1)푥2푥3(1 − 푥4) · · · 
+ · · · . 
In each summand, we never get the same variable appearing more than once because of the
read-once property. How do we know we get every variable appearing once? We can always 
pad out a branching program by adding missing variables, to turn it into a “read exactly 
once” branching program. 
Both 푃1, 푃2 look like this. Why is it nice? It’s the truth table of the original program on 
Boolean values. The summands give the rows of the Boolean truth table. If two BP’s agree 
in the Boolean world, they have the same truth table and hence the same polynomial, and 
agree everywhere. 
(There is an exponential number of rows, but this doesn’t matter: when we run the 
algorithm, we don’t calculate the polynomial, which takes exponential time. We get a 
specific value of the polynomial by plugging in values and computing things on the fly.) 
Part 2 uses a magical fact about polynomials. 
Lemma 23.1: If 푃(푥) is a nonzero polynomial of degree at most 푑, then 푃(푥) has at most 푑 zeros. (This is true in any field, in particular F푞.)
The probabilistic version is the following: if you pick 푥 ∈ F푞 at random, Prob[푃(푥) = 0] ≤ 푑/푞. Lemma 23.2 is the multivariate version.
Lemma 23.2 (Schwartz-Zippel): If 푃(푥1, . . . , 푥푚) is nonzero, each 푥푖 has degree at most 푑, and you pick each 푥푖 ∈ F푞 randomly, then
Prob[푃(푥1, . . . , 푥푚) = 0] ≤ 푚푑/푞.
This is proved from the single-variable case by induction. 
Remember we had 2 polynomials 푃1, 푃2? Let’s look at the difference 푃 := 푃1−푃2. If the 
branching programs are not equivalent, then the difference of the polynomials is nonzero. 
That nonzero polynomial has few roots. 푃 is zero in very few places, so 푃1, 푃2 agree in very 
few places. When we run the arithmetization of 푃1, 푃2, we’re unlikely to get the same value 
coming out. It’s very likely we’ll get different values coming out, and very likely we’ll reject. 
For our 푃 = 푃1 − 푃2, what is 푑? Every variable appears once, hence 푑 = 1. 푚 is the number of variables, and 푞 > 3푚, so the probability is at most 1/3. The chance we get an agreement between 푃1 and 푃2 is at most 1/3, so the chance we get a disagreement is at least 2/3.
Through arithmetization—converting boolean formulas to a polynomial and then running on randomly selected nonboolean inputs—we can magnify the chance that a probabilistic algorithm works.
This is a nice probabilistic algorithm. We’ll use this method again in the last two lectures, 
where we’ll prove amazing results about satisfiability using interactive proof systems. 
Lecture 24 
Thu. 12/6/12 
The final exam is Wednesday December 19, 9-12 in Walker. It is open book, notes, and 
handouts. It covers the whole semester with emphasis on the 2nd half. It is a 3-hour version 
of the midterm with some short-answer questions. 
Handout: sample questions. 
Last time we showed EQROBP ∈ 퐵푃푃 and today we’ll talk about 
∙ Interactive Proofs 
∙ IP 
S1 Interactive proofs 
1.1 Example: Graph isomorphism 
We’ll move into the very last topic, an amazing idea: the interactive proof system. It’s 
a probabilistic version of NP, the same way BPP is a probabilistic version of P. Another 
amazing thing is that it goes against the intuition of NP we built up during the term: If a 
problem is in NP, it has short certificates, so a prover can convince a verifier about a certain 
fact, membership in the language. 
Using the idea of interactive proof, a prover can convince a verifier about a certain fact 
even though there are no short certificates. 
We’ll start this off with a famous example: testing whether or not graphs are isomorphic: 
ISO = {⟨퐺1,퐺2⟩ : 퐺1,퐺2 graphs,퐺1 ≡ 퐺2} . 
Two graphs are isomorphic iff we can match up the nodes so that edges go between corresponding nodes. It is clear ISO∈NP: just give the matching. It is one of the rare (combinatorial) problems in NP that is neither known to be in P nor known to be NP-complete. Almost every other problem is either known to be in P or NP-complete, except for a bunch of problems related to number theory.
The graph isomorphism problem is the most famous such problem. There is a huge literature trying to prove it one way or the other, with no success yet. Define NONISO to be the complement of ISO.
Is NONISO∈NP? Not known. It seems one has to search astronomically through all permutations to determine non-isomorphism. Is there a short certificate, or do you essentially have to go through the same process again?
There is a way for you to convince me of the fact, provided you have sufficient computational power at your disposal. Here is a whimsical version of an interactive proof system: The prover
has unlimited computational power but is not trustworthy. The verifier checks the prover. 
The prover is like an army of slaves, also called graduate students. The verifier is the king,
sometimes called the professor. The grad students (slaves) stay up all night, and have 
unlimited computational power. The professor only operates in probabilistic polynomial 
time. The professor has a research problem: are these graphs isomorphic? The grad students 
get to work, with their fancy computers. They find: Yes, they're isomorphic! The professor knows that grad students are basically honest folks, but they have lots of other things to worry about, like XBox. The prof needs to be convinced, and be sure what the answer is. If the
grad students say yes, the professor says: convince me, and the grad students give the 
isomorphism. Suppose the grad students say the graphs are nonisomorphic. The professor 
asks for a proof. 
There is a simple protocol they can go through with the professor to convince the professor that the graphs are non-isomorphic. This was established back in the mid-1980's. László Babai, a leading expert in graph isomorphism, was flabbergasted.
Both the professor and students have access to the 2 graphs. The professor takes 2 
graphs, turns around secretly, chooses one of 퐺1,퐺2 at random, and randomly permutes the 
vertices. The professor asks, “Is the graph I picked 퐺1 or 퐺2?” If the grad students can 
answer reliably, then they must be nonisomorphic. If they are isomorphic, it could have 
come from either one, and there is no way to tell which one the prof picked; the best thing 
one can do is guess. If the graphs really were different, the students can use a supercomputer to determine which one the professor picked: The graph can only be isomorphic to one of 퐺1,퐺2.
The professor does this several times. If the students can answer the question 100 times 
correctly in a row, either they are legitimately doing the protocol, or they’re incredibly lucky. 
In fact, interactive proof systems can show formulas are unsatisfiable. The proof is more 
complicated. This gives all of coNP doable with interactive proof system! 
We know ISO∈NP, but we don’t know whether NONISO∈NP. But we can convince 
someone of non-isomorphism if we have enough computational power. We extend NP to a 
bigger class, where we can convince a verifier of membership in languages beyond NP. 
Interactive proof systems play a big role in cryptography: here the prover is limited in 
some way, but has special information (a password), and has to convince someone that he 
has the password without revealing the password. 
1.2 Formal model of interactive proofs 
We write down a formal model. 
Definition 24.1: Let 푃 be a prover with unlimited computational power. Let 푉 be a verifier 
with probabilistic polynomial time computational power. Let (푃 ↔ 푉 ) be an interaction 
where 푃 and 푉 exchange polynomially many messages (both given input 푤) until 푉 says 
accept or reject. 
We say that 퐴 ∈ IP if there are 푉 and 푃 where for every 푤 ∈ 퐴,
Prob[(푃 ↔ 푉 ) = accept] ≥ 2/3,
and for every 푤̸∈ 퐴 and every prover 푃̃,
Prob[(푃̃ ↔ 푉 ) = reject] ≥ 2/3.
To show a language is in IP, we set up a verifier and prover. For every string in the language, working together, the prover gets the verifier to accept with high probability. If the string is not in the language, then no matter what prover you choose (푃̃ is a cheating prover trying to make the verifier accept when she shouldn't), rejection is the likely outcome.
Theorem 24.2: NONISO∈IP. 
Proof. We write the NONISO protocol with this setup in mind. On input ⟨퐺1,퐺2⟩, 
V: Choose 퐺1 or 퐺2 at random. Then randomly permute and send result to 푃. 
P: Replies: which 퐺푖 did 푉 choose? 
Repeat twice. 
V: Accept if 푃 is correct both times. 
Reject if 푃 is ever wrong. 
If 퐺1̸≡ 퐺2 then
Prob[(푉 ↔ 푃) accepts] = 1.
The honest prover can tell which 퐺푖 the verifier picked by detecting whether it is isomorphic to 퐺1 or 퐺2.
The honest prover is only in play when ⟨퐺1,퐺2⟩ is in the language. Now the sneaky prover steps in: I'll take a shot at it. If 퐺1 ≡ 퐺2, then the sneaky prover (pretending 퐺1̸≡ 퐺2) can't do anything; it can only guess. The probability it guesses right twice is 1/4.
Thus if 퐺1 ≡ 퐺2, then for any 푃̃,
Prob[(푉 ↔ 푃̃) accepts] ≤ 1/4.
This shows NONISO∈IP.
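For concreteness, here is a toy Python simulation of this protocol on tiny graphs (all of the encoding is ours, and the "honest prover" simply brute-forces isomorphism to stand in for unlimited computational power):

    import random
    from itertools import permutations

    def permute(graph, perm):
        # Relabel an undirected graph (a set of edges) by a vertex permutation.
        return frozenset(tuple(sorted((perm[u], perm[v]))) for (u, v) in graph)

    def isomorphic(g1, g2, n):
        return any(permute(g1, p) == g2 for p in permutations(range(n)))

    def verifier_round(g1, g2, n, prover):
        i = random.randrange(2)                  # verifier's secret coin toss
        perm = list(range(n)); random.shuffle(perm)
        h = permute([g1, g2][i], perm)           # scrambled copy of the chosen graph
        return prover(g1, g2, n, h) == i         # did the prover identify i?

    def honest_prover(g1, g2, n, h):
        return 0 if isomorphic(g1, h, n) else 1

    G1 = {(0, 1), (1, 2), (0, 2)}   # a triangle
    G2 = {(0, 1), (1, 2)}           # a path: not isomorphic to G1
    print(all(verifier_round(G1, G2, 3, honest_prover) for _ in range(100)))  # True

A cheating prover facing isomorphic graphs could do no better than guessing 푖, so it would fail a repeated test with high probability.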
Proposition 24.3: NP⊆IP and BPP⊆IP.
Proof. For NP⊆IP, the prover sends the certificate to the verifier. This is just a 1-way conversation. The verifier checks the certificate.
For BPP⊆IP, the verifier doesn't need the prover; the verifier can do it all by his or her lonesome self.
S2 IP=PSPACE 
Now we’ll prove the amazing theorem. This blew everything away when it came out, I 
remember that. 
Theorem 24.4: IP=PSPACE. 
What does this mean? Take the game of chess, or some game where you can test in polynomial space which side has a forced win. It takes an ungodly amount of time to go through the search tree, but in relatively small space you can show (say) that white has a forced win.
There is probably no short certificate, but if Martians with supercomputers have done all 
computations, they could convince mere probabilistic time mortals like us that white has a 
forced win without us going through the entire game tree. 
We’ll prove a weaker version, coNP⊆IP. This was discovered first, contains pretty much 
all the ideas, and is easier to describe. The full proof of IP=PSPACE is in the textbook. 
It’s enough to work with satisfiability, show the prover can convince the verifier that a 
formula is not satisfiable. 
The amazing thing is the technique. We’ll use arithmetization as we did before. 
2.1 Aside 
Every few months I get an email or letter claiming to resolve P vs. NP. The first thing I look at is which way they claim it goes. If the person claims P=NP, I don't even look at it. It is probably some horrible algorithm with accompanying code.
I tell them, then you can factor numbers. Here’s a website with various numbers known 
to be composite, where no one knows the factorization. Just factor one of them. That 
usually shuts them up, and I never hear from them again. 
If they claim P̸=NP, then almost without exception, their proof goes like this. They claim, clearly
any algorithm for SAT, etc. has to operate in the following way... Then they give a long 
analysis that shows it has to be an exponential algorithm. The silly part is the “clearly.” 
That’s the whole point. How do you know you can’t do something magical; plug the input 
through a Fourier transform and do some strange things, and have the answer pop out. 
You have to prove no such crazy algorithms exist. 
The cool thing about the IP protocol is that it does something crazy and actually works. 
2.2 coNP⊆IP 
Proof of coNP⊆IP. For a formula 휑 let #휑 be the number of satisfying assignments of 휑. 
Note #휑 will immediately tell you whether 휑 is satisfiable. 
Define number-SAT (sharp-SAT) by 
#SAT := {⟨휑, 푘⟩ : #휑 = 푘} . 
This is not known to be in NP. (It would be in NP for small 푘. However, if there are 
exponentially many satisfying assignments, naively we’d need an exponential size certificate.) 
However, we show 
#푆퐴푇 ∈ IP. 
We’ll set up a little notation. Fix 휑. Let 
휑(푥1, . . . , 푥푚) =⎧⎨⎩ 
0, unsatisfying 
1, satisfying 
Let 
푇() = Σ︁ 푥푖∈{0,1} 
휑(푥1, . . . , 푥푚). 
Note 푇() = #휑 is the number of satisfying assignments. Add 1 every time satisfying, 0 if 
not satisfying assignments. 
Define 
푇(푥1, . . . , 푥푗) = Σ︁ 푥푖∈{0,1}, 푖푗 
휑(푥1, . . . , 푥푚). 
We are presetting some of the values of the variables, and counting the number of satisfying assignments subject to those presets. Thus
푇(푥1, . . . , 푥푗) = #휑푥1,...,푥푗
where 휑0 = 휑 with 푥1 = 0, 휑01 = 휑 with 푥1 = 0, 푥2 = 1, and so forth. In particular, since we assign values to all of the 푥푖, 푇(푥1, . . . , 푥푚) is 0 or 1.
We have the following relations. 
푇() = #휑 
푇(푥1, . . . , 푥푚) = 휑(푥1, . . . , 푥푚) 
푇(푥1, . . . , 푥푗) = 푇(푥1 . . . 푥푗0) + 푇(푥1 . . . 푥푗1). 
To see the last equation, note the number of satisfying assignments with 푥1, . . . , 푥푗 preset is the sum 
of the number of satisfying assignments additionally satisfying 푥푗+1 = 0 and the number 
additionally satisfying 푥푗+1 = 1, because one of these two has to hold. 
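These definitions are easy to mirror directly in code. Here is a minimal brute-force Python sketch (the three-variable formula `phi` is a made-up example, not from the notes); the assertions check the relations above.

```python
from itertools import product

def T(phi, m, preset=()):
    """Sum phi over all 0/1 settings of the variables that are not preset.
    T(phi, m) is #phi; with all m variables preset it is phi itself (0 or 1)."""
    free = m - len(preset)
    return sum(phi(*preset, *rest) for rest in product((0, 1), repeat=free))

# Hypothetical example formula: phi = (x1 OR x2) AND (NOT x1 OR x3)
phi = lambda x1, x2, x3: int((x1 or x2) and ((not x1) or x3))

assert T(phi, 3) == T(phi, 3, (0,)) + T(phi, 3, (1,))                    # T() = T(0) + T(1)
assert T(phi, 3, (0, 1)) == T(phi, 3, (0, 1, 0)) + T(phi, 3, (0, 1, 1))  # T(01) = T(010) + T(011)
print(T(phi, 3))   # #phi = 4 satisfying assignments for this toy formula
```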
We set up the #SAT protocol. (Our first version will have a little problem, as we will 
see.) Suppose the input is ⟨휑, 푘⟩. The prover is supposed to make the verifier accept with 
high probability. 
0. P: Sends 푇(), 푉 checks 푘 = 푇(). (Reject if things don’t check out.) 
1. P: Sends 푇(0) and 푇(1). 푉 checks that 푇() = 푇(0) + 푇(1). 
2. P: Sends 푇(00), 푇(01), 푇(10), 푇(11). 푉 checks 푇(0) = 푇(00) + 푇(01) and 푇(1) = 
푇(10) + 푇(11). (This is exponential, which is a problem. But humor me.) 
... 
푚. P: Sends all 2^푚 values 푇(푏1 · · · 푏푚) for 푏1 · · · 푏푚 ∈ {0, 1}^푚. V checks, for every string 푏1 · · · 푏푚−1 of length 푚 − 1, that 푇(푏1 · · · 푏푚−1) = 푇(푏1 · · · 푏푚−1 0) + 푇(푏1 · · · 푏푚−1 1). 
푚 + 1. V checks 푇(0 . . . 0) = 휑(0 . . . 0), . . . , 푇(1 . . . 1) = 휑(1 . . . 1), and accepts if all of these hold. 
Think of this as a tree. 
This algorithm might seem trivial, but it’s important to understand the motivations. 
An honest prover sends the correct values. Suppose we have a dishonest prover: if 푘 is wrong, 
the prover tries to convince the verifier to accept anyway, so it must send a wrong value for 
푇(). 
This is like asking a kid questions, trying to ferret out a lie. One lie leads to other lies. 
(But to the kid things may look locally consistent...) There must be a lie on at least one of 
the two branches. At least one lie must propagate down at each step, all the way to a lie at the 
bottom, which the verifier catches. 
The only problem is the exponential tree. You can imagine trying to do something 
probabilistic. Instead of following both branches, let’s pick a random branch to follow. 
You’re a busy parent; you can’t check out all possible things your kid is saying. Pick one. 
Choose one branch. But you want a high probability of detecting the cheating. If you pick a 
random branch, then with 50–50 chance you step off the lying side onto the honest 
side, and the prover is saved. The prover thinks, “You’re not going to catch me now,” and behaves 
honestly all the way down. That is not good enough: 
the dishonest prover should only be able to make the verifier accept with low probability. 
Instead we pick non-boolean values. We arithmetize the whole setup, and reduce to one 
randomly chosen non-boolean case. We only have to follow a single line of these non-boolean 
values down. Again we rely on the magic of polynomials. 
If the prover lied, then in almost all of the non-boolean values we could pick, there will 
be a lie. A lie leads to another lie almost certainly. The rest of the protocol is set up in 
terms of arithmetization. Arithmetize everything and everything just works. We finish next 
time. 
Lecture 25 
Tue. 12/11/2012 
Last time we talked about 
∙ interactive proofs 
∙ IP. 
Today we’ll finish the proof of coNP⊆IP. 
A prover with unlimited computational power tries to convince a verifier that a string is 
in the language. For a string in the language, the prover will convince the verifier with high 
probability. For a string not in the language, that prover, or any other prover, will fail with 
high probability. 
The big result is IP=PSPACE; we prove a weaker form, coNP⊆IP. (It was around half a 
year before Adi Shamir found the trick to go from coNP⊆IP to IP=PSPACE.) 
S1 coNP⊆IP 
Last time we introduced an exponential protocol for #SAT, a coNP-hard problem. 
This protocol doesn’t use the full power of IP. It is a one-way protocol, like NP: the 
verifier doesn’t send the prover any questions. 
Using arithmetization, we find a polynomial that faithfully simulates the expression when 
we plug in 0’s and 1’s. The degree of the polynomial is not too big. 
푎 ∧ 푏 → 푎푏 
¬푎 → 1 − 푎 
푎 ∨ 푏 → 푎 + 푏 − 푎푏 
휑 → 푃휑(푥1, . . . , 푥푚) 
The total degree of the polynomial will be at most the length 푛 of 휑: when we combine two 
expressions the degrees will at most be the sum. 
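As a sketch of how this translation might be implemented (the nested-tuple representation of formulas and the prime 101 are illustrative choices of mine, not from the notes), the rules above become a short recursive evaluator for 푃휑:

```python
def eval_arith(node, point, q):
    """Evaluate the arithmetized polynomial P_phi at `point` (a dict mapping
    variable index -> field element), working modulo the prime q.
    Formulas are nested tuples: ('var', i), ('not', f), ('and', f, g), ('or', f, g)."""
    op = node[0]
    if op == 'var':
        return point[node[1]] % q
    if op == 'not':                              # NOT a   ->  1 - a
        return (1 - eval_arith(node[1], point, q)) % q
    a = eval_arith(node[1], point, q)
    b = eval_arith(node[2], point, q)
    if op == 'and':                              # a AND b ->  ab
        return (a * b) % q
    if op == 'or':                               # a OR b  ->  a + b - ab
        return (a + b - a * b) % q
    raise ValueError("unknown operator: %r" % (op,))

# On boolean inputs the polynomial agrees with the formula itself:
phi = ('and', ('or', ('var', 1), ('var', 2)), ('not', ('var', 1)))   # (x1 OR x2) AND (NOT x1)
print(eval_arith(phi, {1: 0, 2: 1}, q=101))   # 1: this boolean assignment satisfies phi
print(eval_arith(phi, {1: 7, 2: 3}, q=101))   # 66: some non-boolean value of P_phi over F_101
```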
Instead of reducing the verification of one 푇-value to two 푇-values, we reduce it to one 
푇-value but one that is non-boolean. The formulas will have other values when you plug in 
other values. 
푇() = 푘 ⟶ 푇(?) ⟶ 푇(?, ?) ⟶ 푇(?, ?, ?) ⟶ · · · 
We arithmetize 푇. It looks just like it did before, but instead of using the formula, we 
use the polynomial that represents the formula: 
푇(푥1, . . . , 푥푖) = Σ_{푥푖+1,...,푥푚 ∈ {0,1}} 푃휑(푥1, . . . , 푥푚). 
If we preset to 0’s and 1’s, we get the same value because the polynomial agrees with the 
boolean formula on boolean values. 
If we preset nothing, there is no change: 푇() is the number of satisfying assignments. 
Everything is added up over booleans. If we set everything, we have possibly non-boolean 
values, and 
푇(푥1, . . . , 푥푚) = 푃휑(푥1, . . . , 푥푚). 
We now give the protocol. This is where the magic happens. We’ll work over some finite 
field F푞, where 푞 > 2^푛. The reason we make it so big is that 푘 can be a value between 0 and 
2^푛. We will have wraparound issues if we use a field that can’t represent all these possible 
values. 
0. P sends 푇(). 푉 checks 푘 = 푇(). 
1. P sends 푇(푧) as a polynomial in the variable 푧. (More formally, 푃 sends the coefficients. 
Note that the degree in 푧 is at most |휑|. Each number has at most 푚 bits, and there 
are at most |휑| + 1 coefficients. Of course calculating this is difficult, but that’s okay: 
this is the prover. The grad students work hard. They don’t get paid for their work 
beyond their stipend, which is polynomial, so doesn’t matter. They send an answer 
which is polynomial.) 
V checks 푇(0) and 푇(1) are correct by checking 
푇() = 푇(0) + 푇(1). 
Note the nice thing is that one object allows us to get two values. This will prevent 
the blowup. 
V sends a random 푟1 ∈ F푞. The prover now has to show 푇(푟1) is correct. 
2. P sends 푇(푟1, 푧) as a polynomial in 푧. (푃 convinces 푉 that 푇(푟1) is correct.) 
V checks 푇(푟1) = 푇(푟1, 0) + 푇(푟1, 1), then sends a random 푟2 ∈ F푞. 
... 
푚. P sends 푇(푟1, . . . , 푟푚−1, 푧) as a polynomial in 푧. V checks 푇(푟1, . . . , 푟푚−1) = 푇(푟1, . . . , 푟푚−1, 0) + 
푇(푟1, . . . , 푟푚−1, 1). 푉 chooses a random 푟푚 ∈ F푞. 
푚 + 1. V checks 푇(푟1, . . . , 푟푚) = 푃휑(푟1, . . . , 푟푚), and if so, accepts. 
How does the verifier check, at the end stage, that the final value is okay? Plug it into 푃휑. 
푇() = 푘 ⟶ 푇(푟1) ⟶ 푇(푟1, 푟2) ⟶ · · · ⟶ 푇(푟1, . . . , 푟푚) = 푃휑(푟1, . . . , 푟푚) 
The honest prover will make the verifier accept with probability 1: just follow the protocol and 
send the correct polynomials. 
The verifier says, “Convince me the polynomial is right by convincing me it works on some 
random element.” 
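Putting the pieces together, here is a compact, self-contained Python sketch of the protocol under illustrative assumptions of my own (a tiny hard-coded formula, a per-variable degree bound of 3 standing in for the bound |휑|, and the prime 2^13 − 1 standing in for a field of size greater than 2^푛). The honest prover is simulated by computing 푇 by brute force; each round’s univariate polynomial is sent as its values at a few points, and the verifier evaluates it at the random challenge by Lagrange interpolation (this needs Python 3.8+ for the modular inverse).

```python
import random
from itertools import product

def P_phi(xs, q):
    """Arithmetization of the hypothetical formula
    phi = (x1 OR x2) AND (NOT x1 OR x3); its degree in each single variable is <= 2."""
    x1, x2, x3 = xs
    OR  = lambda a, b: (a + b - a * b) % q     # a OR b  -> a + b - ab
    NOT = lambda a: (1 - a) % q                # NOT a   -> 1 - a
    return (OR(x1, x2) * OR(NOT(x1), x3)) % q  # AND     -> product

def T(prefix, m, q):
    """T(x1..xj): sum P_phi over all 0/1 settings of the remaining variables."""
    free = m - len(prefix)
    return sum(P_phi(list(prefix) + list(rest), q)
               for rest in product((0, 1), repeat=free)) % q

def lagrange_eval(values, x, q):
    """Evaluate, at x, the unique polynomial of degree < len(values) that takes
    these values at the points 0, 1, ..., len(values)-1 (mod the prime q)."""
    result = 0
    for i, yi in enumerate(values):
        num, den = 1, 1
        for j in range(len(values)):
            if j != i:
                num = num * (x - j) % q
                den = den * (i - j) % q
        result = (result + yi * num * pow(den, -1, q)) % q
    return result

def sumcheck(k, m, q, deg_bound):
    """#SAT protocol with an honest prover (played here by brute-force T)."""
    claim = k % q                              # current claim about T(r1..r_{j-1})
    rs = []
    for _ in range(m):
        # P: send T(r1..r_{j-1}, z) as its values at z = 0, 1, ..., deg_bound.
        poly = [T(rs + [t], m, q) for t in range(deg_bound + 1)]
        # V: check consistency with the previous claim, T(...) = T(...,0) + T(...,1).
        if (poly[0] + poly[1]) % q != claim:
            return False
        r = random.randrange(q)                # V: random challenge r_j in F_q
        claim = lagrange_eval(poly, r, q)      # new claim: T(r1..r_j)
        rs.append(r)
    # V: final check against the arithmetized formula itself.
    return claim == P_phi(rs, q)

q = 2**13 - 1     # a prime comfortably larger than the maximum possible count 2^m
print(sumcheck(4, m=3, q=q, deg_bound=3))   # True: phi has exactly 4 satisfying assignments
print(sumcheck(5, m=3, q=q, deg_bound=3))   # False: honest values never support a wrong count
```

Of course a real prover cannot afford brute force either, but that is the prover’s problem, not the verifier’s: everything the verifier does here is polynomial.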
Why does this work? Why are we using polynomials? Let’s see what happens when the 
prover tries to lie. If 푘 is wrong the verifier will reject with high probability. In order to 
preserve any hope of making the verifier accept, the prover has to lie. If 푇() is a lie, then one 
of 푇(0), 푇(1) has to be wrong. But these came from the same polynomial, by plugging in 0 and 
1. So the polynomial is wrong. We evaluate that wrong polynomial at a random input. But 
two low-degree polynomials agree in only a small number of locations, because a polynomial of 
low degree has only a small number of roots: every place where the claimed and actual polynomials agree becomes 
a root of their difference polynomial. 
The claimed 푇(푟1) doesn’t necessarily have to be a lie. But honesty at this point is very unlikely: there are only a small number of 
agreements (roughly 푛) out of exponentially many possible choices of 푟1. 
So the claimed 푇(푟1) is almost certainly wrong. The dishonest prover then tries to convince the verifier that 
this 푇(푟1) is right. The prover again has a chance of getting lucky: the verifier might pick a 
place where the incorrect and correct polynomials agree. But at every step it’s hard to succeed. 
Almost certainly an incorrect 푇(푟1) forces an incorrect 푇(푟1, 푟2), and so forth. 
1.1 Analysis of protocol 
If ⟨휑, 푘⟩ ∈ #푆퐴푇, then 
Prob(푉 ↔ 푃 accepts) = 1. 
If ⟨휑, 푘⟩ ̸∈ #푆퐴푇, then by the Schwartz–Zippel Lemma 23.2, for any prover P̃, 
Prob(푉 ↔ P̃ accepts) ≤ 푚 · deg(푃휑)/푞 ≤ 푚푛/2^푛 = poly(푛)/2^푛. 
The prover has 푚 chances to get lucky. If it gets lucky, it can then follow the honest protocol: just 
send the correct values all the way down. The probability of getting lucky at any one stage is at most 
the degree of the polynomial divided by the size 푞 of the field, which is small. 
This shows #SAT∈IP, and hence coNP⊆IP. 
S2 A summary of complexity classes 
  • 13. Lecture 2 Notes on Theory of Computation For example, if Σ = {푎, 푏}, we can build up regular expressions such as (푎* ∪ 푎푏) = (푎* ∪ 푎 ∘ 푏). Here we consider 푎 as a single string of length 1, so 푎 is shorthand for {푎}. 휀 might also appear, so we might have something like 푎* ∪ 푎푏 ∪ 휀 (which is the same since 휀 ∈ 푎*; the language that the expression describes is the same). We also write 퐿(푎*∪푎푏∪휀) to emphasize that the regular expression describes a language. Regular expressions are often used in text editors in string matching. Our goal for the next 11 2 lectures is to prove the following. Theorem 2.2: thm:regex-FA Regular expressions and finite automata describe the same class of languages. In other words, 1. Every finite automaton can be converted to a regular expression which generates the same language and 2. every regular expression can be converted to finite automaton that recognizes the same language. Even though these 2 methods of computation (regular expressions and finite automata) seem very different, they capture the same language! To prove this, we’ll first have to develop some technology. S2 Nondeterminism First, let’s think about how to prove the closure properties from last time. We showed that if 퐴1 and 퐴2 are regular, so is 퐴1 ∪ 퐴2. To do this, given a machine 푀1 recognizing 퐴1 and a machine 푀2 recognizing 퐴2, we built a machine 푀 that recognizes 퐴1 ∪ 퐴2 by simulating 퐴1 and 퐴2 in parallel. Now let’s prove closure under concatenation: If 퐴1 and 퐴2 are regular, then so is 퐴1퐴2. We start off the same way. Suppose 푀1 recognizes 퐴1 and 푀2 recognizes 퐴2; we want to construct 푀 recognizing 퐴1퐴2. What does 푀 need to do? Imagine a string 푤 going into 푀... Pretend like you are 푀; you have to answer if 푤 is in the concatenation 퐴1퐴2 or not, i.e. you have to determine if it is possible to cut 푤 into 2 pieces, the first of which is in 퐴1 and the second of which is in 퐴2. 2 A1 2 A2 W Why don’t we feed 푊 into 푀1 until we get to an accept state, and then transition control to 푀2 by going to the start state of 푀2? The problem with this approach is that just because you found an initial piece of 푊 in 퐴1 does not necessarily mean you found the right place to cut 푊! It’s possible that the remainder is not in 퐴2, and you wrongly reject the string. Maybe you should wait until later time to switch to 퐴2. There are many possible ways of cutting. 13
  • 14. Lecture 2 Notes on Theory of Computation 2 A1 2 A2 W We introduce the idea of nondeterminism to give an elegant solution to this problem. 2.1 Nondeterministic Finite Automata Consider, for example, the following automaton, which we’ll call 퐵. 1 0,1 1 0, 휀 푞1 푞2 푞3 푞4 How is this different from a finite automaton? Note that there are two “1” arrows from 푞1. In a nondeterministic finite automaton there may be several ways to proceed. The present state does NOT determine the next state; there are several possible futures. We also permit 휀 to be a label, as matter of convenience. How does this automaton work? We have multiple alternative computations on the input. When there is more than 1 possible way to proceed, we take all of them. Imagine a parallel computer following each of the paths independently. When the machine comes to point of nondeterminism, imagine it forking into multiple copies of itself, each going like a separate thread in a computer program. An 휀 label means that you can take the transition for free. The other transitions also allow reading with 1 input symbol. (In some cases there is no arrow to follow. In those cases the thread just dies off.) What do we do when parallel branches differ in their output? One choice might end up at 푞4, and another may end up not at 푞4. Only one path needs to lead to an accept state, for the entire machine to accept. If any computational branch leads to an accepting state, we say the machine accepts the input. Acceptance overrules rejection. We reject only if every possible way to proceed leads to rejection. Although this seems more complicated than the finite automata we’ve studied, we’ll prove that it doesn’t give anything new. We’ll show that anything you can do with nondeterministic finite automata, you can also do with (deterministic) finite automata. 1 0,1 1 0, 휀 푞1 푞2 푞3 푞4 Let’s look at a specific example. Take 01011 as the input. Point your finger at the start state 푞1. ∙ Read 0. We follow the loop back to 푞1. 14
  • 15. Lecture 2 Notes on Theory of Computation ∙ Read 1. There are 2 arrows with “1” starting at 푞1, so split your finger into 2 fingers, to represent the 2 different places machine could be: 푞1 and 푞2. ∙ 0. Now each finger proceeds independently, because they represent different threads of computation. The finger at 푞1 goes back to 푞1. There is no place for the finger at 푞2 to go (because there is no arrow with 0 from 푞2), so remove that finger. We just have {푞1} left. ∙ 1. We branch into 푞1, 푞2. ∙ 1. Following “1” arrows from 푞1 and 푞2, we can get to 푞1, 푞2, 푞3. But note there is an 휀 transition from 푞3 to 푞4. This means we can take that transition for free. From a finger being on 푞3, a new thread gets opened on to 푞4. We end up with all states 푞1, 푞2, 푞3, and 푞4. Each finger represents a different thread of the computation. Overall the machine accepts because at least 1 finger (thread of computation) ended up at an accepting state, 푞4. The NFA accepts this string, i.e. 01011 ∈ 퐿(퐵). By contrast 0101̸∈ 퐿(퐵), because at this point we only have fingers on 푞1, 푞2; all possibilities are reject states. We now make a formal definition. Definition 2.3: Define a nondeterministic finite automaton (NFA)푀 = (푄,Σ, 훿, 푞0, 퐹) as follows. 푄, Σ, 푞0, and 퐹 are the same as in a finite automaton. Here 훿 : 푄 × Σ휀 → 풫(푄), where 풫(푄) = {푅 : 푅 ⊆ 푄} is the power set of 푄, the collection of subsets of 푄 (all the different states you can get to from the input symbol.) and Σ휀 = Σ ∪ {휀}. In our example, 훿(푞1, 1) = {푞1, 푞2} and 훿(푞3, 휀) = {푞4}. Note 훿 may give you back the empty set, 훿(푞2, 0) = 휑. The only thing that has a different form from a finite automaton is the transition function 훿. 훿 might give you back several states, i.e. whole set of states. 2.2 Comparing NFA’s with DFA’s We now show that any language recognized by a NFA is also recognized by a DFA (de-terministic finite automaton), i.e. is regular. This means they recognize the same class of languages. Theorem 2.4 (NFA’s and DFA’s recognize the same languages): If 퐴 = 퐿(퐵) for a NFA 퐵, then 퐴 is regular. Proof. The idea is to convert a NFA 퐵 to DFA 퐶. 15
  • 16. Lecture 2 Notes on Theory of Computation Pretend to be a DFA. How would we simulate a NFA? In the NFA 퐵 we put our fingers on some collection of states. Each possibility corresponds not to a single state, but to a subset of states of 퐵. What should the states of 퐶 be? The states of 퐶 should be the power set of 퐵, i.e. the set of subsets of 퐵. In other words, each state of 퐶 corresponds to some 푅 ⊆ 푄. 퐵 NFA 푅 ⊆ 푄 퐶 DFA Let 퐵 = (푄,Σ, 훿, 푞0, 퐹); we need to define 퐶 = (푄′,Σ, 훿′, 푞′0 , 퐹′). Let 푄′ = 풫(푄) (the power set of 푄), so that if 퐵 has 푛 states, then 퐶 has 2푛 states. For 푅 ⊆ 푄 (i.e. 푅 ∈ 푄′), define 훿′(푅, 푎) = {푞 ∈ 푄 : 푞 ∈ 훿(푟, 푎), 푟 ∈ 푅 or following 휀-arrows from 푞 ∈ 훿(푟, 푎)} . (The textbook says it more precisely.) 1 1 퐵 NFA 16
  • 17. Lecture 2 Notes on Theory of Computation 푅 ⊆ 푄 1 퐶 DFA The start state of 퐶 is a singleton set consisting of just the state and anything you can get to by 휀-transitions. The accept states are the subsets containg at least one accept state in 퐵. NFA’s and DFA’s describe same class of languages. Thus to show a language is a regular language, you can just build a NFA that recognizes it, rather than a DFA. Many times it is more convenient to build a NFA rather than a DFA, especially if you want to keep track of multiple possibilities. S3 Using nondeterminism to show closure Nondeterminism is exactly what we need to show that the concatenation of two regular languages is regular. As we said, maybe we don’t want to exit the first machine the first time we get to an accept state; maybe we want to stay in 푀1 and jump later. We want multiple possibilities. Proof of closure under ∘. Given 푀1 recognizing 퐴1 and 푀2 recognizing 퐴2, define 푀 as follows. Put the two machines 푀1 and 푀2 together. Every time you enter an accept state in 푀1, you are allowed to branch by an 휀-transition to the start state of 푀2—this represents the fact that you can either start looking for a word in 퐴2, or continue looking for a word in 푀1. Now eliminate the accepting states for 푀2. We’re done! 17
  • 18. Lecture 2 Notes on Theory of Computation 휀 휀 Nondeterminism keeps track of parallelism of possibilities. Maybe you got to an accepting state but you should have waited until a subsequent state. We have a thread for every possible place to transition from 퐴1 to 퐴2; we’re basically trying all possible break points in parallel. Another way to think of NFA’s is that they enable “guessing.” Our new machine 푀 simulates 푀1 until it guesses that it found the right transition point. We “guess” this is the right place to jump to 푀2. This is just another way of saying we make a different thread. We’re not sure which is right thread, so we make a guess. We accept if there is at least one correct guess. Next we show that if 퐴1 is regular, then so is 퐴*1 . Proof of closure under *. Suppose 푀1 recognizes 퐴1. We construct 푀 recognizing 퐴*1 . We will do a proof by picture. 18
  • 19. Lecture 2 Notes on Theory of Computation What does if mean for a word 푊 to be in 퐴*1 ? 푊 is in 퐴*1 if we can break it up into pieces that are in the original language 퐴1. 2 A1 2 A1 2 A1 2 A1 2 A1 W Every time we get to the an accept state of 푀1, i.e. we’ve read a word in 퐴1 and we might want to start over. So we put 휀-transition leading from the accept state to the start state. As in the case with concatenation, we may not want to reset at the first cut point, because maybe there is no way to cut remaining piece into words in 퐴1. So every time get to an accept, have the choice to restart—we split into 2 threads, one that looks to continue the current word, and one that restarts. There is a slight problem: we need to accept the empty string as well. To do this we add a new start state, and add an 휀-transition to the old start state. Then we’re good. 19
  • 20. Lecture 2 Notes on Theory of Computation NFA’s also give us an easier way to prove closure under union. Proof of closure under ∪. Suppose we’re given 푀1 recognizing 퐴1 and 푀2 recognizing 퐴2. To build 푀 recognizing 퐴1 and 퐴2, it needs to go through 푀1 and 푀2 in parallel. So we put the two machines together, add a new start state, and have it branch by 휀-transitions to the start states both 푀1 and 푀2. This way we’ll have a finger in 푀1 and a finger in 푀2 at the same time. S4 Converting a finite automaton into a regular expression The proof of the closure properties gives us a procedure for converting a regular expression into finite automaton. This procedure comes right out of the construction of machines for ∪, ∘, and *. This will prove part 2 of Theorem 2.2. We do a proof by example: consider (푎푏 ∪ 푎*). We convert this to a finite automaton as follows. For 푎, 푏 we make the following automata. We build up our expression from small pieces and then combine. Let’s make an automaton for 푎푏. We use our construction for closure under concatenation. 20
  • 21. Lecture 3 Notes on Theory of Computation This machine recognizes 푎푏. Now we do 푎*. Finally we put the FA’s for 푎푏 and 푎* together, using the ∪ construction, to get the FA recognizing 푎푏 ∪ 푎*. The constructions for ∪, ∘, and * give a way to construct a FA for any regular expression. Lecture 3 Thu. 9/13/12 Last time we talked about ∙ nondeterminism and NFA’s 21
  • 22. Lecture 3 Notes on Theory of Computation ∙ NFA→DFA ∙ Regular expression→ NFA Today we’ll talk about ∙ DFA→regular expression ∙ Non-regular languages About the homework: By the end of today, you should have everything you need to solve all the homework problems except problem 6. Problem 3 (1.45) has a 1 line answer. As a hint, it’s easier to show there exists a finite automaton; you don’t have to give a procedure to construct it. We will finish our discussion of finite automata today. We introduced deterministic and nondeterministic automata. Nondeterminism is a theme throughout the course, so get used to it. We gave a procedure—the subset construction—to convert NFA to DFA. NFA helped achieve part of our goal to show regular expressions and NFAs recognize the same languages. We showed how to convert regular expressions to NFA, and NFA can be converted to DFA. To convert regular expressions, we used the constructions for closure under ∪, ∘, and *; we start with the atoms of the expression, and build up using more and more complex subexpressions, until we get the language recognized by the whole expression. This is a recursive construction, i.e. a proof by induction, a proof that calls on itself on smaller values. Today we’ll do the reverse, showing how to convert a DFA to a regular expressions, finishing our goal. S1 Converting a DFA to a regular expression Theorem 3.1 (Theorem 2.2, again): 퐴 is a regular language iff 퐴 = 퐿(푟) for some regular expression 푟. Proof. ⇐: Show how to convert 푟 to an equivalent NFA. We did this last time. ⇒: We need to convert a DFA to an equivalent 푟. This is harder, and will be the focus of this section. We’ll digress and introduce another model of an automaton, which is useful just for the purposes of this proof. A generalized nondeterministic finite automaton (GNFA) has states, some ac-cepting, one of which is starting. We have transitions as well. What’s different is that we can write not just members of the alphabet and the empty string but any regular expression as a lable for a transition. So for instance we could write 푎푏. 22
  • 23. Lecture 3 Notes on Theory of Computation Start at the start state. During a transition, the machine gets to read an entire chunk of the input in a single step, provided that the string is in the language described by the label on the associated transition. There may be several ways to process the input string. The machine accepts if some possibility ends up at an accept state, i.e. there is some way to cut and read the input string. If all paths fail then the machine rejects the input. Although GNFA’s look more complicated, they still recognize the same languages as DFA’s! If looks harder to convert a GNFA to a regular expression, GNFA→r. However, for inductive proofs, it is often helpful to prove something stronger along the way, so we can carry through the statement. In other words, we strengthen the induction hypothesis. To make life easier, we make a few assumptions about the GNFA. ∙ First, there is only 1 accept state. To achieve this, we can declassify accept states, and add empty transitions to new accept states. ∙ The accept state and start states are different (taken care of by 1st bullet). ∙ No incoming transitions come to the start state. To achieve this, make a new start state with an 휀-transition going to the previous start state. ∙ There are only transitions to, not from, the accept state (taken care of by 1st bullet). ∙ Add all possible transitions between states except the start and end states. If we are lacking a transition, add 휑 transition. We can go along this transition by reading a language described by 휑. This means we can never go along this transition, since 휑 describes no languages. For instance, we can modify our example to satisfy these conditions as follows. 23
  • 24. Lecture 3 Notes on Theory of Computation Lemma 3.2: For every 푘 ≥ 2, every GNFA with 푘 states has an equivalent regular expression 푅. Proof. We induct on 푘. The base case is 푘 = 2. We know what the states are: the machine has a start state (no incoming arrows) and an accept state. Assuming the conditions above, the only possible arrow is from the start to end, so the machine looks like the following. There are no return arrows or self-loops. 푅 푞1 푞2 The only way to accept is to read a string in 푅; the machine can only process input in its entirety with one bite, so the language is just the regular expression 푅. This is the easy part. Now for the induction step. Assume the lemma true for 푘; we prove it for 푘 + 1. Sup-pose we’re given a (푘 + 1)-state GNFA. We need to show this has a corresponding regular expression. We know how to convert 푘-state GNFA to a regular expression. Thus, if we can convert the (푘 + 1)-state to a 푘-state GNFA, then we’re done. You can think of this as an iterative process: convert (푘 +1) to 푘 to 푘 −1 states and so on, wiping out state after state, and keeping the language the same, until we get to just 2 states, where we can read off the regular expression from the single arrow. We’ll pick one state 푥 (that is not the start or accept state) and remove it. Since 푘+1 ≥ 3, there is a state other than the start and accept state. But now the machine doesn’t recognize the same language anymore. We broke the machine! We have to repair the machine, by putting back the computation paths that got lost by removing the state. This is where the magic of regular expressions come in. Suppose we have arrows 푖 → 푥 → 푗. We can’t follow this path because 푥 is gone. In the arrow from 푖 to 푗, we have to put back the strings that got lost. So if we have 푖 푟1 −→ 푥 푟3 −→ 푗, then we add in 푟1푟3 from 푖 to 푗, so we can go directly from 푖 to 푗 via 푟1푟3. However, letting the self-loop at 푥 be 푟2, we might go along 푟1, repeat 푟2 for a while, and then go to 푟3, 24
  • 25. Lecture 3 Notes on Theory of Computation we so actually want 푟1(푟* 2)푟3. Now take the union with the regular expression from 푖 to 푗, 푟1(푟* 2)푟3 ∪ 푟4. So the construction is as follows. For each pair 푖 푟4 −→ 푗, replace 푟4 with 푟1(푟2)*푟3 ∪ 푟4 where 푟1, 푟2, 푟3 are as above. All arrows adjusted in the same way. The computations that go from 푖 to 푗 via 푥 in the old machine are still present in the new machine, and go directly from 푖 to 푗. Our modified machine is equivalent to the original machine. Taking any computation in first machine, there is a corresponding computation in second machine on the same input string, and vice versa. This finishes the proof. Theorem 2.2 now follows, since a DFA is a GNFA. S2 Non-regular languages There are lots of langages that are not recognized by any finite automata. We see how to prove a specific language is non-regular. Let 퐶 = {푤 : 푤 has equal number of 0s and 1s} . As we’ve said, it seems like 퐶 is not regular because it has to keep track of the difference between the number of 0s and 1s, and that would require infinitely many states. But be careful when you claim a machine can’t do something—maybe the machine just can’t do it the following the method you came up with! ! “I can’t think of a way; if I try come up with one I fail” doesn’t hold water as a proof! As an example, consider 퐵 = {푤 : 푤 has equal number of 01 and 10 substrings} . 25
  • 26. Lecture 3 Notes on Theory of Computation For example 1010̸∈ 퐵, but 101101 ∈ 퐵. This language may look nonregular because it looks like we have to count. But it is regular, because there is an alternative way to describe it that avoids counting. Problem 3.1: Show that 퐵 is regular. 2.1 Pumping Lemma We give a general method that works in large number of cases showing a language is not regular, called the Pumping Lemma. It is a formal method for proving nonregular not regular. Later on, we will see similar methods for proving that problems cannot be solved by other kinds of machines. Lemma 3.3 (Pumping Lemma): lem:pump For any regular language 퐴, there is a number 푝 where if 푠 ∈ 퐴 and |푆| ≥ 푝 then 푆 = 푥푦푧 where 1. 푥푦푖푧 ∈ 퐴 for any 푖 ≥ 0 (We can repeat the middle and stay in the language.) 2. 푦̸= 휀 (Condition 1 is nontrivial.) 3. |푥푦| ≤ 푝 (Useful for applications.) What is this doing for us? The Pumping Lemma gives a property of regular languages. To show a language is not regular, we just need to show it doesn’t have the property. The property is that the language has a pumping length, or cutoff 푝. For any string 푠 longer than the cutoff, we can repeat some middle piece (푦푖) as much as we want and stay in the language. We call this pumping up 푠. Every long enough string in the regular language can be pumped up as much as we want and the string remains in the language. Before we give a proof, let’s see an example. Example 3.4: Let 퐷 = {0푚1푚 : 푚 ≥ 0} . Show that 퐷 is not regular using the Pumping Lemma. To show a language 퐷 is not regular, proceed by contradiction: If 퐷 is regular, then it must have the pumping property. Exhibit a string of 퐷 that cannot be pumped no matter how we cut it up. This shows 퐷 does not have the pumping property, so it can’t be regular. Assume 퐷 is regular. The pumping lemma gives a pumping length 푝. We find a string longer than 푝 that can’t be pumped: let 푠 = 0푝1푝 ∈ 퐷. 26
  • 27. Lecture 3 Notes on Theory of Computation s = 0 · · · 0 1 · · · 1 p p There must be some way to divide 푠 into 3 pieces, so that if we repeat 푦 we stay in the same language. But we can’t pump 푠 no matter where 푦 is. One of the following cases holds: 1. 푦 is all 0’s 2. 푦 is all 1’s 3. 푦 has both 0’s and 1’s. If 푦 is all 0’s, then repeating 푦 gives too many 0’s, and takes us out of the language. If 푦 is all 1’s, repeating gives too many 1’s. If 푦 has both 0’s and 1’s, they are out of order when we repeat. In each case, we are taken out of the language so pumping lemma fails, and 퐷 is not regular. If we use condition 3 of the Pumping Lemma we get a simpler proof: 푥푦 is entirely in the first half of 푠, so 푦 must be all 0’s (case 1). Then 푥푦푦푧 has excess 0’s and so 푥푦2푧̸∈ 퐷. Now we prove the Pumping Lemma. Proof of Lemma 3.3. Let 푀 be the DFA for 퐴. Let 푝 be the number of states of 푀. This will be our pumping length. Suppose we have a string of length at least 푝. Something special has to happen when the machine reads the string: We have to repeat a state! We have to repeat a state within the first 푝 steps (because after 푝 steps we’ve made 푝 + 1 visits to states, including the starting state). Consider the first repeated state, drawn in in the below diagram. Divide the path into 3 parts: 푥, 푦, and 푧. Note we can choose 푦 nonempty because we’re saying the state is repeated. From this we see that we can repeat 푦 as many times as we want. 27
  • 28. Lecture 4 Notes on Theory of Computation Example 3.5: Now we show 퐶 = {푤 : 푤 has equal number of 0s and 1s} . is not regular. There are two ways to proceed. One is to use the Pumping Lemma directly (this time we need to use condition 3) and the other way is to use the fact that we already know 퐷 is not regular. What is wrong with the following proof? Because 퐷 is not regular and 퐷 ⊆ 퐶, 퐶 is not regular. ! Regular languages can have nonregular languages as subsets, and vice versa. Subsets tell you nothing about regularity. However, we if we combine the fact that 퐷 ⊆ 퐶 with some extra features of 퐶, then we can come up with a proof. Note 퐷 = 퐶 ∩ 0*1*. Note 0*1* is regular. If 퐶 were regular, then 퐷 would be regular, because the intersection of 2 regular languages is regular. Since 퐷 is not regular, neither is 퐶. The Pumping Lemma is a powerful tool for showing languages are nonregular, es-pecially when we combine it with the observation that regular languages are closed under regular operations. Lecture 4 Tue. 9/18/12 Last time we talked about ∙ Regular expressions← DFA ∙ Pumping lemma Today we’ll talk about CFG’s, CFL’s, and PDA’s. Homework 1 is due Thursday. ∙ Use separate sheets. ∙ No bibles, online solutions, etc. ∙ Office hours 28
  • 29. Lecture 4 Notes on Theory of Computation – Michael Sipser: Monday 3-5 – Zack: Tuesday 4-6 32-6598 – Alex: Wednesday 2-4 32-6604 S0 Homework hints Problem 2 (1.67, rotational closure): If 퐴 is a language, 푤 = 푥푦 ∈ 퐴, then put 푦푥 ∈ 푅퐶(퐴). Prove that if 퐴 is regular, then 푅퐶(퐴) is also regular. If 푀 is a finite automaton and 퐿(푀) = 퐴, then you need to come up with a finite automaton that recognizes the rotational closure of 퐴. The new automaton must be able to deal with inputs that look like 푦푥. Don’t just try to twiddle 푀. If you were pretending to be a finite automaton yourself, how you would go about deter-mine if a string is in the rotational closure of the original language? Recall, for 푦푥 to be in the rotational closure, the original automaton should accept 푥푦. How would you run the original automaton to see whether the string is a rearranged input of something the original automaton would have accepted? If only you could see 푥 in advance, you would know what state you get to after running 푦! Then you could start there, run 푦, then run 푥, and see if you get back where you started. But you have to pretend to be a finite automaton, so you can’t see 푥 first. The magic of nondeterminism will be helpful here! You could guess all possible starting states, and see if any guess results in accept. “Guess and check” is a typical pattern in nondeterminism. Problem 3 (1.45, 퐴/퐵 is regular, where 퐴 is regular and 퐵 is any): We get 퐴/퐵 as follows: start with 퐴 and remove all the endings that can be in 퐵. In other words, 퐴/퐵 consists of all strings such that if you stick in some member of 퐵, you get a member of 퐴. Note you don’t necessarily have a finite automaton for 퐵 because 퐵 is not necessarily regular! This might be surprising. Think about how you would simulate a machine for 퐴/퐵. If a string leads to one of the original accepting states, you might want accept it early. You don’t want to see rest of string if the rest of the string is in 퐵. Looked at the right way, the solution is transparent and short. Again, think of what you would do if you were given the input and wanted to test if it was in the language. Problem 4 (1.46d): When you’re using the pumping lemma, you have to be very careful. The language you’re supposed to work with consists of strings 푤푡푤 where |푤|, |푡| ≥ 1. For example, 0001000 is in the languge, because we can let 000 ⏟ ⏞ 푤 1 ⏟ ⏞ 푡 000 ⏟ ⏞ 푤 . 29
  • 30. Lecture 4 Notes on Theory of Computation If we add another 0 to the front, it’s tempting to say we’re now out of the language. But we’re still in the language because we can write the new string as 푤푡푤 with 푤 = 000 and 푡 = 01. You don’t get to say what 푤 and 푡 are. As long as there is some way of choosing 푤 and 푡, it’s in the language. S1 Context-Free Grammars We now talk about more powerful ways of describing languages than finite automata: context-free grammars and pushdown automata. Context-free grammars and pushdown automata have practical applications: we can use them to design controllers, and we can use them to describe languages, both natural languages and programming languages. 1.1 Example We introduce context-free grammars with an example. A context-free grammar has variables, terminals, and rules (or productions). 푆 → 0푆1 푆 → 푅 푅 → 휀 The three statements above are rules, 푅 is a variable, and the 1 at the end of 0푆1 is a terminal. The symbols on the left hand side are variables. The symbols that only appear on the right hand side are called terminals. We use a grammar to generate a language as follows. Start out with the symbol on the LHS of the topmost rule, 푆 here. The rules represent possibilities for substitution. Look for a variable in our current expression that appears on the LHS of a rule, and substitute it with the RHS. For instance, in the following we replace the variable at each step by the right-hand side of one of its rules: 푆 ⇒ 0푆1 ⇒ 00푆11 ⇒ 00푅11 ⇒ 00휀11 = 0011. When we have a string with only terminal symbols, we declare that string to be in the language of 퐺. So here 0011 ∈ 퐿(퐺). 30
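To make the substitution process concrete, here is a small Python sketch (my own illustration, not part of the notes) that enumerates the strings this example grammar generates by breadth-first search over sentential forms; the dictionary encoding of the rules and the length cutoff are assumptions chosen only to keep the search finite.

```python
from collections import deque

# Example grammar from above: S -> 0S1 | R,  R -> epsilon (written "")
rules = {"S": ["0S1", "R"], "R": [""]}
variables = set(rules)

def generate(start="S", max_len=6):
    """Enumerate terminal strings derivable from `start`, up to a length bound."""
    seen, found = set(), set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # locate the first variable in the current sentential form
        i = next((k for k, c in enumerate(form) if c in variables), None)
        if i is None:                       # no variables left: a terminal string
            found.add(form)
            continue
        for rhs in rules[form[i]]:          # substitute using each rule for that variable
            new = form[:i] + rhs + form[i + 1:]
            terminals_only = "".join(c for c in new if c not in variables)
            if len(terminals_only) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=len)

print(generate())   # ['', '01', '0011', '000111']
```

The string 0011 shows up precisely because of the derivation 푆 ⇒ 0푆1 ⇒ 00푆11 ⇒ 00푅11 ⇒ 0011 above.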
  • 31. Lecture 4 Notes on Theory of Computation Problem 4.1: What is the language of 퐺? We can repeat the first rule until we get tired, and then terminate by the 2nd and 3rd rules. We find that 퐿(퐺) = {0푘1푘 : 푘 ≥ 0}. The typical shorthand combines all rules that have the same left hand side into a single line, using the symbol | to mean “or.” So we can rewrite our example as 푆 → 0푆1 | 푅, 푅 → 휀. Example 4.1: Define 퐺2 to be the grammar 퐸 → 퐸 + 푇 | 푇, 푇 → 푇 × 퐹 | 퐹, 퐹 → (퐸) | 푎. The variables are 푉 = {퐸, 푇, 퐹}; the terminals are Σ = {푎,+,×, (, )}. (We think of these as symbols.) This grammar represents arithmetical expressions in 푎 using +, ×, and parentheses; for instance, (푎 + 푎) × 푎 ∈ 퐿(퐺2). This might appear as part of a larger grammar of a programming language. Here is the parse tree for (푎 + 푎) × 푎. [Parse tree: the root 퐸 derives 푇 × 퐹; the left 푇 derives 퐹 → (퐸), whose inner 퐸 derives 퐸 + 푇 and then 푎 + 푎; the right 퐹 derives 푎.] 31
  • 32. Lecture 4 Notes on Theory of Computation A derivation is a list of steps in linear form: when 푢, 푣 ∈ (푉 ∪ Σ)*, we write 푢 =⇒ 푣 if we get to 푣 from 푢 in one substitution. For instance we write 퐹 × 퐹 =⇒ (퐸) × 퐹. We write 푢 =⇒* 푣 if we can get from 푢 to 푣 in 0, 1, or more substitution steps. 1.2 Formal definition We now give a formal definition, just like we did for a finite automaton. Definition 4.2: A context-free grammar (CFG) is 퐺 = (푉, Σ, 푆, 푅) where ∙ 푉 is the set of variables, ∙ Σ is the set of terminals, ∙ 푆 ∈ 푉 is the start variable, and ∙ 푅 is the set of rules, each of the form variable → string of variables and terminals. We say 푆 derives 푤 if we can repeatedly make substitutions according to the rules to get from 푆 to 푤. We write a derivation as 푆 =⇒ 푢1 =⇒ 푢2 =⇒ · · · =⇒ 푢ℓ =⇒ 푤, or 푆 =⇒* 푤. (푤 only has terminals, but the other strings have variables too.) We say that 퐺 generates the language 퐿(퐺) = {푤 ∈ Σ* : 푆 =⇒* 푤}. There is a natural correspondence between a derivation and a parse tree. Parse trees may be more relevant to particular applications. Note 푎 + 푎 × 푎 ∈ 퐿(퐺2). Take a look at the parse tree for 푎 + 푎 × 푎. Reading it from the bottom up, the parse tree first groups 푎 × 푎 into a subtree, and then puts in the +. There is no way to put the + first, unless we put in parentheses. This is important in a programming language! Sometimes we can have multiple parse trees for the same string—an undesirable feature in general. That means we have two different interpretations for a particular string, which can give rise to two different semantic meanings. In a programming language, we do not want two different meanings for the same expression. Definition 4.3: A string is derived ambiguously if it has two different parse trees. A grammar or language is ambiguous if some string can be derived ambiguously. We won’t discuss this further, but look at the section in the book for more. 32
  • 33. Lecture 4 Notes on Theory of Computation 1.3 Why care? To describe a programming language in formal way, we can write it down in terms of a grammar. We can specify the whole syntax of the any programming language with context-free grammars. If we understand grammars well enough, we can generate a parser—the part of a compiler which will take the grammar representing the program, process a program, and group the pieces of code into recognizable expressions. The parser would then feed the expressions into another advice. The key point is that we need to write down a grammar that represents the programming language. Context-free grammars had their origin in the study of natural languages. For instance, 푆 might represent sentence, and we may have rules such as 푆 → (noun phrase) (verb phrase) , (verb) → (adverb) (verb) , (noun) → (adjective) (noun) , and so forth. We can gain insight into the way a language works by specifying it this fashion. This is a gross oversimplification, but both the study of programming and natural lan-guages benefit from the study of grammars. We’re going to shift gears now, and then put everything together in the next lecture. S2 Pushdown automata Recall that we had 2 different ways for describing regular languages, using a ∙ computational device, a finite automaton, which recognize members of regular lan-guages when it runs. ∙ descriptive device, a regular expression, which generates members of regular languages. We found that finite automata and regular expressions recognize the same class of languages (Theorem 2.2). A CFG is a descriptive device, like a regular expression. We will find a computational device that recognizes the same languages as CFG’s. First, a definition. Definition 4.4: A context-free language (CFL) is one generated by a CFG. We’ve already observed that there is a CFL that is not regular: we found a CFG gener-ating the language {0푘1푘}, which is not regular. We will show in fact that the CFL’s include all regular languages. More on this later. S3 Comparing pushdown and finite automata We now introduce a computational device that recognizes exactly the context-free languages: a pushdown automaton (PDA). A pushdown automaton is like a finite automaton with a extra feature called a stack. 33
  • 34. Lecture 4 Notes on Theory of Computation In a finite automaton, we have a finite control, i.e. different states with rules of how to transition between them. We draw a schematic version of a finite automaton, as above. A head starts at the beginning of the input string, and at each step, it moves to the next symbol to the right. A pushdown automata has an extra feature. It is allowed to write symbols on the stack, not just read symbols. However, there are some limitations. A pushdown automata can only look at the topmost symbol of a stack. When it writes a symbol to the stack, what’s presently there gets pushed down, like a stack of plates in a cafeteria. When reading, the reverse happens. In one step the automata can only pop off the topmost symbol; then the remaining symbols all move back up. We use the following terminology: ∙ push means “add to stack,” and ∙ pop means “read and remove from stack.” When we looked at FA’s, we considered deterministic and nondeterministic variants. For PDA’s, we’ll only consider the nondeterministic variant. A deterministic version has been studied, but in the case of pushdown automata they are not equivalent. Some languages require nondeterministic PDA’s. Deterministic pushdown automata have practical applications to programming languages, because the process of testing whether a language is valid is especially efficient if the PDA is deterministic. This is covered in the 3rd edition of the textbook. Let’s give an example. Example 4.5: ex:akbk We give a PDA for 퐴 = ⌋︀0푘1푘 : 푘 ≥ 0{︀. As we’ve said, a PDA is a device that looks like FA but also have stack can write on. Our PDA is supposed to test whether a string is in 퐴. If we used an ordinary FA, without a stack, then we’re out of luck. Intuitively, a FA has finite memory, and we can’t do this language with finite memory. The stack in a PDA, however, is just enough to allow us to “remember stuff.” 34
  • 35. Lecture 4 Notes on Theory of Computation Problem 4.2: How would we design a PDA that recognizes 퐴? (How would you use the stack?) We can use the stack to record information. The idea is that every time we read a 0, stick a 0 in; every time we read a 1, pop it out. If the stack becomes empty and has not become empty beforehand, then we accept. The 0’s match off with 1’s that come later. We have to modify this idea a little bit, because what if the 0’s and 1’s are out of order? We don’t want to accept strings where the 0’s and 1’s are out of order. If we insist that 0’s come before 1’s, we need a finite control mechanism. We have a state for reading 0’s and another state when reading 1’s. In the “1” state the PDA no longer takes 0’s and adds them to the stack. We see that a PDA combines the elements of FA with the power of a stack. Now we ask: how do we know when to transition from reading 0’s to reading 1’s? We’d like to consider different possibilities for when to transition, i.e. let several parallel threads operate independently, and if any thread gets to an accept state, then have the machine accepts the input. Hence we turn to nondeterminism: every time there’s a choice, the machine splits into different machines which operate independently, each on its own stack. At every step when the machine is reading 0’s, we give it a nondeterministic choice: in the next step the machine can continue to push 0’s on the stack, or transition into reading 1’s and popping 0’s off the stack. 3.1 Formal definition Let’s write down the formal definition. Definition 4.6: A pushdown automaton (PDA) is a 6-tuple 푃 = (푄,Σ, Γ, 훿, 푞0, 퐹) where ∙ 푄 are the states ∙ Σ is the input alphabet, ∙ Γ is the stack alphabet, ∙ 훿 is the transition function, ∙ 푞0 is the start state, ∙ 퐹 is the accept states. Here 푄, Σ, 푞0, and 퐹 are as in a finite automata, but the transition function is a function 훿 : 푄 × Σ휀 × Γ휀 → 풫(푄 × Γ휀) (we explain this below). 35
  • 36. Lecture 5 Notes on Theory of Computation On first thought, we may think to define the transition function as a function 훿 : 푄 × Σ × Γ → 푄 × Γ. The function takes as input ∙ a state in 푄—the current state of the machine, ∙ a symbol from Σ—the next symbol to read, and ∙ a symbol from Γ—the top-of-stack symbol. It outputs another state in 푄 to transition to, and a symbol from Γ—the next symbol to push on the stack. However, we have to modify this: we want nondeterminism, so we allow the machine to transition to an entire set of possible next states and next symbols, and we represent this by having 훿 output a subset: 훿 : 푄 × Σ × Γ → 풫(푄 × Γ). We also allow 훿 to read an empty string, or read without popping a symbol off the stack, and proceed without writing a symbol, so we actually want 훿 : 푄 × Σ휀 × Γ휀 → 풫(푄 × Γ휀), where Σ휀 = Σ ∪ {휀} and Γ휀 = Γ ∪ {휀}. We’ll do one more example and save proofs to next time. Example 4.7: Consider the language {푤푤ℛ : 푤 ∈ {0, 1}*}, where ℛ means “reverse the word.” This is the language of even-length palindromes such as 0110110110. A PDA recognizing this language uses nondeterminism in an essential way. We give a sketch of how to construct a PDA to recognize it. (See the book for details.) The PDA has to answer: does the 2nd half of the word match the first half? We should push the first half of the word on the stack. When we pop it off, the string comes out backwards, and we can match it with the second half of the word. This is exactly what we need. But how do we know we’re at the middle? When do you shift from pushing to popping and matching? Can we find the length of the word? No. Instead, we guess nondeterministically at every point that we’re at the middle! If the word is a palindrome, one of the threads will guess the middle correctly, and the machine will accept. 36
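Here is a hedged sketch of how one might simulate such a nondeterministic PDA in Python: rather than spawning literal threads, it explores all reachable configurations (phase, input position, stack contents) with a breadth-first search, which is one standard way to realize the “guess the middle” idea. The configuration encoding is my own and is not the textbook’s construction.

```python
from collections import deque

def accepts_ww_reverse(s):
    """Nondeterministic PDA-style search: is s = w + reversed(w) for some w?

    A configuration is (phase, i, stack): phase 0 = pushing, phase 1 = popping,
    i = how much input has been read, stack = tuple of pushed symbols.
    """
    start = (0, 0, ())
    seen = {start}
    queue = deque([start])
    while queue:
        phase, i, stack = queue.popleft()
        if phase == 1 and i == len(s) and not stack:
            return True                                     # some thread accepts
        moves = []
        if phase == 0:
            moves.append((1, i, stack))                     # epsilon-move: guess we're at the middle
            if i < len(s):
                moves.append((0, i + 1, stack + (s[i],)))   # push the next input symbol
        elif i < len(s) and stack and stack[-1] == s[i]:
            moves.append((1, i + 1, stack[:-1]))            # pop and match
        for m in moves:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return False                                            # every thread rejects

for s in ["", "0110", "0110110110", "011", "0100"]:
    print(s, accepts_ww_reverse(s))   # True True True False False
```

Every configuration is bounded by the input length, so the search terminates even though any individual “thread” may have guessed the middle wrongly.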
  • 37. Lecture 5 Notes on Theory of Computation Lecture 5 Thu. 9/20/12 Problem set 2 is out. Last time we talked about CFG’s, CFL’s, and PDA’s. Today we will talk about ∙ CFG→PDA, ∙ non-CFL’s ∙ Turing machines Recall what nondeterminism means: every time there are multiple possibilities, the whole machine splits into independent parts. As long as one thread lands in an accept state, then we accept. Nondeterminism is a kind of guessing and checking that the guess is correct. When we define the model for a PDA, the PDA can pop something from the stack. There is no hardware (built-in function) to test if the stack is empty, but we can use “software” (i.e. clever programming) to test if the stack is empty: to start off, write a $, and when the machine sees $, it knows that the stack is empty. Thus we can allow any PDA to test whether the stack is empty. We’ll use this in many of our examples. To jog your memory, a CFG is made up of a set of rules like the following: 퐸 → 퐸 + 푇|푇 푇 → 푇 × 퐹|퐹 퐹 → (퐸)|푎. We saw that this CFG generates 푎 × 푎 + 푎: we had a derivation of 푎 × 푎 + 푎 given by 퐸 =⇒ 퐸 + 푇 =⇒ 푇 + 푇 =⇒ 푇 + 퐹 =⇒ · · · =⇒ 푎 × 푎 + 푎. S1 CFG’s and PDA’s recognize the same language Our main theorem of today is the following. Theorem 5.1: 퐴 is a CFL iff some PDA recognizes 퐴. In other words, CFG’s and PDA’s have exactly the same computing power; they generate the same class of languages. To prove this we’ll simulate one kind of computation with another kind of computation. Proof. We need to prove 1. CFG→PDA (we’ll do this today) 2. PDA→CFG (skip, see the book. This is more technical.) 37
  • 38. Lecture 5 Notes on Theory of Computation Corollary 5.2: 1. Every regular language is a CFL. 2. The intersection of a context-free language and a regular language is a context-free language. CFL ∩ regular = CFL. Proof. 1. A finite automaton is a pushdown automaton that just doesn’t use the stack. 2. Omitted. (See Exercise 2.18a in the book—basically just take the states to be the product set.) Note 2 is weaker than the statement that the intersection of two context-free languages is a context-free language, which is not true. Proposition 5.3: The class of CFL’s is closed under ∪, ∘, and *, but not under ∩ or complementation. Proof. For closure under ∪, ∘, and *, just give a construction using grammars or pushdown automata. S2 Converting CFG→PDA Proof sketch. We convert a CFG into a PDA. The input is a string that may or may not be in the language; our PDA has to say whether it is. Recall that the derivation is the sequence of strings we go through to get to a string in the language. We use nondeterminism to guess the derivation. We first write down the start variable on the top of the stack. Take whatever string is written down on the stack to be the current working string. Take one of the variables on the stack and in the next step, replace it using one of the rules. There may be several possible steps; we consider all possibilities using nondeterminism. For instance, we’d want the machine to operate as follows. 38
  • 39. Lecture 5 Notes on Theory of Computation At the end we pop the symbols and compare with the input string, and then test the stack for emptiness at the end. However, there’s a problem: what if we want to replace some symbol not at the top? The idea is the following: if the top of the stack has a terminal symbol (which can’t be replaced by anything), let’s match it against the next symbol in the input word immediately. Whenever we have a terminal symbol at the top of the stack, we pop and compare until a variable (such a F) is at the top. Sooner or later, we’ll have a variable at the top, and then we can try one of the substitutions. See the textbook for details. S3 Non-CFLs CFG’s are powerful but there are still many languages they can’t recognize! We will show the language ⌋︀푎푘푏푘푐푘 : 푘 ≥ 0{︀ is not a CFL. Note by contrast that ⌋︀푎푘푏푘 : 푘 ≥ 0{︀ is a CFL (Example 4.5). An intuitive argument is the following: we can push the 푎’s, compare with the 푏’s by popping the 푎’s, but when we get to the 푐’s we’re out of luck: the 푎’s were all popped off, and the system has not remembered any information about the 푎’s. However, as we’ve said, we have to be careful with any argument that says “I can’t think of a way; thus it can’t be done.” How do you know the machine doesn’t proceed in some other tricky way? By contrast, if we look at the strings 푎푘푏푙푐푚 where either the number of 푎’s equal the number of 푏’s, or the number of 푎’s equal the number of 푐’s, this can be done with pushdown automaton. (Use nondeterminism, left as exercise.) We’ll give a technique to show that a language is not a CFL, a pumping lemma in the spirit of the pumping lemma for regular languages, changed to make it apply to CFL’s. Our notion of pumping is different. It is the same general notion: all long strings can be “pumped” up and stay in the language. However, we’ll have to cut our string into 5 rather then 3 parts. 39
  • 40. Lecture 5 Notes on Theory of Computation Lemma 5.4 (Pumping lemma for CFL’s): For a CFL 퐴, there is a pumping length 푝 where if 푠 ∈ 퐴 and |푠| ≥ 푝, then 푠 can be broken up into 푠 = 푢푣푥푦푧 such that 1. 푢푣^푖푥푦^푖푧 ∈ 퐴 for all 푖 ≥ 0. (We have to pump 푣 and 푦 by the same amount.) The picture is as follows. S = u v x y z 2. 푣푦 ≠ 휀. (We can’t break it up so that the second and fourth string are empty, because in this case we won’t be saying anything!) 3. |푣푥푦| ≤ 푝 Example 5.5: Let’s show {푎푘푏푘 : 푘 ≥ 0} satisfies the pumping lemma. For instance, for the string 푎푎푎푎푎푎푏푏푏푏푏푏 we can let 푢 = 푎푎푎푎, 푣 = 푎, 푥 = 푎푏, 푦 = 푏, 푧 = 푏푏푏푏. Example 5.6: If {푎푘푏푘푐푘 : 푘 ≥ 0} were a CFL, it would satisfy the pumping lemma. We show this is not true, so it is not a CFL. Again, this is a proof by contradiction. Suppose {푎푘푏푘푐푘 : 푘 ≥ 0} satisfies the conclusion of the pumping lemma. Take the string 푠 = 푎푝푏푝푐푝 (푝 푎’s, then 푝 푏’s, then 푝 푐’s) and let 푢, 푣, 푥, 푦, 푧 satisfy the conclusions of the pumping lemma. First note that 푣 can only have one kind of symbol, otherwise when we pump we would have letters out of order (instead of all 푎’s before 푏’s and all 푏’s before 푐’s), and the same is true of 푦. Thus when we pump up 푣 and 푦, the count of at most 2 symbols will increase (rather than all 3 symbols), and we will not have an equal number of 푎’s, 푏’s, and 푐’s. Thus {푎푘푏푘푐푘 : 푘 ≥ 0} fails the pumping lemma, and hence is not context-free. Proof of Pumping Lemma 5.4. We’ll sketch the higher-level idea. Qualitatively, the pumping lemma says that every long enough string can be pumped and stay in the language. Let 푠 be a really long string. We’ll figure out what “really long” means later. Let’s look at the parse tree; suppose 푇 is the start variable. What do we know about the parse tree? It’s really tall, because 푠 is long, and a short parse tree can’t generate a really wide tree (which corresponds to a long string). More precisely, the amount of “fan-out” is determined by the size of the longest right-hand string in the grammar. We determine what “long” and “tall” mean after we look at the grammar. What does it mean when we have a really tall parse tree? There has to be a long path from the root, with lots of nodes—so many nodes that we have to repeat one of the variables, say 푅. Let 푢, 푣, 푥, 푦, 푧 be as follows. 40
  • 41. Lecture 5 Notes on Theory of Computation Look at the subtree that comes from the lower and upper instances of the repeated variable. Now let’s make a “surgery”: take the subtree under the higher 푅 and stick it in place of the lower subtree. We now get another valid parse tree for 푢푣푣푥푦푦푧. We can repeat this as many times as we’d like. We get the 푖 = 0 case by sticking the lower tree on the upper 푅. 41
  • 42. Lecture 5 Notes on Theory of Computation There are some details to get the conditions to work. ∙ How do we know that 푣 and 푦 are not both empty? If they are, we’ve shown nothing. Let’s start off with the parse tree with the fewest nodes. If 푣 and 푦 were both empty, then when we stick the lower 푅-subtree higher up as in the last picture above, we get fewer nodes, contradicting our minimality assumption. Hence 푣 and 푦 can’t both be empty; this gives condition 2. ∙ Let’s figure out 푝. Let 푏 be the size of the largest right-hand side of a rule. We want the tallness to be at least |푉 | + 1 (|푉 | is the number of variables.) At each level, the number of nodes multiplies by at most 푏. If we set 푝 = 푏|푉 |+1, then the tree would have at least |푉 | + 1 levels, so one of the symbols would repeat, as needed. ∙ To satisfy item 3 we take the lowest repetition of a symbol, so that there can be no repetitions below. This will give the bound |푣푥푦| ≤ 푝. S4 Turing machines Everything we’ve done so far is a warm-up. We’ve given two models of computations that are deficient in a certain sense because they can’t even do what we think computers can do, such as test whethere a string is of the form 푎푘푏푘푐푘. A Turing machine is vastly more powerful; it is a much closer model to what we think about when we think about a general-purpose computer. The input tape of a Turing machine combines the features of both the input and stack. It is a place where we can both read and write. ∙ We can read and write on the tape. This is the key difference. The following are other differences. 42
  • 43. Lecture 6 Notes on Theory of Computation ∙ We are able to both move the tape forward and back, so we can read what we wrote before. (It’s a two way head.) ∙ The tape is infinite to the right. At the beginning, it is filled with a finite string, and the rest of the tape is filled will special symbols called blanks. The head starts on the leftmost tape cell. ∙ The machine accepts by entering an “accept” state anywhere. (It no longer makes sense to say the Turing machine accepts only at the end of the string—it might have erased or changed the last symbol!) ∙ There is a “reject” state; if the machine visits that state, stop and reject (reject by halting). ∙ A Turing machine can also reject by entering an infinite loop (“looping”).2 For the time being we’ll just allow the deterministic variant. Example 5.7: We outline how to build a Turing machine that recognizes {푎푛푏푛푐푛}. Let’s assume we can test when we’re at the beginning. We go through the string and cross out the first 푎, 푏, and 푐 that appear. If we find letters that are out of order, we reject. Otherwise we go back to the beginning and continue to cross off symbols 푎, 푏, and 푐 one at a time. If we cross out the last 푎, 푏, and 푐 on the same run, then accept. When we cross a symbol off, write the symbol 푥 to remember that we crossed out some-thing there. We’ll write down the formal definition next time. Our transition function will depends on on both the state and tape symbol. 2How does the Turing machine know it has entered a infinite loop? Mathematically being able to define when the machine rejects is different from what we can tell from the machine’s operation. We’ll talk more about this later. 43
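As a sanity check on the crossing-off strategy, here is my own rendering of the idea in Python, with a list standing in for the tape; it is not a literal Turing machine, just the same one-푎, one-푏, one-푐-per-pass bookkeeping.

```python
def accepts_anbncn(s):
    """Mimic the crossing-off Turing machine strategy for {a^n b^n c^n}."""
    if set(s) - set("abc") or sorted(s) != list(s):
        return False                      # wrong alphabet, or letters out of order
    tape = list(s)
    while "a" in tape:
        # cross off one 'a', one 'b', and one 'c' per pass, as the TM does
        for sym in "abc":
            if sym in tape:
                tape[tape.index(sym)] = "x"
            else:
                return False              # ran out of b's or c's before the a's
    return "b" not in tape and "c" not in tape   # accept iff everything was matched

for s in ["", "abc", "aabbcc", "aabcc", "abbc", "cba"]:
    print(s, accepts_anbncn(s))           # True True True False False False
```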
  • 44. Lecture 6 Notes on Theory of Computation Lecture 6 Tue. 9/25/12 Last time we talked about ∙ CFG↔PDA (we only proved →) ∙ Pumping lemma for CFL’s ∙ Turing machines Turing machines are an important model for us because they capture what we think of when we think of a general-purpose computer, without constraints like only having a stack memory or a finite memory. Instead, it has an unlimited memory. Today we’ll talk about ∙ Turing machine variants – Multitape – Nondeterministic – Enumerators ∙ Church-Turing thesis In the first half of today’s lecture, we’ll develop some intuition for Turing machines. We will prove that several variations of Turing machines are actually all equivalent. As we’ll see, this has philosophically important implications. S1 Turing machines We now give a formal definition of a Turing machine. Definition 6.1: A Turing machine (TM) is a 7-tuple 푀 = (푄,Σ, Γ, 훿, 푞0, 푞acc, 푞rej) where ∙ 푄 is the set of states, ∙ Σ is the input alphabet, ∙ Γ is the tape alphabet, ∙ 훿 is a function 푄 × Γ → 푄 × Γ × {퐿,푅}. Here 퐿 or 푅 denote the movement of the head. 44
  • 45. Lecture 6 Notes on Theory of Computation ∙ 푞0 is the start state, ∙ 푞푎 are accept states, and ∙ 푞푟 are reject states. If the machine tries to move off the left extreme of the tape, the machine instead just stay where it is.3 A Turing machine may halt (accept or reject) or loop (reject). If the machine loops we say the machine rejects, but think of it as rejecting after “infinite time”; we don’t know at any finite time that the machine has rejected. Definition 6.2: Let 퐿(푀) = {푤 : 푀 on input 푤 accepts} . If 퐴 = 퐿(푀) for some Turing Machine 푀, we say that 퐴 is Turing-recognizable (also called recursively enumerable). An important subclass of Turing Machines are those that always halt. Definition 6.3: A TM 푀 is a decider if 푀 halts on every input. If 퐴 = 퐿(푀) for some decider, we say 퐴 is decidable. Turing Machines which reject by halting are more desirable than those that reject by looping. We just introduced 2 new classes of languages, Turing-recognizable languages and decid-able languages. We have CFL’s ⊂ decidable ⊂ T-recognizable where the inclusions are proper (we’ll show the right-hand inclusion is proper). We need to show containment in the left-hand side and nonequality in the RHS. 1.1 A brief history of Turing machines Why are Turing machines so important, and why do we use them as a model for a general-purpose computer? The concept of a Turing machines dates back to the 1930’s. It was one of a number of different models of computation that tried to capture effective computability, or algorithm, as we would now say. Other researchers came up with other models to capture computation; for example, Alonzo Church developed lambda calculus. 3Other treatments do different things. However, minor details don’t make any difference. We get the same computing power, and the same class of languages is recognized. The model of Turing machine is robust; it’s not sensitive to little details. 45
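Here is a minimal single-tape simulator matching the 7-tuple definition above. The dictionary format for 훿 and the toy example machine are my own choices, and the step limit is only there because, as just discussed, a machine may loop without our ever finding out.

```python
def run_tm(delta, q0, q_acc, q_rej, w, blank="_", max_steps=10_000):
    """Simulate a single-tape TM.  delta maps (state, symbol) -> (state, symbol, 'L' or 'R').
    Returns 'accept', 'reject', or 'no answer yet' (the machine may be looping)."""
    tape, head, state = list(w) or [blank], 0, q0
    for _ in range(max_steps):
        if state == q_acc:
            return "accept"
        if state == q_rej:
            return "reject"
        state, sym, move = delta[(state, tape[head])]
        tape[head] = sym
        head = max(0, head + (1 if move == "R" else -1))   # can't fall off the left end
        if head == len(tape):
            tape.append(blank)                             # the tape is infinite to the right
    return "no answer yet"

# Toy example machine (my own): accept strings of 0's of even length.
delta = {
    ("even", "0"): ("odd", "0", "R"),
    ("odd", "0"): ("even", "0", "R"),
    ("even", "_"): ("acc", "_", "R"),
    ("odd", "_"): ("rej", "_", "R"),
}
for w in ["", "00", "000"]:
    print(repr(w), run_tm(delta, "even", "acc", "rej", w))   # accept, accept, reject
```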
  • 46. Lecture 6 Notes on Theory of Computation It wasn’t obvious that these different models are equivalent, i.e., that they captured the same class of computations. However, they did. Nowadays we have programming languages. Can today’s more “advanced” programming languages (Python) do more than a FORTRAN program? It has a lot of new features compared to old boring “do” loops. It is conceivable that as we add more constructs to a programming language, it becomes more powerful, in the sense of computing functions and recognizing languages. However, anything you can do with one language you can do with another. (It might simply be easier to program in one than another, or one might run faster.) How can we show we can do the same thing with Python as FORTRAN? We can convert Python programs into FORTRAN, or convert FORTRAN programs to Python. We can simulate one language with the other. This “proves” that they have the same computational power. That’s what the researchers of computation theory did. They gave ways to simulate Turing machines by lambda calculus, lambda calculus by Turing machines, as well as different variations of these models. They found that all these models were doing the same thing! We’ll see this for ourselves, as we show that several variations of Turing machines all have the same computational power. 1.2 Multitape Turing machines Definition 6.4: A multitape Turing machine is a Turing machine with multiple tapes. The input is written on the first tape. The transition function can now look at all the symbols under each of the heads, and write and move on each tape. We could define all this rigorously if we wanted to. Theorem 6.5: 퐴 is Turing-recognizable iff some multitape TM recognizes 퐴. In other words, Turing-recognizability with respect to one-tape Turing machines is the same as Turing-recognizability with respect to multi-tape Turing machines. Proof. If 퐴 is Turing recognizable, then clearly a multitape TM recognizes 퐴, because a single-tape TM is a multitape TM. Suppose we have a language recognizable with a multitape TM. We need something like a compiler to convert a multitape TM to a one-tape TM, so that we can use a one-tape TM to simulate a multi-tape TM. Let 푀 be a multitape TM. 푀 can do stuff with primitive operations that a single-tape TM 푆 can’t do. It can write to a 2nd tape, which 푆 doesn’t have! We need to use some data structure on 푆’s tape to represent what appears on the multiple tapes of 푀. 46
  • 47. Lecture 6 Notes on Theory of Computation 푆 initially formats its tape by writing separators # after the input string, one for each tape that 푀 has. The string between two separators will represent the string on one of 푀’s tapes. Next 푆 moves into simulation phase. Every time 푀 does one step, 푆 simulates it with many steps. (This is just “programming,” in single-machine TM code.) 푆 has to remember where the heads are on the multiple tapes of 푀. We enhance the alphabet on 푆 to have symbols with dots on them ˙푎 to represent the positions of the heads. 푆 update the locations of these markers to indicate the locations of the heads. (Figure 3.14 from the textbook.) There are details. For example, suppose 푀 decides to move head to an initially blank part. 푆 only has allocated finite memory to each tape! 푆 has to go into an “interrupt” phase and move everything down one symbol, before carrying on. A lot of models for computation turn out to be equivalent (especially variants of Turing machines). To show they are equivalent, give a way to simulate one model with the other. The same proof carries through for deciders: A language is decidable by a multitape TM iff it is decidable by a single-tape TM. Let’s look at another example, similar but important for us down the road. 1.3 Nondeterministic TM Definition 6.6: A nondeterministic Turing machine (NTM) is like a Turing machine except that the transition function now allows several possibilities at each step, i.e., it is a function 훿 : 푄 × Γ → 풫(푄 × Γ × {퐿,푅}). If any thread of the computation accepts, then we say the Turing machine accepts. (Accept-ing overrules rejecting.) We say that a nondeterministic TM is a decider if every branch halts on every input. 47
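Before that, a tiny sketch of the tape format used in the simulation just described, under my own encoding assumptions: '#' as the separator and a trailing '.' standing in for the dotted head marker.

```python
def encode_tapes(tapes, heads):
    """Lay out the contents of several tapes on one tape, '#'-separated,
    with a dot (here a trailing '.') marking the symbol under each head."""
    pieces = []
    for tape, h in zip(tapes, heads):
        cells = [c + "." if i == h else c for i, c in enumerate(tape)]
        pieces.append("".join(cells))
    return "#" + "#".join(pieces) + "#"

# Snapshot of a 3-tape machine: heads on positions 1, 0, 2 of their tapes.
print(encode_tapes(["0101", "aaa", "ba_"], [1, 0, 2]))
# -> #01.01#a.aa#ba_.#
```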
  • 48. Lecture 6 Notes on Theory of Computation Theorem 6.7: 퐴 is Turing-recognizable iff some NTM recognizes 퐴. As we will see in the second half of the course, nondeterministic TM’s are very important. For our purposes now, they have the same power as deterministic TM’s, because they recognize the same class of languages. Proof. Any deterministic Turing machine is an NTM, so this direction is obvious. We want to convert an NTM 푁 to a DTM 푀. 푀 is supposed to accept exactly the same inputs 푁 accepts, by simulating 푁. How does this work? This is trickier. 푁 may have made a nondeterministic move, resulting in 2 or more options. 푀 doesn’t know which to follow. If there are multiple ways to go, then take that piece of tape, make several copies of the tape separated by #, and carry on the simulation. This is just like the proof of Theorem 6.5, except that different segments of tape don’t represent different tapes, they represent different threads. We have to represent both the head and the state for each of the threads. The number of threads may grow in some unbounded way. 푀 can’t keep track of all the different states in finite memory, so we had better write them all down. To do this, allow a composite symbol, a state 푞 written on top of a tape symbol 푎, to mean that in that thread the head is at that cell, the cell contains 푎, and the machine is in state 푞. 푀 proceeds by taking a thread, seeing what 푁 would do, and updating the thread. One of the threads may again fork into multiple possibilities. In that case we have to open up room to write down copies of a thread, by moving stuff down. 푀 goes through each thread and repeats. The only thing we have to take note of is: when should 푀 end up accepting? If 푁 enters an accept state on any thread, then 푀 enters its accept state. If 푀 notices some thread of 푁 enters a reject state, then 푀 collapses the thread down or marks it as rejecting, so it doesn’t proceed with that thread further, and carries on with the other threads. Question: When does nondeterminism help in a model of computation? In the second half of the course, when we care about how much time computation takes, the big question is whether NTM and TM are equivalent. It is not obvious when nondeterminism is equivalent to determinism. If we can answer this question for polynomial time TM’s, then we’ve just solved a famous problem (P vs. NP). Let’s just do 1 more model, that has a different flavor than what we’ve done, and is slightly more interesting. 1.4 Turing enumerators Instead of recognition, can you just list the members of a language? Definition 6.8: A Turing enumerator is a Turing machine with a “printer” (output device). Start the Turing machine on an empty tape (all blanks). The Turing enumerator has a special feature that when it goes into “print” mode, it sends out a marked section of the tape to the printer to write out. 48
  • 49. Lecture 6 Notes on Theory of Computation (Figure 3.20 in textbook) The strings that are written out by an enumerator 퐸 are considered to be its language: 퐿(퐸) = {푤 : 퐸 outputs 푊 at some point when started on blanks} . If 퐸 halts, then the list is finite. It could also go on forever, so it can enumerate an infinite language. Again, Turing enumerators capture the same class of languages. Theorem 6.9: 퐴 is Turing-recognizable iff 퐴 = 퐿(퐸) for some enumerator 퐸. Proof. Here we need to prove both directions. (←) Convert 퐸 to an ordinary recognizer 푀. Given 퐸, we construct a TM 푀 with 퐸 built inside it. Have 푀 leave the input string 푤 alone. 푀 moves to the blank portion of the tape and runs 퐸. When 퐸 decides to print something out, 푀 takes a look to see if the string is 푤. If not, then 푀 keeps simulating 퐸. If the string is 푤, then 푀 accepts. Note that if 푀 doesn’t find a match, it may go on forever—this is okay, 푀 can loop by rejecting. We have to take advantage of 푀 being able to go on forever. (→) Convert 푀 to enumerator 퐸. The idea is to feed all possible strings to 푀 in some reasonable order, for instance, lexicographic order 휀, 0, 1, 00, 01, 10, 11. However, we have to be careful. Suppose 푀 is running on 101. If 푀 accepts 101, then we print it out. If 푀 halts and rejects 101, then 퐸 should move on to the next string. The only problem is when 푀 runs forver. What is 퐸 supposed to do? 퐸 doesn’t know 푀 is going forever! We can’t get hung up running 푀 on 101. We need to check 110 too! The solution is to run 푀 for a few steps on any given string, and if hasn’t halted then move on, and come back to it lter. We share time among all strings where computation hasn’t ended. Run more and more strings for longer and longer. More precisely, for 푘 = 1, 2, 3, . . ., 퐸 runs 푀 on the first 푘 strings for 푘 steps. If 푀 ever accepts some string 푠, then print 푠. 49
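The dovetailing step (“run 푀 on the first 푘 strings for 푘 steps”) can be sketched as follows. This is only an illustration: the “machine” is represented by a Python generator that yields once per simulated step and returns True if it accepts (it may also loop forever), and the example machine is invented for the demo.

```python
from itertools import count, islice, product

def all_strings(alphabet="01"):
    """Yield strings in string order: epsilon, 0, 1, 00, 01, 10, 11, ..."""
    yield ""
    for n in count(1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

def enumerate_language(machine, rounds=8):
    """Dovetail: in round k, run the machine on the first k strings for k steps each."""
    printed = set()
    for k in range(1, rounds + 1):
        for w in islice(all_strings(), k):
            thread = machine(w)           # a fresh simulation of the machine on w
            try:
                for _ in range(k):        # run it for k steps
                    next(thread)
            except StopIteration as stop: # the machine halted within k steps
                if stop.value and w not in printed:
                    printed.add(w)
                    print(repr(w))        # "send it to the printer"

# Illustration only: a "machine" that accepts strings with an even number of 1's,
# taking one step per input symbol, and rejects by looping otherwise.
def even_ones(w):
    for _ in w:
        yield
    if w.count("1") % 2 == 0:
        return True
    while True:                           # reject by looping
        yield

enumerate_language(even_ones)             # prints '', '0', '00', '11', '000'
```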
  • 50. Lecture 6 Notes on Theory of Computation S2 Philosophy: Church-Turing Thesis The Church-Turing Thesis was important in the history of math. After proposing all these different models to capture what we can compute, people saw how they were all equivalent (in an non-obvious way). Axiom 6.10 (Church-Turing Thesis): church-turing Our perception of what we can do with a computer (an algorithm, effective procedure) is exactly captured by Turing machine. Our inuitive “Algorithm” is the precise notion of a “Turing machine.” It might seem arbitrary for us to focus on Turing machines, when this is just one model of computation. But the Church-Turing Thesis tells us the models are all equivalent! The notion of algorithm is a natural, robust notion. This was a major step forward in our understanding of what computation is. It’s almost saying something about the physical universe: there’s nothing we can build in the physical world that is more powerful than a Turing machine. David Hilbert gave an address at the International Congress of Mathematicians in 1900. He was probably the last mathematician who knew what was going on in every field of mathematics at the same time. He knew the big questions in each of those fields. He made a list of 23 unsolved problems that he felt were a good challenge for the coming century; they are called the Hilbert problems. Some of them are solved. Some of them are fuzzy, so it’s not clear whether they are solved. Some of them have multiple parts, just like homework. One of the questions was about algorithms—Hilbert’s tenth problem, which I’ll describe. Suppose we want to solve a polynomial equation 3푥2 +17푥−22 = 0. This is easily done. But suppose we don’t want to know if a polynomial equation has a root, but whether it have a root where variables are integers. Furthermore, we allow variables with several variables. This makes things a lot harder. For instance, we could have 17푥푦2 + 2푥 − 21푧5 + 푥푦 + 1 = 0. Is there an assignment of integers in 푥, 푦, 푧 such that this equation is satisfied? Hilbert asked: Is there a finite procedure which concludes after some finite number of steps, that tells us whether a given polynomial has an integer root? We can put this in our modern framework. Hilbert didn’t know what a “procedure” was in a mathematical sense. In these days, this is how we would phrase this question. Problem 6.1 (Hilbert’s Tenth Problem): Let 퐷 = {푝 : 푝 is a multivariable polynomial that has a solution (root) in integers} . 50
  • 51. Lecture 7 Notes on Theory of Computation Is 퐷 decidable?4 The answer is no, as Russian mathematician Matiasevich found when he was 20 years old. Without a precise notion of procedure, there was no hope of answering the question. Hilbert originally said, give a finite procedure. There was no notion that there might not be a procedure! It took 35 years before the problem could be addressed because we needed a formal notion of procedure to prove there is none. Here, the Church-Turing Thesis played a fundamental role. Lecture 7 Thu. 9/27/12 Last time we talked about ∙ TM variants ∙ Church-Turing Thesis Today we’ll give examples of decidable problems about automata and grammars. S0 Hints Problem 1: Prove some language is not context-free. Use the pumping lemma! The trick is to find the right string to use for pumping. Choose a string longer than the pumping length such that no matter how you try to pump it up, you get out of the language. The first string you think of pumping may not work; probably the second one will work. Problem 2: Show context-free. Give a grammar or a pushdown automaton. At first glance it doesn’t look like a context-free language. Look at the problem, and see that this is a language written in terms of having to satisfy two conditions, each of which seems to need the stack. The problem seems to be that if you use the stack for the first condition, it’s empty for the second condition. Instead of thinking of it as an AND of two conditions, think of it as an OR of several conditions. Problems 3 and 4: easy. Problem 4 is about enumerators. Problem 5: a variant of a Turing machine. Practice with programming an automaton. Problem 6: (4.17 in the 2nd edition and 4.18 in the 3rd edition) Let 퐶 be a language. Prove that 퐶 is Turing-recognizable iff a decidable language 퐷 exists such that 퐶 = {푥 : for some 푦, ⟨푥, 푦⟩ ∈ 퐷} . We’ll talk about this notation below. 4Note 퐷 is Turing recognizable. Just start plugging in all possible tuples of integers, in a systematic list that covers all tuples. If any one is a root, accept, otherwise carry on. 51
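The footnote’s observation—퐷 is Turing-recognizable because we can plug in all integer tuples systematically—can be sketched as below. The box-by-box search order and the cutoff are my own choices; a true recognizer would keep searching forever when there is no root.

```python
from itertools import product

def find_integer_root(p, num_vars, max_box=50):
    """Search for an integer root of p (a Python function of num_vars integers)
    by trying all tuples in boxes [-r, r]^num_vars of growing radius r.
    Returns a root if one is found within the bound, else None."""
    for r in range(max_box + 1):
        for point in product(range(-r, r + 1), repeat=num_vars):
            # only test points on the boundary of the current box (new this round)
            if max(map(abs, point)) == r and p(*point) == 0:
                return point
    return None   # no root found within the bound (there might still be one!)

# x^2 - 4 = 0 has the integer root x = -2 (found at box radius 2).
print(find_integer_root(lambda x: x**2 - 4, 1))
# x^2 + y^2 - 25 = 0 has integer roots; (-5, 0) is found first.
print(find_integer_root(lambda x, y: x**2 + y**2 - 25, 2))
# 3x^2 + 17x - 22 = 0 (from the lecture) has no integer root, so the search gives up.
print(find_integer_root(lambda x: 3*x**2 + 17*x - 22, 1))
```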
  • 52. Lecture 7 Notes on Theory of Computation 0.1 Encodings We want to feed more complicated objects into Turing machines—but a Turing machine can only read strings. If we want to feed a fancy object into a program we have to write it as a string. We need some way of encoding objects, and we’d like some notation for it. For any formal finite object 퐵, for instance, a polynomial, automaton, string, grammar, etc., we use ⟨퐵⟩ to denote a reasonable encoding of 퐵 into a binary string. ⟨퐵1, . . . ,퐵푘⟩ encodes several objects into a string. For example, to encode an automaton, write out the list of states and list of transitions, and convert it into a long binary string, suitable for feeding in to a TM. Problem 6 links recognizability and decidability in a nice way. You can think of it as saying: “The projection of a recognizable language is a decidable language.” Imagine we have a coordinate systems, 푥 and 푦. Any point corresponds to some (푥, 푦). Look at all 푥 such that for some 푦, ⟨푥, 푦⟩ is in 퐷. So 퐶 consists of those points of 푥 underneath some element of 퐷. We’re taking all the (푥, 푦) pairs and remove the 푥. Shrinks 2-d shape into 1-d shadow; this is why we call it the projection. This will reappear later on the course when we talk about complexity theory! You need to prove an “if and only if.” Reverse direction is easy. If 퐷 is decidable, and you can write 퐶 like this, we want 퐶 to be recognizable. We need to make a recognizer for 퐶. It accepts for strings in the language but may go on forever for strings not in the language. Accept if in 퐶 but don’t know what 푦 is. Well, let’s not give it away more! The other direction is harder. Given T-recognizable 퐶, show that 퐷 is decidable, we don’t even know what 퐷 is! We have to find an “easier” language, so 푦 sort-of helps you determine whether 푥 ∈ 퐶. If 퐶 were decidable easy, just ignore 푦. Which 푦 should you use? Make it a decidable test. The 푦 somehow proves that 푥 ∈ 퐶. For each 푥 ∈ 퐶 there has to be some 푦 up there somewhere. Wht does 푦 do? The nice thing about 푦 in 퐶, is that if the proof fails, the decider can see that the proof fails. (Whatever I mean by proof. Conceptually. Test for validity.) Go from recognizer to decider. Nice problem! S1 Examples of decidable problems: problems on FA’s By the Church-Turing Thesis 6.10, algorithms are exactly captured by Turing machines. We’ll talk about algorithms and Turing machines interchangeably (so we’ll be a lot less formal about putting stuff in Turing machine language). Theorem 7.1: thm:ADFA Let 퐴DFA = {⟨퐵,푤⟩ : 퐵 is a DFA and 퐵 accepts푤} . Then 퐴DFA is decidable. The idea is to just run the DFA! We’ll do some easy things to start. 52
  • 53. Lecture 7 Notes on Theory of Computation Proof. We’ll give the proof in high level descriptive language (like pseudocode), rather than explicitly draw out state diagrams. We’ll write the proof in quotes to emphasize that our description is informal but there is a precise mathematical formulation we can make. Let 퐶=“on input string 푥 1. Test if 푥 legally encodes ⟨퐵,푤⟩ for some DFA 퐵 and 푤. Does it actually encode a finite automata and string? If not, reject (it’s a garbage string). 2. Now we know it’s of the correct form. Run 퐵 on 푤. We’ll give some details. We’ll use a multi-tape Turing machine. Find the start state, and write it on the working tape. Symbol by symbol, read 푤. At each step, see what the current state is, and transition to the next state based on the symbol read, until we get to end of 푤. Look up the state in 퐵 to see whether it is an accept state; if so accept, and otherwise reject. 3. Accept if 퐵 accepts. Reject if 퐵 rejects.” Under the high-level ideas, the details are there. From now on, we’ll just give the high-level proof. This is the degree of formality that we’ll provide and that you should provide in your proofs. Brackets mean we agree on some encoding. We don’t go through the gory details of spelling out exactly what it is; we just agree it’s reasonable. We go through some details here, so you can develop a feeling for what intuition can be made into simulations. Each stage should be obviously doable in finite time. Turing machines are “powerful” enough: trust me or play around with them a bit to see they have the power any programming language has. We’ll do a bunch of examples, and then move into some harder ones. Let’s do the same thing for NFA’s. Theorem 7.2: Let 퐴NFA = {⟨퐵,푤⟩ : 퐵 is a NFA and 퐵 accepts 푤} . Then 퐴NFA is decidable. 53
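Here is a sketch of stage 2 of the machine 퐶 from the proof of Theorem 7.1—running a DFA on 푤 one symbol at a time—before we handle the NFA case. The dictionary encoding of the DFA and the example automaton are my own, not part of the notes.

```python
def dfa_accepts(dfa, w):
    """Stage 2 of the decider for A_DFA: follow one transition per input symbol,
    then check whether the state we end in is an accept state."""
    state = dfa["start"]
    for symbol in w:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accept"]

# A hypothetical DFA recognizing binary strings that contain "01".
dfa = {
    "start": "q0",
    "accept": {"q2"},
    "delta": {("q0", "0"): "q1", ("q0", "1"): "q0",
              ("q1", "0"): "q1", ("q1", "1"): "q2",
              ("q2", "0"): "q2", ("q2", "1"): "q2"},
}
print(dfa_accepts(dfa, "1101"), dfa_accepts(dfa, "110"))   # True False
```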
  • 54. Lecture 7 Notes on Theory of Computation We can say exactly what we did before for NFA’s instead of DFA’s. However, we’ll say it a slightly different way, to make a point. Proof. We’re going to use the fact that we already solved the problem for DFA’s. Turing machine 퐷 =“on input ⟨퐵,푤⟩, (By this we mean that we’ll check at the beginning whether the input is of this form, and reject if not.) 1. Convert the NFA 퐵 to an equivalent DFA 퐵′ (using the subset construction). All of those constructions can be implemented with Turing machines. 2. Run TM 퐶 (from the proof of Theorem 7.1) on input ⟨퐵′,푤⟩. 3. Accept if 퐶 accepts, reject if 퐶 rejects. We see that in this type of problem, it doesn’t matter whether we use NFA or DFA, or whether we use CFG or PDA, because each in the pair recognizes the same class of languages. In the future we won’t spell out all equivalent automata; we’ll just choose one representative (DFA and CFG). Let’s do a slightly harder problem. Theorem 7.3: Let 퐸DFA = {⟨퐵⟩ : 퐵 is a DFA and 퐿(퐵) = 휑} . Then 퐸DFA is decidable. This is the emptiness testing problem for DFA’s: Is there one string out there that the DFA accepts? Proof. How would you test if a DFA 퐵 has an empty language? Naively we could test all strings. That is not a good idea, because this is not something we can do in finite time. Instead we test whether there is a path from the start state to any of the accept states: Mark the start state, mark any state with a transition coming in from a previously marked state, and so forth, until you can’t mark anything new. We eventually get to all states that are reachable under some input. If we’ve marked all reachable states, and haven’t marked any accept state, then 퐵 has empty language. 54
  • 55. Lecture 7 Notes on Theory of Computation With this idea, let’s describe the Turing machine that decides 퐸DFA. Let 푆 =“ on input ⟨퐵⟩. 1. Mark the start state. 2. Repeat until nothing new is marked: Mark all states with a transition coming in from previously marked states. 3. Accept if no accept state is marked. Reject otherwise. This is detailed enough for us to build the Turing machine if we had the time, but high-level enough so that the focus is on big details and not on fussing with minor things. (This is how much detail I expect in your solutions.) Note this applies to NFA’s as well because we can convert NFA’s to DFA’s and carry out the algorithm we just described. Theorem 7.4 (Equivalence problem for DFA’s): Let 퐸푄DFA = {⟨퐴,퐵⟩ : 퐴,퐵 DFA’s and 퐿(퐴) = 퐿(퐵)}. Then 퐸푄DFA is decidable. Proof. Look at all the places where 퐿(퐴) and 퐿(퐵) are not the same. Another way to phrase the equivalence problem (is 퐿(퐴) = 퐿(퐵)?) is as follows: Is the symmetric difference 퐴△퐵 = (퐿(퐴) ∩ 퐿(퐵)푐) ∪ (퐿(퐴)푐 ∩ 퐿(퐵)) empty? (Here 푐 denotes complement; 퐴△퐵 is the shaded region of a Venn diagram of 퐿(퐴) and 퐿(퐵) outside their overlap.) Let 퐸 =“ on input ⟨퐴,퐵⟩. ∙ Construct a DFA 퐶 which recognizes 퐴△퐵, using the constructions for complement, intersection, and union of regular languages. Test if 퐿(퐶) = 휑 using the TM 푆 that tested for emptiness (Theorem 7.3). ∙ Accept if it is 휑, reject if not. 55
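The marking procedure used by 푆 (and, through the symmetric difference, by 퐸) is easy to sketch: mark the start state, keep marking anything a marked state points to, and accept exactly when no accept state got marked. The DFA encoding below is a hypothetical dictionary format of my own.

```python
def dfa_language_is_empty(dfa):
    """Decider S from the notes: mark states reachable from the start state;
    the language is empty iff no accept state ever gets marked."""
    marked = {dfa["start"]}
    changed = True
    while changed:                        # repeat until nothing new is marked
        changed = False
        for (state, _symbol), target in dfa["delta"].items():
            if state in marked and target not in marked:
                marked.add(target)
                changed = True
    return not (marked & dfa["accept"])

# Hypothetical 2-state DFA whose accept state is unreachable: its language is empty.
dead = {"start": "p", "accept": {"q"},
        "delta": {("p", "0"): "p", ("p", "1"): "p",
                  ("q", "0"): "q", ("q", "1"): "q"}}
print(dfa_language_is_empty(dead))        # True
dead["delta"][("p", "1")] = "q"           # now q is reachable, so the language is nonempty
print(dfa_language_is_empty(dead))        # False
```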
  • 56. Lecture 7 Notes on Theory of Computation S2 Problems on grammars Let’s shift gears and talk about grammars. Theorem 7.5: ACFG Let 퐴CFG = {⟨퐺,푤⟩ : 퐺 is a CFG and 푤 ∈ 퐿(퐺)} . Then 퐴CFG is decidable. Proof. We want to know: does 퐺 generate 푤? We need an outside fact. We can try derivations coming from the start variable, and see if any of them lead to 푤. Unfortunately, without extra work, there are infinitely many things to test. For example, a word 푤 may have infinitely many parse trees generating it, if we had a rule like 푅 → 휀|푅. Definition 7.6: A CFG is in Chomsky normal form if all rules are of the form 푆 → 휀, 퐴 → 퐵퐶, or 퐴 → 푎, where 푆 is the start variable, 퐴,퐵,퐶 are variables, 퐵,퐶 are not 푆, and 푎 is a terminal. The Chomsky normal form assures us that we don’t have loops (like 푅 → 푅 would cause). A variable 퐴 can only be converted to something longer, so that the length can only increase. We need two facts about Chomsky normal form. Theorem 7.7: chomsky-nf 1. Any context-free language is generated by a CFG in Chomsky normal form. 2. For a grammar in Chomsky normal form, all derivations of a length 푛 string have at most a certain number of steps, 2푛 − 1. Let 퐹 =“on ⟨퐺,푤⟩. 1. Convert 퐺 to Chomsky normal form. 2. Try all derivations with 2푛 − 1 steps where 푛 = |푤|. 3. Accept if any yield 푤 and reject otherwise Corollary 7.8: CFL-decidable Every CFL is decidable. This is a different kind of theorem from what we’ve shown. We need to show every context-free language is decidable, and there are infinitely many CFL’s. Proof. Suppose 퐴 is a CFL generated by CFG 퐺. We build a machine 푀퐺 (depending on the grammar 퐺) deciding 퐴: 푀퐺=“on input 푤, 56
  • 57. Lecture 7 Notes on Theory of Computation 1. Run TM 퐹 deciding 퐴CFG (from Theorem 7.5) on ⟨퐺,푤⟩. Accept if 퐹 does and reject if not. Theorem 7.9 (Emptiness problem for CFG’s): 퐸CFG = {⟨퐺⟩ : 퐺 is a CFG and 퐿(퐺) = 휑} is decidable. Proof. Define a Turing machine by the following. “On input ⟨퐺⟩, 1. First mark all the terminals. In the example grammar 푆 → 푎푆푏 | 푇푏, 푇 → 푎 | 푇푎, we mark every occurrence of 푎 and 푏. 2. Now mark all variables that have a rule whose right-hand side consists entirely of marked symbols: here 푇 gets marked because of 푇 → 푎. Repeat until we can’t mark any more: then 푆 gets marked too, because of 푆 → 푇푏. 3. If 푆 is not marked at the end, accept; otherwise reject. Turing machines can decide a lot of properties of DFA’s and CFG’s because DFA’s and CFG’s have only finitely many states and variables. Thus we can say things like, “mark certain states/variables, then mark the states/variables connected to them, and so on, and accept if we eventually get to...” In contrast to the previous theorems, however, we have the following. Theorem 7.10 (Equivalence problem for CFG’s): 퐸푄CFG = {⟨퐴,퐵⟩ : 퐴,퐵 CFG’s and 퐿(퐴) = 퐿(퐵)} is undecidable. 57
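The same marking idea for grammars, sketched under my own encoding assumptions (a dict from variables to lists of right-hand sides, with any symbol that is not a key treated as a terminal):

```python
def cfg_language_is_empty(rules, start="S"):
    """Marking algorithm for E_CFG: mark the symbols that can derive a string of
    terminals; the language is empty iff the start variable never gets marked."""
    marked = set()
    changed = True
    while changed:
        changed = False
        for var, rhss in rules.items():
            if var not in marked and any(
                all(s in marked or s not in rules for s in rhs) for rhs in rhss
            ):
                marked.add(var)
                changed = True
    return start not in marked

# The example grammar from the proof:  S -> aSb | Tb,  T -> a | Ta.
rules = {"S": [("a", "S", "b"), ("T", "b")], "T": [("a",), ("T", "a")]}
print(cfg_language_is_empty(rules))                  # False: T gets marked, then S

# A grammar whose only rule loops forever generates nothing.
print(cfg_language_is_empty({"S": [("S", "S")]}))    # True
```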
  • 58. Lecture 8 Notes on Theory of Computation (Note that the complement of 퐸푄CFG is recognizable. The class of decidable languages is closed under complement, but the class of recognizable languages is not. In fact, the class of recognizable languages isn’t closed under intersection or complement.) We will prove this later. We also have the following theorem. Theorem 7.11 (Acceptance problem for TM’s): 퐴TM = {⟨푀,푤⟩ : TM 푀 accepts 푤} is undecidable. However it is 푇-recognizable. To see that 퐴TM is 푇-recognizable, let 푈 =“ on ⟨푀,푤⟩. Simulate 푀 on 푤.” Note this may not stop. This is a famous Turing machine. It is the “universal machine,” and the inspiration for von Neumann architecture. It is a machine that one can program, without having to rewire it each time, so it can do the work of any other machine. Lecture 8 Tue. 10/2/12 Today Zack Remscrim is filling in for Michael Sipser. We summarize the relationships between the three types of languages we’ve seen so far. S1 Languages Proposition 8.1: Each of the following classes of languages is a proper subset of the next. 1. Regular 2. CFL 3. Decidable 4. Turing-recognizable Proof. We’ve already shown that the classes are subsets of each other. We have that {푎푛푏푛 : 푛 ≥ 0} is a CFL but not a regular language, and {푎푛푏푛푐푛 : 푛 ≥ 0} is decidable but not a CFL. Today we’ll finish the proof by showing that 퐴푇푀 = {⟨푀,푤⟩ : 푀 is a TM that accepts 푤} is Turing-recognizable but not decidable. 58
  • 59. Lecture 8 Notes on Theory of Computation We’ll also show there is a language that is not even Turing-recognizable. Theorem 8.2: 퐴푇푀 is Turing-recognizable. Proof. Let 푈 =“on input ⟨푀,푊⟩, 1. Run 푀 on 푤. 2. If 푀 accepts, then accept. If 푀 halts and rejects, then reject.” 푀 doesn’t have to be a decider, it may reject by looping. Then 푈 also rejects by looping. We can’t do something stronger, namely make a test for membership and be certain that it halts (i.e., make a decider). S2 Diagonalization Diagonalization is a technique originially introduced to compare the sizes of sets. We have a well-defined notion of size for finite sets. For infinite sets, it’s not interesting just to call them all “infinite.” We’d also like to define the size of an infinite set, so that we can say one infinite set is larger or the same size as another. Definition 8.3: Two sets 퐴 and 퐵 have the same size if there exists a one-to one (injec-tive) and onto (surjective) function 푓 : 퐴 → 퐵. Here, ∙ “one-to-one” means if 푥̸= 푦, then 푓(푥)̸= 푓(푦). ∙ “onto” means for all 푦 ∈ 퐵 there exists 푥 ∈ 퐴 such that 푓(푥) = 푦. We also say that 푓 : 퐴 → 퐵 is a 1-1 correspondence, or a bijection. This agrees with our notion of size for finite sets: we can pair off elements in 퐴 and 퐵 (make a bijection) iff 퐴 and 퐵 have the same number of elements. This might seem like an excessive definition but it’s more interesting when applied to infinite sets. Example 8.4: Let N = {1, 2, 3, 4, . . . , } E = {2, 4, 6, 8, . . . , }. Then N and E have the same size, because the function 푓(푛) = 2푛 gives a bijection N → E. 59
  • 60. Lecture 8 Notes on Theory of Computation [Table: 푛 = 1, 2, 3, 4, . . . is paired with 푓(푛) = 2, 4, 6, 8, . . ..] Note N and E have the same size even though E is a proper subset of N. This will usefully separate different kinds of infinities. We’re setting the definition to be useful for us. We want to distinguish sets that are much much bigger than N, such as the real numbers. Definition 8.5: A set is countable if it is finite or has the same size as N. Example 8.6: The set of positive rationals Q+ = {푚/푛 : 푚, 푛 ∈ N} is countable. To see this, we’ll build up a grid of rational numbers in the following way. [Grid: the entry in row 푚, column 푛 is the fraction 푚/푛, so row 1 is 1/1, 1/2, 1/3, 1/4, . . ., row 2 is 2/1, 2/2, 2/3, 2/4, . . ., row 3 is 3/1, 3/2, 3/3, 3/4, . . ., and so on.] 60
  • 61. Lecture 8 Notes on Theory of Computation Every rational number certainly appears in the table. We’ll snake our way through the grid. [Grid as before, traversed diagonal by diagonal: 1/1; then 2/1, 1/2; then 3/1, 2/2, 1/3; and so on.] Now put the numbers in this order in a list next to 1, 2, 3, . . .: 푓(1) = 1/1, 푓(2) = 2/1, 푓(3) = 1/2, 푓(4) = 3/1, 푓(5) = 1/3, . . . Note some rational numbers appear multiple times in the grid; for instance, 1 appears as 1/1, 2/2, . . .. In the correspondence we don’t want to repeat these, we just go to the next value. This creates a bijection between N and Q+, showing Q+ is countable. A lot of infinite sets seem to have the same size, so is this a completely useless definition? No, there are infinite sets bigger than others, so it is useful for us. Are the real numbers of the same size as rational numbers? Theorem 8.7: The set of real numbers R is not countable. Our proof uses the technique of diagonalization, which will also help us with the proof for 퐴푇푀. Proof. Assume by contradiction that R is countable; there exists a bijection 푓 : N → R. We’re going to prove it’s not a bijection, by showing that it misses some 푦. Let’s illustrate with a potential bijection 푓. 61
  • 62. Lecture 8 Notes on Theory of Computation 푛 푓(푛) 1 1.4142 2 3.1415 3 2.7182 4 1.6108 ... ... We’ll construct a number 푦 that is missed by 푓 in the following way: Let 푦 differ from 푓(푖) at the 푖th place to the right of the decimal point. 푛 푓(푛) 1 1.4142 2 3.1415 3 2.7182 4 1.6108 ... ... For instance, let 푦 = 0.3725 . . . We claim 푦 can’t show up in the image of 푓. Indeed, this is by construction: it differs from 푓(푖) in the 푖th place, so it can’t be 푓(푖) for any 푖. There’s one little detail: 1 and .999 . . . are equal even though their decimal representations are different. To remedy this, we’ll just never use a 0 or 9 in 푦 to the right of the decimal. This is just to get around a little issue, though. The main idea is that given an alleged bijection, I can show it’s not a bijection by constructing a value it misses. We’ve shown that there can’t be a bijection N → R; therefore R is uncountable. Use diagonalization when you want to construct an element that is different from every element on a given list. This is used in proofs by contradiction, for example, when you want to show a function can’t hit every element of a set. Theorem 8.8: Let ℒ = {퐿 : 퐿 is a language} . Then ℒ is uncountable. The proof uses the same diagonalization idea. Proof. It’s enough to show just ℒ is uncountable when the alphabet is just 0, because every alphabet contains at least 1 symbol. The set of possible strings is {0}* = {휀, 0, 00, 000, . . .}. 62
  • 63. Lecture 8 Notes on Theory of Computation For a language 퐿, define the characteristic vector of 휒퐿 by 휒퐿(푣) = 0 if 푣̸∈ 퐿 and 1 if 푣 ∈ 퐿. 휒퐿 simply records whether each word is in 퐿 or not. There is a correspondence between each language and its characteristic vectors. All we have to show is the set of characteristic vectors is uncountable. The set of strings of count-able length is uncountable. Assume by contradiction that {휒퐿 : 퐿 is a language over {0}} is countable. Suppose we have some bijection from N to the set of characteristic vectors 휒퐿, 푛 푓(푛) 1 1011 · · · 2 0000 · · · 3 1111 · · · 4 1010 · · · ... ... Again, there has to be some binary string that is missed by 푓. We choose 푦 so it differs from 푓(푖) at the 푖th place. 푛 푓(푛) 1 1011 2 0000 3 1111 4 1010 ... ... 푦 = 0101 · · · This 푦 can’t ever appear in the table: Suppose 푓(푛) = 푦. This is impossible since we said 푦 differs from 푓(푛) in the 푛th place. This shows the set of languages really is uncountable. Note that our proof works no matter what the alleged bijection 푓 looks like. Whatever 푓 does, it pairs up each 푛 with one binary number. All I have to do is construct a 푦 that differs from every single 푓(푛). It’s constructed so it differs from every 푓(푖) somewhere. This shows that no function 푓 can work. S3 퐴푇푀: Turing-recognizable but not decidable Consider ℳ= {푀 : 푀 is a Turing machine} . (We fix the tape alphabet.) This is countable because there is a way to encode a Turing machines using a finite alphabet, with a finite length word. Now some words represent valid Turing machines. Now pair the first valid string representing a Turing machine with 1, the second valid string representing a Turing machine with 2, and so forth. This shows ℳ is countable. 63
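The diagonal construction itself is mechanical enough to write down. Here is a sketch using the table from the proof; the function name and the finite 4 × 4 table are of course just for illustration—the real argument concerns infinite sequences.

```python
def diagonal(listing, n):
    """listing(i, j) gives the j-th bit (0-indexed) of the i-th sequence on an alleged
    list of infinite 0/1 sequences.  Return the first n bits of a y the list misses:
    y differs from the i-th sequence in the i-th place."""
    return [1 - listing(i, i) for i in range(n)]

# The table from the proof: f(1) = 1011..., f(2) = 0000..., f(3) = 1111..., f(4) = 1010...
table = ["1011", "0000", "1111", "1010"]
listing = lambda i, j: int(table[i][j])
print(diagonal(listing, 4))    # [0, 1, 0, 1]  -- i.e. y = 0101..., as in the notes
```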
  • 64. Lecture 8 Notes on Theory of Computation The set of all languages is uncountable, but the set of Turing machines is countable. This implies the following fact. Theorem 8.9: There exists a language 퐿 such that 퐿 is not Turing-recognizable. Proof. If every language were Turing-recognizable, we could map every language to a Turing machine that recognizes it; this would give a correspondence between an uncountable set and a countable set, which is impossible. We're now ready to prove that 퐴푇푀 is undecidable. Theorem 8.10: 퐴푇푀 is undecidable. Proof. We'll proceed by contradiction using diagonalization. Assume for sake of contradiction that 퐴푇푀 is decidable. Then there exists a decider 퐻 such that 퐻(⟨푀,푤⟩) accepts when 푀 accepts 푤, and rejects when 푀 does not accept 푤. (Because 퐻 is a decider, it is guaranteed to halt.) Using this machine 퐻 we're going to make a machine 퐷 that does something utterly impossible. This will be our contradiction. Let 퐷 =“On input ⟨푀⟩, 1. Run 퐻 on ⟨푀, ⟨푀⟩⟩.5 퐻 answers the 퐴푇푀 problem, so it answers: does machine 푀 accept its own description?6 2. If 퐻 accepts, reject. If 퐻 rejects, accept.” Now for any Turing machine 푀, 퐷 accepts ⟨푀⟩ iff 푀 doesn't accept ⟨푀⟩. What happens if we feed ⟨퐷⟩ to 퐷? We get that 퐷 accepts ⟨퐷⟩ iff 퐷 doesn't accept ⟨퐷⟩. This is a contradiction! Let's look at what we've done. Say 퐴푇푀 were decidable; let 퐻 decide the 퐴푇푀 problem. We construct 퐷 that uses 퐻 as a subroutine and does the opposite of what a machine 푀 does when fed the description of 푀. Then when we feed ⟨퐷⟩ to 퐷, 퐷 is now completely confused! We get a contradiction, hence 퐴푇푀 can't be decidable. (If you're completely confused, there's more explanation in the next lecture.) This completes the picture in Proposition 8.1. 5Compare this to looking at the 푖th symbol of 푓(푖). 6This is a valid question, because we can encode the machine in a string, and the machine accepts strings. We can feed the code of a program to the program itself. For instance, we could have an optimizing compiler for C, written in C. Once we have the compiler, we might compile the compiler. 64
  • 65. Lecture 9 Notes on Theory of Computation S4 Showing a specific language is not recognizable So far we know that there are nonrecognizable languages, but we haven't given an explicit description of one. Now we'll show a specific language is not recognizable. For this the following lemma is useful. Lemma 8.11: 퐴 is decidable iff 퐴 is 푇-recognizable and ∁퐴 (the complement of 퐴) is 푇-recognizable (we say that 퐴 is co-T-recognizable). This immediately implies that ∁퐴푇푀 is not recognizable, since 퐴푇푀 is recognizable but not decidable. Proof. ( =⇒ ): Suppose 퐴 is decidable. Then 퐴 is T-recognizable. For the second part, if 퐴 is decidable, then ∁퐴 is decidable (decidable languages are closed under complementation: just run the decider and do the opposite. You're allowed to do the opposite because the decider is guaranteed to halt). Hence ∁퐴 is also 푇-recognizable. (⇐): Suppose 푅 recognizes 퐴 and 푆 recognizes ∁퐴. We construct a decider 푇 for 퐴. If we can do this, we're done. Construct 푇 as follows. 푇 =“on input 푤, 1. Run 푅 and 푆 on 푤 in parallel until one accepts. (We can't run 푅 and see what it does, and then run 푆, because 푅 and 푆 may not be deciders—푅 might run forever, but 푇 needs to be a decider.) This won't take forever: either 푅 or 푆 might run forever on a particular input, but at least one of them will accept eventually, because a string is either in 퐴 or in ∁퐴. 2. If 푅 accepts, then accept. If 푆 accepts (i.e. 푤 ∈ ∁퐴), then reject.” Lecture 9 Thu. 10/4/12 Last time we saw ∙ 퐴푇푀 is undecidable. ∙ ∁퐴푇푀 is T-unrecognizable. ∙ Diagonalization method We showed how the diagonalization method proved the reals were uncountable, and also applied the same idea to decidability. We'll give a quick recap, and highlight why the idea behind the two diagonalization arguments is the same. Theorem 9.1: R is uncountable. 65
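Here is a small sketch of the "run 푅 and 푆 in parallel" idea (my own illustration; ordinary Python generators stand in for the two recognizers, yielding None while still working and True when they accept, so a potentially non-halting recognizer can be advanced one step at a time):

```python
def run_in_parallel(R, S, w):
    """Alternate single steps of recognizer R (for A) and recognizer S
    (for the complement of A) until one accepts; accept iff R accepted."""
    r, s = R(w), S(w)
    while True:
        if next(r) is True:   # R accepted: w is in A
            return True
        if next(s) is True:   # S accepted: w is in the complement of A
            return False

# Toy example: A = strings containing "ab".  R loops forever on strings
# not in A; S loops forever on strings in A; together they decide A.
def R(w):
    for i in range(len(w) - 1):
        yield None
        if w[i:i+2] == "ab":
            yield True
    while True:
        yield None            # never accepts if "ab" does not occur

def S(w):
    for i in range(len(w) - 1):
        yield None
        if w[i:i+2] == "ab":
            while True:
                yield None    # never accepts if "ab" occurs
    yield True

print(run_in_parallel(R, S, "aab"))   # True
print(run_in_parallel(R, S, "bba"))   # False
```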
  • 66. Lecture 9 Notes on Theory of Computation Proof. Assume for contradiction that R is countable. Suppose we’re given a bijection. 푛 푓(푛) 1 2.71828 . . . 2 3.14159 . . . 3 0.11111 . . . 4 ... ... ... Take a number differing from 푓(푖) in 푖th place. For instance, take 푥 = 0.654 . . . where 6̸= 7, 5̸= 4, and 4̸= 1. Then 푥 can’t be on the list. For instance, it can’t be the 17th number because it’s different in 17th place. Thus 푓 fails to be a bijection. This is Cantor’s proof. We applied diagonalization to decidability problems. Theorem 9.2: 퐴푇푀 is undecidable. Proof. Assume 퐴 is decidable by a Turing machine 퐻. Use 퐻 to get TM 퐷, that does the following. 1. 퐷 on ⟨푀⟩ rejects if 푀 accepts ⟨푀⟩ and accepts if 푀 rejects (halt or loop) ⟨푀⟩. Then 퐷 accepts ⟨푀⟩ iff 푀 doesn’t accept ⟨푀⟩; hence 퐷 accepts ⟨퐷⟩ if 퐷 doesn’t accept ⟨퐷⟩, contradiction. This is the same idea as Cantor’s diagonalization argument! To see this, let’s make a table of how Turing machines respond to descriptions of Turing machines as inputs: ⟨푀1⟩ ⟨푀2⟩ ⟨푀3⟩ · · · ⟨퐷⟩ 푀1 accept reject reject · · · 푀2 reject reject reject 푀3 accept accept accept · · · ... ... . . . 퐷 rejects accept reject ? We programmed 퐷 so that it differed from what 푀푖 decided on ⟨푀푖⟩. However we get a contradiction because nothing can go in the box labeled “?”, hence 퐷 can’t be on the list of all Turing machines. Today we’ll show a lot of other problems are undecidable. There’s now a shortcut: by proving that 퐴푇푀 is undecidable, we will show a lot of other problems inherent 퐴푇푀’s undecidability. Then we won’t need the diagonalization argument again. Today we’ll use 1. Reducibility to show undecidability 2. Mapping reducibility to show T-unrecognizability. 66
  • 67. Lecture 9 Notes on Theory of Computation S1 Reducibility Let HALT푇푀 = {⟨푀,푤⟩ : TM 푀 halts on input 푤} . Theorem 9.3: HALTTM HALT푇푀 is undecidable. We can go back and use the diagaonlization method. But we’ll give a different technique. Proof. Suppose we can decide the halting problem by some Turing machine. We’re going to use that to decide 퐴푇푀, which we know is not decidable. Hence our assumption that HALT푇푀 is decidable must be false, i.e., the halting problem cannot be decided. Assume for sake of contradiction that TM 푅 decides HALT푇푀. We will construct a TM 푆 deciding 퐴TM. Let 푆 =“on input ⟨푀,푤⟩. 1. Use 푅 to test if 푀 halts on 푤. If not, reject. If yes, run 푀 on 푤 until it halts.” Why does this work? If 푀 doesn’t halt on 푤, then we know 푀 doesn’t accept, so reject. Suppose 푅 says 푀 does halt. We don’t know whether it accepts right off. Our algorithm says to run 푀 on 푤. We don’t have to worry about 푀 going forever, because 푅 has told us that 푀 halts! We’ll eventually come to the end, 푀 will accept or reject, and we can give our answer about 퐴TM. Thus we can use our HALTTM machine to decide 퐴TM. This is called reducing 퐴푇푀 to the HALT푇푀 problem. Reducibility: One way to show a problem is undecidable is by reducing it from a problem we already know is undecidable, such as 퐴TM. Concretely, to show a problem 푃1 is undecidable, suppose it had a decider. Use the decider for 푃1 to decide an undecidable problem (e.g. 퐴TM). This gives a contradic-tion. If some problem has already been solved, and we reduce a new problem to an old problem, then we’ve solved it too. For instance, consider the acceptance problem for DFA’s. We showed that 퐴퐷퐹퐴 is decidable (Theorem ??). Then it immediately follows that 퐴푁퐹퐴 is decidable, because we can reduce the 퐴푁퐹퐴 problem to a 퐴퐷퐹퐴 problem (Theorem ??). We converted the new problem into the solved problem. Definition 9.4: We say 퐴 is reducible to 퐵 if a solution to 퐵 gives a solution to 퐴. Here we used reducibility in a twisted way. If 퐴 is reducible to 퐵, we know that if we can solve 퐵 then we can solve 퐴. Hence if we can’t solve 퐴 then we can’t solve 퐵. 67
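The DFA/NFA example mentioned above can be made completely concrete. The sketch below (my own illustration, with a made-up toy NFA and no ε-moves, to keep it short) reduces an 퐴푁퐹퐴 question to an 퐴퐷퐹퐴 question by the subset construction: to decide whether an NFA accepts 푤, convert it to an equivalent DFA and hand the question to the (easy) DFA acceptance decider.

```python
from itertools import chain

def a_dfa(dfa, w):
    """Decider for A_DFA: does the DFA accept w?
    dfa = (states, start, accepting, delta), delta[(state, symbol)] = state."""
    states, start, accepting, delta = dfa
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accepting

def nfa_to_dfa(nfa, alphabet):
    """The reduction: subset construction from an NFA (no epsilon-moves).
    nfa = (start, accepting, delta) with delta[(state, symbol)] a set."""
    start, accepting, delta = nfa
    dstart = frozenset([start])
    dstates, ddelta, todo = {dstart}, {}, [dstart]
    while todo:
        S = todo.pop()
        for c in alphabet:
            T = frozenset(chain.from_iterable(delta.get((q, c), set()) for q in S))
            ddelta[(S, c)] = T
            if T not in dstates:
                dstates.add(T)
                todo.append(T)
    daccept = {S for S in dstates if S & accepting}
    return (dstates, dstart, daccept, ddelta)

# Toy NFA over {0,1} accepting strings whose second-to-last symbol is 1.
nfa = ("p", {"r"}, {("p", "0"): {"p"}, ("p", "1"): {"p", "q"},
                    ("q", "0"): {"r"}, ("q", "1"): {"r"}})

def a_nfa(nfa, w):
    return a_dfa(nfa_to_dfa(nfa, "01"), w)   # solve A_NFA via A_DFA

print(a_nfa(nfa, "0110"))   # True
print(a_nfa(nfa, "0101"))   # False
```

This is reducibility in the "positive" direction: the new problem (퐴푁퐹퐴) is converted into the already-solved problem (퐴퐷퐹퐴).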
  • 68. Lecture 9 Notes on Theory of Computation We used a HALT푇푀 machine to decide 퐴푇푀, so we reduced 퐴푇푀 to HALT푇푀. All “natural” problems which are undecidable can be shown to be undecidable by reduc-ing 퐴푇푀 to them or their complement. ! When trying to show problems are undecidable, reduce from 퐴푇푀 (not to 퐴푇푀).a aOn an undecidability problem on the exam, if you just write “reduction from 퐴푇푀” you will get partial credit. If you write “reduction to 퐴푇푀” you will get less credit. Let 퐸TM = {⟨푀⟩ : TM 푀 and 퐿(푀) = 휑} . Theorem 9.5: thm:etm 퐸푇푀 is undecidable. Proof. Use reduction from 퐴푇푀 to 퐸푇푀. Here’s the idea. Assume 푅 decides 퐸푇푀. We construct 푆 deciding 퐴푇푀. How do we do this? 푆 wants to decide whether a certain string is accepted; 푅 only tells whether the entire language is empty. We’re going to trick 푅 into giving me the answer we’re looking for. Instead of feeding the TM 푀 into 푅, we’re going to modify 푀. In the modified version of 푀 it’s going to have 푤 built in: 푀푤. When start up 푀푤 on any input it will ignore that input, and just run 푀 on 푤. It doesn’t matter what I feed it; it will run as if the input were 푤. The first thing it does is erase the input and writes 푤. 푀푤 will always to do the same thing: always accept or always reject, depending on what 푀 does to 푤. (The language is everything or nothing.) Now we feed 푀푤 into 푅. The only way the language can be nonempty is if 푀 accepts 푤. We’ve forced 푅 to give us the answer we’re looking for, i.e., we’ve converted acceptance into emptiness problem. Now we’re ready to write out the proof. 푆 =“On input ⟨푀,푤⟩, 1. Construct 푀푤 =“ignore input. (a) Run 푀 on 푤. (b) Accept if 푀 accepts.” 2. Run 푅 on ⟨푀푤⟩. 3. Give opposite answer. (푅 is a decider. If 푅 accepts ⟨푀푤⟩, then 푀푤’s language is empty, so 푀 did not accept 푤, so reject.) This machine decides 퐴TM, which is a contradiction. Hence our assumption was incorrect; 퐸TM is undecidable. 68
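The "build 푤 into the machine" trick is just partial application. A minimal sketch (illustration only; ordinary Python functions stand in for Turing machines):

```python
def make_M_w(M, w):
    """Given (code for) a machine M and a fixed string w, return a machine
    M_w that ignores its own input z and just runs M on w.  L(M_w) is
    everything if M accepts w, and empty otherwise."""
    def M_w(z):
        return M(w)          # the input z is ignored
    return M_w

# Toy example: M accepts strings of even length.
M = lambda x: len(x) % 2 == 0
M_w = make_M_w(M, "abba")    # M accepts "abba", so M_w accepts everything
print(M_w("anything"), M_w(""))   # True True
```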
  • 69. Lecture 9 Notes on Theory of Computation S2 Mapping reducibility We gave a general notion of reducibility, but not a specific definition. In this section we introduce a specific method called mapping reducibility. Definition 9.6: Let 퐴 and 퐵 be languages. We say that 퐴 is mapping reducible to 퐵, and write7 퐴 ≤푚 퐵 if there is a computable function 푓 : Σ* → Σ* and for all 푤, 푤 ∈ 퐴 iff 푓(푤) ∈ 퐵. We say 푓 : Σ* → Σ* is computable if some TM 퐹 halts with 푓(푤) on the tape when started on input 푤. Why is mapping reducibility useful? Suppose we have a decider for 퐵, and we have 푓, computed by a decider. We can use the two deciders together to decide whether a string is in 퐴! Proposition 9.7: pr:map-reduce If 퐴 ≤푚 퐵 and 퐵 is decidable (recognizable) so is 퐴. Proof. Say 푅 decides 퐵. Let 푆 =“On 푤, 1. Compute 푓(푤). 2. Accept if 푓(푤) ∈ 퐵. Reject otherwise.” For 퐵 recognizable, just remove the last line “reject if 푓(푤)̸∈ 퐵.” (We don’t know that 푅 halts.) Think of 푓 as a “transformer”: it transforms a problem in 퐴 to a problem in 퐵. If 퐴 is reducible to 퐵, and 퐴 is not decidable, then neither is 퐵. This will also help us prove a problem is non-T-recognizable. Let’s recast our previous results in the language of mapping reducibility. In the proof of Theorem 9.5 we showed that 퐴푇푀 ≤푚 퐸푇푀. We converted a problem about 퐴푇푀 to a problem about 퐸푇푀. Given ⟨푀,푤⟩ ∈ 퐴푇푀, let 푓 map it to ⟨푀푤⟩. We have ⟨푀,푤⟩ ∈ 퐴푇푀 iff 푀푤̸∈ 퐸푇푀. 7Think of the notation as saying 퐴 is “easier” than 퐵 69
  • 70. Lecture 9 Notes on Theory of Computation A useful fact is that 퐴 ≤푚 퐵 ⇐⇒ ∁퐴 ≤푚 ∁퐵, by using the same 푓. We make one more observation, then prove another theorem. We actually have the following strengthened version of Theorem 9.5. Theorem 9.8: 퐸푇푀 is not recognizable. Proof. We showed 퐴푇푀 ≤푚 ∁퐸푇푀, so ∁퐴푇푀 ≤푚 퐸푇푀. Since ∁퐴푇푀 is not recognizable, 퐸푇푀 is not recognizable. We'll now use mapping reducibility to give an example of a language such that neither it nor its complement is recognizable. We will prove this by reduction from 퐴푇푀. Theorem 9.9: 퐸푄푇푀 and ∁퐸푄푇푀 are both 푇-unrecognizable. Recall that the equality problem is: given 2 Turing machines, we want to know whether they recognize the same language. Proof. We show that 1. 퐴푇푀 ≤푚 ∁퐸푄푇푀, or equivalently, ∁퐴푇푀 ≤푚 퐸푄푇푀; this handles 퐸푄푇푀. We have to give a function 푓 : ⟨푀,푤⟩ ↦→ ⟨푀1,푀2⟩. We let 푀2 be the machine that always rejects. Let 푀1 = 푀푤, the machine that simulates 푀 on 푤. If 푀 accepts 푤 then 푀1 accepts everything, and if 푀 does not accept 푤 then 푀1 accepts nothing; 푀2 rejects everything either way. So ⟨푀,푤⟩ ∈ 퐴푇푀 iff ⟨푀1,푀2⟩ ∈ ∁퐸푄푇푀. 2. 퐴푇푀 ≤푚 퐸푄푇푀, or equivalently, ∁퐴푇푀 ≤푚 ∁퐸푄푇푀; this handles ∁퐸푄푇푀. 70
  • 71. Lecture 10 Notes on Theory of Computation We have to give a function 푓 : ⟨푀,푤⟩↦→ ⟨푀1,푀2⟩ . We let 푀2 be the machine that always accepts. Again let 푀1 = 푀푤, the machine that simulates 푀 on 푤. In the remaining 2 minutes, we’ll look at a cool fact that we’ll continue next time. Lots of undecidable problems appear throughout math have nothing to do with Turing machines. We’ll give the simplest example. Let’s define dominoes as pairs of strings of 푎’s and 푏’s, such as ⌉︀–푎푏푎 푎푏 ™, –푎푎 푎푏™, – ™, . . .« Given a set of dominoes, can we construct a match, which is an ordering of dominoes such that the string along the top is the same as the string along the bottom? One little point: each domino can be reused as many times as we want. This means we have a potentially unlimited set of dominoes. Is it possible to construct a match? This is an undecidable problem! And it has nothing to do with automata. But next time we will show we can reduce 퐴푇푀 to this problem; therefore it’s undecidable. Lecture 10 Thu. 10/11/12 Midterm Thu. 10/25 in walker (up through next week’s material. Everything on computabil-ity theory but not complexity theory.) Homework due next week. Handout: sample problems for midterm Last time we talked about ∙ reducibility ∙ mapping reducibility Today we will talk about ∙ Post Correspondence Problem ∙ LBA’s ∙ Computation history method 71
  • 72. Lecture 10 Notes on Theory of Computation We have been proving undecidability. First we proved 퐴푇푀 is undecidable by diagonalization. Next, by reducing 퐴푇푀 to another problem, we show that if the other problem were decidable then 퐴푇푀 would be too; hence the other problem must also be undecidable. Today we'll look at a fancier undecidable problem. It is a prototype for undecidable problems that are not superficially related to computability theory. All proofs for these problems use the method we'll introduce today, the computation history method. Ours is a toy problem, with no independent interest. But it is nice for illustrating the method, and it is relatively clean. Even the solution to Hilbert's tenth problem uses the computation history method (though there are many complications involved). S1 Post Correspondence Problem Given a finite collection of dominoes 푃 = {[푢1/푣1], [푢2/푣2], . . . , [푢푘/푣푘]}, where each domino [푢/푣] has the string 푢 on top and the string 푣 on the bottom, a match is a sequence of dominoes from 푃 (repetitions allowed) whose top strings and bottom strings spell out the same word: 푢_{푖1} · · · 푢_{푖ℓ} = 푣_{푖1} · · · 푣_{푖ℓ}. The question is: Is there a match in 푃? For example, if our collection of dominoes is 푃 = {[푎푎/푎푏푎], [푎푏/푎푏푎], [푏푎/푎푎], [푎푏푎푏/푏]}, then we do have a match: the sequence [푎푏/푎푏푎][푎푎/푎푏푎][푏푎/푎푎][푎푎/푎푏푎][푎푏푎푏/푏] reads 푎푏푎푎푏푎푎푎푎푏푎푏 along the top and along the bottom. Formally, define PCP = {⟨푃⟩ : 푃 has a match}. We will show PCP is undecidable. (Note it is Turing-recognizable: for a given arrangement it's easy to check whether it's a match, so just enumerate all possible arrangements and accept if we find a match.) This is an old problem, shown undecidable by Post in 1946.8 Let's modify the PCP so that the match has to start with a designated starting domino (to see how to remove this restriction, see the book). Theorem 10.1: PCP is undecidable. 8Don't confuse the Post Correspondence Problem with probabilistically checkable proofs, also abbreviated PCP. 72
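As the parenthetical remark says, PCP is Turing-recognizable: enumerate candidate sequences of dominoes and accept if one is a match. A brute-force sketch (my own illustration; without a length cap it may run forever on instances with no match, exactly as a recognizer may):

```python
from itertools import product, count

def is_match(seq, dominoes):
    top = "".join(dominoes[i][0] for i in seq)
    bottom = "".join(dominoes[i][1] for i in seq)
    return top == bottom

def pcp_recognizer(dominoes, max_len=None):
    """Search index sequences of length 1, 2, 3, ... (repetitions allowed)
    and return the first match found.  With max_len=None this loops
    forever when no match exists -- it only *recognizes* PCP."""
    lengths = count(1) if max_len is None else range(1, max_len + 1)
    for L in lengths:
        for seq in product(range(len(dominoes)), repeat=L):
            if is_match(seq, dominoes):
                return [dominoes[i] for i in seq]
    return None

# The example from the lecture: a match of length 5 exists.
P = [("aa", "aba"), ("ab", "aba"), ("ba", "aa"), ("abab", "b")]
print(pcp_recognizer(P, max_len=6))
```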
  • 73. Lecture 10 Notes on Theory of Computation The proof has two ideas. Each takes some time to introduce. Instead of doing them at once (they intertwine), we’ll defer the proof and prove a different theorem that uses only one of the ideas. Then we prove this theorem and use both ideas. To introduce the first idea we’ll go back to a problem in computation. S2 Computation Histories and Linearly Bounded Automata Definition 10.2: A linearly bounded automaton (LBA) is a modified Turing machine, where the tape is only as long as the input string.9 The head doesn’t move off the left or right hand sides. The machine limited in how much memory it has: if you have an amount of tape, the amount of memory is linearly bounded in terms of the size of the input (you might have a constant better memory because you’re allowed a larger tape alphabet, but you can’t for instance have 푛2 memory). Now let the acceptance and empty problems be 퐴퐿퐵퐴 = {⟨푀,푤⟩ : LBA 푀 accepts 푤} 퐸퐿퐵퐴 = {⟨푀⟩ : LBA 푀 and 퐿(푀) = 휑} Even though 퐴푇푀 was undecidable, 퐴퐿퐵퐴 is decidable. Theorem 10.3: ALBA 퐴퐿퐵퐴 is decidable. This is a dramatic change in what we can do computationally! The key difference that makes 퐴퐿퐵퐴 decidable is that linearly bounded automata have finitely many configurations (see below). As a side remark, 퐴퐿퐵퐴 is not decidable by LBA’s. In the theorem, by decidable we mean it’s decidable by ordinary Turing machines. It is decidable but only by using a lot of memory. Proof. Define a configuration to be a total snapshot of a machine at a given time: (푞, 푡, 푝) where 푞 is the state, 푡 is the tape contents, and 푝 is the head position. For an input size, the number of configurations of a LBA is finite. If we run the LBA for certain amount of time 푇, then it has to repeat a configuration. If it has halted by time 푇, then we know the answer. If the machine hasn’t halted, it’s in a loop and will run forever. The reject. (We don’t even have to remember configurations.) For a string 푤 of length 푛, the number of configurations of length 푛 is |푄| · |Γ|푛 · 푛. 9The LBA doesn’t have a limited tape. (Then it would be finite automaton.) The tape is allowed to be enough to grow just enough to fit the input, but it can’t grow any further. 73
  • 74. Lecture 10 Notes on Theory of Computation We now write down the TM that decides 퐴퐿퐵퐴. “On input ⟨푀,푤⟩, 1. Compute |푄| · |Γ|푛 · 푛. 2. Run 푀 on 푤 for that many steps. 3. Accept if accepted. Reject if not yet accepted after that many steps. Note that to write the number |푄||Γ|푛푛 down requires on the order of 푛 ln 푛 length tape, so intuitively 퐴퐿퐵퐴 is not decidable by LBA’s. The same diagonalization method can prove that 퐴퐿퐵퐴 is not decidable by LBA’s. In general, we can’t have a class of automata which decide whether automatons of that class accept. In contrast with 퐴퐿퐵퐴, 퐸퐿퐵퐴 is still undecidable. Theorem 10.4: ELBA 퐸퐿퐵퐴 is undecidable. We prove this in a totally different way. Before, to prove 퐸푇푀 is undecidable (Theo-rem 9.5), we showed 퐴푇푀 reduces to 퐸푇푀. We also have 퐴퐿퐵퐴 reduces to 퐸퐿퐵퐴, but doesn’t tell us anything because 퐴퐿퐵퐴 is decidable! Instead we reduce 퐴푇푀 to 퐸퐿퐵퐴. This is not obvious! Assume 푅 decides 퐸퐿퐵퐴. We’ll construct 푆 deciding 퐴푇푀. (This is our standard reduction framework.) This is hard because we can’t feed Turing machines into 퐸퐿퐵퐴: it only accepts LBA’s as input, not general Turing machines. We use the idea of computation history. Definition 10.5: Define an accepting computation history of a TM 푇 on input 푤 to be 퐶1,퐶2, . . . ,퐶accept where 퐶1 is the start configuration, each 퐶푖 leads to 퐶푖+1, and 퐶accept is in an accepting state. 10 If a Turing machine does not accept its input, it does not have an accepting computation history. An accepting configuration history exists iff the machine accepts. It’s convenient to have a format for writing this down (this will come up later in com-plexity theory). Write down (푞, 푡, 푝) as 푡1푞푡2. 10The computation history stores a record of all motions the machine goes through, just as a debugger stores a snapshot of what all the registers contain at each moment. 74
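Here is a minimal sketch of that decider (illustration only; the LBA is given as a transition dictionary I made up for the example, the head is simply clamped at the tape ends, and the bound |푄| · |Γ|^푛 · 푛 counts every possible configuration):

```python
def decide_A_LBA(lba, w):
    """Decide whether an LBA accepts w by running it for at most
    |Q| * |Gamma|**n * n steps -- the number of distinct configurations.
    If it has not halted by then, a configuration repeated, so it loops.
    lba = (Q, Gamma, delta, q0, q_acc, q_rej), delta[(q, a)] = (r, b, move)."""
    Q, Gamma, delta, q0, q_acc, q_rej = lba
    assert w, "this sketch assumes a nonempty input"
    tape, pos, q = list(w), 0, q0
    bound = len(Q) * len(Gamma) ** len(w) * len(w)
    for _ in range(bound + 1):
        if q == q_acc:
            return True
        if q == q_rej:
            return False
        r, b, move = delta[(q, tape[pos])]
        tape[pos], q = b, r
        pos = min(max(pos + (1 if move == "R" else -1), 0), len(tape) - 1)
    return False   # ran longer than the number of configurations: looping

# Toy LBA over {0, 1} that accepts exactly the inputs starting with 1.
Q, Gamma = {"s", "acc", "rej"}, {"0", "1"}
delta = {("s", "1"): ("acc", "1", "R"), ("s", "0"): ("rej", "0", "R")}
print(decide_A_LBA((Q, Gamma, delta, "s", "acc", "rej"), "10"))   # True
print(decide_A_LBA((Q, Gamma, delta, "s", "acc", "rej"), "01"))   # False
```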
  • 75. Lecture 10 Notes on Theory of Computation This means: split the tape into 2 parts, the part before the head is 푡1, the part after the head is 푡2 and 푞 points to the first symbol in 푡2. All I’m doing here is indicating the position of the head by inserting a symbol representing the state in between the two parts. Write the computation history as a sequence of strings like this separated by pound signs. 퐶1#퐶2#· · ·#퐶accept. Here 퐶1 is represented by 푞0푤1 · · ·푤푛. Proof of Theorem 10.4. Let 푆 =“on input ⟨푇,푤⟩, 1. Construct LBA 푀푇,푤=“On input 푧, (a) test if 푧 is an accepting computation history for 푇 on 푤. (b) Accept if yes. Reject if not. Note 푀푇,푤 does not simulate 푇. It simply checks if the input is a valid computation of 푇, in the form 퐶1#퐶2#· · ·#퐶accept. Why is this a LBA? It doesn’t actually simulate 푇, it just checks the computation; this doesn’t require running off the tape. What does 퐶1 need to look like? It must look like 푞0푤1 · · ·푤푛. How do we check 퐶2? This is a delicate point. See whether 퐶1 updated correctly to 퐶2. Zigzag back and forth between 퐶1 and 퐶2 to check that everything follows legally. If anything is wrong, reject. It can put little markers on the input. It repeats all the way to 퐶accept, then check that 퐶accept is in an accept state. Now the only string our LBA 푀푇,푤 could possibly accept, by design, is an accepting computation history. The point is that checking a computation is a lot easier than doing it yourself; a LBA is enough to check a TM’s computation. Now we have 0 or 1 string is in푀푇,푤. If 푇 does not accept, then 퐿(푀푇,푤) is empty. If 푇 ac-cepts, there is exactly one accepting computation history, namely the correct 퐶1#· · ·#퐶accept. Each configuration forces the next all the way to the accepting configuration. (We’ve built the Turing machine based on the particular Turing machine 푇 and string 푤.) Hence 푀푇,푤 is empty if and only if 푇 does not accept 푤. We’ve reduced 퐴푇푀 to 퐸퐿퐵퐴. This proves 퐸퐿퐵퐴 is undecidable. Note the computational history method is especially useful to show the undecidability of a problem that has to do with weaker models of computation. 75
  • 76. Lecture 10 Notes on Theory of Computation S3 Proof of undecidability of PCP Proof. (This is slightly rushed; see the book for a complete treatment.) The idea is to construct a collection of dominoes where a match has to be an accepting computation history. (For simplicity, we'll deal with the modified PCP problem where we designate a certain domino to be the starting domino.) Given 푇, 푤, we construct a PCP instance 푃푇,푤 where a match corresponds to an accepting computation history. We construct the dominoes in order to force any match to simulate a computation history. ∙ Let the start domino be [# / #푞0푤1 . . .푤푛#]: the top is just #, and the bottom is the starting configuration surrounded by #'s. ∙ If in 푇 we have 훿(푞, 푎) = (푟, 푏, 푅), put in the domino [푞푎 / 푏푟]. Similarly we have a domino for left transitions (omitted). ∙ For every tape symbol 푎 ∈ Γ, put in the copying domino [푎/푎], and also put in [#/#]. Consider a concrete example, 푤 = 011 and 훿(푞0, 0) = (푞5, 2, 푅). We have the domino [푞00 / 2푞5]. (The construction has one simple idea really: the transition dominoes, like [푞푎 / 푏푟], force each configuration to lead to the next configuration.) We start with the start domino, so the top reads # and the bottom reads #푞0011#; the bottom is ahead, and the only way for the top to catch up is to spell out the next configuration. The transition domino [푞00 / 2푞5] pushes the match forward by one more domino: the top now reads #푞00 and the bottom reads #푞0011#2푞5. Now we have to copy everything else, using the dominoes [1/1], [0/0], [2/2], [#/#]: the top reads #푞0011# and the bottom reads #푞0011#2푞511#. At the end, the computation history is done, but the match isn't, because the bottom is still ahead. We add dominoes [푞accept푐 / 푞accept] and [푐푞accept / 푞accept] for every tape symbol 푐, and [푞accept## / #]. These dominoes "eat" the tape one symbol at a time around the accept state, and putting in the last domino finishes off the match. The start domino is a technicality. We've reduced 퐴푇푀 to PCP. Therefore, PCP is undecidable. 76
  • 77. Lecture 11 Notes on Theory of Computation A Turing machine accepts an input if and only if it has an accepting computation history for that input. Thus the problem of whether 푇 accepts 푤 can be formulated as: does 푇 have an accepting computation history for 푤? This formulation is more concrete, and it is much easier to check whether a compu-tation history is correct, then ask whether 푇 accepts 푤. To show a problem that isn’t related to computability theory is undecidable, find a way to simulate/encode an undecidable problem (such as 퐴푇푀) with that problem. It is useful to encode computation histories. Lecture 11 Tue. 10/16/12 Last time we talked about ∙ the computation history method ∙ 퐸퐿퐵퐴, PCP undecidable. Today we’ll do the following. ∙ Review above, ALL푃퐷퐴 undecidable ∙ Recursion theorem ∙ Introduction to logic S0 Homework Enumerate collection of deciders hit every decidable language. Impossible. Argue reduction works as claimed. 4. computation history 5-6. today. 6. model for particular sentence. if haven’t seen logic before, read section. Hint built into problem if you read it carefully. S1 Computation history method The state doesn’t give us all the information about a Turing machine. Recall that a config-uration of a Turing machine 푀 consists of the state, tape contents, and head position. We have a convenient representation, where 푞 is written at the head position. 77
  • 78. Lecture 11 Notes on Theory of Computation A computation history of 푀 on 푤 is 퐶1, . . . ,퐶halt a sequence of configurations 푀 enters. It’s much easier to check these than to simulate a Turing machine outright. It is possible to check computation history with many kinds of automata or combinatorial objects. We started with Turing machine 푀 and 푤, and the problem of whether 푀 accepts 푤. We found we can convert this into an instance of PCP where the only possible match corresponds to an accepting computation history. The undecidability of Hilbert’s tenth problem is shown using the same idea. Here one has to construct polynomial in several variables (originally it was 13 variables). One variable plays the role of the input to the polynomial. The only way for the polynomial to have an integral solution is for the assignment to 푥 to be an accepting computational history suitably encoded in an integer. The other variables are helpers, to make sure the polynomial evaluates to 0 exactly when 푥 is an accepting computational history. Polynomials are a rather restricted comput model, so the polynomial is rather painful to present. (It would take an entire semester.) Let ALL푃퐷퐴 = {⟨푃⟩ : 푃 a 푃퐷퐴 and 퐿(푃) = Σ*} . It is the problem: does a pushdown automaton accept all strings? We use the computational history method to show the following. Theorem 11.1: ALLPDA ALL푃퐷퐴 is undecidable. Proof. We reduce 퐴푇푀 to ALL푃퐷퐴. We take ⟨푀,푤⟩ and convert it to a pushdown automaton 푃푀,푤, such that if can tell whether 푃푀,푤 accepts all inputs, we can tell whether 푀 accept 푤. We construct 푃푀,푤 by having it operate on computation histories. However, instead of having 푃푀,푤 accept an accepting computation history, we have it accept every string except for this string. It is the sanitation engineer that accepts all the bad strings, junk, garbage. If 푀 doesn’t accept 푤, then there is no accepting history, so everything is junk, and 푃푀,푤 accepts everything. If 푀 accepts 푤, then 푃푀,푤 accepts everything except one string. We feed 푃푀,푤 into a machine for ALL푃퐷퐴 to decide 퐴푇푀. 78
  • 79. Lecture 11 Notes on Theory of Computation How can we make a PDA to accept all the junk? It will use nondeterminism. It checks the configuration history to see if it ∙ fails to start correctly, ∙ fails to end correctly, ∙ or fails to go from one step to the next correctly. 푃푀,푤 has the starting configuration built in, so it can check whether the history starts correctly. If not, accept. One branch looks at the last configuration; if that is not right, accept. Now 푃푀,푤 scans through the configuration history (nondeterministically). At a place where it guesses there may be an error, it pushes 퐶푖 onto the stack, then pop 퐶푖 off as it compares 퐶푖 to 퐶푖+1, seeing if everything matches except that stuff near the head is updated correctly. However, 퐶푖 comes out in the reverse order that it was put in. The trick is to write every other configuration in reverse. 퐶푅 2 , 퐶푅 4 . If 푃푀,푤 finds a bug, then it accepts. Remark: Why don’t we do the original thing, accept only the accepting computation history and nothing else? But that would only prove 퐸푃퐷퐴 is undecidable. And in fact, we can’t do that because 퐸푃퐷퐴 is decidable! We have to check each config-uration legally follows the next. For instance, if we want to check 퐶3 legally yields 퐶4, we have a problem because we’ve already read 퐶3 when comparing it to 퐶2. We can’t push 퐶3 and match it with 퐶4. This is an unfixable problem. S2 Recursion theorem The recursion theorem is an amazing theorem that gives a fitting end to the computability part of the course. It does some things that seem counter-intuitive. Can we make a Turing machine (or any reasonable computation model, such as a com-puter program), such that when we turn it on, it prints out its own description? I.e., can we make a self-reproducing Turing machine? Can we write a piece of code which outputs an exact copy of itself? We might argue as follows: We’d have to take a copy of the program and put it inside itself, and then another copy inside that copy, and so forth. We’d have to have an infinite program, with copies of itself down forever. But in fact we can make such a program, and it is useful in many cases. This theorem answers one paradox of life: Living things reproduce—make copies of themselves. Does that mean each living thing had its descendants inside, descendants of descendants inside those, down forever? No. Today, thoughts like that are so absurd they don’t even bear consideration. We don’t need to do that. Let’s make the paradox more concrete. Can we make a machine to build other machines? We can make a factory (suppose it’s fully automated) that makes cars. The factory is more 79
  • 80. Lecture 11 Notes on Theory of Computation complicated than the cars: It is at least as complicated because it has instructions for building the cars. It’s more complicated because it has all the machinery (robots and so forth). What if we wanted to make a factory that builds factories, identical copies of itself? It has robots which assemble a factory; it seems the factory would have to be more complicated than itself! But the Recursion Theorem says our intuition is false. We can make a factory-producing factory. There are practical situations where a program would produce copies of itself. Generally these are malicious programs (depend on whose side you’re on). This is one way to make a computer virus—the virus obtains an exact copy of itself, transmits it to the victim computer, installs virus, and continues spreading). One way to transmit virus is the Recursion Theorem. (The other way is to use the special nature of machine to find the address of own executable and get the code.) Theorem 11.2 (Recursion Theorem): recursion We can make a Turing machine “SELF” where on blank input, SELF outputs ⟨SELF⟩. We can do this in any programming language! The proof relies on the following lemma. Lemma 11.3: There is a computable function 푞 : Σ* → Σ* such that for every 푥, 푞(푥) = ⟨푃푥⟩ where 푃푥 is a Turing machine that prints 푥 (on any input). Moreover, we can determine that Turing machine from 푥 in a computable way. Proof. Let 푃푥 =“print 푥.” (Note the function is called 푞 for quote, because in LISP, this function is represented by sending 푥 to “푥.) Proof. The TM SELF will have 2 phases 퐴 and 퐵 . Control passes from 퐴 to 퐵 after 퐴 is done. 80
  • 81. Lecture 11 Notes on Theory of Computation 퐴 is very simple: 퐴 = 푃⟨퐵⟩. Now we have to say what 퐵 is. Why don’t we do the same thing to get the 퐴 part? Try to set 퐵 = 푃⟨퐴⟩. This is not possible. 퐴 is much bigger than 퐵. 퐴 is a 퐵-factory. You can’t take the description of 퐴 and stuff it into 퐵; the same circular reasoning got us into trouble in first place. We don’t print out print out 퐴 by having a copy of 퐴 inside 퐵. So how does 퐵 find what 퐴 is? It computes 푞 of the string on the tape. 푞 of that string is 퐴! Let 퐵=“compute 푞(tape) and prepend to tape.” We can do this in any programming language. Let’s do it in English. Print out this sentence. If you execute this code, out comes a copy of itself. It tells you as executer to print out copy of itself. However, it cheats, because “this” is a pointer to self. In general, there is no pointer refering to the code. We show how to get the same effect, in software, without self-reference. It achieves the same goal without “this” refering to itself. Here is the legit version. Print out two copies of the following, the second one in quotes. “Print out two copies of the following, the second one in quotes.” If you execute this command, you write the same thing. The A part is below, the B part is above. A is slightly bigger than B by virtue of quotes. Why is the Recursion Theorem more than just a curiosity? Besides being philosophically interesting, it has applications in computability theory and logic. The Recursion Theorem in full generality says that we can obtain a complete description and process that copy. Sometimes this is very helpful. Theorem 11.4 (Full recursion theorem): For any Turing machine 푇, there is a TM 푅 where 푅(푥) behaves the same as 푇(⟨푅, 푥⟩). Think of 푅 as a compiler which computes its own description. Proof. Figure 7. 푅 has 3 pieces now, 퐴, 퐵, and 푇, where 퐴 = 푃⟨퐵푇⟩ and 퐵 is as before. Moral of story: We can use “get own description” in Turing machine code. Why would we want to do that? We give a new proof that 퐴푇푀 is undecidable. Proof. Assume for sake of contradiction that 퐻 decides 퐴푇푀. Consider the following Turing machine: Construct TM 푅 =“on input 푥, 81
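The "print out two copies of the following, the second one in quotes" recipe can be carried out verbatim in Python. In the sketch below (my own rendering, not from the notes), the string literal plays the role of the 퐴 part, which just lays down a description of 퐵; the print line is the 퐵 part: it applies the quoting function 푞 (here Python's repr, via the !r format spec) to reconstruct 퐴 from what is on the "tape" and outputs 퐴 followed by 퐵.

```python
# A self-reproducing program in the spirit of the SELF machine.
# Line 1 is the A part (the data), line 2 is the B part (the code).
s = 's = {!r}\nprint(s.format(s))'
print(s.format(s))
```

Running it prints its own source text exactly; feeding the output back into Python prints it again, with no copy of the program hidden "inside" itself.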
  • 82. Lecture 11 Notes on Theory of Computation 1. Get own description ⟨푅⟩. 2. Run 푅 on ⟨푅, 푥⟩ to see if 푅 accepts 푥. 3. Do the opposite of what 푅 did on ⟨푅, 푥⟩ This is a contradiction because 푅 accepts iff 푅 says 푅 doesn’t accept 푥. In a sense, the recursion method is standing in for the diagonalization argument here. Let’s give another application to something we haven’t proved yet. Let MIN = {⟨푀⟩ : 푀 is a TM with the shortest description among all equivalent TM’s} . Theorem 11.5: MIN MIN is not Turing-recognizable. Proof. Recognizable means enumerable. Assume by way of contradiction that 퐸 enumerates MIN. Make 푅 =“on 푥, 1. Get ⟨푅⟩. 2. Run 퐸 until some machine 푀 appears where ⟨푀⟩ is longer than 푅. 3. Simulate 푀 on 푥. Our 푅 will simulate the smallest machine in MIN larger than 푅, which contradicts the definition of MIN. As a summary, here are a list of problems we’ve proven to be decidable, undecidable, and unrecognizable. (Keep in mind CFG=PDA for the purposes of computation.) ∙ Decidable: 퐴퐷퐹퐴 (Theorem 7.1), 퐸퐷퐹퐴 (Theorem 7.3), 퐸푄퐷퐹퐴 (Theorem 7.4), 퐴퐶퐹퐺 (Theorem 7.5), 퐸푃퐷퐴 (exercise), 퐴퐿퐵퐴 (Theorem 10.3). ∙ Undecidable: 퐴푇푀 (Theorem 8.10), HALT푇푀 (Theorem 9.3), ALL푃퐷퐴 (Theorem 11.1), 퐸푄퐶퐹퐺 (Theorem 7.10), 퐸퐿퐵퐴 (Theorem 10.4), PCP (Theorem 10.1). (Note: I haven’t checked whether these are recognizable.) ∙ Unrecognizable: 퐴푇푀, 퐸푇푀 (Theorem 9.8), 퐸푄푇푀, 퐸푄푇푀 (Theorem 9.9), MIN (Theorem 11.5). 82
  • 83. Lecture 12 Notes on Theory of Computation S3 Logic We'll finish today with a quick tour of logic. This is something that takes weeks in a logic course; we'll do it in 10 minutes. Logic is the math of math itself. Logic formalizes what we mean by mathematical statements. For instance, 휑 : ∀푥∃푦 [푦 < 푥]. We all believe we can formalize math and define what quantifiers mean. (This is several weeks in a logic course.) This statement has meaning, but the meaning depends on what universe the quantifiers are quantifying over. For the natural numbers with the usual interpretation, it is false (nothing is below 0). If we instead interpret it over R or Z, then it is true. We have to give a universe for the quantifiers to range over and define all relation symbols such as "<". Ordinary boolean logic allows us to combine statements. We get a meaning for a sentence, and it is either true or false in a given model. Definition 11.6: A model is a universe together with an interpretation of all the relation symbols. A model of a statement 휑 is a model in which 휑 is true. Let the universe be N with the relations + and ×. Let Th(N,+,×) = {all true sentences for this model}. Skipping over details, you can imagine what we mean: some sentences are true, others are not. Considering the sentences as strings, is this set decidable? Gödel and others showed it is not decidable. We can write down sentences to describe what Turing machines do; + and × are expressive enough to describe Turing machines. There are two notions here, truth and provability. What does it mean to give a proof of a true statement? We mean that from the axioms, and simple rules of implication, there is a chain of reasoning that gets to that statement. Consider the famous axioms called the Peano axioms. Can you prove all true things from the Peano axioms? You cannot! You can make a recognizer for all provable things: search through all possible proofs until you find a proof of the statement in question. If every sentence were either provable or had a provable negation, you could search in parallel for a proof of the statement or of its negation, and this would give a decider for truth. Such a decider doesn't exist, so some true statement must be unprovable. Can we exhibit a statement which is unprovable? Try: "This statement is unprovable." If the sentence were false it would be provable; so it must be true, hence unprovable. This statement is true and unprovable. However, we've actually cheated by using self-reference; one can fix this using the recursion theorem. 83
  • 84. Lecture 12 Notes on Theory of Computation Lecture 12 Thu. 10/18/12 Now that we’ve wrapped up the first half on computability theory, we have a midterm next Thursday, on the 3rd floor of Walker, at the usual time 11:00–12:30. It is open book (the book for this course only)/handouts/notes and covers everything through the last lecture. The test will contain about 4 problems. Last time we talked about ∙ the recursion theorem, and ∙ an introduction to logic. Today we’ll talk about ∙ an introduction to complexity theory, ∙ TIME (푡(푛)), and ∙ 푃. We’re shifting gears to talk about complexity theory. Today we will introduce the subject and set up the basic model and definitions. S1 Introduction to complexity theory We have to go slow at the beginning to get the framework clear. We’ll begin by a simple example. In computability theory a typical question is whether a problem is decidable or not. As a research area that was mostly finished in the 50’s. Now we’ll restrict our attention to decidable languages. The question now becomes how many time or resources do we need to decide? This has been an ongoing area of research since the 60’s. Let 퐴 = ⌋︀0푘1푘 : 푘 ≥ 0{︀. This is a decidable language (in fact, context-free). We want to know how hard it is to see whether a string is in 퐴. We could measure hardness in terms of number of steps, but the number of steps depend on the input. For longer strings it may take more time, and within strings of same length, some strings may take longer than others. For instance, if the string starts with 1, we can reject immediately. The picture is a bit messy, so to simplify, we’ll only consider (other do diff things), we’ll only consider how much time is necessary as a function of the length 푛 of the input. Among all inputs of given length, we’ll try to determine what the worst case is, that is, we consider the worst case complexity. Summarizing, we consider how the number of Turing machine steps depends on the input length 푛, and look at the worst case. 84
  • 85. Lecture 12 Notes on Theory of Computation Recall that no matter whether we considered single tape, multi-tape, or nondeterministic Turing machines, what is computable remains invariant. This is not true for complexity theory: The picture will change depending on what model you use. We’re not seeking to develop a theory of one-tape Turing machines. Turing machines are a stand-in for computation. We want to try to understand computation. What can do in principle, in a reasonable amount of time? We don’t want to just focus on Turing machines. The fact that complexity depends on the model is a problem. Which should we pick? But as we will see, although it depends on model, it doesn’t depend too much, and we can recover useful theorems. 1.1 An example Example 12.1: We analyze how much space it takes to decide 퐴 = ⌋︀0푘1푘 : 푘 ≥ 0{︀. Let 푀1=“(1-tape Turing machine) 1. Scan the input to test whether 푤 ∈ 0*1*. We’ll make a pass over the input just to see it’s of the right form: no 1’s before 0’s. 2. Go back to the beginning. Mark off 0 (turn it into another symbol), mark off a 1, then go back to mark off the next 0, then the next 1, and so forth. Continue until we run out of 0’s, 1’s, or both. If we run out of 0’s or 1’s first, then reject. If they run out at the same time, accept. (We needed to spell this out to know how much time the machine is using.) In summary, repeat until all symbols are crossed off: (a) Pass over input, cross off 0’s and 1’s. (b) If either finishes before other, reject. 3. If all symbols are crossed off, accept. For this course, we won’t care about the constant factors in the time used: 10푛2 steps and 20푛2 are equivalent for us. (We could have the machine do some work on the reverse, but we won’t bother.) How much time does 푀1 take? 85
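Here is the same algorithm written out as a rough sketch (my own illustration; a Python list stands in for the single tape, and the step counter only tracks the zigzag passes, not the exact constant):

```python
def M1(w):
    """Decide {0^k 1^k : k >= 0} by the crossing-off algorithm,
    returning (accept?, rough number of head moves of the 1-tape TM)."""
    tape, steps = list(w), 0

    # Step 1: one pass to check the input has the form 0*1*.
    steps += 2 * len(w)                       # scan right, return left
    if "10" in w:
        return False, steps

    # Step 2: repeatedly cross off the leftmost 0 and the leftmost 1.
    while True:
        zeros = [i for i, c in enumerate(tape) if c == "0"]
        ones = [i for i, c in enumerate(tape) if c == "1"]
        if not zeros and not ones:
            return True, steps
        if not zeros or not ones:
            return False, steps
        tape[zeros[0]] = tape[ones[0]] = "x"  # cross them off
        steps += 2 * len(w)                   # one zigzag over the tape

for n in (4, 8, 16):
    w = "0" * n + "1" * n
    print(len(w), M1(w))   # the step count grows roughly like n^2
```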
  • 86. Lecture 12 Notes on Theory of Computation Theorem 12.2: A 1-tape Turing machine can decide 퐴 using 푐푛2 steps for all inputs of length 푛 and some fixed constant 푐. We specify number of steps up to constants, so it’s convenient to have notation for this. We’ll refer to 푐푛2 as 푂(푛2). This means at most a constant times 푛2, where the constant is independent of 푛. The definition is spelled out in the book; see the definition there. Proof. Construct 푀1 is above. How long does each step take? 1. Scan input to test 푤 ∈ 0*1*. This takes 푂(푛) time: 푛 steps to go forward and 푛 steps to go back. 2. Repeat until all symbols are crossed off: We need at most 푛 2 steps. (a) Pass over input, cross off 0’s and 1’s. This takes 푂(푛) time. (b) If either finishes before other, reject. 3. If all crossed off, accept. Thus 푀1 takes time 푂(푛) + 푛 2 푂(푛) = 푂(푛2). (the nice thing about 푂 notation is that we only have to look at the dominant term when adding. We can throw away smaller terms because we can absorb them into the constant in the dominant term.) Is this possible, or can we do better? Let’s still stick to one-tape Turing machine. This is not the best algorithm out there; we can find a better one. Here is a suggestion. Zigzagging over the input costs us a lot more time. What if we cross off more 0’s and 1’s on a pass? We can cross of two 0’s and 1’s, so this takes half as much time. But we ignore the constant factor, so for our purposes this isn’t really an improvement. We don’t ignore these improvements not because they’re unimportant. In the real world, it’s good to save factor of 2, that’s good. However, we choose to ignore these questions because we are looking at a different realm: questions that don’t depend on constant factors, or even larger variations. By ignoring some things, other things come out more visibly. For example, everything reduces to quarks, but it would not benefit biologists to study everything on the level of quarks. 1.2 An improvement We can improve running time to 푂(푛 log 푛); this is significant from our standpoint. Instead of crossing out a fixed number of 0’s and 1’s, we’ll cross off every other 0 and every other 1 (Fig. 2), remember the even/odd parity of the number of 0’s and 1’s, and makes sure 86
  • 87. Lecture 12 Notes on Theory of Computation the parities agree at every pass. After every step we go to the beginning and repeat, but we ignore the crossed off symbols. We always check the parity at each step. If they ever disagree, which can only happen if the number of 0’s and 1’s are different, we reject. If the parity is the same at each step, then there must be same number of 0’s as 1’s. This is because the parities are giving the representation of number of symbols in binary (details omitted). This is a more efficient algorithm but less obvious than the original algorithm. It looks like room there is room for improvement, because we only need 푛 steps to read the input. Could the algorithm do 푂(푛)? We can do 푂(푛) with 2 tapes as follows. Read across the 0’s, and copy them onto the 2nd tape. Then read 1’s on the 1st tape and match the 0’s against the 1’s on the second tape. We don’t need to zigzag, and we can finish in 2푛 steps. In fact there is a theorem that we cannot decide 퐴 in 푂(푛) steps with a 1-tape Turing machine. If we can do a problem with 푂(푛) steps on a 1-tape Turing machine, then it is a regular language! (This is not obvious.) If the machine can only use order 푛 time, then the only thing it can do is regular languages: the ability to write doesn’t help. In fact, anything that takes time 표(푛 log 푛) must be a regular language as well. S2 Time Complexity: formal definition Which model should we pick to see how much time it takes? In computability theory we had model independence, the Church-Turing Thesis. Any model that we pick captures the same class of languages. Unfortunately, in complexity theory we have model dependence. Fortu-nately, for reasonable models, the dependence is not very big. Some interesting questions don’t depend (much) on the choice of “reasonable” models. In the meantime, we’ll fix a particular model, set things up using that model, and show things don’t change too much if we choose a different model. For convenience we’ll choose the same model we had all along, a 1-tape Turing machine, then show that it doesn’t change too much for other models. Definition 12.3: For 푡 : N → N, we say a Turing machine 푀 runs in time 푡(푛) if for all inputs 푤 of length 푛, 푀 on 푤 halts in at most 푡(푛) steps. For instance, we say 푀 runs in 푛2 time if 푀 always halts in 푛2 steps when we give it an input of length 푛. We now define the class of languages we can do in a certain number of steps. Definition 12.4: Define TIME(푡(푛)) := {퐴 : some TM decides 퐴 and runs in 푂(푡(푛)) times} . This is called a time complexity class. 87
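Returning for a moment to the 푂(푛 log 푛) algorithm described at the start of this subsection: the arithmetic behind "check the parities, then cross off every other symbol" can be sketched as follows (my own illustration; counters stand in for the tape, since halving a count is exactly what crossing off every other occurrence does, and the successive parities are the binary digits of the counts).

```python
def decide_0k1k(w):
    """The O(n log n)-style algorithm: check the form 0*1*, then repeatedly
    compare the parities of the remaining 0s and 1s and cross off every
    other one; reject as soon as the parities disagree."""
    if "10" in w:
        return False
    zeros, ones = w.count("0"), w.count("1")
    while zeros > 0 or ones > 0:
        if zeros % 2 != ones % 2:   # parities disagree on this pass
            return False
        zeros //= 2                 # cross off every other 0
        ones //= 2                  # cross off every other 1
    return True

print(decide_0k1k("000111"), decide_0k1k("00011"))   # True False
```

The parities agree on every pass exactly when the two counts have the same binary representation, i.e. are equal, which is the claim made above.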
  • 88. Lecture 12 Notes on Theory of Computation We defined the time complexity using 1-tape turing machines. For a 2-tape TM, what is in the classes could change. Once we draw the picture, we can ask: is there a language we can do in 푛2 time but can’t do in 푛 log 푛 time? We’ll look at questions like this later on, and be able to answer of them. 2.1 Polynomial equivalence Even though the theory depends on model, it doesn’t depend too much. This comes from the following statement. Theorem 12.5: Let 푡(푛) ≥ 푛. Then every multi-tape TM 푀 that runs in 푡(푛) time has an equivalent 1-tape Turing machine 푆 that runs in order 푂(푡2(푛)) time. In other words, converting a multi-tape TM to a single tape TM can only blow up the amount of time by squaring; the single tape TM can polynomially simulate the multi-tape TM. You might think this is bad, but for a computer, this is not too bad. It could be worse (exponential). Proof. We analyze the standard simulation (from the proof of Theorem 6.7). The conversion only ends up squaring the amount of time used. Indeed, it took the tapes and wrote them down next to each other on the tape. Every time the multitape machine 푀 did one step, the single-tape machine 푆 had to do a lot of steps, and then do an update. One step of 푀 might have 푆 pass over entire portion of tape. Each tape can be at most 푡(푛) symbols long, because there are only 푡(푛) steps where it can write symbols. There are a constant number of tapes. Thus one pass at most 푂(푡(푛)) steps. The machine has make at most 푡(푛) passes. Thus the order is 푂(푡(푛)2). 88
  • 89. Lecture 12 Notes on Theory of Computation Here is an informal definition. Definition 12.6: Two computational models are polynomially equivalent if each can simulate the other with at most polynomial increase (푡(푛) can go to 푂(푡(푛)푘) for some 푘). All reasonable deterministic models of computation turn out to be polynomially equiva-lent. This is the complexity analogue of the Church-Turing Thesis. Axiom 12.7 (Church-Turing Thesis): church-turing-complexity All reasonable determinis-tic models are polynomially equivalent. This includes one-tape TM’s, multi-tape TM’s, 2-dimensional TM’s, and random access machines (which are closer to a real computer) which can write an index and grab the memory cell at that location (the address). A real computer is a messy thing to discuss mathematically. It doesn’t have infinite amount of memory. From some points of view, it is like a finite automaton. The most useful way to abstractify it is as a random access machine(RAM) or a parallel RAM (PRAM). If the machine only has polynomial parallism, then it is also polynomially equivalent. The analogous question with nondeterministic TM’s is hard. No one knows a polynomial simulation. It is a famous open problem whether we convert a nondeterministic TM to a deterministic TM with a polynomial increase in time. 2.2 P The complexity version of the Church-Turing Thesis 12.7 tells us the following. All reasonable deterministic models are polynomially equivalent. Thus, if we ignore polynomial differences, we can recover a complexity class independent of the model. Definition 12.8: Let 푃 = ⋃︁푘 TIME(푛푘) = TIME(poly(푛)). In other words, 푃 consists of all languages solvable in 푂(푛푘) time for some 푘. Why is 푃 important? 1. The class 푃 is invariant under choice of reasonable deterministic model. Time classes change when we go from 1-tape to multi-tape TM’s. But by using polynomial equivalence— taking the union over all 푂(푛푘)—the class 푃 is not going to change from model to model. We get the same class 푃. Mathematically speaking, this invariance is natural. 푃 not a class to do with Turing machines. It’s to do with the nature of computation. 89
  • 90. Lecture 12 Notes on Theory of Computation 2. Polynomial time computability roughly corresponds to practical computability. It is a good litmus test: a good way of capturing what it means for a problem to be solvable practically. Of course, practicality depends on context. There is a continuum between practical and unpractical algorithms, but polynomial computability is a good dividing line. One feature of 푃 makes it mathematically nice, and one feature tell you something practical to real world. A math notion with both these aspects is very good. This is why 푃 is such an influential notion in complexity theory and throughout math. 2.3 Examples Let’s look at something we can solve in polynomial time. Let PATH = {⟨퐺, 푠, 푡⟩ : 퐺 is a directed graph with a path from 푠 to 푡} . Theorem 12.9: PATH∈ 푃. The way to prove something like this is to give an algorithm that runs in polynomial time. Proof. “One input ⟨퐺, 푠, 푡⟩, 1. Mark a node 푠. Repeat until nothing new is marked: ∙ Mark any node pointed to by previously a marked node. 2. Accept if 푡 is marked and reject if not. We start at 푠, mark everything we can get to in 1 step by marking nodes adjacent to 푠; then we mark nodes adjacent to those... This is a simple breadth-first search, not the best, but it runs in polynomial time. We will often omit time analyses unless it is not obvious. If each step runs in polynomial time, and all repetitions involve a polynomial number of repeats, then the problem is solvable in 푃. If we look at a similar problem, however, everything changes. 90
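A direct rendering of this marking algorithm (illustration only; the directed graph is an adjacency-list dictionary):

```python
def path(G, s, t):
    """Decide PATH: is there a directed path from s to t in G?
    G maps each node to the list of nodes it points to."""
    marked = {s}
    changed = True
    while changed:                      # repeat until nothing new is marked
        changed = False
        for u in list(marked):
            for v in G.get(u, []):
                if v not in marked:
                    marked.add(v)
                    changed = True
    return t in marked

G = {1: [2], 2: [3], 3: [1], 4: [3]}
print(path(G, 1, 3), path(G, 1, 4))     # True False
```

Each round of marking takes time polynomial in the size of the graph, and there are at most as many rounds as nodes, so the whole procedure runs in polynomial time.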
  • 91. Lecture 14 Notes on Theory of Computation Definition 12.10: A Hamiltonian path goes through every node exactly once. Is HAMPATH ∈ 푃? The algorithm above doesn’t answer this question. It’s a decidable problem because we can try every possible path, but there can be an exponential number of paths (in terms of the size of the graph). The answer is not known! This is a very famous unsolved problem. Lecture 13 Tue. 10/23/12 Absent because of sickness. Lecture 14 Tue. 10/30/12 Last time we talked about ∙ NTIME(푡(푛)) ∙ NP Today we’ll talk about NP-completeness. S1 P vs. NP Recall that P is the class of problems (languages) where we can test membership quickly (in polynomial time in the size of the input). NP is the class of problems where we can verify membership quickly. We verify via a “short certificate” or a “short proof.” The verifier would be convinced that the string is in the language. Hamiltonian path is a good example: membership is easily verified by giving the path. Nonmembership is trickier: No one knows whether there is a way to exhibit short proof for non-existence of a Hamiltonian path. The complement of HAMPATH is not known to be in NP. We can always flip the answer in P. However, we can’t do so easily in NP: the acceptance structure can’t be complemented easily in nondeterministic Turing machine. ! The complement of a language in P is in P (coP=P). However, the complement of a language in NP may not be in NP, because a NTM can’t easily do the opposite of what another NTM does. The big problem in theoretical computer science is P versus NP. Most people believe P̸=NP: there is a larger class of languages that can be verified in polynomial time than can 91
  • 92. Lecture 14 Notes on Theory of Computation be solved in polynomial time. The other alternative is that 푃 = 푁푃. We’ve seen that SAT, HAMPATH, CLIQUE, etc. are in NP. This problem was posed in the early 1970’s, though it had precursors in the literature 10–15 years prior. There is an amazing letter Kurt G¨odel sent to John von Neumann in 1955–1956 about the problem, using different language: Do we have to look for proofs by brute force or is there some quicker way? The problem has spread outside the computer science community to the math community. P vs. NP is one of Millenium problems, put together by a committee in 2000 as the analogue to Hilbert’s problems in 1900. Langton Clay put in prize money for a solution: one million dollars. S2 Polynomial reducibility Early progress on the P vs. NP problem gave the amazing theorem. Theorem 14.1: thm:sat-np SAT∈P iff P=NP. This would be important in a proof of the 푃 ?= 푁푃 problem. If might seem that you have to find an algorithm for all NP problems. If you believe P=NP, all you have to do is find an algorithm for SAT. On the flip side, to show 푃̸= 푁푃, all you have to do is pick one problem and show it’s in NP but not in P. But you might pick the wrong problem, for instance compositeness (primality testing), which is actually in 푃. This theorem tells you you can just focus on SAT. This is an application of the theorem to understanding the P vs. NP problem. If you think of problems in P as being easy, and problems outside being hard, and if you assume that P̸=NP, then this theorem tells you that SAT is not easy. This gives evidence that SAT does not have a polynomial time algorithm. Enough philosophical musings; let’s do math. We’ll work our way towards the proof of Theorem 14.1 today and finish next time. We use a notion that we’ve seen before—reducibility. Definition 14.2: 퐴 is polynomial time mapping reducible to 퐵 (퐴 ≤푃 퐵) if 퐴 ≤푚 퐵 (퐴 is mapping reducible to 퐵) and the reduction is computable in polynomial time. 92
  • 93. Lecture 14 Notes on Theory of Computation In other words, the thing that does the mapping can be done quickly. Not only can you translate 퐴-questions to 퐵-questions, you can do so by a polynomial time algorithm. Just as we proved Proposition 9.7, we can show the following. Theorem 14.3: If 퐴 ≤푃 퐵 and 퐵 ∈ 푃, then 퐴 ∈ 푃. Let’s do an example. 2.1 An example: 3SAT reduces to CLIQUE Example 14.4: 3SAT≤푃CLIQUE. Recall that SAT = {⟨휑⟩ : 휑 is a satisfiable Boolean formula} . In other words, it is the set of statements 휑 that is true, under some truth assignment to its variables. It’s convenient to consider Boolean formulas in a special form, 3CNF (conjunctive normal form). This means the formula looks something like (푥 ∨ 푦 ∨ 푧) ∧ (푥 ∨ 푤 ∨ 푦) ∧ · · · ∧ (푢 ∨ 푤 ∨ 푥). It is written as a bunch of clauses and’d together, and each each clause is an “or” of 3 literals (variables or negated variables). That’s all we’re allowed to do. The “3” means that we have 3 variables in each clause. Thus we see this is a special case of the SAT problem, which we call 3SAT. 3SAT = {⟨휑⟩ : 휑 is a satisfiable 3CNF formula} . We’ll focus on the 3SAT problem and the CLIQUE problem. The CLIQUE problem is very different. Given an undirected graph with nodes and edges, a 푘-clique is 푘 vertices all connected to one another. Define CLIQUE = {⟨퐺, 푘⟩ : 퐺 contains a 푘–clique} . I’m going to give a way to convert problem about whether or not a formula is in the 3SAT language to whether a graph contains a 푘-clique. This is surprising! We’ll see that such conversions (reductions) are not just an interesting curiosity, but very important. 93
  • 94. Lecture 14 Notes on Theory of Computation We’ll do a proof by example. Suppose 휑 = (푥1 ∨ 푥2 ∨ 푥3) ∧ (푥2 ∨ 푥3 ∨ 푥4) ∧ · · · ∧ (· · · ). A satisfying assignment is an assignment that makes the whole thing true. Because 휑 is made up of clauses and’d together, each clause has to be true. What does it mean for each clause to be true? We have to make at least one of the literals true. We have to pick out one literal and make it true. Thinking of the problem this way will be helpful to understanding the reduction to the CLIQUE problem. We will now convert 휑 to ⟨퐺, 푘⟩. We will have one node for each literal variable. It’s helpful to think of each node as being labeled by the associated literal. Now we put in the edges. We put in all possible edges with two exceptions. 1. Don’t put edges inside a clause (internal to one of the triples associated to a clause). Thus edges can only go from one clause to another clause. 2. Never join two nodes that are associated to contradictory labels. All other edges will be there. As long as two literals are not contradictory in different clauses, they are connected by an edge. Let 푘 be the number of clauses. We just have to show that this is actually a reduction. That this can be done in polyno-mial time is clear: by looking at the formula, we can easily write down the graph. We have to show two directions. Now is where the interesting stuff happens; we’ll un-derstand what’s going on; why did we draw this strange graph? 1. 휑 ∈3SAT =⇒ ⟨퐺, 푘⟩ ∈CLIQUE. Suppose 휑 is 3-satisfiable; we have to exhibit a 푘-clique. Each clause has at least one true literal. Pick out a true literal in each clause. Maybe the assignment makes 푥2 true. Obviously it cannot make 푥2 true; maybe it makes 푥3 true. Now pick out the associated nodes. I claim those nodes form a clique. I have to show that every pair of nodes I’ve picked are connected by an edge. We put in all possible edge with 2 exceptions. We have to show we don’t run into any of the exceptions. 1. We only pick 1 node from each clause. 94
  • 95. Lecture 14 Notes on Theory of Computation 2. We never pick two nodes with contradictory labels. We can't pick two nodes with contradictory labels because they can't both be true; we could not have picked both of them as the true literal in their clauses. One will be true and the other false in any assignment. We started with the certificate from 3SAT and produced a certificate for CLIQUE. 2. 휑 ∈3SAT⇐= ⟨퐺, 푘⟩ ∈CLIQUE. Now we start with a 푘-clique. We reverse the argument. Look at the nodes we picked out as being in the same clique. Every node has to be from a different clause, because nodes in the same clause are not connected (1). Since there are 푘 clauses, we took one node from each clause. Take the nodes in the clique and let the corresponding literal be true. For instance, if 푥2 and ¬푥3 are in the clique, make 푥2 true and ¬푥3 true, i.e., 푥3 false. If a variable is unassigned, assign it any which way. How do we know we didn't run into trouble? We won't assign a variable true and its complement true, because contradictory nodes can't be in the same clique (2). This gives at least one true literal in each clause. We're done, but we had to show both directions. This means that if we find a polynomial time algorithm for CLIQUE, then we can solve 3SAT quickly. We can convert 3SAT into a special CLIQUE problem. If you can solve general CLIQUE problems, then you can solve these special CLIQUE problems too, using our magical polynomial time algorithm for CLIQUE. Let's lay out our game plan. We'll show next lecture that every NP problem can be reduced to SAT. We'll show SAT ≤푃 3SAT ≤푃 CLIQUE,HAMPATH, . . . (we just did 3SAT≤푃CLIQUE). What we did for 1 problem we'll have to do for infinitely many problems. We'll use the Boolean logic of SAT to simulate a Turing machine. This is similar to the proof of undecidability of PCP: we use combinatorial structure to simulate a Turing machine. Note that polynomial time reducibility is preserved by composition (exercise). S3 NP completeness We have a special name for problems that every NP problem can reduce to. Definition 14.5: A language 퐵 is NP-complete if 1. 퐵 ∈NP. 2. For every 퐴 ∈NP, 퐴 ≤푃 퐵 (퐴 is reducible to 퐵 in polynomial time). 95
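Aside (my sketch, not from the lecture): to make the 3SAT ≤푃 CLIQUE construction above concrete, here is a small Python sketch. It assumes a 3CNF formula is given as a list of clauses, each clause a list of 3 literals, with a literal encoded as ('x1', True) for 푥1 and ('x1', False) for ¬푥1; the encoding and variable names are my own choices for illustration.

from itertools import combinations

def sat3_to_clique(clauses):
    """Build (G, k) from a 3CNF formula, following the reduction above.
    Nodes are (clause_index, literal); edges connect every pair of nodes
    except (1) nodes in the same clause and (2) contradictory literals."""
    nodes = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    edges = set()
    for (i, (v1, s1)), (j, (v2, s2)) in combinations(nodes, 2):
        if i == j:                      # exception 1: same clause
            continue
        if v1 == v2 and s1 != s2:       # exception 2: contradictory labels
            continue
        edges.add(((i, (v1, s1)), (j, (v2, s2))))
    k = len(clauses)                    # k = number of clauses
    return nodes, edges, k

# Example: (x1 ∨ x2 ∨ ¬x3) ∧ (¬x2 ∨ x3 ∨ x4)
phi = [[('x1', True), ('x2', True), ('x3', False)],
       [('x2', False), ('x3', True), ('x4', True)]]
nodes, edges, k = sat3_to_clique(phi)

The output graph has a 푘-clique exactly when 휑 is satisfiable, by the two directions argued above, and the construction clearly takes polynomial time.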
  • 96. Lecture 14 Notes on Theory of Computation If we can reduce everything else in NP to 퐵, then 퐵 is an NP-complete problem. Condition 2 by itself is called NP-hard. Rephrasing, 퐵 is NP-complete if 퐵 ∈NP and is NP-hard. (A problem that is just NP-hard may be worse than NP.) The picture is that NP-complete problems are at the "top" of the NP problems: Proving the non-existence of reductions within NP is tricky business. A common question is to give an example of an NP problem which is not NP-complete. But if 푃 = 푁푃, then all problems in NP are reducible to each other, essentially. If you can prove some NP problem is not reducible to another NP problem, then you have a good result: you've just shown 푃 ≠ 푁푃. We're not going to show that in class. Otherwise, I'd be off celebrating somewhere in the Caribbean. There is a special analogy between P and decidability and NP and recognizability. One key element is not in place, though. We don't know whether the classes are different. Still, there are a lot of similarities. As we will show, everything in NP is reducible to SAT, so SAT is NP-complete (Cook-Levin Theorem). Theorem 14.6 (Cook-Levin): SAT is NP-complete. (This is equivalent to Theorem 14.1.) By composition of reductions, if SAT reduces to some other problem, that problem is also NP-hard. This will show that 3SAT, CLIQUE, HAMPATH, etc. are also NP-complete, provided that we have the reductions. Because 3SAT is NP-complete, to show another problem 퐵 is NP-complete, you just have to do two things: ∙ Show 퐵 is in NP. ∙ Give a polynomial-time reduction from 3SAT to 퐵: 3SAT ≤푃 퐵. When we're doing reductions, we're trying to find a way to simulate Boolean variables with structures in the target problems. 96
  • 97. Lecture 14 Notes on Theory of Computation To reduce from one 3SAT to another language, design features or structures that have the same kind of feature as a variable or clause in 3SAT. (Think of this as “learning to program” using CLIQUE, HAMPATH, etc. languages/) These features are called gadgets, substructures in the target language which operate in the same way a variable or clause do. The best way to understand this is through example. Theorem 14.7: 3SAT≤푃HAMPATH. Proof. Start with a 3CNF, say 휑 = (푥1 ∨ 푥2 ∨ 푥3) ∧ (푥2 ∨ 푥3 ∨ 푥4) · · · . We construct ⟨퐺, 푠, 푡⟩. We build a graph that has a Hamiltonian path in it exactly when 휑 is satisfiable. (fig 6). We put in a bunch of nodes; all edges are directed downwards or horizontally. The dia-mond structures will be associated to the variables, there will be one structure corresponding to each variable (a bit different from last time, where we had one structure for each appear-ance of a literal). The bottom node of a diamond is the same as the top node of the next. For each diamond we have horizontal connections. We have a hamiltonian path right now. For each diamond we could zig-zag or zag-zig independently through each of the variable gadgets; we pick up all the nodes, and there’s nothing else we could do. Zig-zag is going to correspond to “true” and zag-zig is going to correspond to “false.” The Hamiltonian path is going to correspond to the truth assignment. An important feature we haven’t done yes is the clauses. We have to have an assignment which makes one literal in each clause true. We let each clause gadget be a node. A Hamiltonian path has to go through each. If 푥1 ∈ 퐶1 (clause 1), then we put in arrows like in the diagram, allow a detour to visit 푥1 if we’re zig-zagging (going from left to right in a diamond) ,but not if we’re zag-zigging (going from right to left in a diamond): (figure from textbook) 97
  • 98. Lecture 14 Notes on Theory of Computation This corresponds for 푥1 being a positive literal. How do we implement the fact that 푥3 ∈ 퐶1? We allow the detour only to go in the right-to-left direction. 98
  • 99. Lecture 15 Notes on Theory of Computation We leave a space before putting the next node, to give an opportunity to make several detours. Suppose an assignment has 2 true literals in some clause 퐶1. But that gives 2 detours to 퐶1. We can only visit 퐶1 once. Is that a problem? No. A detour is an option—it’s not a broken-road detour, it’s a rest-stop type detour, if you don’t have to go, don’t. We have to prove that if we have a satisfying assignment, then we have a Hamiltonian path. We zig-zag or zag-zig according to assignment, visit all detours. For the converse, if the path is nice (consisting of zig-zag and zag-zigs), then we get a satisfying assignment, and we’re done. If the path is not nice, i.e., it goes to a different diamond from one it came from at some stage, then the path cannot be Hamiltonian because of the spacer nodes. Lecture 15 Thu. 11/1/12 Last time we talked about ∙ NP-completeness ∙ 3SAT≤푃CLIQUE ∙ 3SAT≤푃HAMPATH Today we’ll prove the Cook-Levin Theorem: SAT is NP-complete. We have (Every NP problem) ≤푃 SAT ≤푃 3SAT ≤푃 CLIQUE, HAMPATH, many others We’ll show the first inequality today and the second inequality in recitation. We know every problem on the right is NP-complete. (We don’t necessarily have to start with SAT or 3SAT. Sometimes it’s easier to study another NP-complete problem. For instance, to show UHAMPATH, the undirected version of Hamiltonian path, is NP-complete, we can just reduce the directed to the undirected version, HAMPATH≤푃UHAMPATH.) 99
  • 100. Lecture 15 Notes on Theory of Computation If we assume P ≠ NP, and if we show a problem is NP-complete, then it cannot be solved in polynomial time. Thus being NP-complete is very strong evidence for intractability: the problem is too hard to solve in practice. What is remarkable (and not well understood) is that typical problems in NP, with few exceptions, turn out to be in P or NP-complete. This is mysterious and currently has no theoretical basis. Thus, given a problem, researchers often spend part of the time showing it's solvable in polynomial time and part of the time showing it's NP-complete. This works well most of the time. There are some problems, though, that seem to be outside of P, but we don't know how to prove they are NP-complete. For instance, the problem of testing if 2 graphs are isomorphic (whether they are the same graph but labeled differently) is in NP: the short certificate is the mapping of the vertices. No one knows whether the graph isomorphism problem is solvable in polynomial time, nor has anyone shown it is NP-complete. It's one of few problems that seem to be hovering in between. Another example is factoring integers. Define coNP = {퐴 : 퐴 is the complement of some language in NP}. We have P ⊆ NP ∩ coNP. (P is closed under complement, so P = coP.) It's generally believed that NP-complete problems cannot be in coNP, because otherwise NP=coNP. There are problems in the intersection; for instance, factoring is a problem in NP∩coNP. Naively it's a function, but we can turn it into a decision problem. Think of numbers as written in binary, and call it the bit factoring problem: BIT-Factoring = {⟨푥, 푖⟩ : the 푖th bit of the largest prime factor of 푥 is 1} . BIT-Factoring is in NP because nondeterministically we can guess the prime factorization of 푥 and check that the largest prime factor has a 1 in the 푖th place. The complement is also an NP-problem: the 푖th bit is a 0, and we can check that in exactly the same way. If BIT-Factoring is in 푃, then we can factor numbers in polynomial time. We believe that factoring is not in P, so this problem seems to not be in P. This suggests the problem is not NP-complete. S0 Homework The first four questions are clear. For one of them keep in mind dynamic programming as a technique. (Context-free languages are testable in polynomial time. It is in a sense the most basic polynomial time algorithm.) 100
  • 101. Lecture 15 Notes on Theory of Computation Problem 5 asks you to show that under the assumption P=NP, there exists an algorithm that operates in polynomial time which not only tests whether a formula is satisfiable, but produces the satisfying assignment. A tempting argument is that there is a nondeterministic algorithm which finds the assignment, and because P=NP, there is a deterministic algorithm which finds the assignment. But it is conceivable that the polynomial time algorithm for satisfiability operates not by finding the assignment, but only by saying whether it is satisfiable. You have to show that if the program operates in some other way, you can turn it into an algorithm to find the assignment. In order to produce a satisfying assignment, you will end up testing whether multiple formulas are satisfiable. Out of the decisions from the tests, you can assemble the satisfying assignment to the original formula. How can you at least get a little bit of information about the satisfying assignment? Problem 6 says that minimizing NFA's cannot be done in polynomial time unless P=NP. By contrast, it is known that minimization for DFA's can be done in polynomial time. S1 Cook-Levin Theorem Theorem 15.1 (Cook-Levin, Theorem 14.6 again): SAT is NP-complete. Proof. 1. SAT∈NP: This is easy: guess a satisfying assignment. 2. Let 퐴 ∈NP. We have to show 퐴 ≤푃SAT. Assume we have an NTM 푀 for 퐴 so that 푀 runs in time 푛^푘. The idea is as follows. We have to give a polynomial time reduction 푓 : 퐴 →SAT. It will take a string 푤 and convert it to some formula 휑푤. The function 푓 maps a membership question in 퐴 to a membership question in SAT; we will have 푤 ∈ 퐴 exactly when 휑푤 is satisfiable. 푓 : 퐴 → SAT, 푤 ↦→ 휑푤, with 푤 ∈ 퐴 iff 휑푤 is satisfiable. Think of 휑푤 as saying whether 푀 accepts 푤. The construction of 휑푤 is as follows. It will be in 4 pieces and'd together: 휑푤 = 휑cell ∧ 휑start ∧ 휑move ∧ 휑accept. We'll describe the computation of 푀 on 푤 in a certain way. Define a tableaux for 푀 on 푤 to be a table where the rows are configurations of 푀 on 푤. Write down the tape with the state symbol to the left of the symbol the head is looking at (cf. the PCP proof 10.1). Each row is a configuration. The sequence of rows you 101
  • 102. Lecture 15 Notes on Theory of Computation get is a computation history. Remember 푀 is nondeterministic, so there may be multiple computation histories. If 푀 accepts 푤, there is an accepting branch, and we can write down an accepting computation history with the starting configuration at the top and the accepting configuration at the bottom. Does there exist such a tableaux? If 푀 does not accept 푤 there is no accepting com-putation history so there is no tableaux. The question we’re trying to answer is whether a tableaux exists. We’re trying to make a formula which says a tableaux exists. Is there some way of setting cells to symbols such that the whole thing is a legitimate tableaux? We make indicator variables for each cell: think of each cell as having a bunch of little lights; one light for each possible setting the cell could be: 푎, 푏, 푞0, etc. If the light for 푎 is on, then the cell has an 푎 in it. The variables of 휑푤 are 푥푖푗휎 where 1 ≤ 푖, 푗 ≤ 푛푘 (we’re assuming the machine runs for 푛푘 steps; the most number of cells it could use is 푛푘)11 and 휎 ∈ Γ∪푄 (휎 is in the tape alphabet or 휎 is a state). There are |Γ ∪ 푄|푛2푘 variables 푥푖푗휎, which is polynomial in 푛. 11Technically we may need 푐푛푘 just to cover 푛 = 1 but this is a minor issue. 102
  • 103. Lecture 15 Notes on Theory of Computation 휑cell: In order for the variables to correspond to a valid tableaux, exactly 1 symbol per cell has to get assigned. If we turn on several lights for some cell, this would correspond to multiple symbols, and we don't want that. We have to make sure we're turning on exactly one light; exactly one variable becomes true for each (푖, 푗). This is the first piece 휑cell. 휑cell says that there is exactly one symbol per cell or, equivalently, exactly one 푥푖푗휎 is true for each 푖, 푗: 휑cell = ⋀_{1≤푖,푗≤푛^푘} [ (⋁_{휎∈Γ∪푄} 푥푖푗휎) ∧ (⋀_{휎≠휏} (¬푥푖푗휎 ∨ ¬푥푖푗휏)) ]. The first part ensures that one of these lights is "on," and the second ensures that at most one of the lights is on (for every pair of distinct lights, at least one of them is off). Together they say exactly 1 variable is true. The assignment has to correspond to one symbol in each cell of the tableaux. 휑start: Now we want to say that in the very first row, the variables are set to be the start configuration. 휑start says that the start configuration is 푞0푤1푤2 · · ·푤푛␣ · · ·␣, padded with blanks ␣ out to length 푛^푘. Hence we let 휑start = 푥1,1,푞0 ∧ 푥1,2,푤1 ∧ 푥1,3,푤2 ∧ · · · ∧ 푥1,푛+1,푤푛 ∧ 푥1,푛+2,␣ ∧ · · · ∧ 푥1,푛^푘,␣. 휑accept: Now let's do 휑accept. The very last row is an accepting configuration; namely the machine is in the accept state. (What if the machine stops sometime earlier? We assume that the rules of the machine say it stays in the accepting state for the "pseudo-steps" afterward.) We let 휑accept = ⋁_{1≤푗≤푛^푘} 푥_{푛^푘,푗,푞accept}. 휑move: Finally, we need to say the machine moves correctly. To do this out in full gory detail is a bit of a mess (like the PCP problem). I'll just convince you that you can do it. We pick out a 2 × 3 neighborhood, or window, from the tableaux, and specify what it means for it to be a legal neighborhood. (figure 3) For any given setting of symbols in the 2×3 neighborhood, we can ask whether it could possibly arise according to the rules of the machine. There are certain legal settings and certain illegal settings. For instance, if in state 푞3 reading an 푎 the machine writes 푐, moves to the right, and goes to state 푞5 in a possible nondeterministic step, then the window with top row 푞3 푎 푏 and bottom row 푐 푞5 푏 is legal, whereas the window with top row 푞3 푎 푏 and bottom row 푐 푞5 푑 103
  • 104. Lecture 15 Notes on Theory of Computation is illegal. There are some subtleties: for instance, a window with top row 푎 푏 푐 and bottom row 푑 푏 푐 may be legal, since the head may have been on the 푎 with the state symbol just to the left of the window, and changed the 푎 to 푑 while moving left; but something like top row 푎 푏 푐 and bottom row 푎 푑 푐 is never possible, because the 푏 changed with no head next to it. By looking at the transition function of 푀, we can determine which of the 6-symbol settings are legal and which are not. We need to check whether every single window is legal. If every single window is legal then all moves are legal. This depends critically on the window being 2 × 3. If it were just a 2 × 2 window it wouldn't work. The tableaux can be globally wrong but locally right if we only look at 2×2 windows. If the machine is in state 푞2, and it can go to 푞3 and move left, or go to 푞5 and move right, then you have to make sure you exclude things like the window with top row 푎 푞2 푎 and bottom row 푞3 푎 푞5. A 2 × 3 window is just big enough to catch this; this is the only thing that can go wrong. Thus we let 휑move = ⋀_{1≤푖,푗≤푛^푘} (the (푖, 푗) neighborhood is legal), i.e., more precisely, 휑move = ⋀_{1<푖<푛^푘, 1≤푗<푛^푘} ⋁_{(푎,푏,푐,푑,푒,푓) a legal window} (푥푖−1,푗,푎 ∧ 푥푖,푗,푏 ∧ 푥푖+1,푗,푐 ∧ 푥푖−1,푗+1,푑 ∧ 푥푖,푗+1,푒 ∧ 푥푖+1,푗+1,푓). We "or" over all possible ways to set cells to symbols to get a legal window. That can be a lot, but it's a fixed number. We have 2 things that remain: first, we need to show this is correct, i.e., 푤 is in the language iff 휑푤 is satisfiable. Now 푤 being in the language means there is some accepting computation history, i.e., some valid tableaux, i.e., some setting of variables that satisfies 휑푤. This should be clear from the construction. The pieces of the formula are designed to force the variables to be set according to some valid accepting tableaux. We also have to check the reduction can be done in polynomial time. This is easy to confirm. First, how large is 휑푤? Ignoring constant factors, the size is about as large as the number of cells in the tableaux, which is polynomial in 푛. Actually, writing down the formula can be done in about the same time as the size of the formula. The steps themselves are simple. It's just a lot of output, but still polynomial. The actual thinking to produce the output is simple. 104
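Aside (my sketch, not from the lecture): to see how mechanical writing down 휑푤 is, here is a small Python fragment that generates the 휑cell and 휑start pieces as lists of clauses over the variables 푥푖푗휎. Representing a literal as a pair (sign, variable), taking N = 푛^푘, and using '_' for the blank symbol are choices made for this sketch; 휑accept and 휑move are produced in exactly the same spirit, just with more cases.

from itertools import combinations

def phi_cell(N, symbols):
    """Exactly one symbol per cell: for each cell (i, j), at least one
    variable (i, j, s) is on, and no two are on simultaneously."""
    clauses = []
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            clauses.append([(True, (i, j, s)) for s in symbols])        # at least one light on
            for s, t in combinations(symbols, 2):                        # at most one light on
                clauses.append([(False, (i, j, s)), (False, (i, j, t))])
    return clauses

def phi_start(N, w, q0, blank='_'):
    """Row 1 spells out the start configuration q0 w1 ... wn followed by blanks."""
    row = [q0] + list(w) + [blank] * (N - len(w) - 1)
    return [[(True, (1, j + 1, s))] for j, s in enumerate(row)]

The total number of clauses is polynomial in 푛, matching the size estimate given above.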
  • 105. Lecture 15 Notes on Theory of Computation S2 Subset sum problem Let’s look the subset sum problem: SubSum = {(푎1, . . . , 푎푘, 푡) : some subset of 푎1, . . . , 푎푘 sums to 푡} . This is a NP-problem because you can just guess the subset that sums to 푡. Theorem 15.2: The subset sum problem is NP-complete. Proof. We show that 3SAT reduces to SubSum. Suppose we are given a 3-cnf 휑 = (푥1 ∨푥2 ∨ 푥3) ∧ (· · · ) · · · (· · · ). How do we make gadgets in SubSum but simulate the variables and clauses of the 3SAT problem? In the choice of what the subset looks like, there are some binary choices: pick or not pick. We want to make them correspond to binary choices for the variables. A binary choice is whether or not 푎1 in the subset. We modify this a bit. 푥1 set to true or false is somehow symmetrical. 푎1 being in the subset or not is less symmetrical. We’ll do something in the same spirit. Each variable represented is represented by 2 values. The target sum is designed in such a way so that exactly one value has to appear in the subset. Here’s the construction. We’ll write the values in decimal. Having 1’s in 푡 forces exactly one of 푎1, 푎2 to appear, and similarly for each pair 푎2푘−1, 푎2푘. 푎1, 푎2 is the 푥1 gadget, 푎3, 푎4 is the 푥2 gadget, and so forth; 푎1 corresponds to 푥1 true and 푎2 corresponds to 푥1 false, and so forth. In the table below, we write 푎2푘−1, 푎2푘 as 푦푘, 푧푘. We have columns corresponding to each clause, and put 1’s in cells when the literal corresponding to the row is in the clause corresponding to the column. 105
  • 106. Lecture 16 Notes on Theory of Computation Now we put 2 extra 1’s in each column. If there are no 1’s in the formula part, then we are not going to get 3. If we have at least 1 in the formula part, then we can add 1’s to get 3, and we are done. Lecture 16 Tue. 11/6/12 We’re going to shift gears a little bit. Having finished our discussion of time complexity—the number of steps it needs to solve one problem—we’re going to look at how much memory (space) is needed to solve various problems. We’ll introduce complexity levels for space complexity analogous to time complexity, and complete problems for these classes. Last time we proved the Cook-Levin Theorem: SAT is NP-complete. Today we’ll do ∙ space complexity 106
  • 107. Lecture 16 Notes on Theory of Computation ∙ SPACE(푠(푛)), NSPACE(푠(푛)) ∙ PSPACE, NPSPACE ∙ Examples: TQBF, LADDERDFA ∙ Savitch’s Theorem. S0 Homework Problem 1: On exponentiation modulo a number. We can do the test even though the numbers are very big, say all 푛-bit numbers. The naive algorithm—just multiplying over and over—takes exponential time, because the magnitude of the number is exponential in the size of the number. If you want to raise a number to the 4th power, you can multiply it 3 times or square it twice. Using this squaring trick you can raise number to high powers, even if they are not powers of two. There are real applications of raising numbers to powers in modular arithmetic, for in-stance, in cryptography. Problem 2 (Unary subset sum problem): A number in unary is much bigger to write down than, say, in binary. The straightforward algorithm—looking through all possible subsets—doesn’t give a polynomial time algorithm because there are exponentially many subsets. Instead, use dynamic programming. The key observation is that you can ignore the target. Just calculate all possible values you can get by looking at the subsets. There are exponentially many subsets, but only polynomially many different values you can obtain for their sums. Think about how to organize your progress carefully. Dynamic programming gives you a way to organize your progress. Problem 3: This is an important problem. If P=NP, then everything in NP is NP complete. This is important for 2 reasons. This shows that proving a problem is not NP-complete is pretty hopeless. There can be no simple way of showing a problem not NP-complete, because then we get this amazing consequence P̸=NP. The fact really comes from the fact that all problems in P are polynomial time reducible to one another. This is important to understand, because the issue comes up repeatedly in different guises. This is a nice exam-type question that can be asked in a variety of different ways. This is similar to the fact that all decidable problems are mapping-reducible to one an-other. This is a basic concept to understand in putting together the theory the way we do it. Problem 4: 107
  • 108. Lecture 16 Notes on Theory of Computation The 3-coloring problem is NP-complete. The book gives gadgets you might use. The palette is a structure you might want to use in your reduction. If you imagine trying to color your graph in 3 colors, and you have this structure, the 3 colors must all appears in the palette. (The palette is like the set of colors the artist has to work with.) When you color the graph with 3 colors, we don’t know what colors they are but we can arbitrarily assign them names, say True, False, and Red. Thinking of colors as truth values helps you understand the rest of the connection. In the variable gadget, a node of the palette (the red node) happens to be connected to 2 other nodes connected to each other. If it is 3-colorable, then we know the 2 nodes are not red, so are either true-false or false-true. That binary choice mirrors the choice of truth assignment to some variable. That’s why this structure is called a variable gadget. It has two possibilities. But you have to make sure the coloring corresponds to satisfying assignment. That’s what the other gadgets help you to do. Play with the or-gadget. Try assigning values at the bottom and see what values are forced elsewhere. Problem 5: If P=NP then you can not only test formulas, you can find the assignment. Find the assignment a little bit at a time. Problem 6 (Minimizing NFA’s): Find an equivalent automaton with the fewest number of states possible, equivalent to original one. For DFA’s, there is a poly time algorithm. No such algorithm is known for NFA’s. In fact, if you could do that then P=NP. Imagine what would happen if you could minimize the automaton you ended up constructing. That would turn out to be useful. S1 Space complexity 1.1 Definitions Definition 16.1: A Turing machine runs in space 푠(푛), where 푠 : N → N, if it halts using at most 푠(푛) tape cells on every input of length 푛, for every 푛. The machine uses a tape cell if its head moves over that position on the tape at some time. We’re only going to consider the case 푠(푛) ≥ 푛, so we at least read the entire input. The head has at least passed over the input; it might use additional space beyond the input. We assume the machine halts on input of every length. This is entirely analogous to time complexity. There, instead of measuring space used, we measured time used. 108
  • 109. Lecture 16 Notes on Theory of Computation We can define space use for deterministic and nondeterministic machines. For a nondeterministic machine to run in space 푠(푛), it has to use at most 푠(푛) tape cells in every branch. We treat each branch independently, seeing how many tape cells are used on that branch alone. We now define space complexity classes. Definition 16.2: Define SPACE(푠(푛)) = {퐴 : some TM decides 퐴 running in 푂(푠(푛)) space}, NSPACE(푠(푛)) = {퐴 : some NTM decides 퐴 running in 푂(푠(푛)) space}. Think of these as the collection of languages some machine can do within 푠(푛) space. 1.2 Basic facts Let's show some easy facts, some relationships between space and time complexity. Proposition 16.3: For 푠(푛) ≥ 푛, TIME(푠(푛)) ⊆ SPACE(푠(푛)). This also works for NSPACE and NTIME. Proof. Suppose we can do some problem with 푠(푛) time. Then there is a TM that can solve that problem with at most 푠(푛) steps on any input of length 푛. I claim that language is also solvable in space 푠(푛). If you can do something with 푠(푛) steps you can do it in 푠(푛) space, by using the same algorithm. The machine can only use at most 푠(푛) tape cells because in each additional step it uses at most 1 more tape cell. Let's do containment in the other direction. Space seems to be more powerful than time: the amount of stuff doable in space 푛 might take a lot more time. Proposition 16.4: For 푠(푛) ≥ 푛, SPACE(푠(푛)) ⊆ TIME(2^{푂(푠(푛))}) = ⋃_{푐>0} TIME(푐^{푠(푛)}). This also works for NSPACE and NTIME. Think of 푐 as the size of the tape alphabet. Proof. Consider a machine running in space 푠(푛). It can't go on too long without repeating a configuration, and if it halts it can't repeat a configuration. The number of configurations is at most exponential in 푠(푛), so the time is at most exponential in 푠(푛). 109
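To make "at most exponential" explicit (a small aside, not spelled out in the lecture but matching the configuration counting used later for L ⊆ P): a configuration of a machine using 푠(푛) cells is determined by the state, the head position, and the tape contents, so
\[
\#\{\text{configurations}\} \;\le\; |Q| \cdot s(n) \cdot |\Gamma|^{s(n)} \;=\; 2^{O(s(n))},
\]
and since a halting computation never repeats a configuration, it runs for at most $2^{O(s(n))}$ steps.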
  • 110. Lecture 16 Notes on Theory of Computation Definition 16.5: Define PSPACE = ⋃︁푘 SPACE(푛푘) NPSPACE = ⋃︁푘 NSPACE(푛푘) We define these because they’re model independent like P and NP. Corollary 16.6: P⊆PSPACE and NP⊆NPSPACE. Proof. This follows from TIME(푠(푛)) ⊆ SPACE(푠(푛)). The following starts to show you why space is more powerful than time. Theorem 16.7: NP⊆PSPACE. Now we have to do something nontrivial. All we know is that we have a nondeterministic polynomial time algorithm for the language. It’s not going tell you that you can decide the same language with a polynomial time algorithm on a deterministic machine. Proof. 1. We first show SAT∈PSPACE: Use a brute force algorithm. You wouldn’t want to write down the whole truth table. But you can cycle through all truth assignments one by one, reusing space to check whether they are satisfying assignments. If you go through all assignments and there are no satisfying assignments, then you can reject. The total space used is just enough to write down the current assignment. Thus SAT∈SPACE(푛). 2. If 퐴 ∈NP then 퐴 ≤푃SAT. The polynomial time reduction can be carried out in poly-nomial space. If you have an instance of a NP problem, then you can map it to SAT in polynomial time, and use the fact that the SAT problem can be done in polynomial space. This theorem illustrates the power of completeness. Note that we had to make sure the reduction is being capable of being computed by algorithms within the class (PSPACE). Then we showed a NP-problem is in PSPACE for a complete problem in that class (SAT), so we get that all problems reducible to it are also in that class. Thus the whole class (NP) becomes subset of class you’re working with (PSPACE). (You can also give a more direct proof.) Theorem 16.8: CoNP⊆PSPACE. Proof. When you have deterministic machines, and you want the complementary langauage, you can just flip the answer at the end. Deterministic complexity classes are closed under complement. Just solve the NP problem and take the complement. 110
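Aside (my sketch, not from the lecture): the brute-force, space-reusing SAT test from part 1 of the proof of Theorem 16.7 above can be pictured as follows; only the current assignment, one bit per variable, is ever held in memory. The representation of the formula as an evaluation function is a choice made for this sketch.

from itertools import product

def sat_in_linear_space(variables, evaluate):
    """Decide satisfiability by cycling through assignments one at a time.
    evaluate takes a dict {variable: bool} and reports whether the (fixed)
    formula is true under that assignment; space is reused across loops."""
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if evaluate(assignment):
            return True        # found a satisfying assignment
    return False               # exhausted all assignments

# Example: (x ∨ y) ∧ (¬x ∨ ¬y)
print(sat_in_linear_space(['x', 'y'],
                          lambda a: (a['x'] or a['y']) and (not a['x'] or not a['y'])))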
  • 111. Lecture 16 Notes on Theory of Computation For instance, the unsatisfiability problem is in CoNP, hence is in PSPACE. We have HAMPATH ∈CoNP, hence is in PSPACE. In fact, UNSAT and HAMPATH are CoNP-complete. 1.3 Examples Let’s do a slightly less trivial example of a problem in PSPACE. Then we’ll give an example of a problem in NPSPACE. Example 16.9: Here is a Boolean formula: (푥 ∨ 푦) ∧ (푥 ∨ 푦 ∨ 푧). We put quantifiers in front. Quantifiers range over boolean values. ∀푥∃푦∀푧[(푥 ∨ 푦) ∧ (푥 ∨ 푦 ∨ 푧)]. This formula says: For every truth assignment to 푥 there exists a truth assignment to 푦 such that for every truth assignment to 푧 the statement is true. This is a quantified Boolean formula. We assume every variable gets quantified. We formulate the general computational problem as a language: TQBF, true quanti-fied Boolean formulas. TQBF = {⟨휑⟩ : 휑 is a true quantified Boolean formula} . This problem is in a sense a generalization of satisfiabilities. The satisfiability problem is the special case where all quantifiers out front are ∃: is there a setting to all variables that makes the formula true. TQBF seems to be harder. It is in polynomial space, but not known to be in NP. Why is it solvable in polynomial space? It turns out TQBF is PSPACE-complete. We first have to show it’s in PSPACE. This isn’t too hard. Theorem 16.10: thm:tqbf-pspace TQBF∈PSPACE. Let’s assume that you can plug in constant values (trues/falses) in certain locations. Proof. Break into cases. On input ⟨휑⟩, 1. If there are no quantifiers, then there are no variables, so evaluate 휑 and accept if true. 2. We give a recursion. If 휑 starts with ∃푥, evaluate recursively for 푥 true and false. Accept if either accepts. 3. If 휑 starts with ∀푥, evaluate recursively for 푥 true and false. Accept if both accept. 111
  • 112. Lecture 16 Notes on Theory of Computation In this way the machine evaluates all possibilities while reusing the space! This uses space 푂(푛). Now let's look at nondeterministic space complexity. Here's a word puzzle: convert one word to another by changing one letter at a time, staying inside the English language at every step. For instance, suppose we want to convert ROCK to ROLL. We can't convert ROCK to ROCL, because ROCL is not an English word. It may be helpful to change the R to something else to enable us to change the last letters: ROCK, SOCK, SULK, BULK, BULL, BOLL, ROLL. We'll consider a similar problem. Define the language as the set of strings some finite automaton accepts. Definition 16.11: Define LADDERDFA = {⟨퐵, 푠, 푡⟩ : 퐵 is a DFA and there is a sequence 푠 = 푠0, 푠1, . . . , 푠푘 = 푡 where each 푠푖 and 푠푖+1 differ in one character and all 푠푖 ∈ 퐿(퐵)}. What's the complexity of testing this? This problem is solvable in nondeterministic polynomial space. Theorem 16.12: LADDERDFA ∈ NPSPACE. Proof. (by example) Nondeterministically change one letter, and check to see if the word is still in the language. We test at every stage whether we ended up at ROLL. We have to be careful not to end up in a loop. The machine cannot remember everything it's done. Instead, it counts how many words it has looked at so far. If the number is too high, it must have looped. 1.4 PSPACE vs. NPSPACE There is a rather surprising general theorem that tells you PSPACE and NPSPACE are the same. The analogue to P vs. NP for space complexity is solved: PSPACE = NPSPACE. This is not obvious! If you try a backtracking algorithm in the obvious way, then the space blows up to be exponential. Is this an NP problem? The certificate (the ladder) could be exponentially long! The machine is allowed to guess on the fly, and the number of steps is potentially exponential. It's not known to be in NP. (The input consists of the automaton, the starting string, and the ending string. The automaton is not the dominant piece; the starting and ending strings are.) 112
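Aside (my sketch, not from the lecture): the recursion from Theorem 16.10 can be written very compactly. The representation of a quantified formula as a list of (quantifier, variable) pairs plus an unquantified matrix is a choice made for this sketch; the point is that the two recursive branches reuse the same space.

def eval_tqbf(quantifiers, matrix, assignment=None):
    """quantifiers: list of ('E' or 'A', variable name), outermost first.
    matrix: function from a dict {variable: bool} to bool.
    Evaluates the fully quantified formula, reusing space across branches."""
    if assignment is None:
        assignment = {}
    if not quantifiers:
        return matrix(assignment)          # no quantifiers left: just evaluate
    (q, x), rest = quantifiers[0], quantifiers[1:]
    results = []
    for value in (True, False):
        assignment[x] = value
        results.append(eval_tqbf(rest, matrix, assignment))
    del assignment[x]                       # free the space used for this variable
    return any(results) if q == 'E' else all(results)

# ∀x ∃y ∀z [(x ∨ ¬y) ∧ (¬x ∨ y ∨ z)], an example formula
print(eval_tqbf([('A', 'x'), ('E', 'y'), ('A', 'z')],
                lambda a: (a['x'] or not a['y']) and (not a['x'] or a['y'] or a['z'])))

The recursion depth is the number of variables and each level stores one truth value, so the space used is linear, exactly as claimed in Theorem 16.10.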
  • 113. Lecture 17 Notes on Theory of Computation Theorem 16.13 (Savitch): thm:savitch For 푠(푛) ≥ 푛, NSPACE(푠(푛)) ⊆ SPACE(푠(푛)2) Corollary 16.14: PSPACE=NPSPACE. This is because we have just a squaring. Proof. Given 푆(푛)–SPACE NTM 푁, we construct an equivalent TM 푀 that uses 푂(푆2(푛)) space. Imagine a tableaux of 푁 on 푤, corresponding to some accepting computation branch. This time, the dimensions are different: the width is 푠(푛), how much space we have and the height is 푐푆(푛) for some 푐. We want to test if there’s a tableaux for 푁 on 푤, but we want to do it deterministically. Can we fill it in somehow? It’s an exponentially big object, and we’ll be in trouble if we have to keep it all in memory—we don’t have that much memory. The deterministic machine tries every middle configuration sequentially. (This takes a horrendous amount of time but we only care about space.) For a start configuration, ask: can you get from top to middle in time 1 2푐푆(푛) and from middle to bottom in time 1 2푐푆(푛). Now ask this recursively, until we get down to adjacent configurations. How deep is the recursion going to be? The depth of the recursion is log2 푐푆(푛) = 푂(푆(푛)). What do we have to remember every time we recurse? The working midpoint configurations. For each level of the recursion we have to write down an entire configuration. The config-uration takes 푆(푛) space, and each level costs 푂(푆(푛)) space. Hence the total is 푂(푆2(푛)) space. You can implement this in the word-ladder problem: write down a conjecture for the intermediate string. See if can get from/to in half as much time. This is slow slow but runs in relatively small space. 113
  • 114. Lecture 17 Notes on Theory of Computation Lecture 17 Thu. 11/8/12 Problem set 5 is out today. It is due after Thanksgiving, so you can think about it while you’re digesting. Last time we talked about ∙ space complexity ∙ SPACE(푠(푛)), NSPACE(푠(푛)) ∙ PSPACE, NPSPACE ∙ Savitch’s Theorem says that PSPACE=NPSPACE. Today we will ∙ finish Savitch’s Theorem. ∙ Show TQBF is PSPACE-complete. S1 Savitch’s Theorem Recall the following. Theorem (Savitch, Theorem 16.13 again): For 푠(푛) ≥ 푛, NSPACE(푠(푛)) ⊆ SPACE(푠(푛)2). Savitch’s Theorem says that if we have a nondeterministic machine, we can convert it to a deterministic machine using at most the square of the amount of time. Nondeterminism only blows up space by a square, not an exponential. The proof is not super hard but it is not immediately obvious. Proof. For NTM 푁 using space 푆(푛) with configurations 퐶1,퐶2, write 퐶1 푡− → 퐶2 (“퐶1 yields 퐶2 in 푡 steps”) if 푁 can go from 퐶1 to 퐶2 in at most 푡 steps. We give a recursive, deterministic algorithm to test 퐶1 푡− → 퐶2 without using too much space. We will apply the algorithm to 퐶1 = 퐶start, 퐶2 = 퐶accept, and 푡 = 푑푆(푛). We may assume 푁 has a single accepting configuration, by requiring the machine to clean up the space when it is done (just like children have to clean up their room). It puts blanks back, moves its tape head to the left, and only then does it enter the accept state. The basic plan is to make a recursive algorithm. 114
  • 115. Lecture 17 Notes on Theory of Computation We will inevitably have to try all possibilities, but we can do so without using too much space. The machine zooms to the middle and guesses the midpoint configuration. It tries configurations sequentially one after another as candidates for the midpoint; think of it as cycling like an odometer through all possible configurations (of symbols and the tape head). This is horrendously slow, but it can reuse space. Once it has a candidate, it solves 2 problems of the same kind recursively: can we get from the top to the middle in half the time, and once we've found a path to the middle, can we get from the middle to the bottom? Note that in the second half of the procedure, the machine can reuse the space from the first half. The machine continues recursively on the top half, splitting it into two halves and asking whether it can get between the configurations in a quarter of the original time. The recursion only goes down a logarithmic number of levels, until it gets to 푡 = 1. There are on the order of 푆(푛) levels. To check whether one configuration follows another in 1 step, just simulate the machine. How much do we have to write down every time we recurse? We have to write down the candidate for the middle. Each time we recurse we have a new configuration to write down. We summarize the algorithm below. On input 퐶1, 퐶2, 푡, do the following. 1. For 푡 > 1: for each configuration 퐶MID, test if 퐶1 −→ 퐶MID in 푡/2 steps and 퐶MID −→ 퐶2 in 푡/2 steps, reusing the space. Accept if both accept (for some 퐶MID); then 퐶1 can get to 퐶2 in 푡 steps. 2. If 푡 = 1, accept if 퐶1 can legally yield 퐶2 in 1 step of 푁 or if 퐶1 = 퐶2. The number of levels of recursion is log2(푑^{푆(푛)}) = 푂(푆(푛)). Each level requires storing a configuration 퐶MID and uses 푂(푆(푛)) space. The total space used is 푂(푆(푛)) · 푆(푛) = 푂(푆(푛)^2). 115
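Aside (my sketch, not from the lecture): here is the recursion written out. A configuration is treated abstractly; all_configurations() and yields_in_one_step() stand in for the machine-specific pieces (the odometer over work-tape settings and the one-step simulation of 푁) and are assumptions of this sketch.

def can_yield(c1, c2, t, all_configurations, yields_in_one_step):
    """Deterministically test whether configuration c1 can reach c2 in at
    most t steps (t assumed to be a power of 2; round the bound up if not),
    reusing the same space for the two recursive calls."""
    if t == 1:
        return c1 == c2 or yields_in_one_step(c1, c2)
    for c_mid in all_configurations():            # odometer over midpoint candidates
        if (can_yield(c1, c_mid, t // 2, all_configurations, yields_in_one_step)
                and can_yield(c_mid, c2, t // 2, all_configurations, yields_in_one_step)):
            return True
    return False

# Savitch: N accepts w iff can_yield(C_start, C_accept, d ** S_n, ...).
# The recursion depth is log2(d ** S_n) = O(S_n), and each level stores one
# candidate configuration of size O(S_n), giving O(S_n ** 2) space overall.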
  • 116. Lecture 17 Notes on Theory of Computation This tells us PSPACE=NPSPACE. Let’s draw the picture of the world. (If 푆(푛) is polynomial, 푆(푛)2 is still polynomial.) Let’s move to the second topic for today. S2 PSPACE–completeness It is a famous problem whether P=NP. We know NP⊆PSPACE; we can also ask whether P=PSPACE. If a language needs polynomial space, can we just use polynomial time? Un-believably, we don’t know the answer to that either. For all we know, the whole picture collapses down to P! A few (wacky) members of community believe P=NP. No one believes P=PSPACE. That would be incredible. What we do have is the notion of NP–complete. There is a companion notion of PSPACE– completeness. Every problem in PSPACE is reducible to a PSPACE–complete problem. This is interesting for some of the same reasons that NP–complete problems are interesting. Showing a problem is PSPACE–complete is even more compelling evidence that outside P, because else P=PSPACE. Complete problems for class give you insight for what that space is about, and how hard the problems are. PSPACE–completeness has something to do with determining who has a winning strategy in a game. There is a tree of possibilities in a game, and a structure to that tree: I win if for every move you make there exists a move I can make such that... This is the essence of what PSPACE is about. While we don’t know P̸=PSPACE, we do know that P is not equal to the next one up: EXPTIME. You can prove P̸=EXPTIME. That is the first time where technology allows us to show something different. Note there is a tradeoff: more time, less space vs. more space, less time. There are results in these directions, but we won’t do them. For instance, there are Savitch’s Theorem variants, which trade off time for space. It cuts the recursion at different points. 2.1 Definitions This should look familiar, but there’s one point we have to make clear. 116
  • 117. Lecture 17 Notes on Theory of Computation Definition 17.1: We say that 퐵 is PSPACE–complete if 1. 퐵 ∈PSPACE. 2. For every 퐴 ∈PSPACE, 퐴 ≤푃 퐵. We are still using polynomial time reducibility. Why polynomial time? It's important to realize that if we used polynomial space reducibility here, that would be stupid. If 퐴 is polynomial space reducible to 퐵, what would happen? This is related to the homework due today. The reduction could solve the problem itself and then target the answer to the right problem. Thus, every 2 problems in P are polynomial time reducible to one another, and every 2 problems in PSPACE are polynomial space reducible to one another. If we used polynomial space reducibility, every problem in PSPACE would be PSPACE–complete. This is not interesting. You have to use a reduction less powerful than the class you're studying. A reduction is a transformer of problems, not a solver of problems. If you have a PSPACE–complete problem, and you can solve it in polynomial time by virtue of some miracle, then every other PSPACE problem can be solved in polynomial time, and we've pulled down all of PSPACE into P. It's important to understand why we set it up this way! 2.2 TQBF is PSPACE–complete An example of a PSPACE problem is TQBF (true quantified Boolean formulas, where all variables are quantified by ∀'s and ∃'s): TQBF = {⟨휑⟩ : 휑 is a true quantified Boolean formula} . For instance, ∀푥∃푦(푥 ∨ 푦). Theorem 17.2: TQBF is PSPACE–complete. The proof will be a recap of stuff we've seen plus 1 new idea. Proof. 1. TQBF∈PSPACE: We saw last time that recursing on assignments gives a linear space algorithm (Theorem 16.10). 2. Let 퐴 ∈PSPACE be decided by a TM 푀 in space 푛^푘. We give a polynomial time reduction 푓 from 퐴 to TQBF, 푓 : 푤 ↦→ 휑푤, such that 휑푤 "says" 푀 accepts 푤. 휑푤 captures 푀 running on 푤; so far this is the same idea as that in the Cook-Levin Theorem. Consider a tableaux of 푀 on 푤, with width 푆(푛) and height 푑^{푆(푛)}. 푀 is deterministic, so there is just 1 possibility for the rows to be the computation history. As in Cook-Levin, we can try to build 휑푤 the same way. That gives us a correct formula. The difference is that before we were talking about satisfiability. We can just put ∃ quantifiers out front to make it a TQBF. This doesn't work; why? How big is the formula? 117
  • 118. Lecture 17 Notes on Theory of Computation It’s as big as the tableaux, exponentially big! You can’t write down an exponentially big formula in polynomial time. We need a shorter formula which expresses the same thing. The휑푊 from Cook-Levin is too big. This is why the idea from Cook-Levin by itself is not enough. First we solve a more general problem. Let’s solve the problem for 퐶1,퐶2, 푡: give 휑퐶1,퐶2,푡 which says 퐶1 푡− → 퐶2. It will be helpful to talk about any 2 configurations, and being able to go from one to another in a certain amount of time. Even Cook-Levin would give you that: just use 퐶1 and 퐶2 in place of the start and end configuration. But this viewpoint allows us to talk about the problem in a different way. As a first attempt, we can construct 휑퐶1,퐶2,푡 saying 퐶1 푡− → 퐶2 by writing 휑퐶1,퐶2,푡 = ∃퐶MID(휑퐶1,퐶MID,푡/2 ∧ 휑퐶MID,퐶2,푡/2) and constructing subformulas recursively. Why can we write down ∃퐶MID? Really it is represented by the configurations of a bunch of variables. It is shorthand for ∃푥1∃푥2 · · · ∃푥ℓ. If 푡 = 1, then 휑퐶1,퐶2,푡=1 and we can write the formula by Cook-Levin. But have we done anything? The semantics of the formula are correct. All this is saying is that we can get from 퐶1 to 퐶2 in 푡 steps iff there is some midpoint such that we can get from 퐶1 to the midpoint in half the time and from the midpoint to 퐶2 in half the time. (This smells like Savitch’s theorem. There is more than meets the eye!) We cut 푡 in half at the expense of creating 2 subproblems. The number of levels of the recursion is fortunately only 푑. Here 푆(푛) = 푛푘. We end up with polynomial time steps, but we double the size of the formula each time, so it’s still exponential. We ended up not doing anything! This shouldn’t come as a total surprise. We’re still only using the ∃ quantifier. This is still a SAT-problem! We haven’t used the full power of TQBF, which uses ∃ and ∀’s. Now and’s and ∀’s are 2 flavors of the same thing. ∃’s are like or’s. We’re going to get rid of the “and.” This looks like cheating but it’s not: 휑퐶1,퐶2,푡 = ∃퐶MID∀(퐶3,퐶4) ∈ {(퐶1,퐶MID), (퐶MID,퐶2)}(휑퐶3,퐶4, 푡 2 ). There is a fixed cost out front, and a single new formula at each level, not doble formulas, so there is no blowup. We need to show this is legitimate. Note that ∃퐶MID stands for a string 118
  • 119. Lecture 17 Notes on Theory of Computation that is 푂(푛^푘) = 푂(푆(푛)) long. The same is true of the ∀ quantifier. Let's rewrite the ∀ in more legal language: ∀(퐶3,퐶4) ∈ {(퐶1,퐶MID), (퐶MID,퐶2)} [휑_{퐶3,퐶4,푡/2}] = ∀퐶3∀퐶4[(퐶3,퐶4) = (퐶1,퐶MID) ∨ (퐶3,퐶4) = (퐶MID,퐶2) → 휑_{퐶3,퐶4,푡/2}]. This is the trick! This was done at MIT by Larry Stockmeyer in his Ph.D. thesis. It is called the Meyer-Stockmeyer Theorem. How big is this formula? We start off with an exponential number of steps 푑^{푆(푛)} = 푑^{푛^푘}, so the number of levels of recursion is 푂(푛^푘). Each level adds 푂(푛^푘) stuff out front, so the size of the formula is about 푛^{2푘}. Its size is polynomial, but it does have a squaring. We see in both Savitch's Theorem 16.13 and Theorem 17.2 the following concept. Recursion using middle configurations makes things polynomial, not exponential! In fact, the proof of Theorem 17.2 implies Savitch's Theorem: it could have been a nondeterministic Turing machine and the proof still works! Hence, every nondeterministic NPSPACE computation can be reduced to TQBF. If a nondeterministic machine is reduced to TQBF, there is a squaring. Note TQBF can be done in linear space: a deterministic machine goes through all assignments, and solves TQBF in linear space. This gives a different proof of Savitch's Theorem. 2.3 PSPACE–completeness and games PSPACE–complete problems can look like games. TQBF doesn't look like a game, but we'll see it really does. We'll see other PSPACE–complete problems that are more strictly "games." My son, all he does is XBox. There is a kind of game we used to play before XBox, called geography. Choose some starting city, like Cambridge. Two players take turns. You have to pick a place whose first letter is the same letter the previous place ends with. Say you pick Edinburgh; Edinburgh ends with H, so I can pick Hartford, you Denver, I pick Raleigh, and so on. The first person who gets stuck loses. One more rule: no repetitions. We can model this game as a graph. All cities are nodes. 119
  • 120. Lecture 18 Notes on Theory of Computation Arrows correspond to legal moves. Let’s abstract the game, and forget the labels. We take turns picking some path through the graph. It has to be simple: no repeats. If you get stuck somewhere with no place to go you lose. Depending on how you play, you might win or lose. The question is, if you play optimally, who wins? Given one of these graphs, which side has the win? We’ll show this problem is PSPACE–complete by reducing TQBF to this problem. Lecture 18 Thu. 10/11/12 Last time we showed TQBF is PSPACE-complete, analogous to how SAT was complete for NP. Today we’ll talk about ∙ generalized geography ∙ games ∙ log space: L and NL S1 Games: Generalized Geography Recall our generalized geography: Boston goes to New York City, Newark, etc., Newark goes to Kalamazoo, etc. One important class of PSPACE problems are these games: Given a an initial configura-tion of a game, the moves allowed, and a rule for who has won, which player has the upper hand? If both sides play the best possible strategy, who will win? Many of these problems are in PSPACE. We’ll look at an example, generalized geography, and show that deciding who has a winning strategy is a PSPACE-complete problem. In generalized geography, we give a bunch of geographical names, for instance cities; each city called out has to start with the letter that the previous one ended with. The starting person picks Boston; the second player has to pick a place starting with 푁. Say Newark. 120
  • 121. Lecture 18 Notes on Theory of Computation The first person must now pick a place starting with 퐾: Kalamazoo. The person who gets stuck, because there is no place to move to, loses. You can draw a graph that shows the possible moves. Abstracting, we erase the names and just remember the graph. Two players I and II take turns picking nodes of a simple path. The first one unable to move loses. Let GG = {⟨퐺, 푎⟩ : Player I has a winning strategy (forced win) in 퐺, starting at 푎} . Here's an example. In general, figuring out who has a winning strategy is not so easy: it is PSPACE–complete. The proof is nice: it reveals connections between quantifiers and games. Theorem 18.1: GG is PSPACE-complete. Proof. Like showing a problem is NP-complete, we start off with a problem we already know to be PSPACE-complete. We have to show two things. 1. GG∈PSPACE. (This is easy, a straightforward recursive algorithm.) 2. TQBF≤푃GG. (This is the interesting fun part.) To make sense of this reduction, we look at the TQBF problem in a different way, as a game. Let 휑 be a quantified Boolean formula, for instance 휑 = ∃푥1 ∀푥2 ∃푥3 ∀푥4[Ψ]. We know how to test whether this is true: calculate and see if it works out. Put this aside for a moment. Let's create a game for the formula. There are two players: one of them is called ∃ and the other is called ∀. This is how you play the game. The players take a look at the formula. Start with ∃'s turn. ∃ gets to pick the value of 푥1. Then it is ∀'s turn. ∀ gets to pick the value of the next variable 푥2, and so forth. (There may be several variables in a row with the same quantifier, but we can always throw in dummy variables so they alternate.) ∃ picks values of ∃ variables, ∀ picks values of ∀ variables. The two players have opposing desires. ∃ is trying to pick values of variables to make the formula true at the end, to make the variables satisfy the formula. ∀ is trying to do the opposite: make the formula false. ∃ wins if the chosen values of 푥1, . . . , 푥4 satisfy Ψ, and ∀ wins if the chosen values don't satisfy Ψ. 121
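Aside on part 1 of the proof (my sketch, not from the lecture): the "straightforward recursive algorithm" showing GG ∈ PSPACE just tries every move for whichever player is about to move. I follow the textbook convention that the start node counts as Player I's first move, so Player II moves next; the graph is given as adjacency sets. The recursion depth is at most the number of nodes and each level stores only the current node and path, so the space used is polynomial.

def player_to_move_wins(graph, current, visited):
    """graph: dict mapping each node to the set of its out-neighbors.
    Returns True iff the player about to move from `current` has a
    winning strategy, given that nodes in `visited` may not be reused."""
    moves = [v for v in graph[current] if v not in visited]
    if not moves:
        return False                      # stuck: the player to move loses
    for v in moves:
        # if some move leaves the opponent in a losing position, take it
        if not player_to_move_wins(graph, v, visited | {v}):
            return True
    return False

def gg(graph, a):
    """⟨G, a⟩ ∈ GG: with node a counting as Player I's first move,
    Player I has a forced win iff Player II, who moves next, does not."""
    return not player_to_move_wins(graph, a, {a})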
  • 122. Lecture 18 Notes on Theory of Computation I don’t know if this game will be a big seller. However, it is a valid game: each player is trying to achieve an objective, and at end, we know who won. Who has the winning strategy? The cool thing is that we’ve already run into this problem. This is exactly the same as the TQBF problem. ∃ has a winning strategy exactly when it is a true quantified boolean formula. What does it mean to have winning strategy? It means that under optimal play, the ∃ player can make the formula true. In other words, there exists some move, such that no matter what the for all player does for 푥2, there exists some move 푥3... Whether ∃ has a winning strategy is the the same as the truth value of the formula: whether there exists some value, such that for all... This is just a different view of the truth value. With this representation, we can reduce from TQBF to GG, i.e., show TQBF≤푃GG. The technique is reminiscent of SAT reductions: We make gadgets, etc. The way you put them together, though, is different because there is a dynamic game component to it. Playing the game simulate playing the formula game. The gadgets work a little differently. We send 휑↦→ ⟨퐺, 푎⟩ ∃, ∀ 퐼, 퐼퐼 Player I will be like ∃ and player II will be ∀. The graph will have a sequence of diamonds. Let’s look at a fragment and think about how it proceeds. ∀ starts at the top. ∀ has no choice. ∃ player has a choice. Now ∀ has a choice. This simulates the choice of the variables. The first diamond is the gadget for 푥1, the second for 푥2. (Figure from book) 122
  • 123. Lecture 18 Notes on Theory of Computation If ∀ appeared twice in a row, then we wouldn’t have an extra node, which just served to switch whose turn it is. After the diamond at the very bottom, all truth values for variables have been chosen. Let’s assume going left corresponds to T and right corresponds to F. In the variable game, the game is over. In generalized geography, we’re not finished because we want to arrange more structure—an endgame—so that the ∃ player wins iff the formula is satisfied. There is one node for each of the clauses. The ∀ player picks a clause. The ∀ player is claiming, or hoping, that the clause is unsatisfied. (We can assume it is in CNF, just like we reduced SAT to 3SAT.) ∀ is trying to demonstrate that the formula not satisfied, by picking the unsatisfied clause. “You didn’t win because clause 2 is not satisfied.” (The one who tells the truth is going to be the one who ultimately wins.) Each clause points to all its literals. Psychologically, ∀ claims 푐1 not satisfied. ∃ says it is satisfied because 푥1 is true. Now it’s the moment of truth. It is ∀’s turn. The positive literal is connected to true side of the construct. Negated variables get connected to false side. ∃ claims “this clause is satisfied by 푥1.” If ∃ was right, and earlier the game had gone through the true side of 푥1, then the ∀ player can’t move. If the ∀ player is right, then play went down the other way, ∀ can move, and now ∃ is stuck. 123
  • 124. Lecture 18 Notes on Theory of Computation All these nodes and arrows are laid down before the play begins. We build gadgets up front, one for each variable, and lay down nodes for each clause. We connect the nodes corresponding to literal in the clauses left or right depending on whether they are positive and negative. Playing the generalized geography game is just playing the formula game. There is a winning strategy exactly when the counterpart in the formula game has winning strategy. This shows TQBF is PSPACE-complete, and hence probably a hard problem. Similar results have been proven for natural games: the game of Go is played on a 19×19 board; 2 players have 2 colors of stones, each trying to surround the other person’s stones. Determining which side has a winning strategy in Go from some preset board configuration is PSPACE–hard: you can reduce GG to the Go problem. There are structures in Go which correspond to moving through the GG configuration, and playing Go game corresponds to GG. There are 2 remarks in order. We actually have to generalize Go: 19×19 finite problem; we consider a 푛 × 푛 board. All results are asymptotic. (If we only considered 19 × 19, then the problem is just a big table lookup.) Go is at least PSPACE-hard. Whether it’s in PSPACE depends on details on how the set game up. A special rule might let the game go on for a very long time, so this depends on details of the definition of game. PSPACE–hardness has been proven for other games 푛 × 푛 checkers, and 푛 × 푛 chess (which is less natural). We now shift to a different set of classes, still in space complexity. S2 Log space Instead of talking about polynomial space, we’ll talk about a whole different regime, called log space. It has its own associated complexity classes and natural problems. We look at SPACE(log 푛) and NSPACE(log 푛). We have space bounds that have size less than the problem. In order to make sense of this, we need to introduce a different model. Just by reading the entire input, the machine use space 푛. That is no sensible way to talk about log space. Instead, we allow the machine to read the entire input, but have a limited amount of work space. Thus we consider a multitape Turing machine, with a 1. input (read-only) tape, and a 2. work (read/write) tape. We will only count the space used on the work tape. The input given for free. 124
  • 125. Lecture 18 Notes on Theory of Computation We can talk about log 푛-bounded work tapes. There will be an assumed constant factor allowed. Definition 18.2: Define L = SPACE(log 푛), NL = NSPACE(log 푛), Why 푂(log 푛)? Why not 푂(√푛), etc? log 푛 is a natural amount of space to provide. It is just enough to keep track of a pointer into the input. Constant log 푛, for instance 7 log 푛, that’s enough to keep track of 7 pointers into your input. Using log space, a machine can keep track of a finite number of pointers. We’ll do a couple of examples. Example 18.3: We have that the set of palindromes is in log-space. ⌋︀푤푤ℛ : 푤 ∈ {0, 1}*{︀∈ 퐿. The machine zigzags back and forth on the input. It can’t make any marks on the input, only keep track of stuff on the work tape. This is still good enough. The machine keeps track of how many symbols are already matched off; a fixed number of pointers enable this. For instance, it could record that it has already matched off 3 symbols, and is now looking at the 4th on the left or right. The machine uses a log-space work tape. We’re considering machines with separate read-only input. The input may be enormous: for example, input from a CD-ROM or DVD-rom, onto your small little laptop. The laptop doesn’t have enough internal memory to store all of it. A better analogy is that the read-only input tape is the Internet, huge. You can only store addresses of stuff and probe things. What kinds of problems can you solve, if you have just enough memory to write down the index of things? Example 18.4: path:nl We have PATH = {⟨퐺, 푠, 푡⟩ : 퐺 has a directed 푠, 푡 path} ∈ NL. 125
  • 126. Lecture 18 Notes on Theory of Computation A nondeterministic machine can put a pointer on the start node, then nondeterministically choose one of the outgoing edges from the start node. It remembers only the current node it moved to. It forgets where it came from. The machine repeats. The machine jumps node by node nondeterministically, and accepts if it hits 푡. The machine needs only enough space to remember a node, which is logarithmic space, and also space to count how many nodes it has visited, so it can quit if it has visited too many vertices. Can we solve PATH deterministically in log-space? Consider an enormous graph written down over the surface of the US, with trillions and trillions of nodes. Can you with 20 friends (or however many facebook friends you have), each just keeping track of where you are (a pointer into a location), operating deterministically, figure out whether you can get from some location to another? You can communicate by walkie-talkie (or by Internet). Nobody knows the answer. Whether PATH is solvable deterministically (PATH ∈ 퐿?) is an unsolved problem. In fact the L vs. NL problem is open just as P vs. NP is open. There are NL-complete problems. If you can solve any of them in L, then you bring all of NL down to L. PATH turns out to be complete for NL. We'll start to prove that. S3 퐿,푁퐿 ⊆ 푃 Before that, let's look at the connection between L, NL, and the classes we've already seen. Theorem 18.5: L⊆P. Proof. If 퐴 ∈L and TM 푀 decides 퐴 using 푂(log 푛) space, we have to show there is a deterministic machine that solves 퐴 in polynomial time. How many configurations does the machine have? This tells us how long the machine can go for. Fix the input 푤. A configuration of 푀 on 푤 is (푞, ℎ1, ℎ2, work tape contents). We don't include 푤 because it is read-only. The number of configurations is |푄| · 푛 · 푑 log 푛 · 푐^{푑 log 푛} = 푂(푛^ℓ) for some ℓ, since 푐^{푑 log 푛} = 푛^푘 for some 푘. No configuration can repeat, because no looping is allowed. Since the machine can have at most a polynomial number of configurations, it runs in polynomial time. We get 퐴 ∈P. The following is trickier. Theorem 18.6: NL⊆P. The previous proof would only give NL⊆NP. To get a deterministic polynomial time algorithm we need to construct a different machine. 126
  • 127. Lecture 19 Notes on Theory of Computation Proof. Given a NL TM 푁, we convert it to an equivalent polynomial-time TM 푀. How many configurations does 푁 have? Whe we count the number of configurations, it doesn’t matter if the machine is deterministic or not! A configuration is simply a snap-shot. 푀 takes all configurations and write them down, but there’s only polynomially many. 푀 =“on 푤, 1. Write all configurations of 푁 on 푤. We will treat these as the nodes of a graph, called the configuration graph. 2. Put an edge from one configuration 푐1 to another 푐2 when 푐1 leads to 푐2 in one step. Now we have a big graph of all possible configurations. We have 푐start and 푐finish (we can assume there is a single accepting configuration, that the machine clears the work tape and moves its head to the left). Now we test if there is a path from the starting to the accepting configuration. If there is a path, the nondeterministic machine accepts its input. The path gives a sequence of configurations that the nondeterministic machine goes on some path from start to accept. Conversely, if the machine does accept, there has to be a path of configurations from start to accept, so there is a sequence of edges go from start to accept. A polynomial time machine can do this test because it’s the PATH problem! Depth or breadth first search works fine. This answers whether the NL machine accepts the input. Lecture 19 Thu. 11/15/12 Last time we talked about... ∙ GG is PSPACE-complete ∙ L and NL We reduced from TQBF to GG to show GG is PSPACE-complete. Then we turned our attention to a different regime: what if we consider logarithmic space instead of polynomial space? Log space is enough to give you pointers into the input. This has a certain power which we can describe; it fits in nicely into our framework. Today we’ll talk about ∙ NL-completeness ∙ NL=coNL (this differs from what we think is true for NP) 127
  • 128. Lecture 19 Notes on Theory of Computation Recall that L=SPACE(log 푛) and NL=NSPACE(log 푛). We have a nice hierarchy: L ⊆ NL ⊆ P ⊆ NP ⊆ PSPACE. We don’t know whether these containments are proper. We can show that PSPACE and NL are different (and will eventually do so), so not everything in the picture collapses down. Most people believe that these spaces are all different; however, we don’t know adjacent inclusions are proper. However, NL=coNL shows that surprising things do happen, and we do have unexpected collapses. First let’s review a theorem from last time. Theorem (Theorem 18.6): NL⊆P. Proof. For a NL-machine 푁, a configuration of 푁 on 푤 is (푞, 푝1, 푝2, 푡). The number of configurations of 푁 on 푤 is polynomial in 푛 where 푛 = |푤| (푤 is fixed). The computation graph is the graph where ∙ nodes are configurations, and ∙ edges show how 푁 can move. Here is a polynomial time algorithm that simulates 푁. “On input 푤, 1. Construct the computation graph. 2. Test if there is a path from start to accept (using any polynomial time algorithm for PATH). 3. Accept if yes and reject if no.” 128
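As a sketch of this simulation, suppose the configuration graph has already been written down (as it can be, in polynomial time, by enumerating all tuples (푞, 푝1, 푝2, 푡)). Then step 2 is just breadth-first search; the names below are illustrative, not from the notes.

```python
from collections import deque

def nl_machine_accepts(configs, edges, c_start, c_accept):
    """Polynomial-time simulation of an NL machine on a fixed input w:
    solve PATH on its configuration graph by breadth-first search."""
    adj = {c: [] for c in configs}
    for c1, c2 in edges:
        adj[c1].append(c2)
    seen, queue = {c_start}, deque([c_start])
    while queue:
        c = queue.popleft()
        if c == c_accept:
            return True                 # some branch of the NL machine accepts w
        for nxt in adj[c]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```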
  • 129. Lecture 19 Notes on Theory of Computation S1 L vs. NL Now we turn our attention to L vs. NL. We’ll show that the situation is analogous to the situation of P vs. NP. How much space deterministically do we actually need for a NL problem? We can do it with polynomial space but that’s pretty crude. We can do much better. We have using Savitch’s Theorem that NL = NSPACE(log 푛) ⊆ SPACE(log2 푛) We stated Savitch’s Theorem for space bounds ≥ 푛; with space bounds of ≥ log 푛 the same argument goes through. No one knows whether we can reduce the exponent, or whether L=NL. (We will show that SPACE(log 푛) is provably different fron SPACE(log2 푛), using the hierarchy theorem 20.1. When we increase the amount of space/time, we actually get new stuff. But maybe some other argument could show NL⊆SPACE(log 푛).) We will show that there are NL-complete problems, an example of which is PATH. If you can solve PATH or any other NL-complete problems in deterministic log space, then it brings down everything with it to L. We’ll show everything in NL is reducible to the PATH problem. This shouldn’t be a surprise because it’s what we did in the previous theorem: whether a machine accepts is equivalent to whether there’s a path. We’ll just need to define NL-completeness in the appropriate way and then we’ll be done by the argument given in the NL⊆P theorem. Definition 19.1: 퐵 is NL-complete if 1. 퐵 ∈NL 2. Every NL-problem is log-space reducible to 퐵: for every 퐴 ∈NL, 퐴 ≤퐿 퐵. We need to define what it means to be log-space reducible. We have to be careful because the input is roughly 푛, and the output is roughly 푛. we don’t want to count the output of machine in the space bound. The input and output should be kept seprate. Definition 19.2: A log-space transducer is a Turing machine with 3 tapes, 1. input tape (read only), 2. work tape (read-write), and 3. output tape (write only), such that the space used by the work tape is 푂(log 푛) with 푛 the size of the input. We say that 푓 : Σ* → Σ* is computable in log-space if there is a log-space transducer that on input 푤, which leaves 푓(푤) on the output tape. 129
We don't use polynomial-time reducibility because with polynomial time we can already solve NL problems: the reducer could figure out whether a string is in 퐴 and then map it to a trivial yes- or no-instance of 퐵. That would not transfer any difficulty; it would just get the answer and dump it into 퐵. If we used polynomial-time reducibility, everything in NL would be NL-complete except ∅ and Σ*. We need a reduction that an L machine can compute: with a polynomial-time reduction, an L machine couldn't necessarily carry out the reduction, but with a log-space reduction it can.

We have the following analogous theorem.

Theorem 19.3: If 퐴 ≤퐿 퐵 and 퐵 ∈ 퐿 then 퐴 ∈ 퐿.

Why doesn't the same argument as for P work for L? It's a bit tricky, because we can't write down all of the output of 푓 on an L machine.

Proof. The algorithm for 퐴 is the following. “On 푤,
1. Compute 푓(푤).
2. Test if 푓(푤) ∈ 퐵.
3. Accept or reject accordingly.”

But we can't write down 푓(푤)! There's a trick that fixes this: we run the machine for 퐵 without having 푓(푤) available. Every time we need a bit of 푓(푤), we run the whole reduction from scratch, throw away all the output except the bit we're looking for, and plug that bit into the machine for 퐵. Recomputation allows us to get by with logarithmic memory.

Proposition 19.4: If 퐴 ≤퐿 퐵 and 퐵 ≤퐿 퐶 then 퐴 ≤퐿 퐶.

Proof. Use the same idea, doing the computation on the fly.

Now let's turn to NL-completeness.

S2 NL-completeness

Theorem 19.5: PATH is NL-complete.

Proof. We have to show
1. PATH ∈ NL: We already proved this (Example 18.4).
2. For every 퐴 ∈ NL, 퐴 ≤퐿 PATH. We give a generic reduction. Say that the NL machine 푁 decides 퐴; we give the reduction 푓. Given 푤, let 푓(푤) be ⟨퐺, 푠, 푡⟩ where 퐺 is the computation graph for 푁 on 푤, 푠 is 퐶start, and 푡 is 퐶accept (again, we assume 푁 cleans up its tape at the end, so that there is just one accepting configuration). Testing whether there is a path from 퐶start to 퐶accept is an instance of the PATH problem, and 푤 ∈ 퐴 iff there is a path from 푠 to 푡. So this reduction does the right thing.

We have to show we can do the reduction in log space, i.e., that 푓 is log-space computable. 푓(푤) is supposed to be a description of the nodes and edges, together with the starting and ending nodes. Split the work tape into 2 pieces, representing 2 configurations of 푁 on 푤, say 퐶1 and 퐶2. We go through all possible pairs of configurations sequentially, just like an odometer: for each possibility of 퐶2, look at all possibilities of 퐶1. For each pair we test whether 퐶1 legally yields 퐶2 in 1 step according to the rules of 푁; if so, we output an edge between them. The whole thing takes log space, because writing down 퐶1, 퐶2 takes log space. This proves 푓 is a log-space computable function, so the reduction takes log space. (A sketch of this transducer appears below.)

Note that the output depends on 푤. How? Which configurations lead to which others—it might seem like this depends only on the machine, but 푓(푤) should depend on 푤. The start configuration doesn't depend on 푤, and doesn't have 푤 built in. But when you look at whether you can transition from 퐶1 to 퐶2, the configurations have head positions as part of them, and in order to see whether 퐶1 leads to 퐶2 we have to see what's in the input cell that the head is on. Thus the edges of the graph do depend on 푤.

For homework, you need to show other problems are NL-complete. To show other problems are NL-complete, we reduce from PATH to them, just as we reduced from 3SAT to show NP-completeness. Now let's move on to this amazing result.
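Here is a sketch of that transducer. The helpers all_configs and yields_in_one_step, which enumerate 푁's configurations on 푤 and encode its transition rules, are assumptions of the sketch; the point is that only the pair (퐶1, 퐶2) currently under examination is ever held in memory.

```python
def output_configuration_graph(w, all_configs, yields_in_one_step):
    """The log-space transducer for the generic reduction A <=_L PATH:
    cycle through all pairs (C1, C2) like an odometer and output an edge
    whenever C1 legally yields C2 in one step of N on w."""
    for c1 in all_configs(w):                   # outer odometer digit
        for c2 in all_configs(w):               # inner odometer digit
            if yields_in_one_step(c1, c2, w):
                print(c1, "->", c2)             # written to the write-only output tape
```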
Let's look at the picture. 20 years ago we thought NL ≠ coNL, with L in the intersection, much like the picture we still believe for P vs. NP. However, actually NL = coNL.

Theorem 19.6: NL = coNL.

Proof. 퐴 is log-space reducible to 퐵 exactly when the complement of 퐴 is log-space reducible to the complement of 퐵, so all we need to do is show that the complement of PATH is in NL. How do we give an NL algorithm that recognizes the nonexistence of a path? What could we guess, so that if we accept at the end, there's no path? Perhaps we could guess a cut, but writing down a cut requires more than log space. The algorithm is very nonobvious; this was a prize-winning paper. We'll give the algorithm in pictures.

We have our graph 퐺, with starting and ending nodes 푠 and 푡. The idea came a little out of left field. The author's advisor asked: if you're trying to solve problems and you're given some information for free, what happens? What happens if you're given for free the number of nodes you can get to from 푠?

We first give an NL algorithm for the complement of PATH, given the number of nodes reachable from 푠. Let 푅 be the set of nodes reachable from 푠, and let 푐 be the size of 푅. Our algorithm goes through all nodes of 퐺 one by one. Every time it gets to a new node, it guesses whether the node is reachable. If it guesses the node is reachable, it proves it's right by guessing the path. (If it can't find the path, that branch dies.) If the node is reachable, some branch will guess the right path and then move on. We keep track of how many reachable nodes we've found. When we're done, if the count equals 푐, then we've guessed right all the way along: we've found all the reachable nodes, and all the ones that we guessed to be nonreachable really are nonreachable. If 푡 was never guessed to be reachable, then 푡 is nonreachable!

Now we've reduced the problem to computing 푐. We compute it using the same technique. We layer the graph into 푅0, 푅1, 푅2, . . . where 푅푖 = the set of nodes reachable from 푠 by a path of length ≤ 푖. Note 푅0 ⊆ 푅1 ⊆ · · · ⊆ 푅푚 = 푅, because 푚 (the number of nodes) is the maximal possible number of steps you need to reach any node. Let 퐶푖 = |푅푖|.
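As a deterministic illustration of this layering (not the NL algorithm itself, which cannot afford to store whole sets and instead guesses, as described next), here is a sketch computing the counts 퐶푖 = |푅푖| layer by layer.

```python
def layer_counts(adj, s, m):
    """Compute C_0, C_1, ..., C_m where C_i = |R_i| and R_i is the set of nodes
    reachable from s by a path of length at most i.  R_0 = {s}, and R_{i+1} is
    R_i together with everything one edge away from R_i."""
    R = {s}
    counts = [len(R)]                               # C_0 = 1
    for _ in range(m):
        R = R | {v for u in R for v in adj[u]}      # R_{i+1} from R_i
        counts.append(len(R))                       # C_{i+1}
    return counts                                   # counts[m] = |R| = c
```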
We will show how to compute 퐶푖+1 from 퐶푖. Then we can throw away the previous 퐶-value. So we can get the count of nodes reachable in any number of steps, and we're done.

We need a way of testing whether a node is in 푅푖+1: nondeterminism will either get the value correctly, or that branch of the nondeterminism will fail. Some branch will have guessed everything correctly along the way. Each time we're testing whether a node 푣 is in 푅푖+1, we go through all the nodes, guessing which are in 푅푖 and which are not. If we guess a node is in 푅푖, we prove it is by guessing a path to it of length at most 푖. We keep a count of the number of nodes found to be in 푅푖 and make sure it equals 퐶푖 at the end. Along the way we check whether any of these nodes connects to 푣. Now iterate over 푖 = 0, . . . , 푚 − 1.

Lecture 20
Tue. 11/20/12

Last time we showed:
∙ PATH is NL-complete
∙ NL = coNL
Today we'll talk about the time and space hierarchy theorems.

S0 Homework

Problem 3: Show a language is in L. If you just try to do it with your bare hands, it's a mess. But if you use the methodology we talked about in lecture, it's a 1-2 line proof. Don't just dive in; use a technique we introduced to make it simpler.

Problem 6: Here the satisfiability problem is made easier: the clauses have at most 1 negated literal per clause (for instance, (¬푥 ∨ 푦1 ∨ · · · ∨ 푦푘)), and that negated literal cannot appear anywhere else. This turns out to be solvable in NL, and to be NL-complete. As a hint, (¬푎 ∨ 푏) is equivalent to 푎 → 푏. Thus we can rewrite (¬푥 ∨ 푦1 ∨ · · · ∨ 푦푘) as (푥 → (푦1 ∨ · · · ∨ 푦푘)).
  • 134. Lecture 20 Notes on Theory of Computation This suggests you think of the problem as a graph which represents the formula in some way. The nodes are the clauses, and have an edge going from (푥 → 푦1 ∨ · · · ∨ 푦푘) to a clause containing 푦1 on the left-hand-side of an implication. If 푥 is true, one of 푦1, . . . , 푦푘 is true; then following an edge we get to one of the 푦푘. Think of this as a connectivity problem in a graph. For the reduction, we want to reduce the graph to the restricted satisfiability problem. We can just reduce from graphs that don’t have any cycles in them. Reduce a path problem to the satisfiability problem, using a construction inspired by the above. The construction requires the starting graph not to have cycles. You have to remove the cycles because they cause problems. The acyclic path problem is still NL-complete; this is in the textbook. Use the technique of level graphs, explained below. To show PATH ≤퐿 acyclic-PATH, take your graph and convert it to a bunch of copies of the vertex set, where an edge from 푎 to 푏 now goes from 푎 in one layer to 푏 in the next layer down. There are no backward-going edges so there are no cycles. But if we had an edge from 푠 to 푡, there is still an edge from 푠 to 푡 in modified graph. S1 Space hierarchy We’ve wrapped up the basics on complexity classes for time and space. We’ll now talk about a pair of theorems that relate to both time and space. The hierarchy theorems have a very simple message. With respect to time and space (let’s think of time for the moment), and if you have a certain amount of time you’re allowing the machine, then if you increase the time, you’d expect there’s more stuff the machine you could do (say 푛3 instead of 푛2). For this question the answer is known: if you give the machine a little more time or space, it can do more things. In particular, the theorem tells you there are decidable languages that are not in P. So far we have L ⊆ NL ⊆ P ⊆ NP ⊆ PSPACE. Even 퐿 ?= 푃 is open, and 푃 ?= PSPACE is open. However, 퐿, PSPACE are provably different, so we can’t have both 퐿 = 푃 and 푃 = PSPACE. There are separations out there, which we don’t know how to prove. The general belief is that all of these are separate. We can actually prove something stronger. We have by Savitch’s Theorem that NL ⊆ SPACE(log2 푛) ⊂ PSPACE, the inclusion proper by Space Hierarchy. We know 푁퐿̸= PSPACE, but nothing stronger is known. Theorem 20.1 (Space Hierarchy Theorem): thm:space-hierarchy For functions 푓, 푔 : N → N where 134
1. 푓 is space constructible: it can be computed in 푓(푛) space. (This is a technical condition that all normal functions will satisfy.)
2. 푔(푛) = 표(푓(푛)),
then there is a language 퐴 ∈ SPACE(푓(푛)) with 퐴 ∉ SPACE(푔(푛)).

(Note 푔(푛) = 표(푓(푛)) means 푔(푛) < 푐푓(푛) for every constant 푐 > 0, if you make 푛 large enough. In other words, 푓(푛) dominates 푔(푛) for large enough 푛.)

We will find some language 퐴 in SPACE(푓(푛)) and not in SPACE(푔(푛)), to prove this. For instance take 푔(푛) ∼ 푛^2 and 푓(푛) ∼ 푛^3: we can do something in 푛^3 space that we can't do in 푛^2 space.

The space hierarchy theorem has a slightly easier proof than the time hierarchy theorem. What are we going to do? I'll tell you what we're not going to do. It would be nice if the language were some nice language, understandable as a string manipulation with 푓 as a parameter somewhere. Rather, it will be an artificial, concocted language designed specifically to meet the conditions that we want; we won't be able to understand it simply otherwise. Later on we'll find more natural languages that take a lot of space. The machine operates in space 푓(푛) and, by design, makes sure its language can't be done in less space. It simulates all smaller-space machines and acts differently from them. Really it amounts to a diagonalization: we build something different from everything in some list.

Let's review diagonalization. To prove R is uncountable, given a list of real numbers, we make a number differing from everything in the list in at least one digit (Theorem 8.7). To show 퐴푇푀 is undecidable, we make a machine that looks at what 푀푖 does on ⟨푀푖⟩ and does the opposite (Theorem 8.10). Its language 퐷 is a new language that can't be the language of any Turing machine on the list, a contradiction.

Our proof is similar in spirit. Think of the 푀푖 as the machines that operate in space 푔(푛) where 푔(푛) = 표(푓(푛)), the small-space machines. 퐷 does something different from what each 푀푖 does, so 퐷 can't be a small-space machine. However, 퐷 is decidable: testing whether the 푀푖 accept their input takes small space, and our language is decidable in just a little more space. We have to be careful in our analysis to show 퐷 can decide the diagonal in just a little more space; by construction it can't do the tests in small space, but it can do them in more space, 푓(푛).

Proof. We give a Turing machine (decider) 퐷 where 퐴 = 퐿(퐷) and 퐷 runs in space 푂(푓(푛)). This gives 퐴 ∈ SPACE(푓(푛)). Our algorithm for 퐷 will show that 퐴 is not solvable in smaller space, 퐴 ∉ SPACE(푔(푛)). Our first try is the following. Let 퐷 =“on input 푤 (of length 푛):
  • 136. Lecture 20 Notes on Theory of Computation 1. Compute 푓(푛) and mark off 푓(푛) tape cells. (If the machine ever needs to use more space, then reject.) 12 2. If 푤̸= ⟨푀⟩ for some TM 푀, then reject. If 푤 = ⟨푀⟩, continue; 퐷 will try to be different from 푀. 3. Run 푀 on 푤 and do the opposite (this is an oversimplication; we have to make some adjustments).” Modulo a little engineering, this is our description of 퐷. Conceptually, this is the whole proof. But we might not finish, 푀 might take more space than we allocated, in which case 퐷 ends up rejecting. Is that a problem? We only have an obligation to be different from the small-space machines. We’ll be able to run small spaces to completion. Our language is different from what those languages are. This is a bit of a cheat. There are 2 critical flaws in the argument. I claimed that if 푀’s computation doesn’t fit in the space, I don’t have to worry about it. That’s not true. It could be the machine uses lots of space on small input, but on large input, it uses space 표(푓(푛)). We have 푔(푛) 푓(푛) for a particular 푤 (but not asymptotically)—we had one chance to be different from that machine, and we’ve blown it. No one tells us the constant factor. This problem seems more serious! We want to run 푀 on a bigger 푤. We don’t what 푤 we need, but big enough so the asymptotics kick in. Thus we’ll pad it in all possible ways; we’ll have infinitely many chances to be different. We change the above as follows. let’s strip off trailing 0’s and see if the remainder is a Turing machine. We could have a string with billions of 0’s, run on some relatively small Turing machine. Let 퐷 =“on input 푤 (of length 푛): 1. Compute 푓(푛) and mark off 푓(푛) tape cells. (If the machine ever needs to use more space, then reject.) 2. If 푤̸= ⟨푀⟩ 0* for some TM 푀, then reject. If 푤 = ⟨푀⟩ 0*, continue; 퐷 will try to be different from from 푀. 12We use the technical condition that 푓(푛) can be computed in 푓(푛) space; the machine needs to understand how much extra space it got in order to do something new to it. There is a counterpart to the theorem: we can construct gaps in hierarchy where nothing new from 푔 up to 푓, by constructing 푓 so complicated, that we can’t compute 푓 in 푓 space. This is the gap theorem. There is one gap you can describe easily; log-log-space. There is a gap between constant space and log-log-space. Nothing nonregular is in 표(log log 푛) space. 136
3. Run 푀 on 푤 and do the opposite.”

This allows 퐷 to run 푀 on very long inputs, which solves one problem. But it's possible that 푀 on 푤 goes forever. It can only do so in a particular way: using a small amount of space. If 퐷 blindly simulates, it is going to loop. The amount of time a machine can run on that amount of space without getting into a loop is exponential in the space, just 2^푓(푛). Thus we run a counter to count up the number of steps taken. The counter takes only a constant factor more space; put the counter out to the right, or think of it as running on a separate track below. If we exceed the time bound, then reject. By the asymptotics, for large enough 푛 we will run 푀 to completion on some input and be different. Let 퐷 =“on input 푤 (of length 푛):
1. Compute 푓(푛) and mark off 푓(푛) tape cells. (If the machine ever needs to use more space, then reject.)
2. If 푤 ≠ ⟨푀⟩0* for some TM 푀, then reject. If 푤 = ⟨푀⟩0*, continue; 퐷 will try to be different from 푀.
3. Run 푀 on 푤 and do the opposite.
(a) Reject if the simulation exceeds 2^푓(푛) time.”

Constructibility works down to log 푛 (we have to work with the special model for sublinear space).

S2 Time Hierarchy Theorem

The issue of the overhead becomes more of a concern.

Theorem 20.2: If 푓 is time-constructible and 푔(푛) = 표(푓(푛)/log 푛), then there exists 퐴 ∈ TIME(푓(푛)) with 퐴 ∉ TIME(푔(푛)).

In the interest of time (pun intended) we'll just sketch the proof. The idea is the same.

Proof. Let 퐷 =“on input 푤 of length 푛,
1. Compute 푓(푛). Start a counter (“timer”) set to 푓(푛).
2. If 푤 ≠ ⟨푀⟩0* for some TM 푀, then reject.
3. Run 푀 on 푤 and do the opposite (provided it runs within the time on the counter).
  • 138. Lecture 21 Notes on Theory of Computation We have to be careful. Every time we do a step, we refer back to 푀. The overhead, if we’re not careful, will be bad. We only have an extra factor of log 푛 to work with. We extend our tape alphabet so that every tape cell has enough space to write 2 symbols. We’ll keep the description of 푀 on the tape: Like checking out book from the library, we’ll take 푀 and carry it with us. More complicated is the counter. 푀 is constant size thing. The counter is not constant in size; it grows with 푛, hence is logarithmic in size. This contributes log 푛 overhead. Lecture 21 Tue. 11/27/12 Last time we talked about hierarchy theorems. If we allow a bit more time or space, then there are more things we can do. Today we’ll talk about ∙ natural intractable problems ∙ Relativization, oracles S1 Intractable problems Definition 21.1: Define EXPTIME = ⋃︁푘 TIME(2푛푘 ) EXPSPACE = ⋃︁푘 SPACE(2푛푘 ). (Think of it as 2poly(푛).) The hierarchy theorems show that there are things we can do in exponential time that we can’t do in polynomial time, and the same is true for space. We have proper inclusions. 푃 ⊂ EXPTIME, PSPACE ⊂ EXPSPACE. We found 퐴 ∈ EXPSPACE∖PSPACE. This was a language that the hierarchy machine produced for us. It decides in such a way that makes it provable different. 퐴 is by design not doable in polynomial space, because it diagonalizes over all polynomial space machines. 138
  • 139. Lecture 21 Notes on Theory of Computation But 퐴 is an unnatural language; it has no independent interest; it just came out for sake of proving the hierarchy theorem. We’d like to prove some more natural language is in EXPSPACE∖PSPACE. To do this we turn to completeness. We’ll introduce an exponential space complete problem, in the same spirit as our other complete problems. Everything in the class reduces to it. It cannot be in polynomial space because otherwise PSPACE=EXPSPACE. Because 퐴 is outside PSPACE, the classes are different, and the exponential space complete problem must also be outside. The language is a describable language. We can write it in an intelligible way. It’s a toy language. There are languages that people are more interested in that have completeness properties. Our problem will illustrate the method, which we care about more than the results. This is like the Post Correspondence Problem. Other languages are less convenient to work with. Definition 21.2: A language is intractable if it is provably outside of 푃. Example 21.3: Here’s a problem that mathematicians care about. Remember that we talked about number theory: we can write down statements. Consider a statement of number theory with quantifiers and statements involving only +. Chapter 6 gives an algorithm for testing whether such statements are true of false. It’s a beautiful application of finite automata. The algorithm is very slow; it repeatly involves converting a NFA to a DFA, which is an exponential blowup. The algorithm runs in time 22. . . , a tower whose length is about the length of the formula. It can be improved to double exponential. Is there a polynomial time algorithm? No, it’s complete for double exponential time; it provably cannot be improved. We’ll give the flavor of how these things go, by giving an exponential problem that’s more tailored to showing it’s complete. That’s the game plan. 1.1 EQREX First we consider a simpler problem. Definition 21.4: Define EQREX = {⟨푅1,푅2⟩ : 푅1,푅2 are regular expressions and 퐿(푅1) = 퐿(푅2)} . 139
  • 140. Lecture 21 Notes on Theory of Computation This can be solved in polynomial space (it’s a good exercise). We can convert regular expressions to NFAs of about the same size. Thus we can convert the problem to testing whether two NFA’s are equivalent. We’ll look at the complementary problem, the inequiva-lence problem, show that is in PSPACE. We show EQREX is in NPSPACE and use Savitch’s Theorem 16.13. The machine has to accept if the strings are not equivalent. We’ll guess the string on which they give a different answer. If one machine is in an accepting state on one and the other machine not in an accepting state on any possibility, we know the machines are not equivalent, and we can accept. Does this also show the inequivalence problem is in NP? Why not? Why can’t we use the string as the witness, that’s accepted by one machine to another? The mismatch could be a huge string that is not polynomially long. The first string on which differ could be exponentially long. To use polynomial space, we modify our machine so it guesses symbol by symbol, and simulates the machine on the guessed symbols. A variant of this problem is not in PSPACE. For a regular expression 푅, let 푅푘 = 푅· · ·푅 ⏟ ⏞ 푘 . Imagining 푘 is written down as a binary number, we could potentially save a lot of room (save exponential space) by using exponen-tiation. We’ll talk about regular expressions with exponentiation. Definition 21.5: Define EQREX↑ = {⟨푅1,푅2⟩ : 푅1,푅2 are regular expressions with exponentiation and 퐿(푅1) = 퐿(푅2)} . Definition 21.6: We say 퐵 is EXPSPACE–complete if 1. 퐵 ∈EXPSPACE. 2. for all 퐴 ∈EXPSPACE, 퐴 ≤푃 퐵.13 We show the following. Theorem 21.7: EQREX↑ is EXPSPACE–complete. Proof. First we have to show EQREX↑ is in EXPSPACE. If we have regular regular expres-sions, we know it’s in polynomial space; using the Savitch’s Theorem trick we argued at the beginning of class that it’s doable in polynomial space. For regular expressions with exponentiation, expand each concatenation. This blows up the expression by at most an exponential factor. Now use the polynomial algorithm in the exponentially larger input. The claim follows immediately. 14 13Why polynomial time reduction, not polynomial space reduction? Reductions are usually doable in log space; they are very simple transformations relying on repeated structure. Cook-Levin could be done in log-space reduction. If weaker reductions already work, there’s no reason to define a stronger one. 14If we allow complements in the expression, we’re in trouble. The algorithm doesn’t work for complements. If we have complementation we have to repeatly convert NFA’s to DFA’s to make everything work out. 140
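Here is a sketch of the symbol-by-symbol guessing idea for NFA inequivalence described above, with the nondeterministic guesses modeled by random choices (a single run is one branch). Each NFA is modeled as (start states, accepting states, transition function); only the two current state sets are stored, never the guessed string itself.

```python
import random

def one_branch_of_inequivalence_test(nfa1, nfa2, alphabet, max_steps):
    """Guess a distinguishing string one symbol at a time, tracking only the sets
    of states each NFA could currently be in; accept (return True) on this branch
    if the guessed prefix is accepted by exactly one of the two NFAs."""
    (S1, A1, d1), (S2, A2, d2) = nfa1, nfa2
    cur1, cur2 = set(S1), set(S2)
    for _ in range(max_steps):
        if bool(cur1 & A1) != bool(cur2 & A2):
            return True                                  # the NFAs disagree on this prefix
        a = random.choice(alphabet)                      # guess the next symbol
        cur1 = {q for p in cur1 for q in d1(p, a)}
        cur2 = {q for p in cur2 for q in d2(p, a)}
    return bool(cur1 & A1) != bool(cur2 & A2)
```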
  • 141. Lecture 21 Notes on Theory of Computation Now we show EQREX↑ is EXPSPACE–complete. Let 퐴 ∈EXPSPACE be decided by TM 푀 in space 2푛푘 . We give a reduction 푓 from 퐴 to EQREX↑ sending 푤 to 푓(푤) = ⟨푅1,푅2⟩, defined below. Let Δ be the computation history alphabet. Let ∙ 푅1 be just all possible strings over some alphabet, Δ*, and ∙ 푅2 be all strings except rejecting computation histories for 푀 on 푤. If 푀 rejects 푤, there a is rejecting computation history. Then 푅2 will be all strings except for that one string, and the regular expressions will not be equivalent, 푅1̸= 푅2. If 푀 accepts 푤, then there are no rejecting computation histories, and 푅1 = 푅2. How big are 푅1 and 푅2 allowed to be? They have to be polynomial in the size of 푤. How big can 푅2 be? 푤 already has exponential space, so the string is double-exponentially big. The challenge is how to encode: how to represent the enormous objects even though you yourself are very small. We construct 푅2 as follows. 푅2 is supposed to describe all the junk: every string which fails to be a computation history (because it’s scribble). We look at all the possibilities that 푅2 can fail; we have to describe all failure modalities. We’ll write 푅2 = 푅bad-start ∪ 푅bad-reject ∪ 푅bad-compute. The beginning is bad, the end is bad, or somewhere along the line we made a mistake moving from one configuration to the next. A computation history looks like 퐶start#퐶1#· · ·#퐶reject. The configurations are 2푛푘 big, because we have to write down the the tape of the machine. Assume they are padded to the same length, so 퐶start = 푞푤1 · · ·푤푛 · · · . 푅bad-start: We describe 푅bad-start as all words which don’t have first symbol 푞0, or 2nd symbol 푤1, and so forth, so everything that doesn’t start with 푞푤1 · · ·푤푛 · · · . To start, let 푅bad-start = (Δ − 푞0)Δ* ∩ Δ(Δ − 푤1)Δ* ∩ Δ2(Δ − 푤2)Δ* ∩ · · · ∩ Δ푛(Δ − 푤푛)Δ* ∩ · · · (Technically we have to write out Δ − 푞0 as a union. This is shorthand. It’s not a regular expression as we wrote it, but we can easily convert it.) Now we have to deal with the blanks. This is a little of a pain. Naively we have to write down an exponential number of expressions Δ푖(Δ − )Δ*. We do a bit of regular expression hacking. We let 푅bad-start = (Δ − 푞0)Δ* ∩ Δ(Δ − 푤1)Δ* ∩ Δ2(Δ − 푤2)Δ* ∩ · · ·Δ푛(Δ − 푤푛)Δ* ∩Δ푛+1(Δ ∪ 휀)2푛푘 −(푛+1)(Δ − )Δ* ∩ 2푛푘 (Δ − #)Δ*. (Any string that starts with 푛 + 1 to 2푛푘 symbols followed by a non-blank is a bad starting string.) Note that 2푛푘 can be written down with 푛푘 + 1 bit 141
  • 142. Lecture 21 Notes on Theory of Computation 푅bad-reject: Let 푅bad-reject = (Δ − 푞rej)*. 푅bad-compute: For 푅bad-compute, we need to describe all possible errors that can happen, Δ* (error)Δ*. An error means we have a bad window; we have an incorrect window 푑푒푓 following 푎푏푐 in the same position. Thus we let ⋃︁ 푎푏푐푑푒푓 illegal window Δ*(푎푏푐Δ2푛푘 −2푑푒푓)Δ*. Note that this is a constant-size union independent of 푛; it is at most size |Δ ∪ 푄|6. We’re done! 푅 is a polynomial time regular expression with exponentiation. We proved this language is not in PSPACE, hence not in P, hence truly intractable. Can we use the same method to show the satisfiability problem is not in 푃? That would show P=NP. There is a heuristic argument that shows this method will not solve the P vs. NP problem. This has to do with oracles. The moral of the story is that this method, which is very successful in showing a language outside of P, is not going to show SAT is outside of P. S2 Oracles Sometimes we want to think of a Turing machine that operates normally, but is allowed to get a certain free language. The machine is hooked up to a black box, the oracle, which is going to answer questions whenever the machine decides to ask one, about whether a string is in the language. Definition 21.8: An oracle for a language 퐴 is a machine (black box) that answers ques-tions about what is in 퐴 for free (in constant time/space). 푀퐴 is a TM with access to an oracle for 퐴. Let P퐴 be the languages decidable in polynomial time with oracle 퐴, and define NP퐴 in the languages decideable in nondeterministic polynomial time with oracle 퐴. Let’s look at some examples. A handy oracle is an oracle for SAT. Example 21.9: 푃SAT is the class of languages that you can solve in polynomial time, with the ability to ask whether any expression is in SAT. Because SAT is NP–complete, this allows you to solve any NP problem: NP ⊆ 푃SAT. Given a language in NP, first compute a polynomial reduction to SAT, and then ask the oracle whether the formula is true. We also have coNP ⊆ 푃SAT, 142
  • 143. Lecture 22 Notes on Theory of Computation because 푃SAT, a deterministic class, is closed under complement. This is called computation relative to SAT. The general concept is called relativization. Whether NPSAT ?= 푃SAT is open. However, we do know the following. Theorem 21.10: For some 퐴, P퐴 = NP퐴. For some other 퐵, P퐵̸= NP퐵. We’ll prove the first fact, and then see the important implications. Proof. Let 퐴 =TQBF (or any PSPACE–complete problem). Now because TQBF is in PSPACE, the machine can answer can answer the question without the oracle, we can eliminate the oracle. NPTQBF ⊆ NPSPACE Savitch = PSPACE ⊆ 푃TQBF, the last because TQBF is PSPACE–complete. Here is the whole point of why this is interesting. Suppose we can prove P̸=NP using essentially the technique for the first 2 3 of the lecture: hierarchy theorem and a reduction. At first glance that’s possible. But diagonalization at its core is one machine simulating another machine, or a variety of different machines. Notice that simulation arguments would still work in the presence of an oracle. We give both the simulating machine and simulated machine the same oracle; the proof goes through. The simulating machine can also ask the same oracle. Suppose we have a way of proving P̸=NP with simulating. Then we could prove 푃퐴̸= 푁푃퐴 for every oracle 퐴. But this is false! We know 푃퐴 = 푁푃퐴 for certain oracles! This simple-minded approach doesn’t work. A solution to 푃 ?= 푁푃 cannot rely on simulating machines alone, because if it did, by relativization the proof would show that the same is true with any oracle. Lecture 22 Thu. 11/29/12 Last time we talked about ∙ EQREX↑ is EXPSPACE-complete. 143
  • 144. Lecture 22 Notes on Theory of Computation ∙ Oracles We gave an example of a provably intractable language, and concluded the same technique can’t be used to prove P ? =NP (relativization). Today we’ll look at a different model of computation that has important applications. We allow Turing machines to access a source of randomness to compute things more quickly then we might otherwise be able to do. We’ll talk about ∙ Probabilistic computation and BPP ∙ Primality and branching programs S1 Primality testing We’ll use primality testing as an example of a probabilistic algorithm. Let PRIMES = {푝 : 푝 is a prime number in binary} . We have PRIMES∈coNP (easy). We can write down a short proof in elementary number theory that PRIMES∈coNP. A big breakthrough in 2002 showed PRIMES∈P. We’ll give a probabilistic, polynomial-time algorithm for PRIMES. We’ll just sketch the idea, without going through the details. It is probabilistic in the sense that for each input the running time is polynomial, but there is a small chance that it will be wrong. We need the following. Theorem 22.1 (Fermat’s little theorem): For any prime 푝 and 푎 relatively prime to 푝, 푎푝−1 ≡ 1 (mod 푝). This comes from the abstract algebra fact that if you raise the element of a finite group to the size of the group you get the identity. For example, if 푝 = 7 and 푎 = 2, then 26 = 64 ≡ 1 (mod 7). In contrast, if you take 푝 = 9, 푎 = 2, then 28 ≡ 256 ≡ 4̸≡ 1 (mod 9). We have just given a proof that 9 is not a prime number: 9 does not have a property that all prime numbers are. However, this proof does not tell you what the factors are. (So primality testing may not help you do any factoring.) Suppose 푎푝−1 (mod 푝)̸= 1 for 푝 not prime. This gives an easy test for primality. Unfor-tunately, this is false. An example is 푝 = 561 = 3 · 11 · 17. We have 2560 ≡ 1 (mod 561). We’re going to look at something which is still false, but closer to being true. Suppose for 푝 not prime, 푎푝−1 (mod 푝)̸= 1 for most 푎 푝. This would not necessarily give a polynomial time algorithm, because it might give the wrong answer. But you can pick random 푎’s; each time you pick an 푎, you have a 50-50 chance of getting a result which is not 1. To test if 푝 is a prime number, test a hundred random 푎’s. If you run 100 times and fail, the number is probably prime. But this is also false. For 561, it fails for all 푎 relatively prime to 푝. This test ignores Carmichael numbers, which masquerade for primes. 144
But let's assume our heuristic is true. Then this test works. Let's write the algorithm down. Here is a probabilistic algorithm assuming the heuristic. “On input 푝,
1. Pick 푎1, . . . , 푎푘 < 푝 at random. (푘 is the amplification parameter, which allows us to adjust the probability of error.)
2. Compute 푎푖^(푝−1) (mod 푝) for each 푖.
3. Accept if all results equal 1, and reject if any result is not 1.”

With our assumption, if 푝 is prime, 푃(accept) = 1: if we have a prime number we always get 1, by Fermat's little theorem. But if 푝 is composite, then the probability of accepting is going to be small (under the false assumption): 푃(accept) ≤ 2^−푘. It's like flipping a coin each time you pick an 푎. This is our motivating example for the following definition.

S2 Probabilistic Turing Machines

We set up a model of computation—probabilistic Turing machines—which allows us to talk about complexity classes for algorithms like this.

Definition 22.2: A probabilistic Turing machine is a type of NTM where we always have 1 or 2 possible moves at each point. If there is 1 move, we call it a deterministic move, and if there are 2 moves, we call it a coin toss. We have accept and reject possibilities as before. We consider machines which run in time poly(푛) on all branches of their computation.
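Here is the algorithm above as a short Python sketch, using fast modular exponentiation; it is only correct under the (false in general, because of Carmichael numbers) heuristic discussed above.

```python
import random

def fermat_test(p, k=100):
    """Accept p if a^(p-1) = 1 (mod p) for k random values a < p.
    k is the amplification parameter; primes are always accepted,
    and under the heuristic a composite is accepted with probability <= 2^(-k)."""
    if p < 4:
        return p in (2, 3)
    for _ in range(k):
        a = random.randrange(2, p)          # pick a random a < p
        if pow(a, p - 1, p) != 1:           # Fermat's little theorem fails
            return False                    # definitely composite
    return True                             # probably prime (modulo Carmichael numbers)
```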
Definition 22.3: For a branch 푏 of 푀 on 푤, we say the probability of 푏 is 푃(푏) = 2^−ℓ, where ℓ is the number of coin-toss moves in 푏. We have
푃(푀 accepts 푤) = Σ_{푏 an accepting branch} 푃(푏).
This is the obvious definition: what is the probability of following 푏 if we actually tossed coins at each coin-toss step? At each such step there is a 1/2 chance of going off 푏.

The machine accepts each input with a certain probability: it might accept some inputs with probability 99%, others with 0%, 2%, or 50%. We want to say that the machine does the right thing on every input, but with a small probability of failing (the error).

Definition 22.4: For a language 퐴, we say that a probabilistic TM 푀 decides 퐴 with error probability 휀 if for 푤 ∈ 퐴, 푃(푀 accepts 푤) ≥ 1 − 휀, and if 푤 ∉ 퐴, then 푃(푀 rejects 푤) ≥ 1 − 휀 (i.e., it accepts with small probability, 푃(푀 accepts 푤) ≤ 휀).

For instance, if a machine decides with 1% error, then it accepts things in the language with 99% probability. There is a forbidden behavior: the machine is not allowed to be unsure, for instance accepting/rejecting an input with probability 1/2. It has to lean overwhelmingly one way or the other.

How overwhelming do you want it to be? We have a parameter 푘, which we can apply universally to adjust the error probability. By repeating an algorithm many times, we can decrease the error.

Lemma 22.5 (Amplification lemma): For a probabilistic Turing machine 푀 with error probability 휀, where 0 ≤ 휀 < 1/2, and any polynomial 푝(푛), there is a probabilistic Turing machine 푀′ equivalent to 푀 with error probability 2^−푝(푛).

Not only can we make the error probability small, we can make it decrease rapidly in terms of 푛.

Proof sketch. 푀′ on 푤 runs 푀 on 푤 poly(푛) times and outputs the majority answer.

This motivates the following important definition of a complexity class.

Definition 22.6: Define BPP = {퐴 : some probabilistic poly-time TM decides 퐴 with error probability 1/3}.

BPP stands for bounded probabilistic polynomial-time.
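The amplification proof sketch is easy to picture in code: run the base machine many times and take the majority vote. In this sketch, base_decider is an assumed stand-in for a probabilistic decider with error 휀 < 1/2.

```python
import random

def amplified(base_decider, w, trials=101):
    """Majority vote over independent runs of a probabilistic decider.
    If each run errs with probability eps < 1/2, a Chernoff-style argument
    shows the majority errs with probability exponentially small in trials."""
    yes_votes = sum(1 for _ in range(trials) if base_decider(w))
    return yes_votes > trials // 2
```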
Here, bounded means bounded strictly below 1/2. The 1/3 looks like an arbitrary number, but it doesn't matter: once you have such a TM you can make the error probability 1/10^100 if you want. All you need about 1/3 is that 1/3 < 1/2.

We can prove PRIMES ∈ BPP by souping up the algorithm we described appropriately. Now we know PRIMES ∈ P. Obviously P ⊆ BPP (a P algorithm gives the right answer with error 0). We still don't know whether P = BPP. In fact most people believe P = BPP, because of pseudorandomness. If there were some way to compute the value of a coin toss in a way that acts as well as a truly random coin toss, then with a bit more work one could prove P = BPP. A lot of progress has been made constructing pseudorandom generators, but they require assumptions such as P ≠ NP.

S3 Branching programs

We turn to a bigger example of a problem in BPP that has a beautiful proof. It involves an important idea that turned out to be revolutionary in complexity theory. We need to define our problem.

Definition 22.7: A branching program (BP) is a directed graph labeled with variable names (possibly repeated) such that the following hold.
∙ Every node has a label and has 2 outgoing edges, labeled 0 and 1, except for two special nodes at the end.
∙ The 2 special nodes are 0 and 1. (Think of them as the output.)
∙ There is a special start node, and there are no cycles.

To use a branching program, make an assignment of the variables to 0's and 1's. Once you have the assignment, put your finger on the start node. Look at the variable at that node, read the variable's value, and follow the 0 or 1 edge out. An assignment of the variables will eventually take you to an output node 0 or 1; that is the output of the program.

Here is a branching program. It computes the exclusive or function.

We want to test whether two different-looking branching programs are equivalent: whether they compute the same function.

Definition 22.8: Define the equivalence problem for BP's by
EQBP = {⟨퐵1, 퐵2⟩ : 퐵1, 퐵2 are BP's and compute the same function}.

This is in coNP: when two BP's are not equivalent, we can give an assignment on which they differ. So EQBP ∈ coNP; in fact it is coNP–complete. There's not much more we can say without radical consequences for other things.

We consider a special case that disallows a feature that our first BP has: we disallow reading the same variable twice on any path. Once we've read 푥1, we can't read 푥1 again.
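Here is a small sketch of how a branching program can be evaluated on a 0/1 assignment by following edges. The dictionary encoding, and the particular xor program shown, are illustrative assumptions (the figure's exact program is not reproduced here).

```python
def eval_bp(bp, start, assignment):
    """bp maps a node name to (variable, successor on 0, successor on 1);
    the two output sinks are the strings "0" and "1"."""
    node = start
    while node not in ("0", "1"):
        var, succ0, succ1 = bp[node]
        node = succ1 if assignment[var] else succ0
    return int(node)

# One possible encoding of the xor (parity) program x1 XOR x2:
xor_bp = {"x1":  ("x1", "x2a", "x2b"),
          "x2a": ("x2", "0", "1"),     # reached when x1 = 0
          "x2b": ("x2", "1", "0")}     # reached when x1 = 1
# eval_bp(xor_bp, "x1", {"x1": 1, "x2": 0}) == 1
```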
Definition 22.9: In a read-once BP, each 푥푖 can appear at most once on any path from the start to the output.

Let's look at the problem
EQROBP = {⟨퐵1, 퐵2⟩ : 퐵1, 퐵2 are read-once BP's and compute the same function}.

This is in coNP, but it's not known to be coNP-complete. (It is not known to be in P, but it is known to be in BPP, so it is probably not coNP–complete.) Our main theorem is the following.

Theorem 22.10: EQROBP ∈ BPP.

Our first approach is to run the 2 BP's on random inputs. But that's not good enough to give a BPP algorithm: we can only run on polynomially many out of exponentially many input values and see if the programs ever do something different. But you can construct branching programs 퐵1 and 퐵2 that agree everywhere except at 1 place. They are obviously not equivalent, but if you run them on random inputs, the chance of finding that disagreement is low. Even if you run polynomially many times, you're likely not to see the disagreement, and you would wrongly think they are equivalent. We need to make the chance of finding a disagreement at least 2/3, or some fixed value greater than 1/2.

Instead we'll do something totally crazy. Instead of setting the 푥푖's to 0's and 1's, we'll set them to other values: 2's, 3's, etc. What does that mean? The whole problem is to define it. We extend the computation in some algebraic way to apply to nonboolean inputs, so that a single difference gets magnified into an overwhelming difference. This is worth doing, because the mathematical ideas behind the proof are important. We'll give a taste of the proof, and finish it next time.

Now 푥1 could be given the value 2. We'll blend 0's and 1's together. This uses the following important technique, called arithmetization. We want to convert a Boolean model of computation into arithmetic operations that simulate the boolean ones. For instance consider ∧, ∨. We want to simulate these using arithmetic operations +, × that act on boolean variables the same way:
푎 ∧ 푏 → 푎푏
¬푎 → 1 − 푎
푎 ∨ 푏 → 푎 + 푏 − 푎푏.

Our first step is to convert the branching program, writing it out in terms of and's, or's, and negations; we express the program as a circuit in terms of and's and or's. Then we convert to +'s and ×'s, so that the program still simulates faithfully when given boolean inputs, but now has meaning for nonboolean inputs. That's the whole point. There is analysis that we have to work through, but this sets the stage.
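The three arithmetization rules are one-liners; all that matters is that they agree with the boolean operations whenever the inputs are 0 or 1.

```python
def AND(a, b): return a * b            # equals a AND b on 0/1 inputs
def NOT(a):    return 1 - a            # equals NOT a on 0/1 inputs
def OR(a, b):  return a + b - a * b    # equals a OR b on 0/1 inputs
                                       # (just a + b when a and b are never both 1)
```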
  • 150. Lecture 23 Notes on Theory of Computation Lecture 23 Thu. 10/11/12 We are going into more advanced topics. Last time we talked about ∙ probabilistic computation ∙ BPP Today we’ll see that ∙ EQROBP ∈BPP. Unlike PRIMES, this is not known to be in 푃. A read-once branching program looks like this. (Ignore the blue 1’s for now.) S0 Homework Problem 1: Using padding, we can relate different unsolved problems: EXP vs. NEXP to P vs. NP. Problem 2: This is on nondeterministic time hierarchy. It has a short answer, but you have to see what’s going on the the proof to see why it doesn’t work. There is a nondeterministic time hierarchy, but you need a fancier method of proof, to overcome this problem. A famous paper by Steve Cook shows how to overcome it. S1 EQROBP In the figure, 퐵1 is the constant-1 branching program. The only way 퐵2 can output 0 is if everything is 0. It computes the OR function. 퐵1 and 퐵2 almost compute the same function; they agree everywhere except the place where everything is 0. 150
  • 151. Lecture 23 Notes on Theory of Computation If we run the programs on polynomially many input, the chance that we land on the single input where they differ is exponentially small. We need exponentially many of them to have a shot. The straightforward algorithm for testing equivalence is not going to run in polynomial time. The key idea is to run on other numbers, using arithmetization. This technique is also used in error-correcting codes and other fancy things. We will simulate ∧,∨ with +,×. 푎 ∧ 푏 → 푎푏 푎 → 1 − 푎 푎 ∨ 푏 → 푎 + 푏 − 푎푏 → 푎 + 푏 if 푎, 푏 not both 1. (For 푎 ∨ 푏, we can use 푎 + 푏 if 푎, 푏 are never both 1.) We first rerepresent branching program with and’s and or’s. Let’s think about running the branching program slightly differently. It is a boolean evaluation: we give a boolean assignment to every one of the nodes and edges. Put a 1 on every node and edge that you follow and 0 on everything you don’t. Every path starts at 푥1, so we assign the node with 1. Let’s say 푥1 = 1; we write 1 on the edge going to 푥3 and 0 on the edge 푥2 to say we didn’t go that way. Put 1 on 푥3. Let’s say 푥3 is 0. Then we put 1 on the 0-path from 푥3. Everything else is 0. The advantage is that we can write a boolean expression corresponding to what 1’s we write down. Suppose we have a node 푥푖. 151
We need a boolean expression to say which edge we went down. On the 1 edge out of the node we'll put 푎 ∧ 푥푖. Why does that make sense? The only way we go down that edge is if we reached the node (value 푎) and 푥푖 = 1. On the 0 edge we write 푎 ∧ ¬푥푖. This tells us how to label the edges.

How do we label the nodes? Suppose 푎1, 푎2, 푎3 label the edges going into a node. Then we label the node with 푎1 ∨ 푎2 ∨ 푎3. The start node gets 1, and the output is the value 푎 on the 1 node.

Now let's redo it with + and × instead of ∨ and ∧. There are no cycles, and a path can enter a given node along at most one of its incoming edges, so we never have more than one 푎푖 set to 1. Thus for the “or” we don't need the correction term, and we can just add: 푎 + 푏.

Using the arithmetization, we can assign non-Boolean values, and perhaps some nonsensical result comes out. Remember that we wrote down the branching program for parity (exclusive or), 푥1 ⊕ 푥2. Have you ever wondered what 2 ⊕ 3 is? Probably not. Let's plug 푥1 = 2 and 푥2 = 3 into the arithmetized version of this branching program and see what happens. Plug in 1 at the start node. If we assigned 푥1 = 0 and 푥2 = 1, everything would work out as before. But now we can compute values even if we don't have 0's and 1's coming in. We get the following values.
We get 2 ⊕ 3 = −7. (We haven't discovered some fundamental fact about exclusive or; there's no fundamental meaning to this.)

Let's summarize. Originally we thought of running a BP as following some path. That way of thinking doesn't lend itself to arithmetization. Instead of thinking about taking a path, think about evaluating the branching program by assigning values to all nodes by the above procedure, and looking at the 1 node. There is no path, but this way of thinking is equivalent, and we can look at the value on the output node even if the input nodes didn't have 0/1 values coming in.

If we had a different branching program representation of the same boolean function (say, xor), would we get a different value? No. As you will see from the coming proof, if we have a different representation that is still read-once, and it agrees on the boolean case, then it agrees on the non-boolean case. This is not true for general branching programs! As an example, if we flip 푥1, 푥2 in the xor program we get the same value for 2 ⊕ 3.

Finally, here is the probabilistic algorithm.

Proof. Let 푀 =“on ⟨퐵1, 퐵2⟩,
1. Randomly select non-Boolean values for 푥1, . . . , 푥푚 from the finite field F푞 = {0, 1, . . . , 푞 − 1} where 푞 is prime (this is modular arithmetic modulo 푞). Choose 푞 > 3푚.
2. Compute 퐵1, 퐵2 (arithmetized) on 푥1, . . . , 푥푚.
3. Accept if we get the same output value; reject if we do not.”

Now we have to prove this works. We claim the following.
1. If 퐵1, 퐵2 are equivalent then 푃(푀 accepts) = 1. (If they are equivalent then they agree on boolean values; we'll prove they agree even on nonboolean values.)
2. If 퐵1, 퐵2 are not equivalent then 푃(푀 rejects) ≥ 2/3.

We prove statement 1. This is the hard part. We take the input variables and keep them as variables 푥푖, doing the calculation symbolically. We'll write down expressions like 푥1, 1 − 푥1, and so forth. At every step we're multiplying things like 푥푖 or (1 − 푥푖), or adding terms together. At the output node 1 we have some polynomial in 푥1, . . . , 푥푚.
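Here is a sketch of the whole algorithm 푀, using the node-labeling evaluation described above. The encoding (nodes listed in topological order, sinks named "0" and "1") and the choice of the prime 푞 are assumptions of the sketch, not part of the notes. Run on the xor program with 푥1 = 2, 푥2 = 3 over the integers, the same computation gives −7, matching the example above.

```python
import random

def arith_eval(bp_nodes, assignment, q):
    """bp_nodes: list of (name, var, succ0, succ1) in topological order, first entry
    is the start node; "0" and "1" are the output sinks.  Node values follow the text:
    the start node gets 1, a 1-edge out of a node with value a carries a*x_i, a 0-edge
    carries a*(1-x_i), and a node's value is the sum of its incoming edge values,
    all arithmetic mod q."""
    val = {"0": 0, "1": 0}
    for name, _, _, _ in bp_nodes:
        val[name] = 0
    val[bp_nodes[0][0]] = 1
    for name, var, s0, s1 in bp_nodes:
        a, x = val[name], assignment[var]
        val[s1] = (val[s1] + a * x) % q
        val[s0] = (val[s0] + a * (1 - x)) % q
    return val["1"]

def probably_equivalent(bp1, bp2, variables, q, trials=50):
    """The BPP test: pick random values in F_q (q a prime > 3*len(variables)) and
    compare the arithmetized outputs; any disagreement certifies inequivalence."""
    for _ in range(trials):
        assignment = {v: random.randrange(q) for v in variables}
        if arith_eval(bp1, assignment, q) != arith_eval(bp2, assignment, q):
            return False
    return True
```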
Evaluating 퐵1, 퐵2 symbolically in the arithmetized version, we get polynomials 푃1, 푃2 in 푥1, . . . , 푥푚. These polynomials have a very nice form: they are sums of products of 푥푖's and (1 − 푥푖)'s, for instance
푥1(1 − 푥2)푥3푥4(1 − 푥5) · · · 푥푚 + (1 − 푥1)푥2푥3(1 − 푥4) · · · + · · · .
In each summand, the same variable never appears more than once, because of the read-once property. How do we know every variable appears at least once? We can always pad out a branching program by adding missing variables, turning it into a “read exactly once” branching program. Both 푃1 and 푃2 look like this.

Why is this nice? Such a polynomial encodes the truth table of the original program on Boolean values: the summands correspond to the rows of the Boolean truth table. If two BP's agree in the Boolean world, they have the same truth table, hence the same polynomial, and so they agree everywhere. (There is an exponential number of rows, but this doesn't matter: when we run the algorithm we don't calculate the polynomial, which would take exponential time. We get a specific value of the polynomial by plugging in values and computing on the fly.)

Part 2 uses a magical fact about polynomials.

Lemma 23.1: If 푃(푥) is a nonzero polynomial of degree at most 푑, then 푃(푥) has at most 푑 zeros. (This is true in any field, in particular F푞.) The probabilistic version is the following: if you pick 푥 ∈ F푞 at random, then Prob[푃(푥) = 0] ≤ 푑/푞.

Lemma 23.2 is the multivariate version.

Lemma 23.2 (Schwartz–Zippel): If 푃(푥1, . . . , 푥푚) is nonzero, each 푥푖 has degree at most 푑, and you pick the 푥푖 ∈ F푞 randomly, then Prob[푃(푥1, . . . , 푥푚) = 0] ≤ 푚푑/푞.

This is proved from the single-variable case by induction.

Remember we had 2 polynomials 푃1, 푃2? Let's look at the difference 푃 := 푃1 − 푃2. If the branching programs are not equivalent, then the difference of the polynomials is nonzero. That nonzero polynomial has few roots: 푃 is zero in very few places, so 푃1, 푃2 agree in very few places. When we run the arithmetized 퐵1, 퐵2 on random values, we're unlikely to get the same value coming out; it's very likely we'll get different values, and very likely we'll reject. For our 푃 = 푃1 − 푃2, what is 푑? Every variable appears once, so 푑 = 1. 푚 is the number of variables, and 푞 > 3푚, so the probability of agreement is at most 푚/푞 ≤ 1/3. The chance we get agreement between 푃1 and 푃2 is at most 1/3; the chance we get disagreement is at least 2/3.
Through arithmetization—converting boolean computation into a polynomial and then evaluating it on randomly selected nonboolean values—we can magnify a single disagreement so that a probabilistic algorithm detects it. This is a nice probabilistic algorithm. We'll use this method again in the last two lectures, where we'll prove amazing results about satisfiability using interactive proof systems.

Lecture 24
Thu. 12/6/12

The final exam is Wednesday December 19, 9–12 in Walker. It is open book, notes, and handouts. It covers the whole semester with emphasis on the 2nd half. It is a 3-hour version of the midterm with some short-answer questions. Handout: sample questions.

Last time we showed EQROBP ∈ BPP, and today we'll talk about
∙ Interactive Proofs
∙ IP

S1 Interactive proofs

1.1 Example: Graph isomorphism

We'll move into the very last topic, an amazing idea: the interactive proof system. It's a probabilistic version of NP, the same way BPP is a probabilistic version of P. Another amazing thing is that it goes against the intuition about NP we built up during the term: if a problem is in NP, it has short certificates, so a prover can convince a verifier of a certain fact, namely membership in the language. Using the idea of interactive proof, a prover can convince a verifier of a certain fact even though there are no short certificates.

We'll start off with a famous example: testing whether or not graphs are isomorphic:
ISO = {⟨퐺1, 퐺2⟩ : 퐺1, 퐺2 are graphs, 퐺1 ≡ 퐺2}.
Two graphs are isomorphic iff we can match up the nodes so that edges go between corresponding nodes. It is clear that ISO ∈ NP: just give the matching. It is one of the rare (combinatorial) problems in NP that is neither known to be in P nor known to be NP-complete. Almost every other problem is either known to be in P or NP-complete, except for a bunch of problems related to number theory. Graph isomorphism is the most famous such problem. There is a huge literature trying to prove it one way or the other, with no success yet.

Define NONISO to be the complement of ISO.
  • 156. Lecture 24 Notes on Theory of Computation Is NONISO∈NP? Not known. Its seems one has to astronomically search through all permutations to determine non-isomorphism. Is there a short certificate, or do you essentially have to go through same process again? There is way for you to convince me of the fact provided you have sufficient computational power at your disposal. Here is a whimsical version of interactive proof system: The prover has unlimited computational power but is not trustworthy. The verifier checks the prover. The prover is like army of slaves, also called graduate students. The verifier is the king, sometimes called the professor. The grad students (slaves) stay up all night, and have unlimited computational power. The professor only operates in probabilistic polynomial time. The professor has a research problem: are these graphs isomorphic? The grad students get to work, with their fancy computers. They find: Yes, they’re isomorphic! The professor knows that grad students basically honest folks, but they have lots of other things worry about, like XBox. The prof needs to be convinced, and be sure what answer is. If the grad students say yes, the professor says: convince me, and the grad students give the isomorphism. Suppose the grad students say the graphs are nonisomorphic. The professor asks for a proof. There is a simple protocol they can go through with the professor to convince this pro-fessor that the graphs are non-isomorphic. This was established back in mid-1980’s. Laszlo Babi, a leading expert in graph isomorphism, was flabbergasted. Both the professor and students have access to the 2 graphs. The professor takes 2 graphs, turns around secretly, chooses one of 퐺1,퐺2 at random, and randomly permutes the vertices. The professor asks, “Is the graph I picked 퐺1 or 퐺2?” If the grad students can answer reliably, then they must be nonisomorphic. If they are isomorphic, it could have come from either one, and there is no way to tell which one the prof picked; the best thing one can do is guess. If the graphs really were different, the students can use a supercomputer to guess which one professor the picked: The graph can only be isomorphic to one of 퐺1,퐺2. The professor does this several times. If the students can answer the question 100 times correctly in a row, either they are legitimately doing the protocol, or they’re incredibly lucky. In fact, interactive proof systems can show formulas are unsatisfiable. The proof is more complicated. This gives all of coNP doable with interactive proof system! We know ISO∈NP, but we don’t know whether NONISO∈NP. But we can convince someone of non-isomorphism if we have enough computational power. We extend NP to a bigger class, where we can convince a verifier of membership in languages beyond NP. Interactive proof systems play a big role in cryptography: here the prover is limited in some way, but has special information (a password), and has to convince someone that he has the password without revealing the password. 1.2 Formal model of interactive proofs We write down a formal model. Definition 24.1: Let 푃 be a prover with unlimited computational power. Let 푉 be a verifier with probabilistic polynomial time computational power. Let (푃 ↔ 푉 ) be an interaction 156
where 푃 and 푉 exchange polynomially many messages (both given input 푤) until 푉 says accept or reject. We say that 퐴 ∈ IP if there are 푉 and 푃 such that for every 푤:

if 푤 ∈ 퐴, then Prob[(푃 ↔ 푉) = accept] ≥ 2/3, and

if 푤 ̸∈ 퐴, then for every prover 푃̃, Prob[(푃̃ ↔ 푉) = reject] ≥ 2/3.

To show a language is in IP, we set up a verifier and prover. For every string in the language, working together, the prover gets the verifier to accept with high probability. If the string is not in the language, then no matter what prover you choose (푃̃ is a cheating prover trying to make the verifier accept when she shouldn't), rejection is the likely outcome.

Theorem 24.2: NONISO ∈ IP.

Proof. We write the NONISO protocol with this setup in mind. On input ⟨퐺1, 퐺2⟩:

V: Choose 퐺1 or 퐺2 at random. Then randomly permute it and send the result to 푃.
P: Replies: which 퐺푖 did 푉 choose?
Repeat twice.
V: Accept if 푃 is correct both times. Reject if 푃 is ever wrong.

If 퐺1 ̸≡ 퐺2 then Prob[(푉 ↔ 푃) accepts] = 1. The honest prover can tell which 퐺푖 the verifier picked by detecting whether the permuted graph is isomorphic to 퐺1 or to 퐺2. (The honest prover is only in play when ⟨퐺1, 퐺2⟩ is in the language.)

Now the sneaky prover steps in: "I'll take a shot at it." If 퐺1 ≡ 퐺2, then the sneaky prover (pretending 퐺1 ̸≡ 퐺2) can't do anything but guess. The probability it guesses right twice is 1/4. Thus if 퐺1 ≡ 퐺2, then for any 푃̃, Prob[(푉 ↔ 푃̃) accepts] ≤ 1/4.

This shows NONISO ∈ IP.

Proposition 24.3: NP ⊆ IP and BPP ⊆ IP.

Proof. For NP ⊆ IP, the prover sends the certificate to the verifier. This is just a one-way conversation. The verifier checks the certificate. For BPP, the verifier doesn't need the prover: the verifier can do it all by his or her lonesome self.
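To make the probabilities concrete, here is a toy simulation of this protocol (a sketch, not from the notes: the graph encoding is the same made-up one as above, and a brute-force isomorphism test stands in for the prover's unlimited computational power).

```python
# Toy NONISO protocol: V secretly picks and relabels one graph, P must say which.
import random
from itertools import permutations

def edges(g):
    return frozenset(frozenset((u, v)) for u in g for v in g[u])

def permute(g, relabel):                     # rename every vertex via the map 'relabel'
    return {relabel[u]: {relabel[v] for v in nbrs} for u, nbrs in g.items()}

def isomorphic(g1, g2):                      # prover side: brute force, exponential time
    v1, v2 = sorted(g1), sorted(g2)
    return len(v1) == len(v2) and any(
        edges(permute(g1, dict(zip(v1, p)))) == edges(g2) for p in permutations(v2))

def verifier_accepts(g1, g2, rounds=2):
    for _ in range(rounds):
        i = random.choice([1, 2])            # verifier's secret coin flip
        chosen = g1 if i == 1 else g2
        verts = sorted(chosen)
        relabel = dict(zip(verts, random.sample(range(len(verts)), len(verts))))
        challenge = permute(chosen, relabel) # random relabeling, sent to the prover
        answer = 1 if isomorphic(challenge, g1) else 2   # prover's best possible reply
        if answer != i:
            return False                     # prover caught; reject
    return True

path = {0: {1}, 1: {0, 2}, 2: {1}}           # path on 3 vertices
tri = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}      # triangle
print(verifier_accepts(path, tri))           # non-isomorphic: accepted every time
print(verifier_accepts(path, {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}))
                                             # isomorphic: accepted with probability 1/4
```

When the graphs are isomorphic, the challenge is isomorphic to both of them, so even this "best possible" prover is just matching the verifier's coin flip; each extra round halves its chance of surviving.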
S2 IP=PSPACE

Now we'll prove the amazing theorem. This blew everything away when it came out, I remember that.

Theorem 24.4: IP=PSPACE.

What does this mean? Take the game of chess, or some game where you can test in polynomial space which side has a forced win. It takes an ungodly amount of time to go through the search tree, but in relatively small space you can show (say) that white has a forced win. There is probably no short certificate, but if Martians with supercomputers have done all the computations, they could convince mere probabilistic polynomial-time mortals like us that white has a forced win, without us going through the entire game tree.

We'll prove a weaker version, coNP ⊆ IP. This was discovered first, contains pretty much all the ideas, and is easier to describe. The full proof of IP=PSPACE is in the textbook. It's enough to work with satisfiability: show the prover can convince the verifier that a formula is not satisfiable. The amazing thing is the technique. We'll use arithmetization as we did before.

2.1 Aside

Every few months I get an email or letter claiming to resolve P vs. NP. The first thing I look at is which way they claim it goes. If the person claims P=NP, I don't even look at it. It is probably some horrible algorithm with accompanying code. I tell them: then you can factor numbers. Here's a website with various numbers known to be composite where no one knows the factorization. Just factor one of them. That usually shuts them up, and I never hear from them again.

If they claim P̸=NP, then almost without exception their proof goes like this. They claim: clearly any algorithm for SAT, etc., has to operate in the following way... Then they give a long analysis that shows it has to be an exponential algorithm. The silly part is the "clearly." That's the whole point. How do you know you can't do something magical: plug the input through a Fourier transform, do some strange things, and have the answer pop out? You have to prove no such crazy algorithms exist. The cool thing about the IP protocol is that it does something crazy and actually works.

2.2 coNP⊆IP

Proof of coNP ⊆ IP. For a formula 휑, let #휑 be the number of satisfying assignments of 휑. Note #휑 immediately tells you whether 휑 is satisfiable. Define number-SAT (sharp-SAT) by

#SAT := {⟨휑, 푘⟩ : #휑 = 푘}.

This is not known to be in NP. (It would be in NP for small 푘. However, if there are exponentially many satisfying assignments, naively we'd need an exponential-size certificate.)
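For concreteness, #휑 is just a count over all 2^푚 assignments. A minimal sketch (the formula encoding and the example formula are my own, not anything from the lecture):

```python
# Brute-force #SAT: count satisfying assignments by trying all 2^m inputs.
from itertools import product

def count_sat(phi, m):
    return sum(phi(*x) for x in product((0, 1), repeat=m))

# Example formula: (x1 OR x2) AND ((NOT x2) OR x3)
phi = lambda x1, x2, x3: int((x1 or x2) and ((not x2) or x3))
print(count_sat(phi, 3))    # 4, so <phi, 4> is in #SAT and <phi, 5> is not
```

The counting itself takes exponential time, which is exactly why a short certificate is not apparent: the obvious "proof" of the count is the list of satisfying assignments.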
However, we show #SAT ∈ IP.

We'll set up a little notation. Fix 휑, and think of it as a function of its assignment:

휑(푥1, . . . , 푥푚) = 0 if the assignment is unsatisfying, and 1 if it is satisfying.

Let

푇() = Σ_{푥1,...,푥푚 ∈ {0,1}} 휑(푥1, . . . , 푥푚).

Note 푇() = #휑 is the number of satisfying assignments: we add 1 for every satisfying assignment and 0 for every unsatisfying one. More generally, define

푇(푥1, . . . , 푥푗) = Σ_{푥푗+1,...,푥푚 ∈ {0,1}} 휑(푥1, . . . , 푥푚).

We are presetting some of the values of the formula and counting the number of satisfying assignments subject to those presets. Thus 푇(푥1, . . . , 푥푗) = #휑_{푥1···푥푗}, where 휑_0 = 휑 with 푥1 = 0, 휑_{01} = 휑 with 푥1 = 0, 푥2 = 1, and so forth. In particular, since we assign values to all of the 푥푖, 푇(푥1, . . . , 푥푚) is 0 or 1. We have the following relations:

푇() = #휑
푇(푥1, . . . , 푥푚) = 휑(푥1, . . . , 푥푚)
푇(푥1, . . . , 푥푗) = 푇(푥1, . . . , 푥푗, 0) + 푇(푥1, . . . , 푥푗, 1).

To see the last equation, note the number of satisfying assignments consistent with the presets 푥1, . . . , 푥푗 is the sum of the number that additionally set 푥푗+1 = 0 and the number that additionally set 푥푗+1 = 1, because one of these two must hold.

We set up the #SAT protocol. (Our first version will have a little problem, as we will see.) Suppose the input is ⟨휑, 푘⟩. The prover is supposed to make the verifier accept with high probability.

0. P: Sends 푇(). V checks 푘 = 푇(). (Reject if things don't check out.)
1. P: Sends 푇(0) and 푇(1). V checks that 푇() = 푇(0) + 푇(1).
2. P: Sends 푇(00), 푇(01), 푇(10), 푇(11). V checks 푇(0) = 푇(00) + 푇(01) and 푇(1) = 푇(10) + 푇(11). (This is exponential, which is a problem. But humor me.)
...
푚. P: Sends all 2^푚 values 푇(0, . . . , 0), . . . , 푇(1, . . . , 1) (푚 arguments each). V checks consistency at every node: 푇(0, . . . , 0) = 푇(0, . . . , 0, 0) + 푇(0, . . . , 0, 1) (with 푚 − 1 arguments on the left), . . . , 푇(1, . . . , 1) = 푇(1, . . . , 1, 0) + 푇(1, . . . , 1, 1).
푚 + 1. V checks 푇(0, . . . , 0) = 휑(0, . . . , 0), . . . , 푇(1, . . . , 1) = 휑(1, . . . , 1), and accepts if all these checks pass.

Think of this as a tree. This algorithm might seem trivial, but it's important to understand the motivations. An honest prover sends the correct values. Suppose we have a dishonest prover: if 푘 is wrong, the prover tries to convince the verifier to accept anyway, so it sends a wrong value for 푇(). This is like asking a kid questions, trying to ferret out a lie. One lie leads to other lies. (But to the kid things may look locally consistent...) Since 푇() = 푇(0) + 푇(1) must hold, there must be a lie on at least one of the two branches. At least one lie must propagate down at each step, all the way to a lie at the bottom, which the verifier catches.

The only problem is the exponential tree. You can imagine trying to do something probabilistic. Instead of following both branches, let's pick a random branch to follow. You're a busy parent; you can't check out all possible things your kid is saying. Pick one. Choose one branch. But you want a high probability of detecting the cheating. If you pick a random branch, then with a 50-50 chance you step off the lying side onto the honest side, and the prover is saved. The prover thinks, "You're not going to catch me now," and behaves honestly all the way down. We need the dishonest prover to make the verifier accept with only low probability.

Instead we pick non-boolean values. We arithmetize the whole setup and reduce to one randomly chosen non-boolean case. We only have to follow a single line of these non-boolean values down. Again we rely on the magic of polynomials. If the prover lied, then for almost all of the non-boolean values we could pick, there will be a lie, and a lie leads to another lie almost certainly. The rest of the protocol is set up in terms of arithmetization: arithmetize everything and everything just works. We finish next time.
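Here is a minimal sketch of this exponential protocol (my own toy encoding, continuing the example formula above: the prover's claimed 푇-values sit in a dictionary keyed by the preset string, and the verifier walks the whole tree of checks).

```python
# Exponential #SAT protocol: V checks T(w) = T(w0) + T(w1) at every node
# of the full binary tree, then checks the leaves against the formula itself.
from itertools import product

phi = lambda x1, x2, x3: int((x1 or x2) and ((not x2) or x3))
M = 3

def true_T(preset):                          # what an honest prover would send
    fixed = tuple(int(b) for b in preset)
    return sum(phi(*(fixed + x)) for x in product((0, 1), repeat=M - len(preset)))

def honest_table():
    return {''.join(p): true_T(''.join(p))
            for j in range(M + 1) for p in product('01', repeat=j)}

def verifier_checks(T, k):
    if T[''] != k:                           # step 0: the claimed count
        return False
    for j in range(M):                       # steps 1..m: internal consistency
        for p in product('01', repeat=j):
            w = ''.join(p)
            if T[w] != T[w + '0'] + T[w + '1']:
                return False
    # step m+1: leaves must agree with the formula
    return all(T[''.join(map(str, x))] == phi(*x) for x in product((0, 1), repeat=M))

print(verifier_checks(honest_table(), k=4))  # True
print(verifier_checks(honest_table(), k=5))  # False: the lie at the root is exposed
```

A cheating prover who wants the root to read a wrong 푘 has to doctor values somewhere, and any doctored internal value forces a doctored child, all the way down to a leaf where the check against 휑 fails; the trouble is only that there are 2^푚 leaves to check.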
Lecture 25
Tue. 12/11/2012

Last time we talked about
∙ interactive proofs
∙ IP.

Today we'll finish the proof of coNP ⊆ IP. A prover with unlimited computational power tries to convince a verifier that a string is in the language. For a string in the language, the prover will convince the verifier with high probability. For a string not in the language, that prover, or any other prover, will fail with high probability. The big result is IP=PSPACE; we prove a weaker form, coNP ⊆ IP. (It was around half a year before Adi Shamir found the trick to go from coNP ⊆ IP to IP=PSPACE.)

S1 coNP⊆IP

Last time we introduced an exponential protocol for #SAT, a coNP-hard problem. That protocol doesn't use the full power of IP: it is a one-way protocol, like NP, in which the verifier doesn't send the prover any questions.

Using arithmetization, we find a polynomial that faithfully simulates the formula when we plug in 0's and 1's. The degree of the polynomial is not too big:

푎 ∧ 푏 → 푎푏
¬푎 → 1 − 푎
푎 ∨ 푏 → 푎 + 푏 − 푎푏
휑 → 푃휑(푥1, . . . , 푥푚)

The total degree of the polynomial will be at most the length 푛 of 휑: when we combine two expressions, the degrees at most add.

Instead of reducing the verification of one 푇-value to two 푇-values, we reduce it to one 푇-value, but one that is non-boolean. The polynomial takes on values beyond 0 and 1 when you plug in non-boolean inputs.
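A minimal sketch of these three rules in action, on the same made-up example formula as before (this is just the translation table applied by hand, nothing specific to the lecture's notation):

```python
# Arithmetization: AND -> product, NOT -> 1 - a, OR -> a + b - ab.
from itertools import product

def AND(a, b): return a * b
def NOT(a):    return 1 - a
def OR(a, b):  return a + b - a * b

def P_phi(x1, x2, x3):                    # arithmetized (x1 OR x2) AND ((NOT x2) OR x3)
    return AND(OR(x1, x2), OR(NOT(x2), x3))

phi = lambda x1, x2, x3: int((x1 or x2) and ((not x2) or x3))

# The polynomial agrees with the formula on every boolean input...
assert all(P_phi(*x) == phi(*x) for x in product((0, 1), repeat=3))
# ...but, unlike the formula, it is also defined at non-boolean points.
print(P_phi(2, 5, 7))                     # -93: a perfectly good non-boolean value
```

Each rule's output degree is at most the sum of its arguments' degrees, which is where the bound deg 푃휑 ≤ |휑| comes from.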
The picture of what gets verified is now a single path instead of a tree:

푇() = 푘
푇(?)
푇(?, ?)
푇(?, ?, ?)

We arithmetize 푇: it looks just as it did before, but instead of using the formula, we use the polynomial that represents the formula:

푇(푥1, . . . , 푥푖) = Σ_{푥푖+1,...,푥푚 ∈ {0,1}} 푃휑(푥1, . . . , 푥푚).

If we preset the variables to 0's and 1's, we get the same values as before, because the polynomial agrees with the boolean formula on boolean inputs. If we preset nothing, there is no change: 푇() is still the number of satisfying assignments, since everything is added up over boolean settings. If we set everything, we may have non-boolean values, and 푇(푥1, . . . , 푥푚) = 푃휑(푥1, . . . , 푥푚).

We now give the protocol. This is where the magic happens. We'll work over some finite field F푞, where 푞 > 2^푛. The reason we make it so big is that 푘 can be a value between 0 and 2^푛; we would have wraparound issues if we used a field that can't represent all these possible values.

0. P sends 푇(). V checks 푘 = 푇().

1. P sends 푇(푧) as a polynomial in the variable 푧. (More formally, 푃 sends the coefficients. Note that the degree in 푧 is at most |휑|, so there are at most |휑| + 1 coefficients, each a field element with polynomially many bits. Of course calculating this is difficult, but that's okay: this is the prover. The grad students work hard. They don't get paid for their work beyond their stipend, which is polynomial, so it doesn't matter. The answer they send is polynomial-size.)
V checks 푇(0) and 푇(1) are correct by checking 푇() = 푇(0) + 푇(1). The nice thing is that one object gives us both values; this is what prevents the blowup.
V sends a random 푟1 ∈ F푞. The prover now has to show 푇(푟1) is correct.
2. P sends 푇(푟1, 푧) as a polynomial in 푧. (푃 is convincing 푉 that 푇(푟1) is correct.)
V checks 푇(푟1) = 푇(푟1, 0) + 푇(푟1, 1), then chooses a random 푟2 ∈ F푞.
...
푚. P sends 푇(푟1, . . . , 푟푚−1, 푧) as a polynomial in 푧.
V checks 푇(푟1, . . . , 푟푚−1) = 푇(푟1, . . . , 푟푚−1, 0) + 푇(푟1, . . . , 푟푚−1, 1), then chooses a random 푟푚 ∈ F푞.
푚 + 1. V checks 푇(푟1, . . . , 푟푚) = 푃휑(푟1, . . . , 푟푚), and if so, accepts.

How does the verifier do the check at the final stage? Just plug the values into 푃휑. The chain of claims being verified is:

푇() = 푘
푇(푟1)
푇(푟1, 푟2)
...
푇(푟1, . . . , 푟푚)
푃휑(푟1, . . . , 푟푚)

The honest prover will make the verifier accept with probability 1: just follow the protocol and send the correct polynomials. The verifier says, "Convince me the polynomial is right by convincing me it works on a random element."

Why does this work? Why are we using polynomials? Let's see what happens when the prover tries to lie. If 푘 is wrong, the verifier should reject with high probability. In order to preserve any hope of making the verifier accept, the prover has to lie and send a wrong value for 푇(). But then the claimed polynomial 푇(푧) must be wrong as well: 푇(0) and 푇(1) come from that same polynomial by plugging in 0 and 1, and if the polynomial were the true one, the check 푇() = 푇(0) + 푇(1) would expose the lie. So the prover sends a wrong polynomial, and we evaluate that wrong polynomial at a random input. Two low-degree polynomials agree in only a small number of places, because a nonzero polynomial of low degree has only a small number of roots: every point where the claimed and actual polynomials agree is a root of their difference.

푇(푟1) doesn't necessarily have to end up a lie, but that would be very unlikely: a small number of agreements (roughly 푛) out of exponentially many possible field elements. So 푇(푟1) is almost certainly wrong, and the dishonest prover now has to convince the verifier that this wrong value is right. The prover again has a chance of getting lucky, if the verifier picks a place where the incorrect and correct polynomials agree, but at every step it's hard to succeed. Almost certainly an incorrect 푇(푟1) forces an incorrect 푇(푟1, 푟2), and so forth.
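To see the whole machine run, here is a toy end-to-end simulation (a sketch under my own assumptions: the arithmetized example from before, a fixed prime field, an honest prover whose messages are computed inline, and each univariate message represented by its values at 푧 = 0, . . . , 3 rather than by coefficients, which carries the same information under a degree-3 bound). Because the prover here is honest, a false claim 푘 is caught at the very first consistency check; a cheating prover would have to send false polynomials and could survive only with the small probability bounded below.

```python
# Toy sum-check run for #SAT over F_Q (the T(...) calls are the prover's work;
# the verifier only does the additions, the interpolation, and the final P_phi check).
import random
from itertools import product

Q = 2**31 - 1      # a prime, comfortably bigger than the number of assignments here
D = 3              # a safe per-variable degree bound for this small example
M = 3              # number of variables

def P_phi(x1, x2, x3):                        # arithmetized example formula, mod Q
    o = lambda a, b: (a + b - a * b) % Q      # OR; AND is *, NOT is 1 - a
    return o(x1, x2) * o(1 - x2, x3) % Q

def T(prefix):                                # sum P_phi over the remaining boolean vars
    free = M - len(prefix)
    return sum(P_phi(*(tuple(prefix) + x)) for x in product((0, 1), repeat=free)) % Q

def interpolate(vals, r):                     # evaluate the poly with vals[i] = p(i) at r
    total = 0
    for i, yi in enumerate(vals):
        num = den = 1
        for j in range(len(vals)):
            if j != i:
                num = num * (r - j) % Q
                den = den * (i - j) % Q
        total = (total + yi * num * pow(den, Q - 2, Q)) % Q
    return total

def run_protocol(k):
    claimed, rs = k, []
    for _ in range(M):
        msg = [T(rs + [z]) for z in range(D + 1)]   # prover: T(r1..r_{j-1}, z) by values
        if claimed != (msg[0] + msg[1]) % Q:        # verifier: previous claim must split
            return False
        r = random.randrange(Q)                     # fresh random field element
        claimed = interpolate(msg, r)               # the new claim to be justified
        rs.append(r)
    return claimed == P_phi(*rs)                    # final check against the polynomial

print(run_protocol(k=4))    # correct count: accepted
print(run_protocol(k=5))    # wrong count: rejected
```

Sending evaluations at a few points and letting the verifier interpolate is equivalent to sending the coefficients; the random 푟 the verifier feeds back at each round is exactly where the Schwartz-Zippel argument bites.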
1.1 Analysis of protocol

If ⟨휑, 푘⟩ ∈ #SAT, then Prob[(푉 ↔ 푃) accepts] = 1.

If ⟨휑, 푘⟩ ̸∈ #SAT, then by the Schwartz-Zippel Lemma 23.2, for any prover 푃̃,

Prob[(푉 ↔ 푃̃) accepts] ≤ 푚 · deg(푃휑)/푞 ≤ 푚푛/2^푛 = poly(푛)/2^푛.

The prover has 푚 chances to get lucky; if it ever gets lucky, it can follow the honest protocol from then on, sending the correct values all the way down. The probability of getting lucky at any one stage is at most the degree of the polynomial divided by the size 푞 of the field, which is tiny.

This shows #SAT ∈ IP, and hence coNP ⊆ IP.

S2 A summary of complexity classes