#1
Top-Down Parsing
#2
Extra Credit Question
• Given this grammar G:
– E → E + T
– E → T
– T → T * int
– T → int
– T → ( E )
• Is the string int * (int + int) in L(G)?
– Give a derivation or prove that it is not.
#3
Revenge of Theory
• How do we tell if DFA P is equal to DFA Q?
– We can do: “is DFA P empty?”
• How?
– We can do: “P := not Q”
• How?
– We can do: “P := Q intersect R”
• How?
– So do: “is P intersect not Q empty?”
• Does this work for CFG X and CFG Y?
• Can we tell if s is in CFG X?
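The DFA questions above can be answered with a product construction. The following is a minimal sketch, not the lecture's own code: it assumes each DFA is a Python dict with a total transition function, and the names `dfa_equivalent`, `even_a`, and `even_b` are made up for illustration. Emptiness of "P intersect not Q" alone only shows that P is contained in Q, so the sketch checks both directions at once by looking for a reachable state pair in which exactly one machine accepts.

```python
from collections import deque

def dfa_equivalent(p, q, alphabet):
    """Return True if DFAs p and q accept the same language.

    Assumed representation: a dict with 'start', 'accepting' (a set) and
    'delta' (a dict mapping (state, symbol) -> state, defined everywhere).
    """
    start = (p["start"], q["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        sp, sq = queue.popleft()
        # A reachable pair where exactly one side accepts witnesses a string
        # in (P intersect not Q) or in (Q intersect not P).
        if (sp in p["accepting"]) != (sq in q["accepting"]):
            return False
        for a in alphabet:
            nxt = (p["delta"][(sp, a)], q["delta"][(sq, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Two DFAs over {0, 1} that both accept strings with an even number of 1s.
even_a = {"start": "e", "accepting": {"e"},
          "delta": {("e", "0"): "e", ("e", "1"): "o",
                    ("o", "0"): "o", ("o", "1"): "e"}}
even_b = {"start": 0, "accepting": {0, 2},
          "delta": {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 2,
                    (2, "0"): 2, (2, "1"): 3, (3, "0"): 3, (3, "1"): 2}}
```

The search visits at most |P| × |Q| state pairs, which is why the question is easy for DFAs.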
#4
Outline
• Recursive Descent Parsing
• Left Recursion
• LL(1) Parsing
– LL(1) Parsing Tables
– LL(1) Parsing Algorithm
• Constructing LL(1) Parsing Tables
– First, Follow
#5
In One Slide
• An LL(1) parser reads tokens from left to
right and constructs a top-down leftmost
derivation. LL(1) parsing is a special case of
recursive descent parsing in which you can
predict which single production to use from
one token of lookahead. LL(1) parsing is fast
and easy, but it does not work if the grammar
is ambiguous, left-recursive, or not left-
factored (i.e., it does not work for most
programming languages).
#6
Intro to Top-Down Parsing
• Terminals are seen in order
of appearance in the token
stream:
t1 t2 t3 t4 t5
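• The parse tree is constructed
– From the top
– From left to right
[Figure: partially built parse tree with root A and internal nodes B, C, and D; its leaves, read left to right, are the tokens t1, t2, t3, t4]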
#7
Recursive Descent Parsing
• We’ll try recursive descent parsing first
– “Try all productions exhaustively, backtrack”
• Consider the grammar
E → T + E | T
T → ( E ) | int | int * T
• Token stream is: int * int
• Start with top-level non-terminal E
• Try the rules for E in order
#8
Recursive Descent Example
• Try E0 → T1 + E2
• Then try a rule for T1 → ( E3 )
– But ( does not match input token int
• Try T1 → int . Token matches.
– But + after T1 does not match input token *
• Try T1 → int * T2
– This will match but + after T1 will be unmatched
• Have exhausted the choices for T1
– Backtrack to choice for E0
E → T + E | T
T → ( E ) | int | int * T
Input = int * int
#9
Recursive Descent Example (2)
• Try E0 → T1
• Follow same steps as before for T1
– And succeed with T1 → int * T2 and T2 → int
– With the following parse tree
E0
T1
int * T2
int
E → T + E | T
T → ( E ) | int | int * T
Input = int * int
#10
Recursive Descent Parsing
• Parsing: given a string of tokens t1 t2 ... tn,
find its parse tree
• Recursive descent parsing: Try all the
productions exhaustively
– At a given moment the fringe of the parse tree is:
t1 t2 … tk A …
– Try all the productions for A: if A → B C is a
production, the new fringe is t1 t2 … tk B C …
– Backtrack when the fringe doesn’t match the
string
– Stop when there are no more non-terminals
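The exhaustive strategy above can be sketched for the example grammar E → T + E | T, T → ( E ) | int | int * T. This is a minimal illustration, assuming tokens are plain strings; the function names `parse_E`, `parse_T`, and `accepts` are hypothetical. Each function is a generator that yields every input position at which its non-terminal could end, so moving on to the next alternative plays the role of backtracking.

```python
def parse_E(toks, i):
    """Yield every position where an E starting at position i could end."""
    # E -> T + E
    for j in parse_T(toks, i):
        if j < len(toks) and toks[j] == "+":
            yield from parse_E(toks, j + 1)
    # E -> T
    yield from parse_T(toks, i)

def parse_T(toks, i):
    """Yield every position where a T starting at position i could end."""
    # T -> ( E )
    if i < len(toks) and toks[i] == "(":
        for j in parse_E(toks, i + 1):
            if j < len(toks) and toks[j] == ")":
                yield j + 1
    if i < len(toks) and toks[i] == "int":
        # T -> int
        yield i + 1
        # T -> int * T
        if i + 1 < len(toks) and toks[i + 1] == "*":
            yield from parse_T(toks, i + 2)

def accepts(toks):
    """True if some way of parsing E consumes the whole token list."""
    return any(j == len(toks) for j in parse_E(toks, 0))
```

On `["int", "*", "int"]` the first alternative E → T + E finds no `+` and yields nothing, after which E → T succeeds, mirroring the two example slides.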
#11
When Recursive Descent
Does Not Work
• Consider a production S → S a:
– In the process of parsing S we try the above rule
– What goes wrong?
• A left-recursive grammar has
S →+ S α for some α
• Recursive descent does not work in such cases
– It goes into an infinite loop
#12
What's Wrong With That Picture?
#13
Elimination of Left Recursion
• Consider the left-recursive grammar
S → S α | β
• S generates all strings starting with a β and
followed by a number of α
• Can rewrite using right-recursion
S → β T
T → α T | ε
#14
Example of
Eliminating Left Recursion
• Consider the grammar
S → 1 | S 0
( β = 1 and α = 0 )
• It can be rewritten as
S → 1 T
T → 0 T | ε
#15
More Left Recursion Elimination
• In general
S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of β1,
…,βm and continue with several instances of
α1,…,αn
• Rewrite as
S → β1 T | … | βm T
T → α1 T | … | αn T | ε
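The rewrite above is mechanical enough to sketch in code. The following is an illustrative sketch only: the function name `eliminate_left_recursion`, the list-of-symbols encoding of right-hand sides, and the caller-supplied fresh name are assumptions, and it handles only immediate left recursion with non-empty α.

```python
def eliminate_left_recursion(nt, productions, fresh):
    """Rewrite S -> S a1 | ... | S an | b1 | ... | bm without left recursion.

    nt:          the non-terminal S
    productions: right-hand sides of S, each a list of symbols
    fresh:       an unused non-terminal name to play the role of T
    Returns a dict mapping each non-terminal to its new right-hand sides;
    the empty list [] stands for epsilon.
    """
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: productions}  # nothing to do
    return {
        nt: [beta + [fresh] for beta in betas],                # S -> bi T
        fresh: [alpha + [fresh] for alpha in alphas] + [[]],   # T -> ai T | eps
    }

# The earlier example S -> 1 | S 0 becomes S -> 1 T, T -> 0 T | eps.
rewritten = eliminate_left_recursion("S", [["1"], ["S", "0"]], "T")
```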
#16
General Left Recursion
• The grammar
S → A α | δ
A → S β
is also left-recursive because
S →+ S β α
• This left-recursion can also be eliminated
• See book, Section 2.3
• Detecting and eliminating left recursion are
popular test questions
#17
Summary of Recursive Descent
• Simple and general parsing strategy
– Left-recursion must be eliminated first
– … but that can be done automatically
• Unpopular because of backtracking
– Thought to be too inefficient (repetition)
• We can avoid backtracking
– Sometimes ...
#18
Predictive Parsers
• Like recursive descent but parser can
“predict” which production to use
– By looking at the next few tokens
– No backtracking
• Predictive parsers accept LL(k) grammars
– First L means “left-to-right” scan of input
– Second L means “leftmost derivation”
– The k means “predict based on k tokens of
lookahead”
• In practice, LL(1) is used
#19
Sometimes Things Are Perfect
• The “.ml-lex” format you emit in PA2
• Will be the input for PA3
– actually the reference “.ml-lex” will be used
• It can be “parsed” with no lookahead
– You always know just what to do next
• Ditto with the “.ml-ast” output of PA3
• Just write a few mutually-recursive functions
• They read in the input, one line at a time
#20
LL(1)
• In recursive descent, for each non-terminal
and input token there may be a choice of
which production to use
• LL(1) means that for each non-terminal and
token there is only one production that could
lead to success
• Can be specified as a 2D table
– One dimension for current non-terminal to
expand
– One dimension for next token
– Each table entry contains one production
#21
Predictive Parsing
and Left Factoring
• Recall the grammar
E → T + E | T
T → int | int * T | ( E )
• Impossible to predict because
– For T two productions start with int
– For E it is not clear how to predict
• A grammar must be left-factored before use
for predictive parsing
#22
Left-Factoring Example
• Recall the grammar
E → T + E | T
T → int | int * T | ( E )
• Factor out common prefixes of productions
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
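One round of this factoring can be sketched as follows. This is a hedged illustration, not a complete algorithm: `left_factor`, `common_prefix`, and the `fresh` iterator of unused names are hypothetical, right-hand sides are lists of symbols, and the rules generated for the new non-terminal may themselves need another round in general.

```python
def common_prefix(seqs):
    """Longest list of symbols that every sequence in seqs starts with."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(nt, productions, fresh):
    """Factor out common prefixes among the alternatives of nt (one round).

    productions: right-hand sides of nt, each a list of symbols ([] is epsilon)
    fresh:       iterator yielding unused non-terminal names
    """
    groups = {}
    for rhs in productions:
        # Group alternatives by their first symbol (None for epsilon).
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
    result = {nt: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[nt].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        new_nt = next(fresh)
        result[nt].append(prefix + [new_nt])
        result[new_nt] = [rhs[len(prefix):] for rhs in group]
    return result

factored_E = left_factor("E", [["T", "+", "E"], ["T"]], iter(["X"]))
factored_T = left_factor("T", [["int"], ["int", "*", "T"], ["(", "E", ")"]],
                         iter(["Y"]))
```

Applied to the example, this reproduces E → T X, X → + E | ε, T → int Y | ( E ), and Y → ε | * T, up to the order of alternatives.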
#23
Introducing: Parse Tables
#24
LL(1) Parsing Table Example
• Left-factored grammar
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
• The LL(1) parsing table ($ is a special end
marker):
    | int   | *   | +   | (     | ) | $
E   | T X   |     |     | T X   |   |
X   |       |     | + E |       | ε | ε
Y   |       | * T | ε   |       | ε | ε
T   | int Y |     |     | ( E ) |   |
#25
LL(1) Parsing Table
Example Analysis
• Consider the [E, int] entry
– “When current non-terminal is E and next input is
int, use production E → T X”
– This production can generate an int in the first
position
#26
LL(1) Parsing Table
Example Analysis
• Consider the [Y,+] entry
– “When current non-terminal is Y and current
token is +, get rid of Y”
– We’ll see later why this is so
#27
LL(1) Parsing Tables: Errors
• Blank entries indicate error situations
– Consider the [E,*] entry
– “There is no way to derive a string starting with *
from non-terminal E”
#28
Using Parsing Tables
• Method similar to recursive descent, except
– For each non-terminal S
– We look at the next token a
– And choose the production shown at [S,a]
• We use a stack to keep track of pending non-
terminals
• We reject when we encounter an error state
• We accept when we encounter end-of-input
#29
LL(1) Parsing Algorithm
initialize stack = <S $>
           next = (pointer to tokens)
repeat
  match stack with
  | <X, rest>: if T[X, *next] = Y1 … Yn        (X is a non-terminal)
               then stack ← <Y1 … Yn rest>
               else error ()
  | <t, rest>: if t == *next++                  (t is a terminal)
               then stack ← <rest>
               else error ()
until stack == < >
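The algorithm above translates almost line for line into a runnable sketch. The table below is the LL(1) table for the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε; the dict encoding, the names `TABLE`, `NONTERMINALS`, and `ll1_parse`, and the choice to return a boolean rather than build a tree are assumptions made for illustration.

```python
# LL(1) table: (non-terminal, next token) -> right-hand side; [] is epsilon.
# Missing keys correspond to the blank (error) entries.
TABLE = {
    ("E", "int"): ["T", "X"], ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"], ("X", ")"): [], ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"], ("Y", "+"): [], ("Y", ")"): [], ("Y", "$"): [],
}
NONTERMINALS = {"E", "X", "T", "Y"}

def ll1_parse(tokens, start="E"):
    """Return True if the token list is accepted, False on any error."""
    tokens = list(tokens) + ["$"]   # append the end marker
    stack = [start, "$"]            # stack[0] is the top of the stack
    pos = 0                         # index of the next unread token
    while stack:
        top = stack.pop(0)
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[pos]))
            if rhs is None:
                return False        # blank table entry
            stack = rhs + stack     # replace X by Y1 ... Yn
        else:
            if top != tokens[pos]:
                return False        # terminal mismatch
            pos += 1                # consume the matched token
    return pos == len(tokens)
```

Calling `ll1_parse(["int", "*", "int"])` performs the same sequence of expansions and matches as the stack trace in the slides that follow.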
#40
Stack        | Input         | Action
E $          | int * int $   | T X
T X $        | int * int $   | int Y
int Y X $    | int * int $   | terminal
Y X $        | * int $       | * T
* T X $      | * int $       | terminal
T X $        | int $         | int Y
int Y X $    | int $         | terminal
Y X $        | $             | ε
X $          | $             | ε
$            | $             | ACCEPT
#41
LL(1) Languages
• LL(1) languages can be LL(1) parsed
– A language Q is LL(1) if there exists an LL(1) table
such that the LL(1) parsing algorithm using that table
accepts exactly the strings in Q
• No table entry can be multiply defined
• Once we have the table
– The parsing algorithm is simple and fast
– No backtracking is necessary
• Want to generate parsing tables from CFG!
Q: Movies (263 / 842)
• This 1982 Star Trek film features
Spock nerve-pinching McCoy, Kirstie
Alley "losing" the Kobayashi Maru ,
and Chekov being mind-controlled
by a slug-like alien. Ricardo
Montalban "is intelligent, but not
experienced. His pattern indicates
two-dimensional thinking."
Q: Music (238 / 842)
• For two of the following four lines from the
1976 Eagles song Hotel California, give
enough words to complete the rhyme.
– So I called up the captain / "please bring me my
wine"
– Mirrors on the ceiling / pink champagne on ice
– And in the master's chambers / they gathered for
the feast
– We are programmed to receive / you can
checkout any time you like,
Q: Books (727 / 842)
•Name 5 of the 9 major
characters in A. A. Milne's 1926
books about a "bear of very
little brain" who composes
poetry and eats honey.
#45
Top-Down Parsing. Review
• Top-down parsing expands a parse tree from
the start symbol to the leaves
– Always expand the leftmost non-terminal
[Figure: partial parse tree with E expanded to T + E, drawn above the input int * int + int]
#46
Top-Down Parsing. Review
• Top-down parsing expands a parse tree from
the start symbol to the leaves
– Always expand the leftmost non-terminal
[Figure: partial parse tree with E expanded to T + E and the first T expanded to int * T, drawn above the input int * int + int]
• The leaves at any point
form a string βAγ
– β contains only terminals
– The input string is βbδ
– The prefix β matches
– The next token is b
#47
Top-Down Parsing. Review
• Top-down parsing expands a parse tree from
the start symbol to the leaves
– Always expand the leftmost non-terminal
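[Figure: partial parse tree with E expanded to T + E, the first T expanded to int * T, the inner T expanded to int, and the right-hand E expanded to T, drawn above the input int * int + int]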
• The leaves at any point
form a string βAγ
– β contains only terminals
– The input string is βbδ
– The prefix β matches
– The next token is b
#48
Top-Down Parsing. Review
• Top-down parsing expands a parse tree from
the start symbol to the leaves
– Always expand the leftmost non-terminal
[Figure: complete parse tree in which the last T is expanded to int, so the leaves read int * int + int]
• The leaves at any point
form a string βAγ
– β contains only terminals
– The input string is βbδ
– The prefix β matches
– The next token is b
#49
Constructing
Predictive Parsing Tables
• Consider the state S →* βAγ
– With b the next token
– Trying to match βbδ
There are two possibilities:
• b belongs to an expansion of A
• Any A → α can be used if b can start a string
derived from α
In this case we say that b ∈ First(α)
Or…
#50
Constructing
Predictive Parsing Tables
• b does not belong to an expansion of A
– The expansion of A is empty and b belongs to an
expansion of γ (e.g., bω)
– Means that b can appear after A in a derivation
of the form S →* βAbω
– We say that b ∈ Follow(A) in this case
– What productions can we use in this case?
• Any A → α can be used if α can expand to ε
• We say that ε ∈ First(A) in this case
#51
Computing First Sets
Definition: First(X) = { b | X →* bα } ∪ { ε | X →* ε }
• First(b) = { b }
• For all productions X → A1 … An
• Add First(A1) – {ε} to First(X). Stop if ε ∉ First(A1)
• Add First(A2) – {ε} to First(X). Stop if ε ∉ First(A2)
• …
• Add First(An) – {ε} to First(X). Stop if ε ∉ First(An)
• Add ε to First(X)
(ignore Ai if it is X)
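The rules above describe a fixed-point computation: keep applying them until no First set grows. The sketch below is one way to write that, assuming the grammar is a dict from non-terminal to a list of right-hand sides (lists of symbols, with `[]` for ε); `GRAMMAR`, `EPS`, and `first_sets` are illustrative names, and the grammar is the left-factored example used throughout these slides.

```python
EPS = "ε"

# Left-factored example grammar; [] is the epsilon production.
GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}

def first_sets(grammar):
    """Compute First(X) for every non-terminal X by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        # First(b) = { b } for a terminal b.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                before = len(first[nt])
                for sym in rhs:
                    first[nt] |= first_of(sym) - {EPS}
                    if EPS not in first_of(sym):
                        break          # "Stop if ε ∉ First(Ai)"
                else:
                    # Every Ai can vanish (or the rule is empty): add ε.
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first

FIRST = first_sets(GRAMMAR)
```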
#52
Example First Set Computation
• Recall the grammar
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
• First sets
First( ( ) = { ( }
First( ) ) = { ) }
First( int ) = { int }
First( + ) = { + }
First( * ) = { * }
First( T ) = { int, ( }
First( E ) = { int, ( }
First( X ) = { +, ε }
First( Y ) = { *, ε }
#53
Computing Follow Sets
Definition: Follow(X) = { b | S →* β X b ω }
• Compute the First sets for all non-terminals first
• Add $ to Follow(S) (if S is the start non-terminal)
• For all productions Y → … X A1 … An
• Add First(A1) – {ε} to Follow(X). Stop if ε ∉ First(A1)
• Add First(A2) – {ε} to Follow(X). Stop if ε ∉ First(A2)
• …
• Add First(An) – {ε} to Follow(X). Stop if ε ∉ First(An)
• Add Follow(Y) to Follow(X)
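These rules are again a fixed-point computation. The sketch below assumes the same dict encoding of the left-factored example grammar and takes the First sets of its non-terminals as given data rather than recomputing them; `follow_sets`, `first_of`, and the constant names are hypothetical. It tracks Follow only for non-terminals, which is all that table construction needs.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}

# First sets of the non-terminals, taken as given for this grammar.
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}

def first_of(sym):
    # A terminal's First set is just itself.
    return FIRST.get(sym, {sym})

def follow_sets(grammar, start):
    """Compute Follow(X) for every non-terminal X by iterating to a fixed point."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rules in grammar.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue       # only non-terminals are tracked here
                    before = len(follow[sym])
                    for nxt in rhs[i + 1:]:
                        follow[sym] |= first_of(nxt) - {EPS}
                        if EPS not in first_of(nxt):
                            break      # "Stop if ε ∉ First(Ai)"
                    else:
                        # Everything after sym can vanish: inherit Follow(lhs).
                        follow[sym] |= follow[lhs]
                    changed |= len(follow[sym]) != before
    return follow

FOLLOW = follow_sets(GRAMMAR, "E")
```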
#54
Example Follow Set Computation
• Recall the grammar
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
• Follow sets
Follow( + ) = { int, ( }
Follow( * ) = { int, ( }
Follow( ( ) = { int, ( }
Follow( ) ) = { +, ), $ }
Follow( int ) = { *, +, ), $ }
Follow( E ) = { ), $ }
Follow( X ) = { $, ) }
Follow( T ) = { +, ), $ }
Follow( Y ) = { +, ), $ }
#55
Constructing LL(1) Parsing Tables
• Here is how to construct a parsing table T for
context-free grammar G
• For each production A → α in G do:
– For each terminal b ∈ First(α) do
• T[A, b] = α
– If α →* ε, for each b ∈ Follow(A) do
• T[A, b] = α
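The construction above can be sketched directly. The example assumes the left-factored grammar used in these slides, encoded as a dict, with the First and Follow sets of its non-terminals supplied as data; `build_table` and `first_of_string` are illustrative names. Raising an error on a second definition of an entry is a design choice of this sketch, made so that a grammar that is not LL(1) is reported instead of silently overwritten.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "X": {")", "$"},
          "T": {"+", ")", "$"}, "Y": {"+", ")", "$"}}

def first_of_string(alpha):
    """First set of a sequence of symbols; contains ε if all of it can vanish."""
    out = set()
    for sym in alpha:
        sym_first = FIRST.get(sym, {sym})   # a terminal's First is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def build_table(grammar):
    """Build T[A, b] = α; raise if any entry would be defined twice."""
    table = {}
    for lhs, rules in grammar.items():
        for alpha in rules:
            f = first_of_string(alpha)
            targets = f - {EPS}
            if EPS in f:                    # α →* ε: also use Follow(A)
                targets |= FOLLOW[lhs]
            for b in targets:
                if (lhs, b) in table:
                    raise ValueError(f"[{lhs}, {b}] multiply defined: not LL(1)")
                table[(lhs, b)] = alpha
    return table

TABLE = build_table(GRAMMAR)
```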
#56
LL(1) Table Construction Example
• Recall the grammar
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
• Where in the row of Y do we put Y → * T ?
– In the columns of First( *T ) = { * }
#57
LL(1) Table Construction Example
• Recall the grammar
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
• Where in the row of Y do we put Y → ε ?
– In the columns of Follow(Y) = { $, +, ) }
#58
Avoid Multiple Definitions!
#59
Notes on LL(1) Parsing Tables
• If any entry is multiply defined then G is not
LL(1)
– If G is ambiguous
– If G is left recursive
– If G is not left-factored
– And in other cases as well
• Most programming language grammars are not
LL(1) (e.g., Java, Ruby, C++, OCaml, Cool, Perl, ...)
• There are tools that build LL(1) tables
#60
Simple Parsing Strategies
• Recursive Descent Parsing
– But backtracking is too annoying, etc.
• Predictive Parsing, aka. LL(k)
– Predict production from k tokens of lookahead
– Build LL(1) table
– Parsing using the table is fast and easy
– But many grammars are not LL(1) (or even LL(k))
• Next: a more powerful parsing strategy for
grammars that are not LL(1)
#61
Homework
• Today: WA1 (written homework) due
– Turn in to drop-box by 1pm.
• Friday: PA2 (Lexer) due
– You may work in pairs.
• Next Tuesday: Chapters 2.3.3
– Optional Wikipedia article


Left factor put

  • 2. #2 Extra Credit Question • Given this grammar G: – E → E + T – E → T – T → T * int – T → int – T → ( E ) • Is the string int * (int + int) in L(G)? – Give a derivation or prove that it is not.
  • 3. #3 Revenge of Theory • How do we tell if DFA P is equal to DFA Q? – We can do: “is DFA P empty?” • How? – We can do: “P := not Q” • How? – We can do: “P := Q intersect R” • How? – So do: “is P intersect not Q empty?” • Does this work for CFG X and CFG Y? • Can we tell if s is in CFG X?
  • 4. #4 Outline • Recursive Descent Parsing • Left Recursion • LL(1) Parsing – LL(1) Parsing Tables – LP(1) Parsing Algorithm • Constructing LL(1) Parsing Tables – First, Follow
  • 5. #5 In One Slide • An LL(1) parser reads tokens from left to right and constructs a top-down leftmost derivation. LL(1) parsing is a special case of recursive descent parsing in which you can predict which single production to use from one token of lookahead. LL(1) parsing is fast and easy, but it does not work if the grammar is ambiguous, left-recursive, or not left- factored (i.e., it does not work for most programming languages).
  • 6. #6 Intro to Top-Down Parsing • Terminals are seen in order of appearance in the token stream: t1 t2 t3 t4 t5 The parse tree is constructed – From the top – From left to right A t1 B C t2 D t3 t4 t4
  • 7. #7 Recursive Descent Parsing • We’ll try recursive descent parsing first – “Try all productions exhaustively, backtrack” • Consider the grammar E → T + E | T T → ( E ) | int | int * T • Token stream is: int * int • Start with top-level non-terminal E • Try the rules for E in order
  • 8. #8 Recursive Descent Example • Try E0 → T1 + E2 • Then try a rule for T1 → ( E3 ) – But ( does not match input token int • Try T1 → int . Token matches. – But + after T1 does not match input token * • Try T1 → int * T2 – This will match but + after T1 will be unmatched • Have exhausted the choices for T1 – Backtrack to choice for E0 E → T + E | T T → ( E ) | int | int * T Input = int * int
  • 9. #9 Recursive Descent Example (2) • Try E0 → T1 • Follow same steps as before for T1 – And succeed with T1 → int * T2 and T2 → int – With the following parse tree E0 T1 int * T2 int E → T + E | T T → ( E ) | int | int * T Input = int * int
  • 10. #10 Recursive Descent Parsing • Parsing: given a string of tokens t1 t2 ... tn, find its parse tree • Recursive descent parsing: Try all the productions exhaustively – At a given moment the fringe of the parse tree is: t1 t2 … tk A … – Try all the productions for A: if A ! BC is a production, the new fringe is t1 t2 … tk B C … – Backtrack when the fringe doesn’t match the string – Stop when there are no more non-terminals
  • 11. #11 When Recursive Descent Does Not Work • Consider a production S → S a: – In the process of parsing S we try the above rule – What goes wrong? • A left-recursive grammar has S →+ Sα for some α Recursive descent does not work in such cases – It goes into an 1 loop
  • 12. #12 What's Wrong With That Picture?
  • 13. #13 Elimination of Left Recursion • Consider the left-recursive grammar S → S α | β • S generates all strings starting with a β and followed by a number of α • Can rewrite using right-recursion S → β T T → α T | ε
  • 14. #14 Example of Eliminating Left Recursion • Consider the grammar S ! 1 | S 0 ( β = 1 and α = 0 ) It can be rewritten as S ! 1 T T ! 0 T | ε
  • 15. #15 More Left Recursion Elimination • In general S → S α1 | … | S αn | β1 | … | βm • All strings derived from S start with one of β1, …,βm and continue with several instances of α1,…,αn • Rewrite as S → β1 T | … | βm T T → α1 T | … | αn T | ε
  • 16. #16 General Left Recursion • The grammar S → A α | δ A → S β is also left-recursive because S →+ S β α • This left-recursion can also be eliminated • See book, Section 2.3 • Detecting and eliminating left recursion are popular test questions
  • 17. #17 Summary of Recursive Descent • Simple and general parsing strategy – Left-recursion must be eliminated first – … but that can be done automatically • Unpopular because of backtracking – Thought to be too inefficient (repetition) • We can avoid backtracking – Sometimes ...
  • 18. #18 Predictive Parsers • Like recursive descent but parser can “predict” which production to use – By looking at the next few tokens – No backtracking • Predictive parsers accept LL(k) grammars – First L means “left-to-right” scan of input – Second L means “leftmost derivation” – The k means “predict based on k tokens of lookahead” • In practice, LL(1) is used
  • 19. #19 Sometimes Things Are Perfect • The “.ml-lex” format you emit in PA2 • Will be the input for PA3 – actually the reference “.ml-lex” will be used • It can be “parsed” with no lookahead – You always know just what to do next • Ditto with the “.ml-ast” output of PA3 • Just write a few mutually-recursive functions • They read in the input, one line at a time
  • 20. #20 LL(1) • In recursive descent, for each non-terminal and input token there may be a choice of which production to use • LL(1) means that for each non-terminal and token there is only one production that could lead to success • Can be specified as a 2D table – One dimension for current non-terminal to expand – One dimension for next token – Each table entry contains one production
  • 21. #21 Predictive Parsing and Left Factoring • Recall the grammar E → T + E | T T → int | int * T | ( E ) • Impossible to predict because – For T two productions start with int – For E it is not clear how to predict • A grammar must be left-factored before use for predictive parsing
  • 22. #22 Left-Factoring Example • Recall the grammar E → T + E | T T → int | int * T | ( E ) • Factor out common prefixes of productions E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε
  • 24. #24 LL(1) Parsing Table Example • Left-factored grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε • The LL(1) parsing table ($ is a special end marker): ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 25. #25 LL(1) Parsing Table Example Analysis • Consider the [E, int] entry – “When current non-terminal is E and next input is int, use production E → T X” – This production can generate an int in the first position ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 26. #26 LL(1) Parsing Table Example Analysis • Consider the [Y,+] entry – “When current non-terminal is Y and current token is +, get rid of Y” – We’ll see later why this is so ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 27. #27 LL(1) Parsing Tables: Errors • Blank entries indicate error situations – Consider the [E,*] entry – “There is no way to derive a string starting with * from non-terminal E” ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 28. #28 Using Parsing Tables • Method similar to recursive descent, except – For each non-terminal S – We look at the next token a – And choose the production shown at [S,a] • We use a stack to keep track of pending non- terminals • We reject when we encounter an error state • We accept when we encounter end-of-input
  • 29. #29 LL(1) Parsing Algorithm initialize stack = <S $> next = (pointer to tokens) repeat match stack with | <X, rest>:if T[X,*next] = Y1…Yn then stack ← <Y1… Yn rest> else error () | <t, rest>:if t == *next ++ then stack ← <rest> else error () until stack == < >
  • 30. #30 Stack Input Action ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 31. #31 Stack Input Action E $ int * int $ T X ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 32. #32 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 33. #33 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 34. #34 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 35. #35 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 36. #36 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal T X $ int $ int Y ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 37. #37 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal T X $ int $ int Y int Y X $ int $ terminal ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 38. #38 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal T X $ int $ int Y int Y X $ int $ terminal Y X $ $ ε ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 39. #39 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal T X $ int $ int Y int Y X $ int $ terminal Y X $ $ ε X $ $ ε ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 40. #40 Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal T X $ int $ int Y int Y X $ int $ terminal Y X $ $ ε X $ $ ε $ $ ACCEPT ( E )int YT εεε* TY εε+ EX T XT XE $)(+*int
  • 41. #41 LL(1) Languages • LL(1) languages can be LL(1) parsed – A language Q is LL(1) if there exists an LL(1) table such the LL(1) parsing algorithm using that table accepts exactly the strings in Q • No table entry can be multiply defined • Once we have the table – The parsing algorithm is simple and fast – No backtracking is necessary • Want to generate parsing tables from CFG!
  • 42. Q: Movies (263 / 842) • This 1982 Star Trek film features Spock nerve-pinching McCoy, Kirstie Alley "losing" the Kobayashi Maru , and Chekov being mind-controlled by a slug-like alien. Ricardo Montalban is "is intelligent, but not experienced. His pattern indicates two-dimensional thinking."
  • 43. Q: Music (238 / 842) • For two of the following four lines from the 1976 Eagles song Hotel California, give enough words to complete the rhyme. – So I called up the captain / "please bring me my wine" – Mirrors on the ceiling / pink champagne on ice – And in the master's chambers / they gathered for the feast – We are programmed to receive / you can checkout any time you like,
  • 44. Q: Books (727 / 842) •Name 5 of the 9 major characters in A. A. Milne's 1926 books about a "bear of very little brain" who composes poetry and eats honey.
  • 45. #45 Top-Down Parsing. Review • Top-down parsing expands a parse tree from the start symbol to the leaves – Always expand the leftmost non-terminal • [Figure: partial parse tree over the input int * int + int; the root E is expanded to T + E]
  • 46. #46 Top-Down Parsing. Review • Top-down parsing expands a parse tree from the start symbol to the leaves – Always expand the leftmost non-terminal • [Figure: same tree over int * int + int; the leftmost T is expanded to int * T] • The leaves at any point form a string βAγ – β contains only terminals – The input string is βbδ – The prefix β matches – The next token is b
  • 47. #47 Top-Down Parsing. Review • Top-down parsing expands a parse tree from the start symbol to the leaves – Always expand the leftmost non-terminal • [Figure: same tree over int * int + int; the inner T is expanded to int and the right-hand E is expanded to T] • The leaves at any point form a string βAγ – β contains only terminals – The input string is βbδ – The prefix β matches – The next token is b
  • 48. #48 Top-Down Parsing. Review • Top-down parsing expands a parse tree from the start symbol to the leaves – Always expand the leftmost non-terminal • [Figure: completed tree; the last T is expanded to int, so the leaves read int * int + int] • The leaves at any point form a string βAγ – β contains only terminals – The input string is βbδ – The prefix β matches – The next token is b
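The review slides step through a leftmost derivation of int * int + int. As a hedged illustration, the sketch below replays that derivation by always rewriting the leftmost non-terminal. The function name `leftmost_derivation` and the list `CHOICES` are assumptions made for this example; the grammar is the earlier one, E → T + E | T and T → ( E ) | int | int * T.

```python
NONTERMINALS = {"E", "T"}

# Productions applied in order, one per step of the review slides.
CHOICES = [
    ("E", ["T", "+", "E"]),
    ("T", ["int", "*", "T"]),
    ("T", ["int"]),
    ("E", ["T"]),
    ("T", ["int"]),
]


def leftmost_derivation(start, choices):
    """Apply each production to the leftmost non-terminal; return all forms."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in choices:
        # Index of the leftmost non-terminal: this is A in the form β A γ.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"leftmost non-terminal is {form[i]}, not {lhs}")
        form[i:i + 1] = rhs
        forms.append(" ".join(form))
    return forms
```

At every step the symbols to the left of index `i` are terminals only, which is the prefix β that must match the input.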
  • 49. #49 Constructing Predictive Parsing Tables • Consider the state S →* βAγ – With b the next token – Trying to match βbδ There are two possibilities: • b belongs to an expansion of A • Any A → α can be used if b can start a string derived from α. In this case we say that b ∈ First(α). Or…
  • 50. #50 Constructing Predictive Parsing Tables • b does not belong to an expansion of A – The expansion of A is empty and b belongs to an expansion of γ (e.g., bω) – Means that b can appear after A in a derivation of the form S →* βAbω – We say that b ∈ Follow(A) in this case – What productions can we use in this case? • Any A → α can be used if α can expand to ε • We say that ε ∈ First(A) in this case
  • 51. #51 Computing First Sets Definition First(X) = { b | X →* bα} ∪ {ε | X →* ε} • First(b) = { b } • For all productions X → A1 … An • Add First(A1) – {ε} to First(X). Stop if ε ∉ First(A1) • Add First(A2) – {ε} to First(X). Stop if ε ∉ First(A2) • … • Add First(An) – {ε} to First(X). Stop if ε ∉ First(An) • Add ε to First(X) (ignore Ai if it is X)
  • 52. #52 Example First Set Computation • Recall the grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε • First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {int, ( } First( int) = { int } First( X ) = {+, ε } First( + ) = { + } First( Y ) = {*, ε } First( * ) = { * }
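The First sets listed for the non-terminals can be reproduced by repeating the "add First(Ai) – {ε}, stop if ε ∉ First(Ai)" rule until nothing changes. This is a minimal sketch under assumptions: the dictionary encoding of the grammar and the names `GRAMMAR`, `EPS`, and `first_sets` are illustrative, and an empty list stands for an ε-production.

```python
EPS = "ε"

# E -> T X,  X -> + E | ε,  T -> ( E ) | int Y,  Y -> * T | ε
GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}


def first_sets(grammar):
    """Compute First(X) for every non-terminal X by fixed-point iteration."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                before = len(first[nt])
                for sym in rhs:
                    # First of a terminal b is just { b }.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        break          # stop: this symbol cannot vanish
                else:
                    first[nt].add(EPS)  # every symbol (or none) can vanish
                if len(first[nt]) != before:
                    changed = True
    return first
```

The loop terminates because sets only grow and there are finitely many terminals to add.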
  • 53. #53 Computing Follow Sets Definition Follow(X) = { b | S →* β X b ω } • Compute the First sets for all non-terminals first • Add $ to Follow(S) (if S is the start non-terminal) • For all productions Y → … X A1 … An • Add First(A1) – {ε} to Follow(X). Stop if ε ∉ First(A1) • Add First(A2) – {ε} to Follow(X). Stop if ε ∉ First(A2) • … • Add First(An) – {ε} to Follow(X). Stop if ε ∉ First(An) • Add Follow(Y) to Follow(X)
  • 54. #54 Example Follow Set Computation • Recall the grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε • Follow sets Follow( + ) = { int, ( } Follow( * ) = { int, ( } Follow( ( ) = { int, ( } Follow( E ) = {), $} Follow( X ) = {$, ) } Follow( T ) = {+, ) , $} Follow( ) ) = {+, ) , $} Follow( Y ) = {+, ) , $} Follow( int) = {*, +, ) , $}
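The Follow sets of the non-terminals can be checked the same way. The sketch below is an assumption-laden illustration rather than the lecture's code: `follow_sets` is a hypothetical name, the First sets are copied in by hand from the example slide, and only non-terminals are tracked (the slide also lists Follow sets for terminals, which table construction does not need).

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
# First sets of the non-terminals, transcribed from the example slide.
FIRST = {
    "E": {"int", "("}, "T": {"int", "("},
    "X": {"+", EPS},   "Y": {"*", EPS},
}


def follow_sets(grammar, first, start):
    """Compute Follow(X) for every non-terminal X by fixed-point iteration."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue        # terminals are skipped in this sketch
                    before = len(follow[sym])
                    for nxt in rhs[i + 1:]:
                        nxt_first = first[nxt] if nxt in grammar else {nxt}
                        follow[sym] |= nxt_first - {EPS}
                        if EPS not in nxt_first:
                            break       # stop: something must follow here
                    else:
                        # Everything after sym can vanish: inherit Follow(lhs).
                        follow[sym] |= follow[lhs]
                    if len(follow[sym]) != before:
                        changed = True
    return follow
```

The `else` branch corresponds to the final rule "Add Follow(Y) to Follow(X)", which applies only when none of the earlier steps stopped.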
  • 55. #55 Constructing LL(1) Parsing Tables • Here is how to construct a parsing table T for context-free grammar G • For each production A → α in G do: – For each terminal b ∈ First(α) do • T[A, b] = α – If α →* ε, for each b ∈ Follow(A) do • T[A, b] = α
  • 56. #56 LL(1) Table Construction Example • Recall the grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε • Where in the row of Y do we put Y → * T ? – In the columns of First( *T ) = { * }
            int     *       +       (       )       $
      E     T X                     T X
      X                     + E             ε       ε
      T     int Y                   ( E )
      Y             * T     ε               ε       ε
  • 57. #57 LL(1) Table Construction Example • Recall the grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε • Where in the row of Y do we put Y → ε ? – In the columns of Follow(Y) = { $, +, ) }
            int     *       +       (       )       $
      E     T X                     T X
      X                     + E             ε       ε
      T     int Y                   ( E )
      Y             * T     ε               ε       ε
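The construction rule (fill T[A, b] for b ∈ First(α), and for b ∈ Follow(A) when α →* ε) can be sketched in a few lines. This is an illustrative sketch, not a definitive implementation: `first_of_string` and `build_table` are hypothetical names, and the First and Follow sets are copied by hand from the example slides rather than computed.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
# Transcribed from the First and Follow example slides (non-terminals only).
FIRST = {
    "E": {"int", "("}, "T": {"int", "("},
    "X": {"+", EPS},   "Y": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"},      "X": {")", "$"},
    "T": {"+", ")", "$"}, "Y": {"+", ")", "$"},
}


def first_of_string(alpha, first):
    """First(α) for a sequence of symbols; contains ε iff α →* ε."""
    result = set()
    for sym in alpha:
        sym_first = first.get(sym, {sym})   # a terminal's First is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Build T[A, b]; raise ValueError if any entry is multiply defined."""
    table = {}
    for lhs, productions in grammar.items():
        for rhs in productions:
            alpha_first = first_of_string(rhs, first)
            columns = alpha_first - {EPS}
            if EPS in alpha_first:
                columns |= follow[lhs]
            for b in columns:
                if (lhs, b) in table:
                    raise ValueError(f"T[{lhs}, {b}] is multiply defined")
                table[(lhs, b)] = rhs
    return table
```

For this grammar the result has eleven entries, with Y → * T in column * and Y → ε in the columns of Follow(Y).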
  • 59. #59 Notes on LL(1) Parsing Tables • If any entry is multiply defined then G is not LL(1). This happens – If G is ambiguous – If G is left recursive – If G is not left-factored – And in other cases as well • Most programming language grammars are not LL(1) (e.g., Java, Ruby, C++, OCaml, Cool, Perl, ...) • There are tools that build LL(1) tables
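To see why left recursion yields a multiply defined entry, consider a small left-recursive grammar chosen for illustration: E → E + T | T and T → int. Both productions for E start with int, so both land in T[E, int]. The sketch below is a simplified assumption-based check: `find_conflicts` is a hypothetical name, the First sets are written out by hand, and since this grammar has no ε-productions, First(α) is simply First of the first symbol of α.

```python
# Hypothetical left-recursive grammar: E -> E + T | T,  T -> int
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["int"]],
}
# Worked out by hand; there are no ε-productions in this grammar.
FIRST = {"E": {"int"}, "T": {"int"}}


def find_conflicts(grammar, first):
    """Return (A, b, earlier_rhs, later_rhs) for each multiply defined T[A, b]."""
    seen = {}
    clashes = []
    for lhs, productions in grammar.items():
        for rhs in productions:
            # Without nullable symbols, First(α) is First of α's first symbol.
            head = rhs[0]
            for b in first.get(head, {head}):
                if (lhs, b) in seen:
                    clashes.append((lhs, b, seen[(lhs, b)], rhs))
                else:
                    seen[(lhs, b)] = rhs
    return clashes
```

One token of lookahead cannot choose between E → E + T and E → T, which is why such a grammar must be rewritten before an LL(1) table can be built for it.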
  • 60. #60 Simple Parsing Strategies • Recursive Descent Parsing – But backtracking is too annoying, etc. • Predictive Parsing, a.k.a. LL(k) – Predict production from k tokens of lookahead – Build LL(1) table – Parsing using the table is fast and easy – But many grammars are not LL(1) (or even LL(k)) • Next: a more powerful parsing strategy for grammars that are not LL(1)
  • 61. #61 Homework • Today: WA1 (written homework) due – Turn in to drop-box by 1pm. • Friday: PA2 (Lexer) due – You may work in pairs. • Next Tuesday: Chapters 2.3.3 – Optional Wikipedia article