2. Outline
Role of parser
Context free grammars
Top down parsing
Bottom up parsing
Parser generators
3. The role of parser
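[Figure: source program → lexical analyzer → token → parser → parse tree → rest of front end → intermediate representation; the parser requests tokens with getNextToken, and both the lexical analyzer and the parser use the symbol table]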
4. The role of parser
• The parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar for the source language. It reports any syntax errors and recovers from commonly occurring errors so that it can continue processing the rest of the program.
• The parser constructs a parse tree and passes it to the rest of the compiler for further processing.
5. Error handling
Common programming errors
Lexical errors: misspellings of identifiers, keywords, or operators
Syntactic errors: a misplaced ‘;’ or an extra/missing ‘}’
Semantic errors: type mismatches between an operator and its operands
Logical errors: using ‘=’ instead of ‘==’
Error handler goals
Report the presence of errors clearly and accurately
Recover from each error quickly enough to detect subsequent errors
Add minimal overhead to the processing of correct programs
6. Error-recovery strategies
Panic mode recovery
Discard input symbols one at a time until one of a designated set of synchronizing tokens (such as ‘;’ or ‘}’) is found
Phrase level recovery
Replace a prefix of the remaining input by some string that allows the parser to continue
Error productions
Augment the grammar with productions that generate the erroneous constructs
Global correction
Choose a minimal sequence of changes to obtain a globally least-cost correction; this takes too much effort in practice and is mainly of theoretical interest
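Panic-mode recovery is simple enough to sketch directly. The snippet below is a minimal illustration, assuming tokens are plain strings and that the synchronizing set is {‘;’, ‘}’}; the function name `panic_mode_skip` is hypothetical. After an error is detected at some token, the parser discards symbols until it reaches a synchronizing token and resumes from there.

```python
# Sketch of panic-mode recovery. Tokens are plain strings and the
# synchronizing set {';', '}'} is an illustrative choice.
SYNC_TOKENS = {";", "}"}


def panic_mode_skip(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens from index `pos` until a synchronizing token is found.

    Returns (index of the synchronizing token or len(tokens), discarded tokens).
    """
    discarded = []
    while pos < len(tokens) and tokens[pos] not in sync:
        discarded.append(tokens[pos])  # drop one input symbol at a time
        pos += 1
    return pos, discarded


# The error is detected at the unexpected '*' (index 4); the parser then
# resynchronizes at the ';' and can resume with the next statement.
tokens = ["x", "=", "3", "+", "*", "4", ";", "y", "=", "1", ";"]
print(panic_mode_skip(tokens, 4))  # (6, ['*', '4'])
```

The design trade-off is visible here: recovery is cheap and cannot loop forever, but everything between the error and the synchronizing token is skipped without further checking.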
7. Context free grammars (CFG)
1. Terminals (or token names): e.g., id
2. Nonterminals: denote sets of strings, e.g., expression, term and factor
3. Start symbol: by convention, the head of the first production; here, it is expression
4. Productions: specify the manner in which terminals and nonterminals are combined to form strings
Example of CFG:
expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
Short form:
E -> E + T | T
T -> T * F | F
F -> (E) | id
Non-left-recursive grammar:
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
Grammars describe the syntax of programming language constructs like expressions and statements.
8. Derivations
Productions are treated as rewriting rules to generate a string
E -> E + E | E * E | -E | (E) | id
Derivation for -(id+id):
E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
Leftmost derivations: the leftmost nonterminal in each sentential form is always chosen. The derivation above is leftmost.
Rightmost derivations: the rightmost nonterminal is always chosen, e.g.,
E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
Exercise: find a derivation for (id+id*id) using the above grammar
9. E -> E + E | E * E | -E | (E) | id
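Input: (id+id*id)
Leftmost derivation 1:
E => (E)
  => (E*E)
  => (E+E*E)
  => (id+E*E)
  => (id+id*E)
  => (id+id*id)
Leftmost derivation 2:
E => (E)
  => (E+E)
  => (id+E)
  => (id+E*E)
  => (id+id*E)
  => (id+id*id)
Both are leftmost derivations of the same string; the existence of two different ones shows that this grammar is ambiguous.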
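The two derivations above can be replayed mechanically. This is a small sketch, assuming sentential forms are lists of symbols and that each step names only the right-hand side applied to the leftmost E; `leftmost_step` and `derive` are hypothetical helper names. Because the grammar has a single nonterminal, "leftmost" fully determines where each production is applied.

```python
# Replay a leftmost derivation for E -> E + E | E * E | -E | (E) | id.
# Each step names the right-hand side used for the leftmost E.
NONTERMINALS = {"E"}


def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal in `form` by `rhs` (both symbol lists)."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


def derive(steps, start=("E",)):
    """Return every sentential form of the derivation as a string."""
    form = list(start)
    forms = ["".join(form)]
    for rhs in steps:
        form = leftmost_step(form, rhs)
        forms.append("".join(form))
    return forms


PAREN, PLUS, TIMES, ID = ["(", "E", ")"], ["E", "+", "E"], ["E", "*", "E"], ["id"]

d1 = derive([PAREN, TIMES, PLUS, ID, ID, ID])  # leftmost derivation 1
d2 = derive([PAREN, PLUS, ID, TIMES, ID, ID])  # leftmost derivation 2
print(" => ".join(d1))
print(" => ".join(d2))
```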
10. Parse trees
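A parse tree is a graphical representation of a derivation; it records which productions were applied, but not the order in which they were applied. For example, for
E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
the root E has children - and E; that E has children (, E and ); the inner E has children E, + and E, and each of those two Es has the single child id.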
11. Ambiguity
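A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Equivalently, an ambiguous grammar has more than one leftmost derivation, or more than one rightmost derivation, for some sentence.
Example: id+id*id has two parse trees under E -> E + E | E * E | -E | (E) | id, one with + at the root and one with * at the root.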
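Counting parse trees makes the ambiguity concrete. The sketch below is an illustration only, assuming the input is already split into tokens; `count_parse_trees` is a hypothetical name. It counts, with memoized recursion over token spans, how many distinct parse trees E -> E + E | E * E | -E | (E) | id assigns to a string. Any count greater than 1 witnesses ambiguity.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` for E -> E + E | E * E | -E | (E) | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of distinct parse trees deriving toks[i:j] from E
        n = 0
        if j - i == 1 and toks[i] == "id":
            n += 1                                  # E -> id
        if j - i >= 2 and toks[i] == "-":
            n += count(i + 1, j)                    # E -> -E
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            n += count(i + 1, j - 1)                # E -> (E)
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)  # E -> E op E, op at k
        return n

    return count(0, len(toks))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

Parenthesizing the input, as in (id+id)*id, leaves only one tree, which is why the example string needs no parentheses to expose the ambiguity.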
13. Elimination of ambiguity
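if E1 then if E2 then S1 else S2
has two parse trees under the grammar stmt -> if expr then stmt | if expr then stmt else stmt | other, so that grammar is ambiguous (the "dangling else" problem). The parse tree that attaches the else to the inner if is preferred in programming languages with conditional statements of this form. The rule is: "Match each else with the closest unmatched then."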
14. Elimination of ambiguity (cont.)
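Idea:
1) end each ‘if’ with an ‘endif’
2) a statement appearing between a then and an else must be matched, i.e., it must not end with an unmatched then
An unambiguous grammar based on idea 2:
stmt -> matched_stmt | open_stmt
matched_stmt -> if expr then matched_stmt else matched_stmt | other
open_stmt -> if expr then stmt | if expr then matched_stmt else open_stmt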
15. Top-Down Parsing
LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing
Grammar:
E -> T + T
T -> ( E )
T -> - E
T -> id
Leftmost derivation:
E =>lm T + T
  =>lm id + T
  =>lm id + id
[Figure: the parse tree for id + id grown top-down from the root E, one leftmost expansion per step]
16. Elimination of left recursion
A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α.
Top-down parsing methods cannot handle left-recursive grammars: a procedure for A that starts by expanding A -> Aα calls itself again without consuming any input, so it never terminates.
A simple rule for direct left recursion elimination:
For a rule like A -> Aα | β,
we may replace it with A -> βA’ and A’ -> αA’ | ɛ
Example: eliminating direct left recursion from E -> E + T | T, T -> T * F | F, F -> (E) | id gives:
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
17. A -> Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn   (no βi begins with A)
Eliminating left recursion:
A -> β1A1 | β2A1 | … | βnA1
A1 -> α1A1 | α2A1 | … | αmA1 | Ɛ
E-> E+T | E-T | T
T -> T*F | T/F | F
F -> (E) | id
Eliminating Left Recursion
E -> T E1
E1 -> +TE1 | -TE1 | Ɛ
T -> F T1
T1 -> *FT1 | /FT1 | Ɛ
F -> (E) | id
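The general rule on this slide translates directly into code. Below is a sketch, assuming right-hand sides are lists of symbols with `[]` standing for ε and that the caller supplies the new nonterminal name (`E1` here, as on the slide); `eliminate_immediate_left_recursion` and `show` are hypothetical names. The αs are the tails of the left-recursive alternatives and the βs are all the others.

```python
def eliminate_immediate_left_recursion(nt, alternatives, new_nt):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
         A      -> b1 new_nt | ... | bn new_nt
         new_nt -> a1 new_nt | ... | am new_nt | ε
    Right-hand sides are lists of symbols; [] stands for ε.
    """
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nt]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != nt]
    if not alphas:                      # no immediate left recursion
        return {nt: alternatives}
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


def show(grammar):
    for lhs, alts in grammar.items():
        print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alts))


# E -> E + T | E - T | T from the example above
show(eliminate_immediate_left_recursion(
    "E", [["E", "+", "T"], ["E", "-", "T"], ["T"]], "E1"))
# E -> T E1
# E1 -> + T E1 | - T E1 | ε
```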
18. Left recursion elimination (cont.)
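There are cases like the following:
S -> Aa
A -> Sb | c
Here S is left recursive (S => Aa => Sba) but not immediately left recursive. Substituting S -> Aa into A -> Sb changes the A-productions to A -> Aab | c, whose immediate left recursion can then be removed.
Left recursion elimination algorithm (for grammars with no cycles and no ε-productions):
Arrange the nonterminals in some order A1, A2, …, An.
for i := 1 to n do
  for j := 1 to i-1 do
    replace each production of the form Ai -> Aj γ by Ai -> δ1 γ | δ2 γ | … | δk γ,
    where Aj -> δ1 | δ2 | … | δk are all the current Aj-productions
  end
  eliminate the immediate left recursion among the Ai-productions
end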
19. Example Left Recursion Elim.
A -> B C | a
B -> C A | A b
C -> A B | C C | a
Choose arrangement: A, B, C
i = 1: nothing to do
i = 2, j = 1: B -> C A | A b
  becomes B -> C A | B C b | a b
  (imm) B -> C A BR | a b BR
  BR -> C b BR | ε
i = 3, j = 1: C -> A B | C C | a
  becomes C -> B C B | a B | C C | a
i = 3, j = 2: C -> B C B | a B | C C | a
  becomes C -> C A BR C B | a b BR C B | a B | C C | a
  (imm) C -> a b BR C B CR | a B CR | a CR
  CR -> A BR C B CR | C CR | ε
Note that A is left recursive only indirectly:
A => B C
  => A b C
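The ordering-based procedure traced in this example (first substitute the productions of earlier nonterminals, then remove immediate left recursion) can be sketched as below. This is an illustration under stated assumptions: the grammar is a dict from nonterminal to a list of symbol lists, it has no cycles or ε-productions, and new nonterminals are named `<A>R` to match BR and CR above. `eliminate_left_recursion` is a hypothetical name.

```python
def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion, processing `order` in turn.

    grammar: dict nonterminal -> list of right-hand sides (lists of symbols);
    assumed free of cycles and epsilon-productions. [] in the output means ε.
    """
    g = {nt: [list(rhs) for rhs in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # replace Ai -> Aj gamma by Ai -> delta gamma for every Aj -> delta
            new_alts = []
            for rhs in g[ai]:
                if rhs and rhs[0] == aj:
                    new_alts.extend(delta + rhs[1:] for delta in g[aj])
                else:
                    new_alts.append(rhs)
            g[ai] = new_alts
        # eliminate immediate left recursion among the Ai-productions
        alphas = [rhs[1:] for rhs in g[ai] if rhs and rhs[0] == ai]
        if alphas:
            betas = [rhs for rhs in g[ai] if not rhs or rhs[0] != ai]
            new_nt = ai + "R"
            g[ai] = [beta + [new_nt] for beta in betas]
            g[new_nt] = [alpha + [new_nt] for alpha in alphas] + [[]]
    return g


example = {
    "A": [["B", "C"], ["a"]],
    "B": [["C", "A"], ["A", "b"]],
    "C": [["A", "B"], ["C", "C"], ["a"]],
}
for lhs, alts in eliminate_left_recursion(example, ["A", "B", "C"]).items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alts))
```

The result should match the hand computation above; the order of the nonterminals changes the shape of the output grammar but not the language.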
20. 1) A -> ABd | Aa | a
B -> Be | b
After eliminating left recursion:
A -> aA1
A1 -> BdA1 | aA1 | ε
B -> bB1
B1 -> eB1 | ε
2) S -> a | ^ | (T)
T -> T, S | S
After eliminating left recursion:
S -> a | ^ | (T)
T -> ST1
T1 -> , ST1 | ε
3) S -> Sa | Sb | c | d
After eliminating left recursion:
S -> cS1 | dS1
S1 -> aS1 | bS1 | ε
21. Left factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing.
Consider the following grammar:
stmt -> if expr then stmt else stmt
      | if expr then stmt
On seeing the input token if, the parser cannot tell which production to use.
We can easily perform left factoring:
If we have A -> αβ1 | αβ2, then we replace it with
A -> αA’
A’ -> β1 | β2
22. Left factoring (cont.)
Algorithm
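For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ɛ, then replace all of the A-productions A -> αβ1 | αβ2 | … | αβn | γ, where γ represents all alternatives that do not begin with α, by
A -> αA’ | γ
A’ -> β1 | β2 | … | βn
Repeat until no two alternatives of a nonterminal have a common prefix.
Example:
S -> i E t S | i E t S e S | a changes to S -> iEtSS’ | a and S’ -> eS | ɛ
E -> b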
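The algorithm above can be sketched as a fixpoint loop. This is a simple variant, not necessarily the exact procedure intended on the slide: it groups the alternatives of a nonterminal by their first symbol, factors out the common prefix of each group, and repeats until no two alternatives share a first symbol, which yields a left-factored grammar. Grammars are dicts of symbol lists with `[]` for ε, and new nonterminals get primes appended (S’, S’’, …); `left_factor` and `common_prefix` are hypothetical names.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return prefix


def left_factor(grammar):
    """Left-factor {nonterminal: [rhs, ...]} until no two alternatives of a
    nonterminal start with the same symbol. [] stands for ε."""
    g = {nt: [list(rhs) for rhs in alts] for nt, alts in grammar.items()}
    changed = True
    while changed:
        changed = False
        for nt in list(g):
            groups = {}                       # first symbol -> alternatives
            for rhs in g[nt]:
                if rhs:
                    groups.setdefault(rhs[0], []).append(rhs)
            for group in groups.values():
                if len(group) < 2:
                    continue
                alpha = common_prefix(group)  # at least one symbol long
                new_nt = nt + "'"
                while new_nt in g:
                    new_nt += "'"
                gammas = [rhs for rhs in g[nt] if rhs not in group]
                g[nt] = [alpha + [new_nt]] + gammas              # A -> alpha A' | gamma
                g[new_nt] = [rhs[len(alpha):] for rhs in group]  # A' -> beta1 | ...
                changed = True
                break
            if changed:
                break
    return g


factored = left_factor({"S": [list("iEtS"), list("iEtSeS"), ["a"]], "E": [["b"]]})
for lhs, alts in factored.items():
    print(lhs, "->", " | ".join("".join(rhs) or "ε" for rhs in alts))
```

Repeating the loop is what handles cases like exercise 4 below, where the new nonterminal itself needs to be factored again.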
23. 1) A -> αβ1 | αβ2 | αβ3
After left factoring:
A -> αA1
A1 -> β1 | β2 | β3
2) S -> iEtS | iEtSeS | a
E -> b
After left factoring:
S -> iEtSS1 | a
S1 -> eS | ε
E -> b
3) A -> aAB | aBc | aAc
4) S -> bSSaaS | bSSaSb | bSb | a
5) S -> aSSbS | aSaSb | abb | b
24. 4) S -> bSSaaS | bSSaSb | bSb | a
After left factoring:
S -> bSS1 | a
S1 -> SaaS | SaSb | b
Left factoring S1 again:
S1 -> SaS2 | b
S2 -> aS | Sb
Final answer:
S -> bSS1 | a
S1 -> SaS2 | b
S2 -> aS | Sb
26. Introduction
A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right.
It can also be viewed as finding a leftmost derivation for an input string.
Example: id+id*id
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
[Figure: successive snapshots of the top-down parse tree for id+id*id, one leftmost (lm) expansion per step: E; E -> T E’; T -> F T’; F -> id; T’ -> Ɛ; E’ -> + T E’]
27. Predictive Parsing
Eliminate left recursion from the grammar
Left factor the grammar
Compute FIRST and FOLLOW
Two variants:
Recursive (recursive calls)
Non-recursive (table-driven)
28. First and Follow
First(α): the set of terminals that begin strings derived from α.
If α =>* ɛ, then ɛ is also in First(α).
In predictive parsing, when we have A -> α | β and First(α) and First(β) are disjoint sets, we can select the appropriate A-production by looking at the next input symbol.
Follow(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form.
If we have S =>* αAaβ for some α and β, then a is in Follow(A).
If A can be the rightmost symbol in some sentential form, then $ is in Follow(A).
29. Computing First
To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ɛ can be added to any First set:
1. If X is a terminal, then First(X) = {X}.
2. If X is a nonterminal and X -> Y1Y2…Yk is a production for some k >= 1, then place a in First(X) if, for some i, a is in First(Yi) and ɛ is in all of First(Y1), …, First(Yi-1), that is, Y1…Yi-1 =>* ɛ. If ɛ is in First(Yj) for all j = 1, …, k, then add ɛ to First(X).
3. If X -> ɛ is a production, then add ɛ to First(X).
Example!
30. First(E) = First(T) = { (, id }
First(E’) = { +, Ɛ}
First(T) = First(F) = { (, id }
First(T’) = {*, Ɛ}
First(F) = { (, id }
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
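The three First rules can be run as a fixpoint computation. This sketch assumes a grammar is a dict from nonterminal to a list of right-hand sides (symbol lists, with `[]` for an ε-alternative), that any symbol not used as a dict key is a terminal, and that ASCII `'` replaces ’ in names; `first_of_sequence` and `compute_first` are hypothetical names. Running it on the grammar above reproduces the sets listed.

```python
EPS = "ε"  # marker for the empty string


def first_of_sequence(symbols, first, grammar):
    """FIRST(Y1 Y2 ... Yk) using rules 1-3; an empty sequence gives {ε}."""
    result = set()
    for sym in symbols:
        # rule 1: a terminal's FIRST set is just the terminal itself
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:        # Yi cannot vanish, so stop here
            return result
    result.add(EPS)                     # every Yi can derive ε (or k = 0, rule 3)
    return result


def compute_first(grammar):
    """Iterate until no FIRST set of a nonterminal changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# The grammar above; [] stands for an ε-alternative.
EXPR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
first = compute_first(EXPR)
for nt in EXPR:
    print(f"First({nt}) = {sorted(first[nt])}")
```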
31. 2) S -> A
Old: A -> aB | Ad
After eliminating left recursion:
A -> aB A1
A1 -> d A1 | ε
B -> b
C -> g
First(S) = First(A) = {a}
First(A) = {a}
First(A1) = {d, ε}
First(B) = {b}
First(C) = {g}
3) S -> (L) | a
L -> SL’
L’ -> ,SL’ | ε
First(S) =
First(L) =
First(L’) =
32. Computing follow
To compute Follow(A) for all nonterminals A, apply the following rules until nothing can be added to any Follow set:
1. Place $ in Follow(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A -> αBβ, then everything in First(β) except ɛ is in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ɛ, then everything in Follow(A) is in Follow(B).
Example!
34. 2) S -> A
A -> aB | Ad
B -> b
C -> g
Eliminating left recursion:
S -> A
A -> aB A1
A1 -> d A1 | ε
B -> b
C -> g
First(S) = First(A) = {a}
First(A) = {a}
First(A1) = {d, ε}
First(B) = {b}
First(C) = {g}
Follow(S) = {$}
Follow(A) = {$}   (S -> A, so Follow(S) is in Follow(A))
Follow(A1) = Follow(A) = {$}   (A1 ends A -> aB A1 and A1 -> d A1)
Follow(B) = (First(A1) without ε) ∪ Follow(A) = {d, $}   (A -> aB A1, and A1 =>* ε)
35. S -> ABC | Dg
A -> aA | ε
B -> bB | ε
C -> cC | ε
D -> d
First(S) = First(ABC) ∪ First(Dg)
First(ABC) = {a} ∪ First(BC)   (since A =>* ε)
First(BC) = {b} ∪ First(C)   (since B =>* ε)
First(C) = {c, ε}
First(Dg) = First(D) = {d}
First(S) = {a, b, c, d, ε}
First(A) = {a, ε}
First(B) = {b, ε}
First(C) = {c, ε}
First(D) = {d}
Follow(S) = {$}
Follow(A): from S -> ABC, First(BC) without ε = {b, c}; since BC =>* ε, Follow(S) is also added. A -> aA adds nothing new.
Follow(A) = {b, c, $}
Follow(B): from S -> ABC, First(C) without ε = {c}; since C =>* ε, Follow(S) is also added. B -> bB adds nothing new.
Follow(B) = {c, $}
Follow(C): from S -> ABC, C is the last symbol, so Follow(S) is in Follow(C). C -> cC adds nothing new.
Follow(C) = Follow(S) = {$}
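The Follow rules can also be checked mechanically against this example. This is a self-contained sketch under the same assumptions as a First computation (dict grammar, symbol lists, `[]` for ε, `$` as endmarker); `compute_follow` and its helpers are hypothetical names. Each pass applies rules 1 to 3 to every occurrence of a nonterminal on a right-hand side until no set grows.

```python
EPS, END = "ε", "$"


def first_of_sequence(symbols, first, grammar):
    """FIRST of Y1...Yk; an empty sequence (ε) yields {ε}."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for rhs in alts:
                new = first_of_sequence(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                             # rule 1
    changed = True
    while changed:                                     # until no Follow set grows
        changed = False
        for lhs, alts in grammar.items():
            for rhs in alts:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:             # skip terminals
                        continue
                    beta_first = first_of_sequence(rhs[i + 1:], first, grammar)
                    new = beta_first - {EPS}           # rule 2
                    if EPS in beta_first:
                        new |= follow[lhs]             # rule 3 (β empty or β =>* ε)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# Slide 35 grammar; [] stands for an ε-alternative
G = {
    "S": [["A", "B", "C"], ["D", "g"]],
    "A": [["a", "A"], []],
    "B": [["b", "B"], []],
    "C": [["c", "C"], []],
    "D": [["d"]],
}
follow = compute_follow(G, "S")
for nt in G:
    print(f"Follow({nt}) =", sorted(follow[nt]))
```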
36. 1) A -> abc | def | ghi
2) S -> A
A -> aB | Ad
B -> b
C -> g
3) S -> (L) | a
L -> SL’
L’ -> ,SL’ | ε
37. LL(1) Grammars
Predictive parsers are recursive-descent parsers (top-down parsers) that need no backtracking.
Grammars for which we can create predictive parsers are called LL(1):
The first L means scanning the input from left to right
The second L means producing a leftmost derivation
And 1 stands for using one input symbol of lookahead
A grammar G is LL(1) if and only if, whenever A -> α | β are two distinct productions of G, the following conditions hold:
For no terminal a do α and β both derive strings beginning with a
At most one of α and β can derive the empty string
If β =>* ɛ, then α does not derive any string beginning with a terminal in Follow(A); likewise, if α =>* ɛ, then β does not derive any string beginning with a terminal in Follow(A)
38. Recursive Descent Parsing (Recap)
Grammar must be LL(1)
Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens
When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information
39. Using FIRST and FOLLOW to Write a Recursive Descent Parser
expr -> term rest
rest -> + term rest
      | - term rest
      | ε
term -> id
procedure rest()
begin
  if lookahead in FIRST(+ term rest) then
    match(‘+’); term(); rest()
  else if lookahead in FIRST(- term rest) then
    match(‘-’); term(); rest()
  else if lookahead in FOLLOW(rest) then
    return
  else error()
end;
FIRST(+ term rest) = { + }
FIRST(- term rest) = { - }
FOLLOW(rest) = { $ }
procedure expr()
{
  term();
  rest();
}
procedure term()
{
  if (lookahead == ‘id’)
    match(‘id’)
  else error()
}
main()
{
  expr();
  if (lookahead == $)
    accept;
  else error()
}
Input: id - id + id $
expr => term rest
     => id rest
     => id - term rest
     => id - id rest
     => id - id + term rest
     => id - id + id rest
     => id - id + id
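A runnable version of the procedures above may help. This is a sketch assuming tokens arrive as a list of strings ('id', '+', '-') and that '$' is appended as the end marker; `ExprParser`, `ParseError` and `accepts` are hypothetical names. Each nonterminal becomes one method, and `rest` picks its branch from FIRST and FOLLOW exactly as in the pseudocode.

```python
class ParseError(Exception):
    pass


class ExprParser:
    """Recursive descent for
         expr -> term rest
         rest -> + term rest | - term rest | ε
         term -> id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # '$' marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def expr(self):
        self.term()
        self.rest()

    def rest(self):
        if self.lookahead == "+":            # FIRST(+ term rest) = {+}
            self.match("+")
            self.term()
            self.rest()
        elif self.lookahead == "-":          # FIRST(- term rest) = {-}
            self.match("-")
            self.term()
            self.rest()
        elif self.lookahead == "$":          # FOLLOW(rest) = {$}: use rest -> ε
            return
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def term(self):
        self.match("id")


def accepts(tokens):
    """main(): parse expr, then require that all input was consumed."""
    parser = ExprParser(tokens)
    try:
        parser.expr()
        parser.match("$")
        return True
    except ParseError:
        return False


print(accepts(["id", "-", "id", "+", "id"]))  # True
print(accepts(["id", "+"]))                    # False
```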
40. S -> ABC | Dg
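A -> aA | ε
B -> bB | ε
C -> cC | ε
D -> d
main()
{
  S();
  if (lookahead == $)
    accept;
  else error()
}
S()
{
  if lookahead == ‘a’ then               // S -> ABC with A -> aA
    match(‘a’); A(); B(); C();
  else if lookahead == ‘b’ then          // S -> ABC with A -> ε, B -> bB
    match(‘b’); B(); C();
  else if lookahead == ‘c’ then          // S -> ABC with A, B -> ε, C -> cC
    match(‘c’); C();
  else if lookahead in Follow(S) then    // S -> ABC with A, B, C -> ε
    return
  else if lookahead in First(Dg) then    // S -> Dg
    D(); match(‘g’);
  else
    error()
}
A()
{
  if lookahead in First(aA) then
    match(‘a’); A();
  else if lookahead in Follow(A) then
    return
  else error()
}
B()
{
  if lookahead in First(bB) then
    match(‘b’); B();
  else if lookahead in Follow(B) then
    return
  else error()
}
C()
{
  if lookahead in First(cC) then
    match(‘c’); C();
  else if lookahead in Follow(C) then
    return
  else error()
}
D()
{
  if lookahead in First(d) then
    match(‘d’)
  else error()
}
Test inputs: abc$, aaaaa$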
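The same parser can be written in Python as a sketch, assuming one character per token with '$' appended; `SParser`, `ParseError` and `accepts` are hypothetical names. Unlike the pseudocode, `S` here calls `A()` directly instead of inlining its first `match`, which accepts the same strings. The Follow sets used are the ones computed on slide 35.

```python
class ParseError(Exception):
    pass


class SParser:
    """Recursive descent for S -> ABC | Dg, A -> aA | ε, B -> bB | ε,
    C -> cC | ε, D -> d, with one character per token."""

    FOLLOW_A = {"b", "c", "$"}
    FOLLOW_B = {"c", "$"}
    FOLLOW_C = {"$"}

    def __init__(self, text):
        self.tokens = list(text) + ["$"]
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def S(self):
        if self.lookahead in {"a", "b", "c", "$"}:  # (First(ABC) - {ε}) ∪ Follow(S)
            self.A()
            self.B()
            self.C()
        elif self.lookahead == "d":                 # First(Dg)
            self.D()
            self.match("g")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def A(self):
        if self.lookahead == "a":                   # A -> aA
            self.match("a")
            self.A()
        elif self.lookahead not in self.FOLLOW_A:   # otherwise A -> ε
            raise ParseError(f"unexpected {self.lookahead!r}")

    def B(self):
        if self.lookahead == "b":                   # B -> bB
            self.match("b")
            self.B()
        elif self.lookahead not in self.FOLLOW_B:   # otherwise B -> ε
            raise ParseError(f"unexpected {self.lookahead!r}")

    def C(self):
        if self.lookahead == "c":                   # C -> cC
            self.match("c")
            self.C()
        elif self.lookahead not in self.FOLLOW_C:   # otherwise C -> ε
            raise ParseError(f"unexpected {self.lookahead!r}")

    def D(self):
        self.match("d")


def accepts(text):
    parser = SParser(text)
    try:
        parser.S()
        parser.match("$")
        return True
    except ParseError:
        return False


print(accepts("abc"), accepts("aaaaa"), accepts("cb"))  # True True False
```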
41. S -> A
A -> aB A1
A1 -> d A1 | ε
B -> b
C -> g
main()
{
  S();
  if (lookahead == $)
    accept;
  else error()
}
S()
{
  A();
}
A1()
{
  if lookahead in First(dA1) then
    match(‘d’); A1();
  else if lookahead in Follow(A1) then
    return
  else
    error()
}
A()
{
  if lookahead in First(aBA1) then
    match(‘a’); B(); A1();
  else
    error()
}
B()
{
  if lookahead in First(b) then
    match(‘b’)
  else
    error()
}
C()
{
  if lookahead in First(g) then
    match(‘g’)
  else
    error()
}
42. Non-Recursive Predictive Parsing: Table-Driven Parsing
Given an LL(1) grammar G = (N, T, P, S), construct a table M[A,a] for A ∈ N and a ∈ T ∪ {$}, and use a driver program with a stack
[Figure: the predictive parsing program (driver) reads the input buffer (a + b $), keeps a stack (X Y Z $, X on top), consults parsing table M, and produces output]
43. Construction of predictive parsing table
For each production A -> α in the grammar, do the following:
1. For each terminal a in First(α), add A -> α to M[A,a].
2. If ɛ is in First(α), then for each terminal b in Follow(A), add A -> α to M[A,b]. If ɛ is in First(α) and $ is in Follow(A), add A -> α to M[A,$] as well.
If, after performing the above, there is no production in M[A,a], then set M[A,a] to error (normally represented by an empty entry).
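Both construction rules fit in a few lines once First and Follow are available. The sketch below is self-contained and makes the same representation assumptions as before (dict grammar, symbol lists, `[]` for ε, ASCII `'` for ’); `build_ll1_table` and `is_ll1` are hypothetical names. Each table cell holds a list, so a cell with two productions is reported as a conflict instead of being silently overwritten. That is exactly the LL(1) test used in the examples that follow.

```python
EPS, END = "ε", "$"


def first_of_sequence(symbols, first, grammar):
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                     # the whole sequence can derive ε
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for rhs in alts:
                new = first_of_sequence(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for rhs in alts:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    rest = first_of_sequence(rhs[i + 1:], first, grammar)
                    new = (rest - {EPS}) | (follow[lhs] if EPS in rest else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_ll1_table(grammar, start):
    """Map (A, a) -> list of right-hand sides; more than one entry is a conflict."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    table = {}
    for lhs, alts in grammar.items():
        for rhs in alts:
            rhs_first = first_of_sequence(rhs, first, grammar)
            targets = rhs_first - {EPS}               # rule 1
            if EPS in rhs_first:
                targets |= follow[lhs]                # rule 2 (may include $)
            for a in targets:
                table.setdefault((lhs, a), []).append(rhs)
    return table


def is_ll1(table):
    return all(len(entries) == 1 for entries in table.values())


EXPR = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []], "T": [["F", "T'"]],
        "T'": [["*", "F", "T'"], []], "F": [["(", "E", ")"], ["id"]]}
DANGLING_ELSE = {"S": [["i", "E", "t", "S", "S'"], ["a"]],
                 "S'": [["e", "S"], []], "E": [["b"]]}
print(is_ll1(build_ll1_table(EXPR, "E")))                  # True
print(build_ll1_table(DANGLING_ELSE, "S")[("S'", "e")])    # [['e', 'S'], []]
```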
44. Example
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
Nonterminal  First      Follow
E            {(, id}    {), $}
E’           {+, Ɛ}     {), $}
T            {(, id}    {+, ), $}
T’           {*, Ɛ}     {+, ), $}
F            {(, id}    {+, *, ), $}

Parsing table M (rows: nonterminals; columns: input symbols; blank = error):
      id          +           *           (           )           $
E     E -> TE’                            E -> TE’
E’                E’ -> +TE’                          E’ -> Ɛ     E’ -> Ɛ
T     T -> FT’                            T -> FT’
T’                T’ -> Ɛ     T’ -> *FT’              T’ -> Ɛ     T’ -> Ɛ
F     F -> id                             F -> (E)
45. S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b
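First(S) = {i, a}
First(S’) = {e, Ɛ}
First(E) = {b}
Follow(S) = {$, e}: $ because S is the start symbol; e because S is followed by S’ in S -> iEtSS’ and e is in First(S’)
Follow(S’) = {$, e}: S’ ends S -> iEtSS’, so Follow(S) is in Follow(S’) (and S’ -> eS puts Follow(S’) in Follow(S))
Follow(E) = {t}: from S -> iEtSS’
M    i              t    a         e                     b         $
S    S -> iEtSS’         S -> a
S’                                 S’ -> eS, S’ -> Ɛ               S’ -> Ɛ
E                                                        E -> b
M[S’,e] contains two entries, so the given grammar is not LL(1).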
46. Another example
S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b
M    a         b         e                     i              t    $
S    S -> a                                    S -> iEtSS’
S’                       S’ -> eS, S’ -> Ɛ                         S’ -> Ɛ
E              E -> b
48. Predictive Parsing Program (Driver)
push($)
push(S)
a := lookahead
repeat
  X := pop()
  if X is a terminal or X = $ then
    if X = a then match(X)   // moves to the next token and sets a := lookahead
    else error()
  else if M[X,a] = X -> Y1Y2…Yk then
    push(Yk, Yk-1, …, Y2, Y1)   // such that Y1 is on top; nothing is pushed for X -> ɛ
    … invoke actions and/or produce IR output …
  else error()
  endif
until X = $
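The driver above can be sketched in Python with the parsing table from the example on slide 44 written out as a dict. Assumptions: tokens are strings, `$` is appended by the driver, `[]` is an ε-production, a missing key is an error entry, and `'` replaces ’ in names; `TABLE`, `ll1_parse` and `ParseError` are hypothetical names. The returned list of productions is the leftmost derivation the parser discovers.

```python
class ParseError(Exception):
    pass


# M for E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, table=TABLE, start="E"):
    """Table-driven predictive parse; returns the productions applied."""
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = ["$", start]                 # the end of the list is the top of stack
    output = []
    while True:
        a = tokens[pos]
        x = stack.pop()
        if x not in NONTERMINALS:        # terminal or $: must equal the lookahead
            if x != a:
                raise ParseError(f"expected {x!r}, found {a!r}")
            if x == "$":
                return output            # stack and input both exhausted: accept
            pos += 1
        elif (x, a) in table:
            rhs = table[(x, a)]
            output.append(f"{x} -> {' '.join(rhs) or 'ε'}")
            stack.extend(reversed(rhs))  # push Yk ... Y1 so that Y1 ends on top
        else:
            raise ParseError(f"no entry M[{x}, {a}]")


print("\n".join(ll1_parse(["id", "+", "id", "*", "id"])))
```

Keeping the stack as a Python list with the top at the end makes push and pop O(1); reversing the right-hand side before pushing is what puts Y1 on top.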
50. Example Table-Driven Parsing
Stack         Input         Production applied
$E            id+id+id$     E -> T ER
$ER T         id+id+id$     T -> F TR
$ER TR F      id+id+id$     F -> id
$ER TR id     id+id+id$     match id
$ER TR        +id+id$       TR -> Ɛ
$ER           +id+id$       ER -> + T ER
$ER T +       +id+id$       match +
$ER T         id+id$        T -> F TR
$ER TR F      id+id$        F -> id
$ER TR id     id+id$        match id
$ER TR        +id$          TR -> Ɛ
$ER           +id$          ER -> + T ER
$ER T +       +id$          match +
$ER T         id$           T -> F TR
$ER TR F      id$           F -> id
$ER TR id     id$           match id
$ER TR        $             TR -> Ɛ
$ER           $             ER -> Ɛ
$             $             accept
(The top of the stack is at the right.)
Grammar:
E -> TER
ER -> +TER | Ɛ
T -> FTR
TR -> *F TR | Ɛ
F -> (E) | id
Exercise: trace the parser on id++id$ and (id+id)*id$
51. Let the grammar G = (V, T, S’, P) be:
S’ -> S$
S -> xYzS | a
Y -> xYz | y
52. 1. E -> T R
2. R -> ε
3. R -> + E
4. T -> F S
5. S -> ε
6. S -> * T
7. F -> n
8. F -> ( E )
53. Parsing table (entries are production numbers; blank = error):
     n    +    *    (    )    $
E    1              1
R         3                   2    2
T    4              4
S         5    6              5    5
F    7              8
54. Matched     To do       Input        Action
                E $         n + n * n $
                T R $       n + n * n $  E -> T R
                F S R $     n + n * n $  T -> F S
                n S R $     n + n * n $  F -> n
    n           S R $       + n * n $    match n
    n           R $         + n * n $    S -> ε
    n           + E $       + n * n $    R -> + E
    n +         E $         n * n $      match +
    n +         T R $       n * n $      E -> T R
    n +         F S R $     n * n $      T -> F S
    n +         n S R $     n * n $      F -> n
    n + n       S R $       * n $        match n
    n + n       * T R $     * n $        S -> * T
    n + n *     T R $       n $          match *
    n + n *     F S R $     n $          T -> F S
    n + n *     n S R $     n $          F -> n
    n + n * n   S R $       $            match n
    n + n * n   R $         $            S -> ε
    n + n * n   $           $            R -> ε