Compiler Design_Syntax Analyzer_Top Down Parsers.pptx

Compiler course
Chapter 2
Syntax Analysis
by
Dr. Rushali A. Deshmukh

Outline
 Role of parser
 Context free grammars
 Top down parsing
 Bottom up parsing
 Parser generators

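The role of parser
[Figure: the source program enters the Lexical Analyzer; the Parser calls getNextToken and receives a token; the Parser passes the parse tree to the Rest of Front End, which produces the intermediate representation; these phases consult the Symbol table]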

The role of parser
• The parser obtains a string of tokens from the
lexical analyzer and verifies that the string of
token names can be generated by the grammar
for the source language. It reports any syntax
errors and recovers from common errors.
• The parser constructs a parse tree and passes
it to the rest of the compiler for further
processing.

Error handling
 Common programming errors
 Lexical errors: misspellings of identifiers, keywords or operators
 Syntactic errors: misplaced ‘;’ or extra/missing ‘}’
 Semantic errors: type mismatch between an operator and its operands
 Logical errors: use of ‘=’ instead of ‘==’
 Error handler goals
 Report the presence of errors clearly and accurately
 Recover from each error quickly enough to detect
subsequent errors
 Add minimal overhead to the processing of correct
programs

Error-recovery strategies
 Panic mode recovery
 Discard input symbols one at a time until one of a
designated set of synchronization tokens (‘;’ or ‘}’) is
found
 Phrase level recovery
 Replace a prefix of the remaining input by some string
that allows the parser to continue
 Error productions
 Augment the grammar with productions that generate
the erroneous constructs
 Global correction
 Choose a minimal sequence of changes to obtain a
globally least-cost correction; this takes too much effort and is of
theoretical interest only
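Panic mode recovery can be sketched in a few lines of Python. This is only an illustration: the token list, the synchronization set and the function name panic_mode_skip are assumptions, and whether the synchronizing token itself is consumed is a design choice (here it is).

```python
# Hypothetical synchronization set, taken from the slide's examples.
SYNC_TOKENS = {";", "}"}

def panic_mode_skip(tokens, pos):
    """Discard tokens from index pos until a synchronizing token is found.

    Returns the index at which parsing would resume: just past the
    synchronizing token, or len(tokens) if none is found.
    """
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1  # discard one input symbol at a time
    return min(pos + 1, len(tokens))

# An error is detected at index 1; parsing resumes after the ';'.
resume_at = panic_mode_skip(["x", "+", "+", ";", "y"], 1)
```

The parser loses everything between the error and the synchronizing token, which is the price paid for the simplicity of this strategy.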

Context free grammars (CFG)
1. Terminals ( or token name): id
2. Nonterminal: denote set of strings.
Ex. Expression, term and factor
3. Start symbol: first symbol of
grammar, here, it is expression
4. Productions: specify the manner
in which terminal and non-
terminals are combined to form
string.
Example of CFG:
expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
Short form:
E -> E + T | T
T -> T * F | F
F -> (E) | id
Non-left recursive grammar:
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
Grammars describe the syntax of
programming language constructs like
expressions and statements

Derivations
 Productions are treated as rewriting rules to generate
a string
 E -> E + E | E * E | -E | (E) | id
 Derivations for -(id+id)
E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
Leftmost derivations: the leftmost nonterminal in each sentential
form is always chosen. The derivation above is leftmost.
Rightmost derivations: the rightmost nonterminal is always
chosen:
E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
Exercise: find a derivation for (id+id*id) using the above grammar

 E -> E + E | E * E | -E | (E) | id
 (id+id*id)
 Leftmost derivation 1:
 E => (E)
 => (E*E)
 => (E+E*E)
 => (id+E*E)
 => (id+id*E)
 => (id+id*id)
 Leftmost derivation 2:
 E => (E)
 => (E+E)
 => (id+E)
 => (id+E*E)
 => (id+id*E)
 => (id+id*id)

Parse trees
 A parse tree is a graphical representation of a derivation
 E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)

Ambiguity
 A grammar that produces more than one parse
tree for some string is said to be an ambiguous
grammar.
 Equivalently, some string has more than one leftmost derivation or
more than one rightmost derivation
 Example: id+id*id

Elimination of ambiguity
 if E1 then S1 else if E2 then S2 else S3

Elimination of ambiguity
if E1 then if E2 then S1 else S2
This statement has two parse trees, so the grammar is ambiguous. The left parse tree
(the one that attaches the else to the inner if) is preferred in programming languages
with conditionals of this form. The rule is: “Match each else with
the closest unmatched then”.

Elimination of ambiguity (cont.)
 Idea:
1) End each ‘if’ with ‘endif’
2) A statement appearing between a then and an
else must be matched, that is, it must not end with an unmatched then

Top-Down Parsing
 LL methods (Left-to-right, Leftmost derivation)
and recursive-descent parsing
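Grammar:
E -> T + T
T -> ( E )
T -> - E
T -> id
Leftmost derivation:
E =>lm T + T
=>lm id + T
=>lm id + id
[Figure: successive parse trees built top-down for id + id: E alone; E with children T + T; then the T nodes expanded to id from left to right]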

Elimination of left recursion
 A grammar is left recursive if it has a nonterminal A such
that there is a derivation A =>+ Aα for some string α
 Top-down parsing methods cannot handle left-recursive
grammars
 A simple rule for direct left recursion elimination:
 For a rule like: A -> A α | β,
 We may replace it with A -> β A’ and A’ -> α A’ | ɛ
 Example: eliminating direct left recursion from
E -> E + T | T, T -> T * F | F, F -> (E) | id gives:
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

A -> A α1 | A α2 | … | A αm | β1 | β2 | … | βn
Eliminating Left Recursion
A -> β1A1 | β2A1 | … | βnA1
A1 -> α1A1 | α2A1 | … | αmA1 | Ɛ
E-> E+T | E-T | T
T -> T*F | T/F | F
F -> (E) | id
Eliminating Left Recursion
E -> T E1
E1 -> +TE1 | -TE1 | Ɛ
T -> F T1
T1 -> *FT1 | /FT1 | Ɛ
F -> (E) | id
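The rule for immediate left recursion can be sketched in Python as follows. The representation is an assumption made for illustration: each alternative is a list of symbols, the empty list stands for Ɛ, and the new nonterminal is named by appending "1" as the slides do (the sketch assumes that name is not already in use).

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Turn A -> A a1 | ... | A am | b1 | ... | bn into
    A -> b1 A1 | ... | bn A1 and A1 -> a1 A1 | ... | am A1 | epsilon."""
    # The alpha parts: alternatives that start with the nonterminal itself.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The beta parts: all other alternatives.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}  # nothing to do
    new_nt = nonterminal + "1"  # assumed fresh name
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }

# E -> E + T | E - T | T
result = eliminate_immediate_left_recursion(
    "E", [["E", "+", "T"], ["E", "-", "T"], ["T"]]
)
```

For the E-productions this yields E -> T E1 and E1 -> + T E1 | - T E1 | Ɛ, matching the result shown above.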

Left recursion elimination (cont.)
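 There are cases like the following:
 S -> Aa
 A -> Sb | c
 Here S is left recursive (S => Aa => Sba), but not immediately left recursive.
 Substituting S -> Aa into A -> Sb gives A -> Aab | c, whose immediate left
recursion can then be removed.
 Left recursion elimination algorithm: order the nonterminals A1, …, An; for each Ai,
replace every production Ai -> Aj γ with j < i by Ai -> δ1 γ | … | δk γ, where
Aj -> δ1 | … | δk are the current Aj-productions; then eliminate immediate
left recursion among the Ai-productions.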
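A minimal Python sketch of the substitution approach used in the S/A example: process the nonterminals in a fixed order, replace a leading earlier nonterminal by its alternatives, then remove immediate left recursion. The grammar encoding (lists of symbols, empty list for Ɛ) and the function names are assumptions; the method is only guaranteed to work for grammars without cycles and without Ɛ-productions in the input.

```python
def remove_immediate(nt, alts):
    """Remove immediate left recursion from the alternatives of nt."""
    rec = [a[1:] for a in alts if a and a[0] == nt]
    rest = [a for a in alts if not a or a[0] != nt]
    if not rec:
        return {nt: alts}
    new = nt + "1"  # assumed fresh name
    return {nt: [b + [new] for b in rest], new: [a + [new] for a in rec] + [[]]}

def remove_left_recursion(grammar, order):
    """Eliminate direct and indirect left recursion for nonterminals in order."""
    g = {nt: [list(a) for a in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    # Replace Ai -> Aj gamma by Ai -> delta gamma for each Aj -> delta.
                    expanded += [delta + alt[1:] for delta in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        g.update(remove_immediate(ai, g[ai]))
    return g

# S -> Aa, A -> Sb | c
result = remove_left_recursion(
    {"S": [["A", "a"]], "A": [["S", "b"], ["c"]]}, ["S", "A"]
)
```

With the order S, A the sketch first rewrites A -> Sb | c into A -> Aab | c and then produces A -> c A1, A1 -> a b A1 | Ɛ.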

1) A -> ABd | Aa | a
B -> Be | b
After eliminating left recursion:
A -> aA1
A1 -> BdA1 | aA1 | Ɛ
B -> bB1
B1 -> eB1 | Ɛ
2) S -> a | ^ | (T)
T -> T, S | S
After eliminating left recursion:
S -> a | ^ | (T)
T -> ST1
T1 -> , ST1 | Ɛ
3) S -> Sa | Sb | c | d
After eliminating left recursion:
S -> cS1 | dS1
S1 -> aS1 | bS1 | Ɛ

Left factoring
 Left factoring is a grammar transformation that is
useful for producing a grammar suitable for
predictive or top-down parsing.
 Consider following grammar:
 stmt -> if expr then stmt else stmt
 | if expr then stmt
 On seeing the input token if, it is not clear to the parser which
production to use
 We can easily perform left factoring:
 If we have A->αβ1 | αβ2 then we replace it with
 A -> αA’
 A’ -> β1 | β2

Left factoring (cont.)
 Algorithm
 For each nonterminal A, find the longest prefix α
common to two or more of its alternatives. If α ≠
ɛ, then replace all of the A-productions A -> αβ1 | αβ2
| … | αβn | γ, where γ stands for the alternatives that do not begin with α, by
 A -> αA’ | γ
 A’ -> β1 | β2 | … | βn
 Example:
 S -> iEtS | iEtSeS | a
 E -> b
 becomes
 S -> iEtSS’ | a
 S’ -> eS | ɛ
 E -> b
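One left-factoring step can be sketched in Python as below. This is a simplified variant, stated as an assumption: alternatives are grouped by their first symbol and each group of two or more is factored on the prefix common to the whole group, so the step may have to be repeated to reach a fully left-factored grammar. The names common_prefix and left_factor_once are hypothetical.

```python
def common_prefix(seqs):
    """Longest prefix shared by every sequence in seqs."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor_once(nt, alternatives):
    """Factor alternatives of nt that share a first symbol (one pass)."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {nt: []}
    count = 0
    for first, group in groups.items():
        if first is None or len(group) < 2:
            result[nt].extend(group)  # nothing to factor in this group
            continue
        count += 1
        new_nt = nt + "'" * count  # A', A'', ... assumed to be unused names
        prefix = common_prefix(group)
        result[nt].append(prefix + [new_nt])
        result[new_nt] = [alt[len(prefix):] for alt in group]
    return result

# S -> iEtS | iEtSeS | a
factored = left_factor_once(
    "S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]]
)
```

For the if-then-else grammar this gives S -> i E t S S' | a and S' -> Ɛ | e S, the same productions as in the example (the empty list is Ɛ).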

1) A -> αβ1 | αβ2 | αβ3
After left factoring:
A -> αA1
A1 -> β1 | β2 | β3
2) S -> iEtS | iEtSeS | a
E -> b
After left factoring:
S -> iEtSS1 | a
S1 -> eS | Ɛ
E -> b
3) A -> aAB | aBc | aAc
4) S -> bSSaaS | bSSaSb | bSb | a
5) S -> aSSbS | aSaSb | abb | b

Introduction
 A top-down parser tries to create a parse tree from the
root towards the leaves, scanning the input from left to right
 It can be also viewed as finding a leftmost derivation for
an input string
 Example: id+id*id
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
[Figure: sequence of parse trees for the first steps of the leftmost derivation]
E =>lm T E’
=>lm F T’ E’
=>lm id T’ E’
=>lm id E’ (using T’ -> Ɛ)
=>lm id + T E’

Predictive Parsing
 Eliminate left recursion from grammar
 Left factor the grammar
 Compute FIRST and FOLLOW
 Two variants:
 Recursive (recursive calls)
 Non-recursive (table-driven)

First and Follow
 First(α): the set of terminals that begin strings derived
from α
 If α =>* ɛ then ɛ is also in First(α)
 In predictive parsing when we have A -> α | β, if
First(α) and First(β) are disjoint sets then we can
select the appropriate A-production by looking at the
next input symbol
 Follow(A), for any nonterminal A, is the set of terminals a
that can appear immediately to the right of A in some
sentential form
 If we have S =>* αAaβ for some α and β then a is in
Follow(A)
 If A can be the rightmost symbol in some sentential
form, then $ is in Follow(A)

Computing First
 To compute First(X) for all grammar symbols X,
apply following rules until no more terminals or ɛ
can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal and X -> Y1Y2…Yk is a
production for some k >= 1, then place a in First(X)
if for some i, a is in First(Yi) and ɛ is in all of
First(Y1), …, First(Yi-1), that is, Y1…Yi-1 =>* ɛ. If ɛ is
in First(Yj) for all j = 1, …, k then add ɛ to First(X).
3. If X -> ɛ is a production then add ɛ to First(X)
 Example:

First(E) = First(T) = { (, id }
First(E’) = { +, Ɛ}
First(T) = First(F) = { (, id }
First(T’) = {*, Ɛ}
First(F) = { (, id }
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
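The First rules can be run as a fixed-point iteration. The Python sketch below does this for the expression grammar; the dictionary encoding, the string "eps" for Ɛ and the function name compute_first are assumptions made for illustration.

```python
EPS = "eps"  # stands for the empty string

# Each alternative is a list of symbols; the empty list is the epsilon production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    """Apply the First rules until no set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                before = len(first[nt])
                all_nullable = True
                for sym in alt:
                    # First of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_nullable = False
                        break
                if all_nullable:  # every Yi derives epsilon, or the body is empty
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first

FIRST = compute_first(GRAMMAR)
```

The computed sets agree with the ones listed above, for example First(E) = { (, id } and First(E’) = { +, Ɛ }.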

2) S -> A
A -> aB | Ad (original, left recursive)
After eliminating left recursion:
A -> aB A1
A1 -> d A1 | ∈
B -> b
C -> g
First(S) = First(A) = {a}
First(A) = {a}
First(A1) = {d, ∈}
First(B) = {b}
First(C) = {g}
3) S -> (L) | a
L -> SL’
L’ -> ,SL’ | ∈
First(S) =
First(L) =
First(L’) =

Computing follow
 To compute Follow(A) for all nonterminals A, apply the
following rules until nothing can be added to any
Follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A -> αBβ then everything
in First(β) except ɛ is in Follow(B).
3. If there is a production A -> αB, or a production
A -> αBβ where First(β) contains ɛ, then
everything in Follow(A) is in Follow(B)
 Example:

First(E) = First(T) = { (, id }
First(E’) = { +, Ɛ }
First(T) = First(F) = { (, id }
First(T’) = { *, Ɛ }
First(F) = { (, id }
Follow(E) = { $, ) }
Follow(E’) = { $, ) }
Follow(T): from E -> TE’ and E’ -> +TE’, (First(E’) without Ɛ) U Follow(E) U Follow(E’) = { +, $, ) }
Follow(T’): from T -> FT’ and T’ -> *FT’, Follow(T) = { +, $, ) }
Follow(F): from T -> FT’ and T’ -> *FT’, (First(T’) without Ɛ) U Follow(T) U Follow(T’) = { *, +, $, ) }
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
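The Follow rules can be iterated in the same way. The Python sketch below takes the First sets of the expression grammar as given (copied from this slide) and computes Follow; the encoding, the string "eps" for Ɛ and the function names are assumptions.

```python
EPS = "eps"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# First sets as listed on the slide.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
    "T'": {"*", EPS}, "F": {"(", "id"},
}

def first_of_string(symbols):
    """First set of a sequence of grammar symbols."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal is its own First set
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol was nullable, or the sequence is empty
    return out

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # rule 1
    changed = True
    while changed:
        changed = False
        for head, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # Follow is defined for nonterminals only
                    tail = first_of_string(alt[i + 1:])
                    new = tail - {EPS}  # rule 2
                    if EPS in tail:
                        new |= follow[head]  # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, "E")
```

The result reproduces the sets above, for example Follow(F) = { *, +, $, ) }.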

2) S -> A
A -> aB | Ad
B -> b
C -> g
Eliminating left recursion:
S -> A
A -> aB A1
A1 -> d A1 | ∈
B -> b
C -> g
First(S) = First(A) = {a}
First(A) = {a}
First(A1) = {d, ∈}
First(B) = {b}
First(C) = {g}
Follow(S) = {$}
Follow(A): from S -> A, Follow(S) = {$}
Follow(A1): from A -> aB A1 and A1 -> d A1, Follow(A) = {$}
Follow(B): from A -> aB A1, (First(A1) without ∈) U Follow(A) = {d, $}

S -> ABC | Dg
A -> aA | ∈
B -> bB | ∈
C -> cC | ∈
D -> d
First(S) = First(ABC) U First(Dg)
First(ABC) = (First(A) without ∈) U (First(B) without ∈) U First(C), since A and B can derive ∈
= {a} U {b} U {c, ∈} = {a, b, c, ∈}
First(Dg) = First(D) = {d}
First(S) = {a, d, b, c, ∈}
First(A) = {a, ∈}
First(B) = {b, ∈}
First(C) = {c, ∈}
First(D) = {d}
Follow(S) = {$}
Follow(A): 1) from S -> ABC: (First(B) without ∈) U (First(C) without ∈) U Follow(S), since B and C can derive ∈
2) from A -> aA: Follow(A) itself, nothing new
Follow(A) = {b, c, $}
Follow(B): 1) from S -> ABC: (First(C) without ∈) U Follow(S), since C can derive ∈
2) from B -> bB: nothing new
Follow(B) = {c, $}
Follow(C): 1) from S -> ABC: Follow(S)
2) from C -> cC: nothing new
Follow(C) = Follow(S) = {$}

 1) A -> abc | def | ghi
 2) S -> A
 A -> aB | Ad
 B -> b
 C -> g
 3) S -> (L) | a
 L -> SL’
 L’ -> ,SL’ | ∈

LL(1) Grammars
 Predictive parsers are those recursive descent parsers (top-
down parser) needing no backtracking.
 Grammars for which we can create predictive parsers are
called LL(1)
 The first L means scanning input from left to right
 The second L means leftmost derivation
 And 1 stands for using one input symbol for lookahead
 A grammar G is LL(1) if and only if whenever A -> α | β are two
distinct productions of G, the following conditions hold:
 For no terminal a do both α and β derive strings beginning with a
 At most one of α and β can derive the empty string
 If α =>* ɛ then β does not derive any string beginning with a
terminal in Follow(A); likewise with the roles of α and β swapped

Recursive Descent Parsing (Recap)
 Grammar must be LL(1)
 Every nonterminal has one (recursive)
procedure responsible for parsing the
nonterminal’s syntactic category of input
tokens
 When a nonterminal has multiple
productions, each production is implemented
in a branch of a selection statement based on
input look-ahead information

Using FIRST and FOLLOW to Write a Recursive
Descent Parser
Grammar:
expr -> term rest
rest -> + term rest
| - term rest
| Ɛ
term -> id
FIRST(+ term rest) = { + }
FIRST(- term rest) = { - }
FOLLOW(rest) = { $ }
procedure rest()
begin
if lookahead in FIRST(+ term rest) then
match(‘+’); term(); rest()
else if lookahead in FIRST(- term rest) then
match(‘-’); term(); rest()
else if lookahead in FOLLOW(rest) then
return
else error()
end;
procedure expr()
begin
term(); rest()
end;
procedure term()
begin
if lookahead = ‘id’ then
match(‘id’)
else error()
end;
procedure main()
begin
expr();
if lookahead = ‘$’ then
accept
else error()
end;
Example input: id - id + id$
Leftmost derivation of id + id:
expr => term rest
=> id rest
=> id + term rest
=> id + id rest
=> id + id
S -> ABC | Dg
A -> aA | ∈
B -> bB | ∈
C -> cC | ∈
D -> d
main()
{
S();
if (lookahead == $)
accept;
else error()
}
S()
{
if lookahead in (First(ABC) without ∈) or lookahead in Follow(S) then
{ A(); B(); C(); } // ABC can derive ∈, so $ also selects S -> ABC
else if lookahead in First(Dg) then
{ D(); match(‘g’); }
else
error()
}
A()
{
if lookahead in First(aA) then
{ match(‘a’); A(); }
else if lookahead in Follow(A) then
return // A -> ∈
else error()
}
B() and C() are written like A(), using ‘b’ with Follow(B) and ‘c’ with Follow(C)
D()
{
if lookahead in First(d) then
match(‘d’)
else error()
}
Test inputs: abc$, aaaaa$

S -> A
A -> aB A1
A1 -> d A1 | ∈
B -> b
C -> g
main()
{
S();
if (lookahead == $)
accept;
else error()
}
S()
{
A();
}
A()
{
if lookahead in First(aBA1) then
{ match(‘a’); B(); A1(); }
else
error()
}
A1()
{
if lookahead in First(dA1) then
{ match(‘d’); A1(); }
else if lookahead in Follow(A1) then
return
else
error()
}
B()
{
if lookahead in First(b) then
match(‘b’)
else
error()
}
C()
{
if lookahead in First(g) then
match(‘g’)
else
error()
}

Non-Recursive Predictive Parsing: Table-
Driven Parsing
 Given an LL(1) grammar G = (N, T, P, S), construct
a table M[A,a] for A ∈ N, a ∈ T ∪ {$} and use a driver
program with a stack
[Figure: the predictive parsing program (driver) reads the input buffer a + b $, uses a stack holding X Y Z $ (X on top), consults the parsing table M, and produces the output]

Construction of predictive parsing table
 For each production A -> α in the grammar do the
following:
1. For each terminal a in First(α) add A -> α to M[A,a]
2. If ɛ is in First(α), then for each terminal b in
Follow(A) add A -> α to M[A,b]. If ɛ is in First(α)
and $ is in Follow(A), add A -> α to M[A,$] as well
 If after performing the above, there is no
production in M[A,a] then set M[A,a] to error
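The two table-filling rules can be sketched in Python for the expression grammar used throughout these slides. First and Follow are written out by hand here rather than computed; the encoding, the string "eps" for Ɛ and the function names are assumptions. Each entry is kept as a list so that a cell with more than one production would be visible.

```python
EPS = "eps"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
    "T'": {"*", EPS}, "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
    "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"},
}

def first_of_string(symbols):
    """First set of a production body."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def build_table(grammar, follow):
    """Map (nonterminal, terminal) to the list of applicable bodies."""
    table = {}
    for head, alts in grammar.items():
        for alt in alts:
            first_alpha = first_of_string(alt)
            targets = first_alpha - {EPS}  # rule 1
            if EPS in first_alpha:
                targets |= follow[head]  # rule 2, which also covers $
            for terminal in targets:
                table.setdefault((head, terminal), []).append(alt)
    return table

TABLE = build_table(GRAMMAR, FOLLOW)
```

A missing key such as ("E", "+") plays the role of an error entry; a grammar is LL(1) exactly when no cell ends up with more than one production.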

Example
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

Nonterminal   First      Follow
F             {(, id}    {+, *, ), $}
T             {(, id}    {+, ), $}
E             {(, id}    {), $}
E’            {+, ɛ}     {), $}
T’            {*, ɛ}     {+, ), $}

Parsing table M (rows: nonterminals, columns: input symbols; blank entries are errors)
Nonterminal   id          +             *             (           )          $
E             E -> TE’                                E -> TE’
E’                        E’ -> +TE’                              E’ -> Ɛ    E’ -> Ɛ
T             T -> FT’                                T -> FT’
T’                        T’ -> Ɛ       T’ -> *FT’                T’ -> Ɛ    T’ -> Ɛ
F             F -> id                                 F -> (E)

S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b
First(S) = {i, a}
First(S’) = {e, Ɛ}
First(E) = {b}
Follow(S): from S -> iEtSS’, (First(S’) without Ɛ) U Follow(S); from S’ -> eS, Follow(S’)
Follow(S) = {$, e}
Follow(S’): from S -> iEtSS’, Follow(S) = {$, e}
Follow(E): from S -> iEtSS’, {t}
M     i               t    a         e                    b         $
S     S -> iEtSS’          S -> a
S’                                   S’ -> eS, S’ -> Ɛ              S’ -> Ɛ
E                                                         E -> b
M[S’,e] contains two entries, so the given grammar is not LL(1)
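The conflict in M[S’, e] can be reproduced mechanically. The Python sketch below fills the table for this grammar from the First and Follow sets listed above and collects every cell that receives more than one production; the encoding, the string "eps" for Ɛ and the function names are assumptions.

```python
EPS = "eps"
GRAMMAR = {
    "S": [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E": [["b"]],
}
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "E": {"t"}}

def first_of_string(symbols):
    """First set of a production body."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def build_table(grammar, follow):
    table = {}
    for head, alts in grammar.items():
        for alt in alts:
            first_alpha = first_of_string(alt)
            targets = first_alpha - {EPS}
            if EPS in first_alpha:
                targets |= follow[head]
            for terminal in targets:
                table.setdefault((head, terminal), []).append(alt)
    return table

TABLE = build_table(GRAMMAR, FOLLOW)
# Cells with more than one production are LL(1) conflicts.
CONFLICTS = {cell: alts for cell, alts in TABLE.items() if len(alts) > 1}
```

The only conflict found is the cell for S’ on input e, which holds both S’ -> eS and S’ -> Ɛ. Practical parsers usually resolve it by always choosing S’ -> eS, which implements the "closest unmatched then" rule.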

Another example
S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b
Nonterminal   a         b         e                    i               t    $
S             S -> a                                   S -> iEtSS’
S’                                S’ -> eS, S’ -> Ɛ                         S’ -> Ɛ
E                       E -> b

Non-recursive predictive parsing
[Figure: input buffer a + b $; stack X Y Z $ (X on top); the predictive parsing program consults parsing table M and produces the output]

Predictive Parsing Program (Driver)
push($)
push(S)
a := lookahead
repeat
    X := pop()
    if X is a terminal or X = $ then
        match(X) // moves to next token and a := lookahead
    else if M[X,a] = X -> Y1Y2…Yk then
        push(Yk, Yk-1, …, Y2, Y1) // such that Y1 is on top
        … invoke actions and/or produce IR output …
    else error()
    endif
until X = $
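The driver loop can be sketched in Python with the parsing table of the expression grammar written out as a dictionary. The table encoding, the ParseError exception and the parse function are assumptions made for illustration; the function returns the list of productions applied, which corresponds to the output of the driver.

```python
class ParseError(Exception):
    pass

# M[A, a] for E -> TE', E' -> +TE' | eps, T -> FT', T' -> *FT' | eps, F -> (E) | id.
# An empty list is the epsilon production; a missing key is an error entry.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens, start="E"):
    """Table-driven predictive parse; returns the productions applied."""
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = ["$", start]  # top of the stack is the end of the list
    output = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, lookahead))
            if body is None:
                raise ParseError(f"no entry for M[{top}, {lookahead}]")
            output.append((top, body))
            stack.extend(reversed(body))  # push Yk ... Y1 so that Y1 is on top
        elif top == lookahead:
            pos += 1  # match a terminal or the end marker
        else:
            raise ParseError(f"expected {top!r}, found {lookahead!r}")
    return output

productions = parse(["id", "+", "id", "*", "id"])
```

For id + id * id the sketch applies eleven productions, starting with E -> T E’ and ending with E’ -> Ɛ.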

Example Table-Driven Parsing
Grammar:
E -> T ER
ER -> + T ER | Ɛ
T -> F TR
TR -> * F TR | Ɛ
F -> (E) | id

Stack         Input       Production applied
$E            id+id*id$   E -> T ER
$ER T         id+id*id$   T -> F TR
$ER TR F      id+id*id$   F -> id
$ER TR id     id+id*id$
$ER TR        +id*id$     TR -> Ɛ
$ER           +id*id$     ER -> + T ER
$ER T +       +id*id$
$ER T         id*id$      T -> F TR
$ER TR F      id*id$      F -> id
$ER TR id     id*id$
$ER TR        *id$        TR -> * F TR
$ER TR F *    *id$
$ER TR F      id$         F -> id
$ER TR id     id$
$ER TR        $           TR -> Ɛ
$ER           $           ER -> Ɛ
$             $

Example Table-Driven Parsing
Grammar:
E -> T ER
ER -> + T ER | Ɛ
T -> F TR
TR -> * F TR | Ɛ
F -> (E) | id

Stack         Input       Production applied
$E            id+id+id$   E -> T ER
$ER T         id+id+id$   T -> F TR
$ER TR F      id+id+id$   F -> id
$ER TR id     id+id+id$
$ER TR        +id+id$     TR -> Ɛ
$ER           +id+id$     ER -> + T ER
$ER T +       +id+id$
$ER T         id+id$      T -> F TR
$ER TR F      id+id$      F -> id
$ER TR id     id+id$
$ER TR        +id$        TR -> Ɛ
$ER           +id$        ER -> + T ER
$ER T +       +id$
$ER T         id$         T -> F TR
$ER TR F      id$         F -> id
$ER TR id     id$
$ER TR        $           TR -> Ɛ
$ER           $           ER -> Ɛ
$             $
Further inputs to try: id++id$ and (id+id)*id$
Let the grammar G = (V, T, S’, P) be
S’ -> S$
S -> xYzS | a
Y -> xYz | y

1. E → T R
2. R → ε
3. R → + E
4. T → F S
5. S → ε
6. S → * T
7. F → n
8. F → ( E )

    n   +   *   (   )   $
E   1           1
R       3           2   2
T   4           4
S       5   6       5   5
F   7           8

Matched     Todo        Input         Action
            E $         n + n * n $
            T R $       n + n * n $   E → T R
            F S R $     n + n * n $   T → F S
            n S R $     n + n * n $   F → n
n           S R $       + n * n $     match n
n           R $         + n * n $     S → ε
n           + E $       + n * n $     R → + E
n +         E $         n * n $       match +
n +         T R $       n * n $       E → T R
n +         F S R $     n * n $       T → F S
n +         n S R $     n * n $       F → n
n + n       S R $       * n $         match n
n + n       * T R $     * n $         S → * T
n + n *     T R $       n $           match *
n + n *     F S R $     n $           T → F S
n + n *     n S R $     n $           F → n
n + n * n   S R $       $             match n
n + n * n   R $         $             S → ε
n + n * n   $           $             R → ε
