Bottom up parser

Bottom-Up Parsing
 Bottom-up parsing is more general than top- down.
 A bottom-up parser builds a derivation by working
from the input sentence back toward the start symbol S
 Preferred method in practice
 Also called LR parsing
L means that tokens are read left to right
R means that it constructs a rightmost derivation

Bottom-up Parsing
• Two types of bottom-up parsing:
• Operator-Precedence parsing
• LR parsing
• covers wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (Lookahead LR
parser)

Bottom-Up Parsing
LR parsing reduces a string to the start
symbol by inverting productions:
Str input string of terminals
repeat
Identify β in str such that A →β is a
production
Replace β by A in str
until str = S (the start symbol) OR all
possibilities are exhausted

Bottom-Up Parsing
 int + (int) + (int)
 E + (int) + (int)
 E + (E) + (int)
 E + (int)
 E + (E)
 E
E → E + ( E ) | int
A rightmost derivation
in reverse

Reductions
Bottom-up parsing as the process of
"reducing" a string w to the start symbol
of the grammar.
At each reduction step, a substring that
matches the body of a production is
replaced by the non-terminal at the head
of that production.

Handle
Handle of a string: Substring that matches the
RHS of some production AND whose reduction
to the non-terminal on the LHS is a step along
the reverse of some rightmost derivation.

Handles
 Formally:
Handle of a right sentential form  is <A  ,
location of  in >
i.e. A   is a handle of  at the location
immediately after the end of , if:S => A => 
 A certain sentential form may have many different
handles.
 Right sentential forms of a non-ambiguous grammar
have one unique handle

Example
S  aABe  aAde  aAbcde  abbcde
S  aABe
A  Abc | b
B  d
S  aABe is a handle of aABe in location 1.
B  d is a handle of aAde in location 3.
A  Abc is a handle of aAbcde in location 2.
A  b is a handle of abbcde in location 2.

Handle Pruning
 A rightmost derivation in reverse can be obtained by “handle-pruning.”
abbcde
Find the handle = b at loc. 2
aAbcde
b at loc. 3 is not a handle:
aAAcde
... blocked.

Handle-pruning
 The process of discovering a handle & reducing it to the appropriate left-hand side is
called handle pruning.
 Handle pruning forms the basis for a bottom-up parsing method.
To construct a rightmost derivation
S = 0  1  2  …  n-1  n = w
Apply the following simple algorithm
for i  n to 1 by -1
Find the handle Ai i in i
Replace i with Ai to generate i-1

1. S -> 0 S1|01 indicate the handle in each of the following right-sentential forms:
1. 000111
2. 00S11
2. For the grammar S -> S S + | S S * | a indicate the handle in each of the following
right-sentential forms:
1. SSS + a * +
2. SS + a * a+
3. aaa * a + +.
3. Give bottom-up parses for the following input strings
1. The input 000111 according to the grammar of
Exercise 1
2. The input aaa * a + + according to the grammar of 2

Shift Reduce Parsing with a
Stack
Two problems:
 locate a handle
 decide which production to use (if there are
more than two candidate productions).

Shift-reduce Parsing
A shift-reduce parser is a stack automaton with four actions
 Shift — next word is shifted onto the stack
 Reduce — right end of handle is at top of stack
Locate left end of handle within the stack
Pop handle off stack & push appropriate lhs
 Accept — stop parsing & report success
 Error — call an error reporting/recovery routine
Accept & Error are simple
Shift is just a push and a call to the scanner
Reduce takes |rhs| pops & 1 push

x - 2 * y
Stack Input Handle Action
$ id - num * id None shift
$ id - num * id
1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
0 Goal  Expr
1 Expr  Expr + Term
2 | Expr - Term
3 | Term
4 Term  Term * Factor
5 | Term / Factor
6 | Factor
7 Factor  number
8 | id
9 | ( Expr )

Goal
<id,x>
Term
Fact.
Expr –
Expr
<id,y>
<num,2>
Fact.
Fact.Term
Term
*
Stack Input Action
$ id - num * id shift
$ id - num * id reduce 8
$ Factor - num * id reduce 6
$ Term - num * id reduce 3
$ Expr - num * id shift
$ Expr - num * id shift
$ Expr - num * id reduce 7
$ Expr - Factor * id reduce 6
$ Expr - Term * id shift
$ Expr - Term * id shift
$ Expr - Term * id reduce 8
$ Expr - Term * Factor reduce 4
$ Expr - Term reduce 2
$ Expr reduce 0
$ Goal accept
Back to x - 2 * y
Corresponding Parse Tree

Conflicts During Shift-Reduce Parsing
Conflicts
“shift/reduce” or “reduce/reduce”
Example:
stmt  if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
Stack Input
if … then stmt else … Shift/ Reduce Conflict
We can’t tell
whether it is a
handle

LR Parsing
Bottom-up parser based on a concept called
LR(k) parsing
"L" is for left-to-right scanning of the input.
"R" for constructing a rightmost derivation in
reverse,
“k” for the number of input symbols of
lookahead that are used in making parsing
decisions.

Why LR Parsers?
For a grammar to be LR it is sufficient that a
left-to-right shift-reduce parser be able to
recognize handles of right-sentential forms
when they appear on top of the stack.

Why LR Parsers?
 LR parsers can be constructed to recognize all
programming language constructs for which
context-free grammars can be written.
 The LR-parsing method is the most general non-
back-tracking shift-reduce parsing method.
 An LR parser can detect a syntactic errors.
 The class of grammars that can be parsed using LR
methods is a proper superset of the class of
grammars that can be parsed with predictive or LL
methods.

Components of LR Parser
X
Y
Z
$
A + B $
Parsing Program
Parse Table
Action Goto
output
stack
Input buffer

Techniques for Creating Parse Table
 SLR: Construct parsing table for small set of grammars called
SLR(1).
 Easy to develop.
 CLR(CANONICAL LR) : Most powerful.
Generates a large parse table.
More difficult develop.
Works for all types of CFG
May detect position of error.
 LALR(LOOK AHEAD LR) :Widely used method.
 Optimizes the size of parse table, by combining some states.
 Information may get lost.

SLR Parsers
1. Formation of augmented grammar G’ for
the given grammar G
2. Construction of LR(0) collection of
items.
To find LR(0) collection of items Closure(I)
and Goto(I,X) have to be computed.
3. Finding first and follow of non-terminals
4. Construction of parse table.

Formation of Augmented
Grammar
 The augmented grammar G’, is G with a new start
symbol S’ and an additional production S’ -> S
E->E+T|T
T->T*F|F
 The augmented grammar G’ is given by
E’->E
E->E+T|T
T->T*F|F

Items and the LR(O)
Automaton
How does a shift-reduce parser know when
to shift and when to reduce?
How does the parser know that symbol on the
top of the stack is not a handle?
An LR parser makes shift-reduce decisions
by maintaining states to keep track of all the
operations.
States represent sets of "items."

Items and the LR(O)
Automaton
 An LR(O) item of a grammar G is a production of G, with a
dot at some position in the body.
 Production A -> XYZ yields the four items
A -> ·XYZ
A -> X·YZ
A -> XY·Z
A -> XYZ·
 A ->ε generates only one item, A ->.
 An item indicates how much of a production we have seen
at a given point in the parsing process.

Items and the LR(O)
Automaton
 The item A -> ·XYZ indicates that we hope to see a string derivable from XYZ next
on the input.
 Item A -> X·YZ indicates that we have just seen on the input a string derivable from
X and that we hope next to see a string derivable from YZ.
 Item A -> XY·Z indicates that we have just seen on the input a string derivable from
XY and that we hope next to see a string derivable from Z.
 Item A -> XYZ· indicates that we have seen the body XY Z and that it may be time
to reduce XYZ to A

Items and the LR(O)
Automaton
 Collection of sets of LR(0) items, called the
canonical LR(0) collection.
 Provides the basis for constructing a deterministic
finite automaton that is used to make parsing
decisions.
 Automaton is called an LR(0) automaton.
 Each state of the LR(0) automaton represents a
set of items in the canonical LR(0) collection.

Construction of LR(0) Items
Items are viewed as states in NFA.
Grouped to form same states.
Process of grouping together is called subset
Construction Algorithm.
Closure and goto operations have to be
computed.

Closure of Item Sets
 If I is a set of items for a grammar G, then CLOSURE(I)
is the set of items constructed from I by the two rules:
Initially, add every item in I to CLOSURE(I).
If A -> α·Bγ is in CLOSURE(I) and B -> γ is a
production, then add the item B -> γ to CLOSURE(I),
if it is not already there. Apply this rule until no more
new items can be added to CLOSURE (I).

Computation of CLOSURE
E’  E
E  E + T | T
T  T * F | F
F  ( E ) | id
I0
E’  .E
E  .E + T
E  .T
T  .T * F
T .F
F  .( E )
F  .id

Computation of CLOSURE
SetOfltems CLOSURE (I) {
J = I;
repeat
for ( each item A -> a·Bγ in J )
for ( each production B ->γ of G )
if (B ->.γ is not in J )
add B ->.γ to J;
until no more items are added to J on one round;
return J;
}

 Kernel items : The initial item, S' ->·S, and all items whose dots are not at
the left end.
 Non-kernel items : All items with their dots at the left end, except for S' ->
·S.

The Function GOTO
 GOTO (I, X) is defined to be the closure of the set of all items [A -> αX.β] such that
[A -> α.Xβ] is in I.
 If I is the set of two items { [E' -> E·] , [E -> E· + T] } , then
 GOTO(I, +) contains the items
 E -> E + ·T
 T -> ·T * F
 T -> ·F
 F -> · (E)
 F -> ·id

Grammar
S  E + S | E
E  num
E  num .
S’  S . $
num
ES’  . S $
S  .E + S
S  . E
E  .num
1
S  E . +S
S  E .
2
S  E + S .
5
S  E + . S
S  . E + S
S  . E
E  . num
3
S’  S $ .
7
4
S
S
$
+
E
num

num + $ E S
1 s4 g2 g6
2 s3 SE
3 s4 g2 g5
4 Enum Enum
5 SE+S
6 s7
7 accept

S' S
S   L=R
S   R
L   *R
L   id
R   L
id
S' S
S   L=R
S   R
L   *R
L   id
R   L
L  id 
S  L =R
R  L 
S' S 
I0
I1
I2
I3
S  R I4
L  * R
R   L
L   id
L   * R
I5
S
L
*
id R
S  L=  R
R   L
L   *R
L   id
I6
=
R
S  L=R 
R  L 
L
L
I7
id
I3
*
*
L  *R 
R
I8
I9

state action goto
id = * $ S L R
0 s3 s5 1 2 4
1 accept
2 s6/r(RL)
3 r(Lid) r(Lid)
4 r(SR)
5 s3 s5 7 8
6 s3 s5 7 9
7 r(RL) r(RL)
8 r(L*R) r(L*R)
9 r(SL=R)

Structure of the LR Parsing
Table
1. The ACTION function takes as arguments a state i and a terminal
a (or $, the input endmarker). The value of ACTION[i, a] can have
one of four forms:
a) Shift j , where j is a state. The action taken by the parser shifts
input a to the stack, but uses state j to represent a .
b) Reduce A ->β. The action of the parser reduces β on the top of
the stack to head A.
c) Accept. The parser accepts the input and finishes parsing;
d) Error. The parser discovers an error in its input and takes some
corrective action.
2. Extend the GOTO function, defined on sets of items, to states: if
GOTo [Ii , A] = Ij , then GOTO also maps a state i and a
nonterminal A to state j .

Construction of SLR Parse
Table
1. Construct C={I0,I1…….In} the collection sets of LR(0)
items for G’.
2. Initial state of the parser is constructed from the set of items
for [S’->S]
3. State I is constructed from Ii. The parsing actions are
determined as follows.
1. If [A->α.aβ] is in Ii and GOTO(Ii,a]=Ij, then ACTION[i,a]
is set to ‘shift j’ here ‘a’ must be a terminal.
2. If[A->α.] is in Ii, then ACTION[i,a] is set to reduce A->a
for all ‘a’ Follow(A).
3. If S’->S is in Ii, than action[i, $] is set to ‘accept’.

Construction of SLR parsing
table
4. GOTO transitions are constructed for all non-terminals. If GOTO(Ii,A) =Ij, then
goto(i,A)of the parse table is set to j.
5. All other entries are error entries.

LR-Parser Configurations
 Helps to have complete state o its stack and the remaining
input. A configuration of an LR parser is a pair:(s0s1
………… sm, aiai+1…………an$).
 where the first component is the stack contents and the
second component is the remaining input.

LR-parsing algorithm.
 INPUT: An input string w and an LR-parsing table with functions ACTION
and GOTO for a grammar G.
 OUTPUT: If w is in L ( G), the reduction steps of a bottom-up parse for W ;
otherwise, an error indication.
 METHOD: Initially, the parser has So on its stack, where So is the initial
state, and w$ in the input buffer.

Bottom up parser

More Related Content

What's hot (20)

Similar to Bottom up parser (20)

More from Akshaya Arunan (9)

Recently uploaded (20)

Bottom up parser