Chapter2CDpdf__2021_11_26_09_19_08.pdf

Department of CE
Prof. Happy Chapla
Unit no : 2
Lexical Analyser
CD:COMPILER DESIGN

Outline :
Introduction to Lexical Analyser
Tokens, Lexemes, and Patterns
Specification of Tokens
Regular expression and Regular Definition
Transition Diagram
Hard coding and automatic generation of lexical analysers
Finite Automata
Regular expression to NFA using Thompson’s rule
NFA to DFA conversion using subset construction method
DFA Optimization
Regular expression to DFA conversion

Role of Lexical Analyser
• Remove comments and white space in the form of blanks, tabs, and newline characters (aka scanning)
• Macro expansion
• Read input characters from the source program
• Group them into lexemes
• Produce as output a sequence of tokens
• Interact with the symbol table
• Correlate error messages generated by the compiler with the source program

Scanner-Parser Interaction
• After receiving a "get next token" command from the parser, the lexical analyser reads input characters until it can identify the next token, which it returns to the parser.

Why Separate Lexical and Syntactic Analysis?
• Simplicity of design
• Improved compiler efficiency
• Specialized techniques can be used for the lexer that are not suitable for the parser
• Higher portability
• Input-device-specific peculiarities are restricted to the lexer

Tokens, Lexemes and Patterns
• Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.
• Pattern: the set of strings in the input for which the same token is produced as output is described by a rule called the pattern associated with the token.
• Token: a sequence of characters that can be treated as a single logical entity. Typical tokens are:
1) identifiers 2) keywords 3) operators 4) special symbols 5) constants

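Example
• Token else: sample lexeme else; pattern: characters e, l, s, e
• Token if: sample lexeme if; pattern: characters i, f
• Token comparison: sample lexemes <=, <>, >=, >; pattern: < or <= or = or <> or >= or >
• Token id: sample lexemes pi, s1, Mj5; pattern: letter followed by letters and digits
• Token num: sample lexemes 3.14, 0, 3.09e17; pattern: any numeric constant
• Token literal: sample lexeme "core"; pattern: any characters between " and " except "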

Token Classes
The following token classes are possible:
• One token per keyword
• Tokens for the operators
• One token representing all identifiers
• Tokens representing constants (e.g. numbers)
• Tokens for punctuation symbols

Example
In a C program, the variable declaration line
int value = 100;
contains the following tokens:
• int (keyword)
• value (identifier)
• = (operator)
• 100 (constant)
• ; (symbol)
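The scanner-parser interaction and the token classes above can be sketched in Python. This is a minimal sketch, not the author's method: the keyword set `KEYWORDS`, the class `Lexer` and its method `get_next_token` are hypothetical names, and Python's `re` module stands in for the finite automata that a real lexer is built from (covered later in this unit).

```python
import re
from typing import NamedTuple, Optional


class Token(NamedTuple):
    kind: str    # token class, e.g. "keyword" or "identifier"
    lexeme: str  # the characters matched by the token's pattern


# Hypothetical keyword set; a real C lexer recognises many more keywords.
KEYWORDS = {"int", "float", "if", "else", "while", "return"}

# One pattern per token class, tried left to right at the current position.
TOKEN_SPEC = [
    ("whitespace", r"[ \t\n]+"),
    ("constant", r"[0-9]+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("operator", r"<=|>=|==|[=+\-*/<>]"),
    ("symbol", r"[;,(){}]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class Lexer:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0  # index of the next unread character

    def get_next_token(self) -> Optional[Token]:
        """Called by the parser; scans just far enough to return one token."""
        while self.pos < len(self.source):
            m = MASTER.match(self.source, self.pos)
            if m is None:
                raise SyntaxError(f"no pattern matches at position {self.pos}")
            self.pos = m.end()
            kind, lexeme = m.lastgroup, m.group()
            if kind == "whitespace":
                continue  # white space is stripped, never returned
            if kind == "identifier" and lexeme in KEYWORDS:
                kind = "keyword"  # keywords share the identifier pattern
            return Token(kind, lexeme)
        return None  # end of input


def tokenize(source: str) -> list:
    """Drive the lexer the way a parser would, collecting every token."""
    lexer = Lexer(source)
    tokens = []
    while (tok := lexer.get_next_token()) is not None:
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    for tok in tokenize("int value = 100;"):
        print(tok.lexeme, f"({tok.kind})")
```

Checking for keywords after matching the identifier pattern, instead of giving every keyword its own pattern, is one common design choice; the order of `TOKEN_SPEC` matters because `re` tries the alternatives left to right.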
Exercise:
if(y <= t)
y = y - 3;

Example
Total = Ans + 30
• Tokens:
• Total: Identifier 1
• = : Operator 1
• Ans: Identifier 2
• + : Operator 2
• 30: Constant 1
• Lexemes:
• Lexemes of identifiers: Total, Ans
• Lexemes of operators: =, +
• Lexemes of constants: 30

Dealing with Errors
How does the lexical analyser deal with errors?
• The lexical analyser is unable to proceed when none of the token patterns matches a prefix of the remaining input.
• Panic mode recovery: delete successive characters from the remaining input until a well-formed token can be found.
• Other possible recovery actions:
• Insert a missing character
• Delete a character
• Replace a character by another
• Transpose two adjacent characters
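Panic mode recovery can be sketched as follows. This is a minimal Python sketch under stated assumptions: `TOKEN_RE` is a hypothetical set of token patterns for a tiny language, and deleted characters are only recorded as (position, text) pairs rather than reported with line numbers as a real compiler would.

```python
import re

# Hypothetical token patterns: identifiers, integers and a few operators.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[=+\-*/;()]")


def scan_with_panic_mode(source):
    """Return (lexemes, errors); each error is (position, deleted_text)."""
    lexemes, errors = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(source, pos)
        if m:
            lexemes.append(m.group())
            pos = m.end()
            continue
        # No pattern matches: delete successive characters until a token
        # (or white space) can start again, then resume normal scanning.
        start = pos
        while (pos < len(source) and not source[pos].isspace()
               and TOKEN_RE.match(source, pos) is None):
            pos += 1
        errors.append((start, source[start:pos]))
    return lexemes, errors
```

For the input `x = 4@$ + y;` the characters `@$` are deleted as one error and scanning resumes at `+`.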

Specification of Tokens
• There are 3 specifications of tokens:
• Strings
• Language
• Regular expression
• Strings and languages:
• An alphabet or character class is a finite set of symbols.
• A string over an alphabet is a finite sequence of symbols drawn from that alphabet.
• A language is any countable set of strings over some fixed alphabet.

Operations on Strings
• A prefix of string s is any string obtained by removing zero or more symbols from the end of s. For example, ban is a prefix of banana.
• A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana is a suffix of banana.
• A substring of s is obtained by deleting any prefix and any suffix from s. For example, nan is a substring of banana.
• The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are neither ε nor equal to s itself.
• A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana.
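The string operations above can be checked mechanically. A small Python sketch (the function names are illustrative, not from the source); sets are used so each distinct string is listed once, and ε is represented by the empty string "".

```python
from itertools import combinations


def prefixes(s):
    """Remove zero or more symbols from the end (includes ε and s)."""
    return {s[:i] for i in range(len(s) + 1)}


def suffixes(s):
    """Remove zero or more symbols from the beginning (includes ε and s)."""
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s):
    """Delete any prefix and any suffix: every slice s[i:j] with i <= j."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def proper(strings, s):
    """Keep only the strings that are neither ε nor s itself."""
    return {x for x in strings if x not in ("", s)}


def subsequences(s):
    """Delete zero or more, not necessarily consecutive, positions of s."""
    return {"".join(c) for r in range(len(s) + 1) for c in combinations(s, r)}
```

For instance, `proper(prefixes("banana"), "banana")` gives b, ba, ban, bana and banan.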

Exercise
• Write the prefixes, suffixes, substrings, proper prefixes, proper suffixes and subsequences of the following strings:
• Example
• Revolution

Operations on Languages
The following operations can be applied to languages L and S:
• Union (L ∪ S) = { t | t is in L or t is in S }
• Concatenation (LS) = { tz | t is in L and z is in S }
• Kleene closure (L*): L* denotes "zero or more concatenations of" L.
• Positive closure (L+): L+ denotes "one or more concatenations of" L.

Example
Let L = {0, 1} and S = {a, b, c}
• Union: L ∪ S = {0, 1, a, b, c}
• Concatenation: L.S = {0a, 1a, 0b, 1b, 0c, 1c}
• Kleene closure: L* = {ε, 0, 1, 00, …}
• Positive closure: L+ = {0, 1, 00, …}
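Because L* and L+ are infinite, a program can only list them up to some length bound. The Python sketch below (function names and the `max_len` cut-off are assumptions for illustration) reproduces the example for L = {0, 1} and S = {a, b, c}.

```python
def union(L, S):
    return L | S


def concat(L, S):
    """LS = { tz | t in L and z in S }."""
    return {t + z for t in L for z in S}


def kleene(L, max_len):
    """Strings of L* up to max_len symbols (L* itself is infinite)."""
    result = {""}       # zero concatenations give ε
    frontier = {""}
    while frontier:
        # one more concatenation with L, keeping only new, short-enough strings
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result


def positive(L, max_len):
    """L+ = L L*, truncated to strings of at most max_len symbols."""
    return {w for w in concat(L, kleene(L, max_len)) if len(w) <= max_len}


L = {"0", "1"}
S = {"a", "b", "c"}
```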

Language:
A language is a set of strings.
Example: Even numbers
Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (All possible symbols for given language)
L = {0, 2, 4, 6, 8, 10, 12, 14, … } (Elements present in language)
Example: Variable name in C language
Σ = ASCII characters
L = {a, b, c, …, A, B, C, …, _, aa, ab, … }
Regular Language:
• Regular languages are those languages that can be defined by regular expressions.
• Any character is a regular expression matching itself. (a is a regular expression for
character a)
• ε is a regular expression matching the empty string.

Operations on Regular Languages:
If R1 and R2 are two regular expressions, then:
• R1R2 is a regular expression matching the concatenation of the languages.
• R1 | R2 is a regular expression matching the union of the languages.
• R1* is a regular expression matching the Kleene closure of the language (0 or more occurrences).
• (R) is a regular expression matching R.
Example (Let Σ = {a, b})
• The regular expression a|b denotes the language _________.
Ans: {a, b}
• (a|b)(a|b) denotes ___________
Ans: {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ.
Another regular expression for the same language is aa|ab|ba|bb.
• a* denotes the language consisting of ______________
Ans: all strings of zero or more a's, that is, {ε,a,aa,aaa,...}

Example (Let Σ = {0, 1})
• (0|1)* denotes the set of all strings _____________
Ans: Strings containing zero or more instances of 0 or 1, that is, all strings of 0's and 1's:
{ε, 0, 1, 00, 01, 10, 11, ...}.
Another regular expression for the same language is (0*1*)*
• 0|0*1 denotes the language __________
Ans: {0, 1, 01, 001, 0001,...}, that is, the string 0 and all strings consisting of zero or more
0's and ending with 1.

Regular Expression & Regular Definition

Regular Expression
A regular expression is a sequence of characters that defines a pattern.
Notations:
• One or more instances: +
• Zero or more instances: *
• Zero or one instance: ?
• Alphabet: Σ
• For a regular expression r, the regular language it denotes is L(r)
• (abc): the characters a, b, c occurring together, in that order, as one group
• [abc]: exactly one character, which is a, b, or c

Regular Expression
Rules to define a regular expression:
• Φ is a regular expression denoting the empty set.
• ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole member is the empty string.
• If 'a' is a symbol in Σ, then 'a' is a regular expression, and L(a) = {a}.
• Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,
a) (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).
b) (r)(s) is a regular expression denoting the language L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).

Regular Expression
• L = zero or more occurrences of a = a*
• a* = {ε, a, aa, aaa, aaaa, …} (infinitely many elements)
• L = one or more occurrences of a = a+
• a+ = {a, aa, aaa, aaaa, …} (infinitely many elements)

Regular Expression
Algebraic or identity rules of regular expressions (R, S and T are regular expressions):
• R + S = S + R (+ or | is commutative)
• (R + S) + T = R + (S + T) (+ or | is associative)
• (RS)T = R(ST) (concatenation is associative)
• (R + S)T = RT + ST (concatenation distributes over |)
• ϵR = Rϵ = R (ϵ is the identity element for concatenation)
• R* = (ϵ + R)* (relation between * and ϵ)
• R** = R* (* is idempotent)
• RR* = R+
• R? = (R | ϵ) (0 or 1 occurrence)
• [a-z] = (a|b|…|z) (1 character from the given range)
• [acdgj] = (a|c|d|g|j) (1 of the given characters)

Exercise: (For each RE, state its meaning or list the valid strings for it)
• [012]+
Ans: (0|1|2)+. A string has one or more instances of 0, 1 or 2, i.e. all non-empty strings of 0's, 1's and 2's: {0, 1, 2, 00, 01, 02, 10, 11, 12, …}
• [0-9]+
Ans: All non-empty strings of digits from 0 to 9. At least one digit is required.
• [1-9][0-9]+
Ans: The first symbol must be a digit from 1 to 9, followed by one or more digits from 0 to 9. Every string in the set therefore has length at least 2.
• [a-zA-Z][a-zA-Z0-9]*
Ans: Strings that start with a letter followed by zero or more letters or digits. Strings of length 1 such as a, z, A, C are also valid.
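The exercise answers can be checked with Python's `re` module. In this sketch the character classes are written without the spaces used on the slide (so `[a – z]` becomes `[a-z]`), and `re.fullmatch` is used so that the whole string, not just a prefix, must be generated by the expression; the dictionary `PATTERNS` and function `accepts` are illustrative names.

```python
import re

# The exercise expressions in Python's re syntax.
PATTERNS = {
    "[012]+": r"[012]+",
    "[0-9]+": r"[0-9]+",
    "[1-9][0-9]+": r"[1-9][0-9]+",
    "[a-zA-Z][a-zA-Z0-9]*": r"[a-zA-Z][a-zA-Z0-9]*",
}


def accepts(name, s):
    """True if the whole string s is in the language of the named RE."""
    return re.fullmatch(PATTERNS[name], s) is not None
```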

Precedence and Associativity
• The unary operator * has the highest precedence and is left associative.
• Concatenation has the second highest precedence and is left associative.
• | has the lowest precedence and is left associative.

Examples (Regular Expression)
(Language: sample strings; regular expression)
• 0 or 1: 0, 1; RE: 0 | 1
• 1 or 10 or 111: 1, 10, 111; RE: 1 | 10 | 111
• Strings having one or more 0's: 0, 00, 000, 0000, …; RE: 0+
• All possible binary strings over Σ = {0, 1}: 0, 1, 00, 01, 10, 11, 000, …; RE: (0 | 1)+
• All possible strings of length 3 over Σ = {a, b, c}: aaa, aba, abc, abb, …; RE: (a|b|c)(a|b|c)(a|b|c)
• One or more occurrences of 0 or 1 or both: 0, 1, 00, 01, 10, 11, 111, 101, …; RE: (0 | 1)+
• Binary strings ending with 0: 0, 10, 100, 110, 00, 010, …; RE: (0 | 1)* 0
• Binary strings starting with 1: 1, 10, 100, 110, 101, 1101, …; RE: 1 (0 | 1)*
• Binary strings starting with 1 and ending with 0: 10, 110, 100, 1100, 1110, 1000, …; RE: 1 (1|0)* 0
• Strings starting and ending with the same character for Σ = {0, 1}: 00, 11, 010, 000, 101, 1101, 0110, 1011, …; RE: 1 (1|0)* 1 | 0 (1|0)* 0
• Strings ending with 01 for Σ = {0, 1}: 01, 101, 001, 1001, 1101, …; RE: (0|1)*01
• Strings with exactly two 0's for Σ = {0, 1}: 00, 010, 100, 001, 0101, …; RE: 1*01*01*
• All binary strings of length at least 3 for Σ = {0, 1}: 000, 010, 110, 1111, 1011, …; RE: (0|1)(0|1)(0|1)(0|1)*
• All binary strings whose 2nd symbol from the start is 0, for Σ = {0, 1}: 00, 10, 101, 100, …; RE: (0|1)0(0|1)*
• Any number of a's followed by any number of b's followed by any number of c's: ε, abc, aaabc, abbbbbc, abccc, ab, accc, …; RE: a*b*c*

Exercise
Write regular expressions for the specified languages over Σ = {0, 1}:
• Strings having even length
Ans: RE: ((0|1)(0|1))*
• Strings containing exactly three 1's
Ans: RE: 0*10*10*10*
• Strings starting with 0 and having odd length
Ans: RE: 0((0|1)(0|1))*
• Strings starting or ending with 01 or 111
Ans: RE: (01|111)(0|1)* | (0|1)*(01|111)
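One way to gain confidence in answers like these is to compare each regular expression against a direct description of the language on every binary string up to some length. The Python sketch below does this by brute force; the length bound of 10 and the predicate for each exercise are assumptions chosen for illustration, and passing the check is evidence, not a proof.

```python
import re
from itertools import product


def binary_strings(max_len):
    """All strings over Σ = {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


# (regular expression from the exercise, intended property as a predicate)
EXERCISES = [
    (r"((0|1)(0|1))*", lambda s: len(s) % 2 == 0),
    (r"0*10*10*10*", lambda s: s.count("1") == 3),
    (r"0((0|1)(0|1))*", lambda s: s.startswith("0") and len(s) % 2 == 1),
    (r"(01|111)(0|1)*|(0|1)*(01|111)",
     lambda s: s.startswith(("01", "111")) or s.endswith(("01", "111"))),
]


def check(regex, predicate, max_len=10):
    """Return the strings on which the RE and the predicate disagree."""
    return [s for s in binary_strings(max_len)
            if (re.fullmatch(regex, s) is not None) != predicate(s)]
```

An empty list from `check` means the RE and the description agree on all 2047 binary strings of length at most 10.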

Regular Definition
• For notational convenience, we may give names to certain regular expressions and use those names in subsequent expressions, as if the names were themselves symbols.
• A sequence of such named expressions is known as a regular definition.
• A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
……
dn → rn
where each di is a distinct name and each ri is a regular expression over the symbols of Σ and the previously defined names d1, …, di-1.

Example
Regular definition for identifier
• letter → A|B|C|…|Z|a|b|…|z
• digit → 0|1|…|9
• id → letter (letter | digit)*
Regular definition for even numbers
• (+|-|ε) (0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)
• Sign → + | -
• OptSign → Sign | ε   // in other words (Sign?)
• Digit → [0-9]   // in other words (0 | 1 | … | 9)
• EvenDigit → [02468]   // in other words (0 | 2 | 4 | 6 | 8)
• EvenNumber → OptSign Digit* EvenDigit
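A regular definition behaves like a sequence of named building blocks, and the same idea can be mimicked by composing regular-expression strings in Python. This is an illustrative sketch: the variable names mirror the definitions above, and Python's `re` syntax stands in for the notation of the slides.

```python
import re

# Each name is bound to an expression that may use only earlier names,
# mirroring d1 → r1, d2 → r2, ...
letter = r"[A-Za-z]"
digit = r"[0-9]"
identifier = rf"{letter}({letter}|{digit})*"   # id → letter (letter | digit)*

sign = r"[+-]"
opt_sign = rf"({sign})?"                       # OptSign → Sign | ε
even_digit = r"[02468]"
even_number = rf"{opt_sign}{digit}*{even_digit}"


def matches(pattern, s):
    return re.fullmatch(pattern, s) is not None
```

After substitution, `even_number` is the single expression `([+-])?[0-9]*[02468]`, which is how a lexer generator expands named definitions before building an automaton.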

Transition Diagram
• A transition diagram is a special kind of flowchart for language analysis.
• In a transition diagram, the boxes of the flowchart are drawn as circles and are called states.
• States are connected by arrows called edges.
• The label on an edge indicates the input character(s) that can appear next when leaving that state along that edge.

Transition Diagram
• Symbolized representation of a transition diagram uses:
• a circle for a state
• an arrow (edge) for a transition
• an arrow labeled start, entering a state, for the start state
• a double circle for a final state

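Transition Diagram: Example
Transition Diagram for Relational Operators
(Figure: states 0 to 8, start state 0.
• From 0 on <: state 1. From 1 on =: state 2, return (relop, LE); on >: state 3, return (relop, NE); on other: state 4, return (relop, LT).
• From 0 on =: state 5, return (relop, EQ).
• From 0 on >: state 6. From 6 on =: state 7, return (relop, GE); on other: state 8, return (relop, GT).
In states 4 and 8 the "other" character is not part of the lexeme, so the input pointer is retracted by one character.)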
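Hard-coding a transition diagram means turning each state into a branch of code. The Python sketch below does this for the relational-operator diagram; the function name `relop` and the convention of returning the position where the next token starts are assumptions. Retraction in states 4 and 8 is modelled by simply not consuming the "other" character.

```python
def relop(text, pos=0):
    """Hard-coded relational-operator diagram (states 0 to 8).

    Returns ((token, attribute), next_pos), or None if no relop starts at pos.
    """
    def char_at(i):
        return text[i] if i < len(text) else ""  # end of input counts as 'other'

    c = char_at(pos)
    if c == "<":                                  # state 0 -> 1
        nxt = char_at(pos + 1)
        if nxt == "=":
            return ("relop", "LE"), pos + 2       # state 2
        if nxt == ">":
            return ("relop", "NE"), pos + 2       # state 3
        return ("relop", "LT"), pos + 1           # state 4 (retract)
    if c == "=":
        return ("relop", "EQ"), pos + 1           # state 5
    if c == ">":                                  # state 0 -> 6
        if char_at(pos + 1) == "=":
            return ("relop", "GE"), pos + 2       # state 7
        return ("relop", "GT"), pos + 1           # state 8 (retract)
    return None                                   # not a relational operator
```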

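Transition Diagram: Example
Transition Diagram for Unsigned Numbers
(Figure: states 1 to 8 with start state 1 and edges labeled digit, '.', E, '+ or -' and other. An unsigned number is one or more digits, optionally followed by a fraction ('.' and one or more digits), optionally followed by an exponent (E, an optional + or -, and one or more digits); state 8 is reached on any other character.)
Examples of unsigned numbers: 3, 5280, 39.37, 1.894 E - 4, 2.56 E + 7, 45 E + 6, 96 E 2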
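A hand-coded recogniser for the unsigned-number pattern is sketched below in Python. The state names (`int`, `dot`, `frac`, `exp`, ...) are descriptive inventions rather than the numbering of the diagram, and the blanks in examples such as "1.894 E - 4" are assumed to be absent from the actual lexeme. The recogniser remembers the last accepting position, so on input such as 12E+x it backs up and returns 12, which is the retraction that the "other" edges model.

```python
DIGITS = "0123456789"


def char_class(c):
    """Map a character to the edge label used in the diagram."""
    if c in DIGITS:
        return "digit"
    if c in "+-":
        return "sign"
    if c in (".", "E"):
        return c
    return "other"


# digits, optional fraction ". digits", optional exponent "E [+|-] digits"
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "."): "dot",
    ("int", "E"): "exp",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
    ("frac", "E"): "exp",
    ("exp", "sign"): "exp_sign",
    ("exp", "digit"): "exp_digits",
    ("exp_sign", "digit"): "exp_digits",
    ("exp_digits", "digit"): "exp_digits",
}
ACCEPTING = {"int", "frac", "exp_digits"}


def longest_unsigned_number(text, pos=0):
    """Return the longest unsigned number starting at pos, or None."""
    state, i, last_accept = "start", pos, None
    while i < len(text):
        state = TRANSITIONS.get((state, char_class(text[i])))
        if state is None:
            break              # no edge: the 'other' case
        i += 1
        if state in ACCEPTING:
            last_accept = i    # remember the longest match seen so far
    return text[pos:last_accept] if last_accept is not None else None
```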

Hard Coding and Automatic Generation of Lexical Analyser

• A lexical analyser identifies patterns in its input.
• When a transition diagram is constructed to recognize the patterns and is then coded by hand, the result is known as a hard-coded lexical analyser.
Example:
• To represent an identifier in 'C', the first character must be a letter and the other characters are letters or digits. To recognize this pattern, a hard-coded lexical analyser works with this transition diagram:
(Figure: start state 1; on a letter go to state 2; state 2 loops on letter or digit; an edge from state 2 leads to final state 3.)
• Lex and flex are compiler tools that take regular expressions as input and generate a lexical analyser that finds the patterns matching those expressions.
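The identifier diagram translates into a few lines of hand-written code. This Python sketch is an illustration, not the author's implementation: it follows the slide's definition (letters and digits only, ASCII assumed, no underscore), and the function name `identifier` is hypothetical.

```python
import string

LETTERS = string.ascii_letters  # A-Z and a-z
DIGITS = string.digits          # 0-9


def identifier(text, pos=0):
    """Hard-coded diagram for letter (letter | digit)*.

    Returns (lexeme, next_pos), or None if text[pos] is not a letter.
    """
    # State 1 (start): the first character must be a letter.
    if pos >= len(text) or text[pos] not in LETTERS:
        return None
    i = pos + 1
    # State 2: loop on letter or digit.
    while i < len(text) and (text[i] in LETTERS or text[i] in DIGITS):
        i += 1
    # Any other character (or end of input) leads to final state 3;
    # that character is not part of the identifier.
    return text[pos:i], i
```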

Implementing Regular Expression:
• Regular expressions can be implemented using finite
automata.
• There are two kinds of finite automata:
• NFAs (nondeterministic finite automata)
• DFAs (deterministic finite automata)
The steps in implementing a lexical analyser:
1. Lexical Specification
2. Regular Expression
3. NFA
4. DFA
5. Table-Driven DFA

Finite State Automaton
A finite set of states is present in an FSM:
• One is marked as the initial state
• One or more are marked as final states
• States are sometimes labeled or numbered
A set of transitions from one state to another:
• Each is labeled with a symbol from Σ (the possible symbols), or with ε
Operate by reading input symbols:
• A transition can be taken if it is labeled with the current symbol
• An ε-transition can be taken at any time
Accept when a final state is reached and there is no more input.
Reject if no transition is possible, or if there is no more input and the automaton is not in a final state (DFA).

Finite Automata
• We call the recognizer of the tokens a finite automaton.
• An FA answers "yes" or "no" for each input string.
• An FA consists of:
• S: a set of states
• Σ: a set of input symbols
• move: a transition function
• S0: the initial state
• F: the set of accepting (final) states
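The five components S, Σ, move, S0 and F map directly onto a small data structure. The Python sketch below is an illustration under assumptions: the `DFA` class and the example automaton for binary strings ending with 0, i.e. the language (0|1)*0, are hypothetical, and a missing entry in `move` is treated as rejection.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: set       # S
    alphabet: set     # Σ
    move: dict        # transition function: (state, symbol) -> state
    start: str        # S0
    accepting: set    # F

    def accepts(self, word):
        """Answer "yes" (True) or "no" (False) for one input string."""
        state = self.start
        for symbol in word:
            if (state, symbol) not in self.move:
                return False            # no transition possible: reject
            state = self.move[(state, symbol)]
        return state in self.accepting  # input used up: accept iff final


# Example: binary strings ending with 0, i.e. (0|1)*0.
ends_with_0 = DFA(
    states={"q0", "q1"},
    alphabet={"0", "1"},
    move={("q0", "0"): "q1", ("q0", "1"): "q0",
          ("q1", "0"): "q1", ("q1", "1"): "q0"},
    start="q0",
    accepting={"q1"},
)
```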

Finite Automata
• A finite automaton can be deterministic (DFA) or non-deterministic (NFA).
• Both deterministic and non-deterministic finite automata recognize regular sets.
• Deterministic: a faster recognizer, but it may take more space
• Non-deterministic: slower, but it may take less space
• Deterministic automata are widely used in lexical analysers.

• Deterministic finite automata (DFA): exactly one edge leaves each state for each input symbol, and there are no ε-edges.
• Nondeterministic finite automata (NFA): there are no restrictions on the edges leaving a state. Several edges may have the same symbol as label, and some edges can be labeled with ε.
(Figure: a DFA and an NFA, each with states 1 to 4 and the path 1 -a-> 2 -b-> 3 -b-> 4, plus further a- and b-edges.)
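An NFA can be simulated by tracking the set of all states it could currently be in. In this Python sketch the NFA is an assumption, not a transcription of the figure: state 1 loops on a and b, and the path 1 -a-> 2 -b-> 3 -b-> 4 leads to accepting state 4, so it accepts strings ending in abb. ε-edges are left out here; they are handled by the ε-closure operation of the subset construction later in this unit.

```python
# Assumed NFA: (state, symbol) -> set of possible next states.
NFA_MOVE = {
    (1, "a"): {1, 2},  # on 'a', stay in 1 or guess that the final "abb" starts
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
NFA_START = 1
NFA_ACCEPT = {4}


def nfa_accepts(word):
    current = {NFA_START}  # every state the NFA could be in so far
    for symbol in word:
        current = set().union(*(NFA_MOVE.get((s, symbol), set()) for s in current))
        if not current:
            return False  # no state has a transition on this symbol
    return bool(current & NFA_ACCEPT)
```

Tracking a set of states is exactly what the subset construction precomputes: each distinct set reached here becomes one DFA state.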

Regular Expression to NFA
(Thompson’s Rule)

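Example
(RE to NFA)
(Figure: Thompson NFAs for a* (states 1 to 4, with an a-edge from 2 to 3 and four ε-edges), ab (states 1 to 3: 1 -a-> 2 -b-> 3) and (a|b) (states 1 to 6, with an a-edge, a b-edge and four ε-edges).)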

Example
(RE to NFA)
(Figure: Thompson NFAs for a*b and b*ab.)

Example
(RE to NFA)
(Figure: Thompson NFA for (c|d)*, states 0 to 7, with a c-edge, a d-edge and ε-edges.)
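Thompson's rules build an NFA from one small fragment per operator: a symbol, concatenation, union (|) and star (*). The Python sketch below is an illustration, not the author's code: the class `ThompsonNFA`, its method names and the (from, label, to) edge triples are assumptions, and the expression is assembled by hand rather than parsed. Concatenation merges the accept state of the first fragment with the start state of the second, as in the ab figure, so (a|b)*abb ends up with 11 states, the same number as the NFA with states 0 to 10 used in the subset-construction example.

```python
from itertools import count

EPS = None  # label used for ε-edges


class ThompsonNFA:
    """Builds NFA fragments (start, accept) with Thompson's rules."""

    def __init__(self):
        self.edges = []          # (from_state, label, to_state)
        self._ids = count()      # fresh state numbers 0, 1, 2, ...

    def _new(self):
        return next(self._ids)

    def symbol(self, a):
        s, f = self._new(), self._new()
        self.edges.append((s, a, f))
        return s, f

    def concat(self, n1, n2):
        # Merge N2's start state into N1's accept state.
        # Safe because a fragment's start state never has incoming edges.
        self.edges = [(n1[1] if u == n2[0] else u, lbl, v)
                      for u, lbl, v in self.edges]
        return n1[0], n2[1]

    def union(self, n1, n2):
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, n1[0]), (s, EPS, n2[0]),
                       (n1[1], EPS, f), (n2[1], EPS, f)]
        return s, f

    def star(self, n):
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, n[0]), (n[1], EPS, n[0]),   # enter / repeat
                       (n[1], EPS, f), (s, EPS, f)]          # leave / skip
        return s, f

    def states(self):
        return {u for u, _, _ in self.edges} | {v for _, _, v in self.edges}

    def accepts(self, frag, word):
        """Simulate the NFA using ε-closures of state sets."""
        start, accept = frag

        def closure(states):
            stack, seen = list(states), set(states)
            while stack:
                u = stack.pop()
                for x, lbl, v in self.edges:
                    if x == u and lbl is EPS and v not in seen:
                        seen.add(v)
                        stack.append(v)
            return seen

        current = closure({start})
        for a in word:
            current = closure({v for u, lbl, v in self.edges
                               if u in current and lbl == a})
        return accept in current


# (a|b)*abb, built bottom-up with the rules above
nfa = ThompsonNFA()
frag = nfa.star(nfa.union(nfa.symbol("a"), nfa.symbol("b")))
for ch in "abb":
    frag = nfa.concat(frag, nfa.symbol(ch))
```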

Exercise
Construct NFAs using Thompson's rule for:
• abab
• a*|b*
• abb(a)*ba
• (ab + b)*ba
• (a + b)*aa(a+b)

NFA to DFA conversion
(Using subset construction method)

Subset Construction Algorithm
• Input: NFA (N)
• Output: DFA (D) accepting the same language
• Method: apply the algorithm and build the transition table Dtran for the DFA.
Operations to perform:
• Є-closure(s): set of NFA states reachable from NFA state s on Є-transitions alone.
• Є-closure(T): set of NFA states reachable from some NFA state s in T on Є-transitions alone.
• Move(T, a): set of NFA states to which there is a transition on input symbol a from some NFA state s in T.

Subset Construction Algorithm
Initially, Є-closure(s0) is the only state in Dstates, and it is unmarked;
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U = Є-closure(Move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        Dtran[T, a] = U
    end
end

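Example: (a|b)*abb
(Figure: Thompson NFA for (a|b)*abb with states 0 to 10; start state 0, accepting state 10. Є-edges: 0→1, 0→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7. Symbol edges: 2 -a-> 3, 4 -b-> 5, 7 -a-> 8, 8 -b-> 9, 9 -b-> 10.)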


Convert NFA to DFA: (a|b)*abb
Є-closure(0) = {0,1,2,4,7} → let A
Move(A,a) = {3,8}
Є-closure(Move(A,a)) = {1,2,3,4,6,7,8} → let B
Move(A,b) = {5}
Є-closure(Move(A,b)) = {1,2,4,5,6,7} → let C

Move(B,a) = {3,8}
Є-closure(Move(B,a)) = {1,2,3,4,6,7,8} → B
Move(B,b) = {5,9}
Є-closure(Move(B,b)) = {1,2,4,5,6,7,9} → let D
Move(C,a) = {3,8}
Є-closure(Move(C,a)) = {1,2,3,4,6,7,8} → B
Move(C,b) = {5}
Є-closure(Move(C,b)) = {1,2,4,5,6,7} → C

Move(D,a) = {3,8}
Є-closure(Move(D,a)) = {1,2,3,4,6,7,8} → B
Move(D,b) = {5,10}
Є-closure(Move(D,b)) = {1,2,4,5,6,7,10} → let E
Move(E,a) = {3,8}
Є-closure(Move(E,a)) = {1,2,3,4,6,7,8} → B
Move(E,b) = {5}
Є-closure(Move(E,b)) = {1,2,4,5,6,7} → C

State                   a   b
A = {0,1,2,4,7}         B   C
B = {1,2,3,4,6,7,8}     B   D
C = {1,2,4,5,6,7}       B   C
D = {1,2,4,5,6,7,9}     B   E
E = {1,2,4,5,6,7,10}    B   C
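The subset construction for this example can be reproduced in Python. This sketch hard-codes the standard Thompson NFA for (a|b)*abb with states 0 to 10 (an assumption; its closures agree with the ones computed above) and processes unmarked states first-in first-out, which is why the DFA states come out in the order A, B, C, D, E. The function names are illustrative.

```python
EPS = "ε"

# Thompson NFA for (a|b)*abb as (from_state, label, to_state) triples.
EDGES = [
    (0, EPS, 1), (0, EPS, 7), (1, EPS, 2), (1, EPS, 4),
    (2, "a", 3), (4, "b", 5), (3, EPS, 6), (5, EPS, 6),
    (6, EPS, 1), (6, EPS, 7), (7, "a", 8), (8, "b", 9), (9, "b", 10),
]
START, ACCEPT, ALPHABET = 0, 10, ("a", "b")


def eps_closure(T):
    """Є-closure(T): NFA states reachable from T on Є-transitions alone."""
    stack, closure = list(T), set(T)
    while stack:
        s = stack.pop()
        for u, label, v in EDGES:
            if u == s and label == EPS and v not in closure:
                closure.add(v)
                stack.append(v)
    return frozenset(closure)


def move(T, a):
    """Move(T, a): NFA states reached by one a-transition from a state in T."""
    return {v for u, label, v in EDGES if u in T and label == a}


def subset_construction():
    start = eps_closure({START})
    dstates, unmarked, dtran = [start], [start], {}
    while unmarked:
        T = unmarked.pop(0)              # mark T
        for a in ALPHABET:
            U = eps_closure(move(T, a))  # may be empty (a dead state) in general
            if U not in dstates:
                dstates.append(U)        # add U as an unmarked state
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran


def lettered_table():
    """Name DFA states A, B, C, ... in order of discovery."""
    dstates, dtran = subset_construction()
    names = {T: chr(ord("A") + i) for i, T in enumerate(dstates)}
    table = {names[T]: tuple(names[dtran[(T, a)]] for a in ALPHABET)
             for T in dstates}
    accepting = {names[T] for T in dstates if ACCEPT in T}
    return table, accepting
```

A DFA state is accepting exactly when its set contains the NFA's accepting state 10, which here is only E.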

Exercise
• Convert the following regular expressions to DFAs using the subset construction method:
• (0 + 1)*1(0 + 1)
• (0 + 1)*01*

DFA Optimization
• The procedure is also known as minimization of a DFA.
• Minimization/optimization refers to detecting those states of a DFA whose presence or absence does not affect the language accepted by the automaton.
• The states that can be eliminated from the automaton without affecting the language it accepts are:
• Unreachable or inaccessible states
• Dead states
• Non-distinguishable (indistinguishable or equivalent) states
• The partitioning algorithm helps in the DFA optimization process.

Partitioning Algorithm
1. Remove all states that are unreachable from the initial state via any sequence of DFA transitions.
2. Draw the transition table of the remaining states.
3. Split the transition table into two tables T1 and T2:
   1. T1 contains all final states
   2. T2 contains all non-final states
4. Find similar rows in T1, i.e. two states q and r such that, for every input symbol a,
   δ(q, a) = p
   δ(r, a) = p
   (the two states have the same entries for a and for b), and remove one of them, replacing every reference to it by the remaining state.
5. Repeat step 4 until no similar rows remain in T1.
6. Repeat steps 4 and 5 for table T2 as well.
7. Combine the reduced T1 and T2 tables; the result is the transition table of the minimized DFA.

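DFA Optimization
Transition table:
States   a   b
A        B   C
B        B   D
C        B   C
D        B   E
E        B   C
Partitioning:
• The initial group {A, B, C, D, E} splits into non-accepting states {A, B, C, D} and accepting states {E}.
• On input b, D goes to E while A, B and C stay within {A, B, C, D}, so {A, B, C, D} splits into {A, B, C} and {D}.
• On input b, B goes to D while A and C go to C, so {A, B, C} splits into {A, C} and {B}.
• Now no more splitting is possible.
• If we choose A as the representative for group (AC), we obtain the reduced transition table.
Optimized Transition Table
States   a   b
A        B   A
B        B   D
D        B   E
E        B   A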
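The group-splitting in this example is a partition-refinement procedure. The Python sketch below is one way to implement it for this example (the function names are hypothetical, and the DFA is assumed to have no unreachable states already). It starts from the accepting/non-accepting split and keeps splitting a group whenever two of its states move to different groups on some input symbol.

```python
# Transition table of the DFA for (a|b)*abb from the subset construction.
DTRAN = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}
ALPHABET = ("a", "b")


def group_of(state, partition):
    return next(i for i, g in enumerate(partition) if state in g)


def minimize(dtran, accepting, alphabet):
    """Split groups until every group's states agree on where each symbol leads."""
    partition = [g for g in (set(accepting), set(dtran) - set(accepting)) if g]
    while True:
        refined = []
        for group in partition:
            buckets = {}
            for s in sorted(group):
                # Signature: the group reached on each input symbol.
                key = tuple(group_of(dtran[s][a], partition) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):  # no group was split: done
            return partition
        partition = refined


def reduced_table(dtran, partition, alphabet):
    """Keep one representative (alphabetically first) per group."""
    rep = {s: min(g) for g in partition for s in g}
    return {rep[s]: {a: rep[dtran[s][a]] for a in alphabet} for s in dtran}
```

Choosing the alphabetically first state as representative makes A stand for the group (AC), which reproduces the optimized table above.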

Conversion from regular
expression to DFA

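Functions Computed from the Syntax Tree
• nullable(n): true if the subexpression rooted at node n can generate the empty string. It is always true for a * node and a node labeled Ɛ, false for a leaf with a position, and computed from the children for | and . nodes (see the rules below).
• firstpos(n): the set of positions in the subtree rooted at n that correspond to the first symbol of at least one string generated by the subexpression rooted at n.
• lastpos(n): the set of positions in the subtree rooted at n that correspond to the last symbol of at least one string generated by the subexpression rooted at n.
• followpos(i): for a position i, the set of positions j such that some string generated by the (augmented) regular expression has the symbol at position j immediately following the symbol at position i.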

Rules to compute nullable, firstpos, lastpos
• A leaf labeled Ɛ: nullable(n) = true; firstpos(n) = ∅; lastpos(n) = ∅
• A leaf with position i: nullable(n) = false; firstpos(n) = {i}; lastpos(n) = {i}
• An or-node n = c1 | c2: nullable(n) = nullable(c1) or nullable(c2); firstpos(n) = firstpos(c1) ∪ firstpos(c2); lastpos(n) = lastpos(c1) ∪ lastpos(c2)
• A cat-node n = c1 . c2: nullable(n) = nullable(c1) and nullable(c2); firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1); lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
• A star-node n = c1*: nullable(n) = true; firstpos(n) = firstpos(c1); lastpos(n) = lastpos(c1)

Computation of followpos
A position of a regular expression can follow another position in only two ways:
• If n is a cat node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i). That is, the firstpos of the right child follows every position in the lastpos of the left child.
• If n is a star node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i). That is, the firstpos of the star node follows every position in its own lastpos.
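The rules for nullable, firstpos, lastpos and followpos fit in one bottom-up pass over the syntax tree. The Python sketch below is an illustration, not the author's code: the `Node` class and helper names are hypothetical, leaves carry only their position number, and ε-leaves are omitted because the running example (a|b)*abb# does not need them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                       # "leaf", "or", "cat" or "star"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    pos: int = 0                    # position number (leaves only)


def leaf(pos):
    return Node("leaf", pos=pos)


def cat(c1, c2):
    return Node("cat", c1, c2)


def alt(c1, c2):
    return Node("or", c1, c2)


def star(c1):
    return Node("star", c1)


def annotate(n, followpos):
    """Return (nullable, firstpos, lastpos) for n and fill in followpos."""
    if n.kind == "leaf":            # ε-leaves are not needed for this example
        return False, {n.pos}, {n.pos}
    if n.kind == "star":
        _, first, last = annotate(n.left, followpos)
        for i in last:              # star rule: firstpos(n) follows lastpos(n)
            followpos[i] |= first
        return True, first, last
    null1, first1, last1 = annotate(n.left, followpos)
    null2, first2, last2 = annotate(n.right, followpos)
    if n.kind == "or":
        return null1 or null2, first1 | first2, last1 | last2
    for i in last1:                 # cat rule: firstpos(c2) follows lastpos(c1)
        followpos[i] |= first2
    first = first1 | first2 if null1 else first1
    last = last1 | last2 if null2 else last2
    return null1 and null2, first, last


# (a|b)*abb# with positions 1=a, 2=b, 3=a, 4=b, 5=b, 6=#
tree = cat(cat(cat(cat(star(alt(leaf(1), leaf(2))), leaf(3)), leaf(4)), leaf(5)), leaf(6))
followpos = {i: set() for i in range(1, 7)}
nullable, firstpos, lastpos = annotate(tree, followpos)
```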

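Conversion from regular expression to DFA
Example: (a|b)*abb
Step 1: Construct the syntax tree for the augmented expression (a|b)*abb#.
(Figure: syntax tree ((((a|b)* . a) . b) . b) . #, with leaf positions 1 = a and 2 = b under the | node, 3 = a, 4 = b, 5 = b and 6 = #.)
Step 2: Nullable nodes
Here, the * node is the only nullable node.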

Step 3: Calculate firstpos
Firstpos rules: a leaf with position i has firstpos {i}; an or-node has firstpos(c1) ∪ firstpos(c2); a star-node has firstpos(c1); a cat-node has firstpos(c1) ∪ firstpos(c2) if nullable(c1), else firstpos(c1).
• Leaves at positions 1 to 6: {1}, {2}, {3}, {4}, {5}, {6}
• | node: {1,2}
• * node: {1,2}
• Cat node (a|b)* . a: {1,2,3} (the * node is nullable)
• The remaining cat nodes, up to the root: {1,2,3}

Step 3: Calculate lastpos
Lastpos rules: a leaf with position i has lastpos {i}; an or-node has lastpos(c1) ∪ lastpos(c2); a star-node has lastpos(c1); a cat-node has lastpos(c1) ∪ lastpos(c2) if nullable(c2), else lastpos(c2).
• Leaves at positions 1 to 6: {1}, {2}, {3}, {4}, {5}, {6}
• | node: {1,2}
• * node: {1,2}
• Cat nodes, from the lowest up to the root: {3}, {4}, {5}, {6}

Step 4: Calculate followpos
Apply the cat-node rule to each cat node, from the root down, and then the star-node rule:
• Root cat node (c1 = (a|b)*abb, c2 = #): i = lastpos(c1) = {5}, firstpos(c2) = {6}, so followpos(5) = {6}.
• Cat node (c1 = (a|b)*ab, c2 = b): i = lastpos(c1) = {4}, firstpos(c2) = {5}, so followpos(4) = {5}.
• Cat node (c1 = (a|b)*a, c2 = b): i = lastpos(c1) = {3}, firstpos(c2) = {4}, so followpos(3) = {4}.
• Cat node (c1 = (a|b)*, c2 = a): i = lastpos(c1) = {1,2}, firstpos(c2) = {3}, so 3 is in followpos(1) and followpos(2).
• Star node n = (a|b)*: i = lastpos(n) = {1,2}, firstpos(n) = {1,2}, so 1 and 2 are in followpos(1) and followpos(2).
Position   followpos
1          {1,2,3}
2          {1,2,3}
3          {4}
4          {5}
5          {6}
6          ∅

Construct DFA
Initial state = firstpos of root = {1,2,3} ----- A
State A
δ({1,2,3}, a) = followpos(1) ∪ followpos(3)
= {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3}, b) = followpos(2)
= {1,2,3} ----- A
State B
δ({1,2,3,4}, a) = followpos(1) ∪ followpos(3)
= {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,4}, b) = followpos(2) ∪ followpos(4)
= {1,2,3} ∪ {5} = {1,2,3,5} ----- C
State C
δ({1,2,3,5}, a) = followpos(1) ∪ followpos(3)
= {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,5}, b) = followpos(2) ∪ followpos(5)
= {1,2,3} ∪ {6} = {1,2,3,6} ----- D
State D
δ({1,2,3,6}, a) = followpos(1) ∪ followpos(3)
= {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,6}, b) = followpos(2)
= {1,2,3} ----- A
(Position 6 holds the endmarker #, so it contributes no transitions.)

States          a   b
A = {1,2,3}     B   A
B = {1,2,3,4}   B   C
C = {1,2,3,5}   B   D
D = {1,2,3,6}   B   A

DFA
(Figure: DFA with states A, B, C, D following the table above; A is the start state, and D, which contains position 6 of #, is the accepting state.)
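Building the DFA from followpos is again a worklist loop over unmarked sets of positions. The Python sketch below takes the followpos table of the example as given data (it does not recompute it) and names the resulting states in discovery order; the function names are hypothetical.

```python
# Data from the worked example for (a|b)*abb#.
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
ROOT_FIRSTPOS = frozenset({1, 2, 3})
END_POS = 6  # position of the endmarker #


def build_dfa(alphabet=("a", "b")):
    dstates, unmarked, dtran = [ROOT_FIRSTPOS], [ROOT_FIRSTPOS], {}
    while unmarked:
        S = unmarked.pop(0)
        for a in alphabet:
            # U = union of followpos(p) over the positions p in S labelled a
            U = frozenset().union(*(FOLLOWPOS[p] for p in S if SYMBOL_AT[p] == a))
            if not U:
                continue  # no transition on a (does not occur in this example)
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(S, a)] = U
    # A DFA state is accepting when it contains the position of #.
    accepting = [S for S in dstates if END_POS in S]
    return dstates, dtran, accepting


def lettered(alphabet=("a", "b")):
    """Name states A, B, C, ... in order of discovery and return the table."""
    dstates, dtran, accepting = build_dfa(alphabet)
    names = {S: chr(ord("A") + i) for i, S in enumerate(dstates)}
    table = {names[S]: tuple(names[dtran[(S, a)]] for a in alphabet)
             for S in dstates}
    return table, [names[S] for S in accepting]
```

The result has four states, one fewer than the subset construction produced for the same expression, and matches the optimized transition table from the DFA optimization example.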

Note: In the subset-construction example for (a|b)*abb, state E contains NFA state 10, which is the accepting state of the NFA, so E is the accepting state of that DFA.
