Department of CE
Prof. Happy Chapla
Unit no : 2
Lexical Analyser
CD:COMPILER DESIGN
Outline :
Introduction to Lexical Analyser
Tokens, Lexemes, and Patterns
Specification of Tokens
Regular expression and Regular Definition
Transition Diagram
Hard coding and automatic generation of lexical analysers
Finite Automata
Regular expression to NFA using Thompson’s rule
NFA to DFA conversion using subset construction method
DFA Optimization
Regular expression to DFA conversion
Role of Lexical Analyser
• Remove comments and white space in the form of blanks, tabs, and newline characters (aka scanning)
• Macro expansion
• Read input characters from the source program
• Group them into lexemes
• Produce as output a sequence of tokens
• Interact with the symbol table
• Correlate error messages generated by the compiler with the source program
Scanner – Parser Interaction
• After receiving a “Get next token” command from the parser, the lexical analyzer reads input characters until it can identify the next token.
Why separate Lexical and Syntactic analysis?
• Simplicity of design
• Improved compiler efficiency
• Allows specialized techniques for the lexer that are not suitable for the parser
• Higher portability
• Input-device-specific peculiarities are restricted to the lexer
Tokens, Lexemes and Patterns
• Lexeme:
A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
• Pattern:
A pattern is a rule that describes the set of strings in the input for which the same token is produced as output.
• Token:
A token is a sequence of characters that can be treated as a single logical entity. Typical tokens are:
1) identifiers 2) keywords 3) operators 4) special symbols 5) constants
Example
Token        Sample lexemes        Pattern
else         else                  characters e, l, s, e
if           if                    characters i, f
comparison   <=, <>, >=, >         < or <= or = or <> or >= or >
id           pi, s1, Mj5           letter followed by letters and digits
num          3.14, 0, 3.09e17      any numeric constant
literal      "core"                any characters between " and " except "
Token Classes
Several token classes are possible:
• One token per keyword
• Tokens for the operators
• One token representing all identifiers
• Tokens representing constants (e.g. numbers)
• Tokens for punctuation symbols
Example
In a C program, consider the variable declaration:
int value = 100;
• int (keyword)
• value (identifier)
• = (operator)
• 100 (constant)
• ; (symbol)
Exercise:
if(y <= t)
y = y - 3;
Example
Total = Ans + 30
• Tokens:
  Total : Identifier 1
  = : Operator 1
  Ans : Identifier 2
  + : Operator 2
  30 : Constant 1
• Lexemes:
  Lexemes of identifiers : Total, Ans
  Lexemes of operators : =, +
  Lexemes of constant : 30
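The token/lexeme split in the two examples above can be sketched with a small regex-driven scanner. This is a minimal illustration, not the deck's implementation: the token class names and the patterns in `TOKEN_SPEC` are assumptions chosen to mirror the slide examples.

```python
import re

# Hypothetical token classes and patterns, chosen to mirror the slide examples.
TOKEN_SPEC = [
    ("keyword",    r"\b(?:int|if|else)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("constant",   r"\d+(?:\.\d+)?"),
    ("operator",   r"<=|>=|==|!=|[=+\-*/<>]"),
    ("symbol",     r"[;(){}]"),
    ("skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token class, lexeme) pairs for the source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # No pattern matches: this is the lexical-error case.
            raise ValueError(f"no pattern matches at position {pos}: {source[pos]!r}")
        if m.lastgroup != "skip":          # white space is discarded
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("int value = 100;"))
print(tokenize("Total = Ans + 30"))
```

Keywords are listed before identifiers so that `int` is classified as a keyword, while a longer name such as `integer` still falls through to the identifier pattern.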
Dealing with errors
How does the lexical analyser deal with errors?
• The lexical analyser is unable to proceed when no pattern matches the remaining input.
• Panic mode recovery: delete successive characters from the remaining input until a token is found
• Insert a missing character
• Delete a character
• Replace a character by another
• Transpose two adjacent characters
Specification of tokens
• Tokens are specified using three concepts:
  Strings
  Languages
  Regular expressions
• Strings and Languages:
  An alphabet or character class is a finite set of symbols.
  A string over an alphabet is a finite sequence of symbols drawn from that alphabet.
  A language is any countable set of strings over some fixed alphabet.
Operations on Strings
• A prefix of string s is any string obtained by removing zero or more symbols from the end of s. For example, ban is a prefix of banana.
• A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana is a suffix of banana.
• A substring of s is obtained by deleting any prefix and any suffix from s. For example, nan is a substring of banana.
• The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are neither ε nor equal to s itself.
• A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana.
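The definitions above translate almost directly into code. The sketch below is illustrative only; the function names are hypothetical and the empty Python string stands in for ε.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]

def suffixes(s):
    """All suffixes of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]

def substrings(s):
    """All substrings: delete any prefix and any suffix."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

def proper(strings, s):
    """Keep only the strings that are neither empty nor equal to s."""
    return [t for t in strings if t != "" and t != s]

def is_subsequence(t, s):
    """True if t can be formed by deleting zero or more positions of s."""
    remaining = iter(s)
    return all(ch in remaining for ch in t)

print(proper(prefixes("banana"), "banana"))
print(is_subsequence("baan", "banana"))
```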
Exercise
• Write the prefixes, suffixes, substrings, proper prefixes, proper suffixes and subsequences of the following strings:
  Example
  Revolution
Operations on Languages
The following operations can be applied to languages (here applied to languages L and S):
• Union (L ∪ S) : {t | t is in L or t is in S}
• Concatenation (LS) : {tz | t is in L and z is in S}
• Kleene Closure (L*) : L* denotes “zero or more concatenations of” L.
• Positive Closure (L+) : L+ denotes “one or more concatenations of” L.
Example
Let L = {0, 1} and S = {a, b, c}
• Union : L ∪ S = {0, 1, a, b, c}
• Concatenation : L . S = {0a, 1a, 0b, 1b, 0c, 1c}
• Kleene Closure : L* = {ε, 0, 1, 00, ……}
• Positive Closure : L+ = {0, 1, 00, ……}
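The same operations can be tried out on finite Python sets. Since L* and L+ are infinite, this sketch only enumerates them up to a chosen number of concatenations; the helper names and the bound `n` are assumptions made for illustration.

```python
def union(L, S):
    return L | S

def concat(L, S):
    """{tz | t in L and z in S}"""
    return {t + z for t in L for z in S}

def power(L, n):
    """L concatenated with itself n times; power(L, 0) is {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def kleene_upto(L, n):
    """Finite approximation of L*: zero up to n concatenations of L."""
    return set().union(*(power(L, i) for i in range(n + 1)))

def positive_upto(L, n):
    """Finite approximation of L+: one up to n concatenations of L."""
    return set().union(*(power(L, i) for i in range(1, n + 1)))

L, S = {"0", "1"}, {"a", "b", "c"}
print(sorted(union(L, S)))
print(sorted(concat(L, S)))
print(sorted(kleene_upto(L, 2)))
```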
Language:
A language is a set of strings.
Example: Even numbers
Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (All possible symbols for given language)
L = {0, 2, 4, 6, 8, 10, 12, 14, … } (Elements present in language)
Example: Variable name in C language
Σ = ASCII characters
L = {a, b, c, …, A, B, C, …, _, aa, ab, … }
Regular Language:
• The regular languages are the subset of all languages that can be defined by regular expressions.
• Any character is a regular expression matching itself. (a is a regular expression for character a)
• ε is a regular expression matching the empty string.
Operations on Regular Expressions:
If R1 and R2 are two regular expressions, then:
• R1R2 is a regular expression matching the concatenation of the languages.
• R1 | R2 is a regular expression matching the union (disjunction) of the languages.
• R1* is a regular expression matching the Kleene closure of the language (0 or more occurrences).
• (R) is a regular expression matching R.
Example (Let Σ = {a, b})
• The regular expression a|b denotes the language _________.
Ans: {a, b}
• (a|b)(a|b) denotes ___________
Ans: {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ.
Another regular expression for the same language is aa|ab|ba|bb.
• a* denotes the language consisting of ______________
Ans: all strings of zero or more a's, that is, {ε,a,aa,aaa,...}
Example (Let Σ = {0, 1})
• (0|1)* denotes the set of all strings _____________
Ans: Strings containing zero or more instances of 0 or 1, that is, all strings of 0's and 1's: {ε, 0, 1, 00, 01, 10, 11, ...}.
Another regular expression for the same language is (0*1*)*
• 0|0*1 denotes the language __________
Ans: {0, 1, 01, 001, 0001,...}, that is, the string 0 and all strings consisting of zero or more
0's and ending with 1.
Regular Expression & Regular Definition
Regular Expression
A regular expression is a sequence of characters that defines a pattern.
Notations:
• One or more instances : +
• Zero or more instances : *
• Zero or one instance : ?
• Alphabet : Σ
• For a regular expression r, the regular language it denotes is L(r)
• (abc) : a group; “abc” must occur together, in that order
• [abc] : a character class; matches exactly one of the characters a, b or c
Regular Expression
Rules to define Regular Expression:
• ∅ is a regular expression for the empty set.
• ε is a regular expression, and L(ε) is {ε}, that is, the language whose sole member is the empty string.
• If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression, and L(a) = {a}.
• Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,
  a) (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).
  b) (r)(s) is a regular expression denoting the language L(r)L(s).
  c) (r)* is a regular expression denoting (L(r))*.
  d) (r) is a regular expression denoting L(r).
Regular Expression
• L = Zero or more occurrences of a = a*
  a* = {ε, a, aa, aaa, aaaa, ……} (Infinite elements)
• L = One or more occurrences of a = a+
  a+ = {a, aa, aaa, aaaa, ……} (Infinite elements)
Regular Expression
Algebraic or identity rules of regular expressions:
(R, S and T are regular expressions; alternation | is also written +)
• R | S = S | R                  | is commutative
• (R | S) | T = R | (S | T)      | is associative
• (RS)T = R(ST)                  concatenation is associative
• (R | S)T = RT | ST             concatenation distributes over |
• εR = Rε = R                    ε is the identity element for concatenation
• R* = (ε | R)*                  relation between * and ε
• R** = R*                       * is idempotent
• RR* = R+                       positive closure
• R? = (R | ε)                   0 or 1 occurrence
Shorthands:
• [a-z] = (a|b|…|z)              1 character from the given range
• [acdgj] = (a|c|d|g|j)          1 of the given characters
Exercise: (For each given RE, define its meaning or list the strings that are valid for that RE)
• [012]+
Ans: (0|1|2)+ The string has one or more instances of 0, 1 or 2. All non-empty strings of 0’s, 1’s and 2’s: {0, 1, 2, 00, 01, 02, 10, 11, 12, ….}
• [0-9]+
Ans: All strings of digits from 0 to 9. At least one digit is required.
• [1-9][0-9]+
Ans: The first symbol must be a digit from 1 to 9. It must be followed by one or more digits from 0 to 9.
The resulting set has strings of length at least 2.
• [a-zA-Z][a-zA-Z0-9]*
Ans: Strings starting with a letter, followed by zero or more letters or digits. Strings of length 1 such as a, z, A, C are also valid here.
Precedence and Associativity
• The unary operator * has highest precedence and is left associative.
• Concatenation has second highest precedence and is left associative.
• | has lowest precedence and is left associative.
Examples (Regular Expression)
Language                                                  Strings                                          Regular Expression
0 or 1                                                    0, 1                                             0 | 1
1 or 10 or 111                                            1, 10, 111                                       1 | 10 | 111
Strings having one or more 0                              0, 00, 000, 0000, ……                             0+
All non-empty binary strings over Σ = {0, 1}              0, 1, 00, 01, 10, 11, 000, ……                    (0 | 1)+
All possible strings of length 3 over Σ = {a, b, c}       aaa, aba, abc, abb, ……                           (a|b|c) (a|b|c) (a|b|c)
One or more occurrences of 0 or 1 or both                 0, 1, 00, 01, 10, 11, 111, 101, ……               (0 | 1)+
Binary string ending with 0                               0, 10, 100, 110, 00, 010, ……                     (0 | 1)* 0
Binary string starting with 1                             1, 10, 100, 110, 101, 1101, ……                   1 (0 | 1)*
Binary string starting with 1 and ending with 0           10, 110, 100, 1100, 1110, 1000, ……               1 (1|0)* 0
String starting and ending with same character, Σ = {0, 1}  0, 1, 00, 11, 010, 000, 101, 1101, 0110, 1011, ……  0 | 1 | 1 (1|0)* 1 | 0 (1|0)* 0
String ending with 01 for Σ = {0, 1}                      01, 101, 001, 1001, 1101, ……                     (0|1)*01
Language consisting of exactly two 0’s for Σ = {0, 1}     00, 010, 001, 0101, ……                           1*01*01*
All binary strings with length at least 3 for Σ = {0, 1}  000, 010, 110, 1111, 1011, ……                    (0|1) (0|1) (0|1) (0|1)*
All binary strings where 2nd symbol from start is 0       00, 10, 101, 100, ……                             (0|1)0(0|1)*
Any number of a’s, then any number of b’s, then any number of c’s   ε, abc, aaabc, abbbbbc, abccc, ab, accc, ……   a*b*c*
Exercise
Write a regular expression for each language specified over Σ = {0, 1}
• Strings having even length
Ans: RE: ((0|1)(0|1))*
• Strings containing exactly three 1’s
Ans: RE: 0*10*10*10*
• Strings starting with 0 and having odd length
Ans: RE: 0((0|1)(0|1))*
• Strings starting or ending with 01 or 111
Ans: RE: (01|111)(0|1)* | (0|1)*(01|111)
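One way to gain confidence in answers like these is to compare each regular expression against a plain-language predicate on every short binary string. The sketch below does this with Python's `re` module, whose syntax happens to coincide with the notation used for these four answers; the length bound of 8 and the helper names are assumptions.

```python
import re
from itertools import product

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# (regular expression from the exercise, predicate stating the language in words)
CASES = [
    (r"((0|1)(0|1))*", lambda s: len(s) % 2 == 0),
    (r"0*10*10*10*", lambda s: s.count("1") == 3),
    (r"0((0|1)(0|1))*", lambda s: s.startswith("0") and len(s) % 2 == 1),
    (r"(01|111)(0|1)*|(0|1)*(01|111)",
     lambda s: s.startswith(("01", "111")) or s.endswith(("01", "111"))),
]

def agrees(pattern, predicate, max_len=8):
    """True if the RE and the predicate accept exactly the same short strings."""
    rx = re.compile(pattern)
    return all((rx.fullmatch(s) is not None) == predicate(s)
               for s in binary_strings(max_len))

for pattern, predicate in CASES:
    print(pattern, agrees(pattern, predicate))
```

Agreement on short strings is evidence, not a proof, that the expression denotes the intended language.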
Regular Definition
• For notational convenience, we may give names to certain regular expressions and use those names in subsequent expressions, as if the names were themselves symbols.
• A sequence of such named expressions is known as a regular definition.
• A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
……
dn → rn
where each di is a distinct name and each ri is a regular expression.
Example
Regular definition for identifier
• letter → A|B|C|………..|Z|a|b|………..|z
• digit → 0|1|…….|9
• id → letter (letter | digit)*
Regular Definition for Even Numbers
• (+|-|ε) (0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)
• Sign → + | -
• OptSign → Sign | ε //In other words (Sign ?)
• Digit → [0-9] //In other words (0 | 1 | ….. | 9)
• EvenDigit → [02468] //In other words (0 | 2 | 4 | 6 | 8)
• EvenNumber → OptSign Digit* EvenDigit
Example
Regular Definition for Unsigned Numbers
• digit → 0 | 1 | ..... | 9
• digits → digit digit*
• optionalFraction → . digits | ε
• optionalExponent → ( E ( + | - | ε ) digits ) | ε
• number → digits optionalFraction optionalExponent
This can be simplified as:
• digit → [0-9]
• digits → digit+
• number → digits (. digits)? ( E [+-]? digits )?
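A regular definition maps naturally onto named string fragments that are substituted into later expressions. The following sketch builds the simplified `number` definition with Python's `re` module; the variable names follow the slide, while `is_unsigned_number` is a hypothetical helper. Lexemes are written without embedded spaces (for example `1.894E-4`).

```python
import re

# Each name is defined once and reused, as in a regular definition.
digit = r"[0-9]"
digits = rf"{digit}+"
number = rf"{digits}(\.{digits})?(E[+-]?{digits})?"

NUMBER = re.compile(number)

def is_unsigned_number(text):
    """True if the whole text is a lexeme of the 'number' definition."""
    return NUMBER.fullmatch(text) is not None

for lexeme in ["3", "5280", "39.37", "1.894E-4", "2.56E+7", "45E+6", "96E2", "3.", "E2"]:
    print(lexeme, is_unsigned_number(lexeme))
```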
Transition Diagram
• A transition diagram is a special kind of flowchart for language analysis.
• In a transition diagram, the boxes of the flowchart are drawn as circles and called states.
• States are connected by arrows called edges.
• The label or weight on an edge indicates the input character that can appear after that state.
• The symbolic representation of a transition diagram uses separate symbols for a state, a transition, a start state and a final state.
[Figure: legend showing the symbols for a state, a transition, a start state and a final state]
Transition Diagram : Example
Transition Diagram for Relational Operators
[Figure: transition diagram with states 0 to 8 and edges labelled <, =, > and other]
Actions at the accepting states:
Return (relop, LE)
Return (relop, NE)
Return (relop, LT)
Return (relop, EQ)
Return (relop, GE)
Return (relop, GT)
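A transition diagram like this one is typically coded as a chain of decisions on the next character. The sketch below assumes the usual textbook layout (after `<`, a following `=` gives LE, `>` gives NE, and anything else gives LT with the extra character left unread); the function name and return shape are hypothetical.

```python
def next_relop(text, pos=0):
    """Recognise a relational operator starting at text[pos].

    Returns ((token, attribute), new_pos), or None if no relop starts here.
    """
    def peek(i):
        return text[i] if i < len(text) else ""   # "" plays the role of end of input

    c = peek(pos)
    if c == "<":
        nxt = peek(pos + 1)
        if nxt == "=":
            return ("relop", "LE"), pos + 2
        if nxt == ">":
            return ("relop", "NE"), pos + 2
        return ("relop", "LT"), pos + 1    # 'other': do not consume the lookahead
    if c == "=":
        return ("relop", "EQ"), pos + 1
    if c == ">":
        if peek(pos + 1) == "=":
            return ("relop", "GE"), pos + 2
        return ("relop", "GT"), pos + 1    # 'other': do not consume the lookahead
    return None

print(next_relop("<= t"))
print(next_relop("<a"))
```

The two "other" branches return a position that excludes the lookahead character, which corresponds to retracting the input pointer in the diagram.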
Transition Diagram : Example
Transition Diagram for Unsigned Numbers
[Figure: transition diagram for unsigned numbers, starting at the start state, with edges labelled digit, ., E, + or - and other]
Example lexemes: 3, 5280, 39.37, 1.894 E - 4, 2.56 E + 7, 45 E + 6, 96 E 2
Hard Coding and Automatic Generation of Lexical Analyser
• A lexical analyser identifies patterns in the input.
• In a hard-coded lexical analyser, a transition diagram is constructed for each pattern and implemented directly in code.
Example:
• An identifier in ‘C’ must start with a letter, and the remaining characters are letters or digits. To recognize this pattern, a hard-coded lexical analyser works with this transition diagram:
[Figure: Start leads to state 1; state 1 goes to state 2 on Letter; state 2 loops on Letter or digit; state 2 leads to final state 3]
• Lex and flex are lexical-analyser generators: they take regular expressions as input and generate a scanner that finds the input strings matching those expressions.
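Hard coding means writing the states of the diagram directly as program logic. The sketch below follows the identifier diagram with states numbered 1, 2 and 3 as on the slide; treating only ASCII letters and digits as valid, and leaving state 2 on any other character, are assumptions made for this illustration.

```python
def is_letter(ch):
    return ch.isascii() and ch.isalpha()

def is_letter_or_digit(ch):
    return ch.isascii() and ch.isalnum()

def match_identifier(text, pos=0):
    """Return the identifier lexeme starting at text[pos], or None."""
    state = 1
    i = pos
    while True:
        ch = text[i] if i < len(text) else ""   # "" marks end of input
        if state == 1:                 # start: need a letter
            if not is_letter(ch):
                return None
            state, i = 2, i + 1
        elif state == 2:               # loop on letter or digit
            if is_letter_or_digit(ch):
                i += 1
            else:
                state = 3              # any other character ends the lexeme
        else:                          # state 3: accept without consuming ch
            return text[pos:i]

print(match_identifier("s1 = 2"))
print(match_identifier("9x"))
```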
Finite Automata
Implementing Regular Expression:
• Regular expressions can be implemented using finite
automata.
• There are two kinds of finite automata:
• NFAs (nondeterministic finite automata)
• DFAs (deterministic finite automata)
Steps in implementing the lexical analyzer:
1. Lexical Specification
2. Regular Expression
3. NFA
4. DFA
5. Table-Driven DFA
Finite State Automaton
A finite state machine (FSM) has a finite set of states
• One marked as initial state
• One or more marked as final states
• States sometimes labeled or numbered
A set of transitions from one state to another
• Each labeled with a symbol from Σ (possible symbols), or ε
Operates by reading input symbols
• A transition can be taken if it is labelled with the current symbol
• An ε-transition can be taken at any time, without consuming input
Accept when a final state is reached and no input remains
Reject if no transition is possible, or if no input remains and the automaton is not in a final state (DFA)
Finite Automata
• The recognizer for tokens is called a finite automaton.
• For each input string, an FA answers “yes” (accept) or “no” (reject).
• An FA consists of:
  S : Set of states
  Σ : Set of input symbols
  move : A transition function
  s0 : Initial state
  F : Set of accepting (final) states
Finite Automata
• A finite automaton can be deterministic (DFA) or non-deterministic (NFA).
• Both deterministic and non-deterministic finite automata recognize regular sets.
• Deterministic: faster recognizer, but it may take more space
• Non-deterministic: slower, but it may take less space
• Deterministic automata are widely used in lexical analysers.
• Deterministic finite automata (DFA): From each state, exactly one edge leaves for each symbol, and no edge is labeled with ε.
• Nondeterministic finite automata (NFA): There are no restrictions on the edges leaving a state. There can be several edges with the same symbol as label, and some edges can be labeled with ε.
[Figure: an example DFA and an example NFA, each with states 1 to 4 and edges labelled a and b]
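The last step of the implementation pipeline, a table-driven DFA, can be sketched as a dictionary lookup per input symbol. The automaton below is a hypothetical example (not the one in the figure): it accepts binary strings ending with 0, the language of (0|1)*0.

```python
# Hypothetical DFA for binary strings ending with 0, i.e. (0|1)*0.
START = "q0"
ACCEPTING = {"q1"}
TABLE = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}

def dfa_accepts(text):
    """Run the DFA over text; accept if it ends in an accepting state."""
    state = START
    for ch in text:
        state = TABLE.get((state, ch))
        if state is None:          # no transition possible: reject
            return False
    return state in ACCEPTING

for s in ["0", "10", "110", "", "101"]:
    print(repr(s), dfa_accepts(s))
```

Because exactly one entry exists per (state, symbol) pair, each input character costs one lookup, which is the speed advantage of a DFA noted above.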
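Regular Expression to NFA (Thompson’s Rule)
Base Cases (For NFA)
[Figure: NFA fragments for the base cases]
Construction for s|t
[Figure: NFA fragment for s|t]
Construction for st
[Figure: NFA fragment for st]
Construction for s*
[Figure: NFA fragment for s*]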
Example (RE to NFA)
[Figures: NFAs built with Thompson’s rules for a*, ab and (a|b)]
[Figures: NFAs built with Thompson’s rules for a*b and b*ab]
[Figure: NFA built with Thompson’s rules for (c|d)*, with states 0 to 7]
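The constructions can be sketched as a small builder in which every fragment is a (start, accept) pair of state numbers. This is an illustrative variant, not the slides' exact drawings: state numbering differs from the figures, and concatenation links the two fragments with an ε-edge instead of merging states. The class and method names are hypothetical.

```python
class Thompson:
    """Builds NFA fragments; each fragment is a (start, accept) pair."""

    def __init__(self):
        self.edges = []     # (source, label, target); label None means ε
        self.count = 0      # number of states created so far

    def new_state(self):
        self.count += 1
        return self.count - 1

    def symbol(self, a):                    # base case: single symbol
        s, f = self.new_state(), self.new_state()
        self.edges.append((s, a, f))
        return s, f

    def union(self, x, y):                  # construction for s|t
        s, f = self.new_state(), self.new_state()
        for start, accept in (x, y):
            self.edges.append((s, None, start))
            self.edges.append((accept, None, f))
        return s, f

    def concat(self, x, y):                 # construction for st (ε-link variant)
        self.edges.append((x[1], None, y[0]))
        return x[0], y[1]

    def star(self, x):                      # construction for s*
        s, f = self.new_state(), self.new_state()
        self.edges += [(s, None, x[0]), (x[1], None, f),
                       (x[1], None, x[0]), (s, None, f)]
        return s, f

    def closure(self, states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in self.edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def accepts(self, frag, text):
        current = self.closure({frag[0]})
        for ch in text:
            moved = {dst for src, label, dst in self.edges
                     if src in current and label == ch}
            current = self.closure(moved)
        return frag[1] in current

t = Thompson()
cd_star = t.star(t.union(t.symbol("c"), t.symbol("d")))   # (c|d)*
print(t.count, t.accepts(cd_star, "cdc"), t.accepts(cd_star, "ca"))
```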
Exercise
Construct an NFA using Thompson’s rules for each regular expression:
• abab
• a*|b*
• abb(a)*ba
• (ab + b)*ba
• (a + b)*aa(a+b)
NFA to DFA conversion (Using subset construction method)
Subset Construction Algorithm
• Input : NFA (N)
• Output : DFA (D) accepting the same language
• Method : Apply the algorithm to build the transition table Dtran for the DFA.
Operations to perform:
Operation        Description
ε-closure(s)     Set of NFA states reachable from NFA state s on ε-transitions alone.
ε-closure(T)     Set of NFA states reachable from some NFA state s in T on ε-transitions alone.
Move(T, a)       Set of NFA states to which there is a transition on input symbol a from some NFA state s in T.
Subset Construction Algorithm
Initially, ε-closure(s0) is the only state in Dstates and it is unmarked;
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U = ε-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        Dtran[T, a] = U
    end
end
Example : (a|b)*abb
[Figure: NFA for (a|b)*abb with states 0 to 10, start state 0 and accepting state 10]
ε-transitions: 0→1, 0→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7
Transitions on a: 2→3, 7→8
Transitions on b: 4→5, 8→9, 9→10
Convert NFA to DFA : (a|b)*abb
ε-closure(0) = {0,1,2,4,7} → Let A
Move(A,a) = {3,8}
ε-closure(Move(A,a)) = {1,2,3,4,6,7,8} → Let B
Move(A,b) = {5}
ε-closure(Move(A,b)) = {1,2,4,5,6,7} → Let C
Move(B,a) = {3,8}
ε-closure(Move(B,a)) = {1,2,3,4,6,7,8} → B
Move(B,b) = {5,9}
ε-closure(Move(B,b)) = {1,2,4,5,6,7,9} → Let D
Move(C,a) = {3,8}
ε-closure(Move(C,a)) = {1,2,3,4,6,7,8} → B
Move(C,b) = {5}
ε-closure(Move(C,b)) = {1,2,4,5,6,7} → C
Move(D,a) = {3,8}
ε-closure(Move(D,a)) = {1,2,3,4,6,7,8} → B
Move(D,b) = {5,10}
ε-closure(Move(D,b)) = {1,2,4,5,6,7,10} → Let E
Move(E,a) = {3,8}
ε-closure(Move(E,a)) = {1,2,3,4,6,7,8} → B
Move(E,b) = {5}
ε-closure(Move(E,b)) = {1,2,4,5,6,7} → C
Transition table Dtran:
State                   a   b
A = {0,1,2,4,7}         B   C
B = {1,2,3,4,6,7,8}     B   D
C = {1,2,4,5,6,7}       B   C
D = {1,2,4,5,6,7,9}     B   E
E = {1,2,4,5,6,7,10}    B   C
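The worked example can be reproduced with a short program. The NFA edges below are an assumption read off the closures and moves listed above (ε-edges 0→1, 0→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7; a-edges 2→3, 7→8; b-edges 4→5, 8→9, 9→10); the function names are hypothetical.

```python
EPS = None   # label used for ε-transitions

# NFA for (a|b)*abb: state -> list of (label, target)
NFA = {
    0: [(EPS, 1), (EPS, 7)], 1: [(EPS, 2), (EPS, 4)],
    2: [("a", 3)],           3: [(EPS, 6)],
    4: [("b", 5)],           5: [(EPS, 6)],
    6: [(EPS, 1), (EPS, 7)], 7: [("a", 8)],
    8: [("b", 9)],           9: [("b", 10)],
    10: [],
}

def eps_closure(states):
    """NFA states reachable from 'states' on ε-transitions alone."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, t in NFA[s]:
            if label is EPS and t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, a):
    """NFA states reachable from some state in 'states' on symbol a."""
    return {t for s in states for label, t in NFA[s] if label == a}

def subset_construction(start, alphabet):
    start_set = eps_closure({start})
    dstates, unmarked, dtran = [start_set], [start_set], {}
    while unmarked:
        T = unmarked.pop(0)                 # mark T
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[T, a] = U
    return dstates, dtran

dstates, dtran = subset_construction(0, "ab")
names = dict(zip(dstates, "ABCDE"))
for T in dstates:
    print(names[T], sorted(T), names[dtran[T, "a"]], names[dtran[T, "b"]])
```

For this NFA every move is non-empty; in general an empty set U would appear as an extra dead state of the DFA.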
Exercise
• Convert the following regular expressions to DFA using the subset construction method:
  (0 + 1)*1(0 + 1)
  (0 + 1)*01*
DFA Optimization
• The procedure is also known as minimization of a DFA.
• Minimization/optimization refers to detecting those states of a DFA whose presence or absence does not affect the language accepted by the automaton.
• The states that can be eliminated from an automaton without affecting the language it accepts are:
  Unreachable or inaccessible states
  Dead states
  Non-distinguishable (indistinguishable or equivalent) states
• The partitioning algorithm helps in the DFA optimization process.
Partitioning Algorithm
1. Remove all states that are unreachable from the initial state via any sequence of transitions of the DFA.
2. Draw the transition table for all the remaining states.
3. Split the transition table into two tables T1 and T2:
   1. T1 contains all final states
   2. T2 contains all non-final states
4. Find similar rows in T1, that is, two states q and r such that for every input symbol a
   δ(q, a) = p
   δ(r, a) = p
   i.e. find two states that have the same next state on every input symbol (here a and b), remove one of them, and redirect its incoming transitions to the other.
5. Repeat step 4 until no similar rows remain in T1.
6. Repeat steps 4 and 5 for table T2.
7. Combine the reduced tables T1 and T2 to obtain the final transition table of the minimized DFA.
DFA Optimization
Transition table of the DFA for (a|b)*abb:
States   a   b
A        B   C
B        B   D
C        B   C
D        B   E
E        B   C
Partitioning:
{A, B, C, D, E}
Nonaccepting states {A, B, C, D}      Accepting states {E}
{A, B, C} {D}      (only D goes to E on b)
{A, C} {B}         (only B goes to D on b)
• Now no more splitting is possible.
• If we choose A as the representative for group {A, C}, we obtain the reduced transition table.
Optimized Transition Table
States   a   b
A        B   A
B        B   D
D        B   E
E        B   A
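The splitting shown above can be sketched as partition refinement: two states stay in the same group only while their transitions lead to the same groups on every input symbol. This is a standard formulation rather than the table-based procedure from the previous slides; the names `DTRAN`, `ACCEPTING` and `minimize` are hypothetical.

```python
# DFA for (a|b)*abb obtained by subset construction.
DTRAN = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}

def minimize(dtran, accepting, alphabet):
    """Return the groups of equivalent states as a list of sets."""
    partition = [g for g in (set(dtran) - accepting, set(accepting)) if g]
    while True:
        def group_of(state):
            return next(i for i, g in enumerate(partition) if state in g)

        refined = []
        for group in partition:
            buckets = {}
            for s in sorted(group):
                # States are kept together only if they agree on every symbol.
                key = tuple(group_of(dtran[s][a]) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):   # no group was split
            return refined
        partition = refined

print(sorted(sorted(g) for g in minimize(DTRAN, ACCEPTING, "ab")))
```

The result groups A with C, matching the reduced table in which A represents {A, C}.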
Conversion from regular expression to DFA
Functions computed from syntax tree
• nullable(n): True if the sub-expression rooted at n can generate the empty string ε. It is true for every * node and every leaf labeled ε, and false for a leaf labeled with a position.
• firstpos(n): Set of positions that can match the first symbol of a string generated by the sub-expression rooted at n.
• lastpos(n): Set of positions that can match the last symbol of a string generated by the sub-expression rooted at n.
• followpos(i): Set of positions that can immediately follow position i in a string generated by the given regular expression.
Rules to compute nullable, firstpos, lastpos
Node n                      nullable(n)                      firstpos(n)                                                             lastpos(n)
A leaf labeled by ε         true                             ∅                                                                       ∅
A leaf with position i      false                            {i}                                                                     {i}
An or-node n = c1 | c2      nullable(c1) or nullable(c2)     firstpos(c1) ∪ firstpos(c2)                                             lastpos(c1) ∪ lastpos(c2)
A cat-node n = c1 . c2      nullable(c1) and nullable(c2)    if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)      if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
A star-node n = c1*         true                             firstpos(c1)                                                            lastpos(c1)
Computation of followpos
One position of a regular expression can follow another in the following ways:
• If n is a cat node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
  In other words: for a cat node, for each position i in lastpos of its left child, the firstpos of its right child is in followpos(i).
• If n is a star node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
  In other words: for a star node, the firstpos of that node is in followpos of all positions in lastpos of that node.
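These rules can be applied in a single bottom-up pass over the syntax tree. The sketch below does so for the augmented expression (a|b)*abb#, with leaf positions 1 to 6 numbered left to right; the nested-tuple tree encoding and the function names are assumptions made for illustration, and ε-leaves are omitted because the example has none.

```python
def leaf(symbol, position):
    return ("leaf", symbol, position)

# Syntax tree of (a|b)*abb# with positions a=1, b=2, a=3, b=4, b=5, #=6.
TREE = ("cat", ("cat", ("cat", ("cat",
        ("star", ("or", leaf("a", 1), leaf("b", 2))),
        leaf("a", 3)), leaf("b", 4)), leaf("b", 5)), leaf("#", 6))

def analyse(node, followpos):
    """Return (nullable, firstpos, lastpos) of node and fill in followpos."""
    kind = node[0]
    if kind == "leaf":
        followpos.setdefault(node[2], set())
        return False, {node[2]}, {node[2]}
    if kind == "star":
        _, first, last = analyse(node[1], followpos)
        for i in last:                      # star rule
            followpos[i] |= first
        return True, first, last
    n1, f1, l1 = analyse(node[1], followpos)
    n2, f2, l2 = analyse(node[2], followpos)
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    # cat node
    for i in l1:                            # cat rule
        followpos[i] |= f2
    first = (f1 | f2) if n1 else f1
    last = (l1 | l2) if n2 else l2
    return n1 and n2, first, last

followpos = {}
nullable, firstpos, lastpos = analyse(TREE, followpos)
print(nullable, sorted(firstpos), sorted(lastpos))
for position in sorted(followpos):
    print(position, sorted(followpos[position]))
```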
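Conversion from regular expression to DFA
(a|b)*abb
Step 1: Construct Syntax Tree
[Figure: syntax tree for the augmented expression (a|b)*abb#. The leaves are a (position 1), b (position 2), a (position 3), b (position 4), b (position 5) and # (position 6). Leaves 1 and 2 are joined by an | node under a * node; the result is concatenated (.) in turn with leaves 3, 4, 5 and 6.]
Step 2: Nullable node
Here, * is the only nullable node.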
Conversion from regular expression to DFA
Step 3: Calculate firstpos and lastpos
Rules used:
A leaf with position i: firstpos = lastpos = {i}
| node with children c1, c2: firstpos(c1) ∪ firstpos(c2); lastpos(c1) ∪ lastpos(c2)
* node with child c1: firstpos(c1); lastpos(c1)
. node with children c1, c2: firstpos is firstpos(c1) ∪ firstpos(c2) if nullable(c1), else firstpos(c1); lastpos is lastpos(c1) ∪ lastpos(c2) if nullable(c2), else lastpos(c2)
Node                          firstpos    lastpos
leaf a (position 1)           {1}         {1}
leaf b (position 2)           {2}         {2}
| node                        {1,2}       {1,2}
* node                        {1,2}       {1,2}
leaf a (position 3)           {3}         {3}
. node over * and 3           {1,2,3}     {3}
leaf b (position 4)           {4}         {4}
. node over the above and 4   {1,2,3}     {4}
leaf b (position 5)           {5}         {5}
. node over the above and 5   {1,2,3}     {5}
leaf # (position 6)           {6}         {6}
. node at the root            {1,2,3}     {6}
Conversion from regular expression to DFA
Step 4: Calculate followpos
Root . node (c2 is leaf 6): i = lastpos(c1) = {5}, firstpos(c2) = {6}, so followpos(5) = {6}
. node with c2 = leaf 5: i = lastpos(c1) = {4}, firstpos(c2) = {5}, so followpos(4) = {5}
. node with c2 = leaf 4: i = lastpos(c1) = {3}, firstpos(c2) = {4}, so followpos(3) = {4}
. node with c2 = leaf 3: i = lastpos(c1) = {1,2}, firstpos(c2) = {3}, so 3 is in followpos(1) and in followpos(2)
* node n: i = lastpos(n) = {1,2}, firstpos(n) = {1,2}, so 1 and 2 are in followpos(1) and in followpos(2)
Position   followpos
1          1,2,3
2          1,2,3
3          4
4          5
5          6
Construct DFA
Position   followpos
1          1,2,3
2          1,2,3
3          4
4          5
5          6
Positions 1 and 3 are labeled a; positions 2, 4 and 5 are labeled b; position 6 is the end marker #.
Initial state = firstpos of root = {1,2,3} ----- A
State A
δ({1,2,3}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3}, b) = followpos(2) = {1,2,3} ----- A
State B
δ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,4}, b) = followpos(2) ∪ followpos(4) = {1,2,3} ∪ {5} = {1,2,3,5} ----- C
State C
δ({1,2,3,5}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,5}, b) = followpos(2) ∪ followpos(5) = {1,2,3} ∪ {6} = {1,2,3,6} ----- D
State D
δ({1,2,3,6}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,6}, b) = followpos(2) = {1,2,3} ----- A
States           a   b
A = {1,2,3}      B   A
B = {1,2,3,4}    B   C
C = {1,2,3,5}    B   D
D = {1,2,3,6}    B   A
[Figure: DFA with states A, B, C and D and the transitions listed in the table]
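The state-by-state calculation can be automated once followpos is known. The sketch below hard-codes the followpos table and the symbol at each position from the example; the names `FOLLOWPOS`, `SYMBOL_AT` and `build_dfa` are hypothetical, and a state is treated as accepting when it contains position 6, the end marker #.

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
START = frozenset({1, 2, 3})        # firstpos of the root

def build_dfa(start, alphabet):
    """Return (list of DFA states, transition dictionary)."""
    states, work, dtran = [start], [start], {}
    while work:
        S = work.pop(0)
        for a in alphabet:
            # Union of followpos(p) for every position p in S labeled a.
            U = frozenset(q for p in S if SYMBOL_AT[p] == a for q in FOLLOWPOS[p])
            if not U:
                continue            # no transition on a from S
            if U not in states:
                states.append(U)
                work.append(U)
            dtran[S, a] = U
    return states, dtran

states, dtran = build_dfa(START, "ab")
names = dict(zip(states, "ABCD"))
for S in states:
    print(names[S], sorted(S), names[dtran[S, "a"]], names[dtran[S, "b"]],
          "accepting" if 6 in S else "")
```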
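Note: In the DFA obtained by subset construction, state E contains state 10, which is the accepting state of the NFA, so E is an accepting state. In the DFA built directly from the syntax tree, state D = {1,2,3,6} contains position 6 (the end marker #), so D is the accepting state.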
Thanks
Prof. Happy Chapla

Chapter2CDpdf__2021_11_26_09_19_08.pdf

  • 1. Department of CE Prof. Happy Chapla Unit no : 2 Lexical Analyser LexicalAnalyser CD:COMPILER DESIGN
  • 2. Outline : Introduction to Lexical Analyser Tokens, Lexemes, and Patterns Specification of Tokens Regular expression and Regular Definition Transition Diagram Hard coding and automatic generation of lexical analysers Finite Automata Regular expression to NFA using Thompson’s rule NFA to DFA conversion using subset construction method DFA Optimization Regular expression to DFA conversion Department of CE Unit no : 2 Lexical Analyser Prof. Happy Chapla
  • 3. Role of Lexical Analyser  Remove comments and white spaces in the form of blanks, tabs, and newline characters (aka scanning)  Macros expansion  Read input characters from the source program  Group them into lexemes  Produce as output a sequence of tokens  Interact with the symbol table  Correlate error messages generated by the compiler with the source program
  • 4. Scanner – Parser Interaction  After receiving a “Get next token” command from parser, the lexical analyzer reads the input character until it can identify the next token.
  • 5. Why separating Lexical and Syntactic?  Simplicity of design  Improved compiler efficiency  Allows us to use specialized technique for lexer, not suitable for parser  Higher portability  Input-device-specific peculiarities restricted to lexer
  • 6. Tokens, Lexemes and Patterns  Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.  Pattern: A set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token.  Token: Token is a sequence of characters that can be treated as a single logical entity. Typical tokens are, 1) Identifiers 2) keywords 3) operators 4) special symbols 5)constants
  • 7. Example Token lexeme pattern else else characters e, l, s, e if if characters i, f comparision <=, < >, >=, > < or <= or = or < > or >= id pi, s1, Mj5 letter followed by letters & digit num 3.14, 0, 3.09e17 any numeric constant literal "core" any character b/w “and “except"
  • 8. Token Classes We have different token classes are possible:  One token per keyword  Tokens for the operators  One token representing all identifiers  Tokens representing constants (e.g. numbers)  Tokens for punctuation symbols
  • 9. Example In C program, the variable declaration line as: int value = 100;  int (keyword)  value(identifier)  = (operator)  100 (constant)  ; (symbol) Exercise: if(y <= t) y = y – 3;
  • 10. Example Total = Ans + 30  Tokens:  Total : Identifier 1  = : Operator 1  Ans : Identifier 2  + : Operator 2  30 : Constant 1  Lexems:  Lexems of identifiers : Total, Ans  Lexems of operators : =, +  Lexems of constant : 30
  • 11. Dealing with errors How lexical analyser deals with errors?  Lexical analyser unable to proceed: no pattern matches  Panic mode recovery: delete successive characters from remaining input until token found  Insert missing character  Delete a character  Replace character by another  Transpose two adjacent characters
  • 13. Specification of tokens  There are 3 specifications of tokens:  Strings  Language  Regular expression  Strings and Languages:  An alphabet or character class is a finite set of symbols.  A string over an alphabet is a finite sequence of symbols drawn from that alphabet.  A language is any countable set of strings over some fixed alphabet.
  • 14. Operations of Strings  A prefix of string s is any string obtained by removing zero or more symbols from the end of string s. For example, ban is a prefix of banana.  A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana is a suffix of banana.  A substring of s is obtained by deleting any prefix and any suffix from s. For example, nan is a substring of banana.  The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively of s that are not ε or not equal to s itself.  A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana.
  • 15. Exercise  Write prefix, suffix, substring, proper prefix, proper suffix and subsequence for following strings:  Example  Revolution
  • 16. Operations of Languages The following are the operations that can be applied to languages: (Applying these operations on L and S)  Union (L U S) : {𝑡 | 𝑡 𝑖𝑠 𝑖𝑛 𝐿 𝑜𝑟 𝑡 𝑖𝑠 𝑖𝑛 𝑆 }  Concatenation (LS) : {𝑡𝑧 | 𝑡 𝑖𝑠 𝑖𝑛 𝐿 𝑎𝑛𝑑 𝑧 𝑖𝑠 𝑖𝑛 𝑆 }  Kleene Closure (L*) : 𝐿 ∗ 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 “𝑧𝑒𝑟𝑜 𝑜𝑟 𝑚𝑜𝑟𝑒 𝑐𝑜𝑛𝑐𝑎𝑡𝑒𝑛𝑎𝑡𝑖𝑜𝑛 𝑜𝑓” 𝐿.  Positive Closure (L+) : 𝐿 + 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 “𝑜𝑛𝑒 𝑜𝑟 𝑚𝑜𝑟𝑒 𝑐𝑜𝑛𝑐𝑎𝑡𝑒𝑛𝑎𝑡𝑖𝑜𝑛 𝑜𝑓” 𝐿.
  • 17. Example Let L = {0, 1} and S = {a, b, c}  Union : L ∪ S = {0, 1, a, b, c}  Concatenation : L . S = {0a, 1a, 0b, 1b, 0c, 1c}  Kleene Closure : L* = {ε, 0, 1, 00,……}  Positive Closure : L+ = {0, 1, 00,……}
  • 18. Language: A language is a set of strings. Example: Even numbers Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (All possible symbols for given language) L = {0, 2, 4, 6, 8, 10, 12, 14, … } (Elements present in language) Example: Variable name in C language Σ = ASCII characters L = {a, b, c, …, A, B, C, …, _, aa, ab, … } Regular Language: • A subset of all languages that can be defined by regular expressions. • Any character is a regular expression matching itself. (a is a regular expression for character a) • ε is a regular expression matching the empty string.
  • 19. Operations on Regular Language: If R1 and R2 are two regular expressions, then: • R1R2 is a regular expression matching the concatenation of the languages. • R1 | R2: is a regular expression matching the disjunction of the languages. • R1*: is a regular expression matching the Kleene closure of the language (0 or more occurrences ). • (R): is a regular expression matching R. Example (Let Σ = {a, b}) • The regular expression a|b denotes the language _________. Ans: {a, b} • (a|b)(a|b) denotes ___________ Ans: {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ. Another regular expression for the same language is aa|ab|ba|bb. • a* denotes the language consisting of ______________ Ans: all strings of zero or more a's, that is, {ε,a,aa,aaa,...}
  • 20. Example (Let Σ = {0, 1}) • (0|1)* denotes the set of all strings _____________ Ans: Strings containing zero or more instances of 0 or 1, that is, all strings of 0's and 1's: {e, 0, 1, 00, 01, 10, 11,...}. Another regular expression for the same language is (0*1*)* • 0|0*1 denotes the language __________ Ans: {0, 1, 01, 001, 0001,...}, that is, the string 0 and all strings consisting of zero or more 0's and ending with 1.
  • 22. Regular Expression Regular Expression is a sequence of characters that define a pattern. Notations:  One or more instances : +  Zero or more instances: *  Zero or one instance: ?  Alphabets : Σ  Regular expression r and regular language for it is L(r)  (abc): “abc” occurred together in a regular expression  [abc]: a, b, c any one of these or all of these are present in regular expression
  • 23. Regular Expression Rules to define Regular Expression:  Φ is a regular expression for empty set.  ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole member is the empty string.  If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression, and L(a) = {a}.  Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,  (r)|(s) is a regular expression denoting the language L(r) U L(s).  (r)(s) is a regular expression denoting the language L(r)L(s). c) (r)* is a regular expression denoting (L(r))*.  (r) is a regular expression denoting L(r).
  • 24. Regular Expression  L = Zero or more occurences of a = a*  a* = {ε, a, aa, aaa, aaaa,……} (Infinite elements)  L = One or more occurences of a = a+  a+ = {a, aa, aaa, aaaa,……} (Infinite elements)
  • 25. Regular Expression Algebraic or identity rules of regular expressions: (R and S are two regular expressions)  R + S = S + R + or | is commutative  (R + S) + T = R + (S + T) + or | is associative  (RS)T = R(ST) concatenation is associative  (R + S)T = RT + ST concatenation distributes over |  ϵ.R = R.ϵ = R ϵ is identity element for concatenation  R* = (ϵ + R)* relation between * and ϵ  R** = R* * is idempotent  RR* = R+  R? = (R | ϵ) 0 or 1 occurrence  [a – z] (a|b|…|z) 1 character from the given range  [acdgj] (a|c|d|g|) 1 of the given characters
  • 26. Exercise: (For given RE define its meaning or list down which are the valid string for that RE) • [012]+ Ans: (0|1|2)+ String can have one or more instances of 0 or 1 or 2. All strings of 0’s and 1’s and 2’s: {0, 1, 2, 00, 01, 02, 10, 11, 12,….} • [0 – 9]+ Ans: All possible combinations of strings containing elements from 0 to 9. Atleast one instance is required. • [1 – 9][0 – 9]+ Ans: Starting symbol must be any number from 1 to 9. After that it can have any number of instances of any number from 0 to 9. Resultant set will have strings of length at least 2. • [a – z A – Z][a – z A – Z 0 – 9]* Ans: Starting with alphabets and string of length 1 {a, z, A, C} are also valid strings here.
  • 27. Precedence and Associativity  The unary operator * has highest precedence and is left associative.  Concatenation has second highest precedence and is left associative.  | has lowest precedence and is left associative.
  • 28. Examples (Regular Expression) Language String Regular Expression 0 or 1 0, 1 0 | 1 1 or 10 or 111 1, 10, 111 1 | 10 | 111 Strings having one or more 0 0, 00, 000, 0000,…… 0+ All possible binary strings over Σ = {0, 1} 0, 1, 00, 01, 10, 11, 000,…… (0 | 1)+ All possible strings of length 3 over Σ = {a, b, c} aaa, aba, abc, abb,…… (a|b|c) (a|b|c) (a|b|c)
  • 29. Language String Regular Expression One or more occurrences of 0 or 1 or both 0, 1, 00, 01, 10, 11, 111, 101,…… (0 | 1)+ Binary string ending with 0 0, 10, 100, 110, 00, 010,…… (0 | 1)* 0 Binary string starting with 1 1, 10, 100, 110, 101, 1101,…… 1 (0 | 1)* Binary string starting with 1 and ending with 0 10, 110, 100, 110, 1100, 1110, 1000,…… 1 (1|0)* 0 String starting and ending with same character for Σ = {0, 1} 00, 11, 010, 000, 101, 1101, 0110, 1011,…… 1 (1|0)* 1 or 0 (1|0)* 0 String ending with 01 for Σ = {0, 1} 01, 101, 001, 1001, 1101,…… (0|1)*01
  • 30. Language String Regular Expression Language consisting of exactly two 0’s for Σ = {0, 1} 00, 010, 000, 001, 0101,…… 1*01*01* All binary strings with length at least 3 for Σ = {0, 1} 000, 010, 110, 1111, 1011,…… (0|1) (0|1) (0|1) (0|1)* All binary strings where 2nd symbol from starting is 0 for Σ = {0, 1} 00, 10, 101, 100,…… (0|1)0(0|1)* Any number of a’s followed by any number of b’s followed by any number of c’s ε, abc, aaabc, abbbbbc, abccc, ab, accc,…… a*b*c*
  • 31. Exercise Write regular expression for language specified Over Σ = {0, 1}  Strings having even length Ans: RE: ((0|1)(0|1))*  String containing exactly three 1’s Ans: RE: 0*10*10*10*  String starting with 0 and having odd length Ans: RE: 0((0|1)(0|1))*  String starting or ending with 01 or 111 Ans: RE: (01|111)(0|1)* | (0|1)*(01|111)
  • 32. Regular Definition  For notational convenience, we may give names to certain regular expressions and use those names in subsequent expressions, as if the names were themselves symbols.  These names are known as regular definition.  Regular definition is a sequence of definitions of the form: 𝑑1 → 𝑟1 𝑑2 → 𝑟2 …… 𝑑𝑛 → 𝑟𝑛 Where 𝑑𝑖 is a distinct name & 𝑟𝑖 is a regular expression.
  • 33. Example Regular definition for identifier  letter → A|B|C|………..|Z|a|b|………..|z  digit → 0|1|…….|9  id → letter (letter | digit)* Regular Definition for Even Numbers  (+|-|ε) (0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)  Sign → + | -  OptSign → Sign | ε //In other words (Sign ?)  Digit → [0 – 9] //In other words (0 | 1 | ….. | 9)  EvenDigit → [02468] //In other words (0 | 2 | 4 | 6 | 8)  EvenNumber → OptSign Digit* EvenDigit
  • 34. Example Regular Definition for Unsigned Numbers  digit → 0 | 1 | ..... | 9  digits → digit digit*  optionalFraction → .digits | ε  optionalExponent →( E ( + | - | ε ) digits ) | ε  number → digits optionalFraction optionalExponent This can be simplified as:  digit → [0-9]  digits → digit+  number → digits (. digits)? ( E [+ | -]? digits )?
  • 36. Transition Diagram  Transition diagram is a special kind of flowchart for language analysis.  In transition diagram the boxes of flowchart are drawn as circle and called as states.  States are connected by arrows called as edges.  The label or weight on edge indicates the input character that can appear after that state
  • 37. Transition Diagram  Symbolized representation of transition diagram uses: is a state is a transition is a start state is a final state
  • 38. < 0 6 1 2 3 4 5 8 7 = other > = other = > Return (relop,LE) Return (relop,NE) Return (relop,LT) Return (relop,GE) Return (relop,GT) Return (relop,EQ) Transition Diagram : Example Transition Diagram for Relational Operators 8 7 4 3 2 5
  • 39. 1 2 8 other digit 3 4 5 6 7 digit digit digit +or - digit digit E . start 3 5280 39.37 1.894 E - 4 2.56 E + 7 45 E + 6 96 E 2 Transition Diagram : Example Transition Diagram for Unsigned Numbers E digit 8
  • 41.  Lexical analyser helps in identifying the pattern from the input.  Transition diagram is constructed to recognize the patterns. It is known as hard coding lexical analyzer. Example:  To represent identifier in ‘C’, the first character must be letter and other characters are either letter or digits. To recognize this pattern, hard coding lexical analyzer works with this transition diagram:  Lex and flex are compiler tools which takes regular expression as an input and finds out the pattern, matching to that expression. 2 3 Start Letter or digit Letter 1
  • 43. Implementing Regular Expression: • Regular expressions can be implemented using finite automata. • There are two kinds of finite automata: • NFAs (nondeterministic finite automata) • DFAs (deterministic finite automata) The step of implementing the lexical analyzer 1. Lexical Specification 2. Regular Expression 3. NFA 4. DFA 5. Table-Driven DFA Finite State Automaton
  • 44. Finite State Automaton A finite set of states present in FSM • One marked as initial state • One or more marked as final states • States sometimes labeled or numbered A set of transitions from one state to another • Each labeled with symbol from Σ (possible symbols), or ε Operate by reading input symbols • Transition can be taken if labelled with current symbol • ε-transition can be taken at any time Accept when final state reached & no more input Reject if no transition possible, or no more input and not in final state (DFA)
  • 45. Finite Automata  We call the recognizer of the tokens as a finite automaton.  FA results in “yes” or “no” based on each input string.  FA consist of:  S : Set of states  𝜮 : Set of input symbol  move : A transition function  S0 : Initial state  F : Accepting state (Final state)
  • 46. Finite Automata  A finite automaton can be: deterministic (DFA) or non-deterministic (NFA)  Both deterministic and non-deterministic finite automaton recognize regular sets.  Deterministic – faster recognizer, but it may take more space  Non-deterministic – slower, but it may take less space  Deterministic automatons are widely used lexical analysers.
  • 47.  Deterministic finite automata (DFA): From each state exactly one edge leaving out (for each symbol).  Nondeterministic finite automata (NFA): There are no restrictions on the edges leaving a state. There can be several with the same symbol as label and some edges can be labeled with 𝜖. 1 2 3 4 a b b a b 1 2 3 4 a b b a a a b DFA NFA b
  • 48. Regular Expression to NFA (Thompson’s Rule)
  • 53. Example (RE to NFA) 1 4 𝜖 𝜖 𝜖 𝜖 2 3 𝑎 1 2 3 a b 1 2 5 3 4 6 a b 𝜖 𝜖 𝜖 𝜖 a* ab (a|b)
  • 54. Example (RE to NFA) 1 4 𝜖 𝜖 𝜖 𝜖 2 3 𝑎 1 4 𝜖 𝜖 𝜖 𝜖 2 3 𝑏 6 5 𝑎 𝑏 5 𝑏 a*b b*ab
  • 55. Example (RE to NFA) 1 2 5 3 6 𝜖 𝜖 𝜖 𝜖 4 d c 𝜖 𝜖 𝜖 0 7 𝜖 (c|d)*
  • 56. Exercise  abab  a*|b*  abb(a)*ba  (ab + b)*ba  (a + b)*aa(a+b)
  • 57. NFA to DFA conversion (Using subset construction method)
  • 58. Subset Construction Algorithm  Input : NFA (N)  Output : DFA (D) (Accepting the same language)  Method : Apply Algorithm, Make Tansition Table, Dtran for DFA . Operations to perform: Operation Description Є – closure(s) Set of NFA States reachable from NFA State s on Є – transition alone. Є – closure(T) Set of NFA States reachable from some NFA State s in T on Є –transition alone. Move (T,a) Set of NFA states to which there is a transition on input symbol a from some NFA state s in T.
  • 59. Subset Construction Algorithm Initially Є –closure (s0) be the only state in Dstates and it is unmarked; While there is unmarked states in T in Dstates do begin Mark T; for each input symbol a do begin U = Є –closure (move (T,a)); If U is not in Dstates then add U as unmarked state to Dstates; Dtran [ T, a ] = U end end
  • 60. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є Є closure (0) = {0,1,2,4,7}  A
  • 61. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є Є closure (0) = {0,1,2,4,7}  A State a b A = {0,1,2,4,7}
  • 62. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,a) = {3,8} State a b A = {0,1,2,4,7}
  • 63. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,a) = {3,8} State a b A = {0,1,2,4,7}
  • 64. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,a) = {3,8} Є closure (Move(A,a)) = {3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B State a b A = {0,1,2,4,7} B
  • 65. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,b) = {5} State a b A = {0,1,2,4,7} B
  • 66. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,b) = {5} Є closure (Move(A,b)) = {5,6,7,1,2,4} = {1,2,4,5,6,7} State a b A = {0,1,2,4,7} B
  • 67. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,b) = {5} Є closure (Move(A,b)) = {5,6,7,1,2,4} = {1,2,4,5,6,7}  C State a b A = {0,1,2,4,7} B C
  • 68. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,a) = {3,8} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8}
  • 69. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,a) = {3,8} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8}
  • 70. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,a) = {3,8} Є Closure (Move (B,a)) = { 3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8}
  • 71. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,b) = {5,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B
  • 72. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,b) = {5,9} Є Closure (Move (B,b)) = {5,6,7,1,2,4,9} = {1,2,4,5,6,7,9}  D State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D
  • 73. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,a) = {3,8} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7}
  • 74. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,a) = {3,8} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7}
  • 75. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,a) = {3,8} Є Closure (Move (c,a)) = {3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7}
  • 76. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,b) = {5} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B
  • 77. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,b) = {5} Є Closure (Move (C,b)) = {5,6,7,1,2,4} = {1,2,4,5,6,7}  C State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B
  • 78. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,b) = {5} Є Closure (Move (C,b)) = {5,6,7,1,2,4} = {1,2,4,5,6,7}  C State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C
  • 79. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} Move (D,a) = {3,8}
  • 80. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} Move (D,a) = {3,8} Є Closure (Move (D,a)) = {3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B
  • 81. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B Move (D,b) = {5,10}
  • 82. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B Move (D,b) = {5,10} Є Closure (Move (D,b)) = {5,6,7,1,2,4,10} = {1,2,4,5,6,7,10}  E
  • 83. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E Move (D,b) = {5,10} Є Closure (Move (D,b)) = {5,6,7,1,2,4,10} = {1,2,4,5,6,7,10}  E
  • 84. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} Move (E,a) = {3,8}
  • 85. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} Move (E,a) = {3,8} Є Closure (Move (E,a)) = {3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B
  • 86. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B Move (E,b) = {5}
  • 87. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B Move (E,b) = {5}
  • 88. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B C Move (E,b) = {5}
  • 89. Example : (a|b)*abb State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B C
  • 90. Convert NFA to DFA : (a|b)*abb Є –closure (0) = {0,1,2,4,7}  Let A Move (A,a) = {3,8} Є –closure ((A,a)) = {1,2,3,4,6,7,8}  Let B Move (A,b) = {5} Є –closure ((A,b)) = {1,2,4,5,6,7}  Let C
  • 91. Move (B,a) = {3,8} Є –closure ((B,a)) = {1,2,3,4,6,7,8}  Let B Move (B,b) = {5,9} Є –closure ((B,b)) = {1,2,4,5,6,7,9}  Let D Move (C,a) = {3,8} Є –closure ((C,a)) = {1,2,3,4,6,7,8}  Let B Move (C,b) = {5} Є –closure ((C,b)) = {1,2,4,5,6,7}  Let C Convert NFA to DFA : (a|b)*abb (Cont…)
  • 92. Move (D,a) = {3,8} Є –closure ((D,a)) = {1,2,3,4,6,7,8}  Let B Move (D,b) = {5,10} Є –closure ((D,b)) = {1,2,4,5,6,7,10}  Let E Move (E,a) = {3,8} Є –closure ((E,a)) = {1,2,3,4,6,7,8}  Let B Move (E,b) = {5} Є –closure ((E,b)) = {1,2,4,5,6,7}  Let C Convert NFA to DFA : (a|b)*abb (Cont…)
  • 93. State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B C Convert NFA to DFA : (a|b)*abb (Cont…)
  • 94. Example  Convert following regular expression to DFA using subset construction method:  (0 + 1)*1(0 + 1)  (0 + 1)*01*
  • 96. DFA Optimization  The procedure can also be known as minimization of DFA.  Minimization/optimization refers to the detection of those states of a DFA, whose presence or absence in a DFA does not affect the language accepted by the automata.  The states that can be eliminated from automata, without affecting the language accepted by automata, are:  Unreachable or inaccessible states  Dead states  Non-distinguishable or indistinguishable state or equivalent states.  Partitioning algorithm helps in DFA optimization process.
  • 97. Partitioning Algorithm 1. Remove all the states that are unreachable from initial state via any set of transition of DFA. 2. Draw the transition table for all pair of states. 3. Now, split the transition table into two tables T1 and T2. 1. T1 contains all final states 2. T2 contains non-final states 4. Find similar rows from T1 such that; 𝛿(q, a) = p 𝛿(r, a) = p i.e. find the two states which have same value of a and b and remove one of them
  • 98. Continued… 5. Repeat step 3 until we find no similar rows available in T1 6. Repeat step 3 and step 4 for table T2 also. 7. Now combine the reduced T1 and T2 tables. i.e. the final transition table of minimized DFA.
  • 99. DFA Optimization A B C B B D C B C D B E E B C States a b {𝐴, 𝐵, 𝐶, 𝐷, 𝐸} Nonaccepting States {𝐴, 𝐵, 𝐶, 𝐷} Accepting States {𝐸} {𝐴, 𝐵, 𝐶} {𝐷} {𝐴, 𝐶} {𝐵} • Now no more splitting is possible. • If we chose A as the representative for group (AC), then we obtain reduced transition table A B A B B D D B E E B A States a b Optimized TransitionTable
  • 101. Function computed from syntax tree  nullable (n): Is true for * node and node labeled with Ɛ. For other nodes it is false.  firstpos (n): Set of positions at node ti that corresponds to the first symbol of the sub-expression rooted at n.  lastpos (n): Set of positions at node ti that corresponds to the last symbol of the sub-expression rooted at n.  followpos (i): Set of positions that follows given position by matching the first or last symbol of a string generated by sub-expression of the given regular expression.
  • 102. Rules to compute nullable, firstpos, lastpos Node n nullable(n) firstpos(n) lastpos(n) A leaf labeled by  true ∅ ∅ A leaf with position 𝒊 false {𝑖} {𝑖} nullable(c1) or nullable(c2) firstpos(c1)  firstpos(c2) lastpos(c1)  lastpos(c2) | n c1 nullable(c1) and nullable(c2) if (nullable(c1)) thenfirstpos(c1)  firstpos(c2) else firstpos(c1) if (nullable(c2)) then lastpos(c1)  lastpos(c2) else lastpos(c2) n true firstpos(c1) lastpos(c1) ∗ n c2 . c1 c2 c1
  • 103. Computation of followpos The position of regular expression can follow another in the following ways:  If n is a cat node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).  For cat node, for each position i in lastpos of its left child, the firstpos of its right child will be in followpos(i).  If n is a star node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).  For star node, the firstpos of that node is in f ollowpos of all positions in lastpos of that node.
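  • 104. Conversion from regular expression to DFA
    Regular expression: (a|b)*abb, written with the end marker as (a|b)*abb#
    Step 1: Construct the syntax tree
      Leaf positions: a = 1, b = 2, a = 3, b = 4, b = 5, # = 6
      Tree structure (cat nodes written as .): ((((1 | 2)* . 3) . 4) . 5) . 6
    Step 2: Find the nullable nodes
      Here, * is the only nullable node.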
  • 105. Conversion from regular expression to DFA
    Step 3: Calculate firstpos
    Rules:
       A leaf with position i: firstpos(n) = {i}
       Or node n = c1 | c2: firstpos(c1) ∪ firstpos(c2)
       Star node n = c1*: firstpos(c1)
       Cat node n = c1 . c2: if (nullable(c1)) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
    Result:
      Node                     firstpos
      leaves 1, 2, 3, 4, 5, 6  {1}, {2}, {3}, {4}, {5}, {6}
      a | b                    {1,2}
      (a|b)*                   {1,2}
      (a|b)* . a               {1,2,3}
      (a|b)*a . b              {1,2,3}
      (a|b)*ab . b             {1,2,3}
      (a|b)*abb . # (root)     {1,2,3}
  • 106. Conversion from regular expression to DFA
    Step 3: Calculate lastpos
    Rules:
       A leaf with position i: lastpos(n) = {i}
       Or node n = c1 | c2: lastpos(c1) ∪ lastpos(c2)
       Star node n = c1*: lastpos(c1)
       Cat node n = c1 . c2: if (nullable(c2)) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
    Result:
      Node                     lastpos
      leaves 1, 2, 3, 4, 5, 6  {1}, {2}, {3}, {4}, {5}, {6}
      a | b                    {1,2}
      (a|b)*                   {1,2}
      (a|b)* . a               {3}
      (a|b)*a . b              {4}
      (a|b)*ab . b             {5}
      (a|b)*abb . # (root)     {6}
  • 107. Conversion from regular expression to DFA
    Step 4: Calculate followpos
    Cat node with c1 = (a|b)*abb (firstpos {1,2,3}, lastpos {5}) and c2 = # (firstpos {6}, lastpos {6}):
      i = lastpos(c1) = {5}
      firstpos(c2) = {6}
      followpos(5) = {6}
    Position   followpos
    5          6
  • 108. Conversion from regular expression to DFA
    Step 4: Calculate followpos
    Cat node with c1 = (a|b)*ab (firstpos {1,2,3}, lastpos {4}) and c2 = b at position 5 (firstpos {5}, lastpos {5}):
      i = lastpos(c1) = {4}
      firstpos(c2) = {5}
      followpos(4) = {5}
    Position   followpos
    5          6
    4          5
  • 109. Conversion from regular expression to DFA
    Step 4: Calculate followpos
    Cat node with c1 = (a|b)*a (firstpos {1,2,3}, lastpos {3}) and c2 = b at position 4 (firstpos {4}, lastpos {4}):
      i = lastpos(c1) = {3}
      firstpos(c2) = {4}
      followpos(3) = {4}
    Position   followpos
    5          6
    4          5
    3          4
  • 110. Conversion from regular expression to DFA
    Step 4: Calculate followpos
    Cat node with c1 = (a|b)* (firstpos {1,2}, lastpos {1,2}) and c2 = a at position 3 (firstpos {3}, lastpos {3}):
      i = lastpos(c1) = {1,2}
      firstpos(c2) = {3}
      followpos(1) = {3}
      followpos(2) = {3}
    Position   followpos
    5          6
    4          5
    3          4
    2          3
    1          3
  • 111. Conversion from regular expression to DFA
    Step 4: Calculate followpos
    Star node n = (a|b)* (firstpos {1,2}, lastpos {1,2}):
      i = lastpos(n) = {1,2}
      firstpos(n) = {1,2}
      Positions 1 and 2 are added to followpos(1) and to followpos(2), which already contain 3:
      followpos(1) = {1,2,3}
      followpos(2) = {1,2,3}
    Position   followpos
    5          6
    4          5
    3          4
    2          1,2,3
    1          1,2,3
  • 112. Construct DFA
    Position   followpos
    5          6
    4          5
    3          4
    2          1,2,3
    1          1,2,3
    Initial state = firstpos of root = {1,2,3} ----- A
    State A
      δ({1,2,3}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
      δ({1,2,3}, b) = followpos(2) = {1,2,3} ----- A
    States         a   b
    A={1,2,3}      B   A
    B={1,2,3,4}
  • 113. Construct DFA
    Position   followpos
    5          6
    4          5
    3          4
    2          1,2,3
    1          1,2,3
    State B
      δ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
      δ({1,2,3,4}, b) = followpos(2) ∪ followpos(4) = {1,2,3} ∪ {5} = {1,2,3,5} ----- C
    State C
      δ({1,2,3,5}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
      δ({1,2,3,5}, b) = followpos(2) ∪ followpos(5) = {1,2,3} ∪ {6} = {1,2,3,6} ----- D
    States         a   b
    A={1,2,3}      B   A
    B={1,2,3,4}    B   C
    C={1,2,3,5}    B   D
    D={1,2,3,6}
  • 114. Construct DFA
    Position   followpos
    5          6
    4          5
    3          4
    2          1,2,3
    1          1,2,3
    State D
      δ({1,2,3,6}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
      δ({1,2,3,6}, b) = followpos(2) = {1,2,3} ----- A
    States         a   b
    A={1,2,3}      B   A
    B={1,2,3,4}    B   C
    C={1,2,3,5}    B   D
    D={1,2,3,6}    B   A
    DFA: states A, B, C, D with start state A; the edges are the transitions in the table above.
  • 115. Construct DFA
    States         a   b
    A={1,2,3}      B   A
    B={1,2,3,4}    B   C
    C={1,2,3,5}    B   D
    D={1,2,3,6}    B   A
    DFA: states A, B, C, D with start state A; the edges are the transitions in the table above.
    Note: State D = {1,2,3,6} contains position 6, the position of the end marker #. So, state D is the accepting state.
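The state-by-state construction on slides 112 to 115 can be sketched as a worklist loop over sets of positions. This is an illustrative sketch only: the function name `build_dfa`, the `symbol_at` map from positions to input symbols, and the naming of states A, B, C, D in discovery order are assumptions for the example. The followpos table is hard-coded from the slides.

```python
from collections import deque


def build_dfa(start, followpos, symbol_at, alphabet):
    """Build DFA transitions whose states are sets of positions."""
    table = {}
    pending = deque([frozenset(start)])
    while pending:
        state = pending.popleft()
        if state in table:
            continue                      # already processed
        table[state] = {}
        for c in alphabet:
            # Union of followpos(p) over positions p in the state labeled c.
            target = frozenset().union(
                *(followpos[p] for p in state if symbol_at.get(p) == c)
            )
            if target:                    # skip the empty (dead) state
                table[state][c] = target
                pending.append(target)
    return table


# followpos table and position labels for (a|b)*abb#; position 6 is #.
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b"}

table = build_dfa({1, 2, 3}, followpos, symbol_at, "ab")

# Name states A, B, C, ... in the order they were discovered.
names = {s: chr(ord("A") + i) for i, s in enumerate(table)}
transitions = {names[s]: {c: names[t] for c, t in row.items()}
               for s, row in table.items()}
# A state is accepting if it contains the position of the end marker #.
accepting = [names[s] for s in table if 6 in s]

for name, row in transitions.items():
    print(name, row["a"], row["b"])
print("accepting:", accepting)
```

Run on the followpos table from the slides, this reproduces the four states A={1,2,3}, B={1,2,3,4}, C={1,2,3,5}, D={1,2,3,6} and their transitions, with D as the only accepting state.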