Regular Expression to Finite Automata

I am Archana R, Assistant Professor in the Department of Computer Science, SACWC.
I am here because I love to give presentations.
COMPILER DESIGN

REGULAR EXPRESSION TO FINITE AUTOMATA

Regular Expressions:
We use regular expressions to describe the tokens of a programming language.
• A regular expression is built up of simpler regular expressions (using defining rules).
• Each regular expression denotes a language.
• A language denoted by a regular expression is called a regular set.

Regular Expressions (Rules)
Regular expressions over alphabet Σ:

Reg. Expr        Language it denotes
(r1) | (r2)      L(r1) ∪ L(r2)
(r1) (r2)        L(r1) L(r2)
(r)*             (L(r))*
(r)              L(r)

Regular Expressions
• We may remove parentheses by using precedence rules.
– * highest
– concatenation next
– | lowest
• ab*|c means (a(b)*)|(c)
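The precedence rules can be checked with a small Python sketch. It assumes Python's `re` module as a stand-in for the formal notation: `re` is a backtracking engine with many extensions, but for `*`, concatenation and `|` it uses the same precedence, so the unparenthesized and fully parenthesized spellings should accept exactly the same strings.

```python
import re

implicit = "ab*|c"         # relies on precedence: * > concatenation > |
explicit = "(a(b)*)|(c)"   # the same expression, fully parenthesized

def denotes(pattern: str, s: str) -> bool:
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

for s in ["a", "abbb", "c", "abc", "ac", "b", ""]:
    # both spellings should agree on every sample
    print(f"{s!r:8} {denotes(implicit, s)} {denotes(explicit, s)}")
```

Note that "abc" is rejected: the expression means "a followed by any number of b's" or "c", never "ab" followed by "c".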

Example:
– Σ = {0,1}
– 0|1 => {0,1}
– (0|1)(0|1) => {00,01,10,11}
– 0* => {ε,0,00,000,0000,....}
– (0|1)* => all strings of 0s and 1s, including the empty string
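The languages in this example are infinite in general, but they can be listed up to a length bound. The sketch below (a hypothetical helper `language`, using Python's `re.fullmatch` as the membership test) enumerates every string over Σ = {0,1} up to a given length and keeps those the expression matches.

```python
import itertools
import re

SIGMA = "01"  # the alphabet from the example

def language(pattern: str, max_len: int) -> list[str]:
    """All strings over SIGMA of length <= max_len that pattern matches in full."""
    words = []
    for n in range(max_len + 1):
        for chars in itertools.product(SIGMA, repeat=n):
            w = "".join(chars)
            if re.fullmatch(pattern, w):
                words.append(w)
    return words

print(language("0|1", 3))          # ['0', '1']
print(language("(0|1)(0|1)", 3))   # ['00', '01', '10', '11']
print(language("0*", 3))           # ['', '0', '00', '000']
print(len(language("(0|1)*", 3)))  # 15 = 1 + 2 + 4 + 8
```

The empty string appears as `''` in the output for `0*`, matching the ε in the set above.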

REGULAR DEFINITION:
• Writing regular expressions for some languages can be difficult, because their regular expressions can be quite complex. In those cases, we may use regular definitions.
• We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.
• A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
.
.
dn → rn
where each di is a distinct name, and each ri is a regular expression over the symbols in Σ ∪ {d1,d2,...,di-1} (basic symbols and previously defined names).

Ex: Identifiers in Pascal
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit) *
– If we try to write the regular expression representing identifiers without using regular definitions, that regular expression will be complex:
(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) ) *
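Expanding a regular definition into one flat expression is mechanical: substitute each earlier name into later bodies, in order. The Python sketch below assumes Python `re` syntax, abbreviates A|...|z and 0|...|9 as the character classes `[A-Za-z]` and `[0-9]`, and uses a hypothetical `expand` helper that writes names as `{name}` (so bodies must not contain literal braces). A body that refers to a name not yet defined fails, which mirrors the rule that ri may only use d1,...,di-1.

```python
import re

# Each definition may only use names defined before it (d1 .. d_{i-1}).
definitions = [
    ("letter", "[A-Za-z]"),                 # shorthand for A|...|Z|a|...|z
    ("digit", "[0-9]"),                     # shorthand for 0|...|9
    ("id", "{letter}({letter}|{digit})*"),
]

def expand(defs):
    """Substitute earlier names into later definitions, in order."""
    expanded = {}
    for name, body in defs:
        # str.format raises KeyError if body mentions a name not yet defined
        expanded[name] = "(?:" + body.format(**expanded) + ")"
    return expanded

patterns = expand(definitions)
print(patterns["id"])  # the long, hard-to-read expression the text warns about
print(bool(re.fullmatch(patterns["id"], "count2")), bool(re.fullmatch(patterns["id"], "2count")))
```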
Ex: Unsigned numbers in Pascal
digit → 0 | 1 | ... | 9
digits → digit +
opt-fraction → ( . digits ) ?
opt-exponent → ( E (+|-)? digits )?
unsigned-num → digits opt-fraction opt-exponent
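The same definitions translate almost line for line into Python `re` syntax. In this sketch `(+|-)?` is written as the equivalent class `[+-]?`, and `(?: ... )` is a non-capturing group; the variable names simply mirror the definition names.

```python
import re

digit = "[0-9]"
digits = f"{digit}+"
opt_fraction = rf"(?:\.{digits})?"
opt_exponent = rf"(?:E[+-]?{digits})?"
unsigned_num = digits + opt_fraction + opt_exponent

for s in ["5280", "39.37", "6.336E4", "1.894E-4", "2.", ".5", "1E"]:
    print(s, bool(re.fullmatch(unsigned_num, s)))
```

"2.", ".5" and "1E" are rejected: a fraction needs digits on both sides of the point, and an exponent needs at least one digit.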

FINITE AUTOMATA:
• Both deterministic and non-deterministic finite automata recognize regular sets.
• Which one?
– deterministic: faster recognizer, but it may take more space
– non-deterministic: slower, but it may take less space
– Deterministic automata are widely used in lexical analyzers.
• First, we define regular expressions for tokens; then we convert them into a DFA to get a lexical analyzer for our tokens.
– Algorithm 1: Regular Expression → NFA → DFA (two steps: first to NFA, then to DFA)
– Algorithm 2: Regular Expression → DFA (directly convert a regular expression into a DFA)
• A recognizer for a language is a program that takes a string x and answers "yes" if x is a sentence of that language, and "no" otherwise.
• We call the recognizer of the tokens a finite automaton.
• A finite automaton can be deterministic (DFA) or non-deterministic (NFA).
• This means that we may use either a deterministic or a non-deterministic automaton as a lexical analyzer.

Non-Deterministic Finite Automaton (NFA)
• A non-deterministic finite automaton (NFA) is a mathematical model that consists of:
– S: a set of states
– Σ: a set of input symbols (alphabet)
– move: a transition function that maps state-symbol pairs to sets of states
– s0: a start (initial) state
– F: a set of accepting states (final states)

• ε-transitions are allowed in NFAs. In other words, we can move from one state to another without consuming any symbol.
• An NFA accepts a string x if and only if there is a path from the start state to one of the accepting states such that the edge labels along this path spell out x.
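The acceptance condition can be turned directly into a search for such a path. This Python sketch is a hypothetical `nfa_accepts` helper, not an efficient algorithm: it explores (state, position) pairs depth-first, lets ε-moves advance the state without consuming input, and represents `move` as a dictionary from (state, label) to a set of states, with `None` standing for ε.

```python
EPS = None  # label used for ε-transitions

def nfa_accepts(move, start, finals, x):
    """True iff some path from start to a final state spells out x.

    move maps (state, label) to a set of states; label EPS consumes no input.
    """
    stack = [(start, 0)]
    seen = set()
    while stack:
        state, i = stack.pop()
        if (state, i) in seen:
            continue  # avoids looping forever on ε-cycles
        seen.add((state, i))
        if i == len(x) and state in finals:
            return True
        for nxt in move.get((state, EPS), set()):
            stack.append((nxt, i))          # ε-move: stay at the same position
        if i < len(x):
            for nxt in move.get((state, x[i]), set()):
                stack.append((nxt, i + 1))  # consume x[i]
    return False

# Hypothetical NFA for a*b using an ε-edge: 0 -a-> 0, 0 -ε-> 1, 1 -b-> 2
example = {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {2}}
print(nfa_accepts(example, 0, {2}, "aab"))  # True
print(nfa_accepts(example, 0, {2}, "aba"))  # False
```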

Transition graph of the NFA
[Figure: transition graph with edges 0 -a-> 0, 0 -b-> 0, 0 -a-> 1, 1 -b-> 2]
• The language recognized by this NFA is (a|b) * a b
• S = {0,1,2}
• Σ = {a,b}
• 0 is the start state s0
• {2} is the set of final states F

Executing an NFA
• Problem: How do we execute an NFA efficiently? "Strings accepted are those for which there is some corresponding path from a start state to an accept state."
• Conclusion: Search all paths in the graph consistent with the string.
• Idea: search paths in parallel.
• Keep track of the subset of NFA states that the search could be in after seeing a string prefix.
• "Multiple fingers" pointing into the graph.
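The "multiple fingers" idea can be sketched in Python by keeping a set of current states and advancing all of them on each input symbol, closing the set under ε-moves after every step. The NFA used here is the one for (a|b)*ab, assuming the edges 0 -a-> {0,1}, 0 -b-> {0} and 1 -b-> {2}; the names `MOVE`, `eps_closure` and `simulate` are illustrative only.

```python
EPS = None  # label used for ε-transitions (this NFA happens to have none)

# NFA for (a|b)*ab, assuming edges 0 -a-> {0,1}, 0 -b-> {0}, 1 -b-> {2}
MOVE = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
START, FINALS = 0, {2}

def eps_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in MOVE.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def simulate(x):
    current = eps_closure({START})            # the fingers before reading input
    for ch in x:
        step = set()
        for q in current:
            step |= MOVE.get((q, ch), set())  # advance every finger on ch
        current = eps_closure(step)
        if not current:
            return False                      # every path has died
    return bool(current & FINALS)
```

Each step costs at most a pass over the current set of states, so the whole run is linear in the length of x for a fixed NFA, with no backtracking.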

DETERMINISTIC FINITE AUTOMATON (DFA):
• A Deterministic Finite Automaton (DFA) is a special form of an NFA:
• no state has an ε-transition
• for each symbol a and state s, there is at most one edge labeled a leaving s, i.e. the transition function maps a state-symbol pair to a state (not a set of states)
[Figure: DFA with states 0, 1, 2; start state 0, final state 2]
state    a    b
0        1    0
1        1    2
2        1    0
The language recognized by this DFA is also (a|b) * a b
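Because a DFA has exactly one next state per (state, symbol) pair, a recognizer is a single table lookup per input character. This Python sketch assumes the usual transitions for this DFA (0 -a-> 1, 0 -b-> 0, 1 -a-> 1, 1 -b-> 2, 2 -a-> 1, 2 -b-> 0, with 2 accepting); `DFA_MOVE` and `dfa_accepts` are illustrative names.

```python
# Assumed transition table for the DFA recognizing (a|b)*ab
DFA_MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
START, ACCEPTING = 0, {2}

def dfa_accepts(x: str) -> bool:
    """Answer "yes" (True) if x is in the language, "no" (False) otherwise."""
    state = START
    for ch in x:
        key = (state, ch)
        if key not in DFA_MOVE:   # symbol outside {a, b}: no edge, reject
            return False
        state = DFA_MOVE[key]     # exactly one next state, no choices to track
    return state in ACCEPTING
```

Compared with the NFA simulation, there is no set of states to maintain, which is why the DFA is the faster recognizer.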

Converting a Regular Expression into an NFA (Thompson's Construction)
• This is one way to convert a regular expression into an NFA.
• There can be other (more efficient) ways to do the conversion.
• Thompson's Construction is a simple and systematic method. It guarantees that the resulting NFA will have exactly one start state and exactly one final state.
• Construction starts from the simplest parts (alphabet symbols). To create an NFA for a complex regular expression, the NFAs of its sub-expressions are combined.
• To recognize an empty string: a start state i with an ε-edge to a final state f
• To recognize a symbol a in the alphabet: a start state i with an a-edge to a final state f
• If N(r1) and N(r2) are NFAs for regular expressions r1 and r2:
– For regular expression r1 | r2: a new start state i has ε-edges to the start states of N(r1) and N(r2), and the final states of N(r1) and N(r2) have ε-edges to a new final state f.
[Figure: NFA for r1 | r2]
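A compact Python sketch of Thompson's construction follows. Assumptions: regular expressions are given as hypothetical tuples such as `("union", r1, r2)` rather than parsed from text; edges are `(src, label, dst)` triples with `None` for ε; and concatenation joins the two fragments with an ε-edge, where the textbook figure merges the final state of N(r1) with the start state of N(r2). The `accepts` helper runs the parallel ε-closure simulation to check the result.

```python
import itertools

EPS = None                   # label used for ε-transitions
_fresh = itertools.count()   # source of new state numbers

def thompson(node, edges):
    """Build the NFA fragment for node, appending (src, label, dst) edges.

    Returns (start, final). Node forms: ("eps",), ("sym", c),
    ("union", r1, r2), ("concat", r1, r2), ("star", r).
    """
    kind = node[0]
    if kind in ("eps", "sym"):
        i, f = next(_fresh), next(_fresh)
        edges.append((i, EPS if kind == "eps" else node[1], f))
        return i, f
    if kind == "union":
        i, f = next(_fresh), next(_fresh)
        for sub in node[1:]:
            s, t = thompson(sub, edges)
            edges.extend([(i, EPS, s), (t, EPS, f)])
        return i, f
    if kind == "concat":
        s1, t1 = thompson(node[1], edges)
        s2, t2 = thompson(node[2], edges)
        edges.append((t1, EPS, s2))  # textbook version merges t1 and s2 instead
        return s1, t2
    if kind == "star":
        i, f = next(_fresh), next(_fresh)
        s, t = thompson(node[1], edges)
        # enter, repeat, leave, or skip the sub-NFA entirely
        edges.extend([(i, EPS, s), (t, EPS, s), (t, EPS, f), (i, EPS, f)])
        return i, f
    raise ValueError(f"unknown node kind {kind!r}")

def closure(states, edges):
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(start, final, edges, x):
    current = closure({start}, edges)
    for ch in x:
        step = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(step, edges)
    return final in current

a, b = ("sym", "a"), ("sym", "b")
regex = ("concat", ("concat", ("star", ("union", a, b)), a), b)   # (a|b)*ab
edges = []
start, final = thompson(regex, edges)
```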

Converting an NFA into a DFA:
[Figures: Step 1 and Step 2]
• S0 is the start state of the DFA, since 0 is a member of S0 = {0,1,2,4,7}.
• S1 is an accepting state of the DFA, since 8 is a member of S1 = {1,2,3,4,6,7,8}.
[Figure: resulting DFA with states S0, S1, S2 and transitions on a and b]
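The state sets S0 and S1 come from the subset construction: each DFA state is the ε-closure of a set of NFA states. The Python sketch below assumes that the NFA in the lost figure is Thompson's NFA for (a|b)*a with states 0 to 8 (start 0, final 8) and the edges listed in `NFA_EDGES`; under that assumption it reproduces S0 = {0,1,2,4,7} and S1 = {1,2,3,4,6,7,8}. All names are illustrative.

```python
from collections import deque

EPS = None  # label used for ε-transitions
# Assumed NFA: Thompson's construction for (a|b)*a, states 0..8, start 0, final 8
NFA_EDGES = [
    (0, EPS, 1), (0, EPS, 7), (1, EPS, 2), (1, EPS, 4),
    (2, "a", 3), (4, "b", 5), (3, EPS, 6), (5, EPS, 6),
    (6, EPS, 1), (6, EPS, 7), (7, "a", 8),
]
NFA_START, NFA_FINAL = 0, 8
ALPHABET = ["a", "b"]

def eps_closure(states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in NFA_EDGES:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)

def move(states, symbol):
    return {dst for src, label, dst in NFA_EDGES if src in states and label == symbol}

def subset_construction():
    start = eps_closure({NFA_START})
    dstates, dtran = [start], {}
    work = deque([start])
    while work:
        T = work.popleft()
        for a in ALPHABET:
            U = eps_closure(move(T, a))
            if not U:
                continue  # no transition on a (would lead to a dead state)
            if U not in dstates:
                dstates.append(U)
                work.append(U)
            dtran[(T, a)] = U
    accepting = [S for S in dstates if NFA_FINAL in S]  # contains the NFA final state
    return dstates, dtran, accepting

dstates, dtran, accepting = subset_construction()
for i, S in enumerate(dstates):
    print(f"S{i} = {sorted(S)}{'  (accepting)' if S in accepting else ''}")
```

Under these assumptions the output is S0 = [0, 1, 2, 4, 7], S1 = [1, 2, 3, 4, 6, 7, 8] (accepting) and S2 = [1, 2, 4, 5, 6, 7].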

DFA vs. NFA
• DFA: The action of the automaton on each input symbol is fully determined.
• Obvious table-driven implementation.
• NFA:
• The automaton may have a choice at each step.
• The automaton accepts a string if there is any way to make choices that arrives at an accepting state; every path from the start state to an accept state spells a string accepted by the automaton.
• Not obvious how to implement efficiently!

Converting a Regular Expression Directly to a DFA:
• We may convert a regular expression into a DFA without creating an NFA first.
• First, we augment the given regular expression by concatenating it with a special symbol #.
• Then, we create a syntax tree for this augmented regular expression.
• In this syntax tree, all alphabet symbols (plus # and the empty string) in the augmented regular expression will be on the leaves, and all inner nodes will be the operators in that augmented regular expression.
• Then each alphabet symbol (plus #) will be numbered (position numbers).
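The first three steps (augment with #, build the syntax tree, number the positions) can be sketched in Python. The `Node` class and the constructor helpers are hypothetical; symbol leaves, including #, get position numbers from left to right, while ε leaves and operator nodes do not.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    op: str                            # 'sym', 'eps', 'cat', 'or', 'star'
    children: list = field(default_factory=list)
    symbol: Optional[str] = None       # set on 'sym' leaves only
    pos: Optional[int] = None          # position number, 'sym' leaves only

def sym(c): return Node("sym", symbol=c)
def eps(): return Node("eps")
def cat(l, r): return Node("cat", [l, r])
def alt(l, r): return Node("or", [l, r])
def star(c): return Node("star", [c])

def augment(tree):
    """Concatenate the expression with the end marker #."""
    return cat(tree, sym("#"))

def number_positions(node, counter=None):
    """Number symbol leaves (including #) left to right; return {pos: symbol}."""
    if counter is None:
        counter = [0]  # mutable counter shared across the recursion
    table = {}
    if node.op == "sym":
        counter[0] += 1
        node.pos = counter[0]
        table[node.pos] = node.symbol
    for child in node.children:
        table.update(number_positions(child, counter))
    return table

# (a|b)*ab as a syntax tree; concatenation is left-associative
tree = cat(cat(star(alt(sym("a"), sym("b"))), sym("a")), sym("b"))
positions = number_positions(augment(tree))
print(positions)  # {1: 'a', 2: 'b', 3: 'a', 4: 'b', 5: '#'}
```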

DFA Minimization:
• DFA construction can produce a large DFA with many states.
• Lexer generators perform an additional phase of DFA minimization to reduce it to the minimum possible size.
[Figure: example DFA]
What does this DFA do? Can it be simplified?
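One standard way to shrink a DFA is partition refinement (Moore's algorithm; Hopcroft's algorithm is a faster variant). The Python sketch below is an illustration under stated assumptions, not what any particular lexer generator does: it requires a complete transition function, starts from the split {accepting, non-accepting}, and keeps splitting blocks until every state in a block moves to the same blocks on every symbol. The example DFA is hypothetical; states S0 and S2 behave identically and end up merged.

```python
def minimize(states, alphabet, delta, accepting):
    """Moore-style partition refinement for a complete DFA.

    Returns a list of blocks (sets of equivalent states).
    """
    partition = [b for b in (set(accepting), set(states) - set(accepting)) if b]
    while True:
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            groups = {}
            for q in block:
                # which block each symbol leads to, for this state
                signature = tuple(block_of[delta[(q, a)]] for a in alphabet)
                groups.setdefault(signature, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):  # no block was split: stable
            return refined
        partition = refined

# Hypothetical DFA: S0 and S2 have the same transitions and are both non-accepting
states = ["S0", "S1", "S2"]
delta = {
    ("S0", "a"): "S1", ("S0", "b"): "S2",
    ("S1", "a"): "S1", ("S1", "b"): "S2",
    ("S2", "a"): "S1", ("S2", "b"): "S2",
}
print(minimize(states, "ab", delta, {"S1"}))
```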

Automatic Scanner Construction:
To convert a specification into code.
•Write down the RE for the input
language.
•Build a big NFA.
•Build the DFA that simulates the
NFA.
•Systematically shrink the DFA.
•Turn it into code.
Scanner generators:
•Lex and flex work along these
lines.
•Algorithms are well known and
understood.
•Key issue is interface to the
parser.
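The last step, "turn it into code", ends in a function the parser can call for one token at a time. The Python sketch below is a hypothetical rule table in the spirit of a lex specification. It uses Python's `re` for each rule instead of a generated DFA, and follows the lex convention that the longest match wins and ties go to the earlier rule (so "if" is a keyword but "ifx" is an identifier). The generator of (kind, lexeme) pairs stands in for the scanner-parser interface.

```python
import re

# Hypothetical token specification, in the spirit of a lex rules section
RULES = [
    ("WS", r"[ \t\n]+"),
    ("IF", r"if"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),
    ("OP", r"[-+*/=<>()]"),
]
COMPILED = [(name, re.compile(p)) for name, p in RULES]

def tokens(text):
    """Yield (kind, lexeme) pairs: longest match wins, ties go to the earlier rule."""
    pos = 0
    while pos < len(text):
        best = None
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        name, end = best
        if name != "WS":  # whitespace is skipped, not passed to the parser
            yield name, text[pos:end]
        pos = end

print(list(tokens("if x1 = 39.37E2")))
```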
