Myself Archana R
Assistant Professor In
Department Of Computer Science
SACWC.
I am here because I love to give presentations.
COMPILER
DESIGN
REGULAR EXPRESSION
TO FINITE AUTOMATA
Regular Expressions:
We use regular expressions to describe tokens of a programming
language.
• A regular expression is built up of simpler regular expressions
(using defining rules).
• Each regular expression denotes a language.
• A language denoted by a regular expression is called a regular set.
Regular Expressions (Rules)
Regular expressions over alphabet Σ

Reg. Expr        Language it denotes
(r1) | (r2)      L(r1) ∪ L(r2)
(r1) (r2)        L(r1) L(r2)
(r)*             (L(r))*
(r)              L(r)
Regular Expressions
• We may remove parentheses by using precedence rules.
– * highest
– concatenation next
– | lowest
• ab*|c means (a(b)*)|(c)
Example:
– Σ = {0,1}
– 0|1 => {0,1}
– (0|1)(0|1) => {00,01,10,11}
– 0* => {ε,0,00,000,0000,....}
– (0|1)* => all strings of 0s and 1s, including the empty string
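The precedence rules and the small examples above can be checked mechanically. The sketch below uses Python's `re` module, whose `*`, concatenation, and `|` follow the same precedence order. The helper name `language` and the length bound are illustrative assumptions, not part of the slides.

```python
import re
from itertools import product


def language(pattern, alphabet="01", max_len=3):
    """List every string over `alphabet` of length <= max_len that the
    whole pattern matches (a finite slice of the denoted language)."""
    words = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if re.fullmatch(pattern, word):
                words.append(word)
    return words


zero_or_one = language("0|1")
pairs = language("(0|1)(0|1)")
zeros = language("0*")          # includes the empty string ""

# ab*|c is read as (a(b)*)|(c): '*' binds tightest, '|' loosest.
precedence_ok = (
    re.fullmatch("ab*|c", "abbb") is not None
    and re.fullmatch("ab*|c", "c") is not None
    and re.fullmatch("ab*|c", "abab") is None
    and re.fullmatch("ab*|c", "ac") is None
)
```

Because the helper only enumerates strings up to `max_len`, an infinite language such as 0* shows up as a finite prefix of its members.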
REGULAR DEFINITION:
• Writing a regular expression for some languages can be difficult, because the expression can become quite complex. In those cases, we may use regular definitions.
• We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.
• A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
...
dn → rn
where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1,d2,...,di-1} (the basic symbols and the previously defined names).
Ex: Identifiers in Pascal
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*
– If we try to write the regular expression representing identifiers without using regular definitions, that regular expression will be complex:
(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
Ex: Unsigned numbers in Pascal
digit → 0 | 1 | ... | 9
digits → digit+
opt-fraction → ( . digits )?
opt-exponent → ( E (+|-)? digits )?
unsigned-num → digits opt-fraction opt-exponent
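A regular definition can be treated as a list of named patterns, where each name is expanded using only the names defined before it. The sketch below is one possible way to do this with Python's `re`. The `{name}` placeholder syntax, the `expand` helper, and the underscore spellings of the names are assumptions made for illustration.

```python
import re

# Ordered definitions: each body may refer only to earlier names,
# written here as {name} placeholders (an illustrative convention).
DEFINITIONS = [
    ("letter", r"[A-Za-z]"),
    ("digit", r"[0-9]"),
    ("id", r"{letter}({letter}|{digit})*"),
    ("digits", r"{digit}+"),
    ("opt_fraction", r"(\.{digits})?"),
    ("opt_exponent", r"(E[+-]?{digits})?"),
    ("unsigned_num", r"{digits}{opt_fraction}{opt_exponent}"),
]


def expand(definitions):
    """Substitute previously defined names into each later definition."""
    expanded = {}
    for name, body in definitions:
        for earlier, pattern in expanded.items():
            # Wrap in a non-capturing group so operators apply to the whole name.
            body = body.replace("{" + earlier + "}", "(?:" + pattern + ")")
        expanded[name] = body
    return expanded


PATTERNS = expand(DEFINITIONS)


def is_identifier(text):
    return re.fullmatch(PATTERNS["id"], text) is not None


def is_unsigned_num(text):
    return re.fullmatch(PATTERNS["unsigned_num"], text) is not None
```

For example, `is_unsigned_num("3.14E-10")` holds, while `is_unsigned_num("1.")` does not, because opt-fraction requires digits after the dot.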
FINITE AUTOMATA
• Both deterministic and non-deterministic finite automata recognize regular sets.
• Which one?
– deterministic – faster recognizer, but it may take more space
– non-deterministic – slower, but it may take less space
– Deterministic automata are widely used in lexical analyzers.
• First, we define regular expressions for tokens; then we convert them into a DFA to get a lexical analyzer for our tokens.
– Algorithm 1: Regular Expression → NFA → DFA (two steps: first to NFA, then to DFA)
– Algorithm 2: Regular Expression → DFA (directly convert a regular expression into a DFA)
• A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that language, and “no” otherwise.
• We call the recognizer of the tokens a finite automaton.
• A finite automaton can be deterministic (DFA) or non-deterministic (NFA).
• This means that we may use a deterministic or non-deterministic automaton as a lexical analyzer.
Non-Deterministic Finite Automaton (NFA)
• A non-deterministic finite automaton (NFA) is a mathematical model that consists of:
– S – a set of states
– Σ – a set of input symbols (alphabet)
– move – a transition function that maps state-symbol pairs to sets of states
– s0 – a start (initial) state
– F – a set of accepting states (final states)
• ε-transitions are allowed in NFAs. In other words, we can move from one state to another without consuming any symbol.
• An NFA accepts a string x if and only if there is a path from the start state to one of the accepting states such that the edge labels along this path spell out x.
Transition graph of the NFA
[Figure: transition graph with states 0, 1, 2; edges labeled a, a, b, b]
0 is the start state s0
{2} is the set of final states F
Σ = {a,b}
S = {0,1,2}
The language recognized by this NFA is (a|b)*ab
Executing NFA
• Problem: How to execute an NFA efficiently?
"strings accepted are those for which there is some corresponding path from a start state to an accept state"
• Conclusion: Search all paths in the graph consistent with the string.
• Idea: search paths in parallel
• Keep track of the subset of NFA states that the search could be in after seeing a string prefix.
• "Multiple fingers" pointing to the graph.
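The "multiple fingers" idea can be sketched as a loop that carries a set of current states. The transition table below is an assumption: the slide's figure is not preserved, so the edges are the usual three-state NFA for (a|b)*ab, which is consistent with the stated S, Σ, s0 and F. This particular NFA has no ε-transitions, so no ε-closure step is needed here.

```python
# Assumed edges for the NFA of (a|b)*ab (the figure itself is not available).
NFA_MOVE = {
    (0, "a"): {0, 1},   # on 'a', stay in 0 or guess this is the final "ab"
    (0, "b"): {0},
    (1, "b"): {2},
}
NFA_START = 0
NFA_ACCEPTING = {2}


def nfa_accepts(text):
    """Track every state the NFA could be in after each input symbol."""
    current = {NFA_START}
    for symbol in text:
        following = set()
        for state in current:
            following |= NFA_MOVE.get((state, symbol), set())
        current = following          # may become empty: no path survives
    return bool(current & NFA_ACCEPTING)
```

Each input symbol is processed once, and the work per symbol is bounded by the number of NFA states, so no backtracking over individual paths is required.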
DETERMINISTIC FINITE AUTOMATON (DFA):
• A Deterministic Finite Automaton (DFA) is a special form of an NFA.
• no state has an ε-transition
• for each symbol a and state s, there is at most one edge labeled a leaving s, i.e. the transition function maps a state-symbol pair to a single state (not a set of states)
[Figure: transition graph of the DFA with states 0, 1, 2; edges labeled a, b, b, a, a, b]
The language recognized by this DFA is also (a|b)*ab
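Because each state-symbol pair leads to a single state, a DFA can be run from a plain lookup table. The table below is an assumed reconstruction of the three-state DFA for (a|b)*ab (the figure is not preserved), and `dfa_accepts` is a hypothetical helper name.

```python
# Assumed transition table for a DFA recognizing (a|b)*ab.
# State 0: no useful prefix yet, 1: just saw 'a', 2: just saw "ab".
DFA_TABLE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
DFA_START = 0
DFA_ACCEPTING = {2}


def dfa_accepts(text):
    """Follow exactly one transition per input symbol."""
    state = DFA_START
    for symbol in text:
        row = DFA_TABLE[state]
        if symbol not in row:        # symbol outside the alphabet {a, b}
            return False
        state = row[symbol]
    return state in DFA_ACCEPTING
```

Compared with simulating an NFA, only one current state is tracked, which is why the DFA is the faster recognizer.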
Converting a Regular Expression into an NFA
(Thompson's Construction)
• This is one way to convert a regular expression into an NFA.
• There can be other (more efficient) ways to do the conversion.
• Thompson's Construction is a simple and systematic method. It guarantees that the resulting NFA will have exactly one final state and one start state.
• Construction starts from the simplest parts (alphabet symbols). To create an NFA for a complex regular expression, the NFAs of its sub-expressions are combined to create its NFA.
• To recognize an empty string
• To recognize a symbol a in the alphabet
• If N(r1) and N(r2) are NFAs for regular expressions r1 and r2:
– For regular expression r1 | r2
[Figure: NFA for r1 | r2]
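A minimal sketch of Thompson-style construction follows. The slide shows only the ε, symbol, and r1 | r2 cases, so the `concat` and `star` rules here follow the standard textbook construction and are assumptions. Concatenation joins the fragments with an ε-edge instead of merging two states, which is a common variant. All class and function names are illustrative.

```python
import itertools

_fresh = itertools.count()          # supplies new state numbers


class Fragment:
    """NFA fragment with exactly one start state and one final state.
    Edges are (source, label, target); label None means an ε-transition."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges


def symbol(a):
    s, f = next(_fresh), next(_fresh)
    return Fragment(s, f, [(s, a, f)])


def epsilon():
    return symbol(None)


def union(n1, n2):                  # r1 | r2
    s, f = next(_fresh), next(_fresh)
    extra = [(s, None, n1.start), (s, None, n2.start),
             (n1.final, None, f), (n2.final, None, f)]
    return Fragment(s, f, n1.edges + n2.edges + extra)


def concat(n1, n2):                 # r1 r2 (ε-link variant, an assumption)
    return Fragment(n1.start, n2.final,
                    n1.edges + n2.edges + [(n1.final, None, n2.start)])


def star(n):                        # r* (standard rule, not shown on the slide)
    s, f = next(_fresh), next(_fresh)
    extra = [(s, None, n.start), (s, None, f),
             (n.final, None, n.start), (n.final, None, f)]
    return Fragment(s, f, n.edges + extra)


def accepts(nfa, text):
    """Simulate the fragment by tracking the set of reachable states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in nfa.edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({nfa.start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(moved)
    return nfa.final in current


# (a|b)*ab built bottom-up from its sub-expressions
nfa = concat(concat(star(union(symbol("a"), symbol("b"))), symbol("a")),
             symbol("b"))
```

Every combinator returns a fragment with a single start and a single final state, which is the invariant that lets the fragments be combined freely.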
Converting an NFA into a DFA:
[Figure: Step 1 and Step 2 of the conversion; resulting DFA with states S0, S1, S2 and edges labeled a, b, b, a, b, a]
S0 is the start state of the DFA since 0 is a member of S0 = {0,1,2,4,7}
S1 is an accepting state of the DFA since 8 is a member of S1 = {1,2,3,4,6,7,8}
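The conversion can be sketched as the subset construction: each DFA state is the ε-closure of a set of NFA states. The source NFA is not preserved in the slides, so the table below is an assumption: a nine-state Thompson-style NFA for (a|b)*a with final state 8, chosen because it reproduces the sets S0 and S1 given above.

```python
from collections import deque

EPS = None   # label used for ε-transitions

# Assumed NFA for (a|b)*a, states 0..8, start 0, final 8.
NFA = {
    (0, EPS): {1, 7},
    (1, EPS): {2, 4},
    (2, "a"): {3},
    (4, "b"): {5},
    (3, EPS): {6},
    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},
}
NFA_START, NFA_FINAL, ALPHABET = 0, 8, "ab"


def epsilon_closure(states):
    """All states reachable from `states` using ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for nxt in NFA.get((q, EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from `states` by one edge labeled `symbol`."""
    return {t for q in states for t in NFA.get((q, symbol), ())}


def subset_construction():
    start = epsilon_closure({NFA_START})
    table, queue = {}, deque([start])
    while queue:
        current = queue.popleft()
        if current in table:
            continue                      # already expanded
        table[current] = {}
        for symbol in ALPHABET:
            target = epsilon_closure(move(current, symbol))
            table[current][symbol] = target
            queue.append(target)
    accepting = {s for s in table if NFA_FINAL in s}
    return start, table, accepting


dfa_start, dfa_table, dfa_accepting = subset_construction()
```

Under this assumed NFA the construction yields three DFA states, matching S0, S1, and S2 in the figure, with S1 the only accepting state.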
DFA vs. NFA
• DFA: Action of the automaton on each input symbol is fully determined.
• obvious table-driven implementation.
• NFA:
• Automaton may have a choice on each step.
• Automaton accepts a string if there is any way to make choices to arrive at an accepting state; every path from the start state to an accept state spells a string accepted by the automaton.
• Not obvious how to implement efficiently!
Converting a Regular Expression Directly to a DFA:
• We may convert a regular expression into a DFA (without creating an NFA first).
• First, we augment the given regular expression by concatenating it with a special symbol #.
• Then, we create a syntax tree for this augmented regular expression.
• In this syntax tree, all alphabet symbols (plus # and the empty string) in the augmented regular expression will be on the leaves, and all inner nodes will be the operators in that augmented regular expression.
• Then each alphabet symbol (plus #) will be numbered (position numbers).
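The first steps of the direct method (augmenting, building the syntax tree, numbering positions) can be sketched as follows. The tuple encoding of the tree, the node tags, and the sample expression (a|b)*a are assumptions for illustration. The later steps that turn positions into DFA states are not covered here.

```python
# Syntax tree nodes as tuples (an illustrative encoding):
#   ("sym", ch)        leaf for an alphabet symbol or '#'
#   ("eps",)           leaf for the empty string (gets no position)
#   ("or", l, r), ("cat", l, r), ("star", child)   operator nodes


def augment(tree):
    """Concatenate the expression with the end marker '#'."""
    return ("cat", tree, ("sym", "#"))


def number_positions(tree):
    """Number symbol leaves left to right, starting at 1."""
    positions = {}

    def visit(node):
        kind = node[0]
        if kind == "sym":
            positions[len(positions) + 1] = node[1]
        elif kind != "eps":
            for child in node[1:]:
                visit(child)

    visit(tree)
    return positions


# (a|b)*a as a hand-built tree
regex_tree = ("cat",
              ("star", ("or", ("sym", "a"), ("sym", "b"))),
              ("sym", "a"))
positions = number_positions(augment(regex_tree))
```

For this tree the positions come out as 1 for a, 2 for b, 3 for the second a, and 4 for the end marker #.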
DFA Minimization:
• DFA construction can produce a large DFA with many states.
• Lexer generators perform an additional phase of DFA minimization to reduce the DFA to the minimum possible size.
[Figure: example DFA]
What does this DFA do? Can it be simplified?
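One way to sketch minimization is partition refinement: start with accepting and non-accepting states in separate blocks, then keep splitting blocks whose states disagree on which block each symbol leads to. The slides do not name a specific algorithm, so this choice, the `minimize` helper, and the sample DFA (a three-state DFA for (a|b)*a) are assumptions. The sketch expects a complete transition table and does not remove unreachable states.

```python
def minimize(states, alphabet, delta, accepting):
    """Return the blocks of equivalent states of a complete DFA."""
    accepting = set(accepting)
    partition = [block for block in (accepting, set(states) - accepting)
                 if block]
    while True:
        # Map each state to the index of the block that contains it.
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            groups = {}
            for q in block:
                # States stay together only if every symbol leads
                # to the same block for both of them.
                signature = tuple(block_of[delta[q][a]] for a in alphabet)
                groups.setdefault(signature, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):   # no block was split: stable
            return [frozenset(block) for block in refined]
        partition = refined


# Sample DFA for (a|b)*a with a redundant state S2.
SAMPLE_DELTA = {
    "S0": {"a": "S1", "b": "S2"},
    "S1": {"a": "S1", "b": "S2"},
    "S2": {"a": "S1", "b": "S2"},
}
blocks = minimize(["S0", "S1", "S2"], "ab", SAMPLE_DELTA, {"S1"})
```

In the sample, S0 and S2 behave identically on every input, so they fall into one block and the minimized DFA has two states.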
Automatic Scanner Construction:
To convert a specification into code:
• Write down the RE for the input language.
• Build a big NFA.
• Build the DFA that simulates the NFA.
• Systematically shrink the DFA.
• Turn it into code.
Scanner generators:
• Lex and flex work along these lines.
• Algorithms are well known and understood.
• Key issue is the interface to the parser.
Thank
you!
