1. 1
AMERICAN INTERNATIONAL UNIVERSITY-BANGLADESH
REGULAR EXPRESSIONS (RE)
CSC3113: THEORY OF COMPUTATION
Lecture: 5  Week: 4  Semester: Fall 2024-2025
Instructor: Sharfuddin Mahmood, Assistant Professor,
Department of Computer Science, Faculty of Science & Technology.
smahmood@aiub.edu
2. 2
CSC3113: Theory of Computation
LECTURE OUTLINE
Formal Definition of Regular Expression (RE)
Equivalence with Finite Automaton
Conversion from RE to NFA
Conversion from DFA to RE
Closure under regular operations
3. 3
LEARNING OBJECTIVE
Mathematical model of Regular Expression (RE)
Understand the equivalence of RE and FA.
Conversion techniques from RE to NFA.
The expressive power of RE.
Techniques to convert DFA to RE
Closure under different regular operations.
4. 4
LEARNING OUTCOME
All outcomes are illustrated with examples.
Understand the mathematical interpretation of Regular Expression (RE)
Learn the rules for equivalence of RE with Finite Automaton
Apply the conversion rules from RE to NFA
Apply the techniques to convert DFA to RE
Identify the closure under different regular operations.
5. 5
REGULAR EXPRESSION
A regular expression is used to describe a language.
A regular expression is a specific, standard textual syntax
(combining alphabet symbols and regular operators) for
representing patterns that match strings.
A regular expression is built up using the regular
operations: union (∪), concatenation (∘), and star (*).
Precedence order: star (*), then concatenation (∘), then union (∪).
Example:
(0 ∪ 1)0* = ({0} ∪ {1}) ∘ {0}* = {0,1} ∘ {0}*
A = {w | w starts with a 0 or a 1, followed by zero or more 0’s}
(0 ∪ 1)* = ({0} ∪ {1})* = {0,1}*
A = {w | w is any string of 0s and/or 1s, including the empty string}
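The two example expressions can be tried out with Python's built-in `re` module. This is only a sketch: Python writes union as `|`, and the names `ZERO_OR_ONE_THEN_ZEROS`, `ANY_BINARY_STRING`, and `in_language` are chosen here for illustration.

```python
import re

# Python spells union as "|"; concatenation is juxtaposition; star is "*".
ZERO_OR_ONE_THEN_ZEROS = re.compile(r"(0|1)0*")  # (0 ∪ 1)0*
ANY_BINARY_STRING = re.compile(r"(0|1)*")        # (0 ∪ 1)*


def in_language(pattern, w):
    """True if the whole string w (not just a prefix of it) matches the pattern."""
    return pattern.fullmatch(w) is not None


for w in ["0", "1000", "10", "01", ""]:
    print(repr(w),
          in_language(ZERO_OR_ONE_THEN_ZEROS, w),
          in_language(ANY_BINARY_STRING, w))
```

`fullmatch` is used rather than `search` because membership in a language is a statement about the entire string.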
6. 6
REGULAR EXPRESSION
Example 1: L = {w | each a in w is followed by at least two b’s}
Each a is followed by at least two b’s: a bb
There may be more than two b’s after an a: a bb b*
There may be zero or more b’s before the first occurrence of a: b* a bb b*
The pattern a bb b* may repeat zero or more times after the first occurrence of a: b* a bb b* (a bb b*)*
RE: b* a bb b* (a bb b*)*
This expression assumes w contains at least one a. Strings with no a also satisfy the condition, and b* (a bb b*)* covers both cases.
Example 2: L = {w | w contains an odd number of b’s}
At least one b is needed to make the count odd: b
There may be zero or more a’s before the first occurrence of b: a* b
For more than one b, there must be two more b’s to keep the count odd (three), with zero or more a’s before, between, and after them: a* b a* b a*
For more than three b’s, the sequence with two b’s is repeated: (a* b a* b a*)*
RE: a* b (a* b a* b a*)*
As written, this expression misses strings such as ba, where a’s follow a single b. Writing it as a* b a* (b a* b a*)* covers them.
Example 3: L = {w | w contains abbab as a substring}
The string abbab must occur at least once: abbab
Before abbab there can be zero or more a’s and/or b’s: (a ∪ b)*
After abbab there can be zero or more a’s and/or b’s: (a ∪ b)*
RE: (a ∪ b)* abbab (a ∪ b)*
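A quick way to gain confidence in expressions like these is to compare them against the set-builder description on every short string. The sketch below does that with Python's `re` module. The helper names (`all_strings`, `disagreements`, and the three spec functions) are hypothetical. The first two patterns are written in slightly adjusted forms, b*(abbb*)* and a*ba*(ba*ba*)*, so that strings with no a, and strings where a's follow a single b, are also covered.

```python
import re
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def disagreements(pattern, spec, max_len=8):
    """Strings (up to max_len) on which the regex and the spec differ."""
    regex = re.compile(pattern)
    return [w for w in all_strings("ab", max_len)
            if (regex.fullmatch(w) is not None) != spec(w)]


def each_a_then_bb(w):
    # Every a must be immediately followed by "bb".
    return all(w[i + 1:i + 3] == "bb" for i, c in enumerate(w) if c == "a")


def odd_number_of_b(w):
    return w.count("b") % 2 == 1


def contains_abbab(w):
    return "abbab" in w


print(disagreements(r"b*(abbb*)*", each_a_then_bb))
print(disagreements(r"a*ba*(ba*ba*)*", odd_number_of_b))
print(disagreements(r"(a|b)*abbab(a|b)*", contains_abbab))
# The form a*b(a*ba*ba*)* is checked separately on very short strings.
print(disagreements(r"a*b(a*ba*ba*)*", odd_number_of_b, max_len=2))
```

An empty list means the expression and the description agree on all strings up to the chosen length. That is evidence, not a proof. The last call reports the string ba, which has an odd number of b's but is not generated by a*b(a*ba*ba*)*.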
7. 7
FORMAL DEFINITION OF REGULAR
EXPRESSION
R is a regular expression if R is:
a, for some a ∈ Σ; it represents the language {a}.
ε; it represents the language {ε} containing a single string,
namely, the empty string.
∅; it represents the empty language, which doesn’t contain
any string. Note that L(∅*) = {ε}.
(R1 ∪ R2), where R1 and R2 are regular expressions.
R ∪ ∅ = R, but R ∪ ε may not be equal to R.
(R1 ∘ R2), where R1 and R2 are regular expressions.
R ∘ ε = R, but R ∘ ∅ may not be equal to R.
(R1*), where R1 is a regular expression.
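The six cases map directly onto a recursive function. The sketch below assumes a tuple encoding of regular expressions (`("sym", "a")`, `("eps",)`, `("empty",)`, `("union", R1, R2)`, `("concat", R1, R2)`, `("star", R1)`) and a hypothetical function `lang(r, n)` that lists the strings of L(r) of length at most n. Neither the encoding nor the name comes from the lecture.

```python
EPS = ("eps",)      # ε
EMPTY = ("empty",)  # ∅


def lang(r, n):
    """Return the set of strings of length <= n in L(r)."""
    kind = r[0]
    if kind == "sym":        # a, for some a in Σ
        return {r[1]} if n >= 1 else set()
    if kind == "eps":        # ε denotes {ε}
        return {""}
    if kind == "empty":      # ∅ denotes the empty language
        return set()
    if kind == "union":      # (R1 ∪ R2)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":     # (R1 ∘ R2)
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "star":       # (R1*)
        base = lang(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:      # keep appending pieces until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")


R = ("union", ("sym", "0"), ("sym", "1"))    # 0 ∪ 1
print(lang(("union", R, EMPTY), 3) == lang(R, 3))   # R ∪ ∅ = R
print(lang(("concat", R, EPS), 3) == lang(R, 3))    # R ∘ ε = R
print(lang(("concat", R, EMPTY), 3))                # R ∘ ∅ is empty
print(lang(("star", EMPTY), 3))                     # L(∅*) = {ε}
```

Bounding the length keeps every set finite, so the identities in the definition can be checked by plain set comparison.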
8. 8
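EQUIVALENCE WITH FINITE
AUTOMATA
Let us convert a regular expression R into an NFA by
considering the six cases in the formal definition of
regular expressions.
R = a, for some a ∈ Σ. Then L(R) = {a}, and the NFA that recognizes
L(R) has a start state with a single arrow labeled a to an accept state.
R = ε. Then L(R) = {ε}, and the NFA that recognizes L(R) has
a single state that is both the start state and an accept state, with no arrows.
R = ∅. Then L(R) = ∅, and the NFA that recognizes L(R) has
a single non-accepting start state, with no arrows.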
9. 9
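EQUIVALENCE WITH FINITE
AUTOMATA
R = R1 ∪ R2. Then L(R) = L(R1) ∪ L(R2), and the NFA that
recognizes L(R) has a new start state with ε arrows to the start states of the NFAs for R1 and R2.
R = R1 ∘ R2. Then L(R) = L(R1) ∘ L(R2), and the NFA that
recognizes L(R) has ε arrows from the accept states of the NFA for R1 to the start state of the NFA for R2; only the accept states of the NFA for R2 remain accepting.
R = R1*. Then L(R) = L(R1)*, and the NFA that
recognizes L(R) has a new accepting start state with an ε arrow to the old start state, plus ε arrows from the accept states of the NFA for R1 back to its old start state.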
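The six cases can be turned into a small program. The sketch below assumes a tuple encoding of regular expressions (`("sym", "a")`, `("eps",)`, `("empty",)`, `("union", R1, R2)`, `("concat", R1, R2)`, `("star", R1)`) and hypothetical helper names `build`, `eps_closure`, and `accepts`. Arrows are stored in a dictionary, and the symbol `None` stands for an ε arrow.

```python
from itertools import count


def add_arrow(edges, src, symbol, dst):
    edges.setdefault((src, symbol), set()).add(dst)


def build(r, edges, fresh):
    """Add an NFA for r to `edges`; return (start_state, accept_states)."""
    kind = r[0]
    if kind == "sym":                      # one arrow labeled a
        s, t = next(fresh), next(fresh)
        add_arrow(edges, s, r[1], t)
        return s, {t}
    if kind == "eps":                      # accepting start state, no arrows
        s = next(fresh)
        return s, {s}
    if kind == "empty":                    # non-accepting start state
        return next(fresh), set()
    if kind == "union":                    # new start, ε arrows to both starts
        s1, f1 = build(r[1], edges, fresh)
        s2, f2 = build(r[2], edges, fresh)
        s = next(fresh)
        add_arrow(edges, s, None, s1)
        add_arrow(edges, s, None, s2)
        return s, f1 | f2
    if kind == "concat":                   # ε arrows from R1's accepts to R2's start
        s1, f1 = build(r[1], edges, fresh)
        s2, f2 = build(r[2], edges, fresh)
        for f in f1:
            add_arrow(edges, f, None, s2)
        return s1, f2
    if kind == "star":                     # new accepting start, loops back via ε
        s1, f1 = build(r[1], edges, fresh)
        s = next(fresh)
        add_arrow(edges, s, None, s1)
        for f in f1:
            add_arrow(edges, f, None, s1)
        return s, f1 | {s}
    raise ValueError(f"unknown node: {kind}")


def eps_closure(states, edges):
    """All states reachable from `states` using only ε arrows."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in edges.get((q, None), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def accepts(r, w):
    """Build the NFA for r and simulate it on w."""
    edges = {}
    start, finals = build(r, edges, count())
    current = eps_closure({start}, edges)
    for symbol in w:
        moved = set()
        for q in current:
            moved |= edges.get((q, symbol), set())
        current = eps_closure(moved, edges)
    return bool(current & finals)


A, B = ("sym", "a"), ("sym", "b")
ab_star = ("star", ("concat", A, B))       # (ab)*
print(accepts(ab_star, "abab"), accepts(ab_star, "aba"))
```

The simulation keeps the set of all states the NFA could be in, which avoids building an equivalent DFA up front.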
10. 10
CONVERTING A REGULAR EXPRESSION
TO AN NFA
Building an NFA from the regular expression (a ∪ b)*aba, step by step:
a
b
a ∪ b
(a ∪ b)*
aba
(a ∪ b)*aba
[Figure: the NFA built at each step, combining the NFAs of the previous steps with ε arrows.]
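To see an NFA for this example running, the sketch below reads the slide's expression as (a ∪ b)*aba, which is an assumption because the union symbol did not survive in the text. It uses a compact hand-built NFA with four states rather than the ε-arrow machine produced by the step-by-step construction. The names `DELTA`, `START`, `ACCEPT`, and `nfa_accepts` are illustrative.

```python
# Hand-built NFA: state 0 loops on a and b, and the path
# 0 -a-> 1 -b-> 2 -a-> 3 guesses that the final "aba" has begun.
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "a"): {3},
}
START, ACCEPT = 0, {3}


def nfa_accepts(w):
    """Track the set of states the NFA could be in after each symbol."""
    current = {START}
    for symbol in w:
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & ACCEPT)


for w in ["aba", "bbaba", "ababa", "abab", ""]:
    print(repr(w), nfa_accepts(w))
```

The nondeterminism sits in state 0 on input a: the machine both stays in state 0 and moves to state 1, and the set-based simulation follows both choices.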
11. 11
CONVERTING A DFA TO A REGULAR
EXPRESSION
This is done in two parts, using a new type of finite automaton
called a generalized nondeterministic finite automaton (GNFA).
First, we convert the DFA to a GNFA, and
then the GNFA to a regular expression.
A GNFA has the following special form:
Transition labels may be regular expressions.
The start state has no incoming arrow
from any other state.
There is only one accept state, and it has
no outgoing arrow to any other state.
The start state is never the same as the accept state.
Except for the start and accept states, every state has exactly one
outgoing arrow to every other state and to itself.
If no transition exists between two states, we treat
the arrow between them as labeled ∅.
[Figure: an example GNFA from qstart to qaccept whose arrows carry labels such as ab*, a*, (aa)*, b*, aa, ab ∪ ba, ab, and b.]
12. 12
CONVERTING A DFA TO GNFA
Add a new start state with an ε
arrow to the old start state.
Add a new accept state with ε
arrows from the old accept states.
If any arrow has multiple labels,
replace them with a single label that is
the union of the previous labels.
Add arrows labeled ∅ between
states where there are no arrows.
This won’t change the language, as
∅-labeled arrows can never be used.
We may even skip adding such
arrows, since they can be assumed
to be present without ever being used.
[Figure: a DFA with states 1 and 2 and arrows labeled a, b, and a,b, next to its GNFA with new states s and f, in which the label a,b becomes a ∪ b.]
13. 13
FORMAL DEFINITION OF GNFA
A generalized nondeterministic finite automaton is a 5-tuple,
(Q, Σ, δ, qstart, qaccept) where:
Q is the finite set of states,
Σ is the input alphabet,
δ: (Q − {qaccept}) × (Q − {qstart}) → R is the transition function, where R is the set of all regular expressions over Σ,
qstart is the start state,
qaccept is the accept state.
A GNFA accepts a string w in Σ* if w = w1w2…wk, where each
wi is in Σ*, and a sequence of states q0, q1, …, qk exists such that:
q0 = qstart is the start state,
qk = qaccept is the accept state, and
for each i, we have wi ∈ L(Ri), where Ri = δ(qi-1, qi);
i.e., Ri is the expression on the arrow from qi-1 to qi.
14. 14
CONVERTING A GNFA TO A REGULAR
EXPRESSION
Consider a GNFA with k states.
We repeatedly remove one state from the GNFA until
k = 2. These last two states are the start and the
accept states, and the label on the single arrow between them is the equivalent regular expression.
We remove a state by selecting it, ripping it out of the machine,
and repairing the remainder so that the same language is
still recognized.
Any state will do, provided that it is not the start or
the accept state.
15. 15
REPAIRING AFTER REMOVING A
STATE
Let us call the removed state qrmv.
Repair the machine by altering the regular expressions that label each
of the remaining arrows. This change is made for each arrow going
from any state qs to any state qd, including the case where qs = qd.
The new labels compensate for the absence of qrmv by adding back
the lost computations, i.e., the new label going from a state qs to a state
qd is a regular expression that describes all strings that would take the
machine from qs to qd either directly or via qrmv.
[Figure: before removal, the arrows are qs → qrmv labeled R1, qrmv → qrmv labeled R2, qrmv → qd labeled R3, and qs → qd labeled R4. After removal, the single arrow qs → qd is labeled (R1)(R2)*(R3) ∪ (R4).]
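The repair rule can be written as a single formula. The name δ′ for the transition function after the removal is notation assumed here for illustration.

```latex
\[
\delta'(q_s, q_d) \;=\; (R_1)(R_2)^{*}(R_3) \,\cup\, (R_4),
\qquad\text{where}\qquad
\begin{aligned}
R_1 &= \delta(q_s, q_{rmv}), & R_2 &= \delta(q_{rmv}, q_{rmv}),\\
R_3 &= \delta(q_{rmv}, q_d), & R_4 &= \delta(q_s, q_d).
\end{aligned}
\]
% Two special cases follow from the identities for \emptyset and \varepsilon:
\[
R_2 = \emptyset \;\Rightarrow\; \delta'(q_s, q_d) = (R_1)(R_3) \cup (R_4),
\qquad
R_1 = \emptyset \ \text{or}\ R_3 = \emptyset \;\Rightarrow\; \delta'(q_s, q_d) = R_4 .
\]
```

The first special case uses L(∅*) = {ε}, and the second uses the fact that concatenating any expression with ∅ gives ∅.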
16. 16
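EXAMPLE:
CONVERTING A TWO STATE DFA TO AN
EQUIVALENT REGULAR EXPRESSION
[Figure: the conversion in four steps.]
DFA: states 1 and 2, with arrows labeled a, b, and a,b.
GNFA: a new start state s and a new accept state f are added, and the label a,b becomes a ∪ b.
After removing state 2: the remaining arrows are labeled a and b(a ∪ b)*.
After removing state 1: the single arrow from s to f is labeled a*b(a ∪ b)*.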
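The whole pipeline (DFA to GNFA, then ripping out one state at a time) fits in a short program. The sketch below makes several assumptions. Labels are Python `re` pattern strings, with `None` standing for ∅ and the empty string for ε. The new states are named `"s"` and `"f"`, so DFA states must not use those names. The two-state DFA is reconstructed from the arrow labels in the example (state 1 is the start state and loops on a, b leads to the accept state 2, and state 2 loops on a and b). All function names are hypothetical.

```python
import re
from itertools import product


def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"


def concat(*parts):
    # Concatenating with ∅ (None) gives ∅.
    return None if any(p is None for p in parts) else "".join(parts)


def star(r):
    # L(∅*) = {ε}, written here as the empty pattern.
    return "" if r is None else f"(?:{r})*"


def dfa_to_regex(states, alphabet, delta, start, accepts):
    S, F = "s", "f"
    label = {(S, start): ""}                    # ε arrow from the new start
    for q in accepts:
        label[(q, F)] = ""                      # ε arrows to the new accept
    for q in states:
        for c in alphabet:                      # union multiple labels
            key = (q, delta[(q, c)])
            label[key] = union(label.get(key), re.escape(c))
    remaining = list(states)
    while remaining:
        rip = remaining.pop()
        loop = star(label.get((rip, rip)))
        for qs in [S] + remaining:
            for qd in remaining + [F]:
                via = concat(label.get((qs, rip)), loop, label.get((rip, qd)))
                new = union(via, label.get((qs, qd)))
                if new is not None:
                    label[(qs, qd)] = new
        label = {k: v for k, v in label.items() if rip not in k}
    return label.get((S, F))                    # None means the empty language


def dfa_accepts(delta, start, accepts, w):
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in accepts


def agrees(regex, alphabet, delta, start, accepts, max_len=7):
    """Compare the regex with the DFA on every string up to max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            by_regex = regex is not None and re.fullmatch(regex, w) is not None
            if by_regex != dfa_accepts(delta, start, accepts, w):
                return False
    return True


STATES = [1, 2]
DELTA = {(1, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
regex = dfa_to_regex(STATES, "ab", DELTA, 1, {2})
print(regex)
print(agrees(regex, "ab", DELTA, 1, {2}))
```

The printed pattern is more heavily parenthesized than a*b(a ∪ b)* because every union and star is wrapped in a non-capturing group. The `agrees` check gives evidence, on short strings only, that it describes the same language as the DFA.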
17. 17
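EXAMPLE:
CONVERTING A THREE STATE DFA TO AN
EQUIVALENT REGULAR EXPRESSION
[Figure: the conversion in five steps.]
DFA: states 1, 2, and 3, with arrows labeled a and b.
GNFA: a new start state s and a new accept state f are added.
After removing state 1, the arrows are labeled:
a, b, aa ∪ b, ab, ba ∪ a, bb
After removing state 2, the arrows are labeled:
a(aa ∪ b)*, a(aa ∪ b)*ab ∪ b, (ba ∪ a)(aa ∪ b)*ab ∪ bb, (ba ∪ a)(aa ∪ b)* ∪ ε
After removing state 3, the single arrow from s to f is labeled:
(a(aa ∪ b)*ab ∪ b)((ba ∪ a)(aa ∪ b)*ab ∪ bb)*((ba ∪ a)(aa ∪ b)* ∪ ε) ∪ a(aa ∪ b)*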
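The final expression is long enough that a mechanical check is worthwhile. The sketch below compares it with a three-state DFA on all short strings. The DFA's transitions are an assumption: they are reconstructed from the intermediate arrow labels (state 1 is the start state, states 2 and 3 are accept states), since the figure itself is not available in the text. The expression is written with the union symbols restored and translated to Python `re` syntax, where ∪ becomes `|` and ε becomes an empty alternative.

```python
import re
from itertools import product

# Assumed DFA, reconstructed from the arrow labels of the example.
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 1, (2, "b"): 2,
    (3, "a"): 2, (3, "b"): 1,
}
START, ACCEPT = 1, {2, 3}

# (a(aa∪b)*ab ∪ b)((ba∪a)(aa∪b)*ab ∪ bb)*((ba∪a)(aa∪b)* ∪ ε) ∪ a(aa∪b)*
FINAL_RE = re.compile(
    r"(a(aa|b)*ab|b)((ba|a)(aa|b)*ab|bb)*((ba|a)(aa|b)*|)|a(aa|b)*"
)


def dfa_accepts(w):
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in ACCEPT


def mismatches(max_len=8):
    """Strings up to max_len on which the regex and the DFA disagree."""
    found = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if (FINAL_RE.fullmatch(w) is not None) != dfa_accepts(w):
                found.append(w)
    return found


print(mismatches())
```

An empty result means the expression and the assumed DFA agree on every string of length up to 8, which supports the hand derivation but does not replace it.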
18. 18
REFERENCES
REGULAR EXPRESSION: PART-1
Michael Sipser, Introduction to the Theory of Computation, 3rd ed., section on Regular Expressions.