Flat unit 2

S Venkat, Asst.Professor @CSE 2021
FORMAL LANGUAGE AND AUTOMATA THEORY
UNIT – II
Regular Expressions
• Regular Expressions, Regular Sets, Identity Rules,
• Equivalence of two Regular Expressions,
• Manipulations of Regular Expressions,
• Finite Automata and Regular Expressions,
• Equivalence between Finite Automata and Regular Expressions,
• Pumping Lemma, Closers Properties,
• Applications of Regular Expressions,
• Finite Automata and Regular Grammars,
• Regular Expressions and Regular Grammars.

Regular Expressions :
Regular expressions are simple algebraic expressions that are used to represent a set of
strings.
Recursive definition of regular expression:
For given input alphabet ∑, regular expression is defined by following rules:
1. Every character belonging to ∑, is a regular expression.
2. Null string ɛ is a regular expression.
3. If R1 and R2 are two regular expressions, then R1+ R2 is also a regular expression.
4. If R1 and R2 are two regular expressions, then R1. R2 is also a regular expression.
5. If R is a regular expression, then (R ) is also a regular expression.
6. If R is a regular expression, then R*
is also a regular expression.
7. Any combination of above rules is also a regular expression.
Regular Set or Regular Language :
Set of strings generated by regular expressions is known as regular set (or) regular language.
It is the language accepted by Finite Automata.
Recursive definition of regular language from regular expression:
For given input alphabet ∑, regular expression is defined by following rules:
1. If Null string ɛ is a regular expression, then the regular language is L={ɛ} which
contains the empty string.
2. For every input symbol a∈Σ, a is a regular expression, then the regular set is L={a}.
3. Let R1 = a and R2 = b and R1+ R2 is a regular expression, then regular expression is
a+b (means a or b) and regular set is L={a,b}.
4. Let R1 = (a+b) and R2 = c and R1. R2 is a regular expression, then regular expression is
(a+b).c (means starting with a or b followed by c) and regular set is L={ac,bc}.
5. Let R={a} and R*
is a regular expression (means any number of repetitions of a from
0 to ∞) then regular set is L={ ɛ,a,aa,aaa...}.
6. If ϕ is a regular expression, then regular language is L={}= ϕ which denotes theempty
set.
Properties of Regular Expressions:
If R1 and R2 are two regular expressions, then language generated by R1 is L(R1) and language
generated by R2 is L(R2). The properties of regular expressions are
1. Union: It is a string from L(R1) or L(R2). It is denoted by L(R1+ R2) = L(R1) ∪ L(R2).
Example: let R1 = ab+c ⟹ L(R1) ={ab,c}
R2 = d+ef ⟹ L(R2) ={d,ef}
Then L(R1+ R2) = L(R1) ∪ L(R2) = {ab,c} ∪ {d,ef} = {ab,c,d,ef}
2. Concatenation: It is a string from L(R1) followed by L(R2). It is denoted by L(R1. R2)
= L(R1) . L(R2).
Example: let R1 = ab+c ⟹ L(R1) ={ab,c}
R2 = d+ef ⟹ L(R2)={d,ef}
Then L(R1. R2) = L(R1) . L(R2) = {ab,c} . {d,ef} = {abd,abef,cd,cef}
3. Kleene Closure: It is a string obtained by concatenating zero or more strings with
repetitions. It is denoted by L(R*
) = (L(R))*.
Example: let R = ab+c ⟹L(R) ={ab,c}
Then L(R*
) = (L(R))* = {ab,c}* = {ɛ, ab,c,abab,abc,cab,cc,ababc…}
4. L((R))=L(R) denotes same language as R.
5. L(R).ɛ = ɛ .L(R) = L(R).

S Venkat, Asst.Professor @CSE 2021
Identity rules of Regular Expressions:
Two regular expressions R1 and R2 are equivalent if L(R1) = L(R2). The identity rules of regular expressions are
1. ɛ + R = R + ɛ
2. ɛ . R = R . ɛ = R
3. R + R = R
4. Φ + R = R + Φ =R
5. Φ . R = R . Φ = ϕ
6. ɛ* = ɛ = ϕ*
7. R* . R* = R*
8. R . R* = R* . R
9. ɛ + R . R* = ɛ + R* . R = R*
10. (R*)* = R*
PROOFS
LHS RHS
ɛ + R = R + ɛ L(ɛ+R) =L(ɛ)∪L(R)
= {ɛ}∪L(R)
L(R+ɛ) =L(R)∪L(ɛ)
=L(R))∪{ɛ}
Since L(ɛ + R)= L(R +ɛ),therefore ɛ+
R=R+
ɛ
ɛ . R = R . ɛ = R L(ɛ. R) =L(ɛ).L(R)
= {ɛ}.L(R)
= L(R)
L(R+ɛ) =L(R).L(ɛ)
=L(R).{ɛ}
= L(R)
Since L(ɛ . R)= L(R . ɛ)= L(R),
therefore ɛ . R = R . ɛ = R
R+R=R L(R + R) = L(R) ∪L(R)
= L(R)
Since L(R+R) = L(R),
therefore R+R = R
Φ + R = R + Φ =R L(Φ + R) =L(Φ) ∪ L(R)
= {} ∪ L(R)
= L(R)
L(R+Φ) = L(R)∪ L(Φ)
= L(R) ∪ {}
= L(R)
Since
L(Φ+R)=L(R+Φ)= L(R), therefore
Φ+R= R+ Φ=R
Φ . R = R .Φ = ϕ L(Φ . R) =L(Φ) .L(R)
= {} . L(R)
= {}
= Φ
L(R.Φ) = L(R).L(Φ)
= L(R) .{}
= {}
= Φ
Since L(Φ.R)= L(R.Φ) = Φ ,
therefore Φ .R = R .Φ = ϕ
ɛ* = ɛ = ϕ* L(ɛ*)=(L(ɛ))*
={ ɛ}*
={ ɛ, ɛɛ, ɛɛɛ, …}
={ ɛ, ɛ, ɛ, …}
={ ɛ, ɛɛ, ɛɛɛ, …}
={ ɛ}
= L(ɛ)
L(ϕ *)=(L(ϕ))*
={ }*
={ ɛ}
= L(ɛ)
Since
L(ɛ*) =L(ϕ*)=L(ɛ)
thereforeɛ*=ɛ=ϕ
*
R* . R* = R* L(R*.R*)=L(R*).L( R*)
=(L(R))*.(L( R))*
=(L(R))*.(ɛ.ɛ)
=(L(R))*.(ɛ)
=(L(R))*
Since L(R*.R*)=(L(R))*
Therefore R*.R*=R*
R . R* = R* . R L(R*.R*)=L(R).L(R*)
=L(R).(L( R))*
=(R).{ɛ,R,RR,…}
={R,RR,RRR,…}
L(R*.R)=L(R*).L(R)
=(L(R))*.L( R)
={ɛ,R,RR,..}(R)
={R,RR,RRR,..}
Since L(R.R*)=L(R*.R) Therefore
R.R*=R*.R
ɛ+R.R*=ɛ+R*.R=R* L(ɛ+R.R*)=L(ɛ)∪L(R.R*)
= {ɛ}∪{R,RR,RRR,…}
= L(R*)
L(ɛ+R*.R)=L(ɛ)∪L(R*.R)
= {ɛ}∪{R,RR,RRR,…}
= L(R*)
Since L(ɛ+R.R*)=L(ɛ+R*.R)=L(R*)
Therefore ɛ+R.R*= ɛ+R*.R=R*
(R*)* = R* L((R*)*)= L(R*))*
={(L(R)*}*
=(R*)*
={ ɛ,R,RR,…}*
={ ɛ,R,RR,…}
=R*
L(R*)=(L(R))*
=R*
Since L((R*)*)= L(R*)
Therefore (R*)* = R*

Algebraic laws of Regular Expressions:
1. Commutative law: R + S = S + R
2. Associative law : (R+S)+T = R+(S+T)
(R.S).T = R.(S.T)
3. Distributive law : (R+S)T = RT+ST
(S+T)R = SR+TR
4. Demorgan’s law : (R+S)* = (R*.S*)* = (R*+S*)*
(R.S)* = (R*.S*)* = (R*+S*)*
5. Other laws: (RS)*R = R(SR)*
Equivalence of two regular expressions: For every regular expression, there is an equivalent
finite automata. Two regular expressions are said to be equivalent, if same finite automata is
constructed for both of the regular expressions.
Finite Automata, and Regular Expressions:
Finite Automata to Regular Expressions:
Arden’s theorem: It is used to construct regular expression from finite automata.
Statement: If P, Q are two Res, and if P does not contain epsilon ɛ, then equation
R=Q+RP has a unique solution given by R=QP*.
Steps to convert finite automata to regular expression:
1. Write the equation for each and every state using the following notation
Name of state = (state from which input is coming) (input symbol on the transition)
2. While writing the equation for starting state, include ɛ.
3. Make substitutions and apply arden’s theorem wherever possible to obtain a
regular expression.
Regular Expression to Finite Automata:
There are 2 ways to convert regular expression to finite automata. They are
1. Top-down approach
2. Bottom-up approach
Top-down approach: FA construction starts from RE and ends by reaching a single
element on transition.
Bottom-up approach: FA construction starts from basic element and ends on reaching RE.
Steps to covert regular expression to finite automata:

Top-down approach:
1 Take 2 states – start state q0 and final state qf.
2 Make transition between these two states and place the
given regular expression on the transition.
3 If regular expression is in the form of RE=R1+R2, then
finite automata is represented as
4 If regular expression is in the form of RE=R1.R2, then finite
automata is represented as
5 If regular expression is in the form of RE=R*, then finite
automata is represented
6 Make use of the above steps until the finite automata
contains only single element on each transition.
Bottom-up approach:
1 Divide the given regular expression into number of
parts (Each part must be in the form of a RE.) and
construct finite automata for each and every part
separately using steps 2 to 7.
2 Take 2 states – start state q0 and final state qf. Make
transition between these two states and place the
regular expression on the transition.
3 If regular expression is in the form of RE=ɛ, then finite
4 If regular expression is in the form of RE=a, then finite
5 If regular expression is in the form of RE=R+S, then
6 If regular expression is in the form of RE= R.S, then
7 If regular expression is in the form of RE=R*, then
finite automata is represented
8 connect all the finite automata with an epsilon
transition to obtain resultant finite automata.

Pumping Lemma for regular languages:
Every language is not regular. To show that L is not regular, pumping lemma is used.
Pumping means “generating”. Pumping lemma means “it is a method of generating many
input strings from a given string.” i.e., it is used to break a given long input string into
several substrings.”
Statement of pumping lemma:
Let 𝑴 = (𝑸, 𝚺, 𝜹, 𝒒𝟎, 𝑭) be a finite automata with ‘n’ number of states.
1. Assume that the given language L is regular accepted by finite automata M.
2. Choose a string w𝜖 L such that |w| ≥ n. i.e., length of string ≥ number of states.
3. Break the string into 3 strings x,y,z, i.e., w=xyz such that |xy| ≤ n and y ≠ ɛ.
4. By using pumping lemma, pump y to the suitable integer ‘i’ such that xyi
z ∉ L for
i≥0. This contradicts our assumption.
5. Therefore L is not regular.
Closure Properties of Regular Languages:
Closure property: A set is closed under an operation if applying that operation to elements
of the set results in an element of the set.
For example, the set of natural numbers is closed under addition and multiplication but not
under subtraction or division because 2-3 is negative and 1/2 is not an integer. The set of
integers is closed under addition, subtraction and multiplication but not under division.
Regular sets are closed under the following:
1. Union
2. Concatenation
3. Complementation
4. Intersection
5. Kleene closure
6. Substitution
7. Homomorphism
8. Inverse homomorphism
9. Quotient operation
Union: The class of regular languages is closed under union.
Statement: If L1 , L2 are two regular languages, then L1 ∪ L2 is also regular.
Proof: let r1 is a regular expression denoting L1=L(r1) and let r2 is a regular expression denoting
L2 =L(r2)
L1 ∪ L2 = L(r1) ∪ L(r2)
=L(r1+r2)
According to recursive definition of regular expression, if r1 and r2 are two regular expressions,
then r1+r2 is also a regular expression.
Since r1+r2 is a regular expression and L1 ∪ L2 = L(r1+r2), therefore L1 ∪ L2 is also a regular.
Concatenation: The class of regular languages is closed under concatenation.
Statement: If L1 , L2 are two regular languages, then L1 . L2 is also regular.
Proof: let r1 is a regular expression denoting L1=L(r1) and
let r2 is a regular expression denoting L2=L(r2)
L1 . L2 = L(r1) . L(r2)
=L(r1.r2)
According to recursive definition of regular expression, if r1 and r2 are two regular expressions,
then r1.r2 is also a regular expression.

Since r1.r2 is a regular expression and L1 . L2 = L(r1.r2), therefore L1 . L2 is also a regular.
Complementation: The class of regular languages is closed under Complementation.
Statement: If L(M) is a regular language, then L(M’) is also regular ( 𝑙 ̅= ∑* - L)
Proof: Let L(M) is regular and let M=(Q,∑,δ,q0,F) be a DFA for L(M) and
M’=(Q,∑,δ,q0,Q-F) be a DFA for L(M’), complement of L.
If we construct a DFA for L(M), then w ∈ L, if w leads to final state in M.
This implies that in M’, w leads to the same state but this state is non-accepting state in M’.
Therefore, M’ rejects w. Similarly, if w ∉ L, then M’ accepts w.
Therefore, L(M’) = L(M).
Since L(M) is a regular language, therefore L(M’) is also regular
Intersection: The class of regular languages is closed under Intersection.
Statement: If L1 , L2 are two regular languages, then L1 ∩ L2 is also regular.
Proof: let r1 is a regular expression denoting L1=L(r1) and
let r2 is a regular expression denoting L2=L(r2)
L1 ∩ L2 = (𝑙̅1 ∪ 𝑙̅2 ̅̅̅̅̅̅̅̅̅
)
According to closure properties of regular languages,
since L1 and L2 are two regular languages, its complement 𝑙̅1 and 𝑙̅2 are also regular.
since 𝑙̅1 and 𝑙̅2 are two regular languages, its union 𝑙̅1 ∪ 𝑙̅2 is also regular.
since 𝑙̅1 ∪ 𝑙̅2 are two regular languages, its complement 𝑙̅1 ∪ 𝑙̅2 ̅̅̅̅̅̅̅̅̅
is also regular.
Therefore L1 ∩ L2 is also regular.
Kleene Closure: The class of regular languages is closed under kleene closure.
Statement: If L1 is a regular language, then L1* is also regular.
Proof: let r1 is a regular expression denoting L1=L(r1) and L1* = (L(r1))*
=L(r1*)
According to recursive definition of regular expression, if r1 is a regular expression, then
r1* is also a regular expression.
Since r1* is a regular expression and L1* = L(r1*), therefore L1* is also a regular.
Substitution: The class of regular languages is closed under substitution.
Statement: If L1 is a regular language with some alphabet ∑, then L1 is also regular with a
unique replacement of ∑ with ∑1.
Proof: let ∑={a,b}, then a is a regular expression denoting L1={a}
and b is a regular expression denoting L2={b}
therefore L= L1.L2=ab is a regular language over ∑={a,b}.
let ∑1={0,1}, on substituting ‘a’ with ‘0’ and ‘b’ with ’1’.
L=01 is also regular.
Homomorphism: The class of regular languages is closed under homomorphism.
Statement: If L1 is a regular language with some alphabet ∑, and if strings in L1are replaced
with other strings, then resultant language is also regular.
Proof: let ∑={a,b}, then aba is a regular expression denoting L1={aba} and (aa)* is a regular
expression denoting L2={(aa)*} therefore L= L1.L2=aba(aa)* is a regular language over
∑={a,b}. on substituting ‘aba’ with ‘0’ and ‘aa’ with ’1’.
h(aba)=0
h(aa)=1
then 01* is also regular.

Inverse Homomorphism: The class of regular languages is closed under inverse homomorphism.
Statement: It works by applying homomorphism backwards
Proof: let h(aba)=0
h(aa)=1
then h(01)=aba(aa)
h-1
(abaaa)=01 and h-1
((abaaa + (aa)*)*)=(01+1*)*
Quotient of Language: The class of regular languages is closed under quotient of a
language.
Statement: It works by applying homomorphism backwards
Proof: If L1,L2 are two regular languages, then Quotient denoted by L1/L2 is also regular.
L1/L2={x|∃y in L2 and ∋ xy is in L1}
e.g: Let L1=0*10*, L2=0*1, then L1/L2=0*10*/0*1=0*
Applications of Regular Expressions are:
• Extracting emails from a Text Document
• Regular Expressions for Web Scraping (Data Collection)
• Working with Date-Time features
• Using Regex for Text Pre-processing (NLP)
Finite Automata and regular grammars:
Grammar: Grammar is a set of rules used to define a valid sentence in any language.
Mathematical representation of grammar:
Grammar is defined as a 4-tuple G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules
S – Start symbol
“Non terminals” are represented by capital letters A, B, C,… so on.
“Terminals” are represented by small letters a, b, c,… so on.
“Production rules” are of the form α → β, where
α contains non-terminal symbols or may also contain both terminals and non-terminals with atleast
one non-terminal.
β contains terminal, non-terminal symbols or any combination of both terminals and non-terminals.
Types of grammar:
1. Linear grammar-
Left linear grammar and Right linear grammar
2. Non-Linear grammar
3. Simple grammar
4. Recursive grammar
5. Regular grammar
Linear grammar: A grammar with atmost one variable (non-terminal) at right side of
production rule is called linear grammar.
Example: S→aSb S→A S→Ab
S→ɛ A→aB|ɛ A→aAb|ɛ
B→Ab
Non linear grammar: A grammar with more than one variable (non terminal) at the right
side of the production is called non linear grammar.
Example: S→SS S→aA
S→ɛ A→ɛ
S→aSb S→aSAS
S→bSa

Simple or S- Grammar: A grammar is said to be S-Grammar if all the productions are of the form
A→ax, where A ∈V (variables)
a ∈T (terminals)
x ∈ V* (0 or more variables) and any pair (A,a) occurs atmost once in production p.
Example: S→aS
S→bS
S
S→c
S→aAS|a
A→SbA
S→aS S→bSS
S→aSS
S→c
(s,a), (s,b) occur only once (s,a) occur only once (s,a) occur twice
∴ s-grammar ∴ s-grammar ∴ not s-grammar
Recursive grammar: A grammar is said to be recursive grammar, if it contains either a
recursive production or an indirectly recursive production.
Example: S→aS S→aA|b S→SS
A→bS|c S→aSb
S→ɛ
Left linear grammar: A grammar G=(V,T,P,S) is said to be left linear grammar if all the
productions are of the form A→Bx or A→x, where A,B ∈ V (variables)
x ∈ T* (0 or more terminals)
i.e., RHS non terminal is at left end.
Example: S→Ab S→Aab
S→Sb A→Aab
S→ɛ A→B
B→a
Right linear grammar: A grammar is said to be right linear grammar if all the productions
are of the form A→xB or A→x, where A,B ∈ V (variables)
x ∈ T* (0 or more variables)
i.e., RHS non terminal is at right end.
Example: S→abS
S→a
S→A
A→aA
A→ɛ
Regular Grammar: A grammar G=(V,T,P,S) is said to be regular grammar, if it is either right
linear grammar or left linear grammar or nothing. i.e., productions are in the form of A→xB or
A→Bx or A→x or A→ɛ, where A,B ∈ V(variables)
x ∈ T* (0 or more terminals)
i.e., LHS contains a single non terminal and
RHS contains only terminals or single non terminal followed by terminals or
terminals followed by single non terminal or nothing.
Example: S→abS
S→a
S→Aab
A→Aab|B
B→a

Relation between Regular Grammar and Finite Automata:
Conversion between regular grammar and FA:
1. RLG to FA
2. FA to RLG
3. RLG to LLG
4. LLG to RLG
5. LLG to FA
6. FA to LLG
1. RLG to FA:
1. Let G=(V,T,P,S) be the RLG and M be the FA.
2. Start symbol of G is the start symbol of M.
3. Each variable in RLG will be a state in FA.
4. If variable has an ɛ-production A→ɛ, then that state A will become final state,
otherwise a new final state will be introduced.
5. If production in G is of the form A→a1 a2 a3… anB, then maketransition
6. If production in G is of the form A→a1 a2 a3… an, then make transition introducing a
new final state.
2.FA to RLG:
1. Let G=(V,T,P,S) be the RLG and M be the FA.
2. Start symbol of M is the start symbol of G.
3. For each transition , write the production
4. Now combine all the productions to form a RLG.
5. Make substitutions if possible to minimize the grammar.
3.RLG to LLG:
1. Let L be RLG. Construct FA for given RLG.
2. Replace the start state with final state and vice versa and reverse the direction of all
the edges.
3. Construct grammar from step2
4. Reverse the right side of each production of RLG obtained in step3.
5. Resultant grammar is LLG.
4.LLG to RLG:
1. Reverse the right side of each production of LLG
2. Construct FA for obtained grammar in step1.
3. Replace the start state with final state and vice versa and reverse the direction of all
the edges.
4. Construct grammar from FA obtained in step3
5. Resultant grammar is RLG.
5.LLG to FA:
LLG → RLG → FA
6.FA to LLG:
FA → RLG → LLG

Ex.1: Construct the FA for regular expression 0*1 + 10.
Solution: We will first construct FA for R = 0*1 + 10 as follows:
Step 1:
Step 2:
Step 3:
Step 4:
Ex.2: Design a NFA from given regular expression 1 (1* 01* 01*)*.
Solution: The NFA for the given regular expression is as follows:
Step 1:
Step 2:

Step 3:
Regular Expressions and Regular Grammars:
Conversion between regular grammar and regular expression:
1. RG to RE
2. RE to RG
1.RG to RE:
1. Let G=(V,T,P,S) be the RLG, replace → symbol in the production with ‘=’ symbol.
2. If there is a equation of the form A= aA | b, then convert it into A=a*b
3. By applying substitutions, compute the expression for the starting symbol.
2.RE to RG:
RE → FA → RLG
-----------------------------------------------------------------------------------------------------------
The following intuitive rules to convert basic/minimal regular expressions directly to regular
grammar (RLG for Right Linear Grammars, LLG for Left Linear Grammars):

Example: To find RLG and LLG for regular expression(regex)-- 0∗(1(0+1))∗0∗(1(0+1))∗.
At each step, same color is used to match part of regex getting translated into corresponding
part in grammar.
Preparing RLG
Notice that rule in first table says e∗e∗ gets translated to RLG productions taking
form S→eS|ϵS→eS|ϵ.
However in example above, we can see that, to emulate * in regex, we have to put ϵϵ in step 2 (green
color) as indicated by the rule, but also need to add bunch of other stuff in step 4 (blue color), which
are not directly indicated by the rule (though somewhat intuitive extension of the rule).
Preparing LLG
While preparing LLG also, we can see that, to emulate * in regex, we have to put ϵϵ (green colored) in
step 1 as indicated by the rule, but also need to add bunch of other stuff in step 2 (green colored),
which are not directly indicated by the rule (though again somewhat intuitive extension of the rule).
Apart from star closure, there are many stuff that we don’t find straight forward or at least requiring
extra awareness (that cannot be easily put in step-by-step procedure) while preparing grammar.
For example, while preparing RLG, to emulate 0∗0∗, I can do S→0S|ϵS→0S|ϵ as indicated by first
table. But in above example, We have to remain extra aware that there is something
more ((1(0+1))∗)((1(0+1))∗) after 0∗0∗ which forces us to put AA in S→0S|A|ϵS→0S|A|ϵ production
in step 2. Other facts we should be aware of:
• While preparing RLG, we should start from left of the regex
• While preparing LLG, we should start from right of the regex

Flat unit 2

More Related Content

What's hot (20)

Similar to Flat unit 2 (20)

Recently uploaded (20)

Flat unit 2