Chapter Two - Regular Expression and Regular languages.pptx

Regular Expression and
Regular languages
Chapter 2

Outline
2.1. Regular expressions
2.2. Connection between regular expression
and regular languages
2.3. Regular grammar
2.4. Pumping lemma

Introduction
• A Regular Expression (RE) is a symbolic method to
describe patterns of strings in a language.
• It represents a regular language, which is a language
accepted by finite automata (either DFA or NFA).
• Regular expressions are essential in:
• Lexical analysis in compilers
• Pattern matching (e.g., grep, regex in Python)
• Describing tokens in programming languages

Formal Definition of Regular
Expressions
I. Base Cases: these are the simplest regular expressions, the
building blocks:
1. ϕ (phi): represents the empty language:
L(ϕ) = ∅
This language contains no strings at all.
2. ε (epsilon): represents the language containing only the
empty string:
L(ε) = {ε}
The empty string is a string of length 0.
3. a, for any symbol a ∈ Σ: represents the language containing
just the string "a":
L(a) = {a}

Cont …
II. Recursive Rules (Building Larger Expressions)
• If r and s are regular expressions representing languages L(r) and L(s),
• Then the following are also regular expressions:
1. Union: r ∪ s
The expression r ∪ s denotes the language:
L(r ∪ s) = L(r) ∪ L(s)
That is, any string in either L(r) or L(s).
2. Concatenation: rs
The expression rs denotes the concatenation of L(r) and L(s):
L(rs) = { xy | x ∈ L(r), y ∈ L(s) }
3. Kleene Star: r*
The expression r* denotes the language containing zero or more
concatenations of strings from L(r):
L(r*) = {ε, w₁, w₁w₂, w₁w₂w₃, ... | each wᵢ ∈ L(r)}
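
The base cases and the three recursive rules can be tried out directly on small sets of strings. The following is a minimal Python sketch, not a standard library API: the function names (union, concat, star) and the max_len cut-off are assumptions made for illustration, since L(r*) is infinite in general and has to be truncated to be printed.

```python
def union(L1, L2):
    """L(r ∪ s) = L(r) ∪ L(s)."""
    return L1 | L2

def concat(L1, L2, max_len):
    """L(rs) = { xy | x in L(r), y in L(s) }, truncated to length <= max_len."""
    return {x + y for x in L1 for y in L2 if len(x + y) <= max_len}

def star(L, max_len):
    """L(r*): zero or more concatenations of strings from L, truncated."""
    result = {""}            # zero concatenations gives the empty string
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from L.
        frontier = concat(frontier, L, max_len) - result
        result |= frontier
    return result

EMPTY = set()      # L(ϕ) = ∅
EPSILON = {""}     # L(ε) = {ε}
A, B = {"a"}, {"b"}

print(sorted(union(A, B)))                    # ['a', 'b']
print(sorted(concat(A, B, max_len=4)))        # ['ab']
print(sorted(star(A, max_len=3), key=len))    # ['', 'a', 'aa', 'aaa']
print(star(EMPTY, max_len=3))                 # {''}
```

Note that the star of the empty language still contains ε, because zero concatenations are always allowed.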

Precedence of Operators
• To interpret expressions correctly without excessive
parentheses, operator precedence is used:
• Kleene Star * – highest precedence
• Concatenation
• Union | or ∪ – lowest precedence
• Example:
• a|bc* is interpreted as a ∪ (b(c*))
• Use parentheses to control grouping:
• (a|b)*abb means: any number of a’s and b’s
followed by a, then b, then b.

Cont …
• To interpret regular expressions without excessive
parentheses, we rely on a standard precedence of operators:
1.Kleene Star (*) – Highest Precedence
•Applies to the symbol or group directly before it.
•Example: a* means zero or more occurrences of a.
2.Concatenation – Medium Precedence
•Joins two patterns end to end.
•Example: ab* is interpreted as a(b*), not (ab)*.
3.Union (| or ∪) – Lowest Precedence
•Represents choice between alternatives.
•Example: a|b* is interpreted as a | (b*), not (a | b)*.
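
Python's re module follows the same precedence for *, concatenation and |, so the groupings above can be checked experimentally. This is a small sketch; the helper name matches is an assumption, and re.fullmatch is used so that the whole string must match the expression.

```python
import re

def matches(pattern, s):
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

# ab* groups as a(b*), not (ab)*
print(matches(r"ab*", "abbb"))     # True
print(matches(r"ab*", "abab"))     # False
print(matches(r"(ab)*", "abab"))   # True

# a|b* groups as a|(b*), not (a|b)*
print(matches(r"a|b*", "bbb"))     # True
print(matches(r"a|b*", "ab"))      # False
print(matches(r"(a|b)*", "ab"))    # True

# a|bc* groups as a|(b(c*))
print(matches(r"a|bc*", "bccc"))   # True
print(matches(r"a|bc*", "ac"))     # False
```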

Algebraic Laws of Regular
Expressions
• Regular expressions follow certain algebraic
identities, helping in simplification:
• Union is commutative: r ∪ s = s ∪ r
• Concatenation is associative: (rs)t = r(st)
• Distributive property: r(s ∪ t) = rs ∪ rt
• Identity element for union: r ∪ ϕ = r
• Identity for concatenation: rε = εr = r
• Kleene star properties:
• r* = ε ∪ r ∪ rr ∪ rrr ∪ ...
• (r*)* = r*
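
Some of these identities can be spot-checked by comparing the languages of both sides on all short strings. The sketch below assumes the alphabet {a, b}, a length bound of 5, and three arbitrary sample expressions r, s, t. The helper names are hypothetical. A bounded check like this can expose a wrong identity but does not prove a correct one.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=5):
    """All strings over alphabet of length <= max_len that fully match pattern."""
    compiled = re.compile(pattern)
    words = ("".join(chars)
             for n in range(max_len + 1)
             for chars in product(alphabet, repeat=n))
    return {w for w in words if compiled.fullmatch(w)}

def same_language(p, q):
    return language(p) == language(q)

# Sample expressions, each wrapped in a non-capturing group so that
# string concatenation of patterns respects the intended grouping.
r, s, t = "(?:ab)", "(?:b*)", "(?:a)"

checks = {
    "union is commutative":         (f"{r}|{s}", f"{s}|{r}"),
    "concatenation is associative": (f"(?:{r}{s}){t}", f"{r}(?:{s}{t})"),
    "distributive property":        (f"{r}(?:{s}|{t})", f"{r}{s}|{r}{t}"),
    "(r*)* = r*":                   (f"(?:{r}*)*", f"{r}*"),
}

for name, (lhs, rhs) in checks.items():
    print(name, same_language(lhs, rhs))   # each line prints True
```

By contrast, same_language("ab", "ba") is False: concatenation is associative but not commutative.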

Exercise
• Write a regular expression for the set
of all strings over the alphabet {a, b}
that start with 'a’.
• Give a brief explanation of how your
regular expression works.
• Answer: Regular Expression: a(a|b)*
• Explanation:
• The string must start with an a.
• The expression (a|b)* means zero or
more occurrences of either a or b.
• So the string starts with a and is
followed by any sequence (including
none) of a’s and b’s.
• Valid examples: a, ab, aabbb, abab, aaaa.
• Write a regular expression for all
strings over {a, b} that end with 'b’.
• Explain how the expression ensures
the last character is 'b’.
• Answer: Regular Expression: (a|b)*b
• Explanation:
• (a|b)* matches any sequence of a's
and b’s.
• The final b ensures that the string
ends with b.
• Valid examples: b, ab, aab, bbab,
bbbb.

Cont …
• Write a regular expression for strings
that contain the substring "ab".
• Explain how your regex ensures "ab"
appears.
• Answer: Regular Expression: (a|b)*ab(a|b)*
• Explanation:
• (a|b)* before and after means any
characters can come before and
after ab.
• The required ab substring must
appear at least once in the string.
• Examples: ab, aab, babab, aaabba,
bbaab.
• Write a regular expression for
strings over {a, b} that contain
exactly two a’s.
• Describe how this restricts the
count of 'a’.
• Answer: Regular Expression: (b*)a(b*)a(b*)
• Explanation:
• a appears exactly twice.
• Between and around the a’s, there
can be any number of b’s (even zero).
• No more than two a’s are allowed.
• Examples: aab, baab, babab, bbaabb.
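
The four exercise answers can be cross-checked against a direct description of each property. The sketch below is illustrative: the table EXERCISES and the helper names are assumptions, and the comparison only covers strings over {a, b} up to length 6.

```python
import re
from itertools import product

# Each exercise: (regular expression, plain-Python description of the property)
EXERCISES = {
    "starts with a":   (r"a(a|b)*",        lambda w: w.startswith("a")),
    "ends with b":     (r"(a|b)*b",        lambda w: w.endswith("b")),
    "contains ab":     (r"(a|b)*ab(a|b)*", lambda w: "ab" in w),
    "exactly two a's": (r"(b*)a(b*)a(b*)", lambda w: w.count("a") == 2),
}

def all_strings(alphabet="ab", max_len=6):
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def disagreements(pattern, predicate, max_len=6):
    """Strings on which the regex and the property description differ."""
    compiled = re.compile(pattern)
    return [w for w in all_strings(max_len=max_len)
            if (compiled.fullmatch(w) is not None) != predicate(w)]

for name, (pattern, predicate) in EXERCISES.items():
    print(name, pattern, disagreements(pattern, predicate))  # empty list each time
```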

2.2. Connection between
regular expression &
regular languages

Introduction
Regular Expressions (RE):
• Symbolic notation to
describe patterns in
strings.
• Built from basic symbols
using operations: union (∪),
concatenation, and
Kleene star (*).
• Example: (a|b)*abb
Regular Languages (RL):
• A class of languages that
can be recognized by
finite automata
(DFA/NFA).
• These are exactly the
languages that can be
described by regular
expressions.

From Regular Languages to Regular
Expressions
• Every regular language can also be represented by a
regular expression.
• Why?
• Regular languages are recognized by DFA/NFA.
• Kleene's Theorem: If a language is recognized by
a finite automaton, then there exists a regular
expression that generates it.
• Conversion: Convert NFA → Regular Expression
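
The equivalence can be illustrated on the earlier example (a|b)*abb. The sketch below compares a hand-built DFA with the regular expression on all strings up to length 8. The transition table is an assumption written for this illustration (state k means "the last k symbols read match the first k symbols of abb"). It is not produced by a conversion algorithm.

```python
import re
from itertools import product

# Hypothetical DFA for "strings over {a, b} ending in abb".
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START, ACCEPTING = 0, {3}

def dfa_accepts(w):
    state = START
    for ch in w:
        state = DFA[state][ch]
    return state in ACCEPTING

def regex_accepts(w):
    return re.fullmatch(r"(a|b)*abb", w) is not None

agree = all(
    dfa_accepts("".join(chars)) == regex_accepts("".join(chars))
    for n in range(9)
    for chars in product("ab", repeat=n)
)
print(agree)  # True
```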

2.3. Regular Grammar
• Regular grammar is a formal grammar used to
describe regular languages, which are the languages
that can be recognized by finite automata.
• There are two standard forms of regular grammar:
• Right-Linear Grammar
• Left-Linear Grammar

Cont …
• A regular grammar is a restricted type of context-
free grammar (CFG) where all production rules follow
specific patterns.
• Formal Definition
• A regular grammar G is a 4-tuple (V,Σ,P,S):
• V: Finite set of non-terminal symbols (e.g., S,A,B)
• Σ: Finite set of terminal symbols (e.g., a,b)
• P: Production rules of specific forms
• S: Start symbol (S ∈ V)

Types of Regular Grammars
I. Right-Linear Grammars
• Rule Forms:
• A → aB (Non-terminal → terminal + non-terminal)
• A → a (Non-terminal → terminal)
• A → ε (Only if A is the start symbol)
II. Left-Linear Grammars
• Rule Forms:
• A → Ba (Non-terminal → non-terminal + terminal)
• A → a (Non-terminal → terminal)
• A → ε (Only if A is the start symbol)
• A language is regular if and only if it can be generated by
a right-linear or left-linear grammar.

I. Right Linear Grammar
• Right Linear Grammars are a special type of CFG, where each
production rule has at most one variable on the RHS and that variable is in
the rightmost position.
• A ⇢ xB
• A ⇢ x
• where A, B ∈ V and x ∈ T* (T is the set of terminals, written Σ in the formal definition)
• Example 1:
• S -> aA | B
• A -> aaB
• B -> bB | a
• Grammar G is right-linear

Cont …
• Example 2: FA for accepting
strings that start with b
• ∑ = {a, b}
• Initial state (q0) = A
• Final state (F) = B
• The RLG corresponding to the FA is
• A ⇢ bB
• B ⇢ ε | aB | bB
• The above grammar is an RLG,
which can be written directly
from the FA.
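
To see that this grammar generates exactly the strings that start with b, its derivations can be enumerated up to a length bound. The following is a minimal sketch. The dictionary encoding of the productions and the function name generate are assumptions, with the empty string standing for ε. It relies on the grammar being right-linear, so the only non-terminal in a sentential form is its last symbol.

```python
from collections import deque
from itertools import product

# A ⇢ bB,  B ⇢ ε | aB | bB   ("" stands for ε)
GRAMMAR = {
    "A": ["bB"],
    "B": ["", "aB", "bB"],
}

def generate(grammar, start, max_len):
    """Terminal strings with at most max_len symbols derivable from start."""
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not form or form[-1] not in grammar:
            results.add(form)          # no non-terminal left: a finished string
            continue
        prefix, variable = form[:-1], form[-1]
        for rhs in grammar[variable]:
            new_form = prefix + rhs
            terminals = sum(1 for ch in new_form if ch not in grammar)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results

generated = generate(GRAMMAR, "A", max_len=4)
expected = {
    "".join(chars)
    for n in range(5)
    for chars in product("ab", repeat=n)
    if "".join(chars).startswith("b")
}
print(generated == expected)  # True
```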

II. Left Linear Grammar
• Left Linear Grammars are a special type of CFG, where each
production rule has at most one variable on the RHS and that variable
is in the leftmost position.
• A ⇢ Bx
• A ⇢ x
• where A, B ∈ V and x ∈ T* (T is the set of terminals, written Σ in the formal definition)
• Example:
• A -> Da | Bc | b
• B -> Bf | Ca | a
• C -> Ca | D
• D -> ε
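
Whether a set of productions is right-linear or left-linear can be checked mechanically from the position of the single variable on each right-hand side. The sketch below applies such a check to the left-linear example above and to Example 1 of the right-linear slide. The dictionary encoding and the function name classify are assumptions, with the empty string standing for ε and the dictionary keys taken as the non-terminals.

```python
def classify(grammar):
    """Return 'right-linear', 'left-linear', 'both' or 'neither'."""
    right = left = True
    for alternatives in grammar.values():
        for rhs in alternatives:
            positions = [i for i, ch in enumerate(rhs) if ch in grammar]
            if len(positions) > 1:
                return "neither"       # more than one variable on the RHS
            if positions:
                if positions[0] != len(rhs) - 1:
                    right = False      # variable is not rightmost
                if positions[0] != 0:
                    left = False       # variable is not leftmost
    if right and left:
        return "both"
    if right:
        return "right-linear"
    if left:
        return "left-linear"
    return "neither"

LEFT_EXAMPLE = {
    "A": ["Da", "Bc", "b"],
    "B": ["Bf", "Ca", "a"],
    "C": ["Ca", "D"],
    "D": [""],
}
RIGHT_EXAMPLE = {
    "S": ["aA", "B"],
    "A": ["aaB"],
    "B": ["bB", "a"],
}

print(classify(LEFT_EXAMPLE))    # left-linear
print(classify(RIGHT_EXAMPLE))   # right-linear
```

A grammar that mixes the two forms, such as S -> aS | Sb, is reported as 'neither': every rule must use the same side for the grammar to count as regular in this sense.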

2.4. Pumping lemma and
non-regular language
grammars

Introduction
• The pumping lemma is used to prove that a language is not regular.
• What are regular languages and what are non-regular languages?
• In previous lessons, we learned that regular languages are
exactly those that can be described by regular expressions or
recognized by finite automata (DFA/NFA).
• All regular languages can be described using regular expressions.
• However, not all languages are regular.
• To prove that a language is not regular, we use a powerful tool
called the Pumping Lemma.

What is the Pumping Lemma?
• If a language A is regular, then there exists a
pumping length p ≥ 1 such that any string s ∈ A, where
|s| ≥ p, can be split into three parts:
• s = xyz
• such that the following three conditions are true:
• xyⁱz ∈ A for all i ≥ 0
• |y| ≥ 1
• |xy| ≤ p

Cont …
• Let’s break them down:
• xyz is the original string.
• y is the part that can be repeated (pumped).
• Condition 1 ensures that no matter how many times we
repeat y (including zero times), the new string stays in the
language.
• Condition 2 ensures that we are actually repeating
something (not the empty string).
• Condition 3 limits the position of the pumpable section y
— it must be within the first p characters.
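
For a language that is regular, a suitable split can be found by trying every candidate. The sketch below uses the earlier language (a|b)*abb and assumes p = 4, the number of states of a four-state DFA for that language. The helper names are hypothetical, and condition 1 is only tested for i = 0..5, so this illustrates the lemma rather than proving it.

```python
import re

def in_language(w):
    return re.fullmatch(r"(a|b)*abb", w) is not None

def splits(s, p):
    """All splits s = xyz with |y| >= 1 and |xy| <= p (conditions 2 and 3)."""
    for end in range(1, min(p, len(s)) + 1):   # end = |xy|
        for start in range(end):               # start = |x|
            yield s[:start], s[start:end], s[end:]

def find_pumpable_split(s, p, member, max_i=5):
    """First split whose pumped strings x y^i z stay in the language."""
    for x, y, z in splits(s, p):
        if all(member(x + y * i + z) for i in range(max_i + 1)):
            return x, y, z
    return None

print(find_pumpable_split("babb", 4, in_language))  # ('', 'b', 'abb')
```

Here y = b can be repeated any number of times in front of abb, and the result still ends in abb.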

Proof by contradiction
• How do we use the Pumping Lemma to prove that a language is not
regular?
• We use a proof by contradiction:
• Assume the language A is regular.
• Then there must exist a pumping length p.
• Choose a string s ∈ A such that |s| ≥ p.
• Consider every way to split s into xyz with |y| ≥ 1 and |xy| ≤ p.
• For each such split, find a value of i such that the pumped string xyⁱz ∉ A.
• This contradicts the lemma, so the language cannot be regular.

Summary Table
Concept | Meaning
Pumping Lemma | A property that all regular languages must satisfy
Pumping Length (p) | The length beyond which strings can be pumped
Goal | Show that no matter how a long string is split, the conditions will fail
Outcome | If conditions fail → contradiction → language is not regular

Example
• Example 1: Language L1 = {aⁿbⁿ | n ≥ 0}. This
language contains strings with some number of a’s followed
by an equal number of b’s
• (e.g., ab, aabb, aaabbb, etc.).
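
The usual argument for L1 picks s = aᵖbᵖ. Because |xy| ≤ p, the part y lies entirely inside the leading a's, so pumping with i = 2 adds a's without adding b's and the result leaves L1. The sketch below checks this for a few values of p. The helper names are assumptions, and testing finitely many p only illustrates the argument, which in a proof must hold for every p.

```python
def in_L1(w):
    """Membership in L1 = { a^n b^n | n >= 0 }."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def splits(s, p):
    """All splits s = xyz with |y| >= 1 and |xy| <= p."""
    for end in range(1, min(p, len(s)) + 1):
        for start in range(end):
            yield s[:start], s[start:end], s[end:]

def surviving_splits(s, p, i=2):
    """Splits for which the pumped string x y^i z is still in L1."""
    return [(x, y, z) for x, y, z in splits(s, p) if in_L1(x + y * i + z)]

for p in range(1, 8):
    s = "a" * p + "b" * p
    print(p, surviving_splits(s, p))   # an empty list for every p
```

Since no split survives, the assumption that L1 is regular contradicts the pumping lemma.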

End Of Chapter
Instructor: Biniyam E.
