EDUCATION HOLE PRESENTS
THEORY OF AUTOMATA & FORMAL LANGUAGES
Unit-III
Arden Theorem
Pumping Lemma for regular expressions
Use of lemma
Proof of the pumping lemma
General version of pumping lemma for regular languages
Converse of lemma not true
Myhill-Nerode theorem
Context free grammar: Ambiguity
Recognizing ambiguous grammars
Inherently ambiguous languages
Simplification of CFGs
Normal forms for CFGs
Pumping lemma for CFLs
Usage of the lemma
Ambiguous to Unambiguous CFG
Arden Theorem
In theoretical computer science, Arden's rule, also known as Arden's lemma, is a mathematical
statement about a certain form of language equations. Let P and Q be two regular expressions over
Σ. If P does not contain Λ, then the equation
R = Q + RP
has a unique (one and only one) solution, R = QP*.
Proof:
The theorem makes the following statements.
(i) P and Q are two regular expressions.
(ii) P does not contain the symbol Λ.
(iii) R = Q + RP has a solution, namely R = QP*.
(iv) This solution is the one and only solution of the equation.
To show that R = QP* is a solution of the equation R = Q + RP, substitute R = QP* into the
right-hand side:
Q + RP = Q + QP*P = Q(Λ + P*P) = QP* = R,
using the identity Λ + P*P = P*. Both sides agree, so it is proved that R = QP*
is a solution of the equation R = Q + RP.
For uniqueness, substitute R = Q + RP into itself repeatedly:
R = Q + QP + QP^2 + ... + QP^n + RP^(n+1).
Since P does not contain Λ, every string in RP^(n+1) has length greater than n. A string w of
length n therefore belongs to R exactly when it belongs to Q(Λ + P + ... + P^n), that is, exactly
when it belongs to QP*. Hence any solution R equals QP*.
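The substitution step can also be checked mechanically on small examples. The following is a minimal sketch, not part of the original notes: it models languages as finite Python sets of strings truncated at a length bound, with the empty string playing the role of Λ. The values of Q and P and the names `concat`, `star` and `MAX_LEN` are assumptions chosen for illustration.

```python
MAX_LEN = 6  # only strings up to this length are compared (illustrative bound)

def concat(A, B):
    """Concatenation of two finite languages, truncated at MAX_LEN."""
    return {a + b for a in A for b in B if len(a) + len(b) <= MAX_LEN}

def star(P):
    """Kleene star of P, truncated at MAX_LEN."""
    result = {""}      # "" plays the role of Lambda
    frontier = {""}
    while frontier:
        # extend only the strings found in the previous round
        frontier = concat(frontier, P) - result
        result |= frontier
    return result

Q = {"b"}          # hypothetical example language
P = {"a", "ab"}    # hypothetical example language; it does not contain ""

R = concat(Q, star(P))      # candidate solution R = QP*
rhs = Q | concat(R, P)      # right-hand side Q + RP

print(R == rhs)
```

Because both sides are truncated at the same length, the comparison only confirms the identity for strings of length at most `MAX_LEN`; it illustrates the theorem and does not replace the proof.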
Pumping Lemma for regular expressions
Let L be a regular language. Then there exists an integer p ≥ 1 depending only on L such that
every string w in L of length at least p (p is called the "pumping length") can be written as w =
xyz (i.e., w can be divided into three substrings), satisfying the following conditions:
1. |y| ≥ 1;
2. |xy| ≤ p;
3. for all i ≥ 0, xy^i z ∈ L.
y is the substring that can be pumped (removed or repeated any number of times, with the
resulting string always in L). Condition (1) means the loop y to be pumped must be of length at
least one; condition (2) means the loop must occur within the first p characters. It follows from
(1) and (2) that |x| must be smaller than p; apart from that there is no restriction on x and z.
In simple words, for any regular language L, any sufficiently long word w in L can be split into
three parts, w = xyz, such that all the strings xy^k z for k ≥ 0 are also in L.
Below is a formal expression of the pumping lemma:
∀ regular L ⊆ Σ*  ∃ p ≥ 1  ∀ w ∈ L with |w| ≥ p  ∃ x, y, z ∈ Σ* :
w = xyz ∧ |y| ≥ 1 ∧ |xy| ≤ p ∧ ∀ i ≥ 0 (xy^i z ∈ L).
Use of lemma
The pumping lemma is often used to prove that a particular language is non-regular: a proof by
contradiction (of the language's regularity) may consist of exhibiting a word (of the required
length) in the language which lacks the property outlined in the pumping lemma.
For example, the language L = {a^n b^n : n ≥ 0} over the alphabet Σ = {a, b} can be shown to be
non-regular as follows. Let w, x, y, z, p, and i be as used in the formal statement for the pumping
lemma above. Let w in L be given by w = a^p b^p. By the pumping lemma, there must be some
decomposition w = xyz with |xy| ≤ p and |y| ≥ 1 such that xy^i z is in L for every i ≥ 0. Using
|xy| ≤ p, we know y consists only of instances of a. Moreover, because |y| ≥ 1, it contains at
least one instance of the letter a. We now pump y up: xy^2 z has more instances of the letter a
than the letter b, since we have added some instances of a without adding instances of b.
Therefore xy^2 z is not in L. We have reached a contradiction. Therefore, the assumption that L is
regular must be incorrect. Hence L is not regular.
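The argument can be illustrated by brute force for small values of p. The sketch below is an illustration under stated assumptions, not a proof: it enumerates every decomposition w = xyz of w = a^p b^p with |xy| ≤ p and |y| ≥ 1 and checks whether pumping keeps the string in L for i = 0, ..., 3. The function names and the bound `max_i` are hypothetical.

```python
def in_L(w):
    """Membership in L = {a^n b^n : n >= 0}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def pumpable_splits(w, p, max_i=3):
    """Return all splits (x, y, z) of w with |xy| <= p and |y| >= 1
    such that x y^i z stays in L for every i in 0..max_i."""
    good = []
    for end in range(1, min(p, len(w)) + 1):   # end = |xy|
        for start in range(end):               # start = |x|, so y is nonempty
            x, y, z = w[:start], w[start:end], w[end:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good

for p in range(1, 8):
    w = "a" * p + "b" * p
    # every admissible y consists of a's only, so no split survives pumping
    print(p, pumpable_splits(w, p))
```

For each p tried, the list of surviving decompositions is empty, which matches the contradiction reached in the text.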
The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows
the same idea. Given p, there is a string of balanced parentheses that begins with more than p left
parentheses, so that y will consist entirely of left parentheses. By repeating y, we can produce a
string that does not contain the same number of left and right parentheses, and so it cannot be
balanced.
Proof of the pumping lemma
For every regular language there is a finite state automaton (FSA) that accepts the language. The
number of states in such an FSA is counted, and that count is used as the pumping length p. For
a string of length at least p, let s0 be the start state and let s1, ..., sp be the sequence of the
next p states visited as the first p symbols of the string are read. Because the FSA has only p
states, within this sequence of p + 1 visited states there must be at least one state that is
repeated. Write S for such a state. The transitions that take the machine from the first encounter
of state S to the second encounter of state S match some string. This string is called y in the
lemma. It is nonempty, since at least one transition separates the two encounters, and it ends
within the first p symbols, so |xy| ≤ p. Since the machine will also accept the string without the
y portion, or with y repeated any number of times, the conditions of the lemma are satisfied.
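For example, consider the following FSA.
[Figure: an FSA with four states. Reading a leads from the start state to q1, reading bc leads
from q1 back to q1, and reading d leads from q1 to an accepting state.]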
The FSA accepts the string: abcd. Since this string has a length which is at least as large as the
number of states, which is four, the pigeonhole principle indicates that there must be at least one
repeated state among the start state and the next four visited states. In this example, only q1 is a
repeated state. Since the substring bc takes the machine through transitions that start at state q1
and end at state q1, that portion could be repeated and the FSA would still accept, giving the
string abcbcd. Alternatively, the bc portion could be removed and the FSA would still accept
giving the string ad. In terms of the pumping lemma, the string abcd is broken into an x portion
a, a y portion bc and a z portion d.
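The example can be replayed in code. The transition table below is an assumption: it is one four-state machine consistent with the description (the state names q0, q2 and q3 and the exact transitions are hypothetical, only q1 is named in the text). Missing transitions are treated as rejection. The sketch finds the first repeated state on the accepting run and splits the input there.

```python
# Hypothetical transition table consistent with the description in the text.
DELTA = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q2",
    ("q2", "c"): "q1",
    ("q1", "d"): "q3",
}
START, ACCEPT = "q0", {"q3"}

def run(w):
    """Return the list of visited states, or None if the machine gets stuck."""
    state = START
    trace = [state]
    for ch in w:
        state = DELTA.get((state, ch))
        if state is None:
            return None
        trace.append(state)
    return trace

def accepts(w):
    trace = run(w)
    return trace is not None and trace[-1] in ACCEPT

def split_at_first_repeat(w):
    """Split w into (x, y, z) where y is read between the two earliest
    visits to the first state that repeats on the run."""
    trace = run(w)
    if trace is None:
        return None
    seen = {}
    for pos, state in enumerate(trace):
        if state in seen:
            i, j = seen[state], pos
            return w[:i], w[i:j], w[j:]
        seen[state] = pos
    return None

x, y, z = split_at_first_repeat("abcd")
print(x, y, z)
print([accepts(x + y * i + z) for i in range(4)])
```

With this table the split is x = a, y = bc, z = d, and the pumped strings ad, abcd, abcbcd and abcbcbcd are all accepted.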
General version of pumping lemma for regular languages
If a language L is regular, then there exists a number p ≥ 1 (the pumping length) such that every
string uwv in L with |w| ≥ p can be written in the form
uwv = uxyzv
with strings x, y and z such that |xy| ≤ p, |y| ≥ 1 and
uxy^i zv is in L for every integer i ≥ 0.
This version can be used to prove many more languages are non-regular, since it imposes stricter
requirements on the language.
Converse of lemma not true
Note that while the pumping lemma states that all regular languages satisfy the conditions
described above, the converse of this statement is not true: a language that satisfies these
conditions may still be non-regular. In other words, both the original and the general version of
the pumping lemma give a necessary but not sufficient condition for a language to be regular.
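For example, consider the following language L:
L = {uvwxy : u, y ∈ {0,1,2,3}*; v, w, x ∈ {0,1,2,3} ∧ (v = w ∨ v = x ∨ x = w)}
∪ {w : w ∈ {0,1,2,3}* ∧ precisely 1/7 of the characters in w are 3's}.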
In other words, L contains all strings over the alphabet {0,1,2,3} with a substring of length 3
including a duplicate character, as well as all strings over this alphabet where precisely 1/7 of the
string's characters are 3's. This language is not regular but can still be "pumped" with p = 5.
Suppose some string s in L has length at least 5. Then, since the alphabet has only four
characters, at least two of the first five characters in the string must be duplicates. They are
separated by at most three characters.
• If the duplicate characters are separated by 0 characters, or 1, pump one of the other two
characters in the string, which will not affect the substring containing the duplicates.
• If the duplicate characters are separated by 2 or 3 characters, pump 2 of the characters
separating them. Pumping either down or up results in the creation of a substring of size 3
that contains 2 duplicate characters.
• The second condition of L ensures that L is not regular: i.e., there are an infinite number
of strings that are in L but cannot be obtained by pumping some smaller string in L.
For a practical test that exactly characterizes regular languages, see the Myhill-Nerode theorem.
The typical method for proving that a language is regular is to construct either a finite state
machine or a regular expression for the language.
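The claim that L can be pumped with p = 5 can be spot-checked by exhaustive search over short strings. The sketch below is an illustration only, with hypothetical function names and bounds: it tests membership in L as described in words above, then looks for a string of length 5 to 7 in L that has no decomposition with |xy| ≤ 5 and |y| ≥ 1 surviving pumping for i = 0, ..., 3.

```python
from itertools import product

ALPHABET = "0123"

def in_L(s):
    """s is in L if some length-3 substring contains a duplicate character,
    or if exactly one seventh of the characters of s are 3's."""
    has_window = any(len(set(s[i:i + 3])) < 3 for i in range(len(s) - 2))
    one_seventh_threes = 7 * s.count("3") == len(s)
    return has_window or one_seventh_threes

def can_pump(s, p=5, max_i=3):
    """True if some split s = xyz with |xy| <= p, |y| >= 1 keeps
    x y^i z in L for every i in 0..max_i."""
    for end in range(1, min(p, len(s)) + 1):
        for start in range(end):
            x, y, z = s[:start], s[start:end], s[end:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                return True
    return False

def find_counterexample(max_len=7):
    """Return a string in L of length 5..max_len that cannot be pumped,
    or None if there is none."""
    for n in range(5, max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            s = "".join(chars)
            if in_L(s) and not can_pump(s):
                return s
    return None

print(find_counterexample())
```

The search finds no counterexample in this range, consistent with the case analysis in the text. It says nothing about regularity itself, which is why the pumping lemma cannot be used to prove that a language is regular.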
Myhill-Nerode theorem
parity machine, acceptor accepting {a^n b^m | n, m ≥ 1}, and many more such examples. Consider the
serial adder. After getting some input, the machine can be in ‘carry’ state or ‘no carry’ state. It
does not matter what exactly the earlier input was. It is only necessary to know whether it has
produced a carry or not. Hence, the FSA need not distinguish between each and every input. It
distinguishes between classes of inputs. In the above case, the whole set of inputs can be
partitioned into two classes – one that produces a carry and another that does not produce a carry.
Similarly, in the case of the parity checker, the machine distinguishes between two classes of
input strings: those containing an odd number of 1's and those containing an even number of 1's.
Thus, the FSA distinguishes between classes of input strings. The number of these classes is
finite. Hence, we say that the FSA has a finite amount of memory.
The following three statements are equivalent.
1. L ⊆ Σ* is accepted by a DFSA.
2. L is the union of some of the equivalence classes of a right invariant equivalence relation
of finite index on Σ*.
3. Let equivalence relation RL be defined over Σ* as follows: xRL y if and only if, for all z
∊ Σ*, xz is in L exactly when yz is in L. Then RL is of finite index.
Proof We shall prove (1) ⇒ (2), (2) ⇒ (3), and (3) ⇒ (1).
(1) ⇒ (2)
Let L be accepted by a DFSA M = (K, Σ, δ, q0, F). Define a relation RM on Σ* such that x RM y if
and only if δ(q0, x) = δ(q0, y). RM is an equivalence relation, as seen below.
∀x: x RM x, since δ(q0, x) = δ(q0, x).
∀x, y: x RM y ⇒ y RM x, since δ(q0, x) = δ(q0, y) means δ(q0, y) = δ(q0, x).
∀x, y, z: x RM y and y RM z ⇒ x RM z,
for if δ(q0, x) = δ(q0, y) and δ(q0, y) = δ(q0, z) then δ(q0, x) = δ(q0, z).
So RM divides Σ* into equivalence classes. The set of strings which take the machine from q0 to
a particular state qi forms one equivalence class. The number of equivalence classes is therefore
equal to the number of states of M, assuming every state is reachable from q0. (If a state is
not reachable from q0, it can be removed without affecting the language accepted.) Since M has
finitely many states, RM is of finite index. It can be easily seen that this equivalence relation
RM is right invariant, i.e., if x RM y, then xz RM yz ∀z ∊ Σ*:
δ(q0, x) = δ(q0, y) if x RM y, so
δ(q0, xz) = δ(δ(q0, x), z) = δ(δ(q0, y), z) = δ(q0, yz). Therefore xz RM yz.
L is the union of those equivalence classes of RM which correspond to final states of M.
(2) ⇒ (3)
Assume statement (2) of the theorem and let E be the equivalence relation considered. Let RL be
defined as in the statement of the theorem. We see that xEy ⇒ xRL y.
If xEy, then xzEyz for each z ∊ Σ*. xz and yz are in the same equivalence class of E. Hence, xz
and yz are both in L or both not in L as L is the union of some of the equivalence classes of E.
Hence xRL y.
Hence, any equivalence class of E is completely contained in an equivalence class of RL.
Therefore, E is a refinement of RL and so the index of RL is less than or equal to the index of E
and hence finite.
(3) ⇒ (1)
First, we show RL is right invariant. By definition, x RL y if ∀z in Σ*, xz is in L exactly when
yz is in L. Since every suffix can be written as wz, we can also write this in the following way:
x RL y if for all w, z in Σ*, xwz is in L exactly when ywz is in L.
For any fixed w, this says that xw RL yw.
Therefore, RL is right invariant.
Let [x] denote the equivalence class of RL to which x belongs.
Construct a DFSA ML = (K′, Σ, δ′, q′0, F′) as follows: K′ contains one state corresponding to each
equivalence class of RL. [ε] corresponds to q′0. F′ corresponds to those states [x], x ∊ L. δ′ is
defined as follows: δ′([x], a) = [xa]. This definition is consistent as RL is right invariant.
Suppose x and y belong to the same equivalence class of RL. Then, xa and ya will belong to the
same equivalence class of RL, so
[x] = [y] ⇒ δ′([x], a) = [xa] = [ya] = δ′([y], a).
K′ is finite because RL is of finite index. By induction on |x|, δ′(q′0, x) = [x], and [x] is a
final state in ML, i.e., [x] ∊ F′, exactly when x ∊ L. This automaton ML accepts L.
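The relation RL can be explored experimentally by comparing strings on a bounded set of suffixes. The sketch below is an approximation under stated assumptions, not an exact computation of the index of RL: two prefixes are placed in the same class if they agree on all suffixes up to a chosen length. The function names and the length bounds are hypothetical. For the parity language the count stabilises at 2, while for {a^n b^n : n ≥ 0} it keeps growing as the bounds increase, in line with the language being non-regular.

```python
from itertools import product

def words(alphabet, max_len):
    """All strings over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def count_classes(in_L, alphabet, prefix_len, suffix_len):
    """Count prefixes (up to prefix_len) that are pairwise distinguishable
    by some suffix z (up to suffix_len), i.e. xz in L but yz not in L."""
    suffixes = list(words(alphabet, suffix_len))
    signatures = set()
    for x in words(alphabet, prefix_len):
        # the signature records, for each suffix z, whether xz is in L
        signatures.add(tuple(in_L(x + z) for z in suffixes))
    return len(signatures)

def odd_ones(w):
    """Parity language: strings with an odd number of 1's."""
    return w.count("1") % 2 == 1

def anbn(w):
    """The language {a^n b^n : n >= 0}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

print([count_classes(odd_ones, "01", k, k) for k in (2, 4, 6)])
print([count_classes(anbn, "ab", k, k) for k in (2, 4, 6)])
```

Each distinct signature corresponds to one state [x] of the automaton ML restricted to the strings examined, so for a regular language the count gives a lower bound on the number of states of any DFSA accepting it.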
Context free grammar: Ambiguity
An ambiguous grammar is a formal grammar for which there exists a string that can have more
than one leftmost derivation, while an unambiguous grammar is a formal grammar for which
every valid string has a unique leftmost derivation. Many languages admit both ambiguous and
unambiguous grammars, while some languages admit only ambiguous grammars. Any non-
empty language admits an ambiguous grammar by taking an unambiguous grammar and
introducing a duplicate rule or synonym (the only language without ambiguous grammars is the
empty language). A language that only admits ambiguous grammars is called an inherently
ambiguous language, and there are inherently ambiguous context-free languages. Deterministic
context-free grammars are always unambiguous, and are an important subclass of unambiguous
CFGs; there are non-deterministic unambiguous CFGs, however. For real-world programming
languages, the reference CFG is often ambiguous, due to issues such as the dangling else
problem. If present, these ambiguities are generally resolved by adding precedence rules or other
context-sensitive parsing rules, so the overall phrase grammar is unambiguous.
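Ambiguity can be made concrete by counting parse trees, since each parse tree corresponds to exactly one leftmost derivation. The sketch below uses a hypothetical expression grammar that does not appear in the notes, E → E + E | E * E | a, and counts the parse trees of a string by trying every operator as the top-level split. A count above 1 means the string witnesses the ambiguity of the grammar.

```python
from functools import lru_cache

def count_parses(s):
    """Number of parse trees of s under the hypothetical grammar
    E -> E + E | E * E | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # number of parse trees deriving s[i:j] from E
        total = 1 if s[i:j] == "a" else 0
        # try each operator with a nonempty operand on both sides
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

for s in ["a", "a+a", "a+a*a", "a+a+a+a"]:
    print(s, count_parses(s))
```

For a+a*a the count is 2, one tree for each way of grouping the operators, which is the kind of ambiguity that precedence rules are used to resolve.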
Recognizing ambiguous grammars
The general decision problem of whether a grammar is ambiguous is undecidable because it can
be shown that the Post correspondence problem reduces to it. Nevertheless, there are tools
implementing semi-decision procedures for detecting ambiguity of context-free grammars.
The efficiency of context-free grammar parsing is determined by the automaton that accepts it.
Deterministic context-free grammars are accepted by deterministic pushdown automata and can
be parsed in linear time, for example by the LR parser.[2] They form a subset of the context-free
grammars, which are accepted by pushdown automata and can be parsed in polynomial time,
for example by the CYK algorithm. Unambiguous context-free grammars can be
nondeterministic. For example, the language of even-length palindromes on the alphabet of 0 and
1 has the unambiguous context-free grammar S → 0S0 | 1S1 | ε. An arbitrary string of this
language cannot be parsed without reading all its letters first, which means that a pushdown
automaton has to try alternative state transitions to accommodate the different possible
lengths of a semi-parsed string.[3] Nevertheless, removing grammar ambiguity may produce a
deterministic context-free grammar and thus allow for more efficient parsing. Compiler
generators such as YACC include features for resolving some kinds of ambiguity, such as by
using precedence and associativity constraints.
Inherently ambiguous languages
The existence of inherently ambiguous languages was proven with Parikh's theorem in 1961 by
Rohit Parikh in an MIT research report.
While some context-free languages (the set of strings that can be generated by a grammar) have
both ambiguous and unambiguous grammars, there exist context-free languages for which no
unambiguous context-free grammar can exist. An example of an inherently ambiguous language
is the union of {a^n b^m c^m d^n | n, m > 0} with {a^n b^n c^m d^m | n, m > 0}. This set is
context-free, since the union of two context-free languages is always context-free. But Hopcroft &
Ullman (1979) give a proof that there is no way to unambiguously parse strings in the
(non-context-free) subset {a^n b^n c^n d^n | n > 0} which is the intersection of these two
languages.
Simplification of CFGs
A context-free grammar (CFG) is a formal grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w
can be empty). A formal grammar is considered "context free" when its production rules can be
applied regardless of the context of a nonterminal. It does not matter which symbols the
nonterminal is surrounded by, the single nonterminal on the left hand side can always be
replaced by the right hand side. Languages generated by context-free grammars are known as
context-free languages (CFL). Different Context Free grammars can generate the same context
free language. It is important to distinguish properties of the language (intrinsic properties) from
properties of a particular grammar (extrinsic properties). Given two context free grammars, the
language equality question (do they generate the same language?) is undecidable. Context-free
grammars are important in linguistics for describing the structure of sentences and words in
natural language, and in computer science for describing the structure of programming languages
and other formal languages. In linguistics, some authors use the term phrase structure grammar
to refer to context-free grammars, whereby phrase structure grammars are distinct from
dependency grammars. In computer science, a popular notation for context-free grammars is
Backus–Naur Form, or BNF.
Normal forms for CFGs
If L(G) does not contain ε, then G can have a CNF (Chomsky normal form) with productions only of
type A → BC or A → a, where A, B and C are nonterminals and a is a terminal.
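One practical use of Chomsky normal form, in which every production has the shape A → BC or A → a, is the CYK membership algorithm mentioned earlier. The sketch below is a minimal version under stated assumptions: the grammar is a hypothetical CNF grammar for {a^n b^n : n ≥ 1} (S → AB | AT, T → SB, A → a, B → b), and the dictionaries `UNIT` and `BINARY` are illustrative encodings of its productions.

```python
# Hypothetical CNF grammar for {a^n b^n : n >= 1}:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNIT = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

def cyk(w, start="S"):
    """Return True if the CNF grammar above derives w."""
    n = len(w)
    if n == 0:
        return False  # this grammar has no production deriving the empty string
    # table[i][l] holds the nonterminals that derive the substring w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = set(UNIT.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= BINARY.get((left, right), set())
    return start in table[0][n]

for w in ["ab", "aabb", "aaabbb", "abab", "aab", ""]:
    print(repr(w), cyk(w))
```

The three nested loops over length, start position and split point give the cubic running time in the length of the input that makes CYK a polynomial-time parser.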
Pumping lemma for CFLs
The pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma
that gives a property shared by all context-free languages. If a language L is context-free, then
there exists some integer p ≥ 1 such that every string s in L with |s| ≥ p (where p is a "pumping
length") can be written as
s = uvxyz with substrings u, v, x, y and z, such that
1. |vxy| ≤ p,
2. |vy| ≥ 1, and
3. uv^n x y^n z is in L for all n ≥ 0.
Usage of the lemma
The pumping lemma for context-free languages can be used to show that certain languages are
not context-free. For example, we can show that the language L = {a^i b^i c^i | i > 0} is not
context-free by using the pumping lemma in a proof by contradiction. First, assume that L is
context free. By the pumping lemma, there exists an integer p which is the pumping length of
language L. Consider the string s = a^p b^p c^p in L. The pumping lemma tells us that s can be
written in the form s = uvxyz, where u, v, x, y, and z are substrings, such that |vxy| ≤ p,
|vy| ≥ 1, and uv^n x y^n z is in L for every integer n ≥ 0. By our choice of s and the fact that
|vxy| ≤ p, it is easily seen that the substring vxy can contain no more than two distinct letters.
That is, we have one of five possibilities for vxy:
1. vxy = a^j for some j ≤ p.
2. vxy = a^j b^k for some j and k with j + k ≤ p.
3. vxy = b^j for some j ≤ p.
4. vxy = b^j c^k for some j and k with j + k ≤ p.
5. vxy = c^j for some j ≤ p.
For each case, it is easily verified that uv^n x y^n z does not contain equal numbers of each
letter for any n ≠ 1. Thus, uv^2 x y^2 z does not have the form a^i b^i c^i. This contradicts the
definition of L. Therefore, our initial assumption that L is context free must be false.
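The case analysis can be confirmed by brute force for small pumping lengths. The sketch below is an illustration under stated assumptions, taking the language to be {a^i b^i c^i | i > 0}: it enumerates every decomposition s = uvxyz of s = a^p b^p c^p with |vxy| ≤ p and |vy| ≥ 1 and checks whether pumping keeps the string in the language for n = 0, ..., 3. The function names and the bound `max_n` are hypothetical.

```python
def in_L(w):
    """Membership in {a^i b^i c^i : i > 0}."""
    i = len(w) // 3
    return i > 0 and w == "a" * i + "b" * i + "c" * i

def pumpable_splits(s, p, max_n=3):
    """Return all splits (u, v, x, y, z) of s with |vxy| <= p and |vy| >= 1
    such that u v^n x y^n z stays in the language for n = 0..max_n."""
    good = []
    size = len(s)
    for start in range(size):                                # start = |u|
        for end in range(start + 1, min(start + p, size) + 1):  # vxy = s[start:end]
            for i in range(start, end + 1):                  # v = s[start:i]
                for j in range(i, end + 1):                  # x = s[i:j], y = s[j:end]
                    u, v, x = s[:start], s[start:i], s[i:j]
                    y, z = s[j:end], s[end:]
                    if not v and not y:
                        continue                             # enforce |vy| >= 1
                    if all(in_L(u + v * n + x + y * n + z)
                           for n in range(max_n + 1)):
                        good.append((u, v, x, y, z))
    return good

for p in range(1, 6):
    s = "a" * p + "b" * p + "c" * p
    print(p, pumpable_splits(s, p))
```

No decomposition survives for any p tried, because vxy never spans all three letters, so pumping always unbalances the counts.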
While the pumping lemma is often a useful tool to prove that a given language is not context-
free, it does not give a complete characterization of the context-free languages. If a language
does not satisfy the condition given by the pumping lemma, we have established that it is not
context-free. On the other hand, there are languages that are not context-free, but still satisfy the
condition given by the pumping lemma. There are more powerful proof techniques available,
such as Ogden's lemma, but these techniques also do not give a complete characterization of the
context-free languages.
Ambiguous to Unambiguous CFG
While some context-free languages (the set of strings that can be generated by a grammar) have
both ambiguous and unambiguous grammars, there exist context-free languages for which no
unambiguous context-free grammar can exist. An example of an inherently ambiguous language
is the union of {a^n b^m c^m d^n | n, m > 0} with {a^n b^n c^m d^m | n, m > 0}. This set is
context-free, since the union of two context-free languages is always context-free. But Hopcroft &
Ullman (1979) give a proof that there is no way to unambiguously parse strings in the
(non-context-free) subset {a^n b^n c^n d^n | n > 0} which is the intersection of these two
languages.

More Related Content

PDF
Unit i
PDF
Unit ii
PPTX
Pumping lemma for regular set h1
PPT
Theory of Computation - Lectures 6 & 7
PDF
FLAT Notes
PPTX
1.10. pumping lemma for regular sets
PDF
Automata
PDF
Pumping lemma (1)
Unit i
Unit ii
Pumping lemma for regular set h1
Theory of Computation - Lectures 6 & 7
FLAT Notes
1.10. pumping lemma for regular sets
Automata
Pumping lemma (1)

What's hot (19)

PDF
Chomsky hierarchy
PDF
Flat unit 3
PDF
Flat unit 2
PPTX
Thoery of Computaion and Chomsky's Classification
PDF
Formal Languages and Automata Theory unit 4
PDF
Introduction to the theory of computation
PPTX
Conteext-free Grammer
PPT
Mba admission in india
PPT
Theory of computing
DOC
Chapter 2 2 1 2
PDF
Flat unit 1
PPT
PPT
2. context free langauages
PDF
Formal language & automata theory
PPT
PPT
PPTX
Introduction TO Finite Automata
PPTX
Regular expressions
PPT
3.1 intro toautomatatheory h1
Chomsky hierarchy
Flat unit 3
Flat unit 2
Thoery of Computaion and Chomsky's Classification
Formal Languages and Automata Theory unit 4
Introduction to the theory of computation
Conteext-free Grammer
Mba admission in india
Theory of computing
Chapter 2 2 1 2
Flat unit 1
2. context free langauages
Formal language & automata theory
Introduction TO Finite Automata
Regular expressions
3.1 intro toautomatatheory h1
Ad

Similar to Free Ebooks Download ! Edhole (20)

PDF
Lecture-8-Pumpdndndndndnddndning Lemma.pdf
PPT
hop-chap4.ppt
PDF
PRESENTATION ON NON REGULAR LANGUAGE.pdf
PDF
09.LearningMaterial_Sample.pdf
PDF
Pumping lema topic of theory of automata
PPTX
RegularLanguageProperties.pptx
PPTX
Pumping Lemma________________________________________________________________...
PPT
non regular language updated theory of automata.ppt
PPTX
Pumping lemma
PPT
non regular language and pumping lemma toc
PPTX
AUTOMATA AUTOMATA Automata5Chapter4.pptx
PPTX
pumpexamples.pptx
PPTX
Pumming Lemma
PPT
Class6
PPT
xcjkfvhdfjlkghfkjbnfkbnfgbnklnbknbmcvbnlkcnb
PDF
Regular pumping examples
PDF
Pumping lemma for cfl
PPTX
Pumping lemma Theory Of Automata
PPT
06_PumpingLemma compiler design of chapter 4.ppt
PDF
Regular pumping
Lecture-8-Pumpdndndndndnddndning Lemma.pdf
hop-chap4.ppt
PRESENTATION ON NON REGULAR LANGUAGE.pdf
09.LearningMaterial_Sample.pdf
Pumping lema topic of theory of automata
RegularLanguageProperties.pptx
Pumping Lemma________________________________________________________________...
non regular language updated theory of automata.ppt
Pumping lemma
non regular language and pumping lemma toc
AUTOMATA AUTOMATA Automata5Chapter4.pptx
pumpexamples.pptx
Pumming Lemma
Class6
xcjkfvhdfjlkghfkjbnfkbnfgbnklnbknbmcvbnlkcnb
Regular pumping examples
Pumping lemma for cfl
Pumping lemma Theory Of Automata
06_PumpingLemma compiler design of chapter 4.ppt
Regular pumping
Ad

More from Edhole.com (20)

PPT
Ca in patna
PPT
Chartered accountant in dwarka
PPT
Ca in dwarka
PPT
Ca firm in dwarka
PPT
Website development company surat
PPTX
Website designing company in surat
PPTX
Website dsigning company in india
PPT
Website designing company in delhi
PPT
Ca in patna
PPT
Chartered accountant in dwarka
PPT
Ca firm in dwarka
PPTX
Ca in dwarka
PPTX
Website development company surat
PPT
Website designing company in surat
PPT
Website designing company in india
PPT
Website designing company in delhi
PPT
Website designing company in mumbai
PPT
Website development company surat
PPT
Website desinging company in surat
PPT
Website designing company in india
Ca in patna
Chartered accountant in dwarka
Ca in dwarka
Ca firm in dwarka
Website development company surat
Website designing company in surat
Website dsigning company in india
Website designing company in delhi
Ca in patna
Chartered accountant in dwarka
Ca firm in dwarka
Ca in dwarka
Website development company surat
Website designing company in surat
Website designing company in india
Website designing company in delhi
Website designing company in mumbai
Website development company surat
Website desinging company in surat
Website designing company in india

Recently uploaded (20)

PDF
Complications of Minimal Access Surgery at WLH
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PPTX
master seminar digital applications in india
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
Computing-Curriculum for Schools in Ghana
PPTX
Cell Structure & Organelles in detailed.
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PPTX
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
PDF
01-Introduction-to-Information-Management.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
202450812 BayCHI UCSC-SV 20250812 v17.pptx
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Complications of Minimal Access Surgery at WLH
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
STATICS OF THE RIGID BODIES Hibbelers.pdf
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
master seminar digital applications in india
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Pharmacology of Heart Failure /Pharmacotherapy of CHF
Computing-Curriculum for Schools in Ghana
Cell Structure & Organelles in detailed.
human mycosis Human fungal infections are called human mycosis..pptx
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
01-Introduction-to-Information-Management.pdf
Microbial disease of the cardiovascular and lymphatic systems
Final Presentation General Medicine 03-08-2024.pptx
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
VCE English Exam - Section C Student Revision Booklet
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
202450812 BayCHI UCSC-SV 20250812 v17.pptx
school management -TNTEU- B.Ed., Semester II Unit 1.pptx

Free Ebooks Download ! Edhole

  • 1. EDUCATION HOLE PRESENTS THEORY OF AUTOMATA & FORMAL LANGUAGES Unit-III
  • 2. Arden Theorem........................................................................................................................ 2 Pumping Lemma for regular expressions..........................................................................................3 Use of lemma...................................................................................................................................4 Proof of the pumping lemma ...........................................................................................................4 General version of pumping lemma for regular languages ..................................................................................5 Converse of lemma not true.................................................................................................................................5 My hill-Nerode theorem .......................................................................................................... 6 Context free grammar: Ambiguity........................................................................................... 8 Recognizing ambiguous grammars ...................................................................................................8 Inherently ambiguous languages......................................................................................................9 Simplification of CFGs.............................................................................................................. 9 Normal forms for CFGs................................................................................................................... 10 Pumping lemma for CFLs ....................................................................................................... 11 Usage of the lemma ....................................................................................................................... 
11 Ambiguous to Unambiguous CFG .......................................................................................... 12 Arden Theorem In theoretical computer science, Arden's rule, also known as Arden's lemma, is a mathematical statement about a certain form of language equations. Let P and Q be two Regular Expression s over Σ. If P does not contain Λ, then for the equation R = Q + RP has a unique (one and only one) solution R = QP* . Proof: Now point out the statements in Arden's Theorem in General form.
  • 3. (i) P and Q are two Regular Expressions. (ii) P does not contain Λ symbol. (iii) R = Q + RP has a solution, i.e. R = QP* (iv) This solution is the one and only one solution of the equation. If R = QP* is a solution of the equation R = Q + RP then by putting the value of R in the equation we shall get the value ‘0’. (Putting the value of R in the LHS we get) So from here it is proved that R = QP* is a solution of the equation R = Q + RP. Pumping Lemma for regular expressions Let L be a regular language. Then there exists an integer p ≥ 1 depending only on L such that every string w in L of length at least p (p is called the "pumping length") can be written as w = xyz (i.e., w can be divided into three substrings), satisfying the following conditions: 1. |y| ≥ 1; 2. |xy| ≤ p 3. for all i ≥ 0, xyi z ∈ L y is the substring that can be pumped (removed or repeated any number of times, and the resulting string is always in L). (1) means the loop y to be pumped must be of length at least one; (2) means the loop must occur within the first p characters. |x| must be smaller than p (conclusion of (1) and (2)), apart from that there is no restriction on x and z. In simple words, for any regular language L, any sufficiently long word w (in L) can be split into 3 parts. i.e. w = xyz , such that all the strings xyk z for k≥0 are also in L.
  • 4. Below is a formal expression of the Pumping Lemma. Use of lemma The pumping lemma is often used to prove that a particular language is non-regular: a proof by contradiction (of the language's regularity) may consist of exhibiting a word (of the required length) in the language which lacks the property outlined in the pumping lemma. For example the language L = {an bn : n ≥ 0} over the alphabet Σ = {a, b} can be shown to be non-regular as follows. Let w, x, y, z, p, and i be as used in the formal statement for the pumping lemma above. Let w in L be given by w = ap bp . By the pumping lemma, there must be some decomposition w = xyz with |xy| ≤ p and |y| ≥ 1 such that xyi z in L for every i ≥ 0. Using |xy| ≤ p, we know y only consists of instances of a. Moreover, because |y| ≥ 1, it contains at least one instance of the letter a. We now pump y up: xy2 z has more instances of the letter a than the letter b, since we have added some instances of a without adding instances of b. Therefore xy2 z is not in L. We have reached a contradiction. Therefore, the assumption that L is regular must be incorrect. Hence L is not regular. The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows the same idea. Given p, there is a string of balanced parentheses that begins with more than p left parentheses, so that y will consist entirely of left parentheses. By repeating y, we can produce a string that does not contain the same number of left and right parentheses, and so they cannot be balanced. Proof of the pumping lemma For every regular language there is a finite state automaton (FSA) that accepts the language. The numbers of states in such an FSA are counted and that count is used as the pumping length p. For a string of length at least p, let s0 be the start state and let s1, ..., sp be the sequence of the next p states visited as the string is emitted. 
Because the FSA has only p states, within this sequence of p + 1 visited states there must be at least one state that is repeated. Write S for such a state. The transitions that take the machine from the first encounter of state S to the second encounter of state S match some string. This string is called y in the lemma. It has length at least one, and because both encounters happen within the first p transitions, |xy| ≤ p. Since the machine still accepts when the y portion is skipped, or when y is repeated any number of times, the conditions of the lemma are satisfied.
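The pigeonhole step of this proof can be sketched in Python. The code below runs a deterministic automaton on a string, stops at the first repeated state, and cuts the string into x, y and z there. The four-state machine used here (states q0 to q3, accepting a, then any number of bc, then d) is a hypothetical example, and the names `run`, `pigeonhole_split`, `accepts` and the implicit trap state `DEAD` are assumptions of this sketch.

```python
DEAD = "dead"  # implicit trap state for transitions missing from delta


def run(delta, start, w):
    """Return the list of states visited while reading w (len(w) + 1 entries)."""
    states = [start]
    for symbol in w:
        states.append(delta.get((states[-1], symbol), DEAD))
    return states


def pigeonhole_split(delta, start, w):
    """Split w into (x, y, z) at the first repeated state, or return None."""
    first_seen = {}
    for pos, state in enumerate(run(delta, start, w)):
        if state in first_seen:
            i = first_seen[state]
            return w[:i], w[i:pos], w[pos:]
        first_seen[state] = pos
    return None  # no repeat: w is shorter than the number of states


def accepts(delta, start, finals, w):
    return run(delta, start, w)[-1] in finals


# Hypothetical FSA: q0 -a-> q1, q1 -b-> q2, q2 -c-> q1, q1 -d-> q3 (final).
delta = {("q0", "a"): "q1", ("q1", "b"): "q2",
         ("q2", "c"): "q1", ("q1", "d"): "q3"}

x, y, z = pigeonhole_split(delta, "q0", "abcd")
print(x, y, z)
for i in range(4):
    print(i, x + y * i + z, accepts(delta, "q0", {"q3"}, x + y * i + z))
```

Because the split is taken at the first repetition, the loop y always lies within the first p symbols, where p is the number of states.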
  • 5. For example, consider an FSA with four states, one of them q1 (state diagram not reproduced here), that accepts the string abcd. Since this string has a length which is at least as large as the number of states, which is four, the pigeonhole principle indicates that there must be at least one repeated state among the start state and the next four visited states. In this example, only q1 is a repeated state. Since the substring bc takes the machine through transitions that start at state q1 and end at state q1, that portion could be repeated and the FSA would still accept, giving the string abcbcd. Alternatively, the bc portion could be removed and the FSA would still accept, giving the string ad. In terms of the pumping lemma, the string abcd is broken into an x portion a, a y portion bc and a z portion d.
General version of pumping lemma for regular languages
If a language L is regular, then there exists a number p ≥ 1 (the pumping length) such that every string uwv in L with |w| ≥ p can be written in the form uwv = uxyzv with strings x, y and z such that |xy| ≤ p, |y| ≥ 1 and uxy^i zv is in L for every integer i ≥ 0. This version can be used to prove many more languages are non-regular, since it imposes stricter requirements on the language.
Converse of lemma not true
Note that while the pumping lemma states that all regular languages satisfy the conditions described above, the converse of this statement is not true: a language that satisfies these conditions may still be non-regular. In other words, both the original and the general version of the pumping lemma give a necessary but not sufficient condition for a language to be regular.
For example, consider the following language L:
L = {uvwxy : u, y ∈ {0,1,2,3}*; v, w, x ∈ {0,1,2,3} ∧ (v = w ∨ v = x ∨ x = w)} ∪ {w : w ∈ {0,1,2,3}* ∧ precisely 1/7 of the characters in w are 3's}.
  • 6. In other words, L contains all strings over the alphabet {0,1,2,3} with a substring of length 3 including a duplicate character, as well as all strings over this alphabet where precisely 1/7 of the string's characters are 3's. This language is not regular but can still be "pumped" with p = 5. Suppose some string s has length at least 5. Then, since the alphabet has only four characters, at least two of the first five characters in the string must be duplicates. They are separated by at most three characters.
• If the duplicate characters are separated by 0 characters, or 1, pump one of the other two characters in the string, which will not affect the substring containing the duplicates.
• If the duplicate characters are separated by 2 or 3 characters, pump 2 of the characters separating them. Pumping either down or up results in the creation of a substring of size 3 that contains 2 duplicate characters.
The second condition of L ensures that L is not regular: i.e., there are an infinite number of strings that are in L but cannot be obtained by pumping some smaller string in L.
For a practical test that exactly characterizes regular languages, see the Myhill-Nerode theorem. The typical method for proving that a language is regular is to construct either a finite state machine or a regular expression for the language.
Myhill-Nerode theorem
Examples of finite state machines include the parity machine, an acceptor accepting {a^n b^m | n, m ≥ 1}, and many more. Consider the serial adder. After getting some input, the machine can be in the ‘carry’ state or the ‘no carry’ state. It does not matter what exactly the earlier input was. It is only necessary to know whether it has produced a carry or not. Hence, the FSA need not distinguish between each and every input. It distinguishes between classes of inputs. In the above case, the whole set of inputs can be partitioned into two classes: one that produces a carry and another that does not produce a carry.
Similarly, in the case of the parity checker, the machine distinguishes between two classes of input strings: those containing an odd number of 1’s and those containing an even number of 1’s. Thus, the FSA distinguishes between classes of input strings, and the number of these classes is finite. Hence, we say that the FSA has a finite amount of memory.
The following three statements are equivalent.
1. L ⊆ Σ* is accepted by a DFSA.
2. L is the union of some of the equivalence classes of a right invariant equivalence relation of finite index on Σ*.
3. Let the equivalence relation R_L be defined over Σ* as follows: x R_L y if and only if, for all z ∊ Σ*, xz is in L exactly when yz is in L. Then R_L is of finite index.
Proof
We shall prove (1) ⇒ (2), (2) ⇒ (3), and (3) ⇒ (1).
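Before working through the proof, the relation R_L of statement 3 can be explored experimentally. The Python sketch below is an illustration under stated assumptions, not a decision procedure: it looks only at prefixes and suffixes up to a chosen length, so the number it reports is a lower bound on the index of R_L (two prefixes that some short suffix tells apart are certainly inequivalent). The function names and the length bounds are hypothetical choices.

```python
from itertools import product


def words(alphabet, max_len):
    """Yield all strings over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def nerode_lower_bound(in_lang, alphabet, max_prefix, max_suffix):
    """Count distinct suffix signatures among the short prefixes.

    The signature of x records, for each short suffix z, whether xz is in L.
    Different signatures imply different classes of R_L.
    """
    suffixes = list(words(alphabet, max_suffix))
    signatures = {
        tuple(in_lang(x + z) for z in suffixes)
        for x in words(alphabet, max_prefix)
    }
    return len(signatures)


def odd_ones(s):
    """Parity checker language: an odd number of 1's."""
    return s.count("1") % 2 == 1


def a_n_b_n(s):
    """Non-regular comparison language {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


print(nerode_lower_bound(odd_ones, "01", 4, 3))
for k in (1, 2, 3, 4):
    print(k, nerode_lower_bound(a_n_b_n, "ab", k, 2 * k))
```

For the parity language the count stays at two however large the bounds are made, matching the two classes described above, while for {a^n b^n} it keeps growing as the bounds grow, which is what an infinite index looks like in this experiment.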
  • 7. (1) ⇒ (2)
Let L be accepted by an FSA M = (K, Σ, δ, q0, F). Define a relation R_M on Σ* such that x R_M y if δ(q0, x) = δ(q0, y). R_M is an equivalence relation, as seen below.
∀x, x R_M x, since δ(q0, x) = δ(q0, x).
∀x, y, x R_M y ⇒ y R_M x, since δ(q0, x) = δ(q0, y) means δ(q0, y) = δ(q0, x).
∀x, y, z, x R_M y and y R_M z ⇒ x R_M z, for if δ(q0, x) = δ(q0, y) and δ(q0, y) = δ(q0, z), then δ(q0, x) = δ(q0, z).
So R_M divides Σ* into equivalence classes. The set of strings which take the machine from q0 to a particular state qi is one equivalence class. The number of equivalence classes is therefore equal to the number of states of M, assuming every state is reachable from q0. (If a state is not reachable from q0, it can be removed without affecting the language accepted.) It can be easily seen that this equivalence relation R_M is right invariant, i.e., if x R_M y, then xz R_M yz ∀z ∊ Σ*: if x R_M y, then δ(q0, x) = δ(q0, y), so δ(q0, xz) = δ(δ(q0, x), z) = δ(δ(q0, y), z) = δ(q0, yz). Therefore xz R_M yz. L is the union of those equivalence classes of R_M which correspond to final states of M.
(2) ⇒ (3)
Assume statement (2) of the theorem and let E be the equivalence relation considered. Let R_L be defined as in the statement of the theorem. We see that x E y ⇒ x R_L y: if x E y, then xz E yz for each z ∊ Σ*, so xz and yz are in the same equivalence class of E. Hence, xz and yz are both in L or both not in L, as L is the union of some of the equivalence classes of E. Hence x R_L y. Hence, any equivalence class of E is completely contained in an equivalence class of R_L. Therefore, E is a refinement of R_L, and so the index of R_L is less than or equal to the index of E and hence finite.
(3) ⇒ (1)
First, we show R_L is right invariant. x R_L y if, ∀z in Σ*, xz is in L exactly when yz is in L. We can also write this in the following way: x R_L y if, for all w, z in Σ*, xwz is in L exactly when ywz is in L.
  • 8. If this holds, then xw R_L yw. Therefore, R_L is right invariant.
Let [x] denote the equivalence class of R_L to which x belongs. Construct a DFSA M′ = (K′, Σ, δ′, q′0, F′) as follows: K′ contains one state corresponding to each equivalence class of R_L. [ε] corresponds to q′0. F′ consists of those states [x] with x ∊ L. δ′ is defined as follows: δ′([x], a) = [xa]. This definition is consistent because R_L is right invariant: if x and y belong to the same equivalence class of R_L, then xa and ya also belong to the same equivalence class of R_L, so δ′([x], a) = [xa] = [ya] = δ′([y], a). If x ∊ L, [x] is a final state in M′, i.e., [x] ∊ F′. This automaton M′ accepts L.
Context free grammar: Ambiguity
An ambiguous grammar is a formal grammar for which there exists a string that can have more than one leftmost derivation, while an unambiguous grammar is a formal grammar for which every valid string has a unique leftmost derivation. Many languages admit both ambiguous and unambiguous grammars, while some languages admit only ambiguous grammars. Any non-empty language admits an ambiguous grammar by taking an unambiguous grammar and introducing a duplicate rule or synonym (the only language without ambiguous grammars is the empty language). A language that only admits ambiguous grammars is called an inherently ambiguous language, and there are inherently ambiguous context-free languages. Deterministic context-free grammars are always unambiguous, and are an important subclass of unambiguous CFGs; there are non-deterministic unambiguous CFGs, however.
For real-world programming languages, the reference CFG is often ambiguous, due to issues such as the dangling else problem. If present, these ambiguities are generally resolved by adding precedence rules or other context-sensitive parsing rules, so the overall phrase grammar is unambiguous.
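As an illustration of the definition, the Python sketch below counts the parse trees of a string under one small ambiguous grammar, E → E + E | a. This grammar is an assumed example, not taken from the text above. Each parse tree corresponds to exactly one leftmost derivation, so a count above one for some string shows that the grammar is ambiguous.

```python
from functools import lru_cache


def count_parse_trees(s):
    """Count parse trees of s under the example grammar E -> E + E | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive the substring s[i:j].
        total = 1 if s[i:j] == "a" else 0
        # Try every '+' strictly inside the span as the top-level operator.
        for k in range(i + 1, j - 1):
            if s[k] == "+":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s))


for text in ["a", "a+a", "a+a+a", "a+a+a+a"]:
    print(text, count_parse_trees(text))
```

The string a+a+a has two trees, one grouping the first two operands and one grouping the last two. A precedence or associativity rule of the kind mentioned above picks one of them.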
Recognizing ambiguous grammars
The general decision problem of whether a grammar is ambiguous is undecidable, because it can be shown to be equivalent to the Post correspondence problem. There are, however, tools implementing semi-decision procedures for detecting ambiguity of context-free grammars.
  • 9. The efficiency of context-free grammar parsing is determined by the automaton that accepts it. Deterministic context-free grammars are accepted by deterministic pushdown automata and can be parsed in linear time, for example by the LR parser.[2] They form a subset of the context-free grammars, which are accepted by pushdown automata and can be parsed in polynomial time, for example by the CYK algorithm. Unambiguous context-free grammars can be nondeterministic. For example, the language of even-length palindromes on the alphabet of 0 and 1 has the unambiguous context-free grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without reading all its letters first, which means that a pushdown automaton has to try alternative state transitions to accommodate the different possible lengths of a semi-parsed string.[3] Nevertheless, removing grammar ambiguity may produce a deterministic context-free grammar and thus allow for more efficient parsing. Compiler generators such as YACC include features for resolving some kinds of ambiguity, such as precedence and associativity constraints.
Inherently ambiguous languages
The existence of inherently ambiguous languages was proven with Parikh's theorem in 1961 by Rohit Parikh in an MIT research report.
While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of {a^n b^m c^m d^n | n, m > 0} with {a^n b^n c^m d^m | n, m > 0}. This set is context-free, since the union of two context-free languages is always context-free. But Hopcroft & Ullman (1979) give a proof that there is no way to unambiguously parse strings in the (non-context-free) subset {a^n b^n c^n d^n | n > 0}, which is the intersection of these two languages.
Simplification of CFGs
A context-free grammar (CFG) is a formal grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty). A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal. It does not matter which symbols the nonterminal is surrounded by; the single nonterminal on the left hand side can always be replaced by the right hand side. Languages generated by context-free grammars are known as context-free languages (CFL).
  • 10. Different context-free grammars can generate the same context-free language. It is important to distinguish properties of the language (intrinsic properties) from properties of a particular grammar (extrinsic properties). Given two context-free grammars, the language equality question (do they generate the same language?) is undecidable.
Context-free grammars are important in linguistics for describing the structure of sentences and words in natural language, and in computer science for describing the structure of programming languages and other formal languages. In linguistics, some authors use the term phrase structure grammar to refer to context-free grammars, whereby phrase structure grammars are distinct from dependency grammars. In computer science, a popular notation for context-free grammars is Backus–Naur Form, or BNF.
Normal forms for CFGs
If L(G) does not contain ε, then G can be put in Chomsky normal form (CNF), with productions only of the types A → BC and A → a, where A, B and C are nonterminals and a is a terminal.
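One reason normal forms matter is that a grammar with only productions A → BC and A → a can be fed directly to the CYK algorithm. The following Python sketch shows the idea under stated assumptions: the dictionary encoding of the productions and the example grammar for {a^n b^n : n ≥ 1} (S → AB | AT, T → SB, A → a, B → b) are choices made for this illustration.

```python
def cyk(word, start, unary, binary):
    """Decide membership for a grammar in Chomsky normal form.

    unary:  dict mapping a terminal a to the set of A with A -> a
    binary: dict mapping a pair (B, C) to the set of A with A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # a CNF grammar of this shape cannot derive ε
    # table[i][l - 1] holds the nonterminals deriving word[i:i + l].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b in left:
                    for c in right:
                        table[i][length - 1] |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical CNF grammar for {a^n b^n : n >= 1}.
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

for w in ["ab", "aabb", "aab", "ba"]:
    print(w, cyk(w, "S", unary, binary))
```

The three nested loops over length, start position and split point give the cubic running time usually quoted for CYK.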
  • 11. Pumping lemma for CFLs
The pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages.
If a language L is context-free, then there exists some integer p ≥ 1 such that every string s in L with |s| ≥ p (where p is a "pumping length") can be written as s = uvxyz with substrings u, v, x, y and z, such that
1. |vxy| ≤ p,
2. |vy| ≥ 1, and
3. uv^n xy^n z is in L for all n ≥ 0.
Usage of the lemma
The pumping lemma for context-free languages can be used to show that certain languages are not context-free. For example, we can show that the language L = {a^n b^n c^n | n > 0} is not context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context-free. By the pumping lemma, there exists an integer p which is the pumping length of the language L. Consider the string s = a^p b^p c^p in L. The pumping lemma tells us that s can be written in the form s = uvxyz, where u, v, x, y and z are substrings, such that |vxy| ≤ p, |vy| ≥ 1, and uv^i xy^i z is in L for every integer i ≥ 0.
  • 12. By our choice of s and the fact that |vxy| ≤ p, it is easily seen that the substring vxy can contain no more than two distinct letters. That is, we have one of five possibilities for vxy:
1. vxy = a^j for some j ≤ p.
2. vxy = a^j b^k for some j and k with j + k ≤ p.
3. vxy = b^j for some j ≤ p.
4. vxy = b^j c^k for some j and k with j + k ≤ p.
5. vxy = c^j for some j ≤ p.
For each case, it is easily verified that uv^i xy^i z does not contain equal numbers of each letter for any i ≠ 1. Thus, uv^2 xy^2 z does not have the form a^n b^n c^n. This contradicts the definition of L. Therefore, our initial assumption that L is context-free must be false.
While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free. On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma. There are more powerful proof techniques available, such as Ogden's lemma, but these techniques also do not give a complete characterization of the context-free languages.
Ambiguous to Unambiguous CFG
While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of {a^n b^m c^m d^n | n, m > 0} with {a^n b^n c^m d^m | n, m > 0}. This set is context-free, since the union of two context-free languages is always context-free. But Hopcroft & Ullman (1979) give a proof that there is no way to unambiguously parse strings in the (non-context-free) subset {a^n b^n c^n d^n | n > 0}, which is the intersection of these two languages.