1
SPECIFICATION OF
TOKENS
2
Strings and Languages
• Regular Expressions are an important notation for specifying patterns.
• Alphabet – any finite set of symbols
e.g., ASCII, the binary alphabet {0,1}, Unicode, EBCDIC, Latin-1
• String – A finite sequence of symbols drawn from an alphabet
– e.g., banana (a string over the ASCII alphabet)
– Length of a string s: |s| (e.g., |banana| = 6)
– Empty string: ε, the string of length 0
• Other terms relating to strings: prefix; suffix; substring; proper prefix,
suffix, or substring (non-empty, not entire string); subsequence
• Language – A set of strings over a fixed alphabet
3
Languages
• A language, L, is simply any set of strings over a
fixed alphabet.
Alphabet                               Languages
{0,1}                                  {0,10,100,1000,100000,…}
                                       {0,1,00,11,000,111,…}
{a,b,c}                                {abc,aabbcc,aaabbbccc,…}
{A,…,Z}                                {FOR,WHILE,GOTO,…}
{A,…,Z,a,…,z,0,…,9,+,-,…,<,>,…}        {all legal Pascal programs}
Special languages:
 ∅ – the empty language, which contains no strings
 {ε} – the language containing only the empty string ε
4
String operations
• Given String: banana
• Prefix : ban, banana
• Suffix : ana, banana
• Substring : nan, ban, ana, banana
• Subsequence: bnan, nn
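• Proper prefix and suffix: a prefix or suffix that is neither ε nor the whole string (e.g., ban is a proper prefix and ana a proper suffix of banana)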
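The string terms above can be made concrete with slicing. The following is a minimal Python sketch; the function names (prefixes, suffixes, proper_prefixes, proper_suffixes, substrings, is_subsequence) are illustrative, not part of the slides. Following the usual convention, ε (the empty string "") counts as a prefix, suffix and substring of every string.

```python
def prefixes(s: str) -> list[str]:
    """All prefixes of s, shortest first: ε (""), s[:1], ..., s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s: str) -> list[str]:
    """All suffixes of s, longest first: s itself down to ε ("")."""
    return [s[i:] for i in range(len(s) + 1)]


def proper_prefixes(s: str) -> list[str]:
    """Prefixes that are neither ε nor s itself."""
    return [s[:i] for i in range(1, len(s))]


def proper_suffixes(s: str) -> list[str]:
    """Suffixes that are neither ε nor s itself."""
    return [s[i:] for i in range(1, len(s))]


def substrings(s: str) -> set[str]:
    """Every contiguous piece s[i:j], including ε and s itself."""
    n = len(s)
    return {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def is_subsequence(t: str, s: str) -> bool:
    """True if t can be obtained by deleting zero or more symbols of s."""
    it = iter(s)
    # `c in it` consumes the iterator up to the first match, so order is preserved
    return all(c in it for c in t)


word = "banana"
print(proper_prefixes(word))   # ['b', 'ba', 'ban', 'bana', 'banan']
print(proper_suffixes(word))   # ['anana', 'nana', 'ana', 'na', 'a']
print(is_subsequence("bnan", word), "bnan" in substrings(word))   # True False
```

The last line shows the key difference: bnan is a subsequence of banana (symbols kept in order, gaps allowed) but not a substring (no gaps allowed).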
5
String Operations
• Concatenation
– xy is the string formed by appending y to x
– εs = sε = s; ε is the identity for concatenation
– Exponentiation: s^0 = ε, and s^i = s^{i-1}s for i > 0 (so s^1 = s, s^2 = ss)
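The recursive definition of string exponentiation translates directly into code. A minimal Python sketch (the name power is illustrative), with Python's "" playing the role of ε:

```python
def power(s: str, i: int) -> str:
    """s^i by the recursive definition: s^0 = ε, and s^i = s^(i-1) s for i > 0."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    if i == 0:
        return ""                  # s^0 = ε, the empty string
    return power(s, i - 1) + s     # s^i = s^(i-1) s


s = "ab"
print("" + s == s + "" == s)       # ε is the identity for concatenation: True
print(power(s, 3))                 # ababab
```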
6
Operations on Languages
OPERATION                                 DEFINITION
union of L and M, written L ∪ M           L ∪ M = {s | s is in L or s is in M}
concatenation of L and M, written LM      LM = {st | s is in L and t is in M}
Kleene closure of L, written L*           $L^* = \bigcup_{i=0}^{\infty} L^i$
                                          L* denotes "zero or more concatenations of" L
positive closure of L, written L+         $L^+ = \bigcup_{i=1}^{\infty} L^i$
                                          L+ denotes "one or more concatenations of" L
Exponentiation: L^0 = {ε}, L^1 = L, L^2 = LL, and in general L^i = L^{i-1}L for i > 0
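For finite languages these operations can be computed as Python sets of strings. This is a sketch under one stated simplification: L* and L+ are infinite whenever L contains a non-empty string, so the hypothetical helpers kleene_upto and positive_upto return only the finite slice L^0 ∪ … ∪ L^n (respectively L^1 ∪ … ∪ L^n).

```python
from itertools import product


def union(L: set[str], M: set[str]) -> set[str]:
    """L ∪ M = {s | s is in L or s is in M}."""
    return L | M


def concat(L: set[str], M: set[str]) -> set[str]:
    """LM = {st | s is in L and t is in M}."""
    return {s + t for s, t in product(L, M)}


def lang_power(L: set[str], i: int) -> set[str]:
    """L^0 = {ε}; L^i = L^(i-1) L for i > 0."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene_upto(L: set[str], n: int) -> set[str]:
    """L^0 ∪ L^1 ∪ ... ∪ L^n: a finite slice of L*."""
    out = set()
    for i in range(n + 1):
        out |= lang_power(L, i)
    return out


def positive_upto(L: set[str], n: int) -> set[str]:
    """L^1 ∪ ... ∪ L^n: the matching finite slice of L+."""
    out = set()
    for i in range(1, n + 1):
        out |= lang_power(L, i)
    return out


L = {"a", "b"}
print(sorted(lang_power(L, 2)))    # ['aa', 'ab', 'ba', 'bb']
print(sorted(kleene_upto(L, 1)))   # ['', 'a', 'b']
```

Note that the slice of L+ equals the slice of L* minus ε only because ε is not in L here; if ε were in L, L+ would contain ε as well.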
7
Operations on Languages
Let L = {A, …, Z, a, …, z} (the letters) and D = {0, 1, …, 9} (the digits).
• L ∪ D is the set of letters and digits
• LD is the set of strings consisting of a letter followed by a digit
• L^4 is the set of all four-letter strings
• L* is the set of all strings of letters, including ε
• D+ is the set of all strings of one or more digits
8
Say What?
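L = {A, B, C, D}    D = {1, 2, 3}
• L ∪ D
{A, B, C, D, 1, 2, 3}
• LD
{A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3}
• L^2
{AA, AB, AC, AD, BA, BB, BC, BD, CA, …, DD} (all 16 two-letter strings)
• L*
{all strings formed from A, B, C, D, plus ε}
• L+
L* − {ε} (because ε is not in L)
For the last two items, L and D stand for all letters and all digits, as on the previous slide.
• L(L ∪ D): a letter followed by exactly one letter or digit
Valid: {A1, AA, D3}  Invalid: {321, 4A2, AA2}
(AA2, B345 and CD45 have more than one symbol after the first letter, so they are in L(L ∪ D)+ rather than L(L ∪ D).)
• L(L ∪ D)*: a letter followed by zero or more letters or digits
Valid: {A, A1, A23, D3, DA5, …}  Invalid: {31}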
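The set results and the valid/invalid examples above can be checked mechanically. A Python sketch: the finite sets are built with itertools.product, and for the last two items L and D are taken to be the ASCII letters and digits (an assumption of this sketch), written as Python regular expressions and tested with re.fullmatch.

```python
import re
from itertools import product

L = {"A", "B", "C", "D"}
D = {"1", "2", "3"}

L_union_D = L | D                                  # 7 symbols
LD = {l + d for l, d in product(L, D)}             # a letter then a digit: 12 strings
L2 = {x + y for x, y in product(L, repeat=2)}      # 4 * 4 = 16 strings

# Last two items: L = all ASCII letters, D = all digits (assumption)
one_more = re.compile(r"[A-Za-z][A-Za-z0-9]")      # L(L ∪ D)
any_more = re.compile(r"[A-Za-z][A-Za-z0-9]*")     # L(L ∪ D)*

for w in ["A", "A1", "AA2", "DA5", "31", "4A2"]:
    print(w, bool(one_more.fullmatch(w)), bool(any_more.fullmatch(w)))
```

fullmatch is used rather than match so that the whole string, not just a prefix of it, must belong to the language.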
9
Regular Expressions
• A regular expression is a set of rules for constructing sequences of symbols (strings) from an alphabet.
• Let Σ be an alphabet and r a regular expression over Σ. Then L(r) is the language described (denoted) by r.
10
Regular Expressions
• Defined over an alphabet Σ
• ε is a regular expression denoting {ε}, the set containing only the empty string
• If a is a symbol in Σ, then a is a regular expression
denoting {a}, the set containing the string a
• If r and s are regular expressions denoting the
languages L(r) and L(s), then:
– (r)|(s) is a regular expression denoting L(r) ∪ L(s)
– (r)(s) is a regular expression denoting L(r)L(s)
– (r)* is a regular expression denoting (L(r))*
– (r) is a regular expression denoting L(r)
• Precedence: * (left associative), then concatenation (left
associative), then | (left associative)
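The inductive definition can be mirrored by a small syntax tree with one case per rule. This is an illustrative Python sketch (the class names Eps, Sym, Alt, Cat, Star and the function lang are hypothetical); because the tree makes grouping explicit, precedence plays no role. Since L(r) can be infinite, lang returns only the strings of length at most n.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Eps:
    """The regular expression ε."""


@dataclass(frozen=True)
class Sym:
    """A symbol a of the alphabet."""
    a: str


@dataclass(frozen=True)
class Alt:
    """(r)|(s)"""
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Cat:
    """(r)(s)"""
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Star:
    """(r)*"""
    r: "Regex"


Regex = Union[Eps, Sym, Alt, Cat, Star]


def lang(r: Regex, n: int) -> set[str]:
    """The strings of L(r) whose length is at most n."""
    if isinstance(r, Eps):
        return {""}                                    # L(ε) = {ε}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()       # L(a) = {a}
    if isinstance(r, Alt):
        return lang(r.r, n) | lang(r.s, n)             # L(r) ∪ L(s)
    if isinstance(r, Cat):                             # L(r)L(s)
        return {x + y for x in lang(r.r, n) for y in lang(r.s, n) if len(x + y) <= n}
    if isinstance(r, Star):                            # {ε} ∪ L(r) ∪ L(r)L(r) ∪ ...
        base = lang(r.r, n)
        result, frontier = {""}, {""}
        while frontier:                                # terminates: lengths are capped at n
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


a, b = Sym("a"), Sym("b")
print(sorted(lang(Cat(Alt(a, b), Alt(a, b)), 2)))      # ['aa', 'ab', 'ba', 'bb']
```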
11
Regular Expressions
Alphabet = {a, b}
1. a|b denotes {a, b}
2. (a|b)(a|b) denotes {aa, ab, ba, bb}
3. a* denotes {ε, a, aa, aaa, …}
4. (a|b)* denotes all strings of a's and b's, including the empty string ε
5. a|a*b denotes {a, b, ab, aab, aaab, …}: the string a, plus every string of zero or more a's followed by b
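These five examples can be checked with Python's re module, whose |, * and grouping follow the same precedence (* binds tightest, then concatenation, then |). The sketch below lists, for each pattern, the strings over {a, b} of length at most 3 that the whole pattern matches; the helper names strings_upto and language are illustrative.

```python
import re
from itertools import product


def strings_upto(alphabet: str, n: int):
    """Yield every string over `alphabet` of length 0..n, shortest first."""
    for k in range(n + 1):
        for chars in product(alphabet, repeat=k):
            yield "".join(chars)


def language(pattern: str, n: int = 3) -> list[str]:
    """Strings over {a, b} of length <= n that the entire pattern matches."""
    rx = re.compile(pattern)
    return [w for w in strings_upto("ab", n) if rx.fullmatch(w)]


for p in ["a|b", "(a|b)(a|b)", "a*", "(a|b)*", "a|a*b"]:
    print(p, language(p))
```

For a|a*b the output starts with a, b, ab, aab, which confirms that the pattern parses as a | (a*b), not as (a|a*)b.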
12
Algebraic Properties of Regular
Expressions
AXIOM                           DESCRIPTION
r | s = s | r                   | is commutative
r | (s | t) = (r | s) | t       | is associative
(r s) t = r (s t)               concatenation is associative
r (s | t) = r s | r t           concatenation distributes over |
(s | t) r = s r | t r
εr = r                          ε is the identity element for concatenation
rε = r
r* = (r | ε)*                   relation between * and ε
r** = r*                        * is idempotent
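Each axiom states that two expressions denote the same language, so an instance of it can be sanity-checked (not proved) by comparing both sides on every short string. The sketch below does this with Python's re.fullmatch, instantiating r = a, s = b, t = ab; writing ε as an empty group () or an empty alternative (a|) is a convention of Python's regex syntax, and the length bound n = 6 and the helper name equivalent_upto are arbitrary choices of this sketch.

```python
import re
from itertools import product


def equivalent_upto(p: str, q: str, alphabet: str = "ab", n: int = 6) -> bool:
    """True if p and q accept exactly the same strings over `alphabet` of length <= n."""
    rp, rq = re.compile(p), re.compile(q)
    for k in range(n + 1):
        for chars in product(alphabet, repeat=k):
            w = "".join(chars)
            if bool(rp.fullmatch(w)) != bool(rq.fullmatch(w)):
                return False
    return True


# One instance per axiom, with r = a, s = b, t = ab
checks = {
    "r | s = s | r": ("a|b", "b|a"),
    "r | (s | t) = (r | s) | t": ("a|(b|ab)", "(a|b)|ab"),
    "(r s) t = r (s t)": ("(ab)(ab)", "a(b(ab))"),
    "r (s | t) = r s | r t": ("a(b|ab)", "ab|aab"),
    "(s | t) r = s r | t r": ("(b|ab)a", "ba|aba"),
    "εr = r": ("()a", "a"),
    "rε = r": ("a()", "a"),
    "r* = (r | ε)*": ("a*", "(a|)*"),
    "r** = r*": ("(a*)*", "a*"),
}
for name, (p, q) in checks.items():
    print(f"{name:28} {equivalent_upto(p, q)}")
```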
13
Regular Definitions
• Names may be given to regular expressions; these names can then be used like symbols
• If Σ is an alphabet of basic symbols, a regular definition is a sequence of definitions of the form
d1 → r1
d2 → r2
. . .
dn → rn
where each di is a distinct name not in Σ, and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, …, di-1}
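Because each ri may mention only the earlier names d1, …, di-1, a regular definition can be turned into plain regular expressions by substituting names in a single left-to-right pass. A Python sketch, assuming a hypothetical convention in which a reference to an earlier name is written {name} (so regex quantifiers such as a{2} are not supported here); each substituted expression is wrapped in a non-capturing group (?:…) so that precedence is preserved.

```python
import re


def expand(definitions: list[tuple[str, str]]) -> dict[str, str]:
    """Expand d1 -> r1, ..., dn -> rn into plain patterns, one left-to-right pass."""
    expanded: dict[str, str] = {}
    for name, body in definitions:
        def substitute(m):
            ref = m.group(1)
            if ref not in expanded:     # ri may only use d1 .. d(i-1)
                raise ValueError(f"{name} refers to {ref}, which is not defined earlier")
            return f"(?:{expanded[ref]})"
        expanded[name] = re.sub(r"\{(\w+)\}", substitute, body)
    return expanded


defs = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}({letter}|{digit})*"),
]
patterns = expand(defs)
print(patterns["id"])   # (?:[A-Za-z])((?:[A-Za-z])|(?:[0-9]))*
```

The single pass works only because the definitions are ordered; the check for undefined names rejects recursive or forward references, which regular definitions do not allow.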
14
Regular Definitions
• Example 1:
– letter → A | B | … | Z | a | b | … | z
– digit → 0 | 1 | … | 9
– id → letter (letter | digit)*
• Example 2:
– digit → 0 | 1 | 2 | … | 9
– digits → digit digit*
– optional_fraction → . digits | ε
– optional_exponent → ( E ( + | - | ε ) digits ) | ε
– num → digits optional_fraction optional_exponent
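Example 2 can be transcribed almost line for line into a Python regular expression. In this sketch ε is written as an empty alternative, and the literal "." and "+" must be escaped because in Python's regex syntax an unescaped . matches any character and + is a repetition operator.

```python
import re

digit = r"[0-9]"                                    # 0 | 1 | ... | 9
digits = rf"{digit}{digit}*"                        # digit digit*
optional_fraction = rf"(?:\.{digits}|)"            # . digits | ε
optional_exponent = rf"(?:E(?:\+|-|){digits}|)"     # ( E ( + | - | ε ) digits ) | ε
num = re.compile(rf"{digits}{optional_fraction}{optional_exponent}")

for text in ["5280", "0.01234", "6.336E4", "1.89E-4", "1.", ".5", "1E"]:
    print(text, bool(num.fullmatch(text)))
```

Strings such as 1. and .5 are rejected because both a fraction and the leading integer part require at least one digit.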
15
Regular Definitions
• Shorthand
– One or more instances: r+ denotes rr*
– Zero or one Instance: r? denotes r|ε
– Character classes: [a-z] denotes a | b | … | z
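The three shorthands are abbreviations, not new power: each denotes the same language as its expansion. The following Python sketch compares both forms empirically on short strings (an informal check, not a proof); r = (?:ab) is just a sample sub-expression, and same_language is an illustrative helper name.

```python
import re
import string
from itertools import product


def same_language(p: str, q: str, alphabet: str = "ab", n: int = 5) -> bool:
    """True if p and q accept the same strings over `alphabet` of length <= n."""
    for k in range(n + 1):
        for chars in product(alphabet, repeat=k):
            w = "".join(chars)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True


r = "(?:ab)"   # a sample sub-expression r
print("r+ = rr*:", same_language(f"{r}+", f"{r}{r}*"))
print("r? = r|ε:", same_language(f"{r}?", f"(?:{r}|)"))

# [a-z] versus the explicit alternation a|b|...|z, checked on single characters
letters = "|".join(string.ascii_lowercase)
print("[a-z] = a|b|...|z:", all(
    bool(re.fullmatch("[a-z]", c)) == bool(re.fullmatch(letters, c))
    for c in string.printable))
```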
16
Example
• digit → 0 | 1 | 2 | … | 9
• digits → digit+
• optional_fraction → ( . digits )?
• optional_exponent → ( E ( + | - )? digits )?
• num → digits optional_fraction optional_exponent
17
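Limitations of Regular
Expressions
• Some languages cannot be described by any regular expression
• Regular expressions cannot describe balanced or nested constructs
– Example: the set of all balanced parenthesis strings
– A context-free grammar (CFG) can describe this language
• Regular expressions cannot describe repeated strings
– Example: {wcw | w is a string of a's and b's}
– This language is not context-free either, so a CFG cannot describe it
• Regular expressions can denote only a fixed number or an unspecified number of repetitions of a construct; they cannot require two counts to be equal (e.g., {a^n b^n | n ≥ 1})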
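The balanced-parentheses limitation comes down to memory: recognizing the language requires remembering an unbounded nesting depth, while a regular expression (equivalently, a finite automaton) has only finitely many states. The Python sketch below recognizes balanced strings with an integer counter, which is exactly the unbounded memory a regular expression lacks; a CFG such as S → ( S ) S | ε describes the same language. The function name balanced is illustrative.

```python
def balanced(s: str) -> bool:
    """True if s, over the alphabet {(, )}, is a balanced parenthesis string."""
    depth = 0                    # unbounded counter: a finite automaton has no equivalent
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ')' with no open '(' to close
                return False
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return depth == 0            # every '(' has been closed


for s in ["", "(()())", "(()", ")("]:
    print(repr(s), balanced(s))
```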

Uploaded by Ratnakar Mikkili.

