CHAPTER 2
Lexical Analysis
(Scanning)
1. THE ROLE OF LEXICAL ANALYZER
[Figure: source program → lexical analyzer (scanner) → tokens → syntax analyzer (parser), with a symbol table manager]
• Main task: read input characters and group them into “tokens.”
• Secondary tasks:
   - Skip comments and whitespace;
   - Correlate error messages with the source program (e.g., line number of the error).
Different approaches for implementing lexical analyzers:
• Using a scanner generator, e.g., lex or flex. This automatically generates a lexical analyzer from a high-level description of the tokens. (Easiest to implement; least efficient.)
• Programming it in a language such as C, using the I/O facilities of the language. (Intermediate in ease and efficiency.)
• Writing it in assembly language and explicitly managing the input. (Hardest to implement, but most efficient.)
• token: a name for a set of input strings with related structure.
  Example: “identifier,” “integer constant”
• pattern: a rule describing the set of strings associated with a token.
  Example: “a letter followed by zero or more letters, digits, or underscores.”
• lexeme: the actual input string that matches a pattern.
  Example: count
Examples
Input: count = 123
Tokens:
   identifier      Rule: “letter followed by …”    Lexeme: count
   assg_op         Rule: =                          Lexeme: =
   integer_const   Rule: “digit followed by …”     Lexeme: 123
LexicalAnalysis in Compiler design   .pt
• If more than one lexeme can match the pattern for a token, the scanner must indicate the actual lexeme that matched.
• This information is given using an attribute associated with the token.
Example: The program statement
   count = 123
yields the following token-attribute pairs:
   identifier, pointer to the string “count”
   assg_op, (no attribute)
   integer_const, the integer value 123
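The token-attribute pairs above can be sketched in a few lines of Python. This is a minimal illustration rather than the scanner the slides have in mind: the regular expressions in `PATTERNS`, the list used as a symbol table, and the use of a table index in place of a pointer are all assumptions made for the example.

```python
import re

# Hypothetical patterns for the three token classes in the example.
PATTERNS = [
    ("identifier", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
    ("integer_const", re.compile(r"[0-9]+")),
    ("assg_op", re.compile(r"=")),
]

def scan(source):
    """Return (tokens, symbol_table); each token is a (name, attribute) pair."""
    symbol_table = []   # identifier lexemes, in order of first appearance
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():           # secondary task: skip whitespace
            pos += 1
            continue
        for name, regex in PATTERNS:
            match = regex.match(source, pos)
            if match:
                lexeme = match.group()
                if name == "identifier":
                    if lexeme not in symbol_table:
                        symbol_table.append(lexeme)
                    attribute = symbol_table.index(lexeme)  # stands in for a pointer
                elif name == "integer_const":
                    attribute = int(lexeme)  # the integer value
                else:
                    attribute = None         # the operator needs no attribute
                tokens.append((name, attribute))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
    return tokens, symbol_table

tokens, table = scan("count = 123")
print(tokens)  # [('identifier', 0), ('assg_op', None), ('integer_const', 123)]
print(table)   # ['count']
```

The attribute is what distinguishes one identifier from another: the parser only sees the token name, while later phases follow the attribute to the lexeme.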
2. Input Buffering Scheme
The three approaches to implementing a lexical analyzer listed in Section 1 (a scanner generator, hand-coding in a language such as C, hand-coding in assembly language) are in order of increasing difficulty for the implementer or compiler writer.
• Lexical analyzer speed is crucial, since:
   - It is the only part of the compiler that examines the entire input program one character at a time.
   - Disk input can be slow.
   - The scanner accounts for a considerable share (25-30%) of total compile time.
   - The lexical analyzer (LA) has to look ahead to determine when a match has been found before it can announce a token.
• Scanners use an input buffering technique called double buffering to reduce the overhead of identifying tokens quickly.
Input Buffering Scheme with Sentinels
• Objective: optimize the common case by reducing the number of tests to one per advance of fwd.
• Idea: extend each buffer half to hold a sentinel at the end.
   - The sentinel is a special character that cannot occur in a program (e.g., EOF).
   - It signals the need for some special action (fill the other buffer half, or terminate processing).
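The sentinel idea can be sketched as follows. This is an illustrative model only: the class name `SentinelBuffer`, the half size `N`, and the choice of NUL as the sentinel are assumptions, and the lexeme-start pointer that a real scanner keeps alongside fwd is left out for brevity.

```python
import io

EOF = "\0"   # sentinel; assumes NUL never occurs in source text
N = 4        # tiny half size so the reload path is exercised

class SentinelBuffer:
    def __init__(self, stream):
        self.stream = stream
        # Two halves, each with N character slots plus one sentinel slot.
        self.buf = [EOF] * (2 * (N + 1))
        self.fwd = 0
        self._fill(0)

    def _fill(self, start):
        """Load up to N characters into the half beginning at `start`."""
        chunk = self.stream.read(N)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        # Sentinel right after the data: end of the half, or true end of input.
        self.buf[start + len(chunk)] = EOF

    def next_char(self):
        """Return the next character, or None at end of input."""
        ch = self.buf[self.fwd]
        if ch != EOF:                 # common case: a single test per advance
            self.fwd += 1
            return ch
        if self.fwd == N:             # sentinel closing the first half
            self._fill(N + 1)
            self.fwd = N + 1
            return self.next_char()
        if self.fwd == 2 * N + 1:     # sentinel closing the second half
            self._fill(0)
            self.fwd = 0
            return self.next_char()
        return None                   # sentinel inside a half: input exhausted

reader = SentinelBuffer(io.StringIO("count = 123"))
chars = []
while (c := reader.next_char()) is not None:
    chars.append(c)
print("".join(chars))  # count = 123
```

Only when the sentinel is seen does the code ask which of the three situations applies, so ordinary characters cost one comparison each.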
3. Specification of Tokens: regular expressions
Terminology:
alphabet : a finite set of symbols
string : a finite sequence of alphabet symbols
language : a (finite or infinite) set of strings.
Regular Expressions
A pattern notation for describing certain kinds of sets of strings.
Given an alphabet Σ:
• ε is a regular expression (denotes the language {ε});
• for each a ∈ Σ, a is a regular expression (denotes the language {a});
• if r and s are regular expressions denoting L(r) and L(s) respectively, then so are:
   - (r) | (s) (denotes the language L(r) ∪ L(s))
   - (r)(s) (denotes the language L(r)L(s))
   - (r)* (denotes the language (L(r))*)
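The three regular-expression operators can be tried out directly on small sets of strings. The sketch below is only an illustration: the helper names are hypothetical, and because L* is infinite whenever L contains a non-empty string, `star` is truncated at an assumed maximum length.

```python
def union(lang1, lang2):
    """L(r) ∪ L(s)"""
    return lang1 | lang2

def concat(lang1, lang2):
    """L(r)L(s): every string of the first followed by every string of the second."""
    return {x + y for x in lang1 for y in lang2}

def star(lang, max_len):
    """Kleene closure of lang, truncated to strings of length <= max_len."""
    result = {""}        # the empty string ε is always in L*
    frontier = {""}
    while frontier:
        frontier = {x + y for x in frontier for y in lang
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

a, b = {"a"}, {"b"}
a_or_b = union(a, b)                                   # L(a|b) = {a, b}
abb = concat(a, concat(b, b))                          # L(abb) = {abb}
lang = concat(star(a_or_b, 2), abb)                    # (a|b)*abb, prefix length <= 2
print(sorted(lang, key=lambda s: (len(s), s)))
# ['abb', 'aabb', 'babb', 'aaabb', 'ababb', 'baabb', 'bbabb']
```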
4. FINITE AUTOMATA – NFA and DFA
Finite Automata
A finite automaton is a 5-tuple (Q, Σ, T, q0, F), where:
• Σ is a finite alphabet;
• Q is a finite set of states;
• T: Q × Σ → Q is the transition function;
• q0 ∈ Q is the initial state; and
• F ⊆ Q is a set of final states.
As written, T describes a DFA. For an NFA, T maps Q × (Σ ∪ {ε}) to sets of states.
NFA with ε-transitions for the RE (a|b)*abb
5. From Regular Expressions to NFA
The following algorithm is used to construct an NFA for a given RE.
Construct an NFA for the RE (a|b)*abb
First decompose the given complex RE into simple REs.
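The decompose-and-combine idea can be sketched in Python. The slides' own algorithm is not reproduced in this transcript, so the code assumes the usual Thompson-style construction: one small NFA fragment per symbol, combined by union, concatenation, and star. The names `Frag`, `symbol`, `union`, `concat`, `star`, and `accepts` are hypothetical, and state numbers are assigned by a counter, so they will not match any figure.

```python
import itertools

_ids = itertools.count()   # fresh state numbers
EPS = None                 # label used for ε-transitions

class Frag:
    """An NFA fragment with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept = start, accept
        self.edges = edges             # list of (source, label, destination)

def symbol(ch):
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, ch, f)])

def union(r, t):
    s, f = next(_ids), next(_ids)
    edges = r.edges + t.edges + [(s, EPS, r.start), (s, EPS, t.start),
                                 (r.accept, EPS, f), (t.accept, EPS, f)]
    return Frag(s, f, edges)

def concat(r, t):
    # Linking r.accept to t.start with an ε-edge is a shortcut that is
    # equivalent to merging the two states.
    return Frag(r.start, t.accept, r.edges + t.edges + [(r.accept, EPS, t.start)])

def star(r):
    s, f = next(_ids), next(_ids)
    edges = r.edges + [(s, EPS, r.start), (s, EPS, f),
                       (r.accept, EPS, r.start), (r.accept, EPS, f)]
    return Frag(s, f, edges)

def closure(nfa, states):
    """All states reachable from `states` using ε-edges only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    current = closure(nfa, {nfa.start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(nfa, moved)
    return nfa.accept in current

# (a|b)*abb decomposed into: a, b, a|b, (a|b)*, then three concatenations.
a_or_b_star = star(union(symbol("a"), symbol("b")))
nfa = concat(concat(concat(a_or_b_star, symbol("a")), symbol("b")), symbol("b"))
print(accepts(nfa, "babb"), accepts(nfa, "ab"))  # True False
```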
The final NFA after applying the algorithm
6. Conversion from NFA to DFA
Construct an NFA with ε-transitions for the RE (a|b)*abb and convert it into a DFA.
The NFA accepting the strings described by the given RE is:
First consider the starting state of the NFA, state 0, and compute ε-closure(0). This set of NFA states is the starting state of the DFA.
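The ε-closure step and the rest of the subset construction can be sketched as below. The NFA figure is not part of this transcript, so the transition table is an assumption: it uses the common textbook numbering for (a|b)*abb, with states 0 to 10 and state 10 accepting. The function names are hypothetical.

```python
EPS = ""   # label for ε-moves

# Assumed NFA for (a|b)*abb; state 10 is the accepting state.
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {"b": {10}},
    10: {},
}

def eps_closure(states):
    """Set of NFA states reachable from `states` on ε-moves alone."""
    stack, closure = list(states), set(states)
    while stack:
        for nxt in NFA[stack.pop()].get(EPS, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)

def move(states, symbol):
    """NFA states reachable from `states` on one `symbol` transition."""
    return {t for s in states for t in NFA[s].get(symbol, ())}

def subset_construction(start, alphabet):
    start_set = eps_closure({start})
    dfa, worklist = {}, [start_set]
    while worklist:
        current = worklist.pop()
        if current in dfa:
            continue                      # this DFA state is already expanded
        dfa[current] = {}
        for symbol in alphabet:
            target = eps_closure(move(current, symbol))
            dfa[current][symbol] = target
            worklist.append(target)
    return start_set, dfa

start_set, dfa = subset_construction(0, "ab")
print(sorted(start_set))   # [0, 1, 2, 4, 7]
print(len(dfa))            # 5
```

Each DFA state is a frozen set of NFA states, so it can serve directly as a dictionary key; a DFA state is accepting when its set contains NFA state 10.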
The DFA obtained by the conversion accepts the set of strings represented by the RE (a|b)*abb.
Simulating a DFA
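Simulating a DFA amounts to a table lookup per input character. The sketch below assumes a five-state DFA for (a|b)*abb; the state names A to E and the table entries are illustrative and are not taken from the slide figure.

```python
# Assumed transition table for a DFA accepting (a|b)*abb.
# E is the only accepting state.
TRANSITIONS = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
START, ACCEPTING = "A", {"E"}

def dfa_accepts(text):
    """Run the DFA over text; accept if it ends in an accepting state."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:          # no transition: symbol outside the alphabet
            return False
    return state in ACCEPTING

for s in ["abb", "aabb", "abab", "abc"]:
    print(s, dfa_accepts(s))
# abb True
# aabb True
# abab False
# abc False
```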
7. Recognition of tokens
Consider the language generated by the following grammar, whose tokens the lexical analyzer must recognize. The grammar is:
State 9:  c = nextchar();
          if LETTER(c) then STATE = 10 else FAIL();
State 10: c = nextchar();
          if LETTER(c) or DIGIT(c) then STATE = 10
          else if OTHER(c) then STATE = 11
          else FAIL();
State 11: return (getToken(), install_ID())
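The three states can be turned into a small runnable function. This is a sketch under assumptions: the function name `scan_identifier` is hypothetical, Python's `str.isalpha` and `str.isalnum` stand in for LETTER and DIGIT (they also accept non-ASCII letters and digits), and the "other" character is left unconsumed so that it can start the next token.

```python
def scan_identifier(text, pos=0):
    """Return (lexeme, next_pos) if an identifier starts at pos, else None."""
    state, i = 9, pos
    while True:
        c = text[i] if i < len(text) else ""   # "" stands for end of input
        if state == 9:
            if c.isalpha():
                state, i = 10, i + 1           # first letter seen
            else:
                return None                    # FAIL(): try another token
        elif state == 10:
            if c.isalnum():
                i += 1                         # stay in state 10
            else:
                state = 11                     # "other": do not consume c
        else:                                  # state 11: identifier complete
            return text[pos:i], i

print(scan_identifier("count = 123"))  # ('count', 5)
print(scan_identifier("9lives"))       # None
```

A full scanner would look the lexeme up in the symbol table at state 11, as `install_ID()` does in the pseudocode; that step is omitted here.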
8. Language for specifying lexical analyzers (Lex, flex)
The Lex tool
Consider again the language generated by the grammar of Section 7, whose tokens the lexical analyzer must recognize.
The following is a Lex program that recognizes tokens of several categories: white space, identifiers, numbers, relational operators, and the keywords if, then, and else.
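The Lex program itself is not part of this transcript. As a stand-in, the sketch below mimics how a Lex-style specification behaves, using Python's `re` module: rules are tried in order, the longest match wins, and ties go to the earlier rule. The token names and the exact patterns in `RULES` are assumptions for illustration, not the slide's rules.

```python
import re

# (token name, pattern) in priority order, like the rules section of a Lex file.
RULES = [
    (None,     r"[ \t\n]+"),                          # white space: no token
    ("IF",     r"if"),
    ("THEN",   r"then"),
    ("ELSE",   r"else"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("NUMBER", r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?"),
    ("RELOP",  r"<=|<>|>=|<|>|="),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]

def tokenize(source):
    """Return a list of (token name, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            m = regex.match(source, pos)
            # Strictly longer matches win; on a tie the earlier rule is kept.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_end == pos:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name is not None:                     # skip white space
            tokens.append((best_name, source[pos:best_end]))
        pos = best_end
    return tokens

print(tokenize("if x1 <= 10 then iffy"))
# [('IF', 'if'), ('ID', 'x1'), ('RELOP', '<='), ('NUMBER', '10'), ('THEN', 'then'), ('ID', 'iffy')]
```

Listing the keywords before the identifier rule is what makes `if` a keyword while `iffy`, a longer match, remains an identifier.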
9. Design of scanner or Lexical Analyzer generator
First construct an NFA for each pattern Pi in the Lex program. Then construct a compound NFA that recognizes all strings represented by any of the patterns.
[Figures: NFAs for patterns P1, P2, and P3]
In the next step, convert this compound NFA to a DFA, which the lexical analyzer uses to recognize tokens.
Converting the above NFA to a DFA:
The starting state A of the DFA is ε-closure(0), the set of NFA states {0, 1, 3, 7}:
A = {0, 1, 3, 7}
[Figure: state sets for the remaining DFA states B, C, D, E, F]
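How a generated scanner uses the compound automaton can be sketched by tracking sets of NFA states while reading input, which amounts to building the DFA states on demand. The pattern figures are not in this transcript, so the patterns here are assumptions chosen to be consistent with ε-closure(0) = {0, 1, 3, 7}: P1 = a, P2 = abb, P3 = a*b+. The function names are hypothetical.

```python
EPS = ""

# Assumed compound NFA: new start state 0 with ε-moves to each pattern's NFA.
NFA = {
    0: {EPS: {1, 3, 7}},
    1: {"a": {2}},                    # P1 = a
    2: {},
    3: {"a": {4}},                    # P2 = abb
    4: {"b": {5}},
    5: {"b": {6}},
    6: {},
    7: {"a": {7}, "b": {8}},          # P3 = a*b+
    8: {"b": {8}},
}
ACCEPT = {2: "P1", 6: "P2", 8: "P3"}  # accepting NFA state -> pattern

def eps_closure(states):
    stack, closure = list(states), set(states)
    while stack:
        for nxt in NFA[stack.pop()].get(EPS, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def next_token(text, pos):
    """Longest match starting at pos; ties go to the pattern listed first."""
    current = eps_closure({0})
    best, i = None, pos
    while current and i < len(text):
        moved = {t for s in current for t in NFA[s].get(text[i], ())}
        current = eps_closure(moved)
        i += 1
        hits = sorted(ACCEPT[s] for s in current if s in ACCEPT)
        if hits:
            best = (hits[0], text[pos:i])   # remember the last accepting point
    return best

def scan(text):
    pos, tokens = 0, []
    while pos < len(text):
        token = next_token(text, pos)
        if token is None:
            raise ValueError(f"no pattern matches at offset {pos}")
        tokens.append(token)
        pos += len(token[1])
    return tokens

print(scan("aaba"))  # [('P3', 'aab'), ('P1', 'a')]
print(scan("abb"))   # [('P2', 'abb')]
```

On input abb both P2 and P3 match the whole string, and the earlier pattern P2 is reported, which is the tie-breaking rule a Lex-style generator applies.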