Lecture 2: Lexical Analysis
CS 540
George Mason University
CS 540 Spring 2013 GMU 2
Lexical Analysis - Scanning
[Figure: compiler phases. Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → Semantic Analysis (IC generator) → Code Optimizer → Code Generator, with a shared Symbol Table.]
Scanner:
• Tokens described formally
• Breaks input into tokens
• White space
Lexical Analysis
INPUT: sequence of characters
OUTPUT: sequence of tokens
A lexical analyzer is generally a subroutine of the parser:
• Simpler design
• Efficient
• Portable
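[Figure: Input → Scanner → Parser. The scanner calls Next_char() to obtain a character from the input; the parser calls Next_token() to obtain a token from the scanner. The Symbol Table is shared.]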
Definitions
• token – set of strings defining an atomic
element with a defined meaning
• pattern – a rule describing a set of strings
• lexeme – a sequence of characters that
matches some pattern
Examples
Token         Pattern                   Sample lexeme
while         while                     while
relation_op   = | != | < | >            <
integer       [0-9]+                    42
string        characters between " "    "hello"
Input string: size := r * 32 + c
<token,lexeme> pairs:
• <id, size>
• <assign, :=>
• <id, r>
• <arith_symbol, *>
• <integer, 32>
• <arith_symbol, +>
• <id, c>
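A minimal hand-written sketch, in C, of how pairs like these could be produced. The names TokenKind, Token, and next_token are illustrative assumptions, not part of the lecture, and the patterns (a letter followed by letters or digits for id, one or more digits for integer) are simplified guesses.

```c
#include <ctype.h>
#include <string.h>

/* Token kinds from the slide; the enum names are hypothetical. */
typedef enum { TOK_EOF, TOK_ID, TOK_ASSIGN, TOK_ARITH, TOK_INTEGER, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    char lexeme[32];   /* copy of the matched characters */
} Token;

/* Scan one token starting at *src and advance *src past it. */
Token next_token(const char **src) {
    Token t = { TOK_EOF, "" };
    const char *p = *src;

    while (isspace((unsigned char)*p)) p++;           /* skip white space */
    const char *start = p;

    if (*p == '\0') {
        t.kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {          /* id: letter (letter|digit)* */
        while (isalnum((unsigned char)*p)) p++;
        t.kind = TOK_ID;
    } else if (isdigit((unsigned char)*p)) {          /* integer: digit+ */
        while (isdigit((unsigned char)*p)) p++;
        t.kind = TOK_INTEGER;
    } else if (p[0] == ':' && p[1] == '=') {          /* assignment symbol */
        p += 2;
        t.kind = TOK_ASSIGN;
    } else if (strchr("+-*/", *p) != NULL) {          /* arithmetic symbol */
        p++;
        t.kind = TOK_ARITH;
    } else {                                          /* anything else */
        p++;
        t.kind = TOK_ERROR;
    }

    size_t n = (size_t)(p - start);
    if (n >= sizeof t.lexeme) n = sizeof t.lexeme - 1; /* truncate long lexemes */
    memcpy(t.lexeme, start, n);
    t.lexeme[n] = '\0';
    *src = p;
    return t;
}
```

Calling next_token repeatedly on "size := r * 32 + c" yields id, assign, id, arith_symbol, integer, arith_symbol, id in that order, followed by TOK_EOF.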
Implementing a Lexical Analyzer
Practical Issues:
• Input buffering
• Translating RE into executable form
• Must be able to capture a large number of
tokens with a single machine
• Interface to parser
• Tools
Capturing Multiple Tokens
Capturing keyword “begin”
Capturing variable names
What if both need to happen at the same time?
[Figure: two separate state machines. Keyword: b, e, g, i, n, then WS. Variable name: A, then AN repeated, then WS.]
WS – white space
A – alphabetic
AN – alphanumeric
Capturing Multiple Tokens
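[Figure: combined state machine for the keyword "begin" and variable names, with edges labeled b, e, g, i, n, A-b, AN, and WS.]
WS – white space
A – alphabetic
AN – alphanumeric
The machine is much more complicated, and that is just for these two tokens!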
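A rough C sketch of one such combined machine, assuming the end of the string plays the role of WS. The function classify_word, the WordKind enum, and the state numbering are illustrative assumptions and are not taken from the slide's diagram.

```c
#include <ctype.h>

typedef enum { WORD_INVALID, WORD_BEGIN, WORD_IDENT } WordKind;

/* Run the combined keyword/identifier machine over one word. */
WordKind classify_word(const char *w) {
    static const char keyword[] = "begin";
    const int ID = 6;       /* state for an ordinary identifier */
    int state = 0;          /* 0..5: keyword characters matched so far */

    for (; *w != '\0'; w++) {
        unsigned char c = (unsigned char)*w;
        if (state == 0) {
            if (c == 'b') state = 1;              /* start of the keyword path */
            else if (isalpha(c)) state = ID;      /* the A-b edge */
            else return WORD_INVALID;             /* names must start with a letter */
        } else if (state < 5 && c == (unsigned char)keyword[state]) {
            state++;                              /* stay on the keyword path */
        } else if (isalnum(c)) {
            state = ID;                           /* fall off into the identifier state */
        } else {
            return WORD_INVALID;
        }
    }
    if (state == 5) return WORD_BEGIN;            /* exactly "begin" */
    if (state == 0) return WORD_INVALID;          /* empty input */
    return WORD_IDENT;
}
```

With this sketch, "begin" is classified as the keyword, while "beg", "beginner", and "b2" are identifiers. Every keyword added this way needs its own chain of states, which is the complexity the slide points at.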
Real lexer (handcoded)
• http://cs.gmu.edu/~white/CS540/lexer.cpp
• Comes from C# compiler in Rotor
• >950 lines of C++
Lex – Lexical Analyzer Generator
[Figure: a Lex specification goes into the Lex compiler, which emits C/C++/Java source for a scanner; the resulting executable turns input into tokens.]
• flex – more modern version of lex
• jflex – Java version of flex
Lex Specification (flex)
%{
#include <stdio.h>
#include <string.h>
int charCount=0, wordCount=0, lineCount=0;
%}
word [^ \t\n]+
%%
{word} {wordCount++; charCount += strlen(yytext); }
[\n] {charCount++; lineCount++;}
. {charCount++;}
%%
int main() {
    yylex();
    printf("Characters %d, Words: %d, Lines: %d\n",
           charCount, wordCount, lineCount);
    return 0;
}
The three sections, in order:
• Definitions – code, RE
• Rules – RE/action pairs
• User routines
Lex Specification (jflex)
import java.io.*;
%%
%class ex1
%unicode
%line
%column
%standalone
%{
static int charCount = 0, wordCount = 0, lineCount = 0;
public static void main(String [] args) throws IOException
{
ex1 lexer = new ex1(new FileReader(args[0]));
lexer.yylex();
System.out.println("Characters: " + charCount +
" Words: " + wordCount +" Lines: " +lineCount);
}
%}
%type Object //this line changes the return type of yylex into Object
word = [^ \t\n]+
%%
{word} {wordCount++; charCount += yytext().length(); }
[\n] {charCount++; lineCount++; }
. {charCount++; }
The labeled sections:
• Definitions – code, RE
• Rules – RE/action pairs
Lex definitions section
• C/C++/Java code:
– Surrounded by %{… %} delimiters
– Declare any variables used in actions
• RE definitions:
– Define shorthand for patterns:
digit [0-9]
letter [a-z]
ident {letter}({letter}|{digit})*
– Use shorthand in RE section: {ident}
%{
int charCount=0, wordCount=0, lineCount=0;
%}
word [^ \t\n]+
Lex Regular Expressions
• Match explicit character sequences
– integer, "+++", \<\>
• Character classes
– [abcd]
– [a-zA-Z]
– [^0-9] – matches non-numeric
{word} {wordCount++; charCount += strlen(yytext); }
[\n] {charCount++; lineCount++;}
. {charCount++;}
• Alternation
– twelve | 12
• Closure
– * - zero or more
– + - one or more
– ? – zero or one
– {number}, {number,number} – an exact count, or a range of counts
• Other operators
– . – matches any character except newline
– ^ - matches beginning of line
– $ - matches end of line
– / - trailing context
– () – grouping
– {} – RE definitions
Lex Operators
Operator precedence, highest to lowest:
closure
concatenation
alternation
Special lex characters:
- \ / * + > " { } . $ ( ) | % [ ] ^
Special lex characters inside [ ]:
- \ [ ] ^
Examples
• a.*z
• (ab)+
• [0-9]{1,5}
• (ab|cd)?ef – matches abef, cdef, ef
• -?[0-9]\.[0-9]
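These example patterns can be tried out with the POSIX regex API, as sketched below in C. This is only an approximation: POSIX extended regular expressions share the operators used here but lack lex-specific features such as {name} definitions, quoted strings, and trailing context. The helper name full_match is an assumption for illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if `text` as a whole matches the extended regular expression
   `pattern`, 0 if it does not, and -1 if the pattern cannot be compiled. */
int full_match(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;

    /* Anchor the pattern so that the whole string has to match. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored) return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For instance, full_match("(ab|cd)?ef", "cdef") reports a match, while full_match("[0-9]{1,5}", "123456") does not because six digits exceed the upper bound.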
Lex Actions
Lex actions are C (C++, Java) code to implement
some required functionality
• Default action (for input that matches no rule) is to echo it to the output
• Can ignore input (empty action)
• ECHO – macro that prints out matched string
• yytext – matched string
• yyleng – length of matched string (not all versions
have this)
In Java: yytext() and yytext().length()
User Subroutines
• C/C++/Java code
• Copied directly into the lexer code
• User can supply ‘main’ or use default
int main() {
    yylex();
    printf("Characters %d, Words: %d, Lines: %d\n", charCount,
           wordCount, lineCount);
    return 0;
}
How Lex works
Lex works by processing the file one character at a
time, trying to match a string starting from that
character.
1. Lex always attempts to match the longest possible
string.
2. If two rules match strings of the same length, the first
rule in the specification is used.
Once it matches a string, it resumes scanning at the character
after the string.
Lex Matching Rules
1. Lex always attempts to match the longest
possible string.
beg {…}
begin {…}
in {…}
Input ‘begin’ can match either of the first two rules.
The second rule will be chosen because of the length.
Lex Matching Rules
2. If two rules are matched (the matched
strings are the same length), the first rule in
the specification is used.
begin {… }
[a-z]+ {…}
Input ‘begin’ can match both rules – the first one will be chosen
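The two matching rules can be sketched in C as a loop over candidate rules. This is a simplified illustration, not how Lex is implemented (Lex compiles all rules into one state machine); the Matcher type, pick_rule, and the rule tables RULES_LONGEST and RULES_TIE are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* A rule reports how many leading characters of `s` it matches (0 = no match). */
typedef size_t (*Matcher)(const char *s);

static size_t match_literal(const char *s, const char *lit) {
    size_t n = strlen(lit);
    return strncmp(s, lit, n) == 0 ? n : 0;
}

static size_t rule_beg(const char *s)   { return match_literal(s, "beg"); }
static size_t rule_begin(const char *s) { return match_literal(s, "begin"); }
static size_t rule_in(const char *s)    { return match_literal(s, "in"); }

static size_t rule_lower(const char *s) {      /* [a-z]+ */
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z') n++;
    return n;
}

/* Pick the rule with the longest match; on a tie the earliest rule wins.
   Returns the rule index, or -1 if no rule matches. */
int pick_rule(const Matcher *rules, int count, const char *s, size_t *len) {
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < count; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {      /* strictly longer only, so earlier rules keep ties */
            best = i;
            best_len = n;
        }
    }
    if (len != NULL) *len = best_len;
    return best;
}

/* Rule sets from the two examples, in specification order. */
const Matcher RULES_LONGEST[] = { rule_beg, rule_begin, rule_in };
const Matcher RULES_TIE[]     = { rule_begin, rule_lower };
```

On input "begin", RULES_LONGEST selects index 1 (the begin rule, 5 characters) and RULES_TIE selects index 0 (begin listed before [a-z]+). On "beginner", RULES_TIE selects index 1, since [a-z]+ matches 8 characters.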
Lex Example: Extracting white space
%{
#include <stdio.h>
%}
%%
[ \t\n] ;
. {ECHO;}
%%
To compile and run above (simple.l):
C:    flex simple.l
      gcc lex.yy.c -ll            (-lfl on some systems)
      a.out < input
C++:  flex simple.l
      g++ -x c++ lex.yy.c -ll     (-lfl on some systems)
      a.out < input
Lex Example: Extracting white space
(Java)
%%
%class ex0
%unicode
%line
%column
%standalone
%%
[^ \t\n] {System.out.print(yytext());}
. {}
[\n] {}
To compile and run above (simple.l):
java -jar ~cs540/JFlex.jar simple.l
javac ex0.java
java ex0 inputfile
Here ex0 is the name of the class to build.
Input:
This is a file
of stuff we want to extract all
white space from
Output:
Thisisafileofstuffwewanttoextractallwhitespacefrom
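For comparison, a hand-written C sketch of the same transformation. The function name strip_whitespace is an assumption; the comments tie each branch to the corresponding rule of the flex version.

```c
#include <stddef.h>

/* Copy `in` to `out` without blanks, tabs, and newlines.
   `out` must have room for strlen(in) + 1 bytes.
   Returns the number of characters written (excluding the terminator). */
size_t strip_whitespace(const char *in, char *out) {
    size_t n = 0;
    for (; *in != '\0'; in++) {
        if (*in == ' ' || *in == '\t' || *in == '\n') {
            continue;               /* rule: [ \t\n] ;   (empty action) */
        }
        out[n++] = *in;             /* rule: . {ECHO;} */
    }
    out[n] = '\0';
    return n;
}
```

Applied to the three input lines above, this leaves the 50 non-blank characters shown in the output.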
Lex (C/C++)
• Lex always creates a file ‘lex.yy.c’ with a
function yylex()
• -ll directs the compiler to link to the lex
library (-lfl on some systems)
• The lex library supplies external symbols
referenced by the generated code
• The lex library supplies a default main:
main(int ac,char **av) {return yylex(); }
Lex Example 2: Unix wc
%{
#include <stdio.h>
#include <string.h>
int charCount=0, wordCount=0, lineCount=0;
%}
word [^ \t\n]+
%%
{word} {wordCount++; charCount += strlen(yytext); }
[\n] {charCount++; lineCount++;}
. {charCount++;}
%%
int main() {
    yylex();
    printf("Characters %d, Words: %d, Lines: %d\n", charCount,
           wordCount, lineCount);
    return 0;
}
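To make the counting logic concrete, here is a hand-written C sketch that should produce the same three counts as the specification above for text held in a string. The Counts struct and count_text function are hypothetical names introduced for illustration.

```c
typedef struct {
    long chars;
    long words;
    long lines;
} Counts;

/* Count characters, words (maximal runs of non-blank characters), and newlines. */
Counts count_text(const char *text) {
    Counts c = { 0, 0, 0 };
    int in_word = 0;                     /* are we inside a {word} match? */

    for (; *text != '\0'; text++) {
        c.chars++;                       /* every rule adds to charCount */
        if (*text == '\n') c.lines++;    /* rule: [\n] */
        if (*text == ' ' || *text == '\t' || *text == '\n') {
            in_word = 0;                 /* white space ends the current word */
        } else if (!in_word) {
            in_word = 1;                 /* first character of a new word */
            c.words++;
        }
    }
    return c;
}
```

For "hello world\nfoo\n" this gives 16 characters, 3 words, and 2 lines.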
Lex Example 3: Extracting tokens
%%
and return(AND);
array return(ARRAY);
begin return(BEGIN);
...
\[ return('[');
":=" return(ASSIGN);
[a-zA-Z][a-zA-Z0-9_]* return(ID);
[+-]?[0-9]+ return(NUM);
[ \t\n] ;
%%
Uses for Lex
• Transforming Input – convert input from one
form to another (example 1). yylex() is called
once; return is not used in specification
• Extracting Information – scan the text and
return some information (example 2). yylex() is
called once; return is not used in specification.
• Extracting Tokens – standard use with
compiler (example 3). Uses return to give the next
token to the caller.
Lex States
• Regular expressions are compiled to state
machines.
• Lex allows the user to explicitly declare
multiple states.
%s COMMENT
• Default initial state INITIAL (0)
• Actions for matched strings may be
different for different states
Lex States
%{
int ctr = 0;
int linect = 1;
%}
%s COMMENT
%%
<INITIAL>. ECHO;
<INITIAL>[\n] {linect++; ECHO;}
<INITIAL>"/*" {BEGIN COMMENT; ctr = 1;}
<COMMENT>. ;
<COMMENT>[\n] linect++;
<COMMENT>"*/" {if (ctr == 1) BEGIN INITIAL;
               else ctr--;
              }
<COMMENT>"/*" {ctr++;}
%%
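The same two-state logic can be sketched by hand in C, which may help when tracing what the specification above does with nested comments. The names ScanState and strip_comments are assumptions; depth plays the role of ctr.

```c
#include <stddef.h>

typedef enum { INITIAL_STATE, COMMENT_STATE } ScanState;

/* Copy text outside (possibly nested) comments from `in` to `out`.
   `out` must have room for strlen(in) + 1 bytes.
   Returns the line counter, which starts at 1 like linect. */
int strip_comments(const char *in, char *out) {
    ScanState state = INITIAL_STATE;
    int depth = 0;                  /* nesting depth, like ctr */
    int linect = 1;
    size_t n = 0;

    while (*in != '\0') {
        if (in[0] == '/' && in[1] == '*') {
            /* comment opener: enter COMMENT, or nest one level deeper */
            if (state == INITIAL_STATE) { state = COMMENT_STATE; depth = 1; }
            else depth++;
            in += 2;
        } else if (state == COMMENT_STATE && in[0] == '*' && in[1] == '/') {
            /* comment closer: leave COMMENT only at the outermost level */
            if (depth == 1) state = INITIAL_STATE;
            else depth--;
            in += 2;
        } else {
            if (*in == '\n') linect++;                    /* counted in both states */
            if (state == INITIAL_STATE) out[n++] = *in;   /* ECHO */
            in++;
        }
    }
    out[n] = '\0';
    return linect;
}
```

As in the specification, a two-character opener or closer takes priority over the single-character rule, mirroring the longest-match rule.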
