Lex Yacc tutorial 
Kun-Yuan Hsieh 
kyshieh@pllab.cs.nthu.edu.tw 
Programming Language Lab., NTHU 
PLLab, NTHU, Cs2403 Programming Languages
Overview 
take a glance at Lex!
Compilation Sequence 
What is Lex? 
• The main job of a lexical analyzer 
(scanner) is to break up an input stream 
into more usable elements (tokens) 
a = b + c * d; 
ID ASSIGN ID PLUS ID MULT ID SEMI 
• Lex is a utility that helps you generate scanners rapidly
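To make the token stream above concrete, here is a minimal hand-written sketch in C of the work a generated scanner does for this one statement. It is not the code Lex produces. The names tokenize and append_name are hypothetical, and only the four operators from the example are recognized.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append a token name to `out`, separated by spaces.
   Returns 0 on success, -1 if the buffer is too small. */
static int append_name(char *out, size_t cap, const char *name) {
    size_t used = strlen(out);
    size_t need = strlen(name) + (used ? 1 : 0);
    if (used + need + 1 > cap) return -1;
    if (used) strcat(out, " ");
    strcat(out, name);
    return 0;
}

/* Hand-written stand-in for a generated scanner: writes the token names
   found in `src` into `out`. Returns the token count, or -1 on an
   unrecognized character or a full buffer. */
int tokenize(const char *src, char *out, size_t cap) {
    int count = 0;
    assert(src != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*src != '\0') {
        const char *name;
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) { src++; continue; }       /* skip blanks */
        if (isalpha(c)) {                           /* letter(letter|digit)* */
            while (isalnum((unsigned char)*src)) src++;
            name = "ID";
        } else {
            switch (c) {
            case '=': name = "ASSIGN"; break;
            case '+': name = "PLUS";   break;
            case '*': name = "MULT";   break;
            case ';': name = "SEMI";   break;
            default:  return -1;                    /* not in this sketch */
            }
            src++;
        }
        if (append_name(out, cap, name) != 0) return -1;
        count++;
    }
    return count;
}
```

Calling tokenize on the statement a = b + c * d; fills the buffer with the eight token names listed on the slide. With Lex, each branch of the switch becomes one regular-expression rule.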
Lex – Lexical Analyzer 
• Lexical analyzers tokenize input streams 
• Tokens are the terminals of a language 
– English 
• words, punctuation marks, … 
– Programming language 
• Identifiers, operators, keywords, … 
• Regular expressions define 
terminals/tokens
Lex Source Program 
• Lex source is a table of 
– regular expressions and 
– corresponding program fragments 
digit [0-9]
letter [a-zA-Z]
%%
{letter}({letter}|{digit})* printf("id: %s\n", yytext);
\n printf("new line\n");
%%
int main(void) {
    yylex();
    return 0;
}
Lex Source to C Program 
• The table is translated to a C program (lex.yy.c) which
– reads an input stream,
– partitions the input into strings that match the given expressions, and
– copies them to an output stream if necessary
An Overview of Lex 
Lex source program → Lex → lex.yy.c
lex.yy.c → C compiler → a.out
input → a.out → tokens
Lex Source
• Lex source is separated into three sections by %% delimiters
• The general format of Lex source is
{definitions}
%% (required)
{transition rules}
%% (optional)
{user subroutines}
• The absolute minimum Lex program is thus
%%
Lex vs. Yacc
• Lex 
– Lex generates C code for a lexical analyzer, or 
scanner 
– Lex uses patterns that match strings in the input 
and converts the strings to tokens 
• Yacc 
– Yacc generates C code for a syntax analyzer, or parser
– Yacc uses grammar rules that allow it to analyze tokens from Lex and create a syntax tree
Lex with Yacc
Lex source (Lexical Rules) → Lex → lex.yy.c (yylex())
Yacc source (Grammar Rules) → Yacc → y.tab.c (yyparse())
Input → yylex() → yyparse() → Parsed Input
• yyparse() calls yylex()
• yylex() returns a token
Regular Expressions 
Lex Regular Expressions 
(Extended Regular Expressions) 
• A regular expression matches a set of strings 
• Regular expression 
– Operators 
– Character classes 
– Arbitrary character 
– Optional expressions 
– Alternation and grouping 
– Context sensitivity 
– Repetitions and definitions
Operators
" \ [ ] ^ - ? . * + | ( ) $ / { } % < >
• If they are to be used as text characters, an escape should be used
\$ = "$"
\\ = "\"
• Every character but blank, tab (\t), newline (\n) and the list above is always a text character
Character Classes [] 
• [abc] matches a single character, which may 
be a, b, or c 
• Every operator meaning is ignored except \, - and ^
• e.g.
[ab] => a or b
[a-z] => a or b or c or … or z
[-+0-9] => all the digits and the two signs
[^a-zA-Z] => any character which is not a letter
Arbitrary Character . 
• To match almost any character, use the operator character . which is the class of all characters except newline
• [\40-\176] matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde ~)
Optional & Repeated Expressions
• a? => zero or one instance of a 
• a* => zero or more instances of a 
• a+ => one or more instances of a 
• E.g. 
ab?c => ac or abc 
[a-z]+ => all strings of lower case letters 
[a-zA-Z][a-zA-Z0-9]* => all 
alphanumeric strings with a leading 
alphabetic character
Precedence of Operators 
• Level of precedence 
– Kleene closure (*), ?, + 
– concatenation 
– alternation (|) 
• All operators are left associative. 
• Ex: a*b|cd* = ((a*)b)|(c(d*)) 
Pattern Matching Primitives 
Metacharacter   Matches
.               any character except newline
\n              newline
*               zero or more copies of the preceding expression
+               one or more copies of the preceding expression
?               zero or one copy of the preceding expression
^               beginning of line / complement
$               end of line
a|b             a or b
(ab)+           one or more copies of ab (grouping)
[ab]            a or b
a{3}            3 instances of a
"a+b"           literal "a+b" (C escapes still work)
Recall: Lex Source 
• Lex source is a table of
– regular expressions and
– corresponding program fragments (actions)
…
%%
<regexp> <action>
<regexp> <action>
…
%%
• Example
%%
"=" printf("operator: ASSIGNMENT");
Input: a = b + c;
Output: a operator: ASSIGNMENT b + c;
Transition Rules 
• regexp <one or more blanks> action (C code); 
• regexp <one or more blanks> { actions (C code) } 
• A null statement ; will ignore the input (no 
actions) 
[ \t\n] ;
– Causes the three spacing characters to be ignored
a = b + c;
d = b * c;
↓
a=b+c;d=b*c;
Transition Rules (cont’d) 
• Four special options for actions:
|, ECHO;, BEGIN, and REJECT;
• | indicates that the action for this rule is the same as the action for the next rule
– [ \t\n] ;
– " " |
"\t" |
"\n" ;
• Unmatched input is handled by a default action that ECHOes it from the input to the output
Transition Rules (cont’d) 
• REJECT
– Go on to the next alternative rule that matches the input
…
%%
pink {npink++; REJECT;}
ink {nink++; REJECT;}
pin {npin++; REJECT;}
. |
\n ;
%%
…
Lex Predefined Variables 
• yytext -- a string containing the lexeme 
• yyleng -- the length of the lexeme 
• yyin -- the input stream pointer 
– the default input of default main() is stdin 
• yyout -- the output stream pointer 
– the default output of default main() is stdout. 
• cs20: % ./a.out < inputfile > outfile
• E.g.
[a-z]+ printf("%s", yytext);
[a-z]+ ECHO;
[a-zA-Z]+ {words++; chars += yyleng;}
Lex Library Routines 
• yylex()
– The default main() contains a call of yylex()
• yymore()
– append the next matched text to the current yytext instead of replacing it
• yyless(n)
– retain the first n characters in yytext and push the rest back onto the input
• yywrap()
– is called whenever Lex reaches an end-of-file
– The default yywrap() always returns 1
Review of Lex Predefined Variables
Name                 Function
char *yytext         pointer to matched string
int yyleng           length of matched string
FILE *yyin           input stream pointer
FILE *yyout          output stream pointer
int yylex(void)      call to invoke lexer, returns token
yymore()             append the next match to the current yytext
yyless(n)            retain the first n characters in yytext
int yywrap(void)     wrapup, return 1 if done, 0 if not done
ECHO                 write matched string
REJECT               go to the next alternative rule
INITIAL              initial start condition
BEGIN condition      switch start condition
User Subroutines Section 
• You can use your Lex routines in the same 
ways you use routines in other programming 
languages. 
%{ 
void foo(); 
%} 
letter [a-zA-Z] 
%% 
{letter}+ foo(); 
%% 
… 
void foo() { 
… 
}
User Subroutines Section (cont’d)
• The section where main() is placed
%{
int counter = 0;
%}
letter [a-zA-Z]
%%
{letter}+ {printf("a word\n"); counter++;}
%%
int main(void) {
    yylex();
    printf("There are %d words in total\n", counter);
    return 0;
}
Usage 
• To run Lex on a source file, type 
lex scanner.l 
• It produces a file named lex.yy.c which is 
a C program for the lexical analyzer. 
• To compile lex.yy.c, type 
cc lex.yy.c -ll
• To run the lexical analyzer program, type 
./a.out < inputfile
Versions of Lex 
• AT&T -- lex 
http://www.combo.org/lex_yacc_page/lex.html
• GNU -- flex
http://www.gnu.org/manual/flex-2.5.4/flex.html
• a Win32 version of flex :
http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html
or Cygwin :
http://sources.redhat.com/cygwin/
• Lex implementations on different machines are not created equal.
Yacc - Yet Another Compiler-Compiler
Introduction 
• What is YACC?
– A tool that produces a parser for a given grammar.
– YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language generated by this grammar.
How YACC Works 
gram.y (file containing desired grammar in yacc format)
↓ yacc (yacc program)
y.tab.c (C source program created by yacc)
↓ cc or gcc (C compiler)
a.out (executable program that will parse grammar given in gram.y)
How YACC Works 
(1) Parser generation time
YACC source (*.y) → yacc → y.tab.c, y.tab.h, y.output
(2) Compile time
y.tab.c → C compiler/linker → a.out
(3) Run time
Token stream → a.out → Abstract Syntax Tree
A YACC File Example
%{
#include <stdio.h>
int yylex(void);        /* supplied by the Lex scanner */
int yyerror(char *s);
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
    | expression { printf("= %d\n", $1); }
    ;
expression: expression '+' NUMBER { $$ = $1 + $3; }
    | expression '-' NUMBER { $$ = $1 - $3; }
    | NUMBER { $$ = $1; }
    ;
%%
int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}
int main(void)
{
    yyparse();
    return 0;
}
Works with Lex
How does it work?
Works with Lex
• The parser calls yylex()
• The scanner matches [0-9]+ and reports that the next token is NUM
• The parser then sees NUM '+' NUM
YACC File Format 
%{ 
C declarations 
%} 
yacc declarations 
%% 
Grammar rules 
%% 
Additional C code 
– Comments enclosed in /* ... */ may appear in 
any of the sections.
Definitions Section 
%{ 
#include <stdio.h> 
#include <stdlib.h> 
%} 
%token ID NUM /* ID and NUM are terminals */
%start expr /* parsing starts from expr */
Start Symbol 
• By default, the start symbol is the first non-terminal specified in the grammar rules section.
• To override it, use a %start declaration.
%start non-terminal
Rules Section 
• This section defines the grammar
• Example 
expr : expr '+' term | term; 
term : term '*' factor | factor; 
factor : '(' expr ')' | ID | NUM;
Rules Section 
• Normally written like this 
• Example: 
expr : expr '+' term 
| term 
; 
term : term '*' factor 
| factor 
; 
factor : '(' expr ')' 
| ID 
| NUM 
;
The Position of Rules 
expr : expr '+' term { $$ = $1 + $3; } 
| term { $$ = $1; } 
; 
term : term '*' factor { $$ = $1 * $3; } 
| factor { $$ = $1; } 
; 
factor : '(' expr ')' { $$ = $2; } 
| ID 
| NUM 
;
The Position of Rules (cont’d)
• $1 is the value of the first symbol on the right-hand side: expr in expr '+' term, term in term '*' factor, '(' in '(' expr ')'
• $2 is the value of the second symbol: '+', '*', and expr in '(' expr ')'
• $3 is the value of the third symbol: term, factor, and ')'
• $$ is the value of the left-hand side
• Default: $$ = $1;
Communication between LEX and YACC
• The parser calls yylex()
• The scanner matches [0-9]+ and reports that the next token is NUM
• The parser then sees NUM '+' NUM
• LEX and YACC need an agreed way to identify each token
Communication between LEX and YACC
• Use an enumeration or #define for the token codes
• One side generates the definitions and the other side includes them
• YACC generates y.tab.h
• LEX includes y.tab.h
yacc -d gram.y
Will produce:
y.tab.h
Communication between LEX and YACC
scanner.l
%{
#include <stdio.h>
#include "y.tab.h"
%}
id [_a-zA-Z][_a-zA-Z0-9]*
%%
int { return INT; }
char { return CHAR; }
float { return FLOAT; }
{id} { return ID; }
parser.y
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token CHAR FLOAT ID INT
%%
yacc -d xxx.y
Produced y.tab.h:
# define CHAR 258
# define FLOAT 259
# define ID 260
# define INT 261
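The shared header can be illustrated with a small sketch in plain C. The four #define values are copied from the y.tab.h shown on the slide. The function token_code is a hypothetical stand-in for the scanner rules, not generated code, and it does not check that an identifier matches the id pattern.

```c
#include <assert.h>
#include <string.h>

/* Token codes copied from the y.tab.h shown on the slide. */
#define CHAR  258
#define FLOAT 259
#define ID    260
#define INT   261

/* Hypothetical stand-in for the scanner rules: map one lexeme to the
   token code that yylex() would return to the parser. */
int token_code(const char *lexeme) {
    assert(lexeme != NULL);
    if (strcmp(lexeme, "int") == 0)   return INT;
    if (strcmp(lexeme, "char") == 0)  return CHAR;
    if (strcmp(lexeme, "float") == 0) return FLOAT;
    return ID;   /* anything else is treated as an identifier here */
}
```

Because both sides compile against the same numbers, the parser can compare the integer returned by the scanner directly with the token names used in its grammar.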
YACC
• Rules may be recursive
• Rules may be ambiguous*
Phrase -> cart_animal AND CART
| work_animal AND PLOW…
• Uses bottom-up shift/reduce parsing
– Get a token
– Push it onto the stack
– Can the stack be reduced? (How do we know?)
• If yes: reduce using a rule
• If no: get another token
• Yacc cannot look ahead more than one token
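The shift/reduce loop above can be sketched by hand in C for the tiny grammar expr : expr '+' NUM | expr '-' NUM | NUM. This is only an illustration: yacc decides when to reduce from generated LALR(1) tables and one lookahead token, whereas this sketch simply inspects the top of the stack, which is enough for this grammar. The names shift_reduce, struct entry and enum sym are hypothetical, and integer overflow is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum sym { NUM, PLUS, MINUS, EXPR };

struct entry { enum sym kind; int value; };

/* Parse numbers joined by '+' and '-' with an explicit stack.
   Returns 0 and stores the value in *result, or -1 on a syntax error. */
int shift_reduce(const char *src, int *result) {
    struct entry stack[64];
    size_t top = 0;                      /* number of entries in use */
    assert(src != NULL && result != NULL);
    for (;;) {
        /* Reduce: expr '+' NUM or expr '-' NUM on top of the stack. */
        if (top >= 3 && stack[top - 3].kind == EXPR
            && (stack[top - 2].kind == PLUS || stack[top - 2].kind == MINUS)
            && stack[top - 1].kind == NUM) {
            int lhs = stack[top - 3].value;
            int rhs = stack[top - 1].value;
            stack[top - 3].value =
                (stack[top - 2].kind == PLUS) ? lhs + rhs : lhs - rhs;
            top -= 2;
            continue;
        }
        /* Reduce: a lone NUM becomes an expr. */
        if (top == 1 && stack[0].kind == NUM) {
            stack[0].kind = EXPR;
            continue;
        }
        /* Nothing to reduce: shift the next token, if any. */
        while (*src == ' ') src++;
        if (*src == '\0') break;
        if (top == 64) return -1;
        if (isdigit((unsigned char)*src)) {
            int v = 0;
            while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
            stack[top].kind = NUM;
            stack[top].value = v;
        } else if (*src == '+' || *src == '-') {
            stack[top].kind = (*src == '+') ? PLUS : MINUS;
            stack[top].value = 0;
            src++;
        } else {
            return -1;                   /* unknown character */
        }
        top++;
    }
    /* Accept only if the whole input reduced to a single expr. */
    if (top == 1 && stack[0].kind == EXPR) {
        *result = stack[0].value;
        return 0;
    }
    return -1;
}
```

For the input 4 + 6 the stack goes NUM, expr, expr '+', expr '+' NUM, and finally a single expr holding 10.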
Yacc Example 
• Taken from Lex & Yacc 
• Simple calculator 
a = 4 + 6 
a 
a=10 
b = 7 
c = a + b 
c 
c = 17 
$
Grammar 
expression ::= expression '+' term | 
expression '-' term | 
term 
term ::= term '*' factor | 
term '/' factor | 
factor 
factor ::= '(' expression ')' | 
'-' factor | 
NUMBER | 
NAME
Parser (cont’d)
parser.y
statement_list: statement '\n'
    | statement_list statement '\n'
    ;
statement: NAME '=' expression { $1->value = $3; }
    | expression { printf("= %g\n", $1); }
    ;
expression: expression '+' term { $$ = $1 + $3; }
    | expression '-' term { $$ = $1 - $3; }
    | term
    ;
Parser (cont’d) 
term: term '*' factor { $$ = $1 * $3; } 
| term '/' factor { if ($3 == 0.0) 
yyerror("divide by zero"); 
else 
$$ = $1 / $3; 
} 
| factor 
; 
factor: '(' expression ')' { $$ = $2; } 
| '-' factor { $$ = -$2; } 
| NUMBER { $$ = $1; } 
| NAME { $$ = $1->value; } 
; 
%%
Scanner
scanner.l
%{
#include "y.tab.h"
#include "parser.h"
#include <math.h>
#include <stdlib.h> /* atof */
%}
%%
([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {
    yylval.dval = atof(yytext);
    return NUMBER;
}
[ \t] ; /* ignore white space */
Scanner (cont’d)
scanner.l
[A-Za-z][A-Za-z0-9]* { /* return symbol pointer */
    yylval.symp = symlook(yytext);
    return NAME;
}
"$" { return 0; /* end of input */ }
\n|"="|"+"|"-"|"*"|"/" return yytext[0];
%%
YACC Command 
• Yacc (AT&T)
– yacc -d xxx.y
• Bison (GNU)
– bison -d -y xxx.y
– With -y, bison produces y.tab.c, the same as yacc
– Without -y, it produces xxx.tab.c
Precedence / Associativity
(1) 1 - 2 - 3
(2) 1 - 2 * 3
1. 1-2-3 = (1-2)-3 or 1-(2-3)?
Define the '-' operator to be left-associative, so 1-2-3 = (1-2)-3.
2. 1-2*3 = 1-(2*3)
Define the '*' operator to have higher precedence than the '-' operator.
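The two questions above can be checked with a short precedence-climbing evaluator in C. This is a different technique from the one yacc uses (yacc encodes precedence and associativity in its parse tables), and every name here (evaluate, parse_expr, precedence) is hypothetical. The sketch assumes well-formed input made of unsigned integers and the four binary operators, and it returns 0 for a division by zero instead of reporting an error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Binding strength of a binary operator; 0 means "not an operator".
   '*' and '/' bind tighter than '+' and '-'. */
static int precedence(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static const char *skip_blanks(const char *p) {
    while (*p == ' ') p++;
    return p;
}

/* Parse an unsigned integer literal and advance *pp past it. */
static int parse_number(const char **pp) {
    const char *p = skip_blanks(*pp);
    int v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    *pp = p;
    return v;
}

/* Precedence climbing: only operators with precedence >= min_prec are
   consumed. Recursing with prec + 1 makes every operator
   left-associative. */
static int parse_expr(const char **pp, int min_prec) {
    int lhs = parse_number(pp);
    for (;;) {
        const char *p = skip_blanks(*pp);
        char op = *p;
        int prec = precedence(op);
        int rhs;
        if (prec == 0 || prec < min_prec) return lhs;
        *pp = p + 1;
        rhs = parse_expr(pp, prec + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        default:  lhs = (rhs != 0) ? lhs / rhs : 0; break;   /* '/' */
        }
    }
}

/* Evaluate a whole expression string. */
int evaluate(const char *src) {
    assert(src != NULL);
    return parse_expr(&src, 1);
}
```

With left associativity 1-2-3 evaluates as (1-2)-3 = -4, and because '*' binds tighter than '-', 1-2*3 evaluates as 1-(2*3) = -5.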
Precedence / Associativity
%right '='
%left '<' '>' NE LE GE
%left '+' '-'
%left '*' '/'
(the last declaration has the highest precedence)
Precedence / Associativity
%left '+' '-'
%left '*' '/'
%nonassoc UMINUS
expr : expr '+' expr { $$ = $1 + $3; }
    | expr '-' expr { $$ = $1 - $3; }
    | expr '*' expr { $$ = $1 * $3; }
    | expr '/' expr
    {
        if ($3 == 0)
            yyerror("divide by 0");
        else
            $$ = $1 / $3;
    }
    | '-' expr %prec UMINUS { $$ = -$2; }
    ;
Shift/Reduce Conflicts 
• shift/reduce conflict
– occurs when a grammar is written in such a way that a decision between shifting and reducing cannot be made.
– e.g. the IF-ELSE (dangling else) ambiguity.
• By default, yacc resolves this conflict by choosing to shift.
YACC Declaration Summary
`%start'
Specify the grammar's start symbol
`%union'
Declare the collection of data types that semantic values may have
`%token'
Declare a terminal symbol (token type name) with no precedence or associativity specified
`%type'
Declare the type of semantic values for a nonterminal symbol
YACC Declaration Summary (cont’d)
`%right'
Declare a terminal symbol (token type name) that is right-associative
`%left'
Declare a terminal symbol (token type name) that is left-associative
`%nonassoc'
Declare a terminal symbol (token type name) that is nonassociative
(using it in a way that would be associative is a syntax error, e.g. x op y op z is a syntax error)
Reference Books 
• lex & yacc, 2nd Edition 
– by John R. Levine, Tony Mason & Doug Brown
– O’Reilly 
– ISBN: 1-56592-000-7 
• Mastering Regular Expressions 
– by Jeffrey E.F. Friedl 
– O’Reilly 
– ISBN: 1-56592-257-3

Lex and Yacc ppt

  • 1. Lex Yacc tutorial Kun-Yuan Hsieh kyshieh@pllab.cs.nthu.edu.tw Programming Language Lab., NTHU PLLab, NTHU,Cs2403 Programming Languages 1
  • 2. PLLab, NTHU,Cs2403 Programming Languages 2 Overview take a glance at Lex!
  • 3. Compilation Sequence PLLab, NTHU,Cs2403 Programming Languages 3
  • 4. PLLab, NTHU,Cs2403 Programming Languages 4 What is Lex? • The main job of a lexical analyzer (scanner) is to break up an input stream into more usable elements (tokens) a = b + c * d; ID ASSIGN ID PLUS ID MULT ID SEMI • Lex is an utility to help you rapidly generate your scanners
  • 5. Lex – Lexical Analyzer • Lexical analyzers tokenize input streams • Tokens are the terminals of a language – English PLLab, NTHU,Cs2403 Programming Languages 5 • words, punctuation marks, … – Programming language • Identifiers, operators, keywords, … • Regular expressions define terminals/tokens
  • 6. Lex Source Program PLLab, NTHU,Cs2403 Programming Languages 6 • Lex source is a table of – regular expressions and – corresponding program fragments digit [0-9] letter [a-zA-Z] %% {letter}({letter}|{digit})* printf(“id: %sn”, yytext); n printf(“new linen”); %% main() { yylex(); }
  • 7. Lex Source to C Program • The table is translated to a C program (lex.yy.c) which – reads an input stream – partitioning the input into strings which match the given expressions and – copying it to an output stream if necessary PLLab, NTHU,Cs2403 Programming Languages 7
  • 8. An Overview of Lex PLLab, NTHU,Cs2403 Programming Languages 8 Lex C compiler a.out Lex source program lex.yy.c input lex.yy.c a.out tokens
  • 9. (required) (optional) {definitions} %% {transition rules} %% {user subroutines} PLLab, NTHU,Cs2403 Programming Languages 9 Lex Source • Lex source is separated into three sections by % % delimiters • The general format of Lex source is • The %% absolute minimum Lex program is thus
  • 10. PLLab, NTHU,Cs2403 Programming Languages 10 Lex v.s. Yacc • Lex – Lex generates C code for a lexical analyzer, or scanner – Lex uses patterns that match strings in the input and converts the strings to tokens • Yacc – Yacc generates C code for syntax analyzer, or parser. – Yacc uses grammar rules that allow it to analyze tokens from Lex and create a syntax tree.
  • 11. Lex source (Lexical Rules) Yacc source (Grammar Rules) call Input Parsed PLLab, NTHU,Cs2403 Programming Languages 11 Lex with Yacc Lex Yacc yylex() yyparse() Input lex.yy.c y.tab.c return token
  • 12. Regular Expressions PLLab, NTHU,Cs2403 Programming Languages 12
  • 13. Lex Regular Expressions (Extended Regular Expressions) • A regular expression matches a set of strings • Regular expression PLLab, NTHU,Cs2403 Programming Languages 13 – Operators – Character classes – Arbitrary character – Optional expressions – Alternation and grouping – Context sensitivity – Repetitions and definitions
  • 14. PLLab, NTHU,Cs2403 Programming Languages 14 Operators “ [ ] ^ - ? . * + | ( ) $ / { } % < > • If they are to be used as text characters, an escape should be used $ = “$” = “” • Every character but blank, tab (t), newline (n) and the list above is always a text character
  • 15. Character Classes [] • [abc] matches a single character, which may be a, b, or c • Every operator meaning is ignored except - and ^ • e.g. [ab] => a or b [a-z] => a or b or c or … or z [-+0-9] => all the digits and the two signs [^a-zA-Z] => any character which is not a PLLab, NTHU,Cs2403 Programming Languages 15 letter
  • 16. Arbitrary Character . • To match almost character, the operator character . is the class of all characters except newline • [40-176] matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde~) PLLab, NTHU,Cs2403 Programming Languages 16
  • 17. Optional & Repeated PLLab, NTHU,Cs2403 Programming Languages 17 Expressions • a? => zero or one instance of a • a* => zero or more instances of a • a+ => one or more instances of a • E.g. ab?c => ac or abc [a-z]+ => all strings of lower case letters [a-zA-Z][a-zA-Z0-9]* => all alphanumeric strings with a leading alphabetic character
  • 18. Precedence of Operators • Level of precedence – Kleene closure (*), ?, + – concatenation – alternation (|) • All operators are left associative. • Ex: a*b|cd* = ((a*)b)|(c(d*)) PLLab, NTHU,Cs2403 Programming Languages 18
  • 19. Pattern Matching Primitives Metacharacter Matches . any character except newline n newline * zero or more copies of the preceding expression + one or more copies of the preceding expression ? zero or one copy of the preceding expression ^ beginning of line / complement $ end of line a|b a or b (ab)+ one or more copies of ab (grouping) [ab] a or b a{3} 3 instances of a “a+b” literal “a+b” (C escapes still work) PLLab, NTHU,Cs2403 Programming Languages 19
  • 20. Recall: Lex Source a = b + c; a operator: ASSIGNMENT b + c; PLLab, NTHU,Cs2403 Programming Languages 20 • Lex source is a table of – regular expressions and – corresponding program fragments (actions) … %% <regexp> <action> <regexp> <action> … %% %% “=“ printf(“operator: ASSIGNMENT”);
  • 21. Transition Rules • regexp <one or more blanks> action (C code); • regexp <one or more blanks> { actions (C code) } • A null statement ; will ignore the input (no actions) [ tn] ; – Causes the three spacing characters to be ignored a = b + c; PLLab, NTHU,Cs2403 Programming Languages 21 d = b * c; ↓ ↓ a=b+c;d=b*c;
  • 22. Transition Rules (cont’d) • Four special options for actions: |, ECHO;, BEGIN, and REJECT; • | indicates that the action for this rule is from the action for the next rule PLLab, NTHU,Cs2403 Programming Languages 22 – [ tn] ; – “ “ | “t” | “n” ; • The unmatched token is using a default action that ECHO from the input to the output
  • 23. Transition Rules (cont’d) PLLab, NTHU,Cs2403 Programming Languages 23 • REJECT – Go do the next alternative … %% pink {npink++; REJECT;} ink {nink++; REJECT;} pin {npin++; REJECT;} . | n ; %% …
  • 24. Lex Predefined Variables PLLab, NTHU,Cs2403 Programming Languages 24 • yytext -- a string containing the lexeme • yyleng -- the length of the lexeme • yyin -- the input stream pointer – the default input of default main() is stdin • yyout -- the output stream pointer – the default output of default main() is stdout. • cs20: %./a.out < inputfile > outfile • E.g. [a-z]+ printf(“%s”, yytext); [a-z]+ ECHO; [a-zA-Z]+ {words++; chars += yyleng;}
  • 25. Lex Library Routines PLLab, NTHU,Cs2403 Programming Languages 25 • yylex() – The default main() contains a call of yylex() • yymore() – return the next token • yyless(n) – retain the first n characters in yytext • yywarp() – is called whenever Lex reaches an end-of-file – The default yywarp() always returns 1
  • 26. Review of Lex Predefined PLLab, NTHU,Cs2403 Programming Languages 26 Variables Name Function char *yytext pointer to matched string int yyleng length of matched string FILE *yyin input stream pointer FILE *yyout output stream pointer int yylex(void) call to invoke lexer, returns token char* yymore(void) return the next token int yyless(int n) retain the first n characters in yytext int yywrap(void) wrapup, return 1 if done, 0 if not done ECHO write matched string REJECT go to the next alternative rule INITAL initial start condition BEGIN condition switch start condition
  • 27. User Subroutines Section • You can use your Lex routines in the same ways you use routines in other programming languages. PLLab, NTHU,Cs2403 Programming Languages 27 %{ void foo(); %} letter [a-zA-Z] %% {letter}+ foo(); %% … void foo() { … }
  • 28. User Subroutines Section PLLab, NTHU,Cs2403 Programming Languages 28 (cont’d) • The section where main() is placed %{ int counter = 0; %} letter [a-zA-Z] %% {letter}+ {printf(“a wordn”); counter++;} %% main() { yylex(); printf(“There are total %d wordsn”, counter); }
  • 29. PLLab, NTHU,Cs2403 Programming Languages 29 Usage • To run Lex on a source file, type lex scanner.l • It produces a file named lex.yy.c which is a C program for the lexical analyzer. • To compile lex.yy.c, type cc lex.yy.c –ll • To run the lexical analyzer program, type ./a.out < inputfile
  • 30. PLLab, NTHU,Cs2403 Programming Languages 30 Versions of Lex • AT&T -- lex http://www.combo.org/lex_yacc_page/lex.html • GNU -- flex http://www.gnu.org/manual/flex-2.5.4/flex.html • a Win32 version of flex : http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html or Cygwin : http://sources.redhat.com/cygwin/ • Lex on different machines is not created equal.
  • 31. Yacc - Yet Another Compiler- PLLab, NTHU,Cs2403 Programming Languages 31 Compiler
  • 32. PLLab, NTHU,Cs2403 Programming Languages 32 Introduction • What is YACC ? – Tool which will produce a parser for a given grammar. – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language produced by this grammar.
  • 33. How YACC Works
      gram.y       File containing the desired grammar in yacc format
        ↓ yacc     (yacc program)
      y.tab.c      C source program created by yacc
        ↓ cc or gcc  (C compiler)
      a.out        Executable program that will parse the grammar given in gram.y
  • 34. How YACC Works
      (1) Parser generation time: YACC source (*.y) → yacc → y.tab.h, y.tab.c, y.output
      (2) Compile time: y.tab.c → C compiler/linker → a.out
      (3) Run time: token stream → a.out → abstract syntax tree
  • 35. A YACC File Example
      %{
      #include <stdio.h>
      int yylex(void);          /* supplied by the scanner */
      int yyerror(char *s);
      %}
      %token NAME NUMBER
      %%
      statement: NAME '=' expression
               | expression              { printf("= %d\n", $1); }
               ;
      expression: expression '+' NUMBER  { $$ = $1 + $3; }
                | expression '-' NUMBER  { $$ = $1 - $3; }
                | NUMBER                 { $$ = $1; }
                ;
      %%
      int yyerror(char *s)
      {
          fprintf(stderr, "%s\n", s);
          return 0;
      }
      int main(void)
      {
          yyparse();
          return 0;
      }
  • 36. Works with Lex: how do Lex and Yacc work together?
  • 37. Works with Lex
      • YACC calls yylex()
      • LEX matches [0-9]+ and reports that the next token is NUM
      • Token stream seen by YACC: NUM '+' NUM
  • 38. YACC File Format
      %{
      C declarations
      %}
      yacc declarations
      %%
      Grammar rules
      %%
      Additional C code
      – Comments enclosed in /* ... */ may appear in any of the sections.
  • 39. Definitions Section
      %{
      #include <stdio.h>
      #include <stdlib.h>
      %}
      %token ID NUM     /* ID and NUM are terminals */
      %start expr       /* parsing starts from expr */
  • 40. Start Symbol
      • By default, the start symbol is the first non-terminal specified in the grammar specification section.
      • To override it, use the %start declaration:
            %start non-terminal
  • 41. Rules Section
      • This section defines the grammar
      • Example
            expr : expr '+' term | term; term : term '*' factor | factor; factor : '(' expr ')' | ID | NUM;
  • 42. Rules Section
      • Normally written like this
      • Example:
            expr   : expr '+' term
                   | term
                   ;
            term   : term '*' factor
                   | factor
                   ;
            factor : '(' expr ')'
                   | ID
                   | NUM
                   ;
  • 43. The Position of Rules
      expr   : expr '+' term      { $$ = $1 + $3; }
             | term               { $$ = $1; }
             ;
      term   : term '*' factor    { $$ = $1 * $3; }
             | factor             { $$ = $1; }
             ;
      factor : '(' expr ')'       { $$ = $2; }
             | ID
             | NUM
             ;
  • 44. The Position of Rules: $1 is the value of the first symbol on the right-hand side of a rule. In the grammar of slide 43 that is expr in expr '+' term, term in term '*' factor, '(' in '(' expr ')', and the single symbol in the rules term and factor.
  • 45. The Position of Rules: $2 is the value of the second symbol on the right-hand side. In the grammar of slide 43 that is '+' in expr '+' term, '*' in term '*' factor, and expr in '(' expr ')'.
  • 46. The Position of Rules: $3 is the value of the third symbol on the right-hand side. In the grammar of slide 43 that is term in expr '+' term, factor in term '*' factor, and ')' in '(' expr ')'. When a rule has no action, the default is $$ = $1;
  • 47. Communication between LEX and YACC
      • YACC calls yylex()
      • LEX matches [0-9]+ and reports that the next token is NUM
      • Token stream seen by YACC: NUM '+' NUM
      • LEX and YACC need an agreed way to identify each token.
  • 48. Communication between LEX and YACC
      • Use an enumeration or #define for the token codes
      • One side generates the definitions and the other side includes them
      • YACC generates y.tab.h:
            yacc -d gram.y
      • LEX includes y.tab.h
  • 49. Communication between LEX and YACC
      scanner.l:
            %{
            #include <stdio.h>
            #include "y.tab.h"
            %}
            id [_a-zA-Z][_a-zA-Z0-9]*
            %%
            int      { return INT; }
            char     { return CHAR; }
            float    { return FLOAT; }
            {id}     { return ID; }
      parser.y:
            %{
            #include <stdio.h>
            #include <stdlib.h>
            %}
            %token CHAR FLOAT ID INT
            %%
      Running yacc -d xxx.y produces y.tab.h:
            # define CHAR 258
            # define FLOAT 259
            # define ID 260
            # define INT 261
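To make the role of the shared token codes concrete, here is a hand-written C sketch of what the four scanner rules on slide 49 decide for a single word. It is an illustration under stated assumptions, not generated code: classify_word is a hypothetical helper, the #define values are copied from the y.tab.h shown on the slide (a real build would include that header instead), and the input is assumed to be one complete word.

```c
#include <ctype.h>
#include <string.h>

/* Token codes as listed in the y.tab.h on slide 49. */
#define CHAR  258
#define FLOAT 259
#define ID    260
#define INT   261

/* Hypothetical helper: returns the token code for one whole word,
 * or 0 if the word matches none of the four rules. */
int classify_word(const char *word)
{
    size_t i;

    /* The keyword rules come first in scanner.l, so they win over {id}
     * when both match the same text. */
    if (strcmp(word, "int") == 0)   return INT;
    if (strcmp(word, "char") == 0)  return CHAR;
    if (strcmp(word, "float") == 0) return FLOAT;

    /* id: [_a-zA-Z][_a-zA-Z0-9]* */
    if (!(isalpha((unsigned char)word[0]) || word[0] == '_'))
        return 0;
    for (i = 1; word[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)word[i]) || word[i] == '_'))
            return 0;
    }
    return ID;
}
```

The parser only ever sees the integer codes, which is why both sides must agree on them: classify_word("int") yields INT (261), while classify_word("integer") yields ID (260) because the longer identifier match is not a keyword.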
  • 50. YACC
      • Rules may be recursive
      • Rules may be ambiguous*
      • Uses bottom-up shift/reduce parsing
        – Get a token
        – Push it onto the stack
        – Can it be reduced? (How do we know?)
          • If yes: reduce using a rule
          • If no: get another token
      • Yacc cannot look ahead more than one token
      * Example of an ambiguous rule: Phrase -> cart_animal AND CART | work_animal AND PLOW…
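The shift/reduce loop described on slide 50 can be sketched by hand for the small grammar of slide 35 (expression: expression '+' NUMBER | expression '-' NUMBER | NUMBER). This is a simplified illustration, not the table-driven LALR(1) parser that yacc generates: parse_sum, the SYM_* names and the fixed stack size are assumptions made for the example, and the scanner is folded into the loop.

```c
#include <ctype.h>

#define STACK_MAX 64

enum { SYM_EXPR, SYM_NUM, SYM_PLUS, SYM_MINUS };

struct entry { int sym; int val; };

/* Hypothetical helper: parses and evaluates e.g. "1+2-3".
 * Returns 0 and stores the value in *result, or -1 on a syntax error. */
int parse_sum(const char *s, int *result)
{
    struct entry stack[STACK_MAX];
    int top = 0;                       /* number of entries on the stack */

    while (*s != '\0') {
        if (*s == ' ') { s++; continue; }
        if (top == STACK_MAX) return -1;

        /* Shift: get a token and push it onto the stack. */
        if (isdigit((unsigned char)*s)) {
            int v = 0;
            while (isdigit((unsigned char)*s)) {
                v = v * 10 + (*s - '0');
                s++;
            }
            stack[top].sym = SYM_NUM;
            stack[top].val = v;
        } else if (*s == '+' || *s == '-') {
            stack[top].sym = (*s == '+') ? SYM_PLUS : SYM_MINUS;
            stack[top].val = 0;
            s++;
        } else {
            return -1;
        }
        top++;

        /* Reduce: does the top of the stack match a right-hand side? */
        if (top >= 3 && stack[top - 1].sym == SYM_NUM &&
            (stack[top - 2].sym == SYM_PLUS || stack[top - 2].sym == SYM_MINUS) &&
            stack[top - 3].sym == SYM_EXPR) {
            /* expression: expression op NUMBER, i.e. $$ = $1 op $3 */
            int lhs = stack[top - 3].val, rhs = stack[top - 1].val;
            stack[top - 3].val =
                (stack[top - 2].sym == SYM_PLUS) ? lhs + rhs : lhs - rhs;
            top -= 2;
        } else if (top == 1 && stack[0].sym == SYM_NUM) {
            /* expression: NUMBER, i.e. $$ = $1 */
            stack[0].sym = SYM_EXPR;
        }
    }

    /* Accept only if everything was reduced to a single expression. */
    if (top == 1 && stack[0].sym == SYM_EXPR) {
        *result = stack[0].val;
        return 0;
    }
    return -1;
}
```

For the input "1+2-3" the stack goes NUMBER, expression, expression '+', expression '+' NUMBER, expression, and so on, ending with a single expression of value 0; an input such as "1+" leaves two entries on the stack and is rejected.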
  • 51. Yacc Example
      • Taken from Lex & Yacc
      • Simple calculator
            a = 4 + 6
            a
            a=10
            b = 7
            c = a + b
            c
            c = 17
            $
  • 52. Grammar
      expression ::= expression '+' term
                   | expression '-' term
                   | term
      term       ::= term '*' factor
                   | term '/' factor
                   | factor
      factor     ::= '(' expression ')'
                   | '-' factor
                   | NUMBER
                   | NAME
  • 53. Parser (cont’d): parser.y
      statement_list: statement '\n'
                    | statement_list statement '\n'
                    ;
      statement: NAME '=' expression   { $1->value = $3; }
               | expression            { printf("= %g\n", $1); }
               ;
      expression: expression '+' term  { $$ = $1 + $3; }
                | expression '-' term  { $$ = $1 - $3; }
                | term
                ;
  • 54. Parser (cont’d): parser.y
      term: term '*' factor   { $$ = $1 * $3; }
          | term '/' factor   { if ($3 == 0.0)
                                    yyerror("divide by zero");
                                else
                                    $$ = $1 / $3;
                              }
          | factor
          ;
      factor: '(' expression ')'  { $$ = $2; }
            | '-' factor          { $$ = -$2; }
            | NUMBER              { $$ = $1; }
            | NAME                { $$ = $1->value; }
            ;
      %%
  • 55. Scanner: scanner.l
      %{
      #include "y.tab.h"
      #include "parser.h"
      #include <math.h>
      #include <stdlib.h>    /* atof() */
      %}
      %%
      ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {
              yylval.dval = atof(yytext);
              return NUMBER;
          }
      [ \t]   ;    /* ignore white space */
  • 56. Scanner (cont’d): scanner.l
      [A-Za-z][A-Za-z0-9]* {    /* return symbol pointer */
              yylval.symp = symlook(yytext);
              return NAME;
          }
      "$"     { return 0; /* end of input */ }
      \n|"="|"+"|"-"|"*"|"/"    return yytext[0];
      %%
  • 57. YACC Command
      • Yacc (AT&T)
        – yacc -d xxx.y
      • Bison (GNU)
        – bison -d -y xxx.y
        – With -y, bison produces y.tab.c, the same as yacc; otherwise it produces xxx.tab.c.
  • 58. Precedence / Associativity
      (1) 1 - 2 - 3
      (2) 1 - 2 * 3
      1. Is 1-2-3 equal to (1-2)-3 or to 1-(2-3)? Declare the '-' operator as left-associative, so that it means (1-2)-3.
      2. 1-2*3 = 1-(2*3): declare the '*' operator to have higher precedence than the '-' operator.
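The two groupings of 1-2-3 give different answers, which is why the choice has to be declared. The following C sketch simply evaluates both readings; subtract_left and subtract_right are hypothetical helper names, and both assume the array holds at least one operand.

```c
#include <stddef.h>

/* Left-associative reading: ((v[0] - v[1]) - v[2]) - ... */
int subtract_left(const int *v, size_t n)
{
    size_t i;
    int acc = v[0];

    for (i = 1; i < n; i++)
        acc = acc - v[i];
    return acc;
}

/* Right-associative reading: v[0] - (v[1] - (v[2] - ...)) */
int subtract_right(const int *v, size_t n)
{
    size_t i;
    int acc = v[n - 1];

    for (i = n - 1; i > 0; i--)
        acc = v[i - 1] - acc;
    return acc;
}
```

For the operands 1, 2, 3 the left-associative reading gives (1-2)-3 = -4 and the right-associative reading gives 1-(2-3) = 2. Precedence works the same way for mixed operators: 1-(2*3) is -5, whereas (1-2)*3 would be -3.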
  • 59. Precedence / Associativity
      %right '='
      %left '<' '>' NE LE GE
      %left '+' '-'
      %left '*' '/'        /* highest precedence */
  • 60. Precedence / Associativity
      %left '+' '-'
      %left '*' '/'
      %nonassoc UMINUS

      expr : expr '+' expr  { $$ = $1 + $3; }
           | expr '-' expr  { $$ = $1 - $3; }
           | expr '*' expr  { $$ = $1 * $3; }
           | expr '/' expr  { if ($3 == 0)
                                  yyerror("divide 0");
                              else
                                  $$ = $1 / $3;
                            }
           | '-' expr %prec UMINUS  { $$ = -$2; }
  • 61. Shift/Reduce Conflicts
      • A shift/reduce conflict
        – occurs when a grammar is written in such a way that the parser cannot decide between shifting and reducing.
        – Example: the IF-ELSE ambiguity.
      • To resolve this conflict, yacc chooses to shift.
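The IF-ELSE ambiguity arises for input of the form "if a if b X else Y": the else could belong to either if. The C sketch below writes out both readings with explicit braces so the difference is visible. The function names and result codes are hypothetical, chosen only for this illustration. Choosing to shift attaches the else to the nearest if, which is also the rule the C language itself uses.

```c
enum { RAN_NOTHING, RAN_THEN, RAN_ELSE };

/* Reading obtained by shifting the else: it pairs with the inner if. */
int else_with_inner_if(int a, int b)
{
    if (a) {
        if (b)
            return RAN_THEN;
        else
            return RAN_ELSE;
    }
    return RAN_NOTHING;
}

/* Reading obtained by reducing the inner if first:
 * the else pairs with the outer if. */
int else_with_outer_if(int a, int b)
{
    if (a) {
        if (b)
            return RAN_THEN;
    } else {
        return RAN_ELSE;
    }
    return RAN_NOTHING;
}
```

The two readings disagree whenever exactly one condition holds: with a true and b false the first runs the else branch and the second runs nothing, and with a false and b true it is the other way round.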
  • 62. YACC Declaration Summary
      %start     Specify the grammar's start symbol
      %union     Declare the collection of data types that semantic values may have
      %token     Declare a terminal symbol (token type name) with no precedence or associativity specified
      %type      Declare the type of semantic values for a nonterminal symbol
  • 63. YACC Declaration Summary
      %right     Declare a terminal symbol (token type name) that is right-associative
      %left      Declare a terminal symbol (token type name) that is left-associative
      %nonassoc  Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error; for example, x op y op z is a syntax error)
  • 64. Reference Books
      • lex & yacc, 2nd Edition, by John R. Levine, Tony Mason & Doug Brown. O’Reilly. ISBN: 1-56592-000-7
      • Mastering Regular Expressions, by Jeffrey E.F. Friedl. O’Reilly. ISBN: 1-56592-257-3