CH1.1
Chapter 1: Introduction to Compiling
Danish Alam
Computer Engineer
Jamia Millia Islamia
Email- mydanish07@gmail.com
CH1.2
Short Historical Intro
CH1.3
Computing in the beginning of time....
?
CH1.4
Computing in the beginning of time
Charles Babbage (the man... the legend): “The Difference Engine” (1822)
Lady Ada of Lovelace (the programmer)
... and this is only a small part of the machine!
how do you program this?? Changing Gears!
CH1.5
Computers!
 Electronic Numerical Integrator and
Computer --- ENIAC, completed in 1945
how do you program this?? Flicking Switches!
CH1.6
Timeline
 the 1940’s.
Code is hand generated at 0-1 level and entered
by physical switches.
Hardware is rewired according to the program.
 the early 1950’s.
First attempts at abstraction.
Grace Murray Hopper:
 “translation is a compilation of a sequence of
machine-language subprograms selected from a
library.”
 first “compiler” A-0 (by G. M. Hopper)
 Code now is written in Assembly form with
simple English statements.
 the late 1950’s.
FORTRAN is born together with its Compiler!!!
CH1.7
Fortran
diagram from http://merd.net/pixel/language-study/diagram.html
CH1.8
the 1960’s
 ALGOL 60
 The first Language with a
formal grammar specification
 more on this later on (in every meeting of this
class!)
 FORTRAN gets improved.
 Language theory is better understood, it evolves
and revolutionizes compiler design.
 First “Syntax-Directed Compiler” is born in 1961.
 PASCAL is born (Wirth; designed 1968-69, published 1970).
 First attempts at automating compiler construction
using elements of Formal Language Theory.
CH1.9
The 1970’s, 80’s, 90’s
 C programming Language is born with its compiler
(1972). Distributed as part of the UNIX operating
system.
 BASIC (born at Dartmouth in 1964) reaches microcomputers (1975).
 “Compiler-Compiler” Tools start to be developed
and used extensively.
 Success of PCs brings compilers and interpreters into
everyone’s home.
CH1.10
Today
 Compilers are everywhere.
 “Programming” (in the strict sense)
is only one application domain...
 TeX and LaTeX
 language source is compiled into a document.
 Postscript
 language source is translated by laser printers to printer
machine level instructions that print a document.
 Mathematica / Matlab
 use a language to specify mathematical operations.
 Verilog / VHDL
 compiles into a circuit
 you name it...
CH1.11
Introduction to Compilers
 As a Discipline, Involves Multiple CS&E Areas
 Programming Languages and Algorithms
 Theory of Computing & Software Engineering
 Computer Architecture & Operating Systems
 Has a Deceptively Simple Intent:
Source program → Compiler → Target program (plus error messages)
Source and target programs: Diverse & Varied
CH1.12
Classifications of Compilers
 Compilers Viewed from Many Perspectives
 However, All utilize same basic tasks to
accomplish their actions
Construction: Single Pass, Multiple Pass, Load & Go
Functional: Debugging, Optimizing
CH1.13
The Model
 The TWO Fundamental Parts:
Analysis: Decompose Source into an intermediate representation
Synthesis: Target program generation from representation
 We Will Discuss Both in This Class, and FOCUS on analysis.
CH1.14
Important Notes
 Today: There are many Software Tools for helping with the
Analysis Part. This Wasn’t the Case in the Early Days.
 (Some) analysis is also important in:
 Structure / Syntax directed editors: Force
“syntactically” correct code to be entered
 Pretty Printers: Standardized version for program
structure (i.e., blank space, indenting, etc.)
 Static Checkers: A “quick” compilation to detect
rudimentary errors
 Interpreters: “real-time” execution of code, a “line at a time”
CH1.15
Important Notes
 Compilation Is Not Limited to Programming Language
Applications
 Text Formatters
 LATEX & TROFF Are Languages Whose Commands
Format Text
 Silicon Compilers
 Textual / Graphical: Take Input and Generate Circuit Design
 Database Query Processors
 Database Query Languages Are Also Programming
Languages
 Input is compiled Into a Set of Operations for Accessing the
Database
CH1.16
Language Translation
 To execute a program written in a high-level
language, we need to translate it to the native code
of the target machine
Source code → Translator → Target code → Target Machine
CH1.17
Language Translation …contd
 Terminology:
 source code/program
 Target code/program
 Object code
 Native code
 Executable code
 Compiler, assembler, interpreter
CH1.18
Translators
 Assembler
 Input/source: assembly language
 Interpreter
 Translates and executes the source program
 Target code not stored
 Compiler
 Translates and produces target code that can be
stored and reused many times
CH1.19
Compilers
 Analysis-synthesis model
 Analysis breaks source program into pieces and
creates an intermediate representation
 Synthesis part constructs the desired target
program from the intermediate form
CH1.20
Compilers …contd
 Analysis
 Operations implied by the source program are
determined and recorded in a “tree” structure or
similar structure
 “syntax tree”: node is an operation and children
are operands
CH1.21
Example Language Processing System
Source program (1)
→ Preprocessor → Source program (2)
→ Compiler → Target assembly program
→ Assembler → Relocatable machine code
→ Loader/linker (+ libraries, object files) → Absolute machine code
CH1.22
Phases of a Compiler
 Lexical analyzer (scanner)
 Syntax analyzer (parser)
 Semantic analyzer*
 Intermediate code generator
 Code optimizer
 Code generator
(*semantic analysis may be part of intermediate code
generation)
All phases involve symbol table and error handling
CH1.23
The Many Phases of a Compiler
Source Program
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
Target Program
Symbol-table Manager and Error Handler interact with all six phases
1, 2, 3 : Analysis - Our Focus
4, 5, 6 : Synthesis
CH1.24
The Analysis Task For Compilation
 Three Phases:
 Linear / Lexical Analysis:
 Left-to-right Scan to Identify Tokens
token: sequence of chars having a collective meaning
 Hierarchical Analysis:
 Grouping of Tokens Into Meaningful Collection
 Semantic Analysis:
 Checking to ensure Correctness of Components
CH1.25
Lexical Analysis
 Also called scanning, linear analysis or lexing
 Not as complex as syntax/semantic analysis
 Reads stream of characters in the source program
left-to-right
 Discards white space and comments
 If language is case-insensitive, makes all characters
(except string const.’s) uniform
CH1.26
Lexical Analysis …contd
 Breaks the source into individual lexical units or lexemes
collectively called “tokens”
 Sequence of characters with a collective meaning in the
grammar of a language
 Tokens are classified into types
 E.g., identifiers, keywords, numeric constants, string
constants, operators
 Non-tokens
 Comments, preprocessor directives and macros (in
C/C++), blanks, tabs and new lines
CH1.27
Lexical Analysis …contd
Token type Example lexemes
ID sum, rate, calc
NUM 60, 0 , 23, 082
REAL 66.1, .5, 10. , 1e-9
IF if (keyword)
EQ =
PLUS +
SCOLON ;
LPAREN (
CH1.28
Phase 1. Lexical Analysis
Easiest Analysis - Identify tokens which
are the basic building blocks
For Example:
Position := initial + rate * 60 ;
________ __ _______ _ ____ _ __ _
All are tokens
Blanks, Line breaks, etc. are scanned out
CH1.29
Lexical Analysis …contd
 Consider:
pos = init + rate * 60 ;
 What is the stream of tokens generated?
ID(pos) EQ ID(init) PLUS ID(rate) MULT
NUM(60) SCOLON
 What will be added to the symbol table?
pos, init, rate
60 ?
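A minimal sketch of the scanning step above, written in Python purely for illustration (the slides do not prescribe an implementation language). The token names come from the token table and the token stream shown earlier; `RPAREN`, the regular expressions, and the function names `tokenize` and `symbol_table` are assumptions.

```python
import re

# Token types follow the slides; order matters: REAL must be tried before NUM,
# and the keyword IF before the general identifier pattern.
TOKEN_SPEC = [
    ("REAL",   r"\d+\.\d*(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+"),
    ("NUM",    r"\d+"),
    ("IF",     r"if\b"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("EQ",     r"="),
    ("PLUS",   r"\+"),
    ("MULT",   r"\*"),
    ("SCOLON", r";"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),   # assumed counterpart of LPAREN
    ("SKIP",   r"\s+"),  # white space is scanned out, not returned
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Left-to-right scan that returns a list of (token type, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def symbol_table(tokens):
    """Identifiers in order of first appearance (attributes would be added later)."""
    return list(dict.fromkeys(lexeme for kind, lexeme in tokens if kind == "ID"))
```

Whether the constant 60 also gets a symbol-table entry is a design choice; this sketch records identifiers only.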
CH1.30
Syntax Analysis (Parsing)
 Also called hierarchical analysis
 Groups tokens hierarchically into grammatical
phrases, represented by parse trees
 Determines the syntactic structure of the program
and of individual statements
 Detects errors in statements that violate grammar
rules
CH1.31
Syntax Analysis …contd
 E.g., pos = init + rate * 60 ;
Assignment statement
  ID (pos)
  =
  expression
    expression
      ID (init)
    +
    expression
      expression
        ID (rate)
      *
      expression
        NUM (60)
  SCOLON
CH1.32
Syntax Analysis …contd
 Hierarchical structure of a program usually
expressed by recursive rules
 Example: definition of an expression
1. Any identifier (ID) is an expression
2. Any number (NUM) is an expression
3. If exp1 and exp2 are expressions, so are
– exp1 + exp2
– exp1 * exp2
– (exp1) and so on…
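The recursive rules above can be turned into a small recursive-descent parser. The sketch below is one possible reading, not the course's implementation: it assumes the usual convention that `*` binds tighter than `+` (the three rules alone are ambiguous on this point) and represents the syntax tree as nested tuples. The names `parse_expression`, `expr`, `term` and `factor` are illustrative.

```python
import re


def parse_expression(text):
    """Parse ID/NUM/+/*/() expressions into nested tuples such as ("+", left, right)."""
    # Simplified scanner: characters outside the token set are silently ignored.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                      # rules 1, 2 and "(exp1)"
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return ("NUM", advance())
        if tok in "+*)":
            raise SyntaxError(f"unexpected {tok!r}")
        return ("ID", advance())

    def term():                        # exp1 * exp2
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                        # exp1 + exp2
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

Each function calls the others recursively, which is exactly what a purely linear scan cannot do.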
CH1.33
Syntax Analysis …contd
 Lexical constructs not recursive
 Can recognize tokens via simple linear scan
 Formalized by regular grammars
 Syntactic constructs often recursive
 Formalized by context-free grammars (CFG)
 Linear scan not powerful enough to analyze
expressions/statements
CH1.34
Semantic Analysis
 Syntax ↔ structure of a program
 Semantics ↔ meaning of a program
 Semantic analysis ensures that parts of a program fit
together meaningfully
 May be part of intermed. code generator
 Type checking is an important task here
CH1.35
Phase 3. Semantic Analysis
 Find More Complicated Semantic Errors and
Support Code Generation
 Parse Tree Is Augmented With Semantic Actions
Compressed Tree:
:=
  position
  +
    initial
    *
      rate
      60
After the Conversion Action:
:=
  position
  +
    initial
    *
      rate
      inttoreal
        60
CH1.36
Semantic Analysis …contd
 Check if operands are of permitted types
 Can apply coercion if language allows
 Example: pos = init + rate * 60 ;
 If the context is float, then the integer 60 will be
converted (coerced) to 60.0
 Checks if variables have been declared
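A hedged sketch of these checks in Python: it walks an expression tree, rejects undeclared variables, and wraps an integer operand in an `inttoreal` node when the other operand is real. The tuple tree layout, the type names `"int"` and `"real"`, and the function name `check` are assumptions made for illustration.

```python
def check(node, symbol_types):
    """Return (typed_tree, type), inserting inttoreal where an int meets a real."""
    kind = node[0]
    if kind == "NUM":
        return node, "int"             # this sketch treats every NUM as an integer
    if kind == "ID":
        name = node[1]
        if name not in symbol_types:   # declaration check
            raise NameError(f"undeclared variable {name!r}")
        return node, symbol_types[name]
    op, left, right = node             # binary operator node
    left, left_type = check(left, symbol_types)
    right, right_type = check(right, symbol_types)
    if left_type != right_type:        # mixed int/real: coerce the int side
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    return (op, left, right), left_type
```

With `init` and `rate` declared as real, the constant 60 in `init + rate * 60` ends up under an `inttoreal` node, matching the conversion action shown for this example.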
CH1.37
Symbol Table
 Serves as a database for the compilation
 Contains type and attribute info of each identifier
defined in the program
 Entries are made by the scanner; attributes are added later
 Symbol table management is an essential task of a
compiler
CH1.38
Error Handling
 Each phase can encounter errors
 Errors must be detected and reported
 After detecting an error, error must be dealt with so
that compilation proceeds
 Further errors can be found
 A compiler that stops at 1st error not so helpful
 Syntax & semantic analysis phases typically handle
most of the errors
CH1.39
Intermediate Code Generation
 Intermediate representation
 Generally a program for an abstract machine (can
be assembly language or slightly above)
 Should be easy to produce and translate into
target code
 Why?
 Optimization yet to be done
 Machine code difficult to optimize
CH1.40
Intermediate Code Gen. …contd
 Can use the three-address code (3AC)
 Each instruction has at most 3 operands (addresses)
and at most one operator besides the assignment
 E.g., pos = init + rate * 60.0 ;
 This in 3AC might look like
temp1 = id3 * 60.0
temp2 = id2 + temp1
id1 = temp2
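One way such code could be produced is a post-order walk of the expression tree that allocates a fresh temporary for every operator. This is a sketch under assumptions: the tuple tree layout and the names `gen_3ac` and `new_temp` are illustrative, and identifiers are assumed to have been replaced by their symbol-table names (`id1`, `id2`, `id3`) already.

```python
def gen_3ac(target, tree):
    """Emit three-address instructions for `target = tree` as a list of strings."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        kind = node[0]
        if kind in ("ID", "NUM"):          # leaves are used directly as operands
            return node[1]
        if kind == "inttoreal":            # unary conversion inserted by semantic analysis
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttoreal({operand})")
            return temp
        op, left, right = node             # binary operator: children first
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = walk(tree)
    code.append(f"{target} = {result}")
    return code
```

For the tree of `id2 + id3 * 60.0` this reproduces the three instructions listed above.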
CH1.41
Code Optimization
 Attempts to improve the intermediate code
 Usually for faster execution, but the code may get
shorter too
 For our example, we can get:
temp1 = id3 * 60.0
id1 = id2 + temp1
 Great variation in possibilities
 “optimizing compilers” can do many things
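The improvement shown for the running example removes a temporary that is only copied into the final destination. A minimal sketch of that single rewrite follows; it is an assumption about how such a step might look, not a general optimizer. Instructions are plain strings of the form `dest = rhs`, and `fold_final_copy` is a hypothetical name.

```python
def fold_final_copy(code):
    """Merge `tempN = X op Y` followed by a final `dest = tempN` into `dest = X op Y`."""
    if len(code) < 2:
        return list(code)
    *head, prev, last = code
    prev_dest, prev_rhs = prev.split(" = ", 1)
    last_dest, last_rhs = last.split(" = ", 1)
    # Safe here because the copy is the last instruction, so the temporary
    # cannot be read again afterwards.
    if last_rhs == prev_dest and prev_dest.startswith("temp"):
        return head + [f"{last_dest} = {prev_rhs}"]
    return list(code)
```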
CH1.42
Optimization Objectives
 Optimize for
 Performance/speed
 Code size
 Power consumption
 Fast compilation
 Security/reliability
 Debugging
CH1.43
Example program
int calc(int a, int b, int N)
{
    int i, x, y;
    x = y = 0;
    for (i = 0; i <= N; i++) {
        x = x + (4*a/b) * i + (i+1) * (i+1);
        x = x + b*y;
    }
    return x;
}
What are the possible optimizations?
CH1.44
Optimizations
 Applicable to the example program
 Constant propagation
 Algebraic simplification
 Deadcode elimination
 Loop invariant removal
 Strength reduction
 Register allocation
CH1.45
Optimizations …contd
 Unoptimized: 7 + N*21 operations
 Optimized: 9+N*7 operations
 Execution times can be 43 sec vs 17 sec
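To make the listed optimizations concrete, here is a hand-optimized version of the example next to a direct transcription, both in Python. This is a sketch, not compiler output: it assumes non-negative `a` and positive `b`, so that Python's floor division `//` agrees with C's truncating `/`, and the name `calc_optimized` is illustrative.

```python
def calc(a, b, n):
    """Direct transcription of the example program."""
    x = y = 0
    for i in range(n + 1):
        x = x + (4 * a // b) * i + (i + 1) * (i + 1)
        x = x + b * y
    return x


def calc_optimized(a, b, n):
    """Same result with the loop body reduced to additions."""
    x = 0
    k = 4 * a // b   # loop-invariant code motion: computed once
    ki = 0           # strength reduction: holds k * i, updated by addition
    sq = 1           # strength reduction: holds (i + 1)^2
    odd = 3          # difference between consecutive squares: 2i + 3
    # y is always 0 (constant propagation), so b * y simplifies to 0 and the
    # second assignment is removed as dead code.
    for _ in range(n + 1):
        x += ki + sq
        ki += k
        sq += odd
        odd += 2
    return x
```

Register allocation has no counterpart at this level; it would apply when the loop is lowered to machine code.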
CH1.46
Code Generation
 Generates target code
 Relocatable machine code or assembly code
 Variables assigned to memory locations
 Translates an intermediate instruction to a sequence
of machine instructions
 Register allocation: a critical task
CH1.47
Code Generation …contd
 Suppose we use 2 registers, R1 and R2
 Our example
LOAD id3, R2
MULF #60.0, R2
LOAD id2, R1
ADDF R2, R1
STORE R1, id1
CH1.48
Front Ends & Back Ends
 Typically phases grouped into a front end and a
back end
 Front end:
 Lexical + syntax + semantic analysis, symbol
table creation, intermed. code gen., some
optimizations
 Portions that are source-language-dependent
and machine-independent
CH1.49
Front Ends & Back Ends …contd
 Back end:
 Code generation and machine-dependent
optimizations
 Portions that are independent of source language
 Can port a compiler to different machines by reusing
the front end and re-doing the back end
CH1.50
Passes of a Compiler
 A compiler makes 1 or more passes over a program
 Several phases are usually implemented in a single
pass
 A pass consists of reading an input file and writing
an output file
 Single-pass compiler may be feasible
 E.g., Pascal
CH1.51
Passes of a Compiler …contd
 Lexical analysis to intermed. code gen. can be
grouped into a pass
 Activities in these phases interleaved
 Here, parser can be seen as “in charge”
Calls scanner for the next token
Attempts to discover syntactic structure
Calls the intermed. code gen. to do semantic analysis
and generate portion of code
CH1.52
Passes of a Compiler …contd
 Why need multiple passes?
 Some issues raised early in a program may
remain un-answered until later
E.g., if refers to identifiers/functions defined later
 May not have enough memory to do everything in
a single pass
E.g., Can store intermediate results in a file in one pass
and read them in the next pass
CH1.53
Why Have We Divided Analysis
in This Manner?
 Lexical Analysis - Scans Input, Its Linear Actions
Are Not Recursive
 Identify Only Individual “words” that are
the Tokens of the Language
 Recursion Is Required to Identify Structure of an
Expression, As Indicated in Parse Tree
 Verify that the “words” are Correctly
Assembled into “sentences”
 What is Third Phase?
 Determine Whether the Sentences have One
and Only One Unambiguous Interpretation
 … and do something about it!
 e.g. “John Took Picture of Mary Out on the
Patio”
CH1.54
Supporting Phases/
Activities for Analysis
 Symbol Table Creation / Maintenance
 Contains Info (storage, type, scope, args) on
Each “Meaningful” Token, Typically Identifiers
 Data Structure Created / Initialized During
Lexical Analysis
 Utilized / Updated During Later Analysis &
Synthesis
 Error Handling
 Detection of Different Errors Which
Correspond to All Phases
 What Kinds of Errors Are Found During the
Analysis Phase?
 What Happens When an Error Is Found?
CH1.55
The Synthesis Task For Compilation
 Intermediate Code Generation
 Abstract Machine Version of Code -
Independent of Architecture
 Easy to Produce and Do Final, Machine
Dependent Code Generation
 Code Optimization
 Find More Efficient Ways to Execute Code
 Replace Code With More Optimal Statements
 Two approaches: High-level Language &
“Peephole” Optimization
 Final Code Generation
 Generate Relocatable Machine Dependent Code
CH1.56
Reviewing the Entire Process
Errors: can be reported by every phase
position := initial + rate * 60
↓ lexical analyzer
id1 := id2 + id3 * 60
↓ syntax analyzer
:=
  id1
  +
    id2
    *
      id3
      60
↓ semantic analyzer
:=
  id1
  +
    id2
    *
      id3
      inttoreal
        60
↓ intermediate code generator
Symbol Table: position ...., initial ...., rate ....
CH1.57
Reviewing the Entire Process
Errors: can be reported by every phase
↓ intermediate code generator (3 address code)
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
↓ code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
↓ final code generator
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Symbol Table: position ...., initial ...., rate ....
CH1.58
Assemblers
 Assembly code: names are used for instructions,
and names are used for memory addresses.
 Two-pass Assembly:
 First Pass: all identifiers are assigned to
memory addresses (0-offset)
e.g. substitute 0 for a, and 4 for b
 Second Pass: produce relocatable machine
code:
MOV a, R1     0001 01 00 00000000 *
ADD #2, R1    0011 01 10 00000010
MOV R1, b     0010 01 00 00000100 *
(* = relocation bit)
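The two passes can be sketched in a few lines of Python. The field layout is inferred from the three encoded instructions on this slide and is an assumption: a 4-bit opcode (`0001` load, `0010` store, `0011` add), a 2-bit register number, a 2-bit mode (`00` address, `10` immediate) and an 8-bit operand. The 4-byte word size and the names `assemble` and `is_register` are likewise illustrative.

```python
import re


def is_register(operand):
    return re.fullmatch(r"R\d+", operand) is not None


def assemble(lines, word_size=4):
    """Two-pass assembly of `OP src, dst` lines into relocatable code strings."""
    parsed = [line.replace(",", " ").split() for line in lines]

    # Pass 1: assign each identifier the next free address, starting at offset 0.
    addresses = {}
    for _, src, dst in parsed:
        for operand in (src, dst):
            is_name = not operand.startswith("#") and not is_register(operand)
            if is_name and operand not in addresses:
                addresses[operand] = len(addresses) * word_size

    # Pass 2: emit one encoded instruction per line.
    output = []
    for op, src, dst in parsed:
        if op == "MOV" and is_register(dst):   # load: memory -> register
            opcode, reg, operand = "0001", dst, src
        elif op == "MOV":                      # store: register -> memory
            opcode, reg, operand = "0010", src, dst
        elif op == "ADD":
            opcode, reg, operand = "0011", dst, src
        else:
            raise ValueError(f"unknown instruction {op!r}")
        if operand.startswith("#"):            # immediate value, nothing to relocate
            mode, value, reloc = "10", int(operand[1:]), ""
        else:                                  # address: mark with the relocation bit
            mode, value, reloc = "00", addresses[operand], " *"
        output.append(f"{opcode} {int(reg[1:]):02b} {mode} {value:08b}{reloc}")
    return output
```

The `*` marks operands a loader must adjust once the program's actual load address is known.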
CH1.59
Loaders and Link-Editors
 Loader: takes relocatable machine code, alters
the addresses and places the altered instructions
into memory.
 Link-editor: takes many (relocatable) machine
code programs (with cross-references) and produces
a single file.
 Need to keep track of correspondence between
variable names and corresponding addresses in
each piece of code.
CH1.60
Compiler Cousins: Preprocessors
Provide Input to Compilers
1. Macro Processing
#define in C: does text substitution before
compiling
#define X 3
#define Y A*B+C
#define Z getchar()
CH1.61
2. File Inclusion
#include in C - bring in another file before compiling
defs.h:
//////
//////
//////
main.c:
#include "defs.h"
…---…---…---
…---…---…---
…---…---…---
main.c after preprocessing:
//////
//////
//////
…---…---…---
…---…---…---
…---…---…---
CH1.62
3. Rational Preprocessors
 Augment “Old” Languages With Modern
Constructs
 Add Macros for If - Then, While, Etc.
 #define Can Make C Code More Pascal-like
#define begin {
#define end }
#define then
CH1.63
4. Language Extensions for a
Database System
EQUEL - Database query language embedded in a
programming language (C).
## Retrieve (DN=Department.Dnum) where
## Department.Dname = ‘Research’
is Preprocessed into:
ingres_system(“Retr…..Research’”,____,____);
a procedure call in a programming language.
CH1.64
The Grouping of Phases
Front End : Analysis + Intermediate Code Generation
Back End : Code Generation + Optimization
vs.
Number of Passes:
A pass: requires r/w intermediate files
Fewer passes: more efficiency.
However: fewer passes require more
sophisticated memory management and
compiler phase interaction.
Tradeoffs ……..
CH1.65
Compiler Construction Tools
Parser Generators : Produce Syntax Analyzers <= Yacc (Bison)
Scanner Generators : Produce Lexical Analyzers <= Lex (Flex)
Syntax-directed Translation Engines : Generate Intermediate Code (Yacc semantic actions can play this role)
Automatic Code Generators : Generate Actual Code
Data-Flow Engines : Support Optimization
CH1.66
Compiler Construction Overview
CH1.67
Top-Down Parsing
Grammar:
(1) L → S ; L
(2) L → ε
(3) S → if e then S else S
(4) S → while e do S
(5) S → begin L end
(6) S → s
Construct First & Follow Tables.
     First                  Follow
L    ε if while begin s     $ end
S    if while begin s       ; else
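The First and Follow tables can be computed mechanically by iterating to a fixed point. The sketch below encodes the six productions as Python data and uses the standard textbook algorithm; the names `GRAMMAR`, `first_of_sequence` and `compute_first_follow` are illustrative, not part of the slides.

```python
EPS = "ε"
NONTERMINALS = {"L", "S"}
GRAMMAR = [
    ("L", ["S", ";", "L"]),
    ("L", []),                                   # L → ε
    ("S", ["if", "e", "then", "S", "else", "S"]),
    ("S", ["while", "e", "do", "S"]),
    ("S", ["begin", "L", "end"]),
    ("S", ["s"]),
]


def first_of_sequence(symbols, first):
    """First set of a string of grammar symbols, given the current First sets."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                              # every symbol can vanish
    return result


def compute_first_follow(grammar, start):
    first = {nt: set() for nt in NONTERMINALS}
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:                               # repeat until nothing grows
        changed = False
        for head, body in grammar:
            new_first = first_of_sequence(body, first)
            if not new_first <= first[head]:
                first[head] |= new_first
                changed = True
            for i, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                rest = first_of_sequence(body[i + 1:], first)
                new_follow = rest - {EPS}
                if EPS in rest:                  # what follows head also follows sym
                    new_follow |= follow[head]
                if not new_follow <= follow[sym]:
                    follow[sym] |= new_follow
                    changed = True
    return first, follow
```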
CH1.68
Predictive Recursive Parser
/* gettoken() returns the current token; nexttoken() advances the input */
main()
{
    lookahead = gettoken();
    L();
}
L()
{
    if lookahead belongs to { 'if', 'while', 'begin', s }
    then { S(); match(';'); L(); }
    else ; /* epsilon production: consume nothing */
}
S()
{
    if lookahead = 'if' then
        { match('if'); match(e); match('then');
          S(); match('else'); S(); }
    else if lookahead = 'while' then
        { match('while'); match(e);
          match('do'); S(); }
    else if lookahead = 'begin' then
        { match('begin'); L(); match('end'); }
    else if lookahead = s then
        { match(s); }
    else print('error');
}
match(t: token)
{
    /* compare against the current lookahead, then refresh it */
    if t == lookahead
    then { nexttoken(); lookahead = gettoken(); }
    else { print('error'); }
}
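A runnable counterpart of this predictive parser, sketched in Python. It assumes the input has already been split into tokens, treats `e` and `s` as terminals exactly as the grammar does, and raises an exception on the first error instead of printing. The class name `Parser`, the exception `ParseError` and the helper `accepts` are illustrative.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def L(self):
        if self.lookahead in ("if", "while", "begin", "s"):   # First(S)
            self.S()
            self.match(";")
            self.L()
        # otherwise: epsilon production, consume nothing

    def S(self):
        if self.lookahead == "if":
            self.match("if"); self.match("e"); self.match("then")
            self.S(); self.match("else"); self.S()
        elif self.lookahead == "while":
            self.match("while"); self.match("e"); self.match("do")
            self.S()
        elif self.lookahead == "begin":
            self.match("begin"); self.L(); self.match("end")
        elif self.lookahead == "s":
            self.match("s")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def parse(self):
        self.L()
        self.match("$")


def accepts(text):
    """True if the whitespace-separated token string is derivable from L."""
    try:
        Parser(text.split()).parse()
        return True
    except ParseError:
        return False
```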
CH1.69
Table-Driven Parser
   ;      if     then   else   begin  end    while  s      $
L         (1)                  (1)    (2)    (1)    (1)    (2)
S  synch  (3)           synch  (5)           (4)    (6)
Grammar is LL(1)
CH1.70
Table-Driven Parser Code
main()
{ initstack(); push($); push(L);
  repeat
    X = top(); lookahead = gettoken();
    if X is terminal or $ then
      { if X = lookahead then
          { pop(); nexttoken(); }
        else error(); }
    else /* X is a non-terminal */
      {
        switch M[X,lookahead] {
          /* expand: push the right-hand side in reverse, consume no input */
          case 1: pop(); push(L); push(';'); push(S); break;
          case 2: pop(); break;   /* L → ε */
          case 3: ... ... ...
          ...
          case 'synch': pop(); printf("error: X expected"); break;
          case 'empty': nexttoken(); printf("error: unexpected lookahead"); break;
        }
      }
  until stack is empty
}
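The same driver as a runnable Python sketch, using the LL(1) table for the statement grammar. It is a simplification: the `synch` and empty entries are not modelled, so the first error raises `SyntaxError` instead of triggering recovery. `PRODUCTIONS`, `TABLE` and `ll1_parse` are illustrative names.

```python
PRODUCTIONS = {
    1: ("L", ["S", ";", "L"]),
    2: ("L", []),
    3: ("S", ["if", "e", "then", "S", "else", "S"]),
    4: ("S", ["while", "e", "do", "S"]),
    5: ("S", ["begin", "L", "end"]),
    6: ("S", ["s"]),
}
# M[non-terminal, lookahead] -> production number
TABLE = {
    ("L", "if"): 1, ("L", "while"): 1, ("L", "begin"): 1, ("L", "s"): 1,
    ("L", "end"): 2, ("L", "$"): 2,
    ("S", "if"): 3, ("S", "while"): 4, ("S", "begin"): 5, ("S", "s"): 6,
}


def ll1_parse(tokens):
    """Return the production numbers applied, in order; raise SyntaxError on failure."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "L"]                  # start symbol on top of the end marker
    pos = 0
    used = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in ("L", "S"):           # non-terminal: expand via the table
            rule = TABLE.get((top, lookahead))
            if rule is None:
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            used.append(rule)
            # Push the right-hand side reversed so its first symbol is on top.
            stack.extend(reversed(PRODUCTIONS[rule][1]))
        elif top == lookahead:          # terminal or $: match and advance
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    return used
```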
CH1.71
SLR Parsing
Grammar:
(1) E → E + T
(2) E → T
(3) T → T F
(4) T → F
(5) F → F *
(6) F → a
(7) F → b
Construct First & Follow Tables:
     First     Follow
E    a b       $ +
T    a b       $ + a b
F    a b       $ + a b *
CH1.72
Canonical Sets of Items + Goto Diagram
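I0:  E' → . E,  E → . E + T,  E → . T,  T → . T F,  T → . F,  F → . F *,  F → . a,  F → . b
I1:  E' → E . ,  E → E . + T
I2:  E → T . ,  T → T . F,  F → . F *,  F → . a,  F → . b
I3:  T → F . ,  F → F . *
I4:  F → a .
I5:  F → b .
I6:  E → E + . T,  T → . T F,  T → . F,  F → . F *,  F → . a,  F → . b
I7:  E → E + T . ,  T → T . F,  F → . F *,  F → . a,  F → . b
I8:  T → T F . ,  F → F . *
I9:  T → F . ,  F → F . *   (same items as I3, kept here as a separate state)
I10: F → F * .
Goto diagram, as a list of transitions:
goto(I0, E) = I1, goto(I0, T) = I2, goto(I0, F) = I3, goto(I0, a) = I4, goto(I0, b) = I5
goto(I1, +) = I6
goto(I2, F) = I8, goto(I2, a) = I4, goto(I2, b) = I5
goto(I3, *) = I10
goto(I6, T) = I7, goto(I6, F) = I9, goto(I6, a) = I4, goto(I6, b) = I5
goto(I7, F) = I8, goto(I7, a) = I4, goto(I7, b) = I5
goto(I8, *) = I10
goto(I9, *) = I10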
CH1.73
SLR Table
State  +    *    a    b    $    E    T    F
0                s4   s5        1    2    3
1      s6                  acc
2      r2        s4   s5   r2             8
3      r4   s10  r4   r4   r4
4      r6   r6   r6   r6   r6
5      r7   r7   r7   r7   r7
6                s4   s5             7    9
7      r1        s4   s5   r1             8
8      r3   s10  r3   r3   r3
9      r4   s10  r4   r4   r4
10     r5   r5   r5   r5   r5
CH1.74
Code
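main()
{ initstack(); push(0);
  repeat
    state = top(); lookahead = gettoken();
    switch action[state,lookahead] {
      case 's4': push(4); nexttoken(); break;
      case 's5': push(5); nexttoken(); break;
      ...
      /* reduce: pop one state per right-hand-side symbol,
         then push goto[exposed state, left-hand side] */
      case 'r1': pop(); pop(); pop();      /* E → E + T */
                 push(goto[top(), E]); break;
      case 'r2': pop();                    /* E → T */
                 push(goto[top(), E]); break;
      ...
      case 'acc': printf("Success"); return;
      case 'empty': printf("error"); return;
      ...
    }
  until false
}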
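A runnable sketch of the shift-reduce driver in Python, with the SLR table from the previous slide encoded as dictionaries. Tokens are assumed to be pre-split, and the names `ACTION`, `GOTO`, `PRODUCTIONS` and `slr_parse` are illustrative.

```python
ALL = ("+", "*", "a", "b", "$")
# production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 2), 4: ("T", 1),
               5: ("F", 2), 6: ("F", 1), 7: ("F", 1)}
ACTION = {
    0: {"a": "s4", "b": "s5"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "a": "s4", "b": "s5", "$": "r2"},
    3: {"+": "r4", "*": "s10", "a": "r4", "b": "r4", "$": "r4"},
    4: {t: "r6" for t in ALL},
    5: {t: "r7" for t in ALL},
    6: {"a": "s4", "b": "s5"},
    7: {"+": "r1", "a": "s4", "b": "s5", "$": "r1"},
    8: {"+": "r3", "*": "s10", "a": "r3", "b": "r3", "$": "r3"},
    9: {"+": "r4", "*": "s10", "a": "r4", "b": "r4", "$": "r4"},
    10: {t: "r5" for t in ALL},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 2: {"F": 8}, 6: {"T": 7, "F": 9}, 7: {"F": 8}}


def slr_parse(tokens):
    """Return the sequence of reductions (production numbers); raise SyntaxError on error."""
    tokens = list(tokens) + ["$"]
    stack = [0]                          # stack of states
    pos = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:               # empty table entry
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if action == "acc":
            return reductions
        number = int(action[1:])
        if action[0] == "s":             # shift: push state, consume token
            stack.append(number)
            pos += 1
        else:                            # reduce: pop |rhs| states, then goto
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]
            stack.append(GOTO[stack[-1]][head])
            reductions.append(number)
```

For the input `a b * + a` the reductions come out in the order 6, 4, 7, 5, 3, 2, 6, 4, 1, a rightmost derivation in reverse.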
CH1.75
Translation
Intuition behind translation:
 Assign attributes to terminals and non-terminals.
 Typically, some attributes will be of type e.g. string
and will carry the translated code as we traverse the
parse-tree.
 Write semantic rules for each production.
 Revisit all parsers and every time a certain
production is “selected” (top-down) or “reduced”
(bottom-up) execute all its semantic rules.
 Things to worry about: dependency between
attributes... course of parsing might be inconsistent
with attribute dependencies.
CH1.76
Simple Translation
Grammar:
(1) L → S ; L
(2) L → ε
(3) S → if e then S else S
(4) S → while e do S
(5) S → begin L end
(6) S → s
Example of an accepted string:
s ;
while e do
begin
s ;
s ;
end ;
if e then s else s ;
s ;
Translation Example: removal
of while-loops
s ;
100:
if e then goto 101 else goto 102 ;
101:
{ s ; s; };
goto 100;
102:
if e then s else s ;
s
CH1.77
Writing A Translation Scheme
Grammar:
(1) L → S ; L1 { L.code = S.code || ';' || L1.code }
(2) L → ε { L.code = '' }
(3) S → if e then S1 else S2 { S.code = 'if' || e.lex || 'then' || S1.code ||
'else' || S2.code }
(4) S → while e do S1 { beginlabel = newlabel( );
blocklabel = newlabel( );
endlabel = newlabel( );
S.code = beginlabel || ':' || 'if' || e.lex ||
'then' || 'goto' || blocklabel || 'else' ||
'goto' || endlabel || ';' || blocklabel || ':' ||
S1.code || 'goto' || beginlabel || ';' ||
endlabel || ':' }
(5) S → begin L end { S.code = '{' || L.code || '}' }
(6) S → s { S.code = s.lex }
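A sketch of how this scheme could be executed by a recursive-descent translator in Python. Each procedure returns the `code` attribute of its non-terminal as a list of words. Starting labels at 100 mirrors the hand-written example and is otherwise arbitrary; the names `Translator` and `translate` are illustrative. Because the rules are applied literally, the placement of `;` around labels and braces differs slightly from the hand-written output on the previous slide.

```python
class Translator:
    def __init__(self, tokens, first_label=100):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.next_label = first_label

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def newlabel(self):
        label = str(self.next_label)
        self.next_label += 1
        return label

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1
        return expected                      # the lexeme (.lex attribute)

    def L(self):
        if self.lookahead in ("if", "while", "begin", "s"):
            s_code = self.S()
            self.match(";")
            return s_code + [";"] + self.L()             # rule (1)
        return []                                        # rule (2)

    def S(self):
        if self.lookahead == "if":                       # rule (3)
            self.match("if"); e = self.match("e"); self.match("then")
            s1 = self.S(); self.match("else"); s2 = self.S()
            return ["if", e, "then"] + s1 + ["else"] + s2
        if self.lookahead == "while":                    # rule (4)
            begin, block, end = self.newlabel(), self.newlabel(), self.newlabel()
            self.match("while"); e = self.match("e"); self.match("do")
            s1 = self.S()
            return ([begin + ":", "if", e, "then", "goto", block,
                     "else", "goto", end, ";", block + ":"]
                    + s1 + ["goto", begin, ";", end + ":"])
        if self.lookahead == "begin":                    # rule (5)
            self.match("begin"); body = self.L(); self.match("end")
            return ["{"] + body + ["}"]
        if self.lookahead == "s":                        # rule (6)
            return [self.match("s")]
        raise SyntaxError(f"unexpected {self.lookahead!r}")


def translate(text):
    translator = Translator(text.split())
    code = translator.L()
    translator.match("$")
    return " ".join(code)
```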
CH1.78
Predictive Recursive Translation
string main()
{
    lookahead = gettoken();
    return L();
}
string L()
{
    if lookahead belongs to { 'if', 'while', 'begin', s }
    then { s1 = S();
           match(';');
           s2 = L();
           output = concatenate(s1, ';', s2);
         }
    else { output = ''; }
    return output;
}
...
Dealing with inherited attributes: define them as input to procedures
CH1.79
Bottom Up Translation
main()
{ initstack(); push(0);
  repeat
    state = top(); lookahead = gettoken();
    switch action[state,lookahead] {
      ...
      /* (1) L → S ; L1: three symbols on the stack, L1 on top */
      case 'r1': val2 = topval(); pop();   /* L1.code */
                 pop();                    /* ';' */
                 val1 = topval(); pop();   /* S.code */
                 push(goto[top(), L], concat(val1 || ';' || val2));
      ...
    }
(1) L → S ; L1 { L.code = S.code || ';' || L1.code }
Dealing with inherited attributes: predict their location in the stack
CH1.80
Yacc+Lex
 Translation Engine
 Bottom up translation (LALR)
 We only need to write the translation scheme
following Yacc specifications.
 A variety of attributes can be attached to each
non-terminal.
 Use of side-effects is possible and recommended to
deal with inherited attributes (instead of looking
into the stack).
 Thanks for Watching !

  • 25. CH1.25 Lexical Analysis  Also called scanning, linear analysis or lexing  Not as complex as syntax/semantic analysis  Reads stream of characters in the source program left-to-right  Discards white space and comments  If language is case-insensitive, makes all characters (except string const.’s) uniform
  • 26. CH1.26 Lexical Analysis …contd  Breaks the source into individual lexical units (lexemes), each classified as a “token”  Token: sequence of characters with a collective meaning in the grammar of a language  Tokens are classified into types  E.g., identifiers, keywords, numeric constants, string constants, operators  Non-tokens  Comments, preprocessor directives and macros (in C/C++), blanks, tabs and new lines
  • 27. CH1.27 Lexical Analysis …contd  Token type: example lexemes  ID: sum, rate, calc  NUM: 60, 0, 23, 082  REAL: 66.1, .5, 10., 1e-9  IF: if (keyword)  EQ: =  PLUS: +  SCOLON: ;  LPAREN: (
  • 28. CH1.28 Phase 1. Lexical Analysis  Easiest Analysis - Identify tokens which are the basic building blocks  For Example: Position := initial + rate * 60 ;  All are tokens: Position, :=, initial, +, rate, *, 60, ;  Blanks, Line breaks, etc. are scanned out
  • 29. CH1.29 Lexical Analysis …contd  Consider: pos = init + rate * 60 ;  What is the stream of tokens generated? ID(pos) EQ ID(init) PLUS ID(rate) MULT NUM(60) SCOLON  What will be added to the symbol table? pos, init, rate 60 ?
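The token stream on this slide can be reproduced with a small scanner. The following is a minimal sketch, not the deck's implementation: the token names come from the slides, while the regular expressions, the `tokenize` function and the list used as a symbol table are assumptions made for illustration.

```python
import re

# Token names follow the slides; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("REAL",   r"\d+\.\d*(?:[eE][-+]?\d+)?|\.\d+(?:[eE][-+]?\d+)?|\d+[eE][-+]?\d+"),
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("EQ",     r"="),
    ("PLUS",   r"\+"),
    ("MULT",   r"\*"),
    ("SCOLON", r";"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),          # white space is scanned out
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbols): the token stream and the identifiers seen."""
    tokens, symbols = [], []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "ID" and lexeme not in symbols:
            symbols.append(lexeme)   # the scanner enters identifiers only
        tokens.append((kind, lexeme))
    return tokens, symbols

tokens, symbols = tokenize("pos = init + rate * 60 ;")
print(" ".join(kind for kind, _ in tokens))
print(symbols)
```

In this sketch only identifiers go into the symbol table, which is one possible answer to the slide's question about whether 60 belongs there.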
  • 30. CH1.30 Syntax Analysis (Parsing)  Also called hierarchical analysis  Groups tokens hierarchically into grammatical phrases called parse trees  Determines the syntactic structure of the program and of individual statements  Detects errors in statements that violate grammar rules
  • 31. CH1.31 Syntax Analysis …contd  E.g., pos = init + rate * 60 ;  Parse tree: Assignment statement → ID(pos) = expression SCOLON  The top expression is expression + expression  Its left operand is ID(init)  Its right operand is expression * expression, with operands ID(rate) and NUM(60)
  • 32. CH1.32 Syntax Analysis …contd  Hierarchical structure of a program usually expressed by recursive rules  Example: definition of an expression 1. Any identifier (ID) is an expression 2. Any number (NUM) is an expression 3. If exp1 and exp2 are expressions, so are – exp1 + exp2 – exp1 * exp2 – (exp1) and so on…
  • 33. CH1.33 Syntax Analysis …contd  Lexical constructs not recursive  Can recognize tokens via simple linear scan  Formalized by regular grammars  Syntactic constructs often recursive  Formalized by context-free grammars (CFG)  Linear scan not powerful enough to analyze expressions/statements
  • 34. CH1.34 Semantic Analysis  Syntax ↔ structure of a program  Semantics ↔ meaning of a program  Semantic analysis ensures that parts of a program fit together meaningfully  May be part of intermed. code generator  Type checking is an important task here
  • 35. CH1.35 Phase 3. Semantic Analysis  Find More Complicated Semantic Errors and Support Code Generation  Parse Tree Is Augmented With Semantic Actions  Compressed Tree: := ( position, + ( initial, * ( rate, 60 ) ) )  After the Conversion Action: := ( position, + ( initial, * ( rate, inttoreal(60) ) ) )
  • 36. CH1.36 Semantic Analysis …contd  Check if operands are of permitted types  Can apply coercion if language allows  Example: pos = init + rate * 60 ;  If the context is float, then the integer 60 will be converted (coerced) to 60.0  Checks if variables have been declared
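The type check and coercion described here can be sketched as a walk over the syntax tree. This is an illustration under assumptions: the tree is encoded as nested tuples `(op, left, right)`, the `types` dictionary stands in for the symbol table, and the rule that assigning a float to an int is an error is a choice made for the example, not something the slides state.

```python
def annotate(node, types):
    """Return (tree, type), wrapping int operands in inttoreal where needed."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, str):                    # identifier: must be declared
        if node not in types:
            raise NameError(f"undeclared variable {node!r}")
        return node, types[node]
    op, left, right = node
    left, left_type = annotate(left, types)
    right, right_type = annotate(right, types)
    if left_type != right_type:                  # mixed int/float operands
        if op == "=" and left_type == "int":
            raise TypeError("cannot assign a float value to an int variable")
        if left_type == "int":
            left = ("inttoreal", left)           # coerce the int side
        else:
            right = ("inttoreal", right)
        left_type = "float"
    return (op, left, right), left_type

types = {"pos": "float", "init": "float", "rate": "float"}
tree = ("=", "pos", ("+", "init", ("*", "rate", 60)))
print(annotate(tree, types)[0])
```

With all three variables declared as float, the only change to the tree is that the constant 60 is wrapped in `inttoreal`, matching the conversion action on the previous slide.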
  • 37. CH1.37 Symbol Table  Serves as a database for the compilation  Contains type and attribute info of each identifier defined in the program  Entered by the scanner, attributes added later  Symbol table management is an essential task of a compiler
  • 38. CH1.38 Error Handling  Each phase can encounter errors  Errors must be detected and reported  After detecting an error, error must be dealt with so that compilation proceeds  Further errors can be found  A compiler that stops at 1st error not so helpful  Syntax & semantic analysis phases typically handle most of the errors
  • 39. CH1.39 Intermediate Code Generation  Intermediate representation  Generally a program for an abstract machine (can be assembly language or slightly above)  Should be easy to produce and translate into target code  Why?  Optimization yet to be done  Machine code difficult to optimize
  • 40. CH1.40 Intermediate Code Gen. …contd  Can use the three-address code (3AC)  Each instruction can have at most 3 operands and one operator with assignment  E.g., pos = init + rate * 60.0 ;  This in 3AC might look like temp1 = id3 * 60.0 temp2 = id2 + temp1 id1 = temp2
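A post-order walk of the syntax tree is enough to produce the three-address code shown above. The sketch below assumes the same tuple encoding `(op, left, right)` for the tree; `gen_3ac` and the `tempN` naming are illustrative choices.

```python
import itertools

def gen_3ac(tree):
    """Emit three-address code for a tuple-encoded assignment tree."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        if not isinstance(node, tuple):          # leaf: a name or a constant
            return str(node)
        op, left, right = node
        l, r = walk(left), walk(right)           # operands first (post-order)
        temp = f"temp{next(counter)}"
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    op, target, expr = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    code.append(f"{target} = {walk(expr)}")
    return code

for line in gen_3ac(("=", "id1", ("+", "id2", ("*", "id3", 60.0)))):
    print(line)
```

Each interior node gets its own temporary, which keeps the generator simple and leaves the cleanup of redundant copies to the optimizer.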
  • 41. CH1.41 Code Optimization  Attempts to improve the intermediate code  Usually for faster execution, but the code may get shorter too  For our example, we can get: temp1 = id3 * 60.0 id1 = id2 + temp1  Great variation in possibilities  “optimizing compilers” can do many things
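The improvement shown for the running example amounts to removing a redundant copy through a temporary. A minimal sketch of that single rewrite, assuming the intermediate code is held as a list of strings of the form `dest = rhs` (the function name `fold_final_copy` is hypothetical):

```python
def fold_final_copy(code):
    """If the last instruction only copies the previous temp, merge the two."""
    if len(code) >= 2:
        dest, _, src = code[-1].partition(" = ")
        prev_dest, _, prev_rhs = code[-2].partition(" = ")
        if src == prev_dest and src.startswith("temp"):
            return code[:-2] + [f"{dest} = {prev_rhs}"]
    return code

before = ["temp1 = id3 * 60.0", "temp2 = id2 + temp1", "id1 = temp2"]
for line in fold_final_copy(before):
    print(line)
```

A real optimizer would check that the temporary is not used anywhere else; here that holds because the temporary was created immediately before its only use.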
  • 42. CH1.42 Optimization Objectives  Optimize for  Performance/speed  Code size  Power consumption  Fast compilation  Security/reliability  Debugging
  • 43. CH1.43 Example program int calc(int a, int b, int N) { int i, x, y; x=y=0; for (i=0; i <=N; i++) { x = x + (4*a/b) * i + (i+1) * (i+1); x = x + b*y; } return x; } What are the possible optimizations?
  • 44. CH1.44 Optimizations  Applicable to the example program  Constant propagation  Algebraic simplification  Dead-code elimination  Loop invariant removal  Strength reduction  Register allocation
  • 45. CH1.45 Optimizations …contd  Unoptimized: 7 + N*21 operations  Optimized: 9+N*7 operations  Execution times can be 43 sec vs 17 sec
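To make the listed optimizations concrete, the example function can be transcribed and optimized by hand. This is a sketch in Python rather than C, valid for non-negative `a` and positive `b` (where Python's `//` agrees with C integer division) and ignoring C integer overflow; it does not try to reproduce the operation counts quoted above. Reusing `i + 1` is an extra step beyond the slide's list.

```python
def calc(a, b, n):
    """Direct transcription of the slide's function."""
    x = y = 0
    for i in range(n + 1):                       # i <= N
        x = x + (4 * a // b) * i + (i + 1) * (i + 1)
        x = x + b * y
    return x

def calc_optimized(a, b, n):
    """Hand-optimized version; each change is marked with its technique."""
    x = 0
    k = 4 * a // b            # loop-invariant removal: computed once
    ki = 0                    # strength reduction: k * i kept by repeated addition
    for i in range(n + 1):
        t = i + 1             # i + 1 computed once per iteration
        x = x + ki + t * t    # y is always 0 (constant propagation), so b * y
        ki += k               # simplifies to 0 and y becomes dead code
    return x

print(calc(3, 2, 5), calc_optimized(3, 2, 5))
```

Register allocation, the last item on the list, has no counterpart at this level; it happens when `x`, `k`, `ki` and `t` are mapped to machine registers.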
  • 46. CH1.46 Code Generation  Generates target code  Relocatable machine code or assembly code  Variables assigned to memory locations  Translates an intermediate instruction to a sequence of machine instructions  Register allocation: a critical task
  • 47. CH1.47 Code Generation …contd  Suppose we use 2 registers, R1 and R2  Our example LOAD id3, R2 MULF #60.0, R2 LOAD id2, R1 ADDF R2, R1 STORE R1, id1
  • 48. CH1.48 Front Ends & Back Ends  Typically phases grouped into a front end and a back end  Front end:  Lexical + syntax + semantic analysis, symbol table creation, intermed. code gen., some optimizations  Portions that are source-language-dependent and machine-independent
  • 49. CH1.49 Front Ends & Back Ends …contd  Back end:  Code generation and machine-dependent optimizations  Portions that are independent of source language  Can port a compiler to different machines by reusing the front end and re-doing the back end
  • 50. CH1.50 Passes of a Compiler  A compiler makes 1 or more passes over a program  Several phases are usually implemented in a single pass  A pass consists of reading an input file and writing an output file  Single-pass compiler may be feasible  E.g., Pascal
  • 51. CH1.51 Passes of a Compiler …contd  Lexical analysis to intermed. code gen. can be grouped into a pass  Activities in these phases interleaved  Here, parser can be seen as “in charge” Calls scanner for the next token Attempts to discover syntactic structure Calls the intermed. code gen. to do semantic analysis and generate portion of code
  • 52. CH1.52 Passes of a Compiler …contd  Why need multiple passes?  Some issues raised early in a program may remain un-answered until later E.g., if the program refers to identifiers/functions defined later  May not have enough memory to do everything in a single pass E.g., Can store intermediate results in a file in one pass and read them in the next pass
  • 53. CH1.53 Why Have We Divided Analysis in This Manner?  Lexical Analysis - Scans Input, Its Linear Actions Are Not Recursive  Identify Only Individual “words” that are the Tokens of the Language  Recursion Is Required to Identify Structure of an Expression, As Indicated in Parse Tree  Verify that the “words” are Correctly Assembled into “sentences”  What is the Third Phase?  Determine Whether the Sentences have One and Only One Unambiguous Interpretation  … and do something about it!  e.g. “John Took Picture of Mary Out on the Patio”
  • 54. CH1.54 Supporting Phases/ Activities for Analysis  Symbol Table Creation / Maintenance  Contains Info (storage, type, scope, args) on Each “Meaningful” Token, Typically Identifiers  Data Structure Created / Initialized During Lexical Analysis  Utilized / Updated During Later Analysis & Synthesis  Error Handling  Detection of Different Errors Which Correspond to All Phases  What Kinds of Errors Are Found During the Analysis Phase?  What Happens When an Error Is Found?
  • 55. CH1.55 The Synthesis Task For Compilation  Intermediate Code Generation  Abstract Machine Version of Code - Independent of Architecture  Easy to Produce and Do Final, Machine Dependent Code Generation  Code Optimization  Find More Efficient Ways to Execute Code  Replace Code With More Optimal Statements  2-approaches: High-level Language & “Peephole” Optimization  Final Code Generation  Generate Relocatable Machine Dependent Code
  • 56. CH1.56 Reviewing the Entire Process  position := initial + rate * 60  → lexical analyzer → id1 := id2 + id3 * 60  → syntax analyzer → := ( id1, + ( id2, * ( id3, 60 ) ) )  → semantic analyzer → := ( id1, + ( id2, * ( id3, inttoreal(60) ) ) )  → intermediate code generator  Symbol Table: position ...., initial …., rate….  Errors are reported from every phase
  • 57. CH1.57 Reviewing the Entire Process  intermediate code generator → 3 address code: temp1 := inttoreal(60); temp2 := id3 * temp1; temp3 := id2 + temp2; id1 := temp3  code optimizer → temp1 := id3 * 60.0; id1 := id2 + temp1  final code generator → MOVF id3, R2; MULF #60.0, R2; MOVF id2, R1; ADDF R2, R1; MOVF R1, id1  Symbol Table: position ...., initial …., rate….  Errors are reported from every phase
  • 58. CH1.58 Assemblers  Assembly code: names are used for instructions, and names are used for memory addresses.  Two-pass Assembly:  First Pass: all identifiers are assigned to memory addresses (0-offset) e.g. substitute 0 for a, and 4 for b  Second Pass: produce relocatable machine code:  MOV a, R1 → 0001 01 00 00000000 *  ADD #2, R1 → 0011 01 10 00000010  MOV R1, b → 0010 01 00 00000100 *  (* = relocation bit)
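The two passes can be sketched for the three instructions on this slide. Everything about the encoding below is inferred from the slide's bit patterns and should be read as an assumption: a 4-bit opcode (0001 load, 0010 store, 0011 add), a 2-bit register number, a 2-bit mode (00 memory, 10 immediate), an 8-bit operand, and 4-byte words for address assignment.

```python
def is_register(operand):
    return operand[0] == "R" and operand[1:].isdigit()

def assemble(lines, word_size=4):
    # Pass 1: give every identifier a 0-based offset, in order of appearance.
    addresses = {}
    for line in lines:
        _, operands = line.split(None, 1)
        for operand in (o.strip() for o in operands.split(",")):
            if not is_register(operand) and not operand.startswith("#"):
                addresses.setdefault(operand, len(addresses) * word_size)

    # Pass 2: emit "opcode register mode operand"; * marks a relocatable address.
    out = []
    for line in lines:
        mnemonic, operands = line.split(None, 1)
        src, dst = (o.strip() for o in operands.split(","))
        if mnemonic == "MOV" and is_register(dst):     # load: memory to register
            opcode, reg, operand = "0001", dst, src
        elif mnemonic == "MOV":                        # store: register to memory
            opcode, reg, operand = "0010", src, dst
        elif mnemonic == "ADD":
            opcode, reg, operand = "0011", dst, src
        else:
            raise ValueError(f"unknown instruction {mnemonic!r}")
        if operand.startswith("#"):                    # immediate constant
            mode, value, mark = "10", int(operand[1:]), ""
        else:                                          # memory address
            mode, value, mark = "00", addresses[operand], " *"
        out.append(f"{opcode} {int(reg[1:]):02b} {mode} {value:08b}{mark}")
    return addresses, out

addresses, code = assemble(["MOV a, R1", "ADD #2, R1", "MOV R1, b"])
print(addresses)
print("\n".join(code))
```

The first pass is what lets the second pass encode `b` even if it were used before being defined; the relocation mark tells the loader which operands to adjust when the code is placed at a non-zero base address.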
  • 59. CH1.59 Loaders and Link-Editors  Loader: taking relocatable machine code, altering the addresses and placing the altered instructions into memory.  Link-editor: taking many (relocatable) machine code programs (with cross-references) and produce a single file.  Need to keep track of correspondence between variable names and corresponding addresses in each piece of code.
  • 60. CH1.60 Compiler Cousins: Preprocessors Provide Input to Compilers 1. Macro Processing #define in C: does text substitution before compiling #define X 3 #define Y A*B+C #define Z getchar()
  • 61. CH1.61 2. File Inclusion  #include in C - bring in another file before compiling  Figure: main.c contains #include “defs.h”; after preprocessing, the contents of defs.h appear in main.c in place of the directive
  • 62. CH1.62 3. Rational Preprocessors  Augment “Old” Languages With Modern Constructs  Add Macros for If - Then, While, Etc.  #define Can Make C Code More Pascal-like #define begin { #define end } #define then
  • 63. CH1.63 4. Language Extensions for a Database System EQUEL - Database query language embedded in a programming language. C ## Retrieve (DN=Department.Dnum) where ## Department.Dname = ‘Research’ is Preprocessed into: ingres_system(“Retr…..Research’”,____,____); a procedure call in a programming language.
  • 64. CH1.64 The Grouping of Phases Front End : Analysis + Intermediate Code Generation Back End : Code Generation + Optimization vs. Number of Passes: A pass: requires r/w intermediate files Fewer passes: more efficiency. However: fewer passes require more sophisticated memory management and compiler phase interaction. Tradeoffs ……..
  • 65. CH1.65 Compiler Construction Tools Parser Generators : Produce Syntax Analyzers Scanner Generators : Produce Lexical Analyzers <= Lex (Flex) Syntax-directed Translation Engines : Generate Intermediate Code <= Yacc (Bison) Automatic Code Generators : Generate Actual Code Data-Flow Engines : Support Optimization
  • 67. CH1.67 Top-Down Parsing  Grammar: (1) L → S ; L (2) L → ε (3) S → if e then S else S (4) S → while e do S (5) S → begin L end (6) S → s  Construct First & Follow Tables.  First: L = { ε, if, while, begin, s }; S = { if, while, begin, s }  Follow: L = { $, end }; S = { ;, else }
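The First and Follow tables for this grammar can be computed with the usual fixed-point iteration. The sketch below is one way to do it; the grammar encoding as `(head, body)` pairs and the names `first_of` and `first_follow` are illustrative assumptions.

```python
EPS, END = "ε", "$"
GRAMMAR = [                       # (head, body); an empty body is the ε-production
    ("L", ["S", ";", "L"]),
    ("L", []),
    ("S", ["if", "e", "then", "S", "else", "S"]),
    ("S", ["while", "e", "do", "S"]),
    ("S", ["begin", "L", "end"]),
    ("S", ["s"]),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def first_of(symbols, first):
    """FIRST of a string of symbols, using the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)               # every symbol (or none at all) can vanish
    return result

def first_follow(start="L"):
    first = {n: set() for n in NONTERMINALS}
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:                # repeat until no set grows any further
        changed = False
        for head, body in GRAMMAR:
            new = first_of(body, first)
            if not new <= first[head]:
                first[head] |= new
                changed = True
            for i, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of(body[i + 1:], first)
                add = tail - {EPS}
                if EPS in tail:   # what follows the head also follows sym
                    add |= follow[head]
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True
    return first, follow

first, follow = first_follow()
print(first)
print(follow)
```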
  • 68. CH1.68 Predictive Recursive Parser main() { lookahead=gettoken(); L(); } L() { if lookahead belongs to ‘if’, ‘while’, ‘begin’ , s then {S(); match(‘;’); L(); } else /* epsilon production */ } S() { if lookahead = ‘if’ then { match(‘if’); match(e); match(‘then’); S(); match(‘else’); S(); } else if lookahead = ‘while’ then { match(‘while’); match(e); match(‘do’); S(); } else if lookahead = ‘begin’ then { match(‘begin’); L(); match(‘end’); } else if lookahead = s then { match(s); } else print(‘error’); } match(t:token) { if t == lookahead then { nexttoken(); lookahead = gettoken(); } else { print(‘error’); } }
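The pseudocode above maps almost one-to-one onto a runnable recursive-descent parser. The following sketch assumes the input is already tokenized into space-separated words, treats `e` and `s` as literal tokens, and raises an exception instead of printing an error; the `Parser` class and `parse` helper are names chosen for this example.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]     # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1                          # consume the token

    def L(self):
        if self.lookahead in ("if", "while", "begin", "s"):   # FIRST(S)
            self.S()
            self.match(";")
            self.L()
        # otherwise: epsilon production, nothing is consumed

    def S(self):
        if self.lookahead == "if":
            self.match("if"); self.match("e"); self.match("then")
            self.S()
            self.match("else")
            self.S()
        elif self.lookahead == "while":
            self.match("while"); self.match("e"); self.match("do")
            self.S()
        elif self.lookahead == "begin":
            self.match("begin")
            self.L()
            self.match("end")
        else:
            self.match("s")

def parse(text):
    parser = Parser(text.split())
    parser.L()
    parser.match("$")                          # all input must be consumed
    return True

print(parse("s ; while e do begin s ; s ; end ; if e then s else s ; s ;"))
```

Keeping the lookahead as a property of the current position avoids the stale-lookahead problem that arises when the variable is read once and never refreshed after a successful match.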
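  • 69. CH1.69 Table-Driven Parser  Columns: ; if then else begin end while s $  Row L: if → (1), begin → (1), end → (2), while → (1), s → (1), $ → (2)  Row S: ; → synch, if → (3), else → synch, begin → (5), while → (4), s → (6)  All other entries are empty (error)  Grammar is LL(1)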
  • 70. CH1.70 Table-Driven Parser Code main() { initstack(); push(L); repeat X=top(); lookahead=gettoken(); if X is terminal or $ then { if X=lookahead then { pop(); nexttoken(); } else error(); } else /* X is a non-terminal */ { switch M[X,lookahead] { case 1: pop(); push(L); push(‘;’); push(S); /* right-hand side pushed in reverse, no token consumed */ case 2: pop(); /* epsilon production */ case 3: ... ... ... ... case ‘synch’: pop(); printf(“error: X expected”); case ‘empty’: nexttoken(); printf(“error: unexpected lookahead”); } }
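The same driver can be written out in full for the LL(1) table of this grammar. This is a sketch under assumptions: tokens are plain strings, the table is a nested dictionary, and the `synch` and `empty` recovery actions are replaced by raising an exception on the first error.

```python
PRODUCTIONS = {
    1: ("L", ["S", ";", "L"]),
    2: ("L", []),
    3: ("S", ["if", "e", "then", "S", "else", "S"]),
    4: ("S", ["while", "e", "do", "S"]),
    5: ("S", ["begin", "L", "end"]),
    6: ("S", ["s"]),
}
TABLE = {   # M[nonterminal][lookahead] -> production number
    "L": {"if": 1, "while": 1, "begin": 1, "s": 1, "end": 2, "$": 2},
    "S": {"if": 3, "while": 4, "begin": 5, "s": 6},
}

def ll1_parse(tokens):
    """Return the list of production numbers applied, in order."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "L"]                 # start symbol on top of the end marker
    pos, applied = 0, []
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top in TABLE:               # non-terminal: expand by the table entry
            rule = TABLE[top].get(look)
            if rule is None:
                raise SyntaxError(f"unexpected {look!r} while expanding {top}")
            applied.append(rule)
            # push the right-hand side in reverse so its first symbol is on top
            stack.extend(reversed(PRODUCTIONS[rule][1]))
        elif top == look:              # terminal or $: must match the input
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return applied

print(ll1_parse("begin s ; end ;".split()))
```

Expanding a non-terminal never consumes input; only a successful terminal match advances the position.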
  • 71. CH1.71 SLR Parsing  Grammar: (1) E → E + T (2) E → T (3) T → T F (4) T → F (5) F → F * (6) F → a (7) F → b  Construct First & Follow Tables:  First: E = { a, b }; T = { a, b }; F = { a, b }  Follow: E = { $, + }; T = { $, +, a, b }; F = { $, +, a, b, * }
  • 72. CH1.72 Canonical Sets of Items + Goto Diagram  Set I0: E’ → . E; E → . E + T; E → . T; T → . T F; T → . F; F → . F *; F → . a; F → . b  Set I1: E’ → E .; E → E . + T  Set I2: E → T .; T → T . F; F → . F *; F → . a; F → . b  Set I3: T → F .; F → F . *  Set I4: F → a .  Set I5: F → b .  Set I6: E → E + . T; T → . T F; T → . F; F → . F *; F → . a; F → . b  Set I7: E → E + T .; T → T . F; F → . F *; F → . a; F → . b  Set I8: T → T F .; F → F . *  Set I9: T → F .; F → F . *  Set I10: F → F * .  Goto diagram (figure not reproduced): transitions between the item sets on E, T, F, a, b, + and *
  • 73. CH1.73 SLR Table  Columns: actions on + * a b $, gotos on E T F  State 0: a s4, b s5, E 1, T 2, F 3  State 1: + s6, $ acc  State 2: + r2, a s4, b s5, $ r2, F 8  State 3: + r4, * s10, a r4, b r4, $ r4  State 4: r6 on + * a b $  State 5: r7 on + * a b $  State 6: a s4, b s5, T 7, F 9  State 7: + r1, a s4, b s5, $ r1, F 8  State 8: + r3, * s10, a r3, b r3, $ r3  State 9: + r4, * s10, a r4, b r4, $ r4  State 10: r5 on + * a b $
  • 74. CH1.74 Code main() { initstack(); push(0); repeat state=top(); lookahead=gettoken(); { switch action[state,lookahead] { case ‘s4’: push(4); nexttoken(); case ‘s5’: push(5); nexttoken(); ... case ‘r1’: pop();pop();pop(); push(goto(top(),E)); case ‘r2’: pop(); push(goto(top(),E)); ... case ‘acc’: printf(“Success”); case ‘empty’: printf(“error”); ... }
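Instead of one `case` per table entry, the driver can interpret the entries generically. The sketch below encodes the SLR table from the previous slide as dictionaries (the `row` helper and the data layout are assumptions) and keeps only state numbers on the stack, so a reduction pops as many states as the production body has symbols.

```python
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 2), 4: ("T", 1),
         5: ("F", 2), 6: ("F", 1), 7: ("F", 1)}   # rule -> (head, body length)

def row(spec):
    """Turn 'a=s4 b=s5' into {'a': 's4', 'b': 's5'}."""
    return dict(item.split("=") for item in spec.split())

ACTION = {
    0: row("a=s4 b=s5"),
    1: row("+=s6 $=acc"),
    2: row("+=r2 a=s4 b=s5 $=r2"),
    3: row("+=r4 *=s10 a=r4 b=r4 $=r4"),
    4: row("+=r6 *=r6 a=r6 b=r6 $=r6"),
    5: row("+=r7 *=r7 a=r7 b=r7 $=r7"),
    6: row("a=s4 b=s5"),
    7: row("+=r1 a=s4 b=s5 $=r1"),
    8: row("+=r3 *=s10 a=r3 b=r3 $=r3"),
    9: row("+=r4 *=s10 a=r4 b=r4 $=r4"),
    10: row("+=r5 *=r5 a=r5 b=r5 $=r5"),
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 2: {"F": 8},
        6: {"T": 7, "F": 9}, 7: {"F": 8}}

def slr_parse(tokens):
    """Return the rule numbers used for reductions, in order."""
    tokens = list(tokens) + ["$"]
    stack, pos, reductions = [0], 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:                            # empty table entry
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if act == "acc":
            return reductions
        if act[0] == "s":                          # shift: push state, consume token
            stack.append(int(act[1:]))
            pos += 1
        else:                                      # reduce by rule number
            rule = int(act[1:])
            head, length = RULES[rule]
            del stack[-length:]                    # pop one state per body symbol
            stack.append(GOTO[stack[-1]][head])
            reductions.append(rule)

print(slr_parse("a b * + a".split()))
```

The returned list is the sequence of reductions, which read in reverse gives a rightmost derivation of the input.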
  • 75. CH1.75 Translation Intuition behind translation:  Assign attributes to terminals and non-terminals.  Typically, some attributes will be of type e.g. string and will carry the translated code as we traverse the parse-tree.  Write semantic rules for each production.  Revisit all parsers and every time a certain production is “selected” (top-down) or “reduced” (bottom-up) execute all its semantic rules.  Things to worry about: dependency between attributes... course of parsing might be inconsistent with attribute dependencies.
  • 76. CH1.76 Simple Translation  Grammar: (1) L → S ; L (2) L → ε (3) S → if e then S else S (4) S → while e do S (5) S → begin L end (6) S → s  Example of an accepted string: s ; while e do begin s ; s ; end ; if e then s else s ; s ;  Translation Example: removal of while-loops  s ; 100: if e then goto 101 else goto 102 ; 101: { s ; s; }; goto 100; 102: if e then s else s ; s
  • 77. CH1.77 Writing A Translation Scheme  Grammar:  (1) L → S ; L1 { L.code = S.code || ‘;’ || L1.code }  (2) L → ε { L.code = ‘’ }  (3) S → if e then S1 else S2 { S.code = ‘if’ || e.lex || ‘then’ || S1.code || ‘else’ || S2.code }  (4) S → while e do S1 { beginlabel = newlabel( ); blocklabel = newlabel ( ); endlabel = newlabel( ); S.code = beginlabel || ‘:’ || ‘if’ || e.lex || ‘then’ || ‘goto’ || blocklabel || ‘else’ || ‘goto’ || endlabel || ‘;’ || blocklabel || ‘:’ || S1.code || ‘goto’ || beginlabel || ‘;’ || endlabel || ‘:’ }  (5) S → begin L end { S.code = ‘{‘ || L.code || ‘}’ }  (6) S → s { S.code = s.lex }
  • 78. CH1.78 Predictive Recursive Translation string main() { lookahead=gettoken(); return L(); } string L() { if lookahead belongs to ‘if’, ‘while’, ‘begin’ , s then {s1 = S(); match(‘;’); s2 = L(); output = concatenate(s1,’;’,s2); } else {output = ‘’} return output; } ... Dealing with inherited attributes: define them as input to procedures
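Extending the recursive parser so that each procedure returns its translated code gives a complete while-loop remover. The sketch below follows the translation scheme of the earlier slide literally, so its punctuation differs slightly from the hand-written example; the `translate` function, the space-separated token format and the label numbering starting at 100 are assumptions for illustration.

```python
import itertools

def translate(text):
    """Translate a program of the L/S grammar, replacing while-loops by gotos."""
    tokens = text.split() + ["$"]
    pos = 0
    labels = itertools.count(100)              # newlabel(): 100, 101, 102, ...

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {tokens[pos]!r}")
        pos += 1

    def L():                                   # returns L.code
        if tokens[pos] in ("if", "while", "begin", "s"):
            s_code = S()
            match(";")
            return f"{s_code} ; {L()}".rstrip()
        return ""                              # epsilon production

    def S():                                   # returns S.code
        look = tokens[pos]
        if look == "if":
            match("if"); match("e"); match("then")
            s1 = S()
            match("else")
            s2 = S()
            return f"if e then {s1} else {s2}"
        if look == "while":
            match("while"); match("e"); match("do")
            begin, block, end = next(labels), next(labels), next(labels)
            body = S()
            return (f"{begin}: if e then goto {block} else goto {end} ; "
                    f"{block}: {body} goto {begin} ; {end}:")
        if look == "begin":
            match("begin")
            inner = L()
            match("end")
            return "{ " + inner + " }"
        match("s")
        return "s"

    code = L()
    match("$")
    return code

print(translate("s ; while e do s ; s ;"))
```

All attributes here are synthesized, so return values are enough; an inherited attribute would become an extra parameter of `L` or `S`, as the slide suggests.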
  • 79. CH1.79 Bottom Up Translation main() { initstack(); push(0); repeat state=top(); lookahead=gettoken(); { switch action[state,lookahead] { ... case ‘r1’: val2=topval();pop(); pop(); val1=topval();pop(); push(goto(top()),concat(val1,’;’,val2)); ... } (1) L → S ; L1 { L.code = S.code || ‘;’ || L1.code } Dealing with inherited attributes: predict their location into the stack
  • 80. CH1.80 Yacc + Lex: Translation Engine  Bottom up translation (LALR)  We only need to write the translation scheme following Yacc specifications.  A variety of attributes can be attached to each non-terminal.  Use of side-effects is possible and recommended to deal with inherited attributes (instead of looking into the stack).  Thanks for Watching !