Lecture 5
Compiler Construction
The Context of a Compiler
• In addition to a compiler, several other
programs may be required to create an
executable target program.
• A source program may be divided into
modules stored in separate files.
• The task of collecting the source program is
sometimes entrusted to a distinct program
called a preprocessor.
The Context of a Compiler
• The target program created by the compiler
may require further processing before it can
be run.
• The compiler creates assembly code that is
translated by an assembler into machine
code and then linked together with some
library routines into the code that actually
runs on the machine.
Preprocessors, Compilers, Assemblers, and Linkers
[Figure: processing pipeline]
Skeletal Source Program → Preprocessor → Source Program → Compiler →
Target Assembly Program → Assembler → Relocatable Object Code →
Linker (together with Libraries and Relocatable Object Files) →
Absolute Machine Code
Analysis of the source program
Linear analysis:
• The stream of characters making up the
source program is read from left to right and
grouped into tokens, which are sequences of
characters having a collective meaning.
• In a compiler, linear analysis is also called
lexical analysis or scanning.
Analysis of the source program
Hierarchical Analysis:
• Characters or tokens are grouped
hierarchically into nested collections with
collective meaning.
• In a compiler, hierarchical analysis is called
parsing or syntax analysis.
Analysis of the source program
Semantic analysis:
• Certain checks are performed to ensure that
the components of a program fit together
meaningfully.
Semantic Analysis – Complications
• Handling ambiguity
– Semantic ambiguity: “I saw the Habib Bank Plaza
flying into Karachi”
Lexical Analysis
• For example, in lexical analysis the characters in the
assignment statement
position := initial + rate * 60
would be grouped into the following tokens:
1. The identifier position
2. The assignment symbol :=
3. The identifier initial
4. The plus sign +
5. The identifier rate
6. The multiplication sign *
7. The number 60
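The grouping above can be sketched with regular expressions. This is a minimal illustration, not a production scanner: the token category names (number, identifier, assign, plus, times) and the function name tokenize are assumptions made for the example.

```python
import re

# Token categories are illustrative names chosen for this sketch.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("assign", r":="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Group the characters of text into (kind, lexeme) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # The remaining characters do not form any token of the language.
            raise ValueError(f"no token starts at position {pos}: {text[pos]!r}")
        if match.lastgroup != "skip":      # blanks separate tokens but are not tokens
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, lexeme in tokenize("position := initial + rate * 60"):
        print(kind, lexeme)
```

Running the script prints the seven tokens in the order listed on the slide, one per line.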
Syntax Analysis
• It involves grouping the tokens of the
source program into grammatical phrases
that are used by the compiler to synthesize
output.
• Usually, the grammatical phrases of the
source program are represented by a parse
tree.
Parse tree
[Figure: parse tree for position := initial + rate * 60]
assignment statement
  identifier: position
  :=
  expression
    expression
      identifier: initial
    +
    expression
      expression
        identifier: rate
      *
      expression
        number: 60
Parse tree tokens
• In the statement position := initial + rate * 60,
the phrase rate * 60 is a logical unit, because
multiplication is performed before addition.
• Because the expression initial + rate is followed by a *,
it is not grouped into a single phrase by itself.
Rules to identify an expression
Rule 1. Any identifier is an expression
Rule 2. Any number is an expression
Rule 3. If expression1 and expression2 are
expressions, then so are
expression1 + expression2
expression1 * expression2
Explanation of rules
• Rules (1) and (2) are (non-recursive) basic rules,
while (3) defines expressions in terms of operators
applied to other expressions.
• By Rule (1), initial and rate are expressions.
• By Rule (2), 60 is an expression.
• By Rule (3), we can first infer that rate * 60 is an
expression and finally that initial + rate * 60 is an
expression.
Rule to identify a statement
• If identifier1 is an identifier and expression2
is an expression, then
identifier1 := expression2
is a statement.
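The rules for expressions and statements can be sketched as a small recursive-descent parser. This is one possible arrangement, not the only one: the rules as stated do not say which operator binds tighter, so the sketch assumes the usual layering (a term groups *, an expression groups +). The function name parse_statement, the (kind, lexeme) token pairs, and the nested-tuple tree are all assumptions made for the example.

```python
def parse_statement(tokens):
    """Parse `identifier := expression` and return a nested-tuple tree.

    tokens is a list of (kind, lexeme) pairs; the kind names are illustrative.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind):
        nonlocal pos
        found, lexeme = peek()
        if found != kind:
            raise SyntaxError(f"expected {kind}, found {found}")
        pos += 1
        return lexeme

    def factor():
        # Rules 1 and 2: an identifier or a number is an expression.
        if peek()[0] == "identifier":
            return ("identifier", take("identifier"))
        return ("number", take("number"))

    def term():
        # Rule 3 for *: grouped first, so rate * 60 becomes one phrase.
        node = factor()
        while peek()[0] == "times":
            take("times")
            node = ("*", node, factor())
        return node

    def expression():
        # Rule 3 for +: combines whole terms.
        node = term()
        while peek()[0] == "plus":
            take("plus")
            node = ("+", node, term())
        return node

    target = take("identifier")
    take("assign")
    tree = (":=", ("identifier", target), expression())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


if __name__ == "__main__":
    example = [
        ("identifier", "position"), ("assign", ":="),
        ("identifier", "initial"), ("plus", "+"),
        ("identifier", "rate"), ("times", "*"), ("number", "60"),
    ]
    print(parse_statement(example))
```

In the resulting tree, rate * 60 appears as a single subtree under +, which matches the grouping described for the parse tree.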
Lecture 6
Context free grammar
• Lexical constructs do not require recursion,
while syntactic constructs often do.
• Context free grammars are a
formalization of recursive rules that can be
used to guide syntactic analysis.
Context free grammar
• For example, recursion is not required to recognize
identifiers, which are typically strings of letters and digits
beginning with a letter.
• We would normally recognize identifiers by a simple scan
of the input stream, waiting until a character that is
neither a letter nor a digit is found,
and then grouping all the letters and digits found up to
that point into an identifier token.
• The characters so grouped are recorded in a table called a
symbol table and removed from the input so that
processing of the next token can begin.
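The simple scan described above needs only a loop and no recursion. A minimal sketch, assuming a hypothetical helper named scan_identifier that returns the lexeme and the position where scanning should resume:

```python
def scan_identifier(text, start=0):
    """Return (lexeme, next_position) for an identifier beginning at start.

    Returns (None, start) if no identifier begins there. Note that
    str.isalpha and str.isalnum also accept non-ASCII letters and digits.
    """
    if start >= len(text) or not text[start].isalpha():
        return None, start              # identifiers must begin with a letter
    end = start + 1
    # Advance until a character that is neither a letter nor a digit is found.
    while end < len(text) and text[end].isalnum():
        end += 1
    return text[start:end], end


if __name__ == "__main__":
    print(scan_identifier("rate * 60"))
```

The returned position plays the role of removing the grouped characters from the input: the next token is scanned from there.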
Phases of Compiler
[Figure: phases of a compiler]
Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer →
Intermediate Code Generator → Code Optimizer → Code Generator → Target Program
The Symbol Table Manager and the Error Handler interact with all of the phases.
Symbol table Management
• An essential function of a compiler is to
record the identifiers used in the source
program and collect information about
various attributes of each identifier.
• These attributes may provide information
about the storage allocated for an identifier,
its type, its scope (where in the program it is
valid), and so on.
Symbol table
• A symbol table is a data structure containing a
record for each identifier, with fields for the
attributes of the identifier.
• The data structure allows us to find the record for
each identifier quickly and to store or retrieve data
from that record quickly.
• When an identifier in the source program is
detected by the lexical analyzer, the identifier is
entered into the symbol table.
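One way to picture such a table is a list of records plus a dictionary for fast lookup by name. This is a sketch under assumptions: the class names SymbolRecord and SymbolTable, the method names enter and lookup, and the particular attribute fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolRecord:
    name: str
    type: Optional[str] = None       # filled in by later phases
    scope: Optional[str] = None
    storage: Optional[int] = None    # e.g. an offset; unknown at scan time


class SymbolTable:
    def __init__(self):
        self._records = []           # records in order of first appearance
        self._index = {}             # name -> 1-based position in _records

    def enter(self, name):
        """Insert name if absent; return its 1-based index either way."""
        if name not in self._index:
            self._records.append(SymbolRecord(name))
            self._index[name] = len(self._records)
        return self._index[name]

    def lookup(self, name):
        """Return the record for name, or None if it was never entered."""
        position = self._index.get(name)
        return None if position is None else self._records[position - 1]


if __name__ == "__main__":
    table = SymbolTable()
    for lexeme in ["position", "initial", "rate", "initial"]:
        print(lexeme, table.enter(lexeme))
    table.lookup("rate").type = "real"   # a later phase records an attribute
    print(table.lookup("rate"))
```

The dictionary gives quick access to each record, and the record's fields can be filled in gradually as later phases learn more about the identifier.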
Error detection and Reporting
• Each phase can encounter errors.
• After detecting an error, a phase must
somehow deal with that error, so that
compilation can proceed, allowing further
errors in the source program to be detected.
• A compiler that stops when it finds the first
error is not as helpful as it could be.
Error detection and Reporting
• The syntax and semantic analysis phases usually
handle a large fraction of the errors detectable by
the compiler.
• The lexical phase can detect errors where the
characters remaining in the input do not form any
token of the language.
• Errors where the token stream violates the
structure rules (syntax) of the language are
detected by the syntax analysis phase.
Error detection and Reporting
• During semantic analysis the compiler tries
to detect constructs that have the right
syntactic structure but no meaning for the
operation involved.
• An example is an attempt to add two identifiers,
one of which is the name of an array and
the other the name of a procedure.
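A check of this kind can be sketched as a function that collects error messages instead of stopping at the first one, so that further errors can still be reported. The function name check_addition and the category names ("integer", "real", "array", "procedure") are assumptions made for the example.

```python
def check_addition(left, right, kinds):
    """Return a list of error messages for the expression `left + right`.

    kinds maps each identifier to a hypothetical category such as
    "integer", "real", "array", or "procedure".
    """
    errors = []
    for name in (left, right):
        kind = kinds.get(name)
        if kind is None:
            errors.append(f"{name}: undeclared identifier")
        elif kind not in ("integer", "real"):
            # Syntactically fine, but + has no meaning for this operand.
            errors.append(f"{name}: operand of + has kind {kind}")
    return errors


if __name__ == "__main__":
    kinds = {"a": "array", "p": "procedure", "x": "real", "n": "integer"}
    print(check_addition("x", "n", kinds))
    print(check_addition("a", "p", kinds))
```

Both operands of a + p are reported, which reflects the point that a phase should deal with an error and carry on.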
The Analysis Phase
position := initial + rate * 60
• The lexical analysis phase reads the characters in
the source program and groups them into a stream
of tokens in which each token represents a
logically cohesive sequence of characters, such as
an identifier, a keyword (if, while, etc.), a
punctuation character, or a multi-character operator
like :=
• The character sequence forming a token is called the
lexeme for the token.
The Analysis Phase (Cont..)
• Certain tokens will be augmented by a
“lexical value”.
• For example, when an identifier like rate is
found, the lexical analyzer not only
generates a token, say id, but also enters the
lexeme rate into the symbol table, if it is not
already there.
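The idea of a token carrying a lexical value can be sketched as follows. Here each id token carries the index of its lexeme in a table, and each number carries its value. The function name tokens_with_values, the token names id and num, and the use of a plain dictionary as the symbol table are assumptions made for the example.

```python
import re

# One token per match: identifier, number, := or a single-character operator.
WORD = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|\d+|:=|[+*])")


def tokens_with_values(text):
    """Return (tokens, table); id tokens carry a 1-based symbol-table index."""
    table = {}                       # lexeme -> index, in order of first use
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = WORD.match(text, pos)
        if match is None:
            raise ValueError(f"no token at position {pos}")
        lexeme = match.group(1)
        if lexeme[0].isalpha():
            # Enter the lexeme only if it is not already in the table.
            index = table.setdefault(lexeme, len(table) + 1)
            tokens.append(("id", index))
        elif lexeme.isdigit():
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))
        pos = match.end()
    return tokens, table


if __name__ == "__main__":
    tokens, table = tokens_with_values("position := initial + rate * 60")
    print(tokens)
    print(table)
```

With this numbering, position, initial and rate become id 1, id 2 and id 3 respectively.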
Compiler Construction
Intermediate code generation
• After syntax and semantic analysis, some
compilers generate an explicit intermediate
representation of the source program.
• This intermediate representation should have two
important properties:
– It should be easy to produce
– It should be easy to translate into code for the target machine
Intermediate code generation
• The intermediate representation can have a
variety of forms.
• One form is called “three address code”,
which is like the assembly language for a
machine in which every memory location
can act like a register.
Intermediate code generation
• Three address code consists of a sequence of
instructions, each of which has at most three
operands.
• The source program might appear in three address
code as
temp1 := inttoreal (60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
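A sketch of how such a sequence could be produced from a tree: walk the tree bottom-up and emit one instruction, with a fresh temporary, per operator. The tuple encoding of the tree, the function name three_address_code, and the way temporaries are numbered are assumptions made for the example.

```python
import itertools


def three_address_code(target, tree):
    """Translate `id<target> := tree` into a list of three address instructions.

    tree is a nested tuple: ("id", n), ("num", value), ("inttoreal", subtree),
    or (operator, left, right). This encoding is illustrative only.
    """
    code = []
    counter = itertools.count(1)

    def emit(node):
        """Emit code for node and return the name that holds its value."""
        op = node[0]
        if op == "id":
            return f"id{node[1]}"
        if op == "num":
            return str(node[1])
        if op == "inttoreal":
            operand = emit(node[1])
            temp = f"temp{next(counter)}"
            code.append(f"{temp} := inttoreal({operand})")
            return temp
        # Binary operator: operands are evaluated first, which fixes the order.
        left = emit(node[1])
        right = emit(node[2])
        temp = f"temp{next(counter)}"
        code.append(f"{temp} := {left} {op} {right}")
        return temp

    result = emit(tree)
    code.append(f"id{target} := {result}")
    return code


if __name__ == "__main__":
    tree = ("+", ("id", 2), ("*", ("id", 3), ("inttoreal", ("num", 60))))
    print("\n".join(three_address_code(1, tree)))
```

For the tree shown in the usage example, the output is the same four instructions as in the listing above.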
Intermediate code generation
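[Figure: syntax tree for the assignment, with := at the root, + and * as interior nodes, and id1, id2, id3 and a number as leaves]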
Intermediate code generation
• The intermediate form has several
properties.
• First, each three address instruction has at
most one operator in addition to the
assignment.
• Thus, when generating these instructions,
the compiler has to decide on the order in
which operations are to be done.
Intermediate code generation
• In our example, the multiplication takes precedence
over the addition in the source program, so it is done first.
• Second, the compiler must generate a temporary
name to hold the value computed by each
instruction.
• Third, some “three address” instructions have
fewer than three operands, for example the first
and last instructions in our example.
Code Optimization
• The code optimization phase attempts to
improve the intermediate code, so that
faster-running machine code will result.
Code Optimization
• The straightforward algorithm that produced the four-instruction intermediate
code uses more instructions than necessary. There is nothing wrong with this
simple algorithm, since the problem can be fixed during the code optimization phase.
• The compiler can deduce that the conversion of 60 from integer to
real representation can be done once and for all at compile time, so the
inttoreal operation can be eliminated.
• Besides, temp3 is used only once, to transmit its value to id1.
• It then becomes safe to substitute id1 for temp3, whereupon the last
statement of the intermediate code is not needed and the optimized code
results:
temp1 := id3 * 60.0
id1 := id2 + temp1
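The two improvements described here can be sketched on a small tuple representation of the intermediate code. This is not a general optimizer: the (dest, op, arg1, arg2) encoding, the "copy" operator name, and the function name optimize are assumptions made for the example.

```python
def optimize(code):
    """Fold inttoreal on constants and remove a final single-use copy.

    Each instruction is (dest, op, arg1, arg2); the ops "copy" and
    "inttoreal" use only arg1. The encoding is illustrative only.
    """
    constants = {}                   # temp name -> folded real constant
    folded = []
    for dest, op, arg1, arg2 in code:
        # Replace any operand whose value is already known at compile time.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "inttoreal" and isinstance(arg1, int):
            constants[dest] = float(arg1)    # conversion done once, at compile time
            continue
        folded.append((dest, op, arg1, arg2))

    # If the last instruction only copies the temporary defined just before
    # it, and that temporary is used nowhere else, assign the target directly.
    if len(folded) >= 2 and folded[-1][1] == "copy":
        target, _, source, _ = folded[-1]
        prev_dest, prev_op, a, b = folded[-2]
        used_earlier = any(source in (x, y) for _, _, x, y in folded[:-2])
        if source == prev_dest and not used_earlier:
            folded[-2:] = [(target, prev_op, a, b)]
    return folded


if __name__ == "__main__":
    code = [
        ("temp1", "inttoreal", 60, None),
        ("temp2", "*", "id3", "temp1"),
        ("temp3", "+", "id2", "temp2"),
        ("id1", "copy", "temp3", None),
    ]
    for instruction in optimize(code):
        print(instruction)
```

The sketch does not renumber temporaries, so the surviving temporary keeps the name temp2 rather than temp1. The two remaining instructions otherwise correspond to the optimized code.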
Code Optimization
• There is great variation in the amount of
code optimization different compilers
perform.
• In those that do the most, called “optimizing
compilers”, a significant fraction of the
compiler's time is spent on this phase.
Code generation
• The final phase of the compiler is the generation of
target code, consisting normally of relocatable machine
code or assembly code.
• Memory locations are selected for each of the
variables used by the program.
• Intermediate instructions are each translated into a
sequence of machine instructions that perform the
same task.
• A crucial aspect is the assignment of variables to
registers.
Code generation
• The first and second operands of each instruction
specify a source and destination, respectively.
• The F in each instruction tells us that the
instructions deal with floating point numbers.
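The instruction listing these bullets refer to is not present in the text, so the following is only a sketch of what such code generation could look like. The mnemonics MOVF, MULF and ADDF, the register names R1, R2, and the #constant notation are assumptions made for the example, as is the (dest, op, left, right) input encoding. The operand order follows the convention stated above: source first, destination second.

```python
import itertools

MNEMONIC = {"+": "ADDF", "*": "MULF"}    # assumed floating point opcodes


def generate(code):
    """Translate (dest, op, left, right) instructions into two-operand assembly.

    Every instruction gets a fresh register; a real code generator would
    reuse registers and handle running out of them.
    """
    fresh = (f"R{n}" for n in itertools.count(1))
    held = {}                        # temp name -> register holding its value
    lines = []

    def render(value):
        if value in held:
            return held[value]       # the temporary lives in a register
        if isinstance(value, float):
            return f"#{value}"       # immediate constant
        return value                 # a named memory location

    for dest, op, left, right in code:
        reg = next(fresh)
        lines.append(f"MOVF {render(left)}, {reg}")             # load the source
        lines.append(f"{MNEMONIC[op]} {render(right)}, {reg}")  # reg := reg op right
        if dest.startswith("temp"):
            held[dest] = reg         # keep temporaries in registers
        else:
            lines.append(f"MOVF {reg}, {dest}")                 # store the result
    return lines


if __name__ == "__main__":
    optimized = [("temp1", "*", "id3", 60.0), ("id1", "+", "id2", "temp1")]
    print("\n".join(generate(optimized)))
```

Keeping each temporary in a register avoids storing and reloading it, which is one small instance of why the assignment of variables to registers matters.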
Code optimization & Code
generation
