Chapter One
Introduction
Compiler Design
Course Outline
Preliminaries Required:
Basic knowledge of programming languages.
Basic knowledge of formal language and automata theory.
Textbook:
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
“Compilers: Principles, Techniques, and Tools”,
Addison-Wesley, 2007.
Objective
At the end of this session students will be able to:
 Understand the basic concepts and principles of compiler design
 Understand the term compiler, its functions, and how it works
 Be familiar with the different classifications of compilers
 Be familiar with the cousins of the compiler: linkers, loaders, interpreters, assemblers
 Understand the need for studying compiler design and construction
 Understand the phases and steps of compilation
 Understand the history of compilers (Assignment for G1 students)
 Be familiar with the different compiler tools (Assignment for G2 students)
Introduction
Definition
 A compiler is an executable program that can read a program in one high-level language and translate it into an equivalent executable program in machine language.
 More generally, a compiler is a computer program that translates a program written in a source language into an equivalent program in a target language.
A source program/code is a program written in the source language, which is usually a high-level language.
A target program/code is a program written in the target language, which is often a machine language or an intermediate code (object code).
[Figure: source program → Compiler → target program, with error messages reported along the way]
Contd.
 As a discipline, compiler design involves multiple computer science and engineering courses, such as:
Programming Languages
Data structures and Algorithms
Theory of Computation (Automata and formal language theory)
Assembly language
Software Engineering
Computer Architecture
Operating Systems and
Discrete Mathematics
Why Study Theory of Compiler?
 Curiosity
 Prerequisite for developing advanced compilers, an area that remains active as new computer architectures emerge
To improve the capabilities of existing compilers/interpreters
To write more efficient code in a high-level language
 Useful for developing software tools that parse computer code or strings
E.g., editors, debuggers, interpreters, preprocessors, …
 Important to understand how compilers work in order to program more effectively
 To provide a solid foundation in parsing theory for parser writing
 To make compiler design an excellent “capstone” project
 To apply almost all of the major computer science fields, such as: automata theory, computer programming, programming language design theory, assembly language, computer architecture, data structures, algorithms, and software engineering.
Classification of Compilers
 Compilers Viewed from Many Perspectives
Single Pass
Multiple Pass
Load & Go
Debugging
Optimizing
 However, all utilize same basic tasks to accomplish their actions
Construction
Functional
o Classifying compilers by number of passes
has its background in the hardware
resource limitations of computers.
o Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work.
 So compilers were split up into
smaller programs which each made a
pass over the source (or some
representation of it) performing some
of the required analysis and
translations.
Contd.
1. Single-Pass (One-Pass) Compilers:- a compiler that passes through the source code of each compilation unit only once.
Also called narrow compilers.
The ability to compile in a single pass has classically been seen as a benefit because it
simplifies the job of writing a compiler.
Single-pass compilers generally perform compilations faster than multi-pass compilers.
Due to the resource limitations of early systems, many early languages were specifically
designed so that they could be compiled in a single pass (e.g., Pascal).
Disadvantages of single-pass compilers:
It is not possible to perform many of the sophisticated optimizations needed to generate
high quality code.
It can be difficult to count exactly how many passes an optimizing compiler makes.
o For instance, different phases of optimization may analyse one expression many times
but only analyse another expression once.
Contd.
2. Multi-Pass Compilers:- a compiler that processes the source code or abstract syntax tree of a program several times.
Also called wide compilers.
Phases are separate "programs", which run sequentially.
Here, by splitting the compiler up into small programs, a correct compiler is more easily produced.
Proving the correctness of a set of small programs often requires less
effort than proving the correctness of a larger, single, equivalent
program.
Many programming languages cannot be compiled in a single pass; for example, most modern languages such as Java require a multi-pass compiler.
Contd.
3. Load and Go Compilers:- generates machine code & then immediately executes it.
Compilers usually produce either absolute code that is executed immediately
upon conclusion of the compilation or object code that is transformed by a
linking loader into absolute code.
These compiler organizations will be called Load & Go and Link/Load.
Both Load & Go and Link/Load compilers use a number of passes to translate the
source program into absolute code.
A pass reads some form of the source program, transforms it into another form, and normally outputs this form to an intermediate file, which may be the input of a later pass.
Contd.
4. Optimizing Compilers:- a compiler that tries to minimize or maximize some attributes of an executable computer program.
The most common requirement is to minimize the time taken to execute a
program; a less common one is to minimize the amount of memory occupied.
The growth of portable computers has created a market for minimizing the power
consumed by a program.
Compiler optimization is generally implemented using a sequence of optimizing
transformations, algorithms which take a program and transform it to produce a
semantically equivalent output program that uses fewer resources.
Types of optimization include peephole optimization, local optimization, global optimization, loop optimization, machine-code optimization, etc.
Cousins of Compilers
A. Assembler:- is a translator that converts programs written in assembly language into
machine code.
 Translates mnemonic operation codes to their machine language equivalents.
 Assigns machine addresses to symbolic labels.
B. Interpreter:- is a computer program that translates high level
instructions/programs into machine code as they are encountered.
 It produces the output of each statement as it is interpreted
 It generally uses one of the following strategies for program execution:
i. execute the source code directly
ii. translate source code into some efficient intermediate representation and immediately
execute this
iii. explicitly execute stored precompiled code made by a compiler which is part of the
interpreter system
Contd.
C. Linker:- is a program that takes one or more objects generated by a compiler and
combines them into a single executable program.
D. Loader:- is the part of an operating system that is responsible for loading programs
from executables (i.e., executable files) into memory, preparing them for execution
and then executing them.
[Figure: source program → preprocessor → modified source program → compiler → target assembly program → assembler → relocatable machine code → linker/loader (with library files) → target machine code]
Compiler vs. Interpreter
 Most languages are usually thought of as using either one or the other:
 Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
 Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk
 BUT: not always implemented this way
[Figure (ideal concept):
Compiler: source code → Compiler → executable; then executable + input data → output data.
Interpreter: source code + input data → Interpreter → output data.]
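The distinction can be sketched in a few lines of Python. This is a toy model, not a real compiler: Python's built-in compile() and exec() merely stand in for translation and execution, and the two-statement "program" is invented for illustration.

```python
# Toy model of the compiler/interpreter distinction, using a tiny "language"
# of assignment statements (illustrative only).
program = ["x = 2", "y = x * 10"]

def interpret(stmts):
    """Interpreter: analyze and execute each statement as it is encountered."""
    env = {}
    for s in stmts:
        exec(s, {}, env)
    return env

def compile_program(stmts):
    """'Compiler': translate the whole program once into an executable artifact."""
    return compile("\n".join(stmts), "<prog>", "exec")

interpreted = interpret(program)
compiled_env = {}
exec(compile_program(program), {}, compiled_env)  # run the artifact separately
print(interpreted == compiled_env)                # True: same observable result
```

Either route produces the same result; the difference is when translation happens, once up front versus statement by statement.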
Basic Compiler Design
 Write a huge program that takes as input another program in the source language for
the compiler, and gives as output an executable that we can run.
To make the code easy to modify, we usually use a modular design (decomposition) methodology to design a compiler.
Two design strategies:
1. Write a “front end” of the compiler (i.e. the lexer, parser, semantic analyzer, and
assembly tree generator), and write a separate back end for each platform that you
want to support
2. Write an efficient, highly optimized back end, and write a different front end for each of several languages, such as Fortran, C, C++, and Java.
[Figure: source code → Front End → intermediate code → Back End → target code]
The Analysis-Synthesis Model of Compilation
Analysis (Front End):
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
 Breaks up the source program into constituent pieces
 Creates an intermediate representation of the source program
Synthesis (Back End):
4. Code Generation
5. Optimization
 Constructs the target program from the intermediate representation
 Takes the tree structure and translates the operations into the target program
There are two parts to compilation: analysis and synthesis.
 During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.
 During synthesis, the translated code is produced from this recorded structure.
Analysis
 In compiling, analysis has three phases:
1. Linear analysis: stream of characters read from left-to-right and grouped into tokens;
known as lexical analysis or scanning
 Converts the input text into a stream of known objects called tokens
 This simplifies the subsequent parsing process
2. Hierarchical analysis: tokens grouped hierarchically with collective meaning; known as
parsing or syntax analysis
 Translating code to rules of grammar.
 Building representation of code.
3. Semantic analysis: check if the program components fit together meaningfully
 Checks source program for semantic errors
 Gathers type information for subsequent code generation (type checking)
 Identifies operator and operands of expressions and statements
Phases of Compilation
stream of characters → scanner → stream of tokens → parser → parse/syntax tree → semantic analyzer → annotated tree → intermediate code generator → intermediate code → code optimization → (optimized) intermediate code → code generator → target code → code optimization → (optimized) target code
General Structure of a Compiler
Phase I: Lexical Analysis
 The low-level text processing portion of the compiler
 The source file, a stream of characters, is broken into larger chunks called tokens.
For example:
void main()
{
int x;
x=3;
}
 The lexical analyzer (scanner) reads a stream of characters and puts them together into
some meaningful (with respect to the source language) units called tokens.
 Typically, spaces, tabs, end-of-line characters and comments are ignored by the lexical
analyzer.
 To design a lexical analyzer: input a description (regular expressions) of the tokens in
the language, and output a lexical analyzer (a program).
It will be broken into 13 tokens as below:
void main ( ) { int x ; x = 3 ; }
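The scanning step above can be sketched with regular expressions, the same formalism real scanner generators are built on. This is an illustrative toy, not a production lexer; the token classes and names are invented for this example.

```python
import re

# A minimal scanner sketch: token classes as regular expressions, tried in order.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(void|int)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("NUMBER",  r"\d+"),
    ("SYMBOL",  r"[(){};=]"),
    ("SKIP",    r"\s+"),          # spaces, tabs, newlines are ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":           # drop whitespace, keep real tokens
            tokens.append((m.lastgroup, m.group()))
    return tokens

src = "void main() { int x; x = 3; }"
toks = tokenize(src)
print(len(toks))                 # 13 tokens, as in the slide
print([text for _, text in toks])
```

Running it on the slide's example yields exactly the 13 tokens listed above, each paired with its token class.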
Phase II: Parsing (Syntax Analysis)
 A parser gets a stream of tokens from the scanner, and determines if the syntax (structure)
of the program is correct according to the (context-free) grammar of the source language.
 Then, it produces a data structure, called a parse tree or an abstract syntax tree, which
describes the syntactic structure of the program.
 The parser ensures that the sequence of tokens returned by the lexical analyzer forms
a syntactically correct program
 It also builds a structured representation of the program called an abstract syntax
tree that is easier for the type checker to analyze than a stream of tokens
 It catches the syntax errors as the statement below:
if if (x > 3) then x = x + 1
 Context-free grammars will be used (as the input) by the parser generator to describe the
syntax of the compiling language
 Most compilers do not generate a parse tree explicitly but rather go directly to an intermediate representation
Parse Tree
 Is the output of parsing that shows the top-down description of program syntax
 Root node is entire program and leaves are tokens that were identified during lexical analysis
 Constructed by repeated application of rules in Context Free Grammar (CFG)
 Syntax structures are analyzed by DPDA (Deterministic Push Down Automata)
Example: parse tree for position:=initial + rate*60
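A hedged sketch of how a parser could build such a tree by recursive descent, for a tiny assignment grammar invented for this example (real parsers handle far more, including error reporting):

```python
import re

# Toy grammar:  stmt -> IDENT ':=' expr
#               expr -> term (('+'|'-') term)*
#               term -> factor (('*'|'/') factor)*
#               factor -> IDENT | NUMBER
def lex(s):
    return re.findall(r":=|[A-Za-z_]\w*|\d+|[+\-*/]", s)

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok
    def factor():
        return take()                      # a leaf: identifier or number
    def term():
        node = factor()
        while peek() in ("*", "/"):        # '*' binds tighter than '+'
            node = (take(), node, factor())
        return node
    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (take(), node, term())
        return node
    target = take()
    assert take() == ":="
    return (":=", target, expr())

tree = parse(lex("position := initial + rate * 60"))
print(tree)   # (':=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

Note how the grammar's layering (expr over term over factor) makes rate * 60 a subtree of the +, matching the parse tree on the slide.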
Phase III: Semantic Analysis
 It gets the parse tree from the parser together with information about some syntactic
elements
 It determines if the semantics (meanings) of the program is correct.
 It detects errors in the program, such as using variables before they are declared, assigning an integer value to a Boolean variable, …
 This part deals with static semantics:
the semantics of programs that can be checked by reading the program text alone;
the parts of the language's syntax that cannot be described by a context-free grammar.
 Mostly, a semantic analyzer does type checking (i.e. Gathers type information for
subsequent code generation.)
 It modifies the parse tree in order to get that (static) semantically correct code
 In this phase, the abstract syntax tree that is produced by the parser is traversed,
looking for semantic errors
Contd.
 The main tool used by the semantic analyzer is a symbol table
 Symbol table:- is a data structure with a record for each identifier and its attributes
 Attributes include storage allocation, type, scope, etc
 All the compiler phases insert and modify the symbol table
 Discovery of meaning in a program using the symbol table
 Do static semantics check
 Simplify the structure of the parse tree (from parse tree to abstract syntax tree (AST))
Static semantics check
 Making sure identifiers are declared before use
 Type checking for assignments and operators
 Checking types and number of parameters to subroutines
 Making sure functions contain return statements
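A minimal symbol-table sketch supporting one of the static checks above, "declared before use". The class design (a stack of per-scope dicts) is one common convention, assumed here for illustration.

```python
class SymbolTable:
    """Stack of scopes; each scope maps an identifier to its attributes."""
    def __init__(self):
        self.scopes = [{}]              # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        self.scopes[-1][name] = {"type": type_}

    def lookup(self, name):
        # Search innermost scope outward; None means "undeclared".
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("y", "bool")
print(table.lookup("x")["type"])   # found in an enclosing scope
print(table.lookup("z"))           # None: use of an undeclared identifier
```

A real compiler would store more attributes per entry (storage allocation, scope level, parameters), as the list above suggests.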
Phase IV: Intermediate Code Generation
 An intermediate code generator
 takes a parse tree from the semantic analyzer
 generates a program in the intermediate language.
 In some compilers, a source program is translated into an intermediate code first and
then the intermediate code is translated into the target language.
 In other compilers, a source program is translated directly into the target language.
 Compiler makes a second pass over the parse tree to produce the translated code
 If there are no compile-time errors, the semantic analyzer translates the abstract
syntax tree into the abstract assembly tree
 The abstract assembly tree will be passed to the code optimization and assembly code
generation phase
Contd.
 Using an intermediate code is beneficial when a compiler that translates a single source language to many target languages is required.
The front end of the compiler (scanner through intermediate code generator) can be shared by all of them.
A different back end (code optimizer and code generator) is required for each target language.
 One popular intermediate code is three-address code.
 A three-address code instruction is in the form of x = y op z.
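As a sketch, the expression position := initial + rate * 60 can be lowered to three-address code, each instruction of the form x = y op z. The tuple-based tree format and temporary naming (t1, t2, ...) are illustrative conventions, not a fixed standard.

```python
counter = 0
code = []

def new_temp():
    """Allocate a fresh temporary name t1, t2, ..."""
    global counter
    counter += 1
    return f"t{counter}"

def gen(node):
    # node is a leaf name/constant (str) or an (op, left, right) tuple
    if isinstance(node, str):
        return node
    op, left, right = node
    l = gen(left)                       # emit code for subtrees first
    r = gen(right)
    t = new_temp()
    code.append(f"{t} = {l} {op} {r}")  # one three-address instruction
    return t

tree = ("+", "initial", ("*", "rate", "60"))
code.append(f"position = {gen(tree)}")
print("\n".join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Each instruction has at most one operator on the right, which is what makes three-address code easy to optimize and to translate to machine code.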
Phase V: Assembly Code Generation
 The code generator converts the abstract assembly tree into the actual assembly code
 To do code generation
 The generator covers the abstract assembly tree with tiles (each tile represents a small portion of an abstract assembly tree), and
 Outputs the actual assembly code associated with the tiles used to cover the tree
Phase VI: Machine Code Generation and Linking
 The final phase of compilation converts the assembly code into machine code and links in (via a linker) the appropriate language libraries
Code Optimization
 Replacing an inefficient sequence of instructions with a better sequence of instructions.
 Sometimes called code improvement.
 Code optimization can be done:
 after semantic analyzing
performed on a parse tree
 after intermediate code generation
performed on an intermediate code
 after code generation
performed on a target code
 Two types of optimization
1. Local
2. Global
Local Optimization
 The compiler looks at a very small block of instructions and tries to determine how it
can improve the efficiency of this local code block
 Relatively easy; included as part of most compilers
Examples of possible local optimizations
1. Constant evaluation
2. Strength reduction
3. Eliminating unnecessary operations
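The three local optimizations above can be sketched on a toy three-address representation. The (op, dst, arg1, arg2) tuple format is invented for this example; real optimizers work over whole basic blocks, not single instructions.

```python
def optimize(instr):
    """Apply one of three local optimizations to a single instruction, if any fits."""
    op, dst, a, b = instr
    # 1. Constant evaluation (folding): compute constant expressions at compile time.
    if op == "*" and isinstance(a, int) and isinstance(b, int):
        return ("=", dst, a * b, None)
    # 2. Strength reduction: replace an expensive operation with a cheaper one.
    if op == "*" and b == 2:
        return ("+", dst, a, a)        # x * 2  ->  x + x
    # 3. Eliminating unnecessary operations.
    if op == "+" and b == 0:
        return ("=", dst, a, None)     # x + 0  ->  x
    return instr                       # no local rule applies

print(optimize(("*", "t1", 3, 4)))     # ('=', 't1', 12, None)
print(optimize(("*", "t2", "x", 2)))   # ('+', 't2', 'x', 'x')
print(optimize(("+", "t3", "x", 0)))   # ('=', 't3', 'x', None)
```

Each rewrite preserves the instruction's meaning while producing a cheaper equivalent, which is the defining property of an optimization.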
Global Optimization
 The compiler looks at large segments of the program to decide how to improve
performance
 Much more difficult; usually omitted from all but the most sophisticated and
expensive production-level “optimizing compilers”
 Optimization cannot make an inefficient algorithm efficient
The Phases of a Compiler
Phase → Output → Sample
 Programmer (source code producer) → source string → A=B+C;
 Scanner (performs lexical analysis) → token string → ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’ and a symbol table with names
 Parser (performs syntax analysis based on the grammar of the programming language) → parse tree or abstract syntax tree →
;
|
=
/ \
A +
/ \
B C
 Semantic analyzer (type checking, etc.) → annotated parse tree or abstract syntax tree
 Intermediate code generator → three-address code, quads, or RTL →
int2fp B t1
+ t1 C t2
:= t2 A
 Optimizer → three-address code, quads, or RTL →
int2fp B t1
+ t1 #2.3 A
 Code generator → assembly code →
MOVF #2.3,r1
ADDF2 r1,r2
MOVF r2,A
Summary of the Phases of a Compiler
Compiler Construction Tools
Software development tools are available to implement one or more compiler phases
 Scanner generators
 Parser generators.
 Syntax-directed translation engines
 Automatic code generators
 Data Flow Engines
Other compiler tools:
 JavaCC, a parser generator for Java, including a scanner generator and a parser generator. Input specifications are different from those suitable for Lex/YACC. Also, unlike YACC, JavaCC generates a top-down parser.
 ANTLR, a set of language translation tools
(formerly PCCTS). Includes scanner/parser
generators for C, C++, and Java.
 Scanner generators for C/C++: Flex, Lex.
 Parser generators for C/C++: Bison, YACC.
 Available scanner generators for Java:
 JLex, a scanner generator for Java, very similar to Lex.
 JFlex, flex for Java.
 Available parser generators for Java:
 CUP, a parser generator for Java, very similar to YACC.
 BYACC/J, a different version of Berkeley YACC for Java. It is an extension of the standard YACC (a -j flag has been added to generate Java code).
Assignment I
Compiler Design Tools
History of Compilers
Thank You ...
More Related Content

PPTX
AN INTRODUCTION TO MOBILE APPLICATION DEVELOPMENT
PPTX
Multimedia
PPTX
Generations of computer
PDF
An introduction to software engineering
PDF
Software Project Estimation
PDF
Report of e commerce website
PPTX
Internet
PDF
Growth of the Internet
AN INTRODUCTION TO MOBILE APPLICATION DEVELOPMENT
Multimedia
Generations of computer
An introduction to software engineering
Software Project Estimation
Report of e commerce website
Internet
Growth of the Internet

What's hot (20)

PPTX
CS8494 SOFTWARE ENGINEERING Unit-1
PPTX
6 types of web application development
PPTX
PPTX
Introduction to Internet By Anamika
PPT
The History Of The Internet Presentation
PPT
Chapter01 the systems development environment
PPTX
Web application
PDF
Project Planning in Software Engineering
PPT
Lecture 2
PPTX
Multimedia production
PPTX
Computer Virus
PPTX
Benefits of Internet
DOCX
Industrial Training report on java
PPTX
Web development with Python
PPTX
Software Configuration Management (SCM)
PPTX
Appsolutely! Using SeeSaw for student portfolios
PPTX
Issues in software cost estimation
PPTX
Ch 6 development plan and quality plan
PPT
Introduction to computer programming
PPTX
Network Technologies
CS8494 SOFTWARE ENGINEERING Unit-1
6 types of web application development
Introduction to Internet By Anamika
The History Of The Internet Presentation
Chapter01 the systems development environment
Web application
Project Planning in Software Engineering
Lecture 2
Multimedia production
Computer Virus
Benefits of Internet
Industrial Training report on java
Web development with Python
Software Configuration Management (SCM)
Appsolutely! Using SeeSaw for student portfolios
Issues in software cost estimation
Ch 6 development plan and quality plan
Introduction to computer programming
Network Technologies
Ad

Similar to Chapter 1.pptx compiler design lecture note (20)

PPTX
4_5802928814682016556.pptx
PPTX
CD - CH1 - Introduction to compiler design.pptx
PPTX
Cd ch1 - introduction
PPT
Compiler design lessons notes from Semester
PPTX
Compiler Construction from very basic start
PDF
Chapter#01 cc
PDF
COMPILER DESIGN.pdf
PPTX
Ch 1.pptx
PPT
Compiler Design Basics
PPTX
Chapter 2 Program language translation.pptx
PPTX
Introduction to Compilers
PDF
Compiler type
PPTX
Unit2_CD.pptx more about compilation of the day
PPT
Compiler Design Basics
PPTX
Lecture 1 introduction to language processors
PPTX
Compiler Design Slides for Third Year Computer Science and Engineering
PPTX
Pros and cons of c as a compiler language
PDF
Lecture1 introduction compilers
PPTX
Compiler presentaion
PDF
unit1pdf__2021_12_14_12_37_34.pdf
4_5802928814682016556.pptx
CD - CH1 - Introduction to compiler design.pptx
Cd ch1 - introduction
Compiler design lessons notes from Semester
Compiler Construction from very basic start
Chapter#01 cc
COMPILER DESIGN.pdf
Ch 1.pptx
Compiler Design Basics
Chapter 2 Program language translation.pptx
Introduction to Compilers
Compiler type
Unit2_CD.pptx more about compilation of the day
Compiler Design Basics
Lecture 1 introduction to language processors
Compiler Design Slides for Third Year Computer Science and Engineering
Pros and cons of c as a compiler language
Lecture1 introduction compilers
Compiler presentaion
unit1pdf__2021_12_14_12_37_34.pdf
Ad

More from adugnanegero (9)

PPTX
Chapter 3pppppppppppppppppppppppppppp.pptx
PPTX
ch2.pptxiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
PPTX
chuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu1.pptx
PPTX
Chapter 4.pptx compiler design lecture note
PPTX
Chapter 2.pptx compiler design lecture note
PPT
fggggggggggggggggggggggggggggggfffffffffffffffffff
PPTX
hghghghhghghgggggggggggggggggggggggggggggggggg
PPT
ghhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhjjjjjjjjjjjjjjj
PPT
RTOS Material hfffffffffffffffffffffffffffffffffffff
Chapter 3pppppppppppppppppppppppppppp.pptx
ch2.pptxiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
chuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu1.pptx
Chapter 4.pptx compiler design lecture note
Chapter 2.pptx compiler design lecture note
fggggggggggggggggggggggggggggggfffffffffffffffffff
hghghghhghghgggggggggggggggggggggggggggggggggg
ghhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhjjjjjjjjjjjjjjj
RTOS Material hfffffffffffffffffffffffffffffffffffff

Recently uploaded (20)

PPTX
UNIT 4 Total Quality Management .pptx
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
Lesson 3_Tessellation.pptx finite Mathematics
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
PPT on Performance Review to get promotions
PDF
Structs to JSON How Go Powers REST APIs.pdf
PDF
composite construction of structures.pdf
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
Lecture Notes Electrical Wiring System Components
PPTX
Construction Project Organization Group 2.pptx
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
UNIT 4 Total Quality Management .pptx
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
Lesson 3_Tessellation.pptx finite Mathematics
CH1 Production IntroductoryConcepts.pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPT on Performance Review to get promotions
Structs to JSON How Go Powers REST APIs.pdf
composite construction of structures.pdf
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Internet of Things (IOT) - A guide to understanding
Lecture Notes Electrical Wiring System Components
Construction Project Organization Group 2.pptx
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
bas. eng. economics group 4 presentation 1.pptx
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...

Chapter 1.pptx compiler design lecture note

  • 2. Basic knowledge of Programming languages. Basic knowledge of Formal language and AutomataTheory. Textbook: AlfredV.Aho, Ravi Sethi, and Jeffrey D. Ullman, “Compilers:Principles,Techniques,andTools” Addison-Wesley, 2007. Course Outline PreliminariesRequired
  • 3. Objective At the end of this session students will be able to:  Understand the basic concepts and principles of Compiler Design  Understand the term compiler, its functions and how it works.  Be familiar with the different classification of compilers.  Be familiar with cousins of compiler: Linkers, Loaders, Interpreters,Assemblers  Understand the need of studying compiler Design and construction  Understand the phases of compilation and the steps of compilation  Understand the history of compilers. (Assignment For G1 Students)  Be familiar with the different compiler tools (Assignment For G2 Students) 3
  • 4. Introduction Definition  Compiler is an executable program that can read a program in one high-level language and translate it into an equivalent executable program in machine language.  A compiler is a computer program that translates an executable program in a source language into an equivalent program in a target language. A source program/code is a program/code written in the source language, which is usually a high-level language. 4 A target program/code is a program/code written in the target language, which often is a machine language or an intermediate code (Object Code). Compiler Source program Target program Error message
  • 5. Contd.  As a discipline compiler design involves multiple computer science and Engineering courses like: Programming Languages Data structures and Algorithms Theory of Computation (Automata and formal language theory) Assembly language Software Engineering Computer Architecture Operating Systems and Discrete Mathematics 5
  • 6. Why Study Theory of Compiler?  curiosity  Prerequisite for developing advanced compilers, which continues to be active as new computer architectures emerge To improve capabilities of existing compiler/interpreter To write more efficient code in a high-level language  Useful to develop software tools that parse computer codes or strings E.g., editors, debuggers, interpreters, preprocessors, …  Important to understand how compliers work to program more effectively  To provide solid foundation in parsing theory for parser writing  To make compiler design as an excellent “capstone” project  To apply almost all of the major computer science fields, such as: automata theory, computer programming, programming language design theory, assembly language, computer architecture, data structures, algorithms, and software engineering. 6
  • 7. Classification of Compilers 7  CompilersViewed from Many Perspectives Single Pass Multiple Pass Load & Go Debugging Optimizing  However, all utilize same basic tasks to accomplish their actions Construction Functional o Classifying compilers by number of passes has its background in the hardware resource limitations of computers. o Compiling involves performing lots of work and early computers did not have enough memory to contain one program that did all of this work.  So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.
  • 8. Contd. 1. Single(One) Pass Compilers:- is a compiler that passes through the source code of each compilation unit only once Also called narrow compilers. The ability to compile in a single pass has classically been seen as a benefit because it simplifies the job of writing a compiler. Single-pass compilers generally perform compilations faster than multi-pass compilers. Due to the resource limitations of early systems, many early languages were specifically designed so that they could be compiled in a single pass (e.g., Pascal). Disadvantage of single pass Compilers: It is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. o For instance, different phases of optimization may analyse one expression many times but only analyse another expression once. 8
  • 9. Contd. 2. Multi-Pass Compilers:- is a type of compiler that processes the source code or abstract syntax tree of a program several times. Also called wide compilers. Phases are separate "Programs", which run sequentially Here, by splitting the compiler up into small programs, correct programs will be produced. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program. Many programming languages cannot be represented with a single pass compilers, for example most latest languages like Java require a multi-pass compiler. 9
  • 10. Contd. 3. Load and Go Compilers:- generates machine code & then immediately executes it. Compilers usually produce either absolute code that is executed immediately upon conclusion of the compilation or object code that is transformed by a linking loader into absolute code. These compiler organizations will be called Load & Go and Link/Load. Both Load & Go and Link/Load compilers use a number of passes to translate the source program into absolute code. A pass reads some form of the source program, transforms it into an another form, and normally outputs this form to an intermediate file which may be the input of a later pass. 10
  • 11. Contd. 4. Optimizing Compilers:- is a compiler that tries to minimize or maximize some attributes of an executable computer program. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied. The growth of portable computers has created a market for minimizing the power consumed by a program. Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources. Types of Optimization includes Peephole optimizations, local optimization, global optimization, loop optimization, machine-code optimization, etc. [Read more ] 11
  • 12. Cousins of Compilers A. Assembler:- is a translator that converts programs written in assembly language into machine code.  Translate mnemonic operation codes to their machine language equivalents.  Assigning machine addresses to symbolic labels. B. Interpreter:- is a computer program that translates high level instructions/programs into machine code as they are encountered.  It produces output of statement as they are interpreted  It generally uses one of the following strategies for program execution: i. execute the source code directly ii. translate source code into some efficient intermediate representation and immediately execute this iii. explicitly execute stored precompiled code made by a compiler which is part of the interpreter system 12
  • 13. Contd. C. Linker:- is a program that takes one or more object files generated by a compiler and combines them into a single executable program. D. Loader:- is the part of an operating system that is responsible for loading programs from executables (i.e., executable files) into memory, preparing them for execution, and then executing them. 13 A typical language-processing pipeline: source program → preprocessor → modified source program → compiler → target assembly program → assembler → relocatable machine code → linker/loader (together with library files) → target machine code
  • 14. Compiler vs. Interpreter  Most languages are usually thought of as using either one or the other:  Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1  Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk  BUT: not always implemented this way 14 Ideal concept:  Compiler: source code → compiler → executable; then executable + input data → output data  Interpreter: source code + input data → interpreter → output data
  • 15. Basic Compiler Design  Write a huge program that takes as input another program in the source language for the compiler, and gives as output an executable that we can run. To make the code easy to modify, we usually use a modular design (decomposition) methodology to design a compiler. Two design strategies: 1. Write a “front end” of the compiler (i.e. the lexer, parser, semantic analyzer, and assembly tree generator), and write a separate back end for each platform that you want to support 2. Write an efficient, highly optimized back end, and write a different front end for several languages, such as Fortran, C, C++, and Java. 15 Source code → Front End → Intermediate code → Back End → Target code
  • 16. The Analysis-Synthesis Model of Compilation 16 There are two parts to compilation: analysis & synthesis.  During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.  During synthesis, the operations recorded during analysis are used to produce the translated code.  Analysis (Front End): 1. Lexical Analysis 2. Syntax Analysis 3. Semantic Analysis — breaks up the source program into constituent pieces and creates an intermediate representation of the source program.  Synthesis (Back End): 4. Code Generation 5. Optimization — constructs the target program from the intermediate representation; takes the tree structure and translates the operations into the target program.
  • 17. Analysis  In compiling, analysis has three phases: 1. Linear analysis: the stream of characters is read from left to right and grouped into tokens; known as lexical analysis or scanning  Converting input text into a stream of known objects called tokens.  It simplifies the scanning process 2. Hierarchical analysis: tokens are grouped hierarchically with collective meaning; known as parsing or syntax analysis  Checking the code against the rules of the grammar.  Building a representation of the code. 3. Semantic analysis: checks whether the program components fit together meaningfully  Checks the source program for semantic errors  Gathers type information for subsequent code generation (type checking)  Identifies the operators and operands of expressions and statements 17
  • 18. Phases of Compilation — General Structure of a Compiler: stream of characters → scanner → stream of tokens → parser → parse/syntax tree → semantic analyzer → annotated tree → intermediate code generator → intermediate code → code optimization → intermediate code → code generator → target code → code optimization → target code
  • 19. Phase I: Lexical Analysis  The low-level text processing portion of the compiler  The source file, a stream of characters, is broken into larger chunks called tokens.  The lexical analyzer (scanner) reads a stream of characters and puts them together into some meaningful (with respect to the source language) units called tokens.  Typically, spaces, tabs, end-of-line characters and comments are ignored by the lexical analyzer.  To design a lexical analyzer: input a description (regular expressions) of the tokens in the language, and output a lexical analyzer (a program). For example: void main() { int x; x=3; } will be broken into 13 tokens as below: void main ( ) { int x ; x = 3 ; } 19
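The slide's design recipe — regular expressions in, scanner out — can be sketched in a few lines. This is a hypothetical toy scanner, not a real lexer generator: the token classes in `TOKEN_SPEC` are invented for the example above, and `SKIP` discards the whitespace the slide says is ignored.

```python
import re

# A minimal scanner sketch: each token class is described by a regular
# expression; whitespace is matched but discarded. Names are illustrative.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:void|int)\b"),   # reserved words of the toy language
    ("NUMBER",  r"\d+"),                # integer literals
    ("ID",      r"[A-Za-z_]\w*"),       # identifiers (checked after keywords)
    ("SYMBOL",  r"[(){};=]"),           # punctuation and operators
    ("SKIP",    r"\s+"),                # spaces, tabs, newlines: ignored
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token_class, lexeme) pairs, skipping whitespace."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(source)
            if m.lastgroup != "SKIP"]

tokens = tokenize("void main() { int x; x=3; }")
print(len(tokens))  # 13, matching the slide's count
```

The order of entries in `TOKEN_SPEC` matters: keywords must be tried before the general identifier pattern, or `int` would be scanned as an `ID`.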
  • 20. Phase II: Parsing (Syntax Analysis)  A parser gets a stream of tokens from the scanner, and determines if the syntax (structure) of the program is correct according to the (context-free) grammar of the source language.  Then, it produces a data structure, called a parse tree or an abstract syntax tree, which describes the syntactic structure of the program.  The parser ensures that the sequence of tokens returned by the lexical analyzer forms a syntactically correct program  It also builds a structured representation of the program called an abstract syntax tree that is easier for the type checker to analyze than a stream of tokens  It catches syntax errors such as the duplicated keyword in the statement below: if if (x > 3) then x = x + 1  Context-free grammars will be used (as the input) by the parser generator to describe the syntax of the compiling language  Most compilers do not generate a parse tree explicitly but rather go directly to an intermediate representation 20
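A hand-written recursive-descent parser makes the idea concrete. This is a sketch for an invented two-rule expression grammar (not the grammar of any real language): `expr → term ('+' term)*`, `term → factor ('*' factor)*`, `factor → NUMBER | IDENT`; it builds an abstract syntax tree as nested tuples and rejects token sequences the grammar does not derive.

```python
# A minimal recursive-descent parser sketch. One function per grammar rule;
# the call structure of the functions mirrors the shape of the parse tree.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                       # factor -> NUMBER | IDENT
        tok = eat()
        if tok.isdigit() or tok.isidentifier():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                         # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            eat()
            node = ("*", node, factor())
        return node

    def expr():                         # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            eat()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError("trailing input after expression")
    return tree

print(parse(["initial", "+", "rate", "*", "60"]))
# ('+', 'initial', ('*', 'rate', '60'))
```

Because `term` is called from inside `expr`, `*` binds tighter than `+` without any extra machinery — precedence falls out of the grammar's shape.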
  • 21. Parse Tree  Is the output of parsing that shows the top-down description of the program's syntax  The root node is the entire program and the leaves are the tokens that were identified during lexical analysis  Constructed by repeated application of the rules of a Context-Free Grammar (CFG)  Syntax structures are analyzed by a DPDA (Deterministic Push-Down Automaton) Example: parse tree for position := initial + rate*60 21
  • 22. Phase III: Semantic Analysis  It gets the parse tree from the parser together with information about some syntactic elements  It determines if the semantics (meaning) of the program is correct.  It detects errors in the program, such as using variables before they are declared, or assigning an integer value to a Boolean variable, …  This part deals with static semantics:  semantics of programs that can be checked by reading off from the program only.  aspects of the language's syntax that cannot be described by a context-free grammar.  Mostly, a semantic analyzer does type checking (i.e. gathers type information for subsequent code generation.)  It modifies the parse tree in order to get (statically) semantically correct code  In this phase, the abstract syntax tree that is produced by the parser is traversed, looking for semantic errors 22
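The two example errors on this slide — use before declaration, and assigning an int to a Boolean — can be sketched as a single pass over a flattened program. Everything here is hypothetical: the statement tuples `("decl", type, name)` and `("assign", name, value)` stand in for an abstract syntax tree, and `check` stands in for the semantic analyzer's traversal.

```python
# A sketch of two static-semantics checks. A real analyzer would walk an
# AST and a full symbol table; here a dict of declared types is enough.

def check(statements):
    declared = {}       # name -> declared type
    errors = []
    for stmt in statements:
        if stmt[0] == "decl":
            _, typ, name = stmt
            declared[name] = typ
        else:  # ("assign", name, value)
            _, name, value = stmt
            if name not in declared:
                errors.append(f"'{name}' used before declaration")
            elif (declared[name] == "bool"
                  and isinstance(value, int) and not isinstance(value, bool)):
                errors.append(f"cannot assign int to bool '{name}'")
    return errors

prog = [("decl", "bool", "flag"),
        ("assign", "flag", 3),    # type error: int into bool
        ("assign", "y", 1)]       # error: y never declared
print(check(prog))
```

The `not isinstance(value, bool)` guard is needed because in Python `True` is itself an `int`; a real compiler's type lattice would make this distinction directly.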
  • 23. Contd.  The main tool used by the semantic analyzer is a symbol table  Symbol table:- is a data structure with a record for each identifier and its attributes  Attributes include storage allocation, type, scope, etc.  All the compiler phases insert into and modify the symbol table  Discovery of meaning in a program using the symbol table  Do static semantics checks  Simplify the structure of the parse tree (from parse tree to abstract syntax tree (AST)) Static semantics checks  Making sure identifiers are declared before use  Type checking for assignments and operators  Checking types and number of parameters to subroutines  Making sure functions contain return statements 23
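A common way to realize a symbol table with scoping is a stack of dictionaries, one per scope, with lookup proceeding from the innermost scope outward. This is one illustrative design among many (hash tables, trees, and per-scope chained tables are all used in practice); the class and attribute names below are invented for the sketch.

```python
# A sketch of a scoped symbol table: the scope stack grows when a block is
# entered and shrinks when it is left; inner declarations shadow outer ones.

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):  # e.g. type=..., storage=...
        self.scopes[-1][name] = attributes

    def lookup(self, name):
        """Search innermost scope outward; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.declare("x", type="int", storage=4)
table.enter_scope()
table.declare("x", type="float", storage=8)
print(table.lookup("x")["type"])   # float: inner declaration shadows outer
table.exit_scope()
print(table.lookup("x")["type"])   # int: outer declaration visible again
```

The "used before declaration" check from the previous slide reduces to `lookup` returning `None`.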
  • 24. Phase IV: Intermediate Code Generation  An intermediate code generator  takes a parse tree from the semantic analyzer  generates a program in the intermediate language.  In some compilers, a source program is translated into an intermediate code first and then the intermediate code is translated into the target language.  In other compilers, a source program is translated directly into the target language.  Compiler makes a second pass over the parse tree to produce the translated code  If there are no compile-time errors, the semantic analyzer translates the abstract syntax tree into the abstract assembly tree  The abstract assembly tree will be passed to the code optimization and assembly code generation phase 24
  • 25. Contd.  Using intermediate code is beneficial when compilers that translate a single source language to many target languages are required.  The front end of a compiler:- scanner through intermediate code generator, can be shared by all such compilers.  A different back end:- code optimizer and code generator, is required for each target language.  One popular intermediate code is three-address code.  A three-address code instruction is in the form of x = y op z. 25
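Generating three-address code from an expression tree is a short post-order walk: emit code for both operands, then one `x = y op z` instruction into a fresh temporary. This is a sketch under the assumption that the tree arrives as nested tuples like the parser example earlier; the function name `gen_tac` is invented here.

```python
# A sketch of three-address code generation: each interior tree node becomes
# one instruction "tN = left op right" with a newly allocated temporary.

def gen_tac(tree):
    code = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):        # leaf: identifier or literal
            return node
        op, left, right = node
        l, r = walk(left), walk(right)   # post-order: operands first
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    result = walk(tree)
    return code, result

# position := initial + rate * 60, as an AST
code, result = gen_tac(("+", "initial", ("*", "rate", "60")))
code.append(f"position = {result}")
print("\n".join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Note the instruction order: the deepest subexpression (`rate * 60`) is emitted first, exactly the evaluation order the target machine will follow.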
  • 26. Phase V: Assembly Code Generation  The code generator converts the abstract assembly tree into the actual assembly code  To do code generation  The generator covers the abstract assembly tree with tiles (each tile represents a small portion of an abstract assembly tree) and  Outputs the actual assembly code associated with the tiles that were used to cover the tree 26 Phase VI: Machine Code Generation and Linking  The final phase of compilation converts the assembly code into machine code and links in (by a linker) the appropriate language libraries
  • 27. Code Optimization  Replacing an inefficient sequence of instructions with a better sequence of instructions.  Sometimes called code improvement.  Code optimization can be done:  after semantic analysis, performed on a parse tree  after intermediate code generation, performed on intermediate code  after code generation, performed on target code  Two types of optimization 1. Local 2. Global 27
  • 28. Local Optimization  The compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code block  Relatively easy; included as part of most compilers Examples of possible local optimizations 1. Constant evaluation 2. Strength reduction 3. Eliminating unnecessary operations 28
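Two of the local optimizations listed above — constant evaluation and strength reduction — can be sketched as a rewrite of single three-address instructions. The tuple encoding `(dest, left, op, right)` and the function `optimize` are illustrative assumptions, not any compiler's actual representation.

```python
# A sketch of two local optimizations applied instruction by instruction:
#   constant evaluation:  t = 4 * 2  becomes  t = 8
#   strength reduction:   t = x * 2  becomes  t = x + x  (add is cheaper)

FOLD = {"+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b}

def optimize(instr):
    dest, left, op, right = instr
    # Constant evaluation: both operands known at compile time.
    if op in FOLD and isinstance(left, int) and isinstance(right, int):
        return (dest, FOLD[op](left, right), None, None)
    # Strength reduction: multiplication by 2 becomes an addition.
    if op == "*" and right == 2:
        return (dest, left, "+", left)
    return instr    # nothing to improve in this small window

print(optimize(("t1", 4, "*", 2)))    # ('t1', 8, None, None)
print(optimize(("t2", "x", "*", 2)))  # ('t2', 'x', '+', 'x')
```

Because each rewrite looks at only one instruction, this is the "very small block" flavor of optimization the slide describes; global optimizations would need whole-program analysis.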
  • 29. Global Optimization  The compiler looks at large segments of the program to decide how to improve performance  Much more difficult; usually omitted from all but the most sophisticated and expensive production-level “optimizing compilers”  Optimization cannot make an inefficient algorithm efficient 29
  • 30. 30 The Phases of a Compiler — phase, output, and sample (for the source string A=B+C;):  Programmer (source code producer) → source string: A=B+C;  Scanner (performs lexical analysis) → token string: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’ and a symbol table with names  Parser (performs syntax analysis based on the grammar of the programming language) → parse tree or abstract syntax tree: = at the root, with children A and +, where + has children B and C  Semantic analyzer (type checking, etc.) → annotated parse tree or abstract syntax tree  Intermediate code generator → three-address code, quads, or RTL: int2fp B t1; + t1 C t2; := t2 A  Optimizer → three-address code, quads, or RTL: int2fp B t1; + t1 #2.3 A  Code generator → assembly code: MOVF #2.3,r1; ADDF2 r1,r2; MOVF r2,A
  • 31. Summary of Phases of Compiler 31
  • 32. Compiler Construction Tools 32 Software development tools are available to implement one or more compiler phases  Scanner generators  Parser generators  Syntax-directed translation engines  Automatic code generators  Data flow engines Other compiler tools:  JavaCC, a parser generator for Java, including a scanner generator and a parser generator. Input specifications are different from those suitable for Lex/YACC. Also, unlike YACC, JavaCC generates a top-down parser.  ANTLR, a set of language translation tools (formerly PCCTS). Includes scanner/parser generators for C, C++, and Java.  Scanner generators for C/C++: Flex, Lex.  Parser generators for C/C++: Bison, YACC.  Available scanner generators for Java:  JLex, a scanner generator for Java, very similar to Lex.  JFlex, flex for Java.  Available parser generators for Java:  CUP, a parser generator for Java, very similar to YACC.  BYACC/J, a different version of Berkeley YACC for Java. It is an extension of the standard YACC (a -j flag has been added to generate Java code).
  • 33. Assignment I Compiler Design Tools History of Compilers Thank You ... ?