Compiler Design
Dr. Gameil Hamzh
Introduction
Assessment Plan
• Attendance + Quizzes = 10 Marks
• Midterm Exam = 20 Marks
• Presentation = 10 Marks
• Final Exam = 60 Marks
Language processing systems
Test.cpp → Test.obj → Test.exe
Language processing systems
• We have learnt that any computer system is made of
hardware and software. The hardware understands a
language that humans cannot easily understand.
• So we write programs in a high-level language, which is
easier for us to understand and remember. These
programs are then fed into a series of tools and OS
components to produce code that the machine can use.
This is known as a language processing system.
Language processing systems
• The user writes a program in a high-level language
• The compiler translates the program into an assembly
program (low-level language)
• An assembler then translates the assembly program into machine
code (object code)
• A linker links all the parts of the program together into
executable machine code
• A loader loads the executable into memory, and then the program is
executed
What is a compiler?
• A program that translates source code into
equivalent target code
• Source code is written in a programming language,
e.g. C++ or Java
• Target code is often machine-readable object
code
• A bridge between (application) software and
hardware
Compiler: a black box view
Source program → [compiler] → Target program
[compiler] → Error message
Compiler: Phases of Compiler
Phase                          Output
(input)                        Source code: characters
Lexical analyser (scanner)     Tokens
Syntax analyser (parser)       Parse/syntax tree
Semantic analyser              Annotated tree
Intermediate code generator    Intermediate code
Code optimization              Optimised intermediate code
Code generator                 Target code
Code optimization              Optimised target code
All phases communicate with the Symbol Table and the Error Table.
Compilation process
• Broadly divided into two phases
  • Analysis (also called front-end)
    • (Lexical, Syntactic, Semantic) Analysis
    • Programming language dependent
    • Computer architecture independent
  • Synthesis (also called back-end)
    • (Intermediate) Code generation and optimisation
    • Programming language independent
    • Computer architecture dependent
Compilation process (cont.)
• Each phase transforms the source program from
one representation into another representation
• The phases communicate with the symbol table
and error table
Brief Introduction to Phases of Compiler
An overview of compiler phases
• Lexical Analysis (or Scanning or Tokenizing):
In lexical analysis, the stream of characters making up the source
program is read from left to right and grouped into lexemes, which
are sequences of characters having a collective meaning. Each lexeme
is classified into a category called a token.
For example, consider the following assignment statement:
Position = Initial + Rate * 60
(All the variables are of type real)
After passing through the lexical analysis phase, the above
assignment statement takes the following form:
id1 = id2 + id3 * 60
Note: The lexical analyser discards white space as well as
comments from the source program.
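The lexical analysis step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token patterns in TOKEN_SPEC and the function name lex are hypothetical, cover just this one statement, and number identifiers id1, id2, ... in order of first appearance.

```python
import re

# Hypothetical token patterns for this one statement; a real language has many more.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[=+*]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(source):
    """Return (tokens, symbols); identifiers become id1, id2, ... in order of first use."""
    symbols = {}  # lexeme -> "idN"
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # No pattern matches here: this is a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue  # white space is discarded
        if kind == "ID":
            lexeme = symbols.setdefault(lexeme, f"id{len(symbols) + 1}")
        tokens.append(lexeme)
    return tokens, symbols
```

Calling lex("Position = Initial + Rate * 60") yields the token sequence id1 = id2 + id3 * 60 together with a mapping from each identifier to its id name.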
• Syntax Analysis (Parsing or Hierarchical Analysis):
In the syntax analysis phase, tokens are grouped
hierarchically into nested collections with collective meanings. It
groups the tokens of the source program into grammatical
phrases, which are then used by the compiler to synthesize the
output.
After passing through the syntax analysis phase, the above
assignment statement takes the following form:
        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   60
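One way to build such a tree is a small recursive-descent parser. The sketch below assumes a hypothetical grammar with only assignment, + and *, and represents each tree node as a tuple (operator, left, right); the function name parse_assignment is illustrative, not part of the lecture.

```python
def parse_assignment(tokens):
    """Parse `id = expr`, where expr uses + and * with the usual precedence."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():  # factor -> identifier | number
        if peek() is None or peek() in ("=", "+", "*"):
            raise SyntaxError(f"operand expected at token {pos}")
        return advance()

    def term():  # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():  # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError("'=' expected")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Because term handles * below expr, the multiplication ends up deeper in the tree than the addition, which is what gives * its higher precedence.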
• Semantic Analysis:
The semantic analysis phase performs certain checks to ensure that the
components of a program fit together meaningfully. It checks the
source program for semantic errors and gathers type information from
the symbol table for the subsequent code generation phase.
It uses the hierarchical structure determined by the syntax analysis
phase to identify the operations and operands of the expressions and
statements.
An important component of semantic analysis is type checking. Here
the compiler checks that each operator has operands that are permitted
by the source language specification.
• For example, when we try to add a function name to an array name
and store the result in a variable, the compiler will generate an error.
Moreover, many language specifications require a compiler to
generate an error when a real number is used as the index of an array.
However, some languages allow type conversion, which is done
in the semantic analysis phase.
• The above assignment statement, after passing through the semantic
analysis phase, takes the following form, with the integer constant 60
converted to real:
        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   inttoreal
                  |
                  60
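The type checking and conversion described above can be sketched as a recursive walk over the syntax tree. Everything here is an assumption made for illustration: the tuple-based tree, the types dictionary standing in for the symbol table, the node name inttoreal, and the rule that a real value may not be assigned to an int variable.

```python
def annotate(node, types):
    """Return (annotated_node, type); widen an int operand that meets a real one."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"  # integer constant
        if node not in types:
            raise TypeError(f"undeclared identifier {node}")
        return node, types[node]
    op, left, right = node
    left, ltype = annotate(left, types)
    right, rtype = annotate(right, types)
    if ltype not in ("int", "real") or rtype not in ("int", "real"):
        # e.g. adding a function name to an array name
        raise TypeError(f"operator {op!r} cannot be applied to {ltype} and {rtype}")
    if ltype != rtype:
        if op == "=" and ltype == "int":
            # Assumed rule: no implicit narrowing on assignment.
            raise TypeError("cannot assign real to int without an explicit conversion")
        if ltype == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        ltype = "real"
    return (op, left, right), ltype
```

With all three identifiers declared real, the constant 60 is the only int operand, so it is the only node that gets wrapped in a conversion.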
• Intermediate Code Generation:
After the syntax and semantic analysis phases, some compilers generate
an explicit intermediate representation of the source program. We can
think of this intermediate representation as a program for an abstract
machine.
It takes the form of three-address code, which is like the assembly
language for a machine in which every memory location (i.e. variable)
can act like a register. It consists of a sequence of instructions, each of
which has at most three operands.
The above statement becomes:
Temp1 = inttoreal(60)
Temp2 = id3 * Temp1
Temp3 = id2 + Temp2
id1 = Temp3
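A minimal sketch of how such three-address code could be produced from an annotated tree follows. The tuple-based tree shape and the names gen_tac and new_temp are assumptions for illustration; each interior node gets one fresh temporary, emitted after its operands.

```python
def gen_tac(tree):
    """Emit three-address instructions for an annotated assignment tree."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"Temp{counter}"

    def walk(node):
        if isinstance(node, str):
            return node  # an identifier or constant is already an address
        if node[0] == "inttoreal":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttoreal({arg})")
            return temp
        op, left, right = node
        left_addr, right_addr = walk(left), walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    _, target, expr = tree
    result = walk(expr)  # emits the instructions for the right-hand side first
    code.append(f"{target} = {result}")
    return code
```

The post-order walk is what guarantees that every operand has been computed before the instruction that uses it.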
• Code Optimization:
The code optimization phase attempts to improve the
intermediate code, for example by removing unnecessary
instructions, so that faster-running machine code results.
The above intermediate code will be optimized as:
Temp1 = id3 * 60.0
id1 = id2 + Temp1
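The two improvements visible in this example, converting the constant at compile time and dropping the final copy, can be sketched as one pass over the instructions. This is a simplified illustration, not a general optimizer: instructions are assumed to be (dest, op, a, b) tuples, the op name copy is hypothetical, and the copy is removed only when it directly follows the instruction that produced its source. Temporaries keep their original names here instead of being renumbered.

```python
def optimise(code):
    """Fold inttoreal(constant) and merge a copy into the preceding instruction."""
    consts = {}  # temporary -> constant text it is known to hold
    out = []
    for dest, op, a, b in code:
        # Replace operands that are known constants.
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttoreal" and a.isdigit():
            consts[dest] = f"{a}.0"  # conversion done at compile time
            continue
        if op == "copy" and out and out[-1][0] == a:
            # `t = x op y; d = t` becomes `d = x op y`
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
            continue
        out.append((dest, op, a, b))
    return out
```

Four instructions shrink to two, matching the optimized code above apart from the temporary's name.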
• Code Generation:
This phase generates target code, normally assembly code (or
sometimes relocatable machine code when the assembler is
embedded).
So the above optimized code will be written in assembly
language as:
MOVF R1, id3
MULF R1, #60.0
MOVF R2, id2
ADDF R1, R2
MOVF id1, R1
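The translation above can be imitated by a very naive code generator. The sketch assumes a hypothetical machine with destination-first instructions, an unlimited supply of registers that are never freed, only the commutative operations + and *, and temporaries named Temp followed by a number; real code generators also handle register allocation and instruction selection.

```python
MNEMONIC = {"+": "ADDF", "*": "MULF"}  # both operations are commutative


def gen_asm(code):
    """Translate (dest, op, a, b) instructions into destination-first assembly text."""
    asm = []
    in_reg = {}  # temporary name -> register currently holding it
    used = 0     # number of registers handed out so far

    def operand(name):
        return f"#{name}" if name[0].isdigit() else name  # '#' marks a constant

    def load(name):
        nonlocal used
        used += 1
        asm.append(f"MOVF R{used}, {operand(name)}")
        return f"R{used}"

    for dest, op, a, b in code:
        left = in_reg[a] if a in in_reg else load(a)
        if b in in_reg:
            # Accumulate into the register that already holds a temporary.
            left, right = in_reg[b], left
        else:
            right = operand(b)
        asm.append(f"{MNEMONIC[op]} {left}, {right}")
        if dest.startswith("Temp"):
            in_reg[dest] = left  # the result stays in a register
        else:
            asm.append(f"MOVF {dest}, {left}")  # store to the variable
    return asm
```

Feeding it the two optimized instructions reproduces the five assembly lines shown above.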
Symbol Table Management
A symbol table is a data structure containing information about
the attributes of each identifier.
For example, for a variable, the attributes may include:
• type of variable
• memory allocated to this variable
• address of variable
• scope of variable
and for a procedure:
• procedure name
• the number and type of arguments
• return type (if any)
Each phase of the compiler stores data in and retrieves data from
the symbol table, and updates it if required.
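A symbol table with the attributes listed above could be sketched as a stack of dictionaries, one per scope. The class and field names below (Symbol, SymbolTable, declare, lookup) are hypothetical choices for illustration; production compilers use more elaborate structures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    name: str
    kind: str                      # "variable" or "procedure"
    type: str                      # variable type, or return type of a procedure
    address: Optional[int] = None  # storage address for a variable
    params: list = field(default_factory=list)  # argument types of a procedure


class SymbolTable:
    def __init__(self):
        self.scopes = [{}]  # the innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, symbol):
        scope = self.scopes[-1]
        if symbol.name in scope:
            raise NameError(f"{symbol.name} already declared in this scope")
        scope[symbol.name] = symbol

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} is not declared")
```

Searching from the innermost scope outwards lets an inner declaration shadow an outer one with the same name, and the outer one becomes visible again once the inner scope is exited.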
Error handling
• Each phase of the compiler can encounter errors. After detecting an
error, a phase must somehow deal with it so that compilation can
proceed and further errors in the source program can be
detected.
• All of this error detection is done by the error detection routine. The
syntax and semantic analysis phases usually handle a large fraction of
the errors detectable by the compiler.
• The lexical analyser can detect errors only when the input characters
cannot form any token of the language. Errors where the token stream
violates the structure rules of the language are detected by the syntax
analysis phase.
• During semantic analysis, the compiler tries to detect language
constructs that have correct syntactic structure but no meaning for the
operation involved, e.g. if we try to add two identifiers, one of which is
the name of an array and the other a function name, the semantic
analyser will generate an error.
Compiler vs. Interpreter
• A compiler transforms a source code file into an
object code file
• An interpreter translates source code line by line,
executes it, and then discards the translated
version
• Compiled languages are generally more time-efficient than
interpreted languages (Think why?)
• Compiled languages are generally more space-efficient than
interpreted languages (Think why?)
• Interpreted languages are easier to debug (Think
why?)
Applications of Compiler Technology
• HTML and word processing documents
• General consistency checking could benefit
from type checking
• Textual user interfaces use parsing to recognise
users’ utterances
Tokens, Patterns, and Lexemes
• A token is a name given to a logical unit in the
language; often it is a pair:
  • token name (e.g., identifier, number)
  • token value (e.g., "myCounter")
• A lexeme is a sequence of program characters
that forms a token
  • e.g., "myCounter"
• A pattern is a description of the form that the
lexemes of a token may take
  • e.g., character strings including A-Z, a-z, 0-9, and _
Lexical Analyser
• Groups sequences of characters into lexemes
  • A lexeme is the smallest meaningful entity in a language
(keywords, identifiers, constants)
• Makes use of the theory of regular languages and finite
state machines.
Lexical Analysis
input: r = x * (a+10)
token   value/lexeme
ID      r
ASN     =
ID      x
MUL     *
LP      (
ID      a
PLUS    +
INT     10
RP      )
Tokens are typically represented by numbers, for efficiency reasons
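Representing tokens by numbers can be sketched with an integer enumeration. The numeric codes, the class name Tok, and the patterns below are arbitrary assumptions for illustration; only the token names come from the table above.

```python
import re
from enum import IntEnum


class Tok(IntEnum):
    ID = 1
    INT = 2
    ASN = 3
    PLUS = 4
    MUL = 5
    LP = 6
    RP = 7


# Hypothetical patterns, one per token class in the table above.
PATTERNS = [
    (Tok.ID, r"[A-Za-z_]\w*"),
    (Tok.INT, r"\d+"),
    (Tok.ASN, r"="),
    (Tok.PLUS, r"\+"),
    (Tok.MUL, r"\*"),
    (Tok.LP, r"\("),
    (Tok.RP, r"\)"),
]
SCANNER = re.compile("|".join(f"(?P<{tok.name}>{pat})" for tok, pat in PATTERNS))


def scan(text):
    """Return a list of (token code, lexeme) pairs, skipping white space."""
    pairs = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"no token starts with {text[pos]!r}")
        pairs.append((Tok[m.lastgroup], m.group()))
        pos = m.end()
    return pairs
```

Later phases can then compare small integers such as Tok.ID instead of comparing strings, while the lexeme is kept alongside for identifiers and constants.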
Classes of Tokens
• Keywords (also called reserved words)
• Operators
• Identifiers
• Constants: numbers and literal strings
• Punctuation symbols
Examples of Tokens
Token     Description                               Sample lexemes
IF        ‘i’, ‘f’                                  if
ELSE      ‘e’, ‘l’, ‘s’, ‘e’                        else
OPR       Plus, minus, times                        +, -, *
ID        Letter followed by letters and digits     pi, score, D2
NUM       Any numeric constant                      345, 45.6
LITERAL   Anything in double or single quotes       “core dumped”
Lexical Analyser: Issues & Remed
• How to describe tokens?
  • Regular expressions could be used
  • Often called specification
• How to break text down into tokens?
  • Finite automata could be used
  • Often called implementation
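The implementation side can be sketched as a table-driven finite automaton. The states, the transition table, and the choice to recognise only the ID and NUM tokens are assumptions made for illustration; a scanner generator would derive such a table from the regular expressions automatically.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


# Transition table: state -> {character class -> next state}.
TRANSITIONS = {
    "start": {"letter": "in_id", "digit": "in_int"},
    "in_id": {"letter": "in_id", "digit": "in_id"},
    "in_int": {"digit": "in_int", "dot": "need_frac"},
    "need_frac": {"digit": "in_frac"},
    "in_frac": {"digit": "in_frac"},
}
# Accepting states and the token each one recognises.
ACCEPTING = {"in_id": "ID", "in_int": "NUM", "in_frac": "NUM"}


def classify(lexeme):
    """Run the automaton over the whole lexeme; return its token name or None."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS[state].get(char_class(ch))
        if state is None:
            return None  # no transition: the automaton rejects
    return ACCEPTING.get(state)
```

Here classify("D2") and classify("45.6") are accepted as ID and NUM respectively, while "45." is rejected because the automaton stops in a non-accepting state.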
