Lecture 21 22

Compiler Construction
(CS-636)
Muhammad Bilal Bashir
UIIT, Rawalpindi

Outline
1. Data Types & Type Checking
2. Intermediate Code Generation
3. Variants of Syntax Trees
4. Three-Address Code
5. Static Single-Assignment Form
6. Summary

Semantic Analysis
Lecture: 21-22

Data Types & Type Checking
 One of the principal tasks of a compiler is the
computation and maintenance of information on
data types (type inference)
 The compiler uses this information to ensure that each
part of the program makes sense under the type
rules of the language (type checking)
 Data type information can occur in a program in
several different forms
 Theoretically, a data type is a set of values, or more
precisely a set of values with certain operations on
those values

Data Types & Type Checking (Continue…)
 For instance, data type integer in a programming
language refers to a subset of mathematical
integers, together with the arithmetic operations
 In compiler construction, these sets are described
by type expressions
 Type expressions can occur in several places in a
program

Type Expressions & Type
Constructors
 A programming language always contains a number
of built-in types
 These predefined types correspond either to
numeric data types like int or double OR they are
elementary types like boolean or char
 Such data types are called simple types, in that
their values exhibit no explicit internal structure
 An interesting predefined type in the C language is
the void type
 This type has no values, and so represents the empty set

Constructors (Continue…)
 In some languages it is possible to define new
simple types
 Examples are subrange types in Pascal and enumerated types in C
 In Pascal, subrange of integers from 0 to 9 can be
declared as
type Digit = 0..9;
 In C, an enumerated type consisting of named
values can be declared as
typedef enum {red, green, blue} Color;

Constructors (Continue…)
 Given a set of predefined types, new data types can
be created using type constructors, such as array
and record, or struct
 Such constructors can be viewed as functions that
take existing types as parameters and return new
types with a structure that depends on the
constructor
 Such types are called structured types

Type Names, Type Declarations, and
Recursive Types
 Languages that have a rich set of type constructors
usually also have a mechanism for a programmer to
assign names to type expressions
 Such type declarations (sometimes called type
definitions) can be written in C as follows:
struct RealIntRec {
    double r;
    int i;
};

Recursive Types (Continue…)
 Type declarations cause the declared type names to
be entered into the symbol table just as variable
declarations cause variable names to be entered
 Type names are associated with attributes in the
symbol table in a similar way to variable
declarations
 These attributes include scope and type
expressions corresponding to the type name
 Since type names can appear in type expressions,
questions arise about the recursive use of type
names

Recursive Types (Continue…)
 In the C programming language, a structure cannot
contain a member of its own type directly, because the
amount of memory required for the structure would be
unknown at the time of declaration
 Recursion must go through pointers, whose size is
known:
struct intBST {
    int val;
    struct intBST *left, *right;
};

Type Equivalence
 Given the possible type expressions of a language,
a type checker must frequently answer the question
of when two type expressions represent the same
type
 This is the question of type equivalence
 There are many possible ways for type equivalence
to be defined by a language
 Type equivalence checking can be seen as a
function in a compiler
function typeEqual( t1, t2 : TypeExp ) : Boolean

Type Equivalence (Continue…)
 The typeEqual() function takes two type
expressions and returns true if they represent the
same type according to the type equivalence rules
of the language
 One issue that relates directly to the description of
a type equivalence algorithm is the way type
expressions are represented within a compiler
 One straightforward method is to use a syntax tree
representation
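
The idea of a typeEqual() function over a syntax-tree representation can be sketched as below. This is a minimal illustration of structural equivalence only, one of several possible equivalence rules. The nested-tuple encoding and the constructor tags "array", "record", and "pointer" are assumptions made for this sketch, and recursive types are not handled.

```python
# Type expressions as small syntax trees (hypothetical encoding):
#   simple type   -> a string such as "int" or "double"
#   array         -> ("array", size, element_type)
#   record/struct -> ("record", ((field_name, field_type), ...))
#   pointer       -> ("pointer", target_type)

def type_equal(t1, t2):
    """Return True if t1 and t2 are structurally equivalent."""
    # Simple types are equal only if they are the same name.
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2
    # Structured types must be built with the same constructor.
    if t1[0] != t2[0]:
        return False
    kind = t1[0]
    if kind == "array":
        return t1[1] == t2[1] and type_equal(t1[2], t2[2])
    if kind == "pointer":
        return type_equal(t1[1], t2[1])
    if kind == "record":
        f1, f2 = t1[1], t2[1]
        return len(f1) == len(f2) and all(
            n1 == n2 and type_equal(ft1, ft2)
            for (n1, ft1), (n2, ft2) in zip(f1, f2)
        )
    raise ValueError(f"unknown type constructor: {kind}")


real_int_rec = ("record", (("r", "double"), ("i", "int")))
same_shape = ("record", (("r", "double"), ("i", "int")))
print(type_equal(real_int_rec, same_shape))                       # True
print(type_equal(("array", 10, "int"), ("array", 10, "double")))  # False
```

Under name equivalence, by contrast, the two record types above would be equal only if they were declared under the same type name.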

Type Inference & Type Checking
 Type checking is described in terms of semantic
actions based on a representation of types and a
typeEqual() operation
 The compiler also needs a symbol table for this
purpose, along with its three basic operations:
insert, lookup, and delete

(Continue…)
 Consider the following grammar;


Intermediate-Code
Generation
Back-end of a Compiler

Where Are We Now?
Source code → Scanner → Tokens → Parser → Syntax Tree →
Semantic Analyzer → Annotated Tree →
Intermediate Code Generator → Intermediate code

Intermediate-Code Generation
 In the analysis-synthesis model of a compiler, the front
end analyzes a source program and creates an
intermediate representation, from which the back
end generates target code
 Ideally, details of the source language are confined
to the front end, and details of the target machine to
the back end
 With a suitably defined intermediate representation,
a compiler for language i and machine j can then be
built by combining the front end for language i with
the back end for machine j

(Continue…)
 The following figure shows the front-end model of a compiler
 Static checking includes type checking, which
ensures that operators are applied to compatible
operands
 Static checking also includes any syntactic checks
that remain after parsing
 For example, a break statement in C must be enclosed within
a while, for, or switch statement

(Continue…)
 While translating a program, compiler may construct
a sequence of intermediate representations
 High-level representations are close to the source
language and low-level representations are close to
the target machine
 Abstract syntax trees are a high-level
intermediate representation
 They depict the natural hierarchical structure of the source program
Source Program → High-Level Intermediate Representation →
Low-Level Intermediate Representation → Target Code

(Continue…)
 A low-level representation is suitable for machine-dependent
tasks like register allocation and
instruction selection
 Three-address code can range from high-level to low-level,
depending upon the choice of operators
 For expressions, the differences between syntax trees and
three-address code are superficial
 For statements, a syntax tree represents the components of the
statement, whereas three-address code contains labels and jump
instructions to represent the flow of control, as in machine
language

(Continue…)
 The choice or design of an intermediate
representation varies from compiler to compiler
 An intermediate representation may either be an
actual language or it may consist of internal data
structures that are shared by phases of the compiler
 C is a programming language, yet it is often used as
an intermediate form
 C is flexible, it compiles into efficient machine code, and its
compilers are widely available
 The original C++ compiler consisted of a front end that generated
C, treating a C compiler as a back end

Variants of Syntax Trees
 Nodes in a syntax tree represent constructs in the
source program
 The children of a node represent the meaningful
components of the construct
 A directed acyclic graph (DAG) for an expression
identifies the common subexpressions of the
expression

Directed Acyclic Graphs for
Expressions
 A directed acyclic graph (DAG) is a directed graph
with no directed cycles
 Like the syntax tree for an expression, a DAG has
leaves corresponding to atomic operands and
interior nodes corresponding to operators
 A node N in a DAG has more than one parent if N
represents a common subexpression
 A DAG not only represents expressions more
succinctly, it gives the compiler important clues
regarding the generation of efficient code to
evaluate the expression

Directed Acyclic Graphs for
Expressions (Continue…)
 Create Syntax Trees and DAG’s for the following
expressions
 a = a + 10
 a + b + (a + b)
 a + b + a + b
 a + a * (b - c) + (b - c) * d

The Value-Number Method for
Constructing DAG’s
 Often, the nodes of a syntax tree or DAG are stored
in an array of records
 Each row of the array represents one record, and
therefore one node
 Consider the figure on next slide that shows a DAG
along with an array for expression i = i + 10

Constructing DAG’s (Continue…)
 In the following figure leaves have one additional
field, which holds the lexical value, and interior
nodes have two additional fields indicating the left
and right children

Constructing DAG’s (Continue…)
 In the array, we refer to nodes by giving the integer
index of the record for that node within the array
 This integer is called the value number for the node
or for the expression represented by the node
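
The array-of-records idea can be sketched as follows for the expression i = i + 10. This is an illustrative sketch, not a definitive implementation: the class name Dag, its methods, and the use of a dictionary to find an existing node are assumptions. Value numbers here start at 0 because they are list indices.

```python
class Dag:
    """Nodes live in a list; a node's index in the list is its value number."""

    def __init__(self):
        self.nodes = []   # records: ("id", name), ("num", value), (op, left, right)
        self.index = {}   # record -> value number, used to detect repeated nodes

    def _node(self, record):
        # Reuse the existing node if an identical record was already created.
        if record in self.index:
            return self.index[record]
        self.nodes.append(record)
        self.index[record] = len(self.nodes) - 1
        return self.index[record]

    def leaf(self, kind, value):
        # Leaves carry one extra field: the lexical value.
        return self._node((kind, value))

    def op(self, operator, left, right):
        # Interior nodes carry the value numbers of their two children.
        return self._node((operator, left, right))


dag = Dag()
i = dag.leaf("id", "i")
ten = dag.leaf("num", 10)
plus = dag.op("+", i, ten)
assign = dag.op("=", i, plus)
for number, record in enumerate(dag.nodes):
    print(number, record)
```

Because a second request for the same (operator, left, right) record returns the existing value number, a common subexpression ends up as a single node with more than one parent.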

Three-Address Code
 In three-address code, there is at most one
operation on the right side of an instruction
 Expression like x+y*z might be translated into the
sequence of three-address instructions
t1 = y*z
t2 = x+t1
Here t1 and t2 are compiler-generated temporary names
 The use of names for intermediate values computed
by a program allows three-address code to be
rearranged easily
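
The translation of x+y*z can be reproduced with a short sketch. The nested-tuple expression encoding and the function name gen_tac are assumptions made for illustration. The walk is a plain post-order traversal that creates one fresh temporary per operator.

```python
import itertools


def gen_tac(expr):
    """Translate a nested-tuple expression into three-address instructions.

    expr is either a name or constant (str) or a tuple (op, left, right).
    Returns (list of instructions, address holding the final value).
    """
    code = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):
            return node                  # names and constants are already addresses
        op, left, right = node
        a = walk(left)
        b = walk(right)
        temp = f"t{next(counter)}"       # fresh compiler-generated temporary
        code.append(f"{temp} = {a} {op} {b}")
        return temp

    result = walk(expr)
    return code, result


code, result = gen_tac(("+", "x", ("*", "y", "z")))
print("\n".join(code))
# t1 = y * z
# t2 = x + t1
```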

Three-Address Code (Continue…)
 Exercise
 Represent the following DAG in three-address code
sequence

Addresses and Instructions
 Three-address code is built from two concepts:
addresses and instructions
 In object-oriented terms, these concepts correspond
to classes, and the various kinds of addresses and
instructions correspond to appropriate subclasses
 Alternatively, three-address code can be
implemented using records with fields for the
addresses
 These records are called quadruples and triples

Addresses and Instructions (Continue…)
 In three-address code scheme, an address can be
one of the following
 A name: The names that appear in source program. In
implementation, a source name is replaced by a pointer to
its symbol table entry, where all the information about the
name is kept
 A constant: In practice, a compiler must deal with many
different types of constants and variables
 A compiler-generated temporary: It is useful, especially in
optimizing compilers, to create a distinct name each time a
temporary is needed

 A few examples of three-address code instructions
are listed below:
 Assignment instruction x = y op z
 Assignment of the form x = op y
 Copy instructions of the form x = y
 An unconditional jump goto L
 Conditional jumps of the form if x goto L
 Indexed copy instructions of the form x = y[z] OR y[z] = x
 etc.

 Consider the following statement and its three-
address code in the figures;
do
i = i+1;
while( a[i]<v );
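
One possible three-address translation of this loop is sketched below, together with a tiny interpreter so that the effect of the label and the conditional jump can be observed. The tuple instruction encoding, the opcode names, and the run function are assumptions made for illustration. The array is indexed by element, so no width scaling of the index is generated.

```python
# One possible three-address translation of:  do i = i + 1; while (a[i] < v);
PROGRAM = [
    ("label", "L"),
    ("add", "t1", "i", 1),        # t1 = i + 1
    ("copy", "i", "t1"),          # i = t1
    ("index", "t2", "a", "i"),    # t2 = a[i]
    ("if_lt", "t2", "v", "L"),    # if t2 < v goto L
]


def run(program, env):
    """Interpret the small instruction set above; env maps names to values."""
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}

    def value(x):
        # A string is a name looked up in env; anything else is a constant.
        return env[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(program):
        ins = program[pc]
        pc += 1
        if ins[0] == "add":
            env[ins[1]] = value(ins[2]) + value(ins[3])
        elif ins[0] == "copy":
            env[ins[1]] = value(ins[2])
        elif ins[0] == "index":
            env[ins[1]] = env[ins[2]][value(ins[3])]
        elif ins[0] == "if_lt":
            if value(ins[1]) < value(ins[2]):
                pc = labels[ins[3]]      # jump back to the loop head
    return env


env = run(PROGRAM, {"i": 0, "v": 5, "a": [9, 1, 3, 7, 2]})
print(env["i"])   # 3, the first index after 0 whose element is not less than v
```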

Quadruples & Triples
 The description of three-address instructions
specifies the components of each type of instruction,
but it does not specify the representation of these
instructions in a data structure
 In a compiler, these instructions can be
implemented as objects or as records with fields for
the operator and the operands
 Three such representations are called “quadruples”,
“triples”, and “indirect triples”

Quadruples
 A quadruple or just “quad” has four fields, which we
call op, arg1, arg2, and result
 In x=y+z, ‘+’ is op, y and z are arg1 and arg2 whereas x is
result
 The following are some exceptions to this rule:
 Instructions with unary operators like x = minus y OR x = y
do not use arg2
 Operators like param use neither arg2 nor result
 Conditional and unconditional jumps put the target label in
result

Quadruples (Continue…)
 Example: Three-address code for the assignment
a = b*-c+b*-c is shown below
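
A sketch that generates quadruples for this assignment is given below. The nested-tuple expression encoding, the Quad record, and the function name gen_quads are assumptions made for illustration. Unary minus and the final copy leave arg2 empty, as described for quadruples above.

```python
import itertools
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")


def gen_quads(target, expr):
    """Emit quadruples for `target = expr`.

    expr is a name (str), ("minus", e) for unary minus, or (op, e1, e2).
    """
    quads = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):
            return node
        if node[0] == "minus":
            op, a, b = "minus", walk(node[1]), None    # unary: arg2 unused
        else:
            op, a, b = node[0], walk(node[1]), walk(node[2])
        temp = f"t{next(counter)}"
        quads.append(Quad(op, a, b, temp))
        return temp

    source = walk(expr)
    quads.append(Quad("=", source, None, target))      # copy: arg2 unused
    return quads


term = ("*", "b", ("minus", "c"))
quads = gen_quads("a", ("+", term, term))
for q in quads:
    print(q.op, q.arg1, q.arg2 or "", q.result)
```

Since the translation works on a tree and not a DAG, the common subexpression b*-c is computed twice, into t2 and t4.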

Triples
 A triple has only three fields which we call op, arg1,
and arg2
 In earlier example we have seen the result field is
used primarily for temporary names
 Using triples, we refer to the result of an operation x
op y by its position rather than an explicit temporary
name
 Consider the figure in next slide for details;

Triples (Continue…)
 Example: Three-address code using Triples
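
The same assignment, a = b*-c+b*-c, can be sketched with triples. The encoding is an assumption made for illustration: a string argument is a name, and an integer argument is the position of an earlier triple whose result is being used.

```python
def gen_triples(target, expr):
    """Emit triples (op, arg1, arg2) for `target = expr`.

    expr is a name (str), ("minus", e) for unary minus, or (op, e1, e2).
    """
    triples = []

    def walk(node):
        if isinstance(node, str):
            return node
        if node[0] == "minus":
            arg = walk(node[1])
            triples.append(("minus", arg, None))
        else:
            op, left, right = node
            a, b = walk(left), walk(right)
            triples.append((op, a, b))
        return len(triples) - 1          # position of the triple just emitted

    value = walk(expr)
    triples.append(("=", target, value))
    return triples


def show(arg):
    # Positions are printed in parentheses, names as-is, unused fields as blanks.
    if isinstance(arg, int):
        return f"({arg})"
    return "" if arg is None else arg


term = ("*", "b", ("minus", "c"))
triples = gen_triples("a", ("+", term, term))
for pos, (op, a1, a2) in enumerate(triples):
    print(f"({pos}) {op} {show(a1)} {show(a2)}")
```

No temporary names appear. The price is that moving an instruction changes its position, so every reference to it would have to be updated, which is the problem that indirect triples address.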

Static Single-Assignment Form
 The Static Single-Assignment Form (SSA) is an
intermediate representation that facilitates certain
code optimizations
 Two aspects distinguish SSA from three-address
code
 All assignments in SSA are to variables with distinct names
 SSA uses a notational convention called the Φ-function to combine
two definitions of the same variable
Original code:
if( flag ) x = -1; else x = 1;
y = x + a
SSA form:
if( flag ) x1 = -1; else x2 = 1;
x3 = Φ(x1,x2)
y = x3 + a
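
The first aspect, distinct names for every assignment, can be sketched for straight-line code as below. This is a simplified illustration under stated assumptions: the (target, op, arg1, arg2) instruction format and the function name to_ssa are hypothetical, and branches, which are where Φ-functions become necessary, are not handled.

```python
def to_ssa(instructions):
    """Rename a straight-line list of (target, op, arg1, arg2) instructions.

    Every assignment gets a fresh version of its target, and every use
    refers to the latest version. Branches and Φ-functions are not handled.
    """
    version = {}

    def use(name):
        # Names that were never assigned in this block are left unchanged.
        return f"{name}{version[name]}" if name in version else name

    out = []
    for target, op, a, b in instructions:
        a, b = use(a), use(b)            # rename uses before the new definition
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", op, a, b))
    return out


block = [
    ("x", "+", "a", "b"),   # x = a + b
    ("y", "-", "x", "c"),   # y = x - c
    ("x", "*", "y", "d"),   # x = y * d
]
for target, op, a, b in to_ssa(block):
    print(f"{target} = {a} {op} {b}")
# x1 = a + b
# y1 = x1 - c
# x2 = y1 * d
```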