Summary: Direct Code Generation 
1 Direct Code Generation 
Code generation involves the generation of the target representation (object code) from 
the annotated parse tree (or Abstract Syntactic Tree, AST) produced by syntactic and 
semantic analysis. 
The output of code generation is typically assembler code, although compilers can also be 
used to translate a high level language to another high level language (source to source 
compiler) or from a low level language to a high level one (decompiler). We will assume 
here that assembler is produced. 
Code generation can be direct or indirect: 
• Direct code generation: The object code is produced directly from the syntactic 
tree. 
• Indirect Code Generation: The code generator produces an intermediate 
representation, which is at a level of abstractness between the parse tree and the 
target form. The intermediate code is mapped into the target form by a separate 
process. This approach has the advantage that all processing up until the 
intermediate representation is machine independent, and only the final step is 
machine dependent. 
Code generation can also be distinguished by how it is integrated into the syntactic 
analysis: 
• A single pass compiler performs semantic actions as syntactic rules are applied, 
and these semantic actions generate the code (either assembler directly, or the 
intermediate code). 
• A multiple pass compiler separates syntactic analysis and code generation. The 
parse tree is produced in its entirety, and this is the input to the code generation 
phase. 
The rest of this summary assumes direct code generation, and a single pass compiler, 
generating code via semantic actions. 
2 NASM Assembler 
2.1 Instructions 
Each instruction of a NASM file has the following form: 
label: instruction operands ; comment 
Typical example: fadd st1 
All fields are optional, with some restrictions. White space can be used relatively freely: labels 
may have white space before them, and instructions may have no space before them. The 
colon after a label is optional, but use it for clarity.
2.2 Data Space 
The data space of a program can be divided between: 
• Addressable Memory of the program: which can be accessed by referring, e.g., to 
location 345 
• Registers: defined locations which can hold one datum each. 
The program itself occupies the addressable memory of the program. 
Some assembler ‘pseudo instructions’ do not end up as machine code instructions, but 
rather reserve space, e.g. 
buffer: resb 64 
…reserves 64 bytes of memory at this point of the program. The label ‘buffer’ can be used 
to access this memory. 
Registers: NASM predefines names for 16 general-purpose registers: eight 16-bit and eight 32-bit. 
Our examples will work with the following 32-bit registers: 
• EAX (accumulator), EBX, ECX, EDX, 
ESP (stack pointer), EBP, ESI, EDI 
2.3 Operands 
Operands of instructions can be constants, registers, or references to locations in memory 
of the program. 
• Constants: E.g., 
resw 64 (reserves 64 words of memory) 
mov eax,100 (store 100 in register eax) 
• Registers: 
mov eax,ebx (moves contents of register ebx into register eax) 
• Indirect addressing: placing a register in [ ] brackets indicates that the location to use 
is contained within the register 
mov [eax],ebx 
(moves contents of register ebx into memory location contained in 
register eax) 
• Expressions: any operand can be replaced by an expression, e.g., 
mov [eax+1],ebx 
(moves contents of register ebx into memory location denoted by adding 1 to the content 
of register eax) 
Labels as Operands: Labels can be used wherever a memory address is expected. When translated to 
machine code, the label will be replaced by the memory location of the instruction or data 
associated with the label. E.g., 
wordvar: resw 2 
mov [wordvar], eax 
(reserve 2 words of memory and then move the content of register eax into the first word 
of this space)
3 Direct Code Generation 
3.1 How code is generated 
There are several ways to generate code from the syntax tree. In this course, we will 
assume it is done via the semantic actions connected to the syntax rules. For instance, 
E :- E1 + E2 { GEN("add", E1.loc, E2.loc) } 
The semantic action calls the function GEN with the arguments provided. GEN generates 
assembler code for its arguments, which is saved to the object code file for this source 
code. 
GEN would be C code defined elsewhere and provided to the Bison/Yacc compiler. When 
this rule is applied, the current values of the attributes E1.loc and E2.loc would be 
substituted. E1.loc and E2.loc were calculated as E1 and E2 were parsed. The ‘loc’ 
attribute records which variable or temporary location holds the value of the CFG symbol. 
GEN is responsible for resolving how these memory locations are referred to in the 
assembler code. 
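To make the role of GEN concrete, here is a minimal sketch in C. The signature, the string operands and the in-memory buffer `out` are assumptions made for illustration only; the course's own GEN takes the `loc` attributes and writes to the object code file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer standing in for the object code file. */
static char out[1024];

/* Append one assembler line "op dst, src" to the output.
   A NULL src produces a one-operand instruction such as "fadd st1". */
void GEN(const char *op, const char *dst, const char *src) {
    size_t used = strlen(out);
    if (src != NULL)
        snprintf(out + used, sizeof out - used, "%s %s, %s\n", op, dst, src);
    else
        snprintf(out + used, sizeof out - used, "%s %s\n", op, dst);
}
```

A semantic action would then call, for instance, GEN("add", "eax", "[C]") at the moment the rule is reduced, so the lines appear in the output in reduction order.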
3.2 Dealing with Registers 
Operations can be performed more rapidly when the operands are in registers than when 
they are in addressable memory. Also, some operations require one operand to be a 
particular register (e.g., mul, div). For these reasons, to generate assembler instructions, 
we sometimes need to load variables into registers. For instance, to generate code for: 
A = B + C 
...we might produce the following code: 
mov EAX, [B] ; move the contents of variable B into register EAX 
add EAX, [C] ; add the contents of variable C to register EAX 
mov [A], EAX ; move the contents of register EAX to location A 
Note we need to generate three types of assembler instructions dealing with registers: 
• Instructions to move variables from memory location into a register 
• Instructions to perform operations on the registers 
• Instructions to store register values into memory locations. 
An important job of the code generator is to keep track of where variable values are at a 
given point of time. In the example above, if the prior line of source code had left the value 
of B in the EAX register, then it would not be necessary to generate a line of code to move 
B into EAX. 
For this reason, the compiler maintains a variable “AC”, which records which variable’s 
value is currently held in the EAX register. Before generating an instruction to place a 
variable’s value into the register, the compiler checks what is the current value held in the 
register, and only generates the line if needed. 
3.3 The CAC function 
The CAC function is C code provided by the user for use in a YACC/Bison compiler, 
allowing this function to be referenced in semantic actions. 
It is called to ensure particular values are placed in the EAX register. The EAX register is 
sometimes called the “accumulator”, and CAC thus stands for “Control of Accumulator”.
The CAC function is called with two variables as arguments, and generates the assembler 
code needed to ensure one of them is in the EAX register. It returns 0 if the first of these 
variables ends up in the register, and 1 if the second ends up in the register. 
The code for CAC is as follows: 
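int CAC (opd *x, opd *y) { 
    if (AC == y) return 1;                      /* y is already in EAX */ 
    if (AC != x) { 
        if (AC != NULL) GEN("MOV", AC, "EAX");  /* save EAX back to its variable */ 
        GEN("MOV", "EAX", x);                   /* load x into EAX */ 
        AC = x; 
    } 
    return 0;                                   /* x is now in EAX */ 
} 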
x and y represent the two variables which CAC needs to deal with. AC is a variable 
maintained by the compiler, keeping track of which variable is currently loaded into 
the EAX register (NULL if the current value is not a variable value). 
NOTE: AC and CAC belong to the compiler, they are not part of the assembler code. 
In the first line of the function, the program checks if y is already in EAX. If so, nothing 
needs to be done, so the program returns 1, indicating that the y value is in the register. 
In the next lines of code, the program makes sure that the current value of the register is x. If it 
is not the current value, code is generated to move the value currently in the register back 
to its place, and then code is generated to move x into the register. 
The line “AC=x” tells the compiler that at that point, the execution of generated code would 
leave the variable x in EAX. 
The CAC function can be used in other code as follows. Assume we wish to generate 
code to add two values. 
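if (CAC(x, y)) 
    GEN("add", "EAX", x);   /* y is in EAX, so add x to it */ 
else 
    GEN("add", "EAX", y);   /* x is in EAX, so add y to it */ 
AC = z;                     /* EAX now holds the result z */ 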
The user calls CAC, which returns 0 if x is in the EAX register, and 1 otherwise. CAC itself 
would issue 0, 1 or 2 lines of code: 
• If x or y were already in EAX, no code would be generated by CAC. 
• If AC was NULL, then CAC would generate 1 line to move x into EAX: 
mov eax, [x] 
• If AC was not NULL, then CAC would generate 1 line to move EAX back to its 
location, and another to move x into EAX: 
mov [%AC], eax 
mov eax, [x] 
(where %AC would be replaced at compile time with the contents of AC) 
CAC will return 0 or 1. If 1 is returned, the code above generates a line of code to add x to 
EAX; otherwise, it generates a line of code to add y to EAX.
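The behaviour described in this section can be tried out with a self-contained sketch. The `opd` record, the `out` buffer and the helpers `emit_rm`, `emit_store` and `gen_add` are hypothetical stand-ins for the compiler's real operand type and GEN function; only the logic of CAC follows the text.

```c
#include <stdio.h>
#include <string.h>

typedef struct { const char *name; } opd;  /* hypothetical operand record */

static opd *AC = NULL;   /* variable whose value EAX currently holds */
static char out[512];    /* stands in for the object code file */

/* Emit "op eax, [v]". */
static void emit_rm(const char *op, const opd *v) {
    size_t used = strlen(out);
    snprintf(out + used, sizeof out - used, "%s eax, [%s]\n", op, v->name);
}

/* Emit "mov [v], eax". */
static void emit_store(const opd *v) {
    size_t used = strlen(out);
    snprintf(out + used, sizeof out - used, "mov [%s], eax\n", v->name);
}

/* Returns 1 if y is in EAX, 0 if x is in EAX (loading x when needed). */
int CAC(opd *x, opd *y) {
    if (AC == y) return 1;
    if (AC != x) {
        if (AC != NULL) emit_store(AC);  /* move EAX back to its place */
        emit_rm("mov", x);               /* move x into EAX */
        AC = x;
    }
    return 0;
}

/* Generate code for z = x + y, leaving the result in EAX. */
void gen_add(opd *x, opd *y, opd *z) {
    if (CAC(x, y)) emit_rm("add", x);
    else           emit_rm("add", y);
    AC = z;
}
```

Calling gen_add for B + C with an empty accumulator emits two lines (a load and an add); a following addition whose second operand is the previous result emits only the add, because CAC finds that value already in EAX.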
4 Generating Mathematical Expressions 
This section deals with the generation of code for a mathematical expression, such as “A + 
B” or “A * B”, etc. This would correspond to a grammar rule such as “E :- E1 + E2”. Each of 
the Es on the right hand side can correspond to a simple constant (int or float), an identifier 
(a variable), another mathematical expression, or a function call. 
Let's assume a bottom-up parser with code generation performed at the same time as 
syntactic analysis. In this case, we generate code for “E :- E1 + E2” at the time of 
reduction of the rule. 
The recognition of E1 and E2 would also have generated some code, which would thus 
appear in the assembly program before the code for the current rule. This code would 
calculate the values of the right-hand-side Es. 
The code we generate for the rule “E :- E1 + E2” depends on where the values calculated 
for E1 and E2 are left. If E1 is a variable B and E2 is a constant, 120, we might simply 
generate lines of code such as: 
MOV EAX, [B] 
ADD EAX, 120 
If prior code left the value of B already in EAX, then we would not need to generate the 
first line. 
If E1 was itself a mathematical expression, we need to generate code keeping in mind 
where the previously generated code left the value of E1 (possibly in EAX itself). 
The other problem here is that in many languages, mathematical expressions can 
combine data of different types (e.g., int, long, float). Often, the number of bytes of the 
operands will determine the register which will be used to perform the operation. 
One solution is to use conditional code to generate different assembly code depending on 
where the values of the expression are currently stored. The problem splits into two parts: 
1. Getting the values into the correct locations (at least one in a register of the 
appropriate type for the operation, e.g., a float or int register). 
2. Generating the assembler code to perform the operation (the operator needs to be 
float or int). 
Below we give a possible implementation for the realisation of “E+E” (sum). The example 
assumes we are dealing with only three data types: 
• Unsigned chars: 1 byte 
• Int: 2 bytes 
• Double (float): 8 bytes 
These numbers can be from three sources: 
• In the variable space 
• A constant 
• Already in a register 
Where one of the numbers is an unsigned char, it is loaded into an int register, and this 
register is used instead of the original location. In the process described below, it is then 
treated as an int. 
We need three distinct assembler operations: 
• If both of the numbers are int, we use ADD x, y to add the numbers, leaving an int 
in the location of x. 
• If both of the numbers are double, we use the ‘FADD x’ operation. This operation 
assumes a stack used for storing results. The first number is assumed to be
at the top of the stack, and the operation adds the operand to this location, 
leaving the result in place of the original value (on top of the stack). 
• If one number is a double and the other an integer, the double is placed on the 
top of the stack, and then an ‘FIADD x’ operation is used, which adds its integer 
operand to the value on top of the stack, leaving the result in place of the original 
value. 
The following table could be used by the compiler as part of the generation of the 
operation E+E. It allows two numbers, of whatever type, and wherever located, to be 
added together. 
Type of x operand (rows) against type of y operand (columns): 

x \ y           | unsigned char    | int              | Register int     | Constant int     | double            | Register double 
----------------+------------------+------------------+------------------+------------------+-------------------+------------------ 
unsigned char   | Load x; Re-enter | Swap; Re-enter   | Swap; Re-enter   | Load x; Re-enter | Load x; Re-enter  | Load x; Re-enter 
int             | Load y; Re-enter | Load y; Re-enter | ADD y,x          | Load y; Re-enter | Load x; Re-enter  | FIADD x 
Register int    | Load y; Re-enter | ADD x,y          | ADD x,y          | ADD x,y          | MOV T,x; Re-enter | MOV T,x; Re-enter 
Constant int    | Swap; Re-enter   | Swap; Re-enter   | Swap; Re-enter   | -                | Load x; Re-enter  | FADD x 
double          | Load y; Re-enter | Load y; Re-enter | Swap; Re-enter   | Swap; Re-enter   | Load y; Re-enter  | FADD x 
Register double | Swap; Re-enter   | Swap; Re-enter   | Swap; Re-enter   | Swap; Re-enter   | Swap; Re-enter    | FADD y 

The table assumes there is a function “Load” within the compiler which places the named 
value into a register of the appropriate type. This function is driven from the following table. 
It assumes that the value to load is either unsigned char, int, int constant or double. The 
table generates distinct code depending on whether you want to load the value into an int 
or double register. 
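Register type to load into (rows) against type of operand to load (columns): 

Load into | unsigned char                        | int      | int constant   | double 
----------+--------------------------------------+----------+----------------+-------------------------- 
int       | XOR RH,RH; MOV RL,x                  | MOV RX,x | MOV RX,x       | FLD x; FISTP x; MOV RX,x 
double    | XOR RH,RH; MOV RL,x; MOV T,RX; FLD T | FILD x   | MOV T,x; FLD T | FLD x 
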
Integer operations load their values into a 2 byte register RX. Each byte of RX can be 
accessed individually: RH is the high byte, and RL is the low byte. The operation “XOR 
RH,RH” basically sets all bits of RH to 0 (since the ‘exclusive or’ of two identical numbers is 0). If 
the number to load is an unsigned char, the high byte is cleared, and the char is loaded in the low 
byte. If the number to load is an integer, it is loaded into both bytes directly.
Float operations make use of the stack (an area of memory assigned for such operations). The 
FLD operation loads the float operand onto the top of the stack. The FILD operation loads an 
integer operand onto the top of the stack with 8 bytes of space. 
Let's try an example. We start with code “S+7”, where S is a variable of type double, and 7 is 
an integer constant. On the entry to the function, we have “x” (=S) as a double and “y” (=7) 
as a constant int. 
The code for this cell is “Swap; Re-enter”. This means that we swap the values of x and y, 
and then restart the procedure. 
Now we have x (=7) and y (=S), which means we look at the cell for x = const int and y = 
double. The code for this cell is “Load x; Re-enter”. “Load x” is called with x as a const int, 
which we want to put into a double register. We thus issue the assembler code: 
MOV T, 7 
FLD T 
We then perform the “re-enter” command, and re-start the routine with x in a double 
register, and y still a double variable. We thus get the commands: “Swap; re-enter”. We 
thus re-enter with x as a double variable, and y as a double register. We thus issue the 
assembler: 
FADD S 
…and are finished, with three assembler instructions issued. 
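The “Swap / Load / Re-enter” procedure can be sketched as a small table-driven loop. The sketch below only records which cell actions are taken, not the assembler emitted by Load. Two details are assumptions that the text does not state explicitly: that Load puts a value into a double register whenever the other operand is a double (and into an int register otherwise), and that after “MOV T,x” the operand x is treated as an int variable T. All names (`Kind`, `Act`, `plan_add`) are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef enum { K_UCHAR, K_INT, K_REG_INT, K_CONST_INT, K_DOUBLE, K_REG_DOUBLE } Kind;
typedef enum { LOAD_X, LOAD_Y, SWAP, ADD_XY, ADD_YX, MOV_TX,
               FIADD_X, FADD_X, FADD_Y, NONE } Act;

static const char *ACT_NAME[] = { "Load x", "Load y", "Swap", "ADD x,y", "ADD y,x",
                                  "MOV T,x", "FIADD x", "FADD x", "FADD y", "-" };

/* TABLE[x][y], transcribed from the table for E+E. */
static const Act TABLE[6][6] = {
    /* unsigned char   */ { LOAD_X, SWAP,   SWAP,   LOAD_X, LOAD_X, LOAD_X  },
    /* int             */ { LOAD_Y, LOAD_Y, ADD_YX, LOAD_Y, LOAD_X, FIADD_X },
    /* Register int    */ { LOAD_Y, ADD_XY, ADD_XY, ADD_XY, MOV_TX, MOV_TX  },
    /* Constant int    */ { SWAP,   SWAP,   SWAP,   NONE,   LOAD_X, FADD_X  },
    /* double          */ { LOAD_Y, LOAD_Y, SWAP,   SWAP,   LOAD_Y, FADD_X  },
    /* Register double */ { SWAP,   SWAP,   SWAP,   SWAP,   SWAP,   FADD_Y  },
};

/* Assumed rule for the register type chosen by Load. */
static Kind load(Kind k, Kind other) {
    if (k == K_UCHAR) return K_REG_INT;  /* chars always go to an int register */
    return (other == K_DOUBLE || other == K_REG_DOUBLE) ? K_REG_DOUBLE : K_REG_INT;
}

/* Write the sequence of cell actions for x + y into trace, separated by "; ". */
void plan_add(Kind x, Kind y, char *trace, size_t size) {
    trace[0] = '\0';
    for (int step = 0; step < 16; step++) {   /* safety bound on re-entries */
        Act a = TABLE[x][y];
        size_t used = strlen(trace);
        snprintf(trace + used, size - used, "%s%s", used ? "; " : "", ACT_NAME[a]);
        switch (a) {
        case LOAD_X: x = load(x, y); break;                /* then re-enter */
        case LOAD_Y: y = load(y, x); break;                /* then re-enter */
        case SWAP:   { Kind t = x; x = y; y = t; } break;  /* then re-enter */
        case MOV_TX: x = K_INT; break;                     /* x is now the int variable T */
        default:     return;   /* an add instruction (or "-") ends the procedure */
        }
    }
}
```

For x = double and y = constant int, the trace is “Swap; Load x; Swap; FADD x”, the same path as the S+7 example.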
5 Generating Conditional Instructions 
5.1 The Status Flags and conditional jumping 
A special register exists called the “FLAGS” register. It consists of a sequence of bits, 
which are set (1) or unset (0). These flags are set as the result of arithmetic 
operations, e.g., ADD or SUB (on x86, MUL and DIV leave ZF and SF undefined, and the 
float operations record their status in the separate FPU status word). 
• ZF (Zero Flag): set if the operation results in a zero value, unset otherwise. 
• SF (Sign Flag): set if operation results in a negative value, unset otherwise. 
These flags can be referenced in conditional jump operations, e.g., 
jz L100 ; jump to L100 if last op resulted in zero 
5.2 Integer Comparison: CMP 
The NASM instruction CMP basically subtracts its second argument from the first. The 
result is not stored anywhere, but the ZF and SF flags are set as a result of the operation. 
The CMP instruction is thus usually followed by a conditional jump, e.g., 
MOV EAX, [A] ; x86 does not allow CMP with two memory operands 
CMP EAX, [B] 
JZ L1 ; jump if cmp result was zero
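As an illustration of what CMP records, the following sketch models the two flags in C for 32-bit operands. The `Flags` struct and the function `cmp_flags` are hypothetical names used only for this example.

```c
#include <stdint.h>

typedef struct { int zf; int sf; } Flags;

/* Model of "CMP a, b": compute a - b, discard the result, keep the flags. */
Flags cmp_flags(int32_t a, int32_t b) {
    uint32_t diff = (uint32_t)a - (uint32_t)b;  /* wraps around like the hardware */
    Flags f;
    f.zf = (diff == 0);          /* Zero Flag: the result was zero */
    f.sf = (int)(diff >> 31);    /* Sign Flag: top bit of the 32-bit result */
    return f;
}
```

With this model, comparing equal values sets ZF, so a following JZ would be taken; comparing 3 with 5 clears ZF and sets SF.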
5.3 Simple If statements 
If-then statements can be mapped into assembler as follows. Assume code like: 
if A == B then <stmt1> 
Firstly, we generate code for the comparison, e.g., 
MOV EAX, [A] 
CMP EAX, [B] 
Then we generate code to jump over the code for <stmt1> if the test fails: 
JNZ L1 ; jump if cmp result was non-zero (A != B) 
Then we put the code for <stmt1>, e.g. 
ADD X, Y 
On the line following this, we put the label from above: 
L1: … 
In summary: 
if A == B then <block>      MOV EAX, [A] 
                            CMP EAX, [B] 
                            JNZ L1 
                            .... CODE FOR <block> 
                            L1: 
We can use semantic actions to generate the assembler for the source structure. #A1 and 
#A2 correspond to lambda rules with associated semantic actions, used as a means to 
generate code in the correct location (e.g., in parsing “<stmt>:-if <exp> then #A1 <block> 
#A2”, we reduce elements in the following order: <exp>, #A1, <block>, #A2 and then 
<stmt>, and thus the semantic actions to produce code are performed in that order). 
Attribute Grammar: 
<stmt> :- if <exp> then #A1 <block> #A2 
#A1 :- λ { Generate code to jump if exp non-zero } 
#A2 :- λ { Generate line with label } 
<exp> :- ...
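The two semantic actions can be sketched in C as follows. The label counter, the explicit label stack (which keeps nested if statements apart) and the `out` buffer are assumptions for illustration; a Bison grammar could equally carry the label in an attribute of #A1. The jump used is JNZ, following the convention that the code for <exp> ends with a CMP whose result is non-zero when the test fails.

```c
#include <stdio.h>
#include <string.h>

static char out[1024];        /* stands in for the object code file */
static int next_label = 1;    /* source of unique label numbers */
static int label_stack[32];   /* labels of the if statements still open */
static int depth = 0;

/* Emit "<text>L<label><suffix>" on its own line. */
static void emit(const char *text, int label, const char *suffix) {
    size_t used = strlen(out);
    snprintf(out + used, sizeof out - used, "%sL%d%s\n", text, label, suffix);
}

/* #A1: runs after <exp>; jump past the block when the test fails. */
void action_A1(void) {
    int l = next_label++;
    label_stack[depth++] = l;
    emit("jnz ", l, "");
}

/* #A2: runs after <block>; place the label the jump refers to. */
void action_A2(void) {
    emit("", label_stack[--depth], ":");
}
```

Because the parser reduces #A1 before <block> and #A2 after it, the jump and its label end up on either side of the block's code, and an inner if statement pops its own label before the outer one.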
5.4 If-else statements 
If-else statements are a little more complex. A typical if-else statement might generate 
code like: 
MOV EAX, [A] 
CMP EAX, [B] 
JNZ L1 ; assuming the test is A == B: jump to the else part if cmp result was non-zero 
<block1 code> 
JMP L2 
L1: <block2 code> 
L2: … 
Attribute Grammar: 
<stmt> :- if <exp> then #A1 <block1> #A2 else <block2> #A3 
#A1 :- λ { Generate code to jump to start_else if exp non-zero } 
#A2 :- λ { Generate jmp to end_ifelse; 
Then generate label for start_else } 
#A3 :- λ { Generate line with label for end_ifelse } 
6 Generating Loops 
6.1 While Loops 
While loops map onto assembler in much the same way as if-statements. E.g., for 
while <exp> do <instructions> end 
<loop> :- while #A1 <exp> #A2 do <instructions> end #A3 
#A1 :- λ { Generate line with a unique label for loop start } 
#A2 :- λ { Generate line with jump to end if expr fails } 
#A3 :- λ { Generate jump back to start, and label for loop end }
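A possible shape for these three actions is sketched below. Each loop needs two labels, so the sketch keeps a stack of label pairs; the stack, the names `while_A1` to `while_A3` and the `out` buffer are hypothetical. As with the if statement, it assumes the code for <exp> leaves a non-zero comparison result when the test fails.

```c
#include <stdio.h>
#include <string.h>

static char out[1024];                         /* stands in for the object code file */
static int next_label = 1;
static struct { int start, end; } loops[32];   /* labels of the loops still open */
static int depth = 0;

/* Emit "<before><label><after>" on its own line. */
static void emit_line(const char *before, int label, const char *after) {
    size_t used = strlen(out);
    snprintf(out + used, sizeof out - used, "%s%d%s\n", before, label, after);
}

/* #A1: before <exp>; reserve both labels and mark the loop start. */
void while_A1(void) {
    loops[depth].start = next_label++;
    loops[depth].end   = next_label++;
    emit_line("L", loops[depth].start, ":");
    depth++;
}

/* #A2: after <exp>; leave the loop if the expression fails. */
void while_A2(void) {
    emit_line("jnz L", loops[depth - 1].end, "");
}

/* #A3: after the body; jump back to the test and mark the loop end. */
void while_A3(void) {
    depth--;
    emit_line("jmp L", loops[depth].start, "");
    emit_line("L", loops[depth].end, ":");
}
```

The code for <exp> lands between the start label and the conditional jump, and the code for <instructions> between the conditional jump and the jump back.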
Below is code from a real while loop: 
topwhile:                           ; a label to mark the top of this WHILE loop 
    mov eax, 3                      ; planning to invoke function 3: read from a file 
    mov ebx, [infileID]             ; the file ID must be placed into register ebx 
    mov ecx, mybyte                 ; the address of memory to receive file content 
                                    ; must be placed into register ecx 
    mov edx, 1                      ; the number of bytes to read is placed in edx 
    int 80h                         ; invokes a kernel function according to 
                                    ; the number in register eax 
    cmp eax, 0                      ; check whether a byte was read 
    je dunwhile                     ; skip the body if no bytes were read 
    xor byte [mybyte], 00001111b    ; [] dereferences, and thereby refers to the 
                                    ; contents at mybyte 
    mov eax, 4                      ; planning to invoke function 4: write to a file 
    mov ebx, [outfileID]            ; the file ID must be placed into register ebx 
    mov ecx, mybyte                 ; the address of memory to write from must be 
                                    ; placed into register ecx 
    mov edx, 1                      ; the no. of bytes to write must be placed in edx 
    int 80h                         ; invokes a kernel function according to no. in eax 
    jmp topwhile                    ; go back to the top of the loop 
dunwhile:                           ; jump to here if no byte is read
6.2 Repeat Loops 
<loop> :- repeat #A1 <instructions> until <exp> #A2 
#A1: Generate unique label for loop start 
#A2: Generate jump to end if last result zero 
Generate unconditional jump back to beginning 
Generate label for loop end 
<instructions>: Code for <instructions> generated by other productions 
<exp>: Code for <exp> generated by other productions 
7 Generating Code from Functions 
This section covers the generation of code for functions. This includes the generation of 
function calls and the generation of the code of the function body itself. Three important 
issues here are: 
1. How are the parameters passed to the function? 
2. How are local variables represented within the function? 
3. How are values returned from the function? 
There are many possible ways to implement functions. Basically, it is up to the person 
writing the code generator to decide how to do it. We describe here one of the more 
standard ways of generating functions and function calls. 
7.1 The Stack Space 
Our implementation of functions depends heavily on the use of a stack in the program 
memory. Many assemblers assign part of the addressable memory of the program to a 
stack to hold information about the current variable context. Basically, when we enter a 
function, space is allocated on top of the stack for the local variables, and when we exit 
from the function, this allocated space is popped off the stack. The stack thus represents 
the embedded block structure we discussed under symbol tables. 
The stack typically starts at the top of addressable memory, and expands downwards. So, 
assuming we have 1000 bytes of addressable memory, the “bottom of the stack” will start at 
address 1000. If we push a 2-byte integer onto the stack, it will occupy memory range 999-1000. 
If we then push an 8-byte float value onto the stack, it will occupy bytes 991-998. 
A register called SP (for Stack pointer) indicates the top of the stack. In some systems, SP 
will point at the next free location in the stack. In others, it points to the lowest byte of the 
top element of the stack. We will assume this last approach, so in the above case, after 
pushing on the two numbers, SP would contain 991.
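The arithmetic of this example can be checked with a tiny model. It assumes, as the example does, addresses numbered 1 to 1000 and an SP that holds the lowest byte of the top element, so an empty stack is modelled with SP = 1001; the names `sp`, `push_bytes` and `pop_bytes` are hypothetical.

```c
/* Model of a downward-growing stack in a memory with addresses 1..1000. */
static int sp = 1001;   /* empty stack: one past the highest address */

/* Push an element of the given size; returns the new value of SP. */
int push_bytes(int size) {
    sp -= size;
    return sp;
}

/* Pop an element of the given size; returns the new value of SP. */
int pop_bytes(int size) {
    sp += size;
    return sp;
}
```

Pushing 2 bytes and then 8 bytes moves SP to 999 and then to 991, matching the ranges 999-1000 and 991-998 above.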
7.2 The Function Call 
Before calling the function, parameters are pushed onto the stack. These can then be 
accessed by the called routine, from the top of the stack. So that the parameters are 
available in the required order, they are pushed onto the stack in reverse order. 
The Call in Source Code: 
rutina(a, b) 
The Call In Assembler 
... 
PUSH b 
PUSH a 
CALL rutina 
... 
The NASM instruction “CALL” firstly pushes the address of the following instruction onto 
the stack, and then jumps to the routine. The pushed address will be used as the return 
address when the function call returns. 
7.3 Entering the Routine 
On entering the routine, the routine firstly establishes the boundaries of the local space of 
the stack. A register BP (Base Pointer) is used to indicate the lowest point of the stack 
which is part of the current context. Consequently, the first thing a routine does on entry is 
to store the old value of BP onto the stack (for later recovery and restoration), and then 
reset the BP to point at the current top of the stack (which is the point from which the local 
context will grow). 
The first lines of any routine will thus be something like the following: 
rutina: 
PUSH BP 
MOV BP,SP 
‘rutina’ is the name of the function, represented as a label in assembler. The old value of 
BP is pushed onto the stack, and then BP is reset to the value of SP (top of the stack).
7.4 Allocating Space for Local Variables 
The next step is to allocate stack space for the local variables. The compiler works out 
how many bytes of memory are required for the local variables, and decrements the stack 
pointer (the stack grows down, remember) by this amount. In the following example, each 
int takes 2 bytes and the double takes 8 bytes, a total of 14 bytes. 
Source code: 
int rutina (int a, char *b) 
{ 
int i, j, k; 
double r; 
. . . 
} 
Assembler Code: 
rutina: 
PUSH BP 
MOV BP,SP 
SUB SP,14 
... 
7.5 Referring to parameters and local variables 
In the body of the function, rather than referring to variables by name, one references 
them in terms of offsets from the base pointer. 
Parameters: parameters were pushed onto the stack BEFORE the function was called, 
and are thus part of the previous context, so they lie above BP. In the above example, 
parameters a and b can be accessed using [BP+6] and [BP+8] (note the 6 bytes used to 
store the old BP and the return address). 
Local Variables: The local variables are available under BP in memory. i, j and k are thus 
available as, respectively: [BP-2], [BP-4], [BP-6]. r starts at [BP-14]. 
Stack layout inside rutina (higher addresses at the top): 

  [BP+8]    b 
  [BP+6]    a 
  [BP+2]    return address 
  [BP]      old_bp              <- BP 
  [BP-2]    i 
  [BP-4]    j 
  [BP-6]    k 
  [BP-14]   r                   <- SP 
            Free Memory 

The parameters are pushed first; then, the program address to return to is pushed on the stack. 
On entering the routine, space is allocated for local variables of that routine. 
On leaving the routine, the part of the stack used by the routine can be ‘popped’. 
Recursive routines thus have separate memory space. 
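The offsets used in this section follow mechanically from the sizes of the variables, which is how a compiler can fill in its symbol table. The sketch below assumes the layout of the example (old BP at [BP], return address at [BP+2], first parameter at [BP+6]); the function names and the constant `FIRST_PARAM_OFFSET` are hypothetical.

```c
#include <stddef.h>

/* Assumed from the example: 2 bytes of old BP plus 4 bytes of return address. */
enum { FIRST_PARAM_OFFSET = 6 };

/* Fill offsets[i] with the BP-relative offset of local variable i.
   Returns the total number of bytes to subtract from SP. */
int local_offsets(const int *sizes, size_t n, int *offsets) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += sizes[i];
        offsets[i] = -total;   /* locals lie below BP */
    }
    return total;
}

/* Fill offsets[i] with the BP-relative offset of parameter i. */
void param_offsets(const int *sizes, size_t n, int *offsets) {
    int off = FIRST_PARAM_OFFSET;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;      /* parameters lie above BP */
        off += sizes[i];
    }
}
```

For locals of sizes 2, 2, 2 and 8 this yields the offsets -2, -4, -6 and -14 and a total of 14 bytes; for two 2-byte parameters it yields 6 and 8.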
7.6 Placing The function’s code 
After we generate the line to allocate space for the local variables, we then generate the 
code for the body of the function. After the body, we generate the code to leave the routine. 
Firstly, the line MOV SP,BP resets the stack pointer to the value it had just after the old BP 
was pushed (we thus pop all the local stack space off the stack). At 
this point, the top element on the stack is the old BP. We can thus issue a command POP 
BP which pops this element off the stack into BP, thus resetting the BP to its prior value 
(SP is also moved up two bytes). 
At this point, the element on top of the stack is the address where execution should 
resume in the calling context. The RET operator pops this element off the stack, and 
resumes processing from that point. 
Back in the calling function, after the function call, we then need to wipe the function 
parameters off the stack. We do this simply by ADD SP,4 (two parameters of 2 bytes each). 
The calling code: 
PUSH b 
PUSH a 
CALL rutina 
ADD SP,4 
… 
The routine code: 
rutina: 
PUSH BP 
MOV BP,SP 
SUB SP,14 
... 
MOV SP,BP 
POP BP 
RET 
... 
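Since the calling sequence and the routine's entry and exit code always have the same shape, a code generator can produce them from a few numbers. The following sketch writes both into a caller-supplied buffer; the function names and parameters are hypothetical, and the body of the routine is left as an elided "..." line.

```c
#include <stdio.h>
#include <string.h>

/* Emit the call: push the arguments in reverse order, call, clean up the stack. */
void gen_call(char *buf, size_t size, const char *name,
              const char **args, int nargs, int arg_bytes) {
    size_t used = 0;
    buf[0] = '\0';
    for (int i = nargs - 1; i >= 0; i--) {
        snprintf(buf + used, size - used, "PUSH %s\n", args[i]);
        used = strlen(buf);
    }
    snprintf(buf + used, size - used, "CALL %s\nADD SP,%d\n",
             name, nargs * arg_bytes);
}

/* Emit the routine: entry code, elided body, exit code. */
void gen_routine(char *buf, size_t size, const char *name, int local_bytes) {
    snprintf(buf, size,
             "%s:\n"
             "PUSH BP\n"
             "MOV BP,SP\n"
             "SUB SP,%d\n"
             "...\n"
             "MOV SP,BP\n"
             "POP BP\n"
             "RET\n",
             name, local_bytes);
}
```

With the arguments a and b of 2 bytes each and 14 bytes of locals, the two functions reproduce the calling code and the routine code listed above.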
7.7 Returning Values 
A function may or may not return a value. There are various ways to return a value, and it 
is up to the compiler writer to decide how it is done. One way is to leave the returned value 
in the EAX register (if it fits), or in a float register for larger numbers. Alternatively, the 
number could be placed on the stack, to be popped later by the calling routine. 
Source Program: 
. . . 
rutina(a, b) 
. . . 
int rutina (int a, char *b) 
{ 
int i, j, k; 
double r; 
. . . 
return k; 
} 
Object Program: 
PUSH b 
PUSH a 
CALL rutina 
ADD SP,4 
rutina: 
PUSH BP 
MOV BP,SP 
SUB SP,14 
... 
MOV EAX, [BP-6] ; move k into EAX 
MOV SP,BP 
POP BP 
RET 

More Related Content

PPT
Advanced c programming in Linux
PPT
Assembler (2)
PPT
Assembler design options
PDF
system software 16 marks
PPT
Assembler
PPTX
Two pass Assembler
PPTX
First pass of assembler
PPT
Assembler
Advanced c programming in Linux
Assembler (2)
Assembler design options
system software 16 marks
Assembler
Two pass Assembler
First pass of assembler
Assembler

What's hot (20)

KEY
Unit 1 cd
PDF
Examinable Question and answer system programming
DOC
Lex tool manual
PPTX
Lex & yacc
PPT
Assemblers: Ch03
PPTX
PPT
Loader
PPT
Compiler Design Tutorial
PPTX
Intermediate code- generation
PPT
Lex and Yacc ppt
PPTX
Workshop Assembler
PPT
The smartpath information systems c plus plus
PPTX
Overview of c++ language
PPT
01 c++ Intro.ppt
PPTX
Lecture 15 run timeenvironment_2
PPTX
Yacc (yet another compiler compiler)
PDF
VHDL- data types
PPTX
Assemblers
PDF
DOCX
Yacc topic beyond syllabus
Unit 1 cd
Examinable Question and answer system programming
Lex tool manual
Lex & yacc
Assemblers: Ch03
Loader
Compiler Design Tutorial
Intermediate code- generation
Lex and Yacc ppt
Workshop Assembler
The smartpath information systems c plus plus
Overview of c++ language
01 c++ Intro.ppt
Lecture 15 run timeenvironment_2
Yacc (yet another compiler compiler)
VHDL- data types
Assemblers
Yacc topic beyond syllabus
Ad

Viewers also liked (15)

DOC
Investigacion de Aplicacion Informatica I
PPTX
Materi tik bab 4
PDF
Connected assessment (digitarp)
PDF
digitarp - Soluzioni digitali a supporto della relazione intermediario cliente
PPTX
Tik bab 2 kelas 9
PDF
Corsa contro il tempo per diventare virtuali
DOC
Investigacion de aplicacion informatica
PPT
Teirema e pitagorëskl7
PPTX
Materi tik bab 3 kelas 9
DOC
thực đơn nhà hàng cánh buồm
PDF
Traduccion a ensamblador
PDF
Thucdonalacarte
PPTX
Early American Mythology
PPTX
Summer Internship Report of Acc cement ppt
PDF
JRaese_CV
Investigacion de Aplicacion Informatica I
Materi tik bab 4
Connected assessment (digitarp)
digitarp - Soluzioni digitali a supporto della relazione intermediario cliente
Tik bab 2 kelas 9
Corsa contro il tempo per diventare virtuali
Investigacion de aplicacion informatica
Teirema e pitagorëskl7
Materi tik bab 3 kelas 9
thực đơn nhà hàng cánh buồm
Traduccion a ensamblador
Thucdonalacarte
Early American Mythology
Summer Internship Report of Acc cement ppt
JRaese_CV
Ad

Similar to Generacion de codigo ensamblado (20)

PPTX
Assembly Language for as level computer science
DOCX
1 Describe different types of Assemblers.Assembly language.docx
PPTX
Unit iv(simple code generator)
PPTX
Programming the basic computer
PPT
Assembler
PPT
Assembler
PPT
Assembler design option
PPTX
Compiler Design_Code generation techniques.pptx
PDF
Alp 05
PDF
PPTX
6 assembly language computer organization
PDF
Mips Assembly
PDF
Module-4 Program Design and Anyalysis.pdf
DOC
Compiler notes--unit-iii
PPT
Lec 04 intro assembly
PPT
8051h.ppt microcontroller Assembly Language Programming
PPTX
Assembly 8086
PPTX
It322 intro 3
Assembly Language for as level computer science
1 Describe different types of Assemblers.Assembly language.docx
Unit iv(simple code generator)
Programming the basic computer
Assembler
Assembler
Assembler design option
Compiler Design_Code generation techniques.pptx
Alp 05
6 assembly language computer organization
Mips Assembly
Module-4 Program Design and Anyalysis.pdf
Compiler notes--unit-iii
Lec 04 intro assembly
8051h.ppt microcontroller Assembly Language Programming
Assembly 8086
It322 intro 3

Recently uploaded (20)

PDF
ChapteR012372321DFGDSFGDFGDFSGDFGDFGDFGSDFGDFGFD
PDF
BIO-INSPIRED ARCHITECTURE FOR PARSIMONIOUS CONVERSATIONAL INTELLIGENCE : THE ...
PDF
Soil Improvement Techniques Note - Rabbi
PDF
Artificial Superintelligence (ASI) Alliance Vision Paper.pdf
PPTX

Assembly Code Generation

2.2 Data Space
The data space of a program can be divided between:
• Addressable memory of the program, which can be accessed by referring, e.g., to location 345.
• Registers: defined locations which can hold one datum each.
The program itself also occupies part of the addressable memory. Some assembler 'pseudo-instructions' do not end up as machine code instructions, but rather reserve space, e.g.
    buffer: resb 64
...reserves 64 bytes of memory at this point of the program. The label 'buffer' can be used to access this memory.
Registers: among the register names NASM predefines are eight 16-bit and eight 32-bit general-purpose registers. Our examples will work with the following 32-bit registers:
• EAX (accumulator), EBX, ECX, EDX, ESP (stack pointer), EBP, ESI, EDI
2.3 Operands
Operands of instructions can be constants, registers, or references to locations in the memory of the program.
• Constants: e.g.,
    resw 64 (reserves 64 words of memory)
    mov eax,100 (stores 100 in register eax)
• Registers:
    mov eax,ebx (moves the contents of register ebx into register eax)
• Indirect addressing: placing a register in [ ] brackets indicates that the location to use is contained within the register.
    mov [eax],ebx (moves the contents of register ebx into the memory location contained in register eax)
• Expressions: any operand can be replaced by an expression, e.g.,
    mov [eax+1],ebx (moves the contents of register ebx into the memory location obtained by adding 1 to the content of register eax)
Labels as Operands: Labels can be used in place of registers. When translated to machine code, the label is replaced by the memory location of the instruction associated with the label. E.g.,
    wordvar: resw 2
    mov [wordvar], eax
(reserves 2 words of memory and then moves the content of register eax into the first word of this space)
3 Direct Code Generation
3.1 How code is generated
There are several ways to generate code from the syntax tree. In this course, we will assume it is done via the semantic actions connected to the syntax rules. For instance,
    E :- E1 + E2 { GEN("add", E1.loc, E2.loc) }
The semantic action calls the function GEN with the arguments provided. GEN generates assembler code for its arguments, which is saved to the object code file for this source code. GEN would be C code defined elsewhere and provided to the Bison/Yacc compiler.
When this rule is applied, the current values of the attributes E1.loc and E2.loc are substituted. E1.loc and E2.loc were calculated as E1 and E2 were parsed. The 'loc' attribute records which variable or temporary location holds the value of the grammar symbol. GEN is responsible for resolving how these memory locations are referred to in the assembler code.
3.2 Dealing with Registers
Operations can be performed more rapidly when the operands are in registers than when they are in addressable memory. Also, some operations require one operand to be in a particular register (e.g., mul, div). For these reasons, to generate assembler instructions, we sometimes need to load variables into registers. For instance, to generate code for:
    A = B + C
...we might produce the following code:
    mov EAX, [B] ; move the contents of variable B into register EAX
    add EAX, [C] ; add the contents of variable C to register EAX
    mov [A], EAX ; move the contents of register EAX to location A
Note that we need to generate three types of assembler instructions dealing with registers:
• Instructions to move variables from a memory location into a register
• Instructions to perform operations on the registers
• Instructions to store register values into a memory location.
An important job of the code generator is to keep track of where variable values are at a given point in time. In the example above, if the prior line of source code had left the value of B in the EAX register, then it would not be necessary to generate a line of code to move B into EAX. For this reason, the compiler maintains a variable "AC", which records which variable's value is currently held in the EAX register. Before generating an instruction to place a variable's value into the register, the compiler checks which value is currently held in the register, and only generates the line if needed.
3.3 The CAC function
The CAC function is C code provided by the user for use in a YACC/Bison compiler, allowing this function to be referenced in semantic actions. It is called to ensure particular values are placed in the EAX register. The EAX register is sometimes called the "accumulator", and CAC thus stands for "Control of ACcumulator".
The CAC function is called with two variables as arguments, and generates the assembler code needed to ensure one of them is in the EAX register. It returns 0 if the first of these variables ends up in the register, and 1 if the second ends up in the register. The code for CAC is as follows:
    int CAC (opd *x, opd *y)
    {
        if (AC == y) return 1;
        if (AC != x) {
            if (AC != NULL) GEN ("MOV", AC, "EAX");
            GEN ("MOV", "EAX", x);
            AC = x;
        }
        return 0;
    }
x and y represent the two variables which CAC needs to deal with. AC is a variable maintained by the compiler, keeping track of which variable's value is currently held in the EAX register (NULL if the current value is not a variable value). NOTE: AC and CAC belong to the compiler; they are not part of the assembler code.
In the first line of the function, the program checks if y is already in EAX. If so, nothing needs to be done, so the program returns 1, indicating that the y value is in the register.
In the next lines of code, the program makes sure that the current value of the register is x. If it is not, code is generated to move the value currently in the register back to its place, and then code is generated to move x into the register. The line "AC=x" tells the compiler that at that point, the execution of the generated code would leave the variable x in EAX.
The CAC function can be used in other code as follows. Assume we wish to generate code to add two values, x and y, leaving the result z in EAX.
    if (CAC(x, y))
        GEN("add", "EAX", x);
    else
        GEN("add", "EAX", y);
    AC = z;
The user calls CAC, which returns 0 if x is in the EAX register, and 1 if y is. CAC itself would issue 0, 1 or 2 lines of code:
• If x or y were already in EAX, no code would be generated by CAC.
• If AC was NULL, then CAC would generate 1 line to move x into EAX:
    mov eax, [x]
• If AC was not NULL (and was neither x nor y), then CAC would generate 1 line to move EAX back to its location, and another to move x into EAX:
    mov [%AC], eax
    mov eax, [x]
(where %AC would be replaced at compile time with the contents of AC)
CAC will return 0 or 1. If 1 is returned (y is in EAX), the code above generates a line of code to add x to EAX; otherwise, it generates a line of code to add y to EAX.
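The behaviour of CAC described above can be tried out with a small self-contained sketch. This is only an illustration under stated assumptions, not the course's actual implementation: operands are represented as plain variable names (strings) rather than opd structures, the generated assembler is appended to an in-memory buffer by a hypothetical helper emit, and gen_add is a hypothetical wrapper for the "add two values" usage.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumptions for this sketch: operands are variable names, and the
   generated assembler goes to a buffer instead of the object file. */
static char out[512];          /* generated assembler text              */
static const char *AC = NULL;  /* variable whose value is held in EAX   */

/* Append one line of assembler to the output buffer. */
static void emit(const char *line) {
    assert(strlen(out) + strlen(line) + 2 <= sizeof out);
    strcat(out, line);
    strcat(out, "\n");
}

/* Ensure x or y is in EAX. Returns 1 if y is there, 0 if x is there. */
static int cac(const char *x, const char *y) {
    char line[64];
    if (AC != NULL && strcmp(AC, y) == 0) return 1;   /* y already loaded */
    if (AC == NULL || strcmp(AC, x) != 0) {           /* x not loaded yet */
        if (AC != NULL) {                             /* save old value   */
            snprintf(line, sizeof line, "mov [%s], eax", AC);
            emit(line);
        }
        snprintf(line, sizeof line, "mov eax, [%s]", x);
        emit(line);
        AC = x;
    }
    return 0;
}

/* Generate code for z = x + y, leaving the value of z in EAX. */
static void gen_add(const char *z, const char *x, const char *y) {
    char line[64];
    const char *other = cac(x, y) ? x : y;  /* the operand NOT in EAX */
    snprintf(line, sizeof line, "add eax, [%s]", other);
    emit(line);
    AC = z;                                 /* EAX now holds z */
}
```

Calling gen_add("T1", "B", "C") with an empty accumulator emits a load of B followed by an add of C. A following gen_add("T2", "A", "T1") emits only "add eax, [A]", because T1 is already in EAX and CAC returns 1.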
4 Generating Mathematical Expressions
This section deals with the generation of code for a mathematical expression, such as "A + B" or "A * B". This would correspond to a grammar rule such as "E :- E1 + E2". Each of the Es on the right-hand side can correspond to a simple constant (int or float), an identifier (a variable), another mathematical expression, or a function call.
Let's assume a bottom-up parser with code generation performed at the same time as syntactic analysis. In this case, we generate code for "E :- E1 + E2" at the time of reduction of the rule. The recognition of E1 and E2 would also have generated some code, which would thus appear in the assembly program before the code for the current rule. This code would calculate the values of the right-hand-side Es.
The code we generate for the rule "E :- E1 + E2" depends on where the values calculated for E1 and E2 are left. If E1 is a variable B and E2 is a constant, 120, we might simply generate lines of code such as:
    MOV EAX, [B]
    ADD EAX, 120
If prior code left the value of B already in EAX, then we would not need to generate the first line. If E1 was itself a mathematical expression, we need to generate code keeping in mind where the previously generated code left the value of E1 (possibly in EAX itself).
The other problem here is that in many languages, mathematical expressions can combine data of different types (e.g., int, long, float). Often, the number of bytes of the operands will determine the register which will be used to perform the operation.
One solution is to use conditional code to generate different assembly code depending on where the values of the expression are currently stored. The problem splits into two parts:
1. Getting the values into the correct locations (at least one in a register of the appropriate type for the operation, e.g., a float or int register).
2. Generating the assembler code to perform the operation (the operator needs to be float or int).
Below we give a possible implementation of "E+E" (sum). The example assumes we are dealing with only three data types:
• Unsigned char: 1 byte
• Int: 2 bytes
• Double (float): 8 bytes
These numbers can come from three sources:
• In the variable space
• A constant
• Already in a register
Where one of the numbers is an unsigned char, it is loaded into an int register, and this register is used instead of the original location. In the process described below, it is then treated as an int.
We need three distinct assembler instructions:
• If both of the numbers are int, we use ADD x, y to add the numbers, leaving an int in the location of x.
• If both of the numbers are double, we use the 'FADD x' operation. This operation assumes a stack used for storing results. The first number is assumed to be at the top of the stack, and the operation adds the operand to this location, leaving the result in place of the original value (on top of the stack).
• If one number is a double, and the other an integer, the double is placed on the top of the stack, and then an 'FIADD x' operation is used, which adds its integer operand to the value on top of the stack, leaving the result in place of the original value.
The following table could be used by the compiler as part of the generation of the operation E+E. It allows two numbers, of whatever type, and wherever located, to be added together. Rows give the type of the x operand, columns the type of the y operand.
    x \ y           | unsigned char    | int              | Register int     | Constant int     | double             | Register double
    unsigned char   | Load x; Re-enter | Swap; Re-enter   | Swap; Re-enter   | Load x; Re-enter | Load x; Re-enter   | Load x; Re-enter
    int             | Load y; Re-enter | Load y; Re-enter | ADD y,x          | Load y; Re-enter | Load x; Re-enter   | FIADD x
    Register int    | Load y; Re-enter | ADD x,y          | ADD x,y          | ADD x,y          | MOV T,x; Re-enter  | MOV T,x; Re-enter
    Constant int    | Swap; Re-enter   | Swap; Re-enter   | Swap; Re-enter   | -                | Load x; Re-enter   | FADD x
    double          | Load y; Re-enter | Load y; Re-enter | Swap; Re-enter   | Swap; Re-enter   | Load y; Re-enter   | FADD x
    Register double | Swap; Re-enter   | Swap; Re-enter   | Swap; Re-enter   | Swap; Re-enter   | Swap; Re-enter     | FADD y
The table assumes there is a function "Load" within the compiler which places the named value into a register of the appropriate type. This function is driven from the following table. It assumes that the value to load is either an unsigned char, an int, an int constant or a double. The table generates distinct code depending on whether you want to load the value into an int or a double register. Rows give the type of register to load into, columns the type of the operand to load.
    Load into \ operand | unsigned char                          | int      | int constant    | double
    int                 | XOR RH,RH; MOV RL,x                    | MOV RX,x | MOV RX,x        | FLD x; FISTP x; MOV RX,x
    double              | XOR RH,RH; MOV RL,x; MOV T,RX; FLD T   | FILD x   | MOV T,x; FLD T  | FLD x
Integer operations load their values into a 2-byte register RX. Each byte of RX can be accessed individually: RH is the high byte, and RL is the low byte. The operation "XOR RH,RH" sets all bits of RH to 0 (since the 'exclusive or' of two identical numbers is 0). If the number to load is an unsigned char, the high byte is cleared, and the char is loaded into the low byte. If the number to load is an integer, it is loaded into both bytes directly.
Float operations make use of the stack (an area of memory assigned for such operations). The FLD operation loads the float operand onto the top of the stack. The FILD operation loads an integer operand onto the top of the stack, where it occupies 8 bytes.
Let's try an example. We start with the code "S+7", where S is a floating-point (double) variable, and 7 is an integer constant.
• On entry to the function, we have "x" (=S) as a double and "y" (=7) as a constant int. The code for this cell is "Swap; Re-enter". This means that we swap the values of x and y, and then restart the procedure.
• Now we have x (=7) and y (=S), which means we look at the cell for x = constant int and y = double. The code for this cell is "Load x; Re-enter". "Load x" is called with x as a constant int, which we want to put into a double register. We thus issue the assembler code:
    MOV T, 7
    FLD T
• We then perform the "Re-enter" command, and restart the routine with x in a double register, and y still a double variable. The cell for this case is "Swap; Re-enter".
• We thus re-enter with x as a double variable, and y as a double register, and issue the assembler:
    FADD S
...and are finished, with 3 assembler commands issued.
5 Generating Conditional Instructions
5.1 The Status Flags and conditional jumping
A special register exists called the "FLAGS" register. It consists of a sequence of bits, which are set (1) or unset (0). These flags are set as the result of integer arithmetic operations, e.g., ADD or SUB. (MUL and DIV leave ZF and SF undefined, and floating-point instructions report their status in the separate FPU status word.)
• ZF (Zero Flag): set if the operation results in a zero value, unset otherwise.
• SF (Sign Flag): set if the operation results in a negative value, unset otherwise.
These flags can be referenced in conditional jump operations, e.g.,
    jz L100 ; jump to L100 if last op resulted in zero
5.2 Integer Comparison: CMP
The NASM instruction CMP subtracts its second argument from the first. The result is not stored anywhere, but the ZF and SF flags are set as a result of the operation. The CMP instruction is thus usually followed by a conditional jump. Since x86 does not allow both operands of CMP to be in memory, one of the variables is first loaded into a register, e.g.,
    MOV EAX, [A]
    CMP EAX, [B]
    JZ L1 ; jump if cmp result was zero
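The effect of CMP on the two flags can be modelled in a few lines of C. This is a minimal sketch, assuming 2-byte integer operands as in section 4; the type Flags and the function cmp16 are names invented for the illustration, and the other flags (carry, overflow) are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* The two status flags discussed in the text. */
typedef struct {
    int zf;  /* Zero Flag: 1 if the result was zero     */
    int sf;  /* Sign Flag: 1 if the result was negative */
} Flags;

/* Model of "CMP a, b" on 16-bit operands: compute a - b, throw the
   difference away, and keep only ZF and SF. */
static Flags cmp16(int16_t a, int16_t b) {
    /* Unsigned arithmetic wraps around exactly as the hardware does. */
    uint16_t diff = (uint16_t)((uint16_t)a - (uint16_t)b);
    Flags f;
    f.zf = (diff == 0);
    f.sf = (diff & 0x8000u) != 0;   /* top bit is the sign bit */
    return f;
}
```

With this model, a jump such as JZ is taken exactly when cmp16(a, b).zf is 1, that is, when the two operands are equal.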
5.3 Simple If statements
If-then statements can be mapped into assembler as follows. Assume code like:
    if A == B then <stmt1>
Firstly, we generate code for the comparison (loading A into a register, since CMP cannot take two memory operands), e.g.,
    MOV EAX, [A]
    CMP EAX, [B]
Then we generate code to jump over the code for <stmt1> if the test fails:
    JNZ L1 ; jump if cmp result was non-zero (A and B differ)
Then we put the code for <stmt1>, e.g.
    ADD X, Y
On the line following this, we put the label from above:
    L1: ...
In summary, for the source
    if A == B then <block>
we generate
    MOV EAX, [A]
    CMP EAX, [B]
    JNZ L1
    .... CODE FOR <block>
    L1:
We can use semantic actions to generate the assembler for the source structure. #A1 and #A2 correspond to lambda rules with associated semantic actions, used as a means to generate code in the correct location (e.g., in parsing "<stmt> :- if <exp> then #A1 <block> #A2", we reduce elements in the following order: <exp>, #A1, <block>, #A2 and then <stmt>, and thus the semantic actions to produce code are performed in that order).
Attribute Grammar:
    <stmt> :- if <exp> then #A1 <block> #A2
    #A1 :- λ { Generate code to jump if exp non-zero }
    #A2 :- λ { Generate line with label }
    <exp> :- ...
5.4 If-else statements
If-else statements are a little more complex. A typical if-else statement (testing A == B) might generate code like:
    MOV EAX, [A]
    CMP EAX, [B]
    JNZ L1 ; jump to the else part if cmp result was non-zero
    <block1 code>
    JMP L2
    L1: <block2 code>
    L2: ...
Attribute Grammar:
    <stmt> :- if <exp> then #A1 <block1> #A2 else <block2> #A3
    #A1 :- λ { Generate code to jump to start_else if exp non-zero }
    #A2 :- λ { Generate jmp to end_ifelse; then generate label for start_else }
    #A3 :- λ { Generate line with label for end_ifelse }
6 Generating Loops
6.1 While Loops
While loops map onto assembler much as for an if-statement. E.g., for
    while <exp> do <instructions> end
we can use:
    <loop> :- while #A1 <exp> #A2 do <instructions> end #A3
    #A1 :- λ { Generate line with a unique label for loop start }
    #A2 :- λ { Generate line with jump to end if expr fails }
    #A3 :- λ { Generate jump back to start, and label for loop end }
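The three semantic actions of the while rule can be sketched in C as follows. This is a hedged illustration, not the course's code: the names emit, LoopLabels, while_a1, while_a2 and while_a3 are invented here, output goes to a buffer rather than to the object file, and the jump instruction used when the test fails is passed in by the caller because it depends on how the code for <exp> leaves its result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char out[512];        /* generated assembler text        */
static int next_label = 0;   /* counter used for unique labels  */

/* Append one line of assembler to the output buffer. */
static void emit(const char *line) {
    assert(strlen(out) + strlen(line) + 2 <= sizeof out);
    strcat(out, line);
    strcat(out, "\n");
}

/* The two labels that belong to one loop. */
typedef struct { int start; int end; } LoopLabels;

/* #A1: reserve two fresh labels and emit the loop-start label. */
static LoopLabels while_a1(void) {
    LoopLabels l = { next_label, next_label + 1 };
    char line[32];
    next_label += 2;
    snprintf(line, sizeof line, "L%d:", l.start);
    emit(line);
    return l;
}

/* #A2: runs after the code for <exp>; jump to the end if the test
   fails. jump_if_false is e.g. "jne" for a test of the form A == B. */
static void while_a2(LoopLabels l, const char *jump_if_false) {
    char line[32];
    snprintf(line, sizeof line, "%s L%d", jump_if_false, l.end);
    emit(line);
}

/* #A3: jump back to the start, then emit the loop-end label. */
static void while_a3(LoopLabels l) {
    char line[32];
    snprintf(line, sizeof line, "jmp L%d", l.start);
    emit(line);
    snprintf(line, sizeof line, "L%d:", l.end);
    emit(line);
}
```

In a Bison/Yacc grammar the LoopLabels value would typically be carried as the semantic value of #A1, so that #A2 and #A3 can refer to the same pair of labels even when loops are nested.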
Below is code from a real while loop:
    topwhile:                         ; a label to mark the top of this WHILE loop
        mov eax, 3                    ; planning to invoke function 3 (read from a file)
        mov ebx, [infileID]           ; the file ID must be placed into register ebx
        mov ecx, mybyte               ; the address of memory to receive file content
                                      ; must be placed into register ecx
        mov edx, 1                    ; the number of bytes to read is placed in edx
        int 80h                       ; invokes a kernel function according to
                                      ; the number in register eax
        cmp eax, 0                    ; check whether a byte was read
        je dunwhile                   ; skip the body if no bytes were read
        xor byte [mybyte], 00001111b  ; [] dereferences, thereby refers to the
                                      ; contents at mybyte
        mov eax, 4                    ; planning to invoke function 4 (write to a file)
        mov ebx, [outfileID]          ; the file ID must be placed into register ebx
        mov ecx, mybyte               ; the address of memory to write from must be
                                      ; placed into register ecx
        mov edx, 1                    ; the no. of bytes to write must be placed in edx
        int 80h                       ; invokes a kernel function according to no. in eax
        jmp topwhile                  ; go back to the top of the loop
    dunwhile:                         ; jump to here if no byte is read
6.2 Repeat Loops
    <loop> :- repeat #A1 <instructions> until <exp> #A2
    #A1: Generate unique label for loop start
    #A2: Generate jump to end if last result zero
         Generate unconditional jump back to beginning
         Generate label for loop end
    <instructions>: Code for <instructions> generated by other productions
    <exp>: Code for <exp> generated by other productions
7 Generating Code from Functions
This section covers the generation of code for functions. This includes the generation of function calls and the generation of the code of the function body itself. Three important issues here are:
1. How parameters are passed to the function.
2. How local variables are represented within the function.
3. How values are returned from the function.
There are many possible ways to implement functions. It is up to the person writing the code generator to decide how to do it. We describe here one of the more standard ways of generating functions and function calls.
7.1 The Stack Space
Our implementation of functions depends heavily on the use of a stack in the program memory. Many assemblers assign part of the addressable memory of the program to a stack to hold information about the current variable context. When we enter a function, space is allocated on top of the stack for the local variables, and when we exit from the function, this allocated space is popped off the stack. The stack thus represents the embedded block structure we discussed under symbol tables.
The stack typically starts at the top of addressable memory, and expands downwards. So, assuming we have 1000 bytes of addressable memory, the "bottom of the stack" will start at address 1000. If we push a 2-byte integer onto the stack, it will occupy memory range 999-1000. If we then push an 8-byte float value onto this stack, it will occupy bytes 991-998.
A register called SP (for Stack Pointer) indicates the top of the stack. In some systems, SP will point at the next free location in the stack. In others, it points to the lowest byte of the top element of the stack. We will assume this last approach, so in the above case, after pushing on the two numbers, SP would contain 991.
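The stack-pointer arithmetic of this example can be checked with a short C model. It is only a sketch under the assumptions stated in the text (1000 bytes of memory, addresses 1 to 1000, SP pointing at the lowest byte of the top element); the names MEM_TOP, sp, push and pop are invented for the illustration, and no actual data is stored.

```c
#include <assert.h>

#define MEM_TOP 1000          /* highest address of the example memory */

/* Empty stack: SP sits just above the highest address. */
static int sp = MEM_TOP + 1;

/* Push an element of `size` bytes. The stack grows downwards, so SP
   decreases; the new SP is the address of the element's lowest byte. */
static int push(int size) {
    assert(size > 0 && sp - size >= 1);    /* no overflow past address 1 */
    sp -= size;
    return sp;
}

/* Pop an element of `size` bytes: SP moves back up. */
static int pop(int size) {
    assert(size > 0 && sp + size <= MEM_TOP + 1);
    sp += size;
    return sp;
}
```

Pushing a 2-byte integer moves SP to 999 (bytes 999-1000), and pushing an 8-byte value after it moves SP to 991 (bytes 991-998), matching the figures in the text.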
7.2 The Function Call
Before calling the function, parameters are pushed onto the stack. These can then be accessed by the called routine, from the top of the stack. So that the parameters are available in the required order, they are pushed onto the stack in reverse order.
The Call in Source Code:
    rutina(a, b)
The Call in Assembler:
    ...
    PUSH b
    PUSH a
    CALL rutina
    ...
The NASM instruction "CALL" firstly pushes the address of the following instruction onto the stack. This will be used as the return address when the function call returns.
7.3 Entering the Routine
On entering the routine, the routine firstly establishes the boundaries of the local space of the stack. A register BP (Base Pointer) is used to indicate the lowest point of the stack which is part of the current context. Consequently, the first thing a routine does on entry is to store the old value of BP onto the stack (for later recovery and restoration), and then reset BP to point at the current top of the stack (which is the point from which the local context will grow).
The first lines of any routine will thus be something like the following:
    rutina: PUSH BP
            MOV BP,SP
'rutina' is the name of the function, represented as a label in assembler. The old value of BP is pushed onto the stack, and then BP is reset to the value of SP (top of the stack).
7.4 Allocating Space for Local Variables
The next step is to allocate stack space for the local variables. The compiler works out how many bytes of memory are required for the local variables, and decrements the stack pointer (the stack grows down, remember) by this amount. In the following example, each int takes 2 bytes and the double 8 bytes, a total of 14 bytes.
Source code:
    int rutina (int a, char *b)
    {
        int i, j, k;
        double r;
        . . .
    }
Assembler Code:
    rutina: PUSH BP
            MOV BP,SP
            SUB SP,14
            ...
7.5 Referring to parameters and local variables
In the body of the function, rather than referring to variables by name, one references them in terms of offsets from the base pointer.
Parameters: parameters were pushed onto the stack BEFORE the function was called, and thus are part of the previous context; they are thus above BP. In the above example, parameters a and b can be accessed using [BP+6] and [BP+8] (note the 6 bytes used to store the old BP and the return address).
Local Variables: The local variables are available under BP in memory. i, j and k are thus available as, respectively: [BP-2], [BP-4], [BP-6]. r starts at [BP-14].
The layout of the stack is thus as follows (the return address is pushed on the stack by CALL, after the parameters):
    [BP+8]    b
    [BP+6]    a
    [BP+2]    return address
    [BP]      old_bp            <- BP
    [BP-2]    i
    [BP-4]    j
    [BP-6]    k
    [BP-14]   r                 <- SP
              Free Memory
On entering the routine, space is allocated for local variables of that routine. On leaving the routine, the part of the stack used by the routine can be 'popped'. Recursive routines thus have separate memory space for each call.
7.6 Placing the function's code
After we generate the line to allocate space for the local variables, we then generate the code for the body of the function. After the body, we generate the code that leaves the routine. Firstly, the line MOV SP,BP resets the stack pointer to its value before the local variables were allocated (we thus pop all the local stack space off the stack).
At this point, the top element on the stack is the old BP. We can thus issue a command POP BP, which pops this element off the stack into BP, thus resetting BP to its prior value (SP is also moved up two bytes).
At this point, the element on top of the stack is the address where execution should resume in the calling context. The RET operator pops an element off the stack, and resumes processing from that point.
Back in the calling function, after the function call, we then need to wipe the function parameters off the stack. We do this simply by ADD SP,4.
The calling code:
    PUSH b
    PUSH a
    CALL rutina
    ADD SP,4
    ...
The routine code:
    rutina: PUSH BP
            MOV BP,SP
            SUB SP,14
            ...
            MOV SP,BP
            POP BP
            RET
            ...
7.7 Returning Values
A function may or may not return a value. There are various ways to return a value, and it is up to the compiler writer to decide how it is done. One way is to leave the returned value in the EAX register (if it fits), or in a float register for larger numbers. Alternatively, the number could be placed on the stack, to be popped later by the calling routine.
Source Program:
    . . .
    rutina(a, b)
    . . .
    int rutina (int a, char *b)
    {
        int i, j, k;
        double r;
        . . .
        return k;
    }
Object Program:
            PUSH b
            PUSH a
            CALL rutina
            ADD SP,4
    rutina: PUSH BP
            MOV BP,SP
            SUB SP,14
            ...
            MOV EAX, [BP-6] ; move k into EAX
            MOV SP,BP
            POP BP
            RET
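The offsets used in section 7.5 follow mechanically from the sizes of the parameters and local variables, which is how a code generator would compute them. The sketch below is an illustration only: the function names param_offsets and local_offsets are invented, and the sizes SAVED_BP = 2 and RET_ADDR = 4 are assumptions read off the layout above (old BP at [BP], return address at [BP+2], first parameter at [BP+6]).

```c
#include <assert.h>

/* Sizes assumed from the stack layout in section 7.5. */
enum { SAVED_BP = 2, RET_ADDR = 4 };

/* Compute BP-relative offsets of the parameters, given their sizes in
   declaration order. The first parameter was pushed last, so it lies
   directly above the saved BP and the return address. */
static void param_offsets(const int *sizes, int n, int *off) {
    int next = SAVED_BP + RET_ADDR;
    for (int i = 0; i < n; i++) {
        off[i] = next;          /* parameter i is at [BP+off[i]] */
        next += sizes[i];
    }
}

/* Compute BP-relative offsets of the local variables, given their sizes
   in declaration order. Returns the total number of bytes, i.e. the
   amount to use in "SUB SP,total". */
static int local_offsets(const int *sizes, int n, int *off) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += sizes[i];
        off[i] = -total;        /* local i starts at [BP+off[i]] */
    }
    return total;
}
```

For rutina, parameter sizes {2, 2} give offsets 6 and 8, and local sizes {2, 2, 2, 8} give offsets -2, -4, -6 and -14 with a total of 14 bytes, the same values as in the text.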