COMPILER CONSTRUCTION Principles and Practice Kenneth C. Louden
8.  Code Generation
8.1 Intermediate Code and Data Structures for Code Generation
8.1.1 Three-Address Code
8.1.2 Data Structures for the Implementation of Three-Address Code
8.1.3 P-Code
8.2 Basic Code Generation Techniques
8.2.1 Intermediate Code or Target Code as a Synthesized Attribute
8.2.2 Practical Code Generation
8.2.3 Generation of Target Code from Intermediate Code
Code generation from intermediate code involves either or both of two standard techniques: macro expansion and static simulation.
Macro expansion replaces each kind of intermediate code instruction with an equivalent sequence of target code instructions.
Static simulation performs a straight-line simulation of the effects of the intermediate code and generates target code to match these effects.
Consider the expression (x=x+3)+4. Its P-code (left) is translated into three-address code (right) as follows:
  lda x
  lod x
  ldc 3
  adi        t1 = x + 3
  stn        x = t1
  ldc 4
  adi        t2 = t1 + 4
We perform a static simulation of the P-machine stack to find the three-address equivalents of the given code.
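The static simulation above can be sketched in C. This is a minimal illustration, not the book's implementation: the names simulate, push and pop are hypothetical, the simulated stack holds operand names rather than run-time values, and lda, lod and ldc are all treated as "push the operand name", which is enough for this example.

```c
#include <stdio.h>
#include <string.h>

#define MAXSTACK 32
#define NAMELEN  16

/* Simulated P-machine stack: holds operand names, not run-time values. */
static char stack[MAXSTACK][NAMELEN];
static int  top = 0;        /* number of entries on the simulated stack */
static int  tempCount = 0;  /* number of the last temporary handed out  */

static void push(const char *s) { snprintf(stack[top++], NAMELEN, "%s", s); }
static const char *pop(void)    { return stack[--top]; }

/* Simulate one P-code instruction. Any three-address instruction that
   results is appended to out (assumed large enough by the caller). */
void simulate(const char *op, const char *arg, char *out)
{
    char a[NAMELEN], b[NAMELEN], line[64];

    if (strcmp(op, "lda") == 0 || strcmp(op, "lod") == 0 ||
        strcmp(op, "ldc") == 0) {
        push(arg);                        /* loads only change the stack */
    } else if (strcmp(op, "adi") == 0) {
        strcpy(b, pop());                 /* right operand */
        strcpy(a, pop());                 /* left operand  */
        snprintf(line, sizeof line, "t%d = %s + %s\n", ++tempCount, a, b);
        strcat(out, line);
        snprintf(a, sizeof a, "t%d", tempCount);
        push(a);                          /* result is a new temporary */
    } else if (strcmp(op, "stn") == 0) {
        strcpy(b, pop());                 /* value to store */
        strcpy(a, pop());                 /* target name    */
        snprintf(line, sizeof line, "%s = %s\n", a, b);
        strcat(out, line);
        push(b);                          /* stn leaves the value on the stack */
    }
}
```

Feeding the seven P-code instructions of the example to simulate in order, with an initially empty out buffer, yields the three instructions t1 = x + 3, x = t1 and t2 = t1 + 4.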
 
 
Now consider the case of translating from three-address code to P-code by simple macro expansion. A three-address instruction
  a = b + c
can always be translated into the P-code sequence
  lda a
  lod b
  lod c
  adi
  sto
Then the three-address code for the expression (x=x+3)+4:
  t1 = x + 3
  x = t1
  t2 = t1 + 4
can be translated into the following P-code:
  lda t1
  lod x
  ldc 3
  adi
  sto
  lda x
  lod t1
  sto
  lda t2
  lod t1
  ldc 4
  adi
  sto
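The macro expansion above can be written as a small C routine. This is a sketch under stated assumptions: expand and loadOperand are hypothetical names, only addition and simple copies are handled, and an operand is taken to be a constant when it starts with a digit.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Load one operand: constants use ldc, names use lod. */
static void loadOperand(const char *opnd, char *out)
{
    char line[32];
    snprintf(line, sizeof line, "%s %s\n",
             isdigit((unsigned char)opnd[0]) ? "ldc" : "lod", opnd);
    strcat(out, line);
}

/* Expand "dst = src1 + src2" (src2 != NULL) or the copy "dst = src1"
   (src2 == NULL) into P-code appended to out, which the caller must
   make large enough. */
void expand(const char *dst, const char *src1, const char *src2, char *out)
{
    char line[32];
    snprintf(line, sizeof line, "lda %s\n", dst);  /* address of the target */
    strcat(out, line);
    loadOperand(src1, out);
    if (src2 != NULL) {
        loadOperand(src2, out);
        strcat(out, "adi\n");
    }
    strcat(out, "sto\n");                          /* store into the target */
}
```

Calling expand("t1", "x", "3", out), expand("x", "t1", NULL, out) and expand("t2", "t1", "4", out) in turn reproduces the thirteen-instruction sequence shown above. Each three-address instruction is expanded in isolation, which is why the temporaries t1 and t2 are stored and reloaded rather than kept on the stack.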
 
Contents
Part One
  8.1 Intermediate Code and Data Structures for Code Generation
  8.2 Basic Code Generation Techniques
Part Two
  8.3 Code Generation of Data Structure References
  8.4 Code Generation of Control Statements and Logical Expressions
  8.5 Code Generation of Procedure and Function Calls
Other Parts
  8.6 Code Generation on Commercial Compilers: Two Case Studies
  8.7 TM: A Simple Target Machine
  8.8 A Code Generator for the TINY Language
  8.9 A Survey of Code Optimization Techniques
  8.10 Simple Optimizations for the TINY Code Generator
8.3 Code Generation of Data Structure References
8.3.1 Address Calculations
(1) Three-Address Code for Address Calculations
The usual arithmetic operations can be used to compute addresses. Suppose we wish to store the constant value 2 at the address of the variable x plus 10 bytes:
  t1 = &x + 10
  *t1 = 2
The implementation of these new addressing modes (address-of, &, and indirection, *) requires that the data structure for three-address code contain a new field or fields. For example, the quadruple data structure of Figure 8.4 (page 403) can be augmented by an enumerated address-mode field with possible values none, address, and indirect.
 
8.3.2 Array References
The offset is computed from the subscript value as follows:
First, an adjustment must be made to the subscript value if the subscript range does not begin at 0.
Second, the adjusted subscript value must be multiplied by a scale factor that is equal to the size of each array element in memory.
Finally, the resulting scaled subscript is added to the base address to get the final address of the array element.
The address of an array element a[t]:
  base_address(a) + (t - lower_bound(a)) * element_size(a)
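The three steps above map directly onto a pair of C helpers. This is only an illustrative sketch: element_offset and element_address are hypothetical names, and addresses are modeled as char pointers so that the arithmetic is in bytes.

```c
#include <stddef.h>

/* Byte offset of a[t] from the base of a:
   adjust the subscript by the lower bound, then scale by the element size. */
size_t element_offset(long t, long lower_bound, size_t element_size)
{
    return (size_t)(t - lower_bound) * element_size;
}

/* Address of a[t]: base address plus the scaled, adjusted subscript. */
char *element_address(char *base, long t, long lower_bound,
                      size_t element_size)
{
    return base + element_offset(t, lower_bound, element_size);
}
```

For a C array, whose lower bound is always 0, element_address((char *)a, 3, 0, sizeof a[0]) is the same address as &a[3]. For an array with subscript range 1..10 and 4-byte elements, element 5 lies at offset (5 - 1) * 4 = 16.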
(1) Three-Address Code for Array References
Introduce two new operations: one that fetches the value of an array element,
  t2 = a[t1]
and one that assigns to the address of an array element,
  a[t2] = t1
For example, the statement
  a[i+1] = a[j*2] + 3
translates into the three-address instructions (using the symbols =[] and []=):
  t1 = j * 2
  t2 = a[t1]
  t3 = t2 + 3
  t4 = i + 1
  a[t4] = t3
Writing out the address computations of an array element directly in the code, the above example is finally translated into:
  t1 = j * 2
  t2 = t1 * elem_size(a)
  t3 = &a + t2
  t4 = *t3
  t5 = t4 + 3
  t6 = i + 1
  t7 = t6 * elem_size(a)
  t8 = &a + t7
  *t8 = t5
(2) P-Code for Array References
Use the new address instructions ind and ixa. The above example
  a[i+1] = a[j*2] + 3
finally becomes:
  lda a
  lod i
  ldc 1
  adi
  ixa elem_size(a)
  lda a
  lod j
  ldc 2
  mpi
  ixa elem_size(a)
  ind 0
  ldc 3
  adi
  sto
 
Array references generated by a code generation procedure. For the expression
  (a[i+1] = 2) + a[j]
the P-code is:
  lda a
  lod i
  ldc 1
  adi
  ixa elem_size(a)
  ldc 2
  stn
  lda a
  lod j
  ixa elem_size(a)
  ind 0
  adi
The code generation procedure for P-code:
  void genCode(SyntaxTree t, int isAddr)
  { char codestr[CODESIZE];
    /* CODESIZE = max length of 1 line of P-code */
    if (t != NULL)
    { switch (t->kind)
      { case OpKind:
          switch (t->op)
          { case Plus:
              /* a sum is never an address */
              if (isAddr) emitCode("Error");
              else
              { genCode(t->lchild, FALSE);
                genCode(t->rchild, FALSE);
                emitCode("adi"); }
              break;
            case Assign:
              /* left side as an address, right side as a value */
              genCode(t->lchild, TRUE);
              genCode(t->rchild, FALSE);
              emitCode("stn");
              break;
            case Subs:
              sprintf(codestr, "%s %s", "lda", t->strval);
              emitCode(codestr);
              genCode(t->lchild, FALSE);
              sprintf(codestr, "%s%s%s", "ixa elem_size(", t->strval, ")");
              emitCode(codestr);
              /* fetch the element only when a value is wanted */
              if (!isAddr) emitCode("ind 0");
              break;
            default:
              emitCode("Error");
              break;
          }
          break;
        case ConstKind:
          if (isAddr) emitCode("Error");
          else
          { sprintf(codestr, "%s %s", "ldc", t->strval);
            emitCode(codestr); }
          break;
        case IdKind:
          if (isAddr)
            sprintf(codestr, "%s %s", "lda", t->strval);
          else
            sprintf(codestr, "%s %s", "lod", t->strval);
          emitCode(codestr);
          break;
        default:
          emitCode("Error");
          break;
      }
    }
  }
(4) Multidimensional Arrays
For example, in C an array of two dimensions can be declared as:
  int a[15][10];
Such an array can be partially subscripted, yielding an array of fewer dimensions:
  a[i]
or fully subscripted, yielding a value of the element type of the array:
  a[i][j]
The address computation can be implemented by recursively applying the above techniques.
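As a concrete instance of applying the one-dimensional rule recursively, the sketch below computes the address of a[i][j] for the declaration int a[15][10] in two subscripting steps. The names subscript and address_2d are hypothetical, and the row-major layout is the one C uses.

```c
#include <stddef.h>

#define ROWS 15
#define COLS 10

/* One subscripting step: base + index * size of the thing being indexed.
   The lower bound is 0 in C, so no adjustment is needed. */
static char *subscript(char *base, size_t index, size_t elem_size)
{
    return base + index * elem_size;
}

/* Address of a[i][j] for int a[ROWS][COLS]. */
int *address_2d(int (*a)[COLS], size_t i, size_t j)
{
    /* a[i]: each element of a is a whole row of COLS ints */
    char *row = subscript((char *)a, i, COLS * sizeof(int));
    /* a[i][j]: each element of the row is a single int */
    return (int *)subscript(row, j, sizeof(int));
}
```

The first call corresponds to the partially subscripted a[i], whose "element size" is the size of an entire row. The second call finishes the fully subscripted a[i][j].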
8.3.3 Record Structure and Pointer References
Computing the address of a record or structure field presents a similar problem to that of computing a subscripted array address. First, the base address of the structure variable is computed; then the (usually fixed) offset of the named field is found, and the two are added to get the resulting address.
For example, the C declarations:
  typedef struct rec
  { int i;
    char c;
    int j;
  } Rec;
  ...
  Rec x;
[Figure: memory allocated to x, holding the fields x.i, x.c and x.j in that order with other memory on either side; the base address of x and the offsets of x.c and x.j from it are marked.]
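In C itself, the fixed field offsets of the Rec layout can be inspected with the standard offsetof macro from <stddef.h>. The helper field_address below is a hypothetical name used only to illustrate "base address plus field offset". The actual offset values depend on the compiler's padding rules, so none are assumed here beyond the first field being at offset 0.

```c
#include <stddef.h>

typedef struct rec {
    int  i;
    char c;
    int  j;
} Rec;

/* Address of a field = base address of the record + fixed field offset. */
char *field_address(void *base, size_t field_offset)
{
    return (char *)base + field_offset;
}
```

For a variable Rec x, field_address(&x, offsetof(Rec, j)) is the same address as &x.j, which is exactly the computation a compiler performs for the reference x.j.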
(1) Three-Address Code for Structure and Pointer References
The address of a field is computed with a three-address instruction of the form
  t1 = &x + field_offset(x,j)
The assignment
  x.j = x.i;
is translated into
  t1 = &x + field_offset(x,j)
  t2 = &x + field_offset(x,i)
  *t1 = *t2
Consider the following example of a tree data structure and variable declaration in C:
  typedef struct treeNode
  { int val;
    struct treeNode *lchild, *rchild;
  } TreeNode;
With the TreeNode declaration and the variable
  TreeNode *p;
the assignments
  p->lchild = p;
  p = p->rchild;
translate into the three-address code
  t1 = p + field_offset(*p,lchild)
  *t1 = p
  t2 = p + field_offset(*p,rchild)
  p = *t2
(2) P-Code for Structure and Pointer References
The assignment
  x.j = x.i
is translated into the P-code
  lda x
  lod field_offset(x,j)
  ixa 1
  lda x
  ind field_offset(x,i)
  sto
The assignments
  p->lchild = p;
  p = p->rchild;
can be translated into the following P-code:
  lod p
  lod field_offset(*p,lchild)
  ixa 1
  lod p
  sto
  lda p
  lod p
  ind field_offset(*p,rchild)
  sto
8.4 Code Generation of Control Statements and Logical Expressions
This section describes code generation for various forms of control statements. Chief among these are the structured if-statement and while-statement.
Intermediate code generation for control statements involves the generation of labels, which stand for addresses in the target code to which jumps are made.
If labels are to be eliminated in the generation of target code, then a problem arises: jumps to code locations that are not yet known must be back-patched, or retroactively rewritten.
8.4.1 Code Generation for If- and While-Statements
Two forms of the if-statement and one of the while-statement are considered:
  if-stmt -> if ( exp ) stmt | if ( exp ) stmt else stmt
  while-stmt -> while ( exp ) stmt
The chief problem is to translate the structured control features into an "unstructured" equivalent involving jumps, which can be directly implemented.
Compilers arrange to generate code for such statements in a standard order that allows the efficient use of a subset of the possible jumps that a target architecture might permit.
[Figure: typical code arrangement for an if-statement]
[Figure: typical code arrangement for a while-statement]
Three-Address Code for Control Statements
For the statement
  if ( E ) S1 else S2
the following code pattern is generated:
  <code to evaluate E to t1>
  if_false t1 goto L1
  <code for S1>
  goto L2
  label L1
  <code for S2>
  label L2
Similarly, a while-statement of the form
  while ( E ) S
causes the following three-address code pattern to be generated:
  label L1
  <code to evaluate E to t1>
  if_false t1 goto L2
  <code for S>
  goto L1
  label L2
P-Code for Control Statements
For the statement
  if ( E ) S1 else S2
the following P-code pattern is generated:
  <code to evaluate E>
  fjp L1
  <code for S1>
  ujp L2
  lab L1
  <code for S2>
  lab L2
And for the statement
  while ( E ) S
the following P-code pattern is generated:
  lab L1
  <code to evaluate E>
  fjp L2
  <code for S>
  ujp L1
  lab L2
8.4.2 Generation of Labels and Back-patching
One feature of code generation for control statements that can cause problems during target code generation is that, in some cases, jumps to a label must be generated prior to the definition of the label itself.
A standard method for generating such forward jumps is either to leave a gap in the code where the jump is to occur or to generate a dummy jump instruction to a fake location.
Then, when the actual jump location becomes known, this location is used to fix up, or back-patch, the missing code.
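The dummy-jump variant of back-patching can be sketched in C as follows. This is an illustration under assumptions, not a description of any particular compiler: the code buffer, the Instr record, and the names emit, emitForwardJump and backpatch are all hypothetical, and -1 stands for the fake location.

```c
#define MAXCODE 256

/* One target instruction: an opcode and, for jumps, a target location. */
typedef struct {
    const char *op;
    int         target;
} Instr;

static Instr code[MAXCODE];
static int   nextLoc = 0;   /* location of the next instruction to emit */

/* Emit an instruction and return the location it was placed at. */
int emit(const char *op, int target)
{
    code[nextLoc].op = op;
    code[nextLoc].target = target;
    return nextLoc++;
}

/* Emit a jump whose target is not yet known; -1 is the dummy location. */
int emitForwardJump(const char *op)
{
    return emit(op, -1);
}

/* Back-patch: fill in the real target of a previously emitted jump. */
void backpatch(int jumpLoc, int target)
{
    code[jumpLoc].target = target;
}
```

For an if-statement with an else-part, the generator would call emitForwardJump("fjp") after the test, remember the returned location, generate the then-part, and call backpatch with the current value of nextLoc once the start of the else-part is reached. The unconditional jump over the else-part is handled the same way.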
During the back-patching process a further problem may arise in that many architectures have two varieties of jumps: a short jump or branch (within, say, 128 bytes of code) and a long jump that requires more code space.
In that case, a code generator may need to insert nop instructions when shortening jumps, or make several passes to condense the code.
8.4.3 Code Generation of Logical Expressions
The standard way to handle Boolean values is to represent false as 0 and true as 1. The standard bitwise and and or operators can then be used to compute the value of a Boolean expression on most architectures.
A further use of jumps is necessary if the logical operations are short circuit. For instance, it is common to write in C:
  if ((p != NULL) && (p->val == 0)) ...
where evaluation of p->val when p is null could cause a memory fault.
Short-circuit Boolean operators are similar to if-statements, except that they return values, and often they are defined using if-expressions as
  a and b  ≡  if a then b else false
and
  a or b  ≡  if a then true else b
To generate code that ensures that the second subexpression is evaluated only when necessary, use jumps in exactly the same way as in the code for if-statements.
For instance, short-circuit P-code for the C expression (x != 0) && (y == x) is:
  lod x
  ldc 0
  neq
  fjp L1
  lod y
  lod x
  equ
  ujp L2
  lab L1
  ldc FALSE
  lab L2
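The same jump pattern covers both short-circuit operators. The sketch below follows the if-expression definitions of and and or. It is illustrative only: genShortCircuit and labelCount are hypothetical names, the operand code is passed in as ready-made text, and the constants are loaded with ldc TRUE and ldc FALSE, which is an assumption about how the constants are spelled.

```c
#include <stdio.h>
#include <string.h>

static int labelCount = 0;   /* number of the last label handed out */

/* Append P-code for "a and b" (isAnd != 0) or "a or b" (isAnd == 0) to out.
   codeA and codeB hold the P-code for the operands, each line ending in a
   newline. The caller must make out large enough. */
void genShortCircuit(int isAnd, const char *codeA, const char *codeB,
                     char *out)
{
    char line[32];
    int l1 = ++labelCount;
    int l2 = ++labelCount;

    strcat(out, codeA);
    snprintf(line, sizeof line, "fjp L%d\n", l1);
    strcat(out, line);
    /* a was true: "and" must still evaluate b, "or" is true at once */
    strcat(out, isAnd ? codeB : "ldc TRUE\n");
    snprintf(line, sizeof line, "ujp L%d\n", l2);
    strcat(out, line);
    snprintf(line, sizeof line, "lab L%d\n", l1);
    strcat(out, line);
    /* a was false: "and" is false at once, "or" must still evaluate b */
    strcat(out, isAnd ? "ldc FALSE\n" : codeB);
    snprintf(line, sizeof line, "lab L%d\n", l2);
    strcat(out, line);
}
```

With isAnd set, codeA holding the code for x != 0 and codeB the code for y == x, the routine produces the eleven-line pattern for (x != 0) && (y == x). In either case the code for b is reached on only one of the two paths, so b is evaluated only when the value of a does not already decide the result.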
8.4.4 A Sample Code Generation Procedure for If- and While-Statements
We exhibit a code generation procedure for control statements using the following simplified grammar:
  stmt -> if-stmt | while-stmt | break | other
  if-stmt -> if ( exp ) stmt | if ( exp ) stmt else stmt
  while-stmt -> while ( exp ) stmt
  exp -> true | false
The following C declarations can be used to implement an abstract syntax tree for this grammar:
  typedef enum { ExpKind, IfKind, WhileKind, BreakKind, OtherKind } NodeKind;
  typedef struct streenode
  { NodeKind kind;
    struct streenode *child[3];
    int val; /* used with ExpKind */
  } STreeNode;
  typedef STreeNode *SyntaxTree;
 
Using the given typedefs and the corresponding syntax tree structure, a code generation procedure that generates P-code is given as follows. The label parameter names the exit label of the innermost enclosing while-statement, to which a break jumps.
  void genCode(SyntaxTree t, char *label)
  { char codestr[CODESIZE];
    char *lab1, *lab2;
    if (t != NULL) switch (t->kind)
    { case ExpKind:
        if (t->val == 0) emitCode("ldc false");
        else emitCode("ldc true");
        break;
      case IfKind:
        genCode(t->child[0], label);
        lab1 = genLabel();
        sprintf(codestr, "%s %s", "fjp", lab1);
        emitCode(codestr);
        genCode(t->child[1], label);
        if (t->child[2] != NULL)
        { lab2 = genLabel();
          sprintf(codestr, "%s %s", "ujp", lab2);
          emitCode(codestr); }
        sprintf(codestr, "%s %s", "lab", lab1);
        emitCode(codestr);
        if (t->child[2] != NULL)
        { genCode(t->child[2], label);
          sprintf(codestr, "%s %s", "lab", lab2);
          emitCode(codestr); }
        break;
      case WhileKind:
        lab1 = genLabel();
        sprintf(codestr, "%s %s", "lab", lab1);
        emitCode(codestr);
        genCode(t->child[0], label);
        lab2 = genLabel();
        sprintf(codestr, "%s %s", "fjp", lab2);
        emitCode(codestr);
        /* the body's break statements jump to this loop's exit label */
        genCode(t->child[1], lab2);
        sprintf(codestr, "%s %s", "ujp", lab1);
        emitCode(codestr);
        sprintf(codestr, "%s %s", "lab", lab2);
        emitCode(codestr);
        break;
      case BreakKind:
        sprintf(codestr, "%s %s", "ujp", label);
        emitCode(codestr);
        break;
      case OtherKind:
        emitCode("other");
        break;
      default:
        emitCode("Error");
        break;
    }
  }
For the statement
  if (true) while (true) if (false) break else other
the above procedure generates the code sequence
  ldc true
  fjp L1
  lab L2
  ldc true
  fjp L3
  ldc false
  fjp L4
  ujp L3
  ujp L5
  lab L4
  other
  lab L5
  ujp L2
  lab L3
  lab L1
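The labels L1 through L5 in this output come from the label generator that the procedure calls, whose definition is not shown on the slides. A minimal sketch of such a helper, assuming it simply numbers labels consecutively and returns a freshly allocated string that the caller owns, might look like this:

```c
#include <stdio.h>
#include <stdlib.h>

/* Return a new label name "L1", "L2", ... on each call.
   The string is heap-allocated; the caller is responsible for freeing it.
   Returns NULL if the allocation fails. */
char *genLabel(void)
{
    static int labelCount = 0;   /* number of labels generated so far */
    char *label = malloc(16);

    if (label != NULL)
        snprintf(label, 16, "L%d", ++labelCount);
    return label;
}
```

Because labels are numbered in the order they are requested, the outer if-statement receives L1 before the while-statement requests L2 and L3, which matches the numbering in the generated sequence.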
8.5 Code Generation of Procedure and Function Calls
8.5.1 Intermediate Code for Procedures and Functions
The requirements for intermediate code representations of function calls may be described in general terms as follows.
First, there are actually two mechanisms that need description: function/procedure definition and function/procedure call.
A definition creates a function name, parameters, and code, but the function does not execute at that point.
A call creates values for the parameters and performs a jump to the code of the function, which then executes and returns.
Intermediate code for a definition must include an instruction marking the beginning, or entry point, of the code for the function, and an instruction marking the ending, or return point, of the function:
  Entry instruction
  <code for the function body>
  Return instruction
Similarly, a function call must have an instruction indicating the beginning of the computation of the arguments and an actual call instruction that indicates the point where the arguments have been constructed and the actual jump to the code of the function can take place:
  Begin-argument-computation instruction
  <code to compute the arguments>
  Call instruction
Three-Address Code for Procedures and Functions
In three-address code, the entry instruction needs to give a name to the procedure entry point, similar to the label instruction; thus, it is a one-address instruction, which we will call simply entry. Similarly, we will call the return instruction return.
For example, consider the C function definition
  int f(int x, int y)
  { return x + y + 1; }
This translates into the following three-address code:
  entry f
  t1 = x + y
  t2 = t1 + 1
  return t2
Suppose the function f has been defined in C as in the previous example. Then the call
  f(2+3, 4)
translates to the three-address code
  begin_args
  t1 = 2 + 3
  arg t1
  arg 4
  call f
P-Code for Procedures and Functions
The entry instruction in P-code is ent, and the return instruction is ret. Thus the definition of the C function f,
  int f(int x, int y)
  { return x + y + 1; }
translates into the P-code
  ent f
  lod x
  lod y
  adi
  ldc 1
  adi
  ret
Our example of a call in C (the call f(2+3, 4) to the function f described previously) now translates into the following P-code:
  mst
  ldc 2
  ldc 3
  adi
  ldc 4
  cup f
8.5.2 A Code Generation Procedure for Function Definition and Call
The grammar we will use is the following:
  program -> decl-list exp
  decl-list -> decl-list decl | ε
  decl -> fn id ( param-list ) = exp
  param-list -> param-list , id | id
  exp -> exp + exp | call | num | id
  call -> id ( arg-list )
  arg-list -> arg-list , exp | exp
An example of a program as defined by this grammar is
  fn f(x)=2+x
  fn g(x,y)=f(x)+y
  g(3,4)
The abstract syntax tree for this grammar is implemented using the following C declarations:
  typedef enum { PrgK, FnK, ParamK, PlusK, CallK, ConstK, IdK } NodeKind;
  typedef struct streenode
  { NodeKind kind;
    struct streenode *lchild, *rchild, *sibling;
    char *name; /* used with FnK, ParamK, CallK, IdK */
    int val;    /* used with ConstK */
  } StreeNode;
  typedef StreeNode *SyntaxTree;
Abstract syntax tree for the sample program:
  fn f(x)=2+x
  fn g(x,y)=f(x)+y
  g(3,4)
Given this syntax tree structure, a code generation procedure that produces P-code is given in the following:
  void genCode(SyntaxTree t)
  { char codestr[CODESIZE];
    SyntaxTree p;
    if (t != NULL)
    switch (t->kind)
    { case PrgK:
        /* generate every declaration, then the main expression */
        p = t->lchild;
        while (p != NULL)
        { genCode(p);
          p = p->sibling; }
        genCode(t->rchild);
        break;
      case FnK:
        sprintf(codestr, "%s %s", "ent", t->name);
        emitCode(codestr);
        genCode(t->rchild);
        emitCode("ret");
        break;
      case ConstK:
        sprintf(codestr, "%s %d", "ldc", t->val);
        emitCode(codestr);
        break;
      case PlusK:
        genCode(t->lchild);
        genCode(t->rchild);
        emitCode("adi");
        break;
      case IdK:
        sprintf(codestr, "%s %s", "lod", t->name);
        emitCode(codestr);
        break;
      case CallK:
        emitCode("mst");
        /* generate each argument in the sibling list */
        p = t->rchild;
        while (p != NULL)
        { genCode(p);
          p = p->sibling; }
        sprintf(codestr, "%s %s", "cup", t->name);
        emitCode(codestr);
        break;
      default:
        emitCode("Error");
        break;
    }
  }
Given the syntax tree in Figure 8.13, the procedure generates the following code sequence:
  ent f
  ldc 2
  lod x
  adi
  ret
  ent g
  mst
  lod x
  cup f
  lod y
  adi
  ret
  mst
  ldc 3
  ldc 4
  cup g
End of Part Two THANKS

Chapter Eight(2)

  • 1. COMPILER CONSTRUCTION Principles and Practice Kenneth C. Louden
  • 2. 8. Code Generation
  • 3. 8.1 Intermediate Code and Data Structures for Code Generation
  • 5. 8.1.2 Data Structures for the Implementation of Three-Address Code
  • 7. 8.2 Basic Code Generation Techniques
  • 8. 8.2.1 Intermediate Code or Target Code as a Synthesized Attribute
  • 9. 8.2.2 Practical Code Generation
  • 10. 8.2.3 Generation of Target Code from Intermediate Code
  • 11. Code generation from intermediate code involves either or both of two standard techniques : Macro expansion and Static simulation Macro expansion involves replacin g each kind of intermediate code instruction with an equivalent sequence of target code instructions Static simulation involves a straight-line simulation of the effects of the intermediate code and generating target code to match these effects
  • 12. Consider the expression (x=x+3) +4, translate the P-code into three-address code: Lad x Lod x Ldc 3 Adi t1=x+3 Stn x=t1 Ldc 4 Adi t2=t1+4 We perform a static simulation of the P-machine stack to find three-address equivalence for the given code
  • 13.  
  • 14.  
  • 15. Now consider the case of translating from three-address code to P-code, by simple macro expansion . A three-address instruction: a = b + c Can always be translated into the P-code sequence lda a lod b lod c adi sto
  • 16. Then, the three-address code for the expression (x=x+3)+4: T1 = x + 3 X = t1 T2 = t1 + 4 Can be translated into the following P-code: Lda t1 Lod x Ldc 3 Adi Sto Lad x Lod t1 Sto Lda t2 Lod t1 Ldc 4 Adi Sto
  • 17.  
  • 18. Contents Part One 8.1 Intermediate Code and Data Structure for code Generation 8.2 Basic Code Generation Techniques Part Two 8.3 Code Generation of Data Structure Reference 8.4 Code Generation of Control Statements and Logical Expression 8.5 Code Generation of Procedure and Function calls Other Parts 8.6 Code Generation on Commercial Compilers: Two Case Studies 8.7 TM: A Simple Target Machine 8.8 A Code Generator for the TINY Language 8.9 A Survey of Code Optimization Techniques 8.10 Simple Optimizations for TINY Code Generator
  • 19. 8.3 Code Generation of Data Structure References
  • 21. (1) Three-Address Code for Address Calculations The usual arithmetic operations can be used to compute addresses Suppose wished to store the constant value 2 at the address of the variable x plus 10 bytes t1 = &x +10 *t1 = 2 The implementation of these new addressing modes requires that the data structure for three-address code contain a new field or fields For example, the quadruple data structure of Figure 8.4 (page 403) can be augmented by an enumerated address-mode field with possible values none, address, and indirect
  • 22.  
  • 24. The offset is computed from the subscript value as follows: First, an adjustmen t must be made to the subscript value if the subscript range does not begin at 0 Second, the adjusted subscript value must be multiplied by a scale factor that is equal to the size of each array element in memory Finally, the resulting scaled subscript is added to the base address to get the final address of the array element. The address of an array element a[t] : b a s e _ a d d ress ( a ) + ( t - lower_bound ( a )) * element_size ( a )
  • 25. (1) Three-Address Code for Array References Introduce two new operations : One that fetches the value of an array element t2= a[t1] And one that assigns to the address of an array element a[t2]= t1 For an example: a[i+1] = a [j*2]+3 Translate into the three-address instructions ( with the symbols: =[], []=) t1 = j * 2 t2 = a [t1] t3 = t2 + 3 t4 = i + 1 a [t4] = t3
  • 26. Writing out the addresses computations of an array element directly in the code, The above example can be finally translated into: t1 = j * 2 t2 = t1 * elem_size(a) t3 = &a + t2 t4 = *t3 t5 = t4 + 3 t6 = i + 1 t7 = t6 * elem_size (a) t8 = &a + t7 *t8 = t5
  • 27. (2) P-Code for Array References Use the new address instructions ind and ixa . The above example a[i+1] = a [j*2]+3 Will finally become: lda a lod i ldc 1 a d i ixa elem_size(a) lda a lod j ldc 2 m p i ixa elem_size(a) ind 0 ldc 3 a d I sto
  • 28.  
  • 29. Array reference generated by a code generation procedure. ( a [ i + 1 ] = 2 ) + a [ j ] lda a lod i ldc 1 a d i ixa elem_size(a) ldc 2 s t n lda a lod j ixa elem_size(a) ind 0 adi
  • 30. The code generation procedure for p-code: Void gencode( syntaxtree t, int isaddr) {char codestr[CODESIZE]; /*CODESIZE = max length of 1 line of p-code */ if (t != NULL) { switch(t->kind) { case OpKind: switch (t->op) { case Plus: if (is Addr) emitcode(“Error”); else { genCode(t->lchild, FALSE); genCode(t->rchild, FALSE); emitcode(“adi”);} break;
  • 31. case Assign: genCode(t->lchild, TRUE); genCode(t->rchild, FALSE); emitcode(“stn”);} break; case Subs: sprintf(codestr,”%s %s”,”lda”, t->strval); emitcode(codestr); gencode(t->lchild,FALSE); sprintf(codestr,”%s%s%s”, “ ixa elem_size(“,t->strval,”)”); emitcode(codestr); if (!isAddr) emitcode (“ind 0”); break;
  • 32. default: emitcode(“Error”); break; } break; case ConstKind: if (isAddr) emitcode(“Error”); else { sprintf(codestr,”%s %s”, ” ldc”,t->strval); emitCode(codestr); } break;
  • 33. case IdKind: if (isAddr) sprintf(codestr,”%s %s”,”lda”,t->strval); else sprintf(codestr,”%s %s”,”lod”,t->strval); emitcode(codestr); break; default: emitCode(“Error”); break; } } }
  • 34. (4) Multidimensional Arrays For an example, in C an array of two dimensions can be declared as: Int a[15][10] Partially subscripted, yielding an array of fewer dimensions: a[i] Fully subscripted, yielding a value of the element type of the array: a[i][j] The address computation can be implemented by recursively applying the above techniques
  • 35. 8.3.3 Record Structure and Pointer References
  • 36. Computing the address of a record or structure field presents a similar problem to that of computing a subscripted array address First, the base address of the structure variable is computed; Then, the (usually fixed) offset of the named field is found, and the two are added to get the resulting address For example , the C declarations: Typedef struct rec { int i; char c; int j; } Rec; … Rec x;
  • 37. Memory allocated to x Base address of x Offset of x.c Offset of x.j (Other memory) x.i x.c x.j (Other memory)
  • 38. 1) Three-Address Code for Structure and Pointer References Use the three-address instruction t1 = &x + field_offset (x,j) x.j = x.i; be translated into t1 = &x + field_offset (x,j) t2 = &x + field_offset (x,i) *t1 = *t2 Consider the following example of a tree data structure and variable declaration in C: typedef struct treeNode { int val; struct treeNode * lchild, * rchild; } TreeNode;
  • 39. typedef struct treeNode { int val; struct treeNode * lchild, * rchild; } TreeNode; . . . TreeNode *p; p -> lchild = p; p = p -> rchild; translate into the three-address code t1 = p + field_offset ( *p, lchild ) *t1 = p t2 = p + field_offset ( *p, rchild ) p = *t2
  • 40. 2) P-Code for Structure and Pointer References x.j = x.i translated into the P-code lda x lod field_offset (x,j) ixa 1 lda x ind field_offset (x,i) sto
  • 41. The assignments: p->lchild = p; p = p->rchild Can be translated into the following P-code. Lod p Lod field-offset(*p,lchild) Ixa 1 Lod p Sto Lda p Lod p Ind field_offset(*p,rchild) sto
  • 42. 8.4 Code Generation of Control Statements and Logical Expressions
  • 43. This section describes code generation for various forms of control statements. Chief among these are the structured if-statement and while-statement. Intermediate code generation for control statements involves the generation of labels, which stand for addresses in the target code to which jumps are made. If labels are to be eliminated in the generation of target code, then a problem arises in that jumps to code locations that are not yet known must be back-patched, or retroactively rewritten.
  • 44. 8.4.1 Code Generation for If- and While-Statements
  • 45. Two forms of the if- and while-statements:
      if-stmt -> if ( exp ) stmt | if ( exp ) stmt else stmt
      while-stmt -> while ( exp ) stmt
    The chief problem is to translate the structured control features into an “unstructured” equivalent involving jumps, which can be directly implemented. Compilers arrange to generate code for such statements in a standard order that allows the efficient use of a subset of the possible jumps that a target architecture might permit.
  • 46. The typical code arrangement for an if-statement is shown as follows:
  • 47. The typical code arrangement for a while-statement is shown as follows:
  • 48. Three-Address Code for Control Statements. For the statement
      if ( E ) S1 else S2
    the following code pattern is generated:
      <code to evaluate E to t1>
      if_false t1 goto L1
      <code for S1>
      goto L2
      label L1
      <code for S2>
      label L2
  • 49. Three-Address Code for Control Statements. Similarly, a while-statement of the form
      while ( E ) S
    would cause the following three-address code pattern to be generated:
      label L1
      <code to evaluate E to t1>
      if_false t1 goto L2
      <code for S>
      goto L1
      label L2
  • 50. P-Code for Control Statements. For the statement
      if ( E ) S1 else S2
    the following P-code pattern is generated:
      <code to evaluate E>
      fjp L1
      <code for S1>
      ujp L2
      lab L1
      <code for S2>
      lab L2
  • 51. P-Code for Control Statements. For the statement
      while ( E ) S
    the following P-code pattern is generated:
      lab L1
      <code to evaluate E>
      fjp L2
      <code for S>
      ujp L1
      lab L2
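These patterns are simple enough to be filled in mechanically. The sketch below does so for the three-address if-else pattern; `gen_if_else_tac` and its parameters are hypothetical names chosen for illustration, and the code fragments for E, S1 and S2 are assumed to be already generated, newline-terminated strings.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the three-address pattern for  if (E) S1 else S2  into buf.
   e_code must leave the value of E in the temporary named e_temp.
   Returns the value returned by snprintf. */
int gen_if_else_tac(char *buf, size_t size,
                    const char *e_code, const char *e_temp,
                    const char *s1_code, const char *s2_code,
                    int else_label, int end_label) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "%s"                       /* <code to evaluate E to t1> */
                    "if_false %s goto L%d\n"   /* skip S1 when E is false    */
                    "%s"                       /* <code for S1>              */
                    "goto L%d\n"               /* skip S2 after S1           */
                    "label L%d\n"
                    "%s"                       /* <code for S2>              */
                    "label L%d\n",
                    e_code, e_temp, else_label,
                    s1_code, end_label,
                    else_label, s2_code, end_label);
}
```

For example, calling it with `"t1 = x < y\n"`, `"t1"`, `"m = x\n"`, `"m = y\n"` and the labels 1 and 2 fills the buffer with the pattern for `if (x < y) m = x; else m = y;`.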
  • 52. 8.4.2 Generation of Labels and Back-patching
  • 53. One feature of code generation for control statements that can cause problems during target code generation is that, in some cases, jumps to a label must be generated prior to the definition of the label itself. A standard method for generating such forward jumps is either to leave a gap in the code where the jump is to occur or to generate a dummy jump instruction to a fake location. Then, when the actual jump location becomes known, this location is used to fix up, or back-patch, the missing code.
  • 54. During the back-patching process a further problem may arise in that many architectures have two varieties of jumps: a short jump or branch (within 128 bytes of code) and a long jump that requires more code space. In that case, a code generator may need to insert nop instructions when shortening jumps, or make several passes to condense the code.
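The dummy-jump variant of back-patching can be sketched in a few lines of C. Everything here is an illustrative assumption rather than part of the original material: the instruction format, the names `emit`, `patch_to_here` and `gen_if_else`, and the use of -1 as the fake target. Each of E, S1 and S2 is represented by a single placeholder instruction.

```c
#include <assert.h>

#define MAX_CODE 64

enum Op { OP_FJP, OP_UJP, OP_OTHER };

typedef struct { enum Op op; int target; } Instr;   /* target -1: not yet known */
typedef struct { Instr code[MAX_CODE]; int count; } CodeBuf;

/* Appends an instruction and returns its index so it can be patched later. */
int emit(CodeBuf *cb, enum Op op, int target) {
    assert(cb->count < MAX_CODE);
    cb->code[cb->count].op = op;
    cb->code[cb->count].target = target;
    return cb->count++;
}

/* Back-patches the dummy jump at index `at` so that it targets the
   location of the next instruction to be emitted. */
void patch_to_here(CodeBuf *cb, int at) {
    assert(at >= 0 && at < cb->count && cb->code[at].target == -1);
    cb->code[at].target = cb->count;
}

/* Generates  if (E) S1 else S2  without symbolic labels. */
void gen_if_else(CodeBuf *cb) {
    int fjp, ujp;
    emit(cb, OP_OTHER, 0);         /* code for E                          */
    fjp = emit(cb, OP_FJP, -1);    /* forward jump, target still unknown  */
    emit(cb, OP_OTHER, 0);         /* code for S1                         */
    ujp = emit(cb, OP_UJP, -1);    /* forward jump past the else part     */
    patch_to_here(cb, fjp);        /* the else part starts here           */
    emit(cb, OP_OTHER, 0);         /* code for S2                         */
    patch_to_here(cb, ujp);        /* the end of the statement is here    */
}
```

Keeping the code in an array until all targets are known is what makes the fix-up cheap; a generator that writes directly to a file would have to seek back or make a second pass.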
  • 55. 8.4.3 Code Generation of Logical Expressions
  • 56. The standard way to compute the value of a Boolean expression is to represent the Boolean value false as 0 and true as 1. Then standard bitwise and and or operators can be used to compute the value of a Boolean expression on most architectures. A further use of jumps is necessary if the logical operations are short circuit. For instance, it is common to write in C:
      if ((p!=NULL) && (p->val==0)) ...
    where evaluation of p->val when p is null could cause a memory fault. Short-circuit Boolean operators are similar to if-statements, except that they return values, and often they are defined using if-expressions as
      a and b ≡ if a then b else false
    and
      a or b ≡ if a then true else b
  • 57. To generate code that ensures that the second subexpression will be evaluated only when necessary, use jumps in exactly the same way as in the code for if-statements. For instance, short-circuit P-code for the C expression (x != 0) && (y == x) is:
      lod x
      ldc 0
      neq
      fjp L1
      lod y
      lod x
      equ
      ujp L2
      lab L1
      lod FALSE
      lab L2
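The jump structure of this P-code can be mirrored one-for-one with C `goto` statements. The function below is a hand translation written for illustration (the name `and_with_jumps` is hypothetical), not the output of any compiler.

```c
#include <assert.h>

/* Computes (x != 0) && (y == x) with the same jumps as the P-code:
   the comparison y == x is reached only when x != 0 is true. */
int and_with_jumps(int x, int y) {
    int result;
    if (!(x != 0)) goto L1;   /* lod x; ldc 0; neq; fjp L1 */
    result = (y == x);        /* lod y; lod x; equ         */
    goto L2;                  /* ujp L2                    */
L1:
    result = 0;               /* lod FALSE                 */
L2:
    return result;
}
```

The function agrees with the built-in `&&` operator for all inputs; for example, it yields 1 for (3, 3) and 0 for both (3, 4) and (0, 0).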
  • 58. 8.4.4 A Sample Code Generation Procedure for If- and While-Statements
  • 59. We exhibit a code generation procedure for control statements using the following simplified grammar:
      stmt -> if-stmt | while-stmt | break | other
      if-stmt -> if ( exp ) stmt | if ( exp ) stmt else stmt
      while-stmt -> while ( exp ) stmt
      exp -> true | false
  • 60. The following C declaration can be used to implement an abstract syntax tree for this grammar:
      typedef enum { ExpKind, IfKind, WhileKind, BreakKind, OtherKind } NodeKind;
      typedef struct streenode {
        NodeKind kind;
        struct streenode * child[3];
        int val; /* used with ExpKind */
      } STreeNode;
      typedef STreeNode * SyntaxTree;
  • 62. Using the given typedefs and the corresponding syntax tree structure, a code generation procedure that generates P-code is given as follows:
      /* label is the exit label of the enclosing while-statement (used by break) */
      void genCode(SyntaxTree t, char *label)
      { char codestr[CODESIZE];
        char *lab1, *lab2;
        if (t != NULL) switch (t->kind)
        { case ExpKind:
            if (t->val == 0) emitCode("ldc false");
            else emitCode("ldc true");
            break;
  • 63.
          case IfKind:
            genCode(t->child[0], label);              /* test expression */
            lab1 = genLabel();
            sprintf(codestr, "%s %s", "fjp", lab1);
            emitCode(codestr);
            genCode(t->child[1], label);              /* then part */
            if (t->child[2] != NULL)
            { lab2 = genLabel();
              sprintf(codestr, "%s %s", "ujp", lab2);
              emitCode(codestr); }
            sprintf(codestr, "%s %s", "lab", lab1);
            emitCode(codestr);
            if (t->child[2] != NULL)
            { genCode(t->child[2], label);            /* else part */
              sprintf(codestr, "%s %s", "lab", lab2);
              emitCode(codestr); }
            break;
  • 64.
          case WhileKind:
            lab1 = genLabel();
            sprintf(codestr, "%s %s", "lab", lab1);
            emitCode(codestr);
            genCode(t->child[0], label);              /* test expression */
            lab2 = genLabel();
            sprintf(codestr, "%s %s", "fjp", lab2);
            emitCode(codestr);
            genCode(t->child[1], lab2);               /* body: a break jumps to lab2 */
            sprintf(codestr, "%s %s", "ujp", lab1);
            emitCode(codestr);
            sprintf(codestr, "%s %s", "lab", lab2);
            emitCode(codestr);
            break;
  • 65.
          case BreakKind:
            sprintf(codestr, "%s %s", "ujp", label);
            emitCode(codestr);
            break;
          case OtherKind:
            emitCode("other");
            break;
          default:
            emitCode("other");
            break;
        }
      }
  • 66. For the statement
      if (true) while (true) if (false) break else other
    the above procedure generates the code sequence
      ldc true
      fjp L1
      lab L2
      ldc true
      fjp L3
      ldc false
      fjp L4
      ujp L3
      ujp L5
      lab L4
      other
      lab L5
      ujp L2
      lab L3
      lab L1
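The procedure relies on two helpers that the slides do not define: one that creates a fresh label and one that emits a line of code. A minimal sketch of both follows, spelled here as `genLabel` and `emitCode`. The label numbering scheme, the in-memory `output` buffer and its size are assumptions made for illustration; a real code generator would more likely write to a file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define OUTSIZE 1024

static char output[OUTSIZE];   /* emitted code, one instruction per line */
static int labelCount = 0;     /* number of labels created so far        */

/* Returns a newly allocated label name: "L1", "L2", ...
   The caller is responsible for freeing it. */
char *genLabel(void) {
    char *lab = (char *)malloc(16);
    assert(lab != NULL);
    snprintf(lab, 16, "L%d", ++labelCount);
    return lab;
}

/* Appends one instruction, followed by a newline, to the output buffer. */
void emitCode(const char *code) {
    size_t used = strlen(output);
    assert(used + strlen(code) + 2 <= OUTSIZE);
    sprintf(output + used, "%s\n", code);
}
```

Numbering labels in order of creation matches the sample output above, where L1 belongs to the outer if-statement and L2 and L3 to the while-statement.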
  • 67. 8.5 Code Generation of Procedure and Function Calls
  • 68. 8.5.1 Intermediate Code for Procedures and Functions
  • 69. The requirements for intermediate code representations of function calls may be described in general terms as follows. First, there are actually two mechanisms that need description: function/procedure definition and function/procedure call. A definition creates a function name, parameters, and code, but the function does not execute at that point. A call creates values for the parameters and performs a jump to the code of the function, which then executes and returns.
  • 70. Intermediate code for a definition must include an instruction marking the beginning, or entry point, of the code for the function, and an instruction marking the ending, or return point, of the function:
      Entry instruction
      <code for the function body>
      Return instruction
    Similarly, a function call must have an instruction indicating the beginning of the computation of the arguments and an actual call instruction that indicates the point where the arguments have been constructed and the actual jump to the code of the function can take place:
      Begin-argument-computation instruction
      <code to compute the arguments>
      Call instruction
  • 71. Three-Address Code for Procedures and Functions. In three-address code, the entry instruction needs to give a name to the procedure entry point, similar to the label instruction; thus, it is a one-address instruction, which we will call simply entry. Similarly, we will call the return instruction return. For example, consider the C function definition
      int f ( int x, int y ) { return x + y + 1; }
    This will translate into the following three-address code:
      entry f
      t1 = x + y
      t2 = t1 + 1
      return t2
  • 72. Three-Address Code for Procedures and Functions. Suppose the function f has been defined in C as in the previous example. Then the call
      f ( 2+3, 4)
    translates to the three-address code
      begin_args
      t1 = 2 + 3
      arg t1
      arg 4
      call f
  • 73. P-Code for Procedures and Functions. The entry instruction in P-code is ent, and the return instruction is ret. Thus the definition of the C function f,
      int f ( int x, int y ) { return x + y + 1; }
    translates into the P-code
      ent f
      lod x
      lod y
      adi
      ldc 1
      adi
      ret
  • 74. P-Code for Procedures and Functions. Our example of a call in C (the call f (2+3, 4) to the function f described previously) now translates into the following P-code:
      mst
      ldc 2
      ldc 3
      adi
      ldc 4
      cup f
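To see that this P-code computes what the C source computes, the stack effects of ldc, lod and adi can be simulated. The sketch below is deliberately simplified and rests on stated assumptions: the instruction encoding and the names `run_pcode`, `f_body` and `call_f_example` are hypothetical, parameters are looked up by index instead of by name, and mst, cup, ent and ret are modeled only by the order of the C calls, not as instructions.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

enum POp { LDC, LOD, ADI };

/* operand: the constant for LDC, the parameter index for LOD, unused for ADI */
typedef struct { enum POp op; int operand; } PInstr;

/* Runs straight-line P-code on an evaluation stack and returns the
   single value left on the stack. */
int run_pcode(const PInstr *code, int n, const int *params) {
    int stack[32];
    int sp = 0, i;
    for (i = 0; i < n; i++) {
        switch (code[i].op) {
        case LDC: assert(sp < 32); stack[sp++] = code[i].operand; break;
        case LOD: assert(sp < 32 && params != NULL);
                  stack[sp++] = params[code[i].operand]; break;
        case ADI: assert(sp >= 2); stack[sp - 2] += stack[sp - 1]; sp--; break;
        }
    }
    assert(sp == 1);
    return stack[0];
}

/* Body of f: lod x; lod y; adi; ldc 1; adi  (x is parameter 0, y is parameter 1) */
const PInstr f_body[] = { {LOD, 0}, {LOD, 1}, {ADI, 0}, {LDC, 1}, {ADI, 0} };

/* Simulates f(2+3, 4): the arguments are computed first, then the body runs. */
int call_f_example(void) {
    const PInstr arg1[] = { {LDC, 2}, {LDC, 3}, {ADI, 0} };   /* ldc 2; ldc 3; adi */
    const PInstr arg2[] = { {LDC, 4} };                       /* ldc 4             */
    int params[2];
    params[0] = run_pcode(arg1, 3, NULL);
    params[1] = run_pcode(arg2, 1, NULL);
    return run_pcode(f_body, 5, params);
}
```

The simulated call returns 10, which agrees with evaluating (2+3) + 4 + 1 in the C definition of f.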
  • 75. 8.5.2 A Code Generation Procedure for Function Definition and Call
  • 76. The grammar we will use is the following:
      program -> decl-list exp
      decl-list -> decl-list decl | ε
      decl -> fn id ( param-list ) = exp
      param-list -> param-list , id | id
      exp -> exp + exp | call | num | id
      call -> id ( arg-list )
      arg-list -> arg-list , exp | exp
    An example of a program as defined by this grammar is
      fn f(x)=2+x
      fn g(x,y)=f(x)+y
      g(3,4)
  • 77. An abstract syntax tree for this grammar can be implemented using the following C declarations:
      typedef enum {PrgK, FnK, ParamK, PlusK, CallK, ConstK, IdK} NodeKind;
      typedef struct streenode {
        NodeKind kind;
        struct streenode *lchild, *rchild, *sibling;
        char * name; /* used with FnK, ParamK, CallK, IdK */
        int val; /* used with ConstK */
      } StreeNode;
      typedef StreeNode * SyntaxTree;
  • 78. Abstract syntax tree for the sample program:
      fn f(x)=2+x
      fn g(x,y)=f(x)+y
      g(3,4)
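Since the tree diagram is not reproduced here, the following sketch builds the subtree for the first declaration, `fn f(x)=2+x`, by hand. It rests on assumptions: `newNode` and `buildF` are hypothetical helpers, the function body is placed in `rchild` because that is the child the code generation procedure for this grammar visits for an FnK node, and placing the parameter list in `lchild` is a guess about the layout of the original figure.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum {PrgK, FnK, ParamK, PlusK, CallK, ConstK, IdK} NodeKind;

typedef struct streenode {
    NodeKind kind;
    struct streenode *lchild, *rchild, *sibling;
    char *name;   /* used with FnK, ParamK, CallK, IdK */
    int val;      /* used with ConstK */
} StreeNode;

typedef StreeNode *SyntaxTree;

/* Hypothetical constructor: allocates a node with no sibling. */
SyntaxTree newNode(NodeKind kind, char *name, int val,
                   SyntaxTree lchild, SyntaxTree rchild) {
    SyntaxTree t = (SyntaxTree)malloc(sizeof(StreeNode));
    assert(t != NULL);
    t->kind = kind;
    t->name = name;
    t->val = val;
    t->lchild = lchild;
    t->rchild = rchild;
    t->sibling = NULL;
    return t;
}

/* Builds the tree for  fn f(x)=2+x  (nodes are never freed in this sketch). */
SyntaxTree buildF(void) {
    SyntaxTree param = newNode(ParamK, "x", 0, NULL, NULL);
    SyntaxTree two   = newNode(ConstK, NULL, 2, NULL, NULL);
    SyntaxTree x     = newNode(IdK, "x", 0, NULL, NULL);
    SyntaxTree body  = newNode(PlusK, NULL, 0, two, x);
    return newNode(FnK, "f", 0, param, body);   /* lchild: parameters (assumed), rchild: body */
}
```

Further declarations such as g would be chained to this node through the `sibling` field, and the program node would hold the declaration list and the final expression g(3,4).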
  • 79. Given this syntax tree structure, a code generation procedure that produces P-code is given in the following:
      void genCode(SyntaxTree t)
      { char codestr[CODESIZE];
        SyntaxTree p;
        if (t != NULL) switch (t->kind)
        { case PrgK:
            p = t->lchild;                    /* list of function declarations */
            while (p != NULL)
            { genCode(p);
              p = p->sibling; }
            genCode(t->rchild);               /* expression of the program */
            break;
  • 80.
          case FnK:
            sprintf(codestr, "%s %s", "ent", t->name);
            emitCode(codestr);
            genCode(t->rchild);               /* function body */
            emitCode("ret");
            break;
          case ConstK:
            sprintf(codestr, "%s %d", "ldc", t->val);
            emitCode(codestr);
            break;
          case PlusK:
            genCode(t->lchild);
            genCode(t->rchild);
            emitCode("adi");
            break;
          case IdK:
            sprintf(codestr, "%s %s", "lod", t->name);
            emitCode(codestr);
            break;
  • 81.
          case CallK:
            emitCode("mst");
            p = t->rchild;                    /* list of arguments */
            while (p != NULL)
            { genCode(p);
              p = p->sibling; }
            sprintf(codestr, "%s %s", "cup", t->name);
            emitCode(codestr);
            break;
          default:
            emitCode("Error");
            break;
        }
      }
  • 82. Given the syntax tree in Figure 8.13, the procedure generates the following code sequence:
      ent f
      ldc 2
      lod x
      adi
      ret
      ent g
      mst
      lod x
      cup f
      lod y
      adi
      ret
      mst
      ldc 3
      ldc 4
      cup g
  • 83. End of Part Two THANKS