PLI Lecture 11
Code generation II

4. Control statements and logical expressions

4.1 Control statements

Control statements such as if- and while-statements are translated to three-address code by a form of pattern matching.

If-statements:

    if (E) S1 else S2

    [code to evaluate E into t1]
    if_false t1 goto L1
    [code for S1]
    goto L2
    label L1
    [code for S2]
    label L2

Exercise: Modify this pattern for: if (E) S1

Switch statements:

    switch (E) {
      case V1: S1
      case V2: S2
      ...
      default: Sn
    }

(a) If the number of labels V1, V2, ... is small, treat the switch statement as an if-statement, and test the case labels sequentially.

(b) If the range of V1, V2, ... is small, store the addresses of the code for statements S1, S2, ... in an array indexed by V1, V2, ..., and execute indirect jumps through this array.

(c) If the number of labels V1, V2, ... is large, store the addresses of the code for statements S1, S2, ... in a hash table, using the labels as keys, and execute indirect jumps through this table.

Iteration statements:

    while (E) S

    label L1
    [code to evaluate E into t1]
    if_false t1 goto L2
    [code for S]
    goto L1
    label L2

Exercise: Modify this pattern to avoid the unconditional jump inside the loop.

Exercise: Modify this pattern for: do S while (E)

Exercise: Modify this pattern for: for (i=i0; i < in; i++) S

Exercise: How are break and continue statements executed?

4.2 Forward jumps

How can we generate forward goto statements when we do not yet know the (relative) address of the destination?

(a) Leave it to the assembler.

(b) Keep track of the relative addresses of instructions as attributes in the structure tree, compute the destination as an inherited attribute, and/or pass it as an argument to the code generation routines.

(c) For each target label, keep a list of the instructions that jump to the label. When the address of the label is determined, "backpatch" the instructions in the list to refer to the label. This list may be "threaded" through the code.
    (This requires keeping the generated instructions in a main memory buffer, or using a temporary file, which requires an extra pass over the generated code.)

4.3 Boolean expressions

Boolean expressions, particularly when used in control statements, should not require boolean operations on the values true and false. Rather, jumps should be used to implement the "short circuit" (or "sequential") evaluation of boolean operations found in most modern languages. That is:

    a and b  =  if a then b else false
    a or b   =  if a then true else b

For example,

    if ((i < 0 || i > j) && (done || n > 500)) S1 else S2

should be translated into something like this:

    t1 = i < 0
    if_true t1 goto L2
    t2 = i > j
    if_false t2 goto L3
    label L2
    t3 = done
    if_true t3 goto L4
    t4 = n > 500
    if_false t4 goto L3
    label L4
    [code for S1]
    goto L5
    label L3
    [code for S2]
    label L5

Note that no boolean operations are required.

To generate such code, there are two options:

(a) Compute the following attributes for each boolean expression in the structure tree:

    label      = label of this expression
    trueLabel  = target label if the expression is true
    falseLabel = target label if the expression is false
    next       = true/false if the label following the expression is trueLabel/falseLabel

Exercise: Show the values of these attributes on an attributed structure tree for the above control statement.

Exercise: Write an attribute grammar to evaluate these attributes for the grammar:

    stmt -> if (exp) stmt else stmt | other
    exp  -> exp || exp | exp && exp | not exp | id | true | false

Given these attributes, code is generated as follows: a conditional jump instruction is emitted after the code that evaluates each operand containing no further boolean operators. The target of the jump is whichever of trueLabel and falseLabel does not immediately follow the operand, and the condition (if_true or if_false) is chosen accordingly.
(b) Alternatively, we may pass the last three of these attributes as arguments to a standard recursive code generation procedure

    void genBoolCode(SyntaxTree E, trueLabel, falseLabel, next)

and propagate the label and next values in successive procedure calls.

4.4 A simple code generator for control statements

Here we show a simple code generator that translates control statements generated by the following grammar into P-code:

    stmt       -> if-stmt | while-stmt | break | other
    if-stmt    -> if (exp) stmt | if (exp) stmt else stmt
    while-stmt -> while (exp) stmt
    exp        -> id | true | false

The assumed tree structure is the natural one. Here, label is the destination of a break statement.

    void genCode(SyntaxTree t, char *label)
    {
        char codestr[MAXLINELENGTH+1];
        char *lab1, *lab2;

        if (t != NULL) switch (t->kind) {
        case ExpKind:                          // only true and false are handled
            if (t->val == 0) emitCode("ldc false");
            else emitCode("ldc true");
            break;
        case IfKind:
            genCode(t->child[0], label);       // condition
            lab1 = genLabel();
            sprintf(codestr, "fjp %s", lab1);  // if false, skip the then-part
            emitCode(codestr);
            genCode(t->child[1], label);       // then-part
            if (t->child[2] != NULL) {
                lab2 = genLabel();
                sprintf(codestr, "ujp %s", lab2);  // skip the else-part
                emitCode(codestr);
            }
            sprintf(codestr, "lab %s", lab1);
            emitCode(codestr);
            if (t->child[2] != NULL) {
                genCode(t->child[2], label);   // else-part
                sprintf(codestr, "lab %s", lab2);
                emitCode(codestr);
            }
            break;
        case WhileKind:
            lab1 = genLabel();
            sprintf(codestr, "lab %s", lab1);  // top of the loop
            emitCode(codestr);
            genCode(t->child[0], label);       // condition
            lab2 = genLabel();
            sprintf(codestr, "fjp %s", lab2);  // if false, leave the loop
            emitCode(codestr);
            genCode(t->child[1], lab2);        // body; break now jumps to lab2
            sprintf(codestr, "ujp %s", lab1);
            emitCode(codestr);
            sprintf(codestr, "lab %s", lab2);
            emitCode(codestr);
            break;
        case BreakKind:
            sprintf(codestr, "ujp %s", label); // jump out of the enclosing loop
            emitCode(codestr);
            break;
        case OtherKind:
            emitCode("Other");
            break;
        default:
            emitCode("Error");
            break;
        }
    }

Exercise: Extend the grammar and code generation procedure to include "continue" statements.

Exercise: Modify this procedure to generate three-address code.
Exercise: Extend the grammar to include boolean operators and extend either code generation procedure to generate code for the extended language (using short circuit evaluation).

5. Procedure and function calls

The translation of calls depends heavily on the details of the intermediate and target languages.

5.1 Intermediate code

Intermediate code for a function definition has the form:

    Entry instruction
    [code for the function body]
    Return instruction

Intermediate code for a function call has the form:

    Begin argument computation instruction
    [code to compute the arguments]
    Call instruction

The four bracketing instructions depend on the number, size and location of the parameters (stack or registers), the size of the stack frame, the size of the local variable and temporary value space, and the size and organisation of the bookkeeping information.

In particular, it is necessary to pass and save the return address (the address of the instruction that follows the call instruction) and to push a new frame onto the stack (updating the frame and stack pointers). It is also necessary to pass any return value from the called function to the calling function (on the stack or through a register).

In intermediate code, we may ignore many details of how this is all done. For example, the function definition

    int f(int x, int y) { return x+y+1; }

may be translated into the following three-address code:

    entry f
    t1 = x + y
    t2 = t1 + 1
    return t2

A corresponding call f(2+3, 4) may be translated into the three-address code:

    begin_args
    t1 = 2 + 3
    arg t1
    arg 4
    call f

(The order of listing these arguments may depend on the source language.)

The instructions "entry f" and "call f" are jointly responsible for pushing a new frame onto the stack, and the instruction "return t2" is responsible for popping a frame from the stack.
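The three-address call sequence shown for f(2+3, 4) can be produced by a small recursive routine. The following is only a minimal sketch under stated assumptions, not the code generator of the lecture: the Expr type and the names gen_expr, emit, out and temp_count are hypothetical, arguments are listed left to right (just one of the possible orders), and since the notes do not show how the value returned by a call is named, the sketch leaves that operand empty.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OPERAND_MAX 32   /* assumed maximum length of an operand name */

/* Hypothetical expression tree: a constant, a sum, or a call. */
typedef enum { CONST, PLUS, CALL } Kind;

typedef struct Expr {
    Kind kind;
    int value;                  /* CONST: the value */
    struct Expr *left, *right;  /* PLUS: the two operands */
    const char *name;           /* CALL: name of the called function */
    struct Expr *args[4];       /* CALL: arguments, left to right */
    int nargs;                  /* CALL: number of arguments */
} Expr;

static char out[1024];          /* generated code, one instruction per line */
static int temp_count = 0;      /* number of temporaries allocated so far */

/* Append one instruction to the output buffer. */
static void emit(const char *s)
{
    assert(strlen(out) + strlen(s) + 2 <= sizeof out);
    strcat(out, s);
    strcat(out, "\n");
}

/* Emit code for e; result receives the operand that holds its value. */
static void gen_expr(const Expr *e, char *result)
{
    char a[OPERAND_MAX], b[OPERAND_MAX], line[128];

    assert(e != NULL);
    switch (e->kind) {
    case CONST:                 /* a constant is its own operand */
        snprintf(result, OPERAND_MAX, "%d", e->value);
        break;
    case PLUS:                  /* evaluate both operands, then add */
        gen_expr(e->left, a);
        gen_expr(e->right, b);
        snprintf(result, OPERAND_MAX, "t%d", ++temp_count);
        snprintf(line, sizeof line, "%s = %s + %s", result, a, b);
        emit(line);
        break;
    case CALL:                  /* begin_args, one arg per argument, call */
        emit("begin_args");
        for (int i = 0; i < e->nargs; i++) {
            gen_expr(e->args[i], a);
            snprintf(line, sizeof line, "arg %s", a);
            emit(line);
        }
        snprintf(line, sizeof line, "call %s", e->name);
        emit(line);
        result[0] = '\0';       /* naming of the return value is left open */
        break;
    }
}
```

Building the tree for f(2+3, 4) and calling gen_expr on it leaves the five instructions begin_args, t1 = 2 + 3, arg t1, arg 4 and call f in out. Emitting begin_args before any argument code mirrors the bracketing described above: everything between begin_args and call belongs to the argument computation.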
5.2 A simple code generator for function definitions and calls

We consider the following minimal grammar:

    program  -> dec-list exp
    dec-list -> dec-list dec | ε
    dec      -> fn id ( par-list ) = exp
    par-list -> par-list , id | id
    exp      -> exp + exp | call | num | id
    call     -> id ( arg-list )
    arg-list -> arg-list , exp | exp

A possible program in the language generated by this grammar is:

    fn f(x) = 2+x
    fn g(x,y) = f(x) + y
    g(3,4)

The assumed tree structure is the natural one (see Figure 8.13).

Exercise: Draw the structure tree corresponding to the above program.

We now give a simple code generator that translates programs in this language into P-code (Figure 8.14).

    void genCode(SyntaxTree t)
    {
        char codestr[MAXLINELENGTH+1];
        SyntaxTree p;

        if (t != NULL) switch (t->kind) {
        case PrgK:                      // program
            p = t->lchild;
            while (p != NULL) {
                genCode(p);             // declaration
                p = p->sibling;
            }
            genCode(t->rchild);         // expression
            break;
        case FnK:                       // declaration
            sprintf(codestr, "ent %s", t->name);
            emitCode(codestr);          // entry instruction
            genCode(t->rchild);         // body
            emitCode("ret");            // return
            break;
        case ParamK:                    // parameter (never reached)
            break;
        case ConstK:                    // constant
            sprintf(codestr, "ldc %d", t->value);
            emitCode(codestr);
            break;
        case PlusK:                     // sum
            genCode(t->lchild);
            genCode(t->rchild);
            emitCode("adi");
            break;
        case IdK:                       // identifier
            sprintf(codestr, "lod %s", t->name);
            emitCode(codestr);
            break;
        case CallK:                     // call
            emitCode("mst");            // begin argument computation
            p = t->rchild;
            while (p != NULL) {
                genCode(p);             // argument
                p = p->sibling;
            }
            sprintf(codestr, "cup %s", t->name);
            emitCode(codestr);          // call instruction
            break;
        default:
            emitCode("Error");
            break;
        }
    }

Exercise: Modify this procedure to generate three-address code.
Exercise: Modify this procedure to generate code for the following let-expression language:

    program  -> let dec-list in exp
    dec-list -> dec-list , dec | dec
    dec      -> id = exp | fn id ( par-list ) = exp
    exp      -> exp + exp | call | num | id
    call     -> id ( arg-list )
    arg-list -> arg-list , exp | exp

Assume the language uses static scope rules. (Compare with the let-expression language of lecture 7.)

6. Register allocation

Storing subexpression values in registers when possible...

7. The TINY code generator

Review the definitions of the TINY language and of TM, the simple target machine. Study the code generator in the files code.h, code.c, cgen.h and cgen.c.

Consider code generators that translate either of the above two expression languages with function definitions into TM.

Exercise: Allocate two registers for a stack pointer and a frame pointer and propose a detailed run-time representation for the language using TM.

Exercise: Modify the above code generation procedure to generate TM code using this run-time representation.