PLI Lecture 11

Code generation II

4. Control statements and logical expressions

4.1 Control statements

Control statements such as if- and while-statements are
translated by a form of pattern matching.  The patterns below
translate to three-address code.

If-statements:

if (E) S1 else S2

<code to evaluate E to t1>
if_false t1 goto L1
<code for S1>
goto L2
label L1
<code for S2>
label L2

Exercise: Modify this pattern for: if (E) S1 

Switch statements:

switch (E) {
  case V1: S1
  case V2: S2
  ...
  default: Sn
}

(a) If the number of labels V1, V2, ... is small, treat the
switch statement as an if-statement, and test the case labels
sequentially.

(b) If the range of V1, V2, ... is small, store the addresses
of the code for statements S1, S2, ... in an array indexed
by V1, V2, ..., and execute indirect jumps through this array.

(c) If the number of labels V1, V2, ... is large, store the
addresses of the code for statements S1, S2, ... in a hash
table, using the labels as keys, and execute indirect jumps
through this table.
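
Strategy (b) can be sketched in C roughly as follows.  This is
an illustration only: it assumes the case labels are exactly
0, 1 and 2, it uses function pointers to stand in for code
addresses (so the "indirect jump" becomes an indirect call),
and the names s1, s2, s3, sDefault, jumpTable and dispatch are
all hypothetical.

```c
#include <stddef.h>

/* Hypothetical handlers standing in for the code of S1, S2, S3
   and of the default statement Sn. */
static int s1(void) { return 10; }
static int s2(void) { return 20; }
static int s3(void) { return 30; }
static int sDefault(void) { return -1; }

/* Table of code addresses, indexed by case label.
   Assumption: the labels are the dense range 0..2. */
typedef int (*Handler)(void);
static const Handler jumpTable[] = { s1, s2, s3 };

/* Range check first, then one indirect transfer through the table. */
int dispatch(int e) {
  size_t n = sizeof jumpTable / sizeof jumpTable[0];
  if (e < 0 || (size_t)e >= n) return sDefault();
  return jumpTable[e]();
}
```

The cost of a dispatch is one range check plus one indexed
load, independent of the number of cases, which is why this
form suits a small range of labels.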

Iteration statements:

while (E) S

label L1
<code to evaluate E to t1>
if_false t1 goto L2
<code for S>
goto L1
label L2

Exercise: Modify this pattern to avoid the unconditional jump
inside the loop.

Exercise: Modify this pattern for: do S while (E)

Exercise: Modify this pattern for: for (i=i0; i < in; i++) S

Exercise: How are break and continue statements executed?

4.2 Forward jumps

How can we generate forward goto statements when we do not 
yet know the (relative) address of the destination?

(a) Leave it to the assembler.

(b) Keep track of the relative address of instructions as
attributes in the structure tree, compute the destination
as an inherited attribute, and/or pass it as an argument
to code generation routines.

(c) For each target label, keep a list of the instructions
that jump to the label.  When the address of the label is
determined, "backpatch" the instructions in the list to 
refer to the label.  This list may be "threaded" through
the code.  (This requires keeping the generated instructions
in a main memory buffer, or using a temporary file, 
which requires an extra pass over the generated code.)
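
Option (c) might be sketched in C as below.  The sketch makes
several assumptions that are not part of these notes: code is
kept in a fixed-size memory buffer, an instruction is reduced
to a jump flag and a target field, and the pending list is
threaded through the target fields of the unresolved jumps.
All names (Instr, Label, emitJump, emitOther, placeLabel) are
hypothetical.

```c
#include <assert.h>

#define MAXCODE 256
#define UNKNOWN (-1)

/* An instruction; only the target field matters in this sketch. */
typedef struct { int isJump; int target; } Instr;

static Instr code[MAXCODE];  /* generated code, kept in memory */
static int nextInstr = 0;    /* address of the next instruction */

/* A label: its address once known, plus the head of the list of
   pending jumps, threaded through their target fields. */
typedef struct { int address; int pending; } Label;

void initLabel(Label *l) { l->address = UNKNOWN; l->pending = UNKNOWN; }

/* Emit a jump to l and return the address of the jump. */
int emitJump(Label *l) {
  assert(nextInstr < MAXCODE);
  int here = nextInstr++;
  code[here].isJump = 1;
  if (l->address != UNKNOWN) {
    code[here].target = l->address;  /* backward jump: target known */
  } else {
    code[here].target = l->pending;  /* link into the pending list */
    l->pending = here;
  }
  return here;
}

/* Emit any non-jump instruction and return its address. */
int emitOther(void) {
  assert(nextInstr < MAXCODE);
  code[nextInstr].isJump = 0;
  return nextInstr++;
}

/* Place l at the current address and backpatch all pending jumps. */
void placeLabel(Label *l) {
  l->address = nextInstr;
  int i = l->pending;
  while (i != UNKNOWN) {
    int next = code[i].target;       /* follow the thread first */
    code[i].target = l->address;     /* then overwrite with the address */
    i = next;
  }
  l->pending = UNKNOWN;
}
```

Threading the list through the target fields means no extra
storage is needed per pending jump; the price is that the code
must stay accessible until every label has been placed.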

4.3 Boolean expressions

Boolean expressions, particularly when used in control
statements, should not require boolean operations on the values
true and false.  Rather, jumps should be used to implement
the "short circuit" (or "sequential") evaluation of boolean
operations that most modern languages specify.  That is,

a and b  = if a then b else false
a or b   = if a then true else b

For example, 

if ((i < 0 || i > j) && (done || n > 500)) S1 else S2

should be translated into something like this:

t1 = i < 0
if_true t1 goto L2
t2 = i > j
if_false t2 goto L3
label L2
t3 = done
if_true t3 goto L4
t4 = n > 500
if_false t4 goto L3
label L4
<code for S1>
goto L5
label L3
<code for S2>
label L5

Note that no boolean operations are required.
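
As a check on the jump code, it can be transcribed into C using
goto and compared with the direct boolean expression.  This is
only a sketch: the function names jumpCode and direct are
hypothetical, and the variable taken stands in for the choice
between S1 and S2.

```c
#include <stdbool.h>

/* Transcription of the jump code: yields 1 if control reaches S1
   and 2 if it reaches S2.  No && or || operation is used. */
int jumpCode(int i, int j, bool done, int n) {
  bool t1, t2, t3, t4;
  int taken = 0;
  t1 = i < 0;
  if (t1) goto L2;
  t2 = i > j;
  if (!t2) goto L3;
L2:
  t3 = done;
  if (t3) goto L4;
  t4 = n > 500;
  if (!t4) goto L3;
L4:
  taken = 1;            /* stands in for S1 */
  goto L5;
L3:
  taken = 2;            /* stands in for S2 */
L5:
  return taken;
}

/* The same decision written with C's own short-circuit operators. */
int direct(int i, int j, bool done, int n) {
  return ((i < 0 || i > j) && (done || n > 500)) ? 1 : 2;
}
```

For any arguments, jumpCode and direct should select the same
branch, since C's && and || are themselves short-circuit
operators.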

To generate such code, there are two options:

(a) Compute the following attributes for each boolean
expression in the structure tree:

label = label of this expression
trueLabel = target label if expression is true
falseLabel = target label if expression is false
next = true/false if label following expression is
       trueLabel/falseLabel
       
Exercise: Show the values of these attributes on an
attributed structure tree for the above control
statement.
            
Exercise: Write an attribute grammar to evaluate these
attributes for the grammar:

stmt -> if (exp) stmt else stmt | other
exp -> exp || exp | exp && exp | not exp | id | true | false
            
Given these attributes, code is generated as follows:
A conditional jump instruction is emitted following the
code to evaluate each operand that contains no more boolean
operators.  The target of the jump is the label that does
not immediately follow the operand, and the condition is
chosen accordingly.

(b) Alternatively, we may pass the attributes trueLabel,
falseLabel and next as arguments to a standard recursive code
generation procedure

void genBoolCode(SyntaxTree E, trueLabel, falseLabel, next)

and propagate the label and next values in successive
procedure calls.
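
One way such a procedure might look is sketched below.  It
rests on assumptions that are not in these notes: the tree
type BoolNode has only and-, or- and leaf nodes, a leaf carries
the name of an already computed operand, output is collected in
a string buffer, and a fresh label M1, M2, ... is emitted
between the two operands of every operator even when no jump
refers to it.  All names other than genBoolCode, trueLabel,
falseLabel and next are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { AndK, OrK, LeafK } BoolKind;

typedef struct BoolNode {
  BoolKind kind;
  const char *name;               /* operand name, LeafK only */
  struct BoolNode *left, *right;  /* operands, AndK and OrK only */
} BoolNode;

/* Which of the two target labels immediately follows the code. */
typedef enum { NextFalse, NextTrue } Next;

static char out[1024];      /* emitted code, kept for inspection */
static int labelCount = 0;  /* counter for fresh labels M1, M2, ... */

static void emit(const char *fmt, ...) {
  size_t used = strlen(out);
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(out + used, sizeof out - used, fmt, ap);
  va_end(ap);
}

void genBoolCode(const BoolNode *e, const char *trueLabel,
                 const char *falseLabel, Next next) {
  char mid[16];
  assert(e != NULL);
  switch (e->kind) {
    case LeafK:
      /* Jump to the label that does NOT follow this operand. */
      if (next == NextTrue)
        emit("if_false %s goto %s\n", e->name, falseLabel);
      else
        emit("if_true %s goto %s\n", e->name, trueLabel);
      break;
    case AndK:
      /* Left true: fall through to the right operand. */
      snprintf(mid, sizeof mid, "M%d", ++labelCount);
      genBoolCode(e->left, mid, falseLabel, NextTrue);
      emit("label %s\n", mid);
      genBoolCode(e->right, trueLabel, falseLabel, next);
      break;
    case OrK:
      /* Left false: fall through to the right operand. */
      snprintf(mid, sizeof mid, "M%d", ++labelCount);
      genBoolCode(e->left, trueLabel, mid, NextFalse);
      emit("label %s\n", mid);
      genBoolCode(e->right, trueLabel, falseLabel, next);
      break;
  }
}
```

The caller is expected to emit the code for the statement
labelled trueLabel or falseLabel directly after the call,
according to the value of next that it passed.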

4.4 A simple code generator for control statements

Here we show a simple code generator that translates control
statements generated by the following grammar into P-code:

stmt -> if-stmt | while-stmt | break | other
if-stmt -> if (exp) stmt | if (exp) stmt else stmt
while-stmt -> while (exp) stmt
exp -> id | true | false

The assumed tree structure is the natural one.  The parameter
label is the destination of a break statement, that is, the
exit label of the enclosing while-statement.

void genCode(SyntaxTree t, char *label) {
  char codestr[MAXLINELENGTH+1];
  char *lab1, *lab2;
  if (t != NULL) 
    switch (t->kind) {
      case ExpKind:
        /* exp is a boolean constant here: load its value */
        if (t->val == 0)  emitCode("ldc false");
        else emitCode("ldc true");
        break;
      case IfKind:
        genCode(t->child[0], label);
        lab1 = genLabel();
        sprintf(codestr, "fjp %s", lab1);
        emitCode(codestr);
        genCode(t->child[1], label);
        if (t->child[2] != NULL) {
          lab2 = genLabel();
          sprintf(codestr, "ujp %s", lab2);
          emitCode(codestr);
        }
        sprintf(codestr, "lab %s", lab1);
        emitCode(codestr);
        if (t->child[2] != NULL) {
          genCode(t->child[2],label);
          sprintf(codestr, "lab %s", lab2);
          emitCode(codestr);
        }
        break;
      case WhileKind:
        lab1 = genLabel();
        sprintf(codestr, "lab %s", lab1);
        emitCode(codestr);
        genCode(t->child[0], label);
        lab2 = genLabel();
        sprintf(codestr, "fjp %s", lab2);
        emitCode(codestr);
        genCode(t->child[1], lab2);
        sprintf(codestr, "ujp %s", lab1);
        emitCode(codestr);
        sprintf(codestr, "lab %s", lab2);
        emitCode(codestr);
        break;
      case BreakKind:
        sprintf(codestr, "ujp %s", label);
        emitCode(codestr);
        break;
      case OtherKind:
        emitCode("Other");
        break;
      default:
        emitCode("Error");
        break;
    }
}

Exercise: Extend the grammar and code generation procedure
to include "continue" statements.

Exercise: Modify this procedure to generate three-address 
code.

Exercise: Extend the grammar to include boolean operators  
and extend either code generation procedure to generate code
for the extended language (using short circuit evaluation).

5. Procedure and function calls

This is very dependent on details of the intermediate and
target languages.

5.1 Intermediate code

Intermediate code for a function definition has the form:

Entry instruction
<code for the function body>
Return instruction

Intermediate code for a function call has the form:

Begin argument computation instruction
<code to compute the arguments>
Call instruction

The four bracketing instructions depend on the number, size
and location of the parameters (stack or registers), the
size of the stack frame, the size of the local variable and
temporary value space, and the size and organisation of the
bookkeeping information.  In particular, it is necessary
to pass and save the return address (which follows the call
instruction) and to push a new frame onto the stack 
(updating the frame and stack pointers).  It is also necessary
to pass any return value from the called function to the
calling function (on the stack or through a register).
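
One possible division of this work between the call, entry and
return instructions is sketched below in C.  The frame layout
used here (saved return address, then the caller's frame
pointer, then locals and temporaries) is an assumption chosen
for illustration, argument and return value passing are left
out, and the names callInstr, entryInstr and returnInstr are
hypothetical.

```c
#include <assert.h>

#define STACKSIZE 64

static int stack[STACKSIZE];
static int sp = 0;  /* index of the first free stack slot */
static int fp = 0;  /* index of the first local of the current frame */

/* "call": save the return address and the caller's frame pointer,
   then start the new frame. */
void callInstr(int returnAddress) {
  assert(sp + 2 <= STACKSIZE);
  stack[sp++] = returnAddress;
  stack[sp++] = fp;
  fp = sp;
}

/* "entry": reserve space for local variables and temporaries. */
void entryInstr(int localSize) {
  assert(localSize >= 0 && sp + localSize <= STACKSIZE);
  sp += localSize;
}

/* "return": pop the frame and hand back the saved return address. */
int returnInstr(void) {
  sp = fp;             /* discard locals and temporaries */
  fp = stack[--sp];    /* restore the caller's frame pointer */
  return stack[--sp];  /* saved return address */
}
```

After a matching returnInstr, sp and fp have the values they
had before the corresponding callInstr, so calls may be nested
to any depth that fits in the stack.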

In intermediate code, we may ignore many details of how 
this is all done.  For example, the function definition

int f(int x, int y) { return x+y+1; }

may be translated into the following three-address code:

entry f
t1 = x + y
t2 = t1 + 1
return t2

A corresponding call

f(2+3, 4)

may be translated into the three-address code:

begin_args
t1 = 2 + 3
arg t1
arg 4
call f

(The order of listing these arguments may depend on the source
language.)

The instructions "entry f" and "call f" are jointly responsible
for pushing a new frame onto the stack, and the instruction
"return t2" is responsible for popping a frame from the stack.

5.2 A simple code generator for function definitions and calls

We consider the following minimal grammar:

program -> dec-list exp
dec-list -> dec-list dec | ε
dec -> fn id ( par-list ) = exp
par-list -> par-list , id | id
exp -> exp + exp | call | num | id
call -> id ( arg-list )
arg-list -> arg-list , exp | exp

A possible program in the language generated by this grammar is:

fn f(x) = 2+x
fn g(x,y) = f(x) + y
g(3,4)

The assumed tree structure is the natural one (see Figure 8.13).

Exercise: Draw the structure tree corresponding to the above
program.

We now give a simple code generator that translates programs in
this language into P-code (Figure 8.14).

void genCode(SyntaxTree t) {
  char codestr[MAXLINELENGTH+1];
  SyntaxTree p;
  if (t != NULL) 
    switch (t->kind) {
      case PrgK: // program
        p = t->lchild;
        while (p != NULL) {
          genCode(p);       // declaration
          p = p->sibling;
        }
        genCode(t->rchild); // expression
        break;
      case FnK: // declaration
        sprintf(codestr, "ent %s", t->name);
        emitCode(codestr);  // entry instruction
        genCode(t->rchild); // body
        emitCode("ret");    // return
        break;
      case ParamK: // parameter (never reached)
        break;
      case ConstK: // constant
        sprintf(codestr, "ldc %d", t->value);
        emitCode(codestr);
        break;
      case PlusK: // sum
        genCode(t->lchild);
        genCode(t->rchild);
        emitCode("adi");
        break;
      case IdK: // identifier
        sprintf(codestr, "lod %s", t->name);
        emitCode(codestr);
        break;
      case CallK: // call
        emitCode("mst"); // begin argument computation
        p = t->rchild;
        while (p != NULL) {
          genCode(p);    // argument
          p = p->sibling;
        }
        sprintf(codestr, "cup %s", t->name);
        emitCode(codestr); // call instruction
        break;
      default:
        emitCode("Error");
        break;
    }
}

Exercise: Modify this procedure to generate three-address 
code.

Exercise: Modify this procedure to generate code for the
following let-expression language:

program -> let dec-list in exp
dec-list -> dec-list , dec | dec
dec -> id = exp | fn id ( par-list ) = exp
exp -> exp + exp | call | num | id
call -> id ( arg-list )
arg-list -> arg-list , exp | exp

Assume the language uses static scope rules.

(Compare with the let-expression language of lecture 7.)

6. Register allocation

Storing subexpression values in registers when possible...

7. The TINY code generator

Review the definitions of the TINY language and TM, the simple
target machine.

Study the code generator in files code.h, code.c, cgen.h and
cgen.c.

Consider code generators that translate either of the above two
expression languages with function definitions into TM.

Exercise: Allocate two registers for a stack pointer and a frame
pointer and propose a detailed run-time representation for the
language using TM.

Exercise: Modify the above code generation procedure to generate
TM code using this run-time representation.