PLI Lecture 12
Code optimisation (actually, code improvement)

1. Optimisation opportunities

* Register allocation
  - with a small number of registers (register spilling)
  - with a large number of registers (register allocation)
  - register storage across procedure calls (hardware support?)
* Unnecessary operations
  - common subexpression elimination (programmer- or compiler-generated, e.g., array references)
  - assignments to subsequently unused variables
  - dead (i.e., unreachable) code elimination
  - jump optimisation
* Expensive operations
  - strength reduction
  - constant folding
  - constant propagation
  - procedure inlining (can be specified in C)
  - tail recursion elimination (required in Scheme)
* Machine-specific opportunities
  - instruction selection
  - machine idioms
  - peephole optimisations
* Predicting program behaviour (statistically)
  - frequent paths, procedures, and code blocks
  - very important in modern pipelined processors

2. Classification of optimisations

Optimisation can be applied at every stage of compilation.

* Source-level vs target-level optimisations
* Ordering of optimisations
  - E.g., perform constant folding and propagation before dead code elimination:

        x = 1;
        ...
        y = 0;
        ...
        if (y) x = 0;
        ...
        if (x) y = 1;
        ...

    Once y = 0 is propagated, the condition of "if (y) x = 0;" folds to 0 and the statement is dead, so x is still 1 and "if (x) y = 1;" reduces to "y = 1;" (assuming the elided code assigns neither x nor y).
* Local (basic block) vs global (intraprocedural) vs interprocedural optimisations
  - A basic block is a maximal unbranching code sequence (procedure calls are normally allowed inside it).
* Example of local optimisation
  - Don't reload a value that is known to be in a register.
* Example of global optimisation
  - Store "induction" variables in registers.
  - Move "constant" (loop-invariant) code out of loops.

3. Implementation techniques

* Syntax tree transformation is sometimes useful,
  - e.g., for constant folding and dead code elimination.
* Syntax tree attribution is often useful,
  - e.g., for variable usage counts (to decide which variables to store in registers).
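The ordering example in "Classification of optimisations" and the syntax tree transformation bullet can be made concrete with a small Python sketch. The tree representation is hypothetical, chosen only for illustration: an expression is an int, a variable name, or an `(op, left, right)` tuple, and a statement is `("assign", var, expr)` or `("if", cond, body)`. The sketch treats the `...` lines of the example as code that assigns neither x nor y. It propagates known constants into expressions, folds constant subexpressions, and deletes any `if` whose condition folds to 0.

```python
from typing import Dict, List, Union

# Hypothetical toy syntax tree, chosen for illustration:
#   expression: int constant | str variable name | (op, left, right)
#   statement:  ("assign", var, expr) | ("if", cond, [statements])
Expr = Union[int, str, tuple]
Stmt = tuple

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def fold(e: Expr, env: Dict[str, int]) -> Expr:
    """Constant propagation (known variables -> values) plus constant folding."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env.get(e, e)                 # propagate a known constant
    op, left, right = e
    left, right = fold(left, env), fold(right, env)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)          # fold a constant subexpression
    return (op, left, right)


def assigned_vars(stmts: List[Stmt]) -> set:
    names = set()
    for s in stmts:
        if s[0] == "assign":
            names.add(s[1])
        else:
            names |= assigned_vars(s[2])
    return names


def optimise(stmts: List[Stmt], env: Dict[str, int]) -> List[Stmt]:
    out: List[Stmt] = []
    for s in stmts:
        if s[0] == "assign":
            _, var, expr = s
            expr = fold(expr, env)
            if isinstance(expr, int):
                env[var] = expr              # var now holds a known constant
            else:
                env.pop(var, None)           # var's value is no longer known
            out.append(("assign", var, expr))
        else:
            _, cond, body = s
            cond = fold(cond, env)
            if isinstance(cond, int):
                if cond != 0:                # always true: keep only the body
                    out.extend(optimise(body, env))
                # always false: the whole statement is dead and is dropped
            else:
                # Unknown condition: optimise the body with a copy of env,
                # then forget any variable the body might assign.
                out.append(("if", cond, optimise(body, dict(env))))
                for v in assigned_vars(body):
                    env.pop(v, None)
    return out


# The example from "Ordering of optimisations", with the "..." lines omitted
program = [
    ("assign", "x", 1),
    ("assign", "y", 0),
    ("if", "y", [("assign", "x", 0)]),
    ("if", "x", [("assign", "y", 1)]),
]
print(optimise(program, {}))
```

Running it prints `[('assign', 'x', 1), ('assign', 'y', 0), ('assign', 'y', 1)]`. Running dead code elimination first would find nothing to delete, because neither condition is known to be constant until propagation has run. Whether `y = 0` can also be removed depends on whether the elided code reads y, which this sketch does not check.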
* Flow graphs for global optimisation
  - A flow graph is a directed graph whose nodes correspond to basic blocks and whose edges correspond to possible transfers of control: (conditional) jumps to the start of a basic block, and fall-through from one block to the next.
  - E.g., see Figure 8.18.
  - A flow graph can be constructed in a single pass over the intermediate code.
  - Exercise: Describe how to do this.
* Data flow analysis is the process of collecting information, useful for code optimisation, about how values flow through the program.
* Example of data flow analysis
  - Reaching definitions
    - A definition (of a variable) is an instruction (e.g., assignment, read) that sets the value of the variable.
    - A definition reaches a basic block if there is a path in the flow graph from the definition to the start of the block along which the variable is not redefined, i.e., at the start of the block the variable may still have the value set by the definition.
    - Knowing which definitions reach a given basic block is useful for many optimisations,
      . e.g., for constant propagation.
* To generate code for a basic block, it is useful to first construct a DAG for the block.
* A DAG for a basic block is a directed acyclic graph that describes how the intermediate and output variables of the block depend on its input and intermediate variables (and constants).
  - E.g., see Figures 8.19 and 8.20.
  - Exercise: Describe how to construct a DAG for a basic block.
* A DAG for a basic block can be transformed into target code by "topologically sorting" the DAG.
  - A topological sort of a DAG is an ordering of the nonleaf nodes in the DAG such that every node appears after the nodes for its operands.
  - (Generating topological orderings is a standard algorithm.)
  - Some topological sorts are better than others (e.g., they need fewer registers or fewer loads and stores).
  - E.g., topological sorts of Figure 8.19.
  - E.g., topological sorts of Figure 8.20.
  - Each element of the topological sort corresponds to an instruction in the target code.
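One possible answer to the DAG-construction exercise, followed by code generation in a topological order, is sketched below in Python. Assumptions (not from the lecture): the block is a list of binary three-address instructions `(dest, op, arg1, arg2)`; operator nodes are found again through a table keyed by `(op, left node, right node)`, so a repeated computation reuses its node (commutativity is ignored); and every operator node is evaluated into a fresh temporary `_t1`, `_t2`, ..., with final values copied into the variables at the end of the block so that no input value is overwritten early. Every variable still labelling a node is treated as live at block exit. The emission order is a depth-first postorder from those live nodes, which is one valid topological sort but not necessarily the best one.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, str, str, str]          # (dest, op, arg1, arg2): dest = arg1 op arg2


class Node:
    """A DAG node: a leaf (a name's value on entry) or an operator node."""

    def __init__(self, op: str, kids: Tuple["Node", ...] = (), leaf: str = ""):
        self.op, self.kids, self.leaf = op, kids, leaf
        self.labels: List[str] = []        # variables whose current value is here


def build_dag(block: List[Instr]) -> List[Node]:
    current: Dict[str, Node] = {}          # name -> node holding its current value
    table: Dict[tuple, Node] = {}          # (op, left, right) -> operator node
    nodes: List[Node] = []                 # operator nodes in creation order

    def node_for(name: str) -> Node:
        if name not in current:            # first use: value on entry to block
            current[name] = Node("leaf", leaf=name)
        return current[name]

    for dest, op, a, b in block:
        key = (op, node_for(a), node_for(b))
        if key not in table:               # a new computation
            table[key] = Node(op, key[1:])
            nodes.append(table[key])
        node = table[key]                  # otherwise a common subexpression
        old = current.get(dest)
        if old is not None and dest in old.labels:
            old.labels.remove(dest)        # dest no longer holds the old value
        node.labels.append(dest)
        current[dest] = node
    return nodes


def generate(block: List[Instr]) -> List[str]:
    nodes = build_dag(block)
    temp: Dict[Node, str] = {}
    code: List[str] = []

    def visit(node: Node) -> str:
        """Postorder DFS: operands are emitted before the node that uses them."""
        if node.op == "leaf":
            return node.leaf
        if node not in temp:
            left, right = (visit(kid) for kid in node.kids)
            temp[node] = f"_t{len(temp) + 1}"
            code.append(f"{temp[node]} = {left} {node.op} {right}")
        return temp[node]

    for node in nodes:
        if node.labels:                    # roots: values still named at block exit
            visit(node)
    for node in nodes:
        for var in node.labels:            # store final values at the end of the block
            code.append(f"{var} = {temp[node]}")
    return code


block = [
    ("a", "+", "b", "c"),
    ("d", "-", "a", "e"),
    ("f", "+", "b", "c"),                  # same value as a: reuses its node
    ("a", "*", "d", "f"),                  # overwrites a
]
print("\n".join(generate(block)))
```

For this block, `b + c` is computed once even though it appears twice, and the first assignment to `a` never produces a store into `a` because it is overwritten. These are the common subexpression elimination and redundant assignment elimination effects of using a DAG.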
* Using DAGs for basic blocks in this way:
  - automatically performs common subexpression elimination
  - eliminates redundant assignments
  - enables good register allocation
* Maintenance of register descriptors and address descriptors
  - A register descriptor records the set of variables whose current values are held in the register.
  - An address descriptor records the set of locations (registers and/or memory) where the current value of a given variable is stored.
  - E.g., descriptor maintenance during execution of the basic block DAG in Figure 8.19 (pp. 479-480).
  - This information can be used for register allocation and dead code elimination.

4. Other techniques

* Register usage
  - Allocate dedicated registers for the program counter, global pointer, frame pointer, stack pointer, etc.
  - With a large number of registers, allocate registers for the parameters and local variables of the current procedure call. This may require saving register values in the stack frame on subsequent procedure calls and restoring them on procedure return. Some instruction sets support this (e.g., SPARC register windows).
  - Use the remaining registers for temporaries in expression evaluation. This may require "spilling" registers to temporary locations in the current frame when more registers are required.
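The register and address descriptors from "Implementation techniques" work together with the spilling described under "Register usage". The Python sketch below uses a simplified register-selection heuristic and makes several assumptions: K registers named R0 to R(K-1) with K at least 3, statements of the form x = y op z, a hypothetical assembly syntax (`LD reg, var`, `ST var, reg`, `ADD dest, src1, src2`, ...), and spilled values stored back to the variable's own memory location rather than to a frame temporary. When a register is needed, it prefers an empty one, then one whose variables also exist elsewhere, and otherwise spills.

```python
from typing import Dict, List, Set, Tuple

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}   # hypothetical mnemonics


class Descriptors:
    """Register and address descriptors for one basic block (a sketch)."""

    def __init__(self, k: int):
        # register descriptor: register -> variables whose current value it holds
        self.reg: Dict[str, Set[str]] = {f"R{i}": set() for i in range(k)}
        # address descriptor: variable -> locations of its current value
        # (register names, or the variable's own name for its memory home)
        self.addr: Dict[str, Set[str]] = {}
        self.code: List[str] = []

    def _locs(self, v: str) -> Set[str]:
        return self.addr.setdefault(v, {v})      # on block entry v is in memory

    def _free(self, r: str) -> None:
        for v in self.reg[r]:
            self.addr[v].discard(r)
        self.reg[r] = set()

    def get_reg(self, avoid: Tuple[str, ...] = ()) -> str:
        candidates = [r for r in self.reg if r not in avoid]
        for r in candidates:                     # 1. an empty register
            if not self.reg[r]:
                return r
        for r in candidates:                     # 2. its values also live elsewhere
            if all(len(self.addr[v]) > 1 for v in self.reg[r]):
                self._free(r)
                return r
        r = candidates[0]                        # 3. spill to memory
        for v in sorted(self.reg[r]):
            if len(self.addr[v]) == 1:
                self.code.append(f"ST {v}, {r}")
                self.addr[v].add(v)
        self._free(r)
        return r

    def load(self, v: str, avoid: Tuple[str, ...] = ()) -> str:
        for loc in sorted(self._locs(v)):
            if loc in self.reg:                  # already in a register: no reload
                return loc
        r = self.get_reg(avoid)
        self.code.append(f"LD {r}, {v}")
        self.reg[r] = {v}
        self.addr[v].add(r)
        return r

    def assign(self, x: str, y: str, op: str, z: str) -> None:
        """Generate code for x = y op z and update both descriptors."""
        ry = self.load(y)
        rz = self.load(z, avoid=(ry,))
        rx = self.get_reg(avoid=(ry, rz))
        self.code.append(f"{OPCODES[op]} {rx}, {ry}, {rz}")
        for loc in self.addr.get(x, set()):      # older copies of x are now stale
            if loc in self.reg:
                self.reg[loc].discard(x)
        self.reg[rx] = {x}
        self.addr[x] = {rx}                      # only rx holds the new value

    def end_block(self, live: Set[str]) -> None:
        for v in sorted(live):                   # store live values not in memory
            locs = self.addr.get(v, {v})
            if v not in locs:
                self.code.append(f"ST {v}, {sorted(locs)[0]}")
                locs.add(v)


desc = Descriptors(3)
desc.assign("a", "b", "+", "c")    # a = b + c
desc.assign("d", "a", "-", "b")    # d = a - b: a and b are already in registers
desc.end_block({"a", "d"})
print("\n".join(desc.code))
```

In the second statement, the address descriptor shows `a` in R2 and `b` in R0, so no loads are emitted (the local optimisation "don't reload a value known to be in a register"). R1 can be reused without a store because `c` still has its memory copy.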
* Example: Expression evaluation with register spilling
  - First compute the number of registers required to evaluate each node of an expression tree (children before parents):

        void expRegs(SyntaxTree t)
            if (t is a constant)
                t.nregs = 0
            else if (t is a variable)
                t.nregs = 1
            else if (t has one child, u)
                expRegs(u)
                t.nregs = max(1, u.nregs)
            else if (t has two children, u and v)
                expRegs(u)
                expRegs(v)
                if (u.nregs = v.nregs)
                    t.nregs = 1 + u.nregs
                else
                    t.nregs = max(u.nregs, v.nregs)

  - Then generate code to evaluate an expression tree into register r, using registers r to K-1 (say):

        void genCode(SyntaxTree t, int r)
            if (t is a constant)                    // e.g., 1
                emitCode("r = t.value")
            else if (t is a variable)               // e.g., x
                emitCode("r = t.name")
            else if (t has one child, u)            // e.g., -2
                genCode(u, r)
                emitCode("r = t.op r")
            else if (t has two children, u and v)
                if (u.nregs < v.nregs)              // e.g., (u+1) * ((x+y)*(2+z))
                    exchange u and v                // evaluate the more complex subexpression first
                genCode(u, r)
                if (v is a constant or variable)    // e.g., 2*(x+y) + z
                    emitCode("r = r t.op v")
                else if (v.nregs >= K-r)            // too few registers left to evaluate v
                    addr tmp = spill(r)
                    genCode(v, r)
                    emitCode("r+1 = tmp")
                    emitCode("r = r+1 t.op r")
                else                                // e.g., (2*(x+y)) * (3*(u-v))
                    genCode(v, r+1)
                    emitCode("r = r t.op r+1")

  - Here, spill(r) emits code to push the contents of register r onto the stack and returns the stack slot's offset from the frame pointer.
  - Exercise: Trace the behaviour of these functions on some examples, e.g., x*(y+z*2*(x-y)-u)+3 with only 2 registers available.
  - Exercise: Prove that the two functions are correct, or find a counterexample which demonstrates that they are wrong!
  - Additional complications arise if we consider whether or not operators are commutative ("+" is, "-" is not), whether "reverse operations" exist, and so on.
  - Exercise: Modify function genCode() to work correctly on noncommutative operations such as subtraction and division.
  - Exercise: Suppose a node in the expression tree is a function call. How do the two functions need to be extended or modified to handle this?
    . Hint: Either spill all registers on function entry, or assume each function call requires all K registers (and let genCode() spill registers normally).
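The two functions translate fairly directly into Python, which makes the tracing exercise easy to check by machine. This sketch assumes a hypothetical `Tree` class, registers named R0 to R(K-1) with K at least 2, a `spill` that saves a register into a fresh frame slot named `tmp1`, `tmp2`, ..., and the spill test `v.nregs >= K - r` (spill whenever v needs at least as many registers as remain from r upward). Like genCode(), it swaps operands freely, so after a swap it is only correct for commutative operators; the noncommutative exercise is left open. The small interpreter `run` is included only to check the emitted code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tree:
    op: str                           # "const", "var", or an operator such as "+"
    kids: List["Tree"] = field(default_factory=list)
    value: str = ""                   # constant text or variable name (leaves only)
    nregs: int = 0


def leaf(s: str) -> Tree:
    """Hypothetical helper: numerals become constants, anything else a variable."""
    return Tree("const" if s.lstrip("-").isdigit() else "var", value=s)


def n(op: str, *kids: Tree) -> Tree:
    return Tree(op, list(kids))


def exp_regs(t: Tree) -> int:
    """expRegs(): label each node with its register need, children first."""
    for kid in t.kids:
        exp_regs(kid)
    if t.op == "const":
        t.nregs = 0
    elif t.op == "var":
        t.nregs = 1
    elif len(t.kids) == 1:
        t.nregs = max(1, t.kids[0].nregs)
    else:
        u, v = t.kids
        t.nregs = 1 + u.nregs if u.nregs == v.nregs else max(u.nregs, v.nregs)
    return t.nregs


def gen_code(root: Tree, k: int) -> List[str]:
    """genCode(): leave root's value in R0 using registers R0..R(k-1), k >= 2."""
    code: List[str] = []
    ntemps = 0

    def spill(r: int) -> str:         # save Rr in a fresh frame slot
        nonlocal ntemps
        ntemps += 1
        code.append(f"tmp{ntemps} = R{r}")
        return f"tmp{ntemps}"

    def gen(t: Tree, r: int) -> None:
        if t.op in ("const", "var"):
            code.append(f"R{r} = {t.value}")
        elif len(t.kids) == 1:
            gen(t.kids[0], r)
            code.append(f"R{r} = {t.op} R{r}")
        else:
            u, v = t.kids
            if u.nregs < v.nregs:     # more complex operand first; this swap is
                u, v = v, u           # only safe for commutative operators
            gen(u, r)
            if v.op in ("const", "var"):
                code.append(f"R{r} = R{r} {t.op} {v.value}")
            elif v.nregs >= k - r:    # too few registers left for v: spill u
                tmp = spill(r)
                gen(v, r)
                code.append(f"R{r + 1} = {tmp}")
                code.append(f"R{r} = R{r + 1} {t.op} R{r}")
            else:
                gen(v, r + 1)
                code.append(f"R{r} = R{r} {t.op} R{r + 1}")

    exp_regs(root)
    gen(root, 0)
    return code


OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def run(code: List[str], env: Dict[str, int]) -> int:
    """Testing aid: interpret the emitted code and return the value in R0."""
    store = dict(env)

    def val(s: str) -> int:
        return store[s] if s in store else int(s)

    for line in code:
        dst, rhs = line.split(" = ")
        parts = rhs.split()
        if len(parts) == 1:
            store[dst] = val(parts[0])
        elif len(parts) == 2:         # unary minus is the only unary operator here
            store[dst] = -val(parts[1])
        else:
            store[dst] = OPS[parts[1]](val(parts[0]), val(parts[2]))
    return store["R0"]


# x*(y+z*2*(x-y)-u)+3, the expression from the first exercise
e = n("+", n("*", leaf("x"),
             n("-", n("+", leaf("y"),
                      n("*", n("*", leaf("z"), leaf("2")),
                             n("-", leaf("x"), leaf("y")))),
                    leaf("u"))),
       leaf("3"))
code = gen_code(e, 2)
print("\n".join(code))
print(run(code, {"x": 2, "y": 3, "z": 4, "u": 5}))   # -17
```

For the exercise expression with K = 2, no spill is needed: the output is nine instructions using only R0 and R1, and every swap happens at a "+" or "*" node. A product of four sums such as ((a+b)*(c+d))*((e+f)*(g+h)) does force spills with K = 2, including one where v needs more registers than remain, which is the case the `>=` test covers.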