Translating C to LC-3: Functions

Stack frame format

Every call of every function has its own stack frame or activation record. The stack frame contains storage for all the local variables and parameters of the function. Additionally, the stack frame can be used for temporary storage of intermediate results in expression evaluation, where push and pop operations store and remove values from the stack. Finally, the stack frame stores bookkeeping information needed to return to the calling procedure.

By convention, on the LC-3, register R5 contains the address of the base of the stack frame. All parameters and local variables are addressed at offsets from R5 as specified in the function symbol table.

The bookkeeping information is always stored at fixed offsets from R5. There are three pieces of bookkeeping information, which are listed with their offsets in the following table.

use	offset
saved dynamic link	1
return address	2
return value	3

The dynamic link is the address of the stack frame of the calling function. In particular, it is the saved value of R5 from the calling function. All of the active stack frames are linked through these fields.

The return address is the address of the instruction within the calling program to which this function should branch when it completes. In particular, it is the saved value of the program counter (PC) at the time of the call.

The return value is where the result of this function will be saved. Eventually, it will be the "argument" of the return statement.

The parameters passed to the function are stored on the stack before the bookkeeping information. The offset of the first parameter will be 4; the offset of the second will be 5.

The local variables of the function are stored on the stack after the bookkeeping information. The offset of the first local variable will be 0; the offset of the second will be -1.
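
The layout can be summarized in a few lines of C. This is only a sketch: the names `param_offset` and `local_offset` are made up for illustration and are not part of any LC-3 tool, and it assumes that parameters and locals are numbered starting from 1.

```c
#include <assert.h>

/* Fixed bookkeeping offsets from R5, as listed in the table. */
enum {
    OFF_DYNAMIC_LINK   = 1,
    OFF_RETURN_ADDRESS = 2,
    OFF_RETURN_VALUE   = 3
};

/* Offset of the i-th parameter (i = 1, 2, ...): 4, 5, 6, ... */
int param_offset(int i) {
    assert(i >= 1);
    return OFF_RETURN_VALUE + i;
}

/* Offset of the j-th local variable (j = 1, 2, ...): 0, -1, -2, ... */
int local_offset(int j) {
    assert(j >= 1);
    return 1 - j;
}
```

Parameters sit above the bookkeeping slots (positive offsets past 3) and locals sit below them (offset 0 and downward), so a single base register reaches everything in the frame.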

Called function

Function prologue

On entry to the function, all of its arguments have been placed on the top of the stack. The i'th argument will be stored at offset i-1 from register R6, so the first argument is at offset 0. R5 will point to the address of the activation record of the calling procedure.

In common terminology, R5 is called the dynamic link and R6 is called the stack pointer. R7 will contain the address of the instruction that follows the calling instruction. This is the return address.

In the first five instructions of the function, the return address and dynamic link are stored on the stack. One stack slot is also set aside to hold the return value. Then the dynamic link is updated to point to the first local variable. Finally, the stack pointer is adjusted to allocate space for local variables.

If v is the number of local variables used by the function, the first five instructions are as follows:

FPROLOG ADD   R6,R6,#-3      ;; Allocate stack space for "bookkeeping"
        STR   R7,R6,#1       ;; Store return address of caller
        STR   R5,R6,#0       ;; Store dynamic link of caller
        ADD   R5,R6,#-1      ;; Point dynamic link to first local
        ADD   R6,R6,#(-v)    ;; Point stack pointer to last local

Notice that only one instruction depends on the number of local variables used by the function, and that none depend on the number of parameters.
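
To see what the five instructions do to the registers, here is a minimal C sketch that replays them on a simulated memory array. The `Machine` structure, the 64-word memory, and the function name `prologue` are assumptions made for illustration; only the five commented steps come from the prologue code itself.

```c
#include <assert.h>

#define MEM_WORDS 64   /* hypothetical size of the simulated memory */

/* Hypothetical model of memory and the three registers involved. */
typedef struct {
    int mem[MEM_WORDS];
    int r5;   /* base of the current stack frame */
    int r6;   /* stack pointer */
    int r7;   /* return address */
} Machine;

/* Replays the five prologue instructions; v is the number of locals. */
void prologue(Machine *m, int v) {
    assert(v >= 0 && m->r6 - 3 - v >= 0);
    m->r6 = m->r6 - 3;            /* ADD R6,R6,#-3   bookkeeping slots   */
    m->mem[m->r6 + 1] = m->r7;    /* STR R7,R6,#1    save return address */
    m->mem[m->r6 + 0] = m->r5;    /* STR R5,R6,#0    save dynamic link   */
    m->r5 = m->r6 - 1;            /* ADD R5,R6,#-1   R5 = first local    */
    m->r6 = m->r6 - v;            /* ADD R6,R6,#(-v) R6 = last local     */
}
```

Starting with R6 = 40 and v = 4, for example, the sketch leaves R5 = 36 and R6 = 33, and the first argument (still at address 40) is reachable at offset 4 from R5.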

Inside the function

All variables, including function parameters, can be accessed using the offsets of the function symbol table.

Function epilogue

When the function exits, the return value must be placed on the stack on top of the function's arguments. The return value slot is located at offset 3 from R5.

The dynamic link and return address must also be restored from the stack. R6, the stack pointer, must also be set to point to the return value, which is then the top of the stack that the calling function sees.

Assuming that R0 contains the return value, here are five instructions that will return from a function. This particular sequence of code does not depend on the number of parameters or local variables.

FEPILOG STR   R0,R5,#3       ;; Store return value
        ADD   R6,R5,#3       ;; Point stack pointer to return value
        LDR   R7,R5,#2       ;; Restore return address
        LDR   R5,R5,#1       ;; Restore dynamic link
        RET                  ;; Return
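
The same kind of C sketch can replay the epilogue. As before, the `Machine` structure and the name `epilogue` are hypothetical; the RET itself is not modeled, since in the sketch control simply returns to the C caller.

```c
#include <assert.h>

#define MEM_WORDS 64   /* hypothetical size of the simulated memory */

/* Hypothetical model of memory and the three registers involved. */
typedef struct {
    int mem[MEM_WORDS];
    int r5;   /* base of the current stack frame */
    int r6;   /* stack pointer */
    int r7;   /* return address */
} Machine;

/* Replays the epilogue; r0 holds the value being returned. */
void epilogue(Machine *m, int r0) {
    assert(m->r5 >= 0 && m->r5 + 3 < MEM_WORDS);
    m->mem[m->r5 + 3] = r0;       /* STR R0,R5,#3  store return value     */
    m->r6 = m->r5 + 3;            /* ADD R6,R5,#3  R6 = return value slot */
    m->r7 = m->mem[m->r5 + 2];    /* LDR R7,R5,#2  restore return address */
    m->r5 = m->mem[m->r5 + 1];    /* LDR R5,R5,#1  restore dynamic link   */
    /* RET would now jump to the address held in r7. */
}
```

The order matters: R5 is overwritten last, because the three earlier steps all address the frame through it.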

Calling function

Function invocation

Before the function is called, all its arguments must be evaluated. Let's assume this has happened and that the arguments are stored in made-up variables α₁ to α_n.

The function invocation must then place the n arguments on the stack. The arguments are placed on the stack from last to first. Letting i go from n down to 1, push the arguments using code similar to the following:

        ADD   R6,R6,#-1               ;; Make room for one word (stack grows down)
        LDR   R0,R5,offset for α_i    ;; R0 = α_i
        STR   R0,R6,#0                ;; Place α_i on top of the stack
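
Here is a small C sketch of the push loop, assuming the stack grows toward lower addresses as it does in the function prologue (which allocates with ADD R6,R6,#-3). The function name `push_args` and the array-based stack are made up for illustration.

```c
#include <assert.h>

/* Pushes args[0..n-1] (argument 1 is args[0]) from last to first.
   sp plays the role of R6; the new stack pointer is returned. */
int push_args(int mem[], int sp, const int args[], int n) {
    assert(n >= 0 && sp >= n);
    for (int i = n; i >= 1; i--) {
        sp -= 1;                  /* make room for one word        */
        mem[sp] = args[i - 1];    /* store argument i on the stack */
    }
    return sp;
}
```

Because the last argument is pushed first, argument 1 ends up on top of the stack, and argument i sits at offset i-1 from the stack pointer.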

Control can now be transferred to the called function with a JSR or, more likely, a JSRR instruction.

Function return

Function return is pretty simple. The calling function will transfer the return value into a register and then "remove" the return value and the n arguments from the stack by adding n+1 to the stack pointer.

        LDR   R0,R6,#0       ;; Place return value in R0
        ADD   R6,R6,#(n+1)   ;; Remove return value and n arguments
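
In the same spirit, the caller's side of the return can be sketched in C. The name `take_result` and the array-based stack are hypothetical; `sp` stands in for R6.

```c
#include <assert.h>

/* Reads the return value from the top of the stack, then removes it
   together with the n arguments that lie directly above it. */
int take_result(const int mem[], int *sp, int n) {
    assert(n >= 0);
    int r0 = mem[*sp];    /* LDR R0,R6,#0     */
    *sp += n + 1;         /* ADD R6,R6,#(n+1) */
    return r0;
}
```

Nothing is erased from memory; the words are "removed" only in the sense that the stack pointer no longer covers them.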

An example

Let's look at an example, a recursive implementation of Euclid's GCD algorithm.

Elegant C implementation

int GCD(int n, int m) {
  if (n==m)
    return n ;
  else if (n<m)
    return GCD(m, n) ;
  else
    return GCD(n-m, m) ;
}

Awkward C implementation

int GCD(int n, int m) {
  int r, t ;
  t = n-m ;
  if (t==0)
    r = n ;
  else {
    int a1, a2 ;
    if (t<0) {
      a1 = m ;
      a2 = n ;
    } else {
      a1 = t ;
      a2 = m ;
    }
    r = GCD(a1, a2) ;
  }
  return r ;
}
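
A quick way to convince yourself that the awkward version computes the same thing as the elegant one is to compile both under different names and compare them. The names `gcd_elegant`, `gcd_awkward`, and `gcd_versions_agree` are made up for this sketch. Both versions are assumed to be called with positive arguments only, since the subtraction-based recursion does not terminate otherwise.

```c
#include <assert.h>

/* The elegant version, renamed. */
int gcd_elegant(int n, int m) {
    assert(n > 0 && m > 0);
    if (n == m)
        return n;
    else if (n < m)
        return gcd_elegant(m, n);
    else
        return gcd_elegant(n - m, m);
}

/* The awkward version, renamed: one statement per LC-3 step. */
int gcd_awkward(int n, int m) {
    int r, t;
    assert(n > 0 && m > 0);
    t = n - m;
    if (t == 0)
        r = n;
    else {
        int a1, a2;
        if (t < 0) {
            a1 = m;
            a2 = n;
        } else {
            a1 = t;
            a2 = m;
        }
        r = gcd_awkward(a1, a2);
    }
    return r;
}

/* Returns 1 if both versions agree for all 1 <= n, m <= limit. */
int gcd_versions_agree(int limit) {
    for (int n = 1; n <= limit; n++)
        for (int m = 1; m <= limit; m++)
            if (gcd_elegant(n, m) != gcd_awkward(n, m))
                return 0;
    return 1;
}
```

The awkward version exists only because every one of its statements maps onto a short, fixed sequence of LC-3 instructions.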

Symbol table

use	offset
`a2`	-3
`a1`	-2
`t`	-1
`r`	0
dynamic link	1
return address	2
return value	3
`n`	4
`m`	5
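
The symbol table can be exercised in C before looking at any assembly. The sketch below keeps the stack in an array and addresses every variable as an offset from a simulated R5, following the prologue, calling, and epilogue conventions described earlier. Everything in it (the 512-word stack, the `OFF_` names, `gcd_sim`, `gcd_via_frames`) is a made-up illustration, and the return-address slot is reserved but never used because the C call stack handles the actual return.

```c
#include <assert.h>

#define STACK_WORDS 512            /* hypothetical simulated stack size */

static int mem[STACK_WORDS];       /* simulated memory                  */
static int r5, r6;                 /* frame base and stack pointer      */

/* Offsets from R5, copied from the symbol table. */
enum { OFF_A2 = -3, OFF_A1 = -2, OFF_T = -1, OFF_R = 0,
       OFF_DYN = 1, OFF_RETVAL = 3, OFF_N = 4, OFF_M = 5 };

static void gcd_sim(void) {
    /* Prologue: three bookkeeping slots, then four locals. */
    assert(r6 >= 7);
    r6 -= 3;
    mem[r6] = r5;                  /* save caller's dynamic link */
    r5 = r6 - 1;
    r6 -= 4;

    mem[r5 + OFF_T] = mem[r5 + OFF_N] - mem[r5 + OFF_M];
    if (mem[r5 + OFF_T] == 0) {
        mem[r5 + OFF_R] = mem[r5 + OFF_N];
    } else {
        if (mem[r5 + OFF_T] < 0) {
            mem[r5 + OFF_A1] = mem[r5 + OFF_M];
            mem[r5 + OFF_A2] = mem[r5 + OFF_N];
        } else {
            mem[r5 + OFF_A1] = mem[r5 + OFF_T];
            mem[r5 + OFF_A2] = mem[r5 + OFF_M];
        }
        assert(r6 >= 2);
        mem[--r6] = mem[r5 + OFF_A2];   /* push a2 first    */
        mem[--r6] = mem[r5 + OFF_A1];   /* then a1          */
        gcd_sim();                      /* stands in for JSR */
        mem[r5 + OFF_R] = mem[r6];      /* r = return value */
        r6 += 3;                        /* drop value + 2 arguments */
    }

    /* Epilogue: publish the result and restore the caller's frame. */
    mem[r5 + OFF_RETVAL] = mem[r5 + OFF_R];
    r6 = r5 + OFF_RETVAL;
    r5 = mem[r5 + OFF_DYN];
}

/* Plays the role of the outermost caller. */
int gcd_via_frames(int n, int m) {
    assert(n > 0 && m > 0);
    r5 = 0;
    r6 = STACK_WORDS;
    mem[--r6] = m;
    mem[--r6] = n;
    gcd_sim();
    int result = mem[r6];
    r6 += 3;
    assert(r6 == STACK_WORDS);     /* stack is balanced again */
    return result;
}
```

Each frame takes nine words here (two arguments, three bookkeeping slots, four locals), so the 512-word stack limits how deep the recursion can go before the assertions fire.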

LC-3 code

        .ORIG        x4B00

;; Prologue -- CREATE THE ACTIVATION RECORD
GCD     ADD     R6,R6,#-3      ;; Allocate stack space for "bookkeeping"
        STR     R7,R6,#1       ;; Store caller return address
        STR     R5,R6,#0       ;; Store caller dynamic link
        ADD     R5,R6,#-1      ;; Set R5 to first local
        ADD     R6,R6,#-4      ;; Set R6 to last local (#locals is 4)

;; t = n-m ;
        LDR     R0,R5,#4       ;; R0 = n
        LDR     R1,R5,#5       ;; R1 = m
        NOT     R1,R1
        ADD     R1,R1,#1
        ADD     R0,R0,R1       ;; R0 = n-m
        STR     R0,R5,#-1      ;; t = R0

;; if (t==0)
        LDR     R0,R5,#-1      ;; R0 = t
        BRnp    ELSE1

;;   r = n ;
        LDR     R0,R5,#4       ;; R0 = n
        STR     R0,R5,#0       ;; r = R0

        BRnzp   JOIN1
;;   else {
ELSE1

;;    if (t<0) {
        LDR     R0,R5,#-1      ;; R0 = t
        BRzp    ELSE2

;;      a1 = m ;
        LDR     R0,R5,#5       ;; R0 = m
        STR     R0,R5,#-2      ;; a1 = R0

;;      a2 = n ;
        LDR     R0,R5,#4       ;; R0 = n
        STR     R0,R5,#-3      ;; a2 = R0

        BRnzp   JOIN2
;;    } else {
ELSE2

;;      a1 = t ;
        LDR     R0,R5,#-1      ;; R0 = t
        STR     R0,R5,#-2      ;; a1 = R0

;;      a2 = m ;
        LDR     R0,R5,#5       ;; R0 = m
        STR     R0,R5,#-3      ;; a2 = R0

JOIN2
;;    }

;;    r = GCD(a1, a2) ;
        ADD     R6,R6,#-1      ;; Put a2 on call stack
        LDR     R0,R5,#-3
        STR     R0,R6,#0
        ADD     R6,R6,#-1      ;; Put a1 on call stack
        LDR     R0,R5,#-2
        STR     R0,R6,#0
        JSR     GCD            ;; Recurse!

        LDR     R0,R6,#0       ;; R0 = return value
        ADD     R6,R6,#3       ;; Remove parameters and return value
                               ;; #parameters is 2
        STR     R0,R5,#0       ;; r = R0

JOIN1
;;  }

;;  return r ;
        LDR     R0,R5,#0       ;; R0 = r
        STR     R0,R5,#3       ;; return value = R0
       
;; Epilogue -- REMOVE THE ACTIVATION RECORD
        ADD     R6,R5,#3       ;; Point stack pointer to return value
        LDR     R7,R5,#2       ;; Restore return address
        LDR     R5,R5,#1       ;; Restore dynamic link
        RET                    ;; Return

        .END

Notice that this function is implemented in 43 instructions. Ten of these instructions are used for the function prologue and epilogue or placing the return value on the stack. The "overhead" of function entry and return must be paid by every function.

The function call itself requires seven instructions to prepare the stack for the call and another three to restore the stack after the call. The price paid by the calling function is ten instructions per call. More precisely, the price is 4+3*n, where n is the number of passed parameters.
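
The arithmetic behind that formula is short enough to write down as a C sketch. The function name `caller_cost` is made up, and the breakdown assumes the push and return sequences shown above: three instructions per argument, one JSR, and three instructions after the call.

```c
#include <assert.h>

/* Instructions the caller spends on one call with n parameters. */
int caller_cost(int n) {
    assert(n >= 0);
    int push_cost   = 3 * n;   /* ADD, LDR, STR for each argument     */
    int jump_cost   = 1;       /* JSR or JSRR                         */
    int return_cost = 3;       /* load result, pop stack, store result */
    return push_cost + jump_cost + return_cost;
}
```

For the two-parameter GCD call this gives 6 + 1 + 3 = 10, matching the count in the listing.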

Intel code

Let's see how a real compiler translates. Start with the following C routine.

int WhoCares(double V, int N) ;

int Abs(double *X, double *Y, int N) {
  int M, R ;
  M = N ;
  if (M < 0)
    M = -M ;
  R = WhoCares(*X * *Y, N) ;
  return R ;
}

Without optimization, here is the code generated by gcc on a 32-bit Intel platform. The comments have been added by the instructor. I'm afraid gcc generates code using "AT&T syntax" which differs significantly from Intel's assembler syntax.

        .file   "progC.c"
        .text                           ; this section for compiled code
.globl Abs
        .type   Abs, @function          ; Abs is a global function
Abs:
        pushl   %ebp                    ; push the base pointer on the stack
        movl    %esp, %ebp              ; move stack pointer to base pointer
        subl    $40, %esp               ; allocate 40 bytes on stack for frame
        movl    16(%ebp), %eax          ; move N to register AX
        movl    %eax, -8(%ebp)          ; move AX to M
        cmpl    $0, -8(%ebp)            ; compare M with 0
        jns     .L2                     ; jump if no sign
        negl    -8(%ebp)                ; negate M, M = -M
.L2:
        movl    8(%ebp), %eax           ; move X to AX
        fldl    (%eax)                  ; push *X on floating point stack
        movl    12(%ebp), %eax          ; move Y to AX
        fldl    (%eax)                  ; push *Y on floating point stack
        fmulp   %st, %st(1)             ; floating point multiply
        movl    16(%ebp), %eax          ; move N to AX
        movl    %eax, 8(%esp)           ; place N on stack
        fstpl   (%esp)                  ; place *X * *Y on stack
        call    WhoCares                ; place return address on stack
                                        ; and branch to WhoCares
        movl    %eax, -4(%ebp)          ; move AX to R
        movl    -4(%ebp), %eax          ; move R to AX (returned value)
        leave                           ; copy base pointer to stack pointer
                                        ; and restore base pointer from stack
        ret                             ; return using address on stack
        .size   Abs, .-Abs
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-48)"
        .section     .note.GNU-stack,"",@progbits