Translating C to LC-3, Vol. 1

Storing variables

The textbook's sketchy rules for variable storage on the LC-3 are found in Sections 12.5 to 12.6.3 (pp. 326-336).

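Variables are stored according to their duration. In the examples below, static variables are addressed through R4 and automatic variables through R5.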

Variable size

The amount of space allocated to a variable will depend on the computer architecture.

C type      LC-3    Intel
char        1       1
int         1       4
type *      1       1
char[10]    10      10
int[10]     10      40
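
The sizes on a particular machine can be checked with sizeof. The following is a minimal sketch, assuming a hosted C compiler. The function name print_sizes is made up for illustration. sizeof counts in units of char, so the printed values match the Intel column only on a platform whose types have those sizes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print the sizes of the types in the table above. */
void print_sizes(void) {
  /* These two relations hold on every conforming C implementation. */
  assert(sizeof(char) == 1);
  assert(sizeof(int[10]) == 10 * sizeof(int));

  printf("char     %zu\n", sizeof(char));
  printf("int      %zu\n", sizeof(int));
  printf("int *    %zu\n", sizeof(int *));
  printf("char[10] %zu\n", sizeof(char[10]));
  printf("int[10]  %zu\n", sizeof(int[10]));
}
```

Calling print_sizes() from main prints one line per type. The remaining values depend on the compiler and target.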

Symbol table

Pointless function

int A[100] ;

int addEm(int startHere, int endHere) {
  int i, sum ;
  sum = 0 ;
  for (i=startHere; i<endHere; ++i)
    sum += A[i]++ ;
  return sum ;
}

Pointless function symbol table

variable    duration    offset  size
A           static      15      100
endHere     automatic   4       1
i           automatic   0       1
startHere   automatic   5       1
sum         automatic   -1      1

The real world

The textbook's way of handling global variables isn't really feasible. In the real world, every function (or compilation unit) has a table (or a list) with an entry for each global variable it accesses. The compiler just allocates space for these entries and provides a way for the linker to associate named global variables with them. The linker places the address of each global variable into its entry.

Effectively, every global variable reference is similar to a pointer reference.
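
The idea can be imitated in plain C. This is only a sketch under stated assumptions: the names global_table, GT_A, link_globals, and read_A are made up, and the table is filled in at run time by an ordinary function, whereas a real linker fills it in before the program runs.

```c
#include <assert.h>
#include <stddef.h>

int A[100];                       /* the global from the example above */

/* Hypothetical per-unit table: one slot per global the unit touches. */
enum { GT_A, GT_COUNT };
static void *global_table[GT_COUNT];

/* Stand-in for the linker: put each global's address into its slot. */
void link_globals(void) {
  global_table[GT_A] = A;
}

/* Every access to A goes through the table, like a pointer dereference. */
int read_A(int i) {
  int *base = (int *)global_table[GT_A];
  assert(base != NULL);           /* the table must be filled in first */
  return base[i];
}
```

After link_globals() has run, read_A(3) returns the same value as A[3], at the cost of one extra memory read to fetch the address.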

Variable access

Reading the value (rvalue) of a variable

sum         LDR  R0,R5,#-1
endHere     LDR  R0,R5,#4
A[3]        LDR  R0,R4,#18
A           ADD  R0,R4,#15

Note that the "value" of A is really the address of its first element, because an array name used in an expression evaluates to the address of the array's first element.

If a variable's offset from its data pointer (R4 or R5) falls outside the range -32 to 31 that the 6-bit offset field of LDR can hold, three instructions are required to access its value. In the following example, the automatic variable x has an offset of -50.

        LD   R0,xOFFSET           ;; R0  :=  -50
        ADD  R0,R0,R5             ;; R0  :=  &x
        LDR  R0,R0,#0             ;; R0  :=  x
; ........
xOFFSET .FILL   #-50

Array indexes also require multiple instructions.

        ADD  R0,R4,#15            ;; R0  :=  A
        LDR  R1,R5,#0             ;; R1  :=  i
        ADD  R0,R0,R1             ;; R0  :=  A+i or &A[i]
        LDR  R0,R0,#0             ;; R0  :=  A[i]
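
The four instructions can be traced in C by modelling LC-3 memory as an array of words. This is a sketch for illustration only: the array mem and the function load_A_i are made-up names, and the registers are passed in as ordinary parameters.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t mem[1 << 16];     /* LC-3 memory: 65536 words */

/* Hypothetical helper mirroring the four-instruction sequence above.
   r4 is the global data pointer, r5 is the frame pointer. */
uint16_t load_A_i(uint16_t r4, uint16_t r5) {
  uint16_t r0, r1;
  r0 = (uint16_t)(r4 + 15);       /* ADD R0,R4,#15 : R0 := A     */
  r1 = mem[r5];                   /* LDR R1,R5,#0  : R1 := i     */
  r0 = (uint16_t)(r0 + r1);       /* ADD R0,R0,R1  : R0 := &A[i] */
  r0 = mem[r0];                   /* LDR R0,R0,#0  : R0 := A[i]  */
  return r0;
}
```

With r4 = x5000 and i = 7 stored at mem[r5], the value comes from mem[x5000 + 15 + 7]. No multiplication is needed because each int occupies one word.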

Obtaining the address (lvalue) of a variable

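&sum        ADD  R0,R5,#-1
&endHere    ADD  R0,R5,#4
&A[3]       ADD  R0,R4,#15     ;; 18 exceeds the ADD immediate limit of 15,
            ADD  R0,R0,#3      ;; so the offset is added in two steps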

There is no separate entry for &A. In C it evaluates to the same address as A itself.

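To obtain the address of an array element, or of a variable too far from its data pointer, just omit the last instruction from the two code sequences shown above. Note that the immediate field of ADD only covers -16 to 15, so the long form is needed sooner for addresses than it is for loads.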

Dereferencing a variable

If we want to execute a C statement such as "*p = v ;", we need to first get p and v into registers. Let's say p is in R2 and v is in R3. Then the C statement can be accomplished with the instruction "STR  R3,R2,#0".

If you wanted to go in the other direction, i.e., "v = *p ;", then use "LDR  R3,R2,#0".

Expression evaluation

Expression evaluation is the process of translating expressions, such as the following, into LC-3 instructions.

sum += A[i]++ ;
eolist = ( p == NULL || *p == '\0' ) ;

Problem with the textbook "method"

Good expression evaluation is hard to do. The textbook's approach is very simple and very incomplete. In Section 10.3 (pp. 264-272), expression evaluation is done with a push-down stack, so a C statement like "x = y + x*y" would be implemented with something like the following:

        LD    R0,y
        JSR   PUSH
        LD    R0,x
        JSR   PUSH
        LD    R0,y
        JSR   PUSH        ;; stack has [y][x][y]
        JSR   OpMult      ;; stack has [y][x*y]
        JSR   OpAdd       ;; stack has [y+x*y]
        JSR   POP         ;; R0 is [y+x*y]
        ST    R0,x        ;; x := y+x*y

This works fine for simple statements, but it just won't do for C. For example in evaluating A[i]++, you can't put the value of A[i] on the stack and then call something like OpPlusPlus, because ++ needs the address of A[i], not the value of A[i].

Similarly for something like p == NULL || *p == '\0', you can't put p == NULL and *p == '\0' on the stack and then call OpLogicalOR, because you shouldn't even attempt the evaluation of *p == '\0' when p == NULL is true.
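
The short-circuit rule is easy to observe in C itself. In this sketch, the counter deref_count and the helper is_nul are made-up names that stand in for the right operand so that its evaluations can be counted.

```c
#include <assert.h>
#include <stddef.h>

static int deref_count = 0;       /* how many times *p was evaluated */

/* Hypothetical helper standing in for the right operand *p == '\0'. */
static int is_nul(const char *p) {
  ++deref_count;
  return *p == '\0';
}

/* The expression from the text. C guarantees that the right operand
   of || is skipped when the left operand is true. */
int at_end_of_list(const char *p) {
  return p == NULL || is_nul(p);
}
```

After at_end_of_list(NULL), a check such as assert(deref_count == 0) holds: the null pointer was never dereferenced. A stack-based OpLogicalOR would have evaluated both operands before being called.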

The hack solution

Instead of searching for an automatic solution to expression evaluation, try an ad hoc approach where you translate a complex expression into a sequence of simple assignments where at most one operator appears on the right-hand side. You'll need to use made-up variable names to do this. You'll also sometimes need the & operator when an lvalue is expected.

Here are examples of suitable sequences for the earlier C statements. The made-up variable names use Greek letters.

/* sum += A[i]++ ; */
int *α ;
int  β ;
α = &A[i] ;
β = *α ;
*α = β + 1 ;
sum = sum + β ;

/* eolist = ( p == NULL || *p == '\0' ) ; */
int α ;
int β ;
α = p == NULL ;
if (α != 0) goto ω ;
β  = *p ;
α = β == '\0' ;
ω :
if (α != 0) α = 1 ;
eolist = α ;
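
A sequence like this can be checked by compiling it next to the original statement. Here is a minimal sketch for the first statement, with ASCII names t_addr and t_val standing in for the Greek letters. The function names step_direct and step_simple are made up. Note that the increment is applied to the array element through the pointer, not to the pointer itself.

```c
#include <assert.h>

int A[100];
int sum;

/* The original statement, for comparison. */
void step_direct(int i) {
  sum += A[i]++ ;
}

/* The same statement as a sequence of simple assignments. */
void step_simple(int i) {
  int *t_addr ;
  int  t_val ;
  t_addr = &A[i] ;
  t_val  = *t_addr ;
  *t_addr = t_val + 1 ;
  sum = sum + t_val ;
}
```

Both functions add the old value of A[i] to sum and leave A[i] one larger, so calling either one has the same effect on the two globals.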

Better yet, hope your instructor gives simpler problems.

Control Structures

This table gives some suggestions for translating the C code on the left. You need to apply these rules until nothing remains but assignments and gotos.

C code                                Translation

if (expression)                       α = expression ;
  statement                           if (α == 0) goto ω ;
                                      statement
                                      ω :

if (expression)                       α = expression ;
  statement1                          if (α == 0) goto ψ ;
else                                  statement1
  statement2                          goto ω ;
                                      ψ :
                                      statement2
                                      ω :

while (expression)                    goto ω ;
  statement                           ψ :
                                      statement
                                      ω :
                                      α = expression ;
                                      if (α != 0) goto ψ ;

do                                    ψ :
  statement                           statement
while (expression) ;                  α = expression ;
                                      if (α != 0) goto ψ ;

for(init ; condition ; increment)     init
  statement                           goto ω ;
                                      ψ :
                                      statement
                                      increment
                                      ω :
                                      α = condition ;
                                      if (α != 0) goto ψ ;
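
Because C has goto, the result of a translation can be compiled and compared with the original. This sketch applies the while rule to a made-up example that sums the integers 1 to n. The function names, the labels body and test, and the variable t are chosen for illustration.

```c
#include <assert.h>

/* Hypothetical example: sum the integers 1..n with a while loop. */
int sum_while(int n) {
  int i = 1, sum = 0 ;
  while (i <= n) {
    sum += i ;
    ++i ;
  }
  return sum ;
}

/* The same loop after applying the while rule:
   nothing but assignments and gotos. */
int sum_goto(int n) {
  int i = 1, sum = 0 ;
  int t ;                     /* made-up variable for the condition */
  goto test ;
body :
  sum = sum + i ;
  i = i + 1 ;
test :
  t = i <= n ;
  if (t != 0) goto body ;
  return sum ;
}
```

The jump to the test before the first pass through the body is what makes the loop run zero times when the condition is false at the start, for example when n is 0.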

Function invocation

Called function

Function prologue

On entry to the function, all of its arguments have been placed on the top of the stack. The i'th argument is at offset i-1 from register R6, so the first argument is on top. R5 still points to the activation record of the calling function.

In common terminology, R5 is called the dynamic link and R6 is called the stack pointer. R7 will contain the address of the instruction that follows the call. This is the return address.

The first three instructions of the function allocate three stack slots and store the return address and dynamic link in two of them. The third slot is set aside to hold the return value. Then the dynamic link is updated to point to the first local variable. Finally, the stack pointer is adjusted to allocate space for local variables.

If l is the number of local variables used by the function, the first five instructions are as follows:

FPROLOG ADD   R6,R6,#-3      ;; Allocate stack space for "bookkeeping"
        STR   R7,R6,#1       ;; Store return address of caller
        STR   R5,R6,#0       ;; Store dynamic link of caller
        ADD   R5,R6,#-1      ;; Point dynamic link to first local
        ADD   R6,R6,#(-l)    ;; Point stack pointer to last local

Inside the function

All variables, including function parameters, can be accessed using the offsets of the function symbol table.

Function epilogue

When the function exits, the return value must be placed on the stack on top of its arguments. The return value slot is located at offset 3 from R5.

The dynamic link and return address must also be restored from the stack. R6, the stack pointer, must also be set to point to the return value.

Assuming that R0 contains the return value, here are five instructions that will return from a function.

FEPILOG STR   R0,R5,#3       ;; Store return value
        ADD   R6,R5,#3       ;; Point stack pointer to return value
        LDR   R7,R5,#2       ;; Restore return address
        LDR   R5,R5,#1       ;; Restore dynamic link
        RET                  ;; Return

Calling function

Function invocation

Before the function is called, all its arguments must be evaluated. Let's assume this has happened and that the arguments are stored in made-up variables α1 to αn.

The function invocation must then place the n arguments on the stack. The arguments are placed on the stack from last to first. Letting i go from n down to 1, push the arguments using code similar to the following:

        ADD   R6,R6,#-1      ;; Make room on the stack for argument i
        LDR   R0,R5,offset for αi
        STR   R0,R6,#0       ;; Put argument i on the stack

Control can now be transferred to the called function with a JSR or, more likely, a JSRR instruction.

Function return

Function return is pretty simple. The calling function will transfer the return value into a register and then "remove" the return value and the n arguments from the stack by adding n+1 to the stack pointer.

        LDR   R0,R6,#0       ;; Place return value in R0
        ADD   R6,R6,#(n+1)   ;; Remove return value and n arguments
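
The whole protocol can be traced in C by modelling memory and the two registers. This is a sketch under stated assumptions, not the textbook's code: the stack grows toward lower addresses as in the prologue, the callee add2 with its one local s is a made-up example, and the return address slot is left unused because a C function call stands in for JSR and RET.

```c
#include <assert.h>
#include <stdint.h>

static int16_t  mem[1 << 16];     /* LC-3 memory, one word per element */
static uint16_t R5, R6;           /* dynamic link and stack pointer    */

/* Hypothetical starting state for the trace. */
void init_stack(void) {
  R5 = 0x4000;
  R6 = 0x4000;
}

/* Callee: int add2(int a, int b) { int s ; s = a + b ; return s ; } */
static void add2(void) {
  /* prologue, with one local variable */
  R6 = (uint16_t)(R6 - 3);        /* ADD R6,R6,#-3                 */
  mem[R6] = (int16_t)R5;          /* STR R5,R6,#0                  */
  R5 = (uint16_t)(R6 - 1);        /* ADD R5,R6,#-1                 */
  R6 = (uint16_t)(R6 - 1);        /* ADD R6,R6,#-1                 */
  /* body: a is at offset 4, b at offset 5, s at offset 0 */
  mem[R5] = (int16_t)(mem[R5 + 4] + mem[R5 + 5]);
  /* epilogue */
  mem[R5 + 3] = mem[R5];          /* STR R0,R5,#3 with R0 = s      */
  R6 = (uint16_t)(R5 + 3);        /* ADD R6,R5,#3                  */
  R5 = (uint16_t)mem[R5 + 1];     /* LDR R5,R5,#1                  */
}

/* Caller: push the arguments last to first, call, collect the result. */
int16_t call_add2(int16_t a, int16_t b) {
  int16_t r0;
  R6 = (uint16_t)(R6 - 1);        /* push b, the last argument     */
  mem[R6] = b;
  R6 = (uint16_t)(R6 - 1);        /* push a, the first argument    */
  mem[R6] = a;
  add2();                         /* JSR add2                      */
  r0 = mem[R6];                   /* LDR R0,R6,#0                  */
  R6 = (uint16_t)(R6 + 3);        /* ADD R6,R6,#(n+1) with n = 2   */
  return r0;
}
```

After init_stack(), call_add2(3, 4) returns 7 and leaves R5 and R6 at their starting values, which is the property the prologue, epilogue, and caller cleanup are meant to preserve.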

An example

Let's look at an example, a recursive implementation of Euclid's GCD algorithm.

Elegant C implementation

int GCD(int n, int m) {
  if (n==m)
    return n ;
  else if (n<m)
    return GCD(m, n) ;
  else
    return GCD(n-m, m) ;
}

Awkward C implementation

int GCD(int n, int m) {
  int r, t ;
  t = n-m ;
  if (t==0)
    r = n ;
  else {
    int a1, a2 ;
    if (t<0) {
      a1 = m ;
      a2 = n ;
    } else {
      a1 = t ;
      a2 = m ;
    }
    r = GCD(a1, a2) ;
  }
  return r ;
}

Symbol table

use               offset
a2                -3
a1                -2
t                 -1
r                 0
dynamic link      1
return address    2
return value      3
n                 4
m                 5

LC-3 code

        .ORIG        x4B00

;; Prologue -- CREATE THE ACTIVATION RECORD
GCD     ADD     R6,R6,#-3      ;; Allocate stack space for "bookkeeping"
        STR     R7,R6,#1       ;; Store caller return address
        STR     R5,R6,#0       ;; Store caller dynamic link
        ADD     R5,R6,#-1      ;; Set R5 to first local
        ADD     R6,R6,#-4      ;; Set R6 to last local (#locals is 4)

;; t = n-m ;
        LDR     R0,R5,#4       ;; R0 = n
        LDR     R1,R5,#5       ;; R1 = m
        NOT     R1,R1
        ADD     R1,R1,#1
        ADD     R0,R0,R1       ;; R0 = n-m
        STR     R0,R5,#-1      ;; t = R0

;; if (t==0)
        LDR     R0,R5,#-1      ;; R0 = t
        BRnp    ELSE1

;;   r = n ;
        LDR     R0,R5,#4       ;; R0 = n
        STR     R0,R5,#0       ;; r = R0

        BRnzp   JOIN1
;;   else {
ELSE1

;;    if (t<0) {
        LDR     R0,R5,#-1      ;; R0 = t
        BRzp    ELSE2

;;      a1 = m ;
        LDR     R0,R5,#5       ;; R0 = m
        STR     R0,R5,#-2      ;; a1 = R0

;;      a2 = n ;
        LDR     R0,R5,#4       ;; R0 = n
        STR     R0,R5,#-3      ;; a2 = R0

        BRnzp   JOIN2
;;    } else {
ELSE2

;;      a1 = t ;
        LDR     R0,R5,#-1      ;; R0 = t
        STR     R0,R5,#-2      ;; a1 = R0

;;      a2 = m ;
        LDR     R0,R5,#5       ;; R0 = m
        STR     R0,R5,#-3      ;; a2 = R0

JOIN2
;;    }

;;    r = GCD(a1, a2) ;
        ADD     R6,R6,#-1      ;; Put a2 on call stack
        LDR     R0,R5,#-3
        STR     R0,R6,#0
        ADD     R6,R6,#-1      ;; Put a1 on call stack
        LDR     R0,R5,#-2
        STR     R0,R6,#0
        JSR     GCD            ;; Recurse!

        LDR     R0,R6,#0       ;; R0 = return value
        ADD     R6,R6,#3       ;; Remove parameters and return value
                               ;; #parameters is 2
        STR     R0,R5,#0       ;; r = R0

JOIN1
;;  }

;;  return r ;
        LDR     R0,R5,#0       ;; R0 = r
        STR     R0,R5,#3       ;; return value = R0
       
;; Epilogue -- REMOVE THE ACTIVATION RECORD
        ADD     R6,R5,#3       ;; Point stack pointer to return value
        LDR     R7,R5,#2       ;; Restore return address
        LDR     R5,R5,#1       ;; Restore dynamic link
        RET                    ;; Return

        .END