Translating C to LC-3: Variables

Variable Allocation

The textbook's sketchy rules for variable allocation on the LC-3 are found in Sections 12.5 to 12.6.3 (pp. 326-336).

Variables are stored according to their duration.

Variables with static duration are stored in the global data section
Variables with automatic duration are stored on the run-time stack
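
As a minimal C sketch of the two durations (the names `total_calls`, `next_call`, `calls`, and `scratch` are made up for illustration and do not come from the textbook):

```c
int total_calls = 0;          /* static duration: exists for the whole run */

/* Hypothetical helper that counts how often it has been called. */
int next_call(void) {
    static int calls = 0;     /* static duration, although its scope is local */
    int scratch = 0;          /* automatic duration: a fresh copy on every call */
    scratch += 1;
    calls += scratch;         /* survives from one call to the next */
    total_calls = calls;
    return calls;
}
```

Here `total_calls` and `calls` would belong in the global data section, while `scratch` would live on the run-time stack.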

Variable size

The number of addresses allocated to a variable will depend on the computer architecture.

C type	LC-3 16-bit (words)	Intel 32-bit (bytes)
`char`	1	1
`int`	1	4
`type *`	1	4
`char[10]`	10	10
`int[10]`	10	40
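
To check the figures for whatever machine you are using, a small C sketch like the following can help. The function name `print_sizes` is an assumption for illustration. Note that `sizeof` counts in units of `char`, which are bytes on an Intel machine.

```c
#include <stdio.h>

/* Print how many char-sized units the host compiler gives each type. */
void print_sizes(void) {
    printf("char     %zu\n", sizeof(char));
    printf("int      %zu\n", sizeof(int));
    printf("int *    %zu\n", sizeof(int *));
    printf("char[10] %zu\n", sizeof(char[10]));
    printf("int[10]  %zu\n", sizeof(int[10]));
}
```

The exact numbers printed depend on the compiler and architecture, which is the point of the table.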

Symbol table

Variables with automatic duration
- Allocated in function symbol table for the containing function
- Referenced at negative offsets (including 0) from R5
  - Except the i'th function parameter is at offset i+3
- Allocated by the compiler
Variables with static duration
- Allocated in global symbol table constructed for all functions
- Referenced at positive offsets (including 0) from R4
- Allocated by mysterious means

Sometimes the same name will be given to variables with different scopes. For example, the name of a global variable may also be used as the name of a local variable. In these cases, the two variables must be considered distinct. This means that the symbol table must identify the variable by both its name and its scope. However, for simplicity, we will omit the scope in our examples.
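
A short C sketch of one name used in two scopes (the names `count`, `shadow_demo`, and `read_global` are hypothetical):

```c
int count = 10;               /* global variable named count */

/* A local variable reuses the global's name. */
int shadow_demo(void) {
    int count = 3;            /* distinct local; hides the global inside this function */
    count += 1;               /* changes only the local */
    return count;
}

int read_global(void) {
    return count;             /* no local in scope here, so this is the global */
}
```

A symbol table for this code would need two entries named `count`, told apart by their scopes.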

Pointless function

int A[100] ;

int addEm(int startHere, int endHere) {
  int i, sum ;
  sum = 0 ;
  for (i=startHere; i<endHere; ++i) {
    sum += A[i]++ ;
  }
  return sum ;
}

Pointless function symbol table

variable	duration	offset	size
`A`	static	15	100
`endHere`	automatic	4	1
`i`	automatic	0	1
`startHere`	automatic	5	1
`sum`	automatic	-1	1
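
One way a compiler might hold this table in memory is sketched below in C. The type and function names (`struct symbol`, `lookup`, `base_register`) are assumptions for illustration, not the textbook's data structure, and the scope field is omitted as in the examples here.

```c
#include <stddef.h>
#include <string.h>

enum duration { STATIC_DURATION, AUTOMATIC_DURATION };

struct symbol {
    const char   *name;
    enum duration duration;
    int           offset;   /* words from R4 (static) or R5 (automatic) */
    int           size;     /* words allocated */
};

/* The entries of the table above. */
const struct symbol addEm_symbols[] = {
    { "A",         STATIC_DURATION,    15, 100 },
    { "endHere",   AUTOMATIC_DURATION,  4,   1 },
    { "i",         AUTOMATIC_DURATION,  0,   1 },
    { "startHere", AUTOMATIC_DURATION,  5,   1 },
    { "sum",       AUTOMATIC_DURATION, -1,   1 },
};

/* Linear search by name; returns NULL if the name is not in the table. */
const struct symbol *lookup(const char *name) {
    size_t n = sizeof addEm_symbols / sizeof addEm_symbols[0];
    for (size_t k = 0; k < n; ++k) {
        if (strcmp(addEm_symbols[k].name, name) == 0) {
            return &addEm_symbols[k];
        }
    }
    return NULL;
}

/* Number of the base register used to reach the variable: 4 or 5. */
int base_register(const struct symbol *s) {
    return s->duration == STATIC_DURATION ? 4 : 5;
}
```

The register and offset returned for a name are exactly what the LDR and STR instructions for that variable need.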

The real world

The textbook's way of handling global variables isn't really feasible because global variables may be accessed from several programs (or compilation units). In the real world, the best the compiler can do is generate a table (or a list) with an entry for each accessed global variable. If the global variable is allocated within the compilation unit, i.e., it is not declared with extern, the compiler may be able to allocate memory for the variable.

The linker is given the task of completing these global references. The linker determines where the global variables are really located and performs relocations to global variable references within the compiled code. One way to do this is to maintain a global offset table (GOT) for each function that contains the addresses of the function's global variables. In this case, global variable access involves a hidden pointer access.

However, when dynamic linking is used, the addresses of global variables may not be known until the application is loaded into memory. In this case, the loader must perform relocations before it branches to the compiled code.
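
The hidden pointer access can be imitated in plain C. This is only a toy model under stated assumptions: `toy_got`, `GOT_COUNTER`, `GOT_LIMIT`, `read_counter`, and `bump_counter` are made-up names, and here the C compiler fills in the table, whereas a real linker or loader would do so.

```c
/* Two globals that, in a real program, might be defined in another compilation unit. */
int counter = 5;
int limit   = 9;

/* Toy stand-in for a global offset table: one address slot per global. */
int *const toy_got[] = { &counter, &limit };

enum { GOT_COUNTER = 0, GOT_LIMIT = 1 };

/* Reading a global through the table costs one extra pointer access. */
int read_counter(void) {
    int *addr = toy_got[GOT_COUNTER];   /* hidden pointer fetch */
    return *addr;                       /* the actual variable access */
}

/* Writing goes through the same indirection. */
void bump_counter(void) {
    *toy_got[GOT_COUNTER] += 1;
}
```

The compiled code only needs to know a slot number in the table. Where the variable really lives can be decided later.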

Variable access

Variables are read with LDR and written with STR using the register and offset found in the symbol table. Because array variables are constant pointers, they require special handling.

Reading the value (rvalue) of a variable

`sum`	`LDR R0,R5,#-1`
`endHere`	`LDR R0,R5,#4`
`A[3]`	`LDR R0,R4,#18`
`A`	`ADD R0,R4,#15`

Note that the "value" of A is really the address of its first element, because in an expression an array name evaluates to the address of the array's first element.

If a variable's offset from its data pointer (R4 or R5) does not fit in the 6-bit signed offset field of LDR, which holds -32 to 31, three instructions are required to access its value. In the following example, the automatic variable x has an offset of -50.

        LD   R0,xOFFSET           ;; R0  :=  -50
        ADD  R0,R0,R5             ;; R0  :=  &x
        LDR  R0,R0,#0             ;; R0  :=  x
; ........
xOFFSET .FILL   #-50
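
The decision between the one-instruction and three-instruction forms comes down to a range check on the offset. A minimal C sketch of that check follows. The function names `fits_offset6` and `fits_imm5` are assumptions, while the ranges are those of the LC-3 instruction encodings.

```c
/* LDR and STR carry a 6-bit signed offset: -32 through 31. */
int fits_offset6(int offset) {
    return offset >= -32 && offset <= 31;
}

/* ADD with an immediate operand carries a 5-bit signed value: -16 through 15. */
int fits_imm5(int value) {
    return value >= -16 && value <= 15;
}
```

For example, `fits_offset6(-50)` is false, which is why x needs the longer sequence, and `fits_imm5(18)` is also false even though `fits_offset6(18)` is true.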

Array indexes also require multiple instructions.

        ADD  R0,R4,#15            ;; R0  :=  A
        LDR  R1,R5,#0             ;; R1  :=  i
        ADD  R0,R0,R1             ;; R0  :=  A+i or &A[i]
        LDR  R0,R0,#0             ;; R0  :=  A[i]
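
To trace the four instructions by hand, it may help to model them in C. The following is a toy simulation under stated assumptions, not a full LC-3 simulator: the names `lc3_mem`, `lc3_reg`, `ldr`, `add_imm`, `add_reg`, and `read_A_i` are made up, and the values of R4 and R5 are supplied by the caller.

```c
#include <stdint.h>

uint16_t lc3_mem[1 << 16];   /* 65536 words of 16 bits each */
uint16_t lc3_reg[8];         /* R0 through R7 */

/* LDR DR,BaseR,#offset : DR := mem[BaseR + offset] */
void ldr(int dr, int base, int offset) {
    lc3_reg[dr] = lc3_mem[(uint16_t)(lc3_reg[base] + offset)];
}

/* ADD DR,SR,#imm : DR := SR + imm */
void add_imm(int dr, int sr, int imm) {
    lc3_reg[dr] = (uint16_t)(lc3_reg[sr] + imm);
}

/* ADD DR,SR1,SR2 : DR := SR1 + SR2 */
void add_reg(int dr, int sr1, int sr2) {
    lc3_reg[dr] = (uint16_t)(lc3_reg[sr1] + lc3_reg[sr2]);
}

/* Run the sequence for reading A[i], with A at R4+15 and i at R5+0. */
uint16_t read_A_i(uint16_t r4, uint16_t r5) {
    lc3_reg[4] = r4;
    lc3_reg[5] = r5;
    add_imm(0, 4, 15);   /* ADD R0,R4,#15   R0 := A          */
    ldr(1, 5, 0);        /* LDR R1,R5,#0    R1 := i          */
    add_reg(0, 0, 1);    /* ADD R0,R0,R1    R0 := &A[i]      */
    ldr(0, 0, 0);        /* LDR R0,R0,#0    R0 := A[i]       */
    return lc3_reg[0];
}
```

Stopping after the third call leaves `&A[i]` in R0 instead of `A[i]`.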

Obtaining the address (lvalue) of a variable

`&sum`	`ADD R0,R5,#-1`
`&endHere`	`ADD R0,R5,#4`
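`&A[3]`	`ADD R0,R4,#15` then `ADD R0,R0,#3`

The offset 18 does not fit in the 5-bit immediate field of ADD (-16 to 15), so `&A[3]` takes two instructions.
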
There is no separate code for &A: it yields the same address as A, so `ADD R0,R4,#15` serves for both.

To obtain the address of an array element, or of a variable whose offset is too large for an immediate field, just omit the final LDR from the two code sequences shown above.

Dereferencing a variable

If we want to execute a C statement such as "*p = v ;", we need to first get p and v into registers. Let's say p is in R2 and v is in R3. Then the C statement can be accomplished with the instruction "STR R3,R2,#0". If you wanted to go in the other direction, i.e., "v = *p ;", then use "LDR R3,R2,#0".
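
For reference, here are the two statements as small C functions, with the matching LC-3 instruction in the comments. The function names `store_through` and `load_through` are made up for illustration.

```c
/* *p = v ;   with p in R2 and v in R3:   STR R3,R2,#0 */
void store_through(int *p, int v) {
    *p = v;
}

/* v = *p ;   with p in R2 and v in R3:   LDR R3,R2,#0 */
int load_through(const int *p) {
    return *p;
}
```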