Semi-advanced addressing

Local variables

Local variables have a lifetime of a single function call or a single program block. Local variables may be stored either in the function’s stack frame or in a register. Function arguments are a special case of local variables. They are usually passed in registers; however, sometimes there are too many arguments to pass in registers. In this case, some arguments must be passed on the stack.

Registers and stack locations are allocated to local variables by the compiler or the assembly-language programmer, within the rules set by the ABI (Application Binary Interface).

Suppose N is a local variable. Let’s look at how the statement N += 7 can be implemented in the MIPS32 architecture. If register $t5 has been allocated to N, the statement can be implemented as a single instruction.
    addi  $t5,$t5,7
However, if N is implemented on the stack, say at offset 40, up to three instructions are required.
    lw    $t0,40($sp)
    addi  $t0,$t0,7
    sw    $t0,40($sp)

Caller and callee saved

However, what happens if N is allocated to $t5 and then a call is made to a subfunction? How can the programmer be sure that the subfunction doesn’t use $t5 and trash the value of N?

Following the common terminology, we’ll call the calling function the caller and the called function the callee. The MIPS32 ABI states that the temporary registers $t0 to $t9 are caller saved. This means that the callee is free to use these registers, so the caller must save any values it still needs before the call and reload them after the call returns. The argument registers ($a0 to $a3) and value registers ($v0 and $v1) are also considered caller saved.

On the other hand, the saved registers $s0 to $s7 are callee saved. This means that the callee can use these registers only if their original values are saved before use and restored before return. The caller is guaranteed that the saved registers are unchanged by the call.
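
The two conventions can be illustrated with a toy model in C. This is only a sketch: the Regs structure and the functions callee and caller_n_plus_7 are hypothetical names invented here, and the "registers" are plain array slots, not real hardware registers. The comments show the MIPS32 instruction each step stands in for.

```c
/* Toy model of the MIPS32 register conventions (hypothetical names). */
typedef struct {
    int t[10];  /* $t0..$t9: caller saved */
    int s[8];   /* $s0..$s7: callee saved */
} Regs;

/* The callee may clobber $t5 freely, but must preserve $s0 if it uses it. */
void callee(Regs *r) {
    int saved_s0 = r->s[0];  /* like: sw $s0,0($sp)  */
    r->t[5] = -1;            /* overwrites $t5 without saving it */
    r->s[0] = 12345;         /* uses $s0 as scratch */
    r->s[0] = saved_s0;      /* like: lw $s0,0($sp)  */
}

/* The caller keeps N in $t5, so it must spill N around the call. */
int caller_n_plus_7(Regs *r, int n) {
    int spill;
    r->t[5] = n;
    spill = r->t[5];         /* like: sw $t5,40($sp) */
    callee(r);
    r->t[5] = spill;         /* like: lw $t5,40($sp) */
    r->t[5] += 7;            /* like: addi $t5,$t5,7 */
    return r->t[5];
}
```

Without the spill and reload in the caller, N would come back as whatever the callee left in $t5.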

Global variables

Global, or external, variables require special handling. The address of a global variable cannot be determined by a compiler or assembler. This is the job of the linker (called ld in the Unix world). The best the assembler or compiler can do is generate a table containing the names of all external variables and the many references, both instructions and data, to these variables in the object code that it produces. The linker tries to resolve these references.

This requires the use of the global pointer ($gp) register. If x is an external variable, an assembly language program may write a statement similar to
    lw    $t0,x
If x has been allocated within 2¹⁵ bytes of $gp (more precisely within the range $gp-32768 to $gp+32767), the lw can be implemented with code similar to
    lw    $t0,OFFSET_x($gp)
where OFFSET_x is computed by the linker.
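
The range restriction comes from the 16-bit signed offset field of lw. A minimal sketch of the check in C, assuming 32-bit addresses; the function name gp_reachable is hypothetical and is not part of any real linker:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr can be written as OFFSET($gp) with a signed 16-bit OFFSET. */
bool gp_reachable(uint32_t gp, uint32_t addr) {
    /* Widen before subtracting so the difference cannot wrap around. */
    int64_t offset = (int64_t)addr - (int64_t)gp;
    return offset >= -32768 && offset <= 32767;
}
```

For example, with $gp set to 0x10008000, every address from 0x10000000 to 0x1000FFFF is reachable, which is a window of exactly 2¹⁶ bytes.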

However, sometimes it is simply impossible to put all global variables in 2¹⁶ bytes. In this case, it is necessary to place the addresses of global variables in the global offset table or GOT. Effectively, the global offset table is a vector containing the addresses of external variables, and global variable access is implemented by a two-instruction sequence similar to
    lw    $t0,OFFSET_GOT_n($gp)
    lw    $t0,0($t0)
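
The double indirection can be modeled in C. This is a sketch under obvious assumptions: the array got stands in for the real table, x and y are hypothetical globals, and load_via_got is a made-up helper. A real GOT is built by the linker, not written by hand.

```c
/* Hypothetical global variables. */
int x = 10;
int y = 20;

/* Stand-in for the GOT: a vector holding the addresses of the globals. */
int *got[] = { &x, &y };

/* Two loads: first the address from the table, then the value itself. */
int load_via_got(int n) {
    int *addr = got[n];  /* like: lw $t0,OFFSET_GOT_n($gp) */
    return *addr;        /* like: lw $t0,0($t0)            */
}
```

Only the table itself has to sit within reach of $gp; the variables it points to can live anywhere in memory.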

By the way, shared libraries (DLLs) add yet more complexity.

Addresses (Pointers)

If N is a variable stored in memory, how do you get its address? If N is stored 40 bytes above the address in the stack pointer, you use something like the following:
    addi  $t0,$sp,40
If N is a global, it will be referenced using the global pointer, $gp. Fortunately the assembler has a little hack allowing you to use an idiom such as
    la    $t0,N
which is magically translated to something like
    addi  $t0,$gp,OFFSET_N
or, when a global offset table is used, to
    lw    $t0,OFFSET_GOT_N($gp)
Don’t ask questions, just use the idiom la to load address.

Arrays

Remember that A[i] is really just *(A+i), but keep in mind that the addition must be implemented as A+i*sizeof(*A). So, if A is an array of 4-byte integers, then something like the following is required on a MIPS32 computer to load A[i] into a register.
    la    $t0,A
    lw    $t1,i
    sll   $t1,$t1,2
    add   $t0,$t0,$t1
    lw    $t0,0($t0)
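
The same address arithmetic can be spelled out in C by working in bytes. This is a sketch assuming 4-byte elements (int32_t); the function name load_element is hypothetical. Each line is annotated with the MIPS instruction it mirrors.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the address of A[i] by hand, then load from it. */
int32_t load_element(const int32_t *A, size_t i) {
    const char *base = (const char *)A;      /* la  $t0,A        */
    size_t offset = i << 2;                  /* sll $t1,$t1,2    */
    const int32_t *p =
        (const int32_t *)(base + offset);    /* add $t0,$t0,$t1  */
    return *p;                               /* lw  $t0,0($t0)   */
}
```

For any valid index, load_element(A, i) returns the same value as A[i] and *(A+i); the compiler performs the scaling by sizeof(*A) implicitly in the latter two forms.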

The Intel architecture can do this in a couple of instructions.
    mov i, %rax
    movl A(,%rax,4), %eax
However, it’s not at all clear that the Intel two will be faster than the MIPS five. Ultimately, both require the same number of adds and shifts.

The faster solution, for both MIPS and Intel, is to use a compiler that transforms a loop like
    for (int i=0; i<n; ++i) {
        aSum = aSum + A[i] ;
    }
into
    for (int *iP=&A[0]; iP<&A[n]; ++iP) {
        aSum = aSum + *iP ;
    }
This eliminates the shift and add needed to compute the address of each array element. All that remains is a single pointer increment per iteration.
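
Both forms of the loop can be written as complete functions to confirm they compute the same result. This is a minimal sketch; the names sum_indexed and sum_pointer are hypothetical, and an optimizing compiler may well produce identical code for both.

```c
/* Indexed form: each access computes A + i*sizeof(*A). */
int sum_indexed(const int *A, int n) {
    int aSum = 0;
    for (int i = 0; i < n; ++i) {
        aSum = aSum + A[i];
    }
    return aSum;
}

/* Pointer form: the pointer itself advances by one element each time. */
int sum_pointer(const int *A, int n) {
    int aSum = 0;
    for (const int *iP = A; iP < A + n; ++iP) {
        aSum = aSum + *iP;
    }
    return aSum;
}
```

Note that A + n points one element past the end of the array. C permits forming and comparing against this address as long as it is never dereferenced.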