Translating C to PIC: Functions

Properties of the call conventions

On the PIC24, the function call conventions described in the MPLAB XC16 C Compiler User’s Guide are followed for all function calls generated by the C compiler.

Register usage

Register W15 is used as a stack pointer. This choice is reflected in the various pop and push machine instructions.
Registers W0 to W7 are caller saved. They are also used for parameters and return values as explained below.
Registers W8 to W14 are callee saved. Usually W14 will be used as a frame pointer.

Parameter passing

If possible, function arguments are passed in registers W0 to W7. 8-bit and 16-bit parameters are passed in a single register, 32-bit parameters are passed in two registers, and 64-bit parameters are passed in four registers.

The parameters are placed, in order, in the registers W0 to W7. If a parameter cannot fit within the remaining registers, it is placed on the stack in right-to-left order. It is possible for the first and third arguments to be passed in registers, while the second argument is passed on the stack.

In C, an array is passed as a pointer to its first element, but, if space permits, a structure is passed in sequential registers. (This rule seems particularly strange when a structure contains nothing but a single array.) C also permits variadic functions, such as printf, which take a variable number of arguments. Effectively, these functions receive a list of parameters as their last argument (written as ... in the function header). This variable-size parameter list, which appears to also include the last parameter before the ..., is never placed in registers.
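For reference, a variadic function is written in C with the macros from <stdarg.h>. The sketch below is illustrative only: sum_ints is a made-up name, not a library routine. Going by the description above, on the PIC24 both count and the values after it would arrive on the stack rather than in registers.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments that follow the named parameter.
   sum_ints is a hypothetical example function. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list args;
    va_start(args, count);          /* start just after the last named parameter */

    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(args, int); /* fetch the next argument as an int */

    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) adds the three trailing values.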

An example

struct sillyStruct { int8_t buff[80] ; } ;

int16_t f(int8_t a, int32_t b, struct sillyStruct c, int32_t d[80], int16_t e) ;

parameter	location
`a`	`W0`
`b`	`W2:W1`
`c`	on stack (offset -86)
`d`	`W3` (as address)
`e`	`W4`
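The assignments in this table can be reproduced with a small model of the in-order rule described above. This is only a sketch of that simplified rule: assign_registers and ON_STACK are names invented here, and the model ignores any register alignment requirements the real compiler may apply.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PARAM_REGS 8    /* W0..W7 */
#define ON_STACK (-1)       /* marker: parameter did not fit in registers */

/* For each parameter, record the index of its first register (0 means W0)
   or ON_STACK. sizes[i] is the size of parameter i in bytes.
   Simplified model of the in-order rule; alignment is ignored. */
void assign_registers(const size_t *sizes, size_t count, int *first_reg)
{
    int next = 0;   /* index of the next free register */

    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] > 0);
        int needed = (int)((sizes[i] + 1) / 2);   /* registers are 16 bits wide */

        if (next + needed <= NUM_PARAM_REGS) {
            first_reg[i] = next;
            next += needed;
        } else {
            /* Too big for what is left; later, smaller parameters may still fit. */
            first_reg[i] = ON_STACK;
        }
    }
}
```

For f, the sizes are 1, 4, 80, 2 (a pointer) and 2 bytes, which gives W0, W1 (the start of the pair W2:W1), the stack, W3 and W4.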

Return value

The simpler return values, such as numbers, are returned in registers W0 to W3 as needed. A 16-bit value, such as a uint16_t, will be returned in W0. A 64-bit value, such as an int64_t, will use all four registers.

When an aggregate value, such as a struct or a union, is returned from a function, W0 will contain the address of the returned value.
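One way to picture the aggregate case is to rewrite the function by hand so that the result lives in memory and only its address is handed back. The sketch below is an illustration, not the compiler's actual output: struct point and both function names are invented, and having the caller supply the storage is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct point { int16_t x; int16_t y; int16_t z; };

/* What the programmer writes: an aggregate returned by value. */
struct point make_point(int16_t x, int16_t y, int16_t z)
{
    struct point p = { x, y, z };
    return p;
}

/* A hand-lowered equivalent: the result is stored in memory and the
   function returns its address (the value that would travel in W0).
   Caller-provided storage is an assumption made for this sketch. */
struct point *make_point_lowered(struct point *result,
                                 int16_t x, int16_t y, int16_t z)
{
    assert(result != NULL);
    result->x = x;
    result->y = y;
    result->z = z;
    return result;
}
```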

Stack frame

When programs are compiled under the XC16 compiler, the default action for function invocation is to allocate a stack frame. The frame pointer, the address of the base of the stack frame, is contained in W14. Since W14 is callee saved, there is no problem with calling routines that don’t allocate a stack frame.

Here is the order for storing information on the stack.

purpose	size in bytes
Parameters that couldn’t fit in registers	varies, but typically 0
saved `PC`, return address	4
saved `W14`, dynamic link	2
local variables, saved registers, arguments to other procedures	varies

By definition, the stack frame begins at the address contained in the frame pointer. On the PIC24 with the XC compilers, this is the location of the first local variable. This means that local variables and saved registers are typically addressed as non-negative offsets from W14, such as [W14+2]. If any parameters have been passed on the stack, they will be addressed as negative offsets, such as [W14-8].
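The negative offsets follow from the layout table above: between the frame pointer and any stack-passed parameters sit the 2-byte dynamic link and the 4-byte return address. Here is a minimal sketch of the arithmetic, assuming every stack parameter occupies an even number of bytes. The helper stack_param_offset is hypothetical, not part of XC16.

```c
#include <assert.h>

enum {
    FRAME_RETURN_ADDRESS_BYTES = 4,  /* saved PC */
    FRAME_DYNAMIC_LINK_BYTES   = 2   /* saved W14 */
};

/* Offset from W14 of a parameter passed on the stack.
   size:           size of this parameter in bytes
   bytes_between:  total size of the stack parameters lying between this
                   one and the return address */
int stack_param_offset(int size, int bytes_between)
{
    assert(size > 0 && bytes_between >= 0);
    return -(FRAME_RETURN_ADDRESS_BYTES + FRAME_DYNAMIC_LINK_BYTES
             + bytes_between + size);
}
```

A 2-byte parameter next to the return address lands at offset -8, and an 80-byte structure in the same position lands at offset -86.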

For a small gain in efficiency, it is possible to avoid allocating a stack frame. In this case variables are accessed at negative offsets from the stack pointer.

Actions of the called function

Function prologue

On entry to the function, all of its arguments are either contained in registers or stored near the top of the stack. At the very top of the stack is the two-word return address.

When a stack frame is being used, the first instruction of a function is
LNK #n
where n is the amount of storage needed for local variables and saved registers. The LNK #n instruction performs the following actions:

Pushes W14, the present frame pointer, onto the stack. This now becomes the dynamic link.
Sets W14 to the address of the top of the stack. This now becomes the bottom of the new stack frame.
Increases the stack pointer, W15, by n. This creates space for local variables and saved registers within the stack frame.

The other action usually performed during the function prologue is saving registers. If the function modifies any callee-saved registers, W8 to W13, they must be saved. Also, the function usually needs to save any caller-saved registers that it must reuse to pass parameters in its own function calls.

Inside the function

All local variables stored in the stack frame will be accessed by non-negative offsets from W14, the frame pointer. Most parameters will be accessed in the registers in which they were passed, but those that are stored on the stack will be accessed by negative offsets from the frame pointer.

Function epilogue

Before the function exits, any modified callee-saved registers must be restored and the return value must be loaded into the appropriate registers.

Then the function executes the ULNK instruction, which does the following:

Copies the frame pointer, W14, to the stack pointer, W15. This deallocates the present stack frame.
Pops the top of the stack, which now contains the dynamic link, into the frame pointer. This restores the stack frame of the caller function.

The only remaining action of the epilogue is to execute the RETURN instruction, which pops the two words containing the return address into the program counter. This causes control to transfer back to the calling function.
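The LNK and ULNK steps listed above can be traced with a toy model of the two registers and a small stack. This is a sketch for following the pointer arithmetic, not an emulator: struct machine, the 64-word stack and the helper names are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

struct machine {
    uint16_t mem[STACK_WORDS];  /* simulated stack, one entry per 16-bit word */
    uint16_t w14;               /* frame pointer, a byte address */
    uint16_t w15;               /* stack pointer, a byte address */
};

static void push(struct machine *m, uint16_t value)
{
    assert(m->w15 / 2 < STACK_WORDS);
    m->mem[m->w15 / 2] = value; /* store at [W15] ... */
    m->w15 += 2;                /* ... then move up: the PIC24 stack grows upward */
}

static uint16_t pop(struct machine *m)
{
    assert(m->w15 >= 2);
    m->w15 -= 2;
    return m->mem[m->w15 / 2];
}

/* LNK #n: save the old frame pointer, start a new frame, reserve n bytes. */
void lnk(struct machine *m, uint16_t n)
{
    push(m, m->w14);            /* dynamic link */
    m->w14 = m->w15;            /* base of the new frame */
    m->w15 += n;                /* room for locals and saved registers */
}

/* ULNK: discard the frame and restore the caller's frame pointer. */
void ulnk(struct machine *m)
{
    m->w15 = m->w14;            /* deallocate the frame */
    m->w14 = pop(m);            /* pop the dynamic link */
}
```

Running lnk followed by ulnk leaves both W14 and W15 exactly as they were, which is why the caller's frame survives the call.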

Actions of the calling function

Function invocation

Before the function is called, all its arguments must be evaluated and placed in the appropriate registers or pushed on the stack. Then a call instruction, typically RCALL, is executed. The call instruction pushes the present program counter, the address of the next instruction, onto the stack and then sets the program counter to the address of the called function.

Before making the call, the calling function must also save copies of any caller-saved registers whose values it needs after the call is completed.

Function return

If any parameters were passed on the stack, the calling function should adjust the stack pointer to “remove” them. Then, the calling function may also copy the return value to its own local storage. Finally, the calling function may need to restore any caller-saved registers it was using.

An example

Let’s look at an example, an inefficient recursive function for squaring a non-negative number.

int square(int n) {
    int r = 0 ;
    if (n != 0) {
        r = square(n-1) + n + n - 1 ;
    }
    return r ;
}
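The recursion works because (n-1)*(n-1) + n + n - 1 equals n*n. A quick check on a desktop compiler, where int is wide enough for these values, might look like the sketch below. The helper square_matches_product is invented for this check, and square is repeated only so that the block compiles on its own.

```c
#include <assert.h>

/* Same function as above, repeated so this block stands alone. */
int square(int n) {
    int r = 0 ;
    if (n != 0) {
        r = square(n-1) + n + n - 1 ;
    }
    return r ;
}

/* Return 1 if square(n) == n*n for every n from 0 to limit, else 0.
   Hypothetical helper, for checking only. */
int square_matches_product(int limit)
{
    assert(limit >= 0);
    for (int n = 0; n <= limit; n++) {
        if (square(n) != n * n)
            return 0;
    }
    return 1;
}
```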

Here’s the code generated by the XC16 compiler at optimization level 0, with a few comments added by me.

     LNK     #0x4
     MOV     W0, [W14+2]            ;; n is saved in [W14+2]
     CLR     W0
     MOV     W0, [W14]              ;; r is saved in [W14]
     MOV     [W14+2], W0            ;; testing if n == 0
     SUB     W0, #0x0, [W15]
     BRA     Z, 1f
     MOV     [W14+2], W0
     DEC     W0, W0
     RCALL   square                 ;; calling square with n-1
     MOV     [W14+2], W1
     ADD     W0, W1, W1
     MOV     [W14+2], W0
     ADD     W1, W0, W0
     DEC     W0, [W14]
1:   MOV     [W14], W0
     ULNK  
     RETURN

This version is shorter.

;; start of prologue
     LNK     #2
     MOV     W0, [W14]              ;; n is saved in [W14]
;; end of prologue
     CLR     W7                     ;; r is saved in W7
     CP0     W0                     ;; testing if n == 0
     BRA     Z, 1f
;; start of invocation
     DEC     W0, W0                 ;; Setting first parameter to n-1
     RCALL   square                 ;; calling square with n-1
;; end of invocation
;; On return, W0 is square(n-1)
     MOV     [W14], W6              ;; must restore the old n (as W6)
     ADD     W0, W6, W7             ;; W7 == square(n-1) + n
     ADD     W7, W6, W7             ;; W7 == square(n-1) + n + n
     DEC     W7, W7                 ;; W7 == square(n-1) + n + n - 1
;; start of epilogue
1:   MOV     W7, W0
     ULNK  
     RETURN
;; end of epilogue

Intel IA32 code

Let’s see how a real compiler translates. Start with the following C routine.

int WhoCares(double V, int N) ;

int Abs(double *X, double *Y, int N) {
  int M, R ;
  M = N ;
  if (M < 0)
    M = -M ;
  R = WhoCares(*X * *Y, N) ;
  return R ;
}

Without optimization, here is the code generated by gcc on a 32-bit Intel platform. The comments have been added by the instructor. I'm afraid gcc generates code using “AT&T syntax” which differs significantly from Intel’s assembler syntax.

        .file   "progC.c"
        .text                           ; this section for compiled code
.globl Abs
        .type   Abs, @function          ; Abs is a global function
Abs:
        pushl   %ebp                    ; push the base pointer on the stack
        movl    %esp, %ebp              ; move stack pointer to base pointer
        subl    $40, %esp               ; allocate 40 bytes on stack for frame
        movl    16(%ebp), %eax          ; move N to register AX
        movl    %eax, -8(%ebp)          ; move AX to M
        cmpl    $0, -8(%ebp)            ; compare M with 0
        jns     .L2                     ; jump if no sign
        negl    -8(%ebp)                ; negate M, M = -M
.L2:
        movl    8(%ebp), %eax           ; move X to AX
        fldl    (%eax)                  ; push *X on floating point stack
        movl    12(%ebp), %eax          ; move Y to AX
        fldl    (%eax)                  ; push *Y on floating point stack
        fmulp   %st, %st(1)             ; floating point multiply
        movl    16(%ebp), %eax          ; move N to AX
        movl    %eax, 8(%esp)           ; place N on stack
        fstpl   (%esp)                  ; place *X * *Y on stack
        call    WhoCares                ; place return address on stack
                                        ; and branch to WhoCares
        movl    %eax, -4(%ebp)          ; move AX to R
        movl    -4(%ebp), %eax          ; move R to AX (returned value)
        leave                           ; copy base pointer to stack pointer
                                        ; and restore base pointer from stack
        ret                             ; return using address on stack
        .size   Abs, .-Abs
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-48)"
        .section     .note.GNU-stack,"",@progbits
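The offsets 8(%ebp), 12(%ebp) and 16(%ebp) in the listing come from the frame layout: the saved base pointer sits at 0(%ebp), the return address at 4(%ebp), and the arguments follow in order. The sketch below works through that arithmetic, assuming each argument is rounded up to a 4-byte stack slot. The helper ebp_offset_of_arg is hypothetical.

```c
#include <assert.h>

enum {
    IA32_SAVED_EBP_BYTES      = 4,
    IA32_RETURN_ADDRESS_BYTES = 4,
    IA32_SLOT_BYTES           = 4   /* assumed stack slot size */
};

/* Offset from %ebp of argument `index` (0-based), given the sizes in
   bytes of all the arguments that precede it. */
int ebp_offset_of_arg(const int *sizes, int index)
{
    assert(index >= 0);
    int offset = IA32_SAVED_EBP_BYTES + IA32_RETURN_ADDRESS_BYTES;

    for (int i = 0; i < index; i++) {
        assert(sizes[i] > 0);
        /* round each earlier argument up to a whole number of slots */
        offset += (sizes[i] + IA32_SLOT_BYTES - 1) / IA32_SLOT_BYTES
                  * IA32_SLOT_BYTES;
    }
    return offset;
}
```

For Abs, whose three arguments are 4 bytes each, this gives 8, 12 and 16. For WhoCares, the 8-byte double pushes N out to offset 16, which matches the caller storing N at 8(%esp) before the call.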

AMD64 code

The 64-bit AMD64 (or Intel EM64T) instruction set significantly reduces the cost of function calls. On the AMD64, the first six arguments to a function are passed in registers, so arguments rarely need to be pushed on the stack. The AMD64 also doesn’t use a frame pointer because variables can be addressed relative to the stack pointer if the compiler avoids unneeded changes to the stack pointer. Also, since the AMD64 has 16 registers, twice the 8 of the IA32, many procedures can do all their work using registers.
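As a concrete reference, here is the order in which integer and pointer arguments are assigned to registers. The sketch assumes the System V calling convention used on Linux, which these notes do not name explicitly. Floating-point arguments travel in separate registers and are not modelled. The function name sysv_int_arg_register is invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Name of the register carrying the index-th (0-based) integer or pointer
   argument under the System V AMD64 convention, or NULL when the argument
   is passed on the stack. Hypothetical helper for illustration. */
const char *sysv_int_arg_register(int index)
{
    static const char *const regs[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9"
    };

    assert(index >= 0);
    if ((size_t)index < sizeof regs / sizeof regs[0])
        return regs[index];
    return NULL;    /* seventh and later arguments go on the stack */
}
```

Under this assumption, the Abs routine from the previous section would receive X in rdi, Y in rsi and N in the low half of rdx, with nothing pushed on the stack.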

One additional unusual feature of the AMD64, at least under the System V calling convention, is that every procedure is allowed to use the 128 bytes just below the stack pointer (the “red zone”) without adjusting it. This means that leaf procedures often don’t modify the stack pointer.

More information about the AMD64 can be found in x86-64 Machine-Level Programming by Randal Bryant and David O’Hallaron.