Properties of the call conventions
On the PIC24, the function call conventions described in the MPLAB XC16 C Compiler User’s Guide are followed for all function calls generated by the C compiler.
Register usage
- Register W15 is used as a stack pointer. This choice is reflected in the various pop and push machine instructions.
- Registers W0 to W7 are caller saved. They are also used for parameters and return values as explained below.
- Registers W8 to W14 are callee saved. Usually W14 will be used as a frame pointer.
Parameter passing
If possible, function arguments are passed in registers W0 to W7.
8-bit and 16-bit parameters are passed in a single register,
32-bit parameters are passed in two registers,
and 64-bit parameters are passed in four registers.
The parameters are placed, in order, in the registers W0 to W7.
If a parameter cannot fit within the remaining registers,
it is placed on the stack in right-to-left order.
It is possible for the first and third arguments to be
passed in registers, while the second argument is
passed on the stack.
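The placement rule above can be sketched as a small host-side C model. This is only an illustration of the rule as stated in this section: the helper name `assign_registers` and the size-in-bytes encoding are made up for the example, and the real compiler may apply further constraints (register alignment, for instance) that are not modeled here.

```c
#include <stddef.h>

#define NUM_ARG_REGS 8          /* W0..W7 */
#define ON_STACK     (-1)       /* marker: argument did not fit in registers */

/*
 * Hypothetical model, not compiler code: for each argument size (in bytes),
 * store the number of the first register it occupies, or ON_STACK.
 * An argument needs one 16-bit register per two bytes, rounded up.
 */
void assign_registers(const int sizes[], size_t count, int first_reg[])
{
    int next = 0;                           /* next unused register */

    for (size_t i = 0; i < count; i++) {
        int needed = (sizes[i] + 1) / 2;

        if (next + needed <= NUM_ARG_REGS) {
            first_reg[i] = next;            /* uses W(next) .. W(next+needed-1) */
            next += needed;
        } else {
            first_reg[i] = ON_STACK;        /* a later, smaller argument may still fit */
        }
    }
}
```

Because the loop keeps going after an argument has been sent to the stack, a large second argument can end up on the stack while the first and third still travel in registers.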
In C, an array is passed as a pointer to its first element;
but, if space permits, a structure is passed in sequential registers.
(This rule seems particularly strange when a structure contains nothing
but a single array.)
C also permits variadic functions, such as printf, which have a variable number of arguments.
Effectively, these functions receive a list of parameters
as their last argument (written as ... in the function header).
This variable-size parameter list, which appears
to also include the last parameter before the ...,
is never placed in registers.
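For reference, here is a minimal variadic function in standard C. The name `sum_ints` is invented for this sketch. Note that `va_start` is handed the last named parameter, which fits the observation above that this parameter seems to travel with the variable part of the list; how `va_list` is actually implemented on a given target is left to the compiler and is not shown here.

```c
#include <stdarg.h>

/* Hypothetical example: adds up the `count` int arguments that follow `count`. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);              /* begin just after the last named parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);     /* fetch the next unnamed argument */
    }
    va_end(args);

    return total;
}
```

A call such as `sum_ints(3, 10, 20, 30)` adds the three trailing values.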
An example
    struct sillyStruct { int8_t buff[80] ; } ;
    int16_t f(int8_t a, int32_t b, struct sillyStruct c, int32_t d[80], int16_t e) ;
parameter | location |
---|---|
a | W0 |
b | W2:W1 |
c | on stack (offset -86) |
d | W3 (as address) |
e | W4 |
Return value
The simpler return values, such as numbers, are returned in
registers W0 to W3 as needed.
A 16-bit value, such as a uint16_t, will be returned in W0.
A 64-bit value, such as an int64_t or a long double,
will use all four registers.
When an aggregate value, such as a struct or a union, is returned from a function,
W0 will contain the address of the returned value.
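One way to picture the aggregate case is to compare a function that returns a structure by value with a hand-written version that works through an address. This is a conceptual sketch only: `struct point` and both function names are invented, and which side provides the storage for the result is an assumption of the example, not something stated above.

```c
#include <stdint.h>

struct point { int16_t x; int16_t y; int16_t z; };

/* What the programmer writes: the structure is returned by value. */
struct point make_point(int16_t x, int16_t y, int16_t z)
{
    struct point p = { x, y, z };
    return p;
}

/*
 * Hypothetical address-based equivalent: the result is built in memory at
 * `result`, and that address is what the function hands back (the role
 * played by W0 in the description above).
 */
struct point *make_point_by_address(struct point *result,
                                    int16_t x, int16_t y, int16_t z)
{
    result->x = x;
    result->y = y;
    result->z = z;
    return result;
}
```

Both versions leave the same three fields in memory; the second simply makes the address explicit.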
Stack frame
When programs are compiled under the XC16 compiler, the default
action for function invocation is to allocate a stack frame.
The frame pointer, the address of the base of the stack frame, is contained in W14.
Since W14 is callee saved, there is no problem with
calling routines that don’t allocate a stack frame.
Here is the order for storing information on the stack.
purpose | size in bytes |
---|---|
Parameters that couldn’t fit in registers | varies, but typically 0 |
saved PC, return address | 4 |
saved W14, dynamic link | 2 |
local variables, saved registers, arguments to other procedures | varies |
By definition, the stack frame begins at the address contained in
the frame pointer. On the PIC24 with the XC16 compiler, this is the location
of the first local variable.
This means that local variables and saved registers are typically addressed
as positive offsets from W14, such as [W14+2].
If any parameters have been passed on the stack, they will
be addressed as negative offsets, such as [W14-8].
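The offsets follow directly from the layout table. The arithmetic can be sketched in C as below; the helper names are invented, and the sketch assumes the stacked parameters sit immediately below the 4-byte return address with nothing else in between.

```c
#define RETURN_ADDRESS_BYTES 4  /* saved PC */
#define DYNAMIC_LINK_BYTES   2  /* saved W14 */

/*
 * Hypothetical helper: offset from W14 of the lowest-addressed byte of a
 * block of stacked parameters that is `param_bytes` bytes long in total.
 */
int stacked_param_offset(int param_bytes)
{
    return -(DYNAMIC_LINK_BYTES + RETURN_ADDRESS_BYTES + param_bytes);
}

/* Hypothetical helper: offset from W14 of the k-th 16-bit local slot (0-based). */
int local_offset(int slot)
{
    return 2 * slot;
}
```

Under these assumptions a single 16-bit stacked parameter is found at offset -8, and the 80-byte structure from the parameter-passing example at offset -86, while the second local slot sits at offset +2.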
For a small gain in efficiency, it is possible to avoid allocating a stack frame. In this case variables are accessed at negative offsets from the stack pointer.
Actions of the called function
Function prologue
On entry to the function, all of its arguments are either contained in registers or stored near the top of the stack. At the very top of the stack is the two-word return address.
When a stack frame is being used, the first instruction of a
function is LNK #n, where n
is the amount of storage needed for local variables and saved registers.
The LNK #n instruction performs the following actions:
- Pushes W14, the present frame pointer, onto the stack. This now becomes the dynamic link.
- Sets W14 to the address of the top of the stack. This now becomes the bottom of the new stack frame.
- Increases the stack pointer, W15, by n. This creates space for local variables and saved registers within the stack frame.
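The three steps of LNK can be mimicked on a host machine with a toy model of the two registers and a small block of memory. Everything here (`struct cpu`, `push`, `lnk`, the 64-word memory) is invented for illustration; the model assumes a stack that grows toward higher addresses, with W15 holding the byte address of the first free word.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine state: only what LNK touches. */
struct cpu {
    uint16_t w14;               /* frame pointer */
    uint16_t w15;               /* stack pointer: byte address of first free word */
    uint16_t mem[STACK_WORDS];  /* word memory, indexed by byte address / 2 */
};

/* Store a word at the top of the stack, then advance the stack pointer. */
void push(struct cpu *c, uint16_t value)
{
    c->mem[c->w15 / 2] = value;
    c->w15 += 2;
}

/* Model of LNK #n, following the three steps listed above. */
void lnk(struct cpu *c, uint16_t n)
{
    push(c, c->w14);    /* 1. save the caller's frame pointer (dynamic link) */
    c->w14 = c->w15;    /* 2. the new frame starts at the current top of stack */
    c->w15 += n;        /* 3. reserve n bytes for locals and saved registers */
}
```

Starting from W14 = 0x10 and W15 = 0x20, `lnk(&c, 4)` stores 0x10 at address 0x20 and leaves W14 = 0x22 and W15 = 0x26.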
The other action usually performed during the function prologue
is saving registers. If the function modifies any callee-saved registers,
W8 to W13, they must be saved.
Also, the function usually needs to save
any caller-saved registers that must be
used to pass parameters in its own function calls.
Inside the function
All local variables stored in the stack frame
will be accessed by positive offsets from W14, the frame pointer.
Most parameters will be accessed in the registers in which they were
passed, but those that are stored on the stack will be accessed
by negative offsets from the frame pointer.
Function epilogue
Before the function exits, any modified callee-saved registers must be restored and the return value must be loaded into the appropriate registers.
Then the function executes the ULNK instruction,
which does the following:
- Copies the frame pointer, W14, to the stack pointer, W15. This deallocates the present stack frame.
- Pops the top of the stack, which now contains the dynamic link, into the frame pointer. This restores the stack frame of the calling function.
The only remaining action of the epilogue is to execute the
RETURN instruction, which pops the two words containing
the return address into the program counter. This causes
control to transfer back to the calling function.
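The epilogue can be mimicked with the same kind of toy model. All names below (`struct cpu`, `pop`, `ulnk`, `do_return`) are invented for this sketch. It assumes an upward-growing stack with W15 pointing at the first free word, and it assumes the low word of the return address was pushed first, so the high word is popped first; status bits that the real hardware keeps alongside the return address are ignored.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine state: only what ULNK and RETURN touch. */
struct cpu {
    uint16_t w14;               /* frame pointer */
    uint16_t w15;               /* stack pointer: byte address of first free word */
    uint16_t mem[STACK_WORDS];  /* word memory, indexed by byte address / 2 */
};

/* Step the stack pointer back one word and fetch that word. */
uint16_t pop(struct cpu *c)
{
    c->w15 -= 2;
    return c->mem[c->w15 / 2];
}

/* Model of ULNK: discard the frame, then restore the caller's frame pointer. */
void ulnk(struct cpu *c)
{
    c->w15 = c->w14;
    c->w14 = pop(c);
}

/* Model of RETURN: pop two words and combine them into the new program counter. */
uint32_t do_return(struct cpu *c)
{
    uint16_t high = pop(c);     /* assumption: high word was pushed last */
    uint16_t low  = pop(c);
    return ((uint32_t)high << 16) | low;
}
```

After `ulnk` and `do_return`, both W14 and W15 hold the values they had just before the call instruction pushed the return address.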
Actions of the calling function
Function invocation
Before the function is called, all its arguments must be evaluated
and placed in the appropriate registers or pushed on the stack.
The calling function must also save copies of any caller-saved registers whose values it needs after the call is completed.
Then a call instruction, typically RCALL, is executed.
The call instruction pushes the present program counter, the address
of the next instruction, onto the stack and then sets the program counter
to its argument.
Function return
If any parameters were passed on the stack, the calling function should adjust the stack pointer to “remove” them. Then, the calling function may also copy the return value to its own local storage. Finally, the calling function may need to restore any caller-saved registers it was using.
An example
Let’s look at an example, an inefficient recursive function for squaring a positive number.
    int square(int n) {
        int r = 0 ;
        if (n != 0) {
            r = square(n-1) + n + n - 1 ;
        }
        return r ;
    }
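The recursion relies on the identity n² = (n-1)² + 2n - 1. A quick host-side check of the function, written as a sketch with an invented helper `square_agrees_up_to`, looks like this. It is only meaningful for non-negative n, since a negative argument never reaches the base case.

```c
/* The function from the text, unchanged in behavior. */
int square(int n)
{
    int r = 0;
    if (n != 0) {
        r = square(n - 1) + n + n - 1;
    }
    return r;
}

/* Hypothetical helper: returns 1 if square(n) == n * n for every n in 0..limit. */
int square_agrees_up_to(int limit)
{
    for (int n = 0; n <= limit; n++) {
        if (square(n) != n * n) {
            return 0;
        }
    }
    return 1;
}
```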
Here’s the code generated by the XC16 compiler at optimization level 0, with a few comments added by me.
            LNK     #0x4
            MOV     W0, [W14+2]     ;; n is saved in [W14+2]
            CLR     W0
            MOV     W0, [W14]       ;; r is saved in [W14]
            MOV     [W14+2], W0     ;; testing if n == 0
            SUB     W0, #0x0, [W15]
            BRA     Z, 1f
            MOV     [W14+2], W0
            DEC     W0, W0
            RCALL   square          ;; calling square with n-1
            MOV     [W14+2], W1
            ADD     W0, W1, W1
            MOV     [W14+2], W0
            ADD     W1, W0, W0
            DEC     W0, [W14]
    1:      MOV     [W14], W0
            ULNK
            RETURN
This version is shorter.
            ;; start of prologue
            LNK     #2
            MOV     W0, [W14]       ;; n is saved in [W14]
            ;; end of prologue
            CLR     W7              ;; r is saved in W7
            CP0     W0              ;; testing if n == 0
            BRA     Z, 1f
            ;; start of invocation
            DEC     W0, W0          ;; Setting first parameter to n-1
            RCALL   square          ;; calling square with n-1
            ;; end of invocation
            ;; On return, W0 is square(n-1)
            MOV     [W14], W6       ;; must restore the old n (as W6)
            ADD     W0, W6, W7      ;; W7 == square(n-1) + n
            ADD     W7, W6, W7      ;; W7 == square(n-1) + n + n
            DEC     W7, W7          ;; W7 == square(n-1) + n + n - 1
            ;; start of epilogue
    1:      MOV     W7, W0
            ULNK
            RETURN
            ;; end of epilogue
Intel IA32 code
Let’s see how a real compiler translates a function that calls another function. Start with the following C routine.
    int WhoCares(double V, int N) ;

    int Abs(double *X, double *Y, int N) {
        int M, R ;
        M = N ;
        if (M < 0)
            M = -M ;
        R = WhoCares(*X * *Y, N) ;
        return R ;
    }
Without optimization, here is the code generated by gcc on a 32-bit Intel platform. The comments have been added by the instructor. I'm afraid gcc generates code using “AT&T syntax” which differs significantly from Intel’s assembler syntax.
            .file   "progC.c"
            .text                           # this section for compiled code
            .globl  Abs
            .type   Abs, @function          # Abs is a global function
    Abs:
            pushl   %ebp                    # push the base pointer on the stack
            movl    %esp, %ebp              # copy stack pointer to base pointer
            subl    $40, %esp               # allocate 40 bytes on stack for frame
            movl    16(%ebp), %eax          # move N to register EAX
            movl    %eax, -8(%ebp)          # move EAX to M
            cmpl    $0, -8(%ebp)            # compare M with 0
            jns     .L2                     # jump if no sign
            negl    -8(%ebp)                # negate M, M = -M
    .L2:
            movl    8(%ebp), %eax           # move X to EAX
            fldl    (%eax)                  # push *X on floating point stack
            movl    12(%ebp), %eax          # move Y to EAX
            fldl    (%eax)                  # push *Y on floating point stack
            fmulp   %st, %st(1)             # floating point multiply
            movl    16(%ebp), %eax          # move N to EAX
            movl    %eax, 8(%esp)           # place N on stack
            fstpl   (%esp)                  # place *X * *Y on stack
            call    WhoCares                # place return address on stack
                                            # and branch to WhoCares
            movl    %eax, -4(%ebp)          # move EAX to R
            movl    -4(%ebp), %eax          # move R to EAX (returned value)
            leave                           # copy base pointer to stack pointer
                                            # and restore base pointer from stack
            ret                             # return using address on stack
            .size   Abs, .-Abs
            .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-48)"
            .section        .note.GNU-stack,"",@progbits
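The offsets in the listing follow a regular pattern. Inside Abs, the saved base pointer is at 0(%ebp) and the return address at 4(%ebp), so the incoming arguments start at 8(%ebp); before the call, the outgoing arguments for WhoCares are stored starting at (%esp). The sketch below reproduces that arithmetic. The helper name is invented, and the rule that each argument occupies a multiple of 4 bytes is an assumption read off this listing rather than a statement of the full calling convention.

```c
#include <stddef.h>

#define SLOT_BYTES 4    /* assumed: each argument is padded to a multiple of 4 bytes */

/*
 * Hypothetical helper: fills offsets[i] with the byte offset of argument i,
 * starting at `base` and laying arguments out left to right at increasing
 * addresses.
 */
void stack_arg_offsets(const int sizes[], size_t count, int base, int offsets[])
{
    int offset = base;

    for (size_t i = 0; i < count; i++) {
        offsets[i] = offset;
        offset += (sizes[i] + SLOT_BYTES - 1) / SLOT_BYTES * SLOT_BYTES;
    }
}
```

With sizes {4, 4, 4} and base 8 this gives 8, 12 and 16, the offsets used for X, Y and N. With sizes {8, 4} and base 0 it gives 0 and 8, matching the two stores that precede the call.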
AMD64 code
The 64-bit AMD64 (or Intel EM64T) instruction set significantly reduces the cost of function calls. On the AMD64, the first six arguments to a function are passed in registers, so arguments rarely need to be pushed on the stack. Compilers for the AMD64 also often do without a frame pointer, because variables can be addressed relative to the stack pointer if the compiler avoids unneeded changes to the stack pointer. Also, since the AMD64 has 16 registers, twice the 8 of the IA32, many procedures can do all their work using registers.
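The text does not name the six registers. Assuming the System V AMD64 ABI used on Linux, and considering only integer and pointer arguments (floating-point arguments travel separately and are not modeled), the assignment can be sketched as a lookup. The function name is invented for the example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: where the index-th (0-based) integer or pointer
 * argument travels under the System V AMD64 convention (an assumption).
 */
const char *integer_arg_location(size_t index)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    const size_t count = sizeof regs / sizeof regs[0];

    return index < count ? regs[index] : "stack";
}
```

Under that assumption, a seventh integer argument is the first one that has to go on the stack.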
One additional unusual feature of the AMD64 calling convention used on Linux is that every procedure is allowed to use the 128 bytes beyond the stack pointer, an area often called the red zone. This means that leaf procedures often don’t modify the stack pointer.
More information about the AMD64 can be found in x86-64 Machine-Level Programming by Randal Bryant and David O’Hallaron.