Calling convention
Introduction
Calling conventions are usually considered part of the application binary interface (ABI). They may be
considered a contract between the caller and the called function.[1]
Related concepts
The names or meanings of the parameters and return values are defined in the application programming
interface (API, as opposed to ABI), which is a separate though related concept to ABI and calling
convention. The names of members within passed structures and objects would also be considered part of
the API, and not ABI. Sometimes APIs do include keywords to specify the calling convention for
functions.
Calling conventions are unlikely to specify the layout of items within structures and objects, such as byte
ordering or structure packing.
For some languages, the calling convention includes details of error or exception handling (e.g. Go, Java),
and for others it does not (e.g. C++).
Calling conventions may be related to a particular programming language's evaluation strategy, but most
often are not considered part of it (or vice versa), as the evaluation strategy is usually defined on a higher
abstraction level and seen as a part of the language rather than as a low-level implementation detail of a
particular language's compiler.
Different calling conventions
Calling conventions may differ in:
Where parameters are placed. Options include registers, on the call stack, a mix of both, or
in other memory structures.
The order in which parameters are passed. Options include left-to-right order, or right-to-left,
or something more complex.
How functions that take a variable number of arguments (variadic functions) are handled.
Options include just passed in order (presuming the first parameter is in an obvious position)
or the variable parts in an array.
How return values are delivered from the callee back to the caller. Options include on the
stack, in a register, or reference to something allocated on the heap.
How long or complex values are handled, perhaps by splitting across multiple registers,
within the stack frame, or with reference to memory.
Which registers are guaranteed to have the same value when the callee returns as they did
when the callee was called. These registers are said to be saved or preserved, so they are
not volatile.
How the task of setting up for and cleaning up after a function call is divided between the
caller and the callee. In particular, how the stack frame is restored so the caller may
continue after the callee has finished.
Whether and how metadata describing the arguments is passed.
Where the previous value of the frame pointer is stored, which is used to restore the stack
frame when the subroutine ends. Options include within the call stack, or in a specific
register. Sometimes frame pointers are not used at all.[2]
Where any static scope links for the routine's non-local data access are placed (typically at
one or more positions in the stack frame, but sometimes in a general register, or, for some
architectures, in special-purpose registers).
For object-oriented languages, how the function's object is referenced.
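The variadic case in the list above can be illustrated from the C side. This is a minimal sketch, assuming a hypothetical function `sum_ints` whose fixed first parameter carries the argument count; how `va_arg` finds each value (register, stack slot, or save area) is exactly what the platform's calling convention decides.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example function: sums `count` int arguments.
   The fixed first parameter sits in a known position and tells the
   callee how many variable arguments follow it. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);          /* start after the last fixed parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int); /* fetch the next argument as an int */
    }
    va_end(args);
    return total;
}
```

For example, `sum_ints(3, 10, 20, 30)` walks three variable arguments and returns their sum.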
Many architectures only have one widely used calling convention, often suggested by the architect. For
RISCs including SPARC, MIPS, and RISC-V, register names based on this calling convention are often
used. For example, MIPS registers $4 through $7 have "ABI names" $a0 through $a3, reflecting their
use for parameter passing in the standard calling convention. (RISC CPUs have many equivalent general-
purpose registers, so there is typically no hardware reason for giving them names other than numbers.)
The calling convention of a given program's language may differ from the calling convention of the
underlying platform, OS, or of some library being linked to. For example, on 32-bit Windows, operating
system calls have the stdcall calling convention, whereas many C programs that run there use the cdecl
calling convention. To accommodate these differences in calling convention, compilers often permit
platform-specific keywords in a function declaration that specify the calling convention to be used for
that function. When these keywords are used correctly, the compiler generates code to call each function
in the appropriate manner.
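A minimal C sketch of such keywords follows. The macro name `CALLCONV_STDCALL` and the functions `add_pair` and `apply_pair` are hypothetical; `__stdcall` (Microsoft compilers) and `__attribute__((stdcall))` (GCC-compatible compilers) are existing spellings for 32-bit x86, and on other targets the macro is assumed to expand to nothing because only one convention is in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portability macro for this sketch. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CALLCONV_STDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#  define CALLCONV_STDCALL __attribute__((stdcall))
#else
#  define CALLCONV_STDCALL /* one platform convention; nothing to select */
#endif

/* The keyword appears in the declaration, so every caller that sees
   this prototype generates a matching call sequence. */
int CALLCONV_STDCALL add_pair(int a, int b);

int CALLCONV_STDCALL add_pair(int a, int b)
{
    return a + b;
}

/* The convention is part of the function pointer type as well. */
typedef int (CALLCONV_STDCALL *pair_fn)(int, int);

int apply_pair(pair_fn fn, int a, int b)
{
    assert(fn != NULL);
    return fn(a, b);
}
```

Keeping the keyword in a shared header is one way to ensure that the caller and the callee agree on the convention.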
Some languages allow the calling convention for a function to be explicitly specified with that function;
other languages will have some calling convention but it will be hidden from the users of that language,
and therefore will not typically be a consideration for the programmer.
Architectures
x86 (32-bit)
The 32-bit version of the x86 architecture is used with many different calling conventions. Due to the
small number of architectural registers, and historical focus on simplicity and small code-size, many x86
calling conventions pass arguments on the stack. The return value (or a pointer to it) is returned in a
register. Some conventions use registers for the first few parameters, which may improve performance,
especially for short, simple, frequently invoked leaf routines (i.e. routines that do not call other
routines).
Example call:
Typical callee structure (some or all of the instructions below, except ret, may be optimized away in
simple procedures). Some conventions leave the parameter space allocated, using plain ret instead of
ret imm16. In that case, the caller could execute add esp,12 in this example, or otherwise deal with the
change to ESP.
calc:
  push EBP            ; save old frame pointer
  mov  EBP,ESP        ; get new frame pointer
  sub  ESP,localsize  ; reserve stack space for locals
  .
  .                   ; perform calculations, leave result in EAX
  .
  mov  ESP,EBP        ; free space for locals
  pop  EBP            ; restore old frame pointer
  ret  paramsize      ; free parameter space and return
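The stack-pointer bookkeeping behind ret imm16 versus a plain ret can be checked with a little arithmetic. The sketch below is a model, not compiler output: the function `esp_after_return` and its parameters are hypothetical, and it assumes every argument occupies one 4-byte stack slot.

```c
#include <assert.h>
#include <stdint.h>

enum { WORD = 4 };  /* bytes per 32-bit stack slot */

/* Returns ESP as the caller sees it right after the callee's return
   instruction, for a call that pushed `nargs` 4-byte arguments.
   `ret_imm` models the operand of `ret imm16` (0 for a plain `ret`). */
uint32_t esp_after_return(uint32_t esp_before_pushes, unsigned nargs,
                          uint32_t ret_imm)
{
    uint32_t esp = esp_before_pushes;

    assert(esp_before_pushes >= (nargs + 2u) * WORD);
    esp -= nargs * WORD;    /* caller: push each argument            */
    esp -= WORD;            /* call: push the return address         */
    esp -= WORD;            /* callee: push EBP                      */
    /* mov EBP,ESP / sub ESP,localsize / mov ESP,EBP cancel out      */
    esp += WORD;            /* callee: pop EBP                       */
    esp += WORD + ret_imm;  /* ret: pop return address, free params  */
    return esp;
}
```

With three arguments, `ret 12` leaves ESP where it was before the pushes, while a plain `ret` leaves it 12 bytes lower, which is the amount the caller then has to add back.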
x86-64
The 64-bit version of the x86 architecture, known as x86-64, AMD64, and Intel 64, has two calling
sequences in common use. One calling sequence, defined by Microsoft, is used on Windows; the other
calling sequence, specified in the AMD64 System V ABI, is used by Unix-like systems and, with some
changes, by OpenVMS. As x86-64 has more general-purpose registers than does 32-bit x86, both
conventions pass some arguments in registers.
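The text above does not list the registers involved. As an addition, the sketch below records the integer and pointer argument registers as the two ABIs are commonly documented: six for System V, four for Microsoft. The enum and function names are hypothetical. Note one difference that the table hides: under System V the index counts only integer-class arguments, whereas under the Microsoft convention it is the argument's overall position.

```c
#include <assert.h>
#include <stddef.h>

enum abi { ABI_SYSV_AMD64, ABI_MICROSOFT_X64 };

/* Register that carries the integer or pointer argument at zero-based
   position `index`, or NULL when that argument goes on the stack. */
const char *int_arg_register(enum abi abi, size_t index)
{
    static const char *const sysv[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const ms[]   = { "rcx", "rdx", "r8", "r9" };

    assert(abi == ABI_SYSV_AMD64 || abi == ABI_MICROSOFT_X64);
    if (abi == ABI_SYSV_AMD64) {
        return index < sizeof sysv / sizeof sysv[0] ? sysv[index] : NULL;
    }
    return index < sizeof ms / sizeof ms[0] ? ms[index] : NULL;
}
```

A fifth integer argument, for instance, still travels in a register under System V but is passed on the stack under the Microsoft convention.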
ARM (A32)
The standard 32-bit ARM calling convention allocates the 16 general-purpose registers as:
r15: Program counter (as per the instruction set specification).
r14: Link register. The BL instruction, used in a subroutine call, stores the return address in
this register.
r13: Stack pointer. In "Thumb" operating mode, the Push/Pop instructions use only this
register.
r12: Intra-Procedure-call scratch register.
r4 to r11: Local variables.
r0 to r3: Argument values passed to a subroutine and results returned from a subroutine.
If the value returned is of a type too large to fit in r0 to r3, or its size cannot be determined statically
at compile time, then the caller must allocate space for that value at run time, and pass a pointer to that
space in r0.
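The rule for large return values can be pictured at the C level. In this sketch, `struct big`, `make_big` and `make_big_lowered` are hypothetical names; the second function shows roughly what the first is turned into when the caller supplies the result space and passes its address as a hidden first argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32 bytes: more than r0 to r3 (16 bytes) can hold. */
struct big { int32_t v[8]; };

/* Source-level form: the struct is returned by value. */
struct big make_big(int32_t seed)
{
    struct big b;
    for (int i = 0; i < 8; i++) {
        b.v[i] = seed + i;
    }
    return b;
}

/* Approximate lowered form: the caller allocates the space and passes
   a pointer to it, which is what would travel in r0. */
void make_big_lowered(struct big *result, int32_t seed)
{
    assert(result != NULL);
    for (int i = 0; i < 8; i++) {
        result->v[i] = seed + i;
    }
}
```

Both forms produce the same bytes in the caller's storage; only the second makes the hidden pointer visible.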
Subroutines must preserve the contents of r4 to r11 and the stack pointer (perhaps by saving them to the
stack in the function prologue, then using them as scratch space, then restoring them from the stack in the
function epilogue). In particular, subroutines that call other subroutines must save the return address in
the link register r14 to the stack before calling those other subroutines. However, such subroutines do not
need to return that value to r14—they merely need to load that value into r15, the program counter, to
return.
The ARM calling convention mandates using a full-descending stack. In addition, the stack pointer must
always be 4-byte aligned, and must always be 8-byte aligned at a function call with a public interface.[3]
A typical subroutine that follows this convention proceeds as follows:
In the prologue, push r4 to r11 to the stack, and push the return address in r14 to the stack
(this can be done with a single STM instruction);
Copy any passed arguments (in r0 to r3) to the local scratch registers (r4 to r11);
Allocate other local variables to the remaining local scratch registers (r4 to r11);
Do calculations and call other subroutines as necessary using BL, assuming r0 to r3, r12
and r14 will not be preserved;
Put the result in r0;
In the epilogue, pull r4 to r11 from the stack, and pull the return address to the program
counter r15. This can be done with a single LDM instruction.
ARM (A64)
The 64-bit ARM (AArch64) calling convention allocates the 31 general-purpose registers as:[4]
RISC-V ISA
RISC-V has a defined calling convention with two flavors, with or without floating point.[6] It passes
arguments in registers whenever possible.
POWER, PowerPC, and Power ISA
On these architectures, branch-and-link instructions store the return address in a special link register
separate from the general-purpose registers; a routine returns to its caller with a branch instruction that
uses the link register as the destination address. Leaf routines do not need to save or restore the link
register. Non-leaf routines must save the return address before calling another routine and restore it
before returning. To save it, a routine uses the Move From Special Purpose Register instruction to copy
the link register to a general-purpose register and, if necessary, stores that register to the stack. To
restore it, the routine loads the saved value from the stack into a general-purpose register, if it was
saved there, and then uses the Move To Special Purpose Register instruction to copy that register to the
link register.
MIPS
The O32[7] ABI is the most commonly used ABI, owing to its status as the original System V ABI for
MIPS.[8] It is strictly stack-based, with only four registers $a0-$a3 available to pass arguments. This
perceived slowness, along with an antique floating-point model with only 16 registers, has encouraged the
proliferation of many other calling conventions. The ABI took shape in 1990 and has not been updated
since 1994. It is only defined for 32-bit MIPS, but GCC has created a 64-bit variation called O64.[9]
For 64-bit, the N64 ABI (not related to Nintendo 64) by Silicon Graphics is most commonly used. The
most important improvement is that eight registers are now available for argument passing; it also
increases the number of floating-point registers to 32. There is also an ILP32 version called N32, which
uses 32-bit pointers for smaller code, analogous to the x32 ABI. Both run under the 64-bit mode of the
CPU.[9]
A few attempts have been made to replace O32 with a 32-bit ABI that resembles N32 more. A 1995
conference came up with MIPS EABI, for which the 32-bit version was quite similar.[10] EABI inspired
MIPS Technologies to propose a more radical "NUBI" ABI that additionally reuses argument registers for
the return value.[11] MIPS EABI is supported by GCC but not LLVM; neither supports NUBI.
For all of O32 and N32/N64, the return address is stored in the $ra register. This is automatically set with
the use of the JAL (jump and link) or JALR (jump and link register) instructions. The stack grows
downwards.
SPARC
The SPARC architecture, unlike most RISC architectures, is built on register windows. There are 24
accessible registers in each register window: 8 are the "in" registers (%i0-%i7), 8 are the "local" registers
(%l0-%l7), and 8 are the "out" registers (%o0-%o7). The "in" registers are used to pass arguments to the
function being called, and any additional arguments need to be pushed onto the stack. However, space is
always allocated by the called function to handle a potential register window overflow, local variables,
and (on 32-bit SPARC) returning a struct by value. To call a function, one places the arguments for the
function to be called in the "out" registers; when the function is called, the "out" registers become the "in"
registers and the called function accesses the arguments in its "in" registers. When the called function
completes, it places the return value in the first "in" register, which becomes the first "out" register when
the called function returns.
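The overlap between a caller's "out" registers and its callee's "in" registers can be modelled in a few lines of C. This is a simplified sketch with hypothetical names: it uses a flat register file with four windows, moves the window index upward on a call (real hardware adjusts a window pointer and wraps around), and ignores overflow traps and the global registers.

```c
#include <assert.h>
#include <stdint.h>

enum { WINDOWS = 4, WINDOW_STEP = 16 };

/* Each window spans 24 registers (in, local, out) and successive
   windows start 16 apart, so one window's 8 "out" registers are the
   next window's 8 "in" registers. */
struct cpu {
    int32_t file[WINDOWS * WINDOW_STEP + 8];
    int window;  /* index of the current window */
};

int32_t *in_reg(struct cpu *c, int n)
{
    return &c->file[c->window * WINDOW_STEP + n];
}

int32_t *out_reg(struct cpu *c, int n)
{
    return &c->file[c->window * WINDOW_STEP + 16 + n];
}

/* Callee: adds its first two arguments and leaves the sum in %i0. */
static void callee_add(struct cpu *c)
{
    *in_reg(c, 0) = *in_reg(c, 0) + *in_reg(c, 1);
}

/* Caller: places arguments in %o0 and %o1, calls, reads %o0. */
int32_t call_add(struct cpu *c, int32_t a, int32_t b)
{
    assert(c->window + 1 < WINDOWS);
    *out_reg(c, 0) = a;
    *out_reg(c, 1) = b;
    c->window++;   /* models the window shift on entry to the callee */
    callee_add(c);
    c->window--;   /* models the shift back on return */
    return *out_reg(c, 0);
}
```

No values are copied between caller and callee here; the same storage is simply viewed under a different name after the window shifts.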
The System V ABI,[12] which most modern Unix-like systems follow, passes the first six arguments in
"in" registers %i0 through %i5, reserving %i6 for the frame pointer and %i7 for the return address.
IBM System/360 and successors
Calling program:
Called program:
         USING *,15             (note 3)
         STM   14,12,12(13)     Save registers (note 4)
         ST    13,SAVE+4        Save caller's savearea addr
         LA    12,SAVE          Chain saveareas
         ST    12,8(13)
         LR    13,12
         ...
         L     13,SAVE+4        (note 5)
         LM    14,12,12(13)
         L     15,RETVAL        (note 6)
         BR    14               Return to caller
SAVE     DS    18F              Savearea (note 7)
Notes:
1. The BALR instruction stores the address of the next instruction (return address) in the
register specified by the first argument—register 14—and branches to the second argument
address in register 15.
2. The caller passes the address of a list of argument addresses in register 1. The last address
has the high-order bit set to indicate the end of the list. This limits programs using this
convention to 31-bit addressing.
3. The address of the called routine is in register 15. Normally this is loaded into another
register and register 15 is not used as a base register.
4. The STM instruction saves registers 14, 15, and 0 through 12 in a 72-byte save area that is
provided by the caller and pointed to by register 13. The called routine provides its own
save area for use by subroutines it calls; the address of this area is normally kept in register
13 throughout the routine. The instructions following STM update the forward and backward
chains linking this save area to the caller's save area.
5. The return sequence restores the caller's registers.
6. Register 15 is usually used to pass a return value.
7. Declaring a savearea statically in the called routine makes it non-reentrant and non-
recursive; a reentrant program uses a dynamic savearea, acquired either from the
operating system and freed upon returning, or in storage passed by the calling program.
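Note 2 describes an argument list whose last entry is flagged in its high-order bit. The following C sketch models that encoding with 32-bit words standing in for addresses. All names are hypothetical, and the words are plain integers rather than real machine addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define END_OF_LIST 0x80000000u  /* high-order bit of a 32-bit word */

/* Caller side: copies `count` simulated addresses into `list` and flags
   the last one. Returns 0 on success, or -1 if the list is empty or an
   address does not fit in 31 bits. */
int build_arg_list(const uint32_t *addrs, size_t count, uint32_t *list)
{
    if (count == 0) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        if (addrs[i] & END_OF_LIST) {
            return -1;  /* the flag bit is not available for addressing */
        }
        list[i] = addrs[i];
    }
    list[count - 1] |= END_OF_LIST;
    return 0;
}

/* Callee side: walks the list until it reaches the flagged entry. */
size_t count_args(const uint32_t *list)
{
    size_t n = 0;

    assert(list != NULL);
    do {
        n++;
    } while ((list[n - 1] & END_OF_LIST) == 0);
    return n;
}

/* Strips the flag to recover the address held in an entry. */
uint32_t arg_address(uint32_t entry)
{
    return entry & ~END_OF_LIST;
}
```

Because one bit of every entry is reserved for the flag, only 31 bits remain for the address, which is the limit the note mentions.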
In the System/390 ABI[13] and the z/Architecture ABI,[14] used in Linux:
Register | Windows CE 5.0 (http://msdn.microsoft.com/fr-fr/library/ms253572.aspx) | gcc (https://web.archive.org/web/20141105000036/http://www.kpitgnutools.com/manuals/SH-ABI-Specification.html) | Renesas (http://documentation.renesas.com/eng/products/tool/rej10b0152_sh.pdf)
68k
The most common calling convention for the Motorola 68000 series is:[15][16][17][18]
IBM 1130
The IBM 1130 was a small 16-bit word-addressable machine. It had only six registers plus condition
indicators, and no stack. The registers are Instruction Address Register (IAR), Accumulator (ACC),
Accumulator Extension (EXT), and three index registers X1–X3. The calling program is responsible for
saving ACC, EXT, X1, and X2.[19] There are two pseudo-operations for calling subroutines, CALL to
code non-relocatable subroutines directly linked with the main program, and LIBF to call relocatable
library subroutines through a transfer vector.[20] Both pseudo-ops resolve to a Branch and Store IAR
(BSI) machine instruction that stores the address of the next instruction at its effective address (EA) and
branches to EA+1.
Arguments follow the BSI; usually these are one-word addresses of arguments. The called routine must
know how many arguments to expect so that it can skip over them on return. Alternatively, arguments can
be passed in registers. Function routines return the result in ACC for integer results or, for real results,
in a memory location referred to as the Real Number Pseudo-Accumulator (FAC). Arguments and the return
address are addressed using an offset to the IAR value stored in the first location of the subroutine.
Subroutines in IBM 1130, CDC 6600 and PDP-8 (all three computers were introduced in 1965) store the
return address in the first location of a subroutine.[21]
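The BSI mechanism with inline arguments can be simulated in C over an array of 16-bit words. This is a sketch under stated assumptions: the memory layout, the addresses and every name are hypothetical, and the BSI instruction is treated as occupying a single word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MEM_WORDS = 64, CALL_SITE = 10, ENTRY = 20, DATA = 40 };

/* Models BSI: store the address of the word after the instruction in
   the subroutine's first location. Execution would continue at
   entry + 1. */
static void bsi(uint16_t *mem, uint16_t call_site, uint16_t entry)
{
    mem[entry] = (uint16_t)(call_site + 1);
}

/* Subroutine body: adds the two values whose addresses follow the BSI.
   Leaves the sum in *acc and returns the address where the caller
   resumes. */
static uint16_t add2(const uint16_t *mem, uint16_t entry, uint16_t *acc)
{
    uint16_t link = mem[entry];       /* address of the first argument word */
    uint16_t a = mem[mem[link]];
    uint16_t b = mem[mem[link + 1]];

    *acc = (uint16_t)(a + b);
    return (uint16_t)(link + 2);      /* skip the two argument words */
}

/* Lays out a caller, its inline argument list and the data, then calls. */
uint16_t demo_call(uint16_t *resume_at)
{
    uint16_t mem[MEM_WORDS] = {0};
    uint16_t acc = 0;

    assert(resume_at != NULL);
    mem[DATA] = 7;
    mem[DATA + 1] = 35;
    mem[CALL_SITE + 1] = DATA;        /* first argument address  */
    mem[CALL_SITE + 2] = DATA + 1;    /* second argument address */

    bsi(mem, CALL_SITE, ENTRY);
    *resume_at = add2(mem, ENTRY, &acc);
    return acc;
}
```

The subroutine has to know that it takes two arguments, because the stored return address points at the argument words rather than at the next instruction.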
Threaded code
Threaded code places all the responsibility for setting up for and cleaning up after a function call on the
called code. The calling code does nothing but list the subroutines to be called. This puts all the function
setup and clean-up code in one place—the prologue and epilogue of the function—rather than in the
many places that function is called. This makes threaded code the most compact calling convention.
Threaded code passes all arguments on the stack. All return values are returned on the stack. This makes
naive implementations slower than calling conventions that keep more values in registers. However,
threaded code implementations that cache several of the top stack values in registers—in particular, the
return address—are usually faster than subroutine calling conventions that always push and pop the return
address to the stack.[22][23][24]
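A loose C analogue of this arrangement is sketched below. The calling code is only a list of routine addresses, and all arguments and results pass through a data stack. The names are hypothetical, and a real threaded-code system dispatches with jumps instead of the C function calls used in this loop.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_MAX = 16 };

struct vm {
    long stack[STACK_MAX];
    size_t depth;
};

typedef void (*word_fn)(struct vm *);

static void push(struct vm *vm, long v)
{
    assert(vm->depth < STACK_MAX);
    vm->stack[vm->depth++] = v;
}

static long pop(struct vm *vm)
{
    assert(vm->depth > 0);
    return vm->stack[--vm->depth];
}

/* Each routine takes its operands from the stack and pushes its result. */
static void w_two(struct vm *vm)   { push(vm, 2); }
static void w_three(struct vm *vm) { push(vm, 3); }
static void w_add(struct vm *vm)   { long b = pop(vm); long a = pop(vm); push(vm, a + b); }
static void w_dup(struct vm *vm)   { long a = pop(vm); push(vm, a); push(vm, a); }
static void w_mul(struct vm *vm)   { long b = pop(vm); long a = pop(vm); push(vm, a * b); }

/* Runs a NULL-terminated list of routine addresses, returns the top of stack. */
long run_thread(const word_fn *thread)
{
    struct vm vm = { {0}, 0 };

    for (; *thread != NULL; thread++) {
        (*thread)(&vm);
    }
    return pop(&vm);
}

/* Computes (2 + 3) squared using nothing but a list of routines. */
long square_of_five(void)
{
    static const word_fn thread[] = { w_two, w_three, w_add, w_dup, w_mul, NULL };
    return run_thread(thread);
}
```

The `thread` array contains no argument setup or cleanup at all, which is where the compactness comes from.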
PL/I
The default calling convention for programs written in the PL/I language passes all arguments by
reference, although other conventions may optionally be specified. The arguments are handled differently
for different compilers and platforms, but typically the argument addresses are passed via an argument list
in memory. A final, hidden address may be passed pointing to an area to contain the return value.
Because of the wide variety of data types supported by PL/I, a data descriptor may also be passed to
define, for example, the lengths of character or bit strings, the dimension and bounds of arrays (dope
vectors), or the layout and contents of a data structure. Dummy arguments are created for arguments
that are constants or that do not agree with the type of argument the called procedure expects.
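By-reference passing and dummy arguments can be imitated in C with explicit pointers. This is an analogy only, with hypothetical function names. A PL/I compiler creates the dummy argument itself, whereas the sketch spells it out as a temporary copy.

```c
#include <assert.h>
#include <stddef.h>

/* The callee receives an address and updates the caller's storage. */
void increment(int *counter)
{
    assert(counter != NULL);
    *counter += 1;
}

/* Passing a variable: the callee's update is visible to the caller. */
int pass_variable(int start)
{
    int x = start;

    increment(&x);
    return x;
}

/* Passing a constant: a dummy argument (a temporary copy) is created
   and its address is passed, so the update is discarded. */
int pass_constant(void)
{
    const int five = 5;
    int dummy = five;   /* dummy argument standing in for the constant */

    increment(&dummy);
    return five;        /* the constant itself is unchanged */
}
```

The same copying step applies when an argument's type does not match what the callee expects: the callee then works on a converted copy and not on the original.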
References
1. "Calling Conventions" (https://www.cs.cornell.edu/courses/cs4120/2021sp/notes/callconv/). cs.cornell.edu. Retrieved 2024-03-05.
2. "/Oy (Frame-Pointer Omission)" (https://learn.microsoft.com/en-us/cpp/build/reference/oy-frame-pointer-omission). learn.microsoft.com. 3 August 2021. Retrieved 2024-06-14.
3. "Procedure Call Standard for the ARM Architecture" (https://github.com/ARM-software/abi-aa/blob/2bcab1e3b22d55170c563c3c7940134089176746/aapcs32/aapcs32.rst). 2021.
4. "Parameters in general-purpose registers" (https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers). ARM Cortex-A Series Programmer’s Guide for ARMv8-A. Retrieved 12 November 2020.
5. "Parameters in NEON and floating-point registers" (https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-NEON-and-floating-point-registers). developer.arm.com. Retrieved 13 November 2020.
6. "RISC-V calling convention" (https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf) (PDF).
7. "MIPS32 Instruction Set Quick Reference" (https://www.mips.com/?do-download=mips32-instruction-set-quick-reference-v1-01).
8. Sweetman, Dominic. See MIPS Run (2 ed.). Morgan Kaufmann Publishers. ISBN 0-12088-421-6.
9. "MIPS ABI History" (https://www.linux-mips.org/wiki/MIPS_ABI_History).
10. Christopher, Eric (11 June 2003). "mips eabi documentation" (https://sourceware.org/legacy-ml/binutils/2003-06/msg00436.html). binutils@sources.redhat.com (Mailing list). Retrieved 19 June 2020.
11. "NUBI" (https://www.linux-mips.org/wiki/NUBI).
12. System V Application Binary Interface SPARC Processor Supplement (http://sparc.org/wp-content/uploads/2014/01/psABI3rd.pdf.gz) (3 ed.).
13. "S/390 ELF Application Binary Interface Supplement" (http://refspecs.linuxbase.org/ELF/zSeries/lzsabi0_s390.html).
14. "zSeries ELF Application Binary Interface Supplement" (https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries.html#AEN410).
15. Smith, Dr. Mike. "SHARC (21k) and 68k Register Comparison" (http://people.ucalgary.ca/~smithmr/2002webs/encm515_02/02general/background_info/registercompare.htm).
16. XGCC: The Gnu C/C++ Language System for Embedded Development (http://gendev.spritesmind.net/files/xgcc/xgcc.pdf) (PDF). Embedded Support Tools Corporation. 2000. p. 59.
17. "COLDFIRE/68K: ThreadX for the Freescale ColdFire Family" (https://web.archive.org/web/20151002215924/http://rtos.com/products/threadx/ColdFire68K#). Archived from the original (http://rtos.com/products/threadx/ColdFire68K) on 2015-10-02.
18. Moshovos, Andreas. "Subroutines Continued: Passing Arguments, Returning Values and Allocating Local Variables" (http://www.eecg.toronto.edu/~moshovos/ECE243-06/l12-subroutines-2.html). "all registers except d0, d1, a0, a1 and a7 should be preserved across a call."
19. IBM Corporation (1967). IBM 1130 Disk Monitor System, Version 2 System Introduction (C26-3709-0) (http://media.ibm1130.org/E0018.pdf) (PDF). p. 67. Retrieved 21 December 2014.
20. IBM Corporation (1968). IBM 1130 Assembler Language (C26-5927-4) (http://media.ibm1130.org/E0022.pdf) (PDF). pp. 24–25.
21. Smotherman, Mark (2004). "Subroutine and procedure call support: Early history" (http://people.cs.clemson.edu/~mark/subroutines.html).
22. Rodriguez, Brad. "Moving Forth, Part 1: Design Decisions in the Forth Kernel" (http://www.bradrodriguez.com/papers/moving1.htm). "On the 6809 or Zilog Super8, DTC is faster than STC."
23. Ertl, Anton. "Speed of various interpreter dispatch techniques" (http://www.complang.tuwien.ac.at/forth/threading/).
24. Zaleski, Mathew (2008). "Chapter 4: Design and Implementation of Efficient Interpretation" (http://www.cs.toronto.edu/~matz/dissertation/matzDissertation-latex2html/node7.html). YETI: a graduallY Extensible Trace Interpreter. "Although direct-threaded interpreters are known to have poor branch prediction properties... the latency of a call and return may be greater than an indirect jump."
External links
Johnson, Stephen Curtis; Ritchie, Dennis MacAlistair (September 1981). "Computing Science Technical Report No. 102: The C Language Calling Sequence" (https://www.bell-labs.com/usr/dmr/www/clcs.html). Bell Laboratories.
Introduction to assembly on the PowerPC (http://www.ibm.com/developerworks/library/l-ppc/)
Mac OS X ABI Function Call Guide (https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/000-Introduction/introduction.html)
Procedure Call Standard for the ARM Architecture (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf)
Embedded Programming with the GNU Toolchain, Section 10. C Startup (http://www.bravegnu.org/gnu-eprog/c-startup.html)