Linux - Syscalls
Lionel Auroux
2017-09-29

Generalities
The syscall userland interfaces
Implementation
A guided tour of some syscalls

Generalities

What is a syscall?
User space can issue requests to the kernel in order to access its
resources or perform restricted operations.

You can think of a syscall as a regular function call, but where the
code being called is in the kernel.

Syscall usages:
- Manipulating files and VFS: open, read, write, ...
- System setup: gettimeofday, swapon, reboot, ...
- Process and memory management: clone, mmap, ...
- Manipulating devices: ioctl, mount, ...
- Cryptography and security: seccomp, getrandom, ...
- ...

The syscall userland interfaces

In assembly
On x86
    mov eax, 1      ; exit
    xor ebx, ebx    ; exit status 0
    int 0x80        ; or sysenter
- Syscall number: eax
- Arguments: ebx, ecx, edx, esi, edi, ebp (six at most; nothing is
  passed on the stack with int 0x80)

On x86_64
    mov rax, 60     ; exit
    xor edi, edi    ; exit status 0
    syscall
- Syscall number: rax
- Arguments: rdi, rsi, rdx, r10, r8 and r9 (r10 replaces rcx, which the
  syscall instruction overwrites); no arguments in memory

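The x86_64 convention above can be tried from C with a little inline assembly. This is a minimal sketch, assuming GCC or Clang on Linux; raw_syscall3 and raw_hello are hypothetical helper names, not libc functions, and on other architectures the sketch simply falls back to syscall(2).

```c
#define _GNU_SOURCE        /* for the syscall() fallback declaration */
#include <sys/syscall.h>   /* SYS_getpid, SYS_write */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: issue a syscall with up to three arguments. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__)
    long ret;
    /* rax = syscall number, rdi/rsi/rdx = first three arguments.
     * The syscall instruction itself clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;            /* a negative errno value on failure */
#else
    /* Assumption: elsewhere, defer to the libc entry point. */
    return syscall(nr, a1, a2, a3);
#endif
}

/* Write a short message to stdout; returns the number of bytes written. */
static long raw_hello(void)
{
    static const char msg[] = "hello from a raw syscall\n";
    return raw_syscall3(SYS_write, 1, (long)msg, (long)(sizeof(msg) - 1));
}
```

Calling raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid().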
syscall(2)
    #include <unistd.h>
    #include <sys/syscall.h> /* for __NR_xxx */

    long syscall(long number, ...);

- Copies the arguments and syscall number to the registers.
- Traps to kernel code.
- Sets errno if the syscall returns an error.

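The three steps above can be observed from a small program. This is a sketch, assuming Linux with glibc or musl; the function names are made up for illustration.

```c
#define _GNU_SOURCE        /* syscall() is not part of strict ISO C/POSIX */
#include <errno.h>
#include <sys/syscall.h>   /* SYS_getpid, SYS_close */
#include <unistd.h>

/* getpid through the generic entry point instead of the libc wrapper. */
static long getpid_via_syscall(void)
{
    return syscall(SYS_getpid);
}

/* Close an invalid descriptor on purpose and report the errno that
 * syscall(2) sets when the kernel returns an error. */
static int errno_of_bad_close(void)
{
    errno = 0;
    if (syscall(SYS_close, -1) == -1)
        return errno;      /* EBADF is expected here */
    return 0;
}
```

The second helper shows the errno convention: the kernel returns a negative error code, and syscall(2) turns it into a -1 return value plus errno.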
Don’t panic!
- You will learn all about that in kernel from scratch!
- You almost never use direct calls to syscall(2).
- Your libc provides wrappers for most of the syscalls you need.
- Linux also abstracts all those details in kernel code.
- For a list of the Linux system calls, see syscalls(2).

vdso(7)
Virtual dynamic shared object (vDSO)
- Small shared library (8k) that the kernel automatically maps into the
  address space of all user-space applications.
- Contains non-privileged code and data: gettimeofday, time,
  clock_gettime, ... (arch-dependent)
- The ELF must be dynamically linked.

Why?
- Making system calls can be slow.
- On 32-bit x86, int 0x80 is expensive: it goes through the full
  interrupt-handling paths in the processor’s microcode as well as in
  the kernel.
- Even with a dedicated instruction (syscall), context switching must
  be done.

Context switch
A context is:
- The CPU registers (including the instruction pointer)
- The state of a process (including threads):
  - Memory state: stack, page tables, etc.
  - CPU state: registers, caches, etc.
  - Process scheduler state
  - ...

vdso in action
$ cat time.c
int main(int ac, char **av) {
    printf("%d\n", time(0));
}

$ gcc time.c -o time -static
$ strace -e time ./time
time(NULL) = 1411171041
1411171041
+++ exited with 11 +++
$ gcc time.c -o time
$ ldd ./time
    linux-vdso.so.1 (0x00007fffe1735000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007fee5e753000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fee5eb01000)
$ strace -e time ./time
1411171118
+++ exited with 11 +++

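A process can also check for the vDSO mapping itself. The kernel passes its address in the auxiliary vector under the AT_SYSINFO_EHDR key. This is a minimal sketch, assuming glibc 2.16 or later (or musl) for getauxval(); vdso_present is a hypothetical name.

```c
#include <elf.h>        /* AT_SYSINFO_EHDR */
#include <string.h>     /* memcmp() */
#include <sys/auxv.h>   /* getauxval() */

/* Returns 1 if a vDSO is mapped and starts with the ELF magic,
 * 0 if the kernel did not advertise one,
 * -1 if the advertised mapping does not look like an ELF image. */
static int vdso_present(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);

    if (base == 0)
        return 0;
    return memcmp((const void *)base, "\177ELF", 4) == 0 ? 1 : -1;
}
```

On a typical x86_64 Linux system this should return 1, since the vDSO is an ordinary ELF shared object mapped by the kernel.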
Implementation

Defining a syscall
Use the SYSCALL_DEFINEx(syscall, ...) macros anywhere in Linux code.

These macros expand to:
- SYSCALL_METADATA(syscall, ...): generates metadata used in the FTRACE
  tracing framework.
- __SYSCALL_DEFINEx(syscall, ...): more function definition expansion.
  - Ultimately expands to: asmlinkage long SyS_syscall(..)
  - asmlinkage tells the compiler to expect the arguments on the stack
    (this matters on 32-bit x86).

Example
In kernel/signal.c:

    SYSCALL_DEFINE0(pause)
    {
            while (!signal_pending(current)) {
                    current->state = TASK_INTERRUPTIBLE;
                    schedule();
            }
            return -ERESTARTNOHAND;
    }

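From userland, the -ERESTARTNOHAND return value above is never seen directly: once a signal handler has run, pause(2) returns -1 with errno set to EINTR. The sketch below shows that behaviour. It assumes a 100 ms one-shot timer is long enough for the process to reach pause() before the signal arrives, and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>   /* setitimer() */
#include <unistd.h>     /* pause() */

static void on_alarm(int signo)
{
    (void)signo;        /* nothing to do: the handler only has to exist */
}

/* Arm a one-shot 100 ms timer, sleep in pause(2), and return the errno
 * observed once the SIGALRM handler has run. */
static int pause_until_signal(void)
{
    struct sigaction sa;
    struct itimerval timer;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 100 * 1000;
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1)
        return -1;

    errno = 0;
    pause();            /* returns -1 after a handler has run */
    return errno;       /* EINTR is expected */
}
```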
Side notes
current
    #include <asm/current.h>
    ...
    pr_debug("The process is \"%s\" (pid %i)\n",
             current->comm, current->pid);

signal_pending
    static inline int signal_pending(struct task_struct *p)
    {
            return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
    }

schedule()
Ask the scheduling subsystem to pick the next process to run.

The syscalls tables
See arch/x86/entry/syscalls/syscall_{32,64}.tbl.

syscall_32.tbl
    # <number> <abi> <name>           <entry point>        <compat entry point>
    0          i386  restart_syscall  sys_restart_syscall
    1          i386  exit             sys_exit
    2          i386  fork             sys_fork             stub32_fork
    3          i386  read             sys_read
    4          i386  write            sys_write
    5          i386  open             sys_open             compat_sys_open
    6          i386  close            sys_close

syscall_64.tbl
    0          common  read           sys_read
    1          common  write          sys_write
    2          common  open           sys_open
    3          common  close          sys_close
    4          common  stat           sys_newstat
    5          common  fstat          sys_newfstat
    ...
    16         64      ioctl          sys_ioctl
    ...
    514        x32     ioctl          compat_sys_ioctl

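Userland sees the numbers from these tables as the SYS_xxx (or __NR_xxx) constants in <sys/syscall.h>. The sketch below builds a tiny name-to-number lookup from those constants. The struct and function names are hypothetical, and only syscalls available on most architectures are listed.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>   /* SYS_read, SYS_write, ... */
#include <unistd.h>

struct sys_entry {
    const char *name;
    long nr;
};

/* The numbers come from the headers generated for the build target. */
static const struct sys_entry sys_table[] = {
    { "read",   SYS_read   },
    { "write",  SYS_write  },
    { "close",  SYS_close  },
    { "getpid", SYS_getpid },
};

/* Returns the syscall number for name, or -1 if it is not in the table. */
static long sys_number(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(sys_table) / sizeof(sys_table[0]); i++)
        if (strcmp(sys_table[i].name, name) == 0)
            return sys_table[i].nr;
    return -1;
}
```

On x86_64, sys_number("write") should come out as 1, matching the syscall_64.tbl excerpt; on i386 it should be 4.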
Generation
Kbuild calls syscalltbl.sh to generate
arch/x86/include/generated/asm/syscalls_{64,32}.h.

The same goes for syscallhdr.sh.

A guided tour of some syscalls

sysinfo
kernel/sys.c

    SYSCALL_DEFINE1(sysinfo, struct sysinfo __user *, info)
    {
            struct sysinfo val;

            do_sysinfo(&val);

            if (copy_to_user(info, &val, sizeof(struct sysinfo)))
                    return -EFAULT;

            return 0;
    }

User data
__user
Used by tools such as sparse to statically check the use of userspace
pointers.

    # define __user __attribute__((noderef, address_space(1)))

copy_to_user
Copy data from kernel land to user land.
Checks that all bytes are writeable, using:

    access_ok(VERIFY_WRITE, addr_to, length)

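The copy_to_user failure path of sysinfo can be triggered from userland by passing a pointer the kernel cannot write to. This is a sketch under the assumption that address 1 is never mapped in the calling process (true with the usual mmap_min_addr settings); the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>   /* SYS_sysinfo */
#include <sys/sysinfo.h>   /* struct sysinfo */
#include <unistd.h>

/* Valid buffer: the kernel fills it in and the call returns 0. */
static long uptime_seconds(void)
{
    struct sysinfo info;

    if (syscall(SYS_sysinfo, &info) == -1)
        return -1;
    return info.uptime;
}

/* Invalid buffer: copy_to_user fails in the kernel and the caller
 * sees -1 with errno set. */
static int errno_of_bad_sysinfo(void)
{
    errno = 0;
    if (syscall(SYS_sysinfo, (void *)1) == -1)
        return errno;      /* EFAULT is expected */
    return 0;
}
```

The raw syscall(2) entry point is used for the second call so that the bad pointer reaches the kernel untouched.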
ioctl
    #include <sys/ioctl.h>

    int ioctl(int d, unsigned long request, ...);

- Control devices.
- A big mess:
  - Request numbers encode data.
  - Request data is untyped (void *).
- See LDD3, Chapter 6: Advanced Char Driver Operations.

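To make "request numbers encode data" concrete, the sketch below builds a request with the _IOR macro and decodes it again with the _IOC_* accessors from <linux/ioctl.h>. DEMO_GET_VALUE is a made-up request for illustration only; it does not belong to any real driver.

```c
#include <linux/ioctl.h>   /* _IOR, _IOC_DIR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */

/* Hypothetical request: "read an int from the driver", magic 'k', number 1. */
#define DEMO_GET_VALUE _IOR('k', 1, int)

struct ioctl_fields {
    unsigned int dir;    /* transfer direction: _IOC_READ, _IOC_WRITE, ... */
    unsigned int type;   /* driver "magic" character */
    unsigned int nr;     /* command number within the driver */
    unsigned int size;   /* size of the argument the pointer refers to */
};

/* Split a request number into the fields packed inside it. */
static struct ioctl_fields decode_request(unsigned long request)
{
    struct ioctl_fields f;

    f.dir  = _IOC_DIR(request);
    f.type = _IOC_TYPE(request);
    f.nr   = _IOC_NR(request);
    f.size = _IOC_SIZE(request);
    return f;
}
```

Note that many older requests predate this encoding and are plain numbers, which is part of the mess.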
clone
    SYSCALL_DEFINE5(clone, unsigned long, clone_flags,
                    unsigned long, newsp,
                    int __user *, parent_tidptr,
                    int __user *, child_tidptr,
                    int, tls_val)
    {
            return do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr);
    }

fork
    SYSCALL_DEFINE0(fork)
    {
            return do_fork(SIGCHLD, 0, 0, NULL, NULL);
    }

vfork
    SYSCALL_DEFINE0(vfork)
    {
            return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0,
                           0, NULL, NULL);
    }

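Since fork is just do_fork with SIGCHLD as its only flag, a raw clone call with the same flags behaves like fork from userland. This is a sketch, assuming an architecture where the flags are the first clone argument (x86 and arm64 among others; not s390); the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>        /* SIGCHLD */
#include <sys/syscall.h>   /* SYS_clone */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a child with a raw clone(SIGCHLD, 0, ...) call and return the
 * exit status the child reports. */
static int child_status_via_clone(int code)
{
    int status;
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, 0UL, 0UL, 0UL);

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);       /* child: leave immediately */
    if (waitpid((pid_t)pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a null stack pointer the child keeps running on a copy of the parent's stack, exactly as after fork.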
personality
    #include <sys/personality.h>

    int personality(unsigned long persona);

- Sets the process execution domain
- Used by setarch
- Tweak:
  - uname-2.6
  - exposed architecture (i386, i486, i586, etc.)
  - STICKY_TIMEOUTS
  - ...

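The current persona can be read without changing it by passing 0xffffffff, as documented in personality(2). A small sketch, with hypothetical helper names:

```c
#include <sys/personality.h>   /* personality(), STICKY_TIMEOUTS */

/* Query the current persona; 0xffffffff means "do not modify". */
static int current_persona(void)
{
    return personality(0xffffffff);
}

/* Returns 1 if the STICKY_TIMEOUTS flag is set, 0 if not, -1 on error. */
static int sticky_timeouts_enabled(void)
{
    int persona = current_persona();

    if (persona == -1)
        return -1;
    return (persona & STICKY_TIMEOUTS) ? 1 : 0;
}
```

For an ordinary process started without setarch, the persona is usually 0 (PER_LINUX).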
reboot
    #include <unistd.h>
    #include <linux/reboot.h>

    int reboot(int magic, int magic2, int cmd, void *arg);

This system call will fail (with EINVAL) unless magic equals
LINUX_REBOOT_MAGIC1 (that is, 0xfee1dead) and magic2 equals
LINUX_REBOOT_MAGIC2 (that is, 672274793). However, since 2.1.17 also
LINUX_REBOOT_MAGIC2A (that is, 85072278) and since 2.1.97 also
LINUX_REBOOT_MAGIC2B (that is, 369367448) and since 2.5.71 also
LINUX_REBOOT_MAGIC2C (that is, 537993216) are permitted as value for
magic2. (The hexadecimal values of these constants are meaningful.)

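To see why the hexadecimal values are meaningful, the constants can be formatted as hex: each one then reads like a date written as DDMMYYYY. A small sketch, assuming the kernel headers are installed; magic_reads_as is a hypothetical helper.

```c
#include <linux/reboot.h>   /* LINUX_REBOOT_MAGIC2, 2A, 2B, 2C */
#include <stdio.h>
#include <string.h>

/* Returns 1 if magic, printed as 8 hexadecimal digits, equals digits. */
static int magic_reads_as(unsigned long magic, const char *digits)
{
    char buf[9];

    snprintf(buf, sizeof(buf), "%08lx", magic);
    return strcmp(buf, digits) == 0;
}
```

For example, magic_reads_as(LINUX_REBOOT_MAGIC2, "28121969") holds, since 672274793 is 0x28121969.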
rt_XXX syscalls
The addition of real-time signals required the widening of the signal
set structure (sigset_t) from 32 to 64 bits. Consequently, various
system calls were superseded by new system calls that supported the
larger signal sets.

    Linux <= 2.0       Linux >= 2.2
    sigaction(2)       rt_sigaction(2)
    sigpending(2)      rt_sigpending(2)
    sigprocmask(2)     rt_sigprocmask(2)
    sigreturn(2)       rt_sigreturn(2)
    sigsuspend(2)      rt_sigsuspend(2)
    sigtimedwait(2)    rt_sigtimedwait(2)

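The 64-bit signal set shows up in the raw rt_ syscalls as an explicit size argument, which the libc wrappers normally fill in for you. The sketch below calls rt_sigprocmask directly. It assumes an architecture where the kernel signal set is 8 bytes (most of them; MIPS uses 16), and raw_query_mask is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>        /* SIG_BLOCK */
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_rt_sigprocmask */
#include <unistd.h>

/* Read the blocked-signal mask through the raw syscall, telling the
 * kernel that a signal set is sigsetsize bytes long.
 * Returns 0 on success, or the errno reported by the kernel. */
static int raw_query_mask(size_t sigsetsize)
{
    uint64_t old[2] = { 0, 0 };   /* room for the kernel's signal set */

    errno = 0;
    if (syscall(SYS_rt_sigprocmask, SIG_BLOCK, NULL, old, sigsetsize) == -1)
        return errno;
    return 0;
}
```

Under that assumption, raw_query_mask(8) succeeds while raw_query_mask(4), the pre-real-time 32-bit size, is rejected with EINVAL.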
Going further than syscalls
- There are places in the kernel where the complexity of the task goes
  beyond a call to a function.
- ioctl has grown dangerously.
- For example, netlink(7) aims to replace ioctl for network
  configuration.

References
- http://lwn.net/Articles/604287/
- http://lwn.net/Articles/604515/
- https://www.kernel.org/doc/htmldocs/kernel-hacking
- Searchable Linux Syscall Table: https://filippo.io/linux-syscall-table/

Contact info
lionel [at] lse.epita.fr with [linux] tag