Linux-Insides Interrupts Linux-Interrupts Part 3

This document discusses the #DB and #BP exceptions in Linux. #DB occurs when a debug event happens, like changing a debug register. #BP occurs when the int 3 instruction is executed. The document explains that the debug and int3 functions are set as the handlers for these exceptions during early initialization. It also provides background details on debug registers, the vector numbers and error codes for the exceptions, and an example C program that generates a #BP exception by using int 3.

Uploaded by

Avinash Pal
0xAX / linux-insides: Interrupts / linux-interrupts-3.md

Interrupts and Interrupt Handling. Part 3.

Exception Handling

This is the third part of the chapter about interrupt and exception handling in the Linux kernel. In the previous part we stopped at the setup_arch function from the arch/x86/kernel/setup.c source code file.

We already know that this function performs architecture-specific initialization. In our case the setup_arch function does the x86_64-related initialization. setup_arch is a big function, and in the previous part we stopped at the point where it sets the handlers for the following two exceptions:

#DB - debug exception, transfers control from the interrupted process to the debug handler;
#BP - breakpoint exception, caused by the int 3 instruction.

These exceptions allow the x86_64 architecture to have early exception processing for the purpose of debugging via kgdb.

As you may remember, we set these exception handlers in the early_trap_init function:

    void __init early_trap_init(void)
    {
        set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK);
        set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK);
        load_idt(&idt_descr);
    }

from arch/x86/kernel/traps.c. We already saw the implementation of the set_intr_gate_ist and set_system_intr_gate_ist functions in the previous part, and now we will look at the implementation of these two exception handlers.

Debug and Breakpoint exceptions

Ok, we set up exception handlers for the #DB and #BP exceptions in the early_trap_init function, and now it is time to consider their implementations. But before we do this, let's look at the details of these exceptions.

The first exception, #DB or debug exception, occurs when a debug event occurs, for example an attempt to change the contents of a debug register. Debug registers are special registers that have been present in x86 processors since the Intel 80386 and, as the name suggests, their main purpose is debugging.

These registers allow us to set breakpoints on code and on reads or writes of data in order to trace it. Debug registers may be accessed only in privileged mode, and an attempt to read or write them when executing at any other privilege level causes a general protection fault exception. That's why we used set_intr_gate_ist for the #DB exception and not set_system_intr_gate_ist.

The vector number of the #DB exception is 1 (we pass it as X86_TRAP_DB) and, as we may read in the specification, this exception has no error code:

+-----------------------------------------------------+
|Vector|Mnemonic|Description         |Type |Error Code|
+-----------------------------------------------------+
|1     | #DB    |Reserved            |F/T  |NO        |
+-----------------------------------------------------+

The second exception, #BP or breakpoint exception, occurs when the processor executes the int 3 instruction. Unlike the #DB exception, the #BP exception may occur in userspace. We can add it anywhere in our code. For example, let's look at this simple program:

    // breakpoint.c
    #include <stdio.h>

    int main() {

        int i = 0;    // start counting from zero
        while (i < 6){
            printf("i equal to: %d\n", i);
            __asm__("int3");
            ++i;
        }
    }

If we compile and run this program, we will see the following output:

    $ gcc breakpoint.c -o breakpoint
    $ ./breakpoint
    i equal to: 0
    Trace/breakpoint trap

But if we run it with gdb, we will see our breakpoint and can continue execution of our program:

    $ gdb breakpoint
    ...
    ...
    ...
    (gdb) run
    Starting program: /home/alex/breakpoints
    i equal to: 0

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x0000000000400585 in main ()
    => 0x0000000000400585 <main+31>: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1
    (gdb) c
    Continuing.
    i equal to: 1

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x0000000000400585 in main ()
    => 0x0000000000400585 <main+31>: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1
    (gdb) c
    Continuing.
    i equal to: 2

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x0000000000400585 in main ()
    => 0x0000000000400585 <main+31>: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1
    ...
    ...
    ...

Now we know a little about these two exceptions and can move on to their handlers.

Preparation before an exception handler

As you may have noted before, the set_intr_gate_ist and set_system_intr_gate_ist functions take the address of an exception handler as their second parameter. In our case the two exception handlers are:

debug;
int3.

You will not find these functions implemented in C code. All that can be found in the kernel's *.c/*.h files are their declarations, which are located in the arch/x86/include/asm/traps.h kernel header file:

    asmlinkage void debug(void);

and

    asmlinkage void int3(void);

You may note the asmlinkage directive in the declarations of these functions. The directive is a special gcc specifier. For C functions which are called from assembly, we need an explicit declaration of the function calling convention. In our case, if a function is declared with asmlinkage, gcc will compile the function to retrieve its parameters from the stack.

So, both handlers are defined in the arch/x86/entry/entry_64.S assembly source code file with the idtentry macro:

    idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK

and

    idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK

Each exception handler consists of two parts. The first part is generic and is the same for all exception handlers: it saves the general purpose registers on the stack, switches to the kernel stack if the exception came from userspace, and transfers control to the second part of the handler. The second part does the work that depends on the particular exception. For example, the page fault handler should find the virtual page for a given address, the invalid opcode handler should send the SIGILL signal, and so on.

As we just saw, an exception handler starts with a use of the idtentry macro from the arch/x86/entry/entry_64.S assembly source code file, so let's look at the implementation of this macro. The idtentry macro takes five arguments:

sym - defines a global symbol with .globl name which will be the entry point of the exception handler;
do_sym - symbol name which represents the secondary entry of the exception handler;
has_error_code - tells whether the exception provides an error code.

The last two parameters are optional:

paranoid - shows how we need to check the current mode (we will see a detailed explanation later);
shift_ist - shows whether the exception handler runs on a stack from the Interrupt Stack Table.

The definition of the idtentry macro looks like this:

    .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
    ENTRY(\sym)
    ...
    ...
    ...
    END(\sym)
    .endm

Before we consider the internals of the idtentry macro, we should know the state of the stack when an exception occurs. As we may read in the Intel® 64 and IA-32 Architectures Software Developer’s Manual 3A, the state of the stack when an exception occurs is the following:

        +------------+
    +40 | %SS        |
    +32 | %RSP       |
    +24 | %RFLAGS    |
    +16 | %CS        |
     +8 | %RIP       |
      0 | ERROR CODE | <-- %RSP
        +------------+

Now we may start to consider the implementation of the idtentry macro. Both the #DB and #BP exception handlers are defined as:

    idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
    idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK

Looking at these definitions, we can see that the assembler will generate two routines named debug and int3, and that both of them will call the do_debug and do_int3 secondary handlers after some preparation. The third parameter tells whether an error code exists and, as we may see, neither of our exceptions has one. As the diagram above shows, the processor pushes an error code on the stack only if the exception provides it. This may bring some difficulties, because the stack looks different for exceptions which provide an error code and for exceptions which do not. That's why the implementation of the idtentry macro starts by putting a fake error code on the stack if the exception does not provide one:

    .ifeq \has_error_code
    pushq $-1
    .endif

But it is not only a fake error code. The -1 also represents an invalid system call number, so that the system call restart logic will not be triggered.

The last two parameters of the idtentry macro, shift_ist and paranoid, tell us whether an exception handler runs on a stack from the Interrupt Stack Table or not. You may already know that each kernel thread in the system has its own stack. In addition to these stacks, there are some specialized stacks associated with each processor in the system. One of these stacks is the exception stack. The x86_64 architecture provides a special feature called the Interrupt Stack Table. This feature allows switching to a new stack for designated events such as atomic exceptions like double fault. So the shift_ist parameter tells us whether we need to switch to an IST stack for an exception handler or not.
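The frame layout from the diagram and the fake error code rule described earlier can be modelled in a few lines of C. This is only a user-space sketch under stated assumptions: struct hw_exception_frame and error_slot_value are names invented for this illustration and do not exist in the kernel, where this work is done in assembly by the idtentry macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the frame found at %rsp when a 64-bit exception
 * handler starts; the field order follows the diagram (lowest address first). */
struct hw_exception_frame {
    uint64_t error_code; /* +0:  real error code, or the fake -1 */
    uint64_t rip;        /* +8  */
    uint64_t cs;         /* +16 */
    uint64_t rflags;     /* +24 */
    uint64_t rsp;        /* +32 */
    uint64_t ss;         /* +40 */
};

/* Compile-time checks that the model matches the offsets in the diagram. */
_Static_assert(offsetof(struct hw_exception_frame, error_code) == 0, "ERROR CODE at 0");
_Static_assert(offsetof(struct hw_exception_frame, rip) == 8, "%RIP at +8");
_Static_assert(offsetof(struct hw_exception_frame, cs) == 16, "%CS at +16");
_Static_assert(offsetof(struct hw_exception_frame, rflags) == 24, "%RFLAGS at +24");
_Static_assert(offsetof(struct hw_exception_frame, rsp) == 32, "%RSP at +32");
_Static_assert(offsetof(struct hw_exception_frame, ss) == 40, "%SS at +40");

/* Value that ends up in the error-code slot: the code supplied by the
 * processor if the exception has one, otherwise the fake -1 that the
 * ".ifeq \has_error_code / pushq $-1" fragment pushes. */
static inline int64_t error_slot_value(int has_error_code, uint64_t hw_error_code)
{
    return has_error_code ? (int64_t)hw_error_code : -1;
}
```

With this model, error_slot_value(0, 0) yields -1 for exceptions such as #DB and #BP, so every handler sees a frame of the same size regardless of whether the processor pushed an error code.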

The second parameter, paranoid, defines the method which helps us to know whether we came to an exception handler from userspace or not. The easiest way to determine this is via the CPL, or Current Privilege Level, in the CS segment register. If it is equal to 3, we came from userspace; if it is zero, we came from kernel space:

    testl $3,CS(%rsp)
    jnz userspace
    ...
    ...
    ...
    // we are from the kernel space

But unfortunately this method does not give a 100% guarantee. As described in the kernel documentation:

    if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context, which might have triggered right after a normal entry wrote CS to the stack but before we executed SWAPGS, then the only safe way to check for GS is the slower method: the RDMSR.

In other words, an NMI, for example, could happen inside the critical section around a swapgs instruction. In this case we should check the value of the MSR_GS_BASE model specific register, which stores a pointer to the start of the per-cpu area. So to check whether we came from userspace or not, we should check the value of the MSR_GS_BASE model specific register: if it is negative we came from kernel space, otherwise we came from userspace:

    movl $MSR_GS_BASE,%ecx
    rdmsr
    testl %edx,%edx
    js 1f

In the first two lines of code we read the value of the MSR_GS_BASE model specific register into the edx:eax pair. We can't set a negative value to gs from userspace. On the other hand, we know that the direct mapping of physical memory starts from the 0xffff880000000000 virtual address. This means that MSR_GS_BASE will contain an address from 0xffff880000000000 to 0xffffc7ffffffffff. After the rdmsr instruction is executed, the smallest possible value in the %edx register will be 0xffff8800, which is -30720 when interpreted as a signed 4-byte value. That's why the kernel space gs, which points to the start of the per-cpu area, will contain a negative value.

After we push the fake error code on the stack, we should allocate space for the general purpose registers with the:

    ALLOC_PT_GPREGS_ON_STACK

macro, which is defined in the arch/x86/entry/calling.h header file. This macro just allocates 15*8 bytes of space on the stack to preserve the general purpose registers:

    .macro ALLOC_PT_GPREGS_ON_STACK addskip=0
    addq $-(15*8+\addskip), %rsp
    .endm

So the stack will look like this after execution of ALLOC_PT_GPREGS_ON_STACK:

         +------------+
    +160 | %SS        |
    +152 | %RSP       |
    +144 | %RFLAGS    |
    +136 | %CS        |
    +128 | %RIP       |
    +120 | ERROR CODE |
         |------------|
    +112 |            |
    +104 |            |
     +96 |            |
     +88 |            |
     +80 |            |
     +72 |            |
     +64 |            |
     +56 |            |
     +48 |            |
     +40 |            |
     +32 |            |
     +24 |            |
     +16 |            |
      +8 |            |
      +0 |            | <- %RSP
         +------------+

After we have allocated space for the general purpose registers, we do some checks to understand whether the exception came from userspace or not. If it did, we should move back to the interrupted process stack; otherwise we stay on the exception stack:

    .if \paranoid
        .if \paranoid == 1
            testb $3, CS(%rsp)
            jnz 1f
        .endif
        call paranoid_entry
    .else
        call error_entry
    .endif

Let's consider all three of these cases in turn.

An exception occurred in userspace

First, let's consider the case when an exception has paranoid=1, like our debug and int3 exceptions. In this case we check the selector from the CS segment register and jump to the 1f label if we came from userspace; otherwise paranoid_entry will be called.

Let's consider the first case, when we came to an exception handler from userspace. As described above, we should jump to the 1 label. The 1 label starts with a call of the

    call error_entry

routine, which saves all general purpose registers in the previously allocated area on the stack:

    SAVE_C_REGS 8
    SAVE_EXTRA_REGS 8

Both of these macros are defined in the arch/x86/entry/calling.h header file and just move the values of the general purpose registers to a certain place on the stack, for example:

    .macro SAVE_EXTRA_REGS offset=0
    movq %r15, 0*8+\offset(%rsp)
    movq %r14, 1*8+\offset(%rsp)
    movq %r13, 2*8+\offset(%rsp)
    movq %r12, 3*8+\offset(%rsp)
    movq %rbp, 4*8+\offset(%rsp)
    movq %rbx, 5*8+\offset(%rsp)
    .endm

After execution of SAVE_C_REGS and SAVE_EXTRA_REGS the stack will look like this:

         +------------+
    +160 | %SS        |
    +152 | %RSP       |
    +144 | %RFLAGS    |
    +136 | %CS        |
    +128 | %RIP       |
    +120 | ERROR CODE |
         |------------|
    +112 | %RDI       |
    +104 | %RSI       |
     +96 | %RDX       |
     +88 | %RCX       |
     +80 | %RAX       |
     +72 | %R8        |
     +64 | %R9        |
     +56 | %R10       |
     +48 | %R11       |
     +40 | %RBX       |
     +32 | %RBP       |
     +24 | %R12       |
     +16 | %R13       |
      +8 | %R14       |
      +0 | %R15       | <- %RSP
         +------------+

After the kernel has saved the general purpose registers on the stack, we should check again that we came from userspace with:

    testb $3, CS+8(%rsp)
    jz .Lerror_kernelspace

because, as described in the documentation, a fault may potentially be reported with a truncated %RIP. Anyway, in both cases the SWAPGS instruction will be executed and the values of MSR_KERNEL_GS_BASE and MSR_GS_BASE will be swapped. From this moment the %gs register will point to the base address of the kernel structures. So the SWAPGS instruction is executed, and that was the main point of the error_entry routine.

Now we can go back to the idtentry macro. We see the following assembler code after the call of error_entry:

    movq %rsp, %rdi
    call sync_regs
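The register save area from the stack diagram can also be written down as a C structure. The sketch below is a user-space model and not the kernel's struct pt_regs: the names pt_regs_model and came_from_user are assumptions made for this illustration. It only mirrors the offsets from the diagram and the "testb $3, CS(%rsp)" check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the stack after SAVE_C_REGS and SAVE_EXTRA_REGS,
 * listed from the lowest address (+0, where %rsp points) upwards. */
struct pt_regs_model {
    uint64_t r15, r14, r13, r12, rbp, rbx;               /* +0   .. +40  */
    uint64_t r11, r10, r9, r8, rax, rcx, rdx, rsi, rdi;  /* +48  .. +112 */
    uint64_t orig_ax;                                    /* +120: error code slot */
    uint64_t rip, cs, rflags, rsp, ss;                   /* +128 .. +160 */
};

/* Compile-time checks against the offsets shown in the diagram. */
_Static_assert(offsetof(struct pt_regs_model, rbx) == 40, "%RBX at +40");
_Static_assert(offsetof(struct pt_regs_model, rdi) == 112, "%RDI at +112");
_Static_assert(offsetof(struct pt_regs_model, orig_ax) == 15 * 8, "15 saved registers");
_Static_assert(offsetof(struct pt_regs_model, cs) == 136, "%CS at +136");
_Static_assert(offsetof(struct pt_regs_model, ss) == 160, "%SS at +160");

/* Fast check of the privilege level: the two low bits of the saved CS
 * selector are non-zero when the exception came from userspace. */
static inline int came_from_user(const struct pt_regs_model *regs)
{
    return (regs->cs & 3) != 0;
}
```

For example, with a saved selector of 0x33 (the value typically used for 64-bit user code on x86_64 Linux) came_from_user returns 1, and with 0x10 (the typical kernel code selector) it returns 0. As the text explains, this fast check is not sufficient for paranoid entries, which is why the MSR_GS_BASE method exists.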

The movq instruction puts the stack pointer into the %rdi register, which holds the first argument (according to the x86_64 ABI) of the sync_regs function, and the call instruction invokes this function, which is defined in the arch/x86/kernel/traps.c source code file:

    asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
    {
        struct pt_regs *regs = task_pt_regs(current);
        *regs = *eregs;
        return regs;
    }

This function takes the result of the task_pt_regs macro, which is defined in the arch/x86/include/asm/processor.h header file, copies the registers saved on the exception stack to that location and returns a pointer to it. The task_pt_regs macro expands to an address just below thread.sp0, which represents a pointer to the normal kernel stack:

    #define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)

As we came from userspace, this means that the exception handler will run in real process context. After we get the stack pointer from sync_regs, we switch stacks:

    movq %rax, %rsp

The last two steps before an exception handler calls the secondary handler are:

1. Passing a pointer to the pt_regs structure, which contains the preserved general purpose registers, in the %rdi register:

    movq %rsp, %rdi

as it will be passed as the first parameter of the secondary exception handler.

2. Passing the error code in the %rsi register, as it will be the second argument of the exception handler, and setting it to -1 on the stack for the same purpose as before: to prevent the restart of a system call:

    .if \has_error_code
        movq ORIG_RAX(%rsp), %rsi
        movq $-1, ORIG_RAX(%rsp)
    .else
        xorl %esi, %esi
    .endif

Additionally, you may see that we zero the %esi register above if an exception does not provide an error code.

In the end we just call the secondary exception handler:

    call \do_sym

which:

    dotraplinkage void do_debug(struct pt_regs *regs, long error_code);

will be for the debug exception and:

    dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code);

will be for the int 3 exception. In this part we will not see the implementations of the secondary handlers, because they are very specific, but we will see some of them in one of the next parts.

We have just considered the first case, when an exception occurred in userspace. Let's consider the last two.

An exception with paranoid > 0 occurred in kernelspace

In this case an exception occurred in kernelspace and the idtentry macro is defined with paranoid=1 for this exception. This value of paranoid means that we should use the slower way, which we saw at the beginning of this part, to check whether we really came from kernelspace or not. The paranoid_entry routine allows us to know this:

    ENTRY(paranoid_entry)
        cld
        SAVE_C_REGS 8
        SAVE_EXTRA_REGS 8
        movl $1, %ebx
        movl $MSR_GS_BASE, %ecx
        rdmsr
        testl %edx, %edx
        js 1f
        SWAPGS
        xorl %ebx, %ebx
    1:  ret
    END(paranoid_entry)

As you may see, this function does the same things that we covered before. We use the second (slow) method to get information about the previous state of the interrupted task. Once we have checked this and executed SWAPGS in case we came from userspace, we should do the same as before: put a pointer to the structure which holds the general purpose registers into %rdi (which will be the first parameter of the secondary handler) and put the error code, if the exception provides it, into %rsi (which will be the second parameter of the secondary handler):

    movq %rsp, %rdi

    .if \has_error_code
        movq ORIG_RAX(%rsp), %rsi
        movq $-1, ORIG_RAX(%rsp)
    .else
        xorl %esi, %esi
    .endif

The last step before the secondary handler of an exception is called is the preparation of a new IST stack frame:

    .if \shift_ist != -1
        subq $EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
    .endif

You may remember that we passed shift_ist as an argument of the idtentry macro. Here we check its value and, if it is not equal to -1, we take the Interrupt Stack Table entry with index shift_ist and move it down by EXCEPTION_STKSZ, so that a nested exception gets a fresh stack.

At the end of this second way we just call the secondary exception handler as we did before:

    call \do_sym

The last method is similar to the previous two, but the exception occurred with paranoid=0 and we may use the fast method to determine where we came from.

Exit from an exception handler

After the secondary handler finishes its work, we return to the idtentry macro and the next step is a jump to the error_exit routine:

    jmp error_exit

The error_exit function is defined in the same arch/x86/entry/entry_64.S assembly source code file. The main goal of this function is to find out where we came from (userspace or kernelspace) and to execute SWAPGS depending on this, then to restore the registers to their previous state and execute the iret instruction to transfer control back to the interrupted task.

That's all.

Conclusion

This is the end of the third part about interrupts and interrupt handling in the Linux kernel. We saw the initialization of the Interrupt Descriptor Table with the #DB and #BP gates in the previous part, and in this part we started to dive into the preparation that happens before control is transferred to an exception handler and into the implementation of some interrupt handlers. In the next part we will continue with this theme, move further through the setup_arch function and try to understand more of the interrupt handling related code.

If you have any questions or suggestions, write me a comment or ping me at twitter.

Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to linux-insides.

Links

Debug registers
Intel 80386
INT 3
gcc
TSS
GNU assembly .error directive
dwarf2
CFI directives
IRQ
system call
swapgs
SIGTRAP
Per-CPU variables
kgdb
ACPI
Previous part
