Interrupt handler

In computer systems programming, an interrupt handler, also known as an interrupt service routine or ISR, is a special block of code associated with a specific interrupt condition. Interrupt handlers are initiated by hardware interrupts, software interrupt instructions, or software exceptions, and are used for implementing device drivers or transitions between protected modes of operation, such as system calls.

The traditional form of interrupt handler is the hardware interrupt handler. Hardware interrupts arise from electrical conditions or low-level protocols implemented in digital logic, are usually dispatched via a hard-coded table of interrupt vectors, asynchronously to the normal execution stream (as interrupt masking levels permit), often using a separate stack, and automatically entering into a different execution context (privilege level) for the duration of the interrupt handler's execution. In general, hardware interrupts and their handlers are used to handle high-priority conditions that require the interruption of the current code the processor is executing.^[1]^[2]

Later it was found convenient for software to be able to trigger the same mechanism by means of a software interrupt (a form of synchronous interrupt). Rather than using a hard-coded interrupt dispatch table at the hardware level, software interrupts are often implemented at the operating system level as a form of callback function.

Interrupt handlers have a multitude of functions, which vary based on what triggered the interrupt and the speed at which the interrupt handler completes its task. For example, pressing a key on a computer keyboard,^[1] or moving the mouse, triggers interrupts that call interrupt handlers which read the key, or the mouse's position, and copy the associated information into the computer's memory.^[2]

An interrupt handler is a low-level counterpart of event handlers. However, interrupt handlers have an unusual execution context, many harsh constraints in time and space, and their intrinsically asynchronous nature makes them notoriously difficult to debug by standard practice (reproducible test cases generally don't exist), thus demanding a specialized skillset—an important subset of system programming—of software engineers who engage at the hardware interrupt layer.

Interrupt flags

Unlike other event handlers, interrupt handlers are expected to set interrupt flags to appropriate values as part of their core functionality.

Even in a CPU which supports nested interrupts, a handler is often reached with all interrupts globally masked by a CPU hardware operation. In this architecture, an interrupt handler would normally save the smallest amount of context necessary, and then reset the global interrupt disable flag at the first opportunity, to permit higher priority interrupts to interrupt the current handler. It is also important for the interrupt handler to quell the current interrupt source by some method (often toggling a flag bit of some kind in a peripheral register) so that the current interrupt isn't immediately repeated on handler exit, resulting in an infinite loop.

Exiting an interrupt handler with the interrupt system in exactly the right state under every eventuality can sometimes be an arduous and exacting task, and its mishandling is the source of many serious bugs, of the kind that halt the system completely. These bugs are sometimes intermittent, with the mishandled edge case not occurring for weeks or months of continuous operation. Formal validation of interrupt handlers is tremendously difficult, while testing typically identifies only the most frequent failure modes, thus subtle, intermittent bugs in interrupt handlers often ship to end customers.

Execution context

In a modern operating system, upon entry the execution context of a hardware interrupt handler is subtle.

For reasons of performance, the handler will typically be initiated in the memory and execution context of the running process, to which it has no special connection (the interrupt is essentially usurping the running context—process time accounting will often accrue time spent handling interrupts to the interrupted process). However, unlike the interrupted process, the interrupt is usually elevated by a hard-coded CPU mechanism to a privilege level high enough to access hardware resources directly.

Stack space considerations

In a low-level microcontroller, the chip might lack protection modes and have no memory management unit (MMU). In these chips, the execution context of an interrupt handler will be essentially the same as the interrupted program, which typically runs on a small stack of fixed size (memory resources have traditionally been extremely scant at the low end). Nested interrupts are often provided, which exacerbates stack usage. A primary constraint on the interrupt handler in this programming endeavour is to not exceed the available stack in the worst-case condition, requiring the programmer to reason globally about the stack space requirement of every implemented interrupt handler and application task.

When allocated stack space is exceeded (a condition known as a stack overflow), this is not normally detected in hardware by chips of this class. If the stack is exceeded into another writable memory area, the handler will typically work as expected, but the application will fail later (sometimes much later) due to the handler's side effect of memory corruption. If the stack is exceeded into a non-writable (or protected) memory area, the failure will usually occur inside the handler itself (generally the easier case to later debug).

In the writable case, one can implement a sentinel stack guard—a fixed value right beyond the end of the legal stack whose value can be overwritten, but never will be if the system operates correctly. It is common to regularly observe corruption of the stack guard with some kind of watch dog mechanism. This will catch the majority of stack overflow conditions at a point in time close to the offending operation.

In a multitasking system, each thread of execution will typically have its own stack. If no special system stack is provided for interrupts, interrupts will consume stack space from whatever thread of execution is interrupted. These designs usually contain an MMU, and the user stacks are usually configured such that stack overflow is trapped by the MMU, either as a system error (for debugging) or to remap memory to extend the space available. Memory resources at this level of microcontroller are typically far less constrained, so that stacks can be allocated with a generous safety margin.

In systems supporting high thread counts, it is better if the hardware interrupt mechanism switches the stack to a special system stack, so that none of the thread stacks need account for worst-case nested interrupt usage. Tiny CPUs as far back as the 8-bit Motorola 6809 from 1978 have provided separate system and user stack pointers.

Constraints in time and concurrency

For many reasons, it is highly desired that the interrupt handler execute as briefly as possible, and it is highly discouraged (or forbidden) for a hardware interrupt to invoke potentially blocking system calls. In a system with multiple execution cores, considerations of reentrancy are also paramount. If the system provides for hardware DMA, concurrency issues can arise even with only a single CPU core. (It is not uncommon for a mid-tier microcontroller to lack protection levels and an MMU, but still provide a DMA engine with many channels; in this scenario, many interrupts are typically triggered by the DMA engine itself, and the associated interrupt handler is expected to tread carefully.)

A modern practice has evolved to divide hardware interrupt handlers into front-half and back-half elements. The front-half (or first level) receives the initial interrupt in the context of the running process, does the minimal work to restore the hardware to a less urgent condition (such as emptying a full receive buffer) and then marks the back-half (or second level) for execution in the near future at the appropriate scheduling priority; once invoked, the back-half operates in its own process context with fewer restrictions and completes the handler's logical operation (such as conveying the newly received data to an operating system data queue).

Divided handlers in modern operating systems

In several operating systems‍—‌Linux, Unix,^{[citation needed]} macOS, Microsoft Windows, z/OS, DESQview and some other operating systems used in the past‍—‌interrupt handlers are divided into two parts: the First-Level Interrupt Handler (FLIH) and the Second-Level Interrupt Handlers (SLIH). FLIHs are also known as hard interrupt handlers or fast interrupt handlers, and SLIHs are also known as slow/soft interrupt handlers, or Deferred Procedure Calls in Windows.

A FLIH implements at minimum platform-specific interrupt handling similar to interrupt routines. In response to an interrupt, there is a context switch, and the code for the interrupt is loaded and executed. The job of a FLIH is to quickly service the interrupt, or to record platform-specific critical information which is only available at the time of the interrupt, and schedule the execution of a SLIH for further long-lived interrupt handling.^[2]

FLIHs cause jitter in process execution. FLIHs also mask interrupts. Reducing the jitter is most important for real-time operating systems, since they must maintain a guarantee that execution of specific code will complete within an agreed amount of time. To reduce jitter and to reduce the potential for losing data from masked interrupts, programmers attempt to minimize the execution time of a FLIH, moving as much as possible to the SLIH. With the speed of modern computers, FLIHs may implement all device and platform-dependent handling, and use a SLIH for further platform-independent long-lived handling.

FLIHs which service hardware typically mask their associated interrupt (or keep it masked as the case may be) until they complete their execution. An (unusual) FLIH which unmasks its associated interrupt before it completes is called a reentrant interrupt handler. Reentrant interrupt handlers might cause a stack overflow from multiple preemptions by the same interrupt vector, and so they are usually avoided. In a priority interrupt system, the FLIH also (briefly) masks other interrupts of equal or lesser priority.

A SLIH completes long interrupt processing tasks similarly to a process. SLIHs either have a dedicated kernel thread for each handler, or are executed by a pool of kernel worker threads. These threads sit on a run queue in the operating system until processor time is available for them to perform processing for the interrupt. SLIHs may have a long-lived execution time, and thus are typically scheduled similarly to threads and processes.

In Linux, FLIHs are called upper half, and SLIHs are called lower half or bottom half.^[1]^[2] This is different from naming used in other Unix-like systems, where both are a part of bottom half.^{[clarification needed]}

References

^ ^a ^b ^c "The Linux Kernel Module Programming Guide, Chapter 12. Interrupt Handlers". The Linux Documentation Project. May 18, 2007. Retrieved February 20, 2015.
^ ^a ^b ^c ^d Jonathan Corbet; Alessandro Rubini; Greg Kroah-Hartman (January 27, 2005). "Linux Device Drivers, Chapter 10. Interrupt Handling" (PDF). O'Reilly Media. Retrieved February 20, 2015.

[tldp-lkmpg-1] "The Linux Kernel Module Programming Guide, Chapter 12. Interrupt Handlers". The Linux Documentation Project. May 18, 2007. Retrieved February 20, 2015.

[lwn-ldd3-2] Jonathan Corbet; Alessandro Rubini; Greg Kroah-Hartman (January 27, 2005). "Linux Device Drivers, Chapter 10. Interrupt Handling" (PDF). O'Reilly Media. Retrieved February 20, 2015.

[1]

[2]