Multi-threaded RTOS
How Multi-threading can increase
on-chip parallelism
Outline
Introduction
Multi-threading models
Architectures of multi-threaded processors
Simultaneous multi-threading and multiprocessors
Cache design
Examples of Multi-threaded environments
Conclusions
Introduction
Two forms of parallelism
instruction-level parallelism (ILP)
thread-level parallelism (TLP)
Both identify independent instructions that can execute in parallel
Wide-issue superscalar processors exploit ILP by executing multiple
instructions from a single program in a single cycle.
Multiprocessors exploit TLP by executing different threads in parallel
on different processors.
The first multi-threaded processor approaches, in the 1970s and
1980s, applied multi-threading at the user-thread level to solve the
memory access latency problem.
Introduction
Motivations for multi-threaded processor architecture development
include chip area, cost and complexity.
Simultaneous Multi-threading (SMT),
Single chip multiprocessing (CMP),
SMT VLIW architecture,
Simultaneous Multithreaded Vector (SMV) architecture
DSP applications inherently benefit from the following architectural
characteristics:
Parallelization at multiple levels of hierarchy:
- Instruction - separate instruction memory space
- Data - separate data memory space
- Thread - multiple functional units
- Data transfer - multiple wide data buses
Vertical and Horizontal Waste
Vertical waste is introduced when the processor issues no
instructions in a cycle.
Horizontal waste is introduced when not all issue slots can
be filled in a cycle.
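The two kinds of waste can be counted from a per-cycle issue trace. The sketch below is only an illustration: the function name `issue_waste`, the trace values and the issue width of 4 are assumptions, not taken from any particular processor.

```python
from typing import List, Tuple


def issue_waste(issued_per_cycle: List[int], issue_width: int) -> Tuple[int, int, int]:
    """Return (vertical_waste, horizontal_waste, total_slots) in issue slots.

    issued_per_cycle[c] is the number of instructions issued in cycle c.
    """
    vertical = 0
    horizontal = 0
    for issued in issued_per_cycle:
        if not 0 <= issued <= issue_width:
            raise ValueError("issued count must be between 0 and issue_width")
        if issued == 0:
            # Nothing issued: the whole cycle is lost (vertical waste).
            vertical += issue_width
        else:
            # Some slots filled: the unfilled ones are horizontal waste.
            horizontal += issue_width - issued
    total_slots = issue_width * len(issued_per_cycle)
    return vertical, horizontal, total_slots


# Hypothetical trace of a 4-wide processor over six cycles.
trace = [2, 0, 4, 1, 0, 3]
vertical, horizontal, total = issue_waste(trace, issue_width=4)
print(f"vertical={vertical} horizontal={horizontal} of {total} slots")
```

In this made-up trace, 8 of the 24 slots are lost to empty cycles and 6 to partly filled cycles, so only 10 slots do useful work.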
Multi-threaded Models
Fine-Grain Multithreading
Only one thread issues instructions
each cycle, but it can use the entire
issue width of the processor.
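A toy model of fine-grain multithreading, under stated assumptions: `ready[t][c]` is a hypothetical count of instructions that thread `t` could issue in cycle `c`, threads are picked round-robin, and a thread with nothing ready is skipped. Real hardware uses different selection policies; this sketch only shows the one-thread-per-cycle rule.

```python
from typing import List, Optional, Tuple


def fine_grain_issue(ready: List[List[int]], issue_width: int) -> List[Tuple[Optional[int], int]]:
    """Return, per cycle, (chosen_thread, instructions_issued).

    Only one thread issues in a cycle, but it may use up to issue_width slots.
    """
    n_threads = len(ready)
    n_cycles = len(ready[0])
    schedule = []
    next_thread = 0  # round-robin starting point (assumed policy)
    for c in range(n_cycles):
        chosen = None
        for offset in range(n_threads):
            t = (next_thread + offset) % n_threads
            if ready[t][c] > 0:
                chosen = t
                break
        if chosen is None:
            # No thread has work: the cycle is vertical waste.
            schedule.append((None, 0))
        else:
            schedule.append((chosen, min(ready[chosen][c], issue_width)))
            next_thread = (chosen + 1) % n_threads
    return schedule


# Two hypothetical threads on a 4-wide processor.
ready = [[2, 0, 1, 0],
         [1, 3, 0, 0]]
for cycle, (thread, issued) in enumerate(fine_grain_issue(ready, issue_width=4)):
    print(cycle, thread, issued)
```

Switching threads hides the empty cycles of either thread alone (only the last cycle stays empty), but each cycle still leaves slots unfilled, so horizontal waste remains.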
SM: Full Simultaneous Issue
SM: Single Issue
SM: Dual Issue
SM: Four Issue
SM: Limited Connection
Each hardware context is connected directly to exactly one of each
type of functional unit.
Less dynamic
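A minimal sketch of the simultaneous-issue models, assuming that "Single", "Dual" and "Four" cap how many instructions one thread may issue per cycle (1, 2 or 4) and that full simultaneous issue has no per-thread cap. The `ready` matrix, the rotating priority and the name `smt_issue` are illustrative assumptions. The Limited Connection model is not covered here.

```python
from typing import List, Optional


def smt_issue(ready: List[List[int]], issue_width: int,
              per_thread_limit: Optional[int] = None) -> List[List[int]]:
    """Return schedule[c][t]: instructions thread t issues in cycle c.

    ready[t][c] is the number of instructions thread t could issue in cycle c.
    per_thread_limit=None models full simultaneous issue; 1, 2 or 4 model
    the single-, dual- and four-issue variants (assumed interpretation).
    """
    n_threads = len(ready)
    n_cycles = len(ready[0])
    limit = issue_width if per_thread_limit is None else per_thread_limit
    schedule = []
    for c in range(n_cycles):
        free = issue_width
        slots = [0] * n_threads
        # Rotate which thread gets first pick each cycle (assumed policy).
        for offset in range(n_threads):
            t = (c + offset) % n_threads
            take = min(ready[t][c], limit, free)
            slots[t] = take
            free -= take
        schedule.append(slots)
    return schedule


# Two hypothetical threads on a 4-wide processor.
ready = [[2, 0, 1, 0],
         [1, 3, 0, 0]]
print("full  :", smt_issue(ready, issue_width=4))
print("single:", smt_issue(ready, issue_width=4, per_thread_limit=1))
```

With no cap, both threads share cycle 0 and fill three of the four slots. With a cap of one instruction per thread, the same cycle fills only two, which shows why the capped variants need more threads to reach the same utilization.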
Performance
SMT VLIW Architecture
Simultaneous Vector Multi-threaded Architecture (SVMT)
SMT vs. Multiprocessing
Cache design
Examples of Multi-threaded RTOS
Analog Devices VDK
uClinux
The RTXC Quadros RTOS
RTXC/ss
ThreadX
Conclusions
A simultaneous multithreaded architecture is superior in
performance to a multiple-issue multiprocessor (multi-issue CMP).
SMT boosts utilization by dynamically scheduling functional units
among multiple threads.
SMT also increases hardware design flexibility.
Simultaneous multithreading increases the complexity of instruction
scheduling.
The increased parallelism makes multi-threading well suited to DSP
applications, where each application can run as a separate thread.