Basic of Thread Level Parallelism

The document discusses thread level parallelism using simultaneous multithreading (SMT). It explains that SMT allows multiple threads to execute simultaneously on a superscalar processor by sharing the functional units. This is achieved by duplicating independent thread state like registers while sharing common processor resources. SMT provides better hardware utilization than coarse-grained or fine-grained multithreading by mixing instructions from different threads in each cycle to hide latency stalls. Popular implementations of SMT include Intel's Hyper-Threading technology.

Uploaded by

uruzawa ibia

Thread Model, Thread vs Process & pthread Library

Delphi Hanggoro
Understanding of thread
Consider this example: let's say we have a program that adds the numbers 1 to 1,000,000.

[Diagram: a single process covering 1 – 1,000,000, above Processor I, Processor II, Processor III, Processor IV]


Speeding Up With Multiple Processes
[Diagram: Process I sums 1 – 250,000, Process II sums 250,001 – 500,000, Process III sums 500,001 – 750,000, Process IV sums 750,001 – 1,000,000; the processes run on Processor I, Processor II, Processor III, Processor IV]

Properties
• 4 fork system calls needed, one for creating each process (the fork system call creates a new process)
• Each process is isolated from the others
• An IPC mechanism is needed to communicate
• Each process has its own memory map, instructions, data, etc.
• Process management is done with system calls

This approach carries a lot of overhead. Is there a better way to do it?
Thread Model
[Diagram: one process with 4 threads, each thread doing 1/4 of the work, running on Processor I, Processor II, Processor III, Processor IV]

Properties
• 1 fork system call needed; 4 threads need to be created, which is much lighter.
• Each thread is not isolated from the others.
• Threads are managed with fewer or no system calls.
• Instructions and global data are shared. Each thread has its own stack.
Threads
• Separate streams of execution within a single process
• Threads in a process are not isolated from each other
• Each thread state (thread control block) contains:
• Registers (including EIP, ESP)
• Stack

[Diagram: a process whose code, data, and files are shared, while each thread has its own registers and stack]
Why Thread ?
• Extremely lightweight
• Efficient communication between entities
• Efficient context switching

https://computing.llnl.gov/tutorials/pthreads/
Threads vs Processes
Thread
• A thread has no data segment or heap.
• A thread cannot live on its own. It needs to be attached to a process.
• There can be more than one thread in a process. Each thread has its own stack.
• If a thread dies, its stack is reclaimed.

Process
• A process has code, heap, stack, and other segments.
• A process has at least one thread.
• Threads within a process share the same code and files.
• If a process dies, all its threads die.
How to make a thread ?
pthread library
• Create a thread in a process:

int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine) (void *),
                   void *arg);

thread: thread identifier
start_routine: pointer to a function, which starts execution in a different thread
arg: arguments to the function

• Destroying a thread:

void pthread_exit(void *retval);

retval: exit value of the thread

pthread library
• Join: wait for a specific thread to complete

int pthread_join(pthread_t thread, void **retval);

thread: TID of the thread to wait for
retval: exit status of the thread


Thanks for the attention

Next discussion
1. Thread Level Parallelism & SMT
Multithreading
Delphi Hanggoro
Learning Objectives
• Discuss the basics of thread-level parallelism
• Discuss the concepts of simultaneous multithreading
Improving Performance of a Processor
Techniques to increase performance :
1. Pipelining
• Improves clock speed
• Increases the number of in-flight instructions
2. Hazard/stall elimination
• Branch prediction
• Register renaming
• Out of order execution
More on Pipelining
• Increases the number of in-flight instructions
• Decreases the gap between successive independent instructions
• Increases the gap between dependent instructions
• Difficult to find more than four independent instructions
• Difficult to fetch more than six instructions
Limits of ILP
Doubling the issue rate above today’s 3-6 instructions per clock, to about 6-12 instructions, probably requires a processor to
• Issue 3 or 4 data memory accesses per cycle
• Resolve 2 or 3 branches per cycle
• Rename and access more than 20 registers per cycle
• Fetch 12 to 24 instructions per cycle
Most techniques for increasing performance will also increase power consumption
Types of parallelism
• Instruction parallelism
• Thread parallelism
• Data parallelism
Types of Thread-Level Parallelism
1. Simultaneous Multi-Threading (SMT)
• Multiple threads execute simultaneously
2. Chip Multi-Processing (CMP)
Multi-threaded execution
Multi-threading: multiple threads share the functional units of 1 processor
• The processor must duplicate the independent state of each thread, for example a separate copy of the register file, a separate PC, and, for running independent programs, a separate page table
• Memory is shared through the virtual memory mechanisms, which already support multiple processes
• Hardware for fast thread switching
Multi-threaded execution (cont..)
When do we switch from one thread to another?
• Fine-grained multithreading
• Coarse-grained multithreading
Fine-Grained Multithreading
• Like round robin: execution switches from one thread to another every clock cycle
Advantage
• It can hide both short and long stalls
Disadvantage
• It slows down the execution of individual threads, since a thread that is ready to execute without stalls will be delayed by instructions from other threads

Used on Sun’s Niagara processor


Coarse-Grained Multithreading
• Execution stays with one thread until a costly stall, such as a cache miss, and only then switches to another thread
Advantages
• Relieves the need for very fast thread switching
• Doesn’t slow down a thread, since instructions from other threads are issued only when the running thread encounters a costly stall
Disadvantages
• It is hard to overcome throughput losses from shorter stalls, due to pipeline start-up costs
• Since the CPU issues instructions from 1 thread, the pipeline must be emptied when a stall occurs, and the new thread must fill the pipeline before its instructions can complete
• Because of this start-up overhead, coarse-grained multithreading is better for reducing the penalty of high-cost stalls, where the pipeline refill time is much shorter than the stall time
Used on IBM AS/400
Simultaneous Multithreading
• Based on the insight that a dynamically scheduled processor already has many of the hardware mechanisms needed to support multithreading
• A large set of virtual registers that can be used to hold the register sets of independent threads
• Register renaming provides unique register identifiers, so instructions from multiple threads can be mixed in the datapath without confusing sources and destinations across threads
• Out-of-order completion allows the threads to execute out of order and get better utilization of the hardware
[Figure: issue slots over time for coarse-grained multithreading, fine-grained multithreading, and simultaneous multithreading; slots are marked as Thread 1, Thread 2, Thread 3, or Thread 4]

SMT and Hyper-Threading
2002 Intel’s Hyper-Threading
• Intel’s Xeon
• Intel’s Pentium 4
2015
• AMD announces that SMT processors are coming
• AMD Ryzen 5 and so on
Reference
• Dr. A. P. Shanthi, “Thread Level Parallelism – SMT and CMP”, 2016.
• Dean Tullsen et al., “Simultaneous Multithreading: Maximizing On-Chip Parallelism”, Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995, pages 392-403.
