Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Threads in C
David Chisnall
February 25, 2011
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
The Basic C Model
One computer
One process
One thread
One stack
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Multithreaded Memory Layout (Again)
0xffffffff 0xffffffff
stack 1
stack 2
0xf0000000
Heap Memory
0x00140000
Library Code
0x00100000
0x000fe000
Static Data
0x0000f000
Program Code
0x00000000 0x00001000
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Thread APIs
Threads are not part of C99
They are part of C1X, but there are currently no
implementations of C1X
On UNIX-like platforms, the POSIX Thread APIs are portable
Other platforms there are different APIs
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Creating Threads
pthread_t thread ;
pthread_create (& thread , NULL , start_function ,
arg ) ;
Creates a new stack
Registers it for scheduling
Sets instruction pointer in new thread to start_function
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Combining Threads
void * ret ;
pthread_join ( thread , & ret ) ;
Waits for thread to finish
Sets ret to the return value from the thread start function
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Aside: Function Pointers
void example ( void ) { printf ( " Stuff goes here \ n " )
; }
void user ( void )
{
// Call example
example () ;
// Store a pointer to example :
void (* funcPtr ) ( void ) = example ;
// Call example via the function pointer
funcPtr () ;
}
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Thread Overhead
Creating the stack
Copying all of the thread-local variables
Context switching if number of threads exceeds number of
cores
Synchronisation costs (including implicit cache coherency
cost)
Sometimes adding threads makes things slower!
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Example: Parallel Quicksort
Recursive sorting algorithm
Perform sub-sorts in a new thread
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Revision: Quicksort
1. Pick pivot point
2. Split array into values lower than pivot and values greater
than pivot
3. Recursively sort each sub-array.
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Serial Quicksort
void quicksort ( int * array , int left , int right )
{
if ( right > left )
{
int pivotIndex = left + ( right - left ) /2;
pivotIndex = partition ( array , left , right ,
pivotIndex ) ;
quicksort ( array , left , pivotIndex -1) ;
quicksort ( array , pivotIndex +1 , right ) ;
}
}
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Making it Parallel
Partition cant (easily) be done in parallel
Recursive calls can
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Recursive Quicksort
void quicksort ( int * array , int left , int right )
{
if ( right > left )
{
int pivotIndex = left + ( right - left ) /2;
pivotIndex = partition ( array , left , right ,
pivotIndex ) ;
struct qsort_starter arg = { array , left ,
pivotIndex -2};
pthread_t thread ;
// Create a new thread for one subsort
pthread_create (& thread , 0 , qsthread , & arg ) ;
quicksort ( array , pivotIndex +1 , right ) ;
// Wait for both to finish
pthread_join ( thread , NULL ) ;
}
}
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Why The Structure?
struct qsort_starter
{
int * array ;
int left ;
int right ;
};
void * qsthreadthread ( void * init )
{
struct qsort_starter * start = init ;
quicksort ( start - > array , start - > left , start - >
right ) ;
return NULL ;
}
Thread function only takes one (void*) argument
If we want to pass more than one, we must pass a struct
pointer
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Is this Safe?
struct qsort_starter arg = { array , left ,
pivotIndex -2};
pthread_t thread ;
// Create a new thread for one subsort
pthread_create (& thread , 0 , qsthread , & arg ) ;
Usually not - passing stack allocation to another thread
Safe here because of the pthread_join() call.
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Testing Speedup
$ time ./a.out
real 0m30.792s
user 0m30.552s
sys 0m0.222s
real (a.k.a wall clock time) - elapsed time from start to
finish
user CPU time scheduled to the process
sys CPU time scheduled to the kernel to handle system
calls for the process
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Multithreading Performance
35
Wall Clock Time
CPU Time
30
25
20
Seconds
15
10
0
0 2 4 6 8 10 12
Depth
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Speedup
T1
Sp =
Tp
T1 time for sequential algorithm
Tp time for parallel algorithm with p processors
Sp speedup with p processors
Wall clock time, not CPU time!
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Efficiency
Sp
Ep =
p
More useful metric
Can compare Ep values independently of p
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Parallel Quicksort Speedup
Threads p Tp Sp Ep
1 1 30.792 1 1
2 2 24.044 1.28 0.64
4 2 23.780 1.29 0.65
8 2 24.548 1.25 0.63
16 2 18.784 1.63 0.82
Linear speedup: Sp = p, Ep = 1
Superlinear speedup (very rare!) Ep > 1
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Why Sublinear?
The partition step is serial
Uneven workload distribution between threads (e.g. one
thread sorting 998 elements, one sorting 2)
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Amdahls Law
Maximum speedup of a program is limited by the sequential
portion
For quicksort, this is the partition step
Absolute best case, with infinite processors, is O(n) for
quicksort
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Parallel Quicksort is Easy
No data sharing between threads
No communication between threads, except on exit
Concurrent parts could be in separate processes
Threads just eliminate some copying overhead
Revision Thread Portability Creating and Joining Threads Parallel Quicksort
Questions?