
6CS005 High Performance Computing
Lecture 5: Parallel Programming
Matrix-Vector Multiplication
Pseudocode for serial matrix-vector multiplication

If A = (aij) is an m×n matrix and x is a vector with n components, then y = A*x is a vector with m components, and we can find the ith component of y by forming the dot product of the ith row of A with x:

  y[i] = a[i][0]*x[0] + a[i][1]*x[1] + … + a[i][n-1]*x[n-1]
Parallelization of the matrix-vector
multiplication
• We need to parallelize this by dividing the work among the threads.
• One possibility is to divide the iterations of the outer loop
among the threads.
• If we do this, each thread will compute some of the
components of y.
Parallelising Processing

• For example, suppose that m = 6 and the number of threads, thread_count or n, is three. Then the computation could be divided among the threads as follows:
  – Thread 0: y[0], y[1]
  – Thread 1: y[2], y[3]
  – Thread 2: y[4], y[5]
Contd…

• To compute y[0], thread 0 will need to execute the code

    y[0] = 0.0;
    for (j = 0; j < n; j++)
        y[0] += A[0][j] * x[j];

• Therefore thread 0 will need to access every element of row 0 of A and every element of x.
• More generally, the thread that has been assigned y[i] will need to execute the code

    y[i] = 0.0;
    for (j = 0; j < n; j++)
        y[i] += A[i][j] * x[j];
Parallelising Processing

• Thus, for n threads and m rows:

  – Thread 0
    • start row = 0
    • end row = (m/n) - 1
  – Thread 1
    • start row = (m/n)
    • end row = 2*(m/n) - 1
  – Thread 2
    • start row = 2*(m/n)
    • end row = 3*(m/n) - 1
  – Thread 3
    • start row = 3*(m/n)
    • end row = 4*(m/n) - 1

  – Thread k
    • start row = k * (m/n)
    • end row = ((k + 1) * (m/n)) - 1  /* the very last end row needs to be adjusted if m is not a multiple of n */
Pthread Function for matrix-vector multiplication
4x2 and 2x3 Matrix-Matrix
Multiplication
General Matrix-Matrix Multiplication
Pseudocode for General Matrix-Matrix Multiplication of A × B = C
Race Condition

• A race condition arises in software when a computer program, to operate properly, depends on the sequence or timing of the program's processes or threads.
• Critical race conditions often happen when the processes or threads depend on some shared state.
Race Condition

Critical Sections
Busy Wait

Busy Wait
Mutex
Issues

• Since a thread that is busy-waiting may continually use the CPU, busy-waiting is generally not an ideal solution to the problem of limiting access to a critical section.
• Two better solutions are mutexes and semaphores.
• Mutex is an abbreviation of mutual exclusion, and a mutex is a special type of variable that, together with a couple of special functions, can be used to restrict access to a critical section to a single thread at a time.
• Thus a mutex can be used to guarantee that one thread “excludes” all other threads while it executes the critical section.
• Hence, the mutex guarantees mutually exclusive access to the critical section.

Fig: Run-times (in seconds) of the π programs using n = 10^8 terms on a system with two four-core processors
Barriers and Condition Variables
Barriers

• In shared-memory programming: synchronizing the threads by making sure that they all are at the same point in a program.
• Such a point of synchronization is called a barrier, because no thread can proceed beyond the barrier until all the threads have reached it.
Busy-waiting and a Mutex

• Implementing a barrier using busy-waiting and a mutex is straightforward: we use a shared counter protected by the mutex.
• When the counter indicates that every thread has entered the critical section, threads can leave the busy-wait loop.
Busy-waiting and a Mutex
Problems in the Busy-Waiting and Mutex Implementation

• First barrier
  – Threads use a busy-wait loop (while (counter < thread_count)) to wait for all threads to reach the barrier, so threads consume CPU cycles while waiting.
• Second barrier
– Reusing the same counter variable causes unsynchronized
behavior.
– At the start of the second barrier, counter =
thread_count from the first barrier, making the condition
counter < thread_count immediately false.
– As a result, threads bypass the second barrier, leading to
unsynchronized execution.
Contd…

• Resetting the counter after the first barrier
  – Suppose the counter is reset after all threads exit the first barrier but before they all reach the second barrier.
  – Increments to counter may be lost, so the second barrier's exit condition (counter < thread_count becoming false) may never be satisfied.
  – This causes deadlock, as threads hang in the busy-wait loop of the second barrier.
Solution???

• To resolve these issues:
  – Use separate counters:
    Allocate independent counters for each barrier instance to avoid conflicts.
  – Use condition variables (pthread_cond_wait):
    Replace the busy-waiting loop with condition variables to manage thread synchronization efficiently and avoid race conditions. (Threads block until signaled, avoiding CPU waste.)
Implementing a barrier with semaphores
Contd…

• count_sem: protects the counter and ensures mutual exclusion during its modification.
• barrier_sem: blocks threads at the barrier until all threads have arrived.
• Threads increment the counter under protection by count_sem.
• The last thread resets the counter and releases all waiting threads by calling sem_post(&barrier_sem) multiple times.
• Semaphores prevent busy-waiting, conserving CPU resources.
Contd…

• counter and count_sem can be reused since they are properly reset before threads exit the barrier.
• Reusing barrier_sem can lead to a race condition if threads proceed to the next barrier before all threads have finished decrementing barrier_sem in the first barrier.
Solution???

• Separate barrier_sem for each barrier:
  – Allocate a unique barrier_sem for each barrier to prevent interference between barriers.
  – Ensure threads only interact with the barrier_sem specific to their current barrier.
• Synchronization Between Barriers:
– Ensure threads cannot proceed to the next barrier until all
threads have exited the first barrier.
– Use additional synchronization mechanisms (e.g., flags or
condition variables) to guarantee proper sequencing.
Condition Variables

• A somewhat better approach to creating a barrier in Pthreads is provided by condition variables.
• A condition variable is a data object that allows a thread to suspend execution until a certain event or condition occurs.
• When the event or condition occurs, another thread can signal the thread to “wake up.”
• A condition variable is always associated with a mutex.
Condition Variables

Typically, condition variables are used in constructs similar to this pseudocode:
Contd…

• Locking the mutex ensures exclusive access to shared resources.
• If the condition has occurred, the thread notifies other threads that are waiting on this condition variable using pthread_cond_signal or pthread_cond_broadcast.
• The function
    int pthread_cond_signal(pthread_cond_t* cond_var_p);
  will unblock one of the blocked threads, and
    int pthread_cond_broadcast(pthread_cond_t* cond_var_p);
  will unblock all of the blocked threads.
Contd…

• If not, the thread waits and releases the mutex, allowing other threads to update the counter.
Implementing a barrier with condition variables
Contd…

• int counter: tracks the number of threads that have reached the barrier.
• pthread_mutex_t mutex: protects access to the shared counter to prevent race conditions.
• pthread_cond_t cond_var: used for condition-based synchronization (to block and wake threads).
• pthread_mutex_lock(&mutex) locks the mutex to protect shared data.
• The counter is incremented by the thread to indicate it has reached the barrier.
• If the counter equals thread_count, all threads have reached the barrier. The thread resets the counter to 0. It then broadcasts a signal using pthread_cond_broadcast(&cond_var) to wake all threads waiting at the barrier.
• If the thread is not the last to reach the barrier, it waits on the condition variable by calling pthread_cond_wait(&cond_var, &mutex), which releases the mutex while the thread is blocked.
Pthreads Read-Write Locks

• Pthreads read-write locks are a synchronization mechanism used to allow multiple threads to access a shared resource with different levels of concurrency.
• Mutex vs. read-write locks:
  • A mutex allows only one thread to access a resource at any given time.
  • Read-write locks provide more flexibility by allowing multiple threads to read the resource concurrently but ensuring exclusive access for one thread when writing.
How Do Read-Write Locks Work?

• Read Lock (pthread_rwlock_rdlock):
  – Allows multiple threads to simultaneously read the resource.
– If any thread holds a read lock, other threads can also
acquire the read lock.
• Write Lock (pthread_rwlock_wrlock):
– Only one thread can hold a write lock at a time.
– If a thread holds the write lock, no other threads can
acquire either a read or a write lock.
• Unlock (pthread_rwlock_unlock):
– Used to release the read or write lock after the thread
finishes its operations on the shared resource.
Pthreads Read-Write Locks Functions

• int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock_p);
• int pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock_p);
• int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock_p);
• int pthread_rwlock_trywrlock(pthread_rwlock_t *rwlock_p);
• int pthread_rwlock_unlock(pthread_rwlock_t *rwlock_p);
• int pthread_rwlock_init(pthread_rwlock_t *rwlock_p, const pthread_rwlockattr_t *attr_p);
Thread-Safety

• In shared-memory programming, thread safety refers to the ability of a block of code to be executed by multiple threads simultaneously without causing issues, such as data corruption or incorrect results.
• Example
  – Suppose we want to use multiple threads to “tokenize” a file that consists of ordinary English text.
  – The tokens are just contiguous sequences of characters separated from the rest of the text by white-space: a space, a tab, or a newline.
Simple approach

• Divide the input file into lines of text and assign the
lines to the threads in a round-robin fashion.
• The first line goes to thread 0, the second goes to thread 1, …, the nth goes to thread n-1, the (n+1)st goes back to thread 0, etc.
• We can serialize access to the lines of input using
semaphores.
• After a thread has read a single line of input, it can
tokenize the line using the C standard library
"strtok()" function.
The strtok() function

• The first time it’s called, the string argument should be the text to be tokenized.
• For subsequent calls, the first argument should be NULL.
• The idea is that in the first call, strtok caches a pointer to string, and for subsequent calls it returns successive tokens taken from the cached copy.
Running with one thread

• It correctly tokenizes the input stream.
  – Pease porridge hot.
  – Pease porridge cold.
  – Pease porridge in the pot
  – Nine days old.
Running with two threads
What happened?

• strtok() caches the input line by declaring a variable to have static storage class.
• This causes the value stored in this variable to persist from one call to the next.
• Unfortunately for us, this cached string is a global.
• Thus, thread 0’s call to strtok() with the third line of the input has apparently overwritten the contents of thread 1’s call with the second line.
• So the strtok() function is not thread-safe.
• If multiple threads call it simultaneously, the output may not be correct.
Other unsafe C library functions

• It’s not uncommon for C library functions to not be thread-safe.
• The random number generator random in stdlib.h.
• The time conversion function localtime in time.h.
Solution??

• To resolve this issue, use a thread-safe version of strtok, such as strtok_r (reentrant strtok), which allows each thread to maintain its own internal state during tokenization.
• Each thread would pass its own saveptr to strtok_r, allowing them to tokenize independently without affecting each other.
Summary

• A thread in shared-memory programming is analogous to a process in distributed-memory programming.
• However, a thread is often lighter-weight than a full-fledged process.
• In Pthreads programs, all the threads have access to global variables, while local variables usually are private to the thread running the function.
• When multiple threads attempt to access a shared resource, at least one of the accesses is an update, and the accesses can result in an error, we have a race condition. The shared resource might be a shared variable or a shared file.
Summary

• A critical section is a block of code that updates a shared resource that can only be updated by one thread at a time.
• So the execution of code in a critical section should, effectively, be executed as serial code.
• Busy-waiting can be used to avoid conflicting access to critical sections with a flag variable and a while-loop with an empty body.
• It can be very wasteful of CPU cycles.
• It can also be unreliable if compiler optimization is turned on.
Summary

• A mutex can be used to avoid conflicting access to critical sections as well.
• Think of it as a lock on a critical section, since mutexes arrange for mutually exclusive access to a critical section.
• A semaphore is the third way to avoid conflicting access to critical sections.
• It is an unsigned int together with two operations: sem_wait and sem_post.
• Semaphores are more powerful than mutexes since they can be initialized to any nonnegative value.
Summary

• A barrier is a point in a program at which the threads block until all of the threads have reached it.
• A read-write lock is used when it’s safe for multiple threads to simultaneously read a data structure, but if a thread needs to modify or write to the data structure, then only that thread can access the data structure during the modification.
• Some C functions cache data between calls by declaring variables to be static, causing errors when multiple threads call the function.
• This type of function is not thread-safe.
End of Lecture 5
