
Parallel Python

Aitor Morales-Gregorio
Jenni Rinker
Zbigniew Jędrzejewski-Szmek
ASPP 2024, Heraklion
Fork/clone the repo now!
Outline

● Processes, threads and THE GIL

● Hands-on investigations of embarrassingly parallel problems

A. Multithreading with NumPy

B. The multiprocessing package

C. Blending processes and threads

● Going further

● Wrap-up

Exercise: brainstorm
Why do we parallelize?

Talk to your partner and come up with three practical examples of where
parallelization could be beneficial (in your work or another application).

Exercise: brainstorm
Why do we parallelize?

Talk to your partner and come up with three practical examples of where
parallelization could be beneficial (in your work or another application).

In short, two reasons why:

● Speed up computations.
● Process “big” things.

As for the “how”...we’ll come back to that later.

Processes, threads and THE GIL

The dakos program

The dakos program.
We need to make a single dish: DAKOS.
This requires:
1. Fetch olive oil from the pantry
2. Fetch rusks from the pantry
3. Drizzle the rusks with water
4. Drizzle the rusks with olive oil
5. Fetch tomatoes from the pantry
6. Wash the tomatoes
7. Grate the tomatoes
8. Fetch feta from the pantry
9. Grate the feta cheese
10. Combine the softened rusks, grated tomatoes, and grated
cheese
11. Place the finished dakos in the pantry

Optimize the dakos program

1. Fetch olive oil from the pantry
2. Fetch rusks from the pantry
3. Drizzle the rusks with water
4. Drizzle the rusks with olive oil
5. Fetch tomatoes from the pantry
6. Wash the tomatoes
7. Grate the tomatoes
8. Fetch feta from the pantry
9. Grate the feta cheese
10. Combine the softened rusks, grated tomatoes, and grated cheese
11. Place the finished dakos in the pantry

Labels on the kitchen diagram:
● workstation: place for non-fetch actions
● chef: a stupid box (does actions)
● countertop: temporarily holds things (recipe, intermediate ingredients, etc.)
● pantry: across the street
Congratulations!
You have just designed a

multi-threaded process.

Decoding the metaphor
(Nice explanation of processes, threads, memory in [1].)

Key concepts:
● Process
○ A running program. The OS assigns it space in memory for instructions/data.
● Thread
○ Unit of computation (i.e., set of instructions) that the OS sends to the CPU for execution.
● …and others
○ Recall the computer architecture lecture!
○ NOTE: this metaphor ignores caching.

Some* elements of our kitchen metaphor:
● Workstations
● Countertop
● Chef with list of tasks (a stupid box)
● Pantry
● (extra) Restaurant owner

So which element is which?
● Workstations:
● Countertop:
● Chef w/tasks:
● Pantry:
● Restaurant owner:

*This list is non-exhaustive. ;)
But there is a problem…
…and that problem is tomatoes.

Consider a scenario:
● The dakos program requires fetching tomatoes one-by-one.
● The “fetch tomatoes” task includes a sub-task:
⚫ count all the tomatoes in the pantry,

⚫ take 1 tomato for the dako, and

⚫ write on a note on the countertop how many are left in the pantry

● What happens if two chefs making dakos execute their tasks at the same
time?
○ The number of tomatoes will be off by 1!

This problem is called a race condition

Race conditions are problematic in any multithreaded code. In Python, race conditions inside the interpreter (e.g., on reference counts) can even lead to memory corruption.

To avoid race conditions, *Python implemented something special called…

*To be specific, only CPython – Python built on the C language. Other Python implementations may not have it, but you are almost certainly using CPython.
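The tomato problem maps directly onto shared state in a multithreaded Python program. Below is a minimal sketch of such a race; the names pantry_note and chef are invented for illustration, and how far off the final count ends up depends on your interpreter version and on timing.

import threading

pantry_note = 0  # shared note on the countertop: tomatoes taken so far

def chef(n_tomatoes):
    """Read the note, take a tomato, write the note back -- unprotected."""
    global pantry_note
    for _ in range(n_tomatoes):
        current = pantry_note      # 1. count what the note says
        pantry_note = current + 1  # 2. write back the updated count

threads = [threading.Thread(target=chef, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 400_000, but the read-modify-write steps of different threads
# interleave and some updates get lost, so the printed value is usually lower.
print(pantry_note)

Wrapping the read-modify-write in a threading.Lock() would remove the race, at the cost of serializing that section.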
The “Global Interpreter Lock” (GIL)
● A “mutex” (mutual exclusion) lock.
● Within a Python process, only 1 thread is allowed to execute pure-Python code at any given instant.
● The lock is periodically released and re-acquired by the running threads (roughly every 100 bytecode instructions in old CPython; modern CPython switches on a time interval of about 5 ms). It is also released in other cases, e.g., during I/O.

Hypothesize with your partner:

NumPy can (and by default does) run code with multiple threads in parallel. How is this possible?
NumPy’s trick

NumPy interfaces with non-Python libraries that, by default, use as many threads as you have cores.

In other words, it is many chefs disguised as one!
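If you want to see (or limit) those hidden chefs, the third-party threadpoolctl package can inspect and cap the thread pools of the BLAS/OpenMP libraries that NumPy calls into. A minimal sketch, assuming threadpoolctl is installed alongside NumPy:

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Show which native libraries (OpenBLAS, MKL, OpenMP, ...) NumPy loaded
# and how many threads each will use by default.
print(threadpool_info())

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

_ = a @ b  # matrix multiply: by default uses as many threads as you have cores

with threadpool_limits(limits=1):
    _ = a @ b  # the same call, now restricted to a single thread

Environment variables such as OMP_NUM_THREADS (set before NumPy is imported) have a similar effect.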
What does this all mean?
Computationally heavy, pure-Python code will generally have 0 speed-up with
multiple threads.

Some specific packages (NumPy) get around this by spinning up multiple threads
without the Python interpreter knowing.

Note that network- and I/O-bound tasks release the GIL while they wait, so they can be handled effectively with multithreading.

Guess: how can we get around the GIL for non-NumPy, non-I/O code?
● Instead of multiple threads, use multiple processes.

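A rough sketch of that guess (not a careful benchmark): the same pure-Python, CPU-bound function run once with a thread pool and once with a process pool. The helper count_primes is invented for illustration; exact timings will vary with your machine and Python version.

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count_primes(limit):
    """Deliberately naive, CPU-bound, pure-Python work."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def timed(executor_cls, n_workers=4):
    start = time.perf_counter()
    with executor_cls(max_workers=n_workers) as ex:
        list(ex.map(count_primes, [50_000] * n_workers))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Threads: the GIL lets only one of them run Python bytecode at a time.
    print("threads:  ", timed(ThreadPoolExecutor))
    # Processes: each has its own interpreter and GIL, so the work overlaps.
    print("processes:", timed(ProcessPoolExecutor))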
Multiprocessing: multiple teams of chefs

Each process is an instance of the Python interpreter and therefore has its own GIL!

BUT processes have separate memory, so data must be duplicated in each process.

Multiple processes therefore have additional computational overhead and memory usage.

[Figure: red team and purple team split the dakos tasks; each team can have 1 chef working.]
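A minimal sketch of the “separate memory” point: a child process works on its own copy of the data, so whatever it changes is invisible to the parent. The pantry variable and raid_pantry function are invented for illustration.

from multiprocessing import Process

pantry = ["tomatoes", "feta", "rusks"]

def raid_pantry():
    pantry.clear()                 # empties the *child's* copy only
    print("child sees: ", pantry)  # []

if __name__ == "__main__":
    p = Process(target=raid_pantry)
    p.start()
    p.join()
    print("parent sees:", pantry)  # ['tomatoes', 'feta', 'rusks'] -- untouched

To get results back you return them (e.g., via a multiprocessing.Pool) or use explicit shared-memory tools; that duplication and communication is the overhead mentioned above.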
To wrap things up…a pop quiz!

On your pair computer, please navigate to kahoot.it and enter the game pin

Outline

● Processes, threads and THE GIL

● Hands-on investigations of embarrassingly parallel problems

A. Multithreading with NumPy

B. The multiprocessing package

C. Blending processes and threads

● Going further

● Wrap-up

Supplementary material

Further reading

1. Open textbook on Operating Systems. Chapters 4, 13 and 26 are particularly nice. pages.cs.wisc.edu

