Lecture 5 - NumPy I

ADVIST

by
Dmitrii Nechaev & Dr. Lothar Richter

25.11.22
Recap

Recap: Generators

a generator is a function that returns a generator iterator (it uses the yield keyword)
a generator expression combines the lazy evaluation of generators with the beauty and simplicity of list comprehensions

Recap: Function-Based Context Managers

we can create a context manager using a function that yields and the contextmanager decorator (from contextlib)
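
As a minimal sketch of a function-based context manager (the name timer and the dictionary it yields are hypothetical, chosen only for illustration): the code before yield runs when the with-block is entered, and the code after it runs when the block is left.

```python
from contextlib import contextmanager
import time


@contextmanager
def timer():
    """Hypothetical helper: measure the wall-clock time spent in a with-block."""
    stats = {"elapsed": None}
    start = time.perf_counter()
    try:
        # Everything before the yield runs when the with-block is entered.
        yield stats
    finally:
        # Everything after the yield runs when the with-block is left,
        # even if the block raised an exception.
        stats["elapsed"] = time.perf_counter() - start


with timer() as stats:
    total = sum(range(1_000))

print(total, stats["elapsed"] is not None)
```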

Recap: Closures

functions can accept other functions as arguments
functions can return functions
functions can be nested; in other words, an outer function can contain an inner function
when an inner function is defined within an outer function, the inner function keeps access to the variables of the outer function

Recap: Decorators

we can use closures to create function-based decorators
we can define decorators that have parameters (decorator factories)
functools.wraps helps us keep the original names and docstrings
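
The three points above can be sketched together in one small example. The decorator factory repeat and the function greet are hypothetical names used only for illustration.

```python
from functools import wraps


def repeat(times):
    """Hypothetical decorator factory: call the decorated function `times` times."""
    def decorator(func):
        @wraps(func)  # keep func.__name__ and func.__doc__ on the wrapper
        def wrapper(*args, **kwargs):
            # The closure remembers both `times` and `func`.
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@repeat(times=3)
def greet(name):
    """Return a greeting."""
    return f"hello, {name}"


print(greet("numpy"))
print(greet.__name__, greet.__doc__)
```

Without @wraps(func), greet.__name__ would report "wrapper" and the docstring would be lost.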

Recap: Type Hints

type hints allow us to specify types for variables, function parameters, and function return values
IDEs/editors and third-party tools can perform static type checking
libraries like pydantic allow us to perform validation based on type hints

Recap: Dataclasses

the @dataclass decorator simplifies class creation
it adds __init__, __repr__, and __eq__ methods by default
we can control method creation by invoking the @dataclass decorator with parameters

NumPy

Today

vectorized operations
NumPy data types
creating NumPy arrays
accessing NumPy arrays’ elements
math on NumPy arrays
views, copies, and shapes

Vectorized Operations

NumPy is a library that implements an N-dimensional array object along with linear algebra (and other) capabilities. It is written in Python and C.
… why should we care?

Pairwise Sum

Let us take a look at a simple example, the pairwise sum of list elements:

import numpy as np
from random import randint

length = 10_000_000
numbers_1 = [
    randint(-100, 100) for _ in range(length)
]
numbers_2 = [
    randint(-100, 100) for _ in range(length)
]

def pairwise_sum(iterable_1, iterable_2):
    return [
        x + y for x, y in zip(
            iterable_1, iterable_2
        )
    ]

Pairwise Sum with NumPy

Let us also use NumPy to achieve the same goal:

np_numbers_1 = np.array(numbers_1)
np_numbers_2 = np.array(numbers_2)

def np_pairwise_sum(np_array_1, np_array_2):
    return np_array_1 + np_array_2

Benchmarking Pairwise Sum

%timeit -r 10 -n 3 pairwise_sum(numbers_1, numbers_2)

719 ms ± 30.5 ms per loop (mean ± std. dev. of 10 runs, 3 loops each)

%timeit -r 10 -n 3 np_pairwise_sum(np_numbers_1, np_numbers_2)

26.7 ms ± 1.98 ms per loop (mean ± std. dev. of 10 runs, 3 loops each)
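
%timeit is an IPython/Jupyter magic command. Outside a notebook, a comparable measurement can be sketched with the standard timeit module. The smaller list length and the repeat counts below are assumptions chosen to keep the run short, and absolute timings will differ between machines.

```python
import timeit
from random import randint

import numpy as np

length = 100_000  # assumption: smaller than in the lecture to keep the run short
numbers_1 = [randint(-100, 100) for _ in range(length)]
numbers_2 = [randint(-100, 100) for _ in range(length)]
np_numbers_1 = np.array(numbers_1)
np_numbers_2 = np.array(numbers_2)


def pairwise_sum(iterable_1, iterable_2):
    return [x + y for x, y in zip(iterable_1, iterable_2)]


def np_pairwise_sum(np_array_1, np_array_2):
    return np_array_1 + np_array_2


# Best of 5 repeats with 3 calls per repeat, reported as seconds per call.
py_time = min(timeit.repeat(
    lambda: pairwise_sum(numbers_1, numbers_2), repeat=5, number=3)) / 3
np_time = min(timeit.repeat(
    lambda: np_pairwise_sum(np_numbers_1, np_numbers_2), repeat=5, number=3)) / 3

print(f"pure Python: {py_time * 1e3:.2f} ms, NumPy: {np_time * 1e3:.2f} ms")
```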

Dot Product

def dot_product(iterable_1, iterable_2):
    return sum(
        x * y for x, y in zip(
            iterable_1, iterable_2
        )
    )

def np_dot_product(np_array_1, np_array_2):
    return np.dot(np_array_1, np_array_2)

Benchmarking Dot Product

%timeit -r 10 -n 3 dot_product(numbers_1, numbers_2)

990 ms ± 17.2 ms per loop (mean ± std. dev. of 10 runs, 3 loops each)

%timeit -r 10 -n 3 np_dot_product(np_numbers_1, np_numbers_2)

12.8 ms ± 1.43 ms per loop (mean ± std. dev. of 10 runs, 3 loops each)

Runtime comparison

Approximate runtime per call, in milliseconds:

               Python   NumPy
Pairwise Sum   ~800     ~30
Dot Product    ~1000    ~15

Why?

No, seriously, why?

Hardware - Memory

a CPU manipulates values stored in CPU registers (fastest, smallest)
if a value is not present in a CPU register, it needs to be copied from RAM
with an L1 cache: if a value is not present in a CPU register, the L1 cache is checked first; on a cache miss, the value is first copied from RAM to the L1 cache
L2 and L3 caches add further levels between the L1 cache and RAM
Ryzen 7 5800X3D specifications (totals across all cores):
L1: 512KB
L2: 4MB
L3: 96MB

[Figure: Hardware - Memory]

Cache?

Yes, cache.
Spatial locality: data is loaded from RAM into the cache in chunks of bytes (cache lines); in other words, values that are adjacent in memory are copied together (under the assumption that we will probably process them all).

Adjacent Values and Python Lists

“CPython’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.”

https://docs.python.org/3/faq/design.html

“The implementation uses a contiguous array of references to other objects”

Source Code:

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

[Figure: Adjacent Values and Python Lists]

[Figure: Adjacent Values and NumPy Arrays]

Hardware - CPU

Single Instruction, Multiple Data (SIMD):
a CPU is provided with multiple values simultaneously and performs an operation on all of them at once (instruction set extensions such as SSE and AVX, …). This is also known as vectorization.

def vectorized_pairwise_sum(iterable1, iterable2):
    stride = 4
    result = []
    for i in range(0, len(iterable1), stride):
        result.append(
            iterable1[i:i+stride] +
            iterable2[i:i+stride]
        )

(This is pseudocode, not a working Python snippet: on lists, + concatenates instead of adding element-wise, and nothing is returned =^__^=)
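
A runnable approximation of the idea above, assuming NumPy arrays as inputs: each slice of four elements is added in one step and the partial results are joined at the end. The function name chunked_pairwise_sum is hypothetical, and the snippet only mimics what SIMD hardware does; it is not how NumPy is implemented internally.

```python
import numpy as np


def chunked_pairwise_sum(array_1, array_2, stride=4):
    """Add two equally long, non-empty 1-D arrays in chunks of `stride` elements."""
    chunks = []
    for i in range(0, len(array_1), stride):
        # On NumPy arrays, + adds element-wise (on lists it would concatenate).
        chunks.append(array_1[i:i + stride] + array_2[i:i + stride])
    return np.concatenate(chunks)


a = np.arange(10)
b = np.arange(10) * 10
print(chunked_pairwise_sum(a, b))
```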

Why Was NumPy Faster?

Python lists hold references to their elements: instead of containing the values themselves, they contain the locations in memory where the values are stored. That gives us heterogeneity, but it also means we have no guarantee that all (or, at least, enough) of our values are copied to the cache in one operation.
NumPy arrays are homogeneous and store data in sequential chunks of memory*. That means several values can be copied from RAM to the CPU cache at once.
NumPy also supports vectorized operations on the data. That means we
don’t need to explicitly loop over elements;
get the results of our computations faster.
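
The difference in memory layout can be inspected directly. This is a small sketch assuming CPython on a 64-bit platform; the exact size of an int object is implementation-dependent, so it is printed without a stated expected value.

```python
import sys

import numpy as np

values = [1, 2, 3, 4]
array = np.array(values, dtype=np.int64)

# The list stores references; every element is a separate Python object.
print(sys.getsizeof(values[0]))      # size of one int object, in bytes

# The array stores the raw values back to back in one buffer.
print(array.itemsize)                # bytes per element
print(array.nbytes)                  # itemsize * number of elements
print(array.flags["C_CONTIGUOUS"])   # whether the buffer is contiguous
```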

Vectorization: Overview

vectorization refers to performing an operation on several values at once (SIMD)
alternatively, vectorization means converting an algorithm from operating on a single value at a time to operating on several values at a time
several values need to be present at once in CPU registers to perform vectorized operations
vanilla Python does not provide vectorization capabilities
NumPy n-dimensional arrays store data in contiguous chunks of memory and implement optimizations to perform vectorized operations

NumPy Data Types

Let’s create a couple of NumPy arrays:

arr_1 = np.array([1, 2, 3])
arr_2 = np.array([4, 5, 6])
arr_1 + arr_2

array([5, 7, 9])

We have mentioned that NumPy arrays contain elements of the same data type. What is the data type in this particular example?

arr_1.dtype

dtype('int64')

NumPy Data Types: Numbers

We didn’t set the data type explicitly, so NumPy used its default integer data type (int64 on most 64-bit platforms). Let’s specify a data type:

arr_1 = np.array(
    [255, 255, 255], dtype=np.uint8
)
arr_2 = np.array(
    [1, 1, 1], dtype=np.uint8
)
arr_1 + arr_2

array([0, 0, 0], dtype=uint8)

Everything has a price, and so do the benefits of data types occupying fixed space in memory. We just got an integer overflow: 255 + 1 does not fit into a uint8, so the result silently wraps around to 0. No exception is raised!
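
A short sketch of the wraparound and of one way to avoid it, namely casting to a wider type before the addition (the variable names wrapped and widened are only illustrative):

```python
import numpy as np

arr_1 = np.array([255, 255, 255], dtype=np.uint8)
arr_2 = np.array([1, 1, 1], dtype=np.uint8)

# uint8 arithmetic wraps around modulo 256 without raising an exception.
wrapped = arr_1 + arr_2

# Casting one operand to a wider type before adding avoids the wraparound.
widened = arr_1.astype(np.uint16) + arr_2

print(wrapped, wrapped.dtype)
print(widened, widened.dtype)
```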

NumPy Data Types: Values

We can check the range of values an integer data type can hold using np.iinfo:

np.iinfo(arr_1.dtype)

iinfo(min=0, max=255, dtype=uint8)

np.iinfo(np.int16)

iinfo(min=-32768, max=32767, dtype=int16)

np.iinfo(np.uint32)

iinfo(min=0, max=4294967295, dtype=uint32)

np.iinfo(np.int64)

iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

NumPy Data Types: Numbers

We also have Boolean and floating-point values:

arr_1 = np.array([1.0, 2.0, 3.0])
arr_1.dtype

dtype('float64')

arr_2 = np.array([True, False, True])
arr_2.dtype

dtype('bool')

arr_1 + arr_2

array([2., 2., 4.])

NumPy Data Types: Strings

You will mostly use NumPy with numeric data. That said, it is also possible to use NumPy with strings:

np.array(['a', 'b', 'c'])

array(['a', 'b', 'c'], dtype='<U1')

np.array(['aa', 'bb', 'cc'])

array(['aa', 'bb', 'cc'], dtype='<U2')

np.array(['a', 'bb', 'ccc'])

array(['a', 'bb', 'ccc'], dtype='<U3')

NumPy uses the maximum length of the strings present in an array at creation time as the upper limit on all values; longer strings assigned later are silently truncated. Be careful!

arr_s = np.array(['martin', 'oliver'])
arr_s[0] = 'andreas'
arr_s[1] = 'dmitrii'
arr_s

array(['andrea', 'dmitri'], dtype='<U6')
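
One way to avoid the truncation, sketched here under the assumption that an upper bound on the string length is known in advance (the width of 10 characters is an arbitrary choice for illustration), is to request a wider Unicode dtype when the array is created:

```python
import numpy as np

names = np.array(['martin', 'oliver'])
print(names.dtype)            # fixed width: 6 characters per element

# Requesting a wider Unicode dtype up front leaves room for longer strings.
wide_names = np.array(['martin', 'oliver'], dtype='<U10')
wide_names[0] = 'andreas'
wide_names[1] = 'dmitrii'
print(wide_names)
```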

NumPy Data Types: Object

We can store strings of arbitrary length using object. In fact, we can store arbitrary objects in NumPy arrays using object:

arr_s = np.array(
    ['martin', 'oliver'],
    dtype=object
)
arr_s[0] = 'andreas'
arr_s[1] = 'dmitrii'
arr_s

array(['andreas', 'dmitrii'], dtype=object)

When we use object, we create a NumPy array storing references to Python objects.

NumPy Data Types: Casting

We can change the data type via the astype method:

arr_1 = np.array([1.1, 2.2, 3.3])
arr_2 = arr_1.astype(np.uint8)
arr_2

array([1, 2, 3], dtype=uint8)

arr_s = np.array(['10', '20', '30'])
arr_s.astype(np.uint8)

array([10, 20, 30], dtype=uint8)
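
Note that casting floats to integers with astype drops the fractional part instead of rounding. A small sketch with illustrative values:

```python
import numpy as np

floats = np.array([1.9, -1.9, 2.5])

# astype drops the fractional part (truncation toward zero); it does not round.
truncated = floats.astype(np.int64)

# Rounding first gives the nearest integer (np.round rounds halves to even).
rounded = np.round(floats).astype(np.int64)

print(truncated)
print(rounded)
```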

NumPy Data Types: Overview

int8    uint8    float16
int16   uint16   float32
int32   uint32   float64
int64   uint64   bool_
unicode
string
object

Creating NumPy Arrays

We have already used lists to create NumPy arrays. We can also use tuples:

np.array((1, 2, 3))

array([1, 2, 3])

We can use ranges, too:

np.array(range(1, 10))

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

We can use other sequences, such as a deque, to create NumPy arrays:

from collections import deque
mdq = deque()
mdq.appendleft(3)
mdq.appendleft(2)
mdq.appendleft(1)
np.array(mdq)

array([1, 2, 3])

We can also use the fromiter function to create NumPy arrays from arbitrary iterables, such as sets (which drop duplicates) and generators:

np.fromiter({3, 2, 1, 2, 3}, dtype=np.int16)

array([1, 2, 3], dtype=int16)

def tree_fiddy():
    for i in [1, 2, 3, 50]:
        yield i

np.fromiter(tree_fiddy(), dtype=np.int8)

array([ 1, 2, 3, 50], dtype=int8)
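
np.fromiter always needs an explicit dtype. When the number of items is known in advance, the optional count argument lets NumPy allocate the result once instead of growing it while iterating. The generator squares below is a hypothetical example.

```python
import numpy as np


def squares(n):
    """Hypothetical generator yielding the first n square numbers."""
    for i in range(n):
        yield i * i


# count tells fromiter how many items to read from the iterable.
arr = np.fromiter(squares(5), dtype=np.int64, count=5)
print(arr)
```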

NumPy provides several helpers for array creation:

np.arange(1, 10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

np.zeros((1, 2))

array([[0., 0.]])

np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

np.full((2, 3), 4)

array([[4, 4, 4],
       [4, 4, 4]])

np.random.random((2, 2))

array([[0.12745227, 0.72619059],
       [0.31397517, 0.09816663]])

np.random.randint(0, 99, (2, 2))

array([[54, 54],
       [21, 54]])
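
Random output changes from run to run. As a sketch of how to make it reproducible, the snippet below uses NumPy's Generator interface (np.random.default_rng, available since NumPy 1.17) with a fixed seed; the seed value 42 is an arbitrary assumption. The upper bound of np.random.randint and of Generator.integers is exclusive by default.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducibility

floats = rng.random((2, 2))                    # uniform floats in [0, 1)
integers = rng.integers(0, 99, size=(2, 2))    # integers in [0, 99)

# Re-creating the generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(floats, rng_again.random((2, 2))))
```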

NumPy Array Creation: Overview

use array to create arrays from sequences
use fromiter to create arrays from iterables
helper functions arange, zeros, ones, eye, full
random numbers with random.random and random.randint

NumPy Arrays: Element Access

We can access elements of NumPy arrays by their indices:

arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
arr_2d[1]

array([4, 5, 6])

arr_2d[1, 1]

5

We can use slicing:

arr_2d[:2, 1:]

array([[2, 3],
       [5, 6]])

arr_2d[1:, 2]

array([6, 9])

We can also access array elements using Boolean indexing:

numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
booleans = (numbers % 2 == 0)
booleans

array([[False, True, False],
       [ True, False, True],
       [False, True, False]])

numbers[booleans]

array([2, 4, 6, 8])
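
Boolean masks can be combined and can also be used for assignment. A brief sketch (the names mask and clipped are illustrative):

```python
import numpy as np

numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Combine element-wise conditions with & (and) and | (or);
# the parentheses are required because of operator precedence.
mask = (numbers % 2 == 0) & (numbers > 4)
print(numbers[mask])

# A mask can also select the elements to overwrite.
clipped = numbers.copy()
clipped[clipped > 6] = 6
print(clipped)
```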

NumPy Arrays: Fancy Indexing

We can pass an array of indices to access multiple elements at once:

numbers

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

numbers[[0, 2, 1], [1, 2, 0]]

array([2, 9, 4])

[numbers[0, 1], numbers[2, 2], numbers[1, 0]]

[2, 9, 4]

We can combine simple indices with fancy indexing:

numbers[2, [2, 0]]

array([9, 7])

[numbers[2, 2], numbers[2, 0]]

[9, 7]

We can combine slicing with fancy indexing, too:

numbers[1:, [1, 0]]

array([[5, 4],
       [8, 7]])
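
One practical difference between slicing and fancy indexing is that basic slicing returns a view on the original data, whereas fancy indexing returns a new array. The following sketch checks this with np.shares_memory; the variable names are illustrative.

```python
import numpy as np

numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

sliced = numbers[1:, :2]         # basic slicing: a view on the same memory
picked = numbers[[1, 2], :2]     # fancy indexing: a new array (a copy)

print(np.shares_memory(numbers, sliced))
print(np.shares_memory(numbers, picked))

picked[0, 0] = -1                # does not touch numbers
sliced[0, 0] = 40                # writes through to numbers[1, 0]
print(numbers[1, 0])
```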

Sorting

We can sort an array using np.sort:

np.sort(np.array([2, 3, 1, 5, 4]))

array([1, 2, 3, 4, 5])

We can sort an array along a specific axis (dimension):

np.sort(
    np.array(
        [
            [2, 5, 6],
            [1, 4, 3]
        ]
    ),
    axis=0
)

array([[1, 4, 3],
       [2, 5, 6]])

np.sort(
    np.array(
        [
            [2, 5, 6],
            [1, 4, 3]
        ]
    ),
    axis=1
)

array([[2, 5, 6],
       [1, 3, 4]])
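
Two related tools are worth a quick sketch: np.argsort returns the indices that would sort an array, and the sort method of an array sorts it in place, whereas np.sort returns a sorted copy.

```python
import numpy as np

arr = np.array([2, 3, 1, 5, 4])

sorted_copy = np.sort(arr)    # returns a sorted copy, arr is unchanged
order = np.argsort(arr)       # indices that would sort arr

print(sorted_copy)
print(order)
print(arr[order])

arr.sort()                    # the method sorts in place
print(arr)
```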

Finding Elements

We can get the indices of elements satisfying a specific condition via the np.where function:

arr = np.random.randint(10, 20, (5, ))
arr

array([13, 16, 18, 19, 12])

indices = np.where(arr > 15)
indices

(array([1, 2, 3]),)

arr[indices]

array([16, 18, 19])
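
np.where also has a three-argument form that selects, per element, between two alternatives. A minimal sketch using the values from the example above as a fixed array:

```python
import numpy as np

arr = np.array([13, 16, 18, 19, 12])

# With one argument, np.where returns a tuple of index arrays.
(indices,) = np.where(arr > 15)

# With three arguments, it picks per element from the second or third argument.
capped = np.where(arr > 15, 15, arr)

print(indices)
print(capped)
```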

NumPy Array: Shape

We can use the len function, but it will simply return the number of elements in the first dimension:

len(np.arange(1, 10))

9

len(np.array([[1, 2], [3, 4]]))

2

If we want to get the number of elements along each dimension, we need to use the shape attribute:

np.array([[1, 2], [3, 4]]).shape

(2, 2)

Accessing NumPy Arrays: Overview

access by index
slicing
boolean indexing
fancy indexing
combining the above
sorting with sort, finding elements with where
checking the number of elements per dimension with shape

Math on NumPy Arrays

We can apply basic mathematical operations either as operators or as functions:

arr_1 = np.array([1, 2, 3])
arr_2 = np.array([4, 5, 6])

np.add(arr_1, arr_2)

array([5, 7, 9])

arr_1 * arr_2

array([ 4, 10, 18])

arr_2 - arr_1

array([3, 3, 3])

np.divide(arr_2, arr_1)

array([4. , 2.5, 2. ])

In the previous example, multiplication means pairwise (element-wise) multiplication. If we want to perform vector (matrix) multiplication, we need to use the matmul function or the @ operator:

np.matmul(arr_1, arr_2)

32

arr_1 @ arr_2

32
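
For two-dimensional arrays the difference between * and @ is easier to see. A short sketch with two illustrative 2x2 matrices:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(a * b)    # element-wise product
print(a @ b)    # matrix product, same as np.matmul(a, b)
```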

We have sum and product (prod) methods/functions:

arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
arr_2d