NumPy I
ADVIST
by
Dmitrii Nechaev & Dr. Lothar Richter
25.11.22
Recap
Recap: Generators
Recap: Function-Based Context Managers
Recap: Closures
Recap: Decorators
Recap: Type Hints
type hints allow us to specify types for variables, function parameters, and
function return values
IDEs/editors and 3rd party tools can perform static type checking
libraries like pydantic allow us to perform validation based on type hints
Recap: Dataclasses
NumPy
Today
vectorized operations
NumPy data types
creating NumPy arrays
accessing NumPy arrays’ elements
math on NumPy arrays
views, copies, and shapes
Vectorized Operations
NumPy
NumPy is a library implementing an N‑dimensional array object and linear algebra (and
other) capabilities. It is written in Python and C.
… why should we care?
Pairwise Sum
Let us take a look at a simple example, the pairwise sum of list elements:
import numpy as np
from random import randint

length = 10_000_000
numbers_1 = [
    randint(-100, 100) for _ in range(length)
]
numbers_2 = [
    randint(-100, 100) for _ in range(length)
]
Pairwise Sum with NumPy
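The two implementations being compared might look like the sketch below. The names np_pairwise_sum, np_numbers_1, and np_numbers_2 appear in the benchmark; pairwise_sum is an assumed name for the pure-Python version, and the inputs are kept tiny so the snippet runs instantly.

```python
import numpy as np

def pairwise_sum(list_1, list_2):
    # pure Python (assumed name): one interpreted addition per pair
    return [a + b for a, b in zip(list_1, list_2)]

def np_pairwise_sum(arr_1, arr_2):
    # NumPy: a single vectorized addition executed in compiled code
    return arr_1 + arr_2

numbers_1 = [1, -2, 3]
numbers_2 = [10, 20, 30]
np_numbers_1 = np.array(numbers_1)
np_numbers_2 = np.array(numbers_2)

python_result = pairwise_sum(numbers_1, numbers_2)
numpy_result = np_pairwise_sum(np_numbers_1, np_numbers_2)
print(python_result)  # [11, 18, 33]
print(numpy_result)   # [11 18 33]
```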
Benchmarking Pairwise Sum
719 ms ± 30.5 ms per loop (mean ± std. dev. of 10 runs, 3 loops each)
%timeit -r 10 -n 3 np_pairwise_sum(np_numbers_1, np_numbers_2)
26.7 ms ± 1.98 ms per loop (mean ± std. dev. of 10 runs, 3 loops each)
Dot Product
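A possible pair of dot product implementations, as a minimal sketch. np_dot_product is the name used in the benchmark; dot_product is an assumed name for the pure-Python counterpart.

```python
import numpy as np

def dot_product(list_1, list_2):
    # pure Python (assumed name): multiply pairwise, then add everything up
    return sum(a * b for a, b in zip(list_1, list_2))

def np_dot_product(arr_1, arr_2):
    # NumPy: multiplication and summation both happen in compiled code
    return np.dot(arr_1, arr_2)

python_dot = dot_product([1, 2, 3], [4, 5, 6])
numpy_dot = np_dot_product(np.array([1, 2, 3]), np.array([4, 5, 6]))
print(python_dot, numpy_dot)  # 32 32
```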
Benchmarking Dot Product
990 ms ± 17.2 ms per loop (mean ± std. dev. of 10 runs, 3 loops each)
%timeit -r 10 -n 3 np_dot_product(np_numbers_1, np_numbers_2)
12.8 ms ± 1.43 ms per loop (mean ± std. dev. of 10 runs, 3 loops each)
Runtime comparison
                Python       NumPy
Pairwise Sum    ~800 ms      ~30 ms
Dot Product     ~1000 ms     ~15 ms
Why?
Hardware - Memory
a CPU manipulates values stored in CPU registers (fastest, smallest)
if a value is not present in a CPU register, it needs to be copied from RAM
an L1 cache sits between the registers and RAM: if a value is not present in a
CPU register, the L1 cache is checked first; on a cache miss, the value is
first copied from RAM to the L1 cache
L2 and L3 caches extend the same idea with larger but slower levels
Ryzen 7 5800X3D specifications:
L1: 512KB
L2: 4MB
L3: 96MB
Cache?
Yes, cache.
Spatial locality: data is loaded in chunks of bytes from RAM to the cache; in other
words, values that are adjacent in memory are copied together (under the assumption
that we will probably process them all).
Adjacent Values and Python Lists
“CPython’s lists are really variable-length arrays, not Lisp-style linked lists. The
implementation uses a contiguous array of references to other objects, and
keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the
size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some
cleverness is applied to improve the performance of appending items repeatedly;
when the array must be grown, some extra space is allocated so the next
few times don’t require an actual resize.”
https://docs.python.org/3/faq/design.html
Adjacent Values and Python Lists
Source Code:
typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;
Adjacent Values and NumPy Arrays
Hardware - CPU
Single Instruction, Multiple Data:
a CPU is simultaneously provided with multiple values and performs an operation on all
of them at once (SIMD, SSE, AVX, …). This is also known as vectorization.
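def vectorized_pairwise_sum(iterable1, iterable2):
    # conceptual sketch: each slice addition stands for ONE SIMD instruction
    # that adds `stride` pairs of values at once (assumes NumPy arrays;
    # with plain lists, + would concatenate instead of adding)
    stride = 4
    result = []
    for i in range(0, len(iterable1), stride):
        result.append(
            iterable1[i:i+stride] +
            iterable2[i:i+stride]
        )
    return result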
Why Was NumPy Faster?
Python lists hold references to the values they hold (instead of containing the values
themselves, they contain locations in memory where the data is stored). That gives us
heterogeneity, but that also means we have no guarantee that all (or, at least, enough)
of our values are copied to the cache in one operation.
NumPy arrays are homogeneous and store data in sequential chunks of memory*. That
means several values can be copied from RAM to the CPU cache at once.
NumPy also supports vectorized operations on the data. That means we
don’t need to explicitly loop over elements;
get results of our computations faster.
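The sequential layout can be inspected directly on an array. A minimal sketch, assuming a small int64 array created just for illustration:

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)

item_size = arr.itemsize                    # bytes per element: 8 for int64
stride = arr.strides[0]                     # bytes between neighbouring elements
is_contiguous = arr.flags["C_CONTIGUOUS"]   # one unbroken block of memory?
total_bytes = arr.nbytes                    # 5 elements * 8 bytes each
print(item_size, stride, is_contiguous, total_bytes)  # 8 8 True 40
```

Because the stride equals the item size, neighbouring elements are also neighbours in memory, which is what lets a single cache line carry several of them.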
Vectorization: Overview
NumPy Data Types
NumPy Data Types
Let’s create a couple of NumPy arrays:
arr_1 = np.array([1, 2, 3])
arr_2 = np.array([4, 5, 6])
arr_1 + arr_2
array([5, 7, 9])
We have mentioned that NumPy arrays contain elements of the same data type. What is
the data type in this particular example?
arr_1.dtype
dtype('int64')
NumPy Data Types: Numbers
We didn’t set the data type explicitly, so NumPy used the default int datatype. Let’s
specify a data type:
arr_1 = np.array(
    [255, 255, 255], dtype=np.uint8
)
arr_2 = np.array(
    [1, 1, 1], dtype=np.uint8
)
arr_1 + arr_2
NumPy Data Types: Numbers
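Everything has a price, and so do the benefits of data types occupying a fixed amount of
memory. We just got an integer overflow: 255 + 1 does not fit into uint8, so the result
silently wraps around to 0 instead of raising an error.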
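The overflow can be reproduced and avoided as in the following sketch; the variable names wrapped, limits, and widened are chosen here for illustration only.

```python
import numpy as np

arr_1 = np.array([255, 255, 255], dtype=np.uint8)
arr_2 = np.array([1, 1, 1], dtype=np.uint8)

wrapped = arr_1 + arr_2                    # stays uint8, wraps modulo 256
limits = np.iinfo(np.uint8)                # smallest and largest storable value
widened = arr_1.astype(np.uint16) + arr_2  # a wider type has room for 256

print(wrapped)                 # [0 0 0]
print(limits.min, limits.max)  # 0 255
print(widened)                 # [256 256 256]
```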
NumPy Data Types: Values
NumPy Data Types: Numbers
We also have Boolean and floating‑point values:
arr_1 = np.array([1.0, 2.0, 3.0])
arr_1.dtype
dtype('float64')
arr_2 = np.array([True, False, True])
arr_2.dtype
dtype('bool')
arr_1 + arr_2
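When arrays of different types are combined, NumPy promotes both to a common type first. A minimal sketch of what the addition above produces; np.result_type is used here only to show the promoted type.

```python
import numpy as np

arr_1 = np.array([1.0, 2.0, 3.0])
arr_2 = np.array([True, False, True])

mixed = arr_1 + arr_2                  # True counts as 1.0, False as 0.0
common = np.result_type(arr_1, arr_2)  # the type both operands are promoted to
print(mixed)    # [2. 2. 4.]
print(common)   # float64
```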
NumPy Data Types: Strings
You will mostly use NumPy with numeric data. That said, it is also possible to use
NumPy with strings:
np.array(['a', 'b', 'c'])
NumPy Data Types: Strings
NumPy uses the maximum length of strings present in an array as the upper limit on all
values. Be careful!
arr_s = np.array(['martin', 'oliver'])
arr_s[0] = 'andreas'
arr_s[1] = 'dmitrii'
arr_s
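The effect of the fixed maximum length can be seen in this sketch: both new names have seven characters, but the array only has room for six per element.

```python
import numpy as np

arr_s = np.array(['martin', 'oliver'])
string_dtype = arr_s.dtype   # fixed-width Unicode, at most 6 characters
arr_s[0] = 'andreas'         # 7 characters: the last one is dropped silently
arr_s[1] = 'dmitrii'
print(string_dtype)   # <U6 on little-endian machines
print(arr_s)          # ['andrea' 'dmitri']
```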
NumPy Data Types: Object
We can store strings of arbitrary length using object. In fact, we can store arbitrary
objects in NumPy arrays using object:
arr_s = np.array(
    ['martin', 'oliver'],
    dtype=object
)
arr_s[0] = 'andreas'
arr_s[1] = 'dmitrii'
arr_s
NumPy Data Types: Casting
We can change the data type via the astype method:
arr_1 = np.array([1.1, 2.2, 3.3])
arr_2 = arr_1.astype(np.uint8)
arr_2
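Casting floats to integers discards the fractional part rather than rounding, and astype returns a new array instead of modifying the original. A short sketch; the truncated array is an extra example added here to make the difference from rounding visible.

```python
import numpy as np

arr_1 = np.array([1.1, 2.2, 3.3])
arr_2 = arr_1.astype(np.uint8)                          # new array, arr_1 is untouched
truncated = np.array([1.9, 2.9, 3.9]).astype(np.uint8)  # cut off, not rounded up

print(arr_2)       # [1 2 3]
print(arr_1)       # [1.1 2.2 3.3]
print(truncated)   # [1 2 3]
```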
NumPy Data Types: Overview
Creating NumPy Arrays
Creating NumPy Arrays
We have already used lists to create NumPy arrays. We can also use tuples:
np.array((1, 2, 3))
array([1, 2, 3])
Creating NumPy Arrays
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Creating NumPy Arrays
array([1, 2, 3])
Creating NumPy Arrays
We can also use the np.fromiter function to create NumPy arrays from iterables:
np.fromiter({3, 2, 1, 2, 3}, dtype=np.int16)
np.fromiter(tree_fiddy(), dtype=np.int8)
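A self-contained sketch of np.fromiter with a generator. The countdown generator is hypothetical and used only for illustration; it is not the tree_fiddy generator from the slide.

```python
import numpy as np

def countdown(start):
    # hypothetical generator, used only for illustration
    while start > 0:
        yield start
        start -= 1

from_generator = np.fromiter(countdown(3), dtype=np.int8)
from_set = np.fromiter({3, 2, 1, 2, 3}, dtype=np.int16)  # the set drops duplicates

print(from_generator)   # [3 2 1]
print(from_set.size)    # 3
```

Unlike np.array, np.fromiter requires the dtype up front because it consumes the iterable element by element.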
Creating NumPy Arrays
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Creating NumPy Arrays
np.zeros((1, 2))
array([[0., 0.]])
np.ones((2, 3))
Creating NumPy Arrays
np.full((2, 3), 4)
array([[4, 4, 4],
       [4, 4, 4]])
np.random.random((2, 2))
array([[0.12745227, 0.72619059],
       [0.31397517, 0.09816663]])
np.random.randint(0, 99, (2, 2))
array([[54, 54],
       [21, 54]])
NumPy Array Creation: Overview
NumPy Arrays: Element Access
NumPy Arrays: Element Access
We can access elements of NumPy arrays by their indices:
arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
arr_2d[1]
array([4, 5, 6])
arr_2d[1, 1]
NumPy Arrays: Element Access
array([[2, 3],
       [5, 6]])
arr_2d[1:, 2]
array([6, 9])
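Slicing works per dimension, with the row slice first and the column slice second. In the sketch below, arr_2d[:2, 1:] is one possible slice that yields the 2x2 block shown above; the exact expression used on the slide is an assumption.

```python
import numpy as np

arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

top_right = arr_2d[:2, 1:]    # rows 0 and 1, columns 1 and 2
last_column = arr_2d[1:, 2]   # rows 1 and 2, column 2 only
center = arr_2d[1, 1]         # a single element

print(top_right.tolist())     # [[2, 3], [5, 6]]
print(last_column.tolist())   # [6, 9]
print(center)                 # 5
```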
NumPy Arrays: Element Access
We can also access array elements using Boolean indexing:
numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
booleans = (numbers % 2 == 0)
booleans
NumPy Arrays: Element Access
numbers[booleans]
array([2, 4, 6, 8])
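The mask is itself an array of the same shape, and several conditions can be combined element-wise. A minimal sketch; the combined condition is an extra example that is not part of the slides.

```python
import numpy as np

numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

booleans = (numbers % 2 == 0)   # same shape as numbers, dtype bool
evens = numbers[booleans]       # always a 1-D array of the selected elements
# element-wise AND needs & and parentheses, not the `and` keyword
combined = numbers[(numbers % 2 == 0) & (numbers > 4)]

print(booleans.shape, booleans.dtype)  # (3, 3) bool
print(evens)      # [2 4 6 8]
print(combined)   # [6 8]
```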
NumPy Arrays: Fancy Indexing
We can pass an array of indices to access multiple elements at once:
numbers
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
numbers[[0, 2, 1], [1, 2, 0]]
array([2, 9, 4])
[numbers[0, 1], numbers[2, 2], numbers[1, 0]]
[2, 9, 4]
NumPy Arrays: Fancy Indexing
We can combine simple indices with fancy indexing:
numbers
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
numbers[2, [2, 0]]
array([9, 7])
[numbers[2, 2], numbers[2, 0]]
[9, 7]
NumPy Arrays: Fancy Indexing
We can combine slicing with fancy indexing, too:
numbers
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
numbers[1:, [1, 0]]
array([[5, 4],
       [8, 7]])
Sorting
array([1, 2, 3, 4, 5])
Sorting
We can sort an array along a specific axis (dimension):
np.sort(
    np.array(
        [
            [2, 5, 6],
            [1, 4, 3]
        ]
    ),
    axis=0
)
array([[1, 4, 3],
       [2, 5, 6]])
Sorting
np.sort(
    np.array(
        [
            [2, 5, 6],
            [1, 4, 3]
        ]
    ),
    axis=1
)
array([[2, 5, 6],
       [1, 3, 4]])
Finding Elements
We can get indices of elements satisfying a specific condition via the np.where function:
arr = np.random.randint(10, 20, (5, ))
arr
(array([1, 2, 3]),)
arr[indices]
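A reproducible sketch of np.where. The array values and the condition arr > 14 are assumptions chosen so that the result matches the index tuple shown above; the slide itself uses random values.

```python
import numpy as np

arr = np.array([12, 17, 19, 15, 11])   # fixed values instead of random ones
indices = np.where(arr > 14)           # a tuple with one index array per dimension
selected = arr[indices]                # the tuple can be used directly as an index

print(indices)    # (array([1, 2, 3]),)
print(selected)   # [17 19 15]
```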
NumPy Array: Shape
We can use the len function, but it will simply return the number of elements in the
first dimension:
len(np.arange(1, 10))
9
len(np.array([[1, 2], [3, 4]]))
2
If we want to get the number of elements along every dimension, we need to use the shape
attribute:
np.array([[1, 2], [3, 4]]).shape
(2, 2)
Accessing NumPy Arrays: Overview
access by index
slicing
boolean indexing
fancy indexing
combining the above
sorting with sort, finding elements with where
checking number of elements with shape
Math on NumPy Arrays
Math on NumPy Arrays
32
arr_1 @ arr_2
32
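The result 32 is the dot product of two three-element arrays. A minimal sketch contrasting element-wise operations with the dot product, assuming arr_1 and arr_2 hold the values [1, 2, 3] and [4, 5, 6] used earlier in the lecture:

```python
import numpy as np

arr_1 = np.array([1, 2, 3])
arr_2 = np.array([4, 5, 6])

total = arr_1 + arr_2        # element-wise addition
product = arr_1 * arr_2      # element-wise multiplication, NOT a dot product
dot = np.dot(arr_1, arr_2)   # 1*4 + 2*5 + 3*6
matmul = arr_1 @ arr_2       # @ gives the same result for 1-D arrays

print(total)         # [5 7 9]
print(product)       # [ 4 10 18]
print(dot, matmul)   # 32 32
```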
Math on NumPy Arrays
NumPy provides sum and prod both as functions and as array methods:
arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
arr_2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
np.sum(arr_2d)
45
arr_2d.prod()
362880
Math on NumPy Arrays
We can also compute a sum or a product over a given axis:
arr_2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
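A sketch of sums and products along an axis for this array: axis=0 collapses the rows (one result per column), axis=1 collapses the columns (one result per row). The variable names are chosen here for illustration.

```python
import numpy as np

arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

col_sums = np.sum(arr_2d, axis=0)        # 1+4+7, 2+5+8, 3+6+9
row_sums = arr_2d.sum(axis=1)            # 1+2+3, 4+5+6, 7+8+9
row_products = np.prod(arr_2d, axis=1)   # 1*2*3, 4*5*6, 7*8*9

print(col_sums)       # [12 15 18]
print(row_sums)       # [ 6 15 24]
print(row_products)   # [  6 120 504]
```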
Math on NumPy Arrays
We can find the mean and median values over the entire array:
arr_2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
np.mean(arr_2d)
5.0
np.median(arr_2d)
5.0
Math on NumPy Arrays
We can compute statistics over a specific axis, too:
arr_2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
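The same axis argument applies to statistics. A minimal sketch for this array, with variable names chosen for illustration:

```python
import numpy as np

arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

col_means = np.mean(arr_2d, axis=0)       # mean of each column
row_medians = np.median(arr_2d, axis=1)   # median of each row

print(col_means)     # [4. 5. 6.]
print(row_medians)   # [2. 5. 8.]
```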
Math on NumPy Arrays
We can find min and max elements:
arr_2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
np.min(arr_2d)
1
np.max(arr_2d)
9
np.min(arr_2d, axis=0)
array([1, 2, 3])
np.max(arr_2d, axis=0)
array([7, 8, 9])
np.min(arr_2d, axis=1)
array([1, 4, 7])
np.max(arr_2d, axis=1)
array([3, 6, 9])
Math on NumPy Arrays
We can also find indices of min and max elements:
arr_2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
np.argmax(arr_2d, axis=0)
array([2, 2, 2])
np.argmin(arr_2d, axis=1)
array([0, 0, 0])
Math on NumPy Arrays: Overview
pairwise operations
matrix multiplication
sum and product, min and max, mean and median, etc.
over the entire array or a specific axis only
check the documentation!
Broadcasting
Broadcasting
Broadcasting allows us to use a smaller array several times together with a larger array
according to the following rules:
if the arrays do not have the same number of dimensions, prepend 1 to the
shape of the array with fewer dimensions until they do;
arrays are compatible in a dimension if they have the same size in that
dimension OR if one of them has size 1;
an array acts as if it were copied along those dimensions where its size
is 1.
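The rules above can be traced on a small example. In this sketch a shape (3,) array is treated as shape (1, 3) and then reused for both rows of a (2, 3) array; the names row and scaled are chosen for illustration.

```python
import numpy as np

long_arr = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 100, 1000])               # shape (3,) is treated as (1, 3)

scaled = long_arr * row   # row is reused for each of the 2 rows
result_shape = np.broadcast_shapes(long_arr.shape, row.shape)

print(scaled.tolist())   # [[10, 200, 3000], [40, 500, 6000]]
print(result_shape)      # (2, 3)
```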
Broadcasting
In this example dimensions are not compatible and broadcasting doesn’t work:
long_arr = np.array([[1, 2, 3], [4, 5, 6]])
short_arr = np.array([10, 100])
long_arr * short_arr
ValueError: operands could not be broadcast together with shapes (2,3) (2,)
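The shapes fail because (2,) is padded to (1, 2), and 2 does not match 3 in the last dimension. One way to make the operation work, sketched here as a suggestion rather than as part of the slides, is to turn short_arr into a column of shape (2, 1) so that each value scales one row:

```python
import numpy as np

long_arr = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
short_arr = np.array([10, 100])               # shape (2,)

column = short_arr[:, np.newaxis]   # shape (2, 1): one value per row
scaled_rows = long_arr * column     # (2, 3) and (2, 1) are compatible

print(column.shape)           # (2, 1)
print(scaled_rows.tolist())   # [[10, 20, 30], [400, 500, 600]]
```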
Broadcasting: Overview
Views, Copies, and Shapes
Views
When we slice an array, we do not create a new copy of the data:
arr = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
sub_arr = arr[1:, :2]
sub_arr
array([[4, 5],
       [7, 8]])
Views
We create a view:
sub_arr[0, 1] = 100
arr
array([[  1,   2,   3],
       [  4, 100,   6],
       [  7,   8,   9]])
Views
A view is a new array object that refers to the same data:
arr_view = arr.view()
arr_view
array([[  1,   2,   3],
       [  4, 100,   6],
       [  7,   8,   9]])
arr_view is arr
False
arr[2, 2] = 999
arr_view
array([[  1,   2,   3],
       [  4, 100,   6],
       [  7,   8, 999]])
Copies
The copy method returns a new array containing a copy of data:
arr = np.arange(1, 7).reshape(2, 3)
arr_copy = arr.copy()
arr_copy[1, 0] = 9; arr_copy
array([[1, 2, 3],
       [9, 5, 6]])
arr
array([[1, 2, 3],
       [4, 5, 6]])
Shapes
Can you guess what the reshape method does?
arr_1d = np.arange(8)
arr_1d
array([0, 1, 2, 3, 4, 5, 6, 7])
arr_1d.reshape(2, 4)
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
arr_1d.reshape(4, 2)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
Shapes
arr_3d = arr_1d.reshape(2, 2, 2)
arr_3d
array([[[0, 1],
        [2, 3]],
       [[4, 5],
        [6, 7]]])
arr_3d[0, 1, 1]
3
arr_3d[1, 0, 0]
4
Shapes
https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
Shapes
arr_1d = np.arange(8)
arr_2d = arr_1d.reshape(2, 4)
arr_2d[0, 3] = 100
arr_1d
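For a contiguous array like this one, reshape returns a view, so the assignment also changes arr_1d. The sketch below checks this with np.shares_memory and shows an explicit copy for cases where the original must stay untouched; the names shares and independent are chosen for illustration.

```python
import numpy as np

arr_1d = np.arange(8)
arr_2d = arr_1d.reshape(2, 4)   # a view here: no data is copied
arr_2d[0, 3] = 100              # writes through to arr_1d

shares = np.shares_memory(arr_1d, arr_2d)
independent = arr_1d.reshape(2, 4).copy()   # explicit copy for isolation
independent[0, 0] = -1                      # does not affect arr_1d

print(arr_1d)   # [  0   1   2 100   4   5   6   7]
print(shares)   # True
```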
Transposing
To transpose an array, use the T attribute or the numpy.transpose function:
arr = np.arange(6).reshape(2,3)
arr
array([[0, 1, 2],
       [3, 4, 5]])
np.transpose(arr)
array([[0, 3],
       [1, 4],
       [2, 5]])
arr.T
array([[0, 3],
       [1, 4],
       [2, 5]])
Transposing
https://numpy.org/doc/stable/reference/generated/numpy.transpose.html
Transposing
transposed = arr.T
transposed[0,1] = 100
arr
array([[  0,   1,   2],
       [100,   4,   5]])
Views, Copies, and Shapes: Overview
Thank you!
QUESTIONS?