Introduction to NumPy
Pruthvish Rajput, Venus Patel
February 23, 2023
1 Computation on NumPy Arrays: Universal Functions
1.1 The Slowness of Loops
• Python’s default implementation (known as CPython) does some operations very slowly.
• This is in part due to the dynamic, interpreted nature of the language:
– types are flexible, so sequences of operations cannot be compiled down to efficient
machine code as in languages like C and Fortran.
• Several projects attempt to address this weakness. Well-known examples include:
– the PyPy project, a just-in-time compiled implementation of Python;
– the Cython project, which converts Python code to compilable C code;
– the Numba project, which converts snippets of Python code to fast LLVM bytecode.
[1]: import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)
[1]: array([0.16666667, 1. , 0.25 , 0.25 , 0.125 ])
[2]: big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)
1.92 s ± 66.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.2 Bottleneck:
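• The bottleneck is not the arithmetic itself but the type-checking and function dispatches
that CPython must do at each cycle of the loop.
• Each time the reciprocal is computed, Python first examines the object’s type and does a
dynamic lookup of the correct function to use for that type.
• In compiled code, the type would be known before the code executes, so the result could
be computed much more efficiently.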
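A rough illustration of why this per-element work arises, assuming nothing beyond standard Python and NumPy: a list may hold a different type in every slot, whereas an array records a single dtype for all of its elements. The names `mixed` and `typed` are hypothetical and chosen only for this sketch.

```python
import numpy as np

# A Python list may hold a different type in every slot, so the
# interpreter has to look up the right routine for each element.
mixed = [6, 1.0, True]
element_types = [type(v).__name__ for v in mixed]
print(element_types)

# A NumPy array stores one dtype for the whole buffer, so a compiled
# loop can choose the routine once and reuse it for every element.
typed = np.array([6, 1, 4, 4, 8])
print(typed.dtype)
```

The exact integer dtype printed depends on the platform, so it is not shown here.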
1.3 Introducing UFuncs
• For many types of operations, NumPy provides a convenient interface to statically typed,
compiled routines. Such an operation is known as a vectorized operation.
• This can be accomplished by simply performing an operation on the array, which will then
be applied to each element.
• This vectorized approach is designed to push the loop into the compiled layer that underlies
NumPy, leading to much faster execution.
[3]: print(compute_reciprocals(values))
print(1.0 / values)
[0.16666667 1. 0.25 0.25 0.125 ]
[0.16666667 1. 0.25 0.25 0.125 ]
Looking at the execution time for our big array, we see that it completes orders of magnitude faster
than the Python loop:
[4]: %timeit (1.0 / big_array)
2.75 ms ± 456 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
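The `%timeit` magic is only available inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The array size of 100,000 and the repeat count of 3 are arbitrary choices for this sketch, and absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    # Same explicit Python loop as in the notebook cell above.
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

np.random.seed(0)
test_array = np.random.randint(1, 100, size=100_000)

# Average wall-clock time per call over a few repetitions.
repeats = 3
loop_time = timeit.timeit(lambda: compute_reciprocals(test_array), number=repeats) / repeats
vec_time = timeit.timeit(lambda: 1.0 / test_array, number=repeats) / repeats

print(f"loop:       {loop_time:.6f} s per call")
print(f"vectorized: {vec_time:.6f} s per call")
```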
• Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly
execute repeated operations on values in NumPy arrays.
• Ufuncs are extremely flexible: the previous example operated between a scalar and an array,
but we can also operate between two arrays:
[5]: np.arange(5) / np.arange(1, 6)
[5]: array([0. , 0.5 , 0.66666667, 0.75 , 0.8 ])
Ufunc operations are not limited to one-dimensional arrays; they can also act on
multi-dimensional arrays:
[6]: x = np.arange(9).reshape((3, 3))
2 ** x
[6]: array([[ 1, 2, 4],
[ 8, 16, 32],
[ 64, 128, 256]])
• Computations using vectorization through ufuncs are nearly always more efficient than their
counterpart implemented using Python loops, especially as the arrays grow in size.
• Any time you see such a loop in a Python script, you should consider whether it can be
replaced with a vectorized expression.
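As a minimal sketch of such a replacement, the loop below computes a sum of squared differences and is followed by an equivalent vectorized expression. The task and the function names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def squared_distance_loop(a, b):
    # Element-by-element Python loop.
    total = 0.0
    for i in range(len(a)):
        diff = a[i] - b[i]
        total += diff * diff
    return total

def squared_distance_vectorized(a, b):
    # The subtraction and multiplication are ufuncs applied to whole arrays.
    diff = a - b
    return np.sum(diff * diff)

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 3.0])
print(squared_distance_loop(a, b))        # 5.0
print(squared_distance_vectorized(a, b))  # 5.0
```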
1.4 Exploring NumPy’s UFuncs
Ufuncs exist in two flavors:
• unary ufuncs, which operate on a single input;
• binary ufuncs, which operate on two inputs.
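Every ufunc object records how many inputs and outputs it takes in its `nin` and `nout` attributes, so the unary/binary distinction can be checked directly. The selection of ufuncs below is arbitrary.

```python
import numpy as np

# nin is the number of input arrays, nout the number of outputs.
for ufunc in (np.negative, np.absolute, np.add, np.power):
    print(ufunc.__name__, "inputs:", ufunc.nin, "outputs:", ufunc.nout)
```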
1.4.1 Array arithmetic
[7]: x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2) # floor division
x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0. 0.5 1. 1.5]
x // 2 = [0 0 1 1]
There is also a unary ufunc for negation, a ** operator for exponentiation, and a % operator
for modulus:
[8]: print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)
-x = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2 = [0 1 0 1]
In addition, these can be strung together however you wish, and the standard order of operations
is respected:
[9]: -(0.5*x + 1) ** 2
[9]: array([-1. , -2.25, -4. , -6.25])
Each of these arithmetic operations is simply a convenient wrapper around a specific function
built into NumPy; for example, the + operator is a wrapper for the add function:
[10]: np.add(x, 2)
[10]: array([2, 3, 4, 5])
The following table lists the arithmetic operators implemented in NumPy:
Operator   Equivalent ufunc   Description
+          np.add             Addition (e.g., 1 + 1 = 2)
-          np.subtract        Subtraction (e.g., 3 - 2 = 1)
-          np.negative        Unary negation (e.g., -2)
*          np.multiply        Multiplication (e.g., 2 * 3 = 6)
/          np.divide          Division (e.g., 3 / 2 = 1.5)
//         np.floor_divide    Floor division (e.g., 3 // 2 = 1)
**         np.power           Exponentiation (e.g., 2 ** 3 = 8)
%          np.mod             Modulus/remainder (e.g., 9 % 4 = 1)
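The correspondence in the table can be checked with a short sketch that pairs each Python operator (via the standard-library `operator` module) with its ufunc. The test array and the scalar 2 are arbitrary choices.

```python
import operator
import numpy as np

x = np.arange(1, 5)

# (Python operator, equivalent NumPy ufunc) for the binary operations.
binary_pairs = [
    (operator.add, np.add),
    (operator.sub, np.subtract),
    (operator.mul, np.multiply),
    (operator.truediv, np.divide),
    (operator.floordiv, np.floor_divide),
    (operator.pow, np.power),
    (operator.mod, np.mod),
]

binary_ok = all(np.array_equal(op(x, 2), uf(x, 2)) for op, uf in binary_pairs)
unary_ok = np.array_equal(-x, np.negative(x))
print(binary_ok, unary_ok)  # True True
```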
1.4.2 Absolute value
Just as NumPy understands Python’s built-in arithmetic operators, it also understands Python’s
built-in absolute value function:
[11]: x = np.array([-2, -1, 0, 1, 2])
abs(x)
[11]: array([2, 1, 0, 1, 2])
The corresponding NumPy ufunc is np.absolute, which is also available under the alias np.abs:
[12]: np.absolute(x)
[12]: array([2, 1, 0, 1, 2])
[13]: np.abs(x)
[13]: array([2, 1, 0, 1, 2])
This ufunc can also handle complex data, in which case the absolute value returns the magnitude:
[14]: x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)
[14]: array([5., 5., 2., 1.])
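The magnitude of a complex number a + bj is sqrt(a² + b²). A small sketch, reusing the same values, compares that formula computed by hand with the result of np.abs:

```python
import numpy as np

x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])

# Magnitude from the real and imaginary parts.
manual_magnitude = np.sqrt(x.real ** 2 + x.imag ** 2)

print(manual_magnitude)
print(np.allclose(manual_magnitude, np.abs(x)))  # True
```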
1.4.3 Trigonometric functions
NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist
are the trigonometric functions. We’ll start by defining an array of angles:
[15]: theta = np.linspace(0, np.pi, 3)
Now we can compute some trigonometric functions on these values:
[16]: print("theta = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))
theta = [0. 1.57079633 3.14159265]
sin(theta) = [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) = [ 1.000000e+00 6.123234e-17 -1.000000e+00]
tan(theta) = [ 0.00000000e+00 1.63312394e+16 -1.22464680e-16]
The values are computed to within machine precision, which is why values that should be zero do
not always hit exactly zero. Inverse trigonometric functions are also available:
[17]: x = [-1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))
x = [-1, 0, 1]
arcsin(x) = [-1.57079633 0. 1.57079633]
arccos(x) = [3.14159265 1.57079633 0. ]
arctan(x) = [-0.78539816 0. 0.78539816]
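Because sin(theta) above lands near zero rather than exactly on it, comparing such results with == can be misleading. One common approach, sketched here with NumPy's default tolerances, is to compare with np.isclose instead:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
s = np.sin(theta)

# Exact comparison misses the value at pi, which is about 1.2e-16.
exactly_zero = (s == 0)
# Tolerance-based comparison treats it as zero.
close_to_zero = np.isclose(s, 0.0)

print(exactly_zero)
print(close_to_zero)
```

Whether the default tolerance is appropriate depends on the scale of the data; np.isclose accepts rtol and atol arguments for adjusting it.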
1.4.4 Exponents and logarithms
Exponentials are another common type of operation available as NumPy ufuncs:
[18]: x = [1, 2, 3]
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))
x = [1, 2, 3]
e^x = [ 2.71828183 7.3890561 20.08553692]
2^x = [2. 4. 8.]
3^x = [ 3 9 27]
The inverse of the exponentials, the logarithms, are also available. The basic np.log gives the
natural logarithm; if you prefer to compute the base-2 logarithm or the base-10 logarithm, these
are available as well:
[19]: x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))
x = [1, 2, 4, 10]
ln(x) = [0. 0.69314718 1.38629436 2.30258509]
log2(x) = [0. 1. 2. 3.32192809]
log10(x) = [0. 0.30103 0.60205999 1. ]
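For a base other than e, 2, or 10, the change-of-base identity log_b(x) = ln(x) / ln(b) can be applied with np.log. The helper name `log_base` is hypothetical and not a NumPy function.

```python
import numpy as np

def log_base(x, base):
    # Change of base: log_b(x) = ln(x) / ln(b).
    return np.log(x) / np.log(base)

x = np.array([1, 3, 9, 27])
base3 = log_base(x, 3)
print(base3)
```

The printed values are approximately 0, 1, 2, and 3, up to floating-point rounding.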
There are also some specialized versions that are useful for maintaining precision with very small
input:
[20]: x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))
exp(x) - 1 = [0. 0.0010005 0.01005017 0.10517092]
log(1 + x) = [0. 0.0009995 0.00995033 0.09531018]
When x is very small, these functions give more precise values than the raw np.log or np.exp
would.
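The difference becomes visible for a much smaller input than the ones shown above. The sketch below uses x = 1e-10 (an arbitrary choice) and compares both approaches against the two-term Taylor approximations exp(x) - 1 ≈ x + x²/2 and log(1 + x) ≈ x - x²/2, which are accurate to far better than double precision at this scale.

```python
import numpy as np

x = 1e-10

# Reference values from the leading Taylor terms.
expm1_reference = x + x ** 2 / 2
log1p_reference = x - x ** 2 / 2

# Naive forms lose digits: 1 + x is rounded before the subtraction or log.
naive_expm1_err = abs((np.exp(x) - 1) - expm1_reference) / expm1_reference
naive_log1p_err = abs(np.log(1 + x) - log1p_reference) / log1p_reference

# Specialized ufuncs avoid forming 1 + x explicitly.
stable_expm1_err = abs(np.expm1(x) - expm1_reference) / expm1_reference
stable_log1p_err = abs(np.log1p(x) - log1p_reference) / log1p_reference

print("naive relative errors: ", naive_expm1_err, naive_log1p_err)
print("stable relative errors:", stable_expm1_err, stable_log1p_err)
```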
1.4.5 Specialized ufuncs
More specialized mathematical functions are available as ufuncs in the scipy.special submodule:
[21]: from scipy import special
[22]: # Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))
gamma(x) = [1.0000e+00 2.4000e+01 3.6288e+05]
ln|gamma(x)| = [ 0. 3.17805383 12.80182748]
beta(x, 2) = [0.5 0.03333333 0.00909091]
[23]: # Error function (integral of Gaussian),
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x) =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))
erf(x) = [0. 0.32862676 0.67780119 0.84270079]
erfc(x) = [1. 0.67137324 0.32219881 0.15729921]
erfinv(x) = [0. 0.27246271 0.73286908 inf]
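A few identities tie these outputs together: gamma(n) equals (n - 1)! for positive integers, erf(x) + erfc(x) equals 1, and erfinv undoes erf. The sketch below checks them numerically, assuming SciPy is installed; the sample points are arbitrary, and 1.0 is left out of the inverse check because erfinv(1) is infinite.

```python
import math
import numpy as np
from scipy import special

# gamma(n) = (n - 1)! for positive integers n.
n = np.array([1, 5, 10])
factorials = np.array([math.factorial(int(k) - 1) for k in n], dtype=float)
gamma_matches = np.allclose(special.gamma(n), factorials)

# erf and its complement sum to one.
x = np.array([0, 0.3, 0.7, 1.0])
complement_ok = np.allclose(special.erf(x) + special.erfc(x), 1.0)

# erfinv is the inverse of erf on the open interval (-1, 1).
y = np.array([0, 0.3, 0.7])
inverse_ok = np.allclose(special.erf(special.erfinv(y)), y)

print(gamma_matches, complement_ok, inverse_ok)
```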