
Unit 2

Uploaded by

Mehak Mehta
Copyright
© © All Rights Reserved

For most data analysis applications, the main areas of functionality I'll focus on are:

• Fast vectorized array operations for data munging and cleaning, subsetting and
  filtering, transformation, and any other kinds of computations
• Common array algorithms like sorting, unique, and set operations
• Efficient descriptive statistics and aggregating/summarizing data
• Data alignment and relational data manipulations for merging and joining together
  heterogeneous data sets
• Expressing conditional logic as array expressions instead of loops with if-elif-else
  branches
• Group-wise data manipulations (aggregation, transformation, function application).
  Much more on this in Chapter 5
While NumPy provides the computational foundation for these operations, you will
likely want to use pandas as your basis for most kinds of data analysis (especially for
structured or tabular data) as it provides a rich, high-level interface making most
common data tasks very concise and simple. pandas also provides some more
domain-specific functionality like time series manipulation, which is not present in NumPy.

In this chapter and throughout the book, I use the standard NumPy
convention of always using import numpy as np. You are, of course,
welcome to put from numpy import * in your code to avoid having to
write np., but I would caution you against making a habit of this.

The NumPy ndarray: A Multidimensional Array Object


One of the key features of NumPy is its N-dimensional array object, or ndarray, which
is a fast, flexible container for large data sets in Python. Arrays enable you to perform
mathematical operations on whole blocks of data using similar syntax to the equivalent
operations between scalar elements:
In [8]: data
Out[8]:
array([[ 0.9526, -0.246 , -0.8856],
       [ 0.5639,  0.2379,  0.9104]])

In [9]: data * 10
Out[9]:
array([[ 9.5256, -2.4601, -8.8565],
       [ 5.6385,  2.3794,  9.104 ]])

In [10]: data + data
Out[10]:
array([[ 1.9051, -0.492 , -1.7713],
       [ 1.1277,  0.4759,  1.8208]])
An ndarray is a generic multidimensional container for homogeneous data; that is, all
of the elements must be the same type. Every array has a shape, a tuple indicating the
size of each dimension, and a dtype, an object describing the data type of the array:
In [11]: data.shape
Out[11]: (2, 3)

80 | Chapter 4: NumPy Basics: Arrays and Vectorized Computation

In [12]: data.dtype
Out[12]: dtype('float64')
This chapter will introduce you to the basics of using NumPy arrays, and should be
sufficient for following along with the rest of the book. While it's not necessary to have
a deep understanding of NumPy for many data analytical applications, becoming
proficient in array-oriented programming and thinking is a key step along the way to
becoming a scientific Python guru.

Whenever you see "array", "NumPy array", or "ndarray" in the text,
with few exceptions they all refer to the same thing: the ndarray object.

Creating ndarrays
The easiest way to create an array is to use the array function. This accepts any
sequence-like object (including other arrays) and produces a new NumPy array
containing the passed data. For example, a list is a good candidate for conversion:
In [13]: data1 = [6, 7.5, 8, 0, 1]

In [14]: arr1 = np.array(data1)

In [15]: arr1
Out[15]: array([ 6. ,  7.5,  8. ,  0. ,  1. ])

Nested sequences, like a list of equal-length lists, will be converted into a
multidimensional array:
In [16]: data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

In [17]: arr2 = np.array(data2)

In [18]: arr2
Out[18]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [19]: arr2.ndim
Out[19]: 2

In [20]: arr2.shape
Out[20]: (2, 4)
Unless explicitly specified (more on this later), np.array tries to infer a good data type
for the array that it creates. The data type is stored in a special dtype object; for example,
in the above two examples we have:
In [21]: arr1.dtype
Out[21]: dtype('float64')

In [22]: arr2.dtype
Out[22]: dtype('int64')
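
As a rough illustration of the inference described above, the sketch below builds three arrays without passing a dtype and inspects what NumPy chose. The variable names are made up for this example, and the exact integer width (int64 versus int32) depends on the platform, so the comments only claim the general kind.

```python
import numpy as np

# No dtype is passed, so np.array infers one from the values.
ints = np.array([1, 2, 3])                        # all Python ints -> an integer dtype
mixed = np.array([6, 7.5, 8, 0, 1])               # one float forces floating point
nested = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])   # equal-length lists -> 2D array

print(ints.dtype.kind)             # 'i' (integer; the width is platform dependent)
print(mixed.dtype)                 # float64
print(nested.ndim, nested.shape)   # 2 (2, 4)

# Passing dtype explicitly overrides the inference.
forced = np.array([1, 2, 3], dtype=np.float64)
print(forced.dtype)                # float64
```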
In addition to np.array, there are a number of other functions for creating new arrays.
As examples, zeros and ones create arrays of 0's or 1's, respectively, with a given length
or shape. empty creates an array without initializing its values to any particular value.
To create a higher dimensional array with these methods, pass a tuple for the shape:
In [23]: np.zeros(10)
Out[23]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [24]: np.zeros((3, 6))
Out[24]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In [25]: np.empty((2, 3, 2))
Out[25]:
array([[[  4.94065646e-324,   4.94065646e-324],
        [  3.87491056e-297,   2.46845796e-130],
        [  4.94065646e-324,   4.94065646e-324]],

       [[  1.90723115e+083,   5.73293533e-053],
        [ -2.33568637e+124,  -6.70608105e-012],
        [  4.42786966e+160,   1.27100354e+025]]])

It's not safe to assume that np.empty will return an array of all zeros. In
many cases, as previously shown, it will return uninitialized garbage
values.

arange is an array-valued version of the built-in Python range function:


In [26]: np.arange(15)
Out[26]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
See Table 4-1 for a short list of standard array creation functions. Since NumPy is
focused on numerical computing, the data type, if not specified, will in many cases be
float64 (floating point).
Table 4-1. Array creation functions

Function             Description
array                Convert input data (list, tuple, array, or other sequence type) to an ndarray either by
                     inferring a dtype or explicitly specifying a dtype. Copies the input data by default.
asarray              Convert input to ndarray, but do not copy if the input is already an ndarray
arange               Like the built-in range but returns an ndarray instead of a list.
ones, ones_like      Produce an array of all 1's with the given shape and dtype. ones_like takes another
                     array and produces a ones array of the same shape and dtype.
zeros, zeros_like    Like ones and ones_like but producing arrays of 0's instead
empty, empty_like    Create new arrays by allocating new memory, but do not populate with any values like
                     ones and zeros
eye, identity        Create a square N x N identity matrix (1's on the diagonal and 0's elsewhere)
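
The copy-versus-no-copy distinction between array and asarray in Table 4-1 is easy to miss, so here is a small sketch of it, along with ones_like and the two identity-matrix functions. The variable names are illustrative only.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)      # np.array copies its input by default
aliased = np.asarray(source)   # np.asarray reuses an existing ndarray as-is

source[0] = 99                 # modify the original afterwards

print(copied)                  # [1 2 3]  (the copy kept the old values)
print(aliased is source)       # True     (no new array was created)

# ones_like borrows both shape and dtype from another array.
template = np.zeros((2, 3), dtype=np.int32)
filled = np.ones_like(template)
print(filled.shape, filled.dtype)   # (2, 3) int32

# eye and identity both build a square identity matrix.
same_identity = np.array_equal(np.eye(3), np.identity(3))
print(same_identity)           # True
```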

Data Types for ndarrays


The data type or dtype is a special object containing the information the ndarray needs
to interpret a chunk of memory as a particular type of data:
In [27]: arr1 = np.array([1, 2, 3], dtype=np.float64)

In [28]: arr2 = np.array([1, 2, 3], dtype=np.int32)

In [29]: arr1.dtype
Out[29]: dtype('float64')

In [30]: arr2.dtype
Out[30]: dtype('int32')
Dtypes are part of what make NumPy so powerful and flexible. In most cases they map
directly onto an underlying machine representation, which makes it easy to read and
write binary streams of data to disk and also to connect to code written in a low-level
language like C or Fortran. The numerical dtypes are named the same way: a type name,
like float or int, followed by a number indicating the number of bits per element. A
standard double-precision floating point value (what's used under the hood in Python's
float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as
float64. See Table 4-2 for a full listing of NumPy's supported data types.

Don't worry about memorizing the NumPy dtypes, especially if you're
a new user. It's often only necessary to care about the general kind of
data you're dealing with, whether floating point, complex, integer,
boolean, string, or general Python object. When you need more control
over how data are stored in memory and on disk, especially large data
sets, it is good to know that you have control over the storage type.

Table 4-2. NumPy data types

Type                                Type Code      Description
int8, uint8                         i1, u1         Signed and unsigned 8-bit (1 byte) integer types
int16, uint16                       i2, u2         Signed and unsigned 16-bit integer types
int32, uint32                       i4, u4         Signed and unsigned 32-bit integer types
int64, uint64                       i8, u8         Signed and unsigned 64-bit integer types
float16                             f2             Half-precision floating point
float32                             f4 or f        Standard single-precision floating point. Compatible with C float
float64                             f8 or d        Standard double-precision floating point. Compatible with C double
                                                   and Python float object
float128                            f16 or g       Extended-precision floating point
complex64, complex128, complex256   c8, c16, c32   Complex numbers represented by two 32, 64, or 128 floats, respectively
bool                                ?              Boolean type storing True and False values
object                              O              Python object type
string_                             S              Fixed-length string type (1 byte per character). For example, to create
                                                   a string dtype with length 10, use 'S10'.
unicode_                            U              Fixed-length unicode type (number of bytes platform specific). Same
                                                   specification semantics as string_ (e.g. 'U10').
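
To connect the type names, type codes, and bit widths in Table 4-2, the following sketch looks up a few type codes with np.dtype and prints the name and size NumPy reports for each. It is only a quick check, not an exhaustive listing; the list of codes is chosen for illustration.

```python
import numpy as np

# A type-code string and the corresponding NumPy type name the same dtype.
print(np.dtype('i4') == np.int32)     # True
print(np.dtype('f8') == np.float64)   # True

# itemsize is in bytes, so multiplying by 8 gives the number of bits
# that appears in the type name.
for code in ['i1', 'u2', 'i4', 'i8', 'f4', 'f8', 'c16']:
    dt = np.dtype(code)
    print(code, dt.name, dt.itemsize * 8, 'bits')

# A fixed-length byte string dtype of length 10 uses 1 byte per character.
print(np.dtype('S10').itemsize)       # 10
```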

You can explicitly convert or cast an array from one dtype to another using ndarray's
astype method:
In [31]: arr = np.array([1, 2, 3, 4, 5])

In [32]: arr.dtype
Out[32]: dtype('int64')

In [33]: float_arr = arr.astype(np.float64)

In [34]: float_arr.dtype
Out[34]: dtype('float64')
In this example, integers were cast to floating point. If I cast some floating point
numbers to be of integer dtype, the decimal part will be truncated:
In [35]: arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

In [36]: arr
Out[36]: array([  3.7,  -1.2,  -2.6,   0.5,  12.9,  10.1])

In [37]: arr.astype(np.int32)
Out[37]: array([ 3, -1, -2,  0, 12, 10], dtype=int32)
Should you have an array of strings representing numbers, you can use astype to convert
them to numeric form:
In [38]: numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.string_)

In [39]: numeric_strings.astype(float)
Out[39]: array([  1.25,  -9.6 ,  42.  ])
If casting were to fail for some reason (like a string that cannot be converted to
float64), a ValueError will be raised. Note that I was a bit lazy and wrote float instead of
np.float64; NumPy is smart enough to alias the Python types to the equivalent dtypes.
You can also use another array's dtype attribute:
In [40]: int_array = np.arange(10)

In [41]: calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)

In [42]: int_array.astype(calibers.dtype)
Out[42]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
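
The failed-cast case mentioned above can be sketched as follows. The arrays good and bad are invented for the example. Current NumPy releases report an unparseable string as a ValueError; the sketch catches TypeError as well, on the assumption that other versions may differ.

```python
import numpy as np

good = np.array(['1.25', '-9.6', '42'])
bad = np.array(['1.25', 'not a number'])

# Every string parses as a number, so the cast succeeds.
print(good.astype(np.float64))

# One string cannot be parsed, so the whole cast fails with an exception.
failure = None
try:
    bad.astype(np.float64)
except (ValueError, TypeError) as exc:
    failure = type(exc).__name__
    print('cast failed with', failure)
```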
There are shorthand type code strings you can also use to refer to a dtype:
In [43]: empty_uint32 = np.empty(8, dtype='u4')

In [44]: empty_uint32
Out[44]:
array([       0,        0, 65904672,        0, 64856792,        0,
       39438163,        0], dtype=uint32)

Calling astype always creates a new array (a copy of the data), even if
the new dtype is the same as the old dtype.

It's worth keeping in mind that floating point numbers, such as those
in float64 and float32 arrays, are only capable of approximating
fractional quantities. In complex computations, you may accrue some
floating point error, making comparisons only valid up to a certain
number of decimal places.
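
Both notes above can be checked with a few lines. This sketch first confirms that astype returns a separate array even when the dtype does not change, then shows a floating point sum that fails an exact comparison but passes a tolerance-based one with np.isclose. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(5, dtype=np.float64)
same_type = arr.astype(np.float64)   # same dtype, but still a new array
same_type[0] = 100.0                 # does not touch arr
print(arr[0])                        # 0.0

# 0.1 and 0.2 are not exactly representable in binary floating point,
# so their sum is not exactly 0.3.
total = np.float64(0.1) + np.float64(0.2)
exact_match = (total == 0.3)
close_match = np.isclose(total, 0.3)
print(exact_match, close_match)      # False True
```

For whole arrays, np.allclose applies the same tolerance test to every element at once.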

Operations between Arrays and Scalars


Arrays are important because they enable you to express batch operations on data
without writing any for loops. This is usually called vectorization. Any arithmetic
operation between equal-size arrays applies the operation elementwise:
In [45]: arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [46]: arr
Out[46]:
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [47]: arr * arr
Out[47]:
array([[  1.,   4.,   9.],
       [ 16.,  25.,  36.]])

In [48]: arr - arr
Out[48]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Arithmetic operations with scalars are as you would expect, propagating the value to
each element:
In [49]: 1 / arr
Out[49]:
array([[ 1.    ,  0.5   ,  0.3333],
       [ 0.25  ,  0.2   ,  0.1667]])

In [50]: arr ** 0.5
Out[50]:
array([[ 1.    ,  1.4142,  1.7321],
       [ 2.    ,  2.2361,  2.4495]])
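
To make the idea of vectorization concrete, the sketch below computes the same elementwise square twice: once with explicit Python for loops and once with a single array expression. The names looped and vectorized are only for illustration.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit loops: visit every element by hand.
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# Vectorized: one expression, applied elementwise by NumPy.
vectorized = arr * arr

print(np.array_equal(looped, vectorized))   # True
```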

Operations between differently sized arrays are called broadcasting and will be discussed
in more detail in Chapter 12. Having a deep understanding of broadcasting is not
necessary for most of this book.
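
As a small taste of broadcasting, here is a sketch that subtracts a length-3 array of column means from a 2 x 3 array; NumPy stretches the smaller array across each row. The names table, col_means, and centered are made up for the example.

```python
import numpy as np

table = np.arange(6, dtype=np.float64).reshape((2, 3))
col_means = table.mean(axis=0)    # shape (3,): one mean per column

# Shapes (2, 3) and (3,) differ; col_means is applied to every row.
centered = table - col_means

print(centered.shape)             # (2, 3)
print(centered.mean(axis=0))      # each column now averages to zero
```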

Basic Indexing and Slicing


NumPy array indexing is a rich topic, as there are many ways you may want to select
a subset of your data or individual elements. One-dimensional arrays are simple; on
the surface they act similarly to Python lists:
In [51]: arr = np.arange(10)

In [52]: arr
Out[52]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [53]: arr[5]
Out[53]: 5

In [54]: arr[5:8]
Out[54]: array([5, 6, 7])

In [55]: arr[5:8] = 12

In [56]: arr
Out[56]: array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])
As you can see, if you assign a scalar value to a slice, as in arr[5:8] = 12, the value is
propagated (or broadcasted henceforth) to the entire selection. An important first
distinction from lists is that array slices are views on the original array. This means that
the data is not copied, and any modifications to the view will be reflected in the source
array:
In [57]: arr_slice = arr[5:8]

In [58]: arr_slice[1] = 12345

In [59]: arr
Out[59]: array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])

In [60]: arr_slice[:] = 64

In [61]: arr
Out[61]: array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
If you are new to NumPy, you might be surprised by this, especially if you have used
other array programming languages which copy data more zealously. As NumPy has
been designed with large data use cases in mind, you could imagine performance and
memory problems if NumPy insisted on copying data left and right.

If you want a copy of a slice of an ndarray instead of a view, you will
need to explicitly copy the array; for example arr[5:8].copy().
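
The difference between a view and an explicit copy can be sketched like this, with a plain Python list alongside for contrast. The names view, snapshot, and lst are chosen for the example; np.shares_memory is used only to confirm which arrays share a buffer.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]             # a view: shares memory with arr
snapshot = arr[5:8].copy()  # an explicit copy: independent data

view[:] = -1                # writes through to arr

print(arr)                               # [ 0  1  2  3  4 -1 -1 -1  8  9]
print(snapshot)                          # [5 6 7]
print(np.shares_memory(arr, view))       # True
print(np.shares_memory(arr, snapshot))   # False

# A slice of a Python list, by contrast, is always a new list.
lst = list(range(10))
lst_slice = lst[5:8]
lst_slice[0] = -1
print(lst[5])                            # 5
```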

With higher dimensional arrays, you have many more options. In a two-dimensional
array, the elements at each index are no longer scalars but rather one-dimensional
arrays:
In [62]: arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [63]: arr2d[2]
Out[63]: array([7, 8, 9])
Thus, individual elements can be accessed recursively. But that is a bit too much work,
so you can pass a comma-separated list of indices to select individual elements. So these
are equivalent:
In [64]: arr2d[0][2]
Out[64]: 3

In [65]: arr2d[0, 2]
Out[65]: 3
See Figure 4-1 for an illustration of indexing on a 2D array.
              axis 1
            0     1     2
axis 0  0   0,0   0,1   0,2
        1   1,0   1,1   1,2
        2   2,0   2,1   2,2

Figure 4-1. Indexing elements in a NumPy array

In multidimensional arrays, if you omit later indices, the returned object will be a lower
dimensional ndarray consisting of all the data along the higher dimensions. So in the
2 x 2 x 3 array arr3d:
In [66]: arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

In [67]: arr3d
Out[67]:
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
arr3d[0] is a 2 x 3 array:
In [68]: arr3d[0]
Out[68]:
array([[1, 2, 3],
       [4, 5, 6]])
Both scalar values and arrays can be assigned to arr3d[0]:
In [69]: old_values = arr3d[0].copy()

In [70]: arr3d[0] = 42

In [71]: arr3d
Out[71]:
array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [72]: arr3d[0] = old_values

In [73]: arr3d
Out[73]:
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0),
forming a 1-dimensional array:
In [74]: arr3d[1, 0]
Out[74]: array([7, 8, 9])
Note that in all of these cases where subsections of the array have been selected, the
returned arrays are views.

Indexing with slices


Like one-dimensional objects such as Python lists, ndarrays can be sliced using the
familiar syntax:
In [75]: arr[1:6]
Out[75]: array([ 1, 2, 3, 4, 64])
Higher dimensional objects give you more options as you can slice one or more axes
and also mix integers. Consider the 2D array above, arr2d. Slicing this array is a bit
different:
In [76]: arr2d
Out[76]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [77]: arr2d[:2]
Out[77]:
array([[1, 2, 3],
       [4, 5, 6]])
As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range
of elements along an axis. You can pass multiple slices just like you can pass multiple
indexes:
In [78]: arr2d[:2, 1:]
Out[78]:
array([[2, 3],
       [5, 6]])
When slicing like this, you always obtain array views of the same number of dimensions.
By mixing integer indexes and slices, you get lower dimensional slices:
In [79]: arr2d[1, :2]
Out[79]: array([4, 5])

In [80]: arr2d[2, :1]
Out[80]: array([7])
See Figure 4-2 for an illustration. Note that a colon by itself means to take the entire
axis, so you can slice only higher dimensional axes by doing:
In [81]: arr2d[:, :1]
Out[81]:
array([[1],
       [4],
       [7]])
Of course, assigning to a slice expression assigns to the whole selection:
In [82]: arr2d[:2, 1:] = 0
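
The rule of thumb above (a slice keeps its axis, an integer index drops it) can be checked by printing the shape of a few expressions on the same 3 x 3 array. This is a sketch; the dictionary shapes exists only to collect the results.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Slices preserve a dimension (possibly of length 1); integers remove it.
shapes = {
    'arr2d[:2, 1:]':  arr2d[:2, 1:].shape,    # (2, 2)
    'arr2d[2]':       arr2d[2].shape,         # (3,)
    'arr2d[2:, :]':   arr2d[2:, :].shape,     # (1, 3)
    'arr2d[:, :2]':   arr2d[:, :2].shape,     # (3, 2)
    'arr2d[1, :2]':   arr2d[1, :2].shape,     # (2,)
    'arr2d[1:2, :2]': arr2d[1:2, :2].shape,   # (1, 2)
}
for expr, shape in shapes.items():
    print(expr, shape)

# Assigning to a slice expression writes to the whole selection.
arr2d[:2, 1:] = 0
print(arr2d)
```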

Boolean Indexing
Let's consider an example where we have some data in an array and an array of names
with duplicates. I'm going to use the randn function in numpy.random to generate
some random normally distributed data:
In [83]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [84]: data = np.random.randn(7, 4)

In [85]: names
Out[85]:
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
      dtype='|S4')

In [86]: data
Out[86]:
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.1913,  0.4544,  0.4519,  0.5535],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])

Expression      Shape
arr[:2, 1:]     (2, 2)
arr[2]          (3,)
arr[2, :]       (3,)
arr[2:, :]      (1, 3)
arr[:, :2]      (3, 2)
arr[1, :2]      (2,)
arr[1:2, :2]    (1, 2)

Figure 4-2. Two-dimensional array slicing

Suppose each name corresponds to a row in the data array and we wanted to select all
the rows with corresponding name 'Bob'. Like arithmetic operations, comparisons
(such as ==) with arrays are also vectorized. Thus, comparing names with the string
'Bob' yields a boolean array:
In [87]: names == 'Bob'
Out[87]: array([ True, False, False,  True, False, False, False], dtype=bool)
This boolean array can be passed when indexing the array:
In [88]: data[names == 'Bob']
Out[88]:
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [ 2.1452,  0.8799, -0.0523,  0.0672]])
The boolean array must be of the same length as the axis it's indexing. You can even
mix and match boolean arrays with slices or integers (or sequences of integers, more
on this later):
In [89]: data[names == 'Bob', 2:]
Out[89]:
array([[-0.2349,  1.2792],
       [-0.0523,  0.0672]])

In [90]: data[names == 'Bob', 3]
Out[90]: array([ 1.2792,  0.0672])
To select everything but 'Bob', you can either use != or negate the condition using ~
(older NumPy releases also accepted - here, but current ones raise an error for it):
In [91]: names != 'Bob'
Out[91]: array([False,  True,  True, False,  True,  True,  True], dtype=bool)

In [92]: data[~(names == 'Bob')]
Out[92]:
array([[-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.1913,  0.4544,  0.4519,  0.5535],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])
To select two of the three names by combining multiple boolean conditions, use boolean
arithmetic operators like & (and) and | (or):
In [93]: mask = (names == 'Bob') | (names == 'Will')

In [94]: mask
Out[94]: array([ True, False,  True,  True,  True, False, False], dtype=bool)

In [95]: data[mask]
Out[95]:
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289]])
Selecting data from an array by boolean indexing always creates a copy of the data,
even if the returned array is unchanged.
The Python keywords and and or do not work with boolean arrays.
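
Here is a brief sketch of why the keywords fail and what to use instead. The masks is_bob and is_will are illustrative names; ~ is NumPy's elementwise NOT for boolean arrays. The keyword or asks Python for a single truth value of the whole array, which NumPy refuses to give for arrays with more than one element.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
is_bob = names == 'Bob'
is_will = names == 'Will'

mask = is_bob | is_will     # elementwise OR
both = is_bob & is_will     # elementwise AND (no name is both)
not_bob = ~is_bob           # elementwise NOT

print(mask)
print(both.any())           # False

# The keyword form needs one truth value for the whole array and fails.
keyword_error = None
try:
    is_bob or is_will
except ValueError as exc:
    keyword_error = str(exc)
print(keyword_error is not None)   # True
```

The parentheses around each comparison in expressions like (names == 'Bob') | (names == 'Will') matter, because | and & bind more tightly than ==.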

Setting values with boolean arrays works in a common-sense way. To set all of the
negative values in data to 0 we need only do:
In [96]: data[data < 0] = 0

In [97]: data
Out[97]:
array([[ 0.    ,  0.5433,  0.    ,  1.2792],
       [ 0.    ,  0.5465,  0.0939,  0.    ],
       [ 0.    ,  0.    ,  0.7719,  0.3103],
       [ 2.1452,  0.8799,  0.    ,  0.0672],
       [ 0.    ,  0.    ,  1.1503,  1.7289],
       [ 0.1913,  0.4544,  0.4519,  0.5535],
       [ 0.5994,  0.8174,  0.    ,  0.    ]])

Setting whole rows or columns using a 1D boolean array is also easy:
In [98]: data[names != 'Joe'] = 7

In [99]: data
Out[99]:
array([[ 7.    ,  7.    ,  7.    ,  7.    ],
       [ 0.    ,  0.5465,  0.0939,  0.    ],
       [ 7.    ,  7.    ,  7.    ,  7.    ],
       [ 7.    ,  7.    ,  7.    ,  7.    ],
       [ 7.    ,  7.    ,  7.    ,  7.    ],
       [ 0.1913,  0.4544,  0.4519,  0.5535],
       [ 0.5994,  0.8174,  0.    ,  0.    ]])

Fancy Indexing
Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
Suppose we had an 8 x 4 array:
In [100]: arr = np.empty((8, 4))

In [101]: for i in range(8):
   .....:     arr[i] = i

In [102]: arr
Out[102]:
array([[ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.]])
To select out a subset of the rows in a particular order, you can simply pass a list or
ndarray of integers specifying the desired order:
In [103]: arr[[4, 3, 0, 6]]
Out[103]:
array([[ 4.,  4.,  4.,  4.],
       [ 3.,  3.,  3.,  3.],
       [ 0.,  0.,  0.,  0.],
       [ 6.,  6.,  6.,  6.]])

Hopefully this code did what you expected! Using negative indices selects rows from
the end:
In [104]: arr[[-3, -5, -7]]
Out[104]:
array([[ 5.,  5.,  5.,  5.],
       [ 3.,  3.,  3.,  3.],
       [ 1.,  1.,  1.,  1.]])

Passing multiple index arrays does something slightly different; it selects a 1D array of
elements corresponding to each tuple of indices:
# more on reshape in a later chapter
In [105]: arr = np.arange(32).reshape((8, 4))

In [106]: arr
Out[106]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [107]: arr[[1, 5, 7, 2], [0, 3, 1, 2]]
Out[107]: array([ 4, 23, 29, 10])
Take a moment to understand what just happened: the elements (1, 0), (5, 3), (7,
1), and (2, 2) were selected. The behavior of fancy indexing in this case is a bit different
from what some users might have expected (myself included), which is the rectangular
region formed by selecting a subset of the matrix's rows and columns. Here is one way
to get that:
In [108]: arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
Out[108]:
array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
Another way is to use the np.ix_ function, which converts two 1D integer arrays to an
indexer that selects the square region:
In [109]: arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
Out[109]:
array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
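
The three behaviors discussed in this section can be compared side by side in a short sketch: paired index arrays, the rectangular selection via np.ix_ or chained indexing, and the fact that the result is a copy. The names rows, cols, pairs, block, and chained are invented for the example.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Two index arrays are paired up: (1, 0), (5, 3), (7, 1), (2, 2).
pairs = arr[rows, cols]
print(pairs)                    # [ 4 23 29 10]

# np.ix_ and chained indexing both select the rectangular region.
block = arr[np.ix_(rows, cols)]
chained = arr[rows][:, cols]
same = np.array_equal(block, chained)
print(same)                     # True

# Fancy indexing returns a copy, so writing to it leaves arr alone.
block[0, 0] = -1
print(arr[1, 0])                # 4
```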

Transposing Arrays and Swapping Axes


Transposing is a special form of reshaping which similarly returns a view on the
underlying data without copying anything. Arrays have the transpose method and also
the special T attribute:
In [110]: arr = np.arange(15).reshape((3, 5))

In [111]: arr
Out[111]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [112]: arr.T
Out[112]:
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
When doing matrix computations, you will do this very often, like for example
computing the inner matrix product XᵀX using np.dot:
In [113]: arr = np.random.randn(6, 3)

In [114]: np.dot(arr.T, arr)
Out[114]:
array([[ 2.584 ,  1.8753,  0.8888],
       [ 1.8753,  6.6636,  0.3884],
       [ 0.8888,  0.3884,  3.9781]])
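
A quick sketch of what the inner product XᵀX contains: entry (i, j) is the dot product of columns i and j of X, so the result is a square, symmetric matrix with one row and column per column of X. The seed and the names X and gram are assumptions made so the example is repeatable.

```python
import numpy as np

np.random.seed(0)                 # fixed seed, only for repeatability
X = np.random.randn(6, 3)

gram = np.dot(X.T, X)             # shape (3, 3)

print(gram.shape)                                          # (3, 3)
print(np.allclose(gram, gram.T))                           # True: symmetric
print(np.isclose(gram[0, 1], np.dot(X[:, 0], X[:, 1])))    # True
```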
For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute
the axes (for extra mind bending):
In [115]: arr = np.arange(16).reshape((2, 2, 4))

In [116]: arr
Out[116]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [117]: arr.transpose((1, 0, 2))
Out[117]:
array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
Simple transposing with .T is just a special case of swapping axes. ndarray has the
method swapaxes which takes a pair of axis numbers:
In [118]: arr
Out[118]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [119]: arr.swapaxes(1, 2)
Out[119]:
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
swapaxes similarly returns a view on the data without making a copy.
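
To see the view behavior and the relationship between .T, transpose, and swapaxes, consider this sketch. The names swapped, permuted, and m are illustrative; np.shares_memory is used to confirm that no data was copied.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)           # shape (2, 4, 2)
permuted = arr.transpose((1, 0, 2))    # shape (2, 2, 4), first two axes exchanged

print(swapped.shape, permuted.shape)
print(permuted[1, 0, 2] == arr[0, 1, 2])   # True: indices i and j trade places

# For a 2D array, .T is the same as swapping axes 0 and 1.
m = np.arange(6).reshape((2, 3))
print(np.array_equal(m.T, m.swapaxes(0, 1)))   # True

# Both results are views: writing through one changes the original.
swapped[0, 0, 1] = -4                  # this is element arr[0, 1, 0]
print(arr[0, 1, 0])                    # -4
print(np.shares_memory(arr, swapped))  # True
```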
