Numpy and Matplotlib
Lesson 15
Data Analysis Packages for Python
• NumPy: N-dimensional array
• Matplotlib: 2D and 3D Plotting
• SciPy: Scientific computing (linear algebra, numerical integration,
optimization, etc.)
• IPython: Enhanced Interactive Console
• SymPy: Symbolic mathematics
• Pandas: Data analysis (provides a data frame structure similar to R)
Numpy? http://www.numpy.org/
• Numpy: fundamental package for scientific computing with Python. It
contains:
• a multi-dimensional array object
• basic linear algebra functions
• basic Fourier transforms
• sophisticated random number capabilities
• tools for integrating Fortran code
• tools for integrating C/C++ code
• Reference:
• Official documentation (Numpy and Scipy): http://docs.scipy.org/doc/
• The Numpy Book: http://web.mit.edu/dvp/Public/numpybook.pdf
Numpy – the ndarray data structure
• NumPy's main object is the homogeneous multidimensional array called ndarray.
– This is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
– The element type can also be a generic Python object.
– Typical examples of multidimensional arrays include vectors, matrices, images and spreadsheets.
– Dimensions are usually called axes; the number of axes is the rank.
>>> a = numpy.array([1,3,5,7,9])
>>> b = numpy.array([3,5,6,7,9])
>>> c = a + b
>>> print(c)
[ 4  8 11 14 18]
[7, 5, -1]  An array of rank 1, i.e. it has 1 axis of length 3
[[ 1.5, 0.2, -3.7],
 [ 0.1, 1.7, 2.9]]  An array of rank 2, i.e. it has 2 axes, the first of length 2, the second of length 3 (a matrix with 2 rows and 3 columns)
Numpy – ndarray attributes
• ndarray.ndim
– the number of axes (dimensions) of the array i.e. the rank.
• ndarray.shape
– the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension.
For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore
the rank, or number of dimensions, ndim.
• ndarray.size
– the total number of elements of the array, equal to the product of the elements of shape.
• ndarray.dtype
– an object describing the type of the elements in the array. One can create or specify dtypes using
standard Python types. NumPy also provides many of its own, for example bool_, int_, int8, int16, int32,
int64, float16, float32, float64, complex64, complex128, object_.
• ndarray.itemsize
– the size in bytes of each element of the array. E.g. for elements of type float64, itemsize is 8 (=64/8),
while int32 has itemsize 4 (=32/8) (equivalent to ndarray.dtype.itemsize).
• ndarray.data
– the buffer containing the actual elements of the array. Normally, we won't need to use this attribute
because we will access the elements in an array using indexing facilities.
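As a small illustration of the attributes listed above, the sketch below inspects a 2 x 3 array. The array values and the name m are only illustrative.

```python
import numpy as np

# A matrix with 2 rows and 3 columns, stored as 64-bit floats.
m = np.array([[1.5, 0.2, -3.7],
              [0.1, 1.7, 2.9]], dtype=np.float64)

print(m.ndim)      # number of axes (the rank): 2
print(m.shape)     # size along each axis: (2, 3)
print(m.size)      # total number of elements: 6
print(m.dtype)     # element type: float64
print(m.itemsize)  # bytes per element: 8
```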
Numpy – ndarray methods
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.html
• ndarray.copy()
– Returns a copy of the array
• ndarray.flatten([order])
– Returns a copy of the array collapsed into one dimension (an ndarray).
• ndarray.ravel([order])
– Returns a contiguous flattened 1-D ndarray; unlike flatten(), a copy is made only if needed.
• ndarray.tolist()
– Returns the contents of self as a nested list
• ndarray.item(*args)
– Copy an element of an array to a standard Python scalar and return it.
• ndarray.sort([axis, kind, order])
– Sorts an array, in place
• ndarray.fill(scalar)
– Fill an array with the scalar value
• ndarray.put(indices, values[, mode])
– Set a.flat[n] = values[n] for each n in indices; the other values of the ndarray are left unaltered
• ndarray.resize(new_shape[, refcheck])
– Change shape and size of array in-place
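The following sketch exercises several of the methods listed above on a small integer array. The variable names and values are made up for the illustration; resize() is left out because its behavior depends on whether other references to the array exist.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [6, 5, 4]])

b = a.copy()          # independent copy: changing b leaves a untouched
flat = a.flatten()    # always a new 1-D array
view = a.ravel()      # 1-D array; shares memory with a when no copy is needed
as_list = a.tolist()  # nested Python list
first = a.item(0)     # element 0 (in flattened order) as a Python int

b.sort(axis=1)            # in place: each row of b is sorted
b.put([0, 5], [-1, -2])   # sets b.flat[0] = -1 and b.flat[5] = -2

z = np.empty(4)
z.fill(7.0)           # every element of z becomes 7.0
```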
Numpy – examples of ndarray manipulation
• Creating an array
import numpy as np
a = np.array([[1,2,3],
              [4,5,6],
              [7,8,9]])
a.shape
a.dtype
• Slicing an array
• Almost the same as lists except one can specify multiple dimensions
a[1]
a[1,:]
a[1,1:]
a[:1,1:]
• Be careful, arrays are mutable! Assignment binds a second name to the same array; it does not copy it.
A = np.zeros((2, 2))
# array([[ 0., 0.],
#        [ 0., 0.]])
C = A
C[0, 0] = 1
print(A)
# [[ 1. 0.]
#  [ 0. 0.]]
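A minimal sketch of the slicing and mutability points above, assuming a 3 x 3 integer array. The names row, tail, corner and D are introduced only for this example.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

row = a[1]          # second row
same_row = a[1, :]  # the same row, with an explicit column slice
tail = a[1, 1:]     # second row, columns 1 onward
corner = a[:1, 1:]  # first row only, columns 1 onward (result is still 2-D)

A = np.zeros((2, 2))
C = A               # C is another name for the same array
D = A.copy()        # D owns separate data
C[0, 0] = 1         # changes A as well, but not D
```

Using copy() is the usual way to avoid the aliasing shown with C.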
Numpy – examples of ndarray manipulation
• Vector arithmetics
>>> a = np.array([1,2,3], float)
>>> b = np.array([5,2,6], float)
>>> a + b
array([6., 4., 9.])
>>> a - b
array([-4., 0., -3.])
>>> a * b
array([5., 4., 18.])
>>> b / a
array([5., 1., 2.])
>>> a % b
array([1., 0., 3.])
>>> b**a
array([5., 4., 216.])
• Matrix arithmetics
>>> a = np.array([[1, 2], [3, 4], [5, 6]], float)
>>> b = np.array([-1, 3], float)
>>> a
array([[ 1., 2.],
       [ 3., 4.],
       [ 5., 6.]])
>>> b
array([-1., 3.])
>>> a + b
array([[ 0., 5.],
       [ 2., 7.],
       [ 4., 9.]])
• Element-by-element Array functions
• e.g. add(x1,x2), absolute(x), log10(x),
sin(x), logical_and(x1,x2)
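The element-by-element functions named above can be tried on two short vectors. This is only a sketch; the input values are arbitrary.

```python
import numpy as np

x1 = np.array([1.0, -10.0, 100.0])
x2 = np.array([1.0, 10.0, 0.0])

total = np.add(x1, x2)          # same result as x1 + x2
magnitude = np.absolute(x1)     # absolute value of each element
decades = np.log10(magnitude)   # base-10 logarithm of each element
both = np.logical_and(x1, x2)   # True where both entries are non-zero
wave = np.sin(np.array([0.0, np.pi / 2]))  # sine of each element
```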
Numpy – Array broadcasting
• Simplified syntax for operations between arrays of different shapes:
• With scalars: add a constant to or
multiply the matrix by a constant
# fill matrix A with 1s
A = np.ones((3,3))
print(3 * A - 1)
# [[ 2. 2. 2.]
#  [ 2. 2. 2.]
#  [ 2. 2. 2.]]
• With vectors: the vector is virtually repeated
to fill a matrix of the matching shape (see figure)
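To make the vector case concrete, here is a small sketch that broadcasts a length-3 vector against a 3 x 3 matrix, first along rows and then along columns. The vector v is an assumption made for the example.

```python
import numpy as np

A = np.ones((3, 3))
v = np.array([10.0, 20.0, 30.0])

scaled = 3 * A - 1             # scalar case: applied to every element
by_row = A + v                 # v (shape (3,)) is added to each row of A
by_col = A + v[:, np.newaxis]  # v reshaped to (3, 1) is added to each column
```

No copy of v is actually tiled into a matrix; NumPy only behaves as if it had been.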
Numpy – some functions
https://docs.scipy.org/doc/numpy-1.14.0/genindex.html
• Mathematical
• abs(), add(), cumprod(), cumsum(), floor(), histogram(), min(), max(), multiply(),
polyfit(), transpose(); in numpy.random: binomial(), randint(), shuffle()
• Initialization
• linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
• Return evenly spaced numbers (num samples) over a specified interval
• arange([start, ]stop, [step, ]dtype=None): returns evenly spaced values within a given interval.
Same as linspace but takes a step instead of a number of samples; stop is excluded.
>>> np.arange(3,7)
array([3, 4, 5, 6])
>>> np.arange(3,7,2)
array([3, 5])
• random.randint(low, high=None, size=None, dtype='l'): returns random integers from low (inclusive) to
high (exclusive)
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
• fromstring(string, dtype=float, [count=-1,] [sep='']): Useful function to convert raw data into
processable data - creates a new 1-D array initialized from text data in a string
>>> np.fromstring('1, 2', dtype=int, sep=',')
array([1, 2])
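The difference between linspace (number of samples) and arange (step size) is easiest to see side by side. The sketch below uses an arbitrary interval from 0 to 1 and also repeats the randint and fromstring calls from the list above.

```python
import numpy as np

by_count = np.linspace(0.0, 1.0, num=5)  # 5 samples, stop value included
by_step = np.arange(0.0, 1.0, 0.25)      # step of 0.25, stop value excluded

coin_flips = np.random.randint(2, size=10)          # 10 random integers, each 0 or 1
parsed = np.fromstring('1, 2', dtype=int, sep=',')  # parse comma-separated text
```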
Matplotlib ? http://matplotlib.org
• Library for making 2D plots of arrays in Python
• Also 3D plots
• Also extra toolkits for maps, statistics, etc.
• Provides many ways to extract meaningful representations from raw data
and to interpret the underlying phenomena
• Can be used in
• Python scripts
• the Python and IPython shell
• web application servers
References
• Documentation page (links):
• https://matplotlib.org/contents.html
• Pyplot Tutorial
• https://matplotlib.org/users/pyplot_tutorial.html
• User’s Guide
• https://matplotlib.org/users/
• Cookbook / Matplotlib
• https://matplotlib.org/api/cbook_api.html
• API Examples
• https://matplotlib.org/gallery/index.html#api-examples
• Matplotlib FAQ
• https://matplotlib.org/faq/index.html
• Examples:
• https://matplotlib.org/examples/
• API:
• https://matplotlib.org/api/
• Pyplot Documentation (MATLAB-like plotting framework):
• https://matplotlib.org/api/pyplot_api.html
Installation
• Matplotlib:
python -m pip install -U pip
python -m pip install -U matplotlib
• These instructions also install the six, cycler, python-dateutil, pytz,
numpy, pyparsing, and kiwisolver libraries
• Alternately, you might want to run a Windows installer for the
package
Matplotlib API / frontend
• Also called Matplotlib frontend
• Abstract interface
• Relies on device dependent renderer
(known as backend)
• Pyplot API (Matlab-like)
• collection of command style functions
• Object-Oriented API
• Artist
• Figure
• Line2D
• Axis
• Text
• Tick
• Patches (e.g., matplotlib.patches.Ellipse and Path)
Matplotlib API – Originally emulating the MATLAB graphics commands
Source: John D. Hunter, Perry Greenfield
Matplotlib API
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
fig = Figure()
canvas = FigureCanvas(fig)  # attach the Agg (raster) backend to the figure
ax = fig.add_subplot(111)
ax.plot([1,2,3])
ax.set_title('Anatomy of a figure')
ax.grid(True)
ax.set_xlabel('X axis label')
ax.set_ylabel('Y axis label')
fig.savefig('figure')
Programming with Matplotlib.pyplot
• plot() draws lines and/or markers on the current axes and returns the list of Line2D objects it created
• show() displays a figure
• draw(): forces a redraw of the current figure
• clf(): clear the figure
• axis(): sets or returns the viewport (axis limits) of the axes
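A short sketch of these pyplot calls used together. It assumes a headless run and therefore selects the non-interactive Agg backend; with an interactive backend that line can be dropped and plt.show() called to display the window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

lines = plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')  # list of Line2D objects
plt.axis([0, 6, 0, 20])   # set the viewport: [xmin, xmax, ymin, ymax]
limits = plt.axis()       # called without arguments, returns the current limits
plt.draw()                # redraw the current figure
plt.clf()                 # clear the figure, ready for the next plot
```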
More Matplotlib.pyplot functions …
Function        Description
acorr           plot the autocorrelation function
annotate        annotate something in the figure
arrow           add an arrow to the axes
axes            create a new axes
axhline         draw a horizontal line across axes
axvline         draw a vertical line across axes
axhspan         draw a horizontal bar across axes
axvspan         draw a vertical bar across axes
axis            set or return the current axis limits
barbs           a (wind) barb plot
bar             make a bar chart
barh            a horizontal bar chart
broken_barh     a set of horizontal bars with gaps
box             set the axes frame on/off state
boxplot         make a box and whisker plot
cla             clear current axes
clabel          label a contour plot
clf             clear a figure window
clim            adjust the color limits of the current image
close           close a figure window
colorbar        add a colorbar to the current figure
cohere          make a plot of coherence
contour         make a contour plot
contourf        make a filled contour plot
csd             make a plot of cross spectral density
delaxes         delete an axes from the current figure
draw            Force a redraw of the current figure
errorbar        make an errorbar graph
figlegend       make legend on the figure rather than the axes
figimage        make a figure image
figtext         add text in figure coords
figure          create or change active figure
fill            make filled polygons
fill_between    make filled polygons between two curves
gca             return the current axes
gcf             return the current figure
gci             get the current image, or None
getp            get a graphics property
grid            set whether gridding is on
hexbin          make a 2D hexagonal binning plot
hist            make a histogram
hold            set the axes hold state
ioff            turn interaction mode off
ion             turn interaction mode on
isinteractive   return True if interaction mode is on
imread          load image file into array
imsave          save array as an image file
imshow          plot image data
ishold          return the hold state of the current axes
legend          make an axes legend
locator_params  adjust parameters used in locating axis ticks
loglog          a log log plot
matshow         display a matrix in a new figure preserving aspect
margins         set margins used in autoscaling
pcolor          make a pseudocolor plot
pcolormesh      make a pseudocolor plot using a quadrilateral mesh
pie             make a pie chart
plot            make a line plot
plot_date       plot dates
plotfile        plot column data from an ASCII tab/space/comma delimited file
polar           make a polar plot on a PolarAxes
psd             make a plot of power spectral density
quiver          make a direction field (arrows) plot
rc              control the default params
axis() command
• The axis() command takes a list
[xmin, xmax, ymin, ymax]
and specifies the viewport of the axes
Example
plot([1,2,3,4], [1,4,9,16], 'ro')
axis([0, 6, 0, 20])
(Figure: the four points (1,1), (2,4), (3,9) and (4,16) drawn as red circles.)
Logarithmic and other nonlinear axes
Text on Figures
• Text commands:
• X-axis label: xlabel()
• Y-axis label: ylabel()
• Figure title: title()
• text()
• In simple cases, all but text() take just a string argument
• text() needs at least 3 arguments:
x and y coordinates (as per the axes) and a string
• All take optional keyword arguments or dictionaries to specify the font properties
• They return instances of class Text
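The text commands above can be sketched as follows. The labels, coordinates and font settings are invented for the example, and the Agg backend is selected on the assumption that no display is available.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
xl = plt.xlabel('time (s)')                               # string only
yl = plt.ylabel('distance (m)', fontsize=12)              # string plus a font property
ttl = plt.title('Distance over time', fontweight='bold')
note = plt.text(2, 12, 'quadratic growth', color='green') # x, y in data coordinates
```

Each call returns a Text instance, so properties can still be changed afterwards, for example with note.set_fontsize(14).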
Legends on Figures
Line Plot
• plot() command
More curve plotting
• plot() has a format string argument specifying the color and line type
• Goes after the y argument (whether or not there’s an x argument)
• Can specify one or the other or concatenate a color string with a line style string in either order
• Default format string is 'b-' (solid blue line); since Matplotlib 2.0 the default color is taken from the color cycle instead
• Generally work with arrays, not lists
• All sequences are converted to NumPy arrays internally
• Some color abbreviations
b (blue), g (green), r (red), k (black), c (cyan)
• Can specify colors in many other ways (e.g., RGB triples)
• Some line styles that fill in the line between the specified points
- (solid line), -- (dashed line), -. (dash-dot line), : (dotted line)
• Some line styles that don’t fill in the line between the specified points
o (circles), + (plus symbols), x (crosses), D (diamond symbols)
• For a complete list of line styles and format strings,
https://matplotlib.org/api/pyplot_api.html#module-matplotlib.pyplot
• Look under matplotlib.pyplot.plot()
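As an illustration of format strings, the sketch below draws three curves with different color and style combinations. The data is arbitrary, and the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 20)
solid, = plt.plot(x, x, 'g-')       # color then style: green solid line
dashed, = plt.plot(x, x**2, '--r')  # style then color: red dashed line
dots, = plt.plot(x, x**3, 'kD')     # black diamonds, points not connected
```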
Polar plots
Multiple plots
• plot() can draw more than one line at a time
• Put the (1-3) arguments for one line in the figure
• Then the (1-3) arguments for the next line
• And so on
import numpy as np
import matplotlib.pyplot as plt
# evenly sampled time at 200ms intervals
t = np.arange(0., 5., 0.2)
# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()
Multiple plots
• Another approach using multiple plot statements
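A possible version of this approach, with one plot() call per curve, is sketched below. The label texts and the legend() call are additions for the example; the Agg backend is assumed so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0., 5., 0.2)  # evenly sampled time at 200 ms intervals

plt.figure()
plt.plot(t, t, 'r--', label='linear')      # red dashes
plt.plot(t, t**2, 'bs', label='quadratic') # blue squares
plt.plot(t, t**3, 'g^', label='cubic')     # green triangles
legend = plt.legend()                      # one entry per labelled line
```

Separate calls make it easy to give each line its own label and keyword arguments.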
Plotting with categorical variables
Plots and text annotations
• text() command: place text at an arbitrary position on the Axes.
• annotate() method: makes annotations easy.
• location being annotated: argument xy
• location of the text: xytext.
• Both of these arguments are (x,y) tuples.
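A sketch of annotate() with both arguments, assuming a cosine curve whose maximum at t = 2 is being pointed out. The arrow properties are illustrative, and the Agg backend is assumed for headless use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2 * np.pi * t)

plt.figure()
plt.plot(t, s)
ann = plt.annotate('local max',
                   xy=(2, 1),        # the point being annotated
                   xytext=(3, 1.5),  # where the text is placed
                   arrowprops=dict(facecolor='black', shrink=0.05))
plt.ylim(-2, 2)  # leave room above the curve for the text
```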
Multiple subplots
• subplot(nrows, ncols, index, **kwargs): create
and return an Axes, at position index of a (virtual)
grid of nrows by ncols axes
• If nrows, ncols and index are all less than 10, they
can also be given as a single, concatenated, three-
digit number.
• plt.subplot(211) creates a subplot which
represents the top plot of a grid with 2 rows
and 1 column.
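The two calling styles of subplot() can be sketched like this, assuming a grid of 2 rows and 1 column. The plotted functions are arbitrary, and the Agg backend is assumed for headless use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 5.0, 0.1)

plt.figure()
top = plt.subplot(211)         # three-digit form: 2 rows, 1 column, position 1
plt.plot(t, np.exp(-t), 'bo')  # drawn on the top axes
bottom = plt.subplot(2, 1, 2)  # separate arguments: same grid, position 2
plt.plot(t, np.cos(2 * np.pi * t), 'r--')  # drawn on the bottom axes
```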
More subplots
Images
• Load images with imread()
• Display them with imshow()
Images
• Create image from raw data
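• numpy.fromstring(string, dtype=float, count=-1, sep='') creates a new 1-D
array initialized from text data in a string (with sep='' the string is read as raw binary; this mode is deprecated in favor of numpy.frombuffer)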
Images with interpolations
• Fill in empty pixels using different strategies
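One way to compare interpolation strategies is to show the same small array twice with different interpolation settings. The sketch below assumes a random 5 x 5 grayscale image and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random((5, 5))  # tiny image: each value becomes one gray level

fig, (left, right) = plt.subplots(1, 2)
im_nearest = left.imshow(pixels, interpolation='nearest', cmap='gray')  # blocky pixels
im_smooth = right.imshow(pixels, interpolation='bicubic', cmap='gray')  # smoothed
left.set_title('nearest')
right.set_title('bicubic')
```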
Histograms
• Histograms with optional bin counts
and sizes
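A minimal histogram sketch, assuming 1000 normally distributed samples and 20 bins. The distribution parameters are invented, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
samples = rng.normal(loc=100, scale=15, size=1000)

# hist() returns the bin counts, the bin edges, and the drawn rectangles.
counts, edges, patches = plt.hist(samples, bins=20)
```

Passing a sequence of edges as bins instead of an integer gives bins of unequal size.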
Scatter Plots
• scatter(x, y )
• x and y are arrays of numbers of the same length, N
• Makes a scatter plot of x vs. y
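from numpy.random import rand
from matplotlib.pyplot import scatter, savefig
N = 20
x = 0.9*rand(N)  # N random x coordinates in [0, 0.9)
y = 0.9*rand(N)  # N random y coordinates in [0, 0.9)
scatter(x,y)
savefig('scatter_dem')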
Line properties: Line2D
• Lines you encounter are instances of the Line2D class
• Instances have the following properties
Property Values
alpha The alpha transparency on a 0-1 scale: alpha = 0.0 is complete transparency, alpha = 1.0 is complete opacity
antialiased True | False - use antialiased rendering
color a matplotlib color arg
label a string optionally used for legend
linestyle One of -- : -. -
linewidth a float, the line width in points
marker One of + , o . s v x > < ^
markeredgewidth The line width around the marker symbol
markeredgecolor The edge color if a marker is used
markerfacecolor The face color if a marker is used
markersize The size of the marker in points
Setting line properties
• Pass line properties as keyword arguments
• plot(x, y, linewidth=2.0)
• Use the setter methods of the Line2D instance
• For every property prop, there’s a set_prop() method
• line, = plot(x, y, 'o')
line.set_antialiased(False)
• Use the function setp()
• 1st argument can be an object or sequence of objects
• The rest are keyword arguments updating the object’s (or objects’) properties
• lines = plot(x1, y1, x2, y2)
setp(lines, color='r', linewidth=2.0)
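The three ways of setting line properties can be put side by side in one runnable sketch. The data and property values are arbitrary, and the Agg backend is assumed for headless use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 10)
plt.figure()

# 1. Keyword argument at creation time.
line, = plt.plot(x, x**2, linewidth=2.0)

# 2. Setter methods on the returned Line2D instance.
line.set_linestyle('--')
line.set_antialiased(False)

# 3. setp() on a sequence of lines.
lines = plt.plot(x, x, x, 1 - x)
plt.setp(lines, color='r', linewidth=3.0)
```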
General Purpose Toolkits
• mplot3d: generates 3D plots - an Axes3D object is
created just like any other axes using the
projection='3d' keyword
https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html
• axisartist: custom Axes class that is meant to
support curvilinear grids (e.g., the world
coordinate system in astronomy)
• axes_grid1: collection of helper classes to ease
displaying (multiple) images with matplotlib
Using mplot3d
• An Axes3D object is created like any other axes, but using the projection='3d'
keyword
• Axes3D.plot(xs, ys, *args, **kwargs)
• xs, ys are the arrays of x, y coordinates of vertices
• zs is the z value(s): either one value for all points or an array with one value per point
• zdir gives which direction to use as z ('x', 'y' or 'z') when plotting a 2D set
• Other arguments are passed on to plot()
• Create a new matplotlib.figure.Figure and add to it a new axes of type Axes3D:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
Mplot3d example: parametric curve
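A parametric curve in 3D could be drawn along the following lines. The particular curve (a spiral whose radius depends on z) is an assumption chosen for illustration, as is the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older versions

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Parameter theta sweeps four full turns while z rises from -2 to 2.
theta = np.linspace(-4 * np.pi, 4 * np.pi, 100)
z = np.linspace(-2, 2, 100)
r = z**2 + 1            # radius grows away from z = 0
x = r * np.sin(theta)
y = r * np.cos(theta)

ax.plot(x, y, z, label='parametric curve')
ax.legend()
```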
Mapping Toolkits
• Basemap
• library for plotting 2D data on maps
• transform coordinates to one of 25 different map projections (azimuthal
equidistant, gnomonic, orthographic, geostationary, Mercator …)
• EoL in 2020
• Cartopy
• geospatial data processing
• object oriented projection definitions
• point, line, polygon and image transformations between projections
• powerful vector data handling by integrating shapefile reading with Shapely capabilities
• Aimed at replacing Basemap
Installation (matplot_tools)
• Basemap (https://matplotlib.org/basemap/users/installing.html):
• Install GEOS (Geometric Engine Open Source), Pyproj (cartographic transformations and geodetic
computations), and Pillow (Python imaging library):
pip install pyproj
pip install geos
pip install Pillow
• Possibly configure the GEOS_DIR environment variable to the location of geos (e.g.
/usr/local/Cellar/geos/3.6.2/include/)
• download .tar.gz from https://github.com/matplotlib/basemap/releases/
• decompress it, and run: python setup.py install
• Cartopy (http://scitools.org.uk/cartopy/docs/latest/installing.html#installing):
• Install Scipy, and Cython:
pip install scipy
pip install cython
• Install Shapely and Cartopy as follows:
pip install shapely cartopy --no-binary shapely --no-binary cartopy
Basemap API https://matplotlib.org/basemap/
• Drawing a Map Background
• drawcoastlines(): draw coastlines
• fillcontinents(): color the interior of continents
• drawcountries(): draw country boundaries
• drawrivers(): draw rivers
• bluemarble(): draw a NASA Blue Marble image as a map background
• shadedrelief(): draw a shaded relief image as a map background
• etopo(): draw an etopo relief image as map background
• warpimage(): use an arbitrary image as a map background. The image must be
global, covering the world in lat/lon coordinates from the international
dateline eastward and the South Pole northward.
Basemap - Draw coastlines, filling ocean and
land areas
Basemap - Draw a shaded relief image
Basemap API https://matplotlib.org/basemap/
• Converting to and from map projection coordinates
• Calling a Basemap instance with lon/lat returns x/y map projection coordinates
• With inverse=True, the call converts x/y back to lon/lat
• Plotting data on a map
• scatter(): draw points with markers
• plot(): draw lines and/or markers
• imshow(): draw an image
• contourf(): draw filled contours
Cartopy: drawing maps http://scitools.org.uk/cartopy/docs/latest/gallery/index.html
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
# Save the plot by calling plt.savefig() BEFORE plt.show()
plt.savefig('coastlines.pdf')
plt.savefig('coastlines.png')
plt.show()
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
ax = plt.axes(projection=ccrs.InterruptedGoodeHomolosine())
ax.coastlines(resolution='110m')
ax.gridlines()
plt.show()
Statistics Toolkits
• seaborn: statistical data visualization http://seaborn.pydata.org/
• Built-in themes, color palettes, grids of plots
• Visualizations for univariate and bivariate distributions, linear regression
models, matrices of data, clustering algorithms, plots for statistical timeseries
• HoloViews http://holoviews.org/
• data analysis and visualization (tabulated data, diagrams, animations)
• ggplot https://github.com/yhat/ggpy
• Python implementation of the Grammar of Graphics (Leland Wilkinson et al.)
• Data visualization with templates, maps, analysis tools …
• prettyplotlib https://olgabot.github.io/prettyplotlib/
• matplotlib-enhancer library which painlessly creates beautiful default
matplotlib plots (improved colors, aspect ratio, etc.)