Unit-5 AD23211 PDS Final NOTES
Importing Matplotlib – Simple line plots – Simple scatter plots – visualizing errors – density
and contour plots – Histograms – legends – colors – subplots – text and annotation –
customization – three dimensional plotting - Geographic Data with Basemap– Visualization
with Seaborn.
We’ll now take an in-depth look at the Matplotlib tool for visualization in Python. Matplotlib is a
multiplatform data visualization library built on NumPy arrays, and designed to work with the broader
SciPy stack. It was conceived by John Hunter in 2002, originally as a patch to IPython for enabling
interactive MATLAB-style plotting via gnuplot from the IPython command line. IPython’s creator,
Fernando Perez, was at the time scrambling to finish his PhD, and let John know he wouldn’t have time
to review the patch for several months. John took this as a cue to set out on his own, and the Matplotlib
package was born, with version 0.1 released in 2003. It received an early boost when it was adopted as
the plotting package of choice of the Space Telescope Science Institute (the folks behind the Hubble
Telescope), which financially supported Matplotlib’s development and greatly expanded its
capabilities.
One of Matplotlib’s most important features is its ability to play well with many operating systems
and graphics backends. Matplotlib supports dozens of backends and output types, which means you
can count on it to work regardless of which operating system you are using or which output format you
wish. This cross-platform, everything-to-everyone approach has been one of the great strengths of
Matplotlib. It has led to a large userbase, which in turn has led to an active developer base and
Matplotlib’s powerful tools and ubiquity within the scientific Python world.
1. IMPORTING MATPLOTLIB
Before we dive into the details of creating visualizations with Matplotlib, there are a few
useful things you should know about using the package.
Visualization with Matplotlib
General Matplotlib Tips
In[1]: import matplotlib as mpl
import matplotlib.pyplot as plt
We will use the plt.style directive to choose appropriate aesthetic styles for our figures:
In[2]: plt.style.use('classic')
1.1 Plotting from a script
If you are using Matplotlib from within a script, the function plt.show() is your friend.
plt.show() starts an event loop, looks for all currently active figure objects, and opens one or
more interactive windows that display your figure or figures. So, for example, you may have a
file called myplot.py containing the following:
# file: myplot.py
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100) # sample points for the two curves
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.show()
You can then run this script from the command-line prompt, which will result in a window opening
with your figure displayed:
$ python myplot.py
The plt.show command does a lot under the hood, as it must interact with your system's
interactive graphical backend. The details of this operation can vary greatly from system to
system and even installation to installation, but Matplotlib does its best to hide all these details
from you.
One thing to be aware of: the plt.show command should be used only once per Python
session, and is most often seen at the very end of the script. Multiple show commands can lead
to unpredictable backend-dependent behavior, and should mostly be avoided.
1.2 Plotting from an IPython shell
IPython is built to work well with Matplotlib if you specify Matplotlib mode. To enable
this mode, you can use the %matplotlib magic command after starting ipython:
In [1]: %matplotlib
Using matplotlib backend: TkAgg
At this point, any plt plot command will cause a figure window to open, and further
commands can be run to update the plot. Some changes (such as modifying properties of lines
that are already drawn) will not draw automatically: to force an update, use plt.draw.
Using plt.show in IPython's Matplotlib mode is not required.
1.3 Plotting from an IPython notebook
The Jupyter notebook is a browser-based interactive data analysis tool that can
combine narrative, code, graphics, HTML elements, and much more into a single executable
document (see IPython: Beyond Normal Python).
%matplotlib inline will lead to static images of your plot embedded in the notebook.
%matplotlib notebook will lead to interactive plots embedded within the notebook.
For this book, we will generally stick with the default, with figures rendered as static images
(see the following figure for the result of this basic plotting example):
%matplotlib inline
import numpy as np
x = np.linspace(0, 10, 100)
fig = plt.figure()
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(x), '--');
Though it may be less clear to someone reading your code, you can save some
keystrokes by combining these linestyle and color codes into a single non-keyword argument
to the plt.plot function; the following figure shows the result:
In[8]: plt.plot(x, x + 0, '-g') # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red
These single-character color codes reflect the standard abbreviations in the RGB
(Red/Green/Blue) and CMYK (Cyan/Magenta/Yellow/blacK) color systems, commonly used
for digital color graphics.
2.2 Adjusting the Plot: Axes Limits
Matplotlib does a decent job of choosing default axes limits for your plot, but
sometimes it's nice to have finer control. The most basic way to adjust the limits is to use
the plt.xlim and plt.ylim functions (see the following figure):
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
If for some reason you'd like either axis to be displayed in reverse, you can simply reverse
the order of the arguments.
In[10]: plt.plot(x, np.sin(x))
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
A useful related method is plt.axis (note here the potential confusion between axes with an e,
and axis with an i), which sets the x and y limits in a single call when given a list of the form
[xmin, xmax, ymin, ymax], as shown in the following figure:
In[11]: plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5]); # [xmin, xmax, ymin, ymax]
It also accepts more qualitative specifications: for example, plt.axis('tight') automatically
tightens the bounds around the current content.
Or you can specify that you want an equal axis ratio, such that one unit in x is visually
equivalent to one unit in y, as seen in the following figure:
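The equal-aspect setting described above can be sketched as follows. This is a minimal illustration that assumes the same sine-curve data as the earlier examples; when running it as a script rather than in a notebook, finish with plt.show().

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)  # assumed sample data, as in the earlier examples

plt.plot(x, np.sin(x))
# 'equal' makes one unit in x span the same screen length as one unit in y
plt.axis('equal')
ax = plt.gca()  # keep a handle on the current axes for later inspection
```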
The third argument in the function call is a character that represents the type of symbol
used for the plotting. Just as you can specify options such as '-' or '--' to control the line style,
the marker style has its own set of short string codes. The full list of available symbols can be
seen in the documentation of plt.plot, or in Matplotlib's online documentation. Most of the
possibilities are fairly intuitive, and a number of the more common ones are demonstrated
here (see the following figure)
rng = np.random.default_rng(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.random(2), rng.random(2), marker, color='black',
             label="marker='{0}'".format(marker))
plt.legend(numpoints=1, fontsize=13)
plt.xlim(0, 1.8);
For even more possibilities, these character codes can be used together with line and
color codes to plot points along with a line connecting them (see the following figure):
In[4]: plt.plot(x, y, '-ok'); # line (-), circle marker (o), black (k)
In[5]: plt.plot(x, y, '-p', color='gray', # -p pentagon
markersize=15, linewidth=4,
markerfacecolor='white',
markeredgecolor='gray',
markeredgewidth=2)
plt.ylim(-1.2, 1.2);
The primary difference of plt.scatter from plt.plot is that it can be used to create scatter
plots where the properties of each individual point (size, face color, edge color, etc.) can be
individually controlled or mapped to data.
Let's show this by creating a random scatter plot with points of many colors and sizes.
In order to better see the overlapping results, we'll also use the alpha keyword to adjust the
transparency level (see the following figure):
In[7]: rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
plt.colorbar(); # show color scale
Notice that the color argument is automatically mapped to a color scale (shown here
by the colorbar command), and that the size argument is given in points squared (it sets the
marker area). In this way, the color and size of points can be used to convey information in the
visualization, in order to visualize multidimensional data.
4. VISUALIZING ERRORS
For any scientific measurement, accurate accounting of uncertainties is nearly as
important as, if not more important than, accurate reporting of the number itself. For example, imagine that
I am using some astrophysical observations to estimate the Hubble Constant, the local
measurement of the expansion rate of the Universe. I know that the current literature suggests
a value of around 70 (km/s)/Mpc, and I measure a value of 74 (km/s)/Mpc with my method.
Are the values consistent? The only correct answer, given this information, is this: there is no
way to know.
Suppose I augment this information with reported uncertainties: the current literature
suggests a value of 70 ± 2.5 (km/s)/Mpc, and my method has measured a value of 74 ± 5
(km/s)/Mpc. Now are the values consistent? That is a question that can be quantitatively
answered. In visualization of data and results, showing these errors effectively can make a plot
convey much more complete information.
Basic Errorbars
One standard way to visualize uncertainties is using an errorbar. A basic errorbar can be
created with a single Matplotlib function call, as shown in the following figure:
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid') # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
In[2]: x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');
Here the fmt is a format code controlling the appearance of lines and points, and it has
the same syntax as the shorthand used in plt.plot, outlined in the previous chapter and earlier
in this chapter. In addition to these basic options, the errorbar function has many options to
fine-tune the outputs. Using these additional options, you can easily customize the aesthetics
of your errorbar plot. I often find it helpful, especially in crowded plots, to make the errorbars
lighter than the points themselves.
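One way to make the errorbars lighter than the points, as suggested above, might look like the sketch below. The data are the same noisy sine samples as in the basic example, except that a seeded generator is assumed here for reproducibility. ecolor, elinewidth and capsize are keyword arguments of plt.errorbar.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded generator (an assumption, for reproducibility)
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * rng.standard_normal(50)

# black points with light gray, thicker, cap-less error bars
container = plt.errorbar(x, y, yerr=dy, fmt='o', color='black',
                         ecolor='lightgray', elinewidth=3, capsize=0)
```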
Here we'll perform a simple Gaussian process regression, using the Scikit-Learn API
(see Introducing Scikit-Learn for details). This is a method of fitting a very flexible
nonparametric function to data with a continuous measure of the uncertainty. We won't delve
into the details of Gaussian process regression at this point, but will focus instead on how you
might visualize such a continuous error measurement:
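The Gaussian process fit itself is not reproduced in these notes. As a stand-in, the sketch below estimates a continuous error from simulated repeated measurements (hypothetical data) and draws it as a shaded band with plt.fill_between. The same plotting call would apply to the mean and standard deviation returned by a regression model.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)

# 30 simulated noisy measurements of the same curve (hypothetical data)
samples = x * np.sin(x) + rng.normal(scale=1.0, size=(30, x.size))
mean = samples.mean(axis=0)   # central estimate at each x
std = samples.std(axis=0)     # spread at each x, used as the error measure

plt.plot(x, mean, '-', color='gray')
# shade the region within two standard deviations of the mean
band = plt.fill_between(x, mean - 2 * std, mean + 2 * std,
                        color='gray', alpha=0.2)
plt.xlim(0, 10)
```

The shaded region conveys the uncertainty at every x without the clutter of individual error bars.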
Notice that when a single color is used, negative values are represented by dashed lines and
positive values by solid lines. Alternatively, the lines can be color-coded by specifying a
colormap with the cmap argument. Here we'll also specify that we want more lines to be
drawn, at 20 equally spaced intervals within the data range, as shown in the following figure:
In[5]: plt.contour(X, Y, Z, 20, cmap='RdGy'); # 20 - contour levels
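Here we chose the RdGy (short for Red–Gray) colormap, which is a good choice for
divergent data (i.e., data with positive and negative variation around zero). Matplotlib has a
wide range of colormaps available, which you can easily browse in IPython by doing a tab
completion on the plt.cm module (plt.cm.<TAB>). To fill the spaces between the lines, switch to
a filled contour plot with plt.contourf (note the f at the end), and add plt.colorbar to create a
labeled color scale: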
In[6]: plt.contourf(X, Y, Z, 20, cmap='RdGy') # 20 - contour levels
plt.colorbar();
The colorbar makes it clear that the black regions are "peaks," while the red regions are
"valleys." One potential issue with this plot is that it is a bit splotchy: the color steps are
discrete rather than continuous, which is not always what is desired. This could be remedied
by setting the number of contours to a very high number, but this results in a rather inefficient
plot: Matplotlib must render a new polygon for each step in the level. A better way to generate
a smooth representation is to use the plt.imshow function, which offers
the interpolation argument to generate a smooth two-dimensional representation of the data
(see the following figure):
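A minimal sketch of the plt.imshow approach follows. The function f and the grid are assumptions chosen only to give a surface with positive and negative values; the X, Y, Z arrays used by the contour calls can be built the same way. Note that plt.imshow takes no x and y grid, so the extent is passed explicitly, and origin='lower' puts the origin at the bottom left as in a contour plot.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # example surface (an assumption) with positive and negative values
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)   # two-dimensional grids of x and y values
Z = f(X, Y)                # shape (40, 50): rows follow y, columns follow x

# extent = [xmin, xmax, ymin, ymax]; interpolation smooths between grid cells
img = plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
                 cmap='RdGy', interpolation='bilinear')
plt.colorbar()
```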
If you are interested in computing, but not displaying, the histogram (that is, counting
the number of points in a given bin), you can use the np.histogram function:
In[5]: counts, bin_edges = np.histogram(data, bins=5) # bin_edges holds the edges of the bins
print(counts)
[ 12 190 468 301 29]
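A self-contained version of this computation is sketched below. The data array is an assumption (1,000 standard normal samples from a seeded generator), so the counts will differ from the output shown above.

```python
import numpy as np

rng = np.random.default_rng(42)    # seeded generator (an assumption)
data = rng.standard_normal(1000)   # hypothetical sample data

counts, bin_edges = np.histogram(data, bins=5)
print(counts)      # number of points falling in each of the 5 bins
print(bin_edges)   # 6 edges delimit the 5 bins
```

The counts always sum to the number of samples, and there is always one more edge than there are bins.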
6.1 Two-Dimensional Histograms and Binnings
Just as we create histograms in one dimension by dividing the number line into bins, we
can also create histograms in two dimensions by dividing points among two-dimensional bins.
We'll take a brief look at several ways to do this here. We'll start by defining some data—
an x and y array drawn from a multivariate Gaussian distribution:
In[6]: mean = [0, 0]
cov = [[1, 1], [1, 2]]
# generate samples from a multivariate normal distribution
x, y = np.random.multivariate_normal(mean, cov, 10000).T
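One straightforward way to bin such data in two dimensions is plt.hist2d, with np.histogram2d as the compute-only counterpart. The sketch below assumes the same mean and covariance as above, but draws the samples from a seeded generator so that it runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeded generator (an assumption)
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = rng.multivariate_normal(mean, cov, 10000).T

# plot: a 30 x 30 grid of rectangular bins, colored by the number of points
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')

# compute only: the two-dimensional counterpart of np.histogram
counts, xedges, yedges = np.histogram2d(x, y, bins=30)
```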
Plot legends give meaning to a visualization by labeling the various plot
elements. We previously saw how to create a simple legend; here we'll take a look at
customizing the placement and aesthetics of the legend in Matplotlib. The simplest legend
can be created with the plt.legend command, which automatically creates a legend for any
labeled plot elements.
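As a minimal sketch of this, the example below labels two curves and then adjusts the legend's placement and appearance. The curves and label names are illustrative assumptions; loc, frameon and ncol are keyword arguments of the legend call.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')      # labeled elements appear in the legend
ax.plot(x, np.cos(x), '--r', label='Cosine')

# loc sets the position, frameon toggles the box, ncol sets the number of columns
leg = ax.legend(loc='upper left', frameon=False, ncol=2)
```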
Plot legends identify discrete labels of discrete points. For continuous labels based on the
color of points, lines, or regions, a labeled colorbar can be a great tool. In Matplotlib, a colorbar
is drawn as a separate axes that can provide a key for the meaning of colors in a plot. Because
the book is printed in black and white, this chapter has an accompanying online
supplement where you can view the figures in full color. We'll start by setting up the notebook
for plotting and importing the functions we will use
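from matplotlib.colors import LinearSegmentedColormap

def grayscale_cmap(cmap):
    """Return a grayscale version of the given colormap"""
    cmap = plt.get_cmap(cmap)
    colors = cmap(np.arange(cmap.N))
    # convert RGBA to perceived luminance (weights are an assumed, commonly used convention)
    RGB_weight = [0.299, 0.587, 0.114]
    luminance = np.sqrt(np.dot(colors[:, :3] ** 2, RGB_weight))
    colors[:, :3] = luminance[:, np.newaxis]
    return LinearSegmentedColormap.from_list(cmap.name + "_gray", colors, cmap.N)

def view_colormap(cmap):
    """Plot a colormap with its grayscale equivalent"""
    cmap = plt.get_cmap(cmap)
    colors = cmap(np.arange(cmap.N))
    cmap = grayscale_cmap(cmap)
    grayscale = cmap(np.arange(cmap.N))
    fig, ax = plt.subplots(2, figsize=(6, 2), subplot_kw=dict(xticks=[], yticks=[]))
    ax[0].imshow([colors], extent=[0, 10, 0, 1])
    ax[1].imshow([grayscale], extent=[0, 10, 0, 1])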
In[6]: view_colormap('jet')
In[7]: view_colormap('viridis')
In[8]: view_colormap('cubehelix')
In[9]: view_colormap('RdBu')
9. Multiple Subplots
Sometimes it is helpful to compare different views of data side by side. To this end,
Matplotlib has the concept of subplots: groups of smaller axes that can exist together within a
single figure. These subplots might be insets, grids of plots, or other more complicated layouts.
In this chapter we'll explore four routines for creating subplots in Matplotlib.
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white') # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
import numpy as np
In[2]: ax1 = plt.axes() # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2]) # position, size of subplot
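This creates an inset axes at the top-right corner of another axes. plt.axes takes an
optional list [left, bottom, width, height] in figure coordinates, which run from 0 at the bottom
left of the figure to 1 at the top right. Here the x and y position is set to 0.65 (that is, starting
at 65% of the width and 65% of the height of the figure) and the x and y extents to 0.2 (that is,
the size of the axes is 20% of the width and 20% of the height of the figure).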
Aligned columns or rows of subplots are a common enough need that Matplotlib has
several convenience routines that make them easy to create. The lowest level of these
is plt.subplot, which creates a single subplot within a grid. This command takes
three integer arguments (the number of rows, the number of columns, and the index of the
plot to be created in this scheme, which runs from the upper left to the bottom right):
In[4]: for i in range(1, 7):
           plt.subplot(2, 3, i)
           plt.text(0.5, 0.5, str((2, 3, i)), fontsize=18, ha='center') # label each panel at its center
Matplotlib's default tick locators and formatters are designed to be generally sufficient
in many common situations, but are in no way optimal for every plot. This chapter will give
several examples of adjusting the tick locations and formatting for the particular plot type
you're interested in. Before we go into examples, however, let's talk a bit more about the object
hierarchy of Matplotlib plots. Matplotlib aims to have a Python object representing everything
that appears on the plot: for example, recall that the Figure is the bounding box within which
plot elements appear. Each Matplotlib object can also act as a container of subobjects: for
example, each Figure can contain one or more Axes objects, each of which in turn contains
other objects representing plot contents. The tickmarks are no exception. Each axes has
attributes xaxis and yaxis, which in turn have attributes that contain all the properties of the
lines, ticks, and labels that make up the axes.
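The object hierarchy described above can be explored directly. The sketch below is a minimal illustration on assumed sine-curve data: it inspects the locator and formatter attached to the x axis, then swaps in different ones using classes from matplotlib.ticker.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator, NullFormatter

fig, ax = plt.subplots()
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x))

# each axis holds a locator (where ticks go) and a formatter (how they are labeled)
print(type(ax.xaxis.get_major_locator()).__name__)
print(type(ax.xaxis.get_major_formatter()).__name__)

# limit the x axis to at most 3 tick intervals and hide the y tick labels
ax.xaxis.set_major_locator(MaxNLocator(3))
ax.yaxis.set_major_formatter(NullFormatter())
```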
Because it is tedious to make all these modifications by hand for every plot, it is best to
change the defaults.
Changing the Defaults: rcParams
Each time Matplotlib loads, it defines a runtime configuration (rc) containing the default
styles for every plot element. This configuration can be adjusted at any time using the plt.rc
convenience routine.
In[4]: IPython_default = plt.rcParams.copy()
In[5]: from matplotlib import cycler
colors = cycler('color',
['#EE6666', '#3388BB', '#9988DD',
'#EECC55', '#88BB44', '#FFBBBB'])
plt.rc('axes', facecolor='#E6E6E6', edgecolor='none',
axisbelow=True, grid=True, prop_cycle=colors)
plt.rc('grid', color='w', linestyle='solid')
plt.rc('xtick', direction='out', color='gray')
plt.rc('ytick', direction='out', color='gray')
plt.rc('patch', edgecolor='#E6E6E6')
plt.rc('lines', linewidth=2)
In[6]: plt.hist(x);
In[7]: for i in range(4):
           plt.plot(np.random.rand(10)) # generate 10 random numbers
Stylesheets
In[8]: plt.style.available[:5] # names of the first five available Matplotlib styles
Out[8]: ['fivethirtyeight',
'seaborn-pastel',
'seaborn-whitegrid',
'ggplot',
'grayscale']
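The basic way to switch to a stylesheet is to call:
plt.style.use('stylename')
Keep in mind that this changes the style for the rest of the session. Alternatively, you can use
the style context manager, which sets a style temporarily:
with plt.style.context('stylename'):
    make_a_plot()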
Let’s create a function that will make two basic types of plot:
In[9]: def hist_and_lines():
           np.random.seed(0)
           fig, ax = plt.subplots(1, 2, figsize=(11, 4))
           ax[0].hist(np.random.randn(1000)) # left panel: histogram
           for i in range(3):
               ax[1].plot(np.random.rand(10)) # right panel: line plots
           ax[1].legend(['a', 'b', 'c'], loc='lower left')
Default style
In[10]: # reset rcParams
plt.rcParams.update(IPython_default);
Now let’s see how it looks (Figure 4-85):
In[11]: hist_and_lines()
FiveThirtyEight style
In[12]: with plt.style.context('fivethirtyeight'):
            hist_and_lines()
Similarly, Matplotlib provides the ggplot, Bayesian Methods for Hackers (bmh), dark background,
grayscale, and Seaborn styles, which can be applied in the same way.
12. Three-Dimensional Plotting in Matplotlib
Matplotlib was initially designed with only two-dimensional plotting in mind. Around
the time of the 1.0 release, some three-dimensional plotting utilities were built on top of
Matplotlib's two-dimensional display, and the result is a convenient (if somewhat limited) set
of tools for three-dimensional data visualization. Three-dimensional plots are enabled by
importing the mplot3d toolkit, included with the main Matplotlib installation
In[1]: from mpl_toolkits import mplot3d
In[2]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
In[3]: fig = plt.figure()
ax = plt.axes(projection='3d')
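With a three-dimensional axes in place, lines and points can be drawn from (x, y, z) triples. The sketch below uses made-up spiral data (an assumption) with the plot3D and scatter3D methods of the 3D axes. The mplot3d import is only needed on older Matplotlib versions, where it registers the '3d' projection.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # noqa: F401 (registers '3d' on older versions)

fig = plt.figure()
ax = plt.axes(projection='3d')

# data for a three-dimensional spiral line (hypothetical example data)
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# scattered points near the line, colored by height
rng = np.random.default_rng(0)
zdata = 15 * rng.random(100)
xdata = np.sin(zdata) + 0.1 * rng.standard_normal(100)
ydata = np.cos(zdata) + 0.1 * rng.standard_normal(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')
```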
A common early complaint, which is now outdated: prior to version 2.0, Matplotlib's
color and style defaults were at times poor and looked dated.
Matplotlib's API is relatively low-level. Doing sophisticated statistical visualization is
possible, but often requires a lot of boilerplate code.
Matplotlib predated Pandas by several years, and thus is not designed for use
with Pandas DataFrame objects. In order to visualize data from a DataFrame, you must
extract each Series and often concatenate them together into the right format. It would
be nicer to have a plotting library that can intelligently use the DataFrame labels in a
plot.
An answer to these problems is Seaborn. Seaborn provides an API on top of Matplotlib that
offers sane choices for plot style and color defaults, defines simple high-level functions for
common statistical plot types, and integrates with the functionality provided by Pandas.
To be fair, the Matplotlib team has adapted to the changing landscape: it added
the plt.style tools discussed in Customizing Matplotlib: Configurations and Style Sheets, and
Matplotlib is starting to handle Pandas data more seamlessly. But for all the reasons just
discussed, Seaborn remains a useful add-on.
Example of a classic Matplotlib plot:
In[1]: import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
import pandas as pd
In[2]: # Create some data
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
# cumulative sum of elements along axis 0 (partial sums of the sequence)
y = np.cumsum(rng.randn(500, 6), 0)
In[3]: # Plot the data with Matplotlib defaults
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');