Unit 5
DATA VISUALIZATION
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots –
Histograms – legends – colors – subplots – text and annotation – customization – three dimensional
plotting - Geographic Data with Basemap - Visualization with Seaborn.
Short assignment
linestyle='-' # solid
linestyle='--' # dashed
linestyle='-.' # dashdot
linestyle=':' # dotted
• linestyle and color codes can be combined into a single nonkeyword argument to the plt.plot()
function
plt.plot(x, x + 0, '-g') # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red
Axes Limits
• The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim() methods
Example
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
• The plt.axis() method allows you to set the x and y limits with a single call, by passing a list that specifies
[xmin, xmax, ymin, ymax]
plt.axis([-1, 11, -1.5, 1.5]);
• An equal aspect ratio makes one unit in x the same length on screen as one unit in y: plt.axis('equal')
Labeling Plots
The labeling of plots includes titles, axis labels, and simple legends.
Title - plt.title()
Label - plt.xlabel(), plt.ylabel()
Legend - plt.legend()
Example programs
Line color
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
plt.plot(x, np.sin(x - 0), color='blue') # specify color by name
plt.plot(x, np.sin(x - 1), color='g') # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75') # Grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44') # Hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3)) # RGB tuple, values between 0 and 1
plt.plot(x, np.sin(x - 5), color='chartreuse'); # all HTML color names supported
Line style
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
# For short, you can use the following codes:
plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':'); # dotted
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");
plt.legend();
Example
plt.plot(x, y, 'o', color='black');
• The third argument in the function call is a character that represents the type of symbol used for the plotting.
Just as you can specify options such as '-' and '--' to control the line style, the marker style has its own set of
short string codes.
Example
• The available marker symbols include ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']
plt.plot(x, y, '-ok');
Example
plt.plot(x, y, '-p', color='gray',
markersize=15, linewidth=4,
markerfacecolor='white',
markeredgecolor='gray',
markeredgewidth=2)
plt.ylim(-1.2, 1.2);
Diverging
['PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu', 'RdYlBu', 'RdYlGn', 'Spectral',
'coolwarm', 'bwr', 'seismic']
Qualitative
['Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'Set1', 'Set2', 'Set3',
'tab10', 'tab20', 'tab20b', 'tab20c']
Miscellaneous
['flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern', 'gnuplot',
'gnuplot2', 'CMRmap', 'cubehelix', 'brg', 'hsv', 'gist_rainbow', 'rainbow',
'jet', 'nipy_spectral', 'gist_ncar']
Example programs.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 20)
y = np.sin(x)
plt.plot(x, y, '-o',
color='gray',
markersize=15,
linewidth=4,
markerfacecolor='yellow',
markeredgecolor='red',
markeredgewidth=4)
plt.ylim(-1.5, 1.5);
Visualizing Errors
For any scientific measurement, accurate accounting for errors is nearly as important, if not more important,
than accurate reporting of the number itself. For example, imagine that I am using some astrophysical
observations to estimate the Hubble Constant, the local measurement of the expansion rate of the Universe.
In visualization of data and results, showing these errors effectively can make a plot convey much more
complete information.
Types of errors
• Basic Errorbars
• Continuous Errors
Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call.
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');
• Here the fmt is a format code controlling the appearance of lines and points, and has the same syntax as
the shorthand used in plt.plot().
• In addition to these basic options, the errorbar function has many options to fine tune the outputs.
Using these additional options you can easily customize the aesthetics of your errorbar plot.
Continuous Errors
• In some situations it is desirable to show errorbars on continuous quantities. Though Matplotlib does not
have a built-in convenience routine for this type of application, it’s relatively easy to combine primitives like
plt.plot and plt.fill_between for a useful result.
• Here we’ll perform a simple Gaussian process regression (GPR), using the Scikit-Learn API. This is a method
of fitting a very flexible nonparametric function to data with a continuous measure of the uncertainty.
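The plt.plot and plt.fill_between combination described above can be sketched as follows. This is a minimal illustration, not the Gaussian process regression itself: the curve yfit and the uncertainty dyfit are made-up stand-ins for a model prediction and its error estimate, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical fitted curve and uncertainty (assumptions for illustration only)
xfit = np.linspace(0, 10, 200)
yfit = np.sin(xfit)
dyfit = 0.1 + 0.05 * xfit  # error that grows with x

fig, ax = plt.subplots()
ax.plot(xfit, yfit, '-', color='gray')  # central estimate
# Shade the region between the lower and upper bounds
band = ax.fill_between(xfit, yfit - dyfit, yfit + dyfit,
                       color='gray', alpha=0.2)
ax.set_xlim(0, 10)
```

fill_between takes the x values first, then the lower and upper y bounds, so the shaded band widens wherever the assumed error is larger.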
• In a contour plot drawn with plt.contour(), notice that by default when a single color is used, negative
values are represented by dashed lines, and positive values by solid lines.
• Alternatively, you can color-code the lines by specifying a colormap with the cmap argument.
• We’ll also specify that we want more lines to be drawn—20 equally spaced intervals within the data range.
plt.contour(X, Y, Z, 20, cmap='RdGy');
• One potential issue with this plot is that it is a bit “splotchy.” That is, the color steps are discrete rather
than continuous, which is not always what is desired.
• You could remedy this by setting the number of contours to a very high number, but this results in a
rather inefficient plot: Matplotlib must render a new polygon for each step in the level.
• A better way to handle this is to use the plt.imshow() function, which interprets a two-dimensional grid
of data as an image.
Example Program
import numpy as np
import matplotlib.pyplot as plt
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# extent matches the 0 to 5 range of x and y used to build the grid
plt.imshow(Z, extent=[0, 5, 0, 5],
           origin='lower', cmap='RdGy')
plt.colorbar()
Histograms
• A histogram is a simple plot for summarizing a large data set. It is a graph of a frequency
distribution: it shows the number of observations that fall within each given interval.
Parameters
• plt.hist() is used to plot a histogram. The hist() function takes an array of numbers, passed as an
argument, and creates a histogram from it.
• bins - A histogram displays numerical data by grouping data into "bins" of equal width. Each bin is plotted
as a bar whose height corresponds to how many data points are in that bin. Bins are also sometimes called
"intervals", "classes", or "buckets".
• normed - If True, the histogram is normalized so that the total area of the bars is 1, which turns the
counts into a probability density. Newer Matplotlib releases name this parameter density.
• x - (n,) array or sequence of (n,) arrays Input values, this takes either a single array or a sequence of arrays
which are not required to be of the same length.
• histtype - {'bar', 'barstacked', 'step', 'stepfilled'}, optional
The type of histogram to draw.
• 'bar' is a traditional bar-type histogram. If multiple data are given the bars are arranged side by side.
• 'barstacked' is a bar-type histogram where multiple data are stacked on top of each other.
• 'step' generates a lineplot that is by default unfilled.
• 'stepfilled' generates a lineplot that is by default filled.
Default is 'bar'
• align - {'left', 'mid', 'right'}, optional
Controls the horizontal alignment of the bars relative to the bin edges.
Default is 'mid'
• label - str or None, optional. Default is None
Other parameter
• **kwargs - Patch properties. Python's ** syntax lets a function accept a variable number of keyword
arguments; hist() applies them to the bars it draws.
Example
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
data = np.random.randn(1000)
plt.hist(data);
The hist() function has many options to tune both the calculation and the display; here’s an example of a
more customized histogram.
plt.hist(data, bins=30, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none');
The plt.hist docstring has more information on other customization options available. I find this combination
of histtype='stepfilled' along with some transparency alpha to be very useful when comparing histograms of
several distributions:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)
kwargs = dict(histtype='stepfilled', alpha=0.3, bins=40)
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
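To see what the bins and the normalization option described under Parameters actually compute, the counts can be obtained without drawing anything. The sketch below uses np.histogram, which does the same binning as plt.hist; the seed and the choice of five bins are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
data = rng.standard_normal(1000)

# Raw counts: one integer per bin, plus the bin edges (one more edge than bins)
counts, bin_edges = np.histogram(data, bins=5)

# Normalized version: bar heights scaled so that the total area is 1
density, _ = np.histogram(data, bins=5, density=True)
area = (density * np.diff(bin_edges)).sum()
```

The counts always add up to the number of samples, while the normalized heights multiplied by the bin widths add up to 1.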
Example
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 1000).T
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')
Legends
Plot legends give meaning to a visualization, assigning labels to the various plot elements. We previously saw
how to create a simple legend; here we’ll take a look at customizing the placement and aesthetics of the legend
in Matplotlib.
plt.plot(x, np.sin(x), '-b', label='Sine')
plt.plot(x, np.cos(x), '--r', label='Cosine')
plt.legend();
Number of columns - We can use the ncol argument to specify the number of columns in the legend.
ax.legend(frameon=False, loc='lower center', ncol=2)
fig
We can use a rounded box (fancybox) or add a shadow, change the transparency (alpha value) of the frame, or
change the padding around the text.
ax.legend(fancybox=True, framealpha=1, shadow=True, borderpad=1)
fig
Multiple legends
The standard legend interface can create only a single legend for the entire plot. If you try to create a
second legend using plt.legend() or ax.legend(), it will simply override the first one. We can work around
this by creating a new legend artist from scratch, and then using the lower-level ax.add_artist() method to
manually add the second artist to the plot.
Example
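import matplotlib.pyplot as plt
plt.style.use('classic')
import numpy as np
x = np.linspace(0, 10, 1000)
# create the figure and axes, then draw two labeled lines for the legend to describe
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.legend(loc='lower center', frameon=True, shadow=True, borderpad=1, fancybox=True)
fig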
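The ax.add_artist() workaround for multiple legends can be sketched as below. The four phase-shifted sine curves and the labels 'line A' to 'line D' are made up for illustration; the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.legend import Legend
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()

# Draw four lines and keep their handles
styles = ['-', '--', '-.', ':']
lines = []
for i, style in enumerate(styles):
    lines += ax.plot(x, np.sin(x - i * np.pi / 2), style, color='black')

# First legend: created the usual way, covering the first two lines
ax.legend(lines[:2], ['line A', 'line B'], loc='upper right', frameon=False)

# Second legend: built by hand as a Legend artist, then added to the axes
leg = Legend(ax, lines[2:], ['line C', 'line D'], loc='lower right', frameon=False)
ax.add_artist(leg)
```

Because the second legend is added as a plain artist, it does not replace the one registered through ax.legend().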
Color Bars
In Matplotlib, a color bar is a separate axes that can provide a key for the meaning of colors in a plot.
For continuous labels based on the color of points, lines, or regions, a labeled color bar can be a great tool.
The simplest colorbar can be created with the plt.colorbar() function.
Customizing Colorbars
Choosing color map.
We can specify the colormap using the cmap argument to the plotting function that is creating the
visualization. Broadly, there are three different categories of colormaps:
• Sequential colormaps - These consist of one continuous sequence of colors (e.g., binary or viridis).
• Divergent colormaps - These usually contain two distinct colors, which show positive and negative
deviations from a mean (e.g., RdBu or PuOr).
• Qualitative colormaps - These mix colors with no particular sequence (e.g., rainbow or jet).
Color limits and extensions
• Matplotlib allows for a large range of colorbar customization. The colorbar itself is simply an instance of
plt.Axes, so all of the axes and tick formatting tricks we’ve learned are applicable.
• We can narrow the color limits and indicate the out-of-bounds values with a triangular arrow at the top
and bottom by setting the extend property.
plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);
Discrete colorbars
Colormaps are by default continuous, but sometimes you’d like to
represent discrete values. The easiest way to do this is to use the
plt.cm.get_cmap() function, and pass the name of a suitable colormap
along with the number of desired bins.
plt.imshow(I, cmap=plt.cm.get_cmap('Blues', 6))
plt.colorbar()
plt.clim(-1, 1);
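The snippets above use an image array I that is not defined in these notes. A self-contained sketch, with a made-up array, might look like this. It uses plt.get_cmap() instead of plt.cm.get_cmap(), on the assumption that the most recent Matplotlib releases no longer provide the latter; check this against your installed version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical image: a smooth pattern with values between -1 and 1
x = np.linspace(0, 10, 100)
I = np.sin(x) * np.cos(x[:, np.newaxis])

cmap = plt.get_cmap('Blues', 6)  # resample 'Blues' into 6 discrete colors

fig, ax = plt.subplots()
# vmin and vmax set the color limits, like plt.clim(-1, 1)
im = ax.imshow(I, cmap=cmap, vmin=-1, vmax=1)
# extend='both' draws the triangular out-of-bounds arrows at each end
cbar = fig.colorbar(im, ax=ax, extend='both')
```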
Subplots
• Matplotlib has the concept of subplots: groups of smaller axes that can exist together within a single figure.
• These subplots might be insets, grids of plots, or other more complicated layouts.
• We’ll explore four routines for creating subplots in Matplotlib.
• plt.axes: Subplots by Hand
• plt.subplot: Simple Grids of Subplots
• plt.subplots: The Whole Grid in One Go
• plt.GridSpec: More Complicated Arrangements
For example, we might create an inset axes at the top-right corner of another axes by setting the x and y
position to 0.65 (that is, starting at 65% of the width and 65% of the height of the figure) and the x and y
extents to 0.2 (that is, the size of the axes is 20% of the width and 20% of the height of the figure).
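The inset described above can be sketched with two plt.axes() calls. The list passed to the second call is [left, bottom, width, height] in figure coordinates; the names ax1 and ax2 are chosen here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = plt.axes()                         # standard axes filling the figure
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])   # inset: [left, bottom, width, height]

inset_bounds = ax2.get_position().bounds
```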
For example, a gridspec for a grid of two rows and three columns with some specified width and height
space looks like this:
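A minimal sketch of such a gridspec follows. The spacing values 0.4 and 0.3 and the way the cells are combined into four axes are assumptions chosen for illustration; the names ax_a to ax_d are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Two rows, three columns, with extra width and height space between cells
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)

fig = plt.figure()
# Slicing the grid selects one cell or a span of several cells
ax_a = fig.add_subplot(grid[0, 0])    # top-left cell
ax_b = fig.add_subplot(grid[0, 1:])   # top row, columns 1 and 2
ax_c = fig.add_subplot(grid[1, :2])   # bottom row, columns 0 and 1
ax_d = fig.add_subplot(grid[1, 2])    # bottom-right cell
```

The GridSpec object does not create a plot by itself; it only describes the layout that add_subplot() then fills.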
• Text annotation can be done manually with the plt.text/ax.text command, which will place text at a
particular x/y value.
• The ax.text method takes an x position, a y position, a string, and then optional keywords specifying the
color, size, style, alignment, and other properties of the text. Here we used ha='right' and ha='center', where
ha is short for horizontal alignment.
Example
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
import pandas as pd
fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])
# transform=ax.transData is the default, but we'll specify it anyway
ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure);
Note that by default, the text is aligned above and to the left of the specified coordinates; here the “.” at the
beginning of each string will approximately mark the given coordinate location.
The transData coordinates give the usual data coordinates associated with the x- and y-axis labels. The
transAxes coordinates give the location from the bottom-left corner of the axes (here the white box) as a
fraction of the axes size.
The transFigure coordinates are similar, but specify the position from the bottom left of the figure (here the
gray box) as a fraction of the figure size.
Notice now that if we change the axes limits, it is only the transData coordinates that will be affected, while the
others remain stationary.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');
plt.show()
Notice that by default, the scatter points have their transparency adjusted to give a sense of depth on the page.
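import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
# Assumed example surface; the notes do not show how X, Y, Z were defined
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe');
plt.show()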
• Adding a colormap to the filled polygons can aid perception of the topology of the surface being visualized
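import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
# Assumed example surface; the notes do not show how X, Y, Z were defined
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
ax.set_title('surface')
plt.show()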
Surface Triangulations
• For some applications, the evenly sampled grids required by the preceding routines are overly restrictive
and inconvenient.
• In these situations, the triangulation-based plots can be very useful.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
# Assumed example surface; f is not defined in the notes
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5)
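The scatter plot above only shows the randomly sampled points. A triangulation-based surface over the same kind of points can be sketched with ax.plot_trisurf(), which first triangulates the (x, y) positions and then draws a surface over the triangles. The function f and the random seed are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # registers the '3d' projection on older Matplotlib
import numpy as np

def f(x, y):
    # Assumed example surface, not taken from the notes
    return np.sin(np.sqrt(x ** 2 + y ** 2))

# Unevenly sampled points inside a disc of radius 6
rng = np.random.default_rng(0)
theta = 2 * np.pi * rng.random(1000)
r = 6 * rng.random(1000)
x = r * np.sin(theta)
y = r * np.cos(theta)
z = f(x, y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# plot_trisurf takes one-dimensional x, y, z arrays rather than a regular grid
surf = ax.plot_trisurf(x, y, z, cmap='viridis', edgecolor='none')
```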
• We’ll use an etopo image (which shows topographical features both on land and under the ocean) as
the map background.
Program to display a particular area of the map with latitude and longitude lines
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from itertools import chain
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
width=8E6, height=8E6,
lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)
    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))
    # keys contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='r')
Map Projections
The Basemap package implements several dozen such projections, all referenced by a short format code. Here
we’ll briefly demonstrate some of the more common ones.
• Cylindrical projections
• Pseudo-cylindrical projections
• Perspective projections
• Conic projections
Cylindrical projection
• The simplest of map projections are cylindrical projections, in which lines of constant latitude and
longitude are mapped to horizontal and vertical lines, respectively.
• This type of mapping represents equatorial regions quite well, but results in extreme distortions near
the poles.
• The spacing of latitude lines varies between different cylindrical projections, leading to different
conservation properties, and different distortion near the poles.
• Other cylindrical projections are the Mercator (projection='merc') and the cylindrical equal-area
(projection='cea') projections.
• The additional arguments to Basemap for this view specify the latitude (lat) and longitude (lon) of
the lower-left corner (llcrnr) and upper-right corner (urcrnr) for the desired map, in units of degrees.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
llcrnrlat=-90, urcrnrlat=90,
llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)
Pseudo-cylindrical projections
• Pseudo-cylindrical projections relax the requirement that meridians (lines of constant longitude)
remain vertical; this can give better properties near the poles of the projection.
• The Mollweide projection (projection='moll') is one common example of this, in which all meridians
are elliptical arcs.
• It is constructed so as to preserve area across the map: though there are distortions near the poles,
the area of small patches reflects the true area.
• Other pseudo-cylindrical projections are the sinusoidal (projection='sinu') and Robinson
(projection='robin') projections.
• The extra arguments to Basemap here refer to the central latitude (lat_0) and longitude (lon_0) for
the desired map.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None,
lat_0=0, lon_0=0)
draw_map(m)
Perspective projections
• Perspective projections are constructed using a particular choice of perspective point, similar to if you
photographed the Earth from a particular point in space (a point which, for some projections, technically
lies within the Earth!).
• One common example is the orthographic projection (projection='ortho'), which shows one side of the globe
as seen from a viewer at a very long distance.
• Thus, it can show only half the globe at a time.
• Other perspective-based projections include the gnomonic projection (projection='gnom') and
stereographic projection (projection='stere').
• These are often the most useful for showing small portions of the map.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
lat_0=50, lon_0=0)
draw_map(m);
Conic projections
• A conic projection projects the map onto a single cone, which is then unrolled.
• This can lead to very good local properties, but regions far from the focus point of the cone may
become very distorted.
• One example of this is the Lambert conformal conic projection (projection='lcc').
• It projects the map onto a cone arranged in such a way that two standard parallels (specified in Basemap by
lat_1 and lat_2) have well-represented distances, with scale decreasing between them and increasing
outside of them.
• Other useful conic projections are the equidistant conic (projection='eqdc') and the Albers equal-area
(projection='aea') projections.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
lon_0=0, lat_0=50, lat_1=45, lat_2=55, width=1.6E7, height=1.2E7)
draw_map(m)
Drawing a Map Background
The Basemap package contains a range of useful functions for drawing borders of physical features like
continents, oceans, lakes, and rivers, as well as political boundaries such as countries and US states and counties.
The following are some of the available drawing functions that you may wish to explore using IPython’s
help features:
• Political boundaries
drawcountries() - Draw country boundaries
drawstates() - Draw US state boundaries
drawcounties() - Draw US county boundaries
• Map features
drawgreatcircle() - Draw a great circle between two points
drawparallels() - Draw lines of constant latitude
drawmeridians() - Draw lines of constant longitude
drawmapscale() - Draw a linear scale on the map
• Whole-globe images
bluemarble() - Project NASA’s blue marble image onto the map
shadedrelief() - Project a shaded relief image onto the map
etopo() - Draw an etopo relief image onto the map
warpimage() - Project a user-provided image onto the map
Pair plots
When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. This is very useful
for exploring correlations between multidimensional data, when you’d like to plot all pairs of values against each
other.
We’ll demo this with the Iris dataset, which lists measurements of petals and sepals of three iris species:
import seaborn as sns
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue='species', height=2.5);  # 'height' was called 'size' before seaborn 0.9
Faceted histograms
• Sometimes the best way to view data is via histograms of subsets. Seaborn’s FacetGrid makes this
extremely simple.
• We’ll take a look at some data that shows the amount that restaurant staff receive in tips based on
various indicator data.
Factor plots
Factor plots can be useful for this kind of visualization as well. This allows you to
view the distribution of a parameter within bins defined by any other parameter.
Joint distributions
Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint
distribution between different datasets, along with the associated marginal distributions.
Bar plots
Time series can be plotted with sns.factorplot (named sns.catplot in seaborn 0.9 and later).