Matplotlib Seaborn Fundamentals (1)
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. This course covers the basic usage patterns and best practices for getting
started with Matplotlib.
Installing Matplotlib
The Anaconda distribution ships with Matplotlib pre-installed, so it is ready to use.
Out[1]: '3.5.1'
In Colab as well, Matplotlib comes built in, and we can check its version after importing it.
The convention in the Python world is to import Matplotlib under the alias mpl:
To display the documentation of the matplotlib package in IPython or Jupyter, we can use:
In [3]: mpl?
The Matplotlib documentation can be accessed at:
https://matplotlib.org/stable/index.html
The Matplotlib tutorials can be accessed at:
https://matplotlib.org/stable/tutorials/index.html
The Matplotlib examples can be accessed at:
https://matplotlib.org/stable/gallery/index.html
Matplotlib graphs data on Figures, each of which can contain one or more Axes, an area where
points can be specified in terms of x-y coordinates.
The simplest way of creating a Figure with an Axes is to use pyplot. matplotlib.pyplot is the
interface of Matplotlib that we use most widely for plots. The pyplot functions create figures and
make changes to them as required. We start by importing it under its usual shorthand:
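The import cell itself is not reproduced above. A minimal sketch of the conventional imports, assuming NumPy is imported as np (as the later cells suggest), could look like this; the sample data is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Create a Figure that contains a single Axes.
fig, ax = plt.subplots()

# Plot some illustrative data on that Axes.
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x))
```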
Setting styles: We use the plt.style directive to choose appropriate aesthetic styles for our
figures.
We will set the classic style, which ensures that the plots we create use the classic Matplotlib
style:
In [5]: plt.style.use('classic')
Plotting from scripts: plt.plot() draws figures inline when called in Jupyter notebooks.
In [6]: x = np.linspace(0, 10, 100)
        plt.plot(x, np.sin(x))
        plt.plot(x, np.cos(x))
When we run the same code from a script, or from the IPython command shell without the %matplotlib
magic, we need to call plt.show() to view the figure in a new window.
Interactive Plotting: To enter into the interactive mode we have to use the magic command
%matplotlib
From then on, any plt.plot() command will cause a figure window to open, and further commands can be
run to update the plot. Some changes, such as modifying properties of lines that are already drawn,
are not redrawn automatically; to force an update we have to use plt.draw().
In the IPython notebook, we have the option of embedding graphics directly in the notebook using
%matplotlib inline which leads to static images of the plot embedded into the notebook.
Once we run this command (once per kernel/session), any cell within the notebook that creates a
plot will embed a PNG image of the resulting graphic:
In [8]: fig = plt.figure()
        plt.plot(x, np.sin(x), '--')
        plt.plot(x, np.cos(x), '*')
Saving figures to file: Matplotlib can save figures in a variety of image formats. The formats
supported by the current backend are listed by the following command.
In [9]: fig.canvas.get_supported_filetypes()
In [10]: fig.savefig('firstFig.jpg')
We now have a file called firstFig.jpg in the current working directory. Let's check.
To confirm that it contains what we saved, let's use the IPython Image object to display the
contents of firstFig.jpg:
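The checking cell is not reproduced above. One way to verify the save step could be sketched as follows; the temporary directory and the name out_path are assumptions made for this example, and PNG is used instead of JPEG only because PNG output needs no extra imaging library.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig = plt.figure()
plt.plot(x, np.sin(x), '--')

# Hypothetical output location: a temporary directory, so nothing is left behind.
out_path = os.path.join(tempfile.mkdtemp(), "firstFig.png")
fig.savefig(out_path)

# The file should now exist and be non-empty.
saved_ok = os.path.exists(out_path) and os.path.getsize(out_path) > 0
print(saved_ok)
```

In a notebook, the saved file can then be displayed with Image(out_path) after `from IPython.display import Image`.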
Out[11]:
Subplots - MATLAB like interface: Matplotlib was originally written as a Python alternative for
MATLAB and much of the syntax reflects the same. The MATLAB-style tools are contained in the
pyplot interface.
Object oriented interface: The object-oriented interface is available for complicated situations
when we want to have more control over the figure.
In the object-oriented interface the plotting functions are methods of explicit Figure and Axes
objects.
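The two interfaces described above can be compared side by side. This is a small sketch with illustrative data, not the notebook's original cells: both halves draw the same two stacked panels.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# MATLAB-style interface: pyplot keeps track of the "current" figure and axes.
plt.figure()
plt.subplot(2, 1, 1)          # (rows, columns, panel number)
plt.plot(x, np.sin(x))
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x))

# Object-oriented interface: plotting functions are methods of explicit objects.
fig, ax = plt.subplots(2)     # ax is an array of two Axes objects
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))
```

The stateful style is convenient for quick plots; the explicit style avoids ambiguity about which axes a command applies to once a figure has several panels.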
Simple Line Plots: The simplest of all plots is the visualization of a function y = f(x). We can start
by creating a figure and an axes:
In [14]: fig = plt.figure()
         ax = plt.axes()
In [15]: plt.grid()
The figure is an instance of the class plt.Figure and it can be thought of as a single container that
contains all the objects representing axes, graphics, text, and labels.
The axes is an instance of the class plt.Axes and it is a bounding box with ticks and labels, which
will eventually contain the plot elements that make up our visualization.
In the Python world, the variable name fig is used to refer to a figure instance and ax is used to
refer to an axes instance or group of axes instances.
Once we have created an axes, we can use the ax.plot() method to plot the data.
In [16]: plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6+
         fig = plt.figure()
         ax = plt.axes()
         ax.plot(x, np.sin(x))
Alternatively, we can use the pyplot interface and let the figure and axes be created in the
background:
When we want to create a single figure with multiple line plots, we call the plot function
multiple times:
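The corresponding cell is not reproduced above; a minimal sketch with illustrative data is:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure()
# Each call adds one more line to the same (current) axes.
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))

n_lines = len(plt.gca().lines)
```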
Line colours & styles: The plt.plot() function takes additional arguments that can be used to
specify colours and styles.
To adjust the color we use the color keyword which accepts a string argument representing
virtually any color.
These linestyle and color codes can be combined into a single nonkeyword argument to the
plt.plot() function:
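The cells for this part are not reproduced above. The following sketch shows some of the accepted color specifications, a linestyle keyword, and the combined format string; the particular colors and offsets are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()

# The color keyword accepts several kinds of specification.
ax.plot(x, np.sin(x - 0), color='blue')            # color name
ax.plot(x, np.sin(x - 1), color='g')               # short color code
ax.plot(x, np.sin(x - 2), color='0.75')            # grayscale between 0 and 1
ax.plot(x, np.sin(x - 3), color='#FFDD44')         # hex code (RRGGBB)
ax.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))   # RGB tuple, values 0 to 1

# The linestyle keyword selects the line style.
ax.plot(x, np.sin(x - 5), linestyle='dashed')

# Linestyle and color codes combined into one positional format string.
ax.plot(x, np.sin(x - 6), '--c')   # dashed cyan
ax.plot(x, np.sin(x - 7), ':r')    # dotted red
```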
Axes Limits: The axes limits can be adjusted by using the plt.xlim() and plt.ylim() functions.
In [22]: plt.plot(x, np.cos(x))
         plt.xlim(-2, 12)
         plt.ylim(-2.75, 2.5);
The plt.axis() function allows us to set the x and y limits with a single call by passing a list
that specifies [xmin, xmax, ymin, ymax].
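A short sketch of this call, with illustrative limits (the original cell is not reproduced above):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure()
plt.plot(x, np.sin(x))

# Set all four limits at once: [xmin, xmax, ymin, ymax].
plt.axis([-1, 11, -1.5, 1.5])

# Called without arguments, plt.axis() returns the current limits.
limits = plt.axis()
```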
The plt.axis('tight') method can automatically tighten the bounds around the current plot:
In [24]: plt.plot(x, np.sin(2.5*x))
         plt.axis('tight');
plt.axis() can also enforce an equal aspect ratio, so that on the screen one unit in x is equal to
one unit in y, by using plt.axis('equal'):
Labelling plots: Titles and axis labels can be produced by using the functions plt.title(),
plt.xlabel() and plt.ylabel().
In [26]: plt.plot(x, np.tan(x))
         plt.title("This is tan function")
         plt.xlabel("x")
         plt.ylabel("tan(x)")
When multiple lines are being plotted on a single axes, it can be useful to create a plot legend that
labels each line type. Matplotlib has a built-in way of quickly creating such a legend: the
plt.legend() function. We can specify the label of each line by using the label keyword.
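As a sketch of the label keyword and plt.legend() working together (the data and label texts are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure()
# Give each line a label; the legend picks the labels up automatically.
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.axis('equal')

legend = plt.legend()
labels = [text.get_text() for text in legend.get_texts()]
```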
We can use the following list of conversions between Matlab style plots and Object oriented plots.
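The list itself is not reproduced in this copy. The most common correspondences, which are standard Matplotlib API, can be sketched as follows, with each pyplot equivalent given in a comment; the labels and limits are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

ax.set_xlabel('x')             # plt.xlabel('x')
ax.set_ylabel('sin(x)')        # plt.ylabel('sin(x)')
ax.set_xlim(0, 10)             # plt.xlim(0, 10)
ax.set_ylim(-2, 2)             # plt.ylim(-2, 2)
ax.set_title('A sine curve')   # plt.title('A sine curve')
```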
In the object-oriented interface to plotting we can use the ax.set() method to set all properties
required at once:
In [28]: ax = plt.axes()
         ax.plot(x, np.cos(2.5*x))
         ax.set(xlim=(-2, 12), ylim=(-2.5, 4),
                xlabel='x', ylabel='cos(x)',
                title='Plot of sinusoid');
The advantage of plt.scatter() is that it can be used to create scatter plots where the properties
such as size, face color, edge color of each point can be individually controlled. One can also use
the alpha keyword to adjust the transparency level of the data points.
Example: creating a random scatter plot with points of many colors and sizes.
In [30]: x = np.random.randn(50)
         y = np.random.randn(50)
         colors = np.random.rand(50)
         sizes = 500 * np.random.rand(50)
         plt.scatter(x, y, c=colors, s=sizes, alpha=0.4, cmap='viridis')
         plt.colorbar();
To view the list of supported colormaps, we can type plt.cm.<TAB> in IPython.
Using some additional options, we can easily customize the aesthetics of our errorbar plot:
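The errorbar cells are not reproduced above. A sketch of a customized errorbar plot, using synthetic data and illustrative option values, might look like this:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption for this example): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * rng.standard_normal(50)

fig = plt.figure()
# fmt sets the marker style; ecolor, elinewidth and capsize style the error bars.
container = plt.errorbar(x, y, yerr=dy, fmt='o', color='black',
                         ecolor='lightgray', elinewidth=3, capsize=0)
```

Making the error bars lighter than the points helps on crowded plots, where the bars would otherwise dominate.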
Density and Contour plots: It is useful to display the 3D data in two dimensions using contours or
color-coded regions. There are three Matplotlib functions that can be helpful for this task:
plt.contour() for contour plots, plt.contourf() for filled contour plots, and plt.imshow() for showing
images.
A contour plot can be created with the plt.contour() function. It takes three arguments: a grid of x
values, a grid of y values, and a grid of z values. The x and y values represent positions on the
plot, and the z values will be represented by the contour levels.
The straightforward way to prepare such data is to use the np.meshgrid() function, which builds
2D grids from 1D arrays:
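The data-preparation cell is not reproduced above. The sketch below builds the grids with np.meshgrid(); the function f and the grid ranges are assumptions chosen purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

def f(x, y):
    # Illustrative function z = f(x, y); any smooth function would do.
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)    # 50 positions along x
y = np.linspace(0, 5, 40)    # 40 positions along y

# meshgrid turns the two 1D arrays into 2D grids of matching shape.
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
contours = plt.contour(X, Y, Z, colors='black')
```

Note that the resulting arrays have shape (len(y), len(x)): rows correspond to y values and columns to x values.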
Notice that by default when a single color is used, negative values are represented by dashed
lines, and positive values by solid lines. Alternatively, we can color-code the lines by specifying a
colormap with the cmap argument.
In [34]: plt.contour(X, Y, Z, 20, cmap='RdGy')
We can also request that more lines be drawn, and add the color information in the form of a
colorbar by using the plt.colorbar() command.
The splotchiness of the graph can be reduced by using the plt.imshow() function:
plt.imshow() doesn’t accept an x and y grid, so we must manually specify the extent [xmin, xmax,
ymin, ymax] of the image on the plot.
plt.imshow() by default follows the standard image array convention, where the origin is in the
upper left, not in the lower left. This must be changed when showing gridded data.
plt.imshow() will automatically adjust the axis aspect ratio to match the input data; we can change
this by calling plt.axis('image') to make x and y units match.
In [36]: plt.imshow(Z, extent=[0, 5, 0, 5],
                    origin='lower', cmap='autumn')
         plt.colorbar()
         plt.axis('image')
We can plot useful information on contour plots by combining the contour plots and image plots.
To achieve this, we will use a partially transparent background image and over-plot contours with
labels on the contours using the plt.clabel() function:
In [37]: contours = plt.contour(X, Y, Z, 5, colors = 'black')
         plt.clabel(contours, inline = True, fontsize = 10)
         plt.imshow(Z, extent = [0, 5, 0, 4], origin = 'lower',
                    cmap = 'Dark2', alpha = 0.4)
         plt.colorbar()
Out[38]: (array([ 12., 15., 64., 162., 212., 216., 201., 84., 23., 11.]),
array([-3.02858801, -2.42470463, -1.82082126, -1.21693789, -0.61305452,
-0.00917114, 0.59471223, 1.1985956 , 1.80247897, 2.40636235,
3.01024572]),
<BarContainer object of 10 artists>)
The plt.hist() function has many options to tune both the calculation and the display:
In [39]: plt.hist(myData, bins = 40,
                  alpha = 0.7,
                  histtype = 'bar',
                  color = 'steelblue',
                  edgecolor = 'Red');
We can use the combination of histtype='stepfilled' along with a transparency alpha to compare
histograms of several distributions:
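The comparison cell is not reproduced above. A sketch with three synthetic normal samples (the sample parameters are assumptions for this example) is:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Three synthetic samples with different centers and spreads.
rng = np.random.default_rng(0)
x1 = rng.normal(0, 0.8, 1000)
x2 = rng.normal(-2, 1, 1000)
x3 = rng.normal(3, 2, 1000)

# Shared options: filled step outline, transparency, normalized heights.
kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)

fig = plt.figure()
counts1, edges1, _ = plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs)
```

With density=True each histogram integrates to one, so samples of different sizes remain comparable.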
If we would like to simply compute or count the number of points in a given bin and not display it,
the np.histogram() function is available:
In [41]: counts, bin_edges = np.histogram(myData, bins = 4)
         print(counts)
Two-dimensional histograms: plt.hist2d() divides the points of a 2D dataset among rectangular bins.
In [ ]: plt.hist2d(x, y, bins=25, cmap='BuGn')
        cb = plt.colorbar()
Hexagonal binnings: Matplotlib provides the plt.hexbin() routine, which represents a 2D
dataset binned within a grid of hexagons.
In [45]: plt.hexbin(x, y, gridsize=30, cmap='RdYlGn_r')
         cb = plt.colorbar(label='count in bin')
Plot Legends: Plot legends assign labels to the plot elements. A legend can be created with the
plt.legend() command.
Out[47]:
We can use the ncol keyword to specify the number of columns in the legend:
In [48]: ax.legend(frameon = True, loc = 'center', ncol = 2)
         fig
Out[48]:
We can use a rounded box with the fancybox keyword, add a shadow with the shadow keyword,
change the transparency of the frame with the framealpha keyword, or change the padding
around the text with the borderpad keyword:
In [49]: plt.style.use('classic')
         ax.legend(fancybox = True, framealpha = 0.5,
                   shadow = True, borderpad = 1)
         fig
Out[49]:
Multiple subplots: Subplots are groups of smaller axes that can exist together within a single
figure. These subplots might be insets, grids of plots, or other more complicated layouts.
The most basic way to create an axes is to use the plt.axes() function. By default this function
creates a standard axes object that fills the entire figure.
plt.axes() also takes an optional argument, a list of four numbers [left, bottom, width, height]
in the figure coordinate system, which ranges from 0 at the bottom left of the figure to 1 at the
top right of the figure.
In [50]: plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6+
Example: To create an inset axes inside another axes, we set the x position to 0.5 (starting at
50% of the width of the figure) and the y position to 0.5 (50% of the height of the figure), and
we set the x and y extents to 0.2 (the size of the axes is 20% of the width and 20% of the height
of the figure).
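The inset cell is not reproduced above; a minimal sketch of the example just described is:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()

# Standard axes that fills the figure.
ax1 = plt.axes()

# Inset axes: [left, bottom, width, height] in figure coordinates (0 to 1).
ax2 = plt.axes([0.5, 0.5, 0.2, 0.2])

bounds = ax2.get_position().bounds
```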
Grids of subplots: Aligned columns or rows of subplots can be created in Matplotlib by using
plt.subplot(), which creates a single subplot within a grid.
This command takes three integer arguments: the number of rows, the number of columns, and
the index of the plot to be created in this scheme, which runs from the upper left to the bottom right.
In [53]: for i in range(1, 9):
             plt.subplot(4, 2, i)
             plt.text(0.5, 0.5, str((4, 2, i)),
                      fontsize = 18, ha = 'center')
The command plt.subplots_adjust() can be used to adjust the spacing between these plots.
Complete grid at once: The function plt.subplots() is the easier tool to use. It creates a full
grid of subplots in a single line and returns them in a NumPy array.
The arguments are the number of rows and the number of columns, plus the optional keywords sharex
and sharey, which allow us to specify the relationships between different axes.
Example: To create a 5 by 4 grid of subplots where all axes in the same row share their y-axis
scale and all axes in the same column share their x-axis scale.
The resulting grid of axes instances gets returned within a NumPy array, allowing a convenient
specification of the desired axes using standard array indexing notation.
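The cells for this example are not reproduced above. A sketch of the 5 by 4 grid with shared axes, and of indexing the returned array, could be:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt

# sharex='col': axes in the same column share their x-axis scale.
# sharey='row': axes in the same row share their y-axis scale.
fig, ax = plt.subplots(5, 4, sharex='col', sharey='row')

# ax is a 2D NumPy array indexed as ax[row, column].
for i in range(5):
    for j in range(4):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=12, ha='center')
```

Unlike plt.subplot(), whose panel index starts at 1, the returned array uses ordinary zero-based indexing.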
Out[56]:
Complex arrangements: To draw the subplots which span multiple rows and columns, the
function plt.GridSpec() is used.
Example: a gridspec for a grid of 4 rows and 4 columns with some specified width and height
space looks like this:
In [57]: grid = plt.GridSpec(4, 4, wspace = 0.4, hspace = 0.3)
         plt.subplot(grid[0, :3])
         plt.subplot(grid[:, 3])
         plt.subplot(grid[1:, 0])
         plt.subplot(grid[1:3, 1:3])
         plt.subplot(grid[3, 1:3])
Out[57]: <AxesSubplot:>
Stylesheets: As already discussed, we can check the available styles with the list
plt.style.available and switch to the selected one by using plt.style.use().
But this changes the plot style for the rest of the session. To avoid that, we use
plt.style.context(), which sets a style temporarily:
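The cell is not reproduced above. The sketch below uses the 'ggplot' style as an arbitrary choice and records one rcParams entry to show that the change is confined to the with block.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

before = plt.rcParams['axes.facecolor']

# The style applies only to figures created inside the with block.
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))

# After the block, the previous settings are back in force.
after = plt.rcParams['axes.facecolor']
```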
3D plots: We enable 3D plots by importing the mplot3d toolkit included with the main
Matplotlib installation (from mpl_toolkits import mplot3d). Since Matplotlib 3.2 the explicit
import is no longer required.
We can create a 3D axes by passing the keyword projection='3d' to any of the normal axes
creation routines.
The 3D line or scatter plot can be created from sets of (x, y, z) triples and we can create them by
using the ax.plot3D() and ax.scatter3D() functions.
In [66]: ax = plt.axes(projection='3d')

         # Data for a three-dimensional line
         zline = np.linspace(0, 10, 100)
         xline = np.sin(zline)
         yline = np.cos(zline)

         ax.plot3D(xline, yline, zline, 'gray')

         # Data for three-dimensional scattered points
         zdata = 10 * np.random.random(50)
         xdata = np.sin(zdata) + 0.1 * np.random.randn(50)
         ydata = np.cos(zdata) + 0.1 * np.random.randn(50)

         ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Blues');
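3D contour plot: A three-dimensional contour plot can be created by using the ax.contour3D()
function, which takes gridded X, Y and Z data in the same way as plt.contour().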
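The contour cell is not reproduced above. In the sketch below, the function f and the grid range are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d  # noqa: F401  (only needed before Matplotlib 3.2)

def f(x, y):
    # Illustrative surface: a radially symmetric ripple.
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')   # 50 contour levels
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
```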
The viewing angle can be set by using ax.view_init() to set elevation from xy plane and azimuth
about z axis.
In [68]: ax.view_init(65, 25)
         fig
Out[68]:
Matplotlib vs Seaborn:
In [71]: x = np.linspace(0, 10, 500)
         y = np.cumsum(np.random.randn(500, 4), 0)
         plt.plot(x, y)
         plt.legend('WXYZ', ncol=2, loc='upper left');
Let's bring in seaborn and set the styling using the sb.set() method.
In [72]: import seaborn as sb
         sb.set()
         plt.plot(x, y)
         plt.legend('WXYZ', ncol=2, loc='upper left');
Histograms:
In [73]: import pandas as pd
         data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
         data = pd.DataFrame(data, columns=['x', 'y'])
         for col in 'xy':
             plt.hist(data[col], alpha=0.4)
We can get a smooth estimate of the distribution using kernel density estimation with
sb.kdeplot():
In [74]: for col in 'xy':
             sb.kdeplot(data[col], fill = False)
We can see the joint distribution and the marginal distributions together using sb.jointplot().
In [77]: with sb.axes_style('white'):
             sb.jointplot(x = "x", y = "y", data = data)
There are other parameters that can be passed to jointplot—for example, we can use a
hexagonally based histogram:
In [78]: with sb.axes_style('white'):
             sb.jointplot(x = "x", y = "y", data = data, kind = 'hex')
Pairplots: When we generalize joint plots to datasets of larger dimensions, we end up with pair
plots.
This is very useful for exploring correlations in multidimensional data, when we would like to
plot all pairs of values against each other.
In [79]: irisData = sb.load_dataset("iris")
         irisData.head(10)
Out[79]:
sepal_length sepal_width petal_length petal_width species
Faceted histograms: Sometimes the best way to view data is via histograms of subsets.
Seaborn's FacetGrid makes this extremely simple.
In [81]: tipsData = sb.load_dataset('tips')
         tipsData.head(10)
Out[81]:
total_bill tip sex smoker day time size
Factor plots: Factor plots (catplot in current seaborn) allow us to view the distribution of a
parameter within bins defined by any other parameter.
In [83]: with sb.axes_style(style = 'ticks'):
             g = sb.catplot(x = "day", y = "total_bill", hue = "sex",
                            data = tipsData, kind="box")
             g.set_axis_labels("Day", "Total Bill")
Joint distributions: we can use sb.jointplot() to show the joint distribution between different
datasets, along with the associated marginal distributions:
In [84]: with sb.axes_style('white'):
             sb.jointplot(x = "total_bill", y = "tip",
                          data = tipsData, kind='hex')
The joint plot can even do automatic kernel density estimation and regression:
In [85]: sb.jointplot(x = "total_bill", y = "tip",
                      data = tipsData, kind = 'reg')