8/11/2019 The Next Level of Data Visualization in Python - Towards Data Science
https://towardsdatascience.com/the-next-level-of-data-visualization-in-python-dd6e99039d5e

The Next Level of Data Visualization in Python

How to make great-looking, fully-interactive plots with a single line of Python

Will Koehrsen
Jan 9 · 8 min read

The sunk-cost fallacy is one of many harmful cognitive biases to which humans fall
prey. It refers to our tendency to continue to devote time and resources to a lost cause
because we have already spent (sunk) so much time in the pursuit. The sunk-cost
fallacy applies to staying in bad jobs longer than we should, slaving away at a project
even when it’s clear it won’t work, and yes, continuing to use a tedious, outdated
plotting library (matplotlib) when more efficient, interactive, and better-looking
alternatives exist.

Over the past few months, I’ve realized the only reason I use matplotlib is the
hundreds of hours I’ve sunk into learning the convoluted syntax. This complication
leads to hours of frustration on StackOverflow figuring out how to format dates or add
a second y-axis. Fortunately, this is a great time for Python plotting, and after exploring
the options, a clear winner in terms of ease-of-use, documentation, and
functionality is the plotly Python library. In this article, we’ll dive right into plotly,
learning how to make better plots in less time, often with one line of code.

All of the code for this article is available on GitHub. The charts are all interactive and
can be viewed on NBViewer here.

Example of plotly figures (source)

Plotly Brief Overview

The plotly Python package is an open-source library built on plotly.js, which in turn
is built on d3.js. We’ll be using a wrapper on plotly called cufflinks designed to work
with Pandas dataframes. So, our entire stack is cufflinks > plotly > plotly.js > d3.js
which means we get the efficiency of coding in Python with the incredible interactive
graphics capabilities of d3.

(Plotly itself is a graphics company with several products and open-source tools. The
Python library is free to use, and we can make unlimited charts in offline mode plus up
to 25 charts in online mode to share with the world.)

All the work in this article was done in a Jupyter Notebook with plotly + cufflinks
running in offline mode. After installing plotly and cufflinks with pip install
cufflinks plotly, import the following to run in Jupyter:

# Standard plotly imports
# (plotly.plotly targets plotly 3.x; from plotly 4 it lives in the separate
# chart_studio package as chart_studio.plotly)
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode

# Using plotly + cufflinks in offline mode
import cufflinks
cufflinks.go_offline(connected=True)
init_notebook_mode(connected=True)
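
The examples that follow assume a dataframe named df holding Medium article statistics. For readers without that data on hand, a synthetic stand-in might look like the sketch below. Most column names are taken from the code in this article; the values are random placeholders rather than the author's data, and make_sample_articles and the two "Other Publication" names are hypothetical.

```python
import numpy as np
import pandas as pd


def make_sample_articles(n=60, seed=0):
    """Build a synthetic stand-in for the article-statistics dataframe."""
    rng = np.random.default_rng(seed)
    views = rng.integers(100, 50_000, size=n)
    # Reads are a random fraction of views, so reads never exceed views.
    reads = (views * rng.uniform(0.2, 0.6, size=n)).astype(int)
    return pd.DataFrame({
        "title": [f"Article {i}" for i in range(n)],
        "published_date": pd.date_range("2018-01-01", periods=n, freq="7D"),
        # Placeholder publication names; only the first appears in the article.
        "publication": rng.choice(
            ["Towards Data Science", "Other Publication A", "Other Publication B"],
            size=n),
        "views": views,
        "reads": reads,
        "read_ratio": 100 * reads / views,
        "claps": rng.integers(0, 5_000, size=n),
        "fans": rng.integers(0, 1_000, size=n),
        "word_count": rng.integers(500, 6_000, size=n),
        "read_time": rng.integers(3, 30, size=n),
    })


df = make_sample_articles()
print(df.head())
```

Fixing the seed keeps the placeholder data reproducible between notebook runs.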

. . .

Single Variable Distributions: Histograms and Boxplots

Single variable — univariate — plots are a standard way to start an analysis and the
histogram is a go-to plot (although it has some issues) for graphing a distribution.
Here, using my Medium article statistics (you can see how to get your own stats here or
use mine here), let’s make an interactive histogram of the number of claps for articles
(df is a standard Pandas dataframe):

df['claps'].iplot(kind='hist', xTitle='claps',
                  yTitle='count', title='Claps Distribution')

Interactive histogram made with plotly+cufflinks

For those used to matplotlib, all we have to do is add one more letter (iplot instead
of plot) and we get a much better-looking and interactive chart! We can click on the
data to get more details, zoom into sections of the plot, and as we’ll see later, select
different categories to highlight.

If we want to plot overlaid histograms, that’s just as simple:

df[['time_started', 'time_published']].iplot(
kind='hist',
histnorm='percent',
barmode='overlay',
xTitle='Time of Day',
yTitle='(%) of Articles',
title='Time Started and Time Published')
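
To make the histnorm='percent' option concrete, here is a rough NumPy sketch of the normalization it applies: each bar shows its share of all observations, so the bars sum to 100. The hours and the bin edges below are made-up assumptions; plotly chooses its own bins.

```python
import numpy as np

# Hypothetical hours of the day (0-24) at which articles were started.
time_started = np.array([7.5, 8.0, 8.25, 9.0, 13.5, 14.0, 20.0, 21.5])

# Count observations per bin, then rescale so the bars sum to 100.
counts, edges = np.histogram(time_started, bins=[0, 6, 12, 18, 24])
percent = 100 * counts / counts.sum()

for left, right, pct in zip(edges[:-1], edges[1:], percent):
    print(f"{left}-{right}h: {pct:.1f}%")
```

Because both columns are rescaled this way, overlaid histograms stay comparable even when the columns have different numbers of non-missing values.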

With a little bit of pandas manipulation, we can do a barplot:

# Resample to monthly frequency and plot
df2 = df[['views', 'reads', 'published_date']].\
    set_index('published_date').\
    resample('M').mean()

df2.iplot(kind='bar', xTitle='Date', yTitle='Average',
          title='Monthly Average Views and Reads')
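
For a feel of what the monthly averaging step produces, here is a small pandas-only sketch on made-up numbers. It groups by calendar month with to_period('M') instead of resample('M'), partly because recent pandas releases warn about the 'M' resample alias and suggest 'ME'. One difference to keep in mind: resample also emits rows (filled with NaN) for months that have no articles, while this grouping skips them.

```python
import pandas as pd

# Hypothetical per-article statistics; the values are made up.
stats = pd.DataFrame({
    "published_date": pd.to_datetime(
        ["2018-01-05", "2018-01-20", "2018-02-10", "2018-03-01", "2018-03-15"]),
    "views": [1000, 3000, 500, 2000, 4000],
    "reads": [400, 1200, 250, 1000, 1000],
})

# Group rows by calendar month and average each column within the month.
monthly = (
    stats.groupby(stats["published_date"].dt.to_period("M"))[["views", "reads"]]
    .mean()
)
print(monthly)
```

Each row of the result becomes one group of bars in the bar plot, with one bar per column.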

As we saw, we can combine the power of pandas with plotly + cufflinks. For a boxplot of
the fans per story by publication, we use a pivot and then plot:

df.pivot(columns='publication', values='fans').iplot(
kind='box',
yTitle='fans',
title='Fans Distribution by Publication')
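
The pivot is what turns one long column into one column per publication, which is the shape a multi-box plot needs. A tiny sketch with made-up values and placeholder publication names ("A" and "B") shows the reshaping:

```python
import pandas as pd

# Hypothetical long-format data: one row per story.
stories = pd.DataFrame({
    "publication": ["A", "B", "A", "B", "A"],
    "fans": [10, 200, 30, 150, 20],
})

# Spread 'fans' into one column per publication, keeping the original row index.
wide = stories.pivot(columns="publication", values="fans")
print(wide)
```

Each row has a value in exactly one column and NaN elsewhere, so every column holds only the stories of its own publication.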

The benefits of interactivity are that we can explore and subset the data as we like.
There’s a lot of information in a boxplot, and without the ability to see the numbers,
we’ll miss most of it!

. . .

Scatterplots
The scatterplot is the heart of most analyses. It allows us to see the evolution of a
variable over time or the relationship between two (or more) variables.

Time-Series
A considerable portion of real-world data has a time element. Luckily, plotly +
cufflinks was designed with time-series visualizations in mind. Let’s make a dataframe
of my TDS articles and look at how the trends have changed.

# Create a dataframe of Towards Data Science Articles
tds = df[df['publication'] == 'Towards Data Science'].\
    set_index('published_date')

# Plot claps and fans as a time series
tds[['claps', 'fans', 'title']].iplot(
    y='claps', mode='lines+markers', secondary_y='fans',
    secondary_y_title='Fans', xTitle='Date', yTitle='Claps',
    text='title', title='Fans and Claps over Time')

Here we are doing quite a few different things all in one line:

1. Getting a nicely formatted time-series x-axis automatically

2. Adding a secondary y-axis because our variables have different ranges

3. Adding in the title of the articles as hover information
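
To see what those three things amount to underneath, it can help to write a plotly figure out by hand as a plain dictionary. The sketch below is an approximation under assumed values, not the exact figure cufflinks generates: the dates, numbers, and titles are made up.

```python
# Hypothetical data for three articles.
dates = ["2018-01-10", "2018-02-14", "2018-03-03"]
titles = ["First article", "Second article", "Third article"]

figure = {
    "data": [
        # 'text' supplies the hover label for each point.
        {"type": "scatter", "mode": "lines+markers", "name": "claps",
         "x": dates, "y": [1200, 5400, 800], "text": titles},
        # yaxis='y2' assigns this trace to the secondary axis defined below.
        {"type": "scatter", "mode": "lines+markers", "name": "fans",
         "x": dates, "y": [150, 700, 90], "text": titles, "yaxis": "y2"},
    ],
    "layout": {
        "title": "Fans and Claps over Time",
        "xaxis": {"title": "Date", "type": "date"},
        "yaxis": {"title": "Claps"},
        # Second y-axis drawn on the right, sharing the same x-axis.
        "yaxis2": {"title": "Fans", "overlaying": "y", "side": "right"},
    },
}

# With plotly installed, the dict could be rendered in a notebook with:
# from plotly.offline import iplot; iplot(figure)
```

Seen this way, the one-line cufflinks call is a shortcut for filling in the traces and the extra axis entry.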

For more information, we can also add in text annotations quite easily:

tds_monthly_totals.iplot(
mode='lines+markers+text',
text=text,
y='word_count',
opacity=0.8,
xTitle='Date',
yTitle='Word Count',
title='Total Word Count by Month')
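
The call above relies on two names, tds_monthly_totals and text, that are defined in the notebook rather than in this article. One plausible way to build them is sketched below; the data and the label format are assumptions, not the author's code.

```python
import pandas as pd

# Hypothetical article data; the real values live in the author's notebook.
tds = pd.DataFrame(
    {"word_count": [1200, 1800, 2500, 900]},
    index=pd.to_datetime(["2018-01-10", "2018-01-25", "2018-02-14", "2018-03-03"]),
)

# Total words per calendar month, indexed by the first day of each month.
tds_monthly_totals = tds.groupby(tds.index.to_period("M")).sum()
tds_monthly_totals.index = tds_monthly_totals.index.to_timestamp()

# One label per point, drawn next to the marker when mode includes 'text'.
text = [f"{int(words):,} words" for words in tds_monthly_totals["word_count"]]
print(text)
```

The list needs one entry per row of the plotted dataframe, in the same order, so that each label lands on its own point.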

Scatterplot with annotations

For a two-variable scatter plot colored by a third categorical variable we use:

df.iplot(
    x='read_time',
    y='read_ratio',
    # Specify the category
    categories='publication',
    xTitle='Read Time',
    yTitle='Reading Percent',
    title='Reading Percent vs Read Time by Publication')

Let’s get a little more sophisticated by using a log axis, specified as a plotly layout
(see the Plotly documentation for the layout specifics), and sizing the bubbles by a
numeric variable:

tds.iplot(
x='word_count',
y='reads',
size='read_ratio',
text=text,
mode='markers',
# Log xaxis
layout=dict(
xaxis=dict(type='log', title='Word Count'),
yaxis=dict(title='Reads'),
title='Reads vs Log Word Count Sized by Read Ratio'))
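
The layout argument is an ordinary nested dictionary, and the same is true of a whole plotly figure. As a rough illustration under made-up values (not the figure cufflinks builds internally), the bubble chart could be written out like this:

```python
# Hypothetical values standing in for word counts, reads, and read ratios.
word_count = [500, 1500, 4000, 12000]
reads = [300, 900, 2500, 1200]
read_ratio = [60, 45, 30, 12]

figure = {
    "data": [{
        "type": "scatter",
        "mode": "markers",
        "x": word_count,
        "y": reads,
        # Marker size in pixels, driven by a third numeric variable.
        "marker": {"size": read_ratio},
    }],
    "layout": {
        "title": "Reads vs Log Word Count Sized by Read Ratio",
        # type='log' spaces the x-axis by powers of ten.
        "xaxis": {"type": "log", "title": "Word Count"},
        "yaxis": {"title": "Reads"},
    },
}

# With plotly installed, the dict could be rendered in a notebook with:
# from plotly.offline import iplot; iplot(figure)
```

Here the raw read ratios are used directly as pixel sizes; real data would likely need rescaling so the bubbles stay readable.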

With a little more work (see notebook for details), we can even put four variables (this
is not advised) on one graph!

As before, we can combine pandas with plotly+cufflinks for useful plots:

df.pivot_table(
values='views', index='published_date',
columns='publication').cumsum().iplot(
mode='markers+lines',
size=8,
symbol=[1, 2, 3, 4, 5],
layout=dict(
xaxis=dict(title='Date'),
yaxis=dict(type='log', title='Total Views'),
title='Total Views over Time by Publication'))
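
The pandas half of that chain can be checked on its own. The sketch below uses made-up numbers and placeholder publication names ("A" and "B") to show the table that pivot_table and cumsum hand over to iplot:

```python
import pandas as pd

# Hypothetical views per article; values are made up.
articles = pd.DataFrame({
    "published_date": pd.to_datetime(
        ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22"]),
    "publication": ["A", "B", "A", "B"],
    "views": [100, 50, 300, 150],
})

# One column per publication, one row per date, then a running total per column.
totals = articles.pivot_table(
    values="views", index="published_date", columns="publication").cumsum()
print(totals)
```

Dates on which a publication had no article stay NaN, while the running total still carries across the gap. Calling .ffill() afterwards would be one way to get an unbroken series if a continuous line is wanted.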

See the notebook or the documentation for more examples of added functionality. We
can add in text annotations, reference lines, and best-fit lines to our plots with a single
line of code, and still with all the interaction.

. . .

Advanced Plots
Now we’ll get into a few plots that you probably won’t use all that often, but which can
be quite impressive. We’ll use the plotly figure_factory to keep even these incredible
plots to one line.

Scatter Matrix
When we want to explore relationships among many variables, a scattermatrix (also
called a splom) is a great option:

import plotly.figure_factory as ff

figure = ff.create_scatterplotmatrix(
df[['claps', 'publication', 'views',
'read_ratio','word_count']],
diag='histogram',
index='publication')

Even this plot is completely interactive allowing us to explore the data.

Correlation Heatmap
To visualize the correlations between numeric variables, we calculate the correlations
and then make an annotated heatmap:

corrs = df.corr()

figure = ff.create_annotated_heatmap(
z=corrs.values,
x=list(corrs.columns),
y=list(corrs.index),
annotation_text=corrs.round(2).values,
showscale=True)
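
The inputs to the heatmap can be inspected without plotting anything. The sketch below uses made-up data and selects the numeric columns explicitly first, since recent pandas versions no longer silently drop text columns in corr(). The dataframe name df_example and its values are assumptions.

```python
import pandas as pd

# Hypothetical data mixing numeric and text columns.
df_example = pd.DataFrame({
    "claps": [10, 20, 30, 40],
    "views": [100, 210, 290, 405],
    "read_ratio": [60, 50, 45, 30],
    "publication": ["A", "B", "A", "B"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corrs = df_example.select_dtypes("number").corr()

# The pieces the annotated heatmap call consumes.
z = corrs.values
labels = list(corrs.columns)
annotation_text = corrs.round(2).values
print(corrs.round(2))
```

The matrix is symmetric with ones on the diagonal, which is why the resulting heatmap mirrors itself across the diagonal.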

The list of plots goes on and on. Cufflinks also has several themes we can use to get
completely different styling with no effort. For example, below we have a ratio plot in
the “space” theme and a spread plot in “ggplot”:

We also get 3D plots (surface and bubble):

For those who are so inclined, you can even make a pie chart:

Editing in Plotly Chart Studio

When you make these plots in the notebook, you’ll notice a small link on the lower
right-hand side on the graph that says “Export to plot.ly”. If you click that link, you are
then taken to the chart studio where you can touch up your plot for a final
presentation. You can add annotations, specify the colors, and generally clean
everything up for a great figure. Then, you can publish your figure online so anyone
can find it with the link.

Below are two charts I touched up in Chart Studio:

With everything mentioned here, we are still not exploring the full capabilities of the
library! I’d encourage you to check out both the plotly and the cufflinks documentation
for more incredible graphics.

Plotly interactive graphics of wind farms in United States (Source)

Conclusions
The worst part about the sunk cost fallacy is you only realize how much time you’ve
wasted after you’ve quit the endeavor. Fortunately, now that I’ve made the mistake of
sticking with matplotlib for too long, you don’t have to!

When thinking about plotting libraries, there are a few things we want:

1. One-line charts for rapid exploration

2. Interactive elements for subsetting/investigating data

3. Option to dig into details as needed

4. Easy customization for final presentation

As of right now, the best option for doing all of these in Python is plotly. Plotly allows us
to make visualizations quickly and helps us get better insight into our data through
interactivity. Also, let’s admit it, plotting should be one of the most enjoyable parts of
data science! With other libraries, plotting turned into a tedious task, but with plotly,
there is again joy in making a great figure!

A plot of my enjoyment with plotting in Python over time

Now that it’s 2019, it is time to upgrade your Python plotting library for better
efficiency, functionality, and aesthetics in your data science visualizations.

. . .

As always, I welcome feedback and constructive criticism. I can be reached on Twitter
@koehrsen_will.
