Python Unit 4.Notes
An interactive session with Pyplot, a module in the Matplotlib library, enables dynamic
exploration and manipulation of plots. This approach is particularly useful for data analysis
and visualization, allowing users to observe the effects of changes in real-time.
Interactive Backends
Matplotlib supports several interactive backends, which determine how plots are displayed
and how users can interact with them. Common backends include:
TkAgg: Uses Tkinter.
QtAgg: Uses PyQt or PySide.
nbAgg (%matplotlib notebook): For use in the classic Jupyter Notebook.
widget (%matplotlib widget): For use in Jupyter Notebook and JupyterLab;
requires ipympl.
To enable an interactive backend, use the %matplotlib magic command in Jupyter
environments or plt.ion() in other environments before creating any plots.
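Outside Jupyter, it can help to check which backend is active and whether interactive mode is on. The following is a minimal sketch using matplotlib.get_backend(), plt.ion(), plt.ioff() and plt.isinteractive(); the variable names are illustrative, and the backend printed depends on the installation.

```python
import matplotlib
import matplotlib.pyplot as plt

# Report which backend Matplotlib selected on this machine
print("Backend in use:", matplotlib.get_backend())

# Turn interactive mode on: figures update without blocking the script
plt.ion()
interactive_on = plt.isinteractive()

# Turn interactive mode off again: plt.show() blocks until the window closes
plt.ioff()
interactive_off = plt.isinteractive()

print("Interactive after plt.ion():", interactive_on)
print("Interactive after plt.ioff():", interactive_off)
```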
Basic Interactive Plotting
Here's an example of creating a simple interactive plot:
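import matplotlib.pyplot as plt
import numpy as np
# Create the data, the figure and the line to display
x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
# Add labels (the window toolbar provides interactive pan and zoom)
ax.set_title('Interactive Sine Wave')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
plt.show(block=True)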
In a Jupyter notebook, this example can be extended with ipywidgets.interact to create sliders
for adjusting the frequency and amplitude of the sine wave. An update function then redraws
the plot whenever the slider values change.
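The slider idea can also be sketched with Matplotlib's own matplotlib.widgets.Slider, which does not need Jupyter. This is only an illustration: the slider ranges, the axes positions and the names freq_slider, amp_slider and update are assumptions, not part of the original notes.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import Slider

x = np.linspace(0, 2 * np.pi, 400)

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)          # leave room for the sliders
(line,) = ax.plot(x, np.sin(x))
ax.set_ylim(-3, 3)
ax.set_title('Interactive Sine Wave')

# Small axes that host the two sliders (positions are illustrative)
freq_ax = fig.add_axes([0.15, 0.10, 0.70, 0.03])
amp_ax = fig.add_axes([0.15, 0.05, 0.70, 0.03])
freq_slider = Slider(freq_ax, 'Frequency', 0.5, 5.0, valinit=1.0)
amp_slider = Slider(amp_ax, 'Amplitude', 0.5, 3.0, valinit=1.0)

def update(_value):
    # Recompute the curve from the current slider values and redraw
    line.set_ydata(amp_slider.val * np.sin(freq_slider.val * x))
    fig.canvas.draw_idle()

freq_slider.on_changed(update)
amp_slider.on_changed(update)

plt.show()
```

Dragging either slider calls update, which changes only the y-data of the existing line rather than creating a new plot.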
Basic plotting
We often communicate complex information through pictures, diagrams, maps, plots and
infographics. When starting to analyse a dataset, simple graphical plots are useful to sanity
check data, look for patterns and relationships, and compare different groups.
Exploratory data analysis (EDA) is described by Tukey (1977) as detective work; assembling
clues and evidence relevant to solving the case (question). It involves both visualisation of
data in various graphical plots and looking at summary statistics such as range, mean,
quartiles and correlation coefficient. The aim is to discover patterns and relationships within a
dataset so that these can be investigated more deeply by further data analysis.
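As a small illustration of the summary statistics mentioned above, the sketch below computes the range, mean, quartiles and correlation coefficient with NumPy. It reuses the ice cream sales and temperature values that appear in the scatterplot example in these notes; the variable names are illustrative.

```python
import numpy as np

temperature = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
                        19.4, 25.1, 23.4, 18.1, 22.6, 17.2])
icecreamsales = np.array([215, 325, 185, 332, 406, 522,
                          412, 614, 544, 421, 445, 408])

# Range and mean of a single quantitative variable
temp_min, temp_max = temperature.min(), temperature.max()
temp_mean = temperature.mean()

# Quartiles: 25th, 50th (median) and 75th percentiles
q1, median, q3 = np.percentile(temperature, [25, 50, 75])

# Pearson correlation coefficient between the two variables
r = np.corrcoef(temperature, icecreamsales)[0, 1]

print("Range:", temp_min, "to", temp_max)
print("Mean:", temp_mean)
print("Quartiles:", q1, median, q3)
print("Correlation coefficient:", r)
```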
A graphical plot gives a visual summary of some part of a dataset. In this step, we will
consider structured data, such as a table of rows and columns (like a spreadsheet), and how
to build some simple graphical plots using the fantastic Matplotlib library for Python (which
is bundled with common Jupyter distributions such as Anaconda).
Structured data and variables
When looking at structured data, a row in the table corresponds to a case (also called a
record, example, instance or observation) and a column corresponds to a variable (also called
a feature, attribute, input or predictor).
When it comes to variables, it is essential to distinguish between categorical
variables (names or labels) and quantitative variables (numerical values with magnitude) to
ensure we choose the correct type of plot for that variable. For example, in crime data, ‘type of
offence’ (criminal damage, common assault, etc) would be a categorical variable, whereas in
health data, the ‘resting heart rate’ of an adult would be a quantitative variable.
Bar graph
A bar graph (or bar chart) is useful for plotting a single categorical variable. For example,
we can plot a bar graph (using plt.bar) showing the population of countries using data
from Countries in the world by population (2020).
import matplotlib.pyplot as plt
country = ['China', 'India', 'United States', 'Indonesia', 'Pakistan', 'Brazil']
population = [1439323776, 1380004385, 331002651, 273523615, 220892340, 212559417]
plt.bar(country, population)
plt.title('Population of Countries (2020)')
plt.xlabel('Country')
plt.ylabel('Population')
plt.show()
The bar graph produced by the Python code above looks like the image below. Note that the
‘1e9’ at the top of the population axis indicates that the values on that axis are in units of
1 × 10^9, which is 1 billion. Notice the presence of the title and axis labels.
Boxplot
A boxplot is useful for summarising the distribution of a single quantitative variable.
For example, using the dataset for number of daily births in the USA (2000-2014) used in a
previous step, we could write the following Python code to plot a boxplot (using plt.boxplot)
of the data.
import matplotlib.pyplot as plt
import pandas as pd
url = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_2000-2014_SSA.csv'
mydata = pd.read_csv(url)
plt.boxplot(mydata.births, vert=False)
plt.title('Distribution of Number of Daily Births in the USA (2000-2014)')
plt.xlabel('Number of Births')
plt.show()
The boxplot produced by the Python code above looks like the image below. Notice the
minimum daily count (just below 6000), the maximum daily count (just above 16000), and
the median (approximately 12500) represented by the orange bar.
Boxplots are especially useful for comparing the distribution of two datasets.
Scatterplot
A scatterplot is useful for investigating the relationship between two quantitative variables.
For example, the following Python code plots a scatterplot (using plt.scatter) of a dataset
showing ice cream sales and temperature.
temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
icecreamsales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]
plt.scatter(temperature, icecreamsales)
plt.title('Ice cream sales vs Temperature')
plt.xlabel('Temperature (degrees C)')
plt.ylabel('Ice cream sales')
plt.show()
A bit of cartoon fun
You might have seen the xkcd webcomics. They often include graphs as part of the cartoon.
Scientific Paper Graph Quality © xkcd, CC BY-NC 2.5
Well, as a bit of fun, adding the line below in the Python code for any of the plots mentioned
above, before the first line starting with plt, gives a cartoon version of the plot in the style of
xkcd.
plt.xkcd()
For example, the scatterplot above becomes:
The Matplotlib library in Python provides a whole range of different graphical plots. Don’t
forget to add a title and axis labels.
Logarithmic scale
Logarithmic Plots in Python (Matplotlib)
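A logarithmic axis is useful when values span several orders of magnitude. The following is a minimal sketch, assuming exponential sample data chosen only for illustration; it plots the same curve on a linear and on a logarithmic y-axis using set_yscale('log').

```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data that grows exponentially (illustrative only)
x = np.linspace(0.1, 10, 200)
y = np.exp(x)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))

# Left: linear scale, where the curve hugs the axis and then shoots up
ax_lin.plot(x, y)
ax_lin.set_title('Linear y-axis')

# Right: logarithmic scale, where exponential growth appears as a straight line
ax_log.plot(x, y)
ax_log.set_yscale('log')
ax_log.set_title('Logarithmic y-axis')

for ax in (ax_lin, ax_log):
    ax.set_xlabel('x')
    ax.set_ylabel('exp(x)')

fig.tight_layout()
plt.show()
```

The related calls set_xscale('log'), plt.semilogy, plt.semilogx and plt.loglog cover the other combinations of logarithmic axes.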
Plots with multiple axes
Two plots on the same Axes with different left and right scales.
The trick is to use two different Axes that share the same x axis. You can use
separate matplotlib.ticker formatters and locators as desired since the two Axes are
independent.
Such Axes are generated by calling the Axes.twinx method. Likewise, Axes.twiny is
available to generate Axes that share a y axis but have different top and bottom scales.
import matplotlib.pyplot as plt
import numpy as np
# Create some mock data
t = np.arange(0.01, 10.0, 0.01)
data1 = np.exp(t)
data2 = np.sin(2 * np.pi * t)
fig, ax1 = plt.subplots()
color = 'tab:red'
ax1.set_xlabel('time (s)')
ax1.set_ylabel('exp', color=color)
ax1.plot(t, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx() # instantiate a second Axes that shares the same x-axis
color = 'tab:blue'
ax2.set_ylabel('sin', color=color) # we already handled the x-label with ax1
ax2.plot(t, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)
fig.tight_layout() # otherwise the right y-label is slightly clipped
plt.show()
Key Concepts and Techniques:
Dual-Axis Charts (Two Y-Axes):
This is a common type where one set of data is plotted on the primary Y-axis, and another is
plotted on a secondary Y-axis on the opposite side of the chart.
Adding Axes in Matplotlib:
Libraries like Matplotlib provide functions like twinx() to create a second Y-axis that shares
the same x-axis, or twiny() for a second X-axis that shares the same Y-axis.
Positioning and Visibility:
You can control the visibility and position of the secondary axis (e.g., on the left or right side)
using spines and related functions within the charting library.
Multiple Axes in a Figure:
Libraries like Matplotlib also allow arranging multiple plots within a figure, each with its
own set of axes, using subplots() or add_subplot().
Example (Conceptual):
Imagine you want to compare temperature (Celsius, primary Y-axis) and rainfall (mm,
secondary Y-axis) over a period of time (X-axis). You could use a dual-axis chart to display
both datasets on the same graph, allowing for easier comparison of how temperature and
rainfall might correlate.
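The conceptual example above could be sketched as follows. The monthly temperature and rainfall values are made up purely for illustration, and the names ax_temp and ax_rain are assumptions.

```python
import matplotlib.pyplot as plt

# Made-up values for six months (illustration only, not real measurements)
month = [1, 2, 3, 4, 5, 6]
temperature_c = [5, 7, 10, 14, 18, 21]
rainfall_mm = [80, 60, 55, 40, 45, 30]

fig, ax_temp = plt.subplots()

# Primary y-axis (left): temperature as a line
ax_temp.plot(month, temperature_c, color='tab:red', marker='o')
ax_temp.set_xlabel('Month')
ax_temp.set_ylabel('Temperature (°C)', color='tab:red')
ax_temp.tick_params(axis='y', labelcolor='tab:red')

# Secondary y-axis (right): rainfall as bars, sharing the same x-axis
ax_rain = ax_temp.twinx()
ax_rain.bar(month, rainfall_mm, color='tab:blue', alpha=0.3)
ax_rain.set_ylabel('Rainfall (mm)', color='tab:blue')
ax_rain.tick_params(axis='y', labelcolor='tab:blue')

ax_temp.set_title('Temperature and Rainfall by Month')
fig.tight_layout()
plt.show()
```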
Benefits of Multiple Axes:
Visual Comparison:
Allows for a side-by-side comparison of different datasets or data with different scales.
Enhanced Data Interpretation:
Can reveal trends or relationships that might be missed when data is presented separately.
Complex Data Visualization:
Provides a way to represent more complex data structures and relationships in a single chart.
Tools and Libraries:
Excel: Provides a built-in feature to add a secondary Y-axis to charts.
Matplotlib: A powerful Python library for creating static, interactive, and animated
visualizations.
Plotly: A free and open-source graphing library for Python and JavaScript, with
capabilities for multiple axes.
BI Office (Pyramid Analytics): Offers multi-axis charts for data analysis and
reporting.
xViz (Microsoft AppSource): Provides a Power BI visual for creating multi-axis
charts.
Structure of matplotlib
At a high level, matplotlib has a hierarchical structure. The key levels are:
1. Figure
2. Axes
3. Axis
4. Artist
Let’s explore each level.
Figure
The entire canvas or window in which all plots appear.
It can contain multiple subplots (Axes).
Think of it as a blank sheet of paper.
import matplotlib.pyplot as plt
fig = plt.figure()
Or using pyplot shortcut:
fig, ax = plt.subplots()
Axes
A single plot or graph inside the figure.
A figure can have multiple Axes (subplots).
An Axes contains:
o X and Y axis (and optionally Z in 3D)
o Labels, titles, lines, ticks, legends, etc.
ax = fig.add_subplot(1, 1, 1)
Or with subplots:
fig, ax = plt.subplots()
Note: Don’t confuse Axes (plot area) with Axis (x-axis or y-axis object).
Axis
Refers to individual x and y axis objects in an Axes.
Controls the:
o Scale (linear, log)
o Ticks and tick labels
o Limits (like xlim, ylim)
ax.set_xlim(0, 10)
ax.set_ylim(0, 100)
Artist
Everything that you see in a plot (lines, text, ticks, labels, etc.) is an Artist.
Two types:
o Primitive Artists: Lines, text, patches, etc.
o Container Artists: Axes, Figures that contain other artists.
For example:
line, = ax.plot(x, y) # ax.plot returns a list; line is a Line2D Artist
ax.set_title("Title") # Title is an Artist
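A complete example that combines these levels:
import matplotlib.pyplot as plt
# Create the figure and axes
fig, ax = plt.subplots()
# Data
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
# Plot
ax.plot(x, y)
# Customize
ax.set_title('Simple Plot')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
# Show
plt.show()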
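The hierarchy can also be inspected directly from Python. The sketch below checks that the Figure holds the Axes, that the Axes holds Axis objects, and that lines, titles, Axes and Figures are all Artists. It uses only fig.axes, ax.xaxis and the Artist and Axis classes; the data values are the ones from the example above.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.axis import Axis

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3, 4], [10, 20, 25, 30])
title = ax.set_title('Simple Plot')

# Figure -> Axes: the figure keeps a list of the Axes it contains
figure_contains_axes = fig.axes == [ax]

# Axes -> Axis: each Axes owns an x-axis and a y-axis object
has_axis_objects = isinstance(ax.xaxis, Axis) and isinstance(ax.yaxis, Axis)

# Primitive Artists (line, text) and container Artists (Axes, Figure)
primitives_are_artists = isinstance(line, Artist) and isinstance(title, Artist)
containers_are_artists = isinstance(ax, Artist) and isinstance(fig, Artist)

print(figure_contains_axes, has_axis_objects,
      primitives_are_artists, containers_are_artists)
```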
Additional Components
pyplot Module
matplotlib.pyplot is a state-based interface to matplotlib, like MATLAB.
Maintains an implicit figure and axes.
Backends
matplotlib can render on different backends:
o GUI (TkAgg, Qt5Agg)
o File (PDF, SVG, PNG)
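Summary Table
Component | Role | Example
Figure | The entire canvas or window that holds all plots | fig = plt.figure()
Axes | A single plot or graph inside the figure | ax = fig.add_subplot(1, 1, 1)
Axis | An individual x or y axis object (scale, ticks, limits) | ax.set_xlim(0, 10)
Artist | Everything visible in a plot (lines, text, ticks, labels) | ax.set_title("Title")
Vector Field Plot: Represents direction and magnitude of vectors over a grid (e.g., wind,
magnetic field).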
1. Contour Plots
Definition:
A contour plot shows lines (contours) where a function of two variables f(x, y) is constant.
Each contour line represents a specific height (z-value).
Example: Contour Plot in matplotlib
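import numpy as np
import matplotlib.pyplot as plt
# Define function
def f(x, y):
    return np.sin(x) ** 2 + np.cos(y) ** 2
# Generate grid
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# Draw the contours (assumed completion: the listing in these notes ends after Z is computed)
contours = plt.contour(X, Y, Z, levels=10)
plt.clabel(contours, inline=True)
plt.title('Contour Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.show()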
2. Vector Field Plots
# Define grid
x = np.linspace(-2, 2, 20)
y = np.linspace(-2, 2, 20)
X, Y = np.meshgrid(x, y)
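A grid like the one above is the usual starting point for a vector field plot. The following is a minimal sketch using plt.subplots and Axes.quiver; the rotational field U = -Y, V = X is an assumption chosen only for illustration and is not taken from the notes.

```python
import matplotlib.pyplot as plt
import numpy as np

# Define grid
x = np.linspace(-2, 2, 20)
y = np.linspace(-2, 2, 20)
X, Y = np.meshgrid(x, y)

# Vector components at each grid point (illustrative rotational field)
U = -Y
V = X

fig, ax = plt.subplots()
# Each arrow starts at (X, Y) and points in the direction (U, V)
ax.quiver(X, Y, U, V)
ax.set_title('Vector Field Plot')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_aspect('equal')
plt.show()
```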
Returning Path
The os module reports paths within the file system (such as the current working directory),
while the sys module reports the path of the Python interpreter, as shown below:
import os
import sys
# Getting current working directory using os.getcwd()
current_directory = os.getcwd()
# Creating a new directory using os.mkdir()
new_directory1 = os.path.join(current_directory, 'new_folder')
os.mkdir(new_directory1)
# Checking Python path using sys.executable
python_path = sys.executable
print("Current Directory:", current_directory)
print("New Directory Path:", new_directory1)
print("Python Path:", python_path)
Output: the three paths printed depend on the machine on which the script runs.
Comparison of the os and sys modules
Feature | os module | sys module
File and Directory Ops | Offers extensive file manipulation operations. | Mainly focused on system-level interactions.
Customizing Imports | Focused on file and directory operations, without import customization. | Enables customization of Python’s import behavior.
Interpreter Configuration | Does not deal with interpreter setup. | Facilitates configuration of and interaction with the Python interpreter.
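The difference between the two modules can be sketched in a few lines: os handles files and directories, while sys exposes interpreter settings such as the import search path. The folder names demo_folder and my_modules are hypothetical, and a temporary directory is used so that nothing is left behind.

```python
import os
import sys
import tempfile

# os: file and directory operations inside a throwaway temporary directory
with tempfile.TemporaryDirectory() as base_dir:
    demo_dir = os.path.join(base_dir, 'demo_folder')   # hypothetical folder name
    os.makedirs(demo_dir, exist_ok=True)
    created = os.path.isdir(demo_dir)
    listing = os.listdir(base_dir)

# sys: customise where the interpreter looks for modules to import
extra_path = os.path.join(os.getcwd(), 'my_modules')   # hypothetical folder name
if extra_path not in sys.path:
    sys.path.append(extra_path)

print("Directory created:", created)
print("Contents of base directory:", listing)
print("Python version:", sys.version_info.major, sys.version_info.minor)
print("Import search path includes extra path:", extra_path in sys.path)
```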
Introduction to File Input and Output
I/O Streams
A stream is a communication channel that a program has with the outside world. It is used to
transfer data items in succession.
An Input/Output (I/O) Stream represents an input source or an output destination. A stream
can represent many different kinds of sources and destinations, including disk files, devices,
other programs, and memory arrays.
Streams support many different kinds of data, including simple bytes, primitive data types,
localized characters, and objects. Some streams simply pass on data; others manipulate and
transform the data in useful ways.
No matter how they work internally, all streams present the same simple model to programs
that use them: A stream is a sequence of data.
Reading information into a program.
A program uses an input stream to read data from a source, one item at a time:
Writing information from a program.
A program uses an output stream to write data to a destination, one item at a time:
The data source and data destination pictured above can be anything that holds, generates, or
consumes data. Obviously this includes disk files, but a source or destination can also be another
program, a peripheral device, a network socket, or an array.
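In Python, the io module provides in-memory streams that behave like files, which makes the "sequence of data" model easy to try out. The sketch below reads from a text stream one line at a time and writes to a byte stream one chunk at a time; the sample strings are illustrative.

```python
import io

# Input stream: read data from a source, one item (here, one line) at a time
source = io.StringIO("Hello\nThis is Delhi\nThis is Paris\n")
items = []
for line in source:
    items.append(line.rstrip("\n"))

# Output stream: write data to a destination, one item (here, one chunk) at a time
destination = io.BytesIO()
for chunk in (b"abc", b"def"):
    destination.write(chunk)
written = destination.getvalue()

print(items)
print(written)
```

The same read and write calls work on a file object returned by open(), because files and in-memory buffers share the stream interface.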
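# Set-up inferred from the output below: write the sample lines, then reopen for reading
file1 = open("myfile.txt", "w")
file1.write("Hello \n")
file1.writelines(["This is Delhi \n", "This is Paris \n", "This is London \n"])
file1.close()
file1 = open("myfile.txt", "r")
print("Output of Read function is ")
print(file1.read())
print()
file1.seek(0)
print("Output of Readline function is ")
print(file1.readline())
print()
file1.seek(0)
# To show difference between read and readline
print("Output of Read(9) function is ")
print(file1.read(9))
print()
file1.seek(0)
print("Output of Readline(9) function is ")
print(file1.readline(9))
file1.seek(0)
# readlines function
print("Output of Readlines function is ")
print(file1.readlines())
print()
file1.close()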
Output:
Output of Read function is
Hello
This is Delhi
This is Paris
This is London
Output of Readline function is
Hello
Output of Read(9) function is
Hello
Th
Output of Readline(9) function is
Hello
Output of Readlines function is
['Hello \n', 'This is Delhi \n', 'This is Paris \n', 'This is London \n']
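file1 = open("myfile.txt", "w") # file name assumed; the start of this listing is not in the notes
lst = ["This is Delhi \n", "This is Paris \n", "This is London \n"] # sample lines (assumed)
file1.writelines(lst)
file1.close()
print("Data is written into the file.")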
Output:
Data is written into the file.
Appending to a File in Python
In this example, a file named "myfile.txt" is initially opened in write mode ( "w" ) to write
lines of text. The file is then reopened in append mode ( "a" ), and "Today" is added to the
existing content. The output after appending is displayed using readlines . Subsequently, the
file is reopened in write mode, overwriting the content with "Tomorrow". The final output
after writing is displayed using readlines.
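file1 = open("myfile.txt", "w")
L = ["This is Delhi \n", "This is Paris \n", "This is London \n"]
file1.writelines(L)
file1.close()
# Append-adds at last
file1 = open("myfile.txt", "a") # append mode
file1.write("Today \n")
file1.close()
file1 = open("myfile.txt", "r")
print("Output of Readlines after appending")
print(file1.readlines())
print()
file1.close()
# Write-Overwrites
file1 = open("myfile.txt", "w") # write mode
file1.write("Tomorrow \n")
file1.close()
file1 = open("myfile.txt", "r")
print("Output of Readlines after writing")
print(file1.readlines())
print()
file1.close()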
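The same append and overwrite behaviour can be sketched with the with statement, which closes the file automatically even if an error occurs. A temporary directory is used here so that the example leaves no file behind; the variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "myfile.txt")

    # "w" creates the file (or truncates it) and writes from the start
    with open(path, "w") as f:
        f.writelines(["This is Delhi \n", "This is Paris \n"])

    # "a" keeps the existing content and adds to the end
    with open(path, "a") as f:
        f.write("Today \n")
    with open(path, "r") as f:
        after_append = f.readlines()

    # "w" again discards everything that was there before
    with open(path, "w") as f:
        f.write("Tomorrow \n")
    with open(path, "r") as f:
        after_write = f.readlines()

print("After appending:", after_append)
print("After writing:", after_write)
```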
Reading a CSV file
Reading from a CSV file is done using the reader object. The CSV file is opened as a text file
with Python’s built-in open() function, which returns a file object. In this example, we first
open the CSV file in read mode; the file object is then passed to csv.reader, and all further
operations use the resulting reader object. The code and a detailed explanation are given below.
# importing csv module
import csv
# csv file name
filename = "aapl.csv"
# initializing the titles and rows list
fields = []
rows = []
# reading csv file
with open(filename, 'r') as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile)
    # extracting field names through first row
    fields = next(csvreader)
    # extracting each data row one by one
    for row in csvreader:
        rows.append(row)
    # get total number of rows
    print("Total no. of rows: %d" % (csvreader.line_num))
# printing the field names
print('Field names are:' + ', '.join(field for field in fields))
The above example uses a CSV file named aapl.csv.
Run this program with the aapl.csv file in the same directory.
Let us try to understand this piece of code.
with open(filename, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
Here, we first open the CSV file in read mode. The file object is named csvfile.
The file object is passed to csv.reader, and the resulting reader object is saved
as csvreader.
fields = next(csvreader)
csvreader is an iterator. Hence, next() returns the current row and
advances the iterator to the next row. Since the first row of our CSV file contains the
headers (or field names), we save them in a list called fields.
for row in csvreader:
    rows.append(row)
Now, we iterate through the remaining rows using a for loop. Each row is appended to
a list called rows. If you print each row, you will find that it is simply a
list containing all the field values.
print("Total no. of rows: %d" % (csvreader.line_num))
csvreader.line_num is a counter that returns the number of lines read from the
file so far. Here that count includes the header line.
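Because aapl.csv may not be at hand, the steps above can be tried on a small throwaway file. In this sketch the file name sample.csv and its contents are hypothetical stand-ins, written to a temporary directory first and then read back with csv.reader.

```python
import csv
import os
import tempfile

# Hypothetical stand-in data (not real stock prices)
sample_rows = [
    ["Day", "Open", "Close"],
    ["day1", "10.0", "10.5"],
    ["day2", "10.5", "10.2"],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "sample.csv")

    # Write the sample file so the example is self-contained
    with open(filename, "w", newline="") as csvfile:
        csv.writer(csvfile).writerows(sample_rows)

    # Read it back: header first, then the remaining rows
    with open(filename, "r", newline="") as csvfile:
        csvreader = csv.reader(csvfile)
        fields = next(csvreader)
        rows = [row for row in csvreader]
        lines_read = csvreader.line_num

print("Field names are: " + ", ".join(fields))
print("Number of data rows:", len(rows))
print("Lines read (including the header):", lines_read)
```

Passing newline="" to open() is what the csv module documentation recommends, so that line endings inside the file are handled by the csv module itself.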
Reading CSV Files Into a Dictionary With csv
We can read a CSV file into a dictionary using the csv module in Python and
the csv.DictReader class. Here's an example:
Suppose we have an employees.csv file with the following content:
name,department,birthday_month
John Smith,HR,July
Alice Johnson,IT,October
Bob Williams,Finance,January
In this example, csv.DictReader reads each row of the CSV file as a dictionary where the
keys are the column headers, and the values are the corresponding values in each row. The
dictionaries are then appended to a list ( data_list in this case).
import csv
# Open the CSV file for reading
with open('employees.csv', mode='r') as file:
    # Create a CSV reader with DictReader
    csv_reader = csv.DictReader(file)
    # Initialize an empty list to store the dictionaries
    data_list = []
    # Iterate through each row in the CSV file
    for row in csv_reader:
        # Append each row (as a dictionary) to the list
        data_list.append(row)
# Print the list of dictionaries
for data in data_list:
    print(data)
Output:
{'name': 'John Smith', 'department': 'HR', 'birthday_month': 'July'}
{'name': 'Alice Johnson', 'department': 'IT', 'birthday_month': 'October'}
{'name': 'Bob Williams', 'department': 'Finance', 'birthday_month': 'January'}
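The benefit of dictionaries is that values can be looked up by column name rather than by position. The sketch below uses the same employee rows, held in an in-memory io.StringIO buffer instead of a file so that it runs anywhere; the name by_department is illustrative.

```python
import csv
import io

csv_text = (
    "name,department,birthday_month\n"
    "John Smith,HR,July\n"
    "Alice Johnson,IT,October\n"
    "Bob Williams,Finance,January\n"
)

# DictReader uses the first line as the keys of every row dictionary
data_list = list(csv.DictReader(io.StringIO(csv_text)))

# Access values by column name instead of by index
by_department = {row["department"]: row["name"] for row in data_list}

print(data_list[0]["name"])
print(by_department["IT"])
```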
In the university record file, the delimiter is not a comma but a semicolon. Also, the rows are
separated by two newlines instead of one. In such cases, we can specify the delimiter
and line terminator explicitly.
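A minimal sketch of specifying both settings, using an in-memory buffer and made-up records. Note that csv.writer honours lineterminator, whereas csv.reader recognises line endings on its own, so when reading the file back the blank lines appear as empty rows and are skipped here.

```python
import csv
import io

# Made-up records for illustration
records = [
    ["name", "branch", "year"],
    ["Asha", "CSE", "2"],
    ["Ravi", "ECE", "3"],
]

# Write with a semicolon delimiter and two newlines between rows
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=";", lineterminator="\n\n")
writer.writerows(records)
raw_text = buffer.getvalue()

# Read back with the same delimiter; skip the empty rows from the blank lines
buffer.seek(0)
reader = csv.reader(buffer, delimiter=";")
rows = [row for row in reader if row]

print(repr(raw_text))
print(rows)
```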
Reading CSV Files With Pandas
We can read CSV files with Pandas using the pandas.read_csv() function. Here's
an example:
Suppose we have an employees.csv file with the following content:
name,department,birthday_month
John Smith,HR,July
Alice Johnson,IT,October
Bob Williams,Finance,January
In this example, pd.read_csv() reads the CSV file into a Pandas DataFrame. The resulting
DataFrame can be used for various data manipulation and analysis tasks.
import pandas as pd
print(pd.read_csv('employees.csv')) # read the file into a DataFrame and display it
Output: