Introduction to Data Analytics
ITC 5201
Instructor: Parisa Pouladzadeh
Email: parisa.pouladzadeh@humber.ca
Pandas
• If the data is really large, you don’t want to print the entire DataFrame to your
output.
• The head(n) method returns the first n rows of the DataFrame. If n is not supplied,
the default is the first 5 rows.
• I like to run the head() method right after I read in the DataFrame to check that
everything was read in correctly.
• There is also a tail(n) method that returns the last n rows of the DataFrame.
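The bullets above can be sketched as follows. The DataFrame here is a hypothetical stand-in with made-up values (the names `student_id` and `score` are assumptions, not part of the course data); only `head()` and `tail()` come from the slide.

```python
import pandas as pd

# Hypothetical stand-in for a large dataset: 100 rows of made-up values.
df = pd.DataFrame({"student_id": range(1, 101), "score": range(100, 0, -1)})

first_five = df.head()     # no argument: first 5 rows
first_three = df.head(3)   # first 3 rows
last_two = df.tail(2)      # last 2 rows

print(first_three)
print(last_two)
```

Printing a few rows like this is usually enough to confirm that the column names and values were read in as expected.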
Basic Features
Think of this as a list
Column data types:
object = string
float64 = decimal
int64 = integer
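A minimal sketch of how to inspect these data types, assuming a small made-up DataFrame (the column names are hypothetical). Text columns have traditionally been reported as `object`; some recent pandas releases may report a dedicated string type instead, so the exact label for the text column can differ.

```python
import pandas as pd

# Made-up values, only to show one column of each kind.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],   # text
    "final": [30.5, 28.0, 33.5],    # decimals -> float64
    "absences": [0, 2, 1],          # whole numbers -> int64
})

# dtypes lists the data type of every column.
print(df.dtypes)
```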
Basic Features
column names
Slice/index through the index, which is usually numbers
Slicing a Series
Slice/index through the index, which is usually numbers
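As a rough illustration of slicing a Series through its index, here is a sketch with made-up scores. The variable names are assumptions; the point to note is that `.iloc` slices by position (end excluded) while `.loc` slices by label (end included).

```python
import pandas as pd

# Made-up scores; the default index is 0, 1, 2, ...
scores = pd.Series([31.0, 27.5, 35.0, 22.0, 29.5])

third = scores[2]             # look up the value whose index label is 2
middle = scores.iloc[1:4]     # positions 1, 2, 3 (end position excluded)
by_label = scores.loc[1:3]    # labels 1, 2, 3 (end label included)

print(third)
print(middle)
print(by_label)
```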
• There are a few ways to slice a data frame; we will use the .loc method.
first_row is a Series
Slicing a Data Frame
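A small sketch of `.loc` on a data frame, assuming a hypothetical gradebook with made-up values. Only `.loc` and the name `first_row` come from the slide; everything else is illustrative.

```python
import pandas as pd

# Hypothetical gradebook with made-up values.
grades = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy", "Dee"],
    "Midterm": [25, 31, 28, 19],
    "Final": [30, 27, 33, 24],
})

first_row = grades.loc[0]                         # one row comes back as a Series
top_rows = grades.loc[0:2]                        # rows 0 to 2, end label included
names_finals = grades.loc[:, ["Name", "Final"]]   # all rows, two columns
one_cell = grades.loc[1, "Final"]                 # a single value

print(type(first_row))
print(top_rows)
```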
We can also create a column as a function of another column. The Final was worth 36
points, so let’s create a column with each student’s percentage.
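One way this could look, assuming the exam column is called `Final` and the new column `Final_pct` (both names and all values are assumptions made for this sketch):

```python
import pandas as pd

# Made-up scores out of 36 points.
grades = pd.DataFrame({"Name": ["Ana", "Ben", "Cy"], "Final": [36, 27, 18]})

# The arithmetic is applied to the whole column at once.
grades["Final_pct"] = grades["Final"] / 36 * 100

print(grades)
```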
Deleting Columns
The Drop Method
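A minimal sketch of the drop method on a hypothetical gradebook (column names and values are made up). By default `drop` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Made-up gradebook for illustration.
grades = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Final": [36, 27, 18],
    "Final_pct": [100.0, 75.0, 50.0],
})

without_pct = grades.drop(columns=["Final_pct"])   # delete a column
without_first_row = grades.drop(index=0)           # delete a row by index label

print(without_pct)
```

To keep the result, assign it back, for example `grades = grades.drop(columns=["Final_pct"])`.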
"old_column_name": "new_column_name"
Changing Column Names
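The old-name-to-new-name mapping is what the `rename` method takes through its `columns` argument. A sketch with made-up data and a hypothetical new name:

```python
import pandas as pd

grades = pd.DataFrame({"Name": ["Ana", "Ben"], "Final": [36, 27]})

# Keys are the existing column names, values are the replacements.
renamed = grades.rename(columns={"Final": "Final_Exam"})

print(renamed.columns.tolist())
```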
Let’s say you have separate CSV files, one with the students who got an A and one with
everyone else, but you want to analyze everything together.
Concatenating DataFrames - Stacked
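A sketch of stacking two DataFrames with `pd.concat`. The two frames below stand in for the two CSV files and contain made-up rows; in practice each would come from `pd.read_csv`.

```python
import pandas as pd

# Stand-ins for the two CSV files (made-up rows).
a_students = pd.DataFrame({"Name": ["Ana", "Cy"], "Grade": ["A", "A"]})
others = pd.DataFrame({"Name": ["Ben", "Dee", "Eli"], "Grade": ["B", "C", "B"]})

# Stack the rows; ignore_index=True renumbers the index 0..n-1.
everyone = pd.concat([a_students, others], ignore_index=True)

print(everyone)
```

Without `ignore_index=True` each frame keeps its own index, so the labels 0 and 1 would appear twice in the result.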
We can replace the missing data with a true NaN (right now everything is just a string).
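A possible sketch of this step. The placeholder text `"missing"` is an assumption, since the slide does not show how missing entries are marked in the file; substitute whatever marker your data uses.

```python
import numpy as np
import pandas as pd

# Made-up data where every value, including the gap, was read in as a string.
raw = pd.DataFrame({"Name": ["Ana", "Ben", "Cy"], "Final": ["30", "missing", "33"]})

# Swap the placeholder string for a real NaN.
cleaned = raw.replace("missing", np.nan)

# The remaining values are still strings, so convert the column to numbers.
cleaned["Final"] = pd.to_numeric(cleaned["Final"])

print(cleaned)
```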
isnull() Method
• The isnull() method lets you check where the NaNs are:
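For instance, with a small made-up table (column names are assumptions), `isnull()` returns a table of True/False values, and summing it counts the NaNs per column:

```python
import numpy as np
import pandas as pd

# Made-up scores with two gaps.
scores = pd.DataFrame({
    "Midterm": [25.0, np.nan, 28.0],
    "Final": [30.0, 27.0, np.nan],
})

mask = scores.isnull()            # True wherever a value is NaN
missing_per_column = mask.sum()   # True counts as 1

print(mask)
print(missing_per_column)
```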
dropna() Method
Rather than getting rid of rows/columns, we can fill the “holes” in a number of ways.
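The two approaches can be compared in a short sketch. The data is made up, and `fillna` is the pandas method assumed here for filling the holes (the slide text does not name it).

```python
import numpy as np
import pandas as pd

# Made-up scores with two gaps.
scores = pd.DataFrame({
    "Midterm": [25.0, np.nan, 28.0],
    "Final": [30.0, 27.0, np.nan],
    "Quiz": [9.0, 8.0, 10.0],
})

complete_rows = scores.dropna()          # drop every row that has a NaN
complete_cols = scores.dropna(axis=1)    # drop every column that has a NaN

filled_zero = scores.fillna(0)               # fill holes with a constant
filled_mean = scores.fillna(scores.mean())   # fill holes with each column's mean

print(complete_rows)
print(filled_mean)
```

Which fill value is appropriate depends on the analysis; a zero and a column mean can lead to quite different summary statistics.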
Introduction to Data
Analytics
ITE 5201
Lecture4-Data Visualization
Instructor: Parisa Pouladzadeh
Email: parisa.pouladzadeh@humber.ca
Data visualization
Data Showcasing
◦ For presentations to analysts, scientists, mathematicians, and engineers
Data Art
◦ For presentations to activists or to the general public
◦ Make it easy for the audience to get the point. Your data visualization
should be:
• Clutter-free
• Highly organized
◦ Audience:
• Nonanalysts
• Nontechnical business managers
◦ Product types:
• Static images
• Simple interactive dashboards
Line graphs can also be used to compare changes over the same
period of time for more than one group.
Pie charts are best to use when you are trying to compare parts of
a whole. They do not show changes over time.
Bar graphs are used to compare things between different groups
or to track changes over time.
Seaborn
◦ For example, when a time series shows an upward pattern, we call it an
uptrend;
◦ when the pattern is downward, we call it a downtrend;
◦ if there is no trend at all, we call it a horizontal or stationary trend.
◦ A trend lasts for a period of time and then disappears, whereas seasonality keeps
recurring within a fixed time period.
◦ For example, when it’s Christmas, you discover more candies and chocolates are sold and this
keeps happening every year.
◦ A very good example is the case of Ebola. During that period, there
was a massive demand for hand sanitizers.
Lecture 4
Matplotlib Overview
Importing Library
You'll also need to use this line to see plots in the notebook:
That line is only for Jupyter notebooks. If you are using another editor, call plt.show() at the end of all your plotting commands to have the figure pop up
in another window.
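The code for these two steps did not survive in the printout, so the following is only a sketch of the usual setup. The alias `plt` matches the later cells; the `%matplotlib inline` magic is an assumption about which notebook line was meant. The `matplotlib.use("Agg")` line is also an assumption, added only so the sketch runs on machines without a display; delete it to get a pop-up window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt

# In a Jupyter notebook you would typically run this magic once instead
# (assumed to be the notebook-only line referred to above):
# %matplotlib inline

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])

# Outside Jupyter, call show() after all plotting commands.
plt.show()
```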
In [73]: x
In [74]: y
Out[74]: array([ 0, 4, 16, 36, 64, 100, 144, 196, 256, 324], dtype=int32)
localhost:8888/notebooks/Lecture3-Part2.ipynb# 1/11
1/26/23, 9:12 AM Lecture3-Part2 - Jupyter Notebook
In [76]: #Example
         x = range(1,10)
         y = [1,2,3,4,0,4,3,2,1]
Now we want to draw a line chart of a real dataset. First, download the dataset from Blackboard, then change the file path in the code to the location of your
downloaded dataset.
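The loading cell is missing from the printout, so here is a hedged sketch of what it might look like. The names `cars` and `mpg` and the columns `mpg`, `cyl` and `wt` are taken from the later cells; the in-memory CSV text and its rows are placeholders so the example runs on its own, not the course dataset.

```python
import io
import pandas as pd

# Placeholder rows standing in for the downloaded file (not the real dataset).
csv_text = """car_names,mpg,cyl,wt
car_a,21.0,6,2.62
car_b,22.8,4,2.32
car_c,18.7,8,3.44
"""

# In practice, pass the path of your downloaded file instead, for example:
# cars = pd.read_csv(r"C:\path\to\your\dataset.csv")   # hypothetical path
cars = pd.read_csv(io.StringIO(csv_text))

# A single column is a Series, which is what mpg.plot() draws.
mpg = cars["mpg"]

print(cars.head())
```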
In [78]: mpg.plot()
Out[78]: <AxesSubplot:>
In [79]: mpg.plot()
         plt.xlabel('number of cars')
         plt.ylabel('mpg')
         plt.title('statistic')
         plt.show()
In [80]: df = cars[['cyl','wt','mpg']]
         df.plot()
Out[80]: <AxesSubplot:>
To begin, we create a figure instance. Then we can add axes to that figure:
The code is a little more complicated, but the advantage is that we now have full control of where the plot axes are placed, and we can easily add more than one
axis to the figure:
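The cells for this part are missing, so the following is a sketch under assumptions rather than the original code. It uses `x` and `y` as defined later in the notebook; the axes names and the position values are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
y = x ** 2

# Create an empty figure (a blank canvas).
fig = plt.figure()

# add_axes takes [left, bottom, width, height] as fractions of the figure.
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])    # large axes
inset_ax = fig.add_axes([0.2, 0.5, 0.4, 0.3])   # smaller axes placed on top

main_ax.plot(x, y, "b")
main_ax.set_xlabel("x")
main_ax.set_ylabel("y")
main_ax.set_title("main axes")

inset_ax.plot(y, x, "r")
inset_ax.set_title("inset axes")
```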
subplots()
Here we will have two subplots, ax1 and ax2, arranged in one row and two columns, so we will write ax1.plot(x) and ax2.plot(x, y).
You can specify the number of rows and columns when creating the subplots() object:
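A short sketch of that layout, using the `ax1` and `ax2` names from the text and the `x` and `y` arrays defined later in the notebook. The `figsize` value and the `tight_layout()` call are additions for readability, not part of the original.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
y = x ** 2

# One row, two columns: subplots() returns the figure and one axes per cell.
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

ax1.plot(x)       # x plotted against its position 0, 1, 2, ...
ax2.plot(x, y)    # y plotted against x

fig.tight_layout()  # reduce overlap between neighbouring subplots
```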
Saving figures
Matplotlib can generate high-quality output in a number of formats, including PNG, JPG, EPS, SVG, PGF and PDF.
To save a figure to a file we can use the savefig method in the Figure class:
In [87]: fig.savefig("filename.png")
In [88]: %pwd
Out[88]: 'C:\\Users\\prpou'
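As a self-contained sketch, the same call can be tried with a throwaway figure. A relative name such as "filename.png" is written to the current working directory, which is what `%pwd` reports; here a temporary directory is used instead (an assumption made so the example does not leave files behind), and the `dpi` value is illustrative.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Temporary output directory, used only for this sketch.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "filename.png")
pdf_path = os.path.join(out_dir, "filename.pdf")

fig.savefig(png_path, dpi=200)   # the format is inferred from the extension
fig.savefig(pdf_path)

print(png_path)
```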
Line types
In [89]: import numpy as np
         x = np.linspace(0, 5, 11)
         y = x ** 2
In [90]: x
In [91]: y
Line colors
In [93]: fig, ax = plt.subplots()

         ax.plot(x, x+1, color="blue", alpha=0.5) # half-transparent
         ax.plot(x, x+2, color="#8B008B") # RGB hex code
         ax.plot(x, x+3, color="#FF8C00") # RGB hex code
Example:
In [94]: fig, ax = plt.subplots(figsize=(12,6))

         ax.plot(x, x+1, color="red", linewidth=0.25)
         ax.plot(x, x+2, color="red", linewidth=0.50)
         ax.plot(x, x+3, color="red", linewidth=1.00)
         ax.plot(x, x+4, color="red", linewidth=2.00)

         # possible linestyle options: '-', '--', '-.', ':' (for step plots use drawstyle='steps')
         ax.plot(x, x+5, color="green", lw=3, linestyle='-')
         ax.plot(x, x+6, color="green", lw=3, ls='-.')
         ax.plot(x, x+7, color="green", lw=3, ls=':')

         # custom dash
         line, = ax.plot(x, x+8, color="black", lw=1.50)
         line.set_dashes([5, 10, 15, 10]) # format: line length, space length, ...

         # possible marker symbols: marker = '+', 'o', '*', 's', ',', '.', '1', '2', '3', '4', ...
         ax.plot(x, x+ 9, color="blue", lw=3, ls='-', marker='+')
         ax.plot(x, x+10, color="blue", lw=3, ls='--', marker='o')
         ax.plot(x, x+11, color="blue", lw=3, ls='-', marker='s')
         ax.plot(x, x+12, color="blue", lw=3, ls='--', marker='1')

         # marker size and color
         ax.plot(x, x+13, color="purple", lw=1, ls='-', marker='o', markersize=2)
         ax.plot(x, x+14, color="purple", lw=1, ls='-', marker='o', markersize=4)
         ax.plot(x, x+15, color="purple", lw=1, ls='-', marker='o', markersize=8, markerfacecolor="red")
         ax.plot(x, x+16, color="purple", lw=1, ls='-', marker='s', markersize=8,
                 markerfacecolor="yellow", markeredgewidth=3, markeredgecolor="green");
In [21]: plt.bar(x,y)
In [85]: mpg.plot(kind="bar")
In [62]: mpg.plot(kind="barh")
In [64]: x = [1,2,3,4,0.5]
         plt.pie(x)
         plt.show()
In [65]: plt.pie(x)
         plt.savefig('pie_chart.png')
         plt.show()
In [66]: %pwd
Out[66]: 'C:\\Users\\prpou'
Pie chart-Labels
In [60]: import matplotlib.pyplot as plt

         y = [35, 25, 25, 15]
         mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

         plt.pie(y, labels = mylabels)
         plt.show()
Pie chart-Explode
In [63]: myexplode = [0.2, 0, 0, 0]

         plt.pie(y, labels = mylabels, explode = myexplode)
         plt.show()
Pie chart-Shadow
In [65]: plt.pie(y, labels = mylabels, explode = myexplode, shadow = True)
         plt.show()
Pie chart-Colors
https://www.w3schools.com/colors/colors_hexadecimal.asp
https://www.w3schools.com/colors/colors_names.asp
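The cell for this section is missing from the printout. A sketch of how wedge colors might be set is shown below; the `colors` argument accepts color names, hex codes and single-letter shortcuts, and the particular values in `mycolors` are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt

y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

# One color per wedge: a name, another name, a hex code, a one-letter shortcut.
mycolors = ["black", "hotpink", "#4CAF50", "b"]   # illustrative choices

fig, ax = plt.subplots()
ax.pie(y, labels=mylabels, colors=mycolors)
```

In a plain script, add `plt.show()` at the end as in the other cells.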
Pie chart-Legend
In [67]: plt.pie(y, labels = mylabels)
         plt.legend()
         plt.show()