ELE492 - ELE492 - Image Process Lecture Notes 5

This document provides an overview of data science libraries in Python for analyzing data, including Pandas, NumPy, and Matplotlib. It discusses what each library is used for and provides examples of how to create data frames and arrays and manipulate data using these libraries. Key terms discussed include data frames, series, indexing, slicing, and broadcasting.

ELE 492: Image Processing

Assoc. Prof. Seniha Esen Yüksel

Lecture 5: Data Science Libraries

Hacettepe University
Department of Electrical and Electronics Engineering

** Course slides are courtesy of Fuat Akal, Erkut Erdem, and Aykut Erdem, from the lecture notes of BBM 101
1
Lecture Overview
• Introduction to Data Science
– Data, Data Science, Data Scientist…
• Python Libraries to Analyse Data
– Pandas
– Numpy
– Matplotlib

Disclaimer: Much of the material and slides for this lecture were borrowed from
- IBM Courses at Coursera, https://www.coursera.org/professional-certificates/ibm-data-science
- CS109 Data Science course at Harvard University, by Rafael A. Irizarry and Verena Kaynig-Fittkau.
- Python Numpy Tutorial by Justin Johnson.
2
What is Data?

3
What is Data Science?
• Data science is the study of data.

• It involves developing methods of recording, storing, and analyzing data to effectively extract useful information.

• The goal of data science is to gain insights and knowledge from any type of data, both structured and unstructured.

4
Everybody is Talking about Data

5
Why Data Science Now?
• We are producing more and more data every minute via
– Sensors
– Video surveillance cameras
– Web browsing
– Medical instruments
– …

• The biggest data source we have today is the Internet
– Currently at the scale of exabytes

• Getting insights out of data is crucial, as we want to
– Build better football teams
– Sell more products
– Avoid fraud
– Find treatments
– …

6
Python Libraries for Machine Learning

7
Python Libraries to Analyse Data
• Pandas
– Provides data structures and operations for manipulating and analyzing data (e.g. tables and time series).

• Numpy
– Provides means to work with multidimensional arrays.

• Matplotlib
– A plotting library used to create high-quality graphs,
charts, and figures.

8
Pandas
• A library that contains high-performance, easy-to-use data structures and data analysis tools.

• Some important aspects of Pandas
– A fast and efficient DataFrame object for data manipulation with integrated indexing.
– Tools for reading and writing data in different formats, e.g. CSV, Excel, SQL databases.
– Slicing, indexing, subsetting, merging and joining of huge datasets.

• Typically imported as import pandas as pd in Python programs
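
The merging and joining mentioned in the list above can be sketched with two small tables. The tables `scores` and `labs` and the column `lab_section` are made up for illustration; only `pd.merge` and its `on` and `how` parameters come from pandas.

```python
import pandas as pd

# Hypothetical tables that share a 'name' column
scores = pd.DataFrame({'name': ['Fuat', 'Aykut', 'Erkut'],
                       'midterm': [60, 85, 100]})
labs = pd.DataFrame({'name': ['Aykut', 'Erkut'],
                     'lab_section': [1, 2]})

# Inner join: keep only the names present in both tables
inner = pd.merge(scores, labs, on='name', how='inner')

# Left join: keep every row of scores; unmatched rows get NaN
left = pd.merge(scores, labs, on='name', how='left')

print(inner)
print(left)
```

With a left join, Fuat stays in the result but has no lab section, which is usually the safer choice when no student should silently disappear.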

9
Create DataFrames using Dictionaries
import pandas as pd
data = { 'name': ['Fuat', 'Aykut', 'Erkut'],
'midterm': [60, 85, 100],
'final': [69, 90, 100],
'attendance': [6, 10, 10]
}
df_bbm101 = pd.DataFrame(data)

print(df_bbm101.head()) # Prints the first 5 rows (here, all 3)

    name  midterm  final  attendance
0   Fuat       60     69           6
1  Aykut       85     90          10
2  Erkut      100    100          10

10
Same Thing, in Another Way
names = ['Fuat', 'Aykut', 'Erkut']
midterms = [60, 85, 100]
finals = [69, 90, 100]
attendances = [6, 10, 10]

list_labels = ['name', 'midterm', 'final', 'attendance']

list_cols = [names, midterms, finals, attendances]

zipped = list(zip(list_labels, list_cols))

print(zipped) # [('name', ['Fuat', 'Aykut', 'Erkut']),
              #  ('midterm', [60, 85, 100]),
              #  ('final', [69, 90, 100]),
              #  ('attendance', [6, 10, 10])]

data = dict(zipped)

df_bbm101 = pd.DataFrame(data)

11
Broadcasting
df_bbm101['total'] = 0
# Adds new column to df and
# broadcasts 0 to entire column

print(df_bbm101.head())

name midterm final attendance total

0 Fuat 60 69 6 0
1 Aykut 85 90 10 0
2 Erkut 100 100 10 0

12
Compute Columns
df_bbm101['total'] = df_bbm101['midterm']*0.3 + \
                     df_bbm101['final']*0.6 + \
                     df_bbm101['attendance']*0.1

df_bbm101.loc[(df_bbm101['total'] >= 60) &
              (df_bbm101['total'] < 70), 'grade'] = 'D'
# … Code to compute Bs and Cs comes here
df_bbm101.loc[df_bbm101['total'] >= 90, 'grade'] = 'A'

print(df_bbm101.head())

    name  midterm  final  attendance  total grade
0   Fuat       60     69           6   60.0     D
1  Aykut       85     90          10   80.5     B
2  Erkut      100    100          10   91.0     A
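
As an alternative to one `.loc` assignment per letter, all grades can be assigned in a single step with `pd.cut`. This is only a sketch: the slide gives the D range (60 to 70) and the A threshold (90), while the F, C and B boundaries below are assumptions chosen to be consistent with the printed table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Fuat', 'Aykut', 'Erkut'],
                   'midterm': [60, 85, 100],
                   'final': [69, 90, 100],
                   'attendance': [6, 10, 10]})
df['total'] = df['midterm']*0.3 + df['final']*0.6 + df['attendance']*0.1

# ASSUMED cut-offs: only the D range and the A threshold appear on the slide
bins = [-np.inf, 60, 70, 80, 90, np.inf]
labels = ['F', 'D', 'C', 'B', 'A']

# right=False makes each bin closed on the left, e.g. [60, 70) -> 'D'
df['grade'] = pd.cut(df['total'], bins=bins, labels=labels, right=False)
print(df[['name', 'total', 'grade']])
```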

13
Subsetting/Slicing Data
print(df_bbm101[['name', 'grade']])

print(df_bbm101.iloc[:, [0, 5]])

print(df_bbm101.iloc[:, [True, False, False, False, False, True]])

# They all return the same thing:
# the name and grade columns of the df.
# The same principle can be applied to rows as well.

name grade
0 Fuat D
1 Aykut B
2 Erkut A
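
The remark that the same principle applies to rows can be sketched as follows. The small frame is rebuilt here so the snippet runs on its own; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Fuat', 'Aykut', 'Erkut'],
                   'grade': ['D', 'B', 'A']})

by_position = df.iloc[[0, 2]]              # rows at positions 0 and 2
by_mask = df.iloc[[True, False, True]]     # boolean list, one entry per row
by_condition = df[df['grade'] != 'B']      # boolean Series built from a condition

print(by_position)
```

All three expressions select the rows for Fuat and Erkut.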

14
DataFrames from CSV Files

file name: bbm101.csv

df_bbm101 = pd.read_csv('bbm101.csv')
print(df_bbm101.head())

name midterm final attendance total grade

0 Fuat 60 69 6 60.0 D
1 Aykut 85 90 10 80.5 B
2 Erkut 100 100 10 91.0 A
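
To try reading a CSV without having `bbm101.csv` on disk, a round trip through an in-memory buffer can be used. This is a minimal sketch: `io.StringIO` stands in for the file, and the two-column frame is a shortened version of the table above.

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['Fuat', 'Aykut', 'Erkut'],
                   'midterm': [60, 85, 100]})

buffer = io.StringIO()            # stands in for a file such as bbm101.csv
df.to_csv(buffer, index=False)    # index=False: do not write the 0, 1, 2 row labels
buffer.seek(0)                    # rewind before reading

df_back = pd.read_csv(buffer)
print(df_back.head())
```

Without `index=False`, the row labels are written as an extra unnamed column, which then shows up when the file is read back.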

15
Indexing DataFrames
df_bbm101 = pd.read_csv('bbm101.csv', index_col='name')
print(df_bbm101.head())

       midterm  final  attendance  total grade
name
Fuat        60     69           6   60.0     D
Aykut       85     90          10   80.5     B
Erkut      100    100          10   91.0     A

print(df_bbm101.loc['Fuat'])

midterm         60
final           69
attendance       6
total         60.0
grade            D
Name: Fuat, dtype: object

print(df_bbm101.loc[['Aykut', 'Erkut']])

       midterm  final  attendance  total grade
name
Aykut       85     90          10   80.5     B
Erkut      100    100          10   91.0     A

16
Numpy
• A library for the Python programming language, adding support for large multi-dimensional arrays and matrices,
– along with a large collection of high-level mathematical functions to operate on these arrays.

• A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers.

• The number of dimensions is the rank of the array.

• The shape of an array is a tuple of integers giving the size of the array along each dimension.

• Typically imported as import numpy as np in Python programs
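
The terms rank, shape and "same type" can be checked directly on a small array. A short sketch, using only standard `ndarray` attributes:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.ndim)    # 2: number of dimensions (the rank)
print(a.shape)   # (2, 3): 2 rows, 3 columns
print(a[1, 2])   # 6: indexed by the tuple (1, 2)

# Mixing ints and floats still yields one common element type
b = np.array([1, 2.5])
print(b.dtype)   # float64
```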

17
Creating Numpy Arrays
import numpy as np

a = np.array([1,2,3]) # Create a rank 1 array

print(type(a)) # <class 'numpy.ndarray'>
print(a.shape) # (3,)
print(a) # [1 2 3]
print(a[0], a[1], a[2]) # 1 2 3

b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array

print(b.shape) # (2, 3)
print(b) # [[1 2 3]
# [4 5 6]]
print(b[0, 0], b[0, 1], b[1, 0]) # 1 2 4

18
Miscellaneous Ways to Create Arrays
a = np.zeros((2,2)) # Create an array of all zeros
print(a) # [[ 0. 0.]
# [ 0. 0.]]

b = np.ones((1,2)) # Create an array of all ones

print(b) # [[ 1. 1.]]

c = np.full((2,2), 7) # Create a constant array

print(c) # [[7 7]
# [7 7]]

d = np.eye(2) # Create a 2x2 identity matrix

print(d) # [[ 1. 0.]
# [ 0. 1.]]

e = np.random.random((2,2)) # Create an array filled with random values
print(e) # Might print
# [[ 0.91940167 0.08143941]
# [ 0.68744134 0.87236687]]
19
Indexing Arrays
• Slicing
• Integer Indexing
• Boolean (or, Mask) Indexing

20
Slicing
• Similar to slicing Python lists.
• Since arrays may be multidimensional, you must
specify a slice for each dimension of the array.
• Slices are views (not copies) of the original data.
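
That slices are views has a practical consequence: writing into a slice changes the original array. The sketch below shows this and how `.copy()` avoids it.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

b = a[:2, 1:3]          # view of rows 0-1, columns 1-2
b[0, 0] = 77            # writes through to a[0, 1]
print(a[0, 1])          # 77

c = a[:2, 1:3].copy()   # explicit copy: independent of a
c[0, 0] = -1
print(a[0, 1])          # still 77
```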

21
Slicing Examples
a = np.array([[1, 2, 3, 4], # Create a rank 2 array
[5, 6, 7, 8], # with shape (3, 4)
[9, 10, 11, 12]])

print(a) # [[ 1 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]

b = a[:2, 1:3]
print(b) # [[2 3]
# [6 7]]

print(a[1, :]) # [5 6 7 8]

print(a[:, :-2]) # [[ 1 2]
# [ 5 6]
# [ 9 10]]

22
Integer Indexing
• NumPy arrays may be indexed with other arrays.
• Index arrays must be of integer type.
• Each value in the index array selects the element at that position in the indexed array.
• Returns a copy of the original data.
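
In contrast to slicing, the result of integer indexing does not share data with the source array. A minimal sketch, with made-up values:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
idx = np.array([0, 2, 4])     # integer index array

picked = a[idx]               # new array holding a[0], a[2], a[4]
picked[0] = -1                # modifies the copy only
print(picked)                 # [-1 30 50]
print(a)                      # [10 20 30 40 50]
```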

23
Integer Indexing Examples
a = np.array([1, 2, 3, 4, 5, 6])
print(a) # [1 2 3 4 5 6]
print(a[[1, 3, 5]]) # [2 4 6]

a = np.array([[1, 2], [3, 4], [5, 6]])

print(a) # [[ 1 2 ]
# [ 3 4 ]
# [ 5 6 ]]

# The returned array will have shape (3,)

print(a[[0, 1, 2], [0, 1, 0]]) # [1 4 5]
print(np.array([a[0, 0], a[1, 1], a[2, 0]])) # [1 4 5]

# The same element from the source array can be reused

print(a[[0, 0], [1, 1]]) # [2 2]
print(np.array([a[0, 1], a[0, 1]])) # [2 2]

24
Boolean (or, Mask) Indexing
• Boolean array indexing lets you pick out arbitrary
elements of an array.
• Frequently used to select the elements of an array
that satisfy some condition.
– Hence it is also called mask indexing.

25
Boolean (or, Mask) Indexing Examples
a = np.array([1, 2, 3, 4, 5, 6])

bool_idx = (a > 2)
# Find the elements of a that are bigger than 2;
# this returns a numpy array of Booleans of the same
# shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.

print(bool_idx) # [False False True True True True]

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True
# values of bool_idx
print(a[bool_idx]) # [3 4 5 6]

# We can do all of the above in a single concise statement:

print(a[a > 2]) # [3 4 5 6]
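
Masks can also be combined and used on the left-hand side of an assignment. The following sketch goes slightly beyond the slide and uses the same array `a`:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
mask = (a > 2) & (a % 2 == 0)
selected = a[mask]
print(selected)         # [4 6]

# A mask on the left-hand side assigns to the selected elements in place
a[a > 4] = 0
print(a)                # [1 2 3 4 0 0]
```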
26
Array Math
• Basic mathematical functions operate elementwise
on arrays.
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Elementwise sum
print(x + y)
print(np.add(x, y))
# [[ 6 8]
# [10 12]]

# Elementwise product
print(x * y)
print(np.multiply(x, y))
# [[ 5 12]
# [21 32]]

# The same principle holds for "np.divide, /" and "np.subtract, -"
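
For completeness, subtraction and division on the same two arrays look like this. The use of `np.sqrt` is an added illustration of a mathematical function applied elementwise.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

diff = y - x              # same as np.subtract(y, x)
print(diff)
# [[4 4]
#  [4 4]]

ratio = x / y             # same as np.divide(x, y); the result is float
print(ratio)

roots = np.sqrt(x)        # square root of every element
print(roots)
```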

27
Array Math (Cont’d)
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce a rank 2 array
# [[19 22]
# [43 50]]
print(x.dot(y))
print(np.dot(x, y))

# Transpose of x
# [[1 3]
# [2 4]]
print(x.T)
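
A common source of confusion is that `*` is the elementwise product, not the matrix product. The sketch below contrasts the two and uses the `@` operator, which for these 1-D and 2-D arrays gives the same result as `np.dot`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])

elementwise = x * y      # multiplies matching entries
matrix = x @ y           # rows of x times columns of y
matvec = x @ v           # matrix / vector product

print(elementwise)       # [[ 5 12]
                         #  [21 32]]
print(matrix)            # [[19 22]
                         #  [43 50]]
print(matvec)            # [29 67]
```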

28
Matplotlib
• Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments.

• Typically imported as import matplotlib.pyplot as plt in Python programs.

• Pyplot is a module of Matplotlib which provides simple functions to add plot elements like lines, images, text, etc.

• There are many plot types. Some are used more frequently than others.
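
The "hardcopy formats" mentioned above refer to saving a figure to a file instead of showing it in a window. A minimal sketch, assuming no display is available: the `Agg` backend renders off-screen, and an in-memory buffer stands in for a file name such as `plot.png` (a hypothetical name).

```python
import io
import matplotlib
matplotlib.use('Agg')            # non-interactive backend: render without a window
import matplotlib.pyplot as plt

plt.plot([0, 1, 2, 3], [0, 1, 4, 9], 'o-')   # circle markers joined by a line
plt.title('y = x squared')
plt.xlabel('x')
plt.ylabel('y')

png = io.BytesIO()               # stands in for a file name such as 'plot.png'
plt.savefig(png, format='png')
plt.close()
png_bytes = png.getvalue()
print(len(png_bytes), 'bytes written')
```

Passing a file name ending in `.pdf` or `.svg` to `plt.savefig` selects those formats instead.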

29
Why Build Visuals?
• For exploratory data analysis
• Communicate data clearly
• Share an unbiased representation of data
• A picture is worth a thousand words

30
Make a Simple Plot
import matplotlib.pyplot as plt

plt.plot(5, 5, 'o')

plt.title("Plot a Point")

plt.xlabel("X")
plt.ylabel("Y")

plt.show()

31
Plot a Simple Line
import matplotlib.pyplot as plt

year = ['2016', '2017', '2018', '2019', '2020']

lowest_rank = [21358, 20816, 17555, 11743, 7500]

plt.plot(year, lowest_rank)

plt.title("HU-BBM Progress")
plt.xlabel('Year')
plt.ylabel('Lowest Rank')

plt.show()

32
Dataset to Use for the Rest of This Section
• The Population Division of the United Nations compiled data pertaining to 45 countries.

• For each country, annual data on the flows of international migrants is reported in addition to other metadata.

• We will work with data on Canada.

• You can get the original data at:
– https://www.un.org/en/development/desa/population/migration/data/empirical2/migrationflows.asp#

33
Immigration Data to Canada

34
Read Data into Pandas Dataframe
# sheet_name and skipfooter are the current pandas parameter names
df = pd.read_excel(
    'http://www.un.org/…/Canada.xlsx',
    sheet_name='Canada by Citizenship',
    skiprows=range(20),
    skipfooter=2)

print(df.head())

35
After Little Preprocessing

In case you want to try:

df_canada = df.drop(columns=['Type', 'Coverage', 'AREA', 'REG', 'DEV'])
df_canada.rename(columns={'OdName':'Country', 'AreaName':'Continent',
                          'RegName':'Region'}, inplace=True)
df_canada.set_index('Country', inplace=True)
# Sum only the numeric (year) columns; Continent and Region are text
df_canada['Total'] = df_canada.sum(axis=1, numeric_only=True)
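
To try these steps without downloading the spreadsheet, the same calls can be run on a tiny stand-in table. Every value below is made up; only the column names follow the slide, and the table has two year columns instead of 34.

```python
import pandas as pd

# Made-up rows that only mimic the column layout used on the slide
df = pd.DataFrame({
    'Type': ['Immigrants', 'Immigrants'],
    'Coverage': ['Foreigners', 'Foreigners'],
    'OdName': ['Haiti', 'Iceland'],
    'AREA': [1, 2],
    'AreaName': ['Continent X', 'Continent Y'],
    'REG': [10, 20],
    'RegName': ['Region X', 'Region Y'],
    'DEV': [100, 200],
    1980: [100, 1],
    1981: [200, 2],
})

df_canada = df.drop(columns=['Type', 'Coverage', 'AREA', 'REG', 'DEV'])
df_canada = df_canada.rename(columns={'OdName': 'Country',
                                      'AreaName': 'Continent',
                                      'RegName': 'Region'})
df_canada = df_canada.set_index('Country')

# numeric_only=True skips the text columns (Continent, Region)
df_canada['Total'] = df_canada.sum(axis=1, numeric_only=True)
print(df_canada)
```

Reassigning the result of `rename` and `set_index` is equivalent to `inplace=True` here and makes each step easier to inspect.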

36
Line Plots
A line plot displays information as a series of data points called
‘markers’ connected by straight line segments.

years = list(range(1980, 2014))

df_canada.loc['Haiti', years].plot(kind = 'line')

plt.title('Immigration from Haiti')

plt.xlabel('Years')
plt.ylabel('Number of Immigrants')

plt.show()

37
Area Plots
Commonly used to represent cumulated totals using numbers or percentages over time.

df_canada.sort_values(['Total'], ascending=False,
axis=0, inplace=True)
df_top5 = df_canada.head()
df_top5 = df_top5[years].transpose()
df_top5.plot(kind='area')

plt.title('Immigration trend of top 5 countries')

plt.xlabel('Years')
plt.ylabel('Number of Immigrants')

plt.show()

38
Histogram
A histogram is a way of representing the frequency distribution of a variable.
df_canada[2013].plot(kind='hist')

plt.title('Histogram of Immigration in 2013')

plt.xlabel('Number of Immigrants')
plt.ylabel('Number of Countries')

plt.show()
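
What a histogram counts can be seen without plotting by calling `np.histogram` directly. The values below are made up, and 3 bins are used only to keep the output short.

```python
import numpy as np

# Made-up yearly counts for eight hypothetical countries
immigrants = np.array([5, 12, 40, 45, 90, 95, 99, 300])

# Split the value range into 3 equal-width bins and count the entries per bin
counts, bin_edges = np.histogram(immigrants, bins=3)
print(counts)      # number of countries in each bin
print(bin_edges)   # 4 edges delimit 3 bins
```

Seven of the eight values fall into the first bin, which is the typical shape for this kind of data: many countries with few immigrants and a few with very many.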

39
Bar Chart
Unlike a histogram, a bar chart is commonly used to compare the values of a variable at a given point.

df_iceland = df_canada.loc['Iceland', years]

df_iceland.plot(kind='bar')

plt.title('Icelandic Immigrants to Canada from 1980 to 2013')

plt.xlabel('Year')
plt.ylabel('Number of Immigrants')

plt.show()
40
Pie Chart
A pie chart is a circular statistical graphic divided into slices to
illustrate numerical proportion.

df_continents = df_canada.groupby('Continent').sum(numeric_only=True)

df_continents['Total'].plot(kind='pie')

plt.title('Immigration to Canada by Continent')

plt.show()
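
The proportions that the pie slices represent can be computed explicitly from the grouped totals. A small sketch on made-up numbers (the countries 'A' to 'D' and their totals are hypothetical):

```python
import pandas as pd

# Hypothetical countries with made-up totals
totals = pd.DataFrame({'Continent': ['Asia', 'Asia', 'Europe', 'Africa'],
                       'Total': [50, 30, 15, 5]},
                      index=['A', 'B', 'C', 'D'])

by_continent = totals.groupby('Continent')['Total'].sum()
share = by_continent / by_continent.sum() * 100   # percentage per slice
print(share)
```

Passing `autopct='%1.1f%%'` to the pie plot call should print such percentages on the slices.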
