Data Analysis & Visualization
NumPy Basics
NumPy Agenda
• NumPy Intro
• Creating Arrays
• NumPy Array Indexing
• NumPy Array Slicing
• NumPy Data Types
• NumPy Copy vs View
• NumPy Array Shape
• NumPy Array Reshape
• NumPy Array Join
• NumPy Array Sort
• NumPy Array Filter
Creating NumPy Arrays
array([[[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]]])
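One way to build the array shown above is sketched here. The variable names are illustrative, not taken from the slides.

```python
import numpy as np

# Build a 3x3x3 array: the row [1, 2, 3] repeated three times
# per block, in three blocks.
row = [1, 2, 3]
arr = np.array([[row, row, row]] * 3)

print(arr.shape)  # (3, 3, 3)
print(arr.ndim)   # 3
print(arr.size)   # 27
```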
NumPy Arrays Shape
NumPy Arrays Transpose
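A minimal sketch of transposing, using a small made-up matrix: the transpose swaps the axes, so a (2, 3) array becomes (3, 2).

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                 #  [3 4 5]]
t = m.T                          # same result as m.transpose()

print(m.shape)  # (2, 3)
print(t.shape)  # (3, 2)
print(t[2, 0])  # 2, because t[i, j] == m[j, i]
```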
NumPy Array Indexing
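The common indexing forms can be sketched as follows, on an illustrative 3x4 array (values are made up for the example).

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

first_row = a[0]       # [1 2 3 4]
element = a[1, 2]      # 7 (row 1, column 2)
last_col = a[:, -1]    # [ 4  8 12]
block = a[:2, 1:3]     # [[2 3]
                       #  [6 7]]
big = a[a > 8]         # boolean mask: [ 9 10 11 12]
```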
Basic array operations
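Arithmetic operators on arrays work element by element. A minimal sketch with made-up values:

```python
import numpy as np

data = np.array([1, 2])
ones = np.ones(2, dtype=int)

total = data + ones   # [2 3]
diff = data - ones    # [0 1]
prod = data * data    # [1 4]
quot = data / data    # [1. 1.] (true division returns floats)
```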
Broadcasting
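Broadcasting lets NumPy combine arrays of different but compatible shapes. The sketch below uses illustrative values: a scalar is stretched across a vector, and a 1-D array is applied to every row of a 2-D array.

```python
import numpy as np

# Scalar with array: the scalar is applied to every element.
miles = np.array([1.0, 2.0])
km = miles * 1.6

# (2, 3) array with (3,) array: the 1-D array is added to each row.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = grid + offsets   # [[11 22 33]
                           #  [14 25 36]]
```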
Basic array operations
More useful array operations
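Aggregations such as max, min, and sum can run over the whole array or along one axis. A small sketch with made-up data:

```python
import numpy as np

b = np.array([[1, 2],
              [5, 3],
              [4, 6]])

overall_max = b.max()      # 6
overall_min = b.min()      # 1
overall_sum = b.sum()      # 21
col_max = b.max(axis=0)    # [5 6]     (one value per column)
row_sum = b.sum(axis=1)    # [ 3  8 10] (one value per row)
```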
Creating NumPy Array
NumPy Copy vs View
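The difference can be sketched as follows: a basic slice is a view that shares memory with the original, while copy() makes an independent array. The names are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

view = original[1:4]      # basic slicing returns a view
view[0] = 99              # also changes original[1]

independent = original.copy()
independent[0] = -1       # original is not affected

print(original)                  # [ 1 99  3  4  5]
print(view.base is original)     # True: the view borrows original's data
print(independent.base is None)  # True: the copy owns its data
```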
NumPy Array Reshape
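A minimal reshape sketch: the new shape must hold the same number of elements, and one dimension may be given as -1 so NumPy infers it.

```python
import numpy as np

a = np.arange(6)          # [0 1 2 3 4 5]
b = a.reshape(3, 2)       # [[0 1]
                          #  [2 3]
                          #  [4 5]]
c = a.reshape(2, -1)      # -1 is inferred as 3

print(b.shape)  # (3, 2)
print(c.shape)  # (2, 3)
```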
flattening multidimensional arrays
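Two common ways to flatten are sketched here: flatten() always returns a copy, while ravel() returns a view when the memory layout allows it (as it does for this contiguous example array).

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

flat_copy = x.flatten()   # independent copy
flat_view = x.ravel()     # view onto x for a contiguous array

flat_copy[0] = 100        # x is unchanged
flat_view[0] = 98         # x[0, 0] becomes 98

print(x[0, 0])  # 98
```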
Working with mathematical formulas
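The formula from the original slide is not preserved here. As an assumed example, the mean squared error, MSE = (1/n) * sum((prediction_i - label_i)^2), translates almost directly into array code. The values are made up.

```python
import numpy as np

predictions = np.array([1.0, 1.0, 1.0])
labels = np.array([1.0, 2.0, 3.0])

n = len(labels)
# Differences are [0, -1, -2], squares are [0, 1, 4], so the sum is 5.
mse = (1 / n) * np.sum(np.square(predictions - labels))

print(mse)  # 5/3, about 1.667
```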
Pandas Basics
Pandas
Pandas Agenda
• Pandas Intro
• Pandas Series
• Pandas DataFrames
• Pandas Read CSV
• Pandas Analyzing Data
• Cleaning Data
• Cleaning Empty Cells
• Cleaning Wrong Format
• Cleaning Wrong Data
• Removing Duplicates
• Pandas Correlations
• Pandas Plotting
• Merging, joining, and concatenating
• Operations
• Apply function
• Data input and output
Pandas Data Structure
Reading files
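A minimal sketch of reading CSV data with pd.read_csv. To keep the example self-contained, it reads from an in-memory string instead of a file on disk; the column names and values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content; with a real file, pass its path to read_csv.
csv_text = "Day,Category,Debit\nMon,Food,12.5\nTue,Travel,30.0\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 3)
print(list(df.columns))  # ['Day', 'Category', 'Debit']
```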
Indexing
Pandas Data Structure
DataFrame
         index   Column-1   Column-2   …   Column-n
Row-1    0       ...        ...        …   ...
Row-2    1       ...        ...        …   ...
…        ...     ...        ...        …   ...
Row-L    L       ...        ...        …   ...
Creating a DataFrame
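One common way to create a DataFrame is from a dictionary that maps column names to lists of values. The data below is made up for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cy'],
    'age': [25, 35, 45],
})

print(people.shape)          # (3, 2)
print(list(people.columns))  # ['name', 'age']
print(list(people.index))    # [0, 1, 2] (default integer index)
```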
Indexing and slicing
loc Vs. iloc
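The distinction can be sketched as follows: loc selects by label, iloc selects by integer position. The index labels 10, 20, 30 are chosen here so that labels and positions differ.

```python
import pandas as pd

df = pd.DataFrame({'score': [90, 80, 70]}, index=[10, 20, 30])

by_label = df.loc[10, 'score']    # 90: row labelled 10
by_position = df.iloc[0, 0]       # 90: first row, first column

label_slice = df.loc[10:20]       # 2 rows: label slices include the end
position_slice = df.iloc[0:1]     # 1 row: position slices exclude the end
```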
Analyzing DataFrames
• head()
• tail()
• info()
• describe()
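The four methods listed above can be tried on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'x': range(10), 'y': [v * 2.0 for v in range(10)]})

first_rows = df.head(3)   # first 3 rows (default is 5)
last_rows = df.tail(2)    # last 2 rows (default is 5)
df.info()                 # prints column dtypes and non-null counts
stats = df.describe()     # count, mean, std, min, quartiles, max

print(stats.loc['mean', 'x'])  # 4.5
```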
Cleaning Empty Cells
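Two usual approaches are sketched here with made-up data: drop rows that contain empty cells, or fill them with a replacement value. Which one is appropriate depends on the dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': ['x', None, 'z']})

missing_per_column = df.isna().sum()   # a: 1, b: 1

dropped = df.dropna()                  # keeps only complete rows

# Fill each column with its own replacement (mean of 'a' is 2.0).
filled = df.fillna({'a': df['a'].mean(), 'b': 'unknown'})
```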
Removing Duplicates
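A minimal sketch with made-up rows: duplicated() flags rows that repeat an earlier row, and drop_duplicates() removes them.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 2, 3],
                   'v': ['a', 'b', 'b', 'c']})

flags = df.duplicated()          # [False, False, True, False]
deduped = df.drop_duplicates()   # keeps the first of each repeated row

print(len(deduped))  # 3
```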
Data Format
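The slide's own example is not preserved. As an assumed illustration of fixing a wrong format, the sketch below converts a text column (the hypothetical column name is Date) to datetimes; values that do not match the format become NaT and can then be handled like any other empty cell.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2020-12-01', '2020-12-02', 'not a date']})

# errors='coerce' turns unparseable values into NaT instead of raising.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')

bad_rows = df['Date'].isna().sum()   # 1
cleaned = df.dropna(subset=['Date'])
```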
Set DataFrame Index
Reset DataFrame Index
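Setting and resetting the index can be sketched as follows, with a made-up id column: set_index turns a column into the row labels, and reset_index turns it back into a regular column.

```python
import pandas as pd

df = pd.DataFrame({'id': [101, 102], 'name': ['Ann', 'Bob']})

indexed = df.set_index('id')         # 'id' becomes the row labels
name = indexed.loc[102, 'name']      # 'Bob'

restored = indexed.reset_index()     # 'id' is a column again
print(list(restored.columns))        # ['id', 'name']
```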
Apply Function
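A minimal sketch of apply, with made-up columns: on a Series it calls the function once per value, and on a DataFrame with axis=1 it calls the function once per row.

```python
import pandas as pd

df = pd.DataFrame({'price': [10.0, 20.0], 'qty': [3, 1]})

# Series.apply: one call per value.
df['double_price'] = df['price'].apply(lambda p: p * 2)

# DataFrame.apply with axis=1: one call per row.
df['total'] = df.apply(lambda row: row['price'] * row['qty'], axis=1)

print(df['total'].tolist())  # [30.0, 20.0]
```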
Drop Function
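drop removes rows or columns by label and returns a new DataFrame, leaving the original untouched. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

without_c = df.drop(columns=['c'])       # drop a column
without_first = df.drop(index=[0])       # drop the row labelled 0

print(list(without_c.columns))     # ['a', 'b']
print(list(without_first.index))   # [1, 2]
print(df.shape)                    # (3, 3), original unchanged
```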
filter
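"Filter" can mean two different things in pandas, and the slide does not say which it intends, so both are sketched here with made-up data: selecting rows with a boolean condition, and selecting columns by label with DataFrame.filter.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ann', 'Bob', 'Cy'], 'age': [25, 35, 45]})

# Row filtering: keep the rows where the condition is True.
older = df[df['age'] > 30]

# DataFrame.filter: select columns (or rows) by label, not by value.
names_only = df.filter(items=['name'])

print(older['name'].tolist())     # ['Bob', 'Cy']
print(list(names_only.columns))   # ['name']
```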
Group by
Aggregation Methods
Method                 Description
.count()               The number of non-null records
.sum()                 The sum of the values
.mean()                The arithmetic mean of the values
.median()              The median of the values
.min()                 The minimum value of the group
.max()                 The maximum value of the group
.agg(pd.Series.mode)   The most frequent value(s) in the group (GroupBy objects have no .mode() method)
.std()                 The sample standard deviation of the group (ddof=1 by default)
.var()                 The sample variance of the group (ddof=1 by default)
Group by Example
daily_spend_count = df.groupby('Day')['Debit'].count()
daily_spend_sum = df.groupby('Day')['Debit'].sum()
df.groupby(['Category','Month'])['Debit'].sum()
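To run the three statements above, df needs Day, Month, Category, and Debit columns. The sketch below supplies a tiny made-up DataFrame with those columns; the values are not from the original dataset.

```python
import pandas as pd

# Hypothetical spending data with the columns the example expects.
df = pd.DataFrame({
    'Day': ['Mon', 'Mon', 'Tue'],
    'Month': ['Jan', 'Jan', 'Jan'],
    'Category': ['Food', 'Travel', 'Food'],
    'Debit': [10.0, 5.0, 7.5],
})

daily_spend_count = df.groupby('Day')['Debit'].count()   # Mon 2, Tue 1
daily_spend_sum = df.groupby('Day')['Debit'].sum()       # Mon 15.0, Tue 7.5
by_category_month = df.groupby(['Category', 'Month'])['Debit'].sum()

# Several aggregations at once, one column per aggregation.
summary = df.groupby('Day')['Debit'].agg(['count', 'sum', 'mean'])
```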
Sort
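Sorting can be sketched with sort_values (by column values) and sort_index (by row labels). The data is made up.

```python
import pandas as pd

df = pd.DataFrame({'name': ['b', 'a', 'c'], 'score': [2, 3, 1]})

by_score = df.sort_values('score', ascending=False)
by_index = by_score.sort_index()   # back to the original row order

print(by_score['name'].tolist())   # ['a', 'b', 'c']
print(by_index['name'].tolist())   # ['b', 'a', 'c']
```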
Correlations
df.corr()
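A sketch of corr on made-up data. By default it computes pairwise Pearson correlations. Passing numeric_only=True (available in recent pandas versions) skips non-numeric columns, which would otherwise raise an error in pandas 2.x.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [2, 4, 6, 8],      # rises with x
    'z': [4, 3, 2, 1],      # falls as x rises
    'label': list('abcd'),  # non-numeric, ignored below
})

corr = df.corr(numeric_only=True)

print(round(corr.loc['x', 'y'], 6))  # 1.0
print(round(corr.loc['x', 'z'], 6))  # -1.0
```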
Concatenate
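pd.concat stacks DataFrames along an axis. A minimal sketch with made-up frames:

```python
import pandas as pd

a = pd.DataFrame({'k': [1, 2]})
b = pd.DataFrame({'k': [3]})
c = pd.DataFrame({'m': [30, 40]})

# axis=0 (default): stack rows; ignore_index renumbers them 0..n-1.
stacked = pd.concat([a, b], ignore_index=True)

# axis=1: place columns side by side, aligned on the index.
side_by_side = pd.concat([a, c], axis=1)

print(stacked['k'].tolist())        # [1, 2, 3]
print(list(side_by_side.columns))   # ['k', 'm']
```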
append (deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead)
Merge Function
[Figure: Data Integration. A table with columns Column-1 … Column-n is combined with a table with columns Year, Temperature, Rainfall.]
Pandas Merge
df.merge(right=other_df, on='common_column', how='how_to_join')
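The call above can be sketched on two small made-up tables. The shared column Year and all values are assumptions for illustration. The how argument takes 'inner', 'outer', 'left', or 'right' (among others) in place of the 'how_to_join' placeholder.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2019, 2020, 2021],
                   'Temperature': [14.1, 14.3, 14.0]})
other_df = pd.DataFrame({'Year': [2020, 2021, 2022],
                         'Rainfall': [800, 750, 900]})

# inner: only years present in both tables.
inner = df.merge(right=other_df, on='Year', how='inner')

# outer: every year from either table; missing values become NaN.
outer = df.merge(right=other_df, on='Year', how='outer')

print(inner['Year'].tolist())  # [2020, 2021]
print(len(outer))              # 4
```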
[Figure: df + other_df = merged DataFrame]
Pandas concat Vs append Vs join Vs merge
• concat joins DataFrames along an axis: axis=0 stacks rows, axis=1 places columns side by side.
• append is the special case of concat with axis=0 and join='outer'. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead.
• merge combines two DataFrames on one or more columns, named with the on, left_on, and right_on parameters.
• join combines two DataFrames on their indexes (set with set_index) by default; its how parameter accepts values such as 'left', 'right', 'inner', and 'outer'.
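The three current functions can be compared on the same pair of made-up tables. The column name key is illustrative.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
right = pd.DataFrame({'key': ['b', 'c'], 'y': [3, 4]})

# merge: match rows on a column.
merged = left.merge(right, on='key', how='inner')   # one row, key 'b'

# join: match rows on the index.
joined = left.set_index('key').join(right.set_index('key'), how='left')

# concat: stack along an axis without matching keys.
stacked = pd.concat([left, right], axis=0, join='outer')

print(len(merged))     # 1
print(stacked.shape)   # (4, 3)
```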
THANK YOU