data analysis

The document provides an overview of data analysis using Python, detailing essential libraries such as NumPy, Pandas, and Matplotlib. It covers data importing, preprocessing, normalization, and exploratory data analysis (EDA) techniques, including descriptive statistics and correlation. Additionally, it includes practical exercises for applying these concepts with a sample dataset.

Uploaded by

piyush dwivedi

Data Analysis With Python

Dr. Amar Singh

Professor
Lovely Professional University
Libraries in Python
• Scientific Computing Libraries
  • NumPy
  • Pandas
  • SciPy
• Visualization Libraries
  • Matplotlib
  • Seaborn
• Algorithmic Libraries
  • Scikit-learn
Importing Data in Python
• Importing is the process of loading or reading data from different sources.
• The data may be in different formats, for example .csv, .json, or .xlsx.
• The path of the dataset can be given as below (backslashes are doubled because "\" is an escape character in Python strings):
• C:\\mydata\\data.csv
• To read a CSV file, use the following command (assuming pandas is imported as pd):
• pd.read_csv("C:\\mydata\\data.csv")
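A minimal sketch of the import step. The file at the path above is not available here, so the example reads the same kind of comma-separated text from an in-memory buffer. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; names and values are hypothetical.
csv_text = (
    "EmployeeName,JobTitle,BasePay\n"
    "A. Rao,Engineer,52000\n"
    "B. Khan,Analyst,48000\n"
)

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# With a real file, pass the path instead, e.g. a raw string on Windows:
# df = pd.read_csv(r"C:\mydata\data.csv")
print(df.shape)
```

A raw string (the r prefix) is an alternative to doubling every backslash in a Windows path.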
Libraries
Check data type
• Dataframe.dtypes
Printing Dataframe
• df["BasePay"]  # selects only the BasePay column
• df.head(n)  # shows the first n rows of the data frame
• df.tail(n)  # shows the last n rows of the data frame
• df.dtypes  # used to check data types
Dataframe.describe()
• Returns a statistical summary (count, mean, std, min, quartiles, max) of the numeric columns.
Dataframe.describe(include="all")
• Also summarizes non-numeric columns (unique, top, freq).
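The inspection calls above can be tried on a small data frame. This is only a sketch: the rows are invented, and BasePay and JobTitle stand in for whatever columns the real dataset has.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "JobTitle": ["Engineer", "Analyst", "Engineer", "Clerk"],
    "BasePay": [52000.0, 48000.0, 61000.0, 30000.0],
})

first_two = df.head(2)        # first 2 rows
last_one = df.tail(1)         # last row
col_types = df.dtypes         # data type of each column
numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include="all")  # adds unique/top/freq for text columns

print(numeric_summary)
```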
Data Pre-processing
• Pre-processing converts raw data into a format suitable for further analysis.
• Also known as data cleaning or data wrangling.
Data-Preprocessing
• Deal with missing values
• Data Formatting
• Data Normalization
• Converting Categorical Values to Numerical Values
Missing Values
• A missing value occurs when no value is stored for a variable in an observation.
• It may be represented as ?, NA, or a blank cell.
How to deal with missing data
• Drop missing values
• Drop the variable
• Replace missing values with the average (numeric data) or the most frequent value (categorical data).
• Leave it as missing data.
How to drop missing values in Python
• Use dataframe.dropna()
How to replace a missing value with a new value?
• df.replace(missing_value, new_value)
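The options for missing data can be sketched as follows. The "price" column and its values are hypothetical, and "?" is assumed to be the missing-value marker used in the raw file.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which "?" marks a missing value.
df = pd.DataFrame({"price": ["13495", "?", "16500", "?"]})

# Turn the "?" placeholder into a real missing value (NaN).
df = df.replace("?", np.nan)

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: replace missing prices with the column average.
prices = df["price"].astype(float)
filled = prices.fillna(prices.mean())

print(filled.tolist())
```

Dropping is simplest but loses rows; filling with the average keeps every row at the cost of some accuracy.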
Data Formatting
• Data are usually collected from different sources and stored in
different formats.
• Bringing data into a common standard of expression allows users to make meaningful comparisons.
Incorrect Data Types
• Sometimes the wrong data type is assigned to a column.
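One way to correct a wrongly typed column is astype. The sketch below assumes a hypothetical "price" column that was read as text even though it holds whole numbers.

```python
import pandas as pd

# Hypothetical column stored as strings instead of numbers.
df = pd.DataFrame({"price": ["13495", "16500", "13950"]})

before = df["price"].dtype                 # text, so arithmetic would not work
df["price"] = df["price"].astype("int64")  # convert to 64-bit integers
after = df["price"].dtype

print(before, "->", after)
```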
Apply calculations to entire column
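A small sketch of column-wide arithmetic in pandas. BasePay is assumed here to be annual pay, and the MonthlyPay column is a made-up name for the result.

```python
import pandas as pd

# Hypothetical annual pay figures.
df = pd.DataFrame({"BasePay": [52000.0, 48000.0, 61000.0]})

# The division is applied to every row at once; no explicit loop is needed.
df["MonthlyPay"] = df["BasePay"] / 12

print(df)
```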
Data Normalization
• Normalization is the process of transforming values of several
variables into a similar range.
• Typical values range from 0 to 1
Normalization

Age   Income
20    20000
25    45000
37    28000

• Age and income are in different ranges.

• Hard to compare.
• "Income" will influence the results more because its values are much larger.
Methods for normalization
Simple feature scaling
• df['length'] = df['length']/df['length'].max()
• df['width'] = df['width']/df['width'].max()
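As an illustration, simple feature scaling can be applied to the Age and Income values from the table in this section. The scaled copy is a choice made for this sketch so the original values stay available.

```python
import pandas as pd

# Values taken from the Age/Income table.
df = pd.DataFrame({"Age": [20, 25, 37], "Income": [20000, 45000, 28000]})

# Simple feature scaling: divide each column by its own maximum.
scaled = df.copy()
for col in ["Age", "Income"]:
    scaled[col] = df[col] / df[col].max()

print(scaled)
```

After scaling, the largest value in each column is 1, so Age and Income can be compared on the same footing.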
Categorical
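The slide content for this step did not survive, so the following is only one common approach and not necessarily the one presented: one-hot encoding with pd.get_dummies. The "fuel" column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"]})

# One indicator column per category: 1 where the row has that value, else 0.
dummies = pd.get_dummies(df["fuel"], dtype=int)

# Attach the indicator columns to the original data frame.
df = pd.concat([df, dummies], axis=1)

print(df)
```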
Exploratory Data Analysis (EDA)
• Preliminary step in data analysis.
• Get a better understanding of the data set.
• Summarize the main characteristics of the data set.
• Uncover relationships between different variables.
• Extract important variables.
Descriptive Statistics
• Describe basic features of data.
• Give short summaries of the sample and measures of the data set.
Descriptive Statistics
• df.describe()
• df["column"].value_counts()
• Summarizes categorical data by counting how often each value occurs.
• Example: df["drive-wheels"].value_counts()
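A short sketch of counting categories. The "drive-wheels" column name comes from the example above; the values are invented.

```python
import pandas as pd

# Hypothetical values for the drive-wheels column.
df = pd.DataFrame({"drive-wheels": ["fwd", "rwd", "fwd", "4wd", "fwd", "rwd"]})

# Number of rows per category, most frequent first.
counts = df["drive-wheels"].value_counts()

print(counts)
```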
Scatterplot
• Represents the relationships between variables
• Predictor variable on x-axis.
• Target variable on y-axis.
Scatterplot : Example
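The original figure is not available, so here is a minimal Matplotlib sketch instead. The "engine-size" and "price" columns and their values are assumptions chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical predictor (engine-size) and target (price) values.
df = pd.DataFrame({
    "engine-size": [97, 109, 130, 152, 181],
    "price": [7500, 9900, 13500, 16500, 21000],
})

fig, ax = plt.subplots()
ax.scatter(df["engine-size"], df["price"])  # predictor on x, target on y
ax.set_xlabel("engine-size")
ax.set_ylabel("price")
ax.set_title("Price vs. engine size")
```

Call plt.show() or fig.savefig(...) afterwards to view or store the plot.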
Grouping Data
• The groupby method is used to group data.
• Can be applied on categorical variables.
• Groups the data into categories.
• Example:
• test_Data1 = test_Data.groupby('JobTitle')
• test_Data1.mean(numeric_only=True)  # average of each numeric column per job title
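The grouping example can be run on a small stand-in for test_Data. The rows below are invented, and only the JobTitle column name comes from the slide.

```python
import pandas as pd

# Hypothetical stand-in for test_Data.
test_Data = pd.DataFrame({
    "JobTitle": ["Engineer", "Analyst", "Engineer", "Analyst"],
    "BasePay": [52000.0, 48000.0, 60000.0, 50000.0],
})

# Group rows by job title, then average BasePay within each group.
mean_pay = test_Data.groupby("JobTitle")["BasePay"].mean()

print(mean_pay)
```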
Correlation
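• A measure of the extent of interdependence between variables.
• 1: Total positive linear correlation.
• 0: No linear correlation; the two variables show no linear relationship (a nonlinear one may still exist).
• -1: Total negative linear correlation.
• df.corr() computes pairwise correlations (Pearson by default) between columns; pass numeric_only=True if the data frame has non-numeric columns.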
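To make the three reference values concrete, the sketch below builds made-up columns that move exactly together and exactly opposite, then calls df.corr(). The column names are arbitrary.

```python
import pandas as pd

# Hypothetical data: "up" rises with x, "down" falls as x rises.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "up": [2, 4, 6, 8, 10],
    "down": [10, 8, 6, 4, 2],
})

# Pairwise Pearson correlation between all columns.
corr = df.corr()

print(corr.round(2))
```

Here x and "up" correlate at 1, and x and "down" at -1, up to floating-point rounding.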
Correlation using scatter plot
Exercise
• Import SocialAds.csv dataset.
• Show first five rows of the dataset.
• Show last five rows of the dataset.
• Give the statistical description of the dataset.
• Count the number of males and females in the dataset.
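A possible outline for the exercise, offered as a sketch only. SocialAds.csv is not included here, so the code reads a small in-memory stand-in; the column names (including "Gender") and all values are assumptions and may differ in the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for SocialAds.csv.
sample = io.StringIO(
    "User ID,Gender,Age,Purchased\n"
    "1,Male,19,0\n"
    "2,Female,35,0\n"
    "3,Female,26,1\n"
    "4,Male,27,0\n"
    "5,Female,32,1\n"
    "6,Male,47,1\n"
)

df = pd.read_csv(sample)  # with the real file: pd.read_csv("SocialAds.csv")

first_five = df.head()    # head() and tail() default to 5 rows
last_five = df.tail()
summary = df.describe()
gender_counts = df["Gender"].value_counts()

print(gender_counts)
```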
