Chap 2 B

Exploratory Data Analysis (EDA) is a crucial process for analyzing and summarizing datasets to identify patterns, relationships, and insights prior to advanced modeling. The objectives of EDA include understanding data types, detecting data issues, and uncovering patterns through descriptive statistics and data visualization. Key methods in EDA involve calculating measures of central tendency and variability, as well as analyzing data distributions through skewness and kurtosis.

Uploaded by

oguzhanaytekin65


Data Science and Data Analytics

Exploratory Data Analysis (EDA)

What is Exploratory Data Analysis (EDA)?
 Exploratory Data Analysis (EDA) is a process of analyzing and summarizing datasets
to uncover patterns, relationships, and insights, typically before applying more
advanced modeling techniques.
 EDA is a critical first step in the data analysis workflow, helping analysts and data
scientists understand the data’s structure, quality, and features.
Objectives of EDA
 Understand the Data:
➢ Identify the types of variables (categorical, numerical).
➢ Assess the distribution of data and its overall structure.
 Detect Data Issues:
➢ Find missing values, duplicates, or outliers.
➢ Evaluate inconsistencies or errors in the data.
 Uncover Patterns:
➢ Reveal trends, clusters, or correlations.
➢ Identify relationships between variables.
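The first two objectives can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `records` table, its column names, and the helper `infer_kind` are hypothetical and do not come from the slides, and outlier detection is left out.

```python
from collections import Counter

# Hypothetical records (not from the slides); None marks a missing value.
records = [
    {"id": 1, "city": "Ankara", "age": 34},
    {"id": 2, "city": "Izmir", "age": None},
    {"id": 3, "city": "Ankara", "age": 29},
    {"id": 3, "city": "Ankara", "age": 29},  # exact duplicate of the previous row
    {"id": 4, "city": None, "age": 41},
]


def infer_kind(values):
    """Label a column 'numerical' if every non-missing value is a number."""
    present = [v for v in values if v is not None]
    is_num = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "numerical" if present and is_num else "categorical"


columns = list(records[0].keys())

# Understand the data: variable type per column.
kinds = {c: infer_kind([r[c] for r in records]) for c in columns}

# Detect data issues: missing values per column and fully duplicated rows.
missing = {c: sum(r[c] is None for r in records) for c in columns}
row_counts = Counter(tuple(r[c] for c in columns) for r in records)
duplicates = sum(n - 1 for n in row_counts.values() if n > 1)

print(kinds)
print(missing)
print(duplicates)
```

Note that the sketch labels `id` as numerical because its values are numbers; deciding that it is really an identifier is a judgment the analyst still has to make.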
Steps in EDA
 Descriptive Statistics:
➢ Calculate measures like mean, median, mode, variance, and standard
deviation.
➢ Summarize categorical data with frequency counts and percentages.
 Data Visualization:
➢ Univariate Analysis: Visualize single variables using histograms, box
plots, or bar charts.
➢ Bivariate Analysis: Study relationships between two variables using scatter
plots or correlation matrices.
➢ Multivariate Analysis: Explore relationships among multiple variables using
heatmaps or pair plots.
 Data Cleaning: Handle missing values, outliers, or erroneous data points.
 Correlation Analysis: Use metrics like Pearson or Spearman correlation to
examine relationships between numerical variables.
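The difference between the two correlation metrics can be illustrated with a small sketch. The functions `pearson`, `average_ranks`, and `spearman` and the sample data are assumptions written for this example; Spearman is computed here as the Pearson correlation of the ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear

print(round(pearson(x, y), 3))   # below 1: the relationship is not a straight line
print(round(spearman(x, y), 3))  # 1.0: the ordering is perfectly preserved
```

Pearson measures linear association, while Spearman only asks whether the two variables move in the same order, which is why it reaches 1 on this curved but monotonic data.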
Descriptive Statistics
 Descriptive statistics is a branch of statistics that involves summarizing and
organizing data to describe its main characteristics.
 Descriptive statistics focuses on presenting data in a clear and
understandable way.
 Purpose of Descriptive Statistics:
➢ Simplify large datasets into a manageable format.
➢ Provide insights into the distribution, central tendency, and variability of the
data.
➢ Serve as a foundation for further statistical analysis.
 Key Methods of Descriptive Statistics:
➢ Measures of Central Tendency
➢ Measures of Variability
➢ Measures of Shape
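As one illustration of simplifying a dataset into a manageable format, a categorical variable can be summarized with frequency counts and percentages. The `colors` list below is made-up sample data, not taken from the slides.

```python
from collections import Counter

# Hypothetical categorical variable.
colors = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(colors)
total = len(colors)

# Share of each category as a percentage of all observations.
percentages = {value: 100 * count / total for value, count in counts.items()}

for value, count in counts.most_common():
    print(f"{value}: {count} ({percentages[value]:.1f}%)")
```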
Measures of Central Tendency
 These metrics describe the center or typical value of a dataset.
 Mean (Average): The sum of all values divided by the number of values.
 Formula: $\text{Mean} = \frac{\sum x}{n}$

 Example: For the data [10,20,20,30,40], the mean is 24.

 Median: The middle value when data is sorted.

 If the dataset has an even number of values, it is the average of the two middle
values.
 Example: For the data [10,20,20,30,40], the median is 20.

 Mode: The most frequent value(s) in the dataset.

 Example: For the data [10,20,20,30,40], the mode is 20.
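The three worked examples above can be checked with Python's standard `statistics` module. The variable names are arbitrary, and the four-value list is an extra made-up case added to show the even-length median rule.

```python
import statistics

data = [10, 20, 20, 30, 40]

mean = statistics.mean(data)        # sum of values divided by their count
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of all most frequent values

print(mean, median, modes)  # 24 20 [20]

# With an even number of values, the median averages the two middle values.
even_median = statistics.median([10, 20, 30, 40])
print(even_median)  # 25.0
```

`multimode` is used rather than `mode` because it returns every most frequent value, which matters for bimodal or multimodal data.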
Measure | Strengths | Weaknesses
Mean | Takes all data points into account. Useful for numerical datasets. | Sensitive to outliers or extreme values. May not represent skewed data accurately.
Median | Not affected by outliers. Provides a better central measure for skewed data. | Ignores the magnitude of values. May not be ideal for datasets with small variations.
Mode | Useful for categorical data. Works well when identifying common trends. | May not exist or may have multiple values (bimodal or multimodal datasets).
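The mean's sensitivity to outliers can be seen by changing a single value. In this sketch, `with_outlier` is a made-up variation of the running example in which 40 is replaced by 400.

```python
import statistics

data = [10, 20, 20, 30, 40]
with_outlier = [10, 20, 20, 30, 400]  # hypothetical: one extreme value

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # 24 96
print(median_before, median_after)  # 20 20
```

One extreme value quadruples the mean, while the median does not move because it depends only on the ordering of the values.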
Measures of Variability
 These metrics indicate how spread out the data is.
 Range: The difference between the maximum and minimum values.
 Formula: Range = Max - Min
 Example: For the data [10,20,20,30,40], the range is 40 - 10 = 30.

 Variance: The average of the squared differences from the mean.

 Formula: $\text{Variance} = \frac{\sum (x - \text{Mean})^2}{n}$
 Example: For the data [10,20,20,30,40], the variance is 104.
 Standard Deviation: The square root of the variance, expressing the typical
distance of values from the mean in the same units as the data.
 Example: For the data [10,20,20,30,40], the standard deviation is 10.20.
 Interquartile Range (IQR): The range between the 25th percentile (Q1) and the 75th
percentile (Q3).
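The measures of variability can be reproduced with the standard `statistics` module. Two choices in this sketch are assumptions: the population functions (`pvariance`, `pstdev`) are used because the variance formula above divides by n, and the quartiles use the `"inclusive"` method, which is only one of several quartile conventions.

```python
import statistics

data = [10, 20, 20, 30, 40]

data_range = max(data) - min(data)     # 40 - 10
variance = statistics.pvariance(data)  # population variance: divides by n
std_dev = statistics.pstdev(data)      # square root of the population variance

# Quartiles; other conventions can give different Q1 and Q3 on small samples.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(data_range)         # 30
print(variance)           # 104
print(round(std_dev, 2))  # 10.2
print(q1, q3, iqr)        # 20.0 30.0 10.0

# The sample variance divides by n - 1 instead and is therefore larger.
sample_variance = statistics.variance(data)
print(sample_variance)    # 130
```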
Percentiles and Quartiles
Measures of Shape
 These metrics describe the distribution’s shape:
 Skewness: Measures asymmetry in the data distribution.
➢ Positive skew: Long tail on the right.
➢ Negative skew: Long tail on the left.
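The sign of the skewness can be computed from the central moments. The slides do not state a formula, so this sketch assumes the common moment-based coefficient, the third central moment divided by the second central moment raised to the power 1.5. The function name and the mirrored dataset are illustrative.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


right_tailed = [10, 20, 20, 30, 40]           # longer tail toward large values
left_tailed = [50 - v for v in right_tailed]  # mirror image of the same data

print(round(skewness(right_tailed), 3))  # positive
print(round(skewness(left_tailed), 3))   # negative, same magnitude
```

Mirroring the data flips the sign of the skewness but leaves its magnitude unchanged, which matches the right-tail and left-tail descriptions above.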
[Figure: Example of skewness]
 Kurtosis: Measures the "tailedness" of the distribution.
➢ High kurtosis: Heavy tails; extreme values are more likely.
➢ Low kurtosis: Light tails; extreme values are less likely.
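Kurtosis can be sketched the same way from central moments. The convention here is an assumption, since the slides do not give a formula: excess kurtosis, defined as the fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores about 0. Both datasets are made up for illustration.

```python
def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


light_tails = [10, 20, 20, 30, 40]            # values spread fairly evenly
heavy_tails = [0, 49, 50, 50, 50, 51, 100]    # tight center plus two extremes

print(round(excess_kurtosis(light_tails), 3))  # negative: lighter tails
print(round(excess_kurtosis(heavy_tails), 3))  # positive: heavier tails
```

Some tools report raw kurtosis without subtracting 3, so it is worth checking which convention a given library uses before comparing values.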
