
Chap 2 B

Exploratory Data Analysis (EDA) is a crucial process for analyzing and summarizing datasets to identify patterns, relationships, and insights prior to advanced modeling. The objectives of EDA include understanding data types, detecting data issues, and uncovering patterns through descriptive statistics and data visualization. Key methods in EDA involve calculating measures of central tendency and variability, as well as analyzing data distributions through skewness and kurtosis.

Uploaded by

oguzhanaytekin65
Copyright
© All Rights Reserved

Data Science and Data Analytics

Exploratory Data Analysis (EDA)


What is Exploratory Data Analysis (EDA)?
 Exploratory Data Analysis (EDA) is a process of analyzing and summarizing datasets
to uncover patterns, relationships, and insights, typically before applying more
advanced modeling techniques.
 EDA is a critical first step in the data analysis workflow, helping analysts and data
scientists understand the data’s structure, quality, and features.
Objectives of EDA
 Understand the Data:
➢ Identify the types of variables (categorical, numerical).
➢ Assess the distribution of data and its overall structure.
 Detect Data Issues:
➢ Find missing values, duplicates, or outliers.
➢ Evaluate inconsistencies or errors in the data.
 Uncover Patterns:
➢ Reveal trends, clusters, or correlations.
➢ Identify relationships between variables.
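The first two objectives can be sketched with Python's standard library alone. The records and field names below are hypothetical, invented only to illustrate the checks; a dataframe library would typically do the same job in practice.

```python
# Hypothetical records, invented for illustration only.
records = [
    {"city": "Ankara", "age": 34, "income": 52000.0},
    {"city": "Izmir", "age": None, "income": 48000.0},
    {"city": "Ankara", "age": 34, "income": 52000.0},  # exact repeat of the first row
    {"city": "Bursa", "age": 29, "income": None},
]


def variable_kind(values):
    """Classify a column as numerical or categorical from its non-missing values."""
    present = [v for v in values if v is not None]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "numerical" if is_numeric else "categorical"


columns = list(records[0].keys())

# Understand the data: the type of each variable.
kinds = {c: variable_kind([r[c] for r in records]) for c in columns}

# Detect data issues: number of missing values per column.
missing = {c: sum(r[c] is None for r in records) for c in columns}

# Detect data issues: indices of rows that repeat an earlier row exactly.
seen = set()
duplicate_rows = []
for index, row in enumerate(records):
    key = tuple(row[c] for c in columns)
    if key in seen:
        duplicate_rows.append(index)
    seen.add(key)

print(kinds)
print(missing)
print(duplicate_rows)
```

Here `city` is reported as categorical, `age` and `income` as numerical with one missing value each, and the third row (index 2) is flagged as a duplicate.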
Steps in EDA
 Descriptive Statistics:
➢ Calculate measures like mean, median, mode, variance, and standard
deviation.
➢ Summarize categorical data with frequency counts and percentages.
 Data Visualization:
➢ Univariate Analysis: Visualize single variables using histograms, box
plots, or bar charts.
➢ Bivariate Analysis: Study relationships between two variables using scatter
plots or correlation matrices.
➢ Multivariate Analysis: Explore relationships among multiple variables using
heatmaps or pair plots.
 Data Cleaning: Handle missing values, outliers, or erroneous data points.
 Correlation Analysis: Use metrics like Pearson or Spearman correlation to
examine relationships between numerical variables.
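As a minimal sketch of the correlation step, the two metrics can be written out by hand. The helper names `pearson`, `ranks`, and `spearman` and the sample data are assumptions made for illustration; Spearman correlation is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Rank values from 1 to n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y always grows with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(round(pearson(x, y), 3))
print(round(spearman(x, y), 3))
```

Because the relationship is monotonic but curved, Spearman reaches 1 while Pearson stays slightly below it (about 0.98), which is the usual reason to look at both.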
Descriptive Statistics
 Descriptive statistics is a branch of statistics that involves summarizing and
organizing data to describe its main characteristics.
 Descriptive statistics focuses on presenting data in a clear and
understandable way.
 Purpose of Descriptive Statistics:
➢ Simplify large datasets into a manageable format.
➢ Provide insights into the distribution, central tendency, and variability of the
data.
➢ Serve as a foundation for further statistical analysis.
 Key Methods of Descriptive Statistics:
➢ Measures of Central Tendency
➢ Measures of Variability
➢ Measures of Shape
Measures of Central Tendency
 These metrics describe the center or typical value of a dataset.
 Mean (Average): The sum of all values divided by the number of values.
 Formula: $\text{Mean} = \dfrac{\sum x}{n}$

 Example: For the data [10,20,20,30,40], the mean is 24.

 Median: The middle value when data is sorted.


 If the dataset has an even number of values, it is the average of the two middle
values.
 Example: For the data [10,20,20,30,40], the median is 20.

 Mode: The most frequent value(s) in the dataset.


 Example: For the data [10,20,20,30,40], the mode is 20.
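The three examples above can be reproduced with Python's built-in `statistics` module. This is only a sketch on the same five values; the `even_data` list is an added, hypothetical example for the even-count case.

```python
import statistics

data = [10, 20, 20, 30, 40]

mean = statistics.mean(data)        # sum of all values divided by the number of values
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # every most-frequent value, returned as a list

print(mean, median, modes)

# With an even number of values, the median is the average of the two middle values.
even_data = [10, 20, 30, 40]
even_median = statistics.median(even_data)
print(even_median)
```

`multimode` is used rather than `mode` because it returns all most-frequent values when a dataset has more than one.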
Measures of Central Tendency

| Measure | Strengths | Weaknesses |
|---------|-----------|------------|
| Mean | Takes all data points into account. | Sensitive to outliers or extreme values. |
| | Useful for numerical datasets. | May not represent skewed data accurately. |
| Median | Not affected by outliers. | Ignores the magnitude of values. |
| | Provides a better central measure for skewed data. | May not be ideal for datasets with small variations. |
| Mode | Useful for categorical data. | May not exist or may have multiple values (bimodal or multimodal datasets). |
| | Works well when identifying common trends. | |
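The difference in outlier sensitivity between the mean and the median can be illustrated with a short sketch. The extreme value 1000 is hypothetical, appended to the running example only to show the effect.

```python
import statistics

data = [10, 20, 20, 30, 40]
with_outlier = data + [1000]  # hypothetical extreme value added for illustration

# How far each measure moves when the single extreme value is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(data)
median_shift = statistics.median(with_outlier) - statistics.median(data)

print(round(mean_shift, 2))
print(median_shift)
```

In this example the mean moves by more than 160 while the median moves by only 5, so the median stays close to the bulk of the data.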
Measures of Variability
 These metrics indicate how spread out the data is.
 Range: The difference between the maximum and minimum values.
 Formula: Range = Max - Min
 Example: For the data [10,20,20,30,40], the range is 40 - 10 = 30.

 Variance: The average of the squared differences from the mean.


 Formula: $\text{Variance} = \dfrac{\sum (x - \text{Mean})^2}{n}$
 Example: For the data [10,20,20,30,40], the variance is 104.
 Standard Deviation: The square root of the variance, representing the typical
(root-mean-square) distance of values from the mean.
 Example: For the data [10,20,20,30,40], the standard deviation is 10.20.
 Interquartile Range (IQR): The range between the 25th percentile (Q1) and the 75th
percentile (Q3).
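The variability measures can be checked on the same data with the `statistics` module. This sketch assumes the population formulas (dividing by n), which is what the worked variance of 104 implies, and the "inclusive" quartile method, which is only one of several common conventions.

```python
import statistics

data = [10, 20, 20, 30, 40]

value_range = max(data) - min(data)

# Population variance and standard deviation (divide by n, not n - 1).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Quartiles with the "inclusive" method; other conventions can give other values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(value_range, variance, round(std_dev, 2), iqr)
```

The sample versions, `statistics.variance` and `statistics.stdev`, divide by n - 1 and would return larger values on the same data.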
Percentiles and Quartiles
Measures of Shape
 These metrics describe the distribution’s shape:
 Skewness: Measures asymmetry in the data distribution.
➢ Positive skew: Long tail on the right.
➢ Negative skew: Long tail on the left.
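One common way to quantify skewness, used here as an assumption because the slides do not give a formula, is the third central moment divided by the variance raised to the power 1.5. The sample lists are hypothetical.

```python
def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


right_tailed = [10, 20, 20, 30, 40, 100]  # one large value stretches the right tail
left_tailed = [-v for v in right_tailed]  # mirror image, so the tail is on the left
symmetric = [10, 20, 30, 40, 50]

print(skewness(right_tailed) > 0)  # positive skew
print(skewness(left_tailed) < 0)   # negative skew
print(skewness(symmetric))         # no skew
```

The sign of the result matches the direction of the long tail, and a symmetric dataset gives zero.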
Example of Skewness
Measures of Shape
 Kurtosis: Measures the "tailedness" of the distribution.
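➢ High kurtosis: Heavy tails, so extreme values (outliers) are more likely.
➢ Low kurtosis: Light tails, so extreme values are less likely.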
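Kurtosis can be sketched as the fourth central moment divided by the squared variance. The version below reports excess kurtosis (the value minus 3, so that a normal distribution scores 0), which is a common convention and an assumption here rather than something the slides specify. The `heavy_tails` list is hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


light_tails = [10, 20, 20, 30, 40]  # values stay close to the center
heavy_tails = [20] * 9 + [120]      # hypothetical: one extreme value far from the rest

print(round(excess_kurtosis(light_tails), 2))
print(round(excess_kurtosis(heavy_tails), 2))
```

The first dataset gives a negative value (low kurtosis) and the second a clearly positive one (high kurtosis), driven by the single extreme value.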
References

 The Data Science Design Manual (Texts in Computer Science)
 A Hands-On Introduction to Data Science
