Basics Data Description

Uploaded by

Eshan Sharma

Basics: Data Description

12 June 2024 16:20

Describing and Summarising Data

When we acquire a set of data, we should begin by asking some important questions:
• Where do the data come from?
• How were they collected?
• How can we help the data tell their story?

Outlier
First, we must investigate why an outlier exists. Is it just an unusual, but valid
value? Could it be a data entry error? Was it collected in a different way than the
rest of the data? At a different time? One of the following actions can then be taken:
• Leave it alone
• Remove it, if it is not relevant for the analysis (eg- a data entry error)
• Replace it

Histogram
Excel- Data Analysis > Histogram > Input Values > Bin range > Tick on chart output

Measures of central tendency
These give a sense of the centre of the data.
• Mean- Average function
• Median- Median function (works on an unordered list as well)
• Mode- Mode function

Variability

The mean, median and mode give you a sense of the center of the data, but none of
these indicates how far the data are spread around the center.

Standard Deviation
The standard deviation tells us how far the data are spread out. A large standard
deviation indicates that the data are widely dispersed. A smaller standard deviation
tells us that the data points are more tightly clustered together.

Variance
1. Each Saturday's number of requests lies a certain distance from 172, the mean
number of requests. To find the variance, we first sum the squares of these
differences. Why square the differences?
2. A hotel manager would want information about the magnitude of each difference,
which can be positive, negative, or zero. If we simply summed the differences between
each Saturday's requests and the mean, positive and negative differences would cancel
each other out.
3. The formula for variance adds up the squared differences and divides by n-1 to get
a type of "average" squared difference as a measure of variability. (The reason we
divide by n-1 to get an average here is a technicality beyond the scope of this
course.)
Taking the square root of the variance gives the standard deviation: SD = 25.2 requests

Interpretation
The larger the SD, the larger the spread, and vice versa.
Even without calculation, we can figure out which dataset has the lower SD by
comparing the values to the mean.
Eg- A looks less spread out as its values are closer to the mean, which is the
correct answer.

Coefficient of Variation
To get a sense of the relative magnitude of the variation in a data set, we want to
compare the standard deviation of the data to the data's mean.

Applying Data Analysis

Skewness of Histogram
Excel- scuba_price

In a right-skewed distribution, the tail extends towards the higher values. The peak
of the distribution is on the left side, and the mean is greater than the median. This
skewness suggests that while most data points cluster towards the lower end, a few
significantly higher values stretch the tail towards the right.
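
The summary measures in these notes (mean, median, mode, variance with the n-1 divisor, standard deviation, coefficient of variation) and the mean-versus-median check for right skew can be sketched in a few lines of Python. This is only an illustration under assumptions: the `prices` values are made up for the example and are not the scuba_price data, and the variable names are hypothetical.

```python
import statistics

# Hypothetical right-skewed sample (made-up values, not from the notes):
# most values sit at the low end and a few large ones stretch the right tail.
prices = [20, 22, 22, 25, 27, 30, 31, 35, 60, 95]

# Measures of central tendency
mean = statistics.mean(prices)
median = statistics.median(prices)  # input does not need to be sorted
mode = statistics.mode(prices)

# Variance: sum the squared differences from the mean, then divide by n-1
n = len(prices)
variance = sum((x - mean) ** 2 for x in prices) / (n - 1)

# Standard deviation: square root of the variance
std_dev = variance ** 0.5

# Coefficient of variation: standard deviation compared to the mean
coeff_var = std_dev / mean

print(f"mean={mean:.2f} median={median:.2f} mode={mode}")
print(f"variance={variance:.2f} sd={std_dev:.2f} cv={coeff_var:.2f}")
print("mean > median (consistent with right skew):", mean > median)
```

Squaring each difference before summing is what keeps positive and negative deviations from cancelling out. The manual variance here should agree with `statistics.variance`, which also divides by n-1.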

Relationship between variables

Hidden Variables
Even when two data sets seem to be directly related, we may need to investigate
further to understand the reason for the relationship. We may find that the reason is
not due to any fundamental connection between the two variables themselves, but that
they are instead mutually related to another underlying factor.
Eg- A plot might show that as baseball sales increase, hockey puck sales decrease, but
the actual reason is that baseball is played in summer and hockey in winter.

False relationships

Variable and Time
Assuming we have price data collected over time, we can plot a scatter diagram for
memory price, in the same way we plotted height and weight. Because time is one of the
variables, we call this graph a time series.
Time series will help us recognize seasonal patterns and yearly trends. But we must be
careful: we shouldn't rely only on visual analysis when looking for relationships and
patterns.

Correlation
It quantifies the extent to which there is a linear relationship between two variables.
To describe the strength of a linear relationship, the correlation coefficient takes on
values between -1 and +1. If every point falls exactly on a line with a negative slope,
the correlation coefficient is exactly -1.
Even when the correlation coefficient is 0, a relationship might exist, just not a
linear relationship. As we've seen, scatter plots can reveal patterns and help us better
understand the business context the data describe.

Influence of outliers
Eg- Suppose a manager suspects that his employees skip work to enjoy the good life
more often as the temperature rises. After pairing absences with daily temperature data,
he finds the correlation coefficient to be 0.466.
But look at the data: besides a few outliers, there isn't a clear relationship. Seeing
the scatter plot, the manager might realize that the three outliers correspond to a
late-summer, three-day transportation strike that kept some workers homebound the
previous year.
Without looking at the data, the correlation coefficient can lead us down false paths.
If we exclude the outliers, the relationship disappears, and the correlation essentially
drops to zero, quieting any suspicion of weather.

As a summary statistic for the data, the correlation coefficient is calculated numerically,
incorporating the value of every data point. Because measures like correlation give more
weight to points distant from the center of the data, outliers can strongly influence the
correlation coefficient of the entire set. In these situations, our intuition and the measure
we use to quantify our intuition can be quite different. We should always attempt to
reconcile those differences by returning to the data.
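
To see how a few distant points can dominate the coefficient, here is a minimal sketch that computes the correlation with and without three outliers. The temperature and absence values are invented for illustration (they are not the manager's data and will not reproduce 0.466), and `pearson` is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of products of deviations from the two means
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical ordinary days: absences show no clear pattern with temperature
temps = [60, 62, 65, 68, 70, 72, 75, 78]
absences = [3, 5, 4, 5, 3, 4, 5, 3]

# Hypothetical outliers: three hot days with unusually many absences
outlier_temps = [90, 91, 92]
outlier_absences = [15, 16, 14]

r_all = pearson(temps + outlier_temps, absences + outlier_absences)
r_trimmed = pearson(temps, absences)

print(f"with outliers:    r = {r_all:.3f}")
print(f"without outliers: r = {r_trimmed:.3f}")
```

With these made-up numbers the three outliers alone produce a strong positive coefficient, while the remaining points give a value close to zero. A scatter plot of the same data would make the cause visible immediately.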

Excel- CORREL function
