This document summarizes key descriptive statistical measures including measures of location (mean, median, mode, midrange) and measures of dispersion (range, interquartile range, variance, standard deviation). It provides examples of computing each measure using data from a purchase orders database. Empirical rules about what percentage of data falls within a certain number of standard deviations from the mean are also discussed.

Chapter 4

Descriptive Statistical Measures

Populations and Samples

 Population (Tổng thể) - all items of interest for a particular decision or investigation

- all married drivers over 25 years old

- all subscribers to Netflix

 Sample (Mẫu) - a subset of the population

- a list of individuals who rented a comedy from Netflix in the past year

 The purpose of sampling (chọn mẫu) is to obtain sufficient information to draw a valid
inference about a population.

Understanding Statistical Notation

 We typically label the elements of a data set using subscripted variables, \(x_1, x_2, \ldots\), and so
on, where \(x_i\) represents the \(i\)th observation.

 It is common practice in statistics to use Greek letters, such as \(\mu\) (mu), \(\sigma\) (sigma), and
\(\pi\) (pi), to represent population measures and italic letters, such as \(\bar{x}\) (called x-bar), \(s\),
and \(p\), to represent sample statistics.

 \(N\) represents the number of items in a population and \(n\) represents the number of
observations in a sample.

 \(\Sigma\) represents summation: \(\sum x_i = x_1 + x_2 + \cdots + x_n\)

Measures of Location: Arithmetic Mean

 Population mean: \(\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}\)

 Sample mean: \(\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}\)

 Excel function: =AVERAGE(data range)

 Property of the mean: the deviations from the mean sum to zero, \(\sum_{i} (x_i - \bar{x}) = 0\)

 Outliers, observations that are radically different from the rest, can affect the mean by pulling
its value toward them.
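The effect of an outlier on the mean can be sketched in a few lines of Python. The order costs below are made-up illustrative values, not figures from the Purchase Orders database, and the variable names are assumptions of this sketch.

```python
from statistics import mean

# Hypothetical order costs in dollars (not from the Purchase Orders file).
costs = [100.0, 120.0, 110.0, 130.0, 140.0]
costs_with_outlier = costs + [5000.0]

mean_without = mean(costs)               # sum / count, like =AVERAGE(range)
mean_with = mean(costs_with_outlier)     # one extreme value pulls the mean up

# Deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - mean_without for x in costs)

print(mean_without, mean_with, deviation_sum)
```

A single extreme order moves the mean from 120 to roughly 933, even though five of the six orders cost 140 or less.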

Example 4.1: Computing Mean Cost per Order

Purchase Orders database

 Using formula:

=SUM(B2:B95)/COUNT(B2:B95)

Mean = $2,471,760/94

= $26,295.32

Using Excel AVERAGE Function

=AVERAGE(B2:B95)

Measures of Location: Median

 The median specifies the middle value when the data are arranged from least to greatest.

◦ Half the data are below the median, and half the data are above it.

◦ For an odd number of observations, the median is the middle of the sorted
numbers.

◦ For an even number of observations, the median is the mean of the two middle
numbers.

 We could use the Sort option in Excel to rank-order the data and then determine the median.
The Excel function =MEDIAN(data range) could also be used.

 The median is meaningful for ratio, interval, and ordinal data.

 Not affected by outliers.
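As a small illustration of the odd and even cases, the following Python sketch uses the standard library's `statistics.median`. The data values are hypothetical.

```python
from statistics import mean, median

odd = [7, 1, 5]          # sorted: 1, 5, 7 -> the middle value is 5
even = [7, 1, 5, 3]      # sorted: 1, 3, 5, 7 -> mean of 3 and 5

median_odd = median(odd)
median_even = median(even)

# Replacing the largest value with an extreme one moves the mean
# but leaves the median where it was.
skewed = [1, 3, 5, 700]
median_skewed = median(skewed)
mean_skewed = mean(skewed)

print(median_odd, median_even, median_skewed, mean_skewed)
```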

Example 4.2: Finding the Median Cost per Order

 Sort the data from smallest to largest. Since we have 94 observations, the median is the
average of the 47th and 48th observations.

 Median = ($15,562.50 + $15,750.00)/2 = $15,656.25

 =MEDIAN(B2:B95)

Measures of Location: Mode

 The mode is the observation that occurs most frequently.

 The mode is most useful for data sets that contain a relatively small number of unique
values.

 You can easily identify the mode from a frequency distribution by identifying the value or
group having the largest frequency or from a histogram by identifying the highest bar.

 Excel function: =MODE.SNGL(data range).

 For multiple modes: =MODE.MULT(data range)
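For readers working outside Excel, a rough Python counterpart is sketched below. The A/P terms are hypothetical values, and the Excel and Python functions may not break ties in exactly the same way.

```python
from statistics import mode, multimode

# Hypothetical A/P terms (not from the Purchase Orders file).
terms = [30, 15, 30, 45, 30, 15]
single_mode = mode(terms)        # roughly analogous to =MODE.SNGL(range)

# When several values tie for the highest frequency, multimode lists them all.
tied_terms = [15, 30, 15, 30, 45]
all_modes = multimode(tied_terms)   # roughly analogous to =MODE.MULT(range)

print(single_mode, all_modes)
```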

Example 4.3: Finding the Mode

 Purchase Orders database: A/P Terms

 Mode = 30 months

 Cost per order

 Mode is the group between $0 and $13,000

Measures of Location: Midrange

 The midrange is the average of the greatest and least values in the data set.

 Caution must be exercised when using the midrange because extreme values easily distort
the result. This is because the midrange uses only two pieces of data, whereas the mean
uses all the data; thus, it is usually a much rougher estimate than the mean and is often
used for only small sample sizes.
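The sensitivity of the midrange to extreme values can be seen in a short Python sketch. The values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

# Hypothetical observations.
values = [4.0, 8.0, 6.0, 10.0]
midrange = (min(values) + max(values)) / 2            # (4 + 10) / 2

# Adding one extreme value changes the midrange far more than the mean,
# because the midrange depends only on the minimum and the maximum.
with_extreme = values + [100.0]
midrange_extreme = (min(with_extreme) + max(with_extreme)) / 2   # (4 + 100) / 2
mean_extreme = mean(with_extreme)

print(midrange, midrange_extreme, mean_extreme)
```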

Example 4.4: Computing the Midrange

 Purchase Orders data

 Use the Excel MIN and MAX functions or sort the data and find them easily.

 Cost per order midrange:

= ($68.78 + $127,500)/2

= $63,784.39

Using Measures of Location – Example 4.5: Quoting Computer Repair Times

The Excel file Computer Repair Times includes 250 repair times for customers.

 What repair time would be reasonable to quote to a new customer?

 Median repair time is 2 weeks; mean and mode are about 15 days.

 Examine the histogram.

90% are completed within 3 weeks

Measures of Dispersion

 Dispersion refers to the degree of variation in the data; that is, the numerical spread (or
compactness) of the data.

 Key measures:

◦ Range

◦ Interquartile range (độ trải giữa)

◦ Variance (phương sai)

◦ Standard deviation (độ lệch chuẩn)

Measures of Dispersion: Range

 The range is the simplest and is the difference between the maximum value and the
minimum value in the data set.

 In Excel, compute as =MAX(data range) - MIN(data range).

 The range is affected by outliers, and is often used only for very small data sets.

Example 4.6: Computing the Range

 Purchase Orders data

 For the cost per order data:

◦ Maximum = $127,500

◦ Minimum = $68.78

 Range = $127,500 - $68.78 = $127,431.22

Measures of Dispersion: Interquartile Range

 The interquartile range (IQR), or the midspread, is the difference between the third and first
quartiles, Q3 – Q1.

 This includes only the middle 50% of the data and, therefore, is not influenced by extreme
values.
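A minimal Python sketch of the IQR next to the range follows. The data are hypothetical, and quartile conventions differ between tools, so the quartiles computed here with `method="inclusive"` may not match Excel's output for every data set.

```python
from statistics import quantiles

def interquartile_range(data):
    """Return Q3 - Q1 using linear interpolation between data points."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical observations, then the same data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
data_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

iqr_plain = interquartile_range(data)
iqr_outlier = interquartile_range(data_with_outlier)
range_plain = max(data) - min(data)
range_outlier = max(data_with_outlier) - min(data_with_outlier)

print(iqr_plain, iqr_outlier, range_plain, range_outlier)
```

In this example the extreme value leaves the IQR unchanged while the range grows from 8 to 899.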

Example 4.7: Computing the Interquartile Range

 Purchase Orders data

 For the Cost per order data:

 Third Quartile = Q3 = $27,593.75

 First Quartile = Q1 = $6,757.81

 Interquartile Range = $27,593.75 – $6,757.81 = $20,835.94

Measures of Dispersion: Variance

 The variance is the “average” of the squared deviations from the mean.

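 For a population: \(\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}\)

◦ In Excel: =VAR.P(data range)

 For a sample: \(s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}\)

◦ In Excel: =VAR.S(data range)

 Note the difference in denominators: \(N\) for a population, \(n - 1\) for a sample.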

Example 4.8 Computing the Variance

 Purchase Orders Cost per order data

Measures of Dispersion: Standard Deviation

 The standard deviation is the square root of the variance.

◦ Note that the dimension of the variance is the square of the dimension of the
observations, whereas the dimension of the standard deviation is the same as the
data. This makes the standard deviation more practical to use in applications.

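 For a population: \(\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}\)

◦ In Excel: =STDEV.P(data range)

 For a sample: \(s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}\)

◦ In Excel: =STDEV.S(data range)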
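The population and sample versions of the variance and standard deviation differ only in the denominator. The Python sketch below computes both by hand and also imports the standard-library equivalents for comparison; the data are hypothetical.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical observations (not from the Purchase Orders file).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
data_mean = sum(data) / n
squared_deviations = sum((x - data_mean) ** 2 for x in data)

population_variance = squared_deviations / n         # like =VAR.P(range)
sample_variance = squared_deviations / (n - 1)       # like =VAR.S(range)

population_std = sqrt(population_variance)           # like =STDEV.P(range)
sample_std = sqrt(sample_variance)                   # like =STDEV.S(range)

# The standard library offers the same four measures directly.
library_values = (pvariance(data), variance(data), pstdev(data), stdev(data))

print(population_variance, sample_variance, population_std, sample_std)
```

Dividing by n - 1 always gives the larger value, and the gap shrinks as the number of observations grows.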

Example 4.9 Computing the Standard Deviation

 Purchase Orders Cost per order data

 Using the results of Example 4.8, take the square root of the variance:

 Alternatively, use the STDEV.S function for the data range.

Standard Deviation as a Measure of Risk

Excel file: Closing Stock Prices

Intel (INTC):

Mean = $18.81

Standard deviation = $0.50

General Electric (GE):

Mean = $16.19

Standard deviation = $0.35

INTC is a higher-risk investment than GE because its closing price has the larger standard deviation.

Chebyshev’s Theorem

 For any data set, the proportion of values that lie within k (k > 1) standard deviations of the
mean is at least \(1 - 1/k^2\).

 Examples:

◦ For k = 2: at least ¾ or 75% of the data lie within two standard deviations of the
mean

◦ For k = 3: at least 8/9 or 89% of the data lie within three standard deviations of the
mean
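Chebyshev's bound can be checked numerically. The sketch below compares the guaranteed minimum with the proportion actually observed in a small, deliberately skewed, hypothetical data set; the function names are choices made for this illustration.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed proportion of values within k standard deviations of the mean."""
    center = mean(data)
    spread = pstdev(data)
    inside = sum(1 for x in data if abs(x - center) <= k * spread)
    return inside / len(data)

# Hypothetical, strongly right-skewed data.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

bound_2 = chebyshev_bound(2)
observed_2 = fraction_within(data, 2)

print(bound_2, observed_2)
```

The bound is a floor, not an estimate: here 90% of the values fall within two standard deviations, comfortably above the guaranteed 75%.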

Empirical Rules

 For many data sets encountered in practice:

 Approximately 68% of the observations fall within one standard deviation of the
mean

 Approximately 95% fall within two standard deviations of the mean

 Approximately 99.7% fall within three standard deviations of the mean

 These rules are commonly used to characterize the natural variation in manufacturing
processes and other business phenomena.

Process Capability Index

 The process capability index (Cp) is a measure of how well a manufacturing process can
achieve specifications.

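 Using a sample of output, measure the dimension of interest, and compute the total
variation using the third empirical rule (the width of the interval from three standard deviations
below the mean to three above it, that is, six standard deviations).

 Compare results to specifications using: \(C_p = \dfrac{\text{upper specification} - \text{lower specification}}{\text{total variation}}\)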

Example 4.11 Using Empirical Rules to Measure the Capability of a Manufacturing Process

Empirical rules

 A Cp value less than 1.0 is not good; it means that the variation in the process is wider than
the specification limits, signifying that some of the parts will not meet the specifications. In
practice, many manufacturers want to have Cp values of at least 1.5.
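A minimal sketch of the Cp calculation is given below, assuming the total variation is taken as six sample standard deviations (the third empirical rule). The measurements and specification limits are hypothetical, and the function name is a choice made for this illustration.

```python
from statistics import stdev

def process_capability(sample, lower_spec, upper_spec):
    """Cp = (upper spec - lower spec) / total variation, with total variation = 6s."""
    total_variation = 6 * stdev(sample)   # mean - 3s to mean + 3s
    return (upper_spec - lower_spec) / total_variation

# Hypothetical measurements of a part dimension and its specification limits.
sample = [9.8, 10.0, 10.2, 10.1, 9.9]
cp = process_capability(sample, lower_spec=9.5, upper_spec=10.5)

print(round(cp, 3))
```

For these values Cp comes out slightly above 1.0, so the natural variation just fits inside the specification limits but falls short of a 1.5 target.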

Standardized Values
 A standardized value, commonly called a z-score, provides a relative measure of the
distance an observation is from the mean, which is independent of the units of
measurement.

 The z-score for the ith observation in a data set is calculated as follows: \(z_i = \dfrac{x_i - \bar{x}}{s}\)

◦ Excel function: =STANDARDIZE(x, mean, standard_dev).

Properties of z-Scores

 The numerator represents the distance that \(x_i\) is from the sample mean; a negative value
indicates that \(x_i\) lies to the left of the mean, and a positive value indicates that it lies to the
right of the mean. By dividing by the standard deviation, \(s\), we scale the distance from the
mean to express it in units of standard deviations. Thus,

◦ a z-score of 1.0 means that the observation is one standard deviation to the right of
the mean;

◦ a z-score of -1.5 means that the observation is 1.5 standard deviations to the left of
the mean.

Example 4.12 Computing z-Scores

 Purchase Orders Cost per order data

=(B2 - $B$97)/$B$98, or =STANDARDIZE(B2,$B$97,$B$98).
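The same standardization can be sketched in Python. The data are hypothetical, and the sample standard deviation is used, mirroring the role of the cells holding the mean and standard deviation in the Excel formulas.

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardize each observation: (x - mean) / sample standard deviation."""
    center = mean(data)
    spread = stdev(data)
    return [(x - center) / spread for x in data]

# Hypothetical observations.
data = [10.0, 20.0, 30.0, 40.0, 50.0]
scores = z_scores(data)

print([round(z, 3) for z in scores])
```

Observations below the mean receive negative scores, the observation equal to the mean scores zero, and the scores sum to zero.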

Coefficient of Variation

 The coefficient of variation (CV) measures dispersion relative to the mean: \(CV = \dfrac{\text{standard deviation}}{\text{mean}}\)

 Sometimes expressed as a percentage.

 Provides a relative measure of risk to return.

 The return-to-risk ratio, 1/CV, is often easier to interpret, especially in financial risk analysis.

 The Sharpe ratio is a related measure in finance.
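As a worked illustration, the sketch below applies the CV to the Intel and General Electric means and standard deviations quoted in the Closing Stock Prices example earlier in this chapter. The function name is a choice made for this sketch.

```python
def coefficient_of_variation(std_dev, mean_value):
    """CV = standard deviation / mean."""
    return std_dev / mean_value

# Means and standard deviations quoted earlier for the two stocks.
cv_intc = coefficient_of_variation(0.50, 18.81)
cv_ge = coefficient_of_variation(0.35, 16.19)

# Return to risk is the reciprocal of the CV.
return_to_risk_intc = 1 / cv_intc
return_to_risk_ge = 1 / cv_ge

print(f"INTC: CV = {cv_intc:.2%}, return to risk = {return_to_risk_intc:.1f}")
print(f"GE:   CV = {cv_ge:.2%}, return to risk = {return_to_risk_ge:.1f}")
```

INTC has the larger CV, so it remains the riskier of the two even after its higher mean price is taken into account.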

Example 4.13 Applying the Coefficient of Variation

 Closing Stock Prices worksheet

 Intel (INTC) is slightly riskier than the other stocks.

 The Index fund has the least risk (lowest CV).

Measures of Shape: Skewness

 Skewness (độ lệch) describes the lack of symmetry of data.

◦ Distributions that tail off to the right are called positively skewed; those that tail off
to the left are said to be negatively skewed.

[Figure: histograms of a positively skewed distribution and a symmetrical distribution]

Coefficient of Skewness

 Coefficient of Skewness (CS): \(CS = \dfrac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^3}{\sigma^3}\)

 Excel function: =SKEW(data range)

 CS is negative for left-skewed data.

 CS is positive for right-skewed data.

 |CS| > 1 suggests high degree of skewness.

 0.5 ≤ |CS| ≤ 1 suggests moderate skewness.

 |CS| < 0.5 suggests relative symmetry.
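The sketch below computes a coefficient of skewness in its population form (the average cubed deviation divided by the cube of the population standard deviation) and applies the rule-of-thumb thresholds above. This form is an assumption of the sketch; Excel's SKEW applies a sample-size adjustment, so its values will differ somewhat, especially for small data sets. The data are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_skewness(data):
    """Population form: mean cubed deviation divided by sigma cubed."""
    mu = mean(data)
    sigma = pstdev(data)
    third_moment = sum((x - mu) ** 3 for x in data) / len(data)
    return third_moment / sigma ** 3

def describe_skewness(cs):
    """Apply the rule-of-thumb thresholds for |CS|."""
    if abs(cs) > 1:
        return "high"
    if abs(cs) >= 0.5:
        return "moderate"
    return "relatively symmetric"

symmetric = [1, 2, 3, 4, 5]          # hypothetical, symmetric about 3
right_skewed = [1, 1, 1, 2, 10]      # hypothetical, long right tail

cs_symmetric = coefficient_of_skewness(symmetric)
cs_right = coefficient_of_skewness(right_skewed)

print(cs_symmetric, describe_skewness(cs_symmetric))
print(round(cs_right, 2), describe_skewness(cs_right))
```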

Example 4.14: Measuring Skewness

 Purchase Orders database

 Cost per order data: CS = 1.66 (high positive skewness)

 A/P terms data: CS = 0.60 (moderate positive skewness)

Measures of Shape: Kurtosis

 Kurtosis (độ nhọn) refers to the peakedness (i.e., high, narrow) or flatness (i.e., short, flat-
topped) of a histogram.

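 The coefficient of kurtosis (CK) measures the degree of kurtosis of a population: \(CK = \dfrac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^4}{\sigma^4}\)

 CK < 3 indicates the data are somewhat flat with a wide degree of dispersion.

 CK > 3 indicates the data are somewhat peaked with less dispersion.

 Excel function: =KURT(data range). Note that KURT reports excess kurtosis (it subtracts 3 and
applies a sample-size adjustment), so its output should be compared with 0 rather than 3.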
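A population-form coefficient of kurtosis (the average fourth-power deviation divided by the fourth power of the population standard deviation) can be sketched as follows. This form is an assumption of the sketch and is compared against 3; it is not the same quantity that Excel's KURT returns. The data sets are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_kurtosis(data):
    """Population form: mean fourth-power deviation divided by sigma to the fourth."""
    mu = mean(data)
    sigma = pstdev(data)
    fourth_moment = sum((x - mu) ** 4 for x in data) / len(data)
    return fourth_moment / sigma ** 4

flat = [1, 2, 3, 4, 5]                         # evenly spread values
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]        # most values sit at the center

ck_flat = coefficient_of_kurtosis(flat)
ck_peaked = coefficient_of_kurtosis(peaked)

print(round(ck_flat, 2), round(ck_peaked, 2))
```

The evenly spread data give a value below 3, while the data concentrated at the center with two distant values give a value above 3.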

Shape and Measures of Location

 Comparing measures of location can sometimes reveal information about the shape of the
distribution of observations.

◦ For example, if the distribution were perfectly symmetrical and unimodal, the mean,
median, and mode would all be the same.

◦ If it were negatively skewed, we would generally find that mean < median <
mode

◦ Positive skewness would suggest that mode < median < mean

Excel Descriptive Statistics Tool

This tool provides a summary of numerical statistical measures for sample data.

Data > Data Analysis > Descriptive Statistics

 Enter Input Range

 Labels (optional)

 Check Summary Statistics box

 The data must be in a single row or column. If the data are in multiple columns, the tool
treats each row or column as a separate data set.

Descriptive Statistics for Grouped Data

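 Population mean: \(\mu = \dfrac{\sum_{i=1}^{k} f_i x_i}{N}\)

 Sample mean: \(\bar{x} = \dfrac{\sum_{i=1}^{k} f_i x_i}{n}\)

 Population variance: \(\sigma^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N}\)

 Sample variance: \(s^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}\)

Here \(f_i\) is the frequency of cell \(i\) and \(x_i\) is its representative value.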

Grouped Data

 If the data are grouped into k cells in a frequency distribution, we can use modified versions
of the formulas to estimate the mean and variance by replacing xi with a representative value
(such as the midpoint) for all the observations in each cell.
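The grouped-data estimates can be sketched in Python by weighting each cell's representative value by its frequency. The midpoints and frequencies below are hypothetical, and the function name and `sample` flag are choices made for this illustration.

```python
def grouped_mean_and_variance(midpoints, frequencies, sample=True):
    """Estimate the mean and variance from a frequency distribution.

    Each cell is represented by its midpoint, weighted by its frequency.
    """
    count = sum(frequencies)
    grouped_mean = sum(f * x for x, f in zip(midpoints, frequencies)) / count
    squared = sum(f * (x - grouped_mean) ** 2 for x, f in zip(midpoints, frequencies))
    denominator = count - 1 if sample else count
    return grouped_mean, squared / denominator

# Hypothetical frequency distribution with k = 3 cells.
midpoints = [5.0, 15.0, 25.0]
frequencies = [2, 5, 3]

sample_mean, sample_variance = grouped_mean_and_variance(midpoints, frequencies)
_, population_variance = grouped_mean_and_variance(midpoints, frequencies, sample=False)

print(sample_mean, round(sample_variance, 2), population_variance)
```

Because every observation in a cell is replaced by the midpoint, the results are estimates and will generally differ from values computed on the raw data.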

Descriptive Statistics for Categorical Data: The Proportion

 The proportion, denoted by p, is the fraction of data that have a certain characteristic.

 Proportions are key descriptive statistics for categorical data, such as defects or errors in
quality control applications or consumer preferences in market research.

Statistics in PivotTables

Value Field Settings include several statistical measures:

 Average

 Max and Min

 Product

 Standard deviation

 Variance

Measures of Association

 Two variables have a strong statistical relationship with one another if they appear to move
together.

 When two variables appear to be related, you might suspect a cause-and-effect relationship.

 Sometimes, however, statistical relationships exist even though a change in one variable is
not caused by a change in the other.

Measures of Association: Covariance

 Covariance (hiệp phương sai) is a measure of the linear association between two variables, X
and Y. Like the variance, different formulas are used for populations and samples.

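 Population covariance: \(\text{cov}(X, Y) = \dfrac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}\)

◦ Excel function: =COVARIANCE.P(array1,array2)

 Sample covariance: \(\text{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}\)

◦ Excel function: =COVARIANCE.S(array1,array2)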

 The covariance between X and Y is the average of the product of the deviations of each pair
of observations from their respective means.

 The larger the absolute value of the covariance, the higher the degree of linear association
between the two variables.

 The sign of the covariance tells us whether there is a direct relationship or an inverse
relationship.
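A hand-rolled covariance makes the role of the sign concrete. The sketch below uses hypothetical data; the function name and `sample` flag are choices made for this illustration.

```python
def covariance(xs, ys, sample=True):
    """Average product of paired deviations from the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys_direct = [2.0, 4.0, 6.0, 8.0]     # rises with x
ys_inverse = [8.0, 6.0, 4.0, 2.0]    # falls as x rises

cov_direct = covariance(xs, ys_direct)                   # like =COVARIANCE.S
cov_inverse = covariance(xs, ys_inverse)
cov_population = covariance(xs, ys_direct, sample=False)  # like =COVARIANCE.P

# Rescaling x (for example, dollars to cents) rescales the covariance too.
cov_rescaled = covariance([100 * x for x in xs], ys_direct)

print(cov_direct, cov_inverse, cov_population, cov_rescaled)
```

The last line of the computation shows that the magnitude of the covariance depends on the units of measurement, which is why covariances are hard to compare across different pairs of variables.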

Measures of Association: Correlation

 Correlation is a measure of the linear relationship between two variables, X and Y, which
does not depend on the units of measurement.

 Correlation is measured by the correlation coefficient, also known as the Pearson product
moment correlation coefficient.

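 Correlation coefficient for a population: \(\rho_{xy} = \dfrac{\text{cov}(X, Y)}{\sigma_x \sigma_y}\)

 Correlation coefficient for a sample: \(r_{xy} = \dfrac{\text{cov}(X, Y)}{s_x s_y}\)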

 The correlation coefficient is scaled between -1 and 1.

 Excel function: =CORREL(array1,array2)

Notes on the CORREL Function

 When using the CORREL function, it does not matter if the data represent samples or
populations. In other words,

CORREL(array1,array2) =

COVARIANCE.P(array1,array2) / (STDEV.P(array1)*STDEV.P(array2))

and

CORREL(array1,array2) =

COVARIANCE.S(array1,array2) / (STDEV.S(array1)*STDEV.S(array2))
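The reason the population and sample versions agree is that the same denominator (N or n - 1) appears in the covariance and in the product of the two standard deviations, so it cancels. The Python sketch below checks this on hypothetical data; the function name is a choice made for this illustration.

```python
from math import sqrt

def correlation(xs, ys, sample):
    """Covariance divided by the product of the two standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    denominator = n - 1 if sample else n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / denominator
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / denominator)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / denominator)
    return cov / (std_x * std_y)

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r_population = correlation(xs, ys, sample=False)
r_sample = correlation(xs, ys, sample=True)

print(round(r_population, 4), round(r_sample, 4))
```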

Excel Correlation Tool

Data > Data Analysis > Correlation

 Excel computes the correlation coefficient between all pairs of variables in the Input Range. Input
Range data must be in contiguous columns.

Identifying Outliers

 There is no standard definition of what constitutes an outlier.

 Some typical rules of thumb:

 z-scores greater than +3 or less than -3

 Extreme outliers are more than 3*IQR to the left of Q1 or right of Q3

 Mild outliers are between 1.5*IQR and 3*IQR to the left of Q1 or right of Q3
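These rules of thumb can be sketched in Python as follows. The data are hypothetical, the function names are choices made for this illustration, and the quartiles use the standard library's `method="inclusive"` convention, which may not match Excel's quartile functions for every data set.

```python
from statistics import mean, quantiles, stdev

def classify_by_iqr(data):
    """Label values lying more than 1.5*IQR (mild) or 3*IQR (extreme) outside Q1..Q3."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for x in data:
        distance = max(q1 - x, x - q3, 0)   # how far x lies outside the quartiles
        if distance > 3 * iqr:
            labels.append((x, "extreme"))
        elif distance > 1.5 * iqr:
            labels.append((x, "mild"))
    return labels

def flag_by_z_score(data, threshold=3.0):
    """Return values whose z-score is beyond +/- threshold."""
    center = mean(data)
    spread = stdev(data)
    return [x for x in data if abs((x - center) / spread) > threshold]

# Hypothetical observations with two unusually large values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 40]

iqr_outliers = classify_by_iqr(data)
z_outliers = flag_by_z_score(data)

print(iqr_outliers, z_outliers)
```

For this data set the IQR rule flags 18 as mild and 40 as extreme, while the z-score rule flags nothing, because the large values inflate the standard deviation. The two rules need not agree, which is consistent with there being no standard definition of an outlier.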

Statistical Thinking in Business Decisions

 Statistical Thinking is a philosophy of learning and action for improvement, based on principles that:

 all work occurs in a system of interconnected processes

 variation exists in all processes

 better performance results from understanding and reducing variation

 Work gets done in any organization through processes: systematic ways of doing things
that achieve desired results.

 Understanding business processes provides the context for determining the effects of
variation and the proper type of action to be taken.

Variability in Samples

 Different samples from any population will vary.

◦ They will have different means, standard deviations, and other statistical measures

◦ They will have differences in the shapes of histograms.

 Sample statistics are sensitive to the sample size, the number of observations included in
the sample: statistics from small samples tend to vary more from one sample to the next.
