Chapter 1: Descriptive Statistics: Example 1: Making Steel Rods

This document introduces descriptive statistics concepts. It discusses obtaining random samples from populations to make valid inferences. It describes variables and different ways of displaying and describing data, including measures of center (mean, median, mode), measures of spread (range, variance, standard deviation), and quantiles (quartiles, interquartile range). Outliers are identified as observations more than 1.5 times the interquartile range below the first quartile or above the third quartile. Examples are provided to illustrate key concepts.

Uploaded by

Cindy


CHAPTER 1: DESCRIPTIVE STATISTICS

1.1 Introduction

Example 1: Making Steel Rods

Consider a machine that makes steel rods for use in optical storage
devices. The specification for the diameter of the rods is 0.45 ± 0.02
cm. The machine makes 1000 rods per hour (continuous flow
production). The engineer wants to be fairly certain that the
percentage of good rods is at least 90%; otherwise he will shut
down the process for recalibration.

Example 2: Comparison of Breaking Strength of Two Alloys

In order to compare the strength qualities of two alloys, five
specimens from each group were selected randomly and their
breaking strength (force required to rupture a specimen in a tension
test) in megapascals was measured.

The following data were obtained:

Alloy A Alloy B
404 365
406 452
396 378
392 461
402 344

[Figure: plots of the breaking strengths for Alloy A and Alloy B]

Which alloy seems to be the better in terms of its strength?
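
A quick numerical comparison can help frame the question. The sketch below uses the sample mean and sample standard deviation (both defined in Section 1.6) on the five measurements per alloy; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, stdev

# Breaking strengths in megapascals, copied from the table above
alloy_a = [404, 406, 396, 392, 402]
alloy_b = [365, 452, 378, 461, 344]

mean_a, mean_b = mean(alloy_a), mean(alloy_b)
sd_a, sd_b = stdev(alloy_a), stdev(alloy_b)  # sample SD (divides by n - 1)

print(f"Alloy A: mean = {mean_a}, s = {sd_a:.2f}")
print(f"Alloy B: mean = {mean_b}, s = {sd_b:.2f}")
```

Both samples have the same mean, so in this data set the alloys differ in how much the measurements vary, not in their average strength.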

Study components: obtaining a random sample, collecting data
(obtaining trustworthy measurements), data analysis and
conclusions (generalization from the sample to the whole
population).

1.2 Applications of Statistics in Engineering

Quality control (in manufacturing operations, a fraction of the
output is randomly sampled and tested, so the process can be
corrected before a large number of defective items is produced)

Example 3: Filling process

A filling machine fills plastic bottles with a drink. Random
sampling is used to control the amount of drink in the bottles, so
the filling process can be corrected before it creates a large
number of underfilled or overfilled bottles.

Reliability (the ability of a device or system to perform a required
function under stated conditions for a specified period of time;
how long a component or a system will survive)

Example 4: Fatigue Tests for Aircraft Wheels

Suppose the time until failure of wheels used on commercial aircraft
needs to be estimated. The wheels are made of very strong alloys
able to support aircraft on the ground.

A machine is used to roll the wheel under the desired design load.
The times until failure for wheels randomly selected from the
production run (in thousands of km) are obtained. The data are
used to estimate time until failure for all wheels.

Example 5: Data Mining in Oil and Gas Extraction

Digitization of oil fields: surface rock and soil type, seismic data
(creating shock waves that pass through hidden rock layers and
interpreting the waves that are reflected back to the surface),
satellite images, small core samples obtained by shallow drilling.
Statistical models used to find economically viable fields.

1.3 Random Sampling

Population - the entire collection of individuals or objects about
which information is desired

Sample - the collection of individuals or objects we will actually
measure

[Diagram: Sample -> Population (generalization)]

Inferences - statements about the population based on the sample
data

Valid inferences about the population can be reached if the sample
is representative of the population.

Random sample - a sample in which the elements are chosen at
random (a random sample tends to be representative of the
population). Larger random samples tend to give more accurate
results than smaller samples.
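
The effect of sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population (the integers 1 to 1000), the sample sizes, the number of repetitions, and the function name `average_error` are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: the integers 1..1000 (population mean 500.5)
population = list(range(1, 1001))
pop_mean = mean(population)

def average_error(sample_size, repetitions=200):
    """Average absolute error of the sample mean over repeated random samples."""
    errors = []
    for _ in range(repetitions):
        sample = random.sample(population, sample_size)  # drawn without replacement
        errors.append(abs(mean(sample) - pop_mean))
    return mean(errors)

err_small = average_error(10)
err_large = average_error(500)
print(f"n = 10:  average error of the sample mean = {err_small:.1f}")
print(f"n = 500: average error of the sample mean = {err_large:.1f}")
```

On average, the sample mean from the larger samples lands much closer to the population mean.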

1.4 Variables

Variable - any characteristic of a person or thing that can be
expressed as a number or a label.

Variables are of two types:

Categorical (values are labels)

Quantitative, or numerical (values are numbers)

Categorical variables: gender, hair color, marital status

Quantitative variables: weight, height, age, income

1.5 Displaying Categorical Variables

Consider a class of 30 with 18 males and 12 females.

(a) Bar Graph

A vertical bar erected over each category; the height of the bar is
the frequency or the percentage of observations in the category.

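[Figure: bar graph with one bar each for Females and Males; the axis label is Percent]

(b) Pie Chart

Slices represent categories; the size of each slice corresponds
to the percentage for the category.

[Figure: pie chart with slices for Females and Males]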
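
Both displays are built from the category percentages. As a minimal sketch, the class of 30 can be summarized as follows; the text-mode bars are only a stand-in for a real bar graph.

```python
# Class of 30 with 18 males and 12 females (from the example above)
counts = {"Females": 12, "Males": 18}
total = sum(counts.values())

# Percentage of observations in each category
percents = {category: 100 * count / total for category, count in counts.items()}

for category, pct in percents.items():
    bar = "#" * round(pct / 5)  # one '#' per 5 percentage points
    print(f"{category:8s} {pct:5.1f}% {bar}")
```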

1.6 Describing Quantitative Variables

(a) Measures of Center

Sample Mean

Suppose a sample consists of n observations $x_1, x_2, \ldots, x_n$. The
sample mean $\bar{x}$ is defined as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

In compact notation:

$$\bar{x} = \frac{\sum x_i}{n}.$$

Sample mean $\bar{x}$ is an estimate of the population mean $\mu$ (the
mean of all observations in the whole population).

Example 6: 30 30 40 50 50 60 40     $\bar{x} = 300/7 \approx 42.857$

30 30 40 50 50 60 340     $\bar{x} = 600/7 \approx 85.714$

Conclusion: The mean is not a resistant measure of center (it is very
sensitive to outliers, observations that fall well below or above
the overall bulk of the data).
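
The two means in Example 6 can be checked with a few lines of code. This is a minimal sketch; the function name `sample_mean` is illustrative.

```python
def sample_mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

original = [30, 30, 40, 50, 50, 60, 40]
with_outlier = [30, 30, 40, 50, 50, 60, 340]  # last value 40 replaced by 340

mean_original = sample_mean(original)
mean_outlier = sample_mean(with_outlier)

print(f"mean without outlier: {mean_original:.3f}")
print(f"mean with outlier:    {mean_outlier:.3f}")
```

Changing a single observation doubles the mean, which is what "not resistant" means in practice.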

Sample Trimmed Mean

Delete some of the smallest and some of the largest observations
(usually bottom 10% and top 10% removed) and take the mean
of the remaining observations.
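
A minimal sketch of a trimmed mean follows. It assumes the trimming is specified as a count k of observations dropped from each end rather than as a percentage; with only seven observations, k = 1 removes about 14% from each end, so this is an illustration and not the usual 10% rule.

```python
def trimmed_mean(xs, k):
    """Mean after dropping the k smallest and the k largest observations."""
    if 2 * k >= len(xs):
        raise ValueError("k is too large: no observations would remain")
    ordered = sorted(xs)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Second data set of Example 6, which contains the outlier 340
data = [30, 30, 40, 50, 50, 60, 340]
trimmed = trimmed_mean(data, 1)
print(f"trimmed mean (one value dropped from each end): {trimmed}")
```

The trimmed mean stays close to the bulk of the data because the outlier 340 is discarded before averaging.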
The Sample Median

To compute the median:

(i) Arrange all observations in order, from smallest to largest

(ii) Median = the single middle value if n is odd, or
the average of the two middle values if n is even.

Example 7:

Data set 1: 30 60 40 30 50 40 50     Median = 40

Ordered list: 30 30 40 40 50 50 60

Data set 2: 30 60 40 30 50 48 50 40     Median = (40+48)/2 = 44

Ordered list: 30 30 40 40 48 50 50 60

The median remains the same if 60 is replaced by 600.

Conclusion: The median is a resistant measure of center.
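
The two-step rule for the median can be sketched in code and checked against Example 7. The function name `sample_median` is illustrative.

```python
def sample_median(xs):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

set1 = [30, 60, 40, 30, 50, 40, 50]
set2 = [30, 60, 40, 30, 50, 48, 50, 40]

median1 = sample_median(set1)
median2 = sample_median(set2)

# Replace 60 by 600 in data set 1 to check resistance to an outlier
set1_big = [600 if x == 60 else x for x in set1]
median1_big = sample_median(set1_big)

print(median1, median2, median1_big)
```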

Sample Mode

The sample mode is the most frequently occurring observation in
the sample (there is no mode if all observations occur with the
same frequency).

(b) Measures of Spread

Sample Range

Range = Largest - Smallest.

Range ignores all of the information between the largest and the
smallest values.

Variance and Standard Deviation

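Observations: $x_1, x_2, \ldots, x_n$

[Figure: the deviation $x_1 - \bar{x}$ of an observation $x_1$ from the sample mean $\bar{x}$]

Variance $s^2$ is defined as

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}.$$

Compact notation:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}.$$

Standard deviation s: $s = \sqrt{s^2}$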

Properties of s:

1. Measures the spread of observations about the mean.

2. s is not resistant to outliers.
3. s=0 only when there is no spread (all observations equal).
4. s is an estimate of the population standard deviation $\sigma$
(the standard deviation of all observations in the population).

Example 8: Compute the variance and standard deviation of the
observations: 20, 40, 50, 30, 60, 70

Solution:
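
One way to check a hand calculation is to apply the definition directly: compute the mean, the squared deviations from it, and divide their sum by n - 1. The variable names below are illustrative.

```python
from math import sqrt

data = [20, 40, 50, 30, 60, 70]
n = len(data)

x_bar = sum(data) / n                            # sample mean
squared_devs = [(x - x_bar) ** 2 for x in data]  # (x_i - x_bar)^2 for each observation
variance = sum(squared_devs) / (n - 1)           # divide by n - 1, not n
std_dev = sqrt(variance)

print(f"mean = {x_bar}, s^2 = {variance}, s = {std_dev:.3f}")
```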

Equivalent formula for the variance:

$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right].$$

Example 9: Use the above formula to recalculate the standard
deviation for the above data.
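
The equivalent formula needs only the sum of the observations and the sum of their squares. A minimal sketch for the observations 20, 40, 50, 30, 60, 70, with illustrative variable names:

```python
from math import sqrt

data = [20, 40, 50, 30, 60, 70]
n = len(data)

sum_x = sum(data)                    # sum of the observations
sum_x_sq = sum(x * x for x in data)  # sum of the squared observations

# s^2 = [ sum(x_i^2) - (sum(x_i))^2 / n ] / (n - 1)
variance = (sum_x_sq - sum_x ** 2 / n) / (n - 1)
std_dev = sqrt(variance)

print(f"sum = {sum_x}, sum of squares = {sum_x_sq}, s^2 = {variance}, s = {std_dev:.3f}")
```

The result agrees with the value obtained from the definition of the variance, as it must.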

Sample Quantiles

The p-th sample quantile is a value such that p percent of the
observations fall below or at that value.

Three useful quantiles are the quartiles. The lower (or first)
quartile has p=25, the median (or second quartile) has p=50, and
the upper (or third) quartile has p=75.

They are denoted by Q1, Q2, and Q3, respectively.

[Figure: the ordered data split at the median M = Q2 into a lower half and an upper half, shown for n even and for n odd]

Q1 = Median of the Lower Half,
Q2 = Overall Median,
Q3 = Median of the Upper Half.

Interquartile range IQR: IQR = Q3 - Q1.

IQR is a measure of spread in the data.

Example 10: Obtain the quartiles and IQR for the sample:

30 30 40 40 48 50 50 60 66 86 94 112

Solution:
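
The median-of-halves rule can be sketched as follows. One assumption is labeled in the code: when n is odd, the middle observation is left out of both halves (conventions differ, and the example here has n even, so it is not affected). The function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(xs):
    """Q1, Q2, Q3 by the median-of-halves rule."""
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n the middle observation belongs to neither half
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

sample = [30, 30, 40, 40, 48, 50, 50, 60, 66, 86, 94, 112]
q1, q2, q3 = quartiles(sample)
iqr = q3 - q1
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

Statistical software often uses interpolation rules for quartiles, so its output may differ slightly from this hand method.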

1.7 Outliers

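Outliers - observations separated from the main body of the data

Outlier - an observation more than 1.5 × IQR below Q1 or more than
1.5 × IQR above Q3

[Figure: number line marking Q1, Q2, and Q3, with a distance of 1.5 × IQR below Q1 and above Q3 and an outlier beyond it]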

Example 11: Are there any outliers in Example 10?

Solution:
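
The 1.5 × IQR rule can be checked mechanically. The sketch below recomputes Q1 and Q3 for the Example 10 sample as the medians of the lower and upper halves, then lists any observation beyond the two cut-offs; the names `lower_fence` and `upper_fence` are illustrative, not terms from the notes.

```python
sample = [30, 30, 40, 40, 48, 50, 50, 60, 66, 86, 94, 112]
ordered = sorted(sample)

# n = 12 is even, so each half holds 6 values
half = len(ordered) // 2
lower, upper = ordered[:half], ordered[half:]

q1 = (lower[2] + lower[3]) / 2  # median of the lower half
q3 = (upper[2] + upper[3]) / 2  # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in sample if x < lower_fence or x > upper_fence]

print(f"cut-offs: {lower_fence} and {upper_fence}; outliers: {outliers}")
```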

1.8 Displaying Quantitative Variables

Example 12: 30 examination scores:

75 79 58 73 82 94
61 77 54 77 65 67
62 61 64 45 58 86
66 83 70 91 48 78
86 66 52 80 59 55

(a) Histograms

1. Divide the range of the data into non-overlapping classes of
equal width.

Convention: the right-hand limit of each class is included and the
left-hand limit is excluded (Excel), i.e. classes of the form (a, b].
2. Count the number of observations (frequency) in each class.
3. Erect over each class a rectangle whose height equals the
frequency of that class.

Frequency Table:

Class Interval   Frequency   Relative Frequency
40-50            2           2/30
50-60            6           6/30
60-70            9           9/30
70-80            7           7/30
80-90            4           4/30
90-100           2           2/30
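
The class frequencies can be reproduced by counting, for each class, the scores that are greater than the left-hand limit and less than or equal to the right-hand limit. A minimal sketch with illustrative variable names:

```python
scores = [75, 79, 58, 73, 82, 94,
          61, 77, 54, 77, 65, 67,
          62, 61, 64, 45, 58, 86,
          66, 83, 70, 91, 48, 78,
          86, 66, 52, 80, 59, 55]

edges = [40, 50, 60, 70, 80, 90, 100]  # class limits of equal width 10

frequencies = []
for left, right in zip(edges, edges[1:]):
    # Right-hand limit included, left-hand limit excluded
    count = sum(1 for s in scores if left < s <= right)
    frequencies.append(count)
    print(f"{left}-{right}: frequency {count}, relative frequency {count}/{len(scores)}")
```

Under this convention a score of exactly 70 or 80 is counted in the class that ends at that value.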

[Figure: frequency histogram for the 30 scores; classes from 40 to 100 in steps of 10]

[Figure: relative frequency histogram of the 30 scores; same classes, heights from 2/30 to 9/30]

Shapes of histograms

Unimodal (one peak), bimodal (two peaks)

Symmetric, skewed right, skewed left

[Figure: three frequency histograms illustrating symmetric, skewed left, and skewed right shapes]

(b) Boxplots

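[Figure: boxplot. The box spans Q1 to Q3 (its length is the IQR) with a line at Q2. The whiskers extend to the smallest observation within 1.5 IQR from Q1 and to the largest observation within 1.5 IQR from Q3. Observations more than 1.5 IQR below Q1 or more than 1.5 IQR above Q3 are marked as outliers.]

[Figure: boxplots of skewed right, skewed left, and symmetric distributions]

For a symmetric unimodal distribution: Mean = Median = Mode.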

Example 13: Obtain the boxplot for the 30 exam scores. Repeat the
exercise with the score 94 replaced by 120.
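
The numbers needed to draw the boxplot can be computed before sketching it. The code below is a sketch under the same assumptions as the quartile rule in Section 1.6 (quartiles as medians of the halves; for odd n the middle observation is excluded from both halves). The function and key names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def boxplot_stats(xs):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    q1, q2, q3 = median(lower), median(ordered), median(upper)
    iqr = q3 - q1
    low_cut, high_cut = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_cut <= x <= high_cut]
    return {"q1": q1, "q2": q2, "q3": q3,
            "whisker_low": inside[0], "whisker_high": inside[-1],
            "outliers": [x for x in ordered if x < low_cut or x > high_cut]}

scores = [75, 79, 58, 73, 82, 94, 61, 77, 54, 77, 65, 67, 62, 61, 64,
          45, 58, 86, 66, 83, 70, 91, 48, 78, 86, 66, 52, 80, 59, 55]
modified = [120 if s == 94 else s for s in scores]  # 94 replaced by 120

stats_original = boxplot_stats(scores)
stats_modified = boxplot_stats(modified)
print(stats_original)
print(stats_modified)
```

In the modified data the quartiles do not change, but 120 lies beyond the upper cut-off, so the upper whisker stops at the next largest score.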

(c) Scatterplots

Scatterplots are used to display a relationship between two
numerical variables.

[Figure: scatterplot with axes Sales and Price]

(d) Time Series Plots (line charts)

[Figure: time series plot of a variable against equally-spaced time intervals]

Example 14: Lumber Cutting

An operator cuts 2-by-4 lumber into 96-inch lengths using a
table saw. However, few pieces will be exactly 96 inches long.
Sources of variation in the cut lengths: most saw blades wobble,
lumber is at least slightly warped, cuts become less precise as the
saw blade becomes duller. The lengths (in inches) of 20 cuts are
given below:

Order   Length   Order   Length
1       95.99    11      96.01
2       96.00    12      95.96
3       95.99    13      96.01
4       96.00    14      96.02
5       95.98    15      95.95
6       96.00    16      96.04
7       95.98    17      96.02
8       96.00    18      96.07
9       95.97    19      96.03
10      96.03    20      96.05
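
A rough text-mode version of the time series plot is sketched below, with cut order running down the page and length increasing to the right. Comparing the standard deviation of the first ten cuts with that of the last ten is an added illustrative check, not part of the original example, and it does not replace looking at the plot itself.

```python
from statistics import stdev

# Cut lengths in inches, in the order the cuts were made
lengths = [95.99, 96.00, 95.99, 96.00, 95.98, 96.00, 95.98, 96.00, 95.97, 96.03,
           96.01, 95.96, 96.01, 96.02, 95.95, 96.04, 96.02, 96.07, 96.03, 96.05]

# Text-mode time series plot: one row per cut
for order, length in enumerate(lengths, start=1):
    offset = round((length - 95.95) * 100)  # hundredths of an inch above 95.95
    print(f"{order:2d} {length:6.2f} " + " " * offset + "*")

first_half, second_half = lengths[:10], lengths[10:]
sd_first = stdev(first_half)
sd_second = stdev(second_half)
print(f"s for cuts 1-10:  {sd_first:.4f}")
print(f"s for cuts 11-20: {sd_second:.4f}")
```

In this data set the later cuts are more spread out than the earlier ones, which is consistent with cuts becoming less precise as the blade dulls.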
