03 The Nature of Data
We expect data to vary; if it didn’t, we would question its accuracy. But
because data varies, using it for decision making is a little more
challenging.
We normally won’t base a decision on a single data point. Instead we collect
multiple pieces of data, and we manage that collection to minimize
variation.
Thus variation is natural and expected, and it is the foundation of statistics.
Copyright Route Six Sigma, LLC 2003 6
Precise or Accurate – Which Way?
Data can be precise (small variation) but not accurate, like these arrows on
the target.
Or it can be accurate but lack precision (large variation).
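The distinction can be put into numbers. The sketch below is a minimal illustration, assuming accuracy is measured as the offset of the average from a known true value and precision as the sample standard deviation. The function name and the readings are hypothetical.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of a known true value."""
    # Accuracy: how far the average sits from the target
    bias = statistics.mean(measurements) - true_value
    # Precision: how tightly the measurements cluster (sample standard deviation)
    spread = statistics.stdev(measurements)
    return bias, spread

# Hypothetical readings of a part whose true length is 50.0
precise_not_accurate = [52.0, 52.1, 51.9, 52.0, 52.0]
accurate_not_precise = [47.0, 53.0, 50.0, 45.0, 55.0]

bias_a, spread_a = accuracy_and_precision(precise_not_accurate, 50.0)
bias_b, spread_b = accuracy_and_precision(accurate_not_precise, 50.0)
```

The first set has a small spread but a bias of about 2; the second has essentially no bias but a much larger spread.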
The Primary Sources of Variation
Inadequate Design Margin
Insufficient Process Capability
Unstable Parts and Material

Overall Capability   Observed Performance   Exp. "Within" Performance   Exp. "Overall" Performance
Pp   1.07            PPM < LSL  10000.00    PPM < LSL  3621.06          PPM < LSL  6328.16
PPU  1.32            PPM > USL      0.00    PPM > USL    10.51          PPM > USL    39.19
PPL  0.83            PPM Total  10000.00    PPM Total  3631.57          PPM Total  6367.35
Ppk  0.83
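The capability indices in this output can be related to each other with a short sketch. The formulas below are the standard textbook definitions, not something stated on the slide, and the mean, standard deviation, and spec limits are hypothetical values chosen only for illustration. The expected PPM calculation assumes normally distributed data.

```python
from statistics import NormalDist

def overall_capability(mean, sigma, lsl, usl):
    """Overall capability indices from the usual textbook definitions."""
    ppu = (usl - mean) / (3 * sigma)   # capability against the upper spec limit
    ppl = (mean - lsl) / (3 * sigma)   # capability against the lower spec limit
    pp = (usl - lsl) / (6 * sigma)     # spec width relative to process spread
    ppk = min(ppu, ppl)                # the worse of the two sides
    return {"Pp": pp, "PPU": ppu, "PPL": ppl, "Ppk": ppk}

def expected_ppm(mean, sigma, lsl, usl):
    """Expected parts per million outside each limit, assuming normal data."""
    dist = NormalDist(mean, sigma)
    below = dist.cdf(lsl) * 1_000_000
    above = (1 - dist.cdf(usl)) * 1_000_000
    return {"PPM < LSL": below, "PPM > USL": above, "PPM Total": below + above}

# Hypothetical process; these numbers are illustrative, not the slide's data
indices = overall_capability(mean=10.0, sigma=1.0, lsl=7.5, usl=14.0)
ppm = expected_ppm(mean=10.0, sigma=1.0, lsl=7.5, usl=14.0)
```

Under these definitions Pp is the average of PPU and PPL, which agrees with the table: (1.32 + 0.83) / 2 is about 1.07. A process sitting closer to the lower limit, as here, has PPL below PPU and most of its expected defects below LSL.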
Types of Variation

Common Cause
Unknown or chance cause of variation inherent in any process. It is not controllable with the technology used in the process.
It is also known as residual or background noise.
It limits the achievable variation in the process. So, the common cause variation in a process represents the best a process can be, from a variation perspective.
Control or improvement of common cause variation requires action on the system or process.

Special Cause
Causes that are distinct and assignable to a specific element or input to a process.
These causes are generally controllable with the existing technology.
These causes will affect the variation in the process output over time.
Special causes are often categorized by the 5 M’s and an E:
    Manpower
    Machinery
    Method
    Measurement
    Materials
    Environment
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$
Note that calculating the value of the density function requires knowing
the value of x, the value of the mean (μ), and the value of the standard
deviation (σ).
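As a minimal sketch, the normal density can be evaluated directly from those three inputs. The function and parameter names (`normal_density`, `mu`, `sigma`) are chosen here for illustration; `mu` is the mean and `sigma` the standard deviation.

```python
import math

def normal_density(x, mu, sigma):
    """Value of the normal density at x for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# The density peaks at the mean and is symmetric about it
peak = normal_density(50.0, mu=50.0, sigma=2.0)
left = normal_density(48.0, mu=50.0, sigma=2.0)
right = normal_density(52.0, mu=50.0, sigma=2.0)
```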
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Where X represents the name of the variable being observed
and x_i represents the ith value of x in the set of data. Σ
represents “sum of” and X̄ represents the mean of the x_i’s.
Note that μ is used in lieu of X̄ when the mean is of the
entire population.
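The mean is simply the sum of the observations divided by their count. A minimal sketch, with an illustrative function name and a small data set:

```python
def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by their count n."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    return sum(values) / n

data = [5, 7, 8, 9, 12, 15, 16]
x_bar = sample_mean(data)  # 72 / 7
```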
Example:
For the data set 5,7,8,9,12,15,16, the median is “9”.
For the data set 5,7,8,9,12,16, the median is 8.5 (the average of 8
and 9).
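Both cases in the example follow the same rule: sort the data, then take the middle value (odd count) or the average of the two middle values (even count). A minimal sketch, with an illustrative function name:

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([5, 7, 8, 9, 12, 15, 16])   # seven values
even_case = median([5, 7, 8, 9, 12, 16])      # six values
```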
Count the number of times each size appears. 10.5 is the mode.
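Finding the mode by counting can be sketched as below. The list of sizes is hypothetical, made up so that 10.5 occurs most often; it is not the data behind the slide.

```python
from collections import Counter

def mode(values):
    """Most frequent value; ties are resolved in favor of the first one encountered."""
    counts = Counter(values)
    value, _count = counts.most_common(1)[0]
    return value

# Hypothetical sizes, chosen so that 10.5 appears most often
sizes = [9, 10.5, 10.5, 11, 10.5, 9.5, 10, 10.5, 11]
most_common_size = mode(sizes)
```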
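$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^{2}}{N}}$$
$$s = \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^{2}}{n - 1}}$$
[Figure: plot of "normal2"; x-axis ticks at 45, 50, 55]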
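The difference between the population and sample standard deviation is only the divisor: N for a full population, n - 1 for a sample used to estimate it. A minimal sketch using the standard definitions, with illustrative function names and data:

```python
import math

def population_std(values):
    """Standard deviation of an entire population (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_std(values):
    """Estimate of the standard deviation from a sample (divide by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared deviations sum to 32
sigma = population_std(data)
s = sample_std(data)
```

The sample value is always the larger of the two for the same data, and the gap shrinks as n grows.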
Artificial scales
Likert Scales
Good; Better; Best
Agree; Neutral; Disagree
$$P(x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}$$
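The binomial probability of exactly x occurrences in n trials can be sketched as follows, assuming each trial has the same probability p and trials are independent. The function name and the example values (10 units, 10% defect rate) are hypothetical.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x occurrences in n independent trials with probability p."""
    if not 0 <= x <= n:
        return 0.0
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: 10 units, each defective with probability 0.1
n, p = 10, 0.1
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
p_no_defects = probabilities[0]   # equals 0.9 ** 10
```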
[Figure: two density plots, density axis from 0.0 to 0.3; the second is labeled "poisson", with an x-axis from 0 to 15]
Defects per Unit
Since we are counting to accumulate data, a
common thread is that the data will be a count per
unit of measure.
For example, the number of defects on a PC board.
It is common to report this data as
Defects per Unit (DPU).
But a computer motherboard will not have as
many opportunities for a defect as an Alcatel card
or a Sun CPU.
Is it accurate to report defects as DPU in both
instances and compare the results?
DPU forms the foundation for Six Sigma using discrete data.
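DPU is a count of defects divided by the number of units inspected. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def defects_per_unit(defect_count, units):
    """Defects per Unit: total defects observed divided by units inspected."""
    if units <= 0:
        raise ValueError("units must be positive")
    return defect_count / units

# Hypothetical inspection result: 30 defects found across 200 boards
dpu = defects_per_unit(defect_count=30, units=200)
```

Note that nothing in this calculation reflects how complex each unit is, which is why comparing DPU across very different products can mislead.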
Defects per Opportunity
To accurately compare discrete defect data from
different processes or products, it is appropriate to
include the number of opportunities for a defect to
occur in each unit as well as the number of units.
$$\text{Defects per opportunity} = \frac{\text{Defect count}}{\text{Units} \times (\text{opportunities per unit})}$$
Given this relationship, what should be our goal for
the opportunities?
Reduce the number of opportunities
Increase the ability of each opportunity to perform without
defects
DPO is the probability of a defect on any one CTQ or step of the process.
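A short sketch shows why DPO makes products of different complexity comparable. The function name and all counts are hypothetical: two products with the same defects per unit but a tenfold difference in opportunities per unit.

```python
def defects_per_opportunity(defect_count, units, opportunities_per_unit):
    """DPO: defects divided by the total number of opportunities for a defect."""
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("total opportunities must be positive")
    return defect_count / total_opportunities

# Hypothetical counts: both products show 30 defects in 200 units (DPU = 0.15)
simple_board_dpo = defects_per_opportunity(30, 200, opportunities_per_unit=50)
complex_board_dpo = defects_per_opportunity(30, 200, opportunities_per_unit=500)
```

With identical DPU, the more complex product has a DPO ten times lower, so each individual opportunity is performing better.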
What’s missing?
$$\text{Yield} = \text{Yield}_{Op_1} \times \text{Yield}_{Op_2} \times \text{Yield}_{Op_3} \times \cdots \times \text{Yield}_{Op_n}$$
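The overall yield is the product of the yields of the individual operations. A minimal sketch, assuming each operation's yield is a fraction between 0 and 1; the function name and the three step yields are hypothetical.

```python
import math

def overall_yield(operation_yields):
    """Product of the individual operation yields."""
    for y in operation_yields:
        if not 0.0 <= y <= 1.0:
            raise ValueError("each yield must be between 0 and 1")
    return math.prod(operation_yields)

# Hypothetical three-step process
step_yields = [0.99, 0.95, 0.98]
total = overall_yield(step_yields)
```

Because yields multiply, the overall yield can never exceed the weakest step, and every added operation can only lower it.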