CHAPTER 3
Averages and Variation
PART 1
MODE - for discrete data, the mode is the value that occurs most often. A data set may
have one, two, or even three modes, or no mode at all.
Example:
✔ 1,1,2,2,2,3,4,5,6,6 mode = 2
✔ -1,-1,0,0,0,1,2,3,3,4,4,4 mode = 0, 4
✔ 5,6,8,10,12,15,20 no mode
✔ 8,8,9,9,10,10,11,11,12,12 no mode
For continuous data, the mode is the peak (or peaks) of the distribution.
Advantages:
⮚ Easy to find
⮚ Not sensitive to extreme values
⮚ Only measure of central tendency for categorical data
Disadvantages:
⮚ Only uses some of the data
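A minimal sketch of finding the mode of discrete data in Python. The function name `modes` is made up for illustration, and the "no mode" rule (every value occurs equally often) is an assumption read off the examples above.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s), or [] when no value stands out."""
    counts = Counter(data)
    top = max(counts.values())
    # Assumed convention: if every value occurs equally often, there is no mode.
    if all(count == top for count in counts.values()):
        return []
    return sorted(value for value, count in counts.items() if count == top)

print(modes([1, 1, 2, 2, 2, 3, 4, 5, 6, 6]))              # [2]
print(modes([-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]))      # [0, 4]
print(modes([5, 6, 8, 10, 12, 15, 20]))                   # []
print(modes([8, 8, 9, 9, 10, 10, 11, 11, 12, 12]))        # []
```

The same function works for categorical data (for example a list of colors), since it only counts occurrences.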
MEDIAN - the central value of an ordered distribution: half of the data is below the
median and half of the data is above the median
1. Order the data from the smallest to largest
2. For an odd number of values, the median is the middle value
3. For an even number of values, the median is the average of the two middle values
Example:
5,6,6,8,10 median = 6
5,6,8,10,12,14 median = 9 ((8+10)/2)
For large data sets, it is handy to know that the position of the median in the ordered
data is (n + 1)/2
Advantage:
⮚ Not sensitive to extreme values
Disadvantage:
⮚ Only includes one or two data values
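The median procedure can be sketched in Python as follows. The function name `median` is chosen for illustration; Python's standard library offers `statistics.median`, which should give the same results.

```python
def median(data):
    """Median by the ordered-data rule: middle value, or mean of the two middle values."""
    ordered = sorted(data)          # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd n: the single middle value
        return ordered[mid]
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 6, 6, 8, 10]))         # 6
print(median([5, 6, 8, 10, 12, 14]))    # 9.0
```

For n = 6 the position (n + 1)/2 is 3.5, which is read as "halfway between the 3rd and 4th ordered values".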
MEAN - the average value. For discrete data:
mean = (sum of all values) / (number of values)
Sample mean: x̄ = Σx / n
Population mean: μ = Σx / N
Where:
● n is the sample size
● N is the population size
Example:
1,1,1,2,2,3,3,4,5,5 mean = 2.7 (27/10)
1,1,1,2,2,3,3,4,5,100 mean = 12.2 (122/10)
Advantages:
⮚ Every data value is used
⮚ Reliable: means of samples from the same population do not vary much (relatively
speaking)
Disadvantage:
⮚ Sensitive to extreme values
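The two mean examples can be checked with a short Python sketch. It also prints the median of each data set for comparison, to show how a single extreme value moves the mean but not the median. The variable names are illustrative only.

```python
import statistics

clean = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
with_outlier = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]

# mean = (sum of all values) / (number of values)
mean_clean = sum(clean) / len(clean)
mean_outlier = sum(with_outlier) / len(with_outlier)

print(mean_clean, mean_outlier)                                         # 2.7 12.2
print(statistics.median(clean), statistics.median(with_outlier))        # 2.5 2.5
```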
TRIMMED MEAN - we trim k% from both “ends” of the data: remove extreme
values.
Procedures:
1. Put the data in order from the smallest to largest
2. Calculate how many values make up k%: n × k / 100
3. Discard the number of values from (2) from the top and the bottom of the data
4. Calculate the mean of the remaining values
Example:
Calculate a 5% trimmed mean
1,1,2,3,4,4,5,5,5,6,6,6,6,7,7,8,9,10,18
n = 19
5% of 19 = 0.95
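A sketch of the trimmed-mean procedure in Python, applied to the 19 values above. The notes do not say how to handle a fractional count such as 0.95; rounding to the nearest whole number is an assumption made here (other texts and libraries may truncate instead). The function name `trimmed_mean` is illustrative.

```python
def trimmed_mean(data, k_percent):
    """Mean after discarding k% of the values from each end of the ordered data."""
    ordered = sorted(data)                      # step 1
    n = len(ordered)
    # Step 2. ASSUMPTION: round n*k/100 to the nearest whole number (halves round up).
    drop = int(n * k_percent / 100 + 0.5)
    if 2 * drop >= n:
        raise ValueError("trimming would remove every value")
    kept = ordered[drop:n - drop]               # step 3: discard from both ends
    return sum(kept) / len(kept)                # step 4

data = [1, 1, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 10, 18]
print(round(trimmed_mean(data, 5), 2))          # 5.53
print(round(sum(data) / len(data), 2))          # 5.95 (untrimmed mean)
```

Under this rounding rule, 0.95 becomes 1, so the smallest value (1) and the largest value (18) are discarded and the mean is taken over the remaining 17 values.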
WEIGHTED MEAN - gives more weight or importance to some values: like grades
Example:
You want to know your grade in statistics before the final exam. You currently have a
homework (20%) grade of 92, three test grades (12% each) of 100, 85, 96, and a
participation grade (20%) of 98.
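One way to work this example is sketched below in Python. The weights listed so far add up to 76%, not 100%; the sketch assumes the remaining weight belongs to the final exam and therefore divides by the sum of the weights actually used. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

grades = [92, 100, 85, 96, 98]     # homework, three tests, participation
weights = [20, 12, 12, 12, 20]     # percent weights; they total 76

print(round(weighted_mean(grades, weights), 1))   # 94.4
```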
PART 2
RANGE - the overall spread of the data between the minimum and maximum
values
R = max - min
Example:
-1,-1,0,0,0,1,2,3,3,4,4,4 R = 4 - (-1) = 5
5,6,8,10,12,15,100 R = 100 - 5 = 95
Advantage:
⮚ Easy to find
Disadvantage:
⮚ Very sensitive to extreme values
⮚ Does not provide information about the shape
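The range formula R = max - min is short enough to sketch directly in Python. The function name `data_range` is illustrative (it avoids shadowing the built-in `range`).

```python
def data_range(data):
    """R = max - min."""
    return max(data) - min(data)

print(data_range([-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]))   # 5
print(data_range([5, 6, 8, 10, 12, 15, 100]))               # 95
print(data_range([5, 6, 8, 10, 12, 15, 20]))                # 15
```

The last two lines differ only in the largest value, which shows how strongly one extreme value changes the range.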
STANDARD DEVIATION - it measures the variation of all values from the mean.
Advantages:
⮚ Uses all values
⮚ Same units as the data
Disadvantages:
⮚ Difficult to calculate
⮚ Sensitive to extreme values
Note: the variance is the square of the standard deviation
* The round-off rule for science states that you include one more decimal place than
you have in your data. But you do not round until the final answer
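The notes do not spell out the formula, so the sketch below assumes the usual textbook definitions: the sample standard deviation divides the sum of squared deviations by n - 1, and the population standard deviation divides by N. The function names are illustrative, and the data set is borrowed from the median example.

```python
import math

def sample_std(data):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )  (assumed textbook definition)."""
    n = len(data)
    mean = sum(data) / n
    squared_devs = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_devs / (n - 1))

def population_std(data):
    """sigma = sqrt( sum((x - mean)^2) / N )  (assumed textbook definition)."""
    n = len(data)
    mean = sum(data) / n
    squared_devs = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_devs / n)

data = [5, 6, 6, 8, 10]                   # mean is 7
print(sample_std(data))                   # 2.0
print(sample_std(data) ** 2)              # 4.0 (the sample variance)
print(round(population_std(data), 2))     # 1.79, rounded only at the end
```

Python's `statistics.stdev` and `statistics.pstdev` compute the same two quantities and can serve as a cross-check.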
PART 3
COEFFICIENT OF VARIATION (CV) - it is a measure of relative variation. We use it
to compare the variation in two or more samples or populations
Note: It is always better to have less variation
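A sketch of comparing two samples by CV in Python. It assumes the standard definition CV = (standard deviation / mean) × 100%, which these notes do not write out, and it uses two made-up samples. The function name `coefficient_of_variation` is illustrative.

```python
import statistics

def coefficient_of_variation(sample):
    """CV = (s / mean) * 100, as a percentage (assumed standard definition)."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

# Hypothetical samples: sample_b is sample_a measured on a scale 10 times larger.
sample_a = [5, 6, 6, 8, 10]
sample_b = [50, 60, 60, 80, 100]

print(statistics.stdev(sample_a), statistics.stdev(sample_b))   # 2.0 20.0
print(round(coefficient_of_variation(sample_a), 1))             # 28.6
print(round(coefficient_of_variation(sample_b), 1))             # 28.6
```

The standard deviations differ by a factor of 10, yet the CV is the same. This is why CV, rather than the standard deviation alone, is used to compare samples with different means or units.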
PART 4
CHEBYSHEV’S THEOREM
● Used to determine the minimum proportion of the data (or the population) that must lie
within k standard deviations to either side of the mean, where k is greater than 1
● For any set of data (either population or sample) and for any constant k greater than 1,
the proportion of the data that must lie within k standard deviations on either side of
the mean is at least 1 - 1/k²
● It applies to any distribution as long as the mean and standard deviation are defined
(finite)
● Tells us the minimum proportion (percentage) of the data (or the population) that falls
within k standard deviations of the mean (either side of the mean)
● A minimum of 88.9% (1 - 1/3² = 8/9) of the data falls between the values 3 standard
deviations below the mean and 3 standard deviations above the mean.
⮚ This implies that a maximum of 11.1% of the data falls beyond 3 standard deviations of
the mean
⮚ Such values might be suspected outliers, particularly for a mound-shaped symmetric
distribution
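The Chebyshev bound 1 - 1/k² is easy to tabulate. The Python sketch below uses an illustrative function name and rejects k ≤ 1, where the theorem gives no useful bound.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, round(chebyshev_min_proportion(k), 3))
# 2 0.75
# 3 0.889
# 4 0.938
```

These are guaranteed minimums for any distribution. A mound-shaped symmetric distribution typically has a larger share of its data inside each interval.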
PERCENTILES, QUARTILES & 5 NUMBER SUMMARY
PERCENTILE - the Pth percentile (1 ≤ P ≤ 99) of a distribution is a value such that P%
of the data fall at or below it and (100-P)% of the data fall at or above it.
Example:
If you are in the 89th percentile of math scores, what % of students have scores:
a. Below yours? 89%
b. Above yours? 11% (100% - 89%)
Note: There is no 100th percentile: every person is part of the 100%, so 100% of the
scores cannot fall below that person's own score.
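To illustrate the idea of "percent of scores below yours", here is a small Python sketch on made-up scores. The function name `percent_below` and the data are hypothetical. This is a simple count, not a full percentile-interpolation method.

```python
def percent_below(scores, your_score):
    """Percentage of scores strictly below your_score."""
    below = sum(1 for s in scores if s < your_score)
    return 100 * below / len(scores)

# Hypothetical class of 10 math scores.
scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]

print(percent_below(scores, 90))    # 80.0
print(percent_below(scores, 95))    # 90.0
```

Even the top scorer has only 90% of the class below, because that person's own score is part of the data. The percentage can never reach 100.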
QUARTILES
Q1 = 25th percentile
Q2 = 50th percentile (median)
Q3 = 75th percentile
Procedure:
1. Put the data in order from the smallest to largest
2. Find the median (Q2)
3. Find the median of the values below (not equal to) the median - Q1
4. Find the median of the values above (not equal to) the median - Q3
5 NUMBER SUMMARY
1. Minimum value = 111
2. Q1 = 182
3. Q2 = 221.5
4. Q3 = 319
5. Maximum value = 439
The 5 number summary for example 2 is:
111, 182, 221.5, 319, 439
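The quartile procedure can be sketched in Python as below. The data for example 2 is not reproduced in these notes, so the sketch uses a made-up data set. It interprets "values below the median" by position in the ordered list (for an odd number of values, the middle value is left out of both halves); that is an assumption, and software packages that interpolate may give slightly different quartiles.

```python
def _median(ordered):
    """Median of an already ordered, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves procedure."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                  # positions below the median
    upper = ordered[half + 1:] if n % 2 else ordered[half:] # positions above the median
    return (ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1])

# Hypothetical data, not the data from example 2.
print(five_number_summary([2, 4, 5, 7, 8, 10, 12, 15, 20]))   # (2, 4.5, 8, 13.5, 20)
print(five_number_summary([1, 3, 5, 7, 9, 11]))               # (1, 3, 6.0, 9, 11)
```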
BOX AND WHISKER PLOTS (BOX PLOTS) - a useful technique from
exploratory data analysis for describing data
Procedure:
1. Draw a horizontal scale
2. Above the scale, draw a box from Q1 to Q3 (the height of the box can vary)
3. Draw a solid vertical line from the top to the bottom of the box at Q2
4. Draw horizontal lines (whiskers) from the left end of the box (Q1) to the
minimum (lowest) value (located vertically near the center of the box) and from
the right end of the box (Q3) to the maximum (highest) value
Symmetric Distribution - if the line for Q2 is approximately at the center of the
box, the distribution is symmetric
Skewed to the left - the line is closer to Q3; the left side (horizontal plot) or lower
side (vertical plot) of the box is bigger
Skewed to the right - the line is closer to Q1; the right side (horizontal plot) or upper
side (vertical plot) of the box is bigger
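The reading of shape from a box plot can be sketched as a comparison of the two parts of the box. In the Python sketch below, the function name `describe_shape` and the 10% tolerance used for "approximately at the center" are assumptions chosen for illustration, not a standard rule.

```python
def describe_shape(q1, q2, q3, tolerance=0.1):
    """Classify the shape from where the Q2 line sits inside the box."""
    lower_part = q2 - q1          # left (or lower) part of the box
    upper_part = q3 - q2          # right (or upper) part of the box
    box_width = q3 - q1
    # ASSUMPTION: "approximately centered" means the two parts differ
    # by no more than tolerance * box width.
    if abs(upper_part - lower_part) <= tolerance * box_width:
        return "approximately symmetric"
    # Line closer to Q1 means the right part is bigger: skewed to the right.
    return "skewed right" if upper_part > lower_part else "skewed left"

print(describe_shape(182, 221.5, 319))   # skewed right
print(describe_shape(10, 20, 30))        # approximately symmetric
print(describe_shape(10, 28, 30))        # skewed left
```

The first call uses the quartiles from the 5 number summary above (Q1 = 182, Q2 = 221.5, Q3 = 319), where the median line sits much closer to Q1.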