Chapter 4 Full Gned 03
Introduction
Statistics is the branch of mathematics that deals with the collection, analysis, interpretation,
presentation, and organization of data. It involves the use of mathematical methods to draw conclusions
from data and make decisions based on those conclusions. Statistics is used in a variety of fields, including
business, finance, healthcare, social sciences, and engineering.
Data management refers to the process of collecting, storing, organizing, maintaining, and utilizing
data. Effective data management is critical for organizations to make informed decisions, improve
efficiency, and gain a competitive advantage. It involves using a variety of tools and techniques to ensure
that data is accurate, consistent, and accessible when needed. Data management can include tasks such
as data cleaning, data integration, and data security. Overall, statistics and data management go hand in
hand, as statistics is used to analyze and interpret the data that is managed by organizations.
Data is any piece of information expressed as a value or number. Raw data is
an initial collection of information that has not yet been organized.
Levels of Measurement
A more detailed distinction, termed as the levels of measurement, is used by some
researchers in examining the information that is collected. It is classified as follows:
1. Nominal Measurement – numbers or symbols are used to code or classify each element in
the population. Note that the assigned numbers have no numerical meaning.
Examples: gender, educational background, employment status.
2. Ordinal Measurement – uses numerical categories that express a meaningful order.
There is no indication of distance between positions. The numbers become meaningful
because they reveal whether one class or category is more or less than the other.
Categories are ranked according to the order of their value on the property like first,
second, third; oldest, next oldest, youngest.
Example: rank in beauty contest
3. Interval Measurement – has equal intervals. There is significance to the distance between
any two values. It tells us that one unit differs by a certain amount of the property from
another unit. It has no absolute zero.
Example: Aptitude test, temperature
4. Ratio Measurement – A variable measured at this level not only includes the concepts of order
and interval, but also includes the idea of ’nothingness’, or absolute zero.
Example: Measurement of height, weight, ages
Summary
Always remember that when collecting data, it is essential to ensure that the data is reliable and
accurate. This can be achieved by using random sampling, ensuring that the sample size is
appropriate, and using standardized methods of data collection.
B. Data Organization
Once data has been collected, it needs to be organized. This is done to make the data easier to
analyze and interpret. There are three forms for presenting and organizing data:
textual, tabular, and graphical.
2. Tabular Form – A systematic presentation of data in rows and columns. It is used when related
numerical facts need to be classified in arrays.
◼ It should be simple.
◼ It should focus the reader’s attention on the data rather than on the form.
◼ It should make the meaning and significance of the information being presented clear.
Different types of graphs can be used in data presentation based on purpose. These are:
◼
◼
◼
1. Which type of graph would you use to show the proportion of students who own a cat, dog, fish, bird, and no pet? ____________
2. What type of graph would you use to show the populations of 5 different cities in Cavite? ____________
3. What type of graph would you use to show the population change of Cavite over the past 25 years? ____________
Classes Frequency Class mark Class Boundaries Cumulative Frequency Relative Frequency
𝒄 𝒇 𝒙 𝑳𝑪𝑩 𝑼𝑪𝑩 < 𝑪𝑭 > 𝑪𝑭 𝒓𝒇%
First, array the data. An array is an arrangement of observations according to their magnitude, either
in increasing or decreasing order.
Solution:
1. Determine the range.
𝐑𝐚𝐧𝐠𝐞 = 𝐇𝐕− 𝐋𝐕 = 𝟗𝟔 − 𝟓𝟎 = 𝟒𝟔
2. Determine the number of classes, k.
𝐤 =𝟏+ 𝟑. 𝟑𝟐𝟐𝐥𝐨𝐠𝐍 = 𝟏+ 𝟑. 𝟑𝟐𝟐𝐥𝐨𝐠 (𝟏𝟎𝟎) = 𝟕. 𝟔𝟒𝟒
3. Determine the class size, c.
$c = \frac{R}{k} = \frac{46}{7.644} \approx 6.02 \approx 6$
4. Determine the lowest class.
The lowest class starts with the smallest value, which is 50. Then, counting six (c = 6) numbers from
50, we have 50 – 55 as the lowest class.
5. Determine all the classes.
6. Tally the frequencies for each class.
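The first five steps above can be sketched in Python. This is only an illustration, assuming integer data and the rule k = 1 + 3.322 log N used in this chapter; the function names `class_setup` and `build_classes` are made up for this sketch.

```python
import math

def class_setup(n_obs, highest, lowest):
    """Return (range, k, class size) using k = 1 + 3.322 * log10(N)."""
    data_range = highest - lowest
    k = 1 + 3.322 * math.log10(n_obs)
    class_size = round(data_range / k)  # class size rounded to a whole number
    return data_range, k, class_size

def build_classes(lowest, class_size, highest):
    """List (lower limit, upper limit) pairs, starting at the smallest value."""
    classes = []
    lower = lowest
    while lower <= highest:
        # counting class_size integers from the lower limit gives the upper limit
        classes.append((lower, lower + class_size - 1))
        lower += class_size
    return classes

# Values from the worked example: N = 100, HV = 96, LV = 50
data_range, k, c = class_setup(100, 96, 50)
print(data_range, round(k, 3), c)
print(build_classes(50, c, 96))
```

With the example values, the first class produced is 50 – 55, matching step 4. Tallying (step 6) would then count how many observations fall inside each pair of limits.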
Example 2:
Solution:
1. $\text{Range} = 172 - 112 = 60$
2. $k = 1 + 3.322 \log(50) = 6.64 \approx 7$
3. $c = \frac{60}{7} = 8.57 \approx 9$
A measure of central tendency is a summary measure that attempts to describe a whole set of
data with a single value that represents the middle or center of the data set. The most commonly used measures of
central tendency, or types of averages, are the arithmetic mean, median, and mode.
The Mean
The arithmetic mean is the most commonly used measure of central tendency. It is defined as the
sum of all the values of the observations divided by the number of observations. The mean for a finite
population with N elements is denoted by the Greek letter $\mu$ (read as mu). The sample mean, which is
denoted by $\bar{x}$ (read as x-bar), is used to estimate the population mean.
Examples
a. The numbers of employees at 5 different gift shops are 4, 8, 10, 12 and 6. Find the mean
number of employees for the five stores.
b. Scores in the Algebra Long Examination for a sample of 10 students are as follows: 84,
75, 90, 98, 88, 79, 95, 86, 93 and 89. Compute the mean score.
c. The items listed below represent the scores of seven BSIT students during the
midterm examination. Compute the mean score. 49, 65, 70, 55, 83, 75, and 50.
Solutions
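a. $\bar{x} = \frac{\sum x}{n} = \frac{4+6+8+10+12}{5} = \frac{40}{5} = 8$
b. $\bar{x} = \frac{\sum x}{n} = \frac{84+75+90+98+88+79+95+86+93+89}{10} = \frac{877}{10} = 87.7$
c. $\bar{x} = \frac{\sum x}{n} = \frac{49+65+70+55+83+75+50}{7} = \frac{447}{7} = 63.86$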
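As a quick check, the three solutions can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `mean` and the variable names are chosen here for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

employees = [4, 8, 10, 12, 6]
algebra_scores = [84, 75, 90, 98, 88, 79, 95, 86, 93, 89]
bsit_scores = [49, 65, 70, 55, 83, 75, 50]

print(mean(employees))
print(round(mean(algebra_scores), 1))
print(round(mean(bsit_scores), 2))
```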
Try this!
This is possible only when we can assume the class mark to be representative of all
the values in that class. If the assumption holds, the following equation may be used to
approximate the mean from a frequency distribution.
Example
$\bar{x} = \frac{\sum f_i X_i}{N} = \frac{1273}{64} = 19.89$
Therefore, the average score of 64 students is 𝟏𝟗. 𝟖𝟗.
Note: In the frequency distribution table, add another column for the products
$f_i X_i$, then take the sum of this column.
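The extra column of products described in the note can be sketched as follows, using the class marks and frequencies from the table of 64 long-quiz scores that appears in this chapter. The variable names are illustrative only.

```python
class_marks = [7, 12, 17, 22, 27, 32, 37]
frequencies = [7, 10, 13, 18, 8, 5, 3]

# The added column: frequency times class mark for every class
fx_column = [f * x for f, x in zip(frequencies, class_marks)]

total_fx = sum(fx_column)      # sum of the f*X column
n = sum(frequencies)           # total number of observations
grouped_mean = total_fx / n

print(fx_column)
print(total_fx, n, round(grouped_mean, 2))
```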
Examples
1. The daily rates of a sample of eight employees at GMS Inc. are ₱550, ₱420, ₱560, ₱500,
₱700, ₱670, ₱860, ₱480. Find the median daily rate of the employees.
Solution:
Arrange the data in ascending order: 420, 480, 500, 550, 560, 670, 700, 860
Since $n = 8$ is even, we use the following formula to find the two middle-ranked values.
$\tilde{x} = \frac{1}{2}\left[\left(\frac{n}{2}\right)^{th} + \left(\frac{n}{2} + 1\right)^{th}\right]$
$\tilde{x} = \frac{1}{2}\left[\left(\frac{8}{2}\right)^{th} + \left(\frac{8}{2} + 1\right)^{th}\right]$
$\tilde{x} = \frac{1}{2}\left[4^{th} + 5^{th}\right]$
$\tilde{x} = \frac{1}{2}\left[550 + 560\right]$
$\tilde{x} = \frac{1}{2}\left[1110\right]$
$\tilde{x} = 555$
Therefore, the median daily rate is ₱555.
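The same procedure (sort, then average the two middle values when the count is even) can be written as a short Python sketch. The function name `median` is chosen here for illustration; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # (n/2)th and (n/2 + 1)th values, in 1-based ranking
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]

daily_rates = [550, 420, 560, 500, 700, 670, 860, 480]
print(median(daily_rates))
```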
Try this!
Note: The median is the middle value of the frequency
distribution. It is the value that separates the upper half of the distribution from the lower
half. It is a measure of central tendency because it is the
exact center of the scores in a distribution.
Example
The table below represents the scores of 64 students in a long quiz
Solution
Step 1. Construct a less than cumulative frequency column in the table.
Class interval    Frequency (f)    Class mark (x)    Class boundary    < cf
5 – 9             7                7                 4.5 – 9.5         7
10 – 14           10               12                9.5 – 14.5        17
15 – 19           13               17                14.5 – 19.5       30
20 – 24           18               22                19.5 – 24.5       48
25 – 29           8                27                24.5 – 29.5       56
30 – 34           5                32                29.5 – 34.5       61
35 – 39           3                37                34.5 – 39.5       64
Total             N = 64
Step 4: Apply the median formula to compute for the value of the median.
$\tilde{x} = L_{md} + c\left[\frac{\frac{n}{2} - F_b}{f_{md}}\right]$
$\tilde{x} = 19.5 + 5\left[\frac{\frac{64}{2} - 30}{18}\right]$
$\tilde{x} \approx 20.0556$
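The grouped-data median can be sketched in Python as below. It assumes the median class is the first class whose less-than cumulative frequency reaches n/2, which is consistent with the numbers substituted above (lower boundary 19.5, cumulative frequency before the class 30, class frequency 18). The function name is hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, class_size):
    """Median of grouped data: L_md + c * ((n/2 - F_b) / f_md)."""
    n = sum(frequencies)
    cumulative_before = 0  # F_b: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative_before + f >= n / 2:
            # this is the median class
            return lower + class_size * ((n / 2 - cumulative_before) / f)
        cumulative_before += f
    raise ValueError("frequencies must not be empty")

lower_boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
frequencies = [7, 10, 13, 18, 8, 5, 3]
print(round(grouped_median(lower_boundaries, frequencies, 5), 4))
```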
The Mode
The mode (denoted by $\hat{x}$, read as x-hat) is the most frequently occurring score in a given data set.
It locates the point where the observation values occur with the greatest density. It does not always exist,
and if it does, it may not be unique. The mode is not easily affected by extreme values; that is, even if we
add values to a given data set, the mode will not change as easily as the mean. It can be used for
qualitative as well as quantitative data.
Solutions
a. No mode, because no value is repeated.
b. $\hat{x} = 2$, unimodal
c. $\hat{x} = 1$ and $3$, bimodal
d. $\hat{x} = 1, 2, 3, 4, 5$, multimodal
e. $\hat{x} = \text{Blue}$
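The cases listed in the solutions (no mode, unimodal, bimodal, and a qualitative mode) can be illustrated with a small Python sketch. The data sets below are made up for illustration and are not the ones from the exercise; the function name `modes` is also hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value occurs more than once, so there is no mode
    return [value for value, count in counts.items() if count == highest]

# Hypothetical data sets, chosen only to show each case
print(modes([4, 7, 9, 12]))            # no mode
print(modes([2, 2, 5, 8]))             # unimodal
print(modes([1, 1, 3, 3, 6]))          # bimodal
print(modes(["Blue", "Red", "Blue"]))  # qualitative data
```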
Example
The table below represents the scores of 64 students in a long quiz
Step 1. Identify the modal class. Modal class is the class with the highest frequency.
In the table, the modal class is the interval 20 – 24.
Class interval    Frequency (f)    Class mark (x)    Class boundary
5 – 9             7                7                 4.5 – 9.5
10 – 14           10               12                9.5 – 14.5
15 – 19           13               17                14.5 – 19.5
20 – 24           18               22                19.5 – 24.5
25 – 29           8                27                24.5 – 29.5
30 – 34           5                32                29.5 – 34.5
35 – 39           3                37                34.5 – 39.5
Total             N = 64
$\hat{x} = 19.5 + 5\left[\frac{18 - 13}{2(18) - 13 - 8}\right]$
$\hat{x} = 21.1667 \approx 21.17$
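The substitution above matches the usual grouped-mode formula: lower boundary of the modal class plus c times (f_mo − f_before) / (2 f_mo − f_before − f_after). A minimal Python sketch under that assumption follows; the symbol names f_mo, f_before and f_after and the function name are introduced here and do not come from the text.

```python
def grouped_mode(lower_boundaries, frequencies, class_size):
    """Mode of grouped data from the modal class and its two neighbours."""
    i = frequencies.index(max(frequencies))      # position of the modal class
    f_mo = frequencies[i]
    f_before = frequencies[i - 1] if i > 0 else 0
    f_after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    ratio = (f_mo - f_before) / (2 * f_mo - f_before - f_after)
    return lower_boundaries[i] + class_size * ratio

lower_boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
frequencies = [7, 10, 13, 18, 8, 5, 3]
print(round(grouped_mode(lower_boundaries, frequencies, 5), 2))
```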
Measures of dispersion indicate the extent to which individual items in a series are scattered about
the average. They are used to determine the extent of scatter so that steps may be taken to control the existing
variation. They are also used as a measure of the reliability of the average value.
Suppose the researcher wishes to conduct a study on the achievements of teacher-education
students in Zoology. The results are shown in two samples below. The two samples have the same
arithmetic mean, but Sample B is more spread than Sample A. Consider these two sets of scores in a
Zoology examination.
Sample A: 30 31 33 35 37 38 40 42 44 45
Sample B: 20 24 25 28 30 35 40 43 60 70
Both sets of scores have the same arithmetic mean, 37.5, but the scores in Sample A are
homogeneous while those in Sample B are heterogeneous. The measures of absolute dispersion are expressed
in the units of the original observations. They cannot be used to compare the variation of two data sets when
the averages of these data sets differ greatly in value or when the observations differ in units of measurement.
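The claim about the two Zoology samples can be checked with Python's standard `statistics` module. This is only a sketch; it uses the sample standard deviation as the measure of spread, which is one reasonable choice among the measures of dispersion.

```python
from statistics import mean, stdev

sample_a = [30, 31, 33, 35, 37, 38, 40, 42, 44, 45]
sample_b = [20, 24, 25, 28, 30, 35, 40, 43, 60, 70]

for name, scores in (("A", sample_a), ("B", sample_b)):
    # same mean for both samples, but a larger spread for Sample B
    print(name, mean(scores), round(stdev(scores), 2))
```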
Range
The range of a set of measurements is the simplest and easiest measure of dispersion. It is easily
obtained and readily understood. However, it is the most unreliable because its value depends only on the
extreme values (highest and lowest) of the data set.
Absolute Range. The absolute range is the difference between the largest and the smallest value
of ungrouped data. For grouped data, the range is the difference between the upper limit of the
highest class and the lower limit of the lowest class.
Example
Find the absolute range of the following:
a. 10 9 15 18 18 8 10 14 12 16
b. The table below represents the scores of 64 students in a long quiz.
Class interval    Frequency (f)    Class mark (x)
5 – 9             7                7
10 – 14           10               12
15 – 19           13               17
20 – 24           18               22
25 – 29           8                27
30 – 34           5                32
35 – 39           3                37
Total             N = 64
Solutions
a. Absolute Range = HV – LV = 18 – 8 = 10
b. Absolute Range = UHC – LLC = 39 – 5 = 34
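Both range computations can be sketched in a few lines of Python. The function names are illustrative; the grouped version assumes each class is given as a (lower limit, upper limit) pair.

```python
def absolute_range(values):
    """Ungrouped data: highest value minus lowest value."""
    return max(values) - min(values)

def grouped_range(class_limits):
    """Grouped data: upper limit of the highest class minus lower limit of the lowest class."""
    lowest_lower = min(lower for lower, _ in class_limits)
    highest_upper = max(upper for _, upper in class_limits)
    return highest_upper - lowest_lower

scores = [10, 9, 15, 18, 18, 8, 10, 14, 12, 16]
classes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

print(absolute_range(scores))
print(grouped_range(classes))
```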
Example
Determine the mean deviation for the data shown below.
10 9 15 18 18 8 10 14 12 16
Solutions
1. Determine the mean.
$\bar{x} = \frac{10 + 9 + 15 + 18 + 18 + 8 + 10 + 14 + 12 + 16}{10} = \frac{130}{10} = 13$
2. Compute the mean deviation.
$MD = \frac{\sum |X - \bar{x}|}{N} = \frac{|10 - 13| + |9 - 13| + |15 - 13| + \dots + |16 - 13|}{10} = \frac{32}{10} = 3.2$
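The two steps (find the mean, then average the absolute deviations from it) translate directly into Python. This is a minimal sketch with an illustrative function name.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    n = len(values)
    m = sum(values) / n                       # step 1: the mean
    return sum(abs(x - m) for x in values) / n  # step 2: mean absolute deviation

scores = [10, 9, 15, 18, 18, 8, 10, 14, 12, 16]
print(mean_deviation(scores))
```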
Variance
Example
1. Determine the variance for the data shown below. 10 9 15 18 18 8 10 14 12 16.
2. The table below represents the scores of 64 students in a long quiz. Find the variance.
Class interval    Frequency (f)    Class mark (x)
5 – 9             7                7
10 – 14           10               12
15 – 19           13               17
20 – 24           18               22
25 – 29           8                27
30 – 34           5                32
35 – 39           3                37
Total             N = 64
$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{3990.23}{64 - 1} = 63.34$
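Both variance problems can be worked through with a short Python sketch using the sample formula with n − 1 in the denominator, as in the grouped computation above. The function names are chosen here for illustration.

```python
def sample_variance(values):
    """Ungrouped sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

def grouped_sample_variance(class_marks, frequencies):
    """Grouped sample variance: sum of f * (x - mean)^2 divided by n - 1."""
    n = sum(frequencies)
    m = sum(f * x for f, x in zip(frequencies, class_marks)) / n
    squared = sum(f * (x - m) ** 2 for f, x in zip(frequencies, class_marks))
    return squared / (n - 1)

scores = [10, 9, 15, 18, 18, 8, 10, 14, 12, 16]
class_marks = [7, 12, 17, 22, 27, 32, 37]
frequencies = [7, 10, 13, 18, 8, 5, 3]

print(round(sample_variance(scores), 2))
print(round(grouped_sample_variance(class_marks, frequencies), 2))
```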
Standard Deviation
Standard deviation is another measure of dispersion which is most commonly used as a guide for
the degree of dispersion or spread. It is also the most dependable measure to calculate the variability of the
total population from which the sample came. It is equivalent to the square root of the variance.
$s = \sqrt{s^2}$
Example
1. Determine the standard deviation for the data shown below. 10 9 15 18 18 8 10 14 12 16.
2. The table below represents the scores of 64 students in a long quiz. Find the standard deviation.
Class interval    Frequency (f)    Class mark (x)
5 – 9             7                7
10 – 14           10               12
15 – 19           13               17
20 – 24           18               22
25 – 29           8                27
30 – 34           5                32
35 – 39           3                37
Solutions
1. Since the computed variance for the above data is 13.78, its standard deviation is the
square root of 13.78, or $s = \sqrt{13.78} = 3.71$.
2. Since the computed variance for the above grouped data is 63.34, its standard deviation
is the square root of 63.34, or $s = \sqrt{63.34} = 7.96$.
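These two results can be cross-checked with `statistics.stdev` from the Python standard library, which computes the sample standard deviation. For the grouped data, this sketch repeats each class mark as many times as its frequency, which assumes (as the grouped formulas do) that the class mark represents every score in its class.

```python
from statistics import stdev

scores = [10, 9, 15, 18, 18, 8, 10, 14, 12, 16]
class_marks = [7, 12, 17, 22, 27, 32, 37]
frequencies = [7, 10, 13, 18, 8, 5, 3]

# Expand the grouped table: each class mark appears f times
expanded = [x for x, f in zip(class_marks, frequencies) for _ in range(f)]

print(round(stdev(scores), 2))
print(round(stdev(expanded), 2))
```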
Measures of location are values below which a specified fraction or percentage of the observations
in a given set must fall.
The Quartiles
Quartiles are values that divide the array into four equal parts. Thus, Q1, read as first quartile, is
the value below which 25% of the values fall; Q2, read as second quartile, is the value below which 50% of
the values fall; Q3, read as third quartile, is the value below which 75% of the values fall. The second
quartile is computationally equivalent to the median.
1. If n is odd, $Q_i = i\left(\frac{n+1}{4}\right)^{th}$ observation
2. If n is even, $Q_i = i\left(\frac{n}{4}\right)^{th}$ observation
Example
1. From the given set of scores in a quiz: 3, 8, 9, 11, 12, 18, 19, find 𝑄1 and 𝑄3
2. Given the data, 6, 8, 9, 9, 10, 12, 15, 15, 17, 18, find 𝑄1 and 𝑄2
Solutions
1. Since n = 7 is odd, $Q_i = i\left(\frac{n+1}{4}\right)^{th}$ observation.
$Q_1 = 1\left(\frac{7+1}{4}\right)^{th}$ observation = 2nd observation, so $Q_1 = 8$.
Therefore, 25% of the scores in the quiz are below 8.
$Q_3 = 3\left(\frac{7+1}{4}\right)^{th}$ observation = 6th observation, so $Q_3 = 18$.
Therefore, 75% of the scores in the quiz are below 18.
2. Since n = 10 is even, $Q_i = i\left(\frac{n}{4}\right)^{th}$ observation.
$Q_1 = 1\left(\frac{10}{4}\right)^{th}$ observation = 2.5 or 3rd observation, so $Q_1 = 9$.
Therefore, 25% of the scores in the quiz are below 9.
$Q_2 = 2\left(\frac{10}{4}\right)^{th}$ observation = 5th observation, so $Q_2 = 10$.
Therefore, 50% of the scores in the quiz are below 10.
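The position rule used in these solutions can be sketched in Python. One assumption is made loudly here: a fractional position is rounded up to the next whole observation (2.5 becomes the 3rd), which reproduces the worked answers. Statistical libraries usually interpolate between observations instead, so their quartiles may differ from the ones obtained with this rule. The function name is hypothetical.

```python
import math

def quartile(values, i):
    """i-th quartile by position: i(n+1)/4 if n is odd, i(n/4) if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 4 if n % 2 == 1 else i * n / 4
    rank = math.ceil(position)   # assumption: fractional positions round up
    return ordered[rank - 1]     # ranks are 1-based, list indexes are 0-based

quiz_a = [3, 8, 9, 11, 12, 18, 19]
quiz_b = [6, 8, 9, 9, 10, 12, 15, 15, 17, 18]

print(quartile(quiz_a, 1), quartile(quiz_a, 3))
print(quartile(quiz_b, 1), quartile(quiz_b, 2))
```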
The Deciles
Deciles are values that divide the array into ten equal parts. Thus, D1, read as first decile, is the
value below which 10% of the values fall; D2, read as second decile, is the value below which 20% of the
values fall; and so on. The fifth decile is computationally equivalent to the median.
Example
1. From the given set of scores in a quiz: 3, 8, 9, 11, 12, 18, 19, find 𝐷4 and 𝐷7
Solutions
$D_i = i\left(\frac{n}{10}\right)^{th}$ observation
$D_4 = 4\left(\frac{7}{10}\right)^{th}$ observation = 2.8 or 3rd observation, so $D_4 = 9$.
Therefore, 40% of the scores in the quiz are below 9.
$D_7 = 7\left(\frac{7}{10}\right)^{th}$ observation = 4.9 or 5th observation, so $D_7 = 12$.
Therefore, 70% of the scores in the quiz are below 12.
The Percentiles
Percentiles are values that divide the array into one hundred equal parts. Thus, $P_1$, read as first
percentile, is the value below which 1% of the values fall; $P_2$, read as second percentile, is the value below
which 2% of the values fall; and so on. The fiftieth percentile is computationally equivalent to the median.
Example
1. From the given set of scores in a quiz: 3, 8, 9, 11, 12, 18, 19, 20, 21, 22, find the 45th
percentile (𝑃45 ) and the 77th percentile (𝑃77 )
$P_{45} = 12$. Therefore, 45% of the scores in the quiz are below 12.
$P_{77} = 20$. Therefore, 77% of the scores in the quiz are below 20.
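The decile and percentile answers in this chapter are consistent with one position rule: take the i(n/parts)th observation, rounding a fractional position up. The sketch below assumes that rule (the percentile steps are not shown in the text, so this is an inference from the stated answers); `fractile` is a made-up name, with `parts` equal to 10 for deciles and 100 for percentiles.

```python
import math

def fractile(values, i, parts):
    """Value at position i * n / parts in the sorted data, rounding the position up."""
    ordered = sorted(values)
    n = len(ordered)
    rank = math.ceil(i * n / parts)  # assumption: fractional positions round up
    return ordered[rank - 1]

decile_data = [3, 8, 9, 11, 12, 18, 19]
percentile_data = [3, 8, 9, 11, 12, 18, 19, 20, 21, 22]

print(fractile(decile_data, 4, 10), fractile(decile_data, 7, 10))
print(fractile(percentile_data, 45, 100), fractile(percentile_data, 77, 100))
```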
The normal distribution is sometimes called the "bell curve" and "Gaussian curve" after the
mathematician Karl Friedrich Gauss. Although Gauss played an important role in its history, Abraham de
Moivre first discovered the normal distribution.
The normal curve is a very important distribution in the behavioral sciences. There are three
principal reasons. First, many of the variables measured in behavioral research have distributions that quite
closely approximate the normal curve. Second, many of the inference tests used in analyzing experiments
have sampling distributions that become normally distributed with increasing sample size. Finally, many
inference tests require sampling distributions that are normally distributed. Thus, much of the importance of
the normal curve occurs in conjunction with inferential statistics.
The normal curve is a theoretical distribution of population scores. It is a bell-shaped curve that is
described by the following equation:
$Y = \frac{N}{\sigma\sqrt{2\pi}} \, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$
Most of us will never need to know the exact equation of the normal curve. It has been given here
primarily to make the point that the normal curve is a theoretical curve that is mathematically generated.
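For readers who want to see the curve generated numerically, the equation Y = N / (σ√(2π)) · e^(−(X − μ)² / (2σ²)) can be evaluated with a short Python sketch. The function name and the default N = 1 (which makes the total area under the curve equal to 1) are choices made for this illustration.

```python
import math

def normal_curve(x, mu, sigma, n=1.0):
    """Height Y of the normal curve at X = x for mean mu and standard deviation sigma."""
    coefficient = n / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Heights at the mean and one standard deviation on either side (mu = 0, sigma = 1)
for x in (-1, 0, 1):
    print(x, round(normal_curve(x, mu=0, sigma=1), 4))
```

The curve is highest at the mean and gives equal heights at equal distances on either side of it, which is the bell shape described in the text.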
To give us a deeper understanding of the concept of the normal distribution, let us learn more about
its properties. The following are the properties that can be observed from the graph of a normal
distribution.
1. The graph is a continuous curve and has a domain −∞ < 𝑋 < ∞. This means that X may increase
or decrease without bound.
2. The graph is asymptotic to the x-axis. The height of the curve gets closer and closer to 0 but will never
be equal to 0. As x gets larger and larger in the positive direction, the tail of the curve approaches but never touches the horizontal axis.
5. The total area in the normal distribution under the curve is equal to 1. Since the mean divides the
curve into halves, 50% of the area is to the right and 50% to its left having a total of 100% or 1.
6. In general, the graph of a normal distribution is a bell-shaped curve with two inflection points, one on
the left and another on the right. Inflection points are the points that mark the change in the curve’s
concavity.
• An inflection point is a point at which the curve changes its direction of bending. For the normal curve, these points occur at the mean minus one standard deviation and at the mean plus one standard deviation.
• Note that each inflection point of the normal curve is one standard deviation away from the mean.
7. Every normal curve follows the “Empirical Rule” (also called the 68 - 95 - 99.7% rule):
• about 68.3% of the area under the curve falls within 1 standard deviation of the mean.
• about 95.4% of the area under the curve falls within 2 standard deviations of the mean.
• about 99.7% of the area under the curve falls within 3 standard deviations of the mean.
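The percentages in the Empirical Rule can be verified numerically. The sketch below uses the error function from Python's `math` module; the area within k standard deviations of the mean equals erf(k / √2) for any normal curve. The function name is illustrative.

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
```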
We can easily identify the area of the regions under normal curve by using the Table of Areas under
the Normal Curve which is also known as z-Table (see attachment at the last page of this chapter). This
table gives an area to any value of z from -3.99 to 3.99. The value from this table will describe the area of
the specific region of the curve to the left of the given z-value.
Solution: First, split the given z-value into two parts: the whole number and tenths digit (-1.6), found
at the left side of the table, and the hundredths digit (0.09), found along the top of the table. The
intersection of that row and column is the area under the normal curve to the left of the z-value.
Illustration:
Example 1:
Step 3. The intersection of the row for -1.3 and the column for 0.05 is 0.0885.
Step 5. 0.9115

Step 4. Since the shaded region lies between the two z-values, subtract 0.0968 from 0.9772.
The difference is 0.8804.
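The table lookups can be cross-checked with `statistics.NormalDist` from the Python standard library, whose `cdf` method returns the area to the left of a z-value. In this sketch the z-values -1.30 and 2.00 are assumptions inferred from the table areas 0.0968 and 0.9772; they are not stated explicitly in the text.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_left_of(z):
    """Area under the standard normal curve to the left of z (what the z-Table lists)."""
    return standard_normal.cdf(z)

print(round(area_left_of(-1.69), 4))      # row -1.6, column 0.09
print(round(area_left_of(-1.35), 4))      # row -1.3, column 0.05
print(round(1 - area_left_of(-1.35), 4))  # area to the right of -1.35
# Area between two z-values: larger left-area minus smaller left-area
print(round(area_left_of(2.00) - area_left_of(-1.30), 4))
```

Because the table always gives the area to the left, an area to the right is found by subtracting from 1, and an area between two z-values is found by subtracting the smaller table entry from the larger one.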
Table entry for 𝒛 is the area under the standard normal curve to the left of 𝒛.