Statistics is the science of collecting, summarizing, analyzing, interpreting, and presenting
data in order to draw valid conclusions. Statistics is divided into two branches: descriptive and
inferential.
Descriptive Statistics: This involves methods for collecting, organizing, and presenting
information using graphs and numerical summaries.
Inferential Statistics: This involves the use of probability to generalize from a sample to the
larger population from which the sample was drawn, in order to draw conclusions about that population.
DATA AND DATA SOURCES
Statistical data are the raw facts of statistics. They may relate to an activity under study, a
phenomenon, or a situation of interest. Statistical data are obtained by measuring, counting,
and/or observing. An activity or phenomenon that generates data is termed a variable. In other
words, a variable is a characteristic that takes on different values upon successive measurements.
In statistics, data are classified into two categories: quantitative data and qualitative data.
This classification is based on the kind of characteristic that is measured.
Quantitative Data: These are data that can be expressed numerically or quantified in definite
units of measurement.
Examples: ages of students taking STS 102, UTME examination scores, etc. These observations are
expressed as numbers.
Depending on the nature of the variable observed, quantitative data can be further categorized
as continuous or discrete data.
Qualitative Data: These data cannot be expressed in numbers or quantified in units of
measurement. Examples include blood group, sex, nationality, etc. These data are further
classified as nominal and rank (ordinal) data.
DATA SOURCES
The sources of data are divided into two: primary and secondary.
Primary Data: These are data collected directly from the respondents. They are regarded as
first-hand information collected by the researcher. Primary data can be obtained from:
Census
Survey
Secondary Data: These are data that already exist in published or unpublished sources.
They are available from existing source(s) but may not necessarily be in the form actually
required.
Sources of secondary data include:
Journal publications
Research or media organizations
Methods of Data Collection
The method of data collection depends on the problem at hand. There are various methods
of collecting data, namely:
Interviewing
Questionnaire
Observation
Telephone
Data Presentation
A set of raw data is organized numerically for ease of analysis and presentation. This is done
by creating a frequency table, also known as a frequency distribution. Presenting data in
tables, charts, and graphs gives a clearer meaning to the data.
Basic Terms
Class Interval: A symbol defining a class, e.g. 60–62, is called a class interval. The end
numbers, 60 and 62, are called class limits; the smaller number (60) is the lower class limit,
and the larger number (62) is the upper class limit.
Class Boundaries: The class boundaries are obtained by adding the upper limit of one class
interval to the lower limit of the next-higher class interval and dividing by 2.
Class Width or Class Size: The size, or width, of a class interval is the difference between the
lower and upper class boundaries. It is also referred to as the class width, class size, or class
length. If all class intervals of a frequency distribution have equal widths, this common width is
denoted by c. In such a case, c is equal to the difference between two successive lower class
limits or two successive upper class limits.
Class Mark: The class mark is the midpoint of the class interval and is obtained by adding the
lower and upper class limits and dividing by 2. The class mark is also called the class midpoint.
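The definitions above can be checked with a short calculation. The sketch below is only an illustration: the class 60–62 comes from the text, while the neighbouring classes 63–65 and 66–68 and the helper name class_summary are assumptions made for the example. It also assumes the gap between consecutive classes is the same throughout, so the outer boundaries of the first and last classes can be worked out in the same way as the inner ones.

```python
def class_summary(limits):
    """Return boundaries, width and class mark for each (lower, upper) class."""
    # Gap between one upper limit and the next lower limit (1 for whole-number data).
    gap = limits[1][0] - limits[0][1]
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,
            "mark": (lower + upper) / 2,
        })
    return rows


# 60-62 is the class from the text; 63-65 and 66-68 are assumed neighbours.
classes = [(60, 62), (63, 65), (66, 68)]
summary = class_summary(classes)
for row in summary:
    print(row)
```

For the class 63–65 this gives boundaries 62.5 and 65.5, a width of 3, and a class mark of 64. The width agrees with the difference between successive lower limits (63 - 60 = 3).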
Frequency: The frequency is the number of times a value occurs in the data.
Relative Frequency: The relative frequency is the ratio (fraction or proportion) of the number of
times a value occurs to the total number of observations. To find the relative frequencies,
divide each frequency by the total number of observations in the sample, n.
Cumulative Frequency: This is the sum of the frequency of a particular class and the frequencies
of all the classes before it.
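As a rough illustration of the last two definitions, the sketch below computes relative and cumulative frequencies. The values and frequencies are taken from the students' work-hours table in the Frequency Distribution section of these notes; the variable names are chosen only for this example.

```python
from itertools import accumulate

# Values and frequencies from the students' work-hours table in these notes.
values = [2, 3, 4, 5, 6, 7, 8]
frequencies = [3, 5, 3, 6, 2, 0, 1]

n = sum(frequencies)                        # total number of observations
relative = [f / n for f in frequencies]     # each frequency divided by n
cumulative = list(accumulate(frequencies))  # running total of the frequencies

print("Value  f  Relative  Cumulative")
for value, f, rel, cum in zip(values, frequencies, relative, cumulative):
    print(f"{value:>5}  {f}  {rel:>8.2f}  {cum:>10}")
```

The relative frequencies always add up to 1, and the last cumulative frequency always equals n, which gives a quick check on the arithmetic.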
Frequency Distribution
A frequency distribution is classified as either grouped or ungrouped.
Ungrouped Frequency Distribution: This is used for quantitative data sets and works best when
the range of the data is less than 10 units. The range is the difference between the largest and
the smallest data values. For example, twenty students were asked how many hours they worked
per day. Their responses, in hours, are as follows:
5; 6; 3; 3; 2; 4; 8; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3.
Range = 8 - 2 = 6
Since the range is 6, we keep each data value separate and do not group them. Creating an
ungrouped frequency distribution is simple. List the data values from smallest to largest,
without skipping any values, in the first column. Place the frequency (the count of each data
value) in the corresponding row of the second column.
The table below shows the data values in ascending order with their frequencies. Notice that all
values in the range are listed, including 7, which does not appear in the original data set.
Data Value    Frequency (f)
2             3
3             5
4             3
5             6
6             2
7             0
8             1
Frequency distribution of students' work hours
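The same table can be produced by counting, as in the minimal sketch below. It uses Counter from Python's standard library; the variable names are chosen for this example only.

```python
from collections import Counter

# Hours worked per day by the twenty students, as listed above.
hours = [5, 6, 3, 3, 2, 4, 8, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

counts = Counter(hours)
data_range = max(hours) - min(hours)

# List every value from smallest to largest, including values that never occur.
table = [(value, counts[value]) for value in range(min(hours), max(hours) + 1)]

print("Range =", data_range)
print("Data Value  Frequency (f)")
for value, frequency in table:
    print(f"{value:<10}  {frequency}")
```

Looping over every whole number from the minimum to the maximum, rather than over the observed values only, is what keeps the row for 7 (frequency 0) in the table.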
Grouped Frequency Distribution
This second type of frequency distribution is also used for quantitative data. However, it is
used when the range is large and the data values need to be grouped together. For example,
28 students were asked how many hours they worked per week. Their responses, in hours, are as
follows:
15; 26; 13; 33; 22; 14; 27; 15; 32; 23; 5; 26; 25; 14; 34; 13; 15; 22; 15; 28; 10; 18; 21; 24; 20; 18; 34; 20
Here there are too many different data values to list them separately as in the ungrouped
frequency distribution. Notice the range is 29 (highest - lowest = 34 - 5). Therefore we need to
construct a grouped frequency distribution and group the data values into classes.
A class is an interval where the lowest value of the interval is known as the lower limit and the
highest value of the interval is known as the upper limit.
Guidelines for classes:
There should be between 5 and 20 classes
Classes must be mutually exclusive (no overlap of data values)
Classes must be all inclusive and continuous
Classes must be equal in width
Constructing a Grouped Frequency Distribution:
1. Find the range (R): highest data value - lowest data value.
2. Determine the number of classes (C), usually a minimum of 5 and a maximum of 20 classes.
There are several suggested guidelines to help decide how many class intervals to employ.
Two such methods are:
(a) $C = 1 + 3.322 \log_{10} n$
(b) $C = \sqrt{n}$
where n = number of observations.
3. Determine the width of the class interval (W), given as $W = R / C$, where R is the range of
values and C is the number of classes.
Note: The class width is always rounded up, e.g. 5.8 is rounded up to 6.
4. Choose the first lower limit (usually the lowest data value).
5. Create the other lower limits of the classes by adding the class width to the previous lower
limit.
6. Create the upper limits so that the classes do not overlap.
7. Determine the number of observations falling into each class interval, i.e. find the class
frequencies.
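Steps 2 and 3 can be sketched in a few lines of code. The function names below are hypothetical and chosen only for illustration; method (b) is taken to be the square root of n, which is how the worked example in these notes computes it. The width is rounded up, as stated in the note to step 3.

```python
import math


def classes_log_rule(n):
    """Method (a): C = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


def classes_sqrt_rule(n):
    """Method (b): C = square root of n."""
    return math.sqrt(n)


def class_width(data_range, num_classes):
    """W = R / C, rounded up to the next whole number."""
    return math.ceil(data_range / num_classes)


n = 50
print(round(classes_log_rule(n), 2))   # about 6.64, i.e. roughly 7 classes
print(round(classes_sqrt_rule(n), 2))  # about 7.07, i.e. roughly 7 classes
print(class_width(26, 7))              # 26 / 7 = 3.71..., rounded up to 4
```

For n = 50 both methods suggest about 7 classes. One caution with whole-number data: when R / C is already a whole number, rounding up changes nothing, and the last class may stop just short of the largest value, so a slightly larger width or one extra class may be needed.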
Example 1: The following are the marks of 50 students in STS 102:
48 70 60 47 51 55 59 63 68 63 47 53 72 53 67 62 64 70 57 56
48 51 58 63 65 62 49 64 53 59 63 50 61 67 72 56 64 66 49 52
61 71 58 53 63 69 59 64 73 56.
(a) Construct a frequency table for the above data.
(b) Answer the following questions using the table obtained:
(i) How many students scored between 51 and 62?
(ii) How many students scored above 50?
(iii) What is the probability that a student selected at random from the class scored
less than 63?
Solution:
(a) Range (R) = largest value - smallest value = 73 - 47 = 26
Number of classes: $C = \sqrt{n} = \sqrt{50} = 7.07 \approx 7$
Class size or width: $W = R / C = 26 / 7 = 3.7 \approx 4$
Frequency Table
Marks    Tally            Frequency (f)
47-50    |||| ||          7
51-54    |||| ||          7
55-58    |||| ||          7
59-62    |||| |||         8
63-66    |||| |||| |      11
67-70    |||| |           6
71-74    ||||             4
Total                     50
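(b) (i) Students who scored between 51 and 62: 7 + 7 + 8 = 22
(ii) Students who scored above 50: 7 + 7 + 8 + 11 + 6 + 4 = 43
(iii) Students who scored less than 63: 7 + 7 + 7 + 8 = 29
Total number of students = 50
P(score less than 63) = 29/50 = 0.58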
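The table and the answers to part (b) can be cross-checked with a short script. This is a sketch only: the helper grouped_frequencies is a hypothetical name, and the classes are built from the values chosen in the solution (first lower limit 47, width 4, 7 classes).

```python
marks = [
    48, 70, 60, 47, 51, 55, 59, 63, 68, 63, 47, 53, 72, 53, 67, 62, 64, 70, 57, 56,
    48, 51, 58, 63, 65, 62, 49, 64, 53, 59, 63, 50, 61, 67, 72, 56, 64, 66, 49, 52,
    61, 71, 58, 53, 63, 69, 59, 64, 73, 56,
]


def grouped_frequencies(data, first_lower, width, num_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    table = []
    for i in range(num_classes):
        lower = first_lower + i * width
        upper = lower + width - 1  # whole-number marks, so limits do not overlap
        frequency = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, frequency))
    return table


table = grouped_frequencies(marks, first_lower=47, width=4, num_classes=7)
freqs = [f for _, _, f in table]

# Part (b), answered from the table by adding up whole classes.
between_51_and_62 = sum(f for lo, hi, f in table if lo >= 51 and hi <= 62)
above_50 = sum(f for lo, hi, f in table if lo > 50)
below_63 = sum(f for lo, hi, f in table if hi < 63)
prob_below_63 = below_63 / len(marks)

print(table)
print(between_51_and_62, above_50, below_63, prob_below_63)
```

Adding up whole classes works here because 51, 62, 50 and 63 all coincide with class limits; a cut-off that falls inside a class could not be answered exactly from the grouped table.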
Example 2: Twenty-eight students were asked how many hours they worked per week. Their
responses, in hours, are as follows:
15; 26; 13; 33; 22; 14; 27; 15; 32; 23; 5; 26; 25; 14; 34; 13; 15; 22; 15; 28; 10; 18; 21; 24; 20; 18; 34; 20
Construct a grouped frequency distribution using 5 classes.
Solution:
1. Range = 34 – 5 = 29
2. Use 5 classes
3. Class width = 29/5 = 5.8, rounded up to 6
4. The first lower limit will be 5, which is the minimum data value
5. The other lower limits will be 11, 17, 23, 29, obtained by adding the class width of 6 to the
previous lower limit
6. The first upper limit will be 10, since the next class begins at 11. Adding the class width
of 6 repeatedly gives the other upper limits: 16, 22, 28, 34
Class    Tally        Frequency (f)
5-10     ||           2
11-16    |||| |||     8
17-22    |||| ||      7
23-28    |||| ||      7
29-34    ||||         4
Total                 28
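The whole construction procedure can be run in one pass, as in the sketch below. The function name build_grouped_table is hypothetical; it assumes whole-number data, starts the first class at the minimum value, and rounds the width up, as in the solution above.

```python
import math

hours = [15, 26, 13, 33, 22, 14, 27, 15, 32, 23, 5, 26, 25, 14,
         34, 13, 15, 22, 15, 28, 10, 18, 21, 24, 20, 18, 34, 20]


def build_grouped_table(data, num_classes):
    """Follow the construction steps for whole-number data."""
    data_range = max(data) - min(data)           # step 1: range
    width = math.ceil(data_range / num_classes)  # step 3: width, rounded up
    table = []
    lower = min(data)                            # step 4: first lower limit
    for _ in range(num_classes):
        upper = lower + width - 1                # step 6: non-overlapping upper limit
        frequency = sum(lower <= x <= upper for x in data)  # step 7: class frequency
        table.append((lower, upper, frequency))
        lower += width                           # step 5: next lower limit
    return table


table = build_grouped_table(hours, num_classes=5)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

A useful check after building any grouped table is that the class frequencies add up to the number of observations, here 28.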
ASSIGNMENT 1
The following data represent the ages (in years) of people living in a housing estate
in Abeokuta.
18 31 30 6 16 17 18 43 2 8 32 33 9 18 33 19 21 13 13 14
14 6 52 45 61 23 26 15 14 15 14 27 36 19 37 11 12 11 20 12
39 20 40 69 63 29 64 27 15 28.
Present the above data in a frequency table showing the following columns: class
interval, class boundary, class mark (mid-point), tally, frequency, and cumulative
frequency.
ASSIGNMENT 2
The grade points of 40 students are given below. Using 8 classes, construct a frequency
distribution and a relative frequency distribution.
48 70 60 47 51 55 59 63 68 63 47 53 72 53 67 62 64 70 57 56
48 51 58 63 65 62 49 64 53 59 63 50 61 67 72 56 64 66 49 52