Unit 9 Part 1
Unit-IX - Statistics – I
Statistical Method
As a singular noun, Statistics means the methods used for collection, classification, analysis
and interpretation of numerical observations.
Limitations of Statistics
i) Statistics can be used only to study numerically valued data. Qualitative phenomena like
honesty, intelligence, poverty, etc., are not capable of direct statistical analysis.
ii) Statistics deals only with aggregates and not with individuals.
iii) Statistical laws are true only on an average.
iv) Statistical data collected for a given purpose cannot be applied to every other situation.
v) Statistical data cannot always be compared; they can be compared only when they are homogeneous in
character.
__________________________________________________________________________________
Prepared by D.THIRUMARAN, M.Sc., B.Ed., - 8015461606 C.Mutlur 1
Unit – 9 – Statistics Statistical Method
Collection of Data
Before collecting statistical data, one should clearly define the following:
The purpose of inquiry
Statistical data are collected to draw desired conclusions based on the data
Scope of inquiry
For the investigation of a statistical problem, one has to decide the geographical area to be
covered and the field of inquiry to which it is to be confined.
Sources of information
i) Primary data: statistical data collected first-hand by the investigator, for example by
conducting a survey, are called primary data.
ii) Secondary data: data already collected by others, which may be sufficient for the
investigation, are called secondary data.
Collection of Primary Data
i) Direct personal investigation
ii) Indirect oral investigation
iii) Information through correspondents
iv) By sending questionnaires by mail
v) By sending schedules to be filled in by enumerators.
Statistical Unit
Statistical data cannot be presented or interpreted without a unit. The unit should be simple
and unambiguous. It should also be stable and uniform. The statistical unit can be simple like gram,
kilometer, rupee etc.,
Standard Accuracy
Statistics is a science of estimates. Absolute accuracy is neither possible nor desirable in a
statistical inquiry. Since a statistical inquiry involves large collections of data, a reasonable standard of
accuracy is sufficient.
Classification
1. Qualitatively: If the statistical data collected are numerical facts about qualities like
male, female, employed, Indian, foreigner, etc., the classification is qualitative.
2. Quantitatively: Statistical data classified according to measurable quantities like height,
weight, income, etc., come under the category of quantitative classification.
3. Geographically: Statistical data classified according to different areas like states, districts,
towns, villages, etc., come under the category of geographical classification.
4. Chronologically: Statistical data arranged according to the time of occurrence come under
this classification.
Tabulation
Statistical data collected either through a primary source or a secondary source have to be
classified first. The classified data have to be presented in a tabular form in an orderly way before
analysis and interpretation.
Tabulation is defined as “the orderly or systematic presentation of numerical data in rows
and columns, designed to facilitate the comparison between the figures”
Measures of Location
Average
An average is considered as a typical representation of the whole data.
The various averages in common use are
i) Mean
ii) Median
iii) Mode
There are three types of Mean that are used. They are
i) Arithmetic Mean
ii) Geometric Mean
iii) Harmonic Mean
Arithmetic Mean
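The arithmetic mean of a set of observations is their sum divided by the number of observations.
That is, if $x_1, x_2, \dots, x_n$ are $n$ observations, then
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
For a frequency distribution in which $x_i$ occurs $f_i$ times, $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$.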
Examples
1. Find the AM of the following set of observations
25, 32, 28, 34, 24, 31, 36, 27, 29, 30
Ans
$\bar{x} = \frac{25 + 32 + 28 + 34 + 24 + 31 + 36 + 27 + 29 + 30}{10} = \frac{296}{10} = 29.6$
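A minimal Python sketch of the definition above, applied to the ten observations of Example 1. The function name `arithmetic_mean` is our own illustrative choice, not something from the notes.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by their number."""
    values = list(values)
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


data = [25, 32, 28, 34, 24, 31, 36, 27, 29, 30]
print(arithmetic_mean(data))  # 29.6
```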
Examples
1. There are two branches of an establishment employing 100 and 80 persons respectively. If
the arithmetic means of the monthly salaries paid by two branches are Rs. 275 and Rs.225
respectively, find the arithmetic mean of the salaries of the employees of the establishment
as a whole.
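Ans
$n_1 = 100$, $\bar{x}_1 = 275$; $n_2 = 80$, $\bar{x}_2 = 225$
Let $\bar{x}$ be the average salary of all the employees. Then
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{100 \times 275 + 80 \times 225}{180} = \frac{45500}{180} \approx$ Rs. 252.78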
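The combined-mean formula can be checked with a short Python sketch; `combined_mean` is a hypothetical helper name that takes the group sizes and group means in matching order.

```python
def combined_mean(sizes, means):
    """Mean of the pooled group: sum(n_i * mean_i) / sum(n_i)."""
    total = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / total


overall = combined_mean([100, 80], [275, 225])
print(round(overall, 2))  # 252.78
```

The same helper answers PG-TRB Question 1 below (option b, 252.8).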
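2. The average salary of male employees in a firm was Rs.5,200 and that of females was
Rs.4,200. The mean salary of all the employees was Rs.5,000. Find the percentage of male
and female employees.
Ans
Let $n_1$ and $n_2$ be the numbers of male and female employees, with $\bar{x}_1 = 5200$ and $\bar{x}_2 = 4200$.
Let $\bar{x} = 5000$ be the average salary of all the employees. Then
$5000 = \frac{5200 n_1 + 4200 n_2}{n_1 + n_2} \Rightarrow 200 n_1 = 800 n_2 \Rightarrow n_1 = 4 n_2$
Hence males form $\frac{4}{5} \times 100 = 80\%$ and females form $20\%$ of the employees.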
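Rearranging $\bar{x} = p\,\bar{x}_1 + (1 - p)\,\bar{x}_2$ gives the proportion $p$ of the first group directly. A small Python sketch of that rearrangement follows; `share_of_first_group` is a hypothetical name.

```python
def share_of_first_group(mean_first, mean_second, mean_combined):
    """Fraction p of the first group, from mean_combined = p*mean_first + (1-p)*mean_second."""
    return (mean_combined - mean_second) / (mean_first - mean_second)


p_male = share_of_first_group(5200, 4200, 5000)
print(f"males: {p_male:.0%}, females: {1 - p_male:.0%}")  # males: 80%, females: 20%
```

The figures of PG-TRB Question 6 (Rs.520, Rs.420, Rs.500) give the same 80%.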
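Weighted Mean
If the observations $x_1, x_2, \dots, x_n$ have weights $w_1, w_2, \dots, w_n$, then $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$
Note: The formula for the weighted mean is the same as the formula for the mean of a frequency
distribution, with the frequencies replaced by the weights.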
Example
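1. Find the simple and weighted arithmetic mean of the first $n$ natural numbers, the weights
being the corresponding numbers.
Ans
The first $n$ natural numbers are $1, 2, \dots, n$ and their weights are $1, 2, \dots, n$.
Simple A.M. $= \frac{1 + 2 + \cdots + n}{n} = \frac{n(n+1)/2}{n} = \frac{n+1}{2}$
Weighted A.M. $= \frac{1^2 + 2^2 + \cdots + n^2}{1 + 2 + \cdots + n} = \frac{n(n+1)(2n+1)/6}{n(n+1)/2} = \frac{2n+1}{3}$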
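The two closed forms can be checked numerically with exact fractions. This is a sketch; `weighted_mean` is an illustrative helper, and $n = 10$ is just a sample value.

```python
from fractions import Fraction


def weighted_mean(values, weights):
    """Sum(w_i * x_i) / Sum(w_i), kept exact with Fraction."""
    return Fraction(sum(w * x for x, w in zip(values, weights)), sum(weights))


n = 10
nums = list(range(1, n + 1))
simple = Fraction(sum(nums), n)          # expected (n + 1) / 2
weighted = weighted_mean(nums, nums)     # expected (2n + 1) / 3
print(simple, weighted)  # 11/2 7
```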
PG – TRB – Questions
1. There are two branches of a company employing 100 and 80 persons respectively. The arithmetic
means of the monthly salaries paid by the two branches are Rs.275 and Rs.225 respectively. The A.M.
of the salaries of the employees of the company as a whole is [2002-03]
a) 500 b) 252.8 c) 250 d) 232.5
2. The algebraic sum of the deviations of all the observations taken from their mean is
a) b) c) d) [2004-05]
3. The algebraic sum of the deviations of a set of values from their arithmetic mean is
a) 1 b) 0 c) n+1 d) -1 [2005-06]
4. The mean mark of 100 students was found to be 40. It was found later that a mark 53
was misread as 83. The corrected mean mark is [2005-06]
a) 43 b) 39.7 c) 40.7 d) 35
5. The sum of squares of deviations of a set of values is minimum when taken about the
a) mean b) median c) mode d) origin [2011-12]
6. The average salary of male employees in a firm was Rs.520 and that of females was
Rs.420. The mean salary of all the employees was Rs.500. The percentage of male
employees is [2012-13]
a) 20% b) 60% c) 80% d) 40%
Geometric Mean
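If $x_1, x_2, \dots, x_n$ are $n$ positive observations, then
$G = (x_1 x_2 \cdots x_n)^{1/n}$
(Or)
$G = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$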
Example
1. Calculate the G.M. of the following quantities: 3, 6, 24, 48
Ans
$G = (3 \times 6 \times 24 \times 48)^{1/4} = (20736)^{1/4} = 12$, since $12^4 = 20736$.
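The logarithmic form of the G.M. avoids forming a very large product. A minimal Python sketch, assuming all observations are positive; `geometric_mean` here is our own helper (Python 3.8+ also ships `statistics.geometric_mean`).

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed as the antilog of the mean of the logs."""
    values = list(values)
    if any(v <= 0 for v in values):
        raise ValueError("G.M. is taken here only for positive observations")
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(round(geometric_mean([3, 6, 24, 48]), 6))  # 12.0
```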
Harmonic Mean
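If $x_1, x_2, \dots, x_n$ are $n$ non-zero observations, then
$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
i.e. the H.M. is the reciprocal of the A.M. of the reciprocals of the observations.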
Merits and Demerits of HM
Sl. No. | Merits | Demerits
1. | It is well defined | It is not easy to understand
2. | It is based on all the observations | It cannot be calculated if one of the items is zero
3. | It is suitable for algebraic treatment | --
4. | It can be used in averaging rates, ratios, etc. | --
Example
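1. Calculate the H.M. of the following quantities: 3, 6, 24, 48
Ans
$H = \frac{4}{\frac{1}{3} + \frac{1}{6} + \frac{1}{24} + \frac{1}{48}} = \frac{4}{\frac{16 + 8 + 2 + 1}{48}} = \frac{4 \times 48}{27} = \frac{64}{9} \approx 7.11$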
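Exact fractions reproduce the hand computation without rounding. This is a sketch; `harmonic_mean` is an illustrative name that also enforces the "no zero item" demerit listed in the table.

```python
from fractions import Fraction


def harmonic_mean(values):
    """n divided by the sum of reciprocals; undefined if any item is zero."""
    values = list(values)
    if any(v == 0 for v in values):
        raise ValueError("H.M. cannot be calculated when an item is zero")
    return Fraction(len(values)) / sum(Fraction(1, v) for v in values)


print(harmonic_mean([3, 6, 24, 48]))  # 64/9
```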
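2. An aeroplane covered a distance of 800 miles with four different speeds of 100, 200, 300
and 400 m.p.h. for the first, second, third and fourth quarters of the distance. Find the average
speed in miles per hour.
Ans
Since each speed is maintained over an equal distance (200 miles), the average speed is given by the H.M. of the speeds.
Average speed = H.M. $= \frac{4}{\frac{1}{100} + \frac{1}{200} + \frac{1}{300} + \frac{1}{400}} = \frac{4}{\frac{25}{1200}} = 192$ miles per hour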
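Why the H.M. is the right average here can be seen by computing total distance divided by total time directly. The sketch below assumes, as in the example, that each speed covers an equal share of the distance; the function name is hypothetical.

```python
def average_speed_equal_distances(speeds, total_distance):
    """Total distance / total time when each speed covers an equal share of the distance."""
    leg = total_distance / len(speeds)
    total_time = sum(leg / s for s in speeds)   # time spent on each leg
    return total_distance / total_time


print(round(average_speed_equal_distances([100, 200, 300, 400], 800), 6))  # 192.0
```

The result agrees with the H.M. of the four speeds, which is what the formula predicts.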
Median
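If the observations are arranged in ascending or descending order of magnitude, the middlemost
item is called the median.
The median does not depend on the values of all the items but only on their positions, and
hence it is also called a positional average.
Let there be $n$ observations arranged in order of magnitude.
i) If $n$ is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation.
ii) If $n$ is even, the median is the A.M. of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ observations.
In the case of a Discrete frequency distribution
The same rule is applied with the total frequency $N$ in place of $n$, the required items being located from the cumulative frequencies.
In the case of a Continuous frequency distribution
Median $= l + \frac{\frac{N}{2} - c}{f} \times h$
Where
$l$ = lower limit of the median class (the class whose cumulative frequency first reaches $\frac{N}{2}$)
$N$ = total frequency
$c$ = cumulative frequency of the pre-median class
$f$ = frequency of the median class
$h$ = uniform class interval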
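The continuous-distribution formula translates into a short loop over the classes. This is a sketch under stated assumptions: classes are continuous with a common width $h$, and `grouped_median` is a hypothetical name. It is applied to the wage distribution used in Range Example 2 of these notes.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = l + (N/2 - c) / f * h for a continuous frequency distribution."""
    half = sum(freqs) / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if cum + f >= half:          # first class whose c.f. reaches N/2: the median class
            return l + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")


print(grouped_median([35, 45, 55, 65, 75], [18, 22, 30, 6, 4], 10))  # 55.0
```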
Merits and Demerits of Median
Sl. No. | Merits | Demerits
1. | It is easy to understand | It is not well defined
2. | It is easily determined | It is not based on all the items
3. | It is unaffected by extreme values | It cannot be accurate
4. | It can be used in averaging rates, ratios, etc. | It is not suitable for algebraic treatment
5. | -- | It is affected by sampling fluctuations
Quartiles
Quartiles are positional values similar to the median.
There are three quartiles, denoted $Q_1$, $Q_2$ and $Q_3$.
$Q_1$ - the lower quartile or first quartile
$Q_3$ - the upper quartile or third quartile
Note:
1. 25% of the items are less than $Q_1$ and the other 75% of the items are greater than $Q_1$.
2. 75% of the items are less than $Q_3$ and the other 25% of the items are greater than $Q_3$.
3. The median is called the 2nd quartile, $Q_2$.
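Formulae
For a set of $n$ observations arranged in ascending order of magnitude,
$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ observation
$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}}$ observation
In a frequency distribution
$Q_1 = l_1 + \frac{\frac{N}{4} - c_1}{f_1} \times h$, $Q_3 = l_3 + \frac{\frac{3N}{4} - c_3}{f_3} \times h$
Where,
$l_1, l_3$ = lower limit of the $Q_1$ class and of the $Q_3$ class
$c_1, c_3$ = cumulative frequency of the class preceding the $Q_1$ class and the $Q_3$ class
$f_1, f_3$ = frequency of the $Q_1$ class and of the $Q_3$ class
$h$ = class interval
$N$ = total frequency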
Deciles
Deciles are positional values similar to the median.
There are nine deciles, denoted $D_1, D_2, \dots, D_9$.
$D_1$ - the first decile
10% of the observations are less than $D_1$ and the other 90% are greater than $D_1$. Similarly,
we can define $D_2, D_3, \dots, D_9$.
Examples
1. Find the median of the set of observations 27, 36, 28, 18, 35, 26, 20, 35, 40, 26
Ans
In ascending order: 18, 20, 26, 26, 27, 28, 35, 35, 36, 40
Number of observations = 10 (even)
Median = A.M. of the 5th and 6th observations $= \frac{27 + 28}{2} = 27.5$
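A minimal Python sketch of the odd/even rule for ungrouped data; the helper name `median` is our own, and the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle item of the ordered data; A.M. of the two middle items when n is even."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("no observations")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                      # the ((n + 1) / 2)-th item
    return (xs[mid - 1] + xs[mid]) / 2      # A.M. of the (n/2)-th and (n/2 + 1)-th items


print(median([27, 36, 28, 18, 35, 26, 20, 35, 40, 26]))  # 27.5
```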
2. Find the median of the following frequency distribution
Daily wages in Rs. : 5 10 15 20 25 30
No. of Persons : 7 12 37 25 22 11
Ans
x | f | c.f.
5 | 7 | 7
10 | 12 | 19
15 | 37 | 56
20 | 25 | 81
25 | 22 | 103
30 | 11 | 114
Total | 114 |
This is a discrete frequency distribution with $N = 114$ (even).
The median is given by the A.M. of the two middle observations, the 57th and 58th.
Since the c.f. rises from 56 (at x = 15) to 81 (at x = 20), both these items are 20.
Median $= \frac{20 + 20}{2} =$ Rs. 20
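Locating the middle items from cumulative frequencies can be sketched as below. It assumes the values are listed in ascending order; `discrete_median` is a hypothetical name.

```python
def discrete_median(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    n_total = sum(freqs)

    def item(k):
        """Value of the k-th observation (1-based) in ascending order."""
        cum = 0
        for x, f in zip(values, freqs):
            cum += f
            if cum >= k:
                return x
        raise IndexError(k)

    if n_total % 2 == 1:
        return item((n_total + 1) // 2)
    return (item(n_total // 2) + item(n_total // 2 + 1)) / 2


wages = [5, 10, 15, 20, 25, 30]
persons = [7, 12, 37, 25, 22, 11]
print(discrete_median(wages, persons))  # 20.0
```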
Mode
Mode is the value which occurs most frequently in a set of observations and around which
the other items of the set cluster densely. In other words, mode is the value of the variable which is
predominant in the series.
Example
Find the mode of the following set of observations 3, 5, 7, 5, 9, 7, 5, 7, 6, 3, 9, 5, 6, 6, 3
Solution
5 appears the maximum number of times (four times).
So, the mode is 5.
In the case of a discrete frequency distribution
Mode is the value of x corresponding to the maximum frequency.
Example
Find the mode of the following
X: 1 2 3 4 5 6 7 8
F: 4 9 16 25 22 15 7 3
Solution
The value of x corresponding to the maximum frequency, viz., 25, is 4.
Hence the mode is 4.
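Both rules above are one-liners in Python. This sketch assumes a single mode (for ties, `Counter.most_common` simply returns the value seen first); the helper names are our own.

```python
from collections import Counter


def raw_mode(values):
    """Most frequent value in a list of observations (assumes a unique mode)."""
    return Counter(values).most_common(1)[0][0]


def discrete_mode(values, freqs):
    """Value of x corresponding to the maximum frequency."""
    return max(zip(values, freqs), key=lambda pair: pair[1])[0]


print(raw_mode([3, 5, 7, 5, 9, 7, 5, 7, 6, 3, 9, 5, 6, 6, 3]))                # 5
print(discrete_mode([1, 2, 3, 4, 5, 6, 7, 8], [4, 9, 16, 25, 22, 15, 7, 3]))  # 4
```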
In the case of a Continuous frequency distribution
In this case, mode is obtained by giving weights to the frequencies of the modal class, the
pre-modal class and the post-modal class. It is given by the formula,
Mode $= l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$
Where,
$l$ = lower limit of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the pre-modal class
$f_2$ = frequency of the post-modal class
$h$ = class interval
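A sketch of the interpolation formula, applied for illustration to the wage distribution of Range Example 2 in these notes. Assumptions: equal class widths, the modal class is the one with the highest frequency, and a missing neighbouring class is treated as having frequency 0; `grouped_mode` is a hypothetical name.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for a continuous distribution."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # pre-modal class frequency
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0       # post-modal class frequency
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width


print(grouped_mode([35, 45, 55, 65, 75], [18, 22, 30, 6, 4], 10))  # 57.5
```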
iii) Karl Pearson has given an approximate empirical relation connecting mean, median and
mode: $\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$
Example
If the mean and median of a distribution are 4.6 and 4 respectively, then the mode is
Solution
Mode $= 3\,\text{Median} - 2\,\text{Mean}$
$= 3(4) - 2(4.6)$
$= 12 - 9.2$
$= 2.8$
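The empirical relation is a one-line function; this sketch (hypothetical name `empirical_mode`) reproduces the example. It is only an approximation that holds for moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Karl Pearson's approximate relation: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


print(round(empirical_mode(4.6, 4), 2))  # 2.8
```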
PG – TRB – Questions
1. Mean-M, Median-Md and Mode-M0 of a distribution are related by [2002-03]
a) b)
c) d)
3. The empirical relation between the mean, median and mode is [2003-04]
a) Mode = 3 Median – 2 mean b) Median = 3 Mean – 2 mode
c) Mode = 2 Median – 3 mean d) Mean = 3 Median – 2 mode
5. If the mean and median of a distribution are 4.6 and 4 respectively, then the mode is
Measures of Dispersion
The various measures of dispersion are
i) Range
ii) Quartile deviation
iii) Mean deviation
iv) Standard deviation
Note: The first two are positional measures of dispersion and the last two are measures of
dispersion based on all the observations.
Range
Range is defined to be the difference between the largest (L) and the smallest (S) of the
observations: Range $= L - S$, Coefficient of range $= \frac{L - S}{L + S}$
Example
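1. Find the Range and Coefficient of Range for the following data: 13, 25, 36, 22, 18, 45, 21, 26,
30, 22
Solution
$L = 45$, $S = 13$
Range $= 45 - 13 = 32$
Coefficient of range $= \frac{45 - 13}{45 + 13} = \frac{32}{58} \approx 0.55$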
2. Find the Range and Coefficient of Range for the following data
Wages (in Rs.) 35-45 45-55 55-65 65-75 75-85
No. of workers 18 22 30 6 4
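Solution
For grouped data, $L$ = upper limit of the highest class = 85 and $S$ = lower limit of the lowest class = 35.
Range $= 85 - 35 = 50$
Coefficient of range $= \frac{85 - 35}{85 + 35} = \frac{50}{120} \approx 0.42$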
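Both range examples can be reproduced with one helper. This is a sketch; `range_and_coefficient` is our own name, and for grouped data the class limits are passed in as $L$ and $S$.

```python
def range_and_coefficient(largest, smallest):
    """Range = L - S and coefficient of range = (L - S) / (L + S)."""
    return largest - smallest, (largest - smallest) / (largest + smallest)


data = [13, 25, 36, 22, 18, 45, 21, 26, 30, 22]
r1, c1 = range_and_coefficient(max(data), min(data))
print(r1, round(c1, 4))   # 32 0.5517

# Grouped data: L = upper limit of the highest class, S = lower limit of the lowest class
r2, c2 = range_and_coefficient(85, 35)
print(r2, round(c2, 4))   # 50 0.4167
```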
Quartile Deviation
Arrange the observations in ascending order and consider only the middle 50% of them. The
difference between the lowest and the highest of this middle group is called the inter-quartile range.
Inter-quartile range $= Q_3 - Q_1$
Where $Q_1$ - the lower quartile
$Q_3$ - the upper quartile
Quartile deviation (semi-inter-quartile range) $= \frac{Q_3 - Q_1}{2}$
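A sketch that computes $Q_1$ and $Q_3$ with the grouped-data quartile formulae given earlier in these notes and then the quartile deviation. It assumes continuous classes of equal width; `grouped_partition_value` is a hypothetical helper where `fraction` = 0.25 gives $Q_1$, 0.5 the median and 0.75 gives $Q_3$. The data are the wage distribution of Range Example 2.

```python
def grouped_partition_value(lower_limits, freqs, width, fraction):
    """l + (fraction*N - c) / f * h; fraction=0.25 gives Q1, 0.5 the median, 0.75 gives Q3."""
    target = fraction * sum(freqs)
    cum = 0  # cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if cum + f >= target:
            return l + (target - cum) / f * width
        cum += f
    raise ValueError("fraction must lie in (0, 1]")


lower = [35, 45, 55, 65, 75]
freq = [18, 22, 30, 6, 4]
q1 = grouped_partition_value(lower, freq, 10, 0.25)
q3 = grouped_partition_value(lower, freq, 10, 0.75)
qd = (q3 - q1) / 2   # quartile deviation
print(round(q1, 2), round(q3, 2), round(qd, 2))  # 45.91 61.67 7.88
```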
Mean Deviation
Range and quartile deviation are positional measures of dispersion, in which not all the
observations are taken into account in the calculation.
Now, we consider a measure of dispersion called mean deviation based on all observations.
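Definition
If $x_1, x_2, \dots, x_n$ are $n$ observations and $A$ is an average (mean or median), the mean deviation about $A$ is
$\text{M.D.} = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$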
Example
Consider the observations 3, 5, 6, 7, 9. Their mean is $\bar{x} = \frac{30}{5} = 6$ and the deviations from the mean are -3, -1, 0, 1, 3.
The sum of the deviations of the items from the mean is zero.
Consider the A.M. of the absolute deviations of these observations from their mean:
$\frac{|-3| + |-1| + |0| + |1| + |3|}{5} = \frac{8}{5} = 1.6$
This is called the mean deviation about the mean. It tells us that, on the average, the
observations deviate from the mean by 1.6 units on either side.
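The example can be checked with a short sketch; `mean_deviation` is an illustrative helper that measures deviations from the A.M. unless another average is passed as `about`.

```python
def mean_deviation(values, about=None):
    """A.M. of absolute deviations from `about` (defaults to the A.M. of the data)."""
    values = list(values)
    centre = sum(values) / len(values) if about is None else about
    return sum(abs(x - centre) for x in values) / len(values)


obs = [3, 5, 6, 7, 9]
print(sum(x - 6 for x in obs))   # 0: deviations from the mean cancel
print(mean_deviation(obs))       # 1.6
```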
Merits and Demerits
i) Mean deviation is based on all the observations.
ii) It is simple to understand and easy to calculate
iii) It is not very much affected by the presence of extreme values
iv) It is stable
v) It ignores the signs of the deviations and takes only their numerical values
vi) It is not suited for algebraic treatment
Standard Deviation
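It is defined as the positive square root of the A.M. of the squares of the deviations of the
observations from their A.M. S.D. is denoted by the symbol $\sigma$.
Let $x_1, x_2, \dots, x_n$ be $n$ observations with A.M. $\bar{x}$. Then
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$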
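A direct transcription of the definition, using divisor $n$ as in these notes. This is a sketch with a hypothetical helper name; the standard library's `statistics.pstdev` uses the same divisor, while `statistics.stdev` uses $n - 1$.

```python
import math


def standard_deviation(values):
    """Positive square root of the A.M. of squared deviations from the A.M. (divisor n)."""
    values = list(values)
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


print(standard_deviation([3, 5, 6, 7, 9]))  # 2.0
```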
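Variance
The square of the standard deviation, $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$, is called the variance.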
Result
Coefficient of variation
100 times the coefficient of dispersion based upon standard deviation is called the coefficient of
variation. (or)
C.V. $= \frac{\sigma}{\bar{x}} \times 100$
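A sketch of the C.V. built on the standard library (`statistics.pstdev` and `statistics.fmean`, Python 3.8+); the helper name is our own and the data reuse the mean-deviation example.

```python
import statistics


def coefficient_of_variation(values):
    """C.V. = 100 * sigma / mean, with sigma the divisor-n standard deviation."""
    return 100 * statistics.pstdev(values) / statistics.fmean(values)


print(round(coefficient_of_variation([3, 5, 6, 7, 9]), 2))  # 33.33
```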
1. The standard deviations of two sets containing $n_1$ and $n_2$ members are $\sigma_1$ and $\sigma_2$
respectively, being measured from their respective means $m_1$ and $m_2$. If the two sets
are grouped together as one set of $(n_1 + n_2)$ members, then the standard deviation $\sigma$ of
this set, measured from its mean $m$, is given by [2001]
a) + b)
c) + d) none of these
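The pooled S.D. can be computed as $\sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$ with $d_i = m_i - m$. The sketch below checks this against a direct computation on two hypothetical data sets (the lists `a` and `b` are made up for illustration).

```python
import math
import statistics


def combined_sd(n1, m1, s1, n2, m2, s2):
    """S.D. of the pooled set from group sizes, means and (divisor-n) S.D.s."""
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n            # combined mean
    d1, d2 = m1 - m, m2 - m                # shift of each group mean from m
    return math.sqrt((n1 * (s1**2 + d1**2) + n2 * (s2**2 + d2**2)) / n)


a = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical set 1
b = [10, 12, 14]               # hypothetical set 2
pooled = combined_sd(len(a), statistics.fmean(a), statistics.pstdev(a),
                     len(b), statistics.fmean(b), statistics.pstdev(b))
print(round(pooled, 6), round(statistics.pstdev(a + b), 6))  # the two values agree
```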
Moments
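The $r^{\text{th}}$ moment of a variable about any point $A$ is denoted by $\mu'_r$, where $\mu'_r = \frac{1}{N}\sum f_i (x_i - A)^r$.
When $A = \bar{x}$, it is called the $r^{\text{th}}$ moment about the mean and is denoted by $\mu_r$.
In particular,
i) If $r = 0$ then $\mu_0 = 1$
ii) If $r = 1$ then $\mu_1 = 0$
iii) If $r = 2$ then $\mu_2 = \sigma^2$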
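Relation between Moments about the Mean in terms of Moments about any point and vice versa
$\mu_r = \sum_{k=0}^{r} \binom{r}{k}\,\mu'_{r-k}\,(-\mu'_1)^k$
In particular,
i) If $r = 2$ then $\mu_2 = \mu'_2 - \mu'^2_1$
ii) If $r = 3$ then $\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1$
iii) If $r = 4$ then $\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu'^2_1 - 3\mu'^4_1$
Also
$\mu'_r = \sum_{k=0}^{r} \binom{r}{k}\,\mu_{r-k}\,(\mu'_1)^k$
In particular,
i) If $r = 2$ then $\mu'_2 = \mu_2 + \mu'^2_1$
ii) If $r = 3$ then $\mu'_3 = \mu_3 + 3\mu_2\mu'_1 + \mu'^3_1$
iii) If $r = 4$ then $\mu'_4 = \mu_4 + 4\mu_3\mu'_1 + 6\mu_2\mu'^2_1 + \mu'^4_1$
Here $\mu'_1 = \bar{x} - A$, $\mu'_0 = \mu_0 = 1$ and $\mu_1 = 0$.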
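A numerical check of the relations for $r = 2, 3, 4$: moments are taken about an arbitrary point ($A = 4$, chosen only for illustration) and converted to central moments, which should match the moments computed directly about the mean. The helper names are hypothetical and ungrouped data (all $f_i = 1$) are assumed.

```python
def raw_moments(values, a, max_order=4):
    """mu'_r = (1/n) * sum((x - a)**r) for r = 0..max_order."""
    n = len(values)
    return [sum((x - a) ** r for x in values) / n for r in range(max_order + 1)]


def central_from_raw(mu):
    """mu_2, mu_3, mu_4 from moments mu'_1..mu'_4 about an arbitrary point."""
    m1, m2, m3, m4 = mu[1], mu[2], mu[3], mu[4]
    return (
        m2 - m1**2,
        m3 - 3 * m2 * m1 + 2 * m1**3,
        m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4,
    )


data = [3, 5, 6, 7, 9]
about_a = raw_moments(data, a=4)                            # moments about A = 4
about_mean = raw_moments(data, a=sum(data) / len(data))     # moments about the mean
print(central_from_raw(about_a))
print(about_mean[2:])   # should match the line above
```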
PG – TRB – Questions
1. The relation between the third order central and raw moments is given by [2002-02]
a) b)
c) d)
2. For all distributions, the first moment about the mean is [2004-05]
a) b) c) d)
3. The first moment of a distribution about the point x = 2 is 1. The mean is [2005-06]
a) 0 b) 1 c) 2 d) 3
4. For a symmetrical distribution the standard deviation is 5 and the both fourth
moment about mean is [2005-06]
a) 2050 b) 1875 c) 1250 d) 2500
5. Let $\mu'_r$ denote the $r^{\text{th}}$ moment about any point $a$ and $\mu_r$ denote the $r^{\text{th}}$ moment
about the mean; then [2011-12]
a) b) c) d)
6. If the frequency distribution is continuous and $h$ is the width of the class interval,
then (corrected) due to W.F. Sheppard is [2015]
a) b)
c) d)
Skewness
In a symmetrical distribution, mean, median and mode coincide. If this is not the case, the
distribution is said to be skewed.
Skewness means “lack of symmetry”. Skewness gives an idea about the shape of the curve.
A distribution is said to be skewed if
i) a distribution is not symmetrical,
ii) Mean, median and mode fall at different points, i.e. Mean $\neq$ Median $\neq$ Mode
iii) Quartiles are not equidistant from the median
iv) The curve drawn with the help of the given data is not symmetrical but stretched more
to one side than to the other.
Measures of Skewness
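i) $S_k = \bar{x} - M_o$
ii) $S_k = \frac{\bar{x} - M_o}{\sigma}$
iii) $S_k = \frac{3(\bar{x} - M_d)}{\sigma}$
Where $\bar{x}$ is the mean; $M_d$ is the median; $M_o$ is the mode and $\sigma$ is the standard deviation of the distribution
Note: The measure of skewness based on moments about the mean is given by $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$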
Result
i) In a negatively skewed distribution, Mean < Median < Mode
ii) In a positively skewed distribution, Mean > Median > Mode
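A sketch of Karl Pearson's coefficient in the form $3(\text{Mean} - \text{Median})/\sigma$, applied for illustration to the ten observations of Median Example 1. It uses the standard library and divisor-$n$ $\sigma$; `pearson_skewness` is a hypothetical name.

```python
import statistics


def pearson_skewness(values):
    """Karl Pearson's coefficient 3 * (Mean - Median) / sigma (divisor-n sigma)."""
    mean = statistics.fmean(values)
    med = statistics.median(values)
    sigma = statistics.pstdev(values)
    return 3 * (mean - med) / sigma


sk = pearson_skewness([27, 36, 28, 18, 35, 26, 20, 35, 40, 26])
print(round(sk, 3))
```

Here Mean (29.1) > Median (27.5), so the coefficient comes out positive, consistent with a positively skewed set.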
PG – TRB – Questions
1. Karl Pearson’s coefficient of skewness is [2001]
a) b) c) d)
2. The Karl Pearson’s coefficient of skewness lies between [2011-12]
a) -1 and 1 b) -3 and 3 c) -2 and 2 d) 0 and 2
3. The limits of Karl Pearson’s coefficient of skewness are [2012-13]
a) b) c) d)
Kurtosis
Kurtosis enables us to have an idea about the flatness or peakedness of the frequency
curve. It is measured by the coefficient $\beta_2$ or its derivative $\gamma_2$, given by
$\beta_2 = \frac{\mu_4}{\mu_2^2}$, $\gamma_2 = \beta_2 - 3$
i) The normal curve, which is neither too flat nor too peaked, is called mesokurtic.
ii) The curve which is flatter (more flat-topped) than the normal curve is called platykurtic.
iii) The curve which is more peaked than the normal curve is called leptokurtic.
In particular,
i) If $\beta_2 = 3$, the curve is called mesokurtic
ii) If $\beta_2 > 3$, the curve is called leptokurtic
iii) If $\beta_2 < 3$, the curve is called platykurtic
Equivalently, in terms of $\gamma_2$,
i) If $\gamma_2 = 0$, the curve is called mesokurtic
ii) If $\gamma_2 > 0$, the curve is called leptokurtic
iii) If $\gamma_2 < 0$, the curve is called platykurtic
Result
1. For the normal distribution $\beta_2 = 3$, and it is taken as the standard distribution to measure
kurtosis.
3. The standard deviation of a distribution is 5. The value of the fourth central moment ($\mu_4$), in
order that the distribution be mesokurtic, should be equal to 1875, since $\mu_4 = \beta_2\sigma^4 = 3 \times 5^4 = 1875$.
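A sketch of $\beta_2$ from moments about the mean (divisor $n$), together with the $\mu_4 = 3\sigma^4$ condition for a mesokurtic curve. The helper names are hypothetical; the sample data reuse the mean-deviation example.

```python
def beta2(values):
    """beta_2 = mu_4 / mu_2**2 using moments about the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((x - mean) ** 2 for x in values) / n
    mu4 = sum((x - mean) ** 4 for x in values) / n
    return mu4 / mu2**2


def mesokurtic_mu4(sigma):
    """Fourth central moment needed for beta_2 = 3: mu_4 = 3 * sigma**4."""
    return 3 * sigma**4


print(mesokurtic_mu4(5))          # 1875
print(beta2([3, 5, 6, 7, 9]))     # about 2.05, below 3, so platykurtic
```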
PG – TRB – Questions