Statistics For MGMT
L = Largest value
S = Smallest value
K = the no. of classes
Ex:
If the marks of 60 students in a class vary between 40 and 100 and we want
to form 6 classes, the class interval would be
i = (L - S) / K = (100 - 40) / 6 = 60 / 6 = 10
where L = 100, S = 40 and K = 6.
Therefore, the class intervals would be 40 - 50, 50 - 60, 60 - 70, 70 - 80, 80 - 90
and 90 - 100.
Methods of forming class-interval
a) Exclusive method (overlapping)
In this method, the upper limit of one class interval is the lower limit of the next
class. This method maintains the continuity of the data.
Ex:
Marks        No. of students
20 - 30      5
30 - 40      15
40 - 50      25
A student whose mark is between 20 and 29.9 will be included in the 20 - 30
class.
A better way of expressing this is
Marks                            No. of students
20 to less than 30               5
(20 or more but less than 30)
30 to less than 40               15
40 to less than 50               25
Total Students 50
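b) Inclusive method (non-overlapping)
In this method, both the lower limit and the upper limit of a class are included in
that class itself.
Ex:
Marks        No. of students
20 - 29      5
30 - 39      15
40 - 49      25
A student whose mark is 29 is included in the 20 - 29 class interval and a student
whose mark is 39 is included in the 30 - 39 class interval.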
Class Frequency
The number of observations falling within a class interval is called its class
frequency.
Ex: A class frequency of 5 for the class 90 - 100 means that 5 students scored
between 90 and 100. If we add the frequencies of all the individual classes, the total
frequency gives the total number of items studied.
Magnitude of class interval
The magnitude of the class interval depends on the range and the number of classes. The
range is the difference between the highest and smallest values in the data series. A
class interval is generally a multiple of 5, 10, 15 or 20.
Sturges' formula to find the number of classes is given below
K = 1 + 3.322 log N
K = No. of classes
log N = Logarithm (to base 10) of the total no. of observations
Ex: If the total number of observations is 100, then the number of classes could be
K = 1 + 3.322 log 100
K = 1 + 3.322 x 2
K = 1 + 6.644
K = 7.644, i.e. 8 (rounded up)
NOTE: Under this formula the number of classes should not be less than 4 or greater than 20.
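The rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name sturges_classes is an assumption made for this example, and the result is rounded up to the next whole number, as in the worked example.

```python
import math

def sturges_classes(n_observations: int) -> int:
    """Number of classes K = 1 + 3.322 * log10(N), rounded up."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    k = 1 + 3.322 * math.log10(n_observations)
    return math.ceil(k)

# N = 100 gives K = 1 + 3.322 * 2 = 7.644, which rounds up to 8
print(sturges_classes(100))
```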
Class mid point or class marks
The mid value or central value of a class interval is called its mid point.
Mid point of a class = (lower limit of class + upper limit of class) / 2
Sturges' formula to find the size of class interval
Size of class interval (h) = Range / (1 + 3.322 log N)
Ex: In a group of 50 workers, the highest wage is Rs. 250 and the lowest wage is Rs. 100 per day.
Find the size of the interval.
h = Range / (1 + 3.322 log N)
= (250 - 100) / (1 + 3.322 log 50)
= 150 / 6.644
= 22.58, i.e. about 23
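As a quick check of the arithmetic, the same formula can be evaluated in Python. This is a minimal sketch; the function name class_width is hypothetical, and log is taken to base 10.

```python
import math

def class_width(highest: float, lowest: float, n_observations: int) -> float:
    """Size of class interval h = Range / (1 + 3.322 * log10(N))."""
    value_range = highest - lowest
    return value_range / (1 + 3.322 * math.log10(n_observations))

# Wages of 50 workers ranging from Rs. 100 to Rs. 250 per day
h = class_width(250, 100, 50)
print(round(h, 2))  # 150 / 6.644, roughly 22.58
```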
Constructing a frequency distribution
The following guidelines may be considered for the construction of a frequency
distribution.
a) The classes should be clearly defined and each observation must belong to one
and only one class interval. The class intervals must be all-inclusive and non-overlapping.
b) The number of classes should be neither too large nor too small.
Too few classes result in a greater interval width with loss of accuracy. Too many
class intervals result in complexity.
c) All intervals should be of the same width. This is preferred for easy
computations.
The width of interval = Range / Number of classes
d) Open-end classes should be avoided since they create difficulty in analysis and
interpretation.
e) Intervals should be continuous throughout the distribution. This is important for
a continuous distribution.
f) The lower limits of the class intervals should be simple multiples of the interval.
Ex: The weights of a sample of 30 students of a particular class are as follows.
Construct a frequency distribution for the given data.
62 58 58 52 48 53 54 63 69 63
57 56 46 48 53 56 57 59 58 53
52 56 57 52 52 53 54 58 61 63
Steps of construction
Step 1
Find the range of the data.
(H) Highest value = 69
(L) Lowest value = 46
Range = H - L = 69 - 46 = 23
Step 2
Find the number of class intervals using Sturges' formula.
K = 1 + 3.322 log N
K = 1 + 3.322 log 30
K = 5.91, say K = 6
No. of classes = 6
Step 3
Find the width of the class interval.
Width of class interval = Range / Number of classes = 23 / 6 = 3.833, i.e. about 4
Step 4
Count the observations belonging to each class interval and assign this total
frequency to the corresponding class interval as follows. A value equal to the upper
limit of a class is counted in the next class.
Class interval   Tally bars    Frequency
46 - 50          |||           3
50 - 54          ||||| |||     8
54 - 58          ||||| |||     8
58 - 62          ||||| |       6
62 - 66          ||||          4
66 - 70          |             1
Total                          30
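The counting in Step 4 can be reproduced with a short Python sketch. The function name frequency_table and its parameters are assumptions made for this illustration; it uses the exclusive method, so a value equal to an upper limit falls in the next class.

```python
weights = [62, 58, 58, 52, 48, 53, 54, 63, 69, 63,
           57, 56, 46, 48, 53, 56, 57, 59, 58, 53,
           52, 56, 57, 52, 52, 53, 54, 58, 61, 63]

def frequency_table(data, start, width, n_classes):
    """Count observations per class [lower, upper) using the exclusive method."""
    counts = [0] * n_classes
    for x in data:
        index = (x - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} falls outside the class intervals")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_classes)]

# Lowest value 46, width 4, 6 classes (from Steps 1 to 3)
table = frequency_table(weights, start=46, width=4, n_classes=6)
for lower, upper, f in table:
    print(f"{lower} - {upper}  {'|' * f:<10}{f}")
```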
Cumulative frequency distribution
A cumulative frequency distribution indicates directly the number of units that
lie above or below specified values of the class intervals. When the interest of the
investigator is in the number of cases below a specified value, the specified value
represents the upper limit of the class interval. This is known as a less than cumulative
frequency distribution. When the interest lies in finding the number of cases above a
specified value, this value is taken as the lower limit of the class interval.
This is known as a more than cumulative frequency distribution.
The cumulative frequency is obtained simply by summing up the consecutive
frequencies.
Ex:
Marks      No. of students   Less than cumulative frequency
0 - 10     5                 5
10 - 20    3                 8
20 - 30    10                18
30 - 40    20                38
40 - 50    12                50
In the above less than cumulative frequency distribution, 5 students scored
less than 10, 8 less than 20, 18 less than 30 and so on.
Similarly, the following table shows the more than (greater than) cumulative frequency
distribution.
Ex:
Marks      No. of students   More than cumulative frequency
0 - 10     5                 50
10 - 20    3                 45
20 - 30    10                42
30 - 40    20                32
40 - 50    12                12
In the above more than cumulative frequency distribution, 50 students
scored 0 or more, 45 scored 10 or more, 42 scored 20 or more and so on.
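Both cumulative columns are running totals of the class frequencies, one taken from the top and one from the bottom. A minimal Python sketch, using the frequencies of the example above (the variable names are chosen only for this illustration):

```python
from itertools import accumulate

frequencies = [5, 3, 10, 20, 12]  # classes 0-10, 10-20, ..., 40-50

# Less than: running total from the first class downwards
less_than = list(accumulate(frequencies))

# More than: running total from the last class upwards
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)
print(more_than)
```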
Diagrammatic and Graphic Representation
The data collected can be presented graphically or pictorially for easy
understanding and quick interpretation. Diagrams and graphs give visual
indications of magnitudes, groupings, trends and patterns in the data. These features
can be presented more simply in a graphical manner. Diagrams and graphs also help
in the comparison of variables.
Diagrammatic presentation
A diagram is a visual form for the presentation of statistical data. The term diagram
refers to various types of devices such as bars, circles, maps, pictorials and cartograms.
Importance of Diagrams
1. They are simple, attractive and easily understandable
2. They give quick information
3. They help to compare the variables
4. Diagrams are more suitable to illustrate discrete data
5. They have a more lasting effect on the reader's mind.
Limitations of diagrams
1. Diagrams show only approximate values
2. Diagrams are not suitable for further analysis
3. Some diagrams (multidimensional ones) can be understood only by experts
4. Details cannot be provided fully
5. They are useful only for comparison
General Rules for drawing the diagrams
i) Each diagram should have a suitable title, at the top or bottom, indicating the
theme of the diagram.
ii) The size of the diagram should emphasize the important characteristics of the data.
iii) An appropriate proportion should be maintained between the length and breadth of the
diagram.
iv) A proper / suitable scale is to be adopted for the diagram.
v) Selection of an appropriate diagram is important; a wrong selection may
mislead the reader.
vi) The source of the data should be mentioned at the bottom.
vii) The diagram should be simple and attractive.
viii) The diagram should be effective rather than complex.
Some important types of diagrams
a) One dimensional diagrams (line and bar)
b) Two-dimensional diagram (rectangle, square, circle)
c) Three dimensional diagram (cube, sphere, cylinder etc.)
d) pictogram
e) Cartogram
a) One dimensional diagrams (line and bar)
In one dimensional diagrams, only the length of the bars or lines is taken into
account. The width of the bars is not considered. One dimensional diagrams are classified mainly as
follows.
i) Line diagram
ii) Bar diagram
- Vertical bar diagram
- Horizontal bar diagram
- Multiple (compound) bar diagram
- Sub-divided (component) bar diagram
- Percentage subdivided bar diagram
i) Line diagram
This is the simplest type of one dimensional diagram. The heights of the lines are
drawn on the basis of the size of the figures. The distance between the lines is kept
uniform. The limitations of this diagram are that it is not attractive and cannot provide more than
one piece of information.
Ex: Draw the line diagram for the following data
Year                                                      2001  2002  2003  2004  2005  2006
No. of students passed in first class with distinction       5     7    12     5    13    15
[Figure: line diagram. x-axis: Year (2001 to 2006); y-axis: No. of students passed in FCD; line heights 5, 7, 12, 5, 13 and 15.]
Indication of diagram: The number of students passing in first class with distinction (FCD) is highest in 2006 and lowest in 2001 and 2004.
ii) Simple bar diagram
A simple bar diagram can be drawn using horizontal or vertical bars. It is a very
common diagram in business and economics.
Vertical bar diagram
The annual expenses of maintaining cars of various types are given below.
Draw the vertical bar diagram. The annual expenses of maintenance include (fuel +
maintenance + repair + assistance + insurance).
Type of the car    Expense in Rs. / Year
Maruthi Udyog      47533
Hyundai            59230
Tata Motors        63270
Source: 2005 TNS TCS Study
Published at: Vijaya Karnataka, dated: 03.08.2006
[Figure: vertical bar diagram. x-axis: type of car (Maruthi Udyog, Hyundai, Tata Motors); y-axis: expense in Rs. / year (30000 to 70000); bar heights 47533, 59230 and 63270.]
Indication of diagram
a) The annual expenses of the Maruthi Udyog brand car are comparatively lower than
those of the other brands depicted.
b) The high annual expenses of the Tata Motors brand car can be seen from the diagram.
Horizontal bar diagram
Data on the world's top 10 steel makers are given below. Draw the horizontal
bar diagram.
Steel maker       Prodn. in million tonnes
Arcelor Mittal    110
Nippon            32
POSCO             31
JFE               30
BAO Steel         24
US Steel          20
NUCOR             18
RIVA              18
Thyssen-krupp     17
Tangshan          16
[Figure: horizontal bar diagram. y-axis: Top-10 Steel Makers; x-axis: Production of Steel (Million Tonnes), 0 to 120.]
Source: ISSB Published by India Today
Compound bar diagram (Multiple bar diagram)
Multiple bar diagrams are used to provide more information than a simple bar
diagram. A multiple bar diagram represents more than one phenomenon and is highly useful
for direct comparison. The bars are drawn side by side, and different colours, shades or
hatches can be used to indicate each variable.
Ex: Draw the bar diagram for the following data. Resale values of the cars (Rs. 000) are
as follows.
Year (Model)   Santro   Zen   Wagonr
2003           208      252   248
2004           240      278   274
2005           261      296   302
[Figure: multiple bar diagram. x-axis: Model of Car (2003, 2004, 2005); y-axis: Value in Rs. 000 (0 to 350); side-by-side bars for Santro, Zen and Wagonr.]
Source: True value used car purchase data
Published by: Vijaya Karnataka, dated: 03.08.2006
Ex: Represent the following in a suitable diagram
Class     A      B      C
Male      1000   1500   1500
Female    500    800    1000
Total     1500   2300   2500
[Figure: sub-divided bar diagram. x-axis: Class (A, B, C); y-axis: Population (in Nos.), 0 to 2500; each bar is divided into Male and Female components, with totals 1500, 2300 and 2500.]
Ex: Draw a suitable diagram for the following data
Mode of         Investment in 2004 in Rs.    Investment in 2005 in Rs.
investment      Investment    %age           Investment    %age
NSC             25000         43.10          30000         45.45
MIS             15000         25.86          10000         15.15
Mutual Fund     15000         25.86          25000         37.88
LIC             3000          5.17           1000          1.52
Total           58000         100            66000         100
[Figure: percentage sub-divided bar diagram. x-axis: Year (2004, 2005); y-axis: % of Investment (0 to 100); each bar is divided into NSC, MIS, Mutual Fund and LIC components.]
Two-dimensional diagram
In a two-dimensional diagram both the breadth and the length of the diagram are
considered, since the area of the diagram represents the data. The important two-
dimensional diagrams are
a) Rectangular diagram
b) Square diagram
a) Rectangular diagram
Rectangular diagrams are used to depict two or more variables. This diagram
helps in direct comparison. The areas of the rectangles are kept in proportion to the
values. It may be of two types.
i) Percentage sub-divided rectangular diagram
ii) Sub-divided rectangular diagram
In the former case, the widths of the rectangles are proportional to the values; the
various components of the values are converted into percentages and the rectangles are
divided according to them. The latter case is used to show related phenomena
like cost per unit, quality of production etc.
Ex: Draw the rectangle diagram for the following data
Item of expenditure    Expenditure in Rs.
                       Family A    Family B
Provisional stores     1000        2000
Education              250         500
Electricity            300         700
House Rent             1500        2800
Vehicle Fuel           450         1000
Total                  3500        7000
The total expenditure is taken as 100 and the expenditure on individual items
is expressed as a percentage. The widths of the two rectangles are in proportion to the total
expenses of the two families i.e. 3500 : 7000 or 1 : 2. The heights of the rectangles are divided
according to the percentage of expenses.
Item of expenditure    Monthly expenditure
                       Family A (Rs. 3500)    Family B (Rs. 7000)
                       Rs.      %age          Rs.      %age
Provisional stores     1000     28.57         2000     28.57
Education              250      7.14          500      7.14
Electricity            300      8.57          700      10.00
House Rent             1500     42.86         2800     40.00
Vehicle Fuel           450      12.86         1000     14.29
Total                  3500     100           7000     100
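The percentage column can be checked with a few lines of Python. The sketch below uses the Family B figures from the table; the variable names are assumptions made only for this illustration.

```python
family_b = {
    "Provisional stores": 2000,
    "Education": 500,
    "Electricity": 700,
    "House Rent": 2800,
    "Vehicle Fuel": 1000,
}

total_b = sum(family_b.values())

# Each item as a percentage of the family's total expenditure
percentages = {item: round(100 * amount / total_b, 2)
               for item, amount in family_b.items()}

for item, pct in percentages.items():
    print(f"{item:<20}{pct:>6.2f}")
```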
[Figure: percentage sub-divided rectangular diagram. x-axis: Family (A, B); y-axis: % of Expenditure (0 to 100); components: Provisional Stores, Education, Electricity, House Rent, Vehicle Fuel.]
b) Square diagram
To draw square diagrams, the square root of the value of each
item to be shown is taken. A suitable scale may be used to depict the diagram. The ratios
between the sides are to be maintained while drawing the squares.
Ex: Draw the square diagram for the following data
4900 2500 1600
Solution: The square roots of the items are found to be 70, 50 and 40; dividing each by 10,
we get sides of 7, 5 and 4.
[Figure: square diagram. Three squares with sides 7, 5 and 4 representing the values 4900, 2500 and 1600.]
Pie diagram
A pie diagram helps us to show the partitioning of a total into its component parts.
It is used to show classes or groups of data in proportion to the whole data set. The entire
pie represents all the data, while each slice represents a different class or group within
the whole. The following illustration shows the construction of a pie diagram.
Ex: Revenue collections by the government from petroleum products for the year
2005-2006, in Rs. crores, are as follows. Draw the pie diagram.
Customs                       9600
Excise                        49300
Corporate Tax and dividend    18900
States taking                 48800
Total                         126600
Solution:
Item / Source                 Value in crores   Angle of circle                      %age
Customs                       9600              (9600 / 126600) x 360 = 27.3°        7.58
Excise                        49300             (49300 / 126600) x 360 = 140.2°      38.94
Corporate Tax and Dividend    18900             (18900 / 126600) x 360 = 53.7°       14.93
States taking                 48800             (48800 / 126600) x 360 = 138.8°      38.55
Total                         126600            360°                                 100
[Figure: pie diagram with slices for Customs, Excise, Corporate Tax and Dividend, and States taking.]
Source: India Today 19 June, 2006
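The angle of each slice is the item's share of the total multiplied by 360 degrees. A minimal Python sketch of that calculation, using the revenue figures above (the dictionary and variable names are chosen only for this illustration):

```python
revenue = {
    "Customs": 9600,
    "Excise": 49300,
    "Corporate Tax and dividend": 18900,
    "States taking": 48800,
}

total = sum(revenue.values())

# Angle (degrees) and percentage share of each slice
angles = {item: 360 * value / total for item, value in revenue.items()}
shares = {item: 100 * value / total for item, value in revenue.items()}

for item in revenue:
    print(f"{item:<28}{angles[item]:>7.1f} deg {shares[item]:>7.2f} %")
```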
Choice or selection of diagram
There are many methods of depicting statistical data through diagrams. No single
diagram is suited for all purposes. The choice of a diagram to suit a given set of
data requires skill, knowledge and experience. Primarily, the choice depends upon the
nature of the data, the purpose of presentation and the audience for whom it is meant.
The nature of the data will help in deciding between a one-dimensional, two-dimensional
or three-dimensional diagram.
The following points are to be kept in mind for the choice of diagram.
1. For the common man, who has less knowledge of statistics, cartograms and
pictograms are suited.
2. To present the components apart from the magnitude of values, a sub-divided bar
diagram can be used.
3. When a large number of components are to be shown, a pie diagram is suitable.
Graphic presentation
A graphic presentation is a visual form of presentation. Graphs are drawn on a
special type of paper known as graph paper.
Common graphic representations are
a) Histogram
b) Frequency polygon
c) Cumulative frequency curve (ogive)
Advantages of graphic presentation
1. It provides an attractive and impressive view
2. It simplifies the complexity of data
3. It helps in direct comparison
4. It helps in further statistical analysis
5. It is the simplest method of presentation of data
6. It shows the trend and pattern of data
Difference between graph and diagram
Diagram                                        Graph
1. Ordinary paper can be used                  1. Graph paper is required
2. It is attractive and easily                 2. Needs some effort to understand
   understandable
3. It is appropriate and effective for         3. It creates problems when more
   presenting more than one variable              variables are to be shown
4. It cannot be used for further analysis      4. Can be used for further analysis
5. It gives comparison                         5. It shows the relationship between
                                                  variables
6. Data are represented by bars and            6. Points and lines are used to represent
   rectangles                                     data
Frequency Histogram
In this type of representation the given data are plotted in the form of a series of
rectangles. Class intervals are marked along the x-axis and the frequencies along
the y-axis according to a suitable scale. Unlike the bar chart, which is one-dimensional, a
histogram is two-dimensional: both the length and the width are important. A
histogram is constructed from a frequency distribution of grouped data, where the
height of each rectangle is proportional to the respective frequency and the width represents the
class interval. Each rectangle adjoins the next; a blank space between
rectangles means that the category is empty and there are no values in that class
interval.
Ex: Construct a histogram for the following data.
Marks obtained (x)   No. of students (f)   Mid point
15 - 25              5                     20
25 - 35              3                     30
35 - 45              7                     40
45 - 55              5                     50
55 - 65              3                     60
65 - 75              7                     70
Total                30
For convenience, the frequency distribution is presented along with the
mid-point of each class interval, where the mid-point is simply the average of the
lower and upper boundaries of the class interval.
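The mid-points in the table, and a rough text version of the histogram, can be produced with a short Python sketch. The list name classes and the use of '#' marks to stand for rectangle heights are assumptions made for this illustration only.

```python
# (lower boundary, upper boundary, frequency) for each class
classes = [(15, 25, 5), (25, 35, 3), (35, 45, 7),
           (45, 55, 5), (55, 65, 3), (65, 75, 7)]

# Mid-point = average of the lower and upper boundaries
mid_points = [(lower + upper) / 2 for lower, upper, _ in classes]

# One row of '#' marks per class, proportional to its frequency
for (lower, upper, f), mid in zip(classes, mid_points):
    print(f"{lower} - {upper}  mid {mid:g}  {'#' * f}")
```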
[Figure: histogram. x-axis: Class Interval (Marks), 15 to 75; y-axis: Frequency (No. of students), 0 to 7.]
Frequency polygon
A frequency polygon is a line chart of a frequency distribution in which either the
values of a discrete variable or the mid-points of the class intervals are plotted against the
frequencies, and the plotted points are joined by straight lines. Since the
frequencies do not start or end at zero, this diagram as such would not touch the
horizontal axis. However, the area under the entire curve is the same as that of the
histogram, which is 100%, so the curve must be enclosed. The first mid-point is therefore
joined to a fictitious preceding mid-point whose frequency is zero, so that the beginning
of the curve touches the horizontal axis, and the last mid-point is joined to a fictitious
succeeding mid-point whose frequency is also zero, so that the curve ends at the horizontal
axis. This enclosed diagram is known as a frequency polygon.
Ex: Construct a frequency polygon for the following data.
Marks (CI)   No. of frequencies (f)   Mid-point
15 - 25      5                        20
25 - 35      3                        30
35 - 45      7                        40
45 - 55      5                        50
55 - 65      3                        60
65 - 75      7                        70
[Figure: a frequency polygon. x-axis: Mid point (x), 0 to 100; y-axis: Frequency, 0 to 10.]
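The points to be joined, including the two fictitious mid-points with zero frequency that close the polygon, can be listed with a small Python sketch. It assumes equal class widths, and the variable names are chosen only for this example.

```python
mid_points = [20, 30, 40, 50, 60, 70]
frequencies = [5, 3, 7, 5, 3, 7]

# Equal class widths are assumed, so neighbouring mid-points are one width apart
width = mid_points[1] - mid_points[0]

# Add a fictitious mid-point with zero frequency at each end to close the polygon
polygon = ([(mid_points[0] - width, 0)]
           + list(zip(mid_points, frequencies))
           + [(mid_points[-1] + width, 0)])

print(polygon)
```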
Cumulative frequency curve (ogive)
Ogives are the graphic representations of a cumulative frequency distribution.
They are classified as less than and more than ogives. In the case of the less than ogive,
cumulative frequencies are plotted against the upper boundaries of their respective class
intervals. In the case of the more than ogive, cumulative frequencies are plotted against the lower
boundaries of their respective class intervals. Ogives are used for comparison
purposes. Several ogives can be compared on the same grid, with different colours for easier
visualisation and differentiation.
Ex:
Marks (CI)   No. of frequencies (f)   Mid-point   Cum. Freq. Less than   Cum. Freq. More than
15 - 25      5                        20          5                      30
25 - 35      3                        30          8                      25
35 - 45      7                        40          15                     22
45 - 55      5                        50          20                     15
55 - 65      3                        60          23                     10
65 - 75      7                        70          30                     7
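The points plotted on the two ogives follow directly from this table: less than values pair with upper boundaries and more than values with lower boundaries. A hedged Python sketch (the names less_points and more_points are assumptions made for this illustration):

```python
from itertools import accumulate

classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
frequencies = [5, 3, 7, 5, 3, 7]

less_cum = list(accumulate(frequencies))
more_cum = list(accumulate(reversed(frequencies)))[::-1]

# Less than ogive: cumulative frequency against the upper boundary
less_points = [(upper, cf) for (_, upper), cf in zip(classes, less_cum)]

# More than ogive: cumulative frequency against the lower boundary
more_points = [(lower, cf) for (lower, _), cf in zip(classes, more_cum)]

print(less_points)
print(more_points)
```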
Less than ogive diagram
[Figure: 'Less than' ogive. x-axis: Upper Boundary (CI); y-axis: Less than Cumulative Frequency, 5 to 30.]
More than ogive diagram
[Figure: 'More than' ogive. x-axis: Lower Boundary (CI); y-axis: More than Cumulative Frequency, 10 to 35.]
LESSON 1
STATISTICS FOR MANAGEMENT
Session 2 Duration: 1 hr
Classification and Tabulation
The data collected for the purpose of a statistical inquiry sometimes consist of
a few fairly simple figures, which can be easily understood without any special
treatment. But more often there is an overwhelming mass of raw data without any
structure. Such an unwieldy, unorganised and shapeless mass of collected data is not capable
of being rapidly or easily assimilated or interpreted. Unorganised data are not fit for
further analysis and interpretation. In order to make the data simple and easily
understandable, the first task is to condense and simplify them in such a way that
irrelevant data are removed and their significant features stand out prominently.
The procedure adopted for this purpose is known as the method of classification and
tabulation. Classification helps proper tabulation.
"Classified and arranged facts speak for themselves; unarranged, unorganised they
are dead as mutton."
- Prof. J.R. Hicks
Meaning of Classification
Classification is a process of arranging things or data in groups or classes
according to their resemblances and affinities, giving expression to the unity of
attributes that may subsist among a diversity of individuals.
Definition of Classification
Classification is the process of arranging data into sequences and groups
according to their common characteristics or separating them into different but related
parts.
- Secrist
The process of grouping large number of individual facts and observations on
the basis of similarity among the items is called classification.
- Stockton & Clark
Characteristics of classification
a) Classification performs homogeneous grouping of data
b) It brings out points of similarity and dissimilarity.
c) The classification may be either real or imaginary
d) Classification is flexible to accommodate adjustments
Objectives / purposes of classifications
i) To simplify and condense the large data
ii) To present the facts in an easily understandable form
iii) To allow comparisons
iv) To help to draw valid inferences
v) To relate the variables among the data
vi) To help further analysis
vii) To eliminate unwanted data
viii) To prepare tabulation
Guiding principles (rules) of classifications
Following are the general guiding principles for good classification
a) Exhaustive: Classification should be exhaustive. Each and every item
in the data must belong to one of the classes. Introduction of a residual class (i.e.
others, miscellaneous etc.) should be avoided.
b) Mutually exclusive: Each item should be placed in only one class.
c) Suitability: The classification should conform to the object of inquiry.
d) Stability: Only one principle must be maintained throughout the
classification and analysis.
e) Homogeneity: The items included in each class must be homogeneous.
f) Flexibility: A good classification should be flexible enough to
accommodate new or changed situations.
Modes / Types of Classification
Modes / Types of classification refer to the class categories into which the data
can be sorted and tabulated. These categories depend on the nature of the data and the
purpose for which the data are being sought.
Important types of classification
a) Geographical (i.e. on the basis of area or region)
b) Chronological (on a temporal / historical basis, i.e. with respect to time)
c) Qualitative (on the basis of character / attributes)
d) Numerical, quantitative (on the basis of magnitude)
a) Geographical Classification
In geographical classification, the classification is based on geographical
regions.
Ex: Sales of the company (in Million Rupees) (region wise)
Region   Sales
North    285
South    300
East     185
West     235
b) Chronological Classification
If the statistical data are classified according to the time of their occurrence, the
type of classification is called chronological classification.
Ex: Sales reported by a departmental store
Month      Sales (Rs.) in lakhs
January    22
February   26
March      32
April      25
May        27
June       30
c) Qualitative Classification
In qualitative classification, the data are classified according to the presence or
absence of attributes in the given units. Thus, the classification is based on some quality
characteristics / attributes.
Ex: Sex, Literacy, Education, Class grade etc.
Further, it may be classified as
i) Simple classification ii) Manifold classification
i) Simple classification: If the classification is done into only two classes then the
classification is known as simple classification.
Ex: a) Population into Male / Female
b) Population into Educated / Uneducated
ii) Manifold classification: In this classification, the classification is based on more
than one attribute at a time.
Ex:
Population
    Smokers
        Literate:   Male, Female
        Illiterate: Male, Female
    Non-smokers
        Literate:   Male, Female
        Illiterate: Male, Female
d) Quantitative Classification: In quantitative classification, the classification is
based on quantitative measurements of some characteristic, such as age, marks,
income, production, sales etc. The quantitative phenomenon under study is known
as a variable and hence this classification is also called classification by variable.
Ex:
For a 50 marks test, the marks obtained by students are classified as follows
Marks      No. of students
0 - 10     5
10 - 20    7
20 - 30    10
30 - 40    25
40 - 50    3
Total Students = 50
In this classification the marks obtained by students is the variable and the number of
students in each class represents the frequency.
Tabulation
Meaning and Definition of Tabulation
Tabulation may be defined as the systematic arrangement of data in columns and
rows. It is designed to simplify the presentation of data for the purpose of analysis and
statistical inference.
Major Objectives of Tabulation
1. To simplify the complex data
2. To facilitate comparison
3. To economize the space
4. To draw valid inferences / conclusions
5. To help for further analysis
Differences between Classification and Tabulation
1. Data are first classified and then presented in tables; classification is the basis for
tabulation.
2. Tabulation is a mechanical function of classification because in tabulation
classified data are placed in rows and columns.
3. Classification is a process of statistical analysis while tabulation is a process of
presenting data in a suitable structure.
Classification of tables
Classification is done based on
1. Coverage (Simple and complex table)
2. Objective / purpose (General purpose / Reference table / Special table or
summary table)
3. Nature of inquiry (primary and derived table).
Ex:
a) Simple table: Data are classified based on only one characteristic
Distribution of marks
Class Marks   No. of students
30 - 40       20
40 - 50       20
50 - 60       10
Total         50
b) Two-way table: Classification is based on two characteristics
Class Marks   No. of students
              Boys   Girls   Total
30 - 40       10     10      20
40 - 50       15     5       20
50 - 60       3      7       10
Total         28     22      50
Frequency Distribution
A frequency distribution is a table used to organize the data. The left column
(called classes or groups) includes numerical intervals of the variable under study. The
right column contains the list of frequencies, or number of occurrences, of each
class/group. Intervals are normally of equal size and cover the range of the sample
observations.
It is simply a table in which the gathered data are grouped into classes and the
number of occurrences which fall in each class is recorded.
Definition
A frequency distribution is a statistical table which shows the set of all distinct
values of the variable arranged in order of magnitude, either individually or in groups
with their corresponding frequencies.
- Croxton and Cowden
A frequency distribution can be classified as
a) Series of individual observation
b) Discrete frequency distribution
c) Continuous frequency distribution
a) Series of individual observation
A series of individual observations is a series where the items are listed one after
the other, one for each observation. For statistical calculations, these observations can be arranged in
either ascending or descending order. This is called an array.
Ex:
Roll No.   Marks obtained in statistics paper
1          83
2          80
3          75
4          92
5          65
The above list is raw data. Presented in this form, the data
do not readily reveal any information. Arranging the data in ascending or descending
order of magnitude gives a better presentation; this is called arraying of
data.
Discrete (ungrouped) Frequency Distribution
If the data series is presented in such a way that it indicates the exact
measurement of the units, it is called a discrete frequency distribution. A discrete
variable is one whose values differ from each other by definite amounts.
Ex:
Assume that a survey has been made to find the number of post-graduates in 10
families chosen at random; the resulting raw data could be as follows.
0, 1, 3, 1, 0, 2, 2, 2, 2, 4
This data can be classified into an ungrouped frequency distribution. The number of
post-graduates becomes the variable (x), for which we can list the frequency of occurrence
(f) in tabular form as follows;
Number of post graduates (x)   Frequency (f)
0                              2
1                              2
2                              4
3                              1
4                              1
The above example shows a discrete frequency distribution, where the variable
has discrete numerical values.
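The same ungrouped frequency table can be obtained in Python with the standard library's collections.Counter. This is only an illustrative sketch using the survey data above.

```python
from collections import Counter

# Number of post-graduates in each of the 10 surveyed families
raw_data = [0, 1, 3, 1, 0, 2, 2, 2, 2, 4]

# Count how often each distinct value (x) occurs
frequency = Counter(raw_data)

for x in sorted(frequency):
    print(x, frequency[x])
```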
Continuous frequency distribution (grouped frequency distribution)
A continuous data series is one where the measurements are only approximations
and are expressed in class intervals within certain limits. In a continuous frequency
distribution the class intervals theoretically continue from the start of the frequency
distribution to the end without a break. According to Boddington, the variable which
can take every intermediate value between the smallest and largest value in the
distribution is a continuous frequency distribution.
Ex:
Marks obtained by 20 students in an exam for 50 marks are given
below. Convert the data into continuous frequency distribution form.
18 23 28 29 44 28 48 33 32 43
24 29 32 39 49 42 27 33 28 29
By grouping the marks into class intervals of 5, the following frequency distribution
table can be formed.
Marks      No. of students
0 - 5      0
5 - 10     0
10 - 15    0
15 - 20    1
20 - 25    2
25 - 30    7
30 - 35    4
35 - 40    1
40 - 45    3
45 - 50    2
LESSON 1
STATISTICS FOR MANAGEMENT
Session 3 Duration: 1 hr
Technical terms used in formulation frequency distribution
a) Class limits:
The class limits are the smallest and largest values in the class.
Ex:
0 - 10; in this class, the lowest value is zero and the highest value is 10. The two
boundaries of the class are called the lower and upper limits of the class. Class limits are also
called class boundaries.
b) Class intervals
The difference between the upper and lower limits of a class is known as the class
interval.
Ex:
In the class 0 - 10, the class interval is (10 - 0) = 10.
The formula to find the class interval is given below
i = (L - S) / K
L = Largest value
S = Smallest value
K = the no. of classes
Ex:
If the marks of 60 students in a class vary between 40 and 100 and we want
to form 6 classes, the class interval would be
i = (L - S) / K = (100 - 40) / 6 = 60 / 6 = 10
where L = 100, S = 40 and K = 6.
Therefore, the class intervals would be 40 - 50, 50 - 60, 60 - 70, 70 - 80, 80 - 90
and 90 - 100.
Methods of forming class-interval
a) Exclusive method (overlapping)
In this method, the upper limit of one class interval is the lower limit of the next
class. This method maintains the continuity of the data.
Ex:
Marks        No. of students
20 - 30      5
30 - 40      15
40 - 50      25
A student whose mark is between 20 and 29.9 will be included in the 20 - 30
class.
A better way of expressing this is
Marks                            No. of students
20 to less than 30               5
(20 or more but less than 30)
30 to less than 40               15
40 to less than 50               25
Total Students 50
b) Inclusive method (non-overlapping)
Ex:
Marks        No. of students
20 - 29      5
30 - 39      15
40 - 49      25
A student whose mark is 29 is included in the 20 - 29 class interval and a student
whose mark is 39 is included in the 30 - 39 class interval.
Class Frequency
The number of observations falling within a class interval is called its class
frequency.
Ex: A class frequency of 5 for the class 90 - 100 means that 5 students scored
between 90 and 100. If we add the frequencies of all the individual classes, the total
frequency gives the total number of items studied.
Magnitude of class interval
The magnitude of the class interval depends on the range and the number of classes. The
range is the difference between the highest and smallest values in the data series. A
class interval is generally a multiple of 5, such as 5, 10, 15 or 20.
Sturges' formula to find the number of classes is given below:
K = 1 + 3.322 log N
K = No. of classes
log N = Logarithm (base 10) of the total no. of observations
Ex: If the total number of observations is 100, then the number of classes could be
K = 1 + 3.322 log 100
K = 1 + 3.322 x 2
K = 1 + 6.644
K = 7.644, i.e. 8 (rounded off)
NOTE: Under this formula the number of classes cannot be less than 4 or greater than 20.
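The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sturges_classes` is a hypothetical helper, and rounding to the nearest whole number is an assumption taken from the "rounded off" step in the example.

```python
import math

def sturges_classes(n_observations: int) -> int:
    """Number of classes by Sturges' formula, K = 1 + 3.322 * log10(N).

    The result is rounded to the nearest whole number, following the
    "rounded off" step in the worked example (an assumption of this sketch).
    """
    k = 1 + 3.322 * math.log10(n_observations)
    return round(k)

print(sturges_classes(100))  # 1 + 3.322 x 2 = 7.644, rounded off to 8
```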
Class mid point or class marks
The mid value or central value of the class interval is called the mid point.
$\text{Mid point of a class} = \frac{\text{lower limit of class} + \text{upper limit of class}}{2}$
Sturges' formula to find the size of the class interval
$h = \frac{\text{Range}}{1 + 3.322 \log N}$
Ex: In a group of 50 workers, the highest wage is Rs. 250 and the lowest wage is Rs. 100 per day.
Find the size of the interval.
$h = \frac{\text{Range}}{1 + 3.322 \log N} = \frac{250 - 100}{1 + 3.322 \log 50} = \frac{150}{6.644} = 22.58 \approx 23$
Constructing a frequency distribution
The following guidelines may be considered for the construction of a frequency
distribution.
g) The classes should be clearly defined and each observation must belong to one
and only one class interval. Class intervals must be all-inclusive and non-overlapping.
h) The number of classes should be neither too large nor too small.
Too few classes result in greater interval width with loss of accuracy. Too many
class intervals result in complexity.
i) All intervals should be of the same width. This is preferred for easy
computations.
$\text{Width of interval} = \frac{\text{Range}}{\text{Number of classes}}$
j) Open-end classes should be avoided since they create difficulty in analysis and
interpretation.
k) Intervals should be continuous throughout the distribution. This is important for
continuous distributions.
l) The lower limits of the class intervals should be simple multiples of the interval.
Ex: The weights of a sample of 30 students of a particular class are as follows.
Construct a frequency distribution for the given data.
62 58 58 52 48 53 54 63 69 63
57 56 46 48 53 56 57 59 58 53
52 56 57 52 52 53 54 58 61 63
Steps of construction
Step 1
Find the range of the data.
(H) Highest value = 69
(L) Lowest value = 46
Range = H - L = 69 - 46 = 23
Step 2
Find the number of class intervals using Sturges' formula.
K = 1 + 3.322 log N
K = 1 + 3.322 log 30
K = 5.90, say K = 6
No. of classes = 6
Step 3
Width of class interval
$\text{Width of class interval} = \frac{\text{Range}}{\text{Number of classes}} = \frac{23}{6} = 3.83 \approx 4$
Step 4
Count the observations belonging to each class interval and assign this
frequency to the corresponding class interval as follows. Each class includes its
lower limit and excludes its upper limit.
Class interval   Tally bars   Frequency
46 - 50          |||          3
50 - 54          |||| |||     8
54 - 58          |||| |||     8
58 - 62          |||| |       6
62 - 66          ||||         4
66 - 70          |            1
Total                         30
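The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper `frequency_table` is hypothetical, classes start at the lowest observation, each class includes its lower limit and excludes its upper limit, and the width is rounded up to a whole number.

```python
import math

# The 30 weights from the example
weights = [62, 58, 58, 52, 48, 53, 54, 63, 69, 63,
           57, 56, 46, 48, 53, 56, 57, 59, 58, 53,
           52, 56, 57, 52, 52, 53, 54, 58, 61, 63]

def frequency_table(data, n_classes, width):
    """Return (lower limit, upper limit, frequency) for each class."""
    start = min(data)
    counts = [0] * n_classes
    for value in data:
        # Exclusive method: a value equal to an upper limit goes to the next class
        counts[(value - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_classes)]

data_range = max(weights) - min(weights)                  # Step 1: 69 - 46 = 23
n_classes = round(1 + 3.322 * math.log10(len(weights)))   # Step 2: Sturges' formula
width = math.ceil(data_range / n_classes)                 # Step 3: 23 / 6 rounded up
table = frequency_table(weights, n_classes, width)        # Step 4: count per class

for lower, upper, frequency in table:
    print(f"{lower} - {upper}: {frequency}")
```

Rounding the width up rather than down keeps the highest observation inside the last class.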
Cumulative frequency distribution
A cumulative frequency distribution indicates directly the number of units that
lie above or below specified values of the class intervals. When the interest of the
investigator is in the number of cases below a specified value, the specified value
represents the upper limit of the class interval, and the result is known as a less than
cumulative frequency distribution. When the interest lies in finding the number of cases
above a specified value, this value is taken as the lower limit of the class interval,
and the result is known as a more than cumulative frequency distribution.
The cumulative frequency is obtained simply by summing up the consecutive
frequencies.
Ex:
Marks      No. of students   Less than cumulative frequency
0 - 10     5                 5
10 - 20    3                 8
20 - 30    10                18
30 - 40    20                38
40 - 50    12                50
In the above less than cumulative frequency distribution, 5 students scored
less than 10, 8 less than 20, 18 less than 30 and so on.
Similarly, the following table shows the more than (greater than) cumulative frequency
distribution.
Ex:
Marks      No. of students   More than cumulative frequency
0 - 10     5                 50
10 - 20    3                 45
20 - 30    10                42
30 - 40    20                32
40 - 50    12                12
In the above more than cumulative frequency distribution, 50 students
scored more than 0, 45 more than 10, 42 more than 20 and so on.
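Both cumulative columns can be derived from the class frequencies alone. The sketch below uses Python's standard `itertools.accumulate`; the variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies for the marks classes 0 - 10, 10 - 20, ..., 40 - 50
frequencies = [5, 3, 10, 20, 12]

# Less than: running total from the first class downwards
less_than = list(accumulate(frequencies))

# More than: running total from the last class upwards
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)  # [5, 8, 18, 38, 50]
print(more_than)  # [50, 45, 42, 32, 12]
```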
Diagrammatic and Graphic Representation
The data collected can be presented graphically or pictorially for easy
understanding and quick interpretation. Diagrams and graphs give visual
indications of magnitudes, groupings, trends and patterns in the data. These features
can be presented more simply in graphical form. Diagrams and graphs also help
in comparing variables.
Diagrammatic presentation
A diagram is a visual form for the presentation of statistical data. The term diagram
covers various types of devices such as bars, circles, maps, pictorials and cartograms.
Importance of Diagrams
1. They are simple, attractive and easily understandable
2. They give quick information
3. They help to compare variables
4. Diagrams are more suitable for illustrating discrete data
5. They leave a more lasting impression in the reader's mind.
Limitations of diagrams
1. Diagrams show only approximate values
2. Diagrams are not suitable for further analysis
3. Some diagrams (multidimensional ones) can be understood only by experts
4. Details cannot be provided fully
5. They are useful only for comparison
General Rules for drawing the diagrams
i) Each diagram should have a suitable title, placed at the top or bottom, indicating the
theme the diagram is intended to convey.
ii) The size of the diagram should emphasize the important characteristics of the data.
iii) Appropriate proportion should be maintained between the length and breadth of the
diagram.
iv) A proper / suitable scale should be adopted for the diagram.
v) Selection of an appropriate diagram is important; a wrong selection may
mislead the reader.
vi) The source of the data should be mentioned at the bottom.
vii) The diagram should be simple and attractive.
viii) The diagram should be effective rather than complex.
Some important types of diagrams
a) One-dimensional diagrams (line and bar)
b) Two-dimensional diagrams (rectangle, square, circle)
c) Three-dimensional diagrams (cube, sphere, cylinder etc.)
d) Pictogram
e) Cartogram
One-dimensional diagrams (line and bar)
In one-dimensional diagrams, only the length of the bars or lines is taken into
account. The widths of the bars are not considered. One-dimensional diagrams are
classified mainly as follows.
i) Line diagram
ii) Bar diagram
- Vertical bar diagram
- Horizontal bar diagram
- Multiple (compound) bar diagram
- Sub-divided (component) bar diagram
- Percentage subdivided bar diagram
Line diagram
This is the simplest type of one-dimensional diagram. The heights of the bars / lines
are drawn on the basis of the size of the figures. The distances between bars are kept
uniform. The limitations of this diagram are that it is not attractive and cannot provide
more than one piece of information.
Ex: Draw the line diagram for the following data
Year    No. of students passed in first class with distinction (FCD)
2001    5
2002    7
2003    12
2004    5
2005    13
2006    15
[Figure: line diagram. X-axis: Year (2001 to 2006); Y-axis: No. of students passed in FCD; line heights 5, 7, 12, 5, 13 and 15.]
Indication of diagram: The highest FCD count is in 2006 and the lowest FCD counts are in 2001 and 2004.
Simple bar diagram
A simple bar diagram can be drawn using horizontal or vertical bars. It is a very
common diagram in business and economics.
Vertical bar diagram
The annual expenses of maintaining cars of various types are given below.
Draw the vertical bar diagram. The annual expenses of maintaining a car include (fuel +
maintenance + repair + assistance + insurance).
Type of the car    Expense in Rs. / Year
Maruthi Udyog      47533
Hyundai            59230
Tata Motors        63270
Source: 2005 TNS TCS Study
Published at: Vijaya Karnataka, dated: 03.08.2006
[Figure: vertical bar diagram. X-axis: type of car (Maruthi Udyog, Hyundai, Tata Motors); Y-axis: expense in Rs. / year (30000 to 70000); bar heights 47533, 59230 and 63270.]
Indication of diagram
a) The annual expenses of the Maruthi Udyog brand car are comparatively lower than those
of the other brands depicted.
b) The high annual expenses of the Tata Motors brand car can be seen from the diagram.
Horizontal bar diagram
Data on the world's top 10 steel makers are given below. Draw a horizontal
bar diagram.
Steel maker      Prodn. in million tonnes
Arcelor Mittal   110
Nippon           32
POSCO            31
JFE              30
BAO Steel        24
US Steel         20
NUCOR            18
RIVA             18
Thyssen-krupp    17
Tangshan         16
[Figure: horizontal bar diagram. Y-axis: Top-10 Steel Makers; X-axis: Production of Steel (Million Tonnes), 0 to 120.]
Source: ISSB Published by India Today
Compound bar diagram (Multiple bar diagram)
Multiple bar diagrams are used to provide more information than a simple bar
diagram. A multiple bar diagram presents more than one phenomenon and is highly useful
for direct comparison. The bars are drawn side by side, and different colours, shades or
hatches can be used to indicate each variable.
Ex: Draw the bar diagram for the following data. Resale value of the cars (Rs. 000) is
as follows.
Year (Model)   Santro   Zen   Wagonr
2003           208      252   248
2004           240      278   274
2005           261      296   302
[Figure: multiple bar diagram. X-axis: Model of Car (year); Y-axis: Value in Rs. (000), 0 to 350; bars for Santro, Zen and Wagonr.]
Source: True value used car purchase data
Published by: Vijaya Karnataka, dated: 03.08.2006
Ex: Represent the following in a suitable diagram
Class     A      B      C
Male      1000   1500   1500
Female    500    800    1000
Total     1500   2300   2500
[Figure: bar diagram of Male and Female population by class. X-axis: Class (A, B, C); Y-axis: Population (in Nos.), 0 to 2500; class totals 1500, 2300 and 2500.]
Ex: Draw a suitable diagram for the following data
Mode of investment   Investment in 2004     Investment in 2005
                     Rs.       %age         Rs.       %age
NSC                  25000     43.10        30000     45.45
MIS                  15000     25.86        10000     15.15
Mutual Fund          15000     25.86        25000     37.87
LIC                  3000      5.17         1000      1.52
Total                58000     100          66000     100
[Figure: percentage sub-divided bar diagram. X-axis: Year (2004, 2005); Y-axis: % of Investment, 0 to 100.]
Two-dimensional diagram
In a two-dimensional diagram both the breadth and the length of the diagram are
considered, since the area of the diagram represents the data. The important
two-dimensional diagrams are
a) Rectangular diagram
b) Square diagram
Rectangular diagram
Rectangular diagrams are used to depict two or more variables. This diagram
helps in direct comparison. The areas of the rectangles are kept in proportion to the
values. It may be of two types.
i) Percentage sub-divided rectangular diagram
ii) Sub-divided rectangular diagram
In the former case, the widths of the rectangles are proportional to the values, the various
components of the values are converted into percentages and the rectangles are divided
according to them. The latter case is used to show related phenomena such as cost per
unit, quality of production etc.
Ex: Draw the rectangular diagram for the following data
Item of expenditure   Expenditure in Rs.
                      Family A   Family B
Provisional stores    1000       2000
Education             250        500
Electricity           300        700
House Rent            1500       2800
Vehicle Fuel          450        1000
Total                 3500       7000
The total expenditure is taken as 100 and the expenditure on individual items
is expressed as a percentage. The widths of the two rectangles are in proportion to the total
expenses of the two families i.e. 3500 : 7000 or 1 : 2. The heights of the rectangles are
divided according to the percentages of expenses.
Item of expenditure   Monthly expenditure
                      Family A (Rs. 3500)    Family B (Rs. 7000)
                      Rs.      %age          Rs.      %age
Provisional stores    1000     28.57         2000     28.57
Education             250      7.14          500      7.14
Electricity           300      8.57          700      10.00
House Rent            1500     42.86         2800     40.00
Vehicle Fuel          450      12.86         1000     14.29
Total                 3500     100           7000     100
[Figure: percentage sub-divided rectangular diagram. X-axis: Family (A, B); Y-axis: % of Expenditure, 0 to 100; segments for Provisional Stores, Education, Electricity, House Rent and Vehicle Fuel.]
Square diagram
To draw square diagrams, the square root of the value of each item to be shown is
taken. A suitable scale may be used to draw the diagram. The ratios of the square roots
are to be maintained when drawing the sides of the squares.
Ex: Draw the square diagram for the following data
4900 2500 1600
Solution: The square roots of the items are found to be 70, 50 and 40; dividing each by 10,
we get sides of 7, 5 and 4.
[Figure: square diagram with three squares of sides 7, 5 and 4 representing 4900, 2500 and 1600.]
Pie diagram
A pie diagram helps us to show the partitioning of a total into its component parts.
It is used to show classes or groups of data in proportion to the whole data set. The entire
pie represents all the data, while each slice represents a different class or group within
the whole. The following illustration shows the construction of a pie diagram.
Ex: Revenue collections for the year 2005-2006 by government in Rs. (crores) from
petroleum products are as follows. Draw the pie diagram.
Customs                        9600
Excise                         49300
Corporate Tax and dividend     18900
States taking                  48800
Total                          126600
Solution:
Item / Source                 Value in crores   Angle of circle                     %ge
Customs                       9600              (9600 / 126600) x 360 = 27.30°      7.58
Excise                        49300             (49300 / 126600) x 360 = 140.19°    38.94
Corporate Tax and Dividend    18900             (18900 / 126600) x 360 = 53.74°     14.93
States taking                 48800             (48800 / 126600) x 360 = 138.77°    38.55
Total                         126600            360°                                100
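The angle and percentage of each slice follow directly from its share of the total. The following Python sketch reproduces the calculation; the helper name `pie_slices` is hypothetical and values are rounded to two decimals for display.

```python
# Revenue collections in Rs. crores, as listed in the example
revenue = {
    "Customs": 9600,
    "Excise": 49300,
    "Corporate Tax and dividend": 18900,
    "States taking": 48800,
}

def pie_slices(values):
    """Map each item to (angle in degrees, percentage of the total)."""
    total = sum(values.values())
    return {name: (round(value / total * 360, 2), round(value / total * 100, 2))
            for name, value in values.items()}

slices = pie_slices(revenue)
for name, (angle, percent) in slices.items():
    print(f"{name}: {angle} degrees, {percent} %")
```

Because every slice is computed from the same total, the angles add up to 360 degrees and the percentages to 100, apart from rounding.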
[Figure: pie diagram with slices for Customs, Excise, Corporate Tax and Dividend, and States taking.]
Source: India Today 19 June, 2006
Choice or selection of diagram
There are many methods of depicting statistical data through diagrams. No single
diagram is suited for all purposes. The choice / selection of a diagram to suit a given set of
data requires skill, knowledge and experience. Primarily, the choice depends upon the
nature of the data and the purpose of the presentation. The nature of the data will
help in deciding between a one-dimensional, two-dimensional or three-dimensional
diagram. It is also necessary to know the audience for whom the diagram is
drawn.
The following points are to be kept in mind for the choice of diagram.
1. For the common man, who has less knowledge of statistics, cartograms and
pictograms are suited.
2. To present the components apart from the magnitude of values, a sub-divided bar
diagram can be used.
3. When a large number of components are to be shown, a pie diagram is suitable.
Graphic presentation
A graphic presentation is a visual form of presentation. Graphs are drawn on a
special type of paper known as graph paper.
Common graphic representations are
a) Histogram
b) Frequency polygon
c) Cumulative frequency curve (ogive)
Advantages of graphic presentation
1. It provides an attractive and impressive view
2. It simplifies the complexity of data
3. It helps in direct comparison
4. It helps in further statistical analysis
5. It is the simplest method of presentation of data
6. It shows the trend and pattern of data
Difference between graph and diagram
Diagram                                            Graph
1. Ordinary paper can be used                      1. Graph paper is required
2. It is attractive and easily understandable      2. Needs some effort to understand
3. It is appropriate and effective for             3. It creates problems with more variables
   measuring more variables
4. It cannot be used for further analysis          4. Can be used for further analysis
5. It gives comparison                             5. It shows relationship between variables
6. Data are represented by bars, rectangles        6. Points and lines are used to represent data
Frequency Histogram
In this type of representation the given data are plotted in the form of a series of
rectangles. Class intervals are marked along the x-axis and the frequencies along
the y-axis according to a suitable scale. Unlike the bar chart, which is one-dimensional, a
histogram is two-dimensional: both the length and the width are important. A
histogram is constructed from a frequency distribution of grouped data, where the
height of each rectangle is proportional to the respective frequency and the width
represents the class interval. Each rectangle is joined to the next; a blank space between
rectangles would mean that the category is empty and there are no values in that class
interval.
Ex: Construct a histogram for the following data.
Marks obtained (x)   No. of students (f)   Mid point
15 - 25              5                     20
25 - 35              3                     30
35 - 45              7                     40
45 - 55              5                     50
55 - 65              3                     60
65 - 75              7                     70
Total                30
For convenience, the frequency distribution is presented along with the
mid-point of each class interval, where the mid-point is simply the average of the
lower and upper boundaries of the class interval.
[Figure: histogram. X-axis: Class Interval (Marks), 15 to 75; Y-axis: Frequency (No. of students), 0 to 7.]
Frequency polygon
A frequency polygon is a line chart of a frequency distribution in which either the
values of a discrete variable or the mid-points of class intervals are plotted against the
frequencies and the plotted points are joined by straight lines. Since the
frequencies do not start or end at zero, the diagram as such would not touch the
horizontal axis. However, the area under the entire curve is the same as that of the
histogram, which is 100%, so the curve must be enclosed: the first mid-point is
joined to a fictitious preceding mid-point whose frequency is zero, so that the beginning
of the curve touches the horizontal axis, and the last mid-point is joined to a fictitious
succeeding mid-point whose frequency is also zero, so that the curve ends at the horizontal
axis. This enclosed diagram is known as a frequency polygon.
Ex: For the following data construct a frequency polygon.
Marks (CI)   No. of frequencies (f)   Mid-point
15 - 25      5                        20
25 - 35      3                        30
35 - 45      7                        40
45 - 55      5                        50
55 - 65      3                        60
65 - 75      7                        70
[Figure: a frequency polygon. X-axis: Mid point (x), 0 to 100; Y-axis: Frequency, 0 to 10.]
Cumulative frequency curve (ogive)
Ogives are the graphic representations of a cumulative frequency distribution.
Ogives are classified as less than and more than ogives. In the less than case,
cumulative frequencies are plotted against the upper boundaries of their respective class
intervals. In the more than (greater than) case, cumulative frequencies are plotted against
the lower boundaries of their respective class intervals. Ogives are used for comparison
purposes. Several ogives can be compared on the same grid, using different colours for
easier visualisation and differentiation.
Ex:
Marks (CI)   No. of frequencies (f)   Mid-point   Cum. Freq. Less than   Cum. Freq. More than
15 - 25      5                        20          5                      30
25 - 35      3                        30          8                      25
35 - 45      7                        40          15                     22
45 - 55      5                        50          20                     15
55 - 65      3                        60          23                     10
65 - 75      7                        70          30                     7
[Figure: 'Less than' ogive. X-axis: Upper Boundary (CI); Y-axis: Less than Cumulative Frequency.]
[Figure: 'More than' ogive. X-axis: Lower Boundary (CI); Y-axis: More than Cumulative Frequency.]
Session 4
Measures of Central Tendency
Classified statistical data may be described as distributed around
some value called the central value or average in some sense. It gives the most
representative value of the entire data.
Thus, the most important objective of statistical analysis is to determine a single
value that represents the characteristics of the entire raw data. This single value
representing the entire data is called the central value or an average. This value is the
point around which all other values of the data cluster. Therefore, it is known as a
measure of location, and since this value is located at a central point nearest to the other
values of the data, it is also called a measure of central tendency.
Different methods give different central values, which are referred to as measures of
central tendency. The common measures of central tendency are a) Mean b) Median c)
Mode.
These values are very useful not only in presenting an overall picture of the entire data,
but also for making comparisons among two or more sets of data.
Average
Definition
Average is a value which is typical or representative of a set of data.
- Murray R. Spiegel
Average is an attempt to find one single figure to describe whole of figures.
- Clark & Schkade
From the above definitions it is clear that an average is a typical value of the entire
data and is a measure of central tendency.
Functions of an average
To represent complex or large data.
To facilitate comparative study of two variables.
To help study a population from sample data.
To help in decision making.
To represent a series of data by a single value.
To establish mathematical relationships.
Characteristics of a typical average
It should be rigidly defined and easily understandable.
It should be simple to compute and in the form of mathematical formula.
It should be based on all the items in the data.
It should not be unduly influenced by any single item.
It should be capable of further mathematical treatment.
It should have sampling stability.
Types of average
Average or measures of central tendency are of following types.
1. Mathematical average
a. Arithmetical mean
i. Simple mean
ii. Weighted mean
b. Geometric mean
c. Harmonic mean
2. Positional Averages
a. Median
b. Mode
Arithmetic mean
The arithmetic mean is also called the arithmetic average. It is the most commonly used
measure of central tendency. The arithmetic average of a series is the value obtained by
dividing the total value of the various items by their number.
Arithmetic averages are of two types
a. Simple arithmetic average
b. Weighted arithmetic average
Simple arithmetic average (Mean)
The arithmetic mean is sometimes simply referred to as the mean. Ex: Mean income,
Mean expenses, Mean marks etc.
Unlike other averages, the mean has to be computed by considering each and every
observation in the series. Hence, the mean cannot be found by inspection or
observation of the items.
The simple arithmetic mean is equal to the sum of the values of the variable divided by
the number of observations in the sample.
Let $x_i$ be the variable which takes values $x_1, x_2, x_3, \ldots, x_n$ over n items. Then the
arithmetic mean, or simply the mean of x, denoted by a bar over the variable x, is given by
$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$
For the six values 20, 15, 23, 22, 25 and 20:
$\bar{x} = \frac{20 + 15 + 23 + 22 + 25 + 20}{6} = \frac{125}{6} = 20.83$
2. The six-month income of a departmental store is given below. Find the mean income of
the store.
Month          Jan     Feb     Mar     Apr     May     June
Income (Rs.)   25000   30000   45000   20000   25000   20000
n = Total No. of items (observations) = 6
Total income = $\sum x_i$ = (25000 + 30000 + 45000 + 20000 + 25000 + 20000)
= 165000
Mean income = $\frac{\sum x_i}{n} = \frac{165000}{6}$ = Rs. 27500
The above example shows that when the data set is large or the figures are large, the
computations required to get the mean are heavy. In order to reduce the computations one
can use the short-cut method, illustrated below.
Shortcut method
The steps of this method are given below.
Step 1: Assume any one value as the mean; this is called the arbitrary average (A).
Step 2: Find the difference (deviation) of each value from the arbitrary average.
$d = x_i - A$
Step 3: Add all the deviations (differences) to get $\sum d$.
Step 4: Use the following equation to compute the mean value.
$\bar{x} = A + \frac{\sum d}{n}$
n = Total No. of observations
$\sum d$ = Total deviation value
A = Arbitrary mean
Example: Find the mean marks obtained by the students for the following data.
20 25 20 22 20 21 23 25 22 18
Let A = 20 and n = 10
Marks   d = (x_i - 20)
20      0
25      5
20      0
22      2
20      0
21      1
23      3
25      5
22      2
18      -2
        $\sum d$ = 16
$\bar{x} = A + \frac{\sum d}{n} = 20 + \frac{16}{10} = 20 + 1.6$
Mean marks $\bar{x} = 21.6$
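The shortcut method gives the same value as the direct sum, whatever arbitrary average is chosen. A minimal Python sketch of the four steps follows; the function name `mean_by_assumed_mean` is a hypothetical helper.

```python
def mean_by_assumed_mean(values, assumed_mean):
    """Shortcut method: mean = A + (sum of deviations from A) / n."""
    deviations = [x - assumed_mean for x in values]      # Step 2
    return assumed_mean + sum(deviations) / len(values)  # Steps 3 and 4

marks = [20, 25, 20, 22, 20, 21, 23, 25, 22, 18]

shortcut = mean_by_assumed_mean(marks, 20)  # 20 + 16 / 10
direct = sum(marks) / len(marks)            # 216 / 10

print(shortcut, direct)
```

Choosing A close to the true mean keeps the deviations small, which is what makes the method convenient by hand.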
1. Mathematical characteristics of mean
a. The algebraic sum of the deviations of all observations from their arithmetic mean is
zero, i.e. $\sum (x_i - \bar{x}) = 0$.
b. The sum of squared deviations of the items from the mean is a minimum, that is,
less than the sum of squared deviations of the items from any other value:
$\sum d^2$ = minimum
c. Since $\bar{x} = \frac{\sum x}{n}$, if any two of the three values ($\bar{x}$, $\sum x$, n) are given, the third can be computed.
d. If all the items of a set are increased / decreased by a constant value, the
arithmetic mean also increases / decreases by the same constant.
2. Weighted arithmetic mean
The weighted mean is computed by considering the relative importance of each
of the values to the total value. The simple arithmetic mean gives equal importance to all
the items of a distribution. In certain cases, the relative importance of the items is not the
same. To reflect relative importance, weights may be assigned to the variables depending
on the case. Thus, the weight represents the relative importance of an item.
The weighted arithmetic mean is computed by the following equation.
Let $x_1, x_2, x_3, \ldots, x_n$ be the variables and $w_1, w_2, w_3, \ldots, w_n$ the respective
weights assigned. Then the weighted mean $\bar{x}_w$ is given by
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum xw}{\sum w}$
i.e., the weighted average is the ratio of the sum of the products of the values and their
respective weights to the sum of the weights.
Ex: Compute the simple and weighted arithmetic means and comment on them.
Designation        Monthly salary (Rs) (x)   Strength of cadre (w)   xw
General Manager    25000                     10                      250000
Managers           19000                     20                      380000
Supervisors        14000                     10                      140000
Office Assistant   10000                     50                      500000
Helpers            8000                      25                      200000
Total (N = 5)      Σx = 76000                Σw = 115                Σxw = 1470000
a. Simple arithmetic mean = $\frac{\sum x}{N} = \frac{76000}{5}$ = Rs. 15200
b. Weighted arithmetic mean = $\frac{\sum xw}{\sum w} = \frac{1470000}{115}$ = Rs. 12782.61
In this example, the simple arithmetic mean does not account for the difference in the
number of staff in each salary range; each salary is given equal importance. The salaries
of the General Manager and Managers have inflated the value of the simple mean. The
weighted mean gives importance to the number of persons in the various salary ranges.
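The comparison between the two means can be checked with a short Python sketch. The function name `weighted_mean` is a hypothetical helper; the salaries and cadre strengths are those of the table above.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum(x * w) / sum(w)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

salaries = [25000, 19000, 14000, 10000, 8000]  # monthly salary (x)
staff = [10, 20, 10, 50, 25]                   # strength of cadre (w)

simple = sum(salaries) / len(salaries)    # 76000 / 5
weighted = weighted_mean(salaries, staff)  # 1470000 / 115

print(round(simple, 2), round(weighted, 2))
```

The weighted figure is lower because the largest cadres (office assistants and helpers) sit in the lowest salary ranges.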
Ex: Comment on the performance of students of the two universities given below.
           Bombay                                            Madras
Course     % of pass (x)   No. of students (000) (w)   wx     % of pass (x)   No. of students (w)   wx
MBA        71              3                           213    81              5                     405
MCA        83              2                           166    76              3                     228
MA         73              5                           365    58              3                     174
M.Sc.      75              2                           150    76              1                     76
M.Com.     70              2                           140    81              2                     162
Total      Σx = 372        Σw = 14                     Σwx = 1034   Σx = 372   Σw = 14              Σwx = 1045
a. Since $\sum x$ is the same, the simple arithmetic average is the same for both universities:
$\bar{x} = \frac{\sum x}{N} = \frac{372}{5} = 74.4$
By short-cut method
Let A = 3 (assumed mean = 3)
Value (x)   Frequency (f)   d = (x - A)   fd
1           10              -2            -20
2           15              -1            -15
3           10              0             0
4           9               1             9
5           5               2             10
            Σf = 49                       Σfd = -16
$\bar{x} = A + \frac{\sum fd}{N} = 3 + \left(\frac{-16}{49}\right) = 2.67$
Continuous series
In a continuous frequency distribution, the individual value of each item in the
frequency distribution is not known. In a continuous series the mid points of the various
class intervals are used in place of the class intervals. In a continuous series the
mean can be calculated by any of the following methods.
a. Direct method
b. Short cut method
c. Step deviation method
a. Direct method
The steps of this method are as follows
1. Find the mid value of each class.
Ex: For a class interval 20 - 30, the mid value is $\frac{20 + 30}{2} = \frac{50}{2} = 25$. The mid value is
denoted by m.
2. Multiply the mid value m by the frequency f of each class and sum up to get
$\sum fm$.
3. Use the formula $\bar{x} = \frac{\sum fm}{N}$, where N = $\sum f$, to get the mean value.
Ex: Compute the mean for the following data.
Age group (CI)   No. of persons (f)   Mid point (m)   fm
0 - 10           5                    5               25
10 - 20          15                   15              225
20 - 30          25                   25              625
30 - 40          8                    35              280
40 - 50          7                    45              315
Total            Σf = 60 = N                          Σfm = 1470
Mean age = $\bar{x} = \frac{\sum fm}{\sum f} = \frac{\sum fm}{N} = \frac{1470}{60} = 24.5$
b. Short cut method
The steps of this method are described below.
1. Find the mid value of each class.
2. Assume any one of the mid values as the arbitrary average (A).
3. Find the deviation d = m - A of each mid value from A.
4. Multiply the deviation (difference) d by the frequency f and sum up to get $\sum fd$.
5. Using the formula $\bar{x} = A + \frac{\sum fd}{N}$, find the mean value.
Ex: Find the mean age of patients visiting a hospital on a particular day using the
following data.
Age group (CI)   No. of patients (f)   Mid value (m)   d = (m - 25)   fd
0 - 10           5                     5               -20            -100
10 - 20          15                    15              -10            -150
20 - 30          25                    25              0              0
30 - 40          8                     35              10             80
40 - 50          7                     45              20             140
Total            Σf = 60 = N                                          Σfd = -30
Let arbitrary average = A = 25
Mean age $\bar{x} = A + \frac{\sum fd}{N} = 25 + \left(\frac{-30}{60}\right) = 25 - \frac{1}{2} = 24.5$
$\bar{x} = 24.5$
c. Step deviation method
In this method, after finding the deviation from the arbitrary mean, it is divided by a
common factor. Scaling down the deviations by a step reduces the calculation to a
minimum. The procedure of this method is described below.
Steps of step deviation method
1. Find the mid value m of each class.
2. Select the arbitrary mean A.
3. Find the deviation (d) of the mid value of each class from A.
4. Divide the deviations d by a common factor C to get d'.
5. Multiply d' of each class by the frequency f to get fd' and sum up for all classes to
get $\sum fd'$.
6. Using the formula $\bar{x} = A + \frac{\sum fd'}{N} \times C$ (where C is the common factor), calculate
the mean value.
Ex: Find the mean age for the following data.
Age (CI)   No. of persons (f)   Mid value (m)   d = m - A (d = m - 25)   d' = d / 10   fd'
0 - 10     5                    5               -20                      -2            -10
10 - 20    15                   15              -10                      -1            -15
20 - 30    25                   25              0                        0             0
30 - 40    8                    35              10                       1             8
40 - 50    7                    45              20                       2             14
Total      Σf = 60 = N                                                                 Σfd' = -3
Let A = 25 and C = 10
$\bar{x} = A + \frac{\sum fd'}{N} \times C = 25 + \frac{(-3)}{60} \times 10 = 25 - \frac{1}{2}$
$\bar{x} = 24.5$
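The direct, short cut and step deviation methods are three routes to the same value. The Python sketch below applies all three to the age data; the function names are hypothetical helpers, and each class is stored as (lower limit, upper limit, frequency).

```python
# (lower limit, upper limit, frequency) for each age class in the example
classes = [(0, 10, 5), (10, 20, 15), (20, 30, 25), (30, 40, 8), (40, 50, 7)]

def mid_values(classes):
    """Mid value m of each class: (lower limit + upper limit) / 2."""
    return [(lower + upper) / 2 for lower, upper, _ in classes]

def direct_mean(classes):
    """Direct method: mean = sum(f * m) / N."""
    n = sum(f for _, _, f in classes)
    total_fm = sum(f * m for (_, _, f), m in zip(classes, mid_values(classes)))
    return total_fm / n

def shortcut_mean(classes, assumed_mean):
    """Short cut method: mean = A + sum(f * d) / N, with d = m - A."""
    n = sum(f for _, _, f in classes)
    total_fd = sum(f * (m - assumed_mean)
                   for (_, _, f), m in zip(classes, mid_values(classes)))
    return assumed_mean + total_fd / n

def step_deviation_mean(classes, assumed_mean, common_factor):
    """Step deviation method: mean = A + (sum(f * d') / N) * C, with d' = (m - A) / C."""
    n = sum(f for _, _, f in classes)
    total_fd_step = sum(f * (m - assumed_mean) / common_factor
                        for (_, _, f), m in zip(classes, mid_values(classes)))
    return assumed_mean + (total_fd_step / n) * common_factor

results = (direct_mean(classes),
           shortcut_mean(classes, 25),
           step_deviation_mean(classes, 25, 10))
print(results)
```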
Session 5
Measures of Central Tendency
Combined Mean
Combined arithmetic mean can be computed if we know the mean and number
of items in each group of the data.
The following equation is used to compute the combined mean.
Let $\bar{x}_1$ and $\bar{x}_2$ be the means of the first and second groups of data containing $N_1$ and $N_2$
items respectively.
Then, combined mean $\bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$
Ex - 1:
a) Find the mean for the entire group of workers for the following data.
                 Group 1   Group 2
Mean wages       75        60
No. of workers   1000      1500
Given data: $N_1 = 1000$, $N_2 = 1500$, $\bar{x}_1 = 75$ and $\bar{x}_2 = 60$
Group mean $\bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} = \frac{1000 \times 75 + 1500 \times 60}{1000 + 1500}$
$\bar{x}_{12}$ = Rs. 66
Ex - 2: Compute the mean for the entire group.
Medical examination   No. examined   Mean weight (pounds)
A                     50             113
B                     60             120
C                     90             115
Combined mean (grouped mean weight)
$\bar{x}_{123} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2 + N_3 \bar{x}_3}{N_1 + N_2 + N_3} = \frac{50 \times 113 + 60 \times 120 + 90 \times 115}{50 + 60 + 90} = \frac{23200}{200} = 116$ pounds
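The combined mean formula extends to any number of groups. A small Python sketch, assuming each group is given as a (number of items, mean) pair; the function name `combined_mean` is a hypothetical helper.

```python
def combined_mean(groups):
    """Combined mean of several groups given as (number of items, mean) pairs."""
    total_items = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_items

wages = combined_mean([(1000, 75), (1500, 60)])             # Ex - 1
weights = combined_mean([(50, 113), (60, 120), (90, 115)])  # Ex - 2

print(wages, weights)  # 66.0 116.0
```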
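Using logarithms, the geometric mean (GM) of n values is computed as
$GM = \text{Antilog}\left[\frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}\right] = \text{Antilog}\left[\frac{\sum_{i=1}^{n} \log x_i}{N}\right]$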
Merits of GM
a. It is based on all the observations in the series.
b. It is rigidly defined.
c. It is best suited for averaging ratios and percentages.
d. It is less affected by extreme values.
e. It is useful for studying social and economic data.
Demerits of GM
a. It is not simple to understand.
b. It requires computational skill.
c. GM cannot be computed if any item is zero or negative.
d. It has restricted application.
Ex - 1:
a. Find the GM of the data 2, 4, 8
$x_1 = 2$, $x_2 = 4$, $x_3 = 8$, n = 3
$GM = \sqrt[n]{x_1 \times x_2 \times x_3}$
$GM = \sqrt[3]{2 \times 4 \times 8}$
$GM = \sqrt[3]{64} = 4$
GM = 4
b. Find the GM of the data 2, 4, 8 using logarithms.
Data: $x_1 = 2$, $x_2 = 4$, $x_3 = 8$, N = 3
x    log x
2    0.301
4    0.602
8    0.903
     Σ log x = 1.806
$GM = \text{Antilog}\left[\frac{\sum \log x}{N}\right] = \text{Antilog}\left[\frac{1.806}{3}\right]$
GM = Antilog (0.6020)
= 3.9997
GM ≈ 4
Ex - 2:
Compared with the previous year, overhead (OH) expenses went up by
32% in 2003, then increased by 40% in the next year and by 50% in the
following year. Calculate the average increase in overhead expenses.
Let the OH expenses of the base year be 100%.
Year   OH Expenses (x)   log x
2002   Base year
2003   132               2.121
2004   140               2.146
2005   150               2.176
                         Σ log x = 6.443
$GM = \text{Antilog}\left[\frac{\sum \log x}{N}\right] = \text{Antilog}\left[\frac{6.443}{3}\right]$ = Antilog (2.1477)
GM ≈ 140.5
Average increase in overhead expenses ≈ 140.5 - 100 = 40.5%
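The logarithm route to the geometric mean can be sketched in Python as below. The function name `geometric_mean` and its optional `frequencies` argument are assumptions of this sketch; with frequencies supplied it evaluates Antilog[Σ f log x / N] with N = Σ f.

```python
import math

def geometric_mean(values, frequencies=None):
    """GM = antilog(sum(f * log10(x)) / N); every frequency defaults to 1."""
    if frequencies is None:
        frequencies = [1] * len(values)
    n = sum(frequencies)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, frequencies))
    return 10 ** (log_sum / n)

gm = geometric_mean([2, 4, 8])  # cube root of 2 x 4 x 8 = 64
print(round(gm, 4))
```

Working with full-precision logarithms avoids the small rounding differences that arise when three-figure log tables are used by hand.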
GM for discrete series
GM for a discrete series is given, with the usual notations, as:
$GM = \text{Antilog}\left[\frac{\sum_{i=1}^{n} \log x_i}{N}\right]$
Ex - 3:
Consider the following time series of monthly sales of ABC company for 4
months. Find the average rate of change in monthly sales.
Month   Sales
I       10000
II      8000
III     12000
IV      15000
Let the sales of the preceding month = 100% in each case.
Solution:
Month   Sales (Rs)   Increase / decrease (%ge)   Conversion (x)   log (x)
I       10000        Base (100%)
II      8000         - 20%                       80               1.903
III     12000        + 50%                       150              2.176
IV      15000        + 25%                       125              2.097
                                                                  Σ log x = 6.176
$GM = \text{Antilog}\left[\frac{6.176}{3}\right]$ = Antilog (2.0587) ≈ 114.5
Average rate of change in sales ≈ 114.5 - 100 = 14.5% per month
Ex - 4: Find the GM for the following data.
Marks (x)   No. of students (f)   log x   f log x
130         3                     2.113   6.339
135         4                     2.130   8.520
140         6                     2.146   12.876
145         6                     2.161   12.966
150         3                     2.176   6.528
            Σf = N = 22                   Σ f log x = 47.23
$GM = \text{Antilog}\left[\frac{\sum f \log x}{N}\right] = \text{Antilog}\left[\frac{47.23}{22}\right]$
GM ≈ 140.2
Geometric Mean for continuous series
Steps:
1. Find mid value m and take log of m for each mid value.
2. Multiply log m with frequency f of each class to get f log m and sum up to obtain
f log m.
3. Divide f log m by N and take antilog to get GM.
Ex: Find out GM for the data given below.
Yield of wheat in MT    No. of farms (f)    Mid value (m)    log m    f log m
1-10                    3                   5.5              0.740    2.220
11-20                   16                  15.5             1.190    19.040
21-30                   26                  25.5             1.406    36.556
31-40                   31                  35.5             1.550    48.050
41-50                   16                  45.5             1.658    26.528
51-60                   8                   55.5             1.744    13.954
                        $\sum f = N = 100$                            $\sum f \log m = 146.348$
$\text{GM} = \text{Antilog}\left[\frac{\sum f \log m}{N}\right]$
$\text{GM} = \text{Antilog}\left[\frac{146.348}{100}\right]$
GM = 29.07
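The three steps for a continuous series can be sketched as follows. The function name `grouped_geometric_mean` and the representation of classes as (lower, upper) pairs are assumptions made for this illustration. Because the code uses unrounded logarithms, its result can differ from the hand calculation in the second decimal place.

```python
import math

def grouped_geometric_mean(classes, freqs):
    """GM of a continuous series: antilog of (sum of f * log m) / N."""
    mids = [(lower + upper) / 2 for lower, upper in classes]  # step 1: mid values
    n = sum(freqs)
    sum_f_log_m = sum(f * math.log10(m) for f, m in zip(freqs, mids))  # step 2
    return 10 ** (sum_f_log_m / n)  # step 3: divide by N, take antilog

wheat_classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]
wheat_freqs = [3, 16, 26, 31, 16, 8]
print(round(grouped_geometric_mean(wheat_classes, wheat_freqs), 2))
```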
Harmonic Mean
It is the total number of items divided by the sum of the reciprocals of the values of the variable. It is a specialised average used for problems involving rates that vary with time.
Ex: Speed in km/hr, min/day, price/unit.
Harmonic Mean (HM) is suitable only when the time factor is variable and the act being performed remains constant.
$\text{HM} = \frac{N}{\sum \frac{1}{x}}$
$\sum \frac{1}{x} = 0.0738$
$\text{HM} = \frac{N}{\sum \frac{1}{x}} = \frac{5}{0.0738} = 67.72$
HM = 67.72
2. A man travels by car for 3 days, covering 480 km each day. On the first day he drives for 10 hrs at 48 KMPH, on the second day for 12 hrs at 40 KMPH, and on the 3rd day for 15 hrs at 32 KMPH. Compute the HM and the weighted mean and compare them.
Data:
10 hrs @ 48 KMPH
12 hrs @ 40 KMPH
15 hrs @ 32 KMPH
Harmonic Mean
x       1/x
48      0.0208
40      0.0250
32      0.0313
        $\sum \frac{1}{x} = 0.0771$
$\text{HM} = \frac{N}{\sum \frac{1}{x}} = \frac{3}{0.0771}$
HM = 38.91
Weighted Mean
w (hrs)    x (KMPH)    wx (km)
10         48          480
12         40          480
15         32          480
$\sum w = 37$          $\sum wx = 1440$
Weighted Mean $= \bar{x} = \frac{\sum wx}{\sum w} = \frac{1440}{37} = 38.92$
The HM and the weighted mean are the same; the small difference in the last digit is due to rounding of the reciprocals.
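The comparison can be checked numerically with a short sketch. The function names `harmonic_mean` and `weighted_mean` are illustrative assumptions. The two results agree here because the same distance is covered each day, so the hours act as weights proportional to 1/speed.

```python
def harmonic_mean(values):
    """HM = N divided by the sum of reciprocals."""
    return len(values) / sum(1 / v for v in values)

def weighted_mean(values, weights):
    """Weighted arithmetic mean = sum(w * x) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

speeds = [48, 40, 32]   # KMPH on each day
hours = [10, 12, 15]    # hours driven on each day

print(round(harmonic_mean(speeds), 2), round(weighted_mean(speeds, hours), 2))
```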
3. Find HM for the following data.
Class (CI)    Frequency (f)    Mid point (m)    Reciprocal (1/m)    f(1/m)
0-10          5                5                0.2                 1
10-20         15               15               0.0666              0.999
20-30         25               25               0.04                1
30-40         8                35               0.0285              0.228
40-50         7                45               0.0222              0.1554
              $\sum f = 60$                                         $\sum f\left(\frac{1}{m}\right) = 3.3824$
$\text{HM} = \frac{N}{\sum f\left(\frac{1}{m}\right)} = \frac{60}{3.3824}$
HM = 17.74
Relationship between Mean, Geometric Mean and Harmonic Mean.
1. If all the items in a variable are the same, the arithmetic mean, harmonic mean and geometric mean are equal, i.e., $\bar{x} = \text{GM} = \text{HM}$.
2. If the items vary in size, the mean will be greater than the GM and the GM will be greater than the HM. This is because the geometric mean gives larger weight to smaller items and the HM gives the largest weight to the smallest items.
Hence, $\bar{x} > \text{GM} > \text{HM}$.
Median
Median is the value of that item in a series which divides the array into two equal parts, one consisting of all the values less than it and the other consisting of all the values more than it. Median is a positional average: the number of items below it is equal to the number of items above it. It occupies the central position.
Thus, the median is defined as the mid value of the variates. If the values are arranged in ascending or descending order of their magnitude, the median is the middle value if the number of variates is odd, and the average of the two middle values if the number of variates is even.
Ex: If 9 students stand in the order of their heights, the 5th student from either side is the one whose height is the median height of the group. Thus, the median of a group is given by the equation
Median = size of the $\left[\frac{N+1}{2}\right]$th item
Ex
1. Find the median for the following data.
22 20 25 31 26 24 23
Arrange the given data in array form (either in ascending or descending order).
20 22 23 24 25 26 31
Median is given by the $\left[\frac{N+1}{2}\right]$th item $= \left[\frac{7+1}{2}\right] = \frac{8}{2}$ = 4th item.
Median = value of the 4th item = 24.
2. Find the median for the following data.
20 21 22 24 28 32
Median is given by the $\left[\frac{N+1}{2}\right]$th item $= \left[\frac{6+1}{2}\right]$ = 3.5th item.
The item lies between the 3rd and 4th items, so there are two values, 22 and 24.
The median is the mean of these two values.
Median $= \left[\frac{22 + 24}{2}\right]$ = 23
Discrete Series Median
In a discrete series, the values are (already) in the form of an array and the frequencies are recorded against each value. However, to determine the size of the median, i.e. the $\left[\frac{N+1}{2}\right]$th item, a separate column is to be prepared for cumulative frequencies. The median size is first located with reference to the cumulative frequency which first covers that size. Then the value against that cumulative frequency is taken as the median.
Ex: Find the median of the marks obtained by students in statistics.
Marks (x)    No. of students (f)    Cumulative frequency
10           5                      5
20           5                      10
30           3                      13
40           15                     28
50           30                     58
60           10                     68
             N = 68
Ex: In a class of 15 students, 5 students failed in a test. The marks of the 10 students who passed were 9, 6, 7, 8, 9, 6, 5, 4, 7, 8. Find the median marks of the 15 students.
Marks      No. of students (f)    cf
0 to 3     5                      5
4          1                      6
5          1                      7
6          2                      9
7          2                      11
8          2                      13
9          2                      15
           $\sum f = 15$
Median = size of the $\left[\frac{N+1}{2}\right]$th item
Me $= \frac{15+1}{2}$ = 8th item
The 8th item is covered by the cf of 9. The marks against cf 9 is 6 and hence
Median = 6
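The cumulative-frequency lookup used for discrete series can be sketched as below. The function name `discrete_median` is an assumption for this illustration, and the code follows the rule in these notes (take the value whose cumulative frequency first covers the (N+1)/2 position) rather than averaging two neighbouring values.

```python
def discrete_median(values, freqs):
    """Median of a discrete series; `values` must be in ascending order."""
    n = sum(freqs)
    position = (n + 1) / 2          # size of the median item
    cumulative = 0
    for value, f in zip(values, freqs):
        cumulative += f             # running cumulative frequency
        if cumulative >= position:  # first cf that covers the position
            return value
    raise ValueError("empty frequency table")

# Marks obtained in statistics (first example of this section)
print(discrete_median([10, 20, 30, 40, 50, 60], [5, 5, 3, 15, 30, 10]))
```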
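In the first example above (marks obtained in statistics, N = 68), the median is the size of the $\frac{68+1}{2}$ = 34.5th item. The cumulative frequency just above 34.5 is 58. Against c.f. 58 the value is 50, which is the median value.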
Continuous Series
The procedure for finding the median is different in a continuous series. The class intervals are already in the form of an array and the frequencies are recorded against each class interval. To determine the size, we take the $\frac{N}{2}$th item and locate the median class accordingly, with reference to the cumulative frequency which first covers that size. When the median class is located, the median value is interpolated using the formula given below.
$\text{Median} = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
where $l = \frac{l_0 + l_1}{2}$, $l_0$ is the left end point of the N/2 class and $l_1$ is the right end point of the previous class,
h = class width, f = frequency of the median class,
C = cumulative frequency of the class preceding the median class.
Ex: Find the median for the following data. The class marks obtained by 50 students are as follows.
CI       Frequency (f)    Cum. frequency (cf)
10-15    6                6
15-20    18               24
20-25    9                33    N/2 class
25-30    10               43
30-35    4                47
35-40    3                50
         $\sum f = N = 50$
$\frac{N}{2} = \frac{50}{2} = 25$
Cum. frequency just above 25 is 33 and hence 20-25 is the median class.
$l = \frac{l_0 + l_1}{2} = \frac{20 + 20}{2} = 20$
h = 20 - 15 = 5
f = 9
C = 24
$\text{Median} = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$\text{Median} = 20 + \frac{5}{9}[25 - 24] = 20 + \frac{5}{9}$
Median = 20.56
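The interpolation formula for a continuous series can be sketched as follows. The function name `grouped_median` and its arguments are assumptions made for this illustration; equal class widths and exclusive (continuous) class limits are assumed.

```python
def grouped_median(lower_limits, width, freqs):
    """Median = l + (h / f) * (N/2 - C) for classes of equal width."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2
    cumulative = 0                      # C: cf of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:      # this is the median (N/2) class
            return lower + (width / f) * (half - cumulative)
        cumulative += f

# Marks of 50 students: classes 10-15, 15-20, ..., 35-40
print(round(grouped_median([10, 15, 20, 25, 30, 35], 5, [6, 18, 9, 10, 4, 3]), 2))
```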
Ex: Find the median for the following data.
Mid values (m)     115  125  135  145  155  165  175  185  195
Frequencies (f)    6    25   48   72   116  60   38   22   3
The interval between successive mid-values, and hence the magnitude of each class interval, is 10. So half of 10 is deducted from and added to each mid-value to give the lower and upper limits. Thus, the classes are:
115 - 5 = 110 (lower limit)
115 + 5 = 120 (upper limit); similarly, we can get the CI for all mid values.
CI         Frequency (f)    Cum. frequency (cf)
110-120    6                6
120-130    25               31
130-140    48               79
140-150    72               151
150-160    116              267    N/2 class
160-170    60               327
170-180    38               365
180-190    22               387
190-200    3                390
           $\sum f = N = 390$
$\frac{N}{2} = \frac{390}{2} = 195$
Cum. frequency just above 195 is 267.
Median class = 150-160
$l = \frac{150 + 150}{2} = 150$
f = 116
N/2 = 195
C = 151
h = 10
$\text{Median} = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$\text{Median} = 150 + \frac{10}{116}[195 - 151] = 150 + 3.79$
Median = 153.79
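$\text{Mode} = l + \frac{h(f_m - f_1)}{2f_m - f_1 - f_2}$
where,
$l$ = lower limit of the modal class,
$f_m$ = max frequency (modal class frequency),
$f_1$ = frequency of the class preceding the modal class,
$f_2$ = frequency of the class succeeding the modal class,
h = class width.
or
$\text{Mode} = l + \frac{h f_2}{f_1 + f_2}$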
Ex:
1. Find the mode for the following data.
Marks (CI)    No. of students (f)
1-10          3
11-20         16
21-30         26
31-40         31    Max. frequency
41-50         16
51-60         8
              $\sum f = N = 100$
We identify the modal class as the class of maximum frequency, i.e. 31-40.
where,
$f_m$ = 31
$f_1$ = 26
$f_2$ = 16
h = 10
$l = \frac{30 + 31}{2} = 30.5$
$\text{Mode (z)} = l + \frac{h(f_m - f_1)}{2f_m - f_1 - f_2}$
$\text{Mode} = 30.5 + \frac{10(31 - 26)}{2 \times 31 - 26 - 16}$
Mode = 33.
Or
$\text{Mode} = l + \frac{h f_2}{f_1 + f_2} = 30.5 + \frac{10 \times 16}{26 + 16}$
Mode = 34.31
It can be noted that the second method gives a slightly different value of the mode.
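Both mode formulas can be compared with a small sketch. The function names and the list-based inputs are assumptions made for this illustration; classes are assumed to be of equal width, and a missing neighbouring class is treated as having frequency 0.

```python
def _modal_neighbours(freqs):
    """Return index of the modal class and the frequencies f_m, f_1, f_2."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    f1 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    return i, freqs[i], f1, f2

def mode_first_method(lower_limits, width, freqs):
    """Mode = l + h (f_m - f_1) / (2 f_m - f_1 - f_2)."""
    i, fm, f1, f2 = _modal_neighbours(freqs)
    return lower_limits[i] + width * (fm - f1) / (2 * fm - f1 - f2)

def mode_second_method(lower_limits, width, freqs):
    """Mode = l + h f_2 / (f_1 + f_2)."""
    i, _, f1, f2 = _modal_neighbours(freqs)
    return lower_limits[i] + width * f2 / (f1 + f2)

lowers = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5]   # continuous lower limits
marks_freqs = [3, 16, 26, 31, 16, 8]
print(mode_first_method(lowers, 10, marks_freqs),
      round(mode_second_method(lowers, 10, marks_freqs), 2))
```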
Partition values
The median divides a series into two equal parts. There are other values which also divide the series into equal parts; these are called partition values (PV).
Just as one point divides a series into two equal parts (halves), 3 points divide it into four parts (quartiles), 9 points divide it into 10 parts (deciles) and 99 points divide it into 100 parts (percentiles). The partition values are useful for knowing the exact composition of a series.
Quartiles
A measure which divides an array into four equal parts is known as a quartile. Each portion contains an equal number of items. The first, second and third points are termed the first quartile ($Q_1$), second quartile ($Q_2$) and third quartile ($Q_3$). The first quartile is also known as the lower quartile, as 25% of the observations of the distribution lie below it. The third quartile is also known as the upper quartile, as 75% of the observations of the distribution lie below it and 25% of the observations lie above it.
Calculation of quartiles
$Q_1$ = size of the $\frac{(N+1)}{4}$th item
$Q_3$ = size of the $\frac{3(N+1)}{4}$th item
$Q_2$ = (median) $= l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
Measures of quartiles
The quartile values are located on a principle similar to that used for locating the median value.
The following table shows the procedure for locating quartiles.
Measure    Individual and Discrete series    Continuous series
$Q_1$      $\frac{(N+1)}{4}$th item          $\frac{n}{4}$th item
$Q_2$      $\frac{2(N+1)}{4}$th item         $\frac{2n}{4}$th item
$Q_3$      $\frac{3(N+1)}{4}$th item         $\frac{3n}{4}$th item
Ex - 1: From the following marks find $Q_1$, the median and $Q_3$.
23, 48, 34, 68, 15, 36, 24, 54, 65, 75, 92, 10, 70, 61, 20, 47, 83, 19, 77
Let us arrange the data in array form.
Sl. No.    x
1.         10
2.         15
3.         19
4.         20
5.         23    $Q_1$
6.         24
7.         34
8.         36
9.         47
10.        48    $Q_2$
11.        54
12.        61
13.        65
14.        68
15.        70    $Q_3$
16.        75
17.        77
18.        83
19.        92
a. $Q_1 = \frac{1}{4}(n+1)$th item $= \frac{1}{4}(19+1) = \frac{1}{4} \times 20$ = 5th item. Here, n = 19 items.
   $Q_1$ = 23
b. $Q_2 = \frac{2}{4}(n+1)$th item $= \frac{2}{4} \times 20$ = 10th item
   $Q_2$ = 48
c. $Q_3 = \frac{3}{4}(n+1)$th item $= \frac{3}{4} \times 20$ = 15th item
   $Q_3$ = 70
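Locating quartiles in an individual series can be sketched as below. The function name `quartile` is an assumption for this illustration. When the position k(N+1)/4 is fractional, the sketch interpolates between the two neighbouring items, which is one common convention and not the only one.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) as the size of the k(N+1)/4-th item."""
    data = sorted(values)
    position = k * (len(data) + 1) / 4      # 1-based position in the array
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return data[0]
    if whole >= len(data):
        return data[-1]
    # item at `whole`, plus a share of the gap to the next item
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])

marks = [23, 48, 34, 68, 15, 36, 24, 54, 65, 75, 92, 10, 70, 61, 20, 47, 83, 19, 77]
print(quartile(marks, 1), quartile(marks, 2), quartile(marks, 3))
```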
Ex - 2: Locate the median and quartiles from the following data.
Size of shoes    4    4.5   5    5.5   6    6.5   7    7.5   8
Frequencies      20   36    44   50    80   30    30   16    14
X      f     cf
4      20    20
4.5    36    56
5      44    100    $Q_1$
5.5    50    150
6      80    230    $Q_2$
6.5    30    260    $Q_3$
7      30    290
7.5    16    306
8      14    320
       $N = \sum f = 320$
$Q_1 = \frac{1}{4}(n+1)$th item $= \frac{1}{4} \times 321$ = 80.25th item
Just above 80.25, the cf is 100. Against cf 100, the value is 5.
$Q_1$ = 5
$Q_2 = \frac{1}{2}(n+1)$th item $= \frac{1}{2} \times 321$ = 160.5th item
Just above 160.5, the cf is 230. Against cf 230, the value is 6.
$Q_2$ = 6
$Q_3 = \frac{3}{4}(n+1)$th item $= \frac{3}{4} \times 321$ = 240.75th item
Just above 240.75, the cf is 260. Against cf 260, the value is 6.5.
$Q_3$ = 6.5
Ex - 3: Compute the quartiles from the following data.
CI               0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency (f)    5     8      7      12     28     20     10     10
First quartile $Q_1 = l + \frac{h}{f}\left[\frac{1}{4}N - C\right]$, $Q_3 = l + \frac{h}{f}\left[\frac{3}{4}N - C\right]$ and $Q_2$ = Median $= l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
CI       f     cf
0-10     5     5
10-20    8     13
20-30    7     20
30-40    12    32     $Q_1$
40-50    28    60     $Q_2$
50-60    20    80     $Q_3$
60-70    10    90
70-80    10    100
         $N = \sum f = 100$
a. First locate $Q_1$, which corresponds to $\frac{1}{4}N$ = 25
$l = \frac{30 + 30}{2} = 30$
h = 10
f = 12
C = 20
$Q_1 = l + \frac{h}{f}\left[\frac{1}{4}N - C\right]$
$Q_1 = 30 + \frac{10}{12}[25 - 20]$
$Q_1$ = 34.17
b. Locate $Q_2$ (Median)
$Q_2$ corresponds to N/2 = 50, $l = \frac{40 + 40}{2} = 40$
$Q_2 = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$Q_2 = 40 + \frac{10}{28}[50 - 32]$
$Q_2$ = 46.43
c. Locate $Q_3$
$Q_3$ corresponds to $\frac{3}{4}N$ = 75, $l = \frac{50 + 50}{2} = 50$
$Q_3 = l + \frac{h}{f}\left[\frac{3}{4}N - C\right]$
$Q_3 = 50 + \frac{10}{20}[75 - 60]$
$Q_3$ = 57.5
Deciles
The deciles divide the arrayed set of variates into ten portions of equal frequency, and they are sometimes used to characterise the data for some specific purpose. In this process, we get nine decile values. The fifth decile is nothing but the median value. We can calculate the other deciles by following the procedure used in computing the quartiles.
Formula to compute deciles:
$D_1 = l + \frac{h}{f}\left[\frac{1}{10}N - C\right]$
$D_2 = l + \frac{h}{f}\left[\frac{2}{10}N - C\right]$ and so on.
Percentiles
Percentile values divide the distribution into 100 parts of equal frequency. In this process, we get ninety-nine percentile values. The 25th, 50th and 75th percentiles are nothing but the first quartile, the median and the third quartile respectively.
Formula to compute percentiles is given below:
$P_{25} = l + \frac{h}{f}\left[\frac{25}{100}N - C\right]$
$P_{26} = l + \frac{h}{f}\left[\frac{26}{100}N - C\right]$ and so on.
Ex:
Find the 7th decile and the 60th percentile for the given data of patients who visited a hospital on a particular day.
CI       f     Cf
10-20    1     1
20-30    3     4
30-40    11    15
40-50    21    36
50-60    43    79     $P_{60}$
60-70    32    111    $D_7$
70-80    9     120
         $\sum f = N = 120$
a. $D_7 = l + \frac{h}{f}\left[\frac{7}{10}N - C\right]$
$l = \frac{60 + 60}{2} = 60$
$\frac{7}{10}N = 84$
h = 10, f = 32
C = 79
$D_7 = 60 + \frac{10}{32}(84 - 79)$
7th Decile = $D_7$ = 61.562
b. 60th percentile
$P_{60} = l + \frac{h}{f}\left[\frac{60}{100}N - C\right]$
$l = \frac{50 + 50}{2} = 50$
h = 10
f = 43
C = 36
$\frac{60}{100}N = 72$
$P_{60} = 50 + \frac{10}{43}(72 - 36)$
$P_{60}$ = 58.37
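Quartiles, deciles and percentiles of a continuous series all use the same interpolation, differing only in the fraction of N that is located. A minimal sketch follows; the function name `grouped_partition_value` and its `fraction` parameter are assumptions made for this illustration, and equal class widths are assumed.

```python
def grouped_partition_value(lower_limits, width, freqs, fraction):
    """Value = l + (h / f) * (fraction * N - C) for classes of equal width.

    fraction = 0.25 gives Q1, 0.7 gives D7, 0.6 gives P60, and so on.
    """
    n = sum(freqs)
    target = fraction * n
    cumulative = 0                       # C: cf of the preceding classes
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1 and the table must not be empty")

patient_lowers = [10, 20, 30, 40, 50, 60, 70]
patient_freqs = [1, 3, 11, 21, 43, 32, 9]
d7 = grouped_partition_value(patient_lowers, 10, patient_freqs, 0.7)
p60 = grouped_partition_value(patient_lowers, 10, patient_freqs, 0.6)
print(round(d7, 2), round(p60, 2))
```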
SOME NUMERICAL EXAMPLES
1. Show that the following distribution is symmetrical about the average. Also show that the median is mid-way between the lower and upper quartiles.
X            2   3   4    5    6    7    8    9   10
Frequency    2   9   29   57   80   57   29   9   2
To show that the given distribution is symmetrical, the mean, median and mode must be the same.
To show that the median is mid-way between the lower and upper quartiles, we must have $Q_2 - Q_1 = Q_3 - Q_2$.
Mid-point x    Class interval CI    f     d = (x - 6)    fd     cf
2              1.5-2.5              2     -4             -8     2
3              2.5-3.5              9     -3             -27    11
4              3.5-4.5              29    -2             -58    40
5              4.5-5.5              57    -1             -57    97     $Q_1$ class
6              5.5-6.5              80    0              0      177    $Q_2$ class
7              6.5-7.5              57    1              57     234    $Q_3$ class
8              7.5-8.5              29    2              58     263
9              8.5-9.5              9     3              27     272
10             9.5-10.5             2     4              8      274
               N = 274              $\sum fd = 0$
Let A = 6
$\text{Mean} = A + \frac{h \sum fd}{N} = 6 + \frac{1 \times 0}{274}$
Mean = 6.
Median
$Q_2 = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$\frac{N}{2} = \frac{274}{2} = 137$
C = 97
$Q_2 = 5.5 + \frac{1}{80}[137 - 97]$
$Q_2$ = 5.5 + 0.5
Median = $Q_2$ = 6.
Mode
$\text{Mode} = l + \frac{h(f_m - f_1)}{2f_m - f_1 - f_2}$
Modal class 5.5-6.5
$\text{Mode} = 5.5 + \frac{1(80 - 57)}{2 \times 80 - 57 - 57}$
Mode = 6.
Since Mean = Mode = Median, the given distribution is symmetrical.
$Q_1$ calculation
$Q_1 = l + \frac{h}{f}\left[\frac{N}{4} - C\right] = 4.5 + \frac{1}{57}[68.5 - 40]$
$Q_1$ = 5.
$Q_3$ calculation
$Q_3 = l + \frac{h}{f}\left[\frac{3N}{4} - C\right] = 6.5 + \frac{1}{57}[205.5 - 177]$
$Q_3$ = 7.
Now, $Q_2 - Q_1 = Q_3 - Q_2$, i.e. 6 - 5 = 7 - 6
1 = 1
2. Find the mean for the set of observations given below.
6, 7, 5, 4
5
4 8 7 8 6
N
xi
x
1 i
n
+ + + +
= 6
5
30
32 x
4. Find the mean profit of the organisation for the data given below:
Profit CI    f     $x_i$    fx
100-200      10    150      1500
200-300      18    250      4500
300-400      20    350      7000
400-500      26    450      11700
500-600      30    550      16500
600-700      28    650      18200
700-800      18    750      13500
             $N = \sum f = 150$     $\sum fx = 72900$
$x_1 = \frac{100 + 200}{2} = \frac{300}{2} = 150$
$\bar{x} = \frac{\sum fx}{N} = \frac{72900}{150}$
$\bar{x} = 486$
Step Deviation Method
x = a + hd, so $d = \frac{x - a}{h}$
$\bar{x} = a + h\frac{\sum fd}{N}$
a = Arbitrary constant
h = class width
Profit CI    f     $x_i$    d     fd
100-200      10    150      -3    -30
200-300      18    250      -2    -36
300-400      20    350      -1    -20
400-500      26    450      0     0
500-600      30    550      +1    30
600-700      28    650      +2    56
700-800      18    750      +3    54
             $N = \sum f = 150$   $\sum fd = 54$
$\bar{x} = a + h\frac{\sum fd}{N}$
$\bar{x} = 450 + 100\left(\frac{54}{150}\right)$
$\bar{x} = 486$
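The step deviation method can be sketched in code and checked against the direct method. The function names below are assumptions made for this illustration.

```python
def direct_mean(midpoints, freqs):
    """Mean = sum(f * x) / N."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

def step_deviation_mean(midpoints, freqs, assumed_mean, width):
    """Mean = a + h * sum(f * d) / N, with d = (x - a) / h."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / width for x, f in zip(midpoints, freqs))
    return assumed_mean + width * sum_fd / n

profit_mids = [150, 250, 350, 450, 550, 650, 750]
profit_freqs = [10, 18, 20, 26, 30, 28, 18]
print(direct_mean(profit_mids, profit_freqs),
      step_deviation_mean(profit_mids, profit_freqs, assumed_mean=450, width=100))
```

Any value may be chosen for the assumed mean a; a mid value near the centre of the table keeps the deviations d small.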
5. In an office there are 84 employees and their salaries are given below.
Salary       2430  2590  2870  3390  4720  5160
Employees    4     28    31    16    3     2
1. Find the mean salary of the employees.
2. What is the total salary of the employees?
$\bar{x} = \frac{\sum fx}{N}$
$= \frac{4 \times 2430 + 28 \times 2590 + 31 \times 2870 + 16 \times 3390 + 3 \times 4720 + 2 \times 5160}{84}$
$\bar{x} = \frac{249930}{84}$
$\bar{x}$ = Rs. 2975.36
1. $\bar{x}$ = Rs. 2975.36
2. Total salary = Rs. 2,49,930
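6. The average marks secured by 36 students was 52, but it was discovered that an item 64 was misread as 46. Find the correct mean of the marks.
$\bar{x} = \frac{\sum fx}{N}$
$52 = \frac{\sum fx}{36}$
$\sum fx$ = 52 x 36 = 1872
Correct $\sum fx$ = $\sum fx$ - incorrect + correct
Correct $\sum fx$ = 1872 - 46 + 64 = 1890
$\bar{x} = \frac{\text{correct } \sum fx}{N} = \frac{1890}{36}$
$\bar{x}$ = 52.5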
7. The mean of 100 items is 46. Later it was discovered that an item 16 was misread as 61 and another item 43 was misread as 34. It was also found that the total number of items is 90, not 100. Find the correct mean value.
$\bar{x} = \frac{\sum fx}{N}$
$46 = \frac{\sum fx}{100}$
$\sum fx$ = 4600
Correct $\sum fx$ = $\sum fx$ - incorrect + correct
= 4600 - 61 - 34 + 16 + 43
= 4564
$\bar{x} = \frac{\text{correct } \sum fx}{N} = \frac{4564}{90}$
= 50.71
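The correction procedure used in the two problems above can be sketched as a small helper. The function name `corrected_mean` and its parameters are assumptions made for this illustration.

```python
def corrected_mean(reported_mean, reported_n, wrong_values, right_values, true_n=None):
    """Recover the total, swap misread items for the correct ones, re-average."""
    total = reported_mean * reported_n                     # total as originally computed
    total = total - sum(wrong_values) + sum(right_values)  # remove wrong, add correct
    n = reported_n if true_n is None else true_n           # item count may also be corrected
    return total / n

print(corrected_mean(52, 36, wrong_values=[46], right_values=[64]))
print(round(corrected_mean(46, 100, [61, 34], [16, 43], true_n=90), 2))
```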
8. Calculate the mean for the following data.
Value    Frequency
< 10     4
< 20     10
< 30     15
< 40     25
< 50     30
CI       f     m (mid point)    fm
0-10     4     5                20
10-20    10    15               150
20-30    15    25               375
30-40    25    35               875
40-50    30    45               1350
         $\sum f = 84$          $\sum fm = 2770$
$\bar{x} = \frac{\sum fm}{N} = \frac{2770}{84}$
$\bar{x}$ = 32.97
9. For the given frequency table, find the missing frequencies. The average number of accidents is 1.46.
No. of accidents    Frequency
0                   46
1                   ?
2                   ?
3                   25
4                   10
5                   5
No. of accidents (x)    Frequency (f)    fx
0                       46               0
1                       $f_1$            $f_1$
2                       $f_2$            $2f_2$
3                       25               75
4                       10               40
5                       5                25
                        N = 200          $\sum fx = 140 + f_1 + 2f_2$
$1.46 = \frac{140 + f_1 + 2f_2}{200}$
$292 = 140 + f_1 + 2f_2$
$f_1 + 2f_2 = 152$ ----(1)
We know that $N = \sum f$
$200 = 86 + f_1 + f_2$
$f_1 + f_2 = 114$ ----(2)
Subtracting (2) from (1):
$f_2 = 38$
Substituting in (2):
$f_1 = 114 - 38$
$f_1 = 76$
10. Find the missing value of the variate for the following data, given that the mean is 31.87.
$x_i$    f
12       8
20       16
27       48
33       90
?        30
54       8
         N = 200
$x_i$    f     fx
12       8     96
20       16    320
27       48    1296
33       90    2970
x        30    30x
54       8     432
         N = 200    $\sum fx = 5114 + 30x$
$\bar{x} = 31.87$
$\bar{x} = \frac{\sum fx}{N}$
$31.87 = \frac{\sum fx}{200}$
$\sum fx = 6374$ ----(1)
$\sum fx = 5114 + 30x$ ----(2)
(1) = (2)
6374 = 5114 + 30x
6374 - 5114 = 30x
30x = 1260
x = 42.
11. The average rainfall of a city from Monday to Saturday is 0.3 inches. Due to heavy rainfall on Sunday, the average rainfall for the week increased to 0.5 inches. What is the rainfall on Sunday?
Given: average for Mon - Sat = 0.3
average for Mon - Sun = 0.5
$\bar{x}_1 = \frac{\sum fx_1}{N}$
$0.3 = \frac{\sum fx_1}{6}$
$\sum fx_1 = 1.8$
$\bar{x}_2 = \frac{\sum fx_2}{N}$
$0.5 = \frac{\sum fx_2}{7}$
$\sum fx_2 = 3.5$
Rainfall on Sunday $= \sum fx_2 - \sum fx_1$
= 3.5 - 1.8
= 1.7 inches
12. The average salary of male employees in a firm was Rs. 520 and that of females Rs. 420. The mean salary of all the employees as a whole is Rs. 500. Find the percentage of male and female employees.
Given: $\bar{x}_1 = 520$, $\bar{x}_2 = 420$, $\bar{x} = 500$
$n_1$ = Male persons. $n_2$ = Female persons.
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$
$500 = \frac{520n_1 + 420n_2}{n_1 + n_2}$
$500n_1 + 500n_2 = 520n_1 + 420n_2$
$80n_2 = 20n_1$
$n_1 = 4n_2$
Let $n_1 + n_2 = 100$
$4n_2 + n_2 = 100$
$5n_2 = 100$
$n_2$ = 20% Female
$n_1$ = 80% Male
80% of the employees in the firm are male and 20% are female.
13. The AM of two observations is 25 and their GM is 15. Find the HM.
Given:
AM = 25
$\bar{x} = \frac{a + b}{2}$
$25 = \frac{a + b}{2}$
a + b = 50
GM = 15
$\text{GM} = \sqrt{ab}$
$15 = \sqrt{ab}$
$(15)^2 = (\sqrt{ab})^2$
ab = 225
HM = ?
$\text{HM} = \frac{2}{\frac{1}{a} + \frac{1}{b}}$
$\text{HM} = \frac{2ab}{a + b}$
$\text{HM} = \frac{2 \times 225}{50}$
HM = 9
14. The GM is 60 and the HM is 28.24. Find the AM for two observations.
AM ≥ GM ≥ HM
$60 = \sqrt{ab}$
$60^2 = ab$
ab = 3600
$28.24 = \frac{2ab}{a + b}$
$a + b = \frac{2ab}{28.24} = \frac{2 \times 3600}{28.24}$
a + b = 254.96
$\bar{x} = \frac{a + b}{2} = \frac{254.96}{2}$
$\bar{x}$ = 127.48
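For two positive observations, the last two problems both rest on the identity AM x HM = GM squared. A short sketch checking this numerically is given below; the function name `pair_means` is an assumption for this illustration, and the pair (45, 5) is simply one pair of numbers satisfying a + b = 50 and ab = 225.

```python
import math

def pair_means(a, b):
    """Return (AM, GM, HM) of two positive numbers."""
    am = (a + b) / 2
    gm = math.sqrt(a * b)
    hm = 2 * a * b / (a + b)
    return am, gm, hm

am, gm, hm = pair_means(45, 5)
print(am, gm, hm)
# For two observations, AM * HM equals GM squared, so HM = GM**2 / AM.
print(math.isclose(am * hm, gm ** 2))
```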
15. Calculate the missing frequency from the data if the median is 50.
CI       f            cf
10-20    2            2
20-30    8            10
30-40    6            16
40-50    ? ($f_1$)    $16 + f_1$
50-60    15           $31 + f_1$    median class
60-70    10           $41 + f_1$
         $N = \sum f = 41 + f_1$
$\text{Median} = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$50 = 50 + \frac{10}{15}\left[\frac{N}{2} - (16 + f_1)\right]$
$50 - 50 = \frac{10}{15}\left[\frac{N}{2} - (16 + f_1)\right]$
$0 = \frac{N}{2} - (16 + f_1)$
$\frac{N}{2} = 16 + f_1$
$16 + f_1 = \frac{41 + f_1}{2}$
$2(16 + f_1) = 41 + f_1$
$32 + 2f_1 = 41 + f_1$
$f_1 = 9$
Session 7
Measures of Dispersions
The measures of central tendency alone will not exhibit the various characteristics of frequency distributions having the same total frequency. Two distributions can have the same mean but differ significantly. We need to know the extent of variation or deviation of the values in comparison with the central value or average, referred to as the measure of central tendency.
Measures of dispersion are averages of the second order. They are based on the average of the deviations of the values from a measure of central tendency: $\bar{x}$, Me or Z. Variability is a basic feature of the values of variables. Such variation or dispersion refers to the lack of uniformity.
Definition: A measure of dispersion may be defined as a statistic signifying the extent of the scatteredness of items around a measure of central tendency.
Absolute and Relative Measures of Dispersion:
A measure of dispersion may be expressed in an absolute form or in a relative form. It is said to be in absolute form when it states the actual amount by which the value of an item on average deviates from a measure of central tendency. Absolute measures are expressed in concrete units, i.e., the units in which the data have been expressed, e.g. Rupees, Centimetres, Kilograms etc., and are used to describe a frequency distribution.
A relative measure of dispersion is a quotient obtained by dividing the absolute measure by the quantity with respect to which the absolute deviation has been computed. It is as such a pure number and is usually expressed in percentage form. Relative measures are used for making comparisons between two or more distributions.
Thus, absolute measures are expressed in terms of the original units and are not suitable for comparative studies. Relative measures are expressed as ratios or percentages and are suitable for comparative studies.
Measures of Dispersion Types
Following are the common measures of dispersion.
a. The Range
b. The Quartile Deviation (QD)
c. The Mean Deviation (MD)
d. The Standard Deviation (SD)
Range
Range represents the difference between the values of the extremes. The range of any series is the difference between the highest and the lowest values in the series.
The values in between the two extremes are not taken into consideration at all. The range is a simple indicator of the variability of a set of observations. It is denoted by R. In a frequency distribution, the range is taken to be the difference between the lower limit of the class at the lower extreme of the distribution and the upper limit of the class at the upper extreme. Range can be computed using the following equations.
Range = Largest value - Smallest value
$\text{Coefficient of Range} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Largest value} + \text{Smallest value}}$
Problems
1. Compute the range and the coefficient of range of the given series, and state which one is more dispersed and which is more uniform.
Series I: 9, 10, 15, 19, 21
Series II: 1, 15, 24, 28, 29
Series I:  R = LV - SV = 21 - 9 = 12;  $\text{CR} = \frac{R}{LV + SV} = \frac{12}{21 + 9} = \frac{12}{30} = 0.4$
Series II: R = LV - SV = 29 - 1 = 28;  $\text{CR} = \frac{R}{LV + SV} = \frac{28}{30} = 0.933$
Series I is less dispersed and more uniform.
Series II is more dispersed and less uniform.
Evaluating Criteria
i. The smaller the CR, the less the dispersion (the more uniform the series).
ii. The larger the CR, the greater the dispersion (the less uniform the series).
Range Merits
i. It is the simplest measure to compute.
ii. It is rigidly defined.
iii. It is very useful in Statistical Quality Control (SQC).
iv. It is useful in studying variation in the prices of shares and stocks.
Limitations
i. It is not a stable measure of dispersion, as it is affected by extreme values.
ii. It does not consider class intervals and is not suitable for C.I. problems.
iii. It considers only extreme values.
2. Find the range and the coefficient of range from the following data.
A: 10 11 12 13 14
B: 40 41 42 43 44
C: 100 101 102 103 104
Series I:   R = LV - SV = 14 - 10 = 4;   $\text{CR} = \frac{R}{LV + SV} = \frac{4}{24} = 0.166$
Series II:  R = 44 - 40 = 4;             $\text{CR} = \frac{R}{LV + SV} = \frac{4}{84} = 0.0476$
Series III: R = 104 - 100 = 4;           $\text{CR} = \frac{R}{LV + SV} = \frac{4}{204} = 0.0196$
Series III is less dispersed and more uniform.
Series I is more dispersed and less uniform.
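The range and its coefficient are simple enough to sketch in a few lines. The function name `range_and_coefficient` is an assumption for this illustration; the three series show that equal ranges can correspond to quite different relative dispersion.

```python
def range_and_coefficient(values):
    """Return (R, CR) where R = LV - SV and CR = (LV - SV) / (LV + SV)."""
    largest, smallest = max(values), min(values)
    r = largest - smallest
    # CR is undefined when LV + SV = 0; the data in these notes are all positive.
    return r, r / (largest + smallest)

for name, series in [("A", [10, 11, 12, 13, 14]),
                     ("B", [40, 41, 42, 43, 44]),
                     ("C", [100, 101, 102, 103, 104])]:
    r, cr = range_and_coefficient(series)
    print(name, r, round(cr, 4))
```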
3. Compute the range and the coefficient of range for the following data.
x: 6   12   18   24   30   36   42
f: 20  130  16   14   20   15   40
Range = LV - SV = 42 - 6 = 36
$\text{CR} = \frac{R}{LV + SV} = \frac{36}{48} = 0.75$
Quartile Deviation
Quartiles divide the total frequency into four equal parts. The lower quartile $Q_1$ refers to the value of the variate corresponding to the cumulative frequency N/4.
The upper quartile $Q_3$ refers to the value of the variate corresponding to the cumulative frequency 3N/4.
Quartile deviation is defined as $\text{QD} = \frac{1}{2}(Q_3 - Q_1)$. The middle quartile $Q_2$ (the median) corresponds to the value of the variate with cumulative frequency c.f. $= \frac{N}{2}$.
a) $\text{QD} = \frac{1}{2}(Q_3 - Q_1)$
b) Relative measure of dispersion: coefficient of QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Problems
1. Find the quartile deviation and the coefficient of quartile deviation for the given grouped data. Also compute the middle quartile.
Class    f     Cf
1-10     3     3
11-20    16    19
21-30    26    45     $Q_1$ Class
31-40    31    76     $Q_2$ & $Q_3$ Class
41-50    16    92
51-60    8     100
         N = 100
$\frac{N}{4} = \frac{100}{4} = 25$
$Q_1 = l + \frac{h}{f}\left[\frac{N}{4} - C\right]$
$Q_1 = 20.5 + \frac{10}{26}[25 - 19]$
$Q_1$ = 22.81
$Q_2 = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$Q_2 = 30.5 + \frac{10}{31}[50 - 45]$
$Q_2$ = 32.11
$Q_3 = l + \frac{h}{f}\left[\frac{3}{4}N - C\right]$
$Q_3 = 30.5 + \frac{10}{31}[75 - 45]$
$Q_3$ = 40.18
$\text{QD} = \frac{1}{2}(Q_3 - Q_1) = 0.5\,(Q_3 - Q_1)$
$= \frac{1}{2}(40.18 - 22.81)$
= 8.685
Coef. QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40.18 - 22.81}{40.18 + 22.81} = \frac{17.37}{62.99}$
= 0.276
2. Find the quartile deviation and the coefficient of quartile deviation from the following marks of 12 students.
Sl. No.    Marks
1.         25
2.         30
3.         37
4.         43
5.         48
6.         54
7.         61
8.         67
9.         72
10.        80
11.        84
12.        89
$Q_1$ = size of the $\frac{12 + 1}{4}$ = 3.25th item
= 3rd item + 0.25 (4th item - 3rd item)
= 37 + 0.25 (43 - 37)
$Q_1$ = 38.5
$Q_3$ = size of the $\frac{3(12 + 1)}{4}$ = 9.75th item
= 9th item + 0.75 (10th item - 9th item)
= 72 + 0.75 (80 - 72)
$Q_3$ = 78
$\text{QD} = \frac{1}{2}(Q_3 - Q_1)$
$= \frac{1}{2}(78 - 38.5)$
QD = 19.75
Coef. QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{78 - 38.5}{78 + 38.5}$
= 0.339
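Once the two quartiles are known, the quartile deviation and its coefficient follow directly. A minimal sketch, with the function name `quartile_deviation` being an assumption for this illustration:

```python
def quartile_deviation(q1, q3):
    """Return (QD, coefficient of QD) from the lower and upper quartiles."""
    qd = (q3 - q1) / 2
    coefficient = (q3 - q1) / (q3 + q1)
    return qd, coefficient

# Quartiles of the marks of 12 students: Q1 = 38.5, Q3 = 78
qd, coef = quartile_deviation(38.5, 78)
print(qd, round(coef, 3))
```

QD is in the units of the data (marks here), while the coefficient is a pure number and can be compared across series.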
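3. Compute the quartile deviation and its coefficient for the data given below:
x     f     Cf
58    15    15
59    20    35
60    32    67     $Q_1$ Class
61    35    102
62    33    135
63    22    157    $Q_3$ Class
64    20    177
65    10    187
66    8     195
      N = 195
$Q_1$ = size of the $\frac{n + 1}{4}$th item = size of the $\frac{195 + 1}{4}$th item
$Q_1$ = size of the 49th item, which is covered by cf 67, which gives
$Q_1$ = 60
$Q_3$ = size of the $\frac{3}{4}(n + 1)$th item $= \frac{3}{4} \times 196$ = size of the 147th item, which is covered by cf 157, which gives
$Q_3$ = 63
$\text{QD} = \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(63 - 60) = 1.5$
Coef. QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{63 - 60}{63 + 60} = \frac{3}{123}$
= 0.024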
Merits of Quartile Deviation
It is very easy to compute.
It is not affected by extreme values of the variable.
It is not at all affected by open-end class intervals.
Demerits of Quartile Deviation
It completely ignores the portions below the lower quartile and above the upper quartile.
It is not capable of further mathematical treatment.
It is greatly affected by fluctuations in sampling.
It is only a positional average, not a mathematical average.
Session 8
Measures of Dispersions
Mean Deviation
Mean deviation is the average of the differences between the items in a series and the mean, median or mode of that series. It is concerned with the extent to which the values are dispersed about the mean, the median or the mode. It is found by averaging all the deviations from the central tendency. These deviations are taken into the computation without regard to their negative signs. Theoretically, the deviations are preferably taken from the median rather than from the mean or mode.
Merits of Mean Deviation
It is rigidly defined and easy to compute.
It takes all items into consideration and gives weight to each deviation according to its size.
It is less affected by extreme values.
It removes all irregularities by obtaining deviations and provides a correct measure.
Demerits of Mean Deviation
It is not suitable for algebraic treatment.
It is always positive, since the signs of the deviations are ignored, which is not justified mathematically.
It is not a satisfactory measure when the deviations are taken from the mode.
It is not suitable when class intervals are open-ended.
Formula to compute Mean Deviation
If $x_i$ is a variate which takes the values $x_1, x_2, x_3, \ldots, x_n$ with average A (mean, median or mode), then the mean deviation from the average A is defined by
$\text{MD} = \frac{\sum |x_i - A|}{N}$
For grouped data
$\text{MD} = \frac{\sum f\,|x_i - A|}{N}$
Coefficient of MD $= \frac{\text{MD}}{\text{Mean}}$
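1. Compute the MD and the coefficient of MD (CMD) from the mean for the data given below.
x     $d = |x_i - \bar{x}|$
21    26.55
32    15.55
38    9.55
41    6.55
49    1.45
54    6.45
59    11.45
66    18.45
68    20.45
$\sum x = 428$     $\sum |x_i - \bar{x}| = \sum d = 116.45$
$\bar{x} = \frac{\sum x_i}{n} = \frac{428}{9} = 47.55$
$\text{MD} = \frac{\sum |x_i - \bar{x}|}{N} = \frac{116.45}{9}$
MD = 12.938
Coefficient of MD $= \frac{\text{MD}}{\text{Avg}} = \frac{12.938}{47.55}$
= 0.272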
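The mean deviation about any chosen average can be sketched as below. The function name `mean_deviation` and its optional `center` parameter are assumptions made for this illustration; when no centre is given, the arithmetic mean is used.

```python
def mean_deviation(values, center=None):
    """Return (MD, coefficient of MD) about `center` (default: the mean)."""
    if center is None:
        center = sum(values) / len(values)
    md = sum(abs(x - center) for x in values) / len(values)  # signs ignored
    return md, md / center

data = [21, 32, 38, 41, 49, 54, 59, 66, 68]
md, coef = mean_deviation(data)
print(round(md, 3), round(coef, 3))

# About the median instead: pass the median as the centre
wages = [59, 32, 67, 43, 22, 17, 64, 55, 47, 80, 25]
print(round(mean_deviation(wages, center=47)[0], 2))
```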
2. Following are the wages of workers. Find the mean deviation from the median and its coefficient.
Wages (x)    Wages in ascending order ($x_i$)    $|x_i - Me| = |x_i - 47|$
59           17                                  30
32           22                                  25
67           25                                  22
43           32                                  15
22           43                                  4
17           47 (Me)                             0
64           55                                  8
55           59                                  12
47           64                                  17
80           67                                  20
25           80                                  33
                                                 $\sum |x_i - Me| = 186$
Median = size of the $\left(\frac{11 + 1}{2}\right)$th item = 6th item
Me = 47
$\text{MD} = \frac{\sum |x_i - Me|}{N} = \frac{186}{11} = 16.91$
Coefficient of MD $= \frac{\text{MD}}{\text{Median}} = \frac{16.91}{47}$
= 0.36
3. Compute the MD about the mode and its coefficient.
x             f     $d = |x_i - \text{Mode}|$    fd
20            6     100                          600
40            19    80                           1520
60            40    60                           2400
80            23    40                           920
100           65    20                           1300
120 (Mode)    83    0                            0       Modal class
140           55    20                           1100
160           20    40                           800
180           9     60                           540
              $\sum f = 320$                     $\sum f\,|x_i - \text{Mode}| = 9180$
The highest frequency is 83 and hence
Z = 120
$\text{MD} = \frac{\sum f\,|x_i - \text{Mode}|}{N} = \frac{9180}{320}$
= 28.69
Coefficient of MD $= \frac{28.69}{120}$
= 0.239
4. Find the mean deviation about the median from the data given below.
Salaries            40   50   50-100   100-200   200-400
No. of Employees    22   18   10       8         2
x          No. of Employees (f)    x (mv)    cf    $d = |x_i - Me|$    fd
40         22                      40        22    10                  220
50         18                      50        40    0                   0
50-100     10                      75        50    25                  250
100-200    8                       150       58    100                 800
200-400    2                       300       60    250                 500
           $\sum f = 60$                                               $\sum f\,|x_i - Me| = 1770$
Median = size of the $\left(\frac{N + 1}{2}\right)$th item $= \frac{60 + 1}{2} = \frac{61}{2}$ = 30.5th item
It lies within cf 40, and against cf 40 the discrete value is 50.
$\text{MD} = \frac{\sum f\,|x_i - \text{Median}|}{N} = \frac{1770}{60}$
MD = 29.5
Coefficient of MD $= \frac{\text{MD}}{\text{Median}} = \frac{29.5}{50}$
= 0.59
Session 9
Measures of Dispersions
Standard Deviation
Standard deviation is the square root of the sum of the squares of the deviations divided by their number. It is also called the mean error deviation, the mean square error deviation, or the root mean square deviation. It is a second moment of dispersion. Since the sum of the squares of the deviations from the mean is a minimum, the deviations are taken only from the mean (and not from the median or mode).
The standard deviation is the Root Mean Square (RMS) average of all the deviations from the mean. It is denoted by sigma ($\sigma$).
Characteristics of standard deviation
1. Standard deviation and coefficient of variation possess all those properties which a good measure of dispersion should possess.
2. The process of squaring the deviations eliminates negative signs and makes mathematical computations easy.
Merits
1. It is based on all observations.
2. It can be smoothly handled algebraically.
3. It is a well defined and definite measure of dispersion.
4. It is of great importance when we are comparing the variability of two series.
Demerits
1. It is difficult to calculate and understand.
2. It gives more weightage to extreme values, as the deviations are squared.
3. It is not useful in economic studies.
Standard deviation
If the variate $x_i$ takes the values $x_1, x_2, \ldots, x_n$, the standard deviation is denoted by $\sigma$ and is defined by
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}}$
The quantity $\sigma^2$ is called the variance.
Alternate Expressions
For raw data
$\sigma^2 = \frac{\sum x^2}{n} - (\bar{x})^2$
Coefficient of variation
It is defined as the ratio of the standard deviation to the mean. The percentage form of CV is given by $\text{CV} = \frac{\sigma}{\bar{x}} \times 100$
Problems
1. Ten students of a class have obtained the following marks in a particular subject out of 100. Calculate the SD and CV for the data given below.
Sl. No.    Marks (x)    $d = (x_i - \bar{x}) = (x_i - 38.5)$    $(x_i - \bar{x})^2$
1.         5            - 33.5                                  1122.25
2.         10           - 28.5                                  812.25
3.         20           - 18.5                                  342.25
4.         25           - 13.5                                  182.25
5.         40           1.5                                     2.25
6.         42           3.5                                     12.25
7.         45           6.5                                     42.25
8.         48           9.5                                     90.25
9.         70           31.5                                    992.25
10.        80           41.5                                    1722.25
           $\sum x = 385$                                       $\sum (x_i - \bar{x})^2 = \sum d^2 = 5320.50$
$\bar{x} = \frac{\sum x}{N} = \frac{385}{10} = 38.5$
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} = \sqrt{\frac{5320.5}{10}} = 23.066$
$\text{CV} = \frac{\sigma}{\bar{x}} \times 100$
$\text{CV} = \frac{23.066}{38.5} \times 100$
CV = 59.9%
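The definitions of SD and CV can be sketched directly in code. The function name `sd_and_cv` is an assumption for this illustration; the divisor is N, as in these notes, not N - 1.

```python
import math

def sd_and_cv(values):
    """Return (mean, SD, CV in %) using the population formula (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    sd = math.sqrt(variance)
    return mean, sd, 100 * sd / mean

student_marks = [5, 10, 20, 25, 40, 42, 45, 48, 70, 80]
mean, sd, cv = sd_and_cv(student_marks)
print(mean, round(sd, 3), round(cv, 1))
```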
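2. Compute the standard deviation and the coefficient of variation for the following data of the marks of 100 students.
Class    f     Class (continuous)    Mid point x    d     fd     $fd^2$
1-10     3     0.5-10.5              5.5            -2    -6     12
11-20    16    10.5-20.5             15.5           -1    -16    16
21-30    26    20.5-30.5             25.5           0     0      0
31-40    31    30.5-40.5             35.5           1     31     31
41-50    16    40.5-50.5             45.5           2     32     64
51-60    8     50.5-60.5             55.5           3     24     72
         $N = \sum f = 100$                         $\sum fd = 65$    $\sum fd^2 = 195$
a = 25.5
$d = \frac{x - a}{h} = \frac{x - 25.5}{10}$, e.g. $d = \frac{15.5 - 25.5}{10} = \frac{-10}{10} = -1$
$\bar{x} = a + h\frac{\sum fd}{N} = 25.5 + 10\left(\frac{65}{100}\right)$
= 25.5 + 6.5
$\bar{x} = 32$
$\sigma = h\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
$= 10\sqrt{\frac{195}{100} - \left(\frac{65}{100}\right)^2}$
$\sigma$ = 12.359
$\text{CV} = \frac{\sigma}{\bar{x}} \times 100$
$\text{CV} = \frac{12.359}{32} \times 100$
= 38.62%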
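3. The AM and SD of a set of nine items are 43 and 5 respectively. If an item of value 63 is added, find the mean and SD.
$\bar{x} = \frac{\sum x_i}{N}$
$\sum x_i = \bar{x} \times N$
$\sum x_i$ = 43 x 9
$\sum x$ = 387 for 9 items
$\sum x$ = 387 + 63 for 10 items
$\sum x$ = 450
Modified mean $\bar{x} = \frac{\sum x}{N} = \frac{450}{10}$
$\bar{x}$ = 45
$\bar{x}$ = 43, $\sigma$ = 5 for 9 items
$\sigma^2 = \frac{\sum x^2}{N} - (\bar{x})^2$
$25 = \frac{\sum x^2}{9} - (43)^2$
$25 = \frac{\sum x^2}{9} - 1849$
$25 + 1849 = \frac{\sum x^2}{9}$
$\frac{\sum x^2}{9} = 1874$
$\sum x^2$ = 16866 for 9 items
If 63 is added,
$\sum x^2 = 16866 + (63)^2$ = 20835 for 10 items
Modified $\sigma^2 = \frac{\sum x^2}{N} - (\bar{x})^2$
$\sigma^2 = \frac{20835}{10} - (45)^2 = 58.5$
$\sigma$ = 7.65 is the modified SD.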
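The update above needs only the count, the mean and the SD of the original items, because these determine the sum and the sum of squares. A sketch under that observation follows; the function name `add_observation` is an assumption for this illustration.

```python
import math

def add_observation(n, mean, sd, new_value):
    """Return (new mean, new SD) after adding one item to n summarised items."""
    total = n * mean + new_value                          # updated sum of x
    total_sq = n * (sd ** 2 + mean ** 2) + new_value ** 2 # updated sum of x squared
    new_n = n + 1
    new_mean = total / new_n
    new_variance = total_sq / new_n - new_mean ** 2       # variance = mean of squares - square of mean
    return new_mean, math.sqrt(new_variance)

new_mean, new_sd = add_observation(9, 43, 5, 63)
print(new_mean, round(new_sd, 2))
```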
125
4. The mean of 5 observations is 4.4 and the variance is 8.24. If three of the five observations are 1, 2 and 6, find the values of the other two observations.

We know that $\bar{x} = \frac{\sum x}{N}$, so $4.4 = \frac{\sum x}{5}$ and $\sum x = 22$.

$\sigma^2 = \frac{\sum x^2}{N} - (\bar{x})^2$
$8.24 = \frac{\sum x^2}{5} - (4.4)^2 = \frac{\sum x^2}{5} - 19.36$
$\frac{\sum x^2}{5} = 8.24 + 19.36 = 27.6$
$\sum x^2 = 138$

$\sum x^2 = 1^2 + 2^2 + 6^2 + x_1^2 + x_2^2$
$138 = 1 + 4 + 36 + x_1^2 + x_2^2$
$x_1^2 + x_2^2 = 97$ ---- (1)

$\sum x = 1 + 2 + 6 + x_1 + x_2$
$22 = 9 + x_1 + x_2$
$x_1 + x_2 = 13$ ---- (2)

From (2), $x_2 = 13 - x_1$. Substituting in (1):
$x_1^2 + (13 - x_1)^2 = 97$
$x_1^2 + 169 + x_1^2 - 26 x_1 = 97$
$2 x_1^2 - 26 x_1 + 72 = 0$
$x_1^2 - 13 x_1 + 36 = 0$

Using $x_1 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ with a = 1, b = -13, c = 36:
$x_1 = \frac{13 \pm \sqrt{169 - 4 \times 36}}{2} = \frac{13 \pm 5}{2} = 6.5 \pm 2.5$
$x_1 = 9$ or $x_1 = 4$

Taking $x_1 = 9$ gives $x_2 = 4$. The other two observations are 9 and 4.
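As a check on the algebra, the two unknown observations can be found numerically from their sum and their sum of squares. This is a minimal sketch; `two_missing_values` is an illustrative name, and it assumes exactly two values are missing.

```python
import math

def two_missing_values(n, mean, variance, known):
    """Recover two unknown observations from n, mean, variance and the known ones."""
    s = n * mean - sum(known)                                     # x1 + x2
    q = n * (variance + mean ** 2) - sum(v * v for v in known)    # x1^2 + x2^2
    # x1 and x2 are the roots of t^2 - s*t + p = 0, where p = x1*x2 = (s^2 - q) / 2
    p = (s * s - q) / 2
    disc = s * s - 4 * p
    if disc < 0:
        raise ValueError("no real pair of observations fits these summaries")
    root = math.sqrt(disc)
    return (s + root) / 2, (s - root) / 2

x1, x2 = two_missing_values(5, 4.4, 8.24, [1, 2, 6])
print(round(x1, 6), round(x2, 6))
```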
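5. The mean and SD of the frequency distribution of a continuous random variable x are 40.604 and 7.92 respectively. The distribution after a change of origin and scale is given below. Determine the actual class intervals.

d   -3   -2   -1   0    1    2    3    4
f   3    15   45   57   50   36   25   9

d    f    fd    fd^2   Mid value (MV)   Class interval (CI)
-3   3    -9    27     22.5             20-25
-2   15   -30   60     27.5             25-30
-1   45   -45   45     32.5             30-35
0    57   0     0      37.5             35-40
1    50   50    50     42.5             40-45
2    36   72    144    47.5             45-50
3    25   75    225    52.5             50-55
4    9    36    144    57.5             55-60
N = 240, $\sum fd = 149$, $\sum fd^2 = 695$

The MV and CI columns are filled in once a and h are known: MV = a + hd, and each class extends h/2 on either side of its mid value.

$\bar{x} = a + h \frac{\sum fd}{N}$
$40.604 = a + h \times \frac{149}{240}$
$40.604 = a + 0.62h$ ----- (1)

$\sigma = h \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
$7.92 = h \sqrt{\frac{695}{240} - \left(\frac{149}{240}\right)^2} = h \sqrt{2.895 - (0.620)^2}$
$7.92 = h \times 1.584$
$h = 4.998 \approx 5$

Put h = 5 in equation (1):
$40.604 = a + 0.62 \times 5$
$a = 37.5$

Hence the mid values are 37.5 + 5d and the actual class intervals are 20-25, 25-30, ..., 55-60.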
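The recovery of the origin a and scale h from the coded distribution can be sketched as follows. The function name `recover_origin_and_scale` is illustrative; it simply solves the two equations used in the problem above, first for h from the SD and then for a from the mean.

```python
import math

def recover_origin_and_scale(d, f, mean, sd):
    """Solve mean = a + h*m1 and sd = h*sqrt(m2 - m1^2) for a and h."""
    n = sum(f)
    m1 = sum(fi * di for fi, di in zip(f, d)) / n        # sum(fd) / N
    m2 = sum(fi * di * di for fi, di in zip(f, d)) / n   # sum(fd^2) / N
    h = sd / math.sqrt(m2 - m1 ** 2)
    a = mean - h * m1
    return a, h

d = [-3, -2, -1, 0, 1, 2, 3, 4]
f = [3, 15, 45, 57, 50, 36, 25, 9]
a, h = recover_origin_and_scale(d, f, mean=40.604, sd=7.92)

# Round to the nearest sensible class width and midpoint, then rebuild the classes
h_r, a_r = round(h), round(a, 1)
intervals = [(a_r + h_r * di - h_r / 2, a_r + h_r * di + h_r / 2) for di in d]
print(round(a, 3), round(h, 3), intervals[0], intervals[-1])
```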
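Combined Standard Deviation
Suppose we have different samples of sizes $n_1, n_2, n_3, \ldots$ having means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$ and standard deviations $\sigma_1, \sigma_2, \sigma_3, \ldots$. Then the combined standard deviation $\sigma$ can be computed by the following formula (written here for two samples):

$\sigma^2 (n_1 + n_2) = n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)$

where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$, and $\bar{x}$ is the combined mean.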
1. The means of two samples of sizes 50 and 100 are 54.1 and 50.3 respectively, and their standard deviations are 8 and 7 respectively. Obtain the SD for the combined group.

$n_1 = 50$, $\bar{x}_1 = 54.1$, $\sigma_1 = 8$
$n_2 = 100$, $\bar{x}_2 = 50.3$, $\sigma_2 = 7$

$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{(50 \times 54.1) + (100 \times 50.3)}{50 + 100} = \frac{7735}{150} = 51.57$

$\sigma^2 (n_1 + n_2) = n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)$, where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$

$d_1 = 54.1 - 51.57 = 2.53$, $d_1^2 = 6.40$
$d_2 = 50.3 - 51.57 = -1.27$, $d_2^2 = 1.61$

$150 \sigma^2 = 50 (8^2 + 6.40) + 100 (7^2 + 1.61)$
$3 \sigma^2 = (64 + 6.40) + 2 (49 + 1.61)$
$3 \sigma^2 = 70.40 + 2 \times 50.61 = 171.62$
$\sigma^2 = 57.21$
$\sigma = 7.56$
2. The mean wage is Rs. 75 per day and the SD of wages is Rs. 5 per day for a group of 1000 workers, and the same are Rs. 60 and Rs. 4.5 for another group of 1500 workers. Find the mean and standard deviation for the entire group.

We have by data,
$\bar{x}_1 = 75$, $\sigma_1 = 5$, $n_1 = 1000$
$\bar{x}_2 = 60$, $\sigma_2 = 4.5$, $n_2 = 1500$

Let $\bar{x}$ and $\sigma$ be the mean and SD of the entire group.

Consider $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$

i.e., $\bar{x} = \frac{1000 \times 75 + 1500 \times 60}{1000 + 1500} = \frac{165000}{2500} = 66$

Also we have,
$(n_1 + n_2) \sigma^2 = n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)$,
where $d_1 = \bar{x}_1 - \bar{x} = 75 - 66 = 9$ and $d_2 = \bar{x}_2 - \bar{x} = 60 - 66 = -6$

$(1000 + 1500) \sigma^2 = 1000 (5^2 + 9^2) + 1500 (4.5^2 + (-6)^2)$
$2500 \sigma^2 = 106000 + 84375 = 190375$
$\sigma^2 = 76.15$ or $\sigma = 8.73$
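Both combined-SD problems above can be checked with one small function. This is a sketch that extends the two-sample formula to any number of groups, using $N \sigma^2 = \sum n_i (\sigma_i^2 + d_i^2)$ with $d_i = \bar{x}_i - \bar{x}$; the name `combined_mean_sd` and the tuple layout are assumptions made for illustration.

```python
import math

def combined_mean_sd(groups):
    """groups is a list of (n, mean, sd) tuples; returns the combined mean and SD."""
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    # Each group contributes n_i * (sigma_i^2 + d_i^2), with d_i = mean_i - combined mean
    var = sum(n * (s ** 2 + (m - mean) ** 2) for n, m, s in groups) / total_n
    return mean, math.sqrt(var)

samples_mean, samples_sd = combined_mean_sd([(50, 54.1, 8), (100, 50.3, 7)])
wages_mean, wages_sd = combined_mean_sd([(1000, 75, 5), (1500, 60, 4.5)])
print(round(samples_mean, 2), round(samples_sd, 2))
print(round(wages_mean, 2), round(wages_sd, 2))
```

Working with unrounded intermediate values, as the code does, avoids the small rounding differences that appear in the hand calculation.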
3. The arithmetic means of the runs scored by three batsmen A, B and C are 50, 48 and 12 respectively. The SDs of their runs are 15, 12 and 2 respectively. Who is the most consistent of the three batsmen? If one of the three is to be selected, who should be selected?

                   A    B    C
AM ($\bar{x}$)     50   48   12
SD ($\sigma$)      15   12   2

$CV_A = \frac{\sigma_A}{\bar{x}_A} \times 100 = \frac{15}{50} \times 100 = 30\%$

$CV_B = \frac{\sigma_B}{\bar{x}_B} \times 100 = \frac{12}{48} \times 100 = 25\%$

$CV_C = \frac{\sigma_C}{\bar{x}_C} \times 100 = \frac{2}{12} \times 100 = 16.67\%$

Evaluation Criteria
1. A lower CV indicates a more consistent player, hence the most consistent player is Player C.
2. The highest run scorer is Player A, with $\bar{x}_A = 50$.
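The comparison of the three batsmen can be reproduced in a few lines of Python. The dictionary layout and the helper name `coefficient_of_variation` are illustrative; the two selection criteria are the ones listed above.

```python
def coefficient_of_variation(mean, sd):
    """CV as a percentage: SD relative to the mean."""
    return 100 * sd / mean

# (mean runs, SD of runs) for each batsman, taken from the problem
players = {"A": (50, 15), "B": (48, 12), "C": (12, 2)}
cvs = {name: coefficient_of_variation(m, s) for name, (m, s) in players.items()}

most_consistent = min(cvs, key=cvs.get)                        # lowest CV
highest_scorer = max(players, key=lambda p: players[p][0])     # highest mean
print({k: round(v, 2) for k, v in cvs.items()}, most_consistent, highest_scorer)
```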
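4. The coefficients of variation of two series are 75% and 90%, with SDs 15 and 18 respectively. Compute their means.

$CV_A = 75\%$, $CV_B = 90\%$
$\sigma_A = 15$, $\sigma_B = 18$

$CV = \frac{\sigma}{\bar{x}} \times 100$

$75 = \frac{15}{\bar{x}_A} \times 100$, so $\bar{x}_A = \frac{15 \times 100}{75} = 20$

$90 = \frac{18}{\bar{x}_B} \times 100$, so $\bar{x}_B = \frac{18 \times 100}{90} = 20$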
5. Goals scored by two teams A and B in a football season are as shown below. By calculating the CV of each, find which team may be considered more consistent.

No. of goals   No. of matches (f)    fx                  fx^2
x              Team A    Team B      Team A    Team B    Team A    Team B
0              27        17          0         0         0         0
1              9         9           9         9         9         9
2              8         6           16        12        32        24
3              5         5           15        15        45        45
4              4         3           16        12        64        48
Total          N = 53    N = 40      56        48        150       126

$\bar{x}_A = \frac{\sum fx}{N} = \frac{56}{53} = 1.056$

$\bar{x}_B = \frac{\sum fx}{N} = \frac{48}{40} = 1.2$

$\sigma_A^2 = \frac{\sum fx^2}{N} - (\bar{x}_A)^2 = \frac{150}{53} - (1.056)^2 = 1.715$, so $\sigma_A = 1.31$

$\sigma_B^2 = \frac{\sum fx^2}{N} - (\bar{x}_B)^2 = \frac{126}{40} - (1.2)^2 = 1.71$, so $\sigma_B = 1.31$

$CV_A = \frac{\sigma_A}{\bar{x}_A} \times 100 = \frac{1.309}{1.056} \times 100 = 123.9\%$

$CV_B = \frac{\sigma_B}{\bar{x}_B} \times 100 = \frac{1.308}{1.2} \times 100 = 109\%$

Since $CV_B < CV_A$, team B is the more consistent team.
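The CV of a discrete frequency distribution can be computed directly from the values and their frequencies. The sketch below uses the goals data from the problem above; `freq_stats` is an illustrative name.

```python
import math

def freq_stats(values, freqs):
    """Mean, SD and CV (%) of a discrete frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = sum(f * x * x for x, f in zip(values, freqs)) / n - mean ** 2
    sd = math.sqrt(var)
    return mean, sd, 100 * sd / mean

goals = [0, 1, 2, 3, 4]
mean_a, sd_a, cv_a = freq_stats(goals, [27, 9, 8, 5, 4])   # Team A
mean_b, sd_b, cv_b = freq_stats(goals, [17, 9, 6, 5, 3])   # Team B
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 1), round(cv_b, 1), more_consistent)
```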
6. The prices x and y of shares A and B respectively are given below. State which share is more stable in value.

Price A (x)   x - 53   (x - 53)^2   Price B (y)   y - 105   (y - 105)^2
55            2        4            108           3         9
54            1        1            107           2         4
52            -1       1            105           0         0
53            0        0            105           0         0
56            3        9            106           1         1
58            5        25           107           2         4
52            -1       1            104           -1        1
50            -3       9            103           -2        4
51            -2       4            104           -1        1
49            -4       16           101           -4        16
$\sum x = 530$, $\sum (x_i - \bar{x})^2 = 70$, $\sum y = 1050$, $\sum (y_i - \bar{y})^2 = 40$

$\bar{x}_A = \frac{\sum x}{N} = \frac{530}{10} = 53$

$\bar{x}_B = \frac{\sum y}{N} = \frac{1050}{10} = 105$

$\sigma_A = \sqrt{\frac{70}{10}} = 2.65$, $\sigma_B = \sqrt{\frac{40}{10}} = 2$

$CV_A = \frac{\sigma_A}{\bar{x}_A} \times 100 = \frac{2.65}{53} \times 100 = 4.99\%$

$CV_B = \frac{\sigma_B}{\bar{x}_B} \times 100 = \frac{2}{105} \times 100 = 1.90\%$

Since $CV_B$ is less than $CV_A$, share B is more stable.
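For raw (ungrouped) data such as these share prices, Python's standard `statistics` module gives the same figures. The sketch uses `pstdev`, the population SD that divides by N, which matches the formula used in these notes; the helper name `cv` is illustrative.

```python
from statistics import fmean, pstdev

price_a = [55, 54, 52, 53, 56, 58, 52, 50, 51, 49]
price_b = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

def cv(data):
    """Coefficient of variation (%) using the population SD."""
    return 100 * pstdev(data) / fmean(data)

cv_a, cv_b = cv(price_a), cv(price_b)
more_stable = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_stable)
```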
7. A student, while computing the coefficient of variation, obtained the mean and SD of 100 observations as 40 and 5.1 respectively. It was later discovered that he had wrongly copied an observation as 50 instead of 40. Calculate the correct coefficient of variation.

>> $\bar{x} = \frac{\sum x}{n}$, i.e. $40 = \frac{\sum x}{100}$
$\sum x$ (incorrect) = 4000
Now correct $\sum x = 4000 - 50 + 40 = 3990$
Correct $\bar{x} = \frac{3990}{100} = 39.9$

Let us consider $\sigma^2 = \frac{\sum x^2}{n} - (\bar{x})^2$

$(5.1)^2 = \frac{\sum x^2}{100} - (40)^2$

i.e. $\frac{\sum x^2}{100} = (5.1)^2 + (40)^2 = 26.01 + 1600 = 1626.01$

$\sum x^2$ (incorrect) $= 100 \times 1626.01 = 162601$
Now correct $\sum x^2 = 162601 - (50)^2 + (40)^2 = 161701$

Correct $\sigma^2 = \frac{\text{correct } \sum x^2}{n} - (\text{correct } \bar{x})^2$

i.e., correct $\sigma^2 = \frac{161701}{100} - (39.9)^2 = 1617.01 - 1592.01 = 25$, so correct $\sigma = 5$

Now the correct coefficient of variation $= \frac{\sigma}{\bar{x}} \times 100 = \frac{5}{39.9} \times 100 = 12.53\%$
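The correction for a miscopied observation follows a fixed pattern: rebuild the incorrect $\sum x$ and $\sum x^2$, swap the wrong value for the right one, and recompute. A minimal sketch, with the hypothetical helper name `replace_observation`:

```python
import math

def replace_observation(n, mean, sd, wrong, right):
    """Corrected mean, SD and CV (%) after replacing one miscopied value."""
    sum_x = n * mean - wrong + right
    # Incorrect sum of squares is n * (sd^2 + mean^2); swap the squared values
    sum_x2 = n * (sd ** 2 + mean ** 2) - wrong ** 2 + right ** 2
    new_mean = sum_x / n
    new_sd = math.sqrt(sum_x2 / n - new_mean ** 2)
    return new_mean, new_sd, 100 * new_sd / new_mean

corr_mean, corr_sd, corr_cv = replace_observation(100, 40, 5.1, wrong=50, right=40)
print(round(corr_mean, 2), round(corr_sd, 2), round(corr_cv, 2))
```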
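8. [The problem statement is missing here. The working below uses n = 21, mean = 30 and SD = 5, and omits an incorrect observation of value 10.]

>> Incorrect $\sum x = 21 \times 30 = 630$
Now omitting the incorrect value 10,
New $\sum x = 630 - 10 = 620$
n = 21 - 1 = 20
New $\bar{x} = \frac{620}{20} = 31$

Next consider $\sigma^2 = \frac{\sum x^2}{n} - (\bar{x})^2$

$(5)^2 = \frac{\sum x^2}{21} - (30)^2$

i.e. $\frac{\sum x^2}{21} = 25 + 900 = 925$

Incorrect $\sum x^2 = 925 \times 21 = 19425$

Again omitting the incorrect value 10,
New $\sum x^2 = 19425 - (10)^2 = 19325$, n = 20

Hence new $\sigma^2 = \frac{\text{new } \sum x^2}{20} - (\text{new } \bar{x})^2 = \frac{19325}{20} - (31)^2 = 966.25 - 961 = 5.25$

New $\sigma = \sqrt{5.25} = 2.29$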
9. The mean of 200 items was 50. Later on it was discovered that two items were misread as 92 and 8 instead of 192 and 88. Find the correct mean.

>> $\bar{x} = \frac{\sum x}{n}$, i.e. $50 = \frac{\sum x}{200}$ or $\sum x = 10000$
Incorrect $\sum x = 10000$
Correct $\sum x = 10000 - 92 - 8 + 192 + 88 = 10180$

Correct mean $= \frac{10180}{200} = 50.9$
10. Find the missing frequencies in the following data, given that the median is 137.2.

Class       100-110   110-120   120-130   130-140   140-150   150-160   160-170   170-180
Frequency   15        44        133       f_1       125       f_2       35        16          N = 600

>> We prepare the table with a column of cumulative frequencies and use the formula for the median.

Class     Frequency   cf
100-110   15          15
110-120   44          59
120-130   133         192
130-140   f_1         192 + f_1          (Median class)
140-150   125         317 + f_1
150-160   f_2         317 + f_1 + f_2
160-170   35          352 + f_1 + f_2
170-180   16          368 + f_1 + f_2
N = 600

Median $= l + \frac{h}{f} \left( \frac{N}{2} - c \right)$

We can take the median class as 130-140, since the median is given to be 137.2. Hence
$l = 130$, $h = 10$, $f = f_1$, $c = 192$

$137.2 = 130 + \frac{10}{f_1} (300 - 192)$

i.e., $137.2 - 130 = \frac{1080}{f_1}$

i.e., $7.2 f_1 = 1080$ or $f_1 = 150$

But the last cumulative frequency must be equal to N = 600,
i.e. $368 + f_1 + f_2 = 600$
$368 + 150 + f_2 = 600$, so $f_2 = 82$

Thus $f_1 = 150$, $f_2 = 82$
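The two steps of the missing-frequency problem (solve the median formula for f_1, then use the total N for f_2) can be sketched as below. The function and parameter names are illustrative, and the sketch assumes the median class is already known, as in the problem.

```python
def missing_frequencies(median, lower, width, cf_before, n_total, known_total):
    """Solve median = l + (h / f1) * (N/2 - c) for f1, then f2 from the total."""
    f1 = width * (n_total / 2 - cf_before) / (median - lower)
    f2 = n_total - known_total - f1
    return round(f1), round(f2)

# l = 130, h = 10, c = 192, N = 600; the known frequencies sum to 368
f1, f2 = missing_frequencies(137.2, 130, 10, 192, 600, 368)
print(f1, f2)
```

Rounding to whole numbers is reasonable here because frequencies are counts; with other data the unrounded value should be inspected first.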
Relationship between various measures of dispersion
The following relationships hold among the various measures of dispersion for a normal (or approximately normal) distribution:
1. Mean ± QD covers 50% of the observations of the distribution
2. Mean ± MD covers 57.5% of observations
3. Mean ± 1σ includes 68.27% of observations
4. Mean ± 2σ includes 95.45% of observations
5. Mean ± 3σ includes 99.73% of observations
6. QD = 0.6745σ ≈ (2/3)σ
7. MD = 0.7979σ ≈ (4/5)σ
8. QD = (5/6) MD
9. Combining the results, we get 3 QD = 2 SD and 5 MD = 4 SD, so 6 QD = 5 MD = 4 SD.
10. Range ≈ 6 times SD.
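These figures can be verified for the standard normal distribution with `statistics.NormalDist` from the Python standard library. This is a sketch under the assumption that the relationships refer to a normal distribution; for a normal curve QD equals the upper-quartile z-value and MD equals $\sigma \sqrt{2/\pi}$.

```python
import math
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def coverage(k):
    """Proportion of observations within mean +/- k standard deviations."""
    return z.cdf(k) - z.cdf(-k)

qd = z.inv_cdf(0.75)           # quartile deviation in SD units, about 0.6745
md = math.sqrt(2 / math.pi)    # mean deviation in SD units, about 0.7979

for label, k in [("QD", qd), ("MD", md), ("1 SD", 1), ("2 SD", 2), ("3 SD", 3)]:
    print(label, round(100 * coverage(k), 2))
print(round(qd, 4), round(md, 4), round(qd / md, 3))
```

The ratio QD/MD comes out near 0.845, so the fractions 2/3, 4/5 and 5/6 in the list are convenient approximations rather than exact values.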