Statistics For MGMT

Statistics can refer to either numerical data or statistical methods. As data, statistics are aggregates of facts that are numerically expressed and collected systematically. As methods, statistics involve collecting, organizing, presenting, analyzing, and interpreting numerical data. Statistics are important across many fields like business, economics, education, and more. They help simplify complex data, formulate policies, and allow comparisons and predictions. While useful, statistics also have limitations like not accounting for individual cases and qualitative factors.

Uploaded by

anandashankara
Copyright
© Attribution Non-Commercial (BY-NC)

LESSON 1

STATISTICS FOR MANAGEMENT


Session 1 Duration: 1 hr
Meaning of Statistics
The term statistics refers to numerical statements of fact as well as to statistical
methodology. When it is used in the sense of statistical data, it refers to the quantitative
aspects of things and is a numerical description.
Example: Income of a family, production of the automobile industry, sales of cars etc. These
quantities are numerical. But there are some quantities which are not in themselves
numerical but can be made so by counting. The sex of a baby is not a number, but by
counting the number of boys, we can associate a numerical description with the sex of all
newborn babies, for example, by saying that 60% of all live-born babies are boys. This
information then comes within the realm of statistics.
Definition
The word statistics can be used in two senses, viz. singular and plural. In the
narrow, plural sense, statistics denotes numerical data (statistical data).
In the wide, singular sense, statistics refers to statistical methods. The definitions
have therefore been grouped under two heads: Statistics as data and Statistics as
methods.
Statistics as Data
Some definitions of statistics as data are
a) Statistics are numerical statements of facts in any department of
enquiry placed in relation to each other.
- A.L. Bowley
b) By statistics we mean quantitative data affected to a marked
extent by multiplicity of causes.
- Yule and Kendall
c) By statistics we mean aggregates of facts affected to a marked
extent by multiplicity of causes, numerically expressed, enumerated or estimated
according to reasonable standards of accuracy, collected in a systematic manner for a
predetermined purpose and placed in relation to each other.
- H. Secrist
This last definition is more comprehensive and exhaustive. It throws more light on the
characteristics of statistics and covers different aspects.
The characteristics that statistics should possess according to H. Secrist can be listed as
follows.
Statistics are aggregates of facts.
Statistics are affected to a marked extent by multiplicity of causes.
Statistics are numerically expressed.
Statistics should be enumerated / estimated.
Statistics should be collected with a reasonable standard of accuracy.
Statistics should be placed in relation to each other.
Statistics as Methods
Definition
a) Statistics may be called the science of counting.
- A.L. Bowley
b) Statistics is the science of estimates and probabilities.
- Boddington
c) Croxton and Cowden have given a clear and concise definition:
Statistics may be defined as the collection, presentation, analysis and
interpretation of numerical data.
Croxton and Cowden's definition names four stages. With the organisation of data
added between collection and presentation, the stages are as follows.
a) Collection of Data
The structure of a statistical investigation is based on a systematic collection of
data. The data are classified into two groups
i) Internal data and
ii) External data
Internal data are obtained from internal records related to the operations of a business
organisation such as production, sources of income and expenditure, inventory,
purchases and accounts.
External data are collected by, or purchased from, external agencies. The
external data could be either primary data or secondary data. Primary data are
collected for the first time and are original, while secondary data have already been
collected and published by some agency.
b) Organisation of data
The collected data are a large mass of figures that needs to be organised. The
collected data must be edited to rectify any omissions, irrelevant answers and
wrong computations. The edited data must be classified and tabulated to suit further
analysis.
c) Presentation of data
Large volumes of collected data cannot be understood and analysed easily and
quickly. Therefore, collected data need to be presented in tabular or graphic form.
This systematic ordering and graphical presentation helps further analysis.
d) Analysis of data
Analysis requires establishing the relationship between one or more
variables. Analysis of data includes condensation, abstraction, summarization,
conclusion etc. Analysis can be done with the help of statistical tools and techniques
such as measures of central tendency, measures of dispersion, correlation, variance analysis etc.
e) Interpretation of data
Interpretation requires deep insight into the subject. Interpretation involves
drawing valid conclusions on the basis of the analysis of data. This work requires
good experience and skill. This process is very important as the conclusions of the study are
drawn on the basis of interpretation.
We can define statistics as per Seligman as follows.
Statistics is a science which deals with the methods of collecting,
classifying, presenting, comparing and interpreting the numerical data collected
to throw light on an enquiry.
Importance of statistics
In today's context statistics is indispensable. As the use of statistics has been extended
to various fields of experiment to draw valid conclusions, its importance and usage
have increased. Research investigations in the fields of economics
and commerce are largely statistical. Further, the importance of statistics in various
fields is listed below.
a) State Affairs: In state affairs, statistics is useful in the following ways
1. To collect information and study the economic condition of people in the
states.
2. To assess the resources available in states.
3. To help the state take decisions on accepting or rejecting a policy based on
statistics.
4. To provide information and analysis on various factors of the state like wealth,
crimes, agriculture, exports, education etc.
b) Economics: In economics, statistics is useful in the following ways
1. Helps in formulation of economic laws and policies
2. Helps in studying economic problems
3. Helps in compiling the national income accounts.
4. Helps in economic planning.
c) Business
1. Helps to take decisions on location and size
2. Helps to study demand and supply
3. Helps in forecasting and planning
4. Helps in controlling the quality of the product or process
5. Helps in making marketing decisions
6. Helps in production planning and inventory management.
7. Helps in business risk analysis
8. Helps in assessing long-term resource requirements, in estimating consumers' preferences
and in business research.
d) Education: Statistics is necessary to formulate policies regarding the start of new
courses and the consideration of facilities available for proposed courses.
e) Accounts and Audits:
1. Helps to study the correlation between profits and dividends, which enables one to know
the trend of future profits.
2. In auditing, sampling techniques are followed.
Functions of statistics
Some important functions of statistics are as follows
1. Helps to collect and present facts in a systematic manner.
2. Helps in formulation and testing of hypotheses.
3. Helps in facilitating the comparison of data.
4. Helps in predicting future trends.
5. Helps to find the relationship between variables.
6. Simplifies the mass of complex data.
7. Helps to formulate policies.
8. Helps Government to take decisions.
Limitations of statistics
1. Does not study qualitative phenomena.
2. Does not deal with individual items.
3. Statistical results are true only on an average.
4. Statistical data should be uniform and homogeneous.
5. Statistical results depend on the accuracy of data.
6. Statistical conclusions are not universally true.
7. Statistical results can be interpreted only if the person has sound knowledge of
statistics.
Distrust of Statistics
Distrust of statistics is due to lack of knowledge and the limitations of its uses,
not to statistical science itself.
Distrust of statistics is due to the following reasons.
a) Figures are manipulated or incomplete.
b) Quoting figures without their context.
c) Inconsistent definitions.
d) Selection of non-representative statistical units.
e) Inappropriate comparison.
f) Wrong inference drawn.
g) Errors in data collection.
Statistical Data
Statistical investigation is a long and comprehensive process and requires
systematic collection of a large amount of data. The validity and accuracy of the conclusions
or results of the study depend upon how well the data were gathered. The quality of the
data will greatly influence the conclusions of the study and hence importance is to be
given to the data collection process.
Statistical data may be classified as Primary Data and Secondary Data based on
the sources of data collection.
Primary data
Primary data are those which are collected for the first time by the investigator /
researcher and are thus original in character. Data collected by the investigator are
for the specific purpose / study at hand. Primary data are usually in the shape of raw
materials to which statistical methods are applied for the purpose of analysis and
interpretation.
Secondary data
Secondary data have already been collected for a purpose other than the problem at
hand. These data are those which have already been collected by some other persons
and which have passed through statistical analysis at least once. Secondary data are
usually in the shape of finished products since they have already been treated
statistically in one form or another. After statistical treatment the primary data lose
their original shape and become secondary data. Data that are primary for the organisation
which first collects and publishes them are secondary data for any other organisation that uses them.
Primary Vs Secondary Data
Primary data are originated by the researcher for the specific purpose / study at hand
while secondary data have already been collected for a purpose other than the
research work at hand.
Primary data collection requires considerably more time and is relatively expensive,
while secondary data are easily accessible, inexpensive and quickly
obtained.
Table: A comparison of Primary and Secondary Data
                     Primary data                 Secondary data
Collection purpose   For the problem at hand      For other problems
Collection process   Very involved                Rapid and easy
Collection cost      High                         Relatively low
Collection time      Long                         Short
Suitability          Its suitability is positive  It may or may not suit the object of survey
Originality          It is original               It is not original
Precautions          No extra precautions         It should be used with extra care
                     required to use the data
Limitations of secondary data
a) Since secondary data are collected for some other purpose, their usefulness to the
current problem may be limited in several important ways, including relevance
and accuracy.
b) The objectives, nature and methods used to collect secondary data may not be
appropriate to the present situation.
c) The secondary data may not be accurate, or they may not be completely current
or dependable.
Criteria for evaluating secondary data
Before using secondary data it is important to evaluate them on the following
factors
a) Specification and methodology used to collect the data
b) Error and accuracy of the data
c) The currency
d) The objective: the purpose for which the data were collected
e) The nature: the content of the data
f) The dependability
Sources of data
Primary source: The methods of collecting primary data.
When data are neither internally available nor exist as a secondary source,
the primary sources of data would be appropriate.
The various methods of collecting primary data are as follows
a) Direct personal investigation
- Interview
- Observation
b) Indirect or oral investigation
c) Information from local agents and correspondents
d) Mailed questionnaires and schedules
e) Through enumerations

Secondary source: The methods of collecting secondary data
i) Published Statistics: Official publications of Central Government
Ex:
- Central Statistical Organisation (CSO), Ministry of Planning
- National Sample Survey Organisation (NSSO)
- Office of the Registrar General and Census Commissioner, GOI
- Directorate of Economics and Statistics, Ministry of Agriculture
- Labour Bureau, Ministry of Labour etc.
ii) Publications of Semi-government organisations
Ex:
- The institute of foreign trade, New Delhi
- The institute of economic growth, New Delhi.
iii) Publication of research institutes
Ex:
- Indian Statistical Institute
- Indian Agriculture Statistical Institute
- NCERT Publications
- Indian Standards Institute etc.
iv) Publication of Business and Financial Institutions
Ex:
- Trade Association Publications like Sugar factory, Textile mill, Indian
chamber of Industry and Commerce.
- Stock exchange reports, Co-operative society reports etc.
v) Newspapers and periodicals
Ex:
- The Financial Express, Eastern Economist, Economic Times, Indian
Finance, etc.
vi) Reports of various committees and commissions
Ex:
- Kothari commission report on education
- Pay commission reports
- Land reform committee reports etc.
vii) Unpublished statistics
- Internal and administrative data like Periodical Loss, Profit, Sales,
Production Rate, Balance Sheet, Labour Turnover, Budgets, etc.
Classification and Tabulation
The data collected for the purpose of a statistical inquiry sometimes consist of
a few fairly simple figures which can be easily understood without any special
treatment. But more often there is an overwhelming mass of raw data without any
structure. Such an unwieldy, unorganised and shapeless mass of collected data is not capable
of being rapidly or easily assimilated or interpreted. Unorganised data are not fit for
further analysis and interpretation. In order to make the data simple and easily
understandable, the first task is to condense and simplify them in such a way that
irrelevant data are removed and their significant features stand out prominently.
The procedure adopted for this purpose is known as the method of classification and
tabulation. Classification helps proper tabulation.
Classified and arranged facts speak for themselves; unarranged, unorganised they
are dead as mutton.
- Prof. J.R. Hicks
Meaning of Classification
Classification is a process of arranging things or data in groups or classes
according to their resemblances and affinities, and it gives expression to the unity of
attributes that may subsist among a diversity of individuals.
Definition of Classification
Classification is the process of arranging data into sequences and groups
according to their common characteristics or separating them into different but related
parts.
- Secrist
The process of grouping large number of individual facts and observations on
the basis of similarity among the items, is called classification.
- Stockton & Clark
Characteristics of classification
a) Classification performs homogeneous grouping of data
b) It brings out points of similarity and dissimilarity
c) The classification may be either real or imaginary
d) Classification is flexible to accommodate adjustments
Objectives / purposes of classifications
i) To simplify and condense the large data
ii) To present the facts in an easily understandable form
iii) To allow comparisons
iv) To help to draw valid inferences
v) To relate the variables among the data
vi) To help further analysis
vii) To eliminate unwanted data
viii) To prepare tabulation
Guiding principles (rules) of classifications
Following are the general guiding principles for good classifications
a) Exhaustive: Classification should be exhaustive. Each and every item
in the data must belong to one of the classes. Introduction of a residual class (i.e.
others, miscellaneous etc.) should be avoided.
b) Mutually exclusive: Each item should be placed in only one class.
c) Suitability: The classification should conform to the object of inquiry.
d) Stability: Only one principle must be maintained throughout the
classification and analysis.
e) Homogeneity: The items included in each class must be homogeneous.
f) Flexibility: A good classification should be flexible enough to
accommodate new situation or changed situations.
Modes / Types of Classification
Modes / Types of classification refer to the class categories into which the data
could be sorted out and tabulated. These categories depend on the nature of the data and
the purpose for which the data are being sought.
Important types of classification
a) Geographical (i.e. on the basis of area or region wise)
b) Chronological (On the basis of Temporal / Historical, i.e. with respect to time)
c) Qualitative (on the basis of character / attributes)
d) Numerical, quantitative (on the basis of magnitude)
a) Geographical Classification
In geographical classification, the classification is based on the geographical
regions.
Ex: Sales of the company (In Million Rupees) (region wise)
Region Sales
North 285
South 300
East 185
West 235
b) Chronological Classification
If the statistical data are classified according to the time of their occurrence, the
type of classification is called chronological classification.
Sales reported by a departmental store
Month      Sales (Rs. in lakhs)
January 22
February 26
March 32
April 25
May 27
June 29
July 30
August 30
c) Qualitative Classification
In qualitative classifications, the data are classified according to the presence or
absence of attributes in given units. Thus, the classification is based on some quality
characteristics / attributes.
Ex: Sex, Literacy, Education, Class grade etc.
Further, it may be classified as
a) Simple classification b) Manifold classification
i) Simple classification: If the classification is done into only two classes then
classification is known as simple classification.
Ex: a) Population into Male / Female
b) Population into Educated / Uneducated
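ii) Manifold classification: In this classification, the classification is based on more
than one attribute at a time.
Ex: A population may be classified first into smokers and non-smokers, each of these
groups into literate and illiterate, and each of those in turn into male and female.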
d) Quantitative Classification: In quantitative classification, the classification is
based on quantitative measurements of some characteristic, such as age, marks,
income, production, sales etc. The quantitative phenomenon under study is known
as a variable and hence this classification is also called classification by variable.
Ex:
For a 50 marks test, the marks obtained by students are classified as follows
Marks      No. of students
0 - 10      5
10 - 20     7
20 - 30    10
30 - 40    25
40 - 50     3
Total Students = 50
In this classification the marks obtained by students are the variable and the number of
students in each class represents the frequency.
Meaning and Definition of Tabulation
Tabulation may be defined as the systematic arrangement of data in columns and
rows. It is designed to simplify the presentation of data for the purpose of analysis and
statistical inference.
[Figure: manifold classification. Population divided into Smokers and Non-smokers, each divided into Literate and Illiterate, and each of those into Male and Female.]

Major Objectives of Tabulation
1. To simplify the complex data
2. To facilitate comparison
3. To economise the space
4. To draw valid inference / conclusions
5. To help for further analysis
Differences between Classification and Tabulation
1. Data are first classified and then presented in tables; classification is the basis for
tabulation.
2. Tabulation is a mechanical function of classification because in tabulation
classified data are placed in rows and columns.
3. Classification is a process of statistical analysis while tabulation is a process of
presenting data in a suitable structure.
Classification of tables
Classification is done based on
1. Coverage (Simple and complex table)
2. Objective / purpose (General purpose / Reference table / Special table or
summary table)
3. Nature of inquiry (primary and derived table).
Ex:
a) Simple table: Data are classified based on only one characteristic
Distribution of marks
Class Marks   No. of students
30 - 40       20
40 - 50       20
50 - 60       10
Total         50
b) Two-way table: Classification is based on two characteristics
Class Marks   No. of students
              Boys   Girls   Total
30 - 40       10     10      20
40 - 50       15      5      20
50 - 60        3      7      10
Total         28     22      50
Frequency Distribution
Frequency distribution is a table used to organize the data. The left column
(called classes or groups) includes numerical intervals on a variable under study. The
right column contains the list of frequencies, or number of occurrences of each
class/group. Intervals are normally of equal size covering the sample observations
range.
It is simply a table in which the gathered data are grouped into classes and the
number of occurrences which fall in each class is recorded.
Definition
A frequency distribution is a statistical table which shows the set of all distinct
values of the variable arranged in order of magnitude, either individually or in groups
with their corresponding frequencies.
- Croxton and Cowden
A frequency distribution can be classified as
a) Series of individual observation
b) Discrete frequency distribution
c) Continuous frequency distribution
a) Series of individual observation
A series of individual observations is a series where the items are listed singly, one after
the other. For statistical calculations, these observations could be arranged
in either ascending or descending order. This is called an array.
Ex:
Roll No.   Marks obtained in statistics paper
1          83
2          80
3          75
4          92
5          65
The above data list is raw data. The presentation of data in the above form
does not reveal much information. If the data are arranged in ascending or descending
order of magnitude, which gives a better presentation, it is called arraying of
data.
Discrete (ungrouped) Frequency Distribution
If the data series is presented in such a way that it indicates the exact
measurement of units, then it is called a discrete frequency distribution. A discrete
variable is one whose values differ from each other by definite amounts.
Ex:
Assume that a survey has been made to find the number of post-graduates in 10
families chosen at random. The resulting raw data could be as follows.
0, 1, 3, 1, 0, 2, 2, 2, 2, 4
These data can be classified into an ungrouped frequency distribution. The
number of post-graduates becomes the variable (x), for which we can list the frequency of
occurrence (f) in tabular form as follows:
Number of post-graduates (x)   Frequency (f)
0                              2
1                              2
2                              4
3                              1
4                              1
The above example shows a discrete frequency distribution, where the variable
has discrete numerical values.
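The tabulation above can be reproduced with a few lines of code. The following is a minimal sketch in Python (a language chosen here for illustration, not one used in the lesson); the variable names `observations` and `frequency` are illustrative.

```python
from collections import Counter

# Raw data from the example: number of post-graduates in 10 families.
observations = [0, 1, 3, 1, 0, 2, 2, 2, 2, 4]

# Count how often each distinct value of x occurs; this count is its frequency f.
frequency = Counter(observations)

# Print the ungrouped frequency distribution in ascending order of x.
print("x  f")
for x in sorted(frequency):
    print(x, "", frequency[x])
```

The frequencies always add up to the number of observations, which is a quick check on any hand-built table.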
Continuous frequency distribution (grouped frequency distribution)
A continuous data series is one where the measurements are only approximations
and are expressed in class intervals within certain limits. In a continuous frequency
distribution the class intervals theoretically continue from the start of the frequency
distribution till the end without a break. According to Boddington, a variable which
can take every intermediate value between the smallest and largest value in the
distribution is a continuous variable.
Ex:
Marks obtained by 20 students in an exam for 50 marks are as given
below. Convert the data into continuous frequency distribution form.
18 23 28 29 44 28 48 33 32 43
24 29 32 39 49 42 27 33 28 29
By grouping the marks into class intervals of 5, the following frequency distribution
table can be formed.
Marks      No. of students
0 - 5      0
5 - 10     0
10 - 15    0
15 - 20    1
20 - 25    2
25 - 30    7
30 - 35    4
35 - 40    1
40 - 45    3
45 - 50    2
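The grouping in this example can be checked mechanically. Below is a minimal Python sketch, assuming classes of width 5 from 0 to 50 in which the lower limit belongs to the class and the upper limit does not; the function name `grouped_frequency` is illustrative.

```python
# Marks of the 20 students from the example.
marks = [18, 23, 28, 29, 44, 28, 48, 33, 32, 43,
         24, 29, 32, 39, 49, 42, 27, 33, 28, 29]


def grouped_frequency(values, lower, upper, width):
    """Count values per class [start, start + width) between lower and upper."""
    table = {(start, start + width): 0 for start in range(lower, upper, width)}
    for v in values:
        if not lower <= v < upper:
            raise ValueError(f"{v} falls outside the classes {lower} to {upper}")
        # Lower limit of the class that contains v.
        start = lower + ((v - lower) // width) * width
        table[(start, start + width)] += 1
    return table


table = grouped_frequency(marks, lower=0, upper=50, width=5)
for (lo, hi), f in table.items():
    print(f"{lo:>2} - {hi:<2}  {f}")
```

A mark equal to the top limit (50 here) would need one more class under this convention, which is why the sketch raises an error rather than silently dropping it.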
Technical terms used in formulation frequency distribution
a) Class limits:
The class limits are the smallest and largest values in the class.
Ex:
0 - 10: in this class, the lowest value is zero and the highest value is 10. The two
boundaries of the class are called the lower and upper limits of the class. Class limits are also
called class boundaries.
b) Class intervals
The difference between the upper and lower limits of a class is known as the class
interval.
Ex:
In the class 0 - 10, the class interval is (10 - 0) = 10.
The formula to find the class interval is given below
\[ i = \frac{L - S}{R} \]
L = Largest value
S = Smallest value
R = the no. of classes
Ex:
If the marks of 60 students in a class vary between 40 and 100 and if we want
to form 6 classes, the class interval would be
\[ i = \frac{L - S}{R} = \frac{100 - 40}{6} = \frac{60}{6} = 10 \]
where L = 100, S = 40 and R = 6.
Therefore, the class intervals would be 40 - 50, 50 - 60, 60 - 70, 70 - 80, 80 - 90
and 90 - 100.
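The same arithmetic can be sketched in Python. The function name `class_intervals` and its parameter names are illustrative; the calculation is simply the largest value minus the smallest value, divided by the number of classes.

```python
def class_intervals(smallest, largest, num_classes):
    """Return the class width and the list of (lower, upper) class limits."""
    width = (largest - smallest) / num_classes
    limits = [(smallest + k * width, smallest + (k + 1) * width)
              for k in range(num_classes)]
    return width, limits


# Marks vary between 40 and 100 and 6 classes are wanted.
width, limits = class_intervals(smallest=40, largest=100, num_classes=6)
print("class interval:", width)
for lower, upper in limits:
    print(f"{lower:g} - {upper:g}")
```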
Methods of forming class-interval
a) Exclusive method (overlapping)
In this method, the upper limit of one class interval is the lower limit of the next
class. This method maintains continuity of the data.
Ex:
Marks      No. of students
20 - 30     5
30 - 40    15
40 - 50    25
A student whose mark is between 20 and 29.9 will be included in the 20 - 30
class.
A better way of expressing this is
Marks                             No. of students
20 to less than 30                 5
(20 or more but less than 30)
30 to less than 40                15
40 to less than 50                25
Total Students                    45
b) Inclusive method (non-overlapping)
In this method, both the lower and the upper limit of a class are included in that class.
Ex:
Marks      No. of students
20 - 29     5
30 - 39    15
40 - 49    25
A student whose mark is 29 is included in the 20 - 29 class interval and a student
whose mark is 39 is included in the 30 - 39 class interval.
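The difference between the two methods shows up in which class a boundary value falls into. The following Python sketch illustrates this under stated assumptions: classes start at 20 and have a width of 10, as in the tables above, and the inclusive version assumes whole-number marks. The function names are illustrative.

```python
def exclusive_class(mark, lower=20, width=10):
    """Exclusive method: the class runs from start up to, but not including, start + width."""
    start = lower + ((mark - lower) // width) * width
    return (start, start + width)


def inclusive_class(mark, lower=20, width=10):
    """Inclusive method (whole-number marks): start and start + width - 1 both belong to the class."""
    start = lower + ((mark - lower) // width) * width
    return (start, start + width - 1)


print(exclusive_class(29.9))  # falls in the 20 - 30 class
print(exclusive_class(30))    # the upper limit 30 moves to the 30 - 40 class
print(inclusive_class(29))    # falls in the 20 - 29 class
print(inclusive_class(39))    # falls in the 30 - 39 class
```

Under the exclusive method a mark of exactly 30 is counted in 30 - 40, not in 20 - 30, which is what keeps the classes from double counting despite the shared limit.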
Class Frequency
The number of observations falling within a class interval is called its class
frequency.
Ex: A class frequency of 5 for the class 90 - 100 means that 5 students scored
between 90 and 100. If we add the frequencies of all the individual classes, the total
frequency represents the total number of items studied.
Magnitude of class interval
The magnitude of the class interval depends on the range and the number of classes. The
range is the difference between the highest and smallest values in the data series. A
class interval is generally taken in multiples of 5, 10, 15 or 20.
Sturges' formula to find the number of classes is given below
K = 1 + 3.322 log N
K = No. of classes
log N = Logarithm (to base 10) of the total no. of observations
Ex: If the total number of observations is 100, then the number of classes could be
K = 1 + 3.322 log 100
K = 1 + 3.322 x 2
K = 1 + 6.644
K = 7.644, i.e. 8 (rounded off)
NOTE: Under this formula the number of classes cannot be less than 4 or greater than 20.
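Sturges' formula is easy to evaluate in code. This is a minimal Python sketch; the function name `sturges_classes` is illustrative, and rounding up to the next whole number is an assumption made here (it agrees with the rounding in the example above).

```python
import math


def sturges_classes(n):
    """Return (K, rounded K) from Sturges' formula K = 1 + 3.322 log10(N)."""
    k = 1 + 3.322 * math.log10(n)
    # Assumption: round up so that the classes cover the whole range.
    return k, math.ceil(k)


k, classes = sturges_classes(100)
print(f"K = {k:.3f}, number of classes = {classes}")
```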
Class mid point or class marks
The mid value or central value of the class interval is called the mid point.
\[ \text{Mid point of a class} = \frac{\text{lower limit of class} + \text{upper limit of class}}{2} \]
Sturges' formula to find the size of the class interval
\[ \text{Size of class interval } (h) = \frac{\text{Range}}{1 + 3.322 \log N} \]
Ex: In a group of 50 workers, the highest wage is Rs. 250 and the lowest wage is Rs. 100 per day.
Find the size of the interval.
\[ h = \frac{\text{Range}}{1 + 3.322 \log N} = \frac{250 - 100}{1 + 3.322 \log 50} = \frac{150}{6.644} \approx 22.58 \approx 23 \]
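The calculation can be verified with a short Python sketch. The function name `sturges_interval_size` is illustrative; it evaluates Range / (1 + 3.322 log N) with a base-10 logarithm for a highest wage of Rs. 250, a lowest wage of Rs. 100 and 50 observations.

```python
import math


def sturges_interval_size(highest, lowest, n):
    """Size of class interval h = Range / (1 + 3.322 log10(N))."""
    value_range = highest - lowest
    return value_range / (1 + 3.322 * math.log10(n))


h = sturges_interval_size(highest=250, lowest=100, n=50)
print(f"h = {h:.2f}, rounded to {round(h)}")
```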
Constructing a frequency distribution
The following guidelines may be considered for the construction of frequency
distribution.
a) The classes should be clearly defined and each observation must belong to one
and only one class interval. The classes together must include every observation and
must not overlap.
b) The number of classes should be neither too large nor too small.
Too few classes result in a greater interval width with loss of accuracy. Too many
class intervals result in complexity.
c) All intervals should be of the same width. This is preferred for easy
computations.
\[ \text{Width of interval} = \frac{\text{Range}}{\text{Number of classes}} \]
d) Open-end classes should be avoided since they create difficulty in analysis and
interpretation.
e) Intervals should be continuous throughout the distribution. This is important for
a continuous distribution.
f) The lower limits of the class intervals should be simple multiples of the interval.
Ex: A sample of the weights of 30 students of a particular class is as follows.
Construct a frequency distribution for the given data.
62 58 58 52 48 53 54 63 69 63
57 56 46 48 53 56 57 59 58 53
52 56 57 52 52 53 54 58 61 63
Steps of construction
Step 1
Find the range of data (H) Highest value = 69
(L) Lowest value = 46
Range = H - L = 69 - 46 = 23
Step 2
Find the number of class intervals.
Sturges' formula
K = 1 + 3.322 log N
K = 1 + 3.322 log 30
K = 5.91, say K = 6
No. of classes = 6
Step 3
Width of class interval
\[ \text{Width of class interval} = \frac{\text{Range}}{\text{Number of classes}} = \frac{23}{6} = 3.833 \approx 4 \]
Step 4
Count the observations belonging to each class interval and assign this total
frequency to the corresponding class interval as follows.
Class interval   Tally bars    Frequency
46 - 50          |||           3
50 - 54          ||||| |||     8
54 - 58          ||||| |||     8
58 - 62          ||||| |       6
62 - 66          ||||          4
66 - 70          |             1
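The four steps above can be sketched end to end in Python. This is an illustration under stated assumptions, not a prescribed procedure: the number of classes and the width are both rounded up, the first class starts at the lowest value, and each class includes its lower limit but not its upper limit.

```python
import math

# Weights of the 30 students from the example.
weights = [62, 58, 58, 52, 48, 53, 54, 63, 69, 63,
           57, 56, 46, 48, 53, 56, 57, 59, 58, 53,
           52, 56, 57, 52, 52, 53, 54, 58, 61, 63]

# Step 1: range of the data.
highest, lowest = max(weights), min(weights)
data_range = highest - lowest

# Step 2: number of classes by Sturges' formula, rounded up.
num_classes = math.ceil(1 + 3.322 * math.log10(len(weights)))

# Step 3: width of each class interval, rounded up.
width = math.ceil(data_range / num_classes)

# Step 4: count the observations in each class [lower, upper).
frequency = {}
for k in range(num_classes):
    lower = lowest + k * width
    upper = lower + width
    frequency[(lower, upper)] = sum(1 for w in weights if lower <= w < upper)

for (lower, upper), f in frequency.items():
    print(f"{lower} - {upper}  {'|' * f}  {f}")
```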
Cumulative frequency distribution
A cumulative frequency distribution indicates directly the number of units that
lie above or below the specified values of the class intervals. When the interest of the
investigator is in the number of cases below a specified value, the specified value
represents the upper limit of the class interval. It is known as a less than cumulative
frequency distribution. When the interest lies in finding the number of cases above a
specified value, this value is taken as the lower limit of the specified class interval.
It is then known as a more than cumulative frequency distribution.
The cumulative frequency simply means summing up the consecutive
frequencies.
Ex:
Marks      No. of students   Less than cumulative frequency
0 - 10      5                 5
10 - 20     3                 8
20 - 30    10                18
30 - 40    20                38
40 - 50    12                50
In the above less than cumulative frequency distribution, there are 5 students
who scored less than 10, 8 less than 20, 18 less than 30 and so on.
Similarly, the following table shows the greater than cumulative frequency
distribution.
Ex:
Marks      No. of students   Greater than cumulative frequency
0 - 10      5                50
10 - 20     3                45
20 - 30    10                42
30 - 40    20                32
40 - 50    12                12
In the above greater than cumulative frequency distribution, 50 students
scored 0 or more, 45 scored 10 or more, 42 scored 20 or more and so on.
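Both cumulative columns are running totals of the class frequencies, taken from the top for the less than type and from the bottom for the greater than type. A minimal Python sketch, using the class frequencies from the tables above (variable names are illustrative):

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [5, 3, 10, 20, 12]

# Less than type: running total from the first class downwards.
less_than = list(accumulate(frequencies))

# Greater than type: running total from the last class upwards.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), f, lt, gt in zip(classes, frequencies, less_than, greater_than):
    print(f"{lower:>2} - {upper:<2}  f={f:<3} less than {upper}: {lt:<3} {lower} or more: {gt}")
```

The last less than value and the first greater than value both equal the total number of students, which is a useful check.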
Diagrammatic and Graphic Representation
The data collected can be presented graphically or pictorially for easy
understanding and quick interpretation. Diagrams and graphs give visual
indications of magnitudes, groupings, trends and patterns in the data. These features
can be presented more simply in a graphical manner. Diagrams and graphs also help
in comparing variables.
Diagrammatic presentation
A diagram is a visual form for the presentation of statistical data. The term diagram
refers to various types of devices such as bars, circles, maps, pictorials and cartograms etc.
Importance of Diagrams
1. They are simple, attractive and easily understandable
2. They give quick information
3. They help to compare the variables
4. Diagrams are more suitable to illustrate discrete data
5. They have a more lasting effect on the reader's mind.
Limitations of diagrams
1. Diagrams show approximate values
2. Diagrams are not suitable for further analysis
3. Some diagrams are limited to experts (multidimensional)
4. Details cannot be provided fully
5. They are useful only for comparison
General Rules for drawing the diagrams
i) Each diagram should have a suitable title, at the top or bottom, indicating the
theme the diagram is intended to show.
ii) The size of the diagram should emphasize the important characteristics of the data.
iii) An appropriate proportion should be maintained between the length and breadth of
the diagram.
iv) A proper / suitable scale should be adopted for the diagram.
v) Selection of the appropriate diagram is important; a wrong selection may
mislead the reader.
vi) The source of data should be mentioned at the bottom.
vii) The diagram should be simple and attractive.
viii) The diagram should be effective rather than complex.
Some important types of diagrams
a) One dimensional diagrams (line and bar)
b) Two-dimensional diagram (rectangle, square, circle)
c) Three dimensional diagram (cube, sphere, cylinder etc.)
d) Pictogram
e) Cartogram
a) One dimensional diagrams (line and bar)
In one-dimensional diagrams, only the length of the bars or lines is taken into
account. The width of the bars is not considered. These diagrams are classified mainly as
follows.
i) Line diagram
ii) Bar diagram
- Vertical bar diagram
- Horizontal bar diagram
- Multiple (compound) bar diagram
- Sub-divided (component) bar diagram
- Percentage subdivided bar diagram
i) Line diagram
This is the simplest type of one-dimensional diagram. The heights of the lines are
drawn in proportion to the size of the figures. The distance between the lines is kept
uniform. The limitations of this diagram are that it is not attractive and cannot present
more than one item of information.
Ex: Draw the line diagram for the following data
Year                                                            2001  2002  2003  2004  2005  2006
No. of students passed in first class with distinction (FCD)    5     7     12    5     13    15
[Figure: Line diagram; Year on the x-axis, No. of students passed in FCD on the y-axis]
Indication of diagram: The highest FCD count is in 2006 and the lowest is in 2001 and 2004.
ii) Simple bar diagram
A simple bar diagram can be drawn using horizontal or vertical bars. It is a very
common diagram in business and economics.
Vertical bar diagram
The annual expenses of maintaining cars of various types are given below.
Draw the vertical bar diagram. The annual expenses of maintaining include fuel,
maintenance, repair, assistance and insurance.
Type of the car    Expense in Rs. / Year
Maruthi Udyog      47533
Hyundai            59230
Tata Motors        63270
Source: 2005 TNS TCS Study
Published at: Vijaya Karnataka, dated: 03.08.2006
[Figure: Vertical bar diagram; Type of the car on the x-axis, Expense in Rs. / Year on the y-axis]
Indication of diagram
a) The annual expenses of a Maruthi Udyog car are lower than those of the
other brands depicted.
b) The annual expenses of a Tata Motors car are the highest among the brands depicted.
Horizontal bar diagram
Data on the world's top 10 steel makers are given below. Draw the horizontal
bar diagram.
Steel maker      Prodn. in million tonnes
Arcelor Mittal   110
Nippon           32
POSCO            31
JFE              30
BAO Steel        24
US Steel         20
NUCOR            18
RIVA             18
Thyssen-krupp    17
Tangshan         16
[Figure: Horizontal bar diagram; Production of Steel (Million Tonnes) on the x-axis, Top-10 Steel Makers on the y-axis]
Source: ISSB Published by India Today
Compound bar diagram (Multiple bar diagram)
Multiple bar diagrams are used to provide more information than a simple bar
diagram. A multiple bar diagram presents more than one phenomenon and is highly useful
for direct comparison. The bars are drawn side by side, and different colours, shades or
hatches can be used to indicate each variable.
Ex: Draw the bar diagram for the following data. Resale values of the cars (Rs. 000) are
as follows.
Year (Model)   Santro   Zen   Wagonr
2003           208      252   248
2004           240      278   274
2005           261      296   302
[Figure: Multiple bar diagram; Model of Car on the x-axis, Value in Rs. on the y-axis; legend: Santro, Zen, Wagonr]
Source: True value used car purchase data
Published by: Vijaya Karnataka, dated: 03.08.2006
Ex: Represent the following in a suitable diagram
Class    A      B      C
Male     1000   1500   1500
Female   500    800    1000
Total    1500   2300   2500
[Figure: Bar diagram; Class on the x-axis, Population (in Nos.) on the y-axis; legend: Male, Female; totals 1500, 2300 and 2500]
Ex: Draw a suitable diagram for the following data
Mode of          Investment in 2004 in Rs.    Investment in 2005 in Rs.
investment       Investment    %age           Investment    %age
NSC              25000         43.10          30000         45.45
MIS              15000         25.86          10000         15.15
Mutual Fund      15000         25.86          25000         37.87
LIC              3000          5.17           1000          1.52
Total            58000         100            66000         100
[Figure: Percentage sub-divided bar diagram; Year (2004, 2005) on the x-axis, % of Investment on the y-axis]
Two-dimensional diagram
In two-dimensional diagrams both the length and the breadth are considered, so the
area of the diagram represents the data. The important two-dimensional
diagrams are
a) Rectangular diagram
b) Square diagram
a) Rectangular diagram
Rectangular diagrams are used to depict two or more variables. This diagram
helps for direct comparison. The areas of the rectangles are kept in proportion to the
values. It may be of two types.
i) Percentage sub-divided rectangular diagram
ii) Sub-divided rectangular diagram
In the former case the widths of the rectangles are proportional to the values, the
various components of the values are converted into percentages and the rectangles are
divided according to them. The latter case is used to show related phenomena
such as cost per unit, quality of production etc.
Ex: Draw the rectangle diagram for the following data
Item of expenditure     Expenditure in Rs.
                        Family A    Family B
Provisional stores      1000        2000
Education               250         500
Electricity             300         700
House Rent              1500        2800
Vehicle Fuel            450         1000
Total                   3500        7000
The total expenditure is taken as 100 and the expenditure on each individual item
is expressed as a percentage of it. The widths of the two rectangles are in proportion to
the total expenses of the two families, i.e. 3500 : 7000 or 1 : 2. The heights of the
divisions within each rectangle follow the percentages of expenses.
Item of expenditure     Monthly expenditure
                        Family A (Rs. 3500)    Family B (Rs. 7000)
                        Rs.     %age           Rs.     %age
Provisional stores      1000    28.57          2000    28.57
Education               250     7.14           500     7.14
Electricity             300     8.57           700     10
House Rent              1500    42.85          2800    40
Vehicle Fuel            450     12.85          1000    14.28
Total                   3500    100            7000    100
[Figure: Percentage sub-divided rectangular diagram; Family (A, B) on the x-axis, % of Expenditure on the y-axis; legend: Provisional Stores, Education, Electricity, House Rent, Vehicle Fuel]
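As a rough check on the percentage sub-divided rectangle, the sketch below recomputes the Family B percentages and the width ratio of the two rectangles. The variable names are illustrative assumptions, and percentages are rounded to two decimals, so the last digit can differ from a truncated figure.

```python
from fractions import Fraction

# Monthly expenditure of Family B in Rs. (from the table above)
family_b = {
    "Provisional stores": 2000,
    "Education": 500,
    "Electricity": 700,
    "House Rent": 2800,
    "Vehicle Fuel": 1000,
}
total_a = 3500  # total expenditure of Family A
total_b = sum(family_b.values())

# Height of each division: item as a percentage of the family total
percentages = {item: round(100 * rs / total_b, 2) for item, rs in family_b.items()}

# Widths of the rectangles are in proportion to the total expenses
width_ratio = Fraction(total_a, total_b)

for item, pct in percentages.items():
    print(f"{item}: {pct}%")
print("Width ratio A : B =", width_ratio.numerator, ":", width_ratio.denominator)
```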
b) Square diagram
To draw square diagrams, the square root of the value of each item to be shown is
taken, and the side of each square is drawn in proportion to that square root, so that
the areas of the squares are in the ratio of the values. A suitable scale may be used.
Ex: Draw the square diagram for the following data
4900 2500 1600
Solution: The square roots of the items are 70, 50 and 40; dividing each by 10
gives sides of 7, 5 and 4 units.
[Figure: Square diagram; three squares with sides of 7, 5 and 4 units representing 4900, 2500 and 1600]
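The side lengths in the square diagram example can be computed as follows. This is a minimal sketch; the scale divisor of 10 is the one chosen in the solution, and the names are illustrative.

```python
import math

values = [4900, 2500, 1600]
scale = 10  # scale divisor used in the worked solution

# Side of each square = square root of the value, reduced by the scale
sides = [math.sqrt(v) / scale for v in values]
print(sides)

# The areas of the drawn squares keep the same ratio as the original values
areas = [s * s for s in sides]
print(areas)
```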
Pie diagram
A pie diagram shows the partitioning of a total into its component parts.
It is used to show classes or groups of data in proportion to the whole data set. The entire
pie represents all the data, while each slice represents a different class or group within
the whole. The following illustration shows the construction of a pie diagram.
Ex: Revenue collections for the year 2005-2006 by the government, in Rs. crores, for
petroleum products are as follows. Draw the pie diagram.
Customs                       9600
Excise                        49300
Corporate Tax and Dividend    18900
States taking                 48800
Total                         126600
Solution:
Item / Source                 Value in crores   Angle of circle                      %age
Customs                       9600              (9600 / 126600) x 360 = 27.30°       7.58
Excise                        49300             (49300 / 126600) x 360 = 140.20°     39.00
Corporate Tax and Dividend    18900             (18900 / 126600) x 360 = 53.70°      14.92
States taking                 48800             (48800 / 126600) x 360 = 138.80°     38.50
Total                         126600            360°                                 100
[Figure: Pie diagram; Customs 7.58%, Excise 39%, Corporate Tax and Dividend 14.92%, States taking 38.5%]
Source: India Today 19 June, 2006
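The angle and percentage of each slice follow from the same proportion, value / total. The sketch below applies it to the revenue figures; the names are illustrative, and exact rounding gives values that differ slightly in the last digit from the approximate figures tabulated above.

```python
# Revenue collections in Rs. crores (from the example above)
revenue = {
    "Customs": 9600,
    "Excise": 49300,
    "Corporate Tax and Dividend": 18900,
    "States taking": 48800,
}
total = sum(revenue.values())

# Angle of each slice = (value / total) x 360 degrees
angles = {item: value / total * 360 for item, value in revenue.items()}
# Share of each slice = (value / total) x 100 percent
shares = {item: value / total * 100 for item, value in revenue.items()}

for item in revenue:
    print(f"{item}: {angles[item]:.2f} degrees, {shares[item]:.2f}%")
```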
Choice or selection of diagram
There are many methods of depicting statistical data through diagrams. No single
diagram is suited to all purposes. Choosing a diagram to suit a given set of
data requires skill, knowledge and experience. Primarily, the choice depends upon the
nature of the data, the purpose of presentation and the audience for whom it is meant.
The nature of the data will help in deciding between a one-dimensional,
two-dimensional or three-dimensional diagram.
The following points are to be kept in mind for the choice of diagram.
1. For the common man, who has less knowledge of statistics, cartograms and
pictograms are suited.
2. To present the components apart from the magnitude of values, a sub-divided bar
diagram can be used.
3. When a large number of components are to be shown, a pie diagram is suitable.
Graphic presentation
A graphic presentation is a visual form of presentation. Graphs are drawn on a
special type of paper known as graph paper.
Common graphic representations are
a) Histogram
b) Frequency polygon
c) Cumulative frequency curve (ogive)
Advantages of graphic presentation
1. It provides attractive and impressive view
2. Simplifies complexity of data
3. Helps for direct comparison
4. It helps for further statistical analysis
5. It is simplest method of presentation of data
6. It shows trend and pattern of data
Difference between graph and diagram
Diagram                                              Graph
1. Ordinary paper can be used                        1. Graph paper is required
2. It is attractive and easily understandable        2. Needs some effort to understand
3. It is appropriate and effective for               3. It creates problems when more
   presenting more variables                            variables are presented
4. It can't be used for further analysis             4. Can be used for further analysis
5. It gives comparison                               5. It shows relationship between variables
6. Data are represented by bars, rectangles          6. Points and lines are used to represent data
Frequency Histogram
In this type of representation the given data are plotted in the form of a series of
rectangles. Class intervals are marked along the x-axis and the frequencies along
the y-axis according to a suitable scale. Unlike the bar chart, which is one-dimensional, a
histogram is two-dimensional: both the length and the width are important. A
histogram is constructed from a frequency distribution of grouped data, where the
height of each rectangle is proportional to the respective frequency and the width
represents the class interval. Each rectangle is joined to the next; a blank space between
rectangles means that the class is empty and there are no values in that class
interval.
Ex: Construct a histogram for the following data.
Marks obtained (x)   No. of students (f)   Mid point
15 - 25              5                     20
25 - 35              3                     30
35 - 45              7                     40
45 - 55              5                     50
55 - 65              3                     60
65 - 75              7                     70
Total                30
For convenience, the frequency distribution is presented along with the
mid-point of each class interval, where the mid-point is simply the average of the
lower and upper boundaries of that class interval.
[Figure: Histogram; Class Interval (Marks) on the x-axis, Frequency (No. of students) on the y-axis]
Frequency polygon
A frequency polygon is a line chart of a frequency distribution in which either the
values of a discrete variable or the mid-points of the class intervals are plotted against
the frequencies, and the plotted points are joined by straight lines. Since the
frequencies do not start or end at zero, the diagram as such would not touch the
horizontal axis. However, the area under the entire curve is the same as that of the
histogram, which is 100%, so the curve must be enclosed. The first mid-point is
therefore joined to a fictitious preceding mid-point whose frequency is zero, so that the
curve begins on the horizontal axis, and the last mid-point is joined to a fictitious
succeeding mid-point whose frequency is also zero, so that the curve ends on the
horizontal axis. This enclosed diagram is known as a frequency polygon.
Ex: For the following data construct a frequency polygon.
Marks (CI)   No. of frequencies (f)   Mid-point
15 - 25      5                        20
25 - 35      3                        30
35 - 45      7                        40
45 - 55      5                        50
55 - 65      3                        60
65 - 75      7                        70
[Figure: A frequency polygon; Mid point (x) on the x-axis, Frequency on the y-axis]
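The points to be joined for the frequency polygon, including the two fictitious zero-frequency mid-points that close it, can be listed with a sketch like the following. The names are illustrative, and it assumes equal class widths, as in the example.

```python
# Class intervals (lower, upper) and their frequencies from the example
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
frequencies = [5, 3, 7, 5, 3, 7]

# Mid-point of a class = (lower limit + upper limit) / 2
midpoints = [(low + high) / 2 for low, high in classes]
width = classes[0][1] - classes[0][0]  # common class width (assumed equal)

# Close the polygon with fictitious mid-points of zero frequency
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)
print(points)
```

Joining these points in order with straight lines gives a polygon that starts and ends on the horizontal axis.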
Cumulative frequency curve (ogive)
Ogives are the graphic representations of a cumulative frequency distribution.
They are classified as 'less than' and 'more than' ogives. In the 'less than' case,
cumulative frequencies are plotted against the upper boundaries of their respective class
intervals. In the 'more than' case, cumulative frequencies are plotted against the lower
boundaries of their respective class intervals. Ogives are used for comparison
purposes. Several ogives can be compared on the same grid, with different colours for
easier visualisation and differentiation.
Ex:
Marks (CI)   No. of frequencies (f)   Mid-point   Cum. Freq. Less than   Cum. Freq. More than
15 - 25      5                        20          5                      30
25 - 35      3                        30          8                      25
35 - 45      7                        40          15                     22
45 - 55      5                        50          20                     15
55 - 65      3                        60          23                     10
65 - 75      7                        70          30                     7
Less than ogive diagram
[Figure: 'Less than' ogive; Upper Boundary (CI) on the x-axis, Less than Cumulative Frequency on the y-axis]
More than ogive diagram
[Figure: 'More than' ogive; Lower Boundary (CI) on the x-axis, More than Cumulative Frequency on the y-axis]
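The plotting points for the two ogives can be derived from the class intervals and frequencies as sketched below (illustrative names; 'less than' values pair with upper boundaries and 'more than' values with lower boundaries).

```python
from itertools import accumulate

classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
frequencies = [5, 3, 7, 5, 3, 7]

less_than = list(accumulate(frequencies))
more_than = list(accumulate(reversed(frequencies)))[::-1]

# 'Less than' ogive: (upper boundary, cumulative frequency)
less_than_points = [(high, cf) for (_, high), cf in zip(classes, less_than)]
# 'More than' ogive: (lower boundary, cumulative frequency)
more_than_points = [(low, cf) for (low, _), cf in zip(classes, more_than)]

print(less_than_points)
print(more_than_points)
```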
LESSON 1
STATISTICS FOR MANAGEMENT
Session 2 Duration: 1 hr
Classification and Tabulation
The data collected for the purpose of a statistical inquiry sometimes consist of
a few fairly simple figures, which can be easily understood without any special
treatment. But more often there is an overwhelming mass of raw data without any
structure. Such an unwieldy, unorganised and shapeless mass of collected data is not
capable of being rapidly or easily assimilated or interpreted. Unorganised data are not
fit for further analysis and interpretation. In order to make the data simple and easily
understandable, the first task is to condense and simplify them in such a way that
irrelevant data are removed and their significant features stand out prominently.
The procedure adopted for this purpose is known as the method of classification and
tabulation. Classification helps proper tabulation.
'Classified and arranged facts speak themselves; unarranged, unorganised they
are dead as mutton.'
- Prof. J.R. Hicks
Meaning of Classification
Classification is a process of arranging things or data in groups or classes
according to their resemblances and affinities, and it gives expression to the unity of
attributes that may subsist among a diversity of individuals.
Definition of Classification
'Classification is the process of arranging data into sequences and groups
according to their common characteristics or separating them into different but related
parts.'
- Secrist
'The process of grouping large number of individual facts and observations on
the basis of similarity among the items is called classification.'
- Stockton & Clark
Characteristics of classification
a) Classification performs homogeneous grouping of data
b) It brings out points of similarity and dissimilarity
c) The classification may be either real or imaginary
d) Classification is flexible to accommodate adjustments
Objectives / purposes of classifications
i) To simplify and condense the large data
ii) To present the facts in an easily understandable form
iii) To allow comparisons
iv) To help to draw valid inferences
v) To relate the variables among the data
vi) To help further analysis
vii) To eliminate unwanted data
viii) To prepare tabulation
Guiding principles (rules) of classifications
Following are the general guiding principles for good classifications
a) Exhaustive: Classification should be exhaustive. Each and every item
in the data must belong to one of the classes. Introduction of a residual class (i.e.
others, miscellaneous etc.) should be avoided.
b) Mutually exclusive: Each item should be placed in only one class.
c) Suitability: The classification should conform to the object of inquiry.
d) Stability: Only one principle must be maintained throughout the
classification and analysis.
e) Homogeneity: The items included in each class must be homogeneous.
f) Flexibility: A good classification should be flexible enough to
accommodate new or changed situations.
Modes / Types of Classification
Modes / Types of classification refers to the class categories into which the data
could be sorted out and tabulated. These categories depend on the nature of data and
purpose for which data is being sought.
Important types of classification
a) Geographical (i.e. on the basis of area or region wise)
b) Chronological (on the basis of Temporal / Historical, i.e. with respect to time)
c) Qualitative (on the basis of character / attributes)
d) Numerical, quantitative (on the basis of magnitude)
a) Geographical Classification
In geographical classification, the classification is based on the geographical
regions.
Ex: Sales of the company (In Million Rupees) (region wise)
Region   Sales
North    285
South    300
East     185
West     235
b) Chronological Classification
If the statistical data are classified according to the time of their occurrence, the
type of classification is called chronological classification.
Ex: Sales reported by a departmental store
Month      Sales (Rs.) in lakhs
January    22
February   26
March      32
April      25
May        27
June       30
c) Qualitative Classification
In qualitative classification, the data are classified according to the presence or
absence of attributes in the given units. Thus, the classification is based on some quality
characteristics / attributes.
Ex: Sex, Literacy, Education, Class grade etc.
Further, it may be classified as
i) Simple classification ii) Manifold classification
i) Simple classification: If the classification is done into only two classes, then the
classification is known as simple classification.
Ex: a) Population into Male / Female
b) Population into Educated / Uneducated
ii) Manifold classification: In this classification, the classification is based on more
than one attribute at a time.
Ex:
Population
    Smokers
        Literate:   Male, Female
        Illiterate: Male, Female
    Non-smokers
        Literate:   Male, Female
        Illiterate: Male, Female
d) Quantitative Classification: In quantitative classification, the classification is
based on quantitative measurements of some characteristic, such as age, marks,
income, production, sales etc. The quantitative phenomenon under study is known
as the variable, and hence this classification is also called classification by variable.
Ex:
For a 50 marks test, the marks obtained by students are classified as follows
Marks      No. of students
0 - 10     5
10 - 20    7
20 - 30    10
30 - 40    25
40 - 50    3
Total Students = 50
In this classification the marks obtained by students is the variable and the number of
students in each class represents the frequency.
Tabulation
Meaning and Definition of Tabulation
Tabulation may be defined as the systematic arrangement of data in columns and
rows. It is designed to simplify the presentation of data for the purpose of analysis and
statistical inference.
Major Objectives of Tabulation
1. To simplify the complex data
2. To facilitate comparison
3. To economize the space
4. To draw valid inference / conclusions
5. To help for further analysis
Differences between Classification and Tabulation
1. Data are first classified and then presented in tables; classification is the basis for
tabulation.
2. Tabulation is a mechanical function of classification, because in tabulation the
classified data are placed in rows and columns.
3. Classification is a process of statistical analysis, while tabulation is a process of
presenting data in a suitable structure.
Classification of tables
Classification is done based on
1. Coverage (Simple and complex table)
2. Objective / purpose (General purpose / Reference table / Special table or
summary table)
3. Nature of inquiry (primary and derived table).
Ex:
a) Simple table: Data are classified based on only one characteristic
Distribution of marks
Class Marks   No. of students
30 - 40       20
40 - 50       20
50 - 60       10
Total         50
b) Two-way table: Classification is based on two characteristics
Class Marks   No. of students
              Boys   Girls   Total
30 - 40       10     10      20
40 - 50       15     5       20
50 - 60       3      7       10
Total         28     22      50
Frequency Distribution
Frequency distribution is a table used to organize the data. The left column
(called classes or groups) includes numerical intervals on a variable under study. The
right column contains the list of frequencies, or number of occurrences of each
class/group. Intervals are normally of equal size covering the sample observations
range.
It is simply a table in which the gathered data are grouped into classes and the
number of occurrences, which fall in each class, is recorded.
Definition
A frequency distribution is a statistical table which shows the set of all distinct
values of the variable arranged in order of magnitude, either individually or in groups
with their corresponding frequencies.
- Croxton and Cowden
A frequency distribution can be classified as
a) Series of individual observation
b) Discrete frequency distribution
c) Continuous frequency distribution
a) Series of individual observation
A series of individual observations is a series in which the items are listed one after
another, each observation separately. For statistical calculations, these observations can
be arranged in either ascending or descending order. This is called an array.
Ex:
Roll No.   Marks obtained in statistics paper
1          83
2          80
3          75
4          92
5          65
The above data list is raw data. Presented in this form, the data do not
reveal much information. Arranging the data in ascending or descending
order of magnitude gives a better presentation; this is called arraying of
data.
Discrete (ungrouped) Frequency Distribution
If the data series is presented in such a way that each value indicates an exact
measurement of units, it is called a discrete frequency distribution. A discrete
variable is one whose values differ from each other by definite amounts.
Ex:
Assume that a survey has been made of the number of post-graduates in 10
families chosen at random; the resulting raw data could be as follows.
0, 1, 3, 1, 0, 2, 2, 2, 2, 4
This data can be classified into an ungrouped frequency distribution. The number of
post-graduates becomes the variable (x), for which we can list the frequency of occurrence
(f) in tabular form as follows;
Number of post graduates (x)   Frequency (f)
0                              2
1                              2
2                              4
3                              1
4                              1
The above example shows a discrete frequency distribution, where the variable
has discrete numerical values.
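The ungrouped frequency table above can be built by counting how often each value occurs. The following is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Number of post-graduates in each of the 10 surveyed families
raw_data = [0, 1, 3, 1, 0, 2, 2, 2, 2, 4]

# Count occurrences of each distinct value, listed in order of magnitude
frequency = dict(sorted(Counter(raw_data).items()))

for x, f in frequency.items():
    print(x, f)
```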
Continuous frequency distribution (grouped frequency distribution)
A continuous data series is one where the measurements are only approximations
and are expressed in class intervals within certain limits. In a continuous frequency
distribution the class intervals theoretically continue from the start of the frequency
distribution to the end without a break. According to Boddington, 'the variable which
can take every intermediate value between the smallest and largest value in the
distribution is a continuous frequency distribution'.
Ex:
Marks obtained by 20 students in an exam for 50 marks are given
below. Convert the data into continuous frequency distribution form.
18 23 28 29 44 28 48 33 32 43
24 29 32 39 49 42 27 33 28 29
By grouping the marks into class intervals of 5, the following frequency distribution
table can be formed.
Marks      No. of students
0 - 5      0
5 - 10     0
10 - 15    0
15 - 20    1
20 - 25    2
25 - 30    7
30 - 35    4
35 - 40    1
40 - 45    3
45 - 50    2
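The grouping above can be sketched in Python as follows. The sketch assumes classes of width 5 starting at 0, with each lower limit included and each upper limit excluded; the names are illustrative.

```python
marks = [18, 23, 28, 29, 44, 28, 48, 33, 32, 43,
         24, 29, 32, 39, 49, 42, 27, 33, 28, 29]

width = 5          # class width
maximum = 50       # maximum marks of the exam
classes = [(low, low + width) for low in range(0, maximum, width)]

# Each mark falls in the class whose index is mark // width
counts = [0] * len(classes)
for mark in marks:
    counts[mark // width] += 1

for (low, high), count in zip(classes, counts):
    print(f"{low} - {high}: {count}")
```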
LESSON 1
STATISTICS FOR MANAGEMENT
Session 3 Duration: 1 hr
Technical terms used in formulating a frequency distribution
a) Class limits:
The class limits are the smallest and largest values in the class.
Ex:
In the class 0 - 10, the lowest value is zero and the highest value is 10. The two
boundaries of the class are called the lower and upper limits of the class. Class limits
are also called class boundaries.
b) Class intervals
The difference between the upper and lower limits of a class is known as the class
interval.
Ex:
In the class 0 - 10, the class interval is (10 - 0) = 10.
The formula to find the class interval is given below
i = (L - S) / K
L = Largest value
S = Smallest value
K = the no. of classes
Ex:
If the marks of 60 students in a class vary between 40 and 100 and if we want
to form 6 classes, then with L = 100, S = 40 and K = 6 the class interval would be
i = (L - S) / K = (100 - 40) / 6 = 60 / 6 = 10
Therefore, the class intervals would be 40 - 50, 50 - 60, 60 - 70, 70 - 80, 80 - 90
and 90 - 100.
Methods of forming class-interval
a) Exclusive method (overlapping)
In this method, the upper limit of one class interval is the lower limit of the next
class. This method maintains continuity of the data.
Ex:
Marks      No. of students
20 - 30    5
30 - 40    15
40 - 50    25
A student whose mark is between 20 and 29.9 will be included in the 20 - 30
class.
A better way of expressing this is
Marks                                                  No. of students
20 to less than 30 (20 or more but less than 30)       5
30 to less than 40                                     15
40 to less than 50                                     25
Total Students                                         45
b) Inclusive method (non-overlapping)
Ex:
Marks      No. of students
20 - 29    5
30 - 39    15
40 - 49    25
A student whose mark is 29 is included in the 20 - 29 class interval and a student
whose mark is 39 is included in the 30 - 39 class interval.
Class Frequency
The number of observations falling within a class interval is called its class
frequency.
Ex: If the class frequency of 90 - 100 is 5, it means that 5 students scored
between 90 and 100. If we add the frequencies of all the individual classes, the total
frequency represents the total number of items studied.
Magnitude of class interval
The magnitude of the class interval depends on the range and the number of classes. The
range is the difference between the highest and smallest values in the data series. A
class interval is generally taken in multiples of 5, 10, 15 and 20.
Sturges' formula to find the number of classes is given below
K = 1 + 3.322 log N
K = No. of classes
log N = Logarithm (base 10) of total no. of observations
Ex: If the total number of observations is 100, then the number of classes could be
K = 1 + 3.322 log 100
K = 1 + 3.322 x 2
K = 1 + 6.644
K = 7.644 = 8 (Rounded off)
NOTE: Under this formula the number of classes can't be less than 4 or greater than 20.
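Sturges' formula is easy to evaluate programmatically. The sketch below uses a hypothetical helper name, `number_of_classes`, and assumes that the result is rounded up to the next whole number, which matches the rounding in the worked example.

```python
import math

def number_of_classes(n_observations: int) -> int:
    """Number of classes by Sturges' formula, K = 1 + 3.322 log10(N), rounded up."""
    k = 1 + 3.322 * math.log10(n_observations)
    return math.ceil(k)

print(number_of_classes(100))
print(number_of_classes(30))
```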
Class mid point or class marks
The mid value or central value of the class interval is called the mid point.
Mid point of a class = (lower limit of class + upper limit of class) / 2
Sturges' formula to find size of class interval
Size of class interval (h) = Range / (1 + 3.322 log N)
Ex: In a group of 50 workers, the highest wage is Rs. 250 and the lowest wage is
Rs. 100 per day. Find the size of interval.
h = Range / (1 + 3.322 log N) = (250 - 100) / (1 + 3.322 log 50) = 150 / 6.644 = 22.58 ≈ 23
Constructing a frequency distribution
The following guidelines may be considered for the construction of a frequency
distribution.
a) The classes should be clearly defined and each observation must belong to one
and only one class interval. Class intervals must be inclusive and non-
overlapping.
b) The number of classes should be neither too large nor too small.
Too few classes result in greater interval width with loss of accuracy. Too many
class intervals result in complexity.
c) All intervals should be of the same width. This is preferred for easy
computations.
The width of interval = Range / Number of classes
d) Open-end classes should be avoided since they create difficulty in analysis and
interpretation.
e) Intervals should be continuous throughout the distribution. This is important for
a continuous distribution.
f) The lower limits of the class intervals should be simple multiples of the interval.
Ex: A sample of the weights of 30 students of a particular class is as follows.
Construct a frequency distribution for the given data.
62 58 58 52 48 53 54 63 69 63
57 56 46 48 53 56 57 59 58 53
52 56 57 52 52 53 54 58 61 63
Steps of construction
Step 1
Find the range of data
(H) Highest value = 69
(L) Lowest value = 46
Range = H - L = 69 - 46 = 23
Step 2
Find the number of class intervals.
Sturges' formula
K = 1 + 3.322 log N
K = 1 + 3.322 log 30
K = 5.90, say K = 6
No. of classes = 6
Step 3
Width of class interval
Width of class interval = Range / Number of classes = 23 / 6 = 3.833 ≈ 4
Step 4
Count all the observations belonging to each class interval and assign this total
frequency to the corresponding class interval as follows.
Class interval   Tally bars    Frequency
46 - 50          |||           3
50 - 54          ||||| |||     8
54 - 58          ||||| |||     8
58 - 62          ||||| |       6
62 - 66          ||||          4
66 - 70          |             1
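The four steps above can be followed end to end in a short Python sketch. The names are illustrative; the sketch assumes that both the number of classes and the width are rounded up, and that each class includes its lower limit and excludes its upper limit.

```python
import math

weights = [62, 58, 58, 52, 48, 53, 54, 63, 69, 63,
           57, 56, 46, 48, 53, 56, 57, 59, 58, 53,
           52, 56, 57, 52, 52, 53, 54, 58, 61, 63]

# Step 1: range of the data
lowest, highest = min(weights), max(weights)
data_range = highest - lowest

# Step 2: number of classes by Sturges' formula (rounded up)
k = math.ceil(1 + 3.322 * math.log10(len(weights)))

# Step 3: width of class interval (rounded up)
width = math.ceil(data_range / k)

# Step 4: count the observations falling in each class
counts = [0] * k
for w in weights:
    index = min((w - lowest) // width, k - 1)  # keep the maximum in the last class
    counts[index] += 1

for i, count in enumerate(counts):
    low = lowest + i * width
    print(f"{low} - {low + width}: {count}")
```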
Cumulative frequency distribution
A cumulative frequency distribution indicates directly the number of units that
lie above or below the specified values of the class intervals. When the interest of the
investigator is in the number of cases below a specified value, the specified value
represents the upper limit of the class interval. It is known as 'less than' cumulative
frequency distribution. When the interest lies in finding the number of cases above a
specified value, this value is taken as the lower limit of the specified class interval.
It is then known as 'more than' cumulative frequency distribution.
Cumulative frequency simply means the running total of the consecutive
frequencies.
Ex:
Marks      No. of students   Less than cumulative frequency
0 - 10     5                 5
10 - 20    3                 8
20 - 30    10                18
30 - 40    20                38
40 - 50    12                50
In the above 'less than' cumulative frequency distribution, 5 students scored
less than 10, 8 scored less than 20, 18 scored less than 30 and so on.
Similarly, the following table shows a 'more than' cumulative frequency
distribution.
Ex:
Marks      No. of students   More than cumulative frequency
0 - 10     5                 50
10 - 20    3                 45
20 - 30    10                42
30 - 40    20                32
40 - 50    12                12
In the above 'more than' cumulative frequency distribution, 50 students
scored 0 or more, 45 scored 10 or more, 42 scored 20 or more and so on.
Diagrammatic and Graphic Representation
The data collected can be presented graphically or pictorially for easy
understanding and quick interpretation. Diagrams and graphs give visual
indications of magnitudes, groupings, trends and patterns in the data. These features
can be presented more simply in graphical form. Diagrams and graphs also help
in comparing the variables.
Diagrammatic presentation
A diagram is a visual form for the presentation of statistical data. The term diagram
refers to various types of devices such as bars, circles, maps, pictorials and cartograms.
Importance of Diagrams
1. They are simple, attractive and easily understandable
2. They give quick information
3. They help to compare the variables
4. Diagrams are more suitable to illustrate discrete data
5. They leave a more lasting effect in the reader's mind.
Limitations of diagrams
1. Diagrams show only approximate values
2. Diagrams are not suitable for further analysis
3. Some diagrams (multidimensional) can be understood only by experts
4. Details cannot be provided fully
5. They are useful only for comparison
General Rules for drawing the diagrams
i) Each diagram should have a suitable title, at the top or bottom, indicating the
theme the diagram is intended to show.
ii) The size of the diagram should emphasize the important characteristics of the data.
iii) An appropriate proportion should be maintained between the length and breadth of
the diagram.
iv) A proper / suitable scale should be adopted for the diagram.
v) Selection of the appropriate diagram is important; a wrong selection may
mislead the reader.
vi) The source of data should be mentioned at the bottom.
vii) The diagram should be simple and attractive.
viii) The diagram should be effective rather than complex.
Some important types of diagrams
a) One dimensional diagrams (line and bar)
b) Two-dimensional diagram (rectangle, square, circle)
c) Three-dimensional diagram (cube, sphere, cylinder etc.)
d) Pictogram
e) Cartogram
a) One dimensional diagrams (line and bar)
In one-dimensional diagrams, only the length of the bars or lines is taken into
account. The width of the bars is not considered. These diagrams are classified mainly as
follows.
i) Line diagram
ii) Bar diagram
- Vertical bar diagram
- Horizontal bar diagram
- Multiple (compound) bar diagram
- Sub-divided (component) bar diagram
- Percentage subdivided bar diagram
i) Line diagram
This is the simplest type of one-dimensional diagram. The heights of the lines are
drawn in proportion to the size of the figures. The distance between the lines is kept
uniform. The limitations of this diagram are that it is not attractive and cannot present
more than one item of information.
Ex: Draw the line diagram for the following data.
Year                                                      2001  2002  2003  2004  2005  2006
No. of students passed in first class with distinction       5     7    12     5    13    15
[Figure: line diagram. x-axis: Year (2001 to 2006); y-axis: No. of students passed in FCD; plotted values 5, 7, 12, 5, 13, 15.]
Indication of diagram: The number of FCD (first class with distinction) passes is highest in 2006 and lowest in 2001 and 2004.
ii) Simple bar diagram
A simple bar diagram can be drawn using horizontal or vertical bars. It is a very
common diagram in business and economics.
Vertical bar diagram
The annual expenses of maintaining cars of various types are given below.
Draw the vertical bar diagram. The annual expenses include fuel,
maintenance, repair, assistance and insurance.
Type of car      Expense in Rs. / year
Maruthi Udyog    47533
Hyundai          59230
Tata Motors      63270
Source: 2005 TNS TCS Study
Published at: Vijaya Karnataka, dated: 03.08.2006
[Figure: vertical bar diagram. x-axis: type of car (Maruthi Udyog, Hyundai, Tata Motors); y-axis: expense in Rs. per year, scale 30000 to 70000; bar values 47533, 59230, 63270.]
Indication of diagram
a) The annual expenses of the Maruthi Udyog car are lower than those of the
other brands depicted.
b) The Tata Motors car has the highest annual expenses.
Horizontal bar diagram
Data on the world's top 10 steel makers are given below. Draw the horizontal
bar diagram.
Steel maker       Production (million tonnes)
Arcelor Mittal    110
Nippon            32
POSCO             31
JFE               30
BAO Steel         24
US Steel          20
NUCOR             18
RIVA              18
Thyssen-krupp     17
Tangshan          16
[Figure: horizontal bar diagram. Vertical axis: top 10 steel makers; horizontal axis: production of steel (million tonnes), scale 0 to 120.]
Source: ISSB Published by India Today

Compound bar diagram (Multiple bar diagram)
A multiple bar diagram provides more information than a simple bar
diagram. It depicts more than one phenomenon and is highly useful
for direct comparison. The bars are drawn side by side, and different colours, shades
or hatches can be used to indicate each variable.
Ex: Draw the bar diagram for the following data on the resale value of cars (Rs. '000).
Year (Model)   Santro   Zen   Wagonr
2003           208      252   248
2004           240      278   274
2005           261      296   302
[Figure: multiple bar diagram. x-axis: model year of car (2003, 2004, 2005); y-axis: value in Rs. '000, scale 0 to 350; bars for Santro, Zen and Wagonr.]
Source: True value used car purchase data
Published by: Vijaya Karnataka, dated: 03.08.2006
Ex: Represent following in suitable diagram
Class A B C
Male 1000 1500 1500
Female 500 800 1000
Total 1500 2300 2500
[Figure: bar diagram. x-axis: Class (A, B, C); y-axis: Population (in Nos.), scale 0 to 2500; Male and Female components for each class, with totals 1500, 2300 and 2500.]
Ex: Draw a suitable diagram for the following data.
Mode of investment   Investment in 2004      Investment in 2005
                     Rs.       %age          Rs.       %age
NSC                  25000     43.10         30000     45.45
MIS                  15000     25.86         10000     15.15
Mutual Fund          15000     25.86         25000     37.87
LIC                  3000      5.17          1000      1.52
Total                58000     100           66000     100
[Figure: percentage sub-divided bar diagram. x-axis: Year (2004, 2005); y-axis: % of Investment, scale 0 to 100; components NSC, MIS, Mutual Fund and LIC.]
Two-dimensional diagram
In a two-dimensional diagram both the length and the breadth are considered, so that
the area of the diagram represents the data. The important two-dimensional
diagrams are
a) Rectangular diagram
b) Square diagram
c) Pie diagram
a) Rectangular diagram
Rectangular diagrams are used to depict two or more variables and help
in direct comparison. The areas of the rectangles are kept in proportion to the
values. There are two types.
i) Percentage sub-divided rectangular diagram
ii) Sub-divided rectangular diagram
In the former, the widths of the rectangles are proportional to the values; the
components of each value are converted into percentages, and the rectangles are divided
accordingly. The latter is used to show related phenomena such as cost per
unit, quality of production etc.
Ex: Draw the rectangular diagram for the following data.
Item of expenditure   Expenditure in Rs.
                      Family A   Family B
Provisional stores    1000       2000
Education             250        500
Electricity           300        700
House Rent            1500       2800
Vehicle Fuel          450        1000
Total                 3500       7000
The total expenditure of each family is taken as 100, and the expenditure on each item
is expressed as a percentage of it. The widths of the two rectangles are in proportion to
the total expenses of the two families, i.e. 3500 : 7000 or 1 : 2. The heights of the
components follow the percentages of expenses.
Item of expenditure   Monthly expenditure
                      Family A (Rs. 3500)    Family B (Rs. 7000)
                      Rs.      %age          Rs.      %age
Provisional stores    1000     28.57         2000     28.57
Education             250      7.14          500      7.14
Electricity           300      8.57          700      10
House Rent            1500     42.85         2800     40
Vehicle Fuel          450      12.85         1000     14.28
Total                 3500     100           7000     100
[Figure: percentage sub-divided rectangular diagram. x-axis: Family (A, B); y-axis: % of Expenditure, scale 0 to 100; components Provisional Stores, Education, Electricity, House Rent and Vehicle Fuel.]
b) Square diagram
To draw a square diagram, the square root of each value to be shown is taken,
and the side of each square is made proportional to that square root. A suitable scale
may be used, provided the ratios between the sides are maintained.
Ex: Draw the square diagram for the following data.
4900 2500 1600
Solution: The square roots of the items are 70, 50 and 40. Dividing each by 10
gives sides of 7, 5 and 4 units.
[Figure: square diagram. Three squares with sides of 7, 5 and 4 units representing 4900, 2500 and 1600.]
Pie diagram
A pie diagram shows the partitioning of a total into its component parts.
It is used to show classes or groups of data in proportion to the whole data set. The entire
pie represents all the data, while each slice represents a different class or group within
the whole. The following illustration shows the construction of a pie diagram.
Ex: Revenue collections by government from petroleum products for the year 2005-2006,
in Rs. crore, are as follows. Draw the pie diagram.
Customs 9600
Excise 49300
Corporate Tax and dividend 18900
States taking 48800
Total 126600
Solution: The angle of each slice is (value / total) x 360°.
Item / Source                 Value in crores   Angle of circle                      %age
Customs                       9600              (9600 / 126600) x 360° = 27.30°      7.58
Excise                        49300             (49300 / 126600) x 360° = 140.20°    39.00
Corporate Tax and Dividend    18900             (18900 / 126600) x 360° = 53.70°     14.92
States taking                 48800             (48800 / 126600) x 360° = 138.80°    38.50
Total                         126600            360°                                 100
[Figure: pie diagram with slices for Customs (7.58%), Excise (39%), Corporate Tax and Dividend (14.92%) and States taking (38.5%).]
Source: India Today 19 June, 2006
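The angle and percentage columns of the pie diagram example follow from one rule: each slice gets (value / total) x 360 degrees. A minimal Python sketch of that arithmetic, using the revenue figures from the example; the function name `pie_slices` is an illustrative choice, not something defined in the text.

```python
def pie_slices(values):
    """Return {label: (angle_in_degrees, percentage)} for a pie diagram."""
    total = sum(values.values())
    return {
        label: (value / total * 360, value / total * 100)
        for label, value in values.items()
    }

# Revenue figures (Rs. crore) from the example above.
revenue = {
    "Customs": 9600,
    "Excise": 49300,
    "Corporate Tax and Dividend": 18900,
    "States taking": 48800,
}

slices = pie_slices(revenue)
for label, (angle, pct) in slices.items():
    print(f"{label}: {angle:.2f} degrees, {pct:.2f}%")
```

The angles always add up to 360 degrees, which is a quick check on hand-computed tables.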
Choice or selection of diagram
There are many ways to depict statistical data through diagrams. No single
diagram is suited for all purposes. Choosing a diagram to suit a given set of
data requires skill, knowledge and experience. Primarily, the choice depends upon the
nature of the data and the purpose of the presentation. The nature of the data
helps in deciding between a one-dimensional, two-dimensional or three-dimensional
diagram. It is also necessary to know the audience for whom the diagram is
drawn.
The following points are to be kept in mind for the choice of diagram.
1. For the common man, who has less knowledge of statistics, cartograms and
pictograms are suited.
2. To present the components in addition to the magnitude of values, a sub-divided bar
diagram can be used.
3. When a large number of components are to be shown, a pie diagram is suitable.
Graphic presentation
A graphic presentation is a visual form of presentation. Graphs are drawn on a
special type of paper known as graph paper.
Common graphic representations are
a) Histogram
b) Frequency polygon
c) Cumulative frequency curve (ogive)
Advantages of graphic presentation
1. It provides an attractive and impressive view.
2. It simplifies the complexity of data.
3. It helps in direct comparison.
4. It helps in further statistical analysis.
5. It is the simplest method of presenting data.
6. It shows the trend and pattern of data.
Difference between graph and diagram
Diagram                                              Graph
1. Ordinary paper can be used.                       1. Graph paper is required.
2. It is attractive and easily understandable.       2. It needs some effort to understand.
3. It is appropriate and effective for measuring     3. It creates problems when more variables
   more variables.                                      are to be measured.
4. It cannot be used for further analysis.           4. It can be used for further analysis.
5. It gives comparison.                              5. It shows the relationship between variables.
6. Data are represented by bars and rectangles.      6. Points and lines are used to represent data.
Frequency Histogram
In this type of representation the given data are plotted as a series of
rectangles. Class intervals are marked along the x-axis and frequencies along
the y-axis, according to a suitable scale. Unlike the bar chart, which is one-dimensional, a
histogram is two-dimensional: both length and width matter. A
histogram is constructed from a frequency distribution of grouped data; the
height of each rectangle is proportional to the frequency of its class, and the width
represents the class interval. Adjacent rectangles touch each other; a blank space between
rectangles means that the class is empty, i.e. there are no values in that class
interval.
Ex: Construct a histogram for the following data.
For convenience, the frequency distribution is presented along with the
mid-point of each class interval, where the mid-point is simply the average of the
lower and upper boundaries of the class interval.
Marks obtained (x)   No. of students (f)   Mid point
15 - 25              5                     20
25 - 35              3                     30
35 - 45              7                     40
45 - 55              5                     50
55 - 65              3                     60
65 - 75              7                     70
Total                30
[Figure: histogram. x-axis: Class Interval (Marks), 15 to 75; y-axis: Frequency (No. of students), scale 0 to 7.]
Frequency polygon
A frequency polygon is a line chart of a frequency distribution in which either the
values of a discrete variable or the mid-points of class intervals are plotted against the
frequencies, and the plotted points are joined by straight lines. Since the
frequencies do not start or end at zero, the diagram as such would not touch the
horizontal axis. However, the area under the entire curve must be the same as that of the
histogram, which is 100%, so the curve must be enclosed. The first mid-point is
therefore joined to a fictitious preceding mid-point whose frequency is zero, so that the
curve begins on the horizontal axis, and the last mid-point is joined to a fictitious
succeeding mid-point whose frequency is also zero, so that the curve ends on the horizontal
axis. This enclosed diagram is known as a frequency polygon.
Ex: Construct a frequency polygon for the following data.
Marks (CI)   Frequency (f)   Mid-point
15 - 25      5               20
25 - 35      3               30
35 - 45      7               40
45 - 55      5               50
55 - 65      3               60
65 - 75      7               70
[Figure: a frequency polygon. x-axis: Mid point (x), scale 0 to 100; y-axis: Frequency, scale 0 to 10.]
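The points of a frequency polygon can be listed before plotting. The sketch below, in Python, computes the class mid-points and adds the two fictitious zero-frequency mid-points that close the polygon on the horizontal axis. It assumes equal class widths; the function name `polygon_points` is illustrative only.

```python
def polygon_points(class_limits, frequencies):
    """Return (mid-point, frequency) pairs for a closed frequency polygon.

    class_limits: list of (lower, upper) boundaries of equal-width classes.
    """
    mids = [(lo + hi) / 2 for lo, hi in class_limits]
    width = class_limits[0][1] - class_limits[0][0]
    # Fictitious classes with zero frequency close the polygon on the x-axis.
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
freqs = [5, 3, 7, 5, 3, 7]
points = polygon_points(classes, freqs)
print(points)
```

For the marks data the polygon starts at mid-point 10 and ends at mid-point 80, both with frequency zero.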
Cumulative frequency curve (ogive)
Ogives are the graphic representations of a cumulative frequency distribution.
They are classified as 'less than' and 'more than' ogives. In a 'less than' ogive,
cumulative frequencies are plotted against the upper boundaries of their respective class
intervals. In a 'more than' ogive, cumulative frequencies are plotted against the lower
boundaries of their respective class intervals. Ogives are used for comparison
purposes. Several ogives can be compared on the same grid, using different colours for
easier visualisation and differentiation.
Ex:
Marks (CI)   Frequency (f)   Mid-point   Cum. Freq. Less than   Cum. Freq. More than
15 - 25      5               20          5                      30
25 - 35      3               30          8                      25
35 - 45      7               40          15                     22
45 - 55      5               50          20                     15
55 - 65      3               60          23                     10
65 - 75      7               70          30                     7
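Both cumulative frequency columns are running totals: the 'less than' column accumulates from the first class downwards, the 'more than' column from the last class upwards. A short Python sketch using the standard library's `itertools.accumulate`; the function name `cumulative_frequencies` is an illustrative choice.

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Return ('less than', 'more than') cumulative frequency lists."""
    less_than = list(accumulate(frequencies))
    # Accumulate from the last class backwards, then restore class order.
    more_than = list(accumulate(reversed(frequencies)))[::-1]
    return less_than, more_than

freqs = [5, 3, 7, 5, 3, 7]
less_than, more_than = cumulative_frequencies(freqs)
print(less_than)
print(more_than)
```

The 'less than' values are plotted against upper class boundaries and the 'more than' values against lower class boundaries.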
[Figure: 'less than' ogive. x-axis: Upper Boundary (CI); y-axis: Less than Cumulative Frequency, scale 5 to 30.]
[Figure: 'more than' ogive. x-axis: Lower Boundary (CI); y-axis: More than Cumulative Frequency, scale 10 to 35.]
Session 4
Measures of Central Tendency
Classified statistical data may sometimes be described as distributed around
some value called the central value, or average in some sense. It is the most
representative value of the entire data. Different methods give different central values,
which are referred to as measures of central tendency.
Thus, an important objective of statistical analysis is to determine a single
value that represents the characteristics of the entire raw data. This single value
is called the central value or an average. It is the
point around which all other values of the data cluster. It is therefore known as a
measure of location, and since it lies at the central point nearest to the other
values of the data it is also called a measure of central tendency.
The common measures of central tendency are a) Mean b) Median c) Mode.
These values are very useful not only in presenting an overall picture of the entire data,
but also for making comparisons among two or more sets of data.
Average
Definition
Average is a value which is typical or representative of a set of data.
- Murray R. Spiegel
Average is an attempt to find one single figure to describe whole of figures.
- Clark & Sekkade
From the above definitions it is clear that an average is a typical value of the entire
data and is a measure of central tendency.
Functions of an average
To represent complex or large data.
To facilitate comparative study of two variables.
To help study a population from sample data.
To help in decision making.
To represent a series of data by a single value.
To establish mathematical relationships.
Characteristics of a typical average
It should be rigidly defined and easily understandable.
It should be simple to compute and in the form of mathematical formula.
It should be based on all the items in the data.
It should not be unduly influenced by any single item.
It should be capable of further mathematical treatment.
It should have sampling stability.
Types of average
Average or measures of central tendency are of following types.
1. Mathematical average
a. Arithmetical mean
i. Simple mean
ii. Weighted mean
b. Geometric mean
c. Harmonic mean
2. Positional Averages
a. Median
b. Mode
Arithmetic mean
The arithmetic mean is also called the arithmetic average. It is the most commonly used
measure of central tendency. The arithmetic average of a series is the value obtained by
dividing the total value of the items by their number.
Arithmetic averages are of two types
a. Simple arithmetic average
b. Weighted arithmetic average
Simple arithmetic average (Mean)
The arithmetic mean is sometimes referred to simply as the mean. Ex: Mean income,
mean expenses, mean marks etc.
Unlike other averages, the mean has to be computed by considering each and every
observation in the series. Hence, the mean cannot be found by inspection or
observation of the items.
The simple arithmetic mean is equal to the sum of the values of the variable divided by
the number of observations in the sample.
Let $x_i$ be a variable that takes the values $x_1, x_2, x_3, \ldots, x_n$ over n items. Then the
arithmetic mean, or simply the mean, of x, denoted by a bar over the variable x, is given by
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n} \]
where $\sum$ (the Greek letter sigma) denotes the summation of all $x_i$ values.
For individual observations, the arithmetic mean can be computed by the following
two methods.
a. Direct method
b. Short cut method.
The direct method uses the above equation; the steps of the short cut method are
illustrated in the subsequent topic.
Ex: (For Direct Method)
1. Calculate the mean for the following data.
Marks obtained by 6 students are given below:
20, 15, 23, 22, 25, 20.
\[ \text{Mean marks } \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{20 + 15 + 23 + 22 + 25 + 20}{6} = \frac{125}{6} = 20.83 \]
2. The income of a departmental store for six months is given below. Find the mean
income of the store.
Month          Jan     Feb     Mar     Apr     May     June
Income (Rs.)   25000   30000   45000   20000   25000   20000
n = Total No. of items (observations) = 6
Total income = $\sum x_i$ = (25000 + 30000 + 45000 + 20000 + 25000 + 20000)
= 165000
\[ \text{Mean income} = \frac{\sum x_i}{n} = \frac{165000}{6} = \text{Rs. } 27500 \]
The above example shows that when the data set is large or contains large figures,
the computation required to get the mean is heavy. To reduce the computation one
can use the short-cut method, illustrated below.
Shortcut method
The steps of this method are given below.
Step 1: Assume any one value as the mean; this is called the arbitrary average (A).
Step 2: Find the difference (deviation) of each value from the arbitrary average:
$d = x_i - A$
Step 3: Add all the deviations (differences) to get $\sum d$.
Step 4: Use the following equation to compute the mean value.
\[ \bar{x} = A + \frac{\sum d}{n} \]
n = Total No. of observations
$\sum d$ = Total deviation value
A = Arbitrary mean
Example: Find the mean marks obtained by the students for the following data.
20 25 20 22 20 21 23 25 22 18
Let A = 20 and n = 10
Marks   d = (x_i - 20)
20      0
25      5
20      0
22      2
20      0
21      1
23      3
25      5
22      2
18      -2
        $\sum d$ = 16
\[ \bar{x} = A + \frac{\sum d}{n} = 20 + \frac{16}{10} = 20 + 1.6 \]
Mean marks $\bar{x}$ = 21.6
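The direct and short-cut methods give the same mean, since the short-cut method only shifts the origin to the assumed mean A. A minimal Python sketch of both, applied to the marks data of the example; the function names are illustrative and not part of the text.

```python
def mean_direct(values):
    """Direct method: sum of the observations divided by their number."""
    return sum(values) / len(values)

def mean_shortcut(values, assumed_mean):
    """Short-cut method: A + (sum of deviations from A) / n."""
    total_deviation = sum(x - assumed_mean for x in values)
    return assumed_mean + total_deviation / len(values)

marks = [20, 25, 20, 22, 20, 21, 23, 25, 22, 18]
print(mean_direct(marks))
print(mean_shortcut(marks, assumed_mean=20))
```

Any value may be chosen as the assumed mean; a value close to the centre of the data simply keeps the deviations small for hand calculation.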
1. Mathematical characteristics of mean
a. The algebraic sum of the deviations of all observations from their arithmetic mean is
zero, i.e. $\sum (x_i - \bar{x}) = 0$.
b. The sum of squared deviations of the items from the mean is a minimum, that is,
less than the sum of squared deviations of the items from any other value:
$\sum d^2$ = minimum, where $d = x_i - \bar{x}$.
c. Since $\bar{x} = \frac{\sum x}{n}$, if any two of the three quantities $\bar{x}$, $\sum x$ and n are given, the third can be computed.
d. If all the items of a set are increased / decreased by a constant value, the
arithmetic mean also increases / decreases by the same constant.
2. Weighted arithmetic mean
The weighted mean is computed by considering the relative importance of each
value to the total. The simple arithmetic mean gives equal importance to all the items
of a distribution. In certain cases the relative importance of the items is not the same. To
reflect this, a weight may be given to each variable depending on the case. Thus,
the weights represent the relative importance of the items.
The weighted arithmetic mean is computed by the following equation.
Let $x_1, x_2, x_3, \ldots, x_n$ be the variables and $w_1, w_2, w_3, \ldots, w_n$ be the respective
weights assigned. Then the weighted mean $\bar{x}_w$ is given by
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum xw}{\sum w} \]
i.e., the weighted average is the ratio of the sum of the products of the values and their
respective weights to the sum of the weights.
Ex: Compute the simple and weighted arithmetic means and comment on them.
Designation        Monthly salary (Rs) (x)   Strength of cadre (w)   xw
General Manager    25000                     10                      250000
Managers           19000                     20                      380000
Supervisors        14000                     10                      140000
Office Assistant   10000                     50                      500000
Helpers            8000                      25                      200000
(N = 5) Total      Σx = 76000                Σw = 115                Σxw = 1470000
a. Simple arithmetic mean = $\frac{\sum x}{N} = \frac{76000}{5}$ = Rs. 15200
b. Weighted arithmetic mean = $\frac{\sum xw}{\sum w} = \frac{1470000}{115}$ = Rs. 12782.6
In this example, the simple arithmetic mean does not account for the number of staff in
each salary range; every salary is given equal importance. The salaries of the General
Manager and the Managers have inflated the value of the simple mean. The weighted mean
gives importance to the number of persons in each salary range.
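The salary example can be reproduced in a few lines of Python. This is a sketch of the formula for the weighted mean, with the strength of each cadre used as the weight; the function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

salaries = [25000, 19000, 14000, 10000, 8000]   # monthly salary (x)
strength = [10, 20, 10, 50, 25]                 # strength of cadre (w)

simple = sum(salaries) / len(salaries)
weighted = weighted_mean(salaries, strength)
print(simple, round(weighted, 1))
```

With equal weights the weighted mean reduces to the simple mean, which is why the two differ only when the cadre strengths differ.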
Ex: Comment on the performance of students of the two universities given below.
           Bombay                                                Madras
Course     % of pass (x)   No. of students ('000) (w)   wx      % of pass (x)   No. of students (w)   wx
MBA        71              3                            213     81              5                     405
MCA        83              2                            166     76              3                     228
MA         73              5                            365     58              3                     174
M.Sc.      75              2                            150     76              1                     76
M.Com.     70              2                            140     81              2                     162
Total      Σx = 372        Σw = 14                      Σwx = 1034   Σx = 372   Σw = 14               Σwx = 1045
a. Since $\sum x$ is the same, the simple arithmetic average is the same for both universities:
$\frac{\sum x}{N} = \frac{372}{5} = 74.4$
b. Weighted mean for Bombay University = $\frac{\sum wx}{\sum w} = \frac{1034}{14} = 73.86$
c. Weighted mean for Madras University = $\frac{\sum wx}{\sum w} = \frac{1045}{14} = 74.64$
Comment: The performance of Madras University students is better than that of Bombay
University students.
Discrete Series
In a discrete series, each value is multiplied by its frequency, the products are
totalled, and the total is divided by the total of the frequencies to obtain the
arithmetic mean. This can be done by either the direct method or the short cut
method.
Ex: Calculate the mean for the following data.
Value (x)       1    2    3    4   5
Frequency (f)   10   15   10   9   5
Steps:
1. Multiply each value by its frequency to get fx.
2. Add all the frequencies ($\sum f = N$).
3. Use the formula $\bar{x} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N}$ to get the mean value.
Solution:
By direct method
Value (x)   Frequency (f)   fx
1           10              10
2           15              30
3           10              30
4           9               36
5           5               25
            Σf = 49         Σfx = 131
\[ \bar{x} = \frac{\sum fx}{N} = \frac{131}{49} = 2.67 \]
By short-cut method
Let A = 3 (assumed mean = 3)
Value (x)   Frequency (f)   d = (x - A)   fd
1           10              -2            -20
2           15              -1            -15
3           10              0             0
4           9               1             9
5           5               2             10
            Σf = 49                       Σfd = -16
\[ \bar{x} = A + \frac{\sum fd}{N} = 3 + \left(\frac{-16}{49}\right) = 2.67 \]
Continuous series
In a continuous frequency distribution, the individual value of each item is not
known. The mid-points of the class intervals are therefore written down to replace the
class intervals. In a continuous series the mean can be calculated by any of the
following methods.
a. Direct method
b. Short cut method
c. Step deviation method
a. Direct method
The steps of this method are as follows.
1. Find the mid value of each class.
Ex: For a class interval 20-30, the mid value is $\frac{20 + 30}{2} = \frac{50}{2} = 25$. The mid value is
denoted by m.
2. Multiply the mid value m by the frequency f of each class and sum up to get
$\sum fm$.
3. Use the formula $\bar{x} = \frac{\sum fm}{N}$, where $N = \sum f$, to get the mean value.
Ex: Compute the mean for the following data.
Age group (CI)   No. of persons (f)   Mid point (m)   fm
0 - 10           5                    5               25
10 - 20          15                   15              225
20 - 30          25                   25              625
30 - 40          8                    35              280
40 - 50          7                    45              315
Total            Σf = 60 = N                          Σfm = 1470
\[ \text{Mean age } \bar{x} = \frac{\sum fm}{\sum f} = \frac{\sum fm}{N} = \frac{1470}{60} = 24.5 \]
b. Short cut method
The steps of this method are described below.
1. Find the mid value of each class.
2. Assume any one of the mid values as the arbitrary average (A).
3. Find the deviation d = m - A of each mid value from A.
4. Multiply the deviation d by the frequency f of each class and sum up to get $\sum fd$.
5. Using the formula $\bar{x} = A + \frac{\sum fd}{N}$, find the mean value.
Ex: Find the mean age of patients visiting a hospital on a particular day using the
following data.
Age group (CI)   No. of patients (f)   Mid value (m)   d = (m - 25)   fd
0 - 10           5                     5               -20            -100
10 - 20          15                    15              -10            -150
20 - 30          25                    25              0              0
30 - 40          8                     35              10             80
40 - 50          7                     45              20             140
Total            Σf = 60 = N                                          Σfd = -30
Let arbitrary average = A = 25
\[ \text{Mean age } \bar{x} = A + \frac{\sum fd}{N} = 25 + \left(\frac{-30}{60}\right) = 25 - \frac{1}{2} = 24.5 \]
c. Step deviation method
In this method, after finding the deviations from the arbitrary mean, they are divided
by a common factor. Scaling down the deviations by a step reduces the calculation to a
minimum. The procedure is described below.
Steps of step deviation method
1. Find out the mid value m.
2. Select the arbitrary mean A.
3. Find the deviation (d) of the mid value of each class from A.
4. Divide the deviations d by a common factor C to get d' = d / C.
5. Multiply d' of each class by its frequency f to get fd', and sum up over all classes to
get $\sum fd'$.
6. Using the formula $\bar{x} = A + \frac{\sum fd'}{N} \times C$ (where C is the common factor), calculate the
mean value.
Ex: Find the mean age for the following data.
Age (CI)   No. of persons (f)   Mid value (m)   d = m - A = m - 25   d' = d / 10   fd'
0 - 10     5                    5               -20                  -2            -10
10 - 20    15                   15              -10                  -1            -15
20 - 30    25                   25              0                    0             0
30 - 40    8                    35              10                   1             8
40 - 50    7                    45              20                   2             14
Total      Σf = 60 = N                                                             Σfd' = -3
Let A = 25 and C = 10
\[ \bar{x} = A + \frac{\sum fd'}{N} \times C = 25 + \frac{(-3)}{60} \times 10 = 25 - \frac{1}{2} = 24.5 \]
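The direct and step deviation methods for a continuous series can be checked against each other in code. The Python sketch below works on the class mid values of the age example; the function names, and the parameter names `assumed_mean` (A) and `step` (C), are illustrative.

```python
def grouped_mean_direct(mid_values, frequencies):
    """Direct method: sum(f * m) / N."""
    return sum(f * m for m, f in zip(mid_values, frequencies)) / sum(frequencies)

def grouped_mean_step_deviation(mid_values, frequencies, assumed_mean, step):
    """Step deviation method: A + (sum(f * d') / N) * C, with d' = (m - A) / C."""
    n = sum(frequencies)
    total = sum(f * (m - assumed_mean) / step for m, f in zip(mid_values, frequencies))
    return assumed_mean + total / n * step

mids = [5, 15, 25, 35, 45]     # mid values of the classes 0-10, ..., 40-50
freqs = [5, 15, 25, 8, 7]
print(grouped_mean_direct(mids, freqs))
print(grouped_mean_step_deviation(mids, freqs, assumed_mean=25, step=10))
```

Setting `step=1` turns the second function into the short cut method, so all three methods of the text are covered by these two functions.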
Session 5
Measures of Central Tendency
Combined Mean
The combined arithmetic mean can be computed if we know the mean and the number
of items in each group of the data.
The following equation is used to compute the combined mean.
Let $\bar{x}_1$ and $\bar{x}_2$ be the means of the first and second groups of data, containing $N_1$ and $N_2$
items respectively.
Then, combined mean
\[ \bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} \]
If there are 3 groups then
\[ \bar{x}_{123} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2 + N_3 \bar{x}_3}{N_1 + N_2 + N_3} \]
Ex - 1:
a) Find the mean for the entire group of workers for the following data.
                 Group 1   Group 2
Mean wages       75        60
No. of workers   1000      1500
Given data: $N_1 = 1000$, $N_2 = 1500$, $\bar{x}_1 = 75$ and $\bar{x}_2 = 60$
\[ \text{Group mean } \bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} = \frac{1000 \times 75 + 1500 \times 60}{1000 + 1500} = \text{Rs. } 66 \]
Ex - 2: Compute the mean for the entire group.
Medical examination   No. examined   Mean weight (pounds)
A                     50             113
B                     60             120
C                     90             115
Combined mean (grouped mean weight)
\[ \bar{x}_{123} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2 + N_3 \bar{x}_3}{N_1 + N_2 + N_3} = \frac{50 \times 113 + 60 \times 120 + 90 \times 115}{50 + 60 + 90} \]
Mean weight $\bar{x}_{123}$ = 116 pounds
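The combined mean is a weighted mean of the group means, with the group sizes as weights. A minimal Python sketch covering any number of groups; the function name `combined_mean` is illustrative.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups: sum(N_i * mean_i) / sum(N_i)."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)

# Ex - 1: mean wages of two groups of workers.
print(combined_mean([75, 60], [1000, 1500]))
# Ex - 2: mean weights from three medical examinations.
print(combined_mean([113, 120, 115], [50, 60, 90]))
```

The combined mean always lies between the smallest and largest group means and is pulled towards the larger groups.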

Merits of Arithmetic Mean
1. It is simple and easy to compute.
2. It is rigidly defined.
3. It can be used for further calculation.
4. It is based on all observations in the series.
5. It helps for direct comparison.
6. It is more stable measure of central tendency (ideal average).
Limitations / Demerits of Mean
1. It is unduly affected by extreme items.
2. It is sometimes unrealistic.
3. It may lead to confusion.
4. It is suitable only for quantitative data (for variables).
5. It cannot be located by graphical method or by observation.
Geometric Mean (GM)
The GM is the n-th root of the product of the n quantities of the series. It is obtained by
multiplying the values of the items together and extracting the root of the product
corresponding to the number of items. Thus, the square root of the product of two items
and the cube root of the product of three items are their geometric means.
The geometric mean is never larger than the arithmetic mean. If there are zeros
or negative numbers in the series, the geometric mean cannot be used. Logarithms can be
used to find the geometric mean, to reduce large numbers and to save time.
In the field of business management, problems often arise relating to the
average percentage rate of change over a period of time. In such cases the arithmetic
mean is not an appropriate average to employ, and the geometric mean can be used
instead. GMs are also highly useful in the construction of index numbers.
\[ \text{GM} = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} \]
When the number of items in the series is larger than 3, computing the GM directly
is difficult. To overcome this, the logarithm of each value is obtained.
The logs of all the values are added up and divided by the number of items. The antilog
of the quotient obtained is the required GM.
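\[ \text{GM} = \text{Antilog}\left[\frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}\right] = \text{Antilog}\left[\frac{\sum_{i=1}^{n} \log x_i}{N}\right] \]
where N = n is the number of items.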
Merits of GM
a. It is based on all the observations in the series.
b. It is rigidly defined.
c. It is best suited for averages and ratios.
d. It is less affected by extreme values.
e. It is useful for studying social and economics data.
Demerits of GM
a. It is not simple to understand.
b. It requires computational skill.
c. GM cannot be computed if any item is zero or negative.
d. It has restricted application.
Ex - 1:
a. Find the GM of data 2, 4, 8
$x_1 = 2$, $x_2 = 4$, $x_3 = 8$, n = 3
\[ \text{GM} = \sqrt[n]{x_1 \times x_2 \times x_3} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4 \]
b. Find GM of data 2, 4, 8 using logarithms.
Data: $x_1 = 2$, $x_2 = 4$, $x_3 = 8$, N = 3
x    log x
2    0.301
4    0.602
8    0.903
     Σ log x = 1.806
\[ \text{GM} = \text{Antilog}\left[\frac{\sum \log x}{N}\right] = \text{Antilog}\left[\frac{1.806}{3}\right] = \text{Antilog}(0.6020) \approx 3.999 \]
GM ≈ 4
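Both routes to the geometric mean, the n-th root of the product and the antilog of the mean logarithm, can be written directly in Python. This is a sketch with illustrative function names; `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean_product(values):
    """n-th root of the product of the values (all must be positive)."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_logs(values):
    """Antilog of the mean of the base-10 logarithms."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log

data = [2, 4, 8]
print(geometric_mean_product(data))
print(geometric_mean_logs(data))
```

The small gap between a hand result such as 3.999 and the exact value 4 comes only from rounding the logarithms to three decimal places.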
Ex - 2:
Compared with the previous year, overhead (OH) expenses went up by
32% in 2003, by a further 40% in the next year, and by 50% in the
following year. Calculate the average increase in overhead expenses.
Let the OH expenses of each preceding year be 100%.
Year   OH Expenses (x)   log x
2002   Base year
2003   132               2.121
2004   140               2.146
2005   150               2.176
                         Σ log x = 6.443
\[ \text{GM} = \text{Antilog}\left[\frac{\sum \log x}{N}\right] = \text{Antilog}\left[\frac{6.443}{3}\right] = \text{Antilog}(2.1477) \]
GM ≈ 140.5, i.e. an average increase of about 40.5% per year.
GM for discrete series
With the usual notation, the GM for a discrete series is given by
\[ \text{GM} = \text{Antilog}\left[\frac{\sum_{i=1}^{N} \log x_i}{N}\right] \]
Ex - 3:
Consider the following time series of monthly sales of ABC company for 4
months. Find the average rate of change in monthly sales.
Month   Sales
I       10000
II      8000
III     12000
IV      15000
Express each month's sales as a percentage of the previous month's sales (previous month = 100%).
Solution:
Month   Sales (Rs)   Increase / decrease   Sales as % of previous month (x)   log (x)
I       10000        (base)
II      8000         - 20%                 80                                 1.903
III     12000        + 50%                 150                                2.176
IV      15000        + 25%                 125                                2.097
                                                                              Σ log x = 6.176
\[ \text{GM} = \text{Antilog}\left[\frac{6.176}{3}\right] = \text{Antilog}(2.0587) = 114.46 \]
Average rate of change in sales = 114.46 - 100 = 14.46% increase per month
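The average rate of change can be checked by taking the geometric mean of the month-to-month growth factors (each month's sales divided by the previous month's). The Python sketch below does this for the ABC company figures; the function name `average_rate_of_change` is illustrative.

```python
import math

def average_rate_of_change(series):
    """Average percentage change per period via the geometric mean of growth factors."""
    factors = [later / earlier for earlier, later in zip(series, series[1:])]
    gm = math.prod(factors) ** (1 / len(factors))
    return (gm - 1) * 100

sales = [10000, 8000, 12000, 15000]
rate = average_rate_of_change(sales)
print(round(rate, 2))
```

The result is about 14.47% per month, close to the 14.46% quoted in the example, and compounding it over three months carries the first month's sales of 10000 to the last month's 15000. That compounding check is what makes the geometric mean, rather than the arithmetic mean, the right average here.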
Ex - 4: Find GM for the following data.
Marks (x)   No. of students (f)   log x   f log x
130         3                     2.113   6.339
135         4                     2.130   8.520
140         6                     2.146   12.876
145         6                     2.161   12.966
150         3                     2.176   6.528
            Σf = N = 22                   Σ f log x = 47.23
\[ \text{GM} = \text{Antilog}\left[\frac{\sum f \log x}{N}\right] = \text{Antilog}\left[\frac{47.23}{22}\right] \]
GM ≈ 140.2
Geometric Mean for continuous series
Steps:
1. Find the mid value m of each class and take log m for each mid value.
2. Multiply log m by the frequency f of each class to get f log m, and sum up to obtain
$\sum f \log m$.
3. Divide $\sum f \log m$ by N and take the antilog to get the GM.
Ex: Find the GM for the data given below.
Yield of wheat in MT   No. of farms (f)   Mid value (m)   log m   f log m
1 - 10                 3                  5.5             0.740   2.220
11 - 20                16                 15.5            1.190   19.040
21 - 30                26                 25.5            1.406   36.556
31 - 40                31                 35.5            1.550   48.050
41 - 50                16                 45.5            1.658   26.528
51 - 60                8                  55.5            1.744   13.952
                       Σf = N = 100                               Σ f log m = 146.346
\[ \text{GM} = \text{Antilog}\left[\frac{\sum f \log m}{N}\right] = \text{Antilog}\left[\frac{146.346}{100}\right] \]
GM = 29.07
Harmonic Mean
The harmonic mean is the total number of items divided by the sum of the reciprocals of
the values of the variable. It is a specialised average, used for problems involving
variables expressed as rates, such as quantities per unit of time.
Ex: Speed in km/hr, min/day, price/unit.
The Harmonic Mean (HM) is suitable only when the time factor is variable and the act being
performed remains constant.
\[ \text{HM} = \frac{N}{\sum \frac{1}{x}} \]
Merits of Harmonic Mean


1. It is based on all observations.
2. It is rigidly defined.
3. It is suitable in case of series having wide dispersion.
4. It is suitable for further mathematical treatment.
Demerits of Harmonic Mean
1. It is not easy to compute.
2. It cannot be used when any one of the items is zero.
3. It cannot represent distribution.
Ex:
1. The daily incomes of 5 families in a rural village are given below. Compute the
HM.
Family   Income (x)   Reciprocal (1/x)
1        85           0.01176
2        90           0.01111
3        70           0.01429
4        50           0.02000
5        60           0.01667
                      Σ(1/x) = 0.07383
\[ \text{HM} = \frac{N}{\sum \frac{1}{x}} = \frac{5}{0.07383} = 67.72 \]
HM = 67.72
2. A man travels by car for 3 days, covering 480 km each day. On the first day he
drives for 10 hrs at 48 KMPH, on the second day for 12 hrs at 40 KMPH, and on
the 3rd day for 15 hrs at 32 KMPH. Compute the HM and the weighted
mean and compare them.
Data:
10 hrs @ 48 KMPH
12 hrs @ 40 KMPH
15 hrs @ 32 KMPH
Harmonic Mean
x    1/x
48   0.02083
40   0.02500
32   0.03125
     Σ(1/x) = 0.07708
\[ \text{HM} = \frac{N}{\sum \frac{1}{x}} = \frac{3}{0.07708} = 38.92 \]
Weighted Mean
w    x    wx
10   48   480
12   40   480
15   32   480
Σw = 37   Σwx = 1440
\[ \text{Weighted mean } \bar{x} = \frac{\sum wx}{\sum w} = \frac{1440}{37} = 38.92 \]
The HM and the weighted mean (weighted by the hours driven) are the same.
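The agreement between the harmonic mean and the time-weighted mean in the car example is not a coincidence: when the same distance is covered at each speed, both equal total distance divided by total time. A short Python sketch, with illustrative function names:

```python
def harmonic_mean(values):
    """Number of items divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)

def weighted_mean(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

speeds = [48, 40, 32]   # km per hour on each day
hours = [10, 12, 15]    # time driven each day; 480 km covered every day

hm = harmonic_mean(speeds)
wm = weighted_mean(speeds, hours)
print(round(hm, 2), round(wm, 2))
```

Both values are 1440 km / 37 hrs, the overall average speed for the trip.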
3. Find HM for the following data.
Class (CI)   Frequency (f)   Mid point (m)   Reciprocal (1/m)   f(1/m)
0 - 10       5               5               0.2000             1.0000
10 - 20      15              15              0.0667             1.0000
20 - 30      25              25              0.0400             1.0000
30 - 40      8               35              0.0286             0.2286
40 - 50      7               45              0.0222             0.1556
             Σf = 60                                            Σ f(1/m) = 3.3842
\[ \text{HM} = \frac{N}{\sum f \left(\frac{1}{m}\right)} = \frac{60}{3.3842} \]
HM = 17.73
Relationship between Mean, Geometric Mean and Harmonic Mean.
1. If all the items in a variable are the same, the arithmetic mean, geometric mean and
harmonic mean are equal, i.e. $\bar{x} = \text{GM} = \text{HM}$.
2. If the sizes vary, the mean will be greater than the GM and the GM will be greater than
the HM. This is because the geometric mean gives larger weight to smaller
items, and the HM gives the largest weight to the smallest items.
Hence, $\bar{x} > \text{GM} > \text{HM}$.
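The relationship can be checked numerically with Python's standard `statistics` module (version 3.8 or later, which provides `geometric_mean` and `harmonic_mean`). The two data sets below are illustrative: one with identical items and one with varying items.

```python
from statistics import mean, geometric_mean, harmonic_mean

equal = [5, 5, 5]
varied = [2, 4, 8]

for data in (equal, varied):
    am, gm, hm = mean(data), geometric_mean(data), harmonic_mean(data)
    print(data, am, round(gm, 4), round(hm, 4))
```

For the identical items all three averages coincide; for 2, 4, 8 the arithmetic mean (about 4.67) exceeds the GM (4), which exceeds the HM (about 3.43).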
Median
Median is the value of that item in a series which divides the array into two
equal parts, one consisting of all the values less than it and the other consisting of all the
values more than it. Median is a positional average. The number of items below it is
equal to the number of items above it. It occupies the central position.
Thus, the median is defined as the mid value of the variates. If the values are
arranged in ascending or descending order of magnitude, the median is the middle
value if the number of variates is odd, and the average of the two middle values if the
number of variates is even.
Ex: If 9 students stand in order of their heights, the 5th student from either side
is the one whose height is the median height of the group. Thus, the
median of a group is given by
\[ \text{Median} = \left[\frac{N + 1}{2}\right]^{\text{th}} \text{ item} \]
Ex
1. Find the median for the following data.
22 20 25 31 26 24 23
Arrange the given data in array form (either in ascending or descending order).
20 22 23 24 25 26 31
Median is given by the $\left[\frac{N+1}{2}\right]$th item $= \left[\frac{7+1}{2}\right] = \frac{8}{2}$
Median = 4th item = 24.
2. Find the median for the following data.
20 21 22 24 28 32
Median is given by the $\left[\frac{N+1}{2}\right]$th item $= \left[\frac{6+1}{2}\right]$
Median = 3.5th item.
The item lies between the 3rd and 4th items.
So, there are two values, 22 and 24.
The median value is the mean of these two values.
Median $= \left[\frac{22+24}{2}\right] = 23$
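The two cases above (odd and even N) can be sketched in Python as follows. The helper name `median` is illustrative; it sorts the data into an array first, as the examples do.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # the (N + 1) / 2 th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle items


print(median([22, 20, 25, 31, 26, 24, 23]))  # 24
print(median([20, 21, 22, 24, 28, 32]))      # 23.0
```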
Discrete Series Median
In a discrete series, the values are already in the form of an array and the
frequencies are recorded against each value. However, to determine the size of the median,
i.e. the $\left[\frac{N+1}{2}\right]$th item, a separate column is to be prepared for cumulative frequencies. The
median item is first located with reference to the cumulative frequency which first covers
that position. Then the value against that cumulative frequency is the median.
Ex: Find the median of the marks obtained by students in statistics.
Marks (x)   No. of students (f)   Cumulative frequency
10          5                     5
20          5                     10
30          3                     13
40          15                    28
50          30                    58
60          10                    68
N = 68
The median is the $\frac{68+1}{2}$ = 34.5th item. The cumulative frequency just above 34 is 58. Against
c.f. 58 the value is 50, which is the median value.
Ex: In a class of 15 students, 5 students failed in a test. The marks of the 10 students
who passed were 9, 6, 7, 8, 9, 6, 5, 4, 7, 8. Find the median marks of the 15 students.
Marks             No. of students (f)   cf
0 to 3 (failed)   5                     5
4                 1                     6
5                 1                     7
6                 2                     9
7                 2                     11
8                 2                     13
9                 2                     15
$\sum f = 15$
Median = size of the $\frac{N+1}{2}$th item
Me $= \frac{15+1}{2}$ = 8th item
The 8th item is covered by cf 9. The marks against cf 9 are 6, and hence
Median = 6
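A minimal Python sketch of the cumulative-frequency lookup for a discrete series follows. The function name `discrete_median` is an illustrative choice, and the rule "first cumulative frequency that covers the (N + 1)/2 th position" is taken from the procedure described for discrete series; the five failed students are represented by the single mark 0, which is an assumption that does not affect the result.

```python
def discrete_median(values, freqs):
    """Return the value whose cumulative frequency first covers position (N + 1) / 2."""
    n = sum(freqs)
    position = (n + 1) / 2
    cumulative = 0
    for x, f in sorted(zip(values, freqs)):
        cumulative += f
        if cumulative >= position:
            return x


# Marks in statistics: N = 68, position 34.5, covered by cf 58
print(discrete_median([10, 20, 30, 40, 50, 60], [5, 5, 3, 15, 30, 10]))  # 50

# Class of 15 students: N = 15, position 8, covered by cf 9
print(discrete_median([0, 4, 5, 6, 7, 8, 9], [5, 1, 1, 2, 2, 2, 2]))     # 6
```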
Continuous Series
The procedure for finding the median is different in a continuous series. The class
intervals are already in the form of an array and the frequencies are recorded against each
class interval. For determining the position, we take the $\frac{N}{2}$th item, and the median class is
located accordingly with reference to the cumulative frequency which first covers that
position. When the median class is located, the median value is interpolated using the
formula given below.
Median $= l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
where $l = \frac{l_0 + l_1}{2}$, $l_0$ is the left end point of the N/2 (median) class and $l_1$ is the right end
point of the previous class,
h = class width, f = frequency of the median class,
C = cumulative frequency of the class preceding the median class.
Ex: Find the median for the following data. The marks obtained by 50 students are as
follows.
CI        Frequency (f)   Cum. frequency (cf)
10 - 15   6               6
15 - 20   18              24
20 - 25   9               33   (N/2 class)
25 - 30   10              43
30 - 35   4               47
35 - 40   3               50
$\sum f = N = 50$
$\frac{N}{2} = \frac{50}{2} = 25$
The cum. frequency just above 25 is 33 and hence 20 - 25 is the median class.
$l = \frac{l_0 + l_1}{2} = \frac{20 + 20}{2} = 20$
h = 20 - 15 = 5
f = 9
C = 24
Median $= l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
Median $= 20 + \frac{5}{9}[25 - 24] = 20 + \frac{5}{9}$
Median = 20.555
Ex: Find the median for the following data.
Mid values (m)    115 125 135 145 155 165 175 185 195
Frequencies (f)   6 25 48 72 116 60 38 22 3
The difference between successive mid-values and the magnitude of each class interval are the same, i.e.
10. So, half of 10 deducted from and added to each mid-value gives the lower and
upper limits. Thus, the classes are:
115 - 5 = 110 (lower limit)
115 + 5 = 120 (upper limit); similarly, we can get the CI for all mid-values.
CI          Frequency (f)   Cum. frequency (cf)
110 - 120   6               6
120 - 130   25              31
130 - 140   48              79
140 - 150   72              151
150 - 160   116             267   (N/2 class)
160 - 170   60              327
170 - 180   38              365
180 - 190   22              387
190 - 200   3               390
$\sum f = N = 390$
$\frac{N}{2} = \frac{390}{2} = 195$
The cum. frequency just above 195 is 267.
Median class = 150 - 160
$l = \frac{150 + 150}{2} = 150$
f = 116
N/2 = 195
C = 151
h = 10
Median $= l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
Median $= 150 + \frac{10}{116}[195 - 151]$
Median = 153.8
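The interpolation formula for a continuous series can be sketched in Python as below. The function name `grouped_median` and its argument layout (lower class limits, a common class width, and frequencies) are assumptions made for illustration; the sketch presumes continuous classes of equal width.

```python
def grouped_median(lower_limits, width, freqs):
    """Median = l + (h / f) * (N/2 - C) for continuous classes of equal width."""
    half = sum(freqs) / 2
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:       # first class whose cf covers N/2
            return lower + (width / f) * (half - cumulative)
        cumulative += f


# Marks of 50 students
print(round(grouped_median([10, 15, 20, 25, 30, 35], 5, [6, 18, 9, 10, 4, 3]), 3))

# Classes rebuilt from the mid-values 115 ... 195
lowers = list(range(110, 200, 10))
print(round(grouped_median(lowers, 10, [6, 25, 48, 72, 116, 60, 38, 22, 3]), 2))
```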
Merits of Median
a. It is simple, easy to compute and understand.
b. Its value is not affected by extreme values.
c. It is capable of further algebraic treatment.
d. It can be determined by inspection for arrayed data.
e. It can also be found graphically.
f. It indicates the value of the middle item.
Demerits of Median
a. It may not be a representative value as it ignores extreme values.
b. It cannot be determined precisely when its position falls between two values.
c. It is not useful in cases where large weights are to be given to extreme values.
Session 6
Measures of Central Tendency
Mode
It is the value which occurs with the maximum frequency. It is the most typical
or common value, the one that receives the highest frequency. It represents fashion and is often
used in business. Thus, it corresponds to the value of the variable which occurs most
frequently. The modal class of a frequency distribution is the class with the highest
frequency. It is denoted by z.
Mode is the value of the variable which is repeated the greatest number of times in
the series. It is the usual, and not casual, size of item in the series. It lies at the position
of greatest density.
Ex: If we say the modal marks obtained by students in a class test is 42, it means that the
largest number of students have secured 42 marks.
If each observation occurs the same number of times, we can say that there is
no mode. If two observations share the highest frequency, we can say that the series is
bi-modal. If 3 or more observations share the highest frequency, we say
that it is a multi-modal case. When a single observation occurs the most number of
times, we can say it is a uni-modal case.
For grouped data, the mode can be computed by the following equations with the usual
notations.
Mode $= l + \frac{h(f_m - f_1)}{2f_m - f_1 - f_2}$
where,
$l$ = lower boundary of the modal class,
$f_m$ = max frequency (modal class frequency),
$f_1$ = frequency of the class preceding the modal class,
$f_2$ = frequency of the class succeeding the modal class,
h = class width.
or
Mode $= l + \frac{h f_2}{f_1 + f_2}$
Ex:
1. Find the mode for the following data.
Marks (CI)   No. of students (f)
1 - 10       3
11 - 20      16
21 - 30      26
31 - 40      31   (Max. frequency)
41 - 50      16
51 - 60      8
$\sum f = N = 100$
We identify the modal class as the class of maximum frequency, i.e.
31 - 40.
where,
$f_m = 31$
$f_1 = 26$
$f_2 = 16$
h = 10
$l = \frac{30 + 31}{2} = 30.5$
Mode (z) $= l + \frac{h(f_m - f_1)}{2f_m - f_1 - f_2}$
Mode $= 30.5 + \frac{10(31 - 26)}{2 \times 31 - 26 - 16}$
Mode = 33.
Or
Mode $= l + \frac{h f_2}{f_1 + f_2} = 30.5 + \frac{10 \times 16}{26 + 16}$
Mode = 34.31
It can be noted that the second method gives a slightly different value of the
mode.
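Both grouped-mode formulas can be compared with a short Python sketch. The names `grouped_mode` and `grouped_mode_alt` are illustrative; the sketch assumes a single modal class and treats a missing neighbouring class as having frequency 0.

```python
def _modal_parts(lower_boundaries, freqs):
    """Locate the modal class and return (l, fm, f1, f2)."""
    m = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[m - 1] if m > 0 else 0
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0
    return lower_boundaries[m], freqs[m], f1, f2


def grouped_mode(lower_boundaries, width, freqs):
    """Mode = l + h (fm - f1) / (2 fm - f1 - f2)."""
    l, fm, f1, f2 = _modal_parts(lower_boundaries, freqs)
    return l + width * (fm - f1) / (2 * fm - f1 - f2)


def grouped_mode_alt(lower_boundaries, width, freqs):
    """Mode = l + h f2 / (f1 + f2)."""
    l, _, f1, f2 = _modal_parts(lower_boundaries, freqs)
    return l + width * f2 / (f1 + f2)


boundaries = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5]  # lower class boundaries
freqs = [3, 16, 26, 31, 16, 8]
print(grouped_mode(boundaries, 10, freqs))                 # 33.0
print(round(grouped_mode_alt(boundaries, 10, freqs), 2))   # 34.31
```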
Partition values
The median divides a series into two equal parts. There are other values which also divide
the series into equal parts; these are called partition values (PV).
Just as one point divides a series into two equal parts (halves), 3 points divide
it into four parts (quartiles), 9 points divide it into 10 parts (deciles) and 99 points divide it into
100 parts (percentiles). The partition values are useful to know the exact
composition of a series.
Quartiles
A measure which divides an array into four equal parts is known as a quartile.
Each portion contains an equal number of items. The first, second and third points are
termed the first quartile ($Q_1$), second quartile ($Q_2$) and third quartile ($Q_3$). The first
quartile is also known as the lower quartile, as 25% of the observations of the distribution lie below it.
The third quartile is the upper quartile: 75% of the observations lie below it and 25% of the observations lie above it.
Calculation of quartiles
$Q_1$ = size of the $\frac{(N+1)}{4}$th item
$Q_3$ = size of the $\frac{3(N+1)}{4}$th item
$Q_2$ = (median) $= l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
Measures of quartiles
The quartile values are located on a principle similar to that for locating the median
value.
The following table shows the procedure for locating quartiles.
Measure   Individual and discrete series   Continuous series
$Q_1$     $\frac{(N+1)}{4}$th item         $\frac{N}{4}$th item
$Q_2$     $\frac{2(N+1)}{4}$th item        $\frac{2N}{4}$th item
$Q_3$     $\frac{3(N+1)}{4}$th item        $\frac{3N}{4}$th item
Ex - 1: From the following marks find $Q_1$, the median and $Q_3$.
23, 48, 34, 68, 15, 36, 24, 54, 65, 75, 92, 10, 70, 61, 20, 47, 83, 19, 77
Let us arrange the data in array form.
Sl. No.   x
1.        10
2.        15
3.        19
4.        20
5.        23   ($Q_1$)
6.        24
7.        34
8.        36
9.        47
10.       48   ($Q_2$)
11.       54
12.       61
13.       65
14.       68
15.       70   ($Q_3$)
16.       75
17.       77
18.       83
19.       92
a. $Q_1 = \frac{1}{4}(n+1)$th item. Here, n = 19 items.
$Q_1 = \frac{1}{4}(19 + 1) = \frac{1}{4} \times 20$ = 5th item
$Q_1$ = 23
b. $Q_2 = \frac{2}{4}(n+1)$th item
$Q_2 = \frac{2}{4} \times 20$ = 10th item
$Q_2$ = 48
c. $Q_3 = \frac{3}{4}(n+1)$th item
$Q_3 = \frac{3}{4} \times 20$ = 15th item
$Q_3$ = 70
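The positional rule for individual observations can be sketched in Python as follows. The helper names `item_at` and `quartiles` are illustrative. Linear interpolation between neighbouring items is used when the position is fractional; the sketch assumes at least three observations so that every position falls inside the array.

```python
def item_at(ordered, position):
    """Value of the position-th item (1-based); fractional positions interpolate."""
    whole = int(position)
    fraction = position - whole
    value = ordered[whole - 1]
    if fraction > 0:
        value += fraction * (ordered[whole] - ordered[whole - 1])
    return value


def quartiles(values):
    """Q1, Q2, Q3 as the k(N + 1)/4 th items of the sorted data, k = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(item_at(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


marks = [23, 48, 34, 68, 15, 36, 24, 54, 65, 75, 92, 10, 70,
         61, 20, 47, 83, 19, 77]
print(quartiles(marks))  # (23, 48, 70)
```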
Ex - 2: Locate the median and quartiles from the following data.
Size of shoes   4 4.5 5 5.5 6 6.5 7 7.5 8
Frequencies     20 36 44 50 80 30 30 16 14
X     f    cf
4     20   20
4.5   36   56
5     44   100   ($Q_1$)
5.5   50   150
6     80   230   ($Q_2$)
6.5   30   260   ($Q_3$)
7     30   290
7.5   16   306
8     14   320
$N = \sum f = 320$
$Q_1 = \frac{1}{4}(n+1)$th item
$Q_1 = \frac{1}{4} \times 321$
$Q_1$ = 80.25th item
Just above 80.25, the cf is 100. Against cf 100, the value is 5.
$Q_1$ = 5
$Q_2 = \frac{1}{2}(n+1)$th item
$Q_2 = \frac{1}{2} \times 321$ = 160.5th item
Just above 160.5, the cf is 230. Against cf 230, the value is 6.
$Q_2$ = 6
$Q_3 = \frac{3}{4}(n+1)$th item
$Q_3 = \frac{3}{4} \times 321$ = 240.75th item
Just above 240.75, the cf is 260. Against cf 260, the value is 6.5.
$Q_3$ = 6.5
Ex - 3: Compute the quartiles from the following data.
CI              0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency (f)   5 8 7 12 28 20 10 10
First quartile $Q_1 = l + \frac{h}{f}\left[\frac{1}{4}N - C\right]$ and $Q_3 = l + \frac{h}{f}\left[\frac{3}{4}N - C\right]$
and $Q_2$ = Median $= l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
CI      f    cf
0-10    5    5
10-20   8    13
20-30   7    20
30-40   12   32    ($Q_1$)
40-50   28   60    ($Q_2$)
50-60   20   80    ($Q_3$)
60-70   10   90
70-80   10   100
$N = \sum f = 100$
a. First locate $Q_1$, which corresponds to $\frac{1}{4}N = 25$
$l = \frac{30 + 30}{2} = 30$
h = 10
f = 12
C = 20
$Q_1 = l + \frac{h}{f}\left[\frac{1}{4}N - C\right]$
$Q_1 = 30 + \frac{10}{12}[25 - 20]$
$Q_1$ = 34.17
b. Locate $Q_2$ (Median)
$Q_2$ corresponds to N/2 = 50, $l = \frac{40 + 40}{2} = 40$
$Q_2 = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$Q_2 = 40 + \frac{10}{28}[50 - 32]$
$Q_2$ = 46.43
c. $Q_3$ corresponds to $\frac{3}{4}N = 75$, $l = \frac{50 + 50}{2} = 50$
$Q_3 = l + \frac{h}{f}\left[\frac{3}{4}N - C\right]$
$Q_3 = 50 + \frac{10}{20}[75 - 60]$
$Q_3$ = 57.5
Deciles
The deciles divide the arrayed set of variates into ten portions of equal
frequency, and they are sometimes used to characterize the data for some specific
purpose. In this process, we get nine decile values. The fifth decile is nothing but the
median value. We can calculate the other deciles by following the procedure which is used
in computing the quartiles.
Formula to compute deciles:
$D_1 = l + \frac{h}{f}\left(\frac{1}{10}N - C\right)$, $D_2 = l + \frac{h}{f}\left(\frac{2}{10}N - C\right)$, and so on.
Percentiles
Percentile values divide the distribution into 100 parts of equal frequency. In
this process, we get ninety-nine percentile values. The 25th, 50th and 75th percentiles are
nothing but the first quartile, median and third quartile values respectively.
Formula to compute percentiles is given below:
$P_{25} = l + \frac{h}{f}\left(\frac{25}{100}N - C\right)$
$P_{26} = l + \frac{h}{f}\left(\frac{26}{100}N - C\right)$
and so on.
Ex:
Find the 7th decile and the 60th percentile for the given data of patients who visited a hospital on a
particular day.
CI      f    cf
10-20   1    1
20-30   3    4
30-40   11   15
40-50   21   36
50-60   43   79    ($P_{60}$)
60-70   32   111   ($D_7$)
70-80   9    120
$\sum f = N = 120$
a. $D_7 = l + \frac{h}{f}\left(\frac{7}{10}N - C\right)$
$l = \frac{60 + 60}{2} = 60$
$\frac{7}{10}N = 84$
h = 10, f = 32
C = 79
$D_7 = 60 + \frac{10}{32}(84 - 79)$
7th Decile = $D_7$ = 61.562
b. 60th percentile
$P_{60} = l + \frac{h}{f}\left(\frac{60}{100}N - C\right)$
$l = \frac{50 + 50}{2} = 50$
h = 10
f = 43
C = 36
$\frac{60}{100}N = 72$
$P_{60} = 50 + \frac{10}{43}(72 - 36)$
$P_{60}$ = 58.37
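Quartiles, deciles and percentiles of a continuous series all use the same interpolation, so one function can cover them. The sketch below is a hedged illustration: the name `grouped_quantile` and the `(k, parts)` arguments are assumptions, and equal-width continuous classes are presumed.

```python
def grouped_quantile(lower_limits, width, freqs, k, parts):
    """k-th partition value out of `parts`: l + (h / f) * (k N / parts - C)."""
    target = k * sum(freqs) / parts
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f


# Hospital data: 7th decile and 60th percentile
lowers = [10, 20, 30, 40, 50, 60, 70]
patients = [1, 3, 11, 21, 43, 32, 9]
print(grouped_quantile(lowers, 10, patients, 7, 10))              # 61.5625
print(round(grouped_quantile(lowers, 10, patients, 60, 100), 2))  # 58.37

# Quartiles of the earlier continuous series (Ex - 3)
ci = [0, 10, 20, 30, 40, 50, 60, 70]
f = [5, 8, 7, 12, 28, 20, 10, 10]
print([round(grouped_quantile(ci, 10, f, k, 4), 2) for k in (1, 2, 3)])
```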
SOME NUMERICAL EXAMPLES
1. Show that the following distribution is symmetrical about the average. Also show
that the median is mid-way between the lower and upper quartiles.
X           2 3 4 5 6 7 8 9 10
Frequency   2 9 29 57 80 57 29 9 2
To show that the given distribution is symmetrical, the mean, median and mode must
be the same.
To show that the median is mid-way between the lower and upper quartiles, we must have $Q_2 - Q_1 = Q_3 - Q_2$.
Mid-point x   Class interval CI   f    d = (x - 6)   fd    cf
2             1.5 - 2.5           2    -4            -8    2
3             2.5 - 3.5           9    -3            -27   11
4             3.5 - 4.5           29   -2            -58   40
5             4.5 - 5.5           57   -1            -57   97    ($Q_1$ class)
6             5.5 - 6.5           80   0             0     177   ($Q_2$ class)
7             6.5 - 7.5           57   1             57    234   ($Q_3$ class)
8             7.5 - 8.5           29   2             58    263
9             8.5 - 9.5           9    3             27    272
10            9.5 - 10.5          2    4             8     274
N = 274, $\sum fd = 0$
Let A = 6
Mean $= A + \frac{h \sum fd}{N} = 6 + \frac{1 \times 0}{274}$
Mean = 6.
Median
$Q_2 = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$\frac{N}{2} = \frac{274}{2} = 137$
C = 97
$Q_2 = 5.5 + \frac{1}{80}[137 - 97]$
$Q_2$ = 5.5 + 0.5
Median = $Q_2$ = 6.
Mode
Mode $= l + \frac{h(f_m - f_1)}{2f_m - f_1 - f_2}$
Modal class 5.5 - 6.5
Mode $= 5.5 + \frac{1(80 - 57)}{2 \times 80 - 57 - 57}$
Mode = 6.
Since Mean = Mode = Median, the given distribution is symmetrical.
$Q_1$ and $Q_3$ calculation
$Q_1 = l + \frac{h}{f}\left[\frac{N}{4} - C\right]$, with $\frac{N}{4} = 68.5$
$Q_1 = 4.5 + \frac{1}{57}[68.5 - 40]$
$Q_1$ = 5.
$Q_3 = l + \frac{h}{f}\left[\frac{3N}{4} - C\right]$, with $\frac{3N}{4} = 205.5$
$Q_3 = 6.5 + \frac{1}{57}[205.5 - 177]$
$Q_3$ = 7.
Now, $Q_2 - Q_1 = Q_3 - Q_2$
i.e. 6 - 5 = 7 - 6
1 = 1
2. Find the mean for the set of observations given below.
6, 7, 5, 4
$\bar{x} = \frac{\sum x_i}{N} = \frac{30}{5} = 6$
3. Find the mean for the following data.
CI        f    $x_i$   $fx$
1 - 10    3    5.5     16.5
11 - 20   16   15.5    248
21 - 30   26   25.5    663
31 - 40   31   35.5    1100.5
41 - 50   16   45.5    728
51 - 60   8    55.5    444
$N = \sum f = 100$, $\sum fx = 3200$
$\bar{x} = \frac{\sum fx}{N} = \frac{3200}{100}$
$\bar{x} = 32$
4. Find the mean profit of the organisation for the data given below:
Profit CI   f    $x_i$   $fx$
100-200     10   150     1500
200-300     18   250     4500
300-400     20   350     7000
400-500     26   450     11700
500-600     30   550     16500
600-700     28   650     18200
700-800     18   750     13500
$N = \sum f = 150$, $\sum fx = 72900$
Each mid-point is the average of the class limits, e.g. $x_1 = \frac{100 + 200}{2} = \frac{300}{2} = 150$
$\bar{x} = \frac{\sum fx}{N} = \frac{72900}{150}$
$\bar{x} = 486$
Step Deviation Method
$x = a + hd$, $d = \frac{x - a}{h}$
$\bar{x} = a + h\frac{\sum fd}{N}$
a = arbitrary constant (assumed mean)
h = class width
Profit CI   f    $x_i$   d    fd
100-200     10   150     -3   -30
200-300     18   250     -2   -36
300-400     20   350     -1   -20
400-500     26   450     0    0
500-600     30   550     +1   30
600-700     28   650     +2   56
700-800     18   750     +3   54
$N = \sum f = 150$, $\sum fd = 54$
$\bar{x} = a + h\frac{\sum fd}{N} = 450 + 100\left(\frac{54}{150}\right)$
$\bar{x} = 486$
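The direct method and the step deviation method give the same mean, which a small Python sketch can confirm. The function names are illustrative, and the assumed mean `a` and class width `h` are the values used in the profit example.

```python
def grouped_mean(mid_points, freqs):
    """Direct method: mean = sum(f * x) / N."""
    return sum(f * x for x, f in zip(mid_points, freqs)) / sum(freqs)


def step_deviation_mean(mid_points, freqs, a, h):
    """Step deviation method: mean = a + h * sum(f * d) / N with d = (x - a) / h."""
    n = sum(freqs)
    sum_fd = sum(f * (x - a) / h for x, f in zip(mid_points, freqs))
    return a + h * sum_fd / n


mid_points = [150, 250, 350, 450, 550, 650, 750]
freqs = [10, 18, 20, 26, 30, 28, 18]
print(grouped_mean(mid_points, freqs))                        # 486.0
print(step_deviation_mean(mid_points, freqs, a=450, h=100))   # 486.0
```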
5. In an office there are 84 employees and their salaries are given below.
Salary      2430 2590 2870 3390 4720 5160
Employees   4 28 31 16 3 2
1. Find the mean salary of the employees.
2. What is the total salary of the employees?
$\bar{x} = \frac{\sum fx}{N} = \frac{(2430 \times 4) + (2590 \times 28) + (2870 \times 31) + (3390 \times 16) + (4720 \times 3) + (5160 \times 2)}{84}$
$\bar{x} = \frac{249930}{84}$ = Rs. 2975.36
1. $\bar{x}$ = Rs. 2975.36
2. Total salary = Rs. 2,49,930
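6. The average marks secured by 36 students was 52, but it was discovered that an item
64 was misread as 46. Find the correct mean of the marks.
$\bar{x} = \frac{\sum fx}{N}$, so $52 = \frac{\sum fx}{36}$
$\sum fx = 52 \times 36 = 1872$
Correct $\sum fx$ = $\sum fx$ - incorrect + correct
Correct $\sum fx$ = 1872 - 46 + 64 = 1890
Correct $\bar{x} = \frac{\text{correct } \sum fx}{N} = \frac{1890}{36}$
$\bar{x} = 52.5$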
7. The mean of 100 items is 46. Later it was discovered that an item 16 was misread as
61, another item 43 was misread as 34, and also that the total number of
items is 90, not 100. Find the correct mean value.
$\bar{x} = \frac{\sum fx}{N}$, so $46 = \frac{\sum fx}{100}$
$\sum fx = 4600$
Correct $\sum fx$ = $\sum fx$ - incorrect + correct
= 4600 - 61 - 34 + 16 + 43
= 4564
Correct $\bar{x} = \frac{\text{correct } \sum fx}{N} = \frac{4564}{90}$
= 50.71
8. Calculate the mean for the following data.
Value   Frequency
< 10    4
< 20    10
< 30    15
< 40    25
< 50    30
CI      f    m (mid point)   fm
0-10    4    5               20
10-20   10   15              150
20-30   15   25              375
30-40   25   35              875
40-50   30   45              1350
$\sum f = 84$, $\sum fm = 2770$
$\bar{x} = \frac{\sum fm}{N} = \frac{2770}{84}$
$\bar{x} = 32.97$
9. For the given frequency table, find the missing frequencies. The average number of accidents is
1.46.
No. of accidents   Frequency
0                  46
1                  ?
2                  ?
3                  25
4                  10
5                  5
No. of accidents (x)   Frequency (f)   fx
0                      46              0
1                      $f_1$           $f_1$
2                      $f_2$           $2f_2$
3                      25              75
4                      10              40
5                      5               25
N = 200, $\sum fx = 140 + f_1 + 2f_2$
$1.46 = \frac{140 + f_1 + 2f_2}{200}$
$292 = 140 + f_1 + 2f_2$
$f_1 + 2f_2 = 152$ ----(1)
w.k.t. $N = \sum f$
$200 = 86 + f_1 + f_2$
$f_1 + f_2 = 114$ ----(2)
Subtracting (2) from (1):
$f_2 = 38$
Substituting in (2):
$f_1 = 114 - 38$
$f_1 = 76$
10. Find the missing value of the variate for the following data, whose mean is 31.87.
$x_i$   f    fx
12      8    96
20      16   320
27      48   1296
33      90   2970
x       30   30x
54      8    432
N = 200, $\sum fx = 5114 + 30x$
$\bar{x} = 31.87$
$\bar{x} = \frac{\sum fx}{N}$, so $31.87 = \frac{\sum fx}{200}$
$\sum fx = 6374$ ----(1)
$\sum fx = 5114 + 30x$ ----(2)
(1) = (2)
6374 = 5114 + 30x
6374 - 5114 = 30x
30x = 1260
x = 42.
11. The average rainfall of a city from Monday to Saturday is 0.3 inches. Due to heavy
rainfall on Sunday, the average rainfall for the week increased to 0.5 inches. What is
the rainfall on Sunday?
Given: average for Mon - Sat = 0.3
average for Mon - Sun = 0.5
$\bar{x}_1 = \frac{\sum fx_1}{N}$, so $0.3 = \frac{\sum fx_1}{6}$
$\sum fx_1 = 1.8$
$\bar{x}_2 = \frac{\sum fx_2}{N}$, so $0.5 = \frac{\sum fx_2}{7}$
$\sum fx_2 = 3.5$
Rainfall on Sunday $= \sum fx_2 - \sum fx_1$
= 3.5 - 1.8
= 1.7 inches
12. The average salary of male employees in a firm was Rs. 520 and that of females Rs.
420. The mean salary of all the employees as a whole is Rs. 500. Find the
percentage of male and female employees.
Given: $\bar{x}_1 = 520$, $\bar{x}_2 = 420$, $\bar{x} = 500$
$n_1$ = Male persons, $n_2$ = Female persons.
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
$500 = \frac{520 n_1 + 420 n_2}{n_1 + n_2}$
$500 n_1 + 500 n_2 = 520 n_1 + 420 n_2$
$80 n_2 = 20 n_1$
$n_1 = 4 n_2$
Let $n_1 + n_2 = 100$
$4 n_2 + n_2 = 100$
$5 n_2 = 100$
$n_2$ = 20% Female
$n_1$ = 80% Male
80% of the employees in the firm are male and 20% are female.
13. The AM of two observations is 25 and their GM is 15. Find the HM.
Given:
AM = 25
$\bar{x} = \frac{a + b}{2}$
$25 = \frac{a + b}{2}$
a + b = 50
GM = 15
GM $= \sqrt{ab}$
$15 = \sqrt{ab}$
$(15)^2 = (\sqrt{ab})^2$
ab = 225
HM = ?
HM $= \frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2ab}{a + b}$
HM $= \frac{2 \times 225}{50}$
HM = 9
14. The GM is 60 and the HM is 28.24. Find the AM for two observations.
GM: $60 = \sqrt{ab}$
$60^2 = ab$
ab = 3600
HM: $28.24 = \frac{2ab}{a + b}$
$a + b = \frac{2ab}{28.24} = \frac{2 \times 3600}{28.24}$
a + b = 254.96
AM: $\bar{x} = \frac{a + b}{2} = \frac{254.96}{2}$
= 127.48
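For two observations, problems 13 and 14 both rest on the identity $GM^2 = AM \times HM$, which follows from $AM = (a+b)/2$, $GM = \sqrt{ab}$ and $HM = 2ab/(a+b)$. A minimal Python sketch, with illustrative function names, applies it directly:

```python
def hm_from_am_gm(am, gm):
    """For two observations GM^2 = AM * HM, so HM = GM^2 / AM."""
    return gm ** 2 / am


def am_from_gm_hm(gm, hm):
    """For two observations GM^2 = AM * HM, so AM = GM^2 / HM."""
    return gm ** 2 / hm


print(hm_from_am_gm(25, 15))               # 9.0
print(round(am_from_gm_hm(60, 28.24), 2))  # 127.48
```

The identity holds only for two observations; for longer series the three means must be computed from the data.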
15. Calculate the missing frequency from the data if the median is 50.
CI      f       cf
10-20   2       2
20-30   8       10
30-40   6       16
40-50   $f_1$   $16 + f_1$
50-60   15      $31 + f_1$   (median class)
60-70   10      $41 + f_1$
$N = \sum f = 41 + f_1$
Median $= l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$50 = 50 + \frac{10}{15}\left[\frac{N}{2} - (16 + f_1)\right]$
$50 - 50 = \frac{10}{15}\left[\frac{N}{2} - (16 + f_1)\right]$
$0 = \frac{N}{2} - (16 + f_1)$
$\frac{N}{2} = 16 + f_1$
$16 + f_1 = \frac{1}{2}(41 + f_1)$
$2(16 + f_1) = 41 + f_1$
$32 + 2f_1 = 41 + f_1$
$f_1 = 9$
Session 7
Measures of Dispersions
The measures of central tendency alone will not exhibit the various characteristics
of frequency distributions having the same total frequency. Two distributions can
have the same mean but can differ significantly. We need to know the extent of
variation or deviation of the values in comparison with the central value or average
referred to as the measure of central tendency.
Measures of dispersion are averages of the second order. They are based on the
average of the deviations of the values from a measure of central tendency, $\bar{x}$, Me or z.
Variability is a basic feature of the values of variables. Such variation or
dispersion refers to the lack of uniformity.
Definition: A measure of dispersion may be defined as a statistic signifying the extent
of the scatteredness of items around a measure of central tendency.
Absolute and Relative Measures of Dispersion:
A measure of dispersion may be expressed in an absolute form or in a relative
form. It is said to be in absolute form when it states the actual amount by which the
value of an item on average deviates from a measure of central tendency. Absolute
measures are expressed in concrete units, i.e., the units in which the data have been
expressed, e.g. rupees, centimetres, kilograms etc., and are used to describe a frequency
distribution.
A relative measure of dispersion is the quotient obtained by dividing the absolute
measure by the quantity with respect to which the absolute deviation has been computed. It is
as such a pure number and is usually expressed in percentage form. Relative
measures are used for making comparisons between two or more distributions.
Thus, absolute measures are expressed in terms of the original units and are
not suitable for comparative studies. The relative measures are expressed as ratios or
percentages and are suitable for comparative studies.
Measures of Dispersion Types
Following are the common measures of dispersions.
a. The Range
b. The Quartile Deviation (QD)
c. The Mean Deviation (MD)
d. The Standard Deviation (SD)
Range
Range represents the difference between the values of the extremes. The
range of any series is the difference between the highest and the lowest values in the
series.
The values in between the two extremes are not taken into consideration at all. The
range is a simple indicator of the variability of a set of observations. It is denoted by
R. In a frequency distribution, the range is taken to be the difference between the
lower limit of the class at the lower extreme of the distribution and the upper limit of
the class at the upper extreme. Range can be
computed using the following equations.
Range = Large value - Small value
Coefficient of Range $= \frac{\text{Large value} - \text{Small value}}{\text{Large value} + \text{Small value}}$
Problems
1. Compute the range and the coefficient of range of the given series, and state
which one is more dispersed and which is more uniform.
Series I: 9, 10, 15, 19, 21
R = LV - SV = 21 - 9 = 12
CR $= \frac{R}{LV + SV} = \frac{12}{21 + 9} = \frac{12}{30} = 0.4$
Series II: 1, 15, 24, 28, 29
R = LV - SV = 29 - 1 = 28
CR $= \frac{R}{LV + SV} = \frac{28}{30} = 0.933$
Series I is less dispersed and more uniform.
Series II is more dispersed and less uniform.
Evaluating Criteria
i. The smaller the CR, the less dispersed and more uniform the series.
ii. The larger the CR, the more dispersed and less uniform the series.
Range Merits
i. It is the simplest measure to compute.
ii. It is rigidly defined.
iii. It is very useful in Statistical Quality Control (SQC).
iv. It is useful in studying variation in the prices of shares and stocks.
Limitations
i. It is not a stable measure of dispersion, as it is affected by extreme values.
ii. It does not consider class intervals and is not suitable for C.I. problems.
iii. It considers only extreme values.
2. Find the range and coefficient of range from the following data.
A: 10 11 12 13 14
B: 40 41 42 43 44
C: 100 101 102 103 104
Series I
R = LV - SV = 14 - 10 = 4
CR $= \frac{R}{LV + SV} = \frac{4}{24} = 0.166$
Series II
R = 44 - 40 = 4
CR $= \frac{R}{LV + SV} = \frac{4}{84} = 0.0476$
Series III
R = 104 - 100 = 4
CR $= \frac{R}{LV + SV} = \frac{4}{204} = 0.0196$
Series III is less dispersed and more uniform.
Series I is more dispersed and less uniform.
3. Compute the range and coefficient of range for the following data.
x: 6 12 18 24 30 36 42
f: 20 130 16 14 20 15 40
Range = LV - SV = 42 - 6 = 36
CR $= \frac{R}{LV + SV} = \frac{36}{48} = 0.75$
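The range problems above reduce to two one-line formulas. A minimal Python sketch, with illustrative function names, shows how series with the same range can have very different coefficients of range:

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (LV - SV) / (LV + SV)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


series = {
    "A": [10, 11, 12, 13, 14],
    "B": [40, 41, 42, 43, 44],
    "C": [100, 101, 102, 103, 104],
}
for name, data in series.items():
    # Same range (4) in each case, but a smaller coefficient for larger values
    print(name, value_range(data), round(coefficient_of_range(data), 4))
```

For a discrete frequency distribution only the extreme values of x matter, so the frequencies are not passed in.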
Quartile Deviation
Quartiles divide the total frequency into four equal parts. The lower quartile $Q_1$
refers to the value of the variate corresponding to the cumulative frequency N/4.
The upper quartile $Q_3$ refers to the value of the variate corresponding to the cumulative
frequency 3N/4.
Quartile deviation is defined as QD $= \frac{1}{2}(Q_3 - Q_1)$. The middle quartile $Q_2$ is the median, as it
corresponds to the value of the variate with cumulative frequency c.f. $= \frac{N}{2}$.
a) QD $= \frac{1}{2}(Q_3 - Q_1)$
b) Relative measure of dispersion: coefficient of QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Problems
1. Find the quartile deviation and coefficient of quartile deviation for the given grouped
data. Also compute the middle quartile.
Class     f    cf
1 - 10    3    3
11 - 20   16   19
21 - 30   26   45    ($Q_1$ class)
31 - 40   31   76    ($Q_2$ and $Q_3$ class)
41 - 50   16   92
51 - 60   8    100
$\sum f = N = 100$
For $Q_1$: $\frac{N}{4} = \frac{100}{4} = 25$
$Q_1 = l + \frac{h}{f}\left[\frac{N}{4} - C\right]$
$Q_1 = 20.5 + \frac{10}{26}[25 - 19]$
$Q_1$ = 22.80
$Q_2 = l + \frac{h}{f}\left[\frac{N}{2} - C\right]$
$Q_2 = 30.5 + \frac{10}{31}[50 - 45]$
$Q_2$ = 32.11
$Q_3 = l + \frac{h}{f}\left[\frac{3}{4}N - C\right]$
$Q_3 = 30.5 + \frac{10}{31}[75 - 45]$
$Q_3$ = 40.17
QD $= \frac{1}{2}(Q_3 - Q_1) = 0.5(Q_3 - Q_1)$
$= \frac{1}{2}(40.17 - 22.80)$
= 8.685
Coef. QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40.17 - 22.80}{40.17 + 22.80} = \frac{17.37}{62.97}$
= 0.275
2. Find the quartile deviation and the coefficient of quartile deviation from the
following marks of 12 students.
Sl. No.   Marks
1.        25
2.        30
3.        37
4.        43
5.        48
6.        54
7.        61
8.        67
9.        72
10.       80
11.       84
12.       89
$Q_1 = \frac{12 + 1}{4}$ = 3.25th item
= 3rd item + 0.25 (4th item - 3rd item)
= 37 + 0.25 (43 - 37)
$Q_1$ = 38.5
$Q_3 = \frac{3(12 + 1)}{4}$ = 9.75th item
= 9th item + 0.75 (10th item - 9th item)
= 72 + 0.75 (80 - 72)
$Q_3$ = 78
QD $= \frac{1}{2}(Q_3 - Q_1)$
$= \frac{1}{2}(78 - 38.5)$
QD = 19.75
Coef. QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{78 - 38.5}{78 + 38.5}$
= 0.339
3. Compute the quartile deviation and its coefficient for the data given below:
x    f    cf
58   15   15
59   20   35
60   32   67    ($Q_1$ class)
61   35   102
62   33   135
63   22   157   ($Q_3$ class)
64   20   177
65   10   187
66   8    195
N = 195
$Q_1$ = size of the $\frac{n + 1}{4}$th item = size of the $\frac{195 + 1}{4}$th item
$Q_1$ = size of the 49th item, which is covered by cf 67, which gives
$Q_1$ = 60
$Q_3$ = size of the $\frac{3}{4}(n + 1)$th item
= size of the $\frac{3}{4} \times 196$ = 147th item
It lies in cf 157. Against cf 157, $Q_3$ = 63
QD $= \frac{1}{2}(Q_3 - Q_1)$
$= \frac{1}{2}(63 - 60)$
QD = 1.5
Coef. QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{63 - 60}{63 + 60} = \frac{3}{123}$
= 0.024
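For a discrete series the quartile deviation needs only two cumulative-frequency lookups. The Python sketch below is illustrative: `discrete_item` and `quartile_deviation` are hypothetical helper names, and the last value of x is taken as 66 on the assumption that the repeated 65 in the table is a misprint.

```python
def discrete_item(values, freqs, position):
    """Value whose cumulative frequency first covers the given position."""
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative >= position:
            return x


def quartile_deviation(q1, q3):
    """Return (QD, coefficient of QD) = ((Q3 - Q1) / 2, (Q3 - Q1) / (Q3 + Q1))."""
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


x = [58, 59, 60, 61, 62, 63, 64, 65, 66]   # 66 assumed for the last row
f = [15, 20, 32, 35, 33, 22, 20, 10, 8]
n = sum(f)                                  # 195

q1 = discrete_item(x, f, (n + 1) / 4)       # 49th item
q3 = discrete_item(x, f, 3 * (n + 1) / 4)   # 147th item
qd, coef = quartile_deviation(q1, q3)
print(q1, q3, qd, round(coef, 3))           # 60 63 1.5 0.024
```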
Merits of Quartile Deviation
It is very easy to compute.
It is not affected by extreme values of the variable.
It is not at all affected by open-end class intervals.
Demerits of Quartile Deviation
It completely ignores the portions below the lower quartile and
above the upper quartile.
It is not capable of further mathematical treatment.
It is greatly affected by fluctuations in sampling.
It is only a positional average, not a mathematical average.
Session 8
Measures of Dispersions
Mean Deviation
Mean deviation is the average of the differences of the items in a series from the
mean, median or mode of that series. It is concerned with the extent to which
the values are dispersed about the mean, the median or the mode. It is found by
averaging all the deviations from the measure of central tendency. These deviations are taken into
the computation without regard to negative sign, i.e. as absolute values. Theoretically, the deviations of items are
preferably taken from the median rather than from the mean or mode.
Merits of Mean Deviation
It is rigidly defined and easy to compute.
It takes all items into consideration and gives weight to deviations
according to their size.
It is less affected by extreme values.
It removes all irregularities by obtaining deviations and provides
a correct measure.
Demerits of Mean Deviation
It is not suitable for algebraic treatment.
It treats all deviations as positive, which is not justified mathematically.
It is not a satisfactory measure when the deviations are taken from the
mode.
It is not suitable when class intervals are open-ended.
Formula to compute Mean Deviation
If $x_i$ is a variate that takes the values $x_1, x_2, x_3, \ldots, x_n$ with average A (mean,
median or mode), then the mean deviation from the average A is defined by
MD $= \frac{\sum |x_i - A|}{N}$
For grouped data
MD $= \frac{\sum f |x_i - A|}{N}$
Coefficient of MD $= \frac{MD}{\text{Mean}}$
1. Compute MD and CMD from the mean for the data given below.
x    $d = |x_i - \bar{x}|$
21   26.55
32   15.55
38   9.55
41   6.55
49   1.45
54   6.45
59   11.45
66   18.45
68   20.45
$\sum x = 428$, $\sum |x_i - \bar{x}| = \sum d = 116.45$
$\bar{x} = \frac{\sum x_i}{n} = \frac{428}{9} = 47.55$
MD $= \frac{\sum |x_i - \bar{x}|}{N} = \frac{116.45}{9}$
MD = 12.938
Coefficient of MD $= \frac{MD}{\text{Avg}} = \frac{12.938}{47.55} = 0.272$
2. The following are the wages of workers. Find the mean deviation from the median and
its coefficient.
x (Wages)   Wages in ascending order   $|x_i - Me| = |x_i - 47|$
59          17                         30
32          22                         25
67          25                         22
43          32                         15
22          43                         4
17          47 (Me)                    0
64          55                         8
55          59                         12
47          64                         17
80          67                         20
25          80                         33
$\sum |x_i - Me| = 186$
Median = size of the $\left(\frac{11 + 1}{2}\right)$th item = 6th item
Me = 47
MD $= \frac{\sum |x_i - Me|}{N} = \frac{186}{11} = 16.91$
Coefficient of MD $= \frac{MD}{\text{Median}} = \frac{16.91}{47} = 0.359$
3. Compute the MD about the mode and its coefficient.
x            f                  $d = |x_i - \text{Mode}|$   fd
20           6                  100                        600
40           19                 80                         1520
60           40                 60                         2400
80           23                 40                         920
100          65                 20                         1300
120 (Mode)   83 (Modal class)   0                          0
140          55                 20                         1100
160          20                 40                         800
180          9                  60                         540
$\sum f = 320$, $\sum f|x_i - \text{Mode}| = 9180$
The highest frequency is 83 and hence
Z = 120
MD $= \frac{\sum f|x_i - \text{Mode}|}{N} = \frac{9180}{320}$
= 28.68
Coefficient of MD $= \frac{MD}{\text{Mode}} = \frac{28.68}{120} = 0.239$
4. Find the mean deviation about the median from the data given below.
Salaries           40 50 50-100 100-200 200-400
No. of Employees   22 18 10 8 2
x         No. of Employees (f)   x (mv)   cf   $d = |x_i - Me|$   fd
40        22                     40       22   10                220
50        18                     50       40   0                 0
50-100    10                     75       50   25                250
100-200   8                      150      58   100               800
200-400   2                      300      60   250               500
$\sum f = 60$, $\sum f|x_i - Me| = 1770$
Median = size of the $\left(\frac{N + 1}{2}\right)$th item $= \frac{60 + 1}{2} = \frac{61}{2}$ = 30.5th item
It lies in cf 40, and against cf 40 the discrete value is 50.
MD $= \frac{\sum f|x_i - \text{Median}|}{N} = \frac{1770}{60}$
MD = 29.5
Coefficient of MD $= \frac{MD}{\text{Median}} = \frac{29.5}{50} = 0.59$
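The mean-deviation problems all apply $MD = \sum f|x_i - A| / N$ with a different average A. A minimal Python sketch follows; the function name `mean_deviation` is illustrative, and the average (mean, median or mode) is passed in rather than computed.

```python
def mean_deviation(values, center, freqs=None):
    """MD = sum(f * |x - A|) / N about a given average A."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n


# Wages, deviations from the median 47
wages = [59, 32, 67, 43, 22, 17, 64, 55, 47, 80, 25]
md = mean_deviation(wages, 47)
print(round(md, 2), round(md / 47, 2))

# Salaries (mid-values), deviations from the median 50
salaries = [40, 50, 75, 150, 300]
employees = [22, 18, 10, 8, 2]
md = mean_deviation(salaries, 50, employees)
print(md, md / 50)   # 29.5 0.59
```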
Session 9
Measures of Dispersions
Standard Deviation
Standard deviation is the square root of the sum of the squares of the deviations divided by
their number. It is also called mean error deviation, mean square
error deviation or root mean square deviation. It is a second moment of dispersion.
Since the sum of squares of deviations from the mean is a minimum, the deviations are
taken only from the mean (not from the median or mode).
The standard deviation is the Root Mean Square (RMS) average of all the
deviations from the mean. It is denoted by sigma ($\sigma$).
Characteristics of standard deviation
1. Standard deviation and coefficient of variation possess all the properties
which a good measure of dispersion should possess.
2. The process of squaring the deviations eliminates negative signs and makes
mathematical computations easy.
Merits
1. It is based on all observations.
2. It can be smoothly handled algebraically.
3. It is a well defined and definite measure of dispersion.
4. It is of great importance when we are comparing the
variability of two series.
Demerits
1. It is difficult to calculate and understand.
2. It gives more weightage to extreme values as the deviations are squared.
3. It is not useful in economic studies.
Standard deviation
If the variate $x_i$ takes the values $x_1, x_2, \ldots, x_n$, the standard deviation is
denoted by $\sigma$ and is defined by
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}}$
The quantity $\sigma^2$ is called the variance.
Alternate Expressions
For raw data
$\sigma^2 = \frac{\sum x^2}{n} - (\bar{x})^2$
For grouped data
$\sigma^2 = \frac{\sum fx^2}{N} - (\bar{x})^2$
For grouped data with the step deviation method
$\sigma = h\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
Coefficient of variation
It is defined as the ratio of the standard deviation to the mean. The
percentage form of CV is given by CV $= \frac{\sigma}{\bar{x}} \times 100$
Problems
1. Ten students of a class have obtained the following marks in a particular subject out
of 100. Calculate the SD and CV for the data given below.
Sl. No.   Marks (x)   $d = (x_i - \bar{x})$, $\bar{x} = 38.5$   $(x_i - \bar{x})^2$
1.        5           -33.5                                   1122.25
2.        10          -28.5                                   812.25
3.        20          -18.5                                   342.25
4.        25          -13.5                                   182.25
5.        40          1.5                                     2.25
6.        42          3.5                                     12.25
7.        45          6.5                                     42.25
8.        48          9.5                                     90.25
9.        70          31.5                                    992.25
10.       80          41.5                                    1722.25
$\sum x = 385$, $\sum (x_i - \bar{x})^2 = \sum d^2 = 5320.50$
$\bar{x} = \frac{\sum x}{N} = \frac{385}{10} = 38.5$
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} = \sqrt{\frac{5320.5}{10}} = 23.066$
CV $= \frac{\sigma}{\bar{x}} \times 100$
CV $= \frac{23.066}{38.5} \times 100$
CV = 59.9%
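2. Compute the standard deviation and coefficient of variation for the following data of the marks of 100
students.
Class     f    Class boundaries   Mid point x   d    fd    $fd^2$
1 - 10    3    0.5 - 10.5         5.5           -2   -6    12
11 - 20   16   10.5 - 20.5        15.5          -1   -16   16
21 - 30   26   20.5 - 30.5        25.5          0    0     0
31 - 40   31   30.5 - 40.5        35.5          1    31    31
41 - 50   16   40.5 - 50.5        45.5          2    32    64
51 - 60   8    50.5 - 60.5        55.5          3    24    72
$N = \sum f = 100$, $\sum fd = 65$, $\sum fd^2 = 195$
a = 25.5
$d = \frac{x - a}{h} = \frac{x - 25.5}{10}$
e.g. $d = \frac{15.5 - 25.5}{10} = \frac{-10}{10} = -1$
$\bar{x} = a + h\left(\frac{\sum fd}{N}\right) = 25.5 + 10\left(\frac{65}{100}\right)$
= 25.5 + 6.5
$\bar{x} = 32$
$\sigma = h\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
$= 10\sqrt{\frac{195}{100} - \left(\frac{65}{100}\right)^2}$
= 12.359
CV $= \frac{\sigma}{\bar{x}} \times 100$
CV $= \frac{12.359}{32} \times 100$ = 38.62%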
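The step deviation formulas for the mean, standard deviation and coefficient of variation can be sketched in Python as follows. The function name `grouped_mean_sd` is illustrative, and the assumed mean `a = 25.5` and width `h = 10` are the values used in the problem above.

```python
import math


def grouped_mean_sd(mid_points, freqs, a, h):
    """Mean and SD by the step deviation method, with d = (x - a) / h."""
    n = sum(freqs)
    d = [(x - a) / h for x in mid_points]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    mean = a + h * sum_fd / n
    sd = h * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd


mid_points = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [3, 16, 26, 31, 16, 8]
mean, sd = grouped_mean_sd(mid_points, freqs, a=25.5, h=10)
cv = sd / mean * 100   # coefficient of variation in percent
print(round(mean, 2), round(sd, 3), round(cv, 2))
```

The choice of `a` does not change the result; a mid-point near the centre of the distribution only keeps the hand arithmetic small.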
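3. The AM and SD of a set of nine items are 43 and 5 respectively. If an item of value
63 is added, find the new mean and SD.
$\bar{x} = \frac{\sum x_i}{N}$, so $\sum x_i = \bar{x} \times N$
$\sum x_i = 43 \times 9$
$\sum x = 387$ for 9 items
$\sum x = 387 + 63$ for 10 items
$\sum x = 450$
Modified mean $\bar{x} = \frac{\sum x}{N} = \frac{450}{10}$
$\bar{x} = 45$
For 9 items, $\bar{x} = 43$ and $\sigma = 5$
$\sigma^2 = \frac{\sum x^2}{N} - (\bar{x})^2$
$25 = \frac{\sum x^2}{9} - (43)^2$
$25 = \frac{\sum x^2}{9} - 1849$
$25 + 1849 = \frac{\sum x^2}{9}$
$\frac{\sum x^2}{9} = 1874$
$\sum x^2 = 16866$ for 9 items
If 63 is added,
$\sum x^2 = 16866 + (63)^2 = 20835$ for 10 items
Modified $\sigma^2 = \frac{\sum x^2}{N} - (\bar{x})^2$
$\sigma^2 = \frac{20835}{10} - (45)^2 = 58.5$
$\sigma = \sqrt{58.5} = 7.65$ is the modified SD.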
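The same bookkeeping (recover $\sum x$ and $\sum x^2$ from the mean and SD, add the new item, recompute) can be sketched in Python. The function name `add_item` is illustrative, and the SD is the population form with divisor N, as used throughout these problems.

```python
import math


def add_item(n, mean, sd, new_value):
    """Mean and SD after one more item joins a set of n items (divisor N)."""
    total = n * mean + new_value                           # updated sum of x
    total_sq = n * (sd ** 2 + mean ** 2) + new_value ** 2  # updated sum of x^2
    new_n = n + 1
    new_mean = total / new_n
    new_variance = total_sq / new_n - new_mean ** 2
    return new_mean, math.sqrt(new_variance)


new_mean, new_sd = add_item(9, 43, 5, 63)
print(new_mean, round(new_sd, 2))   # 45.0 7.65
```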
4. The mean of 5 observations is 4.4 and the variance is 8.24. If three of the five observations are 1, 2 and 6, find the values of the other two observations.
We know that $\bar{x} = \frac{\sum x}{N}$, so $4.4 = \frac{\sum x}{5}$
$\sum x = 22$
$\sigma^2 = \frac{\sum x^2}{N} - (\bar{x})^2$
$8.24 = \frac{\sum x^2}{5} - (4.4)^2 = \frac{\sum x^2}{5} - 19.36$
$\frac{\sum x^2}{5} = 8.24 + 19.36 = 27.6$
$\sum x^2 = 138$
$\sum x^2 = 1^2 + 2^2 + 6^2 + x_1^2 + x_2^2$
$138 = 1 + 4 + 36 + x_1^2 + x_2^2$
$x_1^2 + x_2^2 = 97$ ---- (1)
$\sum x = 1 + 2 + 6 + x_1 + x_2$
$22 = 9 + x_1 + x_2$
$x_1 + x_2 = 13$ ---- (2)
From (2), $x_2 = 13 - x_1$. Putting this in (1):
$x_1^2 + (13 - x_1)^2 = 97$
$x_1^2 + 169 + x_1^2 - 26x_1 = 97$
$2x_1^2 - 26x_1 + 72 = 0$
$x_1^2 - 13x_1 + 36 = 0$
$x_1 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
$x_1 = \frac{-(-13) \pm \sqrt{169 - 4 \times 36}}{2}$
$x_1 = \frac{13 \pm 5}{2} = 6.5 \pm 2.5$
$x_1 = 9$ or $x_1 = 4$
Taking $x_1 = 9$ gives $x_2 = 4$, so the other two observations are 9 and 4.
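As a check on the algebra, the two unknown observations can be found numerically from their sum and their sum of squares. This is a minimal sketch; the function name `missing_pair` is an assumption made for illustration.

```python
from math import sqrt

def missing_pair(n, mean, variance, known):
    """Find the two missing observations given n, the mean, the variance
    and the n - 2 known values."""
    s = n * mean - sum(known)                                   # x1 + x2
    q = n * (variance + mean ** 2) - sum(v * v for v in known)  # x1^2 + x2^2
    p = (s * s - q) / 2                                         # x1 * x2
    disc = s * s - 4 * p                 # discriminant of t^2 - s*t + p = 0
    if disc < 0:
        raise ValueError("no real solution for the given mean and variance")
    root = sqrt(disc)
    return (s + root) / 2, (s - root) / 2

x1, x2 = missing_pair(5, 4.4, 8.24, [1, 2, 6])
print(round(x1, 6), round(x2, 6))
```

The sketch uses the identity $x_1 x_2 = \frac{(x_1 + x_2)^2 - (x_1^2 + x_2^2)}{2}$, which leads to the same quadratic $t^2 - 13t + 36 = 0$ as the worked solution.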
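5. The mean and S.D. of the frequency distribution of a continuous random variable x are 40.604 and 7.92 respectively. The distribution after a change of origin and scale is given below. Determine the actual class intervals.

d    -3   -2   -1    0    1    2    3    4
f     3   15   45   57   50   36   25    9

The mid-values (MV) and class intervals (CI) in the last two columns are filled in once a and h have been found.

d     f     fd     fd^2   MV     CI
-3    3     -9     27     22.5   20-25
-2    15    -30    60     27.5   25-30
-1    45    -45    45     32.5   30-35
0     57    0      0      37.5   35-40
1     50    50     50     42.5   40-45
2     36    72     144    47.5   45-50
3     25    75     225    52.5   50-55
4     9     36     144    57.5   55-60
      N = 240   $\sum fd = 149$   $\sum fd^2 = 695$

$\bar{x} = a + h \frac{\sum fd}{N}$
$40.604 = a + h \times \frac{149}{240}$
$40.604 = a + 0.62h$ ----- (1)
$\sigma = h \sqrt{\frac{\sum fd^2}{N} - \left( \frac{\sum fd}{N} \right)^2}$
$7.92 = h \sqrt{\frac{695}{240} - \left( \frac{149}{240} \right)^2} = h \sqrt{2.895 - (0.620)^2}$
$7.92 = h \times 1.5844$
$h = 4.999$
$h \approx 5$
Put h = 5 in equation (1):
$40.604 = a + 0.62 \times 5$
a = 37.5
The mid-values are therefore a + dh and each class has width 5, which gives the MV and CI columns of the table.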
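The recovery of a and h from the coded distribution can be sketched in Python. The function name `recover_origin_and_scale` is hypothetical, and rounding h to a whole number and a to one decimal place is an assumption that mirrors the worked solution.

```python
from math import sqrt

def recover_origin_and_scale(d_values, freqs, mean, sd):
    """Solve mean = a + h*m1 and sd = h*sqrt(m2 - m1^2) for a and h."""
    n = sum(freqs)
    m1 = sum(f * d for f, d in zip(freqs, d_values)) / n      # sum(fd)/N
    m2 = sum(f * d * d for f, d in zip(freqs, d_values)) / n  # sum(fd^2)/N
    h = sd / sqrt(m2 - m1 ** 2)
    a = mean - h * m1
    return a, h

d_values = [-3, -2, -1, 0, 1, 2, 3, 4]
freqs = [3, 15, 45, 57, 50, 36, 25, 9]
a_raw, h_raw = recover_origin_and_scale(d_values, freqs, 40.604, 7.92)

# Assumption: the true class width is a whole number, so round as in the notes.
h = round(h_raw)
a = round(a_raw, 1)
intervals = [(a + d * h - h / 2, a + d * h + h / 2) for d in d_values]
print(a, h, intervals)
```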
Combined Standard Deviation
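Suppose we have different samples of various sizes $n_1, n_2, n_3, \ldots$ having means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$ and standard deviations $\sigma_1, \sigma_2, \sigma_3, \ldots$. Then the combined standard deviation $\sigma$ can be computed by the following formula, shown here for two samples:
$\sigma^2 (n_1 + n_2) = n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)$
where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$, and $\bar{x}$ is the combined mean.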
1. The means of two samples of sizes 50 and 100 are 54.1 and 50.3 respectively, and their standard deviations are 8 and 7 respectively. Obtain the SD for the combined group.
$n_1 = 50$, $\bar{x}_1 = 54.1$, $\sigma_1 = 8$
$n_2 = 100$, $\bar{x}_2 = 50.3$, $\sigma_2 = 7$
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
$\bar{x} = \frac{(50 \times 54.1) + (100 \times 50.3)}{50 + 100} = \frac{7735}{150}$
$\bar{x} = 51.57$
$\sigma^2 (n_1 + n_2) = n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)$
$d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$
$d_1 = 54.1 - 51.57 = 2.53$, so $d_1^2 = 6.40$
$d_2 = 50.3 - 51.57 = -1.27$, so $d_2^2 = 1.61$
$150 \sigma^2 = 50 (8^2 + 6.40) + 100 (7^2 + 1.61)$
$3 \sigma^2 = (64 + 6.40) + 2 (49 + 1.61)$
$3 \sigma^2 = 70.40 + 2 \times 50.61 = 171.62$
$\sigma^2 = 57.21$
$\sigma = 7.56$
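The combined standard deviation formula above can be written as a small Python function. This is a sketch rather than a definitive implementation: the name `combined_mean_sd` is illustrative, and summing over more than two groups is the natural generalisation of the two-sample formula given in the notes.

```python
from math import sqrt

def combined_mean_sd(groups):
    """groups: list of (n, mean, sd) tuples, one per sample.

    Returns the mean and SD of all samples pooled together.
    """
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    # n_i * (sd_i^2 + d_i^2), where d_i = mean_i - combined mean
    variance = sum(n * (s ** 2 + (m - mean) ** 2) for n, m, s in groups) / total_n
    return mean, sqrt(variance)

mean, sd = combined_mean_sd([(50, 54.1, 8), (100, 50.3, 7)])
print(round(mean, 2), round(sd, 2))
```

Because the function keeps full precision for the combined mean, its intermediate values differ slightly from the hand calculation, but the final SD agrees to two decimal places.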
2. The mean wage is Rs. 75 per day and the SD of wages is Rs. 5 per day for a group of 1000 workers; the corresponding figures are Rs. 60 and Rs. 4.5 for another group of 1500 workers. Find the mean and standard deviation for the entire group.
We have by data,
$\bar{x}_1 = 75$, $\sigma_1 = 5$, $n_1 = 1000$
$\bar{x}_2 = 60$, $\sigma_2 = 4.5$, $n_2 = 1500$
Let $\bar{x}$ and $\sigma$ be the mean and SD of the entire group.
Consider $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
i.e., $\bar{x} = \frac{1000 \times 75 + 1500 \times 60}{1000 + 1500} = 66$
Also we have,
$(n_1 + n_2) \sigma^2 = n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)$,
where $d_1 = \bar{x}_1 - \bar{x} = 75 - 66 = 9$; $d_2 = \bar{x}_2 - \bar{x} = 60 - 66 = -6$
$(1000 + 1500) \sigma^2 = 1000 (5^2 + 9^2) + 1500 (4.5^2 + (-6)^2)$
$2500 \sigma^2 = 106000 + 84375 = 190375$
$\sigma^2 = 76.15$ or $\sigma = 8.73$
3. The arithmetic means of the runs scored by 3 batsmen A, B and C are 50, 48 and 12 respectively. The SDs of their runs are 15, 12 and 2 respectively. Who is the most consistent of the three batsmen? If one of these three is to be selected, who is to be selected?

                     A     B     C
AM ($\bar{x}$)       50    48    12
SD ($\sigma$)        15    12    2

$CV_A = \frac{\sigma_A}{\bar{x}_A} \times 100 = \frac{15}{50} \times 100 = 30\%$
$CV_B = \frac{\sigma_B}{\bar{x}_B} \times 100 = \frac{12}{48} \times 100 = 25\%$
$CV_C = \frac{\sigma_C}{\bar{x}_C} \times 100 = \frac{2}{12} \times 100 = 16.67\%$
Evaluation Criteria
1. A lower CV indicates a more consistent player; hence the most consistent player is Player C.
2. Highest run scorer: Player A, with $\bar{x}_A = 50$.
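4. The coefficients of variation of two series are 75% and 90%, with SDs 15 and 18 respectively. Compute their means.
$CV_A = 75\%$, $CV_B = 90\%$
$\sigma_A = 15$, $\sigma_B = 18$
$CV = \frac{\sigma}{\bar{x}} \times 100$
$75 = \frac{15}{\bar{x}_A} \times 100$, $90 = \frac{18}{\bar{x}_B} \times 100$
$\bar{x}_A = 20$
$\bar{x}_B = 20$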
5. Goals scored by two teams A and B in a football season are as shown below. By calculating the CV of each, find which team may be considered more consistent.

No. of goals   No. of matches (f)    fx                  fx^2
x              Team A    Team B      Team A    Team B    Team A    Team B
0              27        17          0         0         0         0
1              9         9           9         9         9         9
2              8         6           16        12        32        24
3              5         5           15        15        45        45
4              4         3           16        12        64        48
Total          53        40          56        48        150       126

$\bar{x}_A = \frac{\sum fx}{N} = \frac{56}{53} = 1.057$
$\bar{x}_B = \frac{\sum fx}{N} = \frac{48}{40} = 1.2$
$\sigma_A^2 = \frac{\sum fx^2}{N} - (\bar{x}_A)^2 = \frac{150}{53} - (1.057)^2 = 1.714$
$\sigma_A = 1.31$
$\sigma_B^2 = \frac{\sum fx^2}{N} - (\bar{x}_B)^2 = \frac{126}{40} - (1.2)^2 = 1.71$
$\sigma_B = 1.31$
$CV_A = \frac{\sigma_A}{\bar{x}_A} \times 100 = \frac{1.309}{1.057} \times 100 = 123.9\%$
$CV_B = \frac{\sigma_B}{\bar{x}_B} \times 100 = \frac{1.308}{1.2} \times 100 = 109\%$
Since $CV_B < CV_A$, team B is more consistent.
6. The prices of shares A and B (x and y respectively) are given below. State which share is more stable in value.

Price A (x)   x - 53   (x - 53)^2   Price B (y)   y - 105   (y - 105)^2
55             2        4            108            3         9
54             1        1            107            2         4
52            -1        1            105            0         0
53             0        0            105            0         0
56             3        9            106            1         1
58             5        25           107            2         4
52            -1        1            104           -1         1
50            -3        9            103           -2         4
51            -2        4            104           -1         1
49            -4        16           101           -4         16
Total 530               70           1050                     40

$\bar{x}_A = \frac{\sum x}{N} = \frac{530}{10} = 53$
$\bar{x}_B = \frac{\sum y}{N} = \frac{1050}{10} = 105$
$\sigma_A = \sqrt{\frac{70}{10}} = 2.65$
$\sigma_B = \sqrt{\frac{40}{10}} = 2$
$CV_A = \frac{\sigma_A}{\bar{x}_A} \times 100 = \frac{2.65}{53} \times 100 = 4.99\%$
$CV_B = \frac{\sigma_B}{\bar{x}_B} \times 100 = \frac{2}{105} \times 100 = 1.90\%$
Since $CV_B$ is less, share B is more stable.
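For ungrouped data such as the share prices above, the coefficient of variation can be computed directly with Python's standard `statistics` module. The helper name `coefficient_of_variation` is an illustrative choice; `pstdev` is used because the notes divide by N (population SD) rather than N - 1.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV in percent, using the population standard deviation (divide by N)."""
    return pstdev(values) / mean(values) * 100

share_a = [55, 54, 52, 53, 56, 58, 52, 50, 51, 49]
share_b = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_a = coefficient_of_variation(share_a)
cv_b = coefficient_of_variation(share_b)
print(round(cv_a, 2), round(cv_b, 2))
print("More stable:", "B" if cv_b < cv_a else "A")
```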
7. A student, while computing the coefficient of variation, obtained the mean and SD of 100 observations as 40 and 5.1 respectively. It was later discovered that he had wrongly copied an observation as 50 instead of 40. Calculate the correct coefficient of variation.
>> $\bar{x} = \frac{\sum x}{n}$, i.e. $40 = \frac{\sum x}{100}$
$\sum x$ (incorrect) = 4000
Now correct $\sum x = 4000 - 50 + 40 = 3990$
Correct $\bar{x} = \frac{3990}{100} = 39.9$
Let us consider $\sigma^2 = \frac{\sum x^2}{n} - (\bar{x})^2$
$(5.1)^2 = \frac{\sum x^2}{100} - (40)^2$
i.e. $\frac{\sum x^2}{100} = (5.1)^2 + (40)^2 = 1626.01$
$\sum x^2$ (incorrect) $= 100 \times 1626.01 = 162601$
Now correct $\sum x^2 = 162601 - (50)^2 + (40)^2 = 161701$
Correct $\sigma^2 = \frac{\text{correct } \sum x^2}{n} - (\text{correct } \bar{x})^2$
i.e., correct $\sigma^2 = \frac{161701}{100} - (39.9)^2 = 25$, so correct $\sigma = 5$
Now correct coefficient of variation $= \frac{\sigma}{\bar{x}} \times 100 = \frac{5}{39.9} \times 100 = 12.53\%$
Hence correct C.V. = 12.53%
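The correction procedure (undo the wrong value in $\sum x$ and $\sum x^2$, then add the right one) can be sketched as follows. The function name `correct_observation` is hypothetical and used only for this illustration.

```python
from math import sqrt

def correct_observation(n, mean, sd, wrong, right):
    """Return the corrected (mean, sd, cv%) after replacing a miscopied value."""
    total = n * mean - wrong + right                               # corrected sum of x
    total_sq = n * (sd ** 2 + mean ** 2) - wrong ** 2 + right ** 2  # corrected sum of x^2
    new_mean = total / n
    new_sd = sqrt(total_sq / n - new_mean ** 2)
    return new_mean, new_sd, new_sd / new_mean * 100

new_mean, new_sd, new_cv = correct_observation(100, 40, 5.1, wrong=50, right=40)
print(round(new_mean, 2), round(new_sd, 2), round(new_cv, 2))
```

Note that n stays the same here because a value is replaced, not added or removed.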


8. The mean and SD of 21 observations are 30 and 5 respectively. It was subsequently noted that one of the observations, 10, was incorrect. Omit it and determine the mean and SD of the rest.
>> $\bar{x} = \frac{\sum x}{n}$, i.e. $30 = \frac{\sum x}{21}$ or $\sum x = 630$
Incorrect $\sum x = 630$
Now omitting the incorrect value 10,
New $\sum x = 630 - 10 = 620$
n = 21 - 1 = 20
New $\bar{x} = \frac{620}{20} = 31$
Next consider $\sigma^2 = \frac{\sum x^2}{n} - (\bar{x})^2$
$(5)^2 = \frac{\sum x^2}{21} - (30)^2$
i.e. $\frac{\sum x^2}{21} = 25 + 900 = 925$
Incorrect $\sum x^2 = 925 \times 21 = 19425$
Again omitting the incorrect value 10,
New $\sum x^2 = 19425 - (10)^2 = 19325$, n = 20
Hence new $\sigma^2 = \frac{\text{new } \sum x^2}{20} - (\text{new } \bar{x})^2 = \frac{19325}{20} - (31)^2 = 5.25$
New $\sigma = \sqrt{5.25} = 2.29$
9. The mean of 200 items was 50. Later on it was discovered that two items were misread as 92 and 8 instead of 192 and 88. Find the correct mean.
>> $\bar{x} = \frac{\sum x}{n}$, i.e. $50 = \frac{\sum x}{200}$ or $\sum x = 10000$
Incorrect $\sum x = 10000$
Correct $\sum x = 10000 - 92 - 8 + 192 + 88 = 10180$
Correct mean $= \frac{10180}{200} = 50.9$
10. Find the missing frequencies in the following data, given that the median is 137.2.

Class       100-110  110-120  120-130  130-140  140-150  150-160  160-170  170-180
Frequency   15       44       133      $f_1$    125      $f_2$    35       16        N = 600

>> We prepare the table with the column of cumulative frequencies and use the formula for the median.

Class     Frequency   cf
100-110   15          15
110-120   44          59
120-130   133         192
130-140   $f_1$       $192 + f_1$           (median class)
140-150   125         $317 + f_1$
150-160   $f_2$       $317 + f_1 + f_2$
160-170   35          $352 + f_1 + f_2$
170-180   16          $368 + f_1 + f_2$
          N = 600

Median $= l + \frac{h}{f} \left( \frac{N}{2} - c \right)$
We can take the median class as 130-140 since the median is given to be 137.2.
$l = 130$, $h = 10$, $f = f_1$, $c = 192$
$137.2 = 130 + \frac{10}{f_1} (300 - 192)$
i.e., $137.2 - 130 = \frac{1080}{f_1}$
i.e., $7.2 f_1 = 1080$ or $f_1 = 150$
But the last cumulative frequency must be equal to N = 600,
i.e. $368 + f_1 + f_2 = 600$
$368 + 150 + f_2 = 600$, so $f_2 = 82$
Thus $f_1 = 150$, $f_2 = 82$
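The two steps above (invert the median formula for the frequency of the median class, then use the total N for the other) can be sketched in Python. The function name `missing_frequencies` and its parameter names are assumptions made for illustration.

```python
def missing_frequencies(median, lower, width, cf_before, n, known_total):
    """Solve median = lower + (width / f1) * (n/2 - cf_before) for f1,
    then obtain f2 from the requirement that all frequencies sum to n.

    known_total is the sum of the frequencies that are already given.
    """
    f1 = width * (n / 2 - cf_before) / (median - lower)
    f2 = n - known_total - f1
    return f1, f2

f1, f2 = missing_frequencies(median=137.2, lower=130, width=10,
                             cf_before=192, n=600, known_total=368)
print(round(f1), round(f2))
```

The results are rounded only for display, since 137.2 - 130 is not represented exactly in floating point.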
Relationship between various measures of dispersion
For a normal (or only moderately skewed) distribution, the various measures of dispersion are related approximately as follows:
1. Mean $\pm$ QD covers 50% of the observations of the distribution.
2. Mean $\pm$ MD covers 57.5% of the observations.
3. Mean $\pm 1\sigma$ includes 68.27% of the observations.
4. Mean $\pm 2\sigma$ includes 95.45% of the observations.
5. Mean $\pm 3\sigma$ includes 99.73% of the observations.
6. QD $= \frac{2}{3} \sigma$ (more precisely, $0.6745\sigma$).
7. MD $= \frac{4}{5} \sigma$ (more precisely, $\sqrt{2/\pi}\,\sigma \approx 0.7979\sigma$).
8. QD $= \frac{5}{6}$ MD.
9. Combining the results we get 3 QD = 2 SD and 5 MD = 4 SD, which is also equal to 6 QD.
10. Range = 6 times SD.
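The coverage percentages in items 1 to 5 can be checked numerically, assuming the distribution is normal (an assumption the percentages rely on). For a normal distribution, the proportion of observations within k standard deviations of the mean is $\mathrm{erf}(k / \sqrt{2})$. The function name `coverage` is illustrative.

```python
from math import erf, sqrt, pi

def coverage(k):
    """Proportion of a normal distribution lying within mean ± k*sigma."""
    return erf(k / sqrt(2))

qd = 0.6745            # QD in units of sigma for a normal distribution
md = sqrt(2 / pi)      # MD in units of sigma, about 0.7979

for label, k in [("QD", qd), ("MD", md), ("1 SD", 1), ("2 SD", 2), ("3 SD", 3)]:
    print(f"Mean ± {label}: {coverage(k) * 100:.2f}%")
```

The ratios 2/3 and 4/5 quoted in the list are convenient approximations to 0.6745 and 0.7979.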