CHAPTER 13 6 MARKS
STATISTICAL DESCRIPTION OF DATA
BY : SHIVANI SHARMA
DEFINITION OF STATISTICS
SINGULAR SENSE
● A scientific method that is employed for collecting, analysing and presenting data to draw statistical inferences.
PLURAL SENSE
● Data, qualitative as well as quantitative, that are collected, usually with a view of having statistical analysis.
ORIGIN OF WORD - STATISTICS
Language Word
LATIN STATUS
ITALIAN STATISTA
GERMAN STATISTIK
FRENCH STATISTIQUE
HISTORY OF STATISTICS
● Kautilya’s 'Arthashastra' has a record of births and deaths during Chandragupta's reign in the fourth century B.C.
● During the reign of Akbar in the sixteenth century A.D., we find statistical records on agriculture in Ain-i-Akbari, written by Abul Fazl.
● Referring to Egypt, the first census was conducted by the Pharaoh during 300 B.C. to 2000 B.C.
APPLICATIONS OF STATISTICS
❖ Economics
❖ Business Management
❖ Commerce and Industry
LIMITATIONS OF STATISTICS
I. Statistics deals with the aggregates and not individual data.
II. Statistics is concerned with quantitative data. However, qualitative
data also can be converted to quantitative data by providing a
numerical description to the corresponding qualitative data.
III. Future projections of sales, production, price and quantity etc. are
possible under a specific set of conditions. If any of these conditions
is violated, projections are likely to be inaccurate.
IV. Conclusions are often based on sampling, and improper sampling leads to
improper results.
DATA
● Quantitative information shown as numbers.
PRIMARY DATA
● The data which are collected for the first time by an investigator or agency.
SECONDARY DATA
● Already collected data used by a different person or agency.
VARIABLE
● A variable is a measurable (quantitative) characteristic.
DISCRETE VARIABLE
● When a variable assumes a finite or a countably infinite number of isolated values, it is known as a discrete variable.
● EXAMPLE : Number of petals in a flower, the number of road accidents in a locality
CONTINUOUS VARIABLE
● When a variable assumes any value from a given interval, it is known as a continuous variable.
● EXAMPLE : height, weight
DISCRETE VARIABLE
● Number of petals in a flower
● Number of misprints a book contains
● Number of road accidents in a particular locality
● Annual income of a person
● Marks of a student
● The distribution of shares
● Salary of a person (personal point of view)
CONTINUOUS VARIABLE
● Height
● Weight
● Sale
● The distribution of profits of a blue-chip company
● Age of a person
● Turnover of a company (commercial point of view)
ATTRIBUTE
● A qualitative characteristic is known as an attribute.
● The gender of a baby, the nationality of a person, the colour of a
flower etc. are examples of attributes.
COLLECTION OF PRIMARY DATA
● Interview method : personal interview, indirect interview, telephone interview
● Mailed questionnaire
● Observation
● Questionnaire filled by enumerator
INTERVIEW METHOD
PERSONAL INTERVIEW METHOD
● The investigator meets the respondents directly and collects the required information.
● Highly accurate
● EXAMPLE : natural calamity like a super cyclone or an earthquake, or an epidemic like plague
INDIRECT INTERVIEW METHOD
● When reaching the respondent is difficult, data is collected by contacting associated persons.
● Highly accurate, low coverage
● EXAMPLE : rail accident
TELEPHONE INTERVIEW METHOD
● Data is collected over phone
● Quick and non-expensive method
● Non-responses are maximum
● Low accuracy, high coverage
MAILED QUESTIONNAIRE METHOD
● In this method, a well-drafted and soundly-sequenced questionnaire
covering all the important aspects of the data requirement is sent
to the respondents for filling.
● Coverage is wide but the amount of non-responses will be maximum.
OBSERVATION METHOD
● In this method, data is collected by direct
observation or by using an instrument.
● EXAMPLE : data on the height and weight of a group of
students.
● more accurate
● time consuming,
● laborious
● covers only a small area.
QUESTIONNAIRE FILLED AND SENT BY ENUMERATORS
● An enumerator is a person who directly
interacts with the respondent and fills in the
questionnaire.
● It is generally used in case of surveys and
census.
SOURCES OF SECONDARY DATA
● International sources : WHO, ILO, IMF, World Bank etc.
● Government sources : Statistical Abstract by CSO etc.
● Private and quasi-government sources : ISI, ICAR, NCERT etc.
● Unpublished sources of various research institutes, researchers etc.
SCRUTINY OF DATA
● Checking accuracy and consistency of data
● No hard and fast rules can be recommended for the scrutiny
of data. One must apply his intelligence, patience and
experience while scrutinising the given information.
INTERNAL CONSISTENCY
When two or more series of related data are given, we should check
the consistency among them.
CLASSIFICATION OR ORGANISATION OF DATA
● It puts the data in a neat, precise and condensed form so that it
is easily understood and interpreted.
● It makes comparison possible between various characteristics.
● Statistical analysis is possible only for the classified data.
POPULAR DATA CLASSIFICATION
● Chronological / Temporal / Time Series Data
● Geographical / Spatial Series Data
● Qualitative / Ordinal Data
● Quantitative / Cardinal Data
Chronological / Temporal / Time Series
When the data are classified in respect of successive time points or
intervals, they are known as time series data.
EXAMPLE
The following example shows
the population of India
classified in terms of years.
Geographical / Spatial Series Data
Data arranged region wise are known as geographical data.
EXAMPLE
The yield of wheat in
different countries.
QUALITATIVE / ORDINAL DATA
● Data classified in respect of an attribute are referred to as qualitative data.
Data on nationality, gender, smoking habit of a group of individuals are
examples of qualitative data.
EXAMPLE
In the following example, we find
population of a country is grouped
on the basis of the qualitative
variable “gender”
Quantitative / Cardinal Data
● when the data are classified in respect of a variable, say height,
weight, profits, salaries , marks of students etc., they are known as
quantitative data.
EXAMPLE
The quantitative
classification of marks in
mathematics.
MODE OF PRESENTATION OF DATA
● Textual Presentation
● Tabular Presentation / Tabulation
● Diagrammatic Representation
TEXTUAL PRESENTATION
● This method comprises presenting data with the help of a paragraph or
a number of paragraphs.
● EXAMPLE
● 'In 2009, out of a total of five thousand workers of Roy Enamel Factory,
four thousand and two hundred were members of a Trade Union. The
number of female workers was twenty per cent of the total workers, out
of which thirty per cent were members of the Trade Union.'
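The figures buried in the paragraph above can be worked out into the cells of a table. The following is a minimal sketch of that arithmetic; the variable names and the dictionary layout are illustrative assumptions, not part of the original example.

```python
# Figures stated in the Roy Enamel Factory paragraph (year 2009).
total_workers = 5000
union_members = 4200

# Female workers are 20% of all workers; 30% of them are union members.
female_workers = total_workers * 20 // 100
female_members = female_workers * 30 // 100

# The remaining cells follow by subtraction.
male_workers = total_workers - female_workers
male_members = union_members - female_members

table = {
    "Male": {
        "Member": male_members,
        "Non-member": male_workers - male_members,
        "Total": male_workers,
    },
    "Female": {
        "Member": female_members,
        "Non-member": female_workers - female_members,
        "Total": female_workers,
    },
}

for gender, row in table.items():
    print(gender, row)
```

Laid out this way, the same data can be compared across rows and columns, which the textual form does not allow.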
TEXTUAL PRESENTATION
MERITS
● Even a layman can present data by this method.
● The observations with exact magnitude can be presented with the help of textual presentation.
DEMERITS
● It is dull, monotonous and comparison between different observations is not possible in this method.
TABULAR PRESENTATION / TABULATION
Tabulation may be defined as systematic presentation of data with the help
of a statistical table .
MERITS
● It facilitates comparison between rows and columns.
● Complicated data can also be represented using tabulation.
● It is a must for diagrammatic representation.
● Without tabulation, statistical analysis of data is not possible.
● It is the MOST ACCURATE / BEST mode of presentation.
TABULAR PRESENTATION / TABULATION
BOX HEAD : entire upper part of the table, which includes columns and sub-column numbers, unit(s) of measurement along with caption.
CAPTION : the upper part of the table, describing the columns and sub-columns.
STUB : left part of the table providing the description of the rows.
BODY : main part of the table that contains the numerical figures.
FOOTNOTE : source of the data at the bottom of the table.
[Illustration : layout of a statistical table]
Box head (with caption) : Member of Trade Union, Not Member of Trade Union, Total; each split by Gender into Male and Female, with units % and No.
Stub : rows for 2009 and 2010
Body : the numerical figures for each row and column
Footnote : From Annual Report of __________
DIAGRAMMATIC REPRESENTATION OF DATA
● An attractive representation of statistical data
● It can be used for both the educated and the uneducated sections
of the society.
● Any hidden trend present in the given data can be noticed only in
this mode of representation.
● Compared to tabulation, this is less accurate. So, if there is a
priority for accuracy, we have to recommend tabulation.
● It is the MOST ATTRACTIVE mode of presentation.
TYPES OF DIAGRAM
● Line Diagram / Historiagram
● Bar Diagram
● Pie Chart
Line Diagram or Historiagram
● Generally used for time series.
● Wide fluctuation : LOG CHART OR RATIO CHART
● Two or more series of same unit : MULTIPLE LINE CHART
● Two or more series of distinct unit : MULTIPLE AXIS CHART
LINE DIAGRAM
Year    Profit in Rs. Lakhs
2009 5
2010 8
2011 9
2012 6
2013 12
2014 15
2015 24
LOG CHART / RATIO CHART
Year (x)    Profit in Rs. Lakhs (y)    log10 y
2009        10                         1
2010        100                        2
2011        1000                       3
2012        10000                      4
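The log column of a ratio chart can be computed directly. This is a small sketch using the profits from the table above; the dictionary names are assumptions made for illustration.

```python
import math

# Profit in Rs. lakhs by year, as in the log chart table.
profits = {2009: 10, 2010: 100, 2011: 1000, 2012: 10000}

# A ratio chart plots log10(y) instead of y, so a tenfold rise
# always appears as the same vertical step.
log_profits = {year: math.log10(y) for year, y in profits.items()}

for year in profits:
    print(year, profits[year], round(log_profits[year], 2))
```

Because each year's profit is ten times the previous one, the plotted points rise by equal steps, which is why this chart suits widely fluctuating series.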
MULTIPLE LINE CHART
The dotted line represents the
production of rice and the
continuous line that of
wheat.
MULTIPLE AXIS CHART
BAR DIAGRAM
● A bar diagram consists of bars, i.e. rectangles of equal width and usually
of varying lengths, drawn either horizontally or vertically.
Year    Profit in Rs. Lakhs
2009 5
2010 8
2011 9
2012 6
2013 12
2014 15
2015 24
HORIZONTAL BAR DIAGRAM : Qualitative data or data varying over space (geography)
VERTICAL BAR DIAGRAM : Quantitative data or time series data
MULTIPLE / GROUPED BAR DIAGRAM
We consider Multiple or Grouped Bar diagrams to compare related series.
COMPONENT BAR DIAGRAM
● Component or sub-divided
Bar diagrams are applied
for representing data
divided into a number of
components.
PERCENTAGE BAR DIAGRAM
● For relative comparison to the whole, percentage bar
diagrams or divided bar diagrams are used.
Pie chart
It is used for circular presentation of relative data
Segment angle = (segment value / total value) × 360°
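The segment angle formula can be applied to a whole data set at once. Here is a minimal sketch; the function name `segment_angles` and the cost figures are hypothetical, chosen only to illustrate the calculation.

```python
def segment_angles(values):
    """Return the central angle (in degrees) of each pie segment."""
    total = sum(values.values())
    return {name: value / total * 360 for name, value in values.items()}


# Hypothetical cost components, not taken from the chapter.
costs = {"Rent": 25, "Wages": 50, "Other": 25}
angles = segment_angles(costs)

for name, angle in angles.items():
    print(name, round(angle, 1))
```

The angles always add up to 360°, which is a quick check on any pie chart calculation.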
Example: Draw an appropriate diagram with a view to represent the
following data :
Unit 1 Exercise
Set A
Que 1. Which of the following statements is false?
(a) Statistics is derived from the Latin word ‘Status’
(b) Statistics is derived from the Italian word ‘Statista’
(c) Statistics is derived from the French word ‘Statistik’
(d) None of these.
c
Que 2. Statistics is defined in terms of numerical data in the
(a) Singular sense
(b) Plural sense
(c) Either (a) or (b)
(d) Both (a) and (b).
b
Que 3. Statistics is applied in
(a) Economics
(b) Business management
(c) Commerce and industry
(d) All these.
d
Que 4. Statistics is concerned with
(a) Qualitative information
(b) Quantitative information
(c) (a) or (b)
(d) Both (a) and (b).
d
Que 5. An attribute is
(a) A qualitative characteristic
(b) A quantitative characteristic
(c) A measurable characteristic
(d) All these.
a
Que 6. Annual income of a person is
(a) An attribute
(b) A discrete variable
(c) A continuous variable
(d) (b) or (c).
b
Que 7. Marks of a student is an example of
(a) An attribute
(b) A discrete variable
(c) A continuous variable
(d) None of these.
b
Que. 8 Nationality of a student is
(a) An attribute
(b) A continuous variable
(c) A discrete variable
(d) (a) or (c).
a
Que 9 Drinking habit of a person is
(a) An attribute
(b) A variable
(c) A discrete variable
(d) A continuous variable.
a
Que 10. Age of a person is
(a) An attribute
(b) A discrete variable
(c) A continuous variable
(d) A variable.
c
Que 11. Data collected on religion from the census reports are
(a) Primary data
(b) Secondary data
(c) Sample data
(d) (a) or (b).
b
Que.12 The data collected on the height of a group of students after
recording their heights with a measuring tape are
(a) Primary data
(b) Secondary data
(c) Discrete data
(d) Continuous data.
a
Que 13. The primary data are collected by
(a) Interview method
(b) Observation method
(c) Questionnaire method
(d) All these.
d
Que 14. The quickest method to collect primary data is
(a) Personal interview
(b) Indirect interview
(c) Telephone interview
(d) By observation.
c
Que 15. The best method to collect data, in case of a natural calamity, is
(a) Personal interview
(b) Indirect interview
(c) Questionnaire method
(d) Direct observation method.
a
Que 16. In case of a rail accident, the appropriate method of data
collection is by
(a) Personal interview
(b) Direct interview
(c) Indirect interview
(d) All these.
c
Que 17. Which method of data collection covers the widest area?
(a) Telephone interview method
(b) Mailed questionnaire method
(c) Direct interview method
(d) All these.
b
Que 18. The amount of non-responses is maximum in
(a) Mailed questionnaire method
(b) Interview method
(c) Observation method
(d) All these.
a
Que 19. Some important sources of secondary data are
(a) International and Government sources
(b) International and primary sources
(c) Private and primary sources
(d) Government sources.
a
Que 20. Internal consistency of the collected data can be checked when
(a) Internal data are given
(b) External data are given
(c) Two or more series are given
(d) A number of related series are given.
d
Que 21. The accuracy and consistency of data can be verified by
(a) Internal checking
(b) External checking
(c) Scrutiny
(d) Both (a) and (b).
c
Que22. The mode of presentation of data are
(a) Textual, tabulation and diagrammatic
(b) Tabular, internal and external
(c) Textual, tabular and internal
(d) Tabular, textual and external.
a
Que23. The best method of presentation of data is
(a) Textual
(b) Tabular
(c) Diagrammatic
(d) (b) and (c).
b
Que24. The most attractive method of data presentation is
(a) Tabular
(b) Textual
(c) Diagrammatic
(d) (a) or (b).
c
Que 25. For tabulation, ‘caption’ is
(a) The upper part of the table
(b) The lower part of the table
(c) The main part of the table
(d) The upper part of a table that describes the column and sub-column.
d
Que 26. ‘Stub’ of a table is the
(a) Left part of the table describing the columns
(b) Right part of the table describing the columns
(c) Right part of the table describing the rows
(d) Left part of the table describing the rows.
d
Que 27. The entire upper part of a table is known as
(a) Caption
(b) Stub
(c) Box head
(d) Body.
c
Que28. The unit of measurement in tabulation is shown in
(a) Box head
(b) Body
(c) Caption
(d) Stub.
a
Que 29. In tabulation source of the data, if any, is shown in the
(a) Footnote
(b) Body
(c) Stub
(d) Caption.
a
Que 30. Which of the following statements is untrue for tabulation?
(a) Statistical analysis of data requires tabulation
(b) It facilitates comparison between rows and not columns
(c) Complicated data can be presented
(d) Diagrammatic representation of data requires tabulation.
b
Que 31. Hidden trend, if any, in the data can be noticed in
(a) Textual presentation
(b) Tabulation
(c) Diagrammatic representation
(d) All these.
c
Que. 32 Diagrammatic representation of data is done by
(a) Diagrams
(b) Charts
(c) Pictures
(d) All these.
d
Que33. The most accurate mode of data presentation is
(a) Diagrammatic method
(b) Tabulation
(c) Textual presentation
(d) None of these.
b
Que 34. The chart that uses logarithm of the variable is known as
(a) Line chart
(b) Ratio chart
(c) Multiple line chart
(d) Component line chart.
b
Que 35. Multiple line chart is applied for
(a) Showing multiple charts
(b) Two or more related time series when the variables are expressed in
the same unit
(c) Two or more related time series when the variables are expressed in
different unit
(d) Multiple variations in the time series.
b
Que 36. Multiple axis line chart is considered when
(a) There is more than one time series
(b) The units of the variables are different
(c) (a) or (b)
(d) (a) and (b).
d
Que 37. Horizontal bar diagram is used for
(a) Qualitative data
(b) Data varying over time
(c) Data varying over space
(d) (a) or (c).
d
Que 38. Vertical bar diagram is applicable when
(a) The data are qualitative
(b) The data are quantitative
(c) When the data vary over time
(d) (b) or (c).
d
Que 39. Divided bar chart is considered for
(a) Comparing different components of a variable
(b) The relation of different components to the total
(c) (a) or (b)
(d) (a) and (b).
d
Que 40. In order to compare two or more related series, we consider
(a) Multiple bar chart
(b) Grouped bar chart
(c) (a) or (b)
(d) (a) and (b).
c
Que 41 Pie-diagram is used for
(a) Comparing different components and their relation to the total
(b) Representing qualitative data in a circle
(c) Representing quantitative data in circle
(d) (b) or (c).
a
FREQUENCY AND DISTRIBUTION
FREQUENCY : Number of times a particular observation is repeated.
FREQUENCY DISTRIBUTION TABLE : A table which contains observations or class
intervals in one column and the corresponding frequencies in the other.
TYPES OF FREQUENCY DISTRIBUTION
Ungrouped / Simple Frequency Distribution
● When there are a limited number of distinct observations, a frequency can be assigned to each one of them.
Grouped Frequency Distribution
● When there are a large number of observations, grouping is done among them. Each group is called a class interval, and the frequency is assigned to the group and not to an individual value.
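An ungrouped frequency distribution is simply a count of how often each distinct observation occurs. A minimal sketch follows; the list of observations is hypothetical.

```python
from collections import Counter

# Hypothetical raw observations, e.g. misprints found per page.
observations = [2, 0, 1, 2, 3, 1, 2, 0, 2, 1]

# Counter tallies the number of times each distinct value is repeated.
frequency = Counter(observations)

print("Observation  Frequency")
for value in sorted(frequency):
    print(value, frequency[value])
```

The frequencies add up to the number of observations, which is a useful check after tallying.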
Frequency Distribution
● Ungrouped
● Grouped : Non-Overlapping / Mutually Inclusive classification, or Overlapping / Mutually Exclusive classification
Class Limit
● For a class interval, the class limits may be defined as the minimum value and the maximum value the class interval may contain.
● Minimum value = lower class limit (LCL)
● Maximum value = upper class limit (UCL).
CLASS INTERVAL    FREQUENCY    LCL    UCL
44 - 48           3            44     48
49 - 53           4            49     53
54 - 58           5            54     58
Non-Overlapping / Mutually Inclusive classification
CLASS INTERVAL    LCL    UCL
44 - 48           44     48
49 - 53           49     53
54 - 58           54     58
● Includes UCL
● Usually applicable for discrete variable
Overlapping / Mutually Exclusive classification
CLASS INTERVAL    LCL    UCL
40 - 50           40     50
50 - 60           50     60
60 - 70           60     70
● Excludes UCL
● Usually applicable for continuous variable
CLASS BOUNDARY
OVERLAPPING / MUTUALLY EXCLUSIVE
CLASS INTERVAL    LCL    UCL    LCB    UCB
40 - 50           40     50     40     50
50 - 60           50     60     50     60
60 - 70           60     70     60     70
Class limit = Class boundary
NON-OVERLAPPING / MUTUALLY INCLUSIVE
CLASS INTERVAL    LCL    UCL    LCB     UCB
44 - 48           44     48     43.5    48.5
49 - 53           49     53     48.5    53.5
54 - 58           54     58     53.5    58.5
LCB = LCL - 0.5
UCB = UCL + 0.5
(Here 0.5 is half of the gap of 1 between the UCL of one class and the LCL of the next.)
CLASS LENGTH
Class length = UCB - LCB
CLASS INTERVAL    LCL    UCL    LCB     UCB     CLASS LENGTH
44 - 48           44     48     43.5    48.5    5
49 - 53           49     53     48.5    53.5    5
54 - 58           54     58     53.5    58.5    5
Mid-Point or Mid-Value or Class Mark
Mid-point = (LCL + UCL) / 2 = (LCB + UCB) / 2
CLASS INTERVAL    LCL    UCL    LCB     UCB     MID POINT
44 - 48           44     48     43.5    48.5    46
49 - 53           49     53     48.5    53.5    51
54 - 58           54     58     53.5    58.5    56
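The class boundary, class length and mid-point columns can be derived from the class limits alone. The sketch below does this for the non-overlapping classes in the tables above; it assumes the gap between consecutive classes is constant, and the variable names are illustrative.

```python
# Class limits (LCL, UCL) of the non-overlapping classification.
class_limits = [(44, 48), (49, 53), (54, 58)]

# Gap between the UCL of one class and the LCL of the next (49 - 48 = 1).
gap = class_limits[1][0] - class_limits[0][1]

rows = []
for lcl, ucl in class_limits:
    lcb = lcl - gap / 2          # lower class boundary
    ucb = ucl + gap / 2          # upper class boundary
    rows.append({
        "LCL": lcl,
        "UCL": ucl,
        "LCB": lcb,
        "UCB": ucb,
        "length": ucb - lcb,            # class length uses boundaries
        "mid_point": (lcl + ucl) / 2,   # same as (LCB + UCB) / 2
    })

for row in rows:
    print(row)
```

For an overlapping classification the gap is zero, so the same code returns boundaries equal to the limits.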
Example: Following are the weights in kgs. of 36 BBA students of St.
Xavier’s College.
Construct a frequency distribution of weights, taking class length as 5.
Solution: We have, Range = Maximum weight – Minimum weight
= 73 kgs. – 44 kgs. = 29 kgs.
No. of class intervals × class length = Range
No. of class intervals × 5 = 29
No. of class intervals = 29/5 = 5.8, which is taken as 6
(We always take the next integer as the number of class intervals so as
to include both the minimum and maximum values.)
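The step from range to class intervals can be sketched as follows. This assumes integer-valued weights and non-overlapping classes starting at the minimum, and it reads "next integer" as the integer part of range / class length plus one; the variable names are illustrative.

```python
# Values stated in the worked example.
minimum, maximum, class_length = 44, 73, 5

value_range = maximum - minimum                       # 73 - 44 = 29
number_of_classes = value_range // class_length + 1   # 5.8 is taken as 6

# Non-overlapping class limits of width 5, starting at the minimum.
intervals = [
    (minimum + i * class_length, minimum + (i + 1) * class_length - 1)
    for i in range(number_of_classes)
]

for lcl, ucl in intervals:
    print(f"{lcl} - {ucl}")
```

The last class produced is 69 - 73, so the maximum weight of 73 kgs. is included.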
Grouped Frequency Distribution
Que 42. A frequency distribution
(a) Arranges observations in an increasing order
(b) Arranges observations in terms of a number of groups
(c) Relates to a measurable characteristic
(d) All of these
d
Que 43. The frequency distribution of a continuous variable is known as
(a) Grouped frequency distribution
(b) Simple frequency distribution
(c) (a) or (b)
(d) (a) and (b).
a
Que 44. The distribution of shares is an example of the frequency
distribution of
(a) A discrete variable
(b) A continuous variable
(c) An attribute
(d) (a) or (c).
a
Que 45. The distribution of profits of a blue-chip company relates to
(a) Discrete variable
(b) Continuous variable
(c) Attributes
(d) (a) or (b).
b
Que 46. Mutually exclusive classification
(a) Excludes both the class limits
(b) Excludes the upper class limit but includes the lower class limit
(c) Includes the upper class limit but excludes the lower class limit
(d) Either (b) or (c).
b
Que 47. Mutually inclusive classification is usually meant for
(a) A discrete variable
(b) A continuous variable
(c) An attribute
(d) All these.
a
Que 48. Mutually exclusive classification is usually meant for
(a) A discrete variable
(b) A continuous variable
(c) An attribute
(d) Any of these.
b
Que 49. The LCB is
(a) An upper limit to LCL
(b) A lower limit to LCL
(c) (a) and (b)
(d) (a) or (b).
b
Que 50. The UCB is
(a) An upper limit to UCL
(b) A lower limit to LCL
(c) Both (a) and (b)
(d) (a) or (b).
a
Que 51. Length of a class is
(a) The difference between the UCB and LCB of that class
(b) The difference between the UCL and LCL of that class
(c) (a) or (b)
(d) Both (a) and (b).
a
CUMULATIVE FREQUENCY
● These are of two types -
Less than type cumulative frequency
More than type cumulative frequency
● For a particular class boundary
Less than type CF + More than type CF = Total frequency
Class      Frequency
0 - 10     4
10 - 20    8
20 - 30    13
30 - 40    12
40 - 50    6
Class Boundary    More than    Less than
0                 43           0
10                39           4
20                31           12
30                18           25
40                6            37
50                0            43
Class Frequency More than (LCB) Less than (UCB)
0 - 10 4 43 4
10 - 20 8 39 12
20 -30 13 31 25
30 - 40 12 18 37
40 -50 6 6 43
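Both cumulative frequency columns follow mechanically from the class frequencies. A minimal sketch using the table above; the list names are assumptions.

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 8, 13, 12, 6]
total = sum(frequencies)

# Less than type CF, read at each upper class boundary.
less_than = list(accumulate(frequencies))

# More than type CF, read at each lower class boundary:
# total frequency minus everything below that boundary.
more_than = [total - below for below in [0] + less_than[:-1]]

print("Class  Frequency  More than (LCB)  Less than (UCB)")
for (lcb, ucb), f, mt, lt in zip(classes, frequencies, more_than, less_than):
    print(f"{lcb} - {ucb}", f, mt, lt)
```

At any single boundary the two types add up to the total frequency, for example 31 + 12 = 43 at boundary 20.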
Frequency Density
● Frequency density of a class = class frequency / class length
Relative Frequency
● Relative frequency for a particular class lies between 0 and 1
● Relative frequencies add up to unity
Percentage Frequency
● Percentage frequencies add up to one hundred.
Class Interval    Frequency    Class Length    Frequency Density
44 - 48           3            5               3/5 = 0.6
49 - 53           4            5               4/5 = 0.8
54 - 58           5            5               5/5 = 1.0
59 - 63           7            5               7/5 = 1.4
64 - 68           9            5               9/5 = 1.8
69 - 73           8            5               8/5 = 1.6
Total             36
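Class Interval    Frequency    Relative Frequency    Percentage Frequency
44 - 48           3            3/36 = 0.083          3/36 x 100 = 8.33 %
49 - 53           4            4/36 = 0.111          4/36 x 100 = 11.11 %
54 - 58           5            5/36 = 0.139          5/36 x 100 = 13.89 %
59 - 63           7            7/36 = 0.194          7/36 x 100 = 19.44 %
64 - 68           9            9/36 = 0.250          9/36 x 100 = 25.00 %
69 - 73           8            8/36 = 0.222          8/36 x 100 = 22.22 %
Total             36           1                     100 %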
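Frequency density, relative frequency and percentage frequency are all simple ratios of the class frequency. The sketch below computes them for the weights table; the list names are illustrative assumptions.

```python
class_intervals = ["44 - 48", "49 - 53", "54 - 58", "59 - 63", "64 - 68", "69 - 73"]
frequencies = [3, 4, 5, 7, 9, 8]
class_length = 5
total = sum(frequencies)

# Frequency density: class frequency per unit of class length.
densities = [f / class_length for f in frequencies]

# Relative frequency: share of the total; percentage is that share x 100.
relative = [f / total for f in frequencies]
percentage = [r * 100 for r in relative]

for row in zip(class_intervals, frequencies, densities, relative, percentage):
    interval, f, d, r, p = row
    print(interval, f, round(d, 1), round(r, 3), f"{p:.2f} %")
```

Summing the `relative` list gives 1 (up to rounding) and summing `percentage` gives 100, matching the totals row.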
GRAPHICAL REPRESENTATION OF FREQUENCY DISTRIBUTION
● Histogram / Area Diagram
● Frequency Polygon
● Ogives / Cumulative Frequency Graphs
HISTOGRAM / AREA DIAGRAM
● This is a very convenient way to represent a frequency distribution.
● Comparison between the frequencies of two different classes is
possible.
● It is used to calculate MODE.
Class Interval    Frequency    LCB     UCB
44 - 48           3            43.5    48.5
49 - 53           4            48.5    53.5
54 - 58           5            53.5    58.5
59 - 63           7            58.5    63.5
64 - 68           9            63.5    68.5
69 - 73           8            68.5    73.5
Total             36
● Mode = 66.50 kgs.
FREQUENCY POLYGON
● Usually a frequency polygon is meant for a simple / ungrouped
frequency distribution.
● However, we also apply it for a grouped frequency distribution,
provided the width of the class intervals remains the same.
● We can also obtain a frequency polygon starting with a histogram
by joining the mid-points of the upper sides of the rectangles
successively and then completing the figure by joining the two ends
as before.
FREQUENCY POLYGON - UNGROUPED FREQUENCY DISTRIBUTION
Observation (x)    Frequency
0 5
1 5
2 6
3 6
4 4
5 2
6 2
FREQUENCY POLYGON - GROUPED FREQUENCY DISTRIBUTION
OGIVES / CUMULATIVE FREQUENCY GRAPH
By plotting cumulative frequency against the respective class boundary, we get ogives
TWO TYPES OF OGIVES
● Less than type ogives : obtained by taking less than type cumulative frequency on the vertical axis
● More than type ogives : obtained by plotting more than type cumulative frequency on the vertical axis
● Ogives may be considered for obtaining quartiles graphically.
● If a perpendicular is drawn from the point of intersection of the two
ogives on the horizontal axis, then the x-value of this point gives us
the value of median
Class Interval    Frequency    CB      Less than type CF    More than type CF
44 - 48           3            43.5    0                    36
49 - 53           4            48.5    3                    33
54 - 58           5            53.5    7                    29
59 - 63           7            58.5    12                   24
64 - 68           9            63.5    19                   17
69 - 73           8            68.5    28                   8
-                 -            73.5    36                   0
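The graphical reading of the median can be imitated numerically. The sketch below assumes the ogive is drawn with straight-line segments between the plotted points, so the two ogives cross where the less than type CF equals half the total frequency; the variable names are illustrative, and the result is an approximation of what would be read off the graph.

```python
boundaries = [43.5, 48.5, 53.5, 58.5, 63.5, 68.5, 73.5]
less_than_cf = [0, 3, 7, 12, 19, 28, 36]
total = less_than_cf[-1]

# More than type CF at each boundary is the remainder of the total.
more_than_cf = [total - cf for cf in less_than_cf]

# The two ogives intersect where both cumulative frequencies equal N/2.
target = total / 2

median = None
for i in range(1, len(boundaries)):
    if less_than_cf[i] >= target:
        x0, x1 = boundaries[i - 1], boundaries[i]
        y0, y1 = less_than_cf[i - 1], less_than_cf[i]
        # Linear interpolation along the segment that crosses N/2.
        median = x0 + (target - y0) / (y1 - y0) * (x1 - x0)
        break

print(round(median, 2))
```

Replacing `target` with one quarter or three quarters of the total gives the first and third quartiles in the same way.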
FREQUENCY CURVE
● It is a limiting form of a histogram or frequency polygon.
● The frequency curve for a distribution can be obtained by drawing a
smooth and free hand curve through the mid-points of the upper
sides of the rectangles forming the histogram.
4 TYPES OF FREQUENCY CURVE
● Bell-shaped curve
● U-shaped curve
● J-shaped curve
● Mixed curve
BELL SHAPED CURVE
● Most of the commonly used
distributions provide bell-shaped
curve, which, as suggested by the
name, looks almost like a bell.
● The distribution of height, weight,
mark, profit etc. usually belong to
this category.
● On a bell-shaped curve, the frequency, starting from a rather low value,
gradually reaches the maximum value somewhere near the central part and
then gradually decreases to reach its lowest value at the other extremity.
U - SHAPED CURVE
For a U-shaped curve, the
frequency is minimum
near the central part and
the frequency slowly but
steadily reaches its
maximum at the two
extremities.
J - SHAPED CURVE
The J-shaped curve starts
with a minimum frequency
and then gradually reaches
its maximum frequency at
the other extremity .
MIXED CURVE
Que 52. For a particular class boundary, the less than cumulative
frequency and more than cumulative frequency add up to
(a) Total frequency
(b) Fifty per cent of the total frequency
(c) (a) or (b)
(d) None of these.
a
Que 53. Frequency density corresponding to a class interval is the ratio of
(a) Class frequency to the total frequency
(b) Class frequency to the class length
(c) Class length to the class frequency
(d) Class frequency to the cumulative frequency.
b
Que 54. Relative frequency for a particular class
(a) Lies between 0 and 1
(b) Lies between 0 and 1, both inclusive
(c) Lies between –1 and 0
(d) Lies between –1 to 1.
a
Que 55. Mode of a distribution can be obtained from
(a) Histogram
(b) Less than type ogives
(c) More than type ogives
(d) Frequency polygon.
a
Que 56. Median of a distribution can be obtained from
(a) Frequency polygon
(b) Histogram
(c) Less than type ogives
(d) None of these.
c
Que 57. A comparison among the class frequencies is possible only in
(a) Frequency polygon
(b) Histogram
(c) Ogives
(d) (a) or (b).
b
Que 58. Frequency curve is a limiting form of
(a) Frequency polygon
(b) Histogram
(c) (a) or (b)
(d) (a) and (b).
c
Que 59. Most of the commonly used frequency curves are
(a) Mixed
(b) Inverted J-shaped
(c) U-shaped
(d) Bell-shaped.
d
Que 60. The distribution of profits of a company follows
(a) J-shaped frequency curve
(b) U-shaped frequency curve
(c) Bell-shaped frequency curve
(d) Any of these.
c
SET B
Que. 1 Out of 1000 persons, 25 per cent were industrial workers and the
rest were agricultural workers. 300 persons enjoyed world cup matches
on TV. 30 per cent of the people who had not watched world cup matches
were industrial workers. What is the number of agricultural workers who
had enjoyed world cup matches on TV?
(a) 260
(b) 240
(c) 230
(d) 250
Que. 2 A sample study of the people of an area revealed that total
number of women were 40% and the percentage of coffee drinkers were
45 as a whole and the percentage of male coffee drinkers was 20. What
was the percentage of female non-coffee drinkers?
(a) 10
(b) 15
(c) 18
(d) 20
Que. 3 Cost of sugar in a month under the heads raw materials, labour,
direct production and others were 12, 20, 35 and 23 units respectively.
What is the difference between the central angles for the largest and
smallest components of the cost of sugar?
(a) 72°
(b) 48°
(c) 56°
(d) 92°
Que. 4 The number of accidents for seven days in a locality are given
below :
No. of accidents : 0 1 2 3 4 5 6
Frequency : 15 19 22 31 9 3 2
What is the number of cases when 3 or less accidents occurred?
(a) 56
(b) 6
(c) 68
(d) 87
Que. 5 The following data relate to the incomes of 86 persons :
Income in Rs. : 500–999 1000–1499 1500–1999 2000–2499
No. of persons : 15 28 36 7
What is the percentage of persons earning more than Rs. 1500?
(a) 50
(b) 45
(c) 40
(d) 60
Que. 6 The following data relate to the marks of a group of students:
Marks :            Below 10    Below 20    Below 30    Below 40    Below 50
No. of students :  15          38          65          84          100
How many students got marks more than 30?
(a) 65
(b) 50
(c) 35
(d) 43
Que. 7 Find the number of observations between 250 and 300 from the
following data :
Value :                More than 200    More than 250    More than 300    More than 350
No. of observations :  56               38               15               0
(a) 56
(b) 23
(c) 15
(d) 8
SAMPLING
POPULATION / UNIVERSE
● All items, elements or observations of interest having similar properties
are known as a population.
● It may be defined as the aggregate of all the units under consideration.
● EXAMPLE : Population of students enrolled for CA Course
● The number of units belonging to a population is known as population size (N).
TYPES OF POPULATION
● If a population comprises only a finite number of units, then it is known as a
finite population.
● EXAMPLE : Population of students enrolled for CA Course
● If the population contains an infinite or uncountable number of units, then it is
known as an infinite population.
● EXAMPLE : population of stars, the population of mosquitoes
● A population consisting of real objects is known as an existent population.
● A population that exists just hypothetically, like the population of heads when a
coin is tossed infinitely, is known as a hypothetical or an imaginary population.
Census
● Study of every element of the population is called
a census.
SAMPLE
● A sample may be defined as a part of a population so selected
with a view to representing the population in all its
characteristics .
● If a sample contains n units, then n is known as sample size.
● The units forming the sample are known as “Sampling Units”.
● A detailed and complete list of all the sampling units is known as a “Sampling
Frame”.
There are different statistical measures in statistics such as mean, median,
mode, standard deviation, variance, proportion etc. These can be computed
for both population and sample.
PARAMETER
● It is a statistical measure computed from the population.
● A parameter may be defined as a characteristic of a population based on all the units of the population.
STATISTIC
● It is a statistical measure computed from a sample.
● A statistic may be defined as a statistical measure of sample observations and as such it is a function of sample observations.
ESTIMATE
● A statistic is used to estimate a particular
population parameter.
● EXAMPLE : the sample proportion (p̂) is used to estimate the population proportion (P).
● Sampling is a technique of selecting individual
members or subset of the population to make
statistical inferences from them and estimate
characteristics of the whole universe .
● Sample Survey is the study of the unknown population on the basis of a proper representative
sample drawn from it .
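The relation between a parameter and a statistic can be seen by drawing a sample from a known population. The following is a minimal sketch under stated assumptions: the population of the numbers 1 to 1000, the sample size of 50 and the fixed seed are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical finite population of size N = 1000.
population = list(range(1, 1001))
parameter = mean(population)    # population mean, based on all units

# Simple random sample of size n = 50, drawn without replacement.
sample = random.sample(population, 50)
statistic = mean(sample)        # sample mean, a function of sample observations

print("Population mean (parameter):", parameter)
print("Sample mean (statistic):", statistic)
print("Difference:", statistic - parameter)
```

The statistic serves as an estimate of the parameter; the difference printed at the end comes from sampling fluctuation and changes from sample to sample.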
PRINCIPLES OF SAMPLE SURVEY
● Law of statistical regularity
● Principle of inertia
● Principle of optimisation
● Principle of validity
● LAW OF STATISTICAL REGULARITY : According to the law of statistical regularity, if
a sample of fairly large size is drawn from the population under discussion at random,
then on an average the sample would possess the characteristics of that population.
● PRINCIPLE OF INERTIA : It states that as sample size increases , the results
are likely to be more reliable , accurate and precise , provided other factors
are kept constant
● PRINCIPLE OF OPTIMISATION : The principle of optimization ensures that an
optimum level of efficiency at a minimum cost or the maximum efficiency at a
given level of cost can be achieved with the selection of an appropriate
sampling design.
● PRINCIPLE OF VALIDITY : The principle of validity states that a sampling design
is valid only if it is possible to obtain valid estimates and valid tests about
population parameters.
● Only a probability sampling ensures this validity.
COMPARISON BETWEEN SAMPLE SURVEY AND COMPLETE ENUMERATION
● When complete information is collected for all the units belonging to a
population, it is defined as complete enumeration or census.
● In most cases, we prefer sample survey to complete enumeration due to
the following factors:
Speed: As compared to census, a sample survey could be conducted,
usually, much more quickly simply because in sample survey, only a part
of the vast population is enumerated.
Cost : The cost of collection of data on each unit in case of sample survey
is likely to be more as compared to census because better trained
personnel are employed for conducting a sample survey.
But when it comes to total cost, sample survey is likely to be less
expensive as only some selected units are considered in a sample
survey.
Reliability : The data collected in a sample survey are likely to be more
reliable than those in a complete enumeration because of trained
enumerators, better supervision and the application of modern techniques.
Accuracy : Every sample is subject to what is known as sampling
fluctuation, which is termed sampling error.
It is obvious that complete enumeration is totally free from this sampling
error.
It may be noted that in sample survey, the sampling error can be reduced
to a great extent by taking several steps like increasing the sample size,
adhering to a probability sampling design strictly and so on.
Necessity : Sometimes, sampling becomes a necessity. When it comes to
destructive sampling where the items get exhausted, like testing the
length of life of electric bulbs, or sampling from a hypothetical
population like coin tossing, there is no alternative to sample survey.
However, when it is necessary to get detailed information about each and
every item constituting the population, we go for complete enumeration.
ERRORS IN SAMPLE SURVEY
● Errors or biases in a survey may be defined as the deviation between the
value of population parameter as obtained from a sample and its
observed value.
TYPES OF ERROR
● SAMPLING ERROR
● NON-SAMPLING ERROR
SAMPLING ERROR
● Since only a part of the population is investigated in sampling, every sampling
design is subject to this type of error. Sampling errors are absent in a census
survey.
● Factors contributing to sampling errors are as follows:
    ● Errors arising due to defective sampling design
    ● Errors arising due to substitution
    ● Errors owing to faulty demarcation of units
    ● Errors owing to wrong choice of statistic
    ● Variability in the population
NON SAMPLING ERROR
● Errors due to recording observations, biases on the part of the
enumerators, and wrong or faulty interpretation of data are termed
non-sampling errors.
● This type of error occurs in both sample surveys and complete enumeration.
● Factors contributing to non-sampling errors are as follows:
● Lapse of memory
● Ignorance
● Communication gap
● Faulty planning
● Errors in compilation
● Non response bias
● Incomplete coverage
TYPES OF SAMPLING
PROBABILITY SAMPLING
NON - PROBABILITY SAMPLING
MIXED SAMPLING
PROBABILITY SAMPLING
● In probability sampling, there is always a fixed, pre-assigned probability for
each member of the population to be a part of the sample taken from that
population.
● Some important probability sampling methods are:
simple random sampling,
stratified sampling,
multistage sampling, multiphase sampling, cluster sampling and so on.
SIMPLE RANDOM SAMPLING
● When the units are selected independent of each other in
such a way that each unit belonging to the population
has an equal chance of being a part of the sample, the
sampling is known as Simple random sampling or just
random sampling.
SIMPLE RANDOM SAMPLING IS EFFECTIVE IF
● the population is not very large
● the sample size is not very small
● the population under consideration is
not heterogeneous
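The definition of simple random sampling above can be tried out with Python's standard `random` module. This is a minimal sketch: the sampling frame of 20 numbered units and the helper name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(units, n, rng, replace=False):
    """Draw n units so that every unit has the same chance of selection."""
    if replace:
        return rng.choices(units, k=n)   # with replacement: a unit may repeat
    return rng.sample(units, n)          # without replacement: all units distinct

rng = random.Random(42)                  # fixed seed so the run is repeatable
units = list(range(1, 21))               # hypothetical frame: units numbered 1..20
sample = simple_random_sample(units, 5, rng)
print(sample)
```

Here `random.Random.sample` draws without replacement and `random.Random.choices` draws with replacement; in both cases each unit of the frame is equally likely at every draw.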
STRATIFIED SAMPLING
● In this method, the universe or the entire
population is divided into a number of groups
or strata and then a certain number of items are
taken from each group at random.
● Its basic purpose is to ensure that all the
characteristics of a heterogeneous population
are adequately represented in the sample.
● It helps in reduction of variability and thereby an
increase in precision.
● There are two types of allocation of sample size:
● “Proportional allocation” or “Bowley’s allocation”
    ● Used when there is not much variation between the strata variances.
    ● Sample sizes for different strata are taken as proportional to the population
    sizes of the strata.
● “Neyman’s allocation”
    ● Used when the strata variances differ significantly among themselves.
    ● Sample size varies jointly with population size and population standard
    deviation.
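The two allocation rules can be worked through numerically. The sketch below assumes the usual textbook forms, n_h = n·N_h/N for proportional allocation and n_h = n·N_h·S_h / Σ(N_h·S_h) for Neyman's allocation; the stratum sizes and standard deviations are made-up figures, and the function names are hypothetical.

```python
def proportional_allocation(n, sizes):
    """Bowley: stratum sample size proportional to stratum population size."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]

def neyman_allocation(n, sizes, sds):
    """Neyman: stratum sample size proportional to (stratum size x stratum SD)."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

sizes = [500, 300, 200]   # hypothetical stratum population sizes N_h
sds = [10, 20, 40]        # hypothetical stratum standard deviations S_h
n = 100                   # total sample size to be allocated

print(proportional_allocation(n, sizes))                        # [50.0, 30.0, 20.0]
print([round(x, 1) for x in neyman_allocation(n, sizes, sds)])  # [26.3, 31.6, 42.1]
```

With these figures the smallest stratum receives the largest share under Neyman's allocation because it is the most variable. In practice the fractional values would be rounded to whole units.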
● The purposes of stratified sampling are
● (i) to ensure representation of all the sub-populations
● (ii) to provide an estimate of the parameter not only for each stratum but also an
overall estimate
● (iii) to reduce variability and thereby increase precision.
● Stratified sampling is not advisable if
● (i) the population is not large
● (ii) some prior information is not available
● (iii) there is not much heterogeneity among the units of the population
MULTISTAGE SAMPLING
● In this type of complicated sampling,
the population is supposed to be composed
of first stage sampling units, each of
which in its turn is supposed to be
composed of second stage sampling
units, each of which again in its turn is
supposed to be composed of third stage
sampling units and so on till we reach
the ultimate sampling unit.
● Example: Suppose we want to take a sample of 5000
households from India.
● The coverage in case of multistage sampling is quite large.
● It also saves computational labour and is cost-effective.
● It adds flexibility into the sampling process which is lacking
in other sampling schemes.
● However, compared to stratified sampling, multistage
sampling is likely to be less accurate.
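A two-stage version of the multistage idea can be sketched as follows. Everything concrete here is an assumption made for illustration: the frame of four districts with ten households each, the names `district_1` and so on, and the helper `two_stage_sample`. Real multistage designs usually have more stages and unequal unit sizes.

```python
import random

def two_stage_sample(frame, n_first, n_second, rng):
    """Pick n_first first-stage units, then n_second second-stage units inside each."""
    chosen_first = rng.sample(sorted(frame), n_first)
    return {unit: rng.sample(frame[unit], n_second) for unit in chosen_first}

# Hypothetical frame: first-stage units are districts, second-stage units are households.
frame = {
    f"district_{d}": [f"d{d}_household_{h}" for h in range(1, 11)]
    for d in range(1, 5)
}

rng = random.Random(7)   # fixed seed so the run is repeatable
sample = two_stage_sample(frame, n_first=2, n_second=3, rng=rng)
for district, households in sample.items():
    print(district, households)
```

Only the selected districts need a household list, which is one reason the text describes multistage sampling as cost-effective.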
NON - PROBABILITY SAMPLING
● In non-probability sampling, no probability is attached to the
members of the population and as such it is based entirely on the
judgement of the sampler.
● Non-probability sampling is also known as Purposive or
Judgemental Sampling
PURPOSIVE OR JUDGEMENTAL SAMPLING
● This type of sampling is dependent solely
on the discretion of the sampler and he
applies his own judgement based on his
belief, prejudice, whims and interest to
select the sample.
● Since this type of sampling is
non-probabilistic, it is purely subjective
and, as such, varies from person to
person.
● No statistical hypothesis can be tested
on the basis of a purposive sampling
MIXED SAMPLING
● Mixed sampling is based partly on some
probabilistic law and partly on some pre-decided
rule.
● Systematic sampling belongs to this category.
SYSTEMATIC SAMPLING
● It refers to a sampling scheme where the units
constituting the sample are selected at regular
interval after selecting the very first unit at
random i.e., with equal probability.
● Systematic sampling is partly probability
sampling in the sense that the first unit of the
systematic sample is selected probabilistically
and partly non-probability sampling in the sense
that the remaining units of the sample are
selected according to a fixed rule which is
non-probabilistic in nature.
● If the population size N is a multiple of the sample size n i.e. N = nk, for a positive
integer k which must be less than n, then systematic sampling comprises
selecting one of the first k units at random, usually by using random sampling
numbers, and thereafter selecting every kth unit till the complete, adequate and
updated sampling frame comprising all the members of the population is
exhausted. This type of systematic sampling is known as “linear systematic
sampling”. k is known as the “sample interval”.
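The linear scheme can be sketched in a few lines. Units are assumed to be numbered 1 to N in the sampling frame; the function name `linear_systematic_sample` and the figures N = 20, n = 5 are hypothetical.

```python
import random

def linear_systematic_sample(N, n, start=None, rng=None):
    """Return unit numbers r, r+k, ..., r+(n-1)k, where N = n*k and 1 <= r <= k."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                                   # the sample interval
    if start is None:
        start = (rng or random).randint(1, k)    # random start among the first k units
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + j * k for j in range(n)]

# N = 20, n = 5 gives k = 4; a random start of 3 selects every 4th unit from unit 3.
print(linear_systematic_sample(20, 5, start=3))  # [3, 7, 11, 15, 19]
```

Only the starting unit is random; once it is fixed, the rest of the sample is determined by the interval k.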
● However, if N is not a multiple of n, then we may write N = nk + p, 0 < p < n and as
before, we select the first unit from 1 to k by using random sampling numbers and
thereafter select every kth unit in a cyclic order till we get the sample of the
required size n. This type of systematic sampling is known as “circular
systematic sampling.”
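The cyclic selection can be sketched as below, with units numbered 1 to N and arranged in a circle so that unit 1 follows unit N. The function name and the figures N = 11, n = 3 are hypothetical. The starting unit is passed in as a parameter: the text draws it from 1 to k, while some treatments draw it from anywhere in 1 to N, in which case the wrap-around actually comes into play.

```python
def circular_systematic_sample(N, n, start):
    """Select n units at interval k = N // n, wrapping past unit N back to unit 1."""
    k = N // n
    if k < 1:
        raise ValueError("sample size n cannot exceed population size N")
    if not 1 <= start <= N:
        raise ValueError("start must lie between 1 and N")
    return [((start - 1 + j * k) % N) + 1 for j in range(n)]

# N = 11, n = 3 gives k = 3 and p = 2.
print(circular_systematic_sample(11, 3, start=2))  # [2, 5, 8]
print(circular_systematic_sample(11, 3, start=9))  # [9, 1, 4]
```

The second call shows the cyclic order: after unit 9 the count continues past unit 11 and returns to unit 1.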
SAMPLING FLUCTUATION
It is the variation in the value of a statistic computed from different samples.
● If we compute the value of a statistic, say mean, it is quite natural that the value of
the sample mean may vary from sample to sample as the sampling units of one
sample may be different from that of another sample.
SAMPLING DISTRIBUTION
It is the probability distribution of a given statistic.
● The mean of the statistic, as obtained from its sampling distribution, is known
as “Expectation” and the standard deviation of the statistic is known as the
“Standard Error (SE)”.
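For a very small population the sampling distribution can be written out in full. The sketch below assumes a population of three units (3, 6 and 1), samples of size 2 drawn with replacement, and the sample mean as the statistic; these choices are illustrative only.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [3, 6, 1]
n = 2

# All equally likely ordered samples of size n drawn with replacement.
samples = list(product(population, repeat=n))
means = [fmean(s) for s in samples]        # value of the statistic for each sample

expectation = fmean(means)                 # mean of the sampling distribution
standard_error = pstdev(means)             # SD of the sampling distribution

print(len(samples), round(expectation, 3), round(standard_error, 3))  # 9 3.333 1.453
print(round(pstdev(population) / sqrt(n), 3))                          # 1.453
```

The expectation of the sample mean equals the population mean (10/3), and the standard error agrees with the population standard deviation divided by the square root of n, which holds for sampling with replacement.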
SAMPLING DISTRIBUTION AND STANDARD ERROR OF STATISTIC
● SE can be regarded as a measure of precision achieved by
sampling.
● SE is inversely proportional to the square root of sample size.
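For the sample mean, the statement above takes the following standard form. The symbols are assumptions introduced here, since the slide does not name a specific statistic: σ is the population standard deviation, n the sample size and N the population size.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\quad \text{(sampling with replacement)}
\qquad
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}
\quad \text{(sampling without replacement)}
```

In the first case, making the sample four times as large halves the standard error.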
● Starting with a population of N units, we can draw many a sample
of a fixed size n.
● In case of sampling with replacement, the total number of samples
that can be drawn is $N^n$.
● When it comes to sampling without replacement, the total number
of samples that can be drawn is $^{N}C_{n}$.
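These two counts can be checked by direct enumeration for a small case. The sketch assumes that samples with replacement are counted as ordered arrangements and samples without replacement as unordered selections, which is what the two formulas imply; the units 10, 20, 30 and the function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

def count_with_replacement(N, n):
    """Number of ordered samples of size n drawn with replacement: N**n."""
    return N ** n

def count_without_replacement(N, n):
    """Number of unordered samples of size n drawn without replacement: N choose n."""
    return comb(N, n)

units = [10, 20, 30]   # hypothetical population with N = 3
with_repl = list(product(units, repeat=2))
without_repl = list(combinations(units, 2))

print(len(with_repl), count_with_replacement(3, 2))        # 9 9
print(len(without_repl), count_without_replacement(3, 2))  # 3 3
print(without_repl)   # [(10, 20), (10, 30), (20, 30)]
```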
Answer the following questions. Each question carries one mark.
Que. 1 Sampling can be described as a statistical procedure
(a) To infer about the unknown universe from a knowledge of any
sample
(b) To infer about the known universe from a knowledge of a sample
drawn from it
(c) To infer about the unknown universe from a knowledge of a
random sample drawn from it
(d) Both (a) and (b).
Answer: (c)
Que. 2 The Law of Statistical Regularity says that
(a) Sample drawn from the population under discussion possesses
the characteristics of the population
(b) A large sample drawn at random from the population would
possess the characteristics of the population
(c) A large sample drawn at random from the population would
possess the characteristics of the population on an average
(d) An optimum level of efficiency can be attained at a minimum
cost.
Answer: (c)
Que. 3 A sample survey is prone to
(a) Sampling errors
(b) Non-sampling errors
(c) Either (a) or (b)
(d) Both (a) and (b)
Answer: (d)
Que. 4 The population of roses in Salt Lake City is an example of
(a) A finite population
(b) An infinite population
(c) A hypothetical population
(d) An imaginary population.
Answer: (b)
Que. 5 Statistical decision about an unknown universe is taken on the
basis of
(a) Sample observations
(b) A sampling frame
(c) Sample survey
(d) Complete enumeration
Answer: (a)
Que. 6 Random sampling implies
(a) Haphazard sampling
(b) Probability sampling
(c) Systematic sampling
(d) Sampling with the same probability for each unit.
Answer: (d)
Que. 7 A parameter is a characteristic of
(a) Population
(b) Sample
(c) Both (a) and (b)
(d) (a) or (b)
Answer: (a)
Que. 8 A statistic is
(a) A function of sample observations
(b) A function of population units
(c) A characteristic of a population
(d) A part of a population.
Answer: (a)
Que. 9 Sampling Fluctuations may be described as
(a) The variation in the values of a statistic
(b) The variation in the values of a sample
(c) The differences in the values of a parameter
(d) The variation in the values of observations.
Answer: (a)
Que. 10 The sampling distribution is
(a) The distribution of sample observations
(b) The distribution of random samples
(c) The distribution of a parameter
(d) The probability distribution of a statistic.
Answer: (d)
Que. 11 Standard error can be described as
(a) The error committed in sampling
(b) The error committed in sample survey
(c) The error committed in estimating a parameter
(d) Standard deviation of a statistic.
Answer: (d)
Que. 12 A measure of precision obtained by sampling is given by
(a) Standard error
(b) Sampling fluctuation
(c) Sampling distribution
(d) Expectation.
Answer: (a)
Que. 13 As the sample size increases, standard error
(a) Increases
(b) Decreases
(c) Remains constant
(d) Decreases proportionally.
Answer: (b)
Que. 14 If from a population with 25 members, a random sample
without replacement of 2 members is taken, the number of all such
samples is
(a) 300
(b) 625
(c) 50
(d) 600
Answer: (a)
Que. 15 A population comprises 5 members. The number of all
possible samples of size 2 that can be drawn from it with replacement
is
(a) 100
(b) 15
(c) 125
(d) 25
Answer: (d)
Que. 16 Simple random sampling is very effective if
(a) The population is not very large
(b) The population is not much heterogeneous
(c) The population is partitioned into several sections.
(d) Both (a) and (b)
Answer: (d)
Que. 17 Simple random sampling is
(a) A probabilistic sampling
(b) A non- probabilistic sampling
(c) A mixed sampling
(d) Both (b) and (c).
Answer: (a)
Que. 18 According to Neyman’s allocation, in stratified sampling
(a) Sample size is proportional to the population size
(b) Sample size is proportional to the sample SD
(c) Sample size is proportional to the sample variance
(d) Population size is proportional to the sample variance.
Answer: (a)
Que. 19 Which sampling provides separate estimates for population
means for different segments and also an over all estimate?
(a) Multistage sampling
(b) Stratified sampling
(c) Simple random sampling
(d) Systematic sampling
Answer: (b)
Que. 20 Which sampling adds flexibility to the sampling process?
(a) Simple random sampling
(b) Multistage sampling
(c) Stratified sampling
(d) Systematic sampling
Answer: (b)
Que. 21 Which sampling is affected most if the sampling frame
contains an undetected periodicity?
(a) Simple random sampling
(b) Stratified sampling
(c) Multistage sampling
(d) Systematic sampling
Answer: (d)
Que. 22 Which sampling is subjected to the discretion of the sampler?
(a) Systematic sampling
(b) Simple random sampling
(c) Purposive sampling
(d) Quota sampling.
Answer: (c)
Que. 23 If a random sample of size 2 with replacement is taken from
the population containing the units 3,6 and 1, then the samples would
be
(a) (3, 6),(3, 1),(6, 1)
(b) (3, 3),(6, 6),(1, 1)
(c) (3, 3),(3, 6),(3, 1),(6, 6),(6, 3),(6, 1),(1, 1),(1, 3),(1, 6)
(d) (1, 1),(1, 3),(1, 6),(6, 1),(6, 2),(6, 3),(6, 6),(1, 6),(1, 1)
Answer: (c)
Que. 24 If a random sample of size two is taken without replacement
from a population containing the units a,b,c and d then the possible
samples are
(a) (a, b),(a, c),(a, d)
(b) (a, b),(b, c), (c, d)
(c) (a, b), (b, a), (a, c),(c,a), (a, d), (d, a)
(d) (a, b), (a, c), (a, d), (b, c), (b, d), (c,d)