📑 STATISTICS PROJECT
Page 1: ACKNOWLEDGMENT
I sincerely thank my Mathematics teacher for her valuable guidance, motivation, and support
throughout the completion of this project. Her encouragement helped me to understand and
apply statistical concepts effectively.
I am also grateful to my parents and friends for their continuous help and inspiration during this
project.
Page 2: INDEX
Page Content
1 Acknowledgment
2 Index
3 Introduction
4-6 Types of Measures of Central Tendency (Advantages & Disadvantages)
7-9 Problem 1 – Diabetic Patients Data (Step Deviation, Histogram, Ogive)
10-12 Ogive-based Problem (Median, Quartiles, Deciles, Modal Class)
13-14 Analysis & Comparison of Results
15-18 Problem 2 – Salaries (Median, Interquartile Range, Mode)
19-20 Problem 3 – Missing Frequencies
21 Conclusion
22 Bibliography
Page 3: INTRODUCTION
Statistics is the branch of mathematics that deals with the collection, analysis, interpretation,
and presentation of data. In real life, data is often large and unorganized, and it becomes
necessary to represent it in a simplified form for proper understanding and decision-making.
One important tool of statistics is the measure of central tendency, which is a single
representative value that gives an idea about the central or typical value in a set of data.
The three most common measures of central tendency are:
● Mean (Average)
● Median (Middle value)
● Mode (Most frequent value)
This project aims to explore these measures using different datasets and statistical methods,
including step deviation, histograms, ogives, median, quartiles, deciles, and missing frequency
problems.
Pages 4-6: TYPES OF MEASURES OF CENTRAL TENDENCY
1. Mean
● Definition: The mean is the sum of all data values divided by the number of values.
● Advantages:
1. Easy to calculate and understand.
2. Considers all observations in the dataset.
3. Useful for further algebraic and statistical analysis.
● Disadvantages:
1. Heavily affected by extreme values (outliers).
2. Not suitable for qualitative data.
2. Median
● Definition: The median is the middle value of an ordered dataset. For grouped data, it is
obtained using interpolation.
● Advantages:
1. Not affected by outliers.
2. Best suited for skewed data.
● Disadvantages:
1. Ignores most of the data values.
2. Cannot be used in further algebraic analysis.
3. Mode
● Definition: The mode is the most frequently occurring value in the dataset.
● Advantages:
1. Very simple to understand and apply.
2. Suitable for categorical and qualitative data.
● Disadvantages:
1. Mode may not exist, or there may be more than one.
2. Does not use all data values.
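The effect of an outlier on each measure can be checked with a short Python sketch. The marks below are made-up illustration data, not taken from this project, and the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical marks of seven students (illustration only).
marks = [40, 45, 45, 50, 55, 60, 62]
# The same data with one extreme value in place of the last entry.
marks_with_outlier = [40, 45, 45, 50, 55, 60, 620]

for label, data in [("original", marks), ("with outlier", marks_with_outlier)]:
    print(label,
          "mean =", round(mean(data), 2),
          "median =", median(data),
          "mode =", mode(data))
```

In this example the mean jumps from 51 to about 130.71, while the median (50) and the mode (45) stay the same, which matches the advantages and disadvantages listed above.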
Pages 7-9: PROBLEM 1 – DIABETIC PATIENTS
Data Table:
Age (years) <10 <20 <30 <40 <50 <60 <70 <80
No. of Patients 2 11 27 37 45 55 65 72
Step Deviation Method (Mean)
1. Convert cumulative data to class intervals:
0–10, 10–20, 20–30, 30–40, 40–50, 50–60, 60–70, 70–80
2. Class width (h) = 10
3. Take assumed mean A = 45 (class 40–50 midpoint).
4. Compute deviations: u = (x − A) / h, where x is the class midpoint.
5. Mean = A + h × (Σfu / Σf).
(Here, you’ll show the calculation table with midpoints, deviations, and products. The final
answer will come out as the approximate mean age of diabetic patients.)
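A minimal sketch of the step-deviation working in Python, assuming the table gives "less than" cumulative frequencies so that class frequencies are successive differences. The variable names are illustrative choices only.

```python
# Cumulative "less than" data from the table above.
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80]
cumulative = [2, 11, 27, 37, 45, 55, 65, 72]

h = 10  # class width
A = 45  # assumed mean (midpoint of class 40-50)

# Class frequencies are successive differences of the cumulative counts.
freqs = [c - p for c, p in zip(cumulative, [0] + cumulative[:-1])]
midpoints = [u - h / 2 for u in upper_limits]
u_values = [(x - A) / h for x in midpoints]

sum_f = sum(freqs)
sum_fu = sum(f * u for f, u in zip(freqs, u_values))
mean_age = A + h * sum_fu / sum_f

print(freqs)               # [2, 9, 16, 10, 8, 10, 10, 7]
print(sum_f, sum_fu)       # 72 -26.0
print(round(mean_age, 2))  # 41.39
```

Under these assumptions, Σf = 72 and Σfu = −26, so the mean age comes out to about 45 − 3.61 = 41.39 years.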
Histogram
(Graphical representation – insert histogram with Age on X-axis, Frequency on Y-axis.)
Ogive
(Graphical representation – insert ogive with cumulative frequency vs. upper class limits.)
Pages 10-12: OGIVE-BASED PROBLEM
1. Frequency distribution (class width 10)
Cumulative frequencies (less-than type) read from the ogive at each upper class limit:
at 10 → 5, 20 → 15, 30 → 29, 40 → 50, 50 → 75, 60 → 109, 70 → 145, 80 → 172, 90 → 188,
100 → 200.
Frequency = successive differences:
Class (marks)   Cumulative Freq (CF)   Class Freq (f)
0–10            5                      5
10–20           15                     10
20–30           29                     14
30–40           50                     21
40–50           75                     25
50–60           109                    34
60–70           145                    36
70–80           172                    27
80–90           188                    16
90–100          200                    12
Total           200                    200
Total students N = 200.
(i) Scale of the graph
● X-axis (Marks): 1 small square = 1 mark (1 big square = 10 marks).
● Y-axis (Number of students): 1 small square = 2 students (1 big square = 20 students).
(These are the scales consistent with the axis labelling and the point at (100,200).)
(ii) Median and median class
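● Median position = N/2 = 200/2 = 100th observation.
● CF at 50 = 75, at 60 = 109 → median lies in class 50–60 (class frequency f = 34).
Use linear interpolation, with l = lower limit of the median class, cf = cumulative frequency before the class, and h = class width:
Median = l + ((N/2 − cf) / f) × h = 50 + ((100 − 75) / 34) × 10 = 50 + 7.35
Median ≈ 57.35 marks. Median class = 50–60.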
(iii) Upper quartile (Q3) and its class
● Q3 position = 3N/4 = (3 × 200)/4 = 150th observation.
● CF at 70 = 145, at 80 = 172 → Q3 lies in class 70–80 (f = 27).
Interpolation:
Q3 = 70 + ((150 − 145) / 27) × 10 = 70 + 1.85
Q3 ≈ 71.85 marks. Upper-quartile class = 70–80.
(iv) First decile
● D1 position = N/10 = 200/10 = 20th observation.
● CF at 20 = 15, at 30 = 29 → D1 lies in class 20–30 (f = 14).
Interpolation:
D1 = 20 + ((20 − 15) / 14) × 10 = 20 + 3.57
First decile D1 ≈ 23.57 marks (class 20–30).
(v) Modal class
The class with maximum frequency is 60–70 (frequency = 36).
Modal class = 60–70.
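The results in parts (ii) to (v) can be cross-checked with a short Python sketch. It uses the cumulative frequencies read from the ogive; the helper name `grouped_quantile` is a hypothetical name chosen for this illustration.

```python
# Cumulative frequencies read from the ogive at each upper class limit.
upper_limits = list(range(10, 101, 10))
cf = [5, 15, 29, 50, 75, 109, 145, 172, 188, 200]
h = 10      # class width
N = cf[-1]  # total number of students (200)

# Class frequencies are successive differences of the cumulative frequencies.
freqs = [c - p for c, p in zip(cf, [0] + cf[:-1])]

def grouped_quantile(position):
    """Interpolate inside the class holding `position`: l + (position - cf_before) / f * h."""
    for i, c in enumerate(cf):
        if position <= c:
            lower = upper_limits[i] - h
            cf_before = cf[i - 1] if i > 0 else 0
            return lower + (position - cf_before) / freqs[i] * h
    raise ValueError("position exceeds total frequency")

median_marks = grouped_quantile(N / 2)  # 100th observation
q3 = grouped_quantile(3 * N / 4)        # 150th observation
d1 = grouped_quantile(N / 10)           # 20th observation

# Modal class = class with the largest frequency.
modal_index = freqs.index(max(freqs))
modal_class = (upper_limits[modal_index] - h, upper_limits[modal_index])

print(round(median_marks, 2), round(q3, 2), round(d1, 2))  # 57.35 71.85 23.57
print(modal_class)                                         # (60, 70)
```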
(vi) Number of students who scored 95% or more
95% of 100 = 95 marks. We need the number of students with marks ≥ 95.
From the CF: CF at 90 = 188, CF at 100 = 200. The 90–100 class has 12 students.
Assuming uniform distribution inside that class, the number of students in 95–100 is
12 × (100 − 95) / 10 = 6.
So approximately 6 students scored 95% or more.
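The same estimate can be written as a small Python sketch. It assumes, as in the working above, that marks are spread uniformly inside the 90–100 class; the variable names are illustrative.

```python
cf_at_90 = 188   # students scoring below 90 (from the ogive)
cf_at_100 = 200  # students scoring below 100 (all students)
threshold = 95   # 95% of 100 marks
h = 10           # class width

class_freq = cf_at_100 - cf_at_90  # 12 students in 90-100
# Estimated cumulative frequency at 95 marks by linear interpolation.
below_threshold = cf_at_90 + class_freq * (threshold - 90) / h
at_or_above = cf_at_100 - below_threshold

print(below_threshold, at_or_above)  # 194.0 6.0
```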
Pages 13-14: ANALYSIS OF PROBLEM 1 & 2
● By comparing Mean, Median, and Mode, we can see how different measures of central
tendency provide slightly different insights into the same dataset.
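● The class frequencies derived from the Problem 1 table are 2, 9, 16, 10, 8, 10, 10, 7, so the age group 20–30 shows the highest number of diabetic patients (16).
● Possible contributing factors: less physical activity, poor dietary habits, hereditary factors, and, for older patients, age-related health decline.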
● The graphical methods (Histogram, Ogive) visually confirm these results.
Pages 15-18: PROBLEM 2 – SALARIES
Data Table:
Salary (₹ ‘000) 12 27 33 42 51 56 58 62 70
No. of Persons 49 128 63 15 6 7 4 2 1
i) Median Salary
(Step-by-step working here with N=275. Show cumulative frequencies, identify median class,
then calculate median.)
ii) Interquartile Range (IQR)
(Show detailed calculation using interpolation for Q1 and Q3 positions.)
iii) Modal Salary
(Show working by identifying the modal class and substituting values.)
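A rough sketch of the three calculations in Python, under two assumptions that are mine and not stated in the data table: the salaries are treated as a discrete series (individual values, not class intervals), and quartile positions follow the (N + 1)/4 convention. If the salaries are meant as class intervals, the interpolation method would give different answers. The helper name `value_at` is hypothetical.

```python
from itertools import accumulate

salaries = [12, 27, 33, 42, 51, 56, 58, 62, 70]  # in thousands of rupees
persons = [49, 128, 63, 15, 6, 7, 4, 2, 1]

cf = list(accumulate(persons))  # cumulative frequencies
N = cf[-1]                      # 275

def value_at(position):
    """Return the salary of the observation at a 1-based position (discrete series)."""
    for salary, c in zip(salaries, cf):
        if position <= c:
            return salary
    raise ValueError("position exceeds total frequency")

median_salary = value_at((N + 1) / 2)  # 138th observation
q1 = value_at((N + 1) / 4)             # 69th observation
q3 = value_at(3 * (N + 1) / 4)         # 207th observation
iqr = q3 - q1
modal_salary = salaries[persons.index(max(persons))]

print(cf)                      # [49, 177, 240, 255, 261, 268, 272, 274, 275]
print(median_salary, q1, q3)   # 27 27 33
print(iqr, modal_salary)       # 6 27
```

Under these assumptions, the median and modal salary are both ₹27,000 and the interquartile range is ₹6,000.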
Pages 19-20: PROBLEM 3 – MISSING FREQUENCIES
Given:
● Σf = 120
● Mean = 50
Class Interval Frequency
0–20 17
20–40 a
40–60 32
60–80 b
80–100 19
● Class midpoints: 10, 30, 50, 70, 90
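Equation of mean:
(17 × 10 + 30a + 32 × 50 + 70b + 19 × 90) / 120 = 50
⇒ 3480 + 30a + 70b = 6000 ⇒ 3a + 7b = 252
Also, from Σf = 120:
17 + a + 32 + b + 19 = 120 ⇒ a + b = 52
Solve simultaneously for a and b: substituting a = 52 − b gives 4b = 96, so b = 24 and a = 28.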
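The two conditions (Σf = 120 and mean = 50) can also be solved as a quick check in Python. This is only a sketch by substitution; the names `S` and `T` are illustrative.

```python
midpoints = [10, 30, 50, 70, 90]
known = {0: 17, 2: 32, 4: 19}  # class index -> known frequency
total_f = 120
mean = 50

# From the total frequency: a + b = S
S = total_f - sum(known.values())
# From the mean: 30a + 70b = T
T = mean * total_f - sum(midpoints[i] * f for i, f in known.items())

# Substitute a = S - b into 30a + 70b = T and solve for b.
b = (T - midpoints[1] * S) / (midpoints[3] - midpoints[1])
a = S - b

print(S, T)  # 52 2520
print(a, b)  # 28.0 24.0
```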
Page 21: CONCLUSION
This project gave me practical knowledge of how statistics is applied to real-life situations. I
learned how to calculate mean, median, mode, quartiles, and deciles using different methods. I
also understood how to draw and interpret histograms and ogives.
Through the problems on diabetic patients and salary distribution, I realized the importance of
statistics in health studies, economics, and social sciences.
Page 22: BIBLIOGRAPHY
1. NCERT Mathematics Textbook (Class X)
2. R.S. Aggarwal – Statistics and Probability
3. Online Resources: Khan Academy, BYJU’s, PhysicsWallah, Vedantu
4. Teacher’s classroom notes