
Statistical Fundamentals

The document provides an overview of statistics, emphasizing its importance in various fields and everyday life. It distinguishes between descriptive and inferential statistics, explains types of variables, and discusses sampling methods, data types, and measures of central tendency. Additionally, it introduces frequency distributions and graphical representations of data, such as histograms and pie charts.

Uploaded by

tanjim09826


Fundamental Issues of Statistics

Introduction to Statistics
➢ What do you think of when you see/hear the word “Statistics”? The majority of people
immediately think of numerical facts, data, graphs and tables. But not only do statisticians
collect, classify and tabulate data, they also analyze data in order to make generalizations and
decisions.

Why study statistics?

• Everyone comes in contact with statistics in everyday life.
• People should understand reports in newspapers, magazines and journals.
• People should be able to question the statistics they read, and not blindly accept them as
proven fact.

➢ Many areas of study use statistics, such as psychology, sociology, business, biology,
government, engineering, science education and even areas such as history, language and the
arts.

➢ Statistics is the science of collecting, organizing, summarizing, and analyzing data to draw
conclusions or answer questions. It also provides a measure of confidence in any
conclusions.

➢ Statistics is the discipline that concerns the collection, organization, analysis, interpretation,
and presentation of data.

Two types of statistics:

• Descriptive statistics: the use of numbers to summarize information which is known
about some population. [collecting, organizing and summarizing the data]

Descriptive Statistics is the branch of statistics that focuses on collecting, summarizing,
and presenting a set of data.

• Inferential statistics: the use of numbers related to a random sample from a population
to give numerical information about the population itself. [analyzing the data to draw
conclusions or answer questions about the population]

Inferential Statistics is the branch of statistics that analyzes sample data to draw
conclusions about a population.

➢ Probability is very important in inferential statistics; it’s related to the risk of making an
error.

➢ Variables are characteristics of the individuals or things being studied. A variable is a
characteristic of an individual that will be analyzed using statistics.

Two types of variables:
• Qualitative or categorical variable – classification based on some attribute or
characteristic of the individual (non-numerical).

Categorical (qualitative) variables have values that can only be placed into categories,
such as “yes” and “no”; major; architectural style; etc. Example: hair color, eye color,
gender, race, ethnicity

• Quantitative or numerical variable – provides numerical measures of individuals.

Numerical (quantitative) variables have values that represent quantities.

Two types of quantitative variables:

✓ Discrete (something that can be counted) – has either a finite number of possible
values or a countable number of possible values.

Discrete variables arise from a counting process. Example: number of cars at a light, number
of students in a classroom, number of rooms in a house.

✓ Continuous (something that can be measured) – has an infinite number of possible
values that are not countable.

Continuous variables arise from a measuring process. Example: height, weight, age, miles
per gallon, time.

➢ Raw scores or data: Numbers obtained in a particular situation. A collection of raw scores is
usually called a distribution of scores. Data are individual facts or items of information.

Examples of a distribution:
• Test scores on an exam in a particular class.
• Ages of students at MCCC.
• IQ’s of a random sample of 6th grade students in the Trenton school district.

➢ Population – All people or things being considered in a particular situation.

A population consists of all the items or individuals or subjects about which you want to
draw a conclusion. So, the population is the “large group” in which you are interested.

Example of the population:

• All students in the 6th grade in the Trenton school district.
• The mean IQ score of all 6th grade students in the Trenton school district is a parameter of
this population.

✓ A parameter is a numerical summary of a population, that is, a numerical measure that
describes a characteristic of a population.

➢ Sample – any portion (subset) of a population under consideration.

A sample is the portion of a population selected for analysis. The sample is the “small group”
for whom we have (or plan to have) data, often randomly selected.

Examples of the sample:

• Fifty 6th grade students from the Trenton school district.
• The mean IQ score for fifty 6th grade students from the Trenton school district is a statistic
computed from this sample.

✓ A statistic is a numerical summary of a sample, that is, a numerical measure that
describes a characteristic of a sample.

✓ A parameter goes with a population and a statistic goes with a sample.

o Random Sample: A sample selected in such a way that every member of the population has
an equal chance of being selected. The members of the random sample are picked at random
from the population.

Explanation:

Population: All students attending Mercer County Community College (MCCC).

Variable: Some measure of mathematical ability.

Sample: Students leaving a section of calculus at MCCC.

This is not a random sample from the population of all students at MCCC. From this sample we
should not attempt to infer anything about the mathematical ability of all students at MCCC.

A bias in obtaining a sample will destroy the value of the statistical information obtained since
statistical inferences made from this information would be invalid.

Why use a sample instead of a population?

• Time
• Money
• Population continuously changing.
• Cannot use the entire population.

Primary & Secondary Data:

Primary data are the original data collected through your own first-hand research, for
example surveys and focus group discussions. Secondary data, on the other hand, are data that
were collected in the past by someone else, for example information found on the internet, in
newspaper articles and in company reports.

Basic Terms of Frequency Table

Let us consider the following frequency distribution table of the weights of 50 students.

Table: Frequency distribution table for the weight of 50 students

Class Interval (Class) | Class Boundary (Original Class) | Class Mark / Mid Value (x) | Frequency (f) | Cumulative Frequency (F)
54 - 57 | 53.5 - 57.5 | 55.5 | 6 | 6
58 - 61 | 57.5 - 61.5 | 59.5 | 9 | 15
62 - 65 | 61.5 - 65.5 | 63.5 | 11 | 26
66 - 69 | 65.5 - 69.5 | 67.5 | 16 | 42
70 - 73 | 69.5 - 73.5 | 71.5 | 8 | 50

A frequency distribution shows us a summarized grouping of data divided into mutually
exclusive classes and the number of occurrences in each class. It is a way of organizing raw
data, for example to show the results of an event of interest.

The frequency distribution table is an arrangement of the values that one or more variables take
in a sample. Each entry in the table contains the frequency or count of the occurrences of values
within a particular group or interval, and in this way, the table summarizes the distribution of
values in the sample.

The frequency distribution is a representation, in either graphical or tabular format, that
displays the number of observations within a given interval. The interval size depends on the
data being analyzed and the goals of the analyst. The intervals must be mutually exclusive and
exhaustive.

In statistics, a frequency distribution is a list, table or graph that displays the frequency of
various outcomes in a sample. Each entry in the table contains the frequency or count of the
occurrences of values within a particular group or interval.

The class interval (or class width) is usually the same for all classes. The classes all taken
together must cover at least the distance from the lowest value (minimum) in the data to the
highest (maximum) value. Equal class intervals are preferred in a frequency distribution, while
unequal class intervals (for example logarithmic intervals) may be necessary in certain
situations to produce a good spread of observations between the classes and avoid a large
number of empty, or almost empty, classes.

Corresponding to a class interval, the class limits may be defined as the minimum value and the
maximum value the class interval may contain. The minimum value is known as the lower-class
limit (LCL) and the maximum value is known as the upper-class limit (UCL).

The class boundaries may be defined as the actual class limits of a class interval. For overlapping
(mutually exclusive) classification, the class boundaries coincide with the class limits. This is
usually done for a continuous variable. However, for non-overlapping (mutually inclusive)
classification, such as the classes 54 - 57 and 58 - 61 in the table above, the lower-class
boundary (LCB) and the upper-class boundary (UCB) have the following forms.

\[ LCB = LCL - \frac{D}{2} \quad \text{and} \quad UCB = UCL + \frac{D}{2} \]

where D is the difference between the LCL of the next class interval and the UCL of the given
class interval.

The class midpoint (or class mark) is the point at the center of a class in a frequency
distribution table. It is also the center of the corresponding bar in a histogram. It is defined as
the average of the lower and upper class limits, where the lower-class limit is the lowest value
and the upper-class limit is the highest value that can be in the class. In other words, the class
midpoint of a class interval is the arithmetic mean of its class limits or of its class
boundaries.

The frequency (or absolute frequency) of an event is the number of times the event occurred in
an experiment or study. These frequencies are often graphically represented in histograms.

Cumulative frequency is defined as a running total of frequencies. The frequency of an element
in a set refers to how many times that element occurs in the set. The cumulative frequency of a
class is therefore the sum of its own frequency and the frequencies of all previous classes.

Cumulative frequency analysis is the analysis of the frequency of occurrence of values of a
phenomenon less than a reference value. The phenomenon may be time- or space-dependent.
Cumulative frequency is also called the frequency of non-exceedance.
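The terms defined above can be checked against the weight table with a short Python sketch. It assumes the class limits and frequencies of that table; the variable names are illustrative only.

```python
# Class limits and frequencies copied from the weight table of 50 students.
limits = [(54, 57), (58, 61), (62, 65), (66, 69), (70, 73)]
freqs = [6, 9, 11, 16, 8]

# D = LCL of the next class minus UCL of the given class.
gap = limits[1][0] - limits[0][1]

# LCB = LCL - D/2 and UCB = UCL + D/2.
boundaries = [(lcl - gap / 2, ucl + gap / 2) for lcl, ucl in limits]

# Class mark = average of the class limits.
class_marks = [(lcl + ucl) / 2 for lcl, ucl in limits]

# Cumulative frequency = running total of the frequencies.
cumulative = []
running_total = 0
for f in freqs:
    running_total += f
    cumulative.append(running_total)

for row in zip(limits, boundaries, class_marks, freqs, cumulative):
    print(row)
```

Each printed row should reproduce one row of the table.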

Statistical Graphs
Frequency polygon: It is formed by line segments joining the points plotted at the class marks.
On the horizontal axis is the independent variable (class marks) and on the vertical axis is the
dependent variable (frequency).

Cumulative frequency polygon: It is formed by non-decreasing line segments. On the horizontal
axis is the independent variable (class upper limit) and on the vertical axis is the dependent
variable (cumulative frequency).

Pie chart: A circle is divided into sectors. The angle of each sector is proportional to the
corresponding frequency.

Consider that the total number of students is 300.

Number secured | Allocated subject
40 - 55 | French
55 - 70 | English
70 - 85 | Science
85 - 100 | Mathematics

A woman has to buy 150 household items in total. The price ranges are given below, and the
percentage of each type of item is as provided in the given pie chart.

Toys 1000 - 2000, furniture 3000 - 4000, home décor 4000 - 5000, and electronics 2000 - 3000.

Histogram: It is a bar graph in which the height of each bar is proportional to the frequency
of its class. There is no space between bars. It is only used if the variable is quantitative and
the scale of the values is continuous.

The relation between the frequency polygon and the histogram:

Please, visit the following links for more details.

https://www.statisticshowto.datasciencecentral.com/
https://www.tutorialspoint.com/statistics/index.htm
https://people.richland.edu/james/lecture/m170/ch02-def.html
https://www.scribbr.com/statistics/
https://libguides.library.curtin.edu.au/uniskills/numeracy-skills/statistics
https://www.statisticshowto.com/probability-and-statistics/
https://courses.lumenlearning.com/introstats1/chapter/learning-outcomes/

Measures of Central Tendency
Central tendency: A measure of central tendency is a single value that attempts to describe a set
of data by identifying the central position within that set of data. As such, measures of central
tendency are sometimes called measures of central location.

The Mean, Mode (Mo) and Median (Me) are all valid measures of central tendency, but under
different conditions some of them become more appropriate to use than others. Positional
measures such as the Quartile, Decile, and Percentile are also used to locate other points of the
distribution.

There are three types of Mean, namely Arithmetic Mean (AM), Geometric Mean (GM), and
Harmonic Mean (HM).

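For \( n \) number of classes Arithmetic Mean can be estimated as

\[ AM = \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \]

For the shift \( a \) and scale \( h \), the coding formula for Arithmetic Mean can be found as

\[ u_i = \frac{x_i - a}{h} \]

\[ \Rightarrow x_i = a + h u_i \]

\[ \Rightarrow f_i x_i = a f_i + h f_i u_i \]

\[ \Rightarrow \sum_{i=1}^{n} f_i x_i = a \sum_{i=1}^{n} f_i + h \sum_{i=1}^{n} f_i u_i \]

\[ \Rightarrow \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = a + h \, \frac{\sum_{i=1}^{n} f_i u_i}{\sum_{i=1}^{n} f_i} \]

So, \( AM = a + h \, \dfrac{\sum_{i=1}^{n} f_i u_i}{\sum_{i=1}^{n} f_i} \)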
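As a quick numerical check, the direct and coded formulas for the Arithmetic Mean can be compared in Python. The sketch below assumes the class marks and frequencies of the weight table of 50 students; the shift a = 63.5 and scale h = 4 are illustrative choices, not values prescribed by the notes.

```python
# Class marks (x) and frequencies (f) from the weight table of 50 students.
class_marks = [55.5, 59.5, 63.5, 67.5, 71.5]
freqs = [6, 9, 11, 16, 8]


def arithmetic_mean(x, f):
    """AM = sum(f_i * x_i) / sum(f_i)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)


def arithmetic_mean_coded(x, f, a, h):
    """AM = a + h * sum(f_i * u_i) / sum(f_i), with u_i = (x_i - a) / h."""
    u = [(xi - a) / h for xi in x]
    return a + h * arithmetic_mean(u, f)


am_direct = arithmetic_mean(class_marks, freqs)
# Assumed shift and scale: the middle class mark and the class width.
am_coded = arithmetic_mean_coded(class_marks, freqs, a=63.5, h=4)
print(am_direct, am_coded)
```

Both values should agree at about 64.38, whatever shift and (non-zero) scale are chosen.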

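For \( n \) number of classes Geometric Mean can be estimated as

\[ GM = \left( \prod_{i=1}^{n} x_i^{f_i} \right)^{\frac{1}{\sum_{i=1}^{n} f_i}} \]

For the feasible estimation, the working formula for Geometric Mean can be found as

\[ \log(GM) = \log \left( \prod_{i=1}^{n} x_i^{f_i} \right)^{\frac{1}{\sum_{i=1}^{n} f_i}} \]

\[ \Rightarrow \log(GM) = \frac{1}{\sum_{i=1}^{n} f_i} \log \left( \prod_{i=1}^{n} x_i^{f_i} \right) \]

\[ \Rightarrow \log(GM) = \frac{\sum_{i=1}^{n} \log\left(x_i^{f_i}\right)}{\sum_{i=1}^{n} f_i} \]

\[ \Rightarrow \log(GM) = \frac{\sum_{i=1}^{n} f_i \log(x_i)}{\sum_{i=1}^{n} f_i} \]

So, \( GM = \operatorname{Antilog}\left( \dfrac{\sum_{i=1}^{n} f_i \log x_i}{\sum_{i=1}^{n} f_i} \right) = 10^{\left( \frac{\sum_{i=1}^{n} f_i \log x_i}{\sum_{i=1}^{n} f_i} \right)} \)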

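For \( n \) number of classes Harmonic Mean can be estimated as

\[ HM = \left( \frac{\sum_{i=1}^{n} f_i \left( \frac{1}{x_i} \right)}{\sum_{i=1}^{n} f_i} \right)^{-1} = \left( \frac{\sum_{i=1}^{n} \left( \frac{f_i}{x_i} \right)}{\sum_{i=1}^{n} f_i} \right)^{-1} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \left( \frac{f_i}{x_i} \right)} \]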
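The working formula for the Geometric Mean and the last form of the Harmonic Mean translate directly into code. The following Python sketch assumes the class marks and frequencies of the weight table of 50 students and base-10 logarithms; the function names are illustrative.

```python
import math

# Class marks (x) and frequencies (f) from the weight table of 50 students.
class_marks = [55.5, 59.5, 63.5, 67.5, 71.5]
freqs = [6, 9, 11, 16, 8]


def geometric_mean(x, f):
    """GM = 10 ** (sum(f_i * log10(x_i)) / sum(f_i)); all x_i must be positive."""
    log_mean = sum(fi * math.log10(xi) for xi, fi in zip(x, f)) / sum(f)
    return 10 ** log_mean


def harmonic_mean(x, f):
    """HM = sum(f_i) / sum(f_i / x_i); all x_i must be non-zero."""
    return sum(f) / sum(fi / xi for xi, fi in zip(x, f))


am = sum(fi * xi for xi, fi in zip(class_marks, freqs)) / sum(freqs)
gm = geometric_mean(class_marks, freqs)
hm = harmonic_mean(class_marks, freqs)
print(hm, gm, am)
```

For positive data the three means always satisfy HM ≤ GM ≤ AM, which gives a simple sanity check on the printed values.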

Formula for Mode of the frequency distribution can be estimated as

\[ Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C \]

The class at which the highest frequency is present is called the modal class. \( L \) is the lower
limit of the modal class, \( \Delta_1 = f_m - f_{m-1} \) is the frequency difference between the modal
and pre-modal class, \( \Delta_2 = f_m - f_{m+1} \) is the frequency difference between the modal and
post-modal class, and \( C \) is the class size.
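The Mode formula can be sketched in Python as follows. The example assumes the weight table of 50 students and takes L as the lower class boundary of the modal class (65.5 rather than 66), which is a common convention for non-overlapping classes but an assumption here; a missing neighbouring class is treated as having frequency 0.

```python
# Lower class boundaries and frequencies from the weight table of 50 students.
lower_bounds = [53.5, 57.5, 61.5, 65.5, 69.5]
freqs = [6, 9, 11, 16, 8]
class_size = 4


def grouped_mode(lower_bounds, freqs, class_size):
    """Mo = L + d1 / (d1 + d2) * C for the class with the highest frequency."""
    m = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - f_prev  # modal minus pre-modal frequency
    d2 = freqs[m] - f_next  # modal minus post-modal frequency
    return lower_bounds[m] + d1 / (d1 + d2) * class_size


mode = grouped_mode(lower_bounds, freqs, class_size)
print(mode)
```

Here the modal class is 66 - 69 with d1 = 16 - 11 = 5 and d2 = 16 - 8 = 8, so the mode is 65.5 + (5/13) × 4.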

Formula for Median of the frequency distribution can be estimated as

\[ Me = L + \frac{\frac{N}{2} - F_{m-1}}{f_m} \times C \]

The class containing the \( \frac{N}{2} \)-th observation \( \left( N = \sum_{i=1}^{n} f_i \right) \) is called the median class.
\( L \) is the lower limit of the median class, \( F_{m-1} \) is the cumulative frequency of the
pre-median class, \( f_m \) is the frequency of the median class, and \( C \) is the class size.

Formula for Quartile is

\[ Q_i = L + \frac{\frac{i \times N}{4} - F_{q-1}}{f_q} \times C \; ; \quad i = 1, 2, 3 \]

The class containing the \( \frac{i \times N}{4} \)-th observation \( \left( N = \sum_{i=1}^{n} f_i \right) \) is called the \( i \)-th quartile class.
\( L \) is the lower limit of the quartile class, \( F_{q-1} \) is the cumulative frequency of the
pre-quartile class, \( f_q \) is the frequency of the quartile class, and \( C \) is the class size.

Formula for Decile is

\[ D_i = L + \frac{\frac{i \times N}{10} - F_{d-1}}{f_d} \times C \; ; \quad i = 1, 2, \ldots, 9 \]

The class containing the \( \frac{i \times N}{10} \)-th observation \( \left( N = \sum_{i=1}^{n} f_i \right) \) is called the \( i \)-th decile class.
\( L \) is the lower limit of the decile class, \( F_{d-1} \) is the cumulative frequency of the
pre-decile class, \( f_d \) is the frequency of the decile class, and \( C \) is the class size.

Formula for Percentile is

\[ P_i = L + \frac{\frac{i \times N}{100} - F_{p-1}}{f_p} \times C \; ; \quad i = 1, 2, \ldots, 99 \]

The class containing the \( \frac{i \times N}{100} \)-th observation \( \left( N = \sum_{i=1}^{n} f_i \right) \) is called the \( i \)-th percentile
class. \( L \) is the lower limit of the percentile class, \( F_{p-1} \) is the cumulative frequency of
the pre-percentile class, \( f_p \) is the frequency of the percentile class, and \( C \) is the class size.
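The Median, Quartile, Decile and Percentile formulas share one pattern, so a single helper can cover all of them. The Python sketch below is one possible implementation under stated assumptions: it uses the weight table of 50 students, takes L as the lower class boundary, and the helper name `grouped_quantile` and its `parts` argument (2, 4, 10 or 100) are hypothetical.

```python
# Lower class boundaries and frequencies from the weight table of 50 students.
lower_bounds = [53.5, 57.5, 61.5, 65.5, 69.5]
freqs = [6, 9, 11, 16, 8]
class_size = 4


def grouped_quantile(lower_bounds, freqs, class_size, i, parts):
    """Return L + (i*N/parts - F_prev) / f * C for the class holding position i*N/parts."""
    target = i * sum(freqs) / parts
    cum_before = 0  # cumulative frequency of the preceding classes
    for lower, f in zip(lower_bounds, freqs):
        if cum_before + f >= target:
            return lower + (target - cum_before) / f * class_size
        cum_before += f
    raise ValueError("i must satisfy 0 < i <= parts")


median = grouped_quantile(lower_bounds, freqs, class_size, 1, 2)
q1 = grouped_quantile(lower_bounds, freqs, class_size, 1, 4)
q3 = grouped_quantile(lower_bounds, freqs, class_size, 3, 4)
d5 = grouped_quantile(lower_bounds, freqs, class_size, 5, 10)
p50 = grouped_quantile(lower_bounds, freqs, class_size, 50, 100)
print(median, q1, q3, d5, p50)
```

The median, the 5th decile and the 50th percentile coincide, since each of them locates the \( \frac{N}{2} \)-th observation.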

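For a moderately skewed distribution, the Mean, Median and Mode approximately satisfy the
empirical relation

\[ AM - Mo = 3(AM - Me) \]

\[ \Rightarrow Mo = 3Me - 2AM \]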

Measures of Dispersion
Dispersion: Dispersion in statistics is a way of describing how spread out a set of data is.
When the dispersion of a data set is large, the values in the set are widely scattered; when it is
small, the items in the set are tightly clustered.

The spread of a data set can be described by a range of descriptive statistics including Mean
Deviation (MD), Standard Deviation (SD), and Interquartile Range. These are called the absolute
measures of dispersion.

There are also some relative measures of dispersion, such as the co-efficient of Mean Deviation,
the co-efficient of Standard Deviation, and the co-efficient of Interquartile Range.

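There are three types of Mean Deviation, estimated from Arithmetic Mean, Mode, and Median,
respectively.

\[ MD_{AM} = \frac{\sum_{i=1}^{n} f_i \, |x_i - AM|}{\sum_{i=1}^{n} f_i} \]

\[ MD_{Mo} = \frac{\sum_{i=1}^{n} f_i \, |x_i - Mo|}{\sum_{i=1}^{n} f_i} \]

\[ MD_{Me} = \frac{\sum_{i=1}^{n} f_i \, |x_i - Me|}{\sum_{i=1}^{n} f_i} \]

Co-efficient of Mean Deviation is

\[ CMD = \frac{MD_{Base}}{Base} \times 100\% \; ; \quad Base = AM, Mo, Me \]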
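The Mean Deviation and its co-efficient can be computed for any base with one small function. The Python sketch below assumes the class marks and frequencies of the weight table of 50 students and uses the Arithmetic Mean as the base; passing the Mode or Median instead gives the other two variants.

```python
# Class marks (x) and frequencies (f) from the weight table of 50 students.
class_marks = [55.5, 59.5, 63.5, 67.5, 71.5]
freqs = [6, 9, 11, 16, 8]


def mean_deviation(x, f, base):
    """MD_base = sum(f_i * |x_i - base|) / sum(f_i)."""
    return sum(fi * abs(xi - base) for xi, fi in zip(x, f)) / sum(f)


am = sum(fi * xi for xi, fi in zip(class_marks, freqs)) / sum(freqs)
md_am = mean_deviation(class_marks, freqs, am)

# Co-efficient of Mean Deviation, expressed in percent.
cmd_am = md_am / am * 100
print(md_am, cmd_am)
```

With AM = 64.38 the weighted absolute deviations sum to 213.76, so the Mean Deviation about the Arithmetic Mean is 213.76 / 50 = 4.2752.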

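Variance and Standard Deviation of the statistical data are

\[ V(X) = \sigma^2 = \frac{\sum_{i=1}^{n} f_i (x_i - AM)^2}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i} \]

\[ SD = \sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - AM)^2}{\sum_{i=1}^{n} f_i}} = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}} \]

The working formula for Standard Deviation is

\[
\begin{aligned}
\sigma &= \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}}
= \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i^2 - 2 x_i \bar{x} + \bar{x}^2)}{\sum_{i=1}^{n} f_i}}
= \sqrt{\frac{\sum_{i=1}^{n} f_i x_i^2 - 2\bar{x} \sum_{i=1}^{n} f_i x_i + \bar{x}^2 \sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} f_i}} \\
&= \sqrt{\frac{\sum_{i=1}^{n} f_i x_i^2}{\sum_{i=1}^{n} f_i} - 2\bar{x} \, \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} + \bar{x}^2}
= \sqrt{\frac{\sum_{i=1}^{n} f_i x_i^2}{\sum_{i=1}^{n} f_i} - 2 \, \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \cdot \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} + \left( \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \right)^2} \\
&= \sqrt{\frac{\sum_{i=1}^{n} f_i x_i^2}{\sum_{i=1}^{n} f_i} - 2 \left( \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \right)^2 + \left( \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \right)^2}
= \sqrt{\frac{\sum_{i=1}^{n} f_i x_i^2}{\sum_{i=1}^{n} f_i} - \left( \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \right)^2}
\end{aligned}
\]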

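The coding formula for Standard Deviation is derived from \( u_i = \frac{x_i - a}{h} \) as

\[ x_i = a + h u_i \]

\[ \Rightarrow \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = a + h \, \frac{\sum_{i=1}^{n} f_i u_i}{\sum_{i=1}^{n} f_i} \]

\[ \Rightarrow \bar{x} = a + h \bar{u} \]

So, \( x_i - \bar{x} = h(u_i - \bar{u}) \)

Then, we can write

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}} = \sqrt{\frac{\sum_{i=1}^{n} f_i h^2 (u_i - \bar{u})^2}{\sum_{i=1}^{n} f_i}} = h \times \sqrt{\frac{\sum_{i=1}^{n} f_i (u_i - \bar{u})^2}{\sum_{i=1}^{n} f_i}} \]

Now, the working formula for Standard Deviation is

\[ SD = h \times \sqrt{\frac{\sum_{i=1}^{n} f_i u_i^2}{\sum_{i=1}^{n} f_i} - \left( \frac{\sum_{i=1}^{n} f_i u_i}{\sum_{i=1}^{n} f_i} \right)^2} \]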
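The definition, the working formula and the coded working formula for the Standard Deviation should all give the same number. The Python sketch below checks this on the weight table of 50 students; the shift a = 63.5 and the scale h = 4 are illustrative assumptions.

```python
import math

# Class marks (x) and frequencies (f) from the weight table of 50 students.
class_marks = [55.5, 59.5, 63.5, 67.5, 71.5]
freqs = [6, 9, 11, 16, 8]


def weighted_mean(values, f):
    return sum(fi * vi for vi, fi in zip(values, f)) / sum(f)


def sd_definition(x, f):
    """sqrt(sum(f_i * (x_i - mean)^2) / sum(f_i))."""
    mean = weighted_mean(x, f)
    return math.sqrt(weighted_mean([(xi - mean) ** 2 for xi in x], f))


def sd_working(x, f):
    """sqrt(sum(f_i * x_i^2) / sum(f_i) - (sum(f_i * x_i) / sum(f_i))^2)."""
    return math.sqrt(weighted_mean([xi ** 2 for xi in x], f) - weighted_mean(x, f) ** 2)


def sd_coded(x, f, a, h):
    """h times the working formula applied to u_i = (x_i - a) / h, for h > 0."""
    u = [(xi - a) / h for xi in x]
    return h * sd_working(u, f)


sd1 = sd_definition(class_marks, freqs)
sd2 = sd_working(class_marks, freqs)
sd3 = sd_coded(class_marks, freqs, a=63.5, h=4)  # assumed shift and scale
print(sd1, sd2, sd3)
```

With these coded values, \( \sum f_i u_i = 11 \) and \( \sum f_i u_i^2 = 81 \), so SD = 4 × √(81/50 − (11/50)²) = 4 × √1.5716.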

Co-efficient of Standard Deviation/Variation is

\[ CSD = \frac{SD}{AM} \times 100\% \]

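Interquartile Range is \( IQR = Q_3 - Q_1 \)

Co-efficient of Interquartile Range is

\[ CIQR = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\% \]

For a normal (or approximately normal) distribution, the following approximate relations hold:

\[ MD \approx \frac{4}{5} SD \]

\[ \frac{IQR}{2} \approx \frac{2}{3} SD \quad \text{that is,} \quad IQR \approx \frac{4}{3} SD \]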

Moments, Skewness, and Kurtosis

Two distributions may have the same Mean and Standard Deviation but may differ in the shape
of the distribution. A further description of their characteristics is therefore necessary, and it
is provided by measures of skewness and kurtosis. Moments are popularly used to describe the
characteristics of a distribution. They represent a convenient and unifying method for
summarizing many of the most commonly used descriptive statistical measures such as central
tendency, variation, Skewness, and Kurtosis.

The term ‘skewness’ refers to a lack of symmetry or departure from symmetry, i.e., when a
distribution is not symmetrical (or is asymmetrical) it is called a skewed distribution. The
measures of skewness indicate the difference between the manner in which the observations are
distributed in a particular distribution and the manner in which they are distributed in a
symmetrical (or normal) distribution. The concept of skewness gains importance from the fact
that statistical theory is often based upon the assumption of the normal distribution. A measure
of skewness is, therefore, necessary in order to guard against the consequences of this
assumption.

In statistics, kurtosis refers to the degree of flatness or peakedness in the region about the mode
of a frequency curve. The degree of kurtosis of a distribution is measured relative to the flatness
or peakedness of a normal curve: a curve flatter than the normal curve is called “Platykurtic” and
a curve more peaked than the normal curve is called “Leptokurtic”. The normal curve itself is
known as “Mesokurtic”.


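The \( r \)-th raw moment about an arbitrary point \( A \) is

\[ m_r' = \frac{\sum_{i=1}^{n} f_i (x_i - A)^r}{\sum_{i=1}^{n} f_i} \; ; \quad A \neq \bar{x} \]

The coding formula for the \( r \)-th raw moment from \( u_i = \frac{x_i - A}{h} \) is

\[ x_i - A = h u_i \]

Then, we can write

\[ m_r' = \frac{\sum_{i=1}^{n} f_i h^r u_i^r}{\sum_{i=1}^{n} f_i} = h^r \times \frac{\sum_{i=1}^{n} f_i u_i^r}{\sum_{i=1}^{n} f_i} \]

The \( r \)-th central moment about the arithmetic mean \( \bar{x} \) is

\[ m_r = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^r}{\sum_{i=1}^{n} f_i} \]

We can write

\[ m_1 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i x_i - \bar{x} \sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} - \bar{x} = \bar{x} - \bar{x} = 0 \]

So, \( m_1 = 0 \) for every dataset.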

Again, we can write

\[ m_2 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i} = \sigma^2 \]

So, \( m_2 \) is the variance of the data set and hence \( SD = \sigma = \sqrt{m_2} \).

Estimation of the central moments from the raw moments

\[ m_1 = 0 \]

\[ m_2 = m_2' - {m_1'}^2 \]

\[ m_3 = m_3' - 3 m_2' m_1' + 2 {m_1'}^3 \]

\[ m_4 = m_4' - 4 m_3' m_1' + 6 m_2' {m_1'}^2 - 3 {m_1'}^4 \]
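The conversion from raw moments to central moments can be verified numerically. The Python sketch below assumes the class marks and frequencies of the weight table of 50 students and an arbitrary origin A = 63.5 (an illustrative choice); it compares the converted moments with central moments computed directly about the mean.

```python
# Class marks (x) and frequencies (f) from the weight table of 50 students.
class_marks = [55.5, 59.5, 63.5, 67.5, 71.5]
freqs = [6, 9, 11, 16, 8]


def moment_about(x, f, r, origin):
    """r-th moment about `origin`: sum(f_i * (x_i - origin)^r) / sum(f_i)."""
    return sum(fi * (xi - origin) ** r for xi, fi in zip(x, f)) / sum(f)


A = 63.5  # assumed arbitrary origin, different from the mean
m1r, m2r, m3r, m4r = (moment_about(class_marks, freqs, r, A) for r in (1, 2, 3, 4))

# Central moments obtained from the raw moments.
m2 = m2r - m1r ** 2
m3 = m3r - 3 * m2r * m1r + 2 * m1r ** 3
m4 = m4r - 4 * m3r * m1r + 6 * m2r * m1r ** 2 - 3 * m1r ** 4

# Central moments computed directly about the arithmetic mean.
mean = moment_about(class_marks, freqs, 1, 0)
m2_direct, m3_direct, m4_direct = (
    moment_about(class_marks, freqs, r, mean) for r in (2, 3, 4)
)
print(m2, m3, m4)
```

The two routes should agree up to floating-point rounding, and the first central moment about the mean should be zero.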

Co-efficient of Skewness

\[ \gamma_3 = \frac{m_3}{\sqrt{m_2^3}} \]

If \( \gamma_3 < 0 \), the provided data set is called negatively skewed.
If \( \gamma_3 = 0 \), the provided data set is called non-skewed (symmetric, as for the Normal distribution).
If \( \gamma_3 > 0 \), the provided data set is called positively skewed.

Co-efficient of Kurtosis

\[ \gamma_4 = \frac{m_4}{m_2^2} \]

If \( \gamma_4 < 3 \), the provided data set is called platykurtic (flat).
If \( \gamma_4 = 3 \), the provided data set is called mesokurtic (balanced).
If \( \gamma_4 > 3 \), the provided data set is called leptokurtic (peaked).
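The two co-efficients can be computed from the central moments in a few lines. The Python sketch below is a minimal illustration: the function names are hypothetical, the first data set is the weight table of 50 students, and the second is a small made-up symmetric data set used only to show the zero-skewness case.

```python
import math


def central_moment(x, f, r):
    """r-th central moment of grouped data with values x and frequencies f."""
    mean = sum(fi * xi for xi, fi in zip(x, f)) / sum(f)
    return sum(fi * (xi - mean) ** r for xi, fi in zip(x, f)) / sum(f)


def skewness(x, f):
    """gamma_3 = m_3 / sqrt(m_2^3)."""
    return central_moment(x, f, 3) / math.sqrt(central_moment(x, f, 2) ** 3)


def kurtosis(x, f):
    """gamma_4 = m_4 / m_2^2."""
    return central_moment(x, f, 4) / central_moment(x, f, 2) ** 2


# Weight table of 50 students (class marks and frequencies).
weights_x = [55.5, 59.5, 63.5, 67.5, 71.5]
weights_f = [6, 9, 11, 16, 8]
g3 = skewness(weights_x, weights_f)
g4 = kurtosis(weights_x, weights_f)

# Hypothetical symmetric data set: values 1, 2, 3 with frequencies 1, 2, 1.
g3_sym = skewness([1, 2, 3], [1, 2, 1])
g4_sym = kurtosis([1, 2, 3], [1, 2, 1])
print(g3, g4, g3_sym, g4_sym)
```

For the weight table the third central moment is negative, so that data set is negatively skewed; the symmetric example has zero skewness.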

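Corrected central moments for the grouping error caused by using classes of size \( c \)
(Sheppard's corrections)

\[ m_1^{(corrected)} = 0 \]

\[ m_2^{(corrected)} = m_2 - \frac{1}{12} c^2 \]

\[ m_3^{(corrected)} = m_3 \]

\[ m_4^{(corrected)} = m_4 - \frac{1}{2} m_2 c^2 + \frac{7}{240} c^4 \]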

Correlation and Regression

Correlation Analysis: Correlation analysis is applied in quantifying the association between two
continuous variables, for example, between a dependent and an independent variable or between
two independent variables.

The sign of the coefficient of correlation shows the direction of the association. The magnitude
of the coefficient shows the strength of the association. The sample correlation coefficient is
estimated in the correlation analysis.

The correlation coefficient, denoted by r, ranges between −1 and +1 and quantifies the strength
and direction of the linear association between two variables. The correlation between two
variables can either be positive, i.e., a higher level of one variable is related to a higher level of
the other, or negative, i.e., a higher level of one variable is related to a lower level of the other.


Regression Analysis: Regression analysis involves identifying the relationship between a
dependent variable and one or more independent variables. The outcome variable is known as the
dependent or response variable, and the risk factors and confounders are known as predictors or
independent variables. The dependent variable is shown by 𝒚 and independent variables are
shown by 𝒙 in regression analysis.

A model of the relationship is hypothesized, and estimates of the parameter values are used to
develop an estimated regression equation. Various tests are then employed to determine if the
model is satisfactory. If the model is deemed satisfactory, the estimated regression equation can
be used to predict the value of the dependent variable given values for the independent variables.

Linear Regression: This is a linear approach to modeling the relationship between a scalar
response and one or more independent variables. If the regression has one independent
variable, then it is known as a simple linear regression. If it has more than one independent
variable, then it is known as multiple linear regression.

Linear regression focuses on the conditional probability distribution of the response given the
values of the predictors, rather than on the joint probability distribution of all the variables. In
general, most real-world regression models involve multiple predictors, so the term linear
regression often refers to multiple linear regression.


Comparison between Correlation and Regression:

Basis | Correlation | Regression
Meaning | A statistical measure that defines co-relationship or association of two variables. | Describes how an independent variable is associated with the dependent variable.
Dependent and Independent variables | No difference | Both variables are different.
Usage | To describe a linear relationship between two variables. | To fit the best line and estimate one variable based on another variable.
Objective | To find a value expressing the relationship between variables. | To estimate the values of a random variable based on the values of a fixed variable.

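If \( N \) is the number of observations and \( x \) and \( y \) are two variables, the following
notations are commonly used

\[ V(X) = \sigma_x^2 = \frac{\sum (x - \bar{x})^2}{N} = S_{xx} \]

\[ V(Y) = \sigma_y^2 = \frac{\sum (y - \bar{y})^2}{N} = S_{yy} \]

\[ Cov(X, Y) = \sigma_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{N} = S_{xy} \]

Here, \( S_{xy} \) is called the covariance between \( x \) and \( y \).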

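The correlation coefficient

\[
\begin{aligned}
r_{xy} = r_{yx} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}
&= \frac{\dfrac{\sum (x - \bar{x})(y - \bar{y})}{N}}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{N} \cdot \dfrac{\sum (y - \bar{y})^2}{N}}}
= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \\
&= \frac{N \sum xy - \sum x \sum y}{\sqrt{\left( N \sum x^2 - (\sum x)^2 \right) \left( N \sum y^2 - (\sum y)^2 \right)}}
\end{aligned}
\]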
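The deviation form and the computational form of the correlation coefficient can be compared on a small example. The five paired observations in the Python sketch below are made up purely for illustration.

```python
import math

# Hypothetical paired observations, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)


def correlation_deviation(x, y):
    """r = sum((x - x_bar)(y - y_bar)) / sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_computational(x, y):
    """r = (N*sum(xy) - sum(x)*sum(y)) / sqrt((N*sum(x^2) - sum(x)^2)(N*sum(y^2) - sum(y)^2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r_dev = correlation_deviation(x, y)
r_comp = correlation_computational(x, y)
print(r_dev, r_comp)
```

For this data the computational form gives 30 / √(50 × 30), about 0.775, and the deviation form gives the same value.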

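Assume \( u = \frac{x - a}{h} \) and \( v = \frac{y - b}{k} \), where \( a \) and \( b \) are the shifts and \( h \) and \( k \) are the
(positive) scales. Then it can be shown that

\[ x = a + hu \qquad\qquad y = b + kv \]

\[ \text{That gives, } \bar{x} = a + h\bar{u} \qquad\qquad \text{That gives, } \bar{y} = b + k\bar{v} \]

\[ \text{So, } x - \bar{x} = h(u - \bar{u}) \qquad\qquad \text{So, } y - \bar{y} = k(v - \bar{v}) \]

\[
\begin{aligned}
r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}
&= \frac{hk \sum (u - \bar{u})(v - \bar{v})}{\sqrt{h^2 k^2 \sum (u - \bar{u})^2 \sum (v - \bar{v})^2}}
= \frac{\sum (u - \bar{u})(v - \bar{v})}{\sqrt{\sum (u - \bar{u})^2 \sum (v - \bar{v})^2}} = r_{uv} \\
&= \frac{N \sum uv - \sum u \sum v}{\sqrt{\left( N \sum u^2 - (\sum u)^2 \right) \left( N \sum v^2 - (\sum v)^2 \right)}}
\end{aligned}
\]

So, the Correlation coefficient is independent of the shift and scale.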

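There are two Regression co-efficients

\[ b_{yx} = \frac{\sigma_{xy}}{\sigma_x^2} = \frac{\dfrac{\sum (x - \bar{x})(y - \bar{y})}{N}}{\dfrac{\sum (x - \bar{x})^2}{N}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - (\sum x)^2} \]

\[ b_{xy} = \frac{\sigma_{xy}}{\sigma_y^2} = \frac{\dfrac{\sum (x - \bar{x})(y - \bar{y})}{N}}{\dfrac{\sum (y - \bar{y})^2}{N}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{N \sum xy - \sum x \sum y}{N \sum y^2 - (\sum y)^2} \]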

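Again, for the shifted and scaled data set

\[
\begin{aligned}
b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
&= \frac{hk \sum (u - \bar{u})(v - \bar{v})}{h^2 \sum (u - \bar{u})^2}
= \frac{k}{h} \cdot \frac{\sum (u - \bar{u})(v - \bar{v})}{\sum (u - \bar{u})^2} = \frac{k}{h} \times b_{vu} \\
&= \frac{k}{h} \times \left( \frac{N \sum uv - \sum u \sum v}{N \sum u^2 - (\sum u)^2} \right)
\end{aligned}
\]

\[
\begin{aligned}
b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}
&= \frac{hk \sum (u - \bar{u})(v - \bar{v})}{k^2 \sum (v - \bar{v})^2}
= \frac{h}{k} \cdot \frac{\sum (u - \bar{u})(v - \bar{v})}{\sum (v - \bar{v})^2} = \frac{h}{k} \times b_{uv} \\
&= \frac{h}{k} \times \left( \frac{N \sum uv - \sum u \sum v}{N \sum v^2 - (\sum v)^2} \right)
\end{aligned}
\]

So, the Regression coefficients are independent of the shifts but dependent on scales.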
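The effect of shifting and scaling can be checked numerically. In the Python sketch below the data, the shifts (a = 3, b = 4) and the positive scales (h = 2, k = 0.5) are all made-up values chosen only for illustration.

```python
import math


def deviation_sums(x, y):
    """Return sum((x-x_bar)(y-y_bar)), sum((x-x_bar)^2), sum((y-y_bar)^2)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy, sxx, syy


def correlation(x, y):
    sxy, sxx, syy = deviation_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def slope(dependent, independent):
    """Regression coefficient of `dependent` on `independent`."""
    sxy, sxx, _ = deviation_sums(independent, dependent)
    return sxy / sxx


# Hypothetical data, shifts and scales.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, h = 3, 2
b, k = 4, 0.5
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

r_xy, r_uv = correlation(x, y), correlation(u, v)
b_yx, b_vu = slope(y, x), slope(v, u)
b_xy, b_uv = slope(x, y), slope(u, v)
print(r_xy, r_uv, b_yx, (k / h) * b_vu, b_xy, (h / k) * b_uv)
```

The correlation coefficient is unchanged by the coding, while each regression coefficient picks up the ratio of the two scales.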

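The Regression lines can be formed as given below.

Regression line of \( y \) on \( x \) is

\[ y - \bar{y} = b_{yx} (x - \bar{x}) \]

\[ \Rightarrow y - \frac{\sum y}{N} = b_{yx} \left( x - \frac{\sum x}{N} \right) \]

\[ \Rightarrow y = b_{yx} \left( x - \frac{\sum x}{N} \right) + \frac{\sum y}{N} \]

Regression line of \( x \) on \( y \) is

\[ x - \bar{x} = b_{xy} (y - \bar{y}) \]

\[ \Rightarrow x - \frac{\sum x}{N} = b_{xy} \left( y - \frac{\sum y}{N} \right) \]

\[ \Rightarrow x = b_{xy} \left( y - \frac{\sum y}{N} \right) + \frac{\sum x}{N} \]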

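Now, we have

\[
\begin{aligned}
b_{yx} \times b_{xy} &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \times \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}
= \frac{\left( \sum (x - \bar{x})(y - \bar{y}) \right)^2}{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2} \\
&= \left\{ \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \right\}^2 = r_{xy}^2
\end{aligned}
\]

So, it can be written as

\[ r_{xy} = \pm \sqrt{b_{yx} \times b_{xy}} \]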

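Then, the definitive formulae can be applied to find the following.

\[ \frac{r_{xy}}{b_{yx}} = \frac{\dfrac{\sigma_{xy}}{\sigma_x \sigma_y}}{\dfrac{\sigma_{xy}}{\sigma_x^2}} = \frac{\sigma_x}{\sigma_y} \qquad\qquad \frac{r_{xy}}{b_{xy}} = \frac{\dfrac{\sigma_{xy}}{\sigma_x \sigma_y}}{\dfrac{\sigma_{xy}}{\sigma_y^2}} = \frac{\sigma_y}{\sigma_x} \]

\[ \Rightarrow r_{xy} = b_{yx} \times \frac{\sigma_x}{\sigma_y} \qquad\qquad \Rightarrow r_{xy} = b_{xy} \times \frac{\sigma_y}{\sigma_x} \]

\[ \Rightarrow b_{yx} = r_{xy} \times \frac{\sigma_y}{\sigma_x} \qquad\qquad \Rightarrow b_{xy} = r_{xy} \times \frac{\sigma_x}{\sigma_y} \]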
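The two regression lines and the identities linking them to the correlation coefficient can be illustrated with a short Python sketch. The data set is made up for illustration, and the helper names `predict_y` and `predict_x` are hypothetical.

```python
import math

# Hypothetical paired observations, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b_yx = sxy / sxx  # regression coefficient of y on x
b_xy = sxy / syy  # regression coefficient of x on y
r = sxy / math.sqrt(sxx * syy)


def predict_y(x_value):
    """Regression line of y on x: y = y_bar + b_yx * (x - x_bar)."""
    return y_bar + b_yx * (x_value - x_bar)


def predict_x(y_value):
    """Regression line of x on y: x = x_bar + b_xy * (y - y_bar)."""
    return x_bar + b_xy * (y_value - y_bar)


print(b_yx, b_xy, r ** 2, predict_y(6), predict_x(4))
```

For this data the product of the two regression coefficients equals r², and both lines pass through the point of means (x̄, ȳ) = (3, 4).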

Rank Correlation: Sometimes there doesn’t exist a marked linear relationship between two
random variables but a monotonic relation (if one increases, the other also increases or instead,
decreases) is clearly noticed.

If instead of measuring the correlation between two sets of continuous random variables \( x \) and \( y \)
we replace their numerical values by their rankings, then we obtain the Rank Correlation
coefficient. The rank of the \( i \)-th element of a sample of size \( N \) is equal to its position (index)
in the ordered sample, that is, the index of the corresponding order statistic.

The Rank Correlation coefficient

\[ r_{rank} = 1 - \frac{6 \sum d_i^2}{N(N^2 - 1)} \; ; \quad d_i = x_i - y_i \]

where \( x_i \) and \( y_i \) here denote the ranks of the \( i \)-th pair of observations, so \( d_i \) is the
difference between the two ranks.
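The Rank Correlation formula can be sketched in Python as follows. The sketch assumes there are no tied values (ties would need averaged ranks, which this simple version does not handle), and the data are made up for illustration.

```python
def ranks(values):
    """Rank 1 for the smallest value, N for the largest; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_correlation(x, y):
    """r_rank = 1 - 6 * sum(d_i^2) / (N * (N^2 - 1)), d_i = rank difference."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical paired observations without ties.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
r_rank = rank_correlation(x, y)
print(r_rank)
```

Here the rank differences are 0, −1, 1, −1, 1, so Σd² = 4 and the coefficient is 1 − 24/120 = 0.8. A perfectly increasing relation gives +1 and a perfectly decreasing one gives −1, even when the relation is not linear.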

No ratings yet
Probability Tools
4 pages
Box Whisker Plot
No ratings yet
Box Whisker Plot
6 pages
Full Functions Booklet
No ratings yet
Full Functions Booklet
29 pages
Online Examination System Project Report
0% (2)
Online Examination System Project Report
125 pages
Merger Acquisition and Corporate Restructuring
No ratings yet
Merger Acquisition and Corporate Restructuring
30 pages
BlueCrest College Fees Guide
No ratings yet
BlueCrest College Fees Guide
1 page
Pico Kaplan Turbine Study
No ratings yet
Pico Kaplan Turbine Study
9 pages
068 Total Champuvoin Architecture DD Specification
No ratings yet
068 Total Champuvoin Architecture DD Specification
3 pages
SPG Admission List 2022 PDF
No ratings yet
SPG Admission List 2022 PDF
21 pages
Abrasion
No ratings yet
Abrasion
5 pages
English Exercises
No ratings yet
English Exercises
26 pages
Assessment Review Competency Test Excel Fundamentals - AlmaBetter
No ratings yet
Assessment Review Competency Test Excel Fundamentals - AlmaBetter
22 pages
Jiraiya English
No ratings yet
Jiraiya English
2 pages
Class 8 - Sci - Revision Practice Worksheet - 2023-24
No ratings yet
Class 8 - Sci - Revision Practice Worksheet - 2023-24
13 pages
ChaseDream Business School Guide CBS - ZH-CN - en
No ratings yet
ChaseDream Business School Guide CBS - ZH-CN - en
11 pages
RIICWD512E Student Assessment Tasks
No ratings yet
RIICWD512E Student Assessment Tasks
39 pages
Hw2 0809spring PDF
No ratings yet
Hw2 0809spring PDF
8 pages
Freely-Jointed Chain: For Up To Date Version of This Document, See Z. Suo
No ratings yet
Freely-Jointed Chain: For Up To Date Version of This Document, See Z. Suo
6 pages
DETC2005-85460: A Note On Denavit-Hartenberg Notation in Robotics
No ratings yet
DETC2005-85460: A Note On Denavit-Hartenberg Notation in Robotics
6 pages
3t-Db-Screen Black - 01-062023
No ratings yet
3t-Db-Screen Black - 01-062023
2 pages
A New Fast and Efficient Decision-Based Algorithm For Removal of High-Density Impulse Noises
No ratings yet
A New Fast and Efficient Decision-Based Algorithm For Removal of High-Density Impulse Noises
4 pages
Antena DM C-175
No ratings yet
Antena DM C-175
2 pages
Dungeons & Lairs #48: Assassin School: Level (APL) of 3, 5, 8, or 11. This Document Of-Credits
No ratings yet
Dungeons & Lairs #48: Assassin School: Level (APL) of 3, 5, 8, or 11. This Document Of-Credits
20 pages
Culinary Arts for Hospitality Students
No ratings yet
Culinary Arts for Hospitality Students
44 pages
Calcium Hydroxide Solubility Study
No ratings yet
Calcium Hydroxide Solubility Study
3 pages
History and Careers in Physics
No ratings yet
History and Careers in Physics
6 pages
Global System For Mobile Communication
100% (1)
Global System For Mobile Communication
7 pages
Hydraulic Hand Pallet Trucks
No ratings yet
Hydraulic Hand Pallet Trucks
32 pages
Soal B.inggris Kelas VII
No ratings yet
Soal B.inggris Kelas VII
4 pages
SAS Programming by Example (9) : Chapter 9 Proc Print Writing Simple Reports
No ratings yet
SAS Programming by Example (9) : Chapter 9 Proc Print Writing Simple Reports
3 pages
Chapter 20 - Cardiac Emergencies
No ratings yet
Chapter 20 - Cardiac Emergencies
96 pages
E-Business Infrastructure Guide
100% (1)
E-Business Infrastructure Guide
35 pages

Fundamental Issues of Statistics

Why study statistics?

Two types of statistics:

Descriptive Statistics is the branch of statistics that focuses on collecting, summarizing,

➢ Variables are characteristics of the individuals or things being studied. Variable is a

• Quantitative or numerical variable – provides numerical measures of individuals.

Numerical (quantitative) variables have values that represent quantities.

Two types of quantitative variables:

✓ Continuous (something that can be measured) – has an infinite number of possible

➢ Population – All people or things being considered in a particular situation.

Example of the population:

✓ A parameter is a numerical summary of a population. Parameter is a numerical measure that

Examples of the sample:

✓ A statistic is a numerical summary of a sample. Statistic is a numerical measure that

✓ A parameter goes with a population and a statistic goes with a sample.
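The distinction between a parameter and a statistic can be illustrated with a small sketch. The population below (simulated weights of 300 students) and the sample size of 30 are hypothetical values chosen only for illustration; they do not come from the original notes.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated weights (kg) of 300 students.
population = [random.gauss(60, 8) for _ in range(300)]

# Parameter: a numerical summary computed from the whole population.
population_mean = mean(population)

# Statistic: the same summary computed from a sample of 30 students.
sample = random.sample(population, 30)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not equal to, the population mean, which is why a statistic is treated as an estimate of the corresponding parameter.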

Sample: Students leaving a section of calculus at MCCC.

Why use a sample instead of a population?

Primary & Secondary Data:

Table: Frequency distribution table for the weight of 50 students

A frequency distribution shows us a summarized grouping of data divided into mutually

The frequency distribution is a representation, either in a graphical or tabular format that

Cumulative frequency is defined as a running total of frequencies. The frequency of an element

Cumulative frequency analysis is the analysis of the frequency of occurrence of values of a
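As a minimal sketch of cumulative frequency as a running total, the example below accumulates class frequencies with `itertools.accumulate`. The class intervals and frequencies are hypothetical (they merely sum to 50, like the table of student weights mentioned earlier) and are not taken from the original table.

```python
from itertools import accumulate

# Hypothetical class intervals (kg) and frequencies for 50 students.
classes = ["40-49", "50-59", "60-69", "70-79", "80-89"]
frequencies = [5, 12, 18, 10, 5]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:>6} {f:>3} {cf:>3}")
```

The last cumulative frequency always equals the total number of observations.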

Consider the total number of students is 300.

A woman has to buy in total 150

Toys 1000 − 2000, furniture 3000 −

Please, visit the following links for more details.

For $n$ classes, the arithmetic mean can be estimated as

For $n$ classes, the geometric mean can be estimated as

For $n$ classes, the harmonic mean can be estimated as

The mode of the frequency distribution can be estimated as

The median of the frequency distribution can be estimated as

$\sum_{i=1}^{n} f_i \,|x_i - AM|$

$\sum_{i=1}^{n} f_i \,|x_i - Mo|$

$\sum_{i=1}^{n} f_i \,|x_i - Me|$

Co-efficient of Mean Deviation is

Variance and Standard Deviation of the statistical data are

$\sum_{i=1}^{n} f_i (x_i - AM)^2 \qquad \sum_{i=1}^{n} f_i (x_i - \bar{x})^2$

$\sum_{i=1}^{n} f_i (x_i - AM)^2 \qquad \sum_{i=1}^{n} f_i (x_i - \bar{x})^2$

So, $x_i - \bar{x} = h(u_i - \bar{u})$

Then, we can write

$\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 \qquad \sum_{i=1}^{n} f_i h^2 (u_i - \bar{u})^2 \qquad \sum_{i=1}^{n} f_i (u_i - \bar{u})^2$
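The effect of the substitution on the standard deviation can be checked numerically. The sketch below assumes the coding $u_i = (x_i - A)/h$ (which is what $x_i - \bar{x} = h(u_i - \bar{u})$ implies) and a divisor of $\sum f_i$; the data, the origin `A`, and the width `h` are hypothetical.

```python
from math import sqrt, isclose

# Hypothetical grouped data: class mid-values x_i and frequencies f_i.
x = [45, 55, 65, 75, 85]
f = [5, 12, 18, 10, 5]

A, h = 65, 10                    # assumed origin and class width
u = [(xi - A) / h for xi in x]   # coded values u_i = (x_i - A) / h


def weighted_sd(values, freqs):
    """Standard deviation of frequency-weighted values (divisor: sum of f_i)."""
    n = sum(freqs)
    m = sum(fi * vi for fi, vi in zip(freqs, values)) / n
    return sqrt(sum(fi * (vi - m) ** 2 for fi, vi in zip(freqs, values)) / n)


sd_x = weighted_sd(x, f)
sd_u = weighted_sd(u, f)

# The shift A drops out; only the scale h survives.
print(sd_x, h * sd_u)
```

Under these assumptions the two printed values agree up to floating-point rounding, which is why the coded values can be used as a shortcut for hand calculation.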

Now, the working formula for Standard Deviation is

Co-efficient of Standard Deviation/Variation is

The $r$-th raw moment about an arbitrary point $A$ is

Then, we can write

$\sum_{i=1}^{n} f_i h^r u_i^r \qquad \sum_{i=1}^{n} f_i u_i^r$

The $r$-th central moment about the arithmetic mean $\bar{x}$ is

Estimation of the central moments from the raw moments
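The standard relations between raw moments about a point $A$ and central moments can be checked with a short sketch. The notation `m1` to `m4` for the raw moments, the data, and the choice `A = 60` are assumptions made for this illustration; the moments are taken with divisor $N = \sum f_i$.

```python
from math import isclose

# Hypothetical grouped data: class mid-values and frequencies.
x = [45, 55, 65, 75, 85]
f = [5, 12, 18, 10, 5]
N = sum(f)
A = 60  # arbitrary point (assumed)


def raw_moment(r):
    """r-th raw moment about the arbitrary point A."""
    return sum(fi * (xi - A) ** r for fi, xi in zip(f, x)) / N


m1, m2, m3, m4 = (raw_moment(r) for r in (1, 2, 3, 4))

# Central moments expressed through the raw moments.
mu2 = m2 - m1**2
mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4

mean = A + m1  # the arithmetic mean recovered from the first raw moment


def central_moment(r):
    """r-th central moment computed directly about the mean."""
    return sum(fi * (xi - mean) ** r for fi, xi in zip(f, x)) / N


print(mu2, mu3, mu4)
```

Comparing `mu2`, `mu3`, `mu4` with `central_moment(2)`, `central_moment(3)`, `central_moment(4)` confirms that both routes give the same values up to rounding.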

If $\gamma_3 < 0$, the data set is called negatively skewed.

If $\gamma_4 < 3$, the data set is called platykurtic (flattened).
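These rules can be sketched in code. The example assumes the moment-ratio definitions $\gamma_3 = \mu_3 / \mu_2^{3/2}$ and $\gamma_4 = \mu_4 / \mu_2^{2}$, which are consistent with the thresholds 0 and 3 but are not confirmed by the visible text; the function names and data are hypothetical.

```python
def central_moment(x, f, r):
    """r-th central moment of grouped data (divisor: sum of frequencies)."""
    n = sum(f)
    mean = sum(fi * xi for fi, xi in zip(f, x)) / n
    return sum(fi * (xi - mean) ** r for fi, xi in zip(f, x)) / n


def shape(x, f):
    """Return (gamma3, gamma4, skewness label, kurtosis label)."""
    mu2 = central_moment(x, f, 2)
    gamma3 = central_moment(x, f, 3) / mu2**1.5  # assumed definition
    gamma4 = central_moment(x, f, 4) / mu2**2    # assumed definition
    if gamma3 < 0:
        skew = "negatively skewed"
    elif gamma3 > 0:
        skew = "positively skewed"
    else:
        skew = "symmetric"
    if gamma4 < 3:
        kurt = "platykurtic"
    elif gamma4 > 3:
        kurt = "leptokurtic"
    else:
        kurt = "mesokurtic"
    return gamma3, gamma4, skew, kurt


# Hypothetical data with a long left tail, and a flat (uniform) data set.
print(shape([1, 2, 3, 4, 5], [1, 2, 3, 6, 8]))
print(shape([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))
```

In practice exact equality with 0 or 3 is rare for real data, so the middle labels mostly serve as reference points.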

Regression Analysis: Regression analysis involves identifying the relationship between a

Comparison between Correlation and Regression:

Basis Correlation Regression

A statistical measure that defines Describes how an independent

Dependent and Independent

To fit the best line and estimate

To estimate the values of a

$\sum (x - \bar{x})(y - \bar{y})$

$\sum (x - \bar{x})(y - \bar{y}) \qquad hk \sum (u - \bar{u})(v - \bar{v}) \qquad \sum (u - \bar{u})(v - \bar{v})$

So, the correlation coefficient is independent of shift (change of origin) and scale.
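This invariance can be verified numerically. The sketch below assumes the codings $u = (x - a)/h$ and $v = (y - b)/k$ with $h, k > 0$; the data and the constants `a`, `b`, `h`, `k` are hypothetical.

```python
from math import sqrt, isclose


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations.
x = [10, 20, 30, 40, 50]
y = [12, 25, 29, 48, 51]

a, h = 30, 10  # assumed shift and scale for x
b, k = 25, 5   # assumed shift and scale for y
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(r_xy, r_uv)
```

If $h$ and $k$ had opposite signs, the magnitude of the coefficient would be unchanged but its sign would flip, so the positivity assumption matters.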

There are two regression coefficients

$\sum (x - \bar{x})(y - \bar{y})$

$\sum (x - \bar{x})(y - \bar{y})$

Again, for a shifted and scaled data set

$\sum (x - \bar{x})(y - \bar{y}) \qquad hk \sum (u - \bar{u})(v - \bar{v}) \qquad k \sum (u - \bar{u})(v - \bar{v}) \qquad k$
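Unlike the correlation coefficient, a regression coefficient does depend on the scales. A minimal sketch, assuming the codings $u = (x - a)/h$ and $v = (y - b)/k$ and the usual definition $b_{yx} = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$, checks the relation $b_{yx} = (k/h)\, b_{vu}$; the data and constants are hypothetical.

```python
from math import isclose


def slope(xs, ys):
    """Regression coefficient of ys on xs (slope of the fitted line)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(xs, ys))
    sxx = sum((p - mx) ** 2 for p in xs)
    return sxy / sxx


# Hypothetical paired observations.
x = [10, 20, 30, 40, 50]
y = [12, 25, 29, 48, 51]

a, h = 30, 10  # assumed shift and scale for x
b, k = 25, 5   # assumed shift and scale for y
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

b_yx = slope(x, y)  # regression coefficient of y on x
b_vu = slope(u, v)  # regression coefficient of v on u

print(b_yx, (k / h) * b_vu)
```

The shifts `a` and `b` cancel out, so the regression coefficient is independent of the change of origin but not of the change of scale.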

The Regression lines can be formed as given below.

Regression line of 𝑦 on 𝑥 is Regression line of 𝑥 on 𝑦 is

$\sum (x - \bar{x})(y - \bar{y}) \qquad \sum (x - \bar{x})(y - \bar{y}) \qquad \left(\sum (x - \bar{x})(y - \bar{y})\right)^2$

The Rank Correlation coefficient
