STATISTICAL INFERENCE 1 LECTURE NOTES.
Recall that the purpose of descriptive statistics is to make the collected data more easily
comprehensible and understandable. Some tools we examined in descriptive statistics include
frequency distributions, measures of central tendency, and measures of dispersion, among others.
Because it is not always possible to address every member of the population, we take samples. The
statistical question that needs to be answered is whether or not the characteristics observed in the
sample are likely to reflect the true characteristics of the larger population from which the sample
was taken. Inferential statistics provide us with the tools we need to answer this question.
Inference refers to reaching a conclusion based on available information, knowledge, or facts.
Statistical inference refers to the act of generalizing from a sample to a population with a
calculated degree of certainty. The aim of statistical inference is to make determinations about
the unknown constants, known as parameters, of the underlying distribution. In other words,
statistical inference aims at learning characteristics of the population from a sample.
Illustration
We want to learn about population parameters, but we can only calculate sample statistics. What do
we do?
[Diagram: draw a sample from the population, collect data and compute statistics, then use those
statistics to infer the population parameter.]
Statistical inference is divided into estimation and hypothesis testing. Estimation covers the
procedures used to estimate population parameters from sample data, while hypothesis testing
involves determining whether the estimated population parameters are realistic or not.
BASIC CONCEPTS IN INFERENTIAL STATISTICS:
1. Population: refers to all elements of interest. It is the totality of observations with which a
statistician is concerned.
2. Census: This is when every member or unit in the population is surveyed.
3. Sample: This is a subset of the population.
4. Sampling units: These are the people/items to be sampled.
5. Sampling frame: This is a list of the sampling units.
6. Parameter: This is a numerical value describing the characteristic of the population. It is
usually assumed to be fixed but unknown. Examples of parameters include the population
mean (μ) and the population standard deviation (σ).
7. Statistic: This is a numerical value describing a characteristic of a sample. A statistic
   estimates a parameter and changes with each new sample. Examples of sample statistics: the
   sample mean (x̄) and the sample variance (s²).
Symbols used to denote parameters and statistics.
Population parameter                              Sample statistic
N: Number of observations in the population       n: Number of observations in the sample
Nᵢ: Number of observations in population i        nᵢ: Number of observations in sample i
Π or P: Proportion of successes in population     p: Proportion of successes in sample
μ: Population mean                                x̄: Sample mean
σ: Population standard deviation                  s: Sample standard deviation
σ²: Population variance                           s²: Sample variance
8. Variable: This refers to the characteristic being measured and can be described as either
   qualitative or quantitative.
9. Sampling: This is the process of obtaining a sample from a population. A sample is usually
taken to make useful inferences about the population. The sample must be representative of
the whole population. However, there are errors involved, i.e. sampling errors and non-
sampling errors. Sampling errors are errors introduced into the estimate of a parameter by
studying a sample rather than the whole population, e.g. selecting a wrong or biased sample. A
biased sample is one which consistently over- or under-estimates some or all of the characteristics
of the population. Other sources of sampling error include a wrong choice of sampling unit, the
lack of a good sampling frame, etc. Non-sampling errors are errors introduced into the estimate
by incorrect information, such as communication errors, transcription errors at data entry,
ignorance on the part of respondents, deliberate false responses, etc.
10. Sampling methods: Once a sampling frame has been established, you can choose a method
of sampling. There are two categories of this; namely- Random/Probability sampling methods
and Non-random/ Non-probability sampling methods.
11. Dependent samples: These are samples in which the values in one sample affect the values
in another sample. Such samples are paired/matched measurements for one set of items.
12. Independent samples: These are samples in which the values in one sample do not affect the
    values in another sample, i.e. the occurrence of one sample does not affect the occurrence of
    another sample.
SAMPLING DISTRIBUTIONS
Sampling distributions are probability distributions of statistics. In general, the sampling distribution
of a given statistic is the probability distribution of the values taken by the statistic in all possible
samples of the same size from the same population.
In other words, if we repeatedly collect samples of the same sample size from the population, compute
the statistics (mean, standard deviation, proportion), and then draw a graph (histogram)/frequency
distribution table of those statistics, the distribution of that histogram/table is called the sampling
distribution of the statistics (mean, standard deviation, proportion).
Steps in generating a sampling distribution.
• Choose a population and sample for this experiment.
• Select a sample randomly out of the given population.
• Calculate the sample statistic.
• Follow the above steps for obtaining a number of similar statistics out of the same
population.
• Generate a frequency distribution: Plot the statistics on a graph or tabulate the data. The
final graph or table will represent your sampling distribution.
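These steps can be turned into a short simulation (an illustrative Python sketch; the population values, sample size, and number of repetitions are arbitrary choices, not from the notes):

```python
import random
from collections import Counter

random.seed(42)  # for reproducibility

# A hypothetical finite population (e.g. heights in feet)
population = [5.2, 5.5, 5.8, 6.0, 6.2]
n = 2             # sample size
num_samples = 1000

# Repeatedly draw samples of size n (with replacement) and record each sample mean
sample_means = []
for _ in range(num_samples):
    sample = [random.choice(population) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Tabulate the statistics: this frequency table approximates the
# sampling distribution of the sample mean
sampling_distribution = Counter(round(m, 2) for m in sample_means)
```

The more samples are drawn, the better the frequency table approximates the true sampling distribution.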
Significance of Sampling Distributions
The primary purpose of a sampling distribution is to let results from comparatively small
samples represent a much larger population. It allows researchers and analysts to study the
population through manageable samples and to create generalized results based on them. The
significance of sampling distributions in statistics is considerable.
• Firstly, the sampling distribution underpins accuracy. Because it describes the values a
  statistic takes over all possible samples of a given size, it tells us how close a statistic
  computed from a single sample is likely to be to the population parameter.
• Secondly, the repeated collection of samples from the same population leads to consistency.
  Moreover, the standard error allows a researcher to quantify the sample-to-sample deviation
  of the statistic and thus to assess whether the estimator is unbiased.
• Thirdly, the variability of the sampling distribution is significant because it summarizes
  the spread across the numerous possible samples; for large samples the distribution becomes
  almost symmetric.
Sampling distribution of the sample mean
The sampling distribution of the sample mean focuses on calculating the means of all possible
samples of sample size n which are then arranged to form a probability distribution of the sample
mean. When the average of every sample is put together, the mean and variance of the sampling
distribution is calculated which reflects the nature of the whole population.
Illustration
Imagine a population with 3 members. Let us select 6 random samples (with replacement) each of
sample size 2 and note down the height for each member of the sample. The sampling distribution
of the sample mean height can be obtained as below,
Step 1. Draw the 6 samples, each of size 2 and calculate the mean height for each.
Sample   Members of the sample   Sample average height
1        6.2 ft, 5.2 ft          5.7 ft
2        5.5 ft, 6.2 ft          5.85 ft
3        6.2 ft, 6.2 ft          6.2 ft
4        5.5 ft, 5.5 ft          5.5 ft
5        5.2 ft, 6.2 ft          5.7 ft
6        5.2 ft, 6.2 ft          5.7 ft
Step 2: Construct the sampling distribution of the sample mean
Sample average height (ft)   5.7   5.85   5.5   6.2
Frequency                    3     1      1     1
Probability                  3/6   1/6    1/6   1/6
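The two steps above can be reproduced with a few lines of code (a sketch; the six samples are taken directly from the table):

```python
from collections import Counter

# The six samples of size 2 from the illustration
samples = [
    (6.2, 5.2), (5.5, 6.2), (6.2, 6.2),
    (5.5, 5.5), (5.2, 6.2), (5.2, 6.2),
]

# Step 1: mean height of each sample
means = [round(sum(s) / len(s), 2) for s in samples]

# Step 2: frequency and probability of each sample mean
freq = Counter(means)                                  # e.g. 5.7 occurs 3 times
prob = {m: f / len(samples) for m, f in freq.items()}  # e.g. P(5.7) = 3/6
```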
Mean and Variance of the Sampling distribution of the sample mean.
For samples of any size n drawn from a given population, the sample mean x̄ will be distributed
with mean μ_x̄ = μ and variance σ²/n, where n is the sample size.
Proof
Suppose you draw n random independent observations x₁, x₂, x₃, …, xₙ from a normally
distributed population with mean μ and variance σ².
The mean is given as
x̄ = (x₁ + x₂ + x₃ + ⋯ + xₙ)/n
E(x̄) = E[(x₁ + x₂ + x₃ + ⋯ + xₙ)/n]
But E(aX) = aE(X), so
E(x̄) = (1/n)E[x₁] + (1/n)E[x₂] + (1/n)E[x₃] + ⋯ + (1/n)E[xₙ]
But E(x₁) = E(x₂) = E(x₃) = ⋯ = E(xₙ) = μ, so
E(x̄) = (1/n)(μ + μ + μ + ⋯ + μ) = (1/n)(nμ)
E(x̄) = μ, which is the mean of the sampling distribution of the sample mean.
The variance is given as
Var(x̄) = Var((x₁ + x₂ + x₃ + ⋯ + xₙ)/n)
By independence,
Var(x̄) = Var(x₁/n) + Var(x₂/n) + Var(x₃/n) + ⋯ + Var(xₙ/n)
But Var(aX) = a²Var(X), so
Var(x̄) = (1/n²)Var(x₁) + (1/n²)Var(x₂) + (1/n²)Var(x₃) + ⋯ + (1/n²)Var(xₙ)
But Var(x₁) = Var(x₂) = Var(x₃) = ⋯ = Var(xₙ) = σ², so
Var(x̄) = σ²/n² + σ²/n² + σ²/n² + ⋯ + σ²/n² = n(σ²/n²)
Var(x̄) = σ²/n, which is the variance of the sampling distribution of the sample mean.
And the standard error of the mean (SEM) = √(Var(x̄)) = σ/√n.
Generally, if x̄ = (∑ᵢ₌₁ⁿ xᵢ)/n is the sample mean of a random sample of size n drawn from a given
population having mean μ and standard deviation σ, then x̄ follows a distribution with mean μ and
standard deviation σ/√n. For example, if xᵢ ∼ N(μ, σ), then x̄ ∼ N(μ, σ/√n).
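This result is easy to check empirically (a sketch; the population N(10, 3) and the sample size n = 25 are made-up values for the demonstration):

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 10.0, 3.0, 25
num_samples = 20_000

# Draw many samples of size n and record each sample mean
means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(num_samples)
]

# Empirical mean and standard error of the sampling distribution
emp_mean = statistics.fmean(means)
emp_se = statistics.stdev(means)
theory_se = sigma / n ** 0.5   # sigma / sqrt(n) = 0.6
```

The empirical mean of the sample means lands near μ = 10, and their standard deviation near σ/√n = 0.6, as the formulas predict.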
Normal Population
A normal distribution is one in which the values are distributed symmetrically above and below
the mean, so that the mean, mode, and median are all equal. For example, in the population
3, 4, 5, 5, 5, 6, 7, the mean, mode, and median are all 5, which is consistent with a normal
(symmetric, bell-shaped) distribution.
A normal distribution can also be defined as a probability distribution that is symmetric about the
mean, showing that data near the mean are more frequent in occurrence than data far from the
mean. In graph form, normal distribution will appear as a bell curve.
Properties of a normal distribution.
• The mean, mode and median are all equal.
• The curve is symmetric at the center (i.e. around the mean, μ).
• Exactly half of the values are to the left of center and exactly half the values are to the right.
• The total area under the curve is 1.
• Unimodal (one mode)
• Asymptotic to the x-axis, i.e. as the distance from the mean increases, the curve approaches
  the horizontal axis ever more closely without touching it.
CENTRAL LIMIT THEOREM AND THE LAW OF LARGE NUMBERS
Central limit theorem states that;
If a random sample of size n is selected from any population with mean μ and standard deviation
σ, then x̄ is approximately N(μ, σ/√n) when n is sufficiently large.
i.e.
If x̄ is the mean of a random sample of size n taken from a population with mean μ and finite
variance σ², then z = (x̄ − μ)/(σ/√n) → N(0, 1) as n → ∞, where Z denotes a standard normal
random variable.
NOTE. The Central Limit Theorem is important because, for reasonably large sample size, it
allows us to make an approximate probability statement concerning the sample mean, without
knowledge of the shape of the population distribution.
• Again, one of the essential assumptions is a random sample.
• The distribution of x̄ is approximately normal even if the random sample comes from a
  population other than normal.
• How large a sample size? Usually, it would be safe to apply the CLT if n ≥ 30. It also
depends on the population distribution, however. More observations are required if the
population distribution is far from normal.
The Law of Large Numbers states that;
As the sample size increases, the sample mean gets closer to the mean of the whole population.
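Both theorems can be illustrated with a strongly skewed (exponential) population, which is far from normal (a sketch; the rate 0.5, the sample sizes, and the repetition counts are arbitrary choices):

```python
import random
import statistics

random.seed(7)

# Exponential population with mean 2 (rate 0.5): strongly right-skewed
rate, pop_mean = 0.5, 2.0

def sample_mean(n):
    return statistics.fmean(random.expovariate(rate) for _ in range(n))

# Law of large numbers: a single sample mean approaches pop_mean as n grows
lln = [sample_mean(n) for n in (10, 1000, 100_000)]

# Central limit theorem: for n = 40, the standardized sample mean is roughly
# standard normal, so about 95% of its values fall within +/- 1.96
n, reps = 40, 5000
se = pop_mean / n ** 0.5           # sigma = mean for an exponential
z = [(sample_mean(n) - pop_mean) / se for _ in range(reps)]
coverage = sum(abs(v) < 1.96 for v in z) / reps
```

Even though the population is far from normal, the standardized sample mean already behaves approximately like N(0, 1) at n = 40.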
Example 1:
The average male drinks 2 L of water when active outdoors with a standard deviation of 0.7 L.
You are planning a full day nature trip for 50 men and will bring 110 L of water.
Required;
i) What is the mean and standard deviation of the sampling distribution of sample mean?
ii) State the sampling distribution of the sample mean.
iii) What is the probability that you will run out of water?
Example 2.
An auto-maker does quality control tests on the paint thickness at different points on its car parts
since there is some variability in the painting process. A certain part has a target thickness of 2mm.
The distribution of thicknesses on this part is skewed to the right with a mean of 2mm and a
standard deviation of 0.5mm. A quality control check on this part involves taking a random sample
of 100 points and calculating the mean thickness of those points.
Required;
i) What is the shape of the sampling distribution of the sample mean thickness?
ii) Find the mean and standard deviation of the sampling distribution of the sample mean.
iii) Assuming the stated mean and standard deviation of the thicknesses are correct, what is
the probability that the mean thickness in the sample of 100 points is within 0.1mm of the
target value?
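Both examples reduce to a probability statement about x̄ via the CLT. A sketch of the computations in Python (the standard normal CDF Φ is built from math.erf, so no extra libraries are needed):

```python
import math

def phi(z):
    """Standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Example 1: mu = 2 L, sigma = 0.7 L, n = 50 men, supply = 110 L
n, mu, sigma = 50, 2.0, 0.7
se = sigma / math.sqrt(n)                 # standard error of the mean
# Running out means mean consumption exceeds 110 / 50 = 2.2 L per man
p_run_out = 1 - phi((110 / n - mu) / se)

# Example 2: mu = 2 mm, sigma = 0.5 mm, n = 100 points, within 0.1 mm
se2 = 0.5 / math.sqrt(100)                # = 0.05 mm
p_within = phi(0.1 / se2) - phi(-0.1 / se2)
```

So there is roughly a 2% chance of running out of water, and about a 95% chance that the mean paint thickness falls within 0.1 mm of the target.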
Student’s t distribution/t distribution.
We have learnt that z = (x̄ − μ)/(σ/√n) (exactly or approximately) follows the standard normal
distribution, where the data are from a random sample of size n from the population with mean μ
and standard deviation σ. And, it is very likely that both μ and σ are unknown parameters. In
practice, it suffices that the distribution is symmetric and single-peaked unless the sample is
very small. Since most of the simple work in statistical inference focuses on the unknown
population mean μ, we will need to deal with the unknown σ, especially when n is not large
(n < 30). It is quite intuitive and natural to estimate the unknown population standard
deviation σ using the sample standard deviation, s.
We have another statistic t = (x̄ − μ)/(s/√n) instead of z = (x̄ − μ)/(σ/√n).
Let x₁, x₂, x₃, …, xₙ be independent random variables that are all normal with mean μ and
standard deviation σ. Let x̄ = (∑ᵢ₌₁ⁿ xᵢ)/n and s² = ∑ᵢ₌₁ⁿ(xᵢ − x̄)²/(n − 1). Then the random
variable t = (x̄ − μ)/(s/√n) has a t-distribution with ν = n − 1 degrees of freedom.
NOTE. When n is very large, s is a very good estimate of σ, and the corresponding t distributions
are very close to the normal distribution. The t distributions become wider for smaller sample sizes,
reflecting the lack of precision in estimating σ from s.
In statistics, the number of degrees of freedom is the number of values in the final calculation of
a statistic that are free to vary.
It can be seen that the t-distributions have slightly greater variability than the standard normal
distribution. Also, as degrees of freedom increase, the t-distribution curve gets closer to the
standard normal curve.
Properties of the Student t distribution
• The t distribution is different for different sample sizes, or different degrees of freedom.
• The t distribution has the same general symmetric bell shape as the standard normal distribution,
but it reflects the greater variability (with wider distributions) that is expected with small samples.
• The t distribution has a mean of 0.
• The standard deviation of the t distribution varies with the sample size, but it is greater than 1.
• As the sample size n gets larger, the t distribution gets closer to the standard normal distribution.
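The heavier tails of t can be seen by simulating both statistics from the same small samples (a sketch; n = 5 and the standard normal population are arbitrary choices):

```python
import random
import statistics

random.seed(3)
n, reps = 5, 20_000
mu, sigma = 0.0, 1.0

t_vals, z_vals = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(x)
    s = statistics.stdev(x)                 # s estimates sigma
    t_vals.append((xbar - mu) / (s / n ** 0.5))
    z_vals.append((xbar - mu) / (sigma / n ** 0.5))

# The t statistic exceeds |2| noticeably more often than z: heavier tails
tail_t = sum(abs(t) > 2 for t in t_vals) / reps
tail_z = sum(abs(z) > 2 for z in z_vals) / reps
```

With ν = 4 degrees of freedom, roughly 12% of t values fall beyond ±2, versus about 4.6% for z, which is exactly the extra variability the t distribution accounts for.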
Sampling distribution of sample proportions (p-hat)
The probability distribution of the values of the sample proportions (p-hat) in
repeated samples of the same size is called the sampling distribution of p-hat.
If the population has a proportion p of subjects with the characteristic of interest, then random
samples of the same size drawn from the population will have sample proportions close to p. More
specifically, the sampling distribution of the sample proportions will have a mean of p.
Consider a sample proportion p̂ = X/n, where X is the number of subjects in the sample with the
characteristic of interest and n is the sample size. X is a binomial random variable with
parameters n and p.
The binomial random variable X has;
• Mean = np
• Variance = np(1 − p)
• An approximately normal distribution for large sample sizes.
The sampling distribution of the sample proportion p̂ has mean p and variance p(1 − p)/n.
Proof;
Mean
E(p̂) = E(X/n)
E(p̂) = (1/n)E(X)
E(p̂) = (1/n)(np)
E(p̂) = p
Variance
Var(p̂) = Var(X/n)
Var(p̂) = (1/n²)Var(X)
Var(p̂) = (1/n²)(np(1 − p))
Var(p̂) = p(1 − p)/n
Standard deviation = √(Var(p̂)) = √(p(1 − p)/n)
Note:
• The sample size required to achieve approximate normality depends on the value of p, i.e.;
if p is close to 0.5, the sample size doesn’t need to be very large. Whereas if p is close to 0
or 1, a much larger sample size is required.
• Since the sample size n appears in the denominator of the square root, the standard
  deviation decreases as the sample size increases. Finally, the shape of the distribution of
  p-hat will be approximately normal as long as the sample size n is large enough. The
  convention is to require both np and n(1 − p) to be at least 5.
• We standardize using Z = (p̂ − p)/√(p(1 − p)/n)
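A quick simulation confirms the mean p and standard deviation √(p(1 − p)/n) (a sketch; p = 0.3 and n = 50 are arbitrary values):

```python
import random
import statistics

random.seed(11)
p, n, reps = 0.3, 50, 20_000

# Each p-hat is the proportion of successes in a sample of size n
p_hats = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(reps)
]

emp_mean = statistics.fmean(p_hats)
emp_sd = statistics.stdev(p_hats)
theory_sd = (p * (1 - p) / n) ** 0.5   # sqrt(0.3 * 0.7 / 50), about 0.065
```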
Example.
A random sample of 100 students is taken from the population of all part-time students in the
United States, for which the overall proportion of females is 0.6.
(a) There is a 95% chance that the sample proportion (p-hat) falls between what two values?
(b) What is the probability that sample proportion p-hat is less than or equal to 0.56?
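A sketch of the computations (the normal approximation applies since np = 60 and n(1 − p) = 40 are both well above 5; Φ is built from math.erf):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.6, 100
sd = math.sqrt(p * (1 - p) / n)     # about 0.049

# (a) central 95% interval for p-hat: p +/- 1.96 * sd
low, high = p - 1.96 * sd, p + 1.96 * sd

# (b) P(p-hat <= 0.56)
prob = phi((0.56 - p) / sd)
```

So p-hat falls between roughly 0.504 and 0.696 with 95% probability, and P(p-hat ≤ 0.56) is about 0.21.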
ESTIMATION
Estimation refers to the process of using numerous procedures to estimate the population
parameters using sample data. The formula/rule/procedure that is used to calculate an estimate of
a population parameter is called an Estimator while the numerical value that is used to estimate
the population parameter is called an Estimate.
An estimate of a population parameter may be expressed in two ways:
1. Point estimate. A point estimate of a population parameter is a single value used to estimate
the population parameter. For example, the sample mean x̄ is a point estimate of the population
mean μ. Similarly, the sample proportion p̂ is a point estimate of the population proportion p.
2. Interval estimate. An interval estimate refers to the range of values within which a population
parameter is said to lie. For example, a < μ < b is an interval estimate of the population mean μ.
POINT ESTIMATION
This is when a single sample statistic is taken as the estimate of the unknown population parameter.
Example:
A random sample of 10 students was drawn from a statistics class of 100 students and their ages
were found to be as follows: 21,20,21,22,23,25,25,25,22,22. Find the point estimates for the
population mean and variance of the students’ ages in the class.
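The point estimates can be computed directly (a sketch; note that statistics.variance uses the n − 1 divisor, i.e. the sample variance s²):

```python
import statistics

ages = [21, 20, 21, 22, 23, 25, 25, 25, 22, 22]

# Point estimate of the population mean: the sample mean
mean_hat = statistics.fmean(ages)        # 22.6

# Point estimate of the population variance: the sample variance s^2,
# which divides by n - 1 rather than n
var_hat = statistics.variance(ages)      # 30.4 / 9, about 3.38
```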
Methods Used in Point Estimation
Suppose that we have a random sample from a population of interest and a theoretical model for
the way that population is distributed. There may, however, be several population parameters
whose values we do not know. We can estimate such parameters using various methods, including
the method of moments and maximum likelihood estimation.
Method of Moments
This method involves equating sample moments with theoretical moments.
So, let us start by making sure we learn the definitions of theoretical moments as well as sample
moments.
1. E(xᵏ) is the kth theoretical moment of the distribution about the origin, for k = 1, 2, …
2. E[(x − μ)ᵏ] is the kth theoretical moment about the mean, for k = 1, 2, …
3. mₖ = (1/n)∑ᵢ₌₁ⁿ xᵢᵏ is the kth sample moment, for k = 1, 2, …
4. mₖ* = (1/n)∑ᵢ₌₁ⁿ(xᵢ − x̄)ᵏ is the kth sample moment about the mean, for k = 1, 2, …
Procedure (using theoretical moments about the origin)
• Equate the first sample moment about the origin to the first theoretical moment, i.e.
  m₁ = E(x).
• Equate the second sample moment about the origin to the second theoretical moment, i.e.
  m₂ = E(x²).
• Continue equating sample moments about the origin, mₖ, with the corresponding theoretical
  moments E(xᵏ) for k = 3, 4, … until you have as many equations as you have parameters.
• Solve for the parameters.
• The resulting values are called the method of moments estimates.
Example 1
Let x₁, x₂, x₃, …, xₙ be Bernoulli random variables with parameter p. What is the MoM estimator
of p?
Example 2
Let x₁, x₂, x₃, …, xₙ be normal random variables with mean μ and variance σ². What are the MoM
estimators of the mean and variance?
Example 3.
Four losses are observed from a Gamma distribution. The observed losses are 200, 300, 350, and
450. Find the method of moments estimate for α and β.
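A sketch of the method of moments computation for Example 3, assuming the shape–scale parametrization of the Gamma distribution with E(X) = αβ and Var(X) = αβ²:

```python
losses = [200, 300, 350, 450]
n = len(losses)

# First and second sample moments about the origin
m1 = sum(losses) / n                      # 325.0
m2 = sum(x * x for x in losses) / n       # 113750.0

# Match E(X) = alpha * beta to m1 and
# Var(X) = E(X^2) - E(X)^2 = alpha * beta^2 to m2 - m1^2
var = m2 - m1 ** 2                        # 8125.0
beta_hat = var / m1                       # 25.0
alpha_hat = m1 / beta_hat                 # 13.0
```

Dividing the two matched equations eliminates α, giving β = Var/mean and then α = mean/β.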
Example 4.
The random variable X has the density function f(x) = (2/θ²)(θ − x), 0 < x < θ. A random sample
of two observations of X yields the values 0.50 and 0.90. Determine the method of moments
estimate of θ.