Notes on Probability and Statistics
Prepared By Sudip Khadka
CHAPTER ONE
INTRODUCTION AND DESCRIPTIVE STATISTICS
1.1. Statistics: The word statistics is derived from the German word "Statistik", which means the political state.
Statistics is the science that deals with the collection, presentation (classification, tabulation, diagrams
and graphs), analysis and interpretation of numerical data.
Function/Application of statistics
The major functions of statistics are elaborated in subsequent sections.
a) Statistics helps to collect data while avoiding biases: Data are the fundamental elements of
statistics and they need to be collected. Collecting data sounds simple but may be the most difficult step,
particularly when we want to avoid biases in the process of collecting data. There are numerous
scientific sampling methods in statistics that avoid biases while collecting data. Findings based on
data that are free of biases can be used to estimate parameters, and such estimates gain public credibility.
b) Statistics provides techniques for organizing data scientifically: Collected data are in raw form.
They need to be organized scientifically. Statistics provides techniques that help us to organize data
scientifically. Nowadays it is customary to organize data on computers. Organization or management of data
is very essential in modern research, since well organized data eases the work of data analysis and
helps in drawing the desired information very quickly.
c) Statistics enables us to draw findings from masses of data: One of the main objectives of data analysis is to
draw the desired findings from masses of data. There are statistical methods and techniques that help us to draw
such findings from a large data set. The findings can be used in a variety of situations, such as
comparing the performance of two or more business companies, formulating policies, and making
decisions.
d) Statistics helps to make decisions under uncertainty: One major topic of inferential statistics is hypothesis
testing, which enables us to make decisions under uncertainty with a desired amount of risk.
e) Statistics enables us to interpret data: The findings derived from a data set may need to be interpreted so that
people from various walks of life can understand their meaning. Statistics provides the knowledge
for interpreting the findings.
f) Statistics enhances presentation of findings effectively: Presentation of numerical facts to the public is as
important as finding facts from the data. Statistics provides various methods, such as tables, graphs,
summary measures, statistical models etc., for presenting the numerical findings.
Limitations of statistics
Like in many other disciplines, statistics has certain limitations. Some of them are listed below.
i. Statistics works on the aggregate level only: Statistical findings are usually interpreted in terms of averages,
which may not be true for an individual. For example, the average monthly salary of the
employees of a company may be Rs 40,000, yet no employee in the company may be receiving exactly this
salary.
ii. Statistics is relatively weak in the analysis of categorical data: Categorical data occur frequently
in practice. However, some statistical methods do not work for categorical data; they work
only for quantitative data.
iii. Statistics is likely to be misused and misinterpreted: Statistical methods are often intentionally misused,
in the sense of finding ways to interpret data that are favourable to the presenter. Inappropriate use of the
available tools can produce misleading conclusions. In this context, it is worth mentioning the famous quote
believed to be due to Benjamin Disraeli: there are three types of lies - lies, damned lies, and statistics.
iv. Statistics cannot prove anything: The logic employed in inferential statistics is inductive in nature
(drawing inference from a small part - the sample - to a larger part - the population), which is the opposite of the
deductive logic used in mathematics. An inductive argument, unlike a deductive one, does not prove anything.
v. Statistical results are sometimes distrusted: Conflicting statistical statements sometimes appear in the
literature, particularly in the medical sciences; for example, statements like "doing X reduces high blood
pressure" alongside statements like "doing X actually worsens high blood pressure". This kind of conflict
arises mainly when studies are conducted on different groups with different protocols. However, many
readers may fail to notice these distinctions, or the media may oversimplify this vital contextual
information, and the public's distrust of statistics is thereby increased.
vi. Statistics deals only with averages, and its laws are not exact.
vii. It does not measure qualitative phenomena which cannot be measured in quantity.
Pictorial Representation of Data
a) Pie chart: A pie chart is a circular diagram which is usually used for
depicting the components of a single factor. A pie chart is an angular two-dimensional
diagram in which the areas of the different sectors of a circle represent
items of statistical data. The circle is divided into segments which are in
proportion to the size of the components. To calculate the angle of each
component, we first equate the total of the observations to 360°, i.e. the angle at the
centre of the circle. The angle corresponding to any given value is then

Angle = (360° / Total value) × given value
b) Histogram: A histogram is a graph suitable for a
frequency distribution with continuous classes. The width of
each bar is equal to the class interval and the heights of the bars are
in proportion to the frequencies of the respective classes. For
unequal class intervals the heights are proportional to
the frequency density, the ratio of the frequency to the
corresponding class size:

∴ Frequency density = class frequency / class width
c) Ogive curve: It is a graph plotted for the variate values and their corresponding cumulative frequencies of
a frequency distribution. Variate values are plotted on the x-axis and cumulative frequencies on the y-axis.
1.3. MEASURE OF CENTRAL TENDENCY / MEASURE OF LOCATION
The measure of central tendency is defined as an average. It is a single value, lying within the range where the
data tend to cluster, which represents a group of individual values in a simple and concise manner so that the
mind can get a quick understanding of the general nature of the individuals in the group. Since the value lies
within the range of the data, it is known as a measure of central tendency. The objectives of averages are
i. To facilitate comparison
ii. To know about population from a sample.
iii. To present the salient features of a mass of complex data.
iv. To trace statistical relations.
v. To help in decision making.
Properties/Characteristics of Good/ideal Measure of Central Tendency
The desired properties or requisites of a good measure of central tendency are as follows.
a. It should be based on all the observations of a set of values.
b. It should be rigidly defined and its values should be definite.
c. It should be easily understood and computable.
d. It should be least affected by extreme values.
e. It should fluctuate least from sample to sample drawn from the same population.
f. It should be suitable for further statistical and mathematical treatments
TYPES OF CENTRAL TENDENCIES
1. Arithmetic Average/Arithmetic mean: The arithmetic mean of a set of observations may be defined as the sum
of the observations divided by the total number of values in the set. It is denoted by x̅,

i.e. x̅ = Sum of observations / Number of observations

∴ x̅ = ∑X / n
Calculation of mean
a. For individual series
Let x1, x2, x3, ..., xn be the values of the variable; then the mean is defined as

x̅ = ∑X / n
b. For discrete series and continuous series
For discrete and continuous data, also known as frequency data, let X1, X2, X3, ..., Xn be n
observations with corresponding frequencies f1, f2, ..., fn. Then the arithmetic mean, or simply mean,
is defined as:

x̅ = ∑fX / N

where N = ∑f is the total frequency.
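The two formulas above can be checked with a minimal Python sketch (plain Python, no libraries; the data values below are made up for illustration):

```python
# Mean of an individual series and of a frequency series; illustrative data.

def mean_raw(values):
    """x-bar = sum(X) / n for an individual series."""
    return sum(values) / len(values)

def mean_frequency(values, freqs):
    """x-bar = sum(f*X) / N for a frequency series, N = sum(f)."""
    n_total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n_total

print(mean_raw([4, 8, 15, 16, 23, 42]))          # 18.0
print(mean_frequency([10, 20, 30], [2, 5, 3]))   # (20+100+90)/10 = 21.0
```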
2. Median: The median is a positional average, i.e. its value depends on the position occupied by a value in the
frequency distribution. The median is the value of the variable that divides the ordered set of values into two
equal halves, i.e. 50% of the values lie to the left of the median and 50% to the right. It is the most
preferable measure of location for asymmetric distributions.
Calculation of median
A. For individual series
a) Arrange the data in either ascending or descending order of magnitude.
b) If the number of observations is odd, the middle value gives the median; if the number of observations is
even there will be two middle values, so the arithmetic mean of the two middle values gives the median.
c) The formula for calculating the median is:

Median = value of the ((n+1)/2)th item,

where n is the number of observations.
B. For discrete series
i. Arrange the data according to their ascending order of magnitude,
ii. Form the cumulative frequency distribution.
iii. To calculate the median we use the following formula:

Median = value of the ((N+1)/2)th item,

where N is the total number of observations.
C. For continuous series
a) Prepare the less-than cumulative frequency distribution.
b) Find the value of N/2.
c) The class whose cumulative frequency is just equal to or greater than N/2 is the median class, and this class contains the median.
d) The value of the median is calculated using the following formula:

Median = L + ((N/2 − c.f.) / f) × h

where L is the lower limit of the median class, N = ∑f is the total frequency, c.f. is the cumulative frequency
preceding the median class, f is the frequency of the median class, and h is
the size of the median class.
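As a concrete illustration of the grouped-median formula, here is a minimal Python sketch; it assumes equal-width classes described by their lower limits, and the classes and frequencies are invented for the example:

```python
# Grouped median: Median = L + (N/2 - c.f.)/f * h; illustrative data.

def grouped_median(lower_limits, freqs, h):
    """Median of a continuous frequency distribution with class width h."""
    n_total = sum(freqs)
    half = n_total / 2
    cum = 0
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= half:          # first class with c.f. >= N/2 is the median class
            return lower + (half - cum) / f * h
        cum += f

# Classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5
print(grouped_median([0, 10, 20, 30], [5, 8, 12, 5], 10))  # 20 + (15-13)/12*10 ≈ 21.67
```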
3. Mode: The mode of the values of a variable in a series of data is that value which appears more
frequently than any other and around which the other items of the set concentrate densely. It is denoted by
Mo.
Computation of mode in case of continuous series
For a frequency distribution with classes in ascending order of magnitude, the class corresponding to the
maximum frequency is called the modal class, and we use the following formula for the calculation of the mode:

Mo = L + (∆1 / (∆1 + ∆2)) × h

where L is the lower limit of the modal class,
∆1 = f1 − f0 and ∆2 = f1 − f2,
f1 is the maximum frequency,
f0 is the frequency preceding the modal class, and
f2 is the frequency succeeding the modal class.
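A minimal Python sketch of the grouped-mode formula follows; the classes and frequencies are illustrative, and the handling of a modal class at either end of the table is an assumption of this sketch:

```python
# Grouped mode: Mo = L + d1/(d1+d2) * h with d1 = f1-f0, d2 = f1-f2.

def grouped_mode(lower_limits, freqs, h):
    i = max(range(len(freqs)), key=freqs.__getitem__)   # modal class index
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # preceding frequency (0 at the edge)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0      # succeeding frequency (0 at the edge)
    return lower_limits[i] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h

# Classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5
print(grouped_mode([0, 10, 20, 30], [5, 8, 12, 5], 10))  # 20 + 4/11*10 ≈ 23.64
```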
4. Partition Values
Partition values are variate values of a variable, say X, which divide the total number of observations into a
number of equal parts. The number of equal parts may be four, eight, ten, twelve, one hundred, etc.
TYPES OF PARTITION VALUES
1. Quartiles: The three variate values of the variable, say X, which divide the series or frequency distribution into four equal
parts are called quartiles of the corresponding distribution of X. There are three quartiles, namely Q1, Q2,
Q3.
Calculation of quartiles
a. For individual series:
Qi = value of the [i(n+1)/4]th item, where i = 1, 2, 3
b. For discrete series:
Qi = value of the [i(N+1)/4]th item, where i = 1, 2, 3
c. For continuous series:
Qi = L + ((iN/4 − c.f.) / f) × h, where i = 1, 2, 3
i. L is the lower limit of the class in which the partition value lies.
ii. c.f. is the cumulative frequency preceding the class in which the partition value lies.
iii. h is the size of the class, or class width.
iv. f is the frequency of the class in which the partition value lies.
2. Deciles: The nine variate values of the variable, say X, which divide the series or frequency distribution into ten equal
parts are called deciles of the corresponding distribution of X. There are nine deciles, namely D1, D2, D3, ..., D9.
Calculation of deciles
a. For individual series:
Di = value of the [i(n+1)/10]th item, where i = 1, 2, 3, ..., 9
b. For discrete series:
Di = value of the [i(N+1)/10]th item, where i = 1, 2, 3, ..., 9
c. For continuous series:
Di = L + ((iN/10 − c.f.) / f) × h, where i = 1, 2, 3, ..., 9
where,
i. L is the lower limit of the class in which the partition value lies.
ii. c.f. is the cumulative frequency preceding the class in which the partition value lies.
iii. h is the size of the class, or class width.
iv. f is the frequency of the class in which the partition value lies.
3. Percentiles: The ninety-nine variate values of the variable, say X, which divide the series or frequency distribution into
one hundred equal parts are called percentiles of the corresponding distribution of X. There are ninety-nine
percentiles, namely P1, P2, P3, ..., P99.
Calculation of percentiles
a. For individual series:
Pi = value of the [i(n+1)/100]th item, where i = 1, 2, 3, ..., 99
b. For discrete series:
Pi = value of the [i(N+1)/100]th item, where i = 1, 2, 3, ..., 99
c. For continuous series:
Pi = L + ((iN/100 − c.f.) / f) × h, where i = 1, 2, 3, ..., 99
where,
i. L is the lower limit of the class in which the partition value lies.
ii. c.f. is the cumulative frequency preceding the class in which the partition value lies.
iii. h is the size of the class, or class width.
iv. f is the frequency of the class in which the partition value lies.
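Quartiles, deciles and percentiles for an individual series all use the same positional rule; below is one minimal Python sketch covering all three (k = 4, 10 or 100), with linear interpolation between items when the position is fractional. The data are illustrative:

```python
# i-th partition value of an individual series divided into k parts.

def partition_value(data, i, k):
    """k=4: quartiles, k=10: deciles, k=100: percentiles."""
    xs = sorted(data)
    pos = i * (len(xs) + 1) / k          # 1-based position of the required item
    lo = int(pos)
    if lo < 1:
        return xs[0]
    if lo >= len(xs):
        return xs[-1]
    frac = pos - lo                      # interpolate between adjacent items
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

data = [7, 15, 36, 39, 40, 41]
print(partition_value(data, 1, 4))   # Q1: position 1.75 -> 7 + 0.75*(15-7) = 13.0
print(partition_value(data, 2, 4))   # Q2 (median): position 3.5 -> 37.5
```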
Dispersion: Dispersion is defined as the scatter of the items from the central value of the data. It is also
called the measurement of variability in the given set of data.
METHODS OF MEASURE OF DISPERSION
A. Range: The range is the simplest and easiest method of studying the variation in a set of observations. It is defined as
the difference between the largest and smallest values of a set of data. If L is the largest and S is the smallest
value in the data, then
Range = L − S
Coefficient of range: The relative measure of range, called the coefficient of range, is defined as

Coefficient of range = (L − S) / (L + S)
B. Quartile Deviation: The difference between the upper and lower quartiles is called the inter-quartile range;
symbolically it is equal to (Q3 − Q1). Half the inter-quartile range is known as the semi-inter-quartile range or
quartile deviation (Q.D.), given as

Q.D. = (Q3 − Q1) / 2

Coefficient of quartile deviation: The relative measure of dispersion based on quartiles, also known as the
coefficient of Q.D., is defined as

Coefficient of quartile deviation = (Q3 − Q1) / (Q3 + Q1)
Merits of quartile deviation
i. It is easy to calculate and understand.
ii. It can be calculated in the case of open-end frequency distributions as well.
iii. It is not affected by the upper 25% and lower 25% of extreme values.
Demerits of quartile deviation
i. It is based only on the central 50% of the observations and ignores the other 50%.
ii. It is not amenable to further algebraic treatment.
iii. It is not based on all the observations.
iv. It is affected by fluctuations of sampling.
C. Mean deviation: Mean deviation is a measure of dispersion based on all the values of a set of data.
It is defined as the average of the absolute deviations taken from an average, usually the mean, median or
mode. It is denoted by M.D.
Merits of Mean Deviation
i. It utilizes all the observations of the series.
ii. It is simple to calculate and understand
iii. It is least affected by extreme values.
Demerits of Mean deviation
i. The foremost weakness of mean deviation is that in its calculation negative differences are treated
as positive without any sound reasoning.
ii. It is not an amenable to further statistical treatment.
iii. It can't be calculated in case of open end frequency distribution if mean is used as average.
Calculation of mean deviation
a. For individual series: MD = ∑|X − A| / n, where A may be any chosen constant out of the mean, median and mode.
b. For discrete and continuous series: MD = ∑f|X − A| / N, where A may be any chosen constant out of the mean, median
and mode, and N = ∑f is the total frequency.

Coefficient of MD = Mean deviation / Average used
D. Standard deviation: Standard deviation is an absolute measure of dispersion. It fulfills all the requisites of
a good measure of dispersion except that it is sensitive to extreme values; that is why it is known as the
standard deviation. It is defined as the positive square root of the mean of the squares of the deviations taken
from the mean (x̅). It is denoted by σ. It is also known as the mean error, mean square error, or root mean square
deviation from the mean.
Merit of standard deviation
i. It is based on all items of the series.
ii. It is rigidly defined.
iii. It is least affected by fluctuation of sampling.
iv. It is suitable for further statistical treatments.
v. It is suitable for comparing variability.
Demerits of standard deviation
i. It can't be calculated for open-end classes.
ii. It gives greater weight to extreme values and less to those which are near the mean.
iii. It is difficult to compute and understand for a non-mathematical person.
Calculation of standard deviation
a. For individual series: σ = √(∑(X − X̅)² / n) = √((∑X² / n) − (∑X / n)²)
b. For discrete and continuous series: σ = √(∑f(X − X̅)² / N) = √((∑fX² / N) − (∑fX / N)²),
where N = ∑f
E. Variance: Variance is just the square of the standard deviation, i.e. variance = σ².

Variance = ∑f(X − X̅)² / N = (∑fX² / N) − (∑fX / N)²
Coefficient of variation (C.V.):
The relative measure of dispersion based on the standard deviation, defined as the ratio of the standard
deviation to the mean, is called the coefficient of standard deviation:

Coefficient of SD = σ / x̅

When the coefficient of SD is expressed as a percentage it is called the coefficient of variation, denoted by C.V. and
given as:

C.V. = (σ / x̅) × 100%

It is independent of the unit of measurement, so two distributions can be better compared for their variability with
the help of the C.V. The smaller the C.V., the greater the uniformity, consistency and homogeneity; the larger
the C.V., the less the uniformity, consistency and homogeneity.
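A minimal Python sketch tying together the shortcut formula for σ, the variance and the C.V. (the data are illustrative):

```python
# sigma = sqrt(sum(X^2)/n - (sum(X)/n)^2), variance = sigma^2, C.V. = sigma/mean * 100%.

from math import sqrt

def dispersion(values):
    n = len(values)
    mean = sum(values) / n
    var = sum(x * x for x in values) / n - mean ** 2   # population variance
    sd = sqrt(var)
    cv = sd / mean * 100                               # coefficient of variation (%)
    return mean, var, sd, cv

mean, var, sd, cv = dispersion([40, 50, 60, 70, 80])
print(f"mean={mean}, variance={var}, sd={sd:.3f}, C.V.={cv:.2f}%")
# mean=60.0, variance=200.0, sd≈14.142, C.V.≈23.57%
```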
CHAPTER TWO
CORRELATION AND REGRESSION ANALYSIS
Correlation: The degree of relationship between variables is called correlation. It may be simple, multiple
or partial. Two variables are said to be in correlation when they are so related that a change in the value of
one variable is accompanied by a change in the value of the other variable.
Types of correlation
a. Positive, negative and no correlation
Two variables are said to be positively correlated if they tend to change together in the same
direction, that is, if they tend to increase or decrease together. Such positive correlation is postulated by
economic theory for consumption and income: when income increases consumption
increases, and conversely, when income decreases consumption decreases.
Two variables are said to be negatively correlated if they tend to change in opposite directions:
when X increases Y decreases, and vice versa. For example, the quantity of a commodity demanded and
its price are negatively correlated: when the price increases, demand for the commodity decreases, and when
the price falls, demand increases.
Two variables are uncorrelated when they tend to change with no connection to each other. For
example, one should expect zero correlation between the height of the inhabitants of a country and the
production of steel.
b. Linear and non-linear correlation
Correlation may be linear, when all points (X, Y) on a scatter diagram seem to cluster near a straight
line, or non-linear, when all points seem to lie near a curve.
c. Simple, partial and multiple correlation
In simple correlation, we measure the correlation between two variables (of which one is dependent
and the other independent). In partial correlation, we measure the correlation between two variables after
eliminating the effect of the other variables, while multiple correlation measures the relationship between
one variable and two or more other variables taken together.
Coefficient of correlation
a) Karl Pearson's correlation coefficient: The Karl Pearson's correlation coefficient measures the degree of
linear association between two variables. This method is popularly known as the Pearsonian correlation
coefficient. It is a unitless measure of correlation. It is defined as the ratio of the covariance between the two variables
to the product of their standard deviations. It is denoted by r. Let X and Y be two
variables; then Karl Pearson's correlation coefficient between X and Y is given by:
r = Cov(X, Y) / (σx × σy), where

Cov(X, Y) = ∑(X − X̅)(Y − Y̅) / n

σx = √(∑(X − X̅)² / n)

σy = √(∑(Y − Y̅)² / n)

where
x̅ = the arithmetic mean of the values of variable X,
y̅ = the arithmetic mean of the values of variable Y, and
n = the number of pairs of observations.

Equivalently,

r = (n∑XY − ∑X∑Y) / (√(n∑X² − (∑X)²) × √(n∑Y² − (∑Y)²))
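The product-moment form of r translates directly into code; here is a minimal Python sketch with made-up paired data:

```python
# r = (n*sum(XY) - sum(X)*sum(Y)) / sqrt[(n*sum(X^2)-(sum X)^2)(n*sum(Y^2)-(sum Y)^2)]

from math import sqrt

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))   # ≈ 0.775, a high-degree positive correlation
```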
Properties of Karl Pearson's correlation coefficient
Following are the important properties of Karl Pearson's correlation coefficient:
Correlation coefficient lies between −1 and +1, i.e. −1 ≤ r ≤ +1.
The formula of the correlation coefficient is symmetrical, i.e. rxy = ryx.
The correlation coefficient is the geometric mean of the two regression coefficients, i.e. r = √(bxy × byx).
The correlation coefficient is a relative statistical measure and has no unit.
Two independent variables are uncorrelated, but the converse may not be true, i.e. uncorrelated variables
may not be independent.
The correlation coefficient is independent of change of origin as well as of scale.
b) Spearman's rank correlation coefficient (rs): The Spearman's rank correlation coefficient between x and y is given by

rs = 1 − 6∑d² / (n(n² − 1)),

where d is the difference of ranks between x and y and n is the number of pairs of observations.
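A minimal Python sketch of the rank-correlation formula, assuming untied ranks (the ranks are illustrative):

```python
# rs = 1 - 6*sum(d^2) / (n(n^2 - 1)) for rank data without ties.

def spearman_rs(rank_x, rank_y):
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
print(spearman_rs(rank_x, rank_y))   # 1 - 6*4/120 = 0.8
```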
Merits of Spearman's rank correlation coefficient
i. It is appropriate for qualitative data.
ii. It is easy to compute.
iii. It can be used when the data are highly skewed.
Demerits of Spearman's rank correlation coefficient
i. It is not suitable to compute in case of large number of observations.
ii. It is not appropriate for calculating correlation coefficient of a frequency distribution.
iii. It does not consider original values i.e. it involves ranks which is not actual measure.
Coefficient of Determination: The Square of the simple correlation coefficient is called the coefficient of
determination. It is used for interpretation of the value of the calculated correlation coefficient.
If r is the correlation coefficient then r² is the coefficient of determination. If r = 0.8, the value of r² = 0.64. This
shows that 64% of the total variation in the dependent variable has been explained by the independent variable and
the other 36% is due to other unknown factors. 1 − r² is called the coefficient of non-determination.
Interpretation of calculated value of correlation coefficient
The calculated value of Karl Pearson's correlation coefficient may be interpreted in the following manner.
If r = 1, there is perfect positive correlation between the variables.
If r = −1, there is perfect negative correlation between the variables.
If r = 0, there is no correlation between the variables.
If r lies between 0.700 and 0.999, there is a high degree of positive correlation.
If r lies between −0.700 and −0.999, there is a high degree of negative correlation.
If r lies between 0.500 and 0.699, there is moderate positive correlation.
If r lies between −0.500 and −0.699, there is moderate negative correlation.
If r lies between 0.001 and 0.499, there is a low degree of positive correlation.
If r lies between −0.001 and −0.499, there is a low degree of negative correlation.
Regression: Regression is the average relationship among variables. It is used to predict the value of one
variable on the basis of another variable or variables. Regression may be simple, partial or multiple.
Lines of regression: The lines of regression are the lines which give the best estimated value of the dependent variable
for a given value of the independent variable. Let X and Y be two variables; then there are two lines of
regression:
1. Line of regression of y on x: (y=dependent and x=independent variable)
Let
y = a + bx .......... (i)
be the line of regression of y on x, where a and b are constants to be determined.
By the principle of least squares, the normal equations of (i) are
∑y = a∑1 + b∑x
∑y = na + b∑x .......... (ii)
and
∑xy = a∑x + b∑x² .......... (iii)
Solving equations (ii) and (iii), we get the values of a and b, say â and b̂.
The fitted regression line of y on x is:
y = â + b̂x
Note: For the line y=a+bx; b is called regression coefficient of y on x. It is denoted by byx.
2. Line of regression of x on y: (x = dependent and y = independent variable)
Let
x = a1 + b1y .......... (i)
be the line of regression of x on y, where a1 and b1 are constants to be determined.
By the principle of least squares, the normal equations of (i) are
∑x = a1∑1 + b1∑y
∑x = na1 + b1∑y .......... (ii)
and
∑xy = a1∑y + b1∑y² .......... (iii)
Solving equations (ii) and (iii), we get the values of a1 and b1, say â1 and b̂1.
The fitted regression line of x on y is:
x = â1 + b̂1y
Note: For the line x = a1 + b1y, b1 is called the regression coefficient of x on y. It is denoted by bxy.
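Solving the two normal equations for the line of y on x gives closed forms for a and b; a minimal Python sketch (illustrative data) follows:

```python
# Fit y = a + bx by solving sum(y) = n*a + b*sum(x) and sum(xy) = a*sum(x) + b*sum(x^2).

def fit_y_on_x(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # regression coefficient byx
    a = (sy - b * sx) / n                           # intercept
    return a, b

a, b = fit_y_on_x([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {a:.2f} + {b:.2f}x")   # y = 2.20 + 0.60x
```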
Properties of Regression Coefficients
The following are the important properties of regression coefficients:
Both regression coefficients must have the same sign, and the sign of the correlation coefficient is the same as the sign
of the regression coefficients.
The product of the two regression coefficients must be less than or equal to 1, i.e. bxy × byx ≤ 1.
The correlation coefficient is the geometric mean of the two regression coefficients, i.e.
r = √(bxy × byx)
The two lines of regression intersect at the point (X̅, Y̅), where X̅ and Y̅ are the means of the variables under study.
Regression coefficients are independent of change of origin but not of scale,
i.e. bxy = buv if u = X − A and v = Y − B, and
bxy ≠ buv if u = (X − A)/h and v = (Y − B)/k, where A and B are assumed means and h and k are the common factors of the X
and Y series respectively.
The arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient.
If r = ±1, the two lines of regression coincide.
CHAPTER THREE
PROBABILITY
Probability: Probability can be defined as a measure of the likelihood that a particular event will occur; it is
a numerical measure of chance whose value lies between 0 and 1. If S is a sample space having n points
which arise out of all possible or exhaustive, equally likely and mutually exclusive outcomes of a random
experiment, and m points in S are favourable to an event A, then the probability of the event A is given as

P(A) = Favourable cases / Total exhaustive number of cases = m / n
Sample spaces and events: The set of all possible outcomes of a random experiment is called sample
space. The possible outcomes are called sample points. The sample space is usually denoted by the letter
'S'. For example:
In a toss of a single fair coin the sample points are head (H) and tail (T) and the sample space is
S = {H, T}.
If two unbiased coins are tossed simultaneously, the sample space is S = {HH, HT, TH, TT}.
Performing a random experiment is called a trial, and the collections of possible outcomes are
termed events. For example, if a coin is tossed repeatedly, the result is not unique: we may get either of
the two faces, head or tail. Thus tossing a coin is a random experiment or trial, and getting a head
or a tail is an event.
3.1.1 TYPES OF EVENTS
A. Simple or Compound Events: An event having only one sample point is called simple or elementary
event. Otherwise it is known as compound or composite event. Thus in tossing of a single die the event of
getting '5' is a simple event but the event of getting an odd number is a compound event.
B. Mutually Exclusive Events: Events are said to be mutually exclusive or disjoint if the occurrence of one
precludes the occurrence of the others, i.e. one and only one of them can occur in a single trial. For example, in
tossing a fair coin, the events head and tail are mutually exclusive, since if the coin falls with head
up, tail can't turn up, and vice versa. If two or more events can occur together at a time they are not
mutually exclusive. For example, if we draw either a king or a diamond from a pack of 52 cards, then the events
king and diamond can occur together, because we could draw the king of diamonds; the events king and diamond
are therefore not mutually exclusive.
C. Exhaustive Events: The total number of possible outcomes of a random experiment is known as the
exhaustive number of events. For example, in tossing a fair coin the exhaustive cases are head and tail. In
a toss of two fair coins at a time the exhaustive cases are {HH, HT, TH, TT}.
D. Favourable Events: The number of cases favourable to an event in a trial is the number of outcomes
which entail the happening of the event. For example, in drawing a card from a pack of cards the number of
cases favourable to drawing a king is 4, to drawing a diamond is 13, and to drawing a black card is
26.
E. Equally Likely Events: Outcomes of a trial are said to be equally likely if none of them is expected to
occur in preference to another, i.e. every event has an equal and independent chance of happening. For example, in
throwing an unbiased die all six faces are equally likely to come up; in a random toss of a fair coin, head
and tail are equally likely.
F. Independent and Dependent Events: Two or more events are said to be independent if the occurrence or
non-occurrence of one event doesn't affect the occurrence or non-occurrence of the other events; otherwise
they are said to be dependent. For example, if we toss two coins together, let E1 be the event of occurrence of head
or tail on the first coin and E2 be the event of occurrence of head or tail on the second coin; then E1 and E2 are two
independent events. If we draw a card from a pack of well-shuffled cards and replace it before drawing the
second card, the result of the second draw is independent of the first draw; however, if the first card
drawn is not replaced, then the second draw is dependent on the first draw.
Axioms, interpretations and properties of probability
If S is a sample space having n points which arise out of all possible or exhaustive, equally likely
and mutually exclusive outcomes of a random experiment, and m points in S are favourable to an event
A, then the probability of the event A is given as

P(A) = m / n
A. In probability n and m are positive integers with 0 ≤ m ≤ n, so 0 ≤ m/n ≤ 1; ∴ 0 ≤ P(A) ≤ 1. Hence the
probability of an event A is a number which always lies between 0 and 1.
B. If P(A) = 0, then A is called an impossible or null event, and if P(A) = 1, then A is called a certain or
sure event.
C. Probability is a pure number, it has no unit.
D. Let A̅ be the event of non-occurrence of the event A; then

P(A̅) = (number of cases favourable to the non-occurrence of A) / Total cases = (n − m) / n = 1 − m/n = 1 − P(A)

a. P(A) + P(A̅) = 1, i.e. the total probability is always equal to unity.
E. We can compute the probability of an event A, i.e. P(A), by logical reasoning without conducting any
experiment; since the probability P(A) can be computed prior to obtaining any experimental data,
it is also termed a priori or mathematical probability.
Conditional probability: Many times the information is available that an event A has occurred, and one is
required to find the probability of occurrence of another event B utilizing the information about A.
Such a probability is known as conditional probability and is denoted by P(B/A), i.e. the probability of the
event B given A. For example, suppose we know that an accident has happened (A); then one may be interested in
the probability of death of the patient (B), i.e. we want to calculate P(B/A).

The formula is P(B/A) = P(A∩B) / P(A)

If A and B are independent, then P(B/A) = P(A∩B) / P(A) = P(A)·P(B) / P(A) = P(B)
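A minimal worked example of P(B/A) = P(A∩B)/P(A) on a finite, equally likely sample space (two fair dice; the choice of events is illustrative):

```python
# P(B|A) with A = "sum of two fair dice is 8", B = "first die shows 6".

from itertools import product

space = list(product(range(1, 7), repeat=2))      # 36 equally likely outcomes
A = {s for s in space if sum(s) == 8}             # 5 favourable outcomes
B = {s for s in space if s[0] == 6}               # 6 favourable outcomes

p_A = len(A) / len(space)
p_A_and_B = len(A & B) / len(space)               # only (6, 2)
print(p_A_and_B / p_A)                            # P(B|A) = 1/5 = 0.2
```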
Theorems on probability (addition, multiplication and Bayes, and their applications)
The theorems of probability are as follows:
Additive theorem of probability: Two cases arise for the additive theorem of probability. They
are:
Additive theorem of probability for not mutually exclusive events: The probability of occurrence of at
least one of the two events A and B is given by:
P(A∪B) = P(A) + P(B) − P(A∩B)
Proof
Let us suppose that a random experiment results in a sample space S with N sample points. Then by
definition

P(A∪B) = n(A∪B) / n(S) = n(A∪B) / N

where n(A∪B) is the number of outcomes favourable to the event (A∪B). From the Venn
diagram we get

P(A∪B) = (n(A) − n(A∩B) + n(B)) / N = n(A)/N − n(A∩B)/N + n(B)/N = P(A) + P(B) − P(A∩B)
Additive theorem of probability for mutually exclusive events: If the events are mutually disjoint,
i.e. A∩B = Φ, then P(A∩B) = 0.
∴ P(A∪B) = P(A) + P(B) − P(A∩B) = P(A) + P(B)
Hence the probability of the happening of any one of two mutually exclusive events is equal to the sum
of their individual probabilities.
Multiplication theorem of probability: Two cases arise for the multiplication theorem of
probability. They are:
Multiplication theorem of probability for dependent events: The probability of the
simultaneous happening of two events A and B is given by:
P(A∩B) = P(A) P(B/A); P(A) ≠ 0
P(A∩B) = P(B) P(A/B); P(B) ≠ 0,
where P(B/A) is the conditional probability of the happening of B under the condition that A has
already happened.
Proof
Let A and B be events associated with the sample space S of a random experiment with exhaustive
number of outcomes N, i.e. n(S) = N. Then by definition

P(A∩B) = n(A∩B) / N

For the conditional event A/B, the favourable outcomes must be out of the
sample points of B. In other words, for the event A/B the sample space is B, and then

P(A/B) = n(A∩B) / n(B)

Similarly, P(B/A) = n(A∩B) / n(A)

∴ P(A∩B) = n(A∩B)/N = (n(A)/N) × (n(A∩B)/n(A)) = P(A) × P(B/A)
∴ P(A∩B) = n(A∩B)/N = (n(B)/N) × (n(A∩B)/n(B)) = P(B) × P(A/B)
Multiplication theorem of probability for independent events: If A and B are independent, so
that the probability of the occurrence or non-occurrence of A is not affected by the occurrence or
non-occurrence of B, then we have
P(A/B) = P(A) and P(B/A) = P(B)
∴ P(A∩B) = P(A) P(B)
Bayes' theorem and its applications
Statement: If E1, E2, E3, ..., En are mutually exclusive and exhaustive events of a sample space S such that
∪(i=1 to n) Ei = S and P(Ei) > 0 for i = 1, 2, 3, ..., n, and if A is any arbitrary event associated with the
Ei's such that P(A) > 0, then the probability of Ei given that A has occurred, P(Ei/A), is

P(Ei/A) = P(A/Ei)P(Ei) / ∑(i=1 to n) P(A/Ei)P(Ei)
Proof: The events E1, E2, ..., En are mutually exclusive events and A is any arbitrary event which is
associated with the Ei's, with P(A) > 0 and P(Ei) > 0. Then from the conditional probability of the two events A and
Ei:

P(A∩Ei) = P(A) × P(Ei/A) = P(Ei) × P(A/Ei)

or P(Ei/A) = P(Ei) × P(A/Ei) / P(A) .......... (i)

The event A may be written as

A = (A∩E1) ∪ (A∩E2) ∪ ... ∪ (A∩En),

and since the Ei are mutually exclusive, (A∩E1), (A∩E2), ..., (A∩En) are also mutually
exclusive.

∴ P(A) = P(E1) × P(A/E1) + P(E2) × P(A/E2) + P(E3) × P(A/E3) + ... + P(En) × P(A/En)

P(A) = ∑(i=1 to n) P(Ei) × P(A/Ei) .......... (ii)

From equations (i) and (ii):

P(Ei/A) = P(Ei) × P(A/Ei) / ∑(i=1 to n) P(Ei) × P(A/Ei)

For three exclusive and exhaustive events,

P(E1/A) = P(E1) × P(A/E1) / [P(E1) × P(A/E1) + P(E2) × P(A/E2) + P(E3) × P(A/E3)]

and similarly for P(E2/A) and P(E3/A).
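A minimal Python sketch of the theorem for exhaustive events; the priors P(Ei) and likelihoods P(A/Ei) below are made-up numbers (think of three machines producing an item, with A = "the item is defective"):

```python
# Bayes' theorem: P(Ei|A) = P(Ei)P(A|Ei) / sum_j P(Ej)P(A|Ej).

def bayes(priors, likelihoods):
    """Return the posterior P(Ei|A) for each i."""
    total = sum(p * l for p, l in zip(priors, likelihoods))   # P(A), equation (ii)
    return [p * l / total for p, l in zip(priors, likelihoods)]

priors = [0.5, 0.3, 0.2]          # P(E1), P(E2), P(E3)
likelihoods = [0.02, 0.03, 0.05]  # P(A|E1), P(A|E2), P(A|E3)
print(bayes(priors, likelihoods)) # [≈0.345, ≈0.310, ≈0.345]
```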
CHAPTER FOUR
RANDOM VARIABLES AND MATHEMATICAL EXPECTATION
Random variables
A rule that assigns a real number to each outcome of a random experiment is called a random
variable (r.v.). Thus a random variable associates numerical values with each
elementary outcome of an experiment. A random variable is usually denoted by a capital Latin
letter X, Y, Z, U, V, ... etc., and the particular values which the random variable takes are denoted by the
corresponding small letters. If we flip a coin and denote head by 1 and tail by 0, then the random
variable takes the values 1 and 0; in symbols, X = {x : x = 1, 0; x ∈ S}.
Types of Random Variables
There are two types of random variable namely.
A. Discrete random variable: A random variable that takes a finite number of values, or an infinite number of
values that can be arranged in a sequence counted with the positive integers 1, 2, ... and so forth, is
known as a discrete random variable. Further, if there is a natural gap between two values of the variable,
then the variable is a discrete random variable. For example:
a) If we toss a coin, the variable can take only two values 0 and 1 assigned to tail and head respectively.
b) The number of students in a class,
c) The number of people in a queue in a cinema hall for ticketing,
d) The number of children in a family,
e) The number of runs made by a cricketer in a match, etc.
B. Continuous random variable: A random variable X which can take all possible values on the real line R, or
between certain limits, is said to be a continuous random variable. For a continuous random variable the
probability at a point is zero. For example:
a) The weight of middle-aged people in Kathmandu lying between 40 kg and 150 kg is a continuous
random variable; symbolically X = {x : 40 ≤ x ≤ 150},
b) Measurement of weight, height, temperature, electric voltage, rainfall,
c) Hardness of steel,
d) Distance covered by a vehicle etc .
Properties of random variables
The important properties of random variables are as follows:
a) If X is a random variable and a and b are any two constants, then aX + b is also a random variable.
b) If X is a random variable, then X² is also a random variable.
c) If X is a random variable, then 1/X is also a random variable.
d) If X and Y are random variables defined on the same sample space S, then X + Y, X − Y, aX, bY and
aX + bY are also random variables, where a and b are constants.
e) If X1, X2, ..., Xn are n independent random variables, then
Un = min(X1, X2, ..., Xn) and Vn = max(X1, X2, ..., Xn) are also random variables.
Probability mass function (P.M.F.)
If X is a one-dimensional discrete random variable taking at most a countable number of values x1, x2, ..., xn, then with
each value of the variable X we associate a number
P(xi) = P(X = xi); i = 1, 2, ..., n,
known as the probability of the variate value xi and satisfying the following two conditions:
i. Each P(xi) = P(X = xi) ≥ 0, i.e. the P(xi) are all non-negative, and
ii. ∑P(xi) = 1, i.e. the total probability is unity.
Then P(xi) = P(X = xi) is called the probability mass function of the random variable X. The set of ordered
pairs {xi, P(xi)}, i = 1, 2, ..., n, is called the probability distribution of the random variable X.
Probability density function
If X is a continuous random variable and f(x) is a continuous function of X, then f(x)dx gives the
probability that the random variable X lies in the interval (x − dx/2, x + dx/2),

i.e. f(x)dx = P(x − dx/2 ≤ X ≤ x + dx/2)

Then f(x) is called the probability density function (p.d.f.). It is also known as the frequency function because it
also gives the proportion of units lying in the interval (x − dx/2, x + dx/2). If X has the range [a, b], i.e.
a ≤ X ≤ b, then the probability density function of the random variable X must satisfy the following two
properties:

f(x) ≥ 0 for every x ∈ [a, b]

∫(a to b) f(x)dx = 1,

which implies that the total area under the frequency curve is always unity.
Cumulative distribution function: A function FX(x) of a random variable X which, for a real value x, gives the
probability of the event (X ≤ x) is called the cumulative distribution function (c.d.f.). It is simply known as the
distribution function.
Properties of the distribution function
Some important properties of the distribution function:
a. If a and b are two constant values such that a < b and F is the distribution function, then
P(a < X ≤ b) = F(b) − F(a)
Proof: The events 'a < X ≤ b' and 'X ≤ a' are disjoint and their union is the event 'X ≤ b'. Hence by the
addition theorem of probability
P(a < X ≤ b) + P(X ≤ a) = P(X ≤ b)
P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a)
P(a < X ≤ b) = F(b) − F(a)
b. If F(x) is the distribution function of a monovariate X, then 0 ≤ F(x) ≤ 1.
c. If F(x) is the distribution function of a monovariate X, then
F(−∞) = lim(x→−∞) F(x) = 0 and F(∞) = lim(x→∞) F(x) = 1.
Mathematical expectation
Let X be a one-dimensional discrete random variable which can assume any one of the values x1, x2, ..., xn
with respective probabilities P(x1), P(x2), ..., P(xn). Then the mathematical expectation of the r.v. X,
usually called the expected value of X and denoted by E(X), is defined as

E(X) = x1P(x1) + x2P(x2) + ... + xnP(xn) = ∑xP(x),

where ∑(i=1 to n) P(xi) = 1, i.e. the total probability is unity.
Similarly, if X is a continuous random variable with probability density function f(x), then the
mathematical expectation of the r.v. X is defined as

E(X) = ∫xf(x)dx, where f(x) ≥ 0 for every x ∈ [a, b] and ∫(a to b) f(x)dx = 1.
Physical meaning of Mathematical Expectation
Let us consider the following frequency distribution of the discrete random variable X:

X: x1 x2 ... xn
f: f1 f2 ... fn

Then the arithmetic mean of the discrete random variable is given by

x̄ = ∑fx / N = (f1x1 + f2x2 + ... + fnxn) / N

x̄ = (f1/N)x1 + (f2/N)x2 + ... + (fn/N)xn

We observe that, out of a total of N cases, fi cases are favourable to the event X = xi.

Therefore P(X = xi) = fi/N = P(xi), i = 1, 2, ..., n,

which implies that f1/N = P(x1), f2/N = P(x2), ..., fn/N = P(xn).

Therefore, x̄ = x1P(x1) + x2P(x2) + ... + xnP(xn) = ∑xP(x) = E(X).

Hence the mathematical expectation of a random variable is nothing but the arithmetic mean of the
random variable.
Properties of mathematical expectation
a. If k is a constant, then E(k) = k.
b. If a and b are two constants, then E(aX + b) = aE(X) + b; in particular, the mean of the random variable Y
defined as Y = aX + b, where X has mean µ, is
E(aX + b) = E(aX) + E(b) = aE(X) + b = aµ + b.
c. The mathematical expectation of the sum of random variables is equal to the sum of their expectations,
provided all the expectations exist. Symbolically, let X, Y, Z, ..., T be random variables; then
E(X + Y + Z + ... + T) = E(X) + E(Y) + E(Z) + ... + E(T).
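A minimal Python sketch of E(X) = ∑xP(x) and the property E(aX + b) = aE(X) + b, using a fair die as the illustrative random variable:

```python
# Expectation of a discrete random variable: E(X) = sum of x * P(x).

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6                      # fair die

ex = sum(x * p for x, p in zip(values, probs))
print(ex)                                # ≈ 3.5

# Property E(aX + b) = a*E(X) + b, with a = 2, b = 1:
print(sum((2 * x + 1) * p for x, p in zip(values, probs)))  # ≈ 8.0 = 2*3.5 + 1
```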
CHAPTER FIVE
DISCRETE PROBABILITY DISTRIBUTION
Binomial and Poisson probability distributions
Binomial Probability distribution:
Derivation of probability mass function of Binomial distribution
The probability of x successes, and consequently (n − x) failures, in n independent trials, in a specified order,
say
F F S F S S S F ... S F S,
where S represents success and F represents failure, is given by the compound probability theorem as
P(F F S F S S S F ... S F S)
= P(F) P(F) P(S) P(F) ... P(S) P(F) P(S)
= q q p q ... p q p [since the trials are independent]
= ppp...p (x successes) × qqq...q (n − x failures)
= p^x q^(n−x)
But x successes in n trials can occur in nCx ways, and the probability for each of these ways is the same,
namely p^x q^(n−x).
Hence the probability of x successes in n trials in any order is given by the addition theorem of probability
as nCx p^x q^(n−x).
Here nCx p^x q^(n−x) is the (x+1)th term in the binomial expansion of (q + p)^n.
Characteristics of the binomial probability distribution
The two independent constants n and p in the binomial distribution are known as the parameters of the
binomial distribution; sometimes n is also known as the degree of the binomial distribution.
The binomial distribution is a discrete distribution, as X can take only the integral values 0, 1, 2, 3, ..., n.
Any random variable which follows the binomial distribution is known as a binomial variate.
We shall use the notation X ~ B(n, p) to denote that the random variable X follows the binomial distribution
with parameters n and p.
The P(x) defined in the binomial distribution is a probability mass function because

∑(x=0 to n) p(x) = ∑(x=0 to n) nCx p^x q^(n−x) = (q + p)^n = 1

For the binomial distribution, the variance is less than the mean:
since q is the probability of failure, we always have
0 < q < 1; multiplying by np on all sides we get
0 < npq < np
0 < variance < mean
Thus, for the binomial distribution the mean is greater than the variance.
If p = q = 1/2, the binomial distribution is symmetrical.
If p < 1/2, the binomial distribution is positively skewed.
If p > 1/2, the binomial distribution is negatively skewed.
The binomial distribution is unimodal if np is a whole number, and then the mean and mode are equal, each
being np.
Mean and variance of the binomial distribution
Let X follow the binomial distribution with parameters n and p; then

P(X = x) = nCx p^x q^(n−x) for x = 0, 1, 2, ..., n, and 0 otherwise.

∴ mean = E(X) = ∑x p(x) = ∑(x=0 to n) x · nCx p^x q^(n−x)

= ∑(x=0 to n) x · n! / ((n−x)! x!) · p^x q^(n−x)

= np ∑(x=1 to n) (n−1)C(x−1) p^(x−1) q^(n−x)

= np (q + p)^(n−1)

= np

Thus the mean of the binomial distribution is np.

Similarly, variance of X = var(X) = E(X²) − [E(X)]².

E(X²) = ∑x²P(x) = ∑{x(x−1) + x}p(x) = ∑(x=0 to n) x(x−1) nCx p^x q^(n−x) + np

= n(n−1)p² ∑(x=2 to n) (n−2)C(x−2) p^(x−2) q^(n−x) + np

= n(n−1)p² (p + q)^(n−2) + np = n(n−1)p² + np

∴ variance of X = var(X) = E(X²) − [E(X)]²
= n(n−1)p² + np − (np)²
= np(1 − p) = npq

Thus the variance of the binomial distribution is npq.
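The pmf, mean and variance can be verified numerically; a minimal Python sketch (n and p are illustrative) follows:

```python
# Binomial pmf P(X=x) = nCx p^x q^(n-x); numerically check mean np and variance npq.

from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum(x * x * binom_pmf(x, n, p) for x in range(n + 1)) - mean ** 2
print(mean, var)   # ≈ 3.0 and ≈ 2.1, i.e. np and npq
```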
Poisson Probability Distribution:
When the number of trials n is very large and the probability of success p is very small, the
resulting distribution is known as the Poisson distribution. It deals with the evaluation of the probabilities of
rare events, such as:
the number of car accidents on a road,
the number of earthquakes in a year,
the number of printing mistakes per page,
the number of blind children born in a city,
the number of twin births in a city's hospitals, etc.
Conditions of the Poisson distribution
The Poisson distribution is a limiting case of the binomial distribution under the following conditions:
The discrete random variable X is the number of occurrences of an event in an interval of time.
The occurrences of the events must be random.
The occurrences of the events must be rare and independent of each other.
The occurrences must be uniformly distributed over the interval being used.
The number of trials n is indefinitely large, i.e. n → ∞.
The constant probability of success p for each trial is indefinitely small, i.e. p → 0.
The product of n and p, i.e. np = λ (say), is finite, so that p = λ/n and q = 1 − λ/n, where λ is read as lambda.
Definition of the Poisson distribution
A discrete random variable X is said to follow a Poisson distribution with single parameter λ if it assumes
only non-negative values and its probability mass function is given by

P(X = x) = e^(−λ) λ^x / x! for x = 0, 1, 2, ..., and 0 otherwise.
Properties/characteristics of the Poisson distribution
Following are the important properties of the Poisson distribution.
The Poisson distribution depends upon the value of λ, the average number of successes per unit; λ is
called the parameter of the Poisson distribution.
Like the binomial distribution, the Poisson distribution is a discrete probability distribution; it is concerned with
occurrences that can take the integral values 0, 1, 2, ...
Any random variable which follows the Poisson distribution is known as a Poisson variate.
We shall use the notation X ~ P(λ) to denote that the random variable X follows the Poisson distribution with
parameter λ.
The probability of success P(X = x) = P(x) defined in the Poisson distribution is a pmf because

∑(x=0 to ∞) e^(−λ) λ^x / x! = e^(−λ) ∑(x=0 to ∞) λ^x / x! = e^(−λ) e^λ = 1

The Poisson distribution is independent of the number of trials n.
Poisson variates are independent of each other.
The Poisson distribution is a positively skewed distribution. As λ increases, the graph of the Poisson
distribution gets closer to a symmetrical continuous curve.
The Poisson distribution is unimodal if λ is not an integer, the unique modal value being the integral part of λ;
it is bimodal if λ is an integer, the modal values being λ and (λ − 1).
Mean and variance of the Poisson distribution
The probability mass function of the Poisson distribution is

P(x) = e^(−λ) λ^x / x!, x = 0, 1, 2, 3, ..., ∞

mean = E(X) = ∑x p(x) = ∑(x=0 to ∞) x e^(−λ) λ^x / x! = e^(−λ) λ ∑(x=1 to ∞) λ^(x−1) / (x−1)!

= e^(−λ) λ e^λ = λ

Thus the mean of the Poisson distribution is λ.

Similarly,
variance of X = var(X) = E(X²) − [E(X)]²

E(X²) = ∑(x=0 to ∞) x²P(x) = ∑(x=0 to ∞) {x(x−1) + x}p(x)

= e^(−λ) λ² ∑(x=2 to ∞) λ^(x−2) / (x−2)! + λ = e^(−λ) λ² e^λ + λ

= λ² + λ

∴ variance of X = var(X) = E(X²) − [E(X)]² = λ² + λ − λ² = λ

∴ The variance of the Poisson distribution is λ.
Hence for the Poisson distribution the mean and variance are both equal to λ.
Fitting of binomial and Poisson distributions
Fitting of the binomial distribution
Let X follow the binomial distribution with parameters n and p; then
P(X = x) = nCx p^x q^(n−x)
P(X = x+1) = nC(x+1) p^(x+1) q^(n−x−1)
Hence,

P(X = x+1) / P(X = x) = [nC(x+1) p^(x+1) q^(n−x−1)] / [nCx p^x q^(n−x)] = (n − x)p / ((x + 1)q)

∴ P(X = x+1) = ((n − x)p / ((x + 1)q)) × P(X = x)

This relation is used to calculate the probabilities of the binomial distribution when the
probability of X = 0 is known.
If a trial is repeated N times, then the theoretical or expected frequency is given by
F(x) = N × P(x) = N × nCx p^x q^(n−x), x = 0, 1, 2, 3, ..., n.
This is known as fitting of the binomial distribution.
Fitting of the Poisson distribution
For the Poisson distribution with parameter λ,

P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, ..., ∞

P(X = x+1) = e^(−λ) λ^(x+1) / (x+1)!, x = 0, 1, 2, ..., ∞

P(X = x+1) / P(X = x) = [e^(−λ) λ^(x+1) / (x+1)!] / [e^(−λ) λ^x / x!] = λ / (x + 1)

∴ P(X = x+1) = (λ / (x + 1)) × P(X = x),

which is the required recurrence formula for the Poisson distribution, used
to calculate the probabilities of the Poisson distribution when the probability of X = 0 is known.
If an experiment satisfying the requirements of the Poisson distribution is repeated N times, then
the expected or theoretical frequency of getting x successes is given by

F(x) = N P(X = x) = N e^(−λ) λ^x / x!, x = 0, 1, 2, 3, ..., ∞

This is known as fitting of the Poisson distribution.
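A minimal Python sketch of this fitting procedure using the recurrence, starting from P(X = 0) = e^(−λ); N and λ are illustrative:

```python
# Poisson fitting via the recurrence P(X=x+1) = lam/(x+1) * P(X=x).

from math import exp

def poisson_fit(lam, n_obs, x_max):
    """Expected frequencies F(x) = N * P(X=x) for x = 0..x_max."""
    p = exp(-lam)                 # P(X=0)
    freqs = []
    for x in range(x_max + 1):
        freqs.append(n_obs * p)
        p *= lam / (x + 1)        # recurrence gives P(X=x+1)
    return freqs

for x, f in enumerate(poisson_fit(lam=2.0, n_obs=100, x_max=5)):
    print(x, round(f, 2))         # 13.53, 27.07, 27.07, 18.04, 9.02, 3.61
```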
The hypergeometric and negative binomial distributions
The hypergeometric distribution
A discrete random variable X is said to follow the hypergeometric distribution with parameters N, M and n
if it assumes only non-negative values and its probability mass function is given by

P(X = x) = [MCx × (N−M)C(n−x)] / NCn, x = 0, 1, 2, 3, ..., min(n, M); M ≤ N; n ≤ N,
and 0 otherwise,

where N is a positive integer, M is a positive integer not exceeding N, and n is a positive integer that is
at most N.
We use the hypergeometric distribution under the following conditions:
The population or universe from which the sample is drawn consists of N finite individuals, objects or elements.
Each individual can be characterized as a success or a failure, and there are M successes in the whole
population.
A random sample of n individuals is selected without replacement in such a way that each subset of size
n is equally likely to be chosen.
Properties of the hypergeometric distribution
The following are the properties of the hypergeometric distribution:
The three independent constants N, M and n in the distribution are known as the parameters of the
hypergeometric distribution.
The hypergeometric distribution is a discrete distribution, as X can take only the integer values 0, 1, 2, ...,
min(n, M).
Any random variable which follows the hypergeometric distribution is known as a hypergeometric
variate. We shall use the notation X ~ HG(N, M, n) to denote that the random variable X follows the
hypergeometric distribution with the three parameters N, M and n.
The P(X = x) defined in the hypergeometric distribution is a pmf because

∑(x=0 to n) [MCx × (N−M)C(n−x)] / NCn = 1

The mean and variance of the hypergeometric distribution are nM/N and nM(N − M)(N − n) / (N²(N − 1)) respectively.
Negative binomial distribution
The negative binomial distribution is the converse of the binomial distribution. In the binomial distribution the
number of trials is fixed, while in the negative binomial distribution the number of successes is fixed and the
number of trials is the random variable.
The probability of x failures preceding the rth success is given by

P(X = x) = (−r)Cx p^r (−q)^x, x = 0, 1, 2, 3, ...,

which is the (x+1)th term in the expansion of p^r (1 − q)^(−r).
Negative binomial distribution can be used under the following conditions.
The experiment consists of sequence of independent trials.
Each trial has only two possible outcomes; a success and a failure.
The probability of success is constant from trial to trial.
The experiment continues until a total of r successes have been observed, with the last trial a success.
We count the number of failures that occur before the rth success.
Properties of the negative binomial distribution
The properties of the negative binomial distribution are as follows:
The negative binomial distribution depends upon r and p; hence r and p are called the parameters of
the negative binomial distribution.
The negative binomial distribution is a discrete probability distribution. It is concerned with occurrences
that can take the integral values 0, 1, 2, ...
Any random variable which follows the negative binomial distribution is known as a negative binomial
variate.
We shall use the notation X ~ NBD(r, p) to denote that the random variable X follows the negative binomial
distribution with parameters r and p.
The P(X = x) defined in the NBD is a pmf because

∑p(x) = ∑ (−r)Cx p^r (−q)^x = p^r ∑ (−r)Cx (−q)^x = p^r (1 − q)^(−r) = p^r p^(−r) = 1

The mean and variance of the negative binomial distribution are rq/p and rq/p² respectively.
In the negative binomial distribution the number of successes is fixed and the number of trials is a random
variable.
The variance of the negative binomial distribution is greater than its mean, i.e.
rq/p² > rq/p ⟺ p < 1, which is true.
CHAPTER SIX
CONTINUOUS PROBABILITY DISTRIBUTION
Normal, standard normal, gamma, chi-square and t-distributions
Normal distribution
A continuous random variable X is said to have a normal distribution with parameters μ, called the mean,
and σ², called the variance, if its pdf is given by the probability law

f(x) = f(x; μ, σ) = (1 / (σ√(2π))) e^(−(x−μ)² / (2σ²)); −∞ < x < ∞; −∞ < μ < ∞, σ > 0
Characteristics of normal distribution and normal probability curves
The normal probability curve with mean μ & variance 𝜎2 is given by the equation f(x)
(X−μ)2
= 1 e− 2σ2 and has the following properties
σ√2π
The graph of probability f(x) of normal distribution is famous 'bell – shaped' curve. The top of the
bell is directly above the mean μ and tails of curve of a normal distribution curve extended indefin
itely on both
sides of X = μ and never touch the x- axis. For large value of σ, the curve tends to flatten out and
for small value of σ it has a sharp peak.
Mean, median and mode of the distribution are coinciding.
The two independent constants p and a in the distribution are known as the parameters of the
distribution.
Normal distribution is a continuous distribution as x can takes any value in the real line R i.e. -
∞<x<∞
Any random variable which follows normal distribution is known as normal variate.
We shall use the notation x - N (μ , 𝜎2) to denote that the random variable X follows normal
distribution with mean p and variance 𝜎2
The probability function f(x) defined in the distribution is pdf because
∞ ∞ 1 (X−μ)2
∫ 𝑓(𝑥) = i.e =∫ e− 2σ2 =1
−∞ −∞ σ√2π
The normal curve is symmetrical about the line X = μ , thus coefficient of skewness β1 = 0 and λ1=
0. the normal distribution is masokurtic thus β2 = = 3 and λ2= 0.
Since f(x), being a probability density, can never be negative, no portion of the curve lies below the x-axis.
Linear combination of independent normal variates is also a normal variate.
X-axis is an asymptote to the normal curve.
Mean deviation about mean = σ√(2/π) ≈ (4/5)σ (approx.)
Quartiles are Q1 = μ − 0.6745σ and Q3 = μ + 0.6745σ.
Quartile deviation Q.D. = (Q3 − Q1)/2 = 0.6745σ ≈ (2/3)σ (approx.)
Therefore QD : MD : SD = (2/3)σ : (4/5)σ : σ, which implies QD : MD : SD = 10 : 12 : 15.
[Fig: Normal probability curve, symmetric about X = μ]
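The quartile and mean-deviation results above can be checked numerically. The following is a minimal Python sketch (assuming scipy/numpy; μ = 50 and σ = 10 are illustrative values):

# Sketch: verify Q1, Q3 = mu -/+ 0.6745*sigma and the QD : MD : SD ratio
import numpy as np
from scipy.stats import norm

mu, sigma = 50, 10
q1, q3 = norm.ppf([0.25, 0.75], loc=mu, scale=sigma)
print(q1, q3)                              # ~ 43.25 and ~ 56.74
qd = (q3 - q1) / 2                         # quartile deviation, ~ (2/3) sigma
md = sigma * np.sqrt(2 / np.pi)            # mean deviation, ~ (4/5) sigma
print(qd / sigma, md / sigma)              # ~ 0.6745 and ~ 0.7979,
                                           # consistent with QD : MD : SD = 10 : 12 : 15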
IMPORTANCE OF NORMAL DISTRIBUTION
Normal distribution plays a very important role in statistical theory because of the following reasons.
1. For large values of n, calculation of probabilities for discrete probability distributions (e.g. binomial distribution, Poisson distribution, hypergeometric distribution etc.) becomes quite tedious and time consuming. In such cases, the normal approximation can be used with great ease and convenience.
2. For large values of n (i.e. n → ∞), almost all of the exact sampling distributions, e.g. the t-distribution, F-distribution and the chi-square distribution, conform to the normal distribution.
3. The whole theory of exact-sample (small-sample) tests, namely the t, F, χ² tests etc., is based on the assumption that the parent population from which the samples are drawn follows the normal distribution.
4. For large values of n (i.e. n → ∞), by the central limit theorem the sampling distribution of the sample mean tends to the normal distribution.
5. It is extensively used in large-sample theory to find estimates of parameters from statistics, confidence limits, etc.
6. Many of the distributions which are not normal can be made normal by simple transformation.
7. It finds many applications in statistical quality control and industrial experiments for setting control
limits.
Standard normal distribution
If X is a normal variate with parameters μ and σ², then the random variable Z defined by
Z = (X − μ)/σ
is a standard normal variate with mean zero (0) and variance unity (1), and the pdf of the standard normal distribution is given by
f(z) = (1/√(2π)) e^(−z²/2) , −∞ < z < ∞
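To illustrate standardization, the sketch below (Python with scipy; μ = 100, σ = 15 and x = 130 are illustrative values, not from the text) shows that probabilities for X ~ N(μ, σ²) can be read from the standard normal table via Z = (X − μ)/σ:

# Sketch: P(X <= x) equals P(Z <= z) after standardization
from scipy.stats import norm

mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma                        # standard normal variate, here 2.0
print(norm.cdf(x, loc=mu, scale=sigma))     # ~ 0.9772
print(norm.cdf(z))                          # same value, as read from a Z table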
The gamma distribution
A continuous random variable X is said to have a gamma distribution with single parameter λ if it assumes only non-negative values and its probability density function is defined as
f(x) = f(x; λ) = e^(−x) x^(λ−1) / Γ(λ) ; λ > 0, 0 ≤ x < ∞
               = 0 otherwise
Properties of gamma distribution
Following are the important properties of gamma distribution.
The gamma distribution depends upon the two constants α and λ, which are called the parameters of the gamma distribution.
Like the normal distribution, the gamma distribution is also a continuous probability distribution.
Any random variable which follows the gamma distribution is known as a gamma variate.
We shall use the notation X ~ G(α, λ) to denote that the random variable X follows the gamma distribution with parameters α and λ, and X ~ G(λ) for the single-parameter case.
The f(x; λ) defined in the gamma distribution is a pdf because
∫ f(x; λ) dx over [0, ∞) = 1 (verified numerically in the sketch after this list).
As λ → ∞, the gamma distribution tends to the normal distribution.
If X1, X2, …, Xn are independently distributed gamma variates with parameters (α, λi); i = 1, 2, …, n, then ∑Xi ~ G(α, ∑λi). This is the additive property of the gamma distribution.
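As a numerical check that f(x; λ) is a valid pdf, the following sketch integrates it over [0, ∞) (Python with scipy; λ = 3 is an illustrative shape value):

# Sketch: the single-parameter gamma density integrates to 1
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as Gamma

lam = 3.0                                   # illustrative parameter
f = lambda x: np.exp(-x) * x**(lam - 1) / Gamma(lam)
total, err = quad(f, 0, np.inf)
print(total)                                # ~ 1.0, confirming f is a pdf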
Chi square distribution
The square of a standard normal variate is known as a chi-square variate (pronounced 'kai-square') with 1 degree of freedom (d.f.). Thus if Z is a SNV with mean 0 and variance 1, then Z² is known as a chi-square variate.
In general, if X1, X2, …, Xn are n independent normal variates with means μi and variances σi² (i = 1, 2, …, n), then Z1, Z2, …, Zn are independent SNVs and
χ² = Z1² + Z2² + Z3² + … + Zn²
is a chi-square variate with n d.f., and its pdf is defined as
f(χ²) = (1 / (2^(n/2) Γ(n/2))) e^(−χ²/2) (χ²)^(n/2 − 1) ; 0 ≤ χ² < ∞
Properties of Chi-Square Distribution
A random variable X which follows chi-square distribution is also known as chi-square variate.
A chi-square variate X is a continuous random variable which assumes non-negative values only, i.e. 0 ≤ x < ∞.
We shall use the notation X ~ χ²(n) to denote that the random variable X follows the chi-square distribution with n degrees of freedom.
Mean and variance of the chi-square distribution are n and 2n respectively (see the simulation sketch at the end of this list).
Mode of the chi-square distribution curve lies at the point χ² = n − 2, for n > 2.
Measures of skewness and kurtosis are γ1 = √(8/n) and γ2 = 12/n respectively; therefore the chi-square curve is positively skewed and leptokurtic.
The chi-square distribution curve with 1 and 2 degrees of freedom is hyperbolic in shape, whereas for n > 2 the curve is asymmetrical (positively skewed).
If X1, X2, …, Xk are k independent chi-square variates with ni d.f., then ∑Xi is also a chi-square variate with ∑ni d.f. This is known as the additive property of chi-square.
If X and Y are two independently distributed chi-square variates with n1 and n2 d.f., then (X ± Y) is also a chi-square variate with (n1 ± n2) d.f.
If n is large, then χ²(n) → N(n, 2n) as n → ∞. This is known as the limiting property of the chi-square distribution.
The chi-square distribution is used to test whether a hypothetical value of the population variance is true
or not.
It is used as a test of goodness of fit.
It is used to test the independence of attributes.
It is used to test the homogeneity of several population variances.
It is used to test the equality of several population correlation coefficients
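The defining property (a chi-square variate with n d.f. is a sum of n squared SNVs, with mean n and variance 2n) can be illustrated by simulation. A minimal sketch (Python with numpy; n = 5 and the sample count are illustrative):

# Sketch: chi-square with n d.f. as a sum of n squared standard normal variates
import numpy as np

rng = np.random.default_rng(0)
n = 5
z = rng.standard_normal((100_000, n))       # 100000 sets of n independent SNVs
chi2 = (z**2).sum(axis=1)                   # simulated chi-square variates
print(chi2.mean(), chi2.var())              # ~ n and ~ 2n respectively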
Application of Chi-Square Distribution
The chi-square distribution has a large number of applications in statistics; some of the most important applications are as follows.
To test whether a hypothetical value of the population variance is true.
To test the 'goodness of fit'
To test the independence of attributes.
To test the homogeneity of independent estimate of the population variance.
To combine various probabilities obtained from independent experiments to give a single test of significance.
Student t-distribution
Let x1, x2, x3, …, xn be a random sample of size n drawn from a normal population with mean μ and variance σ². Then the student t-statistic is defined as
t = (x̄ − μ)/(S/√n)
or, equivalently, t = (x̄ − μ)/(s/√(n − 1)) where s² = ∑(x − x̄)²/n,
which follows the student t-distribution with (n − 1) degrees of freedom, where
x̄ = ∑x/n and S² = ∑(x − x̄)²/(n − 1)
Properties of t-distribution
The important properties of t-distribution are given as follow:
a) Like the Z-distribution, the t-distribution is a continuous distribution having a symmetrical and bell-shaped curve. The value of t ranges from −∞ to ∞, i.e. −∞ < t < ∞.
b) We use the notation t ~ tn to show that the statistic t follows student's t-distribution with n degrees of freedom.
c) As n→∞ the t-distribution tends to standard normal distribution.
d) Mean of the t-distribution is zero and its variance is equal to v/(v − 2); v > 2.
e) The t-distribution is symmetrical, i.e. β1 = 0 ⇒ γ1 = 0.
f) For the t-distribution, β2 = 3(v − 2)/(v − 4); for large degrees of freedom β2 → 3, hence for large degrees of freedom the t-distribution tends to the normal distribution.
g) The t-distribution curve is unimodal with mean = mode = median = 0.
h) The t-axis is the asymptote of the t-distribution.
i) The t-distribution is flatter than the normal distribution, and there is a different t-distribution for every possible sample size. For example, the t-distribution for a sample size of 15 is different from the t-distribution for a sample size of 2, and so on.
Assumption about t-Distribution
For the derivation of student's t-distribution the following basic assumptions are made:
a) The parent population from which the samples are drawn is normal with mean μ and variance σ².
b) All observations in the sample are independent, i.e. one item selected in the sample doesn't affect the other items included in the sample.
c) The sample size is small, i.e. less than 30 as a usual practice. Also, the sample should not contain less than 5 observations.
d) The hypothetical value μ0 of μ is a correct value of the population mean of the parent population from which the samples are drawn.
e) The sample values are correctly measured and recorded.
f) The value of the population standard deviation σ is unknown.
Application of t-Distribution
The student's t-distribution has a wide range of applications in statistics, some of which are enumerated below.
a) To test if the sample mean (x̄) differs significantly from the hypothetical value μ of the population mean, the population variance being unknown.
b) To test the significance of the difference between two independent means, the population variances being equal but unknown.
c) Paired t-test for the difference of two means.
d) Test of significance of an observed sample correlation coefficient.
e) Test of significance about the slope parameter β of the regression equation.
f) Test of significance about the regression coefficient βxy.
CHAPTER SEVEN
ESTIMATION
Parameter and statistics
a) Parameter: The statistical constants of the whole population are called parameters, such as population mean (μ), population variance (σ²), population proportion (P), population correlation coefficient (ρ), population regression coefficient (βyx) etc. Out of the various parameters, mean and variance are most widely used.
b) Statistics: According to Ronald A. Fisher, the statistical constants of the sample selected from the population are called statistics; in general, statistics are functions of observable random variables and do not involve any unknown parameters.
Sample statistic                 Population parameter
Sample mean (x̄)                  μ
Standard deviation (s)            σ
Proportion (p)                    P
Correlation coefficient (r)       ρ
μ = x̄ ± S.E.(x̄) = x̄ ± σ/√n ≈ x̄ ± s/√n
where x̄ = sample mean, S.E. = standard error, and s is the sample estimate of σ.
Standard error and sampling Distribution
The standard deviation of the sampling distribution of a sample statistic is known as the standard error (S.E.) of the statistic.
Statistic          Standard error
1. x̄               S.E.(x̄) = σ/√n (or s/√n when σ is unknown)
2. x̄1 − x̄2         S.E.(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2)
3. p                S.E.(p) = √(pq/n)
4. p1 − p2          S.E.(p1 − p2) = √(p1q1/n1 + p2q2/n2)
The corresponding margins of error for confidence limits are obtained by multiplying each standard error by Z_{α/2}.
Estimation: It is the process of estimating a population parameter on the basis of a sample statistic with the desired degree of precision. The main objective of estimation is to obtain a guess or estimate of the unknown parameter.
Point estimation: It is the process of estimating a population parameter in the form of a single value through a sample statistic. For example, a sample mean (x̄) and a sample variance s² are point estimates of the population mean μ and population variance σ². We write the point estimator of μ as μ̂ = x̄ and that of σ² as σ̂² = s².
Interval estimation: It is the process of estimating a population parameter in the form of a certain interval through a sample statistic. There are two ways of indicating the error: by the extent of its range, and by the probability of the true population parameter lying within that range. Example:
μ = x̄ ± σ/√n
P = p ± √(pq/n)
Sometimes the interval of estimation is called the confidence interval.
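A minimal sketch of an interval estimate for μ at the 95% confidence level (Python with numpy/scipy; the data values are hypothetical; the sample s is substituted for σ, and the Z_{α/2} multiplier sets the confidence level):

# Sketch: 95% confidence interval for the population mean
import numpy as np
from scipy.stats import norm

data = np.array([48, 52, 51, 49, 50, 53, 47, 50, 52, 49])   # hypothetical sample
xbar, s, n = data.mean(), data.std(ddof=1), len(data)
z = norm.ppf(0.975)                         # Z_{alpha/2} for 95% confidence
half = z * s / np.sqrt(n)
print(xbar - half, xbar + half)             # interval estimate of mu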
Properties of good estimator
A good estimator is one which is as close to the true value of the parameter as possible. The following are some of the criteria that should be satisfied by a good estimator.
a. Unbiasedness: An estimator or statistic tn which is a function of the sample observations x1, x2, …, xn is said to be an unbiased estimator of the corresponding population parameter θ if E(tn) = θ.
b. Consistency: A statistic tn is said to be a consistent estimator of the population parameter θ if, as the sample size increases, the value of the sample statistic becomes very close to the value of the population parameter.
c. Sufficiency: A statistic tn based on a sample of n observations is said to be sufficient for a parameter θ if it contains all the information in the sample regarding the parameter.
d. Efficiency: An unbiased estimator tn, based on a sample of n observations, is said to be a more efficient estimator than any other unbiased estimator tn′ of the population parameter θ if and only if Var(tn) < Var(tn′). Also, the relative efficiency of tn as compared to tn′ is given as RE = Var(tn′)/Var(tn).
CHAPTER EIGHT
STATISTICAL INFERENCE
Hypothesis: A quantitative statement about a population parameter is called a hypothesis. A statistical hypothesis is a statement about the distribution of one or more variables.
Types of Hypothesis
A. Null hypothesis: It is a statement about the population parameter which is usually a hypothesis of no difference. It is denoted by H0. The null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is true.
B. Alternative hypothesis: It is complementary to the null hypothesis. In other words, the alternative hypothesis is a statement about the population parameter or parameters which provides an alternative to the null hypothesis within the range of pertinent values of the parameter. It is denoted by H1. It may be one-tailed or two-tailed.
Steps of testing of hypothesis/some comments on selecting a test procedure
1. Set up the null hypothesis (H0), e.g. there is no significant difference between the sample statistic and the population parameter.
2. Set up the alternative hypothesis (H1), e.g. there is a significant difference between the sample statistic and the population parameter. It may be one-tailed or two-tailed.
3. Choose the level of significance (α): generally it is taken as 5%.
4. Test statistic: Choose the appropriate test statistic under the null hypothesis H0, such as the t-test, z-test, F-test etc.
Test statistic = (Sample statistic − Population parameter) / (Standard error of the sample statistic)
5. Critical value: It is obtained from the respective table under the pre-determined value of α.
6. Conclusion:
a. If the calculated value < tabulated value, H0 is accepted.
b. If the calculated value > tabulated value, H0 is rejected and H1 is accepted.
Error in testing of hypothesis
While testing a statistical hypothesis, in practice there are four possibilities when making the decision either to accept or reject. They are:
a. Accepting the null hypothesis when the null hypothesis is true.
b. Accepting the null hypothesis when the null hypothesis is false.
c. Rejecting the null hypothesis when the null hypothesis is false.
d. Rejecting the null hypothesis when the null hypothesis is true.
Here we can commit two types of errors. They are:
A. Type I error: The error committed in rejecting the null hypothesis when it is true is called a type I error. It is also known as producer's risk; a type I error amounts to rejecting a lot when it is good.
B. Type II error: The error committed in accepting the null hypothesis when it is false is called a type II error. It is also known as consumer's risk; a type II error amounts to accepting a lot when it is bad.
Level of significance: The level of significance may be defined as the probability of type I error which we are ready to tolerate in making a decision about H0,
i.e. P (reject Ho when Ho is true)= α
It is our endeavour to carry out a test which minimizes both types of error; unfortunately, for a given set of observations, both errors cannot be controlled simultaneously. Hence it is general practice to assign a bound to the type I error and to minimize the type II error. Thus one chooses a value of α lying between 0 and 1, which is known as the level of significance.
Degrees of freedom: The number of independent observations in a set is called the degrees of freedom. In other words, it may be defined as the number of observations in a set minus the number of restrictions imposed on it. It is denoted by v (nu) or d.f.
One tailed or two tailed test:
If an alternative hypothesis is such that it leads to one-sided alternatives to the null hypothesis, the test is said to be a one-tailed test. For example, testing H0: μ = 20 versus H1: μ > 20 or H1: μ < 20, the critical region or area A lies only on one tail: the area A lies on the right tail when H1: μ > 20 and on the left tail when H1: μ < 20.
If an alternative hypothesis is such that it leads to two-sided alternatives to the null hypothesis, the test is said to be a two-tailed test. For example, testing H0: μ = 20 versus H1: μ ≠ 20 leads to a two-sided test, as μ can be greater than 20 or less than 20. In this situation, half of the area of the critical region lies on the left tail and half on the right: if A is the area of the critical region, A/2 is the area on each of the two tails.
Large sample test (z-test)
The large sample test is generally used when the sample size is greater than or equal to 30, i.e. n ≥ 30.
Assumptions
a. The samples drawn from the population are normally distributed.
b. The population standard deviation is known.
c. Samples are independent.
Test of significance (For mean)
A. For single mean
a. Set up the null hypothesis H0: μ = μ0, i.e. there is no significant difference between the sample mean and the population mean.
b. Set up the alternative hypothesis H1: μ ≠ μ0.
c. Choose the level of significance (α): determine the level of significance at which the hypothesis is to be tested.
d. Test statistic: Under the null hypothesis H0: μ = μ0 the statistic is given by
Test statistic (Z) = (x̄ − μ)/S.E.(x̄) = (x̄ − μ)/(σ/√n) = (x̄ − μ)/(s/√n), where μ = given mean and σ is the given S.D.
Confidence limits: μ = x̄ ± Z_{α/2} σ/√n = x̄ ± Z_{α/2} s/√n
e. Critical value: Find the critical value of Z as Zα from the table.
f. Conclusion:
i. If the calculated value of |Z| < Zα, H0 is accepted.
ii. If the calculated value of |Z| > Zα, H0 is rejected and H1 is accepted.
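A minimal worked sketch of the single-mean z-test (Python with numpy/scipy; the hypothesized mean, known σ, and sample figures are hypothetical):

# Sketch: one-sample z-test for a single mean, two-tailed at alpha = 0.05
import numpy as np
from scipy.stats import norm

mu0, sigma = 40000, 5000                    # hypothesized mean, known population S.D.
xbar, n = 41200, 100                        # hypothetical sample mean and size
z = (xbar - mu0) / (sigma / np.sqrt(n))     # test statistic, here 2.4
z_crit = norm.ppf(0.975)                    # Z_{alpha/2} ~ 1.96
print(abs(z) > z_crit)                      # True -> H0 is rejected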
B. Difference of two means
The process is the same as for a single mean except for the value of Z:
Z = (x̄1 − x̄2)/S.E.(x̄1 − x̄2) = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2) = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2)
Test of significance for a population proportion
A. Single proportion
a. Set up the null hypothesis H0: P = P0.
b. Set up the alternative hypothesis (H1):
H1: P ≠ P0 (for a two-tailed test)
H1: P < P0 (for a one-tailed, left-tail test)
H1: P > P0 (for a one-tailed, right-tail test)
c. Choose the level of significance (α): determine the level of significance at which the hypothesis is to be tested.
d. Test statistic: Under the null hypothesis H0: P = P0 the statistic is given by
Test statistic (Z) = (p − P)/S.E.(p) = (p − P)/√(PQ/n)
where P = claimed value of the population proportion, p = sample proportion, Q = 1 − P, and n = sample size.
P = p ± Z_{α/2} S.E.(p) = p ± Z_{α/2} √(pq/n) is the confidence interval for the population proportion, where q = 1 − p.
e. Critical value: Find the critical value of Z as Zα from the table.
f. Conclusion:
i. If the calculated value of |Z| < Zα, H0 is accepted.
ii. If the calculated value of |Z| > Zα, H0 is rejected and H1 is accepted.
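A minimal sketch of the single-proportion z-test (Python with numpy/scipy; the claimed proportion and sample counts are hypothetical):

# Sketch: one-sample z-test for a proportion, two-tailed at alpha = 0.05
import numpy as np
from scipy.stats import norm

P0, n, x = 0.5, 200, 116                    # claimed P, sample size, observed successes
p, Q0 = x / n, 1 - 0.5
z = (p - P0) / np.sqrt(P0 * Q0 / n)         # test statistic, ~ 2.26
print(abs(z) > norm.ppf(0.975))             # True -> H0 is rejected at 5%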
B. Two proportions
If p1 and n1 are the proportion and sample size drawn from the 1st population, and p2 and n2 are the proportion and sample size drawn from the 2nd population, then the test statistic is given by:
Z = (p1 − p2)/S.E.(p1 − p2) = (p1 − p2)/√(p1q1/n1 + p2q2/n2)
Small sample test (t-test)
Assumptions for the t-test
a. The sample size is less than 30.
b. The samples drawn from the population are normally distributed.
c. The population variance or S.D. is not known.
d. Samples are independent and one sample does not affect another sample.
e. The sample values are correctly measured and recorded.
t-test for significance of a single mean
a. Set up the null hypothesis H0: μ = μ0.
b. Set up the alternative hypothesis (H1):
H1: μ ≠ μ0 (for a two-tailed test)
H1: μ < μ0 (for a one-tailed, left-tail test)
H1: μ > μ0 (for a one-tailed, right-tail test)
c. Choose the level of significance (α): determine the level of significance at which the hypothesis is to be tested.
d. Test statistic: Under the null hypothesis H0: μ = μ0 the statistic is given by
Test statistic (t) = (x̄ − μ0)/(S/√n)
where x̄ = ∑x/n is the sample mean, μ0 = claimed value of the population mean, and
S² = ∑(x − x̄)²/(n − 1) = (1/(n − 1))[∑x² − (∑x)²/n], or s² = ∑(x − x̄)²/n = ∑x²/n − (∑x/n)².
e. Degrees of freedom = n − 1
f. Critical value: Find the critical value of t as tα from the table.
g. Conclusion:
i. If the calculated value of |t| < tα, H0 is accepted.
ii. If the calculated value of |t| > tα, H0 is rejected and H1 is accepted.
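The same test is available ready-made in scipy, which computes t = (x̄ − μ0)/(S/√n) exactly as above. A minimal sketch with hypothetical data:

# Sketch: one-sample t-test via scipy (uses S with n - 1 in the denominator)
import numpy as np
from scipy.stats import ttest_1samp

data = np.array([12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9])  # hypothetical
t, pval = ttest_1samp(data, popmean=12.0)
print(t, pval)                              # compare |t| with t_alpha at 7 d.f.,
                                            # or reject H0 when pval < alpha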
t-test for significance of the difference of two means
a. Set up the null hypothesis H0: μ1 = μ2.
b. Set up the alternative hypothesis (H1):
H1: μ1 ≠ μ2 (for a two-tailed test)
H1: μ1 < μ2 (for a one-tailed, left-tail test)
H1: μ1 > μ2 (for a one-tailed, right-tail test)
c. Choose the level of significance (α): determine the level of significance at which the hypothesis is to be tested.
d. Test statistic: Under the null hypothesis H0: μ1 = μ2 the statistic is given by
Test statistic (t) = (x̄1 − x̄2)/S.E.(x̄1 − x̄2) = (x̄1 − x̄2)/√(sp²(1/n1 + 1/n2))
where sp² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2)
e. Degrees of freedom = n1 + n2 − 2
f. Critical value: Find the critical value of t as tα from the table.
g. Conclusion:
If the calculated value of |t| < tα, H0 is accepted.
If the calculated value of |t| > tα, H0 is rejected and H1 is accepted.
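scipy's two-sample test with equal_var=True implements exactly this pooled-variance statistic with n1 + n2 − 2 d.f. A minimal sketch with hypothetical samples:

# Sketch: pooled two-sample t-test
import numpy as np
from scipy.stats import ttest_ind

a = np.array([65, 70, 68, 72, 66, 69])      # hypothetical sample 1
b = np.array([60, 64, 62, 67, 63])          # hypothetical sample 2
t, pval = ttest_ind(a, b, equal_var=True)   # pooled variance, n1 + n2 - 2 d.f.
print(t, pval)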
Paired t-test
It is concerned with the differences between pairs of related observations instead of the values of the individual observations. The paired t-test for the difference of means can be applied under the following situations:
The sample sizes are equal, i.e. n1 = n2 = n (say).
The two samples are not independent, but the sample observations are paired together, i.e. the paired observations (xi, yi); i = 1, 2, …, n correspond to the same sample unit.
The same sample is treated twice, i.e. observations are taken on the same subjects before and after a treatment.
The steps of the paired t-test for difference of means (t-test for pairwise dependent samples) are as follows:
a. Set up the null hypothesis H0: μ1 = μ2; there is no significant difference in the observations before and after the treatment.
b. Set up the alternative hypothesis (H1): there is a significant difference in the observations before and after the treatment.
H1: μ1 ≠ μ2 (for a two-tailed test)
H1: μ1 < μ2 (for a one-tailed, left-tail test)
H1: μ1 > μ2 (for a one-tailed, right-tail test)
c. Choose the level of significance (α): determine the level of significance at which the hypothesis is to be tested.
d. Test statistic: Under the null hypothesis H0: μ1 = μ2 the statistic is given by
Test statistic (t) = d̄/(sd/√n), which follows the student t-distribution with (n − 1) degrees of freedom
where d̄ = ∑d/n ; d = x − y
sd² = ∑d²/(n − 1) − (∑d)²/(n(n − 1))
e. Critical value: For α level of significance and (n − 1) degrees of freedom, find the critical value of t as tα from the table.
f. Conclusion:
If the calculated value of |t| < tα, H0 is accepted.
If the calculated value of |t| > tα, H0 is rejected and H1 is accepted.
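A minimal sketch of the paired t-test (Python with scipy; the before/after values are hypothetical). ttest_rel works on the differences d = x − y, exactly as in the formula above:

# Sketch: paired t-test on n = 6 hypothetical before/after observations
import numpy as np
from scipy.stats import ttest_rel

before = np.array([82, 75, 90, 68, 77, 85])
after  = np.array([86, 78, 91, 74, 80, 89])
t, pval = ttest_rel(before, after)          # t = d_bar / (s_d / sqrt(n)), 5 d.f.
print(t, pval)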
CHAPTER NINE
THE ANALYSIS OF CATEGORICAL DATA
Chi-square test: The chi-square test (χ²) is a non-parametric test. It depends only on the set of observed and expected frequencies and the degrees of freedom. Since the chi-square test doesn't make any assumption about population parameters, it is also called a distribution-free test. The chi-square test is a test which describes the magnitude of the difference between observed frequencies and theoretical or expected frequencies under certain assumptions.
χ² = ∑ (O − E)²/E, where E = expected frequency and O = observed frequency
Application of chi-square distribution: The chi-square distribution has a large number of applications in statistics, some of the most important of which are as follows.
A. To test whether a hypothetical value of the population variance is true.
B. To test the 'goodness of fit'.
C. To test the independence of attributes.
D. To test the homogeneity of independent estimates of the population variance.
E. To combine various probabilities obtained from independent experiments to give a single test of significance.
Test procedure for a population variance
a. Set up the null hypothesis H0: σ² = σ0², i.e. the sample is drawn from a normal population with specified variance σ0².
b. Set up the alternative hypothesis H1: σ² ≠ σ0², i.e. the sample is not drawn from a normal population with specified variance σ0².
c. Choose the level of significance (α): determine the level of significance at which the hypothesis is to be tested.
d. Test statistic: Under the null hypothesis H0: σ² = σ0² the statistic is given by
Test statistic χ² = ∑(x − x̄)²/σ0² = ns²/σ0², which follows the chi-square distribution with (n − 1) degrees of freedom
where s² = (1/n) ∑(x − x̄)²
e. Critical value: For α level of significance and (n − 1) degrees of freedom, find the critical value of χ² as χ²α from the table.
f. Conclusion:
If the calculated value of χ² < χ²α, H0 is accepted.
If the calculated value of χ² > χ²α, H0 is rejected and H1 is accepted.
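A minimal sketch of this variance test (Python with numpy/scipy; the data and the hypothesized variance σ0² are hypothetical):

# Sketch: chi-square test for a population variance
import numpy as np
from scipy.stats import chi2

data = np.array([10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 10.0, 9.9])  # hypothetical
sigma0_sq = 0.04                            # hypothesized population variance
stat = ((data - data.mean())**2).sum() / sigma0_sq   # ~ 16.0
crit = chi2.ppf(0.95, df=len(data) - 1)     # upper 5% point at 7 d.f., ~ 14.07
print(stat > crit)                          # True -> H0 is rejected (upper tail)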
Chi-square test of goodness of fit
The following are the steps for the test of goodness of fit.
a. Set up the null hypothesis (H0): the fitting of the given distribution is good.
b. Set up the alternative hypothesis (H1): the fitting of the given distribution is not good.
c. Choose the level of significance (α): determine the level of significance at which the hypothesis is to be tested.
d. Test statistic: Under the null hypothesis H0 the statistic is given by
Test statistic χ² = ∑(O − E)²/E, which follows the chi-square distribution with (n − 1) degrees of freedom, where n is the number of classes.
e. Critical value: For α level of significance and (n − 1) degrees of freedom, find the critical value of χ² as χ²α from the table.
f. Conclusion:
If the calculated value of χ² < χ²α, H0 is accepted.
If the calculated value of χ² > χ²α, H0 is rejected and H1 is accepted.
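A minimal sketch of a goodness-of-fit test (Python with scipy; the frequencies are hypothetical, e.g. 120 rolls of a die tested against a uniform fit):

# Sketch: chi-square goodness of fit against equal expected frequencies
from scipy.stats import chisquare

observed = [18, 22, 20, 25, 15, 20]         # hypothetical observed frequencies
expected = [20, 20, 20, 20, 20, 20]         # uniform fit: 120 / 6 per class
stat, pval = chisquare(f_obs=observed, f_exp=expected)
print(stat, pval)                           # stat ~ 2.9, large p -> fit is good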
Two-way contingency table and test of independence of attributes: A contingency table is a two-way table in which the columns are classified according to one criterion or attribute and the rows are classified according to the other criterion or attribute. Each cell contains the number of items Oij possessing the quality of the i-th row and the j-th column, where i = 1, 2, …, n and j = 1, 2, …, m. In such a case the contingency table is said to be of order (n × m). A contingency table helps to test the independence of two attributes.
Test of independence of attributes: In order to test whether attributes or characteristics A and B are independent, i.e. there is no association or relationship between the two characteristics or attributes A and B, we use the following steps for the test of independence of attributes. For the test of homogeneity of two or more populations we use the same process with a slightly different hypothesis.
a) Set up the null hypothesis H0: the two attributes (two categorical variables) A and B are independent, i.e. there is no relationship or association between them.
b) Set up the alternative hypothesis H1: the two attributes (two categorical variables) A and B are dependent, i.e. there is some relationship or association between them.
c) Fix the level of significance α
d) Test statistic: Under the null hypothesis H0, the test statistic is
χ² = ∑(O − E)²/E, which follows the chi-square distribution with (r − 1)(c − 1) degrees of freedom, where
O = observed frequency, E = expected frequency,
r = number of rows, c = number of columns.
The expected frequency in a cell is the product of its row total and column total divided by the total frequency, i.e. E = (RT × CT)/N, where RT = row total, CT = column total and N is the grand total.
e) Critical value: For α level of significance and (r − 1)(c − 1) degrees of freedom, find the critical value of χ² from the table as χ²α.
f) Conclusion: If the calculated value of χ² is less than the tabulated value, we accept the null hypothesis; otherwise we reject it.
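A minimal sketch of the independence test (Python with scipy; the 2×2 table is hypothetical). correction=False is passed so the statistic matches the plain ∑(O − E)²/E formula above:

# Sketch: chi-square test of independence of two attributes
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 20],                 # hypothetical contingency table
                  [15, 35]])
stat, pval, dof, expected = chi2_contingency(table, correction=False)
print(stat, pval, dof)                      # dof = (r - 1)(c - 1) = 1
print(expected)                             # each E = RT x CT / N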