STATISTICS

KNOWLEDGE ARENA FOR UGC NTA NET/JRF
KNOWLEDGE ARENA
(BEST COACHING FOR UGC NTA NET/JRF)
Statistics Notes
Points to be remembered
 F distribution was named by George W. Snedecor in honour of Sir Ronald A. Fisher.
 Chi-square (a non-parametric test) concept given by Karl Pearson.
 Concept of normal distribution is given by De Moivre; others associated with it are Laplace, Gauss and W. J. Youden.
 Concept of regression is given by Sir Francis Galton in 1877.
 Concept of t-distribution is given by W. S. Gosset (who published under the name “Student”).
 Data which are collected for the first time are primary data.
 Statistics deals only with quantitative (numerically expressed) data.
 Secondary data are second-hand data, available in published or unpublished form.
 Sources of secondary data are also called paper sources.
 Median and mode are positional averages.
 Arithmetic mean, geometric mean, harmonic mean and weighted mean are mathematical averages.
 Σ is known as capital sigma (the summation sign).
 Properties of the arithmetic mean:
1) The sum of the deviations of all observations from their arithmetic mean is zero: Σ(x − x̄) = 0.
2) Combined mean = (n1x̄1 + n2x̄2)/(n1 + n2).
3) The sum of the squares of the deviations of a set of observations is minimum when the deviations are taken from the arithmetic mean.
4) The mean is affected by both change of scale and change of origin.
5) AM ≥ GM ≥ HM for positive observations, with equality only when all observations are equal (AM: arithmetic mean, GM: geometric mean, HM: harmonic mean).
6) Mode = 3 Median − 2 Mean (an empirical relation for moderately skewed distributions).
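Properties 1, 2, 3 and 5 can be checked numerically. The sketch below uses a small hypothetical data set and Python's standard `statistics` module (`fmean`, `geometric_mean` and `harmonic_mean` need Python 3.8+); the two group sizes and means in the combined-mean line are made up for illustration.

```python
import statistics

data = [2, 4, 4, 5, 10]          # hypothetical positive observations
mean = statistics.fmean(data)    # arithmetic mean = 5.0

# Property 1: deviations from the arithmetic mean add up to zero
dev_sum = sum(x - mean for x in data)

# Property 2: combined mean of two groups
n1, x1 = 10, 50.0                # hypothetical group 1: size and mean
n2, x2 = 30, 70.0                # hypothetical group 2: size and mean
combined = (n1 * x1 + n2 * x2) / (n1 + n2)

# Property 3: sum of squared deviations about a chosen point
def ss(about):
    return sum((x - about) ** 2 for x in data)

# Property 5: AM >= GM >= HM for positive values
am = mean
gm = statistics.geometric_mean(data)
hm = statistics.harmonic_mean(data)

print(dev_sum, combined)                               # 0.0 65.0
print(ss(mean) < ss(mean + 0.5), ss(mean) < ss(mean - 0.5))  # True True
print(round(am, 3), round(gm, 3), round(hm, 3))
```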
 One-dimensional (1D) diagrams are those in which only length matters. Examples are line diagrams, multiple bar
diagrams, compound or cluster bar diagrams, sub-divided bar diagrams (also called component bar diagrams),
percentage bar diagrams and deviation bar diagrams.
 Two-dimensional (2D) diagrams are those in which both length and breadth matter. Examples are histograms, area
diagrams, rectangles, squares, circles and pie diagrams.
 Ogive curves and frequency polygons are also treated as 2D diagrams.
 An ogive represents cumulative frequencies, while a histogram and a frequency polygon represent a frequency distribution.
 The larger the sample, the more accurate the research results tend to be.
 Increasing the sample size decreases the sampling error.
www.knowledgearena.in Page 1
 Sample size N = 100

 Principles of sampling-
1) Law of statistical regularity.
2) Principle of inertia of large numbers.
 A Type I error is rejecting a true null hypothesis. Its probability, α (the level of significance), is also known as the
producer’s risk.
 A Type II error is accepting a false null hypothesis. Its probability, β, is also known as the consumer’s risk; 1 − β is the
power of the test, and 1 − β plotted against the parameter values gives the power function or power curve.
 Standard error (SE) is the standard deviation of the sampling distribution of the sample mean: S.E. = σ/√n.
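A minimal sketch of the S.E. = σ/√n formula, assuming a hypothetical population standard deviation of 12. It also shows why a larger sample reduces sampling error: quadrupling n halves the standard error.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    # Each fourfold increase in n halves the standard error
    print(n, standard_error(sigma, n))
```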
 The normal curve is bell-shaped and symmetrical, so mean = median = mode.
 The total area under the normal probability curve is 1 (0.5 on each side of the mean).
 The coefficient of kurtosis (β2) of the normal curve is 3, i.e. it is mesokurtic (and, being symmetrical, its skewness is 0).
 The range of the distribution is −∞ to +∞, but in practice it is taken as 6σ (μ ± 3σ).
 The points of inflexion are at x = μ ± σ.
 Leptokurtic: β2 > 3; platykurtic: β2 < 3; mesokurtic: β2 = 3.
 Area within μ ± 1σ = 68.27%, μ ± 2σ = 95.45%, μ ± 3σ = 99.73%.
 Z (standard normal variate) = (X − μ)/σ.
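The z formula and the three areas listed above can be reproduced with `statistics.NormalDist` (Python 3.8+). The score of 130 on a scale with mean 100 and σ = 15 is a hypothetical example.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standard normal variate: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(z_score(130, 100, 15))  # 2.0
for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))  # 68.27, 95.45, 99.73 (percent)
```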
 Concept of BINOMIAL DISTRIBUTION is given by JAMES BERNOULLI.
 Concept of POISSON DISTRIBUTION is given by SIMEON POISSON. In this distribution the mean and variance are equal
(both equal λ).
 Standard deviation is also known as root Mean Square Deviation.
 Standard deviation is affected by change of scale & independent of change of origin.
 Positively skewed= mean>median>mode.
 Negatively skewed= mode>median>mean.
 Symmetrical distribution: mean = median = mode.
 Skewness means lack of symmetry or asymmetrical distribution.
 Concept of co efficient of Skewness given by Karl Pearson.
 Critical z-values for confidence intervals: 95% = 1.96, 99% = 2.58 (more precisely 2.576).
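The critical values come straight from the standard normal distribution. This sketch assumes Python 3.8+ (for `statistics.NormalDist`) and a hypothetical sample with mean 50, known σ = 10 and n = 100, and builds the z-based interval x̄ ± z·σ/√n.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value z with P(-z < Z < z) = confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """z-based interval for the mean when sigma is known."""
    se = sigma / n ** 0.5
    z = critical_z(confidence)
    return sample_mean - z * se, sample_mean + z * se

print(round(critical_z(0.95), 3), round(critical_z(0.99), 3))  # 1.96 2.576
print(confidence_interval(50, 10, 100))  # roughly (48.04, 51.96)
```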
 Chi-square is a non-parametric test.
 The chi-square statistic lies between 0 and ∞.
 Conditions to apply the chi-square test:
1) The population mean and sample mean are not given in the question.
2) Degrees of freedom (df = n − 1) apply, as they do from the t test onwards.
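 The chi-square test is used as:
1) A test of goodness of fit.
2) A test of independence in a contingency table.
3) A test for qualitative (categorical) variables, i.e. attributes.
4) A basis for measuring the coefficient of association.
For a contingency table, degrees of freedom = (rows − 1)(columns − 1).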
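A minimal sketch of the chi-square test of independence for a contingency table, assuming a hypothetical 2 × 2 table of counts. Each expected count is (row total × column total)/grand total, and the statistic is Σ(O − E)²/E. No continuity (Yates) correction is applied here.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1)(columns - 1)
    return stat, df

observed = [[20, 30], [30, 20]]  # hypothetical 2 x 2 table of counts
print(chi_square(observed))      # (4.0, 1)
```

The resulting statistic is then compared with the tabulated chi-square value for the computed degrees of freedom.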
 In the F test, the larger variance is always placed in the numerator, so F ≥ 1.

 The correlation coefficient (product-moment correlation) was developed by Karl Pearson.
 Correlation is denoted by r, and its value lies between −1 and +1.
 Rank correlation was given by Charles Spearman.
 Spearman denoted rank correlation by ρ (rho):
ρ = 1 − 6Σd² / [n(n² − 1)]
With tied ranks the formula becomes ρ = 1 − 6[Σd² + Σ m(m² − 1)/12] / [n(n² − 1)], where m is the number of times a rank is repeated.
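A minimal sketch of the untied formula ρ = 1 − 6Σd²/[n(n² − 1)]. The two judges' ranks are hypothetical, and the function assumes there are no tied ranks (it does not apply the m(m² − 1)/12 correction).

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

ranks_judge_a = [1, 2, 3, 4, 5]  # hypothetical ranks
ranks_judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(ranks_judge_a, ranks_judge_b))  # 0.8
```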
 Karl Pearson’s correlation formula: r = cov(x, y)/(σx σy).
 Correlation is independent of both change of scale & origin.
 Regression is affected by change of scale & independent of change of origin.
 r² is the coefficient of determination.
 Its value lies between 0 and 1.
 r² = bxy × byx.
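The sketch below checks r = cov(x, y)/(σx σy) and r² = bxy × byx on hypothetical data. It uses divide-by-n covariances (the factor cancels in r). It also illustrates the two invariance points above: changing origin and scale (u = 2x + 3) leaves r unchanged but alters the regression coefficient.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def sd(v):
    return math.sqrt(cov(v, v))

def pearson_r(x, y):
    return cov(x, y) / (sd(x) * sd(y))

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
b_yx = cov(x, y) / cov(x, x)   # regression coefficient of y on x
b_xy = cov(x, y) / cov(y, y)   # regression coefficient of x on y
print(round(r, 4), round(r ** 2, 4), round(b_yx * b_xy, 4))  # 0.7746 0.6 0.6

# Change of origin and scale: r is unchanged, the regression coefficient is not
u = [2 * a + 3 for a in x]
print(round(pearson_r(u, y), 4))              # 0.7746
print(round(cov(u, y) / cov(u, u), 4))        # 0.3 (was 0.6)
```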
 Regression shows a causal, i.e. cause-and-effect, relationship.
 Parametric tests: Z test, t test, F test.
 Non-parametric (distribution-free) tests: sign test, median test, Mann-Whitney U test, run test, Kolmogorov-Smirnov
(K-S) test, chi-square test.
The most common basic statistics terms you’ll come across are the mean, mode and median. These are all what
are known as “Measures of Central Tendency.” Also important in this early chapter of statistics is the shape of a
distribution. This tells us something about how data is spread out around the mean or median. Perhaps the most
common distribution you’ll see is the normal distribution, sometimes called a bell curve. Heights, weights, and
many other things found in nature tend to follow this bell shape.
Overview
Stuck on how to find the mean, median, & mode in statistics?
1. The mean is the average of a data set.

2. The mode is the most common number in a data set.
3. The median is the middle of the set of numbers.
Of the three, the mean is the only one that requires a formula. I like to think of it in the other dictionary sense of
the word (as in, it’s mean as opposed to nice!). That’s because, compared to the other two, it’s not as easy to
work with.
Hints to remember the difference

Having trouble remembering the difference between the mean, median and mode? Here are a couple of hints that
can help.
 “À la mode” is a French phrase that means fashionable, and it also refers to a popular way of serving ice
cream. So “Mode” is the most popular or fashionable member of a set of numbers. The word MOde is also
like MOst.
 The “Mean” requires you do arithmetic (adding all the numbers and dividing) so that’s the “mean” one.
 “Median” has the same number of letters as “Middle”.
The Mean
Mean vs Median
Both are measures of where the center of a data set lies, but they are usually different numbers. For example, take this list
of numbers: 10,10,20,40,70.
 The mean (average) is found by adding all of the numbers together and dividing by the number of items in the set:
(10 + 10 + 20 + 40 + 70) / 5 = 150 / 5 = 30.
 The median is found by ordering the set from lowest to highest and finding the exact middle. The median is just the
middle number: 20.
Sometimes the two will be the same number. For example, the data set 1, 2, 4, 6, 7 has an average of
(1 + 2 + 4 + 6 + 7) / 5 = 20 / 5 = 4 and a median (a middle) of 4.
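Both examples can be checked with Python’s standard `statistics` module. The sketch simply reuses the two data sets from the text. In the first set, the large value 70 pulls the mean (30) above the median (20).

```python
import statistics

for data in ([10, 10, 20, 40, 70], [1, 2, 4, 6, 7]):
    # mean: sum divided by count; median: middle value of the sorted data
    print(data, statistics.mean(data), statistics.median(data))
```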
Mean vs Average: What’s the Difference?

When you first started out in mathematics, you were probably taught that an average was a “middling” amount for a set of
numbers. You added up the numbers, divided by the number of items, and voilà! you got the average. For
example, the average of 10, 6 and 20 is:
(10 + 6 + 20) / 3 = 36 / 3 = 12.
Then you started studying statistics and all of a sudden the “average” is now called the mean. What happened? The answer
is that the two words mean exactly the same thing (they are synonyms).
That said, technically, the word mean is short for the arithmetic mean. We use different words in stats, because there are
multiple different types of means, and they all do different things.
Specific “Means” commonly used in Stats

You’ll probably come across these in your stats class. They have very narrow meanings:
 Mean of the sampling distribution: used with probability distributions, especially with the Central Limit Theorem.
It’s the average of a statistic (such as the sample mean) over all possible samples.
 Sample mean: the average value in a sample.
 Population mean: the average value in a population.
Other Types
There are other types of means, and you’ll use them in various branches of math. Most have very narrow applications to
fields like finance or physics; if you’re in elementary statistics you probably won’t work with them.
These are some of the most common types you’ll come across.
1. Weighted mean.
2. Harmonic mean.
3. Geometric mean.
4. Arithmetic-Geometric mean.
5. Root-Mean Square mean.
6. Heronian mean.
1. Weighted Mean
These are fairly common in statistics, especially when studying populations. Instead of each data point contributing
equally to the final average, some data points contribute more than others. If all the weights are equal, then this will
equal the arithmetic mean. There are certain circumstances when this can give incorrect information, as shown
by Simpson’s Paradox.
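A minimal sketch of a weighted mean, Σwx / Σw. The scores and weights are hypothetical. With equal weights the result equals the ordinary arithmetic mean, as the text says.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [80, 90]   # hypothetical exam scores
weights = [3, 1]    # hypothetical weights: the first exam counts three times as much
print(weighted_mean(scores, weights))  # 82.5
print(weighted_mean(scores, [1, 1]))   # 85.0, same as the arithmetic mean
```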
2. Harmonic Mean
The harmonic mean of n numbers x1, x2, …, xn is HM = n / (1/x1 + 1/x2 + … + 1/xn).
To find it:
A. Add the reciprocals of the numbers in the set. To find a reciprocal, flip the fraction so that the numerator
becomes the denominator and the denominator becomes the numerator. For example, the reciprocal of 6/1 is
1/6.
B. Divide the answer by the number of items in the set.
C. Take the reciprocal of the result.
This is used quite a lot in physics. In some cases involving rates and ratios it gives a better average than the
arithmetic mean. You’ll also find uses in geometry, finance and computer science.
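Steps A to C translate directly into code. The sketch below assumes a hypothetical trip covering two equal distances at 60 km/h and 40 km/h. The true average speed is the harmonic mean, about 48 km/h, not the arithmetic mean of 50 km/h. It compares the hand-rolled version with `statistics.harmonic_mean` (Python 3.6+).

```python
import statistics

def harmonic_mean_steps(values):
    reciprocal_sum = sum(1 / x for x in values)         # Step A: add the reciprocals
    average_reciprocal = reciprocal_sum / len(values)   # Step B: divide by the number of items
    return 1 / average_reciprocal                       # Step C: take the reciprocal

speeds = [60, 40]  # hypothetical speeds over two equal distances
print(harmonic_mean_steps(speeds))
print(statistics.harmonic_mean(speeds))
```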
3. Geometric Mean
This type has very narrow and specific uses in finance, social sciences and technology. For example, let’s say
you own stocks that earn 5% the first year, 20% the second year, and 10% the third year. If you want to know the
average rate of return, you can’t use the arithmetic average. Why? Because when you are finding rates of return you
are multiplying, not adding. For example, the first year you are multiplying by 1.05.
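Using the returns from the example (5%, 20%, 10%), the sketch below takes the geometric mean of the growth factors 1.05, 1.20 and 1.10 and subtracts 1. It assumes Python 3.8+ for `math.prod`. The result, about 11.49% per year, is lower than the arithmetic average of 11.67%.

```python
import math

returns = [0.05, 0.20, 0.10]                  # yearly returns from the example
growth = [1 + r for r in returns]             # growth factors: 1.05, 1.20, 1.10
gm = math.prod(growth) ** (1 / len(growth))   # geometric mean of the growth factors
average_return = gm - 1

print(round(average_return * 100, 2))              # about 11.49 (% per year)
print(round(sum(returns) / len(returns) * 100, 2))  # 11.67, the arithmetic mean overstates it
```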
4. Arithmetic-Geometric Mean
This is used mostly in calculus and in machine computation (i.e. as the basis for many computer calculations). It’s
related to the perimeter of an ellipse. When it was first developed by Gauss, it was used to calculate planetary orbits.
The arithmetic-geometric mean is (not surprisingly!) a blend of the arithmetic and geometric averages: you repeatedly
replace two numbers by their arithmetic mean and their geometric mean until the two values agree. The math behind it is
quite complicated.
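A minimal sketch of the iteration: replace the pair (a, b) by their arithmetic mean and geometric mean until the two agree. The inputs are arbitrary examples. The AGM always lies between the geometric mean and the arithmetic mean of the starting pair.

```python
import math

def agm(a: float, b: float, max_iter: int = 60) -> float:
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(max_iter):
        # Replace the pair by its arithmetic mean and its geometric mean
        a, b = (a + b) / 2, math.sqrt(a * b)
        if math.isclose(a, b, rel_tol=1e-15):
            break
    return (a + b) / 2

print(agm(1, math.sqrt(2)))  # about 1.19814
print(agm(24, 6))            # between GM = 12 and AM = 15
```

The sequence converges very quickly, which is one reason it suits machine computation.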
2. What is the Mode?

The mode is the most common number in a set. For example, the mode in this set of numbers is 21:
21, 21, 21, 23, 24, 26, 26, 28, 29, 30, 31, 33
3. What is the Median?

The median is the middle number in a data set. To find the median, list your data points in ascending order and then find
the middle number. The middle number in this set is 28 as there are 4 numbers below it and 4 numbers above:
23, 24, 26, 26, 28, 29, 30, 31, 33
Note: If you have an even set of numbers, average the middle two to find the median. For example, the median of this set of
numbers is 28.5 ((28 + 29) / 2).
23, 24, 26, 26, 28, 29, 30, 31, 33, 34
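The mode and median examples above can be checked with a short sketch. It reuses the three data sets from the text. A hand-written `median` helper (a hypothetical name, for illustration) shows the odd/even rule and is compared with Python’s `statistics` module.

```python
import statistics

def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2:                          # odd count: the single middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even count: average of the two middle values

mode_data = [21, 21, 21, 23, 24, 26, 26, 28, 29, 30, 31, 33]
odd_data = [23, 24, 26, 26, 28, 29, 30, 31, 33]
even_data = [23, 24, 26, 26, 28, 29, 30, 31, 33, 34]

print(statistics.mode(mode_data))                          # 21
print(median(odd_data), statistics.median(odd_data))       # 28 28
print(median(even_data), statistics.median(even_data))     # 28.5 28.5
```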
Statistics Basics: Definitions
What is Bias in Statistics?

Bias is the tendency of a statistic to overestimate or underestimate a parameter (a statistic describes a sample, while a
parameter describes the whole population). Bias can seep into your results for a slew of reasons including sampling or
measurement errors, or unrepresentative samples.
Sampling error is the tendency for a statistic not to exactly match the population parameter. Error doesn’t necessarily mean
that a mistake was made in your sampling; Sampling Variability could be a more accurate name. For example, let’s say you
have a population in the United States with an average height of 5 feet 9 inches. If you take a sample, even a fairly sizable
sample of say, 10,000 people, it’s unlikely that you’ll get exactly 5 feet 9 inches. You might get very close, perhaps to
within a fraction of an inch. If you repeat the experiment, you might get another very close result. For example, in
experiment 1 you might get 5 feet 8.9 inches and in experiment 2 you might get 5 feet 9.1 inches. The tendency for
statistics to get very close, but not exactly right, is called sampling error.
Note: If the statistic is unbiased, the average of the statistic over all possible samples will equal the true population parameter.
Measurement Errors
Measurement errors are where a provided response is different from the real value. For example, you might survey to
find out if a person voted for President Obama. A person may have voted for him, but they are confused by the wording of
the questionnaire and mistakenly respond that they did not vote for him. Several factors may cause measurement error,
including:
 The way the interviewer poses the question.
 The wording on the questionnaire.
 The way the data is collected.
 The respondent’s record-keeping system.
Biased Estimators
In statistics, an estimator is a rule for calculating an estimate of a quantity based on observed data. For example, you
might have a rule to calculate a population mean. The result of using the rule is an estimate (a statistic) that hopefully is a
true reflection of the population. The bias of an estimator is the difference between the statistic’s expected value and the
true value of the population parameter. If the statistic is a true reflection of a population parameter it is
an unbiased estimator. If it is not a true reflection of a population parameter it is a biased estimator.
The word bias in the regular English language implies that you have a personal reason to misrepresent a piece of
information. However, in statistics, it doesn’t mean that the interviewer, the researcher or even the respondent in an
interview is biased in some way. It just means that the estimator being used doesn’t produce a good estimate.
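The idea of a biased estimator can be made concrete with a small exact calculation. The sketch assumes a hypothetical four-value population and enumerates every ordered sample of size 2 drawn with replacement, then averages each estimator over all samples. The sample mean averages out to the population mean, so it is unbiased. A variance that divides by n falls short of the population variance, so it is biased, and dividing by n − 1 removes the bias. Exact fractions avoid rounding noise.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]          # hypothetical toy population
N = len(population)
mu = Fraction(sum(population), N)  # population mean (a parameter)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sample_mean(sample):
    return Fraction(sum(sample), len(sample))

def var_divide_by_n(sample):
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def var_divide_by_n_minus_1(sample):
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

# Every ordered sample of size 2 drawn with replacement (16 equally likely samples)
samples = list(product(population, repeat=2))
k = len(samples)

avg_mean = sum(sample_mean(s) for s in samples) / k
avg_var_n = sum(var_divide_by_n(s) for s in samples) / k
avg_var_n1 = sum(var_divide_by_n_minus_1(s) for s in samples) / k

print(mu, avg_mean)                   # 5/2 5/2     (unbiased)
print(sigma2, avg_var_n, avg_var_n1)  # 5/4 5/8 5/4 (dividing by n is biased)
```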
Example of a Biased Estimator

You are playing the party game “Pin the tail on the donkey.” (If you aren’t familiar with the game, a picture of a donkey is
placed on the wall and you are given a paper tail to pin on the donkey while you are blindfolded. The person who pins the
tail closest to the actual spot where the real tail should go wins the game). You try six times to pin the tail in the right
place and each time you pin the tail in the wrong place, at the bottom or to the front of the donkey. Your estimation for the
actual spot where the tail should have gone is a biased estimator because you put the tails in the wrong place.
What is Selection Bias?

Ideally, you should randomly select every participant in a survey. But, sometimes biases creep in, whether intentional or
unintentional. Selection bias takes away from the “randomness” you are hoping to achieve. It’s usually a result of not
using the correct procedures to choose your participants. Types of selection bias include: the healthy worker effect, non-
response bias, undercoverage, and voluntary response bias.
Qualitative Variable (Categorical Variable): Definition and Examples

A qualitative variable, also called a categorical variable, is a variable that is not numerical. It describes data that fits
into categories. For example:
 Eye colors (categories include: blue, green, brown, hazel).
 States (categories include: Florida, New Jersey, Washington).
 Dog breeds (categories include: Alaskan Malamute, German Shepherd, Siberian Husky, Shih Tzu).
These are all qualitative variables as they have no natural order. On the other hand, quantitative variables have a value and
they can be added, subtracted, divided or multiplied.
Breed of dog is a qualitative variable; the number of dogs is a quantitative variable.
Quantitative variables | Qualitative variables
Fractions | Cat breeds
Decimals | Cities
Odd numbers | Fast food chains
Whole numbers | College majors
Irrational numbers | Fraternities
Ordered pairs (x, y) | Hair color
Negative numbers | Computer brands
Map coordinates | Beer breweries
Positive numbers | Pop music genres
Exponents | Tribes
As a general rule, if you can apply some kind of math (like addition), it’s a quantitative variable. Otherwise, it’s
qualitative. For example, you can’t add blue+green (unless you’re in an art class — even then you “mix” them, you don’t
add them!).
Numbers are sometimes assigned to qualitative variables for data analysis, but they are still classified as qualitative
variables despite the numerical classification. For example, a study may assign the number “1” to males and “2” to
females.
Qualitative Variables and the Nominal Scale

Qualitative variables aren’t ordered on a numerical scale in statistics so they are assigned nominal scales. The word
“nominal” means “name”, which is exactly what qualitative variables are. A nominal scale is a scale where no ordering is
possible or implied (except for alphabetical ordering like New York, Washington, West Virginia or Chelsea, Edinburgh,
London). In other words, the nominal scale is where data is assigned to a category.
Census in Statistics: Overview

A census studies every member of a population. It results in a parameter for a population, as opposed to a statistic.
Basically, a parameter contains information about everyone in the population while a statistic only tells you something
about a small part of that population. Studying every member of a population is usually not practical because of finances
or time constraints. With the exception of national population censuses such as the U.S. Census, censuses are very rare.
Discrete vs Continuous variables: Definitions.

What is a Discrete Variable?
Discrete variables are countable in a finite amount of time. For example, you can count the change in your pocket. You
can count the money in your bank account. You could also count the amount of money in everyone’s bank accounts. It
might take you a long time to count that last item, but the point is—it’s still countable.
What is a Continuous Variable?

Continuous Variables would (literally) take forever to count. In fact, you would get to “forever” and never finish
counting them. For example, take age. You can’t count “age”. Why not? Because it would literally take forever. For
example, you could be:
25 years, 10 months, 2 days, 5 hours, 4 seconds, 4 milliseconds, 8 nanoseconds, 99 picoseconds…and so on.
Time is a continuous variable.
You could turn age into a discrete variable and then you could count it. For example:
 A person’s age in years.
 A baby’s age in months.
Orders of magnitude of time show why time or age just isn’t countable. Try counting your age in units of Planck time
(good luck…see you at the end of time!).
What is a Dependent Event?

When two events are dependent events, one event influences the probability of another event. A dependent event is an
event that relies on another event to happen first. Dependent events in probability are no different from dependent events
in real life: If you want to attend a concert, it might depend on whether you get overtime at work; if you want to visit
family out of the country next month, it depends on whether or not you can get a passport in time. More formally, we say
that when two events are dependent, the occurrence of one event influences the probability of another event.
Simple examples of dependent events:
 Robbing a bank and going to jail.
 Not paying your power bill on time and having your power cut off.
 Boarding a plane first and finding a good seat.
 Parking illegally and getting a parking ticket. Parking illegally increases your odds of getting a ticket.
 Buying ten lottery tickets and winning the lottery. The more tickets you buy, the greater your odds of winning.
 Driving a car and getting in a traffic accident.
What is an Independent Event?

An independent event is an event that has no connection to another event’s chances of happening (or not happening). In
other words, the event has no effect on the probability of another event occurring. Independent events in probability are no
different from independent events in real life. Where you work has no effect on what color car you drive. Buying a lottery
ticket has no effect on having a child with blue eyes.
When two events are independent, one event does not influence the probability of another event.
Simple examples of independent events:
 Owning a dog and growing your own herb garden.
 Paying off your mortgage early and owning a Chevy Cavalier.
 Winning the lottery and running out of milk.
 Buying a lottery ticket and finding a penny on the floor (your odds of finding a penny do not depend on you
buying a lottery ticket).
 Taking a cab home and finding your favorite movie on cable.
 Getting a parking ticket and playing craps at the casino.
What is a Parameter in Statistics?
In math, a parameter is a constant in an equation that can take a different value each time the equation is used. It means
something different in statistics: a value that tells you something about a population. It is the counterpart of a statistic,
which tells you something about a small part of the population.
A census is where everyone is surveyed.
A parameter never changes, because everyone (or everything) was surveyed to find the parameter. For example, you
might be interested in the average age of everyone in your class. Maybe you asked everyone and found the average age
was 25. That’s a parameter, because you asked everyone in the class. Now let’s say you wanted to know the average age
of everyone in your grade or year. If you use that information from your class to take a guess at the average age, then that
information becomes a statistic. That’s because you can’t be sure your guess is correct (although it will probably be
close!).
Statistics vary. You know the average age of your classmates is 25. You might guess that the average age of everyone in
your year is 24, 25, or 26. You might guess the average age for other colleges in your area is the same. And you might
even guess that’s the average age for college students in the U.S. These may not be bad guesses, but they are statistics
because you didn’t ask everyone.
What is the Sample Variance?

The sample variance, s², is used to calculate how varied a sample is. A sample is a select number of items taken from a
population. For example, if you are measuring American people’s weights, it wouldn’t be feasible (from either a time or a
monetary standpoint) for you to measure the weights of every person in the population. The solution is to take a sample of
the population, say 1000 people, and use that sample to estimate the actual weights of the whole population. The
variance helps you to figure out how spread out your weights are.
Definition of Sample Variance

The variance is mathematically defined as the average of the squared differences from the mean. But what does that
actually mean in English? In order to understand what you are calculating with the variance, break it down into steps:
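 Step 1: Calculate the mean (the average weight).
 Step 2: Subtract the mean from each value and square the result.
 Step 3: Work out the average of those squared differences. For the sample variance s², divide the sum of the squared
differences by n − 1 instead of n.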
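The three steps can be followed literally in code. The sketch uses hypothetical measurements and compares the hand calculation with Python’s `statistics.variance` (sample, n − 1) and `statistics.pvariance` (population, N). The standard deviation is the square root of the variance.

```python
import math
import statistics

weights = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

mean = sum(weights) / len(weights)                   # Step 1: the mean (5.0)
squared_diffs = [(w - mean) ** 2 for w in weights]   # Step 2: subtract the mean, square
sample_var = sum(squared_diffs) / (len(weights) - 1)     # Step 3 for a sample: divide by n - 1
population_var = sum(squared_diffs) / len(weights)       # for a whole population: divide by N

print(sample_var, statistics.variance(weights))
print(population_var, statistics.pvariance(weights))
print(math.sqrt(population_var))  # population standard deviation: 2.0
```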
Standard Deviation: Simple Definition
Standard deviation is a measure of dispersion in statistics. “Dispersion” tells you how much your data is spread out.
Specifically, it shows you how much your data is spread out around the mean or average. For example, are all your scores
close to the average? Or are lots of scores way above (or way below) the average score?
What Does it Look Like on a Graph?

The bell curve (what statisticians call a “normal distribution”) is commonly seen in statistics as a tool to understand
standard deviation.
A normal distribution represents a great deal of data in real life. On a graph of one, the mean, or average, is represented
by the Greek letter μ in the center, and each band on either side marks one standard deviation (σ) away from the mean.
For example, 2σ means two standard deviations from the mean.
Real Life Example

A normal distribution curve can represent hundreds of situations in real life. Have you ever noticed in class that most
students get Cs while a few get As or Fs? That can be modeled with a bell curve. People’s weights, heights, nutrition
habits and exercise regimens can also be modeled with graphs similar to this one. That knowledge enables companies,
schools and governments to make predictions about future behavior. For behaviors that fit this type of bell curve (like
performance on the SAT), you’ll be able to predict that 34.1 + 34.1 = 68.2% of students will score within one standard
deviation of the average score.
Statistics Symbols A to Z
α: significance level (type I error).

b or b0: y intercept.
b1: slope of a line (used in regression).
β: probability of a Type II error.
1-β: statistical power.
CI: confidence interval.
df: degrees of freedom.
E: margin of error.
f: frequency (i.e. how often something happens).
f/n: relative frequency.
H0: null hypothesis.
H1 or Ha: alternative hypothesis.
IQR: interquartile range.
m: slope of a line.
M: median.
n: sample size or number of trials in a binomial experiment.
N: population size.
σ: standard deviation.
σx̅: standard error of the mean.
σp̂: standard error of the proportion.
p: p-value, or probability of success in a binomial experiment, or population proportion.
ρ: correlation coefficient for a population.
p̂: sample proportion.
P(A): probability of event A.
P(Aᶜ) or P(not A): the probability that A doesn’t happen.
P(B|A): the probability that event B occurs, given that event A occurs.
q: probability of failure in a binomial or geometric distribution.
Q1: first quartile.
Q3: third quartile.
r: correlation coefficient of a sample.
R²: coefficient of determination.
s: standard deviation of a sample.
s.d or SD: standard deviation.
t: t-score.
μ: population mean.
ν: degrees of freedom.
X: a variable.
χ²: chi-square.
x: one data value.
x̄: mean of a sample.
z: z-score.
Binomial Theorem: Simple Definition, Formula.
What is the Binomial Theorem?

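The most common form of the binomial theorem (sometimes called a binomial expansion) used in statistics is simply a
formula for the probability of x successes in n trials:
P(X = x) = nCx · p^x · (1 − p)^(n − x), where nCx = n! / [x!(n − x)!].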
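A minimal sketch of the binomial probability formula using `math.comb` (Python 3.8+). The example of three fair coin tosses is hypothetical. The probabilities for 0, 1, 2 and 3 heads sum to 1.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 3, 0.5  # e.g. three fair coin tosses
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(probs)  # [0.125, 0.375, 0.375, 0.125]
```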
What is a Bernoulli Distribution?

A Bernoulli distribution is a discrete probability distribution for a Bernoulli trial: a random experiment that has only
two outcomes (usually called a “Success” or a “Failure”). For example, the probability of getting a heads (a “success”)
while flipping a coin is 0.5. The probability of “failure” is 1 − p (1 minus the probability of success, which also equals 0.5
for a coin toss). It is a special case of the binomial distribution for n = 1. In other words, it is a binomial distribution with a
single trial (e.g. a single coin toss).
The probability of a failure is labeled on the x-axis as 0 and success is labeled as 1. For example, in a Bernoulli distribution
with p = 0.4, the probability of success (1) is 0.4 and the probability of failure (0) is 0.6.
The probability mass function (pmf) for this distribution is p^x (1 − p)^(1 − x) for x = 0 or 1, which can also be written as:
P(X = 1) = p and P(X = 0) = 1 − p.
The expected value for a random variable, X, from a Bernoulli distribution is:
E[X]=p.
For example, if p = 0.4, then E[X] = 0.4.
The variance of a Bernoulli random variable is:

Var[X] = p(1 – p).
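Both formulas follow from the pmf by summing over the two outcomes. The sketch assumes p = 0.4 as in the example and computes E[X] = Σx·P(x) and Var[X] = Σ(x − E[X])²·P(x). These should match p and p(1 − p) = 0.24.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

p = 0.4
outcomes = (0, 1)
expected = sum(x * bernoulli_pmf(x, p) for x in outcomes)
variance = sum((x - expected) ** 2 * bernoulli_pmf(x, p) for x in outcomes)
print(expected, variance, p * (1 - p))  # about 0.4, 0.24, 0.24
```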
What is a Bernoulli Trial?

A Bernoulli trial is one of the simplest experiments you can conduct in probability and statistics. It’s an experiment
where you can have one of two possible outcomes. For example, “Yes” and “No” or “Heads” and “Tails.” A few more
examples:
 Coin tosses: record how many coins land heads up and how many land tails up.
 Births: how many boys are born and how many girls are born each day.
 Rolling dice: the probability of a roll of two dice resulting in a double six.
Coin tossing as a game of probability and chance has been around since Roman times.
Bernoulli trials are usually phrased in terms of success and failure. Success doesn’t mean success in the usual way — it
just refers to an outcome you want to keep track of. For example, you might want to find out how many boys are born
each day, so you call a boy birth a “success” and a girl birth a “failure.” In the dice rolling example, a double six die roll
would be your “success” and everything else rolled would be considered a “failure.”
Independence
An important part of every Bernoulli trial is that each action must be independent. That means the probabilities must
remain the same throughout the trials; each event must be completely separate and have nothing to do with the previous
event.
Winning a scratch off lottery is an independent event. Your odds of winning on one ticket are the same as winning on any
other ticket. On the other hand, drawing lotto numbers is a dependent event. Lotto numbers are drawn as balls that are not
replaced, so the probability of successive numbers being picked depends upon how many balls are left; when there are a
hundred balls, the probability is 1/100 that any particular number will be picked, but when there are only ten balls
left, the probability shoots up to 1/10. While it’s possible to find those probabilities, it isn’t a Bernoulli trial because the
events (picking the numbers) are connected to each other.
The Bernoulli process leads to several probability distributions:
 The binomial distribution,
 The geometric distribution,
 The negative binomial distribution.
Relation to the Binomial Distribution
The Bernoulli distribution is closely related to the Binomial distribution. As long as each individual Bernoulli trial is
independent, then the number of successes in a series of Bernoulli trials has a Binomial Distribution. The Bernoulli
distribution can also be defined as the Binomial distribution with n = 1.
Criteria
Binomial distributions must also meet the following three criteria:
1. The number of observations or trials is fixed. In other words, you can only figure out the probability of
something happening if you do it a certain number of times. This is common sense: if you toss a coin once, your
probability of getting tails is 50%. If you toss a coin 20 times, your probability of getting at least one tails is very, very
close to 100%.
2. Each observation or trial is independent. In other words, none of your trials have an effect on the probability of
the next trial.
3. The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another.
Variance of a Binomial Distribution
A binomial distribution is a simple experiment where there is “success” or “failure.” For example, choosing a winning
lottery ticket could be a binomial experiment (you either win or lose!). Tossing a coin to try and get heads is also binomial
(with tossing a heads being a “success” and a tails a “failure”). The formula for the variance of binomial distribution
is n*p (1-p) or n*p*q. The two formulas are equivalent because q = (1-p).
Example problem: If you flip a coin 50 times and try to get heads, what is the variance of binomial distribution?
Step 1: Find “p”. The first step to solving this problem is to realize that the probability of getting a heads is 50 percent, or
.5. Therefore, “p” (the probability) is .5.
Step 2: Find “q”, or 1-p. These two are equivalent. They are the probability of not getting a heads (in other words, the
probability of getting a tails). 1 – 0.5 = 0.5. Therefore, “q” (or 1 – p) = 0.5.
Step 3: Multiply Step 1 (p) by Step 2 (q) by “n” (the number of trials). We are flipping the coin 50 times, so the number of
trials is 50 (n = 50).
n * p * q = 50 * 0.5 * 0.5 = 12.5.
The variance of the binomial distribution for flipping a coin 50 times is 12.5.
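The shortcut n·p·q can be checked against the definition of variance, Σ(x − μ)²·P(X = x), summed over every possible number of heads. This sketch reuses the coin example (n = 50, p = 0.5) and Python’s `math.comb` (Python 3.8+). The helper name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 50, 0.5
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
print(round(mean, 6), round(var, 6), n * p * (1 - p))  # 25.0 12.5 12.5
```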
OK, so what does the binomial variance mean?
In essence, not a lot! The variance isn’t used for much on its own, except for calculating the standard deviation. For example, the
standard deviation for this particular binomial distribution is:
√12.5 = 3.54.
You’ll use the variance for things like calculating z-scores (this typically comes later in a stats class, after normal
distributions), which have the standard deviation in the denominator of the formula. For a binomial distribution, the z-score takes the form:
$z = \frac{x - np}{\sqrt{np(1-p)}}$
Population Variance
The population variance is a type of parameter (a number that describes a whole population, as opposed to a statistic,
which describes a sample).
The formula is:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, where μ is the population mean and N is the number of values in the population.
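As a rough illustration, the population variance averages the squared deviations from the mean over all N values. The data list below is hypothetical, and the result is cross-checked against the standard library’s `statistics.pvariance`, which also divides by N.

```python
import statistics

def population_variance(values):
    """Average squared deviation from the population mean (divides by N, not N - 1)."""
    n = len(values)
    mu = sum(values) / n  # population mean
    return sum((x - mu) ** 2 for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical population of 8 values
print(population_variance(data))   # 4.0
print(statistics.pvariance(data))  # same value from the standard library
```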
Standard Deviation for a Binomial
Formula: $\sigma = \sqrt{npq} = \sqrt{np(1-p)}$
Example problem: Find the standard deviation for a binomial distribution with n = 5 and p = 0.12.
Step 1: Subtract p from 1 to find q.
1 – .12
= .88
Step 2: Multiply n times p times q.
5 * .12 * .88
= .528
Step 3: Find the square root of the answer from Step 2.
√.528 = .727 (rounded to 3 decimal places).
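To check that √(npq) really measures the spread of the distribution, the sketch below sums (k − np)² · P(X = k) over every possible number of successes k and takes the square root. It assumes Python 3.8+ for `math.comb`; the helper names are hypothetical.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_sd_from_pmf(n: int, p: float) -> float:
    """Standard deviation found by summing (k - mean)^2 * P(X = k) over k = 0..n."""
    mean = n * p
    variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return math.sqrt(variance)

n, p = 5, 0.12
print(round(binomial_sd_from_pmf(n, p), 3))  # 0.727
print(round(math.sqrt(n * p * (1 - p)), 3))  # 0.727, the shortcut sqrt(npq)
```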
Normal Distributions (Bell Curve):
What is a Normal distribution?

A normal distribution, sometimes called the bell curve, is a distribution that occurs naturally in many situations. For
example, the bell curve is seen in tests like the SAT and GRE. The bulk of students will score the average (C), while
smaller numbers of students will score a B or D. An even smaller percentage of students score an F or an A. This creates a
distribution that resembles a bell (hence the nickname). The bell curve is symmetrical. Half of the data will fall to the left
of the mean; half will fall to the right.
Many groups follow this type of pattern, which is why it’s widely used in business, statistics and government bodies like
the FDA. Examples include:
 Heights of people.
 Measurement errors.
 Blood pressure.
 Points on a test.
 IQ scores.
 Salaries.
The empirical rule tells you what percentage of your data falls within a certain number of standard deviations from
the mean:
• 68% of the data falls within one standard deviation of the mean.
• 95% of the data falls within two standard deviations of the mean.
• 99.7% of the data falls within three standard deviations of the mean.
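The 68, 95 and 99.7 percent figures are rounded values. The sketch below recomputes them from the standard normal CDF using Python’s `statistics.NormalDist` (Python 3.8+); because any normal distribution can be standardized, the standard normal is enough. The exact two-standard-deviation share is closer to 95.4%.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```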
The standard deviation controls the spread of the distribution. A smaller standard deviation indicates that the data is
tightly clustered around the mean; the normal distribution will be taller. A larger standard deviation indicates that the data
is spread out around the mean; the normal distribution will be flatter and wider.
Properties of a normal distribution

 The mean, mode and median are all equal.
 The curve is symmetric at the center (i.e. around the mean, μ).
 Exactly half of the values are to the left of center and exactly half the values are to the right.
 The total area under the curve is 1.
The Standard Normal Model

A standard normal model is a normal distribution with a mean of 0 and a standard deviation of 1.
Standard Normal Model: Distribution of Data

One way of figuring out how data are distributed is to plot them in a graph. If the data are normally distributed, you will
see a bell curve. A bell curve has a small percentage of the points in both tails and the bigger percentage in the
inner part of the curve. In the standard normal model, about 5 percent of your data would fall into the “tails” (more than
about 1.96 standard deviations from the mean) and 95 percent will be in between. For example, for test scores of students, the normal
distribution would show 2.5 percent of students getting very low scores and 2.5 percent getting very high scores. The rest
will be in the middle; not too high or too low.
Practical Applications of the Standard Normal Model
The standard normal distribution could help you figure out which subjects you are getting good grades in and which
subjects you need to put more effort into because of low scores. If your score in one subject is
higher than your score in another subject, you might think that you are better in the subject where you got the higher
score. This is not always true.
You can only say that you are better in a particular subject if your score lies more standard deviations
above the mean. The standard deviation tells you how tightly your data are clustered around the mean; it allows you to
compare different distributions that have different types of data, including different means.
For example, if you get a score of 90 in Math and 95 in English, you might think that you are better in English than in
Math. However, in Math, your score is 2 standard deviations above the mean. In English, it’s only one standard deviation
above the mean. It tells you that in Math, your score is far higher than most of the students (your score falls into the tail).
Based on this data, you actually performed better in Math than in English!
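The comparison above relies on z-scores: (score − mean) / standard deviation. The class means and standard deviations below are hypothetical values, chosen so that the Math score sits 2 standard deviations above its mean and the English score 1 standard deviation above, as in the example.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical class statistics matching the example
math_z = z_score(90, mean=80, sd=5)      # 2.0
english_z = z_score(95, mean=85, sd=10)  # 1.0
better = "Math" if math_z > english_z else "English"
print(math_z, english_z, better)         # 2.0 1.0 Math
```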
Poisson Distribution / Poisson Curve: Simple Definition
What is the Poisson Distribution?

A Poisson distribution is a tool that helps to predict the probability of certain events happening when you know how
often the event has occurred. It gives us the probability of a given number of events happening in a fixed interval of
time.
A Poisson distribution is defined only for non-negative integers (0, 1, 2, ...) on the horizontal axis. λ (also written as μ) is the expected number of event
occurrences.
Practical Uses of the Poisson Distribution:
A textbook store rents an average of 200 books every Saturday night. Using this data, you can predict the probability
that more books will be rented (perhaps 300 or 400) on the following Saturday nights. Another example is the number of
diners in a certain restaurant every day. If the average number of diners for seven days is 500, you can predict the
probability of a certain day having more customers.
Because of this application, Poisson distributions are used by businessmen to make forecasts about the number of
customers or sales on certain days or seasons of the year. In business, overstocking will sometimes mean losses if the
goods are not sold. Likewise, having too little stock would still mean a lost business opportunity because you were not
able to maximize your sales due to a shortage of stock. By using this tool, businessmen are able to estimate when
demand is unusually high, so they can purchase more stock. Hotels and restaurants could prepare for an influx of
customers: they could hire extra temporary workers in advance, purchase more supplies, or make contingency plans in
case they cannot accommodate all of their guests coming to the area.
With the Poisson distribution, companies can adjust supply to demand in order to keep their business earning good profit.
In addition, waste of resources is prevented.
Calculating the Poisson Distribution
The Poisson Distribution pmf is: $P(x;\mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$
Where:
 The symbol “!” is a factorial.
 μ (the expected number of occurrences) is sometimes written as λ. Sometimes called the event rate or rate
parameter.
Example question
The average number of major storms in your city is 2 per year. What is the probability that exactly 3 storms will
hit your city next year?
Step 1: Figure out the components you need to put into the equation.
 μ = 2 (average number of storms per year, historically)
 x = 3 (the number of storms we think might hit next year)
 e = 2.71828 (e is Euler’s number, a constant)
Step 2: Plug the values from Step 1 into the Poisson distribution formula:
$P(x;\mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$
$= \frac{(2.71828^{-2})(2^{3})}{3!}$
$= \frac{(0.13534)(8)}{6}$
$= 0.180$
The probability of 3 storms happening next year is 0.180, or 18%.
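The storm calculation can be reproduced with a short function. This sketch uses `math.exp` for e rather than the rounded constant 2.71828, so intermediate digits may differ slightly from the hand calculation; the function name is illustrative.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson distribution with mean (rate) mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

p3 = poisson_pmf(3, 2)  # exactly 3 storms when the yearly average is 2
print(round(p3, 3))     # 0.18
```

As a sanity check, summing `poisson_pmf(x, 2)` over x = 0, 1, 2, ... approaches 1, as any probability distribution must.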
As you can probably tell, you can calculate the Poisson distribution manually but that would take an extraordinary amount
of time unless you have a simple set of data. The usual way to calculate a Poisson distribution in real life situations is with
software like IBM SPSS.
Poisson distribution vs. Binomial

The above example was over-simplified to show you how to work through a problem. However, it can be challenging to
figure out if you should use a binomial distribution or a Poisson distribution. If you aren’t given a specific guideline from
your instructor, use the following general guideline.
 If your question gives an average rate of an event happening per unit (e.g. per unit of time, cycle,
event) and you want to find the probability of a certain number of events happening in a period of time (or number of
events), then use the Poisson distribution.
 If you are given an exact probability and you want to find the probability of the event happening a certain
number of times out of x (e.g. 10 times out of 100, or 99 times out of 1000), use the binomial distribution
formula.
Hypothesis Testing
What is a Hypothesis?
If you are going to propose a hypothesis, it’s customary to write a statement. Your statement will look like this:
“If I…(do this to an independent variable)….then (this will happen to the dependent variable).”
For example:
 If I (decrease the amount of water given to herbs) then (the herbs will increase in size).
 If I (give patients counseling in addition to medication) then (their overall depression scale will decrease).
 If I (give exams at noon instead of 7) then (student test scores will improve).
 If I (look in this certain location) then (I am more likely to find new species).
A good hypothesis statement should:

 Include an “if” and “then” statement (according to the University of California).
 Include both the independent and dependent variables.
 Be testable by experiment, survey or other scientifically sound technique.
 Be based on information in prior research (either yours or someone else’s).
 Have design criteria (for engineering or programming projects).
What is Hypothesis Testing?
Hypothesis testing in statistics is a way for you to test the results of a survey or experiment to see if you have meaningful
results. You’re basically testing whether your results are valid by figuring out the odds that your results have happened by
chance. If your results may have happened by chance, the experiment won’t be repeatable and so has little use.
Hypothesis testing can be one of the most confusing aspects for students, mostly because before you can even perform a
test, you have to know what your null hypothesis is. Often, those tricky word problems that you are faced with can be
difficult to decipher. But it’s easier than you think; all you need to do is:
1. Figure out your null hypothesis,
2. State your null hypothesis,
3. Choose what kind of test you need to perform,
4. Either reject or fail to reject the null hypothesis.
What is the Null Hypothesis?

If you trace back the history of science, the null hypothesis is always the accepted fact. Simple examples of null
hypotheses that are generally accepted as being true are:
1. DNA is shaped like a double helix.
2. There are 8 planets in the solar system (excluding Pluto).
3. Taking Vioxx can increase your risk of heart problems (a drug now taken off the market).
How do I State the Null Hypothesis?

You won’t be required to actually perform a real experiment or survey in elementary statistics (or even disprove a fact like
“Pluto is a planet”!), so you’ll be given word problems from real-life situations. You’ll need to figure out what your
hypothesis is from the problem. This can be a little trickier than just figuring out what the accepted fact is. With word
problems, you are looking to find a fact that is nullifiable (i.e. something you can reject).
Hypothesis Testing Examples #1: Basic Example

A researcher thinks that if knee surgery patients go to physical therapy twice a week (instead of 3 times), their recovery
period will be longer. The average recovery time for knee surgery patients is 8.2 weeks.
The hypothesis statement in this question is that the researcher believes the average recovery time is more than 8.2 weeks.
It can be written in mathematical terms as:
H1: μ > 8.2
Next, you’ll need to state the null hypothesis. That’s what will happen if the
researcher is wrong. In the above example, if the researcher is wrong then the recovery time is less than or equal to 8.2
weeks. In math, that’s:
H0: μ ≤ 8.2
Rejecting the null hypothesis
Ten or so years ago, we believed that there were 9 planets in the solar system. Pluto was demoted as a planet in 2006. The
null hypothesis of “Pluto is a planet” was replaced by “Pluto is not a planet.” Of course, rejecting the null hypothesis isn’t
always that easy — the hard part is usually figuring out what your null hypothesis is in the first place.
Hypothesis Testing Examples (One Sample Z Test)

The one sample z test isn’t used very often (because we rarely know the actual population standard deviation). However,
it’s a good idea to understand how it works as it’s one of the simplest tests you can perform in hypothesis testing. In
English class you got to learn the basics (like grammar and spelling) before you could write a story; think of one sample z
tests as the foundation for understanding more complex hypothesis testing.
One Sample Hypothesis Testing Examples:
A principal at a certain school claims that the students in his school are of above average intelligence. A random sample of
thirty students’ IQ scores has a mean score of 112.5. Is there sufficient evidence to support the principal’s claim? The mean
population IQ is 100 with a standard deviation of 15.
Step 1: State the Null hypothesis. The accepted fact is that the population mean is 100, so: H0: μ=100.
Step 2: State the Alternate Hypothesis. The claim is that the students have above average IQ scores, so:
H1: μ > 100.
The fact that we are looking for scores “greater than” a certain point means that this is a one-tailed test.
Step 3: Draw a picture to help you visualize the problem.
Step 4: State the alpha level. If you aren’t given an alpha level, use 5% (0.05).
Step 5: Find the rejection region area (given by your alpha level above) from the z-table. A right-tail area of .05 corresponds to a
z-score of 1.645.
Step 6: Find the test statistic using this formula:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
For this set of data: z = (112.5 − 100) / (15/√30) = 4.56.
Step 7: If Step 6 is greater than Step 5, reject the null hypothesis. If it’s less than Step 5, you cannot reject the null
hypothesis. In this case, it is greater (4.56 > 1.645), so you can reject the null.
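The whole one-sample z test can be sketched in a few lines. This assumes the figures from Step 6 (sample mean 112.5, population mean 100, σ = 15, n = 30) and uses `statistics.NormalDist().inv_cdf` in place of a printed z-table; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """z statistic for a one-sample z test: (sample mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value, about 1.645
z = one_sample_z(112.5, 100, 15, 30)
decision = "reject H0" if z > z_crit else "cannot reject H0"
print(round(z_crit, 3), round(z, 2), decision)  # 1.645 4.56 reject H0
```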
What is a Z Test?
A Z-test is a type of hypothesis test. Hypothesis testing is just a way for you to figure out if results from a test are valid or
repeatable. For example, if someone said they had found a new drug that cures cancer, you would want to be sure it was
probably true. A hypothesis test will tell you if it’s probably true, or probably not true. A Z test is used when your data are
approximately normally distributed.
When you can run a Z Test.

Several different types of tests are used in statistics (e.g. F test, chi square test, t test). You would use a Z test if:
 Your sample size is greater than 30. Otherwise, use a t test.
 Data points should be independent from each other. In other words, one data point isn’t related or doesn’t affect
another data point.
 Your data should be normally distributed. However, for large sample sizes (over 30) this doesn’t always matter.
 Your data should be randomly selected from a population, where each item has an equal chance of being selected.
 Sample sizes should be equal if at all possible.
 If N ≤ 30, a Z test can still be used provided the population standard deviation is given.
 A one tailed test is also known as a directional test; it can be right tailed or left tailed. The F test and chi square test
are usually one tailed (right tailed) tests.
 A two tailed test is called a non-directional test, as the direction is not mentioned.
How do I run a Z Test?

Running a Z test on your data requires five steps:
1. State the null hypothesis and alternate hypothesis.
2. Choose an alpha level.
3. Find the critical value of z in a z table.
4. Calculate the z test statistic.
5. Compare the test statistic to the critical z value and decide if you should reject or fail to reject the null hypothesis.
What is an F Test?
An “F Test” is a catch-all term for any test that uses the F-distribution. In most cases, when people talk about the F-Test,
what they are actually talking about is The F-Test to Compare Two Variances. However, the f-statistic is used in a variety
of tests including regression analysis, the Chow test and the Scheffe Test (a post-hoc ANOVA test).
Conditions to apply f test:-

1) Population mean, sample mean & standard deviation are not given in the question.
2) It compares two variances (in ANOVA, the same F ratio is used to test differences between means).
3) Its value lies between 0 and ∞.
F Test to Compare Two Variances

A Statistical F Test uses an F Statistic to compare two variances, s1² and s2², by dividing them. The result is always a
positive number (because variances are always positive). The equation for comparing two variances with the f-test is:
$F = s_1^2 / s_2^2$
If the variances are equal, the ratio of the variances will equal 1. For example, if you had two data sets with a sample 1
(variance of 10) and a sample 2 (variance of 10), the ratio would be 10/10 = 1.
You always test that the population variances are equal when running an F Test. In other words, you always assume that
the ratio of the variances is equal to 1. Therefore, your null hypothesis will always be that the variances are equal.
Assumptions:
Several assumptions are made for the test. Your population must be approximately normally distributed (i.e. fit the
shape of a bell curve) in order to use the test. Plus, the samples must be independent events. In addition, you’ll want to
bear in mind a few important points:
 The larger variance should always go in the numerator (the top number) to force the test into a right-tailed test.
Right-tailed tests are easier to calculate.
 For two-tailed tests, divide alpha by 2 before finding the right critical value.
 If you are given standard deviations, they must be squared to get the variances.
 If your degrees of freedom aren’t listed in the F Table, use the larger critical value. This helps to avoid the
possibility of Type I errors.
The difference between running a one or two tailed F test is that the alpha level needs to be halved for two tailed F tests.
For example, instead of working at α = 0.05, you use α = 0.025; instead of working at α = 0.01, you use α = 0.005.
With a two tailed F test, you just want to know if the variances are not equal to each other. In notation:
$H_a: \sigma_1^2 \neq \sigma_2^2$
Sample problem: Conduct a two tailed F Test on the following samples:

Sample 1: Variance = 109.63, sample size = 41.
Sample 2: Variance = 65.99, sample size = 21.
Step 1: Write your hypothesis statements:

Ho: No difference in variances.
Ha: Difference in variances.
Step 2: Calculate your F statistic. Put the highest variance as the numerator and the lowest variance as the
denominator:
F Statistic = variance 1 / variance 2 = 109.63 / 65.99 = 1.66
Step 3: Calculate the degrees of freedom:

The degrees of freedom in the table will be the sample size -1, so:
Sample 1 has 40 df (the numerator).
Sample 2 has 20 df (the denominator).
Step 4: Choose an alpha level. No alpha was stated in the question, so use 0.05 (the standard “go to” in statistics). This
needs to be halved for the two-tailed test, so use 0.025.
Step 5: Find the critical F Value using the F Table. There are several tables, so make sure you look in the alpha = .025
table. Critical F (40,20) at alpha (0.025) = 2.287.
Step 6: Compare your calculated value (Step 2) to your table value (Step 5). If your calculated value is higher than the
table value, you can reject the null hypothesis:
F calculated value: 1.66
F value from table: 2.287.
1.66 < 2.287.
So we cannot reject the null hypothesis.
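The F test steps above can be sketched as follows. Python’s standard library has no F distribution, so the critical value 2.287 is hard-coded from the F table quoted in Step 5 rather than computed; if SciPy is available, `scipy.stats.f.ppf(0.975, 40, 20)` should give a value close to it. The function name is illustrative.

```python
def f_statistic(var_a: float, var_b: float) -> float:
    """F ratio with the larger variance in the numerator (forces a right-tailed comparison)."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller

f = f_statistic(109.63, 65.99)
df_num, df_den = 41 - 1, 21 - 1  # sample size minus 1 (sample 1 has the larger variance)
f_crit = 2.287                   # F(40, 20) at alpha = 0.025, taken from the table in Step 5
decision = "reject H0" if f > f_crit else "cannot reject H0"
print(round(f, 2), df_num, df_den, decision)  # 1.66 40 20 cannot reject H0
```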
What is a T test?
The t test tells you how significant the differences between groups are; in other words, it lets you know if those differences
(measured in means/averages) could have happened by chance.
A very simple example: Let’s say you have a cold and you try a naturopathic remedy. Your cold lasts a couple of days.
The next time you have a cold, you buy an over-the-counter pharmaceutical and the cold lasts a week. You survey your
friends and they all tell you that their colds were of a shorter duration (an average of 3 days) when they took the
naturopathic remedy. What you really want to know is, are these results repeatable? A t test can tell you by comparing the
means of the two groups and letting you know the probability of those results happening by chance.
T-Test Assumptions
1. The first assumption made regarding t-tests concerns the scale of measurement. The assumption for a t-test is that
the scale of measurement applied to the data collected follows a continuous or ordinal scale, such as the scores for
an IQ test.
2. The second assumption made is that of a simple random sample, that the data is collected from a representative,
randomly selected portion of the total population.
3. The third assumption is that the data, when plotted, result in a normal, bell-shaped distribution curve.
When a normal distribution is assumed, one can specify a level of probability (alpha level, level of
significance, p) as a criterion for acceptance. In most cases, a 5% value can be assumed.
4. The fourth assumption is a reasonably large sample size is used. A larger sample size means the distribution of
results should approach a normal bell-shaped curve.
5. The final assumption is homogeneity of variance. Homogeneous, or equal, variance exists when the standard
deviations of samples are approximately equal.
Conditions for applying t-Test

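1) N ≤ 30 and only the sample standard deviation is given (the population standard deviation is unknown).
2) The aim is to check the difference in means.
 The t test is also called Student’s t test, an exact test or a small sample test; it is based on the t distribution.
 Conditions to accept & reject the null hypothesis:
1) Table value > calculated value: accept (fail to reject) the null hypothesis.
2) Table value < calculated value: reject the null hypothesis.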
Uses
The T Distribution (and the associated t scores) is used in hypothesis testing when you want to figure out if you
should accept or reject the null hypothesis.
In a two tailed test, the central region of the distribution is the acceptance region and the two tails are the rejection
regions. The area in the tails can be described with z-scores or t-scores. For example, if the tails hold 5% of the area in
total (2.5% on each side), the z-score would be 1.96 (from the z-table), which represents 1.96 standard deviations from
the mean. The null hypothesis will be rejected if z is less than -1.96 or greater than 1.96.
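For a two tailed test the alpha level is split equally between the two tails, so the critical z is the point with 1 − α/2 of the area to its left. A minimal sketch using `statistics.NormalDist` (the helper names are hypothetical):

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Critical z for a two-tailed test: alpha is split equally between the two tails."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_two_tailed(z: float, alpha: float = 0.05) -> bool:
    """Reject H0 when z falls in either rejection region (z < -critical or z > critical)."""
    c = two_tailed_critical_z(alpha)
    return z < -c or z > c

z_crit = two_tailed_critical_z(0.05)
print(round(z_crit, 2))  # 1.96
```

For example, `reject_two_tailed(2.5)` and `reject_two_tailed(-2.0)` both fall in a rejection region, while `reject_two_tailed(1.0)` does not.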
In general, this distribution is used when you have a small sample size (under 30) or you don’t know the population
standard deviation. For practical purposes (i.e. in the real world), this is nearly always the case. So, unlike in
your elementary statistics class, you’ll likely be using it in real life situations more than the normal distribution. If the size
of your sample is large enough, the two distributions are practically the same.
The T Score.
The t score is a ratio between the difference between two groups and the difference within the groups. The larger the t
score, the more difference there is between groups. The smaller the t score, the more similarity there is between groups. A
t score of 3 means that the groups are three times as different from each other as they are within each other. When you run
a t test, the bigger the t-value, the more likely it is that the results are repeatable.
 A large t-score tells you that the groups are different.
 A small t-score tells you that the groups are similar
T-Values and P-values

How big is “big enough”? Every t-value has a p-value to go with it. A p-value is the probability of getting results at least as
extreme as those in your sample if the null hypothesis were true (that is, if chance alone were at work). P-values range from 0% to
100% and are usually written as a decimal. For example, a p value of 5% is 0.05. Low p-values are good; they indicate your data
are unlikely to have occurred by chance alone. For example, a p-value of .01 means there is only a 1% probability of seeing results
this extreme by chance alone. In most cases, a p-value of 0.05 (5%) or less is accepted as statistically significant.
Calculating the Statistic / Test Types
There are three main types of t-test:
 An Independent Samples t-test compares the means for two groups.
 A Paired sample t-test compares means from the same group at different times (say, one year apart).
 A One sample t-test tests the mean of a single group against a known mean.
When to Choose a Paired T Test / Paired Samples T Test / Dependent Samples T Test
Choose the paired t-test if you have two measurements on the same item, person or thing. You should also choose this test
if you have two items that are being measured with a unique condition. For example, you might be measuring car safety
performance in Vehicle Research and Testing and subject the cars to a series of crash tests. Although the manufacturers
are different, you might be subjecting them to the same conditions.
With a “regular” two sample t test, you’re comparing the means for two different samples. For example, you might test
two different groups of customer service associates on a business-related test or testing students from two universities on
their English skills. If you take a random sample from each group separately and they have different conditions, your samples
are independent and you should run an independent samples t test (also called between-samples and unpaired-samples).
The null hypothesis for the independent samples t-test is μ1 = μ2. In other words, it assumes the means are equal.
With the paired t test, the null hypothesis is that the mean pairwise difference between the two tests is zero (H0: µd = 0). The
difference between the two tests is very subtle; which one you choose is based on your data collection method.
Paired Samples T Test By hand
Sample question: Calculate a paired t test by hand for the following data:
Step 1: Subtract each Y score from each X score.
Step 2: Add up all of the values from Step 1.
Set this number aside for a moment.
Step 3: Square the differences from Step 1.
Step 4: Add up all of the squared differences from Step 3.
Step 5: Use the following formula to calculate the t-score:
$t = \frac{\sum D / n}{\sqrt{\left(\sum D^2 - \frac{(\sum D)^2}{n}\right) / \left((n-1)\,n\right)}}$
ΣD: Sum of the differences (Sum of X-Y from Step 2)
ΣD²: Sum of the squared differences (from Step 4)
(ΣD)²: Sum of the differences (from Step 2), squared.
n: Number of pairs.
Step 6: Subtract 1 from the sample size to get the degrees of freedom. We have 11 items, so 11-1 = 10.
Step 7: Find the critical t-value in the t-table, using the degrees of freedom in Step 6. If you don’t have a specified alpha level,
use 0.05 (5%). For this sample problem, with df=10, the t-value is 2.228.
Step 8: Compare your t-table value from Step 7 (2.228) to your calculated t-value (-2.74). The absolute value of the calculated
t-value (2.74) is greater than the table value at an alpha level of .05, so the p-value is less than the alpha level: p < .05. We can
reject the null hypothesis that there is no difference between means.
Note: You can ignore the minus sign when comparing the two t-values, as ± indicates the direction; the p-value remains
the same for both directions.
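The by-hand steps translate directly into code. This sketch assumes the usual by-hand formula t = (ΣD/n) / √((ΣD² − (ΣD)²/n) / ((n − 1)n)); the scores below are hypothetical (the sample question’s data table is not reproduced here), and the result is cross-checked against mean(D) / (sd(D)/√n), which is algebraically the same statistic.

```python
import math
import statistics

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom, following Steps 1-6."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of paired scores")
    n = len(x)
    d = [a - b for a, b in zip(x, y)]   # Step 1: differences X - Y
    sum_d = sum(d)                      # Step 2: sum of differences
    sum_d2 = sum(v * v for v in d)      # Steps 3-4: sum of squared differences
    numerator = sum_d / n
    denominator = math.sqrt((sum_d2 - sum_d**2 / n) / ((n - 1) * n))  # Step 5
    return numerator / denominator, n - 1                              # Step 6: df

# Hypothetical before/after scores
before = [10, 12, 9, 14, 11]
after = [12, 14, 10, 15, 14]
t, df = paired_t(before, after)
diffs = [a - b for a, b in zip(before, after)]
check = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
print(round(t, 2), df)  # -4.81 4
```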
What is a Chi Square Test?

There are two types of chi-square tests. Both use the chi-square statistic and distribution for different purposes:
 A chi-square goodness of fit test determines whether sample data match a population.
 A chi-square test for independence compares two variables in a contingency table to see if they are related. In a
more general sense, it tests to see whether distributions of categorical variables differ from each other.
 A very small chi square test statistic means that your observed data fit your expected data extremely
well. In a test for independence, this means there is no evidence of a relationship.
 A very large chi square test statistic means that the data do not fit very well. In a test for independence, this
suggests there is a relationship.
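The formula for the chi-square statistic used in the chi square test is:
$\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E is the expected frequency.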
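A minimal sketch of the chi-square statistic, summing (O − E)² / E over all categories. The die-roll counts below are hypothetical: 60 rolls of a die, where a fair die would be expected to show each face 10 times.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 6, 14]  # hypothetical counts for faces 1-6
expected = [10] * 6               # fair die: 60 rolls / 6 faces
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))             # 4.2
```

A perfect fit (observed equal to expected in every category) gives a statistic of 0; the further the counts drift from expectation, the larger the statistic grows.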
Uses
The chi-squared distribution has many uses in statistics, including:
 Confidence interval estimation for a population standard deviation of a normal distribution from a sample
standard deviation.
 Independence of two criteria of classification of qualitative variables.
 Relationships between categorical variables (contingency tables).
 Sample variance study when the underlying distribution is normal.
 Tests of deviations of differences between expected and observed frequencies (one-way tables).
 The chi-square test (a goodness of fit test).
The ANOVA Test
An ANOVA test is a way to find out if survey or experiment results are significant. In other words, it helps you to figure
out if you need to reject the null hypothesis or accept the alternate hypothesis.
Basically, you’re testing groups to see if there’s a difference between them. Examples of when you might want to test
different groups:
 A group of psychiatric patients are trying three different therapies: counseling, medication and biofeedback. You
want to see if one therapy is better than the others.
 A manufacturer has two different processes to make light bulbs. They want to know if one process is better than
the other.
 Students from different colleges take the same exam. You want to see if one college outperforms the other.
What Does “One-Way” or “Two-Way” Mean?

One-way or two-way refers to the number of independent variables (IVs) in your Analysis of Variance test.
 One-way has one independent variable (with two or more levels). For example: brand of cereal.
 Two-way has two independent variables (it can have multiple levels). For example: brand of cereal, calories.
What are “Groups” or “Levels”?

Groups or levels are different groups within the same independent variable. In the above example, your levels for “brand
of cereal” might be Lucky Charms, Raisin Bran, Cornflakes — a total of three levels. Your levels for “Calories” might be:
sweetened, unsweetened — a total of two levels.
Let’s say you are studying if medication and individual counseling combined is the most effective
treatment for lowering alcohol consumption. You might split the study participants into three groups or levels:
 Medication only,
 Medication and counseling,
 Counseling only.
Your dependent variable would be the number of alcoholic beverages consumed per day.
If your groups or levels have a hierarchical structure (each level has unique subgroups), then use a nested ANOVA for the
analysis.
What Does “Replication” Mean?

It’s whether you are replicating (i.e. duplicating) your test(s) with multiple groups. With a two way ANOVA with
replication, you have two groups and individuals within that group are doing more than one thing (i.e. two groups of
students from two colleges taking two tests). If you only have one group taking two tests, you would use without
replication.
Types of Tests.
There are two main types: one-way and two-way. Two-way tests can be with or without replication.
 One-way ANOVA between groups: used when you want to test two or more groups to see if there’s a difference between
them.
 Two way ANOVA without replication: used when you have one group and you’re double-testing that same
group. For example, you’re testing one set of individuals before and after they take a medication to see if it works
or not.
 Two way ANOVA with replication: Two groups, and the members of those groups are doing more than one
thing. For example, two groups of patients from different hospitals trying two different therapies.
One Way ANOVA

A one way ANOVA is used to compare the means of two or more independent (unrelated) groups using the F-distribution.
The null hypothesis for the test is that all of the group means are equal. Therefore, a significant result means that at least
two of the means are unequal.
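The F statistic behind a one way ANOVA can be sketched from the standard textbook sums of squares (these terms are not defined in the notes above): the between-group mean square divided by the within-group mean square. The weight-loss numbers for the tea example are hypothetical.

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Spread of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of individual values around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    ms_between = ss_between / (k - 1)      # df between = k - 1
    ms_within = ss_within / (n_total - k)  # df within = N - k
    return ms_between / ms_within

# Hypothetical weight loss (kg): green tea, black tea, no tea
green = [3, 4, 5]
black = [2, 3, 4]
no_tea = [1, 2, 3]
print(one_way_anova_f(green, black, no_tea))  # 3.0
```

The resulting F value would then be compared with a critical value from an F table, using k − 1 and N − k degrees of freedom.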
Examples of when to use a one way ANOVA
Situation 1: You have a group of individuals randomly split into smaller groups and completing different tasks. For
example, you might be studying the effects of tea on weight loss and form three groups: green tea, black tea, and no tea.
Situation 2: Similar to situation 1, but in this case the individuals are split into groups based on an attribute they possess.
For example, you might be studying leg strength of people according to weight. You could split participants into weight
categories (obese, overweight and normal) and measure their leg strength on a weight machine.
Limitations of the One Way ANOVA
A one way ANOVA will tell you that at least two groups were different from each other, but it won’t tell you which groups were different. If your test returns a significant F-statistic, you may need to run a post hoc test (like the Least Significant Difference test) to tell you exactly which groups had a difference in means.
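The F-statistic behind a one way ANOVA can be sketched in a few lines of Python. This is a minimal illustration using the textbook between-group and within-group sums of squares; the function name `one_way_anova` and the tea-study numbers are hypothetical, invented only to show the arithmetic. The resulting F would then be compared with a critical value from an F-table at the chosen significance level.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: how far each observation is from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1               # k - 1
    df_within = len(all_values) - len(groups)  # N - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weight loss (kg) for the tea example
green_tea = [3.0, 4.0, 5.0]
black_tea = [2.0, 3.0, 4.0]
no_tea = [1.0, 2.0, 3.0]

F, df1, df2 = one_way_anova([green_tea, black_tea, no_tea])
print(f"F({df1}, {df2}) = {F:.2f}")  # F(2, 6) = 3.00
```

If SciPy is available, `scipy.stats.f_oneway(green_tea, black_tea, no_tea)` should give the same F-statistic together with a p-value.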
Two Way ANOVA

A Two Way ANOVA is an extension of the One Way ANOVA. With a One Way, you have one independent variable affecting a dependent variable. With a Two Way ANOVA, there are two independent variables. Use a two way ANOVA when you have one measurement variable (i.e. a quantitative variable) and two nominal variables. In other words, if your experiment has a quantitative outcome and you have two categorical explanatory variables, a two way ANOVA is appropriate.
For example, you might want to find out if there is an interaction between income and gender for anxiety level at job
interviews. The anxiety level is the outcome, or the variable that can be measured. Gender and Income are the
two categorical variables. These categorical variables are also the independent variables, which are called factors in a
Two Way ANOVA.
The factors can be split into levels. In the above example, income level could be split into three levels: low, middle and
high income. Gender could be split into three levels: male, female, and transgender. Treatment groups are all possible
combinations of the factors. In this example there would be 3 x 3 = 9 treatment groups.
Main Effect and Interaction Effect

A Two Way ANOVA calculates a main effect for each factor and an interaction effect. The main effect is similar to a One Way ANOVA: each factor’s effect is considered separately. With the interaction effect, all factors are considered at the same time. The interaction effect can be tested only if there is more than one observation in each cell. For the above example, multiple anxiety scores could be entered into cells. If you do enter multiple observations into cells, the number in each cell must be equal.
Two null hypotheses are tested if you are placing one observation in each cell. For this example, those hypotheses would be:
H01: All the income groups have equal mean anxiety.
H02: All the gender groups have equal mean anxiety.
For multiple observations in cells, you would also be testing a third hypothesis:
H03: There is no interaction between the factors (the effect of income on anxiety does not depend on gender).
An F-statistic is computed for each hypothesis you are testing.
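A sketch of how the three F-statistics are computed for a balanced two way ANOVA with replication, using the standard sums of squares for factor A, factor B, the interaction and error. The function name `two_way_anova` and the anxiety scores are hypothetical; to keep the table small the example uses two gender groups instead of three, and the numbers were chosen so that the factors do not interact (so F for the interaction comes out as zero).

```python
from statistics import fmean


def two_way_anova(cells):
    """F-statistics for a balanced two way ANOVA with replication.

    cells[i][j] is the list of observations for level i of factor A and
    level j of factor B; every cell must hold the same number r >= 2.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = fmean([x for row in cells for cell in row for x in cell])
    cell_means = [[fmean(cell) for cell in row] for row in cells]
    a_means = [fmean([x for cell in row for x in cell]) for row in cells]
    b_means = [fmean([x for row in cells for x in row[j]]) for j in range(b)]

    # Sums of squares for factor A, factor B, the A x B interaction and error
    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_means)
    ss_ab = r * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_error = sum((x - cell_means[i][j]) ** 2
                   for i in range(a) for j in range(b) for x in cells[i][j])

    ms_error = ss_error / (a * b * (r - 1))
    return {
        "F_A": (ss_a / (a - 1)) / ms_error,                # tests H01 (income)
        "F_B": (ss_b / (b - 1)) / ms_error,                # tests H02 (gender)
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_error,  # tests H03 (interaction)
    }


# Hypothetical anxiety scores: rows = income (low, middle, high),
# columns = two gender groups, two observations per cell
scores = [
    [[5, 7], [6, 8]],
    [[4, 6], [5, 7]],
    [[3, 5], [4, 6]],
]
print(two_way_anova(scores))
```

Each F would be compared with the F-table value for its own degrees of freedom.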
Assumptions for Two Way ANOVA

 The population must be close to a normal distribution.
 Samples must be independent.
 Population variances must be equal.
 Groups must have equal sample sizes.
Correlation & Regression
Definition
Correlation is used to test relationships between quantitative variables or categorical variables. In other words, it’s a
measure of how things are related. The study of how variables are correlated is called correlation analysis.
Some examples of data that have a high correlation:
 Your caloric intake and your weight.
 Your eye color and your relatives’ eye colors.
 The amount of time you study and your GPA.
Some examples of data that have a low correlation (or none at all):
 Your sexual preference and the type of cereal you eat.
 A dog’s name and the type of dog biscuit they prefer.
 The cost of a car wash and how long it takes to buy a soda inside the station.
Correlations are useful because, once you know how variables are related, you can make predictions about future behavior. Such predictions are important in fields like government and healthcare, and businesses also use these statistics for budgets and business plans.
The Correlation Coefficient

A correlation coefficient is a way to put a value to the relationship. Correlation coefficients have a value between -1 and 1. A “0” means there is no linear relationship between the variables, while -1 or 1 means that there is a perfect negative or positive correlation (negative or positive here refers to the direction of the relationship, i.e. whether the points on a scatter graph slope downward or upward).
Correlation coefficient formulas are used to find how strong a relationship is between data. The formulas return a value between -1 and 1, where:
 1 indicates a perfect positive relationship.
 -1 indicates a perfect negative relationship.
 A result of zero indicates no linear relationship at all.
Types of correlation coefficient formulas:

There are several types of correlation coefficient formulas.
One of the most commonly used formulas in stats is Pearson’s correlation coefficient formula. If you’re taking a basic
stats class, this is the one you’ll probably use:
Pearson correlation coefficient: r = ∑(x - x̄)(y - ȳ) / √[∑(x - x̄)² × ∑(y - ȳ)²]
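Pearson’s formula, r = ∑(x - x̄)(y - ȳ) / √[∑(x - x̄)² × ∑(y - ȳ)²], can be computed directly from its definition. The sketch below is illustrative only; the function name `pearson_r` and the study-hours/GPA data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from deviations about the means.

    r is undefined (division by zero) if either variable is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same number of observations (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied per day vs GPA
hours = [1, 2, 3, 4, 5]
gpa = [2.0, 2.4, 2.9, 3.1, 3.6]
print(round(pearson_r(hours, gpa), 3))  # 0.994
```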
The following are the methods of studying correlation:

1) Scatter diagram method
2) Graphic method
3) Karl Pearson coefficient of correlation.
4) Rank correlation
5) Concurrent deviation method
6) Method of least squares
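For the rank correlation method, Spearman’s coefficient is usually computed as ρ = 1 - 6∑d² / (n(n² - 1)), where d is the difference between the two ranks of each item. The sketch below assumes there are no tied values (ties need averaged ranks and a correction); the function name `spearman_rho` and the judges’ marks are hypothetical.

```python
def spearman_rho(x, y):
    """Spearman's rank correlation, rho = 1 - 6*sum(d^2) / (n(n^2 - 1)), assuming no ties."""
    n = len(x)
    if n != len(y) or len(set(x)) != n or len(set(y)) != n:
        raise ValueError("need equal-length data without tied values")
    # Rank 1 = smallest value
    rank_x = {value: rank for rank, value in enumerate(sorted(x), start=1)}
    rank_y = {value: rank for rank, value in enumerate(sorted(y), start=1)}
    sum_d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical marks given by two judges to the same five contestants
judge_1 = [10, 20, 30, 40, 50]
judge_2 = [12, 25, 22, 48, 41]
print(round(spearman_rho(judge_1, judge_2), 3))  # 0.8
```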
Regression:
What is Regression?
 If you’re just beginning to learn about regression analysis, simple linear regression is the first type of regression you’ll come across in a stats class.
Linear regression is one of the most widely used statistical techniques; it is a way to model a relationship between two sets of variables. The result is a linear regression equation that can be used to make predictions about data.
“Linear” means line. The word regression came from a 19th-century scientist, Sir Francis Galton, who coined the term “regression toward mediocrity” (in modern language, that’s regression toward the mean). He used the term to describe the phenomenon of how nature tends to dampen excess physical traits from generation to generation (like extreme height).
What is Simple Linear Regression?

You’re probably familiar with plotting line graphs with one X axis and one Y axis. The X variable is sometimes called
the independent variable and the Y variable is called the dependent variable. Simple linear regression plots one
independent variable X against one dependent variable Y. Technically, in regression analysis, the independent variable is
usually called the predictor variable and the dependent variable is called the criterion variable. However, many people just
call them the independent and dependent variables. More advanced regression techniques (like multiple regression) use
multiple independent variables.
Regression analysis can result in linear or nonlinear graphs. A linear regression is where the relationships between your variables can be described with a straight line. Non-linear regressions produce curved lines.
[Figure: Simple linear regression for the amount of rainfall per year.]
Points to be remembered in Regression

 Used for predictions and forecasting.
 It is a statistical device with the help of which we are in a position to estimate (or predict) the unknown values of one variable from known values of another variable.
Y = a + bX, where Y is the dependent variable we are trying to predict, X is the independent variable used to predict it, a is the intercept and b is the slope (the regression coefficient).
 The geometric mean of the two regression coefficients gives the coefficient of correlation: r = ±√(bxy · byx), taking the common sign of bxy and byx.
 Regression coefficients are affected by change of scale but are independent of change of origin.
 r² = bxy · byx
= [cov(x, y) / σy²] · [cov(x, y) / σx²]
= cov²(x, y) / (σx² · σy²)
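A small sketch that fits the line Y = a + bX and checks the identity r² = bxy · byx, using byx = cov(x, y)/σx² and bxy = cov(x, y)/σy². The data and the function name `regression_coefficients` are hypothetical; dividing by n or by n - 1 makes no difference here because the divisor cancels in each ratio.

```python
from math import copysign, sqrt


def regression_coefficients(x, y):
    """Return (b_yx, b_xy) = (cov(x, y) / var(x), cov(x, y) / var(y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / n
    var_x = sum((p - mean_x) ** 2 for p in x) / n
    var_y = sum((q - mean_y) ** 2 for q in y) / n
    return cov_xy / var_x, cov_xy / var_y


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(x, y)

# Regression line of y on x passes through (x̄, ȳ), so a = ȳ - b_yx * x̄
a = sum(y) / len(y) - b_yx * sum(x) / len(x)
print(f"Y = {a:.2f} + {b_yx:.2f}X")  # Y = 2.20 + 0.60X
print(f"prediction at X = 6: {a + b_yx * 6:.2f}")

# r is the geometric mean of the two coefficients, with their common sign
r = copysign(sqrt(b_yx * b_xy), b_yx)
print(f"r = {r:.4f}, r² = {b_yx * b_xy:.4f}")
```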
Difference between Correlation & Regression

CORRELATION vs REGRESSION
1) Correlation: simply tells the relationship between two or more variables which vary together.
Regression: means stepping back or returning to the average value, i.e. it tells the average relationship between two variables.
2) Correlation: the correlation coefficient tells the degree of relationship between two variables and is symmetric, rxy = ryx = ±√(bxy · byx).
Regression: aims at establishing the functional relationship between two variables; it is not symmetric (bxy and byx are generally different).
3) Correlation: need not imply a cause and effect relationship between two variables.
Regression: specifies which variable is treated as the cause (independent variable) and which as the effect (dependent variable).
4) Correlation: the correlation coefficient is a relative measure of the linear relationship between x and y and is independent of the units of measurement. Its value lies between -1 and +1.
Regression: the regression coefficients bxy and byx are absolute measures, expressed in the units of the data; byx represents the change in the value of y for a unit change in the value of x (and bxy the change in x for a unit change in y). Their values are not confined to the range -1 to +1, but their product cannot exceed 1.
5) Correlation: there may be a nonsense (spurious) correlation between two variables, e.g. intelligence and weight.
Regression: there is no such thing as nonsense regression.
6) Correlation: analysis is confined to the study of linear relationships between variables.
Regression: analysis includes linear as well as non-linear relationships between variables.
7) Correlation: the correlation coefficient is independent of both change of scale and change of origin.
Regression: the regression coefficients are independent of change of origin but not of change of scale.
Some Important Laws & Principles of Sampling:
Law of statistical regularity:

This law is derived from the mathematical theory of probability. In the words of King: "the law of statistical regularity
lays down that a moderately large number of items chosen at random from a large group are almost sure on the average to
possess the characteristics of the large group”. In other words, this law points out that if a sample is taken at random from
a population, it is likely to possess almost the same characteristics as that of the population. This law directs our attention
to an important point, that is, the desirability of choosing the sample at random.
By random selection we mean a selection where each and every item of the population has an equal chance of being
selected in the sample. In other words, the selection must not be made by deliberate exercise of one’s discretion. A sample
selected in this manner would be a representative of the population. If this condition is satisfied it is possible for one to
depict fairly accurately the characteristics of the population by studying only a part of it. Hence, this law is of great
practical significance because it makes possible a considerable reduction of the work necessary before any conclusion is
drawn regarding a large universe. For example, if one intends to study the average height of the students of a university, it is not necessary to measure the height of each and every student. A few students may be selected at random from each college, their heights may be measured and the average height of university students in general may be inferred.
It should be noted that the results derived from sample data may differ from those of the population, for the simple reason that the sample is only a part of the whole universe. For example, the average height of the students of the university may come out to be 160 cm by the census method, whereas it may be 159 cm or 161 cm for the sample taken. It would be just a coincidence if the height came out to be exactly 160 cm under both methods. However, there would not be much difference in the results if the sample is representative of the population.
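The height example can be imitated with a simulation. This sketch assumes a synthetic population of 20,000 heights drawn from a normal distribution with mean 160 cm and SD 8 cm; these numbers are hypothetical, not real data. A simple random sample of 200 gives a mean close to, but usually not exactly equal to, the population mean.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 20,000 synthetic student heights in cm
population = [rng.gauss(160, 8) for _ in range(20_000)]

# Simple random sample: every student has the same chance of being chosen
sample = rng.sample(population, 200)

print(f"population mean: {fmean(population):.1f} cm")
print(f"sample mean:     {fmean(sample):.1f} cm")
```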
Law of Inertia of large numbers:

This law is a corollary of the law of statistical regularity. It is of great significance in the theory of sampling. It states that,
other things being equal, larger the size of the sample, more accurate the results are likely to be. This is because large
numbers are more stable as compared to small ones. The difference in the aggregate result is likely to be insignificant,
when the number in the sample is large, because when large numbers are considered the variations in the component parts
tend to balance each other and, therefore, the variation in the aggregate is insignificant. For example, if a coin is tossed 10 times, we should expect an equal number of heads and tails, i.e. 5 each. But since the experiment is tried only a small number of times, it is likely that we may not get exactly 5 heads and 5 tails. The result may be a combination of 9 heads and 1 tail, or 8 heads and 2 tails, or 7 heads and 3 tails. If the same experiment is carried out 1,000 times, the proportion of heads would very likely be close to 50% (even though getting exactly 500 heads and 500 tails is itself not very likely). The basic reason for such likelihood is that the experiment has been carried out a sufficiently large number of times, so variations in one direction are more likely to be offset by variations in the other direction. If at one stage we get 5 heads in a row, at another stage we may get 5 tails in a row, and so on, and for the experiment as a whole the number of heads and tails may be more or less equal. Similarly, if it is intended to study the variation in the production of rice over a number of years and data are collected from one or two States only, the result would reflect large variations in production due to the favourable or unfavourable local factors in operation. If, on the other hand, figures of production are collected for all the States in India, it is quite likely that we find little variation in the aggregate. This does not mean that the production would remain constant for all the years. It only implies that the changes in the production of the individual States will be counterbalanced so as to reflect smaller variations in production for the country as a whole.
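The coin-tossing example can be checked with a short simulation of a fair coin. This is only an illustration; the function name `proportion_of_heads` and the seed are arbitrary choices. The proportion of heads wanders for 10 tosses but settles near 0.5 as the number of tosses grows.

```python
import random


def proportion_of_heads(n_tosses, rng):
    """Simulate n fair-coin tosses and return the proportion of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} tosses: proportion of heads = {proportion_of_heads(n, rng):.3f}")
```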
Central Limit Theorem:-

Definition: The Central Limit Theorem states that when a large number of simple random samples are selected from the population and the mean is calculated for each, the distribution of these sample means will approximately follow the normal probability distribution.
In other words, when large random samples of size n are selected from a population with mean μ and standard deviation σ, the sample means will be approximately normally distributed, with mean μ and standard deviation σ/√n, irrespective of whether the population is normal or skewed. Symbolically the central limit theorem can be explained as:
When ‘n’ independent random variables X1, X2, …, Xn are given, each having the same distribution with mean μ and variance σ², and
X = X1 + X2 + X3 + X4 + … + Xn,
then the mean and variance of X will be:
E(X) = nμ and Var(X) = nσ²,
and for large n, X is approximately normally distributed.
The following three probability distributions must be understood for a complete understanding of Sampling Theory:
 Population (Universe) Distribution
 Sample Distribution
 Sampling Distribution
The utility of the central limit theorem is that it requires no particular shape for the distribution of the random variables (only that they are independent, identically distributed and have a finite variance) and, in fact, provides a practical method to compute approximate probability values for arbitrarily distributed random variables.
Also, it helps explain why so many phenomena show an approximately normal distribution. Suppose the population is skewed: the skewness of the sampling distribution of the mean is inversely proportional to the square root of the sample size. Thus, if the sample size is 25, the sampling distribution exhibits only one-fifth as much skewness as the population.
Thus, it can be said that the sampling distribution of the sample mean assumes the normal distribution irrespective of what
distribution a population assumes from which the samples are drawn, and the approximation to the normal distribution is
likely to increase with the increase in the sample size.
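The skewness claim can be checked by simulation. This sketch assumes an exponential population with mean 1 (its SD is 1 and its skewness is 2), draws 20,000 samples of size 25, and looks at the distribution of the sample means; the theory above predicts a mean near 1, an SD near 1/√25 = 0.2 and a skewness near 2/√25 = 0.4. The helper name `skewness` is hypothetical.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Moment coefficient of skewness: mean of ((v - mean) / sd) ** 3."""
    m = fmean(values)
    s = pstdev(values, m)
    return fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(0)
n = 25            # sample size
repeats = 20_000  # number of samples drawn

# Skewed population: exponential with mean 1, SD 1 and skewness 2
sample_means = [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(repeats)]

print(f"mean of sample means: {fmean(sample_means):.3f}  (population mean 1)")
print(f"SD of sample means:   {pstdev(sample_means):.3f}  (σ/√n = 0.2)")
print(f"skewness:             {skewness(sample_means):.2f}  (2/√25 = 0.4)")
```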
Principle of persistence of small numbers: According to this principle, if some of the items in a population possess characteristics markedly distinct from the remaining items, this tendency would be revealed in the sample values also; in fact, this tendency will persist even if the population size is increased or a large sample is taken.
Principle of validity: A sample design is termed valid if it enables us to obtain valid tests and estimates about the population parameters.
Principle of optimization: This principle stresses the need to obtain optimum results in terms of the efficiency and cost of the sample design with the resources available at our disposal.
Important Properties from the NET Point of View
Properties of arithmetic mean-

Property 1 :
If all the observations assumed by a variable are constants, say "k", then arithmetic mean is also "k".
For example, if the height of every student in a group of 10 students is 170 cm, the mean height is, of course, 170 cm.
Property 2 :
The algebraic sum of deviations of a set of observations from their arithmetic mean is zero.
That is,
for unclassified data, ∑(x - x̄) = 0.
And for a grouped frequency distribution, ∑f(x - x̄) = 0.
For example, if a variable "x" assumes five observations, say 10, 20, 30, 40, 50, then x̄ = 30.
The deviations of the observations from arithmetic mean (x - x̄) are -20, -10, 0, 10, 20.
Now, ∑(x - x̄) = (-20) + (-10) + 0 + 10 + 20 = 0
Property 3 :
Arithmetic mean is affected by a change of origin and/or scale. Suppose the original variable "x" is changed to another variable "y" by a change of origin, say "a", and a change of scale, say "b", that is, y = a + bx.
Then we have,
Arithmetic mean of "y" = a + bx̄
For example, if it is known that two variables x and y are related by 2x + 3y + 7 = 0 and x̄ = 15, then
Arithmetic mean of "y" = (-7 - 2x̄) / 3
Plug x̄ = 15
Arithmetic mean of "y" = (-7 - 2x15) / 3
Arithmetic mean of "y" = (-7 - 30) / 3
Arithmetic mean of "y" = -37/ 3
Arithmetic mean of "y" = -12.33
Property 4 :
If there are two groups containing n₁ and n₂ observations, and x̄₁ and x̄₂ are the respective arithmetic means, then the combined arithmetic mean is given by
x̄ = (n₁x̄₁ + n₂x̄₂) / (n₁ + n₂)

This property could be extended to more than two groups and we may write it as
x̄ = ∑nx̄ / ∑n
Here,
∑nx̄ = n₁x̄₁ + n₂x̄₂ + ..............
∑n = n₁ + n₂ + ........................
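Properties 2 to 4 can be verified numerically. This sketch uses Python’s `fractions.Fraction` so that the results are exact rather than rounded floats; the helper name `mean` and the values in `x2`, `g1` and `g2` are hypothetical (chosen so that x̄ = 15 in the Property 3 example).

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean using fractions (avoids floating-point rounding)."""
    return sum(values, Fraction(0)) / len(values)


# Property 2: deviations from the mean sum to zero
x = [10, 20, 30, 40, 50]
x_bar = mean(x)
print(sum(v - x_bar for v in x))  # 0

# Property 3: 2x + 3y + 7 = 0, so y = (-7 - 2x) / 3 and ȳ = (-7 - 2x̄) / 3
x2 = [10, 15, 20]                              # hypothetical values with x̄ = 15
y2 = [Fraction(-7 - 2 * v, 3) for v in x2]
print(mean(y2))                                # -37/3

# Property 4: combined mean = (n₁x̄₁ + n₂x̄₂) / (n₁ + n₂)
g1, g2 = [2, 4, 6], [10, 20]
combined = (len(g1) * mean(g1) + len(g2) * mean(g2)) / (len(g1) + len(g2))
print(combined, mean(g1 + g2))                 # 42/5 42/5
```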
The properties of the Median:-

1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values fall into the upper half or lower half of the
distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely low values.
The main properties of mode in statistics are :-

1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal or categorical, such as religious preference, gender, or political
affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.
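The robustness of the median and the behaviour of the mode can be seen with Python’s standard `statistics` module. The marks and colour data below are hypothetical; `multimode` (Python 3.8+) returns every most-frequent value, which shows that the mode need not be unique and works for nominal data.

```python
from statistics import mean, median, multimode

marks = [20, 22, 25, 27, 30]
marks_with_outlier = [20, 22, 25, 27, 300]  # last value replaced by an extreme one

# The mean jumps; the median stays at 25
print(mean(marks), median(marks))
print(mean(marks_with_outlier), median(marks_with_outlier))

# The mode works for nominal data and need not be unique
colours = ["red", "blue", "red", "green", "blue"]
print(multimode(colours))  # ['red', 'blue']
```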
The main properties of correlation.

1. Coefficient of Correlation lies between -1 and +1:
The coefficient of correlation cannot take a value less than -1 or more than +1. Symbolically,
-1 ≤ r ≤ +1, i.e. | r | ≤ 1.
2. Coefficients of Correlation are independent of Change of Origin:
This property reveals that if we subtract any constant from all the values of X and Y, it will not affect the coefficient of
correlation.
3. Coefficient of Correlation possesses the property of symmetry:
The degree of relationship between two variables is symmetric, i.e. rxy = ryx (the correlation of X with Y equals the correlation of Y with X).
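4. Coefficient of Correlation is independent of Change of Scale:
This property reveals that if we multiply or divide all the values of X and Y by positive constants, it will not affect the coefficient of correlation. (Multiplying one variable by a negative constant changes only the sign of r.)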
5. Co-efficient of correlation measures only linear correlation between X and Y.
6. If two variables X and Y are independent, the coefficient of correlation between them will be zero (but a zero correlation does not by itself imply independence).
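Properties 1 to 4 can be checked numerically. The sketch below recomputes r after shifting the origin, rescaling by positive constants, swapping the variables, and negating one variable; the data and the function name `pearson_r` are hypothetical.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

r_origin = pearson_r([v - 100 for v in x], [v + 7 for v in y])  # change of origin
r_scale = pearson_r([v * 10 for v in x], [v / 2 for v in y])    # positive change of scale
r_swapped = pearson_r(y, x)                                      # symmetry: r_yx
r_negated = pearson_r([-v for v in x], y)                        # negative scale flips the sign

print(round(r, 4), round(r_origin, 4), round(r_scale, 4), round(r_swapped, 4), round(r_negated, 4))
print(isclose(r, r_origin) and isclose(r, r_scale) and isclose(r, r_swapped))  # True
```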
The main properties of regression coefficient:

1. It is generally denoted by ‘b’.
2. It is expressed in the form of an original unit of data.
3. If there are two variables, say x and y, two values of the regression coefficient are obtained: one when x is independent and y is dependent, and the other when y is independent and x is dependent. The regression coefficient of y on x is represented by byx and that of x on y by bxy.
4. Both regression coefficients must have the same sign. If byx is positive, bxy will also be positive, and vice versa.
5. If one regression coefficient is numerically greater than unity, the other must be numerically less than unity, since their product (r²) cannot exceed 1.
6. The geometric mean of the two regression coefficients is equal to the correlation coefficient:
r = ±√(byx · bxy), taking the common sign of byx and bxy.
Also, the arithmetic mean of the two regression coefficients is numerically equal to or greater than the coefficient of correlation:
|(byx + bxy)/2| ≥ |r|.
7. The regression coefficients are independent of the change of the origin. But, they are not independent
of the change of the scale. It means there will be no effect on the regression coefficients if any constant is
subtracted from the value of x and y. If x and y are multiplied by any constant, then the regression
coefficient will change.
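Property 7 can be demonstrated by recomputing the coefficients after a shift of origin and after a change of scale. The data and the function name `regression_coefficients` are hypothetical. Note that multiplying x by 10 divides byx by 10 and multiplies bxy by 10, so their product r² stays the same.

```python
def regression_coefficients(x, y):
    """Return (b_yx, b_xy) = (cov(x, y) / var(x), cov(x, y) / var(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return cov / var_x, cov / var_y


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

original = regression_coefficients(x, y)
shifted = regression_coefficients([v - 3 for v in x], [v - 4 for v in y])  # change of origin
scaled = regression_coefficients([v * 10 for v in x], y)                   # x multiplied by 10

print(original)  # (b_yx, b_xy)
print(shifted)   # unchanged by the shift of origin
print(scaled)    # b_yx divided by 10, b_xy multiplied by 10
```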
Main Properties of standard deviation:-

When using standard deviation keep in mind the following properties.
 Standard deviation is only used to measure spread or dispersion around the mean of a data set.
 Standard deviation is never negative.
 Standard deviation is sensitive to outliers. A single outlier can raise the standard deviation and in turn, distort the
picture of spread.
 For data with approximately the same mean, the greater the spread, the greater the standard deviation.
 If all values of a data set are the same, the standard deviation is zero (because each value is equal to the mean).
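A few of these properties can be seen with Python’s `statistics` module; the data sets are hypothetical. `pstdev` divides by n (population standard deviation), while `stdev` divides by n - 1 (sample standard deviation), so `stdev` is slightly larger for the same data.

```python
from statistics import pstdev, stdev

same_values = [170, 170, 170, 170, 170]
data = [10, 12, 11, 13, 12]              # hypothetical
data_with_outlier = [10, 12, 11, 13, 60]

print(pstdev(same_values))                  # 0.0: every value equals the mean
print(round(pstdev(data), 2))               # small spread
print(round(pstdev(data_with_outlier), 2))  # one outlier inflates the SD

# Sample standard deviation (n - 1 divisor)
print(round(stdev(data), 2))
```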
Some Random Topics For NET
What is an Interquartile Range?

The interquartile range is a measure of where the “middle fifty” is in a data set. Where a range is a measure of where the beginning and end are in a set, an interquartile range is a measure of where the bulk of the values lie. Because it ignores the extreme values, it is often preferred over other measures of spread (such as the range) when reporting things like school performance or SAT scores.
The interquartile range formula is the first quartile subtracted from the third quartile:
IQR = Q3 – Q1.
Range:-
The Range is the difference between the lowest and highest values.
Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9.
So the range is 9 − 3 = 6.
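The range and the interquartile range for the same data set can be computed with the standard library. Quartile conventions differ between textbooks; the sketch assumes the (n + 1)p positioning rule, which is what `statistics.quantiles` uses by default (`method="exclusive"`, Python 3.8+). Other conventions can give slightly different Q1 and Q3 for small data sets.

```python
from statistics import quantiles

values = [4, 6, 9, 3, 7]

data_range = max(values) - min(values)
print(data_range)  # 6

# quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
q1, q2, q3 = quantiles(values, n=4)  # default method="exclusive"
iqr = q3 - q1
print(q1, q3, iqr)  # 3.5 8.0 4.5
```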
Probable Error of Correlation Coefficient:-
Definition: The Probable Error of Correlation Coefficient helps in determining the accuracy and reliability of the value of the coefficient, in so far as it depends on random sampling.
In other words, the probable error (P.E.) is the value which is added or subtracted from the coefficient of correlation (r) to
get the upper limit and the lower limit respectively, within which the value of the correlation expectedly lies.
The probable error of correlation coefficient can be obtained by applying the following formula:
P.E. = 0.6745 × (1 - r²) / √N
where r = coefficient of correlation
N = number of observations
 If the value of ‘r’ is less than P.E., the correlation is not at all significant, i.e. there is no evidence of correlation between the variables.
 The correlation is said to be certain when the value of ‘r’ is more than six times the probable error; this shows that the value of ‘r’ is significant.
 By adding and subtracting the value of P.E. from the value of ‘r’, we get the upper limit and the lower limit, respectively, within which the correlation coefficient of the population is expected to lie. Symbolically, it can be expressed as
ρ = r ± P.E.
where ρ (rho) denotes the correlation in the population.

The probable Error can be used only when the following three conditions are fulfilled:
1. The data must approximate to the bell-shaped curve, i.e. a normal frequency curve.
2. The statistical measure for which the probable error is computed must have been calculated from a sample.
3. The sample items must be selected in an unbiased manner and must be independent of each other.
Thus, the probable error is calculated to check the reliability of the value of coefficient calculated from the random
sampling.
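The rules for the probable error, P.E. = 0.6745 × (1 - r²) / √N, can be written as a small function. The values r = 0.8 and N = 64 are hypothetical, and the function name `probable_error` is an illustrative choice. Using |r| lets the same checks work for negative correlations.

```python
from math import sqrt


def probable_error(r, n):
    """P.E. of a correlation coefficient: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


r, n = 0.8, 64  # hypothetical: r computed from 64 pairs of observations
pe = probable_error(r, n)
lower, upper = r - pe, r + pe

print(f"P.E. = {pe:.4f}")  # P.E. = 0.0304
print(f"rho is expected to lie between {lower:.4f} and {upper:.4f}")
if abs(r) > 6 * pe:
    print("|r| exceeds 6 x P.E.: correlation is significant")
elif abs(r) < pe:
    print("|r| is below P.E.: correlation is not significant")
else:
    print("|r| lies between P.E. and 6 x P.E.: no firm conclusion")
```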
What is the standard error?

The standard error (SE) is very similar to the standard deviation. Both are measures of spread: the higher the number, the more spread out the values are. But there is one important difference. The standard deviation describes the spread of individual observations, while the standard error describes the spread of a sample statistic (such as the sample mean) from sample to sample, i.e. it is the standard deviation of that statistic’s sampling distribution.
In statistics, you’ll come across terms like “the standard error of the mean” or “the standard error of the median.” The SE tells you how far your sample statistic (like the sample mean) is likely to deviate from the actual population value. The larger your sample size, the smaller the SE. In other words, the larger your sample size, the closer your sample mean is likely to be to the actual population mean.
What is the SE Calculation?

How you find the standard error depends on which statistic you need; for example, the calculation is different for the mean and for a proportion. When you are asked to find the “sample error”, you’re probably being asked for the standard error. The standard error of the mean uses the formula s/√n, where s is the sample standard deviation and n is the sample size; for a sample proportion p, the standard error is √(p(1 - p)/n).
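The two standard-error formulas, s/√n for the mean and √(p(1 - p)/n) for a proportion, are sketched below. The sample values and the function names are hypothetical; `statistics.stdev` uses the n - 1 divisor, matching the sample standard deviation s.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample):
    """SE of the mean = s / sqrt(n), with s the sample standard deviation."""
    return stdev(sample) / sqrt(len(sample))


def standard_error_of_proportion(p, n):
    """SE of a sample proportion = sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


sample = [4, 6, 8, 10, 12]  # hypothetical observations
print(round(standard_error_of_mean(sample), 4))          # 1.4142 (= √10 / √5)
print(round(standard_error_of_proportion(0.5, 100), 4))  # 0.05
```

Repeating the same observations in a larger sample shrinks the SE, in line with the point that larger samples give more precise estimates.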
Important Formulas:-
