4th QTR Lecture
Reviewer for the 4th Quarter in Statistics and Probability

CHAPTER 1: HYPOTHESIS FORMULATION

DEFINITION OF TERMS:
1.) Hypothesis - is a conjecture or statement which aims to explain certain phenomena in the real world.
2.) Hypothesis testing – or significance testing is a method used for testing a claim or hypothesis about a parameter using the
data measured from a sample.
3.) Test statistic is a value determined by a formula that is compared with a critical value (z or t).
4.) Critical values are values that separate the rejection region and the non-rejection region.
5.) Rejection region (Critical region) refers to the region which contains the set of values for the test statistic that leads to
rejection of the null hypothesis.
6.) Non-rejection region (Acceptance region) refers to the set of values not in the rejection region, which leads to non-rejection of the null
hypothesis.
7.) Significance level (α) is the probability of committing a Type I error.

Hypothesis testing is a process of gathering evidence to either support or rebut a claim known as a hypothesis. In this method,
we test a hypothesis by determining how likely it would be to obtain the observed sample statistic if the hypothesis regarding
the population parameter were true.

Type I and Type II Errors

Based on the decision matrix, the four possible outcomes when we reject or
accept the null hypothesis are:
1. The null hypothesis is accepted when, in fact, it is TRUE. (correct decision)
2. The null hypothesis is accepted when, in fact, it is FALSE. (Type II error)
3. The null hypothesis is rejected when, in fact, it is TRUE. (Type I error)
4. The null hypothesis is rejected when, in fact, it is FALSE. (correct decision)

One-tailed and Two-tailed Tests


A one-tailed hypothesis test is also called a directional test because the hypothesis states the direction of the difference between
a parameter and its hypothesized value. This test is used when the alternative hypothesis, H1, utilizes the > or < symbol.
A one-tailed test can be right-tailed or left-tailed; the direction depends on the alternative hypothesis.

Left-Tailed Test
It is used when the parameter is believed to be lower than the hypothesized value. The alternative hypothesis (H1) must contain
the < symbol to use this test.
Right-Tailed Test
It is used when the parameter is believed to be greater than the hypothesized value. The alternative hypothesis (H1) must contain
the > symbol to use this test.

Two-Tailed Test
Two-tailed hypothesis tests are also known as nondirectional and two-sided tests because you can test for effects in both
directions. When you perform a two-tailed test, you split the significance level percentage between both tails of the distribution.
A test is called a TWO-TAILED TEST if the rejection region is located on both ends of the distribution. It is used when the
alternative hypothesis (H1) utilizes the ≠ symbol. The hypothesis does not state the direction of the difference between a parameter and
its hypothesized value.

Significance Level
The commonly used significance levels are α = 0.05 and α = 0.01. If no level of significance is given, use α = 0.05.

CHAPTER 2: HYPOTHESIS TESTING ON POPULATION MEAN USING TRADITIONAL METHOD


Definition of Terms
Rejection region (Critical region) refers to the region which contains the set of values for the test statistic that leads to
rejection of H0.
Non-rejection region (Acceptance region) refers to the region which contains the set of values not in the rejection region that
leads to non-rejection of H0.
Critical values are values that separate the rejection region and the non-rejection region. There are z-critical values and t-critical
values, but only one should be used in hypothesis testing, depending on the given conditions.
Significance level (α) is the probability of committing Type I error. It is also the area of the rejection region. The commonly used
significance levels are 0.01 and 0.05.

REJECTION REGION, NON-REJECTION REGION, AND CRITICAL VALUES


For two-tailed tests:
• There are two rejection regions which are located on both sides of the curve.
• Each rejection region has an area of α/2.
• There are also two critical values (one positive and one negative) which separate the rejection regions and the non-
rejection region.
• The z-critical values depend on the significance level used in hypothesis testing. These values are fixed and can be derived
from the z-table.
• The t-critical values depend on the significance level and the degrees of freedom. These values can be obtained from the t-
table.
(Formula: df = n – 1 where df – degrees of freedom, n - sample size)

For one-tailed tests:


• There is only one rejection region for a one-
tailed test. If the test is left-tailed, the rejection
region is found on the left. If the test is right-
tailed, the rejection region is found on the right.
• The rejection region has an area of α
(significance level).
• The critical value is negative for a left-tailed test,
and positive for a right-tailed test.
• The z-critical values depend on the significance
level used in hypothesis testing. These values
are fixed and can be derived from the z-table.
• The t-critical values depend on the significance
level and the degrees of freedom. These values
can be obtained from the t-table.
(Formula: df = n – 1 where df – degrees of freedom, n - sample size)
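
For reference only (not part of the original reviewer), here is a minimal Python sketch showing how the z- and t-critical values described above could be computed with SciPy instead of being read from the printed tables; the sample size n = 25 is an assumed example.

    # Illustrative sketch: z- and t-critical values for the tests described above.
    from scipy import stats

    alpha = 0.05   # significance level
    n = 25         # hypothetical sample size
    df = n - 1     # degrees of freedom for the t-test

    # Two-tailed test: each rejection region has an area of alpha/2
    z_crit_two = stats.norm.ppf(1 - alpha / 2)    # about 1.96
    t_crit_two = stats.t.ppf(1 - alpha / 2, df)   # about 2.064 for df = 24

    # One-tailed test: the single rejection region has an area of alpha
    z_crit_right = stats.norm.ppf(1 - alpha)      # about 1.645 (negate for a left-tailed test)
    t_crit_right = stats.t.ppf(1 - alpha, df)     # about 1.711 (negate for a left-tailed test)

    print(z_crit_two, t_crit_two, z_crit_right, t_crit_right)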

TEST STATISTIC
The two test statistics are z and t, but only one should be used in hypothesis testing.
• If the population standard deviation (σ) is known or given, then we use z-test.
• If the population standard deviation (σ) is unknown, but the sample standard deviation (s) is given, then we use t-test.

The formulas for the test statistics are:

z = (x̅ – μ) / (σ / √n)          t = (x̅ – μ) / (s / √n), with df = n – 1

where:
x̅ = sample mean
σ = population standard deviation
s = sample standard deviation
n = sample size
μ = hypothesized value of population mean
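
As a hedged illustration of the formulas above, the short Python sketch below computes both test statistics for made-up sample figures (x̅ = 82.5, μ = 80, σ = 6, s = 6.4, n = 36); none of these numbers come from the reviewer.

    # Illustrative sketch only: computing the z- and t-test statistics defined above.
    import math

    x_bar = 82.5   # sample mean
    mu    = 80.0   # hypothesized population mean
    sigma = 6.0    # population standard deviation (use the z-test when this is known)
    s     = 6.4    # sample standard deviation (use the t-test when sigma is unknown)
    n     = 36     # sample size

    z = (x_bar - mu) / (sigma / math.sqrt(n))   # z = (x̅ - μ) / (σ / √n)
    t = (x_bar - mu) / (s / math.sqrt(n))       # t = (x̅ - μ) / (s / √n)
    print(round(z, 2), round(t, 2))             # 2.5 and about 2.34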

HYPOTHESIS TESTING ON ONE SAMPLE MEAN


One-Sample z-Test or t-Test
A one-sample test is a test conducted on one sample purportedly coming from a population with mean μ. It is sometimes called a
significance test for a single mean.
Two cases to consider for testing the mean of a single population:
1. The sample must come from a normally distributed population (or the sample size must be large enough for the Central Limit
Theorem to apply), so the normal curve can be used as a model.
2. Use the z-test if the population standard deviation (σ) is known. If σ is unknown but s is known, use the t-test.

Steps in Traditional Method of Hypothesis Testing:


1. Formulate the Null and Alternative Hypotheses
Formulate the null and alternative hypotheses in statement and in symbol, describing the population mean.
Two-Tailed Test:    H0: μ = μ0    H1: μ ≠ μ0
Right-Tailed Test:  H0: μ ≤ μ0    H1: μ > μ0
Left-Tailed Test:   H0: μ ≥ μ0    H1: μ < μ0
2. Determine the type of test, significance level (α), and critical value/s (z/tcrit)
Type of test: left-tailed, right-tailed, two-tailed
α = 0.05 or 0.01
z/tcrit = refer to the table in lesson 1
3. Compute the test statistic (z or t)
4. State the Decision Rule
For a left-tailed test: If z/t comp ≤ –z/t crit, reject H0. Otherwise, do not reject H0.
For a right-tailed test: If z/t comp ≥ +z/t crit, reject H0. Otherwise, do not reject H0.
For a two-tailed test: If z/t comp ≤ –z/t crit or z/t comp ≥ +z/t crit, reject H0. Otherwise, do not reject H0.
Note: The decision rule to be used depends on the type of test. The decision rule is a guide for determining when to reject
or not reject the null hypothesis. You can draw the normal curve to show whether the computed statistic lies in the
rejection region or the non-rejection region to support the decision.
5. Make the decision on null hypothesis
Before you can make a decision, first compare the computed value of the test statistic (z or t) with the critical
value (z/t crit). If the decision rule is satisfied based on this comparison, REJECT the null hypothesis. If the rule is not
satisfied, DO NOT REJECT the null hypothesis.
6. Write a conclusion based on the claim
• If the null hypothesis is rejected, the alternative hypothesis is supported. (Just restate the alternative hypothesis
in words as the conclusion.)
• If the null hypothesis is not rejected, it is retained and there is no significant difference between the population
mean and the sample mean. (Just restate the null hypothesis in words as the conclusion.)
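
To tie the six steps together, here is a hedged end-to-end sketch of the traditional method for a two-tailed one-sample z-test in Python; the hypothesized mean (μ0 = 100), the sample figures, and α = 0.05 are assumed purely for illustration and do not come from the reviewer.

    # Hedged sketch of the six-step traditional method for a one-sample z-test.
    from scipy import stats
    import math

    # Step 1: H0: μ = 100   vs.   H1: μ ≠ 100  (two-tailed)
    mu0, x_bar, sigma, n = 100, 103, 8, 40

    # Step 2: two-tailed test at α = 0.05, so the critical values are ±z_crit
    alpha = 0.05
    z_crit = stats.norm.ppf(1 - alpha / 2)           # about 1.96

    # Step 3: compute the test statistic
    z_comp = (x_bar - mu0) / (sigma / math.sqrt(n))  # about 2.37

    # Steps 4-5: decision rule for a two-tailed test
    reject_h0 = z_comp <= -z_crit or z_comp >= z_crit

    # Step 6: conclusion based on the claim
    print("Reject H0" if reject_h0 else "Do not reject H0")  # Reject H0 in this example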

CHAPTER 3: HYPOTHESIS TESTING ON POPULATION PROPORTION


Definition of Terms:
1) Proportion – is a fraction with the number of favorable responses in the numerator and the total number of
respondents in the denominator.
2) For a sample proportion, we shall use the following formulas:
p̂ = X / n   and   q̂ = 1 − p̂
Where,
n = number of observations in a simple random sample
p̂ = sample proportion (read “p hat”)
X = desired outcomes
3) Properties of the Sampling Distribution of p̂ :
a. The mean of the sampling distribution of p̂ is p. in other words, p̂ is the unbiased estimator of p.

b. The standard deviation of the sampling distribution of p̂ is σp̂ = √(pq / n), where q = 1 − p.
c. When n is large, the sampling distribution of p̂ is approximately normal. A sample size is considered large
if the interval p̂ ± 3σp̂ does not include 0 (which it may if p is very small, like p = 0.001) or 1 (which it may if p is very large, like p = 0.99).
Z-TEST FOR PROPORTIONS

Steps in hypothesis testing on population proportion:


1. Formulate the null and alternative hypotheses in statement and in symbol, describing the population parameter of interest: the proportion.
2. State the type of test, α level, and critical value/s.
3. Compute the appropriate test statistic (z).
4. State the Decision Rule
5. State the decision on the null hypothesis.
6. Construct your conclusion
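
A minimal Python sketch of these steps for a right-tailed z-test on a proportion follows; the hypothesized proportion p0 = 0.50, sample size n = 200, and count X = 112 are assumed values, not data from the reviewer.

    # Hedged sketch: z-test for one population proportion, following the steps above.
    from scipy import stats
    import math

    p0, n, X = 0.50, 200, 112          # H0: p = 0.50 vs. H1: p > 0.50 (right-tailed)
    alpha = 0.05

    p_hat = X / n                      # sample proportion, p̂ = X/n = 0.56
    q0 = 1 - p0
    z_comp = (p_hat - p0) / math.sqrt(p0 * q0 / n)   # standard z statistic for a proportion
    z_crit = stats.norm.ppf(1 - alpha)               # about 1.645

    print(round(z_comp, 2), "Reject H0" if z_comp >= z_crit else "Do not reject H0")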

CHAPTER 4: CORRELATION ANALYSIS


In the previous study of statistics, we dealt with data that involve a single variable. These are called univariate data.
Since we are dealing with a single variable independently of other variables, the only statistical option is to
describe it in terms of central tendency, variation, or other descriptive statistics.
Bivariate data are data that involve two variables as different from univariate data that involve only a single variable. In
univariate data, the major purpose of the analysis is to describe based on the descriptive statistics computed such as averages,
standard deviations, frequency counts and the like.
In bivariate data, the purpose of the analysis is to describe relationships where new statistical methods will be
introduced. We will be describing relationships between related variables in terms of strength and direction.

DEFINITION OF TERMS
1. CORRELATION - It is the extent to which two variables are related. If the two variables are highly related, then knowing the
value of one of them will allow you to predict the other variable with considerable accuracy (regression analysis).
2. CORRELATION ANALYSIS - It is a statistical method used to determine the relationship between two variables (bivariate
data) in terms of strength and direction. The goal of a correlation analysis is to see whether two quantitative variables covary,
and to quantify the strength of the relationship between the variables.
3. DIRECTION OF CORRELATION
• Positive Correlation – exists when high values of one variable correspond to high values in the other variable or vice versa.
Example: no. of family members and expenses; height and shoe size; age and weight
• Negative Correlation – exists when high values in one variable correspond to low values in the other variable or vice versa.
Example: expenses and savings; no. of absences and grades; no. of cigarettes consumed and age at death
• Zero Correlation – exists when there is no consistent pattern: high values in one variable correspond to either high or low values in the other variable.
Example: height and grade; scores in Filipino and scores in PE
4. STRENGTH OF CORRELATION
The strength of correlation between two variables may be perfect, very high, moderately high, moderately low, very low, or
zero.
Pearson r Qualitative Description
±1 Perfect
± 0.75 to < ± 1 Very High
± 0.50 to < ± 0.75 Moderately High
± 0.25 to < ± 0.50 Moderately Low
> 0 to < ± 0.25 Very Low
0 No correlation
SCATTERPLOT DIAGRAM AND PEARSON r

1. Scatterplot Diagram
It is a point-graph of all the scores taken from bivariate data. A scatter plot is sometimes written as one-word,
scatterplot and is also called scatter graph or scatter diagram.
It shows how each point collected from a set of bivariate data are scattered on the Cartesian plane. It allows us to
visually see the relation between two variables. Independent variable is plotted on the x-axis and dependent variable on the y-
axis. It allows us to visually see the relation between two variables. One variable is plotted on the ordinate(y) and the other on
the abscissa(x). It is common to place the variable you are attempting to predict on the ordinate.
NOTE:
• Direction is determined by the slope of the trend line, the line closest to the points. Strength is indicated by the
closeness of the points to the trend line: the closer the points are to the trend line, the stronger the relationship.
• The absolute value of r indicates the strength or magnitude of the correlation between the two variables. The direction of
the correlation is indicated by the sign (positive or negative) of r.
• If the trend line contains all the points in the scatterplot and the line points to the right, we conclude that there is a
perfect positive correlation between the two variables. The computed r is 1.
• If all the points fall on a trend line that points to the left, then there exists a perfect negative correlation between the
pair of variables. The computed r is –1.
• If a trend line does not exist, there is no correlation between the pair of variables. This is confirmed by a computed r of 0.

Whenever we describe correlation between two variables, we should always describe it in terms of strength and direction. So,
we can have a perfect positive correlation, perfect negative correlation, moderately high positive correlation, moderately high
negative correlation, and so on.
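
As an optional illustration (with assumed data, not from the reviewer), the following Python sketch draws a scatterplot and its trend line with numpy and matplotlib, mirroring the description above.

    # Illustrative sketch: scatterplot with a fitted trend line.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([1, 2, 3, 4, 5, 6])                 # independent variable (abscissa)
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])   # dependent variable (ordinate)

    b, a = np.polyfit(x, y, 1)         # slope and intercept of the trend line

    plt.scatter(x, y, label="data points")
    plt.plot(x, b * x + a, label="trend line")
    plt.xlabel("x (independent)")
    plt.ylabel("y (dependent)")
    plt.legend()
    plt.show()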

2. Pearson Product-Moment Correlation Coefficient (Pearson r)


• It is the most commonly used statistic to measure the degree of relationship between two variables (scalar). It evaluates the
linear relationship between two variables.
• The formula is:
r = [nΣxy – (Σx)(Σy)] / √{[nΣx² – (Σx)²][nΣy² – (Σy)²]}
Where
n = no. of respondents (pairs of values)
Σxy = summation of the products of the paired x and y values
Σx = summation of the x values
Σy = summation of the y values
Σx² = summation of the squared x values
Σy² = summation of the squared y values
(Σx)² = square of the summation of the x values
(Σy)² = square of the summation of the y values
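
A short Python sketch of the Pearson r computation from the column totals named above follows; the six (x, y) pairs are made-up illustration data.

    # Hedged sketch: Pearson r computed directly from the column totals in the formula.
    import math

    x = [1, 2, 3, 4, 5, 6]
    y = [2, 4, 5, 7, 9, 10]
    n = len(x)

    sum_x  = sum(x)
    sum_y  = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)

    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    print(round(r, 4))   # close to +1: a very high positive correlation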

NOTE: Causation means a cause-and-effect relation. “Correlation does not imply causation” means that correlation alone cannot be
used to infer a causal relationship between the variables. A simple example: sales of personal computers and athletic shoes
have both risen strongly in the last several years, and there is a high correlation between them, but you cannot assume that
buying computers causes people to buy athletic shoes (or vice versa).

CHAPTER 5: REGRESSION ANALYSIS


INDEPENDENT AND DEPENDENT VARIABLES
1. Independent Variable
o It is a variable that stands alone and is not affected by other variables.
o It is regarded as the predictor.
o It is represented by x symbol.
2. Dependent Variable
o It is a variable whose value depends on other factors or variables.
o It is regarded as the outcome.
o It is represented by the y symbol.
REGRESSION ANALYSIS

Regression Analysis is a statistical method used to predict the value of the dependent variable using the given value of the
independent variable. It can only be performed if there is a significant relationship between the two variables, which can be
observed through correlation analysis and hypothesis testing.

FORMULAS

y' = bx + a
b = [nΣxy – (Σx)(Σy)] / [nΣx² – (Σx)²]
a = (Σy – bΣx) / n

where:
n – number of ordered pairs in the data (pairs of x and y
values)
x – independent variable
y – dependent variable
y’ – predicted value of dependent variable
a – y-intercept of the regression line
b – slope of the regression line

STEPS IN SOLVING THE REGRESSION LINE EQUATION


STEP 1: Construct the table for x, y, x², y², and xy.
REMEMBER
✓ Identify first the independent and dependent variables. (In the example used in this reviewer, the independent variable x is the price of gas, and the
dependent variable y is the price of pork.)
✓ For columns x and y, copy each pair of x-y values in the given data. They cannot be interchanged!
✓ For column x², just get the square of each x value.
✓ For column y², just get the square of each y value.
✓ For column xy, just get the product of each pair of x and y values.
✓ Lastly, get the total value for each column. The total values will be used in the formulas.
*The Σ (sigma) symbol represents the summation of all the values in a particular column.
STEP 2: Compute the y-intercept (a) and slope (b) of the regression line.
REMEMBER!
✓ Just substitute in the formula the corresponding total value we obtained in the last row of the table in Step 1.
STEP 3: Substitute the values of a and b into the regression line equation.
The regression line equation is y' = bx + a. Substitute the values of a and b obtained in Step 2.
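
A hedged Python sketch of Steps 1-3 follows, using illustrative (x, y) data rather than the gas-price and pork-price figures mentioned in this reviewer.

    # Hedged sketch: slope b, y-intercept a, and a predicted value y' from column totals.
    x = [10, 12, 14, 16, 18]
    y = [25, 28, 33, 36, 40]
    n = len(x)

    sum_x  = sum(x)
    sum_y  = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    a = (sum_y - b * sum_x) / n                                    # y-intercept

    x_new = 20
    y_pred = b * x_new + a          # regression line equation y' = bx + a
    print(round(b, 2), round(a, 2), round(y_pred, 2))   # 1.9, 5.8, 43.8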

Additional Information
Using the Desmos app on a phone or the website desmos.com, we can graph the points given in the data and the regression
line. Remember, regression analysis should only be used on variables with a linear relationship.
The dependent variable can be influenced by the independent variable, but other variables can also affect it.
Correlation does not imply causation.
