4th QTR Lecture
DEFINITION OF TERMS:
1.) Hypothesis - a conjecture or statement that aims to explain certain phenomena in the real world.
2.) Hypothesis testing (or significance testing) - a method for testing a claim or hypothesis about a population parameter using
data measured from a sample.
3.) Test statistic - a value computed from the sample data by a formula and compared with a critical value (z or t).
4.) Critical values - values that separate the rejection region from the non-rejection region.
5.) Rejection region (critical region) - the set of values of the test statistic that leads to
rejection of the null hypothesis.
6.) Non-rejection region (acceptance region) - the set of values not in the rejection region; these lead to non-rejection of the null
hypothesis.
7.) Significance level (α) - the probability of committing a Type I error, that is, of rejecting a null hypothesis that is true.
Hypothesis testing is a process of gathering evidence to either support or rebut a claim known as a hypothesis. In this method,
we test a hypothesis by determining how likely it is that the observed sample statistic would have been obtained if the hypothesis
regarding the population parameter were true.
Based on the decision matrix, the four possible outcomes when we reject or
accept (fail to reject) the null hypothesis are:
1. The null hypothesis is accepted when, in fact, it is TRUE. (Correct decision)
2. The null hypothesis is accepted when, in fact, it is FALSE. (Type II error)
3. The null hypothesis is rejected when, in fact, it is TRUE. (Type I error)
4. The null hypothesis is rejected when, in fact, it is FALSE. (Correct decision)
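The four outcomes can be sketched as a small lookup in Python. This is only an illustration: the function name classify_outcome and its two arguments are hypothetical and do not come from the lecture.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Name the outcome for one cell of the decision matrix."""
    if null_is_true and null_rejected:
        # Rejecting a true null hypothesis
        return "Type I error"
    if not null_is_true and not null_rejected:
        # Failing to reject a false null hypothesis
        return "Type II error"
    # Either a true null was kept or a false null was rejected
    return "Correct decision"


# Walk through all four cells of the matrix
for null_is_true in (True, False):
    for null_rejected in (False, True):
        truth = "TRUE" if null_is_true else "FALSE"
        decision = "rejected" if null_rejected else "accepted"
        print(f"H0 is {truth}, H0 {decision}: "
              f"{classify_outcome(null_is_true, null_rejected)}")
```

In practice we never know whether the null hypothesis is really true, so the matrix describes the risks of a decision rather than something we can check after the test.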
Left-Tailed Test
It is used when the parameter is believed to be lower than the hypothesized value. The alternative hypothesis (H1) must contain
the < symbol to use this test.
Right-Tailed Test
It is used when the parameter is believed to be greater than the hypothesized value. The alternative hypothesis (H1) must contain
the > symbol to use this test.
Two-Tailed Test
Two-tailed hypothesis tests are also known as nondirectional or two-sided tests because you can test for effects in both
directions. When you perform a two-tailed test, you split the significance level equally between the two tails of the distribution
(α/2 in each tail). A test is called a TWO-TAILED TEST if the rejection region is located on both ends of the distribution. It is
used when the alternative hypothesis (H1) contains the ≠ symbol. The alternative hypothesis does not state the direction of the
difference between a parameter and its hypothesized value.
Significance Level
The commonly used significance levels are α = 0.05 and α = 0.01. If no level of significance is given, use α = 0.05.
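The type of test and the significance level together fix the critical value(s). Below is a minimal Python sketch, assuming a z-test so that the standard normal distribution applies. The function name critical_z and the tail labels "left", "right", and "two" are illustrative choices, not part of the lecture.

```python
from statistics import NormalDist


def critical_z(alpha: float = 0.05, tail: str = "two") -> tuple:
    """Return the critical z value(s) for the given tail type.

    Assumes the test statistic follows the standard normal distribution.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        # Entire rejection region (area alpha) in the left tail
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        # Entire rejection region (area alpha) in the right tail
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # Area alpha is split equally: alpha/2 in each tail
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    values = ", ".join(f"{z:.3f}" for z in critical_z(0.05, tail))
    print(f"{tail}-tailed, alpha = 0.05: {values}")
```

For α = 0.05 this gives approximately -1.645 (left-tailed), 1.645 (right-tailed), and ±1.960 (two-tailed), which match the values in a standard z table.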
TEST STATISTIC
The two test statistics are z and t, but only one of them is used in any given hypothesis test.
• If the population standard deviation (σ) is known or given, then we use the z-test.
• If the population standard deviation (σ) is unknown, but the sample standard deviation (s) is given, then we use the t-test.
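The selection rule above can be sketched in Python. This sketch assumes a test about a single population mean, with the usual one-sample forms z = (x̄ − μ) / (σ / √n) and t = (x̄ − μ) / (s / √n). The function name, parameter names, and numbers are hypothetical and serve only as an illustration.

```python
from math import sqrt


def compute_test_statistic(sample_mean, hypothesized_mean, n, sigma=None, s=None):
    """Pick z or t according to which standard deviation is available.

    Assumes a one-sample test about a population mean.
    Returns a (name, value) pair such as ("z", 2.0).
    """
    if sigma is not None:
        # Population standard deviation known: z-test
        return "z", (sample_mean - hypothesized_mean) / (sigma / sqrt(n))
    if s is not None:
        # Only the sample standard deviation is available: t-test
        return "t", (sample_mean - hypothesized_mean) / (s / sqrt(n))
    raise ValueError("provide either sigma or s")


# Hypothetical data: sample mean 52, hypothesized mean 50, n = 25
print(compute_test_statistic(52, 50, 25, sigma=5))  # σ is given, so z is used
print(compute_test_statistic(52, 50, 25, s=4))      # only s is given, so t is used
```

The computed value is then compared with the critical value from the z table or from the t table; the t table is conventionally read with n − 1 degrees of freedom.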
DEFINITION OF TERMS
1. CORRELATION - It is the extent to which two variables are related. If the two variables are highly related, then knowing the
value of one of them allows you to predict the other with considerable accuracy (this prediction is the job of regression analysis).
2. CORRELATION ANALYSIS - It is a statistical method used to determine the relationship between two variables (bivariate
data) in terms of strength and direction. The goal of a correlation analysis is to see whether two quantitative variables covary,
and to quantify the strength of the relationship between the variables.
3. DIRECTION OF CORRELATION
• Positive Correlation – exists when high values of one variable correspond to high values of the other variable, and low values
correspond to low values.
Example: number of family members and expenses; height and shoe size; age and weight
• Negative Correlation – exists when high values of one variable correspond to low values of the other variable, and vice versa.
Example: expenses and savings; number of absences and grades; number of cigarettes consumed and age at death
• Zero Correlation – exists when high values of one variable correspond to either high or low values of the other variable, with
no consistent pattern.
Example: height and grade; scores in Filipino and scores in PE
4. STRENGTH OF CORRELATION
The strength of correlation between two variables may be perfect, very high, moderately high, moderately low, very low, or
zero.
Pearson r               Qualitative Description
±1                      Perfect
±0.75 to < ±1           Very High
±0.50 to < ±0.75        Moderately High
±0.25 to < ±0.50        Moderately Low
> 0 to < ±0.25          Very Low
0                       No correlation
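The table can be read as a lookup on the absolute value of r. Here is a minimal Python sketch of that lookup; the function name describe_strength is an illustrative choice, and the cutoffs are taken directly from the table above.

```python
def describe_strength(r: float) -> str:
    """Return the qualitative description of a Pearson r value."""
    if not -1 <= r <= 1:
        raise ValueError("Pearson r must lie between -1 and 1")
    size = abs(r)  # strength ignores the sign; the sign gives the direction
    if size == 1:
        return "Perfect"
    if size >= 0.75:
        return "Very High"
    if size >= 0.50:
        return "Moderately High"
    if size >= 0.25:
        return "Moderately Low"
    if size > 0:
        return "Very Low"
    return "No correlation"


for r in (1, -0.82, 0.50, -0.3, 0.1, 0):
    print(r, describe_strength(r))
```

Because only the absolute value is used, r = -0.82 and r = 0.82 are both "Very High"; they differ only in direction.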
SCATTERPLOT DIAGRAM AND PEARSON r
1. Scatterplot Diagram
It is a point graph of all the scores taken from bivariate data. A scatter plot is sometimes written as one word,
scatterplot, and is also called a scatter graph or scatter diagram.
It shows how the points collected from a set of bivariate data are scattered on the Cartesian plane, which allows us to
see the relation between the two variables visually. The independent variable is plotted on the x-axis (abscissa) and the
dependent variable on the y-axis (ordinate). In other words, it is common to place the variable you are attempting to predict
on the ordinate.
NOTE:
• Direction is determined by the slope of the trend line. The trend line is the line closest to the points. Strength is indicated by
the closeness of the points to the trend line: the closer the points are to the trend line, the stronger the relationship.
• The absolute value of r indicates the strength or magnitude of correlation between two variables. The direction of correlation
is indicated by the sign (positive or negative) of r.
• If the trend line contains all the points in the scatterplot and the line rises from left to right, we conclude that there is a
perfect positive correlation between the two variables. The computed r is 1.
• If all the points fall on a trend line that falls from left to right, then there exists a perfect negative correlation between the
pair of variables. The computed r is –1.
• If a trend line does not exist, there is no correlation between the pair of variables. This is confirmed when the computed value
of r is 0.
Whenever we describe correlation between two variables, we should always describe it in terms of strength and direction. So,
we can have a perfect positive correlation, perfect negative correlation, moderately high positive correlation, moderately high
negative correlation, and so on.
NOTE: Causation means a cause-and-effect relation. “Correlation does not imply causation” means that correlation alone cannot be
used to infer a causal relationship between the variables. A simple example is that sales of personal computers and athletic shoes
have both risen strongly in the last several years and there is a high correlation between them, but you cannot assume that
buying computers causes people to buy athletic shoes (or vice versa).
FORMULAS
where:
n – number of ordered pairs in the data (pairs of x and y
values)
x – independent variable
y – dependent variable
y’ – predicted value of dependent variable
a – y-intercept of the regression line
b – slope of the regression line
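The symbols defined above fit the standard computational forms of Pearson r and the least-squares regression line. The block below is a sketch that assumes these usual textbook forms; the lecture's own formulas may be arranged differently but equivalently. All sums run over the n ordered pairs.

```latex
% Pearson product-moment correlation coefficient
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}

% Regression line (predicted value of the dependent variable)
y' = a + bx

% Slope of the regression line
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}

% y-intercept of the regression line
a = \frac{\sum y - b\sum x}{n}
```

The numerators of r and b are identical, so r and the slope b always have the same sign.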
Additional Information
Using the Desmos app on a phone or the website desmos.com, we can graph the points given in the data and the regression
line. Remember, regression analysis should only be used on variables with a linear relationship.
The dependent variable can be influenced by the independent variable, but there are also other variables that can affect
it. Correlation does not imply causation.
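As an alternative to graphing, r and the regression line can also be computed directly. The Python sketch below assumes the standard computational formulas for Pearson r and for the least-squares line y' = a + bx. The function name and the data are hypothetical and used only for illustration.

```python
from math import sqrt


def pearson_r_and_line(xs, ys):
    """Return (r, a, b) for paired data, where y' = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    sxy = n * sum_xy - sum_x * sum_y   # shared numerator of r and b
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y never varies")

    r = sxy / sqrt(sxx * syy)          # Pearson r
    b = sxy / sxx                      # slope of the regression line
    a = (sum_y - b * sum_x) / n        # y-intercept of the regression line
    return r, a, b


# Hypothetical bivariate data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, a, b = pearson_r_and_line(xs, ys)
print(f"r = {r:.4f}")
print(f"y' = {a:.1f} + {b:.1f}x")
print(f"predicted y at x = 6: {a + b * 6:.1f}")
```

For this data r is about 0.7746, a very high positive correlation, and the regression line is y' = 2.2 + 0.6x. The prediction at x = 6 is meaningful only if the linear pattern continues beyond the observed data.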