Course Number: STA 240 Course Name: Statistics Course Instructor: Tamanna Siddiqua Ratna. (Lecturer)
Correlation Analysis
A statistical measure that determines the direction and the
strength (degree) of linear association between two
or more variables.
The correlation coefficient is the quantitative measure
of correlation.
The sample correlation coefficient is denoted by r.
Correlation
Simple Correlation
Assumptions of Simple Correlation
The relationship between the variables is linear.
The variables are measured on an interval or ratio scale.
The variables follow a bivariate normal distribution.
Properties of Correlation Coefficient (r)
r lies between -1 and 1, inclusive.
r is a symmetric measure, i.e. r_xy = r_yx.
r is a dimensionless quantity, i.e. it is unit free.
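The symmetry and unit-free properties can be checked numerically. The sketch below is only an illustration: the function name pearson_r and the height/weight figures are hypothetical and do not come from the course material, and r is computed with the usual sums-based formula.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_xx - sum_x**2 / n) * (sum_yy - sum_y**2 / n))
    return numerator / denominator

# Hypothetical data, for illustration only.
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 77]

r_xy = pearson_r(heights_cm, weights_kg)
r_yx = pearson_r(weights_kg, heights_cm)   # roles of x and y swapped

# Change the units: centimetres -> metres, kilograms -> grams.
heights_m = [h / 100 for h in heights_cm]
weights_g = [w * 1000 for w in weights_kg]
r_rescaled = pearson_r(heights_m, weights_g)

print(r_xy, r_yx, r_rescaled)
```

Swapping the two variables or changing their units should leave the printed value unchanged, up to floating-point rounding.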
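If r = 0, there is no linear association between the two variables.
If 0 < |r| < 0.25, the correlation is weak.
If 0.25 ≤ |r| < 0.75, the correlation is intermediate.
If 0.75 ≤ |r| < 1, the correlation is strong.
If |r| = 1, the correlation is perfect.
The sign of r gives the direction of the relationship.
[Figure: scale of r from -1 to 1, labelled strong, intermediate, weak on the negative side and weak, intermediate, strong on the positive side.]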
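The interpretation scale can be written as a small helper. This is a sketch under one assumption: the cut-offs are applied to the absolute value of r, as the symmetric scale in the figure suggests. The function name describe_correlation is hypothetical.

```python
def describe_correlation(r):
    """Label r using the cut-offs 0.25 and 0.75 from the slide."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # assumption: the scale is symmetric around zero
    if size == 0:
        return "no linear correlation"
    if size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    elif size < 1:
        strength = "strong"
    else:
        strength = "perfect"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

for value in (0, 0.1, -0.5, 0.9, -1):
    print(value, "->", describe_correlation(value))
```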
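Convenient Form to Calculate the Correlation Coefficient (r):
\[
r = \frac{\sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n}}
         {\sqrt{\left[\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}\right]
                \left[\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}\right]}}
\]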
Example 01: Consider the study hours of ten IUBAT students and their CGPA in a semester:
Study time in hours (x)   6     7     8     5     6     7     2     3     1     4
CGPA (y)                  3.78  3.80  3.75  3.50  3.20  3.72  3.20  3.50  3.20  3.61
Solution:
a. Scatter diagram of study time and CGPA
[Figure: scatter plot of CGPA (vertical axis, 3.1 to 3.9) against study time in hours, x (horizontal axis, 0 to 9).]
b. Working table:
x_i     y_i      x_i^2   y_i^2     x_i y_i
6       3.78     36      14.2884   22.68
7       3.80     49      14.44     26.60
8       3.75     64      14.0625   30.00
5       3.50     25      12.25     17.50
6       3.20     36      10.24     19.20
7       3.72     49      13.8384   26.04
2       3.20     4       10.24     6.40
3       3.50     9       12.25     10.50
1       3.20     1       10.24     3.20
4       3.61     16      13.0321   14.44
Sum     49       35.26   289       124.88    176.56
(The last row gives Σx_i = 49, Σy_i = 35.26, Σx_i^2 = 289, Σy_i^2 = 124.88, Σx_i y_i = 176.56.)
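Here, n = 10.
\[
r = \frac{\sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n}}
         {\sqrt{\left[\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}\right]
                \left[\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}\right]}}
\]
\[
r = \frac{176.56 - \frac{49 \times 35.26}{10}}
         {\sqrt{\left(289 - \frac{49^2}{10}\right)\left(124.88 - \frac{35.26^2}{10}\right)}}
  \approx 0.73
\]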
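The arithmetic of Example 01 can be reproduced with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the first CGPA value is read as 3.78, as in the working table.

```python
import math

# Data from Example 01 (first CGPA read as 3.78, as in the working table).
study_hours = [6, 7, 8, 5, 6, 7, 2, 3, 1, 4]
cgpa = [3.78, 3.80, 3.75, 3.50, 3.20, 3.72, 3.20, 3.50, 3.20, 3.61]

n = len(study_hours)
sum_x = sum(study_hours)
sum_y = sum(cgpa)
sum_xx = sum(x * x for x in study_hours)
sum_yy = sum(y * y for y in cgpa)
sum_xy = sum(x * y for x, y in zip(study_hours, cgpa))

# Convenient (sums-based) form of the sample correlation coefficient.
numerator = sum_xy - sum_x * sum_y / n
denominator = math.sqrt((sum_xx - sum_x**2 / n) * (sum_yy - sum_y**2 / n))
r = numerator / denominator

print(f"sum x = {sum_x}, sum y = {sum_y:.2f}, sum x^2 = {sum_xx}")
print(f"sum y^2 = {sum_yy:.2f}, sum xy = {sum_xy:.2f}")
print(f"r = {r:.2f}")
```

The printed totals can be compared with the last row of the working table before r is read off.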
Type of variable                          Statistical tool for measuring association
Quantitative (ratio/interval) variable    Pearson correlation coefficient
Multiple Correlation
Suppose a researcher collects information from 10 individuals on two
variables, x = age and y = blood pressure, with
\[
\sum x = 25, \quad \sum y = 7.5, \quad \sum x^2 = 234, \quad \sum y^2 = 435, \quad \sum xy = 102
\]
Regression Analysis
A statistical technique for studying the dependency of one
variable (called dependent variable) on one or more other
variables (called independent variables).
Types of Variables
Independent variable:
A variable whose values do not depend on any other variable.
Also called regressor, predictor, or explanatory variable.
Dependent variable:
A variable whose values are determined through the values of other variables.
Also called response, regressed, or explained variable.
Primary Objectives of Regression Analysis
Examine the effect of a set of independent variables
on the mean of the dependent variable.
Predict the mean value of the dependent variable for a
given set of independent variables.
Explain how much of the variation in the dependent
variable can be accounted for by a set of independent
variables.
Classification of Regression Models
Simple Regression:
Involves only one independent variable.
Multiple Regression:
Involves more than one independent variable.
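Consider the CGPA of an IUBAT student (y), his or her study hours (x1),
time spent on social media (x2), and attendance in classes (x3).
Then a simple regression model is
\[ y = \alpha + \beta x_1 + \varepsilon \]
and a multiple regression model is
\[ y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \]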
Simple Linear Regression
If a regression equation involves only two variables, one
independent and one dependent, it is called a simple linear
regression.
The model is linear in both the variable and the parameters.
Let y be the dependent variable and x be the independent variable.
Then the simple regression equation is
\[ y = \alpha + \beta x + \varepsilon \]
where
α = intercept
β = regression coefficient (slope)
ε = random error
[Figure: graph with origin O, X on the horizontal axis and Y on the vertical axis.]
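Example
Crop production on a certain farm and the amount of fertilizer
used on that farm. The model is
production = α + β × (amount of fertilizer used) + ε
The coefficients are estimated by
\[
\beta = \frac{\sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n}}
             {\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}}
\]
\[
\alpha = \bar{y} - \beta \bar{x}
\]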
Example 03: Suppose we want to determine the relationship
between the length of sales experience (x) and the volume of sales (y) over
6 months for each of a group of 10 marketing employees of a
pharmaceutical company. The data are:
Sales experience (x), in years    Volume of sales (y), in thousands
1                                 80
2                                 97
4                                 92
4                                 102
6                                 103
8                                 111
10                                119
10                                123
11                                117
13                                136
a) Plot the data as a scatter diagram.
b) Construct the regression line of y on x.
c) Estimate the volume of sales for a salesperson
with 12 years of experience.
a. Scatter diagram of sales experience and volume of sales
[Figure: scatter plot of volume of sales (vertical axis, 0 to 160) against sales experience (horizontal axis, 0 to 14).]
b. Working table:
Salesperson   x_i    y_i     x_i^2   x_i y_i
1             1      80      1       80
2             2      97      4       194
3             4      92      16      368
4             4      102     16      408
5             6      103     36      618
6             8      111     64      888
7             10     119     100     1190
8             10     123     100     1230
9             11     117     121     1287
10            13     136     169     1768
Total         69     1080    627     8031
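Here,
\[ \bar{x} = \frac{69}{10} = 6.9 \quad \text{and} \quad \bar{y} = \frac{1080}{10} = 108 \]
And,
\[
\beta = \frac{8031 - \frac{69 \times 1080}{10}}{627 - \frac{69^2}{10}} = 3.84
\]
And
\[ \alpha = 108 - (3.84 \times 6.9) = 81.5 \]
So the regression line of y on x is
\[ \hat{y} = 81.5 + 3.84x \]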
(c)
The estimated volume of sales for an employee with
12 years of experience is
\[ \hat{y} = 81.5 + (3.84 \times 12) = 127.58 \]
(in thousands).
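Example 03 can be checked in Python with the same sums-based formulas for β and α. This is a minimal sketch: the names experience, sales and predict are illustrative and not part of the course material.

```python
# Data from Example 03: experience in years, sales in thousands.
experience = [1, 2, 4, 4, 6, 8, 10, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 111, 119, 123, 117, 136]

n = len(experience)
sum_x = sum(experience)
sum_y = sum(sales)
sum_xx = sum(x * x for x in experience)
sum_xy = sum(x * y for x, y in zip(experience, sales))

# Slope and intercept of the regression line of y on x.
beta = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x**2 / n)
alpha = sum_y / n - beta * sum_x / n

def predict(years):
    """Estimated volume of sales (in thousands) for a given experience."""
    return alpha + beta * years

print(f"beta = {beta:.2f}, alpha = {alpha:.2f}")
print(f"estimated sales at 12 years = {predict(12):.2f}")
```

Because the code keeps β unrounded, the prediction can differ in the second decimal place from a hand calculation that rounds β to 3.84 first.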
Comparison of Correlation and Regression
Basis: Degree and nature of relationship
  Correlation: The degree and direction of the relationship between the variables are studied.
  Regression: The nature of the relationship is studied.
Basis: Variable types
  Correlation: All variables are treated in the same way.
  Regression: The variables are not all treated in the same way.
Basis: Function
  Correlation: If the value of one variable is known, the value of the other cannot be estimated; only the association is obtained.
  Regression: If the value of one variable is known, the other can be estimated through the functional relationship.
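The "variable types" row can be illustrated numerically: swapping x and y leaves r unchanged, but the regression slope of y on x differs from the slope of x on y. The sketch below reuses the Example 03 data; the helper names centered_sums, correlation and slope are hypothetical.

```python
import math

def centered_sums(x, y):
    """Return the corrected sums S_xy, S_xx and S_yy."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy, s_xx, s_yy

def correlation(x, y):
    s_xy, s_xx, s_yy = centered_sums(x, y)
    return s_xy / math.sqrt(s_xx * s_yy)

def slope(dependent, independent):
    """Slope of the regression of `dependent` on `independent`."""
    s_xy, s_xx, _ = centered_sums(independent, dependent)
    return s_xy / s_xx

# Data from Example 03.
experience = [1, 2, 4, 4, 6, 8, 10, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 111, 119, 123, 117, 136]

r_xy = correlation(experience, sales)
r_yx = correlation(sales, experience)
b_yx = slope(sales, experience)   # sales regressed on experience
b_xy = slope(experience, sales)   # experience regressed on sales

print(f"r_xy = {r_xy:.3f}, r_yx = {r_yx:.3f}")
print(f"slope of y on x = {b_yx:.3f}, slope of x on y = {b_xy:.3f}")
```

The two slopes are linked to r through the identity b_yx × b_xy = r², which gives a quick consistency check on a hand calculation.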