Course Number: STA 240 Course Name: Statistics Course Instructor: Tamanna Siddiqua Ratna. (Lecturer)

• Definition and explanation of Correlation
• Classification of Correlation
• Practice correlation-related problems and interpretation
• Definition and explanation of Regression Analysis
• Classification of Regression
• Practice regression-related problems and interpretation
• Comparison between correlation and regression

Correlation Analysis
• A statistical measure that determines the direction and the strength (degree) of linear association between two or more variables.
• The correlation coefficient is the quantitative measure of correlation.
• The sample correlation coefficient is denoted by r.

Classification of Correlation
• Simple correlation: involves only two variables.
• Multiple correlation: involves more than two variables.

Simple Correlation

Assumptions of Simple Correlation

• There is a linear relationship between the two variables.
• Both variables are measured on an interval or ratio scale.
• The two variables follow a bivariate normal distribution.

Properties of Correlation Coefficient (r)

• r lies between -1 and 1, i.e. -1 ≤ r ≤ 1.
• r is a symmetric measure, i.e. r_xy = r_yx.
• r is a dimensionless quantity, i.e. it is unit free.

• If r = 0, there is no linear association (correlation) between the two variables.
• If 0 < |r| < 0.25, the correlation is weak.
• If 0.25 ≤ |r| < 0.75, the correlation is intermediate.
• If 0.75 ≤ |r| < 1, the correlation is strong.
• If |r| = 1, the correlation is perfect.
• The sign gives the direction: r < 0 is an indirect (negative) correlation and r > 0 is a direct (positive) correlation.

Scale of r:
-1 (perfect indirect) | strong | -0.75 | intermediate | -0.25 | weak | 0 (no linear correlation) | weak | 0.25 | intermediate | 0.75 | strong | 1 (perfect direct)
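The cut-offs above can be turned into a small lookup. The following is a minimal sketch, assuming the thresholds 0.25 and 0.75 from this scale; the function name `describe_correlation` is a hypothetical helper introduced only for illustration.

```python
def describe_correlation(r: float) -> str:
    """Label r using the cut-offs 0.25 and 0.75 from the scale above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "no linear correlation"
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.25:
        strength = "intermediate"
    else:
        strength = "weak"
    # The sign of r gives the direction of the relationship.
    direction = "direct" if r > 0 else "indirect"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.73))   # intermediate direct correlation
print(describe_correlation(-0.80))  # strong indirect correlation
```

These labels are a convention of this course; other textbooks may use different cut-offs.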

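Convenient Form to Calculate Correlation Coefficient (r):

$$ r = \frac{\sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n}}{\sqrt{\left[\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}\right]\left[\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}\right]}} $$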
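As a reading aid, the convenient form can be related to the deviation form of r. The slides do not show this step; the identities below are standard algebra, with the sample means written as \bar{x} = \sum_i x_i / n and \bar{y} = \sum_i y_i / n.

```latex
\sum_i (x_i - \bar{x})(y_i - \bar{y}) = \sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n},
\qquad
\sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}

r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
```

The convenient form is usually easier for hand calculation because it needs only the five sums of x, y, x², y² and xy.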

Example 01: Consider the study hours of ten IUBAT students and their CGPA in a semester:

Study time in hours (x):  6     7     8     5     6     7     2     3     1     4
CGPA (y):                 3.78  3.80  3.75  3.50  3.20  3.72  3.20  3.50  3.20  3.61

a. Plot the data as a scatter diagram.
b. Find the correlation coefficient.

Solution:
a. Scatter diagram of study time and CGPA
[Figure: scatter plot of CGPA (vertical axis, 3.1 to 3.9) against study time in hours (x) (horizontal axis, 0 to 9), with a fitted linear trend line.]

b.
x_i    y_i     x_i²   y_i²      x_i y_i
6      3.78    36     14.2884   22.68
7      3.80    49     14.4400   26.60
8      3.75    64     14.0625   30.00
5      3.50    25     12.2500   17.50
6      3.20    36     10.2400   19.20
7      3.72    49     13.8384   26.04
2      3.20    4      10.2400   6.40
3      3.50    9      12.2500   10.50
1      3.20    1      10.2400   3.20
4      3.61    16     13.0321   14.44
Σx_i = 49   Σy_i = 35.26   Σx_i² = 289   Σy_i² = 124.88   Σx_i y_i = 176.56
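Here, n = 10.

$$ r = \frac{\sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n}}{\sqrt{\left[\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}\right]\left[\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}\right]}} $$

$$ r = \frac{176.56 - \frac{49 \times 35.26}{10}}{\sqrt{\left[289 - \frac{49^2}{10}\right]\left[124.88 - \frac{35.26^2}{10}\right]}} = \frac{3.786}{\sqrt{48.9 \times 0.553}} \approx 0.73 $$

r = 0.73 implies that there is an intermediate positive linear relationship between study hours and the CGPA of a student.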
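The hand calculation in Example 01 can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `pearson_r` and the variable names are assumptions made for illustration. It also checks the two properties of r listed earlier (symmetry and unit freeness).

```python
from math import sqrt

# Data from Example 01: study time in hours (x) and CGPA (y).
hours = [6, 7, 8, 5, 6, 7, 2, 3, 1, 4]
cgpa = [3.78, 3.80, 3.75, 3.50, 3.20, 3.72, 3.20, 3.50, 3.20, 3.61]


def pearson_r(x, y):
    """Sample correlation coefficient via the convenient (sums) form."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


r = pearson_r(hours, cgpa)
print(round(r, 2))  # 0.73

# Symmetry: swapping the roles of x and y leaves r unchanged.
print(abs(r - pearson_r(cgpa, hours)) < 1e-9)  # True

# Unit free: measuring study time in minutes instead of hours leaves r unchanged.
minutes = [60 * h for h in hours]
print(abs(r - pearson_r(minutes, cgpa)) < 1e-9)  # True
```

The sketch does not guard against a constant variable, for which the denominator is zero and r is undefined.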

Type of variable                          Association measure (statistical tool)
Quantitative (ratio/interval) variable    Pearson correlation coefficient
Ordinal variable                          Spearman rank-order correlation coefficient
Nominal variable                          Chi-square test statistic

Multiple Correlation

• Involves more than two variables.
• One variable is considered the dependent variable and the others are independent variables.
• Measures the impact of the independent variables on the dependent variable.
• Lies between 0 and 1.

A researcher collects information on 10 individuals for two variables, x = age and y = blood pressure. Suppose that
Σx = 25, Σy = 7.5, Σx² = 234, Σy² = 435, Σxy = 102.

a) Calculate the correlation coefficient between x and y, assuming a linear relationship.
b) Interpret the result.

Regression Analysis
• A statistical technique for studying the dependency of one variable (called the dependent variable) on one or more other variables (called independent variables).
• The relationship is expressed in the form of an equation connecting the dependent variable (say y) and the independent variables (say x1, x2, …, xn), i.e.
y = f(x1, x2, …, xn)

Types of Variables

Independent variable
• A variable whose values do not depend on any other variable.
• Also called regressor, predictor or explanatory variable.

Dependent variable
• A variable whose values are determined through the values of other variables.
• Also called response, regressed or explained variable.
Primary Objectives of Regression Analysis
• Examine the effect of a set of independent variables on the mean of the dependent variable.
• Predict the mean value of the dependent variable for a given set of independent variables.
• Explain how much of the variation in the dependent variable is accounted for by a set of independent variables.
Classification of Regression Model

Simple Regression:
Involves only one independent variable.

Multiple Regression:
Involves more than one independent variable.

Consider the CGPA of an IUBAT student (y), his or her study hours (x1), time spent on social media (x2), and attendance in classes (x3).

If we involve only x1 and y, we get a simple regression:

y = α + βx1 + ε

Involving all the independent variables, we get a multiple regression:

y = α + β1x1 + β2x2 + β3x3 + ε

Simple Linear Regression
• If a regression equation involves only two variables, one independent and one dependent, it is called simple linear regression.
• The model is linear in both the variable and the parameters.
• Let y be the dependent variable and x be the independent variable; then the simple regression equation is
y = α + βx + ε
where
α = intercept
β = regression coefficient
ε = random error

[Figure: graph with origin O, horizontal axis X and vertical axis Y.]
Examples
• Crop production on a certain farm and the amount of fertilizer used on that farm. The model is
production = α + β * amount of fertilizer used + ε

• Marks obtained in a statistics course and the time (in hours) spent studying statistics. The model is
marks_in_stat = α + β * study time + ε

• Sales of furniture at OTOBI and the advertising cost for a certain period. The model is
sales = α + β * cost of advertising + ε
Interpretation of the Estimated Regression Coefficient
β represents the average change in the value of the dependent variable (y) for each unit change in the independent variable (x).

α represents the average value of the dependent variable (y) when the independent variable (x) is zero.
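Formula for Estimating the Regression Coefficients:

$$ \beta = \frac{\sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n}}{\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}} $$

$$ \alpha = \bar{y} - \beta \bar{x} $$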

Example 03: Suppose we want to determine the relationship between the length of sales experience (x) and the volume of sales (y) of each employee over 6 months, for a group of 10 marketing employees of a pharmaceutical company. The data are:

Sales experience (x)    Volume of sales (y) (in thousand)
1                       80
2                       97
4                       92
4                       102
6                       103
8                       111
10                      119
10                      123
11                      117
13                      136

a) Plot the data as a scatter diagram.
b) Construct the regression line of y on x.
c) Estimate the amount of sales for a salesman with 12 years of experience.

Scatter diagram of sales experience and volume of sales
[Figure: scatter plot of volume of sales (y) in thousands (vertical axis, 0 to 160) against sales experience (horizontal axis, 0 to 14), with a fitted linear trend line labelled y = 4x + 80, R² = 0.902.]

b.
Sales person   x_i   y_i    x_i²   x_i y_i
1              1     80     1      80
2              2     97     4      194
3              4     92     16     368
4              4     102    16     408
5              6     103    36     618
6              8     111    64     888
7              10    119    100    1190
8              10    123    100    1230
9              11    117    121    1287
10             13    136    169    1768
Total          69    1080   627    8031
Here,
x̄ = 6.9 and ȳ = 108
And,

$$ \beta = \frac{8031 - \frac{69 \times 1080}{10}}{627 - \frac{69^2}{10}} = \frac{579}{150.9} = 3.84 $$

β = 3.84 implies that for each additional year of sales experience of an employee, the volume of sales increases on average by 3.84 thousand taka.

And
α = 108 − (3.84 × 6.9) = 81.5

α = 81.5 implies that if an employee has no experience, the average sales volume will be 81.5 thousand taka.

Thus the estimated regression line is

ŷ = 81.5 + 3.84x

(c)
The estimated sales for an employee with 12 years of experience is

ŷ = 81.5 + 3.84 × 12 = 127.58

This implies that an employee with 12 years of experience will have a sales volume of 127.58 thousand taka on average.
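The estimates in Example 03 can be reproduced with a few lines of code. This is a minimal sketch using plain Python; the function name `fit_simple_regression` and the variable names are assumptions made for illustration, and the formulas are the ones given for β and α above.

```python
# Data from Example 03: years of sales experience (x) and
# volume of sales in thousand taka (y).
experience = [1, 2, 4, 4, 6, 8, 10, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 111, 119, 123, 117, 136]


def fit_simple_regression(x, y):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    beta = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    alpha = sum_y / n - beta * sum_x / n  # alpha = y_bar - beta * x_bar
    return alpha, beta


alpha, beta = fit_simple_regression(experience, sales)
print(round(beta, 2))   # 3.84
print(round(alpha, 1))  # 81.5

# Predicted sales volume for 12 years of experience.
prediction = alpha + beta * 12
print(round(prediction, 2))  # 127.57
```

The script keeps the coefficients unrounded, so its prediction (about 127.57) differs slightly from the 127.58 obtained by hand with the rounded values 81.5 and 3.84.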
Goodness of fit (R²):
We need to know how "good" the fitted line is, and R² is the tool used to determine that.

For example, R² = 0.68 indicates that 68% of the total variation of the dependent variable (y) can be explained by the independent variable (x).
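The slides do not give a formula for R². One common definition, stated here as an assumption, compares the variation left unexplained by the fitted line with the total variation of y. Here \hat{y}_i denotes the fitted value \alpha + \beta x_i and \bar{y} the sample mean of y.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad 0 \le R^2 \le 1
```

For a simple linear regression fitted by least squares, R² equals the square of the sample correlation coefficient r between x and y.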

Comparison of Correlation and Regression

Basis | Correlation | Regression
Degree and nature of relationship | The degree and direction of the relationship between the variables are studied. | The nature of the relationship is studied.
Variable type | All variables are treated in the same way. | The variables are not treated in the same way (dependent versus independent).
Function | If the value of one variable is known, the value of the other cannot be estimated; only the association is obtained. | If the value of one variable is known, the other can be estimated through the functional relationship.
Prediction | r describes the linear association but cannot be used to make predictions. | Regression analysis enables us to make predictions.
Range | r takes values from -1 to 1. | The regression coefficient ranges from -∞ to +∞.
Thank You.
