
Slide 3 Linear Regression

The document provides an overview of linear regression, explaining its purpose in predicting a dependent variable based on independent variables. It includes examples of simple and multiple regression, discusses the importance of minimizing error through cost functions, and highlights challenges such as outliers and normalization. Additionally, it emphasizes the significance of the R-squared score in evaluating model performance.

Uploaded by

JOBIN Wilson

24CSA524: Machine Learning

Remya Rajesh
LINEAR REGRESSION

Example 1
• Plot a graph between the cost and the area of the house
• The area of the house (sq. feet) is represented on the X-axis, while the cost (lakh) is represented on the Y-axis
• What will regression do? Fit a line through these points

  Area (sq. feet) X    Cost (Lakh) Y
  1000                 30
  1200                 40
  1300                 50
  1450                 70
  1495                 70
  1600                 80

[Figure: scatter plot of Cost (Y-axis) versus Area (X-axis) for Example 1]
LINEAR REGRESSION

• Predict the cost for a house with area = 1100 sq. feet?
• How is this line represented mathematically?


Introduction to Linear Regression

Example 2
Regression
• We assume we have k feature variables, also called independent variables
• The target variable is also known as the dependent variable
• We are given a dataset of the form (x₁, y₁), …, (xₙ, yₙ), where each xᵢ is a k-dimensional real feature vector and yᵢ is a real value
• We want to learn a function h which, given a feature vector xᵢ, predicts a value yᵢ′ = h(xᵢ) that is as close as possible (or equal) to the value yᵢ
• Minimize the sum of squared errors: Σᵢ (yᵢ − h(xᵢ))²
The Simple Regression Model
• Definition: a simple regression of y on x explains the variable y in terms of a single variable x:

  y = β₀ + β₁x + u

  where β₀ is the intercept, β₁ is the slope parameter, and u is the error term
• y is also called the dependent variable, explained variable, response variable, or LHS variable
• x is also called the independent variable, explanatory variable, control variable, or RHS variable
• u is also called the disturbance, or the unobservables
The Simple Regression Model
• Example: Soybean yield and fertilizer quantity

  yield = β₀ + β₁·fertilizer + u

  β₁ measures the effect of the amount of fertilizer on yield; the error term u captures rainfall, land quality, presence of parasites, …

• Example: A simple wage equation

  wage = β₀ + β₁·educ + u

  β₁ measures the change in wage given the number of years of education; u captures total experience, current experience, work ethic, work interest, workshops attended, …
Linear regression
• Univariate linear regression

  Training Set → Learning Algorithm → Estimated Model h

  h takes the size of a house (x) and outputs the predicted price (y)

Model representation: h(x) = θ₀ + θ₁x

Basic Idea: Method 1
• Using the linear equation h(x) = θ₀ + θ₁x, compute the parameters θ₀ and θ₁
Linear Regression: Prediction Model (Example 3)
• Given one variable X (years of experience)
• Goal: predict the value of Y (salary, Rs 1,000)
• Questions:
  • When X = 10, what is Y?
  • When X = 25, what is Y?
• This is known as regression

  X (years of experience)    Y (salary, Rs 1,000)
  3      30
  8      57
  9      64
  13     72
  3      36
  6      43
  11     59
  21     90
  1      20
  16     83
Example: Salary Dataset

  X (Years of Experience)    Y (Salary in 1000s)
  3      30
  8      57
  9      64
  13     72
  3      36
  6      43
  11     59
  21     90
  1      20
  16     83
From the given dataset we get: x̄ = 9.1; ȳ = 55.4

θ₁ = [(3−9.1)(30−55.4) + (8−9.1)(57−55.4) + … + (16−9.1)(83−55.4)] / [(3−9.1)² + (8−9.1)² + … + (16−9.1)²] ≈ 3.5

θ₀ = ȳ − θ₁·x̄ = 55.4 − 3.5375 × 9.1 ≈ 23.2 (using the unrounded slope 3.5375)

Thus, y = 23.2 + 3.5x
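The closed-form computation above can be checked with a short script (a sketch; the variable names are ours, the data are the salary dataset from the slides):

```python
# Least-squares fit of y = theta0 + theta1 * x using the closed-form
# formulas shown above, on the salary dataset.
x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

n = len(x)
x_bar = sum(x) / n          # 9.1
y_bar = sum(y) / n          # 55.4

# Slope: covariance-style numerator over variance-style denominator.
theta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
# Intercept: the fitted line passes through (x_bar, y_bar).
theta0 = y_bar - theta1 * x_bar
```

Note that `theta0` is computed with the unrounded slope, which is why it comes out near 23.2 rather than 23.55.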


Linear Regression Example

[Figure: scatter plot of Salary (0 to 120) versus Years of experience (0 to 25), with the fitted line Y = 3.5X + 23.2]
For the example data, when x = 10 years, the predicted salary is y = 23.2 + 3.5 × 10 = 23.2 + 35 = 58.2, i.e. Rs 58,200 per year.
Multivariate models

Simple regression model (one predictor):

  (Education) x → y (Income)

Multivariate or multiple regression model (several predictors):

  (Education) x₁, (Gender) x₂, (Experience) x₃, (Age) x₄ → y (Income)
More than one prediction attribute
• Consider two independent attributes X₁, X₂ and a dependent variable Y
• For example:
  • X₁ = years of experience
  • X₂ = age
  • Y = salary
• Equation: Y = θ₀ + θ₁X₁ + θ₂X₂
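A two-feature model like this can be fitted with ordinary least squares via a design matrix. Below is a minimal sketch; the experience/age/salary numbers are made up for illustration and are not from the slides:

```python
import numpy as np

# Multiple regression y = theta0 + theta1*x1 + theta2*x2,
# fitted by ordinary least squares. Data are illustrative only.
X1 = np.array([3, 8, 9, 13, 6], dtype=float)      # years of experience
X2 = np.array([25, 31, 33, 38, 29], dtype=float)  # age
y = np.array([30, 57, 64, 72, 43], dtype=float)   # salary (Rs 1,000)

# Design matrix: a leading column of ones provides the intercept theta0.
A = np.column_stack([np.ones_like(X1), X1, X2])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ theta  # predictions for the training points
```

The same pattern extends to any number of features: one column per feature plus the column of ones.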
Outliers
• Regression is sensitive to outliers: the line will "tilt" to accommodate very extreme values
• Solution: remove the outliers
  • But make sure that they do not capture useful information
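The tilt can be demonstrated by refitting the salary model after adding one extreme point (a sketch; the outlier point and helper name are ours):

```python
# Outlier sensitivity: one extreme point noticeably shifts the
# least-squares slope of the salary model.
def fit(xs, ys):
    """Closed-form simple linear regression; returns (theta0, theta1)."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    t1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys)) \
         / sum((xi - xb) ** 2 for xi in xs)
    return yb - t1 * xb, t1

x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

_, slope_clean = fit(x, y)                    # ~3.5
_, slope_outlier = fit(x + [30], y + [0])     # one extreme, low-salary point
# The outlier drags the slope far below the clean fit.
```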
Normalization

• In a regression problem, our features may sometimes have very different scales
  • For example: predicting the GDP of a country using the count of home owners and the income as features
• Solution: normalize the features by replacing the values with their z-scores
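Z-score normalization subtracts the feature mean and divides by the feature standard deviation. A minimal sketch (the income values are illustrative):

```python
import numpy as np

# Z-score normalization: (value - mean) / standard deviation,
# so every feature ends up on a comparable scale.
incomes = np.array([30000., 45000., 52000., 61000., 75000.])
z = (incomes - incomes.mean()) / incomes.std()

# After normalization the feature has mean 0 and standard deviation 1,
# regardless of its original units.
```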
Predictive Model challenges with linearity
Hypothesis
• h(x) = θ₀ + θ₁x, where the θᵢ's are the parameters
• Let's visualize this hypothesis:
  • Consider θ₀ = 1.5 and θ₁ = 0: h(x) = 1.5 + 0·x (a horizontal line)
  • Consider θ₀ = 0 and θ₁ = 1: h(x) = 0 + 1·x
  • Consider θ₀ = 0 and θ₁ = 0.5: h(x) = 0 + 0.5·x
Optimize Cost Function

  minimize over θ₀, θ₁:  (1/2m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

  minimize over θ₀, θ₁:  (1/2m) Σᵢ₌₁ᵐ (θ₀ + θ₁x⁽ⁱ⁾ − y⁽ⁱ⁾)²

  J(θ₀, θ₁) = (1/2m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

  Goal: minimize J(θ₀, θ₁) over θ₀, θ₁

where J(θ₀, θ₁) is the cost function, or squared error function.
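The cost function J can be evaluated directly to compare candidate parameter values (a sketch, reusing the salary dataset from the earlier example):

```python
# J(theta0, theta1) = (1/2m) * sum_i (h(x_i) - y_i)^2,
# evaluated on the salary dataset.
x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

def cost(theta0, theta1, xs, ys):
    """Mean squared error (halved) of the line h(x) = theta0 + theta1*x."""
    m = len(xs)
    return sum((theta0 + theta1 * xi - yi) ** 2
               for xi, yi in zip(xs, ys)) / (2 * m)

# The fitted parameters (23.2, 3.5) give a far lower cost than, say,
# the line through the origin with slope 0.
```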


Cost Function
• How to best fit our data? Choose the values of θ such that the difference between h(x) (the predicted value) and y (the actual value) is minimum
• To calculate this, define an error function, also called the cost function
• Error at one point: J(θ) = h(x) − y
• Use the square of the error rather than the absolute error, because some points lie above and some below the line: J(θ) = (h(x) − y)²
• Error over ALL m points — take the summation: J(θ) = Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
• Averaged: J(θ) = (1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
• Then divided by 2 to make the later calculation easier: J(θ) = (1/2m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²


Evaluation

The R-squared score measures how much of the variance in y the model explains: the higher the R-squared score (the closer to 1), the better the model fits your data.
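R-squared is computed as 1 − SS_res/SS_tot, where SS_res is the sum of squared residuals and SS_tot the total sum of squares around the mean. A sketch for the fitted salary model:

```python
# R-squared = 1 - SS_res / SS_tot for the salary model y = 23.2 + 3.5x.
x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

theta0, theta1 = 23.2, 3.5
y_hat = [theta0 + theta1 * xi for xi in x]   # model predictions
y_bar = sum(y) / len(y)                      # mean of the targets

ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual variation
ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total variation
r2 = 1 - ss_res / ss_tot
```

For this dataset the fitted line explains most of the variance, so r2 comes out close to 1.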
