24CSA524: Machine Learning
Remya Rajesh
LINEAR REGRESSION
Example 1

• Plot a graph between the cost and area of the house
• The area of the house is represented on the X-axis, while cost is represented on the Y-axis
• What will regression do?
  • Fit a line through these points

Area (sq. feet) X    Cost (Lakh) Y
1000                 30
1200                 40
1300                 50
1450                 70
1495                 70
1600                 80

[Plot: house area (X-axis) vs. cost (Y-axis), with a line fitted through the points]
LINEAR REGRESSION
• Predict the cost for a house with area = 1100 sq. feet?
• How is this line represented mathematically?
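The slide's question can be answered with a short sketch: fit a line to the area/cost table from Example 1 and evaluate it at 1100 sq. feet. This uses NumPy's `polyfit` as one convenient least-squares fitter; the data values come from the slide.

```python
import numpy as np

# House data from the Example 1 slide: area (sq. feet) vs. cost (lakh)
area = np.array([1000, 1200, 1300, 1450, 1495, 1600])
cost = np.array([30, 40, 50, 70, 70, 80])

# Fit a straight line cost ≈ theta1 * area + theta0 by least squares;
# polyfit returns coefficients from highest degree down
theta1, theta0 = np.polyfit(area, cost, deg=1)

# Predict the cost of a house with area 1100 sq. feet (roughly 35 lakh)
predicted = theta1 * 1100 + theta0
print(round(predicted, 1))
```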
Introduction to Linear Regression
Example 2
Regression
• We assume that we have 𝑘 feature variables (also called independent variables).
• The target variable is also known as the dependent variable.
• We are given a dataset of the form (𝒙1, 𝑦1), …, (𝒙𝑛, 𝑦𝑛), where 𝒙𝒊 is a 𝑘-dimensional feature vector (real valued) and 𝑦𝑖 is a real value.
• We want to learn a function ℎ which, given a feature vector 𝒙𝒊, predicts a value 𝑦𝑖′ = ℎ(𝒙𝒊) that is as close as possible (or equal) to the value 𝑦𝑖.
• Minimize the sum of squares:

  Σ_i (𝑦𝑖 − ℎ(𝒙𝒊))²
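A minimal sketch of minimizing this sum of squares for 𝑘-dimensional features, using NumPy's `lstsq` on a small made-up dataset (the data values below are illustrative assumptions, chosen to lie exactly on y = 2·x1 + x2 + 1):

```python
import numpy as np

# Toy dataset: n = 5 samples, k = 2 features (values are invented for illustration)
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0],
              [5.0, 2.5]])
y = np.array([5.0, 5.5, 8.5, 12.0, 13.5])  # exactly y = 2*x1 + x2 + 1

# Append a column of ones so an intercept is learned as well
A = np.hstack([X, np.ones((X.shape[0], 1))])

# lstsq returns the parameter vector minimizing sum_i (y_i - h(x_i))^2
theta, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)

def h(x):
    """Learned hypothesis: predicts y' for a k-dimensional feature vector x."""
    return np.dot(theta[:-1], x) + theta[-1]

print(h(np.array([3.0, 2.0])))
```

Because the toy targets are exactly linear in the features, the recovered parameters match the generating coefficients.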
The Simple Regression Model
• Definition: a simple regression of y on x explains variable y in terms of a single variable x:

  y = θ0 + θ1·x + u

• θ0: intercept; θ1: slope parameter
• y: dependent variable, LHS variable, explained variable, response variable, …
• x: independent variable, RHS variable, explanatory variable, control variable, …
• u: error term, disturbance, unobservables, …
The Simple Regression Model
• Example: soybean yield and fertilizer quantity

  yield = θ0 + θ1·fertilizer + u

  θ1 measures the effect of the amount of fertilizer on yield; u captures rainfall, land quality, presence of parasites, …

• Example: a simple wage equation

  wage = θ0 + θ1·educ + u

  θ1 measures the change in wage given a number of years of education; u captures total experience, current experience, work ethic, work interest, workshops attended, …
Linear regression
• Univariate linear regression

  Training Set → Learning Algorithm → Estimated Model h

  Size of house (x) → h → predicted price (y)

  Model representation: h(x) = θ0 + θ1x
Basic Idea: Method 1
• Using a linear equation h(x) = θ0 + θ1x, compute the predicted value for a given input x.
Linear Regression: Prediction Model (Example 3)

• Given one variable X (years of experience)
• Goal: predict the value of Y (salary, Rs 1,000)
• Questions:
  • When X = 10, what is Y?
  • When X = 25, what is Y?
• This is known as regression

X (years of experience)   Y (salary, Rs 1,000)
3                         30
8                         57
9                         64
13                        72
3                         36
6                         43
11                        59
21                        90
1                         20
16                        83
Example

Salary Dataset

X (years of experience)   Y (salary in 1000s)
3                         30
8                         57
9                         64
13                        72
3                         36
6                         43
11                        59
21                        90
1                         20
16                        83

From the given dataset we get: x̄ = 9.1, ȳ = 55.4

θ1 = [(3 − 9.1)(30 − 55.4) + (8 − 9.1)(57 − 55.4) + … + (16 − 9.1)(83 − 55.4)] / [(3 − 9.1)² + (8 − 9.1)² + … + (16 − 9.1)²] ≈ 3.54 ≈ 3.5

θ0 = 55.4 − 3.54 × 9.1 ≈ 23.2

Thus, y ≈ 23.2 + 3.5x
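The hand computation above can be checked in plain Python, using the same closed-form expressions for the slope and intercept:

```python
# Salary dataset from the slide
x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]       # years of experience
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]  # salary in Rs 1,000

x_bar = sum(x) / len(x)   # 9.1
y_bar = sum(y) / len(y)   # 55.4

# theta1 = sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
den = sum((xi - x_bar) ** 2 for xi in x)
theta1 = num / den               # about 3.54, rounded to 3.5 on the slide
theta0 = y_bar - theta1 * x_bar  # about 23.2

print(round(theta1, 2), round(theta0, 1))
```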
Linear Regression Example

[Plot: salary (Y, Rs 1,000) vs. years of experience (X) for the example data, with the fitted line Y = 3.5X + 23.2]

For the example data, when x = 10 years, the predicted salary is 23.2 + 3.5 × 10 = 58.2, i.e., about Rs 58,200 per year.
Multivariate models

• Simple regression model:

  x (Education) → y (Income)

• Multivariate or multiple regression model:

  x1 (Education), x2 (Gender), x3 (Experience), x4 (Age) → y (Income)
More than one prediction attribute

• Consider two independent attributes X1, X2 and a dependent variable Y
• For example:
  • X1 = years of experience
  • X2 = age
  • Y = salary
• Equation: h(x) = θ0 + θ1·X1 + θ2·X2
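One standard way to fit such a two-attribute model is the normal equation, θ = (XᵀX)⁻¹XᵀY (a technique not shown on the slide but equivalent to minimizing the squared error). The sketch below uses made-up values for experience, age, and salary:

```python
import numpy as np

# Hypothetical toy data: X1 = years of experience, X2 = age, Y = salary (Rs 1,000)
X1 = np.array([3, 8, 9, 13, 6])
X2 = np.array([25, 30, 33, 38, 29])
Y = np.array([30, 57, 64, 72, 43])

# Design matrix with a leading column of ones for the intercept theta0
X = np.column_stack([np.ones(len(X1)), X1, X2])

# Normal equation: solve (X^T X) theta = X^T Y
theta = np.linalg.solve(X.T @ X, X.T @ Y)
theta0, theta1, theta2 = theta

# Predict salary for X1 = 10 years of experience, X2 = 32 years of age
predicted = theta0 + theta1 * 10 + theta2 * 32
```

Solving the linear system with `np.linalg.solve` avoids explicitly inverting XᵀX, which is both cheaper and numerically safer.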
Outliers
• Regression is sensitive to outliers:
• The line will “tilt” to accommodate very extreme values
• Solution: remove the outliers
• But make sure that they do not capture useful information
Normalization
• In regression problems, sometimes our features have very different scales:
  • For example: predicting the GDP of a country using the count of home owners and the income as features
• Solution: normalize the features by replacing the values with their z-scores
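A z-score normalization sketch, using invented values for the two differently scaled features mentioned above:

```python
import numpy as np

# Features on very different scales (values are hypothetical)
home_owners = np.array([2.0e6, 3.5e6, 1.2e6, 4.8e6])   # counts in the millions
income = np.array([35_000.0, 42_000.0, 28_000.0, 55_000.0])  # tens of thousands

def z_score(feature):
    """Replace raw values with z-scores: (value - mean) / standard deviation."""
    return (feature - feature.mean()) / feature.std()

home_owners_z = z_score(home_owners)
income_z = z_score(income)
# After normalization both features have mean 0 and standard deviation 1,
# so neither dominates the regression purely because of its scale
```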
Predictive Model challenges with linearity
Hypothesis

• h(x) = θ0 + θ1x, where θ0 and θ1 are the parameters
• Let's visualize this hypothesis:
  • θ0 = 1.5, θ1 = 0 gives h(x) = 1.5 + 0·x (a horizontal line at 1.5)
  • θ0 = 0, θ1 = 1 gives h(x) = 0 + 1·x
  • θ0 = 0, θ1 = 0.5 gives h(x) = 0 + 0.5·x
Optimize Cost Function

minimize over θ0, θ1:  (1/2m) Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

minimize over θ0, θ1:  (1/2m) Σ_{i=1}^{m} (θ0 + θ1x^(i) − y^(i))²

J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

Goal: minimize J(θ0, θ1) over θ0, θ1

where J(θ0, θ1) is the cost function, or squared error function
Cost Function

• How do we best fit our data?
• Choose the values of θ such that the difference between h(x) (the predicted value) and y (the actual value) is minimum
• To calculate this, define an error function, also called the cost function:

  J(θ) = h(x) − y

• Square the error, because some points lie above and some below the line (the signed errors would cancel):

  J(θ) = (h(x) − y)²

• Sum the error over ALL points:

  J(θ) = Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

• Average it over the m points:

  J(θ) = (1/m) Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

• Divide by 2 to make the later calculation easier:

  J(θ) = (1/2m) Σ_{i=1}^{m} (h(x^(i)) − y^(i))²
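The final cost J can be evaluated directly. The sketch below reuses the salary dataset from the earlier example and compares the fitted parameters (θ0 = 23.2, θ1 = 3.5) against an arbitrary guess:

```python
# Salary dataset from the earlier example
x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
m = len(x)

def J(theta0, theta1):
    """Squared-error cost: (1/2m) * sum of squared prediction errors."""
    return sum((theta0 + theta1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

# The fitted parameters give a much lower cost than an arbitrary guess
print(J(23.2, 3.5), J(0.0, 1.0))
```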
Evaluation
• The R-squared (R²) score measures the fraction of the variance in y that the model explains; the higher the R-squared score, the better the model fits your data
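For instance, the R-squared of the fitted salary model y = 23.2 + 3.5x from the earlier example can be computed as 1 − SS_res/SS_tot:

```python
# Salary dataset and the fitted model from the earlier example
x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

y_bar = sum(y) / len(y)
y_pred = [23.2 + 3.5 * xi for xi in x]

ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # residual sum of squares
ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

An R² near 1 means the line explains almost all the variation in salary; here the fit explains well over 90% of it.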