Prepared by: Dr. Hanaa Bayomi
Updated by: Prof. Abeer ElKorany
Lecture 2: Linear Regression
LINEAR REGRESSION WITH ONE VARIABLE
➢ Model Representation
➢ Cost Function
➢ Gradient Descent
MODEL REPRESENTATION
[Scatter plot: housing price (dependent variable, y-axis) versus size in feet² (independent variable, x-axis), e.g. predicting the price of a 1250 ft² house.]
Supervised Learning: the "right answers" (labeled data) are given.
Regression: predict a continuous-valued output (price).
MODEL REPRESENTATION
Example (notation, using the housing training set):
x(1) = 2104 (size in the first example), y(2) = 232 (price in the second example), x(4) = 852
(x, y) — one training example (one row)
(x(i), y(i)) — the i-th training example
MODEL REPRESENTATION
The job of a learning algorithm is to take the training set and output a function, usually denoted by lowercase h, where h stands for hypothesis.

Training set → Learning algorithm → h,  and then  x → h → y

The job of the hypothesis function is to take a value of x and output an estimated value of y. So h is a function that maps from x's to y's.
MODEL REPRESENTATION
How do we represent h ?
[Scatter plot of training examples (marked ×) with a fitted straight line.]
Linear Equations

Y = θ0 + θ1X

θ1 = slope = change in Y / change in X (ΔY/ΔX)
θ0 = Y-intercept

Linear regression with one variable: univariate linear regression.
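As a minimal sketch (not from the slides), this line can be written as a small Python prediction function; the parameter values used below are purely illustrative:

import numpy as np

# Univariate hypothesis h_theta(x) = theta0 + theta1 * x
def h(x, theta0, theta1):
    """Predict y from x using intercept theta0 and slope theta1."""
    return theta0 + theta1 * x

# Illustrative parameter values only (not from the lecture data):
print(h(1416, 50.0, 0.13))  # predicted price (in $1000's) for a 1416 ft^2 house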
Types of Regression Models
➢ Positive linear relationship
➢ Negative linear relationship
➢ Relationship not linear
➢ No relationship
COST FUNCTION
▪ The cost function lets us figure out how to fit the best possible straight line to our data.
▪ How do we choose the θi's?
Scatter plot
▪ 1. Plot of All (Xi, Yi) Pairs
▪ 2. Suggests How Well Model Will Fit
[Scatter plot of the (Xi, Yi) pairs.]
Thinking Challenge
How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Scatter plot of the data points, before any line is drawn.]
[Candidate line: slope changed, intercept unchanged.]
[Candidate line: slope unchanged, intercept changed.]
[Candidate line: slope changed, intercept changed.]
Training Set

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Hypothesis: hθ(x) = θ0 + θ1x

θ's: parameters (weights)

How do we choose the θ's?

[Three example plots showing the lines produced by different choices of θ0 and θ1.]
Least Squares

▪ 1. 'Best fit' means the difference between the actual Y values and the predicted Y values is a minimum. So square the errors!

$$\sum_{i=1}^{m} \left(Y_i - h_\theta(x_i)\right)^2 = \sum_{i=1}^{m} \hat{\varepsilon}_i^{\,2}$$
▪ 2. Least squares (LS) minimizes the Sum of the Squared Errors (SSE).
Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \hat{\varepsilon}_1^{\,2} + \hat{\varepsilon}_2^{\,2} + \hat{\varepsilon}_3^{\,2} + \hat{\varepsilon}_4^{\,2}$

Observed value: $Y_2 = \theta_0 + \theta_1 X_2 + \hat{\varepsilon}_2$    Fitted line: $h_\theta(x_i) = \theta_0 + \theta_1 X_i$

[Plot: the fitted line with the vertical residuals ε̂1 … ε̂4 from each data point down to the line.]
Least-Squared-Errors Linear Regression

COST FUNCTION

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where the $h_\theta(x^{(i)})$ are the predictions on the training set and the $y^{(i)}$ are the actual values.

Goal: minimize $J(\theta_0, \theta_1)$ over $\theta_0, \theta_1$.
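The cost can be computed directly. Below is a minimal sketch, assuming NumPy arrays built from the housing training set shown earlier; the θ values passed in are illustrative only:

import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x          # h_theta(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Housing training set from the slides
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(cost(0.0, 0.2, x, y))  # illustrative parameter choice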
Cost function visualization with One parameter
Consider a simple case of the hypothesis by setting θ0 = 0, so h becomes:

hθ(x) = θ1x

Each value of θ1 corresponds to a different hypothesis, since it is the slope of the line, and therefore to a different line passing through the origin (the y-intercept θ0 is nulled out).

[Plots of hθ(x) = θ1x for θ1 = 2, θ1 = 1, and θ1 = 0.5, each with its corresponding cost J(θ1).]
[Paired plots: as the coefficient θ1 changes, the hypothesis line (left) and the corresponding point on the cost function J(θ1) (right) change together.]
Cost function visualization with one parameter

Evaluating the simple hypothesis at θ1 = 2, θ1 = 1, and θ1 = 0.5 gives one value of J(θ1) for each slope. Plotting more points like this gives the following graph of the cost function as a function of the parameter θ1; each value of θ1 on the plot corresponds to a different hypothesis.
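These points on J(θ1) can be computed in a few lines. This sketch assumes the toy training set (1,1), (2,2), (3,3) suggested by the slide plots:

import numpy as np

# Toy training set assumed from the slide plots: y = x exactly
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

# With theta0 = 0, J(theta1) = (1/2m) * sum((theta1 * x_i - y_i)^2)
for theta1 in (0.5, 1.0, 2.0):
    J = np.sum((theta1 * x - y) ** 2) / (2 * m)
    print(theta1, J)   # J(1) = 0 is the minimum; J(0.5) and J(2) are larger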
Cost function visualization with One parameter
What is the optimal value of θ1 that minimizes J(θ1)?
It is clear that the best value is θ1 = 1, since J(θ1) = 0, which is the minimum.
How do we find the best value for θ1? Plotting? Not practical, especially in high dimensions.
The solution:
1. Analytical solution: not practical for large datasets (a minimal sketch follows this list).
2. Numerical solution: e.g., gradient descent.
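As a sketch of option 1, the analytical (normal-equation) solution can be computed directly with NumPy; the housing numbers below come from the training-set slide, and the use of the pseudo-inverse is an implementation choice, not part of the lecture:

import numpy as np

# Normal equation: theta = (X^T X)^(-1) X^T y, with a leading column of ones for theta0
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)    # [theta0, theta1]
print(theta)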
Plotting the cost function J(θ0, θ1)
Cost function visualization with θ0, θ1
COST FUNCTION (RECAP)
Gradient Descent
GRADIENT DESCENT
➢ An iterative solution, not only for linear regression; it is used all over the place in machine learning.
➢ Objective: minimize any function (here, the cost function J).
PROBLEM SETUP
Imagine that this is the landscape of a grassy park, and you want to get to the lowest point in the park as rapidly as possible.

[Surface plot of J(θ0, θ1): red means high cost, blue means low cost. Starting from an initial point, repeated downhill steps lead to a local minimum.]

[With a different starting point, gradient descent can end up at a different (new) local minimum.]
Gradient descent Algorithm (LMS)

Repeat until convergence (simultaneously for j = 0 and j = 1):
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$
where α is the learning rate.
EXAMPLE: J(θ1)

$$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$$

Positive slope: the derivative is positive, so θ1 := θ1 − α·(positive number) and θ1 decreases, moving toward the minimum.
Negative slope: the derivative is negative, so θ1 := θ1 − α·(negative number) and θ1 increases, again moving toward the minimum.
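A minimal sketch of this one-parameter update; the numbers are illustrative, not taken from the slides:

# theta1 := theta1 - alpha * dJ/dtheta1
def gradient_step(theta1, alpha, dJ_dtheta1):
    return theta1 - alpha * dJ_dtheta1

print(gradient_step(2.0, 0.1, +3.0))   # positive slope: theta1 decreases (2.0 -> 1.7)
print(gradient_step(0.2, 0.1, -3.0))   # negative slope: theta1 increases (0.2 -> 0.5)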
QUESTION
What do you think one step of gradient descent will do?
Change of Learning rate value

If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
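The effect of α can be seen numerically. This sketch assumes the same toy training set (1,1), (2,2), (3,3) with θ0 = 0; the α values are illustrative only:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

for alpha in (0.01, 0.1, 0.5):
    theta1 = 3.0                                   # start far from the minimum at theta1 = 1
    for _ in range(20):
        grad = np.sum((theta1 * x - y) * x) / m    # dJ/dtheta1
        theta1 -= alpha * grad
    print(alpha, theta1)  # 0.01: slow progress; 0.1: converges; 0.5: overshoots and diverges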
Local minimum
Gradient descent can converge to a local minimum, even with the
learning rate α fixed.
As we approach a local minimum, gradient descent will
automatically take smaller steps. So, no need to decrease α
over time.
GRADIENT DESCENT FOR A LINEAR REGRESSION

$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j}\, \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x_i) - Y_i\right)^2 = \frac{\partial}{\partial \theta_j}\, \frac{1}{2m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x_i - Y_i\right)^2$$

$$j = 0: \quad \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x_i) - Y_i\right)$$

$$j = 1: \quad \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x_i) - Y_i\right) \cdot x_i$$
G.D. FOR LINEAR REGRESSION

Repeat until convergence (simultaneous update):
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x_i) - Y_i\right)$$
$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x_i) - Y_i\right) \cdot x_i$$
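Putting the two derivatives into the update rule gives batch gradient descent. Here is a minimal NumPy sketch; the learning rate, iteration count, and toy data are illustrative choices, not values from the lecture:

import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = theta0 + theta1 * x - y       # h_theta(x_i) - Y_i
        grad0 = np.sum(errors) / m             # dJ/dtheta0
        grad1 = np.sum(errors * x) / m         # dJ/dtheta1
        theta0 -= alpha * grad0                # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))  # approaches (0.0, 1.0)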
Linear Regression Using TensorFlow
1-D Data Example
Data Preparation
import numpy as np

num_of_points = 100  # Generate 100 data points
points = []
for i in range(num_of_points):
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.01)
    points.append([x1, y1])

x_data = [v[0] for v in points]
y_data = [v[1] for v in points]
Draw Data
import matplotlib.pyplot as plt
plt.plot(x_data, y_data, 'ro', label='Original data')
plt.legend()
plt.show()
Original Data
Variables and Nodes
Preparation
import tensorflow as tf
# Initialize weight "W" and bias "b"
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
#Define Loss function as Mean of Squared Error
loss = tf.reduce_mean(tf.square(y - y_data))
#Create Optimizer class to minimize Losses
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
#initialize TensorFlow Variables (always)
init = tf.global_variables_initializer()
Execute TensorFlow Graph
# Start a TensorFlow Session and carry out Variable initialization
sess = tf.Session()
sess.run(init)

# Carry out 16 iterations
for step in range(16):
    sess.run(train)

    # Draw original data
    plt.plot(x_data, y_data, 'ro', label='Original data')
    # Draw predicted data (using the weight and bias learned so far)
    plt.plot(x_data, sess.run(W) * x_data + sess.run(b), label='Fitted line')
    plt.xlabel('x')
    plt.xlim(-2, 2)
    plt.ylim(0.1, 0.6)
    plt.ylabel('y')
    plt.legend()
    plt.show()

    # Print the updated weight, bias, and loss value after the current training iteration
    print(step, sess.run(W), sess.run(b), sess.run(loss))
[Plots of the fitted line after each of the 16 training iterations, showing it converging onto the original data.]
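For reference, here is a minimal sketch of the same training loop written for TensorFlow 2 (eager execution). It assumes the x_data and y_data lists from the data-preparation step above; this is an alternative formulation, not the version used in the lecture:

import numpy as np
import tensorflow as tf

# x_data and y_data as built in the data-preparation step (converted to float32 arrays)
x = np.asarray(x_data, dtype=np.float32)
y = np.asarray(y_data, dtype=np.float32)

W = tf.Variable(tf.random.uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
optimizer = tf.optimizers.SGD(learning_rate=0.5)

for step in range(16):
    with tf.GradientTape() as tape:
        y_pred = W * x + b
        loss = tf.reduce_mean(tf.square(y_pred - y))
    grads = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(grads, [W, b]))
    print(step, W.numpy(), b.numpy(), loss.numpy())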