Machine Learning Algorithms
Linear Regression with One Variable:
• Model Representation
• Cost Function
• Gradient Descent
[Figure: "Housing Prices" scatter plot; the independent variable (size) is on the x-axis and the dependent variable (price, in $1000's) is on the y-axis.]
Supervised Learning: the "right answers" (labeled data) are given.
Regression: predict a continuous-valued output (the price).
Training set of housing prices (Portland, OR):

  Size in feet² (x)    Price ($) in 1000's (y)
  2104                 460
  1416                 232
  1534                 315
  852                  178
  ...                  ...

The number of rows is m, the number of training examples.
Notation:
• m = number of training examples
• x's = "input" variable / features
• y's = "output" variable / "target" variable
• (x, y) = one training example (one row of the table)
• (x^(i), y^(i)) = the i-th training example

From the table above: x^(1) = 2104, y^(2) = 232, x^(4) = 852.
The training set is fed to a learning algorithm, and the job of the learning algorithm is to output a function, usually denoted by a lowercase h, where h stands for "hypothesis":

  Training set → Learning algorithm → h

  x → h → estimated y

The job of the hypothesis function is to take the value of x and try to output the estimated value of y. So h is a function that maps from x's to y's.
How do we represent h ?
Linear Equations

  y = f(x) = θ0 + θ1·x

where
• θ1 = slope = change in Y / change in X (ΔY/ΔX)
• θ0 = Y-intercept
Types of Regression Models

[Figure: four scatter plots illustrating a positive linear relationship, a relationship that is NOT linear, a negative linear relationship, and no relationship.]
The cost function lets us figure out how to fit the best possible straight line to our data.
Training Set:

  Size in feet² (x)    Price ($) in 1000's (y)
  2104                 460
  1416                 232
  1534                 315
  852                  178
  ...                  ...
Hypothesis: hθ(x) = θ0 + θ1·x

How do we choose the θi's?
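To make the representation concrete, here is a minimal Python sketch of the hypothesis (the function name and the parameter values are my own, purely illustrative):

```python
def h(x, theta0, theta1):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With illustrative parameters theta0 = 50, theta1 = 0.2, predict the
# price (in $1000's) of the 1416 ft^2 house from the training set:
print(h(1416, 50.0, 0.2))  # 333.2, i.e. about $333,000
```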
Scatter plot
• 1. A plot of all (Xi, Yi) pairs
• 2. Suggests how well the model will fit

[Figure: scatter plot of the training pairs.]
Thinking Challenge
• How would you draw a line through the points?
• How do you determine which line ‘fits best’?

[Figure: the same scatter plot repeated with different candidate lines, first with the intercept changed and the slope unchanged, then with the slope changed as well, showing that many different lines could be drawn through the points.]
Least Squares
• 1. "Best fit" means the difference between the actual Y values and the predicted Y values is a minimum. So square the errors!

    Σ_{i=1}^{m} (Y_i − h(x_i))² = Σ_{i=1}^{m} ε̂_i²

• 2. Least squares (LS) minimizes the Sum of the Squared Errors (SSE).
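As a minimal sketch (the naming is my own), the SSE computation is a one-liner in Python:

```python
def sse(xs, ys, theta0, theta1):
    """Sum of squared errors between actual and predicted Y values."""
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))
```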
Least Squares Graphically

LS minimizes Σ_{i=1}^{n} ε̂_i² = ε̂_1² + ε̂_2² + ε̂_3² + ε̂_4²

[Figure: four data points with vertical residuals ε̂_1 … ε̂_4 measured to the fitted line hθ(x_i) = θ0 + θ1·x_i; for example, Y_2 = θ0 + θ1·X_2 + ε̂_2.]
Least-Squared-Errors Linear Regression

Minimize the difference between the predictions on the training set and the actual values.

Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).
Cost function visualization

Consider a simple case of the hypothesis by setting θ0 = 0; then h becomes hθ(x) = θ1·x.
Each value of θ1 corresponds to a different hypothesis: it is the slope of the line, so different values give different lines passing through the origin (the y-intercept θ0 is nulled out).

For the simple training set {(1, 1), (2, 2), (3, 3)} implied by the squared terms:
• At θ1 = 1, J(1) = 0
• At θ1 = 0.5, J(0.5) = (1/(2·3)) · (0.5² + 1² + 1.5²) ≈ 0.58
• At θ1 = 2, J(2) = (1/(2·3)) · (1² + 2² + 3²) ≈ 2.33

[Figure: lines through the origin for θ1 = 2, θ1 = 1, and θ1 = 0.5.]
Plotting more points like this, one gets the following graph for the cost function J(θ1), which depends on the parameter θ1; each point on the plot corresponds to a different hypothesis.

[Figure: bowl-shaped curve of J(θ1) versus θ1, with its minimum at θ1 = 1.]
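A short sketch that reproduces these numbers, assuming the three training points (1, 1), (2, 2), (3, 3) implied above:

```python
xs = [1.0, 2.0, 3.0]   # assumed training inputs
ys = [1.0, 2.0, 3.0]   # assumed training targets
m = len(xs)

def J(theta1):
    """Cost for the simplified hypothesis h(x) = theta1 * x (theta0 = 0)."""
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

for t in (0.5, 1.0, 2.0):
    print(t, round(J(t), 2))   # 0.5 -> 0.58, 1.0 -> 0.0, 2.0 -> 2.33
```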
Cost function visualization

What is the optimal value of θ1, the one that minimizes J(θ1)?
It is clear that the best value is θ1 = 1, since J(1) = 0, the minimum.

How do we find the best value of θ1?
Plotting? Not practical, especially in high dimensions.
The solutions:
1. Analytical solution: not applicable for large datasets.
2. Numerical solution, e.g. gradient descent.
Hypothesis:
  hθ(x) = θ0 + θ1·x

Parameters:
  θ0, θ1

Cost Function:
  J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²

Goal:
  minimize J(θ0, θ1) over θ0, θ1
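In code, the cost function reads directly off this formula; a minimal sketch (names are my own):

```python
def compute_cost(xs, ys, theta0, theta1):
    """J(theta0, theta1) = (1 / 2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)
```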
Gradient Descent

An iterative solution, not only for linear regression; it's actually used all over the place in machine learning.
Objective: minimize a function (here, the cost function J).

Have some function J(θ0, θ1).
Want min_{θ0, θ1} J(θ0, θ1).

Outline:
• Start with some θ0, θ1.
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum.
Imagine that this is the landscape of a grassy park, and you want to get to the lowest point in the park as rapidly as possible.

[Figure: 3D surface of J(θ0, θ1); red means high, blue means low. From one starting point, gradient descent descends to a local minimum; with a different starting point, it can end up at a different local minimum.]
Gradient descent algorithm

repeat until convergence {
  θj := θj − α · (∂/∂θj) J(θ0, θ1)    (for j = 0 and j = 1)
}

Where:
• := is the assignment operator
• α is the learning rate, which basically defines how big the steps are during the descent
• (∂/∂θj) J(θ0, θ1) is the partial derivative term
• j = 0, 1 represents the feature index number
Also, the parameters should be updated simultaneously, i.e.,

  temp0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
  temp1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1
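A runnable sketch of the simultaneous-update pattern; the cost surface here is a toy quadratic bowl and the partial derivatives are approximated numerically, both my own choices for illustration:

```python
def J(theta0, theta1):
    """Toy cost surface (a quadratic bowl), used only to illustrate the update."""
    return theta0 ** 2 + theta1 ** 2

def partials(f, theta0, theta1, eps=1e-6):
    """Numerical partial derivatives of f at (theta0, theta1)."""
    d0 = (f(theta0 + eps, theta1) - f(theta0 - eps, theta1)) / (2 * eps)
    d1 = (f(theta0, theta1 + eps) - f(theta0, theta1 - eps)) / (2 * eps)
    return d0, d1

theta0, theta1, alpha = 3.0, -2.0, 0.1
for _ in range(100):
    d0, d1 = partials(J, theta0, theta1)
    temp0 = theta0 - alpha * d0       # both temps use the OLD parameter values
    temp1 = theta1 - alpha * d1
    theta0, theta1 = temp0, temp1     # only then are both assigned

print(round(theta0, 4), round(theta1, 4))  # both approach 0, the minimum
```

Updating θ0 in place before computing θ1's derivative would evaluate the second partial derivative at a mixed point, which is no longer gradient descent on J.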
Intuition with a single parameter θ1:
• Where the slope (d/dθ1) J(θ1) is positive, the update θ1 := θ1 − α · (positive number) decreases θ1, moving it toward the minimum.
• Where the slope is negative, the update θ1 := θ1 − α · (negative number) increases θ1, again moving it toward the minimum.
Gradient descent algorithm for the linear regression model

repeat until convergence {
  θj := θj − α · (∂/∂θj) J(θ0, θ1)    (for j = 0 and j = 1)
}

Expanding the derivative term:

  (∂/∂θj) J(θ0, θ1) = (∂/∂θj) (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²
                    = (∂/∂θj) (1/2m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i))²

which gives:

  j = 0: (∂/∂θ0) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))
  j = 1: (∂/∂θ1) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) · x^(i)
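These two formulas translate line-for-line into a small gradient routine (a sketch, with my own naming):

```python
def gradients(xs, ys, theta0, theta1):
    """Partial derivatives of J(theta0, theta1) for linear regression."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]  # h(x_i) - y_i
    d_theta0 = sum(errors) / m                                  # j = 0 term
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m       # j = 1 term
    return d_theta0, d_theta1
```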
Gradient descent algorithm

repeat until convergence {
  θ0 := θ0 − α · (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))
  θ1 := θ1 − α · (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) · x^(i)
}
"Batch" Gradient Descent
"Batch": Each step of gradient descent
uses all the training examples.
repeat until convergence {
m
9o : Oo
‘
1
am w {i )
£ (M* ) y )
-
m
9\ 91 i
am E { ho { x {l)
)- y ) - x
{i)
^
}
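Putting the pieces together, a minimal batch gradient descent loop on the three-point dataset assumed earlier; the learning rate and fixed iteration count are my own choices standing in for "repeat until convergence":

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
m = len(xs)

theta0, theta1, alpha = 0.0, 0.0, 0.1
for _ in range(1000):                                   # "repeat until convergence"
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m                             # "batch": uses ALL m examples
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1  # simultaneous

print(round(theta0, 2), round(theta1, 2))  # approaches (0.0, 1.0), the exact fit
```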
Example: the fit after some iterations of gradient descent

[Figure: a sequence of eleven plots, one per iteration (iteration 1 through iteration 11), showing the fitted line on the same data gradually converging as gradient descent proceeds.]
Thanks