Machine Learning
Gradient Descent
Gradient Descent
Gradient Descent is just like Agile Methodology
Build something quickly → Get it out there → Get some feedback → Make changes depending upon the feedback → (repeat)
Gradient Descent
Let's have some function J(θ)
Want to min_θ J(θ)
Algorithm:
- initialize θ's randomly
- keep changing θ's to reduce J(θ),
  until we hopefully end up at a minimum
Gradient Descent
Let's have some function J(θ)
Want to min_θ J(θ)
Algorithm:
- initialize θ's randomly
- repeat until convergence {
    θi := θi - α ∂/∂θi J(θ)
  }
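A minimal Python sketch of this update loop, assuming the gradient of J is available as a function grad_J returning all partial derivatives ∂/∂θi J(θ) (grad_J, alpha and num_iters are illustrative names, not from the slide):

    def gradient_descent(grad_J, theta_init, alpha=0.1, num_iters=1000):
        # theta is the full parameter vector; every theta_i is updated using
        # the gradient evaluated at the previous theta (simultaneous update)
        theta = list(theta_init)
        for _ in range(num_iters):
            grad = grad_J(theta)
            theta = [t - alpha * g for t, g in zip(theta, grad)]
        return theta

    # illustrative usage: minimize J(theta) = theta0**2 + theta1**2
    grad_J = lambda th: [2 * th[0], 2 * th[1]]
    print(gradient_descent(grad_J, [4.0, -3.0]))   # approaches [0, 0]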
Gradient Descent
Let's have some function J(θ1)
Want to min_θ1 J(θ1)
Algorithm:
- initialize θ1 randomly
- keep changing θ1 to reduce J(θ1),
  until we hopefully end up at a minimum
Gradient Descent
Let's have some function J(θ1)
Want to min_θ1 J(θ1)
Algorithm:
- initialize θ1 randomly
- repeat until convergence {
    θ1 := θ1 - α ∂/∂θ1 J(θ1)
  }
Gradient Descent
J(θ1) = (θ1 - 3)² + 5
Update rule: θ1 := θ1 - α ∂/∂θ1 J(θ1)
∂/∂θ1 J(θ1) = 2(θ1 - 3),  α = 0.1
Try two starting points: θ1 = 10 and θ1 = -5

θ1    :  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9  10  11  12  13
J(θ1) :  86  69  54  41  30  21  14   9   6   5   6   9  14  21  30  41  54  69  86 105
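Working the update rule by hand for the first few steps, starting from θ1 = 10 with α = 0.1:

    θ1 = 10:    ∂/∂θ1 J = 2(10 - 3)  = 14     θ1 := 10 - 0.1·14    = 8.6
    θ1 = 8.6:   ∂/∂θ1 J = 2(8.6 - 3) = 11.2   θ1 := 8.6 - 0.1·11.2 = 7.48
    θ1 = 7.48:  ∂/∂θ1 J = 2(7.48 - 3)= 8.96   θ1 := 7.48 - 0.896   = 6.584

and starting from θ1 = -5:

    θ1 = -5:    ∂/∂θ1 J = 2(-5 - 3)  = -16    θ1 := -5 + 1.6       = -3.4
    θ1 = -3.4:  ∂/∂θ1 J = 2(-3.4 - 3)= -12.8  θ1 := -3.4 + 1.28    = -2.12

From both starting points θ1 moves toward the minimum at θ1 = 3, where J(θ1) = 5.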
Gradient Descent
Q&A
Impact of learning rate in Gradient Descent
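The plots from these slides are not reproduced here; the behaviour they typically illustrate can be checked on the same example cost J(θ1) = (θ1 - 3)² + 5: a very small α converges slowly, a moderate α converges quickly, and a too-large α overshoots and diverges. A small Python experiment (the α values 0.01, 0.1 and 1.1 are illustrative choices, not from the slides):

    def run(alpha, theta=10.0, num_iters=25):
        # gradient descent on J(theta) = (theta - 3)**2 + 5
        for _ in range(num_iters):
            theta = theta - alpha * 2 * (theta - 3)
        return theta

    for alpha in (0.01, 0.1, 1.1):
        print(alpha, run(alpha))
    # alpha = 0.01 -> still far from 3 after 25 steps (too slow)
    # alpha = 0.1  -> very close to 3
    # alpha = 1.1  -> the error grows every step (diverges)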
Q&A
How to implement Gradient Descent
J(θ1) = (θ1 - 3)² + 5

Algorithm:
- initialize θ1 randomly
- repeat until convergence {
    θ1 := θ1 - α ∂/∂θ1 J(θ1)
  }

∂/∂θ1 J(θ1) = 2(θ1 - 3)

Two initializations: θ1 = 10 and θ1 = -5

Plugging the derivative into the update rule:
Repeat until convergence {
    θ1 := θ1 - α · 2(θ1 - 3)
}

(θ1 vs J(θ1) table as on the earlier Gradient Descent slide)
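A runnable Python sketch of this procedure, using a simple convergence test on the size of the update (the tolerance and iteration cap are illustrative choices, not from the slide):

    def gradient_descent(theta1, alpha=0.1, tol=1e-6, max_iters=10000):
        # minimize J(theta1) = (theta1 - 3)**2 + 5 with the update
        # theta1 := theta1 - alpha * 2 * (theta1 - 3)
        for _ in range(max_iters):
            step = alpha * 2 * (theta1 - 3)
            theta1 = theta1 - step
            if abs(step) < tol:   # "until convergence"
                break
        return theta1

    print(gradient_descent(10))   # ~3.0, where J = 5
    print(gradient_descent(-5))   # ~3.0, where J = 5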
How to implement Gradient Descent
Cost function: J(θ0, θ1)
Want to min_{θ0,θ1} J(θ0, θ1)
Algorithm:
- initialize θ's randomly
- repeat until convergence {
    θi := θi - α ∂/∂θi J(θ0, θ1)
  }
How to implement Gradient Descent
Cost function: J(θ0, θ1)
Want to min_{θ0,θ1} J(θ0, θ1)
Algorithm:
- initialize θ's randomly
- repeat until convergence {
    θi := θi - α ∂/∂θi J(θ0, θ1)
  }

Correct: simultaneous update
    temp0 := θ0 - α ∂/∂θ0 J(θ0, θ1)
    temp1 := θ1 - α ∂/∂θ1 J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1

Incorrect:
    temp0 := θ0 - α ∂/∂θ0 J(θ0, θ1)
    θ0 := temp0
    temp1 := θ1 - α ∂/∂θ1 J(θ0, θ1)
    θ1 := temp1
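A small Python sketch of the difference, using a made-up coupled cost J(θ0, θ1) = (θ0 + θ1 - 3)² (not from the slides) so that the order of the updates actually matters:

    def dJ_dtheta0(t0, t1):
        return 2 * (t0 + t1 - 3)

    def dJ_dtheta1(t0, t1):
        return 2 * (t0 + t1 - 3)

    alpha = 0.25
    theta0, theta1 = 0.0, 0.0

    # Correct: both partial derivatives are evaluated at the old values
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    print(temp0, temp1)          # 1.5 1.5
    # theta0, theta1 are left unchanged here so the incorrect order below
    # can be compared from the same starting point (0, 0)

    # Incorrect: theta0 is overwritten first, so the second derivative is
    # evaluated at the already-updated theta0, giving a different result
    theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    print(theta0, theta1)        # 1.5 0.75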
Q&A