Linear Regression
Linear Regression
• Linear regression, or ordinary least squares (OLS), is the simplest and most classic linear method
for regression.
• Linear regression finds the parameters w and b that minimize the mean squared error between
predictions and the true regression targets, y, on the training set.
• The mean squared error is the average of the squared differences between the predictions and the true values.
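As a concrete illustration, here is a minimal NumPy sketch of that quantity; the candidate parameters w and b are illustrative values, not taken from the slides.

```python
import numpy as np

# Toy training data: house sizes and prices in $1000s (from the table below)
X = np.array([2104., 1416., 1534., 852.])
y = np.array([460., 232., 315., 178.])

# A candidate linear model: predictions = w * x + b
w, b = 0.2, 10.0
predictions = w * X + b

# Mean squared error: the average of the squared differences
mse = np.mean((predictions - y) ** 2)
print(f"MSE = {mse:.2f}")
```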
Linear regression with one variable
Model representation
[Figure: Housing Prices (Portland, OR), price (in 1000s of dollars) vs. size (feet²)]
Supervised Learning: the “right answer” is given for each example in the data.
Regression problem: predict a real-valued output. (Classification: discrete output values.)
Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
(x, y) = one training example
$(x^{(i)}, y^{(i)})$ = the $i$-th training example
Training Set → Learning Algorithm → hypothesis $h$
The hypothesis $h$ maps from $x$ (size of house) to the estimated $y$ (estimated price).
How do we represent $h$?
$h_\theta(x) = \theta_0 + \theta_1 x$
Linear regression with one variable: univariate linear regression.
Linear regression with one variable
Cost function
Training Set ($m = 47$):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_0, \theta_1$: parameters
How to choose $\theta_0, \theta_1$?
[Figure: three example lines $h_\theta(x)$ for different choices of the parameters $\theta_0$ and $\theta_1$]
Idea: Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$.
Linear regression with one variable
Cost function intuition I
Simplified setting: $\theta_0 = 0$.
Hypothesis: $h_\theta(x) = \theta_1 x$
Parameter: $\theta_1$
Cost Function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_1} J(\theta_1)$
[Figures: left panels show $h_\theta(x)$ for a fixed $\theta_1$ (a function of $x$); right panels show $J(\theta_1)$ (a function of the parameter $\theta_1$), traced out as $\theta_1$ varies]
Linear regression with one variable
Cost function intuition II
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
[Figure: $h_\theta(x)$ for fixed $\theta_0, \theta_1$ plotted over the training data, price ($ in 1000's) vs. size in feet² (x)]
[Figures: left, $h_\theta(x)$ for fixed $\theta_0, \theta_1$ (a function of $x$); right, a contour plot of $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$)]
Each contour line shows the set of values of the parameters $(\theta_0, \theta_1)$ that share the same value of $J$.
Linear regression with one variable
Gradient descent
Have some function $J(\theta_0, \theta_1)$.
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
Outline:
• Start with some initial $\theta_0, \theta_1$.
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum.
[Figures: surface plots of $J(\theta_0, \theta_1)$]
Gradient descent algorithm:
repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for $j = 0$ and $j = 1$)
}
Correct (simultaneous update): compute $\text{temp0} := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$ and $\text{temp1} := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$, then assign $\theta_0 := \text{temp0}$, $\theta_1 := \text{temp1}$.
Incorrect: updating $\theta_0$ first and then using the new $\theta_0$ when computing the update for $\theta_1$.
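A minimal Python sketch of the simultaneous update; grad_J is a hypothetical helper assumed to return both partial derivatives evaluated at the current parameters.

```python
def gradient_descent_step(theta0, theta1, alpha, grad_J):
    """One simultaneous update of (theta0, theta1).

    grad_J(theta0, theta1) is assumed to return
    (dJ/dtheta0, dJ/dtheta1) at the *current* parameters.
    """
    d0, d1 = grad_J(theta0, theta1)
    # Compute both updates from the old values...
    temp0 = theta0 - alpha * d0
    temp1 = theta1 - alpha * d1
    # ...then assign them together (simultaneous update).
    return temp0, temp1
```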
Linear regression with one variable
Gradient descent intuition
Gradient descent algorithm
If α is too small, gradient descent
can be slow.
If α is too large, gradient descent
can overshoot the minimum. It may
fail to converge, or even diverge.
At a local optimum the derivative is zero, $\frac{d}{d\theta_1} J(\theta_1) = 0$, so the update leaves the current value of $\theta_1$ unchanged.
Gradient descent can converge to a local
minimum, even with the learning rate α fixed.
As we approach a local
minimum, gradient
descent will automatically
take smaller steps. So, no
need to decrease α over
time.
Linear regression with one variable
Gradient descent for linear regression
Applying the gradient descent algorithm to the linear regression model $h_\theta(x) = \theta_0 + \theta_1 x$ with $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$ gives:
repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
Update $\theta_0$ and $\theta_1$ simultaneously.
The cost function $J(\theta_0, \theta_1)$ for linear regression is always a bowl-shaped, convex quadratic function, so it has a single global minimum and no other local optima.
[Figures: a sequence of gradient descent steps; left, $h_\theta(x)$ for the current $\theta_0, \theta_1$ (a function of $x$); right, the corresponding position on the contour plot of $J(\theta_0, \theta_1)$ (a function of the parameters)]
“Batch” Gradient Descent
“Batch”: Each step of gradient descent
uses all the training examples.
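A compact NumPy sketch of batch gradient descent for univariate linear regression; the learning rate, iteration count, and toy data are illustrative choices.

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.01, num_iters=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = (theta0 + theta1 * x) - y   # uses ALL m examples each step
        # Simultaneous update of both parameters
        grad0 = errors.mean()
        grad1 = (errors * x).mean()
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Example on small, well-scaled data
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.5, 3.5])
print(batch_gradient_descent(x, y))
```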
Other variants of gradient descent (see the sketch below):
• Stochastic Gradient Descent
  • We repeatedly run through the training set.
  • Each time we encounter a training example, we update the parameters according to the gradient of the error with respect to that single training example only.
• Mini-Batch Gradient Descent
  • Each step uses a small subset (mini-batch) of the entire dataset.
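A minimal sketch contrasting the stochastic and mini-batch updates for the same univariate model; the batch size and learning rate are illustrative.

```python
import numpy as np

def sgd_epoch(x, y, theta0, theta1, alpha=0.01):
    """One pass over the data, updating after every single example."""
    for xi, yi in zip(x, y):
        error = (theta0 + theta1 * xi) - yi
        theta0 -= alpha * error
        theta1 -= alpha * error * xi
    return theta0, theta1

def minibatch_epoch(x, y, theta0, theta1, alpha=0.01, batch_size=2):
    """One pass over the data, updating on small random batches."""
    idx = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        errors = (theta0 + theta1 * x[b]) - y[b]
        theta0 -= alpha * errors.mean()
        theta1 -= alpha * (errors * x[b]).mean()
    return theta0, theta1
```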
Linear Regression with multiple variables
Multiple features
Multiple features (variables).

Size (feet²)    Price ($1000)
2104            460
1416            232
1534            315
852             178
…               …
Multiple features (variables).

Size (feet²)    Number of bedrooms    Number of floors    Age of home (years)    Price ($1000)
2104            5                     1                   45                     460
1416            3                     2                   40                     232
1534            3                     2                   30                     315
852             2                     1                   36                     178
…               …                     …                   …                      …
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
For convenience of notation, define $x_0 = 1$, so each input is the $1 \times (n+1)$ row vector $x = [x_0, x_1, \dots, x_n]$ and the hypothesis becomes $h_\theta(x) = \theta^T x$.
Multivariate linear regression.
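A short NumPy sketch of the $x_0 = 1$ convention and the vectorized hypothesis $h_\theta(x) = \theta^T x$; the feature values come from the table above, while theta is an arbitrary illustrative vector.

```python
import numpy as np

# Features: size, bedrooms, floors, age (from the table above)
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
], dtype=float)

# Prepend x0 = 1 to every example -> design matrix of shape (m, n+1)
X = np.hstack([np.ones((X.shape[0], 1)), X])

theta = np.array([80.0, 0.1, 10.0, 3.0, -2.0])  # illustrative parameters

# Vectorized hypothesis for all examples at once
predictions = X @ theta
print(predictions)
```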
Linear Regression with multiple variables
Gradient descent for multiple variables
Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$ (with $x_0 = 1$)
Parameters: $\theta = [\theta_0, \theta_1, \dots, \theta_n]^T$
Cost function: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Gradient descent:
Repeat {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
} (simultaneously update for every $j = 0, \dots, n$)

New algorithm ($n \ge 1$):
Repeat {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
} (simultaneously update $\theta_j$ for $j = 0, \dots, n$)

Previously ($n = 1$):
Repeat {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
} (simultaneously update $\theta_0, \theta_1$)
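A vectorized NumPy sketch of this update, assuming X already includes the $x_0 = 1$ column; the learning rate and iteration count are illustrative.

```python
import numpy as np

def gradient_descent_multi(X, y, alpha=0.01, num_iters=1000):
    """Gradient descent for multivariate linear regression.

    X: (m, n+1) design matrix with a leading column of ones.
    y: (m,) target vector.
    """
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(num_iters):
        errors = X @ theta - y            # (m,)
        gradient = (X.T @ errors) / m     # (n+1,)
        theta -= alpha * gradient         # simultaneous update of all theta_j
    return theta
```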
Linear Regression with multiple variables
Gradient descent in practice I: Feature Scaling
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g. $x_1$ = size (0-2000 feet²), $x_2$ = number of bedrooms (1-5).
Feature Scaling
Get every feature into approximately a $-1 \le x_j \le 1$ range.
Mean normalization
Replace $x_j$ with $x_j - \mu_j$ to make features have approximately zero mean (do not apply to $x_0 = 1$).
E.g. $x_j := \dfrac{x_j - \mu_j}{s_j}$, where $\mu_j$ is the mean of feature $j$ and $s_j$ is its range (max minus min) or standard deviation.
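A small NumPy sketch of mean normalization that divides by the standard deviation (one common choice of $s_j$); the leading column of ones is left untouched.

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column (except x0) to roughly zero mean, unit scale."""
    X = X.astype(float).copy()
    mu = X[:, 1:].mean(axis=0)           # per-feature means
    sigma = X[:, 1:].std(axis=0)         # per-feature standard deviations
    X[:, 1:] = (X[:, 1:] - mu) / sigma   # do not touch the x0 = 1 column
    return X, mu, sigma
```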
Linear Regression with multiple variables
Gradient descent in practice II: Learning rate
Gradient descent
- “Debugging”: how to make sure gradient descent is working correctly.
- How to choose the learning rate $\alpha$.
Making sure gradient descent is working correctly: plot $J(\theta)$ against the number of iterations; $J(\theta)$ should decrease after every iteration.
Example automatic convergence test: declare convergence if $J(\theta)$ decreases by less than some small threshold $\epsilon$ in one iteration.
[Figure: $J(\theta)$ vs. number of iterations, flattening out as gradient descent converges]
Making sure gradient descent is working correctly:
[Figures: $J(\theta)$ increasing or oscillating with the number of iterations, a sign that gradient descent is not working; use a smaller $\alpha$]
- For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
- But if $\alpha$ is too small, gradient descent can be slow to converge.
Summary:
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; it may not converge.
To choose $\alpha$, try a range of values spaced roughly by factors of 3 (e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …) and pick the largest value for which $J(\theta)$ still decreases steadily.
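A small sketch of the convergence check described above: track $J(\theta)$ per iteration and stop once the decrease falls below a threshold; the threshold and learning rate are illustrative values.

```python
import numpy as np

def cost(X, y, theta):
    errors = X @ theta - y
    return (errors @ errors) / (2 * len(y))

def gradient_descent_with_check(X, y, alpha=0.01, max_iters=10000, eps=1e-3):
    """Run gradient descent, stopping when J decreases by less than eps."""
    theta = np.zeros(X.shape[1])
    history = [cost(X, y, theta)]
    for _ in range(max_iters):
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
        history.append(cost(X, y, theta))
        if history[-2] - history[-1] < eps:   # automatic convergence test
            break
    return theta, history
```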
Linear Regression with multiple variables
Features and polynomial regression
Housing prices prediction
Polynomial regression: instead of fitting a straight line to price (y) vs. size (x), define new features from powers of the size, e.g.
$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$ (quadratic) or $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ (cubic).
[Figure: price (y) vs. size (x) with a curved polynomial fit]
Choice of features
Other feature choices are possible, e.g. $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 \sqrt{x}$. With polynomial features, feature scaling becomes especially important, since $x$, $x^2$, and $x^3$ take values on very different scales.
[Figure: price (y) vs. size (x) comparing candidate feature choices]
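A minimal sketch of polynomial regression as linear regression on engineered features; the degree and data values are illustrative, and the fit uses a least-squares solver rather than gradient descent.

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    """Build columns [1, x, x^2, ..., x^degree] from a 1-D feature."""
    return np.column_stack([x ** d for d in range(degree + 1)])

# Illustrative data: house sizes (scaled) and prices
x = np.array([852., 1416., 1534., 2104.]) / 1000.0   # scale the feature first
y = np.array([178., 232., 315., 460.])

X_poly = polynomial_design_matrix(x, degree=2)
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)   # fit the linear model
print(theta)
```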
Linear Regression with multiple variables
Normal equation
Gradient descent finds $\theta$ iteratively; the normal equation is a method to solve for $\theta$ analytically in one step.
Intuition: if $\theta$ is a single scalar (1D), $J(\theta)$ is a quadratic in $\theta$; set $\frac{d}{d\theta} J(\theta) = 0$ and solve for $\theta$.
For $\theta \in \mathbb{R}^{n+1}$: set $\frac{\partial}{\partial \theta_j} J(\theta) = 0$ (for every $j$) and solve for $\theta_0, \theta_1, \dots, \theta_n$.
Examples: $m = 4$ training examples, $n = 4$ features.

x₀    Size (feet²)    Number of bedrooms    Number of floors    Age of home (years)    Price ($1000)
1     2104            5                     1                   45                     460
1     1416            3                     2                   40                     232
1     1534            3                     2                   30                     315
1     852             2                     1                   36                     178

E.g. stack the inputs (including $x_0 = 1$) as rows of the design matrix $X \in \mathbb{R}^{m \times (n+1)}$ and the targets as the vector $y \in \mathbb{R}^{m}$. Then
$\theta = (X^T X)^{-1} X^T y$,
where $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.
Octave: pinv(X'*X)*X'*y
$m$ training examples, $n$ features.

Gradient Descent:
• Need to choose $\alpha$.
• Needs many iterations.
• Works well even when $n$ is large.

Normal Equation:
• No need to choose $\alpha$.
• No need to iterate.
• Need to compute $(X^T X)^{-1}$.
• Slow if $n$ is very large.
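The Octave one-liner above carries over directly; here is a NumPy sketch of the same closed-form solution, using the pseudo-inverse as the slide does.

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form solution theta = pinv(X^T X) X^T y.

    X: (m, n+1) design matrix including the column of ones.
    y: (m,) target vector.
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```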
Ridge Regression (L2 Regularization)
• Adds an L2 penalty to the least-squares cost: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$.
• The regularization parameter $\lambda \ge 0$ controls how strongly the coefficients are shrunk toward zero.
• Shrinks coefficients but does not set them exactly to zero, so all features are kept.
• Helps when features are correlated or when the unregularized model overfits.
Lasso (Least Absolute Shrinkage and Selection Operator) (L1 Regularization)
• Adds an L1 penalty to the least-squares cost: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{m} \sum_{j=1}^{n} |\theta_j|$.
• The L1 penalty can drive some coefficients to exactly zero, so lasso performs automatic feature selection.
• Useful when only a subset of the features is expected to matter.
• Has no closed-form solution in general; it is typically solved with iterative methods such as coordinate descent.
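A brief sketch of both penalties using scikit-learn; the alpha values are illustrative, and scikit-learn's alpha plays the role of the regularization strength $\lambda$.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Illustrative data: the features and prices from the earlier table
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30],
              [ 852, 2, 1, 36]], dtype=float)
y = np.array([460., 232., 315., 178.])

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: can zero out coefficients

print("ridge coefficients:", ridge.coef_)
print("lasso coefficients:", lasso.coef_)
```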
Logistic Regression
Logistic Regression*
The starting point is linear regression, $h_\theta(x) = \theta^T x$, applied to a classification problem such as predicting malignancy from tumor size, with the prediction thresholded at 0.5.
[Figures: malignancy (0/1) vs. tumor size, with a fitted line and the 0.5 threshold]
*The name is given due to historical reasons.
Logistic Regression
• Consider a binary classification problem where $y \in \{0, 1\}$.
• The classifier's threshold is at $h_\theta(x) = 0.5$:
  • If $h_\theta(x) \ge 0.5$, predict $y = 1$.
  • If $h_\theta(x) < 0.5$, predict $y = 0$.
• Logistic regression guarantees $0 \le h_\theta(x) \le 1$.
Logistic Regression
• In linear regression: $h_\theta(x) = \theta^T x$
• In logistic regression: $h_\theta(x) = \mathrm{sigm}(\theta^T x)$
• where $\mathrm{sigm}(z) = \dfrac{1}{1 + e^{-z}}$ (the sigmoid function or the logistic function)
• Hence the name logistic regression.
• But it is a classifier that is extended from linear regression.
• Finally, $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
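A tiny sketch of the sigmoid and the logistic hypothesis; theta and x are illustrative values.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic regression hypothesis: sigmoid(theta^T x)."""
    return sigmoid(theta @ x)

theta = np.array([-3.0, 1.0, 1.0])   # illustrative parameters
x = np.array([1.0, 2.0, 2.0])        # x0 = 1 plus two features
print(h(theta, x))                   # sigmoid(1.0), roughly 0.73
```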
Logistic Regression
• $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
• The task is to select parameters $\theta$ to fit the data.
[Figure: the sigmoid curve, rising from 0 through 0.5 to 1]
Logistic Regression
• $h_\theta(x)$ = estimated probability that $y = 1$ on input $x$.
• For example, if
  $x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ \text{tumor size} \end{bmatrix}$
  and $h_\theta(x) = 0.7$, there is a 70% chance of the tumor being malignant.
• $h_\theta(x) = p(y = 1 \mid x; \theta)$, the probability that $y = 1$ given $x$, parameterized by $\theta$.
• $p(y = 0 \mid x; \theta) + p(y = 1 \mid x; \theta) = 1$
• $p(y = 0 \mid x; \theta) = 1 - p(y = 1 \mid x; \theta)$
Logistic Regression
Decision Boundary
Linear case: $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$ with $\theta = \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix}$.
Predict $y = 1$ if $-3 + x_1 + x_2 \ge 0$, i.e. $x_1 + x_2 \ge 3$.
This is a linear decision boundary obtained from the logistic regression model.
[Figure: the line $x_1 + x_2 = 3$ in the $(x_1, x_2)$ plane separating the two predicted classes]
Logistic Regression
Decision Boundary
Nonlinear case: $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$ with $\theta = \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$.
Predict $y = 1$ if $-1 + x_1^2 + x_2^2 \ge 0$, i.e. $x_1^2 + x_2^2 \ge 1$.
[Figure: the unit circle $x_1^2 + x_2^2 = 1$ in the $(x_1, x_2)$ plane, with $y = 1$ predicted outside the circle and $y = 0$ inside]
Logistic Regression
Cost Function
The cost is what a learning algorithm (hypothesis) has to pay if its prediction is $h_\theta(x)$ when the actual label is $y$.
Linear regression model: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$ (MSE: mean squared error)
$\mathrm{Cost}(h_\theta(x), y) = \frac{1}{2} \left( h_\theta(x) - y \right)^2$
If we use the same cost function for logistic regression, whose hypothesis is a nonlinear function, it results in a nonconvex $J(\theta)$.
Gradient Descent
$\theta^{\text{new}} = \theta^{\text{old}} + \Delta\theta$, where $\Delta\theta = -\alpha \left. \dfrac{\partial J(\theta)}{\partial \theta} \right|_{\theta = \theta^{\text{old}}}$
The parameter $\alpha > 0$ plays an important role in the convergence of the algorithm.
If $\alpha$ is too small, the correction is small and convergence to the optimum point is slow.
If $\alpha$ is too large, the algorithm may oscillate around the optimum value and convergence is not possible.
[Figure: geometric interpretation of parameter optimization; in the gradient descent scheme, the correction of the parameters takes place in the direction that decreases the value of the cost function]
Logistic Regression
Cost Function
• If $\alpha$ is properly chosen, the algorithm converges to a stationary point of $J(\theta)$, which can be
  • a local minimum (e.g. $\theta_1^0$),
  • a global minimum (e.g. $\theta^0$),
  • or a saddle point (e.g. $\theta_2^0$).
• Which stationary point the algorithm converges to depends on the position of the initial point relative to the stationary points.
• Furthermore, the convergence speed depends on the form of the cost $J(\theta)$.
Logistic Regression
Cost Function for Logistic Regression
Logistic regression model:
$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
Case $y = 1$:
Cost $= 0$ if $y = 1$ and $h_\theta(x) = 1$.
But as $h_\theta(x) \to 0$, Cost $\to \infty$.
That is, if $h_\theta(x) = 0$ the model claims $p(y = 1 \mid x; \theta) = 0$, yet in fact $y = 1$, so we penalize the learning algorithm with a very large cost.
[Figure: $-\log(h_\theta(x))$ plotted against $h_\theta(x) \in (0, 1]$]
Case $y = 0$:
Cost $= 0$ if $y = 0$ and $h_\theta(x) = 0$, but as $h_\theta(x) \to 1$, Cost $\to \infty$.
[Figure: $-\log(1 - h_\theta(x))$ plotted against $h_\theta(x) \in [0, 1)$]
Logistic Regression
Cost Function for Logistic Regression
$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$
$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
Since $y \in \{0, 1\}$, the two cases can be written in a single expression:
$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( -y^{(i)} \log h_\theta(x^{(i)}) - (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right)$
• This cost function is derived in statistics from the principle of maximum likelihood estimation, which gives an efficient way to find parameters for many different models.
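A direct NumPy sketch of this cost; the small epsilon inside the logs is purely for numerical safety and is not part of the slide's formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, eps=1e-12):
    """J(theta) = (1/m) * sum of -y*log(h) - (1-y)*log(1-h)."""
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h + eps) - (1 - y) * np.log(1 - h + eps))
```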
Logistic Regression
Cost Function for Logistic Regression
$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$
• To fit the parameters $\theta$: solve $\min_\theta J(\theta)$. But how?
• To make a prediction on new data $x$: output $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
Logistic Regression
Cost Function for Logistic Regression
$\min_\theta J(\theta)$ is achieved through gradient descent.
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right)$
Repeat (simultaneously updating all $\theta_j$) {
    $\theta_j := \theta_j - \alpha \dfrac{\partial J(\theta)}{\partial \theta_j}$
}
which works out to
$\theta_j := \theta_j - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
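A closing sketch that puts the pieces together: gradient descent on the logistic cost using the update rule above; the learning rate, iteration count, and toy data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, alpha=0.1, num_iters=5000):
    """Fit logistic regression by gradient descent.

    X: (m, n+1) design matrix including the x0 = 1 column.
    y: (m,) labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        gradient = X.T @ (h - y) / m   # (1/m) * sum (h - y) * x_j
        theta -= alpha * gradient      # simultaneous update of all theta_j
    return theta

# Tiny illustrative dataset: y = 1 when x1 + x2 is large
X = np.array([[1, 0.5, 0.5],
              [1, 1.0, 1.0],
              [1, 2.0, 2.5],
              [1, 3.0, 2.0]], dtype=float)
y = np.array([0., 0., 1., 1.])
theta = logistic_gradient_descent(X, y)
print(theta, sigmoid(X @ theta))
```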