Linear Regression
with Multiple Variables
Lecture 04
Silvia Ahmed, Mirza Mohammad Lutfe Elahi
CSE 445 Machine Learning ECE@NSU
Multiple Features
Size in feet² (x) | Number of Bedrooms | Number of Floors | Age of home (years) | Price ($) in 1000's (y)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
… | … | … | … | …
• Notation
– n = number of features
– m = number of training examples
– x(i) = input (features) of ith training example
– xj(i) = value of feature j in the ith training example
Multiple Features
• Multiple variables = multiple features
• In the original version we had
– x = house size (use this to predict)
– y = house price
• If in a new scheme we have more variables (such as
number of bedrooms, number of floors, age of the
home)
– x1, x2, x3, x4 are the four features
• x1 - size (feet squared)
• x2 - Number of bedrooms
• x3 - Number of floors
• x4 - Age of home (years)
– y is the output variable (price)
Hypothesis for Multiple Features
Previously: hθ(x) = θ0 + θ1x
Now: hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + … + θnxn
E.g.:
hθ(x) = 80 + 0.1x1 + 0.01x2 + 3x3 - 2x4
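As a quick sketch (using the first row of the table above as the input; the θ values are just the slide's example numbers), this hypothesis can be evaluated directly:

```python
# Evaluate h(x) = 80 + 0.1*x1 + 0.01*x2 + 3*x3 - 2*x4 for one house
theta0, theta1, theta2, theta3, theta4 = 80, 0.1, 0.01, 3, -2
x1, x2, x3, x4 = 2104, 5, 1, 45          # size, bedrooms, floors, age (first table row)

h = theta0 + theta1 * x1 + theta2 * x2 + theta3 * x3 + theta4 * x4
print(h)                                  # predicted price in $1000's
```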
Hypothesis for Multiple Features
• For convenience of notation, define x0 = 1, so every example i has an additional 0th feature
• So now the feature vector is an (n + 1)-dimensional vector, indexed from 0
– This is a column vector called X
– Each example/sample has a column vector associated with it
• The parameters also live in a 0-indexed, (n + 1)-dimensional vector
– This is also a column vector called θ
– This vector is the same for each example
Hypothesis for Multiple Features
hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn
X = [x0; x1; x2; …; xn] ϵ ℝn+1
θ = [θ0; θ1; θ2; …; θn] ϵ ℝn+1
(both are column vectors, indexed from 0)
hθ(x) = θTX
Hypothesis for Multiple Features
• hθ(x) = θT X
– θT is a [1 × (n+1)] matrix
– In other words, because θ is a column vector, the transposition operation turns it into a row vector
– So before, θ was an [(n+1) × 1] matrix
– Now θT is a [1 × (n+1)] matrix
– Which means the inner dimensions of θT and X match, so they can be multiplied together as
• [1 × (n+1)] * [(n+1) × 1] = hθ(x)
• So, in other words, the transpose of the parameter vector times an input example X gives a predicted hypothesis of dimension [1 × 1] (i.e. a single value), as in the sketch below
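A minimal NumPy sketch of this vectorized hypothesis; the θ values are the earlier example's, and the feature vector is hypothetical (with x0 = 1 prepended):

```python
import numpy as np

theta = np.array([80, 0.1, 0.01, 3, -2])   # (n+1,) parameter vector
x = np.array([1, 2104, 5, 1, 45])          # (n+1,) feature vector, x0 = 1

# For 1-D arrays theta.T is the same as theta, so theta.T @ x is just the dot product
h = theta.T @ x
print(h)                                   # a single predicted value
```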
Model for Multiple Features
Hypothesis: hθ(x) = θTX = θ0x0 + θ1x1 + θ2x2 + … + θnxn
Parameters = θ0, θ1, θ2, … θn
Cost function:
J(θ) = (1/2m) Σ(i=1 to m) [hθ(x(i)) − y(i)]²
Gradient descent:
Repeat {
θj := θj − α ∂/∂θj J(θ)
} (simultaneously update for every j = 0, 1, … n)
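A small NumPy sketch of the cost function above, assuming X is a design matrix whose first column is the x0 = 1 feature (the numbers are the size and price columns of the earlier table, used purely for illustration):

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2 for design matrix X."""
    m = len(y)
    errors = X @ theta - y            # h_theta(x^(i)) - y^(i) for every example
    return (errors @ errors) / (2 * m)

# Illustrative data: x0 = 1 column plus house size; y = price in $1000's
X = np.array([[1.0, 2104], [1.0, 1416], [1.0, 1534], [1.0, 852]])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(compute_cost(X, y, np.zeros(2)))
```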
Gradient Descent Algorithm
Previously (n = 1):
Repeat {
θ0 := θ0 − α (1/m) Σ(i=1 to m) [hθ(x(i)) − y(i)]
θ1 := θ1 − α (1/m) Σ(i=1 to m) [hθ(x(i)) − y(i)] · x(i)
(simultaneously update θ0, θ1)
}
Gradient Descent Algorithm
New algorithm (n ≥ 1):
Repeat {
θj := θj − α (1/m) Σ(i=1 to m) [hθ(x(i)) − y(i)] · xj(i)
(simultaneously update θj for j = 0, 1, … n)
}
Written out for the first few parameters:
θ0 := θ0 − α (1/m) Σ(i=1 to m) [hθ(x(i)) − y(i)] · x0(i)
θ1 := θ1 − α (1/m) Σ(i=1 to m) [hθ(x(i)) − y(i)] · x1(i)
θ2 := θ2 − α (1/m) Σ(i=1 to m) [hθ(x(i)) − y(i)] · x2(i)
…
Gradient Descent Algorithm
• We're doing this for each j (0 until n)
as simultaneous update (like when n = 1)
• So, we update θj to
– θj minus the learning rate (α) times the partial derivative of the cost function J(θ) with respect to θj
– In non-calculus words, this means that we do
• Learning rate
• Times 1/m (makes the math easier)
• Times the sum of
– The hypothesis applied to that example's feature vector, minus the actual value, times the jth feature value of that example (as in the sketch below)
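A minimal, vectorized NumPy sketch of this update rule (α and the iteration count are arbitrary placeholder values, not anything the slides prescribe):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for linear regression.
    X is the [m x (n+1)] design matrix (first column all ones); y has length m."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        errors = X @ theta - y              # h_theta(x^(i)) - y^(i), shape (m,)
        gradient = (X.T @ errors) / m       # all partial derivatives at once
        theta = theta - alpha * gradient    # simultaneous update of every theta_j
    return theta
```

Computing the whole gradient with X.T @ errors is exactly the simultaneous update: every θj is revised using the same, pre-update θ.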
Feature Scaling
• A practical issue when you have multiple features
• Make sure those features are on a similar scale
– Gradient descent will then converge more quickly
• E.g. x1 = size (0 – 2000 feet2)
x2 = number of bed rooms (1 - 5)
• The contours of J(θ) plotted over θ1 vs. θ2 then form very tall, thin ellipses because of the huge range difference
• Running gradient descent on such contours can take a long time to find the global minimum
[Figure: elongated contours of J(θ) in the θ1–θ2 plane]
Feature Scaling
• Idea: Make sure features are on a similar scale
e.g. x1 = size / 2000, so 0 ≤ x1 ≤ 1
x2 = number of bedrooms / 5, so 0 ≤ x2 ≤ 1
• Define each value of x1 and x2 by dividing by the maximum of that feature
• The contours then become much more like circles (as the features are scaled between 0 and 1), and gradient descent converges faster
[Figure: near-circular contours of J(θ) in the θ1–θ2 plane]
Feature Scaling
• Get every feature into approximately a -1 ≤ xi ≤ 1 range
• Want to avoid large ranges, small ranges or very
different ranges from one another
• Rule of thumb regarding acceptable ranges
– Ranges up to about −3 to +3 are generally fine; anything much bigger is bad
– Ranges down to about −1/3 to +1/3 are okay; anything much smaller is bad
x0 = 1
0 ≤ x1 ≤ 3 (fine)
−2 ≤ x2 ≤ 0.5 (fine)
−100 ≤ x3 ≤ 100 (too large – rescale)
−0.0001 ≤ x4 ≤ 0.0001 (too small – rescale)
Mean Normalization
• Take a feature xi
– Replace it with (xi − μi) / max, where μi is the average value of xi over the training set (see the sketch below)
– So the values of each feature end up with an average of about 0
• E.g. -0.5 ≤ x1 ≤ 0.5
-0.5 ≤ x2 ≤ 0.5
• Instead of max, can also use standard deviation or
(max - min)
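A minimal NumPy sketch of mean normalization, here dividing by (max − min), one of the divisors mentioned above (the feature values are hypothetical):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column to roughly [-0.5, 0.5]: (x - mean) / (max - min)."""
    mu = X.mean(axis=0)
    feature_range = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / feature_range, mu, feature_range

# Hypothetical raw features: size in feet^2 and number of bedrooms
# (the x0 = 1 column is added after scaling, so it is not normalized)
X = np.array([[2104.0, 5], [1416.0, 3], [1534.0, 3], [852.0, 2]])
X_norm, mu, rng = mean_normalize(X)
print(X_norm)          # each column now has mean ~0 and range ~1
```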
Learning Rate α
• θj := θj − α ∂/∂θj J(θ)
• Debugging: how to make sure gradient descent is
working correctly
• How to choose learning rate α
Learning Rate α
[Figure: J(θ) plotted against the number of iterations, decreasing as gradient descent runs]
• θj := θj − α ∂/∂θj J(θ)
• The number of iterations needed varies a lot
– 30 iterations
– 3,000 iterations
– 3,000,000 iterations
– It is very hard to tell in advance how many iterations will be needed
– You can often make a guess based on a plot like this after the first 100 or so iterations
Learning Rate α
[Figure: J(θ) versus the number of iterations (0–400), flattening out as gradient descent converges]
• Automatic convergence tests
– Declare convergence if J(θ) decreases by less than 10⁻³ in one iteration (sketched below)
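A sketch of gradient descent with this automatic convergence test bolted on; the 10⁻³ threshold is the slide's, everything else (α, the cap on iterations) is a hypothetical choice:

```python
import numpy as np

def gradient_descent_auto(X, y, alpha=0.01, tol=1e-3, max_iters=100000):
    """Stop once J(theta) decreases by less than tol in a single iteration."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    prev_cost = np.inf
    for _ in range(max_iters):
        errors = X @ theta - y
        cost = (errors @ errors) / (2 * m)   # J(theta) before this update
        if prev_cost - cost < tol:
            break                            # declare convergence
        prev_cost = cost
        theta = theta - alpha * (X.T @ errors) / m
    return theta
```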
Learning Rate α
[Figures: J(θ) versus number of iterations that keeps increasing, or repeatedly rises and falls]
• If J(θ) behaves like this, gradient descent is not working
• Use a smaller α
Learning Rate α
• For sufficiently small α, J(θ) should decrease on every
iteration
• But if α is too small, gradient descent can be slow to converge
• So
– If α is too small: slow convergence
– If α is too large: J(θ) may not decrease on every iteration;
may not converge
• To choose α, try values like
…, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, … (compared in the sketch below)
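A hedged sketch of that search: run gradient descent for a fixed number of iterations at each α in the list and compare the resulting J(θ) (the data here is a tiny made-up set of already-scaled features):

```python
import numpy as np

def run_gd(X, y, alpha, num_iters=400):
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(num_iters):
        errors = X @ theta - y
        theta = theta - alpha * (X.T @ errors) / m
    final_errors = X @ theta - y
    return (final_errors @ final_errors) / (2 * m)

# Made-up, already-scaled data (x0 = 1 column plus one scaled feature)
X = np.array([[1.0, 0.5], [1.0, -0.2], [1.0, 1.3], [1.0, 0.1]])
y = np.array([1.0, 0.2, 2.1, 0.5])

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    print(f"alpha = {alpha}: final J(theta) = {run_gd(X, y, alpha):.6f}")
```

In practice you would plot J(θ) against the iteration number for each α, as in the figures above, rather than only comparing the final values.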
Feature Selection
• House price prediction
• Two features
– Frontage - width of the plot of land along the road (x1)
– Depth - depth away from road (x2)
• hθ(x) = θ0 + θ1 × frontage + θ2 × depth
Feature Selection
• You don't have to use just two features
– Can create new features
• Might decide that an important feature is the land area
– So, create a new feature: area (x3) = frontage × depth
hθ(x) = θ0 + θ1 × area
• Area is often a better indicator (see the sketch below)
• Often, by defining new features you may get a better
model
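A minimal sketch of engineering that new feature from hypothetical frontage and depth columns:

```python
import numpy as np

# Hypothetical raw features: frontage and depth of each plot (feet)
frontage = np.array([50.0, 30.0, 40.0, 25.0])
depth    = np.array([100.0, 80.0, 60.0, 120.0])

area = frontage * depth                               # new feature: area = frontage * depth
X = np.column_stack([np.ones_like(area), area])       # design matrix for h(x) = theta0 + theta1 * area
print(X)
```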
Polynomial regression
• May fit the data better
• hθ(x) = θ0 + θ1x + θ2x²
– e.g. quadratic function
Polynomial regression
• For the housing data we could use a quadratic function
– But it may not fit the data so well: a quadratic eventually turns back down, which would mean housing prices decrease when size gets really big
Polynomial regression
• So instead we might use a cubic function
• hθ(x) = θ0 + θ1x + θ2x² + θ3x³
Polynomial regression
• So instead we might use a cubic function
• hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3
• hθ(x) = θ0 + θ1(size) + θ2(size)² + θ3(size)³
x1 = (size)
x2 = (size)²
x3 = (size)³
Make sure to apply feature scaling; the ranges differ enormously (see the sketch below):
size = 1 – 1,000
(size)² = 1 – 1,000,000
(size)³ = 1 – 10⁹
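A minimal sketch of building these polynomial features and then scaling them (the size values are hypothetical):

```python
import numpy as np

size = np.array([2104.0, 1416.0, 1534.0, 852.0])   # hypothetical house sizes (feet^2)

# Polynomial features: x1 = size, x2 = size^2, x3 = size^3
X_poly = np.column_stack([size, size**2, size**3])

# Mean-normalize so the three columns end up on comparable scales
mu = X_poly.mean(axis=0)
rng = X_poly.max(axis=0) - X_poly.min(axis=0)
X_scaled = (X_poly - mu) / rng

X = np.column_stack([np.ones(len(size)), X_scaled])  # prepend the x0 = 1 column
print(X)
```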
Choice of features
hθ(x) = θ0 + θ1(size) + θ2√(size)
• Instead of a conventional polynomial you could use fractional powers of a variable, i.e. square root, cube root, etc.
Gradient Descent
[Figure: surface of the cost function J(θ)]
• To minimize the cost function J(θ), the iterative algorithm takes many steps, over many iterations of gradient descent, to converge to the global minimum
Normal Equation
• For some linear regression problems the
normal equation provides a better solution
• So far we've been using gradient descent
– Iterative algorithm which takes steps to converge
• Normal equation solves θ analytically
– Solve for the optimum value of θ in one step
– Has some advantages and disadvantages
Normal Equation
• Simplified cost function:
J(θ) = aθ² + bθ + c, where θ ϵ ℝ is a scalar (not a vector)
• How do you minimize this?
– Set d/dθ J(θ) = 0
• Take derivative of J(θ) with respect to θ
• Set that derivative equal to 0
• This allows you to solve for the value of θ which minimizes J(θ)
Normal Equation
• In our more complex problems;
– Here θ is an n + 1 dimensional vector of real numbers
– Cost function is a function of the vector value
θ ϵ ℝn+1
J(θ0, θ1, …, θn) = (1/2m) Σ(i=1 to m) [hθ(x(i)) − y(i)]²
• How do we minimize this function?
– Take the partial derivative of J(θ) with respect to each θj and set it to 0, for every j
∂/∂θj J(θ) = 0 (for every j)
Normal Equation
• Solve for the values θ0, θ1, θ2, … θn that minimize J(θ0, θ1, θ2, … θn)
• This gives the values of θ which minimize J(θ)
• If you work through the calculus and the solution, the
derivation is pretty complex
– We're not going to go through it here
– Instead, here is what you need to know to implement the process
Normal Equation
Size in feet² (x1) | Number of Bedrooms (x2) | Number of Floors (x3) | Age of home (years) (x4) | Price ($) in 1000's (y)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
• Here
– n=4
– m=4
Normal Equation
• Add an extra column for the x0 feature (always 1)
• Construct the column vector y, an [m × 1] matrix
(x0) | Size in feet² (x1) | Number of Bedrooms (x2) | Number of Floors (x3) | Age of home (years) (x4) | Price ($) in 1000's (y)
1 | 2104 | 5 | 1 | 45 | 460
1 | 1416 | 3 | 2 | 40 | 232
1 | 1534 | 3 | 2 | 30 | 315
1 | 852 | 2 | 1 | 36 | 178
X = the [m × (n+1)] matrix of the x columns; y = the [m × 1] vector of prices
• θ= (XT X)-1XT y
General Form
• m training examples and n features
• The design matrix (X)
– Each training example is an (n+1)-dimensional feature column vector
– X is constructed by taking each training example, transposing it (column → row), and using it as a row of the design matrix X
– This creates an [m × (n+1)] matrix (see the sketch below)
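A minimal NumPy sketch that builds the design matrix from the table above and solves the normal equation. One caveat: with only m = 4 examples and n + 1 = 5 columns, XT X is singular here, so the sketch uses the pseudo-inverse (pinv), the same workaround the non-invertibility slides mention later:

```python
import numpy as np

# Raw features from the table: size, bedrooms, floors, age
data = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

m = data.shape[0]
X = np.column_stack([np.ones(m), data])          # [m x (n+1)] design matrix, x0 = 1

# Normal equation: theta = (X^T X)^(-1) X^T y
# pinv is used because X^T X is singular for this tiny m <= n example
theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)
print(theta)
```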
General Form
• m training examples (x(1), y(1)), …, (x(m), y(m))
• n features
x(i) = [x0(i); x1(i); …; xn(i)] ϵ ℝn+1
Design matrix:
X = [ (x(1))T ; (x(2))T ; … ; (x(m))T ], an [m × (n+1)] matrix with one transposed example per row
General Form
• Concrete example with only one feature, x1:
x(i) = [1; x1(i)], so each row of X is [1 x1(i)] and X is an [m × 2] matrix
Normal Equation
θ = (XT X)-1XT y
• (XT X)-1 is inverse of matrix XT X
– i.e. A = XT X
– A-1 = (XT X)-1
• No need to do feature scaling.
Gradient Descent vs Normal Equation
m training examples, n features
Gradient Descent:
• Need to choose α
• Needs many iterations
• Works well even when n is massive (millions); better suited to big data
• What is a big n, though? 100 or even 1,000 is still relatively small; if n is around 10,000 or more (e.g. n = 10⁶), look at using gradient descent

Normal Equation:
• No need to choose α
• No need to iterate
• Needs to compute (XT X)-1, the inverse of an (n+1) × (n+1) matrix
• With most implementations, computing a matrix inverse grows as O(n³), so it is slow (can be much slower) if n is large
• Fine for n around 100, 1,000, or 10,000
Normal Equation and Noninvertibility
Normal equation: θ = (XT X)-1XT y
• What if XT X is non-invertible (singular/degenerate)?
– Only some matrices are invertible
– This should be quite a rare problem
• Octave: pinv(X' * X) * X' * y
– pinv (pseudo-inverse)
– This gets the right value even if XT X is non-invertible (a NumPy equivalent is sketched below)
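A NumPy equivalent of that Octave expression, as a sketch (X is assumed to be the [m × (n+1)] design matrix and y the target vector):

```python
import numpy as np

def normal_equation(X, y):
    """theta = pinv(X^T X) X^T y, which works even when X^T X is singular."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)

# Hypothetical usage with a tiny design matrix (first column is x0 = 1)
X = np.array([[1.0, 2104], [1.0, 1416], [1.0, 1534], [1.0, 852]])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(normal_equation(X, y))
```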
Normal Equation and Noninvertibility
Normal equation: θ = (XT X)-1XT y
• What does it mean for XT X to be non-invertible?
• Normally two common causes
– Redundant features (Linearly dependent)
• e.g.
– x1 = size in feet²
– x2 = size in meters²
– x1 = (3.28)² × x2 (since 1 m ≈ 3.28 feet)
Normal Equation and Noninvertibility
Normal equation: θ = (XT X)-1XT y
• What does it mean for XT X to be non-invertible?
• Normally two common causes
– Too many features
• e.g. m ≤ n (fewer training examples than features)
– m = 10 and n = 100
– θ ϵ ℝ100+1
• Trying to fit 101 parameters from 10 training examples
• Sometimes works, but not always a good idea
• Not enough data
• We'll look later at why this may be too little data
• To solve this we
– Delete features
– Use regularization (lets you use lots of features with a small training set)