Lec 3-5 (Function Approximation)
CS F425
Dr. Bharat Richhariya
Department of CSIS, BITS Pilani, Pilani Campus
Lecture 3: Approximating a function (Regression)
Linear Regression
• Regression refers to a set of methods for modeling the relationship
between one or more independent variables and a dependent variable.
o The purpose of regression is most often to characterize the relationship
between the inputs and outputs.
o Machine learning, on the other hand, is most often concerned with prediction.
Outline
• Basic Elements of Linear Regression
• Gradient Descent
• Making Predictions with the Learned Model
Linear regression: linear model
• The linearity assumption says that the target (price) can be expressed as a
weighted sum of the features (area and age):
$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$
• $w_{\mathrm{area}}$ and $w_{\mathrm{age}}$ are called weights, and $b$ is called the bias (also known as the offset or intercept).
• The weights determine the influence of each feature on our prediction.
• The bias just says what value the predicted price should take when all of the
features take value 0.
• The equation above is an affine transformation of input features, which is
characterized by a linear transformation of features via weighted sum, combined
with a translation via the added bias.
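To make the affine model concrete, here is a minimal Python sketch; the weight and bias values are illustrative placeholders assumed for this example, not parameters learned from data.

```python
# Hypothetical parameter values, chosen only for illustration
w_area, w_age = 2.0, -0.5
b = 100.0

def predict_price(area, age):
    """Affine transformation: weighted sum of the features plus the bias."""
    return w_area * area + w_age * age + b

print(predict_price(area=120.0, age=10.0))  # 2*120 - 0.5*10 + 100 = 335.0
```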
Linear regression: linear model
• Given a dataset, our goal is to choose the weights 𝒘 and the bias 𝑏 such that, on
average, the predictions made by our model best fit the true observations.
Linear regression: Loss function
• To think about how to fit data with our model, we need to determine a measure
of fitness.
• The loss function quantifies the distance between the real and predicted value of
the target.
o The loss will be a non-negative number where smaller values are better.
o Perfect predictions incur a loss of 0.
• The most popular loss function in regression problems is the squared error:
$$l^{(i)}(\boldsymbol{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$
• The constant 1/2 makes no difference but will prove to be convenient, cancelling out when we take the derivative of the loss.
Linear regression: Loss function
• To measure model quality on the entire dataset of $n$ examples, we average the losses over the training set; this empirical error is a function of the model parameters only.
$$L(\boldsymbol{w}, b) = \frac{1}{n}\sum_{i=1}^{n} l^{(i)}(\boldsymbol{w}, b) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left(\boldsymbol{w}^\top \boldsymbol{x}^{(i)} + b - y^{(i)}\right)^2$$
• When training the model, we seek the parameters that minimize the total loss:
$$\boldsymbol{w}^{*}, b^{*} = \operatorname*{argmin}_{\boldsymbol{w}, b} L(\boldsymbol{w}, b)$$
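A minimal NumPy sketch of the empirical loss above, assuming the features are stored row-wise in an array X and the targets in a vector y (both hypothetical names):

```python
import numpy as np

def empirical_loss(w, b, X, y):
    """Average squared-error loss L(w, b) over the n training examples."""
    y_hat = X @ w + b                       # predictions w^T x^(i) + b
    return 0.5 * np.mean((y_hat - y) ** 2)  # mean of (1/2) * (y_hat - y)^2

# Tiny made-up example: two houses described by (area, age)
X = np.array([[120.0, 10.0],
              [80.0, 30.0]])
y = np.array([335.0, 250.0])
print(empirical_loss(np.array([2.0, -0.5]), 100.0, X, y))  # 6.25
```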
Linear regression: analytic solution
• Linear regression can be solved analytically by applying a simple formula:
• Subsume the bias 𝑏 into the parameter 𝒘 by appending a column to the design matrix
(𝑿) consisting of all ones.
• Then our prediction problem is to minimize $\lVert \boldsymbol{y} - \boldsymbol{X}\boldsymbol{w} \rVert^2$.
• There is just one critical point on the loss surface, and it corresponds to the minimum of the loss over the entire domain.
• Taking the derivative of the loss with respect to 𝒘 and setting it equal to zero yields the
analytic solution:
$$\boldsymbol{w}^{*} = \left(\boldsymbol{X}^\top \boldsymbol{X}\right)^{-1} \boldsymbol{X}^\top \boldsymbol{y}$$
• The requirement of an analytic solution is so restrictive that it would exclude all exciting
aspects of deep learning.
• Simple problems like linear regression may admit analytic solutions, but we should not get used to such good fortune!
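A sketch of the closed-form fit, assuming X and y are NumPy arrays as in the earlier example. It uses a least-squares solver instead of inverting X^T X explicitly, which yields the same solution but is numerically more stable:

```python
import numpy as np

def fit_closed_form(X, y):
    """Minimize ||y - Xw||^2 with the bias absorbed into w."""
    ones = np.ones((X.shape[0], 1))
    X_aug = np.hstack([X, ones])                       # append a column of ones for the bias
    # Same solution as w* = (X^T X)^{-1} X^T y, solved more stably
    w_star, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return w_star[:-1], w_star[-1]                     # (weights, bias)

w_hat, b_hat = fit_closed_form(X, y)  # X, y from the earlier loss sketch
```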
Linear regression: gradient descent
• In cases where we cannot solve the models analytically, we can still train models
effectively in practice.
• The key technique for optimizing nearly any deep learning model is called
gradient descent.
• Gradient descent iteratively reduces the error by updating the parameters in the
direction that incrementally lowers the loss function.
[Figure: gradient descent stepping downhill along the loss curve from a starting point]
• With a randomly sampled minibatch $\mathcal{B}$ and learning rate $\eta$, each step updates the parameters as:
$$\boldsymbol{w} \leftarrow \boldsymbol{w} - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{\boldsymbol{w}}\, l^{(i)}(\boldsymbol{w}, b) = \boldsymbol{w} - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \boldsymbol{x}^{(i)} \left(\boldsymbol{w}^\top \boldsymbol{x}^{(i)} + b - y^{(i)}\right)$$
$$b \leftarrow b - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{b}\, l^{(i)}(\boldsymbol{w}, b) = b - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \left(\boldsymbol{w}^\top \boldsymbol{x}^{(i)} + b - y^{(i)}\right)$$
• The values of the batch size and learning rate are manually pre-specified and not typically learned through model training.
o Such parameters, which are tunable but not updated in the training loop, are called hyperparameters; a minimal training-loop sketch follows below.
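The update rules above translate almost line for line into code. This sketch assumes the whole dataset fits in memory as NumPy arrays; the learning rate, batch size, and number of epochs are hand-picked hyperparameter choices, exactly the kind of values that are pre-specified rather than learned.

```python
import numpy as np

def train_minibatch_sgd(X, y, lr=0.03, batch_size=10, num_epochs=5, seed=0):
    """Minibatch SGD for linear regression (lr, batch_size, num_epochs are hyperparameters)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(num_epochs):
        idx = rng.permutation(n)                         # shuffle examples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            err = X[batch] @ w + b - y[batch]            # w^T x^(i) + b - y^(i)
            w -= (lr / len(batch)) * (X[batch].T @ err)  # update step for w
            b -= (lr / len(batch)) * err.sum()           # update step for b
    return w, b
```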
Linear regression: gradient descent
• Linear regression happens to be a learning problem where there is only one minimum
over the entire domain.
• For more complicated models, like deep networks, the loss surfaces contain many
minima.
• Deep learning practitioners seldom struggle to find parameters that minimize the loss on
training sets.
• The more formidable task is to find parameters that achieve low loss on data that we have not seen before, a challenge called generalization.
Making Predictions with the Learned Model
• Given the learned linear regression model $\hat{\boldsymbol{w}}^\top \boldsymbol{x} + \hat{b}$, we can estimate the price of a new house given its area $x_1$ and age $x_2$.
o Estimating targets given features is commonly called prediction or
inference.
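Prediction is then a single affine evaluation; the names w_hat and b_hat below refer to the hypothetical parameters returned by the training sketches above.

```python
import numpy as np

def predict(w_hat, b_hat, x_new):
    """Estimate the target for a new example, e.g. x_new = [area, age]."""
    return float(np.dot(w_hat, x_new) + b_hat)

# Example: estimated price of a house with area 100 and age 5
# predict(w_hat, b_hat, np.array([100.0, 5.0]))
```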
Logistic Regression
Regression vs. Classification
• Regression estimates a continuous value
• Classification predicts a discrete category
Logistic regression
• This is just like linear regression, except that the value $y$ we want to predict takes on only a small number of discrete values.
• For now, we will focus on the binary classification problem in which y
can take on only two values: 0 and 1.
• For instance, if we are trying to build a spam classifier for email, then $x^{(i)}$ may be some features of a piece of email, and $y$ may be 1 if it is a piece of spam mail, and 0 otherwise.
Logistic regression
• We could approach the classification problem ignoring the fact that y
is discrete-valued, and use our old linear regression algorithm to try
to predict y given x.
• However, this method performs very poorly.
• Intuitively, it also doesn't make sense to consider $\hat{y}$ values larger than 1 or smaller than 0 when we know that $y \in \{0, 1\}$.
Logistic regression
• Let's change the form of our hypothesis for the prediction $\hat{y}$.
• We will choose
$$\hat{y} = \sigma(\boldsymbol{\theta}^\top \boldsymbol{x})$$
• where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid (logistic) function, which maps any real value into the interval (0, 1).
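A minimal sketch of this hypothesis, assuming the bias is folded into theta via a constant feature in x; the 0.5 threshold used to turn the predicted value into a 0/1 label is a common convention, not part of the model itself.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """Hypothesis: y_hat = sigma(theta^T x)."""
    return sigmoid(np.dot(theta, x))

def predict_label(theta, x, threshold=0.5):
    """Classify as 1 (e.g. spam) when y_hat exceeds the threshold."""
    return int(predict_proba(theta, x) >= threshold)
```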