Module 2
Regression
• Univariate regression problem (one output, real value)
• Fully connected network
ML Model
The model is just a mathematical equation; when the inputs are passed through this
equation, it computes the output, and this is termed inference.
The model equation also contains parameters. Different parameter values change the
outcome of the computation; the model equation describes a family of possible relations
between inputs and outputs, and the parameters specify the particular relationship.
When we train or learn a MODEL, we FIND parameters that describe the true relationship
between inputs and outputs.
Learning Algorithm
A learning algorithm takes a training set of input/output pairs and manipulates the parameters until the inputs predict their corresponding outputs as closely as possible.
Structured/Tabular Data
• For simplicity, we assume that both the input x and output y are vectors of a predetermined and fixed size and that the elements of each vector are always ordered in the same way.
Supervised ML Model
When we compute the prediction y from the input x, we call
this inference.
The model is just a mathematical equation with a fixed form. It
represents a family of different relations between the input
and the output. The model also contains parameters 𝜙. The
choice of parameters determines the particular relation
between input and output, so we should really write:
y = f[x, 𝜙]
Learning/training the Model
• When we talk about learning or training a model, we mean that we
attempt to find parameters 𝝓 that make sensible output predictions
from the input.
• We learn these parameters using a training dataset of I pairs of input and output examples {xᵢ, yᵢ}.
• We aim to select parameters that map each training input to its
associated output as closely as possible. We quantify the degree of
mismatch in this mapping with the loss L.
• This is a scalar value that summarizes how poorly the model predicts
the training outputs from their corresponding inputs for parameters
𝝓.
Loss Function
More properly, the loss function also depends on the training data {xᵢ, yᵢ}, so we should write L[{xᵢ, yᵢ}, 𝝓], but this is rather cumbersome
Deploying the Model
• If the loss is small after this minimization, we have found model
parameters that accurately predict the training outputs yᵢ from the training inputs xᵢ.
• After training a model, we must now assess its performance; we run
the model on separate test data to see how well it generalizes to
examples that it didn’t observe during training.
• If the performance is adequate, then we are ready to deploy the
model.
Example: Linear Model
Two parameters between 1D input and 1D output
Linear Model
y = f[x, 𝜙] = ϕ₀ + ϕ₁x
This model has two parameters 𝜙 = [ϕ₀, ϕ₁], where ϕ₀ is the y-intercept of the line and ϕ₁ is the slope.
Different choices for the y-intercept and slope result in different relations between input and output.
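A minimal sketch of this model in Python (NumPy assumed; the parameter names phi0 and phi1 are illustrative):

```python
import numpy as np

def linear_model(x, phi0, phi1):
    """1D linear model: y = phi0 + phi1 * x (phi0 = y-intercept, phi1 = slope)."""
    return phi0 + phi1 * x

# Two different parameter choices give two different lines.
x = np.linspace(0, 1, 5)
print(linear_model(x, phi0=0.5, phi1=1.0))
print(linear_model(x, phi0=-0.2, phi1=2.0))
```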
Data: Input/output pairs (I = 12)
Example: 1D Linear regression loss function
Loss function:
L[𝝓] = Σᵢ (ϕ₀ + ϕ₁xᵢ − yᵢ)²
“Least squares loss function”
[Figure: loss surface L[𝝓] over (ϕ₀, ϕ₁); the three circles mark the parameters of three example lines; the goal is the minimum of the surface]
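A sketch of the least squares loss under the same assumptions, with x and y holding the I training inputs and outputs:

```python
import numpy as np

def least_squares_loss(phi0, phi1, x, y):
    """Sum of squared differences between model predictions and training outputs."""
    predictions = phi0 + phi1 * x
    return np.sum((predictions - y) ** 2)
```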
Training
• The process of finding parameters that minimize the loss is termed
model fitting, training, or learning.
• The basic method is to choose the initial parameters randomly and
then improve them by “walking down” the loss function until we
reach the bottom.
• One way to do this is to measure the gradient of the surface at the
current position and take a step in the direction that is most steeply
downhill. Repeat this process until the gradient is flat and we can
improve no further.
Example: 1D Linear regression training
This technique is known as gradient descent
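A minimal gradient descent sketch for the 1D linear model under the least squares loss; the learning rate and step count are illustrative choices, not values from the slides:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, steps=1000):
    """Fit phi0 (intercept) and phi1 (slope) by walking downhill on the loss surface."""
    phi0, phi1 = np.random.randn(2)            # random initial parameters
    for _ in range(steps):
        residual = (phi0 + phi1 * x) - y       # prediction error for each example
        grad_phi0 = 2.0 * np.sum(residual)     # dL/dphi0
        grad_phi1 = 2.0 * np.sum(residual * x) # dL/dphi1
        phi0 -= alpha * grad_phi0              # step in the steepest downhill direction
        phi1 -= alpha * grad_phi1
    return phi0, phi1
```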
Shallow Neural Networks
Shallow neural networks
• 1D regression model is obviously limited
• Want to be able to describe input/output mappings that are not lines
• Want multiple inputs
• Want multiple outputs
• Shallow neural networks
• Flexible enough to describe arbitrarily complex input/output mappings
• Can have as many inputs as we want
• Can have as many outputs as we want
Shallow neural networks
• Example network, 1 input, 1 output
• Universal approximation theorem
• More than one output
• More than one input
• General case
• Number of regions
• Terminology
• Functions for three different choices of the ten parameters 𝜙. In each case, the input/output relation is
piecewise linear.
• However, the positions of the joints, the slopes of the linear regions between them, and the overall height
vary.
1D Linear Regression
Example shallow network
y = ϕ₀ + ϕ₁ a[θ₁₀ + θ₁₁x] + ϕ₂ a[θ₂₀ + θ₂₁x] + ϕ₃ a[θ₃₀ + θ₃₁x]
Activation function
a[•] is the activation function
Rectified Linear Unit
(particular kind of activation function)
ReLU[z] = max(0, z): zero when the pre-activation is negative, the identity when it is positive
Example shallow network
This model has 10 parameters: 𝝓 = {ϕ₀, ϕ₁, ϕ₂, ϕ₃, θ₁₀, θ₁₁, θ₂₀, θ₂₁, θ₃₀, θ₃₁}
• Represents a family of functions
• Parameters determine particular function
• Given parameters can perform inference (run equation; see the sketch below)
• Given training dataset
• Define loss function (least squares)
• Change parameters to minimize loss function
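A minimal sketch of this example network (three hidden units, ReLU activation) and its least squares loss; the array layout of theta and phi is an assumption for illustration:

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)

def shallow_net_1_3_1(x, theta, phi):
    """y = phi[0] + phi[1]*h1 + phi[2]*h2 + phi[3]*h3, with h_d = ReLU(theta[d,0] + theta[d,1]*x).

    theta: shape (3, 2) -- intercepts and slopes of the three linear functions
    phi:   shape (4,)   -- output bias and the three hidden-unit weights
    """
    h = relu(theta[:, 0:1] + theta[:, 1:2] * x)   # hidden units
    return phi[0] + phi[1:] @ h                   # weighted sum of hidden units

def least_squares_loss(theta, phi, x, y):
    """Sum of squared errors over the training pairs."""
    return np.sum((shallow_net_1_3_1(x, theta, phi) - y) ** 2)
```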
Example shallow network
Piecewise linear functions with three joints
Hidden units
Break down into two parts:
y = ϕ₀ + ϕ₁h₁ + ϕ₂h₂ + ϕ₃h₃
where:
h₁ = a[θ₁₀ + θ₁₁x],   h₂ = a[θ₂₀ + θ₂₁x],   h₃ = a[θ₃₀ + θ₃₁x]
Hidden units
1. Compute three linear functions θ₁₀ + θ₁₁x, θ₂₀ + θ₂₁x, θ₃₀ + θ₃₁x
2. Pass each through a ReLU function to compute the hidden units h₁, h₂, h₃
3. Weight the hidden units and sum them to produce the output y
Example shallow network = piecewise linear functions
1 “joint” per ReLU function
Activation pattern = which hidden units are activated (see the sketch below)
Shaded region:
• Unit 1 active
• Unit 2 inactive
• Unit 3 active
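A small sketch of reading off the activation pattern at a given input, reusing the illustrative theta layout from the sketch above:

```python
import numpy as np

def activation_pattern(x, theta):
    """Return a boolean per hidden unit: True where the ReLU is active (pre-activation > 0)."""
    pre_activations = theta[:, 0] + theta[:, 1] * x   # the three linear functions at input x
    return pre_activations > 0.0

# Example (illustrative parameter values): which units are active at x = 0.4?
theta = np.array([[0.2, 1.0], [0.1, -0.5], [-0.3, 2.0]])
print(activation_pattern(0.4, theta))   # e.g., unit 1 active, unit 2 inactive, unit 3 active
```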
Depicting neural networks
Each parameter multiplies its source and adds to its target
Shallow neural networks
• Example network, 1 input, 1 output
• Universal approximation theorem
• More than one output
• More than one input
• General case
• Number of regions
• Terminology
With 3 hidden units:
y = ϕ₀ + ϕ₁ a[θ₁₀ + θ₁₁x] + ϕ₂ a[θ₂₀ + θ₂₁x] + ϕ₃ a[θ₃₀ + θ₃₁x]
With D hidden units:
y = ϕ₀ + Σ_d ϕ_d a[θ_d0 + θ_d1 x]
With enough hidden units…
… we can describe any 1D function to arbitrary accuracy
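A vectorized sketch of the D-hidden-unit version (1D input and output); adding hidden units adds joints to the piecewise linear function:

```python
import numpy as np

def shallow_net_1_D_1(x, theta, phi):
    """y = phi[0] + sum_d phi[d] * ReLU(theta[d,0] + theta[d,1]*x), for D hidden units.

    theta: shape (D, 2), phi: shape (D + 1,)
    """
    h = np.maximum(0.0, theta[:, [0]] + theta[:, [1]] * x)   # hidden units
    return phi[0] + phi[1:] @ h
```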
Universal approximation theorem
“a formal proof that, with enough hidden units, a shallow neural network can describe any continuous function on a compact subset of ℝ^𝐷𝑖 to arbitrary precision”
Shallow neural networks
• Example network, 1 input, 1 output
• Universal approximation theorem
• More than one output
• More than one input
• General case
• Number of regions
• Terminology
Two outputs
• 1 input, 4 hidden units, 2 outputs
• Both outputs are different weighted sums of the same four hidden units:
h_d = a[θ_d0 + θ_d1 x],   y_1 = ϕ_10 + Σ_d ϕ_1d h_d,   y_2 = ϕ_20 + Σ_d ϕ_2d h_d
Shallow neural networks
• Example network, 1 input, 1 output
• Universal approximation theorem
• More than one output
• More than one input
• General case
• Number of regions
• Terminology
Two inputs
• 2 inputs, 3 hidden units, 1 output
• h_d = a[θ_d0 + θ_d1 x_1 + θ_d2 x_2],   y = ϕ_0 + ϕ_1h_1 + ϕ_2h_2 + ϕ_3h_3
Convex polygons
• Each hidden unit’s “joint” is now a line in the input plane; the activation patterns divide the plane into convex polygons
Question 1:
• For the 2D case, what if there were two outputs?
• If this is one of the outputs, what would the other one look like?
Shallow neural networks
• Example network, 1 input, 1 output
• Universal approximation theorem
• More than one output
• More than one input
• General case
• Number of regions
• Terminology
Arbitrary inputs, hidden units, outputs
• 𝐷𝑜 outputs, D hidden units, and 𝐷𝑖 inputs:
h_d = a[θ_d0 + Σ_i θ_di x_i],   y_j = ϕ_j0 + Σ_d ϕ_jd h_d
• e.g., three inputs, three hidden units, two outputs (see the sketch below)
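A general sketch in matrix form (NumPy assumed); the names Theta and Phi for the weight arrays are illustrative:

```python
import numpy as np

def shallow_network(x, theta0, Theta, phi0, Phi):
    """General shallow network: D_i inputs, D hidden units, D_o outputs.

    x:      (D_i,) input vector          theta0: (D,) hidden biases
    Theta:  (D, D_i) hidden weights      phi0:   (D_o,) output biases
    Phi:    (D_o, D) output weights
    """
    h = np.maximum(0.0, theta0 + Theta @ x)   # hidden units (ReLU of linear functions)
    return phi0 + Phi @ h                     # outputs are weighted sums of hidden units

# Parameter count: (D_i + 1) * D for the hidden layer plus (D + 1) * D_o for the output layer.
# e.g., 3 inputs, 3 hidden units, 2 outputs -> 4*3 + 4*2 = 20 parameters.
```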
Question 2:
• How many parameters does this model have?
Shallow neural networks
• Example network, 1 input, 1 output
• Universal approximation theorem
• More than one output
• More than one input
• General case
• Number of regions
• Terminology
Number of output regions
• In general, each output consists of 𝐷𝑖-dimensional convex polytopes
• With two inputs and three hidden units, we saw there were seven polygons:
Number of output regions
• In general, each output consists of 𝐷𝑖-dimensional convex polytopes
• How many?
[Figure: number of linear regions vs. number of parameters; highlighted point = 500 hidden units, or 51,001 parameters]
Number of regions:
• The number of regions created by D > 𝐷𝑖 hyperplanes in 𝐷𝑖 dimensions was proved by Zaslavsky (1975) to be:
Σ_{j=0}^{𝐷𝑖} (D choose j)
Binomial coefficients!
• How big is this? It’s greater than 2^𝐷𝑖 but less than 2^D.
Proof that it is larger than 2^𝐷𝑖:
• 1D input with 1 hidden unit creates two regions (one joint)
• 2D input with 2 hidden units creates four regions (two lines)
• 3D input with 3 hidden units creates eight regions (three planes)
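A small sketch that evaluates Zaslavsky’s formula; for example, D = 3 hidden units in 𝐷𝑖 = 2 dimensions gives the seven polygons seen earlier:

```python
from math import comb

def max_regions(D, D_i):
    """Maximum number of linear regions created by D hyperplanes in D_i dimensions (Zaslavsky, 1975)."""
    return sum(comb(D, j) for j in range(D_i + 1))

print(max_regions(3, 2))    # 7 -- the two-input, three-hidden-unit example above
print(max_regions(500, 2))  # grows rapidly with more hidden units
```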
Shallow neural networks
• Example network, 1 input, 1 output
• Universal approximation theorem
• More than one output
• More than one input
• General case
• Number of regions
• Terminology
Nomenclature
• Y-offsets = biases
• Slopes = weights
• Everything in one layer connected to everything in the next = Fully Connected Network
• No loops = Feedforward network
• Values after ReLU (activation functions) = activations
• Values before ReLU = pre-activations
• One hidden layer = shallow neural network
• More than one hidden layer = deep neural network
• Number of hidden units ≈ capacity
Other activation functions
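ReLU is only one choice; a sketch of a few other common activation functions (an illustrative selection, not necessarily those shown on the original slide):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes pre-activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes pre-activations into (-1, 1)."""
    return np.tanh(z)

def leaky_relu(z, alpha=0.1):
    """Like ReLU, but with a small slope alpha for negative pre-activations."""
    return np.where(z > 0.0, z, alpha * z)
```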
Regression
We have built a model that can:
• take an arbitrary number of inputs
• produce an arbitrary number of outputs
• model a function of arbitrary complexity between the two
Next time:
• What happens if we feed one neural network into another neural
network?