CS445: Neural Networks and Deep Learning
Lecture 4: Backpropagation and Gradient Descent
Professor Chen - Fall 2024
I. Understanding Backpropagation
Today's lecture focused on the mathematics behind neural network training. The
backpropagation algorithm is fundamental to how neural networks learn from data.
Key Concepts:
1. Forward Propagation
- Input signals flow through the network
- Each neuron computes: output = activation_function(weighted_sum + bias)
- Final layer produces prediction
2. Computing the Loss
- Measures difference between prediction and actual target
- Common loss functions:
● Mean Squared Error (MSE): L = (1/n) Σᵢ (yᵢ - ŷᵢ)²
● Cross-Entropy: L = -Σᵢ yᵢ log(ŷᵢ)
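To make these two steps concrete, here is a minimal NumPy sketch of one forward pass through a single dense layer followed by an MSE loss. The inputs, weights, and sigmoid activation are illustrative choices, not part of any assigned network.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative single layer: 3 inputs -> 2 outputs
x = np.array([0.5, -1.0, 2.0])            # input signals
W = np.array([[0.1, -0.2, 0.3],
              [0.4,  0.0, -0.1]])         # weights (2 x 3)
b = np.array([0.01, -0.02])               # biases

z = W @ x + b                             # weighted sum + bias
y_hat = sigmoid(z)                        # activation -> prediction
y = np.array([1.0, 0.0])                  # actual target

loss = np.mean((y - y_hat) ** 2)          # Mean Squared Error
print(loss)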
II. The Chain Rule in Neural Networks
The chain rule is crucial for computing gradients through multiple layers:
∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w
Where:
- L is the loss
- a is the activation
- z is the weighted sum
- w is the weight
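For a single sigmoid neuron with a squared-error loss on one example, the three factors of the chain rule can be written out directly. The 1/2 in the loss just keeps the derivative clean, and all names are illustrative.

import numpy as np

x, y = 1.5, 1.0                   # input and target
w, b = 0.8, 0.1                   # weight and bias

z = w * x + b                     # weighted sum
a = 1.0 / (1.0 + np.exp(-z))      # sigmoid activation
L = 0.5 * (a - y) ** 2            # squared-error loss

dL_da = a - y                     # ∂L/∂a
da_dz = a * (1 - a)               # ∂a/∂z (sigmoid derivative)
dz_dw = x                         # ∂z/∂w
dL_dw = dL_da * da_dz * dz_dw     # chain rule: ∂L/∂w
print(dL_dw)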
III. Gradient Descent Implementation
def backward_pass(network, loss_grad, learning_rate=0.01):
    # Propagate the gradient of the loss backwards through the layers
    grad = loss_grad
    for layer in reversed(network.layers):
        # compute_gradients is assumed to return this layer's parameter
        # gradients and the gradient to pass back to the previous layer
        layer.gradients, grad = compute_gradients(layer, grad)
        # Update weights and biases (gradient descent step)
        layer.weights -= learning_rate * layer.gradients['weights']
        layer.biases -= learning_rate * layer.gradients['biases']
Types of Gradient Descent:
1. Batch Gradient Descent
- Uses entire dataset for each update
- Very stable but slow
- High memory requirements
2. Stochastic Gradient Descent (SGD)
- Uses single sample for each update
- Faster but noisier
- Lower memory requirements
3. Mini-batch Gradient Descent
- Combines the stability of batch GD with the speed of SGD
- Typically 32-256 samples per batch
- Most commonly used in practice
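A rough sketch of a mini-batch training loop, assuming hypothetical X and y arrays, a network object with a forward method, and the backward_pass function from section III; the batch size and the simple loss gradient are illustrative.

import numpy as np

def train(network, X, y, batch_size=64, epochs=10, lr=0.01):
    n = X.shape[0]
    for epoch in range(epochs):
        # Shuffle each epoch so the mini-batches differ between passes
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            X_batch, y_batch = X[idx], y[idx]
            y_hat = network.forward(X_batch)       # forward propagation
            loss_grad = y_hat - y_batch            # e.g. softmax + cross-entropy gradient
            backward_pass(network, loss_grad, lr)  # backprop + parameter update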
IV. Activation Functions
We covered several activation functions and their derivatives (a short code sketch of all three follows the list):
1. Sigmoid
- σ(x) = 1/(1 + e^(-x))
- Derivative: σ(x)(1 - σ(x))
- Issues with vanishing gradients
2. ReLU
- f(x) = max(0, x)
- Derivative: 1 if x > 0, 0 otherwise
- Most commonly used today
3. Tanh
- tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), range: [-1, 1]
- Derivative: 1 - tanh²(x)
- Zero-centered, so often preferable to sigmoid
- Still has vanishing gradient issues
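A minimal NumPy sketch of the three activations and the derivatives listed above; the helper names are just illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)            # σ(x)(1 - σ(x))

def relu(x):
    return np.maximum(0, x)

def d_relu(x):
    return (x > 0).astype(float)  # 1 if x > 0, 0 otherwise

def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1 - np.tanh(x) ** 2    # 1 - tanh²(x)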
V. Common Challenges and Solutions
1. Vanishing Gradients
Solutions:
- Use ReLU activation
- Implement residual connections
- Proper initialization
2. Exploding Gradients
Solutions:
- Gradient clipping (sketched in code below)
- Batch normalization
- L2 regularization
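One common way to implement gradient clipping is to rescale a gradient whenever its L2 norm exceeds a threshold. A minimal sketch, assuming the gradient is a NumPy array; the max_norm value is an arbitrary illustrative choice.

import numpy as np

def clip_gradient(grad, max_norm=5.0):
    # Rescale the gradient if its L2 norm exceeds max_norm
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad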
VI. Practical Implementation Tips
1. Weight Initialization:
import numpy as np

# He initialization for ReLU networks (weight matrix of shape n_inputs x n_outputs)
weights = np.random.randn(n_inputs, n_outputs) * np.sqrt(2 / n_inputs)

# Xavier initialization for tanh networks
weights = np.random.randn(n_inputs, n_outputs) * np.sqrt(1 / n_inputs)
2. Learning Rate Selection:
- Start with 0.01
- Use learning rate schedules (a simple decay sketch follows this list)
- Consider adaptive methods (Adam, RMSprop)
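As one example of a learning rate schedule, a step decay multiplies the rate by a fixed factor every few epochs. The drop factor and interval below are illustrative, not recommended settings.

def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    # Halve the learning rate every `epochs_per_drop` epochs
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# Starting at 0.01: epoch 5 -> 0.01, epoch 15 -> 0.005, epoch 25 -> 0.0025
lr = step_decay(0.01, epoch=25)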
VII. Today's Lab Exercise
Implement a simple neural network with:
1. One hidden layer (64 units)
2. ReLU activation
3. Softmax output layer
4. Cross-entropy loss
5. Mini-batch gradient descent
Homework Assignment
Due next Tuesday:
1. Implement backpropagation from scratch
2. Train a network on MNIST dataset
3. Experiment with different:
- Learning rates
- Batch sizes
- Network architectures
Important Formulas to Remember
1. Softmax: σ(z)ᵢ = e^zᵢ / Σⱼ e^zⱼ
2. Cross-Entropy Loss: L = -Σ yᵢ log(ŷᵢ)
3. Weight Update Rule: w = w - α∇L
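These formulas map almost directly onto NumPy. The sketch below subtracts the maximum logit before exponentiating, a standard trick that leaves the softmax value unchanged but avoids overflow; variable names are illustrative.

import numpy as np

def softmax(z):
    # σ(z)ᵢ = e^zᵢ / Σⱼ e^zⱼ, with the max subtracted for numerical stability
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

def cross_entropy(y, y_hat, eps=1e-12):
    # L = -Σᵢ yᵢ log(ŷᵢ); eps guards against log(0)
    return -np.sum(y * np.log(y_hat + eps))

logits = np.array([2.0, 1.0, 0.1])
y_true = np.array([1.0, 0.0, 0.0])      # one-hot target
y_hat = softmax(logits)
print(cross_entropy(y_true, y_hat))     # ≈ 0.417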
Additional reading: "Deep Learning" by Goodfellow, Bengio, and Courville - Section 6.5
Next Week's Preview
- Convolutional Neural Networks
- Feature Maps
- Pooling Layers
- CNN Architectures
Recommended Resources
- TensorFlow Documentation
- PyTorch Tutorials
- Stanford CS231n Course Notes
- Andrew Ng's Deep Learning Specialization
Note: Office hours this week are Wednesday 2-4pm and Thursday 3-5pm in Room 405.