GD: the batch size is a parameter to choose
Normalizing inputs: faster convergence => all features move at a similar scale
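A minimal sketch of input normalization, assuming numpy and an X of shape (n_features, m):

    import numpy as np

    def normalize_inputs(X):
        # Per-feature zero mean and unit variance, so all features
        # move at a similar scale during gradient descent.
        mu = X.mean(axis=1, keepdims=True)
        sigma = X.std(axis=1, keepdims=True)
        return (X - mu) / (sigma + 1e-8)  # epsilon guards against constant features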
Mini-batch GD:
- Batch size > 1 (and < m)
Stochastic GD:
- Batch size = 1, one training example at a time
- Extremely noisy
- No convergence: it keeps oscillating around the minimum
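A sketch of the mini-batch training loop; compute_grads and the params dict are hypothetical stand-ins for your model. batch_size = 1 gives stochastic GD, batch_size = m gives batch GD:

    import numpy as np

    def train(X, Y, params, compute_grads, alpha=0.01, batch_size=64, epochs=10):
        m = X.shape[1]
        for _ in range(epochs):
            perm = np.random.permutation(m)      # reshuffle every epoch
            for t in range(0, m, batch_size):
                idx = perm[t:t + batch_size]     # one mini-batch
                grads = compute_grads(X[:, idx], Y[:, idx], params)
                for k in params:                 # plain GD step on this mini-batch
                    params[k] -= alpha * grads[k]
        return params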
Exponentially weighted moving averages
Moving average:
V_t = beta*V_{t-1} + (1-beta)*theta_t
This averages over roughly the last 1/(1-beta) data points.
Time-series example: to average quarterly data, plot (1/4)*(sum of the first four points), then the average of the next window, and so on.
Each V_t is about the average of the last 1/(1-beta) days plus some weight on the current day => the number of days averaged grows with beta.
Unrolling the recursion: V_t = sum over n of (1-beta)*beta^n*theta_{t-n}, so the weights decay exponentially with n.
What happens if we raise beta? The curve gets smoother and lags the trend; lowering beta keeps it closer to the individual points.
Bias correction:
V_t starts off really low => divide by (1-beta^t) to correct the early estimates.
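The moving average plus bias correction as a short sketch over a plain list of values:

    def ewma(values, beta=0.9):
        v, out = 0.0, []
        for t, theta in enumerate(values, start=1):
            v = beta * v + (1 - beta) * theta
            out.append(v / (1 - beta ** t))  # bias correction fixes the low start
        return out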
Gradient descent with momentum:
Like a moving average over the derivatives, instead of over a time series:
V_dw = beta*V_dw + (1-beta)*dw
V_db = beta*V_db + (1-beta)*db
w = w - alpha*V_dw
b = b - alpha*V_db
What happens if we raise beta? It acts a bit like a learning rate:
High beta => more horizontal movement: smoother along the short axis, depends more on the general trend, less sensitive to noise.
Low beta => more vertical movement: noisier descent, depends more on the current gradient.
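One momentum step as code, directly from the update rules above (dw, db are the current gradients):

    def momentum_step(w, b, dw, db, v_dw, v_db, alpha=0.01, beta=0.9):
        v_dw = beta * v_dw + (1 - beta) * dw   # moving average of the gradients
        v_db = beta * v_db + (1 - beta) * db
        w = w - alpha * v_dw                   # step along the smoothed direction
        b = b - alpha * v_db
        return w, b, v_dw, v_db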
RMSProp:
S_dw = beta*S_dw + (1-beta)*dw^2
S_db = beta*S_db + (1-beta)*db^2
w = w - alpha*(dw / (sqrt(S_dw) + epsilon))
b = b - alpha*(db / (sqrt(S_db) + epsilon))
What happens if we raise beta? Same intuition as with momentum: smoother updates, less sensitive to the current gradient.
Adam:
Combines RMSProp and gradient descent with momentum:
V_dw = beta_1*V_dw + (1-beta_1)*dw
S_dw = beta_2*S_dw + (1-beta_2)*dw^2
V_dw^corrected = V_dw / (1-beta_1^t)
S_dw^corrected = S_dw / (1-beta_2^t)
w = w - alpha*V_dw^corrected / (sqrt(S_dw^corrected) + epsilon)
Hyperparameter choice: use the default values for beta_1, beta_2, and epsilon, but alpha, the learning rate, needs to be tuned.
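One Adam step as a sketch, using the usual defaults (beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-8); t is the 1-based step count:

    import numpy as np

    def adam_step(w, dw, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        v = beta1 * v + (1 - beta1) * dw          # momentum term
        s = beta2 * s + (1 - beta2) * dw ** 2     # RMSProp term
        v_corr = v / (1 - beta1 ** t)             # bias correction
        s_corr = s / (1 - beta2 ** t)
        w = w - alpha * v_corr / (np.sqrt(s_corr) + eps)
        return w, v, s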
Learning rate decay:
Reduce the learning rate over time so the updates oscillate over a tighter region near the minimum:
alpha = (1 / (1 + decayRate*epoch)) * alpha_0
Other schedules: exponential decay, step decay.
Local optima: in high-dimensional spaces most zero-gradient points are saddle points rather than local optima; plateaus are what really slow learning down.
Computational resources:
- tune: learning rate, mini-batch size
- whether to try Panda (babysit one model) or Caviar (train many models in parallel)
"Trying new hyperparameter values should only be done if new hardware or computational power is acquired" => false
Batch normalization:
Are beta and gamma learned? Yes, both are trainable parameters.
Deep learning programming frameworks don't require cloud-based machines to run.
A framework lets you implement models in fewer lines of code.
"If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance" => false; use random search.
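A sketch of one random draw for the search, sampling the learning rate on a log scale; the ranges here are illustrative, not prescribed:

    import numpy as np

    def sample_hyperparams():
        r = np.random.uniform(-4, -1)    # sample the exponent, not alpha itself
        alpha = 10 ** r                  # alpha in [1e-4, 1e-1], log-uniform
        batch_size = int(np.random.choice([32, 64, 128, 256]))
        return alpha, batch_size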
After training a neural network with batch norm, at test time, to evaluate it on a new example you still perform the same normalizations, but don't use the most recent mini-batch's mean and sigma; use a mean and sigma estimated with an exponentially weighted average across the mini-batches seen during training.
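A sketch of that idea, with hypothetical running statistics kept alongside training:

    def update_running_stats(batch_mean, batch_var, run_mean, run_var, beta=0.9):
        # Exponentially weighted average over the mini-batches seen so far.
        run_mean = beta * run_mean + (1 - beta) * batch_mean
        run_var = beta * run_var + (1 - beta) * batch_var
        return run_mean, run_var

    def batchnorm_test(x, run_mean, run_var, gamma, beta_shift, eps=1e-8):
        # At test time, normalize with the running statistics, not the current batch.
        x_norm = (x - run_mean) / (run_var + eps) ** 0.5
        return gamma * x_norm + beta_shift   # beta_shift is batch norm's learned beta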
SEQUENCE MODELS
Many-to-one RNN model architecture: reads in a whole sequence and outputs a single result; tasks it can address include, e.g., sentiment classification of a sentence.
If you are training an RNN and find that your LSTM weights and activations are all taking on the value NaN ("Not a Number") => exploding gradients.
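The usual remedy is gradient clipping; a minimal sketch, assuming a dict of numpy gradient arrays and an illustrative max_norm:

    import numpy as np

    def clip_gradients(grads, max_norm=5.0):
        # Rescale all gradients when their global norm exceeds max_norm,
        # preventing the exploding updates that produce NaNs.
        norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
        if norm > max_norm:
            for k in grads:
                grads[k] = grads[k] * (max_norm / norm)
        return grads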
GRU:
Gu (update gate) has the same dimension as the number of hidden units.
When sampling, choose the r-th training sample first, then the s-th word within it.
If we want c<t> to be highly dependent on c<t-1>, we want Gu to be very low.
Gr => relevance gate: how much the previous state matters for computing the candidate.
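A sketch of the GRU memory update, which shows why a low Gu keeps c<t> close to c<t-1> (weight shapes follow the course's notation):

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def gru_step(c_prev, x, Wu, Wr, Wc, bu, br, bc):
        concat = np.concatenate([c_prev, x])
        gu = sigmoid(Wu @ concat + bu)            # update gate
        gr = sigmoid(Wr @ concat + br)            # relevance gate
        c_tilde = np.tanh(Wc @ np.concatenate([gr * c_prev, x]) + bc)
        return gu * c_tilde + (1 - gu) * c_prev   # low gu => c stays near c_prev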
Dimensionality in word embedding:
"The sparsity of connections and weight sharing are mechanisms that allow us to use fewer parameters in a convolutional layer, making it possible to train a network with smaller training sets." => true
Number of weights per filter: f * f * n_c_prev (filter height x width x input channels)
Total number of weights for all filters: f * f * n_c_prev * n_filters
Bias parameters: one per filter => n_filters
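A worked count for an illustrative layer with ten 3x3 filters over a 3-channel input:

    def conv_param_count(f, n_c_prev, n_filters):
        weights_per_filter = f * f * n_c_prev           # 3*3*3 = 27
        total_weights = weights_per_filter * n_filters  # 27*10 = 270
        biases = n_filters                              # one bias per filter
        return total_weights + biases                   # 270 + 10 = 280

    print(conv_param_count(3, 3, 10))  # 280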
LSTM
Gu => update gate
Gf => forget gate
Go => output gate
Each gate (Gu, Gf, Go) has dimension = # hidden units in the LSTM.
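The full LSTM step as a sketch in the course's notation; the W and b dicts holding per-gate parameters are an assumed layout:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def lstm_step(a_prev, c_prev, x, W, b):
        concat = np.concatenate([a_prev, x])
        gu = sigmoid(W['u'] @ concat + b['u'])   # update gate
        gf = sigmoid(W['f'] @ concat + b['f'])   # forget gate
        go = sigmoid(W['o'] @ concat + b['o'])   # output gate
        c_tilde = np.tanh(W['c'] @ concat + b['c'])
        c = gu * c_tilde + gf * c_prev           # mix new candidate and old memory
        a = go * np.tanh(c)                      # every gate has the hidden-state dim
        return a, c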