Cheatsheet Deep Learning

This document summarizes key concepts in deep learning and reinforcement learning, including: 1) neural network architectures, such as convolutional and recurrent neural networks, whose layers are made up of hidden units that apply a linear transformation followed by an activation function; 2) backpropagation, which updates the network's weights using the gradients of the loss between the actual and desired outputs; 3) recurrent neural networks, whose gates control how information persists, as in LSTMs, which add forget gates to avoid the vanishing gradient problem; 4) reinforcement learning and control, including Markov decision processes, value iteration and Q-learning.

CS 229 – Machine Learning https://stanford.edu/~shervine

VIP Cheatsheet: Deep Learning

Afshine Amidi and Shervine Amidi

August 23, 2018

Neural Networks

Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.

Learning rate – The learning rate, often noted η, indicates the pace at which the weights get updated. It can be fixed or adaptively changed. The most popular method at the time of writing is called Adam, which is a method that adapts the learning rate.

Backpropagation – Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using the chain rule and is of the following form:

$$\frac{\partial L(z,y)}{\partial w} = \frac{\partial L(z,y)}{\partial a} \times \frac{\partial a}{\partial z} \times \frac{\partial z}{\partial w}$$

As a result, the weight is updated as follows:

$$w \longleftarrow w - \eta\,\frac{\partial L(z,y)}{\partial w}$$
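As an illustration of the chain rule and the update rule, here is a minimal sketch in pure Python for a single hidden unit with a sigmoid activation and the cross-entropy loss. It assumes a toy setting that is not taken from the cheatsheet: one training example, a prediction written a = g(z) with g the sigmoid (the cheatsheet writes the loss as L(z,y)), and hypothetical helper names (`forward`, `gradients`, `sgd_step`). It is meant to make the three factors of the derivative concrete, not to serve as a reference implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(a, y):
    # L = -[y log(a) + (1 - y) log(1 - a)], with a the predicted probability
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def forward(w, b, x):
    # z = w^T x + b, then a = g(z) with g the sigmoid
    z = sum(wk * xk for wk, xk in zip(w, x)) + b
    return z, sigmoid(z)

def gradients(w, b, x, y):
    _, a = forward(w, b, x)
    dL_da = -y / a + (1.0 - y) / (1.0 - a)  # dL/da
    da_dz = a * (1.0 - a)                    # derivative of the sigmoid
    dL_dz = dL_da * da_dz                    # chain rule; simplifies to a - y
    grad_w = [dL_dz * xk for xk in x]        # dz/dw_k = x_k
    grad_b = dL_dz                           # dz/db = 1
    return grad_w, grad_b

def sgd_step(w, b, x, y, eta=0.1):
    # w <- w - eta * dL/dw (and the same rule for the bias b)
    grad_w, grad_b = gradients(w, b, x, y)
    return [wk - eta * gk for wk, gk in zip(w, grad_w)], b - eta * grad_b

w, b, x, y = [0.5, -0.25], 0.0, [1.0, 2.0], 1.0
print(gradients(w, b, x, y))  # ([-0.5, -1.0], -0.5)
new_w, new_b = sgd_step(w, b, x, y, eta=0.1)
# The loss after one update is lower than before it
print(cross_entropy(forward(w, b, x)[1], y), cross_entropy(forward(new_w, new_b, x)[1], y))
```

Comparing `gradients` against a finite-difference estimate of the loss is a quick way to check that the three chain-rule factors are multiplied correctly.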
Updating weights – In a neural network, weights are updated as follows:

• Step 1: Take a batch of training data.

• Step 2: Perform forward propagation to obtain the corresponding loss.

• Step 3: Backpropagate the loss to get the gradients.

• Step 4: Use the gradients to update the weights of the network.

Architecture – The vocabulary around neural network architectures is described in the figure below:

[Figure: vocabulary of a neural network architecture]

By noting i the i-th layer of the network and j the j-th hidden unit of the layer, we have:

$$z_j^{[i]} = {w_j^{[i]}}^T x + b_j^{[i]}$$

where w, b, z denote the weight, bias and output, respectively.

Activation function – Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:

• Sigmoid: $g(z) = \dfrac{1}{1 + e^{-z}}$

• Tanh: $g(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$

• ReLU: $g(z) = \max(0, z)$

• Leaky ReLU: $g(z) = \max(\epsilon z, z)$ with $\epsilon \ll 1$

Dropout – Dropout is a technique meant to prevent overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1 − p.

Convolutional Neural Networks

Convolutional layer requirement – By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding and S the stride, the number of neurons N that fit in a given volume is such that:

$$N = \frac{W - F + 2P}{S} + 1$$
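The formula can be turned into a small helper that checks the requirement before returning N. This is a sketch with a hypothetical function name; it assumes a single spatial dimension and treats a non-integer N as an invalid configuration, whereas many frameworks instead round the division down.

```python
def conv_output_size(W, F, P, S):
    """Number of neurons N that fit along one dimension of the input volume.

    W: input size, F: filter (neuron) size, P: zero padding, S: stride.
    """
    span = W - F + 2 * P
    if span < 0 or span % S != 0:
        # The filters do not tile the padded input exactly with this stride
        raise ValueError("W - F + 2P must be a non-negative multiple of S")
    return span // S + 1

print(conv_output_size(32, 5, 2, 1))  # 32: padding of 2 preserves the size for F = 5, S = 1
print(conv_output_size(7, 3, 0, 2))   # 3
```

Raising an error instead of rounding makes a mismatched choice of F, P and S visible early, which is one way to read the word "requirement" in the heading.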

Cross-entropy loss – In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:

$$L(z,y) = -\Big[y\log(z) + (1-y)\log(1-z)\Big]$$

Batch normalization – It is a step with learnable parameters γ, β that normalizes the batch $\{x_i\}$. By noting $\mu_B$, $\sigma_B^2$ the mean and variance of the batch that we want to correct, it is done as follows:

$$x_i \longleftarrow \gamma\,\frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$

where ε is a small constant added for numerical stability. It is usually done after a fully connected/convolutional layer and before a non-linearity layer, and aims at allowing higher learning rates and reducing the strong dependence on initialization.
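The normalization step for a single feature over one mini-batch can be sketched as follows. The function name and the default ε = 1e-5 are assumptions for illustration; the sketch covers only the training-time computation on the batch and does not keep running statistics for inference.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    n = len(xs)
    mu = sum(xs) / n                            # batch mean mu_B
    var = sum((x - mu) ** 2 for x in xs) / n    # batch variance sigma_B^2 (divides by n)
    # Normalize, then rescale by gamma and shift by beta
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]

out = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
print(out)  # mean close to beta = 0.5, standard deviation close to gamma = 2.0
```

Because γ and β are applied after normalization, the layer can still represent the identity mapping if training pushes γ toward the batch standard deviation and β toward the batch mean.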

Stanford University 1 Fall 2018


Recurrent Neural Networks

Types of gates – Here are the different types of gates that we encounter in a typical recurrent neural network:

• Input gate: write to cell or not?

• Forget gate: erase a cell or not?

• Output gate: reveal a cell or not?

• Gate: how much writing?

LSTM – A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.

Reinforcement Learning and Control

The goal of reinforcement learning is for an agent to learn how to evolve in an environment.

Markov decision processes – A Markov decision process (MDP) is a 5-tuple $(S, A, \{P_{sa}\}, \gamma, R)$ where:

• S is the set of states

• A is the set of actions

• $\{P_{sa}\}$ are the state transition probabilities for s ∈ S and a ∈ A

• γ ∈ [0, 1) is the discount factor

• R : S × A → ℝ or R : S → ℝ is the reward function that the algorithm wants to maximize

Policy – A policy π is a function π : S → A that maps states to actions.

Remark: we say that we execute a given policy π if, given a state s, we take the action a = π(s).

Value function – For a given policy π and a given state s, we define the value function $V^\pi$ as follows:

$$V^\pi(s) = E\left[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \dots \,\middle|\, s_0 = s, \pi\right]$$

Bellman equation – The optimal Bellman equations characterize the value function $V^{\pi^*}$ of the optimal policy π*:

$$V^{\pi^*}(s) = R(s) + \max_{a \in A} \gamma \sum_{s' \in S} P_{sa}(s')\,V^{\pi^*}(s')$$

Remark: we note that the optimal policy π* for a given state s is such that:

$$\pi^*(s) = \underset{a \in A}{\operatorname{argmax}} \sum_{s' \in S} P_{sa}(s')\,V^{\pi^*}(s')$$

Value iteration algorithm – The value iteration algorithm is in two steps:

• We initialize the value:

$$V_0(s) = 0$$

• We iterate the value based on the values before:

$$V_{i+1}(s) = R(s) + \max_{a \in A}\left[\sum_{s' \in S} \gamma P_{sa}(s')\,V_i(s')\right]$$

Maximum likelihood estimate – The maximum likelihood estimates for the state transition probabilities are as follows:

$$P_{sa}(s') = \frac{\#\text{times took action } a \text{ in state } s \text{ and got to } s'}{\#\text{times took action } a \text{ in state } s}$$

Q-learning – Q-learning is a model-free estimation of Q, which is done as follows, where α is the learning rate:

$$Q(s,a) \longleftarrow Q(s,a) + \alpha\left[R(s,a,s') + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$
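The two steps of value iteration, followed by the argmax that recovers π*, can be sketched on a toy MDP. Everything in the example is hypothetical: two states A and B, two deterministic actions ("stay" and "move"), a reward R(s) of 1 in state B and 0 in state A, and γ = 0.9. The stopping tolerance is also an assumption, since the cheatsheet does not say when to stop iterating.

```python
def value_iteration(states, actions, P, R, gamma, tol=1e-10, max_iter=10_000):
    # P[(s, a)] maps each next state s' to P_sa(s'); R[s] is the reward R(s)
    def expected_next_value(s, a, V):
        return sum(p * V[s2] for s2, p in P[(s, a)].items())

    V = {s: 0.0 for s in states}                  # Step 1: V_0(s) = 0
    for _ in range(max_iter):                     # Step 2: compute V_{i+1} from V_i
        V_new = {s: R[s] + max(gamma * expected_next_value(s, a, V) for a in actions)
                 for s in states}
        delta = max(abs(V_new[s] - V[s]) for s in states)
        V = V_new
        if delta < tol:
            break
    # pi*(s) = argmax_a sum_{s'} P_sa(s') V(s')
    policy = {s: max(actions, key=lambda a: expected_next_value(s, a, V)) for s in states}
    return V, policy

states, actions = ["A", "B"], ["stay", "move"]
P = {
    ("A", "stay"): {"A": 1.0}, ("A", "move"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0}, ("B", "move"): {"A": 1.0},
}
R = {"A": 0.0, "B": 1.0}
V, policy = value_iteration(states, actions, P, R, gamma=0.9)
print(policy)  # {'A': 'move', 'B': 'stay'}
```

In this toy MDP the fixed point can be checked by hand: V(B) = 1 + 0.9 V(B) gives V(B) = 10, and V(A) = 0.9 V(B) = 9.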

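The Q-learning update can likewise be sketched without access to $P_{sa}$: the agent only observes sampled transitions (s, a, s') and the rewards R(s, a, s'). The environment below is a hypothetical deterministic two-state example (reward 1 for arriving in state B), α = 0.5 and γ = 0.9 are arbitrary choices, and, to keep the sketch deterministic, every (s, a) pair is visited in turn instead of following an exploration policy such as ε-greedy.

```python
STATES = ["A", "B"]
ACTIONS = ["stay", "move"]

def step(s, a):
    """Hypothetical deterministic environment: returns (next_state, reward)."""
    s2 = s if a == "stay" else ("B" if s == "A" else "A")
    reward = 1.0 if s2 == "B" else 0.0
    return s2, reward

def q_learning(alpha=0.5, gamma=0.9, sweeps=2000):
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s in STATES:
            for a in ACTIONS:
                s2, r = step(s, a)  # only the sampled transition is used, no model of P_sa
                target = r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(greedy)  # {'A': 'move', 'B': 'stay'}
```

With a fixed α in (0, 1] and deterministic transitions, each update moves Q(s, a) a fraction α of the way toward its target, so the table settles on the fixed point of the update rule.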
