SYSC4415
Introduction to Machine Learning
Lecture 11
Prof James Green
jrgreen@sce.Carleton.ca
Systems and Computer Engineering, Carleton University
Learning Objectives for Lecture 11
• Introduce recurrent neural networks (RNN) and Long Short-Term
Memory (LSTM) networks for analyzing sequential data
• Understand how a recurrent neural network (RNN) differs from an MLP
• Understand the function of each gating sub-unit within an LSTM
• Introduce at least one software framework for building, training, and
testing RNN/LSTM
Pre-Lecture Assignment
• Chapter 6.2.2 (pp. 72-75; 4 pages)
• https://www.youtube.com/watch?v=WCUNPb-5EYI
• (RNN and LSTM at a conceptual level; 26min, or 17min @ 1.5 speed)
Key terms
• Recurrent neural networks (RNNs), state, softmax function,
backpropagation through time, long short-term memory (LSTM),
Gated Recurrent Unit (GRU), minimal gated GRU, bi-directional RNNs,
attention, sequence-to-sequence RNN, recursive neural network.
In-Class Activities
• Review key concepts in the chapter through discussion,
PollEverywhere questions
• Tutorial: Review a Jupyter notebook that builds, trains, and tests an LSTM network using Keras (a minimal sketch follows below)
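Below is a minimal sketch (not the tutorial notebook itself), assuming TensorFlow/Keras is installed; the layer size (32 units), sequence length, and randomly generated toy data are illustrative choices only.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

T, d, c = 50, 8, 3                                  # time steps, features per step, # classes
X = np.random.rand(200, T, d)                       # 200 toy input sequences
y = keras.utils.to_categorical(np.random.randint(c, size=200), c)

model = keras.Sequential([
    layers.Input(shape=(T, d)),
    layers.LSTM(32),                                # 32 LSTM units; returns the final hidden state
    layers.Dense(c, activation="softmax"),          # class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)   # train
model.evaluate(X, y)                                # test (here on the same toy data)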
RNN: used to label, classify, or generate sequences
• Different from a feedforward NN (FNN): an RNN contains loops. Each unit u of recurrent layer l has a state h_{l,t}^u, which acts as the memory of the RNN.
• Seq2seq models, specifically the LSTM, combine element-by-element addition with remember/forget gating (gate values between 0 and 1, e.g. 1, 0.5, 0).
• Parameters (w_{l,u}, u_{l,u}, b_{l,u}, arranged in matrices) are found using gradient descent with backpropagation through time.
Motivational Example
• Introduction to RNN:
• https://www.youtube.com/watch?v=LHXXI4-IEns (10 min)
• Note: the vanishing gradient problem causes us to move away from plain RNNs towards the LSTM.
Recurrent Neural Network Hidden Layers
• State update for recurrent layer l at time step t: h_{l,t} = g_1(W_l x_t + U_l h_{l,t-1} + b_l)
• Softmax function: softmax(z)_j = exp(z_j) / Σ_k exp(z_k), converting a score vector into class probabilities
• Set the dimensionality of parameter matrix V_j such that V_j h_{j,t} results in a vector of dimension c (# classes) before applying softmax
• Input sequence: X = [x_1, x_2, ..., x_T], one feature vector per time step
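A minimal NumPy sketch of these formulas for a single recurrent layer; the dimensions (d, units, c), random parameters, and toy input sequence are illustrative assumptions.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                 # subtract max for numerical stability
    return e / e.sum()

d, units, c = 4, 8, 3                       # input dim, recurrent units, # classes
rng = np.random.default_rng(0)
W = rng.normal(size=(units, d))             # input-to-state weights W_l
U = rng.normal(size=(units, units))         # state-to-state (recurrent) weights U_l
b = np.zeros(units)                         # bias b_l
V = rng.normal(size=(c, units))             # output weights: V h_t has dimension c

X = rng.normal(size=(10, d))                # input sequence of 10 time steps
h = np.zeros(units)                         # initial state (the layer's "memory")
for x_t in X:
    h = np.tanh(W @ x_t + U @ h + b)        # state update from current input and previous state
    y_t = softmax(V @ h)                    # class probabilities at time step t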
RNN Unfolding/unrolling
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Left: https://www.youtube.com/watch?v=S0XFd0VMFss (2-6 of 8min)
• Right: https://www.youtube.com/watch?v=_h66BW-xNgk&index=1&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI (~15 min mark)
RNN Unfolding
• [Figure: the same RNN cell, with shared parameters, unrolled/unfolded across time steps; source: colah.github.io]
LSTM and GRU
• Watch: https://www.youtube.com/watch?v=8HyCNIVRbSU (11min)
• [Figures: LSTM cell with inputs c_{t-1}, h_{t-1}, x_t and outputs c_t, h_t; GRU cell with inputs h_{t-1}, x_t and output h_t]
Minimal Gated Recurrent Unit
Each “cell” is made up of multiple units (the size of the layer). One unit shown here:
“Forget” or “Update” gate
Textbook: Minimal Gated GRU (gated recurrent unit)
1) New potential memory cell value h~_t, a function of the inputs and h_{t-1}; g_1 = tanh
2) Memory forget gate, a function of the inputs and h_{t-1}; 1 = forget = take the new value, 0 = keep = ignore the new value
   - g_2, the “gate function”, uses a sigmoid to produce the gate value
3) New memory cell value: either take the new h~_t value or keep h_{t-1}; a function of the gate value, h~_t, and h_{t-1}
4) Vector of new memory cell values, one per unit in this layer
5) Output vector; g_3 = softmax
Discussion of dimensions of signals in an LSTM: https://mmuratarat.github.io//2019-01-19/dimensions-of-lstm
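A minimal NumPy sketch of one time step of this minimal gated unit, assuming a layer of `units` cells; the parameter shapes and random values are illustrative, and the blend in step 3 follows the gate convention above (near 1 = take the new value, near 0 = keep the old one).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, units, c = 4, 8, 3                                  # input dim, units in this layer, # classes
rng = np.random.default_rng(1)
W, U, b = rng.normal(size=(units, d)), rng.normal(size=(units, units)), np.zeros(units)      # candidate-memory parameters
Wg, Ug, bg = rng.normal(size=(units, d)), rng.normal(size=(units, units)), np.zeros(units)   # gate parameters
V = rng.normal(size=(c, units))                        # output parameters

x_t, h_prev = rng.normal(size=d), np.zeros(units)

h_new = np.tanh(W @ x_t + U @ h_prev + b)              # 1) new potential memory value (g1 = tanh)
gate = sigmoid(Wg @ x_t + Ug @ h_prev + bg)            # 2) forget gate in (0, 1): near 1 = take new, near 0 = keep old
h_t = gate * h_new + (1.0 - gate) * h_prev             # 3)-4) element-wise blend: one new memory value per unit
y_t = np.exp(V @ h_t) / np.exp(V @ h_t).sum()          # 5) output vector (g3 = softmax)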
Advanced RNN Architectures
• Other important extensions to RNNs include:
• A generalization of an RNN is a recursive neural network
• bi-directional RNNs
• RNNs with attention (see extended Ch6 material on course wiki)
• Attention-only networks = transformers…
• sequence-to-sequence RNN models.
• Frequently used to build neural machine translation models and other models for text to
text transformations.
• Will see this later in the textbook (section 7.7)…
• Combinations of CNN+LSTM
• Image Captioning
• Video:
• CNN on individual frames to extract feature vectors, LSTM for the time series (see the sketch below)
• Or look at 3D convolution with a fixed time window
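A hedged Keras sketch of the CNN+LSTM idea for video: a small CNN is applied to every frame via TimeDistributed to extract a per-frame feature vector, and an LSTM models the resulting sequence; the frame size, filter count, and number of classes are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

frames, height, width, channels, c = 16, 64, 64, 3, 5  # clip length, frame size, # classes

model = keras.Sequential([
    layers.Input(shape=(frames, height, width, channels)),
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),  # same small CNN applied to every frame
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),                         # one feature vector per frame
    layers.LSTM(64),                                                  # temporal model over the frame features
    layers.Dense(c, activation="softmax"),
])
model.summary()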
Textbook Recommended Readings for RNN:
• An extended version of Chapter 6 with RNN unfolding, bidirectional RNN, and attention
• The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy (2015)
• Recurrent Neural Networks and LSTM by Niklas Donges (2018)
• Understanding LSTM Networks by Christopher Olah (2015)
• Introduction to RNNs by Denny Britz (2015)
• Implementing a RNN with Python, Numpy and Theano by Denny Britz (2015)
• Backpropagation Through Time and Vanishing Gradients by Denny Britz (2015)
• Implementing a GRU/LSTM RNN with Python and Theano by Denny Britz (2015)
• Simplified Minimal Gated Unit Variations for Recurrent Neural Networks by Joel Heck and
Fathi Salem (2017)
Transformers: “Attention is all you need”
Great 1-hour Transformers Tutorial
• Transformers (“Attention is all you need” 2017)
• Attention: watch ~11 mins from 10min mark: Transformers with Lucas Beyer
• “LSTM is Dead. Long Live Transformers” (2019)
• https://www.youtube.com/watch?v=S27pHKBEp30 (~45min)
• Warning: we won’t cover NLP for a few weeks…
• Code and pre-trained models: github.com/huggingface/transformers
• “The Illustrated Transformer”: http://jalammar.github.io/illustrated-transformer/
• Transformers replacing CNN for image analysis…
• “An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale”
https://arxiv.org/pdf/2010.11929.pdf (2021)
• ViT = Vision Transformer
• Break image into patches; flatten each patch into a vector; add positional information (where
did patch come from within image?); get ‘sequence’ of encoded patches; compute key, value,
query using linear layer; compute attention; MLP/FFNN; …
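A minimal NumPy sketch of the ViT front end described above: break the image into 16x16 patches, flatten each patch, project it linearly, and add positional information; the projection matrix, positional embeddings, and embedding dimension are random stand-ins for what a real ViT learns.

import numpy as np

H, W, C, P, D = 224, 224, 3, 16, 64           # image size, patch size, embedding dimension
rng = np.random.default_rng(2)
img = rng.random((H, W, C))                   # toy input image

# Break the image into non-overlapping PxP patches and flatten each into a vector
patches = img.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4).reshape(-1, P * P * C)
# patches.shape == (196, 768): a 'sequence' of 196 patch vectors

E = rng.normal(size=(P * P * C, D))           # linear projection (learned in a real ViT)
pos = rng.normal(size=(patches.shape[0], D))  # positional embeddings (learned in a real ViT)
tokens = patches @ E + pos                    # encoded patch sequence passed to the transformer encoder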