Recurrent Neurons and Layers
1. The simplest RNN has just one neuron that:
● Receives inputs at each time step t
● Receives its own previous output from time step t-1
● At the first time step, there is no previous output, so it is taken to be 0
2. When expanded to a full RNN layer:
○ Every neuron receives both the input vector x(t)
○ Every neuron receives the output vector from the previous time step, ŷ(t-1)
○ The inputs and outputs become vectors instead of scalars
3. Each recurrent neuron has two weight sets:
● wx: weights for current input x(t)
● wŷ: weights for previous outputs ŷ(t-1)
● For a full layer, these become matrices Wx and Wŷ
4. The output calculation for a single instance is:
ŷ(t) = ϕ(Wx⊺x(t) + Wŷ⊺ŷ(t-1) + b)
5. For a mini-batch, the output calculation becomes:
Ŷ(t) = ϕ(X(t)Wx + Ŷ(t-1)Wŷ + b) = ϕ([X(t) Ŷ(t-1)]W + b), with W = [Wx; Wŷ] (Wx and Wŷ concatenated vertically)
Where:
● Ŷ(t) is an m × n_neurons matrix containing the layer's outputs at time step t for all m instances
● X(t) is an m × n_inputs matrix containing the inputs for all instances
● Wx is an n_inputs × n_neurons matrix holding the weights for the current inputs
● Wŷ is an n_neurons × n_neurons matrix holding the weights for the previous outputs
● b is the bias vector of size n_neurons
6. Key characteristics:
● The output Ŷ(t) depends on both the current input X(t) and the previous output Ŷ(t-1) (a NumPy sketch of this computation follows this list)
● This creates a chain of dependencies going back to the first time step
● At t=0, previous outputs are initialized to zeros
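A minimal NumPy sketch of the mini-batch computation Ŷ(t) = ϕ([X(t) Ŷ(t-1)]W + b) at t = 0; the sizes (m = 4 instances, 3 inputs, 2 neurons) are illustrative, not from the text:
import numpy as np

m, n_inputs, n_neurons = 4, 3, 2
X_t = np.random.rand(m, n_inputs)          # inputs at time step t
Y_prev = np.zeros((m, n_neurons))          # previous outputs, zeros at t = 0
Wx = np.random.rand(n_inputs, n_neurons)   # weights for the current inputs
Wy = np.random.rand(n_neurons, n_neurons)  # weights for the previous outputs
b = np.zeros(n_neurons)                    # bias vector

W = np.vstack([Wx, Wy])                    # W = [Wx; Wy], vertical concatenation
Y_t = np.tanh(np.hstack([X_t, Y_prev]) @ W + b)
# Same result as the unconcatenated form:
# Y_t = np.tanh(X_t @ Wx + Y_prev @ Wy + b)
print(Y_t.shape)                           # (4, 2), i.e. m × n_neurons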
Memory Cells
1. Memory in RNNs:
● A recurrent neuron's output at time t depends on all previous inputs
● This creates a form of memory in the network
● Any part of a neural network that maintains state across time steps
is called a memory cell
2. Basic Memory Cells:
○ A single recurrent neuron is a basic memory cell
○ A layer of recurrent neurons is also a basic memory cell
○ These basic cells can typically learn patterns about 10 steps long
○ The pattern length capability varies depending on the task
3. More Complex Cells:
● Later chapters cover more sophisticated cell types
● These can learn patterns roughly 10 times longer
● Pattern length still varies based on the task
4. Cell State Characteristics:
● Cell state at time t is denoted as h(t) (h stands for "hidden")
● State is a function of:
○ Current inputs x(t)
○ Previous state h(t-1)
● Written as: h(t) = f(x(t), h(t-1))
5. Cell Output:
● Output at time t is denoted as ŷ(t)
● Output is a function of:
○ Previous state
○ Current inputs
● In basic cells, the output equals the state (see the sketch after this list)
● In more complex cells, the output may differ from the state
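A minimal sketch (using Keras's SimpleRNNCell; not code from the text) showing that a basic cell's output at each step is the same tensor as its new state:
import tensorflow as tf

cell = tf.keras.layers.SimpleRNNCell(units=4)
x_t = tf.random.normal([1, 3])      # current inputs x(t): 1 instance, 3 features
h_prev = [tf.zeros([1, 4])]         # previous state h(t-1), zeros at the first step
y_t, h_t = cell(x_t, h_prev)        # h(t) = f(x(t), h(t-1)); the cell also returns ŷ(t)
print(bool(tf.reduce_all(y_t == h_t[0])))  # True: for a basic cell, output == state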
Input and Output Sequences
1. Sequence-to-Sequence (top-left):
● Takes a sequence and outputs a sequence
● Example: Power consumption forecasting where you input N days of data and
output predictions shifted by one day
● Best for tasks where input and output are naturally sequential and aligned
2. Sequence-to-Vector (top-right):
● Takes a sequence but only uses final output
● Example: Sentiment analysis of movie reviews, where words are the input
sequence and the output is a single sentiment score
● Good for classification/scoring of sequential data (see the return_sequences sketch after this list)
3. Vector-to-Sequence (bottom-left):
● Takes a single vector repeatedly as input and produces a sequence
● Example: Image captioning, where a CNN-processed image is input and
the output is a sequence of words describing it
● Useful when generating sequential content from a fixed input
4. Encoder-Decoder (bottom-right):
● Combines sequence-to-vector (encoder) with vector-to-sequence
(decoder)
● Example: Language translation, where input sentence is encoded to a
vector, then decoded to target language
● Better than direct sequence-to-sequence for translation because it can
consider entire input context before generating output
● More complex implementation than the diagram suggests (covered in
Chapter 16)
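A short sketch (not from the text) of how these modes map onto Keras: return_sequences=True yields an output at every time step (sequence-to-sequence), while the default keeps only the last output (sequence-to-vector):
import tensorflow as tf

seq_in = tf.random.normal([2, 7, 1])   # batch of 2 sequences, 7 time steps, 1 feature

seq2seq = tf.keras.layers.SimpleRNN(4, return_sequences=True)
print(seq2seq(seq_in).shape)           # (2, 7, 4): one output vector per time step

seq2vec = tf.keras.layers.SimpleRNN(4)
print(seq2vec(seq_in).shape)           # (2, 4): only the final time step's output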
Training RNNs
1. Basic Concept:
● BPTT involves unrolling the RNN through time
● Uses regular backpropagation principles on the unrolled network
● Consists of forward pass followed by backward pass
2. Forward Pass:
● Network processes the input sequence from start to finish
● Represented by dashed arrows in Figure 15-5
● Generates predictions Ŷ(0) through Ŷ(T) for each timestep
3. Loss Function:
● Evaluates output sequence against target sequence
● Format: ℒ(Y(0), Y(1), ..., Y(T); Ŷ(0), Ŷ(1), ..., Ŷ(T))
● Can selectively ignore certain outputs depending on the task
● Example: Sequence-to-vector RNNs only use the final output
4. Backward Pass:
● Gradients flow backward through the unrolled network
● Only flows through outputs used in loss calculation
● In the example, only flows through Ŷ(2), Ŷ(3), and Ŷ(4)
5. Parameter Updates:
● The same parameters (W and b) are used at every time step
● During the backward pass, the gradients for these shared parameters accumulate contributions from every time step (see the sketch below)
● A single gradient descent step then updates the parameters, just as in regular backpropagation
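A minimal BPTT sketch (illustrative names, not code from the text): the cell is unrolled by hand in a loop, only the last three outputs enter the loss, and automatic differentiation runs the backward pass through the unrolled graph, accumulating gradients for the shared weights:
import tensorflow as tf

cell = tf.keras.layers.SimpleRNNCell(4)
out_layer = tf.keras.layers.Dense(1)
xs = tf.random.normal([8, 5, 1])        # batch of 8, 5 time steps, 1 feature
ys = tf.random.normal([8, 5, 1])        # targets at every time step

with tf.GradientTape() as tape:
    state = [tf.zeros([8, 4])]
    loss = 0.0
    for t in range(5):                  # forward pass: unroll through time
        output, state = cell(xs[:, t], state)
        y_pred = out_layer(output)
        if t >= 2:                      # e.g. only Ŷ(2), Ŷ(3), Ŷ(4) are used
            loss += tf.reduce_mean((y_pred - ys[:, t]) ** 2)

# Backward pass: gradients flow only through the outputs used in the loss,
# and the shared parameters accumulate gradients from every contributing step
grads = tape.gradient(loss, cell.trainable_variables + out_layer.trainable_variables)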
Preparing Data for ML Models
● The text describes preparing time series data for machine learning
models, with the goal of forecasting tomorrow's ridership based on
8 weeks (56 days) of past data.
● The concept of using sliding windows: Every 56-day window from
the past serves as training data, with the target being the value
immediately following each window.
● There are two main ways to build such datasets: Keras's timeseries_dataset_from_array() utility and tf.data's window() method.
First method using timeseries_dataset_from_array():
import tensorflow as tf
my_series = [0, 1, 2, 3, 4, 5]
my_dataset = tf.keras.utils.timeseries_dataset_from_array(
    my_series,
    targets=my_series[3:],  # targets are 3 steps into the future
    sequence_length=3,
    batch_size=2
)
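Iterating over this dataset shows each window of 3 values paired with the value that follows it (expected output, given the series above):
for window, target in my_dataset:
    print(window.numpy(), "=>", target.numpy())
# first batch:  [[0 1 2], [1 2 3]] => [3 4]
# second batch: [[2 3 4]]          => [5]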
Alternative method using window():
dataset = tf.data.Dataset.range(6).window(4, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window_dataset: window_dataset.batch(4))

# Helper function for extracting windows
def to_windows(dataset, length):
    dataset = dataset.window(length, shift=1, drop_remainder=True)
    return dataset.flat_map(lambda window_ds: window_ds.batch(length))
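A short usage sketch of this helper (not from the text), splitting each window into an input/target pair, mirroring what timeseries_dataset_from_array() produced above:
dataset = to_windows(tf.data.Dataset.range(6), length=4)
dataset = dataset.map(lambda window: (window[:-1], window[-1]))
for inputs, target in dataset:
    print(inputs.numpy(), "=>", target.numpy())
# [0 1 2] => 3
# [1 2 3] => 4
# [2 3 4] => 5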
Final data preparation steps for the rail ridership example:
rail_train = df["rail"]["2016-01":"2018-12"] / 1e6
rail_valid = df["rail"]["2019-01":"2019-05"] / 1e6
rail_test = df["rail"]["2019-06":] / 1e6
seq_length = 56
train_ds = tf.keras.utils.timeseries_dataset_from_array(
rail_train.to_numpy(),
targets=rail_train[seq_length:],
sequence_length=seq_length,
batch_size=32,
shuffle=True,
seed=42
valid_ds = tf.keras.utils.timeseries_dataset_from_array(
rail_valid.to_numpy(),
targets=rail_valid[seq_length:],
sequence_length=seq_length,
batch_size=32
)
Forecasting Using a Linear Model
Performance Results:
● The model achieved a validation MAE of approximately 37,866
● This performance is:
○ Better than naive forecasting
○ Worse than the SARIMA model
Key Model Characteristics:
● Uses Huber loss instead of MAE directly for better performance
● Implements early stopping to prevent overfitting
● Uses SGD optimizer with momentum
● Monitors validation MAE for early stopping
Code Snippet:
tf.random.set_seed(42)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=[seq_length])
])
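The snippet above only defines the model. Below is a sketch of a compile/fit setup matching the characteristics listed earlier (Huber loss, SGD with momentum, early stopping on validation MAE); the learning rate, momentum, patience, and epoch count are illustrative assumptions, not necessarily the values used in the text:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_mae", patience=50, restore_best_weights=True)  # patience is an assumption
opt = tf.keras.optimizers.SGD(learning_rate=0.02, momentum=0.9)  # values are assumptions
model.compile(loss=tf.keras.losses.Huber(), optimizer=opt, metrics=["mae"])
history = model.fit(train_ds, validation_data=valid_ds, epochs=500,
                    callbacks=[early_stopping_cb])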
Forecasting Using a Simple RNN
1. Initial Simple RNN Implementation:
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(1, input_shape=[None, 1])
])
2. Input Shape Requirements:
● RNN layers expect 3D inputs: [batch size, time steps, dimensionality]
● input_shape does not include the first dimension (the batch size)
● Time steps can be None (any size)
● Dimensionality is 1 for univariate time series
3. How the Simple RNN Works:
● Initial state h(init) starts at 0
● Each step processes current input and previous state
● Uses hyperbolic tangent (tanh) activation by default
● Outputs only the final time step's value unless return_sequences=True (see the verification sketch after this list)
4. Problems with Initial Model:
● Validation MAE > 100,000 (poor performance)
● Only 3 parameters total (2 weights + 1 bias)
● Limited by tanh activation range (-1 to +1)
● Too simple for the complexity of the data
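A verification sketch (not from the text) that replays the recurrence ŷ(t) = tanh(wx·x(t) + wŷ·ŷ(t-1) + b) in NumPy and matches it against Keras's SimpleRNN(1):
import numpy as np
import tensorflow as tf

rnn = tf.keras.layers.SimpleRNN(1)
x = np.random.rand(1, 4, 1).astype(np.float32)   # 1 series, 4 time steps, 1 feature
keras_out = rnn(x).numpy()                       # only the final output is returned

wx, wy, b = [w.numpy() for w in rnn.weights]     # 2 weights + 1 bias = 3 parameters
y = np.zeros([1, 1], dtype=np.float32)           # the initial state is zero
for t in range(4):
    y = np.tanh(x[:, t] @ wx + y @ wy + b)       # one step of the recurrence
print(np.allclose(y, keras_out))                 # True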
Forecasting Using a Deep RNN
Code Snippet:
deep_model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, return_sequences=True, input_shape=[None, 1]),
    tf.keras.layers.SimpleRNN(32, return_sequences=True),
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(1)
])
Fighting the Unstable Gradients Problem
1. Common Deep Learning Techniques That Help:
● Good parameter initialization
● Faster optimizers
● Dropout
2. ReLU and Non-saturating Activation Functions:
● May not help as much with RNNs
● Can actually increase instability
● Risk of exploding outputs due to weight reuse across time steps
● Saturating functions like tanh are preferred (hence being the default)
3. Gradient Issues:
● Gradients can explode
● Solutions include:
○ Using smaller learning rates
○ Monitoring gradient size (via TensorBoard)
○ Using gradient clipping
4. Batch Normalization (BN) Limitations:
● Less effective with RNNs than with feedforward networks
● Cannot be used effectively between time steps
● When used in memory cells:
○ Same BN layer used at each time step
○ Same parameters regardless of input scale
○ Only slightly beneficial when applied to layer inputs
○ Not helpful when applied to hidden states
○ Can slow down training
5. Layer Normalization Benefits:
● Better suited for RNNs than batch normalization
● Normalizes across features dimension instead of batch dimension
● Advantages:
○ Can compute statistics on the fly at each time step
○ Works independently for each instance
○ Consistent behavior during training and testing
○ Doesn't need exponential moving averages
○ Learns scale and offset parameters for each input
6. Implementation:
● Layer normalization is applied just after the linear combination of the inputs and hidden states
● Requires defining a custom memory cell in Keras (see the sketch after this list)
● The cell's call() method needs to handle both:
○ The current time step's inputs
○ The previous time step's hidden states
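A sketch of such a custom cell, close to the approach described (a SimpleRNNCell with its activation removed so that layer normalization can be applied right after the linear combination, followed by the activation); the class name and the model at the end are illustrative:
import tensorflow as tf

class LNSimpleRNNCell(tf.keras.layers.Layer):
    def __init__(self, units, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        self.state_size = units
        self.output_size = units
        # Linear combination of inputs and hidden states (no activation yet)
        self.simple_rnn_cell = tf.keras.layers.SimpleRNNCell(units, activation=None)
        self.layer_norm = tf.keras.layers.LayerNormalization()
        self.activation = tf.keras.activations.get(activation)

    def call(self, inputs, states):
        # inputs: current time step's inputs; states: previous hidden states
        outputs, new_states = self.simple_rnn_cell(inputs, states)
        norm_outputs = self.activation(self.layer_norm(outputs))
        return norm_outputs, [norm_outputs]   # output equals the new state

# The custom cell is wrapped in a generic tf.keras.layers.RNN layer:
ln_model = tf.keras.Sequential([
    tf.keras.layers.RNN(LNSimpleRNNCell(32), return_sequences=True,
                        input_shape=[None, 5]),
    tf.keras.layers.Dense(14)
])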
Tackling the Short-Term Memory Problem
Two cell types help address this problem:
1. LSTM
2. GRU
LSTM
Code Snippet:
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True, input_shape=[None, 5]),
    tf.keras.layers.Dense(14)
])
GRU
Code Snippet:
model = tf.keras.Sequential([
    tf.keras.layers.GRU(32, return_sequences=True, input_shape=[None, 5]),
    tf.keras.layers.Dense(14)
])