DL-unit-4-part-2


Recurrent Neural Networks (RNN)

A recurrent neural network (RNN) is a type of artificial neural network (ANN) used in applications such as Apple's Siri and Google's voice search. An RNN remembers past inputs through an internal memory, which makes it useful for predicting stock prices, generating text, transcription, and machine translation.

In a traditional neural network, the inputs and the outputs are independent of each other, whereas the output of an RNN depends on the prior elements within the sequence. Recurrent networks also share parameters across each layer of the network. Feedforward networks have different weights at each node, whereas an RNN shares the same weights within each layer of the network; during gradient descent, these shared weights and biases are adjusted to reduce the loss.

RNN

The image above is a simple representation of a recurrent neural network. If we are forecasting stock prices from simple data [45, 56, 45, 49, 50, …], each input from X0 to Xt will contain a past value. For example, X0 will hold 45 and X1 will hold 56, and these values are used to predict the next number in the sequence.
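As a rough illustration, the sketch below frames such a price series as sliding windows of past values and fits a small SimpleRNN to predict the next value. The window length, layer size, and the extra prices appended to the series are illustrative assumptions, not part of the original example.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Illustrative price series; a window of 3 past values predicts the next one
prices = np.array([45, 56, 45, 49, 50, 52, 48, 51], dtype="float32")
window = 3

X, y = [], []
for i in range(len(prices) - window):
    X.append(prices[i:i + window])      # e.g. [45, 56, 45]
    y.append(prices[i + window])        # e.g. 49
X = np.array(X).reshape(-1, window, 1)  # (samples, timesteps, features)
y = np.array(y)

model = Sequential([
    SimpleRNN(16, input_shape=(window, 1)),  # hidden size chosen arbitrarily
    Dense(1)                                 # single forecasted value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=50, verbose=0)

print(model.predict(X[-1:]))  # forecast for the step after the last window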

How Recurrent Neural Networks Work


In an RNN, information cycles through a loop, so the output is determined by the current input and the previously received inputs.
The input layer X processes the initial input and passes it to the middle layer A. The middle layer consists of multiple hidden layers, each with its own activation functions, weights, and biases. These parameters are shared across the hidden layers, so instead of creating several distinct hidden layers, the network creates one and loops over it.
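To make the loop concrete, here is a minimal NumPy sketch of a single recurrent layer applied step by step. The tanh activation, the dimensions, and the random weights are illustrative assumptions, not the exact configuration used in the Keras example later in this unit.

import numpy as np

timesteps, input_dim, hidden_dim = 5, 3, 4
xs = np.random.randn(timesteps, input_dim)    # one input vector per time step

Wx = np.random.randn(hidden_dim, input_dim)   # input-to-hidden weights (shared)
Wh = np.random.randn(hidden_dim, hidden_dim)  # hidden-to-hidden weights (shared)
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in xs:
    # the same weights are reused at every step; only the state h changes
    h = np.tanh(Wx @ x_t + Wh @ h + b)

print(h)  # final hidden state, summarising the whole sequence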

Instead of using traditional backpropagation, recurrent neural networks use the backpropagation through time (BPTT) algorithm to determine the gradients. In backpropagation, the model adjusts its parameters by propagating errors from the output layer back to the input layer. BPTT sums the error at each time step, since the RNN shares the same parameters across time steps.

Types of Recurrent Neural Networks


Feedforward networks have a single input and a single output, while recurrent neural networks are more flexible because the lengths of the inputs and outputs can vary. This flexibility allows RNNs to handle tasks such as music generation, sentiment classification, and machine translation.

There are four types of RNN based on different lengths of inputs and outputs.

 One-to-one is a simple neural network. It is commonly used for machine learning problems that have a single input and a single output.
 One-to-many has a single input and multiple outputs. This is used for generating image captions.
 Many-to-one takes a sequence of multiple inputs and predicts a single output. It is popular in sentiment classification, where the input is text and the output is a category.
 Many-to-many takes multiple inputs and produces multiple outputs. The most common application is machine translation. (A brief Keras sketch of the many-to-one and many-to-many cases follows this list.)
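As a minimal sketch of the last two cases, the Keras return_sequences flag controls whether a recurrent layer emits an output only at the final time step (many-to-one) or at every time step (many-to-many). The batch size, sequence length, and layer width here are arbitrary.

import tensorflow as tf

x = tf.random.normal((2, 10, 8))  # (batch, timesteps, features)

# Many-to-one: the layer returns only the final hidden state
many_to_one = tf.keras.layers.SimpleRNN(16)

# Many-to-many: return_sequences=True keeps an output for every time step
many_to_many = tf.keras.layers.SimpleRNN(16, return_sequences=True)

print(many_to_one(x).shape)   # (2, 16)
print(many_to_many(x).shape)  # (2, 10, 16)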

Types of RNN

CNN vs. RNN


A convolutional neural network (CNN) is a feedforward neural network capable of processing spatial data. It is commonly used for computer vision applications such as image classification. Simple neural networks are good at simple binary classification, but they cannot handle images with pixel dependencies. The CNN model architecture consists of convolutional layers, ReLU layers, pooling layers, and fully connected output layers. You can learn CNNs by working on a project such as Convolutional Neural Networks in Python.
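A minimal Keras sketch of that layer sequence (convolution, ReLU, pooling, fully connected output) is shown below; the 28x28 grayscale input and 10 output classes are illustrative assumptions, not tied to any particular dataset in this unit.

import tensorflow as tf
from tensorflow.keras import layers, models

# Conv -> ReLU -> pooling -> fully connected, as described above
cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # class probabilities
])
cnn.summary()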

CNN Model Architecture

Key Differences Between CNN and RNN

 CNN is applicable to sparse data like images, while RNN is applicable to time series and sequential data.
 While training the model, CNN uses simple backpropagation, whereas RNN uses backpropagation through time to calculate the loss gradients.
 RNNs have no restriction on the length of inputs and outputs, but CNNs have finite inputs and finite outputs.
 CNN is a feedforward network, while RNN uses loops to handle sequential data.
 CNNs are also used for video and image processing, while RNNs are primarily used for speech and text analysis.

Limitations of RNN
Simple RNN models usually run into two major issues. Both issues relate to the gradient, which is the slope of the loss function with respect to the model parameters.

1. The vanishing gradient problem occurs when the gradient becomes so small that updating the parameters becomes insignificant; eventually, the algorithm stops learning.
2. The exploding gradient problem occurs when the gradient becomes too large, which makes the model unstable. In this case, large error gradients accumulate and the model weights become too large. This issue can cause longer training times and poor model performance. (A small numeric illustration of both cases follows below.)
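The toy NumPy calculation below hints at why this happens: backpropagating through many time steps repeatedly multiplies by the recurrent weight matrix, so values scaled by a factor below one shrink toward zero while values scaled by a factor above one blow up. The diagonal matrices and the step count are purely illustrative.

import numpy as np

steps = 50
small = 0.5 * np.eye(4)  # recurrent factor with magnitude < 1
large = 1.5 * np.eye(4)  # recurrent factor with magnitude > 1

g_small = np.ones(4)
g_large = np.ones(4)
for _ in range(steps):
    g_small = small @ g_small  # shrinks every step -> vanishing gradient
    g_large = large @ g_large  # grows every step  -> exploding gradient

print(g_small[0])  # ~8.9e-16: updates become insignificant
print(g_large[0])  # ~6.4e+08: training becomes unstable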
RNN Applications
Recurrent Neural Networks are used to tackle a variety of problems involving sequence
data. There are many different types of sequence data, but the following are the most
common: Audio, Text, Video, Biological sequences.
Using RNN models and sequence datasets, you can tackle a variety of problems, including:
 Speech recognition
 Generation of music
 Automated Translations
 Analysis of video action
 Sequence study of the genome and DNA

CODE for RNN

Recurrent neural networks (RNN) are a class of neural networks that are powerful for modeling sequence data such as time series or natural language.

Schematically, an RNN layer uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far.

The Keras RNN API is designed with a focus on:

 Ease of use:
the built-in keras.layers.RNN, keras.layers.LSTM, keras.layers.GRU layers enable you
to quickly build recurrent models without having to make difficult configuration
choices.
 Ease of customization:
You can also define your own RNN cell layer (the inner part of the for loop) with
custom behavior, and use it with the generic keras.layers.RNN layer (the for loop
itself). This allows you to quickly prototype different research ideas in a flexible way
with minimal code. (A minimal sketch of such a custom cell is shown below.)
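The sketch below defines a hypothetical minimal cell and wraps it with keras.layers.RNN; it mirrors the minimal-cell pattern from the Keras documentation, and the layer width and input shape are arbitrary.

import tensorflow as tf

# The cell defines one step of the recurrence (the "inner part of the for loop");
# keras.layers.RNN handles the iteration over time steps
class MinimalRNNCell(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units

    def build(self, input_shape):
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform", name="kernel")
        self.recurrent_kernel = self.add_weight(
            shape=(self.units, self.units),
            initializer="orthogonal", name="recurrent_kernel")

    def call(self, inputs, states):
        prev_h = states[0]
        h = tf.tanh(tf.matmul(inputs, self.kernel) +
                    tf.matmul(prev_h, self.recurrent_kernel))
        return h, [h]

layer = tf.keras.layers.RNN(MinimalRNNCell(32))
print(layer(tf.random.normal((2, 10, 5))).shape)  # (2, 32)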
RNN Code Implementation

Imported libraries:

Imported the necessary libraries, such as NumPy and TensorFlow, for numerical computation and model building.
import numpy as np

import tensorflow as tf

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import SimpleRNN, Dense

Input Generation:
Generated some example data using text.

text = "This is GeeksforGeeks a software training institute"

chars = sorted(list(set(text)))

char_to_index = {char: i for i, char in enumerate(chars)}

index_to_char = {i: char for i, char in enumerate(chars)}

Created input sequences and the corresponding labels for training.

seq_length = 3

sequences = []

labels = []

for i in range(len(text) - seq_length):
    seq = text[i:i+seq_length]
    label = text[i+seq_length]
    sequences.append([char_to_index[char] for char in seq])
    labels.append(char_to_index[label])

Converted the sequences and labels into NumPy arrays and used one-hot encoding to turn the characters into vectors.

X = np.array(sequences)

y = np.array(labels)

X_one_hot = tf.one_hot(X, len(chars))

y_one_hot = tf.one_hot(y, len(chars))

Model Building:
Built the RNN model using the 'relu' and 'softmax' activation functions.

model = Sequential()

model.add(SimpleRNN(50, input_shape=(seq_length, len(chars)), activation='relu'))

model.add(Dense(len(chars), activation='softmax'))

Model Compilation:
The model.compile line configures the neural network for training by specifying the optimizer (Adam), the loss function (categorical crossentropy), and the training metric (accuracy).
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Model Training:
Using the input sequences (X_one_hot) and corresponding labels (y_one_hot) for 100
epochs, the model is trained using the model.fit line, which optimises the model
parameters to minimise the categorical crossentropy loss.

model.fit(X_one_hot, y_one_hot, epochs=100)

Model Prediction:
Generated text using the trained model.

start_seq = "This is G"

generated_text = start_seq

for i in range(50):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")

print(generated_text)
