Deep Learning
MLP (Feedforward Neural Network)
• Definition: A fully connected, feedforward neural
network with at least one hidden layer between the
input and output layers.
• Feedforward: Data flows in one direction (input →
hidden layers → output) with no cycles.
• Purpose: Learn non-linear relationships in data for
tasks like classification, regression, and pattern
recognition.
Architecture of an MLP
•Input Layer: Receives raw features (e.g., pixels in an
image, words in a document).
•Hidden Layers: Transform inputs using weights and
activation functions (e.g., ReLU, Sigmoid).
•Output Layer: Produces predictions (e.g., class
probabilities for classification).
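A minimal Keras sketch of this architecture (the layer sizes and the
784-feature input, e.g. flattened 28x28 images with 10 classes, are
illustrative assumptions, not a prescribed setup):

import tensorflow as tf
from tensorflow.keras import layers

# Input layer -> two ReLU hidden layers -> softmax output layer
mlp = tf.keras.Sequential([
    layers.Input(shape=(784,)),              # input layer: raw feature vector
    layers.Dense(128, activation="relu"),    # hidden layer 1
    layers.Dense(64, activation="relu"),     # hidden layer 2
    layers.Dense(10, activation="softmax"),  # output layer: class probabilities
])
mlp.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
mlp.summary()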
Convolutional Neural Networks
(CNNs)
• Convolutional Neural Networks (CNNs) are used for image
classification (2D CNNs) and text classification (1D CNNs).
2D CNN for Image Classification
What:
CNNs use convolutional layers to extract spatial
hierarchies of features (edges → textures → objects)
from images.
Why:
• Translation invariance: Detects patterns regardless of
position.
• Parameter sharing: Fewer weights than dense layers.
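A small 2D CNN sketch in Keras (the 28x28x1 input shape, filter counts,
and 10 output classes are assumptions chosen for illustration):

import tensorflow as tf
from tensorflow.keras import layers

cnn2d = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # low-level features (edges)
    layers.MaxPooling2D(pool_size=2),                     # downsample feature maps
    layers.Conv2D(64, kernel_size=3, activation="relu"),  # higher-level features (textures/parts)
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # class probabilities
])
cnn2d.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])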
1D CNN for Text Classification
What:
1D CNNs apply temporal convolutions to sequences
(e.g., word embeddings) to detect local patterns (n-
grams).
Why:
• Faster than RNNs: Parallel processing of sequences.
• Captures local context: Detects phrases or word
combinations.
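A hedged 1D CNN sketch for text classification (vocabulary size,
sequence length, and embedding size below are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

cnn1d = tf.keras.Sequential([
    layers.Input(shape=(200,)),                            # 200 token ids per document
    layers.Embedding(input_dim=20000, output_dim=100),     # word embeddings
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # detects ~5-gram patterns
    layers.GlobalMaxPooling1D(),                           # keep the strongest response per filter
    layers.Dense(1, activation="sigmoid"),                 # binary classification
])
cnn1d.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])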
Recurrent Neural Networks
(RNNs)
• Class of neural networks designed for sequential
data (e.g., time series, text, speech).
• Purpose: Process sequences by maintaining a hidden
state that captures temporal dependencies.
• Key Idea: Reuse weights across time steps, allowing
the network to "remember" past information.
• Use Cases:
• Time series forecasting.
• Text generation/sentiment analysis.
• Speech recognition.
• Machine translation.
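A minimal recurrent sketch for a sentiment-style task (all sizes are
assumptions for illustration; the hidden state is reused across time steps):

import tensorflow as tf
from tensorflow.keras import layers

rnn = tf.keras.Sequential([
    layers.Input(shape=(100,)),                        # 100 token ids per sequence
    layers.Embedding(input_dim=10000, output_dim=64),
    layers.SimpleRNN(64),    # hidden state is updated and carried forward at every step
    layers.Dense(1, activation="sigmoid"),
])
rnn.compile(optimizer="adam",
            loss="binary_crossentropy",
            metrics=["accuracy"])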
Challenges with Basic RNNs
• Vanishing/Exploding Gradients: Difficulty learning
long-range dependencies (e.g., in long sentences).
• Short-Term Memory: Basic RNNs struggle to retain
information over many steps.
Solutions:
• Long Short-Term Memory (LSTM):
Uses gates (input, forget, output) to control information
flow.
• Gated Recurrent Units (GRU): Simplified version of
LSTM with fewer gates.
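In Keras, LSTM and GRU layers are drop-in replacements for a basic
recurrent layer; a sketch with illustrative sizes:

import tensorflow as tf
from tensorflow.keras import layers

lstm_model = tf.keras.Sequential([
    layers.Input(shape=(100,)),
    layers.Embedding(input_dim=10000, output_dim=64),
    layers.LSTM(64),   # input/forget/output gates control what is kept or discarded
    layers.Dense(1, activation="sigmoid"),
])

gru_model = tf.keras.Sequential([
    layers.Input(shape=(100,)),
    layers.Embedding(input_dim=10000, output_dim=64),
    layers.GRU(64),    # fewer gates (update/reset) than LSTM, typically faster to train
    layers.Dense(1, activation="sigmoid"),
])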
What Are Word Embeddings?
•Definition: Dense vector representations of words in a
continuous space, where similar words are closer
geometrically.
•Purpose: Capture semantic meaning (e.g., king - man +
woman ≈ queen) and syntactic patterns (e.g., verb
tenses).
•Key Benefit: Overcome the sparsity and high dimensionality of
traditional methods like Bag-of-Words (BoW).
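The analogy above can be checked with pretrained vectors via gensim's
downloader (the model name is an assumption and the first call downloads
the vectors over the network):

import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")   # 50-dimensional GloVe vectors
# king - man + woman: "queen" is expected to rank near the top
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))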
Word2Vec
A framework introduced by Google in 2013 for learning word
embeddings using shallow neural networks. It has two
variants:
1.Continuous Bag-of-Words (CBOW)
2.Skip-Gram
Continuous Bag-of-Words
(CBOW)
• Goal: Predict a target word from its surrounding context
words.
• Input: Average of context word vectors (e.g., window of
2 words before/after).
• Output: Probability distribution over the vocabulary for
the target word.
Example:
•Context: ["The", "cat", "on", "the"] → Predict target
word "sat".
•Training: Adjust weights to maximize the probability
of "sat" given the context.
Skip-Gram
•Goal: Predict context words given a target word (inverse
of CBOW).
•Input: Target word vector.
•Output: Probability distribution over context words.
Example:
•Target word: "sat" → Predict context ["The", "cat", "on",
"the"].
•Training: Adjust weights to maximize the probability of
context words.
Gensim and Custom Embedding
Training
• Gensim: A Python library for natural language
processing, particularly known for its topic
modeling and document indexing capabilities.
• Word Embeddings: These are vector representations
of words, capturing semantic relationships between
them.
• Word2Vec: A popular algorithm within Gensim for
creating word embeddings.
Training a Custom Word2Vec Model:
•Input Data:
You'll need a corpus of text data. Gensim's Word2Vec expects
input as a sequence of sentences, where each sentence is a
list of words.
•Preprocessing:
You might need to preprocess your text data (e.g., lowercasing,
removing punctuation, handling special characters) before
feeding it to the model.
•Model Training:
Instantiate Gensim's Word2Vec model and train it on the preprocessed
corpus, as in the sketch below.
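A minimal custom training sketch (gensim >= 4.0 API; the toy corpus and
hyperparameters are illustrative assumptions):

from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # embedding dimensionality
    window=2,          # context window size
    min_count=1,       # keep every word in this tiny corpus
    sg=0,              # 0 = CBOW, 1 = Skip-Gram
    workers=1,
)

vector = model.wv["cat"]                    # learned embedding for "cat"
print(model.wv.most_similar("cat", topn=2)) # nearest neighbors in the embedding space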
Sequence Models
• Sequence models are a type of machine learning model
designed to process and predict sequential data, such
as text, time series, or audio, by leveraging the inherent
order and dependencies within the data.
• They are commonly used in tasks like machine
translation, speech recognition, and text generation.
• Sequence models can be categorized by their input/output
structures: 1-to-1, 1-to-Many, Many-to-1, and Many-to-Many.
1-to-1 (Vanilla Feedforward
Model)
• Structure: Single input → Single output (no sequential
dependency).
Use Case: Standard classification/regression (e.g.,
MNIST digit classification).
1-to-Many
Structure: Single input → Sequence of outputs.
Use Cases:
• Image Captioning: Generate a sentence from an
image.
• Music Generation: Create a melody from a seed note.
Many-to-1
Structure: Sequence of inputs → Single output.
Use Cases:
• Sentiment Analysis: Classify a sentence as
positive/negative.
• Time Series Forecasting: Predict stock price from
historical data.
Many-to-Many
Structure: Sequence of inputs → Sequence of outputs.
Subtypes:
1.Aligned (Same Length): Each input step maps to an
output step (e.g., POS tagging).
2.Non-Aligned (Different Length): Input and output
sequences differ in length (e.g., translation).
a. Aligned Many-to-Many Use Cases:
• POS Tagging: Assign grammatical tags to each word.
• Video Frame Prediction: Predict next frame in a
video.
b. Non-Aligned Many-to-Many (Encoder-Decoder) Use
Cases:
• Machine Translation: Translate English → French.
• Speech Recognition: Transcribe audio → text.
Encoder: Processes input sequence into a context
vector.
Decoder: Generates output sequence from the context
vector.
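A hedged sketch of an LSTM encoder-decoder for the non-aligned case
(vocabulary sizes and dimensions are assumptions for illustration):

import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 256

# Encoder: compress the input sequence into a context (its final states).
enc_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(input_dim=10000, output_dim=128)(enc_inputs)
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: generate the output sequence, initialised with the encoder states.
dec_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(input_dim=12000, output_dim=128)(dec_inputs)
dec_outputs = layers.LSTM(latent_dim, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
dec_outputs = layers.Dense(12000, activation="softmax")(dec_outputs)

seq2seq = tf.keras.Model([enc_inputs, dec_inputs], dec_outputs)
seq2seq.compile(optimizer="adam", loss="sparse_categorical_crossentropy")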
Bi-Directional LSTM/RNN in
Sequence Models
• Bidirectional LSTMs (BiLSTMs) are recurrent neural
networks that process sequential data in both the forward
and backward directions. This allows the model to capture
context from both past and future inputs, which is
particularly useful in tasks like natural language
processing.
• Why: Traditional unidirectional RNNs only use past
context. Bidirectional models improve performance in
tasks where future context matters (e.g., text
understanding, speech recognition).
•Two Hidden Layers:
• Forward Layer: Processes the sequence from t = 1 to t = T.
• Backward Layer: Processes the sequence from t = T to t = 1.
•Output Concatenation: Combines the outputs from both layers at
each time step.
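In Keras, the Bidirectional wrapper runs one LSTM forward and one
backward and concatenates their outputs; a sketch with illustrative sizes:

import tensorflow as tf
from tensorflow.keras import layers

bilstm = tf.keras.Sequential([
    layers.Input(shape=(100,)),
    layers.Embedding(input_dim=10000, output_dim=64),
    layers.Bidirectional(layers.LSTM(64)),   # forward + backward passes, outputs concatenated
    layers.Dense(1, activation="sigmoid"),
])
bilstm.compile(optimizer="adam",
               loss="binary_crossentropy",
               metrics=["accuracy"])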
Use Cases
1.Sentiment Analysis: Understand how words influence each other in
both directions (e.g., "not good").
2.Machine Translation: Encoder in seq2seq models (replaced by
Transformers in modern systems).
3.Speech Recognition: Transcribe audio by leveraging past and future
frames.
Limitations
• Not Suitable for Real-Time Prediction: Future
context isn’t available in streaming tasks (e.g., live
captioning).
• Memory-Intensive: Requires storing states for both
directions.