Deep Learning Introduction
LEARNING
Do we know this..?
• Data, Variables, Features, Labels
• Preprocessing
• Supervised, Unsupervised, Reinforcement Learning
• Machine Learning
• Gradient Descent
• Generalization
• Training & Testing Data
• Fine Tuning
• Regression, Classification, Clustering
• Overfitting, Underfitting
• Variance, Bias
Origins and Early Developments (1870-1960s)
1873
• Early foundational work helped set the stage, making computers better at understanding what we say and write.
1943
• Warren McCulloch and Walter Pitts wrote a paper introducing the McCulloch-Pitts neuron model.
• The model showed how simple units, like switches in the brain, could work together to solve complex problems.
1957
• Frank Rosenblatt created the
perceptron
• Neural network that learns from
labelled data
1986
• Backpropagation algorithm.
• This algorithm enabled more efficient training of neural networks, allowing them to learn from data more effectively.
1989
• Yann LeCun developed Convolutional Neural Networks, a groundbreaking advancement in the field of computer vision.
• In 2006 Geoffrey Hinton made a significant contribution to the field of deep learning.
• He introduced deep belief networks, which are probabilistic generative models made up of multiple layers of stochastic, latent variables.
• This was a new approach to unsupervised learning, where machines could learn patterns and relationships in data without explicit guidance.
• Deep belief networks were applied for feature learning, dimensionality reduction, and anomaly detection, further advancing the capabilities of artificial intelligence.
Expanding Frontiers (2010s-Present)
2014
• Generative Adversarial Networks (GANs) were introduced.
• Two neural networks, the generator and the discriminator, compete and collaborate.
2015
• Andrej Karpathy applied recurrent neural networks (RNNs) to the realm of natural language processing.
• With RNNs, sequences of words could be understood and processed in context.
2017
• Geoffrey Hinton introduced Capsule
Networks, a visionary alternative to
convolutional neural networks (CNNs)
Each of these milestones marked a chapter in the ongoing saga of artificial intelligence,
driving the field forward with unprecedented leaps in capability and understanding.
Applications
Image Recognition
(Facial recognition, Object detection)
Entertainment
(recommendation systems, deepfake technology)
Autonomous Vehicles
(self-driving cars)
Finance
(fraud detection, stock market prediction)
Where does it stand?
Building Blocks of Neural Network
• It is possible to mimic dendrites, cell bodies and axons using simplified mathematical models.
How it works..? (biological neuron)
- On receiving enough signals at the dendrites, they pass them to the axon.
- The outgoing signal can then be used as another input for other neurons, repeating the process.
- Some signals are more important than others and can trigger some neurons to fire more easily.
- Connections can become stronger or weaker, and new connections can appear.
How it works..? (artificial neuron)
- A function that receives a list of weighted input signals and outputs some kind of signal if the sum of these weighted inputs reaches a certain bias (threshold).
Artificial Neural Network (ANN)
• ANN is an information-processing model inspired by the biological nervous system.
• Like humans, an ANN learns by example: the more the data, the better the learning.
• Unlike the brain, it can solve only one problem at a time (face recognition / language understanding / prediction / classification / forecasting).
• It is composed of a large number of highly interconnected processing units (neurons) working to solve a specific type of problem. Neurons are connected by connection links.
• Each neuron has its own internal state called the Activation Level of the neuron.
Neural Network and Perceptron
Input I = {x1, x2, x3, x4}
Basic Components
1. Connections – connections between the neurons containing the weights.
i. Single-Layered feed-forward network
ii. Multilayer feed-forward network
iii. Single node with its own feedback
iv. Single-layer Recurrent network
2. Learning – the process by which a neural network adapts itself to a stimulus by adjusting its parameters to provide the most optimal output.
i. Parameter Learning – update the weights
ii. Structure Learning – update the layers and the number of neurons
3. Activation Function
i. Binary step function
ii. Linear function
iii. Sigmoid activation function
iv. ReLU Activation function (Rectified Linear Unit), Leaky ReLU, Parameterised ReLU
Connections
(Diagram: input, single node, output, with a feedback connection back to the node: a single node with its own feedback.)
ACTIVATION FUNCTION
• A neural network activation function is a function that introduces nonlinearity into the model.
• They help to determine the output of a neural network by applying mathematical transformations to the input signals received from other layers in a network. Activation functions allow for complex non-linear relationships between input and output data points.
• In a neural network not all the information is important. The activation function helps to use the important information and suppress the irrelevant data points.
https://vitalflux.com/how-know-data-linear-non-linear/
Why use activation functions?
• Activation functions add non-linearities into the network.
• In the absence of activation functions, the network would only be capable of performing linear transformations, which cannot adequately represent the complexity and nuances of real-world data.
• Normalizing each neuron's output is a key benefit of utilizing activation functions.
• Depending on the inputs it gets and the weights associated with those inputs, a neuron's output can range from extremely high to extremely low.
• Activation functions ensure that each neuron's output falls inside a defined range, which makes it simpler to optimise the network during training.
Why use activation functions?
• Data Processing: a neural network processes data layer by layer; each layer performs a weighted sum of its inputs and adds a bias.
• Linear Limitation: if all layers used linear activation functions (y = mx + b), stacking these layers would just create another linear function. No matter how many layers you add, the output would still be a straight line (see the sketch after this list).
• Introducing Non-linearity: activation functions (e.g., sigmoid) transform the linearly processed data into a non-linear form, curving the output.
• Learning Complex Patterns: the network can learn complex patterns in the data that wouldn't be possible with just linear functions.
• Beyond Linear Separation: allows the network to move beyond simply separating data linearly, like logistic regression; it can learn intricate relationships between features in the data.
• Foundation for Complex Tasks: image recognition, natural language processing, and similar tasks.
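As a minimal sketch of the linear limitation above (plain NumPy, with made-up weights and inputs, not taken from the slides), stacking two purely linear layers collapses into one equivalent linear layer, while inserting a ReLU between them does not:

import numpy as np

# Two "layers" with arbitrary, made-up weights and biases
W1, b1 = np.array([[1.0, -2.0], [0.5, 1.0]]), np.array([0.5, -1.0])
W2, b2 = np.array([[2.0, 0.0], [1.0, 1.0]]), np.array([0.0, 1.0])

x = np.array([3.0, -1.0])

# Stacking two linear layers: y = W2 (W1 x + b1) + b2
y_linear = W2 @ (W1 @ x + b1) + b2

# One collapsed linear layer gives the same result: y = (W2 W1) x + (W2 b1 + b2)
W_combined, b_combined = W2 @ W1, W2 @ b1 + b2
y_collapsed = W_combined @ x + b_combined
print(np.allclose(y_linear, y_collapsed))     # True: the extra layer added nothing

# Inserting a non-linearity (ReLU) between the layers breaks the collapse
relu = lambda z: np.maximum(z, 0)
y_nonlinear = W2 @ relu(W1 @ x + b1) + b2
print(np.allclose(y_nonlinear, y_collapsed))  # False in general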
Types of Activation Functions
• Binary Step Function: decides whether or not the neuron should be activated.
• Linear Function: activation is proportional to the input.
• Sigmoid activation function: used to classify the information; maps any input onto a value between 0 and 1; suffers from vanishing gradient problems.
• Tanh activation function: maps inputs onto values between -1 and 1; used for multi-dimensional classification.
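As a small illustrative sketch (plain NumPy, not from the slides), the activation functions above, plus the ReLU variants listed under Basic Components, can be written as:

import numpy as np

def binary_step(x, threshold=0.0):
    # Fires (1) only when the input reaches the threshold
    return np.where(x >= threshold, 1.0, 0.0)

def linear(x):
    # Activation proportional to the input
    return x

def sigmoid(x):
    # Squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any input into the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # Passes positive inputs, zeroes out negative ones
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but keeps a small slope for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # roughly [0.12, 0.5, 0.88], matching the slide's ~0 / 0.5 / ~1 approximation
print(tanh(x))     # roughly [-0.96, 0.0, 0.96]
print(relu(x))     # [0., 0., 2.]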
Perceptron Learning Rules
Perceptron (diagram): inputs x1, x2, x3 with weights w1, w2, w3; s = w1·x1 + w2·x2 + w3·x3 + b; output y = f(s).
Weight and bias adjustment:
if y ≠ t:
  wi(new) = wi(old) + t·xi
  b(new) = b(old) + t
else:
  wi(new) = wi(old)
  b(new) = b(old)
https://www.youtube.com/watch?v=gUCFaWULv0s&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3W13RJbDY&index=11
Training procedure (flowchart):
1. Start: initialize the weights and bias; set the learning rate (0 to 1).
2. For each training pair s : t, compute the output y.
3. If y ≠ t, update wi(new) = wi(old) + t·xi and b(new) = b(old) + t; otherwise keep wi(new) = wi(old) and b(new) = b(old).
4. If any weight changed during the pass, repeat from step 2; if not, stop.
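A minimal sketch of this training loop in plain Python/NumPy; the AND-gate data, bipolar targets, and implicit learning rate of 1 are assumptions for illustration, not from the slides:

import numpy as np

def step(s):
    # Bipolar step activation: +1 if s > 0, -1 otherwise
    return 1 if s > 0 else -1

# Assumed toy dataset: AND gate with bipolar inputs and targets
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([1, -1, -1, -1])

w = np.zeros(2)   # weights
b = 0.0           # bias

changed = True
while changed:                      # repeat until a full pass makes no change
    changed = False
    for x, t in zip(X, T):
        s = np.dot(w, x) + b        # s = sum of weighted inputs + bias
        y = step(s)
        if y != t:                  # update only when the prediction is wrong
            w = w + t * x           # wi(new) = wi(old) + t*xi
            b = b + t               # b(new)  = b(old) + t
            changed = True

print("weights:", w, "bias:", b)    # converges to weights [1. 1.] and bias -1.0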
Perceptron Learning Rules (Multiple Output Class)
The same weight and bias adjustment is applied for each output node:
if y ≠ t:
  wi(new) = wi(old) + t·xi
  b(new) = b(old) + t
else:
  wi(new) = wi(old)
  b(new) = b(old)
https://www.youtube.com/watch?v=gUCFaWULv0s&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3W13RJbDY&index=11
Some Basics
Dot Product:
A·B = a1·b1 + a2·b2 + a3·b3
Vector A = (a1, a2, a3)
Vector B = (b1, b2, b3)
Examples:
• a = (5), b = (3): a·b = 5·3 = 15
• a = (1, 2), b = (1, 1): a·b = a1·b1 + a2·b2 = 1 + 2 = 3
• a = (1, 2, 3), b = (1, 1, 1): a·b = a1·b1 + a2·b2 + a3·b3 = 1 + 2 + 3 = 6
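The same dot-product examples can be checked with NumPy (a small illustrative sketch):

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 1, 1])

# Dot product: a1*b1 + a2*b2 + a3*b3
print(np.dot(a, b))            # 6
print(np.dot([1, 2], [1, 1]))  # 3
print(np.dot([5], [3]))        # 15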
Matrix Multiplication:
- Rows and columns: the number of columns of the first matrix must equal the number of rows of the second matrix.
- Example of dimensions: A [1,1], B [1,2].
- The first number (rows) of the first matrix and the last number (columns) of the second matrix give the dimensions of the resultant matrix: R [1,2].
- Multiply the first row of the first matrix with the first column of the second matrix, and so on for each row-column pair.
(The slide works through small numerical examples of 2×2 matrix products.)
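A brief NumPy sketch of these rules; the matrices below are made up for illustration, not the slide's worked examples:

import numpy as np

A = np.array([[1, 1],
              [2, 4]])     # 2x2
B = np.array([[1, -1],
              [-1, 1]])    # 2x2

# Columns of A (2) match rows of B (2), so the product is defined;
# the result takes A's row count and B's column count: 2x2.
R = A @ B
print(R)
# Row 1 of A times column 1 of B: 1*1 + 1*(-1) = 0, and so on:
# [[ 0  0]
#  [-2  2]]

# A 1x3 matrix times a 3x2 matrix gives a 1x2 result
print(np.array([[1, 2, 3]]) @ np.array([[1, 0], [0, 1], [1, 1]]))  # [[4 5]]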
Single Layer
Inputs: x1 = 2, x2 = 1, x3 = 3; weights (1, -1, 1); bias -5.
2(1) = 2
1(-1) = -1
3(1) = 3
---------------------
4 (sum of weighted inputs)
+ (-5) (bias)
---------------------
-1 → activation → output 0
Single Layer (Multiple Nodes)
Inputs: x1 = 2, x2 = 1, x3 = 3.
Example for node a: 2(1) + 1(-1) + 3(1) - 5 = 2 - 1 + 3 - 5 = -1
Node a: weights (1, -1, 1), bias -5 → sum -1 → output 0
Node b: weights (1, 1, 0), bias 0 → sum 3 → output 3
Node c: weights (0, 1, 1), bias 1 → sum 5 → output 5
Node d: weights (1, 0, 1), bias -1 → sum 3 → output 3
Hidden Layer
Inputs to the hidden layer are the outputs of the previous layer: a = 0, b = 3, c = 5, d = 3.
Example for hidden node 1: 0(1) + 3(1) + 5(-1) + 3(0) + 0 = 3 - 5 = -2 → activation → output 0
Hidden node 1: weights (1, 1, -1, 0), bias 0 → sum -2 → output 0
Hidden node 2: weights (0, 0, 1, -1), bias 1 → sum 3 → output 3
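A minimal NumPy sketch of this layer-by-layer computation, spot-checking two of the hand-worked values above; ReLU is assumed as the activation because the slides map the negative sums -1 and -2 to 0:

import numpy as np

def dense(x, W, b, activation):
    # One layer: weighted sum of inputs plus bias, then the activation function
    return activation(W @ x + b)

relu = lambda s: np.maximum(s, 0.0)

# Node "a" of the first layer: inputs (2, 1, 3), weights (1, -1, 1), bias -5
x = np.array([2.0, 1.0, 3.0])
print(dense(x, np.array([[1.0, -1.0, 1.0]]), np.array([-5.0]), relu))        # [0.]  (sum = -1)

# Hidden node 1: inputs are the previous layer's outputs (0, 3, 5, 3),
# weights (1, 1, -1, 0), bias 0
h_in = np.array([0.0, 3.0, 5.0, 3.0])
print(dense(h_in, np.array([[1.0, 1.0, -1.0, 0.0]]), np.array([0.0]), relu))  # [0.]  (sum = -2)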
Activation Function
Applying different activation functions to example pre-activation values:
• Sigmoid: 2 → ≈1, 0 → 0.5, -2 → ≈0 (integer approximation: ≥ 2 → 1; -1, 0, 1 → ≈ 0.5; ≤ -2 → 0)
• ReLU: 1 → 1, -1 → 0, -3 → 0
• Tanh: 2 → ≈1, 0 → 0, -2 → ≈-1 (integer approximation: ≥ 2 → 1; ≤ -2 → -1)
https://www.datamation.com/big-data/neural-network-in-excel/
https://www.javatpoint.com/pytorch-backpropagation-process-in-deep-neural-network
Perceptron learning rules
https://www.youtube.com/watch?v=KKSCmPUyczU&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3W13RJbDY&index=13
Back propagation
• https://hmkcode.com/ai/backpropagation-step-by-step/
• https://medium.com/@karna.sujan52/back-propagation-algorithm-numerical-solved-f60c6986b643
Why Bias is required?
https://www.youtube.com/watch?v=SJ-hWwBF3zU
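As a tiny illustrative sketch (my own example, not from the linked video): without a bias, a neuron's decision boundary w·x = 0 must pass through the origin, so it cannot fire only when the input exceeds some chosen threshold; the bias shifts that boundary.

import numpy as np

step = lambda s: 1 if s >= 0 else 0

x = np.array([0.3])          # a small positive input
w = np.array([1.0])

# Without a bias, the neuron fires for every non-negative input...
print(step(np.dot(w, x)))            # 1

# ...with a bias of -0.5 it fires only when the input reaches 0.5
b = -0.5
print(step(np.dot(w, x) + b))        # 0  (0.3 < 0.5)
print(step(np.dot(w, [0.7]) + b))    # 1  (0.7 >= 0.5)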
Model Building
model.compile()
• Function in Keras is used to configure the model for training.
• It specifies the loss function, optimizer, and metrics
model.compile(optimizer=' ', loss=' ', metrics=[' '])
Optimizer:
• Responsible for updating the weights of the network to minimize the loss function.
• Commonly used optimizers: Adaptive Moment Estimation (adam), Stochastic Gradient Descent (sgd), Root Mean Squared Propagation (rmsprop), Adaptive Gradient (adagrad).
Loss:
• Measures how well the model is performing.
• Commonly used loss functions: categorical_crossentropy (multiclass classification with one-hot encoded targets), binary_crossentropy (binary classification), mean_squared_error (regression problems).
Metrics:
• Used to monitor the performance of the model.
• Commonly used metrics: accuracy (percentage of correct predictions), Mean Absolute Error (mae) and Mean Squared Error (mse) for regression problems.
Optimizers
from tensorflow.keras.optimizers import RMSprop
model.compile(optimizer=RMSprop(learning_rate=0.001), loss='mean_squared_error', metrics=['mae'])
Loss
• Categorical Crossentropy for multi-class classification:
  model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
• Binary Crossentropy for binary classification:
  model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
• Mean Squared Error (MSE) for regression:
  model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
Metrics
• 'accuracy': measures the proportion of correctly classified samples.
  Example: model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
• 'binary_accuracy': same as 'accuracy' but specifically for binary classification.
• 'mae' (Mean Absolute Error): measures the average absolute error for regression.
  Example: model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
• 'mse' (Mean Squared Error): measures the average squared error for regression.
• 'mape' (Mean Absolute Percentage Error): measures the average absolute percentage error for regression.
• Precision: measures the proportion of true positives among all positive predictions.
• Recall: measures the proportion of true positives among all actual positives.
  Example:
  from tensorflow.keras.metrics import Precision, Recall
  model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[Precision(), Recall()])
• AUC: measures the model's ability to distinguish between classes; useful in binary classification.
  Example: from tensorflow.keras.metrics import AUC
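Putting the pieces together, a minimal end-to-end sketch of model building in Keras; the layer sizes, input dimension, and random training data below are assumptions for illustration, not from the slides:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed toy data: 100 samples, 4 features, 3 one-hot encoded classes
X = np.random.rand(100, 4)
y = keras.utils.to_categorical(np.random.randint(0, 3, size=100), num_classes=3)

# A small fully connected network (layer sizes are illustrative)
model = keras.Sequential([
    layers.Dense(8, activation='relu', input_shape=(4,)),
    layers.Dense(3, activation='softmax'),
])

# Configure training: optimizer, loss function, and metrics as described above
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.evaluate(X, y, verbose=0))   # [loss, accuracy]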
REFERENCES
• https://medium.com/@sathishv700/history-of-deep-learning-d734c5fd4b12
• https://www.v7labs.com/blog/deep-learning-guide
• https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7
• https://www.youtube.com/watch?v=vVWRsZMi8xs&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3W13RJbDY&index=2
• https://www.shiksha.com/online-courses/articles/activation-functions-with-real-life-analogy-and-python-code/
• https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deep-learning-activation-functions-when-to-use-them/
• https://developers.google.com/machine-learning/crash-course/reducing-loss/learning-rate