
DEEP LEARNING
Do we know this..?

• Machine Learning: Supervised, Unsupervised, Reinforcement Learning
• Data Preprocessing: Variables, Features, Labels
• Training & Testing Data, Generalization, Fine Tuning
• Gradient Descent
• Regression, Classification, Clustering
• Overfitting, Underfitting
• Bias and Variance
Origins and Early Developments (1870s-1960s)

1873
• Ludwig Wittgenstein's ideas helped set the stage for making computers better at understanding what we say and write.

1943
• Warren McCulloch and Walter Pitts wrote a paper introducing the McCulloch-Pitts neuron model.
• The model showed how simple units, like switches in the brain, could work together to solve complex problems.

1957
• Frank Rosenblatt created the perceptron.
• A neural network that learns from labelled data.

Despite these advancements, the field of artificial intelligence experienced a setback known as "AI Winter 1" due to challenges in computing power and algorithmic limitations.
Reemergence and Renewed Interest (Late 1980s-1990s)

1986
• Geoffrey Hinton introduced the backpropagation algorithm.
• This algorithm enabled more efficient training of neural networks, allowing them to learn from data more effectively.

1989
• Yann LeCun developed Convolutional Neural Networks, a groundbreaking advancement in the field of computer vision.

Both of these breakthroughs came during a time of renewed interest in artificial intelligence following an AI winter, a period of reduced funding and interest in the field.
Deep Learning Revolution (2000s-2010s)

• In 2006 Geoffrey Hinton made a significant contribution to the field of deep learning.
• He introduced deep belief networks, which are probabilistic generative models made up of multiple layers of stochastic, latent variables.
• This was a new approach to unsupervised learning, where machines could learn patterns and relationships in data without explicit guidance.
• Deep belief networks were applied to feature learning, dimensionality reduction, and anomaly detection, further advancing the capabilities of artificial intelligence.
Expanding Frontiers (2010s-Present)

2012
• Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton unveiled AlexNet (a CNN), which improved image classification accuracy.

2014
• Ian Goodfellow introduced Generative Adversarial Networks (GANs).
• Two neural networks, the generator and the discriminator, compete and collaborate.

2015
• Andrej Karpathy applied recurrent neural networks (RNNs) to the realm of natural language processing.
• With RNNs, sequences of words could be understood and processed in context.

2017
• Geoffrey Hinton introduced Capsule Networks, a visionary alternative to convolutional neural networks (CNNs).

Each of these milestones marked a chapter in the ongoing saga of artificial intelligence, driving the field forward with unprecedented leaps in capability and understanding.
Applications

• Image Recognition (facial recognition, object detection)
• Natural Language Processing (chatbots, language translation)
• Entertainment (recommendation systems, deepfake technology)
• Autonomous Vehicles (self-driving cars)
• Finance (fraud detection, stock market prediction)
Where does it stand?
Building Blocks of Neural Network
It is possible to mimic dendrites, cell bodies and axons using simplified mathematical models.

How does the biological neuron work?
• On receiving enough signals at the dendrites, the neuron passes a signal along its axon.
• The outgoing signal can then be used as an input for other neurons, repeating the process.
• Some signals are more important than others and can trigger some neurons to fire more easily.
• Connections can become stronger or weaker, and new connections can appear.

How does the artificial counterpart work?
• A function receives a list of weighted input signals and outputs a signal if the sum of these weighted inputs reaches a certain bias (threshold).
Artificial Neural Network (ANN)
• ANN is an information-processing model inspired by the biological nervous system.
• Like humans, an ANN learns from examples: the more the data, the better the learning.
• Unlike the brain, it can solve only one problem at a time (face recognition, language understanding, prediction, classification, forecasting).
• It is composed of a large number of highly interconnected processing units (neurons) working together to solve a specific type of problem. Neurons are connected through connection links.
• Each neuron has its own internal state, called the activation level of the neuron.
Neural Network and Perceptron

Input = {x1, x2, x3, x4}
Net input I = x1·w1 + x2·w2 + x3·w3 + x4·w4 + b (the weighted sum of the inputs plus the bias)
Basic Components
1. Connections – links between the neurons that carry the weights.
   i. Single-layer feed-forward network
   ii. Multilayer feed-forward network
   iii. Single node with its own feedback
   iv. Single-layer recurrent network
2. Learning – the process by which a neural network adapts itself to a stimulus by adjusting its parameters to provide the most optimal output.
   i. Parameter learning – update the weights
   ii. Structure learning – update the layers and the number of neurons
3. Activation Function
   i. Binary step function
   ii. Linear function
   iii. Sigmoid activation function
   iv. ReLU activation function (Rectified Linear Unit), Leaky ReLU, Parameterised ReLU
Connections

(Diagrams: feed-forward connections from input to output, feedback connections, and a single node with its own feedback.)
ACTIVATION FUNCTION

• A neural network activation function is a function that introduces nonlinearity into the model.
• Activation functions help determine the output of a neural network by applying mathematical transformations to the input signals received from other layers in the network. They allow for complex non-linear relationships between input and output data points.
• In a neural network, not all the information is important. The activation function helps the network use the important information and suppress the irrelevant data points.

https://vitalflux.com/how-know-data-linear-non-linear/
Why use activation functions?
• Activation functions add non-linearities to the network.
• In the absence of activation functions, the network would only be capable of performing linear transformations, which cannot adequately represent the complexity and nuances of real-world data.
• Normalizing each neuron's output is a key benefit of utilizing activation functions.
• Depending on the inputs it gets and the weights associated with those inputs, a neuron's output can range from extremely high to extremely low.
• Activation functions ensure that each neuron's output falls inside a defined range, which makes it simpler to optimise the network during training.
Why use activation functions?

Data processing
• A neural network processes data layer by layer.
• Each layer performs a weighted sum of its inputs and adds a bias.

Limitation of linear layers
• Linear functions transform the data linearly, like logistic regression.
• If all layers used linear activation functions (y = mx + b), stacking these layers would just create another linear function (see the sketch after this list).
• No matter how many layers you add, the output would still be a straight line.

Introducing non-linearity
• Activation functions transform the data into a non-linear form (e.g., the sigmoid curves the output).

Learning complex patterns
• Allows the network to move beyond simply separating data linearly.
• It can learn intricate relationships between features and complex patterns in the data that wouldn't be possible with just linear functions.

Foundation for complex tasks
• Image recognition, natural language processing.
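A minimal NumPy sketch of the point above, with made-up weight values: two stacked purely linear layers collapse into one equivalent linear layer, while inserting a sigmoid between them does not.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                  # 5 samples, 3 features (illustrative data)

W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

# Two linear layers stacked ...
two_linear = (x @ W1 + b1) @ W2 + b2
# ... collapse into a single equivalent linear layer:
W_eq, b_eq = W1 @ W2, b1 @ W2 + b2
one_linear = x @ W_eq + b_eq
print(np.allclose(two_linear, one_linear))   # True: still a linear model

# With a sigmoid in between, the mapping is no longer linear.
sigmoid = lambda z: 1 / (1 + np.exp(-z))
non_linear = sigmoid(x @ W1 + b1) @ W2 + b2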
Types of Activation Functions

• Binary step function – decides whether or not the neuron should be activated.
• Linear function – activation is proportional to the input.
• Sigmoid activation function – used to classify information; maps any input onto a value between 0 and 1; suffers from the vanishing gradient problem.
• Tanh activation function – maps inputs onto values between -1 and 1; multi-dimensional classification.
• Softmax activation function – a combination of multiple sigmoids; can be used for multiclass classification problems.
• ReLU activation function – most effective in hidden layers.
• Leaky ReLU activation function – adds a small slope in the negative direction to prevent the ReLU problem of gradients disappearing (dead neurons).
Types of Activation Functions – Python examples

import numpy as np
x = np.array([-1, 0, 1])

Sigmoid – used to classify information; maps any input onto a value between 0 and 1; suffers from vanishing gradient problems.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
print(sigmoid(x))        # [0.26894142 0.5        0.73105858]

tanh – maps inputs onto values between -1 and 1; multi-dimensional classification.
def tanh(x):
    return np.tanh(x)
print(tanh(x))           # [-0.76159416  0.          0.76159416]

ReLU (Rectified Linear Unit) – most effective in hidden layers; helps avoid the vanishing gradient problem.
def relu(x):
    return np.maximum(0, x)
print(relu(x))           # [0 0 1]

Leaky ReLU – adds a small slope in the negative direction to prevent the ReLU problem of gradients disappearing.
def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)
print(leaky_relu(x))     # [-0.01  0.    1.  ]

Softmax – a combination of multiple sigmoids; can be used for multiclass classification.
def softmax(x):
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x)
print(softmax(x))        # [0.09003057 0.24472847 0.66524096]
Choosing the Right Activation Function
• Sigmoid functions and their combinations generally work better in the case of classifiers.
• Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem.
• The ReLU function is a general-purpose activation function and is used in most cases these days.
• If we encounter a case of dead neurons in our network, the leaky ReLU function is the best choice.
• Always keep in mind that the ReLU function should only be used in the hidden layers.
• As a rule of thumb, you can begin with the ReLU function and then move over to other activation functions if ReLU doesn't provide optimum results.

What is Learning Rate?
• The learning rate is a hyperparameter that determines the size of the steps taken during the optimization process, often in the range between 0.0 and 1.0.
• It helps to find the optimal set of weights and biases that minimize the error of the model on your training data.
• It comes into play during the optimization algorithm, such as gradient descent.
• It controls how much you want to update your weights with respect to the loss gradient (see the sketch below).
• A high learning rate means you're taking large steps, which might cause you to overshoot the minimum.
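A minimal gradient-descent sketch (NumPy, with a made-up one-parameter loss) showing how the learning rate scales each weight update, w := w - lr * dL/dw:

# Toy loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3)
def grad(w):
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.1                       # hyperparameter: step size
for step in range(50):
    w = w - learning_rate * grad(w)       # update rule: w := w - lr * dL/dw
print(w)                                  # approaches the minimum at w = 3
# With a much larger learning rate (here, anything above 1.0),
# the updates overshoot the minimum and diverge.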
Perceptron Learning Rules (Single Output Class)

- Initialize the weights, bias and learning rate (α).
- Repeat for each training pair of vectors (s : t) until the stopping condition is met:
  - Calculate the output of the network:
      ym = b + Σi xi·wi
      y = f(ym) = 1 if ym > θ; 0 if -θ <= ym <= θ; -1 if ym < -θ
  - Weight and bias adjustment:
      if y ≠ t:
        wi(new) = wi(old) + α·t·xi
        b(new)  = b(old) + α·t
      else:
        wi(new) = wi(old)
        b(new)  = b(old)

(Diagram: inputs x1, x2, x3 with weights w1, w2, w3 and bias b feed a summing unit s, whose output is y = f(s).)

https://www.youtube.com/watch?v=gUCFaWULv0s&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3W13RJbDY&index=11
Flowchart of the perceptron learning algorithm:

1. Start.
2. Initialize the weights and bias; set the learning rate α (0 to 1).
3. For each training pair s : t:
   a. Activate the input units: xi = si.
   b. Calculate the net input ym.
   c. Apply the activation function: y = f(ym).
   d. If y ≠ t: wi(new) = wi(old) + α·t·xi and b(new) = b(old) + α·t.
      Otherwise: wi(new) = wi(old) and b(new) = b(old).
4. If any weight changed during the pass, repeat from step 3; otherwise stop.
Perceptron Learning Rules (Multiple Output Class)

- Initialize the weights, biases and learning rate (α).
- Repeat for each training pair of vectors (s : t) until the stopping condition is met:
  - Calculate the output of each output unit j:
      ymj = bj + Σi xi·wij
      yj = f(ymj) = 1 if ymj > θ; 0 if -θ <= ymj <= θ; -1 if ymj < -θ
  - Weight and bias adjustment for each output unit j:
      if yj ≠ tj:
        wij(new) = wij(old) + α·tj·xi
        bj(new)  = bj(old) + α·tj
      else:
        wij(new) = wij(old)
        bj(new)  = bj(old)

A NumPy sketch of this training loop follows below.

https://www.youtube.com/watch?v=gUCFaWULv0s&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3W13RJbDY&index=11
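A minimal NumPy sketch of the perceptron rule above. The AND-gate data, threshold θ and learning rate α are illustrative choices, not taken from the slides; the same loop handles a single output unit and extends directly to multiple output units by repeating it per unit.

import numpy as np

# Training data for an AND gate (bipolar inputs and targets; illustrative choice)
X = np.array([[ 1,  1], [ 1, -1], [-1,  1], [-1, -1]])
T = np.array([  1,      -1,       -1,       -1])

alpha, theta = 1.0, 0.2              # learning rate and threshold (assumed values)
w = np.zeros(2)
b = 0.0

def activate(ym):
    # f(ym) = 1 if ym > theta, 0 if -theta <= ym <= theta, -1 if ym < -theta
    return 1 if ym > theta else (-1 if ym < -theta else 0)

changed = True
while changed:                       # repeat until no weight changes in a full pass
    changed = False
    for x, t in zip(X, T):
        ym = b + np.dot(x, w)        # net input: ym = b + sum_i(x_i * w_i)
        y = activate(ym)
        if y != t:                   # update only when the output is wrong
            w = w + alpha * t * x
            b = b + alpha * t
            changed = True

print(w, b)                          # learned weights and bias for the AND gate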
Some Basics

Dot Product:
Vector A = (a1, a2, a3), Vector B = (b1, b2, b3)
A · B = a1·b1 + a2·b2 + a3·b3

Example in two dimensions: A = (1, 1), B = (1, 2)
A · B = a1·b1 + a2·b2 = 1·1 + 1·2 = 1 + 2 = 3

Example in three dimensions: A = (1, 1, 1), B = (1, 2, 3)
A · B = a1·b1 + a2·b2 + a3·b3 = 1 + 2 + 3 = 6
Matrix Multiplication:
- Matrices are described by their rows and columns.
- The number of columns of the first matrix must equal the number of rows of the second matrix.
- e.g. A is [1, 1] and B is [1, 2]: the multiplication is valid.
- The number of rows of the first matrix and the number of columns of the second matrix give the dimensions of the resultant matrix: R is [1, 2].
- To multiply, take each row of the first matrix with each column of the second matrix (row-by-column dot products).

(The slide works through small numerical examples; a NumPy version of both operations is sketched below.)
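A short NumPy illustration of both operations. The vectors reuse the dot-product example above; the matrices are made-up values chosen only to show the shape rule.

import numpy as np

# Dot product: (1, 1, 1) . (1, 2, 3) = 1 + 2 + 3 = 6
a = np.array([1, 1, 1])
b = np.array([1, 2, 3])
print(np.dot(a, b))          # 6

# Matrix multiplication: columns of A (3) must equal rows of B (3).
# A is 2x3 and B is 3x2, so the result R is 2x2.
A = np.array([[1, 0, 2],
              [3, 1, 1]])
B = np.array([[1, 2],
              [0, 1],
              [4, 0]])
R = A @ B                    # each entry is a row-by-column dot product
print(R.shape)               # (2, 2)
print(R)                     # [[9 2], [7 7]]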
Single Layer

Example: input (2, 1, 3), weights (1, -1, 1), bias -5.

2(1)  =  2
1(-1) = -1
3(1)  =  3
---------------------
  4  (sum of weighted inputs)
+ (-5)  (bias)
---------------------
 -1  (net input)

weights: 1, -1, 1   bias: -5   net input: -1   output after the activation function: 0

A NumPy version of this computation follows below.
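A minimal NumPy check of the single-neuron computation above (treating the output stage as ReLU is an assumption, based on how the slide maps -1 to 0):

import numpy as np

x = np.array([2, 1, 3])      # inputs
w = np.array([1, -1, 1])     # weights
b = -5                       # bias

net = np.dot(x, w) + b       # 2 - 1 + 3 - 5 = -1
out = np.maximum(0, net)     # ReLU activation: negative net input -> 0
print(net, out)              # -1 0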
Single Layer (Multiple Nodes)

Input x = (x1, x2, x3) = (2, 1, 3), feeding four neurons a, b, c, d.

Example for neuron a:
a = 2(1) + 1(-1) + 3(1) - 5 = 2 - 1 + 3 - 5 = -1

neuron   weights      bias   net input   output after activation
a        1, -1,  1     -5       -1             0
b        1,  1,  0      0        3             3
c        0,  1,  1      1        5             5
d        1,  0,  1     -1        3             3
Hidden Layer

The outputs of the previous layer, (0, 3, 5, 3), become the inputs to the hidden layer.

Example for the first hidden neuron:
net = 0(1) + 3(1) + 5(-1) + 3(0) + 0 = 3 - 5 = -2  ->  output 0

Hidden layer (inputs 0, 3, 5, 3):
neuron   weights          bias   net input   output after activation
h1       1,  1, -1,  0      0       -2             0
h2       0,  0,  1, -1      1        3             3

A matrix-form sketch of this layer-by-layer computation follows below.
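The same layer-by-layer idea written with matrices. The weights here are made-up illustrative values (not the slide's exact numbers): each layer is a weight matrix times the previous layer's output, plus a bias, passed through the activation.

import numpy as np

relu = lambda z: np.maximum(0, z)        # activation between layers

x = np.array([2.0, 1.0, 3.0])            # input vector

# Layer 1: four neurons; each row of W1 holds one neuron's weights
W1 = np.array([[ 1.0, -1.0,  1.0],
               [ 0.5,  0.5,  0.0],
               [ 0.0,  1.0,  1.0],
               [-1.0,  0.0,  1.0]])
b1 = np.array([-5.0, 0.0, 1.0, -1.0])

# Layer 2: two neurons over the four layer-1 outputs
W2 = np.array([[1.0, 1.0, -1.0,  0.0],
               [0.0, 0.0,  1.0, -1.0]])
b2 = np.array([0.0, 1.0])

a1 = relu(W1 @ x + b1)                   # layer-1 outputs
a2 = relu(W2 @ a1 + b2)                  # layer-2 outputs
print(a1, a2)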
Activation Function (worked examples)

Sigmoid activation function: inputs (2, 0, -2) map to approximately (1, 0.5, 0).
(Rule of thumb used on the slide: input >= 2 -> 1; input -1, 0 or 1 -> ~0.5; input <= -2 -> 0.)

ReLU activation function: inputs (1, -1, -3) map to (1, 0, 0).

Tanh activation function: inputs (2, 0, -2) map to approximately (1, 0, -1).
(Rule of thumb: input >= 2 -> 1; input <= -2 -> -1.)

https://www.datamation.com/big-data/neural-network-in-excel/
https://www.javatpoint.com/pytorch-backpropagation-process-in-deep-neural-network
Perceptron learning rules

https://www.youtube.com/watch?v=KKSCmPUyczU&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3W13RJbDY&index=13
Back propagation

• https://hmkcode.com/ai/backpropagation-step-by-step/
• https://medium.com/@karna.sujan52/back-propagation-algorithm-numerical-solved-f60c6986b643

A minimal single-neuron backpropagation sketch follows below.
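A minimal sketch of the idea behind the linked walkthroughs (not their exact numbers): one neuron with a sigmoid output and squared-error loss, trained by computing the gradient via the chain rule and taking a gradient-descent step. The data, initial weights and learning rate are made-up values.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 1.0])      # one training example (made-up values)
t = 1.0                       # target output
w = np.array([0.1, -0.2])     # initial weights (made-up values)
b = 0.0
lr = 0.5                      # learning rate

for step in range(100):
    # Forward pass
    z = np.dot(w, x) + b
    y = sigmoid(z)
    loss = 0.5 * (y - t) ** 2

    # Backward pass (chain rule): dL/dw = (y - t) * y * (1 - y) * x
    dL_dy = y - t
    dy_dz = y * (1 - y)       # derivative of the sigmoid
    dL_dw = dL_dy * dy_dz * x
    dL_db = dL_dy * dy_dz

    # Gradient-descent update
    w -= lr * dL_dw
    b -= lr * dL_db

print(loss)                   # loss shrinks as y approaches the target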
Why is Bias required?

https://www.youtube.com/watch?v=SJ-hWwBF3zU
Model Building
model.compile()
• This function in Keras is used to configure the model for training.
• It specifies the loss function, optimizer, and metrics.

model.compile(optimizer='...', loss='...', metrics=['...'])

optimizer='...'
• Responsible for updating the weights of the network to minimize the loss function.
• Commonly used optimizers:
  • Adaptive Moment Estimation (adam)
  • Stochastic Gradient Descent (sgd)
  • Root Mean Squared Propagation (rmsprop)
  • Adaptive Gradient (adagrad)

loss='...'
• Measures how well the model is performing.
• Commonly used loss functions:
  • categorical_crossentropy: multiclass classification with one-hot encoded targets.
  • binary_crossentropy: binary classification.
  • mean_squared_error: regression problems.

metrics=['...']
• Used to monitor the performance of the model.
• Commonly used metrics:
  • accuracy: percentage of correct predictions.
  • Mean Absolute Error (mae): regression problems.
  • Mean Squared Error (mse): regression problems.
Optimizers

Adam optimizer (most commonly used):
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

SGD with a custom learning rate:
from tensorflow.keras.optimizers import SGD
model.compile(optimizer=SGD(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

RMSProp:
from tensorflow.keras.optimizers import RMSprop
model.compile(optimizer=RMSprop(learning_rate=0.001), loss='mean_squared_error', metrics=['mae'])
Loss

Categorical crossentropy for multi-class classification:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Binary crossentropy for binary classification:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Mean Squared Error (MSE) for regression:
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
Metrics

'accuracy' – measures the proportion of correctly classified samples.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

'binary_accuracy' – same as 'accuracy' but specifically for binary classification.

'mae' (Mean Absolute Error) – measures the average absolute error for regression.
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])

'mse' (Mean Squared Error) – measures the average squared error for regression.

'mape' (Mean Absolute Percentage Error) – measures the average absolute percentage error for regression.

Precision – measures the proportion of true positives among all positive predictions.
Recall – measures the proportion of true positives among all actual positives.
from tensorflow.keras.metrics import Precision, Recall
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[Precision(), Recall()])

AUC – measures the model's ability to distinguish between classes; useful in binary classification.
from tensorflow.keras.metrics import AUC

An end-to-end compile-and-fit sketch follows below.
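Putting the pieces together: a minimal sketch of building, compiling and training a small Keras model on made-up data. The layer sizes, activations and hyperparameters are illustrative choices, not taken from the slides.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Made-up data: 100 samples, 4 features, 3 classes (one-hot targets)
X = np.random.rand(100, 4)
y = np.eye(3)[np.random.randint(0, 3, size=100)]

model = Sequential([
    Dense(8, activation='relu', input_shape=(4,)),   # hidden layer: ReLU
    Dense(3, activation='softmax'),                  # output layer: softmax for 3 classes
])

# Configure training: optimizer, loss and metrics
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train for a few epochs on the toy data
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.evaluate(X, y, verbose=0))               # [loss, accuracy]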
REFERENCES

• https://medium.com/@sathishv700/history-of-deep-learning-d734c5fd4b12
• https://www.v7labs.com/blog/deep-learning-guide
• https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7
• https://www.youtube.com/watch?v=vVWRsZMi8xs&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3W13RJbDY&index=2
• https://www.shiksha.com/online-courses/articles/activation-functions-with-real-life-analogy-and-python-code/
• https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deep-learning-activation-functions-when-to-use-them/
• https://developers.google.com/machine-learning/crash-course/reducing-loss/learning-rate
