
Neural Networks and Deep Learning
Models of a Neuron

Deterministic model:
1) Synapses / connecting links
-> carry the input signal
-> each is characterised by a weight
2) Adder
-> linear combiner of the weighted input signals
3) Activation function / squashing function
-> limits the permissible amplitude range of the output to a finite value.

-> The net input of the activation function can be lowered or increased through a bias.
-> The bias modifies the relationship between the induced local field and the activation function output.
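A minimal sketch of this deterministic model in Python (the function name neuron_output and the example numbers are illustrative, not from the notes):

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """Deterministic neuron: linear combiner (adder) + bias -> activation."""
    v = np.dot(w, x) + b          # induced local field
    return activation(v)          # squashing function limits the output amplitude

# Example with a sigmoid squashing function
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
y = neuron_output(x=np.array([0.5, -1.0, 2.0]),
                  w=np.array([0.4, 0.3, -0.6]),
                  b=0.1,
                  activation=sigmoid)
print(y)
```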

Types of activation functions

1) Threshold function (Heaviside function)
2) Piecewise linear function (unity amplification inside the linear region)
3) Sigmoid function
   phi(v) = 1 / (1 + exp(-a*v)), where a is the slope parameter.
   As a -> infinity, the sigmoid approaches the threshold function.
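A hedged NumPy sketch of these three functions (the clipping range of the piecewise-linear function is an assumption for illustration):

```python
import numpy as np

def threshold(v):
    """Heaviside / threshold function."""
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):
    """Unity amplification in the linear region, saturating at 0 and 1."""
    return np.clip(v + 0.5, 0.0, 1.0)

def sigmoid(v, a=1.0):
    """Logistic sigmoid; a is the slope parameter (a -> inf approaches threshold)."""
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.linspace(-3, 3, 7)
print(threshold(v))
print(piecewise_linear(v))
print(sigmoid(v, a=2.0))
```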
Memory

-> Memory and learning are intricately connected.
-> Memory is stored as spatial activity patterns inside the neural network; the activity pattern contains information about the stimulus.
-> Thus, memory transforms an activity pattern in the input space into another activity pattern in the output space (through matrix multiplication: y = Mx).
-> The weight matrix M defines the overall connectivity between the input and output layers of the associative memory.

Correlation Matrix Memory

-> The memory matrix is the sum of the outer products of the stored pattern pairs:
   M = sum over k = 1..q of  y_k x_k^T
-> Consider x_j to be a normalized key vector, ||x_j|| = 1. The recalled response is
   y = M x_j
     = sum over k of  y_k (x_k^T x_j)
     = y_j (x_j^T x_j) + sum over k != j of  y_k (x_k^T x_j)
     = y_j + v_j
   i.e. the desired pattern y_j plus a crosstalk (noise) term v_j contributed by the other stored patterns.
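A small NumPy sketch of this recall behaviour (the pattern counts and sizes are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

# q associated pattern pairs (x_k, y_k); key vectors are normalized, ||x_k|| = 1
q, n, m = 3, 8, 5
X = rng.normal(size=(q, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = rng.normal(size=(q, m))

# Memory matrix: sum of outer products of the pairs
M = sum(np.outer(Y[k], X[k]) for k in range(q))

# Recall with key x_j: the response is y_j plus crosstalk from the other patterns
j = 1
recalled = M @ X[j]
crosstalk = recalled - Y[j]
print(np.linalg.norm(crosstalk))   # zero only if the key vectors are mutually orthogonal
```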
Failure in learning

-> The training algorithm may not find the solution parameters.
-> The training algorithm may choose the wrong function, e.g. due to overfitting.

CNN

-> Captures spatial features from an image.
-> Spatial features help identify the object and its location more accurately.

RNN

-> Intermediate results are fed back to the layer to predict the outcome.
-> Information from the previous time-step is remembered by a memory function.
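A minimal NumPy sketch of the recurrent idea above (the weight names are illustrative): each step mixes the current input with the state remembered from the previous time-step.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: the previous hidden state acts as the memory."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 6
W_x = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                      # initial state
for x_t in rng.normal(size=(10, n_in)):     # a sequence of 10 time-steps
    h = rnn_step(x_t, h, W_x, W_h, b)       # h carries information forward
print(h.shape)
```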

Gradient-based learning

-> The optimization algorithm uses gradients to update model parameters during training.
-> Enables the model to learn complex representations of the data.
-> Gradient information (sensitivity information) is needed to determine the search directions.
-> Spatial and temporal partial derivatives are used to estimate the flow at every position.
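A hedged sketch of the basic parameter-update rule described above (the quadratic example function is an illustration, not from the notes):

```python
import numpy as np

def gradient_descent(grad_fn, theta0, lr=0.1, steps=100):
    """Update parameters against the gradient of the loss."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta -= lr * grad_fn(theta)   # search direction = negative gradient
    return theta

# Example: minimize f(theta) = ||theta - 3||^2, whose gradient is 2*(theta - 3)
theta_star = gradient_descent(lambda t: 2 * (t - 3.0), theta0=[0.0, 10.0])
print(theta_star)   # approaches [3, 3]
```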


Width of a Neural Network

-> No. of neurons/units in each layer of a NN.
-> More width -> more capacity to capture complex patterns.
-> More width is better for memorizing and fitting complex data.
-> But more width may lead to overfitting.

Depth of a Neural Network

-> No. of layers in a NN.
-> More depth -> better at hierarchical data.
-> More depth -> better at abstract features.
-> May face vanishing and exploding gradient problems.
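A hedged sketch of how width and depth appear as hyperparameters, here using the Keras API (the builder function, layer sizes, and defaults are illustrative):

```python
import tensorflow as tf

def build_mlp(input_dim, width=64, depth=3, n_outputs=1):
    """width = neurons per hidden layer, depth = number of hidden layers."""
    layers = [tf.keras.Input(shape=(input_dim,))]
    layers += [tf.keras.layers.Dense(width, activation="relu") for _ in range(depth)]
    layers += [tf.keras.layers.Dense(n_outputs)]
    return tf.keras.Sequential(layers)

model = build_mlp(input_dim=20, width=128, depth=4)
model.summary()   # more width/depth -> more parameters -> more capacity
```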

Activation Functions

-> Introduce non-linear properties into the neural network.

Restricted Boltzmann Machines

-> Used for generative modelling and unsupervised learning.
-> Stochastic learning processes.
-> Statistical in nature.
-> Can also be used in supervised learning.
-> The state of each individual neuron is taken into account.
-> They have fixed weights; however, the weights are to be set such that the consensus function is maximized (this is the objective).
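A hedged NumPy sketch of the stochastic character mentioned above: one Gibbs-sampling pass of a small RBM, where each unit's state is sampled from a probability (the weight values and layer sizes are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)           # visible biases
b_h = np.zeros(n_hidden)            # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    """Stochastic unit: fire with probability p."""
    return (rng.random(p.shape) < p).astype(float)

# One Gibbs step: visible -> hidden -> visible
v0 = sample(np.full(n_visible, 0.5))   # some binary visible pattern
p_h = sigmoid(v0 @ W + b_h)            # probability of each hidden unit being on
h0 = sample(p_h)                       # stochastic hidden states
p_v = sigmoid(h0 @ W.T + b_v)          # reconstruct visible probabilities
v1 = sample(p_v)
print(v0, v1)
```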
AutoEncoders

-> ANNs capable of learning dense representations of the input data.
-> These dense representations are called 'latent representations' or 'codings'.
-> They work unsupervised: they learn to copy their inputs to their outputs.
-> The codings typically have a lower dimensionality than the input.
-> They also act as feature detectors.
-> Can be used for unsupervised pretraining of neural networks.
-> Some autoencoders are generative models.
-> These are capable of generating random new data that looks very similar to the training data.
-> Noise can be added, or the size of the latent representation can be limited.
-> This forces the model to learn efficient ways of representing the data.
-> It always has two parts: (i) Encoder (recognition network), (ii) Decoder (generative network).
-> No. of neurons in the output layer = No. of neurons in the input layer.
-> The outputs are called reconstructions.
-> The cost function contains a reconstruction loss.
-> Because the internal representation of the autoencoder has lower dimensionality than the input data, the autoencoder is said to be undercomplete.
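A hedged Keras sketch of such an undercomplete autoencoder (the layer sizes, the 784-dimensional input, and X_train are illustrative assumptions):

```python
import tensorflow as tf

input_dim, coding_dim = 784, 32       # coding is lower-dimensional -> undercomplete

encoder = tf.keras.Sequential([       # recognition network
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(coding_dim, activation="relu"),
])
decoder = tf.keras.Sequential([       # generative network
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(input_dim, activation="sigmoid"),  # same size as the input
])
autoencoder = tf.keras.Sequential([encoder, decoder])

# Reconstruction loss: compare the outputs (reconstructions) with the inputs themselves
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_train, X_train, epochs=10)   # hypothetical data: input = target
```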
Convolutional layers and feature maps

-> Thus, a full layer of neurons using the same filter outputs a feature map.
-> A feature map highlights the areas in an image that activate the filter the most.
-> During training, the convolutional layer will automatically learn the most useful filters for its task, and the layers above it will learn to combine them into complex patterns.
-> Each convolutional layer has multiple filters and outputs one feature map for each filter.
-> It has one neuron per pixel in each feature map.
-> All neurons of a given feature map share the same parameters.
-> Thus, a convolutional layer applies multiple trainable filters to its inputs, making it capable of detecting multiple features anywhere in its inputs.

-> Valid padding: no zero padding. Each neuron's receptive field lies strictly within valid positions inside the input.
-> Same padding: inputs are padded with enough zeros on all sides to ensure that output feature maps end up with the same size as the inputs.
-> If stride > 1, then the output size will not be equal to the input size, even with same padding.
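A short Keras sketch of the padding behaviour described above (the 28x28 image and the filter count are illustrative):

```python
import tensorflow as tf

images = tf.random.normal([1, 28, 28, 3])   # one 28x28 RGB image

valid = tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding="valid")
same = tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding="same")
same_strided = tf.keras.layers.Conv2D(filters=8, kernel_size=3, strides=2, padding="same")

print(valid(images).shape)         # (1, 26, 26, 8): no zero padding shrinks the map
print(same(images).shape)          # (1, 28, 28, 8): zero padding keeps the input size
print(same_strided(images).shape)  # (1, 14, 14, 8): stride > 1 still shrinks the output
```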
(iv) Output gate: controls o(t), which decides what parts of the long-term state should be read and output at this time step, both to h(t) and y(t).

-> So an LSTM cell can learn to recognize an important input, and then:
(i) store that important input in the long-term state,
(ii) preserve it for as long as it is needed.

GRU cells

-> A simplified version of the LSTM cell.
-> Simplifications:
   · Both state vectors are merged into a single vector h(t).
   · A single gate controller z(t) controls both the forget gate and the input gate: when one opens, the other closes.
   · There is no output gate; the full state vector is output at every time step.
   · A new gate controller r(t) tells which part of the previous state will be shown to the main layer g(t).
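A hedged NumPy sketch of one GRU step following the points above (the weight names Wz, Uz, etc. and the sizes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wg, Ug):
    """One GRU step: a single gate z controls both 'forget' and 'input'."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev)        # update gate controller z(t)
    r = sigmoid(Wr @ x_t + Ur @ h_prev)        # r(t): which part of h_prev is shown
    g = np.tanh(Wg @ x_t + Ug @ (r * h_prev))  # main layer / candidate state g(t)
    return (1 - z) * g + z * h_prev            # when z opens for h_prev, it closes for g

rng = np.random.default_rng(0)
n_in, n_h = 4, 8
W = lambda shape: rng.normal(scale=0.1, size=shape)
Wz, Uz = W((n_h, n_in)), W((n_h, n_h))
Wr, Ur = W((n_h, n_in)), W((n_h, n_h))
Wg, Ug = W((n_h, n_in)), W((n_h, n_h))

h = np.zeros(n_h)
for x_t in rng.normal(size=(5, n_in)):
    h = gru_step(x_t, h, Wz, Uz, Wr, Ur, Wg, Ug)
print(h.shape)   # the full state vector is the output at every time step
```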

-> However good LSTM and GRU cells are, they still cannot tackle very long sequences. So input sequences are shortened using 1D convolutional layers.
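A hedged Keras sketch of this idea: a strided 1D convolution downsamples the sequence before the recurrent layer (the filter counts and layer sizes are illustrative):

```python
import tensorflow as tf

# Shorten a long input sequence with a strided 1D convolution before the RNN.
model = tf.keras.Sequential([
    tf.keras.Input(shape=[None, 1]),               # variable-length univariate sequence
    tf.keras.layers.Conv1D(filters=32, kernel_size=4, strides=2, padding="valid"),
    tf.keras.layers.GRU(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),
])
model.summary()   # stride 2 roughly halves the sequence length seen by the GRU
```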
