ML Section 15: Neural Networks
Stefan Harmeling
▸ linear discriminant function for two classes:
      y(x) = w^T x + w0
▸ with the augmented vectors x̃ = [1, x^T]^T and w̃ = [w0, w^T]^T this becomes
      y(x) = w̃^T x̃
▸ for several classes, one discriminant function per class:
      yk(x) = wk^T x + wk0
Logistic discriminants
▸ generalize the linear discriminant function for two classes by applying a
  monotonic non-linear function g, called the activation function:
      y(x) = g(w^T x + w0)
▸ for Gaussian class-conditional densities with shared covariance Σ,
      p(x|Ck) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp(−½ (x − µk)^T Σ^{-1} (x − µk))
  the posterior probability of class C1 is
      p(C1|x) = p(x|C1) p(C1) / (p(x|C1) p(C1) + p(x|C2) p(C2))
              = 1 / (1 + p(x|C2) p(C2) / (p(x|C1) p(C1)))
              = 1 / (1 + exp(−a)) = g(a)        “the logistic sigmoid”
  with a = w^T x + w0, where
      w  = Σ^{-1} (µ1 − µ2)
      w0 = −½ µ1^T Σ^{-1} µ1 + ½ µ2^T Σ^{-1} µ2 + ln( p(C1) / p(C2) )
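A minimal numeric sketch (not from the lecture) checking this identity: for two Gaussian classes with shared covariance, the posterior computed via Bayes’ rule agrees with the logistic sigmoid of w^T x + w0. All numbers are made up for illustration.

import numpy as np

def gauss(x, mu, Sigma):                       # Gaussian density p(x|C_k)
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])     # shared covariance
p1, p2 = 0.4, 0.6                              # priors p(C1), p(C2)

w = np.linalg.solve(Sigma, mu1 - mu2)
w0 = (-0.5 * mu1 @ np.linalg.solve(Sigma, mu1)
      + 0.5 * mu2 @ np.linalg.solve(Sigma, mu2)
      + np.log(p1 / p2))

x = np.array([0.3, -0.7])
bayes = p1 * gauss(x, mu1, Sigma)
bayes /= bayes + p2 * gauss(x, mu2, Sigma)
sigmoid = 1.0 / (1.0 + np.exp(-(w @ x + w0)))
print(bayes, sigmoid)                          # the two numbers agree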
Linear discriminants
▸ discriminant function for two classes, obtained by applying the sigmoid:
      y(x) = g(w^T x + w0)
▸ more generally, with basis functions φj:
      yk(x) = ∑_{j=0}^{M} wkj φj(x),    in matrix notation  y(x) = w φ(x)
  with w the matrix of the weights wkj
▸ derivative of the sum-of-squares error E with respect to w:
      ∂E(w)/∂wkj = ∑_{n=1}^{N} ( ∑_{j'=0}^{M} wkj' φj'(xn) − tkn ) φj(xn) = ∑_{n=1}^{N} rkn φj(xn)
  and for a single data point xn (with residual rkn):
      ∂En(w)/∂wkj = rkn φj(xn)
▸ applying the activation function g on top gives
      y(x) = g( ∑_{j=0}^{M} wj φj(x) ) = g(w^T φ(x))
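As a small illustration of the per-sample gradient rkn φj(xn) above, a single stochastic-gradient step (the basis functions, the data point and the learning rate are my own choices, not the lecture’s):

import numpy as np

phi = lambda x: np.array([1.0, x, x**2])       # example basis functions, M = 2
w = np.zeros((1, 3))                           # K = 1 output, M + 1 weights
x_n, t_n, eta = 0.7, 1.3, 0.1
r_n = w @ phi(x_n) - t_n                       # residual r_{kn} = y_k(x_n) - t_{kn}
w = w - eta * np.outer(r_n, phi(x_n))          # w_{kj} <- w_{kj} - eta * r_{kn} * phi_j(x_n)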
Limits?
▸ the XOR problem can be solved with an appropriate choice of φ (see the
  sketch below)
▸ however, when the receptive fields of the individual φj(x) are limited,
  certain connectivity problems cannot be solved (book “Perceptrons”,
  Minsky and Papert, 1969)
Can you?
[Figure: connectivity examples from “Perceptrons” by M. L. Minsky and S. Papert, Copyright 1969 by MIT Press]
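Back to the XOR bullet above: a minimal sketch showing that with the (hand-picked) feature map φ(x) = [1, x1, x2, x1·x2] the XOR targets become reachable by a linear discriminant in feature space. The concrete φ is my own choice, not the lecture’s.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])                    # XOR targets

def phi(x):                                           # fixed nonlinear features
    return np.array([1.0, x[0], x[1], x[0] * x[1]])

Phi = np.stack([phi(x) for x in X])                   # design matrix, 4 x 4
w = np.linalg.lstsq(Phi, t, rcond=None)[0]            # least-squares weights
print(Phi @ w)                                        # approximately [0, 1, 1, 0]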
From Single-layer to Multi-layer networks
How to generalize the single-layer network?
▸ single-layer network:
      y(x) = g( ∑_{j=0}^{M} wj φj(x) ) = g(w^T φ(x))
▸ two-layer network, where the fixed basis functions are replaced by adaptive ones:
      y(x) = b2 + W2 tanh(b1 + W1 x)
Sigmoid
      σ(a) = 1 / (1 + exp(−a))
tanh
      tanh(a) = (exp(a) − exp(−a)) / (exp(a) + exp(−a)) = 1 − 2 / (1 + exp(2a))
ReLU
      ReLU(a) = max(0, a)
▸ always nonlinear
▸ always a component-wise function
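A small sketch of the three activation functions; it also checks the tanh identity above numerically.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):
    return np.maximum(0.0, a)

a = np.linspace(-3.0, 3.0, 7)
print(np.allclose(np.tanh(a), 1.0 - 2.0 / (1.0 + np.exp(2.0 * a))))   # True
print(sigmoid(a))
print(relu(a))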
Multi-layer network:
Forward computation:
import numpy as np            # (input x, target t and parameters W1, b1, ..., b3 assumed given)
z1 = x                        # input layer
z2 = np.tanh(b1 + W1 @ z1)    # second layer (hidden)
z3 = np.tanh(b2 + W2 @ z2)    # third layer (hidden)
y = b3 + W3 @ z3              # output layer
r = y - t                     # compare to truth
E = 0.5 * r.T @ r             # squared error
▸ differential of the error:
      dE = r^T dr = r^T dy = r^T (db3 + (dW3) z3 + W3 dz3)        → read off D_{b3}E and D_{W3}E
▸ continuing with the z3 term:
      r^T W3 dz3 = r^T W3 d tanh(b2 + W2 z2) = ...
Backward computation (calculate gradients):
b3E = r                              # dE/db3
W3E = np.outer(b3E, z3)              # dE/dW3
b2E = (1 - z3**2) * (W3.T @ b3E)     # dE/db2   (tanh'(a) = 1 - tanh(a)**2)
W2E = np.outer(b2E, z2)              # dE/dW2
b1E = (1 - z2**2) * (W2.T @ b2E)     # dE/db1
W1E = np.outer(b1E, z1)              # dE/dW1
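A finite-difference check of these formulas (sizes and data are made up for illustration): the analytic gradient W2E should match a numerical gradient of E with respect to W2.

import numpy as np
rng = np.random.default_rng(0)

n_in, n_h, n_out = 3, 4, 2
x, t = rng.normal(size=n_in), rng.normal(size=n_out)
W1, b1 = rng.normal(size=(n_h, n_in)), rng.normal(size=n_h)
W2, b2 = rng.normal(size=(n_h, n_h)), rng.normal(size=n_h)
W3, b3 = rng.normal(size=(n_out, n_h)), rng.normal(size=n_out)

def loss(W2_):                                  # forward pass as a function of W2
    z2 = np.tanh(b1 + W1 @ x)
    z3 = np.tanh(b2 + W2_ @ z2)
    r = (b3 + W3 @ z3) - t
    return 0.5 * r @ r

z2 = np.tanh(b1 + W1 @ x)                       # analytic gradient, as above
z3 = np.tanh(b2 + W2 @ z2)
r = (b3 + W3 @ z3) - t
b3E = r
b2E = (1 - z3**2) * (W3.T @ b3E)
W2E = np.outer(b2E, z2)

num = np.zeros_like(W2)                         # numerical gradient
eps = 1e-6
for i in range(W2.shape[0]):
    for j in range(W2.shape[1]):
        D = np.zeros_like(W2)
        D[i, j] = eps
        num[i, j] = (loss(W2 + D) - loss(W2 - D)) / (2 * eps)
print(np.max(np.abs(num - W2E)))                # close to zero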
Gradient descent:
eta = 0.1
W1 = W1 - eta * W1E
b1 = b1 - eta * b1E
W2 = W2 - eta * W2E
b2 = b2 - eta * b2E
W3 = W3 - eta * W3E
b3 = b3 - eta * b3E
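Putting the three pieces together, a minimal end-to-end training sketch on a toy 1-D regression problem (data, layer sizes, learning rate and number of epochs are made up for illustration):

import numpy as np
rng = np.random.default_rng(0)

X = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)      # toy inputs, one per row
T = np.sin(2.0 * X)                                # toy targets

n_h = 10
W1, b1 = 0.5 * rng.normal(size=(n_h, 1)), np.zeros(n_h)
W2, b2 = 0.5 * rng.normal(size=(n_h, n_h)), np.zeros(n_h)
W3, b3 = 0.5 * rng.normal(size=(1, n_h)), np.zeros(1)
eta = 0.01

for epoch in range(200):
    for x, t in zip(X, T):                         # stochastic gradient descent
        z1 = x                                     # forward pass
        z2 = np.tanh(b1 + W1 @ z1)
        z3 = np.tanh(b2 + W2 @ z2)
        y = b3 + W3 @ z3
        r = y - t
        b3E = r                                    # backward pass
        W3E = np.outer(b3E, z3)
        b2E = (1 - z3**2) * (W3.T @ b3E)
        W2E = np.outer(b2E, z2)
        b1E = (1 - z2**2) * (W2.T @ b2E)
        W1E = np.outer(b1E, z1)
        W1 -= eta * W1E; b1 -= eta * b1E           # gradient descent step
        W2 -= eta * W2E; b2 -= eta * b2E
        W3 -= eta * W3E; b3 -= eta * b3E

errors = [(((b3 + W3 @ np.tanh(b2 + W2 @ np.tanh(b1 + W1 @ x))) - t) ** 2).item()
          for x, t in zip(X, T)]
print(0.5 * np.mean(errors))                       # average squared error after training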
The same computations in MATLAB/Octave syntax:
Forward computation:
z1 = x;                      % input layer
z2 = tanh(b1 + W1*z1);       % second layer (hidden)
z3 = tanh(b2 + W2*z2);       % third layer (hidden)
y = b3 + W3*z3;              % output layer
r = y - t;                   % compare to truth
E = 0.5 * r' * r;            % squared error
Backward computation (calculate gradients):
b3E = r;                             % dE/db3
W3E = b3E * z3';                     % dE/dW3
b2E = (1 - z3.^2) .* (W3' * b3E);    % dE/db2
W2E = b2E * z2';                     % dE/dW2
b1E = (1 - z2.^2) .* (W2' * b2E);    % dE/db1
W1E = b1E * z1';                     % dE/dW1
Gradient descent:
eta = 0.1;
W1 = W1 - eta * W1E;
b1 = b1 - eta * b1E;
W2 = W2 - eta * W2E;
b2 = b2 - eta * b2E;
W3 = W3 - eta * W3E;
b3 = b3 - eta * b3E;
▸ squared error  E = ½ (y − t)^2,  minimized over all weights:
      min_{w3, w2, w1} E
▸ for gradient descent we need the partial derivatives ∂E/∂w3, ∂E/∂w2, ∂E/∂w1
▸ with the layer-wise notation
      z1 = x                      input layer
      z2 = f1(w1, z1)             second layer (hidden)
      z3 = f2(w2, z2)             third layer (hidden)
      y = z4 = f3(w3, z3)         output layer
      E = ½ (y − t)^2             squared error (the loss)
  the chain rule gives
      ∂E/∂w3 = (∂E/∂z4)(∂z4/∂w3)
      ∂E/∂w2 = (∂E/∂z4)(∂z4/∂z3)(∂z3/∂w2)
      ∂E/∂w1 = (∂E/∂z4)(∂z4/∂z3)(∂z3/∂z2)(∂z2/∂w1)
▸ as a computational graph (forward pass):
      x = z1 → f1 → z2 → f2 → z3 → f3 → z4 → loss → E
  with parameters w1, w2, w3 entering f1, f2, f3 and the target t entering the loss
▸ backward pass: starting from ∂E/∂E = 1, the derivatives ∂E/∂z4, ∂E/∂z3, ∂E/∂z2 flow
  backwards through the graph, and ∂E/∂w3, ∂E/∂w2, ∂E/∂w1 are read off at f3, f2, f1
Back propagation
1. At the loss:
   ▸ starts the backward pass with ∂E/∂E = 1
   ▸ computes and sends ∂E/∂z4 (= z4 − t for the squared loss) to f3
2. At f3:
   ▸ receives ∂E/∂z4
   ▸ calculates ∂E/∂w3 = (∂E/∂z4)(∂z4/∂w3) with ∂z4/∂w3 = ∂/∂w3 f3(w3, z3)
   ▸ updates w3 = w3 − η ∂E/∂w3
   ▸ sends ∂E/∂z3 = (∂E/∂z4)(∂z4/∂z3) to f2 with ∂z4/∂z3 = ∂/∂z3 f3(w3, z3)
3. At f2:
   ▸ receives ∂E/∂z3
   ▸ calculates ∂E/∂w2 = (∂E/∂z3)(∂z3/∂w2) with ∂z3/∂w2 = ∂/∂w2 f2(w2, z2)
   ▸ updates w2 = w2 − η ∂E/∂w2
   ▸ sends ∂E/∂z2 = (∂E/∂z3)(∂z3/∂z2) to f1 with ∂z3/∂z2 = ∂/∂z2 f2(w2, z2)
4. At f1:
   ▸ receives ∂E/∂z2
   ▸ calculates ∂E/∂w1 = (∂E/∂z2)(∂z2/∂w1) with ∂z2/∂w1 = ∂/∂w1 f1(w1, z1)
   ▸ updates w1 = w1 − η ∂E/∂w1
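The same scheme written as code: a minimal sketch with my own interface (not the lecture’s), where each block stores its input during the forward pass and, given ∂E/∂z_{i+1}, updates its own weights and sends ∂E/∂z_i to the previous block. For brevity all blocks are tanh layers here.

import numpy as np

class TanhLayer:                                   # one block f_i(w_i, z_i)
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)
        self.b = np.zeros(n_out)
    def forward(self, z_in):
        self.z_in = z_in
        self.z_out = np.tanh(self.b + self.W @ z_in)
        return self.z_out
    def backward(self, dE_dout, eta):
        delta = (1 - self.z_out**2) * dE_dout      # dE wrt the pre-activation
        dE_din = self.W.T @ delta                  # to send to the previous block
        self.W -= eta * np.outer(delta, self.z_in) # update the weights
        self.b -= eta * delta
        return dE_din

rng = np.random.default_rng(0)
blocks = [TanhLayer(2, 5, rng), TanhLayer(5, 5, rng), TanhLayer(5, 1, rng)]
x, t, eta = np.array([0.3, -0.8]), np.array([0.5]), 0.1

z = x
for f in blocks:                                   # forward: z1 -> z2 -> z3 -> z4
    z = f.forward(z)
dE = z - t                                         # step 1: at the squared loss
for f in reversed(blocks):                         # steps 2-4: at f3, f2, f1
    dE = f.backward(dE, eta)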
▸ Scalars only:
      ∂E/∂w = (∂E/∂y)(∂y/∂w)          with ∂E/∂w, ∂E/∂y, ∂y/∂w ∈ R
▸ Scalars and vectors:
      ∂E/∂x = (∂E/∂y)(∂y/∂x)          with ∂E/∂x ∈ R^{1×n}, ∂E/∂y ∈ R^{1×m}, ∂y/∂x ∈ R^{m×n}
  where E ∈ R, x ∈ R^n, y ∈ R^m.
▸ Scalars, vectors, and matrices (DANGER!):
      ∂E/∂W = (∂E/∂y)(∂y/∂W)          with ∂E/∂W ∈ R^{n×m}, ∂E/∂y ∈ R^{1×m}, ∂y/∂W ∈ R^{m×m×n}
  where E ∈ R, W ∈ R^{m×n}, y ∈ R^m.  Here, good notation is difficult!
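A tiny numpy shape check of the vector case, plus one consistent choice for the matrix case (matching the linear block y = Wx further below); the sizes m = 3, n = 5 are arbitrary.

import numpy as np
rng = np.random.default_rng(0)

m, n = 3, 5
W, x = rng.normal(size=(m, n)), rng.normal(size=n)
y = W @ x                                  # y in R^m
dE_dy = rng.normal(size=(1, m))            # row vector, 1 x m
dE_dx = dE_dy @ W                          # 1 x n
dE_dW = np.outer(x, dE_dy)                 # n x m (the "dangerous" case)
print(dE_dx.shape, dE_dW.shape)            # (1, 5) (5, 3)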
What is back propagation?
Let’s look at Stanford’s course on Deep Neural Networks for Computer Vision
(2017, Fei-Fei Li, Justin Johnson, Serena Yeung, http://cs231n.stanford.edu/2017/)
[A sequence of slides copied from Fei-Fei Li, Justin Johnson, Serena Yeung’s Stanford course on Deep Learning (Lecture 4, April 13, 2017) is not reproduced here. They walk through back propagation on small computational graphs:
▸ a worked example with inputs x = -2, y = 5, z = -4, where the wanted gradients are computed node by node with the chain rule
▸ a second example involving the sigmoid function, discussed as a single “sigmoid gate”
▸ the general pattern of gradients flowing backwards through a single gate f
▸ vectorized operations and the Jacobian matrix (Q: what is the size of the Jacobian matrix? [4096 x 4096!] Q2: what does it look like?)
▸ a vectorized example worked through in the same node-by-node fashion]
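The copied slides themselves are not shown, so as a stand-in here is a minimal sketch of node-by-node back propagation on a tiny expression graph with the inputs mentioned on the slides; the expression q = x + y, f = q · z is my own illustrative choice, not necessarily the one used on the slides.

x, y, z = -2.0, 5.0, -4.0
q = x + y                   # forward pass: add gate
f = q * z                   # forward pass: multiply gate
df_df = 1.0                 # start of the backward pass
df_dq = z * df_df           # multiply gate: local gradient wrt q is z
df_dz = q * df_df           # multiply gate: local gradient wrt z is q
df_dx = 1.0 * df_dq         # add gate routes the gradient unchanged
df_dy = 1.0 * df_dq
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0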
▸ generic block:  x → f → y  with  y = f(w, x)
Back propagation
      ∂E/∂w = (∂E/∂y)(∂y/∂w)        → to update w
      ∂E/∂x = (∂E/∂y)(∂y/∂x)        → to send to the previous block
▸ linear block:  y = W x
Back propagation
      ∂E/∂W = x (∂E/∂y)             ∂E/∂x = (∂E/∂y) W
▸ bias block:  y = x + b
Back propagation
      ∂E/∂b = ∂E/∂y                 ∂E/∂x = ∂E/∂y
▸ tanh block:  x → tanh → y  with  y = tanh(x)
Back propagation
      ∂E/∂x = (∂E/∂y) Diag(1 − y ⊙ y) = (∂E/∂y) ⊙ (1 − y ⊙ y)^T
▸ logistic block:  x → logistic → y  with  y = 1 / (1 + e^{−x})
Back propagation
      ∂E/∂x = (∂E/∂y) Diag(y ⊙ (1 − y)) = (∂E/∂y) ⊙ (y ⊙ (1 − y))^T
▸ ReLU block:  x → ReLU → y  with  y = max(x, 0) = [x > 0] ⊙ x
Back propagation
      ∂E/∂x = (∂E/∂y) Diag([x > 0]) = (∂E/∂y) ⊙ [x > 0]^T
▸ convolution block with filters f1, …, fk:  yi = fi ∗ x   “convolution”
Back propagation
      ∂E/∂fi = ...                  ∂E/∂x = ...
▸ max-pool block:  x → max pool → y
Back propagation
      ∂E/∂x = (∂E/∂y) D             (D selects the entries where the maximum was attained)
▸ square-loss block:  y → square loss → E  with  E = 0.5 (y − t)^2
Back propagation
      ∂E/∂y = y − t
▸ cross-entropy (softmax) loss for true class t:
      E = −log( exp(yt) / ∑_j exp(yj) )
Back propagation
      ∂E/∂y = ...
[Diagram: the network drawn as a chain of blocks b1, …, b9 between input x and output y]
Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities
between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts
at the bottom. The GPUs communicate only at certain layers. The network’s input is 150,528-dimensional, and
the number of neurons in the network’s remaining layers is given by 253,440–186,624–64,896–64,896–43,264–
4096–4096–1000.
from Krizhevsky, Sutskever, Hinton, ImageNet Classification with Deep Convolutional Neural Networks, 2012
ImageNet classification
Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by our model.
The correct label is written under each image, and the probability assigned to the correct label is also shown
with a red bar (if it happens to be in the top 5). (Right) Five ILSVRC-2010 test images in the first column. The
remaining columns show the six training images that produce feature vectors in the last hidden layer with the
smallest Euclidean distance from the feature vector for the test image.
from Krizhevsky, Sutskever, Hinton, ImageNet Classification with Deep Convolutional Neural Networks, 2012
In the left panel of Figure 4 we qualitatively assess what the network has learned by computing its
top-5 predictions on eight test images. Notice that even off-center objects, such as the mite in the
top-left, can be recognized by the net. Most of the top-5 labels appear reasonable. For example,
only other types of cat are considered plausible labels for the leopard. In some cases (grille, cherry)
there is genuine ambiguity about the intended focus of the photograph.
Tricks of the trade
Goal:
Avoid flat regions of the loss function.
Try to start at the steep parts of the activation functions.
Tricks:
▸ data normalization: transform input points to have mean zero and
variance one
▸ weight initialization for linear bricks: sample from a normal distribution
  with mean zero and standard deviation σ = 1/√N, with N being the
  number of input nodes
▸ learning rate division: in each linear layer divide the learning rate
by N, the number of input nodes
See also:
▸ LeCun, Bottou, Orr, Müller, “Efficient back-prop”, 1998.
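A short sketch of the first two tricks in code, assuming the initialization standard deviation is 1/√N as recommended in “Efficient back-prop” (all sizes are arbitrary):

import numpy as np
rng = np.random.default_rng(0)

X = rng.normal(loc=3.0, scale=5.0, size=(100, 20))   # raw inputs
X = (X - X.mean(axis=0)) / X.std(axis=0)             # data normalization: mean zero, variance one

N, M = 20, 10                                        # number of input / output nodes
W = rng.normal(scale=1.0 / np.sqrt(N), size=(M, N))  # weight initialization
eta_layer = 0.1 / N                                  # learning-rate division for this layer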
x → dropout → y
      m = [rand(size(x)) <= p]
      y = x ⊙ m
Back propagation
      ∂E/∂x = (∂E/∂y) ⊙ m
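A minimal numpy sketch of the dropout block (p and the sizes are arbitrary):

import numpy as np
rng = np.random.default_rng(0)

p = 0.5
x = rng.normal(size=8)
m = (rng.random(size=x.shape) <= p).astype(float)    # random keep mask
y = x * m                                            # forward pass
dE_dy = rng.normal(size=y.shape)                     # gradient from the next block
dE_dx = dE_dy * m                                    # backward pass: same mask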
▸ AdaGrad
see Duchi, Hazan, Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, JMLR, 2011.
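A minimal sketch of the AdaGrad update in its standard form (the small eps is for numerical stability):

import numpy as np

def adagrad_step(w, grad, hist, eta=0.1, eps=1e-8):
    hist = hist + grad**2                        # accumulate squared gradients
    w = w - eta * grad / (np.sqrt(hist) + eps)   # per-coordinate step size
    return w, hist

w, hist = np.zeros(3), np.zeros(3)
w, hist = adagrad_step(w, np.array([0.5, -1.0, 0.1]), hist)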
image from Matthias Scholz, http://nlpca.org and Scholz, Nichtlineare Hauptkomponentenanalyse auf Basis neuronaler Netze, Diplomarbeit, 2002.
see also: Kramer, Nonlinear principal component analysis using auto-associative neural networks, AIChE Journal, 1991.
What is optimized?
▸ the training objective of the auto-encoder network can be written as
      E = ∥x − f(x)∥^2
▸ note that if the bottleneck has several dimensions, there is no ordering of
  the nodes (unlike in linear PCA), i.e. it is only about the reconstruction error
▸ Scholz (2002) calls this s-NLPCA, “s” like symmetric.
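A minimal auto-encoder sketch for this symmetric setting (one tanh bottleneck layer, squared reconstruction error; data, sizes and learning rate are made up for illustration):

import numpy as np
rng = np.random.default_rng(0)

X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 10))  # 10-D data lying in a 5-D subspace
d, k = 10, 2                                              # input dimension, bottleneck size
W1, b1 = 0.1 * rng.normal(size=(k, d)), np.zeros(k)       # encoder
W2, b2 = 0.1 * rng.normal(size=(d, k)), np.zeros(d)       # decoder
eta = 0.01

for epoch in range(100):
    for x in X:
        h = np.tanh(b1 + W1 @ x)          # bottleneck code
        xhat = b2 + W2 @ h                # reconstruction f(x)
        r = xhat - x                      # gradient of 0.5 * ||x - f(x)||^2 wrt xhat
        W2E, b2E = np.outer(r, h), r
        hE = (1 - h**2) * (W2.T @ r)
        W1E, b1E = np.outer(hE, x), hE
        W1 -= eta * W1E; b1 -= eta * b1E
        W2 -= eta * W2E; b2 -= eta * b2E

print(np.mean(np.sum((X - (b2 + np.tanh(b1 + X @ W1.T) @ W2.T))**2, axis=1)))   # reconstruction error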