Deep Learning Summer Term 2025
http://ml3.leuphana.de/lectures/summer25/DL
Machine Learning Group, Leuphana University of Lüneburg
Soham Majumder (soham.majumder@leuphana.de)
Exercise 5
Discussion date: 02.06.2025
Task 10 Multiclass Classification
Part 1
Let $x \in \mathbb{R}^d$ be a vector. The softmax function $\operatorname{softmax}: \mathbb{R}^d \to (0,1)^d$ is given by
\[
p = \operatorname{softmax}(x) = \frac{1}{\sum_{j=1}^{d} \exp(x_j)} \begin{pmatrix} \exp(x_1) \\ \exp(x_2) \\ \vdots \\ \exp(x_d) \end{pmatrix}
\]
and returns a probability distribution $p$, i.e.,
\[
p_j = \frac{\exp(x_j)}{\sum_{k=1}^{d} \exp(x_k)} \ge 0
\]
and $\sum_{j=1}^{d} p_j = 1$.
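The definition translates directly into code; here is a minimal PyTorch sketch (the max-subtraction is a standard numerical-stability trick, not part of the definition above):

```python
import torch

def softmax(x):
    # Subtracting the max leaves the result unchanged (the shift cancels
    # in the quotient) but avoids overflow in exp for large inputs.
    e = torch.exp(x - x.max())
    return e / e.sum()

x = torch.tensor([1.0, 2.0, 3.0])
p = softmax(x)
print(p.sum())                                     # tensor(1.)
print(torch.allclose(p, torch.softmax(x, dim=0)))  # True, matches the built-in
```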
A suitable loss function is the cross-entropy loss. It is given by
\[
H(p, y) = -\sum_{j=1}^{d} y_j \log(p_j),
\]
where $y$ is a one-hot encoded target vector and $p$ is the output of the softmax layer.
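Since $y$ is one-hot, the sum collapses to a single term, $H(p, y) = -\log(p_c)$ for the true class $c$; a small sketch illustrating this:

```python
import torch

p = torch.softmax(torch.tensor([1.0, 2.0, 3.0]), dim=0)
y = torch.tensor([0.0, 0.0, 1.0])  # one-hot target, true class c = 2

H = -(y * torch.log(p)).sum()
print(torch.allclose(H, -torch.log(p[2])))  # True: only the true class contributes
```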
(i) Show that the derivative of the softmax function with respect to $x$ is
\[
\frac{\partial p_j}{\partial x_i} = p_j(\delta_{ij} - p_i),
\]
where $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise.
(ii) Show that the derivative of the cross-entropy loss in combination with the softmax function with respect to $x$ is
\[
\frac{\partial H(p, y)}{\partial x} = p - y.
\]
Hint: For the first part, you should do a case distinction ($i = j$ and $i \neq j$) for $\frac{\partial p_j}{\partial x_i}$. In the second part, you need the chain rule when considering $\frac{\partial \log p_j}{\partial x_i}$.
Solution
(i) For $i = j$ we have
\[
\begin{aligned}
\frac{\partial p_j}{\partial x_i}
&= \frac{\exp(x_j)\left(\sum_k \exp(x_k)\right) - \exp(x_j)\exp(x_i)}{\left(\sum_k \exp(x_k)\right)^2} && \text{(quotient rule)} \\
&= \frac{\exp(x_j)}{\sum_k \exp(x_k)} \cdot \frac{\left(\sum_k \exp(x_k)\right) - \exp(x_i)}{\sum_k \exp(x_k)} && \text{(factor out } \exp(x_j)\text{)} \\
&= \frac{\exp(x_j)}{\sum_k \exp(x_k)} \cdot \left(\frac{\sum_k \exp(x_k)}{\sum_k \exp(x_k)} - \frac{\exp(x_i)}{\sum_k \exp(x_k)}\right) \\
&= p_j \,(1 - p_i)
\end{aligned}
\]
and for $i \neq j$ we get
\[
\begin{aligned}
\frac{\partial p_j}{\partial x_i}
&= \frac{0 - \exp(x_j)\exp(x_i)}{\left(\sum_k \exp(x_k)\right)^2} \\
&= \frac{\exp(x_j)}{\sum_k \exp(x_k)} \cdot \frac{-\exp(x_i)}{\sum_k \exp(x_k)} \\
&= p_j \,(0 - p_i).
\end{aligned}
\]
Combined, this yields
\[
\frac{\partial p_j}{\partial x_i} = p_j(\delta_{ij} - p_i),
\]
where $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise.
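In matrix form this reads $\frac{\partial p}{\partial x} = \operatorname{diag}(p) - pp^\top$, which can be verified numerically against autograd; a small sketch:

```python
import torch

x = torch.randn(5)
p = torch.softmax(x, dim=0)

# Closed-form Jacobian: entry (j, i) is p_j * (delta_ij - p_i)
J_closed = torch.diag(p) - torch.outer(p, p)

# Autograd Jacobian of the softmax for comparison
J_auto = torch.autograd.functional.jacobian(
    lambda z: torch.softmax(z, dim=0), x)

print(torch.allclose(J_closed, J_auto, atol=1e-6))  # True
```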
(ii) Using the chain rule and the result from (i),
\[
\begin{aligned}
\frac{\partial H(p, y)}{\partial x_i}
&= \frac{\partial}{\partial x_i}\left(-\sum_{j=1}^{d} y_j \log(p_j)\right)
= -\sum_{j=1}^{d} y_j \frac{\partial \log(p_j)}{\partial x_i}
= -\sum_{j=1}^{d} y_j \frac{\partial \log(p_j)}{\partial p_j} \frac{\partial p_j}{\partial x_i} \\
&= -\sum_{j=1}^{d} y_j \frac{1}{p_j}\, p_j(\delta_{ij} - p_i)
= -\sum_{j=1}^{d} y_j (\delta_{ij} - p_i)
= -\sum_{j=1}^{d} y_j \delta_{ij} + p_i \sum_{j=1}^{d} y_j \\
&= -y_i + p_i
\end{aligned}
\]
Here, we used the fact that $y$ is a one-hot encoded target vector, hence $\sum_{j=1}^{d} y_j = 1$. Stacking the components gives $\frac{\partial H(p, y)}{\partial x} = p - y$.
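The same gradient can be checked with autograd; a quick sketch:

```python
import torch

x = torch.randn(5, requires_grad=True)
y = torch.zeros(5)
y[2] = 1.0  # one-hot target

p = torch.softmax(x, dim=0)
loss = -(y * torch.log(p)).sum()  # cross-entropy of the softmax output
loss.backward()

print(torch.allclose(x.grad, p.detach() - y))  # True: the gradient is p - y
```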
Part 2
The log-linear model for logistic regression allows us to derive the softmax function as a model of the class probabilities in multiclass classification. For a problem with $c$ classes, start by writing the log-probability of each class as a linear function of the inputs plus the partition ("normalization") term $-\log Z$:
\[
\begin{aligned}
\log P(Y = 1 \mid X = x) &= w_1 x + b_1 - \log Z, \\
\log P(Y = 2 \mid X = x) &= w_2 x + b_2 - \log Z, \\
&\;\;\vdots \\
\log P(Y = c \mid X = x) &= w_c x + b_c - \log Z,
\end{aligned}
\]
and, using $\sum_{j=1}^{c} P(Y = j \mid X = x) = 1$, show how this model is equivalent to modeling the class probabilities with the softmax function.
Solution
First, we rewrite the log-linear models as probabilities by exponentiating both sides:
1
P (Y = 1|X = x) = exp(w1 x + b1 ),
Z
1
P (Y = 2|X = x) = exp(w2 x + b2 ),
Z
..
.
1
P (Y = c|X = x) = exp(wc x + bc ).
Z
We can now determine $Z$ by using $\sum_{j=1}^{c} P(Y = j \mid X = x) = 1$:
\[
1 = \sum_{j=1}^{c} \frac{1}{Z} \exp(w_j x + b_j)
\]
and, multiplying both sides by $Z$,
\[
Z = \sum_{j=1}^{c} \exp(w_j x + b_j).
\]
Thus,
\[
P(Y = i \mid X = x) = \frac{\exp(w_i x + b_i)}{\sum_{j=1}^{c} \exp(w_j x + b_j)} = p_i,
\]
where $p_i$ is the $i$-th component of $\operatorname{softmax}\left((w_1 x + b_1, w_2 x + b_2, \ldots, w_c x + b_c)^\top\right)$.
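Exponentiating and normalizing the log-linear scores numerically reproduces the softmax of the logit vector; a small sketch with random weights (the shapes are illustrative):

```python
import torch

c, d = 4, 3                      # c classes, d input features
W, b = torch.randn(c, d), torch.randn(c)
x = torch.randn(d)

logits = W @ x + b               # (w_1 x + b_1, ..., w_c x + b_c)
Z = torch.exp(logits).sum()      # partition term
probs = torch.exp(logits) / Z    # log-linear model probabilities

print(torch.allclose(probs, torch.softmax(logits, dim=0)))  # True
```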
Task 11 Multiclass Classification with PyTorch
(i) Read the classification tutorial from the PyTorch documentation.∗
(ii) Adapt the code and implement a 10-class classifier for the MNIST data set based on the
tutorial you just read. Use the CrossEntropyLoss and the Adam optimizer.
(iii) Add a dropout layer† between the fully connected linear layers of the classifier. Test several
values of the dropout probability p and report on the train and test accuracy. Use a learning
rate of 0.01 and train for at least 15 epochs. Which p works best?
(iv) Visualize the learned filters of the convolutional layers. You can access the weights via
net.conv1.weight.data.cpu().numpy()
(v) Take a training sample and manually apply every operation of the forward pass. Take a look
at the intermediate results.
Solution
The code is provided as solution5.ipynb.
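The notebook contains the full experiments; the sketch below only illustrates how parts (ii) and (iii) might look, assuming a LeNet-style architecture adapted from the CIFAR-10 tutorial (layer sizes, dropout placement, and batch size are illustrative, not copied from solution5.ipynb).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

class Net(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)        # MNIST images have one channel
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)  # 28x28 -> 4x4 after two conv+pool
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 classes
        self.dropout = nn.Dropout(p)           # (iii) dropout between linear layers

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        return self.fc3(x)                     # raw logits; CrossEntropyLoss applies softmax

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

net = Net(p=0.5)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

for epoch in range(15):                        # at least 15 epochs, lr = 0.01
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()
```

For part (iv), the first-layer filters can be plotted directly from the weight tensor mentioned in the task:

```python
import matplotlib.pyplot as plt

filters = net.conv1.weight.data.cpu().numpy()  # shape (6, 1, 5, 5)
fig, axes = plt.subplots(1, len(filters))
for ax, f in zip(axes, filters):
    ax.imshow(f[0], cmap="gray")               # one grayscale image per filter
    ax.axis("off")
plt.show()
```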
∗ https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
† https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html#torch.nn.Dropout