Deep Learning Summer Term 2025

http://ml3.leuphana.de/lectures/summer25/DL
Machine Learning Group, Leuphana University of Lüneburg
Soham Majumder (soham.majumder@leuphana.de)

Exercise 5
Discussion date: 02.06.2025

Task 10 Multiclass Classification

Part 1
Let $x \in \mathbb{R}^d$ be a vector. The softmax function $\operatorname{softmax} : \mathbb{R}^d \to (0, 1)^d$ is given by
\[
p = \operatorname{softmax}(x) = \frac{1}{\sum_{j=1}^{d} \exp(x_j)}
\begin{pmatrix} \exp(x_1) \\ \exp(x_2) \\ \vdots \\ \exp(x_d) \end{pmatrix}
\]
and returns a probability distribution $p$, i.e.,
\[
p_j = \frac{\exp(x_j)}{\sum_{k=1}^{d} \exp(x_k)} \geq 0
\]
and $\sum_{j=1}^{d} p_j = 1$.
A suitable loss function is the cross-entropy loss. It is given by
\[
H(p, y) = -\sum_{j=1}^{d} y_j \log(p_j),
\]
where $y$ is a one-hot encoded target vector and $p$ is the output of the softmax layer.
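For concreteness, a minimal PyTorch sketch (the example logits and one-hot target below are made up for illustration) evaluates both definitions directly:

    import torch

    x = torch.tensor([2.0, 1.0, 0.1])        # example logits (made up)
    p = torch.exp(x) / torch.exp(x).sum()    # softmax as defined above
    y = torch.tensor([1.0, 0.0, 0.0])        # one-hot target

    print(p, p.sum())                        # components lie in (0, 1) and sum to 1
    print(-(y * torch.log(p)).sum())         # cross-entropy H(p, y)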

(i) Show that the derivative of the softmax function with respect to $x$ is
\[
\frac{\partial p_j}{\partial x_i} = p_j (\delta_{ij} - p_i),
\]
where $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise.

(ii) Show that the derivative of the cross-entropy loss in combination with the softmax function with respect to $x$ is
\[
\frac{\partial H(p, y)}{\partial x} = p - y.
\]

Hint: For the first part, you should do a case distinction ($i = j$ and $i \neq j$) of $\frac{\partial p_j}{\partial x_i}$. In the second part, you need the chain rule when considering $\frac{\partial \log p_j}{\partial x_i}$.

Solution
(i) For $i = j$ we have
\[
\begin{aligned}
\frac{\partial p_j}{\partial x_i}
&= \frac{\exp(x_j) \left( \sum_k \exp(x_k) \right) - \exp(x_j) \exp(x_i)}{\left( \sum_k \exp(x_k) \right)^2} && \text{(quotient rule)} \\
&= \frac{\exp(x_j)}{\sum_k \exp(x_k)} \cdot \frac{\left( \sum_k \exp(x_k) \right) - \exp(x_i)}{\sum_k \exp(x_k)} && \text{(separate } \exp(x_j) \text{)} \\
&= \frac{\exp(x_j)}{\sum_k \exp(x_k)} \cdot \left( \frac{\sum_k \exp(x_k)}{\sum_k \exp(x_k)} - \frac{\exp(x_i)}{\sum_k \exp(x_k)} \right) \\
&= p_j \cdot (1 - p_i)
\end{aligned}
\]
and for $i \neq j$ we get
\[
\begin{aligned}
\frac{\partial p_j}{\partial x_i}
&= \frac{0 - \exp(x_j) \exp(x_i)}{\left( \sum_k \exp(x_k) \right)^2} \\
&= \frac{\exp(x_j)}{\sum_k \exp(x_k)} \cdot \frac{-\exp(x_i)}{\sum_k \exp(x_k)} \\
&= p_j \cdot (0 - p_i).
\end{aligned}
\]
Combined, that yields
\[
\frac{\partial p_j}{\partial x_i} = p_j (\delta_{ij} - p_i),
\]
where $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise.
(ii)
\[
\begin{aligned}
\frac{\partial H(p, y)}{\partial x_i}
&= \frac{\partial}{\partial x_i} \left( -\sum_{j=1}^{d} y_j \log(p_j) \right)
= -\sum_{j=1}^{d} y_j \frac{\partial \log(p_j)}{\partial x_i}
= -\sum_{j=1}^{d} y_j \frac{\partial \log(p_j)}{\partial p_j} \frac{\partial p_j}{\partial x_i} \\
&= -\sum_{j=1}^{d} y_j \frac{1}{p_j} \, p_j (\delta_{ij} - p_i)
= -\sum_{j=1}^{d} y_j (\delta_{ij} - p_i)
= -\sum_{j=1}^{d} \left( y_j \delta_{ij} - y_j p_i \right) \\
&= -\sum_{j=1}^{d} y_j \delta_{ij} + p_i \sum_{j=1}^{d} y_j
= -y_i + p_i
\end{aligned}
\]
Here, we used the fact that $y$ is a one-hot encoded target vector, hence $\sum_{j=1}^{d} y_j = 1$. Stacking the components $-y_i + p_i$ over $i = 1, \ldots, d$ gives the claimed gradient $\frac{\partial H(p, y)}{\partial x} = p - y$.

Part 2
The log-linear model for logistic regression allows us to derive the softmax function to model the probabilities in multiclass classification. For a problem with $c$ classes, start by writing the log-probability of each class as a linear function of the inputs and the partition ("normalization") term $-\log Z$:
\[
\begin{aligned}
\log P(Y = 1 \mid X = x) &= w_1 x + b_1 - \log Z, \\
\log P(Y = 2 \mid X = x) &= w_2 x + b_2 - \log Z, \\
&\;\;\vdots \\
\log P(Y = c \mid X = x) &= w_c x + b_c - \log Z.
\end{aligned}
\]
Using $\sum_{j=1}^{c} P(Y = j \mid X = x) = 1$, show how this model is equivalent to modeling the class probabilities with the softmax function.

Solution
First, we rewrite the log-linear models as probabilities by exponentiating both sides:
\[
\begin{aligned}
P(Y = 1 \mid X = x) &= \frac{1}{Z} \exp(w_1 x + b_1), \\
P(Y = 2 \mid X = x) &= \frac{1}{Z} \exp(w_2 x + b_2), \\
&\;\;\vdots \\
P(Y = c \mid X = x) &= \frac{1}{Z} \exp(w_c x + b_c).
\end{aligned}
\]
We can now determine $Z$ by using $\sum_{j=1}^{c} P(Y = j \mid X = x) = 1$:
\[
\begin{aligned}
1 &= \frac{1}{Z} \sum_{j=1}^{c} \exp(w_j x + b_j) \\
Z &= \sum_{j=1}^{c} \exp(w_j x + b_j) && \text{(multiplying both sides by } Z\text{)}
\end{aligned}
\]
Thus,
\[
P(Y = i \mid X = x) = \frac{\exp(w_i x + b_i)}{\sum_{j=1}^{c} \exp(w_j x + b_j)} = p_i,
\]
where $p_i$ is the $i$-th component of $\operatorname{softmax}\big((w_1 x + b_1, w_2 x + b_2, \ldots, w_c x + b_c)^\top\big)$.
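To connect the derivation to code, a small sketch (with randomly drawn weights $w_j$ and biases $b_j$, chosen only for illustration) confirms that explicitly normalizing the exponentiated scores reproduces torch.softmax applied to the logits $w_j x + b_j$:

    import torch

    torch.manual_seed(0)
    c, d = 5, 3                       # number of classes, input dimension
    W = torch.randn(c, d)             # rows are the weight vectors w_j
    b = torch.randn(c)
    x = torch.randn(d)

    logits = W @ x + b                # (w_1 x + b_1, ..., w_c x + b_c)
    Z = torch.exp(logits).sum()       # partition term
    p_manual = torch.exp(logits) / Z  # explicit normalization as in the derivation

    print(torch.allclose(p_manual, torch.softmax(logits, dim=0)))  # expected: True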

Task 11 Multiclass Classification with PyTorch
(i) Read the classification tutorial from the PyTorch documentation.∗
(ii) Adapt the code and implement a 10-class classifier for the MNIST data set based on the
tutorial you just read. Use the CrossEntropyLoss and the Adam optimizer.
(iii) Add a dropout layer† between the fully connected linear layers of the classifier. Test several
values of the dropout probability p and report on the train and test accuracy. Use a learning
rate of 0.01 and train for at least 15 epochs. Which p works best?
(iv) Visualize the learned filters of the convolutional layers. You can access the weights via
net.conv1.weight.data.cpu().numpy()
(v) Take a training sample and manually apply every operation of the forward pass. Take a look
at the intermediate results.

Solution
The code is provided as solution5.ipynb.
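For orientation, here is a minimal sketch of the kind of network and training setup the tasks describe. The layer sizes, the exact dropout placement, and the synthetic sample are illustrative assumptions, not the reference solution from solution5.ipynb:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MnistNet(nn.Module):
        """Small CNN for 10-class MNIST classification (illustrative sketch)."""
        def __init__(self, p_dropout=0.5):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 6, 5)            # MNIST images are 1x28x28
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 4 * 4, 120)
            self.dropout = nn.Dropout(p=p_dropout)     # dropout between the FC layers, task (iii)
            self.fc2 = nn.Linear(120, 10)

        def forward(self, x):
            x = F.max_pool2d(F.relu(self.conv1(x)), 2) # -> 6x12x12
            x = F.max_pool2d(F.relu(self.conv2(x)), 2) # -> 16x4x4
            x = torch.flatten(x, 1)
            x = self.dropout(F.relu(self.fc1(x)))
            return self.fc2(x)                         # logits, as expected by CrossEntropyLoss

    net = MnistNet(p_dropout=0.5)
    criterion = nn.CrossEntropyLoss()                  # applies log-softmax internally
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

    # Task (iv): learned filters of the first convolutional layer.
    filters = net.conv1.weight.data.cpu().numpy()      # shape (6, 1, 5, 5)

    # Task (v): manually apply every operation of the forward pass to one sample
    # and inspect the intermediate results.
    sample = torch.randn(1, 1, 28, 28)                 # placeholder for a training image
    h1 = F.max_pool2d(F.relu(net.conv1(sample)), 2)
    h2 = F.max_pool2d(F.relu(net.conv2(h1)), 2)
    logits = net.fc2(net.dropout(F.relu(net.fc1(torch.flatten(h2, 1)))))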

∗ https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
† https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html#torch.nn.Dropout
