Introduction to Machine Learning
Softmax Regression
Learning goals
Know softmax regression
Understand that softmax regression is a generalization of logistic regression
FROM LOGISTIC REGRESSION ...
Remember logistic regression ($\mathcal{Y} = \{0, 1\}$): We combined the hypothesis space of linear functions, transformed by the logistic function $s(z) = \frac{1}{1 + \exp(-z)}$, i.e.

$$\mathcal{H} = \left\{ \pi : \mathcal{X} \to \mathbb{R} \;\middle|\; \pi(x) = s(\theta^\top x) \right\},$$

with the Bernoulli (logarithmic) loss:

$$L(y, \pi(x)) = -y \log(\pi(x)) - (1 - y) \log(1 - \pi(x)).$$
Remark: We suppress the intercept term for better readability. The intercept term can easily be included via $\theta^\top \tilde{x}$ with $\theta \in \mathbb{R}^{p+1}$ and $\tilde{x} = (1, x)$.
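As a small, purely illustrative sketch (not part of the original slides; the parameter and feature values below are made up), the logistic function and the Bernoulli loss for a single observation could be computed as follows:

```python
import numpy as np

def sigmoid(z):
    # logistic function s(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_loss(y, pi):
    # L(y, pi(x)) = -y * log(pi(x)) - (1 - y) * log(1 - pi(x))
    return -y * np.log(pi) - (1 - y) * np.log(1 - pi)

theta = np.array([0.5, -1.2])     # hypothetical parameter vector (intercept suppressed)
x = np.array([2.0, 1.0])          # hypothetical feature vector

pi = sigmoid(theta @ x)           # pi(x) = s(theta^T x)
print(pi, bernoulli_loss(1, pi))  # loss if the true label is y = 1
```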
... TO SOFTMAX REGRESSION
There is a straightforward generalization to the multiclass case: instead of a single linear discriminant function, we have $g$ linear discriminant functions

$$f_k(x) = \theta_k^\top x, \quad k = 1, 2, \ldots, g,$$

each indicating the confidence in class $k$.
The $g$ score functions are transformed into $g$ probability functions by the softmax function $s : \mathbb{R}^g \to \mathbb{R}^g$,

$$\pi_k(x) = s(f(x))_k = \frac{\exp(\theta_k^\top x)}{\sum_{j=1}^g \exp(\theta_j^\top x)},$$

instead of the logistic function used for $g = 2$. The probabilities are well-defined: $\sum_{k=1}^g \pi_k(x) = 1$ and $\pi_k(x) \in [0, 1]$ for all $k$.
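A minimal numerical sketch of this transformation (the parameter matrix and input below are hypothetical, chosen only for illustration):

```python
import numpy as np

def softmax(z):
    # s(z)_k = exp(z_k) / sum_j exp(z_j)
    e = np.exp(z)
    return e / e.sum()

# hypothetical parameters: one theta_k per class (g = 3 classes, p = 2 features)
Theta = np.array([[ 0.5, -1.0],
                  [ 0.2,  0.3],
                  [-0.7,  0.8]])
x = np.array([1.0, 2.0])

scores = Theta @ x           # f_k(x) = theta_k^T x, one score per class
probs = softmax(scores)      # pi_k(x)
print(probs, probs.sum())    # well-defined: entries in [0, 1], summing to 1
```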
... TO SOFTMAX REGRESSION
The softmax function is a generalization of the logistic function.
For g = 2, the logistic function and the softmax function are
equivalent.
Instead of the Bernoulli loss, we use the multiclass logarithmic loss

$$L(y, \pi(x)) = -\sum_{k=1}^g \mathbb{1}\{y = k\} \log(\pi_k(x)).$$
Note that the softmax function is a “smooth” approximation of the arg max operation, e.g. $s((1, 1000, 2)^\top) \approx (0, 1, 0)^\top$ (it picks out the 2nd element).
Furthermore, it is invariant to constant offsets $c$ in the input:

$$s(f(x) + c)_k = \frac{\exp(\theta_k^\top x + c)}{\sum_{j=1}^g \exp(\theta_j^\top x + c)} = \frac{\exp(\theta_k^\top x) \cdot \exp(c)}{\sum_{j=1}^g \exp(\theta_j^\top x) \cdot \exp(c)} = s(f(x))_k.$$
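This invariance is also what makes softmax easy to compute in a numerically stable way: subtracting $\max_k f_k(x)$ from all scores changes nothing mathematically but avoids overflow in $\exp(\cdot)$. A minimal sketch with hypothetical values, together with the multiclass log loss for one observation:

```python
import numpy as np

def softmax_stable(z):
    # offset invariance: s(z - max(z)) = s(z), but exp() can no longer overflow
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def multiclass_log_loss(y, probs):
    # L(y, pi(x)) = -sum_k 1{y = k} log(pi_k(x)); classes encoded as 0, ..., g-1 here
    return -np.log(probs[y])

scores = np.array([1.0, 1000.0, 2.0])
probs = softmax_stable(scores)
print(probs)                          # ~ (0, 1, 0): the "smooth arg max" behaviour
print(multiclass_log_loss(1, probs))  # nearly zero loss if the true class is the 2nd one
```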
LOGISTIC VS. SOFTMAX REGRESSION
| | Logistic Regression | Softmax Regression |
|---|---|---|
| $\mathcal{Y}$ | $\{0, 1\}$ | $\{1, 2, \ldots, g\}$ |
| Discriminant fun. | $f(x) = \theta^\top x$ | $f_k(x) = \theta_k^\top x,\ k = 1, 2, \ldots, g$ |
| Probabilities | $\pi(x) = \frac{1}{1 + \exp(-\theta^\top x)}$ | $\pi_k(x) = \frac{\exp(\theta_k^\top x)}{\sum_{j=1}^g \exp(\theta_j^\top x)}$ |
| $L(y, \pi(x))$ | Bernoulli / logarithmic loss: $-y \log(\pi(x)) - (1 - y) \log(1 - \pi(x))$ | Multiclass logarithmic loss: $-\sum_{k=1}^g \mathbb{1}\{y = k\} \log(\pi_k(x))$ |
LOGISTIC VS. SOFTMAX REGRESSION
We can schematically depict softmax regression as follows:
[Figure: schematic depiction of softmax regression]
LOGISTIC VS. SOFTMAX REGRESSION
Further comments:
We can now, for instance, calculate gradients of the empirical risk and optimize it with standard numerical optimization software.

Softmax regression has an unusual property in that it has a “redundant” set of parameters: if we subtract a fixed vector from all $\theta_k$, the predictions do not change at all. Hence, our model is “over-parameterized”: for any hypothesis we might fit, there are multiple parameter vectors that give rise to exactly the same hypothesis function. This also implies that the minimizer of $\mathcal{R}_{\text{emp}}(\theta)$ above is not unique! Hence, a numerical trick is to set $\theta_g = 0$ and only optimize the other $\theta_k$. This does not restrict our hypothesis space; the constrained problem is still convex, and there now exists exactly one parameter vector for every hypothesis.

A similar approach is used in many ML models: multiclass LDA, naive Bayes, neural networks and boosting.
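A sketch combining both comments above (toy data, names, and settings are all made up for illustration): the empirical risk is handed to a standard numerical optimizer, and the redundancy is removed by fixing $\theta_g = 0$ and optimizing only the remaining $g - 1$ parameter vectors.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, g = 200, 2, 3
X = rng.normal(size=(n, p))            # toy feature matrix
y = rng.integers(0, g, size=n)         # toy labels, encoded as 0, ..., g-1

def emp_risk(theta_flat):
    # theta_g is fixed to 0; only (g - 1) parameter vectors are optimized
    Theta = np.vstack([theta_flat.reshape(g - 1, p), np.zeros(p)])
    scores = X @ Theta.T                                   # n x g matrix of f_k(x)
    scores = scores - scores.max(axis=1, keepdims=True)    # offset invariance
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n), y].mean()              # averaged multiclass log loss

res = minimize(emp_risk, x0=np.zeros((g - 1) * p), method="BFGS")
print(res.fun)   # minimized empirical risk; res.x contains theta_1, ..., theta_{g-1}
```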
SOFTMAX: LINEAR DISCRIMINANT FUNCTIONS
Softmax regression gives us a linear classifier.
The softmax function

$$s(z)_k = \frac{\exp(z_k)}{\sum_{j=1}^g \exp(z_j)}$$

is a rank-preserving function, i.e., the ranks among the elements of the vector $z$ are the same as among the elements of $s(z)$. This is because softmax transforms all scores by the (rank-preserving) $\exp(\cdot)$ and divides each element by the same normalizing constant.

Thus, the softmax function has a monotonic, rank-preserving inverse on its image (unique up to the constant offset discussed above).

Applying this inverse to $\pi_k(x) = \frac{\exp(\theta_k^\top x)}{\sum_{j=1}^g \exp(\theta_j^\top x)}$ recovers the linear scores $f_k(x) = \theta_k^\top x$ (up to that offset).

Thus, softmax regression is a linear classifier.
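A quick numerical check of the rank-preservation argument (the scores are random and purely hypothetical): the class with the largest softmax probability is always the class with the largest linear score, so the predicted class depends on the linear scores only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # stable, thanks to the offset invariance
    return e / e.sum()

rng = np.random.default_rng(1)
for _ in range(100):
    scores = rng.normal(size=4)    # hypothetical f_k(x) for g = 4 classes
    probs = softmax(scores)
    # rank preservation: the ordering of probabilities equals the ordering of scores
    assert (np.argsort(probs) == np.argsort(scores)).all()
    assert np.argmax(probs) == np.argmax(scores)
print("arg max over probabilities == arg max over linear scores")
```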
GENERALIZING SOFTMAX REGRESSION
Instead of simple linear discriminant functions, we could use any model that outputs $g$ scores

$$f_k(x) \in \mathbb{R}, \quad k = 1, 2, \ldots, g.$$

We can choose a multiclass loss and optimize the score functions $f_k$, $k \in \{1, \ldots, g\}$, by multivariate minimization. The scores can be transformed into probabilities by the softmax function.
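A sketch of this idea under made-up names: any model that maps $x$ to $g$ real-valued scores can be plugged into the same softmax + multiclass log loss pipeline; here a tiny hand-written nonlinear score function (not from the slides) stands in for “any model”.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def scores_nonlinear(x, W1, W2):
    # any model producing g real-valued scores f_k(x) works;
    # here: one hidden tanh layer, purely as an illustrative stand-in
    return W2 @ np.tanh(W1 @ x)

rng = np.random.default_rng(2)
W1 = rng.normal(size=(5, 2))    # hypothetical weights: 2 features -> 5 hidden units
W2 = rng.normal(size=(3, 5))    # 5 hidden units -> g = 3 scores

x, y = np.array([0.3, -1.0]), 2
probs = softmax(scores_nonlinear(x, W1, W2))   # scores -> probabilities
print(probs, -np.log(probs[y]))                # same multiclass log loss as before
```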
GENERALIZING SOFTMAX REGRESSION
For example, for a neural network (note that softmax regression is also a neural network with no hidden layers):

[Figure: schematic depiction of the network]

Remark: For more details about neural networks, please refer to the lecture Deep Learning.