Improve the mathematical description of Logistic Regression #21985
Closed
@ogrisel

Description

Describe the issue linked to the documentation

The current description at https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression is a bit confusing:

  • it only presents the binary case with the sigmoid logistic loss, and does not give the equation for the multiclass case with the multinomial loss function;
  • the fact that y_i has values in {-1, 1} is explained but should probably appear earlier;
  • it does not present the prediction function, nor explain where the loss function comes from.

Suggest a potential alternative/fix

I think we should first present the two equations for binary logistic regression and multinomial logistic regression with l2 regularization, and give the encoding of y_i right below each equation, so as to put the binary and the multiclass cases on an equal footing.
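
For concreteness, a sketch of what those two side-by-side objectives could look like; the notation (w and c for the binary case, W, b and K classes for the multinomial case) is assumed here rather than taken from the current docs:

```latex
% Binary case, with y_i \in \{-1, 1\}:
\min_{w, c} \; \frac{1}{2} w^\top w
  + C \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i (x_i^\top w + c))\bigr)

% Multinomial case, with y_i \in \{1, \dots, K\}:
\min_{W, b} \; \frac{1}{2} \|W\|_F^2
  - C \sum_{i=1}^{n} \log \hat{p}_{i, y_i},
\quad \text{where } \hat{p}_{i, k}
  = \frac{\exp(x_i^\top W_k + b_k)}{\sum_{l=1}^{K} \exp(x_i^\top W_l + b_l)}
```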

The possibility of swapping the l2 regularization for l1 or elastic-net regularization should be moved to a dedicated subsection that would then give the formulas for those regularized loss functions, but only for the binary case for the sake of conciseness.
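
On the estimator side, that swap is exposed through the penalty parameter; a minimal sketch of the three variants (note that l1 and elastic-net constrain the solver choice):

```python
from sklearn.linear_model import LogisticRegression

# l2 (the default): adds 1/2 * ||w||_2^2, supported by all solvers
clf_l2 = LogisticRegression(penalty="l2", C=1.0)

# l1: adds ||w||_1; needs a solver that supports it, e.g. liblinear or saga
clf_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)

# elastic-net: mix of l1 and l2 controlled by l1_ratio; saga only
clf_en = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0)
```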

I think we should also have a subsection to make it explicit that, for the multinomial case, scikit-learn's implementation over-parametrizes the model: the coef_ array has shape (n_classes, n_features), while it would be possible to alternatively use an (n_classes - 1, n_features) parametrization, as is often done in the literature. We could justify the choice of the over-parametrized formulation as preserving the symmetrical inductive bias w.r.t. the classes, which is especially important because of the penalization term.
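
A minimal check of that over-parametrized shape on a toy multiclass dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features
clf = LogisticRegression(max_iter=1000).fit(X, y)

# One weight vector per class, even though K - 1 would suffice:
print(clf.coef_.shape)       # (3, 4), i.e. (n_classes, n_features)
print(clf.intercept_.shape)  # (3,)
```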

We might also have another subsection that gives the mathematical description of the prediction function, both for the binary case (with the logistic sigmoid) and the multinomial case (with the softmax function), and explains how those prediction functions stem from the modeling choice to parameterize log-ratios of conditional class probabilities with linear combinations of the input features. It should finally show how to recover the loss functions by taking the negative log-likelihood of those prediction functions, yielding the MLE (or the MAP estimate of a Bayesian formulation when the penalty term is added).
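
As a sketch of that correspondence, assuming the multinomial (softmax) formulation that the default lbfgs solver uses on multiclass problems, the predicted probabilities can be reconstructed from coef_ and intercept_:

```python
import numpy as np
from scipy.special import expit, softmax
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Multinomial case: softmax over the per-class linear scores
scores = X @ clf.coef_.T + clf.intercept_
np.testing.assert_allclose(softmax(scores, axis=1), clf.predict_proba(X))

# Binary case: the logistic sigmoid of the single linear score
Xb, yb = X[y < 2], y[y < 2]
clf_bin = LogisticRegression(max_iter=1000).fit(Xb, yb)
score = (Xb @ clf_bin.coef_.T + clf_bin.intercept_).ravel()
np.testing.assert_allclose(expit(score), clf_bin.predict_proba(Xb)[:, 1])
```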
