Improve the mathematical description of Logistic Regression #21985
Closed
@ogrisel

Description

Describe the issue linked to the documentation

The current description at https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression is a bit confusing:

  • it only presents the binary case with the sigmoid logistic loss, and does not give the equation for the multiclass case with the multinomial loss function;
  • the fact that y_i has values in {-1, 1} is explained but should probably appear earlier;
  • it does not present the prediction function, nor explain where the loss function comes from.

Suggest a potential alternative/fix

I think we should first present the two equations for binary logistic regression and multinomial logistic regression with l2 regularization, and give the encoding of y_i right below each equation, so as to put the binary and the multiclass cases on an equal footing.
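
For concreteness, a sketch of what those two side-by-side objectives could look like; the notation (w and c for the binary case, W, b and K classes for the multinomial case) is assumed here rather than taken from the current docs:

```latex
% Binary case, with y_i \in \{-1, 1\}:
\min_{w, c} \; \frac{1}{2} w^\top w
  + C \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i (x_i^\top w + c))\bigr)

% Multinomial case, with y_i \in \{1, \dots, K\}:
\min_{W, b} \; \frac{1}{2} \|W\|_F^2
  - C \sum_{i=1}^{n} \log \hat{p}_{i, y_i},
\quad \text{where } \hat{p}_{i, k}
  = \frac{\exp(x_i^\top W_k + b_k)}{\sum_{l=1}^{K} \exp(x_i^\top W_l + b_l)}
```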

The possibility of swapping the l2 regularization for l1 or elastic-net regularization should be moved to a dedicated subsection that would then give the formulas for those regularized loss functions, but only for the binary case for the sake of conciseness.
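
On the estimator side, that swap is exposed through the penalty parameter; a minimal sketch of the three variants (note that l1 and elastic-net constrain the solver choice):

```python
from sklearn.linear_model import LogisticRegression

# l2 (the default): adds 1/2 * ||w||_2^2, supported by all solvers
clf_l2 = LogisticRegression(penalty="l2", C=1.0)

# l1: adds ||w||_1; needs a solver that supports it, e.g. liblinear or saga
clf_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)

# elastic-net: mix of l1 and l2 controlled by l1_ratio; saga only
clf_en = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0)
```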

I think we should also have a subsection to make it explicit that, for the multinomial case, scikit-learn's implementation over-parametrizes the model: the coef_ array has shape (n_classes, n_features), while it would be possible to alternatively use an (n_classes - 1, n_features) parametrization, as is often done in the literature. We could justify the choice of the over-parametrized formulation as preserving the symmetrical inductive bias w.r.t. the classes, which is especially important because of the penalization term.
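
A minimal check of that over-parametrized shape on a toy multiclass dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features
clf = LogisticRegression(max_iter=1000).fit(X, y)

# One weight vector per class, even though K - 1 would suffice:
print(clf.coef_.shape)       # (3, 4), i.e. (n_classes, n_features)
print(clf.intercept_.shape)  # (3,)
```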

We might also have another subsection that gives the mathematical description of the prediction function, both for the binary case (with the logistic sigmoid) and the multinomial case (with the softmax function), and explains how those prediction functions stem from the modeling choice to parameterize log-ratios of conditional class probabilities with linear combinations of the input features. It should finally show how to recover the loss functions by taking the negative log-likelihood of those prediction functions, yielding the MLE (or the MAP estimate of a Bayesian formulation when the penalty term is added).
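
As a sketch of that correspondence, assuming the multinomial (softmax) formulation that the default lbfgs solver uses on multiclass problems, the predicted probabilities can be reconstructed from coef_ and intercept_:

```python
import numpy as np
from scipy.special import expit, softmax
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Multinomial case: softmax over the per-class linear scores
scores = X @ clf.coef_.T + clf.intercept_
np.testing.assert_allclose(softmax(scores, axis=1), clf.predict_proba(X))

# Binary case: the logistic sigmoid of the single linear score
Xb, yb = X[y < 2], y[y < 2]
clf_bin = LogisticRegression(max_iter=1000).fit(Xb, yb)
score = (Xb @ clf_bin.coef_.T + clf_bin.intercept_).ravel()
np.testing.assert_allclose(expit(score), clf_bin.predict_proba(Xb)[:, 1])
```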
