
Document what cost function LogisticRegression minimizes for each solver #10164


Open
allComputableThings opened this issue Nov 17, 2017 · 3 comments


allComputableThings commented Nov 17, 2017

Description

Different choices of solver for sklearn's LogisticRegression optimize different cost functions. This is highly confusing behavior, and particularly of concern if you want to publish what cost function you're using. In particular:

sklearn.LR(solver=liblinear) minimizes:  L + lam*Rb
sklearn.LR(solver=others) minimizes:     L + lam*R
statsmodels.GLM(binomial) minimizes:     L/n + lam*Rb

where:

lam = 1/C
L = log loss
n = training sample size
R = squared L2 norm of the feature weights
Rb = squared L2 norm of the feature weights and the intercept
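
For reference, this is the objective as I read it from the scikit-learn user guide (the non-liblinear case, with the intercept c unpenalized, and y_i in {-1, 1}); it matches the L + lam*R form above after multiplying through by C, up to the conventional 1/2 on the penalty:

```latex
\min_{w, c} \; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \log\left( \exp\bigl( -y_i (x_i^\top w + c) \bigr) + 1 \right)
```

For liblinear, the intercept c is effectively appended to w (scaled by intercept_scaling), so it is penalized too, which gives the Rb variant above.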

I was a little surprised to find that the log loss is not normalized by the training set size. I think this is uncommon, and it means the effective C changes with the amount of training data. Good thing, bad thing? I'm not sure, but it seems unusual; more importantly, what is minimized should be explicit.

PS. #10001 is an excellent idea! The default liblinear cost function is just plain confusing.

Steps/Code to Reproduce

There's an example to show the different weights here:

https://stackoverflow.com/questions/47338695/why-does-the-choice-of-solver-result-in-different-weight-in-sklearn-logisticreg
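
Here is a minimal sketch of my own (not the code from the linked post; the dataset and C value are arbitrary) showing that liblinear and lbfgs return different weights for the same C:

```python
# Minimal sketch (not the Stack Overflow code): same data, same C,
# different solver -> different coefficients and intercept, because
# liblinear also penalizes the intercept.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for solver in ("liblinear", "lbfgs"):
    clf = LogisticRegression(solver=solver, C=0.1).fit(X, y)
    print(solver, clf.coef_.ravel(), clf.intercept_)
```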

Expected Results

Actual Results

Versions

0.19.1

@allComputableThings allComputableThings changed the title Document what cost function that LogisticRegression minimizes for each solver Document what cost function LogisticRegression minimizes for each solver Nov 17, 2017
amueller commented Nov 17, 2017

PR welcome!
Yes, this is very confusing indeed. Thank you for your summary. I think having a section in the user guide committed to it would help. The not scaling the regularization is basically for consistency with linearSVC (which is also implemented in liblinear). But it's also the parametrization that "makes sense" IIRC. That means that this is the parametrization that is least dependent on the number of samples. We did extensive experiments a couple of years ago, and from what I remember, we found that for L2 not scaling by n_samples is the right way to go, and for L1 scaling by n_samples is. We have the same scaling behavior in Ridge btw.

TomDLT commented Nov 20, 2017

> We found that for L2 not scaling by n_samples is the right way to go, and for L1 scaling by n_samples is.

See this example.

Otherwise, the cost function is documented here. The liblinear intercept regularization could be advertised more prominently, though. PR welcome!
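
To make the intercept point concrete, here is a toy sketch of my own (the class imbalance and tiny C are chosen to exaggerate the effect): with strong regularization, liblinear shrinks the intercept toward zero, while lbfgs, which leaves the intercept unpenalized, keeps it near the log-odds of the class balance.

```python
# My own toy illustration: liblinear penalizes the intercept, lbfgs does not.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(500, 2)
y = (X[:, 0] > -1).astype(int)  # ~84% positives, so a large intercept is needed

for solver in ("liblinear", "lbfgs"):
    clf = LogisticRegression(solver=solver, C=1e-3).fit(X, y)
    print(solver, "intercept:", clf.intercept_)
```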

@amueller
Thanks for pulling out the example. The thing that the docs don't mention is the difference in penalizing the intercept, right? Otherwise they seem pretty complete.
