Description
Different choices of solver for sklearn's LogisticRegression optimize different cost functions. This is highly confusing behavior, and it is of particular concern if you want to publish exactly which cost function you are using. In particular:
sklearn.LR(solver=liblinear) minimizes: L + lam*Rb
sklearn.LR(solver=others) minimizes: L + lam*R
statsmodels.GLM(binomial) minimizes: L/n + lam*Rb
where:
lam = 1/C
L = logloss
n = training sample size
R = square of L2 norm of feature weights
Rb = square of L2 norm of feature weights and intercept
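To make the Rb vs R difference concrete, here is a minimal sketch (the toy data and C value are my own, not from any sklearn example): with strong regularization, liblinear shrinks the intercept toward zero because the intercept sits inside its penalty term, while lbfgs leaves it unpenalized.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
# Shift the decision boundary so the unpenalized intercept is clearly nonzero.
y = (np.dot(X, [1.0, -2.0, 0.5]) + 3.0 > 0).astype(int)

for solver in ("liblinear", "lbfgs"):
    clf = LogisticRegression(C=0.01, solver=solver).fit(X, y)
    print(solver, "intercept:", clf.intercept_, "coef:", clf.coef_.ravel())

# Expected: liblinear reports a much smaller intercept than lbfgs,
# since only liblinear's objective penalizes the intercept (Rb vs R).
```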
I was a little surprised to find that the logloss is not normalized by the training set size. I think this is uncommon, and it means the effective C changes with the amount of training data. Good thing, bad thing? I'm not sure; it seems unusual, but more importantly, what is minimized should be made explicit.
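A quick way to see the n-dependence (my own construction, not from any sklearn example): because the logloss is summed rather than averaged, duplicating every training sample is equivalent to doubling C, so the fit changes even though the data distribution has not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X2, y2 = np.tile(X, (2, 1)), np.tile(y, 2)  # every sample appears twice

a = LogisticRegression(C=1.0, solver="lbfgs").fit(X, y)
b = LogisticRegression(C=1.0, solver="lbfgs").fit(X2, y2)
c = LogisticRegression(C=0.5, solver="lbfgs").fit(X2, y2)

print(a.coef_)  # baseline
print(b.coef_)  # differs from baseline: the summed loss doubled, the penalty did not
print(c.coef_)  # halving C compensates; matches the baseline up to solver tolerance
```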
PS. #10001 --- excellent idea! The default liblinear cost function is just plain confusing.
Steps/Code to Reproduce
There's an example to show the different weights here:
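(The original link isn't reproduced above, so here is a minimal stand-in sketch; the dataset and C value are my own choices, not from the linked example.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(42)
X = rng.randn(50, 2)
y = (X[:, 0] - X[:, 1] + 1.0 > 0).astype(int)

for solver in ("liblinear", "lbfgs", "newton-cg"):
    clf = LogisticRegression(C=0.1, solver=solver).fit(X, y)
    print("%-10s coef=%s intercept=%s"
          % (solver, clf.coef_.ravel(), clf.intercept_))

# lbfgs and newton-cg agree up to tolerance (same objective);
# liblinear differs because its penalty also includes the intercept.
```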
Expected Results
All solvers minimize the same, explicitly documented cost function.
Actual Results
The minimized cost function depends on the solver (liblinear also penalizes the intercept), and the logloss is summed rather than averaged over samples.
Versions
scikit-learn 0.19.1