Convergence issues in l1 logistic regression path example · Issue #15903 · scikit-learn/scikit-learn · GitHub
Closed
amueller opened this issue Dec 16, 2019 · 6 comments · Fixed by #15927
@amueller
Member

This example:
https://scikit-learn.org/dev/auto_examples/linear_model/plot_logistic_path.html

Shows wrong results according to
https://papers.nips.cc/paper/8491-geno-generic-optimization-for-classical-machine-learning.pdf

I'm surprised the data is not scaled; I wouldn't expect saga to work well without scaling. That paper has some other interesting graphs as well.

cc @agramfort who might now actually see this ping [you did unfollow the repo, right?]
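For reference, a minimal sketch of what scaling before saga could look like. The iris two-class subset is assumed from the linked example; the exact preprocessing there may differ:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Two-class iris subset, as in the plot_logistic_path example.
X, y = datasets.load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

# Standardize features to zero mean / unit variance; SAGA's step size
# is sensitive to feature scaling.
X_scaled = StandardScaler().fit_transform(X)

clf = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=1000)
clf.fit(X_scaled, y)
```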

@agramfort
Member

I will look tomorrow.

@agramfort
Member

I can make liblinear match GENO in figure 1 by setting intercept_scaling=10000, so that the intercept is barely regularized.

Now regarding SAGA, I agree that it's weird. SAGA should converge to machine precision, so the graphs should match... bug? cc @TomDLT @arthurmensch
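A sketch of the liblinear setting described above (the iris subset is again assumed from the example). liblinear penalizes the intercept along with the coefficients, so a large intercept_scaling makes that penalty negligible:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

# liblinear regularizes the intercept term; a large intercept_scaling
# effectively removes that regularization, closer to the formulation
# with an unpenalized intercept.
clf = LogisticRegression(penalty="l1", solver="liblinear",
                         intercept_scaling=10000, C=1.0)
clf.fit(X, y)
```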

@TomDLT
Member
TomDLT commented Dec 17, 2019

SAGA does poorly on small datasets, since the gradient estimate is very noisy. This is even worse when C is large, since the soft-thresholding (1 / C) is too small to kill even the smallest noisy coefficient variations.

In the example, if you increase the number of iterations and decrease the tolerance (tol=1e-10, max_iter=int(1e8)), you get almost the same results. It is very slow (2230 sec), but I don't think we should use SAGA on such a small dataset anyway, especially with large C.

[Attached figures Figure_1 and Figure_2: regularization paths obtained with the tighter settings]
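A runnable sketch of the tighter-settings experiment. The values here are deliberately scaled down so the snippet finishes quickly; the actual run reported above used tol=1e-10 and max_iter=int(1e8), and the iris subset is assumed from the example:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

# Same idea as tol=1e-10, max_iter=int(1e8) (which took ~2230 s):
# tighten the tolerance and give SAGA many more epochs than the default.
clf = LogisticRegression(penalty="l1", solver="saga",
                         tol=1e-6, max_iter=int(1e5), C=1.0)
clf.fit(X, y)
```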

@agramfort
Member
agramfort commented Dec 17, 2019 via email

@TomDLT
Member
TomDLT commented Dec 17, 2019

Note that directly calling _logistic_regression_path with the entire list of C values does not change the computation time or the results, despite using the more advanced SAGA warm starting:

warm_start_mem = {'coef': coef_init, 'sum_gradient': sum_gradient_init,
                  'intercept_sum_gradient': intercept_sum_gradient,
                  'gradient_memory': gradient_memory_init,
                  'seen': seen_init, 'num_seen': num_seen}
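_logistic_regression_path and its warm_start_mem are private internals. With the public API, a weaker, coefficient-only warm start across the C grid can be sketched like this (it carries no gradient memory between fits; the iris subset and C grid are assumptions):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

cs = np.logspace(-1, 2, 10)

# warm_start=True reuses coef_ from the previous fit as initialization
# for the next C on the grid.
clf = LogisticRegression(penalty="l1", solver="saga", tol=1e-3,
                         max_iter=5000, warm_start=True)
coefs = []
for c in cs:
    clf.set_params(C=c)
    clf.fit(X, y)
    coefs.append(clf.coef_.ravel().copy())

coefs = np.array(coefs)
```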

@amueller
Member Author

Can we warn though?
And then this example is bad and we should use a different solver ;)