RidgeCV with sample_weights and 'svd' gcv mode · Issue #13321 · scikit-learn/scikit-learn · GitHub

RidgeCV with sample_weights and 'svd' gcv mode #13321


Closed
jeromedockes opened this issue Feb 28, 2019 · 0 comments · Fixed by #13350
jeromedockes commented Feb 28, 2019

When sample weights are provided, RidgeCV never uses an SVD of the design
matrix and always uses an eigendecomposition of the Gram matrix:

# FIXME non-uniform sample weights not yet supported

I'm not sure why this is the case. The strategy of multiplying X and y by the
square root of the sample weights, used when gcv_mode is 'eigen', should work in
the same way for the SVD solver. If I simply remove the lines that set gcv_mode
to 'eigen' when sample weights are provided, the tests still pass and the fitted
coefficients are the same whether we use 'eigen' or 'svd'. Does somebody know
what I am missing?
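To make the equivalence concrete, here is a minimal NumPy sketch (not scikit-learn's implementation). It assumes no intercept and a single fixed alpha, and all variable names are made up for illustration. It rescales X and y by the square root of the weights, then solves the ridge problem both through an SVD of the rescaled design matrix and through an eigendecomposition of its Gram matrix, and compares both to the weighted normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, alpha = 50, 5, 1.0
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)
w = rng.uniform(0.5, 2.0, size=n_samples)  # non-uniform sample weights

# Reference: weighted ridge via the normal equations (no intercept).
W = np.diag(w)
coef_ref = np.linalg.solve(
    X.T @ W @ X + alpha * np.eye(n_features), X.T @ W @ y
)

# Rescale rows by sqrt(w): the weighted problem becomes an unweighted one.
sqrt_w = np.sqrt(w)
Xs = X * sqrt_w[:, None]
ys = y * sqrt_w

# Route 1: SVD of the rescaled design matrix.
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
coef_svd = Vt.T @ ((s / (s**2 + alpha)) * (U.T @ ys))

# Route 2: eigendecomposition of the (n_samples x n_samples) Gram matrix.
K = Xs @ Xs.T
eigvals, Q = np.linalg.eigh(K)
dual_coef = Q @ ((Q.T @ ys) / (eigvals + alpha))
coef_eigen = Xs.T @ dual_coef

print(np.allclose(coef_ref, coef_svd), np.allclose(coef_ref, coef_eigen))
```

This only checks the fitted coefficients for one alpha; it does not cover the leave-one-out error computation or the intercept handling, which is where a real difference between the two modes could still hide.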

At the very least, a warning could be emitted when the Gram matrix is used
because of the sample weights, even though the number of samples is greater
than the number of features. Otherwise a user fitting a RidgeCV with many
samples, few features, sample weights, and the default parameters might be
surprised to see that it takes a long time and a lot of memory. At the moment,
such a warning is only emitted when the user explicitly asked for
gcv_mode='svd':

gcv_mode = 'svd'

warnings.warn("non-uniform sample weights unsupported for svd, "

Also, this warning message could be a bit more explicit, explaining the
performance implications of using 'eigen' rather than 'svd' when
n_samples > n_features.
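To give a rough idea of those implications, the sketch below estimates the memory needed to store the dense Gram matrix versus the thin SVD factors of the design matrix. The helper names and the example sizes are hypothetical, float64 storage is assumed, and temporary workspace used by the solvers is ignored.

```python
def gram_matrix_bytes(n_samples, itemsize=8):
    # The Gram matrix X @ X.T is n_samples x n_samples.
    return n_samples * n_samples * itemsize


def thin_svd_bytes(n_samples, n_features, itemsize=8):
    # Thin SVD stores U (n_samples x k), s (k,) and Vt (k x n_features).
    k = min(n_samples, n_features)
    return (n_samples * k + k + k * n_features) * itemsize


n_samples, n_features = 100_000, 10  # many samples, few features
gram_gb = gram_matrix_bytes(n_samples) / 1e9
svd_mb = thin_svd_bytes(n_samples, n_features) / 1e6
print(f"Gram matrix: {gram_gb:.0f} GB, thin SVD factors: {svd_mb:.1f} MB")
```

With these example sizes the Gram matrix alone needs about 80 GB, against about 8 MB for the SVD factors, which is the kind of figure a more explicit warning could point to.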
