Description
When sample weights are provided, RidgeCV never uses an SVD decomposition of the
design matrix and always uses an eigendecomposition of the Gram matrix:
scikit-learn/sklearn/linear_model/ridge.py
Line 1036 in 7389dba
I'm not sure why this is the case. The strategy of multiplying X and Y by the
square root of the sample weights, used when gcv_mode is 'eigen', should work in
the same way for the 'svd' solver. If I simply remove the lines that set gcv_mode
to 'eigen' when sample weights are provided, the tests still pass and the fitted
coefficients are the same whether we use 'eigen' or 'svd'. Does somebody know
what I am missing?
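To illustrate the claim, here is a minimal NumPy sketch (not the scikit-learn code itself) that checks it for a single, fixed alpha. It assumes no intercept is fitted, and all variable names are made up for the example. After rescaling X and y by the square root of the sample weights, the SVD route and the Gram-matrix eigendecomposition route should both recover the weighted ridge solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 50, 5
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)
sample_weight = rng.uniform(0.5, 2.0, size=n_samples)
alpha = 1.0  # a single regularization strength, chosen arbitrarily

# Rescale rows by sqrt(sample_weight); assumes no intercept is fitted.
sqrt_sw = np.sqrt(sample_weight)
Xs = X * sqrt_sw[:, None]
ys = y * sqrt_sw

# Reference: weighted ridge via the normal equations
# (X^T W X + alpha I) coef = X^T W y.
coef_ref = np.linalg.solve(
    X.T @ (sample_weight[:, None] * X) + alpha * np.eye(n_features),
    X.T @ (sample_weight * y),
)

# 'svd'-style route: thin SVD of the rescaled design matrix.
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
coef_svd = Vt.T @ ((s / (s ** 2 + alpha)) * (U.T @ ys))

# 'eigen'-style route: eigendecomposition of the
# (n_samples x n_samples) Gram matrix of the rescaled data.
K = Xs @ Xs.T
eigvals, Q = np.linalg.eigh(K)
dual_coef = Q @ ((Q.T @ ys) / (eigvals + alpha))
coef_eigen = Xs.T @ dual_coef

print(np.allclose(coef_svd, coef_ref), np.allclose(coef_eigen, coef_ref))
```

This only compares the fitted coefficients. It does not check the leave-one-out errors that RidgeCV uses to select alpha, nor the intercept handling, so one of those may still be where the two modes differ.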
At the very least, a warning could be emitted when the Gram matrix is used
because of the sample weights, despite the number of samples being greater than
the number of features. Otherwise, a user fitting a RidgeCV with many samples,
few features, sample weights, and the default parameters might be surprised to
see that it takes a long time and a lot of memory. At the moment, such a warning
is only emitted when the user explicitly asked for gcv_mode='svd':
scikit-learn/sklearn/linear_model/ridge.py
Line 1034 in 7389dba
scikit-learn/sklearn/linear_model/ridge.py
Line 1037 in 7389dba
Also, this warning message could be a bit more explicit, explaining the
performance implications of using 'eigen' rather than 'svd' when
n_samples > n_features.
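To make the memory implication concrete, here is a rough back-of-the-envelope sketch. The helper names and the problem size are made up for illustration, and dense float64 arrays are assumed. The Gram matrix has shape (n_samples, n_samples), whereas a thin SVD only needs factors about the size of the design matrix itself:

```python
def gram_matrix_bytes(n_samples, itemsize=8):
    """Bytes needed for a dense (n_samples x n_samples) Gram matrix."""
    return n_samples * n_samples * itemsize


def design_matrix_bytes(n_samples, n_features, itemsize=8):
    """Bytes needed for a dense (n_samples x n_features) design matrix."""
    return n_samples * n_features * itemsize


# Hypothetical problem size: many samples, few features.
n_samples, n_features = 100_000, 10
gram_gb = gram_matrix_bytes(n_samples) / 1e9
design_mb = design_matrix_bytes(n_samples, n_features) / 1e6
print(f"Gram matrix: {gram_gb:.0f} GB, design matrix: {design_mb:.0f} MB")
```

With these sizes the Gram matrix alone takes 80 GB, against 8 MB for the design matrix. Eigendecomposing it also scales cubically with n_samples, while a thin SVD of the design matrix scales roughly as n_samples * n_features**2.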