Fix Ridge sparse + sample_weight + intercept #22899

jeremiedbb · 2022-03-19T01:36:14Z

Same issue as in LinearRegression: when X is sparse, X_offset needs the sample_weight rescaling.
The issue appears with solver = 'sparse_cg' and solver = 'lbfgs'.
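
To make the rescaling argument concrete, here is a minimal NumPy sketch. It is not the scikit-learn implementation: the closed-form solve, the helper name `ridge_with_implicit_offset`, and the toy data are assumptions for illustration. It assumes sample weights are applied by scaling rows by `sqrt(sample_weight)` and that the sparse code path keeps X uncentered and subtracts the offset implicitly. Under those assumptions the centered, rescaled matrix is `diag(sqrt_sw) @ X - outer(sqrt_sw, X_offset)`, so the offset term must carry the `sqrt_sw` factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 20, 4
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)
sw = rng.uniform(0.5, 2.0, size=n_samples)  # non-uniform sample weights
alpha = 1.0

# Weighted means used for centering.
X_offset = np.average(X, axis=0, weights=sw)
y_offset = np.average(y, weights=sw)
sqrt_sw = np.sqrt(sw)

# Reference (dense path): center explicitly, then rescale rows by sqrt(sw).
Xc = sqrt_sw[:, None] * (X - X_offset)
yc = sqrt_sw * (y - y_offset)
coef_ref = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)

# Sparse-style path: rows are rescaled but X is never centered in place.
X_rescaled = sqrt_sw[:, None] * X


def ridge_with_implicit_offset(offset_scale):
    """Hypothetical helper: solve ridge with the offset applied as a rank-1 term.

    A real sparse solver would apply this operator lazily; it is materialized
    here only to keep the sketch short.
    """
    A = X_rescaled - np.outer(offset_scale, X_offset)
    return np.linalg.solve(A.T @ A + alpha * np.eye(n_features), A.T @ yc)


coef_rescaled_offset = ridge_with_implicit_offset(sqrt_sw)  # offset scaled by sqrt(sw)
coef_plain_offset = ridge_with_implicit_offset(np.ones(n_samples))  # offset left unscaled
```

With the `sqrt_sw` factor the implicit operator equals the explicitly centered matrix, so the coefficients match the reference; without it they generally do not when the weights are non-uniform.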

~~I noticed that there's also an issue for 'sag' solver, but I'm not familiar with the code at all. Any help would be greatly appreciated. We can also leave that for a separate PR.~~

ping @lorentzenchr

lorentzenchr

Could you also parametrize test_ridge_sample_weights and test_ridge_sample_weight_invariance for sparse input?

sklearn/linear_model/_ridge.py

sklearn/linear_model/tests/test_base.py

sklearn/linear_model/tests/test_ridge.py

lorentzenchr · 2022-03-19T09:51:44Z

sklearn/linear_model/tests/test_ridge.py

+
+
+@pytest.mark.parametrize("solver", ["sparse_cg", "sag"])
+def test_ridge_sample_weights_dense_sparse(solver, global_random_seed):


Could this be merged with test_ridge_fit_intercept_sparse (and maybe also test_ridge_fit_intercept_sparse_sag)? (Take the best parts of both.)
They seem largely duplicated.

Right, I merged them into a single test. Doing so revealed that lbfgs had the same issue, which I fixed as well.

lorentzenchr · 2022-03-19T09:56:41Z

This is great, as it will resolve a long-standing, severe bug!

jeremiedbb · 2022-03-19T17:36:18Z

sklearn/linear_model/tests/test_ridge.py

@@ -1362,33 +1362,41 @@ def test_n_iter():


 @pytest.mark.parametrize("solver", ["sparse_cg", "lbfgs", "auto"])
-def test_ridge_fit_intercept_sparse(solver):
+@pytest.mark.parametrize("with_sample_weight", [True, False])
+def test_ridge_fit_intercept_sparse(solver, with_sample_weight, global_random_seed):


tested with "all" seeds

lorentzenchr

LGTM

sklearn/linear_model/_ridge.py

lorentzenchr · 2022-03-22T17:22:29Z

@agramfort @rth @TomDLT may be interested.

ogrisel · 2022-03-22T18:00:10Z

sklearn/linear_model/tests/test_ridge.py

+    For now only sparse_cg and lbfgs can correctly fit an intercept
+    with sparse X with default tol and max_iter.
+    'sag' is tested separately in test_ridge_fit_intercept_sparse_sag because it
+    requires more iterations and should raise a warning if default max_iter is used.


I pushed a new commit to make this true with sample_weight != None. I tested it with "all" global_random_seed.

ogrisel

I had a shallow look at the code changes and it looks ok-ish but I did not check the maths. I just trust the updated tests.

+1 for merge once the CI is green.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

scale X_offset sparse_cg

13491ff

jeremiedbb added the module:linear_model label Mar 19, 2022

jeremiedbb added this to the 1.1 milestone Mar 19, 2022

what's new

6b56daf

lorentzenchr reviewed Mar 19, 2022

jeremiedbb added 3 commits March 19, 2022 17:32

address comments + fix lbfgs

d48f138

cln

b93b272

cln

f407a06

jeremiedbb commented Mar 19, 2022

lorentzenchr approved these changes Mar 21, 2022

Merge branch 'main' into fix-linear-models-sparse-sw

744a7ce

test_ridge_fit_intercept_sparse_sag with non default sample_weight

d64ebaa

ogrisel reviewed Mar 22, 2022

Use assert_allclose in test_ridge_fit_intercept_sparse_sag

70be11b

ogrisel approved these changes Mar 22, 2022

jeremiedbb merged commit d76f87c into scikit-learn:main Mar 22, 2022

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022

Fix Ridge sparse + sample_weight + intercept (scikit-learn#22899)

54d57e1

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
