FIX raise error when enet_path and lasso_path receive unknown parameters #19391
Conversation
Updated the code; sorry for the lackluster pull requests. I will put more attention to detail going forward: something silly like putting commands after the end of the function was easy to prevent.
I see this solution in the original issue. I am wondering whether we are converging toward this type of pattern now that we have keyword-only parameters. It would require updating the documentation, but otherwise we would just have to add the default values in the signature. @lorentzenchr I saw that you had a look. Do you have a preference?
I'm in favor of adding those parameters as keyword arguments to the path functions.
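For reference, a minimal sketch of the keyword-only pattern being discussed, with a hypothetical, abbreviated signature (the real `enet_path` takes many more parameters; the names below are only a subset):

```python
# Hypothetical, abbreviated sketch of the keyword-only pattern under
# discussion; not the actual enet_path signature.
def enet_path(X, y, *, eps=1e-3, n_alphas=100, coef_init=None,
              random_state=None, selection="cyclic"):
    ...

# With explicit keyword-only arguments there is no **params bag left over,
# so an unknown keyword fails by itself, with no manual validation:
#   enet_path(X, y, normalize=True)
#   TypeError: enet_path() got an unexpected keyword argument 'normalize'
```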
```python
selection = params.pop('selection', 'cyclic')
params.pop('fit_intercept', None)
params.pop('normalize', None)
```
There seem to be many places in the tests where `fit_intercept` and `normalize` are being passed into `enet_path`, so changing `**params` to keyword args will necessitate a refactor of all tests that pass in `fit_intercept` and `normalize`. It also seems that these two aren't being used in `enet_path` at all. Should we still proceed in this direction?
I'd say yes, but I would also very much be interested in the opinion of others, @glemaitre maybe?
Indeed, `fit_intercept` and `normalize` are not used in the path; they are used to prepare the data before `fit`. I assume that we should raise the error if one tries to pass these parameters, since the path is not doing anything with them. However, we should add in the documentation that `lasso_path` and `enet_path` expect centred and normalized data. If this is not the case, I don't know if we should suggest using the `Lasso` and `ElasticNet` estimators instead. @agramfort can shed some light here.
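To illustrate that documentation note, a minimal sketch of preparing the data by hand before calling the path function (an assumption about what the docs would recommend, not a quote from them):

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
y = rng.randn(50)

# lasso_path / enet_path do not center the data themselves, so apply the
# preprocessing that fit_intercept=True would normally perform up front.
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()

alphas, coefs, dual_gaps = lasso_path(X_centered, y_centered)
```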
Looking a bit more in detail at the code, I think that we need to apply the following patch. Note: I applied these changes to check that the tests are passing.

```diff
diff --git a/sklearn/linear_model/_coordinate_descent.py b/sklearn/linear_model/_coordinate_descent.py
index 72be51f3aa..cad3e3ea38 100644
--- a/sklearn/linear_model/_coordinate_descent.py
+++ b/sklearn/linear_model/_coordinate_descent.py
@@ -503,8 +503,6 @@ def enet_path(
     random_state = params.pop("random_state", None)
     selection = params.pop("selection", "cyclic")
-    params.pop("fit_intercept", None)
-    params.pop("normalize", None)

     if len(params) > 0:
         raise ValueError("Unexpected parameters in params", params.keys())
@@ -1025,7 +1023,6 @@ class ElasticNet(MultiOutputMixin, RegressorMixin, LinearModel):
         dual_gaps_ = np.zeros(n_targets, dtype=X.dtype)
         self.n_iter_ = []

-        # FIXME: 'normalize' to be removed in 1.2
         for k in range(n_targets):
             if Xy is not None:
                 this_Xy = Xy[:, k]
@@ -1040,8 +1037,6 @@ class ElasticNet(MultiOutputMixin, RegressorMixin, LinearModel):
                 alphas=[alpha],
                 precompute=precompute,
                 Xy=this_Xy,
-                fit_intercept=False,
-                normalize=False,
                 copy_X=True,
                 verbose=False,
                 tol=self.tol,
@@ -1276,6 +1271,8 @@ def _path_residuals(
     sample_weight,
     train,
     test,
+    normalize,
+    fit_intercept,
     path,
     path_params,
     alphas=None,
@@ -1357,9 +1354,6 @@ def _path_residuals(
         # for read-only memmaps (cf. numpy#14132).
         array.setflags(write=True)

-    fit_intercept = path_params["fit_intercept"]
-    normalize = path_params["normalize"]
-
     if y.ndim == 1:
         precompute = path_params["precompute"]
     else:
@@ -1586,7 +1580,11 @@ class LinearModelCV(MultiOutputMixin, LinearModel, ABC):
         path_params = self.get_params()

         # FIXME: 'normalize' to be removed in 1.2
-        path_params["normalize"] = _normalize
+        # path_params["normalize"] = _normalize
+        # Pop `fit_intercept` and `normalize`, which are not parameters of
+        # the path function
+        path_params.pop("normalize", None)
+        path_params.pop("fit_intercept", None)

         if "l1_ratio" in path_params:
             l1_ratios = np.atleast_1d(path_params["l1_ratio"])
@@ -1644,6 +1642,8 @@ class LinearModelCV(MultiOutputMixin, LinearModel, ABC):
                 sample_weight,
                 train,
                 test,
+                _normalize,
+                self.fit_intercept,
                 self.path,
                 path_params,
                 alphas=this_alphas,
diff --git a/sklearn/linear_model/tests/test_coordinate_descent.py b/sklearn/linear_model/tests/test_coordinate_descent.py
index f17da26471..e589a77bfa 100644
--- a/sklearn/linear_model/tests/test_coordinate_descent.py
+++ b/sklearn/linear_model/tests/test_coordinate_descent.py
@@ -1111,8 +1111,8 @@ def test_sparse_dense_descent_paths():
     X, y, _, _ = build_dataset(n_samples=50, n_features=20)
     csr = sparse.csr_matrix(X)
     for path in [enet_path, lasso_path]:
-        _, coefs, _ = path(X, y, fit_intercept=False)
-        _, sparse_coefs, _ = path(csr, y, fit_intercept=False)
+        _, coefs, _ = path(X, y)
+        _, sparse_coefs, _ = path(csr, y)
         assert_array_almost_equal(coefs, sparse_coefs)
@@ -1520,8 +1520,7 @@ def test_enet_cv_sample_weight_correctness(fit_intercept):
     assert alphas[0] < reg.alpha_ < alphas[-1]
     assert reg_sw.alpha_ == reg.alpha_
     assert_allclose(reg_sw.coef_, reg.coef_)
-    if fit_intercept is not None:
-        assert reg_sw.intercept_ == pytest.approx(reg.intercept_)
+    assert reg_sw.intercept_ == pytest.approx(reg.intercept_)


 @pytest.mark.parametrize("sample_weight", [False, True])
```
Basically, it is the essence of the patch that I proposed. If @agramfort could take a look just to be sure, then I could finish up the PR if needed.
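As a quick illustration of the behaviour the patch enforces, a small sketch (the exact error text comes from the `ValueError` raised in `enet_path` above and may differ across versions):

```python
import numpy as np
from sklearn.linear_model import enet_path

rng = np.random.RandomState(0)
X = rng.randn(20, 5)
y = rng.randn(20)

# Leftover entries in **params are now rejected instead of being silently
# popped and ignored:
try:
    enet_path(X, y, fit_intercept=False)
except ValueError as exc:
    print(exc)  # ('Unexpected parameters in params', dict_keys(['fit_intercept']))
```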
LGTM
FIX raise error when enet_path and lasso_path receive unknown parameters (scikit-learn#19391)

* Resolves scikit-learn#10598. Updated the code, sorry for putting in lackluster pull requests. Will put in more attention to detail; something silly like putting in commands after the function ended was easy to prevent.
* Update coordinate_descent.py
* Update coordinate_descent.py
* Move params checking to top and fix test
* Flake8
* make sure to not pass normalize and fit_intercept as path params
* TST check the error message
* typo
* iter
* iter

Co-authored-by: Elliot Kang <kkang2097@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Closes #10598
Takes over stalled PR and closes #10622