Add ValueError in case of numerical issues during PoissonRegressor lbfgs solver fit #29681
base: main
Conversation
def _load_edgecase():
    """Dataset to cause a failing test in GLM model."""
    X = [
I just realized that there is another edge case which the current fix is not able to catch. If we run `X = X[:, -1:]` (i.e., only use the last column), then the models converge with

    # lbfgs
    intercept: 3.896592058397603
    coefs: [0.]
    # newton-cholesky
    intercept: 3.2885810888136406
    coefs: [0.00087977]

without any error message or warning, even when setting `verbose=100`.
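A minimal comparison script for this slice could look like the following (a sketch only; `_load_edgecase` is the test helper shown above, and it is assumed here to return both `X` and `y`):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

X, y = _load_edgecase()       # test helper from this PR (assumed to return X, y)
X = np.asarray(X)[:, -1:]     # keep only the last column
y = np.asarray(y, dtype=float)

for solver in ("lbfgs", "newton-cholesky"):
    model = PoissonRegressor(alpha=0, solver=solver, verbose=100).fit(X, y)
    print(f"# {solver}\nintercept: {model.intercept_}\ncoefs: {model.coef_}")
```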
Hum, that is bad. Does this still happen even with tiny values of `tol` (while adjusting `max_iter` as needed)? Please open a dedicated issue for this one if tiny `tol` values do not fix it.
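A quick way to check this could be the following sketch (the `tol` grid and `max_iter` value are arbitrary; `X`, `y` are the sliced edge-case data from above):

```python
from sklearn.linear_model import PoissonRegressor

for tol in (1e-4, 1e-8, 1e-12):
    model = PoissonRegressor(
        alpha=0, solver="lbfgs", tol=tol, max_iter=100_000
    ).fit(X, y)
    print(f"tol={tol:g}  n_iter={model.n_iter_}  "
          f"intercept={model.intercept_}  coefs={model.coef_}")
```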
What are the values of the Poisson loss for both models at convergence? Could you add a print statement to also display the gradient value (and norm) at the last iterate for both solvers? Maybe they are below machine precision level and there is nothing we can do about it.
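One way to get those numbers is to evaluate the fitted coefficients with scikit-learn's internal loss helper. This is only a sketch: `LinearModelLoss` and `HalfPoissonLoss` are private (their import paths and signatures may change between versions), and the intercept-last coefficient layout is an assumption about the current private API.

```python
import numpy as np
from sklearn._loss.loss import HalfPoissonLoss
from sklearn.linear_model._linear_loss import LinearModelLoss

def report(model, X, y):
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # LinearModelLoss is assumed to expect the intercept appended after the
    # coefficients when fit_intercept=True.
    coef = np.r_[model.coef_, model.intercept_]
    lml = LinearModelLoss(base_loss=HalfPoissonLoss(), fit_intercept=True)
    loss, grad = lml.loss_gradient(coef, X, y, l2_reg_strength=0.0)
    print(f"loss={loss:.6e}  grad={grad}  max|grad|={np.max(np.abs(grad)):.3e}")
```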
On such a low-dimensional parameter space, one could also display the 2x2 Hessian matrix of the Newton-Cholesky method at each iterate, and its condition number.
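For the single-feature case, the Hessian of the unpenalized Poisson loss (log link) can be formed directly with NumPy. A sketch, assuming `coef` and `intercept` are the iterate to inspect and `X` is the one-column design matrix:

```python
import numpy as np

def poisson_hessian_and_cond(coef, intercept, X):
    """2x2 Hessian of the mean Poisson deviance (log link, no penalty)
    with respect to [coef, intercept], plus its condition number."""
    X = np.asarray(X, dtype=float)
    X_ext = np.hstack([X, np.ones((len(X), 1))])   # append intercept column
    mu = np.exp(X_ext @ np.r_[coef, intercept])    # predicted means
    H = X_ext.T @ (mu[:, None] * X_ext) / len(X)
    return H, np.linalg.cond(H)
```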
Thanks for the PR @stes. Here is some feedback:
# Without scaling the data, an overflow error is raised when using the LBFGS solver
with pytest.raises(ValueError, match="Overflow in gradient computation detected."):
    model_sklearn_lbfgs = PoissonRegressor(alpha=0).fit(X, y)
Let's make the solver explicit:

Suggested change:

    model_sklearn_lbfgs = PoissonRegressor(alpha=0, solver="lbfgs").fit(X, y)
model_sklearn_lbfgs_scaled = PoissonRegressor(alpha=0, tol=1e-5, max_iter=1000).fit(
    X / scale, y
)
Suggested change:

    model_sklearn_lbfgs_scaled = PoissonRegressor(
        alpha=0, solver="lbfgs", tol=1e-5, max_iter=1000
    ).fit(X / scale, y)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.intercept_,
    model_sklearn_nc.intercept_,
    rtol=0.005,
    atol=2e-4,
)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.coef_ / scale,
    model_sklearn_nc.coef_,
    rtol=0.005,
    atol=2e-4,
)
# Scaling the data yields matching outputs for both solvers
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.intercept_,
    model_sklearn_nc_scaled.intercept_,
    rtol=0.005,
    atol=2e-4,
)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.coef_,
    model_sklearn_nc_scaled.coef_,
    rtol=0.005,
    atol=2e-4,
)
Suggested change (factor the shared tolerances into one place):

    tols = dict(rtol=0.005, atol=2e-4)
    np.testing.assert_allclose(
        model_sklearn_lbfgs_scaled.intercept_,
        model_sklearn_nc.intercept_,
        **tols,
    )
    np.testing.assert_allclose(
        model_sklearn_lbfgs_scaled.coef_ / scale,
        model_sklearn_nc.coef_,
        **tols,
    )
    # Scaling the data yields matching outputs for both solvers
    np.testing.assert_allclose(
        model_sklearn_lbfgs_scaled.intercept_,
        model_sklearn_nc_scaled.intercept_,
        **tols,
    )
    np.testing.assert_allclose(
        model_sklearn_lbfgs_scaled.coef_,
        model_sklearn_nc_scaled.coef_,
        **tols,
    )
"Overflow in gradient computation detected. " | ||
"Scale the data as shown in:\n" | ||
" https://scikit-learn.org/stable/modules/" | ||
"preprocessing.html, or select a different solver." |
"preprocessing.html, or select a different solver." | |
"preprocessing.html, increase regularization or select a " | |
"different solver." |
I agree that "increase the regularization" is not necessarily a helpful recommendation if this happens during hyper-parameter tuning, but it can be useful in other situations.
# NOTE there are other instances of
#     grad_pointwise.T @ X + l2_reg_strength * weights
# in this class. It might be necessary to adapt similar error
# handling for these instances as well.
Maybe you could refactor this as a private decorator for all methods of `LinearModelLoss` where this might occur.
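A rough sketch of what such a decorator could look like (the name `_raise_informative_float_error` is hypothetical, and the wording of the re-raised error simply mirrors the message added in this PR):

```python
import functools
import numpy as np

def _raise_informative_float_error(method):
    """Hypothetical decorator for LinearModelLoss methods: turn floating point
    overflows / invalid values into an informative exception."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            # Raise instead of warning on overflow and invalid operations.
            with np.errstate(over="raise", invalid="raise"):
                return method(self, *args, **kwargs)
        except FloatingPointError as e:
            raise ValueError(
                "Overflow in gradient computation detected. "
                "Scale the data as shown in:\n"
                "  https://scikit-learn.org/stable/modules/preprocessing.html, "
                "or select a different solver."
            ) from e
    return wrapper
```

It could then be applied to the relevant methods (e.g. `gradient` and `loss_gradient`) instead of repeating the try/except block in each of them.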
    if coef.ndim == 1:
        grad = grad.ravel(order="F")
except FloatingPointError as e:
    raise ValueError(
I am not sure if `ValueError` is specific enough. Maybe we should reraise `FloatingPointError` (with a more informative error message) or introduce our own `sklearn.exceptions.ConvergenceError` class.
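For illustration, the second option might look roughly like this (a sketch only: `ConvergenceError` does not currently exist in `sklearn.exceptions`, and both the name and the base class are assumptions):

```python
class ConvergenceError(ValueError):
    """Hypothetical exception: raised when a solver hits a numerical problem
    (e.g. an overflow in the gradient) that it cannot recover from."""

# The re-raise in the gradient computation would then become, schematically:
#     raise ConvergenceError("Overflow in gradient computation detected. ...") from e
```

Inheriting from `ValueError` would keep any existing code that catches `ValueError` working unchanged.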
    6,
    8,
    33,
]
Isn't it possible to synthetically generate the data for a reproducer? For instance, one-hot encoding a random integer variable with 9 possible values to get the first 9 columns, and then concatenating a random numerical feature with a long-tailed positive distribution (e.g. log-normal or similar with a large enough scale parameter)? `y` (or `log(y + eps)`) and the last numerical column probably need to be correlated to trigger the problem.

Alternatively, we could try to cut this dataset iteratively in half and see if the problem is still there, to attempt to make it more minimal. But if it's too challenging to find a more minimal reproducer, then so be it.
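An untested sketch of such a generator (all sizes and coefficients are made up; whether this actually triggers the overflow would need to be checked):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# First 9 columns: one-hot encoding of a random categorical with 9 levels.
cat = rng.integers(0, 9, size=n_samples)
X_cat = np.eye(9)[cat]

# Last column: long-tailed positive feature (log-normal, large scale parameter).
x_num = rng.lognormal(mean=0.0, sigma=3.0, size=n_samples)

# Correlate y with the numerical column on the log scale; log1p keeps the
# Poisson rate bounded during data generation.
lam = np.exp(1.0 + 0.5 * np.log1p(x_num))
y = rng.poisson(lam)

X = np.hstack([X_cat, x_num[:, None]])
```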
Reference Issues/PRs
Fix #27016
What does this implement/fix? Explain your changes.
Issue #27016 outlines an edge case where the `PoissonRegressor` silently gives a wrong result when fitting with the default lbfgs solver. This PR implements the change discussed in #27016 and adds test cases for the linear loss (only for the `HalfPoissonLoss` special case), plus for the `PoissonRegressor`.

Any other comments?
Credits to @akaashp2000 for raising the issue and proposing the solution of wrapping the numpy warning. The solution detailed here is similar to #27332, but adds tests to both the linear loss and GLM packages.