8000 Add ValueError in case of numerical issues during PoissonRegressor lbfgs solver fit by stes · Pull Request #29681 · scikit-learn/scikit-learn · GitHub


Status: Open. Wants to merge 5 commits into base: main.

Conversation

@stes (Contributor) commented Aug 16, 2024

Reference Issues/PRs

Fix #27016

What does this implement/fix? Explain your changes.

Issue #27016 outlines an edge case where the PoissonRegressor silently gives a wrong result when fitting with the default lbfgs solver. This PR implements the change discussed in #27016 and adds test cases for the linear loss (only for the HalfPoissonLoss special case) and for the PoissonRegressor.

Any other comments?

Credit to @akaashp2000 for raising the issue and proposing the solution of wrapping the numpy warning. The solution here is similar to #27332, but adds tests for both the linear loss and the GLM modules.
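For context, a minimal sketch of the general approach (not the exact diff in this PR): elevate numpy's overflow warning to an error around the gradient computation and re-raise it with an actionable message. The helper name below is hypothetical.

import numpy as np

# Hypothetical helper illustrating the approach; not the actual PR code.
def safe_gradient(grad_pointwise, X, l2_reg_strength, weights):
    try:
        # Promote numpy's overflow RuntimeWarning to a FloatingPointError.
        with np.errstate(over="raise"):
            return grad_pointwise.T @ X + l2_reg_strength * weights
    except FloatingPointError as e:
        raise ValueError(
            "Overflow in gradient computation detected. "
            "Scale the data as shown in:\n"
            " https://scikit-learn.org/stable/modules/"
            "preprocessing.html, or select a different solver."
        ) from e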

github-actions bot commented Aug 16, 2024

✔️ Linting Passed: all linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4ee50e6.

@stes marked this pull request as ready for review on August 16, 2024, 12:54

def _load_edgecase():
    """Dataset to cause a failing test in GLM model."""
    X = [
@stes (Contributor, Author) commented Aug 16, 2024

I just realized that there is another edge case which the current fix cannot catch. If we run

X = X[:, -1:]

(i.e., only use the last column), then the models converge to

# lbfgs
intercept: 3.896592058397603
coefs:     [0.]

# newton-cholesky
intercept: 3.2885810888136406
coefs:     [0.00087977]

without any error message or warning, even when setting verbose=100.
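For reference, a sketch of how this comparison can be reproduced (assuming X and y are the edge-case data from _load_edgecase above; the slicing and solver settings follow the comment):

import numpy as np
from sklearn.linear_model import PoissonRegressor

X_last = np.asarray(X, dtype=float)[:, -1:]  # only use the last column
for solver in ("lbfgs", "newton-cholesky"):
    model = PoissonRegressor(alpha=0, solver=solver, verbose=100).fit(X_last, y)
    print(solver, "intercept:", model.intercept_, "coefs:", model.coef_)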

Member

Hum, that is bad. Does this still happen even with tiny values of tol (while adjusting max_iter as needed)?

Please open a dedicated issue for this one if tiny tol values do not fix it.

@ogrisel (Member) commented Nov 8, 2024

What are the values of the Poisson loss for both models at convergence? Could you add a print statement to also display the gradient value (and norm) at the last iterate for both solvers? Maybe they are below machine precision level and there is nothing we can do about it.
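For what it's worth, a rough way to check this without touching scikit-learn internals (plain numpy, not the internal LinearModelLoss; the loss below is the Poisson loss up to a term constant in the parameters):

import numpy as np

def poisson_loss_and_grad_norm(model, X, y):
    # Design matrix with an intercept column appended.
    X_b = np.hstack([np.asarray(X, dtype=float), np.ones((len(y), 1))])
    w = np.r_[model.coef_, model.intercept_]
    eta = X_b @ w                         # linear predictor (log link)
    mu = np.exp(eta)
    loss = np.mean(mu - y * eta)          # Poisson loss, up to a constant in y
    grad = X_b.T @ (mu - y) / len(y)      # unpenalized gradient
    return loss, np.linalg.norm(grad)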

@ogrisel (Member) commented Nov 8, 2024

In such a low-dimensional parameter space, one could also display the 2x2 Hessian matrix of the Newton-Cholesky method at each iterate, and its condition number.
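A matching sketch for the Hessian and its condition number at the fitted parameters (again plain numpy, not the Newton-Cholesky solver's own internals):

import numpy as np

def poisson_hessian_cond(model, X, y):
    X_b = np.hstack([np.asarray(X, dtype=float), np.ones((len(y), 1))])
    mu = np.exp(X_b @ np.r_[model.coef_, model.intercept_])
    H = (X_b * mu[:, None]).T @ X_b / len(y)   # X_b.T @ diag(mu) @ X_b / n
    return H, np.linalg.cond(H)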

@adrinjalali (Member) commented:

cc @lorentzenchr @ogrisel

@ogrisel (Member) left a comment:

Thanks for the PR @stes. Here is some feedback:


# Without scaling the data, an overflow error is raised when using the LBFGS solver
with pytest.raises(ValueError, match="Overflow in gradient computation detected."):
    model_sklearn_lbfgs = PoissonRegressor(alpha=0).fit(X, y)
Member:

Let's make the solver explicit:

Suggested change:

-    model_sklearn_lbfgs = PoissonRegressor(alpha=0).fit(X, y)
+    model_sklearn_lbfgs = PoissonRegressor(alpha=0, solver="lbfgs").fit(X, y)

Comment on lines +1267 to +1269:

model_sklearn_lbfgs_scaled = PoissonRegressor(alpha=0, tol=1e-5, max_iter=1000).fit(
    X / scale, y
)
Member:

Suggested change:

-    model_sklearn_lbfgs_scaled = PoissonRegressor(alpha=0, tol=1e-5, max_iter=1000).fit(
-        X / scale, y
-    )
+    model_sklearn_lbfgs_scaled = PoissonRegressor(
+        alpha=0, solver="lbfgs", tol=1e-5, max_iter=1000
+    ).fit(X / scale, y)

Comment on lines +1272 to +1297:

np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.intercept_,
    model_sklearn_nc.intercept_,
    rtol=0.005,
    atol=2e-4,
)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.coef_ / scale,
    model_sklearn_nc.coef_,
    rtol=0.005,
    atol=2e-4,
)

# Scaling the data yields matching outputs for both solvers
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.intercept_,
    model_sklearn_nc_scaled.intercept_,
    rtol=0.005,
    atol=2e-4,
)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.coef_,
    model_sklearn_nc_scaled.coef_,
    rtol=0.005,
    atol=2e-4,
)
Member:

Suggested change (factor the shared tolerances into a dict):

tols = dict(rtol=0.005, atol=2e-4)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.intercept_,
    model_sklearn_nc.intercept_,
    **tols,
)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.coef_ / scale,
    model_sklearn_nc.coef_,
    **tols,
)
# Scaling the data yields matching outputs for both solvers
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.intercept_,
    model_sklearn_nc_scaled.intercept_,
    **tols,
)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.coef_,
    model_sklearn_nc_scaled.coef_,
    **tols,
)

"Overflow in gradient computation detected. "
"Scale the data as shown in:\n"
" https://scikit-learn.org/stable/modules/"
"preprocessing.html, or select a different solver."
Member:

Suggested change:

-    "preprocessing.html, or select a different solver."
+    "preprocessing.html, increase regularization or select a "
+    "different solver."

Member:

I agree that suggesting to increase the regularization is not necessarily a helpful recommendation if this happens during hyper-parameter tuning, but it can be useful otherwise.

# NOTE there are other instances of
# grad_pointwise.T @ X + l2_reg_strength * weights
# in this class. It might be necessary to adapt similar error
# handling for these instances as well.
Member:

Maybe you could refactor this as a private decorator for all methods of LinearModelLoss where this might occur.
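A possible shape for such a decorator (a hypothetical sketch; the name, placement, and message are illustrative, not existing scikit-learn API):

import functools
import numpy as np

def _raise_informative_error_on_overflow(method):
    """Hypothetical decorator for LinearModelLoss methods doing gradient/Hessian math."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            # Turn numpy overflow warnings into FloatingPointError inside the method.
            with np.errstate(over="raise"):
                return method(*args, **kwargs)
        except FloatingPointError as e:
            raise ValueError(
                "Overflow in gradient computation detected. "
                "Scale the data as shown in:\n"
                " https://scikit-learn.org/stable/modules/"
                "preprocessing.html, or select a different solver."
            ) from e
    return wrapper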

    if coef.ndim == 1:
        grad = grad.ravel(order="F")
except FloatingPointError as e:
    raise ValueError(
Member:

I am not sure if ValueError is specific enough. Maybe we should reraise FloatingPointError (with a more informative error message) or introduce our own sklearn.exceptions.ConvergenceError class.
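For illustration, the re-raise variant could look roughly like this (variable names are taken from the snippet under review; since ConvergenceError does not exist in sklearn.exceptions today, this sketch keeps FloatingPointError):

try:
    with np.errstate(over="raise"):
        grad = grad_pointwise.T @ X + l2_reg_strength * weights
except FloatingPointError as e:
    # Re-raise the same exception type with a more informative message.
    raise FloatingPointError(
        "Overflow in gradient computation detected. "
        "Scale the data or select a different solver."
    ) from e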

6,
8,
33,
]
Member:

Isn't it possible to synthetically generate the data for a reproducer? For instance, one-hot encoding a random integer variable with 9 possible values to get the first 9 columns and then concatenating a random numerical feature with a long-tailed positive distribution (e.g. log-normal or similar with a large enough scale parameter)?

y (or log(y + eps)) and the last numerical column probably need to be correlated to trigger the problem.

Alternatively, we could iteratively cut this dataset in half and check whether the problem persists, to make the reproducer more minimal.

But if it's too challenging to find a more minimal reproducer, then so be it.
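A rough sketch of that synthetic generation (the distribution parameters below are guesses and may need tuning before they actually trigger the overflow):

import numpy as np

rng = np.random.default_rng(0)
n = 200
cat = rng.integers(0, 9, size=n)                    # integer variable with 9 possible values
X_cat = np.eye(9)[cat]                              # one-hot encoding -> first 9 columns
x_num = rng.lognormal(mean=0.0, sigma=3.0, size=n)  # long-tailed positive numerical feature
X = np.hstack([X_cat, x_num[:, None]])
# Make y correlated with the numerical column (in log space), then sample counts.
y = rng.poisson(np.exp(0.5 + 1e-3 * x_num))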



Projects
Status: In Progress

Development
Successfully merging this pull request may close these issues:

PoissonRegressor lbfgs solver giving coefficients of 0 and Runtime Warning

3 participants