Add ValueError in case of numerical issues during PoissonRegressor lbfgs solver fit #29681
base: main
Conversation
def _load_edgecase():
    """Dataset to cause a failing test in GLM model."""
    X = [
I just realized that there is another edge case which the current fix is not able to catch. If we run `X = X[:, -1:]` (i.e., only use the last column), then the models converge with

    # lbfgs
    intercept: 3.896592058397603
    coefs: [0.]
    # newton-cholesky
    intercept: 3.2885810888136406
    coefs: [0.00087977]

without any error message or warning, even when setting `verbose=100`.
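A minimal comparison script for this slice could look like the following (a sketch only; `_load_edgecase` is the test helper shown above, and it is assumed here to return both `X` and `y`):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

X, y = _load_edgecase()       # test helper from this PR (assumed to return X, y)
X = np.asarray(X)[:, -1:]     # keep only the last column
y = np.asarray(y, dtype=float)

for solver in ("lbfgs", "newton-cholesky"):
    model = PoissonRegressor(alpha=0, solver=solver, verbose=100).fit(X, y)
    print(f"# {solver}\nintercept: {model.intercept_}\ncoefs: {model.coef_}")
```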
Hum, that is bad. Does this still happen even with tiny values of `tol` (while adjusting `max_iter` as needed)? Please open a dedicated issue for this one if tiny `tol` values do not fix it.
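A quick way to check this could be the following sketch (the `tol` grid and `max_iter` value are arbitrary; `X`, `y` are the sliced edge-case data from above):

```python
from sklearn.linear_model import PoissonRegressor

for tol in (1e-4, 1e-8, 1e-12):
    model = PoissonRegressor(
        alpha=0, solver="lbfgs", tol=tol, max_iter=100_000
    ).fit(X, y)
    print(f"tol={tol:g}  n_iter={model.n_iter_}  "
          f"intercept={model.intercept_}  coefs={model.coef_}")
```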
What are the values of the Poisson loss for both models at convergence? Could you add a print statement to also display the gradient value (and norm) at the last iterate for both solvers? Maybe they are below machine precision level and there is nothing we can do about it.
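One way to get those numbers is to evaluate the fitted coefficients with scikit-learn's internal loss helper. This is only a sketch: `LinearModelLoss` and `HalfPoissonLoss` are private (their import paths and signatures may change between versions), and the intercept-last coefficient layout is an assumption about the current private API.

```python
import numpy as np
from sklearn._loss.loss import HalfPoissonLoss
from sklearn.linear_model._linear_loss import LinearModelLoss

def report(model, X, y):
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # LinearModelLoss is assumed to expect the intercept appended after the
    # coefficients when fit_intercept=True.
    coef = np.r_[model.coef_, model.intercept_]
    lml = LinearModelLoss(base_loss=HalfPoissonLoss(), fit_intercept=True)
    loss, grad = lml.loss_gradient(coef, X, y, l2_reg_strength=0.0)
    print(f"loss={loss:.6e}  grad={grad}  max|grad|={np.max(np.abs(grad)):.3e}")
```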
On such a low-dimensional parameter space, one could also display the 2x2 Hessian matrix of the Newton-Cholesky method at each iterate, and its condition number.
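For the single-feature case, the Hessian of the unpenalized Poisson loss (log link) can be formed directly with NumPy. A sketch, assuming `coef` and `intercept` are the iterate to inspect and `X` is the one-column design matrix:

```python
import numpy as np

def poisson_hessian_and_cond(coef, intercept, X):
    """2x2 Hessian of the mean Poisson deviance (log link, no penalty)
    with respect to [coef, intercept], plus its condition number."""
    X = np.asarray(X, dtype=float)
    X_ext = np.hstack([X, np.ones((len(X), 1))])   # append intercept column
    mu = np.exp(X_ext @ np.r_[coef, intercept])    # predicted means
    H = X_ext.T @ (mu[:, None] * X_ext) / len(X)
    return H, np.linalg.cond(H)
```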
Thanks for the PR @stes. Here is some feedback:
# Without scaling the data, an overflow error is raised when using the LBFGS solver
with pytest.raises(ValueError, match="Overflow in gradient computation detected."):
    model_sklearn_lbfgs = PoissonRegressor(alpha=0).fit(X, y)
Let's make the solver explicit:

Suggested change:

    model_sklearn_lbfgs = PoissonRegressor(alpha=0, solver="lbfgs").fit(X, y)
model_sklearn_lbfgs_scaled = PoissonRegressor(alpha=0, tol=1e-5, max_iter=1000).fit(
    X / scale, y
)
Suggested change:

    model_sklearn_lbfgs_scaled = PoissonRegressor(
        alpha=0, solver="lbfgs", tol=1e-5, max_iter=1000
    ).fit(X / scale, y)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.intercept_,
    model_sklearn_nc.intercept_,
    rtol=0.005,
    atol=2e-4,
)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.coef_ / scale,
    model_sklearn_nc.coef_,
    rtol=0.005,
    atol=2e-4,
)
# Scaling the data yields matching outputs for both solvers
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.intercept_,
    model_sklearn_nc_scaled.intercept_,
    rtol=0.005,
    atol=2e-4,
)
np.testing.assert_allclose(
    model_sklearn_lbfgs_scaled.coef_,
    model_sklearn_nc_scaled.coef_,
    rtol=0.005,
    atol=2e-4,
)
Suggested change (factor the shared tolerances into one place):

    tols = dict(rtol=0.005, atol=2e-4)
    np.testing.assert_allclose(
        model_sklearn_lbfgs_scaled.intercept_,
        model_sklearn_nc.intercept_,
        **tols,
    )
    np.testing.assert_allclose(
        model_sklearn_lbfgs_scaled.coef_ / scale,
        model_sklearn_nc.coef_,
        **tols,
    )
    # Scaling the data yields matching outputs for both solvers
    np.testing.assert_allclose(
        model_sklearn_lbfgs_scaled.intercept_,
        model_sklearn_nc_scaled.intercept_,
        **tols,
    )
    np.testing.assert_allclose(
        model_sklearn_lbfgs_scaled.coef_,
        model_sklearn_nc_scaled.coef_,
        **tols,
    )
"Overflow in gradient computation detected. " | ||
"Scale the data as shown in:\n" | ||
" https://scikit-learn.org/stable/modules/" | ||
"preprocessing.html, or select a different solver." |
"preprocessing.html, or select a different solver." | |
"preprocessing.html, increase regularization or select a " | |
"different solver." |
I agree that "increase the regularization" is not necessarily a helpful recommendation if this happens during hyper-parameter tuning, but it can be useful in other situations.
# NOTE there are other instances of
#     grad_pointwise.T @ X + l2_reg_strength * weights
# in this class. It might be necessary to adapt similar error
# handling for these instances as well.
Maybe you could refactor this as a private decorator for all methods of `LinearModelLoss` where this might occur.
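A rough sketch of what such a decorator could look like (the name `_raise_informative_float_error` is hypothetical, and the wording of the re-raised error simply mirrors the message added in this PR):

```python
import functools
import numpy as np

def _raise_informative_float_error(method):
    """Hypothetical decorator for LinearModelLoss methods: turn floating point
    overflows / invalid values into an informative exception."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            # Raise instead of warning on overflow and invalid operations.
            with np.errstate(over="raise", invalid="raise"):
                return method(self, *args, **kwargs)
        except FloatingPointError as e:
            raise ValueError(
                "Overflow in gradient computation detected. "
                "Scale the data as shown in:\n"
                "  https://scikit-learn.org/stable/modules/preprocessing.html, "
                "or select a different solver."
            ) from e
    return wrapper
```

It could then be applied to the relevant methods (e.g. `gradient` and `loss_gradient`) instead of repeating the try/except block in each of them.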
    if coef.ndim == 1:
        grad = grad.ravel(order="F")
except FloatingPointError as e:
    raise ValueError(
I am not sure if `ValueError` is specific enough. Maybe we should reraise `FloatingPointError` (with a more informative error message) or introduce our own `sklearn.exceptions.ConvergenceError` class.
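For illustration, the second option might look roughly like this (a sketch only: `ConvergenceError` does not currently exist in `sklearn.exceptions`, and both the name and the base class are assumptions):

```python
class ConvergenceError(ValueError):
    """Hypothetical exception: raised when a solver hits a numerical problem
    (e.g. an overflow in the gradient) that it cannot recover from."""

# The re-raise in the gradient computation would then become, schematically:
#     raise ConvergenceError("Overflow in gradient computation detected. ...") from e
```

Inheriting from `ValueError` would keep any existing code that catches `ValueError` working unchanged.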
    6,
    8,
    33,
]
Isn't it possible to synthetically generate the data for a reproducer? For instance, one-hot encoding a random integer variable with 9 possible values to get the first 9 columns, and then concatenating a random numerical feature with a long-tailed positive distribution (e.g. log-normal or similar with a large enough scale parameter)? `y` (or `log(y + eps)`) and the last numerical column probably need to be correlated to trigger the problem.

Alternatively, we could try to cut this dataset iteratively in half and see if the problem is still there, to attempt to make it more minimal. But if it's too challenging to find a more minimal reproducer, then so be it.
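An untested sketch of such a generator (all sizes and coefficients are made up; whether this actually triggers the overflow would need to be checked):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# First 9 columns: one-hot encoding of a random categorical with 9 levels.
cat = rng.integers(0, 9, size=n_samples)
X_cat = np.eye(9)[cat]

# Last column: long-tailed positive feature (log-normal, large scale parameter).
x_num = rng.lognormal(mean=0.0, sigma=3.0, size=n_samples)

# Correlate y with the numerical column on the log scale; log1p keeps the
# Poisson rate bounded during data generation.
lam = np.exp(1.0 + 0.5 * np.log1p(x_num))
y = rng.poisson(lam)

X = np.hstack([X_cat, x_num[:, None]])
```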
Reference Issues/PRs
Fix #27016
What does this implement/fix? Explain your changes.
Issue #27016 outlines an edge case where the `PoissonRegressor` silently gives a wrong result when fitting with the default lbfgs solver. This PR implements the change discussed in #27016 and adds test cases for the linear loss (only for the `HalfPoissonLoss` special case), plus for the `PoissonRegressor`.

Any other comments?
Credits to @akaashp2000 for raising the issue and proposing the solution of wrapping the numpy warning. The solution detailed here is similar to #27332, but adds tests to both the linear loss and GLM packages.