PoissonRegressor lbfgs solver giving coefficients of 0 and Runtime Warning #27016

Open
akaashp2000 opened this issue Aug 4, 2023 · 16 comments · May be fixed by #27332 or #29681

Comments

@akaashp2000
akaashp2000 commented Aug 4, 2023

Describe the bug

See the following Stack Exchange post (the solution to my original issue was to use the newton-cholesky solver).

When fitting a Poisson regression (without regularization) to some dummy data I encounter:

  • with the lbfgs solver, a RuntimeWarning, a non-zero intercept, and all coefficients equal to zero
  • with the newton-cholesky solver, coefficients as expected

Some people on Stack Exchange have mentioned it is worth submitting an issue (a similar one was faced with LogisticRegression).

Steps/Code to Reproduce

import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
from sklearn.linear_model import PoissonRegressor
from sklearn.preprocessing import (
    OneHotEncoder,
)
data = sm.datasets.get_rdataset('Insurance', package='MASS').data
# Fit Poisson regression using formula interface
formula = "Claims ~ C(District, Treatment(1)) + C(Group, Treatment('<1l')) + C(Age, Treatment('<25')) + Holders"
model_smf = smf.poisson(formula=formula, data=data).fit()
print(type(model_smf))
print(model_smf.summary())
# one-hot encode the categorical columns with sklearn's OneHotEncoder,
# dropping the baseline category of each
X_train_ohe = OneHotEncoder(sparse_output=False, drop=[1, "<1l", "<25"]).fit(data[["District", "Group", "Age"]])
X_train_ohe = pd.DataFrame(X_train_ohe.transform(data[["District", "Group", "Age"]]), columns=X_train_ohe.get_feature_names_out())

X_train = pd.concat([X_train_ohe, data[["Holders"]]], axis=1)
y_train = data["Claims"]

# with the (default) lbfgs solver
model_sklearn_lbfgs = PoissonRegressor(alpha=0).fit(X_train, y_train)

print(model_sklearn_lbfgs.intercept_)
print(model_sklearn_lbfgs.coef_)
# with newton-cholesky solver

model_sklearn_nc = PoissonRegressor(alpha=0, solver='newton-cholesky').fit(X_train, y_train)

print(model_sklearn_nc.intercept_)
print(model_sklearn_nc.coef_)

Expected Results

Expected intercept/coefficients (from statsmodels and sklearn with newton-cholesky solver):

2.8456859090224444
[-4.47436243e-01 -9.30629212e-01 -1.46547942e+00  1.00603354e+00
  4.71328611e-01 -5.98213625e-01  5.69685530e-01  6.85301286e-01
  2.22504699e+00 -1.58161172e-05]

Actual Results

Result with lbfgs:

3.896592058397603
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
/home/akaash/Downloads/statsmodels-play/venv/lib/python3.10/site-packages/sklearn/linear_model/_linear_loss.py:290: RuntimeWarning: invalid value encountered in matmul
  grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights

Versions

System:
    python: 3.10.9 (main, Mar  1 2023, 18:23:06) [GCC 11.2.0]
executable: /home/akaash/Downloads/statsmodels-play/venv/bin/python
   machine: Linux-5.19.0-46-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.3.0
          pip: 23.2.1
   setuptools: 68.0.0
        numpy: 1.25.1
        scipy: 1.11.1
       Cython: None
       pandas: 2.0.3
   matplotlib: 3.7.2
       joblib: 1.3.1
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 6
         prefix: libopenblas
       filepath: /home/akaash/Downloads/statsmodels-play/venv/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-7a851222.3.23.so
        version: 0.3.23
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 6
         prefix: libopenblas
       filepath: /home/akaash/Downloads/statsmodels-play/venv/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 6
         prefix: libgomp
       filepath: /home/akaash/Downloads/statsmodels-play/venv/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
akaashp2000 added the Bug and Needs Triage labels on Aug 4, 2023
@Micky774
Contributor

@lorentzenchr wondering if you may have some insight on this

@akaashp2000
Author
akaashp2000 commented Aug 11, 2023

Quick update on this:

It seems the issue is not scikit-learn specific, but rather specific to the lbfgs solver. Even with statsmodels, fitting a Poisson GLM using lbfgs gives errors:

# Fit Poisson regression using formula interface WITH lbfgs solver
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = sm.datasets.get_rdataset('Insurance', package='MASS').data
formula = "Claims ~ C(District, Treatment(1)) + C(Group, Treatment('<1l')) + C(Age, Treatment('<25')) + Holders"

model_smf_lbfgs = smf.poisson(formula=formula, data=data).fit(method="lbfgs")
print(type(model_smf_lbfgs))
print(model_smf_lbfgs.summary())

gives the following output and warning:

At X0         0 variables are exactly at the bounds

At iterate    0    f=  3.19136D+01    |proj g|=  9.62165D+04

At iterate    1    f=  1.80465D+01    |proj g|=  6.63668D+03

At iterate    2    f=  1.79172D+01    |proj g|=  3.18678D+03

At iterate    3    f=  1.78811D+01    |proj g|=  2.51280D+02

At iterate    4    f=  1.78809D+01    |proj g|=  2.15869D+01

At iterate    5    f=  1.78809D+01    |proj g|=  2.15982D+01

At iterate    6    f=  1.78807D+01    |proj g|=  1.44105D+02

At iterate    7    f=  1.78802D+01    |proj g|=  3.22683D+02
This problem is unconstrained.
ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "

@lorentzenchr
Member

@akaashp2000 Could you run it with verbose=3 or higher? It seems like the coefficients get huge, nan or inf.

@akaashp2000
Author

@lorentzenchr sorry for the late response on this.

With the same X_train and y_train as in the original post, I ran with verbose = 10

Code:

# one-hot encode the categorical columns, and drop the baseline column
# with lbfgs solver

verbose = 10

model_sklearn_lbfgs_verbose = PoissonRegressor(alpha=0, verbose=verbose).fit(X_train, y_train)

print("verbose", verbose)
print(model_sklearn_lbfgs_verbose.intercept_)
print(model_sklearn_lbfgs_verbose.coef_)

Output:

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           11     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f= -1.42612D+02    |proj g|=  4.30046D+04
verbose 10
3.896592058397603
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
   11      1      3      1     0     0   4.300D+04  -1.426D+02
  F =  -142.61189962516949     

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH             

For different values of verbose (I tried 3 and 50 as well) the output was identical, aside from the line produced by print("verbose", verbose).

@ogrisel
Member
ogrisel commented Aug 25, 2023

Could you print the value of grad_pointwise (by editing /home/akaash/Downloads/statsmodels-play/venv/lib/python3.10/site-packages/sklearn/linear_model/_linear_loss.py:290) at each iteration?
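A quick way to do that (a debugging sketch, assuming you edit your local copy of _linear_loss.py; not part of the library) is a temporary print just before the matmul on that line:

# temporary debugging edit inside the local copy of
# sklearn/linear_model/_linear_loss.py, just before line 290
print("grad_pointwise:", grad_pointwise, "all finite:", np.all(np.isfinite(grad_pointwise)))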

@ogrisel
Member
ogrisel commented Aug 25, 2023

I tried the reproducer with verbose=100 and I get a different output:

>>> model_sklearn_lbfgs = PoissonRegressor(alpha=0, verbose=100).fit(X_train, y_train)
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           11     M =           10
 This problem is unconstrained.

At X0         0 variables are exactly at the bounds

At iterate    0    f= -1.42612D+02    |proj g|=  4.30046D+04


ITERATION     1

---------------- CAUCHY entered-------------------
 There are            0   breakpoints 

 GCP found in this segment
Piece      1 --f1, f2 at start point  -1.8494D+09  1.8494D+09
Distance to the stationary point =   1.0000D+00

---------------- exit CAUCHY----------------------

          11  variables are free at GCP            1
/Users/ogrisel/mambaforge/envs/tmp/lib/python3.11/site-packages/sklearn/linear_model/_linear_loss.py:290: RuntimeWarning: invalid value encountered in matmul
  grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights
 LINE SEARCH           1  times; norm of step =    0.0000000000000000     

At iterate    1    f= -1.42612D+02    |proj g|=  4.30046D+04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
   11      1      3      1     0     0   4.300D+04  -1.426D+02
  F =  -142.61189962516949     

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH

Note that the warning happens after the CAUCHY subroutine.

I printed the grad_pointwise values and there are indeed infinite values there.

ogrisel removed the Needs Triage label on Aug 25, 2023
@ogrisel
Member
ogrisel commented Aug 25, 2023

I have tried changing the penalty (alpha) and we still get non-finite gradients.

Not sure what the fix would be, but we could at least raise an error or a ConvergenceWarning with a more user-friendly message if intermediate computations produce non-finite gradient values.

@lorentzenchr
Copy link
Member

I suspect that the column "Holders", with a max value of 3e3, produces an overflow when exponentiated. If that's true, there is not much we can do about it.
A user can

  • use a pipeline with a StandardScaler (see the sketch below)
  • use a different solver
  • use a different package like glum (which defaults to a Newton-Cholesky-like solver)
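A minimal sketch of the first option, reusing X_train and y_train from the reproducer above and scaling only the "Holders" column via a ColumnTransformer (illustrative only, not a tested fix):

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# standardize the numeric "Holders" column, pass the one-hot columns through unchanged
scale_holders = ColumnTransformer(
    [("scale", StandardScaler(), ["Holders"])],
    remainder="passthrough",
)
model = make_pipeline(scale_holders, PoissonRegressor(alpha=0))
model.fit(X_train, y_train)
print(model[-1].intercept_, model[-1].coef_)

Note that the ColumnTransformer moves the scaled "Holders" column to the front, and its coefficient is on the standardized scale.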

@ogrisel
Member
ogrisel commented Aug 25, 2023

We could detect the non-finite gradient values and issue a ConvergenceWarning suggesting to scale the features (and target?) and/or to try alternative solvers.
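A rough sketch of that idea as a standalone helper (hypothetical naming; where exactly it would be called inside LinearModelLoss is still open):

import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning

def _warn_if_nonfinite_gradient(grad_pointwise):
    # hypothetical helper: warn when the pointwise gradient contains nan/inf
    if not np.all(np.isfinite(grad_pointwise)):
        warnings.warn(
            "Non-finite values encountered in the gradient. Consider scaling"
            " the features and/or the target, or using a different solver"
            " such as 'newton-cholesky'.",
            ConvergenceWarning,
        )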

@akaashp2000
Author

@ogrisel

  • yes, I am in fact getting the same output with verbose = 100 as you
  • ... and the same issue of infinite values for grad_pointwise

Following your suggestion to print the value at each iteration, I added a line print(grad_pointwise) at line 290 of _linear_loss.py. Without any verbose value specified, the first value printed had reasonable coordinates, the second had very large/infinite coordinates, the third reverted to the first value of grad_pointwise, and then the intercept (3.896...) and coefficients (all zeroes) were printed.

@lorentzenchr

I have now tried fitting the GLM with lbfgs on the same X_train, but with the "Holders" column scaled, i.e.

X_train_scaled = X_train.copy()
X_train_scaled["Holders"] = X_train_scaled["Holders"]/X_train_scaled["Holders"].max()

and it does indeed converge now, to the same coefficients as solver='newton-cholesky' (aside from the 'Holders' coefficient, which is rescaled because the feature was rescaled). Scaling the feature works for both statsmodels and scikit-learn.
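For reference, the "Holders" coefficient from the scaled fit can be mapped back to the original feature scale by dividing by the scaling factor (a sketch, assuming X_train_scaled and y_train from above):

model_scaled = PoissonRegressor(alpha=0).fit(X_train_scaled, y_train)
holders_scale = X_train["Holders"].max()
# "Holders" is the last column, so undo the feature scaling on its coefficient
print(model_scaled.coef_[-1] / holders_scale)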

@akaashp2000
Author

I would be happy to help by adding a ConvergenceWarning and/or a suggestion to scale features / use another solver, as @ogrisel suggested, should any infinite values occur in grad_pointwise.

My initial thought is that this could be done in the LinearModelLoss class's loss_gradient, immediately after the gradient is computed. What are your thoughts, or is there a better approach?

@lorentzenchr
Member

An explicit check for finiteness could impact the fit time. Catching the RuntimeWarning could be a way to go.
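A sketch of that approach (variable names as in _linear_loss.py around line 290; illustrative only, not a tested patch) would be to escalate the specific RuntimeWarning into an exception via the warnings filter and rewrap it:

import warnings

with warnings.catch_warnings():
    # turn e.g. "invalid value encountered in matmul" into an exception
    warnings.simplefilter("error", RuntimeWarning)
    try:
        grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights
    except RuntimeWarning:
        raise ValueError(
            "Non-finite values encountered in the gradient. Consider scaling"
            " the features and/or the target, or using a different solver."
        ) from None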

@ogrisel
Member
ogrisel commented Aug 26, 2023

We could indeed force numpy to raise an exception during the matmul and rewrap it with a ValueError with a more informative error message.

with np.errstate(all='raise'):
    # do the matmul here
    ...

https://numpy.org/doc/stable/reference/generated/numpy.errstate.html#numpy.errstate

@akaashp2000
Author

@ogrisel

I am trying the approach you suggested above. From what I understand, using np.errstate means that instead of the RuntimeWarning, a FloatingPointError exception is raised, so I have tried the following change: where the matmul was before, I now have:

with np.errstate(all="raise"):
    try:
        grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights
    except FloatingPointError:
        raise ValueError(
            "Overflow detected. Try scaling the target variable or"
            " features, or using a different solver"
        ) from None

(I believe the from None suppresses the original FloatingPointError, so only the ValueError is raised... below is the output)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 4
      1 # one-hot encode the categorical columns, and drop the baseline column
      2 # with lbfgs solver
----> 4 model_sklearn_lbfgs = PoissonRegressor(alpha=0).fit(X_train, y_train)
      6 print(model_sklearn_lbfgs.intercept_)
      7 print(model_sklearn_lbfgs.coef_)

File ~/Downloads/scikit-learn/sklearn/base.py:1215, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1208     estimator._validate_params()
   1210 with config_context(
   1211     skip_parameter_validation=(
   1212         prefer_skip_nested_validation or global_skip_validation
   1213     )
   1214 ):
-> 1215     return fit_method(estimator, *args, **kwargs)

File ~/Downloads/scikit-learn/sklearn/linear_model/_glm/glm.py:265, in _GeneralizedLinearRegressor.fit(self, X, y, sample_weight)
    262 if self.solver == "lbfgs":
    263     func = linear_loss.loss_gradient
--> 265     opt_res = scipy.optimize.minimize(
    266         func,
    267         coef,
    268         method="L-BFGS-B",
...
    297         ) from None
    298 if self.fit_intercept:
    299     grad[-1] = grad_pointwise.sum()

ValueError: Overflow detected. Try scaling the target variable or features, or using a different solver

Is this what you meant? Could something similar be done for the multiclass case too (i.e. when self.base_loss.is_multiclass is True)?

@lorentzenchr
Member

@akaashp2000 Yes, such an error message is much more informative. Could you also check that actually scaling first does solve the problem? If so, then a PR for the improved error message would be much appreciated.

@stes
Contributor
stes commented Aug 16, 2024

I gave this another shot and made a suggestion in #29681, based on @akaashp2000's PR, but including tests.

However, I found another edge case that is not detectable (see https://github.com/scikit-learn/scikit-learn/pull/29681/files#r1719809543), to repro:

import sys
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
from sklearn.linear_model import PoissonRegressor
from sklearn.preprocessing import (
    OneHotEncoder,
)

data = sm.datasets.get_rdataset('Insurance', package='MASS').data

X_train_ohe = OneHotEncoder(sparse_output=False, drop=[1, "<1l", "<25"]).fit(data[["District", "Group", "Age"]])
X_train_ohe = pd.DataFrame(X_train_ohe.transform(data[["District", "Group", "Age"]]), columns=X_train_ohe.get_feature_names_out())

# NOTE: Use just the last column here
X_train = data[["Holders"]]
y_train = data["Claims"]

model_sklearn_lbfgs = PoissonRegressor(alpha=0, tol = 1e-8, max_iter = 10000, verbose = True).fit(X_train, y_train)
print(model_sklearn_lbfgs.intercept_)
print(model_sklearn_lbfgs.coef_)

# with newton-cholesky solver
model_sklearn_nc = PoissonRegressor(alpha=0, solver='newton-cholesky', tol = 1e-8, max_iter = 1000).fit(X_train, y_train)
print(model_sklearn_nc.intercept_)
print(model_sklearn_nc.coef_)

which gives

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            2     M =           10
 This problem is unconstrained.

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    2      1      3      1     0     0   4.300D+04  -1.426D+02

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH             
3.896592058397603
[0.]
3.2885810888136406
[0.00087977]

How to proceed?
