FEA add newton-lsmr solver to LogisticRegression and GLMs by lorentzenchr · Pull Request #25462 · scikit-learn/scikit-learn · GitHub
Status: Open. Wants to merge 44 commits into base: main.
Changes from all commits (44 commits, all by lorentzenchr):
bfb29a0  ENH add NewtonLSMRSolver (Nov 4, 2022)
9c3fd7f  TST add test_solver_on_ill_conditioned_X (Nov 6, 2022)
3874810  ENH add multinomial to LSMR (Nov 19, 2022)
86da909  ENH add newton-lsmr to LogisticRegression (Jan 23, 2023)
1d13089  CLN fix dtype and tests (Jan 24, 2023)
4e1a696  ENH speed up LDL by better handling q==0 (Jan 24, 2023)
7ef5877  ENH speed up LDL by using einsum and q_inv (Jan 25, 2023)
65d7a87  Merge branch 'main' into glm_newton_lsmr_only (Feb 13, 2023)
74dab94  TST fix atol in test_multinomial_identifiability_properties (Feb 13, 2023)
0dc87cf  DOC add whatsnew (Feb 13, 2023)
8e78465  TST reduce tolerances (Feb 23, 2023)
3da89d5  TST skip LinearOperator transpose for scipy<1.4 (Feb 23, 2023)
298e63e  TST fix skipif (Feb 24, 2023)
27ddf56  TST loosen rtol a bit (Feb 24, 2023)
d26004d  TST make tests pass for all random seeds (Feb 24, 2023)
59c1322  DOC add comment about initial A_norm (Feb 24, 2023)
a84b938  Merge branch 'main' into glm_newton_lsmr_only (Jun 2, 2023)
c50c859  FIX warning condition and cleaner for loop (Jun 3, 2023)
cf9facc  ENH improve Multinomial_LDL_Decomposition by precomputation (Jun 3, 2023)
fc1cb24  DOC improve docstrings and comments (Jun 4, 2023)
e7368e7  ENH inner stopping criterion for LSMR with forcing sequence (Jun 4, 2023)
7295409  DOC improve docs and comments of Newton solvers (Jun 6, 2023)
3ea7d98  ENH set initial A_norm based on n_samples, n_features and l2_reg_stre… (Jun 11, 2023)
99a9508  CLN address review comments (Jun 11, 2023)
d108ffc  TST set lsmr to xfail in test_solver_on_ill_conditioned_X like other … (Jun 11, 2023)
ab075fe  Doc docstring solver arg in LogisticRegression (Jun 12, 2023)
e4c0ee3  ENH set btol back to self.tol (Jun 14, 2023)
39030c4  TST higher tol in test_glm_regression_hstacked_X (Jun 14, 2023)
83ce34f  ENH increase conlim to make more tests pass (Jun 15, 2023)
8846903  Merge branch 'main' into glm_newton_lsmr_only (Jun 16, 2023)
b887390  ENH verbose print total number of LSMR iterations (Jun 17, 2023)
df94b5f  DOC inner_solve sets self.lsmr_iter (Jun 26, 2023)
23332fc  DOC enhance Taylor series and comments (Jun 22, 2023)
861be08  ENH more robust choice of atol (with some memory) (Jun 26, 2023)
973329a  Merge branch 'main' into glm_newton_lsmr_only (Jun 27, 2023)
8627b0a  Merge branch 'main' into glm_newton_lsmr_only (Nov 15, 2023)
3e2d1ea  FIX adapt to new loss and gradient scaling (Nov 15, 2023)
9409891  FIX sw_sum for newton-lsmr (Nov 16, 2023)
a49d25b  CLN fix import order (Nov 16, 2023)
bdd5230  Merge branch 'main' (Nov 16, 2023)
52381c2  FIX tests with sw_sum (Nov 16, 2023)
fe7bddd  Merge branch 'main' into glm_newton_lsmr_only (Mar 16, 2024)
2e80557  Merge branch 'main' into glm_newton_lsmr_only (Apr 12, 2024)
71d2733  CLN address review comments, move to 1.5 (Apr 12, 2024)
11 changes: 11 additions & 0 deletions doc/whats_new/v1.5.rst
@@ -264,6 +264,17 @@ Changelog
:mod:`sklearn.linear_model`
...........................

- |Enhancement| :class:`linear_model.LogisticRegression`,
:class:`linear_model.LogisticRegressionCV`, :class:`linear_model.GammaRegressor`,
:class:`linear_model.PoissonRegressor` and :class:`linear_model.TweedieRegressor` got
a new solver `solver="newton-lsmr"`. This is a second-order (Newton) optimization
routine: in each Newton step, the equation for the Newton direction is cast as a
least squares problem, a scheme known as iteratively reweighted least squares
(IRLS), which is then solved iteratively by LSMR. Because LSMR relies only on
matrix-vector multiplications, sparse matrices are supported as well. The new solver
may be worth a try especially for multiclass problems; see the usage sketch below
this diff.
:pr:`25462` by :user:`Christian Lorentzen <lorentzenchr>`.

- |Fix| :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`,
:class:`linear_model.Lasso` and :class:`linear_model.LassoCV` now explicitly don't
accept large sparse data formats. :pr:`27576` by :user:`Stefanie Senger
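A minimal usage sketch of the new option described in the changelog entry above. The synthetic data and parameter values are illustrative only, and `solver="newton-lsmr"` requires a scikit-learn build that includes this PR:

import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression, PoissonRegressor

rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 20))
y_class = rng.integers(0, 3, size=1000)    # multiclass target
y_count = rng.poisson(lam=3.0, size=1000)  # non-negative count target

# Multiclass problems are highlighted above as a good fit for this solver.
clf = LogisticRegression(solver="newton-lsmr").fit(X, y_class)

# LSMR needs only matrix-vector products, so sparse X is supported as well.
reg = PoissonRegressor(solver="newton-lsmr").fit(sparse.csr_matrix(X), y_count)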
602 changes: 593 additions & 9 deletions sklearn/linear_model/_glm/_newton_solver.py

Large diffs are not rendered by default.

71 changes: 63 additions & 8 deletions sklearn/linear_model/_glm/glm.py
@@ -25,7 +25,12 @@
from ...utils.optimize import _check_optimize_result
from ...utils.validation import _check_sample_weight, check_is_fitted
from .._linear_loss import LinearModelLoss
from ._newton_solver import NewtonCholeskySolver, NewtonSolver
from ._newton_solver import NewtonCholeskySolver, NewtonLSMRSolver, NewtonSolver

NEWTON_SOLVER = {
"newton-cholesky": NewtonCholeskySolver,
"newton-lsmr": NewtonLSMRSolver,
}


class _GeneralizedLinearRegressor(RegressorMixin, BaseEstimator):
@@ -65,7 +70,7 @@ class _GeneralizedLinearRegressor(RegressorMixin, BaseEstimator):
Specifies if a constant (a.k.a. bias or intercept) should be
added to the linear predictor (X @ coef + intercept).

solver : {'lbfgs', 'newton-cholesky'}, default='lbfgs'
solver : {'lbfgs', 'newton-cholesky', 'newton-lsmr'}, default='lbfgs'
Algorithm to use in the optimization problem:

'lbfgs'
@@ -81,6 +86,20 @@ class _GeneralizedLinearRegressor(RegressorMixin, BaseEstimator):

.. versionadded:: 1.2

'newton-lsmr'
Uses Newton-Raphson steps formulated as iteratively reweighted least
squares (IRLS), each solved by LSMR. Unlike `newton-cholesky`, this
solver never materializes the Hessian matrix explicitly. Instead, it
exploits the Hessian's structure to solve the least squares problems
incrementally through a series of matrix-vector operations, where the
matrices have block structure with block sizes scaling as
`(n_samples, n_features)` and `(n_samples, n_classes)`, thereby limiting
memory requirements.
Additionally, it is numerically more stable than `newton-cholesky` for
ill-conditioned `X`.

.. versionadded:: 1.5

max_iter : int, default=100
The maximal number of iterations for the solver.
Values must be in the range `[1, inf)`.
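The 'newton-lsmr' entry above describes Newton steps recast as IRLS and solved by LSMR. Below is a hedged, self-contained sketch of one such step for the single-output case, using a Poisson GLM with log link. It is not the PR's NewtonLSMRSolver, and the multinomial block structure mentioned in the docstring is out of scope here:

import numpy as np
from scipy.sparse.linalg import LinearOperator, lsmr

rng = np.random.default_rng(0)
n_samples, n_features, l2 = 200, 5, 1.0
X = rng.standard_normal((n_samples, n_features))
y = rng.poisson(np.exp(X @ rng.uniform(-0.3, 0.3, size=n_features)))
coef = np.zeros(n_features)  # current Newton iterate

mu = np.exp(X @ coef)              # inverse log link
q = (mu - y) / n_samples           # per-sample gradient term
grad = X.T @ q + l2 * coef         # gradient of the L2-penalized loss
sqrt_w = np.sqrt(mu / n_samples)   # loss Hessian is X' diag(mu/n) X

# Only matrix-vector products with A = diag(sqrt_w) @ X are needed, so a
# scipy.sparse X would work unchanged.
A = LinearOperator(
    (n_samples, n_features),
    matvec=lambda v: sqrt_w * (X @ v),
    rmatvec=lambda u: X.T @ (sqrt_w * u),
    dtype=X.dtype,
)

# IRLS working response: lsmr solves min ||A z - b||^2 + l2 * ||z||^2, whose
# normal equations read (X' W X + l2*I) z = X' W X coef - X' q. That right-hand
# side equals H coef - grad, so z is exactly the Newton update coef + d.
b = sqrt_w * (X @ coef) - q / sqrt_w
z = lsmr(A, b, damp=np.sqrt(l2), atol=1e-10, btol=1e-10)[0]
newton_step = z - coef

# Sanity check against an explicitly materialized Hessian.
H = X.T @ ((mu / n_samples)[:, None] * X) + l2 * np.eye(n_features)
assert np.allclose(newton_step, np.linalg.solve(H, -grad), atol=1e-6)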
@@ -140,7 +159,7 @@ class _GeneralizedLinearRegressor(RegressorMixin, BaseEstimator):
"alpha": [Interval(Real, 0.0, None, closed="left")],
"fit_intercept": ["boolean"],
"solver": [
StrOptions({"lbfgs", "newton-cholesky"}),
StrOptions({"lbfgs", "newton-cholesky", "newton-lsmr"}),
Hidden(type),
],
"max_iter": [Interval(Integral, 1, None, closed="left")],
@@ -284,8 +303,8 @@ def fit(self, X, y, sample_weight=None):
)
self.n_iter_ = _check_optimize_result("lbfgs", opt_res)
coef = opt_res.x
elif self.solver == "newton-cholesky":
sol = NewtonCholeskySolver(
elif self.solver in NEWTON_SOLVER.keys():
sol = NEWTON_SOLVER[self.solver](
coef=coef,
linear_loss=linear_loss,
l2_reg_strength=l2_reg_strength,
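The hunk above (truncated in this view) replaces the hard-coded `newton-cholesky` branch with a lookup in the `NEWTON_SOLVER` dict defined earlier in this file, so both Newton variants share one code path. A hedged sketch of the same pattern, with trivial placeholder classes rather than the real solvers:

class NewtonCholeskySolver:  # placeholder, stands in for the real solver
    def __init__(self, **kwargs):
        self.kwargs = kwargs

class NewtonLSMRSolver:  # placeholder, stands in for the real solver
    def __init__(self, **kwargs):
        self.kwargs = kwargs

NEWTON_SOLVER = {
    "newton-cholesky": NewtonCholeskySolver,
    "newton-lsmr": NewtonLSMRSolver,
}

def make_newton_solver(name, **solver_kwargs):
    # `fit` does the same lookup via self.solver; invalid names are rejected
    # earlier by the estimator's parameter constraints (StrOptions), so a
    # plain KeyError here is acceptable.
    return NEWTON_SOLVER[name](**solver_kwargs)

sol = make_newton_solver("newton-lsmr", l2_reg_strength=1.0)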
@@ -483,7 +502,7 @@ class PoissonRegressor(_GeneralizedLinearRegressor):
Specifies if a constant (a.k.a. bias or intercept) should be
added to the linear predictor (`X @ coef + intercept`).

solver : {'lbfgs', 'newton-cholesky'}, default='lbfgs'
solver : {'lbfgs', 'newton-cholesky', 'newton-lsmr'}, default='lbfgs'
Algorithm to use in the optimization problem:

'lbfgs'
@@ -499,6 +518,18 @@ class PoissonRegressor(_GeneralizedLinearRegressor):

.. versionadded:: 1.2

'newton-lsmr'
Uses Newton-Raphson steps formulated as iteratively reweighted least
squares (IRLS), each solved by LSMR. Unlike `newton-cholesky`, this
solver never materializes the Hessian matrix explicitly. Instead, it
exploits the Hessian's structure to solve the least squares problems
incrementally through a series of matrix-vector operations on matrices
of size `(n_samples, n_features)`.
Additionally, it is numerically more stable than `newton-cholesky` for
ill-conditioned `X`.

.. versionadded:: 1.5

max_iter : int, default=100
The maximal number of iterations for the solver.
Values must be in the range `[1, inf)`.
@@ -614,7 +645,7 @@ class GammaRegressor(_GeneralizedLinearRegressor):
Specifies if a constant (a.k.a. bias or intercept) should be
added to the linear predictor `X @ coef_ + intercept_`.

solver : {'lbfgs', 'newton-cholesky'}, default='lbfgs'
solver : {'lbfgs', 'newton-cholesky', 'newton-lsmr'}, default='lbfgs'
Algorithm to use in the optimization problem:

'lbfgs'
Expand All @@ -630,6 +661,18 @@ class GammaRegressor(_GeneralizedLinearRegressor):

.. versionadded:: 1.2

'newton-lsmr'
Uses Newton-Raphson steps formulated as iteratively reweighted least
squares (IRLS), each solved by LSMR. Unlike `newton-cholesky`, this
solver never materializes the Hessian matrix explicitly. Instead, it
exploits the Hessian's structure to solve the least squares problems
incrementally through a series of matrix-vector operations on matrices
of size `(n_samples, n_features)`.
Additionally, it is numerically more stable than `newton-cholesky` for
ill-conditioned `X`.

.. versionadded:: 1.5

max_iter : int, default=100
The maximal number of iterations for the solver.
Values must be in the range `[1, inf)`.
@@ -776,7 +819,7 @@ class TweedieRegressor(_GeneralizedLinearRegressor):
- 'log' for ``power > 0``, e.g. for Poisson, Gamma and Inverse Gaussian
distributions

solver : {'lbfgs', 'newton-cholesky'}, default='lbfgs'
solver : {'lbfgs', 'newton-cholesky', 'newton-lsmr'}, default='lbfgs'
Algorithm to use in the optimization problem:

'lbfgs'
Expand All @@ -792,6 +835,18 @@ class TweedieRegressor(_GeneralizedLinearRegressor):

.. versionadded:: 1.2

'newton-lsmr'
Uses Newton-Raphson steps formulated as iteratively reweighted least
squares (IRLS), each solved by LSMR. Unlike `newton-cholesky`, this
solver never materializes the Hessian matrix explicitly. Instead, it
exploits the Hessian's structure to solve the least squares problems
incrementally through a series of matrix-vector operations on matrices
of size `(n_samples, n_features)`.
Additionally, it is numerically more stable than `newton-cholesky` for
ill-conditioned `X`.

.. versionadded:: 1.5

max_iter : int, default=100
The maximal number of iterations for the solver.
Values must be in the range `[1, inf)`.