FEA add newton-lsmr solver to LogisticRegression and GLMs #25462
Conversation
@ogrisel @mathurinm @TomDLT @agramfort @rth might be interested as this seems to be new ground for GLM solvers, especially the multinomial logistic regression! It was a very stony path to arrive with all (added) tests green. Right now, I have no energy to do extensive benchmarking, but I hope that this work will become useful and find its way into scikit-learn in the end. I'm sure there are opportunities left for performance optimization.
Glad to see this! I just re-ran the previous benchmark for Poisson regression on the French MTPL dataset from the previous PR. Here are the results on my laptop (plots not reproduced): this looks very good.
I have adapted the above benchmark to turn it into an imbalanced multiclass classification problem by binning the target. Since 0 is overly represented, when choosing a large number of bins and the quantile strategy, many bins are collapsed to 0. Here is the code:

import warnings
from pathlib import Path
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.linear_model import PoissonRegressor, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model._linear_loss import LinearModelLoss
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from time import perf_counter
import pandas as pd
import joblib
m = joblib.Memory(location=".", verbose=0)
@m.cache
def prepare_data():
df = fetch_openml(data_id=41214, as_frame=True, parser='auto').frame
df["Frequency"] = df["ClaimNb"] / df["Exposure"]
log_scale_transformer = make_pipeline(
FunctionTransformer(np.log, validate=False), StandardScaler()
)
linear_model_preprocessor = ColumnTransformer(
[
("passthrough_numeric", "passthrough", ["BonusMalus"]),
(
"binned_numeric",
KBinsDiscretizer(n_bins=10, subsample=None),
["VehAge", "DrivAge"],
),
("log_scaled_numeric", log_scale_transformer, ["Density"]),
(
"onehot_categorical",
OneHotEncoder(),
["VehBrand", "VehPower", "VehGas", "Region", "Area"],
),
],
remainder="drop",
)
y = df["Frequency"]
w = df["Exposure"]
X = linear_model_preprocessor.fit_transform(df)
return X, y, w
X, y_orig, w = prepare_data()
print("binning the target...")
binner = KBinsDiscretizer(
n_bins=300, encode="ordinal", strategy="quantile", subsample=int(2e5), random_state=0
)
y = binner.fit_transform(y_orig.to_numpy().reshape(-1, 1)).ravel().astype(np.int32)
# X = X.toarray()
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
X, y, w, train_size=10_000, test_size=10_000, random_state=0
)
print(f"{X_train.shape = }")
print("y_train.value_counts() :")
print(pd.Series(y_train).value_counts())
results = []
slow_solvers = set()
for tol in np.logspace(-1, -10, 10):
for solver in ["lbfgs", "newton-cg", "newton-lsmr"]:
if solver in slow_solvers:
# skip slow solvers to keep the benchmark runtime reasonable
continue
tic = perf_counter()
# with warnings.catch_warnings():
# warnings.filterwarnings("ignore", category=ConvergenceWarning)
clf = LogisticRegression(
C=1e12, solver=solver, tol=tol, max_iter=10000
).fit(X_train, y_train)
toc = perf_counter()
train_time = toc - tic
if train_time > 200:
# skip this solver from now on...
slow_solvers.add(solver)
# TODO: handle the regularization term...
train_loss = log_loss(y_train, clf.predict_proba(X_train))
n_iter = clf.n_iter_[0]
result = {
"solver": solver,
"tol": tol,
"train_loss": train_loss,
"train_time": train_time,
"train_score": clf.score(X_train, y_train),
"test_score": clf.score(X_test, y_test),
"n_iter": n_iter,
"converged": n_iter < clf.max_iter,
}
print(result)
results.append(result)
results = pd.DataFrame.from_records(results)
filepath = Path().resolve() / "bench_multinomial_logistic_regression_mtpl.csv"
results.to_csv(filepath)
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
results = pd.read_csv(filepath)
results["suboptimality"] = results["train_loss"] - results["train_loss"].min() + 1e-15
fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
group.sort_values("tol").plot(
x="n_iter", y="suboptimality", loglog=True, marker="o", label=label, ax=ax
)
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by iterations")
fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
group.sort_values("tol").plot(
x="train_time", y="suboptimality", loglog=True, marker="o", label=label, ax=ax
)
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by time")
plt.show()

DISCLAIMER: the plot above displays the unpenalized training log loss.

EDIT: I did another run with

This task is very challenging for all solvers and I had to decrease the number of samples to get it to run in a reasonable time on my laptop. I also stopped recording a solver when tol decreases to the point where a single fit would last more than a few minutes. Here are the resulting plots:

Note that the handling of the stopping criterion of LBFGS is not working properly for the
Note that for lower tolerance values, the above snippet can trigger:

/Users/ogrisel/code/scikit-learn/sklearn/linear_model/_linear_loss.py:867: RuntimeWarning: divide by zero encountered in divide
  fj = self.p[:, i] / (self.q[:, i - 1] + mask)
/Users/ogrisel/code/scikit-learn/sklearn/linear_model/_linear_loss.py:873: RuntimeWarning: invalid value encountered in add
  x[:, i] += fj * x[:, j]

for the

Finally, I think it would be interesting to adapt this benchmark to use benchopt and include it in this panel since it's quite challenging for most solvers yet still realistic enough.
Some early feedback after a quick glance at the code.
Out of curiosity, have you tried to profile this to pinpoint the bottlenecks for both the multinomial and non-multinomial cases?
I have the impression that multithreading is barely used (in the multiclass case), so it's probably not a few large BLAS calls as is the case for newton-cholesky.
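For reference, one quick way to run such a profile is plain cProfile around a fit call; this is only a sketch (the random toy data is made up, and the "newton-lsmr" option only exists on this branch):

import cProfile
import pstats
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 50))
y = rng.integers(0, 10, size=5_000)  # 10 classes to exercise the multinomial path

clf = LogisticRegression(solver="newton-lsmr", max_iter=100)
with cProfile.Profile() as prof:
    clf.fit(X, y)
pstats.Stats(prof).sort_stats("cumulative").print_stats(15)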
@@ -516,3 +532,460 @@ def inner_solve(self, X, y, sample_weight):
            )
            self.use_fallback_lbfgs_solve = True
            return


class NewtonLSMRSolver(NewtonSolver):
I have the feeling the code would be simpler to follow if we had a special subclass to handle the self.linear_loss.base_loss.is_multiclass case.
That would require introducing a factory function to do the dispatching to the correct solver class based on the loss object, but I have the feeling that would be worth it.
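For illustration, a minimal sketch of what such a dispatch could look like. The factory name and the MultinomialNewtonLSMRSolver subclass are hypothetical (only NewtonLSMRSolver exists in this PR), and the classes below are just stand-ins:

class NewtonLSMRSolver:  # stand-in for the solver class added in this PR
    def __init__(self, linear_loss, **solver_params):
        self.linear_loss = linear_loss
        self.solver_params = solver_params

class MultinomialNewtonLSMRSolver(NewtonLSMRSolver):  # hypothetical multiclass variant
    pass

def make_newton_lsmr_solver(linear_loss, **solver_params):
    # Dispatch on the loss object: multiclass losses get the specialized subclass.
    if linear_loss.base_loss.is_multiclass:
        return MultinomialNewtonLSMRSolver(linear_loss, **solver_params)
    return NewtonLSMRSolver(linear_loss, **solver_params)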
For the multinomial/multiclass case, the profile clearly points to two functions. Edit: I was able to significantly speed up those 2 functions in e5f5f48. They are still the bottleneck, but much reduced (~2x).
Force-pushed 885b413 to d0eea42:
- keep dtype float32 after LSMR
- lower test precision in test_NewtonLSMRSolver_multinomial_A_b_on_3_classes

Force-pushed d0eea42 to 7ef5877.
With the latest improvements it looks a bit better.

Sparse X (as above)

Dense X

Conclusion

So this solver can be used for very fast but rough estimates or for high precision estimates :smirk:

Code for reproducibility:

import warnings
from pathlib import Path
import numpy as np
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.linear_model import PoissonRegressor, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model._linear_loss import LinearModelLoss
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from time import perf_counter
import pandas as pd
import joblib
m = joblib.Memory(location=".", verbose=0)
@m.cache
def prepare_data():
df = fetch_openml(data_id=41214, as_frame=True, parser='auto').frame
df["Frequency"] = df["ClaimNb"] / df["Exposure"]
log_scale_transformer = make_pipeline(
FunctionTransformer(np.log, validate=False), StandardScaler()
)
linear_model_preprocessor = ColumnTransformer(
[
("passthrough_numeric", "passthrough", ["BonusMalus"]),
(
"binned_numeric",
KBinsDiscretizer(n_bins=10, subsample=None),
["VehAge", "DrivAge"],
),
("log_scaled_numeric", log_scale_transformer, ["Density"]),
(
"onehot_categorical",
OneHotEncoder(),
["VehBrand", "VehPower", "VehGas", "Region", "Area"],
),
],
remainder="drop",
)
y = df["Frequency"]
w = df["Exposure"]
X = linear_model_preprocessor.fit_transform(df)
return X, y, w
X, y_orig, w = prepare_data()
print("binning the target...")
binner = KBinsDiscretizer(
n_bins=300, encode="ordinal", strategy="quantile", subsample=int(2e5), random_state=0
)
y = binner.fit_transform(y_orig.to_numpy().reshape(-1, 1)).ravel().astype(np.int32)
# X = X.toarray()
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
X, y, w, train_size=10_000, test_size=10_000, random_state=0
)
print(f"{X_train.shape = }")
print(f"{sparse.issparse(X_train)=}")
print("y_train.value_counts() :")
print(pd.Series(y_train).value_counts())
results = []
slow_solvers = set()
for tol in np.logspace(-1, -10, 10):
for solver in ["lbfgs", "newton-cg", "newton-lsmr"]:
if solver in slow_solvers:
# skip slow solvers to keep the benchmark runtime reasonable
continue
tic = perf_counter()
# with warnings.catch_warnings():
# warnings.filterwarnings("ignore", category=ConvergenceWarning)
clf = LogisticRegression(
C=1e12, solver=solver, tol=tol, max_iter=10000
).fit(X_train, y_train)
toc = perf_counter()
train_time = toc - tic
if train_time > 200:
# skip this solver from now on...
slow_solvers.add(solver)
# TODO: handle the regularization term...
train_loss = log_loss(y_train, clf.predict_proba(X_train))
n_iter = clf.n_iter_[0]
result = {
"solver": solver,
"tol": tol,
"train_loss": train_loss,
"train_time": train_time,
"train_score": clf.score(X_train, y_train),
"test_score": clf.score(X_test, y_test),
"n_iter": n_iter,
"converged": n_iter < clf.max_iter,
}
print(result)
results.append(result)
results = pd.DataFrame.from_records(results)
filepath = Path().resolve() / "bench_multinomial_logistic_regression_mtpl.csv"
results.to_csv(filepath)
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
results = pd.read_csv(filepath)
results["suboptimality"] = results["train_loss"] - results["train_loss"].min() + 1e-15
fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
group.sort_values("tol").plot(
x="n_iter", y="suboptimality", loglog=True, marker="o", label=label, ax=ax
)
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by iterations")
fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
group.sort_values("tol").plot(
x="train_time", y="suboptimality", loglog=True, marker="o", label=label, ax=ax
)
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by time")
plt.show()
@lorentzenchr I am not sure if you saw, but your last push triggered a CI failure. I did not investigate myself, but I wanted to make sure that it does not go unnoticed.
Looking at the last plot, I wonder why the LSMR-based solver seems to slow down after the first 4 iterations, before it accelerates again in the last two. Perhaps the choice of
Force-pushed fa2586f to 0dc87cf.
The remaining CI error will be automatically fixed by setting scipy>=1.4, see scipy/scipy#7396. Note that the transpose is only taken in a few tests; the solver itself works fine with those older scipy versions.
For reference, the bump to scipy>=1.4 in
CI all 🟢 again.
Commit 83ce34f passes
When trying to trigger the LBFGS fallback by making the model extremely confident, so as to make the Hessian numerically degenerate, I triggered the following:
>>> import numpy as np
... from sklearn.linear_model import LogisticRegression
...
... x = np.array([-1e24] * 1 + [1e24] * 1)
... X = x.reshape(-1, 1)
... y = (x > 0).astype(np.int32)
...
... lr = LogisticRegression(solver="newton-lsmr", penalty=None, verbose=100).fit(X, y)
... lr.n_iter_
[...]
Newton iter=26
norm(gradient) = 4170877078229.1255
Traceback (most recent call last):
Cell In[2], line 8
lr = LogisticRegression(solver="newton-lsmr", penalty=None, verbose=100).fit(X, y)
File ~/code/scikit-learn/sklearn/base.py:1148 in wrapper
return fit_method(estimator, *args, **kwargs)
File ~/code/scikit-learn/sklearn/linear_model/_logistic.py:1321 in fit
fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose, prefer=prefer)(
File ~/code/scikit-learn/sklearn/utils/parallel.py:65 in __call__
return super().__call__(iterable_with_config)
File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:1085 in __call__
if self.dispatch_one_batch(iterator):
File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:901 in dispatch_one_batch
self._dispatch(tasks)
File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:819 in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/_parallel_backends.py:208 in apply_async
result = ImmediateResult(func)
File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/_parallel_backends.py:597 in __init__
self.results = batch()
File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:288 in __call__
return [func(*args, **kwargs)
File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:288 in <listcomp>
return [func(*args, **kwargs)
File ~/code/scikit-learn/sklearn/utils/parallel.py:127 in __call__
return self.function(*args, **kwargs)
File ~/code/scikit-learn/sklearn/linear_model/_logistic.py:485 in _logistic_regression_path
w0 = sol.solve(X=X, y=target, sample_weight=sample_weight)
File ~/code/scikit-learn/sklearn/linear_model/_glm/_newton_solver.py:426 in solve
self.inner_solve(X=X, y=y, sample_weight=sample_weight)
File ~/code/scikit-learn/sklearn/linear_model/_glm/_newton_solver.py:974 in inner_solve
atol=eta * norm_G / (self.A_norm * self.r_norm),
ZeroDivisionError: float division by zero

We might need a
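For illustration, one possible guard, purely as a sketch: the names eta, norm_G, A_norm and r_norm are taken from the traceback above, and the fallback value is an assumption, not the PR's behavior.

import numpy as np

def inner_atol(eta, norm_G, A_norm, r_norm, fallback=0.0):
    # If the running LSMR norm estimates are zero (or not finite), return a
    # fallback instead of dividing by zero in eta * norm_G / (A_norm * r_norm).
    denom = A_norm * r_norm
    if not np.isfinite(denom) or denom <= 0.0:
        return fallback
    return eta * norm_G / denom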
Also note that
Actually there is a problem with lbfgs on the above problem: it does not converge (the solver output is not reproduced here). I don't know if the failure of lbfgs here should be considered a bug or not, but since lbfgs is our robust fallback, this might be a problem :) Maybe LBFGS should try a simple gradient step when the line search fails?
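A minimal way to check the lbfgs behavior on the same degenerate data as in the snippet above (max_iter and the printout are arbitrary choices; inspect n_iter_ and any ConvergenceWarning):

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([-1e24] * 1 + [1e24] * 1)
X = x.reshape(-1, 1)
y = (x > 0).astype(np.int32)

lr = LogisticRegression(solver="lbfgs", penalty=None, max_iter=10_000).fit(X, y)
print(lr.n_iter_)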
I found a way to trigger the LBFGS fallback for the multiclass case:

import numpy as np
from sklearn.linear_model import LogisticRegression
x = np.array([-1e24] * 1 + [1e24] * 2)
X = x.reshape(-1, 1)
y = np.asarray([0, 1, 2])
lr = LogisticRegression(solver="newton-lsmr", penalty=None, verbose=100).fit(X, y)
I opened #26707 for investigating the inner solver stopping criterion and ran a lot of benchmarks. There is no clear winner. I have to leave it as is: either it is good enough in its current shape or someone else needs to dig deeper. My conclusion is that we have quite some room for improvement of the current solvers, like #24752. Also the
There are cases where it's indeed quite impressive, based on the last benchmarks that are now collapsed in the discussion. But I agree that fixing #24752 would be helpful to get a clearer picture. Also, based on benchopt, it seems that SAG & SAGA are better reference solvers for 20 newsgroups, see e.g.: https://benchopt.github.io/results/preprint_results_preprint_results_logreg_l2.html I have no intuition on why this should be the case. You'll have to switch the dataset in the menu on the left to see the results on 20 newsgroups. UPDATE: actually the results on this dataset are completely different with and without scaling.
Some comments I wrote a few months ago (they might not all be relevant).
if self.p.dtype == np.float32:
    eps = 2 * np.finfo(np.float32).resolution
else:
    eps = 2 * np.finfo(np.float64).resolution
Suggested change: replace the four lines above with

eps = 2 * np.finfo(self.p.dtype).resolution
I'm not 100% sure that proba and hence self.p is always either float32 or float64.
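If that turns out to be a concern, a defensive variant could look like the following sketch (machine_eps and the float64 fallback are made-up names/choices, not part of the PR):

import numpy as np

def machine_eps(p):
    # Use the dtype of p when it is a floating dtype, otherwise fall back to float64.
    dtype = p.dtype if np.issubdtype(p.dtype, np.floating) else np.dtype(np.float64)
    return 2 * np.finfo(dtype).resolution

print(machine_eps(np.ones(3, dtype=np.float32)))  # float32 resolution
print(machine_eps(np.arange(3)))                  # integer input -> float64 resolution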
elif solver == "newton-lsmr":
    clf.set_params(tol=1e-6)
This adaptation makes me think of a UX consideration: do we want the tolerance to depend on the choice of the solver?
What do you have in mind?
Reference Issues/PRs
Supersedes #23507.
Fixes #16634.
What does this implement/fix? Explain your changes.
This PR adds a further NewtonSolver: NewtonLSMRSolver.
This solver uses the iteratively reweighted least squares (IRLS) formulation of a Newton step. This means the inner solver uses the square root of the Hessian and solves the corresponding least squares problem (as opposed to solving the normal equation as "newton-cholesky" does) with the iterative LSMR solver.
This solver is therefore suited for dense and sparse X.
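To make the IRLS formulation concrete, here is a self-contained sketch of Newton-LSMR iterations for an L2-penalized binary logistic model. This is not the PR's code: the toy data and all names are made up, and the penalty is handled by stacking sqrt(lam) * I rows onto sqrt(W) X so that the normal equations of the least squares problem reproduce (X'WX + lam*I) d = -gradient.

import numpy as np
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(0)
n_samples, n_features, lam = 200, 5, 1e-1
X = rng.normal(size=(n_samples, n_features))
y = (X @ rng.normal(size=n_features) + rng.normal(size=n_samples) > 0).astype(float)

coef = np.zeros(n_features)
for it in range(20):
    proba = 1.0 / (1.0 + np.exp(-(X @ coef)))
    proba = np.clip(proba, 1e-12, 1 - 1e-12)   # keep the Hessian weights strictly positive
    u = proba - y                              # per-sample derivative of the loss
    grad = X.T @ u + lam * coef
    if np.linalg.norm(grad) < 1e-8:
        break
    sqrt_w = np.sqrt(proba * (1.0 - proba))    # square root of the diagonal of W
    # Least squares form of the Newton system (X'WX + lam*I) d = -grad.
    A = np.vstack([sqrt_w[:, None] * X, np.sqrt(lam) * np.eye(n_features)])
    b = np.concatenate([-u / sqrt_w, -np.sqrt(lam) * coef])
    d = lsmr(A, b, atol=1e-10, btol=1e-10)[0]
    coef += d                                  # full Newton step, no line search here

print(f"{it=}, ||grad||={np.linalg.norm(grad):.2e}")

For sparse X, A would be applied as a LinearOperator rather than materialized, which is what makes an iterative least squares solver like LSMR attractive here.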
Any other comments?
The multinomial/multiclass case deserves special attention as there are different ways to look at the Hessian $X' W X$. For the multinomial loss, $W$ is block diagonal with one block per sample, i.e. n_samples blocks of shape n_classes x n_classes (e.g. $3 \times 3$ for n_classes=3), each of the form $\operatorname{diag}(p) - p p'$ with $p$ the predicted class probabilities of that sample (the intermediate equations of the original description are not reproduced here). One can then use the LDL decomposition of this particular per-sample matrix to obtain a square root of the Hessian.
This is the chosen approach.
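As a quick numerical check of the block structure and the LDL' route, here is a sketch (the random probabilities and the eigendecomposition-based root are not from the PR; only the identities are verified):

import numpy as np
from scipy.linalg import ldl

rng = np.random.default_rng(0)
z = rng.normal(size=3)
p = np.exp(z) / np.exp(z).sum()   # softmax probabilities of one sample, n_classes=3
W = np.diag(p) - np.outer(p, p)   # per-sample block of W, symmetric positive semidefinite

# LDL' factorization of this particular matrix.
L, D, _ = ldl(W)
np.testing.assert_allclose(L @ D @ L.T, W, atol=1e-12)

# Any factor C with C C' = W is a usable "square root" for the least squares
# formulation; here a symmetric root via eigh, just to verify the identity.
w_eig, V = np.linalg.eigh(W)
C = V @ np.diag(np.sqrt(np.clip(w_eig, 0.0, None))) @ V.T
np.testing.assert_allclose(C @ C.T, W, atol=1e-12)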
Edit: For benchmarks, see #25462 (comment).