FEA add GLM Newton-LSMR Solver on top of Newton-Cholesky by lorentzenchr · Pull Request #23507 · scikit-learn/scikit-learn

FEA add GLM Newton-LSMR Solver on top of Newton-Cholesky #23507


Closed
wants to merge 108 commits

Conversation

lorentzenchr
Member

Reference Issues/PRs

This is a follow-up on #23314.

What does this implement/fix? Explain your changes.

It builds on the machinery of #23314 and implements a further Newton solver for GLMs based on LSMR. It uses the IRLS formulation of the Newton step, which can be read as the normal equations of a least squares problem. This least squares problem is then solved via LSMR with a careful choice of the stopping criterion (atol).

This solver works for dense and sparse input, uses only matrix-vector products, can stop early once the convergence criteria are met, can handle both tall (n_samples > n_features) and wide (n_features > n_samples) data, can deal with singular (collinear) X, but may fall short for very ill-conditioned X.
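
As a rough illustration of the idea, here is a minimal sketch (hypothetical names like newton_lsmr_step, not the implementation in this PR; assumes dense X, a log link and no intercept) of how one penalized Newton/IRLS step can be obtained from LSMR applied to the weighted least squares subproblem instead of forming and factorizing X'WX:

import numpy as np
from scipy.sparse.linalg import lsmr


def newton_lsmr_step(X, y, coef, l2_reg, atol=1e-6):
    # One IRLS/Newton step for a Poisson GLM with log link, solved via LSMR.
    eta = X @ coef        # linear predictor
    mu = np.exp(eta)      # inverse log link
    g = mu - y            # per-sample gradient of the (half) Poisson deviance
    w = mu                # per-sample Hessian weights
    z = eta - g / w       # IRLS working response
    sqrt_w = np.sqrt(w)
    # The normal equations of  min ||sqrt(W) (X c - z)||^2 + l2_reg * ||c||^2
    # are (X'WX + l2_reg * I) c = X'W z, i.e. exactly the L2-penalized Newton
    # system for the (unscaled) sum-of-losses objective, so LSMR with
    # damp=sqrt(l2_reg) returns the Newton update directly.
    return lsmr(
        X * sqrt_w[:, None], sqrt_w * z, damp=np.sqrt(l2_reg), atol=atol, btol=atol
    )[0]

The actual solver in this PR is more involved (intercept term, sparse input, and the careful choice of atol mentioned above).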

Any other comments?

I have never seen the combination of IRLS with LSMR (or LSQR) before. At the moment, this is my favorite solver 😉

As promised, LSMR seems to be able to stop a little earlier and therefore is a little faster than LSQR.
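
For what it's worth, a quick way to compare the two in isolation (a toy least squares problem with scipy's solvers, nothing specific to this PR's GLM solvers) is to look at the iteration counts they report for the same tolerances:

# Toy comparison of scipy's LSMR and LSQR on a moderately ill-conditioned
# least squares problem; iteration counts depend on the problem at hand.
import numpy as np
from scipy.sparse.linalg import lsmr, lsqr

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 200)) * np.logspace(0, 3, 200)  # scale columns
b = rng.standard_normal(2000)

itn_lsmr = lsmr(A, b, atol=1e-10, btol=1e-10)[2]  # third entry is the iteration count
itn_lsqr = lsqr(A, b, atol=1e-10, btol=1e-10)[2]
print(f"lsmr: {itn_lsmr} iterations, lsqr: {itn_lsqr} iterations")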

@lorentzenchr
Member Author
lorentzenchr commented May 31, 2022

Some benchmarks

As in #23314.
Updated with a115afa.

import numpy as np
from sklearn._loss import HalfPoissonLoss
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model._linear_loss import LinearModelLoss
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer


df = fetch_openml(data_id=41214, as_frame=True).frame
df["Frequency"] = df["ClaimNb"] / df["Exposure"]
log_scale_transformer = make_pipeline(
    FunctionTransformer(np.log, validate=False), StandardScaler()
)
linear_model_preprocessor = ColumnTransformer(
    [
        ("passthrough_numeric", "passthrough", ["BonusMalus"]),
        ("binned_numeric", KBinsDiscretizer(n_bins=10, subsample=None), ["VehAge", "DrivAge"]),
        ("log_scaled_numeric", log_scale_transformer, ["Density"]),
        (
            "onehot_categorical",
            OneHotEncoder(),
            ["VehBrand", "VehPower", "VehGas", "Region", "Area"],
        ),
    ],
    remainder="drop",
)
y = df["Frequency"]
w = df["Exposure"]
X = linear_model_preprocessor.fit_transform(df)

Note that X is sparse

%time PoissonRegressor(solver="lbfgs", alpha=1e-4, tol=1e-4, max_iter=1000).fit(X, y, sample_weight=w); None

Wall time: 5.01 s

%time PoissonRegressor(solver="newton-cholesky", alpha=1e-4, tol=1e-4).fit(X, y, sample_weight=w); None

Wall time: 1.98 s

%time PoissonRegressor(solver="newton-lsmr", alpha=1e-4, tol=1e-4).fit(X, y, sample_weight=w); None

Wall time: 1.1 s

On dense data with

X = X.toarray()

I get

  • lbfgs: 18.5 s
  • newton-cholesky: 2.51 s
  • newton-lsmr: 4.35 s

They achieve different losses and gradients:

# loss
reg_lbfgs = PoissonRegressor(solver="lbfgs", alpha=1e-4, tol=1e-4, max_iter=1000).fit(X, y, sample_weight=w)
reg_cholesky = PoissonRegressor(solver="newton-cholesky", alpha=1e-4, tol=1e-4, max_iter=1000).fit(X, y, sample_weight=w)
reg_lsmr = PoissonRegressor(solver="newton-lsmr", alpha=1e-4, tol=1e-4, max_iter=1000).fit(X, y, sample_weight=w)

lml = LinearModelLoss(base_loss=HalfPoissonLoss(), fit_intercept=True)

for reg, solver in [(reg_lbfgs, "lbfgs"), (reg_cholesky, "cholesky"), (reg_lsmr, "lsmr")]:
    this_loss = lml.loss(
        coef=np.r_[reg.coef_, reg.intercept_],
        X=X,
        y=np.asarray(y),
        sample_weight=np.asarray(w) / np.sum(w),
        l2_reg_strength=1e-4,
    )
    this_grad = lml.gradient(
        coef=np.r_[reg.coef_, reg.intercept_],
        X=X,
        y=np.asarray(y),
        sample_weight=np.asarray(w) / np.sum(w),
        l2_reg_strength=1e-4,
    )
    print(f"solver={solver:<8} has loss = {this_loss:0.10f} "
          f"gradient 2-norm = {np.linalg.norm(this_grad):0.10f} "
          f"gradient inf-norm = {np.linalg.norm(this_grad, ord=np.inf)}"
         )
solver=lbfgs    has loss = 0.3159339334 gradient 2-norm = 0.0001658628 gradient inf-norm = 7.403807907072436e-05
solver=cholesky has loss = 0.3159052779 gradient 2-norm = 0.0000049585 gradient inf-norm = 4.956622873102174e-06
solver=lsmr     has loss = 0.3159284188 gradient 2-norm = 0.0000654960 gradient inf-norm = 3.575219334355157e-05

@ogrisel
Member
ogrisel commented Jun 1, 2022

Interesting! Since the lsmr solver is the fastest but stops at a higher loss, maybe the stopping criteria are not directly comparable.

Could you please run benchmarks with a grid of tolerance values to check whether one solver is Pareto optimal (uniformly dominates the others on this task)?

#23314 (review)

Also, you might want to contribute both those solvers and the Poisson regression task to the benchopt project:

https://benchopt.github.io/

@lorentzenchr
Member Author
lorentzenchr commented Jun 1, 2022

Solver profiles

Poisson Regression on French MTPL

As of a115afa.
Sparse X with X.shape = (678013, 75)

import warnings
from pathlib import Path
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model._linear_loss import LinearModelLoss
from sklearn.model_selection import train_test_split
from time import perf_counter
import pandas as pd
import joblib


def prepare_data():
    df = fetch_openml(data_id=41214, as_frame=True).frame
    df["Frequency"] = df["ClaimNb"] / df["Exposure"]
    log_scale_transformer = make_pipeline(
        FunctionTransformer(np.log, validate=False), StandardScaler()
    )
    linear_model_preprocessor = ColumnTransformer(
        [
            ("passthrough_numeric", "passthrough", ["BonusMalus"]),
            (
                "binned_numeric",
                KBinsDiscretizer(n_bins=10, subsample=None),
                ["VehAge", "DrivAge"],
            ),
            ("log_scaled_numeric", log_scale_transformer, ["Density"]),
            (
                "onehot_categorical",
                OneHotEncoder(),
                ["VehBrand", "VehPower", "VehGas", "Region", "Area"],
            ),
        ],
        remainder="drop",
    )
    y = df["Frequency"]
    w = df["Exposure"]
    X = linear_model_preprocessor.fit_transform(df)
    return X, y, w


X, y, w = prepare_data()
# X = X.toarray()
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
    X, y, w, train_size=100_000, test_size=10_000, random_state=0
)
print(f"{X_train.shape = }")

results = []
loss_sw = np.full_like(y_train, fill_value=(1. / y_train.shape[0]))
# loss_sw = None
for tol in np.logspace(-1, -10, 10):
    for solver in ["lbfgs", "newton-cholesky", "newton-lsmr"]:
        tic = perf_counter()
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=ConvergenceWarning)
            reg = PoissonRegressor(
                alpha=1e-4, solver=solver, tol=tol, max_iter=10000
            ).fit(X_train, y_train)
        toc = perf_counter()
        train_time = toc - tic
        train_loss = LinearModelLoss(
            base_loss=reg._get_loss(), fit_intercept=reg.fit_intercept
        ).loss(
            coef=np.r_[reg.coef_, reg.intercept_],
            X=X_train,
            y=y_train.values,
            l2_reg_strength=reg.alpha,
            sample_weight=loss_sw,
        )
        result = {
            "solver": solver,
            "tol": tol,
            "train_loss": train_loss,
            "train_time": train_time,
            "train_score": reg.score(X_train, y_train),
            "test_score": reg.score(X_test, y_test),
            "n_iter": reg.n_iter_,
            "converged": reg.n_iter_ < reg.max_iter,
        }
        print(result)
        results.append(result)


results = pd.DataFrame.from_records(results)
results.to_csv(Path().resolve() / "bench_poisson_reg.csv")

Plotting

import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt


results = pd.read_csv(Path().resolve() / "bench_poisson_reg.csv")
results["suboptimality"] = results["train_loss"] - results["train_loss"].min() + 1e-15
fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
    group.sort_values("tol").plot(x="n_iter", y="suboptimality", loglog=True, marker="o", label=label, ax=ax)
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by iterations")

fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
    group.sort_values("tol").plot(x="train_time", y="suboptimality", loglog=True, marker="o", label=label, ax=ax)
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by time")

[Figures: suboptimality vs. n_iter and suboptimality vs. train_time for the Poisson regression benchmark]

Binary Logistic Regression on Kicks Dataset

As of 9de52bf.
Sparse X with X.shape = (72983, 1076)

import warnings
from pathlib import Path
import numpy as np
from sklearn._loss import HalfBinomialLoss
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.impute import SimpleImputer
from sklearn.linear_model._glm.glm import _GeneralizedLinearRegressor
from sklearn.linear_model._linear_loss import LinearModelLoss
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from time import perf_counter
import pandas as pd
import joblib


class BinomialRegressor(_GeneralizedLinearRegressor):
    def _get_loss(self):
        return HalfBinomialLoss()


def prepare_data():
    df = fetch_openml(data_id=41162, as_frame=True, parser="auto").frame
    linear_model_preprocessor = ColumnTransformer(
        [
            (
                "passthrough_numeric",
                make_pipeline(SimpleImputer(), StandardScaler()),
                [
                    "MMRAcquisitionAuctionAveragePrice",
                    "MMRAcquisitionAuctionCleanPrice",
                    "MMRCurrentAuctionAveragePrice",
                    "MMRCurrentAuctionCleanPrice",
                    "MMRCurrentRetailAveragePrice",
                    "MMRCurrentRetailCleanPrice",
                    "MMRCurrentRetailAveragePrice",
                    "MMRCurrentRetailCleanPrice",
                    "VehBCost",
                    "VehOdo",
                    "VehYear",
                    "VehicleAge",
                    "WarrantyCost",
                ],
            ),
            (
                "onehot_categorical",
                OneHotEncoder(min_frequency=10),
                [
                    "Auction",
                    "Color",
                    "IsOnlineSale",
                    "Make",
                    "Model",
                    "Nationality",
                    "Size",
                    "SubModel",
                    "Transmission",
                    "Trim",
                    "WheelType",
                ],
            ),
        ],
        remainder="drop",
    )
    y = np.asarray(df["IsBadBuy"] == "1", dtype=float)
    X = linear_model_preprocessor.fit_transform(df)
    return X, y


X, y = prepare_data()
# X = X.toarray()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.9, random_state=0
)
print(f"{X_train.shape = }")

results = []
loss_sw = np.full_like(y_train, fill_value=(1. / y_train.shape[0]))
alpha = 1e-4
for tol in np.logspace(-1, -10, 10):
    for solver in ["lbfgs", "newton-cholesky", "newton-lsmr", "logreg-lbfgs", "logreg-newton-cg"]:
        tic = perf_counter()
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=ConvergenceWarning)
            if solver in ["lbfgs", "newton-cholesky", "newton-lsmr"]:
                reg = BinomialRegressor(
                    alpha=alpha, solver=solver, tol=tol, max_iter=10000
                ).fit(X_train, y_train)
            else:
                reg = LogisticRegression(
                    solver=solver[7:], C=1 / alpha / X.shape[0], tol=tol, max_iter=1000
                ).fit(X_train, y_train)
        toc = perf_counter()
        train_time = toc - tic
        train_loss = LinearModelLoss(
            base_loss=HalfBinomialLoss(), fit_intercept=reg.fit_intercept
        ).loss(
            coef=np.r_[np.squeeze(reg.coef_), reg.intercept_],
            X=X_train,
            y=y_train,
            l2_reg_strength=alpha,
            sample_weight=loss_sw,
        )
        result = {
            "solver": solver,
            "tol": tol,
            "train_loss": train_loss,
            "train_time": train_time,
            "train_score": reg.score(X_train, y_train),
            "test_score": reg.score(X_test, y_test),
            "n_iter": np.squeeze(reg.n_iter_),
            "converged": np.squeeze(reg.n_iter_) < np.squeeze(reg.max_iter),
        }
        print(result)
        results.append(result)


results = pd.DataFrame.from_records(results)
results.to_csv(Path().resolve() / "bench_logistic_reg.csv")

Plotting

import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt


results = pd.read_csv(Path().resolve() / "bench_logistic_reg.csv")
results["suboptimality"] = results["train_loss"] - results["train_loss
8000
"].min() + 1e-15
fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
    group.sort_values("tol").plot(x="n_iter", y="suboptimality", loglog=True, marker="o", label=label, ax=ax)
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by iterations, sparse X")


fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
    group.sort_values("tol").plot(x="train_time", y="suboptimality", loglog=True, marker="o", label=label, ax=ax)
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by time, sparse X")

[Figures: suboptimality vs. n_iter and suboptimality vs. train_time for the logistic regression benchmark on sparse X]

Summary

  • Note that results look similar with dense X.
  • For these datasets with n_samples > n_features and some one-hot encoded (OHE) features, newton-cholesky is the fastest at higher precision (smaller tol or suboptimality).
  • At lower precision, newton-lsmr or lbfgs can be faster; which one is faster depends on the dataset.
  • The behaviour of LogisticRegression (logreg-lbfgs, logreg-newton-cg) is very strange. These solvers seem unable to converge any further towards the true minimum. This is all the more strange as the only difference between logreg-lbfgs and lbfgs is the normalization of the objective function (logreg-lbfgs uses sum(loss_i), lbfgs uses mean(loss_i)); everything else is exactly identical!
    Conclusion: mean(objective) leads to numerically better behaved stopping criteria than sum(objective); see the short illustration below.
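
A rough numerical illustration of that last point (my reading, assumed rather than an analysis of these solvers): the gradient of sum(loss_i) has an absolute floating point noise floor that grows with n_samples and therefore cannot be driven arbitrarily close to zero, while the mean gradient's floor is n_samples times smaller, so small absolute tolerances remain reachable.

import numpy as np

rng = np.random.default_rng(0)
n = 65_000
g_i = rng.normal(size=n)
g_i -= g_i.mean()       # per-sample gradients that should sum to exactly zero

print(abs(g_i.sum()))   # tiny but nonzero: rounding-noise floor of the summed gradient
print(abs(g_i.mean()))  # n times smaller for the mean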

@ogrisel
Member
ogrisel commented Jun 2, 2022

BTW, thanks for the benchmarks, it also looks like a strong contender. It would be interesting to systematically compare all those solvers for various data shapes (n_samples, n_features, n_classes/n_outputs), duplicated/collinear features, regularization and also measure memory usage on dense and sparse data.

@ogrisel
Member
ogrisel commented Jun 2, 2022

can deal with singular (collinear) X, but may fall short for very ill-conditioned X

Not sure what this means. It can handle collinear data as long as n_samples >> n_uniquely_informative_features?

# Estimated Frobenius norm of A and norm of residual r for tolerance of
# next iteration.
self.A_norm = result[5]
self.r_norm = result[3]
Member
@ogrisel ogrisel Jun 2, 2022


We might want to issue a warning (and suggest increasing the regularization) if result[1] == 7 (istop), which means that lsmr has not converged within the default maxiter (which should be n_features by construction in our case), possibly because of ill-conditioning.

Instead of warning, we might even want to experiment with doing more iterations of lsmr when we detect this situation, as suggested in the docstring of lsmr, see: https://github.com/scipy/scipy/blob/main/scipy/sparse/linalg/_isolve/lsmr.py

and then only warn if we do not converge with maxiter=2 * n_features in total for instance (e.g. running lsmr twice in a row if the first call resulted in istop == 7).

Note that issuing a warning when the inner solver fails to converge while still proceeding with the outer loop might just result in a wall of warnings and a badly converged model. Maybe we could also choose to interrupt the outer solver loop early with a ConvergenceWarning in such a case. This is related to: lorentzenchr#6 (which investigates a similar case for the "newton-cholesky" solver).
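
A rough sketch of that retry-then-warn idea (a hypothetical helper using only scipy's public lsmr API, not code from this PR):

import warnings
from scipy.sparse.linalg import lsmr
from sklearn.exceptions import ConvergenceWarning


def lsmr_with_retry(A, b, atol, btol, maxiter):
    x, istop, *rest = lsmr(A, b, atol=atol, btol=btol, maxiter=maxiter)
    if istop == 7:  # maxiter reached before the stopping tolerances were met
        # Warm-start a second run from the current iterate.
        x, istop, *rest = lsmr(A, b, atol=atol, btol=btol, maxiter=maxiter, x0=x)
        if istop == 7:
            warnings.warn(
                "Inner LSMR iteration did not converge; the problem may be "
                "ill-conditioned. Consider increasing the regularization (alpha).",
                ConvergenceWarning,
            )
    return x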

@lorentzenchr
Member Author
lorentzenchr commented Jul 2, 2022

Notes on multi-class logistic / multinomial regression

Is this possible with newton-lsmr?
Consider 3 classes. We concatenate the coefficients of each class, i.e. coef = (coef_class_1, coef_class_2, coef_class_3). The Hessian becomes

(X'    ) (W11 W12 W13) (X  )
(  X'  ) (W12 W22 W23) ( X )
(    X') (W13 W23 W33) (  X)

Each submatrix W_ij is diagonal of dimension n_samples x n_samples, with W_ii = p_i * (1 - p_i) and W_ij = -p_i * p_j for i != j (elementwise, with predicted probabilities p_i per class). For LSMR to work, we need to find a square root of the whole W, say V with V'V = W. One could achieve this via a Cholesky decomposition, but W is very sparse and very large, i.e. W.shape = (n_classes * n_samples, n_classes * n_samples). I tried to find analytic expressions, but so far without luck.
See Chapter 3.2 of https://www.jstatsoft.org/article/view/v075i03.

Update

There is some literature on this, in particular https://doi.org/10.1111/j.2517-6161.1992.tb01875.x. As this is not open access, I tried to derive it myself, resulting in:

Cholesky Decomposition of multinomial covariance matrix

Mathematical sum convention, starting from 1.
p = n-dimensional vector of probabilities summing to 1.
C = diag(p) - p p', meaning C_ii = p_i - p_i^2 and C_ij = -p_i p_j for i != j

Cholesky decomposition C = L L' with lower triangular L.
L_ii = sqrt(p_i * (1 - sum(p_k, k=1..i)) / (1 - sum(p_k, k=1..i-1)))
Note that L_nn = 0.

for i > j
L_ij = -p_i p_j / sqrt(p_j * (1 - sum(p_k, k=1..j-1)) * (1 - sum(p_k, k=1..j)))

I verified this for n=4:

import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])
C = np.diag(p) - np.outer(p, p)

L = np.zeros_like(C)
for i in range(L.shape[0]):
    L[i, i] = np.sqrt(p[i] * (1 - np.sum(p[:i+1])) / (1 - np.sum(p[:i])))
for i in range(L.shape[0]):
    for j in range(i):
        den = p[j] * (1 - np.sum(p[:j])) * (1 - np.sum(p[:j+1]))
        L[i, j] = -p[i] * p[j] / np.sqrt(den)

np.allclose(L @ L.T, C)

I think this makes multinomial newton-lsmr possible!

@lorentzenchr
Member Author
lorentzenchr commented Jul 4, 2022

LDL of multinomial covariance matrix

The square root free LDL decomposition, given in https://doi.org/10.1111/j.2517-6161.1992.tb01875.x, reads:

Mathematical sum convention, starting from 1.
p = n-dimensional vector of probabilities summing to 1.
C = diag(p) - p p', meaning C_ii = p_i - p_i^2 and C_ij = -p_i p_j for i != j

LDL (square root free Cholesky) decomposition C = L D L' with lower triangular L and diagonal D.
q_0 = 1
q_i = 1 - sum(p_k, k=1..i)

L_ii = 1
for i > j:
    L_ij = -p_i / q_j

D_ii = p_i * q_i / q_{i-1}

If the p_i sum to 1, then q_n = 0 and D_nn = 0.
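
For completeness, a quick numerical check of these LDL formulas, analogous to the Cholesky check above:

import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])
C = np.diag(p) - np.outer(p, p)

n = p.shape[0]
q = 1 - np.cumsum(p)          # q_i = 1 - sum(p_k, k=1..i), so q[-1] = 0
q_prev = np.r_[1.0, q[:-1]]   # q_0 = 1

L = np.eye(n)
for i in range(n):
    for j in range(i):
        L[i, j] = -p[i] / q[j]
D = np.diag(p * q / q_prev)   # D_nn = 0 because q_n = 0

np.allclose(L @ D @ L.T, C)   # True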

@ogrisel
Member
ogrisel commented Jul 20, 2022

I think this makes multinomial newton-lsmr possible!

That's very promising.

@lorentzenchr
Member Author

Adding multiclass/multinomial would be super cool, but I guess this PR would explode. I'd rather defer it to a later PR.

@lorentzenchr changed the title from "FEA add GLM Newton-LSMR Solver" to "FEA add GLM Newton-LSMR Solver on top of Newton-Cholesky" on Nov 3, 2022
@lorentzenchr
Member Author

I will open a new PR to add LSMR. I'll keep this branch, as it is important for https://github.com/lorentzenchr/notebooks/blob/master/bench_glm_newton_solvers_cholesky_lsmr.ipynb.
