FEA add newton-lsmr solver to LogisticRegression and GLMs #25462 · scikit-learn/scikit-learn · by lorentzenchr


Open · lorentzenchr wants to merge 44 commits into base: main

Conversation

@lorentzenchr (Member) commented Jan 23, 2023

Reference Issues/PRs

Supersedes #23507.
Fixes #16634.

What does this implement/fix? Explain your changes.

This PR adds a further NewtonSolver: NewtonLSMRSolver.
This solver uses the iteratively reweighted least squares (IRLS) formulation of a Newton step: the inner solver works with the square root of the Hessian and solves the corresponding least squares problem with the iterative LSMR solver (as opposed to solving the normal equation, as "newton-cholesky" does).

This solver is therefore suited for dense and sparse X.
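To make the idea concrete, here is a minimal sketch of a single Newton/IRLS step computed with scipy.sparse.linalg.lsmr for an L2-penalized Poisson GLM with log link. This only illustrates the principle and is not the PR's code; the toy data and variable names are assumptions.

import numpy as np
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
y = rng.poisson(1.0, size=1000).astype(float)
coef = np.zeros(X.shape[1])
lam = 1e-3                                  # L2 penalty strength

mu = np.exp(X @ coef)                       # inverse log link
g = mu - y                                  # per-sample gradient term
w = mu                                      # per-sample Hessian weights for the log link
sqrt_w = np.sqrt(w)

# The Newton step d solves (X' W X + lam*I) d = -(X' g + lam*coef), which is the
# normal equation of the least squares problem below. LSMR solves it directly
# from the "square root" sqrt(W) X without ever forming the Hessian.
A = np.vstack([sqrt_w[:, None] * X, np.sqrt(lam) * np.eye(X.shape[1])])
b = np.concatenate([-g / sqrt_w, -np.sqrt(lam) * coef])
d = lsmr(A, b, atol=1e-8, btol=1e-8)[0]
coef = coef + d                             # the actual solver adds a line search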

Any other comments?

The multinomial/multiclass case deserves special attention as there are different ways to look at the Hessian $X' W X$:

  • Naive: expand, e.g. for n_classes=3, to
    X = [X 0 0]   W = [W00 W01 W02]   y = [y==0]   coef = [coef_class_0]
        [0 X 0]       [W10 W11 W12]       [y==1]          [coef_class_1]
        [0 0 X]       [W20 W21 W22]       [y==2]          [coef_class_2]
    
  • Treat every 1d/2d array as a 2d/3d array whose additional dimension runs over all n_samples. Then, per sample,
    $W = \mathrm{diag}(p) - p p'$ with $p$ the vector of predicted probabilities over the n_classes.
    One can then use the LDL decomposition of this particular matrix $W$, given analytically in https://doi.org/10.1111/j.2517-6161.1992.tb01875.x, and take $\sqrt{D} L'$ as the square root of $W$.
    This is the chosen approach; see the sketch below.
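A minimal numpy sketch of this factorization for a single sample, for illustration only (this is not the PR's implementation and the helper name ldl_sqrt is made up):

import numpy as np

def ldl_sqrt(p):
    """Return M = sqrt(D) @ L.T such that diag(p) - p p' = M.T @ M."""
    n = p.shape[0]
    q = 1.0 - np.cumsum(p)                      # q[j] = 1 - (p[0] + ... + p[j])
    q_prev = np.concatenate(([1.0], q[:-1]))    # q[j-1] with q[-1] := 1
    d = p * q / q_prev                          # diagonal of D
    L = np.eye(n)                               # unit lower triangular factor
    for j in range(n - 1):
        L[j + 1:, j] = -p[j + 1:] / q[j]
    return np.sqrt(np.clip(d, 0.0, None))[:, None] * L.T

p = np.array([0.2, 0.5, 0.3])                   # predicted class probabilities
M = ldl_sqrt(p)
W = np.diag(p) - np.outer(p, p)
print(np.allclose(M.T @ M, W))                  # True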

Edit: For benchmarks, see #25462 (comment).

@lorentzenchr (Member, Author):

@ogrisel @mathurinm @TomDLT @agramfort @rth might be interested as this seems to be new ground for GLM solvers, especially the multinomial logistic regression!

It was a very stony path to get all (added) tests green. Right now, I have no energy for extensive benchmarking, but I hope this work will prove useful and find its way into scikit-learn in the end. I'm sure there are opportunities left for performance optimization.

@ogrisel (Member) commented Jan 24, 2023

Glad to see this! I just re-ran the benchmark for Poisson regression on the French MTPL dataset from the previous PR, and here are the results on my laptop:

[Figure: poisson_reg_MTPL_iter]

[Figure: poisson_reg_MTPL]

So this looks very good.

@ogrisel (Member) commented Jan 24, 2023

I have adapted the above benchmark to turn it into an imbalanced multiclass classification problem by binning the target. Since 0 is over-represented, many bins collapse to 0 when choosing a large number of bins with the quantile strategy.

Here is the code:

import warnings
from pathlib import Path
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.linear_model import PoissonRegressor, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model._linear_loss import LinearModelLoss
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from time import perf_counter
import pandas as pd
import joblib


m = joblib.Memory(location=".", verbose=0)


@m.cache
def prepare_data():
    df = fetch_openml(data_id=41214, as_frame=True, parser='auto').frame
    df["Frequency"] = df["ClaimNb"] / df["Exposure"]
    log_scale_transformer = make_pipeline(
        FunctionTransformer(np.log, validate=False), StandardScaler()
    )
    linear_model_preprocessor = ColumnTransformer(
        [
            ("passthrough_numeric", "passthrough", ["BonusMalus"]),
            (
                "binned_numeric",
                KBinsDiscretizer(n_bins=10, subsample=None),
                ["VehAge", "DrivAge"],
            ),
            ("log_scaled_numeric", log_scale_transformer, ["Density"]),
            (
                "onehot_categorical",
                OneHotEncoder(),
                ["VehBrand", "VehPower", "VehGas", "Region", "Area"],
            ),
        ],
        remainder="drop",
    )
    y = df["Frequency"]
    w = df["Exposure"]
    X = linear_model_preprocessor.fit_transform(df)
    return X, y, w


X, y_orig, w = prepare_data()

print("binning the target...")
binner = KBinsDiscretizer(
    n_bins=300, encode="ordinal", strategy="quantile", subsample=int(2e5), random_state=0
)
y = binner.fit_transform(y_orig.to_numpy().reshape(-1, 1)).ravel().astype(np.int32)

# X = X.toarray()
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
    X, y, w, train_size=10_000, test_size=10_000, random_state=0
)
print(f"{X_train.shape = }")
print("y_train.value_counts() :")
print(pd.Series(y_train).value_counts())

results = []
slow_solvers = set()
for tol in np.logspace(-1, -10, 10):
    for solver in ["lbfgs", "newton-cg", "newton-lsmr"]:
        if solver in slow_solvers:
            # skip slow solvers to keep the benchmark runtime reasonable
            continue
        tic = perf_counter()
        # with warnings.catch_warnings():
        #     warnings.filterwarnings("ignore", category=ConvergenceWarning)
        clf = LogisticRegression(
            C=1e12, solver=solver, tol=tol, max_iter=10000
        ).fit(X_train, y_train)
        toc = perf_counter()
        train_time = toc - tic
        if train_time > 200:
            # skip this solver from now on...
            slow_solvers.add(solver)
        # TODO: handle the regularization term...
        train_loss = log_loss(y_train, clf.predict_proba(X_train))
        n_iter = clf.n_iter_[0]
        result = {
            "solver": solver,
            "tol": tol,
            "train_loss": train_loss,
            "train_time": train_time,
            "train_score": clf.score(X_train, y_train),
            "test_score": clf.score(X_test, y_test),
            "n_iter": n_iter,
            "converged": n_iter < clf.max_iter,
        }
        print(result)
        results.append(result)


results = pd.DataFrame.from_records(results)
filepath = Path().resolve() / "bench_multinomial_logistic_regression_mtpl.csv"
results.to_csv(filepath)


import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt


results = pd.read_csv(filepath)
results["suboptimality"] = results["train_loss"] - results["train_loss"].min() + 1e-15
fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
    group.sort_values("tol").plot(
        x="n_iter", y="suboptimality", loglog=True, marker="o", label=label, ax=ax
    )
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by iterations")

fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
    group.sort_values("tol").plot(
        x="train_time", y="suboptimality", loglog=True, marker="o", label=label, ax=ax
    )
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by time")
plt.show()

DISCLAIMER: the plot above displays the unpenalized train_loss while the models were fitted with C=1e12, so take those results with a grain of salt. I should have tried to completely disable penalization, and maybe also plot the (unpenalized) test negative log likelihood.
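A hedged sketch of what that correction could look like, reusing the names from the snippet above. The per-sample scaling of the penalty is my assumption about how scikit-learn's objective (0.5 * ||coef||^2 + C * sum of per-sample log losses) maps onto the mean log_loss reported here:

# Assumption: add the L2 penalty contribution on the mean-log-loss scale,
# i.e. ||coef||^2 / (2 * C * n_samples), to compare penalized objectives.
C = 1e12
n_samples = X_train.shape[0]
penalized_train_loss = (
    log_loss(y_train, clf.predict_proba(X_train))
    + np.sum(clf.coef_ ** 2) / (2 * C * n_samples)
)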

EDIT: I did another run with C=np.inf and the results are similar:

[Figure: multinomial-no-penalty]

This task is very challenging for all solvers, and I had to decrease the number of samples to get it to run in a reasonable time on my laptop. I also stopped recording a solver once tol decreased to the point where a single fit would take more than a few minutes.

Here are the resulting plots:

[Figure: multinomial_lsmr_iter]

[Figure: multinomial_lsmr_walltime]

Note that the handling of the stopping criterion of LBFGS is not working properly for the LogisticRegression estimator, as previously reported in #24752.

newton-lsmr is slower than the alternatives at the beginning but can still converge to low tol values, while newton-cg would probably take a lot more time (if it ever converged in the first place).

Note that for lower tolerance values, the above snippet can trigger:

/Users/ogrisel/code/scikit-learn/sklearn/linear_model/_linear_loss.py:867: RuntimeWarning: divide by zero encountered in divide
  fj = self.p[:, i] / (self.q[:, i - 1] + mask)
/Users/ogrisel/code/scikit-learn/sklearn/linear_model/_linear_loss.py:873: RuntimeWarning: invalid value encountered in add
  x[:, i] += fj * x[:, j]

for the newton-lsmr solver. Yet this does not prevent convergence.

Finally, I think it would be interesting to adapt this benchmark to use benchopt and include it in this panel since it's quite challenging for most solvers yet still realistic enough.

@ogrisel (Member) left a review comment

Some early feedback after a quick glance at the code.

Out of curiosity, have you tried to profile this to pinpoint the bottlenecks for both the multinomial and non-multinomial cases?

I have the impression that multithreading is barely used (in the multiclass case) so it's probably not a few large BLAS calls as is the case for newton-cholesky.

@@ -516,3 +532,460 @@ def inner_solve(self, X, y, sample_weight):
)
self.use_fallback_lbfgs_solve = True
return


class NewtonLSMRSolver(NewtonSolver):

I have the feeling the code would be simpler to follow if we had a special subclass to handle the self.linear_loss.base_loss.is_multiclass case.

That would require introducing a factory function to dispatch to the correct solver class based on the loss object, but I have the feeling that would be worth it.
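A hypothetical sketch of what that dispatching could look like; the class and factory names are made up, and the stub classes only stand in for the PR's real solver classes:

class NewtonLSMRSolver:                          # stand-in for the PR's GLM/binary solver
    def __init__(self, linear_loss, **kwargs):
        self.linear_loss = linear_loss

class MultinomialNewtonLSMRSolver(NewtonLSMRSolver):   # made-up multiclass subclass
    pass

def make_newton_lsmr_solver(linear_loss, **solver_params):
    """Pick the solver class based on the loss object."""
    if linear_loss.base_loss.is_multiclass:
        return MultinomialNewtonLSMRSolver(linear_loss=linear_loss, **solver_params)
    return NewtonLSMRSolver(linear_loss=linear_loss, **solver_params)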

@lorentzenchr lorentzenchr changed the title ENH add newton-lsmr solver to LogisticRegression and GLMs FEA add newton-lsmr solver to LogisticRegression and GLMs Jan 24, 2023
@lorentzenchr (Member, Author) commented Jan 24, 2023

Out of curiosity, have you tried to profile this to pinpoint the bottlenecks for both the multinomial and non-multinomial cases?

For the multinomial/multiclass case, it clearly is LDL.sqrt_D_Lt_matmul and LDL.L_sqrt_D_matmul in A.matmul and A.rmatmul inside LSMR.

Edit: I was able to significantly speed up those 2 functions in e5f5f48. They are still the bottleneck, but much reduced (~2x).
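For intuition, here is a hedged sketch of the shape of such a product, vectorized over samples with numpy broadcasting. It is not the PR's implementation and the function name is made up; it only illustrates applying sqrt(D) L' to a vector for all samples at once.

import numpy as np

def sqrt_D_Lt_matmul_sketch(p, v):
    """For each row (sample) compute sqrt(D) @ L.T @ v, where
    L D L.T = diag(p) - p p'; p and v have shape (n_samples, n_classes)."""
    q = 1.0 - np.cumsum(p, axis=1)                           # q[:, j] = 1 - cumsum(p)[:, j]
    q_prev = np.concatenate([np.ones((p.shape[0], 1)), q[:, :-1]], axis=1)
    d = p * q / q_prev                                       # diagonal of D per sample
    pv = p * v
    tail = np.cumsum(pv[:, ::-1], axis=1)[:, ::-1] - pv      # sum over i > j of p_i * v_i
    with np.errstate(divide="ignore", invalid="ignore"):
        # columns with q == 0 are simply zeroed here; the real code needs more careful guarding
        out = v - np.where(q > 0, tail / q, 0.0)             # L.T @ v
    return np.sqrt(np.clip(d, 0.0, None)) * out              # sqrt(D) @ (L.T @ v)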

@lorentzenchr (Member, Author) commented Jan 25, 2023

With the latest improvements it looks a bit better (btw n_classes=12)

Sparse X (as above)

[Figure: benchmark results on sparse X]

Dense X (added 24.02.2023)

[Figure: benchmark results on dense X]

Conclusion

So this solver can be used for very fast but rough estimates or for high-precision estimates :smirk:

Code for reproducibility:

import warnings
from pathlib import Path
import numpy as np
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.linear_model import PoissonRegressor, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model._linear_loss import LinearModelLoss
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from time import perf_counter
import pandas as pd
import joblib


m = joblib.Memory(location=".", verbose=0)


@m.cache
def prepare_data():
    df = fetch_openml(data_id=41214, as_frame=True, parser='auto').frame
    df["Frequency"] = df["ClaimNb"] / df["Exposure"]
    log_scale_transformer = make_pipeline(
        FunctionTransformer(np.log, validate=False), StandardScaler()
    )
    linear_model_preprocessor = ColumnTransformer(
        [
            ("passthrough_numeric", "passthrough", ["BonusMalus"]),
            (
                "binned_numeric",
                KBinsDiscretizer(n_bins=10, subsample=None),
                ["VehAge", "DrivAge"],
            ),
            ("log_scaled_numeric", log_scale_transformer, ["Density"]),
            (
                "onehot_categorical",
                OneHotEncoder(),
                ["VehBrand", "VehPower", "VehGas", "Region", "Area"],
            ),
        ],
        remainder="drop",
    )
    y = df["Frequency"]
    w = df["Exposure"]
    X = linear_model_preprocessor.fit_transform(df)
    return X, y, w


X, y_orig, w = prepare_data()

print("binning the target...")
binner = KBinsDiscretizer(
    n_bins=300, encode="ordinal", strategy="quantile", subsample=int(2e5), random_state=0
)
y = binner.fit_transform(y_orig.to_numpy().reshape(-1, 1)).ravel().astype(np.int32)

# X = X.toarray()
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
    X, y, w, train_size=10_000, test_size=10_000, random_state=0
)
print(f"{X_train.shape = }")
print(f"{sparse.issparse(X_train)=}")
print("y_train.value_counts() :")
print(pd.Series(y_train).value_counts())

results = []
slow_solvers = set()
for tol in np.logspace(-1, -10, 10):
    for solver in ["lbfgs", "newton-cg", "newton-lsmr"]:
        if solver in slow_solvers:
            # skip slow solvers to keep the benchmark runtime reasonable
            continue
        tic = perf_counter()
        # with warnings.catch_warnings():
        #     warnings.filterwarnings("ignore", category=ConvergenceWarning)
        clf = LogisticRegression(
            C=1e12, solver=solver, tol=tol, max_iter=10000
        ).fit(X_train, y_train)
        toc = perf_counter()
        train_time = toc - tic
        if train_time > 200:
            # skip this solver from now on...
            slow_solvers.add(solver)
        # TODO: handle the regularization term...
        train_loss = log_loss(y_train, clf.predict_proba(X_train))
        n_iter = clf.n_iter_[0]
        result = {
            "solver": solver,
            "tol": tol,
            "train_loss": train_loss,
            "train_time": train_time,
            "train_score": clf.score(X_train, y_train),
            "test_score": clf.score(X_test, y_test),
            "n_iter": n_iter,
            "converged": n_iter < clf.max_iter,
        }
        print(result)
        results.append(result)


results = pd.DataFrame.from_records(results)
filepath = Path().resolve() / "bench_multinomial_logistic_regression_mtpl.csv"
results.to_csv(filepath)


import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt


results = pd.read_csv(filepath)
results["suboptimality"] = results["train_loss"] - results["train_loss"].min() + 1e-15
fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
    group.sort_values("tol").plot(
        x="n_iter", y="suboptimality", loglog=True, marker="o", label=label, ax=ax
    )
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by iterations")

fig, ax = plt.subplots(figsize=(8, 6))
for label, group in results.groupby("solver"):
    group.sort_values("tol").plot(
        x="train_time", y="suboptimality", loglog=True, marker="o", label=label, ax=ax
    )
ax.set_ylabel("suboptimality")
ax.set_title("Suboptimality by time")
plt.show()

@ogrisel (Member) commented Feb 13, 2023

@lorentzenchr I am not sure if you saw it, but your last push triggered a CI failure. I did not investigate myself, but I wanted to make sure it does not go unnoticed.

@lorentzenchr (Member, Author):

Looking at the last plot, I wonder why the LSMR-based solver seems to slow down after the first 4 iterations, before it accelerates again in the last two. Perhaps the choice of atol for lsmr in inner_solve could be improved.

@lorentzenchr (Member, Author) commented Feb 14, 2023

The remaining CI error will be automatically fixed by bumping the minimum to scipy>=1.4, see scipy/scipy#7396. Note that the transpose is only taken in a few tests; the solver itself works fine with those older scipy versions.

@ogrisel (Member) commented Feb 14, 2023

The remaining CI error will be automatically fixed by setting scipy>=1.4, see scipy/scipy#7396.

For reference, the bump to scipy>=1.4 in main happens in this PR: #24665

@lorentzenchr (Member, Author):

CI all 🟢 again.

@lorentzenchr (Member, Author):

Commit 83ce34f passes SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -n auto -rfE sklearn/linear_model/_glm/tests/test_glm.py locally on my laptop.

@ogrisel (Member) commented Jun 16, 2023

While trying to trigger the LBFGS fallback by making the model extremely confident, so as to make the Hessian numerically degenerate, I triggered the following ZeroDivisionError at iteration 26 instead:

File ~/code/scikit-learn/sklearn/linear_model/_glm/_newton_solver.py:974 in inner_solve
    atol=eta * norm_G / (self.A_norm * self.r_norm),
ZeroDivisionError: float division by zero
>>> import numpy as np
... from sklearn.linear_model import LogisticRegression
... 
... x = np.array([-1e24] * 1 + [1e24] * 1)
... X = x.reshape(-1, 1)
... y = (x > 0).astype(np.int32)
... 
... lr = LogisticRegression(solver="newton-lsmr", penalty=None, verbose=100).fit(X, y)
... lr.n_iter_
[...]
Newton iter=26
    norm(gradient) = 4170877078229.1255
Traceback (most recent call last):
  Cell In[2], line 8
    lr = LogisticRegression(solver="newton-lsmr", penalty=None, verbose=100).fit(X, y)
  File ~/code/scikit-learn/sklearn/base.py:1148 in wrapper
    return fit_method(estimator, *args, **kwargs)
  File ~/code/scikit-learn/sklearn/linear_model/_logistic.py:1321 in fit
    fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose, prefer=prefer)(
  File ~/code/scikit-learn/sklearn/utils/parallel.py:65 in __call__
    return super().__call__(iterable_with_config)
  File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:1085 in __call__
    if self.dispatch_one_batch(iterator):
  File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:901 in dispatch_one_batch
    self._dispatch(tasks)
  File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:819 in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/_parallel_backends.py:208 in apply_async
    result = ImmediateResult(func)
  File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/_parallel_backends.py:597 in __init__
    self.results = batch()
  File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:288 in __call__
    return [func(*args, **kwargs)
  File ~/mambaforge/envs/dev/lib/python3.11/site-packages/joblib/parallel.py:288 in <listcomp>
    return [func(*args, **kwargs)
  File ~/code/scikit-learn/sklearn/utils/parallel.py:127 in __call__
    return self.function(*args, **kwargs)
  File ~/code/scikit-learn/sklearn/linear_model/_logistic.py:485 in _logistic_regression_path
    w0 = sol.solve(X=X, y=target, sample_weight=sample_weight)
  File ~/code/scikit-learn/sklearn/linear_model/_glm/_newton_solver.py:426 in solve
    self.inner_solve(X=X, y=y, sample_weight=sample_weight)
  File ~/code/scikit-learn/sklearn/linear_model/_glm/_newton_solver.py:974 in inner_solve
    atol=eta * norm_G / (self.A_norm * self.r_norm),
ZeroDivisionError: float division by zero

We might need something like np.maximum(self.A_norm * self.r_norm, np.finfo(dtype).eps) or similar here.

@ogrisel (Member) commented Jun 16, 2023

Also note that "newton-cholesky" and "lbfgs" converge with n_iter_ = [0]. "newton-cg" converges with n_iter_ = [64] on this problem.

@ogrisel (Member) commented Jun 16, 2023

Actually, there is a problem with lbfgs on the above problem: it does not converge:

ABNORMAL_TERMINATION_IN_LNSRCH                              

 Line search cannot locate an adequate point after MAXLS
  function and gradient evaluations.
  Previous x, f and g restored.
 Possible causes: 1 error in function or gradient evaluation;
                  2 rounding error dominate computation.
/Users/ogrisel/code/scikit-learn/sklearn/linear_model/_logistic.py:459: ConvergenceWarning: lbfgs failed to converge (status=2):
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

and

>>> lr.coef_
array([[0.]])

while "newton-cholesky" does converge on this toy yet extreme problem.

I don't know if the failure of lbfgs here should be considered a bug or not but since lbfgs is our robust fallback, this might be a problem :) Maybe LBFGS should try a simple gradient step when the line search fails?

@ogrisel (Member) commented Jun 16, 2023

I found a way to trigger the LBFGS fallback for the multiclass case:

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([-1e24] * 1 + [1e24] * 2)
X = x.reshape(-1, 1)
y = np.asarray([0, 1, 2])

lr = LogisticRegression(solver="newton-lsmr", penalty=None, verbose=100).fit(X, y)

@lorentzenchr (Member, Author):

I opened #26707 to investigate the inner solver stopping criterion and ran a lot of benchmarks. There is no clear winner. I have to leave it as is: either it is good enough in its current shape or someone else needs to dig deeper.

My conclusion is that we have quite some room for improvement in the current solvers, see e.g. #24752. Also, "newton-cg" could likely be improved by doing what liblinear does, see Galli & Lin, "A Study on Truncated Newton Methods for Linear Classification" (https://www.csie.ntu.edu.tw/~cjlin/papers/tncg/tncg.pdf or https://doi.org/10.1109/TNNLS.2020.3045836).
Currently, I'm not 100% convinced by newton-lsmr, but it is such a nice solver for multiclass problems!

github-actions (bot) commented Jun 26, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 71d2733. Link to the linter CI: here

@ogrisel (Member) commented Jun 27, 2023

Currently, I'm not 100% convinced of the newton-lsmr, but it is such a nice solver for multiclass problems!

There are cases where it's indeed quite impressive based on the last benchmarks that are now collapsed in the discussion.

#25462 (comment)

But I agree that fixing #24752 would be helpful to get a clearer picture.

Also based on benchopt, it seems that SAG & SAGA are better reference solvers for 20 newsgroups, see e.g.:

https://benchopt.github.io/results/preprint_results_preprint_results_logreg_l2.html

I have no intuition on why this should be the case.

You'll have to switch the dataset in the menu on the left to see the results on 20 newsgroups.

UPDATE: actually the results on this dataset are completely different with and without scaling.

@jjerphan (Member) left a review comment

Some comments I wrote a few months ago (they might not all be relevant).

Comment on lines +746 to +749
if self.p.dtype == np.float32:
eps = 2 * np.finfo(np.float32).resolution
else:
eps = 2 * np.finfo(np.float64).resolution

Suggested change
if self.p.dtype == np.float32:
eps = 2 * np.finfo(np.float32).resolution
else:
eps = 2 * np.finfo(np.float64).resolution
eps = 2 * np.finfo(self.p.dtype).resolution

@lorentzenchr (Member, Author):

I'm not 100% sure that proba, and hence self.p, is always either float32 or float64.

Comment on lines +774 to +775
elif solver == "newton-lsmr":
clf.set_params(tol=1e-6)

This adaptation makes me think of a UX consideration: do we want the tolerance to be set by the choice of the solver?

@lorentzenchr (Member, Author):

What do you have in mind?

Successfully merging this pull request may close these issues: Add IRLS solver for linear models.