[MRG] Add elastic net penalty to LogisticRegression by NicolasHug · Pull Request #11646 · scikit-learn/scikit-learn · GitHub

[MRG] Add elastic net penalty to LogisticRegression #11646


Merged · 51 commits into scikit-learn:master on Nov 22, 2018

Conversation

@NicolasHug (Member) commented Jul 20, 2018

Reference Issues/PRs

Closes #8288

What does this implement/fix? Explain your changes.

This PR allows the use of penalty='elastic-net' for the LogisticRegression class. Elastic net is already available in the saga solver, it's just not exposed yet.
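For illustration, a minimal usage sketch of the feature this PR exposes (hedged: the merged API spells the penalty 'elasticnet', as the later "elastic-net --> elasticnet" commit indicates, only the 'saga' solver supports it, and the dataset and hyperparameter values below are placeholders):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Elastic net mixes L1 (l1_ratio=1) and L2 (l1_ratio=0) regularization
# and is only available with the 'saga' solver.
clf = LogisticRegression(penalty='elasticnet', solver='saga',
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X, y)
print(clf.coef_)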

Any other comments?

I set the l1_ratio parameter to 0.5 by default, because that's the default in ElasticNet. Should I set it to None? This would make it possible to raise an error if it is set to something else while penalty != 'elastic-net'.
EDIT: I set it to None and now issue a warning if l1_ratio is set while penalty != 'elastic-net'.

TODO

  • expose penalty='elastic-net' in LogisticRegression
  • expose penalty='elastic-net' in LogisticRegressionCV
  • Update doc
  • write tests
  • Update examples

@amueller
Member

So you want to set it to None but have the behavior be 0.5? I don't have a strong feeling about it.

Otherwise looks like it's going in the right direction :)

It seems to be working for binary classification and for multiclass when
multi_class='ovr'. I'm having a hard time figuring out the intricacies
of multi_class='multinomial'.
@NicolasHug
Member Author

I just committed a first shot at LogisticRegressionCV.

It looks OK for binary classification and multiclass when multi_class='ovr', but does not work for multi_class='multinomial'.

I'm pretty sure that the problem comes from this line, which was fine when there was only one parameter to optimize (C), but now it messes up the format of the scores array. So the computation of best_index (the index of the best parameters), which depends on the scores array, is incorrect.

I tried various ways of dealing with this but every single change I tried would break other parts of the code.

I'll focus on the doc and the examples for now, but any help / insight would be helpful :)

@NicolasHug
Member Author

One of the doctests in 3.6 is failing simply because of a different formatting for floats in numpy arrays... Is there any clean way to ignore it? I guess changing the expected output string would break the tests for the other versions...

@jnothman
Member
jnothman commented Jul 31, 2018 via email

@NicolasHug NicolasHug changed the title [WIP] Add elastic net penalty to LogisticRegression [MRG] Add elastic net penalty to LogisticRegression Aug 2, 2018
@NicolasHug
Member Author

Marking this as MRG for a first review, looks like LogisticRegressionCV is finally working as expected \o/

@jnothman
Member
jnothman commented Aug 5, 2018 via email

with the 'saga' solver. :issue:`11646` by :user:`Nicolas Hug <NicolasHug>`.

- |Fix| Fixed a bug in the 'saga' solver where the weights would not be
correctly updated in some cases :issue:`11646` by `Tom Dupre la Tour`_.
Member Author


Thanks for the review @TomDLT, please feel free to modify this entry for a more accurate description. I don't understand the bug fix in detail.

Member


Add a . after "some cases"

Member
@TomDLT TomDLT left a comment


Thanks !

@@ -799,8 +813,8 @@ In a nutshell, the following table summarizes the penalties supported by each so
| Robust to unscaled datasets | yes | yes | yes | no | no |
+------------------------------+-----------------+-------------+-----------------+-----------+------------+

The "saga" solver is often the best choice but requires scaling. The "liblinear" solver is
used by default for historical reasons.
The "saga" solver is often the best choice but requires scaling. The
Member


Unrelated but we're moving the default to lbfgs and there is no mention of this in the summary. That's a different PR though.
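As a side note on the "requires scaling" remark in the quoted doc text above, a minimal sketch of what that looks like in practice (the StandardScaler pipeline and dataset here are illustrative assumptions, not part of this diff):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# 'saga' converges much faster once the features are standardized.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver='saga', penalty='elasticnet',
                       l1_ratio=0.5, max_iter=10000))
model.fit(X, y)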

"multinomial", an optimal C is obtained by minimizing the cross-entropy
loss.
multiclass case, if `multi_class` option is set to "ovr", an optimal C and
l1_ratio is obtained for each class and if the `multi_class` option is set to
Member


Wait, we're not using accuracy here or scoring? That seems odd. Is that wording really correct?

Member


LogisticRegressionCV selects best parameters according to its scoring parameter.
The default scoring option used is 'accuracy'.

We should clarify the doc.
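To make that concrete, a hedged sketch of how the selection criterion interacts with the new l1_ratios grid (parameter names follow the public LogisticRegressionCV API; the data, grids and scoring choice are placeholders):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# The best (C, l1_ratio) pair is picked according to `scoring`
# (default 'accuracy'); here we select on log loss instead.
lrcv = LogisticRegressionCV(Cs=np.logspace(-2, 2, 5), cv=5,
                            penalty='elasticnet', solver='saga',
                            l1_ratios=[0.1, 0.5, 0.9],
                            scoring='neg_log_loss', max_iter=10000,
                            random_state=0)
lrcv.fit(X, y)
print(lrcv.C_, lrcv.l1_ratio_)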

Member


@NicolasHug can you please fix this, even though it wasn't your error?

it can be changed using the cv parameter. In the case of newton-cg and
lbfgs solvers, we warm start along the path i.e guess the initial
coefficients of the present fit to be the coefficients got after
convergence in the previous fit, so it is supposed to be faster for
Member


The language here is not that great but not your fault...

Member
@jnothman jnothman left a comment


A nitpick...

with the 'saga' solver. :issue:`11646` by :user:`Nicolas Hug <NicolasHug>`.

- |Fix| Fixed a bug in the 'saga' solver where the weights would not be
correctly updated in some cases :issue:`11646` by `Tom Dupre la Tour`_.
Member


Add a . after "some cases"

Member
@amueller amueller left a comment


looks good apart from nitpicks.

@pytest.mark.parametrize('C', np.logspace(-3, 2, 4))
@pytest.mark.parametrize('l1_ratio', [.1, .5, .9])
def test_elastic_net_versus_sgd(C, l1_ratio):
# Compare elasticnet penatly in LogisticRegression() and SGD(loss='log')
Member


penalty

Member


not fixed

Member Author


Sorry yeah, will do it tomorrow!

cv=n_folds, l1_ratios=l1_ratios,
random_state=0)
lrcv.fit(X, y)
coefs_paths = np.asarray(list(lrcv.coefs_paths_.values()))
Member


if l1_ratios is not specified, do we squeeze? Otherwise this would be a backward-incompatible change, right?

Member


not fixed yet

Member Author


We don't need to squeeze because if l1_ratios is None then we just don't even add the new dimension (see here).

The existing test that checks the shapes of the attributes is left unchanged (test_logistic_cv).
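A small sketch of the backward-compatibility point discussed here, assuming the merged API (the dataset and grid sizes are arbitrary; only the difference in dimensionality matters):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Without l1_ratios, coefs_paths_ keeps its pre-existing dimensionality...
lrcv = LogisticRegressionCV(Cs=4, cv=3, solver='saga', max_iter=5000,
                            random_state=0).fit(X, y)
print(np.asarray(list(lrcv.coefs_paths_.values())).shape)

# ... while passing l1_ratios adds one extra axis for the l1_ratio grid.
lrcv_en = LogisticRegressionCV(Cs=4, cv=3, penalty='elasticnet',
                               solver='saga', l1_ratios=[0.3, 0.7],
                               max_iter=5000, random_state=0).fit(X, y)
print(np.asarray(list(lrcv_en.coefs_paths_.values())).shape)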

# make sure LogisticRegressionCV gives same best params (l1 and C) as
# GridSearchCV when penalty is elasticnet

if multi_class == 'ovr':
Member


so binary?

Member Author


Yes, the multiclass ovr case is covered below in test_LogisticRegressionCV_GridSearchCV_elastic_net_ovr; just added a comment.
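A rough sketch of the equivalence this test checks, in the binary case (dataset, grids and CV settings are placeholders; the real test pins everything down so both searches see identical folds):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
Cs = np.logspace(-2, 2, 4)
l1_ratios = [0.25, 0.75]

lrcv = LogisticRegressionCV(Cs=Cs, l1_ratios=l1_ratios, penalty='elasticnet',
                            solver='saga', cv=5, max_iter=10000,
                            random_state=0).fit(X, y)

gs = GridSearchCV(LogisticRegression(penalty='elasticnet', solver='saga',
                                     max_iter=10000, random_state=0),
                  {'C': Cs, 'l1_ratio': l1_ratios}, cv=5).fit(X, y)

# Both searches should land on the same (C, l1_ratio) pair.
print(lrcv.C_[0], lrcv.l1_ratio_[0])
print(gs.best_params_['C'], gs.best_params_['l1_ratio'])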

# _log_reg_scoring_path will output different shapes depending on the
# multi_class param, so we need to reshape the outputs accordingly.
# After reshaping,
# - scores is of shape (n_classes X n_folds X n_Cs . n_l1_ratios)
Member


What's the shape of Cs and why do we always have to take C[0]?

Member Author


Cs is of shape (n_classes . n_folds . n_l1_ratios, n_Cs) and all the rows are equal, so we just take the first one.

I added a comment.

)
# equiv to coefs_paths = np.moveaxis(coefs_paths, (0, 1, 2, 3),
# (1, 2, 0, 3))
coefs_paths = np.swapaxes(coefs_paths, 0, 1)
Member


you find two swapaxes easier to read than one rollaxis? I don't actually have an opinion as the comment above makes it pretty clear.

Member Author


I did at the time when I was trying to break it down but not anymore ^^
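For reference, a quick sketch of the axis-permutation equivalence being discussed, on a dummy array (np.moveaxis only exists in numpy >= 1.11, which is the compatibility issue that surfaces below):

import numpy as np

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# moveaxis(a, (0, 1, 2, 3), (1, 2, 0, 3)) brings axis 2 to the front and
# shifts axes 0 and 1 one place to the right; the same permutation can be
# written with older primitives available in every numpy version.
b_move = np.moveaxis(a, (0, 1, 2, 3), (1, 2, 0, 3))
b_roll = np.rollaxis(a, 2)                          # single rollaxis
b_swap = np.swapaxes(np.swapaxes(a, 0, 1), 0, 2)    # two swapaxes

assert b_move.shape == b_roll.shape == b_swap.shape == (4, 2, 3, 5)
assert np.array_equal(b_move, b_roll) and np.array_equal(b_move, b_swap)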

@amueller
Member

test failing...

@NicolasHug
Member Author

Aaah... moveaxis isn't supported in older numpy versions, which is actually why I needed the weird swapaxes.

Sorry

@amueller
Member

thanks!

@amueller amueller merged commit c1f5874 into scikit-learn:master Nov 22, 2018
@GaelVaroquaux
Member
GaelVaroquaux commented Nov 22, 2018 via email

@TomDLT
Member
TomDLT commented Nov 22, 2018

Great ! 🎊

@jnothman
Member
jnothman commented Nov 25, 2018 via email

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
* First draft on elasticnet penaly for LogisticRegression

* Some basic tests

* Doc update

* First draft for LogisticRegressionCV.

It seems to be working for binary classification and for multiclass when
multi_class='ovr'. I'm having a hard time figuring out the intricacies
of multi_class='multinomial'.

* Changed default to None for l1_ratio.

added warning message is user sets l1_ratio while penalty is not
elastic-net

* Some more doc

* Updated example to plot elastic net sparsity

* Fixed flake8

* Fixed test by not modifying attribute in fit

* Fixed doc issues

* WIP

* Partially fixed logistic_reg_CV for multinomial.

Also added some comments that are hopefully clear.
Still need to fix refit=False

* Fixed doc issue

* WIP

* Fixed test for refit=False in LogisticRegressionCV

* Fixed Python 2 numpy version issue

* minor doc updates

* Weird doc error...

* Added test to ensure that elastic net is at least as good as L1 or L2
once l1_ratio has been optimized with grid search

Also addressed minor reviews

* Fixed test

* addressed comments

* Added back ignore warning on tests

* Added a functional test

* Scale data in test... Now failing

* elastic-net --> elasticnet

* Updated doc for some attributes and checked their shape in tests

* Added l1_ratio dimension to coefs_paths and scores attr

* improve example + fix test

* FIX incorrect lagged_update in SAGA

* Add non-regression test for SAGA's bug

* FIX flake8 and warning

* Re fixed warning

* Updated some tests

* Addressed comments

* more comments and added dimension to LogisticRegressionCV.n_iter_ attribute

* Updated whatsnew for 0.21

* better doc shape looks

* Fixed whatnew entry after merges

* Added dot

* Addressed comments + standardized optional default param docstrings

* Addessed comments

* use swapaxes instead of unsupported moveaxis (hopefully fixes tests)
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019