8000 [MRG+1] fix _BaseComposition._set_params with nested parameters by amueller · Pull Request #9945 · scikit-learn/scikit-learn · GitHub

Merged: 12 commits merged into scikit-learn:master on Oct 18, 2017

Conversation

amueller (Member)

Fixes #9944.

This seemed like the simplest fix.

It also means that the protection in _BaseComposition._set_params alone is not enough. We could make that method work by grouping the parameters in BaseEstimator.set_params by their prefixes and calling set_params only once per prefix. That seems like a slightly cleaner solution, but I'm not sure it's worth the effort.

This will go away in Python 3.6, as dict iteration is guaranteed to be ordered there.
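For reference, the ordering property this fix relies on can be sketched as follows (toy code, not the actual sklearn patch; the helper name is illustrative):

```python
# Toy sketch of the sorted-iteration fix: in lexical order, a dunder key
# such as 'a__b' (replacing a whole sub-estimator) sorts before
# 'a__b__alpha' (a parameter of that sub-estimator), because a prefix of
# a string always sorts before the string itself. Iterating over sorted
# keys therefore applies replacements before nested parameters.
def ordered_param_items(params):
    """Return (key, value) pairs with each prefix before its extensions."""
    return [(key, params[key]) for key in sorted(params)]

pairs = ordered_param_items({"a__b__alpha": 0.001, "a__b": "Lasso()"})
print([key for key, _ in pairs])  # ['a__b', 'a__b__alpha']
```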

jnothman (Member)

I pushed a more extensive fix to your branch. I hope you don't mind.

codecov bot commented Oct 18, 2017

Codecov Report

Merging #9945 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #9945      +/-   ##
==========================================
+ Coverage   96.17%   96.17%   +<.01%     
==========================================
  Files         336      336              
  Lines       62533    62540       +7     
==========================================
+ Hits        60138    60145       +7     
  Misses       2395     2395
Impacted Files Coverage Δ
sklearn/tests/test_pipeline.py 99.64% <100%> (ø) ⬆️
sklearn/feature_selection/base.py 94.82% <0%> (+0.06%) ⬆️


jnothman (Member) left a comment

@amueller, let me know if you find this use of groupby illegible. Using defaultdict may be neater.
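A rough sketch of what such a groupby-based grouping could look like (a hypothetical helper, not the code actually pushed; it assumes every key contains a `__` separator):

```python
# Group dunder parameters by their top-level prefix so that each
# sub-estimator's set_params is called exactly once, with all of its
# parameters. Sorting the items first makes same-prefix keys contiguous,
# which itertools.groupby requires.
from itertools import groupby

def group_by_prefix(params):
    items = sorted(params.items())
    return {
        prefix: {key.split("__", 1)[1]: value for key, value in grouped}
        for prefix, grouped in groupby(
            items, key=lambda kv: kv[0].split("__", 1)[0]
        )
    }

grouped = group_by_prefix({"a__b": "Lasso()", "a__b__alpha": 0.001, "c__depth": 3})
print(grouped)  # {'a': {'b': 'Lasso()', 'b__alpha': 0.001}, 'c': {'depth': 3}}
```

As the comment suggests, the same bucketing could be done with a collections.defaultdict instead of groupby; the sorted/groupby version avoids the intermediate mutable dict.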

jnothman force-pushed the sorted_param_setting branch from b2e5a7b to 5129522 (October 18, 2017 03:04)
jnothman force-pushed the sorted_param_setting branch from 5129522 to 1c07283 (October 18, 2017 03:05)
jnothman added this to the 0.19.1 milestone (Oct 18, 2017)
jnothman force-pushed the sorted_param_setting branch from 28a3235 to b7b4464 (October 18, 2017 03:11)
You can't determine even local parameters with only class name
lesteve (Member) left a comment

Not an expert on the Pipeline code, but this looks good to me, with a small comment.

I double-checked that the tests were failing on master with Python 3.6.

(inline comment on the test code in the diff:)

    ('b', DummyRegressor())
    ]))
    ])
    params = {

Should we use a collections.OrderedDict to make this test more future-proof?


Shrug. I think it distorts the purpose of the test somewhat if we only test with an OrderedDict.


I think this is a bit clearer (without some of the subtlety of Python version-specific dict ordering):

estimator.set_params(a__b__alpha=0.001, a__b=Lasso())
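A toy illustration (not sklearn code; the class and attribute names are made up) of why that single call must apply the replacement before the nested parameter, whatever order the keyword arguments arrive in:

```python
# Toy composite mimicking the protection in set_params: apply a
# whole-step replacement ('b') before nested parameters ('b__alpha'),
# independent of keyword-argument order.
class Step:
    def __init__(self, alpha=1.0):
        self.alpha = alpha

class Composite:
    def __init__(self, b):
        self.b = b

    def set_params(self, **params):
        if "b" in params:                   # replacement handled first
            self.b = params.pop("b")
        for key, value in params.items():   # then nested parameters
            setattr(self.b, key.split("__", 1)[1], value)
        return self

est = Composite(Step()).set_params(b__alpha=0.001, b=Step())
print(est.b.alpha)  # 0.001: the freshly assigned step still receives alpha
```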

@lesteve lesteve changed the title MRG iterate over parameters in sorted order in set_params [MRG+1] iterate over parameters in sorted order in set_params Oct 18, 2017
@lesteve lesteve changed the title [MRG+1] iterate over parameters in sorted order in set_params [MRG+1] fix _BaseComposition._set_params with nested parameters Oct 18, 2017
lesteve commented Oct 18, 2017

I renamed the PR title because it did not seem appropriate any more. Feel free to pick a better one.

lesteve commented Oct 18, 2017

Pushed a change that should fix the flake8 error.

amueller (Member, Author)

@jnothman you understood the problem, I guess, because you pushed the other fix? As I said, that fix might be a bit cleaner.

Just for the record, the problem is that these are two pipelines nested inside each other. The logic for handling this lives in _BaseComposition._set_params. The outer pipeline runs that logic, but then iterates over the parameters it delegates in an arbitrary order, so set_params on the inner pipeline is called multiple times, with one parameter each time. That loop in the outer pipeline therefore determines the sequence in which the parameters are set; the inner pipeline has no control.

@jnothman's fix (which was my second suggested fix) is to pass all parameters to the inner pipeline at once, so that it can handle the ordering itself, which lets the logic in _BaseComposition._set_params kick in.

My fix was to sort the parameters before setting them: in lexical ordering, any prefix of a string comes before the string itself.
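The difference between the two delegation strategies can be sketched like this (toy functions, not the sklearn source; `inner` stands for any object whose set_params knows how to order its own parameters):

```python
# Toy contrast between per-parameter delegation (the bug) and grouped
# delegation (the fix).
def delegate_one_at_a_time(inner, params):
    # Bug: one call per parameter, in whatever order the dict iterates;
    # the inner object's ordering protection never sees the full picture.
    for key, value in params.items():
        inner.set_params(**{key: value})

def delegate_all_at_once(inner, params):
    # Fix: a single call carrying every nested parameter, so the inner
    # object's own _set_params logic can order them itself.
    inner.set_params(**params)
```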

amueller commented Oct 18, 2017

LGTM. This behavior is more natural.

jnothman (Member)

Your fix is not sufficient if, say, the user sets steps=[('a', Lasso())], a__alpha=1
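That counterexample can be checked directly: under lexical sorting, the nested parameter sorts before the `steps` replacement, so it would be applied to the step that is about to be swapped out:

```python
# Lexical sorting alone is not enough: 'a__alpha' sorts before 'steps',
# so the nested parameter would be set on the old step and then silently
# discarded when 'steps' replaces it.
print(sorted(["steps", "a__alpha"]))  # ['a__alpha', 'steps']
```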

jnothman (Member)

I'd missed your original comment about this potential solution!

jnothman (Member)

I'm happy to merge when Travis says my new assertion is okay.

@jnothman jnothman merged commit 75763cf into scikit-learn:master Oct 18, 2017
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Oct 18, 2017
@marcus-voss

Hey, can you have a look at my issue here:
https://stackoverflow.com/questions/46915855/scikit-learn-set-param-for-custom-estimator-sets-nested-parameter-before-comp

This very similar case still fails.

@jnothman
jnothman commented Oct 25, 2017 via email

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 25, 2017
Issue where estimator is changed as well as its parameter: scikit-learn#9945 (comment)
@marcus-voss

Thanks for that very quick fix! Indeed, your fix also works for my original SO minimal example. To close things out there: would you want to add an answer referring to your fix? Then I could accept it and close the question there.

Successfully merging this pull request may close these issues.

Pipeline in Pipeline seems to not work well with setting of parameters using .set_params