-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+2] RFC: Adds passthrough option to Pipeline #11674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+2] RFC: Adds passthrough option to Pipeline #11674
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there not an example that uses this in the gallery? It would also need updating
I ran the following to see if there are any references to grep -rl "Pipeline" examples | xargs grep "None" There does not seem to be an example that uses the None feature with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely happy with the creation of _iter
. The logic is now harder to read because of it, particularly because of the default include_final_estimator
behaviour.
But I'm not entirely upset about it either.
Something you need to check here and in the FeatureUnion PR: what currently happens with make_{pipeline,union} including None? Does it work? What does it get named?
sklearn/pipeline.py
Outdated
|
||
def _iter(self, include_final_estimator=False): | ||
""" | ||
Generate (name, trans) tuples excluding None and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to not mention None anymore, or it makes completing the deprecation harder.
I made When |
cb3a1d2
to
c83456b
Compare
Let's make the parameter with_final
Supporting passthrough or drop in make_* is a bit awkward...
|
It is a bit strange. Although, I can see some workflows where users are creating pipelines dynamically, and end up with |
Okay, so let's either forbid it or make sure 'passthrough' gets named
sensibly.
|
This PR was updated to just name the passthrough 'passthrough' when calling |
I'm now wondering if this change is truly worth the hassle to users.
|
I believe the point of this change was to unify the api of This PR together with #11640 would unify the API. |
yes, I get that... but I'm not sure that deprecating None support is
anything but a nuisance for users.
|
<
8000
/tr>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, added some minor comments.
Maybe also add a test with another string than 'passthrough' (to test that it properly errors)
sklearn/pipeline.py
Outdated
transformers = estimators[:-1] | ||
estimator = estimators[-1] | ||
transformers_steps = self.steps[:-1] | ||
estimator_name, estimator = self.steps[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change this? As it doesn't look like the names are actually used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we were still deprecating None
, this provided the name fo the transformer for the warning message. This is not needed anymore.
sklearn/pipeline.py
Outdated
@@ -167,10 +168,23 @@ def _validate_steps(self): | |||
" '%s' (type %s) doesn't" % (t, type(t))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add here something like: or be the string "passthrough"
sklearn/pipeline.py
Outdated
@@ -527,7 +533,11 @@ def _pairwise(self): | |||
def _name_estimators(estimators): | |||
"""Generate names for estimators.""" | |||
|
|||
names = [type(estimator).__name__.lower() for estimator in estimators] | |||
names = [ | |||
estimator.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the lower needed? (do we support somebody passing 'Passthrough' ?)
sklearn/tests/test_pipeline.py
Outdated
pipeline = make_pipeline(None) | ||
|
||
|
||
def test_set_pipeline_step_passthrough(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look in detail, but it might be relatively easy to parametrize this test to test both None and 'passthrough' ? (then you don't need two tests that duplicates things)
There's no tests that |
@amueller The tests are there defined at @pytest.mark.parametrize('passthrough', [None, 'passthrough'])
def test_set_pipeline_step_passthrough(passthrough):
... The parametrize allows it to test for both |
Ah, you're right. I was reading to quickly. Looks good. |
examples/compose/plot_compare_reduction.py needs now to be updated (after #12272) |
This PR keeps both options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not looked through this again in detail, but I think it's good.
doc/whats_new/v0.20.rst
Outdated
@@ -945,6 +945,9 @@ Support for Python 3.3 has been officially dropped. | |||
- |API| :class:`pipeline.FeatureUnion` now supports ``'drop'`` as a transformer | |||
to drop features. :issue:`11144` by :user:`thomasjpfan`. | |||
|
|||
- |API| :class:`pipeline.Pipeline` now supports using ``'passthrough'`` as a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move to 0.21.
Thanks again, @thomasjpfan! |
This reverts commit 5098474.
This reverts commit 5098474.
Reference Issues/PRs
Fixes #11144
What does this implement/fix? Explain your changes.
Adds
'passthrough'
topipeline.Pipeline
. I added aPipeline._iter
method to handle the filtering of transformers.