8000 [MRG+2] RFC: Adds passthrough option to Pipeline by thomasjpfan · Pull Request #11674 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+2] RFC: Adds passthrough option to Pipeline #11674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Nov 5, 2018

Conversation

thomasjpfan
Copy link
Member
@thomasjpfan thomasjpfan commented Jul 25, 2018

Reference Issues/PRs

Fixes #11144

What does this implement/fix? Explain your changes.

Adds 'passthrough' to pipeline.Pipeline. I added a Pipeline._iter method to handle the filtering of transformers.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there not an example that uses this in the gallery? It would also need updating

@thomasjpfan
Copy link
Member Author

I ran the following to see if there are any references to None together with Pipeline in the examples:

grep -rl "Pipeline" examples | xargs grep "None"

There does not seem to be an example that uses the None feature with Pipeline.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely happy with the creation of _iter. The logic is now harder to read because of it, particularly because of the default include_final_estimator behaviour.

But I'm not entirely upset about it either.

Something you need to check here and in the FeatureUnion PR: what currently happens with make_{pipeline,union} including None? Does it work? What does it get named?


def _iter(self, include_final_estimator=False):
"""
Generate (name, trans) tuples excluding None and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to not mention None anymore, or it makes completing the deprecation harder.

@thomasjpfan
Copy link
Member Author

_iter was created because of two reasons:

  1. Iteration and filtering out 'passthrough' (or the deprecated None) is common in Pipeline
  2. When None gets deprecated, removing the None check in _iter would be most of the work needed.

I made include_final_estimator=False the default, because it was the most common use case. I updated the PR to change the default to include_final_estimator=True.

When make_{pipeline,union}(None) gets call, it creates an estimator named 'nonetype' with None as its estimator. This naming is done in pipeline._name_estimators.

@thomasjpfan thomasjpfan force-pushed the pipeline_drop_passthrough branch from cb3a1d2 to c83456b Compare July 26, 2018 15:26
@jnothman
Copy link
Member
jnothman commented Jul 27, 2018 via email

@thomasjpfan
Copy link
Member Author

It is a bit strange. Although, I can see some workflows where users are creating pipelines dynamically, and end up with make_pipeline(None) or make_pipeline('passthrough').

@thomasjpfan thomasjpfan changed the title RFC: Uses passthrough instead of None in Pipeline [MRG] RFC: Uses passthrough instead of None in Pipeline Jul 27, 2018
@jnothman
Copy link
Member
jnothman commented Jul 29, 2018 via email

@thomasjpfan
Copy link
Member Author

This PR was updated to just name the passthrough 'passthrough' when calling make_pipeline('passthrough'). The behavior for make_pipeline(None) is left unchanged.

@jnothman
Copy link
Member
jnothman commented Aug 18, 2018 via email

@thomasjpfan
Copy link
Member Author

I believe the point of this change was to unify the api of ColumnTransformer, Pipeline, and FeatureUnion. Currently, Pipeline uses None for passthrough , and FeatureUnion uses None for dropping.

This PR together with #11640 would unify the API.

@jnothman
Copy link
Member
jnothman commented Aug 19, 2018 via email
< 8000 /tr>

Copy link
Member
@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, added some minor comments.

Maybe also add a test with another string than 'passthrough' (to test that it properly errors)

transformers = estimators[:-1]
estimator = estimators[-1]
transformers_steps = self.steps[:-1]
estimator_name, estimator = self.steps[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this? As it doesn't look like the names are actually used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we were still deprecating None, this provided the name fo the transformer for the warning message. This is not needed anymore.

@@ -167,10 +168,23 @@ def _validate_steps(self):
" '%s' (type %s) doesn't" % (t, type(t)))
Copy link
Member
< 6D40 div hidden="hidden" data-view-component="true" class="js-comment-show-on-error flash flash-error flash-full">

There was a problem hiding this comment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add here something like: or be the string "passthrough"

@@ -527,7 +533,11 @@ def _pairwise(self):
def _name_estimators(estimators):
"""Generate names for estimators."""

names = [type(estimator).__name__.lower() for estimator in estimators]
names = [
estimator.lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the lower needed? (do we support somebody passing 'Passthrough' ?)

pipeline = make_pipeline(None)


def test_set_pipeline_step_passthrough():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't look in detail, but it might be relatively easy to parametrize this test to test both None and 'passthrough' ? (then you don't need two tests that duplicates things)

@thomasjpfan thomasjpfan changed the title [MRG] RFC: Uses passthrough instead of None in Pipeline [MRG] RFC: Adds passthrough option to Pipeline Aug 23, 2018
@amueller
Copy link
Member
amueller commented Oct 4, 2018

There's no tests that None still works, right? (there should be)

@thomasjpfan
Copy link
Member Author

@amueller The tests are there defined at test_set_pipeline_step_passthrough:

@pytest.mark.parametrize('passthrough', [None, 'passthrough'])
def test_set_pipeline_step_passthrough(passthrough):
    ...

The parametrize allows it to test for both None and 'passthrough'.

@amueller
Copy link
Member
amueller commented Oct 4, 2018

Ah, you're right. I was reading to quickly. Looks good.

@ngoix
Copy link
Contributor
ngoix commented Oct 9, 2018

examples/compose/plot_compare_reduction.py needs now to be updated (after #12272)

@thomasjpfan
Copy link
Member Author

This PR keeps both options, None and 'passthrough', available. I still updated the examples/compose/plot_compare_reduction.py just to nudge new users to use the 'passthrough' option.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not looked through this again in detail, but I think it's good.

@@ -945,6 +945,9 @@ Support for Python 3.3 has been officially dropped.
- |API| :class:`pipeline.FeatureUnion` now supports ``'drop'`` as a transformer
to drop features. :issue:`11144` by :user:`thomasjpfan`.

- |API| :class:`pipeline.Pipeline` now supports using ``'passthrough'`` as a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move to 0.21.

@thomasjpfan thomasjpfan changed the title [MRG] RFC: Adds passthrough option to Pipeline [MRG+2] RFC: Adds passthrough option to Pipeline Nov 2, 2018
@jnothman jnothman merged commit 9f842e3 into scikit-learn:master Nov 5, 2018
@jnothman
Copy link
Member
jnothman commented Nov 5, 2018

Thanks again, @thomasjpfan!

thoo pushed a commit to thoo/scikit-learn that referenced this pull request Nov 14, 2018
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC Allow passthrough as pipeline step instead of None
5 participants
0