8000 change ColumnTransformer input order by adrinjalali · Pull Request #12396 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

change ColumnTransformer input order #12396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 41 commits into from

Conversation

adrinjalali
Copy link
Member

Fixes #12339

Changes the order of inputs, adds a deprecation warning, adds a test.

@adrinjalali adrinjalali changed the title change make_column_transformer input order [WIP] change make_column_transformer input order Oct 16, 2018
@adrinjalali adrinjalali changed the title [WIP] change make_column_transformer input order change make_column_transformer input order Oct 16, 2018
@amueller
Copy link
Member

I hope @jorisvandenbossche isn't too disappointed :-/

trans_ix, cols_ix = 0, 1
if (hasattr(estimators[0][1], 'fit') or
(estimators[0][1] in ('drop', 'passthrough')
and estimators[0][0] not in ('drop', 'passthrough'))):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you're not catching the (obscure) case make_column_transformer(("passthrough", "passthrough"), (columns, transrformer))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I guess this is now clear.

@amueller
Copy link
Member

looks good. Can we test that only a single warning is raised? though that might be a bit overkill.

@jnothman
Copy link
Member

IMO deprecations in 0.20.1 should end for 0.23.

@jnothman
Copy link
Member

However, for ColumnTransformer, being experimental, we could make it sooner.

@adrinjalali
Copy link
Member Author

But I guess we still don't know if we want this version or the other way around, do we?

@adrinjalali adrinjalali changed the title change make_column_transformer input order change ColumnTransformer input order Oct 22, 2018
@amueller amueller mentioned this pull request Nov 14, 2018
@amueller
Copy link
Member

@jnothman it "transforms the columns" ;)

@jorisvandenbossche
Copy link
Member

I had a very brief chat with @GaelVaroquaux, and I would summarize it as "I am fine with breaking the edge case of a empty list of transformers, and also fine to switch to change the order in make_column_transformers if that turns out to be more feasible, but still don't like we need to do any break / deprecation in a .x release"
(@GaelVaroquaux hope that is more or less correct :), but certainly comment here, as I also don't know to what extent you followed all latest developments above)

Personally, I would still be in favor of one of the previous iterations in this PR: warning in ColumnTransformer on changed order, but breaking the access in .transformers / .transformers_ (so whathever the input, you can assure than in 0.20.1 those attributes are always consistent. I think that is the easiest for library authors).
We did put "EXPERIMENTAL: some behaviors may change between releases without deprecation." in the docstring (but of course, @GaelVaroquaux is correct that we did not put that note in the release notes or docstring of make_column_transformer, and who reads docstrings anyway ..)

But, I also certainly won't block changing the order in make_column_transformers if that is preferred by most.

@jnothman
Copy link
Member
jnothman commented Nov 15, 2018 via email

@adrinjalali
Copy link
Member Author

@jnothman , and how do you propose the deprecation cycle of the "old" order be? Do we just stop supporting it in 0.21 w/o a warning? Or do we start having a warning in 0.21 and remove support for the deprecated order in ~0.23?

@amueller
Copy link
Member

@jorisvandenbossche can you maybe say again why you prefer this complex technical solution instead of changing the order? To me the order is completely arbitrary and there's clear technical reasons to do it the other way.

@jorisvandenbossche
Copy link
Member

can you maybe say again why you prefer this complex technical solution instead of changing the order?

To be clear, my preference that I noted above #12396 (comment) is not the complex 'solution' that is the current state of this PR, but the one of the previous states where we just checked the passed tuples and warned, and didn't try to keep self.transformers/self.transformers_ backwards compatible (the point when we started to discuss the modification of self.transformers: #12396 (comment)). So this would technically not be more complex than changing the order in make_column_transformers (as for both, we need the code to try both orders to infer which one was passed).
But anyway, I don't think I am going to convince you all :)

About the actual order change in make_column_transformer vs ColumnTransformer. As I said on the issue, in make_column_transformer the order of (columns, transformer) "feels" more naturally to me, it seems more logical to read and write (I like eg the way it is written here: #12339 (comment)). This was the original reason I wanted it that way in make_column_transformer (but I should have done it consistently in ColumnTransformer then, my own fault :-)).
But this is clearly subjective (and also depends on your code organisation), as clearly proven by that others don't feel that way. So as a user you simply need to remember / get used by doing it one or the other way (or look it up), and it probably doesn't matter that much which one we choose, as long as it is consistent.
So as I said, I am OK with changing the order in make_column_transformer.

[@jnothman:] I would be happy to support both syntaxes without warning [in 0.20.1]

That might actually be a nice way to also alleviate the concerns of @GaelVaroquaux that there is no new deprecation warning introduced in a bugfix release.
The main drawback is that then more people will start using the "old" way until they see a deprecation warning in 0.21.0.

@amueller
Copy link
Member

I'd rather introduce the warning in 0.20.1. We introduces some other warnings where we mucked up the deprecation in SGDClassifier and the trees.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Nov 16, 2018 via email

@adrinjalali
Copy link
Member Author

If that's the case, then we don't need an extra parameter to ColumnTransformer, and the PR as it stands satisfies all the requirements, except that it has that single ugly parameter to make_column_transformer (i.e. force_deprecated).

We can remove that failing test from the tests for the deprecated order, and the rest is left to the [minor] changes through reviews.

@amueller that means the deciding factor is that parameter (force_deprecated). If we really don't want it, then I'll [soon] create the solution to change the order in make_column_transformer.

I personally [almost] don't have any attachment to either solutions. But from a third person's perspective, it seems a bit silly to introduce a parameter with a long deprecation cycle in a module which is still experimental, hence changing the make_column_transformer order makes more sense; at the same time, we're not really treating the module as experimental, so, I'm torn.

@jorisvandenbossche I guess we can't go with the version of the PR you like since it does count as a major breakage.

@amueller
Copy link
Member

@adrinjalali can you explain why your explanation makes you torn? I think we mostly decided not to treat is as experimental.

@amueller
Copy link
Member

If that's the case, then we don't need an extra parameter to ColumnTransformer,

Meaning "if we're fine with edge-case breakage", right.
I mean it's not too bad, but if it's avoidable then I'd rather avoid it ;)

@adrinjalali
Copy link
Member Author

@adrinjalali can you explain why your explanation makes you torn? I think we mostly decided not to treat is as experimental.

If I think of it as not experimental, I'm totally fine with introducing a parameter with a long deprecation cycle.

Meaning "if we're fine with edge-case breakage", right.
I mean it's not too bad, but if it's avoidable then I'd rather avoid it ;)

But taking into account that the alternative (not breaking in the edge cases) means introducing a second force_deprecated to ColumnTransformer as well, I'd rather break in the edge case and avoid having that parameter. And I guess at least on considering the force_deprecated parameter a pretty ugly solution, we're on the same page.

@amueller
Copy link
Member

So we really need to come to some consensus here, this is the only remaining blocker for the release afaik.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Nov 17, 2018 via email

@adrinjalali
Copy link
Member Author

Option1:
(name, columns, transformer) and (columns, transformer)

  • change the order in ColumnTransformer, and the internal data structures (depending on the input order), and support both for now, and break the old codes on edge cases (i.e. when the tuple list is empty). Both orders are supported until 0.22.
  • make_column_transformer by default constructs ColumnTransformers using deprecated tuple order, and has a parameter which user can use to set which tuple order is used to construct ColumnTransformers. It also raises a warning if the parameter is not set. The parameter will be deprecated and ignored in v0.22, and removed in 0.24(or v0.25).

Option 2:
(name, transformer, columns) and (transformer, columns)

  • change the order in make_column_transformer, and support both orders until 0.22.
  • No change in ColumnTransformer.

Did I miss anything?

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Nov 17, 2018 via email

@adrinjalali
Copy link
Member Author

My personal vote is option 2, for the sole reason that I rather not have the force_deprecated parameter in make_column_transformer.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Nov 17, 2018 via email

@jnothman
Copy link
Member
jnothman commented Nov 18, 2018 via email

@amueller
Copy link
Member

I agree.

@amueller
Copy link
Member
10000

@adrinjalali do you think you have time to do this soon? Otherwise I could take over. I'd really like to release 0.20.1.

@adrinjalali
Copy link
Member Author
adrinjalali commented Nov 19, 2018 via email

@amueller
Copy link
Member

Awesome, sorry for hustling you!

@jorisvandenbossche
Copy link
Member

@adrinjalali yes, thanks a lot for your effort keeping up with the discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0