-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
change ColumnTransformer input order #12396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I hope @jorisvandenbossche isn't too disappointed :-/ |
trans_ix, cols_ix = 0, 1 | ||
if (hasattr(estimators[0][1], 'fit') or | ||
(estimators[0][1] in ('drop', 'passthrough') | ||
and estimators[0][0] not in ('drop', 'passthrough'))): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you're not catching the (obscure) case make_column_transformer(("passthrough", "passthrough"), (columns, transrformer))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I guess this is now clear.
looks good. Can we test that only a single warning is raised? though that might be a bit overkill. |
IMO deprecations in 0.20.1 should end for 0.23. |
However, for ColumnTransformer, being experimental, we could make it sooner. |
But I guess we still don't know if we want this version or the other way around, do we? |
@jnothman it "transforms the columns" ;) |
I had a very brief chat with @GaelVaroquaux, and I would summarize it as "I am fine with breaking the edge case of a empty list of transformers, and also fine to switch to change the order in Personally, I would still be in favor of one of the previous iterations in this PR: warning in ColumnTransformer on changed order, but breaking the access in But, I also certainly won't block changing the order in |
It seems like changing make_colu,mnt_transformer is the simpler solution.
In 0.20.x is not ideal, but it avoids user and tutorial code changing
later. I would be happy to support both syntaxes without warning.
|
@jnothman , and how do you propose the deprecation cycle of the "old" order be? Do we just stop supporting it in 0.21 w/o a warning? Or do we start having a warning in 0.21 and remove support for the deprecated order in ~0.23? |
@jorisvandenbossche can you maybe say again why you prefer this complex technical solution instead of changing the order? To me the order is completely arbitrary and there's clear technical reasons to do it the other way. |
To be clear, my preference that I noted above #12396 (comment) is not the complex 'solution' that is the current state of this PR, but the one of the previous states where we just checked the passed tuples and warned, and didn't try to keep About the actual order change in
That might actually be a nice way to also alleviate the concerns of @GaelVaroquaux that there is no new deprecation warning introduced in a bugfix release. |
I'd rather introduce the warning in 0.20.1. We introduces some other warnings where we mucked up the deprecation in SGDClassifier and the trees. |
That might actually be a nice way to also alleviate the concerns of
@GaelVaroquaux that there is no new deprecation warning introduced in a
bugfix release.
I don't mind introducing a deprecation warning in a bugfix release.
I mind major breakages of compatibility (ie, I'm open to not covering
edge cases).
|
If that's the case, then we don't need an extra parameter to We can remove that failing test from the tests for the deprecated order, and the rest is left to the [minor] changes through reviews. @amueller that means the deciding factor is that parameter ( I personally [almost] don't have any attachment to either solutions. But from a third person's perspective, it seems a bit silly to introduce a parameter with a long deprecation cycle in a module which is still experimental, hence changing the @jorisvandenbossche I guess we can't go with the version of the PR you like since it does count as a major breakage. |
@adrinjalali can you explain why your explanation makes you torn? I think we mostly decided not to treat is as experimental. |
Meaning "if we're fine with edge-case breakage", right. |
If I think of it as not experimental, I'm totally fine with introducing a parameter with a long deprecation cycle.
But taking into account that the alternative (not breaking in the edge cases) means introducing a second |
So we really need to come to some consensus here, this is the only remaining blocker for the release afaik. |
The problem for me to give an opinion is that I don't really know what the options are: the thread has iterated a few times and the picture is very blurry.
Sent from my phone. Please forgive typos and briefness.
…On Nov 17, 2018, 18:43, at 18:43, Andreas Mueller ***@***.***> wrote:
So we really need to come to some consensus here, this is the only
remaining blocker for the release afaik.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#12396 (comment)
|
Option1:
Option 2:
Did I miss anything? |
Thanks for this very useful summary. I am not going to be helpful: I am totally fine with both.
Sent from my phone. Please forgive typos and briefness.
…On Nov 17, 2018, 18:55, at 18:55, Adrin Jalali ***@***.***> wrote:
Option1:
`(name, columns, transformer)` and `(columns, transformer)`
- change the order in `ColumnTransformer`, and the internal data
structures (depending on the input order), and support both for now,
and break the old codes on edge cases (i.e. when the tuple list is
empty). Both orders are supported until 0.22.
- `make_column_transformer` by default constructs `ColumnTransformer`s
using deprecated tuple order, and has a parameter which user can use to
set which tuple order is used to construct `ColumnTransformer`s. It
also raises a warning if the parameter is not set. The parameter will
be deprecated and ignored in v0.22, and removed in 0.24(or v0.25).
Option 2:
`(name, transformer, columns)` and `(transformer, columns)`
- change the order in `make_column_transformer`, and support both
orders until 0.22.
- No change in `ColumnTransformer`.
Did I miss anything?
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#12396 (comment)
|
My personal vote is option 2, for the sole reason that I rather not have the |
I also have a slight preference for option 2, as I find the syntax more readable.
Sent from my phone. Please forgive typos and briefness.
…On Nov 17, 2018, 20:33, at 20:33, Adrin Jalali ***@***.***> wrote:
My personal vote is option 2, for the sole reason that I rather not
have the `force_deprecated` parameter in `make_column_transformer`.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#12396 (comment)
|
Let's not confuse users with an extra parameter. Let's take option 2.
|
I agree. |
@adrinjalali do you think you have time to do this soon? Otherwise I could take over. I'd really like to release 0.20.1. |
Will do, sorry didn't have time, but should be done tomorrow.
…On Mon, Nov 19, 2018, 20:21 Andreas Mueller, ***@***.***> wrote:
@adrinjalali <https://github.com/adrinjalali> do you think you have time
to do this soon? Otherwise I could take over. I'd really like to release
0.20.1.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12396 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABljeBvcfSbZJWa_BAIBUWG4nctS97HJks5uwwS-gaJpZM4XeeZc>
.
|
Awesome, sorry for hustling you! |
@adrinjalali yes, thanks a lot for your effort keeping up with the discussion! |
Fixes #12339
Changes the order of inputs, adds a deprecation warning, adds a test.