-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
ColumnTransformer input feature name and count validation #14544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
now you only need to adjust the tests ;) |
The failing test is @NicolasHug do you agree with this version? |
My general thought is that we should have gone for a stricter version of #14237 and just error for any discrepancy match, as a bug fix. But OK for doing a deprecation. I'm OK with this but we still need to error when |
I agree with all of that @NicolasHug. I lean a little more towards a deprecation. |
I added a test for the case where the indexing is negative, I'd appreciate a close review on that part. |
@amueller you fine with the changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments, other than that LGTM
X_array_fewer = np.array([[0, 1, 2], ]).T | ||
err_msg = 'Number of features' | ||
with pytest.raises(ValueError, match=err_msg): | ||
ct.transform(X_array_fewer) | ||
with pytest.warns(DeprecationWarning, match=msg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why warn when passing less features? I thought the only scenario we still accept is passing more features, not less?
This will be confusing for users since the message says it should not error until 0.24
with pytest.raises(ValueError, match=err_msg): | ||
tf.transform(X_trans_df) | ||
with pytest.warns(DeprecationWarning, match=warn_msg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, why warn here since we error as well?
|
||
tf = ColumnTransformer([('bycol', Trans(), [0, -1])]) | ||
tf.fit(df_extra) | ||
msg = "At least one negative column was used to" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also check that no warning is raised when n_features matches with negative indexing? Just in case.
@NicolasHug done :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Adrin, LGTM if all goes green
Fixes #14251.
This raises a deprecation warning if the
X
at the time of fit and transform don't match in terms of feature names or count. It implements option (1) of #14251 (comment)