-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX KNNImputer missing indicator column addition when add_indicator=True #26600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX KNNImputer missing indicator column addition when add_indicator=True #26600
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @Shreesha3112 !
sklearn/impute/tests/test_knn.py
Outdated
|
||
@pytest.mark.parametrize("add_indicator", [True, False]) | ||
@pytest.mark.parametrize("missing_value_test", [np.NaN, 1]) | ||
def test_knn_imputation_adds_missing_indicator_column_if_add_indicator_is_true( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this property applies to all imputers, thus I think it's better to have a test in sklearn/impute/tests/test_common.py
:
@pytest.mark.parametrize("imputer", imputers(), ids=lambda x: x.__class__.__name__)
@pytest.mark.parametrize("missing_value_test", [np.nan, 1])
def test_imputation_adds_missing_indicator_if_add_indicator_is_true(
imputer, missing_value_test
):
"""Check that missing indicator always exists when add_indicator=True.
Non-regression test for gh-26590.
"""
X_train = np.array([[0, np.NaN], [1, 2]])
# Test data without missing values that can be set to np.NaN or 1.
X_test = np.array([[0, missing_value_test], [1, 2]])
imputer.set_params(add_indicator=True)
imputer.fit(X_train)
X_test_imputed_with_indicator = imputer.transform(X_test)
assert X_test_imputed_with_indicator.shape == (2, 3)
imputer.set_params(add_indicator=False)
imputer.fit(X_train)
X_test_imputed_without_indicator = imputer.transform(X_test)
assert X_test_imputed_without_indicator.shape == (2, 2)
assert_allclose(
X_test_imputed_with_indicator[:, :-1], X_test_imputed_without_indicator
)
if np.isnan(missing_value_test):
expected_missing_indicator = [1, 0]
else:
expected_missing_indicator = [0, 0]
assert_allclose(X_test_imputed_with_indicator[:, -1], expected_missing_indicator)
Given the original bug report, I am okay with checking add_in
8000
dicator=True
, which is also consistent with the name of the test "*_if_add_indicator_is_true".
doc/whats_new/v1.4.rst
Outdated
:mod:`sklearn.impute` | ||
..................... | ||
|
||
- |Fix| :class:`impute.KNNImputer` now correctly adds a missing indicator column in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this changelog entry be moved to doc/whats_new/1.3.rst
under 1.3.1
?
…ing-indicator-behavior
…ing-indicator-behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update! LGTM
Hoping this will be merged soon! Thank you for this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Looks good overall.
…rue (scikit-learn#26600) Co-authored-by: shreesha3112 <shreesha3112.com>
…rue (scikit-learn#26600) Co-authored-by: shreesha3112 <shreesha3112.com>
…rue (#26600) Co-authored-by: shreesha3112 <shreesha3112.com>
…rue (scikit-learn#26600) Co-authored-by: shreesha3112 <shreesha3112.com>
Reference Issues/PRs
Fixes #26590
What does this implement/fix? Explain your changes.
Previously, the
KNNImputer
did not add a missing indicator column intransform
method, when theadd_indicator
parameter was set to True and no missing values are present in the input. This could lead to inconsistent outputs between training and testing datasets for example.This bug fix ensures that
KNNImputer
correctly adds a missing indicator column in itstransform
output when theadd_indicator
is set toTrue
, even if no missing values are present in the input.Any other comments?