8000 [MRG+2] Fixed assumption fit attribute means object is estimator. by drkatnz · Pull Request #8418 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content
8000

[MRG+2] Fixed assumption fit attribute means object is estimator. #8418

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 21, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
8000
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions sklearn/utils/tests/test_validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
from sklearn.utils.testing import assert_warns_message
from sklearn.utils.testing import assert_warns
from sklearn.utils.testing import ignore_warnings
from sklearn.utils.testing import SkipTest
from sklearn.utils import as_float_array, check_array, check_symmetric
from sklearn.utils import check_X_y
from sklearn.utils.mocking import MockDataFrame
Expand Down Expand Up @@ -501,3 +502,15 @@ def test_check_consistent_length():
assert_raises_regexp(TypeError, 'estimator', check_consistent_length,
[1, 2], RandomForestRegressor())
# XXX: We should have a test with a string, but what is correct behaviour?


def test_check_dataframe_fit_attribute():
# check pandas dataframe with 'fit' column does not raise error
# https://github.com/scikit-learn/scikit-learn/issues/8415
try:
import pandas as pd
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
X_df = pd.DataFrame(X, columns=['a', 'b', 'fit'])
check_consistent_length(X_df)
except ImportError:
raise SkipTest("Pandas not found")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put your test in a separate test function please? Any lines after your test will not be run if pandas is not installed which is a bit dodgy.

Also add a link to the issue to add some context.

To be honest maybe the simplest thing to do is to have a test that does not require pandas, i.e. something like:

X = np.ones(5)
X.fit = 'an non-callable attribute'
check_consistent_length(X)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/tests/test_utils.py#L185

for inspiration when creating the test - seemed using pandas was ok. I would assume the best thing to do to prove the fix worked would be to test using the same framework that the problem was found in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way you did it is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, there are caveats for example I believe we currently we only test with pandas on Linux with Python 3. This means that potentially a bug could sneak in without being noticed by the CIs ...

2 changes: 1 addition & 1 deletion sklearn/utils/validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ def _is_arraylike(x):

def _num_samples(x):
"""Return number of samples in array-like x."""
if hasattr(x, 'fit'):
if hasattr(x, 'fit') and callable(x.fit):
# Don't get num_samples from an ensembles length!
raise TypeError('Expected sequence or array-like, got '
'estimator %s' % x)
Expand Down
0