[MRG+2] Fixed assumption fit attribute means object is estimator. #8418
Conversation
Please add a non-regression test
sklearn/utils/validation.py
@@ -93,8 +93,9 @@ def _is_arraylike(x):
def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
        # Don't get num_samples from an ensembles length!
        raise TypeError('Expected sequence or array-like, got '
        if hasattr(x.fit, '__call__'):
Please use and here, i.e. combine the two hasattr checks into a single condition.
Codecov Report
@@ Coverage Diff @@
## master #8418 +/- ##
==========================================
- Coverage 94.75% 94.75% -0.01%
==========================================
Files 342 342
Lines 60892 60902 +10
==========================================
+ Hits 57701 57708 +7
- Misses 3191 3194 +3
Continue to review full report at Codecov.
@jnothman I've never written a non-regression test before - can you double check this for me please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you make that test PEP8-compliant, it looks good to me.
sklearn/utils/validation.py
@@ -92,7 +92,7 @@ def _is_arraylike(x):

def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
    if hasattr(x, 'fit') and hasattr(x.fit, '__call__'):
Please use the builtin callable, which basically does this.
callable has been deprecated in some versions of Python (3.0 and 3.1, I think), so I thought this would be more backwards compatible.
Please change WIP to MRG when you don't expect to do more work before review.
I'd forgotten about that. Happy to use
I don't believe it will affect many people either. Using callable is, I think, better practice than isinstance; we should stick with that.
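For context, a minimal sketch of the guard this discussion settles on, using the builtin callable (the exact indentation and error message in the merged change may differ):

    def _num_samples(x):
        """Return number of samples in array-like x."""
        # Only treat x as an estimator if its 'fit' attribute is actually
        # callable; a DataFrame with a column named 'fit' exposes a
        # non-callable attribute and should be treated as data.
        if hasattr(x, 'fit') and callable(x.fit):
            # Don't get num_samples from an ensembles length!
            raise TypeError('Expected sequence or array-like, got '
                            'estimator %s' % x)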
@drkatnz I edited your description to use "Fix #issueNumber"; this way the associated issue gets closed when the PR is merged.
        X_df = pd.DataFrame(X, columns=['a', 'b', 'fit'])
        check_consistent_length(X_df)
    except ImportError:
        raise SkipTest("Pandas not found")
Can you put your test in a separate test function please? Any lines after your test will not be run if pandas is not installed, which is a bit dodgy.
Also add a link to the issue to add some context.
To be honest maybe the simplest thing to do is to have a test that does not require pandas, i.e. something like:
    X = np.ones(5)
    X.fit = 'a non-callable attribute'
    check_consistent_length(X)
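A sketch of how that pandas-free idea could be wrapped into a standalone test (illustrative only; a plain ndarray does not accept new attributes, so a tiny stand-in object with a non-callable fit attribute is used instead, and the test name is made up):

    from sklearn.utils.validation import check_consistent_length

    def test_non_callable_fit_attribute():
        # An object carrying a non-callable 'fit' attribute but otherwise
        # array-like; _num_samples should fall through to reading its shape.
        class DataWithFitAttr(object):
            fit = 'a non-callable attribute'
            shape = (5,)

        # Should not raise: 'fit' is not callable, so this is data,
        # not an estimator.
        check_consistent_length(DataWithFitAttr())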
I looked at:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/tests/test_utils.py#L185
for inspiration when creating the test; using pandas seemed OK there. I would assume the best way to prove the fix works is to test with the same framework the problem was found in.
The way you did it is fine.
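For reference, a sketch of the same pandas-based check as its own test function, skipping cleanly when pandas is unavailable (the function name is illustrative, and it assumes SkipTest is importable from sklearn.utils.testing as in the existing tests):

    import numpy as np
    from sklearn.utils.testing import SkipTest
    from sklearn.utils.validation import check_consistent_length

    def test_check_consistent_length_dataframe_fit_column():
        # Non-regression test for #8415: a DataFrame with a column named
        # 'fit' must be treated as data, not mistaken for an estimator.
        try:
            import pandas as pd
        except ImportError:
            raise SkipTest("Pandas not found")
        X = np.ones((10, 3))
        X_df = pd.DataFrame(X, columns=['a', 'b', 'fit'])
        check_consistent_length(X_df)  # should not raise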
For the record, there are caveats: for example, I believe we currently only test with pandas on Linux with Python 3. This means a bug could potentially sneak in without being noticed by the CIs...
LGTM, this should be merged once AppVeyor finishes. @drkatnz ping me if I forget.
The coverage drop is due to the fact that we are not sending the coverage data from the pandas build in Travis. We should probably add coverage to the Python 3.6 build, which would then cover both Python 2 and Python 3. This is a one-line change in .travis.yml, since Codecov can combine coverage from multiple builds.
Merged, thanks a lot @drkatnz!
Reference Issue
Fix #8415
What does this implement/fix? Explain your changes.
Adds a check to see whether the fit attribute is callable, which is a stronger indication that the object is an estimator.
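To illustrate the failure mode this check guards against, a small sketch (assuming a pandas version that exposes columns as attributes):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.ones((10, 3)), columns=['a', 'b', 'fit'])
    hasattr(df, 'fit')   # True: the 'fit' column is exposed as an attribute
    callable(df.fit)     # False: it's a Series of data, not a method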
Any other comments?