[MRG+2] Fixed assumption fit attribute means object is estimator. by drkatnz · Pull Request #8418 · scikit-learn/scikit-learn · GitHub

[MRG+2] Fixed assumption fit attribute means object is estimator. #8418

Merged
merged 6 commits into from
Feb 21, 2017

drkatnz
Contributor
@drkatnz drkatnz commented Feb 21, 2017

Reference Issue

Fix #8415

What does this implement/fix? Explain your changes.

Adds a check that the fit attribute is callable, since a callable fit is a much stronger indication that the object is an estimator than the mere presence of an attribute named fit.
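To illustrate why the bare hasattr check can misfire, here is a minimal sketch. It is not scikit-learn code: `Table`, `Estimator`, `looks_like_estimator_old` and `looks_like_estimator_new` are hypothetical names, and `Table` only loosely imitates a container (such as a pandas DataFrame) that exposes column names as attributes.

```python
class Table:
    """Hypothetical container exposing column names as attributes,
    loosely imitating a DataFrame with a column called 'fit'."""

    def __init__(self, **columns):
        self.__dict__.update(columns)


class Estimator:
    """Hypothetical estimator with a real fit method."""

    def fit(self, X, y=None):
        return self


def looks_like_estimator_old(x):
    # Check before this PR: any attribute named 'fit' is enough.
    return hasattr(x, 'fit')


def looks_like_estimator_new(x):
    # Check proposed here: the 'fit' attribute must also be callable.
    return hasattr(x, 'fit') and hasattr(x.fit, '__call__')


table = Table(a=[1, 2], b=[3, 4], fit=[5, 6])
print(looks_like_estimator_old(table))        # True (false positive)
print(looks_like_estimator_new(table))        # False
print(looks_like_estimator_new(Estimator()))  # True
```

The column data stored under `fit` is a plain list, which has no `__call__`, so only the stricter check tells the container apart from an estimator.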

Any other comments?

@jnothman
Member

Please add a non-regression test

@@ -93,8 +93,9 @@ def _is_arraylike(x):
 def _num_samples(x):
     """Return number of samples in array-like x."""
     if hasattr(x, 'fit'):
-        # Don't get num_samples from an ensembles length!
-        raise TypeError('Expected sequence or array-like, got '
+        if hasattr(x.fit, '__call__'):
Member

Please use "and" to combine the two conditions instead of nesting a second if.

@codecov
codecov bot commented Feb 21, 2017

Codecov Report

Merging #8418 into master will decrease coverage by 0.01%.
The diff coverage is 70%.

@@            Coverage Diff             @@
##           master    #8418      +/-   ##
==========================================
- Coverage   94.75%   94.75%   -0.01%     
==========================================
  Files         342      342              
  Lines       60892    60902      +10     
==========================================
+ Hits        57701    57708       +7     
- Misses       3191     3194       +3
Impacted Files                            Coverage Δ
sklearn/utils/validation.py               99.49% <100%> (ø)
sklearn/utils/tests/test_validation.py    97.38% <66.66%> (-0.94%)
sklearn/utils/tests/test_class_weight.py  100% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4493d37...6041763.

@drkatnz drkatnz changed the title Fixed assumption fit attribute means object is estimator. [WIP] Fixed assumption fit attribute means object is estimator. Feb 21, 2017
@drkatnz
Contributor Author
drkatnz commented Feb 21, 2017

@jnothman I've never written a non-regression test before - can you double check this for me please?

Member
@jnothman jnothman left a comment

Once you make that test PEP8-compliant, it looks good to me.

@@ -92,7 +92,7 @@ def _is_arraylike(x):

 def _num_samples(x):
     """Return number of samples in array-like x."""
-    if hasattr(x, 'fit'):
+    if hasattr(x, 'fit') and hasattr(x.fit, '__call__'):
Member

Please use the builtin callable which basically does this.

Contributor Author

The callable builtin was removed in some versions of Python (3.0 and 3.1, I think), so I thought this would be more backwards compatible.

@jnothman
Member

Please change WIP to MRG when you don't expect to do more work before review.

@drkatnz drkatnz changed the title [WIP] Fixed assumption fit attribute means object is estimator. [MRG] Fixed assumption fit attribute means object is estimator. Feb 21, 2017
@jnothman
Member

I'd forgotten about that. Happy to use isinstance(..., collections.Callable). Not that I believe there are many users of Python 3<3.2.
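For reference, the three checks mentioned in this thread can be compared side by side. This is a minimal sketch rather than code from the PR; `Model` and `callable_checks` are made-up names. It assumes a current Python 3, where the abstract base class lives at `collections.abc.Callable` (the `collections.Callable` alias was removed in Python 3.10) and where the `callable()` builtin is available (it is absent only from Python 3.0 and 3.1).

```python
import collections.abc


class Model:
    """Hypothetical object with a fit method."""

    def fit(self, X):
        return self


def callable_checks(obj):
    """Return the result of each check discussed in this thread."""
    return (
        hasattr(obj, '__call__'),                   # check used in the diff
        callable(obj),                              # builtin
        isinstance(obj, collections.abc.Callable),  # ABC-based check
    )


print(callable_checks(Model().fit))                # (True, True, True)
print(callable_checks('a non-callable attribute'))  # (False, False, False)
```

For ordinary objects like these the three checks agree, so the choice is mostly about readability and which Python versions need to be supported.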

@jnothman jnothman changed the title [MRG] Fixed assumption fit attribute means object is estimator. [MRG+21] Fixed assumption fit attribute means object is estimator. Feb 21, 2017
@jnothman jnothman changed the title [MRG+21] Fixed assumption fit attribute means object is estimator. [MRG+1] Fixed assumption fit attribute means object is estimator. Feb 21, 2017
@drkatnz
Contributor Author
drkatnz commented Feb 21, 2017

I don't believe it will affect many people either. I think using 'callable' is better practice than isinstance, so we should stick with that.

@lesteve
Member
lesteve commented Feb 21, 2017

@drkatnz I edited your description to use "Fix #issueNumber", this way the associated issue gets closed when the PR is merged.

    X_df = pd.DataFrame(X, columns=['a', 'b', 'fit'])
    check_consistent_length(X_df)
except ImportError:
    raise SkipTest("Pandas not found")
Member

Can you put your test in a separate test function, please? If pandas is not installed, any lines after your test will not be run, which is a bit dodgy.

Also add a link to the issue to add some context.

To be honest maybe the simplest thing to do is to have a test that does not require pandas, i.e. something like:

X = np.ones(5)
X.fit = 'a non-callable attribute'
check_consistent_length(X)
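One caveat with the pandas-free suggestion above: a plain NumPy array does not accept new attributes (assigning `X.fit` on the result of `np.ones(5)` raises AttributeError), so some attribute-friendly container is needed. The sketch below shows the idea with the standard library only. `num_samples` is a hypothetical stand-in for the patched helper and `SequenceWithFit` is a made-up list subclass; neither is scikit-learn code.

```python
def num_samples(x):
    """Hypothetical stand-in for the patched helper, not scikit-learn code."""
    if hasattr(x, 'fit') and callable(x.fit):
        # Only objects with a callable fit are treated as estimators.
        raise TypeError('Expected sequence or array-like, got estimator %s' % x)
    return len(x)


class SequenceWithFit(list):
    """List subclass, so instances accept arbitrary attributes."""


def test_non_callable_fit_attribute():
    # Non-regression check for issue #8415: a non-callable 'fit'
    # attribute must not make the object look like an estimator.
    X = SequenceWithFit([1.0] * 5)
    X.fit = 'a non-callable attribute'
    assert num_samples(X) == 5


def test_estimator_is_still_rejected():
    class Estimator:
        def fit(self, X):
            return self

    try:
        num_samples(Estimator())
    except TypeError:
        pass
    else:
        raise AssertionError('expected TypeError for an estimator')


test_non_callable_fit_attribute()
test_estimator_is_still_rejected()
```

Keeping each case in its own test function also means that skipping one of them (for example when an optional dependency is missing) does not prevent the other from running.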

Contributor Author

I looked at:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/tests/test_utils.py#L185

for inspiration when creating the test; using pandas seemed to be OK there. I would assume the best way to prove the fix works is to test with the same library the problem was found in.

Member

The way you did it is fine.

Member

For the record, there are caveats: for example, I believe we currently only test with pandas on Linux with Python 3. This means that a bug could potentially sneak in without being noticed by the CIs ...

@lesteve
Member
lesteve commented Feb 21, 2017

LGTM, this should be merged once AppVeyor finishes. @drkatnz ping me if I forget.

@lesteve lesteve changed the title [MRG+1] Fixed assumption fit attribute means object is estimator. [MRG+2] Fixed assumption fit attribute means object is estimator. Feb 21, 2017
@lesteve
Member
lesteve commented Feb 21, 2017

The coverage drop is due to the fact that we are not sending the coverage data from the pandas build in Travis.

We should probably add the coverage in the Python 3.6 build and that would also cover Python2 + Python 3. This is a one line change in .travis.yml since codecov can combine coverage from multiple builds.

@lesteve lesteve merged commit daeba62 into scikit-learn:master Feb 21, 2017
@lesteve
Member
lesteve commented Feb 21, 2017

Merged, thanks a lot @drkatnz!

@drkatnz
Contributor Author
drkatnz commented Feb 21, 2017

No problem, thanks for your patience and help @jnothman and @lesteve :)

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
lemonlaug pushed a commit to lemonlaug/scikit-learn that referenced this pull request Jan 6, 2021
Development

Successfully merging this pull request may close these issues.

model_selection.KFold and GroupKFold cannot handle column labelled 'fit' in pandas dataframe
3 participants