[MRG+2] Fixed assumption fit attribute means object is estimator. #8418
Conversation
Please add a non-regression test
sklearn/utils/validation.py
@@ -93,8 +93,9 @@ def _is_arraylike(x):
def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
        # Don't get num_samples from an ensembles length!
        raise TypeError('Expected sequence or array-like, got '
        if hasattr(x.fit, '__call__'):
Please use and here, i.e. combine the two hasattr checks into a single condition.
Codecov Report
@@ Coverage Diff @@
## master #8418 +/- ##
==========================================
- Coverage 94.75% 94.75% -0.01%
==========================================
Files 342 342
Lines 60892 60902 +10
==========================================
+ Hits 57701 57708 +7
- Misses 3191 3194 +3
Continue to review full report at Codecov.
@jnothman I've never written a non-regression test before - can you double check this for me please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you make that test PEP8-compliant, it looks good to me.
sklearn/utils/validation.py
@@ -92,7 +92,7 @@ def _is_arraylike(x):

def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
    if hasattr(x, 'fit') and hasattr(x.fit, '__call__'):
Please use the builtin callable, which basically does this.
callable has been deprecated in some versions of Python (3.0 and 3.1, I think), so I thought this would be more backwards compatible.
Please change WIP to MRG when you don't expect to do more work before review.
I'd forgotten about that. Happy to use
I don't believe it will affect many people either. Using callable is, I think, better practice than isinstance; we should stick with that.
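For context, a minimal sketch of the guard this discussion settles on, using the builtin callable (the exact indentation and error message in the merged change may differ):

    def _num_samples(x):
        """Return number of samples in array-like x."""
        # Only treat x as an estimator if its 'fit' attribute is actually
        # callable; a DataFrame with a column named 'fit' exposes a
        # non-callable attribute and should be treated as data.
        if hasattr(x, 'fit') and callable(x.fit):
            # Don't get num_samples from an ensembles length!
            raise TypeError('Expected sequence or array-like, got '
                            'estimator %s' % x)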
@drkatnz I edited your description to use "Fix #issueNumber"; this way the associated issue gets closed when the PR is merged.
        X_df = pd.DataFrame(X, columns=['a', 'b', 'fit'])
        check_consistent_length(X_df)
    except ImportError:
        raise SkipTest("Pandas not found")
Can you put your test in a separate test function please? Any lines after your test will not be run if pandas is not installed, which is a bit dodgy.
Also add a link to the issue to add some context.
To be honest maybe the simplest thing to do is to have a test that does not require pandas, i.e. something like:
    X = np.ones(5)
    X.fit = 'a non-callable attribute'
    check_consistent_length(X)
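A sketch of how that pandas-free idea could be wrapped into a standalone test (illustrative only; a plain ndarray does not accept new attributes, so a tiny stand-in object with a non-callable fit attribute is used instead, and the test name is made up):

    from sklearn.utils.validation import check_consistent_length

    def test_non_callable_fit_attribute():
        # An object carrying a non-callable 'fit' attribute but otherwise
        # array-like; _num_samples should fall through to reading its shape.
        class DataWithFitAttr(object):
            fit = 'a non-callable attribute'
            shape = (5,)

        # Should not raise: 'fit' is not callable, so this is data,
        # not an estimator.
        check_consistent_length(DataWithFitAttr())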
I looked at:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/tests/test_utils.py#L185
for inspiration when creating the test; using pandas seemed OK there. I would assume the best way to prove the fix works is to test with the same framework the problem was found in.
The way you did it is fine.
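For reference, a sketch of the same pandas-based check as its own test function, skipping cleanly when pandas is unavailable (the function name is illustrative, and it assumes SkipTest is importable from sklearn.utils.testing as in the existing tests):

    import numpy as np
    from sklearn.utils.testing import SkipTest
    from sklearn.utils.validation import check_consistent_length

    def test_check_consistent_length_dataframe_fit_column():
        # Non-regression test for #8415: a DataFrame with a column named
        # 'fit' must be treated as data, not mistaken for an estimator.
        try:
            import pandas as pd
        except ImportError:
            raise SkipTest("Pandas not found")
        X = np.ones((10, 3))
        X_df = pd.DataFrame(X, columns=['a', 'b', 'fit'])
        check_consistent_length(X_df)  # should not raise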
For the record, there are caveats: for example, I believe we currently only test with pandas on Linux with Python 3. This means a bug could potentially sneak in without being noticed by the CIs...
LGTM, this should be merged once AppVeyor finishes. @drkatnz ping me if I forget.
The coverage drop is due to the fact that we are not sending the coverage data from the pandas build in Travis. We should probably add coverage to the Python 3.6 build, which would then cover both Python 2 and Python 3. This is a one-line change in .travis.yml, since Codecov can combine coverage from multiple builds.
Merged, thanks a lot @drkatnz!
Reference Issue
Fix #8415
What does this implement/fix? Explain your changes.
Adds a check to see whether the fit attribute is callable, which is a stronger indication that the object is an estimator.
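To illustrate the failure mode this check guards against, a small sketch (assuming a pandas version that exposes columns as attributes):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.ones((10, 3)), columns=['a', 'b', 'fit'])
    hasattr(df, 'fit')   # True: the 'fit' column is exposed as an attribute
    callable(df.fit)     # False: it's a Series of data, not a method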
Any other comments?