[MRG+2] Fix empty input data common checks#4245
Conversation
|
travis is not happy. |
|
can you rebase? |
|
I think these are intended to be smoke tests? If so, summarizing the 11 failures ( 1 error msg included ) : Needs handling of 0 samples
Needs handling of 0 features
Needs rewording of the error message for the 0 samples test
Needs rewording of the error message for the 0 features test
Others :
|
39dbaf7 to
25898de
Compare
|
I rebased. I know the tests don't pass as @ragv summarized many estimators are not consistent with the 0-samples or 0-features edge cases. I am under the impression that having support for any of those 2 cases is never useful and can be the source of confusion when the user is not properly aware of the shape of the datastructures he/she fed to sklearn estimators. This is why I think we should always raise an exception consistently with an informative error message. I will work on fixing the failures. We can discuss if maintaining exceptions is desirable or not. |
|
+1 on failing with a good error. |
25898de to
549d98c
Compare
There was a problem hiding this comment.
@amueller I removed ensure_2d=False here and no unit test check failed contrary to what the FIXME inline comment says. Do remember which tests used to fail?
This change is technically not required for this PR. I can revert it if you prefer to deal with this case in another PR such as the one tackling #4252 for instance.
There was a problem hiding this comment.
That was a tree-specific test.
There was a problem hiding this comment.
It does no longer fail apparently.
|
I fixed all the failures revealed by the new checks. |
sklearn/tests/test_common.py
Outdated
There was a problem hiding this comment.
why don't you use the "make_y_multioutput" helper?
There was a problem hiding this comment.
I cannot find what you are talking about in the code base.
There was a problem hiding this comment.
Sorry I meant y = multioutput_estimator_convert_y_2d(name, y)
|
Maybe add a whatsnew (helpful for looking up when something was fixed), and maybe use the multioutput helper. Otherwise +1 |
sklearn/utils/estimator_checks.py
Outdated
There was a problem hiding this comment.
I would use the ignore warnings decorator.
There was a problem hiding this comment.
Can you explain why you need this? I am missing something here.
There was a problem hiding this comment.
I would use the ignore warnings decorator.
Indeed.
Can you explain why you need this? I am missing something here.
I will relaunch the tests without the catch to make sure.
090ad08 to
67f6ad1
Compare
|
LGTM |
sklearn/utils/estimator_checks.py
Outdated
There was a problem hiding this comment.
So here you could do
y = multioutput_estimator_convert_y_2d(name, y)
And remove the multioutput argument. This is how it is done in all the other tests.
7d7b95a to
feceab6
Compare
|
Thanks @amueller, I addressed your comment, squashed and merged. |
This PR includes #4214 but also additional common tests that currently fail on some estimators that do not have a consistent behavior and that probably need to be fixed on a case by case basis.