-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Fix empty input data common checks #4245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -307,8 +307,9 @@ def check_array(array, accept_sparse=None, dtype="numeric", order=None, copy=Fal | |
ensure_min_features : int (default=1) | ||
Make sure that the 2D array has some minimum number of features | ||
(columns). The default value of 1 rejects empty datasets. | ||
This check is only enforced when ``ensure_2d`` is True and | ||
``allow_nd`` is False. Setting to 0 disables this check. | ||
This check is only enforced when the input data has effectively 2 | ||
dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0 | ||
disables this check. | ||
|
||
Returns | ||
------- | ||
|
@@ -347,7 +348,8 @@ def check_array(array, accept_sparse=None, dtype="numeric", order=None, copy=Fal | |
" minimum of %d is required." | ||
% (n_samples, shape_repr, ensure_min_samples)) | ||
|
||
if ensure_min_features > 0 and ensure_2d and not allow_nd: | ||
|
||
if ensure_min_features > 0 and array.ndim == 2: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this modification tested? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a bunch of common checks that would fail otherwise. I can add a couple of unittest in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done in 7d7b95a |
||
n_features = array.shape[1] | ||
if n_features < ensure_min_features: | ||
raise ValueError("Found array with %d feature(s) (shape=%s) while" | ||
|
@@ -411,13 +413,16 @@ def check_X_y(X, y, accept_sparse=None, dtype="numeric", order=None, copy=False, | |
axis (rows for a 2D array). | ||
|
||
ensure_min_features : int (default=1) | ||
Make sure that the 2D X has some minimum number of features | ||
Make sure that the 2D array has some minimum number of features | ||
(columns). The default value of 1 rejects empty datasets. | ||
This check is only enforced when ``ensure_2d`` is True and | ||
``allow_nd`` is False. | ||
This check is only enforced when X has effectively 2 dimensions or | ||
is originally 1D and ``ensure_2d`` is True. Setting to 0 disables | ||
this check. | ||
|
||
y_numeric : boolean (default=False) | ||
Whether to ensure that y has a numeric type. If dtype of y is object, | ||
it is converted to float64. Should only be used for regression algorithms. | ||
it is converted to float64. Should only be used for regression | ||
algorithms. | ||
|
||
Returns | ||
------- | ||
|
@@ -428,7 +433,8 @@ def check_X_y(X, y, accept_sparse=None, dtype="numeric", order=None, copy=False, | |
ensure_2d, allow_nd, ensure_min_samples, | ||
ensure_min_features) | ||
if multi_output: | ||
y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False, dtype=None) | ||
y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False, | ||
dtype=None) | ||
else: | ||
y = column_or_1d(y, warn=True) | ||
_assert_all_finite(y) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amueller I removed
ensure_2d=False
here and no unit test check failed contrary to what the FIXME inline comment says. Do remember which tests used to fail?This change is technically not required for this PR. I can revert it if you prefer to deal with this case in another PR such as the one tackling #4252 for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a tree-specific test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or rather forest-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does no longer fail apparently.