-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Fix check array sparse param #7937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Fix check array sparse param #7937
Conversation
@@ -218,11 +218,12 @@ def _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, | |||
spmatrix : scipy sparse matrix | |||
Input to validate and convert. | |||
|
|||
accept_sparse : string, list of string or None (default=None) | |||
accept_sparse : string, list of string, boolean or None (default=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to drop documenting None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty list should perhaps have the same behaviour (which is untested).
'data is required. Use X.toarray() to ' | ||
'convert to a dense numpy array.') | ||
elif (isinstance(accept_sparse, (list, tuple)) and | ||
len(accept_sparse) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the len and the isinstance
? That's automatic, right? And the error if it's not a string might be better than what is below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pass in an empty list or tuple then it will pass the isinstance
, but it may be an empty list. len
ensures it has an element and the next check ensures that the first element is a string.
I can take those checks out however and allow it to fail on its own when it tries to index the list or format with a non-string, if that would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm right the empty list would fail with the indexing, sorry I didn't think about that. But the not a sparse format
error you get from the string seems better than the generic accept_parse
error message below, so I'd remove that check.
changed_format = True | ||
elif accept_sparse is not True: | ||
# any other type | ||
raise ValueError('The parameter \'accept_sparse\' was ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also provide the value of accept_sparse
in the message. Btw, if you mix "
and '
you can avoid escaping ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to provide the value with "accept_sparse=%s" % accept_sparse
but when an empty tuple is passed in (essentially "%s" % ()
), this errors with "not enough arguments for format string". Do you have a suggestion for how to format properly in case its an empty tuple, aside from just checking for that specific case?
Use .format instead of % formatting. This is one of the issues it was
intended to fix.
…On 30 November 2016 at 12:52, Josh Karnofsky ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/utils/validation.py
<#7937>:
> +
+ if accept_sparse in [None, False]:
+ raise TypeError('A sparse matrix was passed, but dense '
+ 'data is required. Use X.toarray() to '
+ 'convert to a dense numpy array.')
+ elif (isinstance(accept_sparse, (list, tuple)) and
+ len(accept_sparse) and
+ isinstance(accept_sparse[0], str)):
+ # ensure correct sparse format
+ if spmatrix.format not in accept_sparse:
+ # create new with correct sparse
+ spmatrix = spmatrix.asformat(accept_sparse[0])
+ changed_format = True
+ elif accept_sparse is not True:
+ # any other type
+ raise ValueError('The parameter \'accept_sparse\' was '
I'm trying to provide the value with "accept_sparse=%s" % accept_sparse
but when an empty tuple is passed in (essentially "%s" % ()"), this
errors with "not enough arguments for format string". Do you have a
suggestion for how to format properly in case its just an empty tuple,
aside from just checking for that specific case?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#7937>, or mute the
thread
<https://github.com/notifications/unsubscribe-auth/AAEz6w9vJJvqGzawwCOnKkrBFMyWMXNIks5rDNbzgaJpZM4K7-aB>
.
|
assert_raise_message(ValueError, msg.format(invalid_type), | ||
check_array, X_csr, accept_sparse=invalid_type) | ||
|
||
msg = "Can't convert 'SVR' object to str implicitly" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use "'SVR' object"
and it'll work on python2 and python3 and we know it's the right error (probably)
changed_format = True | ||
elif accept_sparse is not True: | ||
# any other type | ||
raise ValueError(("The parameter 'accept_sparse={}' " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "Invalid parameter 'accept_sparse'={}"
? You you prefer the current wording, it should be "is not of correct type". The argument is (probably) not a type.
LGTM. Having the default option ( |
+1 to that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments
@@ -218,11 +218,12 @@ def _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, | |||
spmatrix : scipy sparse matrix | |||
Input to validate and convert. | |||
|
|||
accept_sparse : string, list of string or None (default=None) | |||
accept_sparse : string, list of strings or boolean (default=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"string, boolean or list of strings" make it clearer. You avoid the potential confusion of reading list of strings or list of booleans.
the allowed format, it will be converted to the first listed format. | ||
'csr', 'coo', 'dok', 'bsr', 'lil', 'dia'). If the input is sparse but | ||
not in the allowed format, it will be converted to the first listed | ||
format. True allows the input to be any format. False or None means |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you deprecated None my feeling is that you may remove it from the docstring.
@@ -284,11 +295,12 @@ def check_array(array, accept_sparse=None, dtype="numeric", order=None, | |||
array : object | |||
Input object to check / convert. | |||
|
|||
accept_sparse : string, list of string or None (default=None) | |||
accept_sparse : string, list of strings or boolean (default=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above: string, boolean or list of strings
elif accept_sparse is not True: | ||
# any other type | ||
raise ValueError(("Invalid parameter " | ||
"'accept_sparse={}'").format(accept_sparse)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please list the allowed type of parameters, i.e. something like 'accept_sparse' should be a string, boolean or list of strings, you provided 'accept_sparse={}'.format(accept_sparse)
.
check_array(X_csr, accept_sparse=True) | ||
check_array(X_csr, accept_sparse='csr') | ||
check_array(X_csr, accept_sparse=['csr']) | ||
check_array(X_csr, accept_sparse=('csr')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not checking what you think it is checking (I guess you meant to test accept_sparse as a tuple) ('csr')
is not a tuple, ('csr',)
is a tuple with one element.
To be honest this four lines need should be moved to another function since they do not test exception. accept_sparse=True, accept_sparse='csr' and accept_sparse=['csr']. It would be great to test the type of the returned object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and by the way tuple
is not mentioned in the docstring ...
spmatrix = spmatrix.asformat(accept_sparse[0]) | ||
changed_format = True | ||
|
||
if isinstance(accept_sparse, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use six.string_types
rather than str
so that unicode strings can be passed in in Python 2.7.
To be honest thinking about it a bit more, maybe deprecating accept_sparse=None can be done in a separate PR. It will make this PR easier to review and merge. Your call @jkarno. |
The deprecation didn't seem like too much extra so I decided to work on adding it to this PR. However, I am pushing now because I have a few small questions.
Also, I assume I should be adding something to the what's new document now that the deprecation is being added? |
msg = "Passing None to parameter 'accept_sparse' is " \ | ||
"deprecated in 0.19. Use False instead." | ||
|
||
assert_raise_message((DeprecationWarning, TypeError), msg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For warnings you'd use assert_warns
, not assert_raise_message
. And you're best to test with something that does not raise an exception (although assert_warns(DeprecationError, assert_raise_message(TypeError, func, args))
should work also).
if accept_sparse is None: | ||
warnings.warn( | ||
"Passing None to parameter 'accept_sparse' is " | ||
"deprecated in 0.19. Use False instead.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"0.19" -> "version 0.19 and will be removed in 0.21"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should deprecate that in check_array
imho, as the deprecation warning should be raised even if a dense array is passed.
For deprecations also see http://scikit-learn.org/dev/developers/contributing.html#deprecation ;) |
X_csr = sp.csr_matrix(X) | ||
invalid_type = SVR() | ||
|
||
msg = "A sparse matrix was passed, but dense data is required. " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually we would use parentheses around a string to get alignment more visually pleasing.
check_array, X_csr, accept_sparse=None) | ||
|
||
msg = "Parameter 'accept_sparse' should be a string, " \ | ||
"boolean or list of strings. You provided 'accept_sparse={}'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this indentation is inconsistent with the similar string above
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to add an entry in doc/whats_new.rst to advertise the deprecation of accept_sparse=None I think.
@@ -218,11 +218,12 @@ def _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, | |||
spmatrix : scipy sparse matrix | |||
Input to validate and convert. | |||
|
|||
accept_sparse : string, list of string or None (default=None) | |||
accept_sparse : string, boolean or list/tuple of strings (default=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misleading docstring, there is no default for this function. Actually this is a problem for all the parameters in this function, can you please fix this?
# accept_sparse 'None' deprecation check | ||
if accept_sparse is None: | ||
warnings.warn( | ||
"Passing 'None' to parameter 'accept_sparse' is " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the function name to the warning. The origin of the warning is sometimes not trivial to understand so you want to make it as explicit as possible.
if isinstance(accept_sparse, six.string_types): | ||
accept_sparse = [accept_sparse] | ||
|
||
if accept_sparse is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use elif
and remove the newline mostly for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't work, right? We want to end up in the case below where isinstance(accept_sparse, (list, tuple))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't work, right?
Oh yeah, sorry I missed that. Ignore my comment then.
'data is required. Use X.toarray() to ' | ||
'convert to a dense numpy array.') | ||
elif (isinstance(accept_sparse, (list, tuple)) and | ||
len(accept_sparse)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm so what is the behaviour of accept_sparse=[]
, Is it the same as accept_sparse=False
? Maybe this should be tackled inside this if clause to be more clear, i.e. something like:
elif ininstance(accept_sparse, (list, tuple)):
if len(accept_sparse) == 0:
raise ValueError("When providing 'accept_sparse' as a tuple or list it should be non empty or some better errror message")
changed_format = True | ||
elif accept_sparse is not True: | ||
# any other type | ||
raise ValueError(("Parameter 'accept_sparse' should be a string, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick you don't need the additional parentheses inside ValueError
@@ -451,11 +470,12 @@ def check_X_y(X, y, accept_sparse=None, dtype="numeric", order=None, | |||
y : nd-array, list or sparse matrix | |||
Labels. | |||
|
|||
accept_sparse : string, list of string or None (default=None) | |||
accept_sparse : string, boolean or list of string (default=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to deprecate accept_sparse=None
here too.
@@ -451,11 +470,12 @@ def check_X_y(X, y, accept_sparse=None, dtype="numeric", order=None, | |||
y : nd-array, list or sparse matrix | |||
Labels. | |||
|
|||
accept_sparse : string, list of string or None (default=None) | |||
accept_sparse : string, boolean or list of string (default=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean raise a warning? But check_array
is called...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry I missed that, maybe the deprecation message in check_array
can be changed to mention both check_array
and check_X_y
then.
Apologies on the gaps between commits, I was caught in the middle of finals this week. I am working on rebasing on master to solve the conflicts with |
In this case the conflict is not too hard to fix so I would recommend to use the new Github feature of resolving the conflicts via the web UI. Click on "Resolve conflicts".
If you want to do that on the command line instead, then the answer to your question is yes. When you do a rebase you change the history of your branch so you need to force push. If you don't feel super confident with the force push, you can make a backup branch ( |
Also remove unnecessary parentheses
OK I pushed a few minor changes into your branch and ended up resolving the conflicts via the web UI while I was at it. Hope you don't mind. |
LGTM by the way, let's wait for the CIs. |
@jkarno looks like you have some flake8 violations. |
There's a web UI for conflicts now? Huh
Sent from phone. Please excuse spelling and brevity.
…On Dec 19, 2016 3:52 AM, "Loïc Estève" ***@***.***> wrote:
@jkarno <https://github.com/jkarno> looks like you have some flake8
violations.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#7937 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAbcFm4-J1e744JlCunc9RBDAtPSQrxFks5rJkXQgaJpZM4K7-aB>
.
|
(@lesteve, I assume I can squash and merge and you forego any attribution for your changes?) |
Yeah go for it! |
Thanks @jkarno |
Nice clean-up @jkarno! |
Reference Issue
Fixes #7880
What does this implement/fix? Explain your changes.
Any other comments?
accept_sparse=True
which this would break otherwise._ensure_sparse_format
because this is where the parameter is used. However, this means no warning or exception is thrown if the array is not sparse. This may or may not be the right behavior.