-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
modified check_arrays to handle multiple sparse formats #3094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -192,7 +192,8 @@ def check_arrays(*arrays, **options): | |||
Python lists or tuples occurring in arrays are converted to 1D numpy | |||
arrays, unless allow_lists is specified. | |||
|
|||
sparse_format : 'csr', 'csc' or 'dense', None by default | |||
sparse_format : 'csr', 'csc', 'dense', or an iterable containing some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the type should fit on one line to render fine in the doc
I would write
sparse_format : 'csr' | 'csc' | 'dense' | list
...
if is list ...
addressed the issues @agramfort brought up: change documentation line, and avoid copying matrices to other types unnecessarily. There started to be way too much logic going on in i decided that the most intuitive way to have the list argument to for example:
finally, i added a little more test coverage around the sparse_format functionality |
elif sparse_format is None: | ||
return array | ||
else: | ||
input_format = array.__class__.__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use array.getformat()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there's array.format
which is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, didn't realize that existed -- much nicer!
substituted my wonky also added clearer documentation for the |
@@ -102,6 +102,8 @@ def test_check_arrays_exceptions(): | |||
assert_raises(TypeError, check_arrays, [0, 1], [0, 1], meaning_of_life=42) | |||
assert_raises(ValueError, check_arrays, [0], [0], sparse_format='fake') | |||
assert_raises(ValueError, check_arrays, np.zeros((2, 3, 4)), [0]) | |||
assert_raises(ValueError, check_arrays, [0], [0], sparse_format=['csc', | |||
'fake']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cosmetics, I would prefer:
assert_raises(ValueError, check_arrays, [0], [0],
sparse_format=['csc', 'fake'])
I actually implemented this and also refactored all of the input validation in the last sprint. The |
on pull request #3076, it was proposed we extend the functionality of
sklearn.utils.validation.check_arrays(...)
to allow for multiple sparse formats. for example:check_arrays(*arrays, sparse_format=['csc', 'csr'])
i did my best to make sure my implementation's compatible with both string arguements (
sparse_format='csc'
) and collections (sparse_format=['csc', 'csr']
). wasn't quite sure what to do in the case of something like...check_arrays(*arrays, sparse_format=['csc', 'dense'])
...where both sparse and dense formats are listed. i assumed the most intuitive way to read that was to say "csc and dense arrays are allowed," and attempted to get the code to behave accordingly.
i could definitely use a watchful eye on this one, as check_arrays is used everywhere.