-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
ENH: added axis param for np.count_nonzero #7177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
☔ The latest upstream changes (presumably #7246) made this pull request unmergeable. Please resolve the merge conflicts. |
@jaimefrio : Unfortunately, I have not been able to make much progress on refactoring the code into C. However, I think it should be good to merge as is since it performs relatively well, and the tests pass. If someone else wants to take a look at this, that would be great as well! |
Can somebody take a look at this? I've been pinging this PR for a couple of weeks now with no response from anyone. The code should be ready to land. |
☔ The latest upstream changes (presumably #7346) made this pull request unmergeable. Please resolve the merge conflicts. |
Alright, this is the second time I've encountered a merge conflict on this PR. Could someone take a look at this and try to land it? |
☔ The latest upstream changes (presumably #7178) made this pull request unmergeable. Please resolve the merge conflicts. |
This code currently has a lot of branches. Given the apparent lack of existing test coverage, I'm not even sure all of them are being exercised. It needs to test every valid dtype, and to test invalid (e.g., I have also have some broader concerns about
|
return (multiarray.count_nonzero(a) if axis is None | ||
else (a != '').sum(axis=axis, dtype=intp)) | ||
|
||
return (multiarray.count_nonzero(a) if axis is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What falls through to this case? Does it really make sense to use a potentially very slow loop with apply_along_axis
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I covered most of the common cases above, and when the objects start getting really weird (i.e. dtype=object
cases), behaviour is unfortunately quite unpredictable with regards to how it will behave when counting along multiple axes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we had a simple is_nonzero
function that would return a boolean array. Then we could simply use sum
here instead of apply_along_axis (which feels very hacky).
I haven't looked at the rest, but I can explain the name :-). In py2, "nonzero" is the official name for the property that's more commonly known as "truthiness" (see |
All reactions
Sorry, something went wrong.
@shoyer: The reason why |
All reactions
Sorry, something went wrong.
OK, that makes sense for the definition of "nonzero". It would be good to document that :). |
All reactions
Sorry, something went wrong.
@njsmith : If we could rename it, what should it be called? That feels like a |
All reactions
Sorry, something went wrong.
@shoyer: +1 for the documentation. That has been added along with |
All reactions
Sorry, something went wrong.
This still needs comprehensive test coverage, as I mentioned in my comment above |
All reactions
Sorry, something went wrong.
@shoyer: Ah, good point. Thanks for the reminder! |
All reactions
Sorry, something went wrong.
Could the folks involved here take another look. |
All reactions
-
👍 1 reaction
Sorry, something went wrong.
size = (5, 5, 5, 5) | ||
msg = "Mismatch for axis: %s" | ||
|
||
m = randint(-100, 100, size=size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a random seed to ensure that this test can't fail in a stochastic fashion. I recommend using np.random.RandomState
.
Sorry, something went wrong.
All reactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Sorry, something went wrong.
All reactions
@rgommers : I can try rewriting the current |
All reactions
Sorry, something went wrong.
@rgommers : simplicity seems best - ended up using the current |
All reactions
Sorry, something went wrong.
☔ The latest upstream changes (presumably #7689) made this pull request unmergeable. Please resolve the merge conflicts. |
All reactions
Sorry, something went wrong.
@rgommers : any update on this? |
All reactions
Sorry, something went wrong.
Travis is happy, except for the spurious failure on |
All reactions
Sorry, something went wrong.
@rgommers : any update on this? |
All reactions
Sorry, something went wrong.
return a.sum(axis=axis, dtype=np.intp) | ||
|
||
if issubdtype(a.dtype, np.number): | ||
return (a != 0).sum(axis=axis, dtype=np.intp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allocates a new boolean array of the same shape as the original. I thought the whole point was to avoid doing that...
Sorry, something went wrong.
All reactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madphysicist: When was that the whole point?
Sorry, something went wrong.
All reactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought wrong apparently. It just seems a bit hacky to do that with a function that is implemented in C exactly to avoid such an operation.
Sorry, something went wrong.
All reactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hacky, a bit, but it does get the job done without too much sadness.
Sorry, something went wrong.
All reactions
+1 for the overall idea. I started something in C along similar lines. My main objection is the creation of boolean temp arrays in corner cases. I do like the idea of adding the results along multiple axes once you have applied it to one. |
All reactions
Sorry, something went wrong.
@madphysicist : thanks for taking a look! Would be great to get feedback from the maintainers (@rgommers or someone else) too so that this can actually be merged. |
All reactions
Sorry, something went wrong.
Can somebody take a look at this? |
All reactions
Sorry, something went wrong.
+1 from the peanut gallery. I think getting the |
All reactions
-
👍 1 reaction
Sorry, something went wrong.
I second that. I would really like to see this in numpy regardless of whether it could be done a different way. |
All reactions
-
👍 1 reaction
Sorry, something went wrong.
Can somebody take a look at this? |
All reactions
Sorry, something went wrong.
OK, this looks pretty sane to me at this point, so I will merge after CI tests pass. |
All reactions
Sorry, something went wrong.
Sorry, something went wrong.
Fixes numpy#9728 This bug was introduced with the `axis` keyword in numpy#7177, as a misguided optimization.
array([2, 3]) | ||
|
||
""" | ||
if axis is None or axis == (): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider this a bug, described in #9728.
Sorry, something went wrong.
All reactions
Fixes numpy#9728 This bug was introduced with the `axis` keyword in numpy#7177, as a misguided optimization.
eric-wieser
Successfully merging this pull request may close these issues.
Addresses feature request in #391 to add an axis parameter to
np.count_nonzero
.Duplicate of #7138 after a forced push on another branch accidentally reset my origin branch for that PR to
master
, closing it automatically (I had reset hard on the #7138 branch tomaster
so I could do the C refactoring @jaimefrio suggested in the spirit of #4330, whoops).