ENH: added axis param for np.count_nonzero #7177

gfyoung · 2016-02-03T04:34:24Z

Addresses feature request in #391 to add an axis parameter to np.count_nonzero.

Duplicate of #7138. A forced push on another branch accidentally reset my origin branch for that PR to master, which closed it automatically. (I had hard-reset the #7138 branch to master so I could do the C refactoring @jaimefrio suggested in the spirit of #4330. Whoops.)

homu · 2016-02-14T16:40:30Z

☔ The latest upstream changes (presumably #7246) made this pull request unmergeable. Please resolve the merge conflicts.

gfyoung · 2016-02-29T01:46:40Z

@jaimefrio : Unfortunately, I have not been able to make much progress on refactoring the code into C. However, I think it should be good to merge as is since it performs relatively well, and the tests pass. If someone else wants to take a look at this, that would be great as well!

gfyoung · 2016-03-06T23:17:37Z

Can somebody take a look at this? I've been pinging this PR for a couple of weeks now with no response from anyone. The code should be ready to land.

homu · 2016-03-12T20:05:20Z

☔ The latest upstream changes (presumably #7346) made this pull request unmergeable. Please resolve the merge conflicts.

gfyoung · 2016-03-12T21:54:33Z

Alright, this is the second time I've encountered a merge conflict on this PR. Could someone take a look at this and try to land it?

homu · 2016-03-13T23:31:59Z

☔ The latest upstream changes (presumably #7178) made this pull request unmergeable. Please resolve the merge conflicts.

shoyer · 2016-03-14T02:14:27Z

This code currently has a lot of branches. Given the apparent lack of existing test coverage, I'm not even sure all of them are being exercised. It needs tests for every valid dtype, for invalid axes (e.g., axis='foo' or axis=3 for a 2d array), and for negative and tuple axes (e.g., axis=[0, 1]).
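As a rough illustration of the cases listed above, the checks might look something like the following. This is only a sketch run against a NumPy release that already has the `axis` keyword, not the test suite from this PR; the array `a` and the dtype list are made up for the example.

```python
import numpy as np

a = np.array([[0, 1, 7, 0],
              [3, 0, 2, 19]])

# One count per column and one count per row.
per_column = np.count_nonzero(a, axis=0)
per_row = np.count_nonzero(a, axis=1)

# A negative axis counts from the end, so -1 is the last axis.
last_axis = np.count_nonzero(a, axis=-1)

# A tuple of axes reduces over all of them at once.
both_axes = np.count_nonzero(a, axis=(0, 1))

# An out-of-range axis should be rejected rather than silently accepted.
try:
    np.count_nonzero(a, axis=3)
    out_of_range_rejected = False
except ValueError:  # NumPy's AxisError is a ValueError subclass
    out_of_range_rejected = True

# The per-row counts should not depend on the numeric dtype
# (illustrative subset of dtypes, not an exhaustive list).
for dtype in (np.bool_, np.int8, np.uint16, np.int64, np.float32, np.complex128):
    assert np.count_nonzero(a.astype(dtype), axis=1).tolist() == [2, 3]
```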

I also have some broader concerns about count_nonzero:

The behavior for non-numeric dtypes is not documented. What exactly does it mean to be "zero"? Apparently None and '' are both considered zero for object arrays:
```
In [13]: x = np.array([None, 0, '', 12])

In [14]: x
Out[14]: array([None, 0, '', 12], dtype=object)

In [15]: np.count_nonzero(x)
Out[15]: 1
```
Is this behavior a bug or a feature? I don't know.

If (x != 0).sum() is faster and more explicit, what's the point of even encouraging the use of this function?

shoyer · 2016-03-14T02:15:44Z

numpy/core/numeric.py

+        return (multiarray.count_nonzero(a) if axis is None
+                else (a != '').sum(axis=axis, dtype=intp))
+
+    return (multiarray.count_nonzero(a) if axis is None


What falls through to this case? Does it really make sense to use a potentially very slow loop with apply_along_axis?

I think I covered most of the common cases above, and when the objects start getting really weird (i.e. dtype=object cases), the behaviour when counting along multiple axes is unfortunately quite unpredictable.

It would be nice if we had a simple is_nonzero function that would return a boolean array. Then we could simply use sum here instead of apply_along_axis (which feels very hacky).

Bleh, I was going to say that casting-to-bool was is_nonzero, but it's not true -- string->bool type coercion doesn't check truthiness, it tries to parse the strings or something :-/. I guess internally we don't care about truthiness except in the case of if arr: ..., which can be handled by checking shape + count_nonzero?

@njsmith: Yeah...I hit my head against that option as well. Needless to say, my definition of 'nonzero-ness' has been forever altered...:smile:

njsmith · 2016-03-14T03:23:48Z

> What exactly does it mean to be "zero"?

I haven't looked at the rest, but I can explain the name :-). In py2, "nonzero" is the official name for the property that's more commonly known as "truthiness" (see __nonzero__). This is confusing and in py3 they fixed this (__nonzero__ is now __bool__), but this function's name remains as collateral damage.
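To make the "nonzero means truthy" reading concrete, here is a small sketch using the object array from the earlier comment. It compares the count against Python's own `bool()` on each element; the variable names are just for illustration.

```python
import numpy as np

x = np.array([None, 0, '', 12], dtype=object)

# Python-level truthiness of each element: only 12 is truthy.
truthy = [bool(v) for v in x]

# For an object array, the count agrees with the number of truthy elements.
count = np.count_nonzero(x)
```

Under this reading, None and '' count as "zero" simply because they are falsy in Python.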

gfyoung · 2016-03-14T21:31:26Z

@shoyer: The reason a != 0 isn't used by itself is that the definition of non-zero, as @njsmith explained, is unfortunately not as clear-cut as it seemed on the surface. I too had that thought initially, and then the numpy tests told me otherwise.

shoyer · 2016-03-14T21:45:22Z

OK, that makes sense for the definition of "nonzero". It would be good to document that :).

gfyoung · 2016-03-14T22:27:55Z

@njsmith : If we could rename it, what should it be called? That feels like a FutureWarning of some kind (i.e. "WARNING: name change coming soon!"), maybe not for this PR but at least for a follow-up.

gfyoung · 2016-03-14T22:36:24Z

@shoyer: +1 for the documentation. That has been added along with [ci skip] to this PR until there are other code-related changes to make OR if people deem this merge-able.

gfyoung · 2016-03-21T12:33:58Z

@shoyer, @njsmith : Is this good to merge? I've made all of the requested changes, and I haven't seen any complaints in the past week or so.

shoyer · 2016-03-21T15:07:46Z

This still needs comprehensive test coverage, as I mentioned in my comment above.

gfyoung · 2016-03-21T15:12:55Z

@shoyer: Ah, good point. Thanks for the reminder!

gfyoung · 2016-03-26T19:06:34Z

@shoyer, @njsmith : Added lots of new tests for count_nonzero that I think are fairly comprehensive. Travis and Appveyor give the green light. If there is nothing else, should be ready to merge.

gfyoung · 2016-04-13T16:37:01Z

@shoyer , @njsmith : I haven't seen any complaints or issues from either of you or anyone else regarding this PR for almost two weeks. Can this be merged?

charris · 2016-04-16T19:52:00Z

Could the folks involved here take another look?

shoyer · 2016-04-16T20:12:57Z

numpy/core/tests/test_numeric.py

+        size = (5, 5, 5, 5)
+        msg = "Mismatch for axis: %s"
+
+        m = randint(-100, 100, size=size)


Use a random seed to ensure that this test can't fail in a stochastic fashion. I recommend using np.random.RandomState.
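A minimal sketch of what the seeded version could look like, assuming a NumPy release that has the `axis` keyword. The seed value 1234 is arbitrary, and the loop only covers single integer axes rather than everything the real test checks.

```python
import numpy as np

rng = np.random.RandomState(1234)  # arbitrary fixed seed
m = rng.randint(-100, 100, size=(5, 5, 5, 5))

msg = "Mismatch for axis: %s"
for axis in range(m.ndim):
    expected = (m != 0).sum(axis=axis)
    actual = np.count_nonzero(m, axis=axis)
    assert np.array_equal(actual, expected), msg % axis

# Re-seeding reproduces exactly the same data, so a failure is repeatable.
m_again = np.random.RandomState(1234).randint(-100, 100, size=(5, 5, 5, 5))
```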

gfyoung · 2016-05-28T22:18:53Z

@rgommers : I can try rewriting the current _validate_axis function and see if my new checks are compatible with the current code. Otherwise, I'll add a new one.

gfyoung · 2016-05-30T22:01:26Z

@rgommers : simplicity seems best - ended up using the current _validate_axis function, as that seems to suffice on second look. Travis and Appveyor are happy with this.

homu · 2016-06-03T17:09:16Z

☔ The latest upstream changes (presumably #7689) made this pull request unmergeable. Please resolve the merge conflicts.

gfyoung · 2016-06-07T18:44:28Z

@rgommers : any update on this?

gfyoung · 2016-06-08T01:48:44Z

Travis is happy, except for the spurious failure on master from the LRU cache PR. Will undo the [ci skip] in the commit once this error has been fixed. Otherwise, this should be ready to merge if there are no other concerns.

gfyoung · 2016-07-06T04:12:31Z

@rgommers : any update on this?

madphysicist · 2016-07-07T18:33:35Z

numpy/core/numeric.py

+        return a.sum(axis=axis, dtype=np.intp)
+
+    if issubdtype(a.dtype, np.number):
+        return (a != 0).sum(axis=axis, dtype=np.intp)


This allocates a new boolean array of the same shape as the original. I thought the whole point was to avoid doing that...

@madphysicist: When was that the whole point?

I thought wrong apparently. It just seems a bit hacky to do that with a function that is implemented in C exactly to avoid such an operation.

Hacky, a bit, but it does get the job done without too much sadness.
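For readers following the objection, this small sketch shows the temporary in question: `a != 0` materializes a boolean array with one element per element of `a` before the sum runs. The array `a` here is made up for the example.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Temporary array: same shape as `a`, one boolean (one byte) per element.
mask = (a != 0)

# Summing the mask along an axis gives the per-axis nonzero counts.
counts = mask.sum(axis=0, dtype=np.intp)
```

The result is correct, but the cost is an extra allocation of `a.size` bytes for the mask.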

madphysicist · 2016-07-07T18:40:19Z

+1 for the overall idea. I started something in C along similar lines. My main objection is the creation of boolean temp arrays in corner cases. I do like the idea of adding the results along multiple axes once you have applied it to one.
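The multi-axis idea mentioned here can be sketched as follows: count along one axis, then add the partial counts along the next. This is only an illustration of the arithmetic using the public API, not the code from this PR or from the C prototype; the array `m` is made up.

```python
import numpy as np

m = np.array([[[0, 1, 0, 2],
               [3, 0, 0, 0],
               [0, 0, 0, 0]],
              [[4, 5, 0, 0],
               [0, 0, 6, 0],
               [7, 0, 0, 8]]])  # shape (2, 3, 4)

# Count along axis 0 first; the result has shape (3, 4).
step_one = np.count_nonzero(m, axis=0)

# Original axis 2 is now axis 1, so add the partial counts along it.
two_step = step_one.sum(axis=1)

# Counting over both axes in one call should agree.
direct = np.count_nonzero(m, axis=(0, 2))
```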

gfyoung · 2016-07-08T15:33:12Z

@madphysicist : thanks for taking a look!

Would be great to get feedback from the maintainers (@rgommers or someone else) too so that this can actually be merged.

gfyoung · 2016-07-13T14:36:04Z

Can somebody take a look at this?

perimosocordiae · 2016-07-22T22:00:24Z

+1 from the peanut gallery.

I think getting the axis kwarg in the official API is the most important consideration, even if the underlying code isn't currently the most efficient possible approach. Further optimization and cleanup can happen later, behind the scenes and without user-visible changes.

madphysicist · 2016-07-26T13:36:43Z

I second that. I would really like to see this in numpy regardless of whether it could be done a different way.

Closes gh-391.

gfyoung · 2016-08-05T02:23:14Z

Can somebody take a look at this?

shoyer · 2016-08-05T02:51:47Z

OK, this looks pretty sane to me at this point, so I will merge after CI tests pass.

rgommers · 2016-08-06T10:18:26Z

Great, thanks @gfyoung @shoyer & all.

Fixes numpy#9728 This bug was introduced with the `axis` keyword in numpy#7177, as a misguided optimization.

eric-wieser · 2017-10-11T07:48:43Z

numpy/core/numeric.py

+    array([2, 3])
+
+    """
+    if axis is None or axis == ():


I'd consider this a bug, described in #9728.
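A short sketch of the distinction at stake, assuming a NumPy release that includes the later fix (#9881). As I read the condition in the diff above, an empty tuple was treated the same as `axis=None`; reducing over no axes should instead leave the shape untouched.

```python
import numpy as np

a = np.array([[0, 1, 7, 0],
              [3, 0, 2, 19]])

# axis=None: a single count for the whole array.
total = np.count_nonzero(a)

# axis=(): reduce over no axes at all, so every element is counted
# on its own and the result keeps the shape of `a`.
no_axes = np.count_nonzero(a, axis=())
```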


charris added the labels 01 - Enhancement, 56 - Needs Release Note, and component: numpy._core, and removed the label 56 - Needs Release Note, Feb 14, 2016

shoyer reviewed Mar 14, 2016

gfyoung closed this Apr 6, 2016

gfyoung reopened this Apr 6, 2016

charris added this to the 1.12.0 release milestone Apr 16, 2016

shoyer reviewed Apr 16, 2016

gfyoung closed this Jun 29, 2016

gfyoung reopened this Jun 29, 2016

madphysicist reviewed Jul 7, 2016

ENH: added axis param for np.count_nonzero

0fc9e45

Closes gh-391.

shoyer merged commit 31a95d9 into numpy:master Aug 5, 2016

This was referenced Aug 5, 2016

ENH: adding maxlag mode to convolve and correlate #5978

Closed

ENH: Allow Randint to Broadcast Arguments #6938

Closed

gfyoung deleted the count_nonzero_axis branch August 5, 2016 07:47

eric-wieser mentioned this pull request Oct 11, 2017

MAINT: Fix all special-casing of dtypes in count_nonzero #9849

Merged

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Oct 18, 2017

BUG: count_nonzero treats empty axis tuples strangely

3856a73

Fixes numpy#9728 This bug was introduced with the `axis` keyword in numpy#7177, as a misguided optimization.

eric-wieser mentioned this pull request Oct 18, 2017

BUG: count_nonzero treats empty axis tuples strangely #9881

Merged

eric-wieser reviewed Oct 18, 2017


theodoregoetz pushed a commit to theodoregoetz/numpy that referenced this pull request Oct 23, 2017

BUG: count_nonzero treats empty axis tuples strangely

6ec2d71

Fixes numpy#9728 This bug was introduced with the `axis` keyword in numpy#7177, as a misguided optimization.
