ENH: Add an "axis" kwarg to numpy.unique by joferkington · Pull Request #3584 · numpy/numpy · GitHub

ENH: Add an "axis" kwarg to numpy.unique #3584


Closed

Conversation

joferkington
Contributor

Description

Adds an axis kwarg to numpy.unique so that it can easily be used to find the unique rows/columns/sub-arrays/etc. of a larger ndarray.

Rationale

There is often a need to find the unique rows/columns/etc. of an array. However, numpy.unique currently operates only on the flattened array and returns unique elements. It is possible to work around this in several ways (e.g. viewing each row/column/etc. as a structured array, or as a single void item the size of the row/column in question). However, each of these work-arounds requires a fairly in-depth knowledge of numpy. Having an axis kwarg on unique makes this operation much simpler.
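For context, this is roughly what one such workaround looks like (an illustrative sketch assuming a 2-D, C-contiguous array with a plain integer dtype; the variable names here are only for illustration):

import numpy as np

a = np.array([[1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0]])

# View each row as a single opaque np.void item so whole rows can be
# compared at once, then map the unique void items back to rows.
row_view = np.ascontiguousarray(a).view(
    np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, idx = np.unique(row_view, return_index=True)
unique_rows = a[np.sort(idx)]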

Examples

In [1]: import numpy as np

In [2]: a = np.array([[1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0]])

In [3]: a
Out[3]: 
array([[1, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]])

In [4]: np.unique(a)
Out[4]: array([0, 1])

In [5]: np.unique(a, axis=0)
Out[5]: 
array([[0, 1, 0],
       [1, 0, 1]])

In [6]: np.unique(a, axis=1)
Out[6]: 
array([[0, 1],
       [1, 0],
       [1, 0],
       [1, 0]])

In [7]: b = np.dstack([a]*3).T

In [8]: b
Out[8]: 
array([[[1, 0, 0, 0],
        [0, 1, 1, 1],
        [1, 0, 0, 0]],

       [[1, 0, 0, 0],
        [0, 1, 1, 1],
        [1, 0, 0, 0]],

       [[1, 0, 0, 0],
        [0, 1, 1, 1],
        [1, 0, 0, 0]]])

In [9]: np.unique(b, axis=0)
Out[9]: 
array([[[1, 0, 0, 0],
        [0, 1, 1, 1],
        [1, 0, 0, 0]]])

# a structured array, but only works if array is contiguous)
# Inspired by <http://stackoverflow.com/a/16973510/325565>
new_itemsize = ar.dtype.itemsize * ar.shape[1]
consolidated = ar.view(np.dtype((np.void, new_itemsize)))
Member

I don't have an opinion about the idea as such; it should probably be brought up on the list. This is too simple, however. Inexact types/objects cannot be compared bitwise. If you want to optimize by using void, do it only for types where you know that bitwise equality is the same as value equality. (I guess all integer types should be safe, but not even booleans.)

Contributor Author

@seberg - Thanks for the catch on the bitwise comparisons! I clearly didn't think it through carefully enough.

I'll run my suggestion by the list and then update it to fall back on a structured array for dtypes other than int, float, etc. (Float bitwise equality should be safe, beyond the usual caveats with float equality, right?)

Member

Well, luckily, np.unique is mostly useful for ints in the first place ;). Ignoring possible problems with NaNs (at least the python fallback is not consistent here either), there is still the issue that -0. and 0. are not the same bitwise. Not sure if that can be worked around somehow; np.longdouble certainly cannot be trusted, though (longdouble is not standardized, can be a composite, include random bits...).
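To make that concrete, here is a small illustration (not from the patch) of how value comparison and byte-level comparison of floats disagree in both directions:

import numpy as np

x = np.array([0.0, -0.0, np.nan, np.nan])

# Value semantics: 0.0 == -0.0 is True, nan == nan is False.
print(x[0] == x[1], x[2] == x[3])           # True False

# Byte-level semantics give the opposite answers:
print(x[0].tobytes() == x[1].tobytes())     # False (sign bit differs)
print(x[2].tobytes() == x[3].tobytes())     # True  (same NaN bit pattern)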

Contributor Author

Just for clarification, I (finally) got around to doing what I said I would: I implemented this correctly and sent an e-mail out to the list. I squashed the commits back together for easier viewing, hence the single commit and outdated diff.

The void optimization is now only used for ints and strings. All other dtypes (including all floats) fall back to a structured array, which is slower, but should be correct.

Thanks again for catching that!
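For anyone following along, a rough sketch of what a structured-array fallback of this kind can look like (illustrative only; this is not the exact code in the branch, and the helper name is hypothetical):

import numpy as np

def unique_rows_structured(ar):
    # View each row of a 2-D array as one structured element with a field
    # per column, so comparisons use each column's normal value semantics
    # (e.g. -0.0 == 0.0 for floats) rather than raw bytes.
    ar = np.ascontiguousarray(ar)
    fields = [('f%d' % i, ar.dtype) for i in range(ar.shape[1])]
    structured = ar.view(fields).ravel()
    _, idx = np.unique(structured, return_index=True)
    return ar[np.sort(idx)]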

@charris
Member
charris commented Aug 29, 2013

@seberg Want to finish this off ;)

@joferkington
Contributor Author

Yeah, I don't think there was a lot of interest in adding this, based on the reaction on the list.

It arguably would cause more confusion than good. I'm still all for adding it, but I'm biased, obviously! :)

@charris
Member
charris commented Aug 30, 2013

I didn't mean kill it. I meant take it to conclusion, whatever that was ;)

@joferkington
Contributor Author

Ah! Sorry for the confusion!

@@ -159,6 +170,45 @@ def unique(ar, return_index=False, return_inverse=False):
array([1, 2, 6, 4, 2, 3, 2])

"""
if axis is None or ar.ndim == 1:
Member

Maybe just remove the or? If axis is not None, it might still be wrong, and it should be an error.
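Concretely, the suggested behaviour would be something like the following hypothetical check (a sketch, not code from this PR): an explicit axis is always validated, even for a 1-d input, instead of being silently treated like axis=None.

def _validate_axis(ar, axis):
    # Only axis=None takes the flattened fast path; an explicit axis is
    # checked against ar.ndim rather than falling through when ar.ndim == 1.
    if axis is None:
        return None
    if not (-ar.ndim <= axis < ar.ndim):
        raise ValueError("axis %d is out of bounds for an array of "
                         "dimension %d" % (axis, ar.ndim))
    return axis % ar.ndim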

@seberg
Member
seberg commented Sep 7, 2013

There is also a merge conflict, probably from style changes in the tests or so. Generally this looks good to me, and this kind of thing seems to crop up fairly often. The only real thing I am wondering about is whether someone has a great idea for what to call the argument ;).

@seberg
Member
seberg commented Sep 24, 2013

General Travis error, closing and reopening.

@seberg seberg closed this Sep 24, 2013
@seberg seberg reopened this Sep 24, 2013
@seberg
Member
seberg commented Sep 24, 2013

Never mind, they seem to ignore that, but on the other hand Travis now has a bigger button to restart all jobs ;).

@joferkington
Contributor Author

@seberg - Sorry for the delay! (and for the failing pull request) I've been in the middle of a move over the past few weeks and got sidetracked for a while.

I fixed the failing tests (sorry again for those), but I think I'm going to go ahead and close this pull request.

It didn't have much support on the mailing list in its current form. Having this functionality as a separate function seemed to have broader support, but namespace pollution is always a concern.

I may run it back by the list as a "unique_rows" function, but in that case it's probably cleaner to make another pull request.

Any thoughts? Thanks again for looking over things, regardless!

@seberg
Member
seberg commented Sep 25, 2013

Well, I am not quite sure about a keyword vs. a separately named function (and what name exactly). But the thing seems to crop up once in a while, so it might make sense, and those homebrew versions are likely to forget the subtleties you fixed...

@alimuldal
Contributor

I, for one, would really like to see this feature in numpy 1.10! Were there any show-stopping concerns about this implementation? I get the impression from the comments that the outstanding issues were mostly API and documentation-related.

@seberg
Member
seberg commented Apr 21, 2015

@alimuldal, I do not remember the discussion on the list. Maybe you can have a look at it. It would seem we had already addressed most of the technical issues, so yes. Maybe all this needs is another revival (i.e. from you), but maybe first have a look at the old discussion and see what the result was (if there was any), or just revive it on the mailing list.

@joferkington
Contributor Author

@alimuldal & @seberg Here's the list discussion: http://numpy-discussion.10968.n7.nabble.com/Adding-an-axis-argument-to-numpy-unique-td34841.html

If I recall correctly, it was mostly just ambiguity around naming vs. kwargs. I didn't want to push something ahead too hard without a clear consensus.

Also related (along with other functionality):
http://numpy-discussion.10968.n7.nabble.com/New-function-count-unique-to-generate-contingency-tables-td38363.html
(and the pull request in question: #4958)

@alimuldal
Contributor

@joferkington Thanks, I'll take a look

@joferkington
Contributor Author

Related issue from a couple of years ago: #2871

@breedlun

I would also like to see this feature in numpy.

@micmitch
micmitch commented Feb 4, 2016

This would be very useful.

I'm currently using the unique_rows function from @bfroehle, which was posted in #2871.

@martinosorb

@joferkington As I was saying the other day in another thread, I think this would be great functionality to have. I personally support the idea of just adding an "axis" parameter. I don't think it would be ambiguous: in all the functions that have one, the given axis is the one that "disappears" after the operation, so it would be quite easy to remember. Are you still interested in taking this forward? Otherwise I may look into it.
