8000 BUG: Fix return shape of inverse_indices in unique_inverse by jakevdp · Pull Request #25553 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fix return shape of inverse_indices in unique_inverse #25553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 11, 2024

Conversation

jakevdp
Copy link
Contributor
@jakevdp jakevdp commented Jan 8, 2024

Fixes #25552

@seberg
Copy link
Member
seberg commented Jan 8, 2024

Should we also just change this in the normal version? It does look much like a bug-fix when axis is used?

@jakevdp
Copy link
Contributor Author
jakevdp commented Jan 8, 2024

I don't think we should make this change in np.unique, because that's a potentially breaking change.

unique_all and unique_inverse have not been part of a release yet, so modifying their output shapes is safer.

@seberg
Copy link
Member
seberg commented Jan 8, 2024

No strong opinion, although it seems so odd that I think we can do it, and no time like a 2.0 release to change such a thing that looks like an oversight to begin with. (If users worked around via .reshape(input.shape) their code even keeps working and res.reshape(-1) gives the old without branching)

@jakevdp
Copy link
Contributor Author
jakevdp commented Jan 8, 2024

Sure, I see what you're saying. I'm happy to update the PR to modify the original (and add appropriate docs/changelog) if you'd prefer that.

Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to fix this!

Like @seberg, I think we should ideally fix this in np.unique itself, since it is clearly a bug.

That said, np.unique has an axis argument, and with that the simple reshape is no longer generally correct. It would need something like the following:

arr = np.array([[[1, 2, 3], [2, 3, 1]]]*2 +[[[1, 3, 2], [2, 1, 3]]])
axis = 1  # for testing
a, i = np.unique(arr, axis=axis, return_inverse=T
8000
rue)
if axis is None:
    i.shape = arr.shape
else:
    i.shape = tuple((sh if ax == axis else 1) for ax, sh in enumerate(arr.shape))
    # we probably want this too, to make the indices similar to argsort.
    i = np.broadcast_to(i, arr.shape)

# check result is correct, in way also suggested for np.argsort, etc.
np.all(np.take_along_axis(a, i, axis=axis) == arr)
# all true

@jakevdp
Copy link
Contributor Author
jakevdp commented Jan 9, 2024

I think this change to np.unique makes sense, but maybe it deserves its own discussion? Should we merge this PR (so that unique_* have the correct API) and then discuss np.unique in its own issue and/or mailing list thread?

@charris charris changed the title unique_inverse: return appropriate shape for inverse_indices BUG: Fix return shape of inverse_indices in unique_inverse Jan 10, 2024
@ngoldbaum ngoldbaum added the triage review Issue/PR to be discussed at the next triage meeting label Jan 10, 2024
@mattip
Copy link
Member
mattip commented Jan 10, 2024

In the triage meeting we reached a consensus to call the wrong 1d shape a bug, and fix it without any deprecation. It should be mentioned in the release notes.

@mattip mattip added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Jan 10, 2024
@jakevdp
Copy link
Contributor Author
jakevdp commented Jan 10, 2024

Thanks - I will update this PR to change the output shape for np.unique as well.

@jakevdp
Copy link
Contributor Author
jakevdp commented Jan 10, 2024

OK, I changed the implementation of np.unique itself, added release notes, and updated several tests. PTAL!

Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Only a silly nitpick from me. Approving.

@mhvk
Copy link
Contributor
mhvk commented Jan 10, 2024

Note there is a linting error, so you can fix the versionchanged at the same time...

Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

rgommers added a commit to rgommers/scipy that referenced this pull request Jan 11, 2024
Code was written to expect a 1-D array with the inverse
indices, and that got changed in NumPy in
numpy/numpy#25553.

Closes scipygh-19867

[skip cirrus] [skip circle]
@rgommers
Copy link
Member

This broke SciPy in two places, in ways that were a little nontrivial to diagnose: scipy/scipy#19868. The fix does make sense, but more complaints may roll in - this probably broke most usages of the returned reverse indices.

@mhvk
Copy link
Contributor
mhvk commented Jan 11, 2024

Hmm, @jakevdp was clearly right to worry about breaking things. Arguably in cases like this we should run the scipy tests... Thanks for fixing!

@jakevdp
Copy link
Contributor Author
jakevdp commented Jan 12, 2024

@rgommers do you think this change should be reverted? I do think the change in the axis=None case is a good one, but I don't feel as strongly about the change of behavior when axis is an int.

@rgommers
Copy link
Member

I don't know, I'd give it a week or so - if no issues turn up for other libraries that test against numpy nightlies, it's probably okay to keep the change.

@WarrenWeckesser
Copy link
Member

One more data point: this broke some code in an unreleased project of mine (yanova). Some tests fail because of the change of shape of the inverse array in examples like the following.

NumPy 1.26.4:

In [9]: np.__version__
Out[9]: '1.26.4'

In [10]: m = np.array([[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    ...:               [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]])

In [11]: np.unique(m, axis=1, return_inverse=True)
Out[11]: 
(array([[0, 0, 1, 1, 2, 2],
        [0, 1, 0, 1, 0, 1]]),
 array([0, 0, 0, 1, 1, 1, 3, 3, 2, 2, 5, 5, 5, 4, 4]))

NumPy 2.0.0rc2

In [6]: np.__version__
Out[6]: '2.0.0rc2'

In [7]: m = np.array([[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2],
   ...:               [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]])

In [8]: np.unique(m, axis=1, return_inverse=True)
Out[8]: 
(array([[0, 0, 1, 1, 2, 2],
        [0, 1, 0, 1, 0, 1]]),
 array([[0, 0, 0, 1, 1, 1, 3, 3, 2, 2, 5, 5, 5, 4, 4]]))

I can easily fix the issue, and since we haven't heard of any other problems related to this, this doesn't open up again the question of reverting the change. Just FYI.

@seberg
Copy link
Member
seberg commented Jun 18, 2024

Well, I thought the reason why this was different in the Array API was because the current behavior doesn't make sense...
... so that is a bit my bad, I didn't think much about it assuming that the Array API decision was base on the current version being unwieldy.

If that was the case, I would not want to revert this. But, unfortunately, that impression was seems wrong. There is no big reason for unique to behave differently, because unique is already compatible to np.take rather than np.take_along_axes, and that isn't going away.

EDIT: Sorry, forgot to cross-ref gh-26738

seberg added a commit to seberg/numpy that referenced this pull request Jul 11, 2024
There was a good argument that it is not possible to reconstruct
the original array with `axis=None` without first reshaping and
changing the result shape helped with it.

However, it was always possible to do it for other axis values
by using `np.take` rather than `np.take_along_axis`.  Changing it
for all axis values is unnecessary to achieve reconstruction because
`np.take(arr, inverse, axis=axis)` already performed the job except
for `axis=None`.

Thus, this keeps the change for axis=None, but reverts numpygh-25553
for numerical axis.
@seberg
Copy link
Member
seberg commented Jul 13, 2024

Just FYI, this is now reverted except for axis=None where the change made sense. For axis=0, etc. I think the old behavior was actually strictly better anyway.

charris pushed a commit to charris/numpy that referenced this pull request Jul 16, 2024
There was a good argument that it is not possible to reconstruct
the original array with `axis=None` without first reshaping and
changing the result shape helped with it.

However, it was always possible to do it for other axis values
by using `np.take` rather than `np.take_along_axis`.  Changing it
for all axis values is unnecessary to achieve reconstruction because
`np.take(arr, inverse, axis=axis)` already performed the job except
for `axis=None`.

Thus, this keeps the change for axis=None, but reverts numpygh-25553
for numerical axis.
ArvidJB pushed a commit to ArvidJB/numpy that referenced this pull request Nov 1, 2024
There was a good argument that it is not possible to reconstruct
the original array with `axis=None` without first reshaping and
changing the result shape helped with it.

However, it was always possible to do it for other axis values
by using `np.take` rather than `np.take_along_axis`.  Changing it
for all axis values is unnecessary to achieve reconstruction because
`np.take(arr, inverse, axis=axis)` already performed the job except
for `axis=None`.

Thus, this keeps the change for axis=None, but reverts numpygh-25553
for numerical axis.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug component: numpy.lib Numpy 2.0 API Changes triaged Issue/PR that was discussed in a triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: np.unique_inverse output has wrong shape
7 participants
0