Make np.argmax, np.argmin to return indexes and values at the same time · Issue #15623 · numpy/numpy · GitHub

Make np.argmax, np.argmin to return indexes and values at the same time #15623

Closed
darck-neos opened this issue Feb 21, 2020 · 9 comments

Comments

@darck-neos

Currently, np.argmax returns the indexes of the maximum values along an axis, while np.amax returns the maximum values themselves. If you want both the indexes and the corresponding maximum values, you have to go through the array a second time. For example:

import numpy as np

data = np.array([[0.02079073, 0.97920927], [0.00487725, 0.99512275], [0.00849596, 0.99150404], [0.96402552, 0.03597448], [0.00711506, 0.99288494]])

indexes = np.argmax(data, axis=-1)

maximum_values = data[np.arange(data.shape[0]), indexes]

print(indexes)
print(maximum_values)
print(np.amax(data, axis=-1))
>> [1 1 1 0 1]
>> [0.97920927 0.99512275 0.99150404 0.96402552 0.99288494]
>> [0.97920927 0.99512275 0.99150404 0.96402552 0.99288494]

Even with np.arange, going through the array a second time costs time (especially on huge arrays). So np.argmax and np.argmin could return indexes and values together in a single pass (maybe behind an extra parameter, to stay compatible with previous versions). For 1D arrays plain indexing is fine, but that does not work for arrays with 2 or more axes, because it selects whole rows:

import numpy as np

data = np.array([0.3, 0.7, 0.5])

indexes = np.argmax(data, axis=-1)

print(data[indexes])
>> 0.7

import numpy as np
data = np.array([[0.02079073, 0.97920927], [0.00487725, 0.99512275], [0.00849596, 0.99150404], [0.96402552, 0.03597448], [0.00711506, 0.99288494]])

indexes = np.argmax(data, axis=-1)

print(data[indexes])
>> [[0.00487725 0.99512275]
      [0.00487725 0.99512275]
      [0.00487725 0.99512275]
      [0.02079073 0.97920927]
      [0.00487725 0.99512275]]

Numpy version: '1.17.5'
Python version: '3.6.9'
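
The difference between the two cases above comes down to how integer-array indexing works: a single index array is applied to the first axis only, so on a 2D array it picks whole rows. A minimal sketch with a smaller, made-up array (not the data above):

```python
import numpy as np

# Small illustrative array (made-up values, not the data from the issue).
a = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.3, 0.7]])

idx = np.argmax(a, axis=-1)              # one column index per row

rows = a[idx]                            # idx indexes axis 0, so whole rows are selected
maxima = a[np.arange(a.shape[0]), idx]   # pairs row i with column idx[i]

print(rows.shape)   # (3, 2)
print(maxima)       # [0.9 0.8 0.7]
```

Pairing an explicit row index with each column index is what turns the lookup into one element per row.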

@WarrenWeckesser
Member

For 1D arrays plain indexing is fine, but that does not work for arrays with 2 or more axes, because it selects whole rows:

You can make the latter case work as you expect by indexing data appropriately:

In [37]: data                                                                   
Out[37]: 
array([[0.02079073, 0.97920927],
       [0.00487725, 0.99512275],
       [0.00849596, 0.99150404],
       [0.96402552, 0.03597448],
       [0.00711506, 0.99288494]])

Get the indices of the maxima along the last axis:

In [38]: indices = data.argmax(axis=-1)                                         

In [39]: indices                                                                
Out[39]: array([1, 1, 1, 0, 1])

Now pull out the corresponding maximum values from data. We have to use indices as the second index, while using [0, 1, 2, ... data.shape[0]-1] as the first index:

In [40]: data[np.arange(data.shape[0]), indices]                                
Out[40]: array([0.97920927, 0.99512275, 0.99150404, 0.96402552, 0.99288494])

@WarrenWeckesser
Member

If you really want to do this in a single function call, over in ufunclab I have implemented an assortment of NumPy ufuncs and gufuncs, including min_argmin and max_argmax.

Here's your example, using ufunclab.max_argmax:

In [21]: import numpy as np                                                     

In [22]: import ufunclab                                                        

In [23]: data = np.array([[0.02079073, 0.97920927], [0.00487725, 0.99512275],
    ...:                  [0.00849596, 0.99150404], [0.96402552, 0.03597448],
    ...:                  [0.00711506, 0.99288494]])

In [24]: data                                                                   
Out[24]: 
array([[0.02079073, 0.97920927],
       [0.00487725, 0.99512275],
       [0.00849596, 0.99150404],
       [0.96402552, 0.03597448],
       [0.00711506, 0.99288494]])

In [25]: values, indices = ufunclab.max_argmax(data, axis=-1)

In [26]: values                                                                 
Out[26]: array([0.97920927, 0.99512275, 0.99150404, 0.96402552, 0.99288494])

In [27]: indices                                                                
Out[27]: array([1, 1, 1, 0, 1])

@seberg
Member
seberg commented Feb 21, 2020

We now have np.take_along_axis for this operation. For smallish arrays, though, it probably adds more overhead in terms of speed. Whether to include something like Warren's function: I guess it could be proposed, though probably better on the mailing list.

@WarrenWeckesser
Member

We now have np.take_along_axis for this operation.

Ah, right, and there is even an example that uses argmax in the take_along_axis docstring. The only minor "gotcha" is the need to tweak the shape of the return value of argmax when passing it to take_along_axis.

For the example above:

In [62]: data
Out[62]: 
array([[0.02079073, 0.97920927],
       [0.00487725, 0.99512275],
       [0.00849596, 0.99150404],
       [0.96402552, 0.03597448],
       [0.00711506, 0.99288494]])

In [63]: indices = data.argmax(axis=-1)

In [64]: indices
Out[64]: array([1, 1, 1, 0, 1])

indices has shape (5,). To use it in take_along_axis, we'll expand that shape to (5, 1) by writing it as indices[:, None] (alternatives include indices.reshape(-1, 1) or np.expand_dims(indices, -1)):

In [65]: np.take_along_axis(data, indices[:, None], axis=-1)
Out[65]: 
array([[0.97920927],
       [0.99512275],
       [0.99150404],
       [0.96402552],
       [0.99288494]])

(It would be nice if argmax had a keepdims parameter, as suggested in #8710.)
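
As a sketch of how that would look: to the best of my knowledge, NumPy 1.22 and later accept a keepdims argument in argmax, which removes the reshaping step. The helper name max_and_argmax below is hypothetical and is not a NumPy function.

```python
import numpy as np

def max_and_argmax(a, axis=-1):
    """Hypothetical helper: return (values, indices) of the maxima along `axis`."""
    # keepdims=True keeps the reduced axis with length 1, so the result can be
    # passed straight to take_along_axis (assumes NumPy >= 1.22).
    idx = np.argmax(a, axis=axis, keepdims=True)
    vals = np.take_along_axis(a, idx, axis=axis)
    # Drop the length-1 axis so the shapes match np.amax / np.argmax.
    return np.squeeze(vals, axis=axis), np.squeeze(idx, axis=axis)

data = np.array([[0.02079073, 0.97920927],
                 [0.00487725, 0.99512275],
                 [0.00849596, 0.99150404],
                 [0.96402552, 0.03597448],
                 [0.00711506, 0.99288494]])

values, indices = max_and_argmax(data, axis=-1)
print(indices)
print(values)
```

This still makes two passes (argmax, then the gather); it only tidies up the shape handling.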

@seberg
Member
seberg commented Feb 21, 2020

I am wondering if we should make an "endorsed" repo for such ufuncs. It could also include things like Julian's old ufuncs for composed instructions (add 4 floats, fused multiply-add).

For most functions it would just be a home to put things with a bit less hesitation, for some it could maybe be an entry point into NumPy proper.

@darck-neos
Author

@WarrenWeckesser, ufunclab.max_argmax works like a charm. I see in your repo that it is not just a Python wrapper function but Cython code, which is great for speed.

@seberg, I think Warren's functions would be a great addition. My search online shows that people usually call a second function to get both values and indexes. In some cases they even get the values first and the indexes second (because of the question on Stack Overflow), but I think that hurts speed more than getting the indexes first and the values second: finding an index from a value requires comparing against the elements (O(n) in the worst case), whereas looking up a value from a known index is direct (O(1)).

@WarrenWeckesser, do you have something similar (a single function) that returns the top K maximum/minimum values and their indexes at the same time? I'm asking for something similar to #15128, but returning values and indexes together and only for the top K (so as not to rely on np.argpartition or np.argsort alone). Maybe this extension of my question should be a new issue or live in another repo, as @seberg said, but thanks for everything so far.

@WarrenWeckesser
Member

I see in your repo that it is not just a Python wrapper function but Cython code, which is great for speed.

Actually, it is C, not Cython.

do you have something similar (a single function) that returns the top K maximum/minimum values and their indexes at the same time?

No, I haven't looked into such a function. I'd probably use argpartition for that.
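
A rough sketch of the argpartition approach, for the last axis only. The helper name top_k is hypothetical (it exists in neither NumPy nor ufunclab, as far as this thread shows), and it makes several passes over the data rather than one.

```python
import numpy as np

def top_k(a, k):
    """Hypothetical helper: k largest values and their indices along the last axis,
    sorted in descending order."""
    # argpartition moves the indices of the k largest entries into the
    # last k positions along the axis, in no particular order.
    part = np.argpartition(a, -k, axis=-1)
    idx = part[..., -k:]
    vals = np.take_along_axis(a, idx, axis=-1)
    # Sort only those k candidates, largest first.
    order = np.argsort(vals, axis=-1)[..., ::-1]
    return (np.take_along_axis(vals, order, axis=-1),
            np.take_along_axis(idx, order, axis=-1))

x = np.array([[1.0, 5.0, 3.0, 4.0],
              [9.0, 2.0, 8.0, 7.0]])

values, indices = top_k(x, 2)
print(values)    # [[5. 4.]
                 #  [9. 8.]]
print(indices)   # [[1 3]
                 #  [0 2]]
```

Only the k selected entries are fully sorted, which is the usual reason to prefer argpartition over a full argsort when k is small.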

@darck-neos
Author

Actually, it is C, not Cython

My mistake, I confused the languages. It is still faster than a Python wrapper.

No, I haven't looked into such a function. I'd probably use argpartition for that.

Never mind, you helped me a lot, thanks. I'm closing this issue because it was solved with ufunclab.

@gmoda
gmoda commented Dec 11, 2020

You could also do:

[row[idx] for row, idx in zip(data, indexes)]
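
That list comprehension pairs each row with its index in a Python-level loop. For the 2D case in this thread it should give the same values as the fancy-indexing form shown earlier; a quick check on a smaller, made-up array:

```python
import numpy as np

# Small illustrative array (made-up values, not the data from the issue).
data = np.array([[0.2, 0.8],
                 [0.6, 0.4],
                 [0.1, 0.9]])
indexes = np.argmax(data, axis=-1)

# Python-level loop: one (row, index) pair per iteration.
looped = np.array([row[i] for row, i in zip(data, indexes)])

# Vectorized equivalent: row i paired with column indexes[i].
vectorized = data[np.arange(data.shape[0]), indexes]

print(np.array_equal(looped, vectorized))  # True
```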
