Clarification of output array type when metrics accept multiclass/multioutput #31286
Comments
@ogrisel once mentioned this was intentional since array API support is experimental and we didn't want to confuse people with complicated docstrings. But I think not having the information is very confusing. I'd be in favor of properly documenting them. I think @StefanieSenger had the same view at some point.
I wasn't suggesting we should include this in docstrings. This issue is mainly about the concern expressed in the quoted comment: deciding on what we want and adding a note to the array API doc page.
Oh I see. Since we decided for our metrics to return Python scalars, shouldn't the arrays here also be very simple types?
Sorry, the arrays should be simple array types? Like numpy arrays only? I haven't looked into the pros and cons either way, so I'm not pushing either proposal.
That's an open question, and it is currently blocking progress on implementing array API support. (I had tried to sum up the state of the discussion on one of my blocked PRs here.) I believe it's less confusing for users if they get the same array type out of a function / pipeline that they had put into it, and I also understand this service is part of the promise of the array API. It doesn't seem to be very costly to convert back to the input namespace even if numpy was used in part of the process, as @OmarManzoor had checked by benchmarking for a specific example. (Scalar types from metrics could be an acceptable exception, since people are used to getting scalars, but they don't need to be.)
When it comes to …
Thanks for the reference @StefanieSenger, that is interesting. I think none of the metrics above required an explicit conversion into the same input namespace and device. E.g., for sklearn/metrics/_regression.py lines 287 to 292 (at 7a88bf1), the output of …

AFAICT, for the classification metrics, I think confusion matrix may be in the minority, but what we may need is to have a look at the other metrics to be converted, and make an educated guess as to which will 'automatically' return in the same namespace/device and which will require explicit conversion like with …

Or would we like a simple list/numpy array, even when it requires explicit conversion?
Could you explain what you mean by automatic vs. explicit conversion, @lucyleeow? My understanding is that we would make the metrics return the type(s) we want. And then I would think that, if it's not a float but an array type, it makes sense to return the same type (and same device) the user has passed in as y_pred.
This is my understanding of how the code works. Looking at the 'easy' example of sklearn/metrics/_regression.py lines 287 to 292 (at 7a88bf1), together with sklearn/utils/_array_api.py line 621 and line 660 (at bdef5aa):

The code automatically outputs an array in the same namespace / device as the input, simply by virtue of the array API having the "array type in equals array type out" principle. If you wanted something else, you would have to manually convert it at the end. I guess the situation with …

I think this is how it works, but I may have missed something.
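To illustrate this point, here is a minimal sketch (not scikit-learn code; the arrays and values are made up) of why a per-output result comes back in the input namespace without any explicit conversion: operations on array API arrays return arrays from the same namespace as their inputs.

```python
import array_api_strict as xp

y_true = xp.asarray([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_pred = xp.asarray([[1.1, 1.9], [2.9, 4.2], [5.2, 5.8]])

# Per-output mean squared error, computed entirely with `xp` functions:
# "array type in equals array type out", so the result is an
# array_api_strict array without any conversion step at the end.
per_output_mse = xp.mean((y_true - y_pred) ** 2, axis=0)
print(type(per_output_mse), per_output_mse)
```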
I would say: "it makes sense to return the same type (and same device) the user has passed in as y_pred" (…)
We discussed this topic in the array API meeting today. My impression was that there were no strong opinions either way. Instead we discussed the question of "what are people going to do with the output of these functions?" - some things we could think of were: print values (maybe rounding to N digits), plot values with matplotlib, put them into a pandas DF to then …

Another question raised was how much code we'd have to change/add to every metric function to, say, always output numpy arrays.

My personal thoughts: I don't know what the state of "array API support" in matplotlib is. I don't think they are working on supporting the array API, but there has been some thought and work around supporting things that aren't numpy arrays. I think, but this is just speculation, that most users will plot the results of the metrics or otherwise compute simple statistics on them. It seems silly to make every user add code that converts the result of a metric function to a numpy array (if that is what is needed for plotting), especially because there is (currently) no generic way to take an arbitrary array API array and turn it into a numpy array. But if you can feed a GPU pytorch tensor to matplotlib just like a numpy array, this problem goes away.

Keeping the namespace and device consistent with the input is less surprising and is what other libraries do. It is probably also what people expect to happen, given that metrics tend to be simple arithmetic functions of their inputs.

What do people think are typical "next steps" for the return values of metrics? Does anyone have experience with feeding "array API arrays" to matplotlib?
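For context, here is a hedged sketch of the extra step a user currently needs when per-class scores come back as a torch tensor and they want to plot them. The values are made up, and the `.cpu()` route is torch-specific rather than a generic array API solution.

```python
import numpy as np
import matplotlib.pyplot as plt
import torch

# Pretend these per-class scores were returned by a metric in the input
# namespace/device (e.g. a CUDA tensor if the inputs lived on the GPU).
per_class_scores = torch.tensor([0.82, 0.64, 0.91])

# torch-specific conversion: move to CPU, then wrap as a numpy array.
scores_np = np.asarray(per_class_scores.cpu())

plt.bar(range(len(scores_np)), scores_np)
plt.show()
```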
I also have no strong views either way.
Spitballing - would you consider making … public?
Not sure. Even if it was public it feels pretty clunky for a "generic average pydata user".
I would expect something as …
I think @ogrisel pointed out that returning metrics in the same namespace and device would be useful if they were used within a complete pipeline. Otherwise, if it's just printing them, using them in plots or for further descriptive analysis, then I think it doesn't really matter whether we return numpy or the original namespace. I guess if we return the original namespace then users would need an extra step to convert the returned arrays into numpy and move them to the CPU when the libraries they use (matplotlib, pandas, etc.) don't really support the returned array or its device. However, won't returning in the same namespace and device be useful in the case where, for example, someone is training a neural network architecture (maybe using HuggingFace) and most of the computations are being done on the GPU, but when calculating and displaying the metrics the respective arrays first need to be moved to the CPU and possibly converted to numpy?
Reading through this thread, here's what I see:
Also:
So in the end, it seems that what returning the result in the same namespace as the input brings is making us happy that we're "consistent" in some way, but it doesn't seem to bring much value to the user. On the other hand, returning numpy arrays is a much "simpler" kind of output, which to me seems closer to the idea of returning Python floats for single values. That to me indicates returning numpy arrays is winning the argument here.
Could you explain this a bit more? I was thinking about things like …

I'm also tending towards "return numpy" but need to ponder the idea a bit more. I'm also trying to work out where/how to reach out to the matplotlib team to find out if accepting (say) torch tensors on the GPU is something they want to have, or to know that it is virtually impossible for that ever to land in matplotlib. (I posted a message on the matplotlib channel on the scientific python Discord to ask about this.)
So basically, if you train models using HuggingFace and you want to compute metrics, you need to move the arrays to CPU and convert them to numpy before this can be done. Here is the code from the generic trainer class that does this: https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L4420-L4439
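For reference, this is roughly the pattern the linked trainer code follows (a simplified sketch, not the actual transformers implementation; the helper name and tensor shapes are made up): tensors are detached, moved to CPU and converted to numpy before any metric function sees them.

```python
import numpy as np
import torch

def prepare_for_metrics(*tensors):
    # Detach from the autograd graph, move to CPU, convert to numpy.
    return tuple(np.asarray(t.detach().cpu()) for t in tensors)

logits = torch.randn(8, 3)            # stand-in for model outputs (possibly on GPU)
labels = torch.randint(0, 3, (8,))    # stand-in for reference labels
logits_np, labels_np = prepare_for_metrics(logits, labels)
```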
I wouldn't say useful, but less surprising for people who expect scikit-learn to behave more like other array API consuming libraries such as SciPy: in SciPy the general contract for array API capable functions is "namespace in, namespace out". Our metric functions would not follow this contract. But as said above, I don't really see the case where it would be beneficial to stay in the original namespace. The metric values will not be used as inputs for any kind of complex computation but will rather be displayed in jupyter cell outputs (after some simple post-processing such as rounding to remove non-significant digits, or computing some simple summary/descriptive statistics such as mean, standard deviation, quantiles and co), or plotted with matplotlib and friends.
So you mean that HuggingFace's trainer's pytorch support automatically converts metric values to numpy on CPU, so scikit-learn doing the same would not feel odd in that respect?
EDIT: for the record, here is how it is implemented at the moment: sklearn/utils/_array_api.py lines 780 to 789 (at 675736a).
The clean way to implement this would be to rely on dlpack-based cross-device conversion. For instance, with pytorch 2.7.0 (and numpy 2.2.5), this does not work yet:

```python
>>> import array_api_compat.torch as xp
>>> import numpy as np
>>> np.from_dlpack(xp.ones(3, device="mps"), device="cpu")
Traceback (most recent call last):
  Cell In[9], line 1
    np.from_dlpack(xp.ones(3, device="mps"), device="cpu")
TypeError: Tensor.__dlpack__() got an unexpected keyword argument 'dl_device'
```

This exception is raised by PyTorch. EDIT: the PyTorch specific problem is being tackled in this set of PRs: …

NumPy and array-api-strict are fine:

```python
>>> import array_api_strict as xp
>>> np.from_dlpack(xp.ones(3, device=xp.Device("device1")), device="cpu")
array([1., 1., 1.])
```

but maybe it's just array-api-strict that is too lax when simulating non-CPU devices, because based on my understanding of the spec, I would have expected any of the following to raise an error:

```python
>>> np.from_dlpack(xp.ones(3, device=xp.Device("device1")), device=None, copy=False)
array([1., 1., 1.])
>>> np.from_dlpack(xp.ones(3, device=xp.Device("device1")), device="cpu", copy=False)
array([1., 1., 1.])
```

So to conclude, cross-device dlpack-based namespace conversion is too new to be relied upon, hence we have to keep our own conversion helper for now.
Note, array-api-extra implements a more complete alternative to our own helper. However, at the time of writing it's not importable from a public module. This is not necessarily a problem for us since we vendor array-api-extra, but we shouldn't make it public either.
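To make the constraint above concrete, here is a hypothetical sketch (not the scikit-learn or array-api-extra implementation) of a conversion helper that tries the dlpack route first and falls back to namespace-specific handling for arrays on non-CPU devices:

```python
import numpy as np

def to_numpy(array):
    """Best-effort conversion of an array API array to a numpy array."""
    try:
        # Works when the producer can export CPU-accessible data via dlpack.
        return np.from_dlpack(array)
    except (TypeError, RuntimeError, BufferError):
        pass
    if hasattr(array, "cpu"):
        # e.g. torch tensors living on a GPU/MPS device.
        return np.asarray(array.cpu())
    return np.asarray(array)
```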
I think with respect to the general standard it would be better to convert to numpy. What I meant was that currently it seems like we have to perform the computations on the CPU using numpy; supporting the original arrays might allow metrics to be computed on the GPU.
This is a different problem: we can compute as much as possible on the GPU (in the input namespace of the metric function, that is, based on how …)
Given all that has been said above, I think I am ok with having all the metrics that return arrays do an internal conversion to numpy before returning the results. It feels more practical for 99% of the use cases of our users. Once libraries such as matplotlib have better support for dlpack 1.0 and/or the array API spec, we can consider changing our mind (assuming we do so before array API support in scikit-learn is out of the experimental phase).
Right, I think converting the final outcome to numpy seems fine then.
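To make the agreed-upon behaviour concrete, here is a hedged usage sketch (the data is made up, and the final numpy output is the behaviour proposed above, not necessarily what scikit-learn does today):

```python
import numpy as np
import torch
import sklearn
from sklearn.metrics import r2_score

sklearn.set_config(array_api_dispatch=True)

y_true = torch.tensor([[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]])
y_pred = torch.tensor([[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]])

# Per-output scores: under the proposal discussed above, this would come back
# as a plain numpy array regardless of the input namespace/device, so it can
# go straight to matplotlib / pandas without an explicit conversion step.
scores = r2_score(y_true, y_pred, multioutput="raw_values")
print(np.round(np.asarray(scores), 3))
```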
Clarification of how we should handle array output type when a metric outputs several values (i.e. accepts multiclass or multioutput input).
The issue was summarised succinctly in #30439 (comment):
Currently, all regression/classification metrics that support array API and multiclass or multioutput output an array in the same namespace and device as the input (checked in the code and manually). Summary of these metrics:
Regression metrics
Returns array in same namespace/device:
Classification metrics
Returns array in same namespace/device:
Looking at the metrics code, if we wanted to support a list of scalars, we'd generally have to do extra processing to convert an array (often the output of an xp.* function) to a list of scalars. See the sketch below.
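A minimal sketch of that extra step (hypothetical; `result` stands in for the 1-D array a metric has just computed in some array API namespace):

```python
import array_api_strict as xp

result = xp.asarray([0.25, 0.5, 0.75])  # stand-in for a per-output metric result

# Convert each 0-d element into a Python float explicitly; there is no single
# array-API-spec call that does this in one go.
result_as_list = [float(result[i]) for i in range(result.shape[0])]
print(result_as_list)
```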
Once we arrive at a consensus, we should update the array API documentation and update check_array_api_metric in the tests such that when the output is an array/list, we check that the output type etc. is correct.

cc @ogrisel @betatim