Documenting return array types · Issue #30638 · scikit-learn/scikit-learn · GitHub

Documenting return array types #30638

Closed
StefanieSenger opened this issue Jan 13, 2025 · 3 comments

Comments

StefanieSenger (Contributor) commented Jan 13, 2025

As we introduce Array API compatibility, we are discussing the fact that some functions (especially in the metrics module) would not return the input array type but a NumPy array instead.

How would we document that, so that users know what they get as a return type?

We have started to discuss this here, here and here (and possibly in other places), but the discussion is a bit scattered, so in this issue I am trying to bring it together.

I think we need a standard way to talk about return types in the docstrings. My suggestion:

  • use the term ndarray for NumPy arrays and array (or something more eye-catching) for the input array type
  • from the docstring, link to a dedicated section in the glossary that explains the difference between ndarray and array and what it implies for users

What are the general feelings about that?
@ogrisel @OmarManzoor @adrinjalali (I don't want to bother you by tagging, but it would also be interesting to hear the views of betatim, thomasjpfan, lesteve and jeremiedbb, if they are interested 😅)
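A docstring following the convention proposed in this comment might look roughly like the sketch below. The function `per_class_score` and the exact wording are hypothetical, made up for illustration; they are neither an existing scikit-learn function nor an agreed convention. Parameters typed as `array` accept the input array type, while the `ndarray` return type signals that a NumPy array always comes back. The body simply converts its inputs with `np.asarray`, so it only handles NumPy-convertible inputs.

```python
import numpy as np


def per_class_score(y_true, y_pred):
    """Per-class recall (hypothetical example of the proposed convention).

    Parameters
    ----------
    y_true : array of shape (n_samples,)
        Ground-truth labels. "array" stands for the input array type,
        e.g. a NumPy array or another Array API compatible array.

    y_pred : array of shape (n_samples,)
        Predicted labels, of the same array type as ``y_true``.

    Returns
    -------
    scores : ndarray of shape (n_classes,)
        Recall for each class in sorted order. "ndarray" signals that a
        NumPy array is returned whatever the input array type is.
    """
    # This sketch simply converts to NumPy, so it only handles inputs that
    # np.asarray can convert (e.g. lists or CPU arrays).
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    # For each true class, the fraction of its samples predicted correctly.
    return np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
```

For example, `per_class_score([0, 0, 1, 1], [0, 1, 1, 1])` returns a NumPy array holding 0.5 and 1.0.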

StefanieSenger (Contributor, Author)

Personally, I haven't fully understood why scikit-learn maintainers prefer to keep NumPy return types for functions with small array outputs.

Converting small arrays into the input array type doesn't seem computationally costly, whereas returning them as NumPy arrays complicates matters.

This is because we would need to discuss and decide on the return array type for each function individually, which could confuse users, especially if they miss the parts of the documentation where we explain this.

This is costly in a different way:

  • It slows down the overall Array API transition.
  • It will annoy users.
  • It could keep us busy with clarifications and decisions on return types and then again to maintain this.

I am sharing my thoughts here primarily to get some direction from my senior colleagues on how to pick and document the return type in #30562, rather than to start a broader discussion.
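The alternative argued for in this comment (compute as usual, then convert only the small result back to the input array type) can be sketched as follows. The function name and the `xp` parameter are hypothetical: `xp` stands in for the Array API namespace of the inputs, and how it would be obtained is deliberately left out. The only namespace function used is `asarray`, which the Array API standard defines.

```python
import numpy as np


def per_class_score_in_input_namespace(y_true, y_pred, xp=np):
    """Per-class recall returned in the array namespace ``xp``.

    Hypothetical sketch: ``xp`` stands in for the Array API namespace of
    the inputs and defaults to NumPy.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    # The computation runs in NumPy and yields one value per class.
    scores = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
    # Only this small output (n_classes values, independent of n_samples)
    # is converted back to the input namespace.
    return xp.asarray(scores)
```

Passing a real Array API namespace instead of the default would return the scores in that library's array type. Note that the inputs are still converted to NumPy first, so this only illustrates the cost of the final conversion, not a full Array API implementation.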

OmarManzoor (Contributor)

If this is about the documentation part, the return types in the docstrings are intentionally kept as they are, since Array API support is still experimental. Please refer to this set of comments: #27113 (comment)

jeremiedbb added the RFC label and removed the Needs Triage label on Jan 17, 2025

StefanieSenger (Contributor, Author)

Thanks for the link, @OmarManzoor. It makes sense to not touch the docstrings while Array API is still experimental.

4 participants