8000 Implement `to_numpy` method to speed up matplotlib with PyTorch arrays · Issue #101795 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Implement to_numpy method to speed up matplotlib with PyTorch arrays #101795

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
patel-zeel opened this issue May 18, 2023 · 6 comments
Closed

Implement to_numpy method to speed up matplotlib with PyTorch arrays #101795

patel-zeel opened this issue May 18, 2023 · 6 comments
Labels
enhancement Not as big of a feature, but technically not a bug. Should be easy to fix module: numpy Related to numpy support, and also numpy compatibility of our operators needs research We need to decide whether or not this merits inclusion, based on research world triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@patel-zeel
Copy link
Contributor
patel-zeel commented May 18, 2023

🚀 The feature, motivation and pitch

Hi,

As discussed in this issue and corresponding PR on matplotlib, PyTorch arrays can be significantly slow when used directly with matplotlib. This is because matplotlib has no easy way to convert PyTorch arrays to NumPy arrays before plotting and thus it expects other libraries to have to_numpy() method. I think to_numpy() implementation in PyTorch would be useful for the PyTorch users who might be using PyTorch arrays directly with matplotlib without knowing that it can be too slow.

Alternatives

  • As discussed in the matplotlib PR, we considered adding a specific check for inputs of type torch.Tensor and then convert it to numpy using .numpy() method but adding a string based check does not seem a good idea.
  • We tried using __array__ method to convert both JAX and PyTorch arrays to NumPy but it does not work well with some other objects having __array__ method.

Additional context

Here is the code to reproduce the plotting delay issue:

from time import time
import numpy as np
import matplotlib.pyplot as plt

import torch

torch_array = torch.randn(1000, 150)

def plot_hist(array):
    init = time()
    plt.figure()
    plt.hist(array)
    print(f"Time to plot: {time() - init:.2f} s")
    plt.show()

plot_hist(torch_array.ravel())  # Takes around 2 seconds
plot_hist(np.array(torch_array.ravel()))  # Takes around 0.04 seconds

I am open to a diverse set of suggestions to fix this issue.

cc @mruberry @rgommers

@malfet
Copy link
Contributor
malfet commented May 18, 2023

Wouldn't adding torch.Tensor.to_numpy = torch.Tensor.numpy after import torch addresses it?
But adding this alias permanently, sounds reasonable to me, as it would make it consistent with say pandas, unless @mruberry knows a reason why it should be called numpy, rather than to_numpy?

@malfet malfet added triage review enhancement Not as big of a feature, but technically not a bug. Should be easy to fix labels May 18, 2023
@oscargus
Copy link

As discussed in matplotlib/matplotlib#22645, at least pandas, xarray, polar, and pyarrow support to_numpy, so I would just encourage you to consider adding an alias.

@patel-zeel
Copy link
Contributor Author
patel-zeel commented May 19, 2023

Right @oscargus, I'd say that adding to_numpy is unlikely to break anything in PyTorch but finding a way to support all possible array libraries with different APIs is kind of difficult for matplotlib (at least it seems to me that way). At least this can be a hotfix until all array libraries find a common way out of this.

@lezcano
Copy link
Collaborator
lezcano commented May 20, 2023

cc @rgommers

@rgommers
Copy link
Collaborator

This is related to gh-36560, which is also "ensure conversion to numpy before plotting". Note that .numpy() or .to_numpy() as an alias is not enough, you'll get exceptions for non-CPU tensors as well as tensors on CPU that are in an autograd graph.

There are already several comments on matplotlib/matplotlib#25887 which point out that adding yet another way of converting to a numpy array is not the best idea. So I'd recommend not doing this. There are already too many ways of doing this. I'll comment on the Matplotlib PR, because that has much more relevant context.

@albanD albanD added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module module: numpy Related to numpy support, and also numpy compatibility of our operators needs research We need to decide whether or not this merits inclusion, based on research world and removed triage review labels May 22, 2023
@patel-zeel
Copy link
Contributor Author

The original issue is resolved by matplotlib/matplotlib#25887 and thus this issue also stands resolved. Closing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Not as big of a feature, but technically not a bug. Should be easy to fix module: numpy Related to numpy support, and also numpy compatibility of our operators needs research We need to decide whether or not this merits inclusion, based on research world triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
< 372D form data-target="create-branch.developmentForm" data-turbo="false" class="js-issue-sidebar-form" aria-label="Link issues" action="/pytorch/pytorch/issues/closing_references?source_id=1715353906&source_type=ISSUE" accept-charset="UTF-8" method="post">
Development

No branches or pull requests

6 participants
0