[RFC] Universal Device Context and Safe GPU/CPU Execution Decorators · Issue #152679 · pytorch/pytorch

Closed
Tunahanyrd opened this issue May 2, 2025 · 5 comments
Labels
enhancement · module: python frontend · needs research · triaged

Comments

Tunahanyrd commented May 2, 2025

Feature

I propose a small but useful utility package named cuda_tools that provides:

  • A DeviceContext context manager for clean device/AMP/cache handling (a sketch follows this list)
  • Simple and advanced decorators (@cuda, @cuda.advanced) to make any function run safely on GPU or CPU
  • Optional automatic tensorization (int, list, np.ndarray → torch.Tensor)
  • Memory profiling, retry on error, timeout, automatic fallback to CPU on OOM
  • AMP and multi-GPU support (optional)
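
For illustration, a minimal sketch of the DeviceContext idea using only public torch APIs (simplified; the implementation in the demo repo does more):

import contextlib
import torch

@contextlib.contextmanager
def device_context(device=None, use_amp=False, clear_cache=True):
    # Simplified sketch, not the actual cuda_tools code.
    dev = torch.device(device) if device else torch.device(
        "cuda" if torch.cuda.is_available() else "cpu")
    amp = torch.autocast(dev.type) if use_amp else contextlib.nullcontext()
    try:
        with amp:
            yield dev  # used as: with device_context() as dev: ...
    finally:
        if clear_cache and dev.type == "cuda":
            torch.cuda.empty_cache()  # release cached blocks on exit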

Why

Working with GPU-accelerated functions often requires boilerplate code:

  • Device selection
  • .to(device) calls
  • Cache clearing
  • Error handling for CUDA OOM
  • AMP context setup
  • Converting NumPy / CuPy / TensorFlow objects into torch.Tensor

This toolset wraps all that logic in a minimal, reusable, and modular design.
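
For contrast, the manual version of a single step often looks roughly like this (standard torch APIs; the model is assumed to already live on the device):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def manual_step(model, batch):
    # tensorize non-tensor inputs and move everything to the chosen device
    x = batch.to(device) if torch.is_tensor(batch) else torch.as_tensor(batch, device=device)
    # enable AMP only on CUDA
    with torch.autocast(device.type, enabled=(device.type == "cuda")):
        return model(x)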

Package Structure (already implemented)

cuda_tools/
├── __init__.py
├── context.py      # DeviceContext
├── decorators.py   # @cuda, @cuda.advanced
├── utils.py        # tensor conversion, CuPy patching, etc.

Demo Repository:

https://github.com/Tunahanyrd/universal-cuda-tools

Note
This is not a request to include the entire codebase as-is.
Rather, the components are highly modular, and any part that fits the PyTorch core philosophy could be selectively adapted into torch.utils, torch.cuda, or elsewhere.

If there's interest, I’d be happy to refine this further or submit a minimal PR.

Thanks for considering!

Alternatives

Alternative solutions typically involve manual .to(device) calls, AMP wrapping, and cache clearing. However, these require repetitive boilerplate and are not reusable. While other utility wrappers exist in personal or third-party codebases, they are often project-specific and not as modular as this design.

Additional context

This toolset was developed during real-world training workflows involving long-running model training on limited-GPU hardware. It aims to reduce boilerplate while improving safety and clarity in device management.

Example usage:

@cuda(device="cuda", retry=1, auto_tensorize=True)
def train_step(batch):
    # Works with raw ints/lists/np arrays; runs safely on selected device
    ...
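
To make the mechanics concrete, here is one possible sketch of such a decorator using only public torch APIs (parameter names mirror the example above; the actual cuda_tools implementation is richer):

import functools
import torch

def cuda(device="cuda", retry=1, auto_tensorize=True):
    # Illustrative sketch, not the actual cuda_tools implementation.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            dev = torch.device(device) if torch.cuda.is_available() else torch.device("cpu")
            def move(a):
                if auto_tensorize and not torch.is_tensor(a):
                    a = torch.as_tensor(a)  # ints/lists/np arrays -> tensor
                return a.to(dev) if torch.is_tensor(a) else a
            moved = tuple(move(a) for a in args)
            for attempt in range(retry + 1):
                try:
                    return fn(*moved, **kwargs)
                except torch.cuda.OutOfMemoryError:
                    torch.cuda.empty_cache()
                    if attempt == retry:  # retries exhausted: fall back to CPU
                        cpu = tuple(a.cpu() if torch.is_tensor(a) else a for a in moved)
                        return fn(*cpu, **kwargs)
        return wrapper
    return decorator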

cc @albanD
@malfet added the enhancement, module: python frontend, needs research, and triaged labels and removed the triage review label on May 5, 2025
albanD (Collaborator) commented May 5, 2025

Thanks for sharing this. Very interesting repo.

I do think it covers a lot of very different things, from NumPy conversion to AMP to cache clearing.
It feels a bit too high level for the PyTorch APIs given our design principles https://pytorch.org/docs/stable/community/design.html and in particular the "Simple over Easy" one.
These are all very convenient and easy APIs to use. But they hide a lot of the complexity and I'm sure they're not that simple under the hood.

I would be curious though if there are any lower level components we could provide in core to make writing cuda_tools easier?

Also, I will mention the new device-generic APIs https://pytorch.org/docs/stable/accelerator.html in case you're interested in making this work with AMD/MPS/XPU and other accelerators!
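
(For reference, device-generic code with that API might look like the following, assuming a PyTorch build that exposes torch.accelerator:)

import torch

# pick whichever accelerator backend is present (CUDA, MPS, XPU, ...)
dev = torch.accelerator.current_accelerator() if torch.accelerator.is_available() else torch.device("cpu")
x = torch.randn(4, 4, device=dev)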

Tunahanyrd (Author)

Hi again!

I initially just wanted to test some new AI-powered coding tools (mostly GPT-based) by writing a simple wrapper around device management and basic utilities. But then something amazing happened: these tools were way more powerful than I expected, and the little wrapper quickly evolved into this modular and comprehensive toolkit.

Honestly, I'm pretty excited about how it turned out! While I don't consider myself an experienced PyTorch developer (I'm still exploring and learning), I really enjoyed this process, and now I'm enthusiastic to see if any parts could benefit PyTorch core. I've put together some clear explanations and a structured proposal on how these utilities could potentially integrate into the existing API (especially the accelerator APIs).

Here's the fully documented repo: Tunahanyrd/torch-tools.

If you or the core team find any value here, I’d love to help out in any way I can—just let me know!

Thanks for your encouraging feedback earlier—it really means a lot.

Best,
Tunahan

Tunahanyrd (Author)

Hi again,

Just wanted to follow up gently — I know things can get busy.

If you had a chance to glance at the current version of the repo Tunahanyrd/torch-tools, I’d love to know if any part seems worth pursuing.

If there's anything I can help with — or if something needs clarification — I’d be happy to assist in any way I can, or open a minimal PR if that would be helpful.

Thanks again for your time!

albanD (Collaborator) commented May 12, 2025

Hey!

I did take a quick look. I think a few of the simple APIs are nice, but we expect users to write these in their own code to match their needs.
The more involved APIs you have for device handling, timeouts, and data movement are also very nice, but I'm afraid they are too high level for PyTorch core. We explicitly want "simple over easy": when interacting with core, everything should be very explicit and simple, even if that makes it extra verbose. These APIs are valuable, but we don't want to provide them directly (see the similar situation with trainer APIs, for example).

Another similar example: while we do provide a trivial way to do it, we don't recommend setting the default device type to "cuda" (or anything other than CPU), and we recommend doing all device movement manually. Since these are very expensive operations, it is very helpful for a low-level library like PyTorch to make the user very aware of when they happen.
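
(As a concrete contrast, using documented torch APIs:)

import torch

# explicit (recommended): the expensive transfer is visible at the call site
device = torch.device("cuda")
x = torch.randn(8, 8).to(device)  # host -> device copy is obvious here

# implicit: legal, but hides where tensors end up
torch.set_default_device("cuda")
y = torch.randn(8, 8)  # silently allocated on the GPU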

Hopefully that all makes sense!

Tunahanyrd (Author)

Thanks a lot for taking the time to look! I really appreciate the feedback — makes total sense. I'll keep it in mind going forward.
