[RFC] Universal Device Context and Safe GPU/CPU Execution Decorators · Issue #152679 · pytorch/pytorch

Closed
Tunahanyrd opened this issue May 2, 2025 · 5 comments
Labels
enhancement · module: python frontend · needs research · triaged

Comments

Tunahanyrd commented May 2, 2025

Feature

I propose a small but useful utility package named cuda_tools that provides:

  • A DeviceContext context manager for clean device/AMP/cache handling (a sketch follows this list)
  • Simple and advanced decorators (@cuda, @cuda.advanced) to make any function run safely on GPU or CPU
  • Optional automatic tensorization (int, list, np.ndarray → torch.Tensor)
  • Memory profiling, retry on error, timeout, automatic fallback to CPU on OOM
  • AMP and multi-GPU support (optional)
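
For illustration, a minimal sketch of the DeviceContext idea using only public torch APIs (simplified; the implementation in the demo repo does more):

import contextlib
import torch

@contextlib.contextmanager
def device_context(device=None, use_amp=False, clear_cache=True):
    # Simplified sketch, not the actual cuda_tools code.
    dev = torch.device(device) if device else torch.device(
        "cuda" if torch.cuda.is_available() else "cpu")
    amp = torch.autocast(dev.type) if use_amp else contextlib.nullcontext()
    try:
        with amp:
            yield dev  # used as: with device_context() as dev: ...
    finally:
        if clear_cache and dev.type == "cuda":
            torch.cuda.empty_cache()  # release cached blocks on exit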

Why

Working with GPU-accelerated functions often requires boilerplate code:

  • Device selection
  • .to(device) calls
  • Cache clearing
  • Error handling for CUDA OOM
  • AMP context setup
  • Converting NumPy / CuPy / TensorFlow objects into torch.Tensor

This toolset wraps all that logic in a minimal, reusable, and modular design.
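
For contrast, the manual version of a single step often looks roughly like this (standard torch APIs; the model is assumed to already live on the device):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def manual_step(model, batch):
    # tensorize non-tensor inputs and move everything to the chosen device
    x = batch.to(device) if torch.is_tensor(batch) else torch.as_tensor(batch, device=device)
    # enable AMP only on CUDA
    with torch.autocast(device.type, enabled=(device.type == "cuda")):
        return model(x)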

Package Structure (already implemented)

cuda_tools/
├── __init__.py
├── context.py      # DeviceContext
├── decorators.py   # @cuda, @cuda.advanced
├── utils.py        # tensor conversion, CuPy patching, etc.

Demo Repository:

https://github.com/Tunahanyrd/universal-cuda-tools

Note
This is not a request to include the entire codebase as-is.
Rather, the components are highly modular, and any part that fits the PyTorch core philosophy could be selectively adapted into torch.utils, torch.cuda, or elsewhere.

If there's interest, I’d be happy to refine this further or submit a minimal PR.

Thanks for considering!

Alternatives

Alternative solutions typically involve manual .to(device) calls, AMP wrapping, and cache clearing. However, these require repetitive boilerplate and are not reusable. While other utility wrappers exist in personal or third-party codebases, they are often project-specific and not as modular as this design.

Additional context

This toolset was developed during real-world training workflows involving long-running model training on limited-GPU hardware. It aims to reduce boilerplate while improving safety and clarity in device management.

Example usage:

@cuda(device="cuda", retry=1, auto_tensorize=True)
def train_step(batch):
    # Works with raw ints/lists/np arrays; runs safely on selected device
    ...
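
To make the mechanics concrete, here is one possible sketch of such a decorator using only public torch APIs (parameter names mirror the example above; the actual cuda_tools implementation is richer):

import functools
import torch

def cuda(device="cuda", retry=1, auto_tensorize=True):
    # Illustrative sketch, not the actual cuda_tools implementation.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            dev = torch.device(device) if torch.cuda.is_available() else torch.device("cpu")
            def move(a):
                if auto_tensorize and not torch.is_tensor(a):
                    a = torch.as_tensor(a)  # ints/lists/np arrays -> tensor
                return a.to(dev) if torch.is_tensor(a) else a
            moved = tuple(move(a) for a in args)
            for attempt in range(retry + 1):
                try:
                    return fn(*moved, **kwargs)
                except torch.cuda.OutOfMemoryError:
                    torch.cuda.empty_cache()
                    if attempt == retry:  # retries exhausted: fall back to CPU
                        cpu = tuple(a.cpu() if torch.is_tensor(a) else a for a in moved)
                        return fn(*cpu, **kwargs)
        return wrapper
    return decorator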

cc @albanD
@malfet added the enhancement, module: python frontend, needs research, and triaged labels and removed the triage review label on May 5, 2025
albanD (Collaborator) commented May 5, 2025

Thanks for sharing this. Very interesting repo.

I do think it covers a lot of very different things, from NumPy conversion to AMP to cache clearing.
It feels a bit too high level for the PyTorch APIs given our design principles https://pytorch.org/docs/stable/community/design.html and in particular the "Simple over Easy" one.
These are all very convenient and easy APIs to use. But they hide a lot of the complexity and I'm sure they're not that simple under the hood.

I would be curious though if there are any lower level components we could provide in core to make writing cuda_tools easier?

Also, I will mention the new device-generic APIs https://pytorch.org/docs/stable/accelerator.html in case you're interested in making this work with AMD/MPS/XPU and other accelerators!
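
(For reference, device-generic code with that API might look like the following, assuming a PyTorch build that exposes torch.accelerator:)

import torch

# pick whichever accelerator backend is present (CUDA, MPS, XPU, ...)
dev = torch.accelerator.current_accelerator() if torch.accelerator.is_available() else torch.device("cpu")
x = torch.randn(4, 4, device=dev)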

Tunahanyrd (Author)

Hi again!

I initially just wanted to test some new AI-powered coding tools (mostly GPT-based) by writing a simple wrapper around device management and basic utilities. But then something amazing happened: these tools were way more powerful than I expected, and the little wrapper quickly evolved into this modular and comprehensive toolkit.

Honestly, I'm pretty excited about how it turned out! While I don't consider myself an experienced PyTorch developer (I'm still exploring and learning), I really enjoyed this process, and now I'm enthusiastic to see if any parts could benefit PyTorch core. I've put together some clear explanations and a structured proposal on how these utilities could potentially integrate into the existing API (especially the accelerator APIs).

Here's the fully documented repo: Tunahanyrd/torch-tools.

If you or the core team find any value here, I’d love to help out in any way I can—just let me know!

Thanks for your encouraging feedback earlier—it really means a lot.

Best,
Tunahan

Tunahanyrd (Author)

Hi again,

Just wanted to follow up gently — I know things can get busy.

If you had a chance to glance at the current version of the repo Tunahanyrd/torch-tools, I’d love to know if any part seems worth pursuing.

If there's anything I can help with — or if something needs clarification — I’d be happy to assist in any way I can, or open a minimal PR if that would be helpful.

Thanks again for your time!

albanD (Collaborator) commented May 12, 2025

Hey!

I did take a quick look. I think a few of the simple APIs are nice, but we expect users to write these in their own code to match their needs.
The more involved APIs you have for device handling, timeouts, and data movement are also very nice, but I'm afraid they are too high level for PyTorch core. We explicitly want "simple over easy": when interacting with core, everything should be very explicit and simple, even if that makes it extra verbose. These APIs are valuable, but we don't want to provide them directly (see the similar situation with trainer APIs, for example).

Another similar example: while we do provide a trivial way to do it, we don't recommend setting the default device type to "cuda" (or anything other than CPU), and we recommend doing all device movement manually. Since these are very expensive operations, it is very helpful for a low-level library like PyTorch to make the user very aware of when they happen.
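
(As a concrete contrast, using documented torch APIs:)

import torch

# explicit (recommended): the expensive transfer is visible at the call site
device = torch.device("cuda")
x = torch.randn(8, 8).to(device)  # host -> device copy is obvious here

# implicit: legal, but hides where tensors end up
torch.set_default_device("cuda")
y = torch.randn(8, 8)  # silently allocated on the GPU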

Hopefully that all makes sense!

Tunahanyrd (Author)

Thanks a lot for taking the time to look! I really appreciate the feedback — makes total sense. I'll keep it in mind going forward.
