[RFC] Universal Device Context and Safe GPU/CPU Execution Decorators #152679
Comments
Thanks for sharing this. Very interesting repo. I do think it covers a lot of very different things, ranging from NumPy conversion to amp or cache clearing. I would be curious, though, whether there are any lower-level components we could provide in core to make writing cuda_tools easier. Also, I will mention the new device-generic APIs https://pytorch.org/docs/stable/accelerator.html in case you're interested in making this work with amd/mps/xpu and other accelerators!
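For context, a rough sketch of what a device-generic path can look like with those APIs, assuming a recent PyTorch build that ships `torch.accelerator` (see the linked docs for the exact surface):

```python
import torch

# Pick whichever accelerator backend is present (cuda, mps, xpu, ...),
# falling back to CPU when none is available.
if torch.accelerator.is_available():
    device = torch.accelerator.current_accelerator()
else:
    device = torch.device("cpu")

x = torch.randn(4, 4, device=device)
print(device, x.sum())
```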
Hi again! I initially just wanted to test some new AI-powered coding tools (mostly GPT-based) by writing a simple wrapper around device management and basic utilities. But then something amazing happened: these tools were way more powerful than I expected, and the little wrapper quickly evolved into this modular and comprehensive toolkit. Honestly, I'm pretty excited about how it turned out! While I don't consider myself an experienced PyTorch developer (I'm still exploring and learning), I really enjoyed the process, and now I'm eager to see whether any parts could benefit PyTorch core. I've put together some clear explanations and a structured proposal on how these utilities could potentially integrate into the existing API (especially the accelerator APIs). Here's the fully documented repo: Tunahanyrd/torch-tools. If you or the core team find any value here, I'd love to help out in any way I can, just let me know! Thanks for your encouraging feedback earlier, it really means a lot. Best,
Hi again, just wanted to follow up gently, since I know things can get busy. If you had a chance to glance at the current version of the repo Tunahanyrd/torch-tools, I'd love to know if any part seems worth pursuing. If there's anything I can help with, or if something needs clarification, I'd be happy to assist in any way I can, or open a minimal PR if that would be helpful. Thanks again for your time!
Hey! I did take a quick look. I think a few of the simple APIs are nice, but we do expect users to have these in their own code, tailored to their needs. A similar example: while we do provide a trivial way to do it, we don't recommend setting the default device type to "cuda" (or anything that is not CPU), and we recommend doing all device movement manually. Since these are very expensive operations, it is very helpful for a low-level library like PyTorch to make the user very aware when they happen. Hopefully that all makes sense!
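For readers skimming the thread, the recommendation above in code form (a rough sketch, not an official snippet):

```python
import torch

# Explicit device movement: each host<->device transfer is visible at the
# call site instead of happening implicitly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(8, 2).to(device)
batch = torch.randn(32, 8).to(device)
out = model(batch)

# Discouraged for this kind of use: every new tensor would land on "cuda"
# implicitly, hiding where the expensive copies and allocations occur.
# torch.set_default_device("cuda")
```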
Thanks a lot for taking the time to look! I really appreciate the feedback — makes total sense. I'll keep it in mind going forward. |
Feature
I propose a small but useful utility package named `cuda_tools` that provides:
- A `DeviceContext` context manager for clean device/AMP/cache handling
- Decorators (`@cuda`, `@cuda.advanced`) to make any function run safely on GPU or CPU

Why
Working with GPU-accelerated functions often requires boilerplate code such as repeated `.to(device)` calls. This toolset wraps all that logic in a minimal, reusable, and modular design.
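For illustration, the kind of boilerplate being wrapped typically looks something like this (a generic sketch, not code taken from the repo):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def forward_pass(model, x):
    model = model.to(device)          # manual device movement
    x = x.to(device)
    # manual AMP wrapping (only meaningful on CUDA here)
    with torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
        out = model(x)
    if device.type == "cuda":
        torch.cuda.empty_cache()      # manual cache clearing
    return out
```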
Package Structure (already implemented)
Demo Repository:
https://github.com/Tunahanyrd/universal-cuda-tools
Note
This is not a request to include the entire codebase as-is.
Rather, the components are highly modular, and any part that fits the PyTorch core philosophy could be selectively adapted into torch.utils, torch.cuda, or elsewhere.
If there's interest, I’d be happy to refine this further or submit a minimal PR.
Thanks for considering!
Alternatives
Alternative solutions typically involve manual `.to(device)` calls, AMP wrapping, and cache clearing. However, these require repetitive boilerplate and are not reusable. While other utility wrappers exist in personal or third-party codebases, they are often project-specific and not as modular as this.

Additional context
This toolset was developed during real-world training workflows involving long-running model training on limited-GPU hardware. It aims to reduce boilerplate while improving safety and clarity in device management.
Example usage:
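A minimal sketch of the intended usage; the decorator and context-manager names follow the proposal above, while the keyword arguments shown are illustrative assumptions rather than the repo's exact API:

```python
import torch
from cuda_tools import DeviceContext, cuda  # names from the proposal above

@cuda  # run on GPU when available, fall back to CPU otherwise
def add(a, b):
    return a + b

@cuda.advanced(use_amp=True)  # hypothetical keyword option for AMP handling
def scaled_sum(model, x):
    return model(x).sum()

# Hypothetical keyword arguments for device, AMP, and cache behavior.
with DeviceContext(device="cuda", use_amp=True, clear_cache=True):
    result = add(torch.randn(2, 2), torch.randn(2, 2))
```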