MultiGPU training tiny torch backend by tocubed · Pull Request #9317 · tinygrad/tinygrad · GitHub

MultiGPU training tiny torch backend #9317

Draft: wants to merge 9 commits into base: master


tocubed (Contributor) commented Mar 2, 2025

Messy, but working with `GPUS=4 LLVM=1 TINY_BACKEND=1 python3 examples/other_mnist/beautiful_mnist_torch.py`.
Quite a few stubs and TODOs still to work on.

github-actions bot (Contributor) commented Mar 2, 2025

This branch currently is behind tinygrad/master. The line count difference bot is disabled.

@tocubed

This comment was marked as resolved.

geohot added the "bounty locked" label (Bounty is locked to someone) on Mar 2, 2025

geohot (Collaborator) commented Mar 2, 2025

Nice! Does DistributedDataParallel work also?

tocubed (Contributor, Author) commented Mar 3, 2025

> Nice! Does DistributedDataParallel work also?

Not yet, it's a work in progress. There are a few tricky things and more C++ code than I'd like, but it's close to something running. I just need to get a basic multiprocess gather/scatter/reduce working with tiny tensors.
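For readers unfamiliar with the collectives mentioned above, here is a minimal sketch of what gather, scatter, and reduce mean. It is not code from this PR: the "ranks" are simulated in one process as list indices, the tensors are plain Python lists, and every function name is hypothetical. The gradient averaging at the end is the step a DDP-style setup needs from these primitives.

```python
# Hypothetical in-process stand-ins for the collectives; a "rank" is a list index.
# Real implementations move data between processes; this only shows the semantics.

def scatter(chunks):
    """Root holds one chunk per rank; rank i receives chunks[i]."""
    return [list(c) for c in chunks]

def gather(per_rank):
    """Root collects every rank's data, ordered by rank."""
    return [list(x) for x in per_rank]

def reduce_sum(per_rank):
    """Root receives the elementwise sum of every rank's data."""
    return [sum(vals) for vals in zip(*per_rank)]

def all_reduce_mean(per_rank):
    """Every rank ends up with the same elementwise average."""
    world = len(per_rank)
    mean = [s / world for s in reduce_sum(per_rank)]
    return [list(mean) for _ in range(world)]

# Four ranks, mirroring GPUS=4; each holds a two-element "gradient".
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
summed = reduce_sum(grads)
averaged = all_reduce_mean(grads)
```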

tocubed (Contributor, Author) commented Mar 5, 2025

Issue narrowed down to the torch backend not supporting views as out or dest tensors. Working on a fix.
DDP uses slices of larger tensors to accumulate grads from different layers.
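A small sketch of why views as destinations matter, assuming nothing about the actual backend code: it uses only the Python standard library (`array` and `memoryview`) as a stand-in for tensors, and `write_out` is a hypothetical placeholder for any op that takes an out or dest tensor. Writing through a view must land in the larger buffer; if the destination is silently a copy, the accumulated values are lost.

```python
from array import array

# Flat "bucket" covering the grads of two hypothetical layers (3 + 2 values).
bucket = array("f", [0.0] * 5)
flat = memoryview(bucket)

# Slicing a memoryview yields views that alias the bucket's storage.
layer1_grad = flat[0:3]
layer2_grad = flat[3:5]

def write_out(out, values):
    """Stand-in for an op with an out/dest argument: writes in place."""
    for i, v in enumerate(values):
        out[i] = v

write_out(layer1_grad, [1.0, 2.0, 3.0])
write_out(layer2_grad, [4.0, 5.0])
bucket_after_views = bucket.tolist()  # both writes are visible in the bucket

# Failure mode: slicing an array copies, so the write never reaches the bucket.
broken = array("f", [0.0] * 5)
copy_dest = broken[0:3]
write_out(copy_dest, [1.0, 2.0, 3.0])
broken_after = broken.tolist()  # still all zeros
```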

Labels
bounty locked Bounty is locked to someone