https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/core/ivalue_inl.h#L952
```cpp
void markCompleted(
    IValue value,
    c10::optional<std::vector<WeakStorage>> storages = c10::nullopt) {
  // ...
  for (const c10::Device& device : usedDevices) {
    c10::Event event(impl_.type());
    event.record(impl_.getStream(device));
    events_.push_back(std::move(event));
  }
  // ...
}
```
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/core/ivalue_inl.h#L1197
```cpp
void synchronizeWithCurrentStreams() {
  for (c10::Event& event : events_) {
    event.block(impl_.getStream(event.device()));
  }
  // ...
}
```
When usedDevices and impl_ refer to the same device, both the record and the block are executed on the same stream, which is redundant. Also, I would like to know under what circumstances different devices will be used? Are the record and block necessary here?
Additionally, I mentioned a suspected bug related to this in #119266.
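As a standalone illustration of the redundancy described above (this is a minimal sketch using the plain CUDA runtime API rather than the Future code, and the stream/event names are made up for the example): waiting on an event from the very stream that recorded it adds no ordering, because a single stream already executes its work in issue order.

```cpp
// Illustrative sketch only (plain CUDA runtime API, not the Future code):
// a wait on an event from the same stream that recorded it is redundant,
// since a single stream already executes its work in issue order.
#include <cuda_runtime.h>

int main() {
  cudaStream_t stream;
  cudaEvent_t event;
  cudaStreamCreate(&stream);
  cudaEventCreate(&event);

  // ... launch kernel A on `stream` ...
  cudaEventRecord(event, stream);
  // Redundant: `stream` already orders its own work; this wait only adds
  // ordering when the waiting stream differs from the recording stream.
  cudaStreamWaitEvent(stream, event, 0);
  // ... launch kernel B on `stream`; it runs after A with or without the wait ...

  cudaEventDestroy(event);
  cudaStreamDestroy(stream);
  return 0;
}
```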
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @ptrblck
As of commit e52245b with merge base becfda0, the following test failures were reported:
distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last
distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm
distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD
> Also, I would like to know under what circumstances different devices will be used?
Related to this comment, we are removing the multi-device support from ProcessGroupNCCL: #119099 #119421
This looks ok to me, but I'm not very familiar with this code. Maybe @ezyang can tag someone more familiar.
Looks good to me. My feeling is that this is a double check / double sync and could be converted to an assert that the devices match.
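A rough sketch of what that "convert to assert" suggestion might look like inside markCompleted() (hypothetical, not the actual patch; the devices_ member name and the exact check are assumptions here):

```cpp
// Hypothetical sketch of the "convert to assert" idea, not the actual patch:
// instead of recording an event per used device and blocking the same stream
// again later, assert that every used device is one the Future was
// constructed with (member name devices_ is an assumption; needs <algorithm>).
for (const c10::Device& device : usedDevices) {
  TORCH_INTERNAL_ASSERT(
      std::find(devices_.begin(), devices_.end(), device) != devices_.end(),
      "Future was marked completed on an unexpected device ",
      device);
}
```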
cc @kwen2501
Not sure if the test failures are real, but they do look suspicious.
@pytorchbot merge -r
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
avoid redundant event records and event blocks (e52245b)
Successfully rebased tzwfeature2 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout tzwfeature2 && git pull --rebase)
75effa2
Reason: This PR needs a release notes: label. If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:. If not, please add the topic: not user facing label. To add a label, you can comment to pytorchbot, for example @pytorchbot label "topic: not user facing".
For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
@pytorchbot merge
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
Failing merge rule: Core Maintainers
https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp#L2394
```cpp
{
  c10::cuda::CUDAMultiStreamGuard streamGuard(ncclStream);
  std::vector<at::Device> devices{device};
  work->future_ = c10::make_intrusive<at::ivalue::Future>(
      c10::ListType::create(c10::TensorType::get()), devices);

  // Add a callback that runs profiling end callbacks. wrapCallback() in CUDA
  // future blocks the stream this callback runs on the corresponding
  // ncclEndEvents_ ensuring appropriate synchronization.
  if (work->recordFunctionEndCallback_) {
    work->future_->addCallback(
        [work](at::ivalue::Future& /* unused */) {
          work->recordFunctionEndCallback_();
        },
        // uses_future = false allows us to skip synchronization in
        // ivalue::Future, but is only valid as long as the lambda doesn't use
        // the "Future" argument.
        /*uses_future=*/false);
  }
  work->future_->markCompleted(at::IValue(*work->outputs_));
}
```

CUDAMultiStreamGuard will change the current stream, so the record and the block happen on different streams.
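To make the stream-guard point concrete, here is a small illustrative sketch (assuming a CUDA build; this is not the ProcessGroupNCCL code, and the function name is made up) of how a stream guard changes the current stream, so the later block lands on a different stream than the record:

```cpp
#include <ATen/cuda/CUDAEvent.h>
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>

void guarded_record_then_block() {
  at::cuda::CUDAEvent event;
  {
    // Inside the guard, the "current" stream is the side stream.
    c10::cuda::CUDAStream side = c10::cuda::getStreamFromPool();
    c10::cuda::CUDAStreamGuard guard(side);
    event.record(c10::cuda::getCurrentCUDAStream()); // recorded on `side`
  }
  // After the guard is destroyed, the original current stream is restored,
  // so blocking here is not redundant: record and block use different streams.
  event.block(c10::cuda::getCurrentCUDAStream());
}
```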
IvanKobzarev approved these changes
Awaiting requested review from wconstab
Awaiting requested review from ptrblck