[Intel GPU] fix memory leak in deconv backward by jianyizh · Pull Request #144385 · pytorch/pytorch

[Intel GPU] fix memory leak in deconv backward #144385


Merged
jianyizh wants to merge 1 commit from jianyi/fix_deconv

Conversation

jianyizh (Contributor) commented Jan 8, 2025

Fixes #143807

We need to manage the oneDNN scratchpad in PyTorch; otherwise oneDNN allocates scratchpad memory on every primitive execution, which causes a memory leak.
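
A minimal sketch of the pattern (hypothetical helper, not the exact diff): the primitive is created with the user scratchpad mode, the required size is queried from the primitive descriptor, and the buffer is allocated through PyTorch's caching allocator so the framework pools and frees it instead of oneDNN caching it internally.

```cpp
#include <utility>

#include <ATen/ATen.h>
#include <oneapi/dnnl/dnnl.hpp>

// Hypothetical helper (not the exact diff): allocate a primitive's
// scratchpad through PyTorch's caching allocator. The returned tensor
// owns the storage and must outlive the primitive execution.
static std::pair<at::Tensor, dnnl::memory> alloc_scratchpad(
    const dnnl::primitive_desc_base& pd,
    const dnnl::engine& eng) {
  // Meaningful only when the primitive was created with an attr where
  // attr.set_scratchpad_mode(dnnl::scratchpad_mode::user) was set.
  dnnl::memory::desc md = pd.scratchpad_desc();
  at::Tensor buf = at::empty(
      {static_cast<int64_t>(md.get_size())},  // size in bytes
      at::TensorOptions().dtype(at::kByte).device(at::kXPU));
  return {buf, dnnl::memory(md, eng, buf.data_ptr())};
}
```

At execution time the buffer is passed in the argument map as `{DNNL_ARG_SCRATCHPAD, mem}`, so its lifetime is tied to a tensor the caching allocator can recycle rather than to oneDNN's internal cache.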

cc @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot bot commented Jan 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144385

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Pending, 3 Unrelated Failures

As of commit acefd1c with merge base 9da376d:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the module: cpu label Jan 8, 2025
jianyizh (Contributor, Author) commented Jan 8, 2025

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the topic: not user facing label Jan 8, 2025
jianyizh (Contributor, Author) commented Jan 8, 2025

When the scratchpad mode is library (the oneDNN default), oneDNN manages the scratchpad itself, but the cached buffer is thread-local. So we must either use dnnl::scratchpad_mode::user or build with ONEDNN_ENABLE_CONCURRENT_EXEC=ON, and concurrent execution leads to a larger memory footprint.
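
A compressed sketch of the trade-off (the comments are my reading of the oneDNN behavior, not PR code):

```cpp
// Option 1 (oneDNN default): library-managed scratchpad. The cached
// buffer lives in thread-local storage, so threads that are created
// and torn down repeatedly strand their buffers, i.e. the leak above.
dnnl::primitive_attr default_attr;  // scratchpad_mode::library implied

// Option 2 (taken in this PR): user-managed scratchpad; PyTorch
// allocates and pools the buffer via its caching allocator.
dnnl::primitive_attr user_attr;
user_attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);

// Option 3: rebuild oneDNN with ONEDNN_ENABLE_CONCURRENT_EXEC=ON,
// which avoids the thread-local cache but keeps one scratchpad alive
// per primitive instance, increasing the overall memory footprint.
```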

jianyizh marked this pull request as ready for review January 9, 2025 02:17
zou3519 added the triaged label Jan 9, 2025
etaf added the ciflow/xpu label Jan 17, 2025
jianyizh (Contributor, Author) commented Feb 6, 2025

@pytorchbot rebase

pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot (Collaborator)

Successfully rebased jianyi/fix_deconv onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout jianyi/fix_deconv && git pull --rebase)

EikanWang (Collaborator)

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label Feb 13, 2025
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

Merge failed

Reason: 1 job has failed: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu)

Details for Dev Infra team (raised by workflow job)

EikanWang (Collaborator)

@pytorchbot merge -i

pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 4 checks: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 1, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 2, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 3, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

guangyey added the release notes: xpu and module: xpu labels and removed the topic: not user facing and module: cpu labels Feb 13, 2025
pytorch-bot bot added the module: cpu label Feb 13, 2025
guangyey (Collaborator)

@EikanWang should this PR target 2.6.1?

guangyey added this to the 2.6.1 milestone Feb 13, 2025
EikanWang (Collaborator)

Yes. We need to cherry-pick this PR to 2.6.1.

Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
ciflow/xpu (Run XPU CI tasks)
Merged
module: cpu (CPU specific problem (e.g., perf, algorithm))
module: xpu (Intel XPU related issues)
open source
release notes: xpu (release notes category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

XPU ConvTranspose2d Causes DataLoader Memory Leak
8 participants