[Intel GPU] fix memory leak in deconv backward by jianyizh · Pull Request #144385 · pytorch/pytorch

[Intel GPU] fix memory leak in deconv backward #144385


Merged
jianyizh wants to merge 1 commit from jianyi/fix_deconv

Conversation

jianyizh (Contributor) commented Jan 8, 2025

Fixes #143807

We need to manage the oneDNN scratchpad in PyTorch; otherwise oneDNN allocates scratchpad memory on every primitive execution, which causes a memory leak.
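
A minimal sketch of the pattern (hypothetical helper, not the exact diff): the primitive is created with the user scratchpad mode, the required size is queried from the primitive descriptor, and the buffer is allocated through PyTorch's caching allocator so the framework pools and frees it instead of oneDNN caching it internally.

```cpp
#include <utility>

#include <ATen/ATen.h>
#include <oneapi/dnnl/dnnl.hpp>

// Hypothetical helper (not the exact diff): allocate a primitive's
// scratchpad through PyTorch's caching allocator. The returned tensor
// owns the storage and must outlive the primitive execution.
static std::pair<at::Tensor, dnnl::memory> alloc_scratchpad(
    const dnnl::primitive_desc_base& pd,
    const dnnl::engine& eng) {
  // Meaningful only when the primitive was created with an attr where
  // attr.set_scratchpad_mode(dnnl::scratchpad_mode::user) was set.
  dnnl::memory::desc md = pd.scratchpad_desc();
  at::Tensor buf = at::empty(
      {static_cast<int64_t>(md.get_size())},  // size in bytes
      at::TensorOptions().dtype(at::kByte).device(at::kXPU));
  return {buf, dnnl::memory(md, eng, buf.data_ptr())};
}
```

At execution time the buffer is passed in the argument map as `{DNNL_ARG_SCRATCHPAD, mem}`, so its lifetime is tied to a tensor the caching allocator can recycle rather than to oneDNN's internal cache.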

cc @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot bot commented Jan 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144385

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Pending, 3 Unrelated Failures

As of commit acefd1c with merge base 9da376d:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the module: cpu label Jan 8, 2025
jianyizh (Contributor, Author) commented Jan 8, 2025

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the topic: not user facing label Jan 8, 2025
jianyizh (Contributor, Author) commented Jan 8, 2025

When the scratchpad mode is library (the oneDNN default), oneDNN manages the scratchpad itself, but the cached buffer is thread-local. So we must either use dnnl::scratchpad_mode::user or build with ONEDNN_ENABLE_CONCURRENT_EXEC=ON, and concurrent execution leads to a larger memory footprint.
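
A compressed sketch of the trade-off (the comments are my reading of the oneDNN behavior, not PR code):

```cpp
// Option 1 (oneDNN default): library-managed scratchpad. The cached
// buffer lives in thread-local storage, so threads that are created
// and torn down repeatedly strand their buffers, i.e. the leak above.
dnnl::primitive_attr default_attr;  // scratchpad_mode::library implied

// Option 2 (taken in this PR): user-managed scratchpad; PyTorch
// allocates and pools the buffer via its caching allocator.
dnnl::primitive_attr user_attr;
user_attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);

// Option 3: rebuild oneDNN with ONEDNN_ENABLE_CONCURRENT_EXEC=ON,
// which avoids the thread-local cache but keeps one scratchpad alive
// per primitive instance, increasing the overall memory footprint.
```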

jianyizh marked this pull request as ready for review January 9, 2025 02:17
zou3519 added the triaged label Jan 9, 2025
etaf added the ciflow/xpu label Jan 17, 2025
jianyizh (Contributor, Author) commented Feb 6, 2025

@pytorchbot rebase

pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot (Collaborator)

Successfully rebased jianyi/fix_deconv onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout jianyi/fix_deconv && git pull --rebase)

EikanWang (Collaborator)

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label Feb 13, 2025
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

Merge failed

Reason: 1 job has failed: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu)

Details for Dev Infra team (raised by workflow job)

EikanWang (Collaborator)

@pytorchbot merge -i

pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 4 checks: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 1, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 2, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 3, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

guangyey added the release notes: xpu and module: xpu labels and removed the topic: not user facing and module: cpu labels Feb 13, 2025
pytorch-bot bot added the module: cpu label Feb 13, 2025
guangyey (Collaborator)

@EikanWang should this PR target 2.6.1?

guangyey added this to the 2.6.1 milestone Feb 13, 2025
EikanWang (Collaborator)

Yes. We need to cherry-pick this PR to 2.6.1.

Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
ciflow/xpu (Run XPU CI tasks)
Merged
module: cpu (CPU specific problem (e.g., perf, algorithm))
module: xpu (Intel XPU related issues)
open source
release notes: xpu (release notes category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

XPU ConvTranspose2d Causes DataLoader Memory Leak
8 participants