# Enable XPUEvent elapsed_time function #134666
## Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134666

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit 4952ae0 with merge base 565a794.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
```cpp
void assignEvent(sycl::queue& queue) {
  if (enable_timing_) {
    event_ = std::make_unique<sycl::event>(
        sycl::ext::oneapi::experimental::submit_profiling_tag(queue));
```
Do we know when they will move this API out of the experimental namespace?
I assume it should be a long-term plan. Let me confirm with the compiler team.
The compiler team plans to move them out of the experimental namespace starting with oneAPI 2025.1.
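For background, here is a minimal sketch of how two profiling-tag events can be turned into an elapsed time in milliseconds. The helper name `elapsed_ms` is illustrative rather than the actual PyTorch implementation, and both events are assumed to have completed:

```cpp
#include <sycl/sycl.hpp>
#include <cstdint>

// Illustrative helper (not PyTorch code): compute elapsed milliseconds
// between two events returned by submit_profiling_tag(). Both events must
// have completed (e.g. via wait()) before profiling info is queried.
double elapsed_ms(const sycl::event& start, const sycl::event& stop) {
  const uint64_t t0 =
      start.get_profiling_info<sycl::info::event_profiling::command_end>();
  const uint64_t t1 =
      stop.get_profiling_info<sycl::info::event_profiling::command_end>();
  return 1e-6 * static_cast<double>(t1 - t0);  // timestamps are in nanoseconds
}
```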
```diff
@@ -128,7 +125,7 @@ struct TORCH_XPU_API XPUEvent {
     }
   }

-  float elapsed_time(const XPUEvent& other) const {
+  double elapsed_time(const XPUEvent& other) const {
```
Why double? Is this to align with other device types?
It aligns with the frontend API: `PyFloat_FromDouble` takes a C `double`, so returning `double` end-to-end avoids losing precision in a `float` round trip. Refer to `torch/csrc/xpu/Event.cpp`, line 100 at commit 26e5572:

```cpp
return PyFloat_FromDouble(self->xpu_event.elapsed_time(other->xpu_event));
```
Was this code tested somehow? I'm getting runtime crashes for PyTorch with the 2025 compiler (not only related to this particular branch). UPD: my issue was related to #135818.
`c10/xpu/impl/XPUGuardImpl.h` (outdated)

```cpp
double elapsedTime(void* event1, void* event2, const DeviceIndex device_index)
    const override {
```
Since the two events have been associated with a particular device, why do we require `device_index` here?
CUDA's `elapsedTime` needs to know `device_index` so it can switch the context before destroying a CUDA event. We have to keep the same signature because this function implements the same interface.
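A hedged sketch of that constraint: the XPU implementation overrides a backend-agnostic virtual interface, so its signature must match even when the argument is unused on XPU. The names below are illustrative stand-ins, not the exact c10 declarations:

```cpp
#include <cstdint>

using DeviceIndex = int8_t;  // illustrative stand-in for c10::DeviceIndex

// Illustrative backend-agnostic interface: every backend overrides the same
// virtual signature, so XPU must accept device_index even if it is unused.
struct GuardImplInterface {
  virtual ~GuardImplInterface() = default;
  virtual double elapsedTime(
      void* event1, void* event2, DeviceIndex device_index) const = 0;
};

struct XPUGuardImplSketch final : GuardImplInterface {
  double elapsedTime(
      void* event1, void* event2, DeviceIndex /*device_index*/) const override {
    // ... query profiling info from the two SYCL events behind the pointers ...
    return 0.0;  // placeholder
  }
};
```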
`c10/xpu/impl/XPUGuardImpl.h` (outdated)
```cpp
TORCH_CHECK(
    event1 && event2,
    "Both events must be recorded before calculating elapsed time.");
```
I think the two events need to be checked to confirm they are associated with `device_index`.
These two events will be checked by the `TORCH_CHECK` in `c10/core/impl/InlineEvent.h`, line 114 at commit 565a794.
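For reference, a sketch of the kind of validation done one level up; the shape below is assumed from this discussion, not copied from `InlineEvent.h`:

```cpp
// Illustrative checks (assumed shape, not verbatim c10 code): both events
// must have been recorded, on matching devices, before computing elapsed time.
TORCH_CHECK(
    was_marked_for_recording_ && other.was_marked_for_recording_,
    "Both events must be recorded before calculating elapsed time.");
TORCH_CHECK(
    device_type_ == other.device_type_,
    "Both events must be associated with the same device type.");
```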
```cpp
    event_ = std::make_unique<sycl::event>(
        sycl::ext::oneapi::experimental::submit_profiling_tag(queue));
  } else {
    event_ = std::make_unique<sycl::event>(queue.ext_oneapi_submit_barrier());
```
We plan to replace `ext_oneapi_submit_barrier` with `ext_oneapi_get_last_event` when recording an event without the `enable_timing` feature. This change will land once CI uplifts to oneAPI 2025.0.
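A sketch of what that replacement might look like, assuming a form of `ext_oneapi_get_last_event` that returns `std::optional<sycl::event>` (the exact API shape and the helper name are assumptions here):

```cpp
// Sketch of the planned non-timing path (API shape assumed, see above):
// reuse the queue's last submitted event instead of submitting a barrier.
void assignEventWithoutTiming(sycl::queue& queue) {
  if (auto last = queue.ext_oneapi_get_last_event()) {
    event_ = std::make_unique<sycl::event>(*last);
  } else {
    // Nothing submitted yet: fall back to a barrier event.
    event_ = std::make_unique<sycl::event>(queue.ext_oneapi_submit_barrier());
  }
}
```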
Patch conflict due to pytorch/pytorch#134666.

Benchmark CI:
* https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11855565969 (elapsed_time failed)
* https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11856129996 (legacy profiler works)

Inductor CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11856999355 (passed)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@pytorchbot revert "It seems to raise an internal failure." |
❌ 🤖 pytorchbot command failed:
Try |
@pytorchbot revert -m "It seems to raise an internal failure."

❌ 🤖 pytorchbot command failed:
Try …
@pytorchbot revert -m "It seems to raise an internal failure." -c weird

@pytorchbot successfully started a revert job. Check the current status here.
Reverting PR 134666 failed. Reason: Command …

Details for Dev Infra team: Raised by workflow job.
# Motivation
This PR raises an internal UT failure on XPU. This reverts commit 4bbd6da.

# Additional Context
Refer to #140814.

Pull Request resolved: #140872
Approved by: https://github.com/EikanWang
# Motivation
This PR intends to reland #134666, which was reverted in #140872. We reverted it because I forgot to support `elapsed_time` for `XPUGuardImpl`, which resulted in `c10::Event` not supporting `elapsed_time` and blocked XPU CI.

# Additional Context
We split #134666 into two parts: PR #140865 supports `elapsed_time` for `torch.Event`, and this PR supports `torch.xpu.elapsed_time`.

Pull Request resolved: #140873
Approved by: https://github.com/gujinghui
ghstack dependencies: #140865
# Motivation
This PR aims to enable the `elapsed_time` function for `XPUEvent`.

# Additional Context
This PR depends on toolchain oneAPI 2025.0.

Pull Request resolved: pytorch#134666
Approved by: https://github.com/EikanWang, https://github.com/ezyang
Stack from ghstack (oldest at bottom):

# Motivation
This PR aims to enable the `elapsed_time` function for `XPUEvent`.

# Additional Context
This PR depends on toolchain oneAPI 2025.0.

cc @gujinghui @EikanWang @fengyuan14
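For completeness, a hypothetical usage sketch of the timing path this PR enables. `XPUEvent` and `elapsed_time` come from the diff above; the surrounding code, the header path, and the exact method set are assumptions:

```cpp
#include <ATen/xpu/XPUEvent.h>  // assumed header location

// Hypothetical sketch: record two timing-enabled events around some work on
// the current XPU stream, then read back the elapsed milliseconds.
double time_region() {
  at::xpu::XPUEvent start(/*enable_timing=*/true);
  at::xpu::XPUEvent stop(/*enable_timing=*/true);
  start.record();      // record on the current XPU stream
  // ... enqueue the work to be measured ...
  stop.record();
  stop.synchronize();  // profiling info is valid only after the event completes
  return start.elapsed_time(stop);  // returns double, per this PR
}
```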