[CI] Fix xpu linux ci build environment duplicated issue by chuanqi129 · Pull Request #141546 · pytorch/pytorch · GitHub
[CI] Fix xpu linux ci build environment duplicated issue #141546

Closed
wants to merge 2 commits into from

Conversation

chuanqi129
Collaborator
@chuanqi129 chuanqi129 commented Nov 26, 2024

We found duplicated build environments in the XPU Linux CI tests, which could cause test jobs to download the wrong PyTorch build artifact. See https://github.com/pytorch/pytorch/actions/runs/12023238798/job/33518351906#step:14:633

Works for #139722 and #114850
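
To illustrate the failure mode described above, here is a minimal sketch of how duplicated build-environment names could be detected in a list of CI job definitions. It assumes, for illustration only, that build artifacts are keyed by the build-environment name, so two jobs sharing a name could collide on the same artifact. The function `find_duplicate_build_environments`, the job names, and the dictionary layout are all hypothetical and are not taken from the PyTorch workflow files.

```python
from collections import defaultdict


def find_duplicate_build_environments(jobs):
    """Return build environments that are used by more than one job.

    `jobs` is a list of dicts with hypothetical keys "name" and
    "build_environment"; this layout is an assumption for the sketch.
    """
    by_env = defaultdict(list)
    for job in jobs:
        by_env[job["build_environment"]].append(job["name"])
    # Keep only environments shared by two or more jobs.
    return {env: names for env, names in by_env.items() if len(names) > 1}


# Hypothetical job list; the names are illustrative only.
jobs = [
    {"name": "xpu-build-a", "build_environment": "linux-xpu-py3.9"},
    {"name": "xpu-build-b", "build_environment": "linux-xpu-py3.9"},
    {"name": "cpu-build", "build_environment": "linux-cpu-py3.9"},
]

duplicates = find_duplicate_build_environments(jobs)
for env, names in sorted(duplicates.items()):
    print(f"{env}: shared by {', '.join(names)}")
```

A check like this could run before the test jobs start, so that a name collision fails fast instead of silently pulling the wrong artifact.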

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Nov 26, 2024
@chuanqi129 chuanqi129 requested a review from atalman November 26, 2024 06:04
@chuanqi129 chuanqi129 added the ciflow/xpu Run XPU CI tasks label Nov 26, 2024
Contributor
@atalman atalman left a comment

Looks like possible CI issue: echo 'Error: Available diskspace is less than 70 percent. Not enough diskspace.'

@atalman
Contributor
atalman commented Nov 26, 2024

The failures seem to be unrelated; however, please look into these:
inductor/test_inductor_freezing.py::FreezingGpuTests::test_mm_concat_xpu
inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_maximize_xpu

Contributor
@atalman atalman left a comment

lgtm. Please investigate the XPU failures on this PR

@chuanqi129
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix_xpu_ci onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix_xpu_ci && git pull --rebase)

@colesbury colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 26, 2024
@chuanqi129
Collaborator Author

@pytorchbot rebase -b main

@chuanqi129 chuanqi129 requested a review from atalman November 27, 2024 02:01
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix_xpu_ci onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout fix_xpu_ci && git pull --rebase)

@chuanqi129
Collaborator Author

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix_xpu_ci onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout fix_xpu_ci && git pull --rebase)

@chuanqi129
Collaborator Author
chuanqi129 commented Nov 27, 2024

Hi @atalman, I have resolved the disk space issue and rebased the PR onto the latest main; the two previous XPU inductor UT failures have been fixed. There is now only one doctest failure in the latest CI run. It is very strange and may need more time to root-cause. I have created issue #141705 to track it and will try to fix it ASAP. Can we land this PR first?

@chuanqi129 chuanqi129 added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 27, 2024
Contributor
@atalman atalman left a comment

lgtm. Thanks for taking the time to resolve and look into these issues.

@chuanqi129
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request Dec 3, 2024
# Motivation
Fix the UT failure introduced by #140865. An unrelated failure had been masking this UT failure.
It began to surface once #141546 landed.

Pull Request resolved: #141800
Approved by: https://github.com/EikanWang
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

We found duplicated build environments in the XPU Linux CI tests, which could cause test jobs to download the wrong PyTorch build artifact. See https://github.com/pytorch/pytorch/actions/runs/12023238798/job/33518351906#step:14:633

Works for pytorch#139722 and pytorch#114850
Pull Request resolved: pytorch#141546
Approved by: https://github.com/EikanWang, https://github.com/atalman
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
# Motivation
Fix the UT failure introduced by pytorch#140865. An unrelated failure had been masking this UT failure.
It began to surface once pytorch#141546 landed.

Pull Request resolved: pytorch#141800
Approved by: https://github.com/EikanWang
Labels
ciflow/trunk Trigger trunk jobs on your pull request
ciflow/xpu Run XPU CI tasks
Merged
open source
topic: not user facing topic category
triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
Status: Done
6 participants