8000 Prioritize building with libgomp over libomp by mwlon · Pull Request #138834 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Prioritize building with libgomp over libomp #138834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mwlon
Copy link
Contributor
@mwlon mwlon commented Oct 24, 2024

Fixes #126211

The approach I'm taking here is simply to prioritize gomp over omp, if it is available. Here's how this should affect possible build scenarios:

  • host with gomp+omp, building without MKL DNN:
    ** old behavior: uses omp
    ** new behavior: uses gomp
  • host with gomp+omp, building with MKL DNN:
    ** old behavior: unsafely links in both gomp+omp
    ** new behavior: uses only gomp

The pytorch distributions have always only linked to gomp, so I assume they're built on a machine without libomp. I've verified on my own machine that this build avoids linking to libomp.

< 8000 /div>

Copy link
pytorch-bot bot commented Oct 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138834

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 2 New Failures, 1 Unrelated Failure

As of commit c674810 with merge base e802b29 (image):

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mwlon
Copy link
Contributor Author
mwlon commented Oct 24, 2024

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 24, 2024
@soulitzer soulitzer requested a review from malfet October 28, 2024 15:17
@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 28, 2024
@mwlon
Copy link
Contributor Author
mwlon commented Dec 10, 2024

@malfet @mingfeima Could I get a review or merge on this?

@malfet
Copy link
Contributor
malfet commented Dec 10, 2024

@mwlon in principle this change looks good to me, but can you please add a few more references to what is libomp (llvm's OpenMP runtime I guess) and why gomp is preferable, when available

Last but not least, what are the performance implications?

@mwlon
Copy link
Contributor Author
mwlon commented Dec 10, 2024

@malfet I added the comments. As for performance:

  • This has no performance implications for Pytorch's standard builds, since they'll still use libgomp.
  • For the cases this fixes (so that they only link libgomp instead of both libgomp and libomp), performance can greatly improve, but this is very particular to the running process. For us it was a 5-10x improvement for many jobs, but that's because of effects in unrelated libraries (libarrow, not any libtorch library) that found the wrong symbols.
  • For the cases where the building machine has libomp but no libgomp, I don't have any numbers on the performance implications, nor do I have an easy way to obtain them. But I would expect almost no difference; OMP is just a threading layer, so as long as you use one consistently, it shouldn't matter too much which one you use.

@mwlon
Copy link
Contributor Author
mwlon commented Dec 11, 2024

@malfet I don't believe I have permissions to merge this. Do you want to?

@mwlon
Copy link
Contributor Author
mwlon commented Dec 16, 2024

@malfet @mingfeima ping

@mwlon
Copy link
Contributor Author
mwlon commented Dec 25, 2024

@pytorchbot merge

Copy link
pytorch-bot bot commented Dec 25, 2024

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

@malfet
Copy link
Contributor
malfet commented Feb 7, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased fix-build-gomp-priority onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix-build-gomp-priority && git pull --rebase)

@malfet
Copy link
Contributor
malfet commented Feb 7, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 7, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@leslie-fang-intel
Copy link
Collaborator
leslie-fang-intel commented Feb 19, 2025

Thanks for the PR @mwlon. I would like to link to another issue (#136307) which is currently under investigation. While this issue may not be directly related to your fix, it relates to a potential conflict between the omp/gomp and iomp. Specifically, if PyTorch is built with libgomp and libiomp is preloaded at runtime, below script fails to run with certain numactl commands:

LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so KMP_AFFINITY=granularity=fine,compact,1,0 python -c 'import os; from torch._C import *; os.system("numactl -C 1 -s")'

@atalman atalman added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Feb 19, 2025
@leslie-fang-intel
Copy link
Collaborator

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
8000 Copy link
Collaborator

Successfully rebased fix-build-gomp-priority onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix-build-gomp-priority && git pull --rebase)

@leslie-fang-intel
Copy link
Collaborator
leslie-fang-intel commented Feb 20, 2025

This Job: X linux-focal-py3_9-clang9-xla / build seems to be a real failure: https://github.com/pytorch/pytorch/actions/runs/13424557739/job/37504797027 which seems building fbgemm related things. @mwlon are you interested to take a look of this failure?

Copy link
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 17, 2025
@jakirkham
Copy link

Not stale

In fact it looks like we are still waiting for action from maintainers

@cyyever
Copy link
Collaborator
cyyever commented May 17, 2025

@pytorchbot rebase -b main

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased fix-build-gomp-priority onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout fix-build-gomp-priority && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the fix-build-gomp-priority branch from 8c43093 to c674810 Compare May 17, 2025 04:11
@pytorch-bot pytorch-bot bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR labels May 17, 2025
@cyyever
Copy link
Collaborator
cyyever commented May 17, 2025

@pytorchmergebot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 17, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 0 checks:

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: Apply lint suggestions

Details for Dev Infra team Raised by workflow job

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request open source Stale topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conflicting OMP implementations linked when building with MKLDNN
9 participants
0