Default ThreadPool size to number of physical cores #125963
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125963. Note: links to docs will display an error until the docs builds have completed. ❌ 2 new failures as of commit be2399a with merge base bbe68a1. This comment was automatically generated by Dr. CI and updates every 15 minutes.
I do believe this change is for the better, but I'm not an expert, so I cannot ascertain that it is always better. @malfet, what's the plan on the benchmarks haha
@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Will this work properly for partial CPU allocations on SLURM clusters?
This might be a non-issue, but I was also curious how a change like this affects the performance of distributed jobs.
I can only hope for it :)
I'm just worried because there might be, say, 16 logical cores with only 4 available to the job, meaning the cores become over-subscribed.
@Skylion007, that is the exact type of problem we're attempting to fix: "processors" usually means hardware threads, while "cores" means actual physical cores (which is what we're thinking of). @malfet's PR as written is strictly an improvement on that front compared to before the change.
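The over-subscription concern above (e.g. 16 logical cores on the machine but only 4 CPUs granted to the job) can be checked from Python. A minimal sketch, assuming Linux, since `os.sched_getaffinity` (which respects SLURM/cgroup CPU masks) is Linux-only; this is an illustration, not how the PR itself sizes the pool:

```python
import os

# Logical CPUs installed in the machine (may be None on exotic platforms).
logical = os.cpu_count() or 1

# CPUs actually available to this process; honors SLURM/cgroup affinity
# masks on Linux. Fall back to the logical count where unavailable.
try:
    available = len(os.sched_getaffinity(0))
except AttributeError:
    available = logical

# If a thread pool is sized from the machine-wide count while the job
# only holds a slice of the machine, its threads over-subscribe cores.
oversubscribed = logical > available
```

On a full-machine allocation `available` equals `logical` and no over-subscription is flagged; on a partial SLURM allocation `available` shrinks to the granted CPUs.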
LGTM - tested both a hyperthreaded and a non-hyperthreaded multicore machine. It gives the correct thread count now.
Hi @gajjanag, can you please clarify whether you meant you were getting incorrect counts earlier? Thanks!
Yes, I was getting incorrect counts before this patch (e.g. a 2-socket Intel machine with 56 cores per socket was giving a thread count of 224, but it now gives the correct 112).
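The 224-vs-112 discrepancy above is the logical-vs-physical core distinction (2 sockets × 56 cores × 2 hyperthreads = 224 logical, 112 physical). A minimal sketch of counting physical cores by hand, assuming Linux; `physical_core_count` is a hypothetical helper for illustration, not part of cpuinfo or PyTorch:

```python
import os

def physical_core_count():
    """Count unique (physical id, core id) pairs in /proc/cpuinfo,
    so hyperthread siblings collapse to one core (Linux only).
    Falls back to the logical count elsewhere."""
    try:
        cores = set()
        phys = None
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("physical id"):
                    phys = line.split(":")[1].strip()
                elif line.startswith("core id"):
                    cores.add((phys, line.split(":")[1].strip()))
        if cores:
            return len(cores)
    except OSError:
        pass  # non-Linux or unreadable /proc: fall back
    return os.cpu_count() or 1
```

With HyperThreading enabled this returns half of `os.cpu_count()` on typical Intel machines (e.g. 112 vs 224 on the box described above); with it disabled, the two counts agree.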
Thanks for confirming, @gajjanag! That hasn't been my experience with HyperThreading enabled (without this patch) -
We need to chat about this. Perhaps cpuinfo does not work correctly on your system, but it should have returned the number of logical cores rather than physical ones.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased from 230082a to be2399a.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 3 checks: pull / linux-focal-cuda12.4-py3.10-gcc9 / build, pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build, Meta Internal-Only Changes Check. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
TODO: Some benchmarks

Pull Request resolved: pytorch#125963
Approved by: https://github.com/janeyx99, https://github.com/Skylion007, https://github.com/gajjanag, https://github.com/jgong5