[Windows][cpu] mkl use mimalloc as allocator on Windows #138419
Conversation
Dr. CI: See artifacts and rendered test results at hud.pytorch.org/pr/138419. As of commit 8094594 with merge base 675e16e: 1 new failure, 1 unrelated failure (flaky on trunk). 1 active SEV at the time.
Force-pushed from 3a8736e to 6d601a9:
* add cmake option: USE_MIMALLOC_ON_MKL
* wrap and export mi_malloc APIs in C10.
* add MklAllocationHelp.cpp to register allocation APIs to MKL.
Force-pushed from bc7aeee to bb746b1.
#ifdef USE_MIMALLOC_ON_MKL
namespace mi_malloc_wrapper {
C10_API void* c10_mi_malloc(size_t size);
C10_API void* c10_mi_calloc(size_t count, size_t size);
C10_API void* c10_mi_realloc(void* p, size_t newsize);
C10_API void* c10_mi_malloc_aligned(size_t size, size_t alignment);
C10_API void c10_mi_free(void* p);
} // namespace mi_malloc_wrapper
#endif
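The definitions behind these declarations presumably just forward to mimalloc's mi_malloc family. A minimal sketch of the wrapper pattern, using the C standard allocator as a stand-in so it is self-contained (in the actual PR the bodies call mi_malloc, mi_calloc, mi_realloc, and mi_free, and C10_API expands to the proper dllexport/dllimport attribute):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for the real C10_API macro: on Windows it expands to
// __declspec(dllexport) when building c10.dll and __declspec(dllimport)
// in consumers such as libtorch_cpu.dll.
#define C10_API

namespace mi_malloc_wrapper {

// In the real PR these forward to mimalloc (mi_malloc etc.);
// std::malloc et al. are stand-ins so this sketch compiles anywhere.
// c10_mi_malloc_aligned is omitted here for portability.
C10_API void* c10_mi_malloc(size_t size) { return std::malloc(size); }
C10_API void* c10_mi_calloc(size_t count, size_t size) { return std::calloc(count, size); }
C10_API void* c10_mi_realloc(void* p, size_t newsize) { return std::realloc(p, newsize); }
C10_API void c10_mi_free(void* p) { std::free(p); }

} // namespace mi_malloc_wrapper
```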
Why do we need this?
We integrated mimalloc into c10.dll in a previous PR. In this PR we need to optimize the kernels in libtorch_cpu.dll. This C++ header exports the mimalloc functions from c10.dll and lets libtorch_cpu.dll import them.
What kernels in libtorch_cpu.dll need these exported APIs? I don't see the dependencies in this PR.
OK, it looks good. I thought the allocations are all registered in c10.
This is kind of annoying but I guess we can do Windows crimes
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge -f "ignore unrelated failure."
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Which version of oneAPI are you using to build PyTorch? I am using oneAPI binaries from https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html and I am getting linking errors. Looking at CMakeFiles/torch_cpu.rsp, I see that mkl_core_dll.lib is used for linking, and it doesn't define these i_* symbols.
Hi @gshimansky, see pytorch/aten/src/ATen/native/mkl/MklAllocationHelper.cpp, lines 18 to 21 (at 3a6f014).
For now I've turned this feature off in cmake with
Hi @gshimansky, you can only set
@gshimansky We disabled it temporarily: #139204
We did a lot of optimization for PyTorch Windows and made good progress, but some models still have a performance gap between PyTorch Windows and PyTorch Linux. Ref: https://pytorch.org/blog/performance-boost-windows/#conclusion
From the blog conclusion, we found that ResNet50 is a typical case of it. Let's focus on ResNet50 and collect the profiling log. We found that the major kernel consuming CPU resources is aten::mkldnn_convolution; it was dispatched to MKLDNN.

Actually, we had already optimized memory allocation by integrating mimalloc into the PyTorch c10 module. It helps PyTorch Windows a lot, but it does not cover MKL's and MKLDNN's intermediate temporary memory. We still have potential to improve PyTorch Windows performance by optimizing that intermediate temporary memory as well.

So I discussed with the Intel MKL team and got a method to register a high-performance memory allocation API with MKL, which helps MKL improve memory performance. Please check the online document: https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-windows/2023-0/redefining-memory-functions.html
This PR optimizes MKL memory allocation performance on Windows by registering mi_malloc with MKL. PR changes:

* Add a cmake option USE_MIMALLOC_ON_MKL; it is a sub-option of USE_MIMALLOC.
* Wrap and export the mi_malloc APIs in C10 when USE_MIMALLOC_ON_MKL is ON.
* Add MklAllocationHelp.cpp to register the allocation APIs with MKL when USE_MIMALLOC_ON_MKL is ON.
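A dependent option like this is commonly wired up with CMake's `cmake_dependent_option`; the fragment below is a hypothetical sketch of how it might look (only the names USE_MIMALLOC and USE_MIMALLOC_ON_MKL are from the PR; everything else is illustrative):

```cmake
include(CMakeDependentOption)

option(USE_MIMALLOC "Use mimalloc" ON)

# USE_MIMALLOC_ON_MKL is only meaningful when USE_MIMALLOC is ON;
# it is forced OFF otherwise.
cmake_dependent_option(
  USE_MIMALLOC_ON_MKL "Use mimalloc as the MKL allocator" ON
  "USE_MIMALLOC" OFF)

if(USE_MIMALLOC_ON_MKL)
  # Exposes the c10_mi_* wrappers and compiles the MKL registration file.
  add_compile_definitions(USE_MIMALLOC_ON_MKL)
endif()
```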
For oneDNN, this is still being tracked in this proposal: uxlfoundation/oneDNN#1898

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10