Fix a few performance drops in some matrix sizes per data type by ewanglong · Pull Request #2375 · OpenMathLib/OpenBLAS · GitHub

Fix a few performance drops in some matrix sizes per data type #2375


Merged 1 commit on Jan 30, 2020

Conversation

ewanglong
Contributor

Signed-off-by: Wang,Long long1.wang@intel.com

This patch fixes a few performance drops at some matrix sizes. They were caused by an inappropriate GEMM_PREFERED_SIZE value (=32), which produced an imbalanced workload split across threads.
For example, a 1280x1280 single-precision matrix on 16 threads was split into 8 threads with a width of 96 single-precision (32-bit) elements and 8 threads with a width of 64. For AVX-512 alignment, giving all 16 threads a width of 80 single-precision (32-bit) elements is also acceptable, and more importantly the workload is then balanced across threads.
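To make the imbalance concrete, here is a minimal C sketch of preferred-size-rounded partitioning. It is not the actual OpenBLAS level3 threading code; the `split` helper and its rounding rule are illustrative only, and the value 16 in the second call is merely an example of a smaller preferred size that keeps the 1280/16 split balanced.

```c
#include <stdio.h>

/* Toy model of dividing n columns among nthreads threads while rounding
 * each thread's width up to a multiple of `prefered` (in the spirit of
 * GEMM_PREFERED_SIZE); this is NOT the real OpenBLAS partitioning code. */
static void split(int n, int nthreads, int prefered)
{
    int remaining = n;
    for (int t = 0; t < nthreads && remaining > 0; t++) {
        int threads_left = nthreads - t;
        /* even share of what is left, rounded up to a multiple of prefered */
        int width = (remaining + threads_left - 1) / threads_left;
        width = (width + prefered - 1) / prefered * prefered;
        if (width > remaining) width = remaining;
        printf("thread %2d: width %d\n", t, width);
        remaining -= width;
    }
}

int main(void)
{
    printf("prefered = 32 (before the patch):\n");
    split(1280, 16, 32);   /* 8 threads get 96 columns, 8 threads get 64 */

    printf("prefered = 16 (illustrative smaller value):\n");
    split(1280, 16, 16);   /* all 16 threads get 80 columns */
    return 0;
}
```

With a preferred size of 32, the 80-column ideal share per thread rounds up to 96 for the first 8 threads and leaves only 64 for the rest; with a smaller preferred size that divides 80, every thread gets the same 80-column chunk.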

Signed-off-by: Wang,Long <long1.wang@intel.com>
@TiborGY
Contributor
TiborGY commented Jan 24, 2020

This is applied to all CPU types, right? You mentioned that this improves performance on AVX-512 CPUs, but what about AVX2 CPUs? (Haswell, Zen2, etc.)

@martin-frbg
Collaborator

@TiborGY just Haswell and SkylakeX (it was added by Arjan van de Ven in 5b708e5). I have not merged it yet as I did not have time to run any benchmarks, but I assume someone with an Intel email will have thought of that?
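For reference, a hedged sketch of how this setting could appear in param.h; the real file groups definitions into per-CPU sections, so the combined guard below only conveys that the value (32 before this patch, per the PR description) is defined for those two targets.

```c
/* Sketch only; the actual param.h groups settings per CPU target. */
#if defined(HASWELL) || defined(SKYLAKEX)
#define GEMM_PREFERED_SIZE 32   /* value before this pull request */
#endif
```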

@martin-frbg
Collaborator

The gains seem small on my hardware (but at least I do not see a slowdown). Would you happen to have a few numbers from your 16-thread case?
