fix a few performance drop in some matrix size per data type #2375

ewanglong · 2020-01-22T07:18:13Z

Signed-off-by: Wang,Long long1.wang@intel.com

This patch fixes a performance drop seen at some matrix sizes. The drop was caused by an inappropriate GEMM_PREFERED_SIZE value (32), which led to an imbalanced workload split across threads.
For example, take a 1280x1280 single-precision matrix on 16 threads. It was split into 8 threads with a width of 96 single-precision (32-bit) elements and 8 threads with a width of 64. For AVX-512 alignment, however, a width of 80 single-precision elements on all 16 threads is also acceptable (80 x 32 bits is five full 512-bit vectors), and more importantly the workload is then balanced across threads.
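
The split described above can be reproduced with a small sketch. This is not the OpenBLAS code: `split_width` is a hypothetical helper, and it assumes a greedy scheme in which each thread takes the ceiling of the remaining width divided by the remaining threads, rounded up to a preferred multiple. The value 16 for the balanced case is also an assumption, chosen because 16 single-precision elements fill one 512-bit vector.

```python
def split_width(total, nthreads, multiple):
    """Greedily split `total` columns across up to `nthreads` threads.

    Hypothetical illustration: each thread takes
    ceil(remaining / threads_left), rounded up to `multiple`,
    and never more than what is left.
    """
    widths = []
    remaining = total
    while remaining > 0 and len(widths) < nthreads:
        threads_left = nthreads - len(widths)
        width = -(-remaining // threads_left)      # ceiling division
        width = -(-width // multiple) * multiple   # round up to the multiple
        width = min(width, remaining)              # do not overshoot
        widths.append(width)
        remaining -= width
    return widths


# Rounding to 32 elements: eight threads get 96 columns, eight get 64.
print(split_width(1280, 16, 32))
# Rounding to 16 elements (assumed value): every thread gets 80 columns.
print(split_width(1280, 16, 16))
```

With a multiple of 32, the ideal width of 80 is rounded up to 96 for the first threads, so the later threads are left with less work; with a multiple of 16, 80 is already aligned and no rounding occurs.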

Signed-off-by: Wang,Long <long1.wang@intel.com>

TiborGY · 2020-01-24T23:27:56Z

This is applied to all CPU types, right? You mentioned that this improves performance on AVX-512 CPUs, but what about AVX2 CPUs? (Haswell, Zen2, etc.)

martin-frbg · 2020-01-25T09:19:30Z

@TiborGY just Haswell and SkylakeX (it was added by Arjan van de Ven in 5b708e5). I have not merged it yet as I did not have time to run any benchmarks, but I assume someone with an Intel email will have thought of that?

martin-frbg · 2020-01-30T09:27:22Z

The gains seem small on my hardware (but at least I do not see a slowdown). Would you happen to have a few numbers from your 16-thread case?

fix a few performance drop in some matrix size per data type

fbf4f48

Signed-off-by: Wang,Long <long1.wang@intel.com>

martin-frbg merged commit abc67bd into OpenMathLib:develop Jan 30, 2020

jeremiedbb mentioned this pull request Oct 8, 2021

Poor performance of sklearn.cluster.KMeans for numpy >= 1.19.0 scikit-learn/scikit-learn#20642

Closed
