This repository benchmarks hand-tuned single-precision matrix multiply (SGEMM) libraries and code generation stacks on a single thread of one CPU core. The focus is on machine learning workloads, i.e. FP32 or smaller data types and irregular matrix sizes. The goal is to expose high-performance atomic kernels that can then be used to build highly efficient higher-level implementations spanning multiple cores or distributed across systems.
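For reference, this is the computation every benchmarked kernel performs, shown as a deliberately naive sketch (the tuned libraries and generated kernels compute the same thing, only much faster; the function name here is illustrative):

```cpp
#include <cstddef>

// Naive reference SGEMM: C += A * B, with row-major A (M x K), B (K x N),
// and C (M x N). Single precision, single thread -- the exact problem each
// benchmarked kernel solves for a fixed (M, N, K).
void sgemm_naive(std::size_t M, std::size_t N, std::size_t K,
                 const float *A, const float *B, float *C) {
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = C[i * N + j];
      for (std::size_t k = 0; k < K; ++k)
        acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
  }
}
```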
First, check out the repository with its submodules:

```bash
git clone --recurse-submodules -j8 https://github.com/mmperf/mmperf.git
```
To build the code, run:

```bash
cmake -GNinja -DCMAKE_CXX_COMPILER=clang++-11 -DCMAKE_C_COMPILER=clang-11 -DUSE_MLIR=ON -B build .
cmake --build build
```
To plot the results, you will need to install matplotlib:

```bash
pip install matplotlib
```
We use ahead-of-time (AOT) compilation to generate a binary for each matrix multiplication problem size, then run those binaries to produce the benchmark numbers. To run all the tests:

```bash
cmake --build build/matmul --target run_all_tests
```
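Conceptually, each generated binary pairs an AOT-compiled, fixed-shape kernel with a small timing loop. A rough sketch of what such a harness does is below; the symbol name and signature are assumptions for illustration, not the repo's actual interface:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Assumed symbol: an AOT-compiled kernel specialized for M=24, N=64, K=512.
extern "C" void matmul_24x64x512(const float *A, const float *B, float *C);

int main() {
  constexpr int M = 24, N = 64, K = 512, iters = 1000;
  std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);

  matmul_24x64x512(A.data(), B.data(), C.data());  // warm-up call

  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i)
    matmul_24x64x512(A.data(), B.data(), C.data());
  auto t1 = std::chrono::steady_clock::now();

  // 2*M*N*K floating-point operations per invocation (each FMA counts as 2).
  double secs = std::chrono::duration<double>(t1 - t0).count() / iters;
  double gflops = 2.0 * M * N * K / secs * 1e-9;
  std::printf("%dx%dx%d: %.2f GFLOPS\n", M, N, K, gflops);
  return 0;
}
```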
To plot the results against MKL (and generate a plot like the one above), run:

```bash
python3 plot_results.py
```
To run a specific matrix size (say 24x64x512):

```bash
./build/matmul/matmul_24x64x512
```
The MLIR linalg codegen pass lives in `matmul/matmul-compile/matmul-compile.cpp`.
This benchmark was run on an Intel Xeon CPU running at 3.1 GHz. The machine has 256 KB of L1 cache, 8 MB of L2 cache, and 24.8 MB of L3 cache, and it supports AVX-512 instructions. Its theoretical peak is 3.1 GHz x 8 doubles per 512-bit vector x 2 FMA units x 2 FLOPs per FMA = 99.2 GFLOPS for double precision, and twice that, 198.4 GFLOPS, for single precision (16 floats per vector).
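The same peak-throughput arithmetic, spelled out (constants taken from the paragraph above; per-core, single-thread figures):

```cpp
#include <cstdio>

int main() {
  const double ghz = 3.1;            // core clock in GHz
  const double fma_units = 2.0;      // AVX-512 FMA ports per core
  const double flops_per_fma = 2.0;  // one multiply + one add
  const double fp64_lanes = 8.0;     // doubles per 512-bit vector
  const double fp32_lanes = 16.0;    // floats per 512-bit vector
  std::printf("peak FP64: %.1f GFLOPS\n",
              ghz * fp64_lanes * fma_units * flops_per_fma);   // 99.2
  std::printf("peak FP32: %.1f GFLOPS\n",
              ghz * fp32_lanes * fma_units * flops_per_fma);   // 198.4
  return 0;
}
```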
TODO: add the Apple Accelerate framework as a backend.