Deprecated: Please use multi-language pep-talk
Simple MATLAB/Octave API for PAPI (Performance Application Programming Interface).
- Hardware counters are measured for the parent and child threads (e.g. when using parallelized functions like `sum`). Unfortunately, there is no way to differentiate which counters come from which thread.
- Each function in the MEX-file is locked (once loaded, it cannot be erased using the `clear` function in the MATLAB/Octave environment).
- Install PAPI >=5.5.1
- Build the mPAPI functions (`mPAPI_register`, `mPAPI_tic`, `mPAPI_toc`, `mPAPI_groupEvents`, `mPAPI_enumNativeEvents`, `mPAPI_enumPresetEvents`) with a MEX-compatible compiler (the repository contains two bash scripts for building, `build.sh` and `build_all.sh`):
mex -I/usr/local/include mPAPI_register.c -L/usr/local/lib/ -lpapi -output mPAPI_register
mex -I/usr/local/include mPAPI_tic.c -L/usr/local/lib/ -lpapi -output mPAPI_tic
mex -I/usr/local/include mPAPI_toc.c -L/usr/local/lib/ -lpapi -output mPAPI_toc
mex -I/usr/local/include mPAPI_groupEvents.c -L/usr/local/lib/ -lpapi -output mPAPI_groupEvents
mex -I/usr/local/include mPAPI_enumNativeEvents.c -L/usr/local/lib/ -lpapi -output mPAPI_enumNativeEvents
mex -I/usr/local/include mPAPI_enumPresetEvents.c -L/usr/local/lib/ -lpapi -output mPAPI_enumPresetEvents
Here the directory `/usr/local/include` contains the `papi.h` header and the directory `/usr/local/lib/` contains the `libpapi.so` shared library.
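The six commands differ only in the function name, so they can also be issued from a loop at the MATLAB/Octave prompt. The following is a minimal sketch, not part of the repository: it assumes the `.c` sources sit in the current directory and reuses the include and library paths from the commands above. The variable names are illustrative.

```matlab
% Function names as listed above; paths as used in the mex commands above.
names  = {'mPAPI_register', 'mPAPI_tic', 'mPAPI_toc', ...
          'mPAPI_groupEvents', 'mPAPI_enumNativeEvents', 'mPAPI_enumPresetEvents'};
incDir = '/usr/local/include';   % assumed location of papi.h
libDir = '/usr/local/lib';       % assumed location of libpapi.so

for k = 1:numel(names)
    src  = [names{k} '.c'];
    args = {['-I' incDir], src, ['-L' libDir], '-lpapi', '-output', names{k}};
    havePapi = exist(fullfile(incDir, 'papi.h'), 'file') == 2;
    haveSrc  = exist(src, 'file') == 2;
    if havePapi && haveSrc
        mex(args{:});            % same arguments as the command form above
    else
        fprintf('skipping: mex %s\n', strjoin(args, ' '));
    end
end
```

When the header or a source file is missing, the loop only prints the command it would have run, which makes it easy to check the paths before building.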
- Register hardware performance monitoring counters (PMC) using preset or native events:
- For the current thread/process:
ev = mPAPI_register({'FP_ARITH:SCALAR_SINGLE', 'L1D:REPLACEMENT', 'PAPI_L2_ICA'})
- In multiplex mode for the current thread:
ev = mPAPI_register({'FP_ARITH:SCALAR_SINGLE', 'L1D:REPLACEMENT', 'PAPI_L2_ICA'}, true)
- For a specific thread/process by PID:
ev = mPAPI_register({'PAPI_TOT_INS'}, 1234)
- Start counters for the specific event-set(s):
mPAPI_tic(ev)
- Read counter measurements:
- For the specific event-set:
>> mPAPI_toc(ev)
ans = [0, 1559, 4032]
- For many event-sets:
>> mPAPI_toc([ev1, ev2])
ans = [0, 1559, 4032; 0, 1450, 3999]
- Enumerate all available native or preset PAPI events:
>> mPAPI_enumNativeEvents()
ans = {'ix86arch::UNHALTED_CORE_CYCLES', 'ix86arch::INSTRUCTION_RETIRED', ...}
>> mPAPI_enumPresetEvents()
ans = {'PAPI_L1_DCM', 'PAPI_L1_ICM', ...}
- Divide events into compatible groups (groups of events that can be measured simultaneously):
>> mPAPI_groupEvents({'PAPI_L1_DCM', 'PAPI_L1_ICM', ...})
ans = {{'PAPI_L1_DCM', 'PAPI_L1_ICM', ...},
...
}
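Putting the calls above together, measuring more events than fit on the hardware at once might look as follows. This is a sketch under assumptions: it presumes that `mPAPI_groupEvents` returns a cell array of cell arrays (as the output above suggests), that `mPAPI_toc` returns one value per event in registration order, and that the chosen preset events exist on the machine. The workload and variable names are illustrative, and the guard skips the measurement when the MEX-files are not on the path.

```matlab
% Events to measure; all names appear earlier in this README, but their
% availability depends on the CPU.
events  = {'PAPI_TOT_INS', 'PAPI_L1_DCM', 'PAPI_L1_ICM'};
results = struct('name', {}, 'count', {});   % empty struct array

if exist('mPAPI_register', 'file') == 3      % 3 means a MEX-file was found
    groups = mPAPI_groupEvents(events);      % assumed: cell array of cell arrays
    for g = 1:numel(groups)
        ev = mPAPI_register(groups{g});      % one event-set per compatible group
        mPAPI_tic(ev);
        x = rand(500);                       % illustrative workload, rerun per group
        s = sum(x(:));
        counts = mPAPI_toc(ev);              % assumed: one value per event, in order
        for k = 1:numel(groups{g})
            results(end+1) = struct('name', groups{g}{k}, 'count', counts(k));
        end
    end
end

for k = 1:numel(results)
    fprintf('%-14s %d\n', results(k).name, results(k).count);
end
```

Because each group is measured in a separate run of the workload, counts from different groups are not strictly comparable. Multiplex mode is the alternative when a single run is required.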
- Register sampling event and frequency (using overflow threshold):
ev = mPAPI_trace_register('PAPI_TOT_INS', 1000000, {'PAPI_BR_INS', 'PAPI_L1_DCM'}, 'kernel.trace')
The first argument is the performance event used as the time base: the program's performance is sampled each time the count of this event advances by the threshold given in the second argument (in the example above, every 1,000,000 instructions counted by `PAPI_TOT_INS`). The third argument is a cell array of performance events to measure. The last argument is the location where the trace is written.
- Start a sub-trace, i.e. a performance trace for a single test:
mPAPI_trace_tic(ev, 'R2015b:1:1:sdaxpy:loop:341:1')
The second argument is a header. To convert the trace to CSV with the `trace2csv` script, the header must follow the convention `env:threads:process:benchmark:version:N:in_process`. The fields are: `env`, the execution environment (e.g. `R2015b`); `threads`, the number of threads; `process`, the number of the test execution across different instances of the environment; `benchmark` and `version`, the kernel and the version used (marked in the code with `%! pragma version`); `N`, the input data size; `in_process`, the repetition of the test within the same instance of the execution environment.
- Perform the test.
- Finish the sub-trace
mPAPI_trace_toc(ev)
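The header is easy to get wrong by hand, so it can help to assemble it from named fields. The sketch below builds a header following the convention above, checks that it splits back into seven fields, and then wraps one test in a sub-trace. The field values mirror the example above. The struct `h`, the axpy-style workload, and the other variable names are illustrative assumptions, and the trace calls are guarded so the snippet still runs where the MEX-files are not built.

```matlab
% Header fields, following env:threads:process:benchmark:version:N:in_process.
h = struct('env', 'R2015b', 'threads', 1, 'process', 1, ...
           'benchmark', 'sdaxpy', 'version', 'loop', 'N', 341, 'in_process', 1);
header = sprintf('%s:%d:%d:%s:%s:%d:%d', h.env, h.threads, h.process, ...
                 h.benchmark, h.version, h.N, h.in_process);

% Sanity check: the header must split into exactly seven fields.
fields = strsplit(header, ':');
assert(numel(fields) == 7, 'header must have exactly seven fields');

if exist('mPAPI_trace_register', 'file') == 3   % 3 means a MEX-file was found
    ev = mPAPI_trace_register('PAPI_TOT_INS', 1000000, ...
                              {'PAPI_BR_INS', 'PAPI_L1_DCM'}, 'kernel.trace');
    mPAPI_trace_tic(ev, header);                % start the sub-trace
    a = 2; x = rand(h.N, 1); y = rand(h.N, 1);
    y = a * x + y;                              % the test (illustrative workload)
    mPAPI_trace_toc(ev);                        % finish the sub-trace
end

disp(header)   % R2015b:1:1:sdaxpy:loop:341:1
```

Keeping the fields in a struct makes it straightforward to loop over `N` or `in_process` and generate one consistent header per sub-trace.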
To use an older version of GCC (newer versions might not be supported by MATLAB's MEX compiler), run mex as follows:
mex GXX='/usr/bin/gcc-X.X' ... % R2013a/R2015b/R2018b
- The number of hardware counters available on the system defines the upper limit of counters you can register using the `mPAPI_register` function.
- Not all hardware counters can be mixed and used simultaneously (except when in multiplex mode).