tests: Add an explanation of run-perfbench.py. · DucRP/micropython@ad308bc · GitHub

Commit ad308bc

tests: Add an explanation of run-perfbench.py.
Also changes this file to a Markdown file.

Signed-off-by: Angus Gratton <gus@projectgus.com>

1 parent ccaf197 · commit ad308bc

2 files changed: +149 -27 lines

tests/README

Lines changed: 0 additions & 27 deletions
This file was deleted.

tests/README.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@

# MicroPython Test Suite

This directory contains tests for various functionality areas of MicroPython.
To run all stable tests, run the `run-tests.py` script in this directory.
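
For example, from this directory (the subset paths below are only illustrative;
see `./run-tests.py --help` for the full set of options):

```
# Run the full stable test suite:
./run-tests.py

# Run only selected tests by passing their paths:
./run-tests.py basics/builtin_abs.py float/float1.py
```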

Tests of capabilities not supported on all platforms should be written
to check for the capability being present. If it is not, the test
should merely output `SKIP` followed by the line terminator, and call
`sys.exit()` to raise `SystemExit`, instead of attempting to test the
missing capability. The testing framework (`run-tests.py` in this
directory, `test_main.c` in `qemu_arm`) recognizes this as a skipped test.
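
A test that depends on an optional capability typically starts with a check
like this minimal sketch (the imported module name is a placeholder, not a
real requirement):

```
import sys

try:
    import some_optional_module  # placeholder: whatever capability the test needs
except ImportError:
    print("SKIP")  # 'SKIP' plus the line terminator
    sys.exit()  # raises SystemExit, so the framework marks the test as skipped
```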

There are a few features for which this mechanism cannot be used to
condition a test. The `run-tests.py` script uses small scripts in the
`feature_check` directory to check whether each such feature is present,
and skips the relevant tests if not.

Tests are generally verified by running the test both in MicroPython and
in CPython and comparing the outputs. If the output differs, the test fails
and the outputs are saved in a `.out` and a `.exp` file respectively.
For tests that cannot be run in CPython, for example because they use
the `machine` module, a `.exp` file can be provided next to the test's `.py`
file. A convenient way to generate it is to run the test, let it fail
(because CPython cannot run it) and then copy the `.out` file (but not
before checking it manually!).
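
A sketch of that workflow, assuming a hypothetical test name (check where your
`run-tests.py` invocation actually writes its `.out` files):

```
# After the failing run, inspect the MicroPython output carefully...
cat some_test.py.out
# ...and only then promote it to the expected output next to the .py file:
cp some_test.py.out some_test.py.exp
```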

When creating new tests, anything that relies on float support should go in the
`float/` subdirectory. Anything that relies on `import x`, where `x` is not a
built-in module, should go in the `import/` subdirectory.

## perf_bench

The `perf_bench` directory contains some performance benchmarks that can be used
to benchmark different MicroPython firmwares or host ports.

The runner utility is `run-perfbench.py`. Execute `./run-perfbench.py --help`
for a full list of command line options.

### Benchmarking a target

To run the tests on a firmware target using `pyboard.py`, run a command like
this:

```
./run-perfbench.py -p -d /dev/ttyACM0 168 100
```

* `-p` indicates running on a remote target via `pyboard.py`, not the host.
* `-d PORTNAME` is the serial port; `/dev/ttyACM0` is the default if not
  provided.
* `168` is value `N`, the approximate CPU frequency in MHz (in this case Pyboard
  V1.1 is 168MHz). It's possible to choose other values as well: lower values
  like `10` will run the tests much more quickly (see the example below), higher
  values like `1000` will run much longer.
* `100` is value `M`, the approximate heap size in kilobytes (can get this from
  `import micropython; micropython.mem_info()` or estimate it). It's possible to
  choose other values here too: lower values like `10` will run shorter/smaller
  tests, and higher values will run bigger tests. The maximum value of `M` is
  limited by available heap, and the tests are written so the "recommended"
  value is approximately the upper limit.
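
For example, a quick but less accurate sanity-check run on the same board could
use a much lower `N` value (the values here are illustrative):

```
./run-perfbench.py -p -d /dev/ttyACM0 10 100
```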

### Benchmarking the host

To benchmark the host build (unix/Windows), run like this:

```
./run-perfbench.py 2000 10000
```

The output of perfbench is a list of tests and times/scores, like this:

```
N=2000 M=10000 n_average=8
perf_bench/bm_chaos.py: SKIP
perf_bench/bm_fannkuch.py: 94550.38 2.9145 84.68 2.8499
perf_bench/bm_fft.py: 79920.38 10.0771 129269.74 8.8205
perf_bench/bm_float.py: 43844.62 17.8229 353219.64 17.7693
perf_bench/bm_hexiom.py: 32959.12 15.0243 775.77 14.8893
perf_bench/bm_nqueens.py: 40855.00 10.7297 247776.15 11.3647
perf_bench/bm_pidigits.py: 64547.75 2.5609 7751.36 2.5996
perf_bench/core_import_mpy_multi.py: 15433.38 14.2733 33065.45 14.2368
perf_bench/core_import_mpy_single.py: 263.00 11.3910 3858.35 12.9021
perf_bench/core_qstr.py: 4929.12 1.8434 8117.71 1.7921
perf_bench/core_yield_from.py: 16274.25 6.2584 12334.13 5.8125
perf_bench/misc_aes.py: 57425.25 5.5226 17888.60 5.7482
perf_bench/misc_mandel.py: 40809.25 8.2007 158107.00 9.8864
perf_bench/misc_pystone.py: 39821.75 6.4145 100867.62 6.5043
perf_bench/misc_raytrace.py: 36293.75 6.8501 26906.93 6.8402
perf_bench/viper_call0.py: 15573.00 14.9931 19644.99 13.1550
perf_bench/viper_call1a.py: 16725.75 9.8205 18099.96 9.2752
perf_bench/viper_call1b.py: 20752.62 8.3372 14565.60 9.0663
perf_bench/viper_call1c.py: 20849.88 5.8783 14444.80 6.6295
perf_bench/viper_call2a.py: 16156.25 11.2956 18818.59 11.7959
perf_bench/viper_call2b.py: 22047.38 8.9484 13725.73 9.6800
```

The numbers across each line are times and scores for the test:

* Runtime average (microseconds, lower is better)
* Runtime standard deviation as a percentage
* Score average (units depend on the benchmark, higher is better)
* Score standard deviation as a percentage

For example, in the output above `perf_bench/bm_fft.py` averaged 79920.38
microseconds per run with a 10.0771% standard deviation, and scored 129269.74
with an 8.8205% standard deviation.

### Comparing performance

Usually you want to know if something is faster or slower than a reference. To
do this, copy the output of each `run-perfbench.py` run to a text file.

This can be done multiple ways, but one way on Linux/macOS is with the `tee`
utility: `./run-perfbench.py -p 168 100 | tee pyb-run1.txt`

Once you have two files with output from two different runs (maybe with
different code or configuration), compare the runtimes with `./run-perfbench.py
-t pyb-run1.txt pyb-run2.txt` or compare scores with `./run-perfbench.py -s
pyb-run1.txt pyb-run2.txt`:

```
> ./run-perfbench.py -s pyb-run1.txt pyb-run2.txt
diff of scores (higher is better)
N=168 M=100 pyb-run1.txt -> pyb-run2.txt diff diff% (error%)
bm_chaos.py 352.90 -> 352.63 : -0.27 = -0.077% (+/-0.00%)
bm_fannkuch.py 77.52 -> 77.45 : -0.07 = -0.090% (+/-0.01%)
bm_fft.py 2516.80 -> 2519.74 : +2.94 = +0.117% (+/-0.00%)
bm_float.py 5749.27 -> 5749.65 : +0.38 = +0.007% (+/-0.00%)
bm_hexiom.py 42.22 -> 42.30 : +0.08 = +0.189% (+/-0.00%)
bm_nqueens.py 4407.55 -> 4414.44 : +6.89 = +0.156% (+/-0.00%)
bm_pidigits.py 638.09 -> 632.14 : -5.95 = -0.932% (+/-0.25%)
core_import_mpy_multi.py 477.74 -> 477.57 : -0.17 = -0.036% (+/-0.00%)
core_import_mpy_single.py 58.74 -> 58.72 : -0.02 = -0.034% (+/-0.00%)
core_qstr.py 63.11 -> 63.11 : +0.00 = +0.000% (+/-0.01%)
core_yield_from.py 357.57 -> 357.57 : +0.00 = +0.000% (+/-0.00%)
misc_aes.py 397.27 -> 396.47 : -0.80 = -0.201% (+/-0.00%)
misc_mandel.py 3375.70 -> 3375.84 : +0.14 = +0.004% (+/-0.00%)
misc_pystone.py 2265.36 -> 2265.97 : +0.61 = +0.027% (+/-0.01%)
misc_raytrace.py 367.61 -> 368.15 : +0.54 = +0.147% (+/-0.01%)
viper_call0.py 605.92 -> 605.92 : +0.00 = +0.000% (+/-0.00%)
viper_call1a.py 576.78 -> 576.78 : +0.00 = +0.000% (+/-0.00%)
viper_call1b.py 452.45 -> 452.46 : +0.01 = +0.002% (+/-0.01%)
viper_call1c.py 457.39 -> 457.39 : +0.00 = +0.000% (+/-0.00%)
viper_call2a.py 561.37 -> 561.37 : +0.00 = +0.000% (+/-0.00%)
viper_call2b.py 389.49 -> 389.50 : +0.01 = +0.003% (+/-0.01%)
```

Note in particular the error percentages at the end of each line. If these are
high relative to the percentage difference then it indicates high variability in
the test runs, and the absolute difference value is unreliable. In the output
above, for example, `bm_pidigits.py` shows a -0.932% difference with a ±0.25%
error, so a noticeable fraction of the apparent change could be measurement
noise. High error percentages are particularly common on PC builds, where the
host OS may influence test run times. Increasing the `N` value may help average
this out by running each test longer.
