# MicroPython Test Suite

This directory contains tests for various functionality areas of MicroPython.
To run all stable tests, run the "run-tests.py" script in this directory.

Tests of capabilities not supported on all platforms should be written
to check for the capability being present. If it is not, the test
should merely output 'SKIP' followed by the line terminator, and call
sys.exit() to raise SystemExit, instead of attempting to test the
missing capability. The testing framework (run-tests.py in this
directory, test_main.c in qemu_arm) recognizes this as a skipped test.
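
For example, a test of an optional module might begin with a guard like this
(a minimal sketch; the choice of uctypes is just an illustration):

```python
import sys

try:
    import uctypes  # the optional capability this test exercises
except ImportError:
    # Capability not present on this port: report a skip and stop the test.
    print("SKIP")
    sys.exit()

# ... the rest of the test would use uctypes here ...
```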

There are a few features for which this mechanism cannot be used to
condition a test. The run-tests.py script uses small scripts in the
feature_check directory to check whether each such feature is present,
and skips the relevant tests if not.
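
For illustration, a feature-check script can be as simple as exercising the
feature and printing a result for run-tests.py to inspect (a hedged sketch
only, not one of the actual scripts; see the feature_check directory for the
real checks):

```python
# Hypothetical feature check: exercise the feature (here, float support) and
# print something; run-tests.py runs this on the target and uses the output
# (or the lack of it) to decide whether to skip the related tests.
print(float("1.5") * 2)
```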

Tests are generally verified by running the test both in MicroPython and
in CPython and comparing the outputs. If the output differs, the test fails
and the outputs are saved in a .out and a .exp file respectively.
For tests that cannot be run in CPython, for example because they use
the machine module, a .exp file can be provided next to the test's .py
file. A convenient way to generate that file is to run the test, let it fail
(because CPython cannot run it), and then copy the .out file (but not
before checking it manually!).
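
As an illustration (not an actual test in this directory), such a test could
look like the following; since CPython cannot import the machine module, the
expected output would be written by hand (or copied from a checked .out file)
into a matching .exp file, here containing the single line `True`:

```python
# Illustrative test that CPython cannot run because it needs the machine module.
# A manually checked .exp file next to this .py file holds the expected output.
import machine

print(hasattr(machine, "freq"))  # expected output: True
```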

When creating new tests, anything that relies on float support should go in the
float/ subdirectory. Anything that relies on import x, where x is not a built-in
module, should go in the import/ subdirectory.

## perf_bench

The `perf_bench` directory contains some performance benchmarks that can be used
to benchmark different MicroPython firmwares or host ports.

The runner utility is `run-perfbench.py`. Execute `./run-perfbench.py --help`
for a full list of command line options.

### Benchmarking a target

To run the tests on a firmware target using `pyboard.py`, run a command line
like this:

```
./run-perfbench.py -p -d /dev/ttyACM0 168 100
```

* `-p` indicates running on a remote target via pyboard.py, not on the host.
* `-d PORTNAME` is the serial port; `/dev/ttyACM0` is the default if it is not
  provided.
* `168` is the value `N`, the approximate CPU frequency in MHz (a Pyboard V1.1
  runs at 168MHz). It's possible to choose other values as well: lower values
  like `10` will run the tests much quicker, and higher values like `1000` will
  run them for much longer.
* `100` is the value `M`, the approximate heap size in kilobytes (you can get
  this from `import micropython; micropython.mem_info()` or estimate it; see
  the sketch after this list). It's possible to choose other values here too:
  lower values like `10` will run shorter/smaller tests, and higher values will
  run bigger tests. The maximum value of `M` is limited by the available heap,
  and the tests are written so the "recommended" value is approximately the
  upper limit.
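
If the `micropython.mem_info()` output is awkward to read, a rough way to
estimate `M` on the target is via the `gc` module (a sketch only; round down a
little when picking `M`):

```python
import gc

gc.collect()
heap_kib = (gc.mem_alloc() + gc.mem_free()) // 1024  # total GC heap in KiB
print(heap_kib)  # use a value at or slightly below this for M
```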

### Benchmarking the host

To benchmark the host build (unix/Windows), run like this:

```
./run-perfbench.py 2000 10000
```

The output of perfbench is a list of tests and times/scores, like this:

```
N=2000 M=10000 n_average=8
perf_bench/bm_chaos.py: SKIP
perf_bench/bm_fannkuch.py: 94550.38 2.9145 84.68 2.8499
perf_bench/bm_fft.py: 79920.38 10.0771 129269.74 8.8205
perf_bench/bm_float.py: 43844.62 17.8229 353219.64 17.7693
perf_bench/bm_hexiom.py: 32959.12 15.0243 775.77 14.8893
perf_bench/bm_nqueens.py: 40855.00 10.7297 247776.15 11.3647
perf_bench/bm_pidigits.py: 64547.75 2.5609 7751.36 2.5996
perf_bench/core_import_mpy_multi.py: 15433.38 14.2733 33065.45 14.2368
perf_bench/core_import_mpy_single.py: 263.00 11.3910 3858.35 12.9021
perf_bench/core_qstr.py: 4929.12 1.8434 8117.71 1.7921
perf_bench/core_yield_from.py: 16274.25 6.2584 12334.13 5.8125
perf_bench/misc_aes.py: 57425.25 5.5226 17888.60 5.7482
perf_bench/misc_mandel.py: 40809.25 8.2007 158107.00 9.8864
perf_bench/misc_pystone.py: 39821.75 6.4145 100867.62 6.5043
perf_bench/misc_raytrace.py: 36293.75 6.8501 26906.93 6.8402
perf_bench/viper_call0.py: 15573.00 14.9931 19644.99 13.1550
perf_bench/viper_call1a.py: 16725.75 9.8205 18099.96 9.2752
perf_bench/viper_call1b.py: 20752.62 8.3372 14565.60 9.0663
perf_bench/viper_call1c.py: 20849.88 5.8783 14444.80 6.6295
perf_bench/viper_call2a.py: 16156.25 11.2956 18818.59 11.7959
perf_bench/viper_call2b.py: 22047.38 8.9484 13725.73 9.6800
```

The numbers across each line are times and scores for the test:

* Runtime average (microseconds, lower is better)
* Runtime standard deviation as a percentage
* Score average (units depend on the benchmark, higher is better)
* Score standard deviation as a percentage

For example, the `bm_fft.py` line above shows an average runtime of 79920.38
microseconds with a standard deviation of 10.0771%, and an average score of
129269.74 with a standard deviation of 8.8205%.

### Comparing performance

Usually you want to know if something is faster or slower than a reference. To
do this, copy the output of each `run-perfbench.py` run to a text file.

This can be done in multiple ways, but one way on Linux/macOS is with the `tee`
utility: `./run-perfbench.py -p 168 100 | tee pyb-run1.txt`

Once you have two files with output from two different runs (maybe with
different code or configuration), compare the runtimes with `./run-perfbench.py
-t pyb-run1.txt pyb-run2.txt` or compare scores with `./run-perfbench.py -s
pyb-run1.txt pyb-run2.txt`:

```
> ./run-perfbench.py -s pyb-run1.txt pyb-run2.txt
diff of scores (higher is better)
N=168 M=100 pyb-run1.txt -> pyb-run2.txt diff diff% (error%)
bm_chaos.py 352.90 -> 352.63 : -0.27 = -0.077% (+/-0.00%)
bm_fannkuch.py 77.52 -> 77.45 : -0.07 = -0.090% (+/-0.01%)
bm_fft.py 2516.80 -> 2519.74 : +2.94 = +0.117% (+/-0.00%)
bm_float.py 5749.27 -> 5749.65 : +0.38 = +0.007% (+/-0.00%)
bm_hexiom.py 42.22 -> 42.30 : +0.08 = +0.189% (+/-0.00%)
bm_nqueens.py 4407.55 -> 4414.44 : +6.89 = +0.156% (+/-0.00%)
bm_pidigits.py 638.09 -> 632.14 : -5.95 = -0.932% (+/-0.25%)
core_import_mpy_multi.py 477.74 -> 477.57 : -0.17 = -0.036% (+/-0.00%)
core_import_mpy_single.py 58.74 -> 58.72 : -0.02 = -0.034% (+/-0.00%)
core_qstr.py 63.11 -> 63.11 : +0.00 = +0.000% (+/-0.01%)
core_yield_from.py 357.57 -> 357.57 : +0.00 = +0.000% (+/-0.00%)
misc_aes.py 397.27 -> 396.47 : -0.80 = -0.201% (+/-0.00%)
misc_mandel.py 3375.70 -> 3375.84 : +0.14 = +0.004% (+/-0.00%)
misc_pystone.py 2265.36 -> 2265.97 : +0.61 = +0.027% (+/-0.01%)
misc_raytrace.py 367.61 -> 368.15 : +0.54 = +0.147% (+/-0.01%)
viper_call0.py 605.92 -> 605.92 : +0.00 = +0.000% (+/-0.00%)
viper_call1a.py 576.78 -> 576.78 : +0.00 = +0.000% (+/-0.00%)
viper_call1b.py 452.45 -> 452.46 : +0.01 = +0.002% (+/-0.01%)
viper_call1c.py 457.39 -> 457.39 : +0.00 = +0.000% (+/-0.00%)
viper_call2a.py 561.37 -> 561.37 : +0.00 = +0.000% (+/-0.00%)
viper_call2b.py 389.49 -> 389.50 : +0.01 = +0.003% (+/-0.01%)
```

Note in particular the error percentages at the end of each line. If these are
high relative to the percentage difference, then there is high variability in
the test runs and the absolute difference value is unreliable. High error
percentages are particularly common on PC builds, where the host OS may
influence test run times. Increasing the `N` value may help average this out by
running each test for longer.