|
| 1 | +.. highlight:: shell-session |
| 2 | + |
| 3 | +.. _perf_profiling: |
| 4 | + |
| 5 | +============================================== |
| 6 | +Python support for the Linux ``perf`` profiler |
| 7 | +============================================== |
| 8 | + |
| 9 | +:author: Pablo Galindo |
| 10 | + |
| 11 | +The Linux ``perf`` profiler is a very powerful tool that allows you to profile and |
| 12 | +obtain information about the performance of your application. ``perf`` also has |
| 13 | +a very vibrant ecosystem of tools that aid with the analysis of the data that it |
| 14 | +produces. |
| 15 | + |
| 16 | +The main problem with using the ``perf`` profiler with Python applications is that |
| 17 | +``perf`` only allows to get information about native symbols, this is, the names of |
| 18 | +the functions and procedures written in C. This means that the names and file names |
| 19 | +of the Python functions in your code will not appear in the output of the ``perf``. |
| 20 | + |
| 21 | +Since Python 3.12, the interpreter can run in a special mode that allows Python |
| 22 | +functions to appear in the output of the ``perf`` profiler. When this mode is |
| 23 | +enabled, the interpreter will interpose a small piece of code compiled on the |
| 24 | +fly before the execution of every Python function and it will teach ``perf`` the |
| 25 | +relationship between this piece of code and the associated Python function using |
| 26 | +`perf map files`_. |
| 27 | + |
| 28 | +.. warning:: |
| 29 | + |
| 30 | + Support for the ``perf`` profiler is only currently available for Linux on |
| 31 | + selected architectures. Check the output of the configure build step or |
| 32 | + check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE`` |
| 33 | + to see if your system is supported. |
| 34 | + |
| 35 | +For example, consider the following script: |
| 36 | + |
| 37 | +.. code-block:: python |
| 38 | +
|
| 39 | + def foo(n): |
| 40 | + result = 0 |
| 41 | + for _ in range(n): |
| 42 | + result += 1 |
| 43 | + return result |
| 44 | +
|
| 45 | + def bar(n): |
| 46 | + foo(n) |
| 47 | +
|
| 48 | + def baz(n): |
| 49 | + bar(n) |
| 50 | +
|
| 51 | + if __name__ == "__main__": |
| 52 | + baz(1000000) |
| 53 | +
|
| 54 | +We can run perf to sample CPU stack traces at 9999 Hertz: |
| 55 | + |
| 56 | + $ perf record -F 9999 -g -o perf.data python my_script.py |
| 57 | + |
| 58 | +Then we can use perf report to analyze the data: |
| 59 | + |
| 60 | +.. code-block:: shell-session |
| 61 | +
|
| 62 | + $ perf report --stdio -n -g |
| 63 | +
|
| 64 | + # Children Self Samples Command Shared Object Symbol |
| 65 | + # ........ ........ ............ .......... .................. .......................................... |
| 66 | + # |
| 67 | + 91.08% 0.00% 0 python.exe python.exe [.] _start |
| 68 | + | |
| 69 | + ---_start |
| 70 | + | |
| 71 | + --90.71%--__libc_start_main |
| 72 | + Py_BytesMain |
| 73 | + | |
| 74 | + |--56.88%--pymain_run_python.constprop.0 |
| 75 | + | | |
| 76 | + | |--56.13%--_PyRun_AnyFileObject |
| 77 | + | | _PyRun_SimpleFileObject |
| 78 | + | | | |
| 79 | + | | |--55.02%--run_mod |
| 80 | + | | | | |
| 81 | + | | | --54.65%--PyEval_EvalCode |
| 82 | + | | | _PyEval_EvalFrameDefault |
| 83 | + | | | PyObject_Vectorcall |
| 84 | + | | | _PyEval_Vector |
| 85 | + | | | _PyEval_EvalFrameDefault |
| 86 | + | | | PyObject_Vectorcall |
| 87 | + | | | _PyEval_Vector |
| 88 | + | | | _PyEval_EvalFrameDefault |
| 89 | + | | | PyObject_Vectorcall |
| 90 | + | | | _PyEval_Vector |
| 91 | + | | | | |
| 92 | + | | | |--51.67%--_PyEval_EvalFrameDefault |
| 93 | + | | | | | |
| 94 | + | | | | |--11.52%--_PyLong_Add |
| 95 | + | | | | | | |
| 96 | + | | | | | |--2.97%--_PyObject_Malloc |
| 97 | + ... |
| 98 | +
|
| 99 | +As you can see here, the Python functions are not shown in the output, only ``_Py_Eval_EvalFrameDefault`` appears |
| 100 | +(the function that evaluates the Python bytecode) shows up. Unfortunately that's not very useful because all Python |
| 101 | +functions use the same C function to evaluate bytecode so we cannot know which Python function corresponds to which |
| 102 | +bytecode-evaluating function. |
| 103 | + |
| 104 | +Instead, if we run the same experiment with perf support activated we get: |
| 105 | + |
| 106 | +.. code-block:: shell-session |
| 107 | +
|
| 108 | + $ perf report --stdio -n -g |
| 109 | +
|
| 110 | + # Children Self Samples Command Shared Object Symbol |
| 111 | + # ........ ........ ............ .......... .................. ..................................................................... |
| 112 | + # |
| 113 | + 90.58% 0.36% 1 python.exe python.exe [.] _start |
| 114 | + | |
| 115 | + ---_start |
| 116 | + | |
| 117 | + --89.86%--__libc_start_main |
| 118 | + Py_BytesMain |
| 119 | + | |
| 120 | + |--55.43%--pymain_run_python.constprop.0 |
| 121 | + | | |
| 122 | + | |--54.71%--_PyRun_AnyFileObject |
| 123 | + | | _PyRun_SimpleFileObject |
| 124 | + | | | |
| 125 | + | | |--53.62%--run_mod |
| 126 | + | | | | |
| 127 | + | | | --53.26%--PyEval_EvalCode |
| 128 | + | | | py::<module>:/src/script.py |
| 129 | + | | | _PyEval_EvalFrameDefault |
| 130 | + | | | PyObject_Vectorcall |
| 131 | + | | | _PyEval_Vector |
| 132 | + | | | py::baz:/src/script.py |
| 133 | + | | | _PyEval_EvalFrameDefault |
| 134 | + | | | PyObject_Vectorcall |
| 135 | + | | | _PyEval_Vector |
| 136 | + | | | py::bar:/src/script.py |
| 137 | + | | | _PyEval_EvalFrameDefault |
| 138 | + | | | PyObject_Vectorcall |
| 139 | + | | | _PyEval_Vector |
| 140 | + | | | py::foo:/src/script.py |
| 141 | + | | | | |
| 142 | + | | | |--51.81%--_PyEval_EvalFrameDefault |
| 143 | + | | | | | |
| 144 | + | | | | |--13.77%--_PyLong_Add |
| 145 | + | | | | | | |
| 146 | + | | | | | |--3.26%--_PyObject_Malloc |
| 147 | +
|
| 148 | +
|
| 149 | +
|
| 150 | +Enabling perf profiling mode |
| 151 | +---------------------------- |
| 152 | + |
| 153 | +There are two main ways to activate the perf profiling mode. If you want it to be |
| 154 | +active since the start of the Python interpreter, you can use the `-Xperf` option: |
| 155 | + |
| 156 | + $ python -Xperf my_script.py |
| 157 | + |
| 158 | +There is also support for dynamically activating and deactivating the perf |
| 159 | +profiling mode by using the APIs in the :mod:`sys` module: |
| 160 | + |
| 161 | +.. code-block:: python |
| 162 | +
|
| 163 | + import sys |
| 164 | + sys.activate_stack_trampoline("perf") |
| 165 | +
|
| 166 | + # Run some code with Perf profiling active |
| 167 | +
|
| 168 | + sys.deactivate_stack_trampoline() |
| 169 | +
|
| 170 | + # Perf profiling is not active anymore |
| 171 | +
|
| 172 | +These APIs can be handy if you want to activate/deactivate profiling mode in |
| 173 | +response to a signal or other communication mechanism with your process. |
| 174 | + |
| 175 | + |
| 176 | + |
| 177 | +Now we can analyze the data with ``perf report``: |
| 178 | + |
| 179 | + $ perf report -g -i perf.data |
| 180 | + |
| 181 | + |
| 182 | +How to obtain the best results |
| 183 | +------------------------------- |
| 184 | + |
| 185 | +For the best results, Python should be compiled with |
| 186 | +``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"`` as this allows |
| 187 | +profilers to unwind using only the frame pointer and not on DWARF debug |
| 188 | +information. This is because as the code that is interposed to allow perf |
| 189 | +support is dynamically generated it doesn't have any DWARF debugging information |
| 190 | +available. |
| 191 | + |
| 192 | +You can check if you system has been compiled with this flag by running: |
| 193 | + |
| 194 | + $ python -m sysconfig | grep 'no-omit-frame-pointer' |
| 195 | + |
| 196 | +If you don't see any output it means that your interpreter has not been compiled with |
| 197 | +frame pointers and therefore it may not be able to show Python functions in the output |
| 198 | +of ``perf``. |
| 199 | + |
| 200 | +.. _perf map files: https://github.com/torvalds/linux/blob/0513e464f9007b70b96740271a948ca5ab6e7dd7/tools/perf/Documentation/jit-interface.txt |
0 commit comments