gh-96143: Allow Linux perf profiler to see Python calls (GH-96123) · python/cpython@6d791a9 · GitHub


Commit 6d791a9

gh-96143: Allow Linux perf profiler to see Python calls (GH-96123)
⚠️ ⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️ ⚠️ If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch**, **reach out by email**, or **suggest code changes directly on GitHub**. If you have any **refinements or optimizations**, please wait until the first version is merged before starting to work on them or proposing them, so we can keep this PR productive.
1 parent 0f733ff commit 6d791a9

24 files changed, +1412 -2 lines changed

Doc/c-api/init_config.rst

Lines changed: 14 additions & 0 deletions
@@ -1155,6 +1155,20 @@ PyConfig
 
       Default: ``-1`` in Python mode, ``0`` in isolated mode.
 
+   .. c:member:: int perf_profiling
+
+      Enable compatibility mode with the perf profiler?
+
+      If non-zero, initialize the perf trampoline. See :ref:`perf_profiling`
+      for more information.
+
+      Set by the :option:`-X perf <-X>` command line option and by the
+      :envvar:`PYTHONPERFSUPPORT` environment variable.
+
+      Default: ``-1``.
+
+      .. versionadded:: 3.12
+
    .. c:member:: int use_environment
 
       Use :ref:`environment variables <using-on-envvars>`?

Doc/howto/index.rst

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Currently, the HOWTOs are:
    ipaddress.rst
    clinic.rst
    instrumentation.rst
+   perf_profiling.rst
    annotations.rst
    isolating-extensions.rst
 

Doc/howto/perf_profiling.rst

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
+.. highlight:: shell-session
+
+.. _perf_profiling:
+
+==============================================
+Python support for the Linux ``perf`` profiler
+==============================================
+
+:author: Pablo Galindo
+
+The Linux ``perf`` profiler is a very powerful tool that allows you to profile and
+obtain information about the performance of your application. ``perf`` also has
+a very vibrant ecosystem of tools that aid in the analysis of the data that it
+produces.
+
+The main problem with using the ``perf`` profiler with Python applications is that
+``perf`` only gets information about native symbols, that is, the names of
+the functions and procedures written in C. This means that the names and file names
+of the Python functions in your code will not appear in the output of ``perf``.
+
+Since Python 3.12, the interpreter can run in a special mode that allows Python
+functions to appear in the output of the ``perf`` profiler. When this mode is
+enabled, the interpreter will interpose a small piece of code compiled on the
+fly before the execution of every Python function, and it will teach ``perf`` the
+relationship between this piece of code and the associated Python function using
+`perf map files`_.
+
+.. warning::
+
+   Support for the ``perf`` profiler is currently only available for Linux on
+   selected architectures. Check the output of the configure build step or
+   check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
+   to see if your system is supported.
+
+For example, consider the following script:
+
+.. code-block:: python
+
+   def foo(n):
+       result = 0
+       for _ in range(n):
+           result += 1
+       return result
+
+   def bar(n):
+       foo(n)
+
+   def baz(n):
+       bar(n)
+
+   if __name__ == "__main__":
+       baz(1000000)
+
+We can run ``perf`` to sample CPU stack traces at 9999 Hertz::
+
+   $ perf record -F 9999 -g -o perf.data python my_script.py
+
+Then we can use ``perf report`` to analyze the data:
+
+.. code-block:: shell-session
+
+   $ perf report --stdio -n -g
+
+   # Children Self Samples Command Shared Object Symbol
+   # ........ ........ ............ .......... .................. ..........................................
+   #
+   91.08% 0.00% 0 python.exe python.exe [.] _start
+   |
+   ---_start
+   |
+   --90.71%--__libc_start_main
+   Py_BytesMain
+   |
+   |--56.88%--pymain_run_python.constprop.0
+   | |
+   | |--56.13%--_PyRun_AnyFileObject
+   | | _PyRun_SimpleFileObject
+   | | |
+   | | |--55.02%--run_mod
+   | | | |
+   | | | --54.65%--PyEval_EvalCode
+   | | | _PyEval_EvalFrameDefault
+   | | | PyObject_Vectorcall
+   | | | _PyEval_Vector
+   | | | _PyEval_EvalFrameDefault
+   | | | PyObject_Vectorcall
+   | | | _PyEval_Vector
+   | | | _PyEval_EvalFrameDefault
+   | | | PyObject_Vectorcall
+   | | | _PyEval_Vector
+   | | | |
+   | | | |--51.67%--_PyEval_EvalFrameDefault
+   | | | | |
+   | | | | |--11.52%--_PyLong_Add
+   | | | | | |
+   | | | | | |--2.97%--_PyObject_Malloc
+   ...
+
+As you can see, the Python functions are not shown in the output; only ``_PyEval_EvalFrameDefault``
+(the function that evaluates the Python bytecode) shows up. Unfortunately, that's not very useful because all Python
+functions use the same C function to evaluate bytecode, so we cannot tell which Python function each
+``_PyEval_EvalFrameDefault`` frame corresponds to.
+
+Instead, if we run the same experiment with ``perf`` support enabled, we get:
+
+.. code-block:: shell-session
+
+   $ perf report --stdio -n -g
+
+   # Children Self Samples Command Shared Object Symbol
+   # ........ ........ ............ .......... .................. .....................................................................
+   #
+   90.58% 0.36% 1 python.exe python.exe [.] _start
+   |
+   ---_start
+   |
+   --89.86%--__libc_start_main
+   Py_BytesMain
+   |
+   |--55.43%--pymain_run_python.constprop.0
+   | |
+   | |--54.71%--_PyRun_AnyFileObject
+   | | _PyRun_SimpleFileObject
+   | | |
+   | | |--53.62%--run_mod
+   | | | |
+   | | | --53.26%--PyEval_EvalCode
+   | | | py::<module>:/src/script.py
+   | | | _PyEval_EvalFrameDefault
+   | | | PyObject_Vectorcall
+   | | | _PyEval_Vector
+   | | | py::baz:/src/script.py
+   | | | _PyEval_EvalFrameDefault
+   | | | PyObject_Vectorcall
+   | | | _PyEval_Vector
+   | | | py::bar:/src/script.py
+   | | | _PyEval_EvalFrameDefault
+   | | | PyObject_Vectorcall
+   | | | _PyEval_Vector
+   | | | py::foo:/src/script.py
+   | | | |
+   | | | |--51.81%--_PyEval_EvalFrameDefault
+   | | | | |
+   | | | | |--13.77%--_PyLong_Add
+   | | | | | |
+   | | | | | |--3.26%--_PyObject_Malloc
+
+
+
+Enabling perf profiling mode
+----------------------------
+
+There are two main ways to activate the perf profiling mode. If you want it to be
+active from the start of the Python interpreter, you can use the ``-X perf`` option::
+
+   $ python -X perf my_script.py
+
+There is also support for dynamically activating and deactivating the perf
+profiling mode by using the APIs in the :mod:`sys` module:
+
+.. code-block:: python
+
+   import sys
+   sys.activate_stack_trampoline("perf")
+
+   # Run some code with perf profiling active
+
+   sys.deactivate_stack_trampoline()
+
+   # Perf profiling is not active anymore
+
+These APIs can be handy if you want to activate/deactivate profiling mode in
+response to a signal or some other form of communication with your process.
+
+
+
+After recording a profile with ``perf record`` while perf profiling mode is active, you can analyze the data with ``perf report``::
+
+   $ perf report -g -i perf.data
+
+
+How to obtain the best results
+-------------------------------
+
+For the best results, Python should be compiled with
+``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"`` as this allows
+profilers to unwind using only the frame pointer instead of relying on DWARF debug
+information. This matters because the code that is interposed to allow ``perf``
+support is generated dynamically, so it does not have any DWARF debugging information
+available.
+
+You can check whether your interpreter has been compiled with this flag by running::
+
+   $ python -m sysconfig | grep 'no-omit-frame-pointer'
+
+If you don't see any output, it means that your interpreter has not been compiled with
+frame pointers and therefore it may not be able to show Python functions in the output
+of ``perf``.
+
+.. _perf map files: https://github.com/torvalds/linux/blob/0513e464f9007b70b96740271a948ca5ab6e7dd7/tools/perf/Documentation/jit-interface.txt
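The two `grep` checks and the `sys` API described in the HOWTO above can be wrapped in a small Python helper. This is a minimal sketch, not part of this commit: `perf_build_info` and `perf_trampoline` are hypothetical names. It assumes that the build exposes a config variable whose name contains `HAVE_PERF_TRAMPOLINE`, and that failed activation raises an exception such as `ValueError`. It also guards with `getattr`, because the `sys` functions only exist on Python 3.12 and later.

```python
import contextlib
import sys
import sysconfig


def perf_build_info() -> dict:
    """Python equivalent of the two ``python -m sysconfig | grep`` checks.

    Assumption: a config variable whose name contains "HAVE_PERF_TRAMPOLINE"
    is truthy on supported builds, and the frame-pointer flag appears in some
    string-valued variable such as CFLAGS.
    """
    config = sysconfig.get_config_vars()
    trampoline = any(
        "HAVE_PERF_TRAMPOLINE" in name and bool(value)
        for name, value in config.items()
    )
    frame_pointers = any(
        isinstance(value, str) and "no-omit-frame-pointer" in value
        for value in config.values()
    )
    return {"trampoline": trampoline, "frame_pointers": frame_pointers}


@contextlib.contextmanager
def perf_trampoline(backend: str = "perf"):
    """Hypothetical helper: keep the perf trampoline active inside a ``with`` block.

    Yields True if the trampoline is active inside the block, False if it could
    not be activated (Python < 3.12 or an unsupported build). Whatever state was
    in effect before the block (e.g. from -X perf) is left in place on exit.
    """
    activate = getattr(sys, "activate_stack_trampoline", None)
    is_active = getattr(sys, "is_stack_trampoline_active", None)
    if activate is None or is_active is None:
        # The sys APIs only exist on Python 3.12 and later.
        yield False
        return
    was_active = is_active()
    if not was_active:
        try:
            activate(backend)
        except (ValueError, RuntimeError, OSError):
            # Assumption: unsupported builds report failure with one of these.
            pass
    try:
        yield is_active()
    finally:
        # Only undo what this helper did; never switch off -X perf/PYTHONPERFSUPPORT.
        if not was_active and is_active():
            sys.deactivate_stack_trampoline()


if __name__ == "__main__":
    print(perf_build_info())
    with perf_trampoline() as active:
        print("perf trampoline active:", active)
        total = sum(range(1_000_000))  # the code you want perf to attribute
```

Restoring the previous state, rather than always calling `sys.deactivate_stack_trampoline()`, keeps the helper safe to nest inside a process that was started with `-X perf`.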

Doc/using/cmdline.rst

Lines changed: 13 additions & 0 deletions
@@ -535,6 +535,12 @@ Miscellaneous options
      development (running from the source tree) then the default is "off".
      Note that the "importlib_bootstrap" and "importlib_bootstrap_external"
      frozen modules are always used, even if this flag is set to "off".
+   * ``-X perf`` to activate compatibility mode with the ``perf`` profiler.
+     When this option is activated, the Linux ``perf`` profiler will be able to
+     report Python calls. This option is only available on some platforms and
+     will do nothing if it is not supported on the current system. The default value
+     is "off". See also :envvar:`PYTHONPERFSUPPORT` and :ref:`perf_profiling`
+     for more information.
 
    It also allows passing arbitrary values and retrieving them through the
    :data:`sys._xoptions` dictionary.
@@ -1025,6 +1031,13 @@ conflict.
 
    .. versionadded:: 3.11
 
+.. envvar:: PYTHONPERFSUPPORT
+
+   If this variable is set to a nonzero value, it activates compatibility mode
+   with the ``perf`` profiler so Python calls can be detected by it. See the
+   :ref:`perf_profiling` section for more information.
+
+   .. versionadded:: 3.12
 
 
 Debug-mode variables
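Both `-X perf` and `PYTHONPERFSUPPORT` are read when the interpreter starts, so a parent process (for example a benchmark runner) has to pass them when it spawns the child. A minimal sketch, where `run_with_perf_support` is a hypothetical helper. The probe command only echoes `sys._xoptions`, which records every `-X` option as described above, so the snippet behaves the same whether or not the build actually supports `perf`.

```python
import os
import subprocess
import sys


def run_with_perf_support(args, use_env=False):
    """Hypothetical helper: run a child interpreter with perf support requested.

    use_env=False passes ``-X perf``; use_env=True sets PYTHONPERFSUPPORT=1
    instead. Per the docs above, both do nothing on platforms without perf
    support, so the child runs normally either way.
    """
    cmd = [sys.executable]
    env = dict(os.environ)
    if use_env:
        env["PYTHONPERFSUPPORT"] = "1"
    else:
        cmd += ["-X", "perf"]
    cmd += list(args)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # -X options land in sys._xoptions; the environment variable does not.
    probe = "import sys; print(sys._xoptions.get('perf'))"
    print(run_with_perf_support(["-c", probe]).stdout.strip())
    print(run_with_perf_support(["-c", probe], use_env=True).stdout.strip())
```

To actually collect samples, the same command list would be run under `perf record -F 9999 -g -o perf.data`, as in the HOWTO.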

Include/cpython/initconfig.h

Lines changed: 1 addition & 0 deletions
@@ -142,6 +142,7 @@ typedef struct PyConfig {
     unsigned long hash_seed;
     int faulthandler;
     int tracemalloc;
+    int perf_profiling;
     int import_time;
     int code_debug_ranges;
     int show_ref_count;

Include/internal/pycore_ceval.h

Lines changed: 21 additions & 0 deletions
@@ -65,6 +65,27 @@ extern PyObject* _PyEval_BuiltinsFromGlobals(
     PyThreadState *tstate,
     PyObject *globals);
 
+// Trampoline API
+
+typedef struct {
+    // Callback to initialize the trampoline state
+    void* (*init_state)(void);
+    // Callback to register every trampoline being created
+    void (*write_state)(void* state, const void *code_addr,
+                        unsigned int code_size, PyCodeObject* code);
+    // Callback to free the trampoline state
+    int (*free_state)(void* state);
+} _PyPerf_Callbacks;
+
+extern int _PyPerfTrampoline_SetCallbacks(_PyPerf_Callbacks *);
+extern void _PyPerfTrampoline_GetCallbacks(_PyPerf_Callbacks *);
+extern int _PyPerfTrampoline_Init(int activate);
+extern int _PyPerfTrampoline_Fini(void);
+extern int _PyIsPerfTrampolineActive(void);
+extern PyStatus _PyPerfTrampoline_AfterFork_Child(void);
+#ifdef PY_HAVE_PERF_TRAMPOLINE
+extern _PyPerf_Callbacks _Py_perfmap_callbacks;
+#endif
 
 static inline PyObject*
 _PyEval_EvalFrame(PyThreadState *tstate, struct _PyInterpreterFrame *frame, int throwflag)
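The `_PyPerf_Callbacks` table separates the trampoline machinery from the backend that records each trampoline. To make the `init_state` / `write_state` / `free_state` contract concrete, here is a pure-Python model of a perf-map backend. It is a sketch, not the C implementation: `PerfMapModel`, `PerfMapEntry` and `parse_perf_map` are hypothetical names. The line format (`START SIZE symbol`, hex without `0x`, as in the perf map files document linked from the HOWTO), combined with symbols shaped like the `py::foo:/src/script.py` entries in the `perf report` output, is an assumption about what the real backend writes.

```python
from __future__ import annotations

import io
import types
from typing import Callable, NamedTuple, TextIO


class PerfMapEntry(NamedTuple):
    start: int   # start address of the trampoline code
    size: int    # size of the trampoline code in bytes
    symbol: str  # e.g. "py::foo:/src/script.py"


class PerfMapModel:
    """Hypothetical pure-Python model of a _PyPerf_Callbacks backend.

    init_state  -> open the stream that plays the role of the opaque state
    write_state -> record one trampoline as "START SIZE symbol" (hex, no 0x)
    free_state  -> release the state; returns 0 on success like the C callback
    """

    def __init__(self, open_stream: Callable[[], TextIO]):
        self._open_stream = open_stream

    def init_state(self) -> TextIO:
        return self._open_stream()

    @staticmethod
    def write_state(state: TextIO, code_addr: int, code_size: int,
                    code: types.CodeType) -> None:
        # co_qualname exists on 3.11+; fall back to co_name on older versions.
        name = getattr(code, "co_qualname", code.co_name)
        state.write(f"{code_addr:x} {code_size:x} py::{name}:{code.co_filename}\n")

    @staticmethod
    def free_state(state: TextIO) -> int:
        state.close()
        return 0


def parse_perf_map(text: str) -> list[PerfMapEntry]:
    """Parse "START SIZE symbol" lines; the symbol itself may contain spaces."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, size, symbol = line.split(" ", 2)
        entries.append(PerfMapEntry(int(start, 16), int(size, 16), symbol))
    return entries


if __name__ == "__main__":
    def foo(n):
        return n

    buf = io.StringIO()
    backend = PerfMapModel(lambda: buf)
    state = backend.init_state()
    backend.write_state(state, 0x7F0000001000, 0x40, foo.__code__)
    text = buf.getvalue()  # read before free_state closes the buffer
    backend.free_state(state)
    print(parse_perf_map(text))
```

Keeping the state opaque (a plain stream here, a `void*` in the header) is what lets the trampoline code stay agnostic of where entries end up.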

Lib/test/test_embed.py

Lines changed: 5 additions & 0 deletions
@@ -436,6 +436,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
         'hash_seed': 0,
         'faulthandler': 0,
         'tracemalloc': 0,
+        'perf_profiling': 0,
         'import_time': 0,
         'code_debug_ranges': 1,
         'show_ref_count': 0,
@@ -520,6 +521,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
         use_hash_seed=0,
         faulthandler=0,
         tracemalloc=0,
+        perf_profiling=0,
         pathconfig_warnings=0,
     )
     if MS_WINDOWS:
@@ -828,6 +830,7 @@ def test_init_from_config(self):
             'use_hash_seed': 1,
             'hash_seed': 123,
             'tracemalloc': 2,
+            'perf_profiling': 0,
             'import_time': 1,
             'code_debug_ranges': 0,
             'show_ref_count': 1,
@@ -890,6 +893,7 @@ def test_init_compat_env(self):
             'use_hash_seed': 1,
             'hash_seed': 42,
             'tracemalloc': 2,
+            'perf_profiling': 0,
             'import_time': 1,
             'code_debug_ranges': 0,
             'malloc_stats': 1,
@@ -921,6 +925,7 @@ def test_init_python_env(self):
             'use_hash_seed': 1,
             'hash_seed': 42,
             'tracemalloc': 2,
+            'perf_profiling': 0,
             'import_time': 1,
             'code_debug_ranges': 0,
             'malloc_stats': 1,
