Commit b9caa09 (python/cpython)

gh-118518: Improve perf docs (#118708)
1 parent a94ac56 commit b9caa09

1 file changed, +48 -22 lines changed

Doc/howto/perf_profiling.rst

Lines changed: 48 additions & 22 deletions
@@ -162,12 +162,12 @@ the :option:`!-X` option takes precedence over the environment variable.
 
 Example, using the environment variable::
 
-   $ PYTHONPERFSUPPORT=1 python script.py
+   $ PYTHONPERFSUPPORT=1 perf record -F 9999 -g -o perf.data python script.py
    $ perf report -g -i perf.data
 
 Example, using the :option:`!-X` option::
 
-   $ python -X perf script.py
+   $ perf record -F 9999 -g -o perf.data python -X perf script.py
    $ perf report -g -i perf.data
 
 Example, using the :mod:`sys` APIs in file :file:`example.py`:
@@ -184,7 +184,7 @@ Example, using the :mod:`sys` APIs in file :file:`example.py`:
 
 ...then::
 
-   $ python ./example.py
+   $ perf record -F 9999 -g -o perf.data python ./example.py
    $ perf report -g -i perf.data
 
 
@@ -210,31 +210,57 @@ of ``perf``.
 How to work without frame pointers
 ----------------------------------
 
-If you are working with a Python interpreter that has been compiled without frame pointers
-you can still use the ``perf`` profiler but the overhead will be a bit higher because Python
-needs to generate unwinding information for every Python function call on the fly. Additionally,
-``perf`` will take more time to process the data because it will need to use the DWARF debugging
-information to unwind the stack and this is a slow process.
+If you are working with a Python interpreter that has been compiled without
+frame pointers, you can still use the ``perf`` profiler, but the overhead will be
+a bit higher because Python needs to generate unwinding information for every
+Python function call on the fly. Additionally, ``perf`` will take more time to
+process the data because it will need to use the DWARF debugging information to
+unwind the stack and this is a slow process.
 
-To enable this mode, you can use the environment variable :envvar:`PYTHON_PERF_JIT_SUPPORT` or the
-:option:`-X perf_jit <-X>` option, which will enable the JIT mode for the ``perf`` profiler.
+To enable this mode, you can use the environment variable
+:envvar:`PYTHON_PERF_JIT_SUPPORT` or the :option:`-X perf_jit <-X>` option,
+which will enable the JIT mode for the ``perf`` profiler.
 
-When using the perf JIT mode, you need an extra step before you can run ``perf report``. You need to
-call the ``perf inject`` command to inject the JIT information into the ``perf.data`` file.
+.. note::
+
+   Due to a bug in the ``perf`` tool, only ``perf`` versions higher than v6.8
+   will work with the JIT mode. The fix was also backported to the v6.7.2
+   version of the tool.
+
+   Note that when checking the version of the ``perf`` tool (which can be done
+   by running ``perf version``) you must take into account that some distros
+   add some custom version numbers including a ``-`` character. This means
+   that ``perf 6.7-3`` is not necessarily ``perf 6.7.3``.
+
+When using the perf JIT mode, you need an extra step before you can run ``perf
+report``. You need to call the ``perf inject`` command to inject the JIT
+information into the ``perf.data`` file.::
 
    $ perf record -F 9999 -g --call-graph dwarf -o perf.data python -Xperf_jit my_script.py
-   $ perf inject -i perf.data --jit
-   $ perf report -g -i perf.data
+   $ perf inject -i perf.data --jit --output perf.jit.data
+   $ perf report -g -i perf.jit.data
 
 or using the environment variable::
 
    $ PYTHON_PERF_JIT_SUPPORT=1 perf record -F 9999 -g --call-graph dwarf -o perf.data python my_script.py
-   $ perf inject -i perf.data --jit
-   $ perf report -g -i perf.data
-
-Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take snapshots of the stack of
-the process being profiled and save the information in the ``perf.data`` file. By default the size of
-the stack dump is 8192 bytes but the user can change the size by passing the size after comma like
-``--call-graph dwarf,4096``. The size of the stack dump is important because if the size is too small
-``perf`` will not be able to unwind the stack and the output will be incomplete.
+   $ perf inject -i perf.data --jit --output perf.jit.data
+   $ perf report -g -i perf.jit.data
+
+``perf inject --jit`` command will read ``perf.data``,
+automatically pick up the perf dump file that Python creates (in
+``/tmp/perf-$PID.dump``), and then create ``perf.jit.data`` which merges all the
+JIT information together. It should also create a lot of ``jitted-XXXX-N.so``
+files in the current directory which are ELF images for all the JIT trampolines
+that were created by Python.
+
+.. warning::
+   Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take
+   snapshots of the stack of the process being profiled and save the
+   information in the ``perf.data`` file. By default the size of the stack dump
+   is 8192 bytes but the user can change the size by passing the size after
+   comma like ``--call-graph dwarf,4096``. The size of the stack dump is
+   important because if the size is too small ``perf`` will not be able to
+   unwind the stack and the output will be incomplete. On the other hand, if
+   the size is too big, then ``perf`` won't be able to sample the process as
+   frequently as it would like as the overhead will be higher.
 
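
The version caveat in the note added by this commit can be checked mechanically. The following is a minimal sketch and not part of the commit: the helper names `parse_perf_version` and `jit_mode_supported` are hypothetical, and it assumes `perf version` prints a line such as `perf version 6.7.3`. Because the note says a distro suffix after `-` is not necessarily a patch level, the sketch returns `None` ("cannot tell") in that case rather than guessing. It also reads "higher than v6.8" literally, so a bare `6.8` is reported as unknown.

```python
import re


def parse_perf_version(text):
    """Return (major, minor, patch) parsed from `perf version` output.

    patch is None when no dotted patch component is present, for example
    for a distro-style string such as "6.7-3" (assumed format).
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3)) if match.group(3) is not None else None
    return major, minor, patch


def jit_mode_supported(text):
    """Return True, False, or None (unknown) for perf JIT mode support.

    Thresholds follow the note in the diff: versions higher than 6.8 work,
    and the fix was backported to 6.7.2.
    """
    major, minor, patch = parse_perf_version(text)
    if (major, minor) > (6, 8):
        return True
    if (major, minor) == (6, 8):
        # Assumption: a bare 6.8 (or 6.8.0) is not "higher than v6.8".
        return True if patch else None
    if (major, minor) == (6, 7):
        if patch is None:
            # A suffix such as "-3" is a distro revision, not a patch level.
            return None
        return patch >= 2
    return False
```

A caller could feed it the output of `subprocess.run(["perf", "version"], capture_output=True, text=True).stdout` and fall back to a manual check whenever the result is `None`.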

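
The three-step JIT workflow documented in the diff (record, inject, report) can also be driven from a script. This is a hedged sketch under stated assumptions: `perf_jit_commands` and `run_perf_jit` are hypothetical helper names, the flags are copied from the commands shown in the diff, and `-X perf_jit` is passed as two arguments, which Python treats the same as `-Xperf_jit`.

```python
import subprocess
import sys


def perf_jit_commands(script, data="perf.data", jit_data="perf.jit.data",
                      python=sys.executable, freq=9999):
    """Build the record, inject, and report command lines as argument lists."""
    record = ["perf", "record", "-F", str(freq), "-g", "--call-graph", "dwarf",
              "-o", data, python, "-X", "perf_jit", script]
    # Write the merged JIT data to a separate file, as the updated docs do.
    inject = ["perf", "inject", "-i", data, "--jit", "--output", jit_data]
    report = ["perf", "report", "-g", "-i", jit_data]
    return [record, inject, report]


def run_perf_jit(script):
    """Run the three steps in order, stopping at the first failure."""
    for command in perf_jit_commands(script):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes the argument lists easy to inspect or log before anything is run.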