From a314710a3a41cd3cfce89494d034371cc43c11c7 Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Tue, 7 May 2024 14:29:17 +0100
Subject: [PATCH 1/3] gh-118518: Improve perf docs

---
 Doc/howto/perf_profiling.rst | 70 ++++++++++++++++++++++++------------
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/Doc/howto/perf_profiling.rst b/Doc/howto/perf_profiling.rst
index ed2b76ff4f410c..417646efa813f6 100644
--- a/Doc/howto/perf_profiling.rst
+++ b/Doc/howto/perf_profiling.rst
@@ -162,12 +162,12 @@ the :option:`!-X` option takes precedence over the environment variable.

 Example, using the environment variable::

-    $ PYTHONPERFSUPPORT=1 python script.py
+    $ PYTHONPERFSUPPORT=1 perf record -F 9999 -g -o perf.data python script.py
     $ perf report -g -i perf.data

 Example, using the :option:`!-X` option::

-    $ python -X perf script.py
+    $ perf record -F 9999 -g -o perf.data python -X perf script.py
     $ perf report -g -i perf.data

 Example, using the :mod:`sys` APIs in file :file:`example.py`:
@@ -184,7 +184,7 @@ Example, using the :mod:`sys` APIs in file :file:`example.py`:

 ...then::

-    $ python ./example.py
+    $ perf record -F 9999 -g -o perf.data python ./example.py
     $ perf report -g -i perf.data


@@ -210,31 +210,57 @@ of ``perf``.
 How to work without frame pointers
 ----------------------------------

-If you are working with a Python interpreter that has been compiled without frame pointers
-you can still use the ``perf`` profiler but the overhead will be a bit higher because Python
-needs to generate unwinding information for every Python function call on the fly. Additionally,
-``perf`` will take more time to process the data because it will need to use the DWARF debugging
-information to unwind the stack and this is a slow process.
+If you are working with a Python interpreter that has been compiled without
+frame pointers you can still use the ``perf`` profiler but the overhead will be
+a bit higher because Python needs to generate unwinding information for every
+Python function call on the fly. Additionally, ``perf`` will take more time to
+process the data because it will need to use the DWARF debugging information to
+unwind the stack and this is a slow process.

-To enable this mode, you can use the environment variable :envvar:`PYTHON_PERF_JIT_SUPPORT` or the
-:option:`-X perf_jit <-X>` option, which will enable the JIT mode for the ``perf`` profiler.
+To enable this mode, you can use the environment variable
+:envvar:`PYTHON_PERF_JIT_SUPPORT` or the :option:`-X perf_jit <-X>` option,
+which will enable the JIT mode for the ``perf`` profiler.

-When using the perf JIT mode, you need an extra step before you can run ``perf report``. You need to
-call the ``perf inject`` command to inject the JIT information into the ``perf.data`` file.
+.. note::
+
+   Due to a bug in the ``perf`` tool, only ``perf`` versions higher than v6.8
+   will work with the JIT mode. The fix was also backported to the v6.7.2
+   version of the tool. 
+   
+   Note that when checking the version of the ``perf`` tool (which can be done
+   by running ``perf version``) you must take into account that some distros
+   add some custom version numbers including a ``-`` character. This means
+   that ``perf 6.7-3`` is not necessarily ``perf 6.7.3``.
+
+When using the perf JIT mode, you need an extra step before you can run ``perf
+report``. You need to call the ``perf inject`` command to inject the JIT
+information into the ``perf.data`` file.::

     $ perf record -F 9999 -g --call-graph dwarf -o perf.data python -Xperf_jit my_script.py
-    $ perf inject -i perf.data --jit
-    $ perf report -g -i perf.data
+    $ perf inject -i perf.data --jit --output perf.jit.data
+    $ perf report -g -i perf.jit.data

 or using the environment variable::

     $ PYTHON_PERF_JIT_SUPPORT=1 perf record -F 9999 -g --call-graph dwarf -o perf.data python my_script.py
-    $ perf inject -i perf.data --jit
-    $ perf report -g -i perf.data
-
-Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take snapshots of the stack of
-the process being profiled and save the information in the ``perf.data`` file. By default the size of
-the stack dump is 8192 bytes but the user can change the size by passing the size after comma like
-``--call-graph dwarf,4096``. The size of the stack dump is important because if the size is too small
-``perf`` will not be able to unwind the stack and the output will be incomplete.
+    $ perf inject -i perf.data --jit --output perf.jit.data
+    $ perf report -g -i perf.jit.data
+
+When ``perf inject --jit`` its called. this will read ``perf.data``,
+automatically pick up the perf dump file that python creates (in
+``/tmp/perf-$PID.dump``), and then create ``perf.jit.data`` which merges all the
+JIT information together. This should also create a lot of ``jitted-XXXX-N.so``
+files in the current directory which are ELF images for all the JIT trampolines
+that were created by Python.
+
+.. warning::
+   Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take
+   snapshots of the stack of the process being profiled and save the
+   information in the ``perf.data`` file. By default the size of the stack dump
+   is 8192 bytes but the user can change the size by passing the size after
+   comma like ``--call-graph dwarf,4096``. The size of the stack dump is
+   important because if the size is too small ``perf`` will not be able to
+   unwind the stack and the output will be incomplete. On the other hand, if
+   the size is too big, then ``perf`` won't be able to sample the process as
+   frequently as it would like as the overhead will be higher.

From 50d8e06b8ad628cff9b6b24c4da9d60fd84d0168 Mon Sep 17 00:00:00 2001
From: Pablo Galindo Salgado
Date: Tue, 7 May 2024 14:49:06 +0100
Subject: [PATCH 2/3] Update Doc/howto/perf_profiling.rst

Co-authored-by: Kerim Kabirov
---
 Doc/howto/perf_profiling.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Doc/howto/perf_profiling.rst b/Doc/howto/perf_profiling.rst
index 417646efa813f6..bd292b9ba62e25 100644
--- a/Doc/howto/perf_profiling.rst
+++ b/Doc/howto/perf_profiling.rst
@@ -246,10 +246,10 @@ or using the environment variable::
     $ perf inject -i perf.data --jit --output perf.jit.data
     $ perf report -g -i perf.jit.data

-When ``perf inject --jit`` its called. this will read ``perf.data``,
-automatically pick up the perf dump file that python creates (in
+``perf inject --jit`` command will read ``perf.data``,
+automatically pick up the perf dump file that Python creates (in
 ``/tmp/perf-$PID.dump``), and then create ``perf.jit.data`` which merges all the
-JIT information together. This should also create a lot of ``jitted-XXXX-N.so``
+JIT information together. It should also create a lot of ``jitted-XXXX-N.so``
 files in the current directory which are ELF images for all the JIT trampolines
 that were created by Python.

From 01c1b35c283131a6ad66b67a83a1e24b3f84c8ea Mon Sep 17 00:00:00 2001
From: Pablo Galindo Salgado
Date: Tue, 7 May 2024 16:35:07 +0100
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Jelle Zijlstra
Co-authored-by: Kerim Kabirov
---
 Doc/howto/perf_profiling.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Doc/howto/perf_profiling.rst b/Doc/howto/perf_profiling.rst
index bd292b9ba62e25..06459d1b222964 100644
--- a/Doc/howto/perf_profiling.rst
+++ b/Doc/howto/perf_profiling.rst
@@ -211,7 +211,7 @@ How to work without frame pointers
 ----------------------------------

 If you are working with a Python interpreter that has been compiled without
-frame pointers you can still use the ``perf`` profiler but the overhead will be
+frame pointers, you can still use the ``perf`` profiler, but the overhead will be
 a bit higher because Python needs to generate unwinding information for every
 Python function call on the fly. Additionally, ``perf`` will take more time to
 process the data because it will need to use the DWARF debugging information to
@@ -225,8 +225,8 @@ which will enable the JIT mode for the ``perf`` profiler.

    Due to a bug in the ``perf`` tool, only ``perf`` versions higher than v6.8
    will work with the JIT mode. The fix was also backported to the v6.7.2
-   version of the tool. 
-   
+   version of the tool.
+
    Note that when checking the version of the ``perf`` tool (which can be done
    by running ``perf version``) you must take into account that some distros
    add some custom version numbers including a ``-`` character. This means