-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
GH-91048: Add utils for capturing async call stack for asyncio programs and enable profiling #124640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-91048: Add utils for capturing async call stack for asyncio programs and enable profiling #124640
Changes from 31 commits
1b01a91
0fc5511
1d20a51
c8be18e
abf2cb9
20ceab7
72d9321
c9475f6
e1099e9
817f88b
54386ac
98434f0
485c166
8802be7
1ddc9cf
bc9beb8
fd141d4
2d72f24
391defa
d6357fd
bb3b6df
54c99ec
c1a4f09
027d522
c2d5ec6
08d09eb
fe3113b
18ec26d
e4cc462
d5cdc36
83606f2
5edac41
8dc6d34
30884ea
1317658
81b0a31
258ce3d
b9ecefb
b77dcb0
8867946
87d2524
b47bef1
230b7ec
b1d6158
ac51364
c7e59eb
9eba5e1
59121f6
f8f48f0
74c5ad1
067c043
9f04911
0774805
3048493
1f42873
7799391
03ed5c1
21f9ea9
8a43dfa
b3fae68
d0aedf0
df0032a
0ce241b
8f126f6
966d84e
f56468a
404b88a
911fed8
ab511a4
c3c685a
785adeb
a577328
064129a
ce332d9
d6d943f
703ff46
e867863
9cb5b29
61b2b7b
9533ab9
596191d
ad9152e
066bf21
4caeec4
38f061d
a8dd667
cf8f5e5
eda9c7c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
.. currentmodule:: asyncio | ||
|
||
|
||
.. _asyncio-stack: | ||
|
||
=================== | ||
Stack Introspection | ||
=================== | ||
|
||
**Source code:** :source:`Lib/asyncio/stack.py` | ||
|
||
------------------------------------- | ||
|
||
asyncio has powerful runtime call stack introspection utilities | ||
willingc marked this conversation as resolved.
Show resolved
Hide resolved
|
||
to trace the entire call graph of a running coroutine or task, or | ||
a suspended *future*. | ||
|
||
.. versionadded:: 3.14 | ||
|
||
|
||
.. function:: print_call_graph(*, future=None, file=None, depth=1) | ||
|
||
Print the async call graph for the current task or the provided | ||
:class:`Task` or :class:`Future`. | ||
|
||
The function receives an optional keyword-only *future* argument. | ||
If not passed, the current running task will be used. If there's no | ||
current task, the function returns ``None``. | ||
|
||
If the function is called on *the current task*, the optional | ||
keyword-only ``depth`` argument can be used to skip the specified | ||
1st1 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
number of frames from top of the stack. | ||
|
||
If *file* is not specified the function will print to :data:`sys.stdout`. | ||
|
||
**Example:** | ||
|
||
The following Python code: | ||
|
||
.. code-block:: python | ||
|
||
import asyncio | ||
|
||
async def test(): | ||
asyncio.print_call_graph() | ||
|
||
async def main(): | ||
async with asyncio.TaskGroup() as g: | ||
g.create_task(test()) | ||
|
||
asyncio.run(main()) | ||
|
||
will print:: | ||
|
||
* Task(name='Task-2', id=0x1039f0fe0) | ||
+ Call stack: | ||
| File 't2.py', line 4, in async test() | ||
+ Awaited by: | ||
* Task(name='Task-1', id=0x103a5e060) | ||
+ Call stack: | ||
| File 'taskgroups.py', line 107, in async TaskGroup.__aexit__() | ||
| File 't2.py', line 7, in async main() | ||
|
||
For rendering the call stack to a string the following pattern | ||
should be used: | ||
|
||
.. code-block:: python | ||
|
||
import io | ||
|
||
... | ||
|
||
buf = io.StringIO() | ||
asyncio.print_call_graph(file=buf) | ||
output = buf.getvalue() | ||
|
||
|
||
.. function:: capture_call_graph(*, future=None) | ||
|
||
Capture the async call graph for the current task or the provided | ||
:class:`Task` or :class:`Future`. | ||
|
||
The function receives an optional keyword-only *future* argument. | ||
If not passed, the current running task will be used. If there's no | ||
current task, the function returns ``None``. | ||
|
||
If the function is called on *the current task*, the optional | ||
keyword-only ``depth`` argument can be used to skip the specified | ||
number of frames from top of the stack. | ||
|
||
Returns a ``FutureCallGraph`` named tuple: | ||
1st1 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
* ``FutureCallGraph(future, call_stack, awaited_by)`` | ||
|
||
Where 'future' is a reference to a *Future* or a *Task* | ||
(or their subclasses.) | ||
|
||
``call_stack`` is a list of ``FrameCallGraphEntry`` objects. | ||
|
||
``awaited_by`` is a list of ``FutureCallGraph`` tuples. | ||
|
||
* ``FrameCallGraphEntry(frame)`` | ||
|
||
Where ``frame`` is a frame object of a regular Python function | ||
in the call stack. | ||
|
||
|
||
Low level utility functions | ||
=========================== | ||
|
||
To introspect an async call graph asyncio requires cooperation from | ||
control flow structures, such as :func:`shield` or :class:`TaskGroup`. | ||
Any time an intermediate ``Future`` object with low-level APIs like | ||
:meth:`Future.add_done_callback() <asyncio.Future.add_done_callback>` is | ||
involved, the following two functions should be used to inform *asyncio* | ||
about how exactly such intermediate future objects are connected with | ||
the tasks they wrap or control. | ||
|
||
|
||
.. function:: future_add_to_awaited_by(future, waiter, /) | ||
|
||
Record that *future* is awaited on by *waiter*. | ||
|
||
Both *future* and *waiter* must be instances of | ||
:class:`asyncio.Future <Future>` or :class:`asyncio.Task <Task>` or | ||
their subclasses, otherwise the call would have no effect. | ||
1st1 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
A call to ``future_add_to_awaited_by()`` must be followed by an | ||
eventual call to the ``future_discard_from_awaited_by()`` function | ||
with the same arguments. | ||
|
||
|
||
.. function:: future_discard_from_awaited_by(future, waiter, /) | ||
|
||
Record that *future* is no longer awaited on by *waiter*. | ||
|
||
Both *future* and *waiter* must be instances of | ||
:class:`asyncio.Future <Future>` or :class:`asyncio.Task <Task>` or | ||
their subclasses, otherwise the call would have no effect. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,6 +22,13 @@ extern "C" { | |
PyObject *prefix##_qualname; \ | ||
_PyErr_StackItem prefix##_exc_state; \ | ||
PyObject *prefix##_origin_or_finalizer; \ | ||
/* A *borrowed* reference to a task that drives the coroutine. \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is adding yet another field generators for the use of async. Is there no way to add this tasks, not generators? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, I don't see how it is safe for this to be a borrowed reference. What guarantee is there that the object passed to If the answer is "_PyCoro_SetTask is an internal function", then why is it used by asyncio which is a module? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
AFAICT there is too much entanglement between generators and coroutines to split their implementation entirely to avoid this. How can we measure what this unwanted expense actually is? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Splitting their implementations is desirable, but I appreciate that it won't happen in this PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That means that it not only does it takes space and needs initializing, it will also need to be cleared when the generator is destroyed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @markshannon we agree with your feedback. I'll refactor the PR to remove the need for |
||
The field is meant to be used by profilers and debuggers only. \ | ||
The main invariant is that a task can't get GC'ed while \ | ||
the coroutine it drives is alive and vice versa. \ | ||
Profilers can use this field to reconstruct the full async \ | ||
call stack of program. */ \ | ||
PyObject *prefix##_task; \ | ||
char prefix##_hooks_inited; \ | ||
char prefix##_closed; \ | ||
char prefix##_running_async; \ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,10 @@ extern "C" { | |
#include "pycore_typeobject.h" // struct _types_runtime_state | ||
#include "pycore_unicodeobject.h" // struct _Py_unicode_runtime_state | ||
|
||
#if defined(__APPLE__) | ||
# include <mach-o/loader.h> | ||
#endif | ||
|
||
struct _getargs_runtime_state { | ||
struct _PyArg_Parser *static_parsers; | ||
}; | ||
|
@@ -59,6 +63,37 @@ typedef struct _Py_AuditHookEntry { | |
void *userData; | ||
} _Py_AuditHookEntry; | ||
|
||
// Macros to burn global values in custom sections so out-of-process | ||
ambv marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// profilers can locate them easily | ||
|
||
#define GENERATE_DEBUG_SECTION(name, declaration) \ | ||
_GENERATE_DEBUG_SECTION_WINDOWS(name) \ | ||
_GENERATE_DEBUG_SECTION_APPLE(name) \ | ||
declaration \ | ||
_GENERATE_DEBUG_SECTION_LINUX(name) | ||
|
||
#if defined(MS_WINDOWS) | ||
#define _GENERATE_DEBUG_SECTION_WINDOWS(name) \ | ||
_Pragma(Py_STRINGIFY(section(Py_STRINGIFY(name), read, write))) \ | ||
__declspec(allocate(Py_STRINGIFY(name))) | ||
#else | ||
#define _GENERATE_DEBUG_SECTION_WINDOWS(name) | ||
#endif | ||
|
||
#if defined(__APPLE__) | ||
#define _GENERATE_DEBUG_SECTION_APPLE(name) \ | ||
__attribute__((section(SEG_DATA "," Py_STRINGIFY(name)))) | ||
#else | ||
#define _GENERATE_DEBUG_SECTION_APPLE(name) | ||
#endif | ||
|
||
#if defined(__linux__) && (defined(__GNUC__) || defined(__clang__)) | ||
#define _GENERATE_DEBUG_SECTION_LINUX(name) \ | ||
__attribute__((section("." Py_STRINGIFY(name)))) | ||
#else | ||
#define _GENERATE_DEBUG_SECTION_LINUX(name) | ||
#endif | ||
|
||
typedef struct _Py_DebugOffsets { | ||
char cookie[8]; | ||
uint64_t version; | ||
|
@@ -108,6 +143,7 @@ typedef struct _Py_DebugOffsets { | |
uint64_t instr_ptr; | ||
uint64_t localsplus; | ||
uint64_t owner; | ||
uint64_t stackpointer; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The stack pointer is not always readable, nor is it always correct, nor is it part of the ABI, and when it is correct or not may change at a whim. Overall, I think it is better not to publish this offset lest it suggest that the value is meaningful. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is needed because there is no other way to get the current running coroutine object from the frame with the current new layout from external profilers they can only read memory. Please check the tests. If you have a better alternative I am happy to move to that but right now I don't see any other possibility There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Can you point me at (pun not intended) a situation where the stack pointer will be invalid in a way that will make the external introspection (done in the tests here) not work?
Is there a design that you have in mind that would result in the removal of this entirely? AFAICT if we decided to drop deferred refcounting, we'd return to what There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@markshannon Mark, I think the contract here is that CPython can move freely with its refactorings and changes. If Same as internal layout of Set/Dict/asyncio objects -- they are all subject to break from one version to another. I really think this particular change is fine -- CPython doesn't care nor does guarantee anything whatsoever here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We experimented with a register VM. We ultimately decided against it, but if we had gone for it there would have been no stack pointer.
The problem is that optimizations may leave the tests working, but the stack pointer could easily be meaningless in other places. We expect that the stack will consist of zero or more values in registers followed by the remaining values in memory, with the stack pointer pointing to some offset from the in-memory stack top. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@pablogsal There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
(Modules/_testexternalinspection.c in line 728) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will copy It here for your convenience:
as you can see is needed to go from the coroutine to the |
||
} interpreter_frame; | ||
|
||
// Code object offset; | ||
|
@@ -152,6 +188,14 @@ typedef struct _Py_DebugOffsets { | |
uint64_t ob_size; | ||
} list_object; | ||
|
||
// PySet object offset; | ||
struct _set_object { | ||
uint64_t size; | ||
uint64_t used; | ||
uint64_t table; | ||
uint64_t mask; | ||
} set_object; | ||
|
||
// PyDict object offset; | ||
struct _dict_object { | ||
uint64_t size; | ||
|
@@ -192,6 +236,14 @@ typedef struct _Py_DebugOffsets { | |
uint64_t size; | ||
uint64_t collecting; | ||
} gc; | ||
|
||
struct _gen_object { | ||
uint64_t size; | ||
uint64_t gi_name; | ||
uint64_t gi_iframe; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is an implementation detail, and may not exist in future versions of CPython There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Everything in the debug offsets is an implementation detail so I don't think this made this special. Once more, this is used to follow the chain of coroutines so if it it changes we will adapt but the test we add ensures that there is an alternative and we don't change this in a way that breaks all profilers There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, see also my comment here. I don't think it's worth anyones time to debate these offsets. Yes they are subject to break and that's fine, they are not a public API. Profilers and debuggers expect this kind of stuff to drift. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. I worry that when they do break, we'll be pressured to restore whatever field they were relying on. I'd be happier if these offsets were in their own header, with a large clear warning about breakage. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think which header file they live in should matter, we don't need a new header for this specific internal implementation detail, this already is such a header. Lets just add the desired code comment about stability expectations and breakage as instructions to our future selves that if these need to change, so be it, and that the internal implementation detail tests relying on these can be marked as expected failures at that time. |
||
uint64_t gi_task; | ||
uint64_t gi_frame_state; | ||
1st1 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} gen_object; | ||
} _Py_DebugOffsets; | ||
|
||
/* Reference tracer state */ | ||
|
Uh oh!
There was an error while loading. Please reload this page.