8000 gh-129185: Fix PyTraceMalloc_Untrack() at Python exit by vstinner · Pull Request #129191 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-129185: Fix PyTraceMalloc_Untrack() at Python exit #129191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 23, 2025
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
PyTraceMalloc_Track() checks tracing with the GIL
  • Loading branch information
vstinner committed Jan 23, 2025
commit d510567c32bf52044351c7a68df3acef5eb12e4c
15 changes: 10 additions & 5 deletions Python/tracemalloc.c
Original file line number Diff line number Diff line change
Expand Up @@ -1255,15 +1255,17 @@ int
PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
size_t size)
{
// gh-129185: Pre-check to support calls after _PyTraceMalloc_Fini()
PyGILState_STATE gil_state = PyGILState_Ensure();
// gh-129185: Check before TABLES_LOCK() to support calls after
// _PyTraceMalloc_Fini().
int result;
if (!tracemalloc_config.tracing) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to hold the lock or GIL, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this "pre-check" is to avoid the GIL and TABLES_LOCK().

The flag is tested again below with TABLES_LOCK().

PyTraceMalloc_Untrack() doesn't lock the GIL, only TABLES_LOCK(), when tracing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change restores the check that we had before recent refactoring. For example, Python 3.14 alpha3 uses:

int
PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
                    size_t size)
{
    int res;
    PyGILState_STATE gil_state;

    if (!tracemalloc_config.tracing) {
        /* tracemalloc is not tracing: do nothing */
        return -2;
    }

    gil_state = PyGILState_Ensure();

    TABLES_LOCK();
    res = tracemalloc_add_trace(domain, ptr, size);
    TABLES_UNLOCK();

    PyGILState_Release(gil_state);
    return res;
}

The check is done without holding the GIL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we lose the thread safety: if one thread that holds the GIL were to write to tracing, then a thread that calls one of these without the GIL would race.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest solution would be to grab the tables lock for the read, and then unlock it before calling PyGILState_Ensure to prevent lock-ordering deadlocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The race is right here. Thread A might check tracing at the exact same time as thread B, which is a data race.

What is the problem of "continue execution" of thread A, since thread A checks again tracing with the tables lock?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it might not reach the second tracing read at all--if the first one races, then we're probably going to get a crash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're talking about thread B calling _PyTraceMalloc_Fini(), not tracemalloc.stop(). In this case, thread A can call TABLES_LOCK() after thread B deleted the lock, and yes, we can get a crash.

Sadly, PyTraceMalloc_Track() and PyTraceMalloc_Untrack() API doesn't require the caller to hold the GIL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can happen too, but I meant that we'll get a crash from the data race on tracing, not a use-after-free on the lock. I think we just hold the GIL (via PyGILState_Ensure) and call without the tables lock held, that should be thread safe enough for 3.12.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the Untrack() function to get the GIL. It should address your last concern.

return -2;
result = -2;
goto unlock_gil;
}

PyGILState_STATE gil_state = PyGILState_Ensure();
TABLES_LOCK();

int result;
if (tracemalloc_config.tracing) {
result = tracemalloc_add_trace_unlocked(domain, ptr, size);
}
Expand All @@ -1273,6 +1275,7 @@ PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
}

TABLES_UNLOCK();
unlock_gil:
PyGILState_Release(gil_state);

return result;
Expand All @@ -1282,7 +1285,9 @@ PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
int
PyTraceMalloc_Untrack(unsigned int domain, uintptr_t ptr)
{
// gh-129185: Pre-check to support calls after _PyTraceMalloc_Fini()
// gh-129185: Check before TABLES_LOCK() to support calls after
// _PyTraceMalloc_Fini(). This check is prone to race if another thread
// calls _PyTraceMalloc_Fini() in parallel.
if (!tracemalloc_config.tracing) {
return -2;
}
Expand Down
Loading
0