gh-132519: fix excessive mem usage in QSBR with large blocks by tom-pytel · Pull Request #132520 · python/cpython · GitHub
gh-132519: fix excessive mem usage in QSBR with large blocks #132520

Closed
wants to merge 2 commits

Conversation

tom-pytel
Contributor
@tom-pytel tom-pytel commented Apr 14, 2025

Memory usage numbers (proposed fix explained below):

             VmHWM
GIL      135104 kB  - normal GIL-enabled baseline
noGIL   6702788 kB  - free-threaded current QSBR behavior
fix      517760 kB  - free-threaded with _PyMem_ProcessDelayed() in _Py_HandlePending()
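(For reference, VmHWM is the peak resident set size the Linux kernel reports in `/proc/<pid>/status`. A minimal, Linux-only way to read it; `parse_vm_hwm` and `read_vm_hwm` are illustrative helpers, not part of the test script:)

```python
def parse_vm_hwm(status_text):
    # Look for a line like "VmHWM:    135104 kB" and return the kB value.
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            return int(line.split()[1])
    return None

def read_vm_hwm(pid="self"):
    # Peak resident set size of a process, in kB (Linux only).
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_hwm(f.read())
```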

Test script:

import threading
from queue import Queue

def thrdfunc(queue):
    while True:
        l = queue.get()

        l.append(0)  # force resize in non-parent thread which will free using _PyMem_FreeDelayed()

queue = Queue(maxsize=2)

threading.Thread(target=thrdfunc, args=(queue,)).start()

while True:
    l = [None] * int(3840*2160*3/8)  # sys.getsizeof(l) ~= 3840*2160*3 bytes

    queue.put(l)

Delayed memory free checks (and subsequent frees if applicable) currently only occur in one of two situations:

  • Garbage collection, which doesn't trigger often enough in this script, though manual trigger solves problem.
  • On a _PyMem_FreeDelayed() call when the number of pending delayed-free memory blocks reaches 254. It then waits another 254 frees even if it could not free any pending blocks this time, which is a lot of accumulation for big buffers.

This works well for many small objects, but large buffers can accumulate quickly between checks, so the checks should run more often.
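To see why the purely count-based trigger hurts with big blocks, here is a toy model (an illustration only, not CPython code) of a heuristic that attempts reclamation every THRESHOLD deferred frees regardless of block size:

```python
THRESHOLD = 254  # mirrors the ~254-block trigger described above

def simulate(num_frees, block_size_kb):
    """Peak kB buffered by a purely count-based delayed-free heuristic."""
    pending = 0   # blocks awaiting a reclamation attempt
    peak_kb = 0
    for _ in range(num_frees):
        pending += 1
        peak_kb = max(peak_kb, pending * block_size_kb)
        if pending >= THRESHOLD:
            # Assume a quiescent state was reached and everything is freed.
            # On a failed attempt the real heuristic waits another
            # THRESHOLD frees, buffering even more in the meantime.
            pending = 0
    return peak_kb
```

With blocks of ~3 MB like the lists in the test script, hundreds of MB can sit in the buffer before the first reclamation attempt even runs, while with tiny blocks the same 254-block buffer is negligible.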

I tried a few things, but adding _PyMem_ProcessDelayed() to _Py_HandlePending() seems to work well and be safe: a QSBR_QUIESCENT_STATE has just been reported at that point, so there is a fresh chance to actually free. It seems to happen often enough that memory usage is kept down, and if there is nothing to free then _PyMem_ProcessDelayed() is super-cheap.

Another option would be to track the amount of pending memory to be freed and increase the frequency of free attempts if that number gets too large, but to start with, this small change seems to solve the problem well enough. Could also schedule GC if pending frees get too high, but that seems like a roundabout way to arrive at _PyMem_ProcessDelayedNoDealloc().
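That byte-tracking alternative could look roughly like this sketch (hypothetical names and threshold, not the CPython API):

```python
COUNT_THRESHOLD = 254               # existing count-based trigger
BYTES_THRESHOLD = 16 * 1024 * 1024  # hypothetical: also trigger at 16 MiB pending

class DelayedFreeBuffer:
    """Toy model of a delayed-free buffer with a size-aware trigger."""

    def __init__(self):
        self.pending = []       # (ptr, size) pairs awaiting a quiescent state
        self.pending_bytes = 0

    def free_delayed(self, ptr, size):
        self.pending.append((ptr, size))
        self.pending_bytes += size
        # Trigger a reclamation attempt on either threshold, so a few
        # huge buffers can't hide behind a small pending *count*.
        if (len(self.pending) >= COUNT_THRESHOLD
                or self.pending_bytes >= BYTES_THRESHOLD):
            self.process_delayed()

    def process_delayed(self):
        # Stand-in for _PyMem_ProcessDelayed(): assume every pending
        # block is past its QSBR goal and can be reclaimed.
        self.pending.clear()
        self.pending_bytes = 0
```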

Performance as checked by pyperformance full suite is unchanged with the fix (literally 0.17% better avg, so noise).

@tom-pytel
Contributor Author

Ping @colesbury, @kumaraditya303. Is there a better place for the _PyMem_ProcessDelayed()? I thought _PyThreadState_Attach() at first but that is too low level.

Contributor
@colesbury colesbury left a comment


I don't think we should do this. You risk accidentally introducing quadratic behavior.

We will likely tweak the heuristics in the future for when _PyMem_ProcessDelayed() is called, but that should be based on data for real applications.
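The quadratic risk can be illustrated with a toy count (again, an illustration rather than CPython code): if a reclamation pass rescans the whole pending list after every deferred free while no block is reclaimable yet, total scan work grows as 1 + 2 + … + n, i.e. O(n²):

```python
def scan_work(n_blocks):
    # Each new deferred free triggers one full scan of the pending list,
    # but nothing is reclaimable, so the list keeps growing: total work
    # is 1 + 2 + ... + n = n * (n + 1) / 2.
    work = 0
    pending = 0
    for _ in range(n_blocks):
        pending += 1
        work += pending   # one scan over all currently pending blocks
    return work
```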

@tom-pytel tom-pytel closed this Apr 14, 2025
@tom-pytel tom-pytel deleted the fix-issue-132519 branch April 14, 2025 18:41