gh-132519: fix excessive mem usage in QSBR with large blocks by tom-pytel · Pull Request #132520 · python/cpython · GitHub
gh-132519: fix excessive mem usage in QSBR with large blocks #132520

Closed
wants to merge 2 commits

Conversation

tom-pytel
Contributor
@tom-pytel tom-pytel commented Apr 14, 2025

Memory usage numbers (proposed fix explained below):

             VmHWM
GIL      135104 kB  - normal GIL-enabled baseline
noGIL   6702788 kB  - free-threaded current QSBR behavior
fix      517760 kB  - free-threaded with _PyMem_ProcessDelayed() in _Py_HandlePending()
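(For reference, VmHWM is the peak resident set size the Linux kernel reports in `/proc/<pid>/status`. A minimal, Linux-only way to read it; `parse_vm_hwm` and `read_vm_hwm` are illustrative helpers, not part of the test script:)

```python
def parse_vm_hwm(status_text):
    # Look for a line like "VmHWM:    135104 kB" and return the kB value.
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            return int(line.split()[1])
    return None

def read_vm_hwm(pid="self"):
    # Peak resident set size of a process, in kB (Linux only).
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_hwm(f.read())
```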

Test script:

import threading
from queue import Queue

def thrdfunc(queue):
    while True:
        l = queue.get()

        l.append(0)  # force resize in non-parent thread which will free using _PyMem_FreeDelayed()

queue = Queue(maxsize=2)

threading.Thread(target=thrdfunc, args=(queue,)).start()

while True:
    l = [None] * int(3840*2160*3/8)  # sys.getsizeof(l) ~= 3840*2160*3 bytes

    queue.put(l)

Delayed memory free checks (and subsequent frees if applicable) currently only occur in one of two situations:

  • Garbage collection, which doesn't trigger often enough in this script, though manual trigger solves problem.
  • On a _PyMem_FreeDelayed() call when the number of pending delayed-free memory blocks reaches 254. It then waits another 254 frees even if it could not free any pending blocks this time, which is a lot of accumulation for big buffers.

This works well for many small objects, but large buffers can accumulate quickly between checks, so the checks should run more often.
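To see why the purely count-based trigger hurts with big blocks, here is a toy model (an illustration only, not CPython code) of a heuristic that attempts reclamation every THRESHOLD deferred frees regardless of block size:

```python
THRESHOLD = 254  # mirrors the ~254-block trigger described above

def simulate(num_frees, block_size_kb):
    """Peak kB buffered by a purely count-based delayed-free heuristic."""
    pending = 0   # blocks awaiting a reclamation attempt
    peak_kb = 0
    for _ in range(num_frees):
        pending += 1
        peak_kb = max(peak_kb, pending * block_size_kb)
        if pending >= THRESHOLD:
            # Assume a quiescent state was reached and everything is freed.
            # On a failed attempt the real heuristic waits another
            # THRESHOLD frees, buffering even more in the meantime.
            pending = 0
    return peak_kb
```

With blocks of ~3 MB like the lists in the test script, hundreds of MB can sit in the buffer before the first reclamation attempt even runs, while with tiny blocks the same 254-block buffer is negligible.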

I tried a few things, but adding _PyMem_ProcessDelayed() to _Py_HandlePending() seems to work well and be safe: a QSBR_QUIESCENT_STATE has just been reported at that point, so there is a fresh chance to actually free. It seems to happen often enough that memory usage is kept down, and if there is nothing to free then _PyMem_ProcessDelayed() is super-cheap.

Another option would be to track the amount of pending memory to be freed and increase the frequency of free attempts if that number gets too large, but to start with, this small change seems to solve the problem well enough. Could also schedule GC if pending frees get too high, but that seems like a roundabout way to arrive at _PyMem_ProcessDelayedNoDealloc().
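That byte-tracking alternative could look roughly like this sketch (hypothetical names and threshold, not the CPython API):

```python
COUNT_THRESHOLD = 254               # existing count-based trigger
BYTES_THRESHOLD = 16 * 1024 * 1024  # hypothetical: also trigger at 16 MiB pending

class DelayedFreeBuffer:
    """Toy model of a delayed-free buffer with a size-aware trigger."""

    def __init__(self):
        self.pending = []       # (ptr, size) pairs awaiting a quiescent state
        self.pending_bytes = 0

    def free_delayed(self, ptr, size):
        self.pending.append((ptr, size))
        self.pending_bytes += size
        # Trigger a reclamation attempt on either threshold, so a few
        # huge buffers can't hide behind a small pending *count*.
        if (len(self.pending) >= COUNT_THRESHOLD
                or self.pending_bytes >= BYTES_THRESHOLD):
            self.process_delayed()

    def process_delayed(self):
        # Stand-in for _PyMem_ProcessDelayed(): assume every pending
        # block is past its QSBR goal and can be reclaimed.
        self.pending.clear()
        self.pending_bytes = 0
```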

Performance as checked by pyperformance full suite is unchanged with the fix (literally 0.17% better avg, so noise).

@tom-pytel
Contributor Author

Ping @colesbury, @kumaraditya303. Is there a better place for the _PyMem_ProcessDelayed()? I thought _PyThreadState_Attach() at first but that is too low level.

Contributor
@colesbury colesbury left a comment


I don't think we should do this. You risk accidentally introducing quadratic behavior.

We will likely tweak the heuristics in the future for when _PyMem_ProcessDelayed() is called, but that should be based on data for real applications.
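The quadratic risk can be illustrated with a toy count (again, an illustration rather than CPython code): if a reclamation pass rescans the whole pending list after every deferred free while no block is reclaimable yet, total scan work grows as 1 + 2 + … + n, i.e. O(n²):

```python
def scan_work(n_blocks):
    # Each new deferred free triggers one full scan of the pending list,
    # but nothing is reclaimable, so the list keeps growing: total work
    # is 1 + 2 + ... + n = n * (n + 1) / 2.
    work = 0
    pending = 0
    for _ in range(n_blocks):
        pending += 1
        work += pending   # one scan over all currently pending blocks
    return work
```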

@tom-pytel tom-pytel closed this Apr 14, 2025
@tom-pytel tom-pytel deleted the fix-issue-132519 branch April 14, 2025 18:41