gh-132519: fix excessive mem usage in QSBR with large blocks #132520
8000
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Memory usage numbers (proposed fix explained below):
Test script:
Delayed memory free checks (and subsequent frees if applicable) currently only occur in one of two situations:
_PyMem_FreeDelayed()
when the number of pending delayed free memory blocks reaches exactly 254. And then it waits another 254 frees even if could not free any pending blocks this time, which is a lot for big buffers.This works great for many small objects, but with larger buffers these can accumulate quickly, so more frequent checks should be done.
I tried a few things but
_PyMem_ProcessDelayed()
added to_Py_HandlePending()
seems to work well and be safe and aQSBR_QUIESCENT_STATE
has just been reported so there is a fresh chance to actually free. Seems to happen often enough that memory usage is kept down, and if nothing to free then_PyMem_ProcessDelayed()
is super-cheap.Another option would be to track the amount of pending memory to be freed and increase the frequency of free attempts if that number gets too large, but to start with this small change seems to solved the problem well enough. Could also schedule GC if pending frees get too high, but that seems like a roundabout way to arrive at
_PyMem_ProcessDelayedNoDealloc()
.Performance as checked by
pyperformance
full suite is unchanged with the fix (literally 0.17% better avg, so noise).