-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
py/gc: Print fragmentation stats in micropython.mem_info(1) #5195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
py/gc.c
Outdated
#if !EXTENSIVE_HEAP_PROFILING | ||
// When comparing heap output we don't want to print the starting | ||
// pointer of the heap because it changes from run to run. | ||
mp_printf(&mp_plat_print, "GC memory layout; from %p:", MP_STATE_MEM(gc_pool_start)); | ||
#endif | ||
#if DUMP_FRAGMENTATION_INFO | ||
size_t n = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For something used throughout the function a more descriptive would be better I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've updated it and added a comment.
py/gc.c
Outdated
bl = bl2 & (~(DUMP_BYTES_PER_LINE - 1)); | ||
mp_printf(&mp_plat_print, "\n (%u lines all free)", (uint)lines); | ||
skip = lines * DUMP_BYTES_PER_LINE; | ||
#if DUMP_FRAGMENTATION_INFO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't get reached if DUMP_BLOCK_INFO is 0, is this the intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is the intent. This line is accounting for the fact that skipping lines means we wouldn't otherwise be counting these free blocks in the current run of free blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't look at the code in detail so I 'm not 100% sure I understand what this means :)
But I'll rephrase my question: when DUMP_FRAGMENTATION_INFO is 1 and DUMP_BLOCK_INFO is 0, n
seems to go through different caluclations than in the case where both DUMP_FRAGMENTATION_INFO and DUMP_BLOCK_INFO are 1. But the end result is the same in both cases, right?
py/gc.c
Outdated
for (size_t bl = 0; bl < MP_STATE_MEM(gc_alloc_table_byte_len) * BLOCKS_PER_ATB; bl++) { | ||
#if DUMP_BLOCK_INFO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if DUMP_BLOCK_INFO nor DUMP_FRAGMENTATION_INFO is set, should this function run at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not. I've #if'd it in modmicropython.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @stinos! Do you have any thoughts on the whether changing this behaviour is worthwhile and/or whether these are useful stats?
py/gc.c
Outdated
#if !EXTENSIVE_HEAP_PROFILING | ||
// When comparing heap output we don't want to print the starting | ||
// pointer of the heap because it changes from run to run. | ||
mp_printf(&mp_plat_print, "GC memory layout; from %p:", MP_STATE_MEM(gc_pool_start)); | ||
#endif | ||
#if DUMP_FRAGMENTATION_INFO | ||
size_t n = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've updated it and added a comment.
py/gc.c
Outdated
for (size_t bl = 0; bl < MP_STATE_MEM(gc_alloc_table_byte_len) * BLOCKS_PER_ATB; bl++) { | ||
#if DUMP_BLOCK_INFO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not. I've #if'd it in modmicropython.c
py/gc.c
Outdated
bl = bl2 & (~(DUMP_BYTES_PER_LINE - 1)); | ||
mp_printf(&mp_plat_print, "\n (%u lines all free)", (uint)lines); | ||
skip = lines * DUMP_BYTES_PER_LINE; | ||
#if DUMP_FRAGMENTATION_INFO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is the intent. This line is accounting for the fact that skipping lines means we wouldn't otherwise be counting these free blocks in the current run of free blocks.
I haven't ran it in an actual program, but at first sight it is way more useful than knowing just the total amount of free memory. However for the output I'd maybe print an extra line containing a header specifying what each class means, now it's some guesswork to look at the table and figure out what the largest succesful allocation would be. (Which could potentially be interesting information to access at runtime in a next stage, i.e. being able to query the |
I use the feature relatively often, to get a visual indication of the heap. And IMO it should stay enabled by default. Eg it's sometimes useful when "remote debugging" someone's problems to get them to issue Adding a config option to disable the printing makes sense, but it would only be used (ie disabled) on really minimal ports. |
OK! My thought was that the fragmentation stats capture the same information in most cases. Out of curiosity, could you imagine an alternative version of the fragmentation stats that was a good substitute for the full block dump?
So how would you like to proceed:
|
Thoughts on first use:
It still has the header from the original display: Counter to the argument about saving flash space by default, I think it'd be more useful to have I saw the mix of #defines already in place that could be used to enable both/either of these in a particular port/board/build, perhaps just document these so that people can strip either/both options out of their build if they want the extra space. |
I like this idea. |
+1 on all this :-) I would like to suggest a further step to digest-down the fragments. Print the top 3-5 fragments so one gets a feeling for how much "large space" there is. But then distill it down to one "benchmark" number, which is how much more memory one could allocate given the current size distribution of allocated memory. E.g. "your program could allocate 1.5x as much using the same distribution". I believe the above is difficult/expensive to calculate and really depends on the order of allocations as well. A an approximation I can offer is to classify the allocated objects into 10 non-linear size bins. Then calculate for each size bin: Edit: I hadn't fully seen the code when I wrote the above. You already have 20 bins of "number of objects in this size bin that could be allocated" so all you are "missing" for the above is to weigh them by the currently allocated object histogram. |
#if DUMP_FRAGMENTATION_INFO | ||
STATIC void gc_update_fragmentation_stats(size_t n, size_t *frag_classes) { | ||
for (size_t c = 1, i = 0; c < n; c <<= 1, ++i) { | ||
frag_classes[i] += n / c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment that describes what this does would sure be nice. At least I can't just glance at the code and understand what it's doing... Maybe if the above #define NUM_FRAGMENTATION_CLASSES
had a comment what frag classes are...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
OK, I think I have implemented this as you describe. This PR is now -128 bytes on the L0/F0 STM32 build (no table dump, adds frag stats). +224 on PYBV11 (keeps table dump, adds frag stats) |
Thanks for the reminder to finish this off! I definitely agree there's scope for adding more info here (although at some point we just write an offline parser for the table dump). My assumption had been that programs tend to go through two stages -- set up and initialise, then some sort of steady state of request handling or repeated operation. The two have vastly different allocation profiles. So I'm not sure that the currently allocated object histogram is necessarily a good predictor of the future demands. There have been a few discussions about how to improve the GC to be more aware of the sort of allocations that fall into the two stages, and separate them, and this is kind of my motivation for this PR -- a way to quantify whether future improvements do actually make a difference to a given program. To a certain extent, the "expected vs actual" is captured by comparing the total available bytes to the frag stats. I think possibly a third mode might be useful, basically along the lines of what you're suggesting, would be printing out the size (and location) of the top N allocations (or free regions). |
Problem if you take multiple bins together is that one benchmark number can theoretically have an infinite number of meanings so I'm not sure if this is very useful? edit: thinking of it, same is true for 'free mem' actually and that's also displayed so maybe my point is moot |
Yes, but I think there's value in both: a ton of data you can analyze to your heart's content, and also a single number or two that tell you roughly how much headroom you have. |
py/gc.c
Outdated
static const size_t DUMP_BYTES_PER_LINE = 64; | ||
#endif | ||
#if !EXTENSIVE_HEAP_PROFILING | ||
// When comparing heap output we don't want to print the starting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it looks as if this comment is for if (print_table)
but it's actually for #if !EXTENSIVE_HEAP_PROFILING
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right.
But moving it above the #if seems weird too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yes. It's a bit out of scope for this commit but to make this completely clear I'd go with changing the comment itself e.g. Skip this when EXTENSIVE_HEAP_PROFILING is on because...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Not directly related to this PR, but you might be interested in the images and tools that @tannewt created to visualize fragmentation, visible here: adafruit#547 |
Yup, thanks @dhalbert I have seen this, it's really cool, and very useful for understand why the GC behaves the way it does. But for now what I need is a really quick diagnostic for "is the heap fragmented", without requiring the user to install custom firmware or do offline analysis. |
py/modmicropython.c
Outdated
// arg given means dump gc allocation table | ||
gc_dump_alloc_table(); | ||
mp_int_t arg = mp_obj_get_int(args[0]); | ||
gc_dump_alloc_table(arg == 1, arg == 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you intend this to be: gc_dump_alloc_table(arg >= 1, arg >= 2);
? That would match the docs.
Or maybe it could be gc_dump_alloc_table(arg & 1, arg & 2)
, which would allow better control over what's printed (sometimes you may only want the frag stats).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also clarified the docs.
py/gc.c
Outdated
// How many consecutive free blocks. | ||
size_t nfree = 0; | ||
// Number of allocs that would succeed for 1, 2, 4, .. 2^n blocks. | ||
size_t frag_classes[NUM_FRAGMENTATION_CLASSES] = {0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use MICROPY_GC_STACK_ENTRY_TYPE
instead of size_t
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Rebased. I guess the decision is still whether this is worth +256 bytes. I would possibly make the case that we should disable the full dump on EXTRA_FEATURES and below, i.e. only enable for FULL_FEATURES, in which case the this PR would be net -140 bytes on PYBV11. |
3df2bcc
to
8d2b83c
Compare
The previous behavior was (for ports that enable it via `MICROPY_PY_MICROPYTHON_MEM_INFO`), that passing any argument to `micropython.mem_info()` would print out the block table. ``1`` means dump the block table (as before), ``2`` means show the fragmentation stats, i.e. how many allocations of a given size can succeed, ``3`` means both. On small STM32 ports (L0, F0), don't enable support for the full block table, only for the frag stats. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This is an automated heads-up that we've just merged a Pull Request See #13763 A search suggests this PR might apply the STATIC macro to some C code. If it Although this is an automated message, feel free to @-reply to me directly if |
PR Note: Not sure how controversial it is to remove the block printing by default. Open to suggestions here, but for 196 bytes I'm not sure how many people care. Would appreciate feedback on my assumption that people (who aren't debugging the GC) are mostly interested in fragmentation only.
Also open to suggestions on a better summary of the fragmentation. In my mind this captures the main issue with fragmentation -- there are enough total bytes available but allocations above a certain size are no longer possible. I can imagine some sort of equivalent to perf_bench where we can compare these stats for given test cases over time. I'd like to distil it down to a single number but not sure that's useful (and could be done in post-processing anyway).
Obviously the fragmentation stats only make sense immediately after calling
gc.collect()
but I don't think this method should initiate a collection by default.Although no targets now use the block-table-only option, I have confirmed that it makes zero difference to the build size compared to before this change.
The previous behavior was (for ports that enable it via
MICROPY_PY_MICROPYTHON_MEM_INFO
), that passing any argument tomicropython.mem_info()
would print out the block table.This adds an additional line to the end that shows basic fragmentation
statistics, i.e. how many allocations of a given size can succeed.
This also changes the default behavior to no longer print the block
table, and only prints the fragmentation stats (with the assumption
that fragmentation is the main reason to investigate the block table).
This saves ~196 bytes on PYBV11.
On the Unix port, show both (useful for debugging GC).
On small STM32 ports (L0, F0), don't show either. (Saves ~350 bytes)