py/gc: Print fragmentation stats in micropython.mem_info(1) by jimmo · Pull Request #5195 · micropython/micropython

py/gc: Print fragmentation stats in micropython.mem_info(1) #5195

Open
jimmo wants to merge 1 commit into master

Conversation

@jimmo (Member) commented Oct 9, 2019

PR Note: Not sure how controversial it is to remove the block printing by default. Open to suggestions here, but for 196 bytes I'm not sure how many people care. Would appreciate feedback on my assumption that people (who aren't debugging the GC) are mostly interested in fragmentation only.

Also open to suggestions on a better summary of the fragmentation. In my mind this captures the main issue with fragmentation -- there are enough total bytes available but allocations above a certain size are no longer possible. I can imagine some sort of equivalent to perf_bench where we can compare these stats for given test cases over time. I'd like to distil it down to a single number but not sure that's useful (and could be done in post-processing anyway).

Obviously the fragmentation stats only make sense immediately after calling gc.collect() but I don't think this method should initiate a collection by default.

Although no targets now use the block-table-only option, I have confirmed that it makes zero difference to the build size compared to before this change.


The previous behavior (for ports that enable it via
MICROPY_PY_MICROPYTHON_MEM_INFO) was that passing any argument to
micropython.mem_info() would print out the block table.

This adds an additional line to the end that shows basic fragmentation
statistics, i.e. how many allocations of a given size can succeed.

This also changes the default behavior to no longer print the block
table, and only prints the fragmentation stats (with the assumption
that fragmentation is the main reason to investigate the block table).
This saves ~196 bytes on PYBV11.

On the Unix port, show both (useful for debugging GC).

On small STM32 ports (L0, F0), don't show either. (Saves ~350 bytes)

@dpgeorge added the py-core (Relates to py/ directory in source) label on Oct 9, 2019
py/gc.c Outdated
#if !EXTENSIVE_HEAP_PROFILING
// When comparing heap output we don't want to print the starting
// pointer of the heap because it changes from run to run.
mp_printf(&mp_plat_print, "GC memory layout; from %p:", MP_STATE_MEM(gc_pool_start));
#endif
#if DUMP_FRAGMENTATION_INFO
size_t n = 0;
Contributor:

For something used throughout the function, a more descriptive name would be better, I think.

Member Author:

Agreed. I've updated it and added a comment.

py/gc.c Outdated
bl = bl2 & (~(DUMP_BYTES_PER_LINE - 1));
mp_printf(&mp_plat_print, "\n (%u lines all free)", (uint)lines);
skip = lines * DUMP_BYTES_PER_LINE;
#if DUMP_FRAGMENTATION_INFO
Contributor:

This doesn't get reached if DUMP_BLOCK_INFO is 0, is this the intent?

Member Author:

Yeah that is the intent. This line is accounting for the fact that skipping lines means we wouldn't otherwise be counting these free blocks in the current run of free blocks.

Contributor:

Sorry, I didn't look at the code in detail so I'm not 100% sure I understand what this means :)
But I'll rephrase my question: when DUMP_FRAGMENTATION_INFO is 1 and DUMP_BLOCK_INFO is 0, n seems to go through different calculations than in the case where both DUMP_FRAGMENTATION_INFO and DUMP_BLOCK_INFO are 1. But the end result is the same in both cases, right?

py/gc.c Outdated
for (size_t bl = 0; bl < MP_STATE_MEM(gc_alloc_table_byte_len) * BLOCKS_PER_ATB; bl++) {
#if DUMP_BLOCK_INFO
Contributor:

So if neither DUMP_BLOCK_INFO nor DUMP_FRAGMENTATION_INFO is set, should this function run at all?

Member Author:

I guess not. I've #if'd it in modmicropython.c

@jimmo (Member Author) left a comment

Thanks for the review @stinos! Do you have any thoughts on whether changing this behaviour is worthwhile and/or whether these are useful stats?

@stinos (Contributor) commented Oct 9, 2019

Do you have any thoughts on whether changing this behaviour is worthwhile and/or whether these are useful stats?

I haven't run it in an actual program, but at first sight it is way more useful than knowing just the total amount of free memory. However, for the output I'd maybe print an extra line containing a header specifying what each class means; right now it's some guesswork to look at the table and figure out what the largest successful allocation would be. (Which could potentially be interesting information to access at runtime in a next stage, i.e. being able to query the classes table.)

@dpgeorge (Member):

Not sure how controversial it is to remove the block printing by default. Open to suggestions here, but for 196 bytes I'm not sure how many people care. Would appreciate feedback on my assumption that people (who aren't debugging the GC) are mostly interested in fragmentation only.

I use the feature relatively often, to get a visual indication of the heap. And IMO it should stay enabled by default. Eg it's sometimes useful when "remote debugging" someone's problems to get them to issue micropython.mem_info(1) to understand more how their program is behaving.

Adding a config option to disable the printing makes sense, but it would only be used (ie disabled) on really minimal ports.

@jimmo (Member Author) commented Oct 11, 2019

I use the feature relatively often, to get a visual indication of the heap. And IMO it should stay enabled by default. Eg it's sometimes useful when "remote debugging" someone's problems to get them to issue micropython.mem_info(1) to understand more how their program is behaving.

OK!

My thought was that the fragmentation stats capture the same information in most cases. Out of curiosity, could you imagine an alternative version of the fragmentation stats that was a good substitute for the full block dump?

Adding a config option to disable the printing makes sense, but it would only be used (ie disabled) on really minimal ports.

So how would you like to proceed:

  • Not useful -- can extract the equivalent data in post-processing from the existing heap dump.
  • Frag stats aren't useful, but disabling verbose mem info is useful on constrained ports?
  • It is useful -- enough to enable for all ports that have dump enabled? (I doubt it)
  • It's a good substitute for the full dump on constrained ports (e.g. L0 / F0)?

@andrewleech (Contributor):

Thoughts on first use:

>>> micropython.mem_info(1)
stack: 580 out of 31744
GC: total: 26262400, used: 20026240, free: 6236160
 No. of 1-blocks: 8292, 2-blocks: 1562, max blk sz: 153600, max free sz: 190051
GC memory layout; from d06f4480:
Frag: 389002 191191 95312 47513 23656 11789 5871 2916 1449 715 349 166 79 37 16 6 2 1 0 0

It still has the header from the original display: GC memory layout; from d06f4480:
It would help if the Frag line had its own header to say what the numbers mean.
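
As a purely illustrative sketch (the wording is not from the PR), a self-describing header could read something like:

Frag (allocs possible of 1, 2, 4, 8, ... blocks):
 389002 191191 95312 47513 23656 11789 5871 2916 1449 715 349 166 79 37 16 6 2 1 0 0

where the k-th value is the number of allocations of 2^(k-1) blocks that could still succeed, per the description in the commit message.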

Counter to the argument about saving flash space by default, I think it'd be more useful to have micropython.mem_info(1) show the previous full map and micropython.mem_info(2) show the newer frag info (with simple header).

I saw the mix of #defines already in place that could be used to enable both/either of these in a particular port/board/build; perhaps just document these so that people can strip either/both options out of their build if they want the extra space.
My preference would be to have both available by default on all but the most constrained boards.

@dpgeorge (Member):

Counter to the argument about saving flash space by default, I think it'd be more useful to have micropython.mem_info(1) show the previous full map and micropython.mem_info(2) show the newer frag info (with simple header).

I like this idea.

@tve (Contributor) commented Apr 21, 2020

+1 on all this :-)

I would like to suggest a further step to digest-down the fragments. Print the top 3-5 fragments so one gets a feeling for how much "large space" there is. But then distill it down to one "benchmark" number, which is how much more memory one could allocate given the current size distribution of allocated memory. E.g. "your program could allocate 1.5x as much using the same distribution".

I believe the above is difficult/expensive to calculate and really depends on the order of allocations as well. An approximation I can offer is to classify the allocated objects into 10 non-linear size bins. Then calculate for each size bin: (currently_allocated_bytes + theoretically_allocatable_bytes) / currently_allocated_bytes. Finally calculate the median or the min across the bins.

Edit: I hadn't fully seen the code when I wrote the above. You already have 20 bins of "number of objects in this size bin that could be allocated" so all you are "missing" for the above is to weigh them by the currently allocated object histogram.
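
A rough Python sketch of the approximation proposed above (the helper name, bin boundaries, and example numbers are all hypothetical and not part of this PR):

def headroom(allocated, allocatable):
    # allocated[i]: bytes currently allocated in size bin i.
    # allocatable[i]: bytes that could still theoretically be allocated in that bin
    # (e.g. derived from the frag-stats classes weighted by block size).
    ratios = []
    for used, free in zip(allocated, allocatable):
        if used == 0:
            continue  # an empty bin says nothing about this size class
        ratios.append((used + free) / used)
    ratios.sort()
    # min is the most pessimistic bin; the median smooths out outliers
    return min(ratios), ratios[len(ratios) // 2]

# Made-up numbers for four size bins:
worst, typical = headroom([4096, 2048, 1024, 512], [2048, 2048, 0, 512])
print("worst-case headroom: %.1fx, median: %.1fx" % (worst, typical))

With these made-up numbers it reports a worst-case headroom of 1.0x (one bin can't grow at all) and a median of 2.0x, which is the "your program could allocate Nx as much" style of answer described above.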

#if DUMP_FRAGMENTATION_INFO
STATIC void gc_update_fragmentation_stats(size_t n, size_t *frag_classes) {
for (size_t c = 1, i = 0; c < n; c <<= 1, ++i) {
frag_classes[i] += n / c;
Contributor:

A comment that describes what this does would sure be nice. At least I can't just glance at the code and understand what it's doing... Maybe the #define NUM_FRAGMENTATION_CLASSES above could also have a comment explaining what frag classes are.

Member Author:

Done

@jimmo (Member Author) commented Apr 21, 2020

Counter to the argument about saving flash space by default, I think it'd be more useful to have micropython.mem_info(1) show the previous full map and micropython.mem_info(2) show the newer frag info (with simple header).

I like this idea.

OK, I think I have implemented this as you describe.

This PR is now -128 bytes on the L0/F0 STM32 build (no table dump, adds frag stats), and +224 bytes on PYBV11 (keeps table dump, adds frag stats).

@jimmo (Member Author) commented Apr 21, 2020

Edit: I hadn't fully seen the code when I wrote the above. You already have 20 bins of "number of objects in this size bin that could be allocated" so all you are "missing" for the above is to weigh them by the currently allocated object histogram.

Thanks for the reminder to finish this off!

I definitely agree there's scope for adding more info here (although at some point we just write an offline parser for the table dump).

My assumption had been that programs tend to go through two stages -- set up and initialise, then some sort of steady state of request handling or repeated operation. The two have vastly different allocation profiles. So I'm not sure that the currently allocated object histogram is necessarily a good predictor of the future demands.

There have been a few discussions about how to improve the GC to be more aware of the sort of allocations that fall into the two stages, and separate them, and this is kind of my motivation for this PR -- a way to quantify whether future improvements do actually make a difference to a given program.

To a certain extent, the "expected vs actual" is captured by comparing the total available bytes to the frag stats.

I think a third mode might possibly be useful, basically along the lines of what you're suggesting: printing out the size (and location) of the top N allocations (or free regions).

@stinos (Contributor) commented Apr 21, 2020

But then distill it down to one "benchmark" number, which is how much more memory one could allocate given the current size distribution of allocated memory. E.g. "your program could allocate 1.5x as much using the same distribution".

The problem if you take multiple bins together is that one benchmark number can theoretically have an infinite number of meanings, so I'm not sure this is very useful?

Edit: thinking about it, the same is true for 'free mem' actually, and that's also displayed, so maybe my point is moot.

@tve (Contributor) commented Apr 21, 2020

The problem if you take multiple bins together is that one benchmark number can theoretically have an infinite number of meanings, so I'm not sure this is very useful?

Yes, but I think there's value in both: a ton of data you can analyze to your heart's content, and also a single number or two that tell you roughly how much headroom you have.

py/gc.c Outdated
static const size_t DUMP_BYTES_PER_LINE = 64;
#endif
#if !EXTENSIVE_HEAP_PROFILING
// When comparing heap output we don't want to print the starting
Contributor:

Now it looks as if this comment is for if (print_table), but it's actually for #if !EXTENSIVE_HEAP_PROFILING, right?

Member Author:

That's right.
But moving it above the #if seems weird too?

Contributor:

Probably, yes. It's a bit out of scope for this commit, but to make this completely clear I'd go with changing the comment itself, e.g. "Skip this when EXTENSIVE_HEAP_PROFILING is on because..."

Member Author:

Done

@dhalbert (Contributor):

Not directly related to this PR, but you might be interested in the images and tools that @tannewt created to visualize fragmentation, visible here: adafruit#547

@jimmo (Member Author) commented Apr 22, 2020

Not directly related to this PR, but you might be interested in the images and tools that @tannewt created to visualize fragmentation, visible here: adafruit#547

Yup, thanks @dhalbert, I have seen this; it's really cool, and very useful for understanding why the GC behaves the way it does.

But for now what I need is a really quick diagnostic for "is the heap fragmented", without requiring the user to install custom firmware or do offline analysis.

// arg given means dump gc allocation table
gc_dump_alloc_table();
mp_int_t arg = mp_obj_get_int(args[0]);
gc_dump_alloc_table(arg == 1, arg == 2);
Member:

Did you intend this to be: gc_dump_alloc_table(arg >= 1, arg >= 2); ? That would match the docs.

Or maybe it could be gc_dump_alloc_table(arg & 1, arg & 2), which would allow better control over what's printed (sometimes you may only want the frag stats).

Member Author:

Done. Also clarified the docs.

py/gc.c Outdated
// How many consecutive free blocks.
size_t nfree = 0;
// Number of allocs that would succeed for 1, 2, 4, .. 2^n blocks.
size_t frag_classes[NUM_FRAGMENTATION_CLASSES] = {0};
Member:

use MICROPY_GC_STACK_ENTRY_TYPE instead of size_t here

Member Author:

Done.

@jimmo (Member Author) commented May 13, 2022

Rebased.

I guess the decision is still whether this is worth +256 bytes.

I would possibly make the case that we should disable the full dump on EXTRA_FEATURES and below, i.e. only enable it for FULL_FEATURES, in which case this PR would be net -140 bytes on PYBV11.

The previous behavior (for ports that enable it via
``MICROPY_PY_MICROPYTHON_MEM_INFO``) was that passing any argument to
``micropython.mem_info()`` would print out the block table.

``1`` means dump the block table (as before), ``2`` means show the
fragmentation stats, i.e. how many allocations of a given size can
succeed, and ``3`` means both.

On small STM32 ports (L0, F0), don't enable support for the full
block table, only for the frag stats.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
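
For reference, under the scheme described in the commit message above, usage would look roughly like this (output abbreviated; all values are illustrative, not from a real run):

>>> import micropython
>>> micropython.mem_info(1)   # block table, as before
...
GC memory layout; from ...:
(block table lines)
>>> micropython.mem_info(2)   # fragmentation stats only
...
Frag: 1200 590 280 130 60 25 10 3 1 0
>>> micropython.mem_info(3)   # both: block table followed by the Frag line
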
@projectgus (Contributor):

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master), please
replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.
