-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
py/gc.c: decouple gc_sweep from allocation/free operations #1321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit is the first part in a series of commits to decouple the alloc/dealloc component of the upython memory manager from the garbage collector. - mem_free implemented to be independent of gc_free. - mem_first and mem_next functions for a standard api to walk through memory This commit WILL have a small impact on performance. The mem_next function is not *quite* as fast as the original implementation. Also, mem_free performs some checks (in MEM_VALID) that have essentially already been done by the loop. I contend the following: - there is no way to make the "next" funcionality faster and allow it to be fully decoupled - we could make the mem_free function faster by implementing a mem_free_unsafe (or something) that would not perform the checks. This could be a macro for speed, and called by the mem_free function. Whether this complexity is required for the small increase in performance is up to the upython developers
if (ATB_GET_KIND(block) == AT_MARK){ | ||
ATB_MARK_TO_HEAD(block); | ||
} else { | ||
// TODO: bad indentation kept intentionally for git diff (will move back next commit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such things should not be necessary: use git diff -w
to ignore whitespace. Or for the diff on github itself: append ?w=1
to the end of this url (so you get https://github.com/micropython/micropython/pull/1321/files?w=1
) That is invalueable anyway fot commits like this else all you see is one huge bunch of adds/deletes.
Storing the result of Some other remarks:
|
@cloudformdesign thanks for the effort trying to get this ready for merging, but I think we need to change how this is going to be done. This patch increases code size by about 150 bytes and (supposedly) decreases performance. By the time we add all patches to get to a state where the allocator is separated from the GC it's likely that the increase in code size will be at least a few times 150 bytes and performance will further suffer. Since the current GC/allocator is well proven, compact and relatively efficient it seems a shame to lose it, and so I don't want to make any major changes to it (such as those proposed here). Secondly, I was under the impression that #1168 was about providing a completely independent memory allocator/GC to replace the current one. Ie a complete replacement for gc.c that implements all the functions in gc.h (gc_alloc, gc_free, ...). But it seems that you want to divide gc.c into allocator and GC, and then replace the allocator. (I may have missed something in the previous discussions about all this so sorry if what I'm saying has already been discussed, or I'm missing something.) So, way forward would be to copy gc.c to some new file (eg gcplug.c) and work on that. There would be a config variable to select which allocator to use (gc.c or gcplug.c). Also, please read CODECONVENTIONS carefully and follow them. You have some whitespace inconsistencies. And you need to write (void) for functions that take no arguments. |
@cloudformdesign I hope you are not discouraged to work on this, it would be a shame to lose the work you have done so far. Note that you have picked probably the most critical (and therefore difficult) piece of the code to work on: the memory manager is critical for efficiency in time, efficiency in memory use, and robustness. |
No no, my daughter was just born on Sunday night, so this has been on haitus! I think separating it so it can be chosen at compile time is an excellent idea. I need to figure out how to do this for this project. Do you do it in the makefile, in header files, at the command line, or something else? @pfalcon if the allocator was selected this way would you be more open to the single large commit I have already made? Obviously we would make it no longer change gc.c |
@cloudformdesign : Congratulations! I'm personally ok with any implementation which achieves realistic goals in a seamless manner (and there're other people to do reviews). Note that the way @dpgeorge described it is how I'd approach it - not starting with separating gc and memmgr part, but just making it be able to use gc_something.c instead of existing gc.c . The way you approached it is better, more formal, more reusable, more advanced, more cool. It's also more complicated, harder to do, harder to do right, and then harder to get merged. Note that I also predicted @dpgeorge's concerns, and that's again why I suggested to start with factoring out just a single function - to learn a way, and to prove, that it's possible to provide gc/memmgr separation without affecting performance. |
We need to talk about the not affecting performance part. I don't think this is possible without implementing separate ifdef blocks that reach into the internals of the selected men manager for some functions, which harms plug and play. If we require performance to not be impacted at all then this is going to be very difficult (read impossible). This is another reason to have separate pluggable functionality, as we can experiment a little as we simultaneously reach our goals. My goal would be to eventually have the faster current implementation and a lower performing but robust option. We can then do whatever we want to improve performance from there |
Great news @cloudformdesign! Take your time, no hurry.
It depends. All configurable stuff in py/ is done using #if guards; ie all files are compiled but sometimes they contain nothing. In unix/ some files are optionally compiled depending on a flag that the makefile has access to. Part of the work making gc.c replaceable will be figuring out how to configure it :) Really, if you can just provide a minimal working example (ie passes most tests) of how to use something other than gc.c then that would be a great start.
If gc.c is replaceable with gc_X.c then there will be no change in performance using the existing gc.c. That is how it should work, a simple way of using different memory implementations that doesn't force anything on the implementation except that they should provide gc_alloc, etc (which can be renamed to mem_alloc, or something). |
Hey @dpgeorge it will be a little while until I start work on this, but #1234 passes all tests for unix (still having build problems for ARM) and successfully splits the memory manager. Would you mind reviewing that? If the Personally I would solve this issue by creating a Then to add memory managers you just put them in a named folder, and create a |
The case of having a memory manager with separate and pluggable GC and malloc components is difficult. Let's first try to solve the case of having the whole thing pluggable as a single entity. I don't think we should introduce subdirs in py/. Currently we use prefixes on filenames to separate parts of the implementation (eg mod_, obj_) so we can continue doing it that way. I also think we should have a single file implementing a given memory allocation scheme. As is currently done with the nlr implementations, all files are run through the compiler, but only one is actually used due to the correct macro being defined (so all files are wrapped in a #if). The relevant state for the memory manager will need to be selected in mpstate.h using the same #if macro. gc.h is the main header that describes the memory manager interface. Any implementation must provide the functions listed there. On file names, I can think of two options:
Further down the track, when we want to implement sub components within a heap manager, there will still be one master file for a given implementation (heapXXX.c) but it will #include relevant sub-files to provide the actual functions. Eg heapplug.c could #include heapgcplug.c and heaptinymem.c. |
I am now working on this. Closing this pull request and will be opening a new one. I will be doing the following:
Does that all sound good? Edit: made it |
Having duplicates of code is not ideal, but I don't see any other way to proceed. |
No. As was discussed, avoid munging existing code until you have something of real-world value to show. Starting with renaming of files, which may show as a right step in distance future, isn't a good way. As was discussed before, you should optimize your work for following criteria: 1) change as little as possible in existing code; 2) do something useful as soon as possible and with as little effort as possible ("little effort" as in: "avoid avoidable changes and refactors", not as "write sloppy code"). As usual, that's just my personal opinion. |
Here's good example of consequences of changing memalloc/gc code: http://forum.micropython.org/viewtopic.php?f=2&t=1112 . Fanalization is a pretty straightforward code. 1.5 years after it was added, it's still buggy (and that's of course not the only bug fixed over this time). That's not the reason to not work on memalloc/gc, it's a reason to have very rigid process for that. |
Move atmel-samd to tinyusb and support nRF flash.
This commit is the first part in a series of commits to decouple
the alloc/dealloc component of the upython memory manager from the
garbage collector.
through memory
This commit WILL have a small impact on performance. The mem_next
function is not quite as fast as the original implementation.
Also, mem_free performs some checks (in MEM_VALID) that have
essentially already been done by the loop.
I contend the following:
it to be fully decoupled
mem_free_unsafe (or something) that would not perform the checks.
This could be a macro for speed, and called by the mem_free
function. Whether this complexity is required for the small
increase in performance is up to the upython developers