8000 Call free on GC for object types by srajko · Pull Request #958 · nodegit/nodegit · GitHub
[go: up one dir, main page]

Skip to content

Call free on GC for object types#958

Merged
johnhaley81 merged 12 commits intonodegit:masterfrom
srajko:object-leak-fix
Mar 24, 2016
Merged

Call free on GC for object types#958
johnhaley81 merged 12 commits intonodegit:masterfrom
srajko:object-leak-fix

Conversation

@srajko
Copy link
Collaborator
@srajko srajko commented Mar 18, 2016

No description provided.

@srajko srajko force-pushed the object-leak-fix branch 2 times, most recently from d3adb47 to 27713af Compare March 22, 2016 22:50
@srajko
Copy link
Collaborator Author
srajko commented Mar 22, 2016

This adds two mechanisms - one marks types as selfFreeing, which allows for the selfFreeing parameter to be passed in with the correct value for that particular type, when creating a new wrapper for a returned libgit2 instance. It's a little wonky that we currently have two selfFreeing mechanisms with overlapping functionality, and eventually these two might merge together, but for now I was a little weary of changing too much. For this, we added a test that confirms that git_commit_free etc. are called on types which are selfFreeing.

The second mechanism is selfDuplicating and addresses the problem that arises when a selfFreeing object returns an internally stored field (for example git_commit_author) which then gets destroyed as part of the selfFreeing object free. selfDuplicating objects will get duplicated before being stored in the wrapper, and freed on destruction / garbage collection, avoiding the aforementioned issue.

@srajko
Copy link
Collaborator Author
srajko commented Mar 22, 2016

BTW, this is leading up to #955 - by itself, in terms of memory leaks this PR only has a benefit if the user calls free on the repository object explicitly (because the object types affected by the PR are cached in the repository object, and even though we free them when the wrappers go out of scope the free just causes their refcount to decrease, and the repository will still have an outstanding reference preventing the refcount from going to 0 and the object actually getting freed).

@srajko srajko changed the title [WIP] Call free on GC for object types Call free on GC for object types Mar 23, 2016
@johnhaley81 johnhaley81 self-assigned this Mar 23, 2016
to = {{ cppClassName }}::New({{= parsedName =}});
{% else %}
to = {{ cppClassName }}::New({{= parsedName =}}, false);
to = {{ cppClassName }}::New({{= parsedName =}}, {% if selfFreeing %}true{% else %}false{% endif %});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty verbose. I think there should be another Combyne filter called toBool that just basically returns !!value. I'll push that to your branch.

@srajko
Copy link
Collaborator Author
srajko commented Mar 23, 2016

I was thinking about extending the instance counting to all wrappers (not just for types marked as selfFreeing), and count how many instances were constructed / destructed both when selfFreeing is passed true and false (separately). That might help us diagnose where remaining leaks are coming from.

@johnhaley81 what do you think?

@srajko
Copy link
Collaborator Author
srajko commented Mar 23, 2016

@implausible for selfDuplicating, we might need to override the behavior for certain functions (or maybe we only specify it for certain functions, namely the ones returning owned data from commits, trees, blobs, and tags).

For example, we should definitely not duplicate on https://libgit2.github.com/libgit2/#HEAD/group/signature/git_signature_new (unless we immediately free the returned signature, but at that point we might as well just keep the returned signature and free it on destroy just like we do with the copies).

@johnhaley81
Copy link
Collaborator

@srajko wrt SelfFreeingInstanceCount it seems weird that we're adding stuff that we only need for tests. However, the only way we could get away with not exposing it JS side is to have a C++ testing kit unless somebody knows something that I don't (plzplzplz).

Regardless of that though, it does highlight a need for a C++ testing suite so we should start looking into that. Once we get something going in that case I think we should remove SelfFreeingInstanceCount from the JS object.

@johnhaley81
Copy link
Collaborator

So extending the instance count to all wrappers (this comment notwithstanding) is a good idea. In tests we should at least start documenting memory usage and anything that can help diagnose leaks is 👍

@johnhaley81
Copy link
Collaborator

@srajko so selfDuplicating I think is on the wrong object. Or at least is very confusing. Currently it exists on the object that is duplicated in a global setting (i.e. this object must be duplicated always every time) which means that we'll copy memory 100% of the time regardless. What I think it should be is a duplicate function on the object and a shouldDuplicate on methods that return an object that has a duplicate function and which the parent object owns the original (and will free it when itself frees)

However, this is (what I believe) selfFreeing was supposed to address originally. i.e. you have an object that has to free its underlying libgit2 type or you have an object that should not free its underlying libgit2 type. When constructing the JS value we were passing in what type of object that was and I don't see exactly why we need the new value. Could you help fill in where I might be overlooking?

@srajko
Copy link
Collaborator Author
srajko commented Mar 23, 2016

What I think it should be is a duplicate function on the object and a shouldDuplicate on methods that return an object that has a duplicate function and which the parent object owns the original (and will free it when itself frees)

I think that would work well, and would address #958 (comment)

However, this is (what I believe) selfFreeing was supposed to address originally

It's related but slightly different. I think, in the endgame, we need every object to be either self-freeing (that is, the wrapper owns it, and nothing else is going to free it while the wrapper is alive), or it needs to have a handle to its owner's wrapper (so that the owner can't be GCd/destroyed as long as the object in question is reachable). Currently though, we have three classes of objects - leaked, selfFreeing (libgit2 hands it to us, and we free on GC), and selfDuplicating (libgit2 hands it to us, we copy it so that we can own it, then we free on GC).

For things like git_signature, which can be returned by commits and owned by commits, either we need the duplication mechanism so we own the git_signature, or we need the signature to have a handle on its owner (the commit, for example), so that it doesn't get freed while reachable.

@srajko
Copy link
Collaborator Author
srajko commented Mar 23, 2016

@johnhaley81 I agree, C++ testing would be great. I think I got something going on a branch with https://www.npmjs.com/package/cppunitlite at one point. I would prefer http://banditcpp.org/ though, for its syntactic similarity to the JS tests (maybe cppunitlite and bandit could be used together?)

I will extend the count system to all wrappers then, but I do agree with you - once we get the leaks resolved all across nodegit and/or get some C++ tests going we could pull out the exposed statistics.

@johnhaley81
Copy link
Collaborator

I'm looking into why coverage tests are dying. @tbranyen have any idea why this is happening?

@johnhaley81
Copy link
Collaborator

Aaaaand it fixed itself /shrug

@johnhaley81
Copy link
Collaborator

Just a side note that what just happened is scary as hell :)

@srajko
Copy link
Collaborator Author
srajko commented Mar 23, 2016

I fixed up the diagnostic counts. I didn't want to go overboard with what we count, so I'm just counting too values - for self-freeing objects, it counts the number of outstanding instances (constructed-destructed). For non-self-freeing, it only counts the number of co 10BC0 nstructed instances (as this is a possible source of memory leaks).

@srajko
Copy link
Collaborator Author
srajko commented Mar 23, 2016

@implausible do you want to take a look at #958 (comment) / #958 (comment) or do you want me to?

@implausible
Copy link
Member

@srajko i'm currently working on increasing lock coverage in GK and as that will also be a big change... I don't think I can contribute today.

to = {{ cppClassName }}::New({{= parsedName =}});
{% else %}
to = {{ cppClassName }}::New({{= parsedName =}}, {{ selfFreeing|toBool }});
to = {{ cppClassName }}::New({{= parsedName =}}, {{ selfFreeing|toBool }} {% if shouldDuplicate %}, true{% endif %});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{ shouldDuplicate|toBool }}?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it, but there is a case where we call that on something that I think is a struct. And I didn't want to add that parameter to struct constructors without adding support for it as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(so adding the 3rd parameter only whenshouldDuplicate is turned on avoids that problem)

srajko added 6 commits March 23, 2016 16:20
Affects commit objects returned by:
git_commit_lookup
git_commit_lookup_prefix
git_commit_nth_gen_ancestor
git_commit_parent
Affects tree objects returned by:
git_tree_lookup
git_tree_lookup_prefix
Affects tag objects returned by:
git_tag_lookup
git_tag_lookup_prefix
Affects blob objects returned by:
git_blob_lookup
git_blob_lookup_prefix
srajko and others added 4 commits March 23, 2016 16:20
It's possible that we will eventually get rid of the selfFreeing parameter, and just go by the type descriptor, but for now we are more comfortable with a smaller change.
the structs git_tree_entry, git_oid, and git_signature all represent structs which are owned by some other object. In order to simplify our memory model, we should always duplicate these 3 structs when we pull them out of libgit2. This detaches these types of structs from their owning object, such that when we delete a commit, none of the oids or signatures that we retrieved from that commit would cause bad memory access.
@srajko
Copy link
Collaborator Author
srajko commented Mar 23, 2016

I've added the shouldDuplicate functionality proposed in #958 (comment) instead of selfDuplicating, and flagged the necessary functions. Someone might want to go through the libgit2 docs for blob, commit, tree, and tag, and double-check that I flagged everything correctly.

@johnhaley81
Copy link
Collaborator

This LGTM now. Thanks @srajko

@johnhaley81 johnhaley81 merged commit 487aa31 into nodegit:master Mar 24, 2016
@johnhaley81 johnhaley81 deleted the object-leak-fix branch March 24, 2016 04:24
johnhaley81 added a commit to srajko/nodegit that referenced this pull request Mar 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

0