8000 BUG: Fix memory leak (#10157). by skrah · Pull Request #10286 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fix memory leak (#10157). #10286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 30, 2017
Merged

BUG: Fix memory leak (#10157). #10286

merged 2 commits into from
Dec 30, 2017

Conversation

skrah
Copy link
Contributor
@skrah skrah commented Dec 27, 2017

This uses the same logic for python3 as for python2: PyUnicode_AsUTF8String(str) returns a new reference, but PyString_AS_STRING(str) does not.

#10157

@twmr
Copy link
twmr commented Dec 27, 2017

Can you add a test for that please?

@eric-wieser
Copy link
Member
eric-wieser commented Dec 28, 2017

This looks correct in that it makes py2 and py3 the same.

However, I now worry that there's a use-after-free bug here, and that tp_doc ends up containing a pointer to a deallocated string. Then again, if that is the case it was a bug in py2 already

@skrah
Copy link
Contributor Author
skrah commented Dec 28, 2017

@Thisch This only shows up in Valgrind (#10157), it would be difficult to add a test.

@eric-wieser I agree that setting the docstrings in that manner seems fragile, but I think it works:

  1. If PyUnicode_AsUTF8(str) returns NULL, tp_doc is set to NULL. This is valid for "no docs".

  2. If PyUnicode_AsUTF8(str) succeeds, it either a) returns a cached pointer to its internal utf8 representation or b) creates a new (unique) internal utf8 representation. That is to say, repeated calls on the same immutable string will return the same char *.

docstr is assigned to tp_doc etc., then the owner of the char * is incremented immediately. This means that it cannot go away -- and that technically there is still a leak, but this leak does not show up under "definitely lost" in Valgrind.

@skrah
Copy link
Contributor Author
skrah commented Dec 28, 2017

Of course one can check the return value of PyUnicode_AsUTF8(str), but I wanted to provide a minimal patch.

@eric-wieser
Copy link
Member

returns a cached pointer to its internal utf8 representation

Which is then a dangling pointer when the string object is DECREF'd, which I'd expect it to be once the call to add_new_docs is complete.

@skrah
Copy link
Contributor Author
skrah commented Dec 28, 2017

str comes from PyArg_ParseTuple(), which is reference-neutral. It is valid without incrementing inside add_new_docs, with incrementing it is valid outside the function.

@eric-wieser
Copy link
Member
eric-wieser commented Dec 28, 2017

which is reference-neutral. I

Right - so the refcount is one due to the string literal, the function is called, and then it returns, and the temporary string literal is cleaned up, bringing it down to 0, and leaving a dangling pointer.

More explicitly:

x = "somestring"  # refcount of the string is 1
some_func(x)  # reference-neutral, remains at 1
x = None # refcount of the string is 0 again, internal pointers become dangling

@skrah
Copy link
Contributor Author
skrah commented Dec 28, 2017

When the C function is called, str belongs to the *args tuple, which has its own reference to str.

When the C function returns, the tuple may be cleaned up, but it only decrements its own references to str.

@eric-wieser
Copy link
Member

I think the reference in the args tuple is the only remaining reference.

It's possible we're being saved by the string being interned, which would ensure it never got collected.

@skrah
Copy link
Contributor Author
skrah commented Dec 28, 2017

I think the reference in the args tuple is the only remaining reference.

If I can't convince you, so be it. I'm CPython developer and I'm telling you that this is not how the C-API works.

@eric-wieser
Copy link
Member

I'd rather that one of us succeed in convincing the other.

Is my three line coffee example above a valid simplification? And if so, which of my line-comments is inaccurate?

@eric-wieser
Copy link
Member

Apologies, I missed the Py_INCREF(str); at the end of the function

@skrah
Copy link
Contributor Author
skrah commented Dec 28, 2017

Ok good, the Py_INCREF(str) is indeed somewhat hidden there. :)

@eric-wieser
Copy link
Member

This is pretty much untestable, right?

@twmr
Copy link
twmr commented Dec 29, 2017

Why can't we install valgrind and run the minimal example #10157 (comment) in a subprocess of the testrunner? The test then has to check if the stdout of the subprocess contains the desired content.

@eric-wieser
Copy link
Member

That sounds out of scope for this PR, but might still be a good idea

  - tp_doc can handle NULL, but CPython compiled --with-pydebug aborts
    on non-NULL returns after an exception has been set.
@skrah
Copy link
Contributor Author
skrah commented Dec 29, 2017

Sorry for pushing a change after approval. I thought it would be better to do the right thing and check the return value after all. Python compiled --with-pydebug aborts if a function sets an exception but returns non-NULL.

@eric-wieser
Copy link
Member

No worries, thanks for catching that too. LGTM still.

@twmr
Copy link
twmr commented Dec 29, 2017

I'm working on a unit-test in twmr@04097f7 (https://travis-ci.org/thisch/numpy/builds/322878412)

WDYT if this commit is cherry-picked into the branch of this PR?

@charris charris merged commit 47cbd40 into numpy:master Dec 30, 2017
@charris
Copy link
Member
charris commented Dec 30, 2017

@skrah Thanks.

@Thisch Make a separate PR when you are ready. Might be good to make valgrind testing depend on an environmental variable because we probably don't want to do it for all the test runs.

@twmr twmr mentioned this pull request Dec 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0