bpo-42972: Fully implement GC protocol for functools LRU cache #26423

erlend-aasland · 2021-05-28T08:25:01Z

- Clear and visit members in the correct order
- Visit the LRU cache elem type via lru_cache_tp_traverse

https://bugs.python.org/issue42972

vstinner

I'm not sure if it's a good idea to "expose" the link type in this traverse function. The only purpose would be to be able to break reference cycles involving multiple "hidden" objects: the LRU cache itself (which is not directly accessible in Python, you need to dig into gc.get_objects()) and the link type (again, you need gc.get_objects() since @methane removed it from the module dict, which is a good thing).

I'm not strongly against it: technically, it respects the GC, since the clear function clears the linked list and so removes references to the link type.

By the way, making the two LRU heap types (cache and link) immutable can make these reference cycles even less likely. But it should be done in a separate PR.
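As a rough illustration of the kind of reference cycle discussed above, the following sketch builds a cycle that runs through an LRU cache entry and checks that the collector can break it. The names `Node`, `build_cycle`, and `identity` are made up for this example; it assumes an interpreter whose `lru_cache` wrapper takes part in garbage collection, as CPython's does.

```python
import functools
import gc
import weakref


class Node:
    """Plain object that can hold a reference to a cached function."""


def build_cycle():
    # A fresh wrapper per call, so nothing outside this function keeps it alive.
    @functools.lru_cache(maxsize=8)
    def identity(obj):
        return obj

    node = Node()
    node.cached = identity   # node -> LRU cache wrapper
    identity(node)           # wrapper -> cache entry (key and result) -> node
    return weakref.ref(node)


node_ref = build_cycle()
gc.collect()                 # the cycle is now reachable only from itself
node_is_gone = node_ref() is None
print("cycle collected:", node_is_gone)
```

The cycle can only be found because the wrapper's traverse function reports the keys and results stored in its linked list.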

vstinner · 2021-05-28T08:30:47Z

Modules/_functoolsmodule.c

@@ -1327,15 +1327,17 @@ lru_cache_deepcopy(PyObject *self, PyObject *unused)
 static int
 lru_cache_tp_traverse(lru_cache_object *self, visitproc visit, void *arg)
 {
+    Py_VISIT(Py_TYPE(self));
    lru_list_elem *link = self->root.next;
    while (link != &self->root) {
        lru_list_elem *next = link->next;


Please add a comment (with a reference to bpo-42972) somewhere (here may be a good place) to explain why the link type doesn't implement the GC protocol.

I understand that the code works as if the GC protocol is implemented, but the code is inlined.
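To make "inlined" concrete, here is a small Python model of the idea, not the C code itself: the link objects have no traverse method of their own, and the owning container walks the circular list and reports each link's references. `Link` and `ToyCache` are hypothetical names used only for this sketch.

```python
class Link:
    """Hypothetical stand-in for lru_list_elem: no traverse method of its own."""
    __slots__ = ("prev", "next", "key", "result")

    def __init__(self, key=None, result=None):
        self.prev = self.next = self
        self.key = key
        self.result = result


class ToyCache:
    """Hypothetical container owning a circular list anchored at `root`."""

    def __init__(self):
        self.root = Link()  # sentinel; carries no key or result

    def append(self, key, result):
        link = Link(key, result)
        last = self.root.prev
        last.next = link
        link.prev = last
        link.next = self.root
        self.root.prev = link

    def traverse(self, visit):
        # The container reports each link's references itself, walking
        # from root.next until it comes back around to root.
        link = self.root.next
        while link is not self.root:
            nxt = link.next
            visit(link.key)
            visit(link.result)
            link = nxt


cache = ToyCache()
cache.append("a", 1)
cache.append("b", 2)
seen = []
cache.traverse(seen.append)
print(seen)  # ['a', 1, 'b', 2]
```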

vstinner · 2021-05-28T08:45:05Z

Modules/_functoolsmodule.c

    lru_list_elem *link = self->root.next;
    while (link != &self->root) {
        lru_list_elem *next = link->next;
        Py_VISIT(link->key);
        Py_VISIT(link->result);
+        Py_VISIT(Py_TYPE(link));


nitpick: I suggest visiting the type first, to visit object members in their definition order.

I'm not sure I understand why the link type doesn't implement the GC protocol but is "exposed" in gc.get_objects() by this Py_VISIT() call. I'm not sure that it's a good idea to visit the type.

> nitpick: I suggest visiting the type first, to visit object members in their definition order.

Sorry, I was a little bit too fast there. We can fix it later if needed.
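One way to see from Python what the traverse function reports is `gc.get_referents()`, which calls the object's traverse slot and collects everything it visits. The sketch below assumes CPython with the C implementation of `lru_cache`; with a pure-Python fallback the wrapper is an ordinary function and the result differs. `marker` and `lookup` are illustrative names.

```python
import functools
import gc

marker = object()  # unique result object, easy to spot among the referents


@functools.lru_cache(maxsize=4)
def lookup(key):
    return marker


lookup("a")  # populate one cache entry

# Assumption: gc.get_referents() runs the wrapper's C traverse function.
referents = gc.get_referents(lookup)
result_is_visited = any(r is marker for r in referents)

# Names of any type objects the traversal reports; which ones appear
# depends on the CPython version.
visited_types = sorted({r.__name__ for r in referents if isinstance(r, type)})

print("cached result visited:", result_is_visited)
print("types visited:", visited_types)
```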

vstinner · 2021-05-28T09:01:23Z

Oh. lru_list_elem_type was modified to not implement the GC protocol in order to optimize the code. I missed that.

Please explain it in a comment, it's non-obvious.

https://bugs.python.org/issue32422 seems to conflict with https://bugs.python.org/issue42972

Maybe it's acceptable to not implement the GC protocol if the link type is hidden well and it is immutable. In my experience, reference cycles are created in very surprising ways. I don't know about this one. I don't know if it's acceptable for now, if it's a trade-off, or if there is a risk of leaks.

An alternative would be to completely avoid Python objects in the cache: replace the lru_cache_object.cache Python dictionary with a _Py_hashtable_t, so the linked list items don't have to be Python objects anymore.

But this change would be a major change and should only be done in the main branch.

The question is what to do in the 3.10 branch:

1. Implement the GC protocol in the link type again: revert https://bugs.python.org/issue32422
2. Don't implement the GC protocol in the link type, make it immutable, and hide it to reduce the risk of reference cycles involving this type

miss-islington · 2021-05-28T09:02:45Z

Thanks @erlend-aasland for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

bedevere-bot · 2021-05-28T09:02:54Z

GH-26425 is a backport of this pull request to the 3.10 branch.

vstinner · 2021-05-28T09:04:06Z

I dislike the change, but I merged it anyway to unblock https://bugs.python.org/issue42972 since @pablogsal marked it as a release blocker for the beta2. Moreover, at least, the change makes the code less broken, since it now visits the LRU cache type which is a bugfix. We can decide what to do with the link type after the beta2.

erlend-aasland · 2021-05-28T09:59:15Z

@vstinner: I can add explanatory comments in a new PR, if you want. Also, let me know if you want a PR for making the cache and cache link types immutable.

pablogsal · 2021-05-28T13:46:24Z

I am a bit confused about what happened here, the PR mentions that "fully implement the GC protocol" but the change just moves some calls and visits the type....

erlend-aasland · 2021-05-28T13:53:45Z

> I am a bit confused about what happened here, the PR mentions that "fully implement the GC protocol" but the change just moves some calls and visits the type....

The LRU cache type already had the Py_TPFLAGS_HAVE_GC flag, and already had a traverse method. The only thing missing was visiting the type. According to bpo-42972, the LRU cache now fully implements the GC protocol, no?
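The claims in this reply can be checked from Python, at least approximately. The sketch below reads the type flags of the wrapper and asks the collector whether the instance is tracked. `PY_TPFLAGS_HAVE_GC` mirrors the value of `Py_TPFLAGS_HAVE_GC` in CPython's `object.h`; `square` is an illustrative function. Whether the wrapper reports its own type depends on the CPython version, so that value is printed rather than assumed.

```python
import functools
import gc

PY_TPFLAGS_HAVE_GC = 1 << 14  # value of Py_TPFLAGS_HAVE_GC in CPython


@functools.lru_cache(maxsize=4)
def square(x):
    return x * x


wrapper_type = type(square)
has_gc_flag = bool(wrapper_type.__flags__ & PY_TPFLAGS_HAVE_GC)
is_tracked = gc.is_tracked(square)

# Expected to be True only on versions that include the type visit.
visits_own_type = any(r is wrapper_type for r in gc.get_referents(square))

print("HAVE_GC flag set:", has_gc_flag)
print("instance tracked:", is_tracked)
print("traverse visits the type:", visits_own_type)
```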

pablogsal · 2021-05-28T14:59:01Z

> > I am a bit confused about what happened here, the PR mentions that "fully implement the GC protocol" but the change just moves some calls and visits the type....
>
> The LRU cache type already had the Py_TPFLAGS_HAVE_GC flag, and already had a traverse method. The only thing missing was visiting the type. According to bpo-42972, the LRU cache now fully implements the GC protocol, no?

Is there any regression in performance with this or other related changes for the cache? I am asking basically based on @methane's previous efforts to optimize this.

erlend-aasland · 2021-05-28T17:02:49Z

> Is there any regression in performance with this or other related changes for the cache? I am asking basically based on @methane's previous efforts to optimize this.

@methane's optimisations were in the end applied (lru cache link type is visited via the lru cache type traversal slot). I haven't done any benchmarking yet.
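In the absence of benchmark numbers, a rough first check could time a full collection while a large cache is alive, since the collector has to walk every entry through the wrapper's traverse function. This is only a sketch, not a rigorous benchmark: the cache size and the function `cached_square` are arbitrary choices for illustration, and a single wall-clock measurement is noisy.

```python
import functools
import gc
import time


@functools.lru_cache(maxsize=None)
def cached_square(x):
    return x * x


# Fill the cache so the traversal has many links to walk.
for i in range(100_000):
    cached_square(i)

start = time.perf_counter()
gc.collect()
elapsed = time.perf_counter() - start

entries = cached_square.cache_info().currsize
print(f"gc.collect() with {entries} cache entries: {elapsed * 1e3:.2f} ms")
```

Comparing this figure across builds before and after a change would give a first hint of a regression; a dedicated benchmarking tool would be needed for reliable numbers.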

bpo-42972: Fully implement GC protocol for functools LRU cache

3573c4d

erlend-aasland requested a review from rhettinger as a code owner May 28, 2021 08:25

the-knights-who-say-ni added the CLA signed label May 28, 2021

bedevere-bot added the awaiting review label May 28, 2021

erlend-aasland requested review from vstinner, shihai1991 and methane May 28, 2021 08:25

erlend-aasland added the "needs backport to 3.10 only security fixes" and "skip news" labels May 28, 2021

methane approved these changes May 28, 2021

bedevere-bot added awaiting merge and removed awaiting review labels May 28, 2021

erlend-aasland mentioned this pull request May 28, 2021

bpo-42972: Fully implement GC protocol for functools keywrapper and partial types #26363

Merged

vstinner reviewed May 28, 2021

vstinner merged commit 3f8d332 into python:main May 28, 2021

bedevere-bot removed the awaiting merge label May 28, 2021

bedevere-bot removed the needs backport to 3.10 only security fixes label May 28, 2021

erlend-aasland deleted the gc-lru-cache branch May 28, 2021 09:44

miss-islington added a commit that referenced this pull request May 28, 2021

bpo-42972: Fully implement GC protocol for functools LRU cache (GH-26423) (cherry picked from commit 3f8d332) Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>

1c0106c

pablogsal mentioned this pull request Jul 18, 2021

[3.10] Correct the order of regen-abidump #27228

Closed

vstinner mentioned this pull request Jun 9, 2022

[C API] Heap types (PyType_FromSpec) must fully implement the GC protocol #87138

Closed
