gh-101438: Avoid reference cycle in ElementTree.iterparse. by colesbury · Pull Request #114269 · python/cpython · GitHub

gh-101438: Avoid reference cycle in ElementTree.iterparse. #114269


Merged: 5 commits, Jan 23, 2024
gh-101438: Avoid reference cycle in ElementTree.iterparse.
Refactor IterParseIterator to avoid a reference cycle between the
iterator() function and the IterParseIterator() instance. This leads to
more prompt clean-up of the "source" file if the returned iterator is
not exhausted and not otherwise part of a reference cycle.

This also avoids a test failure in the GC implementation for the
free-threaded build: if the "source" file is finalized before the
"iterator()" generator, a ResourceWarning is issued leading to a
failure in test_iterparse(). In theory, this warning can occur in
the default build as well, but is much less likely because it would
require an unlucky scheduling of the GC between creation of the generator
and the file object in order to change the order of finalization.
colesbury committed Jan 18, 2024
commit 2119f17f3f3e6cb62040c56cecd26666bd284f10
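
The cycle described in the commit message can be reproduced outside the standard library. The following is a minimal sketch, not the actual ElementTree code: `make_cyclic`, `make_acyclic`, `demo`, and the `log` list are hypothetical names introduced for illustration, and the behaviour shown assumes CPython's reference counting with the cyclic collector temporarily disabled.

```python
import gc


def make_cyclic(log):
    """Old shape: generator -> closure cell `it` -> instance -> class -> generator."""
    def gen():
        try:
            yield "start"
            yield "event"
            it.root = "root"  # the closure refers back to the instance
        finally:
            log.append("cyclic closed")  # stands in for source.close()

    class It:
        __next__ = gen().__next__  # the class holds the generator

    it = It()
    it.root = None
    next(it)  # start the generator so its finally block is armed
    return it


def make_acyclic(log):
    """New shape: the instance owns the generator; the generator never sees it."""
    _root = None

    def gen():
        nonlocal _root
        try:
            yield "start"
            yield "event"
            _root = "root"
        finally:
            log.append("acyclic closed")

    class It:
        def __init__(self, g):
            self._g = g

        def __next__(self):
            return next(self._g)

        @property
        def root(self):
            return _root

    it = It(gen())
    next(it)
    return it


def demo():
    """Drop each unexhausted iterator and record when its finally block runs."""
    log = []
    gc.collect()
    gc.disable()  # rely on reference counting alone for the first two steps
    try:
        it = make_cyclic(log)
        del it
        after_cyclic = list(log)
        it = make_acyclic(log)
        del it
        after_acyclic = list(log)
        gc.collect()  # only now can the cyclic variant be finalized
        after_gc = list(log)
    finally:
        gc.enable()
    return after_cyclic, after_acyclic, after_gc
```

Under those assumptions, the acyclic variant runs its `finally` block as soon as the last reference is dropped. The cyclic variant has to wait for a garbage collection pass, which is the delay the commit message describes for the "source" file.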
18 changes: 14 additions & 4 deletions Lib/xml/etree/ElementTree.py
@@ -1222,6 +1222,7 @@ def iterparse(source, events=None, parser=None):
     # Use the internal, undocumented _parser argument for now; When the
     # parser argument of iterparse is removed, this can be killed.
     pullparser = XMLPullParser(events=events, _parser=parser)
+    _root = None

     def iterator(source):
         close_source = False
@@ -1239,15 +1240,24 @@ def iterator(source):
                 pullparser.feed(data)
             root = pullparser._close_and_return_root()
             yield from pullparser.read_events()
-            it.root = root
+            nonlocal _root
+            _root = root
         finally:
             if close_source:
                 source.close()

     class IterParseIterator(collections.abc.Iterator):
-        __next__ = iterator(source).__next__
-    it = IterParseIterator()
-    it.root = None
+        def __init__(self, it):
+            self.it = it
+
+        def __next__(self):
+            return next(self.it)
+
+        @property
+        def root(self):
+            return _root
+
+    it = IterParseIterator(iterator(source))
     del iterator, IterParseIterator

     next(it)
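
From the caller's side, the `root` attribute should behave the same before and after this refactoring: it is `None` until the source has been fully read, then it refers to the root element. A small sketch of that contract follows; `parse_and_get_root` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for a real file.

```python
import io
import xml.etree.ElementTree as ET


def parse_and_get_root(xml_bytes):
    """Return (root before iteration, tags of "end" events, root tag afterwards)."""
    it = ET.iterparse(io.BytesIO(xml_bytes), events=("end",))
    before = it.root  # nothing has been parsed yet
    tags = [elem.tag for _event, elem in it]  # exhaust the iterator
    return before, tags, it.root.tag


before, tags, root_tag = parse_and_get_root(b"<a><b/><c/></a>")
print(before, tags, root_tag)
```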
@@ -0,0 +1,4 @@
Avoid reference cycle in ElementTree.iterparse. The iterator returned by
``ElementTree.iterparse`` may hold on to a file descriptor. The reference
cycle prevented prompt clean-up of the file descriptor if the returned
iterator was not exhausted.