8000 GH-92123: Move _elementtree heap types to module state by erlend-aasland · Pull Request #101187 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

GH-92123: Move _elementtree heap types to module state #101187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

erlend-aasland
Copy link
Contributor
@erlend-aasland erlend-aasland commented Jan 20, 2023

Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
@erlend-aasland
Copy link
Contributor Author
erlend-aasland commented Jan 21, 2023

Thanks for the reviews, Kumar and Oleg!

For the path forward, there are two options: either pass state as a parameter, or store it in the type structs. I've created draft PRs for the two alternatives. Of course, a third path is a combination of both.

@erlend-aasland erlend-aasland merged commit 13566a3 into python:main Jan 21, 2023
@erlend-aasland erlend-aasland deleted the etree/move-heap-types-to-state branch January 21, 2023 11:01
@arhadthedev
Copy link
Member

For the path forward, there are three options: either pass state as a parameter, store it in the type structs.

Both variants ultimately call PyModule_GetState, right? [with (1) caching its result into a field of self and (2) plainly calling it each time on demand]

If so, we need an informed decision by benchmarking some _elementtree method in both options. It should show whether direct structure access of (1) is faster than preheated call to PyModule_GetState of (2).

@erlend-aasland
Copy link
Contributor Author
erlend-aasland commented Jan 21, 2023

Both variants ultimately call PyModule_GetState, right?

PyModule_GetState is called for module level functions, PyType_GetModuleState for class methods (via defining class), and PyType_GetModuleByDef plus PyModule_GetState for slots. The two former are inexpensive lookups, the latter is slightly slower. Caching state in self results in more memory usage (might only be important for the Element type). I'd be surprised if it is possible to measure the performance impact of this, but sure let's do some benchmarking.

@erlend-aasland
Copy link
Contributor Author

I ran the pyperformance xml_etree benchmark towards all three alternatives (1: pass state around as a parameter, 2: store state in the heap type contexts, 3: store state in TreeBuilder and XMLParser contexts, but pass it around for Element and its iter). I could not measure a significant difference any of the alternatives.

@kumaraditya303
Copy link
Contributor

It won't matter as expected as most of the time as long as there isn't much stack spilling the performance should be identical.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0