Add missing xmlparser attributes: namespace_prefixes, ordered_attributes, specified_attributes, intern #6494
Merged
+182
−7
6 commits
3fcd777 Initial plan (Copilot)
9dd0626 Add namespace_prefixes and other missing xmlparser attributes (Copilot)
18a27f6 Remove expectedFailure decorators from passing tests (Copilot)
143e52f Address code review feedback (Copilot)
e6bf9de Auto-format: cargo fmt --all (github-actions[bot])
c9be4ac Remove expectedFailure from test_parse_str and test_parse_file (Copilot)
🧩 Analysis chain
🌐 Web query:
CPython xml.parsers.expat xmlparser attribute names buffer_text namespace_prefixes ordered_attributes specified_attributes
💡 Result:
Summary of xml.parsers.expat xmlparser attributes
buffer_text: when True, the parser buffers character data to avoid many small CharacterDataHandler calls (Expat normally splits text at line endings). Default False; can be changed at any time. [1]
buffer_size / buffer_used: buffer_size sets the buffer capacity used when buffer_text is True (changing it flushes the buffer). buffer_used reports how many bytes (UTF‑8) are currently stored. [1]
ordered_attributes: when nonzero, attributes passed to StartElementHandler are a list (name, value, name, value, ...) in the document order instead of a dict mapping names→values. Default False. [1]
specified_attributes: when nonzero, the parser reports only attributes that were explicitly specified in the instance document (not those defaulted from DTD attribute declarations). Use with care if you rely on attribute-declaration defaults. Default False. [1]
namespace handling / namespace_prefixes: Expat can do namespace processing when the parser is created via ParserCreate with a namespace separator (the namespace_separator argument in the Python wrapper). With namespace processing enabled, Expat consumes xmlns/xmlns:... attributes (so they won't appear as normal attributes) and expands element and attribute names to the namespace URI and the local name joined by the separator. Namespace declaration events can be observed with StartNamespaceDeclHandler/EndNamespaceDeclHandler. (The behavior of attribute names and namespace-declaration attributes follows Expat's namespace processing rules.) [2][1]
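The two most visible switches in the summary above, ordered_attributes and buffer_text, can be observed with a short script. This is a minimal sketch against CPython's xml.parsers.expat; the helper names collect_start_events and collect_text are made up for illustration and are not part of the PR.

```python
from xml.parsers import expat


def collect_start_events(xml_text, ordered=False):
    """Return the (name, attrs) pairs passed to StartElementHandler."""
    events = []
    parser = expat.ParserCreate()
    parser.ordered_attributes = ordered
    parser.StartElementHandler = lambda name, attrs: events.append((name, attrs))
    parser.Parse(xml_text, True)
    return events


def collect_text(xml_text, buffered):
    """Return the chunks passed to CharacterDataHandler."""
    chunks = []
    parser = expat.ParserCreate()
    parser.buffer_text = buffered
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return chunks


doc = '<item b="2" a="1"/>'
print(collect_start_events(doc))                # attributes arrive as a dict
print(collect_start_events(doc, ordered=True))  # attributes arrive as a flat list

text_doc = "<r>line1\nline2</r>"
print(collect_text(text_doc, buffered=False))   # may be split into several chunks
print(collect_text(text_doc, buffered=True))    # delivered as one chunk
```

Running the same script under RustPython is one way to check that the attributes added in this PR behave the same way.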
Sources
🏁 Script executed:
Repository: RustPython/RustPython
Length of output: 1783
🌐 Web query:
CPython pyexpat intern attribute dictionary
💡 Result:
Short answer: pyexpat interns attribute names/strings when it builds the Python attributes dictionary so the attribute keys/other XML name strings become interned Python str objects (reducing allocations and speeding dict lookups). The C implementation calls its string-intern helper when converting Expat's XML_Char strings to Python strings before putting them into the attributes dict; the public API then hands you a normal dict (or ordered list if ordered_attributes is enabled). See the StartElementHandler docs and the pyexpat.c source where string_intern is used.
Sources: Python pyexpat documentation (StartElementHandler / attributes). [1] PyExpat C implementation showing string_intern usage when constructing handler arguments. [2]
References
[1] xml.parsers.expat — StartElementHandler / attributes (Python stdlib docs).
[2] Modules/pyexpat.c (source: string_intern usage when converting element/attribute names).
🌐 Web query:
CPython xml.parsers.expat ParserCreate namespace_prefixes attribute xmlparser
💡 Result:
Short answer: There is no documented xmlparser.namespace_prefixes attribute in CPython's xml.parsers.expat. To enable namespace processing you pass namespace_separator to ParserCreate(); namespace declarations are reported via StartNamespaceDeclHandler/EndNamespaceDeclHandler and element/attribute names are expanded using the separator. Expat itself can be instructed to return the (uri, localname, prefix) triplet (XML_SetReturnNSTriplet), but that API is part of libexpat and is not exposed as a simple xmlparser.namespace_prefixes attribute in the Python docs. [1][2]
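The documented namespace path described in this result (a namespace_separator passed to ParserCreate, plus StartNamespaceDeclHandler) can be sketched as follows. The helper name parse_with_namespaces and the urn:example URI are illustrative assumptions; the sketch does not touch namespace_prefixes.

```python
from xml.parsers import expat


def parse_with_namespaces(xml_text, sep=" "):
    """Parse with namespace processing on; return (start events, declarations)."""
    starts, decls = [], []
    parser = expat.ParserCreate(namespace_separator=sep)
    parser.StartElementHandler = lambda name, attrs: starts.append((name, attrs))
    parser.StartNamespaceDeclHandler = lambda prefix, uri: decls.append((prefix, uri))
    parser.Parse(xml_text, True)
    return starts, decls


doc = '<p:root xmlns:p="urn:example" p:id="7"/>'
starts, decls = parse_with_namespaces(doc)
print(starts)  # names are expanded to "<uri><sep><local name>"
print(decls)   # the xmlns:p declaration is reported here, not as an attribute
```

With the separator set, the xmlns:p declaration no longer shows up in the attribute dict, and the prefix p is only visible through the declaration handler.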
Sources
[1] Python docs — xml.parsers.expat (ParserCreate, StartNamespaceDeclHandler).
[2] Expat reference — Namespace Processing / XML_SetReturnNSTriplet.
🌐 Web query:
CPython xml.parsers.expat xmlparser complete list of attributes properties
💡 Result:
Below is a concise, complete listing (as given by the CPython docs) of the xml.parsers.expat xmlparser type: its public methods, instance attributes (properties) and the handler attributes you can set. Source: Python standard-library documentation for xml.parsers.expat. [1]
Methods
Instance attributes / properties
Error-related attributes (valid after ExpatError)
Current parse-position attributes
Handler attributes (assign callable functions: o.handlername = func)
Notes
Reference
[1] Python documentation — xml.parsers.expat (XMLParser / xmlparser) (CPython stdlib).
🏁 Script executed:
Repository: RustPython/RustPython
Length of output: 2765
Remove the non-standard attributes namespace_prefixes and intern, or add comments explaining why they diverge from CPython's xml.parsers.expat API.
According to the official CPython documentation, the xmlparser instance attributes are buffer_size, buffer_text, buffer_used, ordered_attributes, and specified_attributes. The RustPython implementation registers buffer_text, ordered_attributes, and specified_attributes correctly, but also includes:
namespace_prefixes: not documented in CPython's public API
intern: not a public xmlparser attribute in CPython (though internal string interning occurs in the C implementation)
Either remove these or add inline documentation explaining them as RustPython extensions to match CPython's behavior and prevent compatibility issues.