Description
Bug description:
Minimal reproducer
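```python
import xml.etree.ElementTree as ET

# Parse a document that declares a default namespace, then serialize it back.
root = ET.fromstring("<foo xmlns='somens'>a<bar /></foo>")
print(ET.tostring(root, encoding="unicode"))
```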
Expected output
```xml
<foo xmlns="somens">a<bar /></foo>
```
Actual output
```xml
<ns0:foo xmlns:ns0="somens">a<ns0:bar /></ns0:foo>
```
Discussion
It appears that, if a namespace isn't registered, ElementTree uses an `ns0:` namespace prefix by default during serialization instead of simply using the default namespace.
To be fair, this is technically valid, since the resulting XML is semantically the same. However, I still find this behavior problematic because:
- It is confusing and violates the principle of least surprise. Users won't expect to see a random namespace prefix just popping out of nowhere.
- It unnecessarily increases the size of the resulting XML document.
- Most importantly, the use of namespace prefixes can cause problems with downstream consumers.
  - For example, the SVG viewers in VS Code and in the GitHub file browser get confused if an SVG file uses namespace prefixes. (This is how I stumbled upon this issue.)
  - This means ElementTree can turn perfectly working documents into ones that cause compatibility headaches.
ElementTree should just use the default namespace, and only resort to prefixes if the default namespace is already in use.
Workaround
In the above example, pass `default_namespace="somens"` to `ET.tostring()`. Note that this requires the user to know the namespace URI in advance. If they don't, they can get it from the root element, but ElementTree doesn't make that easy: it requires ugly string parsing on the element's tag (`{somens}foo` in this example).
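For reference, here is a minimal sketch of that workaround when the namespace isn't known in advance. The helper `namespace_of` is a hypothetical name introduced for illustration, not part of ElementTree; it relies on parsed tags being stored as `{uri}localname`.

```python
import xml.etree.ElementTree as ET


def namespace_of(element):
    """Return the namespace URI embedded in element.tag, or None if unqualified.

    Hypothetical helper (not an ElementTree API): parsed tags look like
    "{uri}localname", so the URI is the text between the braces.
    """
    tag = element.tag
    if isinstance(tag, str) and tag.startswith("{"):
        return tag[1:].partition("}")[0]
    return None


root = ET.fromstring("<foo xmlns='somens'>a<bar /></foo>")

# Reuse the root element's namespace as the default namespace on output.
serialized = ET.tostring(
    root, encoding="unicode", default_namespace=namespace_of(root)
)
print(serialized)  # <foo xmlns="somens">a<bar /></foo>
```

One caveat with this approach: when `default_namespace` is set, serialization raises `ValueError` if the tree contains any element or attribute name that is not namespace-qualified, so it is not a drop-in fix for every document.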
CPython versions tested on:
3.11
Operating systems tested on:
Linux