8000 bpo-17239: Disable external entities in SAX parser (GH-9217) · python/cpython@17b1d5d · GitHub
[go: up one dir, main page]

Skip to content

Commit 17b1d5d

Browse files
tiranmiss-islington
authored andcommitted
bpo-17239: Disable external entities in SAX parser (GH-9217)
The SAX parser no longer processes general external entities by default to increase security. Before, the parser created network connections to fetch remote files or loaded local files from the file system for DTD and entities. Signed-off-by: Christian Heimes <christian@python.org> https://bugs.python.org/issue17239
1 parent 9fb051f commit 17b1d5d

File tree

9 files changed

+120
-5
lines changed

9 files changed

+120
-5
lines changed

Doc/library/xml.dom.pulldom.rst

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,20 @@ events until either processing is finished or an error condition occurs.
2525
maliciously constructed data. If you need to parse untrusted or
2626
unauthenticated data see :ref:`xml-vulnerabilities`.
2727

28+
.. versionchanged:: 3.8
29+
30+
The SAX parser no longer processes general external entities by default to
31+
increase security by default. To enable processing of external entities,
32+
pass a custom parser instance in::
33+
34+
from xml.dom.pulldom import parse
35+
from xml.sax import make_parser
8000
36+
from xml.sax.handler import feature_external_ges
37+
38+
parser = make_parser()
39+
parser.setFeature(feature_external_ges, True)
40+
parse(filename, parser=parser)
41+
2842

2943
Example::
3044

Doc/library/xml.rst

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -65,8 +65,8 @@ kind sax etree minidom p
6565
========================= ============== =============== ============== ============== ==============
6666
billion laughs **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable**
6767
quadratic blowup **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable**
68-
external entity expansion **Vulnerable** Safe (1) Safe (2) **Vulnerable** Safe (3)
69-
`DTD`_ retrieval **Vulnerable** Safe Safe **Vulnerable** Safe
68+
external entity expansion Safe (4) Safe (1) Safe (2) Safe (4) Safe (3)
69+
`DTD`_ retrieval Safe (4) Safe Safe Safe (4) Safe
7070
decompression bomb Safe Safe Safe Safe **Vulnerable**
7171
========================= ============== =============== ============== ============== ==============
7272

@@ -75,6 +75,8 @@ decompression bomb Safe Safe Safe S
7575
2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
7676
the unexpanded entity verbatim.
7777
3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
78+
4. Since Python 3.8.0, external general entities are no longer processed by
79+
default since Python.
7880

7981

8082
billion laughs / exponential entity expansion

Doc/library/xml.sax.rst

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,14 @@ the SAX API.
2424
constructed data. If you need to parse untrusted or unauthenticated data see
2525
:ref:`xml-vulnerabilities`.
2626

27+
.. versionchanged:: 3.8
28+
29+
The SAX parser no longer processes general external entities by default
30+
to increase security. Before, the parser created network connections
31+
to fetch remote files or loaded local files from the file
32+
system for DTD and entities. The feature can be enabled again with method
33+
:meth:`~xml.sax.xmlreader.XMLReader.setFeature` on the parser object
34+
and argument :data:`~xml.sax.handler.feature_external_ges`.
2735

2836
The convenience functions are:
2937

Doc/whatsnew/3.8.rst

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -155,6 +155,15 @@ venv
155155
activating virtual environments under PowerShell Core 6.1.
156156
(Contributed by Brett Cannon in :issue:`32718`.)
157157

158+
xml
159+
---
160+
161+
* As mitigation against DTD and external entity retrieval, the
162+
:mod:`xml.dom.minidom` and mod:`xml.sax` modules no longer process
163+
external entities by default.
164+
(Contributed by Christian Heimes in :issue:`17239`.)
165+
166+
158167
Optimizations
159168
=============
160169

@@ -333,6 +342,9 @@ Changes in the Python API
333342
* :class:`uuid.UUID` now uses ``__slots__``, therefore instances can no longer
334343
be weak-referenced and attributes can no longer be added.
335344

345+
* :mod:`xml.dom.minidom` and mod:`xml.sax` modules no longer process
346+
external entities by default.
347+
(Contributed by Christian Heimes in :issue:`17239`.)
336348

337349
CPython bytecode changes
338350
------------------------

Lib/test/test_pulldom.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
import xml.sax
44

55
from xml.sax.xmlreader import AttributesImpl
6+
from xml.sax.handler import feature_external_ges
67
from xml.dom import pulldom
78

89
from test.support import findfile
@@ -166,6 +167,12 @@ def test_getitem_deprecation(self):
166167
# This should have returned 'END_ELEMENT'.
167168
self.assertEqual(parser[-1][0], pulldom.START_DOCUMENT)
168169

170+
def test_external_ges_default(self):
171+
parser = pulldom.parseString(SMALL_SAMPLE)
172+
saxparser = parser.parser
173+
ges = saxparser.getFeature(feature_external_ges)
174+
self.assertEqual(ges, False)
175+
169176

170177
class ThoroughTestCase(unittest.TestCase):
171178
"""Test the hard-to-reach parts of pulldom."""

Lib/test/test_sax.py

Lines changed: 58 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,13 +13,14 @@
1313
from xml.sax.saxutils import XMLGenerator, escape, unescape, quoteattr, \
1414
XMLFilterBase, prepare_input_source
1515
from xml.sax.expatreader import create_parser
16-
from xml.sax.handler import feature_namespaces
16+
from xml.sax.handler import feature_namespaces, feature_external_ges
1717
from xml.sax.xmlreader import InputSource, AttributesImpl, AttributesNSImpl
1818
from io import BytesIO, StringIO
1919
import codecs
2020
import gc
2121
import os.path
2222
import shutil
23+
from urllib.error import URLError
2324
from test import support
2425
from test.support import findfile, run_unittest, TESTFN
2526

@@ -911,6 +912,18 @@ def notationDecl(self, name, publicId, systemId):
911912
def unparsedEntityDecl(self, name, publicId, systemId, ndata):
912913
self._entities.append((name, publicId, systemId, ndata))
913914

915+
916+
class TestEntityRecorder:
917+
def __init__(self):
918+
self.entities = []
919+
920+
def resolveEntity(self, publicId, systemId):
921+
self.entities.append((publicId, systemId))
922+
source = InputSource()
923+
source.setPublicId(publicId)
924+
source.setSystemId(systemId)
925+
return source
926+
914927
def test_expat_dtdhandler(self):
915928
parser = create_parser()
916929
handler = self.TestDTDHandler()
@@ -927,6 +940,32 @@ def test_expat_dt EED3 dhandler(self):
927940
[("GIF", "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN", None)])
928941
self.assertEqual(handler._entities, [("img", None, "expat.gif", "GIF")])
929942

943+
def test_expat_external_dtd_enabled(self):
944+
parser = create_parser()
945+
parser.setFeature(feature_external_ges, True)
946+
resolver = self.TestEntityRecorder()
947+
parser.setEntityResolver(resolver)
948+
949+
with self.assertRaises(URLError):
950+
parser.feed(
951+
'<!DOCTYPE external SYSTEM "unsupported://non-existing">\n'
952+
)
953+
self.assertEqual(
954+
resolver.entities, [(None, 'unsupported://non-existing')]
955+
)
956+
957+
def test_expat_external_dtd_default(self):
958+
parser = create_parser()
959+
resolver = self.TestEntityRecorder()
960+
parser.setEntityResolver(resolver)
961+
962+
parser.feed(
963+
'<!DOCTYPE external SYSTEM "unsupported://non-existing">\n'
964+
)
965+
parser.feed('<doc />')
966+
parser.close()
967+
self.assertEqual(resolver.entities, [])
968+
930969
# ===== EntityResolver support
931970

932971
class TestEntityResolver:
@@ -936,8 +975,9 @@ def resolveEntity(self, publicId, systemId):
936975
inpsrc.setByteStream(BytesIO(b"<entity/>"))
937976
return inpsrc
938977

939-
def test_expat_entityresolver(self):
978+
def test_expat_entityresolver_enabled(self):
940979
parser = create_parser()
980+
parser.setFeature(feature_external_ges, True)
941981
parser.setEntityResolver(self.TestEntityResolver())
942982
result = BytesIO()
943983
parser.setContentHandler(XMLGenerator(result))
@@ -951,6 +991,22 @@ def test_expat_entityresolver(self):
951991
self.assertEqual(result.getvalue(), start +
952992
b"<doc><entity></entity></doc>")
953993

994+
def test_expat_entityresolver_default(self):
995+
parser = create_parser()
996+
self.assertEqual(parser.getFeature(feature_external_ges), False)
997+
parser.setEntityResolver(self.TestEntityResolver())
998+
result = BytesIO()
999+
parser.setContentHandler(XMLGenerator(result))
1000+
1001+
parser.feed('<!DOCTYPE doc [\n')
1002+
parser.feed(' <!ENTITY test SYSTEM "whatever">\n')
1003+
parser.feed(']>\n')
1004+
parser.feed('<doc>&test;</doc>')
1005+
parser.close()
1006+
1007+
self.assertEqual(result.getvalue(), start +
1008+
b"<doc></doc>")
1009+
9541010
# ===== Attributes support
9551011

9561012
class AttrGatherer(ContentHandler):

Lib/test/test_xml_etree.py

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,12 @@
9191
<document>&entity;</document>
9292
"""
9393

94+
EXTERNAL_ENTITY_XML 10000 = """\
95+
<!DOCTYPE points [
96+
<!ENTITY entity SYSTEM "file:///non-existing-file.xml">
97+
]>
98+
<document>&entity;</document>
99+
"""
94100

95101
def checkwarnings(*filters, quiet=False):
96102
def decorator(test):
@@ -861,6 +867,13 @@ def test_entity(self):
861867
root = parser.close()
862868
self.serialize_check(root, '<document>text</document>')
863869

870+
# 4) external (SYSTEM) entity
871+
872+
with self.assertRaises(ET.ParseError) as cm:
873+
ET.XML(EXTERNAL_ENTITY_XML)
874+
self.assertEqual(str(cm.exception),
875+
'undefined entity &entity;: line 4, column 10')
876+
864877
def test_namespace(self):
865878
# Test namespace issues.
866879

Lib/xml/sax/expatreader.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -95,7 +95,7 @@ def __init__(self, namespaceHandling=0, bufsize=2**16-20):
9595
self._lex_handler_prop = None
9696
self._parsing = 0
9797
self._entity_stack = []
98-
self._external_ges = 1
98+
self._external_ges = 0
9999
self._interning = None
100100

101101
# XMLReader methods
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
The xml.sax and xml.dom.minidom parsers no longer processes external
2+
entities by default. External DTD and ENTITY declarations no longer
3+
load files or create network connections.

0 commit comments

Comments
 (0)
0