8000 [3.6] bpo-17239: Disable external entities in SAX parser (GH-9217) (G… · python/cpython@582d188 · GitHub
[go: up one dir, main page]

Skip to content

Commit 582d188

Browse files
tiranmiss-islington
authored andcommitted
[3.6] bpo-17239: Disable external entities in SAX parser (GH-9217) (GH-9512)
The SAX parser no longer processes general external entities by default to increase security. Before, the parser created network connections to fetch remote files or loaded local files from the file system for DTD and entities. Signed-off-by: Christian Heimes <christian@python.org> https://bugs.python.org/issue17239. (cherry picked from commit 17b1d5d) Co-authored-by: Christian Heimes <christian@python.org> https://bugs.python.org/issue17239
1 parent 6b48f98 commit 582d188

File tree

9 files changed

+125
-6
lines changed

9 files changed

+125
-6
lines changed

Doc/library/xml.dom.pulldom.rst

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,20 @@ events until either processing is finished or an error condition occurs.
2525
maliciously constructed data. If you need to parse untrusted or
2626
unauthenticated data see :ref:`xml-vulnerabilities`.
2727

28+
.. versionchanged:: 3.6.7
29+
30+
The SAX parser no longer processes general external entities by default to
31+
increase security by default. To enable processing of external entities,
32+
pass a custom parser instance in::
33+
34+
from xml.dom.pulldom import parse
35+
from xml.sax import make_parser
36+
from xml.sax.handler import feature_external_ges
37+
38+
parser = make_parser()
39+
parser.setFeature(feature_external_ges, True)
40+
parse(filename, parser=parser)
41+
2842

2943
Example::
3044

Doc/library/xml.rst

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -65,8 +65,8 @@ kind sax etree minidom p
6565
========================= ============== =============== ============== ============== ==============
6666
billion laughs **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable**
6767
quadratic blowup **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable**
68-
external entity expansion **Vulnerable** Safe (1) Safe (2) **Vulnerable** Safe (3)
69-
`DTD`_ retrieval **Vulnerable** Safe Safe **Vulnerable** Safe
68+
external entity expansion Safe (4) Safe (1) Safe (2) Safe (4) Safe (3)
69+
`DTD`_ retrieval Safe (4) Safe Safe Safe (4) Safe
7070
decompression bomb Safe Safe Safe Safe **Vulnerable**
7171
========================= ============== =============== ============== ============== ==============
7272

@@ -75,6 +75,8 @@ decompression bomb Safe Safe Safe S
7575
2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
7676
the unexpanded entity verbatim.
7777
3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
78+
4. Since Python 3.8.0, external general entities are no longer processed by
79+
default since Python.
7880

7981

8082
billion laughs / exponential entity expansion

Doc/library/xml.sax.rst

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,14 @@ the SAX API.
2424
constructed data. If you need to parse untrusted or unauthenticated data see
2525
:ref:`xml-vulnerabilities`.
2626

27+
.. versionchanged:: 3.6.7
28+
29+
The SAX parser no longer processes general external entities by default
30+
to increase security. Before, the parser created network connections
31+
to fetch remote files or loaded local files from the file
32+
system for DTD and entities. The feature can be enabled again with method
33+
:meth:`~xml.sax.xmlreader.XMLReader.setFeature` on the parser object
34+
and argument :data:`~xml.sax.handler.feature_e 9E88 xternal_ges`.
2735

2836
The convenience functions are:
2937

Doc/whatsnew/3.6.rst

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2055,6 +2055,15 @@ connected to and thus what Python interpreter will be used by the virtual
20552055
environment. (Contributed by Brett Cannon in :issue:`25154`.)
20562056

20572057

2058+
xml
2059+
---
2060+
2061+
* As mitigation against DTD and external entity retrieval, the
2062+
:mod:`xml.dom.minidom` and mod:`xml.sax` modules no longer process
2063+
external entities by default.
2064+
(Contributed by Christian Heimes in :issue:`17239`.)
2065+
2066+
20582067
Deprecated functions and types of the C API
20592068
-------------------------------------------
20602069

@@ -2163,7 +2172,7 @@ Changes in the Python API
21632172

21642173
* The functions in the :mod:`compileall` module now return booleans instead
21652174
of ``1`` or ``0`` to represent success or failure, respectively. Thanks to
2166-
booleans being a subclass of integers, this should only be an issue if you
2175+
booleans being a subclass of integers, this should only be an issue if you7
21672176
were doing identity checks for ``1`` or ``0``. See :issue:`25768`.
21682177

21692178
* Reading the :attr:`~urllib.parse.SplitResult.port` attribute of
@@ -2408,3 +2417,10 @@ Notable changes in Python 3.6.5
24082417
The :func:`locale.localeconv` function now sets temporarily the ``LC_CTYPE``
24092418
locale to the ``LC_NUMERIC`` locale in some cases.
24102419
(Contributed by Victor Stinner in :issue:`31900`.)
2420+
2421+
2422+
Notable changes in Python 3.6.7
2423+
===============================
2424+
2425+
:mod:`xml.dom.minidom` and mod:`xml.sax` modules no longer process
2426+
external entities by default. See also :issue:`17239`.

Lib/test/test_pulldom.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
import xml.sax
44

55
from xml.sax.xmlreader import AttributesImpl
6+
from xml.sax.handler import feature_external_ges
67
from xml.dom import pulldom
78

89
from test.support import findfile
@@ -159,6 +160,12 @@ def test_end_document(self):
159160
self.fail(
160161
"Ran out of events, but should have received END_DOCUMENT")
161162

163+
def test_external_ges_default(self):
164+
parser = pulldom.parseString(SMALL_SAMPLE)
165+
saxparser = parser.parser
166+
ges = saxparser.getFeature(feature_external_ges)
167+
self.assertEqual(ges, False)
168+
162169

163170
class ThoroughTestCase(unittest.TestCase):
164171
"""Test the hard-to-reach parts of pulldom."""

Lib/test/test_sax.py

Lines changed: 58 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,13 +13,14 @@
1313
from xml.sax.saxutils import XMLGenerator, escape, unescape, quoteattr, \
1414
XMLFilterBase, prepare_input_source
1515
from xml.sax.expatreader import create_parser
16-
from xml.sax.handler import feature_namespaces
16+
from xml.sax.handler import feature_namespaces, feature_external_ges
1717
from xml.sax.xmlreader import InputSource, AttributesImpl, AttributesNSImpl
1818
from io import BytesIO, StringIO
1919
import codecs
2020
import gc
2121
import os.path
2222
import shutil
23+
from urllib.error import URLError
2324
from test import support
2425
from test.support import findfile, run_unittest, TESTFN
2526

@@ -911,6 +912,18 @@ def notationDecl(self, name, publicId, systemId):
911912
def unparsedEntityDecl(self, name, publicId, systemId, ndata):
912913
self._entities.append((name, publicId, systemId, ndata))
913914

915+
916+
class TestEntityRecorder:
917+
def __init__(self):
918+
self.entities = []
919+
920+
def resolveEntity(self, publicId, systemId):
921+
self.entities.append((publicId, systemId))
922+
source = InputSource()
923+
source.setPublicId(publicId)
924+
source.setSystemId(systemId)
925+
return source
926+
914927
def test_expat_dtdhandler(self):
915928
parser = create_parser()
916929
handler = self.TestDTDHandler()
@@ -927,6 +940,32 @@ def test_expat_dtdhandler(self):
927940
[("GIF", "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN", None)])
928941
self.assertEqual(handler._entities, [("img", None, "expat.gif", "GIF")])
929942

943+
def test_expat_external_dtd_enabled(self):
944+
parser = create_parser()
945+
parser.setFeature(feature_external_ges, True)
946+
resolver = self.TestEntityRecorder()
947+
parser.setEntityResolver(resolver)
948+
949+
with self.assertRaises(URLError):
950+
parser.feed(
951+
'<!DOCTYPE external SYSTEM "unsupported://non-existing">\n'
952+
)
953+
self.assertEqual(
954+
resolver.entities, [(None, 'unsupported://non-existing')]
955+
)
956+
957+
def test_expat_external_dtd_default(self):
958+
parser = create_parser()
959+
resolver = self.TestEntityRecorder()
960+
parser.setEntityResolver(resolver)
961+
962+
parser.feed(
963+
'<!DOCTYPE external SYSTEM "unsupported://non-existing">\n'
964+
)
965+
parser.feed('<doc />')
966+
parser.close()
967+
self.assertEqual(resolver.entities, [])
968+
930969
# ===== EntityResolver support
931970

932971
class TestEntityResolver:
@@ -936,8 +975,9 @@ def resolveEntity(self, publicId, systemId):
936975
inpsrc.setByteStream(BytesIO(b"<entity/>"))
937976
return inpsrc
938977

939-
def test_expat_entityresolver(self):
978+
def test_expat_entityresolver_enabled(self):
940979
parser = create_parser()
980+
parser.setFeature(feature_external_ges, True)
941981
parser.setEntityResolver(self.TestEntityResolver())
942982
result = BytesIO()
943983
parser.setContentHandler(XMLGenerator(result))
@@ -951,6 +991,22 @@ def test_expat_entityresolver(self):
951991
self.assertEqual(result.getvalue(), start +
952992
b"<doc><entity></entity></doc>")
953993

994+
def test_expat_entityresolver_default(self):
995+
parser = create_parser()
996+
self.assertEqual(parser.getFeature(feature_external_ges), False)
997+
parser.setEntityResolver(self.TestEntityResolver())
998+
result = BytesIO()
999+
parser.setContentHandler(XMLGenerator(result))
1000+
1001+
parser.feed('<!DOCTYPE doc [\n')
1002+
parser.feed(' <!ENTITY test SYSTEM "whatever">\n')
1003+
parser.feed(']>\n')
1004+
parser.feed('<doc>&test;</doc>')
1005+
parser.close()
1006+
1007+
self.assertEqual(result.getvalue(), start +
1008+
b"<doc></doc>")
1009+
9541010
# ===== Attributes support
9551011

9561012
class AttrGatherer(ContentHandler):

Lib/test/test_xml_etree.py

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -90,6 +90,12 @@
9090
<document>&entity;</document>
9191
"""
9292

93+
EXTERNAL_ENTITY_XML = """\
94+
<!DOCTYPE points [
95+
<!ENTITY entity SYSTEM "file:///non-existing-file.xml">
96+
]>
97+
<document>&entity;</document>
98+
"""
9399

94100
class ModuleTest(unittest.TestCase):
95101
def test_sanity(self):
@@ -846,6 +852,13 @@ def test_entity(self):
846852
root = parser.close()
847853
self.serialize_check(root, '<document>text</document>')
848854

855+
# 4) external (SYSTEM) entity
856+
857+
with self.assertRaises(ET.ParseError) as cm:
858+
ET.XML(EXTERNAL_ENTITY_XML)
859+
self.assertEqual(str(cm.exception),
860+
'undefined entity &entity;: line 4, column 10')
861+
849862
def test_namespace(self):
850863
# Test namespace issues.
851864

Lib/xml/sax/expatreader.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -95,7 +95,7 @@ def __init__(self, namespaceHandling=0, bufsize=2**16-20):
9595
self._lex_handler_prop = None
9696
self._parsing = 0
9797
self._entity_stack = []
98-
self._external_ges = 1
98+
self._external_ges = 0
9999
self._interning = None
100100

101101
# XMLReader methods
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
The xml.sax and xml.dom.minidom parsers no longer processes external
2+
entities by default. External DTD and ENTITY declarations no longer
3+
load files or create network connections.

0 commit comments

Comments
 (0)
0