8000 [3.6] bpo-42967: only use '&' as a query string separator (GH-24297) … · python/cpython@5c17dfc · GitHub
[go: up one dir, main page]

Skip to content 8000

Commit 5c17dfc

Browse files
orsenthilmerwokFidget-SpinnerAdamGold
authored
[3.6] bpo-42967: only use '&' as a query string separator (GH-24297) (GH-24532)
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <merwok@netwok.org> Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com>
1 parent 34df10a commit 5c17dfc

File tree

8 files changed

+134
-43
lines changed

8 files changed

+134
-43
lines changed

Doc/library/cgi.rst

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -277,13 +277,12 @@ These are useful if you want more control, or if you want to employ some of the
277277
algorithms implemented in this module in other circumstances.
278278

279279

280-
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)
280+
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")
281281

282282
Parse a query in the environment or from a file (the file defaults to
283-
``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
283+
``sys.stdin``). The *keep_blank_values*, *strict_parsing* and *separator* parameters are
284284
passed to :func:`urllib.parse.parse_qs` unchanged.
285285

286-
287286
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False)
288287

289288
This function is deprecated in this module. Use :func:`urllib.parse.parse_qs`
@@ -308,6 +307,9 @@ algorithms implemented in this module in other circumstances.
308307
Note that this does not parse nested multipart parts --- use
309308
:class:`FieldStorage` for that.
310309

310+
.. versionchanged:: 3.6.13
311+
Added the *separator* parameter.
312+
311313

312314
.. function:: parse_header(string)
313315

Doc/library/urllib.parse.rst

Lines changed: 20 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -143,7 +143,7 @@ or on combining URL components into a URL string.
143143
now raise :exc:`ValueError`.
144144

145145

146-
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
146+
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
147147

148148
Parse a query string given as a string argument (data of type
149149
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a
@@ -168,6 +168,9 @@ or on combining URL components into a URL string.
168168
read. If set, then throws a :exc:`ValueError` if there are more than
169169
*max_num_fields* fields read.
170170

171+
The optional argument *separator* is the symbol to use for separating the
172+
query arguments. It defaults to ``&``.
173+
171174
Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
172175
parameter set to ``True``) to convert such dictionaries into query
173176
strings.
@@ -179,8 +182,14 @@ or on combining URL components into a URL string.
179182
.. versionchanged:: 3.6.8
180183
Added *max_num_fields* parameter.
181184

185+
.. versionchanged:: 3.6.13
186+
Added *separator* parameter with the default value of ``&``. Python
187+
versions earlier than Python 3.6.13 allowed using both ``;`` and ``&`` as
188+
query parameter separator. This has been changed to allow only a single
189+
separator key, with ``&`` as the default separator.
190+
182191

183-
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
192+
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
184193

185194
Parse a query string given as a string argument (data of type
186195
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
@@ -204,6 +213,9 @@ or on combining URL components into a URL string.
204213
read. If set, then throws a :exc:`ValueError` if there are more than
205214
*max_num_fields* fields read.
206215

216+
The optional argument *separator* is the symbol to use for separating the
217+
query arguments. It defaults to ``&``.
218+
207219
Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
208220
query strings.
209221

@@ -213,6 +225,12 @@ or on combining URL components into a URL string.
213225
.. versionchanged:: 3.6.8
214226
Added *max_num_fields* parameter.
215227

228+
.. versionchanged:: 3.6.13
229+
Added *separator* parameter with the default value of ``&``. Python
230+
versions earlier than Python 3.6.13 allowed using both ``;`` and ``&`` as
231+
query parameter separator. This has been changed to allow only a single
232+
separator key, with ``&`` as the default separator.
233+
216234

217235
.. function:: urlunparse(parts)
218236

Doc/whatsnew/3.6.rst

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2459,3 +2459,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
24592459
details, see the documentation for ``loop.create_datagram_endpoint()``.
24602460
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
24612461
:issue:`37228`.)
2462+
2463+
Notable changes in Python 3.6.13
2464+
================================
2465+
2466+
Earlier Python versions allowed using both ``;`` and ``&`` as
2467+
query parameter separators in :func:`urllib.parse.parse_qs` and
2468+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2469+
newer W3C recommendations, this has been changed to allow only a single
2470+
separator key, with ``&`` as the default. This change also affects
2471+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2472+
functions internally. For more details, please see their respective
2473+
documentation.
2474+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

Lib/cgi.py

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -117,7 +117,8 @@ def closelog():
117117
# 0 ==> unlimited input
118118
maxlen = 0
119119

120-
def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
120+
def parse(fp=None, environ=os.environ, keep_blank_values=0,
121+
strict_parsing=0, separator='&'):
121122
"""Parse a query in the environment or from a file (default stdin)
122123
123124
Arguments, all optional:
@@ -136,6 +137,9 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
136137
strict_parsing: flag indicating what to do with parsing errors.
137138
If false (the default), errors are silently ignored.
138139
If true, errors raise a ValueError exception.
140+
141+
separator: str. The symbol to use for separating the query arguments.
142+
Defaults to &.
139143
"""
140144
if fp is None:
141145
fp = sys.stdin
@@ -180,7 +184,7 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
180184
qs = ""
181185
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
182186
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing,
183-
encoding=encoding)
187+
encoding=encoding, separator=separator)
184188

185189

186190
# parse query string function called from urlparse,
@@ -405,7 +409,7 @@ class FieldStorage:
405409
def __init__(self, fp=None, headers=None, outerboundary=b'',
406410
environ=os.environ, keep_blank_values=0, strict_parsing=0,
407411
limit=None, encoding='utf-8', errors='replace',
408-
max_num_fields=None):
412+
max_num_fields=None, separator='&'):
409413
"""Constructor. Read multipart/* until last part.
410414
411415
Arguments, all optional:
@@ -453,6 +457,7 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
453457
self.keep_blank_values = keep_blank_values
454458
self.strict_parsing = strict_parsing
455459
self.max_num_fields = max_num_fields
460+
self.separator = separator
456461
if 'REQUEST_METHOD' in environ:
457462
method = environ['REQUEST_METHOD'].upper()
458463
self.qs_on_post = None
@@ -678,7 +683,7 @@ def read_urlencoded(self):
678683
query = urllib.parse.parse_qsl(
679684
qs, self.keep_blank_values, self.strict_parsing,
680685
encoding=self.encoding, errors=self.errors,
681-
max_num_fields=self.max_num_fields)
686+
max_num_fields=self.max_num_fields, separator=self.separator)
682687
self.list = [MiniFieldStorage(key, value) for key, value in query]
683688
self.skip_lines()
684689

@@ -694,7 +699,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
694699
query = urllib.parse.parse_qsl(
695700
self.qs_on_post, self.keep_blank_values, self.strict_parsing,
696701
encoding=self.encoding, errors=self.errors,
697-
max_num_fields=self.max_num_fields)
702+
max_num_fields=self.max_num_fields, separator=self.separator)
698703
self.list.extend(MiniFieldStorage(key, value) for key, value in query)
699704

700705
klass = self.FieldStorageClass or self.__class__
@@ -736,7 +741,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
736741

737742
part = klass(self.fp, headers, ib, environ, keep_blank_values,
738743
strict_parsing,self.limit-self.bytes_read,
739-
self.encoding, self.errors, max_num_fields)
744+
self.encoding, self.errors, max_num_fields, self.separator)
740745

741746
if max_num_fields is not None:
742747
max_num_fields -= 1

Lib/test/test_cgi.py

Lines changed: 24 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -55,12 +55,9 @@ def do_test(buf, method):
5555
("", ValueError("bad query field: ''")),
5656
("&", ValueError("bad query field: ''")),
5757
("&&", ValueError("bad query field: ''")),
58-
(";", ValueError("bad query field: ''")),
59-
(";&;", ValueError("bad query field: ''")),
6058
# Should the next few really be valid?
6159
("=", {}),
6260
("=&=", {}),
63-
("=;=", {}),
6461
# This rest seem to make sense
6562
("=a", {'': ['a']}),
6663
("&=a", ValueError("bad query field: ''")),
@@ -75,8 +72,6 @@ def do_test(buf, method):
7572
("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
7673
("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
7774
("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
78-
("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
79-
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
8075
("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
8176
{'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
8277
'cuyer': ['r'],
@@ -180,6 +175,30 @@ def test_strict(self):
180175
else:
181176
self.assertEqual(fs.getvalue(key), expect_val[0])
182177

178+
def test_separator(self):
179+
parse_semicolon = [
180+
("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
181+
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
182+
(";", ValueError("bad query field: ''")),
183+
(";;", ValueError("bad query field: ''")),
184+
("=;a", ValueError("bad query field: 'a'")),
185+
(";b=a", ValueError("bad query field: ''")),
186+
("b;=a", ValueError("bad query field: 'b'")),
187+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
188+
("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
189+
]
190+
for orig, expect in parse_semicolon:
191+
env = {'QUERY_STRING': orig}
192+
fs = cgi.FieldStorage(separator=';', environ=env)
193+
if isinstance(expect, dict):
194+
for key in expect.keys():
195+
expect_val = expect[key]
196+
self.assertIn(key, fs)
197+
if len(expect_val) > 1:
198+
self.assertEqual(fs.getvalue(key), expect_val)
199+
else:
200+
self.assertEqual(fs.getvalue(key), expect_val[0])
201+
183202
def test_log(self):
184203
cgi.log("Testing")
185204

Lib/test/test_urlparse.py

Lines changed: 46 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -32,16 +32,10 @@
3232
(b"&a=b", [(b'a', b'b')]),
3333
(b"a=a+b&b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
3434
(b"a=1&a=2", [(b'a', b'1'), (b'a', b'2')]),
35-
(";", []),
36-
(";;", []),
37-
(";a=b", [('a', 'b')]),
38-
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
39-
("a=1;a=2", [('a', '1'), ('a', '2')]),
40-
(b";", []),
41-
(b";;", []),
42-
(b";a=b", [(b'a', b'b')]),
43-
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
44-
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
35+
(";a=b", [(';a', 'b')]),
36+
("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
37+
(b";a=b", [(b';a', b'b')]),
38+
(b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
4539
]
4640

4741
# Each parse_qs testcase is a two-tuple that contains
@@ -68,16 +62,10 @@
6862
(b"&a=b", {b'a': [b'b']}),
6963
(b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
7064
(b"a=1&a=2", {b'a': [b'1', b'2']}),
71-
(";", {}),
72-
(";;", {}),
73-
(";a=b", {'a': ['b']}),
74-
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
75-
("a=1;a=2", {'a': ['1', '2']}),
76-
(b";", {}),
77-
(b";;", {}),
78-
(b";a=b", {b'a': [b'b']}),
79-
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
80-
(b"a=1;a=2", {b'a': [b'1', b'2']}),
65+
(";a=b", {';a': ['b']}),
66+
("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
67+
(b";a=b", {b';a': [b'b']}),
68+
(b"a=a+b;b=b+c", {b'a':[ b'a b;b=b c']}),
8169
]
8270

8371
class UrlParseTestCase(unittest.TestCase):
@@ -884,10 +872,46 @@ def test_parse_qsl_encoding(self):
884872
def test_parse_qsl_max_num_fields(self):
885873
with self.assertRaises(ValueError):
886874
urllib.parse.parse_qs('&'.join(['a=a']*11), max_num_fields=10)
887-
with self.assertRaises(ValueError):
888-
urllib.parse.parse_qs(';'.join(['a=a']*11), max_num_fields=10)
889875
urllib.parse.parse_qs('&'.join(['a=a']*10), max_num_fields=10)
890876

877+
def test_parse_qs_separator(self):
878+
parse_qs_semicolon_cases = [
879+
(";", {}),
880+
(";;", {}),
881+
(";a=b", {'a': ['b']}),
882+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
883+
("a=1;a=2", {'a': ['1', '2']}),
884+
(b";", {}),
885+
(b";;", {}),
886+
(b";a=b", {b'a': [b'b']}),
887+
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
888+
(b"a=1;a=2", {b'a': [b'1', b'2']}),
889+
]
890+
for orig, expect in parse_qs_semicolon_cases:
891+
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
892+
result = urllib.parse.parse_qs(orig, separator=';')
893+
self.assertEqual(result, expect, "Error parsing %r" % orig)
894+
895+
896+
def test_parse_qsl_separator(self):
897+
parse_qsl_semicolon_cases = [
898+
(";", []),
899+
(";;", []),
900+
(";a=b", [('a', 'b')]),
901+
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
902+
("a=1;a=2", [('a', '1'), ('a', '2')]),
903+
(b";", []),
904+
(b";;", []),
905+
(b";a=b", [(b'a', b'b')]),
906+
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
907+
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
908+
]
909+
for orig, expect in parse_qsl_semicolon_cases:
910+
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
911+
result = urllib.parse.parse_qsl(orig, separator=';')
912+
self.assertEqual(result, expect, "Error parsing %r" % orig)
913+
914+
891915
def test_urlencode_sequences(self):
892916
# Other tests incidentally urlencode things; test non-covered cases:
893917
# Sequence and object values.

Lib/urllib/parse.py

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -644,7 +644,7 @@ def unquote(string, encoding='utf-8', errors='replace'):
644644

645645

646646
def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
647-
encoding='utf-8', errors='replace', max_num_fields=None):
647+
encoding='utf-8', errors='replace', max_num_fields=None, separator='&'):
648648
"""Parse a query given as a string argument.
649649
650650
Arguments:
@@ -668,12 +668,15 @@ def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
668668
max_num_fields: int. If set, then throws a ValueError if there
669669
are more than n fields read by parse_qsl().
670670
671+
separator: str. The symbol to use for separating the query arguments.
672+
Defaults to &.
673+
671674
Returns a dictionary.
672675
"""
673676
parsed_result = {}
674677
pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
675678
encoding=encoding, errors=errors,
676-
max_num_fields=max_num_fields)
679+
max_num_fields=max_num_fields, separator=separator)
677680
for name, value in pairs:
678681
if name in parsed_result:
679682
parsed_result[name].append(value)
@@ -683,7 +686,7 @@ def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
683686

684687

685688
def parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
686-
encoding='utf-8', errors='replace', max_num_fields=None):
689+
encoding='utf-8', errors='replace', max_num_fields=None, separator='&'):
687690
"""Parse a query given as a string argument.
688691
689692
Arguments:
@@ -706,19 +709,25 @@ def parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
706709
max_num_fields: int. If set, then throws a ValueError
707710
if there are more than n fields read by parse_qsl().
708711
712+
separator: str. The symbol to use for separating the query arguments.
713+
Defaults to &.
714+
709715
Returns a list, as G-d intended.
710716
"""
711717
qs, _coerce_result = _coerce_args(qs)
712718

719+
if not separator or (not isinstance(separator, (str, bytes))):
720+
raise ValueError("Separator must be of type string or bytes.")
721+
713722
# If max_num_fields is defined then check that the number of fields
714723
# is less than max_num_fields. This prevents a memory exhaustion DOS
715724
# attack via post bodies with many fields.
716725
if max_num_fields is not None:
717-
num_fields = 1 + qs.count('&') + qs.count(';')
726+
num_fields = 1 + qs.count(separator)
718727
if max_num_fields < num_fields:
719728
raise ValueError('Max number of fields exceeded')
720729

721-
pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
730+
pairs = [s1 for s1 in qs.split(separator)]
722731
r = []
723732
for name_value in pairs:
724733
if not name_value and not strict_parsing:
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Fix web cache poisoning vulnerability by defaulting the query args separator to ``&``, and allowing the user to choose a custom separator.

0 commit comments

Comments
 (0)
0