gh-82150: Make urllib.parse.urlsplit and urllib.parse.urlunsplit preserve the '?' and '#' delimiters of empty query and fragment components by geryogam · Pull Request #15642 · python/cpython · GitHub

gh-82150: Make urllib.parse.urlsplit and urllib.parse.urlunsplit preserve the '?' and '#' delimiters of empty query and fragment components #15642

Closed · wants to merge 19 commits
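
For orientation, a minimal doctest-style sketch of the semantics proposed here, assuming the PR's convention that an absent component is None while an empty-but-present component is '' (this reflects the PR branch; the PR was closed unmerged, and released CPython returns '' in both cases):

>>> from urllib.parse import urlsplit, urlunsplit
>>> urlsplit('http://a/?')   # empty query: the '?' delimiter is present
SplitResult(scheme='http', netloc='a', path='/', query='', fragment=None)
>>> urlsplit('http://a/')    # no query, no fragment
SplitResult(scheme='http', netloc='a', path='/', query=None, fragment=None)
>>> urlunsplit(('http', 'a', '/', '', None))    # '' keeps the '?' delimiter
'http://a/?'
>>> urlunsplit(('http', 'a', '/', None, None))  # None omits it
'http://a/'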
43 changes: 17 additions & 26 deletions Doc/library/urllib.parse.rst
@@ -70,7 +70,7 @@ or on combining URL components into a URL string.
>>> o.port
80
>>> o._replace(fragment="").geturl()
'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params#'
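
Under the changed semantics, clearing a component with '' keeps its delimiter while None removes it altogether; a sketch assuming the PR branch:

>>> from urllib.parse import urlsplit
>>> u = urlsplit('http://a/?q=1#frag')
>>> u._replace(fragment='').geturl()    # empty fragment keeps the '#'
'http://a/?q=1#'
>>> u._replace(fragment=None).geturl()  # None drops the fragment entirely
'http://a/?q=1'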

Following the syntax specifications in :rfc:`1808`, urlparse recognizes
a netloc only if it is properly introduced by '//'. Otherwise the
@@ -83,13 +83,13 @@ or on combining URL components into a URL string.
>>> from urllib.parse import urlparse
>>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
params='', query=None, fragment=None)
>>> urlparse('www.cwi.nl/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
params='', query='', fragment='')
params='', query=None, fragment=None)
>>> urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html', params='',
query='', fragment='')
query=None, fragment=None)

The *scheme* argument gives the default addressing scheme, to be
used only if the URL does not specify one. It should be the same type
@@ -154,10 +154,10 @@ or on combining URL components into a URL string.
>>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
>>> u
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
params='', query=None, fragment=None)
>>> u._replace(scheme='http')
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
params='', query=None, fragment=None)


.. versionchanged:: 3.2
@@ -269,10 +269,7 @@ or on combining URL components into a URL string.
.. function:: urlunparse(parts)

Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
argument can be any six-item iterable. This may result in a slightly
different, but equivalent URL, if the URL that was parsed originally had
unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
states that these are equivalent).
argument can be any six-item iterable.
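
With this change urlunparse no longer drops the delimiters of empty components; a sketch assuming the PR branch (the six items are scheme, netloc, path, params, query, fragment, with None marking an absent component):

>>> from urllib.parse import urlunparse
>>> urlunparse(('http', 'a', '/', '', '', None))   # empty query keeps '?'
'http://a/?'
>>> urlunparse(('http', 'a', '/', '', None, None)) # absent query: no '?'
'http://a/'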


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)
@@ -344,9 +341,7 @@ or on combining URL components into a URL string.

Combine the elements of a tuple as returned by :func:`urlsplit` into a
complete URL as a string. The *parts* argument can be any five-item
iterable. This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example, a ?
with an empty query; the RFC states that these are equivalent).
iterable.
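
Likewise for urlunsplit, a sketch assuming the PR branch (five items: scheme, netloc, path, query, fragment):

>>> from urllib.parse import urlunsplit
>>> urlunsplit(('http', 'a', '/', None, ''))  # empty fragment keeps '#'
'http://a/#'
>>> urlunsplit(('http', 'a', '/', '', ''))    # empty query and fragment
'http://a/?#'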


.. function:: urljoin(base, url, allow_fragments=True)
@@ -463,32 +458,28 @@ individual URL quoting functions.
Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()
.. method:: urllib.parse.ParseResult.geturl()

Return the re-combined version of the original URL as a string. This may
differ from the original URL in that the scheme may be normalized to lower
case and empty components may be dropped. Specifically, empty parameters,
queries, and fragment identifiers will be removed.

For :func:`urldefrag` results, only empty fragment identifiers will be removed.
For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
made to the URL returned by this method.
differ from the original URL in that the scheme will be normalized to lower
case for :func:`urlparse`, :func:`urlsplit` and :func:`urldefrag` results,
and empty parameters will be removed for :func:`urlparse` results.

The result of this method remains unchanged if passed back through the original
parsing function:

>>> from urllib.parse import urlsplit
>>> url = 'HTTP://www.Python.org/doc/#'
>>> r1 = urlsplit(url)
>>> from urllib.parse import urlparse
>>> url = 'HTTP://www.Python.org/doc/;'
>>> r1 = urlparse(url)
>>> r1.geturl()
'http://www.Python.org/doc/'
>>> r2 = urlsplit(r1.geturl())
>>> r2 = urlparse(r1.geturl())
>>> r2.geturl()
'http://www.Python.org/doc/'
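
Complementing the urlparse example above, a sketch of the corresponding urlsplit round trip on the PR branch, where a trailing '#' now survives :meth:`geturl`:

>>> from urllib.parse import urlsplit
>>> r1 = urlsplit('HTTP://www.Python.org/doc/#')
>>> r1.geturl()   # the scheme is lowercased; the empty fragment's '#' is kept
'http://www.Python.org/doc/#'
>>> urlsplit(r1.geturl()).geturl()
'http://www.Python.org/doc/#'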

4 changes: 1 addition & 3 deletions Lib/test/test_urllib2.py
@@ -1100,9 +1100,7 @@ def test_full_url_setter(self):
parsed = urlparse(url)

self.assertEqual(r.get_full_url(), url)
# full_url setter uses splittag to split into components.
# splittag sets the fragment as None while urlparse sets it to ''
self.assertEqual(r.fragment or '', parsed.fragment)
self.assertEqual(r.fragment, parsed.fragment)
self.assertEqual(urlparse(r.get_full_url()).query, parsed.query)
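
The dropped ``or ''`` shim papered over a mismatch: the full_url setter derives the fragment via splittag, which reports a missing fragment as None, while urlparse used to report ''. With both sides returning None the values compare directly; a sketch assuming the PR branch:

>>> from urllib.request import Request
>>> from urllib.parse import urlparse
>>> r = Request('http://example.com/path')   # no fragment in the URL
>>> (r.fragment, urlparse(r.get_full_url()).fragment)
(None, None)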

def test_full_url_deleter(self):
133 changes: 70 additions & 63 deletions Lib/test/test_urlparse.py
@@ -143,34 +143,34 @@ def test_qs(self):
def test_roundtrips(self):
str_cases = [
('file:///tmp/junk.txt',
('file', '', '/tmp/junk.txt', '', '', ''),
('file', '', '/tmp/junk.txt', '', '')),
('file', '', '/tmp/junk.txt', '', None, None),
('file', '', '/tmp/junk.txt', None, None)),
('imap://mail.python.org/mbox1',
('imap', 'mail.python.org', '/mbox1', '', '', ''),
('imap', 'mail.python.org', '/mbox1', '', '')),
('imap', 'mail.python.org', '/mbox1', '', None, None),
('imap', 'mail.python.org', '/mbox1', None, None)),
('mms://wms.sys.hinet.net/cts/Drama/09006251100.asf',
('mms', 'wms.sys.hinet.net', '/cts/Drama/09006251100.asf',
'', '', ''),
'', None, None),
('mms', 'wms.sys.hinet.net', '/cts/Drama/09006251100.asf',
'', '')),
None, None)),
('nfs://server/path/to/file.txt',
('nfs', 'server', '/path/to/file.txt', '', '', ''),
('nfs', 'server', '/path/to/file.txt', '', '')),
('nfs', 'server', '/path/to/file.txt', '', None, None),
('nfs', 'server', '/path/to/file.txt', None, None)),
('svn+ssh://svn.zope.org/repos/main/ZConfig/trunk/',
('svn+ssh', 'svn.zope.org', '/repos/main/ZConfig/trunk/',
'', '', ''),
'', None, None),
('svn+ssh', 'svn.zope.org', '/repos/main/ZConfig/trunk/',
'', '')),
None, None)),
('git+ssh://git@github.com/user/project.git',
('git+ssh', 'git@github.com','/user/project.git',
'','',''),
'', None, None),
('git+ssh', 'git@github.com','/user/project.git',
'', '')),
None, None)),
]
def _encode(t):
return (t[0].encode('ascii'),
tuple(x.encode('ascii') for x in t[1]),
tuple(x.encode('ascii') for x in t[2]))
tuple(x.encode('ascii') if x is not None else None for x in t[1]),
tuple(x.encode('ascii') if x is not None else None for x in t[2]))
bytes_cases = [_encode(x) for x in str_cases]
for url, parsed, split in str_cases + bytes_cases:
self.checkRoundtrips(url, parsed, split)
@@ -181,25 +181,34 @@ def test_http_roundtrips(self):
# Three cheers for white box knowledge!
str_cases = [
('://www.python.org',
('www.python.org', '', '', '', ''),
('www.python.org', '', '', '')),
('www.python.org', '', '', None, None),
('www.python.org', '', None, None)),
('://www.python.org#abc',
('www.python.org', '', '', '', 'abc'),
('www.python.org', '', '', 'abc')),
('www.python.org', '', '', None, 'abc'),
('www.python.org', '', None, 'abc')),
('://www.python.org?q=abc',
('www.python.org', '', '', 'q=abc', ''),
('www.python.org', '', 'q=abc', '')),
('www.python.org', '', '', 'q=abc', None),
('www.python.org', '', 'q=abc', None)),
('://www.python.org/#abc',
('www.python.org', '/', '', '', 'abc'),
('www.python.org', '/', '', 'abc')),
('www.python.org', '/', '', None, 'abc'),
('www.python.org', '/', None, 'abc')),
('://a/b/c/d;p?q#f',
('a', '/b/c/d', 'p', 'q', 'f'),
('a', '/b/c/d;p', 'q', 'f')),
('://a/?',
('a', '/', '', '', None),
('a', '/', '', None)),
('://a/#',
('a', '/', '', None, ''),
('a', '/', None, '')),
('://a/?#',
('a', '/', '', '', ''),
('a', '/', '', '')),
]
def _encode(t):
return (t[0].encode('ascii'),
tuple(x.encode('ascii') for x in t[1]),
tuple(x.encode('ascii') for x in t[2]))
tuple(x.encode('ascii') if x is not None else None for x in t[1]),
tuple(x.encode('ascii') if x is not None else None for x in t[2]))
bytes_cases = [_encode(x) for x in str_cases]
str_schemes = ('http', 'https')
bytes_schemes = (b'http', b'https')
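
The new cases above pin down URLs consisting of bare delimiters; concretely, a round trip expected to hold on the PR branch (a sketch):

>>> from urllib.parse import urlsplit
>>> p = urlsplit('http://a/?#')
>>> (p.query, p.fragment)   # both components present but empty
('', '')
>>> p.geturl()              # both delimiters survive the round trip
'http://a/?#'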
@@ -278,7 +287,7 @@ def test_RFC1808(self):
def test_RFC2368(self):
# Issue 11467: path that starts with a number is not parsed correctly
self.assertEqual(urllib.parse.urlparse('mailto:1337@example.org'),
('mailto', '', '1337@example.org', '', '', ''))
('mailto', '', '1337@example.org', '', None, None))

def test_RFC2396(self):
# cases from RFC 2396
@@ -490,18 +499,18 @@ def _encode(t):
def test_urldefrag(self):
str_cases = [
('http://python.org#frag', 'http://python.org', 'frag'),
('http://python.org', 'http://python.org', ''),
('http://python.org', 'http://python.org', None),
('http://python.org/#frag', 'http://python.org/', 'frag'),
('http://python.org/', 'http://python.org/', ''),
('http://python.org/', 'http://python.org/', None),
('http://python.org/?q#frag', 'http://python.org/?q', 'frag'),
('http://python.org/?q', 'http://python.org/?q', ''),
('http://python.org/?q', 'http://python.org/?q', None),
('http://python.org/p#frag', 'http://python.org/p', 'frag'),
('http://python.org/p?q', 'http://python.org/p?q', ''),
('http://python.org/p?q', 'http://python.org/p?q', None),
(RFC1808_BASE, 'http://a/b/c/d;p?q', 'f'),
(RFC2396_BASE, 'http://a/b/c/d;p?q', ''),
(RFC2396_BASE, 'http://a/b/c/d;p?q', None),
]
def _encode(t):
return type(t)(x.encode('ascii') for x in t)
return type(t)(x.encode('ascii') if x is not None else None for x in t)
bytes_cases = [_encode(x) for x in str_cases]
for url, defrag, frag in str_cases + bytes_cases:
result = urllib.parse.urldefrag(url)
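
On the PR branch urldefrag likewise distinguishes a missing fragment (None) from an empty one (''); a sketch, with the trailing-'#' case inferred from the same convention rather than taken from these tests:

>>> from urllib.parse import urldefrag
>>> urldefrag('http://python.org')    # no '#': fragment is absent
DefragResult(url='http://python.org', fragment=None)
>>> urldefrag('http://python.org#')   # bare '#': fragment present but empty
DefragResult(url='http://python.org', fragment='')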
@@ -525,7 +534,7 @@ def test_urlsplit_attributes(self):
self.assertEqual(p.scheme, "http")
self.assertEqual(p.netloc, "WWW.PYTHON.ORG")
self.assertEqual(p.path, "/doc/")
self.assertEqual(p.query, "")
self.assertEqual(p.query, None)
self.assertEqual(p.fragment, "frag")
self.assertEqual(p.username, None)
self.assertEqual(p.password, None)
@@ -572,7 +581,7 @@ def test_urlsplit_attributes(self):
self.assertEqual(p.scheme, b"http")
self.assertEqual(p.netloc, b"WWW.PYTHON.ORG")
self.assertEqual(p.path, b"/doc/")
self.assertEqual(p.query, b"")
self.assertEqual(p.query, None)
self.assertEqual(p.fragment, b"frag")
self.assertEqual(p.username, None)
self.assertEqual(p.password, None)
@@ -730,46 +739,44 @@ def test_attributes_without_netloc(self):
def test_noslash(self):
# Issue 1637: http://foo.com?query is legal
self.assertEqual(urllib.parse.urlparse("http://example.com?blahblah=/foo"),
('http', 'example.com', '', '', 'blahblah=/foo', ''))
('http', 'example.com', '', '', 'blahblah=/foo', None))
self.assertEqual(urllib.parse.urlparse(b"http://example.com?blahblah=/foo"),
(b'http', b'example.com', b'', b'', b'blahblah=/foo', b''))
(b'http', b'example.com', b'', b'', b'blahblah=/foo', None))

def test_withoutscheme(self):
# Test urlparse without scheme
# Issue 754016: urlparse goes wrong with IP:port without scheme
# RFC 1808 specifies that netloc should start with //, urlparse expects
# the same, otherwise it classifies the portion of url as path.
self.assertEqual(urllib.parse.urlparse("path"),
('','','path','','',''))
self.assertEqual(urllib.parse.urlparse("path"), ('','','path','',None,None))
self.assertEqual(urllib.parse.urlparse("//www.python.org:80"),
('','www.python.org:80','','','',''))
('','www.python.org:80','','',None,None))
self.assertEqual(urllib.parse.urlparse("http://www.python.org:80"),
('http','www.python.org:80','','','',''))
('http','www.python.org:80','','',None,None))
# Repeat for bytes input
self.assertEqual(urllib.parse.urlparse(b"path"),
(b'',b'',b'path',b'',b'',b''))
self.assertEqual(urllib.parse.urlparse(b"path"), (b'',b'',b'path',b'',None,None))
self.assertEqual(urllib.parse.urlparse(b"//www.python.org:80"),
(b'',b'www.python.org:80',b'',b'',b'',b''))
(b'',b'www.python.org:80',b'',b'',None,None))
self.assertEqual(urllib.parse.urlparse(b"http://www.python.org:80"),
(b'http',b'www.python.org:80',b'',b'',b'',b''))
(b'http',b'www.python.org:80',b'',b'',None,None))

def test_portseparator(self):
# Issue 754016 makes changes for port separator ':' from scheme separator
self.assertEqual(urllib.parse.urlparse("http:80"), ('http','','80','','',''))
self.assertEqual(urllib.parse.urlparse("https:80"), ('https','','80','','',''))
self.assertEqual(urllib.parse.urlparse("path:80"), ('path','','80','','',''))
self.assertEqual(urllib.parse.urlparse("http:"),('http','','','','',''))
self.assertEqual(urllib.parse.urlparse("https:"),('https','','','','',''))
self.assertEqual(urllib.parse.urlparse("http:80"), ('http','','80','',None,None))
self.assertEqual(urllib.parse.urlparse("https:80"), ('https','','80','',None,None))
self.assertEqual(urllib.parse.urlparse("path:80"), ('path','','80','',None,None))
self.assertEqual(urllib.parse.urlparse("http:"),('http','','','',None,None))
self.assertEqual(urllib.parse.urlparse("https:"),('https','','','',None,None))
self.assertEqual(urllib.parse.urlparse("http://www.python.org:80"),
('http','www.python.org:80','','','',''))
('http','www.python.org:80','','',None,None))
# As usual, need to check bytes input as well
self.assertEqual(urllib.parse.urlparse(b"http:80"), (b'http',b'',b'80',b'',b'',b''))
self.assertEqual(urllib.parse.urlparse(b"https:80"), (b'https',b'',b'80',b'',b'',b''))
self.assertEqual(urllib.parse.urlparse(b"path:80"), (b'path',b'',b'80',b'',b'',b''))
self.assertEqual(urllib.parse.urlparse(b"http:"),(b'http',b'',b'',b'',b'',b''))
self.assertEqual(urllib.parse.urlparse(b"https:"),(b'https',b'',b'',b'',b'',b''))
self.assertEqual(urllib.parse.urlparse(b"http:80"), (b'http',b'',b'80',b'',None,None))
self.assertEqual(urllib.parse.urlparse(b"https:80"), (b'https',b'',b'80',b'',None,None))
self.assertEqual(urllib.parse.urlparse(b"path:80"), (b'path',b'',b'80',b'',None,None))
self.assertEqual(urllib.parse.urlparse(b"http:"),(b'http',b'',b'',b'',None,None))
self.assertEqual(urllib.parse.urlparse(b"https:"),(b'https',b'',b'',b'',None,None))
self.assertEqual(urllib.parse.urlparse(b"http://www.python.org:80"),
(b'http',b'www.python.org:80',b'',b'',b'',b''))
(b'http',b'www.python.org:80',b'',b'',None,None))

def test_usingsys(self):
# Issue 3314: sys module is used in the error
Expand All @@ -778,23 +785,23 @@ def test_usingsys(self):
def test_anyscheme(self):
# Issue 7904: s3://foo.com/stuff has netloc "foo.com".
self.assertEqual(urllib.parse.urlparse("s3://foo.com/stuff"),
('s3', 'foo.com', '/stuff', '', '', ''))
('s3', 'foo.com', '/stuff', '', None, None))
self.assertEqual(urllib.parse.urlparse("x-newscheme://foo.com/stuff"),
('x-newscheme', 'foo.com', '/stuff', '', '', ''))
('x-newscheme', 'foo.com', '/stuff', '', None, None))
self.assertEqual(urllib.parse.urlparse("x-newscheme://foo.com/stuff?query#fragment"),
('x-newscheme', 'foo.com', '/stuff', '', 'query', 'fragment'))
self.assertEqual(urllib.parse.urlparse("x-newscheme://foo.com/stuff?query"),
('x-newscheme', 'foo.com', '/stuff', '', 'query', ''))
('x-newscheme', 'foo.com', '/stuff', '', 'query', None))

# And for bytes...
self.assertEqual(urllib.parse.urlparse(b"s3://foo.com/stuff"),
(b's3', b'foo.com', b'/stuff', b'', b'', b''))
(b's3', b'foo.com', b'/stuff', b'', None, None))
self.assertEqual(urllib.parse.urlparse(b"x-newscheme://foo.com/stuff"),
(b'x-newscheme', b'foo.com', b'/stuff', b'', b'', b''))
(b'x-newscheme', b'foo.com', b'/stuff', b'', None, None))
self.assertEqual(urllib.parse.urlparse(b"x-newscheme://foo.com/stuff?query#fragment"),
(b'x-newscheme', b'foo.com', b'/stuff', b'', b'query', b'fragment'))
self.assertEqual(urllib.parse.urlparse(b"x-newscheme://foo.com/stuff?query"),
(b'x-newscheme', b'foo.com', b'/stuff', b'', b'query', b''))
(b'x-newscheme', b'foo.com', b'/stuff', b'', b'query', None))

def test_default_scheme(self):
# Exercise the scheme parameter of urlparse() and urlsplit()
@@ -831,10 +838,10 @@ def test_parse_fragments(self):
attr = "path"
with self.subTest(url=url, function=func):
result = func(url, allow_fragments=False)
self.assertEqual(result.fragment, "")
self.assertEqual(result.fragment, None)
self.assertTrue(
getattr(result, attr).endswith("#" + expected_frag))
self.assertEqual(func(url, "", False).fragment, "")
self.assertEqual(func(url, "", False).fragment, None)

result = func(url, allow_fragments=True)
self.assertEqual(result.fragment, expected_frag)
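
With allow_fragments=False no fragment component is produced, and on the PR branch it is reported as None rather than ''; the '#'-suffixed text stays in the preceding component (path, or query if one is present). A sketch:

>>> from urllib.parse import urlsplit
>>> p = urlsplit('http://a/b#frag', allow_fragments=False)
>>> (p.path, p.fragment)
('/b#frag', None)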