Description
Background
RFC 3986 (spec for URIs) defines a valid port string with the following grammar rule:
port = *DIGIT
Here's the WHATWG URL spec definition:
"""
A URL-port string must be one of the following:
- the empty string
- one or more ASCII digits representing a decimal number no greater than $2^{16} - 1$.
""" [1]
The bug
This is the port string parsing code from Lib/urllib/parse.py:166-176:
def port(self):
    port = self._hostinfo[1]
    if port is not None:
        try:
            port = int(port, 10)
        except ValueError:
            message = f'Port could not be cast to integer value as {port!r}'
            raise ValueError(message) from None
        if not ( 0 <= port <= 65535):
            raise ValueError("Port out of range 0-65535")
    return port
This will erroneously validate the strings "-0" and f"+{x}" for any value of x in the valid range. Given that + and - are not digits, this behavior is in violation of both specifications.
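The root cause is that `int()` accepts an optional leading sign, so the range check alone cannot reject these inputs. One possible stricter check, sketched below, tests that the string consists only of ASCII digits before converting it. The name `parse_port_strict` is hypothetical, and treating an empty string as "no port" is an assumption made for this sketch, not necessarily what `urllib` does.

```python
from typing import Optional


def parse_port_strict(port: Optional[str]) -> Optional[int]:
    """Hypothetical stricter variant of the port parsing shown above.

    Assumption: None or an empty string means "no port was given".
    """
    if port is None or port == "":
        return None
    # str.isdigit() alone also accepts non-ASCII digits such as '²',
    # so it is combined with str.isascii() (available since Python 3.7).
    if not (port.isascii() and port.isdigit()):
        raise ValueError(f"Port could not be cast to integer value as {port!r}")
    value = int(port, 10)
    if not (0 <= value <= 65535):
        raise ValueError("Port out of range 0-65535")
    return value
```

With this check, "80" still parses to 80, while "-0" and "+80" raise `ValueError` instead of being silently accepted.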
This bug is easily reproducible with the following snippet:
from urllib.parse import urlparse
url1 = urlparse("http://python.org:-0")
url2 = urlparse("http://python.org:+80")
print(url1.port) # prints 0, but error is expected
print(url2.port) # prints 80, but error is expected
Happy to submit a PR, but don't want to step on any toes over at #25774.
My environment
- CPython version tested on:
- 3.10.6
- Operating system and architecture:
- Arch Linux x86_64
Footnotes
1. Given that this is urlparse and not uriparse, it seems appropriate that we do not accept port numbers outside range(2**16), even though such numbers are allowed by RFC 3986.