urlparse does not correctly handle signs, underscores, and whitespace in port numbers · Issue #96035 · python/cpython · GitHub
urlparse does not correctly handle signs, underscores, and whitespace in port numbers #96035
Closed
@kenballus

Description

Background

RFC 3986 (spec for URIs) defines a valid port string with the following grammar rule:

  • port = *DIGIT

Here's the WHATWG URL spec definition:
"""
A URL-port string must be one of the following:

  • the empty string
  • one or more ASCII digits representing a decimal number no greater than $2^{16} − 1$.

"""[1]
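The two definitions quoted above can be sketched as a pair of predicates. This is only an illustration of the grammar rules, not code from CPython; the names `is_rfc3986_port` and `is_whatwg_port` are hypothetical.

```python
import re

# RFC 3986: port = *DIGIT, i.e. zero or more ASCII digits with no upper bound.
_RFC3986_PORT = re.compile(r"[0-9]*")


def is_rfc3986_port(s: str) -> bool:
    """Hypothetical helper: True if s matches the RFC 3986 port rule."""
    return _RFC3986_PORT.fullmatch(s) is not None


def is_whatwg_port(s: str) -> bool:
    """Hypothetical helper: True if s is a valid WHATWG URL-port string."""
    if not is_rfc3986_port(s):
        return False
    if s == "":
        return True
    # Drop leading zeros and check the length first, so that very long
    # digit strings are rejected without converting them to int.
    significant = s.lstrip("0")
    return len(significant) <= 5 and int(significant or "0") <= 2**16 - 1
```

Under both rules, "+80", "-0", "8_0", and " 80" are rejected, because a sign, an underscore, or a space is not an ASCII digit. The two rules differ only on range: "99999" satisfies RFC 3986 but not the WHATWG definition.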

The bug

This is the port string parsing code from Lib/urllib/parse.py:166-176:

def port(self):
    port = self._hostinfo[1]
    if port is not None:
        try:
            port = int(port, 10)
        except ValueError:
            message = f'Port could not be cast to integer value as {port!r}'
            raise ValueError(message) from None
        if not ( 0 <= port <= 65535):
            raise ValueError("Port out of range 0-65535")
    return port

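Because int() accepts a leading sign, this code erroneously accepts the port strings "-0" and f"+{x}" for any value of x in the valid range. Given that + and - are not digits, this behavior violates both specifications. int() is similarly lenient about underscores between digits and about surrounding whitespace, which is why the title mentions them as well.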
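The root cause can be seen without urlparse at all, since the leniency comes from int() itself. The short check below is an illustration added for clarity; the list name `lenient_inputs` is made up, and the underscore and whitespace cases come from the issue title rather than from the snippet above.

```python
# Each of these strings contains a character that is not a digit,
# yet int(..., 10) converts all of them without raising ValueError.
lenient_inputs = ["-0", "+80", "8_0", " 80 "]

parsed = [int(s, 10) for s in lenient_inputs]
print(parsed)  # [0, 80, 80, 80]
```

All four results fall inside 0-65535, so the range check that follows the conversion does not catch them either.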

This bug is easily reproducible with the following snippet:

from urllib.parse import urlparse
url1 = urlparse("http://python.org:-0")
url2 = urlparse("http://python.org:+80")
print(url1.port) # prints 0, but error is expected
print(url2.port) # prints 80, but error is expected
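One possible way to tighten the check is to require that the port string consists only of ASCII digits before converting it. The function below is a standalone sketch under that assumption, not necessarily the change that was or will be made in CPython; `strict_port` is a hypothetical name, and it takes the raw port string rather than reading `self._hostinfo`.

```python
from typing import Optional


def strict_port(port: Optional[str]) -> Optional[int]:
    """Hypothetical stricter variant of the port check quoted above."""
    if port is None:
        return None
    # str.isdigit() alone also accepts non-ASCII digits, so pair it with
    # str.isascii() to allow only the characters 0-9.
    if not (port.isascii() and port.isdigit()):
        raise ValueError(f"Port could not be cast to integer value as {port!r}")
    value = int(port, 10)
    if not (0 <= value <= 65535):
        raise ValueError("Port out of range 0-65535")
    return value
```

With this sketch, strict_port("80") returns 80, while "-0", "+80", "8_0", and " 80" all raise ValueError. An empty string also raises, as int("", 10) does in the original code.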

Happy to submit a PR, but don't want to step on any toes over at #25774.

My environment

  • CPython version tested on:
    • 3.10.6
  • Operating system and architecture:
    • Arch Linux x86_64

Footnotes

  1. Given that this is urlparse and not uriparse, it seems appropriate that we do not accept port numbers outside range(2**16), even though such numbers are allowed by RFC 3986.

Labels

  • 3.10 (only security fixes)
  • 3.11 (only security fixes)
  • 3.12 (only security fixes)
  • stdlib (Python modules in the Lib dir)
  • triaged (The issue has been accepted as valid by a triager.)
  • type-bug (An unexpected behavior, bug, or error)
