gh-102153: Start stripping C0 control and space chars in `urlsplit` by illia-v · Pull Request #102508 · python/cpython

Merged May 17, 2023 (12 commits)
Add urlparse and urlsplit security warnings.
The added section describing the situation is longer than I might want,
but being more brief just leaves open questions.

This is a lighter worded version of my original text proposed in
https://discuss.python.org/t/how-to-word-a-warning-about-security-uses-in-urllib-parse-docs/26399
gpshead committed May 17, 2023
commit a510652af8eb02fd2377accbd66101c81bb326e8
38 changes: 38 additions & 0 deletions Doc/library/urllib.parse.rst
@@ -159,6 +159,10 @@ or on combining URL components into a URL string.
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')

.. warning::

   The :func:`urlparse` API does not perform validation. See :ref:`URL
   parsing security <url-parsing-security>` for details.

.. versionchanged:: 3.2
   Added IPv6 URL parsing capabilities.
@@ -328,6 +332,11 @@ or on combining URL components into a URL string.
control and space characters are stripped from the URL. ``\n``,
``\r`` and tab ``\t`` characters are removed from the URL at any position.

.. warning::

   The :func:`urlsplit` API does not perform validation. See :ref:`URL
   parsing security <url-parsing-security>` for details.

.. versionchanged:: 3.6
   Out-of-range port numbers now raise :exc:`ValueError`, instead of
   returning :const:`None`.
@@ -418,6 +427,35 @@ or on combining URL components into a URL string.
or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
without changes.

.. _url-parsing-security:

URL parsing security
--------------------

The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation**
of inputs. They may not raise errors on inputs that other applications
consider invalid. They may also accept inputs that would not be considered
URLs elsewhere, returning them as unusually split component parts. Their
purpose is practical functionality rather than purity.

Instead of raising an exception on unusual input, they may return some
components as empty strings (``""``), or components may contain more than
they perhaps should.
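
A few made-up inputs illustrate the point. This is only a sketch of the kind of result to expect; the URLs are hypothetical and chosen for illustration.

```python
from urllib.parse import urlsplit

# No "//" before the host, so nothing is recognized as a netloc:
# the whole string ends up in ``path``.
no_slashes = urlsplit("www.example.com/index.html")

# A scheme is parsed, but it is not one a web client should fetch.
script = urlsplit("javascript:alert(1)")

# An empty authority is accepted rather than rejected.
empty_host = urlsplit("http:///index.html")

for parts in (no_slashes, script, empty_host):
    print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))
```

None of these calls raises an exception, so the caller has to notice the empty ``netloc`` or the unexpected ``scheme`` itself.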

We recommend that users of these APIs code defensively wherever the returned
values may be used with security implications. Do some verification within
your code before trusting a returned component part. Does that ``scheme``
make sense? Is that a sensible ``path``? Is there anything strange about
that ``hostname``?
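
One way to follow this advice is an allowlist check on the parsed components. The sketch below is a minimal example, not a complete validator: ``is_acceptable_url``, ``ALLOWED_SCHEMES``, ``ALLOWED_HOSTS`` and the port policy are all hypothetical names and choices made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical policy for illustration only; real applications will differ.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_acceptable_url(url):
    """Return True only if *url* passes this sketch's allowlist checks."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for an invalid port
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    # ``hostname`` is None when no host was parsed; otherwise it is lowercased.
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    # Reject embedded credentials such as "user@host".
    if parts.username is not None or parts.password is not None:
        return False
    if port not in (None, 80, 443):
        return False
    return parts.path == "" or parts.path.startswith("/")


print(is_acceptable_url("https://example.com/docs"))
print(is_acceptable_url("javascript:alert(1)"))
```

Checking against an explicit allowlist, rather than trying to enumerate bad inputs, means unexpected splits fail closed.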

What constitutes a URL is not universally well defined. Different
applications have different needs and desired constraints. For instance, the
living `WHATWG spec`_ describes what user-facing web clients such as web
browsers require, while :rfc:`3986` is more general. These functions
incorporate some aspects of both, but cannot be claimed to be compliant with
either. These APIs, and existing code that depends on their behavior, predate
both standards, so we attempt to maintain backwards compatibility.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes