8000 GH-123599: `url2pathname()`: don't call `gethostbyname()` by default by barneygale · Pull Request #132610 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

GH-123599: url2pathname(): don't call gethostbyname() by default #132610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions Doc/library/pathlib.rst
Original file line number Diff line number Diff line change
Expand Up @@ -872,10 +872,10 @@ conforming to :rfc:`8089`.
.. versionadded:: 3.13

.. versionchanged:: next
If a URL authority (e.g. a hostname) is present and resolves to a local
address, it is discarded. If an authority is present and *doesn't*
resolve to a local address, then on Windows a UNC path is returned (as
before), and on other platforms a :exc:`ValueError` is raised.
The URL authority is discarded if it matches the local hostname.
Otherwise, if the authority isn't empty or ``localhost``, then on
Windows a UNC path is returned (as before), and on other platforms a
:exc:`ValueError` is raised.


.. method:: Path.as_uri()
Expand Down
18 changes: 12 additions & 6 deletions Doc/library/urllib.request.rst
Original file line number Diff line number Diff line change
Expand Up @@ -172,10 +172,10 @@ The :mod:`urllib.request` module defines the following functions:
the URL ``///etc/hosts``.

.. versionchanged:: next
The *add_scheme* argument was added.
The *add_scheme* parameter was added.


.. function:: url2pathname(url, *, require_scheme=False)
.. function:: url2pathname(url, *, require_scheme=False, resolve_host=False)

Convert the given ``file:`` URL to a local path. This function uses
:func:`~urllib.parse.unquote` to decode the URL.
Expand All @@ -185,6 +185,13 @@ The :mod:`urllib.request` module defines the following functions:
value should include the prefix; a :exc:`~urllib.error.URLError` is raised
if it doesn't.

The URL authority is discarded if it is empty, ``localhost``, or the local
hostname. Otherwise, if *resolve_host* is set to true, the authority is
resolved using :func:`socket.gethostbyname` and discarded if it matches a
local IP address (as per :rfc:`RFC 8089 §3 <8089#section-3>`). If the
authority is still unhandled, then on Windows a UNC path is returned, and
on other platforms a :exc:`~urllib.error.URLError` is raised.

This example shows the function being used on Windows::

>>> from urllib.request import url2pathname
Expand All @@ -198,14 +205,13 @@ The :mod:`urllib.request` module defines the following functions:
:exc:`OSError` exception to be raised on Windows.

.. versionchanged:: next
This function calls :func:`socket.gethostbyname` if the URL authority
isn't empty, ``localhost``, or the machine hostname. If the authority
resolves to a local IP address then it is discarded; otherwise, on
The URL authority is discarded if it matches the local hostname.
Otherwise, if the authority isn't empty or ``localhost``, then on
Windows a UNC path is returned (as before), and on other platforms a
:exc:`~urllib.error.URLError` is raised.

.. versionchanged:: next
The *require_scheme* argument was added.
The *require_scheme* and *resolve_host* parameters were added.


.. function:: getproxies()
Expand Down
8 changes: 5 additions & 3 deletions Doc/whatsnew/3.14.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1703,9 +1703,11 @@ urllib

- Accept a complete URL when the new *require_scheme* argument is set to
true.
- Discard URL authorities that resolve to a local IP address.
- Raise :exc:`~urllib.error.URLError` if a URL authority doesn't resolve
to a local IP address, except on Windows where we return a UNC path.
- Discard URL authority if it matches the local hostname.
- Discard URL authority if it resolves to a local IP address when the new
*resolve_host* argument is set to true.
- Raise :exc:`~urllib.error.URLError` if a URL authority isn't local,
except on Windows where we return a UNC path as before.

In :func:`urllib.request.pathname2url`:

Expand Down
1 change: 0 additions & 1 deletion Lib/test/test_pathlib/test_pathlib.py
Original file line number Diff line number Diff line change
Expand Up @@ -3290,7 +3290,6 @@ def test_from_uri_posix(self):
self.assertEqual(P.from_uri('file:////foo/bar'), P('//foo/bar'))
self.assertEqual(P.from_uri('file://localhost/foo/bar'), P('/foo/bar'))
if not is_wasi:
self.assertEqual(P.from_uri('file://127.0.0.1/foo/bar'), P('/foo/bar'))
self.assertEqual(P.from_uri(f'file://{socket.gethostname()}/foo/bar'),
P('/foo/bar'))
self.assertRaises(ValueError, P.from_uri, 'foo/bar')
Expand Down
16 changes: 12 additions & 4 deletions Lib/test/test_urllib.py
Original file line number Diff line number Diff line change
Expand Up @@ -1551,7 +1551,8 @@ def test_url2pathname_require_scheme(self):
urllib.request.url2pathname(url, require_scheme=True),
expected_path)

error_subtests = [
def test_url2pathname_require_scheme_errors(self):
subtests = [
'',
':',
'foo',
Expand All @@ -1561,13 +1562,20 @@ def test_url2pathname_require_scheme(self):
'data:file:foo',
'data:file://foo',
]
for url in error_subtests:
for url in subtests:
with self.subTest(url=url):
self.assertRaises(
urllib.error.URLError,
urllib.request.url2pathname,
url, require_scheme=True)

def test_url2pathname_resolve_host(self):
fn = urllib.request.url2pathname
sep = os.path.sep
self.assertEqual(fn('//127.0.0.1/foo/bar', resolve_host=True), f'{sep}foo{sep}bar')
self.assertEqual(fn(f'//{socket.gethostname()}/foo/bar'), f'{sep}foo{sep}bar')
self.assertEqual(fn(f'//{socket.gethostname()}/foo/bar', resolve_host=True), f'{sep}foo{sep}bar')

@unittest.skipUnless(sys.platform == 'win32',
'test specific to Windows pathnames.')
def test_url2pathname_win(self):
Expand Down Expand Up @@ -1598,6 +1606,7 @@ def test_url2pathname_win(self):
self.assertEqual(fn('//server/path/to/file'), '\\\\server\\path\\to\\file')
self.assertEqual(fn('////server/path/to/file'), '\\\\server\\path\\to\\file')
self.assertEqual(fn('/////server/path/to/file'), '\\\\server\\path\\to\\file')
self.assertEqual(fn('//127.0.0.1/path/to/file'), '\\\\127.0.0.1\\path\\to\\file')
# Localhost paths
self.assertEqual(fn('//localhost/C:/path/to/file'), 'C:\\path\\to\\file')
self.assertEqual(fn('//localhost/C|/path/to/file'), 'C:\\path\\to\\file')
Expand All @@ -1622,8 +1631,7 @@ def test_url2pathname_posix(self):
self.assertRaises(urllib.error.URLError, fn, '//:80/foo/bar')
self.assertRaises(urllib.error.URLError, fn, '//:/foo/bar')
self.assertRaises(urllib.error.URLError, fn, '//c:80/foo/bar')
self.assertEqual(fn('//127.0.0.1/foo/bar'), '/foo/bar')
self.assertEqual(fn(f'//{socket.gethostname()}/foo/bar'), '/foo/bar')
self.assertRaises(urllib.error.URLError, fn, '//127.0.0.1/foo/bar')

@unittest.skipUnless(os_helper.FS_NONASCII, 'need os_helper.FS_NONASCII')
def test_url2pathname_nonascii(self):
Expand Down
19 changes: 12 additions & 7 deletions Lib/urllib/request.py
Original file line number Diff line number Diff line change
Expand Up @@ -1466,7 +1466,7 @@ def get_names(self):
def open_local_file(self, req):
import email.utils
import mimetypes
localfile = url2pathname(req.full_url, require_scheme=True)
localfile = url2pathname(req.full_url, require_scheme=True, resolve_host=True)
try:
stats = os.stat(localfile)
size = stats.st_size
Expand All @@ -1482,7 +1482,7 @@ def open_local_file(self, req):

file_open = open_local_file

def _is_local_authority(authority):
def _is_local_authority(authority, resolve):
# Compare hostnames
if not authority or authority == 'localhost':
return True
Expand All @@ -1494,9 +1494,11 @@ def _is_local_authority(authority):
if authority == hostname:
return True
# Compare IP addresses
if not resolve:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, should we check this before calling gethostname() which can fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to come up with firm info on when gethostname() fails (or whether it can perform network access). Do you have any pointers? Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know.

But I was thinking about this case: processing the "file:" URL on different computer. We cannot use gethostname(), because it returns a different result. If we want to get a path on other computer, we should ignore the host name. There will still be problem with processing Windows paths on Posix and Posix paths on Windows.

I do not know how common this case, and how common the opposite case, when we only use gethostname(), but not gethostbyname(). We should guess what would be more useful for users without having any data.

return False
try:
address = socket.gethostbyname(authority)
except (socket.gaierror, AttributeError):
except (socket.gaierror, AttributeError, UnicodeEncodeError):
return False
return address in FileHandler().get_names()

Expand Down Expand Up @@ -1641,21 +1643,24 @@ def data_open(self, req):
return addinfourl(io.BytesIO(data), headers, url)


# Code move from the old urllib module
# Code moved from the old urllib module

def url2pathname(url, *, require_scheme=False):
def url2pathname(url, *, require_scheme=False, resolve_host=False):
"""Convert the given file URL to a local file system path.

The 'file:' scheme prefix must be omitted unless *require_scheme*
is set to true.

The URL authority may be resolved with gethostbyname() if
*resolve_host* is set to true.
"""
if require_scheme:
scheme, url = _splittype(url)
if scheme != 'file':
raise URLError("URL is missing a 'file:' scheme")
authority, url = _splithost(url)
if os.name == 'nt':
if not _is_local_authority(authority):
if not _is_local_authority(authority, resolve_host):
# e.g. file://server/share/file.txt
url = '//' + authority + url
elif url[:3] == '///':
Expand All @@ -1669,7 +1674,7 @@ def url2pathname(url, *, require_scheme=False):
# Older URLs use a pipe after a drive letter
url = url[:1] + ':' + url[2:]
url = url.replace('/', '\\')
elif not _is_local_authority(authority):
elif not _is_local_authority(authority, resolve_host):
raise URLError("file:// scheme is supported only on localhost")
encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
Fix issue where :func:`urllib.request.url2pathname` mishandled file URLs with
authorities. If an authority is present and resolves to ``localhost``, it is
now discarded. If an authority is present but *doesn't* resolve to
``localhost``, then on Windows a UNC path is returned (as before), and on
other platforms a :exc:`urllib.error.URLError` is now raised.
Add *resolve_host* keyword-only parameter to
:func:`urllib.request.url2pathname`, and fix handling of file URLs with
authorities.
Loading
0