GH-123599: `url2pathname()`: don't call `gethostbyname()` by default by barneygale · Pull Request #132610 · python/cpython · GitHub
GH-123599: url2pathname(): don't call gethostbyname() by default #132610

Merged
merged 10 commits into from
May 5, 2025
Address some review feedback
barneygale committed Apr 28, 2025
commit 873c19a4d1b8588ba1063572c07195cfe74835c3
15 changes: 6 additions & 9 deletions Doc/library/urllib.request.rst
@@ -175,7 +175,7 @@ The :mod:`urllib.request` module defines the following functions:
The *add_scheme* argument was added.


.. function:: url2pathname(url, *, require_scheme=False, resolve_netloc=False)
.. function:: url2pathname(url, *, require_scheme=False, resolve_host=False)

Convert the given ``file:`` URL to a local path. This function uses
:func:`~urllib.parse.unquote` to decode the URL.
@@ -186,11 +186,11 @@ The :mod:`urllib.request` module defines the following functions:
if it doesn't.

The URL authority is discarded if it is empty, ``localhost``, or the local
hostname. Otherwise, if *resolve_netloc* is set to true, the authority is
hostname. Otherwise, if *resolve_host* is set to true, the authority is
resolved using :func:`socket.gethostbyname` and discarded if it matches a
local IP address. If the authority is still unhandled, then on Windows a
UNC path is returned, and on other platforms a
:exc:`~urllib.error.URLError` is raised.
local IP address (as per :rfc:`RFC 8089 §3 <8089#section-3>`). If the
authority is still unhandled, then on Windows a UNC path is returned, and
on other platforms a :exc:`~urllib.error.URLError` is raised.

This example shows the function being used on Windows::

@@ -211,10 +211,7 @@ The :mod:`urllib.request` module defines the following functions:
:exc:`~urllib.error.URLError` is raised.

.. versionchanged:: next
The *require_scheme* argument was added.

.. versionchanged:: next
The *resolve_netloc* argument was added.
The *require_scheme* and *resolve_host* arguments were added.


.. function:: getproxies()
2 changes: 1 addition & 1 deletion Doc/whatsnew/3.14.rst
@@ -1232,7 +1232,7 @@ urllib
true.
- Discard URL authority if it matches the local hostname.
- Discard URL authority if it resolves to a local IP address when the new
*resolve_netloc* argument is set to true.
*resolve_host* argument is set to true.
- Raise :exc:`~urllib.error.URLError` if a URL authority isn't local,
except on Windows where we return a UNC path as before.

6 changes: 3 additions & 3 deletions Lib/test/test_urllib.py
@@ -1569,12 +1569,12 @@ def test_url2pathname_require_scheme_errors(self):
urllib.request.url2pathname,
url, require_scheme=True)

def test_url2pathname_resolve_netloc(self):
def test_url2pathname_resolve_host(self):
fn = urllib.request.url2pathname
sep = os.path.sep
self.assertEqual(fn('//127.0.0.1/foo/bar', resolve_netloc=True), f'{sep}foo{sep}bar')
self.assertEqual(fn('//127.0.0.1/foo/bar', resolve_host=True), f'{sep}foo{sep}bar')
self.assertEqual(fn(f'//{socket.gethostname()}/foo/bar'), f'{sep}foo{sep}bar')
self.assertEqual(fn(f'//{socket.gethostname()}/foo/bar', resolve_netloc=True), f'{sep}foo{sep}bar')
self.assertEqual(fn(f'//{socket.gethostname()}/foo/bar', resolve_host=True), f'{sep}foo{sep}bar')

@unittest.skipUnless(sys.platform == 'win32',
'test specific to Windows pathnames.')
10 changes: 5 additions & 5 deletions Lib/urllib/request.py
@@ -1466,7 +1466,7 @@ def get_names(self):
def open_local_file(self, req):
import email.utils
import mimetypes
localfile = url2pathname(req.full_url, require_scheme=True, resolve_netloc=True)
localfile = url2pathname(req.full_url, require_scheme=True, resolve_host=True)
try:
stats = os.stat(localfile)
size = stats.st_size
@@ -1645,22 +1645,22 @@ def data_open(self, req):

# Code moved from the old urllib module

def url2pathname(url, *, require_scheme=False, resolve_netloc=False):
def url2pathname(url, *, require_scheme=False, resolve_host=False):
"""Convert the given file URL to a local file system path.

The 'file:' scheme prefix must be omitted unless *require_scheme*
is set to true.

The URL authority may be resolved with gethostbyname() if
*resolve_netloc* is set to true.
*resolve_host* is set to true.
"""
if require_scheme:
scheme, url = _splittype(url)
if scheme != 'file':
raise URLError("URL is missing a 'file:' scheme")
authority, url = _splithost(url)
if os.name == 'nt':
if not _is_local_authority(authority, resolve_netloc):
if not _is_local_authority(authority, resolve_host):
# e.g. file://server/share/file.txt
url = '//' + authority + url
elif url[:3] == '///':
@@ -1674,7 +1674,7 @@ def url2pathname(url, *, require_scheme=False, resolve_host=False):
# Older URLs use a pipe after a drive letter
url = url[:1] + ':' + url[2:]
url = url.replace('/', '\\')
elif not _is_local_authority(authority, resolve_netloc):
elif not _is_local_authority(authority, resolve_host):
raise URLError("file:// scheme is supported only on localhost")
encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()
@@ -1,9 +1,3 @@
Fix issue where :func:`urllib.request.url2pathname` mishandled file URLs with
authorities. The process now works as follows:

1. Discard authority if it is empty or ``localhost``; otherwise
2. (New) Discard authority if it matches the local hostname; otherwise
3. (New) If the new *resolve_netloc* keyword-only argument is set to true,
discard authority if it resolves to a local IP address; otherwise
4. On Windows, return a UNC path; otherwise
5. (New) Raise :exc:`urllib.error.URLError`.
Add *resolve_host* keyword-only argument to
:func:`urllib.request.url2pathname`, and fix handling of file URLs with
authorities.