-
-
Notifications
You must be signed in to change notification settings - Fork 32k
urlopen URL with unescaped space #59031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
There appears to be an odd networking issue with how urllib2 sends HTTP requests. Downloadin 8000 g an image from maw.liquifire.com gives an error: $ python -c 'import urllib2 ; urllib2.urlopen("http://maw.liquifire.com/maw?set=image[2302.000.13314 a]&call=url[file:325x445]")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1180, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.7/socket.py", line 447, in readline
data = self._sock.recv(self._rbufsize)
socket.error: [Errno 104] Connection reset by peer Downloading the same image using wget works fine: $ wget 'http://maw.liquifire.com/maw?set=image[2302.000.13314 a]&call=url[file:325x445]'
--2012-05-16 10:53:27-- http://maw.liquifire.com/maw?set=image[2302.000.13314%20a]&call=url[file:325x445]
Resolving maw.liquifire.com (maw.liquifire.com)... 184.169.78.6
Connecting to maw.liquifire.com (maw.liquifire.com)|184.169.78.6|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11393 (11K) [image/jpeg]
Saving to: `maw?set=image[2302.000.13314 a]&call=url[file:325x445]' 100%[======================================>] 11,393 --.-K/s in 0.003s 2012-05-16 10:53:27 (3.49 MB/s) - `maw?set=image[2302.000.13314 a]&call=url[file:325x445]' saved [11393/11393] |
http://maw.liquifire.com/maw?set=image[2302.000.13314%20a]&call=url[file:325x445] works properly. Notice the %20 instead of ' ' |
Here is a patch that uses the same quoting logic in urllib.request.Request.__init__ as is used by urllib.request.URLopener.open() |
New changeset 01c8d800efd2 by Senthil Kumaran in branch '3.2': New changeset e6bb919b2623 by Senthil Kumaran in branch 'default': |
New changeset d931a3b64fd6 by Senthil Kumaran in branch '2.7': |
Thanks for the patch, Stephen. |
It looks like this broke the build bots: |
Here's a followup patch that fixes the trunk build for me. This will unbreak the builds as well as fixing this bug, but it should be investigated why URLopener calls to_bytes() and Request does not. Ideally this interface should be consistent. |
It seems to me that toBytes in urllib was introduce to restrict the http://mail.python.org/pipermail/python-bugs-list/2000-November/002779.html And quoting to toBytes / to_bytes is actually the problem here, as |
I’m not sure urllib should accept invalid (non-escaped) URLs; a higher-level application can do so, but for the low-level stdlib module it is more debatable. |
Yeah, I am thinking so as well in that case, the test_cookielib.py |
New changeset ee1828dc3bf6 by Senthil Kumaran in branch '3.2': New changeset dc30111a5d7e by Senthil Kumaran in branch 'default': New changeset 224b27a8d9be by Senthil Kumaran in branch '2.7': |
The last change should settle the buildbots, But I would like to come |
Senthil, do you read python-dev? I think this change was prematurate from the start (nevermind the fact that you didn't run the test suite before committing). For example, if you have an URL with a non-ASCII domain name such as "http://وزارة-الأتصالات.مصر/", the domain name should IDNA-encoded, not %-encoded like the rest. Furthermore, some people are certainly already quoting their URLs to workaround this issue, so "fixing" it will break their code by double-escaping the URLs. You've got to be more careful. |
The docs [1] state that I vote for reverting the chances as they break the API. You could improve the docs and emphasize that URLs must be quoted correctly as the module doesn't implement browser magic. [1] http://docs.python.org/py3k/library/urllib.request.html#urllib.request.Request |
On Sun, Jul 8, 2012 at 2:30 AM, Antoine Pitrou <report@bugs.python.org> wrote:
I thought that the other legacy URLOpen was quoting it correct and
Agreed and understood.
Oh. yes, the change may break an already quoted URL. I think, I shall |
On Sun, Jul 8, 2012 at 9:42 AM, Christian Heimes <report@bugs.python.org> wrote:
Okay. But I do realize that in 3.3, we may have a FancyURLOpener / |
New changeset ebd37273e0fe by Senthil Kumaran in branch '3.2': |
New changeset a4bdb637d818 by Senthil Kumaran in branch 'default': |
FWIW urlopen() already handles space characters in the Location target of redirects; see HTTPRedirectHandler.redirect_request(). So I think it is reasonable to handle space characters in user-supplied URLs also, if it is done properly. |
I think this should be treated as a feature, not a bug, since as Christian said, the documentation currently does not support this case. |
urllib.request.URLopener() and FancyURLopener() automatically quote() URLs for the user. Those APIs are marked deprecated since 3.3 but have no timeline for removal. urllib.request.urlopen() does not use those, so URLs passed in are not auto-quoted. i'll clarify the docs for URLopener. |
It seems there's nothing left to be done here? We've documented the behaviour of the (deprecated) URLopener |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: