-
-
Notifications
You must be signed in to change notification settings - Fork 32k
urllib2 doesn't escape spaces in http requests #57568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
urllib2.urlopen('http://foo/url and spaces') will send a HTTP request line like this to the server: GET /url and spaces HTTP/1.1 which the server obviously does not understand. This contrasts with urllib's behaviour which replaces the spaces (' ') in the url with '%20'. Related: bpo-918368 bpo-1153027 |
I have used the quote method to percent encode the url for spaces and similar characters. This is my first patch. Please let me know if there is anything wrong. I will correct and re-submit it. I ran the test_urllib2.py which gave an OK for 34 tests. Changes are made in two instances:
|
Seems good. |
Patch attached for python3, with unit tests. |
FWIW, I don't think it is a good idea to escape automatically. It will change the behaviour in a non-backward compatible way for existing applications that pass encoded urls to this function. I think the existing behaviour is better. The documentation and the failure mode for passing URLs with spaces could however be improved. |
Here the patch for python2. kiilerix, RFC 1738 explicitly says that the space character shall not be used. |
Yes, the url sent by urllib2 must not contain spaces. In my opinion the only way to handle that correctly is to not pass urls with spaces to urlopen. Escaping the urls is not a good solution - even if the API was to be designed from scratch. It would be better to raise an exception if it is passed an invalid url. Note for example that '/' and the %-encoding of '/' are different, and it must thus be possible to pass an url containing both to urlopen. That is not possible if it automically escapes. |
The issue with the current patch is that it is escaping more than only the spaces, with possibly indirect border effect. |
I vote for the parse method converting the spaces (and only the spaces) explicitly, for the following reasons:
Here's a patch implementing this. The change allows for any whitespace character in the selector part of the url (and in particular, '\n'), not only ' '. |
I think this could be merged with bpo-14826. Maybe it is sensible to handle all control characters the same way. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: