gh-130283: update deprecated links and examples in `urllib.request` docs by Mr-Sunglasses · Pull Request #130284 · python/cpython · GitHub
Merged
gh-130283: update deprecated links and examples in urllib.request docs
Mr-Sunglasses committed Feb 18, 2025
commit 061cba8857b5bc5100c19707ccea089afa474b0c
20 changes: 9 additions & 11 deletions Doc/library/urllib.request.rst
@@ -1219,20 +1219,16 @@ it. ::
>>> with urllib.request.urlopen('http://www.python.org/') as f:
... print(f.read(300))
...
-b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
-xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
-<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
-<title>Python Programming '
+b'<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <html class="no-js ie8 lt-ie9">

Note that urlopen returns a bytes object. This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the HTTP server. In general, a program will decode
the returned bytes object to string once it determines or guesses
the appropriate encoding.

-The following W3C document, https://www.w3.org/International/O-charset\ , lists
-the various ways in which an (X)HTML or an XML document could have specified its
+The following W3C document, https://www.w3.org/International/questions/qa-html-encoding-declarations\ , lists
+the various ways in which an HTML document could have specified its
Review comment (Member):

XML is still covered by the document but they explicitly don't say "XHTML".

encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
@@ -1241,17 +1237,19 @@ will use the same for decoding the bytes object. ::
>>> with urllib.request.urlopen('http://www.python.org/') as f:
... print(f.read(100).decode('utf-8'))
...
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-"http://www.w3.org/TR/xhtml1/DTD/xhtm
+<!doctype html>
+<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
+<!-

It is also possible to achieve the same result without using the
:term:`context manager` approach. ::

>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(100).decode('utf-8'))
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-"http://www.w3.org/TR/xhtml1/DTD/xhtm
+<!doctype html>
+<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
+<!--

In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
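The documentation text in this diff says that a program decodes the bytes returned by urlopen once it "determines or guesses" the encoding. One common way to do that, sketched below, is to read the charset from the response's Content-Type header and fall back to a default when it is absent or unknown. This is an illustration only and is not part of the PR; the helper names `charset_from_content_type` and `decode_body` and the UTF-8 fallback are assumptions made for the example.

```python
import codecs
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, else `default`.

    Hypothetical helper for illustration; not part of urllib.request.
    """
    if not content_type:
        return default
    # email.message.Message knows how to parse header parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset(failobj=default)
    try:
        # Reject charset names that Python has no codec for.
        codecs.lookup(charset)
    except LookupError:
        return default
    return charset


def decode_body(raw, content_type, default="utf-8"):
    """Decode response bytes with the header's charset, replacing bad bytes."""
    charset = charset_from_content_type(content_type, default)
    return raw.decode(charset, errors="replace")


# Possible use with urlopen (needs network access, so left as a comment):
#
# import urllib.request
# with urllib.request.urlopen("https://www.python.org/") as f:
#     print(decode_body(f.read(300), f.headers.get("Content-Type")))
```

The header is only one of the places an encoding can be declared. A page may also declare it in a meta tag, which is the case the documentation's python.org example relies on, so the header-based result should be treated as a first guess rather than a guarantee.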