From 061cba8857b5bc5100c19707ccea089afa474b0c Mon Sep 17 00:00:00 2001
From: Kanishk Pachauri
Date: Wed, 19 Feb 2025 02:07:11 +0530
Subject: [PATCH 1/5] gh-130283: update deprecated links and examples in `urllib.request` docs

---
 Doc/library/urllib.request.rst | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index b3efde3f189566..4556bba29d40bd 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -1219,11 +1219,7 @@ it. ::
    >>> with urllib.request.urlopen('http://www.python.org/') as f:
    ...     print(f.read(300))
    ...
-   b'\n\n\n\n\n\n
-   \n
-   Python Programming '
+   b'<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <html class="no-js ie8 lt-ie9">
 
 Note that urlopen returns a bytes object. This is because there is no way
 for urlopen to automatically determine the encoding of the byte stream
@@ -1231,8 +1227,8 @@ it receives from the HTTP server. In general, a program will decode
 the returned bytes object to string once it determines or guesses
 the appropriate encoding.
 
-The following W3C document, https://www.w3.org/International/O-charset\ , lists
-the various ways in which an (X)HTML or an XML document could have specified its
+The following W3C document, https://www.w3.org/International/questions/qa-html-encoding-declarations\ , lists
+the various ways in which an HTML document could have specified its
 encoding information.
 
 As the python.org website uses *utf-8* encoding as specified in its meta tag, we
@@ -1241,8 +1237,9 @@ will use the same for decoding the bytes object. ::
    >>> with urllib.request.urlopen('http://www.python.org/') as f:
    ...     print(f.read(100).decode('utf-8'))
    ...
-   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-   "http://www.w3.org/TR/xhtml1/DTD/xhtm
+   <!doctype html>
+   <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
+   <!-
 
 It is also possible to achieve the same result without using the
 :term:`context manager` approach. ::
@@ -1250,8 +1247,9 @@ It is also possible to achieve the same result without using the
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
    >>> print(f.read(100).decode('utf-8'))
-   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-   "http://www.w3.org/TR/xhtml1/DTD/xhtm
+   <!doctype html>
+   <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
+   <!--
 
 In the following example, we are sending a data-stream to the stdin of a CGI
 and reading the data it returns to us. Note that this example will only work

From 2b066f74ec49218348713e6bfdeeb6bd3683f10c Mon Sep 17 00:00:00 2001
From: Kanishk Pachauri <itskanishkp.py@gmail.com>
Date: Sun, 9 Mar 2025 23:43:46 +0530
Subject: [PATCH 2/5] docs: fix the ". ::" sentences and merge them into "::" directly

---
 Doc/library/urllib.request.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 4556bba29d40bd..553839dabdbd85 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -1213,7 +1213,7 @@ In addition to the examples below, more examples are given in
 :ref:`urllib-howto`.
 
 This example gets the python.org main page and displays the first 300 bytes of
-it. ::
+it::
 
    >>> import urllib.request
    >>> with urllib.request.urlopen('http://www.python.org/') as f:
@@ -1232,7 +1232,7 @@ the various ways in which an HTML document could have specified its
 encoding information.
 
 As the python.org website uses *utf-8* encoding as specified in its meta tag, we
-will use the same for decoding the bytes object. ::
+will use the same for decoding the bytes object::
 
    >>> with urllib.request.urlopen('http://www.python.org/') as f:
    ...     print(f.read(100).decode('utf-8'))
@@ -1242,7 +1242,7 @@ will use the same for decoding the bytes object. ::
    <!-
 
 It is also possible to achieve the same result without using the
-:term:`context manager` approach. ::
+:term:`context manager` approach::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')

From 29c53e783d2e2f72c667acd928df457324da2505 Mon Sep 17 00:00:00 2001
From: Kanishk Pachauri <itskanishkp.py@gmail.com>
Date: Sun, 9 Mar 2025 23:49:39 +0530
Subject: [PATCH 3/5] docs: Add the reference of HTML spec document and W3C document

---
 Doc/library/urllib.request.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 553839dabdbd85..195a328b39481d 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -1227,10 +1227,12 @@ it receives from the HTTP server. In general, a program will decode
 the returned bytes object to string once it determines or guesses
 the appropriate encoding.
 
-The following W3C document, https://www.w3.org/International/questions/qa-html-encoding-declarations\ , lists
+The following HTML spec document, https://html.spec.whatwg.org/#charset\ , lists
 the various ways in which an HTML document could have specified its
 encoding information.
 
+For additional information, see the W3C document: https://www.w3.org/International/questions/qa-html-encoding-declarations\ .
+
 As the python.org website uses *utf-8* encoding as specified in its meta tag, we
 will use the same for decoding the bytes object::
 

From 0c4d55e39dfdb2a29ebf13ab700f35c59de89841 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?B=C3=A9n=C3=A9dikt=20Tran?= <10796600+picnixz@users.noreply.github.com>
Date: Sun, 23 Mar 2025 14:22:42 +0100
Subject: [PATCH 4/5] Apply suggestions from code review

---
 Doc/library/urllib.request.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 624a7751d8ee9e..310fb7ce9f684b 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -1229,11 +1229,11 @@ it receives from the HTTP server. In general, a program will decode
 the returned bytes object to string once it determines or guesses
 the appropriate encoding.
 
-The following HTML spec document, https://html.spec.whatwg.org/#charset\ , lists
+The following HTML spec document, https://html.spec.whatwg.org/#charset, lists
 the various ways in which an HTML document could have specified its
 encoding information.
 
-For additional information, see the W3C document: https://www.w3.org/International/questions/qa-html-encoding-declarations\ .
+For additional information, see the W3C document: https://www.w3.org/International/questions/qa-html-encoding-declarations.
 
 As the python.org website uses *utf-8* encoding as specified in its meta tag, we
 will use the same for decoding the bytes object::

From 6c60d905b60cafe6f3955513c87238aa4f8d52ea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?B=C3=A9n=C3=A9dikt=20Tran?= <10796600+picnixz@users.noreply.github.com>
Date: Sun, 23 Mar 2025 14:23:17 +0100
Subject: [PATCH 5/5] Update Doc/library/urllib.request.rst

---
 Doc/library/urllib.request.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 310fb7ce9f684b..8b54e10713e782 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -1230,7 +1230,7 @@ the returned bytes object to string once it determines or guesses
 the appropriate encoding.
 
 The following HTML spec document, https://html.spec.whatwg.org/#charset, lists
-the various ways in which an HTML document could have specified its
+the various ways in which an HTML or an XML document could have specified its
 encoding information.
 
 For additional information, see the W3C document: https://www.w3.org/International/questions/qa-html-encoding-declarations.
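
The patched text says a program decodes the returned bytes "once it determines or guesses the appropriate encoding". The sketch below shows one possible way to do that from the ``Content-Type`` header instead of hard-coding ``'utf-8'``. It is not part of this patch series. The helper names ``charset_from_content_type`` and ``decode_body`` are hypothetical, and the UTF-8 fallback is an assumption, not something the docs mandate.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or *default*.

    Hypothetical helper: it reuses the stdlib header parser rather than
    splitting the header string by hand.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


def decode_body(body, content_type, default="utf-8"):
    """Decode *body* with the declared charset, falling back to *default*."""
    charset = charset_from_content_type(content_type, default)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The header named a codec Python does not know (assumed fallback).
        return body.decode(default, errors="replace")
```

With a live response ``f`` from ``urllib.request.urlopen``, the header value is available as ``f.headers.get('Content-Type')``, and ``f.headers.get_content_charset()`` performs the same lookup directly. Neither approach inspects a charset declared in the document's own ``meta`` tag.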