Enable UTF-8 mode by default (PEP 686) · python/cpython@fc19177 · GitHub

Commit fc19177

Enable UTF-8 mode by default (PEP 686)
1 parent b2fabce commit fc19177

15 files changed

+74
-84
lines changed


Doc/c-api/init_config.rst

Lines changed: 1 addition & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -975,9 +975,7 @@ PyPreConfig
975975
Set to ``0`` or ``1`` by the :option:`-X utf8 <-X>` command line option
976976
and the :envvar:`PYTHONUTF8` environment variable.
977977
978-
Also set to ``1`` if the ``LC_CTYPE`` locale is ``C`` or ``POSIX``.
979-
980-
Default: ``-1`` in Python config and ``0`` in isolated config.
978+
Default: ``1``.
981979
982980
983981
.. _c-preinit:

Doc/library/os.rst

Lines changed: 17 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,12 @@ Python UTF-8 Mode
108108
.. versionadded:: 3.7
109109
See :pep:`540` for more details.
110110

111+
.. versionchanged:: next
112+
113+
Python UTF-8 mode is now enabled by default (:pep:`686`).
114+
It may be disabled by setting :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` as
115+
an environment variable or by using the :option:`-X utf8=0 <-X>` flag.
116+
111117
The Python UTF-8 Mode ignores the :term:`locale encoding` and forces the usage
112118
of the UTF-8 encoding:
113119

@@ -139,31 +145,22 @@ level APIs also exhibit different default behaviours:
139145
default so that attempting to open a binary file in text mode is likely
140146
to raise an exception rather than producing nonsense data.
141147

142-
The :ref:`Python UTF-8 Mode <utf8-mode>` is enabled if the LC_CTYPE locale is
143-
``C`` or ``POSIX`` at Python startup (see the :c:func:`PyConfig_Read`
144-
function).
145-
146-
It can be enabled or disabled using the :option:`-X utf8 <-X>` command line
147-
option and the :envvar:`PYTHONUTF8` environment variable.
148-
149-
If the :envvar:`PYTHONUTF8` environment variable is not set at all, then the
150-
interpreter defaults to using the current locale settings, *unless* the current
151-
locale is identified as a legacy ASCII-based locale (as described for
152-
:envvar:`PYTHONCOERCECLOCALE`), and locale coercion is either disabled or
153-
fails. In such legacy locales, the interpreter will default to enabling UTF-8
154-
mode unless explicitly instructed not to do so.
155-
156-
The Python UTF-8 Mode can only be enabled at the Python startup. Its value
148+
The :ref:`Python UTF-8 Mode <utf8-mode>` is enabled by default.
149+
It can be disabled using the :option:`-X utf8 <-X>` command line
150+
option or the :envvar:`PYTHONUTF8` environment variable.
151+
The Python UTF-8 Mode can only be disabled at Python startup. Its value
157152
can be read from :data:`sys.flags.utf8_mode <sys.flags>`.
158153

154+
If the UTF-8 mode is disabled, the interpreter defaults to using
155+
the current locale settings, *unless* the current locale is identified
156+
as a legacy ASCII-based locale (as described for :envvar:`PYTHONCOERCECLOCALE`),
157+
and locale coercion is either disabled or fails.
158+
In such legacy locales, the interpreter will default to enabling UTF-8 mode
159+
unless explicitly instructed not to do so.
160+
159161
See also the :ref:`UTF-8 mode on Windows <win-utf8-mode>`
160162
and the :term:`filesystem encoding and error handler`.
161163

162-
.. seealso::
163-
164-
:pep:`686`
165-
Python 3.15 will make :ref:`utf8-mode` default.
166-
167164

168165
.. _os-procinfo:
169166
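
The documented switches can be checked from a parent process. The following is a minimal sketch, not part of the commit: it starts a child interpreter with the `-X utf8` or `-X utf8=0` option and reads back `sys.flags.utf8_mode`. The helper name `utf8_mode_in_child` is hypothetical.

```python
import subprocess
import sys

# Child program: print the value of the UTF-8 mode flag.
CODE = "import sys; print(sys.flags.utf8_mode)"


def utf8_mode_in_child(*interpreter_args):
    """Run a child interpreter and return its sys.flags.utf8_mode as an int."""
    cmd = [sys.executable, *interpreter_args, "-c", CODE]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    return int(proc.stdout.strip())


if __name__ == "__main__":
    # The default depends on the Python version running this script.
    print("default   :", utf8_mode_in_child())
    print("-X utf8=0 :", utf8_mode_in_child("-X", "utf8=0"))
    print("-X utf8   :", utf8_mode_in_child("-X", "utf8"))
```

The value printed for the default case depends on the interpreter version, so only the two explicit cases are predictable across versions.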

Doc/library/sys.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -642,6 +642,9 @@ always available. Unless explicitly noted otherwise, all variables are read-only
642642
.. versionchanged:: 3.14
643643
Added the ``context_aware_warnings`` attribute.
644644

645+
.. versionchanged:: next
646+
UTF-8 mode (:option:`-X utf8 <-X>`) is now enabled by default.
647+
645648

646649
.. data:: float_info
647650
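
As an illustration of the flag documented in this hunk, the sketch below gathers `sys.flags.utf8_mode` together with a few startup settings that UTF-8 mode influences. The function name `describe_text_defaults` is an assumption made for this example.

```python
import sys


def describe_text_defaults():
    """Collect startup settings that UTF-8 mode influences."""
    return {
        "utf8_mode": sys.flags.utf8_mode,
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # sys.stdout may have been replaced by an object without .encoding.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_text_defaults().items():
        print(f"{name}: {value}")
```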

Doc/using/windows.rst

Lines changed: 17 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -952,6 +952,9 @@ UTF-8 mode
952952
==========
953953

954954
.. versionadded:: 3.7
955+
.. versionchanged:: next
956+
957+
Python UTF-8 mode is now enabled by default (:pep:`686`).
955958

956959
Windows still uses legacy encodings for the system encoding (the ANSI Code
957960
Page). Python uses it for the default encoding of text files (e.g.
@@ -960,20 +963,22 @@ Page). Python uses it for the default encoding of text files (e.g.
960963
This may cause issues because UTF-8 is widely used on the internet
961964
and most Unix systems, including WSL (Windows Subsystem for Linux).
962965

963-
You can use the :ref:`Python UTF-8 Mode <utf8-mode>` to change the default text
964-
encoding to UTF-8. You can enable the :ref:`Python UTF-8 Mode <utf8-mode>` via
965-
the ``-X utf8`` command line option, or the ``PYTHONUTF8=1`` environment
966-
variable. See :envvar:`PYTHONUTF8` for enabling UTF-8 mode, and
967-
:ref:`setting-envvars` for how to modify environment variables.
968-
969-
When the :ref:`Python UTF-8 Mode <utf8-mode>` is enabled, you can still use the
966+
The :ref:`Python UTF-8 Mode <utf8-mode>`, enabled by default, can help by
967+
changing the default text encoding to UTF-8.
968+
When the :ref:`UTF-8 mode <utf8-mode>` is enabled, you can still use the
970969
system encoding (the ANSI Code Page) via the "mbcs" codec.
971970

972-
Note that adding ``PYTHONUTF8=1`` to the default environment variables
973-
will affect all Python 3.7+ applications on your system.
974-
If you have any Python 3.7+ applications which rely on the legacy
975-
system encoding, it is recommended to set the environment variable
976-
temporarily or use the ``-X utf8`` command line option.
971+
You can disable the :ref:`Python UTF-8 Mode <utf8-mode>` via
972+
the ``-X utf8=0`` command line option, or the ``PYTHONUTF8=0`` environment
973+
variable. See :envvar:`PYTHONUTF8` for disabling UTF-8 mode, and
974+
:ref:`setting-envvars` for how to modify environment variables.
975+
976+
.. hint::
977+
Adding ``PYTHONUTF8={0,1}`` to the default environment variables
978+
will affect all Python 3.7+ applications on your system.
979+
If you have any Python 3.7+ applications which rely on the legacy
980+
system encoding, it is recommended to set the environment variable
981+
temporarily or use the ``-X utf8`` command line option.
977982

978983
.. note::
979984
Even when UTF-8 mode is disabled, Python uses UTF-8 by default
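
Code that has to behave the same whether UTF-8 mode is enabled or disabled can pass an explicit encoding instead of relying on the default. The sketch below is an illustration under that assumption; `roundtrip_utf8` is a hypothetical helper, not something this commit adds.

```python
import os
import tempfile

TEXT = "e:\xe9, euro:\u20ac"


def roundtrip_utf8(text):
    """Write text to a temporary file and read it back, always as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # An explicit encoding makes the result independent of UTF-8 mode
        # and of the ANSI code page on Windows.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(roundtrip_utf8(TEXT) == TEXT)
```

On Windows, code that needs the legacy system encoding can name the "mbcs" codec explicitly in the same way, as the documentation above notes.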

Doc/whatsnew/3.15.rst

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,10 @@ New features
7575
Other language changes
7676
======================
7777

78+
* Python UTF-8 mode is now enabled by default.
79+
It may be disabled by setting :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` as
80+
an environment variable or by using the :option:`-X utf8=0 <-X>` flag.
81+
See :pep:`686` for further details.
7882

7983

8084
New modules

Include/cpython/initconfig.h

Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -102,15 +102,14 @@ typedef struct PyPreConfig {
102102

103103
/* Enable UTF-8 mode? (PEP 540)
104104
105-
Disabled by default (equals to 0).
105+
If equal to 1, use the UTF-8 encoding and use "surrogateescape" for the
106+
stdin & stdout error handlers.
106107
107-
Set to 1 by "-X utf8" and "-X utf8=1" command line options.
108-
Set to 1 by PYTHONUTF8=1 environment variable.
108+
Enabled by default (equal to 1; PEP 686), or if Py_UTF8Mode=1,
109+
or if "-X utf8=1" or PYTHONUTF8=1.
109110
110-
Set to 0 by "-X utf8=0" and PYTHONUTF8=0.
111-
112-
If equals to -1, it is set to 1 if the LC_CTYPE locale is "C" or
113-
"POSIX", otherwise it is set to 0. Inherit Py_UTF8Mode value value. */
111+
Set to 0 by "-X utf8=0" or PYTHONUTF8=0.
112+
*/
114113
int utf8_mode;
115114

116115
/* If non-zero, enable the Python Development Mode.

Lib/locale.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -651,7 +651,8 @@ def getpreferredencoding(do_setlocale=True):
651651
if sys.flags.warn_default_encoding:
652652
import warnings
653653
warnings.warn(
654-
"UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead.",
654+
"UTF-8 Mode affects locale.getpreferredencoding(). "
655+
"Consider locale.getencoding() instead.",
655656
EncodingWarning, 2)
656657
if sys.flags.utf8_mode:
657658
return 'utf-8'
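
The warning text in this hunk recommends `locale.getencoding()`. A rough sketch of the difference follows. It assumes `locale.getencoding()` may be missing on interpreters older than Python 3.11 and falls back accordingly; the helper names are hypothetical.

```python
import locale
import sys


def locale_encoding():
    """Return the locale encoding, unaffected by UTF-8 mode where possible."""
    getencoding = getattr(locale, "getencoding", None)  # Python 3.11+
    if getencoding is not None:
        return getencoding()
    # Fallback for older interpreters: this value IS affected by UTF-8 mode.
    return locale.getpreferredencoding(False)


def preferred_encoding():
    """Return the preferred encoding; this is 'utf-8' when UTF-8 mode is on."""
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("utf8_mode          :", sys.flags.utf8_mode)
    print("locale encoding    :", locale_encoding())
    print("preferred encoding :", preferred_encoding())
```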

Lib/subprocess.py

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -380,8 +380,7 @@ def _text_encoding():
380380

381381
if sys.flags.utf8_mode:
382382
return "utf-8"
383-
else:
384-
return locale.getencoding()
383+
return locale.getencoding()
385384

386385

387386
def call(*popenargs, timeout=None, **kwargs):
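
`_text_encoding()` only supplies a default. Callers that know what the child process writes can name the encoding themselves. The sketch below, with the hypothetical helper `run_child_utf8`, forces UTF-8 in the child with `-X utf8` and decodes its output with a matching explicit encoding.

```python
import subprocess
import sys

# The escapes are kept as ASCII text and are interpreted by the child.
CODE = r"print('e:\xe9, euro:\u20ac')"


def run_child_utf8():
    """Run a child in UTF-8 mode and decode its stdout explicitly as UTF-8."""
    cmd = [sys.executable, "-X", "utf8", "-c", CODE]
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, encoding="utf-8", check=True
    )
    return proc.stdout.strip()


if __name__ == "__main__":
    print(run_child_utf8() == "e:\xe9, euro:\u20ac")
```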

Lib/test/test_cmd_line.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -299,6 +299,10 @@ def run_utf8_mode(arg):
299299
cmd = [sys.executable, '-X', 'utf8', '-c', code, arg]
300300
return subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
301301

302+
def run_no_utf8_mode(arg):
303+
cmd = [sys.executable, '-X', 'utf8=0', '-c', code, arg]
304+
return subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
305+
302306
valid_utf8 = 'e:\xe9, euro:\u20ac, non-bmp:\U0010ffff'.encode('utf-8')
303307
# invalid UTF-8 byte sequences with a valid UTF-8 sequence
304308
# in the middle.
@@ -311,7 +315,7 @@ def run_utf8_mode(arg):
311315
)
312316
test_args = [valid_utf8, invalid_utf8]
313317

314-
for run_cmd in (run_default, run_c_locale, run_utf8_mode):
318+
for run_cmd in (run_default, run_c_locale, run_utf8_mode, run_no_utf8_mode):
315319
with self.subTest(run_cmd=run_cmd):
316320
for arg in test_args:
317321
proc = run_cmd(arg)

Lib/test/test_embed.py

Lines changed: 2 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -543,7 +543,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
543543
'configure_locale': True,
544544
'coerce_c_locale': False,
545545
'coerce_c_locale_warn': False,
546-
'utf8_mode': False,
546+
'utf8_mode': True,
547547
}
548548
if MS_WINDOWS:
549549
PRE_CONFIG_COMPAT.update({
@@ -560,7 +560,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
560560
configure_locale=False,
561561
isolated=True,
562562
use_environment=False,
563-
utf8_mode=False,
563+
utf8_mode=True,
564564
dev_mode=False,
565565
coerce_c_locale=False,
566566
)
@@ -805,12 +805,6 @@ def get_expected_config(self, expected_preconfig, expected,
805805
'stdio_encoding', 'stdio_errors'):
806806
expected[key] = self.IGNORE_CONFIG
807807

808-
if not expected_preconfig['configure_locale']:
809-
# UTF-8 Mode depends on the locale. There is no easy way
810-
# to guess if UTF-8 Mode will be enabled or not if the locale
811-
# is not configured.
812-
expected_preconfig['utf8_mode'] = self.IGNORE_CONFIG
813-
814808
if expected_preconfig['utf8_mode'] == 1:
815809
if expected['filesystem_encoding'] is self.GET_DEFAULT_CONFIG:
816810
expected['filesystem_encoding'] = 'utf-8'

Lib/test/test_utf8_mode.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -89,8 +89,8 @@ def test_env_var(self):
8989
# the UTF-8 mode
9090
if not self.posix_locale():
9191
# PYTHONUTF8 should be ignored if -E is used
92-
out = self.get_output('-E', '-c', code, PYTHONUTF8='1')
93-
self.assertEqual(out, '0')
92+
out = self.get_output('-E', '-c', code, PYTHONUTF8='0')
93+
self.assertEqual(out, '1')
9494

9595
# invalid mode
9696
out = self.get_output('-c', code, PYTHONUTF8='xxx', failure=True)
@@ -116,7 +116,7 @@ def test_filesystemencoding(self):
116116
# PYTHONLEGACYWINDOWSFSENCODING disables the UTF-8 mode
117117
# and has the priority over -X utf8 and PYTHONUTF8
118118
out = self.get_output('-X', 'utf8', '-c', code,
119-
PYTHONUTF8='strict',
119+
PYTHONUTF8='xxx',
120120
PYTHONLEGACYWINDOWSFSENCODING='1')
121121
self.assertEqual(out, 'mbcs/replace')
122122
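
The behaviour these tests exercise can be reproduced outside the test suite. The following is a sketch only: `run_with_pythonutf8` is a hypothetical helper that sets `PYTHONUTF8` for a child interpreter and returns the completed process, so that valid values, an invalid value, and the effect of `-E` can be compared.

```python
import os
import subprocess
import sys

CODE = "import sys; print(sys.flags.utf8_mode)"


def run_with_pythonutf8(value, *interpreter_args):
    """Run a child interpreter with PYTHONUTF8 set to the given value."""
    env = dict(os.environ, PYTHONUTF8=value)
    cmd = [sys.executable, *interpreter_args, "-c", CODE]
    return subprocess.run(
        cmd, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )


if __name__ == "__main__":
    print("PYTHONUTF8=0   ->", run_with_pythonutf8("0").stdout.strip())
    print("PYTHONUTF8=1   ->", run_with_pythonutf8("1").stdout.strip())
    # An invalid value makes interpreter startup fail.
    print("PYTHONUTF8=xxx -> exit code", run_with_pythonutf8("xxx").returncode)
    # With -E the variable is ignored, so the interpreter default applies.
    print("-E, PYTHONUTF8=0 ->", run_with_pythonutf8("0", "-E").stdout.strip())
```

The last line prints the interpreter default, which this commit changes from 0 to 1 outside POSIX locales.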

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
Implement :pep:`686`: Enable :ref:`Python UTF-8 Mode <utf8-mode>` by
2+
default. Patch by Adam Turner.

Programs/_testembed.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1882,9 +1882,9 @@ static int test_initconfig_get_api(void)
18821882
assert(initconfig_getint(config, "dev_mode") == 1);
18831883

18841884
// test PyInitConfig_GetInt() on a PyPreConfig option
1885-
assert(initconfig_getint(config, "utf8_mode") == 0);
1886-
assert(PyInitConfig_SetInt(config, "utf8_mode", 1) == 0);
18871885
assert(initconfig_getint(config, "utf8_mode") == 1);
1886+
assert(PyInitConfig_SetInt(config, "utf8_mode", 0) == 0);
1887+
assert(initconfig_getint(config, "utf8_mode") == 0);
18881888

18891889
// test PyInitConfig_GetStr()
18901890
char *str;

Python/initconfig.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -459,7 +459,7 @@ static const char usage_envvars[] =
459459

460460
/* --- Global configuration variables ----------------------------- */
461461

462-
/* UTF-8 mode (PEP 540): if equals to 1, use the UTF-8 encoding, and change
462+
/* UTF-8 mode (PEP 540): if equal to 1, use the UTF-8 encoding, and change
463463
stdin and stdout error handler to "surrogateescape". */
464464
int Py_UTF8Mode = 0;
465465
int Py_DebugFlag = 0; /* Needed by parser.c */

Python/preconfig.c

Lines changed: 8 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -291,12 +291,12 @@ _PyPreConfig_InitCompatConfig(PyPreConfig *config)
291291
config->use_environment = -1;
292292
config->configure_locale = 1;
293293

294-
/* bpo-36443: C locale coercion (PEP 538) and UTF-8 Mode (PEP 540)
295-
are disabled by default using the Compat configuration.
294+
/* gh-80624: C locale coercion (PEP 538) is disabled by default using
295+
the Compat configuration.
296296
297-
Py_UTF8Mode=1 enables the UTF-8 mode. PYTHONUTF8 environment variable
297+
Py_UTF8Mode=0 disables the UTF-8 mode. PYTHONUTF8 environment variable
298298
is ignored (even if use_environment=1). */
299-
config->utf8_mode = 0;
299+
config->utf8_mode = 1;
300300
config->coerce_c_locale = 0;
301301
config->coerce_c_locale_warn = 0;
302302

@@ -317,8 +317,8 @@ PyPreConfig_InitPythonConfig(PyPreConfig *config)
317317
config->isolated = 0;
318318
config->parse_argv = 1;
319319
config->use_environment = 1;
320-
/* Set to -1 to enable C locale coercion (PEP 538) and UTF-8 Mode (PEP 540)
321-
depending on the LC_CTYPE locale, PYTHONUTF8 and PYTHONCOERCECLOCALE
320+
/* Set to -1 to enable C locale coercion (PEP 538) depending on
321+
the LC_CTYPE locale, PYTHONUTF8 and PYTHONCOERCECLOCALE
322322
environment variables. */
323323
config->coerce_c_locale = -1;
324324
config->coerce_c_locale_warn = -1;
@@ -338,7 +338,7 @@ PyPreConfig_InitIsolatedConfig(PyPreConfig *config)
338338
config->configure_locale = 0;
339339
config->isolated = 1;
340340
config->use_environment = 0;
341-
config->utf8_mode = 0;
341+
config->utf8_mode = 1;
342342
config->dev_mode = 0;
343343
#ifdef MS_WINDOWS
344344
config->legacy_windows_fs_encoding = 0;
@@ -649,23 +649,7 @@ preconfig_init_utf8_mode(PyPreConfig *config, const _PyPreCmdline *cmdline)
649649
return _PyStatus_OK();
650650
}
651651

652-
653-
#ifndef MS_WINDOWS
654-
if (config->utf8_mode < 0) {
655-
/* The C locale and the POSIX locale enable the UTF-8 Mode (PEP 540) */
656-
const char *ctype_loc = setlocale(LC_CTYPE, NULL);
657-
if (ctype_loc != NULL
658-
&& (strcmp(ctype_loc, "C") == 0
659-
|| strcmp(ctype_loc, "POSIX") == 0))
660-
{
661-
config->utf8_mode = 1;
662-
}
663-
}
664-
#endif
665-
666-
if (config->utf8_mode < 0) {
667-
config->utf8_mode = 0;
668-
}
652+
config->utf8_mode = 1;
669653
return _PyStatus_OK();
670654
}
671655
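
A hedged Python model of the selection order that `preconfig_init_utf8_mode` appears to follow after this change. Only the final fallback (`config->utf8_mode = 1`) is visible in the hunk. The earlier steps (the legacy Windows filesystem encoding override, a preset non-negative value, the `-X utf8` option, then `PYTHONUTF8`) are assumptions drawn from the comments, documentation and tests elsewhere in this commit. `resolve_utf8_mode` and its parameters are hypothetical names.

```python
def resolve_utf8_mode(preset=-1, x_utf8=None, env_utf8=None,
                      use_environment=True, legacy_windows_fs_encoding=False):
    """Simplified, hypothetical model of how utf8_mode is resolved.

    preset:   value already stored in the pre-configuration (-1 means unset).
    x_utf8:   None if "-X utf8" is absent, "" for a bare "-X utf8",
              otherwise the text after "utf8=".
    env_utf8: value of PYTHONUTF8, or None if it is not set.
    """
    # Assumption: the legacy Windows filesystem encoding has top priority.
    if legacy_windows_fs_encoding:
        return 0
    # Assumption: an explicitly preset value (0 or 1) is kept as-is.
    if preset >= 0:
        return preset
    # Command line option comes before the environment variable.
    if x_utf8 is not None:
        if x_utf8 in ("", "1"):
            return 1
        if x_utf8 == "0":
            return 0
        raise ValueError("invalid -X utf8 option value")
    # Assumption: an empty PYTHONUTF8 value is treated as unset.
    if use_environment and env_utf8:
        if env_utf8 == "1":
            return 1
        if env_utf8 == "0":
            return 0
        raise ValueError("invalid PYTHONUTF8 environment variable value")
    # New fallback from this commit: enabled, regardless of LC_CTYPE.
    return 1


if __name__ == "__main__":
    print(resolve_utf8_mode())                            # default
    print(resolve_utf8_mode(env_utf8="0"))                # PYTHONUTF8=0
    print(resolve_utf8_mode(x_utf8="", env_utf8="0"))     # -X utf8 wins
```

The model drops the old locale check entirely, which mirrors the removed `#ifndef MS_WINDOWS` block: the `C` and `POSIX` locales no longer need special handling once the fallback is 1.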
