10000 GH-94526: Force utf8 encoding in _bootstrap_python by kumaraditya303 · Pull Request #96889 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

GH-94526: Force utf8 encoding in _bootstrap_python #96889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
force utf8
  • Loading branch information
kumaraditya303 authored Sep 17, 2022
commit 35b374ac9c27d016bb92f6f00f45d6da7972fbbd
9 changes: 9 additions & 0 deletions Programs/_bootstrap_python.c
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,15 @@ main(int argc, char **argv)
PyStatus status;

PyConfig config;
PyPreConfig preconfig;
PyPreConfig_InitIsolatedConfig(&preconfig);
// Force utf8 encoding
preconfig.utf8_mode = 1;
status = Py_PreInitialize(&preconfig);
if (PyStatus_Exception(status)) {
Py_ExitStatusException(status);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr this PR initializes the preconfig differently, but that might be okay (or even what we actually want).

FWIW, there is a subtle difference here.

analysis:

before:

  1. initialize the config (PyConfig)
  2. set argv on the config
  3. call PyConfig_Read()
    a. call _Py_PreInitializeFromConfig()
    1. call _PyPreConfig_InitFromConfig() <-- initialize the preconfig
      a. call PyPreConfig_InitIsolatedConfig()
      b. call _PyPreConfig_GetConfig() -- copies from config:
      • parse_argv = 1 (set on the original line 71)
      • isolated = 0 (set on the original line 72)
      • use_environment = 0 (set in PyConfig_InitIsolatedConfig())
      • dev_mode = 0 (set in PyConfig_InitIsolatedConfig())
    2. call _Py_PreInitializeFromPyArgv() <-- use the preconfig
      a. call _PyPreConfig_Read() <-- uses argv from config
      b. call _PyPreConfig_Write() <-- updates the runtime state
  4. call Py_InitializeFromConfig()

after:

  1. initialize the preconfig (PyPreConfig)
    a. call PyPreConfig_InitIsolatedConfig() <-- initialize the preconfig
    • parse_argv = 0 (set in _PyPreConfig_InitCompatConfig())
    • isolated = 1
    • use_environment = 0
    • dev_mode = 0
      b. set utf8_mode
  2. call Py_PreInitialize()
    1. call _Py_PreInitializeFromPyArgv() <-- use the preconfig
      a. call _PyPreConfig_Read() <-- argv not used
      b. call _PyPreConfig_Write() <-- updates the runtime state
  3. initialize the config (PyConfig)
  4. set argv on the config
  5. call PyConfig_Read()
    a. call _Py_PreInitializeFromConfig()
    1. skip (runtime->preinitialized is true)
  6. call Py_InitializeFromConfig()

The key difference is how the preconfig is initialized:

  • before: with _PyPreConfig_InitFromConfig() (via PyConfig_Read()) using the already initialized config
  • after: with PyPreConfig_InitIsolatedConfig()

(When the preconfig is initialized is also different, but that doesn't really affect the outcome.)

Consequently, preconfig.parse_argv and preconfig.isolated are different (which slightly changes how the runtime is initialized) and _PyPreCmdline_SetArgv() is never called (in _PyPreConfig_Read()).

If we wanted the how to be strictly equivalent, we'd initialize the preconfig after the config:

    PyConfig config;
    PyConfig_InitIsolatedConfig(&config);
    // don't warn, pybuilddir.txt does not exist yet
    config.pathconfig_warnings = 0;
    // parse arguments
    config.parse_argv = 1;
    // add current script dir to sys.path
    config.isolated = 0;
    config.safe_path = 0;
#ifdef MS_WINDOWS
    status = PyConfig_SetArgv(&config, argc, argv);
#else
    status = PyConfig_SetBytesArgv(&config, argc, argv);
#endif
    if (PyStatus_Exception(status)) {
        goto error;
    }

    PyPreConfig preconfig;
    _PyPreConfig_InitFromConfig(&preconfig, &config);
    // Force utf8 encoding
    preconfig.utf8_mode = 1;
    _PyArgv config_args = {
        .use_bytes_argv = 0,
        .argc = config->argv.length,
        .wchar_argv = config->argv.items};
    status = _Py_PreInitializeFromPyArgv(&preconfig, &config_args);
    if (PyStatus_Exception(status)) {
        Py_ExitStatusException(status);
    }

    status = PyConfig_Read(&config);

All that said, it isn't clear that difference in runtime init is bad. In fact, it might be what we want for _bootstrap_python. The difference might also not matter for _bootstrap_python, especially for how we're parsing argv. Regardless, my point is that it isn't clear that these details were considered. It would be worth making sure that the resulting runtime state is something we're okay with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW for what _bootstrap_python is used, which is just to run one or two scripts this seems sufficient. It isn't something user facing or anything and it is "internal". I am hoping @vstinner can take a look as he authored the PyConfig PEP https://peps.python.org/pep-0587/ and he is most familiar with these APIs.

I did this PR mostly by reading the source code and in particular this example from the PEP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must not and cannot pre-initialize (PyPreConfig) Python once it's already initialized (PyConfig).

PyConfig_Read() does not pre-initialize Python if it's already pre-initialized. (If it's the case, it's would a bug.)

PyStatus
_Py_PreInitializeFromConfig(const PyConfig *config,
                            const _PyArgv *args)
{
    ...
    if (runtime->preinitialized) {
        /* Already initialized: do nothing */
        return _PyStatus_OK();
    }
    ...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn 7333 more.

Calling PyConfig_SetArgv() first to then call _Py_PreInitializeFromPyArgv() doesn't work, since PyConfig_SetArgv() does pre-initialize Python :-)

https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.PyConfig_SetArgv


PyConfig_InitIsolatedConfig(&config);
// don't warn, pybuilddir.txt does not exist yet
config.pathconfig_warnings = 0;
Expand Down
0