8000 bpo-42923: Dump extension modules on fatal error by vstinner · Pull Request #24207 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-42923: Dump extension modules on fatal error #24207

8000
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 18, 2021
Merged

bpo-42923: Dump extension modules on fatal error #24207

merged 1 commit into from
Jan 18, 2021

Conversation

vstinner
Copy link
Member
@vstinner vstinner commented Jan 13, 2021

The Py_FatalError() function and the faulthandler module now dump the
list of extension modules on a fatal error.

  • Add _Py_DumpExtensionModules() and _PyModule_IsExtension() internal
    functions.
  • Pass stream to _Py_FatalError_DumpTracebacks() rather than
    hardcoding stderr.
  • Move faulthandler._fatal_error() to _testcapi.fatal_error().
  • Add test_capi.test_fatal_error() test.

https://bugs.python.org/issue42923

@vstinner
Copy link
Member Author

cc @methane @corona10 @pablogsal @serhiy-storchaka: This enhancement should help to explain to users that their crash may come from a third party C extension modules rather than Python (internals or stdlib).

@vstinner
Copy link
Member Author

I updated the PR to change the formatting.

I tested with a more realistic list of extensions, crash after loading pip. The extension modules list was too long when rendered with one item per line, so I wrote it on a single long with separated by commas:

Extension modules: sys, builtins, _imp, _thread, _warnings, _weakref, _io, marshal, posix, time, faulthandler, _codecs, _signal, _abc, _stat, _ctypes, _struct, itertools, _operator, _collections, _functools, _sre, _locale, _string, atexit, errno, pwd, grp, fcntl, _posixsubprocess, select, math, _datetime, zlib, _lzma, _socket, array, _pickle, _heapq, termios, _hashlib, _blake2, binascii, pyexpat, _bisect, _sha512, _opcode

@methane
Copy link
Member
methane commented Jan 14, 2021

I don't have time to precise review or testing locally. But it looks nice by quick looking.

Copy link
Member
@pablogsal pablogsal 8000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also dump the extension modules in the standard library?

@vstinner
Copy link
Member Author

Will this also dump the extension modules in the standard library?

We could ignore stdlib extension modules using an hardcoded list of all stdlib extensions names. But I don't want to maintain such list in the long term. Apart of the name, I don't know any programmatic way to detect if a module comes from the stdlib or not.

My expectation is that users will copy/paste the whole output to a bug reports, and so core developers can look for some unusual extensions.

If we get more and more bug reports containing crashes, maybe it would be nice to have a tool to process the output:

  • Add the source code from the traceback. Problem: the dump doesn't contain the Python version. Should we add it?
  • Ignore the stdlib extensions and sort the list.

Concrete example: when I looked at https://bugs.python.org/issue42891 I didn't know anything about unicorn or lsm-db. I don't think that the reporter knows the exhaustive list of all extension modules used by his code. He got a crash and considers that it must be a bug in Python. Then I discovered that lsm-db is implemented in C. And the bug comes from this extension.

A Linux kernel "oops" dump contains a flag saying if the kernel is "tainted": https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html

@pablogsal
Copy link
Member

Will this also dump the extension modules in the standard library?

We could ignore stdlib extension modules using an hardcoded list of all stdlib extensions names. But I don't want to maintain such list in the long term. Apart of the name, I don't know any programmatic way to detect if a module comes from the stdlib or not.

My expectation is that users will copy/paste the whole output to a bug reports, and so core developers can look for some unusual extensions.

If we get more and more bug reports containing crashes, maybe it would be nice to have a tool to process the output:

  • Add the source code from the traceback. Problem: the dump doesn't contain the Python version. Should we add it?
  • Ignore the stdlib extensions and sort the list.

Concrete example: when I looked at https://bugs.python.org/issue42891 I didn't know anything about unicorn or lsm-db. I don't think that the reporter knows the exhaustive list of all extension modules used by his code. He got a crash and considers that it must be a bug in Python. Then I discovered that lsm-db is implemented in C. And the bug comes from this extension.

A Linux kernel "oops" dump contains a flag saying if the kernel is "tainted": https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html

I see. Well, one of the reasons I was asking is that this list is probably going to be gigantic and a lot of stdlib extension modules will be in it almost always. Given that there is nothing that tells the user that extension modules are the culprit is not always clear what the user can do with that information.

On the other hand as a core Dev that has to debug crashes I find the information quite useful, but I would prefer to not have the built-in extension modules to reduce the noise of a list that can be veeeery long.

@vstinner
Copy link
Member Author

Since Python dicts keep the insertion order, the interesting thing is that the list is written in the import order! Last items are the most recently imported modules.

I backported locally the function to Python 3.9 to test other applications.

At Python startup, sys.modules contains 38 modules, 17 extensions:

$ env/bin/python 
Python 3.9.1+ (heads/3.9-dirty:799f8489d4, Jan 14 2021, 14:29:26) 
>
8000
>> import sys; len(sys.modules)
38
>>> import _testcapi; _testcapi.dump_ext_modules()
Extension modules: sys, builtins, _imp, _thread, _warnings, _weakref, posix, _io, marshal, time, _codecs, _signal, _abc, _stat, readline, atexit, _testcapi

After loading numpy and jupyter_client, sys.modules contains 360 modules (+322), 76 extensions (+59):

>>> import sys; len(sys.modules)
360
>>> import numpy, jupyter_client
>>> import _testcapi; _testcapi.dump_ext_modules()

Extension modules: sys, builtins, _imp, _thread, _warnings, _weakref, posix, _io, marshal, time, _codecs, _signal, _abc, _stat, readline, atexit, _testcapi, _heapq, itertools, _operator, _collections, _functools, _sre, _locale, math, _datetime, numpy.core._multiarray_umath, errno, _struct, _pickle, numpy.core._multiarray_tests, pwd, grp, _posixsubprocess, select, _ctypes, numpy.linalg.lapack_lite, numpy.linalg._umath_linalg, zlib, _lzma, _decimal, numpy.fft._pocketfft_internal, numpy.random._common, binascii, _hashlib, _blake2, _bisect, _sha512, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, _json, _socket, array, termios, zmq.backend.cython.constants, zmq.backend.cython.error, zmq.backend.cython.message, zmq.backend.cython.context, zmq.backend.cython.socket, zmq.backend.cython.utils, zmq.backend.cython._poll, zmq.backend.cython._version, zmq.backend.cython._device, zmq.backend.cython._proxy_steerable, _opcode, _string, _ssl, _contextvars, _asyncio, _uuid

Ratio of extensions / all modules:

  • Startup: 45% (17/38)
  • numpy + jupyter_client: 21% (76/360)

@vstinner
Copy link
Member Author

I see. Well, one of the reasons I was asking is that this list is probably going to be gigantic and a lot of stdlib extension modules will be in it almost always. Given that there is nothing that tells the user that extension modules are the culprit is not always clear what the user can do with that information.

If I have to debug a crash and I don't know anything about the application, for me it's relevant to know if the application imported stdlib extensions. For example, if _ctypes is loaded, maybe _ctypes was misused or was used to load "unsafe" code.

On the other hand as a core Dev that has to debug crashes I find the information quite useful, but I would prefer to not have the built-in extension modules to reduce the noise of a list that can be veeeery long.

I tried numpy+jupyter_client: I get 76 extensions. The list is long, is it "veeeery long" for you?

Extension modules: sys, builtins, _imp, _thread, _warnings, _weakref, posix, _io, marshal, time, _codecs, _signal, _abc, _stat, readline, atexit, _testcapi, _heapq, itertools, _operator, _collections, _functools, _sre, _locale, math, _datetime, numpy.core._multiarray_umath, errno, _struct, _pickle, numpy.core._multiarray_tests, pwd, grp, _posixsubprocess, select, _ctypes, numpy.linalg.lapack_lite, numpy.linalg._umath_linalg, zlib, _lzma, _decimal, numpy.fft._pocketfft_internal, numpy.random._common, binascii, _hashlib, _blake2, _bisect, _sha512, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, _json, _socket, array, termios, zmq.backend.cython.constants, zmq.backend.cython.error, zmq.backend.cython.message, zmq.backend.cython.context, zmq.backend.cython.socket, zmq.backend.cython.utils, zmq.backend.cython._poll, zmq.backend.cython._version, zmq.backend.cython._device, zmq.backend.cython._proxy_steerable, _opcode, _string, _ssl, _contextvars, _asyncio, _uuid

Or do you prefer to get the list rendered with one item per line?

  • sys
  • builtins
  • _imp
  • _thread
  • _warnings
  • _weakref
  • posix
  • _io
  • marshal
  • time
  • _codecs
  • _signal
  • _abc
  • _stat
  • readline
  • atexit
  • _testcapi
  • _heapq
  • itertools
  • _operator
  • _collections
  • _functools
  • _sre
  • _locale
  • math
  • _datetime
  • numpy.core._multiarray_umath
  • errno
  • _struct
  • _pickle
  • numpy.core._multiarray_tests
  • pwd
  • grp
  • _posixsubprocess
  • select
  • _ctypes
  • numpy.linalg.lapack_lite
  • numpy.linalg._umath_linalg
  • zlib
  • _lzma
  • _decimal
  • numpy.fft._pocketfft_internal
  • numpy.random._common
  • binascii
  • _hashlib
  • _blake2
  • _bisect
  • _sha512
  • numpy.random.bit_generator
  • numpy.random._bounded_integers
  • numpy.random._mt19937
  • numpy.random.mtrand
  • numpy.random._philox
  • numpy.random._pcg64
  • numpy.random._sfc64
  • numpy.random._generator
  • _json
  • _socket
  • array
  • termios
  • zmq.backend.cython.constants
  • zmq.backend.cython.error
  • zmq.backend.cython.message
  • zmq.backend.cython.context
  • zmq.backend.cython.socket
  • zmq.backend.cython.utils
  • zmq.backend.cython._poll
  • zmq.backend.cython._version
  • zmq.backend.cython._device
  • zmq.backend.cython._proxy_steerable
  • _opcode
  • _string
  • _ssl
  • _contextvars
  • _asyncio
  • _uuid

@vstinner
Copy link
Member Author

Another example. I tested "import cinder" (OpenStack Cinder application): 221 modules (43 extensions):

Extension modules: sys, builtins, _imp, _warnings, _weakref, posix, _io, marshal, _codecs, _signal, _abc, _stat, _locale, _heapq, itertools, _operator, _collections, _functools, _sre, readline, atexit, _testcapi, _opcode, _struct, greenlet._greenlet, _thread, errno, math, __original_module_select, time, __original_module_time, _socket, select, array, _datetime, binascii, _hashlib, _blake2, _cffi_backend, _bisect, _sha512, _string, _ssl

@pablogsal
Copy link
Member

Another example. I tested "import cinder" (OpenStack Cinder application): 221 modules (43 extensions):

Extension modules: sys, builtins, _imp, _warnings, _weakref, posix, _io, marshal, _codecs, _signal, _abc, _stat, _locale, _heapq, itertools, _operator, _collections, _functools, _sre, readline, atexit, _testca 8000 pi, _opcode, _struct, greenlet._greenlet, _thread, errno, math, __original_module_select, time, __original_module_time, _socket, select, array, _datetime, binascii, _hashlib, _blake2, _cffi_backend, _bisect, _sha512, _string, _ssl

Note that almost all of these come from the stdlib, which I would say is going to be noisy to the user.

@pablogsal
Copy link
Member

The list is long, is it "veeeery long" for you?

Is veeery long (3 'e's) 😉

@vstinner
Copy link
Member Author

Is veeery long (3 'e's) wink

I created https://bugs.python.org/issue42955 to add sys.modules_names tuple: names of stdlib modules. Once it will be merged, I will updated this PR to filter the list of modules (ignore stdlib modules).

The Py_FatalError() function and the faulthandler module now dump the
list of extension modules on a fatal error.

Add _Py_DumpExtensionModules() and _PyModule_IsExtension() internal
functions.
@vstinner
Copy link
Member Author

I merged non controversial changes to make this PR shorter. I rebased my PR on master.

@vstinner vstinner merged commit 250035d into python:master Jan 18, 2021
@vstinner vstinner deleted the dump_ext_modules branch January 18, 2021 19:47
@vstinner
Copy link
Member Author

@pablogsal: I merged a first implementation which doesn't exclude stdlib modules. I will update the once once https://bugs.python.org/issue42955 will be implemented.

adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
The Py_FatalError() function and the faulthandler module now dump the
list of extension modules on a fatal error.

Add _Py_DumpExtensionModules() and _PyModule_IsExtension() internal
functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0