gh-95913: Edit Faster CPython section in 3.11 WhatsNew #98429
|
@@ -1162,14 +1162,17 @@ Optimizations | |
Faster CPython | ||
============== | ||
|
||
CPython 3.11 is on average `25% faster <https://github.com/faster-cpython/ideas#published-results>`_ | ||
than CPython 3.10 when measured with the | ||
CPython 3.11 is an average of | ||
`25% faster <https://github.com/faster-cpython/ideas#published-results>`_ | ||
than CPython 3.10 as measured with the | ||
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite, | ||
and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup | ||
could be up to 10-60% faster. | ||
when compiled with GCC on Ubuntu Linux. | ||
Depending on your workload, the overall speedup could likely be 10-60%. | ||
|
||
This project focuses on two major areas in Python: faster startup and faster | ||
runtime. Other optimizations not under this project are listed in `Optimizations`_. | ||
This project focuses on two major areas in Python: | ||
:ref:`whatsnew311-faster-startup` and :ref:`whatsnew311-faster-runtime`. | ||
Optimizations not covered by this project are listed separately under | ||
:ref:`whatsnew311-optimizations`. | ||
|
||
|
||
.. _whatsnew311-faster-startup: | ||
|
@@ -1182,8 +1185,8 @@ Faster Startup | |
Frozen imports / Static code objects | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to | ||
speed up module loading. | ||
Python caches :term:`bytecode` in the :ref:`__pycache__ <tut-pycache>` | ||
directory to speed up module loading. | ||
|
||
Previously in 3.10, Python module execution looked like this: | ||
|
||
|
@@ -1192,8 +1195,9 @@ Previously in 3.10, Python module execution looked like this: | |
Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate | ||
|
||
In Python 3.11, the core modules essential for Python startup are "frozen". | ||
This means that their code objects (and bytecode) are statically allocated | ||
by the interpreter. This reduces the steps in module execution process to this: | ||
This means that their :ref:`codeobjects` (and bytecode) | ||
are statically allocated by the interpreter. | ||
This reduces the steps in the module execution process to: | ||
|
||
.. code-block:: text | ||
|
||
|
@@ -1202,7 +1206,7 @@ by the interpreter. This reduces the steps in module execution process to this: | |
Interpreter startup is now 10-15% faster in Python 3.11. This has a big | ||
impact for short-running programs using Python. | ||
|
||
(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.) | ||
(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in many issues.) | ||
|
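A quick way to see the effect from a 3.11 interpreter is to check how a core startup module was loaded. This is a minimal sketch only: the exact set of frozen modules is a CPython implementation detail, and source builds may disable frozen modules by default.

.. code-block:: python

   # Minimal sketch: check whether a core startup module came from the
   # frozen, statically allocated code objects rather than from disk.
   import os

   print(os.__spec__.origin)    # typically 'frozen' on an installed CPython 3.11,
                                # a filesystem path on CPython 3.10
   print(type(os.__loader__))   # typically the FrozenImporter on 3.11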
||
|
||
.. _whatsnew311-faster-runtime: | ||
|
@@ -1215,17 +1219,19 @@ Faster Runtime | |
Cheaper, lazy Python frames | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Python frames are created whenever Python calls a Python function. This frame | ||
holds execution information. The following are new frame optimizations: | ||
Python frames, holding execution information, | ||
are created whenever Python calls a Python function. | ||
The following are new frame optimizations: | ||
|
||
- Streamlined the frame creation process. | ||
- Avoided memory allocation by generously re-using frame space on the C stack. | ||
- Streamlined the internal frame struct to contain only essential information. | ||
Frames previously held extra debugging and memory management information. | ||
|
||
Old-style frame objects are now created only when requested by debuggers or | ||
by Python introspection functions such as ``sys._getframe`` or | ||
``inspect.currentframe``. For most user code, no frame objects are | ||
Old-style :ref:`frame objects <frame-objects>` | ||
are now created only when requested by debuggers | ||
or by Python introspection functions such as :func:`sys._getframe` and | ||
:func:`inspect.currentframe`. For most user code, no frame objects are | ||
created at all. As a result, nearly all Python function calls have sped | ||
up significantly. We measured a 3-7% speedup in pyperformance. | ||
|
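To illustrate the "only when requested" behaviour, here is a minimal sketch (not part of the patch itself): ordinary calls run without materializing a frame object, while the introspection functions named above force one to be created for the current call.

.. code-block:: python

   # Minimal sketch: frame objects are only materialized on request.
   import inspect
   import sys

   def fast_path(x):
       # A plain call like this no longer needs a full frame object.
       return x * 2

   def introspective():
       # Asking for the frame makes CPython create the old-style frame
       # object for this call on demand.
       frame = sys._getframe()          # or inspect.currentframe()
       return frame.f_code.co_name

   print(fast_path(21))        # 42
   print(introspective())      # introspective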
||
|
@@ -1246,10 +1252,11 @@ In 3.11, when CPython detects Python code calling another Python function, | |
it sets up a new frame, and "jumps" to the new code inside the new frame. This | ||
avoids calling the C interpreting function altogether. | ||
|
||
Most Python function calls now consume no C stack space. This speeds up | ||
most of such calls. In simple recursive functions like fibonacci or | ||
factorial, a 1.7x speedup was observed. This also means recursive functions | ||
can recurse significantly deeper (if the user increases the recursion limit). | ||
Most Python function calls now consume no C stack space, speeding them up. | ||
In simple recursive functions like fibonacci or | ||
factorial, we observed a 1.7x speedup. This also means recursive functions | ||
can recurse significantly deeper | ||
(if the user increases the recursion limit with :func:`sys.setrecursionlimit`). | ||
We measured a 1-3% improvement in pyperformance. | ||
|
||
(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.) | ||
|
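A small sketch of the deeper-recursion point above; the reachable depth still varies by platform, and the recursion-limit value used here is illustrative only.

.. code-block:: python

   # Minimal sketch: with inlined Python-to-Python calls, pure-Python
   # recursion is bounded mainly by the recursion limit, not the C stack.
   import sys

   def fib(n):
       return n if n < 2 else fib(n - 1) + fib(n - 2)

   def countdown(n):
       return 0 if n == 0 else countdown(n - 1)

   print(fib(25))                 # 75025

   sys.setrecursionlimit(50_000)  # raise the Python-level limit (illustrative value)
   print(countdown(40_000))       # depths like this often overflowed the C stack on 3.10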
@@ -1260,7 +1267,7 @@ We measured a 1-3% improvement in pyperformance. | |
PEP 659: Specializing Adaptive Interpreter | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
:pep:`659` is one of the key parts of the faster CPython project. The general | ||
:pep:`659` is one of the key parts of the Faster CPython project. The general | ||
idea is that while Python is a dynamic language, most code has regions where | ||
objects and types rarely change. This concept is known as *type stability*. | ||
|
||
|
@@ -1269,17 +1276,18 @@ in the executing code. Python will then replace the current operation with a | |
more specialized one. This specialized operation uses fast paths available only | ||
to those use cases/types, which generally outperform their generic | ||
counterparts. This also brings in another concept called *inline caching*, where | ||
Python caches the results of expensive operations directly in the bytecode. | ||
Python caches the results of expensive operations directly in the | ||
:term:`bytecode`. | ||
|
||
The specializer will also combine certain common instruction pairs into one | ||
superinstruction. This reduces the overhead during execution. | ||
superinstruction, reducing the overhead during execution. | ||
|
||
Python will only specialize | ||
when it sees code that is "hot" (executed multiple times). This prevents Python | ||
from wasting time for run-once code. Python can also de-specialize when code is | ||
from wasting time on run-once code. Python can also de-specialize when code is | ||
too dynamic or when the use changes. Specialization is attempted periodically, | ||
and specialization attempts are not too expensive. This allows specialization | ||
to adapt to new circumstances. | ||
and specialization attempts are not too expensive, | ||
allowing it to adapt to new circumstances. | ||
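You can watch the specializer at work with the :mod:`dis` module, which gained ``adaptive`` and ``show_caches`` arguments in 3.11. This is a minimal sketch; the specialized instruction names (such as ``BINARY_OP_ADD_INT``) are implementation details and may differ between builds and versions.

.. code-block:: python

   # Minimal sketch: observe the adaptive interpreter specializing "hot" code.
   import dis

   def add(a, b):
       return a + b

   # Warm the function up so CPython treats it as hot and specializes it.
   for _ in range(1000):
       add(1, 2)

   # With adaptive=True, a specialized form such as BINARY_OP_ADD_INT may
   # be shown in place of the generic BINARY_OP instruction.
   dis.dis(add, adaptive=True)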
|
||
|
||
(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler. | ||
See :pep:`659` for more information. Implementation by Mark Shannon and Brandt | ||
|
@@ -1292,32 +1300,32 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.) | |
| Operation | Form | Specialization | Operation speedup | Contributor(s) | | ||
| | | | (up to) | | | ||
+===============+====================+=======================================================+===================+===================+ | ||
| Binary | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, | | ||
| operations | | such as ``int``, ``float``, and ``str`` take custom | | Dong-hee Na, | | ||
| | | fast paths for their underlying types. | | Brandt Bucher, | | ||
| Binary | ``x + x`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, | | ||
| operations | | such as :class:`int`, :class:`float` and :class:`str` | | Dong-hee Na, | | ||
| | ``x - x`` | take custom fast paths for their underlying types. | | Brandt Bucher, | | ||
| | | | | Dennis Sweeney | | ||
| | ``x * x`` | | | | | ||
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | ||
| Subscript | ``a[i]`` | Subscripting container types such as ``list``, | 10-25% | Irit Katriel, | | ||
| | | ``tuple`` and ``dict`` directly index the underlying | | Mark Shannon | | ||
| | | data structures. | | | | ||
| Subscript | ``a[i]`` | Subscripting container types such as :class:`list`, | 10-25% | Irit Katriel, | | ||
| | | :class:`tuple` and :class:`dict` directly index | | Mark Shannon | | ||
| | | the underlying data structures. | | | | ||
| | | | | | | ||
| | | Subscripting custom ``__getitem__`` | | | | ||
| | | Subscripting custom :meth:`~object.__getitem__` | | | | ||
| | | is also inlined similar to :ref:`inline-calls`. | | | | ||
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | ||
| Store | ``a[i] = z`` | Similar to subscripting specialization above. | 10-25% | Dennis Sweeney | | ||
| subscript | | | | | | ||
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | ||
| Calls | ``f(arg)`` | Calls to common builtin (C) functions and types such | 20% | Mark Shannon, | | ||
| | ``C(arg)`` | as ``len`` and ``str`` directly call their underlying | | Ken Jin | | ||
| | | C version. This avoids going through the internal | | | | ||
| | | calling convention. | | | | ||
| | | | | | | ||
| | | as :func:`len` and :class:`str` directly call their | | Ken Jin | | ||
| | ``C(arg)`` | underlying C version. This avoids going through the | | | | ||
| | | internal calling convention. | | | | ||
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | ||
| Load | ``print`` | The object's index in the globals/builtins namespace | [1]_ | Mark Shannon | | ||
| global | ``len`` | is cached. Loading globals and builtins require | | | | ||
| variable | | zero namespace lookups. | | | | ||
| Load | ``print()`` | The object's index in the globals/builtins namespace | [#load-global]_ | Mark Shannon | | ||
| global | | is cached. Loading globals and builtins require | | | | ||
| variable | ``len()`` | zero namespace lookups. | | | | ||
|
||
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | ||
| Load | ``o.attr`` | Similar to loading global variables. The attribute's | [2]_ | Mark Shannon | | ||
| Load | ``o.attr`` | Similar to loading global variables. The attribute's | [#load-attr]_ | Mark Shannon | | ||
| attribute | | index inside the class/object's namespace is cached. | | | | ||
| | | In most cases, attribute loading will require zero | | | | ||
| | | namespace lookups. | | | | ||
|
@@ -1329,14 +1337,15 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.) | |
| Store | ``o.attr = z`` | Similar to load attribute optimization. | 2% | Mark Shannon | | ||
| attribute | | | in pyperformance | | | ||
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | ||
| Unpack | ``*seq`` | Specialized for common containers such as ``list`` | 8% | Brandt Bucher | | ||
| Sequence | | and ``tuple``. Avoids internal calling convention. | | | | ||
| Unpack | ``*seq`` | Specialized for common containers such as | 8% | Brandt Bucher | | ||
| Sequence | | :class:`list` and :class:`tuple`. | | | | ||
| | | Avoids internal calling convention. | | | | ||
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | ||
|
||
.. [1] A similar optimization already existed since Python 3.8. 3.11 | ||
specializes for more forms and reduces some overhead. | ||
.. [#load-global] A similar optimization already existed since Python 3.8. | ||
3.11 specializes for more forms and reduces some overhead. | ||
|
||
.. [2] A similar optimization already existed since Python 3.10. | ||
.. [#load-attr] A similar optimization already existed since Python 3.10. | ||
3.11 specializes for more forms. Furthermore, all attribute loads should | ||
be sped up by :issue:`45947`. | ||
|
||
|
@@ -1346,8 +1355,8 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.) | |
Misc | ||
---- | ||
|
||
* Objects now require less memory due to lazily created object namespaces. Their | ||
namespace dictionaries now also share keys more freely. | ||
* Objects now require less memory due to lazily created object namespaces. | ||
Their namespace dictionaries now also share keys more freely. | ||
(Contributed Mark Shannon in :issue:`45340` and :issue:`40116`.) | ||
|
||
* A more concise representation of exceptions in the interpreter reduced the | ||
|
@@ -1360,35 +1369,47 @@ Misc | |
FAQ | ||
--- | ||
|
||
| Q: How should I write my code to utilize these speedups? | ||
| | ||
| A: You don't have to change your code. Write Pythonic code that follows common | ||
best practices. The Faster CPython project optimizes for common code | ||
patterns we observe. | ||
| | ||
| | ||
| Q: Will CPython 3.11 use more memory? | ||
| | ||
| A: Maybe not. We don't expect memory use to exceed 20% more than 3.10. | ||
This is offset by memory optimizations for frame objects and object | ||
dictionaries as mentioned above. | ||
| | ||
| | ||
| Q: I don't see any speedups in my workload. Why? | ||
| | ||
| A: Certain code won't have noticeable benefits. If your code spends most of | ||
its time on I/O operations, or already does most of its | ||
computation in a C extension library like numpy, there won't be significant | ||
speedup. This project currently benefits pure-Python workloads the most. | ||
| | ||
| Furthermore, the pyperformance figures are a geometric mean. Even within the | ||
pyperformance benchmarks, certain benchmarks have slowed down slightly, while | ||
others have sped up by nearly 2x! | ||
| | ||
| | ||
| Q: Is there a JIT compiler? | ||
| | ||
| A: No. We're still exploring other optimizations. | ||
.. _faster-cpython-faq-my-code: | ||
|
||
How should I write my code to utilize these speedups? | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Write Pythonic code that follows common best practices; | ||
you don't have to change your code. | ||
The Faster CPython project optimizes for common code patterns we observe. | ||
|
||
|
||
.. _faster-cpython-faq-memory: | ||
|
||
Will CPython 3.11 use more memory? | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Maybe not; we don't expect memory use to exceed 20% higher than 3.10. | ||
This is offset by memory optimizations for frame objects and object | ||
dictionaries as mentioned above. | ||
|
||
|
||
.. _faster-cpython-ymmv: | ||
|
||
I don't see any speedups in my workload. Why? | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Certain code won't have noticeable benefits. If your code spends most of | ||
its time on I/O operations, or already does most of its | ||
computation in a C extension library like NumPy, there won't be significant | ||
speedups. This project currently benefits pure-Python workloads the most. | ||
|
||
Furthermore, the pyperformance figures are a geometric mean. Even within the | ||
pyperformance benchmarks, certain benchmarks have slowed down slightly, while | ||
others have sped up by nearly 2x! | ||
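To make the geometric-mean point concrete, here is a small sketch; the per-benchmark speedup factors are invented for illustration only.

.. code-block:: python

   # Minimal sketch: a geometric mean summarizing mixed per-benchmark results,
   # where a couple of big wins and one small regression average out.
   import math

   speedups = [1.9, 1.4, 1.1, 1.0, 0.97]   # >1 is faster, <1 is slower
   geo_mean = math.prod(speedups) ** (1 / len(speedups))
   print(f"overall: {geo_mean:.2f}x faster")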
|
||
|
||
.. _faster-cpython-jit: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just for my own understanding, these are so that the hyperlink generates nicer links right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are ref target labels, which allow directly and stably linking and cross-referencing of this FAQ answer using the Furthermore, it means external links to this fragment id (i.e. using |
||
|
||
Is there a JIT compiler? | ||
^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
No. We're still exploring other optimizations. | ||
|
||
|
||
.. _whatsnew311-faster-cpython-about: | ||
|