bpo-47189: What's New in 3.11: Faster CPython #32235
@@ -62,6 +62,8 @@ Summary -- Release highlights
.. This section singles out the most important changes in Python 3.11.
   Brevity is key.

- Python 3.11 is between 10-60% faster than Python 3.10. On average, we measured a
  1.22x speedup on the standard benchmark suite. See `Faster CPython`_ for details.

.. PEP-sized items next.

@@ -472,13 +474,6 @@ Optimizations
  almost eliminated when no exception is raised.
  (Contributed by Mark Shannon in :issue:`40222`.)

* Method calls with keywords are now faster due to bytecode
  changes which avoid creating bound method instances. Previously, this
  optimization was applied only to method calls with purely positional
  arguments.
  (Contributed by Ken Jin and Mark Shannon in :issue:`26110`, based on ideas
  implemented in PyPy.)

* Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
  (Contributed by Dong-hee Na in :issue:`44987`.)

@@ -493,6 +488,225 @@ Optimizations
  (Contributed by Inada Naoki in :issue:`46845`.)


Faster CPython
==============

CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
than CPython 3.10 when measured with the
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
could be between 10% and 60%.

This project focuses on two major areas in Python: faster startup and faster
runtime. Other optimizations not under this project are listed in `Optimizations`_.

Faster Startup
--------------

Frozen imports / Static code objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to
speed up module loading.

Previously in 3.10, Python module execution looked like this:

.. code-block:: text

   Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate

In Python 3.11, the core modules essential for Python startup are "frozen".
This means that their code objects (and bytecode) are statically allocated
by the interpreter. This reduces the steps in the module execution process to this:

.. code-block:: text

   Statically allocated code object -> Evaluate

Interpreter startup is now 10-15% faster in Python 3.11. This has a big
impact on short-running programs using Python.

(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
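
As a rough illustration of the 3.10-style pipeline described above, the
following sketch walks one piece of source through the compile, marshal,
unmarshal and evaluate steps by hand. It is not how the import system is
implemented: the source string and the ``<demo>`` filename are made up for the
example, and a real ``.pyc`` file also carries a header that is omitted here.

```python
import marshal

# Hypothetical module source, used only for this illustration.
source = "answer = 6 * 7\n"

# Compile to a code object; this is what normally ends up cached on disk.
code = compile(source, "<demo>", "exec")

# Serialize the code object, roughly the payload of a cached bytecode file.
blob = marshal.dumps(code)

# "Unmarshal": rebuild a heap-allocated code object from the bytes.
restored = marshal.loads(blob)

# "Evaluate": run the code object in a fresh namespace.
namespace = {}
exec(restored, namespace)
print(namespace["answer"])  # 42
```

For frozen modules the middle two steps are skipped, since the code object
already exists inside the interpreter binary.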


Faster Runtime
--------------

Cheaper, lazy Python frames
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Python frames are created whenever Python calls a Python function. Each frame
holds execution information. The following are new frame optimizations:

- Streamlined the frame creation process.
- Avoided memory allocation by generously re-using frame space on the C stack.
- Streamlined the internal frame struct to contain only essential information.
  Frames previously held extra debugging and memory management information.

Old-style frame objects are now created only when required by debuggers. For
most user code, no frame objects are created at all. As a result, nearly all
Python function calls have sped up significantly. We measured a 3-7% speedup
in pyperformance.

(Contributed by Mark Shannon in :issue:`44590`.)
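
Frame objects remain available to code that explicitly asks for one. The sketch
below assumes that ``sys._getframe`` (a CPython-specific function) is an
acceptable way to request a frame. The helper names are hypothetical and only
serve the illustration.

```python
import sys


def current_function_name():
    # Explicitly requesting the frame makes the interpreter hand back a
    # full frame object for this call.
    frame = sys._getframe()
    return frame.f_code.co_name


def caller_name():
    # Depth 1 is the frame of whoever called this helper.
    return sys._getframe(1).f_code.co_name


def outer():
    return caller_name()


print(current_function_name())  # current_function_name
print(outer())                  # outer
```

Code that never inspects frames in this way does not pay for creating them.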

.. _inline-calls:

Inlined Python function calls
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
During a Python function call, Python will call an evaluating C function to
interpret that function's code. This effectively limits pure Python recursion to
what's safe for the C stack.

In 3.11, when Python detects Python code calling another Python function,
it sets up a new frame, and "jumps" to the new code inside the new frame. This
avoids calling the C interpreting function altogether.

Most Python function calls now consume no C stack space. This speeds up
most such calls. In simple recursive functions like Fibonacci or
factorial, a 1.7x speedup was observed. This also means recursive functions
can recurse significantly deeper (if the user increases the recursion limit).
We measured a 1-3% improvement in pyperformance.

(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
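
A minimal sketch of raising the recursion limit for a pure-Python recursive
function follows. The function names and the depth of 3000 are arbitrary
choices for the example, and the limit is restored afterwards so the rest of
the program is unaffected. How deep a given build can safely go is not
something this sketch tries to establish.

```python
import sys


def countdown(n):
    # Pure-Python recursion: each level is a Python-to-Python call.
    return 0 if n == 0 else 1 + countdown(n - 1)


def depth_reached(depth):
    old_limit = sys.getrecursionlimit()
    # Leave some headroom for frames that are already on the stack.
    sys.setrecursionlimit(max(old_limit, depth + 500))
    try:
        return countdown(depth)
    finally:
        sys.setrecursionlimit(old_limit)


print(depth_reached(3000))  # 3000
```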

PEP 659: Specializing Adaptive Interpreter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:pep:`659` is one of the key parts of the Faster CPython project. The general
idea is that while Python is a dynamic language, most code has regions where
objects and types rarely change. This concept is known as *type stability*.

At runtime, Python will try to look for common patterns and type stability
in the executing code. Python will then replace the current operation with a
more specialized one. This specialized operation uses fast paths available only
to those use cases/types, which generally outperform their generic
counterparts. This also brings in another concept called *inline caching*, where
Python caches the results of expensive operations directly in the bytecode.

The specializer will also combine certain common instruction pairs into one
superinstruction. This reduces the overhead during execution.

Python will only specialize
when it sees code that is "hot" (executed multiple times). This prevents Python
from wasting time on run-once code. Python can also de-specialize when code is
too dynamic or when its usage changes. Specialization is attempted periodically,
and specialization attempts are not too expensive. This allows specialization
to adapt to new circumstances.

(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
See :pep:`659` for more information.)
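
One way to look at specialization from Python code is to disassemble a function
after it has run many times. The sketch below assumes the ``adaptive`` keyword
of ``dis.dis``, which exists from 3.11 onwards, and falls back to a plain
disassembly on older versions. The function name and the loop count of 1000 are
arbitrary, and the exact instruction names printed depend on the interpreter
version, so no particular output is claimed here.

```python
import dis
import io
import sys


def add_floats(a, b):
    return a + b


# Call the function repeatedly with stable types so that it counts as "hot".
for _ in range(1000):
    add_floats(1.5, 2.5)


def disassemble(func):
    out = io.StringIO()
    if sys.version_info >= (3, 11):
        # adaptive=True asks dis to show specialized instructions, if any.
        dis.dis(func, file=out, adaptive=True)
    else:
        dis.dis(func, file=out)
    return out.getvalue()


print(disassemble(add_floats))
```

On a specializing interpreter, the generic addition instruction may appear
replaced by a float-specific variant in this listing.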

..
   If I missed out anyone, please add them.

+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
|               |                    |                                                       | (up to)           |                   |
+===============+====================+=======================================================+===================+===================+
| Binary        | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
| operations    |                    | such as ``int``, ``float``, and ``str`` take custom   |                   | Dong-hee Na,      |
|               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
|               |                    |                                                       |                   | Dennis Sweeney    |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
|               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
|               |                    | data structures.                                      |                   |                   |
|               |                    |                                                       |                   |                   |
|               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
|               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
| subscript     |                    |                                                       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |
|               |                    | and ``isinstance`` directly call their underlying C   |                   | Ken Jin           |
|               | ``C(arg)``         | version. This avoids going through the internal       | 170%              |                   |
|               |                    | calling convention.                                   |                   |                   |
|               |                    |                                                       |                   |                   |
|               |                    | Calls to certain Python functions are inlined similar |                   |                   |
|               |                    | to :ref:`inline-calls`.                               |                   |                   |
|               |                    |                                                       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``print``          | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
| global        | ``len``            | is cached. Loading globals and builtins requires      |                   |                   |
| variable      |                    | zero namespace lookups.                               |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | - [2]_            | Mark Shannon      |
| attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
|               |                    | In most cases, attribute loading will require zero    |                   |                   |
|               |                    | namespace lookups.                                    |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20%            | Ken Jin,          |
| methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
| call          |                    | classes with long inheritance chains.                 |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Store         | ``o.attr = z``     | Similar to load attribute optimization.               | 2%                | Mark Shannon      |
| attribute     |                    |                                                       | in pyperformance  |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Unpack        | ``*seq``           | Specialized for common containers such as ``list``    | 8%                | Brandt Bucher     |
| Sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
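
To check how the forms in the table behave on a particular build, a small
``timeit`` harness can time each one in isolation. This is only a sketch: the
``Point`` class, the statement list and the iteration count are made up for the
example, and micro-benchmarks like these are noisy, so the numbers are best
compared between two interpreters on the same machine rather than read as
absolute figures.

```python
import timeit


class Point:
    def __init__(self, x):
        self.x = x

    def get(self):
        return self.x


# One statement per row of the table; names refer to the namespace below.
STATEMENTS = {
    "binary operation": "x + x",
    "subscript": "a[1]",
    "store subscript": "a[1] = 0",
    "call builtin": "len(a)",
    "load attribute": "o.x",
    "load method and call": "o.get()",
    "store attribute": "o.x = 2.0",
}


def time_forms(number=50_000):
    namespace = {"x": 1.5, "a": [1, 2, 3], "o": Point(1.5)}
    return {
        label: timeit.timeit(stmt, globals=namespace, number=number)
        for label, stmt in STATEMENTS.items()
    }


for label, seconds in time_forms().items():
    print(f"{label:22s} {seconds:.4f} s")
```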


.. [1] A similar optimization has existed since Python 3.8. 3.11
       specializes for more forms and reduces some overhead.

.. [2] A similar optimization has existed since Python 3.10.
       3.11 specializes for more forms. Furthermore, all attribute loads should
       be sped up by :issue:`45947`.


Misc
----

* Objects now require less memory due to lazily created object namespaces. Their
  namespace dictionaries now also share keys more freely.
  (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)

* A more concise representation of exceptions in the interpreter reduced the
  time required for catching an exception by about 10%.
  (Contributed by Irit Katriel in :issue:`45711`.)
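
The memory used by plain instances can be estimated with ``tracemalloc``. The
following is a minimal sketch under stated assumptions: the ``Record`` class
and the instance count are hypothetical, and the figure includes the list that
holds the instances, so it is only meaningful when the same script is run on
two interpreter versions and the results are compared.

```python
import tracemalloc


class Record:
    def __init__(self, a, b):
        self.a = a
        self.b = b


def bytes_per_instance(count=10_000):
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    # Keep every instance alive so the allocations stay visible.
    records = [Record(0, 0) for _ in range(count)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / len(records)


print(f"{bytes_per_instance():.1f} bytes per instance (approximate)")
```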

FAQ
---

| Q: How should I write my code to utilize these speedups?
|
| A: You don't have to change your code. Write Pythonic code that follows common
  best practices. The Faster CPython project optimizes for common code
  patterns we observe.
|
|
| Q: Will CPython 3.11 use more memory?
|
| A: Maybe not. We don't expect memory use to be more than 20% higher than in 3.10.
  This is offset by memory optimizations for frame objects and object
  dictionaries as mentioned above.
|
|
| Q: I don't see any speedups in my workload. Why?
|
| A: Certain code won't have noticeable benefits. If your code spends most of
  its time on I/O operations, or already does most of its
  computation in a C extension library like NumPy, there won't be a significant
  speedup. This project currently benefits pure-Python workloads the most.
|
| Furthermore, the pyperformance figures are a geometric mean. Even within the
  pyperformance benchmarks, certain benchmarks have slowed down slightly, while
  others have sped up by nearly 2x!
|
|
| Q: Is there a JIT compiler?
|
| A: No. We're still exploring other optimizations.
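
To see how a geometric mean combines individual results, consider the sketch
below. The benchmark names and speedup values are invented for illustration and
are not pyperformance measurements. It uses ``statistics.geometric_mean`` from
the standard library.

```python
from statistics import geometric_mean

# Hypothetical per-benchmark speedups (3.11 relative to 3.10).
# A value below 1.0 means that benchmark became slower.
speedups = {"bench_a": 0.95, "bench_b": 1.10, "bench_c": 1.90}


def overall_speedup(results):
    # The geometric mean is the n-th root of the product of the n values.
    return geometric_mean(list(results.values()))


print(f"{overall_speedup(speedups):.2f}x")  # 1.26x
```

A single aggregate figure can therefore hide both a slowdown and a near-2x
speedup, which is why an individual workload may differ from the headline
number.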


About
-----

Faster CPython explores optimizations for :term:`CPython`. The main team is
funded by Microsoft to work on this full-time. The team also collaborates
extensively with volunteer contributors in the community.


CPython bytecode changes
========================

@@ -0,0 +1,2 @@
Add a What's New in Python 3.11 entry for the Faster CPython project.
Documentation by Ken Jin and Kumar Aditya.