Commit 9ffe47d

Fidget-Spinner, kumaraditya303, JelleZijlstra, AlexWaygood, and gvanrossum authored
bpo-47189: What's New in 3.11: Faster CPython (GH-32235)
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com> Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> Co-authored-by: Guido van Rossum <gvanrossum@users.noreply.github.com> Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
1 parent 074da78 commit 9ffe47d

3 files changed

+223
-7
lines changed

Doc/tutorial/modules.rst

Lines changed: 2 additions & 0 deletions
@@ -211,6 +211,8 @@ directory. This is an error unless the replacement is intended. See section
211211
.. %
212212
Do we need stuff on zip files etc. ? DUBOIS
213213
214+
.. _tut-pycache:
215+
214216
"Compiled" Python files
215217
-----------------------
216218

Doc/whatsnew/3.11.rst

Lines changed: 219 additions & 7 deletions
@@ -62,6 +62,8 @@ Summary -- Release highlights
6262
.. This section singles out the most important changes in Python 3.11.
6363
Brevity is key.
6464
65+
- Python 3.11 is between 10-60% faster than Python 3.10. On average, we measured a
66+
1.22x speedup on the standard benchmark suite. See `Faster CPython`_ for details.
6567

6668
.. PEP-sized items next.
6769
@@ -477,13 +479,6 @@ Optimizations
477479
almost eliminated when no exception is raised.
478480
(Contributed by Mark Shannon in :issue:`40222`.)
479481

480-
* Method calls with keywords are now faster due to bytecode
481-
changes which avoid creating bound method instances. Previously, this
482-
optimization was applied only to method calls with purely positional
483-
arguments.
484-
(Contributed by Ken Jin and Mark Shannon in :issue:`26110`, based on ideas
485-
implemented in PyPy.)
486-
487482
* Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
488483
(Contributed by Dong-hee Na in :issue:`44987`.)
489484

@@ -498,6 +493,223 @@ Optimizations
498493
(Contributed by Inada Naoki in :issue:`46845`.)
499494

500495

496+
Faster CPython
497+
==============
498+
499+
CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
500+
than CPython 3.10 when measured with the
501+
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
502+
and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
503+
could be 10-60%.
504+
505+
This project focuses on two major areas in Python: faster startup and faster
506+
runtime. Other optimizations not under this project are listed in `Optimizations`_.
507+
508+
Faster Startup
509+
--------------
510+
511+
Frozen imports / Static code objects
512+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
513+
514+
Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to
515+
speed up module loading.
516+
517+
Previously in 3.10, Python module execution looked like this:
518+
519+
.. code-block:: text
520+
521+
Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate
522+
523+
In Python 3.11, the core modules essential for Python startup are "frozen".
524+
This means that their code objects (and bytecode) are statically allocated
525+
by the interpreter. This reduces the steps in the module execution process to:
526+
527+
.. code-block:: text
528+
529+
Statically allocated code object -> Evaluate
530+
531+
Interpreter startup is now 10-15% faster in Python 3.11. This has a big
532+
impact on short-running programs using Python.
533+
534+
(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
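One way to see which already-imported modules came from frozen code is to inspect their import specs. The sketch below is illustrative only: ``frozen_module_names`` is a hypothetical helper, and it assumes that frozen modules report ``"frozen"`` as their ``__spec__.origin``. The resulting list depends on the interpreter version and build.

```python
import sys


def frozen_module_names():
    """Return names of imported modules whose spec origin is "frozen".

    Hypothetical helper for illustration; it assumes frozen modules
    expose the string "frozen" as ``__spec__.origin``.
    """
    names = []
    # Copy the items so that imports triggered elsewhere cannot change
    # the dictionary while we iterate over it.
    for name, module in list(sys.modules.items()):
        spec = getattr(module, "__spec__", None)
        if spec is not None and getattr(spec, "origin", None) == "frozen":
            names.append(name)
    return sorted(names)


frozen = frozen_module_names()
print(f"{len(frozen)} frozen modules are loaded: {frozen}")
```

Running this on 3.10 and 3.11 side by side should show a longer list on 3.11, since more startup modules are frozen there.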
535+
536+
537+
Faster Runtime
538+
--------------
539+
540+
Cheaper, lazy Python frames
541+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
542+
Python frames are created whenever Python calls a Python function. This frame
543+
holds execution information. The following are new frame optimizations:
544+
545+
- Streamlined the frame creation process.
546+
- Avoided memory allocation by generously re-using frame space on the C stack.
547+
- Streamlined the internal frame struct to contain only essential information.
548+
Frames previously held extra debugging and memory management information.
549+
550+
Old-style frame objects are now created only when required by debuggers. For
551+
most user code, no frame objects are created at all. As a result, nearly all
552+
Python function calls have sped up significantly. We measured a 3-7% speedup
553+
in pyperformance.
554+
555+
(Contributed by Mark Shannon in :issue:`44590`.)
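Frame objects remain available to code that explicitly asks for one. As a minimal sketch, the CPython-specific ``sys._getframe`` returns a full frame object for the running function; ``current_function_name`` is a hypothetical name used only for this example.

```python
import sys


def current_function_name():
    # sys._getframe() is CPython-specific. Calling it asks the interpreter
    # for a full frame object describing the currently running function.
    frame = sys._getframe()
    return frame.f_code.co_name


name = current_function_name()
print(name)
```

Ordinary calls that never request a frame are the ones that benefit from the cheaper internal representation described above.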
556+
557+
.. _inline-calls:
558+
559+
Inlined Python function calls
560+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
561+
During a Python function call, Python will call an evaluating C function to
562+
interpret that function's code. This effectively limits pure Python recursion to
563+
what's safe for the C stack.
564+
565+
In 3.11, when CPython detects Python code calling another Python function,
566+
it sets up a new frame, and "jumps" to the new code inside the new frame. This
567+
avoids calling the C interpreting function altogether.
568+
569+
Most Python function calls now consume no C stack space. This speeds up
570+
most such calls. In simple recursive functions like Fibonacci or
571+
factorial, a 1.7x speedup was observed. This also means recursive functions
572+
can recurse significantly deeper (if the user increases the recursion limit).
573+
We measured a 1-3% improvement in pyperformance.
574+
575+
(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
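The effect on recursion can be tried with a small pure-Python function. This is a sketch under stated assumptions: ``depth`` is a hypothetical helper, and the limit of 5000 and the depth of 3000 are arbitrary values chosen to stay well within what older interpreters can also handle.

```python
import sys


def depth(n):
    """Recurse n levels deep using only Python-to-Python calls."""
    return 0 if n == 0 else 1 + depth(n - 1)


old_limit = sys.getrecursionlimit()
# Raise the limit only if it is lower than the arbitrary value we need.
sys.setrecursionlimit(max(old_limit, 5000))
try:
    reached = depth(3000)
finally:
    # Restore the previous limit so the rest of the program is unaffected.
    sys.setrecursionlimit(old_limit)

print(reached)
```

Timing ``depth`` with :mod:`timeit` on 3.10 and 3.11 is a simple way to observe the call overhead difference on your own machine.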
576+
577+
PEP 659: Specializing Adaptive Interpreter
578+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
579+
:pep:`659` is one of the key parts of the faster CPython project. The general
580+
idea is that while Python is a dynamic language, most code has regions where
581+
objects and types rarely change. This concept is known as *type stability*.
582+
583+
At runtime, Python will try to look for common patterns and type stability
584+
in the executing code. Python will then replace the current operation with a
585+
more specialized one. This specialized operation uses fast paths available only
586+
to those use cases/types, which generally outperform their generic
587+
counterparts. This also brings in another concept called *inline caching*, where
588+
Python caches the results of expensive operations directly in the bytecode.
589+
590+
The specializer will also combine certain common instruction pairs into one
591+
superinstruction. This reduces the overhead during execution.
592+
593+
Python will only specialize
594+
when it sees code that is "hot" (executed multiple times). This prevents Python
595+
from wasting time on run-once code. Python can also de-specialize when code is
596+
too dynamic or when the use changes. Specialization is attempted periodically,
597+
and specialization attempts are not too expensive. This allows specialization
598+
to adapt to new circumstances.
599+
600+
(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
601+
See :pep:`659` for more information.)
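Specialization can be observed with the :mod:`dis` module, whose functions accept an ``adaptive`` keyword argument from Python 3.11 onwards. The following sketch is not part of the PEP: ``add_floats`` and ``opnames`` are hypothetical names, the warm-up count of 1000 is arbitrary, and the exact instruction names differ between versions.

```python
import dis
import sys


def add_floats(a, b):
    return a + b


# Call the function repeatedly with stable types so that the interpreter
# considers it "hot". The count is arbitrary.
for _ in range(1000):
    add_floats(1.5, 2.5)


def opnames(func):
    """Return the opcode names of func, specialized ones where supported."""
    if sys.version_info >= (3, 11):
        # The adaptive flag exists only on 3.11 and later.
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        instructions = dis.get_instructions(func)
    return [instruction.opname for instruction in instructions]


names = opnames(add_floats)
print(names)
```

On an interpreter that has specialized the addition, the generic binary operation is expected to appear under a more specific name; on older versions the generic instruction is shown unchanged.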
602+
603+
..
604+
If I missed out anyone, please add them.
605+
606+
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
|               |                    |                                                       | (up to)           |                   |
+===============+====================+=======================================================+===================+===================+
| Binary        | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
| operations    |                    | such as ``int``, ``float``, and ``str`` take custom   |                   | Dong-hee Na,      |
|               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
|               |                    |                                                       |                   | Dennis Sweeney    |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-25%            | Irit Katriel,     |
|               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
|               |                    | data structures.                                      |                   |                   |
|               |                    |                                                       |                   |                   |
|               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
|               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-25%            | Dennis Sweeney    |
| subscript     |                    |                                                       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Calls         | ``f(arg)``         | Calls to common builtin (C) functions and types such  | 20%               | Mark Shannon,     |
|               | ``C(arg)``         | as ``len`` and ``str`` directly call their underlying |                   | Ken Jin           |
|               |                    | C version. This avoids going through the internal     |                   |                   |
|               |                    | calling convention.                                   |                   |                   |
|               |                    |                                                       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``print``          | The object's index in the globals/builtins namespace  | [1]_              | Mark Shannon      |
| global        | ``len``            | is cached. Loading globals and builtins requires      |                   |                   |
| variable      |                    | zero namespace lookups.                               |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | [2]_              | Mark Shannon      |
| attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
|               |                    | In most cases, attribute loading will require zero    |                   |                   |
|               |                    | namespace lookups.                                    |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20%            | Ken Jin,          |
| methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
| call          |                    | classes with long inheritance chains.                 |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Store         | ``o.attr = z``     | Similar to load attribute optimization.               | 2%                | Mark Shannon      |
| attribute     |                    |                                                       | in pyperformance  |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Unpack        | ``*seq``           | Specialized for common containers such as ``list``    | 8%                | Brandt Bucher     |
| sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
650+
651+
.. [1] A similar optimization has existed since Python 3.8. 3.11
652+
specializes for more forms and reduces some overhead.
653+
654+
.. [2] A similar optimization has existed since Python 3.10.
655+
3.11 specializes for more forms. Furthermore, all attribute loads should
656+
be sped up by :issue:`45947`.
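To compare individual operations from the specialization table across interpreter versions, a small :mod:`timeit` harness is usually enough. The sketch below is an assumption-laden illustration, not the methodology behind the figures above: ``SETUP``, ``SNIPPETS`` and ``time_snippets`` are hypothetical names, and the statements are simple stand-ins for the listed forms.

```python
import timeit

# Shared setup for every snippet; the values are arbitrary.
SETUP = "data = list(range(100)); x = 3.5"

# One small statement per operation form; illustrative stand-ins only.
SNIPPETS = {
    "binary add": "x + x",
    "subscript": "data[50]",
    "store subscript": "data[50] = 0",
    "builtin call": "len(data)",
}


def time_snippets(number=100_000, repeat=3):
    """Return the best total time in seconds for each snippet."""
    return {
        label: min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
        for label, stmt in SNIPPETS.items()
    }


results = time_snippets(number=10_000)
for label, seconds in results.items():
    print(f"{label:>16}: {seconds:.6f} s")
```

Run the same script under each interpreter and compare the numbers. Microbenchmarks like these are noisy, so treat small differences with caution.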
657+
658+
659+
Misc
660+
----
661+
662+
* Objects now require less memory due to lazily created object namespaces. Their
663+
namespace dictionaries now also share keys more freely.
664+
(Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)
665+
666+
* A more concise representation of exceptions in the interpreter reduced the
667+
time required for catching an exception by about 10%.
668+
(Contributed by Irit Katriel in :issue:`45711`.)
669+
670+
FAQ
671+
---
672+
673+
| Q: How should I write my code to utilize these speedups?
674+
|
675+
| A: You don't have to change your code. Write Pythonic code that follows common
676+
best practices. The Faster CPython project optimizes for common code
677+
patterns we observe.
678+
|
679+
|
680+
| Q: Will CPython 3.11 use more memory?
681+
|
682+
| A: Maybe not. We don't expect memory use to exceed 20% higher than 3.10.
683+
This is offset by memory optimizations for frame objects and object
684+
dictionaries as mentioned above.
685+
|
686+
|
687+
| Q: I don't see any speedups in my workload. Why?
688+
|
689+
| A: Certain code won't have noticeable benefits. If your code spends most of
690+
its time on I/O operations, or already does most of its
691+
computation in a C extension library like numpy, there won't be significant
692+
speedup. This project currently benefits pure-Python workloads the most.
693+
|
694+
| Furthermore, the pyperformance figures are a geometric mean. Even within the
695+
pyperformance benchmarks, certain benchmarks have slowed down slightly, while
696+
others have sped up by nearly 2x!
697+
|
698+
|
699+
| Q: Is there a JIT compiler?
700+
|
701+
| A: No. We're still exploring other optimizations.
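The FAQ notes that the pyperformance figure is a geometric mean, which is why a single headline number can hide both slowdowns and large gains. As a worked sketch, the speedups below are made-up values for illustration, not measured results:

```python
from statistics import geometric_mean

# Hypothetical per-benchmark speedups (3.10 time divided by 3.11 time).
# These numbers are invented for illustration and are not measurements.
speedups = [0.97, 1.05, 1.20, 1.35, 1.90]

overall = geometric_mean(speedups)
print(f"geometric mean speedup: {overall:.2f}x")
```

The result always lies between the slowest and fastest individual benchmark, so a workload that resembles one of the outliers may see a very different speedup from the average.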
702+
703+
704+
About
705+
-----
706+
707+
Faster CPython explores optimizations for :term:`CPython`. The main team is
708+
funded by Microsoft to work on this full-time. Pablo Galindo Salgado is also
709+
funded by Bloomberg LP to work on the project part-time. Finally, many
710+
contributors are volunteers from the community.
711+
712+
501713
CPython bytecode changes
502714
========================
503715

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
Add a What's New in Python 3.11 entry for the Faster CPython project.
2+
Documentation by Ken Jin and Kumar Aditya.
