@@ -62,6 +62,8 @@ Summary -- Release highlights
.. This section singles out the most important changes in Python 3.11.
   Brevity is key.

+ - Python 3.11 is up to 10-60% faster than Python 3.10. On average, we measured a
+   1.22x speedup on the standard benchmark suite. See `Faster CPython`_ for details.

.. PEP-sized items next.
@@ -477,13 +479,6 @@ Optimizations
almost eliminated when no exception is raised.
(Contributed by Mark Shannon in :issue:`40222`.)

- * Method calls with keywords are now faster due to bytecode
-   changes which avoid creating bound method instances. Previously, this
-   optimization was applied only to method calls with purely positional
-   arguments.
-   (Contributed by Ken Jin and Mark Shannon in :issue:`26110`, based on ideas
-   implemented in PyPy.)
-
* Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
(Contributed by Dong-hee Na in :issue:`44987`.)

@@ -498,6 +493,223 @@ Optimizations
(Contributed by Inada Naoki in :issue:`46845`.)
496
+ Faster CPython
+ ==============
+
+ CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
+ than CPython 3.10 when measured with the
+ `pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
+ and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
+ could be 10-60%.
+
+ This project focuses on two major areas in Python: faster startup and faster
+ runtime. Other optimizations not under this project are listed in `Optimizations`_.
+
508
+ Faster Startup
+ --------------
+
+ Frozen imports / Static code objects
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ Python caches bytecode in the :ref:`__pycache__ <tut-pycache>` directory to
+ speed up module loading.
+
+ Previously in 3.10, Python module execution looked like this:
+
+ .. code-block:: text
+
+    Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate
+
+ In Python 3.11, the core modules essential for Python startup are "frozen".
+ This means that their code objects (and bytecode) are statically allocated
+ by the interpreter. This reduces the steps in the module execution process to:
+
+ .. code-block:: text
+
+    Statically allocated code object -> Evaluate
+
+ Interpreter startup is now 10-15% faster in Python 3.11. This has a big
+ impact on short-running programs using Python.
+
+ (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
+
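As a rough, illustrative check (the helper name and run count below are ours, not part of CPython), the startup improvement can be measured by launching the interpreter repeatedly and averaging the wall-clock time:

```python
import subprocess
import sys
import time

def average_startup_seconds(runs: int = 5) -> float:
    """Average wall-clock time to launch the interpreter and exit."""
    start = time.perf_counter()
    for _ in range(runs):
        # "-c pass" runs an empty program, so the measurement is
        # dominated by interpreter startup (including frozen imports).
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs

print(f"average startup: {average_startup_seconds():.4f}s")
```

Comparing the printed figure between a 3.10 and a 3.11 interpreter should show the startup difference on your own machine.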
537
+ Faster Runtime
+ --------------
+
+ Cheaper, lazy Python frames
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ Python frames are created whenever Python calls a Python function. This frame
+ holds execution information. The following are new frame optimizations:
+
+ - Streamlined the frame creation process.
+ - Avoided memory allocation by generously re-using frame space on the C stack.
+ - Streamlined the internal frame struct to contain only essential information.
+   Frames previously held extra debugging and memory management information.
+
+ Old-style frame objects are now created only when required by debuggers. For
+ most user code, no frame objects are created at all. As a result, nearly all
+ Python function calls have sped up significantly. We measured a 3-7% speedup
+ in pyperformance.
+
+ (Contributed by Mark Shannon in :issue:`44590`.)
+
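The per-call overhead the frame work reduces can be seen with a ``timeit`` micro-benchmark; the functions below are purely illustrative, and absolute numbers vary by machine:

```python
import timeit

def add_one(x):
    # A tiny Python-to-Python call: 3.11's streamlined frames lower
    # the fixed per-call overhead measured here.
    return x + 1

def caller():
    total = 0
    for _ in range(1_000):
        total = add_one(total)
    return total

# Total time for one million calls; compare across interpreter versions.
print(f"{timeit.timeit(caller, number=1_000):.3f}s")
```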
557
+ .. _inline-calls:
+
+ Inlined Python function calls
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ During a Python function call, Python will call an evaluating C function to
+ interpret that function's code. This effectively limits pure Python recursion to
+ what's safe for the C stack.
+
+ In 3.11, when CPython detects Python code calling another Python function,
+ it sets up a new frame, and "jumps" to the new code inside the new frame. This
+ avoids calling the C interpreting function altogether.
+
+ Most Python function calls now consume no C stack space. This speeds up
+ most such calls. In simple recursive functions like fibonacci or
+ factorial, a 1.7x speedup was observed. This also means recursive functions
+ can recurse significantly deeper (if the user increases the recursion limit).
+ We measured a 1-3% improvement in pyperformance.
+
+ (Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
+
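As a sketch, a plain recursive Fibonacci exercises exactly the Python-to-Python calls that are now inlined (the functions and the chosen limit are illustrative, not from CPython):

```python
import sys

def fib(n: int) -> int:
    # Each recursive step is a Python-to-Python call; in 3.11 these
    # are inlined and consume no C stack space.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(20))  # -> 6765

# Recursion depth is still bounded by the recursion limit, but raising
# it is now safer because pure-Python recursion no longer eats C stack.
sys.setrecursionlimit(3000)

def depth(n: int) -> int:
    return 0 if n == 0 else 1 + depth(n - 1)

print(depth(2500))  # -> 2500
```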
577
+ PEP 659: Specializing Adaptive Interpreter
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ :pep:`659` is one of the key parts of the faster CPython project. The general
+ idea is that while Python is a dynamic language, most code has regions where
+ objects and types rarely change. This concept is known as *type stability*.
+
+ At runtime, Python will try to look for common patterns and type stability
+ in the executing code. Python will then replace the current operation with a
+ more specialized one. This specialized operation uses fast paths available only
+ to those use cases/types, which generally outperform their generic
+ counterparts. This also brings in another concept called *inline caching*, where
+ Python caches the results of expensive operations directly in the bytecode.
+
+ The specializer will also combine certain common instruction pairs into one
+ superinstruction. This reduces the overhead during execution.
+
+ Python will only specialize when it sees code that is "hot" (executed multiple
+ times). This prevents Python from wasting time on run-once code. Python can
+ also de-specialize when code is too dynamic or when the use changes.
+ Specialization is attempted periodically, and specialization attempts are not
+ too expensive. This allows specialization to adapt to new circumstances.
+
+ (PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
+ See :pep:`659` for more information.)
+
+ ..
+    If I missed out anyone, please add them.
+
606
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
+ |               |                    |                                                       | (up to)           |                   |
+ +===============+====================+=======================================================+===================+===================+
+ | Binary        | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
+ | operations    |                    | such as ``int``, ``float``, and ``str`` take custom   |                   | Dong-hee Na,      |
+ |               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
+ |               |                    |                                                       |                   | Dennis Sweeney    |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-25%            | Irit Katriel,     |
+ |               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
+ |               |                    | data structures.                                      |                   |                   |
+ |               |                    |                                                       |                   |                   |
+ |               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
+ |               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-25%            | Dennis Sweeney    |
+ | subscript     |                    |                                                       |                   |                   |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Calls         | ``f(arg)``         | Calls to common builtin (C) functions and types such  | 20%               | Mark Shannon,     |
+ |               | ``C(arg)``         | as ``len`` and ``str`` directly call their underlying |                   | Ken Jin           |
+ |               |                    | C version. This avoids going through the internal     |                   |                   |
+ |               |                    | calling convention.                                   |                   |                   |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Load          | ``print``          | The object's index in the globals/builtins namespace  | [1]_              | Mark Shannon      |
+ | global        | ``len``            | is cached. Loading globals and builtins requires      |                   |                   |
+ | variable      |                    | zero namespace lookups.                               |                   |                   |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | [2]_              | Mark Shannon      |
+ | attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
+ |               |                    | In most cases, attribute loading will require zero    |                   |                   |
+ |               |                    | namespace lookups.                                    |                   |                   |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20%            | Ken Jin,          |
+ | methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
+ | call          |                    | classes with long inheritance chains.                 |                   |                   |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Store         | ``o.attr = z``     | Similar to load attribute optimization.               | 2%                | Mark Shannon      |
+ | attribute     |                    |                                                       | in pyperformance  |                   |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+ | Unpack        | ``*seq``           | Specialized for common containers such as ``list``    | 8%                | Brandt Bucher     |
+ | Sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
+ +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+
+ .. [1] A similar optimization already existed since Python 3.8. 3.11
+    specializes for more forms and reduces some overhead.
+
+ .. [2] A similar optimization already existed since Python 3.10.
+    3.11 specializes for more forms. Furthermore, all attribute loads should
+    be sped up by :issue:`45947`.
+
+
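The specializer's work can be observed with the ``dis`` module, which grew an ``adaptive`` argument in 3.11. The warm-up count below is an illustrative guess, and the specialized instruction names shown depend on your CPython version:

```python
import dis
import sys

def add(a, b):
    return a + b

# Warm the function up so the adaptive interpreter can specialize
# the generic BINARY_OP for int operands.
for _ in range(64):
    add(1, 2)

if sys.version_info >= (3, 11):
    # adaptive=True (new in 3.11) shows specialized instruction names,
    # e.g. BINARY_OP_ADD_INT in place of the generic BINARY_OP.
    dis.dis(add, adaptive=True)
else:
    dis.dis(add)
```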
659
+ Misc
+ ----
+
+ * Objects now require less memory due to lazily created object namespaces. Their
+   namespace dictionaries now also share keys more freely.
+   (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)
+
+ * A more concise representation of exceptions in the interpreter reduced the
+   time required for catching an exception by about 10%.
+   (Contributed by Irit Katriel in :issue:`45711`.)
+
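The cheaper exception representation mainly shows up in tight raise-and-catch loops. A rough illustrative micro-benchmark (the function name and iteration count are ours):

```python
import timeit

def catching():
    # Raise and immediately catch; 3.11's leaner internal exception
    # representation makes this round trip cheaper.
    try:
        raise ValueError("example")
    except ValueError:
        return True

# Total time for 100,000 raise/catch cycles; compare across versions.
print(f"{timeit.timeit(catching, number=100_000):.3f}s")
```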
670
+ FAQ
+ ---
+
+ | Q: How should I write my code to utilize these speedups?
+ |
+ | A: You don't have to change your code. Write Pythonic code that follows common
+   best practices. The Faster CPython project optimizes for common code
+   patterns we observe.
+ |
+ |
+ | Q: Will CPython 3.11 use more memory?
+ |
+ | A: Maybe not. We don't expect memory use to exceed that of 3.10 by more than
+   20%. This is offset by memory optimizations for frame objects and object
+   dictionaries as mentioned above.
+ |
+ |
+ | Q: I don't see any speedups in my workload. Why?
+ |
+ | A: Certain code won't have noticeable benefits. If your code spends most of
+   its time on I/O operations, or already does most of its
+   computation in a C extension library like numpy, there won't be significant
+   speedup. This project currently benefits pure-Python workloads the most.
+ |
+ | Furthermore, the pyperformance figures are a geometric mean. Even within the
+   pyperformance benchmarks, certain benchmarks have slowed down slightly, while
+   others have sped up by nearly 2x!
+ |
+ |
+ | Q: Is there a JIT compiler?
+ |
+ | A: No. We're still exploring other optimizations.
+
+
704
+ About
+ -----
+
+ Faster CPython explores optimizations for :term:`CPython`. The main team is
+ funded by Microsoft to work on this full-time. Pablo Galindo Salgado is also
+ funded by Bloomberg LP to work on the project part-time. Finally, many
+ contributors are volunteers from the community.
+
+
CPython bytecode changes
========================