Increase _PY_NSMALLPOSINTS size #133059
Comments
The graph below has helped to make up my mind. Given all the `range(\d+)` search results (plotted per upper bound, with a lower cutoff of 0.35M results per bucket), there is a visible segregation of the results in the 256-1024 range: numbers in this range are used only 2-3x less than the ones below 256, while the next range of numbers is used much less again. So my proposal is to set this number to 1025. It includes the common 1024 and captures the full range of 1000, which is very common. It has a significant positive impact on 2 benchmarks.
There are extra costs for increasing this number to 1025, the main one being a larger static memory footprint.
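As a rough illustration of that memory cost (my own back-of-the-envelope sketch, not from the original analysis; sizes assume a typical 64-bit CPython build):

```python
import sys

# Each cached small int is a one-digit PyLongObject; sys.getsizeof(1)
# reports 28 bytes on a typical 64-bit build. Growing the cache from
# 257 entries (0..256) to 1025 entries (0..1024) then costs roughly:
per_int = sys.getsizeof(1)
extra = (1025 - 257) * per_int
print(f"~{extra / 1024:.1f} KiB extra")  # ~21.0 KiB
```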
@dg-pb do you have a public branch that we can benchmark on our infrastructure?
If you want to check different numbers, there is nothing else to do apart from changing the `_PY_NSMALLPOSINTS` definition and rebuilding. A quick interpreter-level sanity check is sketched below.
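A minimal sketch for verifying which values are actually cached at runtime (my own snippet, not from the branch; it relies on `int(str(n))` sidestepping compile-time constant folding):

```python
def is_cached(n: int) -> bool:
    # Two separate runtime constructions; `is` holds only if both
    # return the same pre-stored object from the small-int cache.
    return int(str(n)) is int(str(n))

print(is_cached(256))   # True on a default build (cache covers -5..256)
print(is_cached(1024))  # False by default; True once _PY_NSMALLPOSINTS > 1024
```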
To address negatives (as per comment/request #133160 (comment)), I have done a quick check.
So from (1), the current negative cache range (-5 to -1) already looks sufficient. But range analysis (2) shows that there is some scope to consider increasing this number. In this analysis it largely depends on what perspective is taken; (b) is, in my opinion, a much more tangible POV. Also, there is little to no chance that such a small increase in the negative cache size would have a noticeable impact on benchmark runtimes. Given the above, I am comfortable not running them for such a negligible chance. Even if a significant improvement were detected, it would most likely be explained away as a very specific case. The frequency of negative integer usage is simply nowhere near that of positive ones, and those that are used at startup are already captured. Thus, I don't think there is enough evidence to justify increasing the cache size for negative numbers. And I have not encountered any frequent use cases in practice that could challenge this conclusion.
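The same identity check works for negatives (again my own sketch; CPython's default negative cache is `_PY_NSMALLNEGINTS = 5`, i.e. -5 through -1):

```python
def is_cached(n: int) -> bool:
    # int(str(n)) avoids constant folding, so `is` reflects the cache.
    return int(str(n)) is int(str(n))

print(is_cached(-5))  # True: -5..-1 are pre-stored by default
print(is_cached(-6))  # False: just outside _PY_NSMALLNEGINTS
```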
Feature or enhancement
Proposal:
See faster-cpython/ideas#725 for the initial analyses and backstory if interested, but I will try to summarise it all here.

So the intent is to make `int` creation faster. Pre-storing `int` objects is a straightforward path to achieve this, and increasing the value of `_PY_NSMALLPOSINTS` does exactly that.

The benefits can be observed in `pyperformance` benchmarks. With each incremental increase of this number, new statistically significant benefits surface:
1. faster `regex_v8` and `regex_dna`, and 8% faster `regex_effbot`
2. `genshi_*`, `regex_compile`, `scimark_*`, `spectral_norm`, `xml_*` and a few more
3. faster `scimark_monte_carlo`, 6% faster `scimark_sparse_mat_mult` and 12% faster `spectral_norm`
As can be seen, each range of integers benefits different applications. This number could be increased to 1M, 10M, etc., and further observable benefits could be found in specific applications. Having said that, more involved scientific applications should aim to benefit from optimized libraries such as `numpy`, which most likely leaves (3) well out of scope of this change.

To attempt to find the most optimal number, the following two calculations can give some insight:
1. Python is launched and the `PyLongObject` requests made during startup and imports are recorded. Cumulative density graph of `PyLongObject` requests:
- 256 captures ~83% of all used numbers in startup and imports.
- Beyond 256, there are 2 visible increments, at 512 and 4096.
- 4096 would increase coverage to 93%.
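A sketch of how that coverage figure can be computed from such a tally (my own snippet; `requests` is a hypothetical `Counter` mapping each requested value to its request count, and the instrumentation that produces it is assumed):

```python
from collections import Counter

def coverage(requests: Counter, cache_size: int) -> float:
    """Fraction of PyLongObject requests served by caching 0..cache_size-1."""
    total = sum(requests.values())
    hits = sum(c for v, c in requests.items() if 0 <= v < cache_size)
    return hits / total

# Per the numbers above: coverage(requests, 257) ~= 0.83 for startup+imports,
# and coverage(requests, 4097) ~= 0.93.
```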
2. The above only captures imports, not user use cases. For that, a GitHub search for the `\brange(\d+)` pattern can provide some more insight. A total of 4.3M use cases were found.
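For reproducibility, a minimal sketch of how such matches could be tallied over a local corpus (my own snippet; `sources`, an iterable of file contents, is assumed, since the original count came from GitHub's search UI):

```python
import re
from collections import Counter

RANGE_RE = re.compile(r"\brange\((\d+)\)")

def tally_range_literals(sources) -> Counter:
    """Count occurrences of each literal N in single-argument range(N) calls."""
    counts = Counter()
    for src in sources:
        for m in RANGE_RE.finditer(src):
            counts[int(m.group(1))] += 1
    return counts
```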
It can be seen that 91% of ranges that are initialised with one argument have it set below 260, so the current cache size covers them well.
However, the raw number of use cases is not indicative of the performance benefit.
What is more appropriate here is to see how many integer objects would be re-used at each cache size.
Below is a graph of the extra integers re-used with each incremental increase of this number.
So the current value of 256 allows re-using 99M (11+54+34) integers that are generated by the `range` cases.
Increasing it to 1000 would add an extra 33M.
Increasing it to 10000 would add an extra 240M (33 + 2 + 5 + 200).
Etc.
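A sketch of that re-use calculation (my own snippet, assuming each `range(N)` call materialises the ints 0..N-1 once; `counts` is the hypothetical tally from the earlier sketch):

```python
def ints_reused(counts, cache_size: int) -> int:
    """Total integer objects served from the cache across all range(N) calls."""
    # A cache holding 0..cache_size-1 serves min(N, cache_size) ints per call.
    return sum(min(n, cache_size) * freq for n, freq in counts.items())

# Marginal benefit of a bigger cache, per the numbers above:
# ints_reused(counts, 1000) - ints_reused(counts, 256)    # ~33M
# ints_reused(counts, 10000) - ints_reused(counts, 256)   # ~240M
```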
So this is in line with the findings from the benchmark results above.
My best guess is that the upper bound for this number is ~10K.
There should be a very good reason to sacrifice anything more than that for integer storage alone.
(And I suspect that even this is higher than what others would agree on.)
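For scale (my own estimate, using the same ~28 bytes per cached one-digit int as in the earlier sketch): a ~10K-entry cache would occupy roughly 10000 × 28 B ≈ 280 KB of static memory, versus about 7 KB for the current 257 entries.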
From all the information above, together with some analyses from @eendebakpt in faster-cpython/ideas#725, it seems that both 1024 and 2048 could be good candidates.
So this is my best inference.
What do others think?
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
faster-cpython/ideas#725
Linked PRs