-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
use libdivide to make integer division faster #14959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If the libdivide approach is really that good, wouldn't it make more sense to try and upstream it into gcc so that all c code benefits? |
I suppose specifically for broadcast multiplies, the shift / multiply could be computed just once, then applied to the entire axis. |
It is actually in gcc and other compilers. The problem though is that in order to use this approach, you have to do a small bit of work that costs about 3 integer divisions. Compilers can do that for division by constant, but they can't do it for cases where the divisor isn't visible to them. For numpy where you can see exactly how many divisions you need to do ahead of time, this could be a big win. |
@eric-wieser , @seberg , we can also explore Karatsuba if numbers are huge? Also, is this open to change? would like to explore by replacing a small bit and testing performance. |
The code to move the constant value out of the loop is pretty deep in the C macros, I think this would be the place to expand the logic so that if Benchmarking this would be important to a PR to show that the increase in code complexity is worth the change. |
Performance gains are massive :) Sample Code: print(np.__version__)
x = np.arange(1000000000)
cProfile.run('x//30')
print(x//30) Now output 🎉
And after libdivde:
Things I did:
I'll make it more generic and extend to other types. |
Hmm, I think I once also thought that the line But it indeed looks like it is may be worth the trouble of adding a fast path for 0-stride case! (I am not sure how big a dependency/chunk of code libdivide is). |
Thanks @seberg , libdivide is a header only, we can package it(79KB) or add it as a dependency to install. |
@ganesh-k13 one thing to remember is that we cannot hardcode certain SIMD instructions sets. So the functions provided need to be wrapped into the new universal intrinsics, also we need to be careful about retaining edge-case behaviour (or at least intentionally breaking it). |
Thanks for the heads up @seberg . Seems like it's better if we build it separately. Perhaps have a knob of some kind to use libdivide(at least initially, like put it in a #def). Coming to instruction sets, take a look at this. Seems like they are using a wider range for compatibility. So maybe we could use it as is.
Can you elaborate on that?
Do we mail them? |
Also, when we build numpy, I see this output:
Perhaps we can use this to build libdivide as well? |
Oh, about edge cases, I am unsure if we have some defined behaviour e.g. for |
Right now, integer division uses the standard division instructions, but libdivide, a header-only C library, allows you to replace integer division with integer multiplication and a bitshift. On my machine, that's about 5x faster.
Alternatively, it would be good if integer division by a power of 2 (and maybe multiplication too?) were implemented with bitshifts.
The text was updated successfully, but these errors were encountered: