bpo-43420: Simple optimizations for Fraction's arithmetics by skirpichev · Pull Request #24779 · python/cpython · GitHub
Merged: 13 commits, Mar 22, 2021
Changes from 1 commit
Special-case for trivial gcd in Fraction._mul()/_div() as well
skirpichev committed Mar 16, 2021
commit 3d5d321918cab8a05f51630dcec407c27955db13
21 changes: 18 additions & 3 deletions Lib/fractions.py
@@ -445,6 +445,10 @@ def reverse(b, a):
     # Indeed, pick (na//g1). It's coprime with (da//g2), because input
     # fractions are normalized. It's also coprime with (db//g1), because
     # common factors are removed by g1 == gcd(na, db).
+    #
+    # As for addition/subtraction, we should special-case g1 == 1
+    # and g2 == 1 for the same reason. That happens also when
+    # multiplying rationals obtained from floats.

     def _add_sub_(a, b, pm=int.__add__):
         na, da = a._numerator, a._denominator
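The coprimality argument in this comment can be checked empirically. A standalone sketch (my own helper `mul_parts`, not part of the patch) confirming that the pre-normalized numerator/denominator pair is already in lowest terms, so `_normalize=False` is safe:

```python
import math
import random
from fractions import Fraction

def mul_parts(a, b):
    """Multiply two Fractions Knuth-style and return the (n, d) pair
    *before* any final normalization."""
    na, da = a.numerator, a.denominator
    nb, db = b.numerator, b.denominator
    g1 = math.gcd(na, db)   # divides na and db exactly
    g2 = math.gcd(nb, da)   # divides nb and da exactly
    return (na // g1) * (nb // g2), (db // g1) * (da // g2)

# Empirical check of the coprimality claim over random normalized inputs:
# the resulting pair never shares a common factor.
random.seed(12345)
for _ in range(1000):
    a = Fraction(random.randint(-999, 999), random.randint(1, 999))
    b = Fraction(random.randint(-999, 999), random.randint(1, 999))
    n, d = mul_parts(a, b)
    assert math.gcd(n, d) == 1
```

The BPO example works out as `mul_parts(Fraction(10, 3), Fraction(6, 5)) == (4, 1)`: both cross gcds are nontrivial and the result needs no further reduction.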
@@ -474,9 +478,14 @@ def _mul(a, b):
         na, da = a._numerator, a._denominator
         nb, db = b._numerator, b._denominator
         g1 = math.gcd(na, db)
+        if g1 > 1:
Contributor:

I think the "> 1" tests should be removed. The cost of a test is in the same ballpark as the cost of a division. The code is clearer if you just always divide. The performance gain mostly comes from two small gcd's being faster than a single big gcd.

Contributor Author:

> I think the "> 1" tests should be removed.

Maybe. I didn't benchmark this closely, but...

> The cost of a test is in the same ballpark as the cost of a division.

This is a micro-optimization, yes. But ">" does not have the same cost as "//", even in CPython:

$ ./python -m timeit -s 'a = 1' -s 'b = 1' 'a > b'
5000000 loops, best of 5: 82.3 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111111111111' -s 'b = 1' 'a > b'
5000000 loops, best of 5: 86.1 nsec per loop
$ ./python -m timeit -s 'a = 11111' -s 'b = 1' 'a // b'
2000000 loops, best of 5: 120 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111' -s 'b = 1' 'a // b'
1000000 loops, best of 5: 203 nsec per loop
$ ./python -m timeit -s 'a = 1111111111111111111111111111111111111111111111' -s 'b = 1' 'a // b'
1000000 loops, best of 5: 251 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111111111122222222222222111111111111111' \
 -s 'b = 1' 'a // b'
1000000 loops, best of 5: 294 nsec per loop

Maybe my measurements are primitive, but there is some pattern...
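The shell one-liners above can also be reproduced with the `timeit` module directly. A sketch (absolute numbers are machine-dependent; only the relative gap between comparison and division is meaningful):

```python
import timeit

big = int("1" * 60)  # a 60-digit int, similar to the examples above

# Comparing an int against 1 is O(1) in CPython (sign and digit count are
# checked first), while floor division by 1 has to walk every digit of the
# numerator to build the (copied) result.
t_cmp = timeit.timeit("a > b", globals={"a": big, "b": 1}, number=100_000)
t_div = timeit.timeit("a // b", globals={"a": big, "b": 1}, number=100_000)

print(f"comparison: {t_cmp:.4f}s, division: {t_div:.4f}s")
```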

> The code is clearer if you just always divide.

I tried to explain the reasons for branching in the comment above the code. A careful reader might ask: why don't you include the same optimizations in _mul as in _add/_sub?

On the other hand, gmplib doesn't use these optimizations (cf. the same situation in mpz_add/mpz_sub), and neither does SymPy's PythonRational class.

So, I did my best defending those branches, but I'm fine with removing them if @tim-one has no arguments against that.

> The performance gain mostly comes from two small gcd's being faster than a single big gcd.

This is more important, yes.

Member:

> The cost of a test is in the same ballpark as the cost of a division.

? Comparing g to 1 takes, at worst, small constant time, independent of how many digits g has. But n // 1 takes time proportional to the number of internal digits in n - it's a slow, indirect way of making a physical copy of n. Since we expect g to be 1 most of the time, and these are unbounded ints, the case for checking g > 1 seems clear to me.

Contributor:

I don't mind either way.

If it were just me, I would leave out the '> 1' tests because:

  1. For small inputs, the code is about 10% faster without the guards.
  2. For random 100,000 bit numerators and denominators where the gcd's were equal to 1, my timings show no discernible benefit.
  3. The code and tests are simpler without the guards.
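For reference, the two variants being debated can be written side by side (hypothetical helper names `mul_guarded`/`mul_unguarded`, not from the PR). They always agree on the result, so the disagreement above is purely about speed:

```python
import math
from fractions import Fraction

def mul_guarded(a, b):
    # PR version: skip the divisions when the gcd is trivial.
    na, da = a.numerator, a.denominator
    nb, db = b.numerator, b.denominator
    g1 = math.gcd(na, db)
    if g1 > 1:
        na //= g1
        db //= g1
    g2 = math.gcd(nb, da)
    if g2 > 1:
        nb //= g2
        da //= g2
    return Fraction(na * nb, db * da)

def mul_unguarded(a, b):
    # Alternative: always divide; n // 1 is effectively a copy of n,
    # but the code is shorter and branch-free.
    na, da = a.numerator, a.denominator
    nb, db = b.numerator, b.denominator
    g1 = math.gcd(na, db)
    g2 = math.gcd(nb, da)
    return Fraction((na // g1) * (nb // g2), (db // g1) * (da // g2))

x, y = Fraction(7, 4), Fraction(2, 21)
assert mul_guarded(x, y) == mul_unguarded(x, y) == Fraction(1, 6)
```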

Member @tim-one (Mar 22, 2021):

That just doesn't make much sense to me. On my box just now, with the released 3.9.1:

C:\Windows\System32>py -m timeit -s "g = 1" "g > 1"
10000000 loops, best of 5: 24.2 nsec per loop

C:\Windows\System32>py -m timeit -s "g = 1; t = 1 << 100000" "g > 1; t // g"
10000 loops, best of 5: 32.4 usec per loop

So doing the useless division not only drove the per-iteration measure from 24.2 to 32.4, the unit increased by a factor of 1000(!), from nanoseconds to microseconds. That's using, by construction, an exactly 100,001 bit numerator.

And, no, removing the test barely improves that relative disaster at all:

C:\Windows\System32>py -m timeit -s "g = 1; t = 1 << 100000" "t // g"
10000 loops, best of 5: 32.3 usec per loop

BTW, as expected, boost the number of numerator bits by a factor of 10 (to a million+1), and per-loop expense also becomes 10 times as much; cut it by a factor 10, and similarly the per-loop expense is also cut by a factor of 10.

With respect to the BPO message you linked to, it appeared atypical: both gcds were greater than 1 in (10/3) * (6/5), so there was never any possibility of the > 1 tests paying off there (while in real life a gcd of 1 is probably most common, and when multiplying Fractions obtained from floats gcd == 1 is certain unless one of the inputs is an even integer; edited: this used to say 0, but it's broader than just that).
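The claim about Fractions obtained from floats is easy to confirm: a float's denominator is always a power of two, and its numerator is odd unless the float is an even integer, so the cross gcds in _mul are trivially 1. A quick check (not part of the PR):

```python
from fractions import Fraction
from math import gcd

a = Fraction(0.1)   # 3602879701896397 / 36028797018963968
b = Fraction(0.3)

# Denominators of float-derived Fractions are powers of two, and the
# numerators here are odd, so both cross gcds used by _mul are 1.
assert a.denominator & (a.denominator - 1) == 0   # power of two
assert gcd(a.numerator, b.denominator) == 1
assert gcd(b.numerator, a.denominator) == 1

# The even-integer exception mentioned above:
c = Fraction(4.0)   # 4/1 -- even numerator
assert gcd(c.numerator, a.denominator) > 1
```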

+            na //= g1
+            db //= g1
         g2 = math.gcd(nb, da)
-        return Fraction((na // g1) * (nb // g2),
-                        (db // g1) * (da // g2), _normalize=False)
+        if g2 > 1:
+            nb //= g2
+            da //= g2
+        return Fraction(na * nb, db * da, _normalize=False)

__mul__, __rmul__ = _operator_fallbacks(_mul, operator.mul)

@@ -486,8 +495,14 @@ def _div(a, b):
         na, da = a._numerator, a._denominator
         nb, db = b._numerator, b._denominator
         g1 = math.gcd(na, nb)
+        if g1 > 1:
+            na //= g1
+            nb //= g1
         g2 = math.gcd(db, da)
-        n, d = (na // g1) * (db // g2), (nb // g1) * (da // g2)
+        if g2 > 1:
+            da //= g2
+            db //= g2
+        n, d = na * db, nb * da
         if nb < 0:
             n, d = -n, -d
         return Fraction(n, d, _normalize=False)
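One subtlety in the _div hunk: with _normalize=False the Fraction constructor no longer canonicalizes the sign, so the final `if nb < 0` flip is what keeps the denominator positive. A standalone sketch of the same logic (hypothetical helper `div_parts`, not from the patch):

```python
import math
from fractions import Fraction

def div_parts(a, b):
    """Mirror the patched _div: return (n, d) with d > 0, in lowest terms."""
    na, da = a.numerator, a.denominator
    nb, db = b.numerator, b.denominator
    g1 = math.gcd(na, nb)
    if g1 > 1:
        na //= g1
        nb //= g1
    g2 = math.gcd(db, da)
    if g2 > 1:
        da //= g2
        db //= g2
    n, d = na * db, nb * da
    if nb < 0:              # keep the denominator positive, since
        n, d = -n, -d       # _normalize=False would not fix the sign
    return n, d

n, d = div_parts(Fraction(1, 2), Fraction(-3, 4))
assert (n, d) == (-2, 3)
assert Fraction(1, 2) / Fraction(-3, 4) == Fraction(n, d)
```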