Comparisons between numpy scalars and python scalars are very slow · Issue #14415 · numpy/numpy · GitHub

Closed
atg opened this issue Sep 3, 2019 · 5 comments

@atg
atg commented Sep 3, 2019

Comparisons between numpy scalars and python scalars are very slow. It's an order of magnitude faster to do int(npint) == pyint than npint == pyint.

It seems to me that npint == pyint should at worst be no slower, since numpy could just call int() for me?

The slowdown was apparent with all the types I tried including uint8, int32, float32 and uint64, but not int64 and float64, which are fast.

Reproducing code example

import numpy as np
import time

N = 1024 * 32
T = np.int32

def testPvP():
  y = 0
  x = T(y)
  t0 = time.perf_counter()
  for i in range(N):
    int(x) == y
  t = time.perf_counter() - t0
  print('int(np) == py:', t * 1000, 'ms')

def testNvP():
  y = 0
  x = T(y)
  t0 = time.perf_counter()
  for i in range(N):
    x == y
  t = time.perf_counter() - t0
  print('     np == py:', t * 1000, 'ms')

for i in range(5):
  testPvP()
  testNvP()
  print()

int(np) == py: 7.214172000000005 ms
     np == py: 50.713307000000015 ms

int(np) == py: 6.769899999999995 ms
     np == py: 51.27685699999998 ms

int(np) == py: 6.717130000000015 ms
     np == py: 55.45432100000003 ms

int(np) == py: 7.091216000000012 ms
     np == py: 51.84300200000003 ms

int(np) == py: 6.472694000000001 ms
     np == py: 55.38991200000004 ms

Numpy/Python version information:

1.17.0 3.7.4 (default, Jul 13 2019, 14:20:24)
[GCC 6.3.0 20170516]
@seberg
Member
seberg commented Sep 4, 2019

It is a bit more complex than that, since numpy scalars behave like arrays in comparisons. For example, you can compare one to a list and the list will be coerced to an array.

That said, for comparisons there seem to be no fast paths, meaning that everything is always coerced to an array first. Such fast paths could be added to narrow the performance gap.
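
A minimal sketch of the array-like comparison behavior described above; the variable names are illustrative only and not taken from the NumPy source:

```python
import numpy as np

x = np.int32(1)

# Comparing a NumPy scalar with a Python list behaves like an array
# comparison: the list is coerced to an array and compared element-wise.
result = x == [1, 2, 3]
print(type(result).__name__, result.tolist())

# A plain Python int compared with a list just yields a single bool.
plain = int(x) == [1, 2, 3]
print(type(plain).__name__, plain)
```

This is why simply calling int() internally would not be a drop-in replacement: the scalar comparison has to keep supporting array-like operands.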

@ganesh-k13
Member

Hey @seberg, I can see this line is where we initialize the array:

arr = PyArray_FromScalar(self, NULL);

Do you think it would be worthwhile to add a check for whether we actually need to create an array for the comparison? I'm willing to try some edits and bring some performance data so we can make a call.

@seberg
Member
seberg commented Nov 14, 2020

@ganesh-k13 there is a lot more to unravel there, since for the important things it is actually done here:

@name@_richcompare(PyObject *self, PyObject *other, int cmp_op)

I had an older branch where I started to think about cleaning it up; I think it is this one: https://github.com/numpy/numpy/compare/master...seberg:scalar-binop-giveup-in-fallback?expand=1. That might also make this path faster. There is a lot going on there though (and the branch doesn't currently apply to master).

I have given up on the branch for now, not because it's not worth cleaning up, but because I figured it is not really a necessary cleanup for my dtypes work.

If you want to look at this and get confused, I would be happy to help you out or do a video call.

@ganesh-k13
Member

Thanks a lot for the info @seberg, I'll explore that part more and get back 👍.

@seberg
Member
seberg commented Nov 21, 2022

I am sure there is more to do here, but note that with NEP 50 activated, it is mostly fast now. Of course not quite as fast as it could be, but faster. The reason is that the NumPy dtype is preserved, i.e. with:

np._set_promotion_state("weak")  # On current main/NumPy 1.24

The timings for me are (int64 would be native, not int32):

int(np) == py: 2.8434160631150007 ms
     np == py: 1.5849580522626638 ms

int(np) == py: 2.695708069950342 ms
     np == py: 1.4313339488580823 ms

int(np) == py: 2.4105830816552043 ms
     np == py: 1.411624951288104 ms

int(np) == py: 2.41049996111542 ms
     np == py: 1.4116250677034259 ms

int(np) == py: 2.2635000059381127 ms
     np == py: 1.280000084079802 ms

So if we do weak promotion for scalars, further improvements can still be made, but the main issue here is solved.
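
For anyone who wants to re-check this on their own NumPy version, a rough sketch using timeit across the dtypes mentioned in the original report could look like the following. The helper name time_comparisons is hypothetical, and the numbers will vary by machine and NumPy version:

```python
import timeit

import numpy as np

N = 1024 * 32  # same iteration count as the original reproducer


def time_comparisons(dtype, n=N):
    """Return (int(np) == py, np == py) timings in milliseconds.

    Hypothetical helper for illustration; not part of NumPy.
    """
    x = dtype(0)
    y = 0
    t_py = timeit.timeit(lambda: int(x) == y, number=n)
    t_np = timeit.timeit(lambda: x == y, number=n)
    return t_py * 1000, t_np * 1000


results = {}
for dtype in (np.uint8, np.int32, np.int64, np.float32, np.float64):
    results[dtype.__name__] = time_comparisons(dtype)
    t_py, t_np = results[dtype.__name__]
    print(f"{dtype.__name__:>8}: int(np) == py {t_py:.3f} ms, np == py {t_np:.3f} ms")
```

The lambda adds call overhead to both measurements equally, so the ratio between the two columns is more meaningful than the absolute values.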

seberg closed this as completed Nov 21, 2022