ENH: Fast paths for richcompare #17970


Closed · ganesh-k13 wants to merge 13 commits from the enh_14415_fast_compare branch

Conversation

@ganesh-k13 (Member) commented Dec 9, 2020

Changes:

  1. Added a check to see if the RHS is a Python number, to prevent the creation of an array.
  2. Added a benchmark (sketched below).
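For context, the benchmark times scalar == Python-number comparisons across dtypes. A minimal sketch of what such an asv benchmark could look like; the class and method names come from the output below, while the body and iteration count are assumptions:

    import numpy as np

    class CustomScalarEqual:
        # Parametrized over the dtypes listed in the results below.
        params = [np.int8, np.int16, np.int32, np.int64,
                  np.uint8, np.uint16, np.uint32, np.uint64,
                  np.float32, np.float64]
        param_names = ["dtype"]

        def setup(self, dtype):
            self.x = dtype(1)

        def time_scalar_equality(self, dtype):
            # Repeatedly compare a numpy scalar against a Python int.
            for _ in range(10000):
                self.x == 1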

Benchmarks:
                                                                                                                   
********************************************************************************                                                                          
WARNING: you have uncommitted changes --- these will NOT be benchmarked!                                                                                  
********************************************************************************                                                                          
· Creating environments                                                      
· Discovering benchmarks                                                     
·· Uninstalling from virtualenv-py3.8-Cython                                 
·· Building 933620b4  for virtualenv-py3.8-Cython..................................                                                                                                                                                                                                          
·· Installing 933620b4  into virtualenv-py3.8-Cython.                                                                             
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)                                                                                  
[  0.00%] · For numpy commit 4d290795  (round 1/2):                                                                                               
[  0.00%] ·· Building for virtualenv-py3.8-Cython.                                                                                                                                                                                                                                                                   
[  0.00%] ·· Benchmarking virtualenv-py3.8-Cython                                                                                                         
[ 25.00%] ··· Running (bench_ufunc.CustomScalarEqual.time_scalar_equality--).                                                                             
[ 25.00%] · For numpy commit 933620b4  (round 1/2):                                                                               
[ 25.00%] ·· Building for virtualenv-py3.8-Cython.                                                                                                        
[ 25.00%] ·· Benchmarking virtualenv-py3.8-Cython                                                                                                         
[ 50.00%] ··· Running (bench_ufunc.CustomScalarEqual.time_scalar_equality--).                                                                             
[ 50.00%] · For numpy commit 933620b4  (round 2/2):                                                                               
[ 50.00%] ·· Benchmarking virtualenv-py3.8-Cython                                                                                                         
[ 75.00%] ··· bench_ufunc.CustomScalarEqual.time_scalar_equality                                                                                                              ok                                                                                                                                     
[ 75.00%] ··· =============== ==========                                     
                   dtype                                                                                                                                  
              --------------- ----------                                     
                 numpy.int8    439±6ms                                       
                numpy.int16    445±8ms                                       
                numpy.int32    435±10ms                                      
                numpy.int64    147±2ms                                       
                numpy.uint8    427±10ms                                      
                numpy.uint16   444±8ms                                       
                numpy.uint32   437±10ms                                      
                numpy.uint64   447±4ms                                       
               numpy.float32   471±10ms                                      
               numpy.float64   180±4ms                                                                                                                    
              =============== ==========                                                                                                                  
                                                                             
[ 75.00%] · For numpy commit 4d290795  (round 2/2):                                                                                               
[ 75.00%] ·· Building for virtualenv-py3.8-Cython.                                 
[ 75.00%] ·· Benchmarking virtualenv-py3.8-Cython                                                                                                         
[100.00%] ··· bench_ufunc.CustomScalarEqual.time_scalar_equality                                                                                                              ok                                                                                                                                     
[100.00%] ··· =============== ============                                   
                   dtype                                                     
              --------------- ------------                                   
                 numpy.int8     1.35±0s                                      
                numpy.int16    1.34±0.01s                                    
                numpy.int32    1.34±0.02s                                    
                numpy.int64     147±3ms                                      
                numpy.uint8     1.34±0s                                      
                numpy.uint16    1.36±0s                                      
                numpy.uint32    1.35±0s                                      
                numpy.uint64   1.54±0.01s                                    
               numpy.float32    1.54±0s                                      
               numpy.float64    184±4ms                                      
              =============== ============                                   

       before           after         ratio                                  
     [4d290795]       [933620b4]                                             
                                                                                                                          
-      1.34±0.01s          445±8ms     0.33  bench_ufunc.CustomScalarEqual.time_scalar_equality()                                                                                                                                                                                               
-         1.36±0s          444±8ms     0.33  bench_ufunc.CustomScalarEqual.time_scalar_equality()                                                                                                                                                                                              
-      1.34±0.02s         435±10ms     0.33  bench_ufunc.CustomScalarEqual.time_scalar_equality()                                                                                                                                                                                               
-         1.35±0s          439±6ms     0.33  bench_ufunc.CustomScalarEqual.time_scalar_equality()                                                                                                                                                                                                
-         1.35±0s         437±10ms     0.32  bench_ufunc.CustomScalarEqual.time_scalar_equality()                                                                                                                                                                                              
-         1.34±0s         427±10ms     0.32  bench_ufunc.CustomScalarEqual.time_scalar_equality()                                                                                                                                                                                               
-         1.54±0s         471±10ms     0.31  bench_ufunc.CustomScalarEqual.time_scalar_equality()                                                                                                                                                                                             
-      1.54±0.01s          447±4ms     0.29  bench_ufunc.CustomScalarEqual.time_scalar_equality()                                                                                                                                                                                              

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.                                  
PERFORMANCE INCREASED.                                                       

resolves: #14415

cc: @seberg , @eric-wieser , @mattip

@seberg (Member) commented Dec 9, 2020

Thanks, I will have to think about it more. I do guess that it might be nice to intercept some of this even earlier, but I do not mind injecting a fast-path there if it is practical and doesn't add much complexity.

One thing to think about: .item() has two awkward cases:

  1. Low precision floating point numbers are upcast to float64, I expect that is fine, but did not think it through.
  2. High precision floating point numbers (longdouble, clongdouble) are unchanged, that means the fast-path is definitely broken for them. Maybe that is the test-failure, maybe not.

(To be fair, the second part is already broken in some similar paths, but this would break it more/worse.)
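A quick illustration of the two cases (a sketch; longdouble's .item() behavior is as described above):

    import numpy as np

    # Case 1: low-precision floats come back as a Python float (a C double):
    print(type(np.float16(1.5).item()))     # <class 'float'>
    print(type(np.float32(1.5).item()))     # <class 'float'>

    # Case 2: longdouble is returned unchanged, so an .item()-based
    # fast path never reaches a native Python type here:
    print(type(np.longdouble(1.5).item()))  # <class 'numpy.longdouble'>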

@ganesh-k13 (Member, Author)

Thanks for the review @seberg , will test it out 👍

ganesh-k13 added a commit to ganesh-k13/numpy that referenced this pull request Dec 10, 2020
@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch from fcba612 to 3d1f4cd Compare December 10, 2020 10:16
@ganesh-k13 (Member, Author) commented Dec 10, 2020

I tried something like this:

* #Name = Long*11, Float*3,,,#
* #fastpath = 1*14, 0*3#

.....

    case -2:
        /* use ufunc */
        if (PyErr_Occurred()) {
            return NULL;
        }
#if @fastpath@
        if (PyArray_IsPythonNumber(other)) {
            /* Fast path: compare via the equivalent Python scalar. */
            PyObject *scalar, *result;
            scalar = Py@Name@_From@Name@(arg1);
            if (scalar == NULL) {
                return NULL;
            }
            result = PyObject_RichCompare(scalar, other, cmp_op);
            Py_DECREF(scalar);
            if (result == NULL) {
                return NULL;
            }
            out = PyObject_IsTrue(result);
            Py_DECREF(result);
            if (out == -1) {
                return NULL;
            }
            goto done;
        }
#endif
        return PyGenericArrType_Type.tp_richcompare(self, other, cmp_op);

But it fails for np.int64(-1) == 18446744073709551615 because arg1 becomes -1. Even if I replace the code with .item I get the same; is there anything I can do differently here?

[EDIT]
This is where the code above goes in (I had pasted power code before, my bad):

    case -2:
        /* use ufunc */
        if (PyErr_Occurred()) {
            return NULL;
        }
        return PyGenericArrType_Type.tp_richcompare(self, other, cmp_op);
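The failure boils down to the signedness of the C-level value: the unsigned maximum and -1 share one bit pattern. A sketch at the Python level (behavior as of this discussion; newer NumPy deprecates out-of-bounds construction such as np.uint64(-1)):

    import numpy as np

    u = np.uint64(18446744073709551615)  # all 64 bits set, i.e. uint64(-1)
    print(int(u))                        # 18446744073709551615

    # Reinterpreting the same bits as signed gives -1, which is what a
    # fast path built on Py@Name@_From@Name@(arg1) with a C long sees:
    print(u.astype(np.int64))            # -1
    print(-1 == 2**64 - 1)               # False, the fast path's wrong answer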

@seberg (Member) commented Dec 10, 2020

Ah, right, I am not sure you need the * #fastpath = 1*14, 0*3# addition, because longdouble takes the -3 case?

Why are you converting the obj to the scalar instead of the other way around? Your old code with item = gentype_generic_method(self, NULL, NULL, "item"); made more sense to me (albeit it should be replaced with the correct C-side incarnation).

It all looks weird, but when other is an arbitrary Python object, and the numpy scalar is not longdouble/clongdouble, we actually currently already do:

  1. Call ufuncs
  2. Convert both into an array
    • The second object will be an object dtype array
  3. Cast the original scalar (now an array) to object so that it has the same type as the other
    • This cast, for the main builtin numerical types, actually does arr.item() and not just arr[()].

So in the end we already operate on Python object types, just in a very slow and convoluted way.
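In rough Python pseudocode, the current path does something like this (a sketch of the steps above, not the actual C control flow):

    import numpy as np

    def slow_richcompare(scalar, other):
        # 1. The comparison ufunc is called, which
        # 2. converts both operands to arrays; the arbitrary Python
        #    object becomes a 0-d object-dtype array,
        a = np.asarray(scalar)
        b = np.asarray(other, dtype=object)
        # 3. and the scalar's array is cast to object dtype to match;
        #    for the builtin numeric types this cast effectively does
        #    a.item() rather than a[()].
        a = a.astype(object)
        # The comparison therefore happens between Python objects anyway.
        return a == b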

@ganesh-k13 (Member, Author) commented Dec 10, 2020

Oh I see, yeah, I missed the _convert2_to_ctypes logic. gentype_generic_method was static to the file, hence I needed the conversion to a scalar to call its richcompare; hence the obj-to-scalar direction.

Given that the original commit is showing significant gains in performance, I'll dig in a bit to see where we are slowing down.

albeit it should be replaced with the correct C-side incarnation

Yeah, getting the item the C way, probably by moving the previous code up a function call, should do it. Will do so after reading a bit.

@seberg (Member) commented Dec 10, 2020

Ohh, I think you are testing mixed types? I.e. one of them is an arbitrary type, the other is always a default integer (probably C long?)... I.e. int8 + int64 is super slow, but int64 + int8 is fine.

IIRC those paths were messed up, because we did not have a correct "hierarchy" in the code. I tried to clean it up, but it turned into a pretty big PR and I never opened it; I should probably reactivate it. The problem is that what we should be doing is:

if other.dtype.type_num > self.dtype.type_num:
    return NotImplemented

(At least that seemed like the easiest solution.) Right now I think what happens is that if the first dtype is the smaller type, we will always end up in the super-slow path. With the return NotImplemented fix, we at least only end up in the super-slow path if (a + b).dtype is not in (a.dtype, b.dtype), which is much better.
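Concretely, the condition can be read off the promotion rules (a sketch using np.promote_types):

    import numpy as np

    # Promotion keeps one of the input dtypes, so with the
    # NotImplemented fix a single deferral suffices:
    print(np.promote_types(np.int8, np.int64))    # int64

    # Here the promoted dtype is neither input, so the super-slow
    # path would still be taken:
    print(np.promote_types(np.int64, np.uint64))  # float64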

More complex or generic schemes seemed like over-engineering to me, as they would have to duplicate a lot of ufunc logic which is likely not going to end up very light-weight in any case.

I would love it if you can find a more minimal fix, but in many ways I think this code requires an overhaul :(. Which means rewriting a bunch of it.

@ganesh-k13 (Member, Author)

I started with your branch and resolved the conflicts (not logical conflicts, as it does not compile yet :)) here: ganesh-k13#1.

I'll be happy to resume work on it and finish it if you feel it helps this part of the code.

@seberg (Member) commented Dec 10, 2020

I feel it would help a lot, but I don't quite remember the state :). This code hasn't been touched in more than a decade, at least not seriously. It is super convoluted, and I at least think my branch made the logic quite a bit easier to grasp (and probably actually fixes a few bugs).

@ganesh-k13 (Member, Author)

Ohh I see. Cool yeah, I'll work on that branch. Thanks.

@ganesh-k13 ganesh-k13 marked this pull request as draft December 10, 2020 17:00
@ganesh-k13 (Member, Author)

Hey @seberg, I leveraged some stuff from the branch and added a fast path in the -1 case, where it's safe to cast. Let me know your thoughts on this.

@ganesh-k13 (Member, Author)
       before           after         ratio
     [4d290795]       [3115c165]
     <master>         <enh_14415_fast_compare>
-      23.7±0.1μs      11.7±0.04μs     0.49  bench_scalar.ScalarMath.time_compare('int32')
-      23.9±0.3μs       11.7±0.2μs     0.49  bench_scalar.ScalarMath.time_compare('int16')
-      25.5±0.3μs       12.4±0.2μs     0.49  bench_scalar.ScalarMath.time_compare('complex64')
-     25.5±0.05μs      11.9±0.09μs     0.47  bench_scalar.ScalarMath.time_compare('float32')
-      26.1±0.4μs      11.9±0.05μs     0.45  bench_scalar.ScalarMath.time_compare('float16')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

ganesh-k13 added a commit to ganesh-k13/numpy that referenced this pull request Dec 16, 2020
@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch 2 times, most recently from 51aeb7c to 52e8085 Compare December 16, 2020 16:48
@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch from d945e26 to dc18734 Compare December 21, 2020 04:33
@ganesh-k13 (Member, Author)

Took some time to repro the issue; we hit a NULL pointer dereference (Py_DECREF on NULL) in complex number cases. Will fix it.

@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch from 3055f37 to e9a69eb Compare December 25, 2020 18:13
@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch from e9a69eb to 08dccbb Compare January 11, 2021 12:47
@ganesh-k13 (Member, Author)

Hi @seberg, sorry for the long pause in the PR.
I noticed the following test case causing a crash in a specific environment; I think handling the return value of self_descr for overflow should take care of it.
Now to the overflow:

    types = np.sctypes['uint']
    for T in types:
        assert_equal(iinfo(T).max, T(-1))

Is it defined behaviour to do np.uint(-1)?

@charris (Member) commented Jan 11, 2021

Is it defined behaviour to do np.uint(-1)?

I think so, although it may not be formalized. C allows it, but maybe we should make an explicit choice.

@seberg (Member) commented Jan 11, 2021

Let's say that it was once explicitly implemented that uint(-1) will work, so for now I will say yes. For comparisons it would be best if this can't happen (the only problematic thing is probably uint64 == int64 using float and losing precision, though).

It is something that we could think about changing, but I am not sure it is worth the trouble, and how annoying the workaround is (since using -1 for maxint is not uncommon, even if technically undefined by the C standard).
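The precision-loss case mentioned above, sketched in Python (results as under the promotion rules at the time, where uint64 with int64 promotes to float64):

    import numpy as np

    print(np.promote_types(np.uint64, np.int64))    # float64

    # float64 cannot represent every 64-bit integer, so nearby values
    # could compare equal after promotion:
    print(np.uint64(2**63) == np.int64(2**63 - 1))  # True, due to rounding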

BUG: handled ret and overflow

BUG: Fixed crash in do_richcompare_on_scalars | Added complex check on fastpath

BUG: Added scalar check for self

MAINT: Refactor PyLong descr creation

MAINT: Moved header location

MAINT: Added XDECREF | Refactor complex equality check

BUG: Fixed error in complex check logic
@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch from 2d87920 to 13e9484 Compare February 10, 2021 09:03
@ganesh-k13 (Member, Author)

Updated bench for changes:

       before           after         ratio
     [f6a71fea]       [13e94849]
     <master>         <enh_14415_fast_compare>
+         455±2ns         545±60ns     1.20  bench_scalar.ScalarMath.time_abs('longfloat')
+         490±7ns        585±100ns     1.19  bench_scalar.ScalarMath.time_addition('int64')
+         539±6ns         579±20ns     1.07  bench_scalar.ScalarMath.time_addition('complex256')
+       538±0.6ns         571±20ns     1.06  bench_scalar.ScalarMath.time_addition('longfloat')
+         485±1ns          513±6ns     1.06  bench_scalar.ScalarMath.time_multiplication('float64')
+         483±5ns         511±20ns     1.06  bench_scalar.ScalarMath.time_addition('float64')
+       456±0.7ns          482±3ns     1.06  bench_scalar.ScalarMath.time_abs('complex128')
+         489±6ns          515±5ns     1.05  bench_scalar.ScalarMath.time_multiplication('int64')
-      82.9±0.6μs       54.8±0.5μs     0.66  bench_scalar.ScalarMath.time_compare_types('float64')
-         128±4μs         75.4±2μs     0.59  bench_scalar.ScalarMath.time_compare_types('int64')
-         152±7μs         76.3±2μs     0.50  bench_scalar.ScalarMath.time_compare_types('int32')
-         117±8μs       58.7±0.2μs     0.50  bench_scalar.ScalarMath.time_compare_types('complex64')
-         161±4μs         77.5±1μs     0.48  bench_scalar.ScalarMath.time_compare_types('int16')
-         129±2μs       59.1±0.4μs     0.46  bench_scalar.ScalarMath.time_compare_types('float32')
-     19.6±0.05μs       8.61±0.1μs     0.44  bench_scalar.ScalarMath.time_compare('int32')
-      20.0±0.4μs       8.36±0.2μs     0.42  bench_scalar.ScalarMath.time_compare('int16')
-      21.7±0.5μs      8.94±0.03μs     0.41  bench_scalar.ScalarMath.time_compare('complex64')
-      21.8±0.1μs       8.73±0.2μs     0.40  bench_scalar.ScalarMath.time_compare('float32')

@seberg seberg self-requested a review February 10, 2021 15:34
    is_complex_operands = (PyTypeNum_ISCOMPLEX(self_descr->type_num) ||
                           PyTypeNum_ISCOMPLEX(other_descr->type_num));
    is_equality_operator = (cmp_op != Py_EQ && cmp_op != Py_NE);
    is_python_scalar = PyDescr_PythonCRepresentable(self_descr) ||
@ganesh-k13 (Member, Author):

I think less confusing names would be is_not_* and PyDescr_Not*; will fix. The aim is basically to check whether the value can be represented as a native C type that Python allows.

Member:

No need for the "C" probably, we are aiming for converting (safely) to a native Python type?

@ganesh-k13 (Member, Author):

Yes indeed; also, the PyArray_Is* functions that are already present do not have a proper check for native Python types.

@ganesh-k13 (Member, Author) commented Feb 10, 2021:

The problem being that, if we cannot represent the value as a PyObject (via a C-Python API), the richcompare seems to cause problems, long double being one of the victims.
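At the Python level, the distinction being drawn is roughly the following (a sketch; PyDescr_PythonCRepresentable is this PR's own helper, and the last result depends on the platform's long double):

    import numpy as np

    # Values of these types round-trip through a native Python scalar:
    print(type(np.int32(7).item()))      # <class 'int'>
    print(type(np.float64(1.5).item()))  # <class 'float'>

    # longdouble may carry more precision than a Python float, so there
    # is no lossless native representation to fast-path through:
    x = np.longdouble(1) / np.longdouble(3)
    print(float(x) == x)                 # False where long double > double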

@ganesh-k13 (Member, Author) commented Feb 10, 2021

Also, I found a SIGABRT in the bench where I added a check for a type compared with all other dtypes here, but no crash in the UTs. In particular, it means our tests never compare a clongdouble with an int. I feel we must add this test somewhere. Any good location to add this?

EDIT: I fixed this case by adding the 1 or 0 check for the return of descr_from_basic_scalar in this PR.
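A minimal sketch of such a regression test (name, placement, and values are assumptions, not the PR's actual test):

    import numpy as np

    def test_clongdouble_int_richcompare():
        # Compare an extended-precision complex scalar against a plain
        # Python int, on both sides of the operator.
        c = np.clongdouble(2 + 0j)
        assert c == 2
        assert 2 == c
        assert np.clongdouble(1j) != 2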

@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch from 13e9484 to ae8d903 Compare February 12, 2021 06:04
@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch from 9bcb421 to 5a6cc59 Compare February 15, 2021 13:19
@ganesh-k13 ganesh-k13 force-pushed the enh_14415_fast_compare branch from 5a6cc59 to 3b6ace3 Compare February 16, 2021 04:06
@ganesh-k13 (Member, Author)

Hey @seberg, let me know any other changes needed here. Thanks.

@ganesh-k13 (Member, Author)

I think the main pending items from my side are the tests, overflow, and a few other combinations; will add them.

Base automatically changed from master to main March 4, 2021 02:05
@seberg (Member) commented Mar 7, 2021

Thanks @ganesh-k13, I will prioritize this a bit soon. gh-18548 also makes this a bit timely; although it is probably somewhat orthogonal, it struggles with similar problems...

@seberg (Member) commented Mar 10, 2021

For myself, maybe it does not matter: but if I get it right, doing this would basically cement the behaviour in gh-10322 for scalars. I really hope we can get away with fixing gh-10322 though, because if we can't, the whole "weak python scalar" idea might break down. And if it breaks down, I am left in shambles of confusion, because there is no likely chance of a consistent future :(.

@ganesh-k13 (Member, Author)

Yeah Sebastian, I liked this idea from that discussion:

It does seem to me array scalars should always be treated like arrays.

Though this PR does seem to give a massive boost to scalars, it does feel like a patch-up rather than something actually intended by design. Let's have a wider discussion on the future of scalars to decide a more solid, or rather uniform, way of handling them.

@seberg (Member) commented Nov 3, 2022

Going to close, since the merge conflicts will be large now. There are still things that this tries to speed up that have not been sped up in the meantime (e.g. 1j == np.float64(1)), but a few other paths were improved anyway.
Hope you don't mind; if you do, just reopen :).

@seberg seberg closed this Nov 3, 2022

Successfully merging this pull request may close these issues.

Comparisons between numpy scalars and python scalars are very slow
3 participants