ENH: Fast paths for richcompare #17970
Conversation
Thanks, I will have to think about it more. I do guess that it might be nice to intercept some of this even earlier, but I do not mind injecting a fast path where it is practical and doesn't add much complexity. One thing to think about: …
(To be fair, the second part is already broken in some similar paths, but this would break it further.)
Thanks for the review @seberg, will test it out 👍
I tried something like this:

```c
 * #Name = Long*11, Float*3,,,#
 * #fastpath = 1*14, 0*3#
.....
        case -2:
            /* use ufunc */
            if (PyErr_Occurred()) {
                return NULL;
            }
#if @fastpath@
            if (PyArray_IsPythonNumber(other)) {
                PyObject *scalar, *result;
                scalar = Py@Name@_From@Name@(arg1);
                result = PyObject_RichCompare(scalar, other, cmp_op);
                if (result == NULL) {
                    return NULL;
                }
                out = PyObject_IsTrue(result);
                Py_DECREF(result);
                if (out == -1) {
                    return NULL;
                }
                goto done;
            }
#endif
            return PyGenericArrType_Type.tp_richcompare(self, other, cmp_op);
```

But it fails for [EDIT] numpy/numpy/core/src/umath/scalarmath.c.src, lines 1319 to 1324 (commit 91d9bbe).
Ah, right, I am not sure you need the … Why are you converting the …? It all looks weird, but when …
So in the end we already operate on Python …
Oh I see, yeah, I missed the … Given that the original commit is showing significant gains in performance, I'll dig in a bit to see where we are slowing down.
Ohh, I think you are testing mixed types? I.e. one of them is an arbitrary type, the other is always a default integer (probably …). IIRC those paths were messed up, because we did not have a correct "hierarchy" in the code. I tried to clean it up, but it turned out a pretty big PR and I never opened it; I should probably reactivate it. There was a problem in that what we should be doing is: …
(At least that seemed like the easiest solution.) Right now I think what happens is that if the first dtype is the smaller type, we will always end up in the super-slow path. With the … More complex or generic schemes seemed like over-engineering to me, as they would have to duplicate a lot of ufunc logic, which is likely not going to end up very light-weight in any case. I would love it if you can find a more minimal fix, but in many ways I think this code requires an overhaul :(. Which means rewriting a bunch of it.
I started with your branch and resolved the conflicts (not logical conflicts, as it does not compile yet :)) here: ganesh-k13#1. I'll be happy to resume work on it and finish it if you feel it helps this part of the code.
I feel it would help a lot, but I don't quite remember the state :). This code hasn't been touched in more than a decade, at least not seriously. The code is super convoluted, and I at least think it made the logic quite a bit easier to grasp (and probably actually fixes a few bugs).
Ohh I see. Cool yeah, I'll work on that branch. Thanks.
Hey @seberg, I leveraged some stuff from the branch and added a fast path in the …
Took some time to repro the issue: we hit a NULL pointer dereference (Py_DECREF on NULL) in the complex number cases. Will fix it.
Hi @seberg, sorry for the long pause in the PR.
numpy/numpy/core/tests/test_getlimits.py, lines 68 to 70 (commit a190258):
Is it defined behaviour to do np.uint(-1)?
I think so, although it may not be formalized. C allows it, but maybe we should make an explicit choice. |
Let's say that it was once explicitly implemented that … It is something that we could think about changing, but I am not sure it is worth the trouble, or how annoying the workaround is (since using -1 for maxint is not uncommon, even if technically undefined by the C standard.)
- BUG: handled ret and overflow
- BUG: Fixed crash in do_richcompare_on_scalars | Added complex check on fastpath
- BUG: Added scalar check for self
- MAINT: Refactor PyLong descr creation
- MAINT: Moved header location
- MAINT: Added XDECREF | Refactor complex equality check
- BUG: Fixed error in complex check logic
2d87920
to
13e9484
Compare
Updated bench for changes:
```c
is_complex_operands = (PyTypeNum_ISCOMPLEX(self_descr->type_num) ||
                       PyTypeNum_ISCOMPLEX(other_descr->type_num));
is_equality_operator = (cmp_op != Py_EQ && cmp_op != Py_NE);
is_python_scalar = PyDescr_PythonCRepresentable(self_descr) ||
```
I think a less confusing name would be is_not_* and PyDescr_Not*, will fix; the aim is basically to check if the value can be represented as a native C type that Python allows.
No need for the "C" probably, we are aiming for converting (safely) to a native Python type?
Yes indeed; also the PyArray_Is* functions that are already present do not have a proper check for native Python types.
The problem being that, if we cannot represent the value in a PyObject, the richcompare seems to cause problems, long double being one of the victims.
[EDIT] PyObject via a C-Python API.
Also, I found a SIGABRT in the bench where I added a check for a type compared with all other dtypes. But no crash in the UTs. In particular, it means we lack a test where a … EDIT: I fixed this case by adding the …
Hey @seberg, let me know any other changes needed here. Thanks.
I think the main pending items from my side are the tests, overflow, and a few other combinations; will add them.
Thanks @ganesh-k13, I will prioritize this a bit soon. gh-18548 also makes this a bit timely; although it is probably somewhat orthogonal, it struggles with similar problems...
For myself, maybe it does not matter: but if I get it right, doing this would basically cement the behaviour in gh-10322 for scalars. I really hope we can get away with fixing gh-10322 though, because if we can't, the whole "weak python scalar" idea might break down. And if it breaks down, I am left with shambles of confusion, because there is no likely chance of a consistent future :(.
Yeah Sebastian, I liked this idea from that discussion:
Though this PR does seem to give a massive boost to scalars, it does feel like a patch-up rather than something actually intended by design. Let's have a wider discussion on the future of scalars to decide a more solid, or rather uniform, way of handling them.
Going to close, since the merge conflicts will be large now. There are still things that this tries to speed up that are not sped up in the meantime (e.g. …).
Changes:
Benchmarks:
resolves: #14415
cc: @seberg, @eric-wieser, @mattip