more small-array performance improvements #4904
@@ -4355,7 +4368,8 @@ PyUFunc_FromFuncAndDataAndSignature(PyUFuncGenericFunction *func, void **data,
     }
     ufunc->doc = doc;

-    ufunc->op_flags = PyArray_malloc(sizeof(npy_uint32)*ufunc->nargs);
+    ufunc->op_flags = PyArray_malloc(sizeof(npy_uint32)*(ufunc->nargs +
+                                                         NPY_NTYPES));
Can we do this in a cleaner way? E.g.
typedef struct {
PyUFuncObject public;
/* extra fields here */
} PyUFuncObjectPrivate;
and then they can be cast back and forth to each other so long as we're careful to always allocate the full thing? (This does assume that subclassing ufunc is illegal though.)
Or if that's not doable for some reason, at least a few helpers to get/set this table?
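A minimal sketch of the wrapper idea suggested above, assuming a simplified stand-in for the public struct. `PublicUFunc`, `PrivateUFunc`, `ufunc_new`, `ufunc_private`, and `ufunc_free` are hypothetical names for illustration and do not reflect NumPy's actual layout or API. The cast is only safe because the public struct is the first member and every object is allocated at the full private size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the public ufunc struct; not NumPy's real layout. */
typedef struct {
    int nargs;
    const char *name;
} PublicUFunc;

/* Private wrapper: the public struct must be the first member so that a
 * pointer to the wrapper is also a valid pointer to the public part. */
typedef struct {
    PublicUFunc base;
    int jump_table[4];   /* extra fields hidden from users */
} PrivateUFunc;

/* Always allocate the full private struct and hand out only the public view. */
static PublicUFunc *ufunc_new(int nargs, const char *name)
{
    PrivateUFunc *priv = calloc(1, sizeof(PrivateUFunc));
    if (priv == NULL) {
        return NULL;
    }
    priv->base.nargs = nargs;
    priv->base.name = name;
    return &priv->base;
}

/* Helper to reach the private part; only valid for objects from ufunc_new. */
static PrivateUFunc *ufunc_private(PublicUFunc *pub)
{
    assert(offsetof(PrivateUFunc, base) == 0);
    return (PrivateUFunc *)pub;
}

/* Release an object created by ufunc_new. */
static void ufunc_free(PublicUFunc *pub)
{
    free(ufunc_private(pub));
}
```

As the follow-up comments note, this breaks down if a caller passes in its own copy of the old, smaller structure, since the cast would then read past the end of that object.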
hm thinking about it this approach does not really help as we would also need some marker to indicate that field exists.
actually that object has been extended a few times (e.g. in 1.7) so I guess nobody actually relies on its size not growing. So we could just stuff a pointer to a private object into the end.
hm that also won't work as we might get passed in some user copy of the old structure...
also it would break forward compatibility with cython. we don't provide that anyway, but it would be nice to avoid breaking it for something that might turn out to be temporary.
I guess a second independent lookup table based on the code generator would work
@juliantaylor Are you OK with this in its current state?
not in the current state, it breaks ABI anyway so we might as well increase the structure size (causing more problems for cython forward compat)
We could do more to hide the structure, but such things take time as we need to coordinate with cython. Which is to say we should try to plan and get started early ;) I'm thinking of separating out the numpy ABI into something more like a library. What are your thoughts on that?
could we use the check_return which seems to be unused to flag extension? possibly a special iter flag would work too? or should we just not care about breaking the abi of that structure? we seem to have done so lots of times in the last couple releases.
Check first character of ufunc name before attempting a full string compare. This improves scalar operations performance slightly.
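A minimal sketch of the first-character check described in this commit message. `name_matches` is a hypothetical helper written for illustration, not the function touched by the commit.

```c
#include <string.h>

/* Returns 1 if name equals target, 0 otherwise. Comparing the first
 * character rejects most mismatches before strcmp is ever called.
 * Both arguments must be valid NUL-terminated strings. */
static int name_matches(const char *name, const char *target)
{
    return name[0] == target[0] && strcmp(name, target) == 0;
}
```

Names that share a first character with the target (for example "absolute" against "add") still fall through to the full compare, so the result is the same as a plain `strcmp` test.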
The creation and parsing of the type tuple is slower than the array result path used by binary ufuncs. For the most common reductions add a fast path skipping the type tuple creation and sending it through the array result path. Improves small reduction performance of these types by 5%-10%.
updating unconditionally caused a ~5% performance regression for scalar operations.
Add a jump table to the ufunc object that points to the first entry of each basic type, and use it to skip over uninteresting inner loops. For the add_signatures this skips 13 loop iterations and improves scalar performance by 10%. The jump table is placed behind a dynamically allocated object already present in the ufunc object so the ABI is preserved.
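A minimal sketch of the jump-table idea, under simplifying assumptions: loops are matched by exact type number (the real loop selection also considers casting), and `Loop`, `N_TYPES`, `build_jump_table`, and `find_loop` are hypothetical names, not NumPy identifiers. The `steps` output exists only to show how many entries the search inspects.

```c
#include <stddef.h>

enum { N_TYPES = 4 };   /* hypothetical number of basic types */

/* One registered inner loop, identified here only by its type number. */
typedef struct {
    int type_num;
} Loop;

/* Fill first_loop[t] with the index of the first loop of type t,
 * or -1 if no loop of that type is registered. */
static void build_jump_table(const Loop *loops, int nloops,
                             int first_loop[N_TYPES])
{
    for (int t = 0; t < N_TYPES; t++) {
        first_loop[t] = -1;
    }
    for (int i = 0; i < nloops; i++) {
        int t = loops[i].type_num;
        if (t >= 0 && t < N_TYPES && first_loop[t] < 0) {
            first_loop[t] = i;
        }
    }
}

/* Linear search for a loop of the given type. When a jump table is
 * supplied, the search starts at that type's first entry instead of 0.
 * *steps receives the number of loops inspected. */
static int find_loop(const Loop *loops, int nloops, const int *first_loop,
                     int type_num, int *steps)
{
    int start = 0;
    if (first_loop != NULL && type_num >= 0 && type_num < N_TYPES &&
            first_loop[type_num] >= 0) {
        start = first_loop[type_num];
    }
    *steps = 0;
    for (int i = start; i < nloops; i++) {
        (*steps)++;
        if (loops[i].type_num == type_num) {
            return i;
        }
    }
    return -1;
}
```

In this sketch the table is a separate array. The commit instead stores it in the extra `NPY_NTYPES` slots allocated after `op_flags`, which is what keeps the struct size unchanged.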
52a90ed to 27ff221
as we are going to be less strict on preserving the ufunc structure this can probably be considered for merging
Maybe it would be best to finish off that NEP first and then figure out what to do here? Is this urgent?
Ping folks who are in the conversation. We should either close this or take it forward.
Ping.
Can we close this?
Closing. Please reopen if you wish to move it forward.
some more small array performance improvements, please see the commits for details.
The changes are somewhat hackish and could probably also be resolved by a more extensive overhaul of the ufunc object. Still, these changes are very simple, and should we decide to do an overhaul it's good to have better performance in our benchmarks for comparison.
small reductions (O(100) elements) improve on my machine by ~20% and binary ufuncs by ~10%.
add.reduce(ones(100))
from 1.80us to 1.5us
add(ones(100), ones(100))
from 0.77us to 0.7us