ufunc.at (and possibly other methods) slow · Issue #11156 · numpy/numpy · GitHub

Closed
nschloe opened this issue May 24, 2018 · 10 comments

@nschloe
Contributor
nschloe commented May 24, 2018

I noticed that in many of my codes, seemingly harmless lines like

numpy.add.at(target, idx, vals)

take a large share of the runtime. I investigated and found that one gets a speed-up of a factor of 40 (!) by simply moving the critical code to C++.


(Not sure what's actually done in numpy.ufunc.) I put this into a very simple module, https://github.com/nschloe/fastfunc, so feel free to take some code out of there.
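A minimal script to reproduce the kind of measurement described above (the array sizes here are illustrative choices, not the ones from the original benchmark):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
target = np.zeros(1_000)
idx = rng.integers(0, target.size, 1_000_000)   # many repeated indices
vals = rng.random(idx.size)

# Time the unbuffered scatter-add; on a small target with many repeated
# indices this call tends to dominate the runtime of surrounding code.
t = timeit.timeit(lambda: np.add.at(target.copy(), idx, vals), number=10)
print(f"np.add.at: {t / 10:.4f} s per call")
```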

@mattip
Member
mattip commented May 24, 2018

@nschloe
Contributor Author
nschloe commented May 24, 2018

Yup.

@seberg
Member
seberg commented May 24, 2018

We can leave this open, #5922 seems like a duplicate. I am surprised I did not create an issue the day we merged it, harharhar. Somewhere hidden in the depths there are some pointers on how to make it at least a small factor faster (making it much faster is more involved: it would need a faster hook into the fancy indexing machinery and special-casing of the different paths).

EDIT: One thing is that it uses utterly insane (and possibly unnecessary) slow stuff for casting (unless it was removed). Another factor is that there is no inner loop specialization IIRC (i.e. it uses the very old advanced indexing API). I admit, the first part might have been avoided if I had been more familiar with that stuff at the time of reviewing it... But basically we always knew it was not fast :/. (In my defence, I said something like: dunno, this might be an option ;))
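For context on why ufunc.at needs its own unbuffered code path at all (and so cannot simply reuse the buffered fancy-indexing fast paths), here is a small illustrative example, not taken from the thread:

```python
import numpy as np

a = np.zeros(3)
idx = np.array([0, 0, 1])
vals = np.array([1.0, 2.0, 3.0])

# Buffered fancy-index assignment: the repeated index 0 only keeps
# the last value, because the right-hand side is computed up front.
b = a.copy()
b[idx] += vals           # b == [2., 3., 0.]

# Unbuffered ufunc.at: the operation is applied once per index entry,
# so contributions to the repeated index accumulate.
c = a.copy()
np.add.at(c, idx, vals)  # c == [3., 3., 0.]
```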

@jaimefrio
Member

Here I explained what makes ufunc.at slow. Your code here takes a few shortcuts that are not really available to NumPy:

  • You have hardcoded the operations you have implemented, +, -, * and /, which is OK for most use cases, but does not support generic operations. If you had a generic function that took a function pointer that performed the operation, you would have more generic, albeit slower code. That is what NumPy does.

  • You have assumed that the a and b arrays are of the same type, which is not always the case. Not sure if your code can handle cases where these don't match, but that's a liberty NumPy can't take either.

  • I'm not sure how much magic pybind11 packs, but your iteration scheme seems deceptively simple for all the subtleties of fancy indexing that ufunc.at supports. Are you assuming that some/all of the arrays are 1D? That's again something NumPy isn't free to do.
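To make the points above concrete, here is a short sketch (with examples of my own choosing) of the generality that numpy.ufunc.at has to support and that a specialized kernel can simply ignore:

```python
import numpy as np

# Mixed dtypes: the values are cast to the target's dtype on the fly.
a = np.zeros(4, dtype=np.float32)
np.add.at(a, [0, 0, 2], np.array([1, 2, 3], dtype=np.int64))

# Multidimensional fancy indexing with a tuple of index arrays.
m = np.zeros((3, 3))
np.add.at(m, (np.array([0, 0, 2]), np.array([1, 1, 0])), 1.0)

# Arbitrary ufuncs, not just the four arithmetic operators.
b = np.full(3, 2.0)
np.maximum.at(b, [0, 0, 1], [5.0, 1.0, 7.0])   # b == [5., 7., 2.]
```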

@eric-wieser
Member

It would be great if we could start using C++ templates instead of the .c.src stuff within numpy, but I imagine there are distribution issues with that...

@nschloe
Contributor Author
nschloe commented May 24, 2018

I'm not sure how much magic pybind11 packs, but your iteration scheme seems deceptively simple for all the subtleties of fancy indexing that ufunc.at supports. Are you assuming that some/all of the arrays are 1D? That's again something NumPy isn't free to do.

In op(target, idx, vals), the C++ code assumes target and vals to be 2D, idx to be 1D. All cases are simply reshaped into that; see https://github.com/nschloe/fastfunc/blob/master/fastfunc/helpers.py.
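Roughly, the reshaping idea described above could look like this on the Python side (a simplified sketch of the approach, not the actual code in fastfunc's helpers.py):

```python
import numpy as np

def normalize_args(target, idx, vals):
    """Reshape the inputs so a compiled kernel only ever sees a 2D target,
    a 1D idx and a 2D vals array (illustrative sketch; assumes the target
    is C-contiguous so that the reshape is a view, not a copy)."""
    target = np.asarray(target)
    idx = np.asarray(idx).ravel()
    # Collapse every axis after the indexed one into a single trailing axis.
    target2d = target.reshape(target.shape[0], -1)
    # Broadcast scalar or lower-dimensional values up to one row per index.
    vals2d = np.broadcast_to(np.asarray(vals),
                             (idx.size,) + target.shape[1:])
    vals2d = vals2d.reshape(idx.size, -1)
    return target2d, idx, vals2d
```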

You have assumed that the a and b arrays are of the same type, which is not always the case. Not sure if your code can handle cases where these don't match, but that's a liberty NumPy can't take either.

This could perhaps be generalized; for now I just wanted to create a showcase.

You have hardcoded the operations you have implemented, +, -, * and /, which is OK for most use cases, but does not support generic operations. If you had a generic function that took a function pointer that performed the operation, you would have more generic, albeit slower code. That is what NumPy does.

Well, then perhaps +, -, *, / should be treated differently than custom functions?
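As an illustration of what such special-casing could look like for the add case, here is a pure-Python sketch (my own example, not an existing NumPy API; it assumes idx is a 1D array of non-negative, in-bounds integers and vals matches its length):

```python
import numpy as np

def add_at_fast(target, idx, vals):
    """In-place scatter-add with a fast path for 1D floating-point targets,
    falling back to np.add.at for everything else (sketch only)."""
    target = np.asarray(target)
    if target.ndim == 1 and np.issubdtype(target.dtype, np.floating):
        # np.bincount sums the values per index in optimized C code.
        target += np.bincount(idx, weights=vals, minlength=target.size)
    else:
        np.add.at(target, idx, vals)
```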

@ml31415
ml31415 commented Aug 30, 2018

Well, then perhaps +, -, *, / should be treated differently than custom functions?

Yes, all the common things like min, max, mean, std and so on clearly deserve dedicated treatment.

Not sure if you have seen numpy-groupies already. If using numba or weave is an option for you, you might have a look at it. The speed should be comparable to what you had with your C++ implementation. It has seen proper testing by now, and several other functions have been implemented and optimized as well.
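For anyone curious what the numba route roughly looks like without pulling in numpy-groupies, a minimal scatter-add kernel might be written like this (my own sketch, not code from that package):

```python
import numpy as np
from numba import njit

@njit
def add_at_numba(target, idx, vals):
    # A plain loop: numba compiles it to machine code, so the per-element
    # dispatch overhead of the generic ufunc machinery goes away.
    for i in range(idx.size):
        target[idx[i]] += vals[i]

rng = np.random.default_rng(0)
target = np.zeros(1_000)
idx = rng.integers(0, target.size, 100_000)
vals = rng.random(idx.size)
add_at_numba(target, idx, vals)   # same result as np.add.at(target, idx, vals)
```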

@charris
Member
charris commented Aug 30, 2018

It would be great if we could start using C++ templates instead of the .c.src

There was a discussion back in the 2010-2011 time frame about moving to C++. I'm not so much bothered by the current templates, but having smart pointers and error handling would be a big +. The downside is that writing good C++ code takes a lot of expertise, and the compiler error messages, especially for templates, are practically indecipherable. There would also be a need to clean up variable names that clash with C++ keywords, things like variables named new. I expect many things have improved since C++11 came out and became widely implemented, unlike the early days when C++ support was quite spotty and unreliable.

@mattip
Member
mattip commented Dec 27, 2022

Duplicate of #5922

@mattip
Member
mattip commented Feb 8, 2023

Closing, as #23136 was merged.
