Performance of numpy average and numpy.mean function · Issue #5507 · numpy/numpy · GitHub


Performance of numpy average and numpy.mean function #5507


Closed
skuschel opened this issue Jan 27, 2015 · 4 comments

Comments

@skuschel

I need a weighted average function on a VERY large dataset (some 1e8 numbers or more). The numpy functions mean and average serve me well and fast, but I discovered that numpy.average is slower than building the weighted average myself with two numpy.mean calls, as shown by this example:
https://gist.github.com/skuschel/2d148a37a2ce17925fb0
np.average(a, weights=b) takes 0.32 sec on my computer, but
np.mean(a*b)/np.mean(b) takes 0.23 sec on an equally sized dataset and yields the same result.
How does that make sense?
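A minimal sketch of the comparison described above, assuming random float64 inputs (the gist's exact data is not reproduced here, the size is reduced so it runs quickly, and timings will vary by machine and NumPy version). The two expressions agree because the 1/n factors in the two means cancel, leaving sum(a*b)/sum(b).

```python
import time

import numpy as np

# Hypothetical stand-in data; the report used ~1e8 values.
rng = np.random.default_rng(0)
n = 10**6
a = rng.random(n)
b = rng.random(n)


def timed(func):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


avg, t_avg = timed(lambda: np.average(a, weights=b))
# mean(a*b)/mean(b) == sum(a*b)/sum(b): the 1/n factors cancel
manual, t_manual = timed(lambda: np.mean(a * b) / np.mean(b))

print(f"np.average:              {t_avg:.2e} s")
print(f"np.mean(a*b)/np.mean(b): {t_manual:.2e} s")
print("same result:", np.isclose(avg, manual))
```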

@Nodd
Contributor
Nodd commented Jan 27, 2015

I did a quick timing using line_profiler: https://gist.github.com/Nodd/926e3e21af1f04741c14

28% of the time is spent doing a = a + 0.0 (I guess it is there to convert the array to float?), an operation you're not doing when using mean.
(0.32-0.23)/0.32 ≈ 28%, so it accounts for the slowdown you noticed.

The average function could use some refactoring :)
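A small sketch of why that line is costly, assuming it exists to force a float conversion as guessed above: adding `0.0` does promote integer input to float64, but it also allocates a brand-new array even when the input is already float64, which means an extra full pass over memory for ~1e8 values.

```python
import numpy as np

a_int = np.arange(5)    # integer input
a_flt = np.arange(5.0)  # already float64

# `x + 0.0` promotes integers to float64...
converted_int = a_int + 0.0
# ...but it always builds a new array, even when no conversion is needed.
converted_flt = a_flt + 0.0

print(converted_int.dtype)                     # float64
print(np.shares_memory(converted_flt, a_flt))  # False: a fresh copy was made
```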

@minhlongdo

I removed a = a + 0.0, replaced wgt = np.asarray(weights) with wgt = np.asarray(weights, dtype=float), then ran the script provided by @skuschel and got the following results:
execution time np.average: 3.03e-01 sec
execution time np.mean: 3.01e-01 sec
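A simplified sketch of the change described above, written as a hypothetical standalone function `weighted_average` (it is not NumPy's actual code and ignores `axis` and `returned`). It skips the `a + 0.0` copy and converts only the weights with `np.asarray(weights, dtype=float)`, which returns float64 weights unchanged rather than copying them; multiplying by float weights then promotes integer `a` anyway.

```python
import numpy as np


def weighted_average(a, weights):
    """Hypothetical simplified weighted average over all elements."""
    a = np.asarray(a)
    # No copy when `weights` is already a float64 ndarray.
    wgt = np.asarray(weights, dtype=float)
    scl = wgt.sum()
    if scl == 0.0:
        raise ZeroDivisionError("weights sum to zero, can't be normalized")
    # One temporary for a*wgt, then a single reduction.
    return np.multiply(a, wgt).sum() / scl


rng = np.random.default_rng(1)
a = rng.random(1000)
w = rng.random(1000)
print(np.isclose(weighted_average(a, w), np.average(a, weights=w)))  # True
print(np.asarray(w, dtype=float) is w)  # True: float64 weights are not copied
```

As the next reply points out, forcing `dtype=float` on the weights is not a complete fix, since complex weights would not survive that conversion.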

@njsmith
Member
njsmith commented Jan 30, 2015

Excellent!

That's not quite a complete fix (consider what happens if the input is
complex), but I think that can be handled with a call to result_type.

Could you submit a pull request?

-n
On 30 Jan 2015 10:13, "Minh-Long Do" notifications@github.com wrote:

I removed a = a + 0.0 and replaced wgt = np.asarray(weights) with wgt =
np.asarray(weights, dtype=float, copy=0) and then ran the script provided
by @skuschel and got the following results:
execution time np.average: 3.03e-01 sec
execution time np.mean: 3.01e-01 sec


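One way the `result_type` idea could look, as a hypothetical sketch (`weighted_average_rt` is an illustrative name, not NumPy's code). The computation dtype is chosen with `np.result_type`, with `np.float64` included so integer input is averaged in floating point, while complex inputs or weights keep a complex dtype instead of being cast to float. Including `np.float64` also upcasts float32 to float64; that is a design choice of this sketch.

```python
import numpy as np


def weighted_average_rt(a, weights):
    """Hypothetical weighted average whose dtype comes from np.result_type."""
    a = np.asarray(a)
    wgt = np.asarray(weights)
    # At least float64; complex128 if either input is complex.
    result_dtype = np.result_type(a.dtype, wgt.dtype, np.float64)
    # copy=False skips the copy when wgt already has result_dtype.
    wgt = wgt.astype(result_dtype, copy=False)
    scl = wgt.sum()
    return np.multiply(a, wgt).sum() / scl


int_result = weighted_average_rt(np.array([1, 2, 3]), np.array([1, 1, 2]))
complex_result = weighted_average_rt(np.array([1 + 1j, 3 - 1j]), [1.0, 1.0])
print(int_result)      # 2.25
print(complex_result)  # (2+0j)
```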

@skuschel
Author

This finally got fixed in #7382.
