ENH: Increase average calculation performance. See #5507 #5525

minhlongdo · 2015-01-30T10:50:02Z

No description provided.

Nodd · 2015-01-30T15:02:29Z

numpy/lib/function_base.py

@@ -514,8 +514,7 @@ def average(a, axis=None, weights=None, returned=False):
        avg = a.mean(axis)
        scl = avg.dtype.type(a.size/avg.size)
    else:
-        a = a + 0.0
-        wgt = np.asarray(weights)
+        wgt = np.asarray(weights, dtype=float)


As said in the corresponding issue, you should use np.result_type(weights, np.float) instead of float only.

unfortunately that won't work as result_type needs an ndarray as input, which is was asarray creates

Maybe asarray needs an argument to say 'make the dtype at least this high
on the casting graph (higher is OK)'?
On 30 Jan 2015 16:50, "Julian Taylor" notifications@github.com wrote:

In numpy/lib/function_base.py
#5525 (comment):

@@ -514,8 +514,7 @@ def average(a, axis=None, weights=None, returned=False):
avg = a.mean(axis)
scl = avg.dtype.type(a.size/avg.size)
else:

a = a + 0.0

wgt = np.asarray(weights)

wgt = np.asarray(weights, dtype=float)

unfortunately that won't work as result_type needs an ndarray as input,
which is was asarray creates

—
Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/pull/5525/files#r23854432.

Guys, the calculation down below already does a result type for weights, you do not need to cast it here (unless you want at least 64bit floats). As for a, you can actually use result type to cast it as suggested, and just do the cast in the sum like it is done for the weights?

njsmith · 2015-01-30T19:26:42Z

On further consideration, this change doesn't make much sense to me. The
speedup is from dropping the a = a + 0.0 -- how does changing the dtype of
weights replace this?

OTOH assuming that weights is real is probably OK -- I thought we were
forcing 'a' to be real.

BTW, copy=0 is the default for asarray.
On 30 Jan 2015 10:50, "Minh-Long Do" notifications@github.com wrote:

You can view, comment on, or merge this pull request online at:

#5525
Commit Summary

Add @Profile tag to profile code line.

Remove unnecesary step of turning numpy array int type to float.
Increased performance. See Performance of numpy average and numpy.mean function #5507.

Remove @Profile

File Changes

M numpy/lib/function_base.py
https://github.com/numpy/numpy/pull/5525/files#diff-0 (3)

Patch Links:

https://github.com/numpy/numpy/pull/5525.patch

https://github.com/numpy/numpy/pull/5525.diff

—
Reply to this email directly or view it on GitHub
#5525.

jaimefrio · 2015-01-30T19:55:32Z

I think the best solution would look something like:

wts = np.asarray(wts)
dt = np.result_type(a.dtype, wts.dtype, float)

and later on always make calls to the ufuncs with dtype=dt specified:

scl = wgt.sum(axis=axis, dtype=dt)
avg = np.multiply(a, wgt, dtype=dt).sum(axis) / scl

There is also a very weird construct a little bit further down:

if returned:
    scl = np.multiply(avg, 0) + scl

which we could probably do away with if we implemented the above, as the only conceivable reason I can see to do that would be to have scl be of np.result_type(avg.dtype, scl.dtype).

ewmoore · 2015-01-30T20:19:06Z

Doesn't np.result_type(a.dtype, wts.dtype, float) upcast float32 to
float64?

On Fri, Jan 30, 2015 at 2:55 PM, Jaime notifications@github.com wrote:

I think the best solution would look something like:

wts = np.asarray(wts)
dt = np.result_type(a.dtype, wts.dtype, float)

and later on always make calls to the ufuncs with dtype=dt specified:

scl = wgt.sum(axis=axis, dtype=dt)
avg = np.multiply(a, wgt, dtype=dt).sum(axis) / scl

There is also a very weird construct a little bit further down:

if returned:
scl = np.multiply(avg, 0) + scl

which we could probably do away with if we implemented the above, as the
only conceivable reason I can see to do that would be to have scl be of np.result_type(avg.dtype,
scl.dtype).

—
Reply to this email directly or view it on GitHub
#5525 (comment).

seberg · 2015-01-30T20:37:51Z

Yes, you would need to use np.result_type(a, wts, 0.) (or use a.dtype if you want it to never behave like a scalar for a, etc.)

minhlongdo · 2015-02-03T00:45:24Z

In the current code base by doing a = a + 0.0 is to force a to be real. So I was wondering if np.average(a, weights=None, returned=False) also accepts an array of complex numbers. Because, if it accepts np.complex128 then the operation a = a + 0.0 would not be necessary since the imaginary and real parts of the complex number are already np.float64 . So I did the following in the if statement if not isinstance(a, np.matrix) to reduce unnecessary operations that forces a to be real.


if a.dtype is np.dtype('complex128'):
8000
 
    if weights is None:
        # Just convert into an np.array
        a = np.asarray(a)
    else:
        if a.dtype is not np.dtype('complex128'):
            # Convert to np.array and set np.dtype to float64
            a = np.asarray(a, dtype=np.dtype('float64'))
else:
    # If a is a np.matrix and there are weight values
    if weights is not None:
        # if dtype of a is not np.dtype('complex128') then make it to be np.float64
        if a.dtype is not np.dtype('complex128'):
            # No need to attempt conversion for np.float64 if it is already np.float64
            if a.dtype is not np.dtype('float64'):
                # Worse case scenario, np.matrix is not np.float64 then the following will force a to be np.float64
                a = a + 0.0

The performance of this change by using the script https://gist.github.com/skuschel/2d148a37a2ce17925fb0 provided by skuschel is the following:

execution time np.average: 3.06e-01 sec
exectution time np.mean: 3.00e-01 sec

What do you guys think?

minhlongdo · 2015-02-03T00:46:00Z

Sorry I accidentally closed it.

charris · 2015-02-08T19:16:08Z

In the current code base by doing a = a + 0.0 is to force a to be real

Real or complex while preserving inexact type. It is an tricky way to achieve that end. Maybe we should just check for inexact? This PR looks a bit too complicated to me as it stands.

@minhlongdo At some point the commits will need to be squashed into one and the commit message cleaned up. See doc/source/dev/gitwash/development_workflow.rst for discussion of commit messages.

minhlongdo · 2015-02-10T03:48:44Z

@charris Will do, thanks for the feedback.

minhlongdo · 2015-02-10T07:56:02Z

@charris I looked into inexact and it does indeed simply the proposed solution.
I somehow managed to mess up, so I made a new pull request #5551. I am sorry for the mess.

minhlongdo changed the title ~~Increase average calculation performance. See #5507~~ ENC Increase average calculation performance. See #5507 Jan 30, 2015

minhlongdo changed the title ~~ENC Increase average calculation performance. See #5507~~ EHC Increase average calculation performance. See #5507 Jan 30, 2015

minhlongdo changed the title ~~EHC Increase average calculation performance. See #5507~~ ECH Increase average calculation performance. See #5507 Jan 30, 2015

minhlongdo changed the title ~~ECH Increase average calculation performance. See #5507~~ ENH Increase average calculation performance. See #5507 Jan 30, 2015

minhlongdo changed the title ~~ENH Increase average calculation performance. See #5507~~ ENH: Increase average calculation performance. See #5507 Jan 30, 2015

Nodd reviewed Jan 30, 2015
View reviewed changes

minhlongdo closed this Feb 3, 2015

minhlongdo reopened this Feb 3, 2015

minhlongdo closed this Feb 10, 2015

jaimefrio mentioned this pull request Feb 10, 2015

ENH: Increase numpy.average performance mentioned in #5507. #5551

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: Increase average calculation performance. See #5507 #5525

ENH: Increase average calculation performance. See #5507 #5525

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ENH: Increase average calculation performance. See #5507 #5525

ENH: Increase average calculation performance. See #5507 #5525

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!