ndarray's mean method should be computed using double precision (Trac #465) · Issue #1063 · numpy/numpy


Closed · numpy-gitbot opened this issue Oct 19, 2012 · 2 comments

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/465 on 2007-03-07 by @chanley, assigned to unknown.

The default data type for the accumulator variable in the mean method should be double precision. The problem can best be illustrated with the following example:

Python 2.4.3 (#2, Dec  7 2006, 11:01:45) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as n
>>> n.__version__
'1.0.2.dev3571'
>>> a = n.ones((1000,1000),dtype=n.float32)*132.00005
>>> print a
[[ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 ..., 
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]]
>>> a.min()
132.000045776
>>> a.max()
132.000045776
>>> a.mean()
133.96639999999999

Having the mean be greater than the maximum is a tad odd.

The mean is being computed with a single-precision accumulator variable. A user can force a double-precision calculation, and receive the correct result, with the following command:

>>> a.mean(dtype=n.float64)
132.00004577636719
>>> 
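
To make the failure mode concrete, the snippet below is a rough sketch (in modern Python/NumPy spelling, not NumPy's actual C implementation) of what a naive left-to-right float32 reduction does: every partial sum is rounded back to single precision, and once the running total is large, each addition of ~132 can be off by several whole units. (Recent NumPy releases mitigate this with pairwise summation; the explicit loop below reproduces the old sequential behavior.)

import numpy as np

a = np.ones((1000, 1000), dtype=np.float32) * 132.00005

# Naive sequential accumulation in float32: each partial sum is
# rounded back to single precision before the next addition.
acc = np.float32(0.0)
for x in a.ravel():
    acc = acc + x                    # float32 + float32 -> float32

print(acc / a.size)                  # drifts to roughly 133.97, as above
print(a.mean(dtype=np.float64))      # 132.00004577636719, the correct mean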

However, this is not going to be obvious to the casual user and will appear to be an error.

I realize that one reason for not doing all calculations as double precision is performance. However, it is probably better to always receive the correct answer than to quickly arrive at the wrong one.

The current default behavior needs to be changed: all calculations should be done in double precision. If performance is needed, the "expert user" can go back and set data types explicitly after demonstrating that their application arrives at a correct result.

Not having to worry about overflow problems in the accumulator variable would also make numpy consistent with numarray's behavior.

@numpy-gitbot

Milestone changed to 1.0.2 Release by @chanley on 2007-03-07

@numpy-gitbot

@teoliphant wrote on 2007-03-31

The problem with this becomes clearer when reducing over a single axis. If you have a 2-d float32 array and compute the mean over one of its axes, you obtain a 1-d array, and it should be of float32 type. If the accumulation is done with a different data-type, then by default you will end up with a float64 array, which is not what many people will expect.

I would like to avoid guessing what the user wants, because it makes the system harder to explain and requires a lot of "special-case" checking which also makes the system harder to maintain. If users want a higher-precision accumulator they can get it.
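
As a concrete illustration of this point (modern NumPy spelling; the array and variable names here are just for the example): the result of the reduction takes the accumulator's type, so opting into a float64 accumulator also changes the output dtype unless the caller casts back explicitly.

import numpy as np

a = np.ones((3, 1000), dtype=np.float32) * 132.00005

m32 = a.mean(axis=0)                    # float32 accumulator, float32 result
m64 = a.mean(axis=0, dtype=np.float64)  # float64 accumulator *and* float64 result
back = m64.astype(np.float32)           # cast back if a float32 result is wanted

print(m32.dtype, m64.dtype, back.dtype) # float32 float64 float32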
