ndarray's mean method should be computed using double precision (Trac #465) · Issue #1063 · numpy/numpy


Closed · numpy-gitbot opened this issue Oct 19, 2012 · 2 comments

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/465 on 2007-03-07 by @chanley, assigned to unknown.

The default data type for the accumulator variable in the mean method should be double precision. The problem can best be illustrated with the following example:

Python 2.4.3 (#2, Dec  7 2006, 11:01:45) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as n
>>> n.__version__
'1.0.2.dev3571'
>>> a = n.ones((1000,1000),dtype=n.float32)*132.00005
>>> print a
[[ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 ..., 
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]]
>>> a.min()
132.000045776
>>> a.max()
132.000045776
>>> a.mean()
133.96639999999999

Having the mean be greater than the maximum is a tad odd.

The mean is being computed with a single-precision accumulator variable. A user can force a double-precision calculation, and receive the correct result, with the following command:

>>> a.mean(dtype=n.float64)
132.00004577636719
>>> 
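
To make the failure mode concrete, the snippet below is a rough sketch (in modern Python/NumPy spelling, not NumPy's actual C implementation) of what a naive left-to-right float32 reduction does: every partial sum is rounded back to single precision, and once the running total is large, each addition of ~132 can be off by several whole units. (Recent NumPy releases mitigate this with pairwise summation; the explicit loop below reproduces the old sequential behavior.)

import numpy as np

a = np.ones((1000, 1000), dtype=np.float32) * 132.00005

# Naive sequential accumulation in float32: each partial sum is
# rounded back to single precision before the next addition.
acc = np.float32(0.0)
for x in a.ravel():
    acc = acc + x                    # float32 + float32 -> float32

print(acc / a.size)                  # drifts to roughly 133.97, as above
print(a.mean(dtype=np.float64))      # 132.00004577636719, the correct mean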

However, this is not going to be obvious to the casual user and will appear to be an error.

I realize that one reason for not doing all calculations as double precision is performance. However, it is probably better to always receive the correct answer than to quickly arrive at the wrong one.

The current default behavior needs to be changed: all calculations should be done in double precision. If performance is needed, the "expert user" can go back and set data types explicitly after demonstrating that their application arrives at a correct result.

Not having to worry about overflow problems in the accumulator variable would also make numpy consistent with numarray's behavior.

@numpy-gitbot

Milestone changed to 1.0.2 Release by @chanley on 2007-03-07

@numpy-gitbot

@teoliphant wrote on 2007-03-31

The problem with this becomes clearer when reducing over a single axis. If you have a 2-d float32 array and compute the mean over one of its axes, you obtain a 1-d array, and it should be of float32 type. If the accumulation is done with a different data-type, then by default you will end up with a float64 array, which is not what many people will expect.

I would like to avoid guessing what the user wants, because it makes the system harder to explain and requires a lot of "special-case" checking which also makes the system harder to maintain. If users want a higher-precision accumulator they can get it.
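
As a concrete illustration of this point (modern NumPy spelling; the array and variable names here are just for the example): the result of the reduction takes the accumulator's type, so opting into a float64 accumulator also changes the output dtype unless the caller casts back explicitly.

import numpy as np

a = np.ones((3, 1000), dtype=np.float32) * 132.00005

m32 = a.mean(axis=0)                    # float32 accumulator, float32 result
m64 = a.mean(axis=0, dtype=np.float64)  # float64 accumulator *and* float64 result
back = m64.astype(np.float32)           # cast back if a float32 result is wanted

print(m32.dtype, m64.dtype, back.dtype) # float32 float64 float32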
