The default data type for the accumulator variable in the mean method should be double precision. The problem can best be illustrated with the following example:
```
Python 2.4.3 (#2, Dec 7 2006, 11:01:45)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as n
>>> n.__version__
'1.0.2.dev3571'
>>> a = n.ones((1000,1000),dtype=n.float32)*132.00005
>>> print a
[[ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 ...,
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]
 [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
   132.00004578  132.00004578]]
>>> a.min()
132.000045776
>>> a.max()
132.000045776
>>> a.mean()
133.96639999999999
```
Having the mean be greater than the maximum is a tad odd.
The mean is being computed with a single-precision accumulator variable. A user can force a double-precision calculation with the following command and receive a correct result:
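A sketch of that workaround, assuming the intended command is the dtype argument to mean:

```
>>> a.mean(dtype=n.float64)   # force a float64 accumulator; the result agrees with min()/max() at ~132.000045776
```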
However, this is not going to be obvious to the casual user and will appear to be an error.
I realize that one reason for not doing all calculations in double precision is performance. However, it is probably better to always receive the correct answer than to quickly arrive at the wrong one.
The current default behavior needs to be changed: all calculations should be done in double precision. If performance is needed, the "expert user" can go back and set data types explicitly after having shown that their application arrives at a correct result.
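Under that proposal, an expert who has verified their results could presumably opt back into the faster single-precision path explicitly, for example:

```
>>> a.mean(dtype=n.float32)   # explicit opt-in to the faster, overflow-prone float32 accumulator
```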
Not having to worry about overflow problems in the accumulator variable would also make numpy consistent with numarray's behavior.
The problem with this becomes clearer when reducing over a single axis. If you have a 2-d float32 array and compute the mean over one of the axes, you obtain a 1-d array, and it should be of float32 type. If the accumulation is done with a different data type, then by default you will end up with a float64 array, which is not what many people will expect.
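To illustrate that point with a small, hypothetical array:

```
>>> b = n.ones((3, 4), dtype=n.float32)
>>> b.mean(axis=0).dtype        # reducing over one axis keeps the input precision by default
dtype('float32')
```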
I would like to avoid guessing what the user wants, because it makes the system harder to explain and requires a lot of "special-case" checking, which also makes the system harder to maintain. If users want a higher-precision accumulator, they can get it.
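For completeness, a sketch of that opt-in, using the same hypothetical array as above:

```
>>> b.mean(axis=0, dtype=n.float64).dtype                      # explicit float64 accumulator (and result)
dtype('float64')
>>> b.mean(axis=0, dtype=n.float64).astype(n.float32).dtype    # cast back if a float32 result is wanted
dtype('float32')
```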
Original ticket http://projects.scipy.org/numpy/ticket/465 on 2007-03-07 by @chanley, assigned to unknown.