variance is inaccurate for arrays of identical, large values (Trac #1098) #1696


Closed
thouis opened this issue Oct 19, 2012 · 5 comments

Comments

thouis (Contributor) commented on Oct 19, 2012

Original ticket http://projects.scipy.org/numpy/ticket/1098 on 2009-04-29 by @thouis, assigned to @charris.

Variance calculation is inaccurate for arrays of large, identical values:

>>> from numpy import *
>>> (ones(100000)*10.0**20).var()
52756253943791624.0

There are more accurate algorithms for computing variance. One example from Welford (1962) is in the attached file.
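
The attachment itself isn't reproduced on this page; the following is a minimal sketch of Welford's single-pass algorithm, not necessarily the code in test_var.py:

import numpy as np

def welford_var(a):
    """Population variance via Welford's (1962) single-pass update."""
    mean = 0.0
    m2 = 0.0                        # running sum of squared deviations
    for n, x in enumerate(a, start=1):
        delta = x - mean
        mean += delta / n           # incrementally corrected mean
        m2 += delta * (x - mean)    # second factor uses the updated mean
    return m2 / len(a)              # matches ndarray.var() (ddof=0)

# For identical values the running mean never drifts off the common
# value, so every delta after the first element is exactly zero:
print(welford_var(np.ones(100000) * 10.0**20))   # 0.0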

thouis (Contributor, Author) commented on Oct 19, 2012

Attachment added by @thouis on 2009-04-29: test_var.py

thouis (Contributor, Author) commented on Oct 19, 2012

Milestone changed to 1.4.0 by @cournape on 2009-06-01

thouis (Contributor, Author) commented on Oct 19, 2012

Milestone changed to Unscheduled by @cournape on 2009-11-27

thouis (Contributor, Author) commented on Oct 19, 2012

@bsouthey wrote on 2011-01-26

At least on Linux, this can be addressed by the dtype argument:

>>> import numpy as np
>>> (np.ones(100000)*10.0**20).var(dtype=np.float64)
52756253943791624.0
>>> (np.ones(100000)*10.0**20).var(dtype=np.float128)
0.0

Of course, this only postpones the problem; with even larger numbers the same inaccuracy returns.

charris (Member) commented on Feb 20, 2014

This is much improved after #3685. Note that the relative error in each term is ~1 ulp.

In [13]: a.std()/1e20
Out[13]: 1.6383999999999999e-16

The error arises from the rounding of the mean, combined with the fact that all the per-element errors are identical, so there is no cancellation when the squared deviations are summed. Using Python's "exact" fsum does no better.
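
To illustrate that last point, here is a sketch (not from the thread) of a two-pass variance where both sums use math.fsum, which is correctly rounded. The mean must still be stored as a single float64, so every element inherits the same ~1 ulp offset, and the squared offsets add up instead of cancelling:

import math
import numpy as np

a = np.ones(100000) * 10.0**20

# fsum computes the exact sum and rounds once, but the stored mean
# can still land an ulp away from the common value 1e20.
mean = math.fsum(a) / len(a)
var = math.fsum((x - mean) ** 2 for x in a) / len(a)

print(var)                     # still nonzero: roughly ulp(1e20)**2
print(math.sqrt(var) / 1e20)   # ~1.6e-16, about 1 ulp, as shown above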
