WIP Variable class that includes uncertainties; application to UQuantity by mhvk · Pull Request #3715 · astropy/astropy

WIP Variable class that includes uncertainties; application to UQuantity #3715


Closed · wants to merge 1 commit

Conversation

@mhvk (Contributor) commented Apr 23, 2015

EDIT (updated): this PR provides ways to propagate uncertainties analytically, keeping track of correlations. It is meant as a companion to the Distribution class, which effectively follows a Monte-Carlo approach to propagating uncertainties. Items still to do:

  • Decide on a name (Variable is poor, Measurement does not include propagation/correlated errors, Variate suggests random number generation; Measurand may be closest but is not common);
  • Like for Distribution, consider auto-generating subclasses (VariableArray, VariableQuantity, etc.); a rough sketch of the idea follows this list;
  • Make the PR consistent with the current minimum numpy version (i.e., use __array_ufunc__ only, no __array_prepare__, etc.);
  • Implement reductions; by default, these should drop correlations.
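For the auto-generation bullet, a rough sketch of the idea (illustrative only; the names VariableBase, _class_cache and variable_class_for are made up for this sketch, and nothing here is the PR's actual code):

import numpy as np

class VariableBase:
    """Stand-in for the uncertainty-propagation machinery (hypothetical)."""

_class_cache = {}

def variable_class_for(array_cls):
    """Auto-generate and cache a subclass mixing the propagation
    machinery into a given array class."""
    if array_cls not in _class_cache:
        name = 'Variable' + array_cls.__name__
        _class_cache[array_cls] = type(name, (VariableBase, array_cls), {})
    return _class_cache[array_cls]

# variable_class_for(np.ndarray) would play the role of VariableArray;
# with astropy's Quantity it would yield a VariableQuantity.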

Follow-up of this PR might include:

  • Implement user control over when/whether correlations are kept.
  • Ensure that error propagation can be used as a module in nddata.

BELOW IS THE OUTDATED ORIGINAL TEXT

Very much work in progress, so do not merge

This is a first stab at a general Variable class, a subclass of ndarray that tracks uncertainties. Also implemented is an application to UQuantity, i.e., a mixin with Quantity. It is inspired by the uncertainties package [1], written by @lebigot, and many of the test cases are taken from the GSoC work of @Epitrochoid. Unlike those efforts, however, this is a proper ndarray subclass (in @Epitrochoid's approach arrays were not yet possible, while uncertainties uses object arrays holding Variable instances, which is much slower).

Note that I started writing this using __numpy_ufunc__, which is the most logical and fastest approach, and hence this PR includes the commits of #2583 and #2948. Upon further thought, however, it became clear that this would work with the usual __array_prepare__ and __array_wrap__ as well; for now, I didn't feel it was worth removing the __numpy_ufunc__ machinery.

Anyway, as is, this works well for generic ufunc operations. One big outstanding issue is what to do with methods that combine (parts of) an array, such as sum or mean. E.g., if we do x - x.mean(), where x is some large array, do we want to track for every element of x its dependence on every other element of x (which, for a 1000x1000 image would imply a 1000x1000x1000 derivative matrix).
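To make this concrete, here is a minimal sketch of the underlying idea, first-order propagation with tracked derivatives (the class name Var is made up here; this is not the implementation in this PR):

import numpy as np

class Var:
    """Toy value-with-uncertainty that tracks derivatives w.r.t. the
    independent inputs, so correlations propagate correctly."""
    def __init__(self, value, uncertainty=None, derivatives=None):
        self.value = np.asarray(value, dtype=float)
        if derivatives is None:
            # A new independent variable: derivative 1 w.r.t. itself.
            derivatives = {id(self): (np.ones_like(self.value),
                                      np.asarray(uncertainty, dtype=float))}
        self.derivatives = derivatives

    @property
    def uncertainty(self):
        # Gaussian propagation: add contributions in quadrature.
        return np.sqrt(sum((deriv * sigma) ** 2
                           for deriv, sigma in self.derivatives.values()))

    def __sub__(self, other):
        derivs = dict(self.derivatives)
        for key, (deriv, sigma) in other.derivatives.items():
            if key in derivs:  # shared input: derivatives combine
                derivs[key] = (derivs[key][0] - deriv, sigma)
            else:
                derivs[key] = (-deriv, sigma)
        return Var(self.value - other.value, derivatives=derivs)

x = Var(10.0, 2.0)
print((x - x).uncertainty)  # 0.0: the correlation cancels exactly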

cc: @wkerzendorf, @eteq, @mwcraig

@lebigot: please let me know if you are interested in using this in your package.

[1] https://github.com/lebigot/uncertainties

@mhvk force-pushed the uncertainty branch 4 times, most recently from 113f8d1 to ef14802 on April 23, 2015 19:29
@embray (Member) commented Apr 24, 2015

Remind me to look closer at this later; it might be good to start looking into if/how this can be integrated into modeling as well.

@lebigot commented May 6, 2015

Thanks @mhvk: I will have a look at what you did.

The "big outstanding issue" that you are mentioning (large memory consumption for even simple array operations) is tricky (it is one of the main reasons why I did not delve sooner into this business!). A NxN = 1000x1000 matrix x of numbers with uncertainties actually gives through x - x.mean() an N^4 array of derivatives (not N^3) so the difficulty is even worse than what you mention.

There are C (and C++?) libraries out there that handle uncertainties in arrays: maybe they solved this issue and can be a source of inspiration?

UFUNC_DERIVATIVES[np.abs] = UFUNC_DERIVATIVES[np.fabs]

if hasattr(np, 'cbrt'):
    UFUNC_DERIVATIVES[np.cbrt] = lambda x: 1./(3.*np.cbrt(x)**2)

Shouldn't this be **(2/3)? (__future__ is used, so no need for 2./3).

Contributor Author

I had to think again about why I did it, but then remembered: np.cbrt is pretty optimized, as is squaring. As a result, what I have is faster than just raising to the power 2/3:

In [2]: a = np.arange(100.)

In [3]: %timeit np.cbrt(a)
100000 loops, best of 3: 2.93 µs per loop

In [4]: %timeit np.cbrt(a)**2
100000 loops, best of 3: 3.78 µs per loop

In [5]: %timeit a**(2/3)
100000 loops, best of 3: 7.33 µs per loop


Ah, sorry, I had missed the fact that you had put np.cbrt(x): I thought it was only x. Good timing results (I guess that in your example you have 2/3 = 0.666… and not 0, right?).

@mhvk (Contributor Author) commented May 7, 2015

@lebigot - indeed, you're right: in x - x.mean(), each element would depend on each other element, so it would be a 1000**4-element derivative matrix. A nice way to run out of memory... Of course, in that case, the dependence on the other elements is also so small that one can really ignore it. At some level, I think this has to be left to common sense: if a variable depends on more than some large number of others, one just calculates the uncertainty and forgets the dependencies. Or one makes a constant uncertainty component from all variables that contribute less than some percentage to the total uncertainty.

Overall, if only for clarity of implementation, it seems reasonably sensible to let any of the reduce methods (like mean, sum, etc.) ignore the dependencies. But I welcome further thought.
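As a concrete sketch of that default (illustrative only; mean_dropping_correlations is a made-up name, not code from this PR), a reduction like mean would propagate the variances but return a result that no longer remembers its dependence on the inputs:

import numpy as np

def mean_dropping_correlations(values, uncertainties):
    """Propagate variances through a mean, then drop the dependencies."""
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(uncertainties, dtype=float)
    n = values.size
    # d(mean)/d(x_i) = 1/n for every element, so to first order:
    mean_sigma = np.sqrt(np.sum(sigmas ** 2)) / n
    # Returning plain numbers discards the link back to the inputs, so a
    # subsequent x - x.mean() would (slightly) overestimate the errors.
    return values.mean(), mean_sigma

print(mean_dropping_correlations([1.0, 2.0, 3.0], [0.1, 0.1, 0.1]))
# (2.0, 0.0577...)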

@lebigot commented May 7, 2015

Good that you welcome further thoughts: I have needed mean() etc. with uncertainties quite a few times. Handling its uncertainty is almost OK: the mean with its derivative information is essentially only as big as the array itself. So, in my ideal world, mean() would keep its correct uncertainty. Optimizations and approximations could be triggered, though, for x - x.mean(). Again, I would be curious to see how C libraries that handle uncertainties in arrays tackle this.

@astropy-bot (bot) commented Sep 27, 2017

Hi there @mhvk 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

I see this is an experimental pull request. I'll report back on the checks once the PR discussion is settled.

If there are any issues with this message, please report them here.

@mhvk (Contributor Author) commented Sep 27, 2017

Rebased. @eteq - it would be good to discuss this together with the attempt at implementing distributions.


@astropy deleted a comment from astropy-bot (bot) on Sep 27, 2017
@bsipocz (Member) commented Sep 27, 2017

@mhvk - I see that you've deleted a comment from the bot. Was it misbehaving? Any feedback is great at this point, so we can improve it.

@mhvk (Contributor Author) commented Sep 27, 2017

Not at all, it was simply no longer relevant, so I hoped to make life easier for someone going over the earlier discussion.


import numpy as np
from astropy.utils.misc import isiterable
from astropy.utils.compat import NUMPY_LT_1_10
Member

since we require np 1.10+, this isn't available any more

@@ -0,0 +1,5 @@
# Licensed under a 3-clause BSD style license - see LICENSE.rst
from __future__ import absolute_import
Member

please remove the future import

        return result

    def __array_finalize__(self, obj):
        if super(Variable, self).__array_finalize__:
Contributor

With only Python 3 support you don't need the arguments for super.


    def __array_finalize__(self, obj):
        if super(Variable, self).__array_finalize__:
            super(Variable, self).__array_finalize__(obj)
Contributor

Ditto.

        value = q_cls(value, unit, **kwargs)
        if uncertainty is not None:
            uncertainty = q_cls(uncertainty, value.unit)
        return super(UQuantity, cls).__new__(cls, value, uncertainty, **kwargs)
Contributor

Ditto.

        # Use str('U') to avoid unicode for class name in python2.
        self[q_cls] = type(str('U') + q_cls.__name__,
                           (UQuantity, Variable, q_cls), {})
        return super(_SubclassDict, self).__getitem__(q_cls)
Contributor

Ditto.

@mhvk (Contributor Author) commented Jul 25, 2019

Talking about NDData interaction: we need a way to ignore covariance. Similarly, we may want some way to ensure covariance is tracked even in reductions.
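One possible shape for that control (purely hypothetical; neither _TRACK_COVARIANCE nor track_covariance exists in this PR) would be a module-level flag with a context manager, which NDData-style code could use to opt out cheaply:

import contextlib

_TRACK_COVARIANCE = True  # hypothetical module-level default

@contextlib.contextmanager
def track_covariance(enabled):
    """Temporarily switch covariance tracking on or off."""
    global _TRACK_COVARIANCE
    previous = _TRACK_COVARIANCE
    _TRACK_COVARIANCE = enabled
    try:
        yield
    finally:
        _TRACK_COVARIANCE = previous

# Hypothetical usage:
# with track_covariance(False):
#     result = a * b + c   # propagate variances only, no cross terms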

@astromancer (Contributor)

@mhvk This is great work, and very necessary in my opinion! I'm keen to help you develop and test this. It would be helpful if you could point out the aspects on which help would be most beneficial; based on your comments, it seems that right now making the covariance-tracking mechanism optional is a priority. Otherwise I'll just start hacking and send PRs to your fork.

# -*- coding: utf-8 -*-
# Licensed under a 3-clause BSD style license - see LICENSE.rst
"""
Distribution class and associated machinery.
Contributor

This should probably read "Uncertainty" instead of "Distribution"

    np.multiply: (lambda x, y: y,
                  lambda x, y: x),
    np.true_divide: (lambda x, y: 1/y,
                     lambda x, y: -x/y**2),
Contributor

In terms of performance, np.square is typically somewhat faster (5-15% in my experience) than using the power operator to square. Replacing all the squares here with np.square should give you a moderate performance boost. The same is probably true for np.power, but I have not tested this.

Contributor Author

Numpy makes this optimization internally for __pow__: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/number.c#L507

Contributor

Interesting. I wasn't aware of this optimization. I convinced myself that np.square was faster by running this test, but I suppose these results are themselves uncertain (ahem):

for p in range(1, 8):
    n = int(10 ** p)
    x = np.random.rand(n)
    t_pow = %timeit -o x ** 2
    t_sqr = %timeit -o np.square(x)
    speedup = np.mean(1 - np.divide(t_sqr.timings, t_pow.timings))
    print('speedup (n = %.0e): %.2f%%' % (n, speedup * 100))

The results were:

971 ns ± 63.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
911 ns ± 52.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
speedup (n = 1e+01): 5.71%
977 ns ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
909 ns ± 23.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
speedup (n = 1e+02): 6.84%
1.6 µs ± 45.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.53 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
speedup (n = 1e+03): 4.19%
5.23 µs ± 407 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.81 µs ± 302 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
speedup (n = 1e+04): 7.70%
64.2 µs ± 5.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66 µs ± 7.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
speedup (n = 1e+05): -3.82%
1.86 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.86 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
speedup (n = 1e+06): -0.71%
55.4 ms ± 3.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
47.9 ms ± 7.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
speedup (n = 1e+07): 12.76%

Contributor Author

Quite large fluctuations, indeed. Overall, my sense would be to leave it for the future - there may be larger advantages to be had by carefully ensuring more complicated operations are done in-place (though some of that is also done automatically in numpy).

        return '{0}({1})'.format(type(self).__name__, self.derivatives)


class Variable(np.ndarray):
Contributor

I wonder if there may not be a better name for this class. "Variable" is a term that is already quite overloaded; it doesn't seem such a good fit for the intended use of this class and could be confusing to the uninitiated. In statistical parlance, the objects represented by (instances of) this class are random variates, that is, realizations of a random variable. It would also be a good idea to include "Array" in the name to indicate that this is actually a numpy array subclass. I would therefore suggest RandomVariateArray as a potential alternative name.

Contributor Author

Agreed the name is very poor, but "random" to me suggests there is a random number generator involved, which is of course not the case. Maybe Measurement? But then I hope to use the same thing also for derived quantities... Wikipedia suggests Measurand, but this is the first time I have ever seen that term...

Contributor

I'm in favour of Measurement or Variate or even VariateArray over Variable

Contributor Author

My problem with Variate remains the association with random number generators - I have not seen it used for measurements or derived quantities. My problem with Measurement is that it suggests no dependence on other objects, while the whole point of the class is to include correlations. But Variable sounds like a variable in a program, which is also no good.

What I'll do for now is put an item right at the top that we have to decide on the name...

@mhvk (Contributor Author) commented Sep 11, 2019

@astromancer - it would be great to get some help with this. It is also on the schedule for a bit of hacking in November with @eteq, but the further along it is by that time, the better - a definite goal is to have it in 4.0.

The discussion above indeed mentions opting in/out of tracking covariance, but my feeling is that it is most important to have a good default. I think for now this has to mean that all reductions drop the covariances.

Looking at the code, though, I see that it really is a bit out of date (e.g., still having __array_prepare__ and __array_wrap__). I also think I would now do the construction a bit more like Distribution, where there is a non-array subclass that deals with the propagation and which mixes itself in with the input class to make a new class.

But I don't think any of the above should stop you from contributing. In particular, it would be fantastic to have more tests. Or to have the basis for the reductions (method "reduce" in __array_ufunc__).
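For orientation, a minimal sketch of where that "reduce" hook lives (illustrative only; the derivative bookkeeping from this PR is elided, so this is not the actual implementation):

import numpy as np

class Variable(np.ndarray):
    """Sketch of where reductions would be intercepted."""
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Strip the subclass so numpy performs the plain operation.
        arrays = [np.asarray(inp) for inp in inputs]
        result = getattr(ufunc, method)(*arrays, **kwargs)
        if method == 'reduce':
            # Proposed default: compute the reduced uncertainty, but
            # drop the correlations with the inputs (return plain data).
            return result
        # For method == '__call__', one would wrap `result` back into
        # Variable and attach the propagated derivatives here.
        return result

v = np.arange(3.).view(Variable)
print(np.add.reduce(v))  # 3.0, as a plain value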

@bsipocz (Member) commented Sep 11, 2019

it is also on the schedule for a bit of hacking in November with @eteq, but the further it is by that time, the better - a definite goal to have it in 4.0.

@mhvk @eteq - feature freeze for 4.0 is the 25th of October

@mhvk modified the milestones: v4.0, v4.1 on Oct 24, 2019
@mhvk (Contributor Author) commented Oct 24, 2019

Made some good progress with this, but it will not be ready by tomorrow. And, as for Masked Quantities, it is best to introduce this in a feature release, not an LTS.

@astropy-bot (bot) commented Nov 18, 2019

Hi humans 👋 - this pull request hasn't had any new commits for approximately 5 months. I plan to close this in a month if the pull request doesn't have any new commits by then.

In lieu of a stalled pull request, please consider closing this and opening an issue instead if a reminder is needed to revisit it in the future. Maintainers may also choose to add the keep-open label to keep this PR open, but this is discouraged unless absolutely necessary.

If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock.

If you believe I commented on this pull request incorrectly, please report this here.

@bsipocz (Member) commented Nov 18, 2019

@mhvk - could you rebase to keep this PR alive?

@astropy-bot (bot) commented Dec 19, 2019

I'm going to close this pull request as per my previous message. If you think what is being added/fixed here is still important, please remember to open an issue to keep track of it. Thanks!

If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here.

@astropy-bot (bot) closed this Dec 19, 2019
@bsipocz removed this from the v4.1 milestone Jan 2, 2020
@nstarman (Member)
@mhvk, v5.0? Would be very cool.

@mhvk (Contributor Author) commented Apr 20, 2021

@nstarman - I'd happily revive this!

@nstarman (Member)

And I'd be happy to help. I would love a way to naturally represent errors in coordinates and this seems a natural first stepping stone.
