WIP Variable class that includes uncertainties; application to UQuantity #3715
Conversation
Force-pushed from 113f8d1 to ef14802
Remind me to look closer at this later -- it might be good to start looking into if/how this can be integrated into modeling as well.
Thanks @mhvk: I will have a look at what you did. The "big outstanding issue" that you mention (large memory consumption for even simple array operations) is tricky (it is one of the main reasons why I did not delve sooner into this business!). A NxN = 1000x1000 matrix […] There are C (and C++?) libraries out there that handle uncertainties in arrays: maybe they solved this issue and can be a source of inspiration?
astropy/uncertainty/core.py (outdated):

```python
UFUNC_DERIVATIVES[np.abs] = UFUNC_DERIVATIVES[np.fabs]

if hasattr(np, 'cbrt'):
    UFUNC_DERIVATIVES[np.cbrt] = lambda x: 1./(3.*np.cbrt(x)**2)
```
Shouldn't this be `**(2/3)`? (`__future__` is used, so no need for `2./3`.)
Had to think again why I did it, but then remembered: `np.cbrt` is pretty optimized, as is squaring. As a result, what I have is faster than just raising to the `2/3`:

```
In [2]: a = np.arange(100.)

In [3]: %timeit np.cbrt(a)
100000 loops, best of 3: 2.93 µs per loop

In [4]: %timeit np.cbrt(a)**2
100000 loops, best of 3: 3.78 µs per loop

In [5]: %timeit a**(2/3)
100000 loops, best of 3: 7.33 µs per loop
```
Ah, sorry, I had missed the fact that you had put `np.cbrt(x)`: I thought it was only `x`. Good timing results (I guess that in your example you have 2/3 = 0.666… and not 0, right?).
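The two forms being discussed are algebraically identical, since d/dx cbrt(x) = (1/3) x^(-2/3) = 1/(3 cbrt(x)^2). A quick numerical check (not part of the PR, just a sanity test of the derivative formula):

```python
import numpy as np

# Compare the form used in the PR against the mathematically obvious one,
# and against a central finite difference of np.cbrt itself.
x = np.linspace(0.5, 10.0, 50)
fast = 1.0 / (3.0 * np.cbrt(x) ** 2)      # form used in the PR (faster)
plain = (1.0 / 3.0) * x ** (-2.0 / 3.0)   # direct power-law form
assert np.allclose(fast, plain)

h = 1e-6
numeric = (np.cbrt(x + h) - np.cbrt(x - h)) / (2 * h)
assert np.allclose(fast, numeric, rtol=1e-5)
```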
@lebigot - indeed, you're right, in […]. Overall, if only for clarity of implementation, it seems reasonably sensible to let any of the […]
Good that you welcome further thoughts: I have been needing […]
Hi there @mhvk 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and makes sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃. I see this is an experimental pull request. I'll report back on the checks once the PR discussion is settled. If there are any issues with this message, please report them here.
Rebased. @eteq - it would be good to discuss this together with the attempt at implementing distributions.
@mhvk - I see that you've deleted a comment from the bot. Was it misbehaving? Any feedback is great at this point, so we can improve it.
Not at all, it was just no longer relevant, so I hoped to make life easier for anyone going over the earlier discussion.
astropy/uncertainty/core.py (outdated):

```python
import numpy as np
from astropy.utils.misc import isiterable
from astropy.utils.compat import NUMPY_LT_1_10
```
Since we require numpy 1.10+, this isn't available any more.
astropy/uncertainty/__init__.py (outdated):

```python
# Licensed under a 3-clause BSD style license - see LICENSE.rst
from __future__ import absolute_import
```
Please remove the `__future__` import.
astropy/uncertainty/core.py (outdated):

```python
        return result

    def __array_finalize__(self, obj):
        if super(Variable, self).__array_finalize__:
```
With only Python 3 support you don't need the arguments for `super`.
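For illustration, the zero-argument `super()` form in an `__array_finalize__` override looks like this. This is a hypothetical toy subclass (`Tagged` is not part of the PR), kept close in shape to the diff, including the truthiness guard (the base implementation was `None` on old numpy versions):

```python
import numpy as np

class Tagged(np.ndarray):
    """Toy ndarray subclass illustrating zero-argument super()."""
    def __new__(cls, data, tag=None):
        obj = np.asarray(data).view(cls)
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        # Python 3 zero-argument super(); guard mirrors the PR code.
        if super().__array_finalize__:
            super().__array_finalize__(obj)
        self.tag = getattr(obj, 'tag', None)

t = Tagged([1., 2., 3.], tag='calibrated')
assert t[1:].tag == 'calibrated'   # attribute survives slicing
```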
astropy/uncertainty/core.py (outdated):

```python
    def __array_finalize__(self, obj):
        if super(Variable, self).__array_finalize__:
            super(Variable, self).__array_finalize__(obj)
```
Ditto.
astropy/uncertainty/uquantity.py (outdated):

```python
        value = q_cls(value, unit, **kwargs)
        if uncertainty is not None:
            uncertainty = q_cls(uncertainty, value.unit)
        return super(UQuantity, cls).__new__(cls, value, uncertainty, **kwargs)
```
Ditto.
astropy/uncertainty/uquantity.py (outdated):

```python
        # Use str('U') to avoid unicode for class name in python2.
        self[q_cls] = type(str('U') + q_cls.__name__,
                           (UQuantity, Variable, q_cls), {})
        return super(_SubclassDict, self).__getitem__(q_cls)
```
Ditto.
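The `_SubclassDict` hunk above builds and caches a `U`-prefixed mixin subclass per requested class on demand. A stripped-down, self-contained sketch of the same idea (simplified names, `__missing__` instead of the PR's `__getitem__` override, and a stand-in `Quantity`):

```python
class _SubclassCache(dict):
    """Create and cache a 'U'-prefixed subclass of each requested class."""
    def __missing__(self, base_cls):
        # type(name, bases, namespace) builds the subclass dynamically,
        # mirroring the type(str('U') + q_cls.__name__, ...) call above.
        new_cls = type('U' + base_cls.__name__, (base_cls,), {})
        self[base_cls] = new_cls
        return new_cls

cache = _SubclassCache()

class Quantity:          # stand-in for astropy's Quantity
    pass

UQuantity = cache[Quantity]
assert UQuantity.__name__ == 'UQuantity'
assert cache[Quantity] is UQuantity   # second lookup hits the cache
```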
Talking about […]
@mhvk This is great work, and very necessary in my opinion! I'm keen to help you develop and test this. It would be helpful if you could point out the aspects where help would be most beneficial - based on your comments it seems that right now making the covariance tracking mechanism optional is a priority. Otherwise I'll just start hacking and send PRs to your fork.
```python
# -*- coding: utf-8 -*-
# Licensed under a 3-clause BSD style license - see LICENSE.rst
"""
Distribution class and associated machinery.
```
This should probably read "Uncertainty" instead of "Distribution".
```python
    np.multiply: (lambda x, y: y,
                  lambda x, y: x),
    np.true_divide: (lambda x, y: 1/y,
                     lambda x, y: -x/y**2),
```
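The table maps each ufunc to the partial derivatives of its result with respect to each input. As a rough illustration of how such a table drives first-order error propagation for independent inputs (a simplified sketch with a hypothetical `propagate` helper, not the PR's implementation):

```python
import numpy as np

# Simplified derivative table in the same spirit as the diff above.
DERIVS = {
    np.multiply: (lambda x, y: y, lambda x, y: x),
    np.true_divide: (lambda x, y: 1 / y, lambda x, y: -x / y**2),
}

def propagate(ufunc, x, sx, y, sy):
    """sigma_f**2 = (df/dx * sx)**2 + (df/dy * sy)**2 for independent x, y."""
    dx, dy = DERIVS[ufunc]
    value = ufunc(x, y)
    sigma = np.sqrt((dx(x, y) * sx) ** 2 + (dy(x, y) * sy) ** 2)
    return value, sigma

v, s = propagate(np.multiply, 3.0, 0.1, 2.0, 0.2)
assert v == 6.0
assert np.isclose(s, np.sqrt((2.0 * 0.1) ** 2 + (3.0 * 0.2) ** 2))
```

The real class must additionally track cross-terms between correlated inputs, which is exactly what this table of derivatives enables.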
In terms of performance, `np.square` typically tends to be somewhat faster (5-15% in my experience) than using the power operator to square. Replacing all the squares here with `np.square` should give you a moderate performance boost. The same is probably true for `np.power`, but I have not tested this.
Numpy makes this optimization internally for `__pow__`: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/number.c#L507
Interesting. I wasn't aware of this optimization. I convinced myself that `np.square` was faster by running this test, but I suppose these results are themselves uncertain (ahem):

```
for p in range(1, 8):
    n = int(10 ** p)
    x = np.random.rand(n)
    t_pow = %timeit -o x ** 2
    t_sqr = %timeit -o np.square(x)
    speedup = np.mean(1 - np.divide(t_sqr.timings, t_pow.timings))
    print('speedup (n = %.0e): %.2f%%' % (n, speedup * 100))
```
The results were:

```
971 ns ± 63.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
911 ns ± 52.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
speedup (n = 1e+01): 5.71%
977 ns ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
909 ns ± 23.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
speedup (n = 1e+02): 6.84%
1.6 µs ± 45.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.53 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
speedup (n = 1e+03): 4.19%
5.23 µs ± 407 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.81 µs ± 302 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
speedup (n = 1e+04): 7.70%
64.2 µs ± 5.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66 µs ± 7.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
speedup (n = 1e+05): -3.82%
1.86 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.86 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
speedup (n = 1e+06): -0.71%
55.4 ms ± 3.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
47.9 ms ± 7.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
speedup (n = 1e+07): 12.76%
```
Quite large fluctuations, indeed. Overall, my sense would be to leave it for the future - there may be larger advantages to be had by carefully ensuring more complicated operations are done in-place (though some of that is also done automatically in numpy).
```python
        return '{0}({1})'.format(type(self).__name__, self.derivatives)


class Variable(np.ndarray):
```
I wonder if there may not be a better name for this class. "Variable" is a term that is already quite overloaded; it does not seem a good fit for the intended use of this class and could be confusing to the uninitiated. In statistical parlance, the objects represented by (instances of) this class are random variates - that is, realizations of a random variable. It would also be a good idea to include "Array" in the name to indicate that this is actually a numpy array subclass. I would therefore suggest `RandomVariateArray` as a potential alternative name.
Agreed the name is very poor, but `random` to me suggests there is a random number generator involved, which is of course not the case. Maybe `Measurement`? But then I hope to use the same thing also for derived quantities... Wikipedia suggests `Measurand`, but this is the first time I ever saw that term...
I'm in favour of `Measurement` or `Variate` or even `VariateArray` over `Variable`.
My problem with `Variate` remains the association with random number generators - I have not seen it used for measurements or derived quantities. My problem with `Measurement` is that it suggests no dependence on other objects, while the whole point of the class is to include correlations. But `Variable` sounds like an instance in a programme, which is also no good.

What I'll do for now is put an item right at the top noting that we still have to decide on the name...
@astromancer - it would be great to get some help with this - it is also on the schedule for a bit of hacking in November with @eteq, but the further along it is by that time, the better - a definite goal is to have it in 4.0. The discussion above indeed mentions opting in/out of tracking covariance, but my feeling is that it is most important to have a good default. I think for now this has to mean that all reductions drop the covariances. Looking at the code, though, I see that it really is a bit out of date (e.g., still having […]).

But I don't think any of the above should stop you from contributing. In particular, it would be fantastic to have more tests. Or to have the basis for the reductions (method "reduce" in […]).
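The cost of tracking covariances through reductions can be made concrete with a back-of-the-envelope sketch (my own illustration, not PR code): for `y = x - x.mean()` with `n` elements, every output element depends on every input element, so the Jacobian is a dense `n x n` matrix.

```python
import numpy as np

# x - x.mean() is linear: y = J @ x with J = I - 1/n, a dense n x n
# matrix, so full covariance tracking needs O(n**2) storage.
n = 5
J = np.eye(n) - 1.0 / n

x = np.arange(5.0)
assert np.allclose(J @ x, x - x.mean())

# For a 1000 x 1000 image, n = 1e6, so J alone would take
# (1e6)**2 * 8 bytes = 8e12 bytes, i.e. roughly 8 TB of float64 storage.
jacobian_bytes = (1000 * 1000) ** 2 * 8
assert jacobian_bytes == 8_000_000_000_000
```

This is why dropping covariances by default in reductions, as suggested above, is the only tractable general behaviour.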
Made some good progress with this, but not ready by tomorrow. And, like for Masked Quantities, best to introduce this in a feature release, not LTS.
Hi humans 👋 - this pull request hasn't had any new commits for approximately 5 months. I plan to close this in a month if the pull request doesn't have any new commits by then. In lieu of a stalled pull request, please consider closing this and opening an issue instead if a reminder is needed to revisit it in the future. Maintainers may also choose to add the […] label. If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock. If you believe I commented on this pull request incorrectly, please report this here.
@mhvk - could you rebase to keep this PR alive?

I'm going to close this pull request as per my previous message. If you think what is being added/fixed here is still important, please remember to open an issue to keep track of it. Thanks! If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here.

@mhvk, v5.0? Would be very cool.

@nstarman - I'd happily revive this!

And I'd be happy to help. I would love a way to naturally represent errors in coordinates, and this seems a natural first stepping stone.
EDIT: updated: this PR provides ways to propagate uncertainties analytically, keeping track of correlations. It is meant as a companion to the `Distribution` class, which follows, effectively, a Monte-Carlo approach to propagating uncertainties. Items still to do:

- Settle on a name (`Variable` is poor, `Measurement` does not include propagation/correlated errors, `Variate` suggests random number generation; `Measurand` may be closest but is not common).
- Like `Distribution`, consider auto-generating subclasses (`VariableArray`, `VariableQuantity`, etc.).
- Use `__array_ufunc__` only (no `__array_prepare__`, etc.).

Follow-up of this PR might include:

- `nddata` […]
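Since the to-do list settles on `__array_ufunc__` as the sole interception point, here is a minimal sketch of what that override looks like on an `ndarray` subclass (an illustrative `Wrapped` class of my own, not the PR's `Variable`, and with no uncertainty propagation):

```python
import numpy as np

class Wrapped(np.ndarray):
    """Minimal __array_ufunc__ override: unwrap the inputs, call the
    ufunc on plain arrays, then re-wrap the result."""
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        plain = [np.asarray(a) for a in inputs]   # strip the subclass
        result = getattr(ufunc, method)(*plain, **kwargs)
        if isinstance(result, np.ndarray):
            return result.view(Wrapped)
        return result

a = np.arange(3.0).view(Wrapped)
b = np.add(a, 1.0)          # dispatched through __array_ufunc__
assert isinstance(b, Wrapped)
assert np.asarray(b).tolist() == [1.0, 2.0, 3.0]
```

A real `Variable` would, at the unwrap/re-wrap step, look up the ufunc's derivatives and attach propagated uncertainties to the result.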
BELOW IS THE OUTDATED ORIGINAL TEXT
Very much work in progress, so do not merge.

This is a first stab at a general `Variable` class, a subclass of `ndarray` that tracks uncertainties. Also implemented is an application to `UQuantity`, i.e., a mixin with `Quantity`. It is still inspired by the `uncertainties` package [1], written by @lebigot, and many of the test cases are taken from the GSoC work of @Epitrochoid. Unlike those efforts, however, this is a proper `ndarray` subclass (in @Epitrochoid's approach arrays were not yet possible, while in `uncertainties`, object arrays (holding `Variable` instances) are used, which is much slower).

Note that I started writing this using `__numpy_ufunc__`, which is the most logical and fastest approach, and hence this PR includes the commits of #2583 and #2948. Upon further thought, however, it became clear that this would work with the usual `__array_prepare__` and `__array_wrap__` as well; for now, I didn't feel it was worth removing the `__numpy_ufunc__` machinery.

Anyway, as is, this works well for generic `ufunc` operations. One big outstanding issue is what to do with methods that combine (parts of) an array, such as `sum` or `mean`. E.g., if we do `x - x.mean()`, where `x` is some large array, do we want to track for every element of `x` its dependence on every other element of `x` (which, for a 1000x1000 image, would imply a 1000x1000x1000x1000 derivative matrix)?

cc: @wkerzendorf, @eteq, @mwcraig

@lebigot: please let me know if you are interested in using this in your package.

[1] https://github.com/lebigot/uncertainties