ENH nddata: collapse operations on NDDataArray, improved Masked Quantity support #14175
Conversation
👋 Thank you for your draft pull request! Do you know that you can use …
@DanRyanIrish this looks like something you might want to look at.
Hi @bmorris3. This looks like a specific case of the same problem we are trying to solve in sunpy/ndcube#450, where we are implementing collapse operations for `ndcube`. I haven't gone through your draft implementation in detail yet, but at least parts of your implementation might be better than what we have in `ndcube`.
Hi @DanRyanIrish, thanks for the heads up on a related effort in sunpy/ndcube#450. As you say, the two efforts overlap. At first look, there might be two main differences.
https://github.com/sunpy/ndcube/blob/488fc6bba269e6699d940696b1a0691761d90658/ndcube/ndcube.py#L731

Two key features of this astropy PR are generalized solutions for (1) and (2). For example, in astronomical observations with spectral cubes, we often have an unresolved star producing most of the signal in a single pixel. As you noticed, I'm still working on an efficient implementation of …
Looking closer, I'm not sure that my previous comment is correct about (1). It looks like …
Thanks for your thoughts on sunpy/ndcube#450 @bmorris3.
As you mentioned subsequently, the reason uncertainty propagation is not supported for these operations is that, as far as I can see, the uncertainty is ill-defined. For example, there may be multiple array values equal to the maximum value in the cube. However, each may have a different uncertainty, e.g. due to different noise levels in different pixels. In this case, what should the uncertainty be? The same ambiguity holds for the minimum.
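The ambiguity can be seen in a few lines of plain numpy (the values here are made up for illustration): two elements tie for the maximum, but carry different uncertainties.

```python
import numpy as np

# Toy data: two pixels tie for the maximum value but have different
# uncertainties -- which uncertainty belongs to the collapsed maximum?
data = np.array([1.0, 5.0, 3.0, 5.0])
uncert = np.array([0.1, 0.2, 0.1, 0.7])

i = np.argmax(data)           # numpy silently reports the *first* occurrence
print(i, data[i], uncert[i])  # 1 5.0 0.2 -- the 0.7 at index 3 is ignored
```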
To my mind, the way to handle this use-case and unambiguously maintain the uncertainty is to find the index of the spatial pixel and then slice the cube so it gives you the spectrum at that spatial location. Although this does involve at least two operations rather than just one, and so is a little more cumbersome. What do you think about the above? Have I misunderstood your use-case? Is uncertainty well-defined in your use-case?
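A minimal numpy sketch of this two-step recipe (array shapes and names are invented for illustration): locate the pixel of interest first, then slice, so every value keeps its own uncertainty.

```python
import numpy as np

# Toy spectral cube: axis 0 is spectral, axes 1-2 are spatial.
rng = np.random.default_rng(0)
cube = rng.normal(size=(5, 4, 4))
uncert = rng.uniform(0.1, 1.0, size=(5, 4, 4))

# Step 1: find the spatial pixel holding the global maximum of the cube.
_, y, x = np.unravel_index(np.argmax(cube), cube.shape)

# Step 2: slice out the spectrum at that pixel; data and uncertainty
# stay paired, so no ambiguity arises about which uncertainty to keep.
spectrum = cube[:, y, x]
spectrum_uncert = uncert[:, y, x]
```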
It's true that …

I think the advantage of this …
Just in case it helps: much of the machinery for propagating masks would seem identical to what is done in the `Masked` class [1].

[1] E.g., astropy/astropy/utils/masked/core.py, lines 1098 to 1119 in 5250b24
@mhvk thanks for the pointer! I was actually just reading about this module because I didn't know much about it.
Upon further inspection of …
I'm still sorting out all of the implications of this, but following @mhvk's suggestion I have started building on Masked arrays from `astropy.utils.masked`.

Examples

Suppose you have loaded a spectral cube (two spatial dims, one spectral dim) into an `NDDataArray` called `nddata`. If you call

```python
np.sum(nddata, axis=(1, 2))  # internally this becomes `nddata.sum(axis=(1, 2))`
```

you will be returned the collapsed spectrum as an

```
NDDataArray([68098.234, 80767.66 , 69840.2 , ..., 94635.03 , 92484.08 ,
             89727.45 ], unit='MJy / sr')
```

We can only propagate errors, and return an `NDDataArray`, for operations with explicit support. If you call a numpy function that doesn't have explicit support for error propagation from the additions in this PR, like

```python
np.std(nddata, axis=(1, 2))
```

it falls back on returning a

```
masked_array(data=[137.33192443847656, 138.0076141357422,
                   137.83934020996094, ..., 238.78318786621094,
                   238.67127990722656, 238.54486083984375],
             mask=[False, False, False, ..., False, False, False],
             fill_value=1e+20,
             dtype=float32)
```
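Conceptually, a sum-collapse with error propagation like the one above can be sketched in plain numpy (a sketch assuming independent Gaussian uncertainties, which add in quadrature; the names and values are illustrative, not the PR's actual code):

```python
import numpy as np

# Toy cube: axis 0 spectral, axes 1-2 spatial.
data = np.arange(24, dtype=float).reshape(2, 3, 4)
uncert = np.full_like(data, 0.5)
mask = np.zeros(data.shape, dtype=bool)
mask[0, 0, 1] = True  # masked values are excluded from the collapse

# Zero out masked values before summing; uncertainties add in quadrature.
summed = np.where(mask, 0.0, data).sum(axis=(1, 2))
summed_uncert = np.sqrt(np.where(mask, 0.0, uncert**2).sum(axis=(1, 2)))
```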
Happy that this looks helpful! Might it make sense to return a `Masked` array there instead?

p.s. At some point I really hope to make some progress on getting proper error propagation, including covariance, supported in …
@mhvk I agree that the behavior above is inconsistent, but I think this stems from a feature rather than a bug:

astropy/astropy/nddata/compat.py, lines 229 to 237 in a7cb990

The above snippet exports the data and mask as a numpy masked array. The (few) operations with explicit support in this PR return an `NDDataArray` instead. Clearly, one likely source of confusion in this type of implementation is that you get different output types depending on which function you call.
OK, that certainly makes sense. Maybe in the future this can change (e.g., one might want to return `Masked` arrays).
Great! I want to preempt another likely source of confusion here: if you call a numpy function on an `NDDataArray`, it may be dispatched to the corresponding `NDDataArray` method (as with `np.sum` above). Relevant notes in the numpy docs here.
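That dispatch behavior comes from numpy's `__array_function__` protocol (NEP 18). A minimal, self-contained sketch with a hypothetical class (not astropy's actual implementation) shows how `np.sum(obj)` can be rerouted to a method:

```python
import numpy as np

class Collapsible:
    """Hypothetical container demonstrating the NEP 18 dispatch protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # np.sum(obj, ...) lands here before numpy's own implementation runs.
        if func is np.sum:
            return self.sum(*args[1:], **kwargs)
        return NotImplemented  # other numpy functions are not supported

    def sum(self, axis=None):
        return self.data.sum(axis=axis)

obj = Collapsible([[1, 2], [3, 4]])
print(np.sum(obj, axis=0))  # dispatched to Collapsible.sum -> [4 6]
```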
There is a conflict now, FYI
I guess it comes down to whether you want a correct answer most of the time in a way that mimics numpy, or to never give a wrong answer. This is probably a matter of preference, but one argument against mimicking numpy is that numpy arrays don't have uncertainties, so it's not obvious numpy's behavior should constrain ours here.

This has prompted me to attempt to support uncertainty preservation for min/max in sunpy/ndcube#450. However, instead of taking the first found uncertainty, I preserve the largest uncertainty. This seems to me to better reflect our actual knowledge of what the min/max value is. Any thoughts?
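The two conventions under discussion can be contrasted in a few lines of plain numpy (names and values are illustrative; "largest uncertainty" is read here as the largest along the collapsed axis):

```python
import numpy as np

data = np.array([[1.0, 5.0, 5.0],
                 [2.0, 7.0, 3.0]])
uncert = np.array([[0.1, 0.2, 0.9],
                   [0.3, 0.4, 0.5]])

collapsed = data.max(axis=1)

# Convention A (numpy-like): uncertainty of the *first* maximum found.
idx = np.argmax(data, axis=1)
uncert_first = np.take_along_axis(uncert, idx[:, None], axis=1).squeeze(axis=1)

# Convention B (conservative): the largest uncertainty along the axis.
uncert_largest = uncert.max(axis=1)

print(uncert_first)    # [0.2 0.4]
print(uncert_largest)  # [0.9 0.5]
```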
I hadn't thought of this approach. It's compelling. One aspect is still not clear to me though. In the odd-length case, if there are multiple occurrences of the median value, which one's uncertainty gets used? Is it meaningful to simply take the "middle" one as determined by the sorting algorithm, or should it depend on the uncertainty, e.g. taking the largest as in the ndcube min/max case?
Bit of a comment from the side, but I think that picking particular elements based on their uncertainty is not really "better"; the question of what the minimum, maximum, or median is in the presence of uncertainties likely has more statistically meaningful answers (which may well depend on the distribution, the number of points, and the confidence level one wishes to have, just as the mean has an error in the mean, which is not so trivially determined if the data points are not consistent with the mean within their uncertainties, i.e., reduced chi2 inconsistent with unity).
I agree with @mhvk. To summarize for others joining the discussion, I think the first decision to make is whether or not we return any result when the operation is ambiguous. We both agree that the user should get an answer, but we're disagreeing on which answer. The approach @DanRyanIrish is suggesting says "take the most conservative possible uncertainty", which is absolutely a handy approach when doing model optimization on observations, for example. The approach I've implemented says "return the first found minimum/maximum and its corresponding uncertainty", which mirrors the behavior in numpy.
To further contribute to this discussion, I spoke with a couple of solar colleagues who argued for a third way of calculating the uncertainty in the min & max cases, which won me over. For the case of max, the returned data value is simply the max over the axis/superpixel. But the uncertainty is …
Hmm. Perhaps.
In conversations with @mwcraig and @DanRyanIrish, we have agreed that the kwarg discussed earlier (#14175 (comment)) should be changed to …
@bmorris3 -- will a rebase fix the issue with the weird coverage results?
- Adding masking support in nddata collapse ops with astropy.utils.masked
- Adding tests for NDDataArray collapse operations
- correcting NDDataArray collapse w propagation for ndim<=3
- remove print statement from testing
- Fixing unit propagation bug
- preserve masked copying behavior
- fixing up masked array views in nddata
- NDDataArray arithmetic testing improvements
- Further NDDataArray testing improvements
- Fixing ufunc inplace error test
- Updating docs to reflect small API change when not handling masks
- Cleaning up PR 14175
- Fixing mask overwriting order for masked quantities
- more thorough handling of inputs for NDData constructor
- return to assumption NDData.data is ndarray
- fixing incorrect result for np.ma.maskedarray
- Increasing coverage
- adding changelog entry
- support for mask=None case
- fixing up astropy mask handling in ndarithmetic
- further improvements for mask handling
- remove amin/amax, drop uncertainties from collapse via min/max by default
- generalizing to >3D cube collapse
- Apply suggestions from code review (Co-authored-by: Matt Craig <mattwcraig@gmail.com>)
- Splitting out masked improvements for separate PR
- Adding filterwarnings for div by zero error (addressed in another PR)
- simplifying private methods for extrema in nddata
- Improving test coverage
Only one way to find out. Here we go!
This is good to go once tests pass! Thanks to everyone who has provided input here.
Just to be safe, I am also running the cron jobs. 😬
Since this is slated to go in soon, I also took the liberty to fix the What's New listing that I noticed while glancing at the diff. Hope you don't mind!
linkcheck failure is unrelated. Merging, thank you all very much!
Thanks to all reviewers! I'm very happy to have had your helpful, patient feedback.
Description

The nddata module (docs) supports arithmetic between `NDDataArray` objects with propagation of units, masking and uncertainties. The currently supported arithmetic operations include addition, subtraction, multiplication and division.

A set of unsupported operations would "collapse" `NDDataArray`s along specified axes. For a concrete example, imagine the `NDDataArray` named `cube` represents a spectral cube from an IFU with a spectral dimension (axis=0) and two spatial dimensions (axes 1 and 2). It may be useful to collapse the cube to a single spectrum by taking the maximum over the spatial dimensions at each wavelength, which might be represented with a numpy-like syntax like `collapsed_spectrum = cube.max(axis=(1, 2))`. In the case of `min`/`max`, we expect the uncertainties in `collapsed_spectrum` to be the uncertainties at each maximum flux in the spatial dimensions. If instead we took the mean with `cube.mean(axis=(1, 2))`, we would propagate uncertainties along the collapsed axes. The mask should also be propagated so that only unmasked values are included in the collapse operation.

This type of generic collapse operation is something we need over in jdaviz, where we currently rely on glue to collapse our spectral cubes after they have been converted to glue `Data` objects. Since we don't anticipate unit/mask/error propagation being added to glue `Data` objects, we're hoping to implement this functionality in astropy, and then offer jdaviz users the option to propagate units/masks/errors via the astropy methods.

This PR is a first-draft implementation of `sum`, `mean`, `min`, and `max` methods for `NDDataArray`, which accept an integer or a tuple of ints for the `axis` argument. As of opening this draft PR, I have implemented the collapse arithmetic for the data and uncertainties, and I'm now working on correctly applying the masks. I'm putting up this unfinished PR to collect comments and discussion while I put together docs and tests. Comments welcome and needed!

Checklist for package maintainer(s)
This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

- … `Extra CI` label. Codestyle issues can be fixed by the bot.
- … `no-changelog-entry-needed` label. If this is a manual backport, use the `skip-changelog-checks` label unless special changelog handling is necessary.
- … `astropy-bot` check might be missing; do not let the green checkmark fool you.
- … `backport-X.Y.x` label(s) before merge.

See additional to-do items at #14175 (comment)