-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Allow to choose median function for mad_std and median_absolute_deviation #5835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Oh, I could have skipped the CI, if someone wants to kill the builds ... |
This looks great to me, please continue! Thanks, @saimn. |
This has some overlap with #5232. I like the possibility of providing a function (🎉). But it seems weird that any providided function is overridden if the input is a masked array. |
Ah, yes, thanks for reminding me about #5232. For a masked array input, we could turn the masked values to NaNs and use EDIT: I'm not sure if we can give the user a choice for masked arrays. |
Oh indeed it's quite the same as #5232, but with a different approach. I prefer a simple approach like here where the user has the responsibility to choose the right function ( For the masked array case, currently it overrides the provided function, which is I think the best choice here. |
Reading a bit more #5232, in particular #5232 (comment) and the benchmark part (#5232 (comment) and below). For me it strengthens the idea that it is difficult to make good default choices, and the best is to give the user to possibility to choose :) |
@saimn I lean the other way - I don't think it's reasonable to expect users to understand all the intricacies of interactions between different versions of numpy and different array types. Personally, if I ever had to figure out which variant of median to use, I'd go back to the source code in #5232. |
I definitely like the possibility of the user having the option to chose the function, but I see the argument in that the average user who will just want the result doesn't want to go through all the work to figure out which to use. So I think the best solution would be a hybrid of the two. Something where the user could specify the function they want to use (and then it would be use that function regardless of what else the user did), but the default would not be an I have to say I don't like how this one changes the function after the user has specified it depending on the array type as that seems non-intuitive from the users perspective to me and something that is done without any warning to the user. |
I have changed the default function to |
@keflavich are you happy with this merging and then in another pull request adding an |
Yes, I think this approach is fine for now and we can hopefully enhance it later. |
Ok great ! Added changelog and a test for the multi-dimensional axis case, it should be ready now. |
@saimn -- it looks like your test for multi-dimensional axis is failing in older versions of numpy. |
@crawfordsm - Thanks for the notice, it was indeed introduced in Numpy 1.9 for |
Failure is due to a timeout in the documentation build:
|
ping @crawfordsm - I think this one is ready (modulo the doc build failure, which you may want to restart ?) |
I restarted it. That was the time the numpy documentation was down, right? 😄 |
astropy/stats/funcs.py
Outdated
|
||
a = np.asanyarray(a) | ||
a_median = func(a, axis=axis) | ||
|
||
# broadcast the median array before subtraction | ||
if axis is not None: | ||
a_median = np.expand_dims(a_median, axis=axis) | ||
if isinstance(axis, (tuple, list)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are lists possible? I thought it was only tuple
or int
. Additional thought: Could this be replaced with a try / except TypeError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah fun, lists are valid for np.median
(docstring speaks of a sequence of int) but not for np.mean
. I usually prefer to use try/except
for real errors, here using isinstance
is more explicit imo.
And thanks for restarting 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that sequence of int is the problem. What if I used axis=np.array([1, 2])
as input? arrays are valid axis inputs for median.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure, it's valid for this case. And after more testing it seems safe to catch TypeError
, so if it's your pleasure to use a Numpy array here let's go for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that np.expand_dims([0], axis=np.array([1,2]))
raises a ValueError
, so two exceptions to catch.
astropy/stats/funcs.py
Outdated
try: | ||
a_median = np.expand_dims(a_median, axis=axis) | ||
except (ValueError, TypeError): | ||
for ax in sorted(list(axis)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does sorted
work correctly even for negative axis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
astropy/stats/funcs.py
Outdated
a_median = np.expand_dims(a_median, axis=axis) | ||
try: | ||
a_median = np.expand_dims(a_median, axis=axis) | ||
except (ValueError, TypeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about:
try:
len(axis)
except TypeError:
axis = [axis]
for ax in ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced with astropy.utils.isiterable
which is quite similar.
Nice, 2 unrelated failures this time:
|
Rebased to get the |
Merging! Thanks @saimn ! |
The idea here is to allow the user to choose which function must be used for the median computation for
mad_std
andmedian_absolute_deviation
(similarly tosigma_clip
), which would be useful for arrays containing NaNs. Usingnp.nanmedian
can be much faster than having to use masked arrays (see example below).Also,
np.nanmedian
allows to use a multi-dimensionalaxis
, as well asnp.median
, butnp.ma.median
cannot do this (though it's possible to bypass this limitation with a reshape).Example on a data cube:
Masked array version:
With this PR,
np.nanmedian
and a multi-dimensionalaxis
:With bottleneck (does not support multi-dimensional
axis
):The PR is not polished yet, it needs tests, docstring, changelog etc., but first I wanted to open the discussion (@larrybradley , anyone else ?)