8000 NDArithmeticMixin and NDUncertainty Refactor by MSeifert04 · Pull Request #4272 · astropy/astropy · GitHub
[go: up one dir, main page]

Skip to content

NDArithmeticMixin and NDUncertainty Refactor #4272

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jan 30, 2016
Merged

NDArithmeticMixin and NDUncertainty Refactor #4272

merged 12 commits into from
Jan 30, 2016

Conversation

MSeifert04
Copy link
Contributor

Initial try of splitting the large #4234. More information will follow when the splitting is done.

edit 12/22/2015: added checklist

Next steps:

  • Rework metadata arithmetic so that it does not enforce any specific methods of combining the metadata -- that works by calling methods on the metadata but that returns None with a warning if the methods don't exist. _arithmetic_meta
  • Make thin wrapper class around a meta if it doesn't have any arithmetic operations.
  • Then, if that looks promising:*
  • Add wrapper classes for other properties.

@MSeifert04
Copy link
Contributor Author

Short Summary

  • UnknownUncertainty is now the default uncertainty for NDData-like classes.
  • NDUncertainty has now adopted the internal mechanics of StdDevUncertainty.
  • StdDevUncertainty now propagates uncertainties always correct and can use a given correlation
  • NDArithmeticMixin has classmethods that allow arithmetic operations where the first element is not a NDData-like class but the result should be.
  • NDArithmeticMixin has now different methods that operate on the attributes. Also different ways of handling the meta in arithmetic operations are now possible.

I think I forgot to mention some further changes and I hope this is not to much change for you to give me some comments.

@MSeifert04
Copy link
Contributor Author

One issue I found about the Uncertainty refactor was how to organize the classes. Since NDUncertainty is a MetaClass but it does everything except the actual propagation. As an alternative I could make NDUncertainty like NDDataBase to a class that only implements an interface (UnknownUncertainty would then be like a Minimal working implementation of it) and make the mechanics (like __init__ and propagation abstractmethods) a Mixin with StdDevUncertainty being a working implementation of it. I don't know what would be better.

I hope you understand what I mean, if not I could wrap this idea in code (just superficially). But maybe I'm thinking too much about it and it is fine just like it is.

function that is used by the name of the operation better than by
the function itself. Suppose we have ``data`` that is a
`~numpy.matrix` or ``Pandas.DataFrame`` we need to use different
UFUNCS (maybe) and that explodes the number of if/elif checks in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this (need to use different UFUNCS) isn't actually an issue per http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe-interoperability-with-numpy-functions

Even if it were an issue I think it would be better to make operation a function that defaults to the appropriate numpy ufunc because then the user can substitute a different, more appropriate, ufunc without needing to subclass or add a special case via a PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to get this straight before I change it:
add should provide the numpy ufunc (np.add) and pass it to _arithmetic as operation? So that another ufunc could be provided by calling _arithmetic directly?
Another thing: How should I evaluate the function in _arithmetics_meta: compare the function name or simply apply the operation-parameter?
For uncertainty_propagation I'll need to compare the operation in a way, should I compare it like operation is np.add or by operation.__name__ == 'add'?

I'm not sure about what would be best, that was one another reason why I thought string comparisons were the best way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add should provide the numpy ufunc (np.add) and pass it to _arithmetic as operation? So that another ufunc could be provided by calling _arithmetic directly?

Mostly, yes. I think what I said before about substituting a different function without subclassing doesn't actually work, but a subclass that wanted to use a different addition could do so by just overriding add without needing to change anything in _arithmetic.

The other advantage of making this change is that you eliminate the need for the duplicate logic in _arithmetic_data and _arithmetic_meta

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing: How should I evaluate the function in _arithmetics_meta: compare the function name or simply apply the operation-parameter?

I would just apply the parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For uncertainty_propagation I'll need to compare the operation in a way, should I compare it like operation is np.add or by operation.__name__ == 'add'?

Good question...looking over the code, I'd lean towards using operation.__name__. What about an approach like that in the current implementation [1]? That adds an argument to _arithmetic that just gets passed through to the uncertainty call but that maybe isn't so bad.

@mwcraig
Copy link
Member
mwcraig commented Oct 31, 2015

@MSeifert04 -- lots of inline comments, I still want to try running the tests locally and see if it breaks anything in ccdproc :). Hoping to do that today.

@MSeifert04
Copy link
Contributor Author

@mwcraig Thanks for the comments, I've adressed every obvious comment but left replies where I wasn't sure how to proceed.

If you need any help with ccdproc I can help if you want.

@mwcraig
Copy link
Member
mwcraig commented Nov 2, 2015

@jehturner -- would you mind looking over this? It would impact NDDataArithmetic. While comments on the code itself are certainly welcome, it would be useful if you could try this out with what you are developing with NDDataArithmetic and report back whether it breaks anything.

@jehturner
Copy link
Member

OK. Any sane changes to the arithmetic are unlikely to break things for me, as I'm not systematically using that functionality yet (though obviously that's the intention). Most of the calculations are done by the external processes I'm initially wrapping and then gradually replacing, something I'm just completing the outline of. I can certainly do a quick check but it might take a few days as I'm juggling 2 deadlines. Anyway, thanks for asking and thanks to Michael for improving the code.

@jehturner
Copy link
Member

I confirm that this does not break my code (in a couple of dozen quick tests). Sorry it has taken so long to comment (has it really been 3 weeks?). I haven't yet looked closely at what these changes actually do but I'm not noticing any unexpected problems.

@embray
Copy link
Member
embray commented Nov 25, 2015

Thanks for checking it out @jehturner --the fact that it works for you is encouraging.

I hope to review these changes soon myself.

@MSeifert04
Copy link
Contributor Author

@mwcraig @embray Should I incorporate @mwcraig suggestions or still wait a bit longer?

@jehturner Thanks for the tests.

@mwcraig
Copy link
Member
mwcraig commented Dec 4, 2015

@MSeifert04 -- hold off for a couple more days, hoping to get comments form @wkerzendorf (he was traveling for most of November).

@@ -23,212 +168,638 @@ class NDArithmeticMixin(object):
When subclassing, be sure to list the superclasses in the correct order
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to automate or force the correct order? @embray you know a lot of python magic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth checking what makes it required to have a certain order. Often, one can write things such that order does not matter (but certainly not always).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MRO dictates the order - a helpful reference is http://nedbatchelder.com/blog/201210/multiple_inheritance_is_hard.html and the comments underneath.

If the mixin overrides a method in NDDataBase you want the mixin listed before the base class.

It might be better to simply point out that order matters and provide a reference....

@MSeifert04
Copy link
Contributor Author

@jehturner - Sorry, I haven't seen your comments. I just used the uncertainty_correlation as given in the error propagation formulas (I don't recall which book I had them from but they are also given here: https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Example_formulas). This is just a shortcut if one really tries to use some known correlation. Thank you for pointing it out!
Implementing some covariance handling by itself is a bit beyond my capabilities. But in some cases I know the uncertainty (because I divide an image A by the pixel-wise mean of Image A, B, C) and want to have the right final uncertainty (including the correlation effect). Not sure if anyone else wants to use it though. :)

@mwcraig - I rebased and deleted the provisional code parts. I hope I have time the next days to make an accurate Changelog entry.

@MSeifert04
Copy link
Contributor Author

I've added a small changelog entry. I think it covers everything related to public methods.

@eteq
Copy link
Member
eteq commented Jan 28, 2016

@MSeifert04 @mwcraig and all the reviewers - Lots of great work here, +:100: !

I have one concern, though, and that's the lack of (narrative) documentation and examples, particularly with the correlation stuff (which is notoriously tricky). I see that #4538 was created as a reminder to do that. But, in my experience, if something gets merged with a separate "do the documentation" issue, often the documentation doesn't get written until it's far too late (i.e., someone mis-used it or gave up on it because they didn't understand it).

That said, I understand you might want to get it in soon for other changes. So @mwcraig, I'm fine with you making a command decision about whether my concern here is warranted or if you're confident you and/or @MSeifert04 will get the docs in for this in a reasonably timely fashion before the key points have escaped your heads.

@eteq
Copy link
Member
eteq commented Jan 28, 2016

Oh one other thing that's more informational than anything else: I think we'll also want to try to at least get a start on something with Quantity and coorelated uncertainties for v1.2 - maybe @mhvk's #3715, or maybe something simpler that then #3715 ends up building on. We will certainly want to try to figure out how to harmonize that with nddata's uncertainties... But I think it's probably better to let them operate seperately at least initially so that we have APIs that make sense for those objects, and then later on try to hook them together better.

@mhvk
Copy link
Contributor
mhvk commented Jan 28, 2016

Agreed to let Quantity and NDData proceed separately; my notes here were really about making people aware of #3715, since it does correlations perfectly (though the resulting derivative matrices can get rather out of hand -- which is what needs thought).

@mwcraig
Copy link
Member
mwcraig commented Jan 30, 2016

@eteq -- I'm going to go ahead and merge this; @MSeifert04 has an outline in an ipython notebook in #4538.

Agreed it would have been better to flesh out the docs at the same time, but this has lingered long enough I'd like to get it closed.

I'll work with @MSeifert04 to divide up the work on the docs with a target date of a week or two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants
0