-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH ufunc called on memmap return a ndarray #7406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Years ago, had started something like this, but had to be a mit more complex, because of subclasses of memmap. You can find it here: #2932 |
assert_(unary_op(fp, axis=1).__class__ is ndarray) | ||
|
||
for binary_op in [add, multiply, divide, add]: | ||
assert_(binary_op(fp, self.data).__class__ is ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that the test failure is because self.data
starts with zero, which causes problems with division.
Not sure if the counts as a bug/enhancement/change in behavior. If the latter, might need to start with a |
Looks good, though I'm pretty sure in-place operations (eg Also, I wonder if this is an opportunity to try out the new |
7826a7e
to
137afc3
Compare
Well, looks like Here's a possible fix for def __array_wrap__(self, arr, context=None):
if np.may_share_memory(self, arr):
return arr
if arr.shape == ():
return arr[()]
return arr.view(ndarray) so now you can also do
More generally, there are 2 things I am not 100% sure of yet which we need to be more sure of:
|
Dunno. Don't even know what my code was compared to this (why did I Only thing I would be worried about that memmaps have some attributes About 1. Yes, we are sure ufuncs never return 0-d arrays (except for Anyway, if we are not sure about all of these things, possibly we could |
Don't have time to read through all of this in detail right now, but as a general note I'm +10 on making it so that memmaps don't propagate through ufuncs. I'd also vote against requiring a |
Should get a mention in the 1.12.0-notes. |
@lesteve Ping. |
137afc3
to
6d7f1aa
Compare
I added something in doc/release/1.12.0-notes.rst. I haven't had time to work on this in the past few days. I will try to understand better what was done in #2932. |
@lesteve My impression of the difference between this PR and #2932 is that #2932 tries to treat subclasses of memmap specially and handles I will admit I don't really understand a lot of #2932. It appears to be trying to take care of multiple inheritance, involving inheritance from However, I do think #2932 has the right idea in
to return an ndarray, since the result is a new array rather than a view. However In summary, my impression is that we can forget about trying to fix multiple inheritance like #2932, however we should somehow take care of fancy indexing, and I also suggest adding the |
6d7f1aa
to
609d19a
Compare
609d19a
to
838dab9
Compare
I think this PR is ready to be merged as is, if there are no objections. Although I have a little bit of a feeling there may be some "unknown unknowns" here, after sitting on it I haven't though up any more cases to take care of, and I'm not too worried about the multiple subclass case. Plus, I think this is easy to undo later if there are problems, and it can get some testing if we merge. Maybe we can wait a little for anyone else to speak up, and then merge. |
def __array_wrap__(self, arr, context=None): | ||
# Return scalar instead of 0d memmap, e.g. for np.sum with | ||
# axis=None | ||
if arr.shape == (): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be on the safe side, it might be good to move it after the may_share_memory call? I don't think we should ever return 0-d arrays from functions calling array_wrap, but I am not perfectly sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then again, maybe it doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be on the safe side, it might be good to move it after the may_share_memory call?
Done.
If there may be problems with subclasses, I would feel better if there is a note to the mailing list. |
Otherwise, I trust your assessment :). |
838dab9
to
944ece3
Compare
☔ The latest upstream changes (presumably #7325) made this pull request unmergeable. Please resolve the merge conflicts. |
I am on holiday for a week with a very spotty internet connection. I'll
do that when I get back home.
|
@lesteve - the problem I mentioned is that in your example, |
@mhvk this PR does have the behaviour you want. Thanks for the explanations, I naively expected |
OK, I now see why this does work (should have seen it before): when the output is an memmap, it is |
|
||
def __array_wrap__(self, arr, context=None): | ||
# Return memmap if self and arr are likely to share memory | ||
if np.may_share_memory(self, arr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, a remaining suggestion is to replace this statement with the simpler (and faster)
# Return the object it is was given as an output and should thus preserve type
if arr is self:
return arr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing a bit around, I wasn't able to find an edge case in __array_wrap__
where np.may_share_memory(self, arr)
and arr is self
disagreed, so I made the advised change.
If someone else can see an edge case that I missed, please complain !
0d0c8e6
to
cc8c8b6
Compare
Let me know, I won't be able to work on it until Monday though. |
All right, from a brief look, astropy code doesn't seem to subclass memmap anywhere so @mhvk's example of In that case I vote to keep things simple and ignore the subclass issue, and merge this as is. Is that all right @mhvk? Anyone in the future who wants to subclass memmap will be able to override our behavior if they want. |
@ahaldane - I'm half curious about what goes wrong with the inheritance |
@mhvk, maybe I am mistaken. I was bothered by the fact that ufuncs and fancy slices of instances of @lesteve, sorry for going back and forth on this, but could you make the
and a unit test could be
|
Special case for reduction functions (e.g. np.sum with axis=None) that return a numpy scalar. Keep original memmap subclasses behavior to be on the safe side.
I find that slightly bothering too. In my mind, it is very unlikely that one would want a B 0d array for the result of np.mean and for fancy slices I am a bit more split but not really convinced either. I'd be interested if someone has some kind of use case where this behaviour makes sense. I made the requested change anyway. |
cc8c8b6
to
8f27d12
Compare
@embray I agree. To fix the joblib/joblib@b68a868#diff-19004d02cec64e3d448b33565de338ebR94 I never finished that work but if people are interested I could contribute it to numpy as a saner alternative to |
@ogrisel Yes, that looks useful. For myself, I'm using |
@lesteve - for the subclass case, I do think it is best to default to returning the subclass. It is quite possible that the subclass stores information that is useful even for a 0-d array (e.g., a Overall, it seems to me this is a good addition as is, and with all the tests, etc., it seems ready to go in. |
Fair enough, thanks for the explanation! |
Looks good, I'll merge in an hour or two. |
Merged, thanks @lesteve and the many reviewers. |
Great, thanks a lot ! |
memmap behavior is about to change with merge of numpy/numpy#7406. Test for behavior change so we can adapt expected mmap return values on an image.
memmap behavior is about to change with merge of numpy/numpy#7406. Test for behavior change so we can adapt expected mmap return values on an image.
memmap behavior is about to change with merge of numpy/numpy#7406. Test for behavior change so we can adapt expected mmap return values on an image.
Special case for reduction functions (e.g. np.sum with axis=None) that
return a numpy scalar.
Fix #7403. Implemented
__array_wrap__
following @ahaldane's suggestion. Should fix #667 as well.