MaskedArray heuristic for memory overlap seems simplistic and slow · Issue #10234 · numpy/numpy · GitHub

Open
mhvk opened this issue Dec 18, 2017 · 3 comments

Comments

@mhvk
Contributor
mhvk commented Dec 18, 2017

Currently, in MaskedArray.__array_finalize__, the following is done to check whether the new object may be a view of an old one:

if (obj.__array_interface__["data"][0] != self.__array_interface__["data"][0]):

If the two data pointers differ, the mask is copied.

This seems unnecessarily restrictive and fails even simple slicing:

ma = np.ma.MaskedArray(np.arange(100.))
ma2 = ma[10:20]
ma.__array_interface__["data"][0] == ma2.__array_interface__["data"][0]
# False
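The comparison is False because a slice's data pointer refers to the slice's own first element rather than to the start of the shared buffer. A minimal sketch, assuming the default float64 dtype of np.arange(100.) (8 bytes per element); the variable names are illustrative only:

```python
import numpy as np

ma = np.ma.MaskedArray(np.arange(100.))
ma2 = ma[10:20]

base_ptr = ma.__array_interface__["data"][0]
slice_ptr = ma2.__array_interface__["data"][0]

# The slice starts 10 elements into the same buffer, so its pointer is
# shifted by 10 * itemsize bytes even though no data was copied.
offset = slice_ptr - base_ptr
print(offset)                      # 10 * 8 = 80 bytes for float64
print(np.shares_memory(ma, ma2))   # True: the slice is still a view
```

So pointer equality only recognises views that begin at the very same element as the original.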

This means that slices by default do not share the mask with the original object, which doesn't seem like a good idea given that we tried to change this behaviour in #5580 (hence, cc @jakirkham).

A relatively straightforward solution would be to replace the check with not np.may_share_memory(ma, ma2). Perhaps surprisingly, this is also much faster than the pointer comparison above for the simple slice case.
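To illustrate how the two checks differ, here is a rough sketch that applies both to a slice and to an unrelated array. It is not the code in MaskedArray.__array_finalize__; the helper names pointer_differs and bounds_disjoint are hypothetical. Note that np.may_share_memory only compares memory bounds by default, so it can report overlap for views that never touch the same element.

```python
import numpy as np

def pointer_differs(a, b):
    # The existing heuristic: compare the address of the first element.
    return a.__array_interface__["data"][0] != b.__array_interface__["data"][0]

def bounds_disjoint(a, b):
    # The proposed heuristic: a cheap bounds-overlap test.
    return not np.may_share_memory(a, b)

ma = np.ma.MaskedArray(np.arange(100.))
ma2 = ma[10:20]                                   # a view into ma
other = np.ma.MaskedArray(np.arange(100.))        # separate buffer

print(pointer_differs(ma, ma2), bounds_disjoint(ma, ma2))      # True False
print(pointer_differs(ma, other), bounds_disjoint(ma, other))  # True True

# Caveat: interleaved views have overlapping bounds but share no element.
x = np.arange(10)
print(np.may_share_memory(x[::2], x[1::2]))  # True (conservative)
print(np.shares_memory(x[::2], x[1::2]))     # False (exact)
```

With the bounds test the slice is treated as a view, at the cost of occasionally treating non-overlapping interleaved views as views too.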

@eric-wieser
Member

Are you sure that slicing isn't already handled correctly by __getitem__?

@mhvk
Contributor Author
mhvk commented Dec 18, 2017

Yes, you're right, it is. I guess that if we changed the behaviour here, some of the workarounds in __getitem__ could be removed. But maybe it is not worth the hassle, though the present comparison takes 4 µs on my computer, while may_share_memory takes 0.3 µs.
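The timing comparison can be reproduced along these lines. This is only a sketch using the standard timeit module; the function names are hypothetical, and absolute numbers will vary with the machine and NumPy version, so no expected timings are shown.

```python
import timeit
import numpy as np

ma = np.ma.MaskedArray(np.arange(100.))
ma2 = ma[10:20]

def pointer_check():
    # Current heuristic: build both interface dicts and compare addresses.
    return ma.__array_interface__["data"][0] != ma2.__array_interface__["data"][0]

def bounds_check():
    # Proposed heuristic.
    return not np.may_share_memory(ma, ma2)

n = 10_000
t_pointer = timeit.timeit(pointer_check, number=n) / n
t_bounds = timeit.timeit(bounds_check, number=n) / n

print(f"pointer comparison: {t_pointer * 1e6:.2f} us per call")
print(f"may_share_memory:   {t_bounds * 1e6:.2f} us per call")
```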

@eric-wieser
Member
eric-wieser commented Dec 18, 2017

Most of the __getitem__ workarounds are due to;
