8000 np.average of memmap returns memmap 0d array instead of numpy scalar · Issue #7403 · numpy/numpy · GitHub

Closed
lesteve opened this issue Mar 10, 2016 · 7 comments · Fixed by #7406 or #7433

lesteve (Contributor) commented Mar 10, 2016
import numpy as np
mm = np.memmap('/tmp/test.data', mode='w+', shape=4)
np.average(mm)

Output since 5ceab8f (a couple of days ago):

memmap(0.0)

Output before that (e.g. in 1.10.4):

0.0

This causes the scikit-learn tests run against numpy master to fail; see this Travis log.

We have been bitten by this in the past in scikit-learn with .sum, which has the same surprising behaviour (see scikit-learn/scikit-learn#6225). Shouldn't reduce operations on a memmap array always return a numpy scalar?

cc @ogrisel.
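For context, a version-agnostic workaround is to coerce the result explicitly; a minimal sketch (the temp-file path is illustrative, not from the report):

```python
import os
import tempfile

import numpy as np

# Illustrative path; any writable location works.
path = os.path.join(tempfile.mkdtemp(), 'test.data')
mm = np.memmap(path, dtype=np.float64, mode='w+', shape=4)
mm[:] = 1.0

# On affected numpy versions np.average(mm) is a 0-d memmap; on fixed
# versions it is a numpy scalar. float() (or .item()) normalizes both
# to a plain Python float.
avg = float(np.average(mm))
print(avg)
```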

ahaldane (Member) commented

The cause is the change from a = np.asarray(a) to a = np.asanyarray(a) at the start of np.average, which I made to preserve subclasses, as np.mean does. Previously np.average preserved only the np.matrix subclass via a special clause, but now it preserves any subclass.

While it's definitely a change in behavior that we should think carefully about, the new behavior seems more consistent with the rest of numpy. If no weights are given, np.average just computes np.mean, and np.mean preserves the subclass (including for scalars), just like ufuncs and most other numpy methods. This gives subclasses a chance to modify the inputs or outputs of the computation; see some of the discussion in #5819 and #667.

I suggest we keep the old asarray behavior temporarily with a deprecation warning, and then (sorry @lesteve) go ahead with the change to asanyarray in the next release. Is that reasonable?

Possibly we might also think about updating memmap.sum and others so they return a scalar, or making some of the changes proposed in #667. Indeed it seems weird that memmap sum/mean return a memmap array (even for non-scalar results, e.g. after summing over an axis).

ogrisel (Contributor) commented Mar 10, 2016

Pretty much all operations on numpy.memmap are surprising anyway; for instance, the product of a memmap array and a scalar is a memmap array too:

>>> import numpy as np
>>> a = np.memmap('a.bin', shape=10, dtype=np.uint8, mode='w+')
>>> a[:] = 1

>>> b = a * 3
>>> type(b)
<class 'numpy.core.memmap.memmap'>
>>> b.base
array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3], dtype=uint8)

b is of type np.memmap but is actually backed by an in-memory buffer, not a memory-mapped file. The fact that b's data is in memory is fine, but I think b should just be a regular numpy array.

Anyway this is not directly the topic of this issue. I am +1 for a deprecation cycle to change the behavior of average and other aggregation operations.

ahaldane (Member) commented

Re ufuncs on memmaps:

Maybe we just need to add

    def __array_wrap__(self, arr, context=None):
        if arr.shape == ():
            return arr[()]
        return arr.view(np.ndarray)

or similar to the memmap definition. I just tried it, and it seems to work without failing any tests.
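The same idea can be exercised on a plain ndarray subclass, independently of memmap (the subclass name here is made up for illustration; the extra return_scalar parameter is only there for compatibility with newer numpy releases and is not part of the snippet above):

```python
import numpy as np

class ScalarWrapping(np.ndarray):
    # Same idea as the proposed memmap fix: unwrap 0-d results to
    # scalars, and downcast everything else to a base ndarray.
    def __array_wrap__(self, arr, context=None, return_scalar=False):
        if arr.shape == ():
            return arr[()]
        return arr.view(np.ndarray)

a = np.arange(6.0).view(ScalarWrapping)
print(type(a.sum()))   # a numpy scalar, not a 0-d ScalarWrapping
print(type(a + 1.0))   # a plain numpy.ndarray
```

Reductions and ufuncs both route their results through __array_wrap__, so one small method changes the return type of every aggregation at once.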

ogrisel (Contributor) commented Mar 10, 2016

I think this is a good idea (unless there are ufuncs that do in-place operations?).

Also, to fix the memmap class consistency issue, I prototyped an alternative in the past that would not subclass ndarray but instead call np.frombuffer on a custom mmap buffer that stores the required metadata (filename, offset, mode, ...):

joblib/joblib@b68a868#diff-19004d02cec64e3d448b33565de338ebR94

I never finished that work but if people are interested I could contribute it to numpy as a saner alternative to np.memmap.
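As a rough illustration of that approach (this is not ogrisel's actual prototype, and it omits the metadata bookkeeping), np.frombuffer over a plain mmap object already yields a regular ndarray whose arithmetic results are also regular ndarrays:

```python
import mmap
import os
import tempfile

import numpy as np

# Illustrative file of ten 0x01 bytes.
path = os.path.join(tempfile.mkdtemp(), 'a.bin')
with open(path, 'wb') as fh:
    fh.write(b'\x01' * 10)

# Keep the file object alive as long as the array is in use.
f = open(path, 'r+b')
buf = mmap.mmap(f.fileno(), 10)
a = np.frombuffer(buf, dtype=np.uint8)

print(type(a))       # plain numpy.ndarray, yet file-backed
print(type(a * 3))   # arithmetic gives plain ndarrays too
```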

matthew-brett (Contributor) commented

We just hit this too over at dipy/dipy#1074

Is it really desirable to return a scalar of a particular subclass in this case? Should the scalars be special-cased? It's unfortunate that a numpy array and a subclass return a different type of object when reducing to a scalar.

shoyer (Member) commented Jun 9, 2016

@matthew-brett It looks like this is already fixed in master via #7406

matthew-brett (Contributor) commented Jun 9, 2016

Ugh - and I see that I knew about this fix from my comments echoed to that thread - thanks for the reminder.
