8000 Fancy indexing on read-only memmap creates a read-only copy · Issue #14132 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

Fancy indexing on read-only memmap creates a read-only copy #14132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jeremiedbb opened this issue Jul 26, 2019 · 7 comments · Fixed by #14171
Closed

Fancy indexing on read-only memmap creates a read-only copy #14132

jeremiedbb opened this issue Jul 26, 2019 · 7 comments · Fixed by #14171

Comments

@jeremiedbb
Copy link
Contributor

Fancy indexing on an array creates a copy. If the array is read-only, the copy is writeable and owns its data.

a = np.array([1,2,3])
a.setflags(write=False)
b = a[[1,2]]
b.flags
>>>  C_CONTIGUOUS : True
     F_CONTIGUOUS : True
     OWNDATA : True  #
     WRITEABLE : True  #
     ALIGNED : True
     WRITEBACKIFCOPY : False
     UPDATEIFCOPY : False

However fancy indexing on a memmap creates a read-only copy.

m = np.memmap('tmp', dtype='int', mode='w+', shape=(3,))
m[:] = a[:]
m = np.memmap('tmp', dtype='int', mode='r', shape=(3,))
b = m[[1,2]]
b.flags 
>>>  C_CONTIGUOUS : True
     F_CONTIGUOUS : True
     OWNDATA : False  #
     WRITEABLE : False  #
     ALIGNED : True
     WRITEBACKIFCOPY : False
     UPDATEIFCOPY : False

And it seems that it does not own its data, but then who is its base ?

@rth
Copy link
Contributor
rth commented Jul 26, 2019

Also, in case anyone is wondering, b above is an array not a mmap,

>>> m
memmap([1, 2, 3])
>>> b
array([2, 3])

It is also contiguous even when selecting non-contiguous elements,

>>> m[[0,2]].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

but still doesn't own the data.

@seberg
Copy link
Member
seberg commented Jul 26, 2019

The problem is here:
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L1695

The writeable flag has to be cleared. The only tricky thing is to double check what flags are necessary at all (probably none, and just passing in 0 is fine).

Want to have a look?

@jeremiedbb
Copy link
Contributor Author
jeremiedbb commented Jul 29, 2019

@seberg

I've looked at it and there's something I don't understand. When using fancy indexing, a new result array is created:

result = PyArray_NewFromDescr(&PyArray_Type,
                              PyArray_DESCR(self),
                              PyArray_NDIM(ind),
                              PyArray_SHAPE(ind),
                              NULL, NULL,
                              /* Same order as indices */
                              PyArray_ISFORTRAN(ind) ?
                                  NPY_ARRAY_F_CONTIGUOUS : 0,
                              NULL);

and then, a new one may be created with this one as base:

wrap_out_array:
    if (!PyArray_CheckExact(self)) {
        /*
         * Need to create a new array as if the old one never existed.
         */
        PyArrayObject *tmp_arr = (PyArrayObject *)result;

        Py_INCREF(PyArray_DESCR(tmp_arr));
        result = PyArray_NewFromDescrAndBase(
                Py_TYPE(self),
                PyArray_DESCR(tmp_arr),
                PyArray_NDIM(tmp_arr),
                PyArray_SHAPE(tmp_arr),

8000
                PyArray_STRIDES(tmp_arr),
                PyArray_BYTES(tmp_arr),
                PyArray_FLAGS(self),
                (PyObject *)self, (PyObject *)tmp_arr);
        Py_DECREF(tmp_arr);
        if (result == NULL) {
            goto finish;
        }
    }

Why is this step necessary ? What is the purpose of PyArray_CheckExact ?
In which situation returning the previously created array is problematic ?

@seberg
Copy link
Member
seberg commented Jul 29, 2019

@jeremiedbb the second call is for subclasses. The first call always returns an "exact" (non-subclass) numpy array. Since the first call only has the contiguity as flags, I do not think the second one requires any flags at all.

@jeremiedbb
Copy link
Contributor Author

The second call uses the flags of self. Using the flags of tmp_arr solves the writeable issue.

However it does not solve the owndata issue. Since it's a call of PyArray_NewFromDescrAndBase, its base is tmp_arr. But it should not have a base, right ?

Why returning the array after the first call is not enough ?

@seberg
Copy link
Member
seberg commented Jul 31, 2019

The owndata is part is not an issue, there is nothing wrong with it having a base. This is because the subclass may need the data.

@jeremiedbb
Copy link
Contributor Author

Thanks for the clarification. I opened #14171.

seberg pushed a commit that referenced this issue Aug 1, 2019
…14171)

Fancy indexing on read-only subclass makes a read-only copy. This PR avoids using the flags of the original array.

Fixes #14132

* do not use original array flags in fancy indexing

* Update numpy/core/tests/test_indexing.py

Co-Authored-By: Eric Wieser <wieser.eric@gmail.com>
charris pushed a commit to charris/numpy that referenced this issue Aug 5, 2019
…umpy#14171)

Fancy indexing on read-only subclass makes a read-only copy. This PR avoids using the flags of the original array.

Fixes numpy#14132

* do not use original array flags in fancy indexing

* Update numpy/core/tests/test_indexing.py

Co-Authored-By: Eric Wieser <wieser.eric@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
0