8000 Inconsistent array assignment for structured arrays · Issue #3126 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

Inconsistent array assignment for structured arrays #3126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mbyt opened this issue Mar 6, 2013 · 16 comments
Closed

Inconsistent array assignment for structured arrays #3126

mbyt opened this issue Mar 6, 2013 · 16 comments

Comments

@mbyt
Copy link
Contributor
mbyt commented Mar 6, 2013

When assigning arrays inside structured arrays without a left sided [:], only the first element gets actually assigned. This is shown in the following code snippet, where a structured array is initialized with zeros, and then filled by assignment of an array, and the assignment of a structured array. The assignment of the array only assigns the first value, where the assignment of the structured array assigns all values:

In [2]: np.__version__
Out[2]: '1.7.0'
In [3]: struct_dt = np.dtype([
   ...:     ('elem', 'i4', 5),
   ...: ])
In [4]: dt = np.dtype([
   ...:     ('field', 'i4', 10),
   ...:     ('struct', struct_dt)
   ...: ])
In [5]: x = np.zeros(1, dt)
In [6]: x[0]['field']  = np.ones(10, dtype='i4')
In [7]: x[0]['struct'] = np.ones(1, dtype=struct_dt)
In [8]: x
Out[8]: 
array([([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], ([1, 1, 1, 1, 1],))], 
      dtype=[('field', '<i4', (10,)), ('struct', [('elem', '<i4', (5,))])])

The expected result is that all elements of x[0]['field'] are 1.

However, x[0]['field'] "recognizes" some part of the assigned array, as it will complain about broadcasting problems for wrong sizes:

In [9]: x[0]['field'] = np.ones(9, dtype='i4')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-80d7cbce6711> in <module>()
----> 1 x[0]['field'] = np.ones(9, dtype='i4')

ValueError: could not broadcast input array from shape (9) into shape (9,10)

As mentioned in the beginning, adding [:] on the left side (In [6]) will correctly set all elements to 1. If the shown assignment is not allowed, I would have expected that an exception gets raised?

@charris
Copy link
Member
charris commented Feb 21, 2014

See also #2598, #3561. I suspect these are all the same problem.

@charris
Copy link
Member
charris commented Feb 21, 2014

Also possibly #2599. After any fix for this issue other issues involving assignment should be checked.

@mbyt
Copy link
Contributor Author
mbyt commented Mar 9, 2014

@charris I tried to fix this. The difference between a field assigned with a [:] or not are actually different C functions being called.

x[0]['field'] = np.ones(10, dtype='i4') calls voidtype_ass_subscript:

#0  __memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:60
#1  0x00007ffff62107f8 in VOID_copyswap (dst=0xd3dd10 "", src=0x936110 "\001", swap=0, arr=0x7ffff7ec12b0)
    at numpy/core/src/multiarray/arraytypes.c.src:2058
#2  0x00007ffff6336040 in voidtype_setfield (self=0x7ffff7ebd650, args=
    (<numpy.ndarray at remote 0x7ffff7ec1080>, <numpy.dtype at remote 0xde1b88>, 0), kwds=0x0)
    at numpy/core/src/multiarray/scalartypes.c.src:1701
#3  0x00007ffff6336964 in voidtype_ass_subscript (self=0x7ffff7ebd650, ind='field', val=<numpy.ndarray at remote 0x7ffff7ec1080>)
    at numpy/core/src/multiarray/scalartypes.c.src:2248

x[0]['field'][:] = np.ones(10, dtype='i4') calls array_assign_slice:

#1  0x00007ffff62d384e in PyArray_AssignArray (dst=dst@entry=0xddb990, src=0x7ffff7ec1120, wheremask=wheremask@entry=0x0, 
    casting=casting@entry=NPY_UNSAFE_CASTING) at numpy/core/src/multiarray/array_assign_array.c:336
#2  0x00007ffff62debfc in PyArray_MoveInto (dst=dst@entry=0xddb990, src=<optimized out>) at numpy/core/src/multiarray/ctors.c:2686
#3  0x00007ffff62aef52 in PyArray_CopyObject (dest=dest@entry=0xddb990, src_object=src_object@entry=0x7ffff7ec1120)
    at numpy/core/src/multiarray/arrayobject.c:327
#4  0x00007ffff635f0ba in array_assign_slice (self=<optimized out>, ilow=0, ihigh=9223372036854775807, v=0x7ffff7ec1120)
    at numpy/core/src/multiarray/sequence.c:108

All I did in 14b6ff1 is to ensure, that for the first case no array is but a scalar is assigned. I added a corresponding test. This change resolves this ticket (#3126) and #3561.

Any comments, better ideas, ...?

mbyt added a commit to mbyt/numpy that referenced this issue Mar 30, 2014
Raising an error for non supported operations. Also adapting related
statement in test. Resolves numpy#3126 and numpy#3561.
@charris
Copy link
Member
charris commented Mar 30, 2014

Hmm, following your example

In [24]: x = ones(2, dt)

In [25]: type(x[0])
Out[25]: numpy.void

In [26]: type(x[0]['field'])
Out[26]: numpy.ndarray

In [27]: type(x[0]['struct'])
Out[27]: numpy.void

In [28]: type(x[0]['struct']['elem'])
Out[28]: numpy.ndarray

I'm thinking it should only be allowed to assign void to void types and ndarray types should require a slice for assignment. The latter would be consistent with the rest of numpy. But I think this could use some discussion on the list. Could you put together a post?

@mbyt
Copy link
Contributor Author
mbyt commented Apr 2, 2014

sorry, took some time... As suggested, I sent a post: http://mail.scipy.org/pipermail/numpy-discussion/2014-April/069802.html.

@mbyt
Copy link
Contributor Author
mbyt commented Apr 27, 2014

@charris looks like no one actively has something against the above suggestion on the list:

> The question now is: What is the general opinion about forcing slice
> assignments for ndarray types and raising otherwise?

How can we proceed?

@charris
Copy link
Member
charris 8000 commented May 1, 2014

Hmm, I need to think about this a bit more. Usually Python assignments are by reference, but numpy is a bit different in that you can do a[i] = 1 as shorthand for a[i, ...] = 1. I'm leaning towards requiring the [...] notation for arrays, and maybe for void scalars also as we are changing the memory inplace. @njsmith Any thoughts on this?

@juliantaylor
Copy link
Contributor

should we still consider this for 1.9, there are a bunch of linked bugs.

@charris
Copy link
Member
charris commented Jul 18, 2014

There is a PR for this, so 1.9 would be good. It's long enough ago that the PR needs another review.

@njsmith
Copy link
Member
njsmith commented Jul 18, 2014

Don't see any reason this would be a 1.9 blocker, though if the PR sneaks
in then fine...

@charris: I'm sorry, what's the question? The behaviour here does seem
pretty broken -- the assigning-only-one-value bit in particular. Are you
suggesting though that arr["subarr"] = foo should be an error, and we
require people to write arr["subarr"][...] = foo instead? That seems a bit
extreme to me... what else would arr["subarr"] = foo mean?

On Fri, Jul 18, 2014 at 10:27 PM, Julian Taylor notifications@github.com
wrote:

should we still consider this for 1.9, there are a bunch of linked bugs.


Reply to this email directly or view it on GitHub
#3126 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@charris
Copy link
Member
charris commented Jul 18, 2014

It means I don't recall exactly what the state of the PR is, in particular apropos my comments may 30. I don't think it is a blocker as there are workarounds.

@mbyt
Copy link
Contributor Author
mbyt commented Jul 22, 2014

@charris The suggestion was that the following syntay is OK:

  • arr[0]['scalar'] = scalar
  • arr[0]['array'][:] = array

While arr[0]['array'] = array will raise an error.

Currently arr[0]['array'] = array is allowed but buggy / will incorrectly assign the array (solely the first item).

The main enhancement I see in the PR is that the unsuccessful assignment is reported and not silently passed --- which will just help a lot in tracing down errors.

@mbyt
Copy link
Contributor Author
mbyt commented Dec 29, 2014

The described problem still persists in master --- I just merged the master into the pull request #4556 (082cc0e) related to this ticket.

Is there something that can be done to get this moving along?

@njsmith
Copy link
Member
njsmith commented Dec 29, 2014

I still have the impression that allowing the subarray assignment and
handling it correctly would be nicest, but the semantics in PR #4556 (=
raise an error) do seem better than the current situation (= return the
incorrect result), and leaves the door open to further improvements in the
future. I haven't reviewed the code in #4556 though.

On Mon, Dec 29, 2014 at 9:15 PM, mbyt notifications@github.com wrote:

The described problem still persists in master --- I just merged the
master into the pull request #4556
#4556 (082cc0e
082cc0e)
related this ticket.

Is there something that can be done to get this moving along?


Reply to this email directly or view it on GitHub
#3126 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@mbyt
Copy link
Contributor Author
mbyt commented Jan 1, 2015

Thanks for the update.

Regarding the "subarray assignment", one could also argue: In analogy to dict, that a) item assignment keeps the address / memory of the list, where as b) a direct assignment of "data" (the subarray assignment) will create a new address / memory. The option b) (create a new address) makes no sense for structured numpy arrays (which own all their data). Therefore, we raise an error --- also in order to effectively highlight the difference.

In [31]: x = {'header': {'a': 5, 'b': 3, 'c': 2}, 'data': list(range(10))}

In [32]: hex(id(x['data']))
Out[32]: '0x7f673c562488'

In [33]: x['data'][:] = range(20,30)

In [34]: hex(id(x['data']))
Out[34]: '0x7f673c562488'

In [35]: x['data'] = range(20,30)

In [36]: hex(id(x['data']))
Out[36]: '0x7f673c0bbae0'

But I am sure that there are also good arguments in favor for allowing subarray assignment...

Please let me know whenever I can do something to keep the ball rolling.

@mbyt
Copy link
Contributor Author
mbyt commented Feb 3, 2015

Just merged again the master into the pull request related to this ticket #4556 mbyt@a8a1fc5. Travis still passes OK. #3561, which is basically a duplicate of this issue is also resolved through this pull request.

I really would like to get this into the next numpy release / 1.10 --- it's just a 15 line change of production code. Is there anything I can do?

ahaldane added a commit to ahaldane/numpy that referenced this issue Mar 13, 2015
This commit modifiesvoidtype_get/setfield to that they call the ndarray
get/setfield. This solves bugs related to void-scalar assignment.

This fixes issues numpy#3126,
numpy#3561.
jbernhard added a commit to Duke-QCD/hic-eventgen that referenced this issue Jul 8, 2016
This bug was fixed sometime between numpy 1.9.2 (the OSG's version) and 1.11.1
(the current version).  See numpy/numpy#3126.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
0