structured dtype descr method returns incorrect descriptor when void space is at the end of dtype built through a dict · Issue #6359 · numpy/numpy · GitHub

structured dtype descr method returns incorrect descriptor when void space is at the end of dtype built through a dict #6359


Closed
ikehall opened this issue Sep 25, 2015 · 3 comments

Comments

@ikehall
ikehall commented Sep 25, 2015

Using python 2.7.10, numpy 1.9.2
If I define a structured dtype in the following manner:

>>> import numpy as np
>>> my_dtype = np.dtype({
...     'names': ['A', 'B'],
...     'formats': ['f4', 'f4'],
...     'offsets': [0, 8],
...     'itemsize': 16})

And I then try to create a new dtype from this dtype's descr:

>>> new_dtype = np.dtype(my_dtype.descr)

Then the two dtypes will not have the same itemsize:

>>> my_dtype.itemsize
16
>>> new_dtype.itemsize
12

Examining the descr of my_dtype, we see that it is leaving off the 4 void bytes at the end:

>>> my_dtype.descr
[('A', '<f4'), ('', '|V4'), ('B', '<f4')]

What should happen instead:

>>> my_dtype.descr
[('A', '<f4'), ('', '|V4'), ('B', '<f4'), ('', '|V4')]

This is relevant for using structured arrays with IPython.parallel, because this is how structured arrays are reconstructed when they are serialized and sent to engines. A work-around for the user, of course, is to define some field that marks the end of the structured data, but it seems that this should not be necessary.
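A quick way to check whether a given NumPy build is affected is to compare the number of bytes accounted for by `descr` with the dtype's `itemsize`. The following is only a minimal sketch: `descr_nbytes` is a hypothetical helper name (not part of NumPy), and it assumes every `descr` entry is a `(name, format)` or `(name, format, shape)` tuple.

```python
import numpy as np

def descr_nbytes(dt):
    """Sum the sizes of all entries in dt.descr (hypothetical helper)."""
    total = 0
    for entry in dt.descr:
        fmt = entry[1]
        # A third element, if present, is a subarray shape.
        field = np.dtype((fmt, entry[2])) if len(entry) == 3 else np.dtype(fmt)
        total += field.itemsize
    return total

my_dtype = np.dtype({
    'names': ['A', 'B'],
    'formats': ['f4', 'f4'],
    'offsets': [0, 8],
    'itemsize': 16})

# On an affected build the first number is smaller than the second,
# because the trailing padding is missing from descr.
print(descr_nbytes(my_dtype), my_dtype.itemsize)
```

If the two numbers differ, rebuilding the dtype from `descr` will not reproduce the original item size.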

seberg added a commit to seberg/numpy that referenced this issue Sep 25, 2015
dtype.descr returns void fields to explain the padding part of
the dtype. The last void field for the itemsize itself was however
not included.

Closes numpygh-6359
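
The padding rule described in this commit message can be sketched in plain Python. This is an illustration only, not NumPy's actual implementation: `descr_with_padding` and its `(name, format, offset, size)` input layout are assumptions made for the example, and the fields are assumed to be sorted by offset and non-overlapping.

```python
def descr_with_padding(fields, itemsize):
    """Build a descr-style list, inserting void entries for every gap.

    fields: list of (name, format, offset, size) tuples sorted by offset.
    """
    descr = []
    pos = 0  # first byte not yet described
    for name, fmt, offset, size in fields:
        if offset > pos:
            # Gap before this field: describe it as unnamed void bytes.
            descr.append(('', '|V%d' % (offset - pos)))
        descr.append((name, fmt))
        pos = offset + size
    if itemsize > pos:
        # Trailing gap up to the item size: the part that was missing.
        descr.append(('', '|V%d' % (itemsize - pos)))
    return descr

fields = [('A', '<f4', 0, 4), ('B', '<f4', 8, 4)]
print(descr_with_padding(fields, 16))
# [('A', '<f4'), ('', '|V4'), ('B', '<f4'), ('', '|V4')]
```

Dropping the final `if` block reproduces the reported behaviour, where the last 4 void bytes are left out.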
@seberg
Member
seberg commented Sep 25, 2015

Uploaded a PR to fix this. I wonder, though, whether the extra void fields are somewhat annoying in any case, and whether there is a better way to do this.

@ahaldane
Member

Yes, that is indeed a bug, but I'd also like to warn you that there is a deeper bug here.

dtype.descr is documented to be the "Array-interface compliant full description of the data-type." It keeps the 'unseen' padding in order to conform to the Array Interface requirements -- the fact that it keeps 'padding' bytes is intentional.

It is used in two places in numpy: (1) in the Array Interface (for serializing), and (2) in the io code to save the .npy format.

Unfortunately, numpy currently cannot reliably convert dtype.descr back to a dtype! This is actually a problem in the io code, because it means we cannot load structured arrays with padding bytes from .npy files. See #2215, #3176 and related.

@ikehall, unfortunately this means numpy probably cannot currently do what you want (though I encourage you to check to be sure). Numpy knows how to serialize your array, but it probably cannot currently deserialize it, due to a bug (in addition to the bug you just found).
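
One possible way to sidestep `descr` when a dtype has to be shipped elsewhere and rebuilt is to carry the names, formats, offsets and itemsize explicitly, in the same dict form used to build the dtype in the original report. This is a hedged sketch rather than a recommended API: `dtype_to_spec` is a hypothetical helper, and it assumes a flat structured dtype whose fields are simple scalar types (no nested structs, subarrays, or titles).

```python
import numpy as np

def dtype_to_spec(dt):
    """Describe a flat structured dtype as a plain dict (hypothetical helper)."""
    names = list(dt.names)
    return {
        'names': names,
        # .str is the array-protocol type string, e.g. '<f4'.
        'formats': [dt.fields[n][0].str for n in names],
        # fields[n] is (dtype, offset[, title]); index 1 is the byte offset.
        'offsets': [dt.fields[n][1] for n in names],
        'itemsize': dt.itemsize,
    }

my_dtype = np.dtype({
    'names': ['A', 'B'],
    'formats': ['f4', 'f4'],
    'offsets': [0, 8],
    'itemsize': 16})

spec = dtype_to_spec(my_dtype)   # plain lists, strings and ints
rebuilt = np.dtype(spec)         # rebuild on the receiving side
print(rebuilt.itemsize, rebuilt == my_dtype)
```

Because the padding is expressed through `offsets` and `itemsize` rather than through unnamed void fields, the rebuilt dtype does not pick up automatically generated field names.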

@ikehall
Author
ikehall commented Sep 28, 2015

Thank you for bringing that to my attention.

For my purposes, the fix for this bug fixes my problems completely. The other bugs in dtype.descr all seem to be related to automatic naming of unnamed and invisible fields. As my fields are all either named (with names other than 'f1', 'f2', etc.) or 'invisible', it doesn't really matter to me if they come back on deserialization with automatically generated names, because I am not going to use them. Further, when I ultimately save the array, I discard the dtype information and just save the string of bytes (because I am saving to an industry-defined standard, and not using np.save).

I agree that those are bugs, and could affect me if I relied on automatic field naming, or named my fields badly, but currently this is not the case for me.

jaimefrio pushed a commit to jaimefrio/numpy that referenced this issue Mar 22, 2016