Dtype descr inconsistency with invisible fields · Issue #3176 · numpy/numpy
Closed
kgabor opened this issue Mar 28, 2013 · 6 comments

@kgabor
Contributor
kgabor commented Mar 28, 2013

Declaring a dtype with the dict form ('names', 'formats', 'offsets') can produce invisible fields. However, the dtype's own descr, which uses the list-of-tuples form, does not reproduce the same dtype. The invisibility matters, especially for comparisons and sorting. In the following case, for example, B and C should behave the same way:

>>> A
array([(0, 2, 0), (0, 1, 1), (0, 0, 5), (1, 3, 2), (1, 2, 3)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> B=A.view(dtype={'names':('A','C'),'formats':(int,int),'offsets':(A.dtype.fields['a'][1],A.dtype.fields['c'][1])})
>>> B
array([(0, 0), (0, 1), (0, 5), (1, 2), (1, 3)], 
      dtype=[('A', '<i8'), ('', '|V8'), ('C', '<i8')])
>>> B.sort()
>>> B
array([(0, 0), (0, 1), (0, 5), (1, 2), (1, 3)], 
      dtype=[('A', '<i8'), ('', '|V8'), ('C', '<i8')])
>>> A
array([(0, 2, 0), (0, 1, 1), (0, 0, 5), (1, 3, 2), (1, 2, 3)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

>>> C=A.view(dtype=B.dtype.descr)
>>> C
array([(0, <read-write buffer ptr 0x7f9de28bed58, size 8 at 0x10e60f830>, 0),
       (0, <read-write buffer ptr 0x7f9de28bed70, size 8 at 0x10e60f830>, 1),
       (0, <read-write buffer ptr 0x7f9de28bed88, size 8 at 0x10e60f830>, 5),
       (1, <read-write buffer ptr 0x7f9de28beda0, size 8 at 0x10e60f830>, 2),
       (1, <read-write buffer ptr 0x7f9de28bedb8, size 8 at 0x10e60f830>, 3)], 
      dtype=[('A', '<i8'), ('f1', '|V8'), ('C', '<i8')])
>>> C.sort()
>>> A
array([(0, 0, 5), (0, 1, 1), (0, 2, 0), (1, 2, 3), (1, 3, 2)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

Also, after saving to and loading from a .npy file, the invisible field becomes visible.
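For anyone who wants to reproduce the mismatch without the original session, here is a minimal sketch. The variable names `sparse` and `from_descr` are mine, not from the report, and the exact repr output may differ between NumPy versions. It only checks the field names, which is where the two dtypes diverge: the empty-named padding entry in descr is turned into a real field with a default name when the list is passed back to np.dtype.

```python
import numpy as np

A = np.array([(0, 2, 0), (0, 1, 1), (0, 0, 5), (1, 3, 2), (1, 2, 3)],
             dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

# Dict-form dtype exposing only the first and third 8-byte slots of each record.
sparse = np.dtype({'names': ('A', 'C'),
                   'formats': ('<i8', '<i8'),
                   'offsets': (A.dtype.fields['a'][1], A.dtype.fields['c'][1])})

B = A.view(sparse)                 # two visible fields, 8 hidden bytes in between
from_descr = np.dtype(sparse.descr)
C = A.view(from_descr)             # the hidden bytes become a named void field

print(B.dtype.names)   # ('A', 'C')
print(sparse.descr)    # [('A', '<i8'), ('', '|V8'), ('C', '<i8')]
print(C.dtype.names)   # ('A', 'f1', 'C')
```

Because C has an extra field, any operation that walks all fields (comparison, sorting) sees the bytes of 'b' in C but not in B.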

@ahaldane
Member

This seems to be a bug or quirk of the Array Interface, and of dtype.descr.

The problem is that dtype.descr keeps the 'unseen' padding, apparently in order to conform to the Array Interface requirements (since dtype.descr is the "Array-interface compliant full description of the data-type."). Note that B.dtype.descr is [('A', '<i8'), ('', '|V8'), ('C', '<i8')], but repr(B.dtype) is not. Also note that C = A.view(B.dtype) works properly.

As noted in the OP, if you use np.save and np.load you will end up with extra 'padding fields' in your file, which leads to errors. This is not limited to the code above. For example, most "aligned" structs cannot be saved and loaded properly because of their padding:

>>> import numpy as np
>>> a = np.zeros(2, np.dtype('i4,u1,i8', align=True))
>>> a.dtype.descr
[('f0', '<i4'), ('f1', '|u1'), ('', '|V3'), ('f2', '<i8')]
>>> np.save('file', a)
>>> np.load('file.npy')
ValueError: two fields with the same name

I'd have to learn more about the array interface to figure any more out.

My instinct for a solution is that dtype.descr is behaving properly, and should simply not be used unless you are doing something with the array interface. The code in the OP should be replaced with C = A.view(B.dtype). However, np.load needs a fix. Maybe it should notice when the field name is '' and interpret that as padding.
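To make the "interpret '' as padding" idea concrete, here is a rough sketch of what such a conversion could look like. `dtype_from_descr` is a hypothetical helper written for this comment, not NumPy's implementation, and it assumes a flat descr made of (name, format) 2-tuples only (no nested structs, no subarray shapes, no titles).

```python
import numpy as np

def dtype_from_descr(descr):
    """Hypothetical helper: rebuild a dtype from a flat descr list,
    treating unnamed void entries as padding instead of fields."""
    names, formats, offsets = [], [], []
    offset = 0
    for name, fmt in descr:
        dt = np.dtype(fmt)
        # Padding shows up in descr as an entry named '' with a plain void type.
        is_padding = (name == '' and dt.kind == 'V' and dt.names is None)
        if not is_padding:
            names.append(name)
            formats.append(dt)
            offsets.append(offset)
        offset += dt.itemsize   # padding still advances the byte offset
    return np.dtype({'names': names, 'formats': formats,
                     'offsets': offsets, 'itemsize': offset})

aligned = np.dtype('i4,u1,i8', align=True)
rebuilt = dtype_from_descr(aligned.descr)

print(rebuilt.names)                              # same field names as `aligned`
print(rebuilt.itemsize == aligned.itemsize)       # padding is preserved as hidden bytes
```

The point of the dict form in the return value is that it can express gaps through explicit offsets, which the list-of-tuples form cannot do without inventing a field.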

@ahaldane
Member

Oh, just found that #2215 is the same problem.

@charris
Member
charris commented Aug 2, 2015

I think I'll leave both this and #2215 open, this because it is new, and the other because of the useful discussion.

@embray
Contributor
embray commented Oct 7, 2015

I've run afoul of this as well. Thanks for raising the issue.

@mattip
Member
mattip commented Nov 9, 2018

PR #12358 removes the hidden fields while loading. I think we should disallow empty field names.

@ahaldane
Member

#12358 fixed this, by my reading. np.load/save now work.

You just shouldn't use code like A.view(dtype=B.dtype.descr) now. The dtype.descr attribute should only be used in special situations; see #8174.
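A quick way to check the round trip on your own install, assuming a NumPy release that includes #12358. This sketch uses an in-memory buffer instead of a file on disk (my choice, to keep it self-contained) and passes the dtype object itself, not its descr, when making a view.

```python
import io
import numpy as np

a = np.zeros(2, np.dtype('i4,u1,i8', align=True))

# Save and load through an in-memory buffer rather than 'file.npy'.
buf = io.BytesIO()
np.save(buf, a)
buf.seek(0)
loaded = np.load(buf)

print(loaded.dtype.names)                        # padding is not turned into a field
print(loaded.dtype.itemsize == a.dtype.itemsize)

# Reuse the dtype object directly; this keeps hidden bytes hidden.
same_layout = a.view(a.dtype)
print(same_layout.dtype.names)
```

If the load step raises "two fields with the same name", the installed NumPy most likely predates the fix.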
