Dtype descr inconsistency with invisible fields #3176

kgabor · 2013-03-28T15:17:37Z

Declaring a dtype in the dict format ('names', 'formats', 'offsets') can result in invisible fields. However, the dtype's own descr, which uses the list-of-tuples format, does not reproduce the same dtype. The invisibility matters especially for comparisons and sorting. In the following example, B and C should behave the same way:

>>> A
array([(0, 2, 0), (0, 1, 1), (0, 0, 5), (1, 3, 2), (1, 2, 3)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> B=A.view(dtype={'names':('A','C'),'formats':(int,int),'offsets':(A.dtype.fields['a'][1],A.dtype.fields['c'][1])})
>>> B
array([(0, 0), (0, 1), (0, 5), (1, 2), (1, 3)], 
      dtype=[('A', '<i8'), ('', '|V8'), ('C', '<i8')])
>>> B.sort()
>>> B
array([(0, 0), (0, 1), (0, 5), (1, 2), (1, 3)], 
      dtype=[('A', '<i8'), ('', '|V8'), ('C', '<i8')])
>>> A
array([(0, 2, 0), (0, 1, 1), (0, 0, 5), (1, 3, 2), (1, 2, 3)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

>>> C=A.view(dtype=B.dtype.descr)
>>> C
array([(0, <read-write buffer ptr 0x7f9de28bed58, size 8 at 0x10e60f830>, 0),
       (0, <read-write buffer ptr 0x7f9de28bed70, size 8 at 0x10e60f830>, 1),
       (0, <read-write buffer ptr 0x7f9de28bed88, size 8 at 0x10e60f830>, 5),
       (1, <read-write buffer ptr 0x7f9de28beda0, size 8 at 0x10e60f830>, 2),
       (1, <read-write buffer ptr 0x7f9de28bedb8, size 8 at 0x10e60f830>, 3)], 
      dtype=[('A', '<i8'), ('f1', '|V8'), ('C', '<i8')])
>>> C.sort()
>>> A
array([(0, 0, 5), (0, 1, 1), (0, 2, 0), (1, 2, 3), (1, 3, 2)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

Also, after saving to and loading from a .npy file, the invisible field becomes visible.

ahaldane · 2015-07-28T22:42:30Z

This seems to be a bug or quirk of the Array Interface, and of dtype.descr.

The problem is that dtype.descr keeps the 'unseen' padding, apparently in order to conform to the Array Interface requirements (since dtype.descr is the "Array-interface compliant full description of the data-type."). Note that B.dtype.descr is [('A', '<i8'), ('', '|V8'), ('C', '<i8')], but repr(B.dtype) is not. Also note that C = A.view(B.dtype) works properly.
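For reference, a minimal sketch of the difference between the two view calls, rebuilding the arrays from the OP. The variable names C_good and C_descr are only illustrative, and the formats are written as '<i8' rather than int so the itemsize does not depend on the platform:

```python
import numpy as np

# Same data and dtype as A in the original post.
A = np.array([(0, 2, 0), (0, 1, 1), (0, 0, 5), (1, 3, 2), (1, 2, 3)],
             dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

# Dict-format dtype that exposes only the first and last 8 bytes of each record.
B = A.view(dtype={'names': ('A', 'C'),
                  'formats': ('<i8', '<i8'),
                  'offsets': (A.dtype.fields['a'][1], A.dtype.fields['c'][1])})

# Passing the dtype object itself keeps the gap invisible.
C_good = A.view(B.dtype)

# Going through descr turns the gap into a real, auto-named void field.
C_descr = A.view(np.dtype(B.dtype.descr))

print(B.dtype.names)
print(C_good.dtype.names)
print(C_descr.dtype.names)
```

The last line should show three field names instead of two, which is why sorting or comparing through the descr-based view also takes the bytes of b into account.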

As noted in the OP, if you use np.save and np.load you will end up with extra 'padding fields' in your file, which leads to errors. This is not limited to the code above. For example, most "aligned" structs cannot be saved and loaded properly because of their padding:

>>> a = np.zeros(2, np.dtype('i4,u1,i8', align=True))
>>> a.dtype.descr
[('f0', '<i4'), ('f1', '|u1'), ('', '|V3'), ('f2', '<i8')]
>>> np.save('file', a)
>>> np.load('file.npy')
ValueError: two fields with the same name

I'd have to learn more about the array interface to figure any more out.

My instinct for a solution is that dtype.descr is behaving properly, and should simply not be used unless you are doing something with the array interface. The code in the OP should be replaced with C = A.view(B.dtype). However, np.load needs a fix. Maybe it should notice when the field name is '' and interpret that as padding.
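A rough sketch of what "interpret '' as padding" could look like. The helper name dtype_from_descr is hypothetical and not a NumPy function, and it only handles flat (name, format) entries (no nested dtypes, shapes, or titles), so treat it as an illustration of the idea rather than the actual fix:

```python
import numpy as np

def dtype_from_descr(descr):
    """Hypothetical helper: rebuild a dtype from a flat descr list,
    treating entries with an empty name as padding bytes."""
    names, formats, offsets = [], [], []
    offset = 0
    for name, fmt in descr:
        field = np.dtype(fmt)
        if name != '':
            # A real field: record where it lives in the record.
            names.append(name)
            formats.append(field)
            offsets.append(offset)
        # Padding and real fields both advance the running offset.
        offset += field.itemsize
    return np.dtype({'names': names, 'formats': formats,
                     'offsets': offsets, 'itemsize': offset})

aligned = np.dtype('i4,u1,i8', align=True)
rebuilt = dtype_from_descr(aligned.descr)
print(rebuilt.names)
print(rebuilt.itemsize == aligned.itemsize)
```

Building the result in the dict format with explicit offsets and itemsize is what keeps the padding out of the field list while preserving the memory layout.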

ahaldane · 2015-07-28T22:56:56Z

Oh, just found that #2215 is the same problem.

charris · 2015-08-02T01:47:38Z

I think I'll leave both this and #2215 open, this because it is new, and the other because of the useful discussion.

embray · 2015-10-07T16:32:07Z

I've run afoul of this as well. Thanks for raising the issue.

mattip · 2018-11-09T23:55:38Z

PR #12358 removes the hidden fields while loading. I think we should disallow empty field names.

ahaldane · 2018-11-18T06:01:18Z

#12358 fixed this, by my reading. np.load/save now work.

You just shouldn't use code like A.view(dtype=B.dtype.descr) now: The dtype.descr attribute should only be used in special situations, see #8174.
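A quick way to check the round trip yourself, assuming a NumPy release that includes #12358. This sketch uses an in-memory buffer instead of a file, and the field values are arbitrary:

```python
import io
import numpy as np

# Aligned struct with 3 bytes of padding between f1 and f2.
a = np.zeros(2, np.dtype('i4,u1,i8', align=True))
a['f0'] = [1, 2]
a['f2'] = [10, 20]

# Save to and load from an in-memory .npy stream.
buf = io.BytesIO()
np.save(buf, a)
buf.seek(0)
loaded = np.load(buf)

# The padding should not come back as an extra named field.
print(loaded.dtype.names)
print(loaded.dtype.itemsize == a.dtype.itemsize)
```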

charris added Defect labels Feb 22, 2014

ahaldane mentioned this issue Feb 9, 2015

ENH: structured datatype safety checks #5548

Merged

ahaldane mentioned this issue Sep 26, 2015

structured dtype descr method returns incorrect descriptor when void space is at the end of dtype built through a dict #6359

Closed

ahaldane mentioned this issue Jul 2, 2016

Array from memoryview fails if there's trailing padding #7797

Open

ahaldane mentioned this issue Oct 12, 2016

BUG: np.save() and np.load() are not idempotent when align=True or fields are discontiguous #8100

Closed

ahaldane mentioned this issue Jan 14, 2018

BUG: structured array indexing dtype and dtype.descr changed in 1.14.0 #10387

Closed

eric-wieser mentioned this issue Jan 14, 2018

BUG: Make dtype.descr error for out-of-order fields #10391

Merged

ahaldane mentioned this issue Jan 16, 2018

BUG: Revert multifield-indexing adds padding bytes for NumPy 1.15. #10411

Merged

ahaldane mentioned this issue Jan 26, 2018

Problem with views on masked structured arrays #10483

Closed

mattip removed the priority: normal label Oct 21, 2018

ahaldane closed this as completed Nov 18, 2018

embray mentioned this issue Nov 25, 2018

Handling of invisible dtype fields in io.fits astropy/astropy#8172

Open
