8000 BUG: structured array indexing dtype and dtype.descr changed in 1.14.0 · Issue #10387 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: structured array indexing dtype and dtype.descr changed in 1.14.0 #10387

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
larsoner opened this issue Jan 12, 2018 · 10 comments · Fixed by #10411
Closed

BUG: structured array indexing dtype and dtype.descr changed in 1.14.0 #10387

larsoner opened this issue Jan 12, 2018 · 10 comments · Fixed by #10411

Comments

@larsoner
Copy link
Contributor

Running this code on 1.13.x and 1.14.0 yields different outputs:

import numpy as np
dt = [('a', int), ('b', float)]
data = np.zeros(1, dtype=dt)
data['a'] = 0
data['b'] = 1
data = data[['a']]
print(data.dtype)
print(data.dtype.descr)
np.save('test.npy', data)
data_read = np.load('test.npy')
print(data_read.dtype)
print(data_read.dtype.descr)

On 1.13.x the output is consistent -- for all four print statements it gives:

[('a', '<i8')]

On 1.14.x it changes to these, the first of which is now dict-like, the second (descr) has a new field ('', '|V8'), the third (round-trip IO) changes dtype back to a list, and the fourth (descr) changes the blank field name to '':

{'names':['a'], 'formats':['<i8'], 'offsets':[0], 'itemsize':16}
[('a', '<i8'), ('', '|V8')]
[('a', '<i8'), ('f1', 'V8')]
[('a', '<i8'), ('f1', '|V8')]

The real reason I found this bug is because the addition of the blank field name breaks scipy.io.savemat:

>>> from scipy import io  # on latest dev
>>> io.savemat('test.mat', dict(data=data))
  File "untitled0.py", line 16, in <module>
    io.savemat('test.mat', dict(data=data))
  File "scipy/io/matlab/mio.py", line 219, in savemat
    MW.put_variables(mdict)
  File "scipy/io/matlab/mio5.py", line 846, in put_variables
    self._matrix_writer.write_top(var, asbytes(name), is_global)
  File "scipy/io/matlab/mio5.py", line 587, in write_top
    self.write(arr)
  File "scipy/io/matlab/mio5.py", line 616, in write
    self.write_struct(narr)
  File "scipy/io/matlab/mio5.py", line 735, in write_struct
    self._write_items(arr)
  File "scipy/io/matlab/mio5.py", line 752, in _write_items
    self.write(el[f])
ValueError: no field of name 

If these structured dtype changes are intentional and not bugs, I can open a SciPy issue instead (which would either be to avoid writing void data, or write out unknown fields).

@eric-wieser
Copy link
Member
eric-wieser commented Jan 12, 2018

I would expect the first and second print to differ, since .descr is only there for compatilibity. Seems the bug is in np.save / np.load.

It outright fails if you use data = data[['b', 'a']] instead (improved n #10391 with a bette 8000 r message).

cc @ahaldane

@larsoner
Copy link
Contributor Author

Seems the bug is in np.save / np.load.

It appears there is one there, yes. But there is also one either in SciPy (dealing with empty field names) or NumPy (the addition of an empty field name) that is breaking scipy.io.savemat.

@ahaldane
Copy link
Member

OK, I'm back.

And ugh, this is (in part) an ancient outstanding bug, #3176 (also #2215) which has now become easier to run across because of the changes from #6053.

The old problem from #3176 is that save/load has always been broken (raises exceptions) for any structured type which contains padding bytes. But as a result of #6053, multi-field indexing now commonly returns arrays with padding bytes.

#3176 has been on my "structured type" TODO list. My idea was to get the consistent structured type behavior settled first in #6053, and only then move on to "details" like the load/save implementation. But maybe we can't do that.

@eric-wieser
Copy link
Member

For now do we want to provide a mechanism to copy a structured array into a contiguous dtype, at the cost of reordering fields? The save function could just do this 8000 internally and warn

@ahaldane
Copy link
Member

That seems like a pretty reasonable solution. Better than reverting #6053 imo, and a lot easier than trying to fix #3176, which I think will be tricky since it is enmeshed with the PEP3118 issues we found for structured types.

Actually maybe we should define a new public function somewhere called pack_fields(arr) which returns a copy of the array (or dtype) with all padding bytes removed. We could use it in save, and could mention it in release notes to make life easier for anyone who's code broke.

@eric-wieser
Copy link
Member
eric-wieser commented Jan 14, 2018

Another option:

  • x.astype(x.dtype.packed) - dtype with .names in the same order, but offsets changed to be ordered and without padding
  • x.view(x.dtype.packed_view) - dtype with .names sorted by offsets (padding preserved)

I'd prefer something that operates on the dtype, not the array

@larsoner
Copy link
Contributor Author

But as a result of #6053, multi-field indexing now commonly returns arrays with padding bytes.

While it sounds like the save/load stuff is a bug, my primary use case concerns the (separable?) bug caused in scipy.io.savemat, which comes from the addition of padding bytes at all, shown in the second print statement as:

[('a', '<i8'), ('', '|V8')]

In parallel with the fixes proposed here that will address the np.save/load problems, should I open a PR to SciPy that ignores any field of void type when writing a structured type?

@charris
Copy link
Member
charris commented Feb 8, 2018

@ahaldane This is fixed for 1.14, correct?

@ahaldane
Copy link
Member
ahaldane commented Feb 8, 2018

Yes. Even though #10537 was merged and has the right "fixes" strings, I think it didn't auto-close because it was for the 1.14.x branch.

This should get auto-closed when the 1.15 version in #10411 is merged.

@charris charris modified the milestones: 1.14.1 release, 1.15.0 release Feb 8, 2018
@larsoner
Copy link
Contributor Author
larsoner commented Feb 8, 2018

I just rebuilt the latest 1.14.x branch, and can confirm that all outputs of my script are as they were on 1.13:

[('a', '<i8')]
[('a', '<i8')]
[('a', '<i8')]
[('a', '<i8')]

And that io.savemat('test.mat', dict(data=data)) works again. So yes this bug is fixed in 1.14.x.

I'm happy to give it a try on 1.15.0 once #10411 is in if it helps.

7782

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants
0