8000 DEP: deprecate empty field names by mattip · Pull Request #12375 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

DEP: deprecate empty field names #12375

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

mattip
Copy link
Member
@mattip mattip commented Nov 13, 2018

Split off from PR #12358. Empty field names are confusing, users should construct record arrays with name, format, offset dictionaries instead.

@charris
Copy link
Member
charris commented Nov 13, 2018

Labels please.

@mattip mattip force-pushed the deprecate-blank-fields branch from ac066e2 to 657db88 Compare November 15, 2018 00:13
@mattip
Copy link
Member Author
mattip commented Nov 15, 2018

should hit the mailing list

@mattip mattip force-pushed the deprecate-blank-fields branch from 657db88 to eaca685 Compare November 15, 2018 01:24
@charris
Copy link
Member
charris commented Nov 15, 2018

If it needs to hit the mailing list, please do so.

@mattip
Copy link
Member Author
mattip commented Nov 15, 2018

Hmm. The problem is deeper than just blank names. dtype.descrdoes not properly handle overlapping fields:

>>>a = np.dtype({'names': ['a', 'b'], 'formats': ['i4'] * 2, 'offsets': [0, 2]})
>>>a.descr
[('a', '<i4'), ('b', '<i4')]  # loses the offsets :(
>>>str(a)
"{'names':['a','b'], 'formats':['<i4','<i4'], 'offsets':[0,2], 'itemsize':6}"

Maybe we should abandon deprecating blank fields, as it does not solve the roundtrip save/load problem. We could try using str(dtype) in saving/loading record arrays, as it seems to capture the full expression of the dtype.

@eric-wieser
Copy link
Member
eric-wieser commented Nov 16, 2018

dtype.descrdoes not properly handle overlapping fields:

On 1.15 (since #10391) your code gives an error,

ValueError: dtype.descr is not defined for types with overlapping or out-of-order fields
8000

@eric-wieser
Copy link
Member
eric-wieser commented Nov 16, 2018

as it seems to capture the full expression of the dtype.

Counterexample: str(np.dtype((np.record, 4)))

Can't we just pickle the dtype?

@ahaldane
Copy link
Member
ahaldane commented Nov 18, 2018

Also, that's similar to the overlapping-fields problem in the PEP3118 interface, see this comment.

It turns out that the PEP3118 structure format string cannot represent overlapping or out-of-order fields. Since dtype.descr was originally designed for use in __array_interface__, which is essentially a forerunner to the PEP3118-interface, it isn't surprising it can't represent overlapping fields either.

In my opinion, the "best" ultimate behavior would be for us to work with CPython to update or extend the PEP3118 format string for structs so it can handle overlapping fields, out-of-order-fields, and padding, using the justification in the linked comment above. Then deprecate __array_interface__ and use PEP3118 everywhere, including in np.save/load when serializing arrays. Currently .descr is used to serialize the dtype - it would be nice to use PEP3118 datatype notation instead since it was in principle designed for such serialization.

@eric-wieser
Copy link
Member

s/__array_function__/__array_interface__/?

@eric-wieser
Copy link
Member
eric-wieser commented Nov 18, 2018

and use PEP3118 everywhere, including in np.save/load when serializing arrays.

It's not clear to me why the buffer interface and a serialization format need to be related. It's not unreasonable for numpy to support custom or third-party dtypes that are not part of PEP3118, and we need to be able to save those too.

Python already has a serialization protocol - pickle. Why can't we just use it? What makes it worth defining a custom serialization format for the data type?

since it was in principle designed for such serialization.

The word "serial" appears nowhere in PEP3118. PEP3118 is a memory sharing interface, not a serialization interface.

@ahaldane
Copy link
Member

Well, I suppose that's true, that seems like a reasonable solution as well. We've already written all the pickle serialization stuff in descriptor.c.

My background thought, though, was it would be "cool" if .npy files were simply gzipped PEP3118 buffers, which anyone (and not just numpy) could read.

@charris
Copy link
Member
charris commented Nov 18, 2018

@mattip Abandon?

@mattip mattip closed this Nov 19, 2018
@mattip mattip deleted the deprecate-blank-fields branch March 4, 2019 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0