Structured arrays with offsets not working in 1.14.2 but OK in 1.13.3 #10752
Hmm ... |
Some more exploration:
>>> np.__version__
'1.14.1'
>>> dt = np.dtype({'f1': ('S3', 2), 'f2': ('S4', 7)})
>>> s12 = b'abcdefghijkl'
>>> s12_arr = np.array(s12)
>>> np.array(s12, dt)
array((b'abc', b'abcd'),
dtype={'names':['f1','f2'], 'formats':['S3','S4'], 'offsets':[2,7], 'itemsize':11})
>>> np.array(s12_arr, dt)
array((b'abc', b'abcd'),
dtype={'names':['f1','f2'], 'formats':['S3','S4'], 'offsets':[2,7], 'itemsize':11})
>>> s12_arr.view(dt)
ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged
>>> s11 = b'abcdefghijk'
>>> s11_arr = np.array(s11)
>>> np.array(s11, dt)
array((b'abc', b'abcd'),
dtype={'names':['f1','f2'], 'formats':['S3','S4'], 'offsets':[2,7], 'itemsize':11})
>>> np.array(s11_arr, dt)
array((b'abc', b'abcd'),
dtype={'names':['f1','f2'], 'formats':['S3','S4'], 'offsets':[2,7], 'itemsize':11})
>>> s11_arr.view(dt)
array((b'cde', b'hijk'),
dtype={'names':['f1','f2'], 'formats':['S3','S4'], 'offsets':[2,7], 'itemsize':11})
At least this last case works... |
The problem occurs for unstructured |
Oh, actually this is by design. Running the above on 1.13 gives:
>>> s12 = b'abcdefghijkl'
>>> s12_arr = np.array(s12)
>>> np.array(s12, dt)
array((b'abc', b'abcd'),
dtype={'names':['f1','f2'], 'formats':['S3','S4'], 'offsets':[2,7], 'itemsize':11})
>>> np.array(s12_arr, dt)
array((b'cde', b'hijk'),
dtype={'names':['f1','f2'], 'formats':['S3','S4'], 'offsets':[2,7], 'itemsize':11})
1.14 fixed this inconsistency by treating string arrays the same way as string scalars. |
This is sort of expected, as recently discussed here, and the new behavior is the "desired" behavior.

The problem is that your original code is using some dangerous behavior which was causing many hard-to-debug bugs like #7058, #6314, #2353, #3351.

In 1.13, numpy would assign to

While it does seem like we should have worked harder to have a round of deprecation, the new behavior avoids these problems by (more correctly) casting instead of viewing.

You can achieve your old behavior using a view:
s = 'abcdefghijkl'
dt = np.dtype({'f1':('S3',2),'f2':('S4',7)})
np.array([s], dtype='S11').view(dt) |
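As a rough check of the view-based workaround above, the sketch below compares each field against the slice of the source string that its offset and width should select. It assumes Python 3, so `s` is written as a bytes literal, and it spells the dtype in the `names`/`formats`/`offsets` form with an explicit `itemsize` of 11; the variable names `raw` and `rec` are illustrative only.

```python
import numpy as np

s = b'abcdefghijkl'

# Same layout as in the comment above, written in the names/formats/offsets form.
dt = np.dtype({'names': ['f1', 'f2'], 'formats': ['S3', 'S4'],
               'offsets': [2, 7], 'itemsize': 11})

# Store the first 11 bytes of s, then reinterpret those same bytes through dt.
raw = np.array([s], dtype='S11')
rec = raw.view(dt)

# A view honors the offsets, so each field should equal the matching slice of s.
print(rec['f1'][0] == s[2:5])
print(rec['f2'][0] == s[7:11])
```

Because `.view` reinterprets the existing buffer rather than casting values, the offsets decide which bytes each field reads.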
Hmmmm, OK. Thanks for the example code, which appears to work the same under both 1.13 and 1.14. I don't fully understand the implications of what changed, but it still seems strange to me that you can give an offset in a dtype as in the original example and it is ignored. Is there some way to make it throw an exception instead?

However, the bigger problem seems to be that I can't access the entire string this way. You're using S11 for a 12-character string, and using S12 doesn't work. If I use S11 and change the last field to be 5 characters long, I still don't get the final "l" in the string. Why is this and is there some kind of workaround? |
The idea is the byte-offsets are a property of the memory layout of the data, and memory layout should not matter when assigning values. This is like endianness: The integer "1" is the same value on big-endian and little-endian machines even though the byte layout is different. When you do
s = 'abcdefghijkl'
dt = np.dtype({'names': ['f1', 'f2'], 'formats': ['S3', 'S4'],
               'offsets': [2, 7], 'itemsize': 12})
np.array([s], dtype='S12').view(dt)

Some other notes:

First, and somewhat unrelated, you are using a special dictionary-based form of dtype specification (

Second, note that your code does not work in python3 even with numpy 1.13, which is another sign of the bugginess we fixed.

And third, just to illustrate the weirdness of the 1.13 behavior in python2, consider:
>>> s = 'abcdefghijkl'
>>> np.array(s, dtype='i4,S1,f4')
array((1684234849, 'e', 1.75599422e+25),
dtype=[('f0', '<i4'), ('f1', 'S1'), ('f2', '<f4')]) |
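Two points from this comment can be sketched together: the endianness analogy (same value, different bytes) and the explicit 12-byte `itemsize` that lets a view cover the whole string. The `S5` width for `f2` is an assumption taken from the question about reaching the final "l"; it is not the dtype posted above. The names `dt12`, `big`, and `little` are illustrative.

```python
import numpy as np

# Endianness analogy: the value 1 compares equal in both byte orders,
# even though the underlying bytes are laid out differently.
big = np.array(1, dtype='>i4')
little = np.array(1, dtype='<i4')
same_value = bool(big == little)
different_bytes = big.tobytes() != little.tobytes()

# A 12-byte record: f2 is widened to S5 (assumption) so it can reach the last byte.
s = b'abcdefghijkl'
dt12 = np.dtype({'names': ['f1', 'f2'], 'formats': ['S3', 'S5'],
                 'offsets': [2, 7], 'itemsize': 12})

# The source array's itemsize (S12) must match the record's itemsize for the view.
rec = np.array([s], dtype='S12').view(dt12)
print(same_value, different_bytes)
print(rec['f1'][0], rec['f2'][0])
```

The key constraint in this sketch is that the string dtype being viewed and the structured dtype have the same `itemsize`, which is why the 11-byte layout cannot see the twelfth character.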
Thank you for taking the time to explain fully. I've updated my code and it is working properly now under Python 3. I'm a little confused by your final example, though. That behavior is actually what I would expect. |
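If reinterpreting raw bytes is what is wanted, it can be requested explicitly instead of relying on a cast. The sketch below uses `np.frombuffer` on the first nine bytes of the string with an explicitly little-endian version of the `i4,S1,f4` dtype from the example above. Using `frombuffer` here is an illustration chosen for this note, not something proposed in the thread.

```python
import numpy as np

s = b'abcdefghijkl'

# Unaligned record: 4-byte int, 1-byte string, 4-byte float = 9 bytes in total.
dt = np.dtype('<i4,S1,<f4')

# Interpret exactly one record's worth of bytes from the start of s.
rec = np.frombuffer(s[:dt.itemsize], dtype=dt)[0]

# b'abcd' read as a little-endian int32, b'e' as a 1-byte string,
# and b'fghi' read as a little-endian float32.
print(rec['f0'], rec['f1'], rec['f2'])
```

Spelling out the byte order and slicing to `dt.itemsize` makes the reinterpretation explicit, so it does not depend on how NumPy casts a string to a structured dtype.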
The behavior there is reasonable, but |
I'm going to close this one because it was an intentional change, to get it off the 1.14.3 issue list. Feel free to reopen if there is more to discuss. |
NumPy 1.14.x always uses a zero offset for strings in a structured array.
but: