Notifications
Fix npz header incompatibility #5178
Conversation
I am unable to generate a test file with the 'L' addition; it probably requires 32 bits or Windows.
Force-pushed from 095ff53 to edb4536
OK, added a valid test file that has the problem. I think it needs 64-bit numpy running on Windows.
Force-pushed from edb4536 to bd606db
        token_string = token[1]
        if (last_token_was_number and
            token_type == tokenize.NAME and
            token_string == "L"):
whitespace here is weird, though it doesn't really matter
My feeling is that the condition should be easily distinguished from the body. See also PEP8, indentation. The other option is a (useless) comment ;)
Fair enough! It's totally readable as is so IMO it doesn't matter at all.
On Mon, Oct 13, 2014 at 12:30 AM, Charles Harris notifications@github.com wrote:
In numpy/lib/format.py:
    """
    import tokenize
    if sys.version_info[0] >= 3:
        # In Python3 stderr, stdout are text files.
        from io import StringIO
    else:
        from StringIO import StringIO

    tokens = []
    last_token_was_number = False
    for token in tokenize.generate_tokens(StringIO(asstr(s)).read):
        token_type = token[0]
        token_string = token[1]
        if (last_token_was_number and
            token_type == tokenize.NAME and
            token_string == "L"):
My feeling is that the condition should be easily distinguished from the body. See also PEP8, indentation. The other option is a (useless) comment ;)
Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/pull/5178/files#r18751104.
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
LGTM, up to you whether you care enough to fix the whitespace thing
Force-pushed from bd606db
The Python2-generated file had long integer literals like '1L' that broke in Python3. The fix here filters out the 'L' and lets safe_eval take care of the integer type when converting the string. The fix comes from Nathaniel Smith with a few added fixups. Closes numpy#5170.
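The approach described in the commit message can be sketched as follows. This is a simplified, hypothetical version (the function name and Python-3-only import are my own); the actual helper in the PR lives in numpy/lib/format.py and also handles Python 2:

```python
import tokenize
from io import StringIO

def strip_long_suffix(header):
    # Sketch of the fix: the Python 3 tokenizer splits a Python 2 long
    # literal like "1L" into NUMBER "1" followed by NAME "L", so we can
    # drop any bare "L" that directly follows a number and rebuild the
    # string; safe_eval then sees a plain integer literal.
    tokens = []
    last_token_was_number = False
    for token in tokenize.generate_tokens(StringIO(header).readline):
        token_type, token_string = token[0], token[1]
        if (last_token_was_number and
                token_type == tokenize.NAME and
                token_string == "L"):
            last_token_was_number = False
            continue  # skip the stray 'L' token
        tokens.append(token)
        last_token_was_number = (token_type == tokenize.NUMBER)
    return tokenize.untokenize(tokens)
```

For example, a Python-2-written shape tuple such as "(1L, 3L)" comes back as a string that evaluates to (1, 3).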
Force-pushed from 53ff0b1 to 8b1f90a
Will this take care of structured array dtypes? Can we fix the write path in a similar way?
@juliantaylor I think the write path should be fixable in the same way. I think the dtype should work, as the strings should not parse as numbers, but without a test file I can't be sure. The test files need to be generated on win64 with 64-bit Python2.
Alright, I'll merge it to simplify testing; if there are issues we can follow up. A write-path fix would be nice. Thanks!
That example dtype should be fine -- the fixup code just blindly strips out any 'L' token that directly follows a number.
On Mon, Oct 13, 2014 at 6:58 PM, Julian Taylor notifications@github.com wrote:
Nathaniel J. Smith
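To illustrate why quoted field names are safe from this blind stripping (my own example, not from the thread): the tokenizer treats a quoted name as a single STRING token, and only a NAME token "L" immediately following a NUMBER token gets dropped.

```python
import tokenize
from io import StringIO

# Hypothetical header fragment: a field name containing 'L' next to a
# Python 2 long literal.
header = "{'names': ['fieldL'], 'num': 2L}\n"
tokens = list(tokenize.generate_tokens(StringIO(header).readline))

# 'fieldL' survives as one STRING token (quotes included), while the
# '2L' literal splits into NUMBER '2' plus a bare NAME 'L' that the
# fixup can safely remove.
strings = [t[1] for t in tokens if t[0] == tokenize.STRING]
```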
I can verify that
I do think there might be problems with unicode field names. Currently, latin1 encoding is assumed for the header, and that might break down. The header should probably be utf-8 encoded.
But unicode field names are not backward compatible with python2 ndarrays.
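A quick illustration of the encoding concern above (the field name here is hypothetical): latin1 can only represent code points up to U+00FF, so an arbitrary unicode field name may not survive a latin1-encoded header, while utf-8 covers all of unicode.

```python
name = "θ_field"  # hypothetical field name; 'θ' (U+03B8) has no latin1 byte

def encodable(s, codec):
    # Report whether the text can be written out under `codec`.
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

latin1_ok = encodable(name, "latin1")  # fails for non-latin1 names
utf8_ok = encodable(name, "utf-8")     # utf-8 encodes any unicode string
```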
Replicates: numpy/numpy#5178
Fixes #5170.
In addition, *.npy test files produced in both Python2 and Python3 are added so that compatibility between versions can be tested.
Rebased for backport to 1.9.