8000 Fix npz header incompatibility by charris · Pull Request #5178 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

Fix npz header incompatibility #5178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 13, 2014

Conversation

charris
Copy link
Member
@charris charris commented Oct 12, 2014

Fixes #5170.

In addition, *.npy test files produced in both Python2 and Python3 are added so that compatibility between versions can be tested.

Rebased for backport to 1.9.

@charris
Copy link
Member Author
charris commented Oct 12, 2014

I am unable to generate a test file with the 'L' addition, it probably requires 32 bits or windows. So the python2.npy test file here actually tests nothing.

@charris charris force-pushed the fix-npz-header-incompatibility branch from 095ff53 to edb4536 Compare October 12, 2014 23:01
@charris
Copy link
Member Author
charris commented Oct 12, 2014

OK, added valid test file that has the problem. I think it needs 64 bit numpy running on windows.

token_string = token[1]
if (last_token_was_number and
token_type == tokenize.NAME and
token_string == "L"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace here is weird, though it doesn't really matter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that the condition should be easily distinguished from the body. See also PEP8, indentation. The other option is a (useless) comment ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! It's totally readable as is so IMO it doesn't matter at all.

On Mon, Oct 13, 2014 at 12:30 AM, Charles Harris notifications@github.com
wrote:

In numpy/lib/format.py:

  • """
  • import tokenize
  • if sys.version_info[0] >= 3:
  •    # In Python3 stderr, stdout are text files.
    
  •    from io import StringIO
    
  • else:
  •    from StringIO import StringIO
    
  • tokens = []
  • last_token_was_number = False
  • for token in tokenize.generate_tokens(StringIO(asstr(s)).read):
  •    token_type = token[0]
    
  •    token_string = token[1]
    
  •    if (last_token_was_number and
    
  •            token_type == tokenize.NAME and
    
  •            token_string == "L"):
    

My feeling is that the condition should be easily distinguished from the
body. See also PEP8, indentation. The other option is a (useless) comment ;)


Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/pull/5178/files#r18751104.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@njsmith
Copy link
Member
njsmith commented Oct 12, 2014

LGTM, up to you whether you care enough to fix the whitespace thing

@charris charris force-pushed the fix-npz-header-incompatibility branch from bd606db to 53ff0b1 Compare October 12, 2014 23:40
The Python2 generated file had long integer literals like '1L' that
broke in Python3. The fix here is to filter out the 'L' and let
safe_eval take care of the integer type in converting the string.

The fix here comes from Nathaniel Smith with a few added fixups.

Closes numpy#5170.
@charris charris force-pushed the fix-npz-header-incompatibility branch from 53ff0b1 to 8b1f90a Compare October 12, 2014 23:40
@juliantaylor
Copy link
Contributor

will this take care of array dtypes like: [('k', 'f4', 4L)]

can we fix the write path in a similar way?

@charris
Copy link
Member Author
charris commented Oct 13, 2014

@juliantaylor I think the write path should be fixable in the same way. I think the dtype should work as the strings should not parse as numbers, but without a test file can't be sure. The test files need to be generated in win64 with 64 bit python2.

@juliantaylor
Copy link
Contributor

alright I'll merge it to simplify testing, if there are issues we can followup, a write path fix would be nice

thanks

@njsmith
Copy link
Member
njsmith commented Oct 13, 2014

That example dtype should be fine -- the fixup code just blindly strips out
unquoted-L whenever it appears directly after an unquoted-number.

On Mon, Oct 13, 2014 at 6:58 PM, Julian Taylor notifications@github.com
wrote:

alright I'll merge it to simplify testing, if there are issues we can
followup, a write path fix would be nice

thanks


Reply to this email directly or view it on GitHub
#5178 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

juliantaylor added a commit that referenced this pull request Oct 13, 2014
@juliantaylor juliantaylor merged commit 2f17863 into numpy:master Oct 13, 2014
juliantaylor added a commit that referenced this pull request Oct 13, 2014
@charris
Copy link
Member Author
charris commented Oct 13, 2014

I can verify that [('k', 'f4', 4L)] works.

@charris charris deleted the fix-npz-header-incompatibility branch October 14, 2014 00:06
@charris
Copy link
Member Author
charris commented Oct 14, 2014

I do think there might be problems with unicode field names. Currently, latin1 encoding is assumed for the header, and that might break down. The header should probably be utf-8 encoded.

@charris
Copy link
Member Author
charris commented Oct 14, 2014

But unicode field names are not backward compatible with python2 ndarrays.

roblovelock pushed a commit to roblovelock/npy that referenced this pull request Apr 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

numpy npz files with v1 and v2 formats are not interoperable between python 2 and python 3.
3 participants
0