-
-
Notifications
You must be signed in to change notification settings - Fork 11k
numpy.save broken in 1.14 with python 2.7.5 #10348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
can't seem to reproduce on python 2.7.11. Can you enter the |
My colleague just sent me the following email: (It seems this is a python 2.7.5 issue) Newest RPM in CentOS 7 (os) Only in Software Collections there's a newer version but we cannot use that one Tokens in _filter_header displayed below, python 2.7.5 shows no newline (\n)
|
FWIW, there was a big rant from one of the Red Hat folks about people not using the Pythons from software collections a while ago. If it's a Python bug, then there's (probably) nothing to fix on our end. |
I have the same error on Centos OS 7, which uses Python 2.7.5 (default, Aug 4 2017, 00:39:18) Is the only option, to wait till Cent OS 7 updates Python? |
I have the same problem while running on Centos 7, python 2.7.5 and numpy 1.14.0
|
I'll ask again - in the ipython traceback you have there, can you type |
|
I have the sample problem using python 2.7.5, numpy 1.14.0 and CentOS. |
Same problem while running on Centos 7, python 2.7.5 and numpy 1.14.0. def convert_mean(binMean, npyMean):
blob = caffe.proto.caffe_pb2.BlobProto()
bin_mean = open(binMean, 'rb').read()
blob.ParseFromString(bin_mean)
arr = np.array(caffe.io.blobproto_to_array(blob))
npy_mean = arr[0]
np.save(npyMean, npy_mean)
binMean = caffe_root + 'examples/cifar10/mean.binaryproto'
npyMean = caffe_root + 'examples/cifar10/mean.npy'
convert_mean(binMean, npyMean)
|
I think the work around solution is using other python version.
…On Jan 26, 2018 15:37, "Universe" ***@***.***> wrote:
Same problem while running on Centos 7, python 2.7.5 and numpy 1.14.0.
def convert_mean(binMean, npyMean):
blob = caffe.proto.caffe_pb2.BlobProto()
bin_mean = open(binMean, 'rb').read()
blob.ParseFromString(bin_mean)
arr = np.array(caffe.io.blobproto_to_array(blob))
npy_mean = arr[0]
np.save(npyMean, npy_mean)
binMean = caffe_root + 'examples/cifar10/mean.binaryproto'
npyMean = caffe_root + 'examples/cifar10/mean.npy'
convert_mean(binMean, npyMean)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-61deea8c4153> in <module>()
9 binMean = caffe_root + 'examples/cifar10/mean.binaryproto'
10 npyMean = caffe_root + 'examples/cifar10/mean.npy'
---> 11 convert_mean(binMean, npyMean)
<ipython-input-17-61deea8c4153> in convert_mean(binMean, npyMean)
6 arr = np.array(caffe.io.blobproto_to_array(blob))
7 npy_mean = arr[0]
----> 8 np.save(npyMean, npy_mean)
9 binMean = caffe_root + 'examples/cifar10/mean.binaryproto'
10 npyMean = caffe_root + 'examples/cifar10/mean.npy'
/usr/lib64/python2.7/site-packages/numpy/lib/npyio.pyc in save(file, arr, allow_pickle, fix_imports)
509 arr = np.asanyarray(arr)
510 format.write_array(fid, arr, allow_pickle=allow_pickle,
--> 511 pickle_kwargs=pickle_kwargs)
512 finally:
513 if own_fid:
/usr/lib64/python2.7/site-packages/numpy/lib/format.pyc in write_array(fp, array, version, allow_pickle, pickle_kwargs)
560 _check_version(version)
561 used_ver = _write_array_header(fp, header_data_from_array_1_0(array),
--> 562 version)
563 # this warning can be removed when 1.9 has aged enough
564 if version != (2, 0) and used_ver == (2, 0):
/usr/lib64/python2.7/site-packages/numpy/lib/format.pyc in _write_array_header(fp, d, version)
306 header.append("}")
307 header = "".join(header)
--> 308 header = asbytes(_filter_header(header))
309
310 hlen = len(header) + 1 # 1 for newline
/usr/lib64/python2.7/site-packages/numpy/lib/format.pyc in _filter_header(s)
465 tokens.append(token)
466 last_token_was_number = (token_type == tokenize.NUMBER)
--> 467 return tokenize.untokenize(tokens)
468
469
/usr/lib64/python2.7/tokenize.pyc in untokenize(iterable)
260 """
261 ut = Untokenizer()
--> 262 return ut.untokenize(iterable)
263
264 def generate_tokens(readline):
/usr/lib64/python2.7/tokenize.pyc in untokenize(self, iterable)
196 break
197 tok_type, token, start, end, line = t
--> 198 self.add_whitespace(start)
199 self.tokens.append(token)
200 self.prev_row, self.prev_col = end
/usr/lib64/python2.7/tokenize.pyc in add_whitespace(self, start)
185 def add_whitespace(self, start):
186 row, col = start
--> 187 assert row <= self.prev_row
188 col_offset = col - self.prev_col
189 if col_offset:
AssertionError:
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#10348 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AG5FiqXy33v77BJokclHrWaPo0TIcuLsks5tOdUEgaJpZM4RXaLu>
.
|
I solved this problem by |
I am still receiving emails regarding this issue. Just to make it clear again, it turned out that this is not a numpy issue. The problem occurs while using numpy 1.14 with python 2.7.5. Unfortunately, that python version is the most up-to-date version in the default CentOS repositories. So what can you do about it? As pointed out before, you could simply use a previous version of numpy (e.g. 1.13.3). This does not look like a good solution to me, but if you have to stick to python 2.7.5 for any good reason, then this might be the solution you are looking for. |
I'm also seeing this problem with python 2.7.6 |
Did we numpy devs figure out what changed between 1.13 and 1.14 to trigger this? |
@ahaldane Well, it is pretty much related to the changes of #9025 |
So would adding a trailing newline before The |
If you look at the diff, in line 313 of the old code, the header was modified to contain a newline: Regarding code quality, the pull request definitely made things worse. There is plenty of duplication and a lot of comments (which means the code itself is not clear). Well, if this is a bugfix for some alignment issue, then having things work properly might be more important. Too bad, it caused other problems which need to be fixed now. |
How do you guys test the library? Are you running tests with different python versions? |
Here's the cpython commit that affected things: python/cpython@5e6db31 I don't have time right now to read it and understand why this fixed missing newlines, but I did check that in python 2.7.5: >>> tokenize.untokenize(tokenize.generate_tokens(StringIO("1000").read))
AssertionError
>>> tokenize.untokenize(tokenize.generate_tokens(StringIO("1000\n").read))
'1000\n' so adding the newline would fix it for old python. |
@ahaldane Well, there was a newline before the pull request... |
Yeah, I got that now, I hadn't read your comments carefully before, sorry. My last comment was to to reply that adding the newline looks like a valid fix. What I don't quite see is why the old tokenize code failed if a newline was missing, and whether that was intentional or not (seems not). As to which pythons we test, it is 2.7.14. If you click around here you can see the options: https://travis-ci.org/numpy/numpy. The 1.14 release notes say "This release supports Python 2.7 and 3.4 - 3.6.", so I think we should try to fix this for 2.7.5. |
Can you point to duplication that was introduced by the patch? Also, a patch adding comments doesn't mean the code became less clear - it might just mean the old code was already unclear. I think we could fix this by adding the newline first, and changing the padding to
|
Yeah I would accept anything along those lines, with a comment explaining why it's needed. Might also do |
I'm going to open and tag 1.14.1 |
@eric-wieser I am not writing here to bother you guys, so there is no need to react like that. If you want to sort things out by yourself, fine for me. @eric-wieser @ahaldane I would be careful with the two solutions you proposed. They both assume that in all python versions (with all I mean all you want to support) the newline will be still present after Regarding the code quality: # Pad the header with spaces and a final newline such that the magic
# string, the header-length short and the header are aligned on a
# ARRAY_ALIGN byte boundary. This supports memory mapping of dtypes
# aligned up to ARRAY_ALIGN on systems like Linux where mmap()
# offset must be page-aligned (i.e. the beginning of the file).
header = header + b' '*topad + b'\n' And regarding the duplication, the version specific handling got much worse: padlen_v1 = ARRAY_ALIGN - ((MAGIC_LEN + struct.calcsize('<H') + hlen) % ARRAY_ALIGN)
padlen_v2 = ARRAY_ALIGN - ((MAGIC_LEN + struct.calcsize('<I') + hlen) % ARRAY_ALIGN) if hlen + padlen_v1 < 2**16 and version in (None, (1, 0)):
version = (1, 0)
header_prefix = magic(1, 0) + struct.pack('<H', hlen + padlen_v1)
topad = padlen_v1
elif hlen + padlen_v2 < 2**32 and version in (None, (2, 0)):
version = (2, 0)
header_prefix = magic(2, 0) + struct.pack('<I', hlen + padlen_v2)
topad = padlen_v2 Having two classes containing the version specific functionality with the same interface (the python kind of polymorphism) could improve the code a lot. |
Sorry if my tone came across that way, and thanks for calling out the examples. I agree, there's still room for improvement there - patches welcome :) |
@eric-wieser I will see if I find some time to contribute a bit. Seems a good starting point now... |
@eric-wieser @ahaldane currently on an 11 hours flight. I had a look at format.py and would like to work on a few issues, including the python 2.7.5 issue. I never contributed to any large open source project, I guess it's time to change that. How can I contribute to numpy, respectively format.py? |
@pnbat I also started with some local fixes like this, and I found it was pretty easy to contribute once you know how git works. You need to get familiar with git/github. We have a set of docs about contributing at https://docs.scipy.org/doc/numpy/dev/index.html, and the quick summary is at https://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html. Here's how I do it, in brief: I have a fork of numpy on github, which I clone to my computer. On my clone, I 1. checkout a new branch, 2. make my changes, 3. run tests with The docs have more details, but some things from the dev guide that I remember I initially found numpy devs were picky about: 1. You should follow our/python's coding standards (PEP8). Common problems for me is I forget to always add a space between Usually any pull request should include a new unit test, following the pattern of other tests. The exact placement (which file) doesn't matter as long as it vaguely makes sense. But for this fix, since the bug only occurs in python 2.7.5 there may be no unit test to add since we don't test that version, and probably numpy already fails tests for that python version. |
I'm starting to look at preparing the 1.14.1 release, so it would be good if this gets fixed soon. No pressure :-) |
If it is that urgent I would recommend one of the workarounds that were suggested above. Either adding the newline earlier, removing the +1 from the len and adapting the padding (erics suggestion) or adding the newline just for the filter and removing it afterwards (which could also be done in the filter method itself). |
@charris @eric-wieser @ahaldane I have pushed a few changes to my numpy fork (https://github.com/pnbat/numpy/tree/format) The changes include a bugfix for this issue here (tested it with python 2.7.5 and 2.7.14, but unfortunately I cannot test it with python 3). This is basically the last commit. I went for the easy solution, adding a newline right before the filtering and removing it after the filtering. To make this work I also had to change from StringIO.read to StringIO.readline (which seems more appropriate in any case). |
Great! If you go to the page for your github fork of numpy, you should see a button "create pull request". Or, if you go to that specific branch, there should be a button "create pull request". Push the button. It will give you a little form for a description. Fill it in briefly to explain what the pull request (PR) does, why the changes were needed, and any other comments (i.e., just copy your last comment). I suggest giving the PR the title "BUG: fix np.save tokenizer for python 2.7.5". Then submit the PR. Then automatic tests will be run, and we will be able to do line-by-line comparison and review of the diff on github, and answer questions. You can later update your PR based on comments. |
guys, how can i upgrade the newest numpy since 1.14.1 is not released? |
There should be a release in the next week, we are just waiting on a couple of things. |
@charris thanks a lot |
Uh oh!
There was an error while loading. Please reload this page.
The following minimal example crashes with numpy 1.14 and python 2.7.5:
The tokenize is now performed before the padding and the newline is added to the header string. This does not seem to work in python 2.7. Adding the newline in _filter_header(s) fixes the problem for python 2.7, but does not work in python 3. This could be done in the version distinguishing part of _filter_header(s). I am not familiar with the numpy internals and the recent change for the 64 byte alignment (#9025), so I do not feel comfortable to change it myself.
The text was updated successfully, but these errors were encountered: