numpy.save broken in 1.14 with python 2.7.5 · Issue #10348 · numpy/numpy · GitHub

Closed
pnbat opened this issue Jan 9, 2018 · 35 comments

@pnbat (Contributor) commented Jan 9, 2018

The following minimal example crashes with numpy 1.14 and python 2.7.5:

import numpy
a = numpy.zeros(64)
numpy.save("/tmp/a.npy", a)
lib/python2.7/site-packages/numpy/lib/format.pyc in _write_array_header(fp, d, version)
    306     header.append("}")
    307     header = "".join(header)
--> 308     header = asbytes(_filter_header(header))
    309 
    310     hlen = len(header) + 1 # 1 for newline

lib/python2.7/site-packages/numpy/lib/format.pyc in _filter_header(s)
    465             tokens.append(token)
    466         last_token_was_number = (token_type == tokenize.NUMBER)
--> 467     return tokenize.untokenize(tokens)
    468 
    469

/usr/lib64/python2.7/tokenize.pyc in untokenize(iterable)
    260     """
    261     ut = Untokenizer()
--> 262     return ut.untokenize(iterable)
    263 
    264 def generate_tokens(readline):

/usr/lib64/python2.7/tokenize.pyc in untokenize(self, iterable)
    196                 break
    197             tok_type, token, start, end, line = t
--> 198             self.add_whitespace(start)
    199             self.tokens.append(token)
    200             self.prev_row, self.prev_col = end

/usr/lib64/python2.7/tokenize.pyc in add_whitespace(self, start)
    185     def add_whitespace(self, start):
    186         row, col = start
--> 187         assert row <= self.prev_row
    188         col_offset = col - self.prev_col
    189         if col_offset:

AssertionError:

The tokenization is now performed before the padding and the newline are added to the header string. This does not seem to work in python 2.7. Adding the newline in _filter_header(s) fixes the problem for python 2.7, but does not work in python 3. This could be done in the version-distinguishing part of _filter_header(s). I am not familiar with the numpy internals and the recent change for the 64-byte alignment (#9025), so I do not feel comfortable changing it myself.
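For context, _filter_header round-trips the header through Python's tokenize module. A minimal sketch of that round trip (the header value is illustrative): with a trailing newline supplied it succeeds on modern Pythons, while on Python 2.7.5 the same round trip raised the AssertionError above when the newline was missing.

```python
import io
import tokenize

# Illustrative header string, shaped like the one _write_array_header builds.
header = "{'descr': '<f8', 'fortran_order': False, 'shape': (64,), }"

# Tokenize and reassemble the header, as _filter_header does.  Python
# 2.7.5's untokenize asserted when the input lacked a trailing newline;
# appending one sidesteps the assert.
tokens = list(tokenize.generate_tokens(io.StringIO(header + "\n").readline))
roundtripped = tokenize.untokenize(tokens)
assert roundtripped == header + "\n"
```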

@eric-wieser (Member)

Can't seem to reproduce on python 2.7.11.

Can you enter the %debug magic in ipython, and show the value of header causing the issue?

@pnbat (Contributor, Author) commented Jan 9, 2018

My colleague just sent me the following email: (It seems this is a python 2.7.5 issue)

Newest RPM in CentOS 7 (os)
http://mirror.centos.org/centos/7/os/x86_64/Packages/python-2.7.5-58.el7.x86_64.rpm

Only in Software Collections there's a newer version but we cannot use that one
https://centos.pkgs.org/7/centos-sclo-rh/python27-python-2.7.13-3.el7.x86_64.rpm.html

Tokens in _filter_header are displayed below; python 2.7.5 shows no newline (\n):

  • PYTHON 2.7.5
Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy

In [2]: a = numpy.ndarray([0,0,0,0], dtype=numpy.uint8)

In [3]: numpy.save("/tmp/a.npy", a)
token (51, '{', (1, 0), (1, 1), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (3, "'descr'", (1, 1), (1, 8), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ':', (1, 8), (1, 9), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (3, "'|u1'", (1, 10), (1, 15), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ',', (1, 15), (1, 16), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (3, "'fortran_order'", (1, 17), (1, 32), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ':', (1, 32), (1, 33), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (1, 'False', (1, 34), (1, 39), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ',', (1, 39), (1, 40), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (3, "'shape'", (1, 41), (1, 48), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ':', (1, 48), (1, 49), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, '(', (1, 50), (1, 51), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (2, '0', (1, 51), (1, 52), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ',', (1, 52), (1, 53), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (2, '0', (1, 54), (1, 55), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ',', (1, 55), (1, 56), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (2, '0', (1, 57), (1, 58), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ',', (1, 58), (1, 59), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (2, '0', (1, 60), (1, 61), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ')', (1, 61), (1, 62), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, ',', (1, 62), (1, 63), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (51, '}', (1, 64), (1, 65), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }")
token (0, '', (2, 0), (2, 0), '')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-63a44cce5b8d> in <module>()
----> 1 numpy.save("/tmp/a.npy", a)

/usr/lib64/python2.7/site-packages/numpy/lib/npyio.pyc in save(file, arr, allow_pickle, fix_imports)
    509         arr = np.asanyarray(arr)
    510         format.write_array(fid, arr, allow_pickle=allow_pickle,
--> 511                            pickle_kwargs=pickle_kwargs)
    512     finally:
    513         if own_fid:

/usr/lib64/python2.7/site-packages/numpy/lib/format.py in write_array(fp, array, version, allow_pickle, pickle_kwargs)
    561     _check_version(version)
    562     used_ver = _write_array_header(fp, header_data_from_array_1_0(array),
--> 563                                    version)
    564     # this warning can be removed when 1.9 has aged enough
    565     if version != (2, 0) and used_ver == (2, 0):

/usr/lib64/python2.7/site-packages/numpy/lib/format.py in _write_array_header(fp, d, version)
    306     header.append("}")
    307     header = "".join(header)
--> 308     header = asbytes(_filter_header(header))
    309 
    310     hlen = len(header) + 1 # 1 for newline

/usr/lib64/python2.7/site-packages/numpy/lib/format.py in _filter_header(s)
    466             tokens.append(token)
    467         last_token_was_number = (token_type == tokenize.NUMBER)
--> 468     return tokenize.untokenize(tokens)
    469 
    470 

/usr/lib64/python2.7/tokenize.pyc in untokenize(iterable)
    260     """
    261     ut = Untokenizer()
--> 262     return ut.untokenize(iterable)
    263 
    264 def generate_tokens(readline):

/usr/lib64/python2.7/tokenize.pyc in untokenize(self, iterable)
    196                 break
    197             tok_type, token, start, end, line = t
--> 198             self.add_whitespace(start)
    199             self.tokens.append(token)
    200             self.prev_row, self.prev_col = end

/usr/lib64/python2.7/tokenize.pyc in add_whitespace(self, start)
    185     def add_whitespace(self, start):
    186         row, col = start
--> 187         assert row <= self.prev_row
    188         col_offset = col - self.prev_col
    189         if col_offset:

AssertionError: 
  • PYTHON 2.7.14
Python 2.7.14 (default, Dec 11 2017, 16:08:01) 
Type "copyright", "credits" or "license" for more information.

IPython 5.4.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy

In [2]: a = numpy.ndarray([0,0,0,0], dtype=numpy.uint8)

In [3]: numpy.save("/tmp/a.npy", a)
token (51, '{', (1, 0), (1, 1), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (3, "'descr'", (1, 1), (1, 8), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ':', (1, 8), (1, 9), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (3, "'|u1'", (1, 10), (1, 15), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ',', (1, 15), (1, 16), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (3, "'fortran_order'", (1, 17), (1, 32), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ':', (1, 32), (1, 33), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (1, 'False', (1, 34), (1, 39), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ',', (1, 39), (1, 40), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (3, "'shape'", (1, 41), (1, 48), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ':', (1, 48), (1, 49), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, '(', (1, 50), (1, 51), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (2, '0', (1, 51), (1, 52), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ',', (1, 52), (1, 53), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (2, '0', (1, 54), (1, 55), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ',', (1, 55), (1, 56), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (2, '0', (1, 57), (1, 58), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ',', (1, 58), (1, 59), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (2, '0', (1, 60), (1, 61), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ')', (1, 61), (1, 62), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, ',', (1, 62), (1, 63), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (51, '}', (1, 64), (1, 65), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (4, '\n', (1, 69), (1, 70), "{'descr': '|u1', 'fortran_order': False, 'shape': (0, 0, 0, 0), }    \n")
token (0, '', (2, 0), (2, 0), '')

@ghost commented Jan 9, 2018

FWIW, there was a big rant from one of the Red Hat folks about people not using the Pythons from software collections a while ago. If it's a Python bug, then there's (probably) nothing to fix on our end.

@indiajoe

I have the same error on CentOS 7, which uses Python 2.7.5 (default, Aug 4 2017, 00:39:18) and numpy 1.14.0.

Is the only option to wait until CentOS 7 updates Python?

@HuynhLam

I have the same problem while running on CentOS 7, python 2.7.5 and numpy 1.14.0

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-32-9eb6a987a90e> in <module>()
      4 # saved_path = os.path.join('err_stats')
      5 # print(saved_path)
----> 6 np.save('err_stats', err_vals)
      7 mask = np.std(err_vals, axis=0)
      8 print(mask.shape)

/lib/python2.7/site-packages/numpy/lib/npyio.pyc in save(file, arr, allow_pickle, fix_imports)
    509         arr = np.asanyarray(arr)
    510         format.write_array(fid, arr, allow_pickle=allow_pickle,
--> 511                            pickle_kwargs=pickle_kwargs)
    512     finally:
    513         if own_fid:

/lib/python2.7/site-packages/numpy/lib/format.pyc in write_array(fp, array, version, allow_pickle, pickle_kwargs)
    560     _check_version(version)
    561     used_ver = _write_array_header(fp, header_data_from_array_1_0(array),
--> 562                                    version)
    563     # this warning can be removed when 1.9 has aged enough
    564     if version != (2, 0) and used_ver == (2, 0):

/lib/python2.7/site-packages/numpy/lib/format.pyc in _write_array_header(fp, d, version)
    306     header.append("}")
    307     header = "".join(header)
--> 308     header = asbytes(_filter_header(header))
    309 
    310     hlen = len(header) + 1 # 1 for newline

/lib/python2.7/site-packages/numpy/lib/format.pyc in _filter_header(s)
    465             tokens.append(token)
    466         last_token_was_number = (token_type == tokenize.NUMBER)
--> 467     return tokenize.untokenize(tokens)
    468 
    469 

/usr/lib64/python2.7/tokenize.pyc in untokenize(iterable)
    260     """
    261     ut = Untokenizer()
--> 262     return ut.untokenize(iterable)
    263 
    264 def generate_tokens(readline):

/usr/lib64/python2.7/tokenize.pyc in untokenize(self, iterable)
    196                 break
    197             tok_type, token, start, end, line = t
--> 198             self.add_whitespace(start)
    199             self.tokens.append(token)
    200             self.prev_row, self.prev_col = end

/usr/lib64/python2.7/tokenize.pyc in add_whitespace(self, start)
    185     def add_whitespace(self, start):
    186         row, col = start
--> 187         assert row <= self.prev_row
    188         col_offset = col - self.prev_col
    189         if col_offset:

AssertionError: 

@eric-wieser (Member)

I'll ask again - in the ipython traceback you have there, can you type %debug, then up a few times until you're in _write_array_header, and then run print(repr(header))?

@HuynhLam commented Jan 20, 2018
> /usr/lib64/python2.7/tokenize.py(187)add_whitespace()
    185     def add_whitespace(self, start):
    186         row, col = start
--> 187         assert row <= self.prev_row
    188         col_offset = col - self.prev_col
    189         if col_offset:

ipdb> up
> /usr/lib64/python2.7/tokenize.py(198)untokenize()
    196                 break
    197             tok_type, token, start, end, line = t
--> 198             self.add_whitespace(start)
    199             self.tokens.append(token)
    200             self.prev_row, self.prev_col = end

ipdb> up
> /usr/lib64/python2.7/tokenize.py(262)untokenize()
    260     """
    261     ut = Untokenizer()
--> 262     return ut.untokenize(iterable)
    263 
    264 def generate_tokens(readline):

ipdb> up
> /var/tmp/nb-cache/test/lib/python2.7/site-packages/numpy/lib/format.py(467)_filter_header()
    465             tokens.append(token)
    466         last_token_was_number = (token_type == tokenize.NUMBER)
--> 467     return tokenize.untokenize(tokens)
    468 
    469 

ipdb> up
> /var/tmp/nb-cache/test/lib/python2.7/site-packages/numpy/lib/format.py(308)_write_array_header()
    306     header.append("}")
    307     header = "".join(header)
--> 308     header = asbytes(_filter_header(header))
    309 
    310     hlen = len(header) + 1 # 1 for newline

ipdb> print(repr(header))
"{'descr': '<f8', 'fortran_order': False, 'shape': (1449, 480, 640), }"

@lzj1769 commented Jan 25, 2018

I have the same problem using python 2.7.5, numpy 1.14.0 and CentOS.
Does anyone have an idea how to solve this issue?

@littleeleventhwolf

Same problem while running on CentOS 7, python 2.7.5 and numpy 1.14.0.

def convert_mean(binMean, npyMean):
    blob = caffe.proto.caffe_pb2.BlobProto()
    bin_mean = open(binMean, 'rb').read()
    blob.ParseFromString(bin_mean)
    arr = np.array(caffe.io.blobproto_to_array(blob))
    npy_mean = arr[0]
    np.save(npyMean, npy_mean)
binMean = caffe_root + 'examples/cifar10/mean.binaryproto'
npyMean = caffe_root + 'examples/cifar10/mean.npy'
convert_mean(binMean, npyMean)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-17-61deea8c4153> in <module>()
      9 binMean = caffe_root + 'examples/cifar10/mean.binaryproto'
     10 npyMean = caffe_root + 'examples/cifar10/mean.npy'
---> 11 convert_mean(binMean, npyMean)

<ipython-input-17-61deea8c4153> in convert_mean(binMean, npyMean)
      6     arr = np.array(caffe.io.blobproto_to_array(blob))
      7     npy_mean = arr[0]
----> 8     np.save(npyMean, npy_mean)
      9 binMean = caffe_root + 'examples/cifar10/mean.binaryproto'
     10 npyMean = caffe_root + 'examples/cifar10/mean.npy'

/usr/lib64/python2.7/site-packages/numpy/lib/npyio.pyc in save(file, arr, allow_pickle, fix_imports)
    509         arr = np.asanyarray(arr)
    510         format.write_array(fid, arr, allow_pickle=allow_pickle,
--> 511                            pickle_kwargs=pickle_kwargs)
    512     finally:
    513         if own_fid:

/usr/lib64/python2.7/site-packages/numpy/lib/format.pyc in write_array(fp, array, version, allow_pickle, pickle_kwargs)
    560     _check_version(version)
    561     used_ver = _write_array_header(fp, header_data_from_array_1_0(array),
--> 562                                    version)
    563     # this warning can be removed when 1.9 has aged enough
    564     if version != (2, 0) and used_ver == (2, 0):

/usr/lib64/python2.7/site-packages/numpy/lib/format.pyc in _write_array_header(fp, d, version)
    306     header.append("}")
    307     header = "".join(header)
--> 308     header = asbytes(_filter_header(header))
    309 
    310     hlen = len(header) + 1 # 1 for newline

/usr/lib64/python2.7/site-packages/numpy/lib/format.pyc in _filter_header(s)
    465             tokens.append(token)
    466         last_token_was_number = (token_type == tokenize.NUMBER)
--> 467     return tokenize.untokenize(tokens)
    468 
    469 

/usr/lib64/python2.7/tokenize.pyc in untokenize(iterable)
    260     """
    261     ut = Untokenizer()
--> 262     return ut.untokenize(iterable)
    263 
    264 def generate_tokens(readline):

/usr/lib64/python2.7/tokenize.pyc in untokenize(self, iterable)
    196                 break
    197             tok_type, token, start, end, line = t
--> 198             self.add_whitespace(start)
    199             self.tokens.append(token)
    200             self.prev_row, self.prev_col = end

/usr/lib64/python2.7/tokenize.pyc in add_whitespace(self, start)
    185     def add_whitespace(self, start):
    186         row, col = start
--> 187         assert row <= self.prev_row
    188         col_offset = col - self.prev_col
    189         if col_offset:

AssertionError: 

@HuynhLam commented Jan 26, 2018 via email

@emma-sjwang

I solved this problem by
pip install numpy==1.13.3

@pnbat (Contributor, Author) commented Jan 31, 2018

I am still receiving emails regarding this issue. Just to make it clear again, it turned out that this is not a numpy issue. The problem occurs while using numpy 1.14 with python 2.7.5. Unfortunately, that python version is the most up-to-date version in the default CentOS repositories. So what can you do about it?

As pointed out before, you could simply use a previous version of numpy (e.g. 1.13.3). This does not look like a good solution to me, but if you have to stick to python 2.7.5 for any good reason, then this might be the solution you are looking for.
Otherwise, using the newer python 2 provided by the CentOS software collections repository should solve the issue. On most other distributions, a newer python 2 version should be already available (probably since quite a while).
If you are bound to CentOS and its default repositories, but you have the freedom to use docker, this could also help, depending on your particular use case.

@jo11111 commented Feb 2, 2018

I'm also seeing this problem with python 2.7.6.
I've downgraded numpy to get around it.

@ahaldane (Member) commented Feb 2, 2018

Did we numpy devs figure out what changed between 1.13 and 1.14 to trigger this?

@pnbat (Contributor, Author) commented Feb 2, 2018

@ahaldane Well, it is pretty much related to the changes of #9025.
I thought my first two comments would provide enough information. I was going through the code changes and, if I remember correctly, together with the changes for the alignment the logic also changed quite a bit. I think the order of some operations changed. This should not be a huge problem, but with some old python 2 versions "Tokens in _filter_header displayed below, python 2.7.5 shows no newline (\n)" (cf. my second comment in this thread).
I was about to change the code myself, but it seems a bit messy and I have not worked on numpy before. What I also figured out was that pull request #9025 was pushed quite a lot to be integrated into 1.14. Things like this are usually not good for overall code quality...

@eric-wieser (Member) commented Feb 2, 2018

So would adding a trailing newline before _filter_header, and then stripping it again after, solve the problem?

The ENH in #9025 is misleading - it's a substantial bugfix for incorrect alignments in python 2 and format 2.0, which I would consider to be a big plus for code quality

@pnbat (Contributor, Author) commented Feb 2, 2018

If you look at the diff, in line 313 of the old code, the header was modified to contain a newline:
header = header + ' '*topad + '\n'
This was moved to line 333 of the pull request code:
header = header + b' '*topad + b'\n'
That means the _filter_header method was called with the modified header in the old code, and is called with the unmodified header (not padded, no newline) in the new code. I do not have time to dig deeper into this, but adding the newline before calling _filter_header was probably the workaround for the python 2.7.5 issue with the tokenizer.

Regarding code quality, the pull request definitely made things worse. There is plenty of duplication and a lot of comments (which means the code itself is not clear). Well, if this is a bugfix for some alignment issue, then having things work properly might be more important. Too bad, it caused other problems which need to be fixed now.

@pnbat (Contributor, Author) commented Feb 2, 2018

How do you guys test the library? Are you running tests with different python versions?

@ahaldane (Member) commented Feb 2, 2018

Here's the cpython commit that affected things: python/cpython@5e6db31

I don't have time right now to read it and understand why this fixed missing newlines, but I did check that in python 2.7.5:

>>> tokenize.untokenize(tokenize.generate_tokens(StringIO("1000").read))
AssertionError
>>> tokenize.untokenize(tokenize.generate_tokens(StringIO("1000\n").read))
'1000\n'

so adding the newline would fix it for old python.

@pnbat (Contributor, Author) commented Feb 2, 2018

@ahaldane Well, there was a newline before the pull request...

@ahaldane (Member) commented Feb 2, 2018

Yeah, I got that now; I hadn't read your comments carefully before, sorry. My last comment was to reply that adding the newline looks like a valid fix.

What I don't quite see is why the old tokenize code failed if a newline was missing, and whether that was intentional or not (seems not).

As to which pythons we test, it is 2.7.14. If you click around here you can see the options: https://travis-ci.org/numpy/numpy.

The 1.14 release notes say "This release supports Python 2.7 and 3.4 - 3.6.", so I think we should try to fix this for 2.7.5.

@eric-wieser (Member)

There is plenty of duplication and a lot of comments (which means the code itself is not clear)

Can you point to duplication that was introduced by the patch?

Also, a patch adding comments doesn't mean the code became less clear - it might just mean the old code was already unclear.

I think we could fix this by adding the newline first, and changing the padding to

header = header[:-1] + b' '*topad + header[-1]
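As a sketch of that padding order (values hypothetical; note that on Python 3, `header[-1]` on a bytes object yields an int, so `header[-1:]` is needed to keep a bytes result):

```python
# Hypothetical header that already carries the trailing newline the
# tokenizer needs, plus an illustrative pad length.
header = b"{'descr': '<f8', 'fortran_order': False, 'shape': (64,), }\n"
topad = 4

# Insert the spaces *before* the trailing newline so the newline stays last.
padded = header[:-1] + b' ' * topad + header[-1:]

assert padded.endswith(b'\n')
assert len(padded) == len(header) + topad
```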

@ahaldane (Member) commented Feb 2, 2018

Yeah I would accept anything along those lines, with a comment explaining why it's needed.

Might also do header = asbytes(_filter_header(header + "\n"))[:-1]

@ahaldane (Member) commented Feb 2, 2018

I'm going to open and tag 1.14.1

@ahaldane ahaldane reopened this Feb 2, 2018
@ahaldane ahaldane added this to the 1.14.1 release milestone Feb 2, 2018
@pnbat (Contributor, Author) commented Feb 2, 2018

@eric-wieser I am not writing here to bother you guys, so there is no need to react like that. If you want to sort things out by yourself, fine for me.

@eric-wieser @ahaldane I would be careful with the two solutions you proposed. They both assume that in all python versions (by all I mean all you want to support) the newline will still be present after _filter_header has been called and all the tokenize and untokenize work has been performed. I would suggest double-checking this first.

Regarding the code quality:
What about
padded_header = pad_header_for_alignment(header, topad)
instead of

# Pad the header with spaces and a final newline such that the magic
# string, the header-length short and the header are aligned on a
# ARRAY_ALIGN byte boundary.  This supports memory mapping of dtypes
# aligned up to ARRAY_ALIGN on systems like Linux where mmap()
# offset must be page-aligned (i.e. the beginning of the file).
header = header + b' '*topad + b'\n'

And regarding the duplication, the version specific handling got much worse:

padlen_v1 = ARRAY_ALIGN - ((MAGIC_LEN + struct.calcsize('<H') + hlen) % ARRAY_ALIGN)
padlen_v2 = ARRAY_ALIGN - ((MAGIC_LEN + struct.calcsize('<I') + hlen) % ARRAY_ALIGN)
if hlen + padlen_v1 < 2**16 and version in (None, (1, 0)):
    version = (1, 0)
    header_prefix = magic(1, 0) + struct.pack('<H', hlen + padlen_v1)
    topad = padlen_v1
elif hlen + padlen_v2 < 2**32 and version in (None, (2, 0)):
    version = (2, 0)
    header_prefix = magic(2, 0) + struct.pack('<I', hlen + padlen_v2)
    topad = padlen_v2

Having two classes containing the version specific functionality with the same interface (the python kind of polymorphism) could improve the code a lot.

@eric-wieser (Member)

so there is no need to react like that

Sorry if my tone came across that way, and thanks for calling out the examples. I agree, there's still room for improvement there - patches welcome :)

@pnbat (Contributor, Author) commented Feb 2, 2018

@eric-wieser I will see if I find some time to contribute a bit. Seems like a good starting point now...

@pnbat (Contributor, Author) commented Feb 3, 2018

@eric-wieser @ahaldane currently on an 11-hour flight. I had a look at format.py and would like to work on a few issues, including the python 2.7.5 issue.

I never contributed to any large open source project; I guess it's time to change that. How can I contribute to numpy, specifically to format.py?

@ahaldane (Member) commented Feb 3, 2018

@pnbat I also started with some local fixes like this, and I found it was pretty easy to contribute once you are familiar with git/github.

We have a set of docs about contributing at https://docs.scipy.org/doc/numpy/dev/index.html, and the quick summary is at https://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html.

Here's how I do it, in brief: I have a fork of numpy on github, which I clone to my computer. On my clone, I 1. checkout a new branch, 2. make my changes, 3. run tests with python ./runtests.py, 4. If passed, commit them with a properly formatted commit message, 5. push to that branch in my github fork. 6. Press the button on github to make a new pull request. Then we will review and eventually merge, possibly after a few rounds of fixes.

The docs have more details, but some things from the dev guide that I initially found numpy devs were picky about: 1. You should follow our/python's coding standards (PEP8). A common problem for me is forgetting to add a space between ) and {, and leaving trailing whitespace on a line. 2. The commit header should start with BUG: my description or similar, and the commit message should say Fixes #10348 to automate issue resolution.

Usually any pull request should include a new unit test, following the pattern of other tests. The exact placement (which file) doesn't matter as long as it vaguely makes sense. But for this fix, since the bug only occurs in python 2.7.5 there may be no unit test to add since we don't test that version, and probably numpy already fails tests for that python version.

@charris (Member) commented Feb 3, 2018

I'm starting to look at preparing the 1.14.1 release, so it would be good if this gets fixed soon. No pressure :-)

@pnbat (Contributor, Author) commented Feb 3, 2018

If it is that urgent I would recommend one of the workarounds that were suggested above: either adding the newline earlier, removing the +1 from the length and adapting the padding (Eric's suggestion), or adding the newline just for the filter and removing it afterwards (which could also be done in the filter method itself).
I have made a lot of changes which require some cleanup now. Tests are running, but I am calling private methods during the tests because I changed the interface. It did not look right to me that certain methods were part of the public interface. Enough work on the airplane; I will get back to you guys tomorrow.

@pnbat (Contributor, Author) commented Feb 4, 2018

@charris @eric-wieser @ahaldane I have pushed a few changes to my numpy fork (https://github.com/pnbat/numpy/tree/format)

The changes include a bugfix for this issue here (tested it with python 2.7.5 and 2.7.14, but unfortunately I cannot test it with python 3). This is basically the last commit. I went for the easy solution, adding a newline right before the filtering and removing it after the filtering. To make this work I also had to change from StringIO.read to StringIO.readline (which seems more appropriate in any case).
Apart from that I started with some refactoring to make the code more readable and more testable. It should also be quite easy to make the alignment configurable now. I could continue a bit more into that direction, but I would be happy about a bit of feedback at this point.
It is also not really clear to me what is part of the public API of the format module. To me the methods of interest are write_array, read_array and open_memmap. The methods magic, read_magic, dtype_to_descr, header_data_from_array_1_0, write_array_header_1_0, write_array_header_2_0, read_array_header_1_0, read_array_header_2_0 do not look like they should be exposed. What is the reason behind that?
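For reference, tokenize.generate_tokens expects a readline-style callable that yields one line per call, which is why .readline is the appropriate argument (a sketch using Python 3's io.StringIO; the header value is illustrative):

```python
import io
import tokenize

src = "{'descr': '<f8', 'fortran_order': False, 'shape': (3,), }\n"

# generate_tokens takes a callable returning one line per call -- the
# readline method of a file-like object -- not the whole text from .read().
tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))

# The full token tuples round-trip back to the exact source.
assert tokenize.untokenize(tokens) == src
```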

@ahaldane (Member) commented Feb 5, 2018

Great! If you go to the page for your github fork of numpy, you should see a button "create pull request". Or, if you go to that specific branch, there should be a button "create pull request". Push the button.

It will give you a little form for a description. Fill it in briefly to explain what the pull request (PR) does, why the changes were needed, and any other comments (i.e., just copy your last comment). I suggest giving the PR the title "BUG: fix np.save tokenizer for python 2.7.5".

Then submit the PR. Then automatic tests will be run, and we will be able to do line-by-line comparison and review of the diff on github, and answer questions. You can later update your PR based on comments.

@baixiaoustc

Guys, how can I upgrade to the newest numpy, since 1.14.1 is not released?

@charris (Member) commented Feb 12, 2018

There should be a release in the next week, we are just waiting on a couple of things.

@baixiaoustc

@charris thanks a lot
