loadtxt fails to load large unsigned int64 integers. (Trac #1565) #2162
Comments
@fengy-research wrote on 2010-07-28 Patch. --- /home/yfeng1/local/lib/python2.7/site-packages/numpy/lib/io.py 2010-07-28 18:07:06.000000000 -0400
@fengy-research wrote on 2010-07-28 oops patch garbled.
@cgohlke wrote on 2010-09-06 It should be noted that with the patch, loading a csv file containing floating point formatted data (e.g. 3.14 or 1e9) with loadtxt('test.csv', dtype='uint64') will raise a ValueError. This seems better than silently returning wrong values.
@charris wrote on 2010-09-06 Needs some work. Note that savetxt has the same problem and that with your fix negative numbers are converted bitwise to their uint counterpart. That is the numpy way but it probably isn't what readers of text files expect. I agree about raising the error. I'm not convinced that the other integer types shouldn't do the same. Patch attached with your changes and two added tests.
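To make "converted bitwise" concrete, here is a stdlib-only sketch that reinterprets a signed 64-bit pattern as unsigned using `struct`. This is the two's-complement wraparound a C cast performs. It illustrates the effect only and is not numpy's code path. `as_uint64_bitwise` and `in_uint64_range` are hypothetical helper names.

```python
import struct


def as_uint64_bitwise(value):
    """Reinterpret the bits of a signed 64-bit integer as an unsigned one."""
    # Pack as little-endian int64, then unpack the same 8 bytes as uint64.
    return struct.unpack("<Q", struct.pack("<q", value))[0]


def in_uint64_range(value):
    """Range check a text reader might expect instead of silent wraparound."""
    return 0 <= value < 2**64


for v in (-1, -2, 5):
    print(v, as_uint64_bitwise(v), in_uint64_range(v))
```

With the bitwise path, a `-1` in the file comes back as 2**64 - 1. A range check like `in_uint64_range` would let a reader raise an error instead.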
Attachment added by @charris on 2010-09-06: npio.patch
Trac user dynetrekk wrote on 2011-03-26 This is the same issue as #1761. I attached a suggested patch for that today, which should work both for floats and for ints.
Milestone changed to
@charris wrote on 2011-04-02 I wonder if the integer conversions shouldn't just return the int function instead of going through a double. Doing so would cause an error to be raised if a float string were encountered. Why did this function go through a float in the first place? |
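A minimal sketch of that suggestion follows, assuming a simplified converter lookup. `get_converter` and `convert_or_none` are hypothetical names, not numpy's `_getconv`. Returning `int` itself keeps large values exact and makes float-formatted fields fail with a ValueError.

```python
def get_converter(kind):
    """Hypothetical, simplified converter lookup (not numpy's actual _getconv)."""
    if kind in ("int", "uint"):
        # Return the int function itself: exact for integers of any size,
        # while a field such as "3.14" or "1e9" raises ValueError.
        return int
    if kind == "float":
        return float
    return str


def convert_or_none(conv, text):
    """Apply conv, reporting a failed conversion as None instead of raising."""
    try:
        return conv(text)
    except ValueError:
        return None


to_uint = get_converter("uint")
for field in ("9223372043271415339", "3.14", "1e9"):
    print(field, convert_or_none(to_uint, field))
```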
@rgommers wrote on 2011-04-02 Probably went through float to avoid the error. There's likely to be some software out there that writes txt/csv files as "5, 300, 1e10" for integer data. I agree raising an error may be better, but that will break backwards compatibility (perhaps quite severely). +1 for raising a ValueError as discussed above for (u)int64, since that's already broken.
@charris wrote on 2011-04-02 I've gone with the patch as is. The patch in 1163 was defective: once the conversion to int was done, further conversion to int64 would fail if the value exceeded what a C long can hold, and a C long is 32 or 64 bits wide depending on the platform.
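The width of a C long can be checked from Python with `ctypes`. In Python 2, a plain `int` was backed by a C long, which is why the intermediate conversion mattered. The helper names below are hypothetical. The ticket's values exceed 2**63 - 1, so they do not fit in a signed C long on any platform.

```python
import ctypes


def c_long_bits():
    """Width of the platform's C long in bits (typically 64 on 64-bit Linux/macOS, 32 on Windows)."""
    return ctypes.sizeof(ctypes.c_long) * 8


def fits_in_c_long(value):
    """True if value lies within the signed range of a C long on this platform."""
    bits = c_long_bits()
    return -(2 ** (bits - 1)) <= value < 2 ** (bits - 1)


print(c_long_bits(), fits_in_c_long(9223372043271415339))
```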
@charris wrote on 2011-04-02 Replying to [comment:10 rgommers]:
Note that the error can be avoided by doing something like int64(repr(int(float(x)))) as a fallback when a ValueError is raised, but the largest values can't be converted this way, because they can't be represented exactly in a double.
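Here is a sketch of that fallback that assumes plain Python `int` as the target. Python 3 integers have arbitrary precision, so the `int64(repr(...))` wrapping from the comment is omitted. `int_with_float_fallback` is a hypothetical name. Plain integer strings take the exact path at any size. Only float-formatted fields go through a double, and those can lose precision.

```python
def int_with_float_fallback(text):
    """Parse an integer field, tolerating float notation such as "1e10"."""
    try:
        # Exact path: plain integer literals of any size.
        return int(text)
    except ValueError:
        # Lenient path: goes through a double, so it rounds beyond 2**53.
        return int(float(text))


row = [int_with_float_fallback(f) for f in "5, 300, 1e10".split(",")]
print(row)
print(int_with_float_fallback("9223372043271415339"))
print(int_with_float_fallback("9.223372043271415339e18"))
```

The last line shows the limitation from the comment. The same value written in float notation comes back rounded.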
Original ticket http://projects.scipy.org/numpy/ticket/1565 on 2010-07-28 by @fengy-research, assigned to unknown.
Prepare the following file:
-------file /tmp/test -----
9223372043271415339
9223372043271415853
9223372043271415612
9223372043271416107
9223372043271415594
9223372043271415836
9223372043761290139
9223372044088967272
9223372044088967273
9223372043925949039
---------end of file-----
And run the following code:
In [16]: print loadtxt('/tmp/test', dtype='uint64')
-------> print(loadtxt('/tmp/test', dtype='uint64'))
[9223372043271415808 9223372043271415808 9223372043271415808
9223372043271415808 9223372043271415808 9223372043271415808
9223372043761289216 9223372044088967168 9223372044088967168
9223372043925948416]
On the other hand, with fromfile(),
In [2]: print fromfile('/tmp/test', dtype='uint64', sep=' ')
------> print(fromfile('/tmp/test', dtype='uint64', sep=' '))
[9223372043271415339 9223372043271415853 9223372043271415612
9223372043271416107 9223372043271415594 9223372043271415836
9223372043761290139 9223372044088967272 9223372044088967273
9223372043925949039]
Clearly, loadtxt converts these numbers incorrectly: every value has been rounded to the nearest representable double.
The problem was traced to _getconv, at line 453 of numpy/lib/io.py. Its conversion for np.integer is int(float(x)), which is inexact for large integers: a double has a 53-bit significand, so not every integer above 2**53 can be represented.
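The precision loss can be reproduced without numpy. Near 2**63, adjacent doubles are 2048 apart, so `int(float(x))` snaps each value to a multiple of that spacing. The sketch below compares exact parsing with that path on the first three values from the test file. `via_double` is a hypothetical helper name.

```python
def via_double(text):
    """The int(float(x)) conversion path described above: parse as a double, then truncate."""
    return int(float(text))


# First three values from /tmp/test in the ticket.
values = ["9223372043271415339", "9223372043271415853", "9223372043271415612"]

for text in values:
    exact = int(text)  # Python ints are arbitrary precision, so this is exact
    rounded = via_double(text)
    print(exact, rounded, rounded - exact)

# Above 2**53 integers start colliding: 2**53 + 1 has no exact double.
print(float(2**53 + 1) == float(2**53))
```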
I don't know if a priority of normal is appropriate, as this bug will produce hidden errors in programs that use numpy.