loadtxt fails to load large unsigned int64 integers. (Trac #1565) #2162

Closed
numpy-gitbot opened this issue Oct 19, 2012 · 16 comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/1565 on 2010-07-28 by @fengy-research, assigned to unknown.

Prepare the following file:

-------file /tmp/test -----
9223372043271415339
9223372043271415853
9223372043271415612
9223372043271416107
9223372043271415594
9223372043271415836
9223372043761290139
9223372044088967272
9223372044088967273
9223372043925949039
---------end of file-----

And run the following code:

In [16]: print loadtxt('/tmp/test', dtype='uint64')
-------> print(loadtxt('/tmp/test', dtype='uint64'))
[9223372043271415808 9223372043271415808 9223372043271415808
9223372043271415808 9223372043271415808 9223372043271415808
9223372043761289216 9223372044088967168 9223372044088967168
9223372043925948416]

On the other hand, with fromfile(),

In [2]: print fromfile('/tmp/test', dtype='uint64', sep=' ')
------> print(fromfile('/tmp/test', dtype='uint64', sep=' '))
[9223372043271415339 9223372043271415853 9223372043271415612
9223372043271416107 9223372043271415594 9223372043271415836
9223372043761290139 9223372044088967272 9223372044088967273
9223372043925949039]

Clearly the numbers are wrongly converted by loadtxt.

The problem was tracked to line 453 in numpy/lib/io.py, _getconv. The conversion for np.integer is int(float(x)), which is inexact for large integers.
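A minimal sketch of why int(float(x)) is inexact here, using only the Python standard library and the first value from the test file. The variable names are illustrative and not taken from numpy/lib/io.py. A double has a 53-bit significand, so a 63-bit integer cannot survive the round trip through float:

```python
# First value from /tmp/test, as the text loadtxt would see it.
text = "9223372043271415339"

exact = int(text)             # Python integers are arbitrary precision
via_float = int(float(text))  # the conversion path described in the report

print(exact)              # 9223372043271415339
print(via_float)          # 9223372043271415808, matching the loadtxt output above
print(exact == via_float) # False
```

The second value is the same one loadtxt printed for the first line of the file, which is consistent with the diagnosis.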

I don't know if a priority of normal is appropriate, as this bug will produce hidden errors in programs that use numpy.

@numpy-gitbot

@fengy-research wrote on 2010-07-28

Patch.

--- /home/yfeng1/local/lib/python2.7/site-packages/numpy/lib/io.py 2010-07-28 18:07:06.000000000 -0400
+++ /home/yfeng1/local/lib/python2.7/site-packages/numpy/lib/io.py.new 2010-07-28 18:06:48.000000000 -0400
@@ -454,6 +454,10 @@
typ = dtype.type
if issubclass(typ, np.bool_):
return lambda x: bool(int(x))

  • if issubclass(typ, np.uint64):
  •    return np.uint64
    
  • if issubclass(typ, np.int64):
  •    return np.int64
    
    if issubclass(typ, np.integer):
    return lambda x: int(float(x))
    elif issubclass(typ, np.floating):

@numpy-gitbot

@fengy-research wrote on 2010-07-28

oops patch garbled.

--- /home/yfeng1/local/lib/python2.7/site-packages/numpy/lib/io.py      2010-07-28 18:07:06.000000000 -0400
+++ /home/yfeng1/local/lib/python2.7/site-packages/numpy/lib/io.py.new  2010-07-28 18:06:48.000000000 -0400
@@ -454,6 +454,10 @@
     typ = dtype.type
     if issubclass(typ, np.bool_):
         return lambda x: bool(int(x))
+    if issubclass(typ, np.uint64):
+        return np.uint64
+    if issubclass(typ, np.int64):
+        return np.int64
     if issubclass(typ, np.integer):
         return lambda x: int(float(x))
     elif issubclass(typ, np.floating):

@numpy-gitbot

@cgohlke wrote on 2010-09-06

This seems related to ticket #1761.

@numpy-gitbot

@cgohlke wrote on 2010-09-06

I can reproduce this with numpy 1.5.0 on win32 and win-amd64. The patch solves this issue and also ticket #1761.

@numpy-gitbot

Attachment added by @cgohlke on 2010-09-06: npio.diff

@numpy-gitbot

@cgohlke wrote on 2010-09-06

It should be noted that with the patch, loading a csv file containing floating point formatted data (e.g. 3.14 or 1e9) with loadtxt('test.csv', dtype='uint64') will raise a ValueError. This seems better than silently returning wrong values.
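To illustrate the behaviour change, here is a small sketch in plain Python. It assumes the patched converter parses the text directly as an integer, the way the built-in int does. The helper name parses_as_int is hypothetical and is not part of numpy:

```python
def parses_as_int(text):
    """Return True if the string can be parsed directly as an integer."""
    try:
        int(text)
    except ValueError:
        return False
    return True

for sample in ["42", "3.14", "1e9"]:
    print(sample, parses_as_int(sample))
# 42 True
# 3.14 False
# 1e9 False
```

Float-formatted fields such as 3.14 or 1e9 are rejected rather than silently truncated or rounded.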

@numpy-gitbot

@charris wrote on 2010-09-06

Needs some work. Note that savetxt has the same problem and that with your fix negative numbers are converted bitwise to their uint counterpart. That is the numpy way but it probably isn't what readers of text files expect. I agree about raising the error. I'm not convinced that the other integer types shouldn't do the same.

Patch attached with your changes and two added tests.
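For readers unfamiliar with the bitwise conversion mentioned above, the following sketch shows what reinterpreting a negative 64-bit integer as unsigned looks like. It uses the standard struct module only to illustrate two's complement reinterpretation. It is not numpy's code path, and the function name is hypothetical:

```python
import struct

def reinterpret_as_uint64(value):
    """Reinterpret the bits of a signed 64-bit integer as an unsigned one."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]

print(reinterpret_as_uint64(-1))  # 18446744073709551615, i.e. 2**64 - 1
print(reinterpret_as_uint64(5))   # 5, non-negative values are unchanged
```

A text file containing -1 would therefore load as the largest uint64 value, which is probably not what the author of the file intended.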

@numpy-gitbot

Attachment added by @charris on 2010-09-06: npio.patch

@numpy-gitbot

trac user dynetrekk wrote on 2011-03-26

This is the same issue as #1761. I attached a suggested patch for that today, which should work both for floats and for ints.

@numpy-gitbot

Milestone changed to 1.6.0 by @rgommers on 2011-03-30

@numpy-gitbot

@charris wrote on 2011-04-02

I wonder if the integer conversions shouldn't just return the int function instead of going through a double. Doing so would cause an error to be raised if a float string were encountered. Why did this function go through a float in the first place?

@numpy-gitbot

@rgommers wrote on 2011-04-02

Probably went through float to avoid the error. There's likely to be some software out there that writes txt/csv files as "5, 300, 1e10" for integer data. I agree raising an error may be better, but that will break backwards compatibility (perhaps quite severely).

+1 for raising a ValueError as discussed above for (u)int64, since that's already broken.
-0.5 for changing it for all ints.

@numpy-gitbot

@rgommers wrote on 2011-04-02

Whatever the final solution, the exact same should be done in genfromtxt. That also has a bug filed against it for the same thing, #2026.

@numpy-gitbot

@charris wrote on 2011-04-02

I've gone with the patch as is. The patch in 1163 was defective: once the conversion to int was done, further conversion to int64 would fail if the value exceeded what can be held in a long, and the size of long varies between int32 and int64 depending on the platform.

@numpy-gitbot

@charris wrote on 2011-04-02

Replying to [comment:10 rgommers]:

> Probably went through float to avoid the error. There's likely to be some software out there that writes txt/csv files as "5, 300, 1e10" for integer data. I agree raising an error may be better, but that will break backwards compatibility (perhaps quite severely).
>
> +1 for raising a ValueError as discussed above for (u)int64, since that's already broken.
> -0.5 for changing it for all ints.

Note that the error can be avoided by doing something like int64(repr(int(float(x)))) as a fallback when a ValueError is raised, but the largest values can't be converted in this way as they don't fit in a double.
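A rough sketch of that fallback idea in plain Python, assuming the exact integer parse is tried first and the float path is used only when it raises ValueError. The function name to_int_with_fallback is hypothetical, and this is not the code that was committed:

```python
def to_int_with_fallback(text):
    """Parse text as an exact integer, falling back to a float round trip."""
    try:
        return int(text)         # exact for plain integer strings of any size
    except ValueError:
        return int(float(text))  # accepts "1e10" or "3.14", limited to double precision

print(to_int_with_fallback("9223372043271415339"))  # 9223372043271415339
print(to_int_with_fallback("1e10"))                 # 10000000000
print(to_int_with_fallback("3.14"))                 # 3
```

Plain integer strings stay exact, while float-formatted fields keep working as before. As noted above, a float-formatted field near the top of the uint64 range would still lose precision on the fallback path.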

@numpy-gitbot

@charris wrote on 2011-04-02

Fixed in 32903b3.
