Incorrect conversion to Int64 by loadtxt (traced to _getconv in numpy.lib.io) (Trac #1163) · Issue #1761 · numpy/numpy · GitHub
Closed
thouis opened this issue Oct 19, 2012 · 7 comments
Labels
00 - Bug component: numpy.lib Priority: high High priority, also add milestones for urgent issues

thouis commented Oct 19, 2012

Original ticket http://projects.scipy.org/numpy/ticket/1163 on 2009-07-09 by trac user onsi, assigned to unknown.

I'm running version 1.2.1 but this error should also occur in 1.3.0 based on the source currently in the trunk.

I try importing the following ASCII data stored in "sample.csv":

9007200000000000,670927001710,0.010190886
9007200000000001,660927001348,0.00976051
9007200000000002,650883003926,0.009154096

using (maximal verbosity for clarity):

import numpy
dtype = [('id0', numpy.int64), ('id1', numpy.int64), ('flt', numpy.float32)]
arr = numpy.loadtxt("sample.csv", dtype=dtype, delimiter=',', comments='#')

I get:

[(9007200000000000L, 670927001710L, 0.010190886445343494)
 (9007200000000000L, 660927001348L, 0.0097605101764202118)
 (9007200000000002L, 650883003926L, 0.009154096245765686)]

After some digging, I found the culprit to be the converter used by loadtxt to convert strings to dtypes. lib.io._getconv (line 352 in trunk) returns:

lambda x: int(float(x))

as the converter for any dtype that is a subclass of int, which int64 is. Unfortunately, float (a 64-bit double with a 53-bit significand) cannot represent every integer above 2**53, so 9007200000000001 gets rounded to 9007200000000000.
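
The rounding can be reproduced without NumPy. A minimal sketch in plain Python, using the second ID from sample.csv; the variable names are illustrative only and are not taken from the NumPy source:

```python
# The second ID from sample.csv, as loadtxt would see it (a string).
text = "9007200000000001"

via_float = int(float(text))  # same round trip as the lambda above
direct = int(text)            # parses the digits exactly

print(via_float)       # 9007200000000000
print(direct)          # 9007200000000001
print(direct > 2**53)  # True: beyond the range where doubles hold every integer
```

The detour through float loses the last digit, while int() applied directly to the string keeps it.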

This is fairly serious as int64s are often used as IDs in various numerical/simulation contexts. Changing the converter to int() should resolve this problem -- though then some error checking needs to take place to ensure that int is fed an integer string.

@thouis closed this as completed Oct 19, 2012
@numpy-gitbot

Milestone changed to 1.4.0 by @cournape on 2009-11-25

@numpy-gitbot

@WarrenWeckesser wrote on 2010-08-18

There is a thread in the mailing list about this problem:
http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052289.html

@numpy-gitbot

Milestone changed to Unscheduled by @mwiebe on 2011-03-24

@numpy-gitbot

trac user dynetrekk wrote on 2011-03-26

I suggest a patch here. It works both for long integers and for converting float strings into ints.

diff -r 2763b87dd7e8 -r 8eaaeb6ed8f3 numpy/lib/npyio.py
--- a/numpy/lib/npyio.py    Fri Mar 25 22:37:19 2011 -0600
+++ b/numpy/lib/npyio.py    Sat Mar 26 12:40:26 2011 +0100
@@ -566,7 +566,20 @@
     if issubclass(typ, np.bool_):
         return lambda x: bool(int(x))
     if issubclass(typ, np.integer):
-        return lambda x: int(float(x))
+        def _intconv(x):
+            try:
+                # This works for long integer, for example:
+                # >>> int('123456789123456789123456789123456789')
+                # 123456789123456789123456789123456789L
+                
+                y = int(x)
+            except ValueError:
+                # This will work if the number is a float, for example:
+                # >>> int(float('1.23e45'))
+                # 1229999999999999973814869011019624571608236032L
+                y = int(float(x))
+            return y
+        return _intconv
     elif issubclass(typ, np.floating):
         return float
     elif issubclass(typ, np.complex):

@numpy-gitbot

@rgommers wrote on 2011-03-31

It would be helpful if the patch includes a test.
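
A test along these lines might look like the sketch below. It is not the test that went into NumPy. `int_conv` is a hypothetical standalone copy of the `_intconv` helper from the patch above, written so it can be exercised without going through loadtxt:

```python
def int_conv(x):
    """Standalone copy of the patch's _intconv (hypothetical name)."""
    try:
        # Exact for arbitrarily large integer strings.
        return int(x)
    except ValueError:
        # Fall back for float-formatted input such as "2.0" or "1e3".
        return int(float(x))


def test_int_conv():
    # IDs from the original report must survive unchanged.
    assert int_conv("9007200000000001") == 9007200000000001
    assert int_conv("670927001710") == 670927001710
    # Float-formatted strings should still be accepted.
    assert int_conv("2.0") == 2
    assert int_conv("1e3") == 1000


test_int_conv()
```

A regression test in the NumPy suite would presumably also read the values through loadtxt itself with an int64 dtype, which this sketch leaves out.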

@numpy-gitbot

Milestone changed to 1.6.0 by @rgommers on 2011-03-31

@numpy-gitbot

@rgommers wrote on 2011-04-02

Closing as duplicate of #2162. This one's older, but #2162 has a more complete patch and more discussion.

This should be fixed for 1.6.0.
