ENH: usecols now accepts an int when only one column has to be read #6656

I--P · 2015-11-09T10:07:07Z

As explained on the Numpy-discussion mailing list I've seen many students, coming from Matlab, struggling against the usecols argument of loadtxt when they had to read only one column in a file.

This simple PR aims at making loadtxt more user friendly for newcomers, usecol behavior has been modified to accept either an int or a sequence when a single column has to be read.

jaimefrio · 2015-12-12T16:14:31Z

I'm a little wary of the check for non-ints, as it may break people's code and would probably require a deprecation cycle.

Other than that, you should add this enhancement to the release notes.

seberg · 2016-01-02T18:50:36Z

What Jaime said. As a solution for the conversion, I think you could try:

try:
    usecols = [operator.index(usecols)]
except TypeError:
    usecols = list(usecols)

I--P · 2016-01-04T09:34:39Z

I don't get the problem with the check for non ints, with the current code it raises a TypeError:

from io import StringIO # StringIO behaves like a file object
c = StringIO("0 1\n2 3")
np.loadtxt(c, usecols=[1.5])

Output:
Traceback (most recent call last):
File "test.py", line 6, in
np.loadtxt(c, usecols=[1.5])
File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 846, in loadtxt
vals = [vals[i] for i in usecols]
File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 846, in
vals = [vals[i] for i in usecols]
TypeError: list indices must be integers, not float

The only change is that the error message is a bit more userfriendly imho. Admittedly I should raise TypeError instead of ValueError to avoid breaking existing code though.

Just tell me if raising TypeError would be enough or if you really want this to go away.

seberg · 2016-01-04T09:47:57Z

There are more int-like classes possible then you can easily check for. One "hope for the best" solution would be to check for numbers.integral. But in the end trying operator.index (basically the duck-type version of it) is the only solution that is guaranteed to not create any regression.

You can do the try/except this way around (try list first instead of index first), it does not matter. But the integer check should be done with operator.index.
Your version probably improves the error message conside 8000 rably, so I am fine with adding the explicit check/conversion (could also convert the list into a list filled with operator.index and replace the error message if it fails).

seberg · 2016-01-04T09:49:49Z

Hmmm, seriously wondering here though. Maybe TypeError is better than ValueError?

I--P · 2016-01-04T09:53:15Z

TypeError would not break people's code if they are catching the old exception, so I think I should use it instead of ValueError.

I--P · 2016-01-04T10:16:04Z

Is it OK like that ? I don't know if it's a good practice in numpy to delete the original exception message to replace it with my own message or if I should append it.

seberg · 2016-01-04T10:22:09Z

8000

Frankly, I got no idea if replacing Exception args is good or not :). If it is common practice in python, sounds fine to me (I do guess the traceback lines are nicer).

seberg · 2016-01-04T10:25:23Z

numpy/lib/tests/test_io.py

@@ -633,6 +633,15 @@ def test_usecols(self):
        x = np.loadtxt(c, dtype=float, usecols=np.array([1, 2]))
        assert_array_equal(x, a[:, 1:])

+        # Testing with an integer instead of a sequence


If you want to test with something really crazy, could test with:

class CrazyInt(object): def __index__(self): return 0

but I am OK with leaving it out.

You're right it's better to add a test with a custom integer type. If I had done that from the beginning I would have seen the problem with my previous way to check integers.

njsmith · 2016-01-04T12:09:34Z

As a general rule we don't consider changing the text of an error message to be a compatibility breaking change, i.e. feel free to change them if it makes things better.

seberg · 2016-01-04T12:09:40Z

numpy/lib/npyio.py

+                    )
+                raise
+        # Fall back to existing code
+        usecols = usecols_as_list


Actually, I think there is no reason to create usecols_as_list at all. Also the comment is rather unnecessary "existing code" is not existing code anymore as soon as we put this in ;).

By existing code I meant all the lines below this one where usecols is used.
You're right I can use usecols and avoid the creation of usecols_as_list. I wished to make this easier to read, I hate code like foo=some_fancy_type(foo) buried deep in some code and then waste time trying to understand why foo is not any more what it was at the beginning.

seberg · 2016-01-04T12:11:08Z

@njsmith that is not what was meant. The question is, is it safe/good practice to do

try:
   ...
except Exception as e:
    e.args = ("new message",)
    raise

seberg · 2016-01-04T12:13:04Z

Anyway, should be pretty much ready. Before the end, please squash the commits into one or two only.

seberg · 2016-01-04T12:28:24Z

Seems like while in theory someone could break the .args modifying code (though I am not even sure you can in the case of operator.index), in practice it seems considered OK to do so.

seberg · 2016-01-04T13:29:37Z

numpy/lib/npyio.py

        Which columns to read, with 0 being the first.  For example,
-        ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
+        ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns and
+        ``usecols = 3`` or  ``usecols = (3,)`` will extract the third column.


Ohh! I forgot. Can you add a ..versionadded flag (grep for some other examples for how it should look). Also might as well add it to the release notes as Jaime said.

Woops, I missed this one in my squash...

BTW, what version number should I put in ?

Ah, 1.11 is the next release, so that is it.

Hmmm, maybe you can rewrite this a bit? I think the typical thing would be to leave the first part as it was and only add the new stuff after the tag (not sure exactly how it looks like out of my head):

Old documentation .. versionadded:: 1.11.0 Describe that you can also use a single integer.

I--P · 2016-01-04T13:53:57Z

Err... I have no idea what I've done wrong with this release notes commit

seberg · 2016-01-04T13:59:14Z

@I--P it is quite simple, you have to rebase against the new master. The release notes get updated 67ED quite a lot unfortunatly, so they commonly create merge conflicts. Updating your master and rebasing should solve it.

I--P · 2016-01-04T14:04:05Z

I was sure I made a rebase 5 minutes ago... Looks like I should get some sleep !

homu · 2016-01-06T23:28:41Z

☔ The latest upstream changes (presumably #6932) made this pull request unmergeable. Please resolve the merge conflicts.

I--P · 2016-01-07T10:40:24Z

It should be OK now

seberg · 2016-02-01T10:09:58Z

Sorry, I forgot about this a bit :(. Can you do a rebase again and move the stuff to the 1.12 release notes? Then we can finally put this in.

I--P · 2016-02-01T12:46:34Z

That's not the most important feature in the long PR list anyway :-)
It should be rebased to 1.12 now.

seberg · 2016-02-01T13:15:04Z

OK, thanks!

ENH: usecols now accepts an int when only one column has to be read

charris added 01 - Enhancement component: numpy.lib labels Nov 11, 2015

I--P force-pushed the loadtxt-int branch from a904805 to 42f3049 Compare November 13, 2015 08:57

seberg reviewed Jan 4, 2016
View reviewed changes

I--P force-pushed the loadtxt-int branch from f208af4 to 0489630 Compare January 4, 2016 13:36

I--P force-pushed the loadtxt-int branch 2 times, most recently from 6751893 to 041e87b Compare January 4, 2016 14:12

I--P force-pushed the loadtxt-int branch 2 times, most recently from 172ea70 to 5453b77 Compare January 7, 2016 10:38

ENH: usecols now accepts an int when only one column has to be read

849b818

I--P force-pushed the loadtxt-int branch 2 times, most recently from 69f37f1 to 849b818 Compare February 1, 2016 11:13

seberg added a commit that referenced this pull request Feb 1, 2016

Merge pull request #6656 from I--P/loadtxt-int

d1dada1

ENH: usecols now accepts an int when only one column has to be read

seberg merged commit d1dada1 into numpy:master Feb 1, 2016

homu mentioned this pull request Feb 1, 2016

ENH: Added 'doane' and 'sqrt' estimators to np.histogram in numpy.function_base #7090

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: usecols now accepts an int when only one column has to be read #6656

ENH: usecols now accepts an int when only one column has to be read #6656

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ENH: usecols now accepts an int when only one column has to be read #6656

ENH: usecols now accepts an int when only one column has to be read #6656

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!