8000 ENH: usecols now accepts an int when only one column has to be read by I--P · Pull Request #6656 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: usecols now accepts an int when only one column has to be read #6656

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 1, 2016

Conversation

I--P
Copy link
Contributor
@I--P I--P commented Nov 9, 2015

As explained on the Numpy-discussion mailing list I've seen many students, coming from Matlab, struggling against the usecols argument of loadtxt when they had to read only one column in a file.

This simple PR aims at making loadtxt more user friendly for newcomers, usecol behavior has been modified to accept either an int or a sequence when a single column has to be read.

@jaimefrio
Copy link
Member

I'm a little wary of the check for non-ints, as it may break people's code and would probably require a deprecation cycle.

Other than that, you should add this enhancement to the release notes.

@seberg
Copy link
Member
seberg commented Jan 2, 2016

What Jaime said. As a solution for the conversion, I think you could try:

try:
    usecols = [operator.index(usecols)]
except TypeError:
    usecols = list(usecols)

@I--P
Copy link
Contributor Author
I--P commented Jan 4, 2016

I don't get the problem with the check for non ints, with the current code it raises a TypeError:

from io import StringIO # StringIO behaves like a file object
c = StringIO("0 1\n2 3")
np.loadtxt(c, usecols=[1.5])

Output:
Traceback (most recent call last):
File "test.py", line 6, in
np.loadtxt(c, usecols=[1.5])
File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 846, in loadtxt
vals = [vals[i] for i in usecols]
File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 846, in
vals = [vals[i] for i in usecols]
TypeError: list indices must be integers, not float

The only change is that the error message is a bit more userfriendly imho. Admittedly I should raise TypeError instead of ValueError to avoid breaking existing code though.

Just tell me if raising TypeError would be enough or if you really want this to go away.

@seberg
Copy link
Member
seberg commented Jan 4, 2016

There are more int-like classes possible then you can easily check for. One "hope for the best" solution would be to check for numbers.integral. But in the end trying operator.index (basically the duck-type version of it) is the only solution that is guaranteed to not create any regression.

You can do the try/except this way around (try list first instead of index first), it does not matter. But the integer check should be done with operator.index.
Your version probably improves the error message considerably, so 8000 I am fine with adding the explicit check/conversion (could also convert the list into a list filled with operator.index and replace the error message if it fails).

@seberg
Copy link
Member
seberg commented Jan 4, 2016

Hmmm, seriously wondering here though. Maybe TypeError is better than ValueError?

@I--P
Copy link
Contributor Author
I--P commented Jan 4, 2016

TypeError would not break people's code if they are catching the old exception, so I think I should use it instead of ValueError.

@I--P
Copy link
Contributor Author
I--P commented Jan 4, 2016

Is it OK like that ? I don't know if it's a good practice in numpy to delete the original exception message to replace it with my own message or if I should append it.

@seberg
Copy link
Member
seberg commented Jan 4, 2016

Frankly, I got no idea if replacing Exception args is good or not :). If it is common practice in python, sounds fine to me (I do guess the traceback lines are nicer).

@@ -633,6 +633,15 @@ def test_usecols(self):
x = np.loadtxt(c, dtype=float, usecols=np.array([1, 2]))
assert_array_equal(x, a[:, 1:])

# Testing with an integer instead of a sequence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to test with something really crazy, could test with:

class CrazyInt(object):
     def __index__(self):
         return 0

but I am OK with leaving it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right it's better to add a test with a custom integer type. If I had done that from the beginning I would have seen the problem with my previous way to check integers.

@njsmith
Copy link
Member
njsmith commented Jan 4, 2016

As a general rule we don't consider changing the text of an error message to be a compatibility breaking change, i.e. feel free to change them if it makes things better.

)
raise
# Fall back to existing code
usecols = usecols_as_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think there is no reason to create usecols_as_list at all. Also the comment is rather unnecessary "existing code" is not existing code anymore as soon as we put this in ;).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By existing code I meant all the lines below this one where usecols is used.
You're right I can use usecols and avoid the creation of usecols_as_list. I wished to make this easier to read, I hate code like foo=some_fancy_type(foo) buried deep in some code and then waste time trying to understand why foo is not any more what it was at the beginning.

@seberg
Copy link
Member
seberg commented Jan 4, 2016

@njsmith that is not what was meant. The question is, is it safe/good practice to do

try:
   ...
except Exception as e:
    e.args = ("new message",)
    raise

@seberg
Copy link
Member
seberg commented Jan 4, 2016

Anyway, should be pretty much ready. Before the end, please squash the commits into one or two only.

@seberg
Copy link
Member
seberg commented Jan 4, 2016

Seems like while in theory someone could break the .args modifying code (though I am not even sure you can in the case of operator.index), in practice it seems considered OK to do so.

Which columns to read, with 0 being the first. For example,
``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns and
``usecols = 3`` or ``usecols = (3,)`` will extract the third column.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh! I forgot. Can you add a ..versionadded flag (grep for some other examples for how it should look). Also might as well add it to the release notes as Jaime said.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, I missed this one in my squash...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, what version number should I put in ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, 1.11 is the next release, so that is it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, maybe you can rewrite this a bit? I think the typical thing would be to leave the first part as it was and only add the new stuff after the tag (not sure exactly how it looks like out of my head):

Old documentation

.. versionadded:: 1.11.0

Describe that you can also use a single integer.

@I--P
Copy link
Contributor Author
I--P commented Jan 4, 2016

Err... I have no idea what I've done wrong with this release notes commit

@seberg
Copy link
Member
seberg commented Jan 4, 2016

@I--P it is quite simple, you have to rebase against the new master. The release notes get updated quite a lot unfortunatly, so they commonly create merge conflicts. Updating your master and rebasing should solve it.

@I--P
Copy link
Contributor Author
I--P commented Jan 4, 2016

I was sure I made a rebase 5 minutes ago... Looks like I should get some sleep !

@I--P I--P force-pushed the loadtxt-int branch 2 times, most recently from 6751893 to 041e87b Compare January 4, 2016 14:12
@homu
Copy link
Contributor
homu commented Jan 6, 2016

☔ The latest upstream changes (presumably #6932) made this pull request unmergeable. Please resolve the merge conflicts.

@I--P I--P force-pushed the loadtxt-int branch 2 times, most recently from 172ea70 to 5453b77 Compare January 7, 2016 10:38
@I--P
Copy link
Contributor Author
I--P commented Jan 7, 2016

It should be OK now

@seberg
Copy link
Member
seberg commented Feb 1, 2016

Sorry, I forgot about this a bit :(. Can you do a rebase again and move the stuff to the 1.12 release notes? Then we can finally put this in.

@I--P I--P force-pushed the loadtxt-int branch 2 times, most recently from 69f37f1 to 849b818 Compare February 1, 2016 11:13
@I--P
Copy link
Contributor Author
I--P commented Feb 1, 2016

That's not the most important feature in the long PR list anyway :-)
It should be rebased to 1.12 now.

@seberg
Copy link
Member
seberg commented Feb 1, 2016

OK, thanks!

seberg added a commit that referenced this pull request Feb 1, 2016
ENH: usecols now accepts an int when only one column has to be read
@seberg seberg merged commit d1dada1 into numpy:master Feb 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0