-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: usecols now accepts an int when only one column has to be read #6656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm a little wary of the check for non-ints, as it may break people's code and would probably require a deprecation cycle. Other than that, you should add this enhancement to the release notes. |
What Jaime said. As a solution for the conversion, I think you could try:
|
I don't get the problem with the check for non ints, with the current code it raises a TypeError: from io import StringIO # StringIO behaves like a file object Output: The only change is that the error message is a bit more userfriendly imho. Admittedly I should raise TypeError instead of ValueError to avoid breaking existing code though. Just tell me if raising TypeError would be enough or if you really want this to go away. |
There are more int-like classes possible then you can easily check for. One "hope for the best" solution would be to check for You can do the try/except this way around (try list first instead of index first), it does not matter. But the integer check should be done with |
Hmmm, seriously wondering here though. Maybe TypeError is better than ValueError? |
TypeError would not break people's code if they are catching the old exception, so I think I should use it instead of ValueError. |
Is it OK like that ? I don't know if it's a good practice in numpy to delete the original exception message to replace it with my own message or if I should append it. |
Frankly, I got no idea if replacing Exception args is good or not :). If it is common practice in python, sounds fine to me (I do guess the traceback lines are nicer). |
@@ -633,6 +633,15 @@ def test_usecols(self): | |||
x = np.loadtxt(c, dtype=float, usecols=np.array([1, 2])) | |||
assert_array_equal(x, a[:, 1:]) | |||
|
|||
# Testing with an integer instead of a sequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to test with something really crazy, could test with:
class CrazyInt(object):
def __index__(self):
return 0
but I am OK with leaving it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right it's better to add a test with a custom integer type. If I had done that from the beginning I would have seen the problem with my previous way to check integers.
As a general rule we don't consider changing the text of an error message to be a compatibility breaking change, i.e. feel free to change them if it makes things better. |
) | ||
raise | ||
# Fall back to existing code | ||
usecols = usecols_as_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think there is no reason to create usecols_as_list at all. Also the comment is rather unnecessary "existing code" is not existing code anymore as soon as we put this in ;).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By existing code I meant all the lines below this one where usecols is used.
You're right I can use usecols and avoid the creation of usecols_as_list. I wished to make this easier to read, I hate code like foo=some_fancy_type(foo) buried deep in some code and then waste time trying to understand why foo is not any more what it was at the beginning.
@njsmith that is not what was meant. The question is, is it safe/good practice to do
|
Anyway, should be pretty much ready. Before the end, please squash the commits into one or two only. |
Seems like while in theory someone could break the |
Which columns to read, with 0 being the first. For example, | ||
``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns. | ||
``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns and | ||
``usecols = 3`` or ``usecols = (3,)`` will extract the third column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh! I forgot. Can you add a ..versionadded flag (grep for some other examples for how it should look). Also might as well add it to the release notes as Jaime said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, I missed this one in my squash...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, what version number should I put in ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, 1.11 is the next release, so that is it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, maybe you can rewrite this a bit? I think the typical thing would be to leave the first part as it was and only add the new stuff after the tag (not sure exactly how it looks like out of my head):
Old documentation
.. versionadded:: 1.11.0
Describe that you can also use a single integer.
Err... I have no idea what I've done wrong with this release notes commit |
@I--P it is quite simple, you have to rebase against the new master. The release notes get updated quite a lot unfortunatly, so they commonly create merge conflicts. Updating your master and rebasing should solve it. |
I was sure I made a rebase 5 minutes ago... Looks like I should get some sleep ! |
6751893
to
041e87b
Compare
☔ The latest upstream changes (presumably #6932) made this pull request unmergeable. Please resolve the merge conflicts. |
172ea70
to
5453b77
Compare
It should be OK now |
Sorry, I forgot about this a bit :(. Can you do a rebase again and move the stuff to the 1.12 release notes? Then we can finally put this in. |
69f37f1
to
849b818
Compare
That's not the most important feature in the long PR list anyway :-) |
OK, thanks! |
ENH: usecols now accepts an int when only one column has to be read
As explained on the Numpy-discussion mailing list I've seen many students, coming from Matlab, struggling against the usecols argument of loadtxt when they had to read only one column in a file.
This simple PR aims at making loadtxt more user friendly for newcomers, usecol behavior has been modified to accept either an int or a sequence when a single column has to be read.