Raise error in usecols when column doesn't exist but length matches by bpraggastis · Pull Request #16460 · pandas-dev/pandas · GitHub

Raise error in usecols when column doesn't exist but length matches #16460


Merged
merged 3 commits into from
Jun 4, 2017
tests added for gh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out
brendapraggastis authored and TomAugspurger committed Jun 4, 2017
commit 1968a70a4b821b0075d87c3b37e273ec876d84bf
32 changes: 29 additions & 3 deletions pandas/tests/io/parser/usecols.py
@@ -478,18 +478,44 @@ def test_uneven_length_cols(self):
        tm.assert_frame_equal(df, expected)

    def test_raise_on_usecols_names_mismatch(self):
        # see gh-14671
        ## see gh-14671
        data = 'a,b,c,d\n1,2,3,4\n5,6,7,8'
        msg = 'Usecols do not match names' ## from parsers.py CParserWrapper()
@gfyoung (Member), May 24, 2017:

Condition the message on self.engine i.e.:

msg = <first-message> if self.engine == 'c' else <second-message>

That way you don't need that massive regex (and can remove the re import)
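
The suggestion above can be sketched as follows. This is a minimal illustration rather than the code that was merged: `expected_usecols_error` is a hypothetical helper, and the two message fragments are the ones quoted in the diff. The sketch also shows why the combined pattern in the diff is weaker than it looks: the `'||'` in the middle leaves an empty alternative, so the compiled regex matches any error text.

```python
import re

# Message fragments quoted in the diff under review.
C_ENGINE_MSG = 'Usecols do not match names'
PYTHON_ENGINE_MSG = 'is not in list'


def expected_usecols_error(engine):
    """Hypothetical helper: pick the single message the given engine emits."""
    return C_ENGINE_MSG if engine == 'c' else PYTHON_ENGINE_MSG


# The combined pattern built in the diff. The empty alternative between the
# two '|' characters matches the empty string, so search() always succeeds.
combined = re.compile("'" + C_ENGINE_MSG + '||' + PYTHON_ENGINE_MSG + "'")
matches_unrelated = combined.search('some unrelated error') is not None

# A per-engine pattern only matches the message it names.
c_pattern = re.compile(expected_usecols_error('c'))
c_matches_unrelated = c_pattern.search('some unrelated error') is not None
c_matches_real = c_pattern.search('Usecols do not match names') is not None
```

With a per-engine message, the test can pass the plain string as the expected pattern, and the `re` import is no longer needed in the test module.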

Contributor:

do this one as well

        msg2 = 'is not in list' ## from parser.py _handle_usecols()

        usecols = ['a','b','c','d']
        df = self.read_csv(StringIO(data), usecols=usecols)
        expected = DataFrame({'a': [1,5], 'b': [2,6], 'c': [3,7], 'd': [4,8]})
        tm.assert_frame_equal(df, expected)

        msg = 'Usecols do not match names' ## from parsers.py CParserWrapper()
        msg2 = 'is not in list' ## from parser.py _handle_usecols()
        usecols = ['a','b','c','f']
        with tm.assert_raises_regex(ValueError, re.compile("'" + msg + '||' + msg2 + "'")):
            self.read_csv(StringIO(data), usecols=usecols)

        usecols = ['a','b','f']
        with tm.assert_raises_regex(ValueError, re.compile("'" + msg + '||' + msg2 + "'")):
            self.read_csv(StringIO(data), usecols=usecols)

        names = ['A', 'B', 'C', 'D']

        df = self.read_csv(StringIO(data), header=0, names=names)
        expected = DataFrame({'A': [1,5], 'B': [2,6], 'C': [3,7], 'D': [4,8]})
        tm.assert_frame_equal(df, expected)

        # usecols = ['A','C']
Contributor:

commented out?

Member:

Yes, those failures are related to #16469. Should put a TODO there I think.

Contributor:

@bpraggastis I think add a TODO here with the issue above.
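
One way to record the TODO requested above without leaving the test body commented out is to keep it as an explicitly skipped test. The sketch below is only an illustration using the standard-library `unittest` module; the class name, test name, and the use of a skip marker are assumptions and are not what the PR does.

```python
import unittest


class UsecolsNamesSketch(unittest.TestCase):
    # TODO: enable once the usecols/names interaction tracked in
    # gh-16469 is resolved.
    @unittest.skip('pending gh-16469')
    def test_names_with_usecols(self):
        # Placeholder body; never executed while the test is skipped.
        self.fail('not reached while the test is skipped')


# Run the sketch and count how many tests were reported as skipped.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UsecolsNamesSketch)
result = unittest.TestResult()
suite.run(result)
skipped_count = len(result.skipped)
```

A skipped test shows up in the test report with its reason, so the link to the open issue stays visible until the behavior is decided.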

        # df = self.read_csv(StringIO(data), header=0, names=names, usecols=usecols)
        # expected = DataFrame({'A': [1,5], 'C': [3,7]})
        # tm.assert_frame_equal(df, expected)
        #
        # usecols = [0,2]
        # df = self.read_csv(StringIO(data), header=0, names=names, usecols=usecols)
        # expected = DataFrame({'A': [1,5], 'C': [3,7]})
        # tm.assert_frame_equal(df, expected)


        usecols = ['A','B','C','f']
        with tm.assert_raises_regex(ValueError, re.compile("'" + msg + '||' + msg2 + "'")):
            self.read_csv(StringIO(data), header=0, names=names, usecols=usecols)
        usecols = ['A','B','f']
        with tm.assert_raises_regex(ValueError, re.compile("'" + msg + '||' + msg2 + "'")):
            self.read_csv(StringIO(data), names=names, usecols=usecols)