8000 Infer compression from non-string paths by dhimmel · Pull Request #17206 · pandas-dev/pandas · GitHub
[go: up one dir, main page]

Skip to content

Infer compression from non-string paths #17206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 15, 2017
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Test: infer compression from pathlib.Path
  • Loading branch information
dhimmel committed Aug 15, 2017
commit ec0a292919f5ff4aabe54c3bef322c6f7caac42e
15 changes: 15 additions & 0 deletions pandas/tests/io/parser/compression.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,21 @@ def test_read_csv_infer_compression(self):

inputs[3].close()

def test_read_csv_infer_compression_pathlib(self):
"""
Test that compression is inferred from pathlib.Path paths.
"""
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test pass on master (or are you going to make changes so that it will) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of a3b9ee9, this test should fail when pathlib is available and be skipped otherwise.

Once I commit the enhancement, this test should pass when pathlib is available and be skipped otherwise.

Copy link
Contributor
@TomAugspurger TomAugspurger Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your plan for the fix? We do have the __fspath__ stuff now. It's possible we could just ensure that the path has been stringified before doing the compression inference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger see b4e280f. The fix should be simple I think, just passing filepath_or_buffer through the _stringify_path function in _infer_compression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, perfect (the fspath stuff happens inside _stringify_path). Let's hope that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, let's take advantage of pytest.importorskip functionality here instead of doing this try-except block.

import pathlib
except ImportError:
pytest.skip('need pathlib to run')
expected = self.read_csv(self.csv1, index_col=0, parse_dates=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Construct the expected DataFrame from the DataFrame constructor, not from read_csv.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to do that... I followed the method used by other tests in this module, e.g. test_read_csv_infer_compression. Is csv1 available in memory somewhere? I agree that it would be nice to reduce the runtime burden of this test / dependencies on a file... But, it's not something I should manually create, I don't think.

for extension in '', '.gz', '.bz2':
Copy link
Member
@gfyoung gfyoung Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the pytest.mark.parametrize decorator for the extension.

path = pathlib.Path(self.csv1 + extension)
df = self.read_csv(
path, index_col=0, parse_dates=True, compression='infer')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this test entirely. instead add a generalized compression test (iow for all extensions & path types) in test_common, similar to how fspath is tested (you specificy all parameters then you may skip on some combinations).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback, Just to make sure we're on the same page. All path types would be:

  • local paths via strings
  • local paths via fspath defining objects
  • URLs

All compression types would be None, xz, bz2, zip, and gzip. But in all cases, we'd be inferring compression?

Just want to make sure the scope of this test should be primarily compression='infer'? Any more guidance will be appreciated.

tm.assert_frame_equal(expected, df)

def test_invalid_compression(self):
msg = 'Unrecognized compression type: sfark'
with tm.assert_raises_regex(ValueError, msg):
Expand Down
0