-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
ENH: Allow compression in NDFrame.to_csv to be a dict with optional arguments (#26023) #26024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
4e73dc4
ab7620d
2e782f9
83e8834
d238878
b41be54
60ea58c
8ba9082
0a3a9fd
a1cb3f7
af2a96c
5853a28
789751f
5b09e6f
68a2b4d
c856f50
8df6c81
40d0252
18a735d
103c877
b6c34bc
969d387
abfbc0f
04ae25d
9c22652
56a75c2
bbfea34
7717f16
779511e
780eb04
6c4e679
1b567c9
9324b63
7cf65ee
29374f3
6701aa4
0f5489d
e04138e
6f2bf00
865aa81
8d1deee
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,7 +9,7 @@ | |
import lzma | ||
import mmap | ||
import os | ||
from typing import Dict, Union | ||
from typing import Dict, Tuple, Union | ||
from urllib.error import URLError # noqa | ||
from urllib.parse import ( # noqa | ||
urlencode, urljoin, urlparse as parse_url, uses_netloc, uses_params, | ||
|
@@ -269,40 +269,27 @@ def _get_compression_method(compression: Union[str, Dict, None]): | |
|
||
def _infer_compression(filepath_or_buffer, compression): | ||
""" | ||
Get the compression method for filepath_or_buffer. If compression mode is | ||
'infer', the inferred compression method is returned. Otherwise, the input | ||
Get the compression method for filepath_or_buffer. If compression='infer', | ||
the inferred compression method is returned. Otherwise, the input | ||
compression method is returned unchanged, unless it's invalid, in which | ||
case an error is raised. | ||
|
||
gfyoung marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Parameters | ||
---------- | ||
filepath_or_buffer : | ||
a path (str) or buffer | ||
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None} or dict | ||
If string, specifies compression mode. If dict, value at key 'method' | ||
specifies compression mode. If compression mode is 'infer' and | ||
`filepath_or_buffer` is path-like, then detect compression from the | ||
following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no | ||
compression). | ||
|
||
.. versionchanged 0.25.0 | ||
|
||
May now be a dict with required key 'method' specifying compression | ||
mode | ||
|
||
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None} | ||
If 'infer' and `filepath_or_buffer` is path-like, then detect | ||
compression from the following extensions: '.gz', '.bz2', '.zip', | ||
or '.xz' (otherwise no compression). | ||
Returns | ||
------- | ||
string or None : | ||
compression method | ||
|
||
Raises | ||
------ | ||
ValueError on invalid compression specified | ||
""" | ||
|
||
# Handle compression as dict | ||
compression, _ = _get_compression_method(compression) | ||
|
||
# No compression has been explicitly specified | ||
if compression is None: | ||
return None | ||
|
@@ -357,7 +344,8 @@ def _get_handle(path_or_buf, mode, encoding=None, | |
.. versionchanged:: 0.25.0 | ||
|
||
May now be a dict with key 'method' as compression mode | ||
and 'arcname' as CSV file name if mode is 'zip' | ||
and other keys as kwargs for ByteZipFile if compression | ||
mode is 'zip'. | ||
|
||
memory_map : boolean, default False | ||
See parsers._parser_params for more information. | ||
|
@@ -374,7 +362,7 @@ def _get_handle(path_or_buf, mode, encoding=None, | |
""" | ||
try: | ||
from s3fs import S3File | ||
need_text_wrapping = (BytesIO, S3File) | ||
need_text_wrapping = (BytesIO, S3File) # type: Tuple | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was this throwing a Typing error? Think MyPy should be able to infer here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @WillAyd Yeah, it was. Couldn’t tell why, but MyPy couldn’t infer when I added types to the function definition. It also occurred on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What error was it giving you? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was giving |
||
except ImportError: | ||
need_text_wrapping = (BytesIO,) | ||
|
||
|
@@ -407,10 +395,7 @@ def _get_handle(path_or_buf, mode, encoding=None, | |
|
||
# ZIP Compression | ||
elif compression == 'zip': | ||
arcname = None | ||
if 'arcname' in compression_args: | ||
arcname = compression_args['arcname'] | ||
zf = BytesZipFile(path_or_buf, mode, arcname=arcname) | ||
zf = BytesZipFile(path_or_buf, mode, **compression_args) | ||
# Ensure the container is closed as well. | ||
handles.append(zf) | ||
if zf.mode == 'w': | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -122,7 +122,7 @@ def __init__(self, obj, path_or_buf=None, sep=",", na_rep='', | |
self.data_index = obj.index | ||
if (isinstance(self.data_index, (ABCDatetimeIndex, ABCPeriodIndex)) and | ||
date_format is not None): | ||
from pandas import Index | ||
from pandas import Index # type: ignore | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What error was this giving you? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. MyPy was giving There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make sure you try again on master - I think a separate PR should have resolved this already There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The same error appears on master if any type annotations are added. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @drew-heenan can you remove this and push as a new commit? Again thought we resolved this in a separate PR so would like to validate its not a mypy versioning thing between your local environment and what we have on CI There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @WillAyd Just did that - the error still appears. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm OK thanks for confirming. @ryankarlos not sure if you have any insight - thought this would be resolved by #26019 @drew-heenan this isn't a blocker so OK to add back in the ignore I think; can review separate from this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @WillAyd Got it, thanks for checking! |
||
self.data_index = Index([x.strftime(date_format) if notna(x) else | ||
'' for x in self.data_index]) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
says Series.to_csv and DataFrame.to_csv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback To clarify, do you mean that it should say
Series.to_csv
andDataFrame.to_csv
instead of justNDFrame.to_csv
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what @jreback is trying to say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - I've changed the doc to list both.