ENH: Allow compression in NDFrame.to_csv to be a dict with optional arguments (#26023) #26024
Merged
41 commits
4e73dc4 ENH/BUG: Add arcname to to_csv for ZIP compressed csv filename (#26023) (drew-heenan)
ab7620d DOC: Updated docs for arcname in NDFrame.to_csv (#26023) (drew-heenan)
2e782f9 conform to line length limit (drew-heenan)
83e8834 Fixed test_to_csv_zip_arcname for Windows paths (drew-heenan)
d238878 Merge remote-tracking branch 'upstream/master' into issue-26023 (drew-heenan)
b41be54 to_csv compression may now be dict with possible keys 'method' and 'a… (drew-heenan)
60ea58c test_to_csv_compression_dict uses compression_only fixture (drew-heenan)
8ba9082 delegate dict handling to _get_compression_method, type annotations (drew-heenan)
0a3a9fd fix import order, None type annotations (drew-heenan)
a1cb3f7 compression args passed as kwargs, update relevant docs (drew-heenan)
af2a96c style/doc improvements, change arcname to archive_name (drew-heenan)
5853a28 Merge branch 'master' into issue-26023 (drew-heenan)
789751f Merge branch 'master' into issue-26023 (drew-heenan)
5b09e6f add to_csv example, no method test, Optional types, tweaks; update wh… (drew-heenan)
68a2b4d remove Index import type ignore (drew-heenan)
c856f50 Revert "remove Index import type ignore" (drew-heenan)
8df6c81 Merge remote-tracking branch 'upstream/master' into issue-26023 (drew-heenan, Apr 26, 2019)
40d0252 Merge branch 'master' into issue-26023 (drew-heenan)
18a735d Improve docs/examples (drew-heenan)
103c877 Merge branch 'master' into issue-26023 (drew-heenan)
b6c34bc Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
969d387 Added back missed Callable import in generic (WillAyd)
abfbc0f Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
04ae25d Address comments (WillAyd)
9c22652 Typing cleanup (WillAyd)
56a75c2 Cleaned up docstring (WillAyd)
bbfea34 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
7717f16 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
779511e blackify (WillAyd)
780eb04 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
6c4e679 Added annotations where feasible (WillAyd)
1b567c9 Black and lint (WillAyd)
9324b63 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
7cf65ee isort fixup (WillAyd)
29374f3 Docstring fixup and more annotations (WillAyd)
6701aa4 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
0f5489d lint fixup (WillAyd)
e04138e mypy fixup (WillAyd)
6f2bf00 whatsnew fixup (WillAyd)
865aa81 Annotation and doc fixups (WillAyd)
8d1deee mypy typeshed bug fix (WillAyd)
```diff
@@ -9,7 +9,19 @@
 import mmap
 import os
 import pathlib
-from typing import IO, AnyStr, BinaryIO, Optional, TextIO, Type
+from typing import (
+    IO,
+    Any,
+    AnyStr,
+    BinaryIO,
+    Dict,
+    List,
+    Optional,
+    TextIO,
+    Tuple,
+    Type,
+    Union,
+)
 from urllib.error import URLError  # noqa
 from urllib.parse import (  # noqa
     urlencode,
```
```diff
@@ -255,6 +267,40 @@ def file_path_to_url(path: str) -> str:
 _compression_to_extension = {"gzip": ".gz", "bz2": ".bz2", "zip": ".zip", "xz": ".xz"}


+def _get_compression_method(
+    compression: Optional[Union[str, Dict[str, str]]]
+) -> Tuple[Optional[str], Dict[str, str]]:
+    """
+    Simplifies a compression argument to a compression method string and
+    a dict containing additional arguments.
+
+    Parameters
+    ----------
+    compression : str or dict
+        If string, specifies the compression method. If dict, value at key
+        'method' specifies compression method.
+
+    Returns
+    -------
+    tuple of ({compression method}, Optional[str]
+              {compression arguments}, Dict[str, str])
+
+    Raises
+    ------
+    ValueError on dict missing 'method' key
+    """
+    # Handle dict
+    if isinstance(compression, dict):
+        compression_args = compression.copy()
+        try:
+            compression = compression_args.pop("method")
+        except KeyError:
+            raise ValueError("If dict, compression must have key 'method'")
+    else:
+        compression_args = {}
+    return compression, compression_args
+
+
 def _infer_compression(
     filepath_or_buffer: FilePathOrBuffer, compression: Optional[str]
 ) -> Optional[str]:
```
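To make the contract of `_get_compression_method` concrete, here is a minimal standalone sketch of the same logic. The name `split_compression` is hypothetical (it is not part of pandas); the body follows the helper shown in the diff, and the example inputs are assumptions chosen for illustration.

```python
from typing import Any, Dict, Optional, Tuple, Union


def split_compression(
    compression: Optional[Union[str, Dict[str, Any]]]
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical standalone copy of the helper's logic, for illustration."""
    if isinstance(compression, dict):
        # Work on a copy so the caller's dict is left untouched.
        args = compression.copy()
        try:
            method = args.pop("method")
        except KeyError:
            raise ValueError("If dict, compression must have key 'method'")
        return method, args
    # Strings and None pass through with no extra arguments.
    return compression, {}


spec = {"method": "zip", "archive_name": "out.csv"}
from_dict = split_compression(spec)
from_str = split_compression("gzip")
from_none = split_compression(None)
```

Because the dict is copied before `pop`, the caller can reuse `spec` for a second call without losing its `'method'` key.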
```diff
@@ -266,21 +312,20 @@ def _infer_compression(

     Parameters
     ----------
-    filepath_or_buffer :
-        a path (str) or buffer
+    filepath_or_buffer : str or file handle
+        File path or object.
     compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}
         If 'infer' and `filepath_or_buffer` is path-like, then detect
         compression from the following extensions: '.gz', '.bz2', '.zip',
         or '.xz' (otherwise no compression).

     Returns
     -------
-    string or None :
-        compression method
+    string or None

     Raises
     ------
-    ValueError on invalid compression specified
+    ValueError on invalid compression specified.
     """

     # No compression has been explicitly specified
```
```diff
@@ -312,32 +357,49 @@ def _infer_compression(


 def _get_handle(
-    path_or_buf, mode, encoding=None, compression=None, memory_map=False, is_text=True
+    path_or_buf,
+    mode: str,
+    encoding=None,
+    compression: Optional[Union[str, Dict[str, Any]]] = None,
+    memory_map: bool = False,
+    is_text: bool = True,
 ):
     """
     Get file handle for given path/buffer and mode.

     Parameters
     ----------
-    path_or_buf :
-        a path (str) or buffer
+    path_or_buf : str or file handle
+        File path or object.
     mode : str
-        mode to open path_or_buf with
+        Mode to open path_or_buf with.
     encoding : str or None
-    compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default None
-        If 'infer' and `filepath_or_buffer` is path-like, then detect
-        compression from the following extensions: '.gz', '.bz2', '.zip',
-        or '.xz' (otherwise no compression).
+        Encoding to use.
+    compression : str or dict, default None
+        If string, specifies compression mode. If dict, value at key 'method'
+        specifies compression mode. Compression mode must be one of {'infer',
+        'gzip', 'bz2', 'zip', 'xz', None}. If compression mode is 'infer'
+        and `filepath_or_buffer` is path-like, then detect compression from
+        the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise
+        no compression). If dict and compression mode is 'zip' or inferred as
+        'zip', other entries passed as additional compression options.
+
+        .. versionchanged:: 1.0.0
+
+           May now be a dict with key 'method' as compression mode
+           and other keys as compression options if compression
+           mode is 'zip'.
+
     memory_map : boolean, default False
         See parsers._parser_params for more information.
     is_text : boolean, default True
         whether file/buffer is in text format (csv, json, etc.), or in binary
-        mode (pickle, etc.)
+        mode (pickle, etc.).

     Returns
     -------
     f : file-like
-        A file-like object
+        A file-like object.
     handles : list of file-like objects
         A list of file-like object that were opened in this function.
     """
```

Review comment on `encoding=None`: Couldn't annotate this particular argument due to a minor bug in typeshed. Fixed on master so maybe something we can come back to soon (typeshed updates are pretty quick).
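The `compression` docstring above combines two steps: splitting a dict into method plus options, then resolving `'infer'` from the file extension. The sketch below is a simplified, standalone illustration of that flow, not the pandas implementation. `resolve_compression` and the sample paths are hypothetical; the extension mapping is copied from `_compression_to_extension` in the diff, and the error message text is an assumption.

```python
from typing import Any, Dict, Optional, Tuple, Union

# Same mapping as _compression_to_extension in the diff.
COMPRESSION_TO_EXTENSION = {"gzip": ".gz", "bz2": ".bz2", "zip": ".zip", "xz": ".xz"}


def resolve_compression(
    path: str, compression: Optional[Union[str, Dict[str, Any]]]
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical helper: split the argument, then infer from the extension."""
    if isinstance(compression, dict):
        args = dict(compression)
        if "method" not in args:
            raise ValueError("If dict, compression must have key 'method'")
        method = args.pop("method")
    else:
        method, args = compression, {}

    if method == "infer":
        # Fall back to no compression when no known extension matches.
        method = None
        for name, extension in COMPRESSION_TO_EXTENSION.items():
            if path.endswith(extension):
                method = name
                break
    elif method is not None and method not in COMPRESSION_TO_EXTENSION:
        raise ValueError("Unrecognized compression type: {}".format(method))
    return method, args


zip_case = resolve_compression(
    "out.zip", {"method": "infer", "archive_name": "data.csv"}
)
plain_case = resolve_compression("out.csv", "infer")
```

Under these assumptions, a dict with `'method': 'infer'` and a `.zip` path resolves to `'zip'` while keeping `archive_name` available as an extra option, which matches the "inferred as 'zip'" wording in the docstring.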
```diff
@@ -346,15 +408,16 @@ def _get_handle(

         need_text_wrapping = (BufferedIOBase, S3File)
     except ImportError:
-        need_text_wrapping = BufferedIOBase
+        need_text_wrapping = BufferedIOBase  # type: ignore

-    handles = list()
+    handles = list()  # type: List[IO]
     f = path_or_buf

     # Convert pathlib.Path/py.path.local or string
     path_or_buf = _stringify_path(path_or_buf)
     is_path = isinstance(path_or_buf, str)

+    compression, compression_args = _get_compression_method(compression)
     if is_path:
         compression = _infer_compression(path_or_buf, compression)

```
```diff
@@ -376,7 +439,7 @@ def _get_handle(

         # ZIP Compression
         elif compression == "zip":
-            zf = BytesZipFile(path_or_buf, mode)
+            zf = BytesZipFile(path_or_buf, mode, **compression_args)
             # Ensure the container is closed as well.
             handles.append(zf)
             if zf.mode == "w":
```
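Since `compression_args` is forwarded as keyword arguments, any key other than `'method'` ends up in the `BytesZipFile` constructor, which in turn passes unknown keywords on to `zipfile.ZipFile`. The following sketch uses only the standard library to show what that forwarding implies. `open_zip` is a hypothetical stand-in for the `BytesZipFile(...)` call, and `not_an_option` is a deliberately invalid key.

```python
import io
import zipfile


def open_zip(buffer, mode, **compression_args):
    # Hypothetical stand-in: forward extra options to ZipFile as keywords.
    return zipfile.ZipFile(buffer, mode, zipfile.ZIP_DEFLATED, **compression_args)


# A keyword that ZipFile understands is accepted.
accepted = open_zip(io.BytesIO(), "w", compresslevel=9)
accepted.close()

# An unknown keyword surfaces as a TypeError from the constructor.
try:
    open_zip(io.BytesIO(), "w", not_an_option=True)
except TypeError as exc:
    error_message = str(exc)
```

One consequence of this design is that misspelled option names are not silently ignored; they fail when the handle is opened.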
```diff
@@ -429,9 +492,9 @@ def _get_handle(

     if memory_map and hasattr(f, "fileno"):
         try:
-            g = MMapWrapper(f)
+            wrapped = MMapWrapper(f)
             f.close()
-            f = g
+            f = wrapped
         except Exception:
             # we catch any errors that may have occurred
             # because that is consistent with the lower-level
```
```diff
@@ -456,15 +519,19 @@ def __init__(
         self,
         file: FilePathOrBuffer,
         mode: str,
-        compression: int = zipfile.ZIP_DEFLATED,
+        archive_name: Optional[str] = None,
         **kwargs
     ):
         if mode in ["wb", "rb"]:
             mode = mode.replace("b", "")
-        super().__init__(file, mode, compression, **kwargs)
+        self.archive_name = archive_name
+        super().__init__(file, mode, zipfile.ZIP_DEFLATED, **kwargs)

     def write(self, data):
-        super().writestr(self.filename, data)
+        archive_name = self.filename
+        if self.archive_name is not None:
+            archive_name = self.archive_name
+        super().writestr(archive_name, data)

     @property
     def closed(self):
```
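The effect of `archive_name` is easiest to see by writing to an in-memory buffer and listing the archive members. This is a standalone sketch modeled on the `BytesZipFile` changes above, not the pandas class itself. `ArchiveNamedZipFile`, the member name `report.csv`, and the CSV payload are all hypothetical.

```python
import io
import zipfile
from typing import Optional


class ArchiveNamedZipFile(zipfile.ZipFile):
    """Hypothetical ZipFile wrapper with a file-like write(), as in the diff."""

    def __init__(self, file, mode: str, archive_name: Optional[str] = None, **kwargs):
        if mode in ["wb", "rb"]:
            mode = mode.replace("b", "")
        self.archive_name = archive_name
        super().__init__(file, mode, zipfile.ZIP_DEFLATED, **kwargs)

    def write(self, data):
        # Prefer the explicit member name; otherwise fall back to the file name.
        archive_name = self.filename
        if self.archive_name is not None:
            archive_name = self.archive_name
        super().writestr(archive_name, data)


buffer = io.BytesIO()
with ArchiveNamedZipFile(buffer, "wb", archive_name="report.csv") as zf:
    zf.write("a,b\n1,2\n")

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    payload = zf.read("report.csv").decode("utf-8")
```

Without `archive_name`, the member name falls back to `self.filename`, which for a path-based archive is the path of the ZIP file itself; that is the behaviour the PR makes configurable.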