gh-132983: Add compression.zstd and Python tests #133365

Merged on May 6, 2025 (56 commits)
Showing changes from 1 commit
Commits
f4cd026
Add Python files
emmatyping Apr 30, 2025
4168895
Fix byteswarning in test
emmatyping Apr 30, 2025
a22fa9b
Remove shape tests
emmatyping May 3, 2025
e70e03b
Make namedtuples dataclasses
emmatyping May 4, 2025
cbf0ef8
Apply suggestions from AA-Turner
emmatyping May 4, 2025
298b369
Clean up chunk calculations in train_dict
emmatyping May 4, 2025
a22be68
Fix _CLValues instantiation
emmatyping May 4, 2025
b30ed02
More cleanup of train_/finalize_dict
emmatyping May 4, 2025
307a894
Have train_/finalize_dict take tuple not list
emmatyping May 4, 2025
dd716a4
Ensure trailing data raises errors
emmatyping May 4, 2025
9b4765b
Remove paramter bounds caching and unsupported...
emmatyping May 4, 2025
214cd60
Use kwargs for code clarity
emmatyping May 4, 2025
e1f53b1
Clean up imports in zstd tests
emmatyping May 4, 2025
4c00026
Use _1K instead of 1024 in tests
emmatyping May 4, 2025
99653d2
Move compression.zstd.zstdfile to compression.zstd._zstdfile
emmatyping May 4, 2025
e403a25
Change compressLevel_values to COMPRESSION_LEVEL_DEFAULT
emmatyping May 4, 2025
63625bc
Fix tests for change in error message
emmatyping May 4, 2025
1ea4b9a
Make parameter names snake case
emmatyping May 4, 2025
ad05da8
Replace compressionLevel_values re-export with COMPRESSION_LEVEL_DEFAULT
emmatyping May 4, 2025
e82e23d
Move zstd_support_multithread to tests and rename
emmatyping May 4, 2025
7801b6b
Update module docstring for compression.zstd
emmatyping May 4, 2025
df5d827
Clarify Strategy stability in docstring
emmatyping May 4, 2025
4ff48da
Fix formatting in tarfile
emmatyping May 4, 2025
c68a896
Remove zstd_support_multithread from __all__
emmatyping May 4, 2025
326400d
Add test_name from upstream
emmatyping May 4, 2025
2c0c9a1
Don't close tarfile if there is a BaseException
emmatyping May 4, 2025
49f3821
Use options kwarg in tests
emmatyping May 4, 2025
8ba6bda
Use options kwarg in tests in more places
emmatyping May 4, 2025
129d5e6
Adopt suggestions by Tomas R. for _zstdfile
emmatyping May 4, 2025
7d54d35
Formatting fixes in zstd tests
emmatyping May 4, 2025
03795ec
Improve docstrings for (de)compress
emmatyping May 4, 2025
01fcfcb
Fix some line length issues
emmatyping May 4, 2025
f04494c
Improve docstring on C/DParameter.bounds()
emmatyping May 4, 2025
caa40b1
Improve docstrings and formatting
emmatyping May 4, 2025
4584ec5
Add missing f string prefix
emmatyping May 4, 2025
3cafdc6
Fix weird indent in _zstdfile.py
emmatyping May 4, 2025
8cb0846
Use io.open instead of builtins.open
emmatyping May 5, 2025
c7d5d67
Remove _READER_CLASS from ZstdFile
emmatyping May 5, 2025
a56a22e
Adopt many suggestions from AA-Turner for ZstdFile
emmatyping May 5, 2025
7e919c8
Set self._buffer to None
emmatyping May 5, 2025
389faed
Move _nbytes to _zstdfile.py
emmatyping May 5, 2025
006ef2e
Move test_zstd to file
emmatyping May 5, 2025
c846b78
Rename C/DParameter to (De)CompressionParameter
emmatyping May 5, 2025
fa0cb0c
regen clinic
AA-Turner May 5, 2025
74e4d2b
Fix whitespace issue
emmatyping May 5, 2025
03fff3d
Remove makefile test dir
emmatyping May 5, 2025
a99c5dd
swap order of parameters in _get_param_bounds
AA-Turner May 5, 2025
a12a031
Merge branch 'main' into 3.14-zstd-python-code
emmatyping May 5, 2025
bf94aad
Sort imports
AA-Turner May 5, 2025
5b45ec7
Improve docstrings
AA-Turner May 5, 2025
b0eca5a
Remove comments
AA-Turner May 5, 2025
c0d0e10
Remove unused private variables
AA-Turner May 5, 2025
10f0cff
Misc changes (positional-only, style, error messages)
AA-Turner May 5, 2025
7f8c350
whitespace
AA-Turner May 5, 2025
bf4b07d
Remove _set_parameter_types
AA-Turner May 5, 2025
eaf46a8
Revert "Remove _set_parameter_types"
emmatyping May 6, 2025
Adopt many suggestions from AA-Turner for ZstdFile
* rename filename argument to file
* improve __init__ mode and argument checking
* docstring and error rewording
* renamed self._closefp to self._close_fp
* removed mode_code from __init__
* removed unneeded self._READER_CLASS

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
emmatyping and AA-Turner committed May 5, 2025
commit a56a22e1444b375f4c985bcb25a5a9b5fe1958d6
93 changes: 42 additions & 51 deletions Lib/compression/zstd/_zstdfile.py
@@ -31,7 +31,8 @@ class ZstdFile(_streams.BaseStream):

     def __init__(
         self,
-        filename,
+        file,
+        /,
         mode="r",
         *,
         level=None,
@@ -40,7 +41,7 @@ def __init__(
     ):
         """Open a zstd compressed file in binary mode.

-        filename can be either an actual file name (given as a str, bytes, or
+        file can be either an actual file name (given as a str, bytes, or
         PathLike object), in which case the named file is opened, or it can be
         an existing file object to read from or write to.

@@ -58,29 +59,23 @@ def __init__(
         See the function train_dict for how to train a ZstdDict on sample data.
         """
         self._fp = None
-        self._closefp = False
+        self._close_fp = False
         self._mode = _MODE_CLOSED

+        if not isinstance(mode, str):
+            raise ValueError("mode must be a str")
         # Read or write mode
-        if mode in ("r", "rb"):
-            if not isinstance(options, (type(None), dict)):
-                raise TypeError(
-                    (
-                        "In read mode (decompression), options argument "
-                        "should be a dict object, that represents "
-                        "decompression options."
-                    )
-                )
+        if options is not None and not isinstance(options, dict):
+            raise TypeError("options must be a dict or None")
+        mode = mode.removesuffix("b")  # handle rb, wb, xb, ab
+        if mode == "r":
             if level is not None:
-                raise TypeError("level argument should only be passed when "
-                                "writing.")
-            mode_code = _MODE_READ
-        elif mode in ("w", "wb", "a", "ab", "x", "xb"):
-            if not isinstance(level, (type(None), int)):
-                raise TypeError("level argument should be an int object.")
-            if not isinstance(options, (type(None), dict)):
-                raise TypeError("options argument should be an dict object.")
-            mode_code = _MODE_WRITE
+                raise TypeError("level is illegal in read mode")
+            self._mode = _MODE_READ
+        elif mode in {"w", "a", "x"}:
+            if level is not None and not isinstance(level, int):
+                raise TypeError("level must be int or None")
+            self._mode = _MODE_WRITE
             self._compressor = ZstdCompressor(
                 level=level, options=options, zstd_dict=zstd_dict
             )
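To make the new mode handling easier to follow, here is a minimal standalone sketch of the checks introduced in this hunk. The function name `normalize_mode` is hypothetical and not part of the PR; it only mirrors the validation order shown in the diff (mode type, options type, suffix stripping, then the level checks) and does not construct a compressor.

```python
def normalize_mode(mode, level=None, options=None):
    """Hypothetical helper mirroring the argument checks in the hunk above."""
    if not isinstance(mode, str):
        raise ValueError("mode must be a str")
    if options is not None and not isinstance(options, dict):
        raise TypeError("options must be a dict or None")
    # Strip one trailing "b" so "rb"/"wb"/"xb"/"ab" collapse to "r"/"w"/"x"/"a".
    mode = mode.removesuffix("b")
    if mode == "r":
        if level is not None:
            raise TypeError("level is illegal in read mode")
    elif mode in {"w", "a", "x"}:
        if level is not None and not isinstance(level, int):
            raise TypeError("level must be int or None")
    else:
        raise ValueError(f"Invalid mode: {mode!r}")
    return mode


print(normalize_mode("rb"))            # r
print(normalize_mode("w", level=3))    # w
```

Because the suffix is stripped once up front, the later branches only need to compare against the single-letter modes, which is what lets the diff drop the `("w", "wb", "a", "ab", "x", "xb")` tuple.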
@@ -89,17 +84,15 @@ def __init__(
             raise ValueError(f"Invalid mode: {mode!r}")

         # File object
-        if isinstance(filename, (str, bytes, PathLike)):
-            if "b" not in mode:
-                mode += "b"
-            self._fp = io.open(filename, mode)
-            self._closefp = True
-        elif hasattr(filename, "read") or hasattr(filename, "write"):
-            self._fp = filename
+        if isinstance(file, (str, bytes, PathLike)):
+            self._fp = io.open(file, f'{mode}b')
+            self._close_fp = True
+        elif ((mode == 'r' and hasattr(file, "read"))
+              or (mode != 'r' and hasattr(file, "write"))):
+            self._fp = file
         else:
-            raise TypeError("filename must be a str, bytes, file or PathLike "
-                            "object")
-        self._mode = mode_code
+            raise TypeError("file must be a file-like object "
+                            "or a str, bytes, or PathLike object")

         if self._mode == _MODE_READ:
             raw = _streams.DecompressReader(
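The stricter file-object check in this hunk can be illustrated in isolation. The sketch below assumes a hypothetical helper `classify_file` and a hypothetical `ReadOnly` class (neither is in the PR); it reproduces only the branch conditions from the diff, so that an object with `read` but no `write` is rejected in write mode instead of failing later.

```python
import io
from os import PathLike


def classify_file(file, mode):
    """Hypothetical helper: report which branch of the new check would be taken."""
    if isinstance(file, (str, bytes, PathLike)):
        return "path"
    if ((mode == "r" and hasattr(file, "read"))
            or (mode != "r" and hasattr(file, "write"))):
        return "object"
    raise TypeError("file must be a file-like object "
                    "or a str, bytes, or PathLike object")


class ReadOnly:
    """Illustrative object that supports reading only."""
    def read(self, size):
        return b"a" * size


print(classify_file("data.zst", "w"))     # path
print(classify_file(io.BytesIO(), "w"))   # object
print(classify_file(ReadOnly(), "r"))     # object
try:
    classify_file(ReadOnly(), "w")
except TypeError as exc:
    print("rejected:", exc)
```

This is the behaviour the updated test in `test_core.py` relies on: the error for a read-only object opened with `'w'` now surfaces at construction time as a `TypeError`.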
@@ -114,15 +107,14 @@ def __init__(
     def close(self):
         """Flush and close the file.

-        May be called more than once without error. Once the file is
-        closed, any other operation on it will raise a ValueError.
+        May be called multiple times. Once the file has been closed,
+        any other operation on it will raise ValueError.
         """
-        # Nop if already closed
         if self._fp is None:
             return
         try:
             if self._mode == _MODE_READ:
-                if hasattr(self, "_buffer") and self._buffer:
+                if getattr(self, '_buffer', None):
                     self._buffer.close()
                     self._buffer = None
             elif self._mode == _MODE_WRITE:
@@ -131,11 +123,11 @@ def close(self):
         finally:
             self._mode = _MODE_CLOSED
             try:
-                if self._closefp:
+                if self._close_fp:
                     self._fp.close()
             finally:
                 self._fp = None
-                self._closefp = False
+                self._close_fp = False
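The nested `try`/`finally` in `close()` is what keeps the method idempotent and guarantees the underlying file is released even if the final flush fails. A minimal sketch of that ordering, assuming a stand-in class (`Resource` and `_finish` are hypothetical names, not the real `ZstdFile` internals):

```python
import io


class Resource:
    """Stand-in wrapper showing the cleanup ordering only."""

    def __init__(self, fp, close_fp):
        self._fp = fp
        self._close_fp = close_fp  # True only when the wrapper opened fp itself

    def _finish(self):
        # Placeholder for flushing the last frame or closing a read buffer.
        pass

    def close(self):
        if self._fp is None:  # already closed: nothing to do
            return
        try:
            self._finish()
        finally:
            try:
                if self._close_fp:
                    self._fp.close()
            finally:
                # Always drop the references, even if fp.close() raised.
                self._fp = None
                self._close_fp = False


raw = io.BytesIO()
wrapper = Resource(raw, close_fp=True)
wrapper.close()
wrapper.close()  # second call returns immediately
print(raw.closed)
```

When `close_fp` is false (the caller passed in an existing file object), the wrapper leaves that object open, which matches the distinction the diff keeps between the path branch and the file-object branch.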

def write(self, data):
"""Write a bytes-like object *data* to the file.
@@ -161,9 +153,8 @@ def write(self, data):
     def flush(self, mode=FLUSH_BLOCK):
         """Flush remaining data to the underlying stream.

-        The mode argument can be ZstdFile.FLUSH_BLOCK or ZstdFile.FLUSH_FRAME.
-        Abuse of this method will reduce compression ratio, use it only when
-        necessary.
+        The mode argument can be FLUSH_BLOCK or FLUSH_FRAME. Abuse of this
+        method will reduce compression ratio, use it only when necessary.

         If the program is interrupted afterwards, all data can be recovered.
         To ensure saving to disk, also need to use os.fsync(fd).
@@ -173,10 +164,10 @@ def flush(self, mode=FLUSH_BLOCK):
         if self._mode == _MODE_READ:
             return
         self._check_not_closed()
-        if mode not in (self.FLUSH_BLOCK, self.FLUSH_FRAME):
-            raise ValueError("mode argument wrong value, it should be "
-                             "ZstdCompressor.FLUSH_FRAME or "
-                             "ZstdCompressor.FLUSH_BLOCK.")
+        if mode not in {self.FLUSH_BLOCK, self.FLUSH_FRAME}:
+            raise ValueError("Invalid mode argument, expected either "
+                             "ZstdFile.FLUSH_FRAME or "
+                             "ZstdFile.FLUSH_BLOCK")
         if self._compressor.last_mode == mode:
             return
         # Flush zstd block/frame, and write.
@@ -270,8 +261,7 @@ def peek(self, size=-1):
         return self._buffer.peek(size)

     def __next__(self):
-        ret = self._buffer.readline()
-        if ret:
+        if ret := self._buffer.readline():
             return ret
         raise StopIteration
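For readers unfamiliar with the assignment expression used in the new `__next__`, the following sketch shows the same pattern on a plain in-memory stream. `LineIter` is a hypothetical stand-in, not the real `ZstdFile`; it assumes only that the wrapped buffer has a `readline()` that returns an empty bytes object at end of stream.

```python
import io


class LineIter:
    """Stand-in iterator over a binary stream, using the same __next__ shape."""

    def __init__(self, buffer):
        self._buffer = buffer

    def __iter__(self):
        return self

    def __next__(self):
        # readline() returns b"" at EOF, which is falsy and ends iteration.
        if ret := self._buffer.readline():
            return ret
        raise StopIteration


lines = list(LineIter(io.BytesIO(b"first\nsecond\n")))
print(lines)  # [b'first\n', b'second\n']
```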

@@ -319,7 +309,8 @@ def writable(self):

 # Copied from lzma module
 def open(
-    filename,
+    file,
+    /,
     mode="rb",
     *,
     level=None,
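The added `/` makes the first parameter positional-only, so the rename from `filename` to `file` cannot break callers that would otherwise have passed it by keyword. A small sketch of the resulting call rules, using a hypothetical `open_sketch` function that copies only the parameter layout visible in this hunk (the real function takes further keyword arguments not shown here):

```python
def open_sketch(file, /, mode="rb", *, level=None):
    """Hypothetical stand-in with the same parameter layout as the new signature."""
    return file, mode, level


# The first argument must be positional; level must be passed by keyword.
print(open_sketch("data.zst", "wb", level=3))

try:
    open_sketch(file="data.zst")  # keyword use of a positional-only parameter
except TypeError as exc:
    print("rejected:", exc)
```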
@@ -331,9 +322,9 @@ def open(
 ):
     """Open a zstd compressed file in binary or text mode.

-    filename can be either an actual file name (given as a str, bytes, or
-    PathLike object), in which case the named file is opened, or it can be an
-    existing file object to read from or write to.
+    file can be either a file name (given as a str, bytes, or PathLike object),
+    in which case the named file is opened, or it can be an existing file object
+    to read from or write to.

     The mode parameter can be "r", "rb" (default), "w", "wb", "x", "xb", "a",
     "ab" for binary mode, or "rt", "wt", "xt", "at" for text mode.
@@ -370,7 +361,7 @@ def open(

     zstd_mode = mode.replace("t", "")
     binary_file = ZstdFile(
-        filename, zstd_mode, level=level, options=options, zstd_dict=zstd_dict
+        file, zstd_mode, level=level, options=options, zstd_dict=zstd_dict
     )

     if "t" in mode:
5 changes: 2 additions & 3 deletions Lib/test/test_zstd/test_core.py
@@ -2121,10 +2121,9 @@ class T:
             def read(self, size):
                 return b'a' * size

-        with self.assertRaises(AttributeError):  # on close
+        with self.assertRaises(TypeError):  # on creation
             with ZstdFile(T(), 'w') as f:
-                with self.assertRaises(AttributeError):  # on write
-                    f.write(b'1234')
+                pass

         # 3
         with ZstdFile(io.BytesIO(), 'w') as f: