From 49d9c306afde0044afb2a1312dc8755a432dab85 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 11 May 2025 19:42:30 -0700 Subject: [PATCH 01/22] Add documentation for compression.zstd --- Doc/library/archiving.rst | 4 +- Doc/library/compression.rst | 25 ++ Doc/library/compression.zstd.rst | 714 +++++++++++++++++++++++++++++++ 3 files changed, 742 insertions(+), 1 deletion(-) create mode 100644 Doc/library/compression.rst create mode 100644 Doc/library/compression.zstd.rst diff --git a/Doc/library/archiving.rst b/Doc/library/archiving.rst index c9284949af4972..9148faf73d4518 100644 --- a/Doc/library/archiving.rst +++ b/Doc/library/archiving.rst @@ -5,13 +5,15 @@ Data Compression and Archiving ****************************** The modules described in this chapter support data compression with the zlib, -gzip, bzip2 and lzma algorithms, and the creation of ZIP- and tar-format +gzip, bzip2, zstd, and lzma algorithms, and the creation of ZIP- and tar-format archives. See also :ref:`archiving-operations` provided by the :mod:`shutil` module. .. toctree:: + compression.rst + compression.zstd.rst zlib.rst gzip.rst bz2.rst diff --git a/Doc/library/compression.rst b/Doc/library/compression.rst new file mode 100644 index 00000000000000..d2f3b93d764da0 --- /dev/null +++ b/Doc/library/compression.rst @@ -0,0 +1,25 @@ +The :mod:`!compression` package +================================= + +.. versionadded:: 3.14 + +.. note:: + + Several modules in :mod:`!compression` re-export modules that currently + exist at the top level of the standard library. These re-exported modules + are the new canonical import names for the respective top-level modules. + The existing modules are not currently deprecated and will not be removed + prior to Python 3.19, but users are encouraged to migrate to the new import + names when feasible.
+ +* :mod:`!compression.bz2` -- Re-exports :mod:`bz2` + +* :mod:`!compression.gzip` -- Re-exports :mod:`gzip` + +* :mod:`!compression.lzma` -- Re-exports :mod:`lzma` + +* :mod:`!compression.zlib` -- Re-exports :mod:`zlib` + +* :mod:`compression.zstd` -- Wrapper for the Zstandard compression library + + diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst new file mode 100644 index 00000000000000..6f4cbc64f13c02 --- /dev/null +++ b/Doc/library/compression.zstd.rst @@ -0,0 +1,714 @@ +:mod:`!compression.zstd` --- Compression compatible with the Zstandard format +============================================================================= + +.. module:: compression.zstd + :synopsis: Low level interface to compression and decompression routines in + Meta's zstd library + +.. versionadded:: 3.14 + +**Source code:** :source:`Lib/compression/zstd/__init__.py` + +-------------- + +This module provides classes and convenience functions for compressing and +decompressing data using the Zstandard (or "zstd") compression algorithm. Also +included is a file interface supporting reading and writing contents of ``.zst`` +files created from the :program:`zstd` utility, as well as raw zstd compressed +streams. + +The :mod:`compression.zstd` module contains: + +* The :func:`.open` function and :class:`ZstdFile` class for reading and + writing compressed files. +* The :class:`ZstdCompressor` and :class:`ZstdDecompressor` classes for + incremental (de)compression. +* The :func:`compress` and :func:`decompress` functions for one-shot + (de)compression. +* The :func:`train_dict` and :func:`finalize_dict` functions and the + :class:`ZstdDict` class to train and manage Zstandard dictionaries. +* The :class:`CompressionParameter`, :class:`DecompressionParameter`, and + :class:`Strategy` classes for setting advanced (de)compression parameters. + + +.. 
exception:: ZstdError + + This exception is raised when an error occurs during compression or + decompression, or while initializing the (de)compressor state. + + +Reading and writing compressed files +------------------------------------ + +.. function:: open(file, /, mode="rb", *, level=None, options=None, zstd_dict=None, encoding=None, errors=None, newline=None) + + Open a Zstandard-compressed file in binary or text mode, returning a + :term:`file object`. + + The *file* argument can be either an actual file name (given as a + :class:`str`, :class:`bytes` or :term:`path-like <path-like object>` object), + in which case the named file is opened, or it can be an existing file object + to read from or write to. + + The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for + overwriting, ``"a"`` for appending, or ``"x"`` for exclusive creation. These + can equivalently be given as ``"rb"``, ``"wb"``, ``"ab"``, and ``"xb"`` + respectively. You may also open in text mode with ``"rt"``, ``"wt"``, + ``"at"``, and ``"xt"`` respectively. + + When opening a file for reading, the *options* argument can be a dictionary + providing advanced decompression parameters; see + :class:`DecompressionParameter` for detailed information about supported + parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be + used during decompression. When opening a file for reading, the *level* + argument should not be used. + + When opening a file for writing, the *options* argument can be a dictionary + providing advanced compression parameters; see + :class:`CompressionParameter` for detailed information about supported + parameters. The *level* argument is the compression level to use when + writing compressed data. Only one of *level* or *options* may be passed. The + *zstd_dict* argument is a :class:`ZstdDict` instance to be used during + compression. + + For binary mode, this function is equivalent to the :class:`ZstdFile` + constructor: ``ZstdFile(file, mode, ...)``.
In this case, the + *encoding*, *errors* and *newline* parameters must not be provided. + + For text mode, a :class:`ZstdFile` object is created and wrapped in an + :class:`io.TextIOWrapper` instance with the specified encoding, error handling + behavior, and line ending(s). + + +.. class:: ZstdFile(file, /, mode="r", *, level=None, options=None, zstd_dict=None) + + Open a Zstandard-compressed file in binary mode. + + A :class:`ZstdFile` can wrap an already-open :term:`file object`, or operate + directly on a named file. The *file* argument specifies either the file + object to wrap, or the name of the file to open (as a :class:`str`, + :class:`bytes` or :term:`path-like <path-like object>` object). When + wrapping an existing file object, the wrapped file will not be closed when + the :class:`ZstdFile` is closed. + + The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for + overwriting, ``"x"`` for exclusive creation, or ``"a"`` for appending. These + can equivalently be given as ``"rb"``, ``"wb"``, ``"xb"`` and ``"ab"`` + respectively. + + If *file* is a file object (rather than an actual file name), a mode of + ``"w"`` does not truncate the file, and is instead equivalent to ``"a"``. + + When opening a file for reading, the *options* argument can be a dictionary + providing advanced decompression parameters; see + :class:`DecompressionParameter` for detailed information about supported + parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be + used during decompression. When opening a file for reading, the *level* + argument should not be used. + + When opening a file for writing, the *options* argument can be a dictionary + providing advanced compression parameters; see + :class:`CompressionParameter` for detailed information about supported + parameters. The *level* argument is the compression level to use when + writing compressed data. Only one of *level* or *options* may be passed.
The + *zstd_dict* argument is a :class:`ZstdDict` instance to be used during + compression. + + When opening a file for writing, the *options*, *zstd_dict* and *level* + arguments have the same meanings as for :class:`ZstdCompressor`. + + :class:`ZstdFile` supports all the members specified by + :class:`io.BufferedIOBase`, except for :meth:`~io.BufferedIOBase.detach` + and :meth:`~io.IOBase.truncate`. + Iteration and the :keyword:`with` statement are supported. + + The following method and attributes are also provided: + + .. method:: peek(size=-1) + + Return buffered data without advancing the file position. At least one + byte of data will be returned, unless EOF has been reached. The exact + number of bytes returned is unspecified (the *size* argument is ignored). + + .. note:: While calling :meth:`peek` does not change the file position of + the :class:`ZstdFile`, it may change the position of the underlying + file object (e.g. if the :class:`ZstdFile` was constructed by passing a + file object for *file*). + + .. attribute:: mode + + ``'rb'`` for reading and ``'wb'`` for writing. + + + .. attribute:: name + + The zstd file name. Equivalent to the :attr:`~io.FileIO.name` + attribute of the underlying :term:`file object`. + + +Compressing and decompressing data in memory +-------------------------------------------- + +.. function:: compress(data, level=None, options=None, zstd_dict=None) + + Compress *data* (a :term:`bytes-like object`), returning the compressed + data as a :class:`bytes` object. + + The *level* argument is an integer controlling the level of + compression. Please refer to :meth:`CompressionParameter.bounds` to get the + values that can be passed for *level*. If advanced compression options are + needed, omit this argument and instead set the + :attr:`CompressionParameter.compression_level` parameter in the *options* + dictionary. + + The *options* argument is a Python dictionary containing advanced compression + parameters.
The valid keys and values for compression parameters are + documented as part of the :class:`CompressionParameter` documentation. + + The *zstd_dict* argument is an instance of :class:`ZstdDict`, a Zstandard + dictionary, containing trained data to improve compression efficiency. The + function :func:`train_dict` can be used to generate a Zstandard dictionary. + + +.. function:: decompress(data, zstd_dict=None, options=None) + + Decompress *data* (a :term:`bytes-like object`), returning the uncompressed + data as a :class:`bytes` object. + + The *options* argument is a Python dictionary containing advanced + decompression parameters. The valid keys and values for decompression + parameters are documented as part of the :class:`DecompressionParameter` + documentation. + + The *zstd_dict* argument is an instance of :class:`ZstdDict`, a Zstandard + dictionary containing trained data. It must be the same Zstandard + dictionary that was used during compression. + + If *data* is the concatenation of multiple distinct compressed frames, + all of these frames are decompressed and the concatenation of the results + is returned. + + +.. class:: ZstdCompressor(level=None, options=None, zstd_dict=None) + + Create a compressor object, which can be used to compress data incrementally. + + For a more convenient way of compressing a single chunk of data, see the + module-level function :func:`compress`. + + The *level* argument is an integer controlling the level of + compression. Please refer to :meth:`CompressionParameter.bounds` to get the + values that can be passed for *level*. If advanced compression options are + needed, omit this argument and instead set the + :attr:`CompressionParameter.compression_level` parameter in the *options* + dictionary. + + The *options* argument is a Python dictionary containing advanced compression + parameters.
The valid keys and values for compression parameters are + documented as part of the :class:`CompressionParameter` documentation. + + The *zstd_dict* argument is an instance of :class:`ZstdDict`, a Zstandard + dictionary, containing trained data to improve compression efficiency. The + function :func:`train_dict` can be used to generate a Zstandard dictionary. + + .. attribute:: CONTINUE + + Collect more data for compression, which may or may not generate output + immediately. This mode optimizes the compression ratio by maximizing the + amount of data per block and frame. + + .. attribute:: FLUSH_BLOCK + + Complete and write a block to the data stream. The data returned so far + can be immediately decompressed. Past data can still be referenced in + future blocks generated by calls to :meth:`~.compress`, + improving compression. + + .. attribute:: FLUSH_FRAME + + Complete and write out a frame. Future data provided to + :meth:`~.compress` will be written into a new frame and + *cannot* reference past data. + + .. method:: compress(data, mode=ZstdCompressor.CONTINUE) + + Compress *data* (a :term:`bytes-like object`), returning a :class:`bytes` + object with compressed data if possible, or an empty :class:`bytes` object + otherwise. Some of *data* may be buffered internally, for use in later + calls to :meth:`~.compress` and :meth:`~.flush`. The + returned data should be concatenated with the output of any previous calls + to :meth:`~.compress`. + + The *mode* argument is a :class:`ZstdCompressor` attribute, either + :attr:`~.CONTINUE`, :attr:`~.FLUSH_BLOCK`, + or :attr:`~.FLUSH_FRAME`. + + When you have finished providing data to the compressor, call the + :meth:`~.flush` method to finish the compression process. + + .. method:: flush(mode=ZstdCompressor.FLUSH_FRAME) + + Finish the compression process, returning a :class:`bytes` object + containing any data stored in the compressor's internal buffers. + + The *mode* argument is a :class:`ZstdCompressor` attribute, either + :attr:`~.FLUSH_BLOCK` or :attr:`~.FLUSH_FRAME` (the default).
+ + +.. class:: ZstdDecompressor(zstd_dict=None, options=None) + + Create a decompressor object, which can be used to decompress data + incrementally. + + For a more convenient way of decompressing an entire compressed stream at + once, see the module-level function :func:`decompress`. + + The *options* argument is a Python dictionary containing advanced + decompression parameters. The valid keys and values for decompression + parameters are documented as part of the :class:`DecompressionParameter` + documentation. + + The *zstd_dict* argument is an instance of :class:`ZstdDict`, a Zstandard + dictionary containing trained data. It must be the same Zstandard + dictionary that was used during compression. + + .. note:: + This class does not transparently handle inputs containing multiple + compressed frames, unlike the :func:`decompress` function and + :class:`ZstdFile` class. To decompress a multi-frame input, you should + use :func:`decompress`, :class:`ZstdFile` if working with a + :term:`file object`, or multiple :class:`ZstdDecompressor` instances. + + .. method:: decompress(data, max_length=-1) + + Decompress *data* (a :term:`bytes-like object`), returning + uncompressed data as :class:`bytes`. Some of *data* may be buffered + internally, for use in later calls to :meth:`~.decompress`. + The returned data should be concatenated with the output of any previous + calls to :meth:`~.decompress`. + + If *max_length* is nonnegative, returns at most *max_length* + bytes of decompressed data. If this limit is reached and further + output can be produced, the :attr:`~.needs_input` attribute will + be set to ``False``. In this case, the next call to + :meth:`~.decompress` may provide *data* as ``b''`` to obtain + more of the output. + + If all of the input data was decompressed and returned (either + because this was less than *max_length* bytes, or because + *max_length* was negative), the :attr:`~.needs_input` attribute + will be set to ``True``.
+ + Attempting to decompress data after the end of a frame will raise a + :exc:`ZstdError`. Any data found after the end of the frame is not + decompressed; it is saved in the :attr:`~.unused_data` attribute instead. + + .. attribute:: eof + + ``True`` if the end-of-stream marker has been reached. + + .. attribute:: unused_data + + Data found after the end of the compressed stream. + + Before the end of the stream is reached, this will be ``b""``. + + .. attribute:: needs_input + + ``False`` if the :meth:`.decompress` method can provide more + decompressed data before requiring new compressed input. + + +Zstandard Dictionaries +---------------------- + + +.. function:: train_dict(samples, dict_size) + + Train a Zstandard dictionary, returning a :class:`ZstdDict` instance. + Zstandard dictionaries enable more efficient compression of small inputs, + which are traditionally difficult to compress because they contain little + repetition. If you are compressing multiple similar groups of data (such as + similar files), Zstandard dictionaries can improve compression ratios and + speed significantly. + + The *samples* argument (an iterable of :class:`bytes` objects) is the + population of samples used to train the Zstandard dictionary. + + The *dict_size* argument, an integer, is the maximum size (in bytes) the + Zstandard dictionary should be. The Zstandard documentation suggests an + absolute maximum of no more than 100 KB, but the maximum can often be smaller + depending on the data. Larger dictionaries generally slow down compression, + but improve compression ratios. Smaller dictionaries lead to faster + compression, but reduce the compression ratio. + + +.. function:: finalize_dict(zstd_dict, /, samples, dict_size, level) + + An advanced function for converting a "raw content" Zstandard dictionary into + a regular Zstandard dictionary. "Raw content" dictionaries are a sequence of + bytes that do not need to follow the structure of a normal Zstandard + dictionary.
+ + The *zstd_dict* argument is a :class:`ZstdDict` instance with + the :attr:`~ZstdDict.dict_contents` containing the raw dictionary contents. + + The *samples* argument (an iterable of bytes), contains sample data for + generating the Zstandard dictionary. + + The *dict_size* argument, an integer, is the maximum size (in bytes) the + Zstandard dictionary should be. Please see :func:`train_dict` for + suggestions on the maximum dictionary size. + + The *level* argument (an integer) is the compression level expected to be + passed to the compressors using this dictionary. The dictionary information + varies for each compression level, so tuning for the proper compression + level can make compression more efficient. + + +.. class:: ZstdDict(dict_content, /, *, is_raw=False) + + A wrapper around Zstandard dictionaries. Dictionaries can be used to improve + the compression of many small chunks of data. Use :func:`train_dict` if you + need to train a new dictionary from sample data. + + The *dict_content* argument (a :term:`bytes-like object`) is the already + trained dictionary information. + + The *is_raw* argument, a boolean, is an advanced parameter controlling the + meaning of *dict_content*. ``True`` means *dict_content* is a "raw content" + dictionary, without any format restrictions. ``False`` means *dict_content* + is an ordinary Zstandard dictionary, created by Zstandard functions, e.g. + :func:`train_dict` or the ``zstd`` CLI. + + .. attribute:: dict_content + + The content of the Zstandard dictionary, a ``bytes`` object. It's the + same as *dict_content* argument in :meth:`~ZstdDict.__init__`. It can + be used with other programs, such as the ``zstd`` CLI program. + + .. attribute:: dict_id + + Identifier of the Zstandard dictionary, a non-negative :class:`int`. + + Non-zero means the dictionary is ordinary, created by Zstandard + functions and following the Zstandard format.
+ + ``0`` means a "raw content" dictionary, free of any format restriction, + intended for advanced users. + + .. note:: + + The meaning of ``0`` for :attr:`ZstdDict.dict_id` is different from + the meaning of ``0`` for the :attr:`FrameInfo.dictionary_id` attribute + returned by :func:`get_frame_info`. + + .. attribute:: as_digested_dict + + Load as a digested dictionary, see below. + + .. attribute:: as_undigested_dict + + Load as an undigested dictionary. + + Digesting a dictionary is a costly operation. These two attributes can + control how the dictionary is loaded into the compressor, by passing them + as the ``zstd_dict`` argument, e.g. + ``compress(data, zstd_dict=zd.as_digested_dict)``. + + If neither of these attributes is used, an **undigested** dictionary is + passed by default. + + .. list-table:: Difference for compression + :widths: 12 12 12 + :header-rows: 1 + + * - + - | Digested + | dictionary + - | Undigested + | dictionary + * - | Some advanced + | parameters of the + | compressor may + | be overridden + | by the dictionary's + | parameters + - | ``window_log``, ``hash_log``, + | ``chain_log``, ``search_log``, + | ``min_match``, ``target_length``, + | ``strategy``, + | ``enable_long_distance_matching``, + | ``ldm_hash_log``, ``ldm_min_match``, + | ``ldm_bucket_size_log``, + | ``ldm_hash_rate_log``, and some + | non-public parameters. + - No + * - | ZstdDict internally + | caches the dictionary + - | Yes. It's faster when + | loading a digested + | dictionary again with the same + | compression level. + - | No. If you wish to load an undigested + | dictionary multiple times, + | consider reusing a + | compressor object. + + A **digested** dictionary is used for decompression by default, which + is faster when loaded multiple times. + + +Advanced parameter control +-------------------------- + +.. class:: CompressionParameter() + + An :class:`~enum.IntEnum` containing the advanced compression parameter + names that can be used when compressing data.
+ + The :meth:`~.bounds` method can be used on any attribute to get the valid + values for that parameter. + + Setting any parameter to ``0`` causes zstd to dynamically select a value + for that parameter based on other compression parameters' settings. + + .. method:: bounds() + + Return the tuple of int bounds, ``(lower, upper)``, of a compression + parameter. This method should be called on the attribute you wish to + retrieve the bounds of. For example, to get the valid values for + :attr:`~.compression_level`, one may check the result of + ``CompressionParameter.compression_level.bounds()``. + + Both the lower and upper bounds are inclusive. + + .. attribute:: compression_level + + A high-level means of setting other compression parameters that affect + the speed and ratio of compressing data. Setting the level to 0 uses the + default :attr:`COMPRESSION_LEVEL_DEFAULT`. + + .. attribute:: window_log + + Maximum allowed back-reference distance the compressor can use when + compressing data, expressed as a power of 2 (``1 << window_log`` bytes). + This parameter greatly influences the memory usage of compression. Higher + values require more memory but improve the compression ratio. + + .. attribute:: hash_log + + Size of the initial probe table, as a power of 2. The resulting memory + usage is ``1 << (hash_log+2)`` bytes. Larger tables improve the compression + ratio of strategies <= :attr:`~Strategy.dfast`, and improve compression + speed of strategies > :attr:`~Strategy.dfast`. + + .. attribute:: chain_log + + Size of the multi-probe search table, as a power of 2. The resulting + memory usage is ``1 << (chain_log+2)`` bytes. Larger tables result in + better and slower compression. This parameter has no effect on the + :attr:`~Strategy.fast` strategy. It's still useful when using the + :attr:`~Strategy.dfast` strategy, in which case it defines a secondary + probe table. + + .. attribute:: search_log + + Number of search attempts, as a power of 2.
More attempts result in + better and slower compression. This parameter has no effect on the + :attr:`~Strategy.fast` and :attr:`~Strategy.dfast` strategies. + + .. attribute:: min_match + + Minimum size of searched matches. Larger values increase compression and + decompression speed, but decrease the compression ratio. Note that Zstandard + can still find matches of smaller size; it just tweaks its search algorithm + to look for this size and larger. Currently, for all strategies + < :attr:`~Strategy.btopt`, the effective minimum is ``4``; for all + strategies > :attr:`~Strategy.fast`, the effective maximum is ``6``. + + .. attribute:: target_length + + The impact of this field depends on the selected :class:`Strategy`. + + For strategies :attr:`~Strategy.btopt`, :attr:`~Strategy.btultra` and + :attr:`~Strategy.btultra2`, the value is the length of a match + considered "good enough" to stop searching. Larger values improve + compression ratios, but make compression slower. + + For strategy :attr:`~Strategy.fast`, it is the distance between match + sampling. Larger values make compression faster, but with a worse + compression ratio. + + .. attribute:: strategy + + The higher the value of the selected strategy, the more complex the + compression technique used by zstd, resulting in higher compression + ratios but slower compression. + + .. seealso:: + :class:`Strategy` + + .. attribute:: enable_long_distance_matching + + Long distance matching can be used to improve compression for large + inputs by finding large matches at greater distances. It increases memory + usage and window size. + + Enabling this parameter increases default :attr:`~CParameter.windowLog` + to 128 MiB except when expressly set to a different value. This setting + is enabled by default if :attr:`~CParameter.windowLog` >= 128 MiB and + the compression strategy >= :attr:`~Strategy.btopt` (compression + level 16+). + + .. attribute:: ldm_hash_log + + Size of the table for long distance matching, as a power of 2.
Larger + values increase memory usage and compression ratio, but decrease + compression speed. + + .. attribute:: ldm_min_match + + Minimum match size for the long distance matcher. Larger or too small values + can often decrease the compression ratio. + + .. attribute:: ldm_bucket_size_log + + Log size of each bucket in the long distance matcher hash table for + collision resolution. Larger values improve collision resolution but + decrease compression speed. + + .. attribute:: ldm_hash_rate_log + + Frequency of inserting/looking up entries into the long distance matcher + hash table. Larger values improve compression speed. Deviating far from + the default value will likely result in a compression ratio decrease. + + .. attribute:: content_size_flag + + Uncompressed content size will be written into the frame header whenever + known. This flag currently has no effect. + + .. attribute:: checksum_flag + + A 4-byte checksum using XXHash64 of the uncompressed content is written + at the end of each frame. Zstandard's decompression code verifies the + checksum. If there is a mismatch, a :exc:`ZstdError` exception is + raised. + + .. attribute:: dict_id_flag + + When compressing with a :class:`ZstdDict`, the dictionary's ID is written + into the frame header. + + .. attribute:: nb_workers + + Select how many threads will be spawned to compress in parallel. Setting + :attr:`~.nb_workers` >= 1 enables multi-threaded compression; 1 + means "1-thread multi-threaded mode". More workers improve speed, but + also increase memory usage and slightly reduce the compression ratio. + + .. attribute:: job_size + + Size of a compression job, in bytes. This value is enforced only when + :attr:`~CParameter.nbWorkers` >= 1. Each compression job is completed in + parallel, so this value can indirectly impact the number of active + threads. + + ..
attribute:: overlap_log + + Sets how much data is reloaded from previous jobs (threads) for new jobs + to be used by the look behind window during compression. This values is + only used when :attr:`~CParameter.nbWorkers` >= 1. Acceptable values vary + from 0 to 9. + + * 0 means dynamically set the overlap amount + * 1 means no overlap + * 9 means use a full window size from the previous job + + Each increment halves/doubles the overlap size. ``8`` means an overlap of + ``window_size/2``, ``7`` means an overlap of ``window_size/4``, etc. + +.. class:: DecompressionParameter() + + An :class:`~enum.IntEnum` containing the advanced decompression parameter + names that can be used when decompressing data. + + The :meth:`~.bounds` method can be used on any attribute to get the valid + values for that parameter. + + .. method:: bounds() + + Return the tuple of int bounds, ``(lower, upper)``, of a decompression + parameter. This method should be called on the attribute you wish to + retrieve the bounds of. For example, to get the valid values for + :attr:`~.window_log_max`, one may check the result of + ``DecompressionParameter.window_log_max.bounds()``. + + Both the lower and upper bounds are inclusive. + + .. attribute:: window_log_max + + The maximum size of the window used during decompression, expressed as + a power of 2. This can be useful to limit the amount of memory used when + decompressing data. + +.. class:: Strategy() + + An :class:`~enum.IntEnum` containing strategies for compression. + Higher-numbered strategies correspond to more complex and slower + compression. + + .. note:: + + The values of attributes of :class:`Strategy` are not necessarily stable + between zstd versions. Only the ordering may be relied upon. + + The following strategies are available: + + .. attribute:: fast + + .. attribute:: dfast + + .. attribute:: greedy + + .. attribute:: lazy + + .. attribute:: lazy2 + + .. attribute:: btlazy2 + + .. attribute:: btopt + + .. attribute:: btultra + + ..
attribute:: btultra2 + + +Miscellaneous +------------- + +.. function:: get_frame_info(frame_buffer) + + Retrieve a :class:`FrameInfo` object containing metadata about a Zstandard + frame. Frames contain metadata related to the compressed data they hold. + + +.. class:: FrameInfo() + + Metadata related to a Zstandard frame. There are currently two attributes + containing metadata related to Zstandard frames. + + .. attribute:: decompressed_size + + The size of the decompressed contents of the frame. + + .. attribute:: dictionary_id + + An int object representing the Zstandard dictionary ID needed for + decompressing the frame. ``0`` means the dictionary ID was not + recorded in the frame header. This may mean that a Zstandard dictionary + is not needed, or that the ID of a required dictionary was not recorded. + +.. attribute:: COMPRESSION_LEVEL_DEFAULT + + The default compression level for Zstandard, currently ``3``. + +.. attribute:: zstd_version_info + + Version number of the runtime zstd library as a tuple of int + (major, minor, release). From 62ef4dcac5e8c884efffa018e649bdbf5a68be5e Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 11 May 2025 19:57:38 -0700 Subject: [PATCH 02/22] Add examples --- Doc/library/compression.zstd.rst | 54 ++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 6f4cbc64f13c02..da25f2f2d6bfb1 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -704,11 +704,65 @@ Miscellaneous recorded in the frame header. This may mean that a Zstandard dictionary is not needed, or that the ID of a required dictionary was not recorded. + .. attribute:: COMPRESSION_LEVEL_DEFAULT The default compression level for Zstandard, currently ``3``. + .. attribute:: zstd_version_info Version number of the runtime zstd library as a tuple of int (major, minor, release).
+ + +Examples +-------- + +Reading in a compressed file:: + + from compression import zstd + with zstd.open("file.zst") as f: + file_content = f.read() + +Creating a compressed file:: + + from compression import zstd + data = b"Insert Data Here" + with zstd.open("file.zst", "w") as f: + f.write(data) + +Compressing data in memory:: + + from compression import zstd + data_in = b"Insert Data Here" + data_out = zstd.compress(data_in) + +Incremental compression:: + + from compression import zstd + comp = zstd.ZstdCompressor() + out1 = comp.compress(b"Some data\n") + out2 = comp.compress(b"Another piece of data\n") + out3 = comp.compress(b"Even more data\n") + out4 = comp.flush() + # Concatenate all the partial results: + result = b"".join([out1, out2, out3, out4]) + +Writing compressed data to an already-open file:: + + from compression import zstd + with open("file.zst", "wb") as f: + f.write(b"This data will not be compressed\n") + with zstd.open(f, "w") as zstf: + zstf.write(b"This *will* be compressed\n") + f.write(b"Not compressed\n") + +Creating a compressed file using compression parameters:: + + from compression import zstd + options = { + zstd.CompressionParameter.checksum_flag: 1 + } + with zstd.open("file.zst", "w", options=options) as f: + f.write(b"blah blah blah") From 0b154b131896c78b7908400e746bdd6544477192 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 11 May 2025 20:01:29 -0700 Subject: [PATCH 03/22] Fix camelcase name references --- Doc/library/compression.zstd.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index da25f2f2d6bfb1..cf617134f13cc0 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -347,7 +347,7 @@ Zstandard Dictionaries dictionary. The *zstd_dict* argument is a :class:`ZstdDict` instance with - the :attr:`~ZstdDict.dict_contents` containing the raw dictionary contents. 
+ the :attr:`~ZstdDict.dict_content` containing the raw dictionary contents. The *samples* argument (an iterable of bytes), contains sample data for generating the Zstandard dictionary. @@ -380,7 +380,7 @@ Zstandard Dictionaries .. attribute:: dict_content The content of the Zstandard dictionary, a ``bytes`` object. It's the - same as *dict_content* argument in :meth:`~ZstdDict.__init__`. It can + same as the *dict_content* argument in the ``__init__`` method. It can be used with other programs, such as the ``zstd`` CLI program. .. attribute:: dict_id @@ -551,9 +551,9 @@ Advanced parameter control inputs by finding large matches at greater distances. It increases memory usage and window size. - Enabling this parameter increases default :attr:`~CParameter.windowLog` + Enabling this parameter increases default :attr:`~CParameter.window_log` to 128 MiB except when expressly set to a different value. This setting - is enabled by default if :attr:`~CParameter.windowLog` >= 128 MiB and + is enabled by default if :attr:`~CParameter.window_log` >= 128 MiB and the compression strategy >= :attr:`~Strategy.btopt` (compression level 16+). @@ -605,7 +605,7 @@ Advanced parameter control .. attribute:: job_size Size of a compression job, in bytes. This value is enforced only when - :attr:`~CParameter.nbWorkers` >= 1. Each compression job is completed in + :attr:`~CParameter.nb_workers` >= 1. Each compression job is completed in parallel, so this value can indirectly impact the number of active threads. @@ -613,7 +613,7 @@ Advanced parameter control Sets how much data is reloaded from previous jobs (threads) for new jobs to be used by the look behind window during compression. This values is - only used when :attr:`~CParameter.nbWorkers` >= 1. Acceptable values vary + only used when :attr:`~CParameter.nb_workers` >= 1. Acceptable values vary from 0 to 9. 
* 0 means dynamically set the overlap amount From 63f963f651572e255db4c5cc30bfbd29de194324 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 11 May 2025 20:05:42 -0700 Subject: [PATCH 04/22] CParameter->CompressionParameter --- Doc/library/compression.zstd.rst | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index cf617134f13cc0..3a27413a332c1d 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -551,11 +551,11 @@ Advanced parameter control inputs by finding large matches at greater distances. It increases memory usage and window size. - Enabling this parameter increases default :attr:`~CParameter.window_log` - to 128 MiB except when expressly set to a different value. This setting - is enabled by default if :attr:`~CParameter.window_log` >= 128 MiB and - the compression strategy >= :attr:`~Strategy.btopt` (compression - level 16+). + Enabling this parameter increases default + :attr:`~CompressionParameter.window_log` to 128 MiB except when expressly + set to a different value. This setting is enabled by default if + :attr:`~CompressionParameter.window_log` >= 128 MiB and the compression + strategy >= :attr:`~Strategy.btopt` (compression level 16+). .. attribute:: ldm_hash_log @@ -605,16 +605,16 @@ Advanced parameter control .. attribute:: job_size Size of a compression job, in bytes. This value is enforced only when - :attr:`~CParameter.nb_workers` >= 1. Each compression job is completed in - parallel, so this value can indirectly impact the number of active - threads. + :attr:`~CompressionParameter.nb_workers` >= 1. Each compression job is + completed in parallel, so this value can indirectly impact the number of + active threads. .. attribute:: overlap_log Sets how much data is reloaded from previous jobs (threads) for new jobs to be used by the look behind window during compression. 
This values is - only used when :attr:`~CParameter.nb_workers` >= 1. Acceptable values vary - from 0 to 9. + only used when :attr:`~CompressionParameter.nb_workers` >= 1. Acceptable + values vary from 0 to 9. * 0 means dynamically set the overlap amount * 1 means no overlap From cfe0590855a5f5c6e36da82baf0703c89d8acf6b Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Mon, 12 May 2025 08:40:27 -0700 Subject: [PATCH 05/22] Apply suggestions from AA-Turner Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> --- Doc/library/archiving.rst | 2 +- Doc/library/compression.rst | 19 ++++++------------- Doc/library/compression.zstd.rst | 32 ++++++++++++++++++-------------- 3 files changed, 25 insertions(+), 28 deletions(-) diff --git a/Doc/library/archiving.rst b/Doc/library/archiving.rst index 9148faf73d4518..da0b3f8c3e7693 100644 --- a/Doc/library/archiving.rst +++ b/Doc/library/archiving.rst @@ -5,7 +5,7 @@ Data Compression and Archiving ****************************** The modules described in this chapter support data compression with the zlib, -gzip, bzip2, zstd, and lzma algorithms, and the creation of ZIP- and tar-format +gzip, bzip2, lzma, and zstd algorithms, and the creation of ZIP- and tar-format archives. See also :ref:`archiving-operations` provided by the :mod:`shutil` module. diff --git a/Doc/library/compression.rst b/Doc/library/compression.rst index d2f3b93d764da0..861fecc5d7badf 100644 --- a/Doc/library/compression.rst +++ b/Doc/library/compression.rst @@ -1,25 +1,18 @@ The :mod:`!compression` package -================================= +=============================== .. versionadded:: 3.14 -.. note:: +.. attention:: - Several modules in :mod:`!compression` re-export modules that currently - exist at the repository top-level. These re-exported modules are the new - canonical import name for the respective top-level module. 
The existing - modules are not currently deprecated and will not be removed prior to Python - 3.19, but users are encouraged to migrate to the new import names when - feasible. + The :mod:`!compression` package is the new location for the data compression + modules in the standard library, listed below. The existing modules are not + deprecated and will not be removed before Python 3.19. The new ``compression.*`` + import names are encouraged for use where practicable. * :mod:`!compression.bz2` -- Re-exports :mod:`bz2` - * :mod:`!compression.gzip` -- Re-exports :mod:`gzip` - * :mod:`!compression.lzma` -- Re-exports :mod:`lzma` - * :mod:`!compression.zlib` -- Re-exports :mod:`zlib` - * :mod:`compression.zstd` -- Wrapper for the Zstandard compression library - diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 3a27413a332c1d..e6f15f860533c8 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -138,10 +138,9 @@ Reading and writing compressed files ``'rb'`` for reading and ``'wb'`` for writing. - .. attribute:: name - The zstd file name. Equivalent to the :attr:`~io.FileIO.name` + The name of the Zstandard file. Equivalent to the :attr:`~io.FileIO.name` attribute of the underlying :term:`file object`. @@ -542,8 +541,7 @@ Advanced parameter control compression technique used by zstd, resulting in higher compression ratios but slower compression. - .. seealso:: - :class:`Strategy` + .. seealso:: :class:`Strategy` .. attribute:: enable_long_distance_matching @@ -684,11 +682,11 @@ Miscellaneous .. function:: get_frame_info(frame_buffer) - Retrieve a :class:`FrameInfo`, containing metadata about a Zstandard frame. - Frames contain metadata related to the compressed data they hold. + Retrieve a :class:`FrameInfo` object containing metadata about a Zstandard + frame. Frames contain metadata related to the compressed data they hold. -.. class:: FrameInfo() +.. 
class:: FrameInfo Metadata related to a Zstandard frame. There are currently two attributes containing metadata related to Zstandard frames. @@ -712,34 +710,38 @@ Miscellaneous .. attribute:: zstd_version_info - Version number of the runtime zstd library as a tuple of int + Version number of the runtime zstd library as a tuple of integers (major, minor, release). Examples -------- -Reading in a compressed file:: +Reading in a compressed file: +.. code-block:: python from compression import zstd with zstd.open("file.zst") as f: file_content = f.read() -Creating a compressed file:: +Creating a compressed file: +.. code-block:: python from compression import zstd data = b"Insert Data Here" with zstd.open("file.zst", "w") as f: f.write(data) -Compressing data in memory:: +Compressing data in memory: +.. code-block:: python from compression import zstd data_in = b"Insert Data Here" data_out = zstd.compress(data_in) -Incremental compression:: +Incremental compression: +.. code-block:: python from compression import zstd comp = zstd.ZstdCompressor() out1 = comp.compress(b"Some data\n") @@ -749,8 +751,9 @@ Incremental compression:: # Concatenate all the partial results: result = b"".join([out1, out2, out3, out4]) -Writing compressed data to an already-open file:: +Writing compressed data to an already-open file: +.. code-block:: python from compression import zstd with open("file.zst", "wb") as f: f.write(b"This data will not be compressed\n") @@ -758,8 +761,9 @@ Writing compressed data to an already-open file:: zstf.write(b"This *will* be compressed\n") f.write(b"Not compressed\n") -Creating a compressed file using compression parameters:: +Creating a compressed file using compression parameters: +.. 
code-block:: python from compression import zstd options = { zstd.CompressionParameter.checksum_flag: 1 From 5115b4ce4c08274c8f8ba138240ac5453934cc0b Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Mon, 12 May 2025 12:13:50 -0700 Subject: [PATCH 06/22] Apply suggestions from reviewers Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> --- Doc/library/compression.zstd.rst | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index e6f15f860533c8..b1f1551267bf72 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -40,9 +40,10 @@ The :mod:`compression.zstd` module contains: Reading and writing compressed files ------------------------------------ -.. function:: open(file, /, mode="rb", *, level=None, options=None, zstd_dict=None, encoding=None, errors=None, newline=None) +.. function:: open(file, /, mode='rb', *, level=None, options=None, \ + zstd_dict=None, encoding=None, errors=None, newline=None) - Open an Zstandard-compressed file in binary or text mode, returning a + Open a Zstandard-compressed file in binary or text mode, returning a :term:`file object`. The *file* argument can be either an actual file name (given as a @@ -79,14 +80,14 @@ Reading and writing compressed files behavior, and line ending(s). -.. class:: ZstdFile(file, /, mode="r", *, level=None, options=None, zstd_dict=None) +.. class:: ZstdFile(file, /, mode='r', *, level=None, options=None, zstd_dict=None) Open a Zstandard-compressed file in binary mode. A :class:`ZstdFile` can wrap an already-open :term:`file object`, or operate directly on a named file. The *file* argument specifies either the file object to wrap, or the name of the file to open (as a :class:`str`, - :class:`bytes` or :term:`path-like ` object). 
When + :class:`bytes` or :term:`path-like ` object). If wrapping an existing file object, the wrapped file will not be closed when the :class:`ZstdFile` is closed. @@ -527,7 +528,7 @@ Advanced parameter control The impact of this field depends on the selected :class:`Strategy`. For strategies :attr:`~Strategy.btopt`, :attr:`~Strategy.btultra` and - :attr:`~Strategy.btultra2`, the values is the length of a match + :attr:`~Strategy.btultra2`, the value is the length of a match considered "good enough" to stop searching. Larger values make compression ratios better, but compresses slower. @@ -571,6 +572,7 @@ Advanced parameter control Log size of each bucket in the long distance matcher hash table for collision resolution. Larger values improve collision resolution but decrease compression speed. + .. attribute:: ldm_hash_rate_log Frequency of inserting/looking up entries into the long distance matcher @@ -598,7 +600,7 @@ Advanced parameter control Select how many threads will be spawned to compress in parallel. When :attr:`~.nb_workers` >= 1, enables multi-threaded compression, 1 means "1-thread multi-threaded mode". More workers improve speed, but - also increases memory usage and slightly reduce compression ratio. + also increase memory usage and slightly reduce compression ratio. .. attribute:: job_size @@ -610,7 +612,7 @@ Advanced parameter control .. attribute:: overlap_log Sets how much data is reloaded from previous jobs (threads) for new jobs - to be used by the look behind window during compression. This values is + to be used by the look behind window during compression. This value is only used when :attr:`~CompressionParameter.nb_workers` >= 1. Acceptable values vary from 0 to 9. @@ -720,6 +722,7 @@ Examples Reading in a compressed file: .. code-block:: python + from compression import zstd with zstd.open("file.zst") as f: file_content = f.read() @@ -727,6 +730,7 @@ Reading in a compressed file: Creating a compressed file: .. 
code-block:: python + from compression import zstd data = b"Insert Data Here" with zstd.open("file.zst", "w") as f: @@ -735,6 +739,7 @@ Creating a compressed file: Compressing data in memory: .. code-block:: python + from compression import zstd data_in = b"Insert Data Here" data_out = zstd.compress(data_in) @@ -742,6 +747,7 @@ Compressing data in memory: Incremental compression: .. code-block:: python + from compression import zstd comp = zstd.ZstdCompressor() out1 = comp.compress(b"Some data\n") @@ -754,6 +760,7 @@ Incremental compression: Writing compressed data to an already-open file: .. code-block:: python + from compression import zstd with open("file.zst", "wb") as f: f.write(b"This data will not be compressed\n") @@ -764,6 +771,7 @@ Writing compressed data to an already-open file: Creating a compressed file using compression parameters: .. code-block:: python + from compression import zstd options = { zstd.CompressionParameter.checksum_flag: 1 From 4ab7fd78fca0ab1929df22903301167f24364227 Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Mon, 12 May 2025 13:02:20 -0700 Subject: [PATCH 07/22] Apply suggestions from reviewers Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> --- Doc/library/compression.zstd.rst | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index b1f1551267bf72..e5b5ee712f2e6b 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -11,10 +11,10 @@ -------------- -This module provides classes and convenience functions for compressing and -decompressing data using the Zstandard (or "zstd") compression algorithm. 
Also -included is a file interface supporting reading and writing contents of ``.zst`` -files created from the :program:`zstd` utility, as well as raw zstd compressed +This module provides classes and functions for compressing and +decompressing data using the Zstandard (or *zstd*) compression algorithm. Also +included is a file interface that supports reading and writing the contents of ``.zst`` files. +files created by the :program:`zstd` utility, as well as raw zstd compressed streams. The :mod:`compression.zstd` module contains: @@ -51,33 +51,33 @@ Reading and writing compressed files in which case the named file is opened, or it can be an existing file object to read from or write to. - The mode argument can be either 'r' for reading (default), 'w' for - overwriting, 'a' for appending, or 'x' for exclusive creation. These can - equivalently be given as 'rb', 'wb', 'ab', and 'xb' respectively. You may - also open in text mode with 'rt', 'wt', 'at', and 'xt' respectively. + The mode argument can be either ``'r'`` for reading (default), ``'w'`` for + overwriting, 'a' for appending, or ``'x'`` for exclusive creation. These can + equivalently be given as ``'rb'``, ``'wb'``, ``'ab'``, and ``'xb'`` respectively. You may + also open in text mode with ``'rt'``, ``'wt'``, ``'at'``, and ``'xt'`` respectively. When opening a file for reading, the *options* argument can be a dictionary - providing advanced decompression parameters, see + providing advanced decompression parameters; see :class:`DecompressionParameter` for detailed information about supported parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be used during decompression. When opening a file for reading, the *level* argument should not be used. 
When opening a file for writing, the *options* argument can be a dictionary - providing advanced decompression parameters, see + providing advanced decompression parameters; see :class:`CompressionParameter` for detailed information about supported parameters. The *level* argument is the compression level to use when writing compressed data. Only one of *level* or *options* may be passed. The *zstd_dict* argument is a :class:`ZstdDict` instance to be used during compression. - For binary mode, this function is equivalent to the :class:`ZstdFile` + In binary mode, this function is equivalent to the :class:`ZstdFile` constructor: ``ZstdFile(file, mode, ...)``. In this case, the - *encoding*, *errors* and *newline* parameters must not be provided. + *encoding*, *errors*, and *newline* parameters must not be provided. - For text mode, a :class:`ZstdFile` object is created, and wrapped in an + In text mode, a :class:`ZstdFile` object is created, and wrapped in an :class:`io.TextIOWrapper` instance with the specified encoding, error handling - behavior, and line ending(s). + behavior, and line endings. .. 
class:: ZstdFile(file, /, mode='r', *, level=None, options=None, zstd_dict=None) From 5eb5efcd089fdfad04199872eaaf10cf8d1b0a47 Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Tue, 13 May 2025 18:40:50 -0700 Subject: [PATCH 08/22] Apply suggestions from reviewers Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> --- Doc/library/compression.rst | 2 +- Doc/library/compression.zstd.rst | 57 +++++++++++++++++--------------- 2 files changed, 32 insertions(+), 27 deletions(-) diff --git a/Doc/library/compression.rst b/Doc/library/compression.rst index 861fecc5d7badf..427c6c42154ca1 100644 --- a/Doc/library/compression.rst +++ b/Doc/library/compression.rst @@ -8,7 +8,7 @@ The :mod:`!compression` package The :mod:`!compression` package is the new location for the data compression modules in the standard library, listed below. The existing modules are not deprecated and will not be removed before Python 3.19. The new ``compression.*`` - import names are encouraged for use where practicable. + import names are encouraged for use where practical. * :mod:`!compression.bz2` -- Re-exports :mod:`bz2` * :mod:`!compression.gzip` -- Re-exports :mod:`gzip` diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index e5b5ee712f2e6b..2345afe86b6f32 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -2,7 +2,7 @@ ============================================================================= .. module:: compression.zstd - :synopsis: Low level interface to compression and decompression routines in + :synopsis: Low-level interface to compression and decompression routines in Meta's zstd library .. versionadded:: 3.14 @@ -17,7 +17,7 @@ included is a file interface that supports reading and writing the contents of ` files created by the :program:`zstd` utility, as well as raw zstd compressed streams. 
-The :mod:`compression.zstd` module contains: +The :mod:`!compression.zstd` module contains: * The :func:`.open` function and :class:`ZstdFile` class for reading and writing compressed files. @@ -52,7 +52,7 @@ Reading and writing compressed files to read from or write to. The mode argument can be either ``'r'`` for reading (default), ``'w'`` for - overwriting, 'a' for appending, or ``'x'`` for exclusive creation. These can + overwriting, ``'a'`` for appending, or ``'x'`` for exclusive creation. These can equivalently be given as ``'rb'``, ``'wb'``, ``'ab'``, and ``'xb'`` respectively. You may also open in text mode with ``'rt'``, ``'wt'``, ``'at'``, and ``'xt'`` respectively. @@ -132,7 +132,7 @@ Reading and writing compressed files .. note:: While calling :meth:`peek` does not change the file position of the :class:`ZstdFile`, it may change the position of the underlying - file object (e.g. if the :class:`ZstdFile` was constructed by passing a + file object (for example, if the :class:`ZstdFile` was constructed by passing a file object for *filename*). .. attribute:: mode @@ -154,7 +154,7 @@ Compressing and decompressing data in memory data as a :class:`bytes` object. The *level* argument is an int object controlling the level of - compression. Please refer to :meth:`CompressionParameter.bounds` to get the + compression. Refer to :meth:`CompressionParameter.bounds` to get the values that can be passed for *level*. If advanced compression options are needed, this argument must be omitted and in the *options* dictionary the :attr:`CompressionParameter.compression_level` parameter should be set. @@ -194,7 +194,7 @@ Compressing and decompressing data in memory module-level function :func:`compress`. The *level* argument is an int object controlling the level of - compression. Please refer to :meth:`CompressionParameter.bounds` to get the + compression. Refer to :meth:`CompressionParameter.bounds` to get the values that can be passed for *level*. 
If advanced compression options are needed, this argument must be omitted and in the *options* dictionary the :attr:`CompressionParameter.compression_level` parameter should be set. @@ -283,7 +283,7 @@ Compressing and decompressing data in memory The returned data should be concatenated with the output of any previous calls to :meth:`~.decompress`. - If *max_length* is nonnegative, returns at most *max_length* + If *max_length* is non-negative, returns at most *max_length* bytes of decompressed data. If this limit is reached and further output can be produced, the :attr:`~.needs_input` attribute will be set to ``False``. In this case, the next call to @@ -315,7 +315,7 @@ Compressing and decompressing data in memory decompressed data before requiring new uncompressed input. -Zstandard Dictionaries +Zstandard dictionaries ---------------------- @@ -333,7 +333,7 @@ Zstandard Dictionaries The *dict_size* argument, an integer, is the maximum size (in bytes) the Zstandard dictionary should be. The Zstandard documentation suggests an - absolute maximum of no more than 100KB, but the maximum can often be smaller + absolute maximum of no more than 100 KB, but the maximum can often be smaller depending on the data. Larger dictionaries generally slow down compression, but improve compression ratios. Smaller dictionaries lead to faster compression, but reduce the compression ratio. @@ -353,7 +353,7 @@ Zstandard Dictionaries generating the Zstandard dictionary. The *dict_size* argument, an integer, is the maximum size (in bytes) the - Zstandard dictionary should be. Please see :func:`train_dict` for + Zstandard dictionary should be. See :func:`train_dict` for suggestions on the maximum dictionary size. The *level* argument (an integer) is the compression level expected to be @@ -374,8 +374,8 @@ Zstandard Dictionaries The *is_raw* argument, a boolean, is an advanced parameter controlling the meaning of *dict_content*. 
``True`` means *dict_content* is a "raw content" dictionary, without any format restrictions. ``False`` means *dict_content* - is an ordinary Zstandard dictionary, created from Zstandard functions, e.g. - :func:`train_dict` or the ``zstd`` CLI. + is an ordinary Zstandard dictionary, created from Zstandard functions, + for example, :func:`train_dict` or the ``zstd`` CLI. .. attribute:: dict_content @@ -385,7 +385,7 @@ Zstandard Dictionaries .. attribute:: dict_id - Identifier of the Zstandard dictionary, a int value between 0 and . + Identifier of the Zstandard dictionary, an int value between zero and . Non-zero means the dictionary is ordinary, created by Zstandard functions and following the Zstandard format. @@ -409,7 +409,7 @@ Zstandard Dictionaries Digesting a dictionary is a costly operation. These two attributes can control how the dictionary is loaded to the compressor, by passing them - as the ``zstd_dict`` argument, e.g. + as the ``zstd_dict`` argument, for example, ``compress(data, zstd_dict=zd.as_digested_dict)``. If don't use one of these attributes, an **undigested** dictionary is @@ -466,7 +466,7 @@ Advanced parameter control The :meth:`~.bounds` method can be used on any attribute to get the valid values for that parameter. - Setting any parameter to "0" causes zstd to dynamically select a value + Setting any parameter to zero causes zstd to dynamically select a value for that parameter based on other compression parameters' settings. .. method:: bounds() @@ -482,26 +482,26 @@ Advanced parameter control .. attribute:: compression_level A high-level means of setting other compression parameters that affect - the speed and ratio of compressing data. Setting the level to 0 uses the + the speed and ratio of compressing data. Setting the level to zero uses the default :attr:`COMPRESSION_LEVEL_DEFAULT`. .. 
attribute:: window_log Maximum allowed back-reference distance the compressor can use when - compressing data, expressed as power of 2, ``1 << window_log`` bytes. This + compressing data, expressed as power of two, ``1 << window_log`` bytes. This parameter greatly influences the memory usage of compression. Higher values require more memory but gain better compression values. .. attribute:: hash_log - Size of the initial probe table, as a power of 2. The resulting memory + Size of the initial probe table, as a power of two. The resulting memory usage is ``1 << (hash_log+2)`` bytes. Larger tables improve compression ratio of strategies <= :attr:`~Strategy.dfast`, and improve compression speed of strategies > :attr:`~Strategy.dfast`. .. attribute:: chain_log - Size of the multi-probe search table, as a power of 2. The resulting + Size of the multi-probe search table, as a power of two. The resulting memory usage is ``1 << (chain_log+2)`` bytes. Larger tables result in better and slower compression. This parameter has no effect for the :attr:`~Strategy.fast` strategy. It's still useful when using @@ -510,7 +510,7 @@ Advanced parameter control .. attribute:: search_log - Number of search attempts, as a power of 2. More attempts result in + Number of search attempts, as a power of two. More attempts result in better and slower compression. This parameter is useless for :attr:`~Strategy.fast` and :attr:`~Strategy.dfast` strategies. @@ -553,12 +553,12 @@ Advanced parameter control Enabling this parameter increases default :attr:`~CompressionParameter.window_log` to 128 MiB except when expressly set to a different value. This setting is enabled by default if - :attr:`~CompressionParameter.window_log` >= 128 MiB and the compression + :attr:`!window_log` >= 128 MiB and the compression strategy >= :attr:`~Strategy.btopt` (compression level 16+). .. attribute:: ldm_hash_log - Size of the table for long distance matching, as a power of 2. 
Larger + Size of the table for long distance matching, as a power of two. Larger values increase memory usage and compression ratio, but decrease compression speed. @@ -586,7 +586,7 @@ Advanced parameter control .. attribute:: checksum_flag - A 4-byte checksum using XXHash64 of the uncompressed content is written + A four-byte checksum using XXHash64 of the uncompressed content is written at the end of each frame. Zstandard's decompression code verifies the checksum. If there is a mismatch a :class:`ZstdError` exception is raised. @@ -690,8 +690,7 @@ Miscellaneous .. class:: FrameInfo - Metadata related to a Zstandard frame. There are currently two attributes - containing metadata related to Zstandard frames. + Metadata related to a Zstandard frame. .. attribute:: decompressed_size @@ -707,7 +706,7 @@ Miscellaneous .. attribute:: COMPRESSION_LEVEL_DEFAULT - The default compression level for Zstandard, currently '3'. + The default compression level for Zstandard: ``3``. .. attribute:: zstd_version_info @@ -724,6 +723,7 @@ Reading in a compressed file: .. code-block:: python from compression import zstd + with zstd.open("file.zst") as f: file_content = f.read() @@ -732,6 +732,7 @@ Creating a compressed file: .. code-block:: python from compression import zstd + data = b"Insert Data Here" with zstd.open("file.zst", "w") as f: f.write(data) @@ -741,6 +742,7 @@ Compressing data in memory: .. code-block:: python from compression import zstd + data_in = b"Insert Data Here" data_out = zstd.compress(data_in) @@ -749,6 +751,7 @@ Incremental compression: .. code-block:: python from compression import zstd + comp = zstd.ZstdCompressor() out1 = comp.compress(b"Some data\n") out2 = comp.compress(b"Another piece of data\n") @@ -762,6 +765,7 @@ Writing compressed data to an already-open file: .. 
code-block:: python from compression import zstd + with open("file.zst", "wb") as f: f.write(b"This data will not be compressed\n") with zstd.open(f, "w") as zstf: @@ -773,6 +777,7 @@ Creating a compressed file using compression parameters: .. code-block:: python from compression import zstd + options = { zstd.CompressionParameter.checksum_flag: 1 } From 987bd2788c8d19df316bc445aa1e92beb55af818 Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Tue, 13 May 2025 18:47:46 -0700 Subject: [PATCH 09/22] Don't reference self when referring to items Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> --- Doc/library/compression.zstd.rst | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 2345afe86b6f32..cdf19e51d4ab5e 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -102,7 +102,7 @@ Reading and writing compressed files When opening a file for reading, the *options* argument can be a dictionary providing advanced decompression parameters, see :class:`DecompressionParameter` for detailed information about supported - parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be + parameters. The *zstd_dict* argument is a :class:`!ZstdDict` instance to be used during decompression. When opening a file for reading, the *level* argument should not be used. @@ -111,13 +111,13 @@ Reading and writing compressed files :class:`CompressionParameter` for detailed information about supported parameters. The *level* argument is the compression level to use when writing compressed data. Only one of *level* or *options* may be passed. The - *zstd_dict* argument is a :class:`ZstdDict` instance to be used during + *zstd_dict* argument is a :class:`!ZstdDict` instance to be used during compression. 
When opening a file for writing, the *options*, *zstd_dict* and *level* arguments have the same meanings as for :class:`ZstdCompressor`. - :class:`ZstdFile` supports all the members specified by + :class:`!ZstdFile` supports all the members specified by :class:`io.BufferedIOBase`, except for :meth:`~io.BufferedIOBase.detach` and :meth:`~io.IOBase.truncate`. Iteration and the :keyword:`with` statement are supported. @@ -231,7 +231,7 @@ Compressing and decompressing data in memory Compress *data* (a :term:`bytes-like object`), returning a :class:`bytes` object if possible, or an empty byte string otherwise. Some of *data* may be buffered internally, for use in later calls to - :meth:`~.compress` and :meth:`~.flush`. The + :meth:`!compress` and :meth:`~.flush`. The returned data should be concatenated with the output of any previous calls to :meth:`~.compress`. @@ -273,15 +273,15 @@ Compressing and decompressing data in memory compressed frames, unlike the :func:`decompress` function and :class:`ZstdFile` class. To decompress a multi-frame input, you should use :func:`decompress`, :class:`ZstdFile` if working with a - :term:`file object`, or multiple :class:`ZstdDecompressor` instances. + :term:`file object`, or multiple :class:`!ZstdDecompressor` instances. .. method:: decompress(data, max_length=-1) Decompress *data* (a :term:`bytes-like object`), returning uncompressed data as bytes. Some of *data* may be buffered - internally, for use in later calls to :meth:`~.decompress`. + internally, for use in later calls to :meth:`!decompress`. The returned data should be concatenated with the output of any previous - calls to :meth:`~.decompress`. + calls to :meth:`!decompress`. If *max_length* is non-negative, returns at most *max_length* bytes of decompressed data. If this limit is reached and further @@ -395,7 +395,7 @@ Zstandard dictionaries .. 
note:: - The meaning of ``0`` for :attr:`ZstdDict.dict_id` is different from + The meaning of ``0`` for :attr:`!ZstdDict.dict_id` is different from the ``dictionary_id`` argument to the :func:`get_frame_info` function. @@ -598,7 +598,7 @@ Advanced parameter control .. attribute:: nb_workers Select how many threads will be spawned to compress in parallel. When - :attr:`~.nb_workers` >= 1, enables multi-threaded compression, 1 + :attr:`!nb_workers` >= 1, enables multi-threaded compression, 1 means "1-thread multi-threaded mode". More workers improve speed, but also increase memory usage and slightly reduce compression ratio. From 615ed7f7e0a9e32745ba183a19024b167a0fa9ae Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Tue, 13 May 2025 20:39:26 -0700 Subject: [PATCH 10/22] Updates to respond to review - Rewrite the digested/undigested dict section - Add a section for exceptions - wrap lines to ~80 chars - clarify that an exception is raised if level is passed to a decompressor - make quotes in docs for open vs ZstdFile consistent - Remove currently and repeated "note" --- Doc/library/compression.zstd.rst | 147 +++++++++++++++---------------- 1 file changed, 70 insertions(+), 77 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index cdf19e51d4ab5e..7db68c89f14ffd 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -13,9 +13,9 @@ This module provides classes and functions for compressing and decompressing data using the Zstandard (or *zstd*) compression algorithm. Also -included is a file interface that supports reading and writing the contents of ``.zst`` files. -files created by the :program:`zstd` utility, as well as raw zstd compressed -streams. +included is a file interface that supports reading and writing the contents of +``.zst`` files created by the :program:`zstd` utility, as well as raw zstd +compressed streams. 
The :mod:`!compression.zstd` module contains: @@ -31,6 +31,9 @@ The :mod:`!compression.zstd` module contains: :class:`Strategy` classes for setting advanced (de)compression parameters. +Exceptions +---------- + .. exception:: ZstdError This exception is raised when an error occurs during compression or @@ -52,16 +55,17 @@ Reading and writing compressed files to read from or write to. The mode argument can be either ``'r'`` for reading (default), ``'w'`` for - overwriting, ``'a'`` for appending, or ``'x'`` for exclusive creation. These can - equivalently be given as ``'rb'``, ``'wb'``, ``'ab'``, and ``'xb'`` respectively. You may - also open in text mode with ``'rt'``, ``'wt'``, ``'at'``, and ``'xt'`` respectively. + overwriting, ``'a'`` for appending, or ``'x'`` for exclusive creation. These + can equivalently be given as ``'rb'``, ``'wb'``, ``'ab'``, and ``'xb'`` + respectively. You may also open in text mode with ``'rt'``, ``'wt'``, + ``'at'``, and ``'xt'`` respectively. When opening a file for reading, the *options* argument can be a dictionary providing advanced decompression parameters; see :class:`DecompressionParameter` for detailed information about supported parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be - used during decompression. When opening a file for reading, the *level* - argument should not be used. + used during decompression. When opening a file for reading, if the *level* + argument is passed a :exc:`!TypeError` will be raised. When opening a file for writing, the *options* argument can be a dictionary providing advanced decompression parameters; see @@ -76,11 +80,12 @@ Reading and writing compressed files *encoding*, *errors*, and *newline* parameters must not be provided. In text mode, a :class:`ZstdFile` object is created, and wrapped in an - :class:`io.TextIOWrapper` instance with the specified encoding, error handling - behavior, and line endings. 
+ :class:`io.TextIOWrapper` instance with the specified encoding, error + handling behavior, and line endings. -.. class:: ZstdFile(file, /, mode='r', *, level=None, options=None, zstd_dict=None) +.. class:: ZstdFile(file, /, mode='r', *, level=None, options=None, \ + zstd_dict=None) Open a Zstandard-compressed file in binary mode. @@ -91,20 +96,20 @@ Reading and writing compressed files wrapping an existing file object, the wrapped file will not be closed when the :class:`ZstdFile` is closed. - The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for - overwriting, ``"x"`` for exclusive creation, or ``"a"`` for appending. These - can equivalently be given as ``"rb"``, ``"wb"``, ``"xb"`` and ``"ab"`` + The *mode* argument can be either ``'r'`` for reading (default), ``'w'`` for + overwriting, ``'x'`` for exclusive creation, or ``'a'`` for appending. These + can equivalently be given as ``'rb'``, ``'wb'``, ``'xb'`` and ``'ab'`` respectively. If *file* is a file object (rather than an actual file name), a mode of - ``"w"`` does not truncate the file, and is instead equivalent to ``"a"``. + ``'w'`` does not truncate the file, and is instead equivalent to ``'a'``. When opening a file for reading, the *options* argument can be a dictionary providing advanced decompression parameters, see :class:`DecompressionParameter` for detailed information about supported parameters. The *zstd_dict* argument is a :class:`!ZstdDict` instance to be - used during decompression. When opening a file for reading, the *level* - argument should not be used. + used during decompression. When opening a file for reading, if the *level* + argument is passed a :exc:`!TypeError` will be raised. When opening a file for writing, the *options* argument can be a dictionary providing advanced decompression parameters, see @@ -132,8 +137,8 @@ Reading and writing compressed files .. 
note:: While calling :meth:`peek` does not change the file position of the :class:`ZstdFile`, it may change the position of the underlying - file object (for example, if the :class:`ZstdFile` was constructed by passing a - file object for *filename*). + file object (for example, if the :class:`ZstdFile` was constructed by + passing a file object for *file*). .. attribute:: mode @@ -307,7 +312,7 @@ Compressing and decompressing data in memory Data found after the end of the compressed stream. - Before the end of the stream is reached, this will be ``b""``. + Before the end of the stream is reached, this will be ``b''``. .. attribute:: needs_input @@ -377,6 +382,40 @@ Zstandard dictionaries is an ordinary Zstandard dictionary, created from Zstandard functions, for example, :func:`train_dict` or the ``zstd`` CLI. + When passing a :class:`!ZstdDict` to a function, the + :attr:`!as_digested_dict` and :attr:`!as_undigested_dict` attributes can + control how the dictionary is loaded by passing them as the ``zstd_dict`` + argument, for example, ``compress(data, zstd_dict=zd.as_digested_dict)``. + Digesting a dictionary is a costly operation that occurs when loading a + Zstandard dictionary. When making multiple calls to compression or + decompression, passing a digested dictionary will reduce the overhead of + loading the dictionary. + + .. list-table:: Difference for compression + :widths: 10 14 10 + :header-rows: 1 + + * - + - Digested dictionary + - Undigested dictionary + * - Advanced parameters of the compressor which may be overridden by + the dictionary's parameters + - ``window_log``, ``hash_log``, ``chain_log``, ``search_log``, + ``min_match``, ``target_length``, ``strategy``, + ``enable_long_distance_matching``, ``ldm_hash_log``, + ``ldm_min_match``, ``ldm_bucket_size_log``, ``ldm_hash_rate_log``, + and some non-public parameters. + - None + * - :class:`!ZstdDict` internally caches the dictionary + - Yes. 
It's faster when loading a digested dictionary again with the + same compression level. + - No. If you wish to load an undigested dictionary multiple times, + consider reusing a compressor object. + + If passing a :class:`!ZstdDict` without any attribute, an undigested + dictionary is passed by default when compressing and a digested dictionary + is passed by default when decompressing. + .. attribute:: dict_content The content of the Zstandard dictionary, a ``bytes`` object. It's the @@ -407,53 +446,6 @@ Zstandard dictionaries Load as an undigested dictionary. - Digesting a dictionary is a costly operation. These two attributes can - control how the dictionary is loaded to the compressor, by passing them - as the ``zstd_dict`` argument, for example, - ``compress(data, zstd_dict=zd.as_digested_dict)``. - - If don't use one of these attributes, an **undigested** dictionary is - passed by default. - - .. list-table:: Difference for compression - :widths: 12 12 12 - :header-rows: 1 - - * - - - | Digested - | dictionary - - | Undigested - | dictionary - * - | Some advanced - | parameters of the - | compressor may - | be overridden - | by dictionary's - | parameters - - | ``window_log``, ``hash_log``, - | ``chain_log``, ``search_log``, - | ``min_match``, ``target_length``, - | ``strategy``, - | ``enable_long_distance_matching``, - | ``ldm_hash_log``, ``ldm_min_match``, - | ``ldm_bucket_size_log``, - | ``ldm_hash_rate_log``, and some - | non-public parameters. - - No - * - | ZstdDict internally - | caches the dictionary - - | Yes. It's faster when - | loading a digested - | dictionary again with the same - | compression level. - - | No. If you wish to load an undigested - | dictionary multiple times, - | consider reusing a - | compressor object. - - A **digested** dictionary is used for decompression by default, which - is faster when loaded multiple times. - Advanced parameter control -------------------------- @@ -482,14 +474,14 @@ Advanced parameter control .. 
attribute:: compression_level A high-level means of setting other compression parameters that affect - the speed and ratio of compressing data. Setting the level to zero uses the - default :attr:`COMPRESSION_LEVEL_DEFAULT`. + the speed and ratio of compressing data. Setting the level to zero uses + the default :attr:`COMPRESSION_LEVEL_DEFAULT`. .. attribute:: window_log Maximum allowed back-reference distance the compressor can use when - compressing data, expressed as power of two, ``1 << window_log`` bytes. This - parameter greatly influences the memory usage of compression. Higher + compressing data, expressed as power of two, ``1 << window_log`` bytes. + This parameter greatly influences the memory usage of compression. Higher values require more memory but gain better compression values. .. attribute:: hash_log @@ -519,9 +511,9 @@ Advanced parameter control Minimum size of searched matches. Larger values increase compression and decompression speed, but decrease ratio. Note that Zstandard can still find matches of smaller size, it just tweaks its search algorithm to look - for this size and larger. Note that currently, for all strategies - < :attr:`~Strategy.btopt`, the effective minimum is ``4``, for all - strategies > :attr:`~Strategy.fast`, the effective maximum is ``6``. + for this size and larger. For all strategies < :attr:`~Strategy.btopt`, + the effective minimum is ``4``, for all strategies + > :attr:`~Strategy.fast`, the effective maximum is ``6``. .. attribute:: target_length @@ -599,7 +591,7 @@ Advanced parameter control Select how many threads will be spawned to compress in parallel. When :attr:`!nb_workers` >= 1, enables multi-threaded compression, 1 - means "1-thread multi-threaded mode". More workers improve speed, but + means "one-thread multi-threaded mode". More workers improve speed, but also increase memory usage and slightly reduce compression ratio. .. attribute:: job_size @@ -655,8 +647,9 @@ Advanced parameter control .. 
note:: - The values of attributes of :class:`Strategy` are not necessarily stable - between zstd versions. Only the ordering may be relied upon. + The values of attributes of :class:`!Strategy` are not necessarily stable + across zstd versions. Only the ordering of the attributes may be relied + upon. The following strategies are available: From 0f7bc05fd5b413697ebd093f3ffe956c399c9bb2 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Wed, 14 May 2025 19:00:47 -0700 Subject: [PATCH 11/22] Remove outdated paragraph --- Doc/library/compression.zstd.rst | 3 --- 1 file changed, 3 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 7db68c89f14ffd..9743020c923c96 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -119,9 +119,6 @@ Reading and writing compressed files *zstd_dict* argument is a :class:`!ZstdDict` instance to be used during compression. - When opening a file for writing, the *options*, *zstd_dict* and *level* - arguments have the same meanings as for :class:`ZstdCompressor`. - :class:`!ZstdFile` supports all the members specified by :class:`io.BufferedIOBase`, except for :meth:`~io.BufferedIOBase.detach` and :meth:`~io.IOBase.truncate`. From 44173f33b84d5da0c8aa6271a4912ca351b435ee Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Fri, 16 May 2025 11:53:01 -0700 Subject: [PATCH 12/22] Remove Zstandard dictionary after ZstdDict Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> --- Doc/library/compression.zstd.rst | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 9743020c923c96..94cbf053afccdf 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -165,8 +165,8 @@ Compressing and decompressing data in memory parameters. 
The valid keys and values for compression parameters are documented as part of the :class:`CompressionParameter` documentation. - The *zstd_dict* argument is an instance of :class:`ZstdDict`, a Zstandard - dictionary, containing trained data to improve compression efficiency. The + The *zstd_dict* argument is an instance of :class:`ZstdDict` + containing trained data to improve compression efficiency. The function :func:`train_dict` can be used to generate a Zstandard dictionary. @@ -180,8 +180,8 @@ Compressing and decompressing data in memory parameters are documented as part of the :class:`DecompressionParameter` documentation. - The *zstd_dict* argument is an instance of :class:`ZstdDict`, a Zstandard - dictionary, containing trained data used during compression. This must be + The *zstd_dict* argument is an instance of :class:`ZstdDict` + containing trained data used during compression. This must be the same Zstandard dictionary used during compression. If *data* is the concatenation of multiple distinct compressed frames, @@ -205,8 +205,8 @@ Compressing and decompressing data in memory parameters. The valid keys and values for compression parameters are documented as part of the :class:`CompressionParameter` documentation. - The *zstd_dict* argument is an instance of :class:`ZstdDict`, a Zstandard - dictionary, containing trained data to improve compression efficiency. The + The *zstd_dict* argument is an instance of :class:`ZstdDict` + containing trained data to improve compression efficiency. The function :func:`train_dict` can be used to generate a Zstandard dictionary. .. attribute:: CONTINUE @@ -266,8 +266,8 @@ Compressing and decompressing data in memory parameters are documented as part of the :class:`DecompressionParameter` documentation. - The *zstd_dict* argument is an instance of :class:`ZstdDict`, a Zstandard - dictionary, containing trained data used during compression. 
This must be + The *zstd_dict* argument is an instance of :class:`ZstdDict` + containing trained data used during compression. This must be the same Zstandard dictionary used during compression. .. note:: From 8bd550025b394f671b24e153675de6d2909909f3 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 18 May 2025 15:08:04 -0400 Subject: [PATCH 13/22] Rewrite introduction to compression package to be more timeless --- Doc/library/compression.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/Doc/library/compression.rst b/Doc/library/compression.rst index 427c6c42154ca1..618b4a3c2bd170 100644 --- a/Doc/library/compression.rst +++ b/Doc/library/compression.rst @@ -3,12 +3,12 @@ The :mod:`!compression` package .. versionadded:: 3.14 -.. attention:: - - The :mod:`!compression` package is the new location for the data compression - modules in the standard library, listed below. The existing modules are not - deprecated and will not be removed before Python 3.19. The new ``compression.*`` - import names are encouraged for use where practical. +The :mod:`!compression` package contains the canonical compression modules +containing interfaces to several different compression algorithms. Some of +these modules have historically been available as separate modules; those will +continue to be available under their original names for compatibility reasons, +and will not be removed without a deprecation cycle. The use of modules in +:mod:`!compression` is encouraged where practical. 
* :mod:`!compression.bz2` -- Re-exports :mod:`bz2` * :mod:`!compression.gzip` -- Re-exports :mod:`gzip` From 24f376121f4a52a2346c6c5db4527ad8ff699390 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 18 May 2025 15:09:30 -0400 Subject: [PATCH 14/22] Remove content_size_flag --- Doc/library/compression.zstd.rst | 5 ----- 1 file changed, 5 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 7db68c89f14ffd..8d901fad00418d 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -571,11 +571,6 @@ Advanced parameter control hash table. Larger values improve compression speed. Deviating far from the default value will likely result in a compression ratio decrease. - .. attribute:: content_size_flag - - Uncompressed content size will be written into frame header whenever - known. This flag currently has no effect. - .. attribute:: checksum_flag A four-byte checksum using XXHash64 of the uncompressed content is written From e61e9a14c923f4fc49e2aa06d6760960d8c86e71 Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Mon, 19 May 2025 08:45:41 -0700 Subject: [PATCH 15/22] Apply suggestions from Sumana and Stan Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> Co-authored-by: Sumana Harihareswara --- Doc/library/compression.zstd.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 291e9a2cf9a614..0578453c5d623f 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -49,7 +49,7 @@ Reading and writing compressed files Open a Zstandard-compressed file in binary or text mode, returning a :term:`file object`. 
- The *file* argument can be either an actual file name (given as a + The *file* argument can be either a file name (given as a :class:`str`, :class:`bytes` or :term:`path-like ` object), in which case the named file is opened, or it can be an existing file object to read from or write to. @@ -65,7 +65,7 @@ Reading and writing compressed files :class:`DecompressionParameter` for detailed information about supported parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be used during decompression. When opening a file for reading, if the *level* - argument is passed a :exc:`!TypeError` will be raised. + argument is passed, a :exc:`!TypeError` will be raised. When opening a file for writing, the *options* argument can be a dictionary providing advanced decompression parameters; see @@ -241,7 +241,7 @@ Compressing and decompressing data in memory :attr:`~.CONTINUE`, :attr:`~.FLUSH_BLOCK`, or :attr:`~.FLUSH_FRAME`. - When you have finished providing data to the compressor, call the + When all data has been provided to the compressor, call the :meth:`~.flush` method to finish the compression process. .. method:: flush(mode) @@ -767,4 +767,4 @@ Creating a compressed file using compression parameters: zstd.CompressionParameter.checksum_flag: 1 } with zstd.open("file.zst", "w", options=options) as f: - f.write(b"blah blah blah") + f.write(b"Mind if I squeeze in?") From 11498327fe0162fbb2ecf0f10f8b0bf53f9c3a85 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Tue, 20 May 2025 12:01:29 -0400 Subject: [PATCH 16/22] Remove ref to Meta and clean up mode usage --- Doc/library/compression.zstd.rst | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 0578453c5d623f..ab82ccfcd5d126 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -3,7 +3,7 @@ .. 
module:: compression.zstd :synopsis: Low-level interface to compression and decompression routines in - Meta's zstd library + the zstd library. .. versionadded:: 3.14 @@ -54,9 +54,9 @@ Reading and writing compressed files in which case the named file is opened, or it can be an existing file object to read from or write to. - The mode argument can be either ``'r'`` for reading (default), ``'w'`` for - overwriting, ``'a'`` for appending, or ``'x'`` for exclusive creation. These - can equivalently be given as ``'rb'``, ``'wb'``, ``'ab'``, and ``'xb'`` + The mode argument can be either ``'rb'`` for reading (default), ``'wb'`` for + overwriting, ``'ab'`` for appending, or ``'xb'`` for exclusive creation. + These can equivalently be given as ``'r'``, ``'w'``, ``'a'``, and ``'x'`` respectively. You may also open in text mode with ``'rt'``, ``'wt'``, ``'at'``, and ``'xt'`` respectively. @@ -84,7 +84,7 @@ Reading and writing compressed files handling behavior, and line endings. -.. class:: ZstdFile(file, /, mode='r', *, level=None, options=None, \ +.. class:: ZstdFile(file, /, mode='rb', *, level=None, options=None, \ zstd_dict=None) Open a Zstandard-compressed file in binary mode. @@ -96,9 +96,9 @@ Reading and writing compressed files wrapping an existing file object, the wrapped file will not be closed when the :class:`ZstdFile` is closed. - The *mode* argument can be either ``'r'`` for reading (default), ``'w'`` for - overwriting, ``'x'`` for exclusive creation, or ``'a'`` for appending. These - can equivalently be given as ``'rb'``, ``'wb'``, ``'xb'`` and ``'ab'`` + The *mode* argument can be either ``'rb'`` for reading (default), ``'wb'`` + for overwriting, ``'xb'`` for exclusive creation, or ``'ab'`` for appending. + These can equivalently be given as ``'r'``, ``'w'``, ``'x'`` and ``'a'`` respectively. 
If *file* is a file object (rather than an actual file name), a mode of From 71ed7c3b5efdd6ac77b5ad8e277be9a05324f36e Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Tue, 20 May 2025 11:07:02 -0700 Subject: [PATCH 17/22] Apply suggestions from vadmium Co-authored-by: Martin Panter --- Doc/library/compression.zstd.rst | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index ab82ccfcd5d126..eeb26e2904f3de 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -65,7 +65,7 @@ Reading and writing compressed files :class:`DecompressionParameter` for detailed information about supported parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be used during decompression. When opening a file for reading, if the *level* - argument is passed, a :exc:`!TypeError` will be raised. + argument is not None, a :exc:`!TypeError` will be raised. When opening a file for writing, the *options* argument can be a dictionary providing advanced decompression parameters; see @@ -105,9 +105,9 @@ Reading and writing compressed files ``'w'`` does not truncate the file, and is instead equivalent to ``'a'``. When opening a file for reading, the *options* argument can be a dictionary - providing advanced decompression parameters, see + providing advanced decompression parameters; see :class:`DecompressionParameter` for detailed information about supported - parameters. The *zstd_dict* argument is a :class:`!ZstdDict` instance to be + parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be used during decompression. When opening a file for reading, if the *level* argument is passed a :exc:`!TypeError` will be raised. @@ -205,7 +205,7 @@ Compressing and decompressing data in memory parameters. The valid keys and values for compression parameters are documented as part of the :class:`CompressionParameter` documentation. 
- The *zstd_dict* argument is an instance of :class:`ZstdDict` + The *zstd_dict* argument is an optional instance of :class:`ZstdDict` containing trained data to improve compression efficiency. The function :func:`train_dict` can be used to generate a Zstandard dictionary. @@ -285,7 +285,7 @@ Compressing and decompressing data in memory The returned data should be concatenated with the output of any previous calls to :meth:`!decompress`. - If *max_length* is non-negative, returns at most *max_length* + If *max_length* is non-negative, the method returns at most *max_length* bytes of decompressed data. If this limit is reached and further output can be produced, the :attr:`~.needs_input` attribute will be set to ``False``. In this case, the next call to @@ -314,7 +314,7 @@ Compressing and decompressing data in memory .. attribute:: needs_input ``False`` if the :meth:`.decompress` method can provide more - decompressed data before requiring new uncompressed input. + decompressed data before requiring new compressed input. Zstandard dictionaries @@ -330,7 +330,7 @@ Zstandard dictionaries files), Zstandard dictionaries can improve compression ratios and speed significantly. - The *samples* argument (an iterable of :class:`bytes`), is the population of + The *samples* argument (an iterable of :class:`bytes` objects), is the population of samples used to train the Zstandard dictionary. The *dict_size* argument, an integer, is the maximum size (in bytes) the @@ -421,7 +421,7 @@ Zstandard dictionaries .. attribute:: dict_id - Identifier of the Zstandard dictionary, an int value between zero and . + Identifier of the Zstandard dictionary, a non-negative int value. Non-zero means the dictionary is ordinary, created by Zstandard functions and following the Zstandard format. @@ -437,7 +437,7 @@ Zstandard dictionaries .. attribute:: as_digested_dict - Load as a digested dictionary, see below. + Load as a digested dictionary. .. 
attribute:: as_undigested_dict @@ -450,7 +450,7 @@ Advanced parameter control .. class:: CompressionParameter() An :class:`~enum.IntEnum` containing the advanced compression parameter - names that can be used when compressing data. + keys that can be used when compressing data. The :meth:`~.bounds` method can be used on any attribute to get the valid values for that parameter. @@ -472,7 +472,7 @@ Advanced parameter control A high-level means of setting other compression parameters that affect the speed and ratio of compressing data. Setting the level to zero uses - the default :attr:`COMPRESSION_LEVEL_DEFAULT`. + :attr:`COMPRESSION_LEVEL_DEFAULT`. .. attribute:: window_log @@ -509,7 +509,7 @@ Advanced parameter control decompression speed, but decrease ratio. Note that Zstandard can still find matches of smaller size, it just tweaks its search algorithm to look for this size and larger. For all strategies < :attr:`~Strategy.btopt`, - the effective minimum is ``4``, for all strategies + the effective minimum is ``4``; for all strategies > :attr:`~Strategy.fast`, the effective maximum is ``6``. .. attribute:: target_length @@ -621,13 +621,13 @@ Advanced parameter control parameter. This method should be called on the attribute you wish to retrieve the bounds of. For example, to get the valid values for :attr:`~.window_log_max`, one may check the result of - ``CompressionParameter.window_log_max.bounds()``. + ``DecompressionParameter.window_log_max.bounds()``. Both the lower and upper bounds are inclusive. .. attribute:: window_log_max - The power of two maximum size of the window used during decompression. + The base-two logarithm of the maximum size of the window used during decompression. This can be useful to limit the amount of memory used when decompressing data. 
From 2f895ddb70aae01609eab81089eb93c38b157bf7 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Tue, 20 May 2025 16:08:03 -0400 Subject: [PATCH 18/22] Many updates to respond to review - improve argument descriptions - improve clarity about the meaning of parameter value 0s - clarify behavior of compress/flush - many more clarifications/wording improvements --- Doc/library/compression.zstd.rst | 188 +++++++++++++++++++------------ 1 file changed, 116 insertions(+), 72 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index eeb26e2904f3de..6422976f791e1a 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -50,9 +50,9 @@ Reading and writing compressed files :term:`file object`. The *file* argument can be either a file name (given as a - :class:`str`, :class:`bytes` or :term:`path-like ` object), - in which case the named file is opened, or it can be an existing file object - to read from or write to. + :class:`str`, :class:`bytes` or :term:`path-like ` + object), in which case the named file is opened, or it can be an existing + file object to read from or write to. The mode argument can be either ``'rb'`` for reading (default), ``'wb'`` for overwriting, ``'ab'`` for appending, or ``'xb'`` for exclusive creation. @@ -60,19 +60,19 @@ Reading and writing compressed files respectively. You may also open in text mode with ``'rt'``, ``'wt'``, ``'at'``, and ``'xt'`` respectively. - When opening a file for reading, the *options* argument can be a dictionary - providing advanced decompression parameters; see - :class:`DecompressionParameter` for detailed information about supported + When reading, the *options* argument can be a dictionary providing advanced + decompression parameters; see :class:`DecompressionParameter` for detailed + information about supported parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be - used during decompression. 
When opening a file for reading, if the *level* + used during decompression. When reading, if the *level* argument is not None, a :exc:`!TypeError` will be raised. - When opening a file for writing, the *options* argument can be a dictionary + When writing, the *options* argument can be a dictionary providing advanced decompression parameters; see :class:`CompressionParameter` for detailed information about supported parameters. The *level* argument is the compression level to use when - writing compressed data. Only one of *level* or *options* may be passed. The - *zstd_dict* argument is a :class:`ZstdDict` instance to be used during + writing compressed data. Only one of *level* or *options* may be non-None. + The *zstd_dict* argument is a :class:`ZstdDict` instance to be used during compression. In binary mode, this function is equivalent to the :class:`ZstdFile` @@ -104,19 +104,19 @@ Reading and writing compressed files If *file* is a file object (rather than an actual file name), a mode of ``'w'`` does not truncate the file, and is instead equivalent to ``'a'``. - When opening a file for reading, the *options* argument can be a dictionary + When reading, the *options* argument can be a dictionary providing advanced decompression parameters; see :class:`DecompressionParameter` for detailed information about supported parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be - used during decompression. When opening a file for reading, if the *level* - argument is passed a :exc:`!TypeError` will be raised. + used during decompression. When reading, if the *level* + argument is not None, a :exc:`!TypeError` will be raised. - When opening a file for writing, the *options* argument can be a dictionary - providing advanced decompression parameters, see + When writing, the *options* argument can be a dictionary + providing advanced compression parameters; see :class:`CompressionParameter` for detailed information about supported parameters.
The *level* argument is the compression level to use when writing compressed data. Only one of *level* or *options* may be passed. The - *zstd_dict* argument is a :class:`!ZstdDict` instance to be used during + *zstd_dict* argument is a :class:`ZstdDict` instance to be used during compression. :class:`!ZstdFile` supports all the members specified by @@ -161,9 +161,9 @@ Compressing and decompressing data in memory needed, this argument must be omitted and in the *options* dictionary the :attr:`CompressionParameter.compression_level` parameter should be set. - The *options* argument is a Python dictionary containing advanced compression - parameters. The valid keys and values for compression parameters are - documented as part of the :class:`CompressionParameter` documentation. + The *options* argument is a Python dictionary containing advanced + compression parameters. The valid keys and values for compression parameters + are documented as part of the :class:`CompressionParameter` documentation. The *zstd_dict* argument is an instance of :class:`ZstdDict` containing trained data to improve compression efficiency. The @@ -190,7 +190,8 @@ Compressing and decompressing data in memory .. class:: ZstdCompressor(level=None, options=None, zstd_dict=None) - Create a compressor object, which can be used to compress data incrementally. + Create a compressor object, which can be used to compress data + incrementally. For a more convenient way of compressing a single chunk of data, see the module-level function :func:`compress`. @@ -201,14 +202,42 @@ Compressing and decompressing data in memory needed, this argument must be omitted and in the *options* dictionary the :attr:`CompressionParameter.compression_level` parameter should be set. - The *options* argument is a Python dictionary containing advanced compression - parameters. The valid keys and values for compression parameters are - documented as part of the :class:`CompressionParameter` documentation. 
+ The *options* argument is a Python dictionary containing advanced + compression parameters. The valid keys and values for compression parameters + are documented as part of the :class:`CompressionParameter` documentation. The *zstd_dict* argument is an optional instance of :class:`ZstdDict` containing trained data to improve compression efficiency. The function :func:`train_dict` can be used to generate a Zstandard dictionary. + + .. method:: compress(data, mode=ZstdCompressor.CONTINUE) + + Compress *data* (a :term:`bytes-like object`), returning a :class:`bytes` + object with compressed data if possible, or otherwise an empty + :class:`!bytes` object. Some of *data* may be buffered internally, for + use in later calls to :meth:`!compress` and :meth:`~.flush`. The returned + data should be concatenated with the output of any previous calls to + :meth:`~.compress`. + + The *mode* argument is a :class:`ZstdCompressor` attribute, either + :attr:`~.CONTINUE`, :attr:`~.FLUSH_BLOCK`, + or :attr:`~.FLUSH_FRAME`. + + When all data has been provided to the compressor, call the + :meth:`~.flush` method to finish the compression process. If + :meth:`~.compress` is called with *mode* set to :attr:`~.FLUSH_FRAME`, + :meth:`~.flush` should not be called, as it would write out a new empty + frame. + + .. method:: flush(mode=ZstdCompressor.FLUSH_FRAME) + + Finish the compression process, returning a :class:`bytes` object + containing any data stored in the compressor's internal buffers. + + The *mode* argument is a :class:`ZstdCompressor` attribute, either + :attr:`~.FLUSH_BLOCK` or :attr:`~.FLUSH_FRAME`. + .. attribute:: CONTINUE Collect more data for compression, which may or may not generate output @@ -228,30 +257,6 @@ Compressing and decompressing data in memory :meth:`~.compress` will be written into a new frame and *cannot* reference past data. - ..
method:: compress(data, mode=ZstdCompressor.CONTINUE) - - Compress *data* (a :term:`bytes-like object`), returning a :class:`bytes` - object if possible, or an empty byte string otherwise. Some of *data* may - be buffered internally, for use in later calls to - :meth:`!compress` and :meth:`~.flush`. The - returned data should be concatenated with the output of any previous calls - to :meth:`~.compress`. - - The *mode* argument is a :class:`ZstdCompressor` attribute, either - :attr:`~.CONTINUE`, :attr:`~.FLUSH_BLOCK`, - or :attr:`~.FLUSH_FRAME`. - - When all data has been provided to the compressor, call the - :meth:`~.flush` method to finish the compression process. - - .. method:: flush(mode) - - Finish the compression process, returning a :class:`bytes` object - containing any data stored in the compressor's internal buffers. - - The *mode* argument is a :class:`ZstdCompressor` attribute, either - :attr:`~.FLUSH_BLOCK`, or :attr:`~.FLUSH_FRAME`. - .. class:: ZstdDecompressor(zstd_dict=None, options=None) @@ -325,13 +330,13 @@ Zstandard dictionaries Train a Zstandard dictionary, returning a :class:`ZstdDict` instance. Zstandard dictionaries enable more efficient compression of smaller sizes - of data, which is traditionally difficult to compress due to less repetition. - If you are compressing multiple similar groups of data (such as similar - files), Zstandard dictionaries can improve compression ratios and speed - significantly. + of data, which is traditionally difficult to compress due to less + repetition. If you are compressing multiple similar groups of data (such as + similar files), Zstandard dictionaries can improve compression ratios and + speed significantly. - The *samples* argument (an iterable of :class:`bytes` objects), is the population of - samples used to train the Zstandard dictionary. + The *samples* argument (an iterable of :class:`bytes` objects) is the + population of samples used to train the Zstandard dictionary.
The *dict_size* argument, an integer, is the maximum size (in bytes) the Zstandard dictionary should be. The Zstandard documentation suggests an @@ -351,8 +356,8 @@ Zstandard dictionaries The *zstd_dict* argument is a :class:`ZstdDict` instance with the :attr:`~ZstdDict.dict_content` containing the raw dictionary contents. - The *samples* argument (an iterable of bytes), contains sample data for - generating the Zstandard dictionary. + The *samples* argument (an iterable of :class:`bytes` objects) contains + sample data for generating the Zstandard dictionary. The *dict_size* argument, an integer, is the maximum size (in bytes) the Zstandard dictionary should be. See :func:`train_dict` for @@ -377,7 +382,7 @@ Zstandard dictionaries meaning of *dict_content*. ``True`` means *dict_content* is a "raw content" dictionary, without any format restrictions. ``False`` means *dict_content* is an ordinary Zstandard dictionary, created from Zstandard functions, - for example, :func:`train_dict` or the ``zstd`` CLI. + for example, :func:`train_dict` or the external :program:`zstd` CLI. When passing a :class:`!ZstdDict` to a function, the :attr:`!as_digested_dict` and :attr:`!as_undigested_dict` attributes can @@ -411,7 +416,7 @@ Zstandard dictionaries If passing a :class:`!ZstdDict` without any attribute, an undigested dictionary is passed by default when compressing and a digested dictionary - is passed by default when decompressing. + is generated if necessary and passed by default when decompressing. .. attribute:: dict_content @@ -431,8 +436,8 @@ Zstandard dictionaries .. note:: - The meaning of ``0`` for :attr:`!ZstdDict.dict_id` is different from - the ``dictionary_id`` argument to the :func:`get_frame_info` + The meaning of ``0`` for :attr:`!ZstdDict.dict_id` is different + from the ``dictionary_id`` attribute of the result of the :func:`get_frame_info` function. ..
attribute:: as_digested_dict @@ -455,8 +460,8 @@ Advanced parameter control The :meth:`~.bounds` method can be used on any attribute to get the valid values for that parameter. - Setting any parameter to zero causes zstd to dynamically select a value - for that parameter based on other compression parameters' settings. + Parameters are optional; any omitted parameter will have its value selected + automatically. .. method:: bounds() @@ -481,6 +486,8 @@ Advanced parameter control This parameter greatly influences the memory usage of compression. Higher values require more memory but gain better compression values. + A value of zero causes the value to be selected automatically. + .. attribute:: hash_log Size of the initial probe table, as a power of two. The resulting memory @@ -488,6 +495,8 @@ Advanced parameter control ratio of strategies <= :attr:`~Strategy.dfast`, and improve compression speed of strategies > :attr:`~Strategy.dfast`. + A value of zero causes the value to be selected automatically. + .. attribute:: chain_log Size of the multi-probe search table, as a power of two. The resulting @@ -497,12 +506,16 @@ Advanced parameter control :attr:`~Strategy.dfast` strategy, in which case it defines a secondary probe table. + A value of zero causes the value to be selected automatically. + .. attribute:: search_log Number of search attempts, as a power of two. More attempts result in better and slower compression. This parameter is useless for :attr:`~Strategy.fast` and :attr:`~Strategy.dfast` strategies. + A value of zero causes the value to be selected automatically. + .. attribute:: min_match Minimum size of searched matches. Larger values increase compression and @@ -512,6 +525,8 @@ Advanced parameter control the effective minimum is ``4``; for all strategies > :attr:`~Strategy.fast`, the effective maximum is ``6``. + A value of zero causes the value to be selected automatically. + ..
attribute:: target_length The impact of this field depends on the selected :class:`Strategy`. @@ -525,6 +540,8 @@ Advanced parameter control sampling. Larger values make compression faster, but with a worse compression ratio. + A value of zero causes the value to be selected automatically. + .. attribute:: strategy The higher the value of selected strategy, the more complex the @@ -539,6 +556,9 @@ Advanced parameter control inputs by finding large matches at greater distances. It increases memory usage and window size. + ``True`` or ``0`` enable long distance matching while ``False`` or ``1`` + disable it. + Enabling this parameter increases default :attr:`~CompressionParameter.window_log` to 128 MiB except when expressly set to a different value. This setting is enabled by default if @@ -551,34 +571,49 @@ Advanced parameter control values increase memory usage and compression ratio, but decrease compression speed. + A value of zero causes the value to be selected automatically. + .. attribute:: ldm_min_match Minimum match size for long distance matcher. Larger or too small values can often decrease the compression ratio. + A value of zero causes the value to be selected automatically. + .. attribute:: ldm_bucket_size_log Log size of each bucket in the long distance matcher hash table for collision resolution. Larger values improve collision resolution but decrease compression speed. + A value of zero causes the value to be selected automatically. + .. attribute:: ldm_hash_rate_log Frequency of inserting/looking up entries into the long distance matcher hash table. Larger values improve compression speed. Deviating far from the default value will likely result in a compression ratio decrease. + A value of zero causes the value to be selected automatically. + .. attribute:: checksum_flag - A four-byte checksum using XXHash64 of the uncompressed content is written - at the end of each frame. Zstandard's decompression code verifies the - checksum. 
If there is a mismatch a :class:`ZstdError` exception is + A four-byte checksum using XXHash64 of the uncompressed content is + written at the end of each frame. Zstandard's decompression code verifies + the checksum. If there is a mismatch a :class:`ZstdError` exception is raised. + + ``True`` or ``0`` enable checksum generation while ``False`` or ``1`` + disable it. + .. attribute:: dict_id_flag When compressing with a :class:`ZstdDict`, the dictionary's ID is written into the frame header. + ``True`` or ``0`` enable storing the dictionary ID while ``False`` or + ``1`` disable it. + .. attribute:: nb_workers Select how many threads will be spawned to compress in parallel. When @@ -586,6 +621,8 @@ Advanced parameter control means "one-thread multi-threaded mode". More workers improve speed, but also increase memory usage and slightly reduce compression ratio. + A value of zero disables multi-threading. + .. attribute:: job_size Size of a compression job, in bytes. This value is enforced only when @@ -593,6 +630,8 @@ Advanced parameter control completed in parallel, so this value can indirectly impact the number of active threads. + A value of zero causes the value to be selected automatically. + .. attribute:: overlap_log Sets how much data is reloaded from previous jobs (threads) for new jobs @@ -610,7 +649,8 @@ Advanced parameter control .. class:: DecompressionParameter() An :class:`~enum.IntEnum` containing the advanced decompression parameter - names that can be used when decompressing data. + keys that can be used when decompressing data. Parameters are optional; any + omitted parameter will have its value selected automatically. The :meth:`~.bounds` method can be used on any attribute to get the valid values for that parameter. @@ -627,9 +667,13 @@ Advanced parameter control .. attribute:: window_log_max - The base-two logarithm of the maximum size of the window used during decompression.
- This can be useful to limit the amount of memory used when decompressing - data. + The base-two logarithm of the maximum size of the window used during + decompression. This can be useful to limit the amount of memory used when + decompressing data. A larger maximum window size leads to faster + decompression. + + A value of zero causes the value to be selected automatically. + .. class:: Strategy() @@ -641,7 +685,7 @@ Advanced parameter control The values of attributes of :class:`!Strategy` are not necessarily stable across zstd versions. Only the ordering of the attributes may be relied - upon. + upon. The attributes are listed below in order. The following strategies are available: @@ -685,8 +729,8 @@ Miscellaneous An int object representing the Zstandard dictionary ID needed for decompressing the frame. ``0`` means the dictionary ID was not - recorded in the frame header, the frame may or may not need a dictionary - to be decoded, or the ID of such a dictionary is not specified. + recorded in the frame header. This may mean that a Zstandard dictionary + is not needed, or that the ID of a required dictionary was not recorded. .. 
attribute:: COMPRESSION_LEVEL_DEFAULT @@ -751,7 +795,7 @@ Writing compressed data to an already-open file: from compression import zstd - with open("file.zst", "wb") as f: + with open("myfile", "wb") as f: f.write(b"This data will not be compressed\n") with zstd.open(f, "w") as zstf: zstf.write(b"This *will* be compressed\n") From f25e6e778111256cbe5ceb4c3e5e7eda4c3f3d71 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Tue, 20 May 2025 16:30:25 -0400 Subject: [PATCH 19/22] Add examples to (De)compressionParameter --- Doc/library/compression.zstd.rst | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 6422976f791e1a..d7f33c7669d210 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -463,6 +463,16 @@ Advanced parameter control Parameters are optional; any omitted parameter will have its value selected automatically. + Example getting the lower and upper bounds of :attr:`~.compression_level`:: + + lower, upper = CompressionParameter.compression_level.bounds() + + Example setting the :attr:`~.window_log` to the maximum size:: + + _lower, upper = CompressionParameter.window_log.bounds() + options = {CompressionParameter.window_log: upper} + compress(b'venezuelan beaver cheese', options=options) + .. method:: bounds() Return the tuple of int bounds, ``(lower, upper)``, of a compression @@ -655,13 +665,20 @@ Advanced parameter control The :meth:`~.bounds` method can be used on any attribute to get the valid values for that parameter. + Example setting the :attr:`~.window_log_max` to the maximum size:: + + data = compress(b'Some very long buffer of bytes...') + + _lower, upper = DecompressionParameter.window_log_max.bounds() + + options = {DecompressionParameter.window_log_max: upper} + decompress(data, options=options) + ..
method:: bounds() Return the tuple of int bounds, ``(lower, upper)``, of a decompression parameter. This method should be called on the attribute you wish to - retrieve the bounds of. For example, to get the valid values for - :attr:`~.window_log_max`, one may check the result of - ``DecompressionParameter.window_log_max.bounds()``. + retrieve the bounds of. Both the lower and upper bounds are inclusive. From 9ff632056849d36f8db0e678be0f20fe4930e536 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Tue, 20 May 2025 16:41:00 -0400 Subject: [PATCH 20/22] Add reference to zstd manual and blurb on algorithm --- Doc/library/compression.zstd.rst | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index d7f33c7669d210..046a51b3c92ce8 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -11,11 +11,14 @@ -------------- -This module provides classes and functions for compressing and -decompressing data using the Zstandard (or *zstd*) compression algorithm. Also -included is a file interface that supports reading and writing the contents of -``.zst`` files created by the :program:`zstd` utility, as well as raw zstd -compressed streams. +This module provides classes and functions for compressing and decompressing +data using the Zstandard (or *zstd*) compression algorithm. The +`zstd manual `__ +describes Zstandard as "a fast lossless compression algorithm, targeting +real-time compression scenarios at zlib-level and better compression ratios." +Also included is a file interface that supports reading and writing the +contents of ``.zst`` files created by the :program:`zstd` utility, as well as +raw zstd compressed streams. 
The :mod:`!compression.zstd` module contains: From daa9df14d501f31dc71416185a5f7a005e97e929 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Wed, 21 May 2025 09:59:22 -0400 Subject: [PATCH 21/22] Expand on the connection between level and compression_level --- Doc/library/compression.zstd.rst | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 046a51b3c92ce8..407de5cde5999d 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -159,10 +159,13 @@ Compressing and decompressing data in memory data as a :class:`bytes` object. The *level* argument is an int object controlling the level of - compression. Refer to :meth:`CompressionParameter.bounds` to get the - values that can be passed for *level*. If advanced compression options are - needed, this argument must be omitted and in the *options* dictionary the - :attr:`CompressionParameter.compression_level` parameter should be set. + compression. *level* is an alternative to setting + :attr:`CompressionParameter.compression_level` in *options*. Use + :meth:`~CompressionParameter.bounds` on + :attr:`~CompressionParameter.compression_level` to get the values that can + be passed for *level*. If advanced compression options are needed, omit the + *level* argument and instead set the + :attr:`!CompressionParameter.compression_level` parameter in *options*. The *options* argument is a Python dictionary containing advanced compression parameters. The valid keys and values for compression parameters @@ -200,10 +203,13 @@ Compressing and decompressing data in memory module-level function :func:`compress`. The *level* argument is an int object controlling the level of - compression. Refer to :meth:`CompressionParameter.bounds` to get the - values that can be passed for *level*.
If advanced compression options are - needed, this argument must be omitted and in the *options* dictionary the - :attr:`CompressionParameter.compression_level` parameter should be set. + compression. *level* is an alternative to setting + :attr:`CompressionParameter.compression_level` in *options*. Use + :meth:`~CompressionParameter.bounds` on + :attr:`~CompressionParameter.compression_level` to get the values that can + be passed for *level*. If advanced compression options are needed, omit the + *level* argument and instead set the + :attr:`!CompressionParameter.compression_level` parameter in *options*. The *options* argument is a Python dictionary containing advanced compression parameters. The valid keys and values for compression parameters From b3fd3cd05202a68509ffa800247e989aadf42f85 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Wed, 21 May 2025 10:03:26 -0400 Subject: [PATCH 22/22] Resolve review suggestions - Use "integer" instead of "int object" - Use > 0 rather than >=1 for nb_workers --- Doc/library/compression.zstd.rst | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst index 407de5cde5999d..1e1802155a19ec 100644 --- a/Doc/library/compression.zstd.rst +++ b/Doc/library/compression.zstd.rst @@ -158,7 +158,7 @@ Compressing and decompressing data in memory Compress *data* (a :term:`bytes-like object`), returning the compressed data as a :class:`bytes` object. - The *level* argument is an int object controlling the level of + The *level* argument is an integer controlling the level of compression. *level* is an alternative to setting :attr:`CompressionParameter.compression_level` in *options*. Use :meth:`~CompressionParameter.bounds` on @@ -202,7 +202,7 @@ Compressing and decompressing data in memory For a more convenient way of compressing a single chunk of data, see the module-level function :func:`compress`.
- The *level* argument is an int object controlling the level of + The *level* argument is an integer controlling the level of compression. *level* is an alternative to setting :attr:`CompressionParameter.compression_level` in *options*. Use :meth:`~CompressionParameter.bounds` on @@ -575,7 +575,7 @@ Advanced parameter control inputs by finding large matches at greater distances. It increases memory usage and window size. - ``True`` or ``0`` enable long distance matching while ``False`` or ``1`` + ``True`` or ``1`` enable long distance matching while ``False`` or ``0`` disable it. Enabling this parameter increases default @@ -622,7 +622,7 @@ Advanced parameter control the checksum. If there is a mismatch a :class:`ZstdError` exception is raised. - ``True`` or ``0`` enable checksum generation while ``False`` or ``1`` + ``True`` or ``1`` enable checksum generation while ``False`` or ``0`` disable it. .. attribute:: dict_id_flag @@ -630,15 +630,15 @@ Advanced parameter control When compressing with a :class:`ZstdDict`, the dictionary's ID is written into the frame header. - ``True`` or ``0`` enable storing the dictionary ID while ``False`` or - ``1`` disable it. + ``True`` or ``1`` enable storing the dictionary ID while ``False`` or + ``0`` disable it. .. attribute:: nb_workers Select how many threads will be spawned to compress in parallel. When - :attr:`!nb_workers` >= 1, enables multi-threaded compression, 1 - means "one-thread multi-threaded mode". More workers improve speed, but - also increase memory usage and slightly reduce compression ratio. + :attr:`!nb_workers` > 0, multi-threaded compression is enabled; a value of + ``1`` means "one-thread multi-threaded mode". More workers improve speed, + but also increase memory usage and slightly reduce compression ratio. A value of zero disables multi-threading. @@ -753,7 +753,7 @@ Miscellaneous ..
attribute:: dictionary_id - An int object representing the Zstandard dictionary ID needed for + An integer representing the Zstandard dictionary ID needed for decompressing the frame. ``0`` means the dictionary ID was not recorded in the frame header. This may mean that a Zstandard dictionary is not needed, or that the ID of a required dictionary was not recorded.
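
The standalone sketch below illustrates two points made in the documentation above, under stated assumptions: ``window_log`` and ``window_log_max`` are base-two logarithms of a window size in bytes (with ``0`` meaning "select automatically"), and ``bounds()`` returns an inclusive ``(lower, upper)`` tuple. The helpers ``window_log_to_bytes`` and ``clamp_to_bounds`` are hypothetical and are not part of :mod:`compression.zstd`. The bounds tuple is a placeholder rather than a value queried from the library, so the sketch runs on any Python 3.9+ without zstd support.

```python
# Standalone sketch: these helpers are hypothetical and do not import
# compression.zstd, so they run on any Python 3.9+.

MiB = 1 << 20  # bytes in one mebibyte


def window_log_to_bytes(window_log: int) -> int:
    """Convert a window_log / window_log_max value to a window size in bytes."""
    # The documentation treats 0 as "select the value automatically",
    # so it does not correspond to a fixed window size.
    if window_log <= 0:
        raise ValueError("window_log must be positive to map to a size")
    return 1 << window_log


def clamp_to_bounds(value: int, bounds: tuple[int, int]) -> int:
    """Clamp value into an inclusive (lower, upper) range.

    This is the tuple shape documented for CompressionParameter.<name>.bounds()
    and DecompressionParameter.<name>.bounds().
    """
    lower, upper = bounds
    return max(lower, min(value, upper))


# A window_log of 27 corresponds to a 128 MiB window.
print(window_log_to_bytes(27) // MiB)  # 128

# Placeholder bounds for illustration only, not queried from the library.
example_bounds = (10, 31)
print(clamp_to_bounds(40, example_bounds))  # 31
print(clamp_to_bounds(5, example_bounds))   # 10
print(clamp_to_bounds(20, example_bounds))  # 20
```

On Python 3.14 or later, the placeholder tuple could be replaced with the value returned by ``CompressionParameter.window_log.bounds()``, following the ``window_log`` example documented above. Because both bounds are inclusive, clamping with ``max``/``min`` never produces a value outside the accepted range.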