gh-132983: Introduce `_zstd` bindings module by emmatyping · Pull Request #133027 · python/cpython

gh-132983: Introduce _zstd bindings module #133027

Merged on May 4, 2025 (59 commits)

Commits
9814e3b  Add _zstd module (emmatyping, Apr 26, 2025)
fda87c8  Add _zstd to modules (emmatyping, Apr 26, 2025)
887e564  Fix path for compression.zstd module (emmatyping, Apr 26, 2025)
cdba656  Ignore _zstd module like _io (emmatyping, Apr 26, 2025)
6b67e9b  Expand module state macros to improve code quality (emmatyping, Apr 26, 2025)
a99a5d2  Remove backticks suggested in review (emmatyping, Apr 27, 2025)
02cd17a  Use critical sections to lock object state (emmatyping, Apr 27, 2025)
54eca74  Remove compress/decompress and mark module as not reliant on the GIL (emmatyping, Apr 27, 2025)
f605956  Lift critical section to avoid clang warning (emmatyping, Apr 27, 2025)
2eadc65  Respond to comments by picnixz (emmatyping, Apr 27, 2025)
8eac354  Call out pyzstd explicitly in license description (emmatyping, Apr 27, 2025)
26775be  Use a much more robust implementation... (emmatyping, Apr 27, 2025)
eae460f  Use PyList_GetItemRef for thread safety purposes (emmatyping, Apr 27, 2025)
2ab5e4a  Use a macro for the minimum supported version (emmatyping, Apr 27, 2025)
d5bf1c1  remove const from primivite types (emmatyping, Apr 27, 2025)
9e92b9f  Use PyMem_New in another spot (emmatyping, Apr 27, 2025)
47f815a  Simplify error handling in _get_frame_size (emmatyping, Apr 27, 2025)
6a4f7b8  Another simplification of error handling in get_frame_info (emmatyping, Apr 27, 2025)
d7b3805  Rename _module_state to mod_state (emmatyping, Apr 27, 2025)
c225ea6  Rewrite comment explaining the context of the code (emmatyping, Apr 28, 2025)
6e8c61c  Add link to pyzstd (emmatyping, Apr 28, 2025)
e52ad06  Add TODO about refactoring dict training code (emmatyping, Apr 28, 2025)
2a1ad8b  Use PyModule_AddObjectRef over PyModule_AddObject (emmatyping, Apr 28, 2025)
94473b9  Check result of OutputBufferGrow (emmatyping, Apr 28, 2025)
e2b2515  Simplify return logic in `add_constant_to_type` (emmatyping, Apr 29, 2025)
cd2f085  Ignore return value of _zstd_clear() (emmatyping, Apr 29, 2025)
79e174f  Remove redundant comments (emmatyping, Apr 29, 2025)
ce6f79c  Remove __reduce__ from ZstdDict (emmatyping, Apr 29, 2025)
e15dd85  Use PyUnicode_FromFormat instead of a buffer (emmatyping, Apr 29, 2025)
685a3d1  Don't use C constants/types in error messages (emmatyping, Apr 29, 2025)
1b9f786  Make error messages easier to understand for Python users (emmatyping, Apr 29, 2025)
40c653c  Lower minimum required version 1.4.0 (emmatyping, Apr 30, 2025)
428677d  Use casts and make slot function signatures correct (emmatyping, Apr 30, 2025)
0962bbb  Be consistent with CPython on const usage (emmatyping, Apr 30, 2025)
85efc18  Make else clauses in line with PEP 7 (emmatyping, Apr 30, 2025)
cadf6e4  Fix over-indented blocks in argument clinic (emmatyping, Apr 30, 2025)
e45c22a  Merge branch 'main' into 3.14-zstd-c-code (emmatyping, Apr 30, 2025)
b9415be  Merge branch 'main' into 3.14-zstd-c-code (emmatyping, Apr 30, 2025)
6760545  Add critical section around ZSTD_DCtx_setParameter (emmatyping, May 1, 2025)
c082d8a  Add a TODO about refactoring critical sections (emmatyping, May 1, 2025)
e825285  Use Py_UNREACHABLE (emmatyping, May 1, 2025)
f02ff5a  Move bytes operations out of Py_BEGIN_ALLOW_THREADS (emmatyping, May 2, 2025)
0d69c8c  Add TODO about ensuring a lock is held (emmatyping, May 2, 2025)
58b0008  Remove asserts that may not be correct (emmatyping, May 2, 2025)
c786c3c  Add TODO to make ZstdDict and others GC objects (emmatyping, May 2, 2025)
a9d57fc  Make objects GC tracked (emmatyping, May 2, 2025)
cc9ab2a  Remove unused include (emmatyping, May 2, 2025)
1f65d19  Fix some memory issues (emmatyping, May 2, 2025)
552da2d  Fix refleaks on module and in ZstdDict (emmatyping, May 2, 2025)
c0648d8  Update configure to check for ZDICT_finalizeDictionary (emmatyping, May 3, 2025)
d149b2c  Merge branch 'main' into 3.14-zstd-c-code (emmatyping, May 3, 2025)
a5ea379  Properly check version in configure (emmatyping, May 3, 2025)
0d88d6d  exit(1) if check fails (emmatyping, May 3, 2025)
4a0a33e  Use AC_RUN_IFELSE (emmatyping, May 3, 2025)
30b3934  Use a define() to re-use version check (emmatyping, May 3, 2025)
90b9172  Merge branch 'main' into 3.14-zstd-c-code (emmatyping, May 3, 2025)
8fdc7f7  Actually properly set _zstd module status based on version (emmatyping, May 3, 2025)
10e2e80  Merge branch 'main' into 3.14-zstd-c-code (emmatyping, May 3, 2025)
d90b29d  Merge branch 'main' into 3.14-zstd-c-code (emmatyping, May 4, 2025)

Remove compress/decompress and mark module as not reliant on the GIL
The `compress`/`decompress` functions will be moved to Python code for simplicity.
C implementations can always be re-added in the future.

Also, mark _zstd as not requiring the GIL.
emmatyping committed Apr 29, 2025
commit 54eca744197efe47467353605dead01aaf90bf06
187 changes: 2 additions & 185 deletions Modules/_zstd/_zstdmodule.c
@@ -588,198 +588,13 @@ _zstd__set_parameter_types_impl(PyObject *module, PyObject *c_parameter_type,
     Py_RETURN_NONE;
 }

-/*[clinic input]
-_zstd.compress
-
-    data: Py_buffer
-        A bytes-like object, data to be compressed.
-    level: object = None
-        The compression level to use, defaults to ZSTD_CLEVEL_DEFAULT.
-    options: object = None
-        A dict object that contains advanced compression parameters.
-    zstd_dict: object = None
-        A ZstdDict object, a pre-trained zstd dictionary.
-
-Compress a block of data, return a bytes object of zstd compressed data.
-
-Refer to ZstdCompressor's docstring for a description of the
-optional arguments *level*, *options*, and *zstd_dict*.
-
-For incremental compression, use an ZstdCompressor instead.
-[clinic start generated code]*/
-
-static PyObject *
-_zstd_compress_impl(PyObject *module, Py_buffer *data, PyObject *level,
-                    PyObject *options, PyObject *zstd_dict)
-/*[clinic end generated code: output=0cca9399ca5c95cc input=e8a7c59073af923c]*/
-{
-    _zstd_state* const _module_state = PyModule_GetState(module);
-    if (_module_state == NULL) {
-        return NULL;
-    }
-    PyObject *ret = NULL;
-    ZstdCompressor self = {0};
-
-    /* Initialize & set ZstdCompressor */
-    self.cctx = ZSTD_createCCtx();
-    if (self.cctx == NULL) {
-        PyErr_SetString(_module_state->ZstdError,
-                        "Unable to create ZSTD_CCtx instance.");
-        goto error;
-    }
-
-    if (level != Py_None && options != Py_None) {
-        PyErr_SetString(PyExc_RuntimeError, "Only one of level or options should be used.");
-        return NULL;
-    }
-
-    /* Set compressLevel/options to compression context */
-    if (level != Py_None) {
-        if (_PyZstd_set_c_parameters(&self, level, "level", "int") < 0) {
-            goto error;
-        }
-    }
-
-    if (options != Py_None) {
-        if (_PyZstd_set_c_parameters(&self, options, "options", "dict") < 0) {
-            goto error;
-        }
-    }
-
-    /* Load dictionary to compression context */
-    if (zstd_dict != Py_None) {
-        if (_PyZstd_load_c_dict(&self, zstd_dict) < 0) {
-            goto error;
-        }
-        self.dict = zstd_dict;
-    }
-
-    ret = compress_impl(&self, data, ZSTD_e_end);
-    if (ret == NULL) {
-        goto error;
-    } else {
-        goto success;
-    }
-error:
-    Py_CLEAR(ret);
-success:
-    /* Free decompression context */
-    ZSTD_freeCCtx(self.cctx);
-    return ret;
-}
-
-/*[clinic input]
-_zstd.decompress
-
-    data: Py_buffer
-        A bytes-like object, zstd data to be decompressed.
-    zstd_dict: object = None
-        A ZstdDict object, a pre-trained zstd dictionary.
-    options: object = None
-        A dict object that contains advanced decompression parameters.
-
-Decompress one or more frames of data.
-
-Refer to ZstdDecompressor's docstring for a description of the
-optional arguments *zstd_dict*, *options*.
-
-For incremental decompression, use an ZstdDecompressor instead.
-[clinic start generated code]*/
-
-static PyObject *
-_zstd_decompress_impl(PyObject *module, Py_buffer *data, PyObject *zstd_dict,
-                      PyObject *options)
-/*[clinic end generated code: output=2e8423588fb3b178 input=c18346e620e59039]*/
-{
-    uint64_t decompressed_size;
-    Py_ssize_t initial_size;
-    ZstdDecompressor self = {0};
-    ZSTD_inBuffer in;
-    _zstd_state* const _module_state = PyModule_GetState(module);
-    if (_module_state == NULL) {
-        return NULL;
-    }
-    PyObject *ret = NULL;
-
-    /* Initialize & set ZstdDecompressor */
-    self.dctx = ZSTD_createDCtx();
-    if (self.dctx == NULL) {
-        PyErr_SetString(_module_state->ZstdError,
-                        "Unable to create ZSTD_DCtx instance.");
-        goto error;
-    }
-    self.at_frame_edge = 1;
-
-    /* Load dictionary to decompression context */
-    if (zstd_dict != Py_None) {
-        if (_PyZstd_load_d_dict(&self, zstd_dict) < 0) {
-            goto error;
-        }
-    }
-
-    /* Set option to decompression context */
-    if (options != Py_None) {
-        if (_PyZstd_set_d_parameters(&self, options) < 0) {
-            goto error;
-        }
-    }
-
-    /* Prepare input data */
-    in.src = data->buf;
-    in.size = data->len;
-    in.pos = 0;
-
-    /* Get decompressed size */
-    decompressed_size = ZSTD_getFrameContentSize(data->buf, data->len);
-    /* These two zstd constants always > PY_SSIZE_T_MAX:
-          ZSTD_CONTENTSIZE_UNKNOWN is (0ULL - 1)
-          ZSTD_CONTENTSIZE_ERROR is (0ULL - 2) */
-    if (decompressed_size <= (uint64_t) PY_SSIZE_T_MAX) {
-        initial_size = (Py_ssize_t) decompressed_size;
-    } else {
-        initial_size = -1;
-    }
-
-    /* Decompress */
-    ret = decompress_impl(&self, &in, -1, initial_size,
-                          TYPE_ENDLESS_DECOMPRESSOR);
-    if (ret == NULL) {
-        goto error;
-    }
-
-    /* Check data integrity. at_frame_edge flag is 1 when both the input and
-       output streams are at a frame edge. */
-    if (self.at_frame_edge == 0) {
-        char *extra_msg = (Py_SIZE(ret) == 0) ? "." :
-                          ", if want to output these decompressed data, use "
-                          "the ZstdDecompressor class to decompress.";
-        PyErr_Format(_module_state->ZstdError,
-                     "Decompression failed: zstd data ends in an incomplete "
-                     "frame, maybe the input data was truncated. Decompressed "
-                     "data is %zd bytes%s",
-                     Py_SIZE(ret), extra_msg);
-        goto error;
-    }
-
-    goto success;
-
-error:
-    Py_CLEAR(ret);
-success:
-    /* Free decompression context */
-    ZSTD_freeDCtx(self.dctx);
-    return ret;
-}
-
 static PyMethodDef _zstd_methods[] = {
     _ZSTD__TRAIN_DICT_METHODDEF
     _ZSTD__FINALIZE_DICT_METHODDEF
     _ZSTD__GET_PARAM_BOUNDS_METHODDEF
     _ZSTD_GET_FRAME_SIZE_METHODDEF
     _ZSTD__GET_FRAME_INFO_METHODDEF
     _ZSTD__SET_PARAMETER_TYPES_METHODDEF
-    _ZSTD_COMPRESS_METHODDEF
-    _ZSTD_DECOMPRESS_METHODDEF

     {0}
 };
@@ -1130,6 +945,8 @@ _zstd_free(void *module)

 static struct PyModuleDef_Slot _zstd_slots[] = {
     {Py_mod_exec, _zstd_exec},
+    {Py_mod_gil, Py_MOD_GIL_NOT_USED},
+
     {0}
 };
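
Both removed functions route their exits through `error:` and `success:` labels so that the zstd context is freed on every path. One path in `_zstd_compress_impl` appears to bypass this: the "Only one of level or options should be used." check uses `return NULL` after `ZSTD_createCCtx()` has already succeeded, so `ZSTD_freeCCtx` is never reached on that branch. The following is a minimal standalone sketch of the single-exit pattern, not the module's code. It assumes plain `malloc`/`free` as stand-ins for the context functions, collapses the two labels into one because the result is `NULL` on every failure path, and every name in it (`ctx_create`, `ctx_free`, `copy_with_context`, `live_contexts`) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter that tracks contexts still alive, so a caller can
   check that every acquired context was released. */
static int live_contexts = 0;

/* Hypothetical stand-in for ZSTD_createCCtx(): a plain heap allocation. */
static void *
ctx_create(void)
{
    void *ctx = malloc(16);
    if (ctx != NULL) {
        live_contexts++;
    }
    return ctx;
}

/* Hypothetical stand-in for ZSTD_freeCCtx(); a NULL argument is a no-op. */
static void
ctx_free(void *ctx)
{
    if (ctx != NULL) {
        live_contexts--;
        free(ctx);
    }
}

/* Returns a heap-allocated copy of src, or NULL on failure.
   have_level and have_options mimic the mutually exclusive arguments. */
static char *
copy_with_context(const char *src, int have_level, int have_options)
{
    char *ret = NULL;
    size_t len;
    void *ctx;

    assert(src != NULL);

    ctx = ctx_create();
    if (ctx == NULL) {
        goto done;
    }
    if (have_level && have_options) {
        /* Jump to the shared exit rather than returning directly, so the
           context acquired above is still released. */
        goto done;
    }
    len = strlen(src) + 1;
    ret = malloc(len);
    if (ret == NULL) {
        goto done;
    }
    memcpy(ret, src, len);

done:
    ctx_free(ctx);
    return ret;
}
```

Calling `copy_with_context("abc", 1, 1)` returns `NULL` and still leaves `live_contexts` at zero, which is the property the early `return NULL` in the removed code does not have.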

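
The size check in the removed `_zstd_decompress_impl` depends on the two sentinel results of `ZSTD_getFrameContentSize` being larger than any valid `Py_ssize_t`, so one comparison covers "unknown", "error", and "too large". A minimal standalone sketch of that check follows. It assumes `ptrdiff_t` and `PTRDIFF_MAX` as stand-ins for `Py_ssize_t` and `PY_SSIZE_T_MAX` so it builds without `Python.h` or `zstd.h`, redefines the sentinels under hypothetical `SKETCH_` names using the values quoted in the removed comment, and uses a hypothetical helper name, `initial_output_size`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the zstd sentinels, using the values quoted in the removed
   comment. The SKETCH_ names are hypothetical; real code would use
   ZSTD_CONTENTSIZE_UNKNOWN and ZSTD_CONTENTSIZE_ERROR from <zstd.h>. */
#define SKETCH_CONTENTSIZE_UNKNOWN (0ULL - 1)
#define SKETCH_CONTENTSIZE_ERROR   (0ULL - 2)

/* Compile-time check of the claim in the removed comment: both sentinels
   exceed the largest signed size (PTRDIFF_MAX is assumed to stand in for
   PY_SSIZE_T_MAX). */
static_assert((uint64_t)PTRDIFF_MAX < SKETCH_CONTENTSIZE_ERROR,
              "sentinels must exceed the signed size limit");

/* Maps a frame content size to an initial output buffer size.
   Returns -1 when the size is unknown, invalid, or too large to represent,
   in which case the caller is expected to grow the buffer as it goes. */
static ptrdiff_t
initial_output_size(uint64_t frame_content_size)
{
    /* The sentinels sit at the top of the unsigned 64-bit range, so a single
       upper-bound comparison rejects them along with oversized values. */
    if (frame_content_size <= (uint64_t)PTRDIFF_MAX) {
        return (ptrdiff_t)frame_content_size;
    }
    return -1;
}
```

For example, `initial_output_size(1024)` yields `1024`, while passing either sentinel yields `-1`.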