gh-132983: Introduce `_zstd` bindings module by emmatyping · Pull Request #133027 · python/cpython · GitHub

gh-132983: Introduce _zstd bindings module #133027

Merged 59 commits on May 4, 2025
Commits
9814e3b
Add _zstd module
emmatyping Apr 26, 2025
fda87c8
Add _zstd to modules
emmatyping Apr 26, 2025
887e564
Fix path for compression.zstd module
emmatyping Apr 26, 2025
cdba656
Ignore _zstd module like _io
emmatyping Apr 26, 2025
6b67e9b
Expand module state macros to improve code quality
emmatyping Apr 26, 2025
a99a5d2
Remove backticks suggested in review
emmatyping Apr 27, 2025
02cd17a
Use critical sections to lock object state
emmatyping Apr 27, 2025
54eca74
Remove compress/decompress and mark module as not reliant on the GIL
emmatyping Apr 27, 2025
f605956
Lift critical section to avoid clang warning
emmatyping Apr 27, 2025
2eadc65
Respond to comments by picnixz
emmatyping Apr 27, 2025
8eac354
Call out pyzstd explicitly in license description
emmatyping Apr 27, 2025
26775be
Use a much more robust implementation...
emmatyping Apr 27, 2025
eae460f
Use PyList_GetItemRef for thread safety purposes
emmatyping Apr 27, 2025
2ab5e4a
Use a macro for the minimum supported version
emmatyping Apr 27, 2025
d5bf1c1
remove const from primivite types
emmatyping Apr 27, 2025
9e92b9f
Use PyMem_New in another spot
emmatyping Apr 27, 2025
47f815a
Simplify error handling in _get_frame_size
emmatyping Apr 27, 2025
6a4f7b8
Another simplification of error handling in get_frame_info
emmatyping Apr 27, 2025
d7b3805
Rename _module_state to mod_state
emmatyping Apr 27, 2025
c225ea6
Rewrite comment explaining the context of the code
emmatyping Apr 28, 2025
6e8c61c
Add link to pyzstd
emmatyping Apr 28, 2025
e52ad06
Add TODO about refactoring dict training code
emmatyping Apr 28, 2025
2a1ad8b
Use PyModule_AddObjectRef over PyModule_AddObject
emmatyping Apr 28, 2025
94473b9
Check result of OutputBufferGrow
emmatyping Apr 28, 2025
e2b2515
Simplify return logic in `add_constant_to_type`
emmatyping Apr 29, 2025
cd2f085
Ignore return value of _zstd_clear()
emmatyping Apr 29, 2025
79e174f
Remove redundant comments
emmatyping Apr 29, 2025
ce6f79c
Remove __reduce__ from ZstdDict
emmatyping Apr 29, 2025
e15dd85
Use PyUnicode_FromFormat instead of a buffer
emmatyping Apr 29, 2025
685a3d1
Don't use C constants/types in error messages
emmatyping Apr 29, 2025
1b9f786
Make error messages easier to understand for Python users
emmatyping Apr 29, 2025
40c653c
Lower minimum required version 1.4.0
emmatyping Apr 30, 2025
428677d
Use casts and make slot function signatures correct
emmatyping Apr 30, 2025
0962bbb
Be consistent with CPython on const usage
emmatyping Apr 30, 2025
85efc18
Make else clauses in line with PEP 7
emmatyping Apr 30, 2025
cadf6e4
Fix over-indented blocks in argument clinic
emmatyping Apr 30, 2025
e45c22a
Merge branch 'main' into 3.14-zstd-c-code
emmatyping Apr 30, 2025
b9415be
Merge branch 'main' into 3.14-zstd-c-code
emmatyping Apr 30, 2025
6760545
Add critical section around ZSTD_DCtx_setParameter
emmatyping May 1, 2025
c082d8a
Add a TODO about refactoring critical sections
emmatyping May 1, 2025
e825285
Use Py_UNREACHABLE
emmatyping May 1, 2025
f02ff5a
Move bytes operations out of Py_BEGIN_ALLOW_THREADS
emmatyping May 2, 2025
0d69c8c
Add TODO about ensuring a lock is held
emmatyping May 2, 2025
58b0008
Remove asserts that may not be correct
emmatyping May 2, 2025
c786c3c
Add TODO to make ZstdDict and others GC objects
emmatyping May 2, 2025
a9d57fc
Make objects GC tracked
emmatyping May 2, 2025
cc9ab2a
Remove unused include
emmatyping May 2, 2025
1f65d19
Fix some memory issues
emmatyping May 2, 2025
552da2d
Fix refleaks on module and in ZstdDict
emmatyping May 2, 2025
c0648d8
Update configure to check for ZDICT_finalizeDictionary
emmatyping May 3, 2025
d149b2c
Merge branch 'main' into 3.14-zstd-c-code
emmatyping May 3, 2025
a5ea379
Properly check version in configure
emmatyping May 3, 2025
0d88d6d
exit(1) if check fails
emmatyping May 3, 2025
4a0a33e
Use AC_RUN_IFELSE
emmatyping May 3, 2025
30b3934
Use a define() to re-use version check
emmatyping May 3, 2025
90b9172
Merge branch 'main' into 3.14-zstd-c-code
emmatyping May 3, 2025
8fdc7f7
Actually properly set _zstd module status based on version
emmatyping May 3, 2025
10e2e80
Merge branch 'main' into 3.14-zstd-c-code
emmatyping May 3, 2025
d90b29d
Merge branch 'main' into 3.14-zstd-c-code
emmatyping May 4, 2025
Expand module state macros to improve code quality
Also removes module state references from the classes in the _zstd
module and instead uses PyType_GetModuleState()
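
As a rough illustration of what "expanding" the macros means, the sketch below contrasts the two styles without depending on Python.h. Everything prefixed `demo_` is a hypothetical stand-in (for `_zstd_state`, the module object, and `PyModule_GetState()`), and the macro bodies are an assumption, since their definitions are not part of this diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without Python.h. */
typedef struct {
    const char *ZstdError;  /* stands in for the PyObject *ZstdError member */
} demo_state;

typedef struct {
    demo_state *state;      /* NULL models a module that has no state */
} demo_module;

/* Stand-in for PyModule_GetState(): may return NULL. */
static demo_state *
demo_get_state(demo_module *module)
{
    return module->state;
}

/* Old style (assumed macro shapes): the macro hides a local declaration. */
#define STATE_FROM_MODULE(module) \
    demo_state *const _module_state = demo_get_state(module)
#define MS_MEMBER(member) (_module_state->member)

static const char *
error_name_with_macros(demo_module *module)
{
    assert(module != NULL);
    STATE_FROM_MODULE(module);
    /* No NULL check is visible here; the caller must guarantee state. */
    return MS_MEMBER(ZstdError);
}

/* New style: the lookup and the NULL check are spelled out. */
static const char *
error_name_expanded(demo_module *module)
{
    assert(module != NULL);
    demo_state *const _module_state = demo_get_state(module);
    if (_module_state == NULL) {
        return NULL;
    }
    return _module_state->ZstdError;
}
```

With the expanded form, a reader can see where `_module_state` comes from and what happens when the lookup fails, which is the pattern the hunks below apply throughout `_zstdmodule.c`.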
emmatyping committed Apr 29, 2025
commit 6b67e9bac8dfe7f9c82f0d3e2cb0646c8f2ee251
190 changes: 108 additions & 82 deletions Modules/_zstd/_zstdmodule.c
@@ -256,8 +256,10 @@ _zstd__train_dict_impl(PyObject *module, PyBytesObject *samples_bytes,

     /* Check zstd dict error */
     if (ZDICT_isError(zstd_ret)) {
-        STATE_FROM_MODULE(module);
-        set_zstd_error(MODULE_STATE, ERR_TRAIN_DICT, zstd_ret);
+        _zstd_state* const _module_state = PyModule_GetState(module);
+        if (_module_state != NULL) {
+            set_zstd_error(_module_state, ERR_TRAIN_DICT, zstd_ret);
+        }
         goto error;
     }

@@ -384,8 +386,10 @@ _zstd__finalize_dict_impl(PyObject *module, PyBytesObject *custom_dict_bytes,

     /* Check zstd dict error */
     if (ZDICT_isError(zstd_ret)) {
-        STATE_FROM_MODULE(module);
-        set_zstd_error(MODULE_STATE, ERR_FINALIZE_DICT, zstd_ret);
+        _zstd_state* const _module_state = PyModule_GetState(module);
+        if (_module_state != NULL) {
+            set_zstd_error(_module_state, ERR_FINALIZE_DICT, zstd_ret);
+        }
         goto error;
     }

@@ -425,15 +429,19 @@ _zstd__get_param_bounds_impl(PyObject *module, int is_compress,
     if (is_compress) {
         bound = ZSTD_cParam_getBounds(parameter);
         if (ZSTD_isError(bound.error)) {
-            STATE_FROM_MODULE(module);
-            set_zstd_error(MODULE_STATE, ERR_GET_C_BOUNDS, bound.error);
+            _zstd_state* const _module_state = PyModule_GetState(module);
+            if (_module_state != NULL) {
+                set_zstd_error(_module_state, ERR_GET_C_BOUNDS, bound.error);
+            }
             return NULL;
         }
     } else {
         bound = ZSTD_dParam_getBounds(parameter);
         if (ZSTD_isError(bound.error)) {
-            STATE_FROM_MODULE(module);
-            set_zstd_error(MODULE_STATE, ERR_GET_D_BOUNDS, bound.error);
+            _zstd_state* const _module_state = PyModule_GetState(module);
+            if (_module_state != NULL) {
+                set_zstd_error(_module_state, ERR_GET_D_BOUNDS, bound.error);
+            }
             return NULL;
         }
     }
@@ -462,13 +470,15 @@ _zstd_get_frame_size_impl(PyObject *module, Py_buffer *frame_buffer)

     frame_size = ZSTD_findFrameCompressedSize(frame_buffer->buf, frame_buffer->len);
     if (ZSTD_isError(frame_size)) {
-        STATE_FROM_MODULE(module);
-        PyErr_Format(MS_MEMBER(ZstdError),
-                     "Error when finding the compressed size of a zstd frame. "
-                     "Make sure the frame_buffer argument starts from the "
-                     "beginning of a frame, and its length not less than this "
-                     "complete frame. Zstd error message: %s.",
-                     ZSTD_getErrorName(frame_size));
+        _zstd_state* const _module_state = PyModule_GetState(module);
+        if (_module_state != NULL) {
+            PyErr_Format(_module_state->ZstdError,
+                         "Error when finding the compressed size of a zstd frame. "
+                         "Make sure the frame_buffer argument starts from the "
+                         "beginning of a frame, and its length not less than this "
+                         "complete frame. Zstd error message: %s.",
+                         ZSTD_getErrorName(frame_size));
+        }
         goto error;
     }

@@ -508,12 +518,14 @@ _zstd__get_frame_info_impl(PyObject *module, Py_buffer *frame_buffer)
     /* #define ZSTD_CONTENTSIZE_UNKNOWN (0ULL - 1)
        #define ZSTD_CONTENTSIZE_ERROR (0ULL - 2) */
     if (decompressed_size == ZSTD_CONTENTSIZE_ERROR) {
-        STATE_FROM_MODULE(module);
-        PyErr_SetString(MS_MEMBER(ZstdError),
-                        "Error when getting information from the header of "
-                        "a zstd frame. Make sure the frame_buffer argument "
-                        "starts from the beginning of a frame, and its length "
-                        "not less than the frame header (6~18 bytes).");
+        _zstd_state* const _module_state = PyModule_GetState(module);
+        if (_module_state != NULL) {
+            PyErr_SetString(_module_state->ZstdError,
+                            "Error when getting information from the header of "
+                            "a zstd frame. Make sure the frame_buffer argument "
+                            "starts from the beginning of a frame, and its length "
+                            "not less than the frame header (6~18 bytes).");
+        }
         goto error;
     }

@@ -553,7 +565,10 @@ _zstd__set_parameter_types_impl(PyObject *module, PyObject *c_parameter_type,
                                 PyObject *d_parameter_type)
 /*[clinic end generated code: output=a13d4890ccbd2873 input=3e7d0d37c3a1045a]*/
 {
-    STATE_FROM_MODULE(module);
+    _zstd_state* const _module_state = PyModule_GetState(module);
+    if (_module_state == NULL) {
+        return NULL;
+    }

     if (!PyType_Check(c_parameter_type) || !PyType_Check(d_parameter_type)) {
         PyErr_SetString(PyExc_ValueError,
@@ -562,13 +577,13 @@ _zstd__set_parameter_types_impl(PyObject *module, PyObject *c_parameter_type,
         return NULL;
     }

-    Py_XDECREF(MS_MEMBER(CParameter_type));
+    Py_XDECREF(_module_state->CParameter_type);
     Py_INCREF(c_parameter_type);
-    MS_MEMBER(CParameter_type) = (PyTypeObject*)c_parameter_type;
+    _module_state->CParameter_type = (PyTypeObject*) c_parameter_type;

-    Py_XDECREF(MS_MEMBER(DParameter_type));
+    Py_XDECREF(_module_state->DParameter_type);
     Py_INCREF(d_parameter_type);
-    MS_MEMBER(DParameter_type) = (PyTypeObject*)d_parameter_type;
+    _module_state->DParameter_type = (PyTypeObject*)d_parameter_type;

     Py_RETURN_NONE;
 }
@@ -598,20 +613,21 @@ _zstd_compress_impl(PyObject *module, Py_buffer *data, PyObject *level,
                     PyObject *options, PyObject *zstd_dict)
 /*[clinic end generated code: output=0cca9399ca5c95cc input=e8a7c59073af923c]*/
 {
-    STATE_FROM_MODULE(module);
+    _zstd_state* const _module_state = PyModule_GetState(module);
+    if (_module_state == NULL) {
+        return NULL;
+    }
     PyObject *ret = NULL;
     ZstdCompressor self = {0};

     /* Initialize & set ZstdCompressor */
     self.cctx = ZSTD_createCCtx();
     if (self.cctx == NULL) {
-        PyErr_SetString(MS_MEMBER(ZstdError),
+        PyErr_SetString(_module_state->ZstdError,
                         "Unable to create ZSTD_CCtx instance.");
         goto error;
     }

-    self.module_state = MODULE_STATE;
-
     if (level != Py_None && options != Py_None) {
         PyErr_SetString(PyExc_RuntimeError, "Only one of level or options should be used.");
         return NULL;
@@ -679,20 +695,21 @@ _zstd_decompress_impl(PyObject *module, Py_buffer *data, PyObject *zstd_dict,
     Py_ssize_t initial_size;
     ZstdDecompressor self = {0};
     ZSTD_inBuffer in;
-    STATE_FROM_MODULE(module);
+    _zstd_state* const _module_state = PyModule_GetState(module);
+    if (_module_state == NULL) {
+        return NULL;
+    }
     PyObject *ret = NULL;

     /* Initialize & set ZstdDecompressor */
     self.dctx = ZSTD_createDCtx();
     if (self.dctx == NULL) {
-        PyErr_SetString(MS_MEMBER(ZstdError),
+        PyErr_SetString(_module_state->ZstdError,
                         "Unable to create ZSTD_DCtx instance.");
         goto error;
     }
     self.at_frame_edge = 1;

-    self.module_state = MODULE_STATE;
-
     /* Load dictionary to decompression context */
     if (zstd_dict != Py_None) {
         if (_PyZstd_load_d_dict(&self, zstd_dict) < 0) {
@@ -736,7 +753,7 @@ _zstd_decompress_impl(PyObject *module, Py_buffer *data, PyObject *zstd_dict,
         char *extra_msg = (Py_SIZE(ret) == 0) ? "." :
                           ", if want to output these decompressed data, use "
                           "the ZstdDecompressor class to decompress.";
-        PyErr_Format(MS_MEMBER(ZstdError),
+        PyErr_Format(_module_state->ZstdError,
                      "Decompression failed: zstd data ends in an incomplete "
                      "frame, maybe the input data was truncated. Decompressed "
                      "data is %zd bytes%s",
@@ -916,8 +933,8 @@ add_vars_to_module(PyObject *module)

 #define ADD_STR_TO_STATE_MACRO(STR) \
     do { \
-        MS_MEMBER(str_##STR) = PyUnicode_FromString(#STR); \
-        if (MS_MEMBER(str_##STR) == NULL) { \
+        _module_state->str_##STR = PyUnicode_FromString(#STR); \
+        if (_module_state->str_##STR == NULL) { \
             return -1; \
         } \
     } while(0)
@@ -959,17 +976,20 @@ add_constant_to_type(PyTypeObject *type, const char *name, const long value)
 }

 static int _zstd_exec(PyObject *module) {
-    STATE_FROM_MODULE(module);
+    _zstd_state* const _module_state = PyModule_GetState(module);
+    if (_module_state == NULL) {
+        return -1;
+    }

     /* Reusable objects & variables */
-    MS_MEMBER(empty_bytes) = PyBytes_FromStringAndSize(NULL, 0);
-    if (MS_MEMBER(empty_bytes) == NULL) {
+    _module_state->empty_bytes = PyBytes_FromStringAndSize(NULL, 0);
+    if (_module_state->empty_bytes == NULL) {
         return -1;
     }

-    MS_MEMBER(empty_readonly_memoryview) =
-        PyMemoryView_FromMemory((char*)MODULE_STATE, 0, PyBUF_READ);
-    if (MS_MEMBER(empty_readonly_memoryview) == NULL) {
+    _module_state->empty_readonly_memoryview =
+        PyMemoryView_FromMemory((char*)_module_state, 0, PyBUF_READ);
+    if (_module_state->empty_readonly_memoryview == NULL) {
         return -1;
     }

@@ -979,59 +999,59 @@ static int _zstd_exec(PyObject *module) {
     ADD_STR_TO_STATE_MACRO(write);
     ADD_STR_TO_STATE_MACRO(flush);

-    MS_MEMBER(CParameter_type) = NULL;
-    MS_MEMBER(DParameter_type) = NULL;
+    _module_state->CParameter_type = NULL;
+    _module_state->DParameter_type = NULL;

     /* Add variables to module */
     if (add_vars_to_module(module) < 0) {
         return -1;
     }

     /* ZstdError */
-    MS_MEMBER(ZstdError) = PyErr_NewExceptionWithDoc(
+    _module_state->ZstdError = PyErr_NewExceptionWithDoc(
             "_zstd.ZstdError",
             "Call to the underlying zstd library failed.",
             NULL, NULL);
-    if (MS_MEMBER(ZstdError) == NULL) {
+    if (_module_state->ZstdError == NULL) {
         return -1;
     }

-    Py_INCREF(MS_MEMBER(ZstdError));
-    if (PyModule_AddObject(module, "ZstdError", MS_MEMBER(ZstdError)) < 0) {
-        Py_DECREF(MS_MEMBER(ZstdError));
+    Py_INCREF(_module_state->ZstdError);
+    if (PyModule_AddObject(module, "ZstdError", _module_state->ZstdError) < 0) {
+        Py_DECREF(_module_state->ZstdError);
         return -1;
     }

     /* ZstdDict */
     if (add_type_to_module(module,
                            "ZstdDict",
                            &zstddict_type_spec,
-                           &MS_MEMBER(ZstdDict_type)) < 0) {
+                           &_module_state->ZstdDict_type) < 0) {
         return -1;
     }

     // ZstdCompressor
     if (add_type_to_module(module,
                            "ZstdCompressor",
                            &zstdcompressor_type_spec,
-                           &MS_MEMBER(ZstdCompressor_type)) < 0) {
+                           &_module_state->ZstdCompressor_type) < 0) {
         return -1;
     }

     // Add EndDirective enum to ZstdCompressor
-    if (add_constant_to_type(MS_MEMBER(ZstdCompressor_type),
+    if (add_constant_to_type(_module_state->ZstdCompressor_type,
                              "CONTINUE",
                              ZSTD_e_continue) < 0) {
         return -1;
     }

-    if (add_constant_to_type(MS_MEMBER(ZstdCompressor_type),
+    if (add_constant_to_type(_module_state->ZstdCompressor_type,
                              "FLUSH_BLOCK",
                              ZSTD_e_flush) < 0) {
         return -1;
     }

-    if (add_constant_to_type(MS_MEMBER(ZstdCompressor_type),
+    if (add_constant_to_type(_module_state->ZstdCompressor_type,
                              "FLUSH_FRAME",
                              ZSTD_e_end) < 0) {
         return -1;
@@ -1041,7 +1061,7 @@ static int _zstd_exec(PyObject *module) {
     if (add_type_to_module(module,
                            "ZstdDecompressor",
                            &ZstdDecompressor_type_spec,
-                           &MS_MEMBER(ZstdDecompressor_type)) < 0) {
+                           &_module_state->ZstdDecompressor_type) < 0) {
         return -1;
     }

@@ -1051,48 +1071,54 @@ static int _zstd_exec(PyObject *module) {
 static int
 _zstd_traverse(PyObject *module, visitproc visit, void *arg)
 {
-    STATE_FROM_MODULE(module);
+    _zstd_state* const _module_state = PyModule_GetState(module);
+    if (_module_state == NULL) {
+        return -1;
+    }

-    Py_VISIT(MS_MEMBER(empty_bytes));
-    Py_VISIT(MS_MEMBER(empty_readonly_memoryview));
-    Py_VISIT(MS_MEMBER(str_read));
-    Py_VISIT(MS_MEMBER(str_readinto));
-    Py_VISIT(MS_MEMBER(str_write));
-    Py_VISIT(MS_MEMBER(str_flush));
+    Py_VISIT(_module_state->empty_bytes);
+    Py_VISIT(_module_state->empty_readonly_memoryview);
+    Py_VISIT(_module_state->str_read);
+    Py_VISIT(_module_state->str_readinto);
+    Py_VISIT(_module_state->str_write);
+    Py_VISIT(_module_state->str_flush);

-    Py_VISIT(MS_MEMBER(ZstdDict_type));
-    Py_VISIT(MS_MEMBER(ZstdCompressor_type));
+    Py_VISIT(_module_state->ZstdDict_type);
+    Py_VISIT(_module_state->ZstdCompressor_type);

-    Py_VISIT(MS_MEMBER(ZstdDecompressor_type));
+    Py_VISIT(_module_state->ZstdDecompressor_type);

-    Py_VISIT(MS_MEMBER(ZstdError));
+    Py_VISIT(_module_state->ZstdError);

-    Py_VISIT(MS_MEMBER(CParameter_type));
-    Py_VISIT(MS_MEMBER(DParameter_type));
+    Py_VISIT(_module_state->CParameter_type);
+    Py_VISIT(_module_state->DParameter_type);
     return 0;
 }

 static int
 _zstd_clear(PyObject *module)
 {
-    STATE_FROM_MODULE(module);
+    _zstd_state* const _module_state = PyModule_GetState(module);
+    if (_module_state == NULL) {
+        return -1;
+    }

-    Py_CLEAR(MS_MEMBER(empty_bytes));
-    Py_CLEAR(MS_MEMBER(empty_readonly_memoryview));
-    Py_CLEAR(MS_MEMBER(str_read));
-    Py_CLEAR(MS_MEMBER(str_readinto));
-    Py_CLEAR(MS_MEMBER(str_write));
-    Py_CLEAR(MS_MEMBER(str_flush));
+    Py_CLEAR(_module_state->empty_bytes);
+    Py_CLEAR(_module_state->empty_readonly_memoryview);
+    Py_CLEAR(_module_state->str_read);
+    Py_CLEAR(_module_state->str_readinto);
+    Py_CLEAR(_module_state->str_write);
+    Py_CLEAR(_module_state->str_flush);

-    Py_CLEAR(MS_MEMBER(ZstdDict_type));
-    Py_CLEAR(MS_MEMBER(ZstdCompressor_type));
+    Py_CLEAR(_module_state->ZstdDict_type);
+    Py_CLEAR(_module_state->ZstdCompressor_type);

-    Py_CLEAR(MS_MEMBER(ZstdDecompressor_type));
+    Py_CLEAR(_module_state->ZstdDecompressor_type);

-    Py_CLEAR(MS_MEMBER(ZstdError));
+    Py_CLEAR(_module_state->ZstdError);

-    Py_CLEAR(MS_MEMBER(CParameter_type));
-    Py_CLEAR(MS_MEMBER(DParameter_type));
+    Py_CLEAR(_module_state->CParameter_type);
+    Py_CLEAR(_module_state->DParameter_type);
     return 0;
 }
