-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-37596: compile to reproducible frozen sets #27769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Filipe Laíns <lains@riseup.net>
@pablogsal how does this look? |
PyAPI_DATA(PyTypeObject) _PyReproducibleFrozenSet_Type; | ||
|
||
PyAPI_FUNC(PyObject *) _PyReproducibleFrozenSet_New(PyObject *); | ||
|
||
PyAPI_FUNC(int) _PySet_NextEntry(PyObject *set, Py_ssize_t *pos, PyObject **key, Py_hash_t *hash); | ||
PyAPI_FUNC(int) _PySet_Update(PyObject *set, PyObject *iterable); | ||
|
||
#define _PyReproducibleFrozenSet_CheckExact(ob) \ | ||
Py_IS_TYPE(ob, &_PyReproducibleFrozenSet_Type) | ||
#define _PyReproducibleFrozenSet_Check(ob) \ | ||
(Py_IS_TYPE(ob, &_PyReproducibleFrozenSet_Type) || \ | ||
PyType_IsSubtype(Py_TYPE(ob), &_PyReproducibleFrozenSet_Type)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions should not be public. I think we really should not expose this type to be used or checked from outside CPython
if (_PyReproducibleFrozenSet_CheckExact(so)) { | ||
i = 0; | ||
perturb = 0; | ||
} else { | ||
i = (size_t)hash & mask; | ||
perturb = hash; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this impacts the performance of normal set operations, and given that is a basic type of the interpreter I am not very convinced about it. This is one of the reasons I said:
and a new type, which I am not very fond of)
Because a new type will require a new set of independent functions on the required path to not impact the regular set and frozenset and that is much more code than what I am confortable with.
On the other hand, others may have other views on this, so I am happy to be convinced otherwise :)
@@ -2233,6 +2241,53 @@ PyTypeObject PyFrozenSet_Type = { | |||
}; | |||
|
|||
|
|||
PyTypeObject _PyReproducibleFrozenSet_Type = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For new types, please use C99 named fields like
cpython/Doc/includes/custom2.c
Lines 98 to 110 in bb3e0c2
static PyTypeObject CustomType = { | |
PyVarObject_HEAD_INIT(NULL, 0) | |
.tp_name = "custom2.Custom", | |
.tp_doc = "Custom objects", | |
.tp_basicsize = sizeof(CustomObject), | |
.tp_itemsize = 0, | |
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, | |
.tp_new = Custom_new, | |
.tp_init = (initproc) Custom_init, | |
.tp_dealloc = (destructor) Custom_dealloc, | |
.tp_members = Custom_members, | |
.tp_methods = Custom_methods, | |
}; |
The approach looks good, but I have some concerns regarding impacting the code paths of regular sets. |
Don't forget to run |
I recommend rejecting this PR and its entire approach. For this to work, the alternate search logic would be needed in multiple places: set_lookkey(), set_add_entry(), and in the critical path for set_insert_clean(). This impacts the most highly optimized parts of the module. Also the fast paths in set_merge() and set_swap_bodies() would need to be bypassed when they assume two sets have the same search order. Even if all those changes were made and we accepted the performance impact on regular sets and didn't mind that added maintenance complexity, we still get the new problem that building a reproducible set becomes quadratic and searching a reproducible set becomes linear, both of which defeat the entire rationale for a programmer having used a frozenset rather than a tuple. There are other approaches that would be performant, correct, and not damage existing set logic. One way is to have a frozenset subclass that augments the data structure with a separate tuple of keys to be used by marshal when saving a PYC. The _PyReproducibleFrozenSet_New call would save the tuple and then delegate the actual set construction back to frozenset_new. That keeps the code clean and the performance intact. The memory cost is small in comparison to the total set size and is less would be incurred with an odict. When evaluating approaches, the following are unacceptable: 1) quadratic build time, 2) linear speed search, 3) slowing down critical paths in regular sets, 4) complicating the code in a way that makes it difficult to maintain, 5) create new public APIs that we have to support in perpetuity, 6) code that isn't thoroughly tested (i.e. the correctness issues with the current PR would have been detected immediately). |
Agreed with Raymond. The approach in GH-27926 looks more promising. |
Signed-off-by: Filipe Laíns lains@riseup.net
https://bugs.python.org/issue37596