8000 bpo-37596: compile to reproducible frozen sets by FFY00 · Pull Request #27769 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-37596: compile to reproducible frozen sets #27769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

FFY00
Copy link
Member
@FFY00 FFY00 commented Aug 14, 2021

Signed-off-by: Filipe Laíns <lains@riseup.net>
@FFY00
Copy link
Member Author
FFY00 commented Aug 14, 2021

@pablogsal how does this look?

Comment on lines +70 to +82
PyAPI_DATA(PyTypeObject) _PyReproducibleFrozenSet_Type;

PyAPI_FUNC(PyObject *) _PyReproducibleFrozenSet_New(PyObject *);

PyAPI_FUNC(int) _PySet_NextEntry(PyObject *set, Py_ssize_t *pos, PyObject **key, Py_hash_t *hash);
PyAPI_FUNC(int) _PySet_Update(PyObject *set, PyObject *iterable);

#define _PyReproducibleFrozenSet_CheckExact(ob) \
Py_IS_TYPE(ob, &_PyReproducibleFrozenSet_Type)
#define _PyReproducibleFrozenSet_Check(ob) \
(Py_IS_TYPE(ob, &_PyReproducibleFrozenSet_Type) || \
PyType_IsSubtype(Py_TYPE(ob), &_PyReproducibleFrozenSet_Type))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions should not be public. I think we really should not expose this type to be used or checked from outside CPython

Comment on lines +125 to +131
if (_PyReproducibleFrozenSet_CheckExact(so)) {
i = 0;
perturb = 0;
} else {
i = (size_t)hash & mask;
perturb = hash;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this impacts the performance of normal set operations, and given that is a basic type of the interpreter I am not very convinced about it. This is one of the reasons I said:

and a new type, which I am not very fond of)

Because a new type will require a new set of independent functions on the required path to not impact the regular set and frozenset and that is much more code than what I am confortable with.

On the other hand, others may have other views on this, so I am happy to be convinced otherwise :)

@@ -2233,6 +2241,53 @@ PyTypeObject PyFrozenSet_Type = {
};


PyTypeObject _PyReproducibleFrozenSet_Type = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For new types, please use C99 named fields like

static PyTypeObject CustomType = {
PyVarObject_HEAD_INIT(NULL, 0)
.tp_name = "custom2.Custom",
.tp_doc = "Custom objects",
.tp_basicsize = sizeof(CustomObject),
.tp_itemsize = 0,
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
.tp_new = Custom_new,
.tp_init = (initproc) Custom_init,
.tp_dealloc = (destructor) Custom_dealloc,
.tp_members = Custom_members,
.tp_methods = Custom_methods,
};

@pablogsal
Copy link
Member

@pablogsal how does this look?

The approach looks good, but I have some concerns regarding impacting the code paths of regular sets.

@pablogsal pablogsal requested a review from rhettinger August 14, 2021 15:59
@pablogsal
Copy link
Member

Don't forget to run make regen-all :)

@rhettinger
Copy link
Contributor

I recommend rejecting this PR and its entire approach. For this to work, the alternate search logic would be needed in multiple places: set_lookkey(), set_add_entry(), and in the critical path for set_insert_clean(). This impacts the most highly optimized parts of the module. Also the fast paths in set_merge() and set_swap_bodies() would need to be bypassed when they assume two sets have the same search order. Even if all those changes were made and we accepted the performance impact on regular sets and didn't mind that added maintenance complexity, we still get the new problem that building a reproducible set becomes quadratic and searching a reproducible set becomes linear, both of which defeat the entire rationale for a programmer having used a frozenset rather than a tuple.

There are other approaches that would be performant, correct, and not damage existing set logic.

One way is to have a frozenset subclass that augments the data structure with a separate tuple of keys to be used by marshal when saving a PYC. The _PyReproducibleFrozenSet_New call would save the tuple and then delegate the actual set construction back to frozenset_new. That keeps the code clean and the performance intact. The memory cost is small in comparison to the total set size and is less would be incurred with an odict.

When evaluating approaches, the following are unacceptable: 1) quadratic build time, 2) linear speed search, 3) slowing down critical paths in regular sets, 4) complicating the code in a way that makes it difficult to maintain, 5) create new public APIs that we have to support in perpetuity, 6) code that isn't thoroughly tested (i.e. the correctness issues with the current PR would have been detected immediately).

@ambv
Copy link
Contributor
ambv commented Aug 24, 2021

Agreed with Raymond. The approach in GH-27926 looks more promising.

@ambv ambv closed this Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0