-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-110235: Raise TypeError
for duplicate/unknown fields in PyStructSequence constructor
#110258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in the constructor is too complicated, and it slow down normal case, because it tests fields already passed in the sequence argument.
Simply count the number of fields found in the dict
argument. If it less than the size of the dict, then raise a TypeError.
I think that it is not needed to make the error message more specific, test whether some of keys match the fields already specified in sequence
argument or are not field names at all -- it only adds a lot of complicated code for a little gain. If users came with requests for more details, then we may provide them, but currently it is a waste of efforts.
I updated the implementation to only count the number of unexpected fields rather than list them in the error message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still too complicated. No need to check for multiple values for field.
The signature for SomeStructSeq(sequence: Iterable[Any], dict: Dict[str, Any]) -> SomeStructSeq In this PR, I raise an error for values that come from both IMO, it should raise an error as we did for mport collections
MyTuple = collections.namedtuple('MyTuple', ['a', 'b', 'c', 'd'])
MyTuple(1, 2, 3, 4, a=5, b=6) # TypeError: MyTuple.__new__() got multiple values for argument 'a' But def func(a, **kwargs):
pass
func(1, a=2) # TypeError: func() got multiple values for argument 'a' |
I meant that it will automatically raise TypeError if for i in range(len, max_len):
name = fields[i].name
if name in dict:
items[i] = dict[name]
count++
else:
items[i] = None
i++
if count < len(dict):
raise TypeError("simple generic error message") In C it will be just few additional lines of code. |
Thanks for the code snippet. My point is #110258 (comment):
For example: time.struct_time
fields: [
"tm_year", # 0, visiable
"tm_mon", # 1, visiable
"tm_mday", # 2, visiable
"tm_hour", # 3, visiable
"tm_min", # 4, visiable
"tm_sec", # 5, visiable
"tm_wday", # 6, visiable
"tm_yday", # 7, visiable
"tm_isdst", # 8, visiable
"tm_zone", # 9, invisible
"tm_gmtoff", # 10, invisible
]
n_sequence_fields: 9
n_unnamed_fields: 0
n_fields: 11 In the code snippet (#110258 (comment)), it will raise an error if there is a visible field name or an unknown name in time.struct_time(seqeunce=list(range(10)), dict={'tm_zone': -1})
# tm_zone is firstly set from sequence[9], then it will be override by dict['tm_zone'] without raising an error The implementation of this PR is roughly equivalent to: len = sequence.__len__()
min_len = n_sequence_fields
max_len = n_fields
for i, value in enumerate(sequence):
items[i] = value # for i >= min_len, it will set invisible fields
count = 0
- for i in range(len, max_len):
+ for i in range(min_len, max_len):
name = fields[i].name
+ if i < len:
+ if name in dict:
+ raise TypeError("duplicate invisible field values")
+ continue
if name in dict:
items[i] = dict[name]
count += 1
else:
items[i] = None
if count < dict.__len__():
raise TypeError("simple generic error message") |
The following code: diff --git a/Objects/structseq.c b/Objects/structseq.c
index 0ca622edc2..08af71a88d 100644
--- a/Objects/structseq.c
+++ b/Objects/structseq.c
@@ -216,6 +216,7 @@ structseq_new_impl(PyTypeObject *type, PyObject *arg, PyObject *dict)
res->ob_item[i] = Py_NewRef(v);
}
Py_DECREF(arg);
+ Py_ssize_t count = 0;
for (; i < max_len; ++i) {
PyObject *ob = NULL;
if (dict != NULL) {
@@ -228,8 +229,18 @@ structseq_new_impl(PyTypeObject *type, PyObject *arg, PyObject *dict)
if (ob == NULL) {
ob = Py_NewRef(Py_None);
}
+ else {
+ count++;
+ }
res->ob_item[i] = ob;
}
+ if (dict != NULL && PyDict_GET_SIZE(dict) > count) {
+ PyErr_Format(PyExc_TypeError,
+ "%.500s() got unexpected field name(s).",
+ type->tp_name);
+ Py_DECREF(res);
+ return NULL;
+ }
_PyObject_GC_TRACK(res);
return (PyObject*) res; passes your tests (ignoring error message). It raises error for your example
Look, your loop can be split on two, |
I check this and it works. I have updated my implementation to raise a single generic error. |
PyObject *ob = NULL; | ||
if (dict != NULL) { | ||
const char *name = type->tp_members[i-n_unnamed_fields].name; | ||
if (dict != NULL && PyDict_GET_SIZE(dict) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add this guard check before the for-loop to avoid unnecessary dictionary lookup for empty dict
.
Thanks for the code review and suggestions! |
…ructSequence constructor (pythonGH-110258)
Fixes #110235
In this PR, we iterate over
dict
keys and check if there are duplicate fields provided bysequence
or there are any unexpected field names.