Description
It has been noted in [1] that in several circumstances, arbitrary string keys are accepted into a namespace ostensibly intended for identifiers. Currently, CPython checks at most that the key is a str, and admits keys that would not be valid identifiers.
In these circumstances, should we
- document that this is a Python language feature, or
- note that it is a CPython implementation detail that may or may not be supported on other/later versions?
Where this comes up
The discussion [1] and related issue [2] identify these cases:
1. When additional keyword arguments are supplied in a call using the syntax **expression, CPython checks that the keys of the mapping are str (or a subclass) but not that they are identifiers. If the function has a formal parameter **identifier, it gathers these non-identifier key-value pairs, and the function body may treat them as an ordinary dictionary.
2. When an object allows the addition of attributes, the default implementation of __setattr__ in CPython checks that the name is a str (or a subclass) but not that it is an identifier (see _PyObject_GenericSetAttrWithDict). The built-in setattr() makes the same check (or rather PyObject_SetAttr, which it calls, does). By either route, we obtain an instance with an attribute that is not accessible using dot notation (only by getattr() etc.).
3. In a variant of 2., it is possible to give a type an attribute (value or descriptor) whose name is not an identifier, by manipulating locals() during class definition; the attribute is then accessible to getattr(). The key may even be a non-string, but a non-string key is not accessible other than directly on the __dict__ of the type.
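The three cases can be reproduced in a few lines. This is a minimal sketch of what recent CPython versions do; the names collect, Plain and Odd and the keys used are illustrative only. Case 3 relies on locals() in a class body returning the class namespace itself, which is how CPython behaves.

```python
# Case 1: a **kwargs parameter gathers keys that are not identifiers.
def collect(**kwargs):
    # Inside the body, the extra keywords are an ordinary dict.
    return kwargs


extras = collect(**{"not-an-identifier": 1, "two words": 2})
print(extras)  # {'not-an-identifier': 1, 'two words': 2}


# Case 2: setattr() (via the default __setattr__) accepts any str name.
class Plain:
    pass


obj = Plain()
setattr(obj, "content-type", "text/plain")
# Writing obj.content-type would parse as obj.content - type,
# so the attribute is reachable only through getattr() and friends.
print(getattr(obj, "content-type"))  # text/plain
print(vars(obj))  # {'content-type': 'text/plain'}


# Case 3: a class attribute with a non-identifier name, added by writing to
# locals() in the class body (in CPython this is the class namespace itself).
class Odd:
    locals()["odd name"] = 42


print(getattr(Odd, "odd name"))  # 42
```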
The third is in contrast to the behaviour of __slots__, which insists on identifiers and projects names that are subclasses of str onto the base class str (using _Py_Mangle from compile.c).
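For contrast, a short sketch of the __slots__ check (the class names Slotted and Unslotted are illustrative). In CPython the non-identifier slot name is rejected with a TypeError when the class is created, while the same name is accepted by setattr() on an ordinary instance. The projection of str subclasses onto str is not exercised here.

```python
# A non-identifier name in __slots__ is rejected when the class is created.
try:
    class Slotted:
        __slots__ = ("ok", "not-ok")
except TypeError as exc:
    slots_error = exc
else:
    slots_error = None

print(type(slots_error).__name__)  # TypeError


# The same name is accepted as an ordinary instance attribute.
class Unslotted:
    pass


u = Unslotted()
setattr(u, "not-ok", 1)
print(getattr(u, "not-ok"))  # 1
```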
It is possible for the SC to accept any or all cases as a language feature, but as the arguments overlap, all or none seems most defensible. Two documentation-only PRs await this SC decision: one addressing keyword arguments [3] and the other object attributes [4].
Issue [2] also raises the question of whether subclasses of str really ought to be allowed, or perhaps projected onto str as in __slots__. This appears orthogonal to the question asked here (but interesting).
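A sketch of the subclass question, assuming a trivial, hypothetical str subclass Tagged that does not override __eq__ or __hash__: CPython accepts such an object both as a keyword key and as an attribute name, whereas a key that is not a str at all is rejected with a TypeError.

```python
class Tagged(str):
    """Hypothetical str subclass standing in for any user-defined subclass."""


def f(alpha):
    return alpha


# A str-subclass key is accepted and matched to the parameter by equality.
print(f(**{Tagged("alpha"): 1}))  # 1


class Box:
    pass


b = Box()
# setattr() likewise accepts a str-subclass name.
setattr(b, Tagged("label"), "x")
print(b.label)  # x

# A key that is not a str at all is rejected.
try:
    f(**{1: 2})
except TypeError as exc:
    non_str_error = exc
else:
    non_str_error = None

print(type(non_str_error).__name__)  # TypeError
```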
Arguments for accepting non-identifier strings as a Python language feature
It is an established practice. There is an example of non-identifier keyword arguments at [5], and of an __init__ in the Azure SDK that looks for such keywords at [6].
It is "consistent, harmless and intentional" and "a language feature in good standing" [GvR].
"leaving it up to the implementation ... just invites gratuitous differences." [GvR]
"Sometimes Python object attributes are mapped to attributes from some other system that may have different naming rules."
"Calling functions in Python is slow enough as it is without the extra checks". (But is a fast check really unimaginable in an implementation with the motive and ingenuity to build one?)
The other major interpreters follow CPython's behaviour.
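To illustrate the mapping argument, here is a hypothetical Record class (not taken from the Azure SDK or any other project) whose __init__ accepts arbitrary keywords and mirrors them as attributes, as one might when wrapping header fields from an external protocol. It depends on the CPython behaviour described above.

```python
class Record:
    """Hypothetical wrapper mirroring fields of an external system as attributes."""

    def __init__(self, **fields):
        # The field names follow the external system's rules, not Python's.
        for name, value in fields.items():
            setattr(self, name, value)


headers = {"Content-Type": "text/html", "X-Request-ID": "abc123", "charset": "utf-8"}
rec = Record(**headers)

print(rec.charset)                   # utf-8 (an identifier, so dot notation works)
print(getattr(rec, "Content-Type"))  # text/html (reachable only via getattr())
```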
Arguments that these are implementation details (with the option to disallow them)
Consistency: the **mapping is intended to supply keyword arguments, and these names must be identifiers. (Note in passing proposal [7], which has yet to attract support, for a syntax that would allow any string where an identifier is expected.)
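If an implementation chose to disallow non-identifier keywords, the check might look like the following hypothetical decorator, identifier_kwargs_only. It sketches the stricter semantics and is not CPython behaviour; treating reserved words such as class as invalid is an assumption made here, since such names cannot appear in the name=value call syntax either.

```python
import functools
import keyword


def identifier_kwargs_only(func):
    """Hypothetical decorator: reject keyword names that are not identifiers."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for name in kwargs:
            # Reserved words such as "class" also fail (an assumption of this sketch).
            if not name.isidentifier() or keyword.iskeyword(name):
                raise TypeError(
                    f"{func.__name__}() got keyword {name!r}, which is not an identifier"
                )
        return func(*args, **kwargs)

    return wrapper


@identifier_kwargs_only
def configure(**options):
    return options


print(configure(**{"verbose": True}))  # {'verbose': True}
try:
    configure(**{"not-ok": 1})
except TypeError as exc:
    print(exc)  # configure() got keyword 'not-ok', which is not an identifier
```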
Similarly, the glossary defines an attribute as "a value referenced by a dotted expression ... o.a". (The PR at [4] proposes additional wording.)
Arguments against documenting it at all
"Documenting it would require all Python implementations to support it, including future versions of CPython. [Even if documented as an "implementation detail"?] ... I’m perfectly happy with it being an implementation detail of CPython that people have to discover for themselves." [1]
References
[1] Python ideas https://discuss.python.org/t/supporting-or-not-invalid-identifiers-in-kwargs/17147 Thanks to contributors there for the arguments summarised here.
[2] python/cpython#96397
[3] python/cpython#96393
[4] python/cpython#96454
[5] PyTorch-UNet https://github.com/milesial/Pytorch-UNet/blob/a96fbb05ccdbb8a140471391c8e51d159ffaa45e/train.py#L111. Arguably this stems from an API fault in tqdm.set_postfix, where the programmer's intent is to accept a mapping.
[6] Azure SDK https://github.com/Azure/azure-sdk-for-python/blob/608d038c352878c0931df3a3b5319372da8847fb/sdk/storage/azure-storage-blob/azure/storage/blob/_models.py#L385-L390
[7] https://discuss.python.org/t/backtics-to-allow-any-name/18698