Non-identifier keys in `**` arguments and elsewhere · Issue #142 · python/steering-council
Non-identifier keys in ** arguments and elsewhere #142
Closed
@jeff5

Description

It has been noted in [1] that in several circumstances, arbitrary string keys are accepted into a namespace ostensibly intended for identifiers. Currently, CPython checks at most that the key is a str, and admits keys that would not be valid identifiers.

In these circumstances, should we

  1. document that this is a Python language feature, or
  2. note that it is a CPython implementation detail that may or may not be supported by other implementations or later versions?

Where this comes up

The discussion [1] and related issue [2] identify these cases:

  1. When additional keyword arguments are supplied in a call using the syntax **expression, CPython checks that the keys of the mapping are str (or a subclass) but not that they are identifiers. If the function has a formal parameter **identifier, it gathers these non-identifier key-value pairs, and the function body may treat them as a dictionary.

  2. When an object allows the addition of attributes, the default implementation of __setattr__ in CPython checks the name is a str (or a subclass) but not that it is an identifier (see _PyObject_GenericSetAttrWithDict). The built-in setattr() makes the same check (or rather PyObject_SetAttr does, which it calls). By either route, we obtain an instance with an attribute that is not accessible using dot notation (only by getattr() etc.).

  3. In a variant of 2., it is possible to give a type an attribute (value or descriptor) whose name is not an identifier, by manipulating locals() during class definition; the attribute is then accessible to getattr(). The key may even be a non-string, but a non-string key is not accessible other than directly on the __dict__ of the type.
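
The three cases above can be reproduced with a short sketch. The names collect, Record and Tagged are invented for illustration, and the behaviour shown is that of CPython as described in this issue; other implementations are assumed, not guaranteed, to match.

```python
# Case 1: ** unpacking accepts any str key, and a **kwargs parameter collects it.
def collect(**kwargs):
    return kwargs

extra = collect(**{"not-an-identifier": 1, "with space": 2})

# A key that is not a str is rejected when the call is made.
try:
    collect(**{1: "x"})
except TypeError:
    non_str_rejected = True
else:
    non_str_rejected = False

# Case 2: setattr() accepts a non-identifier name on an instance with a __dict__.
class Record:
    pass

rec = Record()
setattr(rec, "content-type", "text/plain")
# Dot notation cannot spell this name, so getattr() is the only way to read it.
via_getattr = getattr(rec, "content-type")

# Case 3: a class attribute with a non-identifier name, written through
# locals() while the class body is executing.
class Tagged:
    locals()["x-tag"] = "class-level"

class_level = getattr(Tagged, "x-tag")
```

In each case the value is stored and retrievable, but only through the dictionary-style or getattr() route, never through keyword or dot syntax.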

The third is in contrast to the behaviour of __slots__, which insists on identifiers and projects names that are subclasses of str onto the base class str (using _Py_Mangle from compile.c).
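
For comparison, a minimal sketch of the __slots__ check. The helper try_slots is hypothetical; it only reports whether class creation raised TypeError in the running interpreter, and does not exercise the str-subclass projection.

```python
def try_slots(names):
    """Return the TypeError message if class creation fails, else None."""
    try:
        type("Slotted", (), {"__slots__": names})
    except TypeError as exc:
        return str(exc)
    return None

# An identifier is accepted as a slot name.
identifier_result = try_slots(("ok_name",))

# A non-identifier string is refused, unlike in cases 1 to 3.
hyphen_result = try_slots(("not-ok",))
```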

It is possible for the SC to accept any or all cases as a language feature, but as the arguments overlap, all or none seems most defensible. Two documentation-only PRs await this SC decision: one addressing keyword arguments [3] and the other object attributes [4].

Issue [2] also raises the question of whether subclasses of str really ought to be allowed, or perhaps projected onto str as in __slots__. This appears orthogonal to the question asked here (but interesting).

Arguments for accepting non-identifier strings as Python

It is an established practice. There is an example of non-identifier keyword arguments at [5] and of an __init__ in the Azure SDK that looks for such keywords at [6].

It is "consistent, harmless and intentional" and "a language feature in good standing" [GvR].

"leaving it up to the implementation ... just invites gratuitous differences." [GvR]

"Sometimes Python object attributes are mapped to attributes from some other system that may have different naming rules."

"Calling functions in Python is slow enough as it is without the extra checks". (But is a cheap check unimaginable in any implementation with motive and ingenuity?)

The other major interpreters follow CPython behaviour.

Arguments that these are implementation details (with the option to disallow)

Consistency: the **mapping is intended to supply keyword arguments, and these names must be identifiers. (Note in passing proposal [7], yet to attract support, for syntax that would allow any string where an identifier is expected.)
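
The asymmetry behind this argument can be illustrated as follows. The helpers parses and all_identifiers are hypothetical: the first asks the parser whether a call is syntactically valid, and the second is one guess at the kind of check a stricter implementation might apply to a ** mapping. Neither is taken from CPython.

```python
import ast

def parses(source):
    """Report whether the source text is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

# An explicit keyword argument must be spelled as an identifier.
explicit_ok = parses("f(a_b=1)")
explicit_bad = parses("f(a-b=1)")

# The same key supplied through ** is only checked when the call runs.
unpacked = parses('f(**{"a-b": 1})')

def all_identifiers(mapping):
    """Hypothetical strict check: every key is a str that is an identifier."""
    return all(isinstance(k, str) and k.isidentifier() for k in mapping)
```

Note that str.isidentifier() also accepts reserved words such as "class", so a check of this form would still admit some names that keyword syntax cannot spell.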

Similarly, the glossary defines an attribute as "a value referenced by a dotted expression ... o.a". (The PR at [4] includes additional words.)

Arguments against documenting it at all

"Documenting it would require all Python implementations to support it, including future versions of CPython. [Even if documented as an "implementation detail"?] ... I’m perfectly happy with it being an implementation detail of CPython that people have to discover for themselves." [1]

References

[1] Python ideas https://discuss.python.org/t/supporting-or-not-invalid-identifiers-in-kwargs/17147 Thanks to contributors there for the arguments summarised here.
[2] python/cpython#96397
[3] python/cpython#96393
[4] python/cpython#96454
[5] PyTorch-UNet https://github.com/milesial/Pytorch-UNet/blob/a96fbb05ccdbb8a140471391c8e51d159ffaa45e/train.py#L111. Arguably this stems from an API fault in tqdm.set_postfix, where the programmer's intent is to accept a mapping.
[6] Azure SDK https://github.com/Azure/azure-sdk-for-python/blob/608d038c352878c0931df3a3b5319372da8847fb/sdk/storage/azure-storage-blob/azure/storage/blob/_models.py#L385-L390
[7] https://discuss.python.org/t/backtics-to-allow-any-name/18698
