diff --git a/pep-0589.rst b/pep-0589.rst new file mode 100644 index 00000000000..c78afb805e5 --- /dev/null +++ b/pep-0589.rst @@ -0,0 +1,648 @@ +PEP: 589 +Title: TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys +Author: Jukka Lehtosalo +Sponsor: Guido van Rossum +Discussions-To: typing-sig@python.org +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 20-Mar-2019 +Python-Version: 3.8 +Post-History: + + +Abstract +======== + +PEP 484 [#PEP-484]_ defines the type ``Dict[K, V]`` for uniform +dictionaries, where each value has the same type and arbitrary key +values are supported. It doesn't properly support the common pattern +where the type of a dictionary value depends on the string value of +the key. This PEP proposes a type constructor ``typing.TypedDict`` to +support the use case where a dictionary object has a specific set of +string keys, each with a value of a specific type. + +Here is an example where PEP 484 doesn't allow us to annotate +satisfactorily:: + + movie = {'name': 'Blade Runner', + 'year': 1982} + +This PEP proposes the addition of a new type constructor, called +``TypedDict``, to allow the type of ``movie`` to be represented +precisely:: + + from typing import TypedDict + + class Movie(TypedDict): + name: str + year: int + +Now a type checker should accept this code:: + + movie: Movie = {'name': 'Blade Runner', + 'year': 1982} + + +Motivation +========== + +Representing an object or structured data using (potentially nested) +dictionaries with string keys (instead of a user-defined class) is a +common pattern in Python programs. Representing JSON objects is +perhaps the canonical use case, and this is popular enough that Python +ships with a JSON library. This PEP proposes a way to allow such code +to be type checked more effectively. + +More generally, representing pure data objects using only Python +primitive types such as dictionaries, strings and lists has had +certain appeal. 
They are easy to serialize and deserialize even +when not using JSON. They trivially support various useful operations +with no extra effort, including pretty-printing (through ``str()`` and +the ``pprint`` module), iteration, and equality comparisons. + +PEP 484 doesn't properly support the use cases mentioned above. Let's +consider a dictionary object that has exactly two valid string keys, +``'name'`` with value type ``str``, and ``'year'`` with value type +``int``. The PEP 484 type ``Dict[str, Any]`` would be suitable, but +it is too lenient, as arbitrary string keys can be used, and arbitrary +values are valid. Similarly, ``Dict[str, Union[str, int]]`` is too +general, as the value for key ``'name'`` could be an ``int``, and +arbitrary string keys are allowed. Also, the type of a subscription +expression such as ``d['name']`` (assuming ``d`` to be a dictionary of +this type) would be ``Union[str, int]``, which is too wide. + +Dataclasses are a more recent alternative to solve this use case, but +there is still a lot of existing code that was written before +dataclasses became available, especially in large existing codebases +where type hinting and checking has proven to be helpful. Unlike +dictionary objects, dataclasses don't directly support JSON +serialization, though there is a third-party package that implements +it [#dataclasses-json]_. + + +Specification +============= + +A TypedDict type represents dictionary objects with a specific set of +string keys, and with specific value types for each valid key. Each +string key can be either required (it can't be omitted) or +non-required (it doesn't need to exist). + +The PEP proposes two ways of defining TypedDict types. The first uses +a class-based syntax. The second is an alternative +assignment-based syntax that is provided for backwards compatibility, +to allow the feature to be backported to older Python versions.
The +rationale is similar to why PEP 484 supports a comment-based +annotation syntax for Python 2.7: type hinting is particularly useful +for large existing codebases, and these often need to run on older +Python versions. The two syntax options parallel the syntax variants +supported by ``typing.NamedTuple``. Other proposed features include +TypedDict inheritance and totality (specifying whether keys are +required or not). + +This PEP also provides a sketch of how a type checker is expected +to support type checking operations involving TypedDict objects. +Similar to PEP 484, this discussion is left somewhat vague on purpose, +to allow experimentation with a wide variety of different type +checking approaches. In particular, type compatibility should be +based on structural compatibility: a more specific TypedDict type can +be compatible with a smaller (more general) TypedDict type. + + +Class-based Syntax +------------------ + +A TypedDict type can be defined using the class definition syntax with +``typing.TypedDict`` as the sole base class:: + + from typing import TypedDict + + class Movie(TypedDict): + name: str + year: int + +``Movie`` is a TypedDict type with two items: ``'name'`` (with type +``str``) and ``'year'`` (with type ``int``). + +A type checker should validate that the body of a class-based +TypedDict definition conforms to these rules: + +* The class body should only contain an optional docstring, followed + by lines with item definitions of the form ``key: value_type``. The + syntax for item definitions is identical to attribute annotations, + but there must be no initializer, and the key name actually refers + to the string value of the key instead of an attribute name. + +* Type comments cannot be used with the class-based syntax, for + consistency with the class-based ``NamedTuple`` syntax. 
(Note that + it would not be sufficient to support type comments for backwards + compatibility with Python 2.7, since the class definition may have a + ``total`` keyword argument, as discussed below, and this isn't valid + syntax in Python 2.7.) Instead, this PEP provides an alternative, + assignment-based syntax for backwards compatibility, discussed in + `Alternative Syntax`_. + +* String literal forward references are valid in the value types. + +* Methods are not allowed, since the runtime type of a TypedDict + object will always be just ``dict`` (it is never a subclass of + ``dict``). + +* Specifying a metaclass is not allowed. + +An empty TypedDict can be created by only including ``pass`` in the +body (if there is a docstring, ``pass`` can be omitted):: + + class EmptyDict(TypedDict): + pass + + +Using TypedDict Types +--------------------- + +Here is an example of how the type ``Movie`` can be used:: + + movie: Movie = {'name': 'Blade Runner', + 'year': 1982} + +An explicit ``Movie`` type annotation is generally needed, as +otherwise an ordinary dictionary type could be assumed by a type +checker, for backwards compatibility. When a type checker can infer +that a constructed dictionary object should be a TypedDict, an +explicit annotation can be omitted. A typical example is a dictionary +object as a function argument. In this example, a type checker is +expected to infer that the dictionary argument should be understood as +a TypedDict:: + + def record_movie(movie: Movie) -> None: ... + + record_movie({'name': 'Blade Runner', 'year': 1982}) + +Another example where a type checker should treat a dictionary display +as a TypedDict is in an assignment to a variable with a previously +declared TypedDict type:: + + movie: Movie + ... 
+ movie = {'name': 'Blade Runner', 'year': 1982} + +Operations on ``movie`` can be checked by a static type checker:: + + movie['director'] = 'Ridley Scott' # Error: invalid key 'director' + movie['year'] = '1982' # Error: invalid value type ("int" expected) + +The code below should be rejected, since ``'title'`` is not a valid +key, and the ``'name'`` key is missing:: + + movie2: Movie = {'title': 'Blade Runner', + 'year': 1982} + +The created TypedDict type object is not a real class object. These +are the only uses of the type a type checker is expected to allow: + +* It can be used in type annotations and in any context where an + arbitrary type hint is valid, such as in type aliases and as the + target type of a cast. + +* It can be used as a callable object with keyword arguments + corresponding to the TypedDict items. Non-keyword arguments are not + allowed. Example:: + + m = Movie(name='Blade Runner', year=1982) + + When called, the TypedDict type object returns an ordinary + dictionary object at runtime:: + + print(type(m)) # <class 'dict'> + +* It can be used as a base class, but only when defining a derived + TypedDict. This is discussed in more detail below. + +In particular, TypedDict type objects cannot be used in +``isinstance()`` tests such as ``isinstance(d, Movie)``. The reason is +that there is no existing support for checking types of dictionary +item values, since ``isinstance()`` does not work with many PEP 484 +types, including common ones like ``List[str]``. This would be needed +for cases like this:: + + class Strings(TypedDict): + items: List[str] + + print(isinstance({'items': [1]}, Strings)) # Should be False + print(isinstance({'items': ['x']}, Strings)) # Should be True + +The above use case is not supported. This is consistent with how +``isinstance()`` is not supported for ``List[str]``. + + +Inheritance +----------- + +It is possible for a TypedDict type to inherit from one or more +TypedDict types using the class-based syntax.
In this case the +``TypedDict`` base class should not be included. Example:: + + class BookBasedMovie(Movie): + based_on: str + +Now ``BookBasedMovie`` has keys ``name``, ``year``, and ``based_on``. +It is equivalent to this definition, since TypedDict types use +structural compatibility:: + + class BookBasedMovie(TypedDict): + name: str + year: int + based_on: str + +Here is an example of multiple inheritance:: + + class X(TypedDict): + x: int + + class Y(TypedDict): + y: str + + class XYZ(X, Y): + z: bool + +The TypedDict ``XYZ`` has three items: ``x`` (type ``int``), ``y`` +(type ``str``), and ``z`` (type ``bool``). + +A TypedDict cannot inherit from both a TypedDict type and a +non-TypedDict base class. + + +Totality +-------- + +By default, all keys must be present in a TypedDict. It is possible +to override this by specifying *totality*. Here is how to do this +using the class-based syntax:: + + class Movie(TypedDict, total=False): + name: str + year: int + +This means that a ``Movie`` TypedDict can have any of the keys omitted. Thus +these are valid:: + + m: Movie = {} + m2: Movie = {'year': 2015} + +A type checker is only expected to support a literal ``False`` or +``True`` as the value of the ``total`` argument. ``True`` is the +default, and makes all items defined in the class body required. + +The totality flag only applies to items defined in the body of the +TypedDict definition. Inherited items won't be affected, and instead +use the totality of the TypedDict type where they were defined. This makes +it possible to have a combination of required and non-required keys in +a single TypedDict type. + + +Alternative Syntax +------------------ + +This PEP also proposes an alternative syntax that can be backported to +older Python versions such as 3.5 and 2.7 that don't support the +variable definition syntax introduced in PEP 526 [#PEP-526]_.
It +resembles the traditional syntax for defining named tuples:: + + Movie = TypedDict('Movie', {'name': str, 'year': int}) + +It is also possible to specify totality using the alternative syntax:: + + Movie = TypedDict('Movie', {'name': str, + 'year': int}, total=False) + +The semantics are equivalent to the class-based syntax. This syntax +doesn't support inheritance, however, and there is no way to +have both required and non-required fields in a single type. The +motivation for this is keeping the backwards compatible syntax as +simple as possible while covering the most common use cases. + +A type checker is only expected to accept a dictionary display expression +as the second argument to ``TypedDict``. In particular, a variable that +refers to a dictionary object does not need to be supported, to simplify +implementation. + + +Type Consistency +---------------- + +Informally speaking, *type consistency* is a generalization of the +is-subtype-of relation to support the ``Any`` type. It is defined +more formally in PEP 483 [#PEP-483]_. This section introduces the +new, non-trivial rules needed to support type consistency for +TypedDict types. + +First, any TypedDict type is consistent with ``Mapping[str, object]``. +Second, a TypedDict type ``A`` is consistent with TypedDict ``B`` if +``A`` is structurally compatible with ``B``. This is true if and only +if both of these conditions are satisfied: + +* For each key in ``B``, ``A`` has the corresponding key and the + corresponding value type in ``A`` is consistent with the value type + in ``B``. For each key in ``B``, the value type in ``B`` is also + consistent with the corresponding value type in ``A``. + +* For each required key in ``B``, the corresponding key is required + in ``A``. For each non-required key in ``B``, the corresponding key + is not required in ``A``. + +Discussion: + +* Value types behave invariantly, since TypedDict objects are mutable.
+ This is similar to mutable container types such as ``List`` and + ``Dict``. Example where this is relevant:: + + class A(TypedDict): + x: Optional[int] + + class B(TypedDict): + x: int + + def f(a: A) -> None: + a['x'] = None + + b: B = {'x': 0} + f(b) # Type check error: 'B' not compatible with 'A' + b['x'] + 1 # Runtime error: None + 1 + +* A TypedDict type with required keys is not consistent with a + TypedDict type with non-required keys, since the latter allows keys + to be deleted. Example where this is relevant:: + + class A(TypedDict, total=False): + x: int + + class B(TypedDict): + x: int + + def f(a: A) -> None: + del a['x'] + + b: B = {'x': 0} + f(b) # Type check error: 'B' not compatible with 'A' + b['x'] + 1 # Runtime KeyError: 'x' + +* A TypedDict type ``A`` with no key ``'x'`` is not consistent with a + TypedDict type with a non-required key ``'x'``, since at runtime + the key ``'x'`` could be present and have an incompatible type + (which may not be visible through ``A`` due to structural subtyping). + Example:: + + class A(TypedDict, total=False): + x: int + y: int + + class B(TypedDict, total=False): + x: int + + class C(TypedDict, total=False): + x: int + y: str + + def f(a: A) -> None: + a['y'] = 1 + + def g(b: B) -> None: + f(b) # Type check error: 'B' incompatible with 'A' + + c: C = {'x': 0, 'y': 'foo'} + g(c) + c['y'] + 'bar' # Runtime error: int + str + +* A TypedDict isn't consistent with any ``Dict[...]`` type, since + dictionary types allow destructive operations, including + ``clear()``. They also allow arbitrary keys to be set, which + would compromise type safety.
Example:: + + class A(TypedDict): + x: int + + class B(A): + y: str + + def f(d: Dict[str, int]) -> None: + d['y'] = 0 + + def g(a: A) -> None: + f(a) # Type check error: 'A' incompatible with Dict[str, int] + + b: B = {'x': 0, 'y': 'foo'} + g(b) + b['y'] + 'bar' # Runtime error: int + str + +* A TypedDict with all ``int`` values is not consistent with + ``Mapping[str, int]``, since there may be additional non-``int`` + values not visible through the type, due to structural subtyping. + These can be accessed using the ``values()`` and ``items()`` + methods in ``Mapping``, for example. Example:: + + class A(TypedDict): + x: int + + class B(TypedDict): + x: int + y: str + + def sum_values(m: Mapping[str, int]) -> int: + n = 0 + for v in m.values(): + n += v # Runtime error + return n + + def f(a: A) -> None: + sum_values(a) # Error: 'A' incompatible with Mapping[str, int] + + b: B = {'x': 0, 'y': 'foo'} + f(b) + + +Supported and Unsupported Operations +------------------------------------ + +Type checkers should support restricted forms of most ``dict`` +operations on TypedDict objects. The guiding principle is that +operations not involving ``Any`` types should be rejected by type +checkers if they may violate runtime type safety. Here are some of +the most important type safety violations to prevent: + +1. A required key is missing. + +2. A value has an invalid type. + +3. A key that is not defined in the TypedDict type is added. + +A key that is not a literal should generally be rejected, since its +value is unknown during type checking, and thus can cause some of +the above violations. + +The use of a key that is not known to exist should be reported as an +error, even if this wouldn't necessarily generate a runtime type +error. These are often mistakes, and these may insert values with an +invalid type if structural subtyping hides the types of certain items. 
+For example, ``d['x'] = 1`` should generate a type check error if +``'x'`` is not a valid key for ``d`` (which is assumed to be a +TypedDict type). + +Extra keys included in TypedDict object construction should also be +caught. In this example, the ``director`` key is not defined in +``Movie`` and is expected to generate an error from a type checker:: + + m: Movie = dict( + name='Alien', + year=1979, + director='Ridley Scott') # error: Unexpected key 'director' + +Type checkers should reject the following operations on TypedDict +objects as unsafe, even though they are valid for normal dictionaries: + +* Operations with arbitrary ``str`` keys (instead of string literals + or other expressions with known string values) should be rejected. + This involves both destructive operations such as setting an item + and read-only operations such as subscription expressions. + +* ``clear()`` is not safe since it could remove required keys, some of + which may not be directly visible because of structural + subtyping. ``popitem()`` is similarly unsafe, even if all known + keys are not required (``total=False``). + +* ``del obj['key']`` should be rejected unless ``'key'`` is a + non-required key. + +Type checkers may allow reading an item using ``d['x']`` even if +the key ``'x'`` is not required, instead of requiring the use of +``d.get('x')`` or an explicit ``'x' in d`` check. The rationale is +that tracking the existence of keys is difficult to implement in full +generality, and that disallowing this could require many changes to +existing code. + +The exact type checking rules are up to each type checker to decide. +In some cases potentially unsafe operations may be accepted if the +alternative is to generate false positive errors for idiomatic code. + + +Backwards Compatibility +======================= + +To retain backwards compatibility, type checkers should not infer a +TypedDict type unless it is sufficiently clear that this is desired by +the programmer. 
When unsure, an ordinary dictionary type should be +inferred. Otherwise existing code that type checks without errors may +start generating errors once TypedDict support is added to the type +checker, since TypedDict types are more restrictive than dictionary +types. In particular, they aren't subtypes of dictionary types. + + +Reference Implementation +======================== + +The mypy [#mypy]_ type checker supports TypedDict types. A reference +implementation of the runtime component is provided in the +``mypy_extensions`` [#mypy_extensions]_ module. + + +Rejected Alternatives +===================== + +Several proposed ideas were rejected. The current set of features +seems to cover a lot of ground, and it was not clear which of the +proposed extensions would be more than marginally useful. This PEP +defines a baseline feature that can be potentially extended later. + +These are rejected on principle, as incompatible with the spirit of +this proposal: + +* TypedDict isn't extensible, and it addresses only a specific use + case. TypedDict objects are regular dictionaries at runtime, and + TypedDict cannot be used with other dictionary-like or mapping-like + classes, including subclasses of ``dict``. There is no way to add + methods to TypedDict types. The motivation here is simplicity. + +* TypedDict type definitions could plausibly be used to perform runtime + type checking of dictionaries. For example, they could be used to + validate that a JSON object conforms to the schema specified by a + TypedDict type. This PEP doesn't include such functionality, since + the focus of this proposal is static type checking only, and other + existing types do not support this, as discussed in `Class-based + syntax`_. Such functionality can be provided by a third-party + library using the ``typing_inspect`` [#typing_inspect]_ third-party + module, for example. + +* TypedDict types can't be used in ``isinstance()`` or ``issubclass()`` + checks.
The reasoning is similar to why runtime type checks aren't + supported in general. + +These features were left out from this PEP, but they are potential +extensions to be added in the future: + +* TypedDict doesn't support providing a *default value type* for keys + that are not explicitly defined. This would allow arbitrary keys to + be used with a TypedDict object, and only explicitly enumerated keys + would receive special treatment compared to a normal, uniform + dictionary type. + +* There is no way to individually specify whether each key is required + or not. No proposed syntax was clear enough. + +* TypedDict can't be used for specifying the type of a ``**kwargs`` + argument. This would allow restricting the allowed keyword + arguments and their types. According to PEP 484, using a TypedDict + type as the type of ``**kwargs`` means that the TypedDict is valid + as the *value* of arbitrary keyword arguments, but it doesn't + restrict which keyword arguments should be allowed. The syntax + ``**kwargs: Expand[T]`` has been proposed for this [#expand]_. + + +Acknowledgements +================ + +David Foster contributed the initial implementation of TypedDict types +to mypy. Improvements to the implementation have been contributed by +at least the author (Jukka Lehtosalo), Ivan Levkivskyi, Gareth T, +Michael Lee, Dominik Miedzinski, Roy Williams and Max Moroz. + + +References +========== + +.. [#PEP-484] PEP 484, Type Hints, van Rossum, Lehtosalo, Langa + (http://www.python.org/dev/peps/pep-0484) + +.. [#dataclasses-json] Dataclasses JSON + (https://github.com/lidatong/dataclasses-json) + +.. [#PEP-526] PEP 526, Syntax for Variable Annotations, Gonzalez, + House, Levkivskyi, Roach, van Rossum + (http://www.python.org/dev/peps/pep-0526) + +.. [#PEP-483] PEP 483, The Theory of Type Hints, van Rossum, Levkivskyi + (http://www.python.org/dev/peps/pep-0483) + +.. [#mypy] http://www.mypy-lang.org/ + +..
[#mypy_extensions] https://github.com/python/mypy_extensions + +.. [#typing_inspect] https://github.com/ilevkivskyi/typing_inspect + +.. [#expand] https://github.com/python/mypy/issues/4441 + + +Copyright +========= + +This document has been placed in the public domain. + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep0/pep.py b/pep0/pep.py index 5d8e1215053..6d1d727a135 100644 --- a/pep0/pep.py +++ b/pep0/pep.py @@ -161,7 +161,7 @@ class PEP(object): # required or not. headers = (('PEP', True), ('Title', True), ('Version', False), ('Last-Modified', False), ('Author', True), - ('BDFL-Delegate', False), + ('Sponsor', False), ('BDFL-Delegate', False), ('Discussions-To', False), ('Status', True), ('Type', True), ('Content-Type', False), ('Requires', False), ('Created', True), ('Python-Version', False),