| 1 | +!!! warning "🚧 Work in Progress" |
| 2 | + This page is a work in progress. |
| 3 | + |
| 4 | +Starting with Pydantic V2, part of the codebase is written in Rust in a separate package called `pydantic-core`. |
| 5 | +This was done partly in order to improve validation and serialization performance (at the cost of limited |
| 6 | +customization and extensibility of the internal logic). |
| 7 | + |
| 8 | +This architecture documentation will first cover how the `pydantic` and `pydantic-core` packages interact, |
| 9 | +then go through the architecture specifics for various patterns (model definition, validation, |
| 10 | +serialization, JSON Schema). |
| 11 | + |
| 12 | +Usage of the Pydantic library can be divided into two parts: |
| 13 | + |
| 14 | +- Model definition, done in the `pydantic` package. |
| 15 | +- Model validation and serialization, done in the `pydantic-core` package. |
| 16 | + |
| 17 | +## Model definition |
| 18 | + |
| 19 | +Whenever a Pydantic [`BaseModel`][pydantic.main.BaseModel] is defined, the metaclass |
| 20 | +will analyze the body of the model to collect a number of elements: |
| 21 | + |
| 22 | +- Defined annotations to build model fields (collected in the [`model_fields`][pydantic.main.BaseModel.model_fields] attribute). |
| 23 | +- Model configuration, set with [`model_config`][pydantic.main.BaseModel.model_config]. |
| 24 | +- Additional validators/serializers. |
| 25 | +- Private attributes, class variables, identification of generic parametrization, etc. |
| 26 | + |
| 27 | +### Communicating between `pydantic` and `pydantic-core`: the core schema |
| 28 | + |
| 29 | +We then need a way to communicate the collected information from the model definition to `pydantic-core`, |
| 30 | +so that validation and serialization are performed accordingly. To do so, Pydantic uses the concept |
| 31 | +of a core schema: a structured (and serializable) Python dictionary (represented using |
| 32 | +[`TypedDict`][typing.TypedDict] definitions) describing a specific validation and serialization |
| 33 | +logic. It is the core data structure used to communicate between the `pydantic` and `pydantic-core` |
| 34 | +packages. Every core schema has a required `type` key, and extra properties depending on this `type`. |
| 35 | + |
| 36 | +The generation of a core schema is handled in a single place, by the `GenerateSchema` class |
| 37 | +(no matter if it is for a Pydantic model or anything else). |
| 38 | + |
| 39 | +!!! note |
| 40 | + It is not possible to define a custom core schema. A core schema needs to be understood by the |
| 41 | + `pydantic-core` package, and as such we only support a fixed number of core schema types. |
| 42 | + This is also part of the reason why the `GenerateSchema` class isn't truly exposed and properly |
| 43 | + documented. |
| 44 | + |
| 45 | + The core schema definitions can be found in the [`pydantic_core.core_schema`][] module. |
| 46 | + |
| 47 | +In the case of a Pydantic model, a core schema will be constructed and set as the |
| 48 | +[`__pydantic_core_schema__`][pydantic.main.BaseModel.__pydantic_core_schema__] attribute. |
| 49 | + |
| 50 | +To illustrate what a core schema looks like, we will take the example of the |
| 51 | +[`bool`][pydantic_core.core_schema.bool_schema] core schema: |
| 52 | + |
| 53 | +```python lint="skip" test="skip" |
| 54 | +class BoolSchema(TypedDict, total=False): |
| 55 | +    type: Required[Literal['bool']] |
| 56 | +    strict: bool |
| 57 | +    ref: str |
| 58 | +    metadata: Any |
| 59 | +    serialization: SerSchema |
| 60 | +``` |
| 61 | + |
| 62 | +When defining a Pydantic model with a boolean field: |
| 63 | + |
| 64 | +```python |
| 65 | +from pydantic import BaseModel, Field |
| 66 | + |
| 67 | + |
| 68 | +class Model(BaseModel): |
| 69 | +    foo: bool = Field(strict=True) |
| 70 | +``` |
| 71 | + |
| 72 | +The core schema for the `foo` field will look like: |
| 73 | + |
| 74 | +```python |
| 75 | +{ |
| 76 | +    'type': 'bool', |
| 77 | +    'strict': True, |
| 78 | +} |
| 79 | +``` |
| 80 | + |
| 81 | +As seen in the [`BoolSchema`][pydantic_core.core_schema.bool_schema] definition, |
| 82 | +the serialization logic is also defined in the core schema. |
| 83 | +If we were to define a custom serialization function for `foo` (1), the `serialization` key would look like: |
| 84 | +{ .annotate } |
| 85 | + |
| 86 | +1. For example using the [`field_serializer`][pydantic.functional_serializers.field_serializer] decorator: |
| 87 | + |
| 88 | +    ```python test="skip" lint="skip" |
| 89 | +    class Model(BaseModel): |
| 90 | +        foo: bool = Field(strict=True) |
| 91 | + |
| 92 | +        @field_serializer('foo', mode='plain') |
| 93 | +        def serialize_foo(self, value: bool) -> Any: |
| 94 | +            ... |
| 95 | +    ``` |
| 96 | + |
| 97 | +```python lint="skip" test="skip" |
| 98 | +{ |
| 99 | +    'type': 'function-plain', |
| 100 | +    'function': <function Model.serialize_foo at 0x111>, |
| 101 | +    'is_field_serializer': True, |
| 102 | +    'info_arg': False, |
| 103 | +    'return_schema': {'type': 'int'}, |
| 104 | +} |
| 105 | +``` |
| 106 | + |
| 107 | +Note that this is also a core schema definition, just that it is only relevant for `pydantic-core` during serialization. |
| 108 | + |
| 109 | +Core schemas cover a broad scope, and are used whenever we want to communicate between the Python and Rust side. |
| 110 | +While the previous examples were related to validation and serialization, they could in theory be used for anything: |
| 111 | +error management, extra metadata, etc. |
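As a rough illustration of how a core schema drives `pydantic-core`, the sketch below builds the boolean core schema by hand and passes it to a `SchemaValidator`. Constructing a validator manually like this is only for demonstration (Pydantic normally does it for you), and the variable names are arbitrary.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

# Equivalent to the {'type': 'bool', 'strict': True} dictionary shown earlier.
schema = core_schema.bool_schema(strict=True)

# The Rust side only ever sees this dictionary, not the Python type hints.
validator = SchemaValidator(schema)
validated = validator.validate_python(True)

# In strict mode, a string is not coerced to a boolean.
try:
    validator.validate_python('true')
    strict_rejected = False
except ValidationError:
    strict_rejected = True
```

The same dictionary could be written as a plain literal; the `core_schema.bool_schema` helper mainly provides type checking.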
| 112 | + |
| 113 | +### JSON Schema generation |
| 114 | + |
| 115 | +You may have noticed that the previous serialization core schema has a `return_schema` key. |
| 116 | +This is because the core schema is also used to generate the corresponding JSON Schema. |
| 117 | + |
| 118 | +Similar to how the core schema is generated, the JSON Schema generation is handled by the |
| 119 | +[`GenerateJsonSchema`][pydantic.json_schema.GenerateJsonSchema] class. |
| 120 | +The [`generate`][pydantic.json_schema.GenerateJsonSchema.generate] method |
| 121 | +is the main entry point and is given the core schema of the model. |
| 122 | + |
| 123 | +Coming back to our `bool` field example, the [`bool_schema`][pydantic.json_schema.GenerateJsonSchema.bool_schema] |
| 124 | +method will be given the previously generated [boolean core schema][pydantic_core.core_schema.bool_schema] |
| 125 | +and will return the following JSON Schema: |
| 126 | + |
| 127 | +```json |
| 128 | +{ |
| 129 | +    "type": "boolean" |
| 130 | +} |
| 131 | +``` |
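To see this from the public API, a minimal sketch (reusing the `Model` from the earlier example) can ask Pydantic for the JSON Schema of the whole model and look up the entry for `foo`. The variable names are illustrative only.

```python
from pydantic import BaseModel, Field


class Model(BaseModel):
    foo: bool = Field(strict=True)


# `model_json_schema` generates the JSON Schema from the model's core schema.
json_schema = Model.model_json_schema()

# The part generated for the `foo` field; it may carry extra keys such as a title.
foo_schema = json_schema['properties']['foo']
```

Note that `strict` has no JSON Schema counterpart here: it only affects validation.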
| 132 | + |
| 133 | +### Customizing the core schema and JSON schema |
| 134 | + |
| 135 | +!!! abstract "Usage Documentation" |
| 136 | + [Custom types](concepts/types.md#custom-types) |
| 137 | + |
| 138 | + [Implementing `__get_pydantic_core_schema__`](concepts/json_schema.md#implementing-__get_pydantic_core_schema__) |
| 139 | + |
| 140 | + [Implementing `__get_pydantic_json_schema__`](concepts/json_schema.md#implementing-__get_pydantic_json_schema__) |
| 141 | + |
| 142 | +While the `GenerateSchema` and [`GenerateJsonSchema`][pydantic.json_schema.GenerateJsonSchema] classes handle |
| 143 | +the creation of the corresponding schemas, Pydantic offers a way to customize them in some cases, following a wrapper pattern. |
| 144 | +This customization is done through the `__get_pydantic_core_schema__` and `__get_pydantic_json_schema__` methods. |
| 145 | + |
| 146 | +To understand this wrapper pattern, we will take the example of metadata classes used with [`Annotated`][typing.Annotated], |
| 147 | +where the `__get_pydantic_core_schema__` method can be used: |
| 148 | + |
| 149 | +```python |
| 150 | +from typing import Any |
| 151 | + |
| 152 | +from pydantic_core import CoreSchema |
| 153 | +from typing_extensions import Annotated |
| 154 | + |
| 155 | +from pydantic import GetCoreSchemaHandler, TypeAdapter |
| 156 | + |
| 157 | + |
| 158 | +class MyStrict: |
| 159 | +    @classmethod |
| 160 | +    def __get_pydantic_core_schema__( |
| 161 | +        cls, source: Any, handler: GetCoreSchemaHandler |
| 162 | +    ) -> CoreSchema: |
| 163 | +        schema = handler(source)  # (1)! |
| 164 | +        schema['strict'] = True |
| 165 | +        return schema |
| 166 | + |
| 167 | + |
| 168 | +class MyGt: |
| 169 | +    @classmethod |
| 170 | +    def __get_pydantic_core_schema__( |
| 171 | +        cls, source: Any, handler: GetCoreSchemaHandler |
| 172 | +    ) -> CoreSchema: |
| 173 | +        schema = handler(source)  # (2)! |
| 174 | +        schema['gt'] = 1 |
| 175 | +        return schema |
| 176 | + |
| 177 | + |
| 178 | +ta = TypeAdapter(Annotated[int, MyStrict(), MyGt()]) |
| 179 | +``` |
| 180 | + |
| 181 | +1. `MyStrict` is the first annotation to be applied. At this point, `schema = {'type': 'int'}`. |
| 182 | +2. `MyGt` is the last annotation to be applied. At this point, `schema = {'type': 'int', 'strict': True}`. |
| 183 | + |
| 184 | +When the `GenerateSchema` class builds the core schema for `Annotated[int, MyStrict(), MyGt()]`, it will |
| 185 | +create an instance of a `GetCoreSchemaHandler` to be passed to the `MyGt.__get_pydantic_core_schema__` method. (1) |
| 186 | +{ .annotate } |
| 187 | + |
| 188 | +1. In the case of our [`Annotated`][typing.Annotated] pattern, the `GetCoreSchemaHandler` is defined in a nested way. |
| 189 | + Calling it will recursively call the other `__get_pydantic_core_schema__` methods until it reaches the `int` annotation, |
| 190 | + where a simple `{'type': 'int'}` schema is returned. |
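The nesting described above can be sketched in plain Python, without Pydantic. This is a simplified stand-in and not how `GenerateSchema` is actually implemented: `base_handler`, `wrap`, `build_schema`, and the `get_schema` method are hypothetical names that only mimic the roles of the real handler and of `__get_pydantic_core_schema__`.

```python
from typing import Any, Callable, Dict, List

Schema = Dict[str, Any]
Handler = Callable[[Any], Schema]


def base_handler(source: Any) -> Schema:
    # Innermost handler: hypothetical stand-in for the built-in schema of a type.
    return {'type': source.__name__}


class MyStrict:
    def get_schema(self, source: Any, handler: Handler) -> Schema:
        schema = handler(source)  # delegates to the next handler in the chain
        schema['strict'] = True
        return schema


class MyGt:
    def get_schema(self, source: Any, handler: Handler) -> Schema:
        schema = handler(source)
        schema['gt'] = 1
        return schema


def wrap(annotation: Any, inner: Handler) -> Handler:
    # Build a handler that calls `annotation`, giving it `inner` to delegate to.
    def handler(source: Any) -> Schema:
        return annotation.get_schema(source, inner)

    return handler


def build_schema(source: Any, annotations: List[Any]) -> Schema:
    handler = base_handler
    for annotation in annotations:
        # Each annotation wraps the previous ones, so the last one is outermost.
        handler = wrap(annotation, handler)
    return handler(source)


result = build_schema(int, [MyStrict(), MyGt()])
print(result)
```

Here the outermost call goes to `MyGt`, which delegates to `MyStrict`, which delegates to the base handler. The modifications are then applied on the way back out, `strict` first and `gt` last.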
| 191 | + |
| 192 | +The `source` argument depends on the core schema generation pattern. In the case of [`Annotated`][typing.Annotated], |
| 193 | +the `source` will be the type being annotated. When [defining a custom type](concepts/types.md#as-a-method-on-a-custom-type), |
| 194 | +the `source` will be the actual class where `__get_pydantic_core_schema__` is defined. |
| 195 | + |
| 196 | +## Model validation and serialization |
| 197 | + |
| 198 | +While model definition was scoped to the _class_ level (i.e. when defining your model), model validation |
| 199 | +and serialization happen at the _instance_ level. Both these concepts are handled in `pydantic-core` |
| 200 | +(providing a 5x to 20x performance increase compared to Pydantic V1), by using the previously built core schema. |
| 201 | + |
| 202 | +`pydantic-core` exposes the [`SchemaValidator`][pydantic_core.SchemaValidator] and |
| 203 | +[`SchemaSerializer`][pydantic_core.SchemaSerializer] classes to perform these tasks: |
| 204 | + |
| 205 | +```python |
| 206 | +from pydantic import BaseModel |
| 207 | + |
| 208 | + |
| 209 | +class Model(BaseModel): |
| 210 | +    foo: int |
| 211 | + |
| 212 | + |
| 213 | +model = Model.model_validate({'foo': 1}) # (1)! |
| 214 | +dumped = model.model_dump() # (2)! |
| 215 | +``` |
| 216 | + |
| 217 | +1. The provided data is sent to `pydantic-core` by using the |
| 218 | + [`SchemaValidator.validate_python`][pydantic_core.SchemaValidator.validate_python] method. |
| 219 | + `pydantic-core` will validate (following the core schema of the model) the data and populate |
| 220 | + the model's `__dict__` attribute. |
| 221 | +2. The `model` instance is sent to `pydantic-core` by using the |
| 222 | + [`SchemaSerializer.to_python`][pydantic_core.SchemaSerializer.to_python] method. |
| 223 | + `pydantic-core` will read the instance's `__dict__` attribute and build the appropriate result |
| 224 | + (again, following the core schema of the model). |
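The two steps above can be reproduced by calling the validator and serializer objects directly. The sketch below relies on the `__pydantic_validator__` and `__pydantic_serializer__` class attributes, which are internal details that application code would not normally touch; it is shown only to make the hand-off to `pydantic-core` visible.

```python
from pydantic import BaseModel


class Model(BaseModel):
    foo: int


# Built from the model's core schema when the class is created.
validator = Model.__pydantic_validator__    # a pydantic_core.SchemaValidator
serializer = Model.__pydantic_serializer__  # a pydantic_core.SchemaSerializer

# Roughly what `Model.model_validate({'foo': 1})` does under the hood.
model = validator.validate_python({'foo': 1})

# Roughly what `model.model_dump()` does under the hood.
dumped = serializer.to_python(model)
```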