Add initial architecture documentation (#10077) · pydantic/pydantic@f5d6acf · GitHub

Commit f5d6acf
Add initial architecture documentation (#10077)
This is still a work in progress and subject to change.
1 parent 4253e14 commit f5d6acf

4 files changed: +229 -2 lines changed

docs/api/base_model.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ Pydantic models are simply classes which inherit from `BaseModel` and define fie
  - model_extra
  - model_fields
  - model_fields_set
+ - __pydantic_core_schema__
  - model_construct
  - model_copy
  - model_dump

docs/architecture.md

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
!!! warning "🚧 Work in Progress"
    This page is a work in progress.

Starting with Pydantic V2, part of the codebase is written in Rust in a separate package called `pydantic-core`.
This was done partly in order to improve validation and serialization performance (at the cost of limited
customization and extensibility of the internal logic).

This architecture documentation first covers how the `pydantic` and `pydantic-core` packages interact,
then goes through the architecture specifics of various patterns (model definition, validation,
serialization, JSON Schema).

Usage of the Pydantic library can be divided into two parts:

- Model definition, done in the `pydantic` package.
- Model validation and serialization, done in the `pydantic-core` package.

## Model definition

Whenever a Pydantic [`BaseModel`][pydantic.main.BaseModel] is defined, the metaclass
will analyze the body of the model to collect a number of elements:

- Defined annotations to build model fields (collected in the [`model_fields`][pydantic.main.BaseModel.model_fields] attribute).
- Model configuration, set with [`model_config`][pydantic.main.BaseModel.model_config].
- Additional validators/serializers.
- Private attributes, class variables, identification of generic parametrization, etc.
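
One way to see what the metaclass collected is to inspect the resulting class. The following is a minimal sketch assuming a recent Pydantic V2 release. `User`, `_token` and `name_not_empty` are hypothetical names. `__class_vars__`, `__private_attributes__` and `__pydantic_decorators__` are internal attributes that may change between versions.

```python
from typing import ClassVar

from pydantic import BaseModel, ConfigDict, PrivateAttr, field_validator


class User(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True)  # model configuration

    name: str  # annotation -> model field
    age: int = 0  # annotation with a default -> model field
    kind: ClassVar[str] = 'user'  # class variable, not a field
    _token: str = PrivateAttr(default='')  # private attribute, not a field

    @field_validator('name')  # additional validator
    @classmethod
    def name_not_empty(cls, value: str) -> str:
        if not value:
            raise ValueError('name must not be empty')
        return value


print(list(User.model_fields))  # ['name', 'age']
print(User.model_config['str_strip_whitespace'])  # True
print(User.__class_vars__)  # class variables found in the body
print(list(User.__private_attributes__))  # private attributes found in the body
print(list(User.__pydantic_decorators__.field_validators))  # collected validators
```

Only `name` and `age` become fields. The class variable and the private attribute are tracked separately, and the validator is recorded so it can be wired into the core schema later.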

### Communicating between `pydantic` and `pydantic-core`: the core schema

We then need a way to communicate the collected information from the model definition to `pydantic-core`,
so that validation and serialization are performed accordingly. To do so, Pydantic uses the concept
of a core schema: a structured (and serializable) Python dictionary (represented using
[`TypedDict`][typing.TypedDict] definitions) describing a specific validation and serialization
logic. It is the core data structure used to communicate between the `pydantic` and `pydantic-core`
packages. Every core schema has a required `type` key, and extra properties depending on this `type`.

The generation of a core schema is handled in a single place, by the `GenerateSchema` class
(whether it is for a Pydantic model or any other type).

!!! note
    It is not possible to define custom core schema types. A core schema needs to be understood by the
    `pydantic-core` package, and as such we only support a fixed number of core schema types.
    This is also part of the reason why the `GenerateSchema` class isn't truly exposed and properly
    documented.

The core schema definitions can be found in the [`pydantic_core.core_schema`][] module.

In the case of a Pydantic model, a core schema will be constructed and set as the
[`__pydantic_core_schema__`][pydantic.main.BaseModel.__pydantic_core_schema__] attribute.
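
To see this attribute in practice, the core schema of a small model with a strict boolean field can be inspected directly. This is a sketch against a recent Pydantic V2 release. The nesting used below, a `model` schema wrapping a `model-fields` schema, reflects current internals and is not a stable API.

```python
from pydantic import BaseModel, Field


class Model(BaseModel):
    foo: bool = Field(strict=True)


schema = Model.__pydantic_core_schema__
print(schema['type'])  # top-level core schema type for the model

# The model schema wraps an inner schema holding one entry per field.
fields_schema = schema['schema']
print(fields_schema['type'])
print(fields_schema['fields']['foo']['schema'])  # the `bool` core schema of `foo`
```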

To illustrate what a core schema looks like, we will take the example of the
[`bool`][pydantic_core.core_schema.bool_schema] core schema:

```python lint="skip" test="skip"
class BoolSchema(TypedDict, total=False):
    type: Required[Literal['bool']]
    strict: bool
    ref: str
    metadata: Any
    serialization: SerSchema
```

When defining a Pydantic model with a boolean field:

```python
from pydantic import BaseModel, Field


class Model(BaseModel):
    foo: bool = Field(strict=True)
```

The core schema for the `foo` field will look like:

```python
{
    'type': 'bool',
    'strict': True,
}
```
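
The same schema can also be built by hand with the helper functions from `pydantic_core.core_schema` and passed to a `SchemaValidator`. This sketch is only meant to show that the core schema is a plain dictionary that `pydantic-core` consumes directly. Regular Pydantic code never needs to do this.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

# The helper returns the same plain dictionary as shown above.
schema = core_schema.bool_schema(strict=True)
print(schema)  # {'type': 'bool', 'strict': True}

validator = SchemaValidator(schema)
print(validator.validate_python(True))  # True

# In strict mode, the string 'true' is rejected...
try:
    validator.validate_python('true')
except ValidationError as exc:
    print(exc.error_count())  # 1

# ...whereas a non-strict bool schema coerces it.
lax_validator = SchemaValidator(core_schema.bool_schema())
print(lax_validator.validate_python('true'))  # True
```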

As seen in the [`BoolSchema`][pydantic_core.core_schema.bool_schema] definition,
the serialization logic is also defined in the core schema.
If we were to define a custom serialization function for `foo` (1), the `serialization` key would look like:
{ .annotate }

1. For example using the [`field_serializer`][pydantic.functional_serializers.field_serializer] decorator:

    ```python test="skip" lint="skip"
    from pydantic import BaseModel, Field, field_serializer


    class Model(BaseModel):
        foo: bool = Field(strict=True)

        @field_serializer('foo', mode='plain')
        def serialize_foo(self, value: bool) -> int:
            return int(value)
    ```

```python lint="skip" test="skip"
{
    'type': 'function-plain',
    'function': <function Model.serialize_foo at 0x111>,
    'is_field_serializer': True,
    'info_arg': False,
    'return_schema': {'type': 'int'},
}
```

Note that this is also a core schema definition, except that it is only relevant to `pydantic-core` during serialization.
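
This sketch shows that the `serialization` key is only consulted on the serialization side. It attaches a plain serializer function to a strict bool schema using `pydantic_core.core_schema` helpers, then uses the schema with both a `SchemaValidator` and a `SchemaSerializer`. `to_int` is a hypothetical stand-in for `Model.serialize_foo`.

```python
from pydantic_core import SchemaSerializer, SchemaValidator, core_schema


def to_int(value: bool) -> int:
    """Hypothetical serializer: dump booleans as 0/1."""
    return int(value)


schema = core_schema.bool_schema(
    strict=True,
    serialization=core_schema.plain_serializer_function_ser_schema(
        to_int,
        info_arg=False,  # the function takes no `info` argument
        return_schema=core_schema.int_schema(),
    ),
)
print(schema['serialization']['type'])  # function-plain

# Validation ignores the `serialization` key entirely...
validated = SchemaValidator(schema).validate_python(True)
# ...while serialization routes the value through `to_int`.
serializer = SchemaSerializer(schema)
print(serializer.to_python(validated))  # 1
print(serializer.to_json(validated))  # b'1'
```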

Core schemas cover a broad scope, and are used whenever we want to communicate between the Python and Rust sides.
While the previous examples were related to validation and serialization, core schemas could in theory be used
for anything: error management, extra metadata, etc.

### JSON Schema generation

You may have noticed that the previous serialization core schema has a `return_schema` key.
This is because the core schema is also used to generate the corresponding JSON Schema.

Similar to how the core schema is generated, the JSON Schema generation is handled by the
[`GenerateJsonSchema`][pydantic.json_schema.GenerateJsonSchema] class.
The [`generate`][pydantic.json_schema.GenerateJsonSchema.generate] method
is the main entry point and is given the core schema of the model.

Coming back to our `bool` field example, the [`bool_schema`][pydantic.json_schema.GenerateJsonSchema.bool_schema]
method will be given the previously generated [boolean core schema][pydantic_core.core_schema.bool_schema]
and will return the following JSON Schema:

```json
{"type": "boolean"}
```
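
The effect of `return_schema` shows up when generating a JSON Schema in serialization mode. This sketch reuses the `field_serializer` example, redefined so that it runs on its own, and compares `model_json_schema()` in both modes. The `mode` parameter exists on `model_json_schema`, but the exact output dictionaries may vary slightly across Pydantic versions.

```python
from pydantic import BaseModel, Field, field_serializer


class Model(BaseModel):
    foo: bool = Field(strict=True)

    @field_serializer('foo', mode='plain')
    def serialize_foo(self, value: bool) -> int:
        return int(value)  # the `-> int` annotation becomes the `return_schema`


validation = Model.model_json_schema(mode='validation')
serialization = Model.model_json_schema(mode='serialization')
print(validation['properties']['foo'])  # {'title': 'Foo', 'type': 'boolean'}
print(serialization['properties']['foo'])  # generated from `return_schema`
```

In validation mode, `foo` is described from the `bool` core schema. In serialization mode, the `return_schema` of the serializer is used instead, so `foo` should be reported as an integer.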

### Customizing the core schema and JSON schema

!!! abstract "Usage Documentation"
    [Custom types](concepts/types.md#custom-types)

    [Implementing `__get_pydantic_core_schema__`](concepts/json_schema.md#implementing-__get_pydantic_core_schema__)

    [Implementing `__get_pydantic_json_schema__`](concepts/json_schema.md#implementing-__get_pydantic_json_schema__)

While the `GenerateSchema` and [`GenerateJsonSchema`][pydantic.json_schema.GenerateJsonSchema] classes handle
the creation of the corresponding schemas, Pydantic offers a way to customize them in some cases, following a wrapper pattern.
This customization is done through the `__get_pydantic_core_schema__` and `__get_pydantic_json_schema__` methods.

To understand this wrapper pattern, we will take the example of metadata classes used with [`Annotated`][typing.Annotated],
where the `__get_pydantic_core_schema__` method can be used:

```python
from typing import Any

from pydantic_core import CoreSchema
from typing_extensions import Annotated

from pydantic import GetCoreSchemaHandler, TypeAdapter


class MyStrict:
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        schema = handler(source)  # (1)!
        schema['strict'] = True
        return schema


class MyGt:
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        schema = handler(source)  # (2)!
        schema['gt'] = 1
        return schema


ta = TypeAdapter(Annotated[int, MyStrict(), MyGt()])
```

1. `MyStrict` is the first annotation to be applied. At this point, `schema = {'type': 'int'}`.
2. `MyGt` is the last annotation to be applied. At this point, `schema = {'type': 'int', 'strict': True}`.

When the `GenerateSchema` class builds the core schema for `Annotated[int, MyStrict(), MyGt()]`, it will
create an instance of a `GetCoreSchemaHandler` to be passed to the `MyGt.__get_pydantic_core_schema__` method. (1)
{ .annotate }

1. In the case of our [`Annotated`][typing.Annotated] pattern, the `GetCoreSchemaHandler` is defined in a nested way.
    Calling it will recursively call the other `__get_pydantic_core_schema__` methods until it reaches the `int` type,
    where a simple `{'type': 'int'}` schema is returned.
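
The nesting can be observed with a hypothetical `Trace` metadata class that records when its hook is entered and exited. This sketch assumes current Pydantic V2 behavior: the handler passed to the last annotation wraps the one before it, so hooks should be entered from right to left and exited from left to right.

```python
from typing import Any, List

from pydantic_core import CoreSchema
from typing_extensions import Annotated

from pydantic import GetCoreSchemaHandler, TypeAdapter

calls: List[str] = []


class Trace:
    """Hypothetical metadata class recording when its hook runs."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __get_pydantic_core_schema__(
        self, source: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        calls.append(f'enter {self.name}')
        # Delegate to the annotation on the left (or, eventually, to `int`).
        schema = handler(source)
        calls.append(f'exit {self.name}')
        return schema


ta = TypeAdapter(Annotated[int, Trace('first'), Trace('last')])
print(calls)
```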

The `source` argument depends on the core schema generation pattern. In the case of [`Annotated`][typing.Annotated],
the `source` will be the type being annotated. When [defining a custom type](concepts/types.md#as-a-method-on-a-custom-type),
the `source` will be the actual class where `__get_pydantic_core_schema__` is defined.
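
The difference in `source` can be checked with two hypothetical classes. `Marker` is used as `Annotated` metadata, while `Meters` defines the hook on the custom type itself. This sketch only records the `source` each hook receives. The `Meters` schema, which validates a float and then wraps it with `no_info_after_validator_function`, is one illustrative choice among several.

```python
from typing import Any, Dict

from pydantic_core import CoreSchema, core_schema
from typing_extensions import Annotated

from pydantic import GetCoreSchemaHandler, TypeAdapter

seen: Dict[str, Any] = {}


class Marker:
    """Hypothetical `Annotated` metadata: `source` is the annotated type."""

    def __get_pydantic_core_schema__(
        self, source: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        seen['annotated'] = source
        return handler(source)


class Meters(float):
    """Hypothetical custom type: `source` is the class defining the hook."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        seen['custom'] = source
        # Validate as a float, then wrap the result in the custom type.
        return core_schema.no_info_after_validator_function(cls, core_schema.float_schema())


TypeAdapter(Annotated[int, Marker()])
meters = TypeAdapter(Meters).validate_python(3)
print(seen['annotated'])  # <class 'int'>
print(seen['custom'])  # the `Meters` class itself
print(type(meters).__name__, meters)
```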

## Model validation and serialization

While model definition was scoped to the _class_ level (i.e. when defining your model), model validation
and serialization happen at the _instance_ level. Both these concepts are handled in `pydantic-core`
(providing a 5 to 20 times performance increase compared to Pydantic V1), by using the previously built core schema.

`pydantic-core` exposes the [`SchemaValidator`][pydantic_core.SchemaValidator] and
[`SchemaSerializer`][pydantic_core.SchemaSerializer] classes to perform these tasks:

```python
from pydantic import BaseModel


class Model(BaseModel):
    foo: int


model = Model.model_validate({'foo': 1})  # (1)!
dumped = model.model_dump()  # (2)!
```

1. The provided data is sent to `pydantic-core` by using the
    [`SchemaValidator.validate_python`][pydantic_core.SchemaValidator.validate_python] method.
    `pydantic-core` will validate the data (following the core schema of the model) and populate
    the model's `__dict__` attribute.
2. The `model` instance is sent to `pydantic-core` by using the
    [`SchemaSerializer.to_python`][pydantic_core.SchemaSerializer.to_python] method.
    `pydantic-core` will read the instance's `__dict__` attribute and build the appropriate result
    (again, following the core schema of the model).
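
Both steps can also be reproduced by calling the `pydantic-core` objects stored on the model class. In recent V2 releases these are the `__pydantic_validator__` and `__pydantic_serializer__` class attributes. They are internal, so this sketch is for illustration rather than a recommended API.

```python
from pydantic import BaseModel


class Model(BaseModel):
    foo: int


# Roughly what `Model.model_validate({'foo': '1'})` does: the validator
# coerces the input and populates the instance's `__dict__`.
model = Model.__pydantic_validator__.validate_python({'foo': '1'})
print(model.__dict__)  # {'foo': 1}

# Roughly what `model.model_dump()` does: the serializer reads `__dict__`.
dumped = Model.__pydantic_serializer__.to_python(model)
print(dumped)  # {'foo': 1}
print(Model.__pydantic_serializer__.to_json(model))  # b'{"foo":1}'
```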

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -147,6 +147,7 @@ nav:
  - Semantic Version: api/pydantic_extra_types_semantic_version.md
  - Timezone Name: api/pydantic_extra_types_timezone_name.md
  - ULID: api/pydantic_extra_types_ulid.md
+ - Architecture: architecture.md
  - Examples:
  - Secrets: examples/secrets.md
  - Validators: examples/validators.md

pydantic/main.py

Lines changed: 3 additions & 2 deletions
@@ -136,6 +136,8 @@ class BaseModel(metaclass=_model_construction.ModelMetaclass):

  __pydantic_complete__: ClassVar[bool]
  __pydantic_core_schema__: ClassVar[CoreSchema]
+ """The core schema of the model."""
+
  __pydantic_custom_init__: ClassVar[bool]
  __pydantic_decorators__: ClassVar[_decorators.DecoratorInfos]
  __pydantic_generic_metadata__: ClassVar[_generics.PydanticGenericMetadata]
@@ -994,8 +996,7 @@ def __init_subclass__(cls, **kwargs: Unpack[ConfigDict]):
  ```py
  from pydantic import BaseModel

- class MyModel(BaseModel, extra='allow'):
-     ...
+ class MyModel(BaseModel, extra='allow'): ...
  ```

  However, this may be deceiving, since the _actual_ calls to `__init_subclass__` will not receive any
