-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Collect model fields when rebuilding a model #11388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This aligns with dataclasses.
This is the biggest change in this PR, and the hardest to reason about and the trickiest to implement. Instead of re-evaluating annotations on a field-by-field basis, we assume all fields have their annotation evaluated (and consequently the `FieldInfo` attributes -- some of them set depending on the annotation -- are correct). To do so, we collect model fields (if necessary, thanks to the new `__pydantic_fields_complete__` attribute) in `_model_schema()`, and then have `_common_field_schema()` called with a fully built `FieldInfo` instance (this mimics what is done for dataclasses). When `model_rebuild()` is called, we also want to recollect fields if it wasn't successful the first time. This introduces challenges, such as keeping the model assignments to be able to recollect fields and call `FieldInfo.from_annotated_attribute()`.
CodSpeed Performance ReportMerging #11388 will degrade performances by 15.94%Comparing Summary
Benchmarks breakdown
|
9f735a6
to
131fa9f
Compare
Deploying pydantic-docs with
|
Latest commit: |
de1adec
|
Status: | ✅ Deploy successful! |
Preview URL: | https://203f8a46.pydantic-docs.pages.dev |
Branch Preview URL: | https://model-fields-cleanup.pydantic-docs.pages.dev |
The `has_instance_in_type` function is no longer used. We don't need to track attributes explicitly set on `FieldInfo` (this was a code smell and I'm glad it's gone now).
bba626b
to
a463e08
Compare
…s()` They need to be performed once we know fields are complete.
0e3cd84
to
10a643a
Compare
74e241e
to
783c3be
Compare
783c3be
to
1743828
Compare
Great, this seems like a good step.
Seems good to be consistent here. We should comb through
I really like this pattern as well. Not only more performant, but also makes more sense logically.
Sure, willing to move forward here, these regressions aren't super concerning. As I review, I'll see if I can suggest any changes that might lead to slight reductions in the regressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the in depth PR description.
Code changes generally look good. I'd like to learn more about:
a) memory usage consequences
b) where more time is now being spent - the codspeed benchmarks aren't super clear on what's slowing things down
I understand that the slower benchmarks are for pretty specific use cases, but want to make sure we understand perf consequences before merging.
# The `from_annotated_attribute()` call below mutates the assigned `Field()`, so make a copy: | ||
original_assignment = ( | ||
copy(assigned_value) if not evaluated and isinstance(assigned_value, FieldInfo_) else assigned_value | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any metrics on perf impact of this copy? Seems like a pretty special case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only relevant if an annotation fails to evaluate and has a Field()
assigned:
a: Undefined = Field()
So not that common, + this copy could be removed if we manage to refactor how we merge field infos together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's definitely worth investigating refactoring how we merge field infos :).
""" | ||
FieldInfo_ = import_cached_field_info() | ||
|
||
rebuilt_fields: dict[str, FieldInfo] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understandable, but downside - this is going to increate memory usage. Could you benchmark this with memray and see what the difference is like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only going to make temporary memory allocations, as:
- if this is called from
model_rebuild()
, it will override the existing__pydantic_fields__
, so the old ones will be gc'ed. - if this is called in
GenerateSchema._model_schema()
, as part of another model build, they are going to be gc'ed at the end of the method.
5fb8716
to
29f7702
Compare
83bdb4c
to
76d535d
Compare
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
if config_wrapper.defer_build: | ||
# TODO we can also stop there if `__pydantic_fields_complete__` is False. | ||
# However, `set_model_fields()` is currently lenient and we don't have access to the `NameError`. | ||
# (which is useful as we can provide the name in the error message: `set_model_mock(cls, e.name)`) | ||
set_model_mocks(cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we link an issue to this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #11453.
# Private attributes, used to rebuild FieldInfo instances: | ||
self._complete = True | ||
self._original_annotation: Any = PydanticUndefined | ||
self._original_assignment: Any = PydanticUndefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be _original_default
? Or is that confusing for the Field()
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's confusing for Field()
assignments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work here - this takes us in a good direction in terms of refactoring field collection and merging, which have the potential for some modest perf benefits and minimal breaking changes.
Note - failing 3rd party tests are unrelated and should be fixed by #11429 |
76d535d
to
de1adec
Compare
This results in parameterized classes being different type objects, although they represent the same type. A new refactor (#11388) was introduced since then and fixes the root issue.
This applying roughly the same logic from #11388 for dataclasses.
This applying roughly the same logic from #11388 for dataclasses.
Fixes #11390, fixes #10695, works towards #9969 (we don't get the
KeyError
anymore, but we still get a clean schema error because of a missing definition).Change Summary
#5305 made it so that unknown forward annotations would not stop field collection. The PR is huge and doesn't explain why this was done, but this introduced subtle bugs and code smells that this PR tries removing.
The first issue arising from this is that the model fields can be inaccurate when a model is not fully built yet:
and I won't be surprised if some users currently make use of
model_fields
on an incomplete model (especially as we implicitly rebuild on instance creation — meaning most users don't explicitly callmodel_rebuild()
). A relatively big concern I also have is FastAPI relying onFieldInfo
instances to validate data. They do manual stuff before calling the Pydantic validate methods, such as checking forFieldInfo.is_required()
,alias
.When a model is being rebuilt, we reevaluate the annotation and recreate a
FieldInfo
instance during the core schema gen process (and we used to even to this in a less performant way; see #10769):pydantic/pydantic/_internal/_generate_schema.py
Lines 1301 to 1320 in 929e8f4
Doing so introduced a code smell where we track explicitly set attributes using
_attributes_set
. This is known to be a source of bugs as we need to keep track of such explicitly set attrs when you alter theFieldInfo
instance:...
as the default value in some cases. See this added test case:pydantic/tests/test_edge_cases.py
Lines 1919 to 1927 in 929e8f4
deprecated
(first bug of Deprecated fields bugs with forward annotations #11390).And more generally:
_generate_schema.py#L1311
, we directly mutate the original field info instance to take the newly evaluated annotation. This is the reason we get theKeyError
on generics (KeyError when inhering from a self-referencing generic model #9969).ModelMetaclass.__new__
, at model creation (i.e. stuff that isn't reprocessed onmodel_rebuild()
), such as setting deprecated descriptors (second bug of #11390: we set these descriptors based onFieldInfo.deprecated
).Instead, this PR does three things:
__pydantic_fields_complete__
class property on models (for private use). It isTrue
if at least one field annotation failed to evaluate duringcollect_model_fields()
. On a related note: we still set__pydantic_fields__
for backwards compatibility, but a follow up PR can now easily add a user/deprecation warning when accessingmodel_fields
if__pydantic_fields_complete__ == False
. In V3, we could remove this backward compatibility logic (TBD).Also note that now we don't enter
GenerateSchema
if the fields collection failed, as it is guaranteed to fail anyway (if an annotation failed to evaluate during fields collection, it will also immediately fail to be evaluated inGenerateSchema
).model_rebuild()
will redo the fields collection if it wasn't successful the first time. Note that this aligns with Pydantic dataclasses.GenerateSchema
), we first callrebuild_model_fields()
; a new function that iterates over the existing (but potentially not complete) field infos, and recreate instances if the annotation needs to be evaluated again. This way, we avoid generating a schema for each field just to realize that one field still has a forward annotation:sub: Sub
, we don't callSub.model_rebuild()
. Doing so might be the right thing to do, but is unfortunately not backwards compatible. The reason is that the NS Resolver used forModel
might contain annotations (e.g. provided through theparent_namespace
) that should be available forSub
. This means that even ifsub: Sub
successfully builds,Sub
isn't complete yet — each model should be rebuilt independently, throughmodel_rebuild()
).TODO:
Performance issues: I believe this is a tradeoff to have, as the couple affected benchmarks are models with one field defined, and the newrebuild_model_fields()
might add a bit of overhead. However, it will be beneficial if many fields are defined (see the example withf1..100
above and thetest_failed_rebuild
benchmark).Related issue number
Checklist