Collect model fields when rebuilding a model #11388

Viicos · 2025-02-04T17:00:18Z

Fixes #11390, fixes #10695, works towards #9969 (we don't get the KeyError anymore, but we still get a clean schema error because of a missing definition).

Change Summary

#5305 made it so that unknown forward annotations would not stop field collection. The PR is huge and doesn't explain why this was done, but this introduced subtle bugs and code smells that this PR tries removing.

The first issue arising from this is that the model fields can be inaccurate when a model is not fully built yet:

from typing import Annotated

from pydantic import BaseModel

class Model(BaseModel):
    a: 'Annotated[Forward, Field(alias="b")]'

Model.model_fields['a'].alias
#> None, should be 'b'

and I won't be surprised if some users currently make use of model_fields on an incomplete model (especially as we implicitly rebuild on instance creation — meaning most users don't explicitly call model_rebuild()). A relatively big concern I also have is FastAPI relying on FieldInfo instances to validate data. They do manual stuff before calling the Pydantic validate methods, such as checking for FieldInfo.is_required(), alias.

When a model is being rebuilt, we reevaluate the annotation and recreate a FieldInfo instance during the core schema gen process (and we used to even to this in a less performant way; see #10769):

pydantic/pydantic/_internal/_generate_schema.py

Lines 1301 to 1320 in 929e8f4

    
           if not field_info.evaluated: 
        
               # TODO Can we use field_info.apply_typevars_map here? 
        
               try: 
        
                   evaluated_type = _typing_extra.eval_type(field_info.annotation, *self._types_namespace) 
        
               except NameError as e: 
        
                   raise PydanticUndefinedAnnotation.from_name_error(e) from e 
        
               evaluated_type = replace_types(evaluated_type, self._typevars_map) 
        
               field_info.evaluated = True 
        
               if not has_instance_in_type(evaluated_type, PydanticRecursiveRef): 
        
                   new_field_info = FieldInfo.from_annotation(evaluated_type) 
        
                   field_info.annotation = new_field_info.annotation 
        
                   # Handle any field info attributes that may have been obtained from now-resolved annotations 
        
                   for k, v in new_field_info._attributes_set.items(): 
        
                       # If an attribute is already set, it means it was set by assigning to a call to Field (or just a 
        
                       # default value), and that should take the highest priority. So don't overwrite existing attributes. 
        
                       # We skip over "attributes" that are present in the metadata_lookup dict because these won't 
        
                       # actually end up as attributes of the `FieldInfo` instance. 
        
                       if k not in field_info._attributes_set and k not in field_info.metadata_lookup: 
        
                           setattr(field_info, k, v)

Doing so introduced a code smell where we track explicitly set attributes using _attributes_set. This is known to be a source of bugs as we need to keep track of such explicitly set attrs when you alter the FieldInfo instance:

We forgot to update it when using ... as the default value in some cases. See this added test case:

pydantic/tests/test_edge_cases.py

Lines 1919 to 1927 in 929e8f4

    
           def test_ellipsis_forward_ref_annotated() -> None: 
        
               """This previously resulted in the ellipsis being used as a default value.""" 
        
               class Model(BaseModel): 
        
                   f: 'Forward' 
        
               Forward = Annotated[int, Field(...)] 
        
               assert Model.model_fields['f'].default is PydanticUndefined

We currently forget to update deprecated (first bug of Deprecated fields bugs with forward annotations #11390).

And more generally:

On _generate_schema.py#L1311, we directly mutate the original field info instance to take the newly evaluated annotation. This is the reason we get the KeyError on generics (KeyError when inhering from a self-referencing generic model #9969).
We make use of the model fields (even though they might be incorrect) inside ModelMetaclass.__new__ , at model creation (i.e. stuff that isn't reprocessed on model_rebuild()), such as setting deprecated descriptors (second bug of #11390: we set these descriptors based on FieldInfo.deprecated).

Instead, this PR does three things:

Add a new __pydantic_fields_complete__ class property on models (for private use). It is True if at least one field annotation failed to evaluate during collect_model_fields(). On a related note: we still set __pydantic_fields__ for backwards compatibility, but a follow up PR can now easily add a user/deprecation warning when accessing model_fields if __pydantic_fields_complete__ == False. In V3, we could remove this backward compatibility logic (TBD).
Also note that now we don't enter GenerateSchema if the fields collection failed, as it is guaranteed to fail anyway (if an annotation failed to evaluate during fields collection, it will also immediately fail to be evaluated in GenerateSchema).
Calling model_rebuild() will redo the fields collection if it wasn't successful the first time. Note that this aligns with Pydantic dataclasses.
When an incomplete Pydantic model is encountered as an annotation (inside GenerateSchema), we first call rebuild_model_fields(); a new function that iterates over the existing (but potentially not complete) field infos, and recreate instances if the annotation needs to be evaluated again. This way, we avoid generating a schema for each field just to realize that one field still has a forward annotation:
```
class Sub(BaseModel):
    f1: int
    ...  # Many fields...
    f100: 'Forward'

class Model(BaseModel):
    sub: Sub
    # Note that Sub is incomplete, so in the process of generating the schema for
    # `Model`, we don't use the cached schema attribute
    # Thus previously, we would generate a schema for f1, f2, ..., f99. f100 would fail,
    # meaning we generated 99 fields core schemas for nothing.
    # In this PR, we first (re)collect fields for `Sub`, and immediately raise if fields collection did not suceed.
```
On important thing to acknowledge: when encountering sub: Sub, we don't call Sub.model_rebuild(). Doing so might be the right thing to do, but is unfortunately not backwards compatible. The reason is that the NS Resolver used for Model might contain annotations (e.g. provided through the parent_namespace) that should be available for Sub. This means that even if sub: Sub successfully builds, Sub isn't complete yet — each model should be rebuilt independently, through model_rebuild()).

TODO:

Performance issues: I believe this is a tradeoff to have, as the couple affected benchmarks are models with one field defined, and the new rebuild_model_fields() might add a bit of overhead. However, it will be beneficial if many fields are defined (see the example with f1..100 above and the test_failed_rebuild benchmark).

Related issue number

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

This aligns with dataclasses.

This is the biggest change in this PR, and the hardest to reason about and the trickiest to implement. Instead of re-evaluating annotations on a field-by-field basis, we assume all fields have their annotation evaluated (and consequently the `FieldInfo` attributes -- some of them set depending on the annotation -- are correct). To do so, we collect model fields (if necessary, thanks to the new `__pydantic_fields_complete__` attribute) in `_model_schema()`, and then have `_common_field_schema()` called with a fully built `FieldInfo` instance (this mimics what is done for dataclasses). When `model_rebuild()` is called, we also want to recollect fields if it wasn't successful the first time. This introduces challenges, such as keeping the model assignments to be able to recollect fields and call `FieldInfo.from_annotated_attribute()`.

codspeed-hq · 2025-02-04T17:04:43Z

CodSpeed Performance Report

Merging #11388 will degrade performances by 15.94%

_{Comparing model-fields-cleanup (de1adec) with main (53f8ece)}

Summary

⚡ 2 improvements
❌ 3 regressions
✅ 41 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`test_schema_build`	2.5 ms	2.8 ms	-11.53%
⚡	`test_construct_dataclass_schema`	1.2 ms	1.1 ms	+8.6%
⚡	`test_failed_rebuild`	3,851.1 µs	211.9 µs	×18
❌	`test_deeply_nested_recursive_model_schema_generation`	1.1 ms	1.3 ms	-15.94%
❌	`test_recursive_discriminated_union_with_base_model`	1.4 ms	1.5 ms	-8.22%

cloudflare-workers-and-pages · 2025-02-04T17:04:58Z

Deploying pydantic-docs with Cloudflare Pages

Latest commit:	`de1adec`
Status:	✅ Deploy successful!
Preview URL:	https://203f8a46.pydantic-docs.pages.dev
Branch Preview URL:	https://model-fields-cleanup.pydantic-docs.pages.dev

View logs

The `has_instance_in_type` function is no longer used. We don't need to track attributes explicitly set on `FieldInfo` (this was a code smell and I'm glad it's gone now).

…s()` They need to be performed once we know fields are complete.

pydantic/fields.py

pydantic/_internal/_generics.py

pydantic/_internal/_model_construction.py

sydney-runkle · 2025-02-10T11:42:43Z

Add a new __pydantic_fields_complete__ class property on models

Great, this seems like a good step.

Calling model_rebuild() will redo the fields collection if it wasn't successful the first time. Note that this aligns with Pydantic dataclasses.

Seems good to be consistent here. We should comb through model_rebuild issues, this might close some of those.

This way, we avoid generating a schema for each field just to realize that one field still has a forward annotation:

I really like this pattern as well. Not only more performant, but also makes more sense logically.

Performance issues: I believe this is a tradeoff to have, as the couple affected benchmarks are models with one field defined, and the new rebuild_model_fields() might add a bit of overhead. However, it will be beneficial if many fields are defined (see the example with f1..100 above and the test_failed_rebuild benchmark).

Sure, willing to move forward here, these regressions aren't super concerning. As I review, I'll see if I can suggest any changes that might lead to slight reductions in the regressions.

sydney-runkle

Thanks for the in depth PR description.

Code changes generally look good. I'd like to learn more about:
a) memory usage consequences
b) where more time is now being spent - the codspeed benchmarks aren't super clear on what's slowing things down

I understand that the slower benchmarks are for pretty specific use cases, but want to make sure we understand perf consequences before merging.

pydantic/fields.py

pydantic/_internal/_model_construction.py

pydantic/main.py

pydantic/_internal/_fields.py

sydney-runkle · 2025-02-10T16:08:16Z

pydantic/_internal/_fields.py

+            # The `from_annotated_attribute()` call below mutates the assigned `Field()`, so make a copy:
+            original_assignment = (
+                copy(assigned_value) if not evaluated and isinstance(assigned_value, FieldInfo_) else assigned_value
+            )


Any metrics on perf impact of this copy? Seems like a pretty special case...

It's only relevant if an annotation fails to evaluate and has a Field() assigned:

a: Undefined = Field()

So not that common, + this copy could be removed if we manage to refactor how we merge field infos together.

I think it's definitely worth investigating refactoring how we merge field infos :).

sydney-runkle · 2025-02-10T16:09:24Z

pydantic/_internal/_fields.py

+    """
+    FieldInfo_ = import_cached_field_info()
+
+    rebuilt_fields: dict[str, FieldInfo] = {}


Understandable, but downside - this is going to increate memory usage. Could you benchmark this with memray and see what the difference is like?

It's only going to make temporary memory allocations, as:

if this is called from model_rebuild(), it will override the existing __pydantic_fields__, so the old ones will be gc'ed.

if this is called in GenerateSchema._model_schema(), as part of another model build, they are going to be gc'ed at the end of the method.

pydantic/_internal/_generics.py

pydantic/_internal/_fields.py

github-actions · 2025-02-12T14:12:13Z

Coverage report

Click to see where and how coverage changed

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
pydantic
fields.py
main.py
pydantic/_internal
_fields.py
_generate_schema.py
_model_construction.py
Project Total

_{This report was generated by python-coverage-comment-action}

pydantic/_internal/_fields.py

sydney-runkle · 2025-02-12T14:59:59Z

pydantic/_internal/_model_construction.py

+            if config_wrapper.defer_build:
+                # TODO we can also stop there if `__pydantic_fields_complete__` is False.
+                # However, `set_model_fields()` is currently lenient and we don't have access to the `NameError`.
+                # (which is useful as we can provide the name in the error message: `set_model_mock(cls, e.name)`)
+                set_model_mocks(cls)


Can we link an issue to this comment?

Created #11453.

sydney-runkle · 2025-02-12T15:00:33Z

pydantic/fields.py

+        # Private attributes, used to rebuild FieldInfo instances:
+        self._complete = True
+        self._original_annotation: Any = PydanticUndefined
+        self._original_assignment: Any = PydanticUndefined


Should this be _original_default? Or is that confusing for the Field() case?

I think it's confusing for Field() assignments

sydney-runkle

Great work here - this takes us in a good direction in terms of refactoring field collection and merging, which have the potential for some modest perf benefits and minimal breaking changes.

sydney-runkle · 2025-02-12T15:04:13Z

Note - failing 3rd party tests are unrelated and should be fixed by #11429

This results in parameterized classes being different type objects, although they represent the same type. A new refactor (#11388) was introduced since then and fixes the root issue.

This applying roughly the same logic from #11388 for dataclasses.

Viicos added 2 commits February 3, 2025 21:29

Collect and set model fields when rebuilding model

7175058

This aligns with dataclasses.

github-actions bot added the relnotes-fix Used for bugfixes. label Feb 4, 2025

Viicos force-pushed the model-fields-cleanup branch from 9f735a6 to 131fa9f Compare February 4, 2025 17:04

Clean up unused code

4e1f4c5

The `has_instance_in_type` function is no longer used. We don't need to track attributes explicitly set on `FieldInfo` (this was a code smell and I'm glad it's gone now).

Viicos added the third-party-tests Add this label on a PR to trigger 3rd party tests label Feb 4, 2025

Viicos closed this Feb 4, 2025

Viicos reopened this Feb 4, 2025

Viicos force-pushed the model-fields-cleanup branch from bba626b to a463e08 Compare February 4, 2025 19:58

Move operations depending on model fields inside `complete_model_clas…

361847c

…s()` They need to be performed once we know fields are complete.

Viicos force-pushed the model-fields-cleanup branch from 0e3cd84 to 10a643a Compare February 6, 2025 13:54

Viicos added 2 commits February 6, 2025 16:47

Simplify condition for field info creation

07f2c36

Alternative approach

5a73371

8000 Viicos force-pushed the model-fields-cleanup branch 4 times, most recently from 74e241e to 783c3be Compare February 7, 2025 10:53

Viicos commented Feb 7, 2025

View reviewed changes

pydantic/fields.py Show resolved Hide resolved

Viicos added 2 commits February 7, 2025 17:01

Add deprecated issue test

1896c54

Add generics test

125c697

Viicos force-pushed the model-fields-cleanup branch from 783c3be to 1743828 Compare February 7, 2025 16:32

Viicos commented Feb 7, 2025

View reviewed changes

pydantic/_internal/_generics.py Outdated Show resolved Hide resolved

Viicos commented Feb 7, 2025

View reviewed changes

pydantic/_internal/_model_construction.py Show resolved Hide resolved

sydney-runkle reviewed Feb 10, 2025

View reviewed changes

Feedback

da90550

Viicos force-pushed the model-fields-cleanup branch from 5fb8716 to 29f7702 Compare February 10, 2025 21:21

Last fixes

de1adec

Viicos marked this pull request as ready for review February 12, 2025 14:05

Viicos force-pushed the model-fields-cleanup branch from 83bdb4c to 76d535d Compare February 12, 2025 14:06

sydney-runkle reviewed Feb 12, 2025

View reviewed changes

pydantic/_internal/_fields.py Show resolved Hide resolved

sydney-runkle reviewed Feb 12, 2025

View reviewed changes

sydney-runkle approved these changes Feb 12, 2025

View reviewed changes

Viicos force-pushed the model-fields-cleanup branch from 76d535d to de1adec Compare February 12, 2025 16:28

Viicos merged commit 72c77bf into main Feb 12, 2025
76 of 77 checks passed

Viicos deleted the model-fields-cleanup branch February 12, 2025 16:41

This was referenced Feb 17, 2025

Raise a warning when accessing incomplete model fields #11453

Open

FormPage.__pydantic_init_subclass__ doesn't behave as expected workfloworchestrator/pydantic-forms#39

Closed

This was referenced Apr 1, 2025

Do not override FieldInfo._complete when using field from parent class #11668

Merged

Recursion ForwardRef issue starting from pydantic 2.11.0 #11689

Closed

Preserve field description when rebuilding model fields #11698

Merged

Viicos added a commit that referenced this pull request Apr 15, 2025

Remove generics cache workaround

e7d3bbf

This results in parameterized classes being different type objects, although they represent the same type. A new refactor (#11388) was introduced since then and fixes the root issue.

Viicos mentioned this pull request Apr 15, 2025

Remove generics cache workaround #11755

Merged

5 tasks

Viicos added a commit that referenced this pull request Jun 4, 2025

Rebuild dataclass fields before schema generation

158a17d

This applying roughly the same logic from #11388 for dataclasses.

Viicos mentioned this pull request Jun 4, 2025

Rebuild dataclass fields before schema generation #11949

Open

5 tasks

Viicos added a commit that referenced this pull request Jun 4, 2025

Rebuild dataclass fields before schema generation

fc1858d

This applying roughly the same logic from #11388 for dataclasses.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Collect model fields when rebuilding a model #11388

Collect model fields when rebuilding a model #11388

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

	if not field_info.evaluated:
	# TODO Can we use field_info.apply_typevars_map here?
	try:
	evaluated_type = _typing_extra.eval_type(field_info.annotation, *self._types_namespace)
	except NameError as e:
	raise PydanticUndefinedAnnotation.from_name_error(e) from e
	evaluated_type = replace_types(evaluated_type, self._typevars_map)
	field_info.evaluated = True
	if not has_instance_in_type(evaluated_type, PydanticRecursiveRef):
	new_field_info = FieldInfo.from_annotation(evaluated_type)
	field_info.annotation = new_field_info.annotation

	# Handle any field info attributes that may have been obtained from now-resolved annotations
	for k, v in new_field_info._attributes_set.items():
	# If an attribute is already set, it means it was set by assigning to a call to Field (or just a
	# default value), and that should take the highest priority. So don't overwrite existing attributes.
	# We skip over "attributes" that are present in the metadata_lookup dict because these won't
	# actually end up as attributes of the `FieldInfo` instance.
	if k not in field_info._attributes_set and k not in field_info.metadata_lookup:
	setattr(field_info, k, v)

	def test_ellipsis_forward_ref_annotated() -> None:
	"""This previously resulted in the ellipsis being used as a default value."""

	class Model(BaseModel):
	f: 'Forward'

	Forward = Annotated[int, Field(...)]

	assert Model.model_fields['f'].default is PydanticUndefined

Uh oh!

Collect model fields when rebuilding a model #11388

Collect model fields when rebuilding a model #11388

Uh oh!

Conversation

Uh oh!

Change Summary

Related issue number

Checklist

Uh oh!

Uh oh!

CodSpeed Performance Report

Merging #11388 will degrade performances by 15.94%

Summary

Benchmarks breakdown

Uh oh!

Uh oh!

Deploying pydantic-docs with Cloudflare Pages

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Coverage report

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!