-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Allow customizing core schema generation by making GenerateSchema public
#6737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Deploying with
|
| Latest commit: |
6daf1c8
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://c13c4805.pydantic-docs2.pages.dev |
| Branch Preview URL: | https://customize-str-schema.pydantic-docs2.pages.dev |
|
@vitalik I'm curious what you think of this approach. It is more restrictive than arbitrary type replacement but also pretty powerful and easy to understand. |
|
Hi @adriangb thank you for looking into this yeah, few tests on my projects seems to work Maybe you should also include some example to docs on how to achieve v1 behaviour globally : from pydantic.v1.validators import str_validator
class LaxStrGenerator(SchemaGenerator):
def str_schema(self):
return core_schema.no_info_plain_validator_function(str_validator)
BaseModel.model_config['schema_generator'] = LaxStrGenerator
... |
|
Questions to think about:
|
|
After talking with @dmontagu yesterday we came to the conclusion that this approach is promising, but there was a lot of overlap between the new I added methods for some generic types ( Some open questions:
|
|
@vitalik More feedback from your perspective is appreciated :) |
|
@dmontagu I've added |
a36fbaa to
b045d64
Compare
| assert type(m.tuple_field) is tuple | ||
| assert type(m.long_tuple_field) is tuple | ||
| assert type(m.tuple_field) is CustomTuple | ||
| assert type(m.long_tuple_field) is CustomLongTuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several bugs / inconsistencies are getting fixed by this PR
|
please review |
|
Could you please add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice 👍🏻
Also, this opens up some possibilities for testing as schema_generator basically becomes a dependency injection.
| if metadata_schema: | ||
| self._add_js_function(metadata_schema, metadata_js_function) | ||
|
|
||
| _add_custom_serialization_from_json_encoders(self._json_encoders, obj, schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this line go above the metadata_js_function stuff immediately prior? It seems if this line is going to modify the serialization, it would make sense to generate the JSON schema after that modification to schema has been performed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter? We're editing the CoreSchema here directly while the js func won't get executed until later. Or maybe I'm misunderstanding. Is there a test case we can add?
| if json_encoders is None: | ||
| return | ||
| # Check the class type and its superclasses for a matching encoder | ||
| for base in (obj, obj.__class__.__mro__[:-1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the :-1 here just to prevent object from being a base that gets checked? Any reason not to allow people to have an encoder for object? Lol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I just copied that from V1
| JsonEncoders = Dict[Type[Any], Callable[[Any], Any]] | ||
|
|
||
|
|
||
| def _add_custom_serialization_from_json_encoders( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense for this to be a method of GenerateSchema? I'm okay either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t think it matters too much happy to change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this looks great. Good work!
I think after this is merged (doesn't need to be today), that GenerateSchema should more to a public module, probably a new module pydantic/generate_schema.py
| def _arbitrary_types(self) -> bool: | ||
| return self._config_wrapper.arbitrary_types_allowed | ||
|
|
||
| def literal_schema(self, values: list[Any]) -> CoreSchema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add docstrings for all these, ideally linking to the core_schema methods.
| TUPLE_TYPES: list[type] = [tuple, typing.Tuple] | ||
| LIST_TYPES: list[type] = [list, typing.List, collections.abc.MutableSequence] | ||
| SET_TYPES: list[type] = [set, typing.Set, collections.abc.MutableSet] | ||
| FROZEN_SET_TYPES: list[type] = [frozenset, typing.FrozenSet, collections.abc.Set] | ||
| DICT_TYPES: list[type] = [dict, typing.Dict, collections.abc.MutableMapping, collections.abc.Mapping] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make these sets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can, but that's not always the case. Types don't promise to be hashable and since it's an O(1) lookup either way I tend to go for a list/tuple for these sorts of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Case in point: https://github.com/pydantic/pydantic/actions/runs/5630806077/job/15257133423?pr=6737#step:8:174
Reverting, sorry Samuel.
| elif obj in DICT_TYPES: | ||
| return self.dict_schema(*(self._get_args_resolving_forward_refs(obj) or (Any, Any))) | ||
| elif isinstance(obj, TypeAliasType): | ||
| return self._type_alias_type_schema(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again should _type_schema be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm somewhat arbitrarily deciding what to make public or not. I picked the ones that I felt were important, simple or proved that something works (e.g. lists). No reason to make everything public now, let's do it a bit at a time as we understand the use cases.
| assert Child.model_config == {'extra': 'allow', 'str_to_lower': True} | ||
|
|
||
|
|
||
| def test_json_encoders_model() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we hunt the v1 docs and copy tests that used json_encoders? I feel like we had a lot and they probably covered a lot of weird behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied several of them already.
|
please update. |
| def test_schema_generator() -> None: | ||
| class LaxStrGenerator(GenerateSchema): | ||
| def str_schema(self) -> CoreSchema: | ||
| return core_schema.no_info_plain_validator_function(str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this specifically is intended to be the solution for people who want to override the string validation, but if it is, we should make sure the JSON schema generation works. Right now, you get:
>>> ta.json_schema()
pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.PlainValidatorFunctionSchema ({'type': 'no-info', 'function': <class 'str'>})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be up to them to return a CoreSchema that can generate a JSON schema. I'm not sure there's much we can do.
| if _typing_extra.origin_is_union(origin): | ||
| return self._union_schema(obj) | ||
| elif issubclass(origin, typing.Tuple): # type: ignore[arg-type] | ||
| elif origin in TUPLE_TYPES: |
| return core_schema.any_schema() | ||
| return self.match_type(obj) | ||
|
|
||
| def match_type(self, obj: Any) -> core_schema.CoreSchema: # noqa: C901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this method intended to be public? I know it says we'll evolve this below, but it might make sense to say something to the effect of "if you override this, know that it may be updated in future versions" or similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. My only concern is making sure that we are clear about what we want to guarantee in terms of breaking changes to GenerateSchema; it feels like there could be bugs in the future where we want to make breaking changes to some of the non-leading-_ methods, not sure if you agree with that but if you do maybe it makes sense to document this more explicitly (even if we don't change the leading-_-ness).
(Also please address Samuel's feedback before merging.)
|
also mypy tests are failing. |
eb4f6e3 to
c3e2b75
Compare
|
Since we're now passing We could also arrange this in a hierarchy of methods that ~ reflects the type hierarchy, e.g. Something like this: from inspect import isclass
from typing import Any, Collection, Dict, Mapping, get_args, get_origin
from pydantic_core import CoreSchema, core_schema
class GenerateSchema:
def match_type(self, tp: Any) -> CoreSchema:
if isclass(tp):
if issubclass(tp, Collection):
return self.collection_schema(tp, tp)
origin = get_origin(tp)
if isclass(origin):
if issubclass(origin, Collection):
return self.collection_schema(tp, origin)
# handle non classes like ForwardRef
raise NotImplementedError
def collection_schema(self, tp: Any, origin: Any) -> CoreSchema:
if issubclass(tp, Mapping):
return self.mapping_schema(tp, origin)
raise NotImplementedError('Subclass to support other collection types')
def mapping_schema(self, tp: Any, origin: Any) -> CoreSchema:
if origin is not None:
args = get_args(tp)
assert len(args) == 2, 'Expected Mapping to have two generic arguments'
key_type, value_type = args
else:
key_type = value_type = Any
if issubclass(tp, Dict):
return self.dict_schema(tp, origin, key_type, value_type)
raise NotImplementedError('Subclass to support other mapping types')
def dict_schema(self, tp: Any, origin: Any, key_type: Any, value_type: Any) -> CoreSchema:
if origin is not dict:
raise NotImplementedError('Subclass to support custom dict subclasses')
return core_schema.dict_schema(self.match_type(key_type), self.match_type(value_type)) |
eea980c to
63c7ff5
Compare
|
After some internal discussion, we came to the conclusion that exposing |
2fc538e to
4da8b58
Compare
|
please review |
GenerateSchema public
a4a7c7d to
b76dc2b
Compare
|
Wanted to notify folks following along with this pattern - we're planning on reverting this (very loose) support in v2.10. See #10303 for details. The |
The idea here is to make a more constrained version of schema generation customization that we can expand over time.
Fix #6045
EDIT by @Kludex: Closes #6726
Selected Reviewer: @lig