8000 Allow single-item discriminated unions by dmontagu · Pull Request #6405 · pydantic/pydantic · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@dmontagu
Copy link
Contributor
@dmontagu dmontagu commented Jul 3, 2023

I don't have proof yet as I can't reproduce the issue, but I suspect this may resolve the issue brought up in #5163 (comment).

Even if not, I think this is an improvement worth making, as I don't see any reason we should error if you produce a single-item discriminated union. In particular, it may be desirable to do this as a way to immediately fail validation if the tag is wrong rather than producing all the possible associated errors.

(The test_discriminated_single_variant test I've added shows this in action.)

Selected Reviewer: @davidhewitt

@cloudflare-workers-and-pages
Copy link
cloudflare-workers-and-pages bot commented Jul 3, 2023

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: 4213a62
Status: ✅  Deploy successful!
Preview URL: https://a2761ace.pydantic-docs2.pages.dev
Branch Preview URL: https://allow-single-item-discrimina.pydantic-docs2.pages.dev

View logs

@dmontagu
Copy link
Contributor Author
dmontagu commented Jul 3, 2023

Update: the issue described in the comment linked above is resolved on the current main branch even without this PR.

However, I would still be in favor of merging this for the reasons described above. Obviously it's less urgent though given it's not necessary to resolve a reported issue.

@dmontagu
Copy link
Contributor Author
dmontagu commented Jul 3, 2023

please review

@adriangb
Copy link
Member
adriangb commented Jul 4, 2023

wouldn't the right thing to do be to just return the inner schema? why have a union at all if there's just 1 thing?

@dmontagu
Copy link
Contributor Author
dmontagu commented Jul 4, 2023

I can imagine scenarios where it's useful to get the discriminated union behavior even for a single union case (note: this is in contrast with the behavior of "normal" unions, where I don't believe using a single-item-union would provide any even-theoretically-beneficial difference in behavior). In particular, rather than reporting all the possible errors if the discriminator is missing, it just reports one.

More concretely, here is an example:

from typing import Literal, Annotated

from pydantic import BaseModel, TypeAdapter, Field, ValidationError

class Model1(BaseModel):
    kind: Literal['one']
    x: str
    y: int
    z: float
    w: bool


class Model2(BaseModel):
    kind: Literal['two']
    x: int
    y: str
    z: bool
    w: float


data = {'kind': 'two', 'x': 1, 'y': 'a', 'z': False, 'w': 1.5}

adapter = TypeAdapter(Model1)
try:
    adapter.validate_python(data)
except ValidationError as exc:
    print(exc)
    """
    4 validation errors for Model1
    kind
      Input should be 'one' [type=literal_error, input_value='two', input_type=str]
        For further information visit https://errors.pydantic.dev/2.0.2/v/literal_error
    x
      Input should be a valid string [type=string_type, input_value=1, input_type=int]
        For further information visit https://errors.pydantic.dev/2.0.2/v/string_type
    y
      Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='a', input_type=str]
        For further information visit https://errors.pydantic.dev/2.0.2/v/int_parsing
    w
      Input should be a valid boolean [type=bool_type, input_value=1.5, input_type=float]
        For further information visit https://errors.pydantic.dev/2.0.2/v/bool_type
    """

adapter_with_discriminator = TypeAdapter(Annotated[Model1, Field(..., discriminator='kind')])
try:
    adapter_with_discriminator.validate_python(data)
except ValidationError as exc:
    print(exc)
"""
1 validation error for tagged-union[Model1]
  Input tag 'two' found using 'kind' does not match any of the expected tags: 'one' [type=union_tag_invalid, input_value={'kind': 'two', 'x': 1, '...', 'z': False, 'w': 1.5}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.0.2/v/union_tag_invalid
"""

In a world where you know you'll get one of two inputs, and you can distinguish by the kind field, but you only want one of the two to pass validation, I can see the error from adapter_with_discriminator.validate_python(data) being much preferable.


Another case that seems like it could be useful would be if you have logic for determining what items should be in a tagged union where having a single item is still sensible:

from typing import Literal, Annotated, Union

from pydantic import BaseModel, TypeAdapter, Field, ValidationError


class Cat(BaseModel):
    kind: Literal['cat'] = 'cat'


class Dog(BaseModel):
    kind: Literal['dog'] = 'dog'


class Fish(BaseModel):
    kind: Literal['fish'] = 'fish'


def get_pet_adapter(cats_allowed: bool, dogs_allowed: bool, fish_allowed: bool) -> TypeAdapter:
    allowed_kinds = []
    if cats_allowed:
        allowed_kinds.append(Cat)
    if dogs_allowed:
        allowed_kinds.append(Dog)
    if fish_allowed:
        allowed_kinds.append(Fish)
    return TypeAdapter(Annotated[Union[tuple(allowed_kinds)], Field(discriminator='kind')])


adapter = get_pet_adapter(cats_allowed=False, dogs_allowed=True, fish_allowed=False)
adapter.validate_python({'kind': 'dog'})

While this could potentially be worked around by simplifying the union, I still feel there is utility in being able to say "if the discriminator field fails validation, don't bother validating anything else".


Ultimately, the reasons why I think we should do this are:

  • it adds no complexity to the implementation (arguably simplifies it)
  • I don't see any reason why it would be actively undesirable to be able to do tagged-union-style validation for a single case when a user went out of their way to specify a discriminator
  • I can imagine scenarios where it would be significantly beneficial

Copy link
Contributor
@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably this removes the distinction between treatment of a single type and a 1-element union when using a discriminator.

Presumably this would (very slightly) negatively impact performance for users who use this, because pydantic_core would first validate the tag and then prodeed to validate the whole type (which would include the tag again) - i.e. it's not an optimization in the one-member case, unlike for multiple members.

An argument to not do this would be that by forbidding it we don't give users the opportunity to deoptimize themselves. Users can already handle the case of single members differently in the examples @dmontagu illustrated above.

Overall this seems like a tradeoff of usability versus performance. I think because the error (of trying to use discriminant for a non-union type, i.e. single-element union) fails at model build time, I'm currently falling just on the side of not allowing this.

We could benchmark the performance impact; maybe the overhead added is irrelevant, in which case I'd be👍

@dmontagu
Copy link
Contributor Author
dmontagu commented Jul 4, 2023

Overall this seems like a tradeoff of usability versus performance. I think because the error (of trying to use discriminant for a non-union type, i.e. single-element union) fails at model build time, I'm currently falling just on the side of not allowing this.

@davidhewitt note that this is opt-in — you still have to say you want to use a discriminator to get this behavior. Given that, I feel like it's not really a tradeoff in performance, because you can just choose not to include the discriminator field in the Field call and you will get the current behavior.

Am I misunderstanding something, or do you think even with that it should still be an error?

@dmontagu
Copy link
Contributor Author
dmontagu commented Jul 4, 2023

@davidhewitt also to be clear:

I think because the error (of trying to use discriminant for a non-union type, i.e. single-element union) fails at model build time

This PR would change that:

from typing import Literal

from pydantic import BaseModel, Field


class Model1(BaseModel):
    kind: Literal['one']
    x: str
    y: int
    z: float
    w: bool


class M(BaseModel):
    x: Model1 = Field(discriminator='kind')


data_one = {'kind': 'one', 'x': 'a', 'y': 1, 'z': 1.5, 'w': True}
data_two = {'kind': 'two', 'x': 1, 'y': 'a', 'z': False, 'w': 1.5}

print(M(x=data_one))
#> x=Model1(kind='one', x='a', y=1, z=1.5, w=True)

M(x=data_two)
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for M
x
  Input tag 'two' found using 'kind' does not match any of the expected tags: 'one' [type=union_tag_invalid, input_value={'kind': 'two', 'x': 1, '...', 'z': False, 'w': 1.5}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.0.2/v/union_tag_invalid
"""

@davidhewitt
Copy link
Contributor
davidhewitt commented Jul 4, 2023

Am I misunderstanding something, or do you think even with that it should still be an error?

My perception was that the error at build time is good because it stops users adding a performance overhead.

But on reflection I retract this opinion; I'm in favour of the usability improvement proposed here. If it's a performance hit, that's something we can solve in the implementation.

@dmontagu dmontagu merged commit 793c202 into main Jul 5, 2023
@dmontagu dmontagu deleted the allow-single-item-discriminated-union branch July 5, 2023 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

0