Add pipeline API #9459
Conversation
Deploying pydantic-docs with Cloudflare Pages

| | |
| --- | --- |
| Latest commit: | 166df3d |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://b2819b73.pydantic-docs.pages.dev |
| Branch Preview URL: | https://transform.pydantic-docs.pages.dev |
CodSpeed Performance Report: merging #9459 will not alter performance.
Very cool API you've designed here. I enjoyed this review, and am looking forward to your thoughts re my organizational questions!
Nice. I like the new namespace changes, and thanks for switching over to annotations in the docs!
@sydney-runkle I think the main thing missing here is some tests and validating JSON schema?
Indeed. I think we need:

Other questions I thought about while working on tests:
Overall I like this, though the verbosity seems quite high. It would be nice to explore ways to encourage users to make reusable pipeline pieces and similar techniques to reduce verbosity.
docs/concepts/types.md
Outdated
```python
username: Annotated[str, parse(str).str.pattern(r'[a-z]+')]  # (3)!
password: Annotated[
    str,
    parse(str).transform(str.lower).predicate(lambda x: x != 'password')]  # (4)!
```
Two observations here:
1. It feels pretty verbose to have this pipeline inline. Should we encourage users to make it reusable?

   ```python
   validate_password = parse(str).transform(str.lower).predicate(lambda x: x != 'password')

   class User(BaseModel):
       password: Annotated[str, validate_password]
   ```

2. Is there a way to not need the `Annotated`? I can't see an obvious one, but I'd love for it to work somehow, e.g. `Pipeline[str, transform(str.lower).predicate(...)]`.
`Pipeline[str, transform(str.lower).predicate(...)]` would work if we just export `Pipeline = Annotated` (I think), but that's not much better? I would love for `foo: parse(str).transform(int)` to work and infer the output type as `int`, but that just doesn't work.
> Should we encourage users to make it reusable?
I agree, making examples reusable makes sense. I would suggest:
```python
# or call some password check/validation
Password = Annotated[str, parse(str).predicate(lambda x: x != 'password')]

class User(BaseModel):
    password: Password
```
docs/concepts/types.md
Outdated
5. Use the `|` or `&` operators to combine steps (like a logical OR or AND).
6. Calling `parse()` with no arguments implies `parse(<field type>)`. Use `parse(Any)` to accept any type.
7. For recursive types you can use `parse_defer` to reference the type itself before it's defined.
8. You can call `parse()` before or after other steps to do pre or post processing.
Can you explain what this means? I assume this refers to interactions with `BeforeValidator`. How do we implement wrap validators in this model?
The point is you can do `Annotated[int, parse(str).transform(str.strip).parse().transform(lambda x: x * 2)]`, which is a wrap validator since it does (1) validate as a string and strip whitespace, (2) parse as an integer and (3) multiply by 2.
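The wrap-like ordering described here can be mimicked with plain functions to see what the chain does step by step (a standalone sketch, not the pipeline implementation; `steps` and `run_steps` are made-up names):

```python
# Toy illustration of the chain above:
# (1) validate as str and strip, (2) re-parse as int, (3) multiply by 2.
steps = [
    lambda v: str(v).strip(),  # "pre" step: clean the raw input
    int,                       # re-parse into the target type
    lambda v: v * 2,           # "post" step: transform the parsed value
]

def run_steps(value):
    for step in steps:
        value = step(value)
    return value
```

For example, `run_steps("  21 ")` strips to `"21"`, parses to `21`, and doubles to `42`.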
Oh I see. `.parse()` like that was totally not obvious to me. Is it a significant ergonomic win to allow the argument to be inferred? Otherwise I might argue to make it required for now, to simplify.
Or write an explicit section which is like:

- `BeforeValidator` -> `parse(str).transform(bv).parse()`
- `AfterValidator` -> `parse().transform(av)`
- `WrapValidator` -> `parse(str).transform(bv).parse().transform(av)`
... @sydney-runkle and I had an interesting use case for a wrap validator the other day: it split the IANA timezone off a timestamp so that we could parse the timestamp in Rust and then reconstruct a tz-aware datetime. It would be interesting to see how to implement that in this API, as it's not obvious to me.
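For reference, here is one way that timezone-splitting use case could look as a plain wrap-style function (a sketch only; the `'<timestamp>|<IANA zone>'` input format and the function name are assumptions, not the format from the actual use case):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def parse_tz_timestamp(raw: str) -> datetime:
    """Split an assumed '<iso timestamp>|<IANA zone>' string, parse the
    naive timestamp (the part a fast Rust path would handle), then
    reattach the zone to produce a tz-aware datetime."""
    stamp, _, zone = raw.partition('|')
    dt = datetime.fromisoformat(stamp)
    return dt.replace(tzinfo=ZoneInfo(zone))
```

The open question in the thread is how to express this split-then-recombine shape as a linear pipeline.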
Could you elaborate on that example and we can try and see how it would work out?
> is it a significant ergonomic win to allow the argument to be inferred

I'd say so; consider if the type is `list[MyGeneric[dict[str, set[bool]]]]` or something...
> had an interesting use case for a wrap validator the other day which split IANA timezone off a timestamp so that we could do the timestamp in Rust and then reconstruct a tz-aware datetime
We could also add an explicit wrap validator: `parse(str).transform(lambda x, h: h(x.strip())).parse(int)`? I don't really like that because it breaks the linearity and typing.
```python
        """Pipe the result of one validation chain into another."""
        return _Pipeline([_PipelineAnd(self, other)])

    __and__ = then
```
I think I understand why this API was added — as long as the input and output types of a pipeline are the same, then it's basically equivalent to do them in order. However, I'll note that sequencing the items like this may end up being unintuitive, in particular if you expect to get an error for each failure in the case of multiple independent validators, rather than just the first failure. I understand it's hard to "fix" that given that transformations are possible though.
I take it that you mean that for `validate_as(int).gt(0) & validate_as(int).gt(1)` you'd expect `-1` to give two errors? Indeed it would only give one. I think that's reasonable behavior.

I also think you're saying that for the case where you have a chain of constraints you could error for all of them, e.g. `validate_as(int).gt(0).gt(1)`, and indeed that will happen if they're all one after another and are known constraints, but not if they're custom predicates or there's a transformation between them. And also not when you use `&`. I think that's okay.

The one improvement we could make is "collapsing" sequential constraints into one level, e.g. `validate_as(int).predicate(lambda x: x > 0).predicate(lambda x: x % 2 == 0)` could give both errors despite it being arbitrary user code. That's a reasonable future feature request that shouldn't be too hard to implement.
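The "collapsing" idea, running a run of sequential predicates against the same value and reporting every failure rather than stopping at the first, could be sketched like this (`check_all` is a hypothetical helper, not part of the PR):

```python
def check_all(value, predicates):
    """Run every predicate on the same value and collect all failures,
    instead of short-circuiting at the first one."""
    return [
        f"predicate #{i} failed for {value!r}"
        for i, pred in enumerate(predicates)
        if not pred(value)
    ]

# -1 fails both the > 0 check and the evenness check:
errs = check_all(-1, [lambda x: x > 0, lambda x: x % 2 == 0])
```

This only works when no transformation sits between the predicates, since every check must see the same value.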
Yeah, my point was that I might expect `validate_as(int).predicate(lambda x: x > 0) & validate_as(int).predicate(lambda x: x % 2 == 0)` to produce two errors. (Well, knowing it's doing chaining, I'm not even sure that's valid as written, since it repeats the `validate_as(int)`, but that's my intuition for how I'd expect to use `&`.) I think it's reasonable for `validate_as(int).predicate(lambda x: x > 0).predicate(lambda x: x % 2 == 0)` to produce just one, though.
My expectation would be the exact opposite: `&` is often a greedy (eager) operation, e.g. `False & 1 / 0` raises. So I don't think there's a valid general intuition here. If we could make it behave as you expect, I suspect there'd be complaints that it's unintuitive or that it's doing unnecessary work. I don't know why you'd expect `validate_as(int).predicate(lambda x: x > 0).predicate(lambda x: x % 2 == 0)` to produce just one error. Do you also expect `Field(min_length=10, pattern=...)` to produce a single error? In any case, given that we can't really change the behavior and that no users have complained yet, I suspect trying to determine what is most intuitive here is not a productive debate.
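The greediness point can be demonstrated directly: Python evaluates both operands of `&` before calling `__and__`, whereas `and` short-circuits (a standalone demo of Python semantics, unrelated to the pipeline internals):

```python
calls = []

def side_effect(label, result):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return result

# `&` is eager: both operands are evaluated before bool.__and__ runs.
_ = side_effect('left', False) & side_effect('right', True)
eager_calls = list(calls)

calls.clear()
# `and` short-circuits: the right side never runs when the left is falsy.
_ = side_effect('left', False) and side_effect('right', True)
lazy_calls = list(calls)
```

Here `eager_calls` ends up as `['left', 'right']` while `lazy_calls` is just `['left']`, which is the distinction both intuitions in this thread hinge on.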
I'm late to the conversation, but I'd expect it to short-circuit the same way it would for an `if` expression or any other usage (that I know of) of `and`.
@grantmwilliams I think it works as you'd expect then, right?
@adriangb I've only tested it a bit, but it seems to work exactly as expected.
@sydney-runkle the changes to make |
😢 ugh, I'll take another look. I tried overloading, but the |
Co-authored-by: David Montague <35119617+dmontagu@users.noreply.github.com>
What's left to do here?

@sydney-runkle I have a couple of follow-up ideas, but I think we should merge this and iterate from there.

Awesome, going to merge then! Could you document those thoughts in an issue at some point for reference?
This PR introduces a new experimental API to do more type-safe advanced constraining, parsing and transformation of types.
More specifically, from the docs updates:
This PR introduces an experimental "pipeline" API that allows composing parsing (validation), constraints and transformations in a more type-safe manner than existing APIs. This API is subject to change or removal; we are looking for feedback and suggestions before making it a permanent part of Pydantic.
Generally, the pipeline API is used to define a sequence of steps to apply to incoming data during validation. The pipeline API is designed to be more type-safe and composable than the existing Pydantic API.
Each step in the pipeline can be:

You can see more detail in the docs here: https://transform.pydantic-docs.pages.dev/concepts/experimental/#importing-experimental-feature-warnings
Also, this PR is very much aligned with PEP 746, which introduces type checking for annotated metadata.
Fix #9522
^^ adding new experimental feature pattern docs to the version policy