8000 Add support for `propertyNames` in JSON schema by FlorianSW · Pull Request #10478 · pydantic/pydantic · GitHub
[go: up one dir, main page]

Skip to content

Add support for propertyNames in JSON schema #10478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Oct 3, 2024

Conversation

FlorianSW
Copy link
Contributor
@FlorianSW FlorianSW commented Sep 24, 2024

Change Summary

JSON schema draft 2020-12 defines a propertyNames key for dict/object schemas which can contain any valid JSON schema that defines to what any property name of the object instance needs to validate to.

This can be mapped to a pydantic/python typing like dict[AnyEnum, str] where AnyEnum is an Enum type with defined values.

propertyNames supports other validation schemas as well, e.g. a pattern. However, this is not covered with this PR.

Related issue number

Fixes #4393

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @sydney-runkle

8000
JSON schema draft 2020-12 defines a propertyNames key for dict/object
schemas which can contain any valid JSON schema that defines to what any
property name of the object instance needs to validate to.

This can be mapped to a pydantic/python typing like dict[AnyEnum, str]
where AnyEnum is an Enum type with defined values.

propertyNames supports other validation schemas as well, e.g. a pattern.
However, this is not covered with this commit.

Fixes pydantic#4393
@FlorianSW
Copy link
Contributor Author

please review

Copy link
codspeed-hq bot commented Sep 24, 2024

CodSpeed Performance Report

Merging #10478 will not alter performance

Comparing FlorianSW:feature/4393 (dd77f91) with main (c9c9277)

Summary

✅ 3 8000 8 untouched benchmarks

@FlorianSW
Copy link
Contributor Author

Sorry for the additional commits, I haven't tested any other python version than 3.12, which I've installed locally 😬

Copy link
Contributor
github-actions bot commented Sep 24, 2024

Coverage report

This PR does not seem to contain any modification to coverable code.

Comment on lines 973 to 977
if ks is not None:
keys_type = ks.get('type', None)
if keys_type == 'enum':
keys_members = ks.get('members', [])
json_schema['propertyNames'] = {'enum': keys_members}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can add support for any key schema out of the box

Suggested change
if ks is not None:
keys_type = ks.get('type', None)
if keys_type == 'enum':
keys_members = ks.get('members', [])
json_schema['propertyNames'] = {'enum': keys_members}
# Add `keys_schema.pop('title', None)` next to `values_schema.pop('title', None)` above and update comment
if keys_schema.get('type') == 'string' and len(keys_schema) > 1: # len check means there are additional constraints to the key schema
json_schema['propertyNames'] = keys_schema

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review and the suggestion :) Just for better understanding from my side:
keys_schema could be a ref only, which means that I would need to resolve that before I can do any checks on that, right?
Would it then be best to add the ref to propertyNames schema, or the resolved one? I'll go with the resolved for now but am open for your inputs :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While doing it, this will also add a propertyNames schema to an existing test case. This should be expected behaviour, as python's Path in json schema of pydantic has the ' format' 'path' schema.

I just have to add this to the existing test case, and I'm not sure how such changes to existing behaviour is handled in pydantic tbh :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum right, although I'm not sure schemas of type 'string' can be made references? cc @sydney-runkle

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While doing it, this will also add a propertyNames schema to an existing test case. This should be expected behaviour, as python's Path in json schema of pydantic has the ' format' 'path' schema.

Agreed, this change looks correct to me. I second @Viicos' request for that additional test case with constrained dict keys.

Would it then be best to add the ref to propertyNames schema, or the resolved one? I'll go with the resolved for now but am open for your inputs :)

Re refs - I think the current behavior makes sense. I haven't seen strings as references in JSON schemas yet. Maybe @adriangb has?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it then be best to add the ref to propertyNames schema, or the resolved one?

Looked closer at this - I think we can avoid the ref resolution. Can change if it comes up for a specific use case.

Copy link
Contributor
@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @Viicos' analysis here.

@sydney-runkle sydney-runkle added awaiting author revision awaiting changes from the PR author and removed ready for review labels Sep 25, 2024
@FlorianSW
Copy link
Contributor Author

Agreed with @Viicos' analysis here.

Thanks for the reviews :) I added the changes in the latest commit.

@Viicos Viicos changed the title [json_schema] Add support for propertyNames schema for dict with enum keys typing Add support for propertyNames in JSON schema Sep 25, 2024
Copy link
Member
@Viicos Viicos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just a couple remarks.

FlorianSW and others added 3 commits September 26, 2024 20:56
Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
@FlorianSW
Copy link
Contributor Author

Thanks for all your support and the reviews 🙏
I added the comments into the PR :)

Copy link
Contributor
@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here! Thanks for your patience with a couple of iterations

@sydney-runkle
Copy link
Contributor
sydney-runkle commented Sep 27, 2024

Ah, I see why my change broke things. Need to think a bit more about this.

There's a part of me that thinks this should be more unconditional, like:

json_schema: JsonSchemaValue = {'type': 'object'}

        keys_schema = self.generate_inner(schema['keys_schema']).copy() if 'keys_schema' in schema else {}
        keys_pattern = keys_schema.pop('pattern', None)

        values_schema = self.generate_inner(schema['values_schema']).copy() if 'values_schema' in schema else {}
        values_schema.pop('title', None)  # don't give a title to the additionalProperties
        if values_schema or keys_pattern is not None:  # don't add additionalProperties if it's empty
            if keys_pattern is None:
                json_schema['additionalProperties'] = values_schema
            else:
                json_schema['patternProperties'] = {keys_pattern: values_schema}

        keys_schema.pop('title', None)  # don't give a title to the propertyNames
        if keys_schema:
            json_schema['propertyNames'] = keys_schema

And we can just use refs like we would for other types. Need to chat with @Viicos before we move forward here.

Thanks for your help @FlorianSW, we can take it from here :).

@sydney-runkle sydney-runkle removed the awaiting author revision awaiting changes from the PR author label Sep 27, 2024
@Viicos
Copy link
Member
Viicos commented Sep 30, 2024

@sydney-runkle, with your proposed change, propertyNames will be set even for keys with a type different than string?

class Model(BaseModel):
    f: dict[Annotated[int, Field(gt=1)], str]

Model(f={2: "a"}).model_dump_json()
#> '{"f": {"2": "a"}}'

While such keys are converted to strings when dumping to JSON, we shouldn't add a propertyNames schema if it isn't of type string (and anyway, gt=1 isn't a constraint that can be expressed for strings -- or maybe by mapping it to a regex constraint but this is a different topic).

The len check feels a bit hacky, but I don't know if there's a better way to do so 🤔

@sydney-runkle
Copy link
Contributor

Ah I see, I'm alright with just adding for string types then.

I'm just hesitant to eagerly resolve the schema refs here - we don't do that elsewhere, and I'm afraid we'll run into errors with available defs at this point in the schema building process (I suppose we can fix and ignore with a catch, though that seems like an annoyingly silent failure).

Thoughts on this @Viicos?

@sydney-runkle
Copy link
Contributor

So, I'm ok with this resolution ahead of time, as we do this for discriminator refs eagerly :).

See

while '$ref' in choice:
assert isinstance(choice['$ref'], str)
choice = self.get_schema_from_definitions(JsonRef(choice['$ref'])) or {}
for an example. Going to open a PR soon standardizing this.

@sydney-runkle sydney-runkle enabled auto-merge (squash) October 3, 2024 19:50
@sydney-runkle sydney-runkle merged commit eda3eb4 into pydantic:main Oct 3, 2024
58 checks passed
@FlorianSW FlorianSW deleted the feature/4393 branch October 3, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Schema ignores enumerator on dictionary keys
3 participants
0