8000 Support UTF-8 in metric creation, parsing, and exposition by ywwg · Pull Request #1070 · prometheus/client_python · GitHub
[go: up one dir, main page]

Skip to content

Support UTF-8 in metric creation, parsing, and exposition #1070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 2, 2024

Conversation

ywwg
Copy link
Member
@ywwg ywwg commented Nov 6, 2024

Adding support for UTF-8 required substantial reworking of the parsing code because the previous code relied on regexes and .index calls that would incorrectly find characters inside quote marks. Note that this means the previous code was already broken for parsing exposition text where characters like braces and hashes were in label values. I have added tests to exercise the edge cases I can think of.

Does not address content negotiation, which will be a followon.

part of #1013

@ywwg ywwg marked this pull request as draft November 6, 2024 14:42
@ywwg
Copy link
Member Author
ywwg commented Nov 7, 2024

I'll squash commits when we're done

@ywwg ywwg marked this pull request as ready for review November 7, 2024 20:47
@ywwg ywwg marked this pull request as draft November 7, 2024 20:55
@ywwg ywwg marked this pull request as ready for review November 12, 2024 15:42
@ywwg ywwg requested a review from csmarchbanks November 12, 2024 15:59
Copy link
Member
@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a first runthrough. Generally looking ok to me. A couple thoughts:

  1. If you want to split it up I t 10000 hink refactoring validation + removing regex parsing could be good opportunities. Plus have some tests that catch the incorrect behavior you mentioned (e.g. curly braces in a label value).
    • Tests for this would be nice in this PR too
  2. Some more tests for various permutations would be nice, specifically there are not many tests for escaped label names/values right now

_legacy_validation = True


def validate_metric_name(name: str) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want these to be public now? Might be worth having them still prefixed with _ but I also think they could be useful general functions for other libraries to use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the functions are used by both prometheus_client and prometheus_client/openmetrics so I thought they had to be public

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, yeah Python doesn't really do public/private, it's just a convention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused what change you want me to make, then -- leave it? prepend _? something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh sorry, yeah if you could prepend with _ I think that would be safest to begin with to avoid others depending on the methods.

@ywwg
Copy link
Member Author
ywwg commented Nov 15, 2024

Added a test, fixed an issue using .split

return tokens


def unquote_unescape(text):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe these should be private too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say yes, thanks!

@ywwg ywwg requested a review from csmarchbanks November 18, 2024 15:13
Copy link
Member
@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a bit of prepending with more _ and I think this is good to go! Plus cleaning up the commit history and such.

return tokens


def unquote_unescape(text):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say yes, thanks!

@ywwg ywwg changed the title WIP: Support UTF-8 in metric creation, parsing, and exposition Support UTF-8 in metric creation, parsing, and exposition Nov 20, 2024
@ywwg ywwg force-pushed the owilliams/utf8-01 branch from 303ff91 to d50068f Compare November 20, 2024 16:38
@ywwg ywwg requested a review from csmarchbanks November 20, 2024 16:39
@ywwg ywwg force-pushed the owilliams/utf8-01 branch from d50068f to cf5bdc0 Compare November 20, 2024 18:22
part of #1013

Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg ywwg force-pushed the owilliams/utf8-01 branch from cf5bdc0 to b21a7a1 Compare November 20, 2024 18:23
@ywwg ywwg requested a review from vesari November 27, 2024 14:59
Copy link
Contributor
@vesari vesari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this great work, LGTM!

@csmarchbanks csmarchbanks merged commit 33e6828 into master Dec 2, 2024
11 checks passed
@csmarchbanks csmarchbanks deleted the owilliams/utf8-01 branch December 2, 2024 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0