-
Notifications
You must be signed in to change notification settings - Fork 813
Support UTF-8 in metric creation, parsing, and exposition #1070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'll squash commits when we're done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this a first runthrough. Generally looking ok to me. A couple thoughts:
- If you want to split it up I t
10000
hink refactoring validation + removing regex parsing could be good opportunities. Plus have some tests that catch the incorrect behavior you mentioned (e.g. curly braces in a label value).
- Tests for this would be nice in this PR too
- Some more tests for various permutations would be nice, specifically there are not many tests for escaped label names/values right now
prometheus_client/validation.py
Outdated
_legacy_validation = True | ||
|
||
|
||
def validate_metric_name(name: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want these to be public now? Might be worth having them still prefixed with _
but I also think they could be useful general functions for other libraries to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the functions are used by both prometheus_client and prometheus_client/openmetrics so I thought they had to be public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, yeah Python doesn't really do public/private, it's just a convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused what change you want me to make, then -- leave it? prepend _? something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh sorry, yeah if you could prepend with _
I think that would be safest to begin with to avoid others depending on the methods.
Added a test, fixed an issue using .split |
prometheus_client/parser.py
Outdated
return tokens | ||
|
||
|
||
def unquote_unescape(text): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe these should be private too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say yes, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a bit of prepending with more _
and I think this is good to go! Plus cleaning up the commit history and such.
prometheus_client/parser.py
Outdated
return tokens | ||
|
||
|
||
def unquote_unescape(text): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say yes, thanks!
303ff91
to
d50068f
Compare
d50068f
to
cf5bdc0
Compare
part of #1013 Signed-off-by: Owen Williams <owen.williams@grafana.com>
cf5bdc0
to
b21a7a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this great work, LGTM!
Adding support for UTF-8 required substantial reworking of the parsing code because the previous code relied on regexes and .index calls that would incorrectly find characters inside quote marks. Note that this means the previous code was already broken for parsing exposition text where characters like braces and hashes were in label values. I have added tests to exercise the edge cases I can think of.
Does not address content negotiation, which will be a followon.
part of #1013