10BC0 Adding Sanitizer interface and implementation by longquanzheng · Pull Request #97 · uber-java/tally · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@longquanzheng
Copy link

This is replacement of #70

  • Sanitizes metric names and tag keys and values.
  • Does nothing by default.
  • The ScopeBulider can be configured with the standard M3Sanitizer.
  • Fixes Add Metrics Sanitizer  #8

@CLAassistant
Copy link
CLAassistant commented Apr 8, 2021

CLA assistant check
All committers have signed the CLA.

@longquanzheng
Copy link
Author

@alexeykudinkin Hi, can you take a look? I have addressed the comments from the previous PR. LMK if you have any others.

@longquanzheng
Copy link
Author
10BC0

looks like unit tests are failing. Working on fixing

@longquanzheng
Copy link
Author

@alexeykudinkin Hi, can you take a look? I have fixed all the tests. Thanks in advance.

@longquanzheng
Copy link
Author

@alexeykudinkin @andrewmains12 Hello, sorry to bother you guys again. I would really appreciate if you can help review it.

@alexeykudinkin
Copy link
Contributor

@longquanzheng we would also need to write a benchmark for this one.

* StringSanitizer is to sanitize strings
* It has a Sanitize method which returns a sanitized version of the input string value.
*/
public interface StringSanitizer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need standalone interface for this, Function interface should be reflective

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed. It's too heavy for a function to have interface.
You meant functional interface/lambda, correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Function<String,String>


private SanitizeRange(char low, char high) {
this.low = low;
this.high = high;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add precondition asserting that high >= low

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an example of "Precondition"(maybe I didn't find in a correct way) so I throw a Runtime exception here. LMK if you want to assert in a different way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, precondition reference was mostly notional (please do not import Guava for that)

@longquanzheng
Copy link
Author

@alexeykudinkin thanks so much for your reviews. I have addressed your comments except bench tests. Could you give me more about how to write/add it?

protected String separator = DEFAULT_SEPARATOR;
protected ImmutableMap<String, String> tags;
protected Buckets defaultBuckets = DEFAULT_SCOPE_BUCKETS;
protected ScopeSanitizer sanitizer = new ScopeSanitizerBuilder().build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this opt-in -- by default sanitizer should be no-op

Copy link
Author
@longquanzheng longquanzheng Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean having it as null by default in the builder?

This would be a little bit tricky. Then the ScopeImpl will have to do this null check everywhere, is that okay? (It would be easier if this is Kotlin and we can use ? operator, sigh)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NVM, looks like I will apply another comment from you and Andrey in #97 (comment)

@alexeykudinkin
Copy link
Contributor

@longquanzheng LGTM, minor comment.

Regarding benchmarks, you can take a look at the benchmarks package for some examples;

}

ImmutableMap.Builder<String, String> builder = new ImmutableMap.Builder<>();
if (tags != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this check is redundant. If tags is null it will fail on tags.entrySet()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather re-phrase this: @longquanzheng please move this check at the beginning to make sure there's no NPE

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good point. Thanks


private CharRange(char low, char high) {
if (low > high) {
throw new RuntimeException("invalid CharRange");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/RuntimException/IllegalArgumentException?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks

private final Function<String, String> keySanitizer;
private final Function<String, String> valueSanitizer;

SanitizerImpl(Function<String, String> nameSanitizer, Function<String, String> keySanitizer, Function<String, String> valueSanitizer) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd make arguments name more explicit: tagKeySanitizer, tagValueSanitizer

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good catch -- I should have done it, hehe

*/
public class ScopeSanitizerBuilder {

private Function<String, String> nameSanitizer = value -> value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider Function.identity()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! TIL

3411

/**
* The SanitizerBuilder returns a Sanitizer for the name, key and value. By
* default, the name, key and value sanitize functions returns all the input
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe create a NoopSanitizer implements ScopeSanitizer and use it as a default one instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like this idea better.
Otherwise having null in ScopeImpl will require lots of null checking in the code.

@@ -0,0 +1,89 @@
// Copyright (c) 2020 Uber Technologies, Inc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls update copyright to 2021

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;

public class SanitizeRangeTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/SanitizeRangeTest/CharRangeTest

private static final char HIGH = 'z';

@Test
public void sanitizeRange() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a test when HIGH is smaller than LOW and maybe when HIGH == LOW


import static org.junit.Assert.assertEquals;

public class SanitizerTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/SanitizerTest/SanitizerImplTest or ScopeSanitizerTest

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to ScopeSanitizerTest

.withKeyCharacters(
ValidCharacters.of(
ValidCharacters.ALPHANUMERIC_RANGE,
ValidCharacters.UNDERSCORE_DASH_CHARACTERS))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc dot is also a valid character but not recommended

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to UNDERSCORE_DASH_CHARACTERS

Copy link
Collaborator
@SokolAndrey SokolAndrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're modifying ScopeImpl default behaviour (even though it's a noop by default) would be nice to see how does it affect the benchmarks.
You can run those benchmark tests using the following command:

./gradlew tally-core:runJmhTests

Could you run it without and with your changes and add both results to the PR?
It would be nice to have separate benchmarks for sanitizer implementation though, but I don't think it has to be a part of this PR. @alexeykudinkin wdyt?

you can find more info/examples on JMH benchmarks here

@longquanzheng
< 4C07 summary data-view-component="true" class="timeline-comment-action Link--secondary Button--link Button--medium Button"> Copy link
Author

Thanks both for the review. I will address the comments this week.

@longquanzheng
Copy link
Author
longquanzheng commented Apr 22, 2021

@alexeykudinkin @SokolAndrey thank you so much again for the reviewing.

I have addressed all the comments. Below is the bench results on my laptop:

Before the PR(using master)

Benchmark                                    Mode  Cnt     Score    Error   Units
ScopeImplBenchmark.scopeReportingBenchmark  thrpt   10  1162.629 ± 67.777  ops/ms

After the PR:

Benchmark                                    Mode  Cnt     Score    Error   Units
ScopeImplBenchmark.scopeReportingBenchmark  thrpt   10  1146.372 ± 30.429  ops/ms

My second run on current PR:

Benchmark                                    Mode  Cnt     Score    Error   Units
ScopeImplBenchmark.scopeReportingBenchmark  thrpt   10  1151.460 ± 53.541  ops/ms

@alexeykudinkin
Copy link
Contributor

@longquanzheng i assume these runs are w/ no-op sanitizer? Can you also paste Benchmarks w/ the sanitizer actually used?

@longquanzheng
Copy link
Author

@longquanzheng i assume these runs are w/ no-op sanitizer? Can you also paste Benchmarks w/ the sanitizer actually used?

Yes. Thanks for pointing it out.
I am testing with sanitizer now. Is that the right way? longquanzheng@5f2993c

@longquanzheng
Copy link
Author

@longquanzheng i assume these runs are w/ no-op sanitizer? Can you also paste Benchmarks w/ the sanitizer actually used?

Yes. Thanks for pointing it out.
I am testing with sanitizer now. Is that the right way? longquanzheng@5f2993c

Just ran with the commit of sanitizer:

Benchmark                                    Mode  Cnt     Score    Error   Units
ScopeImplBenchmark.scopeReportingBenchmark  thrpt   10  1485.831 ± 40.634  ops/ms

@alexeykudinkin
Copy link
Contributor

add benchmark code
@longquanzheng
Copy link
Author

@longquanzheng i don't think we have any Benchmarks before targeting specifically code paths you're changing. Let's make sure that Benchmarks are stressing code paths that are changing.

@longquanzheng i don't think we have any Benchmarks before targeting specifically code paths you're changing. Let's make sure that Benchmarks are stressing code paths that are changing.

I got what you mean. Just added benchmark code. Can you take another look?

@longquanzheng
Copy link
Author
longquanzheng commented Apr 23, 2021

@alexeykudinkin here is the results:

Benchmark                                                 Mode  Cnt   Score   Error   Units
ScopeImplBenchmark.recordingBenchmark                    thrpt   10  75.436 ± 2.179  ops/ms
ScopeImplBenchmark.recordingWithSanitizingDashBenchmark  thrpt   10  53.200 ± 1.539  ops/ms

recordingBenchmark is the default without sanitizer, recordingWithSanitizingDashBenchmark is with sanitizers

@@ -0,0 +1,4 @@
Benchmark Mode Cnt Score Error Units
ScopeImplBenchmark.recordingBenchmark thrpt 10 74.819 ± 4.573 ops/ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check how other benchmarks are reported (we need to capture all data-points offered by JMH)

Copy link
Author
@longquanzheng longquanzheng Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't really understand how exactly to do it.

Would you mind taking it over? I am not convinced that this should be done by this PR, as the benchmark is already missing. The most problem is I don't have a clear picture of what you really want for benchmarking(with out detailed documentation), and it should be much easier if your team just do it.

LMK if you agree.

Copy link
Contributor
@alexeykudinkin alexeykudinkin May 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be done by this PR. This is a critical library laying in the hot-path of execution of many services, hence we need to maintain the focus on its performance.

Over the last year a lot of efforts have been put in to optimize, streamline, and make this library robust to serve the needs it was originally built for. Therefore, as a new contribution guideline we require any non-trivial change to adhere to the same basic principles of assuring library's correctness and performance.

Totally appreciate the amount of incremental effort that is required to adhere to this heightened standards from every contribution, but unfortunately there's no other way to guarantee high-level of reliability and performance of the open-source library otherwise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check how other benchmarks are reported

Can you give more details here? I did check but don't know how that works(though I see other reports have the details like gc time).
For example another benchmark is just
public class M3ReporterBenchmark extends AbstractReporterBenchmark. So I don't know what am I missing here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SokolAndrey can you help in here?

.reportEvery(Duration.MAX_VALUE);
}

public void recordTestMetrics(final ScopeImpl scope) {
Copy link
Contributor
4C07

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good routine to check -- this was used as pre-init seq, and i would suggest to keep it as such.

Instead create separate benchmarks and routines to update counter/gauge/histogram separately.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as my pervious response. I appreciate your time reviewing and commenting it.
But it would be nice if you can help the benchmarking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my comment above. We're more than happy to help w/ guidance, but benchmarking is now a standard requirement from non-trivial contributions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will revert that one back.

Can you give me details about what you want to benchmark? -- which method/function/routines?

Copy link
Contributor
@alexeykudinkin alexeykudinkin May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As i have called out prior, let's test creation and recording fro each type of metric individually (counter/gauge/histogram).

// Counter
scope.counter("counter").inc(1)

// Timer
scope.timer("timer").record(...)

// Histogram
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Metrics Sanitizer

5 participants

0