8000 Sample contended locks by overflowing interval bucket by krk · Pull Request #982 · async-profiler/async-profiler · GitHub
[go: up one dir, main page]

Skip to content

Conversation

krk
Copy link
Contributor
@krk krk commented Sep 2, 2024

Description

Implement a sampling method for contended locks, controlled by the lock
threshold (lock=DURATION). Bucket is filled by each contended lock's
duration, sampling is done only when the bucket overflows.

Also included are two commits to make testing reliable and easier:

  • Update PmuTests to have more cache misses
  • Allow setting TESTS variable from make cli

Related issues

#805

Motivation and context

Goal is to reduce the overhead while profiling lock contentions.

How has this been tested?

Different threshold values were tested with the included LockProfiling.java and Renaissance benchmarks.


By submitting this pull request 8000 , I confirm that my contribution is made under the terms of the Apache 2.0 license.

krk added 2 commits August 27, 2024 18:07
Makes the test more reliable, passes in higher spec machines.
e.g. `make test TESTS=lock` will run tests under test/test/lock folder.
unsigned long tid = (unsigned long)syscall(SYS_gettid);

// _threshold is used both as a duration threshold and a bucket interval. When the counter overflows _threshold, the event is sampled.
if (_enabled && duration >= _threshold && enter_time >= _start_time && updateCounter(_total_duration, duration, _threshold)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duration >= _threshold condition is exactly how the current algorithm works.
The idea of a counter was to replace this condition.

private:
static double _ticks_to_nanos;
static jlong _threshold;
static jlong _interval;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where _interval is used.


try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PrintStream out = new PrintStream(outputStream)) {
PrintStream out = new PrintStream(outputStream)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not reformat untouched code, especially since the existing style already matches default IDEA settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using vscode default rules, will not format, thanks.

boolean enabled() default true;

// Optional inputs to the test method.
double[] inputs() default {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double[] is probably not the best option for a generic case. String[] would likely be a better universal choice, similarly to main method accepting String[] args.
Optionally, the framework could automatically translate inputs to proper types derived from a method signature. This may unnecessarily complicate the code, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will accept String[] and convert to double[] now, leaving the option to automatically support other types.

8000
@Test(mainClass = LockProfiling.class, inputs = { 10, 75 })
@Test(mainClass = LockProfiling.class, inputs = { 100, 75 })
@Test(mainClass = LockProfiling.class, inputs = { 1000, 75 })
@Test(mainClass = LockProfiling.class, inputs = { 10000, 75 })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many options, IMO - this will significantly increase test run time. I'd reduce the number of tests to the required minimum (let's say, up to 10 seconds in total).

public static void test8M() {
long[] array = new long[1024 * 1024];
testRandomRead(array, 1024 * 1024);
long[] array = new long[8 * 1024 * 1024];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8M meant 8 Megabytes, you made it 64. Perhaps, we can rename it to test8MB to avoid confusion.
Does the test fail with 8MB?

Copy link
Contributor Author

Choose 8000 a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests fail with 8MB:

WARNING: PmuTests.cacheMisses failed
java.lang.AssertionError: Expected 0.6086956521739131 < 0.2
        at one.profiler.test.Assert.isLess(Assert.java:18)
        at test.pmu.PmuTests.cacheMisses(PmuTests.java:40)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rename the functions to test16KB/test8MB or testSmallRead/testLargeRead.

// alloc[=BYTES] - profile allocations with BYTES interval
// live - build allocation profile from live objects only
// lock[=DURATION] - profile contended locks longer than DURATION ns
// lock[=DURATION] - profile contended locks overflowing the DURATION ns bucket (default: 10u)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10us



double LockTracer::_ticks_to_nanos;
jlong LockTracer::_threshold;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, _interval? Similarly to other engines.

if (args != null && !args.isEmpty()) {
args = substituteFiles(args);
for (StringTokenizer st = new StringTokenizer(args, " "); st.hasMoreTokens(); ) {
for (StringTokenizer st = new StringTokenizer(args, " "); st.hasMoreTokens();) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extraneous formatting changes

long samples = extractSamples(s);
matched1 += pattern1.matcher(s).find() ? samples : 0;
matched2 += pattern2.matcher(s).find() ? samples : 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reusing samples method?

if (p.inputs().length == 0) {
m.invoke(holder, p);
} else {
m.invoke(holder, p, p.inputs());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need to pass inputs() explicitly.
Test method already accepts p and can extract inputs whenever it wants.

public class LockProfiling {
final static int timeOutsideLock = 1_000_000; // 1 ms
final static ThreadLocal<Double> totalUsefulWork = ThreadLocal.withInitial(() -> 0.0);
final static ThreadLocal<Double> totalWait = ThreadLocal.withInitial(() -> 0.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The canonical order of modifiers is static final


// 0 is equivalent to disabling sampling of locks, so all profiles are included.
@Test(mainClass = LockProfiling.class, inputs = { "0", "75" })
@Test(mainClass = LockProfiling.class, inputs = { "1", "75" })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, there is no difference between 0 and 1, you can leave only 0 case here.

A3E2

@Test(mainClass = LockProfiling.class, inputs = { "10000", "75" })

// Large (for the specific paylod) interval value skews the sampled lock
// contention distribution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it an expected behavior? Is it what we want to assert here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an expected behavior. As the interval increases, lock contentions with longer durations will be favored.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it's an expected, it is probably not intended behavior.
A test should verify an intended behavior.
What is the actual ratio in all these cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interval = 0, ratio = 75.84415584415585, minRatio = 70.0
interval = 10000, ratio = 75.125, minRatio = 70.0
interval = 1000000, ratio = 95.3307392996109, minRatio = 90.0
interval = 1000000000, ratio = NaN, minRatio = NaN

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there are no meaningful changes in this file - please revert it back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed.


public class LockTests {
private static void contendedLocks(TestProcess p, int interval, double minRatio) throws Exception {
Output out = p.profile("-e lock --lock " + interval + " --threads -o collapsed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-e lock is an obsolete form. --lock option is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed.

@Test(mainClass = LockProfiling.class, inputs = { "10000", "75" })

// Large (for the specific paylod) interval value skews the sampled lock
// contention distribution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it's an expected, it is probably not intended behavior.
A test should verify an intended behavior.
What is the actual ratio in all these cases?

Implement a sampling method for contended locks, controlled by the lock
threshold (lock=DURATION). Bucket is filled by each contended lock's
duration, sampling is done only when the bucket overflows.

Goal is to reduce the overhead while profiling lock contentions.
@apangin apangin merged commit 25fa02e into async-profiler:master Sep 9, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

0