8000 Avoid JVM crash by deleting JNI refs after calling GetMethodDeclaringClass by krk · Pull Request #981 · async-profiler/async-profiler · GitHub
[go: up one dir, main page]

Skip to content

Conversation

krk
Copy link
Contributor
@krk krk commented Sep 2, 2024

This is a partial fix for #974, complementing JDK-8268364.

How has this been tested?

Backported JDK-8268364 into Java 17 and tested with reproducer at gist:

javac Main.java
gcc -shared -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" -fPIC repro.cpp -o
8000
repro.so

# Low Xmx to pressure GC into unloading classes sooner.
java -agentpath:"$(pwd)/repro.so" -Xmx100m Main

}

jvmtiEnv* jvmti = VM::jvmti();
JNIEnv* jni = VM::jni();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VM::jni() is not an expensive call, but it's not free either - not a trivial getter like VM::jvmti().
So, I'd create JNIEnv* once in Lookup constructor and cache it as a field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, cached JNIEnv.

char* method_sig = NULL;

jvmtiEnv* jvmti = VM::jvmti();
JNIEnv* jni = VM::jni();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above: obtain JNIEnv* once in FrameName constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cached as above.

@krk krk force-pushed the fjmi-974 branch 2 times, most recently from fe2f917 to 340786f Compare September 2, 2024 17:12
_thread_names_lock(thread_names_lock),
_thread_names(thread_names)
_thread_names(thread_names),
_jni(VM::jni())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be the same problem with native apps as above.
I suggest simply returning NULL from VM::jni() if _vm == NULL.

}

if (method_class) {
_jni->DeleteLocalRef(method_class);
Copy link
@D-D-H D-D-H Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @krk ,

After investigating why the call to DeleteLocalRef is added here(cannot completely fix the problem), I think we can fix the problem in OpenJDK upstream.

Do you plan to fix it? If not, we will file an issue at OpenJDK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a backport into 17 for JDK-8268364 at openjdk/jdk17u-dev#2839. Do you propose a different fix?

Copy link
@D-D-H D-D-H Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you propose a different fix?

If I understand correctly, the call to 'DeleteLocalRef' here is to reduce the possibility of a crash. That crash is actually a bug in JDK and not fixed by JDK-8268364. So we should file a new issue in OpenJDK to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@D-D-H You are right: DeleteLocalRef is a cheap way to significantly decrease possibility of a crash, but fixing the root cause in OpenJDK is still highly desirable. I think it makes sense to integrate this change anyway, before the proper fix is released.

Copy link
@D-D-H D-D-H Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My colleague Liang created an issue for this problem:
Issue: https://bugs.openjdk.org/browse/JDK-8339725
PR: https://git.openjdk.org/jdk/pull/20907

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks!

@yanglong1010
Copy link
Contributor

I ran the reproducer. The stack trace in hs_err log produced by this reproducer is different from the stack trace I provided in #974. The reproducer crashes in GC threads, D-D-H thinks this a JDK bug, DeleteLocalRef can alleviate this problem, but it should also be fixed on JDK (including JDK 18 and above).

The crash in #974 on JDK 8/11/17 is due to lack of JDK-8268364.

These are two different issues and need to be treated differently.

Am i right ?

@apangin
Copy link
Member
apangin commented Sep 4, 2024

Yes, there are two crashes, but they are both parts of the same issue.
JDK-8268364 only mitigates one possible race, but does not solve the problem completely.

This is a partial fix, complementing JDK-8268364.
@apangin apangin changed the title Fix JVM crash by deleting refs after calling GetMethodDeclaringClass. Avoid JVM crash by deleting JNI refs after calling GetMethodDeclaringClass Sep 4, 2024
@apangin apangin merged commit b9a3737 into async-profiler:master Sep 4, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

0