
CN118193272B - Problem positioning method and device of JVM system - Google Patents

Publication number
CN118193272B
Authority
CN
China
Prior art keywords
target application
target
snapshot
stack
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410457237.6A
Other languages
Chinese (zh)
Other versions
CN118193272A (en)
Inventor
杜中原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Keynote Network Inc
Original Assignee
Beijing Keynote Network Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Keynote Network Inc
Priority to CN202410457237.6A
Publication of CN118193272A
Application granted
Publication of CN118193272B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0778 Dumping, i.e. gathering error/state information after a fault for later diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a problem positioning method and device for a JVM system, and belongs to the field of application performance management. The method comprises: creating a snapshot thread in a Java Virtual Machine (JVM) system; starting the snapshot thread each time the JVM system starts a target application; during startup of the target application, performing a stack trace operation on the target application based on the snapshot thread to acquire stack trace data for the startup process, and periodically acquiring startup time influence parameters of the target application; stopping the snapshot thread when a preset snapshot thread stop condition is met; determining target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread; and performing problem positioning analysis on the target application based on the target stack trace data and the startup time influence parameters. With this application, the cause of a long startup time of a Java application can be located.

Description

Problem positioning method and device of JVM system
Technical Field
The application relates to the field of application performance management, in particular to a problem positioning method of a JVM system.
Background
APM is an abbreviation for Application Performance Management, a set of practices and tools that improve application performance and availability by monitoring and managing the performance of software applications. APM is intended to help developers and system administrators identify and solve performance problems in applications, so that applications run quickly and efficiently and provide a good user experience. APM is an important practice that helps organizations keep their applications performant and stable in a constantly changing technological environment.
An application can run in a JVM (Java Virtual Machine) system, and the running process of the application is monitored through the JVM. The problem of high-rate object allocation in the JVM refers to the situation in which objects are frequently created and destroyed in a Java application, resulting in high memory management overhead. This situation may reduce the performance of the application, because frequent object allocation and garbage collection operations may take up a significant amount of CPU (Central Processing Unit) time and may cause all application threads to halt execution, which increases application response time, reduces application throughput, and adds unnecessary memory overhead.
At present, general-purpose APM probes are used to locate problems in requests: methods such as those handling web requests are instrumented in advance at pre-embedded points, so that excessively long execution times can be analyzed. However, there is essentially no method or means for locating the cause of a long application startup time.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the application provide a problem positioning method and device for a JVM system, which can locate the cause of a long startup time of a Java application. The technical solution is as follows:
According to one aspect of the present application, there is provided a problem positioning method of a JVM system, the method comprising:
creating a snapshot thread in a Java Virtual Machine (JVM) system;
starting the snapshot thread each time the JVM system starts a target application;
during startup of the target application: performing a stack trace operation on the target application based on the snapshot thread to acquire stack trace data for the startup process of the target application; and periodically acquiring startup time influence parameters of the target application;
stopping the snapshot thread when a preset snapshot thread stop condition is met;
determining target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread;
and performing problem positioning analysis on the target application based on the target stack trace data and the startup time influence parameters.
According to another aspect of the present application, there is provided a problem positioning device of a JVM system, the device comprising:
a thread creation module, configured to create a snapshot thread in the Java Virtual Machine (JVM) system;
a data acquisition module, configured to start the snapshot thread each time the JVM system starts a target application; during startup of the target application, to perform a stack trace operation on the target application based on the snapshot thread to acquire stack trace data for the startup process of the target application, and to periodically acquire startup time influence parameters of the target application; and to stop the snapshot thread when a preset snapshot thread stop condition is met;
a data aggregation module, configured to determine target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread;
and a problem positioning module, configured to perform problem positioning analysis on the target application based on the target stack trace data and the startup time influence parameters.
According to another aspect of the present application, there is provided an electronic device comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the problem positioning method of the JVM system described above.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the problem positioning method of the JVM system described above.
The application has the following beneficial effects:
During startup of the target application, stack trace data is collected by a snapshot thread, startup time influence parameters of the target application are periodically acquired, and problem positioning analysis is performed on the target application based on the stack trace data and the startup time influence parameters, so that the cause of a long startup time of a Java application can be located.
Drawings
Further details, features and advantages of the application are disclosed in the following description of exemplary embodiments with reference to the following drawings, in which:
FIG. 1 illustrates a flowchart of a problem location method for a JVM system provided in accordance with an exemplary embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a flame graph provided in accordance with an exemplary embodiment of the present application;
FIG. 3 illustrates a schematic block diagram of a problem positioning device of a JVM system provided in accordance with an exemplary embodiment of the present application;
FIG. 4 shows a block diagram of an exemplary electronic device that can be used to implement an embodiment of the application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the application are shown in the drawings, it is to be understood that the application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the application. It should be understood that the drawings and embodiments of the application are for illustration purposes only and are not intended to limit the scope of the present application.
It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the application is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will appreciate that they should be construed as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the devices in the embodiments of the present application are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The application provides a problem positioning method of a JVM system, which can be completed by a terminal, a server and/or other equipment with processing capability. The method provided by the embodiment of the application can be completed by any device or can be completed by a plurality of devices together.
The method will be described with reference to the problem location method of the JVM system shown in fig. 1.
Step 101, in the Java Virtual Machine (JVM) system, a snapshot thread is created.
In one possible implementation, a snapshot (dump) thread may be created in a Java Agent of the JVM system. The Java Agent is a mechanism provided by the JVM that allows the bytecode of an application to be modified or enhanced by an agent when the application is started. This mechanism is often used to implement functions such as code injection, performance monitoring, logging, and AOP (aspect-oriented programming).
A Java Agent allows developers to intercept and modify the bytecode of a class while the application is loading that class. This requires analyzing, modifying, and regenerating the bytecode. To achieve this, a bytecode manipulation library, such as ASM or Byte Buddy, is typically used. These libraries provide a high-level API that makes it relatively easy to operate at the bytecode level.
The Java class loader is responsible for loading classes from the class path into memory. A Java Agent can intervene in the class loading process by using the delegation mechanism of the class loader, and thereby modify classes dynamically. This intervention is typically implemented through the Java Instrumentation interface (java.lang.instrument.Instrumentation), which provides the ability to transform bytecode when a class is loaded or after it has been loaded.
The Java Agent function is implemented by a special JAR file containing a predefined premain or agentmain method. The premain method is invoked before the main method of the host application, while the agentmain method is invoked when an agent is loaded at runtime after the host application has already started. These methods are used to initialize the agent, set bytecode transformation rules, and the like.
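As a minimal sketch of how a snapshot thread could be created from an agent's premain method: the class name SnapshotAgent, the thread name, and the helper startSnapshotThread are assumptions made for illustration and do not come from the application. The sampling loop itself is left empty here.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class. In a real agent JAR it would normally be declared
// public in its own file and named in the manifest's Premain-Class attribute.
final class SnapshotAgent {

    // Called by the JVM before the application's main method when the JAR
    // is passed on the command line with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        startSnapshotThread(() -> {
            // The stack sampling loop would go here (assumption).
        });
    }

    // Creates and starts the snapshot thread as a daemon so that it never
    // keeps the JVM alive on its own.
    static Thread startSnapshotThread(Runnable body) {
        Thread t = new Thread(body, "startup-snapshot");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Running the thread as a daemon is a design choice of this sketch, not something the application specifies.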
Java agents allow modification of bytecodes when a class is loaded, so that changes to the behavior of the class can be implemented. Such mechanisms are widely used for a variety of purposes including performance optimization, monitoring, logging, security enhancement, and the like.
Application scenarios for Java Agents include performance analysis tools (e.g., JProfiler), AOP frameworks (e.g., AspectJ), code injection tools (e.g., bytecode instrumentation tools), monitoring tools (e.g., New Relic), security enhancement tools, and the like. Through Java Agents, developers can make various types of enhancements to applications without modifying the original code.
In summary, the Java Agent is a powerful mechanism provided by the Java virtual machine that allows developers to modify and enhance applications at the bytecode level, which makes a wide range of functions and tools possible.
Step 102, each time the JVM system starts a target application, a snapshot thread is started.
In one possible implementation, the snapshot thread may be started in a Java Agent when the target application is started.
Step 103, in the process of starting the target application: performing stack tracking operation on the target application based on the snapshot thread, and acquiring stack tracking data in the starting process of the target application; and periodically acquiring the starting time influence parameters of the target application.
In one possible implementation, to facilitate analysis of the problem of excessively long application start-up time, two types of data may be collected during the start-up of the target application, the first type of data being stack trace data, and the second type of data being start-up time influencing parameters.
The process of collecting stack trace data is described below.
In the snapshot thread, the stack of the target application can be captured during startup by sampling it at regular intervals: the data obtained at each capture is taken as stack trace data, and the stack trace data is cached by capture time.
Optionally, an interval-multiplying snapshot strategy can be adopted to reduce the number of snapshots and thus the impact on the application. The specific process may be as follows:
acquiring a preset initial snapshot interval and an initial snapshot weight;
after the target application is started, acquiring a target number of pieces of stack trace data of the target application based on the initial snapshot interval and the initial snapshot weight;
performing the following snapshot adjustment processing: adjusting the current snapshot interval and the current snapshot weight by a preset multiple, and again acquiring the target number of pieces of stack trace data of the target application based on the adjusted snapshot interval and snapshot weight;
and repeating the snapshot adjustment processing until the preset snapshot thread stop condition is met.
As an example, the preset initial snapshot interval and snapshot weight may be read from a configuration file; the interval determines how often a snapshot is taken after the application starts running.
Once the snapshot interval has been read, snapshots are taken at this interval. This is repeated 10 times (i.e., the target number), so stack trace data is acquired 10 times.
After the first round of 10 snapshots is completed, the snapshot interval is modified. The new interval may be 2 times (i.e., the preset multiple) the previous interval, and the weight of each newly taken snapshot is 2 times the weight of the previous snapshots.
Snapshots are then taken at the new interval, again 10 times.
After the second round of 10 snapshots, the snapshot interval is doubled again, and the weight of each newly taken snapshot is doubled again.
Snapshots continue to be taken at each new interval in this way until the preset snapshot thread stop condition is met, at which point the snapshot thread stops running and snapshot capture stops with it.
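The schedule described above can be sketched as a small planning function. This is only an illustration under stated assumptions: the names SnapshotSchedule, Shot, and plan are hypothetical, and a fixed deadline stands in for the snapshot thread stop condition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that plans snapshot times and weights.
final class SnapshotSchedule {

    // One planned snapshot: offset from application start (ms) and its weight.
    static final class Shot {
        final long atMillis;
        final long weight;

        Shot(long atMillis, long weight) {
            this.atMillis = atMillis;
            this.weight = weight;
        }
    }

    // Plans snapshots in rounds of perRound shots. After each round, both the
    // interval and the weight are multiplied by multiple. Planning stops once
    // the next shot would fall after deadlineMillis.
    static List<Shot> plan(long initialIntervalMillis, long initialWeight,
                           int perRound, int multiple, long deadlineMillis) {
        if (initialIntervalMillis <= 0 || perRound <= 0 || multiple < 1) {
            throw new IllegalArgumentException("interval, perRound and multiple must be positive");
        }
        List<Shot> shots = new ArrayList<>();
        long interval = initialIntervalMillis;
        long weight = initialWeight;
        long now = 0;
        while (true) {
            for (int i = 0; i < perRound; i++) {
                now += interval;
                if (now > deadlineMillis) {
                    return shots;
                }
                shots.add(new Shot(now, weight));
            }
            interval *= multiple;
            weight *= multiple;
        }
    }
}
```

With an initial interval of 10 ms, weight 1, 10 shots per round, and a multiple of 2, the first round covers 10 ms to 100 ms at weight 1 and the second round starts at 120 ms at weight 2. One plausible reading is that the growing weight keeps each snapshot representing a span of time proportional to its interval, although the application does not state this explicitly.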
The acquisition process of the start-up time influencing parameters is described below.
The process of periodically acquiring the start time influence parameter of the target application may be as follows:
in the JVM system, creating MBean objects;
periodically acquiring a starting time influence parameter of a target application based on the MBean object;
and reporting the startup time influence parameter of the current cache of the MBean object when the reporting period is reached.
The startup time influence parameters may include any one or more of the following: garbage collection time, safepoint time, number of classes loaded, and class loading time.
In one possible implementation, the time of each garbage collection, the safepoint time, and the number of classes loaded can be reported through a JMX (Java Management Extensions) monitor. The class loading process is monitored by adding a transformer, which records, for each loaded class, its class loader, the URL (uniform resource locator) of the class file, the JAR (Java archive) package to which the class belongs, and the class loading time.
Specifically, an MBean object is a Java object that can be managed through the JMX API (Application Programming Interface); an MBean object can be created to acquire and store the above startup time influence parameters. MBean objects are registered in an MBeanServer so that they can be accessed through the JMX API. In the MBean object, java.lang.management.GarbageCollectorMXBean and java.lang.management.ThreadMXBean can be used to obtain the garbage collection time and safepoint time, and java.lang.management.ClassLoadingMXBean can be used to obtain the number of classes loaded.
A timed task may be created that periodically updates information in the MBean object and reports that information to the monitoring system.
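A minimal sketch of reading two of these parameters through the platform MXBeans follows. The class name StartupMetrics and its method names are assumptions; safepoint time is left out because how it is obtained depends on the JVM in use. A timed task would call these methods once per period and cache the results for reporting.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reads startup-related counters from the JVM.
final class StartupMetrics {

    // Sum of the accumulated collection time (ms) over all collectors.
    // A collector may report -1 when the value is undefined; those are skipped.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Total number of classes loaded since the JVM started.
    static long totalLoadedClassCount() {
        ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();
        return classLoading.getTotalLoadedClassCount();
    }
}
```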
Specifically, the data format of the upload class loading time may be as follows:
{
time: class load time
ClassName: class name
ClassLoaderName: class loader name
ClassLoaderHashCode: hash code of the class loader
Classresource: URL of the class file (the jar package it belongs to)
}
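One way such a record could be filled in from a loaded class is sketched below. The class ClassLoadRecord and its field names are assumptions that mirror the format above; the load time is supplied by the caller because the application does not specify how it is measured.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical record of one class load event.
final class ClassLoadRecord {
    final long timeNanos;
    final String className;
    final String classLoaderName;
    final int classLoaderHashCode;
    final String classResource;

    ClassLoadRecord(long timeNanos, Class<?> clazz) {
        this.timeNanos = timeNanos;
        this.className = clazz.getName();
        // getClassLoader() returns null for classes loaded by the bootstrap loader.
        ClassLoader loader = clazz.getClassLoader();
        this.classLoaderName = loader == null ? "bootstrap" : loader.getClass().getName();
        this.classLoaderHashCode = loader == null ? 0 : loader.hashCode();
        // The code source location is typically the jar or directory the class came from.
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        URL location = source == null ? null : source.getLocation();
        this.classResource = location == null ? "" : location.toString();
    }
}
```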
During the running of a Java application, there are certain events that may cause the entire JVM to stop or pause, so that all application threads pause, thereby affecting the performance of the application.
The specific events include:
Garbage collection (GC, Garbage Collection): when the JVM performs garbage collection, all threads need to pause execution so that a global garbage collection operation can clean up memory and objects correctly;
initialization at JVM startup: during JVM startup, some initialization tasks, such as loading classes and initializing the virtual machine, may require all threads to be stopped;
thread safepoint: in some cases, the JVM needs to ensure that all threads are in a safe state before performing certain operations or checks, which may include thread state transitions, modifications of internal state, and so on.
In Java, the programmer does not need to manage memory manually, as the Java garbage collector is responsible for managing memory automatically. When a Java application is running, it creates objects in heap memory. Over time, these objects may no longer be used by the program, and they become garbage objects (i.e., objects that are no longer referenced). The garbage collector of the JVM periodically scans the heap memory to find objects that are no longer referenced and reclaims the memory space they occupy. In general, the garbage collection process is "stop the world", during which other threads cannot work normally.
"Stop the world" is a descriptive term that means that execution of the entire JVM is temporarily stopped when these events occur. This stall may cause performance problems for some applications because the excessive time a thread is suspended may affect the responsiveness and throughput of the application.
To reduce the impact of "Stop the world", developers of JVM will constantly optimize garbage collection algorithms, improve garbage collector performance, and reduce garbage collection frequency and downtime as much as possible. When designing an application, developers can also reduce the influence of garbage collection on the application through reasonable memory management and resource use.
The safepoint is an important mechanism by which the JVM performs management and optimization, but in some cases a safepoint may suspend the application for a long time and affect its performance. Therefore, when optimizing and tuning Java applications, developers need to consider the impact of safepoints on application performance and make targeted adjustments.
There are typically many safepoints while the JVM is running, but not every safepoint causes all threads to pause. For example, garbage collection requires all threads to be suspended, while other types of safepoint may only require some of the threads to be suspended.
The embodiment collects the startup time influence parameters, and performs positioning analysis on the problem of long application startup time based on the startup time influence parameters, so that developers can optimize the startup time of the application.
Step 104, the snapshot thread is stopped when a preset snapshot thread stop condition is met.
The preset snapshot thread stopping conditions may include one or more of the following:
The first condition is that the starting time of the target application reaches a preset time value;
The second condition is that the target application is successfully initialized;
third condition, the first access request of the target application is successfully ended.
In one possible implementation, for the first condition, the preset time value may be read from a configuration file and the startup time of the target application monitored; the snapshot thread is stopped when the startup time of the target application reaches the preset time value.
For the second condition, the port initialization methods of different Web containers can be instrumented; when the port is set up successfully, an event notification is sent and the snapshot thread is stopped. A Web container is a software environment for hosting and running Web applications. It provides a runtime environment in which developers can deploy, run, and manage their Web applications. Web containers are typically part of a Web server; they execute the code of Web applications, process HTTP requests and responses, and provide the ability to interact with other systems and services. The startup time of a Web container is the time from issuing the start command until the container is fully ready to serve requests. This includes loading configuration, initializing applications, preparing database connections, and the like.
For the third condition, after the target application successfully ends the first access request, an event notification may be sent, and the snapshot thread is stopped.
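The three stop conditions can be combined in a small helper that the snapshot thread polls before each snapshot. The sketch below is illustrative only; the class SnapshotStopCondition and its method names are assumptions, and the two event methods stand in for the event notifications mentioned above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper that tells the snapshot thread when to stop.
final class SnapshotStopCondition {
    private final long startMillis;
    private final long maxStartupMillis;
    private final AtomicBoolean stopEvent = new AtomicBoolean(false);

    SnapshotStopCondition(long startMillis, long maxStartupMillis) {
        this.startMillis = startMillis;
        this.maxStartupMillis = maxStartupMillis;
    }

    // Second condition: invoked from the instrumented port initialization method.
    void onPortInitialized() {
        stopEvent.set(true);
    }

    // Third condition: invoked when the first access request ends successfully.
    void onFirstRequestFinished() {
        stopEvent.set(true);
    }

    // True once an event has arrived, or (first condition) once the startup
    // time has reached the preset value.
    boolean shouldStop(long nowMillis) {
        return stopEvent.get() || nowMillis - startMillis >= maxStartupMillis;
    }
}
```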
Step 105, determining target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread.
In one possible implementation manner, after the snapshot thread ends, stack trace data currently cached by the snapshot thread may be acquired, and the stack trace data is aggregated, and the corresponding processing may be as follows:
Each time the snapshot thread is stopped, stack tracking data currently cached by the snapshot thread is obtained;
And performing aggregation operation aiming at the target application based on the stored historical stack tracking data and the stack tracking data currently cached by the snapshot thread, and determining the target stack tracking data of the target application.
The specific aggregation operation may be as follows:
Creating target key data for stack tracking data currently cached by the snapshot thread, wherein the target key data comprises a thread identifier and time granularity of the snapshot thread;
Searching the history key data which are the same as the target key data in the stored history stack tracking data;
If the target historical key data which is the same as the target key data does not exist, determining target stack tracking data of a target application based on stack tracking data currently cached by the snapshot thread;
if the target historical key data which is the same as the target key data exists, determining target stack tracking data of the target application based on stack tracking data currently cached by the snapshot thread and historical stack tracking data corresponding to the target historical key data.
In one possible implementation, a hash map (HashMap) may be used to store and aggregate the stack traces (i.e., stack trace data) that share the same thread ID (i.e., thread identifier) and the same time granularity.
First, a HashMap is created whose key is a composite key (i.e., key data) containing the thread ID and the time granularity, and whose value is a List for storing stack traces. Each time the snapshot thread stops, the stack trace data currently cached by the snapshot thread is acquired as a new stack trace, and a composite key, i.e., the target key data, is created for the new stack trace in the format of the key data in the HashMap.
The HashMap is then checked for the target key data. If it is not present, a new key (i.e., the target key data) is created in the HashMap, the new stack trace is added to a new List, and the stack trace data in that List is taken as the target stack trace data of the target application. If the HashMap already contains the target key data, the List corresponding to the key is obtained, the new stack trace is added to it, and the stack trace data in the List (including the historical stack trace data and the new stack trace data) is taken as the target stack trace data of the target application.
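The lookup-or-create logic above maps naturally onto HashMap.computeIfAbsent. The following sketch assumes that the time granularity can be represented as a numeric time bucket; the class names StackTraceAggregator and Key are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical aggregator keyed by (thread ID, time granularity).
final class StackTraceAggregator {

    // Composite key; equals and hashCode make two keys with the same
    // thread ID and time bucket address the same map entry.
    static final class Key {
        final long threadId;
        final long timeBucket;

        Key(long threadId, long timeBucket) {
            this.threadId = threadId;
            this.timeBucket = timeBucket;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return threadId == other.threadId && timeBucket == other.timeBucket;
        }

        @Override
        public int hashCode() {
            return Objects.hash(threadId, timeBucket);
        }
    }

    private final Map<Key, List<StackTraceElement[]>> traces = new HashMap<>();

    // Adds a new stack trace and returns the aggregated list for its key,
    // creating the list first if the key has not been seen before.
    List<StackTraceElement[]> add(long threadId, long timeBucket, StackTraceElement[] trace) {
        List<StackTraceElement[]> list =
                traces.computeIfAbsent(new Key(threadId, timeBucket), k -> new ArrayList<>());
        list.add(trace);
        return list;
    }
}
```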
Step 106, problem positioning analysis is performed on the target application based on the target stack trace data and the startup time influence parameters.
In one possible implementation, both the target stack trace data and the startup time influence parameters may be reported to the monitoring system. Further, a flame graph as shown in FIG. 2 may be drawn based on the target stack trace data. A flame graph is a visualization technique for showing the performance bottlenecks of software or systems. It graphically displays where a program spends time during execution and helps developers quickly locate performance problems. Each layer of the flame graph represents a function call; the width of a frame represents the share of the startup time taken by that function, so a wider frame means a longer time. The order from bottom to top indicates the call relationship: lower functions call the functions above them. A function at the top is one that was executing when the snapshot was taken, and if it is wide, it may be a performance bottleneck. Overall, flame graphs are an effective performance analysis tool that can help developers quickly locate and solve performance problems.
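The application does not say how the flame graph is produced from the stack trace data. One common approach, shown here purely as an assumption, is to collapse each stack trace into a "folded" line with the root frame first and to sum the snapshot weights of identical stacks; the summed weight then becomes the frame width. The class name FoldedStacks and its methods are hypothetical.

```java
import java.util.Map;

// Hypothetical helper that folds stack traces for flame graph rendering.
final class FoldedStacks {

    // Joins the frames root-first with ';'. StackTraceElement arrays list the
    // most recent call at index 0, so the array is walked backwards.
    static String fold(StackTraceElement[] trace) {
        StringBuilder sb = new StringBuilder();
        for (int i = trace.length - 1; i >= 0; i--) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(trace[i].getClassName()).append('.').append(trace[i].getMethodName());
        }
        return sb.toString();
    }

    // Adds the snapshot weight to the running total of its folded stack.
    static void accumulate(Map<String, Long> folded, StackTraceElement[] trace, long weight) {
        folded.merge(fold(trace), weight, Long::sum);
    }
}
```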
Combined with the startup time influence parameters, the factors that may cause the target application to start slowly can then be analyzed.
As a specific example, the analysis strategy may use the following logic:
safepointImpact=safepointTime/totalTime*safepointWeight
gcImpact=gcTime/totalTime*gcWeight
classLoadImpact=classLoadCount/totalTime*classLoadWeight
classLoaderImpact=classLoaderCount/totalTime*classLoaderWeight
flameGraphImpact=flameGraphResult/totalTime*flameGraphWeight
maxImpact=max(safepointImpact,gcImpact,classLoadImpact,classLoaderImpact,flameGraphImpact)
Here, totalTime denotes the total startup duration of the target application; safepointImpact denotes the safepoint influence factor, safepointTime denotes the time spent at safepoints, and safepointWeight denotes the safepoint weight; gcImpact denotes the garbage collection influence factor, gcTime denotes the time spent in garbage collection, and gcWeight denotes the garbage collection weight; classLoadImpact denotes the class load count influence factor, classLoadCount denotes the number of classes loaded, and classLoadWeight denotes the class load count weight; classLoaderImpact denotes the class loading time influence factor, classLoaderCount denotes the class loading time, and classLoaderWeight denotes the class loading time weight; flameGraphImpact denotes the stack influence factor, flameGraphResult denotes the time width of the stack trace selected in the flame graph, and flameGraphWeight denotes the stack weight; maxImpact denotes the largest influence factor, i.e., the most likely cause.
With the above analysis strategy, the factor most likely to cause slow application startup can be identified. The safepoint weight, the garbage collection weight, the class load count weight, the class loading time weight and the flame graph weight can all be adjusted through configuration.
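The analysis strategy above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name StartupImpactAnalyzer, the use of maps keyed by factor name, and the default weight of 1.0 for a factor without a configured weight are all hypothetical and do not come from the application.

```java
import java.util.Map;

/** Hypothetical analyzer; names and the default weight are assumptions of this sketch. */
class StartupImpactAnalyzer {
    /**
     * Computes impact = measurement / totalTime * weight for every factor
     * and returns the name of the factor with the largest impact.
     */
    static String mostLikelyFactor(double totalTime,
                                   Map<String, Double> measurements,
                                   Map<String, Double> weights) {
        if (totalTime <= 0) {
            throw new IllegalArgumentException("totalTime must be positive");
        }
        String best = null;
        double maxImpact = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : measurements.entrySet()) {
            // Assumed default: a factor with no configured weight uses 1.0.
            double weight = weights.getOrDefault(e.getKey(), 1.0);
            double impact = e.getValue() / totalTime * weight;
            if (impact > maxImpact) {
                maxImpact = impact;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For example, with totalTime = 10000, gcTime = 3000, flameGraphResult = 4000 and equal weights, the stack factor dominates (0.4 versus 0.3); raising gcWeight to 2.0 makes garbage collection the reported factor (0.6 versus 0.4), which shows how the configurable weights steer the result.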
The embodiments of the present application have the following beneficial effects:
During startup of the target application, stack trace data is collected by a snapshot thread and the startup time influence parameters of the target application are obtained periodically. Problem localization analysis is then performed on the target application based on the stack trace data and the startup time influence parameters, so that the cause of a long startup time in a Java application can be located.
An embodiment of the present application provides a problem localization apparatus for a JVM system, which is used to implement the problem localization method for the JVM system described above. As shown in the schematic block diagram of FIG. 3, the problem localization apparatus 300 of the JVM system includes: a thread creation module 301, a data acquisition module 302, a data aggregation module 303, and a problem location module 304.
The thread creation module 301 is configured to create a snapshot thread in a Java Virtual Machine (JVM) system;
the data acquisition module 302 is configured to: start the snapshot thread each time the JVM system starts a target application; during startup of the target application, perform a stack trace operation on the target application based on the snapshot thread and obtain stack trace data of the startup process of the target application; periodically obtain startup time influence parameters of the target application; and stop the snapshot thread when a preset snapshot thread stop condition is met;
the data aggregation module 303 is configured to determine target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread;
the problem location module 304 is configured to perform problem localization analysis on the target application based on the target stack trace data and the startup time influence parameters.
Optionally, the data acquisition module 302 is configured to:
obtain a preset initial snapshot time interval and an initial snapshot weight;
after the target application begins to start, collect a target number of stack trace data items of the target application based on the initial snapshot time interval and the initial snapshot weight;
perform the following snapshot adjustment processing: adjust the current snapshot time interval and the current snapshot weight based on a preset multiple, and again collect the target number of stack trace data items of the target application based on the adjusted snapshot time interval and snapshot weight;
repeat the snapshot adjustment processing until the preset snapshot thread stop condition is met, and then stop the snapshot adjustment processing.
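The snapshot adjustment processing can be illustrated with the following Java sketch. It rests on several assumptions that the text above does not state: that "adjusting based on a preset multiple" means multiplying both the interval and the weight by that multiple, that a snapshot is taken with Thread.getStackTrace() on a single target thread, and that the stop condition is supplied as a BooleanSupplier. The class name SnapshotSchedule is hypothetical.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical schedule; the "multiply both values" rule is an assumption. */
class SnapshotSchedule {
    private long intervalMillis;
    private long weight;
    private final int targetCount;
    private final int multiple;
    private int takenInRound = 0;

    SnapshotSchedule(long initialIntervalMillis, long initialWeight, int targetCount, int multiple) {
        this.intervalMillis = initialIntervalMillis;
        this.weight = initialWeight;
        this.targetCount = targetCount;
        this.multiple = multiple;
    }

    long currentIntervalMillis() {
        return intervalMillis;
    }

    /** Records one snapshot and returns the weight that applies to it. */
    long recordSnapshot() {
        long appliedWeight = weight;
        if (++takenInRound == targetCount) {
            // The round is complete: scale interval and weight for the next round.
            intervalMillis *= multiple;
            weight *= multiple;
            takenInRound = 0;
        }
        return appliedWeight;
    }

    /** Samples the target thread until the stop condition holds or the sampler is interrupted. */
    static void sample(Thread target, SnapshotSchedule schedule,
                       BooleanSupplier stopCondition, List<StackTraceElement[]> out) {
        while (!stopCondition.getAsBoolean()) {
            long sleepMillis = schedule.currentIntervalMillis();
            out.add(target.getStackTrace());
            schedule.recordSnapshot();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return;
            }
        }
    }
}
```

Under these assumptions, a schedule created with an interval of 10 ms, a weight of 1, a target number of 3 and a multiple of 2 takes three snapshots 10 ms apart with weight 1, then three snapshots 20 ms apart with weight 2, and so on. Scaling the weight together with the interval is one plausible way to keep the represented time per snapshot consistent as sampling becomes sparser.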
Optionally, the preset snapshot thread stop condition includes one or more of the following:
the startup duration of the target application reaches a preset duration;
the target application is initialized successfully;
the first access request to the target application completes successfully.
Optionally, the data aggregation module 303 is configured to:
obtain the stack trace data currently cached by the snapshot thread each time the snapshot thread stops;
perform an aggregation operation for the target application based on the stored historical stack trace data and the stack trace data currently cached by the snapshot thread, and determine the target stack trace data of the target application.
Optionally, the data aggregation module 303 is configured to:
create target key data for the stack trace data currently cached by the snapshot thread, wherein the target key data includes a thread identifier of the snapshot thread and a time granularity;
search the stored historical stack trace data for historical key data identical to the target key data;
if no target historical key data identical to the target key data exists, determine the target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread;
if target historical key data identical to the target key data exists, determine the target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread and the historical stack trace data corresponding to the target historical key data.
Optionally, the data acquisition module 302 is configured to:
create an MBean object in the JVM system;
periodically obtain the startup time influence parameters of the target application based on the MBean object;
report the startup time influence parameters currently cached by the MBean object when the reporting period is reached.
Optionally, the startup time influence parameters include any one or more of the following: garbage collection time, safepoint time, class load count, and class loading time.
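As a minimal sketch of how some of these parameters can be read, the following Java code queries the platform MXBeans of the standard java.lang.management package. This is a substitute for illustration, not the custom MBean object described above: the class name StartupMetrics is hypothetical, and safepoint time and class loading time are omitted because the standard java.lang.management API does not expose them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical reader built on platform MXBeans; not the MBean of the application. */
class StartupMetrics {
    /** Sums the accumulated collection time, in milliseconds, over all collectors. */
    static long gcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Total number of classes loaded since the JVM started. */
    static long loadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getTotalLoadedClassCount();
    }
}
```

A periodic collector could call these two methods on a timer, cache the latest values, and report them when the reporting period is reached.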
The embodiments of the present application have the following beneficial effects:
During startup of the target application, stack trace data is collected by a snapshot thread and the startup time influence parameters of the target application are obtained periodically. Problem localization analysis is then performed on the target application based on the stack trace data and the startup time influence parameters, so that the cause of a long startup time in a Java application can be located.
The exemplary embodiment of the application also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to an embodiment of the application when executed by the at least one processor.
The exemplary embodiments of the present application also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present application.
The exemplary embodiments of the application also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the application.
Referring to FIG. 4, a block diagram of an electronic device 400, which may be a server or a client of the present application, will now be described. It is an example of a hardware device that may be applied to aspects of the present application. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 4, the electronic device 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in the electronic device 400 are connected to the I/O interface 405, including: an input unit 406, an output unit 407, a storage unit 408, and a communication unit 409. The input unit 406 may be any type of device capable of inputting information to the electronic device 400; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 407 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 408 may include, but is not limited to, magnetic disks and optical disks. The communication unit 409 allows the electronic device 400 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth devices, Wi-Fi devices, WiMAX devices, cellular communication devices, and/or the like.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above. For example, in some embodiments, the problem localization method of the JVM system described above may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 400 via the ROM 402 and/or the communication unit 409. In some embodiments, the computing unit 401 may be configured to perform the problem localization method of the JVM system described above in any other suitable manner (e.g., by means of firmware).
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (10)

1. A method for problem localization in a JVM system, the method comprising:
creating a snapshot thread in a Java Virtual Machine (JVM) system;
starting the snapshot thread each time the JVM system starts a target application;
during startup of the target application: performing a stack trace operation on the target application based on the snapshot thread, and obtaining stack trace data of the startup process of the target application; and periodically obtaining startup time influence parameters of the target application;
stopping the snapshot thread when a preset snapshot thread stop condition is met;
performing an aggregation operation for the target application based on stored historical stack trace data and the stack trace data currently cached by the snapshot thread, and determining target stack trace data of the target application;
performing problem localization analysis on the target application based on the target stack trace data and the startup time influence parameters.
2. The method according to claim 1, wherein the performing, during startup of the target application, a stack trace operation on the target application based on the snapshot thread and obtaining stack trace data of the startup process of the target application comprises:
obtaining a preset initial snapshot time interval and an initial snapshot weight;
after the target application begins to start, collecting a target number of stack trace data items of the target application based on the initial snapshot time interval and the initial snapshot weight;
performing the following snapshot adjustment processing: adjusting the current snapshot time interval and the current snapshot weight based on a preset multiple, and again collecting the target number of stack trace data items of the target application based on the adjusted snapshot time interval and snapshot weight;
repeating the snapshot adjustment processing until the preset snapshot thread stop condition is met, and then stopping the snapshot adjustment processing.
3. The method of claim 1, wherein the preset snapshot thread stop condition includes one or more of:
the startup duration of the target application reaches a preset duration;
the target application is initialized successfully;
the first access request to the target application completes successfully.
4. The method of claim 1, wherein each time the snapshot thread stops, the stack trace data currently cached by the snapshot thread is obtained, so that the aggregation operation for the target application is performed based on the stored historical stack trace data and the stack trace data currently cached by the snapshot thread to determine the target stack trace data of the target application.
5. The method of claim 1, wherein the performing an aggregation operation for the target application based on the stored historical stack trace data and the stack trace data currently cached by the snapshot thread, and determining target stack trace data of the target application comprises:
creating target key data for the stack trace data currently cached by the snapshot thread, wherein the target key data includes a thread identifier of the snapshot thread and a time granularity;
searching the stored historical stack trace data for historical key data identical to the target key data;
if no target historical key data identical to the target key data exists, determining the target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread;
if target historical key data identical to the target key data exists, determining the target stack trace data of the target application based on the stack trace data currently cached by the snapshot thread and the historical stack trace data corresponding to the target historical key data.
6. The method of claim 1, wherein the periodically obtaining startup time influence parameters of the target application comprises:
creating an MBean object in the JVM system;
periodically obtaining the startup time influence parameters of the target application based on the MBean object;
reporting the startup time influence parameters currently cached by the MBean object when the reporting period is reached.
7. The method of claim 6, wherein the startup time influence parameters include any one or more of: garbage collection time, safepoint time, class load count, and class loading time.
8. A problem localization apparatus for a JVM system, the apparatus comprising:
a thread creation module, configured to create a snapshot thread in a Java Virtual Machine (JVM) system;
a data acquisition module, configured to: start the snapshot thread each time the JVM system starts a target application; during startup of the target application, perform a stack trace operation on the target application based on the snapshot thread and obtain stack trace data of the startup process of the target application; periodically obtain startup time influence parameters of the target application; and stop the snapshot thread when a preset snapshot thread stop condition is met;
a data aggregation module, configured to perform an aggregation operation for the target application based on stored historical stack trace data and the stack trace data currently cached by the snapshot thread, and determine target stack trace data of the target application;
a problem location module, configured to perform problem localization analysis on the target application based on the target stack trace data and the startup time influence parameters.
9. An electronic device, comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202410457237.6A 2024-04-16 2024-04-16 Problem positioning method and device of JVM system Active CN118193272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410457237.6A CN118193272B (en) 2024-04-16 2024-04-16 Problem positioning method and device of JVM system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410457237.6A CN118193272B (en) 2024-04-16 2024-04-16 Problem positioning method and device of JVM system

Publications (2)

Publication Number Publication Date
CN118193272A CN118193272A (en) 2024-06-14
CN118193272B true CN118193272B (en) 2024-11-19

Family

ID=91415183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410457237.6A Active CN118193272B (en) 2024-04-16 2024-04-16 Problem positioning method and device of JVM system

Country Status (1)

Country Link
CN (1) CN118193272B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104410671A (en) * 2014-11-03 2015-03-11 深圳市蓝凌软件股份有限公司 Snapshot capturing method and data monitoring tool
CN115016866A (en) * 2022-08-09 2022-09-06 荣耀终端有限公司 Data processing method during application starting, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11226869B2 (en) * 2020-04-20 2022-01-18 Netapp, Inc. Persistent memory architecture
CN113434402A (en) * 2021-06-24 2021-09-24 中国工商银行股份有限公司 Performance analysis method, device and equipment for micro-service application
US20230004478A1 (en) * 2021-07-02 2023-01-05 Salesforce.Com, Inc. Systems and methods of continuous stack trace collection to monitor an application on a server and resolve an application incident
CN115718673A (en) * 2022-11-24 2023-02-28 阿里巴巴(中国)有限公司 Method and related device for positioning error code
CN115827390A (en) * 2022-12-28 2023-03-21 厦门友微科技有限公司 JVM performance monitoring method, device, system and storage medium

Also Published As

Publication number Publication date
CN118193272A (en) 2024-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant