
CN112799929B - Root cause analysis method and system of alarm log - Google Patents


Info

Publication number
CN112799929B
Authority
CN
China
Prior art keywords
hierarchical tree
alarm
generalized hierarchical
nodes
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110126298.0A
Other languages
Chinese (zh)
Other versions
CN112799929A (en
Inventor
吴冕冠
周文泽
陆新龙
谢伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110126298.0A
Publication of CN112799929A
Application granted
Publication of CN112799929B
Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a root cause analysis method and a root cause analysis system for an alarm log, which can be used in the financial field or other fields, and the method comprises the following steps: receiving batch alarm logs; obtaining an alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set; the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs and actual results corresponding to the historical alarm logs. The application can improve the efficiency of the root cause analysis of the alarm log on the basis of ensuring the reliability of the root cause analysis of the alarm log.

Description

Root cause analysis method and system of alarm log
Technical Field
The application relates to the technical field of data processing, in particular to a root cause analysis method and system of an alarm log.
Background
With the rapid development of distributed services, the complexity of calls between services has grown exponentially compared with a traditional monolithic architecture. When a transaction fails, troubleshooting is considerably harder for operations, maintenance, and development staff than it is under a monolithic architecture.
In particular, when a large number of alarms occur in production, business staff and operations staff currently cannot quickly locate the cause of the problem from the alarm information. The alarm logs have to be sent to developers to help analyze the cause, yet developers also struggle to locate it quickly when faced with a large volume of complicated alarm information. Developers from several applications may need to discuss and analyze the logs together before the cause is finally located, so efficiency is relatively low.
Disclosure of Invention
Aiming at the problems in the prior art, the application provides a root cause analysis method and a root cause analysis system for an alarm log, which can improve the root cause analysis efficiency of the alarm log on the basis of ensuring the reliability of the root cause analysis of the alarm log.
In order to solve the technical problems, the application provides the following technical scheme:
in a first aspect, the present application provides a root cause analysis method for an alarm log, including:
Receiving batch alarm logs;
Obtaining an alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set;
the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs and actual results corresponding to the historical alarm logs.
Further, before the alarm root cause of the batch alarm log is obtained according to the batch alarm log and the preset generalized hierarchical tree set, the method further comprises:
Acquiring batch historical alarm logs and actual results corresponding to the historical alarm logs respectively, wherein the actual results comprise: abnormal results or normal results;
generating a plurality of generalized hierarchical trees according to the batch history alarm logs;
obtaining the association coefficient of each intermediate node in the generalized hierarchical tree according to a preset association coefficient algorithm, a batch history alarm log and an actual result;
And taking the association coefficients of each generalized hierarchical tree and each intermediate node thereof as the generalized hierarchical tree set.
Further, the obtaining the alarm root cause of the batch alarm log according to the batch alarm log and a preset generalized hierarchical tree set includes:
Loading all alarm logs into the preset generalized hierarchical tree set to obtain the number of characteristic attributes of all alarm logs corresponding to each leaf node, wherein the preset generalized hierarchical tree set comprises a plurality of generalized hierarchical trees and association coefficients of each intermediate node of the generalized hierarchical trees, and each generalized hierarchical tree comprises the leaf nodes and the intermediate nodes;
If the leaf nodes with the number of the characteristic attributes being greater than or equal to the number threshold value exist, the leaf nodes are used as root cause nodes in the corresponding generalized hierarchical tree;
if a generalized hierarchical tree with the number of the characteristic attributes corresponding to each leaf node being smaller than the number threshold exists, obtaining root cause nodes in the generalized hierarchical tree according to the association coefficients of each intermediate node in the generalized hierarchical tree, the number of the characteristic attributes corresponding to each leaf node and the number threshold;
and obtaining the alarm root cause of the batch alarm logs according to the root cause nodes in each generalized hierarchical tree.
Further, the obtaining the root cause node in the generalized hierarchical tree according to the association coefficient of each intermediate node in the generalized hierarchical tree, the number of the feature attributes corresponding to each leaf node and the number threshold value includes:
the node with the largest association coefficient value in the upper layer of nodes of the leaf nodes is used as a target node;
Performing a generalization procedure, the generalization procedure comprising: taking the sum of the number of the characteristic attributes of the nodes associated with the target node in the nodes of the next layer of the target node as the number of the characteristic attributes of the target node;
If the number of the characteristic attributes of the target node is smaller than the number threshold, taking the node with the largest association coefficient value in the node of the upper layer of the target node as the target node, and executing the generalization process again until the number of the characteristic attributes of the target node is larger than or equal to the number threshold;
and taking the target node as a root cause node in the corresponding generalized hierarchical tree.
In a second aspect, the present application provides a root cause analysis system for an alarm log, including:
the receiving module is used for receiving batch alarm logs;
The root cause analysis module is used for obtaining the alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set;
the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs and actual results corresponding to the historical alarm logs.
Further, the root cause analysis system of the alarm log further comprises:
the acquisition module is used for acquiring batch historical alarm logs and actual results corresponding to the historical alarm logs respectively, wherein the actual results comprise: abnormal results or normal results;
The generation module is used for generating a plurality of generalized hierarchical trees according to the batch history alarm logs;
The association coefficient obtaining module is used for obtaining the association coefficient of each intermediate node in the generalized hierarchical tree according to a preset association coefficient algorithm, the batch of historical alarm logs, and the actual results;
And the acquisition module is used for taking the association coefficients of each generalized hierarchical tree and each intermediate node thereof as the generalized hierarchical tree set.
Further, the root cause analysis module includes:
the loading unit is used for loading all the alarm logs into the preset generalized hierarchical tree set to obtain the number of the characteristic attributes of all the alarm logs corresponding to each leaf node, the preset generalized hierarchical tree set comprises a plurality of generalized hierarchical trees and association coefficients of each intermediate node of the generalized hierarchical tree, and each generalized hierarchical tree comprises the leaf nodes and the intermediate nodes;
The first judging unit is used for taking the leaf node as a root cause node in the corresponding generalized hierarchical tree if the leaf node with the number of the characteristic attributes being greater than or equal to the number threshold exists;
The second judging unit is used for obtaining root cause nodes in the generalized hierarchical tree according to the association coefficients of all intermediate nodes in the generalized hierarchical tree, the number of the characteristic attributes corresponding to all leaf nodes and the number threshold value if the generalized hierarchical tree with the number of the characteristic attributes corresponding to all leaf nodes smaller than the number threshold value exists;
And the root cause analysis unit is used for obtaining the alarm root cause of the batch alarm logs according to the root cause nodes in each generalized hierarchical tree.
Further, the second judging unit includes:
the node determining subunit is used for taking the node with the largest association coefficient value in the node of the upper layer of the leaf node as a target node;
an execution subunit, configured to execute a generalization procedure, where the generalization procedure includes: taking the sum of the number of the characteristic attributes of the nodes associated with the target node in the nodes of the next layer of the target node as the number of the characteristic attributes of the target node;
The loop subunit is configured to, if the number of feature attributes of the target node is smaller than the number threshold, take the node with the largest association coefficient value in the layer above the target node as the target node and re-execute the generalization process, until the number of feature attributes of the target node is greater than or equal to the number threshold;
and a root cause node obtaining subunit, configured to take the target node as the root cause node in the corresponding generalized hierarchical tree.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the root cause analysis method of the alarm log when executing the program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon computer instructions that when executed implement the root cause analysis method of an alarm log.
According to the above technical scheme, the application provides a root cause analysis method and system for an alarm log. The method comprises: receiving batch alarm logs; and obtaining an alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set, where the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs, and the actual results corresponding to the historical alarm logs. This can improve the efficiency of root cause analysis of alarm logs while ensuring its reliability. Specifically, it can raise the degree of automation of root cause analysis, allow the generalized hierarchical trees and association coefficients to be reused, broaden the range of scenarios to which root cause analysis applies, save labor cost, and improve the efficiency and timeliness of root cause analysis.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a root cause analysis method of an alarm log in an embodiment of the application;
FIG. 2 is a schematic diagram of a service feature generalization hierarchical tree constructed in a specific application example of the present application;
FIG. 3 is a schematic diagram of a method feature generalization hierarchical tree constructed in a specific application example of the present application;
FIG. 4 is a schematic diagram of an application feature generalization hierarchical tree constructed in a specific application example of the present application;
FIG. 5 is a schematic diagram of a host feature generalization hierarchical tree constructed in a specific application example of the present application;
FIG. 6 is a schematic diagram of a service feature generalization hierarchical tree after feature attribute mapping in a specific application example of the present application;
FIG. 7 is a schematic diagram of a method feature generalization hierarchical tree after feature attribute mapping in a specific application example of the present application;
FIG. 8 is a schematic diagram of an application feature generalization hierarchical tree after feature attribute mapping in a specific application example of the present application;
FIG. 9 is a schematic diagram of a host feature generalization hierarchical tree after feature attribute mapping in a specific application example of the present application;
FIG. 10 is a schematic diagram of a host feature generalization hierarchical tree after generalization in a specific application example of the present application;
FIG. 11 is a schematic diagram of a root cause analysis system of an alarm log according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a root cause analysis system of an alarm log in an application example of the present application;
FIG. 13 is a schematic representation of a generalized hierarchical tree of network failure characteristics in one example of the present application;
FIG. 14 is a schematic diagram of a cluster attribute locating device in an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a generalized hierarchical tree structure device in an application example of the present application;
FIG. 16 is a schematic block diagram of a system configuration of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
When a large number of alarms occur in production, the system cannot analyze and locate the root cause of the problem by automatic means, and manual analysis by developers makes problem location inefficient. To solve these problems, the application analyzes the root cause from the alarm logs in an efficient, automated way. The system merges logs on each feature attribute of the alarm logs (for example, transaction alarm logs), iterates upward according to the association coefficient of each feature attribute, and counts how many similar logs each feature attribute contains, until the number of logs contained in some clustering result is larger than a threshold set by the user (as an optimization, the threshold is usually one fifth of the total number of logs processed, and the user may also set it). The resulting generalized clustering result can represent the common root cause of alarm logs of this type.
Based on the above, to improve the efficiency of root cause analysis of alarm logs while ensuring its reliability, an embodiment of the application provides a root cause analysis system of an alarm log. The system may be a server or a client device. The client device may include a smartphone, a tablet, a network set-top box, a portable computer, a desktop computer, a personal digital assistant (PDA), a vehicle-mounted device, a smart wearable device, and the like. Smart wearable devices may include smart glasses, smart watches, smart bands, and the like.
In practical applications, the part for performing root cause analysis of the alarm log may be performed on the server side as described above, or all operations may be performed in the client device. Specifically, the selection may be made according to the processing capability of the client device, and restrictions of the use scenario of the user. The application is not limited in this regard. If all operations are performed in the client device, the client device may further include a processor.
The client device may have a communication module (i.e. a communication unit) and may be connected to a remote server in a communication manner, so as to implement data transmission with the server. The server may include a server on the side of the task scheduling center, and in other implementations may include a server of an intermediate platform, such as a server of a third party server platform having a communication link with the task scheduling center server. The server may include a single computer device, a server cluster formed by a plurality of servers, or a server structure of a distributed device.
Any suitable network protocol may be used for communication between the server and the client device, including protocols not yet developed as of the filing date of the present application. The network protocols may include, for example, TCP/IP, UDP/IP, HTTP, and HTTPS. The network protocol may also include, for example, RPC (Remote Procedure Call) or REST (Representational State Transfer) used on top of the above protocols.
It should be noted that, the root cause analysis method and system of the alarm log disclosed by the application can be used in the technical field of finance, and can also be used in any field except the technical field of finance, and the application field of the root cause analysis method and system of the alarm log disclosed by the application is not limited.
The following examples are presented in detail.
In order to improve the efficiency of root cause analysis of an alarm log while ensuring its reliability, this embodiment provides a root cause analysis method of an alarm log whose execution subject is a root cause analysis system of the alarm log; the system includes, but is not limited to, a server. As shown in FIG. 1, the method specifically comprises the following:
Step 100: Receive batch alarm logs.
Specifically, batch alarm logs sent by distributed servers may be received, and each alarm log may include the features service, method, application, and host.
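As a rough illustration of the record described in Step 100, the sketch below models one alarm log as a Python dataclass carrying the four features named above. The class name `AlarmLog`, the helper `features()`, and all sample values (including the IP addresses) are hypothetical; the source does not specify a log format.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass(frozen=True)
class AlarmLog:
    """One alarm log with the four features named in the text (hypothetical layout)."""
    service: str
    method: str
    application: str
    host: str

    def features(self) -> Dict[str, str]:
        # Map each feature name to the feature attribute this log carries.
        return asdict(self)


# Hypothetical sample batch; the values are made up for illustration.
batch: List[AlarmLog] = [
    AlarmLog(service="transfer", method="m1", application="F-AAA", host="10.0.0.1"),
    AlarmLog(service="transfer", method="m2", application="F-DSF", host="10.0.0.2"),
]
```

Keeping each log as a flat feature-to-attribute mapping makes it straightforward to look up the matching leaf node in each per-feature tree later on.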
Step 200: obtaining an alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set; the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs and actual results corresponding to the historical alarm logs.
Specifically, the alarm root cause can represent the reason the batch alarm logs were generated and can be used to determine the method that may be abnormal and the corresponding host, application, service, and so on. The preset association coefficient algorithm may be the pd.corr() algorithm. The actual result may represent the actual running condition of the distributed server corresponding to a historical alarm log and may be abnormal or normal.
As can be seen from the above description, the root cause analysis method of the alarm log provided by this embodiment receives batch alarm logs and obtains their alarm root cause according to the batch alarm logs and a preset generalized hierarchical tree set, where the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs, and the actual results corresponding to the historical alarm logs. This can improve the efficiency of root cause analysis while ensuring its reliability. Compared with the traditional method of aggregating the attributes that currently hold fewer logs, the method has stronger theoretical interpretability, iterates more efficiently during clustering, and yields a final result that better matches the actual root cause.
In order to further improve the reliability of obtaining the generalized hierarchical tree set and further improve the accuracy of root cause analysis by applying the reliable generalized hierarchical tree set next, in an embodiment of the present application, before step 200, the method further includes:
Step 001: acquiring batch historical alarm logs and actual results corresponding to the historical alarm logs respectively, wherein the actual results comprise: abnormal results or normal results.
Step 002: Generate a plurality of generalized hierarchical trees according to the batch of historical alarm logs.
Specifically, a plurality of generalized hierarchical trees can be generated according to the batch of historical alarm logs and a preset feature attribute relationship table. The preset feature attribute relationship table may contain the direct correspondences between features and feature attributes and among feature attributes, that is, the direct inter-layer relationships and direct inclusion relationships; it can be set according to actual needs, and the application does not limit it. For example, the application feature includes the platform application and host application feature attributes; the platform application feature attribute includes the F-DSF and F-BAM feature attributes; and the host application feature attribute includes the F-AAA and F-BBB feature attributes. Once the leaf feature attributes of a generalized hierarchical tree are determined, the complete generalized hierarchical tree can be obtained by applying the preset feature attribute relationship table.
The root node of a generalized hierarchical tree can represent a feature of the alarm log, and the non-root nodes can represent feature attributes; each generalized hierarchical tree corresponds to a different feature. The non-root nodes are the nodes other than the root node in the generalized hierarchical tree. A feature attribute may include the feature attributes of the associated nodes in the next layer.
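One way to picture how a relationship table expands into a tree is the minimal sketch below. It assumes the table is a plain dictionary mapping each node to the attributes it directly contains; that representation, the function names, and the labels `platform` and `mainframe` (standing for the platform application and host application attributes) are assumptions for illustration, not the patent's stated data structure.

```python
from typing import Dict, List

# Hypothetical relation table: each key directly contains the listed attributes.
# The names follow the application-feature example in the text.
RELATIONS: Dict[str, List[str]] = {
    "application": ["platform", "mainframe"],
    "platform": ["F-DSF", "F-BAM"],
    "mainframe": ["F-AAA", "F-BBB"],
}


def build_tree(name: str, relations: Dict[str, List[str]]) -> dict:
    """Recursively expand a feature (the root) into its nested feature attributes."""
    children = [build_tree(child, relations) for child in relations.get(name, [])]
    return {"name": name, "children": children}


def leaf_names(tree: dict) -> List[str]:
    """Collect the leaf feature attributes, left to right."""
    if not tree["children"]:
        return [tree["name"]]
    return [leaf for child in tree["children"] for leaf in leaf_names(child)]


def count_non_root(tree: dict) -> int:
    """Count every node below the root, i.e. every feature attribute."""
    return sum(1 + count_non_root(child) for child in tree["children"])


application_tree = build_tree("application", RELATIONS)
```

Here `leaf_names(application_tree)` yields the four leaf attributes and `count_non_root(application_tree)` evaluates to 6: two intermediate attributes plus four leaves.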
Step 003: Obtain the association coefficient of each intermediate node in the generalized hierarchical tree according to a preset association coefficient algorithm, the batch of historical alarm logs, and the actual results.
Specifically, the intermediate node may represent nodes other than the root node and the leaf node in the generalized hierarchical tree, and the association coefficient of the intermediate node may represent an association coefficient between the intermediate node and the abnormal result.
Step 004: Take each generalized hierarchical tree, together with the association coefficients of its intermediate nodes, as the generalized hierarchical tree set.
To further increase the automation degree and efficiency of root cause analysis, in one embodiment of the present application, step 200 includes:
step 210: loading all alarm logs into the preset generalized hierarchical tree set to obtain the number of characteristic attributes of all alarm logs corresponding to each leaf node, wherein the preset generalized hierarchical tree set comprises a plurality of generalized hierarchical trees and association coefficients of each intermediate node of the generalized hierarchical trees, and each generalized hierarchical tree comprises the leaf nodes and the intermediate nodes.
Step 220: if the leaf nodes with the number of the characteristic attributes being greater than or equal to the number threshold exist, the leaf nodes are used as root cause nodes in the corresponding generalized hierarchical tree.
Specifically, a leaf node with the number of the characteristic attributes being greater than or equal to the number threshold may be used as a root cause node in the generalized hierarchical tree corresponding to the leaf node.
Step 230: if the generalized hierarchical tree with the number of the characteristic attributes corresponding to each leaf node being smaller than the number threshold exists, the root cause node in the generalized hierarchical tree is obtained according to the association coefficient of each intermediate node in the generalized hierarchical tree, the number of the characteristic attributes corresponding to each leaf node and the number threshold.
Specifically, if the number of the characteristic attributes corresponding to each leaf node in the generalized hierarchical tree is smaller than the number threshold, the root cause node in the generalized hierarchical tree is obtained according to the association coefficient of each intermediate node in the generalized hierarchical tree, the number of the characteristic attributes corresponding to each leaf node and the number threshold.
Step 240: Obtain the alarm root cause of the batch alarm logs according to the root cause nodes in each generalized hierarchical tree.
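The leaf-level part of these steps (Steps 210 and 220) can be sketched as follows, under the assumption that each alarm log is a dictionary from feature name to leaf feature attribute and that the threshold is passed in explicitly. The function names, log values, and the threshold of 3 are hypothetical, and the generalization of Step 230 is deliberately left out.

```python
from collections import Counter
from typing import Dict, List

FEATURES = ("service", "method", "application", "host")


def leaf_counts(logs: List[Dict[str, str]], feature: str) -> Counter:
    """Step 210: count how many alarm logs map to each leaf attribute of one feature."""
    return Counter(log[feature] for log in logs)


def leaf_root_causes(logs: List[Dict[str, str]], threshold: int) -> Dict[str, List[str]]:
    """Step 220: per feature, list the leaf attributes whose count reaches the threshold.

    An empty list means no leaf qualifies, so that tree would go on to
    the generalization of Step 230 (not implemented in this sketch).
    """
    result = {}
    for feature in FEATURES:
        counts = leaf_counts(logs, feature)
        result[feature] = sorted(attr for attr, n in counts.items() if n >= threshold)
    return result


# Hypothetical batch of five logs; all values are made up for illustration.
logs = [
    {"service": "s1", "method": "m1", "application": "F-AAA", "host": "10.0.0.1"},
    {"service": "s2", "method": "m2", "application": "F-AAA", "host": "10.0.0.2"},
    {"service": "s3", "method": "m3", "application": "F-AAA", "host": "10.0.0.3"},
    {"service": "s4", "method": "m4", "application": "F-BBB", "host": "10.0.0.4"},
    {"service": "s5", "method": "m5", "application": "F-BBB", "host": "10.0.0.5"},
]
hits = leaf_root_causes(logs, threshold=3)
```

In this made-up batch only the application tree has a leaf ("F-AAA") that reaches the threshold; the other three trees return empty lists and would need generalization.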
In order to further improve accuracy and efficiency of determining root cause nodes, in one embodiment of the present application, the obtaining root cause nodes in the generalized hierarchical tree according to the association coefficients of each intermediate node in the generalized hierarchical tree, the number of feature attributes corresponding to each leaf node, and the number threshold in step 230 includes:
Step 241: Take the node with the largest association coefficient value in the layer above the leaf nodes as the target node.
Specifically, the leaf nodes here are those of a generalized hierarchical tree in which the number of feature attributes corresponding to every leaf node is smaller than the number threshold. A generalized hierarchical tree may be divided into multiple layers, each containing multiple nodes.
Step 242: Perform a generalization procedure, which comprises: taking, as the number of feature attributes of the target node, the sum of the numbers of feature attributes of the nodes in the layer below the target node that are associated with it.
Specifically, performing the generalization procedure is equivalent to a clustering iteration. The nodes in the next layer that are associated with the target node may be the nodes directly contained by the target node. For example, if there are two associated nodes whose numbers of feature attributes are A and B respectively, the number of feature attributes of the target node is A + B.
Step 243: If the number of feature attributes of the target node is smaller than the number threshold, take the node with the largest association coefficient value in the layer above the target node as the new target node and perform the generalization procedure again, until the number of feature attributes of the target node is greater than or equal to the number threshold.
Step 244: Take the target node as the root cause node in the corresponding generalized hierarchical tree.
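Steps 241 to 244 can be sketched as a layer-by-layer climb. The sketch below is one possible reading, not the patent's implementation: it assumes all leaves sit at the same depth, picks the largest coefficient across the whole layer (rather than only among ancestors of the previous target), and never returns the feature root. The `Node` class, function names, tree shape, coefficients, and counts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    coefficient: float = 0.0  # association coefficient (meaningful for intermediate nodes)
    count: int = 0            # number of feature attributes; set on leaves after loading logs
    children: List["Node"] = field(default_factory=list)


def layers(root: Node) -> List[List[Node]]:
    """Group nodes by depth, root layer first."""
    result, current = [], [root]
    while current:
        result.append(current)
        current = [child for node in current for child in node.children]
    return result


def subtree_count(node: Node) -> int:
    """Generalization: a non-leaf node's count is the sum over its direct children."""
    if node.children:
        node.count = sum(subtree_count(child) for child in node.children)
    return node.count


def find_root_cause(root: Node, threshold: int) -> Optional[Node]:
    levels = layers(root)
    # Start one layer above the leaves and climb; layer 0 (the feature root) is skipped.
    for depth in range(len(levels) - 2, 0, -1):
        target = max(levels[depth], key=lambda n: n.coefficient)  # steps 241 / 243
        if subtree_count(target) >= threshold:                    # step 242
            return target                                         # step 244
    return None  # no intermediate node reached the threshold


# Hypothetical host tree: parks contain failure domains, which contain server nodes.
host_tree = Node("host", children=[
    Node("park_1", coefficient=0.1, children=[
        Node("domain_a", coefficient=0.0, children=[Node("ip_1", count=1), Node("ip_2", count=1)]),
    ]),
    Node("park_2", coefficient=1.0, children=[
        Node("domain_b", coefficient=1.0, children=[Node("ip_3", count=2), Node("ip_4", count=2)]),
        Node("domain_c", coefficient=0.1, children=[Node("ip_5", count=1)]),
    ]),
])
```

With a threshold of 4, `domain_b` already qualifies (2 + 2). With a threshold of 5 it falls short, so the climb moves up one layer and selects `park_2`, whose count is 4 + 1 = 5.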
In order to further explain the scheme, in a specific application example of the root cause analysis method of the alarm log, the method specifically comprises the following steps:
Step S1: Obtain the existing alarm logs, that is, the batch of historical alarm logs. As shown in Table 1, there may be 20 existing alarm logs, each comprising the features service, method, application, and host. Each feature has multiple feature attributes. For example, the application feature has four leaf feature attributes, "F-DSF", "F-BAM", "F-AAA", and "F-BBB", where "F-DSF" and "F-BAM" are platform-class (platform) applications and "F-AAA" and "F-BBB" are host-class (mainframe) applications, so the application feature has 6 feature attributes in total.
TABLE 1
Step S2: training with the pd.corr() algorithm to obtain the association coefficient between each feature attribute and the final result (the last column of Table 1, indicating whether the record is abnormal); the coefficient ranges from 0 to 1, where 0 means unrelated and 1 means 100% related. That is, applying the pd.corr() algorithm to the stock alarm logs in Table 1 and their final results yields the association coefficients of the feature attributes in Table 2. For example, in the application feature in Table 1, every log whose feature attribute is "F-AAA" has an abnormal result, so the association coefficient between "F-AAA" and the final result is 1; the method feature has 10 records with the attribute "query", of which 8 have an abnormal final result, so the association coefficient between "query" and the final result is 0.8. The association coefficients between the other feature attributes and the final result are obtained in the same way.
TABLE 2
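The two worked numbers in Step S2 (1 for "F-AAA", 0.8 for "query") behave like the share of abnormal records among the records carrying a given attribute. The sketch below reproduces only that arithmetic in plain Python. It is an assumption about how the figures were derived, not the pd.corr() routine named in the text (pandas' `DataFrame.corr` computes pairwise correlations between columns), and the function name and sample records are hypothetical.

```python
from collections import defaultdict


def association_coefficients(logs, result_key="abnormal"):
    """Share of abnormal records for every (feature, attribute) pair."""
    totals = defaultdict(int)
    abnormal = defaultdict(int)
    for log in logs:
        for feature, value in log.items():
            if feature == result_key:
                continue
            totals[(feature, value)] += 1
            if log[result_key]:
                abnormal[(feature, value)] += 1
    return {key: abnormal[key] / totals[key] for key in totals}


# Hypothetical records: 10 "query" logs, 8 of them abnormal, plus 10 normal logs.
logs = (
    [{"method": "query", "abnormal": True}] * 8
    + [{"method": "query", "abnormal": False}] * 2
    + [{"method": "request", "abnormal": False}] * 10
)
coeffs = association_coefficients(logs)
print(coeffs[("method", "query")])  # 0.8
```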
Step S3: according to the relations between the feature attributes in Tables 1 and 2, four generalized hierarchical trees are constructed: the service feature generalized hierarchical tree in fig. 2, where 0.5 and 0 are the association coefficients of non-leaf attributes; the method feature generalized hierarchical tree in fig. 3, where 0.2 and 0.8 are the association coefficients of non-leaf attributes; the application feature generalized hierarchical tree in fig. 4, where 0 and 1 are the association coefficients of non-leaf attributes; and the host feature generalized hierarchical tree in fig. 5, where 0, 0.1 and 1 are the association coefficients of non-leaf attributes.
For example, in the host feature generalized hierarchical tree, all IPs corresponding to the host feature are distributed across park 1 and park 2; each park is further divided into several fault domains, and under each fault domain are the IP nodes of the servers that actually run there.
Step S4: after the generalized hierarchical trees are built, the system starts to receive alarm logs in real time; for example, in a certain batch the system receives the 10 alarm logs shown in Table 3.
TABLE 3
Step S5: as shown in figs. 6 to 9, each feature attribute of the alarm logs is mapped into the four previously generated generalized hierarchical trees to obtain, for the 10 logs, the number of logs corresponding to each feature attribute of each generalized hierarchical tree. In figs. 2 and 6, bam, dtx, preapproval, consumerCredit, aml, comscore, service and ipublic in the non-root nodes of the service feature generalized hierarchical tree each represent a feature attribute; in figs. 3 and 7, response, "logMainTxStart", "logTxStart", request and "query" in the non-root nodes of the method feature generalized hierarchical tree each represent a feature attribute; in figs. 4 and 8, platform, "F-BAM", "F-DSF", mainframe, "F-AAA" and "F-BBB" in the non-root nodes of the application feature generalized hierarchical tree each represent a feature attribute; in figs. 5 and 9, park 1, park 2, fault domain A to fault domain C, "49.74.45", "49.74.52", "27.77.59", "27.77.37" and "27.77.43" in the non-root nodes of the host feature generalized hierarchical tree each represent a feature attribute.
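The mapping in Step S5 amounts to counting, for every node of a tree, how many logs of the batch fall under it. A minimal sketch for the application feature follows. The parent links come from Step S1, while the function name and the batch of 10 values are hypothetical, since Table 3 is not reproduced here.

```python
from collections import Counter

# Parent links of the application feature tree described in Step S1.
APPLICATION_PARENT = {
    "F-DSF": "platform",
    "F-BAM": "platform",
    "F-AAA": "mainframe",
    "F-BBB": "mainframe",
}


def count_with_ancestors(values, parent):
    """Count each log on its leaf attribute and on every ancestor of that leaf."""
    counts = Counter()
    for value in values:
        node = value
        while node is not None:
            counts[node] += 1
            node = parent.get(node)  # None once the top attribute is passed
    return counts


# Hypothetical batch of 10 application values.
batch = ["F-AAA"] * 6 + ["F-BAM"] * 3 + ["F-DSF"]
counts = count_with_ancestors(batch, APPLICATION_PARENT)
print(counts["F-AAA"], counts["mainframe"], counts["platform"])  # 6 6 4
```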
Step S6: if the preset number threshold is 4, it can be seen that each of the three generalized hierarchical trees for service, method and application contains a node holding 4 or more logs, so these three trees do not need to be generalized upward. In the host generalized hierarchical tree, the number of logs contained in each attribute is below the threshold, so iterative generalization is needed, starting from the attribute with the largest association coefficient in the host generalized hierarchical tree. As shown in fig. 9, among the attributes of this tree that can currently be generalized, "fault domain A" and "fault domain B" have the largest association coefficient, so these two nodes are selected for generalization.
Step S7: as shown in fig. 10, in the generalized hierarchical tree after generalization, the number of logs contained in the attribute "fault domain A" has reached the threshold, and the number of logs contained in the selected attribute of each of the four trees is now greater than or equal to the threshold. Therefore, the generalization process ends and the final generalization result of each generalized hierarchical tree is output, namely: the "query" method of the "bam.con surCredit.bam sur Credit service __1_0" service of the F-AAA application running in fault domain A may be abnormal.
As can be seen from the above description, in the root cause analysis method of the alarm log provided by this specific application example, the attribute generalized hierarchical trees, the association coefficient between each feature attribute and the final result, and the log number threshold at which clustering ends are set up according to the relevant business and technical characteristics. After the system is started, it can continuously analyze alarm logs through the whole model without manual intervention and output root cause analysis results, with efficiency and real-time performance much higher than those of traditional manual analysis. Once constructed, the attribute generalized hierarchical trees and the association coefficients can be reused indefinitely, so the error cause does not have to be analyzed manually from the alarm logs each time, which greatly reduces the labor cost of root cause analysis. The model can be used within one type of application; for a different type of application, only the corresponding attribute generalized hierarchical trees and association coefficients need to be constructed according to its business and technical characteristics, while the other devices can be used in all scenarios, which facilitates reuse and popularization of the model.
In another application example of the present application, the root cause analysis method of the alarm log includes: extracting features from the stock alarm logs, training with the pd.corr() algorithm to obtain the association coefficient between each feature attribute and the final result, and constructing generalized hierarchical trees according to the feature attributes. Each feature attribute of each alarm log has a unique topmost generalization result, and the generalized hierarchical trees of all features constitute the generalized hierarchical tree set. When the system receives alarm logs in the production environment, it first extracts the features of the alarm logs and screens out the valid feature attributes, then loads them into the generalized hierarchical tree set and merges alarm logs of the same type. It then determines whether the number of alarm logs contained in any clustering result is greater than a threshold (generally set to one fifth of the total number of logs). If not, it selects the feature attribute whose association coefficient currently has the largest absolute value, replaces that feature attribute's value in all logs with the value one layer above it, and continues merging logs of the same type until the number of logs contained in some clustering result is greater than the threshold. That clustering result is then output as the root cause attributes of all alarm logs it contains, so that the efficiency of root cause analysis of alarm logs can be improved while its reliability is ensured.
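The merge-and-generalize loop described in this application example can be sketched as follows. This is an illustrative reading rather than the patented implementation: the function name, the `parents` and `coeffs` structures, the sample data, and the choice to stop when nothing can be generalized further are all assumptions, and the input is assumed to be non-empty.

```python
from collections import Counter


def cluster_root_cause(logs, features, parents, coeffs, threshold=None):
    """Merge identical logs; generalize one attribute at a time until a cluster is large enough."""
    logs = [dict(log) for log in logs]          # work on copies
    if threshold is None:
        threshold = len(logs) / 5               # "one fifth of the total" default
    while True:
        clusters = Counter(tuple(log[f] for f in features) for log in logs)
        best, size = clusters.most_common(1)[0]
        if size > threshold:
            return dict(zip(features, best)), size
        # Attribute values that still have a layer above them.
        candidates = {(f, log[f]) for log in logs for f in features
                      if log[f] in parents.get(f, {})}
        if not candidates:
            return dict(zip(features, best)), size
        feature, value = max(candidates, key=lambda fv: abs(coeffs.get(fv, 0.0)))
        for log in logs:
            if log[feature] == value:
                log[feature] = parents[feature][value]


# Hypothetical batch, tree links and coefficients.
logs = (
    [{"method": "query", "host": "ip-1"}] * 2
    + [{"method": "query", "host": "ip-2"}] * 2
    + [{"method": "request", "host": "ip-3"}] * 2
    + [{"method": "response", "host": "ip-3"}] * 2
)
parents = {"host": {"ip-1": "fault domain A", "ip-2": "fault domain A",
                    "ip-3": "fault domain B"}}
coeffs = {("host", "ip-1"): 1.0, ("host", "ip-2"): 0.9, ("host", "ip-3"): 0.1}

result = cluster_root_cause(logs, ["method", "host"], parents, coeffs, threshold=3)
print(result)  # ({'method': 'query', 'host': 'fault domain A'}, 4)
```

In this sample no cluster exceeds 3 logs at first; after "ip-1" and then "ip-2" are replaced by "fault domain A", four logs merge into one cluster and the loop ends.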
In order to improve the root cause analysis efficiency of the alarm log on the basis of ensuring the reliability of the root cause analysis of the alarm log in terms of software, the application provides an embodiment of a root cause analysis system of the alarm log for realizing all or part of the content in the root cause analysis method of the alarm log, referring to fig. 11, wherein the root cause analysis system of the alarm log specifically comprises the following contents:
The receiving module 10 is used for receiving batch alarm logs.
The root cause analysis module 20 is configured to obtain an alarm root cause of the batch alarm log according to the batch alarm log and a preset generalized hierarchical tree set; the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs and actual results corresponding to the historical alarm logs.
In one embodiment of the present application, the root cause analysis system of the alarm log further includes:
The acquisition module is used for acquiring batch historical alarm logs and the actual result corresponding to each historical alarm log, wherein the actual result is either an abnormal result or a normal result.
The generating module is used for generating a plurality of generalized hierarchical trees according to the batch historical alarm logs.
The association coefficient obtaining module is used for obtaining the association coefficient of each intermediate node in the generalized hierarchical trees according to a preset association coefficient algorithm, the batch historical alarm logs and the actual results.
The acquisition module is used for taking each generalized hierarchical tree and the association coefficients of its intermediate nodes as the generalized hierarchical tree set.
In one embodiment of the present application, the root cause analysis module includes:
The loading unit is used for loading all the alarm logs into the preset generalized hierarchical tree set to obtain, for each leaf node, the number of feature attributes of the alarm logs corresponding to that leaf node; the preset generalized hierarchical tree set comprises a plurality of generalized hierarchical trees and the association coefficients of the intermediate nodes of each generalized hierarchical tree, and each generalized hierarchical tree comprises leaf nodes and intermediate nodes.
The first judging unit is used for taking a leaf node as the root cause node in the corresponding generalized hierarchical tree if the number of feature attributes of that leaf node is greater than or equal to the number threshold.
The second judging unit is used for, if there is a generalized hierarchical tree in which the number of feature attributes corresponding to every leaf node is smaller than the number threshold, obtaining the root cause node in that generalized hierarchical tree according to the association coefficients of its intermediate nodes, the numbers of feature attributes corresponding to its leaf nodes, and the number threshold.
The root cause analysis unit is used for obtaining the alarm root cause of the batch alarm logs according to the root cause nodes in the generalized hierarchical trees.
In one embodiment of the present application, the second judging unit includes:
The node determining subunit is used for taking the node with the largest association coefficient among the nodes in the layer above the leaf nodes as the target node.
The execution subunit is used for executing a generalization procedure, the generalization procedure comprising: taking the sum of the numbers of feature attributes of the nodes that are in the layer below the target node and associated with the target node as the number of feature attributes of the target node.
The loop subunit is used for, if the number of feature attributes of the target node is smaller than the number threshold, taking the node with the largest association coefficient among the nodes in the layer above the target node as the new target node, and executing the generalization procedure again, until the number of feature attributes of the target node is greater than or equal to the number threshold.
The root cause node obtaining subunit is used for taking the target node as the root cause node in the corresponding generalized hierarchical tree.
The embodiment of the root cause analysis system of the alarm log provided in the present disclosure may be specifically used to execute the processing flow of the embodiment of the root cause analysis method of the alarm log, and the functions thereof are not described herein again, and may refer to the detailed description of the embodiment of the root cause analysis method of the alarm log.
To further illustrate the present solution, the present application provides an application example of a root cause analysis system for an alarm log, as shown in fig. 12, where the system specifically includes:
Log access device 01: for receiving an alarm log.
Feature extraction device 02: for extracting the features of the alarm log and screening out valid feature attributes.
Generalized hierarchical tree loading device 03: for loading the generalized hierarchical tree set.
Alarm log merging device 04: for merging alarm logs of the same type, counting the number of alarm logs of each type, and judging whether upward clustering needs to continue.
Cluster attribute locating device 05: for locating the feature attribute to be clustered in the next step. Specifically, the value $F_i(v)$ is first calculated for each feature attribute $v$ of the feature $A_i$:
$F_i(v) = \text{SELECT } s \text{ FROM } T \text{ WHERE } A_i = v$, which queries the association coefficient $s$ of the feature attribute $v$ in the feature $A_i$.
$F_i = \max\{F_i(v) \mid v \in \mathrm{Dom}(A_i)\}$, which selects the feature attribute $v$ with the largest $F_i(v)$ in the feature $A_i$ as the feature attribute to be clustered in the next step.
Attribute replacement apparatus 06: for replacing the characteristic attribute of the alarm log with the characteristic attribute of the layer above it.
Clustering result output device 07: for outputting the final clustering result.
Generalized hierarchical tree construction device 08: for constructing generalized hierarchical trees according to the feature attributes of the alarm log. Referring to fig. 13, in one example, the network failure types in the alarm log include network packet loss and network port failure, and network port failure can be subdivided into local network port failure and remote network port failure; based on this, a network failure feature generalized hierarchical tree is constructed.
Fig. 14 is a schematic structural diagram of the cluster attribute locating device 05. As shown in fig. 14, the cluster attribute locating device 05 includes a feature attribute association coefficient query unit 51 and a feature attribute association coefficient comparison unit 52, wherein:
Feature attribute association coefficient query unit 51: for querying the association coefficients of the feature attributes.
Feature attribute association coefficient comparison unit 52: for comparing the number of logs contained in each feature attribute.
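The lookup Fi(v) and the selection of the maximum Fi performed by the cluster attribute locating device 05 can be sketched in a few lines. The table T is modeled here as a plain dictionary keyed by (feature, attribute); that representation, the function names and the "logTxStart" coefficient are assumptions for illustration, while the 0.8 for "query" is taken from the earlier worked example.

```python
def coefficient(table, feature, value):
    """Fi(v): look up the association coefficient s of attribute v in feature Ai."""
    return table[(feature, value)]


def next_attribute(table, feature, domain):
    """Return the attribute in the feature's domain with the largest Fi(v), and that value."""
    best = max(domain, key=lambda v: coefficient(table, feature, v))
    return best, coefficient(table, feature, best)


# Hypothetical coefficient table T.
T = {("method", "query"): 0.8, ("method", "logTxStart"): 0.2}

print(next_attribute(T, "method", ["query", "logTxStart"]))  # ('query', 0.8)
```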
Fig. 15 is a schematic structural diagram of the generalized hierarchical tree construction device 08. As shown in fig. 15, the generalized hierarchical tree construction device 08 includes: a generalized hierarchical tree construction unit 81, a generalized hierarchical tree integration unit 82, and a feature attribute association coefficient generation unit 83, wherein:
Generalized hierarchical tree construction unit 81: for constructing the generalized hierarchical tree of a feature.
Generalized hierarchical tree integration unit 82: for integrating the different generalized hierarchical trees in the log.
Feature attribute association coefficient generation unit 83: for generating the association coefficient corresponding to each feature attribute in the generalized hierarchical tree.
As can be seen from the above description, the root cause analysis method and system for the alarm log provided by the application can improve the root cause analysis efficiency of the alarm log on the basis of ensuring the reliability of the root cause analysis of the alarm log; specifically, the automation degree of root cause analysis can be improved, multiplexing of generalized hierarchical trees and association coefficients can be realized, the universality of the application scene of the root cause analysis can be improved, the labor cost can be saved, and the efficiency and the instantaneity of the root cause analysis can be improved.
In order to improve the root cause analysis efficiency of the alarm log on the basis of ensuring the reliability of the root cause analysis of the alarm log from the hardware level, the application provides an embodiment of an electronic device for realizing all or part of the content in the root cause analysis method of the alarm log, which specifically comprises the following contents:
A processor (processor), a memory (memory), a communication interface (Communications Interface), and a bus; the processor, the memory and the communication interface complete communication with each other through the bus; the communication interface is used for realizing information transmission between the root cause analysis system of the alarm log and related equipment such as a user terminal and the like; the electronic device may be a desktop computer, a tablet computer, a mobile terminal, etc., and the embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to an embodiment of the root cause analysis method for implementing the alarm log and an embodiment of the root cause analysis system for implementing the alarm log, and the contents thereof are incorporated herein, and are not repeated here.
Fig. 16 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 16, the electronic device 9600 may include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Note that fig. 16 is exemplary; other types of structures may also be used in addition to or in place of this structure to implement telecommunications functions or other functions.
In one or more embodiments of the application, the root cause analysis function of the alarm log may be integrated into the central processor 9100. The central processor 9100 may be configured to perform the following control:
Step 100: receiving batch alarm logs.
Step 200: obtaining an alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set; the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs and actual results corresponding to the historical alarm logs.
From the above description, it can be seen that the electronic device provided by the embodiment of the application can improve the efficiency of root cause analysis of the alarm log on the basis of ensuring the reliability of root cause analysis of the alarm log.
In another embodiment, the root cause analysis system of the alarm log may be configured separately from the central processor 9100, for example, the root cause analysis system of the alarm log may be configured as a chip connected to the central processor 9100, and the root cause analysis function of the alarm log is implemented by the control of the central processor.
As shown in fig. 16, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 need not include all of the components shown in fig. 16; in addition, the electronic device 9600 may further include components not shown in fig. 16, and reference may be made to the related art.
As shown in fig. 16, the central processor 9100, sometimes also referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, which central processor 9100 receives inputs and controls the operation of the various components of the electronic device 9600.
The memory 9140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store the failure-related information described above as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. The power supply 9170 is used to provide power to the electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 9140 may be a solid-state memory such as a read-only memory (ROM), a random access memory (RAM), a SIM card, etc. It may also be a memory that retains information even when powered down and that can be selectively erased and provided with further data, an example of which is sometimes referred to as an EPROM or the like. The memory 9140 may also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, the application/function storage portion 9142 storing application programs and function programs or a flow for executing operations of the electronic device 9600 by the central processor 9100.
The memory 9140 may also include a data store 9143, the data store 9143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. A communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, as in the case of conventional mobile communication terminals.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and to receive audio input from the microphone 9132 to implement usual telecommunications functions. The audio processor 9130 can include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100 so that sound can be recorded locally through the microphone 9132 and sound stored locally can be played through the speaker 9131.
As can be seen from the above description, the electronic device provided by the embodiment of the application can improve the efficiency of root cause analysis of the alarm log on the basis of ensuring the reliability of root cause analysis of the alarm log.
An embodiment of the present application also provides a computer-readable storage medium capable of implementing all the steps in the root cause analysis method of an alarm log in the above embodiment, the computer-readable storage medium storing thereon a computer program which, when executed by a processor, implements all the steps in the root cause analysis method of an alarm log in the above embodiment, for example, the processor implements the following steps when executing the computer program:
Step 100: receiving batch alarm logs.
Step 200: obtaining an alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set; the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs and actual results corresponding to the historical alarm logs.
As can be seen from the above description, the computer readable storage medium provided by the embodiments of the present application can improve the efficiency of root cause analysis of an alarm log on the basis of ensuring the reliability of root cause analysis of an alarm log.
The method embodiments of the present application are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. For related details, see the description of the method embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present application have been described herein with reference to specific examples, which are provided only to help understand the method and core ideas of the present application; meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the application scope. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (8)

1. A root cause analysis method of an alarm log, comprising:
Receiving batch alarm logs;
Obtaining an alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set;
The preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, a batch of historical alarm logs and actual results corresponding to the historical alarm logs;
before the alarm root cause of the batch alarm log is obtained according to the batch alarm log and the preset generalized hierarchical tree set, the method further comprises the following steps:
Acquiring batch historical alarm logs and actual results corresponding to the historical alarm logs respectively, wherein the actual results comprise: abnormal results or normal results;
generating a plurality of generalized hierarchical trees according to the batch history alarm logs;
obtaining the association coefficient of each intermediate node in the generalized hierarchical tree according to a preset association coefficient algorithm, a batch history alarm log and an actual result;
Taking each generalized hierarchical tree and the association coefficient of each intermediate node as the generalized hierarchical tree set;
generating a plurality of generalized hierarchical trees according to the batch history alarm logs, including:
generating a plurality of generalized hierarchical trees according to the batch historical alarm logs and a preset feature attribute relation table; the preset feature attribute relation table comprises the direct correspondences between features and feature attributes and among feature attributes;
The root node in each generalized hierarchical tree represents a feature of the alarm log, the non-root nodes represent feature attributes, and different generalized hierarchical trees correspond to different features; the non-root nodes are the nodes other than the root node in the generalized hierarchical tree; the intermediate nodes are the nodes other than the root node and the leaf nodes in the generalized hierarchical tree, and the association coefficient of an intermediate node represents the association coefficient between that intermediate node and an abnormal result;
The preset association coefficient algorithm comprises: the pd.corr() algorithm.
2. The root cause analysis method of an alarm log according to claim 1, wherein obtaining the alarm root cause of the batch alarm logs according to the batch alarm logs and the preset generalized hierarchical tree set comprises:
loading each alarm log into the preset generalized hierarchical tree set to obtain, for each leaf node, the number of characteristic attributes of the alarm logs corresponding to that leaf node, wherein the preset generalized hierarchical tree set comprises a plurality of generalized hierarchical trees and the association coefficient of each intermediate node of the generalized hierarchical trees, and each generalized hierarchical tree comprises leaf nodes and intermediate nodes;
if a leaf node exists whose number of characteristic attributes is greater than or equal to a number threshold, taking that leaf node as the root cause node in the corresponding generalized hierarchical tree;
if a generalized hierarchical tree exists in which the number of characteristic attributes corresponding to every leaf node is smaller than the number threshold, obtaining the root cause node in that generalized hierarchical tree according to the association coefficient of each intermediate node in the generalized hierarchical tree, the number of characteristic attributes corresponding to each leaf node, and the number threshold;
and obtaining the alarm root cause of the batch alarm logs according to the root cause node in each generalized hierarchical tree.
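The leaf-level part of this claim can be illustrated with a short sketch. It assumes each alarm log is a dictionary mapping a characteristic (one per tree) to its leaf-level characteristic attribute; the function names, the sample logs, and the threshold value are hypothetical and chosen only for illustration.

```python
from collections import Counter

def count_leaf_attributes(logs, feature):
    """Count how many logs carry each leaf-level attribute of one characteristic."""
    return Counter(log[feature] for log in logs if feature in log)

def leaf_root_causes(leaf_counts, threshold):
    """Leaves whose count reaches the threshold; an empty list means
    no leaf qualifies and the tree has to be generalized upward."""
    return sorted(leaf for leaf, n in leaf_counts.items() if n >= threshold)

# Hypothetical batch of alarm logs.
logs = [
    {"host": "srv-01", "app": "pay"},
    {"host": "srv-01", "app": "login"},
    {"host": "srv-01", "app": "query"},
    {"host": "srv-02", "app": "pay"},
]
THRESHOLD = 3  # hypothetical number threshold

result = {}
for feature in ("host", "app"):
    counts = count_leaf_attributes(logs, feature)
    result[feature] = leaf_root_causes(counts, THRESHOLD)
```

Here the `host` tree yields the leaf `srv-01` directly, while no leaf of the `app` tree reaches the threshold, so that tree would fall through to the coefficient-based branch of the claim.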
3. The root cause analysis method of an alarm log according to claim 2, wherein obtaining the root cause node in the generalized hierarchical tree according to the association coefficient of each intermediate node in the generalized hierarchical tree, the number of characteristic attributes corresponding to each leaf node, and the number threshold comprises:
taking, as a target node, the node with the largest association coefficient among the nodes in the layer immediately above the leaf nodes;
performing a generalization procedure, the generalization procedure comprising: taking the sum of the numbers of characteristic attributes of the nodes in the layer immediately below the target node that are associated with the target node as the number of characteristic attributes of the target node;
if the number of characteristic attributes of the target node is smaller than the number threshold, taking the node with the largest association coefficient among the nodes in the layer immediately above the target node as the target node, and performing the generalization procedure again, until the number of characteristic attributes of the target node is greater than or equal to the number threshold;
and taking the target node as the root cause node in the corresponding generalized hierarchical tree.
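The generalization loop of this claim can be sketched as follows. The sketch reads "the node with the largest association coefficient" as the maximum over all nodes of the layer, and it returns `None` when no intermediate layer reaches the threshold; both are assumptions, since the claim does not spell out either case. The tree, coefficients, and counts are hypothetical sample data.

```python
def subtree_count(node, children, leaf_counts):
    """Number of characteristic attributes under a node: its own leaf count,
    or the sum over the associated nodes one layer below."""
    kids = children.get(node, [])
    if not kids:
        return leaf_counts.get(node, 0)
    return sum(subtree_count(k, children, leaf_counts) for k in kids)

def generalize(layers, children, coef, leaf_counts, threshold):
    """layers lists the intermediate layers, starting just above the leaves."""
    for layer in layers:
        # Target node: largest association coefficient in this layer.
        target = max(layer, key=lambda n: coef[n])
        if subtree_count(target, children, leaf_counts) >= threshold:
            return target
    return None  # assumption: no intermediate node reaches the threshold

# Hypothetical tree for one characteristic (root omitted).
children = {
    "online": ["payments", "accounts"],
    "counter": ["teller"],
    "payments": ["pay", "refund"],
    "accounts": ["login", "query"],
    "teller": ["cash"],
}
layers = [["payments", "accounts", "teller"], ["online", "counter"]]
coef = {"payments": 0.6, "accounts": 0.4, "teller": 0.1,
        "online": 0.7, "counter": 0.2}
leaf_counts = {"pay": 2, "login": 1, "query": 1}

root_cause = generalize(layers, children, coef, leaf_counts, threshold=3)
```

With the sample data, `payments` covers only 2 logs, so the search moves up one layer and settles on `online`, which covers 4.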
4. A root cause analysis system for an alarm log, comprising:
a receiving module, configured to receive batch alarm logs;
a root cause analysis module, configured to obtain the alarm root cause of the batch alarm logs according to the batch alarm logs and a preset generalized hierarchical tree set;
wherein the preset generalized hierarchical tree set is obtained according to a preset association coefficient algorithm, batch historical alarm logs and the actual results corresponding to the historical alarm logs;
an acquisition module, configured to acquire the batch historical alarm logs and the actual result corresponding to each historical alarm log, wherein each actual result is either an abnormal result or a normal result;
a generation module, configured to generate a plurality of generalized hierarchical trees according to the batch historical alarm logs;
an association coefficient obtaining module, configured to obtain the association coefficient of each intermediate node in the generalized hierarchical trees according to the preset association coefficient algorithm, the batch historical alarm logs and the actual results;
an obtaining module, configured to take the generalized hierarchical trees and the association coefficients of the intermediate nodes as the generalized hierarchical tree set;
wherein generating a plurality of generalized hierarchical trees according to the batch historical alarm logs comprises:
generating a plurality of generalized hierarchical trees according to the batch historical alarm logs and a preset characteristic attribute relation table; the preset characteristic attribute relation table comprises the direct correspondences between characteristics and characteristic attributes and among the characteristic attributes;
the root node of each generalized hierarchical tree represents a characteristic of the alarm log, the non-root nodes represent characteristic attributes, and different generalized hierarchical trees correspond to different characteristics; the non-root nodes are the nodes of the generalized hierarchical tree other than the root node; the intermediate nodes are the nodes of the generalized hierarchical tree other than the root node and the leaf nodes, and the association coefficient of an intermediate node represents the association between that intermediate node and an abnormal result;
the preset association coefficient algorithm comprises the pd.corr() algorithm.
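One way the generation module might turn a relation table into trees is sketched below. It assumes the preset characteristic attribute relation table is a list of (parent, child) pairs and that the characteristics to build trees for are the keys seen in the historical alarm logs; the table format, function names, and sample rows are hypothetical.

```python
from collections import defaultdict

def build_trees(relations, features):
    """Group (parent, child) relation rows into one children map per characteristic."""
    children = defaultdict(list)
    for parent, child in relations:
        children[parent].append(child)
    trees = {}
    for feature in features:
        tree, stack = {}, [feature]
        while stack:  # walk everything reachable from the root
            node = stack.pop()
            tree[node] = list(children.get(node, []))
            stack.extend(tree[node])
        trees[feature] = tree
    return trees

def classify(tree, root):
    """Split a tree's nodes into leaf nodes and intermediate nodes."""
    leaves = {n for n, kids in tree.items() if not kids}
    intermediate = set(tree) - leaves - {root}
    return leaves, intermediate

# Hypothetical relation table: direct parent/child correspondences.
relations = [
    ("host", "rack-1"), ("rack-1", "srv-01"), ("rack-1", "srv-02"),
    ("app", "online"), ("online", "pay"), ("online", "login"),
]
# Hypothetical historical logs; their keys name the characteristics.
history = [{"host": "srv-01", "app": "pay"}]
features = sorted({key for log in history for key in log})

trees = build_trees(relations, features)
leaves, intermediate = classify(trees["host"], "host")
```

Each characteristic becomes the root of its own tree, which matches the requirement that different trees correspond to different characteristics; only the intermediate nodes returned by `classify` would carry association coefficients.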
5. The root cause analysis system of an alarm log according to claim 4, wherein the root cause analysis module comprises:
a loading unit, configured to load each alarm log into the preset generalized hierarchical tree set to obtain, for each leaf node, the number of characteristic attributes of the alarm logs corresponding to that leaf node, wherein the preset generalized hierarchical tree set comprises a plurality of generalized hierarchical trees and the association coefficient of each intermediate node of the generalized hierarchical trees, and each generalized hierarchical tree comprises leaf nodes and intermediate nodes;
a first judging unit, configured to, if a leaf node exists whose number of characteristic attributes is greater than or equal to a number threshold, take that leaf node as the root cause node in the corresponding generalized hierarchical tree;
a second judging unit, configured to, if a generalized hierarchical tree exists in which the number of characteristic attributes corresponding to every leaf node is smaller than the number threshold, obtain the root cause node in that generalized hierarchical tree according to the association coefficient of each intermediate node in the generalized hierarchical tree, the number of characteristic attributes corresponding to each leaf node, and the number threshold;
and a root cause analysis unit, configured to obtain the alarm root cause of the batch alarm logs according to the root cause node in each generalized hierarchical tree.
6. The root cause analysis system of an alarm log according to claim 5, wherein the second judging unit comprises:
a node determining subunit, configured to take, as a target node, the node with the largest association coefficient among the nodes in the layer immediately above the leaf nodes;
an execution subunit, configured to perform a generalization procedure, the generalization procedure comprising: taking the sum of the numbers of characteristic attributes of the nodes in the layer immediately below the target node that are associated with the target node as the number of characteristic attributes of the target node;
a loop subunit, configured to, if the number of characteristic attributes of the target node is smaller than the number threshold, take the node with the largest association coefficient among the nodes in the layer immediately above the target node as the target node and perform the generalization procedure again, until the number of characteristic attributes of the target node is greater than or equal to the number threshold;
and a root cause node obtaining subunit, configured to take the target node as the root cause node in the corresponding generalized hierarchical tree.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the root cause analysis method of an alarm log of any one of claims 1 to 3.
8. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the root cause analysis method of an alarm log of any one of claims 1 to 3.
CN202110126298.0A 2021-01-29 2021-01-29 Root cause analysis method and system of alarm log Active CN112799929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110126298.0A CN112799929B (en) 2021-01-29 2021-01-29 Root cause analysis method and system of alarm log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110126298.0A CN112799929B (en) 2021-01-29 2021-01-29 Root cause analysis method and system of alarm log

Publications (2)

Publication Number Publication Date
CN112799929A (en) 2021-05-14
CN112799929B (en) 2024-06-28

Family

ID=75812848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110126298.0A Active CN112799929B (en) 2021-01-29 2021-01-29 Root cause analysis method and system of alarm log

Country Status (1)

Country Link
CN (1) CN112799929B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656206B (en) * 2021-07-23 2024-10-18 东软集团股份有限公司 Error log processing method, device and equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110609759A (en) * 2018-06-15 2019-12-24 华为技术有限公司 Fault root cause analysis method and device
CN111159127A (en) * 2018-11-07 2020-05-15 中移(苏州)软件技术有限公司 A method and device for log analysis based on Apriori algorithm

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US11586972B2 (en) * 2018-11-19 2023-02-21 International Business Machines Corporation Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs
CN111726248A (en) * 2020-05-29 2020-09-29 北京宝兰德软件股份有限公司 Alarm root cause positioning method and device
CN112052151B (en) * 2020-10-09 2022-02-18 腾讯科技(深圳)有限公司 Fault root cause analysis method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112799929A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN111782470B (en) Distributed container log data processing method and device
CN112163946A (en) Accounting processing method and device based on distributed transaction system
CN111047430A (en) Accounting information processing method and device
CN112866268B (en) Message processing method and system
CN113392158A (en) Service data processing method and device and data center
WO2021057064A1 (en) Data interaction conversion method and apparatus based on artificial intelligence, device, and medium
CN112181678A (en) Service data processing method, device and system, storage medium and electronic device
CN112784112A (en) Message checking method and device
CN109840159A (en) Abnormality processing project management method, apparatus, computer installation and storage medium
WO2022123490A1 (en) Systems and methods for managing connections in scalable clusters
CN105335466A (en) Audio data retrieval method and apparatus
CN112799929B (en) Root cause analysis method and system of alarm log
CN116991929A (en) Micro-service system based on big hospital data
CN112417018B (en) Data sharing method and device
CN113672488A (en) Log text processing method and device
US20210141791A1 (en) Method and system for generating a hybrid data model
CN112396511A (en) Distributed wind control variable data processing method, device and system
CN111930690A (en) File generation method and device
US11916853B2 (en) Group type identification method and apparatus, computer device, and medium
CN117743721A (en) Data processing method and device
CN113190236B (en) HQL script verification method and device
CN111047362A (en) Statistical management method and system for use activity of intelligent sound box
CN110222286A (en) Information acquisition method, device, terminal and computer readable storage medium
CN113190460B (en) Automatic test case generation method and device
CN114222028A (en) Speech recognition method, apparatus, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant