CN113268370B - Root cause alarm analysis method, system, equipment and storage medium

Info

Publication number: CN113268370B
Application number: CN202110513431.8A
Authority: CN
Inventors: 杨树森; 田晓慧; 杨煜乾; 薛江; 孙建永
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2021-05-11
Filing date: 2021-05-11
Publication date: 2023-05-23
Anticipated expiration: 2041-05-11
Also published as: CN113268370A

Abstract

The invention discloses a root cause alarm analysis method, system, device and storage medium. The method comprises: preprocessing the alarm data; mining the alarm data with the SR-ASM algorithm to obtain alarm sequences; converting the alarm sequences into an alarm relationship graph; and, according to the alarm relationship graph and the features of the alarms themselves, identifying the root cause alarm with the root cause alarm identification model SGC-RAI and recommending the identified root cause alarm. The method, system, device and storage medium can identify root cause alarms efficiently and have low operation and maintenance costs.

Description

A root cause alarm analysis method, system, device and storage medium

Technical Field

The present invention belongs to the field of alarm analysis and relates to a root cause alarm analysis method, system, device and storage medium.

Background Art

Tens of thousands of alarm logs constitute an alarm flood. Some of these alarms occur frequently and are reported repeatedly, and some have complex correlations with one another. If the alarm logs are pushed directly to administrators without being processed and compressed, each alarm is handled very inefficiently, and the quality of handling depends on the administrators' experience. The most important task of network alarm analysis is therefore to use algorithms to greatly reduce the number of alarms presented to the operation and maintenance personnel of the management center, pushing only the root cause alarms. In previous root cause analysis research, much work studied the correlations and causal relationships among alarms and, on that basis, performed alarm compression, root cause localization and alarm prediction. These methods fall into two main categories: alarm correlation analysis and root cause analysis. Alarm correlation analysis is implemented mainly with data mining algorithms such as clustering, frequent item mining and temporal relationship mining. Alarm root cause analysis has many different methods, of which rule-based root cause analysis systems are the most commonly used in practice. The rules in such systems distill expert experience and are few but precise. However, as network systems grow increasingly complex, formulating, maintaining and updating the rules is costly. With the development of big data and data mining technology, some scholars have proposed the concept of "intelligent operation and maintenance", applying machine learning algorithms to root cause and fault analysis tasks, which improves the accuracy of root cause localization and effectively reduces operation and maintenance costs. Such methods are the main trend for future development, but no concrete operating procedure has yet been given.

Summary of the Invention

The purpose of the present invention is to overcome the shortcomings of the prior art described above and to provide a root cause alarm analysis method, system, device and storage medium that can identify root cause alarms efficiently and at low operation and maintenance cost.

To achieve the above purpose, the root cause alarm analysis method of the present invention comprises:

preprocessing the alarm data, mining the alarm data with the SR-ASM algorithm to obtain alarm sequences, and converting the alarm sequences into an alarm relationship graph;

according to the alarm relationship graph, identifying the root cause alarm with the root cause alarm identification model SGC-RAI, and recommending the identified root cause alarm.

The method further comprises: further determining the direction of the bidirectional edges in the alarm relationship graph, and assigning a weight to each pair of relationships in the graph according to confidence and lift.
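The two statistics named above can be illustrated with their standard association-rule definitions. The sketch below is only an illustration under stated assumptions: the function names `occurs_in_order` and `confidence_and_lift` are hypothetical, an ordered pair A→B is counted when B occurs anywhere after the first A in a sequence, and the way the patent combines confidence and lift into a single edge weight is not reproduced in this text, so it is not attempted here.

```python
def occurs_in_order(seq, a, b):
    """True if alarm `a` appears in `seq` and `b` appears at a later position."""
    if a not in seq:
        return False
    first_a = seq.index(a)
    return b in seq[first_a + 1:]


def confidence_and_lift(sequences, a, b):
    """Standard confidence and lift of the ordered relation a -> b.

    sequences: list of alarm sequences (lists of alarm names).
    Returns (confidence, lift); both are 0.0 when undefined.
    """
    n = len(sequences)
    count_a = sum(1 for s in sequences if a in s)
    count_b = sum(1 for s in sequences if b in s)
    count_ab = sum(1 for s in sequences if occurs_in_order(s, a, b))
    if n == 0 or count_a == 0 or count_b == 0:
        return 0.0, 0.0
    confidence = count_ab / count_a      # P(b follows a | a)
    lift = confidence / (count_b / n)    # confidence relative to P(b)
    return confidence, lift
```

A lift above 1 suggests that B follows A more often than B's overall frequency would predict, which is why lift complements confidence when alarm frequencies are very uneven.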

The alarm data is preprocessed by deduplicating and sorting the alarm data and grouping it by network element and time window.
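A minimal sketch of this preprocessing step follows. It assumes details the text does not specify: records are `(timestamp_seconds, network_element, alarm_name)` tuples, time windows are fixed non-overlapping intervals of `window_seconds`, and a repeated alarm name within one window is kept only at its first occurrence. The function name `build_sequences` is hypothetical.

```python
from collections import defaultdict


def build_sequences(records, window_seconds=300):
    """Turn raw alarm records into one alarm sequence per (element, window).

    records: iterable of (timestamp_seconds, network_element, alarm_name).
    """
    groups = defaultdict(list)
    # set() drops exact duplicate records; sorted() orders them by time.
    for ts, element, name in sorted(set(records)):
        window = int(ts // window_seconds)
        groups[(element, window)].append(name)

    sequences = []
    for key in sorted(groups):
        seq = []
        for name in groups[key]:
            if name not in seq:  # keep only the first occurrence in a window
                seq.append(name)
        sequences.append(seq)
    return sequences
```

The resulting list of sequences plays the role of the alarm sequence database that the mining step consumes.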

According to the alarm relationship graph and the alarm features, root cause alarm analysis with the root cause alarm identification model SGC-RAI proceeds as follows:

For a group of alarm sequences whose root cause alarms are to be analyzed, the text features of the alarms are vectorized to obtain each alarm's own features, and the relational features of the alarms are extracted from the alarm relationship graph. The root cause alarm identification model SGC-RAI is trained in a supervised manner on the alarms' own features and relational features, and the trained model is then used to identify the root cause alarms.

The alarm group is deduplicated, and the text of all alarms is then used as a corpus to train a GloVe model and a TF-IDF model. For an alarm log, the tokens of the alarm name are input into the trained GloVe model to obtain their word vectors; the trained TF-IDF model is then used to weight the word vectors, giving the features of the alarm itself; and the association information of the alarm is obtained from the alarm relationship graph.
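The weighting idea can be sketched as follows. This is an illustration, not the patent's formula (the exact expression for the alarm feature vector is not reproduced in this text): `word_vectors` is a plain dictionary standing in for a trained GloVe model, the IDF uses a common smoothed form, and the weighted vectors are averaged with normalized weights. The names `fit_idf` and `alarm_feature` are hypothetical.

```python
import math
from collections import Counter


def fit_idf(corpus):
    """corpus: list of token lists. Returns a smoothed IDF value per token."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def alarm_feature(tokens, word_vectors, idf):
    """TF-IDF-weighted average of the word vectors of an alarm name's tokens.

    word_vectors: dict token -> list of floats (stand-in for trained GloVe).
    Tokens missing from either model are ignored.
    """
    dim = len(next(iter(word_vectors.values())))
    counts = Counter(t for t in tokens if t in word_vectors and t in idf)
    if not counts:
        return [0.0] * dim
    total = sum(counts.values())
    weights = {t: (c / total) * idf[t] for t, c in counts.items()}
    norm = sum(weights.values())
    return [sum(w * word_vectors[t][i] for t, w in weights.items()) / norm
            for i in range(dim)]
```

With this weighting, rare tokens in an alarm name contribute more to the alarm's vector than tokens that appear in most alarm names.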

The root cause alarm identification model SGC-RAI comprises two spatial graph convolution layers, a pooling layer and a fully connected layer. The two spatial graph convolution layers aggregate local information. The pooling layer applies an element-wise max pooling operation to extract the global fault information, which is injected into each alarm to obtain a new representation of the alarm. Finally, a shared fully connected layer converts each alarm's features into a single value, and Softmax normalization yields the root cause score of each alarm. The alarm with the largest root cause score is taken as the root cause alarm.

The two spatial graph convolution layers (inspired by relational graph convolution) operate as follows:

[Equation not reproduced in the source text.]

Here v is an alarm in the alarm sequence and k denotes the k-th layer. The three symbols defined alongside the equation (also not reproduced) denote, in order, the output of the k-th layer, the aggregated neighborhood information, and a learnable linear transformation. N₁(v) and N₂(v) are the sets formed by the two types of neighbors of node v. Each node in N₁(v) is connected to node v by an edge directed from node v to the neighbor; each node in N₂(v) is connected to node v by an edge directed from the neighbor to node v. w(v,u) is the prior weight of the edge from node v to neighbor node u in the alarm association graph, and w(u,v) is the prior weight of the edge from neighbor node u to node v.
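Because the update equation itself is not available in this text, the following is only one plausible reading of the definitions above, written as a relational-graph-convolution-style layer. The assumptions are: separate learnable matrices for the node itself (`W_self`), for outgoing neighbors N₁(v) (`W_out`) and for incoming neighbors N₂(v) (`W_in`); neighbor features summed with the prior edge weights; and a ReLU activation. All function and parameter names are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a, b):
    """Element-wise sum of two vectors."""
    return [p + q for p, q in zip(a, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def conv_layer(h, edges, W_self, W_out, W_in):
    """One direction-aware spatial graph convolution step (assumed form).

    h: dict node -> feature vector.
    edges: dict (src, dst) -> prior weight w(src, dst).
    """
    dim = len(next(iter(h.values())))
    agg_out = {v: [0.0] * dim for v in h}  # weighted sum over N1(v)
    agg_in = {v: [0.0] * dim for v in h}   # weighted sum over N2(v)
    for (src, dst), w in edges.items():
        # dst is in N1(src): the edge points from src to its neighbor dst.
        agg_out[src] = add(agg_out[src], [w * x for x in h[dst]])
        # src is in N2(dst): the edge points from the neighbor src to dst.
        agg_in[dst] = add(agg_in[dst], [w * x for x in h[src]])
    return {
        v: relu(add(add(matvec(W_self, h[v]), matvec(W_out, agg_out[v])),
                    matvec(W_in, agg_in[v])))
        for v in h
    }
```

Applying `conv_layer` twice, each time with its own matrices, would correspond to the two stacked layers, so that each alarm sees information from neighbors up to two hops away.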

The global information is aggregated as follows:

[Equation not reproduced in the source text.]

where G is the association graph formed by the entire alarm sequence.

The root cause alarm prediction is computed as:

[Equation not reproduced in the source text.]

A root cause alarm analysis system comprises:

a mining module, configured to preprocess the alarm data, mine the alarm data with the SR-ASM algorithm to obtain alarm sequences, and convert the alarm sequences into an alarm relationship graph;

an identification module, configured to identify the root cause alarm with the root cause alarm identification model SGC-RAI according to the alarm relationship graph and the text features of the alarms themselves, and to recommend the identified root cause alarm.

A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the root cause alarm analysis method when executing the computer program.

A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the root cause alarm analysis method.

The present invention has the following beneficial effects:

In operation, the root cause alarm analysis method, system, device and storage medium of the present invention mine the alarm data based on growth credibility and the SR-ASM algorithm to obtain alarm sequences and convert them into an alarm relationship graph, which avoids missing valuable alarm association information and improves identification accuracy. The root cause alarm identification model SGC-RAI is then used to identify the root cause alarm, which effectively reduces the dependence on expert experience, lowers operation and maintenance costs and improves identification efficiency. In experiments, accuracy on both the training set and the test set is very high, and both the micro-F1 and macro-F1 of the present invention are higher than those of the prior art; that is, when the distribution of faults is unbalanced, the present invention is better at identifying different kinds of root causes. It can be widely applied to alarm compression and root cause localization tasks in the telecommunications field.

Brief Description of the Drawings

FIG. 1 is a logic diagram of the pattern growth decision in the SR-ASM algorithm;

FIG. 2 is a flowchart of alarm correlation mining in the present invention;

FIG. 3 is a framework diagram of root cause alarm identification in the present invention;

FIG. 4 is a schematic diagram of the root cause alarm identification model in the present invention.

Detailed Description

To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention and are not intended to limit the scope of the disclosure. Descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts disclosed herein. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

The root cause alarm analysis method of the present invention comprises the following steps:

1) Discovering associations between alarms based on sequential pattern mining.

The alarm data is mined with the SR-ASM algorithm to obtain alarm sequences, which are converted into an alarm relationship graph. The direction of each bidirectional edge in the alarm relationship graph is then further determined, and each pair of relationships is assigned a weight according to confidence and lift.

Specifically, because the occurrence frequencies of different alarms differ greatly, mining the alarm data with the support metric alone easily misses valuable alarm association information. The present invention therefore modifies the traditional frequent sequential pattern mining algorithm PrefixSpan, and proposes the concept of growth credibility and the alarm sequence mining algorithm SR-ASM to mine valuable alarm sequences from the alarm data. The SR-ASM algorithm takes an alarm sequence database as input and outputs the valuable alarm sequences.

2) Identifying root cause alarms based on a graph network.

According to the alarm relationship graph, root cause alarm analysis is performed with the root cause alarm identification model SGC-RAI.

Specifically, the present invention uses a graph neural network to extract alarm features and identify the root cause alarm. Applying the ideas of spatial graph convolution and pooling, the present invention proposes the root cause alarm identification model SGC-RAI, which lets each alarm aggregate local and global information and thereby obtain more discriminative features. The features are finally transformed and normalized to obtain root cause scores. The present invention uses supervised training and achieves a very high root cause alarm identification rate.

Referring to FIG. 1, which shows the core of SR-ASM: SR-ASM is an algorithm based on PrefixSpan and likewise uses prefix pattern growth to mine all valuable alarm sequences. For prefix growth, the present invention proposes the concept of growth credibility (GR): given a prefix α, GR measures how credible it is that the new sequence s, formed by appending item b to prefix α, constitutes a valuable prefix. It is given by:

[Equation not reproduced in the source text.]

The original PrefixSpan algorithm decides pattern growth by support alone: the prefix continues to grow if the minimum support requirement is met. The present invention considers both support and growth credibility during prefix growth and proposes the prefix growth decision logic shown in FIG. 1: support and growth credibility GR are used together to decide whether the pattern continues to grow, and the pattern may continue to grow if either criterion is satisfied.

SR-ASM Algorithm Idea

Like the PrefixSpan algorithm, SR-ASM grows prefixes recursively. Mining starts from the 1-prefixes (k = 1). Each 1-prefix is projected to obtain the database of all its suffixes, and the support of each item b in that database is counted.

If the support of b ≥ the minimum support, the prefix is grown into a (k+1)-prefix and mined recursively.

If the support of b < the minimum support, the growth credibility GR is computed. If GR ≥ min_GR, the prefix is grown into a (k+1)-prefix and mined recursively; otherwise, growth stops.

The recursion continues in this way until the projected database is empty or the sequence length constraint is reached.

At program start the prefix is empty, which requires special handling in the SR-ASM algorithm. The present invention sets a lower bound t on support: an item in the initial database whose support reaches t becomes a 1-prefix.
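The recursion described above can be sketched as a PrefixSpan-style miner with a second growth criterion. Several points are assumptions made for illustration: each sequence element is a single alarm rather than an itemset, every grown prefix (including 1-prefixes) is recorded as output, `max_len` is the sequence length constraint, and the GR formula, which is not reproduced in this text, is supplied by the caller as the `growth_credibility` callable. The `placeholder_gr` ratio is a stand-in only and is not the patent's definition.

```python
from collections import Counter


def project(database, item):
    """Non-empty suffixes following the first occurrence of `item`."""
    suffixes = []
    for seq in database:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                suffixes.append(suffix)
    return suffixes


def item_supports(database):
    """Number of sequences in which each item occurs at least once."""
    return Counter(item for seq in database for item in set(seq))


def placeholder_gr(prefix, prefix_support, item, support):
    """Hypothetical stand-in for GR: share of the prefix followed by item."""
    return support / prefix_support


def sr_asm(database, min_support, min_gr, t, growth_credibility, max_len=5):
    """Mine sequences that satisfy the support OR the GR criterion."""
    results = []

    def grow(prefix, prefix_support, projected):
        results.append((prefix, prefix_support))
        if len(prefix) >= max_len or not projected:
            return
        for item, support in sorted(item_supports(projected).items()):
            keep = support >= min_support
            if not keep:
                # Support too low: fall back to the growth credibility test.
                gr = growth_credibility(prefix, prefix_support, item, support)
                keep = gr >= min_gr
            if keep:
                grow(prefix + [item], support, project(projected, item))

    # The empty prefix is the special case: only the lower bound t applies.
    for item, support in sorted(item_supports(database).items()):
        if support >= t:
            grow([item], support, project(database, item))
    return results
```

In a small database where alarm D occurs once and is always followed by E, the pair D, E fails a minimum support of 2 but can still be kept through the second criterion, which is the kind of rare but consistent relation the support-only rule would drop.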

FIG. 2 shows the alarm correlation mining process used in the present invention. The specific process is as follows:

1a) Building the alarm sequence database:

The massive, discrete alarm records are grouped, sorted, deduplicated and serialized. The input of this step is the raw alarm log records, and the output is the constructed alarm sequence database S.

2a) Mining valuable alarm sequences:

With database S as input, the SR-ASM algorithm is executed to obtain the valuable alarm sequences and their supports.

3a) Generating the alarm relationship network:

The alarm sequences obtained in step 2a) are represented as a network graph describing the relationships between alarms, with each alarm as a node and each sequence as a path on the graph. If the resulting graph contains a bidirectional edge and the support in one direction is more than twice the support in the other, the edge with the lower support is deleted; otherwise both edges are deleted.
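The graph construction and the bidirectional-edge rule can be sketched as below. The assumptions are: consecutive alarms in a mined sequence form a directed edge, an edge that appears in several sequences keeps the highest support seen, and "more than twice" is read as a strict inequality. The function names are hypothetical.

```python
def build_relation_graph(patterns):
    """patterns: list of (sequence, support). Returns dict (src, dst) -> support."""
    edges = {}
    for seq, support in patterns:
        for src, dst in zip(seq, seq[1:]):
            # Assumption: keep the highest support seen for a repeated edge.
            edges[(src, dst)] = max(edges.get((src, dst), 0), support)
    return edges


def resolve_bidirectional(edges):
    """Apply the 2x support rule to every pair of opposite edges."""
    resolved = dict(edges)
    for (a, b), s_ab in edges.items():
        if (b, a) not in edges or (a, b) not in resolved:
            continue  # not bidirectional, or already removed
        s_ba = edges[(b, a)]
        if s_ab > 2 * s_ba:
            resolved.pop((b, a), None)   # keep the dominant direction a -> b
        elif s_ba > 2 * s_ab:
            resolved.pop((a, b), None)   # keep the dominant direction b -> a
        else:
            resolved.pop((a, b), None)   # no clear direction: drop both
            resolved.pop((b, a), None)
    return resolved
```

Dropping both edges when neither direction dominates keeps only relations whose direction is well supported by the data, at the cost of discarding ambiguous pairs.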

The weight of each edge is then computed, where the weight Edge(A→B)_weight of the edge from alarm A to alarm B is:

[Equation not reproduced in the source text.]

Referring to FIG. 3, to determine the root cause alarm in a group of alarm sequences, two types of features are used as the input of the identification model: the text features of the alarms and the association features between the alarms. The output of the identification model is the root cause alarm. Because alarm logs are text, the present invention adopts the text vectorization approach of natural language processing: alarms are represented by combining a GloVe model with a TF-IDF model, both trained with the alarm logs as the corpus. The vector length is 50, and the feature vector of an alarm is expressed as:

[Equation not reproduced in the source text.]

Referring to FIG. 4, the root cause alarm identification model SGC-RAI takes as input the features $X \in \mathbb{R}^{N \times D}$ of a group of alarms and their relationship matrix $A \in \mathbb{R}^{N \times N}$, where N is the number of alarms in the sequence and D is the dimension of the alarm features. From left to right, SGC-RAI has three parts: first, two spatial graph convolution layers aggregate local information; next, an element-wise max pooling operation extracts the global fault information, which is injected into the alarms to update the node information again; finally, a shared fully connected layer and a normalization operation convert the alarm features into root cause scores.
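The second and third parts of the model can be illustrated with a short sketch that takes the node features produced by the convolution layers as given. How the pooled vector is injected into each alarm is not spelled out in this text; concatenation is assumed here. The shared fully connected layer is reduced to a single weight vector `fc_weights` of length 2·D, and all names are hypothetical.

```python
import math


def max_pool(features):
    """Element-wise maximum over all node feature vectors."""
    return [max(column) for column in zip(*features)]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def root_cause_scores(features, fc_weights, fc_bias=0.0):
    """Score each alarm in one sequence.

    features: list of N node vectors of length D (convolution output).
    fc_weights: weight vector of length 2 * D shared by all nodes.
    """
    global_info = max_pool(features)
    # Assumption: inject the global vector by concatenating it to each node.
    injected = [node + global_info for node in features]
    logits = [sum(w * x for w, x in zip(fc_weights, row)) + fc_bias
              for row in injected]
    return softmax(logits)
```

The predicted root cause is then the index of the largest score, for example `scores.index(max(scores))`. Because Softmax is taken over the alarms of one sequence, the scores compare alarms against each other rather than against a fixed threshold.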

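The convolution operation in the present invention (inspired by relational graph convolution) is:

[Equation not reproduced in the source text.]

Here v is an alarm in the alarm sequence and k denotes the k-th layer. The three symbols defined alongside the equation (also not reproduced) denote, in order, the output of the k-th layer, the aggregated neighborhood information, and a learnable linear transformation. N₁(v) and N₂(v) are the sets formed by the two types of neighbors of node v. Each node in N₁(v) is connected to node v by an edge directed from node v to the neighbor; each node in N₂(v) is connected to node v by an edge directed from the neighbor to node v. w(v,u) is the prior weight of the edge from node v to neighbor node u in the alarm association graph, and w(u,v) is the prior weight of the edge from neighbor node u to node v.

The root cause alarm prediction is computed as:

[Equation not reproduced in the source text.]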

The mining module is configured to preprocess the alarm data, mine the alarm data with the proposed SR-ASM algorithm to obtain alarm sequences, and convert the alarm sequences into an alarm relationship graph.

The identification module is configured to identify the root cause alarm with the proposed root cause alarm identification model SGC-RAI according to the alarm relationship graph and the text features of the alarms themselves, and to recommend the identified root cause alarm.

A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the root cause alarm analysis method when executing the computer program.

Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on it to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the present invention may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims

1. A root cause alarm analysis method, characterized by comprising:

preprocessing the alarm data, mining the alarm data with the SR-ASM algorithm to obtain alarm sequences, and converting the alarm sequences into an alarm relationship graph;

according to the alarm relationship graph and the features of the alarms themselves, identifying the root cause alarm with the root cause alarm identification model SGC-RAI, and recommending the identified root cause alarm;

wherein the root cause alarm identification model SGC-RAI comprises two spatial graph convolution layers, a pooling layer and a fully connected layer; the two spatial graph convolution layers aggregate local information; the pooling layer applies an element-wise max pooling operation to extract the global fault information and injects it into each alarm to obtain a new representation of the alarm; finally, a shared fully connected layer converts the alarm features into a numerical value, Softmax normalization yields the root cause score of each alarm, and the alarm with the largest root cause score is taken as the root cause alarm.

The two spatial graph convolution layers operate as follows:

[Equation not reproduced in the source text.]

where v is an alarm in the alarm sequence and k denotes the k-th layer; the three symbols defined alongside the equation (also not reproduced) denote, in order, the output of the k-th layer, the aggregated neighborhood information, and a learnable linear transformation; N₁(v) and N₂(v) are the sets formed by the two types of neighbors of node v; each node in N₁(v) is connected to node v by an edge directed from node v to the neighbor; each node in N₂(v) is connected to node v by an edge directed from the neighbor to node v; w(v,u) is the prior weight of the edge from node v to neighbor node u in the alarm association graph, and w(u,v) is the prior weight of the edge from neighbor node u to node v.

The global information is aggregated as follows:

[Equation not reproduced in the source text.]

where G is the association graph formed by the entire alarm sequence.

The root cause alarm prediction is computed as:

[Equation not reproduced in the source text.]

2. The root cause alarm analysis method according to claim 1, characterized in that it further comprises: determining the direction of the bidirectional edges in the alarm relationship graph, and assigning a weight to each pair of relationships in the alarm relationship graph according to confidence and lift.

3. The root cause alarm analysis method according to claim 1, characterized in that the alarm data is preprocessed by deduplicating and sorting the alarm data and grouping it by network element and time window.

4. The root cause alarm analysis method according to claim 1, characterized in that, according to the alarm relationship graph and the features of the alarms themselves, root cause alarm analysis with the root cause alarm identification model SGC-RAI proceeds as follows:

For a group of alarm sequences whose root cause alarms are to be analyzed, the text features of the alarms are vectorized to obtain the features of the alarms themselves, and the relational features of the alarms are extracted from the alarm relationship graph. The root cause alarm identification model SGC-RAI is trained in a supervised manner on the alarms' own features and relational features, with the root cause alarm as the label. The trained root cause alarm identification model SGC-RAI is then used to identify the root cause alarms.

5. The root cause alarm analysis method according to claim 1, characterized in that the alarm group is deduplicated, and the text of all alarms is then used as a corpus to train a GloVe model and a TF-IDF model, wherein, for an alarm log, the tokens of the alarm name are input into the trained GloVe model to obtain their word vectors, the trained TF-IDF model is then used to weight the word vectors to obtain the features of the alarm itself, and the association information of the alarm is obtained from the alarm relationship graph.

6. A root cause alarm analysis system, characterized by comprising:

a mining module, configured to preprocess the alarm data, mine the alarm data with the SR-ASM algorithm to obtain alarm sequences, and convert the alarm sequences into an alarm relationship graph;

an identification module, configured to identify the root cause alarm with the root cause alarm identification model SGC-RAI according to the alarm relationship graph and the features of the alarms themselves, and to recommend the identified root cause alarm;

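where v is an alarm in the alarm sequence and k denotes the k-th layer; the symbols defined alongside the equations (not reproduced in the source text) denote the output of the k-th layer and the aggregated neighborhood information; G is the association graph formed by the entire alarm sequence; and the root cause alarm prediction is computed as:

[Equation not reproduced in the source text.]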

7. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the root cause alarm analysis method according to any one of claims 1 to 5 when executing the computer program.

8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the root cause alarm analysis method according to any one of claims 1 to 5.