CN101635658A - Method and system for detecting abnormality of network secret stealing behavior

Info

Publication number: CN101635658A
Application number: CN200910091605A
Authority: CN
Inventors: 张永铮; 云晓春; 李世淙
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2009-08-26
Filing date: 2009-08-26
Publication date: 2010-01-27
Anticipated expiration: 2029-08-26
Also published as: CN101635658B

Abstract

The present invention provides an anomaly detection method for network secret leakage and theft, comprising: receiving network data packets and performing protocol restoration on them; judging the correlation between each network data packet and the targets of interest, discarding packets that are not correlated and passing correlated packets to the next step; determining the connection to which each packet related to a target of interest belongs, where the connection is a TCP connection or a UDP connection; collecting statistics for the detection feature vectors on all connections related to the targets of interest; calculating, from the statistical results, the value of each detection feature in the detection feature vector of each such connection; and judging from the detection feature values whether the corresponding connection is abnormal. The invention does not need to analyze and extract fixed signatures, so it can discover unknown network secret leakage and theft in a timely manner; it does not need to process the payload of network data packets, which improves detection efficiency and makes it suitable for deployment in large-scale network environments.

Description

Anomaly detection method and system for network secret leakage and theft

Technical Field

The invention relates to the field of network security, and in particular to an anomaly detection method and system for network secret leakage and theft.

Background

In recent years, with the development of network intrusion techniques and the proliferation of malicious code such as Trojan horses, viruses, and bots, network and information security problems have become increasingly prominent. In particular, incidents of secret leakage and secret theft over the network (collectively, network secret leakage and theft) keep emerging, including major security incidents that endanger the social economy and national security. Research on detecting secret leakage behavior and secret theft behavior over the network (collectively, detection of network secret leakage and theft) is therefore particularly important.

Like network intrusion detection, methods for detecting network secret leakage and theft fall into two categories: misuse detection and anomaly detection. Misuse detection analyzes leakage and theft behavior in depth, extracts the corresponding fixed signatures (generally character-string or binary-string signatures), combines them with features such as sensitive secret-related keywords, and performs pattern matching on network data; a successful match is taken to indicate leakage or theft. Anomaly detection generally builds a profile of normal network activity in advance through statistical analysis of historical network data, then measures the current network behavior at detection time; if the current behavior deviates from the normal profile by more than a certain range, it is regarded as network secret leakage or theft.

Network intrusion detection in the prior art often uses misuse detection, which is well suited to detecting known behavior and is efficient and accurate provided that appropriate keywords or signatures are selected. However, the inherent limitations of misuse detection lead to the following problems in the prior art:

1) It lacks the ability to discover unknown leakage and theft incidents;

2) Because misuse detection can only inspect plaintext, existing methods cannot detect encrypted information;

3) In practice, because leakage and theft behavior mutates and new behavior appears, fixed signatures are hard to extract or cannot keep up with the changes, so detection performance is poor.

Summary of the Invention

The purpose of the present invention is to overcome the defects of existing detection methods for network secret leakage and theft, such as the inability to discover unknown incidents and poor detection performance, and thereby to provide an efficient and fast anomaly detection method for network secret leakage and theft.

To achieve this purpose, the present invention provides an anomaly detection method for network secret leakage and theft, comprising:

Step 1), receiving network data packets and performing protocol restoration on them;

Step 2), judging the correlation between each network data packet and the targets of interest, discarding packets that are not correlated, and performing the next step on packets that are correlated;

Step 3), determining the connection to which each network data packet related to a target of interest belongs, where the connection is a TCP connection or a UDP connection;

Step 4), collecting statistics for the detection feature vectors on all connections related to the targets of interest, wherein

the detection feature vector comprises: the latest access interval feature last_interval, representing the time from the most recent visit on the current connection to the time of detection; the traffic feature traffic, representing the average traffic of the current connection within the detection interval; the behavior periodicity feature period, representing the periodicity exhibited by the time intervals between the most recent visits on the current connection; the same-sip successive access feature sip_count, representing the number of connections that have the same sip as the current connection and are visited successively within the detection interval; and the same-dip connection feature dip_count, representing the number of connections that have the same dip as the current connection within the detection interval;

Step 5), calculating, from the statistical results, the value of each detection feature in the detection feature vector of each connection related to the targets of interest;

Step 6), judging from the detection feature values whether the corresponding connection is abnormal.

In the above technical solution, a learning stage precedes step 1). The learning stage comprises steps 1) to 4) and yields a large number of statistical results for the detection feature vectors on all connections related to the targets of interest.

In the above technical solution, the correlation judgment in step 2) comprises:

matching the source IP address and the destination IP address of the network data packet against the IP addresses of the targets of interest; if either the source IP address or the destination IP address matches the IP address of a target of interest, the packet is correlated with that target; otherwise it is not.

In the above technical solution, step 3) comprises:

Step 3-1), when a network data packet related to a target of interest is a TCP synchronization (SYN) packet: if the packet does not belong to an existing connection, a new TCP connection is established for it; if it belongs to an existing connection, a new visit on that connection begins;

Step 3-2), when a network data packet related to a target of interest is a non-SYN TCP packet: if the packet does not belong to an existing connection, it is discarded; if it belongs to an existing connection, it is kept;

Step 3-3), when a network data packet related to a target of interest is a UDP packet: if the packet does not belong to an existing connection, a new UDP connection is established for it; if it belongs to an existing connection and that connection satisfies the UDP connection division rules, a new visit on the connection begins and subsequent statistical processing follows; if the connection does not satisfy the division rules, subsequent statistical processing is performed directly.

In the above technical solution, the UDP connection division rules comprise:

(1) for UDP packets whose destination port is 53, each UDP packet is one UDP connection;

(2) for all other UDP packets, a UDP connection timer and a division threshold are set. The timer is set to 0 when the connection is established or when a visit begins. When each UDP packet is processed, the timer is compared with the preset division threshold: if it exceeds the threshold, the connection is divided, a new visit on the current connection begins, and the timer is reset to 0; otherwise no division is needed.

In the above technical solution, in step 4):

the statistics for the latest access interval feature last_interval comprise: using a variable last_time to record the time at which the current connection was established or most recently visited, where last_time is set to 0 when the connection is established and to the current time when a new visit occurs;

the statistics for the traffic feature traffic comprise: using a variable sum to accumulate the total length of all packets on the current connection within the detection interval T₀;

the statistics for the behavior periodicity feature period comprise: using a circular queue of length n to record the time intervals between the most recent n visits on the current connection;

the statistics for the same-sip successive access feature sip_count comprise: counting the sip_count feature with the Counting Bloom Filter method; or,

letting start_time be the time at which the current connection was established or visited, and flag be a marker of whether the current connection has already been counted, where each statistical element of the Counting Bloom Filter comprises a counter count and an end time end_time, both initialized to 0: when a visit on the current connection ends, if flag equals 1, return; otherwise locate, through the Counting Bloom Filter, the statistical element corresponding to the sip of the current connection; if start_time is greater than end_time, set flag = 1, increment count by 1, and set end_time to the end time of the current visit; otherwise, if start_time is less than or equal to end_time, return;

the statistics for the same-dip connection feature dip_count comprise: counting the dip of the current connection with the Counting Bloom Filter method.
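The step 4) bookkeeping can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `ConnStats`, `SipElement`, and `on_visit_end` are hypothetical, a plain dictionary keyed by sip stands in for the Counting Bloom Filter (exact, but without its memory savings), and recording an interval only after a previous visit time exists is an assumption the text does not spell out.

```python
from collections import deque


class ConnStats:
    """Hypothetical per-connection counters for step 4)."""

    def __init__(self, now, n=6):
        self.last_time = 0                 # 0 when the connection is established
        self.start_time = now              # time the connection was established
        self.sum = 0                       # packet bytes seen in the interval T0
        self.intervals = deque(maxlen=n)   # circular queue of inter-visit gaps
        self.flag = 0                      # 1 once counted toward sip_count

    def on_new_visit(self, now):
        # Assumption: a gap is recorded only once a previous visit time exists.
        if self.last_time:
            self.intervals.append(now - self.last_time)
        self.last_time = now
        self.start_time = now

    def on_packet(self, length):
        self.sum += length


class SipElement:
    """Statistical element: counter and end time, both initialized to 0."""

    def __init__(self):
        self.count = 0
        self.end_time = 0


def on_visit_end(conn, sip, visit_end, table):
    """Update sip_count bookkeeping when a visit on `conn` ends."""
    if conn.flag == 1:
        return
    elem = table.setdefault(sip, SipElement())  # dict replaces the Bloom filter
    if conn.start_time > elem.end_time:
        conn.flag = 1
        elem.count += 1
        elem.end_time = visit_end
```

With this rule, a connection from the same sip is counted only if it starts after the previously counted visit ended, which is what makes the accesses "successive" rather than overlapping.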

In the above technical solution, the calculation of the detection feature values in step 5) comprises:

the value of the latest access interval feature last_interval is the difference between the current time and the variable last_time;

the value of the traffic feature traffic is the variable sum multiplied by 8 and divided by the detection interval T₀, that is, traffic = sum × 8 / T₀;

the value of the behavior periodicity feature period is calculated as:

$period = \mathrm{Max}\{\, p(m) \mid m = 1 \sim \frac{n}{3} \,\}$

where n is the length of the circular queue; Max denotes the maximum of a set; m is the group length; and p(m) is the uniformity exhibited by the group averages when the group length is m. [The formula for p(m) and the expression for the number of groups are missing from this text.] In that formula, Min denotes the minimum of a set,

$\bar{q} = \sum_{j=1}^{n} q_{j} / n$

is the average of the queue elements, and

$p_{i} = \sum_{j = m \times (i - 1) + 1}^{\mathrm{Min}(m \times i,\, n)} q_{j} / m$

is the average of the elements of the i-th group;

the value of the same-sip successive access feature sip_count is the count value of the statistical element in the Counting Bloom Filter method;

the value of the same-dip connection feature dip_count is the count value of the statistical element in the Counting Bloom Filter method.
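The simpler step 5) calculations can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and `group_means` computes just the group averages p_i as defined above (taking the number of groups implied by the Min(m × i, n) upper bound as an assumption), not the uniformity p(m) itself, whose formula is not available here.

```python
def last_interval(now, last_time):
    """Seconds from the most recent visit to the time of detection."""
    return now - last_time


def traffic_bps(total_bytes, t0_seconds):
    """Average traffic in bits per second: sum * 8 / T0."""
    return total_bytes * 8 / t0_seconds


def queue_mean(q):
    """Average of the queue elements."""
    return sum(q) / len(q)


def group_means(q, m):
    """Averages p_i of consecutive groups of length m.

    The last group may hold fewer than m elements but is still divided
    by m, following the p_i definition literally.
    """
    return [sum(q[i:i + m]) / m for i in range(0, len(q), m)]
```

For example, 1500 bytes observed over a 60-second detection interval gives `traffic_bps(1500, 60)`, i.e. 200 bps.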

In the above technical solution, step 6) comprises:

Step 6-1), judging whether each detection feature is abnormal;

Step 6-2), judging, from the abnormality results of the detection features, whether the corresponding connection is abnormal.

In the above technical solution, in step 6-1), whether a detection feature is abnormal is judged either by comparing the feature with its corresponding detection threshold or by a model-based method.

In the above technical solution, in step 6-2), whether the connection is abnormal is judged either by a preset judgment criterion or by a model-based method.

In the above technical solution, the judgment criterion is: the connection behaves abnormally if the latest access interval feature last_interval and the traffic feature traffic are both abnormal, or if last_interval and the behavior periodicity feature period are both abnormal, or if the same-sip successive access feature sip_count is abnormal, or if last_interval and the same-dip connection feature dip_count are both abnormal.
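The judgment criterion is a fixed Boolean combination of the per-feature results, which can be sketched as below. The function name and the dictionary input are hypothetical; how each feature is judged abnormal (threshold or model) is left outside the sketch.

```python
def connection_abnormal(ab):
    """Combine per-feature abnormality flags into a connection verdict.

    `ab` maps each feature name to True if that feature was judged abnormal.
    """
    return bool(
        (ab["last_interval"] and ab["traffic"])
        or (ab["last_interval"] and ab["period"])
        or ab["sip_count"]
        or (ab["last_interval"] and ab["dip_count"])
    )
```

Note that sip_count is the only feature that triggers a verdict on its own; every other feature must coincide with an abnormal last_interval.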

In the above technical solution, step 6-2) further comprises:

extracting leakage information and theft information, where sip is the leaking party and dip is the stealing party: if the direction of the current connection is in, the IP address of the target of interest is suspected to be the stealing party; if the direction is out, the IP address of the target of interest is suspected to be the leaking party.

The present invention also provides an anomaly detection system for network secret leakage and theft, comprising a packet restoration module, a correlation judgment module, a connection determination module, a detection feature vector statistics module, a detection feature value calculation module, and an abnormality judgment module, wherein:

the packet restoration module receives network data packets and performs protocol restoration on them;

the correlation judgment module judges the correlation between each network data packet and the targets of interest, discards packets that are not correlated, and sends correlated packets to the connection determination module;

the connection determination module determines the connection to which each network data packet related to a target of interest belongs, where the connection is a TCP connection or a UDP connection;

the detection feature vector statistics module collects statistics for the detection feature vectors on all connections related to the targets of interest;

the detection feature value calculation module calculates, from the statistical results, the value of each detection feature in the detection feature vector of each connection related to the targets of interest;

the abnormality judgment module judges from the detection feature values whether the corresponding connection is abnormal.

The advantages of the present invention are:

1. The invention does not need to analyze and extract fixed signatures, and can discover unknown network secret leakage and theft in a timely manner.

2. The invention does not need to process the payload of network data packets, which improves detection efficiency and makes it suitable for deployment in large-scale network environments.

Brief Description of the Drawings

FIG. 1 is a flow chart of the anomaly detection method for network secret leakage and theft according to the present invention.

Detailed Description

The present invention is described below with reference to the accompanying drawing and specific embodiments.

Practical experience and theoretical analysis show that network secret leakage and theft generally passes through the following necessary stages during its life cycle:

1) Intruding into the network information system by implanting or propagating secret-stealing malicious code, or by network intrusion;

2) Collecting secret-related information;

3) Sending the secret-related information back.

In view of these characteristics and the application requirements of network-based detection, the present invention focuses on the necessary stage of "sending the secret-related information back". It analyzes the behavior of this stage in depth, extracts its key behavioral features, and finds abnormal behavior by comparison with a learned profile of normal network behavior, thereby detecting suspicious leakage and theft events.

How the present invention detects leakage or theft events is described below with reference to FIG. 1.

To implement the method of the present invention, the detection parameters involved must first be initialized. For example: the timer T is set to 0; the detection interval is set to T₀; the IP addresses of the targets of interest are set; the UDP connection division rules are set; and the parameters required by the detection method are set, such as the detection feature thresholds needed for threshold-based anomaly detection and the learning time. These detection parameters all appear in the description that follows, where their specific values are explained.
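The initialization parameters listed above could be grouped into a single configuration object, as in the sketch below. The class and field names are hypothetical. The defaults for the UDP division threshold (10 seconds) and the queue length n (6) are example values the description gives elsewhere; T₀ and the learning time have no value in the text, so they are required arguments here.

```python
from dataclasses import dataclass, field


@dataclass
class DetectorConfig:
    """Hypothetical container for the initialization parameters."""

    detection_interval_s: float           # T0; no value is given in the text
    targets: frozenset                    # IP addresses of the targets of interest
    learning_time_s: float                # learning time; no value is given
    timer: float = 0.0                    # timer T starts at 0
    udp_split_threshold_s: float = 10.0   # example value from the description
    dns_port: int = 53                    # port used by UDP division rule (1)
    period_queue_len: int = 6             # n; the text suggests 6, 9, or 12
    thresholds: dict = field(default_factory=dict)  # per-feature thresholds


# Example values below are placeholders chosen for illustration.
cfg = DetectorConfig(
    detection_interval_s=60.0,
    targets=frozenset({"192.0.2.10"}),
    learning_time_s=3600.0,
)
```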

After the initial values of the detection parameters have been set, the data needed for detection must be obtained. Network data packets are first received from the Internet, and protocol restoration is performed on them to obtain the information in their network-layer and transport-layer headers. In this embodiment, the network data packets are IP packets, specifically TCP packets and UDP packets. The protocol restoration is TCP/IP protocol restoration, and the header information that can be obtained includes the source IP address, destination IP address, protocol, packet length, and TCP flags. Receiving network data packets and performing protocol restoration are well known to those skilled in the art and are not described further here.
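As a rough illustration of the header fields involved, the sketch below parses a raw IPv4 packet with Python's standard `struct` and `socket` modules. It is an assumption-laden simplification, not the restoration procedure of the invention: the function name `parse_headers` and the dictionary keys are hypothetical, the input is assumed to start at the IP header (no link-layer framing), and fragmentation and IPv6 are ignored.

```python
import socket
import struct


def parse_headers(packet):
    """Extract network- and transport-layer header fields from IPv4 bytes."""
    ver_ihl, _tos, total_len = struct.unpack("!BBH", packet[:4])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (ver_ihl & 0x0F) * 4            # IP header length in bytes
    proto = packet[9]                     # 6 = TCP, 17 = UDP
    info = {
        "src_ip": socket.inet_ntoa(packet[12:16]),
        "dst_ip": socket.inet_ntoa(packet[16:20]),
        "protocol": proto,
        "length": total_len,
    }
    l4 = packet[ihl:]                     # transport-layer header and payload
    if proto in (6, 17):
        info["sport"], info["dport"] = struct.unpack("!HH", l4[:4])
    if proto == 6:
        flags = l4[13]                    # TCP flags byte
        info["syn"] = bool(flags & 0x02)
        info["ack"] = bool(flags & 0x10)
    return info
```

Only header bytes are read; the payload is never inspected, in line with the stated advantage of not processing packet payloads.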

As noted above, the method of the present invention detects whether a leakage or theft event has occurred at a target of interest, so only the network data packets related to the targets of interest matter during detection. The received packets, however, are not necessarily all related to those targets, so a correlation judgment must be made on each received packet. The correlation judgment matches the source IP address and the destination IP address of the received packet against the IP addresses of all targets of interest set during initialization. If either address matches the IP address of a target of interest, the packet is considered correlated with that target; otherwise it is not. Because there are usually multiple target IP addresses, a prior-art hash-table-based matching method can be used to match IP addresses efficiently.
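A hash-based correlation check can be sketched with a Python `frozenset`, which gives constant-time average lookup. The function name `is_relevant` and the sample addresses are hypothetical.

```python
def is_relevant(src_ip, dst_ip, targets):
    """True if either endpoint of the packet is a target of interest."""
    return src_ip in targets or dst_ip in targets


# Placeholder target list and packets (source, destination) for illustration.
targets = frozenset({"192.0.2.10", "192.0.2.11"})
packets = [
    ("192.0.2.10", "198.51.100.7"),
    ("203.0.113.5", "198.51.100.7"),
    ("203.0.113.5", "192.0.2.11"),
]
# Uncorrelated packets are discarded; the rest go on to connection tracking.
kept = [p for p in packets if is_relevant(p[0], p[1], targets)]
```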

After the correlation judgment, packets that are not correlated are discarded. For correlated packets, the connection to which each belongs is then determined. As mentioned above, the network data packets in this embodiment include TCP packets and UDP packets, so the corresponding connections are TCP connections and UDP connections. Both kinds of connection can be described uniformly by a 4-tuple <sip, dip, protocol, direction>, where sip is the IP address of the connection initiator, dip is the IP address of the connection receiver, protocol is the protocol type of the connection (TCP or UDP), and direction is the direction of the connection relative to the target of interest: in for an incoming connection and out for an outgoing one. Of these, sip, dip, and protocol together uniquely identify a connection. In the correlation judgment described above, if the source IP address of the packet matches a target of interest, direction is out; if the destination IP address matches, direction is in.
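The 4-tuple and the direction rule can be sketched as follows. The names `Connection`, `direction_for`, and `connection_id` are hypothetical, and checking the source address first when both addresses are targets is an assumption, since the text does not cover that case.

```python
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    sip: str         # IP address of the connection initiator
    dip: str         # IP address of the connection receiver
    protocol: str    # "TCP" or "UDP"
    direction: str   # "in" or "out", relative to the target of interest


def direction_for(src_ip, dst_ip, targets) -> Optional[str]:
    """Direction implied by which address matched a target, else None."""
    if src_ip in targets:    # assumption: source is checked first
        return "out"
    if dst_ip in targets:
        return "in"
    return None


def connection_id(conn):
    """sip, dip, and protocol uniquely identify a connection."""
    return (conn.sip, conn.dip, conn.protocol)
```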

Although TCP and UDP connections can both be described by the 4-tuple above, determining the TCP connection of a TCP packet differs markedly from determining the UDP connection of a UDP packet. The two cases are therefore described separately below.

1. When the packet under examination is a TCP synchronization packet (a TCP packet with the SYN flag set) and does not belong to an existing connection, a new TCP connection is established for it. The source IP address of the packet becomes the initiator address sip of the connection 4-tuple, the destination IP address becomes the receiver address dip, and the protocol is TCP. Establishing a TCP connection from a SYN packet is well-known technology; implementation details can be found in the prior art literature. If the packet belongs to an existing connection, a new visit on that connection begins.

2. When the packet under examination is a TCP packet other than a SYN packet: if it does not belong to an existing connection, it is discarded; if it belongs to an existing connection, it is kept for subsequent statistical processing.

3. When the packet under examination is a UDP packet and does not belong to an existing connection, a new UDP connection is established for it. The source IP address of the packet becomes the initiator address sip of the 4-tuple, the destination IP address becomes the receiver address dip, and the protocol is UDP. If the packet belongs to an existing connection, the connection is checked against the predefined UDP connection division rules: if the rules are satisfied, a new visit on the current connection begins, followed by subsequent statistical processing; if not, subsequent statistical processing is performed directly. In this embodiment, the UDP connection division rules can be defined as: (1) for UDP packets whose destination port is 53, each UDP packet is one UDP connection; (2) for all other UDP packets, a UDP connection timer and a division threshold are set (in one example the division threshold is 10 seconds). The timer is set to 0 when the connection is established or when a visit begins. When each UDP packet is processed, the timer is compared with the preset division threshold: if it exceeds the threshold, the connection is divided, a new visit on the current connection begins, and the timer is reset to 0; otherwise no division is needed.

In the determination process above, a packet is judged to belong to a connection as follows: if the source IP address, destination IP address, and protocol of the packet are respectively the same as the sip, dip, and protocol of an existing connection, the packet belongs to that connection; otherwise it does not.
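The three cases and the membership test can be combined into a small tracker, sketched below under stated assumptions. The class `Tracker`, its return strings, and the state dictionary are hypothetical. Membership follows the text literally (exact match on source, destination, and protocol, so reverse-direction packets do not match); the `syn` argument is assumed to mean a SYN without ACK; and a port-53 UDP packet on an existing connection is treated as starting a new visit, which is one interpretation of division rule (1).

```python
class Tracker:
    """Hypothetical connection tracker for TCP and UDP packets."""

    def __init__(self, udp_split_s=10.0):
        self.conns = {}                 # (sip, dip, protocol) -> state
        self.udp_split_s = udp_split_s  # UDP division threshold in seconds

    def handle(self, src, dst, proto, now, syn=False, dport=None):
        key = (src, dst, proto)         # membership test from the text
        conn = self.conns.get(key)
        if proto == "TCP":
            if syn:
                if conn is None:
                    self.conns[key] = {"visits": 1, "timer_start": now}
                    return "new_connection"
                conn["visits"] += 1
                return "new_visit"
            # Non-SYN packets are kept only if the connection is known.
            return "keep" if conn is not None else "drop"
        if proto == "UDP":
            if conn is None:
                self.conns[key] = {"visits": 1, "timer_start": now}
                return "new_connection"
            elapsed = now - conn["timer_start"]   # timer since visit start
            if dport == 53 or elapsed > self.udp_split_s:
                conn["visits"] += 1
                conn["timer_start"] = now         # reset the timer to 0
                return "new_visit"
            return "keep"
        return "drop"
```

The timer here measures time since the start of the current visit rather than since the last packet, matching the statement that it is zeroed when a connection is established or a visit begins.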

在由网络数据包得到其所属的连接后，可以对各个连接上的检测特征向量进行统计。检测特征向量包括有多个检测特征，所述检测特征反映了发生网络失窃密行为时所呈现出的网络流量特征和/或行为特征。通过检测特征的异常可发现是否存在网络失窃密行为。检测特征的选取不仅与所要检测的异常的种类有关，而且也和异常检测的准确率、效率有关。在一个实施例中，针对网络失窃密行为的特点，本发明为每个连接设计了一个由5个检测特征所组成的检测特征向量<last_interval，traffic，period，sip_count，dip_count>，该检测特征向量中各个检测特征的含义如下：After the connection to which it belongs is obtained from the network data packet, the detection feature vectors on each connection can be counted. The detection feature vector includes a plurality of detection features, and the detection features reflect the network traffic characteristics and/or behavior characteristics presented when a network theft occurs. By detecting the abnormality of the characteristics, it can be found whether there is network theft. The selection of detection features is not only related to the type of anomaly to be detected, but also related to the accuracy and efficiency of anomaly detection. In one embodiment, for the characteristics of network theft, the present invention designs a detection feature vector <last_interval, traffic, period, sip_count, dip_count> consisting of 5 detection features for each connection, the detection feature vector The meanings of each detection feature in are as follows:

a. Latest access interval feature last_interval: the time interval, in seconds, from the most recent access on the current connection to the time of detection. When secret theft occurs, the stealer's IP address is usually not among the IP addresses that the victim (the party whose secrets are leaked) habitually visits, so an IP address that has never been visited before may appear in the victim's list of accessed IPs. In addition, to remain concealed, secret theft tends to have a very low access frequency. A connection with these characteristics has a large last_interval value and is therefore more likely to involve secret theft. For this reason the latest access interval is chosen as a detection feature of secret theft.

b. Traffic feature traffic: the average traffic of the current connection within the detection interval T₀, in bps (bits per second). During secret theft, a large amount of secret-related information is typically downloaded in a short time so that it can be stolen as quickly as possible. For this reason traffic is chosen as a detection feature of secret theft.

c. Behavior periodicity feature period: the periodicity exhibited, up to the time of detection, by the time intervals of the most recent n accesses on the current connection, where n is a threshold that should be a multiple of 3 and can usually be set to 6, 9 or 12. The degree of periodicity is a fuzzy number in the range 0 to 1. Practice shows that the vast majority of secret theft is carried out by malicious code such as secret-stealing Trojans or bots. Being computer programs, such malicious code tends to produce access behavior with a certain periodicity and regularity while stealing secrets. For this reason behavior periodicity is chosen as a detection feature of secret theft.

d. Same-sip successive access feature sip_count: the number of connections that, within the detection interval T₀, have the same sip as the current connection and on which accesses occur successively, where "successively" means that the accesses occur one after another in chronological order. Malicious code such as secret-stealing Trojans or bots usually accesses several IP addresses in turn to resolve domain names, obtain the IP address of the secret-stealing control server, or obtain secret-stealing commands. For this reason the same-sip successive access feature is chosen as a detection feature.

e. Same-dip connection feature dip_count: the number of connections that have the same dip as the current connection within the detection interval T₀. When malicious code such as secret-stealing Trojans or bots communicates with the control server, the number of connections with the same dip (the IP address of the control server) differs markedly from normal behavior. For this reason the same-dip connection feature is chosen as a detection feature.
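The five features listed above can be held in a simple record. The sketch below is an illustration only; the class name `FeatureVector` and the range check on period are assumptions, while the field names and units follow the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureVector:
    last_interval: float  # seconds since the most recent access on the connection
    traffic: float        # average traffic within T0, in bits per second
    period: float         # periodicity of recent access intervals, in [0, 1]
    sip_count: int        # connections with the same sip accessed successively in T0
    dip_count: int        # connections with the same dip in T0

    def __post_init__(self) -> None:
        # The text defines period as a fuzzy number between 0 and 1.
        if not 0.0 <= self.period <= 1.0:
            raise ValueError("period must lie in [0, 1]")
```

For example, `FeatureVector(50000.0, 89000.0, 0.95, 3, 23)` corresponds to the second sample row of Table 1.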

Different statistical methods are used for the individual detection features of the detection feature vector, including:

a. For the last_interval feature: the variable last_time records the time at which the current connection was established or the time of the most recent access. When the connection is established, last_time is set to 0; when a new access occurs, last_time is set to the current time.

b. For the traffic feature: the variable sum accumulates the lengths of all data packets of the current connection within the detection interval T₀, that is, the total number of bytes of the current connection.

c. For the period feature: a circular queue of length n records the time intervals of the most recent n accesses on the current connection.

d. For the sip_count feature: the statistics can be collected using the published Counting Bloom Filter (CBF) technique. Besides this existing technique, the following statistical algorithm can also be used in this embodiment:

Let start_time be the time at which the current connection was established or accessed, and let flag mark whether the current connection has already been counted. The statistical element of the CBF consists of a counter count and an end time end_time, both initialized to 0. When an access of the current connection ends, that is, when a TCP end packet (a TCP packet with the FIN flag set) is received or the UDP connection division rules described above are satisfied, the algorithm proceeds as follows. If flag equals 1, it returns. Otherwise it uses the CBF to find the statistical element corresponding to the sip of the current connection. If start_time is greater than end_time, it sets flag = 1, increments count by 1, and sets end_time to the end time of the current access; if start_time is less than or equal to end_time, it returns.

e. For the dip_count feature: when a connection is established or a new access occurs, the published Counting Bloom Filter technique can be used to count the dip of the current connection.
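The bookkeeping described in items a to d can be sketched as follows. Several points are assumptions made for illustration: a plain dictionary stands in for the Counting Bloom Filter, `collections.deque` with `maxlen` plays the role of the circular queue, each access interval is taken as the time since the previous access (or since establishment), and all class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict


@dataclass
class ConnStats:
    start_time: float      # time the connection was established or last accessed
    n: int = 6             # circular queue length (a multiple of 3)
    last_time: float = 0.0 # 0 at establishment, per the text
    byte_sum: int = 0      # the variable "sum" in the text
    flag: bool = False     # whether this connection was already counted for sip_count
    intervals: Deque[float] = field(init=False)

    def __post_init__(self) -> None:
        self.intervals = deque(maxlen=self.n)  # keeps only the latest n intervals

    def on_packet(self, length: int) -> None:
        self.byte_sum += length  # item b: accumulate packet lengths in bytes

    def on_new_access(self, now: float) -> None:
        self.intervals.append(now - self.start_time)  # item c
        self.start_time = now
        self.last_time = now  # item a


@dataclass
class SipElement:
    count: int = 0
    end_time: float = 0.0


def on_access_end(conn: ConnStats, elements: Dict[str, SipElement],
                  sip: str, end_now: float) -> None:
    """Item d: count the connection only if it started after the previous one ended."""
    if conn.flag:
        return
    elem = elements.setdefault(sip, SipElement())  # dict lookup replaces the CBF lookup
    if conn.start_time > elem.end_time:
        conn.flag = True
        elem.count += 1
        elem.end_time = end_now
```

A real Counting Bloom Filter trades exactness for bounded memory; the dictionary used here is exact but grows with the number of distinct sip values.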

The results obtained by receiving network data packets, determining the connection to which each packet belongs, and collecting statistics on the detection features of each connection constitute the data used for detection. In a preferred implementation, to improve the final effect of anomaly detection, these steps can be repeated over a relatively long period so that richer statistical results are obtained. This repeated execution is also called the learning process, and its duration is determined by the learning time set during the initialization described earlier. In general, the longer the learning time, the better the resulting detection effect. In one example, experiments showed that detection works well when the learning time is set to 24×7 hours. In practice a good balance between learning time and detection effect is needed, and the learning time can generally be set to 24 to 48 hours. Although the learning process helps improve the final effect of anomaly detection, in other embodiments it can be omitted from the anomaly detection process, at the cost of a higher false-positive rate in the detection results.

After the network data packets have been received, their connections determined, and the detection features of each connection counted as described above, anomaly detection can be performed separately on each connection related to the target of interest. Taking one such connection as an example, the value of each detection feature in its detection feature vector is first calculated from the statistical results obtained above. The calculation is as follows:

a. last_interval = now_time - last_time, where now_time is the current time.

b. traffic = sum × 8 / T₀.

c. period can be calculated with related methods from the prior art, such as self-similarity determination methods. In one embodiment the following calculation is used:

The statistical results related to the period feature include a circular queue of length n. Let the elements of the queue be q_j (j = 1, ..., n); then

$period = \mathrm{Max} \{\, p(m) \mid m = 1, \dots, \frac{n}{3} \,\},$

where Max denotes the maximum value of the set, m denotes the group length (that is, the number of elements in each group when the queue elements are divided into groups), and p(m) denotes the uniformity exhibited by the group means when the group length is m. p(m) is calculated as follows:

[formula for p(m) not reproduced]

where Min denotes the minimum value of the set,

[expression not reproduced]

is the number of groups,

$\bar{q} = \sum_{j = 1}^{n} q_{j}$

is the mean of the queue elements, and

$p_{i} = \sum_{j = m \times (i - 1) + 1}^{\mathrm{Min}(m \times i,\, n)} q_{j} / m$

is the mean of the elements of the i-th group.

The time complexity of this calculation is O(n²). To improve processing efficiency, in practice only the case m = 1 may be considered, in which case the time complexity is O(n).

d. sip_count is the count value of the corresponding statistical element in the CBF.

e. dip_count is the count value of the corresponding statistical element in the CBF.
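A brief sketch of the calculations in items a to c follows. The function names are hypothetical. Because the expression for p(m) is not available in this text, the sketch stops at the group means p_i; the number of groups is assumed to be the ceiling of n/m, which is consistent with the upper summation bound Min(m × i, n) but is not stated explicitly.

```python
from typing import List, Sequence


def last_interval(now_time: float, last_time: float) -> float:
    """Item a: seconds elapsed since the most recent access."""
    return now_time - last_time


def traffic_bps(byte_sum: int, t0: float) -> float:
    """Item b: average traffic in bits per second over the detection interval T0."""
    return byte_sum * 8 / t0


def group_means(q: Sequence[float], m: int) -> List[float]:
    """Item c: the values p_i for group length m.

    Group i (1-based) sums q_j for j = m*(i-1)+1 .. Min(m*i, n) and divides by m,
    exactly as in the formula for p_i. The group count ceil(n/m) is an assumption.
    """
    n = len(q)
    groups = -(-n // m)  # ceiling division
    return [sum(q[m * (i - 1): min(m * i, n)]) / m for i in range(1, groups + 1)]
```

For instance, 375000 bytes observed over T₀ = 10 seconds give 375000 × 8 / 10 = 300000 bps, which equals the example traffic threshold of 300k bps if k is read as 1000.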

Once the values of the detection features of the connection have been obtained, the connection can be checked for anomalies. Anomaly detection can be implemented in many ways, for example with threshold-based detection methods or model-based detection methods (such as those based on Bayesian networks). The following describes how anomaly detection is carried out, taking a threshold-based method and a model-based method as examples.

(1) In an embodiment based on the threshold detection method, it is first judged whether each detection feature is abnormal, and then, from the results for the individual features, whether the connection as a whole is abnormal. A detection threshold can be set for each detection feature (each detection threshold may be a single threshold, a threshold range, or a set of thresholds). Each detection feature is then compared with its detection threshold: if the feature is greater than, less than, inside, or outside its detection threshold, as the case may be, the feature is considered abnormal. The detection thresholds can be set individually for each detection feature according to practical experience and the application environment. For example, in one instance the detection thresholds of last_interval, traffic, period, sip_count, and dip_count may be set to 86400 seconds (24 hours), 300k bps, 0.95, [3, 5], and 128 respectively. When last_interval, traffic, period, or dip_count is greater than its detection threshold, or sip_count does not fall within its threshold range, the corresponding detection feature is considered abnormal. The detection thresholds can also be set by training and learning in advance on network data from the application environment, using published techniques and methods.

After the anomaly results for the individual detection features of the detection feature vector have been obtained, it can be further judged whether the connection as a whole is abnormal. In one embodiment the following criteria are set: 1) the last_interval and traffic features are abnormal; 2) the last_interval and period features are abnormal; 3) the sip_count feature is abnormal; 4) the last_interval and dip_count features are abnormal. If any one of these rules is satisfied, the connection is considered to exhibit abnormal behavior. Here sip is the victim and dip is the stealer. If the direction of the current connection is in, the IP address of the target of interest is suspected to be the stealer; if the direction is out, the IP address of the target of interest is suspected to be the victim. If the connection has no abnormal detection feature, or has abnormal detection features that do not satisfy the criteria for a connection anomaly, the current connection is considered to have no abnormal behavior.
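The threshold-based procedure in (1) can be sketched as follows, using the example thresholds from the text. The function names and the dictionary representation are assumptions; "300k bps" is read as 300000 bps, and the sip_count rule follows the wording of the text literally (a value outside [3, 5] is flagged).

```python
from typing import Dict

# Example thresholds from the text; all are "greater than" checks.
THRESHOLDS = {
    "last_interval": 86400,  # seconds (24 hours)
    "traffic": 300_000,      # bps, assuming k = 1000
    "period": 0.95,
    "dip_count": 128,
}
SIP_RANGE = (3, 5)  # threshold range for sip_count


def abnormal_features(fv: Dict[str, float]) -> Dict[str, bool]:
    """Step 1: flag each detection feature as abnormal or not."""
    flags = {name: fv[name] > limit for name, limit in THRESHOLDS.items()}
    lo, hi = SIP_RANGE
    flags["sip_count"] = not (lo <= fv["sip_count"] <= hi)
    return flags


def connection_abnormal(f: Dict[str, bool]) -> bool:
    """Step 2: the connection is abnormal if any of the four rules holds."""
    return bool(
        (f["last_interval"] and f["traffic"])       # rule 1
        or (f["last_interval"] and f["period"])     # rule 2
        or f["sip_count"]                           # rule 3
        or (f["last_interval"] and f["dip_count"])  # rule 4
    )
```

Keeping the per-feature flags separate from the combination rules mirrors the two-stage structure of the embodiment and makes it easy to swap in thresholds learned from the deployment environment.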

(2) In an embodiment based on the model detection method, the anomaly detection process is described taking a method based on the published Bayesian network technique as an example.

First, a training sample set labeled with event conclusions is compiled from the detection features. Table 1 below gives an example of such a training sample set.

| last_interval | traffic | period | sip_count | dip_count | conclusion |
|---|---|---|---|---|---|
| 10000 seconds | 100k bps | 0.8 | 1 | 53 | normal behavior |
| 50000 seconds | 89k bps | 0.95 | 3 | 23 | abnormal behavior |
| ... | ... | ... | ... | ... | ... |

Table 1

After the training sample set has been obtained, it is fed into the corresponding model, and the parameters of the model are computed through training. Once the model has been trained, it can be used to detect secret theft. During detection, the network data is collected, restored, and counted as described above to generate the detection features, which are then fed into the trained model. The model usually outputs a probability value, such as 0.5 or 0.9, and a connection with a high probability value (for example, one exceeding a predetermined threshold) is generally considered abnormal.
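To make the train-then-score flow concrete, the sketch below uses a naive Bayes classifier over binary features, which is the simplest special case of a Bayesian network. It is a swapped-in illustration, not the model used in the embodiment. The function names are hypothetical, and each feature is assumed to have been discretized to True/False beforehand (for example, by comparison with a threshold).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Tuple[bool, ...], str]  # (binary features, conclusion label)


def train(samples: List[Sample]):
    """Count label frequencies and per-label feature-value frequencies."""
    labels: Counter = Counter(label for _, label in samples)
    counts = defaultdict(Counter)  # (label, feature index) -> Counter of values
    for feats, label in samples:
        for i, value in enumerate(feats):
            counts[(label, i)][value] += 1
    return labels, counts


def posterior(feats: Tuple[bool, ...], labels, counts) -> Dict[str, float]:
    """Return P(label | features) under the naive independence assumption."""
    total = sum(labels.values())
    scores = {}
    for label, n_label in labels.items():
        p = n_label / total  # prior
        for i, value in enumerate(feats):
            # Laplace smoothing for a binary feature: (count + 1) / (n + 2)
            p *= (counts[(label, i)][value] + 1) / (n_label + 2)
        scores[label] = p
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}
```

A connection would then be reported as abnormal when the posterior probability of the abnormal label exceeds a predetermined threshold, as described above.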

Besides the Bayesian-network-based method described above, other model-based detection methods can also be used, such as detection methods based on a credibility model, on a transductive confidence machine, on K-nearest neighbors, or on a support vector machine.

After it has been concluded whether a connection is abnormal, follow-up processing is performed accordingly. If a connection is abnormal, a secret-theft alarm event is generated and written to the event store. If a connection is not abnormal, anomaly detection continues with the next connection. After all connections related to the target of interest have been checked in this way, the variable sum of the traffic detection feature is set to zero for every connection so that traffic statistics start afresh, and the timer T is set to 0.
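The follow-up processing at the end of a detection round might look like the sketch below. The `DetectorState` class, the `finish_round` function, the event dictionary layout, and the use of a list as the event store are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DetectorState:
    byte_sums: Dict[str, int]                         # per-connection "sum" variable
    events: List[dict] = field(default_factory=list)  # stand-in for the event store
    timer_t: float = 0.0                              # timer T, in seconds


def finish_round(state: DetectorState, verdicts: Dict[str, bool]) -> None:
    """Record alarms for abnormal connections, then reset traffic sums and timer T."""
    for conn_id, abnormal in verdicts.items():
        if abnormal:
            state.events.append({"connection": conn_id, "event": "secret-theft alarm"})
    for conn_id in state.byte_sums:
        state.byte_sums[conn_id] = 0  # restart traffic statistics
    state.timer_t = 0.0
```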

The anomaly judgment module judges whether the corresponding connection is abnormal according to the values of the detection features. The anomaly detection method of the present invention does not need to analyze and extract fixed signatures, and can discover unknown network secret theft in a timely manner.

The anomaly detection method of the present invention uses the access behavior of the key step of "sending secret information back" as its detection features. It does not need to process the payload of network data packets, which improves detection efficiency and makes it suitable for deployment in large-scale network environments. The invention also provides effective information support and technical means for further tracking of, and defense against, unknown network secret theft.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solutions of the present invention that do not depart from their spirit and scope shall all fall within the scope of the claims of the present invention.

Claims

1. An anomaly detection method for network theft, comprising:

Step 1), receiving the network data packet, and performing protocol restoration on the obtained network data packet;

Step 2), making a correlation judgment between the network data packet and the target of concern, discarding the network data packet without correlation, and performing the next step for the network data packet with correlation;

Step 3), determine the connection to which the network data packet related to the concerned target belongs; wherein, the connection includes a TCP connection or a UDP connection;

Step 4), making statistics on the detection feature vectors on all connections related to the target of interest; wherein,

The detection feature vector includes: a latest access interval feature last_interval, representing the time interval from the most recent access on the current connection to the time of detection; a traffic feature traffic, representing the average traffic of the current connection within the detection time interval; a behavior periodicity feature period, representing the periodicity exhibited by the time intervals of the most recent accesses on the current connection; a same-sip successive access feature sip_count, representing the number of connections that have the same sip as the current connection and on which accesses occur successively within the detection time interval; and a same-dip connection feature dip_count, representing the number of connections that have the same dip as the current connection within the detection time interval;

Step 5), according to the statistical results, calculate the value of each detection feature in the detection feature vector corresponding to the connection related to the target of interest;

Step 6), judging whether the corresponding connection is abnormal according to the value of the detection feature.

2. The anomaly detection method for network theft according to claim 1, characterized in that a learning phase precedes said step 1), the learning phase comprising steps 1) to 4), so as to obtain a large number of statistical results of the detection feature vectors on all connections related to the target of interest.

3. The abnormal detection method of network theft according to claim 1 or 2, characterized in that, in said step 2), said correlation judgment includes:

Matching the source IP address and the destination IP address of the network data packet against the IP address of the target of interest; if either the source IP address or the destination IP address matches the IP address of the target of interest, the network data packet is correlated with the target of interest; otherwise, it is not.

4. The abnormal detection method for network theft according to claim 1 or 2, wherein said step 3) comprises:

Step 3-1), when the network data packet related to the target of interest is a TCP synchronization (SYN) packet: if the network data packet does not belong to an existing connection, a new TCP connection is established for it; if it belongs to an existing connection, a new access of that existing connection is started;

Step 3-2), when the network data packet related to the target of interest is a TCP non-synchronization (non-SYN) packet: if the network data packet does not belong to an existing connection, it is discarded; if it belongs to an existing connection, it is kept;

Step 3-3), when the network data packet related to the target of interest is a UDP packet: if the network data packet does not belong to an existing connection, a new UDP connection is established for it; if it belongs to an existing connection and that connection satisfies the UDP connection division rules, a new access of the existing connection is started and the subsequent statistical processing is then performed; if the existing connection does not satisfy the UDP connection division rules, the subsequent statistical processing is performed directly.

5. The abnormal detection method for network theft according to claim 4, wherein the rules for dividing UDP connections include:

(1), for the UDP packet whose destination port is 53, each UDP packet is a UDP connection;

(2) for all other UDP packets, a UDP connection timer and a division threshold are set; the timer is set to 0 when the connection is established or whenever an access starts; when each UDP packet is processed, it is determined whether the timer exceeds the preset division threshold; if it does, the connection is divided, a new access of the current connection is started, and its timer is reset to 0; otherwise no division is needed.

6. The abnormal detection method for network theft according to claim 1 or 2, characterized in that, in step 4),

The statistics of the latest access interval feature last_interval include: using the variable last_time to record the time at which the current connection was established or the time of the most recent access; wherein, when the connection is established, last_time is set to 0, and when a new access occurs, last_time is set to the current time;

The statistics of the traffic feature traffic include: using the variable sum to accumulate the lengths of all data packets of the current connection within the detection interval T₀;

The statistics of the behavior periodicity feature period include: using a circular queue of length n to record the time intervals of the most recent n accesses on the current connection;

The statistics of the same-sip successive access feature sip_count include: using the Counting Bloom Filter method to count the sip_count feature; or,

Letting start_time be the time at which the current connection was established or accessed, and flag mark whether the current connection has been counted, the statistical element of the Counting Bloom Filter method including a counter count and an end time end_time, both initialized to 0; when an access of the current connection ends, if flag equals 1, returning; otherwise finding the statistical element corresponding to the sip of the current connection through the Counting Bloom Filter method; if start_time is greater than end_time, setting flag = 1, incrementing count by 1, and setting end_time to the end time of the current access; if start_time is less than or equal to end_time, returning;

The statistics of the same-dip connection feature dip_count include: using the Counting Bloom Filter method to count the dip of the current connection.

7. The abnormal detection method of network theft according to claim 6, characterized in that, in said step 5), the calculation of the detection feature value includes:

The value of the latest access interval feature last_interval is the difference between the current time and the variable last_time;

The value of the traffic feature traffic is the product of the variable sum and 8, divided by the detection interval T₀;

The calculation formula of the value of the periodic characteristic period of the behavior is:

$period = \mathrm{Max} \{\, p(m) \mid m = 1, \dots, \frac{n}{3} \,\};$

Wherein, n is the length of the circular queue; Max denotes the maximum value of the set; m denotes the group length; p(m) denotes the uniformity exhibited by the group means when the group length is m, and is calculated using the following formula:

[formula for p(m) not reproduced]

where Min denotes the minimum value of the set,

[expression not reproduced]

is the number of groups,

$\bar{q} = \sum_{j = 1}^{n} q_{j}$

is the mean of the queue elements, and

$p_{i} = \sum_{j = m \times (i - 1) + 1}^{\mathrm{Min}(m \times i,\, n)} q_{j} / m$

is the mean of the elements of the i-th group;

The value of the successive access feature sip_count of the same sip connection is the count value of the statistical element in the Counting Bloom Filter method;

The value of the same-dip connection feature dip_count is the count value of the statistical element in the Counting Bloom Filter method.

8. The abnormal detection method for network theft according to claim 1 or 2, wherein said step 6) includes:

Step 6-1), judging whether there is an abnormality in the detection feature;

In step 6-2), it is judged whether there is an abnormality in the corresponding connection according to the abnormality judgment result of the detection feature.

9. The anomaly detection method for network theft according to claim 8, characterized in that, in said step 6-1), whether a detection feature is abnormal is judged either by comparing the detection feature with the corresponding detection threshold or by a model judgment method.

10. The anomaly detection method for network theft according to claim 8, characterized in that, in said step 6-2), whether the connection is abnormal is judged either by setting judgment criteria or by a model judgment method.

11. The anomaly detection method for network theft according to claim 9, characterized in that the judgment criteria include: if the latest access interval feature last_interval and the traffic feature traffic are both abnormal, or the latest access interval feature last_interval and the behavior periodicity feature period are both abnormal, or the same-sip successive access feature sip_count is abnormal, or the latest access interval feature last_interval and the same-dip connection feature dip_count are both abnormal, the connection is considered to exhibit abnormal behavior.

12. The abnormal detection method for network theft according to claim 8, characterized in that, in said step 6-2), further comprising:

Extracting the information of the victim and the information of the stealer; wherein sip is the victim and dip is the stealer; if the direction of the current connection is in, the IP address of the target of interest is suspected to be the stealer; if the direction of the current connection is out, the IP address of the target of interest is suspected to be the victim.

13. An anomaly detection system for network theft, including a data packet restoration module, a correlation judgment module, a connection determination module, a detection feature vector statistics module, a detection feature value calculation module, and an abnormality judging module; wherein,

The data packet restoration module receives the network data packet, and performs protocol restoration on the obtained network data packet;

The correlation judgment module makes a correlation judgment between the network data packet and the target of interest, discards the network data packets without correlation, and sends the relevant network data packets to the connection judgment module;

The connection determination module determines the connection to which the network data packet related to the concerned target belongs; wherein, the connection includes a TCP connection or a UDP connection;

The detection feature vector statistics module performs statistics on the detection feature vectors on all connections related to the target of interest; wherein,

The detection feature value calculation module calculates the value of each detection feature in the detection feature vector corresponding to the connection related to the target of interest according to the statistical results;

The abnormality judging module judges whether the corresponding connection is abnormal according to the value of the detection feature.