CN111291792B - Traffic data type integrated classification method and device based on dual evolution - Google Patents
- Publication number
- CN111291792B (application CN202010063154.0A)
- Authority
- CN
- China
- Prior art keywords
- population
- sub
- class
- individual
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Description
Technical field
The present invention relates to the technical field of wireless network intrusion detection, and in particular to a dual-evolution-based integrated classification method and device for traffic data types, a computer device, and a storage medium.
Background
With the spread of the Internet and the rapid development of technology, network intrusions have become increasingly frequent. As a widely used prevention technology, the intrusion detection system (IDS) has become increasingly important and has been an active research topic in recent years.
An IDS is a classifier that analyzes the type of network traffic data and can accurately identify the various attacks on a network. When an IDS is deployed on a server, data from the external network passes through the firewall to the IDS first; the IDS evaluates the type of the data and returns the result to the firewall. Normal data is allowed through the firewall to the server, while attack data is filtered out by the firewall. In essence, intrusion detection is an imbalanced classification problem: in existing network traffic data sets the number of training samples per class is uneven, and for some classes the counts differ enormously. A model trained to maximize the overall detection rate therefore usually ignores the minority classes.
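The imbalance argument above can be made concrete with a small sketch. The label counts below are made up for illustration and do not come from any data set named in this document; the helper names `accuracy` and `recall_per_class` are likewise hypothetical. The point is only that a model which always predicts the majority class can score a high overall rate while never detecting the minority attack class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its records that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical counts: 990 normal records and 10 injection-attack records.
y_true = ["normal"] * 990 + ["injection"] * 10
# A degenerate model that always predicts the majority class.
y_pred = ["normal"] * 1000

overall = accuracy(y_true, y_pred)
per_class = recall_per_class(y_true, y_pred)
print(overall, per_class)
```

Here the overall accuracy is 99% while the recall on the injection class is zero, which is the failure mode the background section describes.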
The classifiers usually deployed in an IDS are of a single type (for example, a self-organizing feature map network, abbreviated SOM), and their recognition rate for less common attack types is low.
Summary of the invention
Embodiments of the present invention provide a dual-evolution-based integrated classification method and device for traffic data types, a computer device, and a storage medium, aiming to solve the problem in the prior art that the classifiers deployed in intrusion detection systems are all of a single type and have a low recognition rate for less common attack types.
In a first aspect, an embodiment of the present invention provides a dual-evolution-based integrated classification method for traffic data types, which includes:
performing population initialization according to a preset first population size value to obtain a plurality of sub-class initialization populations; wherein the total number of individuals in each sub-class initialization population is equal to the first population size value, each individual in each sub-class initialization population corresponds to a binary sequence, and the number of features included in the binary sequence of each individual is equal to the number of features of the wireless network traffic data;
according to a preset first maximum number of generations, iteratively repeating the step of using each individual in each sub-class initialization population to perform feature selection on a training data subset of a pre-stored wireless network traffic data set and inputting the result into the to-be-trained classifier model corresponding to that sub-class initialization population, to obtain a base classifier group corresponding to each sub-class initialization population, and evolving each sub-class initialization population according to its base classifier group and a preset first optimization objective condition, to obtain a sub-class current population corresponding to each sub-class initialization population; until the base classifier group corresponding to each sub-class initialization population is obtained, the base classifier groups together forming a current classifier group;
performing population initialization according to a preset second population size value to obtain a second-type initialization population; wherein the second-type initialization population includes a plurality of individuals whose total number is equal to the second population size value, each individual corresponds to a binary sequence, and the number of features included in the binary sequence of each individual is equal to the first population size value multiplied by the total number of sub-class types corresponding to the plurality of sub-class initialization populations;
according to a preset second maximum number of generations, iteratively repeating the step of evolving each individual in the second-type initialization population according to the current classifier group and a preset second optimization objective condition, until the second-type initialization population output by the iteration is obtained;
performing base classifier selection in the current classifier group according to each individual in the output second-type initialization population, to obtain a target classifier group corresponding to each individual in the second-type initialization population, the target classifier groups forming an optimal target classifier group, and storing the optimal target classifier group.
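The second-level encoding in the steps above can be sketched as follows: an individual of the second-type population has one bit per base classifier in the current classifier group, so its length is the first population size value multiplied by the number of sub-class types. This is a minimal sketch under stated assumptions, not the patented implementation. The names `select_classifiers` and `majority_vote`, the toy sizes, and the toy classifiers are hypothetical, and combining the selected classifiers by majority vote is an assumption made here for illustration; this section does not specify the fusion rule.

```python
from collections import Counter


def select_classifiers(individual, classifier_group):
    """Keep the base classifiers whose bit in the individual is 1."""
    if len(individual) != len(classifier_group):
        raise ValueError("individual length must equal the classifier group size")
    return [clf for bit, clf in zip(individual, classifier_group) if bit == 1]


def majority_vote(classifiers, record):
    """Assumed fusion rule: return the label predicted by most selected classifiers."""
    votes = Counter(clf(record) for clf in classifiers)
    return votes.most_common(1)[0][0]


FIRST_POP_SIZE = 2   # hypothetical first population size value
N_SUBCLASSES = 3     # three sub-class types, as in the embodiment described later

# Toy stand-ins for trained base classifiers: each always returns a fixed label.
current_group = [
    lambda r: "normal", lambda r: "flooding",      # sub-class 1
    lambda r: "flooding", lambda r: "normal",      # sub-class 2
    lambda r: "injection", lambda r: "flooding",   # sub-class 3
]

# One second-type individual: FIRST_POP_SIZE * N_SUBCLASSES bits.
individual = [0, 1, 1, 0, 1, 1]
chosen = select_classifiers(individual, current_group)
prediction = majority_vote(chosen, record=[0.1, 0.2])
print(len(chosen), prediction)
```

With these toy inputs, four of the six base classifiers are selected and three of them vote for the same label.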
In a second aspect, an embodiment of the present invention provides a dual-evolution-based integrated classification device for traffic data types, which includes units for executing the dual-evolution-based integrated classification method for traffic data types described in the first aspect.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the dual-evolution-based integrated classification method for traffic data types described in the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the dual-evolution-based integrated classification method for traffic data types described in the first aspect.
Embodiments of the present invention provide a dual-evolution-based integrated classification method and device for traffic data types, a computer device, and a storage medium. An optimal target classifier group, composed of an integrated classifier group that includes at least two types of base classifiers, is deployed on the server, so that traffic data types can be classified more accurately. This avoids relying on a single type of classifier and thereby improves the recognition rate for less common attack types.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic diagram of an application scenario of the dual-evolution-based integrated classification method for traffic data types provided by an embodiment of the present invention;
Figure 2 is a schematic flowchart of the dual-evolution-based integrated classification method for traffic data types provided by an embodiment of the present invention;
Figure 3 is a schematic block diagram of the dual-evolution-based integrated classification device for traffic data types provided by an embodiment of the present invention;
Figure 4 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
Please refer to Figures 1 and 2. Figure 1 is a schematic diagram of an application scenario of the dual-evolution-based integrated classification method for traffic data types provided by an embodiment of the present invention, and Figure 2 is a schematic flowchart of that method. The method is applied on a server and is executed by application software installed on the server.
As shown in Figure 2, the method includes steps S110 to S150.
S110. Perform population initialization according to a preset first population size value to obtain a plurality of sub-class initialization populations; wherein the total number of individuals in each sub-class initialization population is equal to the first population size value, each individual in each sub-class initialization population corresponds to a binary sequence, and the number of features included in the binary sequence of each individual is equal to the number of features of the wireless network traffic data.
To make the technical solution of the present application easier to understand, the terminals involved are introduced below. The present application describes the technical solution from the perspective of the server.
The first is the client, which can be understood as a user terminal. The user terminal may be an Internet-of-Things electronic device with a communication function, such as a smartphone, tablet computer, laptop computer, desktop computer, personal digital assistant or wearable device. The user terminal generates wireless network traffic data (for example, when it is connected to the switch over a wireless link such as Bluetooth or Wi-Fi), which is uploaded to the server after passing through the switch.
The second is the switch, which provides a dedicated electrical signal path for any two network nodes connected to it. For example, after user terminal 1 and user terminal 2 are connected to the switch, user terminal 1 has exclusive use of one electrical signal path and user terminal 2 has exclusive use of another.
The third is the server. The firewall is deployed on the server, and the IDS (intrusion detection system) is deployed in the firewall. The IDS in the firewall determines the data attack type of the wireless network traffic data uploaded by a user terminal. If the uploaded wireless network traffic data is of an attack type, the firewall intercepts it; if it is of a non-attack type, the firewall lets it through.
In this embodiment, to obtain on the server a target classifier group that accurately classifies the data attack type of wireless network traffic data, the server first performs population initialization according to the preset first population size value to obtain a plurality of sub-class initialization populations. Several sub-class initialization populations are initialized so that each of them can select features of the training data subset of the wireless network traffic data set and feed them into its corresponding to-be-trained classifier model, yielding a base classifier group for each sub-class initialization population. The result is a heterogeneous base classifier group that integrates several kinds of classifiers. When this heterogeneous base classifier group classifies wireless network traffic data to be detected, the resulting data attack type is one of normal operation, flooding attack, injection attack, or impersonation attack. The several kinds of classifiers in the heterogeneous base classifier group can thus classify wireless network traffic data more accurately, avoiding the low recognition rate for less common attack types that comes with using a single type of classifier.
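The four result types named above, together with the intercept-or-pass behaviour of the firewall described earlier, can be sketched as a small decision function. This is an illustrative sketch only: the label strings, the `firewall_decision` name, and the return values are assumptions, not identifiers from the patent.

```python
# Assumed label strings for the four data attack type results.
NORMAL_LABEL = "normal"
ATTACK_LABELS = {"flooding", "injection", "impersonation"}


def firewall_decision(predicted_label):
    """Map the IDS classification result to a firewall action."""
    if predicted_label == NORMAL_LABEL:
        return "pass"    # non-attack data is let through to the server
    if predicted_label in ATTACK_LABELS:
        return "block"   # attack data is intercepted
    raise ValueError(f"unknown label: {predicted_label!r}")


print(firewall_decision("normal"), firewall_decision("injection"))
```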
For example, suppose one record of wireless network traffic data has 95 attributes (that is, the record has 95 fields, each with a corresponding field value, so the number of features of the wireless network traffic data is 95). The total number of individuals in each sub-class initialization population is equal to the first population size value, and each individual in each sub-class initialization population corresponds to a binary sequence. The number of features included in the binary sequence of each individual is equal to the number of features of the wireless network traffic data, so each binary sequence also has 95 positions. Each position takes the value 0 or 1: 0 means the feature at that position is dropped, and 1 means the feature at that position is selected.
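The initialization described above can be sketched as follows. The 95 features come from the example in the text and the three sub-classes from the embodiment described later; the population size of 10, the uniform random bit choice, the seed, and the function name `init_population` are assumptions for illustration, since this section does not say how the initial bits are drawn.

```python
import random


def init_population(pop_size, n_features, rng):
    """Create pop_size individuals, each a binary sequence with one bit per feature."""
    return [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]


N_FEATURES = 95      # from the example in the text
POP_SIZE = 10        # hypothetical first population size value
N_SUBCLASSES = 3     # three sub-class initialization populations

rng = random.Random(0)  # fixed seed so the sketch is repeatable
populations = [init_population(POP_SIZE, N_FEATURES, rng) for _ in range(N_SUBCLASSES)]

# Number of selected features (bits equal to 1) in the first individual.
first_individual = populations[0][0]
print(len(populations), len(populations[0]), len(first_individual), sum(first_individual))
```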
Once the plurality of sub-class initialization populations have been initialized, each individual of each sub-class initialization population can be used to select features from, and thereby reduce, every record in the wireless network traffic data set used in the subsequent steps.
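Feature selection with a binary individual amounts to keeping the field values whose bit is 1. A minimal sketch, using a made-up five-field record instead of the 95 fields of the example; the function name `select_features` is hypothetical.

```python
def select_features(record, individual):
    """Keep the values of record at positions where the individual's bit is 1."""
    if len(record) != len(individual):
        raise ValueError("record and individual must have the same length")
    return [value for value, bit in zip(record, individual) if bit == 1]


# Made-up record with five fields and an individual selecting fields 0, 2 and 3.
record = [0.5, 12, 3, 7.1, 0]
individual = [1, 0, 1, 1, 0]
reduced = select_features(record, individual)
print(reduced)
```

Applying the same individual to every record of the training data subset gives the reduced data set on which that individual's base classifier is trained.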
S120. According to a preset first maximum number of generations, iteratively repeat the step of using each individual in each sub-class initialization population to perform feature selection on a training data subset of a pre-stored wireless network traffic data set and inputting the result into the to-be-trained classifier model corresponding to that sub-class initialization population, to obtain a base classifier group corresponding to each sub-class initialization population, and evolving each sub-class initialization population according to its base classifier group and a preset first optimization objective condition, to obtain a sub-class current population corresponding to each sub-class initialization population; until the base classifier group corresponding to each sub-class initialization population is obtained, the base classifier groups together forming a current classifier group.
In this embodiment, to accurately obtain the data attack type result for each record in the to-be-detected wireless network traffic data set sent by the client, the server evolves the plurality of sub-class initialization populations at the same time, each toward the first optimization objective condition and bounded by the first maximum number of generations. This yields the base classifier group corresponding to each sub-class initialization population, and these base classifier groups together form the current classifier group. The current classifier group is an integrated classifier group that includes at least two types of base classifiers, which avoids using a single type of classifier and thereby improves the recognition rate for less common attack types.
When each sub-class initialization population evolves according to its base classifier group and the preset first optimization objective condition, the first optimization objective condition can be understood with reference to the process described below, in which a first multi-objective strategy obtains a first target value set corresponding to the first classification result set of each individual in the first sub-class mixed population, a second target value set corresponding to the second classification result set of each individual in the second sub-class mixed population, and a third target value set corresponding to the third classification result set of each individual in the third sub-class mixed population.
In one embodiment, the total number of sub-class types corresponding to the plurality of sub-class initialization populations in step S120 is 3, and the three sub-class initialization populations are a first sub-class initialization population, a second sub-class initialization population and a third sub-class initialization population. The total number of individuals in the first sub-class initialization population is equal to the first population size value, each individual in the first sub-class initialization population corresponds to a binary sequence, and the number of features included in that binary sequence is equal to the number of features of the wireless network traffic data; the total number of individuals in the second sub-class initialization population is equal to the first population size value, each individual in the second sub-class initialization population corresponds to a binary sequence, and the number of features included in that binary sequence is equal to the number of features of the wireless network traffic data; the total number of individuals in the third sub-class initialization population is equal to the first population size value, each individual in the third sub-class initialization population corresponds to a binary sequence, and the number of features included in that binary sequence is equal to the number of features of the wireless network traffic data.
In this embodiment, three sub-class initialization populations are used because, in this implementation, forming the current classifier group from three types of base classifiers achieves a better classification effect. To obtain the base classifier group corresponding to each sub-class initialization population through population evolution, the first, second and third sub-class initialization populations are all initialized at the same time according to the first population size value.
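How three sub-class populations of equal size give a current classifier group with three times as many base classifiers can be sketched as below. This is a sketch under stated assumptions, not the patented training procedure: `MajorityClassStub` is a hypothetical stand-in that merely records which kind of model it represents and predicts the most common training label; a real system would train the K-nearest-neighbor, support vector machine and self-organizing feature map learners listed in the sub-steps of S120. The sizes, seed and data are made up.

```python
import random
from collections import Counter


class MajorityClassStub:
    """Stand-in for a real learner: predicts the most common training label."""

    def __init__(self, kind, mask):
        self.kind = kind      # which model type this stub stands in for
        self.mask = mask      # the individual (binary sequence) used for feature selection
        self.label = None

    def fit(self, rows, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.label


def train_group(kind, population, rows, labels):
    """Train one base classifier per individual on that individual's selected features."""
    group = []
    for mask in population:
        selected = [[v for v, bit in zip(row, mask) if bit] for row in rows]
        group.append(MajorityClassStub(kind, mask).fit(selected, labels))
    return group


rng = random.Random(1)
POP_SIZE, N_FEATURES = 4, 6   # hypothetical sizes
rows = [[rng.random() for _ in range(N_FEATURES)] for _ in range(20)]
labels = ["normal"] * 15 + ["flooding"] * 5   # made-up training labels

populations = {
    kind: [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP_SIZE)]
    for kind in ("knn", "svm", "som")
}
current_group = [
    clf for kind, pop in populations.items() for clf in train_group(kind, pop, rows, labels)
]
print(len(current_group), Counter(clf.kind for clf in current_group))
```

The evolution of each population between generations is omitted here; the sketch covers only the per-generation training step and the composition of the current classifier group.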
在一实施例中,步骤S120包括:In an embodiment, step S120 includes:
获取第一当前迭代代数,判断所述第一当前迭代代数是否达到所述第一最大迭代代数;Obtain the first current iteration generation and determine whether the first current iteration generation reaches the first maximum iteration generation;
若所述第一当前迭代代数未达到所述第一最大迭代代数,获取预先存储的无线网络流量数据集中的训练数据子集;If the first current iteration generation does not reach the first maximum iteration generation, obtain a training data subset in the pre-stored wireless network traffic data set;
performing feature selection on the training data subset with each individual in the first subclass initialization population and inputting each result into a K-nearest-neighbor model to be trained, to obtain a K-nearest-neighbor model classifier group; wherein the total number of base classifiers in the K-nearest-neighbor model classifier group equals the first population size value;
performing feature selection on the training data subset with each individual in the second subclass initialization population and inputting each result into a support vector machine model to be trained, to obtain a support vector machine model classifier group; wherein the total number of base classifiers in the support vector machine model classifier group equals the first population size value;
performing feature selection on the training data subset with each individual in the third subclass initialization population and inputting each result into a self-organizing feature map network to be trained, to obtain a self-organizing feature map network classifier group; wherein the total number of base classifiers in the self-organizing feature map network classifier group equals the first population size value;
forming the current classifier group from the K-nearest-neighbor model classifier group, the support vector machine model classifier group, and the self-organizing feature map network classifier group; wherein the total number of base classifiers in the current classifier group is 3 times the first population size value;
performing simulated binary crossover and polynomial mutation on the first subclass initialization population to obtain a first subclass sub-population with the same total number of individuals as the first subclass initialization population;
performing simulated binary crossover and polynomial mutation on the second subclass initialization population to obtain a second subclass sub-population with the same total number of individuals as the second subclass initialization population;
performing simulated binary crossover and polynomial mutation on the third subclass initialization population to obtain a third subclass sub-population with the same total number of individuals as the third subclass initialization population;
merging the first subclass initialization population with the first subclass sub-population to obtain a first subclass mixed population;
merging the second subclass initialization population with the second subclass sub-population to obtain a second subclass mixed population;
merging the third subclass initialization population with the third subclass sub-population to obtain a third subclass mixed population;
obtaining the verification data subset of the pre-stored wireless network traffic data set;
performing feature selection on the verification data subset with each individual in the first subclass mixed population and inputting each result into the base classifiers of the K-nearest-neighbor model classifier group, to obtain a first classification result set corresponding to each individual in the first subclass mixed population;
performing feature selection on the verification data subset with each individual in the second subclass mixed population and inputting each result into the base classifiers of the support vector machine model classifier group, to obtain a second classification result set corresponding to each individual in the second subclass mixed population;
performing feature selection on the verification data subset with each individual in the third subclass mixed population and inputting each result into the base classifiers of the self-organizing feature map network classifier group, to obtain a third classification result set corresponding to each individual in the third subclass mixed population;
invoking a pre-stored first multi-objective strategy, and using the first classification result set corresponding to each individual in the first subclass mixed population, the second classification result set corresponding to each individual in the second subclass mixed population, and the third classification result set corresponding to each individual in the third subclass mixed population as respective inputs of the first multi-objective strategy, to obtain a first target value set for each individual in the first subclass mixed population, a second target value set for each individual in the second subclass mixed population, and a third target value set for each individual in the third subclass mixed population;
performing non-dominated sorting separately on the individuals in the first subclass mixed population according to the corresponding first target value sets, on the individuals in the second subclass mixed population according to the corresponding second target value sets, and on the individuals in the third subclass mixed population according to the corresponding third target value sets, to obtain a first subclass non-dominated solution set and a first subclass multi-layer solution set, a second subclass non-dominated solution set and a second subclass multi-layer solution set, and a third subclass non-dominated solution set and a third subclass multi-layer solution set; wherein the first subclass non-dominated solution set is denoted $Q_{11}$, the first subclass multi-layer solution set includes multiple solution subsets denoted $Q_{12}$ to $Q_{1X}$, the union of $Q_{11}$ to $Q_{1X}$ is the first subclass mixed population, the intersection of any two of $Q_{11}$ to $Q_{1X}$ is the empty set, and $Q_{11} \geq Q_{12} \geq Q_{13} \geq \dots \geq Q_{1X}$; the second subclass non-dominated solution set is denoted $Q_{21}$, the second subclass multi-layer solution set includes multiple solution subsets denoted $Q_{22}$ to $Q_{2Y}$, the union of $Q_{21}$ to $Q_{2Y}$ is the second subclass mixed population, the intersection of any two of $Q_{21}$ to $Q_{2Y}$ is the empty set, and $Q_{21} \geq Q_{22} \geq Q_{23} \geq \dots \geq Q_{2Y}$; the third subclass non-dominated solution set is denoted $Q_{31}$, the third subclass multi-layer solution set includes multiple solution subsets denoted $Q_{32}$ to $Q_{3Z}$, the union of $Q_{31}$ to $Q_{3Z}$ is the third subclass mixed population, the intersection of any two of $Q_{31}$ to $Q_{3Z}$ is the empty set, and $Q_{31} \geq Q_{32} \geq Q_{33} \geq \dots \geq Q_{3Z}$;
merging, in order, the first subclass non-dominated solution set and the solution subsets of the first subclass multi-layer solution set until the total number of individuals equals the first population size value, to form a first subclass current population, and taking the first subclass current population as the first subclass initialization population;
merging, in order, the second subclass non-dominated solution set and the solution subsets of the second subclass multi-layer solution set until the total number of individuals equals the first population size value, to form a second subclass current population, and taking the second subclass current population as the second subclass initialization population;
merging, in order, the third subclass non-dominated solution set and the solution subsets of the third subclass multi-layer solution set until the total number of individuals equals the first population size value, to form a third subclass current population, and taking the third subclass current population as the third subclass initialization population;
incrementing the first current iteration generation by one to obtain the updated first current iteration generation, and returning to the step of determining whether the first current iteration generation has reached the first maximum iteration generation;
if the first current iteration generation has reached the first maximum iteration generation, obtaining the K-nearest-neighbor model classifier group corresponding to the first subclass initialization population, the support vector machine model classifier group corresponding to the second subclass initialization population, and the self-organizing feature map network classifier group corresponding to the third subclass initialization population, and forming the current classifier group from these three classifier groups.
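The step of merging ranked solution sets until the population size is reached can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_next_population` and the use of integer indices for individuals are assumptions, and the rule for truncating a solution set that does not fit entirely (keep its first members) is also an assumption, since the text does not specify one.

```python
from typing import List, Sequence


def select_next_population(fronts: Sequence[Sequence[int]], pop_size: int) -> List[int]:
    """Merge ranked solution sets in order until pop_size individuals are collected.

    `fronts` lists individual indices, best-ranked set first (Q11, Q12, ...).
    """
    selected: List[int] = []
    for front in fronts:
        room = pop_size - len(selected)
        if room <= 0:
            break
        # Assumption: if a set does not fit entirely, keep only its first
        # `room` members; the source text gives no tie-break rule.
        selected.extend(list(front)[:room])
    return selected


# Hypothetical mixed population of 6 individuals sorted into three sets.
fronts = [[4, 0], [2, 5, 1], [3]]
next_population = select_next_population(fronts, pop_size=3)
```

The selected indices then play the role of the subclass current population that replaces the subclass initialization population for the next generation.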
In this embodiment, a wireless network traffic data set is pre-stored in the server (for example, the AWID data set is a commonly used wireless network traffic data set). The data set includes a training data subset (used to train the classifiers) and a verification data subset (used to verify the classification performance of the trained classifiers). To speed up training of the K-nearest-neighbor model to be trained (the KNN model), the support vector machine model to be trained (the SVM model), and the self-organizing feature map network to be trained (the SOM model), feature selection is performed on the training data subset with each individual in the first subclass initialization population before training the KNN model, with each individual in the second subclass initialization population before training the SVM model, and with each individual in the third subclass initialization population before training the SOM model. The role of each individual in the three subclass initialization populations is to perform feature selection on each training record in the training data subset (for example, each training record has 95 attributes), removing redundant features (that is, redundant attributes). This reduces the dimensionality of each training record and thereby increases the training speed of the classifiers.
For example, denote the individuals in the first subclass initialization population as $A_1$ to $A_m$, where m takes values in {1, 2, …, X1} and X1 equals the first population size value; denote the wireless network traffic data in the wireless network traffic data set as $B_1$ to $B_N$, where N takes values in {1, 2, …, X2} and X2 equals the total number of wireless network traffic data records in the data set.
Take individual $A_1$ in the first subclass initialization population as an example of feature selection on the wireless network traffic data $B_1$ to $B_N$. Suppose $B_1$ has 95 field values and $A_1$ is a binary sequence with 95 values, in which only the first through fifth bits are 1 and all other bits are 0. After feature selection is applied to $B_1$ to $B_N$ according to $A_1$, the corresponding simplified wireless network traffic data (simplified intrusion detection data) $A_1B_1$ to $A_1B_N$ are obtained, in which only the first through fifth fields retain their original values and all other fields are set to 0. Feature selection and simplification by any other individual follow the same process.
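The masking step in this example can be sketched in a few lines. The helper name `apply_mask` and the sample field values are hypothetical; only the 95-field width and the "first five bits set" individual come from the text.

```python
from typing import List, Sequence


def apply_mask(record: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Keep fields whose mask bit is 1 and set all other fields to 0."""
    if len(record) != len(mask):
        raise ValueError("record and mask must have the same length")
    return [value if bit == 1 else 0.0 for value, bit in zip(record, mask)]


NUM_FEATURES = 95

# Individual A1: only the first five bits are 1.
a1 = [1] * 5 + [0] * (NUM_FEATURES - 5)

# Hypothetical record B1 with 95 field values (1.0, 2.0, ..., 95.0).
b1 = [float(i + 1) for i in range(NUM_FEATURES)]

# Simplified record A1B1: the first five fields survive, the rest become 0.
a1b1 = apply_mask(b1, a1)
```

Zeroing the unselected fields mirrors the description above; an implementation could equally drop those columns before training, which is a design choice the text leaves open.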
In each iteration toward the final current classifier group, once the K-nearest-neighbor model classifier group, the support vector machine model classifier group, and the self-organizing feature map network classifier group have been obtained, they together form the current classifier group. For example, if the K-nearest-neighbor model classifier group has X1 K-nearest-neighbor model classifiers, the support vector machine model classifier group has X1 support vector machine model classifiers, and the self-organizing feature map network classifier group has X1 self-organizing feature map network classifiers, then the current classifier group contains 3×X1 base classifiers (that is, the total number of base classifiers in the current classifier group is 3×X1).
To evolve the first, second, and third subclass initialization populations, simulated binary crossover and polynomial mutation are applied to each of the three populations separately. This yields a first subclass sub-population with the same total number of individuals as the first subclass initialization population, a second subclass sub-population with the same total number of individuals as the second subclass initialization population, and a third subclass sub-population with the same total number of individuals as the third subclass initialization population.
For example, to obtain the first subclass sub-population, two individuals are repeatedly selected at random from the first subclass initialization population and crossed over until X1 new individuals have been generated; polynomial mutation is then applied to these X1 new individuals, and the mutated individuals form the first subclass sub-population. The second and third subclass initialization populations are processed in the same way. Binary crossover and polynomial mutation are conventional operations and are not described further here.
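The offspring-generation loop described above can be illustrated with a short sketch. Note the substitution: simulated binary crossover and polynomial mutation are usually formulated for real-valued encodings, so this sketch swaps in single-point crossover and bit-flip mutation, which act directly on 0/1 sequences. It is an illustration under that assumption, not the operators named in the text; `make_offspring` and `mutation_rate` are hypothetical names.

```python
import random
from typing import List

Individual = List[int]


def make_offspring(parents: List[Individual], rng: random.Random,
                   mutation_rate: float = 0.05) -> List[Individual]:
    """Produce as many children as there are parents (X1 in the text)."""
    size = len(parents)
    children: List[Individual] = []
    while len(children) < size:
        # Pick two distinct parents at random and cross them at one cut point.
        p1, p2 = rng.sample(parents, 2)
        cut = rng.randrange(1, len(p1))
        children.append(p1[:cut] + p2[cut:])
        children.append(p2[:cut] + p1[cut:])
    children = children[:size]
    # Bit-flip mutation: each bit is inverted with probability mutation_rate.
    return [[1 - bit if rng.random() < mutation_rate else bit for bit in child]
            for child in children]


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(95)] for _ in range(6)]
sub_population = make_offspring(population, rng)

# Mixed population: parents and children together (2 * X1 individuals).
mixed_population = population + sub_population
```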
Each initialization population is then merged with its sub-population to obtain the first, second, and third subclass mixed populations. To select the better-performing individuals in each mixed population for the next generation, the verification data subset of the pre-stored wireless network traffic data set is obtained. Feature selection is performed on the verification data subset with each individual in the first subclass mixed population, and each result is input into the base classifiers of the K-nearest-neighbor model classifier group to obtain a first classification result set corresponding to each individual in the first subclass mixed population. Likewise, feature selection with each individual in the second subclass mixed population followed by the base classifiers of the support vector machine model classifier group yields a second classification result set for each individual in the second subclass mixed population, and feature selection with each individual in the third subclass mixed population followed by the base classifiers of the self-organizing feature map network classifier group yields a third classification result set for each individual in the third subclass mixed population. Each verification record in the verification data subset also has 95 attributes, and each individual in the three subclass mixed populations also has 95 binary values (each either 0 or 1).
Like the training data subset, the verification data subset of the wireless network traffic data set pre-stored in the server contains four types of wireless network traffic data subsets: a normal operation subset, a flooding attack subset, an injection attack subset, and an impersonation attack subset. For each verification record in these four subsets (each verification record corresponds to one wireless network traffic data record), the true type of the classification result is a positive example.
Each verification record in the verification data subset is feature-selected and simplified according to each individual in the first, second, and third subclass mixed populations. This yields a first subclass simplified verification data set corresponding to each individual in the first subclass mixed population (each simplified verification record in it corresponds to one simplified wireless network traffic data record), a second subclass simplified verification data set corresponding to each individual in the second subclass mixed population, and a third subclass simplified verification data set corresponding to each individual in the third subclass mixed population. For each simplified verification record in these three data sets, the true type of the classification result is a positive example.
When the pre-stored first multi-objective strategy is invoked to obtain the first, second, and third target value sets from the first, second, and third classification result sets, each classification result is the prediction produced by a classifier for a simplified verification record, and each simplified verification record also has a true result. The agreement between predicted and true results can therefore serve as the optimization objective of the first multi-objective strategy. The populations evolve under this strategy until they converge or a stopping condition is reached, which selects the final first, second, and third subclass initialization populations; these three populations together form the optimal feature subset.
In an embodiment, the formula corresponding to the first multi-objective strategy is $J_m(k) = TP_{mk} / (TP_{mk} + FN_{mk} + FP_{mk})$; where k takes values in {1, 2, 3, 4}; m takes values in {1, 2, …, L}, with L equal to the first population size value; $J_m(k)$ is the target value of the m-th individual for the k-th type of simplified wireless network traffic data, and $J_m(1)$, $J_m(2)$, $J_m(3)$, $J_m(4)$ form the target value set of the m-th individual; $TP_{mk}$ is the first total count, for the m-th individual and the k-th type of simplified wireless network traffic data, of records whose true type is positive and whose predicted type is positive; $FN_{mk}$ is the second total count, for the m-th individual and the k-th type, of records whose true type is positive and whose predicted type is negative; $FP_{mk}$ is the third total count, for the m-th individual and the k-th type, of records whose true type is negative and whose predicted type is positive;
the step of using the first classification result set corresponding to each individual in the first subclass mixed population, the second classification result set corresponding to each individual in the second subclass mixed population, and the third classification result set corresponding to each individual in the third subclass mixed population as respective inputs of the first multi-objective strategy, to obtain the corresponding first, second, and third target value sets, includes:
obtaining, according to $J_m(k) = TP_{mk} / (TP_{mk} + FN_{mk} + FP_{mk})$, the first target value set corresponding to the first classification result set of each individual in the first subclass mixed population, the second target value set corresponding to the second classification result set of each individual in the second subclass mixed population, and the third target value set corresponding to the third classification result set of each individual in the third subclass mixed population.
Because both the training data subset and the verification data subset of the wireless network traffic data set contain four types of data (normal operation, flooding attack, injection attack, and impersonation attack), each individual has four target values. Using the formula of the first multi-objective strategy, $J_m(k)$ is computed for each type as a target value. Each individual is thus a binary sequence with four target values, and $J_m(1)$, $J_m(2)$, $J_m(3)$, $J_m(4)$ form the target value set of the m-th individual. The target value set of each individual provides the basis for subsequently selecting better-performing individuals.
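A small sketch shows how the four target values of one individual could be computed from true and predicted labels using $J_m(k) = TP_{mk} / (TP_{mk} + FN_{mk} + FP_{mk})$. The label strings, the function name `objective_values`, and the convention of returning 0.0 when the denominator is zero are assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical label names for the four traffic types (k = 1..4).
CLASSES = ("normal", "flooding", "injection", "impersonation")


def objective_values(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str] = CLASSES) -> List[float]:
    """Return [J(1), J(2), J(3), J(4)] for one individual's predictions."""
    values: List[float] = []
    for label in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        denom = tp + fn + fp
        # Assumption: a class that never appears yields 0.0 rather than an error.
        values.append(tp / denom if denom else 0.0)
    return values


y_true = ["normal", "normal", "flooding", "injection", "impersonation", "flooding"]
y_pred = ["normal", "flooding", "flooding", "injection", "normal", "flooding"]
targets = objective_values(y_true, y_pred)
```

For this toy input the normal class has one true positive, one false negative, and one false positive, giving 1/3; flooding gives 2/3, injection 1.0, and impersonation 0.0.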
After the first target value sets of the individuals in the first subclass mixed population, the second target value sets of the individuals in the second subclass mixed population, and the third target value sets of the individuals in the third subclass mixed population have been obtained, non-dominated sorting is performed on each of the three mixed populations. This yields the first subclass non-dominated solution set and first subclass multi-layer solution set, the second subclass non-dominated solution set and second subclass multi-layer solution set, and the third subclass non-dominated solution set and third subclass multi-layer solution set.
When non-dominated sorting is performed on the individuals of a population, the non-dominated solution set of that population is obtained by identifying its non-dominated solutions (also called Pareto solutions). The definition is as follows: for any two solutions S1 and S2, if S1 is at least as good as S2 on every objective and strictly better than S2 on at least one objective, then S1 is said to dominate S2. If S1 is not dominated by any other solution, S1 is called a non-dominated solution, also known as a Pareto solution.
For example, the first sub-class non-dominated solution set is denoted $Q_{11}$, and the first sub-class multi-layer solution set comprises multiple solution subsets denoted $Q_{12}$ to $Q_{1X}$. The union of $Q_{11}$ to $Q_{1X}$ is the first sub-class mixed population, the intersection of any two of these sets is empty, and $Q_{11} \geq Q_{12} \geq Q_{13} \geq \dots \geq Q_{1X}$. The second sub-class non-dominated solution set is denoted $Q_{21}$, and the second sub-class multi-layer solution set comprises multiple solution subsets denoted $Q_{22}$ to $Q_{2Y}$. The union of $Q_{21}$ to $Q_{2Y}$ is the second sub-class mixed population, the intersection of any two of these sets is empty, and $Q_{21} \geq Q_{22} \geq Q_{23} \geq \dots \geq Q_{2Y}$. The third sub-class non-dominated solution set is denoted $Q_{31}$, and the third sub-class multi-layer solution set comprises multiple solution subsets denoted $Q_{32}$ to $Q_{3Z}$. The union of $Q_{31}$ to $Q_{3Z}$ is the third sub-class mixed population, the intersection of any two of these sets is empty, and $Q_{31} \geq Q_{32} \geq Q_{33} \geq \dots \geq Q_{3Z}$. Here "$\geq$" denotes the dominance relation between sets: $Q_i \geq Q_j$ means that solutions in $Q_i$ dominate the solutions in $Q_j$. With respect to $J_m(1)$ to $J_m(4)$, $Q_1 \geq Q_2$ means that every solution in $Q_2$ is dominated by at least one solution in $Q_1$. The relation is transitive: every solution in $Q_3$ is dominated by at least one solution in $Q_1$ or $Q_2$, and so on for the remaining sets.
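The dominance test and the layering into a non-dominated set followed by further layers can be sketched as below. This is a minimal illustration, assuming every objective is to be maximized (as with the per-class target values); the function names `dominates` and `non_dominated_sort` are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one (assumption: larger is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated_sort(objectives: List[Sequence[float]]) -> List[List[int]]:
    """Split individual indices into disjoint layers.
    The first layer is the non-dominated solution set; later layers
    play the role of the multi-layer solution set."""
    remaining = set(range(len(objectives)))
    layers: List[List[int]] = []
    while remaining:
        # An individual joins the current layer if nothing still
        # remaining dominates it.
        layer = sorted(
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        )
        layers.append(layer)
        remaining -= set(layer)
    return layers
```

The layers are pairwise disjoint and their union is the whole population, matching the set properties stated above. This simple version re-scans the remaining individuals for every layer, which is adequate for small populations.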
Once these non-dominated solution sets and multi-layer solution sets have been obtained for the first, second, and third target value sets, X1 individuals are selected from the first sub-class non-dominated solution set and first sub-class multi-layer solution set to form the first sub-class current population; X1 individuals are selected from the second sub-class non-dominated solution set and second sub-class multi-layer solution set to form the second sub-class current population; and X1 individuals are selected from the third sub-class non-dominated solution set and third sub-class multi-layer solution set to form the third sub-class current population. This completes the current iteration round for obtaining the current classifier group.
For example, the first sub-class non-dominated solution set is denoted $Q_{11}$, and the first sub-class multi-layer solution set comprises multiple solution subsets denoted $Q_{12}$ to $Q_{1X}$. The sets $Q_{11}$, $Q_{12}$, ..., $Q_{1X}$ are merged in order, collecting individuals until the total number of individuals equals the first population size value X1. Suppose $Q_{11}$ contains 3 individuals, $Q_{12}$ contains 4, $Q_{13}$ contains 5, and $Q_{14}$ contains 6, and 3+4+5+6=X1; then all individuals in $Q_{11}$ to $Q_{14}$ form the first sub-class current population.
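The in-order merging of layers can be sketched as follows. The text only covers the case where whole layers add up exactly to the population size; truncating an overflowing layer in its stored order is an assumption made here for completeness, and `fill_population` is a hypothetical name.

```python
from typing import List


def fill_population(layers: List[List[int]], size: int) -> List[int]:
    """Merge layers in order (non-dominated set first) until `size`
    individuals have been collected."""
    selected: List[int] = []
    for layer in layers:
        room = size - len(selected)
        if room <= 0:
            break
        # Assumption: if a layer does not fit entirely, keep only its
        # first `room` members; the source does not specify this case.
        selected.extend(layer[:room])
    return selected


# Layer sizes 3, 4, 5 and 6 as in the example, plus one extra layer.
layers = [list(range(0, 3)), list(range(3, 7)), list(range(7, 12)),
          list(range(12, 18)), list(range(18, 25))]
current_population = fill_population(layers, 18)  # X1 = 3 + 4 + 5 + 6
```

With X1 = 18 the first four layers are taken whole and the fifth layer is not used.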
S130. Perform population initialization according to a preset second population size value to obtain a second-type initialization population. The second-type initialization population comprises multiple individuals whose total number equals the second population size value; each individual corresponds to a binary sequence, and the number of features in each individual's binary sequence equals the first population size value multiplied by the total number of sub-class types of the multiple sub-class initialization populations.
In this embodiment, in order to select target classifiers with better classification performance from the current classifier group and thereby form a target classifier group, a second-type initialization population is first initialized. The total number of individuals in the second-type initialization population equals the second population size value; each individual corresponds to a binary sequence, and the number of features in that sequence equals the first population size value multiplied by the total number of sub-class types of the multiple sub-class initialization populations. For example, if the total number of sub-class types is 3 and the first population size value is X1, the binary sequence of each individual in the second-type initialization population contains 3×X1 features. Each of these 3×X1 features takes the value 0 or 1, where 0 means the base classifier at the corresponding position is removed and 1 means it is selected.
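As a small illustration of this encoding, the sketch below applies one individual's binary sequence to a list of base classifiers. The classifiers are stood in for by string labels and X1 = 3 is chosen arbitrarily; `select_base_classifiers` is a hypothetical helper, not a name from the patent.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_base_classifiers(mask: Sequence[int], classifier_group: Sequence[T]) -> List[T]:
    """Keep the base classifiers whose position holds a 1 in the mask."""
    if len(mask) != len(classifier_group):
        raise ValueError("mask length must equal the number of base classifiers")
    return [clf for bit, clf in zip(mask, classifier_group) if bit == 1]


# Illustrative values: 3 sub-class types and X1 = 3 give 3 * X1 = 9 positions.
classifier_group = [f"C{i}" for i in range(1, 10)]
mask = [1, 0, 1, 0, 0, 0, 0, 1, 1]
selected = select_base_classifiers(mask, classifier_group)
```

Here positions 1, 3, 8, and 9 are set, so the selected group consists of those four base classifiers.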
S140. According to a preset second maximum iteration generation, iteratively repeat the step in which each individual of the second-type initialization population evolves according to the current classifier group and a preset second optimization target condition, until the iteratively output second-type initialization population is obtained.
In this embodiment, the optimized selection of classifiers from the current classifier group, which contains multiple base classifiers, also serves to remove redundant base classifiers and improve classification accuracy on the minority classes. For example, the current classifier group contains 3×X1 base classifiers, of which the i-th (i=1, 2, ..., 3×X1) is $C_i$, and the current classifier group containing these 3×X1 base classifiers is denoted AC. A population is initialized in which each individual encodes whether each base classifier is selected. The detection rate of each category (normal operation, flooding attack, injection attack, and impersonation attack) and the proportion of selected base classifiers among all generated base classifiers are used as the evolutionary objectives. The evolutionary algorithm strategies are applied until the iterations are exhausted or the target values converge, and the optimal classifier group thus selected is denoted SC.
In the iteratively repeated step in which each individual of the second-type initialization population evolves according to the current classifier group and the preset second optimization target condition, the second optimization target condition corresponds to the process, described below, in which the second multi-objective strategy yields the individual target value set of each individual in the second-type mixed population.
In one embodiment, the step of iteratively repeating, according to the preset second maximum iteration generation, the evolution of each individual of the second-type initialization population according to the current classifier group and the preset second optimization target condition, until the iteratively output second-type initialization population is obtained, comprises:
obtaining a second current iteration generation, and determining whether the second current iteration generation has reached the second maximum iteration generation;
if the second current iteration generation has not reached the second maximum iteration generation, obtaining the current classifier group;
performing simulated binary crossover and polynomial mutation on the second-type initialization population to obtain a second-type sub-population with the same total number of individuals as the second-type initialization population;
merging the second-type initialization population with the second-type sub-population to obtain a second-type mixed population;
performing base classifier selection on the current classifier group with each individual of the second-type mixed population, to obtain the selected classifier group corresponding to each individual of the second-type mixed population;
obtaining the verification data subset of the pre-stored wireless network traffic data set;
inputting the verification data subset into the selected classifier group corresponding to each individual of the second-type mixed population, to obtain the individual classification result set corresponding to each individual of the second-type mixed population;
calling a pre-stored second multi-objective strategy and using the individual classification result set of each individual of the second-type mixed population as its input, to obtain the individual target value set corresponding to each individual of the second-type mixed population;
performing non-dominated sorting on the individuals of the second-type mixed population according to their individual target value sets, to obtain the non-dominated solution set and multi-layer solution set corresponding to the individual target value sets, where the non-dominated solution set is denoted $Q_{41}$, the multi-layer solution set comprises multiple solution subsets denoted $Q_{42}$ to $Q_{4W}$, the union of $Q_{41}$ to $Q_{4W}$ is the second-type mixed population, the intersection of any two of $Q_{41}$ to $Q_{4W}$ is empty, and $Q_{41} \geq Q_{42} \geq Q_{43} \geq \dots \geq Q_{4W}$;
merging the non-dominated solution set and the solution subsets of the multi-layer solution set in order, collecting individuals until their total number equals the second population size value, to form a second-type current population, and taking the second-type current population as the second-type initialization population;
incrementing the second current iteration generation by one, and returning to the step of determining whether the second current iteration generation has reached the second maximum iteration generation;
if the second current iteration generation has reached the second maximum iteration generation, obtaining the second-type initialization population.
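The control flow of these steps can be outlined as a generation loop. The sketch below is only a skeleton: `vary`, `evaluate`, and `select` are hypothetical callbacks standing in for crossover and mutation, for evaluation against the verification data subset, and for non-dominated sorting with in-order filling; none of them is implemented here.

```python
from typing import Callable, List, Sequence, TypeVar

Individual = TypeVar("Individual")


def evolve(
    population: List[Individual],
    max_generations: int,
    vary: Callable[[List[Individual]], List[Individual]],
    evaluate: Callable[[Individual], Sequence[float]],
    select: Callable[[List[Individual], List[Sequence[float]], int], List[Individual]],
) -> List[Individual]:
    """Repeat variation, merging, evaluation and selection until the
    maximum generation count is reached."""
    size = len(population)
    generation = 0
    while generation < max_generations:
        offspring = vary(population)        # same number of individuals as the parents
        mixed = population + offspring      # mixed population
        objectives = [evaluate(ind) for ind in mixed]
        population = select(mixed, objectives, size)  # becomes the next parent population
        generation += 1
    return population
```

Passing the three operations in as functions keeps the loop independent of how individuals are encoded, so the same skeleton fits both evolution stages under these assumptions.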
In this embodiment, the second multi-objective strategy applied to the second-type initialization population during iterative evolution comprises two formulas:
The first formula is $J_n(k) = \dfrac{TP_{nk}}{TP_{nk} + FN_{nk} + FP_{nk}}$, where k takes values in {1, 2, 3, 4}; n takes values in {1, 2, ..., L2}, with L2 equal to the second population size value; $J_n(k)$ is the target value of the n-th individual for the k-th type of simplified wireless network traffic data, and $J_n(1)$, $J_n(2)$, $J_n(3)$, $J_n(4)$ form the individual classification result set of the n-th individual; $TP_{nk}$ is the fourth total count, namely the number of samples of the k-th type for the n-th individual whose true type is positive and whose predicted type is positive; $FN_{nk}$ is the fifth total count, namely the number of samples whose true type is positive and whose predicted type is negative; $FP_{nk}$ is the sixth total count, namely the number of samples whose true type is negative and whose predicted type is positive;
The second formula is $\min S_n = |SC_n| / |AC|$, where $S_n$ is the proportion of base classifiers in the current classifier group that are selected by the n-th individual of the second-type initialization population, $|SC_n|$ is the number of base classifiers in the current classifier group selected by the n-th individual, and $|AC|$ is the total number of base classifiers in the current classifier group (for example, 3×X1).
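The two formulas can be evaluated directly from true and predicted labels, as in the sketch below. Encoding the four types as the integers 1 to 4 and returning 0.0 when a class has no true or predicted samples are assumptions made here; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def class_objective(tp: int, fn: int, fp: int) -> float:
    """J(k) = TP / (TP + FN + FP) for one class."""
    denom = tp + fn + fp
    # Assumption: return 0.0 when the class never occurs and is never predicted.
    return tp / denom if denom else 0.0


def selection_ratio(mask: Sequence[int]) -> float:
    """S = |SC| / |AC|: share of base classifiers the individual selects."""
    return sum(mask) / len(mask)


def objectives_for_individual(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    mask: Sequence[int],
    classes: Sequence[int] = (1, 2, 3, 4),
) -> Tuple[List[float], float]:
    """Return ([J(1), ..., J(4)], S) for one individual."""
    js = []
    for k in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == k and p == k)
        fn = sum(1 for t, p in pairs if t == k and p != k)
        fp = sum(1 for t, p in pairs if t != k and p == k)
        js.append(class_objective(tp, fn, fp))
    return js, selection_ratio(mask)
```

The per-class values are to be maximized while the selection ratio is to be minimized, so a dominance comparison over the combined five objectives has to treat the last one in the opposite direction.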
For example, the non-dominated solution set is denoted $Q_{41}$, and the multi-layer solution set comprises multiple solution subsets denoted $Q_{42}$ to $Q_{4W}$. The sets $Q_{41}$ to $Q_{4W}$ are merged in order, collecting individuals until their total number equals the second population size value. Suppose $Q_{41}$ contains 4 individuals, $Q_{42}$ contains 5, $Q_{43}$ contains 5, and $Q_{44}$ contains 7, and 4+5+5+7 equals the second population size value; then all individuals in $Q_{41}$ to $Q_{44}$ form the second-type current population. Here, for example, $Q_{41} \geq Q_{42}$ means that, with respect to $J_n(1)$ to $J_n(4)$ and $S_n$, every solution in $Q_{42}$ is dominated by at least one solution in $Q_{41}$. The relation is transitive: every solution in $Q_{43}$ is dominated by at least one solution in $Q_{41}$ or $Q_{42}$, and so on for the remaining sets.
More specifically, suppose $Q_{41}$ contains individuals 1 to 4 and $Q_{42}$ contains individuals 5 to 9. Then $Q_{41} \geq Q_{42}$ may mean $J_1(1) \geq J_6(1)$, $J_1(2) \geq J_6(2)$, $J_1(3) \geq J_6(3)$, $J_1(4) \geq J_6(4)$, and $S_1 \leq S_6$, where $J_1(1)$, $J_1(2)$, $J_1(3)$, $J_1(4)$, and $S_1$ form the individual target value set of individual 1 in $Q_{41}$, and $J_6(1)$, $J_6(2)$, $J_6(3)$, $J_6(4)$, and $S_6$ form the individual target value set of individual 6 in $Q_{42}$.
S150. Perform base classifier selection on the current classifier group according to each individual of the output second-type initialization population, to obtain the target classifier group corresponding to each individual of the second-type initialization population; these target classifier groups form the optimal target classifier group, which is then stored.
In this embodiment, once the evolution of the second-type initialization population has been completed on the server, base classifier selection is performed on the current classifier group according to each individual of the final second-type initialization population. This yields the target classifier group corresponding to each individual; these groups form the optimal target classifier group, which is then stored. For example, suppose individual 1 of the second-type initialization population, when performing base classifier selection on the current classifier group, selects the base classifiers at positions 1, 3, 8, and 9 among the 3×X1 base classifiers of the current classifier group. The target classifier group of individual 1 then consists of the base classifiers at positions 1, 3, 8, and 9. The target classifier groups of the other individuals in the second-type initialization population are obtained in the same way, and together these target classifier groups form the optimal target classifier group.
The optimal target classifier group can then be deployed in the IDS of the server's firewall to classify the attack type of wireless network traffic data to be detected. Because the target classifier group is an ensemble comprising at least two types of base classifier, reliance on a single type of classifier is avoided, which improves the recognition rate for less common attack types.
In one embodiment, the method further comprises, after step S150:
determining whether a wireless network traffic data set to be detected, sent by a client, has been received;
if the wireless network traffic data set to be detected has been received, classifying the data attack type of each item of wireless network traffic data in that set using the optimal target classifier group, to obtain the data attack type result corresponding to each item; the data attack type result is normal operation, flooding attack, injection attack, or impersonation attack.
In this embodiment, the optimal target classifier group is deployed in the IDS of the server's firewall. When an item of wireless network traffic data is detected arriving at the server's firewall, it is classified by the optimal target classifier group acting as an ensemble model, which gives a more accurate classification and in particular improves the recognition rate for less common attack types.
In one embodiment, the step of classifying the data attack type of each item of wireless network traffic data in the set to be detected using the optimal target classifier group, to obtain the data attack type result corresponding to each item, comprises:
classifying the data attack type of each item of wireless network traffic data to be detected using the optimal target classifier group, to obtain the target classification result of each item under each target classifier group in the optimal target classifier group;
determining whether a flooding attack appears among the target classification results of each item under the target classifier groups in the optimal target classifier group;
if a flooding attack appears among those target classification results and the total share of flooding attack in the target classification results exceeds a preset minority-class proportion, collecting the items whose target classification results contain a flooding attack into a minority-class wireless network traffic data set, and collecting the items whose target classification results contain no flooding attack into a majority-class wireless network traffic data set;
for the items in the minority-class wireless network traffic data set whose share of flooding attack in the target classification results exceeds the preset minority-class proportion, setting the classification result to flooding attack;
for each item in the majority-class wireless network traffic data set, taking the classification result that occurs most often among its target classification results under the target classifiers of the target classifier group as the data attack type result of that item;
if a flooding attack appears among the target classification results but the total share of flooding attack does not exceed the preset minority-class proportion, taking, for each item, the classification result that occurs most often among its target classification results under the target classifier groups in the optimal target classifier group as the data attack type result of that item.
In this embodiment, if no flooding attack appears among the target classification results of the wireless network traffic data to be detected under the target classifier groups in the optimal target classifier group, then for each item the classification result that occurs most often among its target classification results is taken as the data attack type result of that item.
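Applied to a single item of traffic data, the branching above reduces to the rule sketched below. This is a per-item simplification under stated assumptions: `results` holds one label per target classifier group, `minority_share` stands for the preset minority-class proportion, and ties in the majority vote go to the label seen first, which the text does not specify.

```python
from collections import Counter
from typing import Sequence

FLOODING = "flooding attack"


def classify_record(results: Sequence[str], minority_share: float) -> str:
    """Decide one item's attack type from its per-group results."""
    if not results:
        raise ValueError("at least one classification result is required")
    counts = Counter(results)
    flooding_share = counts[FLOODING] / len(results)
    # Flooding is present and its share exceeds the preset proportion.
    if counts[FLOODING] > 0 and flooding_share > minority_share:
        return FLOODING
    # Otherwise fall back to the most frequent result (majority vote).
    return counts.most_common(1)[0][0]
```

A low `minority_share` lets a small number of flooding votes override the majority, which is how the rule favors the less common class.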
To make the process of classifying the data attack type of wireless network traffic data with the optimal target classifier group clearer, a specific embodiment is described below. Suppose the second-type initialization population output after the iterations of step S140 comprises individual 1 and individual 2. The target classifier group of individual 1 consists of the base classifiers at positions 1, 3, 8, and 9 among the 3×X1 base classifiers of the current classifier group, and the target classifier group of individual 2 consists of the base classifiers at positions 2, 3, 7, and 9. Wireless network traffic data 1 is input into the target classifier group selected by individual 1, and the resulting target classification results are normal operation, normal operation, injection attack, and normal operation. Wireless network traffic data 1 is also input into the target classifier group selected by individual 2, and the resulting target classification results are normal operation, impersonation attack, normal operation, and normal operation.
Among the four classification results produced for wireless network traffic data 1 by the target classifier group of individual 1, normal operation occurs 3 times, so the data attack type result of wireless network traffic data 1 under the target classifier group of individual 1 is normal operation. Likewise, its data attack type result under the target classifier group of individual 2 is normal operation. Because the classification results under both target classifier groups are normal operation, the data attack type result of wireless network traffic data 1 is determined to be normal operation.
For example, when determining the classification result of the target classifier group of individual 1, it is also determined whether the weighted count of flooding attack results is the largest among four values: the count of normal operation results, the count of impersonation attack results, the count of injection attack results, and the weighted count of flooding attack results. The weighted count of flooding attack results is the number of flooding attack results multiplied by a preset weighting coefficient, for example 3. If the weighted count of flooding attack results is the largest of the four values, the classification result of the target classifier group of individual 1 for wireless network traffic data 1 is flooding attack. Otherwise, the classification result corresponding to the largest of the four values is taken as the classification result of the target classifier group of individual 1 for wireless network traffic data 1.
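The weighted comparison can be sketched as below, using the example coefficient of 3. Letting flooding attack win when its weighted count ties with another count is one reading of "is the largest" and is an assumption here, as is the name `weighted_vote`.

```python
from collections import Counter
from typing import Sequence

FLOODING = "flooding attack"


def weighted_vote(results: Sequence[str], flooding_weight: int = 3) -> str:
    """Pick the label with the highest count, with flooding attack
    counts multiplied by `flooding_weight`."""
    if not results:
        raise ValueError("at least one classification result is required")
    counts = Counter(results)
    scores = {
        label: n * (flooding_weight if label == FLOODING else 1)
        for label, n in counts.items()
    }
    best = max(scores.values())
    # Assumption: flooding wins whenever its weighted count reaches the maximum.
    if scores.get(FLOODING, 0) == best:
        return FLOODING
    return max(scores, key=scores.get)


# Results of one target classifier group with no flooding votes.
label = weighted_vote(["normal operation", "normal operation",
                       "injection attack", "normal operation"])
```

With no flooding votes the rule reduces to a plain majority vote, so this call yields normal operation; a single flooding vote among four results would score 3 and outweigh two votes for any other label.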
The method deploys on a server an optimal target classifier group built from integrated classifier groups that include at least two types of base classifier. Compared with using a single type of classifier, this allows traffic data types to be classified more accurately and improves the recognition rate for less common attack types.
An embodiment of the present invention further provides a dual-evolution-based traffic data type integrated classification device, which is used to execute any embodiment of the aforementioned dual-evolution-based traffic data type integrated classification method. Specifically, refer to Figure 3, which is a schematic block diagram of the dual-evolution-based traffic data type integrated classification device provided by an embodiment of the present invention. The dual-evolution-based traffic data type integrated classification device 100 may be configured in a server.
As shown in Figure 3, the dual-evolution-based traffic data type integrated classification device 100 includes a first population initialization unit 110, a first iterative evolution unit 120, a second population initialization unit 130, a second iterative evolution unit 140, and an optimal target classifier group acquisition unit 150.
The first population initialization unit 110 is configured to perform population initialization according to a preset first population size value to obtain multiple sub-class initialization populations; wherein the total number of individuals in each sub-class initialization population is equal to the first population size value, each individual in each sub-class initialization population corresponds to a binary sequence, and the number of features included in the binary sequence of each individual is equal to the number of features of the wireless network traffic data.
The first iterative evolution unit 120 is configured to, according to a preset first maximum iteration generation number, iteratively repeat the following steps: performing feature selection on the training data subset of a pre-stored wireless network traffic data set according to each individual in each sub-class initialization population and inputting the result to the to-be-trained classifier model corresponding to that sub-class initialization population, to obtain a base classifier group corresponding to each sub-class initialization population; and evolving each sub-class initialization population according to its corresponding base classifier group and a preset first optimization target condition, to obtain the sub-class current population corresponding to each sub-class initialization population. The iteration continues until the base classifier group corresponding to each sub-class initialization population is obtained, and the current classifier group is composed of the base classifier groups corresponding to the sub-class initialization populations.
The second population initialization unit 130 is configured to perform population initialization according to a preset second population size value to obtain a second-type initialization population; wherein the second-type initialization population includes multiple individuals whose total number is equal to the second population size value, each individual corresponds to a binary sequence, and the number of features included in the binary sequence of each individual is equal to the first population size value multiplied by the total number of sub-class types of the multiple sub-class initialization populations.
The second iterative evolution unit 140 is configured to, according to a preset second maximum iteration generation number, iteratively repeat the step of evolving each individual in the second-type initialization population according to the current classifier group and a preset second optimization target condition, until the second-type initialization population output by the iteration is obtained.
The optimal target classifier group acquisition unit 150 is configured to perform base classifier selection in the current classifier group according to each individual in the output second-type initialization population, to obtain the target classifier group corresponding to each individual in the second-type initialization population, so as to form an optimal target classifier group, and to store the optimal target classifier group.
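The two initialization units can be illustrated with a short sketch. It assumes that each bit of a binary sequence is drawn uniformly at random, which the text does not specify; the function names and the `seed` parameter are likewise hypothetical. What is taken from the text is the shape of the populations: each sub-class individual has one bit per traffic feature, and each second-type individual has one bit per base classifier, that is, the first population size value multiplied by the number of sub-class types.

```python
import random


def init_population(pop_size, seq_len, rng):
    """Create pop_size individuals, each a binary sequence of length seq_len."""
    return [[rng.randint(0, 1) for _ in range(seq_len)] for _ in range(pop_size)]


def init_dual_populations(first_pop_size, second_pop_size, num_features,
                          num_subclasses=3, seed=0):
    """Build the sub-class populations and the second-type population.

    Uniform random bits are an assumption made for this sketch.
    """
    rng = random.Random(seed)
    # One population per base classifier type; one bit per traffic feature.
    sub_populations = [
        init_population(first_pop_size, num_features, rng)
        for _ in range(num_subclasses)
    ]
    # One bit per base classifier in the current classifier group.
    second_population = init_population(
        second_pop_size, first_pop_size * num_subclasses, rng
    )
    return sub_populations, second_population
```

For a first population size of 4, three sub-class types, and 10 traffic features, each sub-class individual has 10 bits and each second-type individual has 12 bits.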
In an embodiment, the dual-evolution-based traffic data type integrated classification device 100 further includes:
A traffic data detection unit, configured to determine whether a to-be-detected wireless network traffic data set sent by a client has been received;
A data attack type result classification unit, configured to, if a to-be-detected wireless network traffic data set is received, perform data attack type classification on each piece of to-be-detected wireless network traffic data in the set through the optimal target classifier group, to obtain a data attack type result corresponding to each piece of to-be-detected wireless network traffic data; wherein the data attack type result is normal operation, flooding attack, injection attack, or impersonation attack.
For the case where the total number of sub-class types corresponding to the multiple sub-class initialization populations is 3, refer to the above method embodiment for the detailed description, which is not repeated here.
In an embodiment, the first iterative evolution unit 120 includes:
A first processing unit, configured to obtain a first current iteration generation and determine whether the first current iteration generation has reached the first maximum iteration generation;
A second processing unit, configured to obtain the training data subset of the pre-stored wireless network traffic data set if the first current iteration generation has not reached the first maximum iteration generation;
A third processing unit, configured to perform feature selection on the training data subset through each individual in the first sub-class initialization population and input the results respectively to a to-be-trained K-nearest-neighbor model for training, to obtain a K-nearest-neighbor model classifier group; wherein the total number of base classifiers in the K-nearest-neighbor model classifier group is equal to the first population size value;
A fourth processing unit, configured to perform feature selection on the training data subset through each individual in the second sub-class initialization population and input the results respectively to a to-be-trained support vector machine model for training, to obtain a support vector machine model classifier group; wherein the total number of base classifiers in the support vector machine model classifier group is equal to the first population size value;
A fifth processing unit, configured to perform feature selection on the training data subset through each individual in the third sub-class initialization population and input the results respectively to a to-be-trained self-organizing feature map network for training, to obtain a self-organizing feature map network classifier group; wherein the total number of base classifiers in the self-organizing feature map network classifier group is equal to the first population size value;
A sixth processing unit, configured to form the current classifier group from the K-nearest-neighbor model classifier group, the support vector machine model classifier group, and the self-organizing feature map network classifier group; wherein the total number of base classifiers in the current classifier group is 3 times the first population size value;
A seventh processing unit, configured to perform simulated binary crossover and polynomial mutation on the first sub-class initialization population, to obtain a first sub-class sub-population with the same total number of individuals as the first sub-class initialization population;
An eighth processing unit, configured to perform simulated binary crossover and polynomial mutation on the second sub-class initialization population, to obtain a second sub-class sub-population with the same total number of individuals as the second sub-class initialization population;
A ninth processing unit, configured to perform simulated binary crossover and polynomial mutation on the third sub-class initialization population, to obtain a third sub-class sub-population with the same total number of individuals as the third sub-class initialization population;
A tenth processing unit, configured to merge the first sub-class initialization population with the first sub-class sub-population to obtain a first sub-class mixed population;
An eleventh processing unit, configured to merge the second sub-class initialization population with the second sub-class sub-population to obtain a second sub-class mixed population;
A twelfth processing unit, configured to merge the third sub-class initialization population with the third sub-class sub-population to obtain a third sub-class mixed population;
A thirteenth processing unit, configured to obtain the verification data subset of the pre-stored wireless network traffic data set;
A fourteenth processing unit, configured to perform feature selection on the verification data subset through each individual in the first sub-class mixed population and input the results respectively to the base classifiers in the K-nearest-neighbor model classifier group, to obtain a first classification result set corresponding to each individual in the first sub-class mixed population;
A fifteenth processing unit, configured to perform feature selection on the verification data subset through each individual in the second sub-class mixed population and input the results respectively to the base classifiers in the support vector machine model classifier group, to obtain a second classification result set corresponding to each individual in the second sub-class mixed population;
A sixteenth processing unit, configured to perform feature selection on the verification data subset through each individual in the third sub-class mixed population and input the results respectively to the base classifiers in the self-organizing feature map network classifier group, to obtain a third classification result set corresponding to each individual in the third sub-class mixed population;
A seventeenth processing unit, configured to call a pre-stored first multi-objective strategy and use the first classification result set of each individual in the first sub-class mixed population, the second classification result set of each individual in the second sub-class mixed population, and the third classification result set of each individual in the third sub-class mixed population respectively as inputs of the first multi-objective strategy, to obtain a first target value set corresponding to the first classification result sets, a second target value set corresponding to the second classification result sets, and a third target value set corresponding to the third classification result sets;
An eighteenth processing unit, configured to perform non-dominated sorting separately on the individuals in the first sub-class mixed population according to the first target value set, on the individuals in the second sub-class mixed population according to the second target value set, and on the individuals in the third sub-class mixed population according to the third target value set, to obtain a first sub-class non-dominated solution set and a first sub-class multi-layer solution set corresponding to the first target value set, a second sub-class non-dominated solution set and a second sub-class multi-layer solution set corresponding to the second target value set, and a third sub-class non-dominated solution set and a third sub-class multi-layer solution set corresponding to the third target value set; wherein the first sub-class non-dominated solution set is denoted Q11, the first sub-class multi-layer solution set includes multiple solution-set subsets denoted Q12 to Q1X, the union of Q11 to Q1X is the first sub-class mixed population, the intersection of any two sets among Q11 to Q1X is the empty set, and Q11 ≥ Q12 ≥ Q13 ≥ … ≥ Q1X; the second sub-class non-dominated solution set is denoted Q21, the second sub-class multi-layer solution set includes multiple solution-set subsets denoted Q22 to Q2Y, the union of Q21 to Q2Y is the second sub-class mixed population, the intersection of any two sets among Q21 to Q2Y is the empty set, and Q21 ≥ Q22 ≥ Q23 ≥ … ≥ Q2Y; the third sub-class non-dominated solution set is denoted Q31, the third sub-class multi-layer solution set includes multiple solution-set subsets denoted Q32 to Q3Z, the union of Q31 to Q3Z is the third sub-class mixed population, the intersection of any two sets among Q31 to Q3Z is the empty set, and Q31 ≥ Q32 ≥ Q33 ≥ … ≥ Q3Z;
A nineteenth processing unit, configured to merge the first sub-class non-dominated solution set and the solution-set subsets of the first sub-class multi-layer solution set in order until the total number of individuals is equal to the first population size value, so as to form a first sub-class current population, and to take the first sub-class current population as the first sub-class initialization population;
A twentieth processing unit, configured to merge the second sub-class non-dominated solution set and the solution-set subsets of the second sub-class multi-layer solution set in order until the total number of individuals is equal to the first population size value, so as to form a second sub-class current population, and to take the second sub-class current population as the second sub-class initialization population;
A twenty-first processing unit, configured to merge the third sub-class non-dominated solution set and the solution-set subsets of the third sub-class multi-layer solution set in order until the total number of individuals is equal to the first population size value, so as to form a third sub-class current population, and to take the third sub-class current population as the third sub-class initialization population;
A twenty-second processing unit, configured to increase the first current iteration generation by one to serve as the first current iteration generation, and return to the step of determining whether the first current iteration generation has reached the first maximum iteration generation;
A twenty-third processing unit, configured to, if the first current iteration generation has reached the first maximum iteration generation, obtain the K-nearest-neighbor model classifier group corresponding to the first sub-class initialization population, the support vector machine model classifier group corresponding to the second sub-class initialization population, and the self-organizing feature map network classifier group corresponding to the third sub-class initialization population, the current classifier group being composed of the K-nearest-neighbor model classifier group, the support vector machine model classifier group, and the self-organizing feature map network classifier group.
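The non-dominated sorting and ordered merging performed by the eighteenth to twenty-first processing units can be sketched as below. Two points are assumptions of this sketch rather than statements of the text: all objectives are treated as values to be minimized, and when the last merged layer would overflow the population size it is simply truncated in index order. The function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b (minimization is assumed)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_sort(objectives):
    """Partition indices into layers Q1, Q2, ...; Q1 is the non-dominated set.

    The layers are pairwise disjoint and their union is the whole population.
    """
    remaining = set(range(len(objectives)))
    layers = []
    while remaining:
        # An individual belongs to the current layer if no other remaining
        # individual dominates it.
        layer = sorted(
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        )
        layers.append(layer)
        remaining -= set(layer)
    return layers


def select_next_population(layers, pop_size):
    """Merge layers in order until pop_size individuals are collected.

    Truncating the last layer in index order is an assumption of this sketch.
    """
    selected = []
    for layer in layers:
        room = pop_size - len(selected)
        if room <= 0:
            break
        selected.extend(layer[:room])
    return selected
```

Applied to a mixed population of twice the population size (parents plus offspring), `select_next_population` returns the indices of the individuals that form the next current population.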
In an embodiment, the second iterative evolution unit 140 includes:
A twenty-fourth processing unit, configured to obtain a second current iteration generation and determine whether the second current iteration generation has reached the second maximum iteration generation;
A twenty-fifth processing unit, configured to obtain the current classifier group if the second current iteration generation has not reached the second maximum iteration generation;
A twenty-sixth processing unit, configured to perform simulated binary crossover and polynomial mutation on the second-type initialization population, to obtain a second-type sub-population with the same total number of individuals as the second-type initialization population;
A twenty-seventh processing unit, configured to merge the second-type initialization population with the second-type sub-population to obtain a second-type mixed population;
A twenty-eighth processing unit, configured to perform base classifier selection on the current classifier group through each individual in the second-type mixed population, to obtain the selected classifier group corresponding to each individual in the second-type mixed population;
A twenty-ninth processing unit, configured to obtain the verification data subset of the pre-stored wireless network traffic data set;
A thirtieth processing unit, configured to input the verification data subset to the selected classifier group corresponding to each individual in the second-type mixed population, to obtain an individual classification result set corresponding to each individual in the second-type mixed population;
A thirty-first processing unit, configured to call a pre-stored second multi-objective strategy and use the individual classification result set corresponding to each individual in the second-type mixed population as the input of the second multi-objective strategy, to obtain an individual target value set corresponding to each individual in the second-type mixed population;
A thirty-second processing unit, configured to perform non-dominated sorting on the individuals in the second-type mixed population according to the individual target value set, to obtain a non-dominated solution set and a multi-layer solution set corresponding to the individual target value set; wherein the non-dominated solution set is denoted Q41, the multi-layer solution set includes multiple solution-set subsets denoted Q42 to Q4W, the union of Q41 to Q4W is the second-type mixed population, the intersection of any two sets among Q41 to Q4W is the empty set, and Q41 ≥ Q42 ≥ Q43 ≥ … ≥ Q4W;
A thirty-third processing unit, configured to merge the non-dominated solution set and the solution-set subsets of the multi-layer solution set in order until the total number of individuals is equal to the second population size value, so as to form a second-type current population, and to take the second-type current population as the second-type initialization population;
A thirty-fourth processing unit, configured to increase the second current iteration generation by one to serve as the second current iteration generation, and return to the step of determining whether the second current iteration generation has reached the second maximum iteration generation;
A thirty-fifth processing unit, configured to obtain the second-type initialization population if the second current iteration generation has reached the second maximum iteration generation.
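The base classifier selection carried out by the twenty-eighth processing unit can be pictured as applying a binary mask to the current classifier group. The sketch below assumes that a bit value of 1 means "selected" and that classifiers are plain callables returning a label; both conventions, and the function names, are introduced here for illustration only.

```python
def select_classifiers(individual, classifier_pool):
    """Keep the base classifiers whose bit in the individual is 1.

    individual: binary sequence with one bit per base classifier.
    classifier_pool: the current classifier group, as a list of callables.
    """
    if len(individual) != len(classifier_pool):
        raise ValueError("individual length must equal the pool size")
    return [clf for bit, clf in zip(individual, classifier_pool) if bit == 1]


def classify_with_selection(individual, classifier_pool, sample):
    """Return the labels produced by the selected classifiers for one sample."""
    selected = select_classifiers(individual, classifier_pool)
    return [clf(sample) for clf in selected]
```

The resulting list of labels per verification sample corresponds to the individual classification result set that is then scored by the second multi-objective strategy.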
In an embodiment, the data attack type result classification unit includes:
A thirty-sixth processing unit, configured to perform data attack type classification on each piece of to-be-detected wireless network traffic data in the to-be-detected wireless network traffic data set through the optimal target classifier group, to obtain the target classification result produced for each piece of to-be-detected wireless network traffic data by each target classifier group in the optimal target classifier group;
A thirty-seventh processing unit, configured to determine, for each piece of to-be-detected wireless network traffic data, whether a flooding attack exists among the target classification results produced by the target classifier groups in the optimal target classifier group;
A thirty-eighth processing unit, configured to, if a flooding attack exists among the target classification results and the total proportion of flooding attacks among the target classification results exceeds a preset minority-class proportion, collect the to-be-detected wireless network traffic data whose target classification results contain a flooding attack to form a minority-class wireless network traffic data set, and collect the to-be-detected wireless network traffic data whose target classification results contain no flooding attack to form a majority-class wireless network traffic data set;
A thirty-ninth processing unit, configured to set to flooding attack the classification result of each piece of to-be-detected wireless network traffic data in the minority-class wireless network traffic data set for which the total proportion of flooding attacks among its target classification results exceeds the preset minority-class proportion;
A fortieth processing unit, configured to obtain, for each piece of to-be-detected wireless network traffic data in the majority-class wireless network traffic data set, the classification result with the largest total count among the target classification results produced by the target classifiers in the target classifier group, as the data attack type result corresponding to that piece of to-be-detected wireless network traffic data;
A forty-first processing unit, configured to, if a flooding attack exists among the target classification results and the total proportion of flooding attacks among the target classification results does not exceed the preset minority-class proportion, obtain the classification result with the largest total count among the target classification results produced by the target classifier groups in the optimal target classifier group, as the data attack type result corresponding to each piece of to-be-detected wireless network traffic data.
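The decision rule implemented by the thirty-sixth to forty-first processing units can be summarized per sample as follows. This is a simplified sketch: the threshold value 0.2, the label strings, and the function name are hypothetical, and ties in the majority vote are resolved by first occurrence, which the text does not specify.

```python
from collections import Counter


def classify_with_minority_threshold(group_results, minority_ratio=0.2):
    """Decide the attack type of one sample from per-group results.

    group_results: labels produced by the target classifier groups.
    minority_ratio: preset minority-class proportion (0.2 is a made-up value).
    """
    if not group_results:
        raise ValueError("at least one classification result is required")
    counts = Counter(group_results)
    flooding_share = counts["flooding"] / len(group_results)
    # Flooding is present and its share exceeds the preset proportion:
    # the sample is labeled as a flooding attack.
    if counts["flooding"] > 0 and flooding_share > minority_ratio:
        return "flooding"
    # Otherwise the label with the largest total count is used.
    return counts.most_common(1)[0][0]
```

With results of three "normal" and one "flooding", the flooding share is 0.25; the sample is labeled as a flooding attack under a threshold of 0.2 and as normal operation under a threshold of 0.3.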
The device deploys on a server an optimal target classifier group built from integrated classifier groups that include at least two types of base classifier. Compared with using a single type of classifier, this allows traffic data types to be classified more accurately and improves the recognition rate for less common attack types.
The above dual-evolution-based traffic data type integrated classification device may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in Figure 4.
Refer to Figure 4, which is a schematic block diagram of a computer device provided by an embodiment of the present invention. The computer device 500 is a server, which may be an independent server or a server cluster composed of multiple servers.
参阅图4,该计算机设备500包括通过系统总线501连接的处理器502、存储器和网络接口505,其中,存储器可以包括非易失性存储介质503和内存储器504。Referring to Figure 4, the computer device 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
该非易失性存储介质503可存储操作系统5031和计算机程序5032。该计算机程序5032被执行时,可使得处理器502执行基于双进化的流量数据类型集成分类方法。The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to execute a traffic data type integration classification method based on dual evolution.
该处理器502用于提供计算和控制能力,支撑整个计算机设备500的运行。The processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500 .
该内存储器504为非易失性存储介质503中的计算机程序5032的运行提供环境,该计算机程序5032被处理器502执行时,可使得处理器502执行基于双进化的流量数据类型集成分类方法。The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, it can cause the processor 502 to execute the traffic data type integration classification method based on dual evolution.
The network interface 505 is used for network communication, such as transmitting data. Those skilled in the art will understand that the structure shown in Figure 4 is only a block diagram of the part of the structure relevant to the solution of the present invention and does not limit the computer device 500 to which the solution is applied. A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory so as to implement the traffic data type integrated classification method based on dual evolution disclosed in the embodiments of the present invention.
Those skilled in the art will understand that the embodiment of the computer device shown in Figure 4 does not limit the specific configuration of the computer device. In other embodiments, the computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in Figure 4 and are not described again here.
It should be understood that, in the embodiments of the present invention, the processor 502 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
Another embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. It stores a computer program which, when executed by a processor, implements the traffic data type integrated classification method based on dual evolution disclosed in the embodiments of the present invention.
The computer-readable storage medium is a physical, non-transitory storage medium, for example a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other physical storage medium capable of storing program code.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the equipment, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and all such modifications or replacements shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be determined by the protection scope of the claims.
Claims (10)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010063154.0A CN111291792B (en) | 2020-01-19 | 2020-01-19 | Traffic data type integrated classification method and device based on dual evolution |
PCT/CN2020/079875 WO2021142914A1 (en) | 2020-01-19 | 2020-03-18 | Traffic data type integrated classification method and apparatus based on double evolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010063154.0A CN111291792B (en) | 2020-01-19 | 2020-01-19 | Traffic data type integrated classification method and device based on dual evolution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111291792A (en) | 2020-06-16 |
CN111291792B (en) | 2023-10-27 |
Family
ID=71023198
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010063154.0A Active CN111291792B (en) | 2020-01-19 | 2020-01-19 | Traffic data type integrated classification method and device based on dual evolution |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111291792B (en) |
WO (1) | WO2021142914A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115660025B (en) * | 2022-10-25 | 2025-06-24 | 吉林大学 | A method for extracting and selecting IoT device identification features based on improved honey badger algorithm |
CN116647877B (en) * | 2023-06-12 | 2024-03-15 | 广州爱浦路网络技术有限公司 | Flow category verification method and system based on graph convolution model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004006137A1 (en) * | 2002-07-02 | 2004-01-15 | British Telecommunications Public Limited Company | Optimisation method and apparatus |
CN107122844A (en) * | 2017-03-15 | 2017-09-01 | 深圳大学 | A kind of Multipurpose Optimal Method and system being combined based on index and direction vector |
CN109242021A (en) * | 2018-09-07 | 2019-01-18 | 浙江财经大学 | A kind of classification prediction technique based on multistage mixed model |
CN109615421A (en) * | 2018-11-28 | 2019-04-12 | 安徽大学 | A Personalized Product Recommendation Method Based on Multi-objective Evolutionary Algorithm |
US10402723B1 (en) * | 2018-09-11 | 2019-09-03 | Cerebri AI Inc. | Multi-stage machine-learning models to control path-dependent processes |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130054816A1 (en) * | 2011-08-25 | 2013-02-28 | Alcatel-Lucent Usa Inc | Determining Validity of SIP Messages Without Parsing |
CN108234500A (en) * | 2018-01-08 | 2018-06-29 | 重庆邮电大学 | A kind of wireless sense network intrusion detection method based on deep learning |
CN108712404B (en) * | 2018-05-04 | 2020-11-06 | 重庆邮电大学 | Internet of things intrusion detection method based on machine learning |
CN108632279B (en) * | 2018-05-08 | 2020-07-10 | 北京理工大学 | A multi-layer anomaly detection method based on network traffic |
Non-Patent Citations (2)
Title |
---|
A gene-level hybrid crossover operator for multiobjective evolutionary algorithm; Qingling Zhu; 2015 Second International Conference on Soft Computing and Machine Intelligence; pp. 20-24 * |
A hybrid evolutionary immune algorithm for multiobjective optimization problem; Qiuzhen Lin; IEEE; vol. 20, no. 5; pp. 711-729 * |
Also Published As
Publication number | Publication date |
---|---|
CN111291792A (en) | 2020-06-16 |
WO2021142914A1 (en) | 2021-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8655823B1 (en) | Event management system based on machine logic | |
CN110225030A (en) | Malice domain name detection method and system based on RCNN-SPP network | |
CN107451597A (en) | A kind of sample class label method and device for correcting | |
CN114330135B (en) | Classification model construction method and device, storage medium and electronic equipment | |
CN111291792B (en) | Traffic data type integrated classification method and device based on dual evolution | |
CN116596095A (en) | Training method and device of carbon emission prediction model based on machine learning | |
CN113541985A (en) | Internet of things fault diagnosis method, model training method and related device | |
CN112087329A (en) | Network service function chain deployment method | |
JP2020061007A (en) | Learning program, learning method, and learning device | |
CN110378389A (en) | A kind of Adaboost classifier calculated machine creating device | |
CN113672508A (en) | Simulink test method based on risk strategy and diversity strategy | |
WO2024066143A1 (en) | Molecular collision cross section prediction method and apparatus, device, and storage medium | |
CN108133240A (en) | A kind of multi-tag sorting technique and system based on fireworks algorithm | |
Liu et al. | A weight-incorporated similarity-based clustering ensemble method | |
CN107426141A (en) | Malicious code protection method, system and monitoring device | |
CN111723873A (en) | Power sequence data classification method and device | |
US11609936B2 (en) | Graph data processing method, device, and computer program product | |
CN105843859A (en) | Data processing method, device and equipment | |
CN115550178A (en) | Intelligent gateway control method and system | |
CN110995722B (en) | Method and Device for Obtaining Optimal Feature Subset of Traffic Data Based on Immune Strategy | |
CN112861115B (en) | Encryption strategy calling method based on block chain security authentication and cloud authentication server | |
CN106295671B (en) | Application list clustering method and device and computing equipment | |
Mani et al. | Solving combinatorial optimization problems with quantum inspired evolutionary algorithm tuned using a novel heuristic method | |
CN114662579A (en) | Clustering method and clustering equipment | |
WO2019114481A1 (en) | Cluster type recognition method, apparatus, electronic apparatus, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||