CN109543406B - Android malicious software detection method based on XGboost machine learning algorithm - Google Patents
Android malicious software detection method based on XGboost machine learning algorithm
- Publication number
- CN109543406B (application CN201811150736.1A; earlier publication CN109543406A)
- Authority
- CN
- China
- Prior art keywords
- xgboost
- algorithm
- child
- optimal
- ant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Virology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Stored Programmes (AREA)
Abstract
The invention relates to an Android malware detection method based on the XGBoost machine learning algorithm. Permission, Intent, Component and API call features are first extracted by decompiling the apk file and quantized into a feature matrix. The parallelism and strong robustness of the ant colony algorithm are then used to optimize the parameters of the XGBoost classifier, so as to reach the optimal objective and obtain the optimal parameter combination for XGBoost. Compared with the traditional XGBoost algorithm, the improved XGBoost algorithm proposed by the invention achieves higher classification accuracy in Android malware detection, improves the detection accuracy, and reduces the probability that the Android system is attacked because of detection errors.
Description
Technical Field
The present invention relates to the technical field of malware detection on the Android platform, and in particular to an Android malware detection method based on the XGBoost machine learning algorithm.
Background Art
The Android system was officially released by Google on November 5, 2007. As an operating system based on the Linux kernel, its open-source and free nature quickly made it the smart mobile device operating system with the largest market share. While it is popular with app developers and users, it has also become the preferred target of malicious attackers. The rapid growth of Android malware already poses a serious threat to user security and privacy: malware steals users' private data, causes financial losses, and exploits system vulnerabilities to obtain higher privileges and do greater harm. As the mobile payment industry keeps advancing and the "Internet+" concept gains popularity, mobile payment is developing rapidly and mobile payment viruses keep emerging, seriously endangering users' property. A method that detects malware quickly and effectively is therefore needed.
At present there are three main kinds of detection methods for Android malware: static detection, dynamic detection, and a combination of static and dynamic detection.
Static detection does not run the Android application. Instead, the installation package is decompiled through reverse engineering and relevant features are extracted, such as permission information, API calls and instruction features. These features characterize the operations the program may perform at run time and are used to decide whether the application is malware. Static detection mostly uses machine learning algorithms to classify the extracted feature information. However, the classification accuracy of such static detection methods is not high and the malware detection accuracy is low, which increases the probability that the Android system is attacked because of detection errors.
Summary of the Invention
The purpose of the present invention is to overcome the shortcomings of the prior art and provide an Android malware detection method based on the XGBoost machine learning algorithm that has high classification accuracy and high malware detection accuracy, and that greatly reduces the probability that the Android system is attacked because of detection errors.
To achieve the above purpose, the technical solution provided by the present invention is as follows:
An Android malware detection method based on the XGBoost machine learning algorithm extracts Permission, Intent, Component and API call features by decompiling the apk file and quantizes them into a feature matrix. The ant colony optimization algorithm is used to optimize the parameters of the XGBoost ensemble learning framework and quickly find the global optimum. After multiple iterations, the optimal objective value is obtained together with the optimal XGBoost parameter combination, namely the shrinkage step size (shrinkage) and the minimum sample weight threshold in a child node (min_child_weight). Finally, the optimized XGBoost algorithm is applied to the Android malware detection model.
Further, the specific steps of the Android malware detection method based on the XGBoost machine learning algorithm are as follows:
S1: Use apktool to decompile the apk file and obtain AndroidManifest.xml and classes.dex;
S2: Extract the Permission, Intent, Component and API call features;
S3: Quantize the features. The output is a one-hot vector: a feature that is present is marked 1, otherwise it is marked 0;
S4: Collect all feature vectors into a feature vector set, reduce its dimensionality with a feature selection algorithm, and select the optimal feature subset;
S5: Use the ant colony optimization algorithm to optimize the parameters of the XGBoost ensemble learning framework and quickly find the global optimum. After multiple iterations, obtain the optimal objective value and the optimal XGBoost parameter combination of shrinkage and min_child_weight;
S6: Randomly draw 10% of the optimized feature vectors as the test set, and feed the remaining 90% as the training set into the optimized XGBoost ensemble learning framework for learning;
S7: Evaluate the classification results in terms of true positive rate, false positive rate and classification accuracy, to judge whether the Android malware detection model generated by the ant-colony-optimized XGBoost algorithm meets the detection requirements.
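Step S3 can be illustrated with a minimal sketch. It assumes the features of each app have already been extracted as a set of strings; the function name `quantize`, the feature strings and the file names are hypothetical and do not come from the patent. Each app becomes a 0/1 presence vector over a shared vocabulary, which is what the text calls a one-hot vector.

```python
def quantize(apps, vocabulary=None):
    """Turn each app's set of extracted feature names into a 0/1 vector.

    `apps` maps an app name to the set of feature strings found in it.
    The vocabulary defaults to every feature seen in any app, sorted so
    that the column order is stable between runs.
    """
    if vocabulary is None:
        vocabulary = sorted(set().union(*apps.values()))
    matrix = {
        name: [1 if feature in found else 0 for feature in vocabulary]
        for name, found in apps.items()
    }
    return vocabulary, matrix


# Hypothetical extraction results for two apps.
apps = {
    "benign.apk": {"permission:INTERNET", "intent:MAIN"},
    "suspect.apk": {
        "permission:INTERNET",
        "permission:SEND_SMS",
        "api:sendTextMessage",
    },
}
vocab, matrix = quantize(apps)
for name, row in matrix.items():
    print(name, row)
```

Passing the training vocabulary back in through the `vocabulary` argument keeps the column order identical when new apps are quantized later.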
Further, the specific steps of using the ant colony optimization algorithm to optimize the parameters of the XGBoost ensemble learning framework are as follows:
A. Set the upper and lower limits of the XGBoost classifier parameters shrinkage and min_child_weight, the maximum number of iterations MaxIter, the ant colony size M, and the information evaporation coefficient Rho;
B. Initialize the population, that is, initialize shrinkage and min_child_weight as the position vector of each ant;
C. Perform the ant colony search;
D. Perform XGBoost training;
E. Use the XGBoost classifier to compute the objective function value and pheromone value of each ant, and find the current best ant;
F. Check the termination condition: if the number of iterations is greater than MaxIter, output the colony's optimal value and the corresponding shrinkage and min_child_weight values and go to step G; otherwise increase the iteration count by 1 and go to step C;
G. Update the pheromone;
H. Use the output shrinkage and min_child_weight in the Android malware detection model.
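To make steps A, B and H concrete, the sketch below shows how an ant's position could be turned into a parameter dictionary for XGBoost. In the XGBoost library the shrinkage step is exposed as `eta` (alias `learning_rate`), and `min_child_weight` and `objective="binary:logistic"` are real parameter names. The helper `ant_to_params` and the numeric limits in `BOUNDS` are assumptions made for illustration and are not values from the patent.

```python
def ant_to_params(position, bounds):
    """Map an ant's position (shrinkage, min_child_weight) to XGBoost parameters."""
    (s_lo, s_hi), (w_lo, w_hi) = bounds
    shrinkage, min_child_weight = position
    # Clamp to the limits chosen in step A so a move never leaves the search box.
    shrinkage = min(max(shrinkage, s_lo), s_hi)
    min_child_weight = min(max(min_child_weight, w_lo), w_hi)
    return {
        "objective": "binary:logistic",  # two classes: malware or benign
        "eta": shrinkage,                # XGBoost's name for the shrinkage step
        "min_child_weight": min_child_weight,
    }


# Illustrative limits only; the patent does not state numeric bounds.
BOUNDS = ((0.01, 0.5), (1.0, 10.0))
params = ant_to_params((0.7, 3.0), BOUNDS)
print(params)
```

A dictionary of this shape is what would be handed to the XGBoost training call in step D, and again in step H with the best position found.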
Further, the ant colony optimization algorithm is as follows:
Ant colony position initialization:
Take the classification accuracy of XGBoost as the objective function value, $\max\{F(s_1,w_1),F(s_2,w_2),\ldots,F(s_m,w_m)\}$, written as $\max \mathit{fitness}=\max\{F(X)\}$, $X=\{x_1,x_2,\ldots,x_m\}$, where $x_i$ denotes an ant. The initial population is generated from a chaotic sequence in the following steps:
1) Generate a D-dimensional random vector:
[formula not reproduced in this text]
2) Logistic map. Using the vector above as the initial iterate, the logistic map equation is:
[formula not reproduced in this text]
where $\mu=1$, $i=1,2,\ldots,N$, $d=1,2,\ldots,D$;
3) Map the chaotic space to the search space of the optimization variables:
[formula not reproduced in this text]
where $\max_d$ is the upper limit and $\min_d$ is the lower limit;
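The three initialization steps can be sketched as follows. Because the formula images are missing from this text, the sketch assumes the standard logistic map $z \leftarrow \mu z(1-z)$ and a linear mapping $\min_d + (\max_d - \min_d)\,z$ onto the search range. The function name, the bounds and the seed are hypothetical. The text states $\mu=1$; the sketch keeps $\mu$ as a parameter and defaults it to 4, the value at which the logistic map is fully chaotic on [0, 1], which is an assumption on my part.

```python
import random


def chaotic_population(n_ants, bounds, mu=4.0, seed=0):
    """Generate ant positions from a logistic-map sequence (assumed form)."""
    rng = random.Random(seed)
    # 1) Random D-dimensional start vector, kept away from the fixed points 0 and 1.
    z = [rng.uniform(0.05, 0.95) for _ in bounds]
    population = []
    for _ in range(n_ants):
        # 2) Logistic map, applied independently in every dimension.
        z = [mu * value * (1.0 - value) for value in z]
        # 3) Map each chaotic value from [0, 1] onto [min_d, max_d].
        population.append(
            [lo + (hi - lo) * value for value, (lo, hi) in zip(z, bounds)]
        )
    return population


# Illustrative limits for (shrinkage, min_child_weight); not from the patent.
BOUNDS = ((0.01, 0.5), (1.0, 10.0))
population = chaotic_population(5, BOUNDS)
for ant in population:
    print(ant)
```

With `mu=1.0` the same code produces a sequence that only shrinks toward the lower limit, which is why a value of 4 is the usual choice when a chaotic spread over the search box is wanted.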
Ant movement rules:
After the colony is initialized, the objective function of each ant is computed. Let [symbol not reproduced] be the position vector of the j-th ant at the k-th iteration. By definition, the larger the objective function, the higher the pheromone concentration at that position. The ant with the largest current objective value is saved as [symbol not reproduced], together with its maximum pheromone value [symbol not reproduced].
Choosing local search or global search:
The transfer probability of an ant is defined as:
[formula not reproduced in this text]
where S is the standard deviation of the fitness function, computed as:
[formula not reproduced in this text]
where m is the number of ants and $F_{ave}$ is the average fitness value;
From the formula above, the closer an ant is to [symbol not reproduced], the greater its transfer probability. The search proceeds as follows:
If $P(x_i)\le P_0$, where $P_0$ is a constant with $0<P_0<1$, the ant searches locally near its current position, moving according to:
[formula not reproduced in this text]
where [symbol not reproduced] is the position after the move, [symbol not reproduced] is the position before the move, and a is the step length, defined as:
[formula not reproduced in this text]
If $P(x_i)>P_0$, the ant searches globally in the solution space;
Pheromone update:
According to the function value at each individual's position, the pheromone is updated as:
[formula not reproduced in this text]
where $\rho$ is the information evaporation coefficient.
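Two pieces of the movement rule are defined well enough in words to sketch: the standard deviation S of the colony's fitness values and the comparison of the transfer probability with $P_0$. The sketch below assumes the population form of the standard deviation (dividing by m), takes the transfer probability as an input because its formula is not reproduced in this text, and uses hypothetical function names.

```python
import math


def fitness_std(fitness):
    """Standard deviation S of the colony's fitness values (population form)."""
    m = len(fitness)                 # number of ants
    f_ave = sum(fitness) / m         # average fitness value
    return math.sqrt(sum((f - f_ave) ** 2 for f in fitness) / m)


def search_mode(p_i, p0):
    """Pick local search when P(x_i) <= P0, otherwise global search."""
    return "local" if p_i <= p0 else "global"


print(fitness_std([1.0, 3.0]))
print(search_mode(0.2, 0.5), search_mode(0.8, 0.5))
```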
Compared with the prior art, the principle and advantages of this scheme are as follows:
In conventional XGBoost-based Android malware detection, classification performance depends on how the parameters are chosen. This scheme applies the ant colony algorithm to search the XGBoost parameters and quickly find the optimal ones, so that the XGBoost algorithm classifies well. Applied to the Android malware detection model, it gives higher classification accuracy, greatly improves the malware detection accuracy, and thereby reduces the probability that the Android system is attacked because of detection errors.
Brief Description of the Drawings
FIG. 1 is the detection flow chart of the Android malware detection method based on the XGBoost machine learning algorithm of the present invention;
FIG. 2 is the flow chart of feature extraction in the Android malware detection method based on the XGBoost machine learning algorithm of the present invention;
FIG. 3 is the flow chart of applying the ant colony algorithm to optimize the XGBoost parameters in the Android malware detection method based on the XGBoost machine learning algorithm of the present invention.
Detailed Description
The present invention is further described below with reference to a specific embodiment:
The Android malware detection method based on the XGBoost machine learning algorithm described in this embodiment is as follows:
XGBoost (eXtreme Gradient Boosting) is an ensemble learning algorithm proposed by Tianqi Chen in 2015. In the XGBoost ensemble learning framework, the parameters that most directly affect classification performance are the shrinkage step size (shrinkage) and the minimum sample weight threshold in a child node (min_child_weight). Too small a shrinkage causes the algorithm to overfit, while a large shrinkage prevents it from converging. Too small a min_child_weight causes the algorithm to overfit, while too large a min_child_weight degrades classification performance on linearly inseparable data.
Therefore, after extracting the Permission, Intent, Component and API call features by decompiling the apk file and quantizing them into a feature matrix, this embodiment uses the ant colony optimization algorithm to optimize the parameters of the XGBoost ensemble learning framework and quickly find the global optimum. After multiple iterations, the optimal objective value is obtained together with the optimal XGBoost parameter combination of shrinkage and min_child_weight. Finally, the optimized XGBoost algorithm is applied to the Android malware detection model. As shown in FIG. 1, the specific steps are as follows:
S1: Use apktool to decompile the apk file and obtain AndroidManifest.xml and classes.dex;
S2: Extract the Permission, Intent, Component and API call features; the process is shown in FIG. 2;
S3: Quantize the features. The output is a one-hot vector: a feature that is present is marked 1, otherwise it is marked 0;
S4: Collect all feature vectors into a feature vector set, reduce its dimensionality with a feature selection algorithm, and select the optimal feature subset;
S5: Use the ant colony optimization algorithm to optimize the parameters of the XGBoost ensemble learning framework and quickly find the global optimum. After multiple iterations, obtain the optimal objective value and the optimal XGBoost parameter combination of shrinkage and min_child_weight;
S6: Randomly draw 10% of the optimized feature vectors as the test set, and feed the remaining 90% as the training set into the optimized XGBoost ensemble learning framework for learning;
S7: Evaluate the classification results in terms of true positive rate, false positive rate and classification accuracy, to judge whether the Android malware detection model generated by the ant-colony-optimized XGBoost algorithm meets the detection requirements.
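Steps S6 and S7 can be sketched with the standard library alone. The sketch assumes labels are encoded as 1 for malware and 0 for benign, and the function names `split_90_10` and `evaluate` are hypothetical. The rates use the usual definitions: true positive rate TP / (TP + FN), false positive rate FP / (FP + TN), and accuracy (TP + TN) / total.

```python
import random


def split_90_10(samples, seed=0):
    """Shuffle the samples and hold out 10% of them as the test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, round(0.1 * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def evaluate(y_true, y_pred):
    """Return true positive rate, false positive rate and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


train, test = split_90_10(range(100))
print(len(train), len(test))
print(evaluate([1, 1, 0, 0], [1, 0, 0, 1]))
```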
As shown in FIG. 3, the specific steps of using the ant colony optimization algorithm to optimize the parameters of the XGBoost ensemble learning framework are as follows:
A. Set the upper and lower limits of the XGBoost classifier parameters shrinkage and min_child_weight, the maximum number of iterations MaxIter, the ant colony size M, and the information evaporation coefficient Rho;
B. Initialize the population, that is, initialize shrinkage and min_child_weight as the position vector of each ant;
C. Perform the ant colony search;
D. Perform XGBoost training;
E. Use the XGBoost classifier to compute the objective function value and pheromone value of each ant, and find the current best ant;
F. Check the termination condition: if the number of iterations is greater than MaxIter, output the colony's optimal value and the corresponding shrinkage and min_child_weight values and go to step G; otherwise increase the iteration count by 1 and go to step C;
G. Update the pheromone;
H. Use the output shrinkage and min_child_weight in the Android malware detection model.
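The loop formed by steps A to H can be sketched end to end under several loudly stated assumptions. The objective here is a toy stand-in for "train XGBoost with this shrinkage and min_child_weight and return the test accuracy". The initial positions are drawn uniformly rather than from a chaotic sequence. The transfer probability, the local step and the pheromone update are generic continuous ant-colony forms chosen for illustration, since the patent's own formulas are not reproduced in this text. All names and numeric values are hypothetical.

```python
import random


def toy_accuracy(position):
    """Stand-in objective in (0, 1] that peaks at shrinkage=0.1, min_child_weight=3."""
    s, w = position
    return 1.0 / (1.0 + (s - 0.1) ** 2 + ((w - 3.0) / 10.0) ** 2)


def aco_search(objective, bounds, n_ants=10, max_iter=20, rho=0.3, p0=0.5, seed=0):
    rng = random.Random(seed)
    # B: one position vector per ant (uniform draw; an assumption).
    ants = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_ants)]
    pheromone = [0.0] * n_ants
    best_pos, best_fit = None, float("-inf")
    for k in range(1, max_iter + 1):
        # D, E: score every ant and remember the best position seen so far.
        fitness = [objective(ant) for ant in ants]
        for ant, fit in zip(ants, fitness):
            if fit > best_fit:
                best_pos, best_fit = ant[:], fit
        # G: evaporate old pheromone, then deposit the new objective value (assumed form).
        pheromone = [(1.0 - rho) * tau + fit for tau, fit in zip(pheromone, fitness)]
        # F: stop once the iteration budget is used up.
        if k == max_iter:
            break
        # C: ants close to the strongest pheromone search locally, the rest globally.
        tau_max = max(pheromone)
        for i, ant in enumerate(ants):
            p_i = (tau_max - pheromone[i]) / tau_max if tau_max > 0 else 0.0
            for d, (lo, hi) in enumerate(bounds):
                if p_i <= p0:
                    moved = ant[d] + (hi - lo) * rng.uniform(-0.5, 0.5) / k
                else:
                    moved = rng.uniform(lo, hi)
                ant[d] = min(max(moved, lo), hi)  # stay inside the limits from step A
    return best_pos, best_fit


# Illustrative limits for (shrinkage, min_child_weight); not from the patent.
BOUNDS = ((0.01, 0.5), (1.0, 10.0))
best_pos, best_fit = aco_search(toy_accuracy, BOUNDS)
print(best_pos, best_fit)
```

Replacing `toy_accuracy` with a function that trains and scores a real classifier is the only change needed to use the same loop for step H; each call then costs one full training run, so the colony size and iteration count set the total budget.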
The ant colony optimization algorithm itself is as follows:
Ant colony position initialization:
Take the classification accuracy of XGBoost as the objective function value, $\max\{F(s_1,w_1),F(s_2,w_2),\ldots,F(s_m,w_m)\}$, written as $\max \mathit{fitness}=\max\{F(X)\}$, $X=\{x_1,x_2,\ldots,x_m\}$, where $x_i$ denotes an ant. The initial population is generated from a chaotic sequence in the following steps:
1) Generate a D-dimensional random vector:
[formula not reproduced in this text]
2) Logistic map. Using the vector above as the initial iterate, the logistic map equation is:
[formula not reproduced in this text]
where $\mu=1$, $i=1,2,\ldots,N$, $d=1,2,\ldots,D$;
3) Map the chaotic space to the search space of the optimization variables:
[formula not reproduced in this text]
where $\max_d$ is the upper limit and $\min_d$ is the lower limit;
Ant movement rules:
After the colony is initialized, the objective function of each ant is computed. Let [symbol not reproduced] be the position vector of the j-th ant at the k-th iteration. By definition, the larger the objective function, the higher the pheromone concentration at that position. The ant with the largest current objective value is saved as [symbol not reproduced], together with its maximum pheromone value [symbol not reproduced].
Choosing local search or global search:
The transfer probability of an ant is defined as:
[formula not reproduced in this text]
where S is the standard deviation of the fitness function, computed as:
[formula not reproduced in this text]
where m is the number of ants and $F_{ave}$ is the average fitness value;
From the formula above, the closer an ant is to [symbol not reproduced], the greater its transfer probability. The search proceeds as follows:
If $P(x_i)\le P_0$, where $P_0$ is a constant with $0<P_0<1$, the ant searches locally near its current position, moving according to:
[formula not reproduced in this text]
where [symbol not reproduced] is the position after the move, [symbol not reproduced] is the position before the move, and a is the step length, defined as:
[formula not reproduced in this text]
If $P(x_i)>P_0$, the ant searches globally in the solution space;
Pheromone update:
According to the function value at each individual's position, the pheromone is updated as:
[formula not reproduced in this text]
where $\rho$ is the information evaporation coefficient.
This embodiment first extracts the Permission, Intent, Component and API call features by decompiling the apk file and quantizes them into a feature matrix. It then uses the parallelism and strong robustness of the ant colony algorithm to optimize the XGBoost classifier parameters, so as to reach the optimal objective and obtain the optimal parameter combination for XGBoost. Compared with the traditional XGBoost algorithm, the improved XGBoost algorithm proposed in this embodiment achieves higher classification accuracy in Android malware detection, improves the detection accuracy, and reduces the probability that the Android system is attacked because of detection errors.
The embodiment described above is only a preferred embodiment of the present invention and does not limit the scope of its implementation. All changes made according to the shape and principle of the present invention shall therefore fall within the protection scope of the present invention.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811150736.1A CN109543406B (en) | 2018-09-29 | 2018-09-29 | Android malicious software detection method based on XGboost machine learning algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109543406A CN109543406A (en) | 2019-03-29 |
CN109543406B (en) | 2023-04-11 |
Family
ID=65841391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811150736.1A Expired - Fee Related CN109543406B (en) | 2018-09-29 | 2018-09-29 | Android malicious software detection method based on XGboost machine learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109543406B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197068B (en) * | 2019-05-06 | 2022-07-12 | 广西大学 | Android malicious application detection method based on improved gray wolf algorithm |
CN110263539A (en) * | 2019-05-15 | 2019-09-20 | 湖南警察学院 | A kind of Android malicious application detection method and system based on concurrent integration study |
CN110362995B (en) * | 2019-05-31 | 2022-12-02 | 电子科技大学成都学院 | Malicious software detection and analysis system based on reverse direction and machine learning |
CN112818344B (en) * | 2020-08-17 | 2024-06-04 | 北京辰信领创信息技术有限公司 | Method for improving virus killing rate by using artificial intelligence algorithm |
CN112989342B (en) * | 2021-03-04 | 2022-08-05 | 北京邮电大学 | Malware detection network optimization method, device, electronic device and storage medium |
CN115801463B (en) * | 2023-02-06 | 2023-04-18 | 山东能源数智云科技有限公司 | Industrial Internet platform intrusion detection method and device and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194803A (en) * | 2017-05-19 | 2017-09-22 | 南京工业大学 | P2P net loan borrower credit risk assessment device |
CN107577942A (en) * | 2017-08-22 | 2018-01-12 | 中国民航大学 | A Hybrid Feature Screening Method for Android Malware Detection |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6701311B2 (en) * | 2001-02-07 | 2004-03-02 | International Business Machines Corporation | Customer self service system for resource search and selection |
US8108933B2 (en) * | 2008-10-21 | 2012-01-31 | Lookout, Inc. | System and method for attack and malware prevention |
Non-Patent Citations (1)
Title |
---|
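Hu Bin et al.; "Research on machine-learning-based advanced persistent threat detection technology for mobile terminals" (基于机器学习的移动终端高级持续性威胁检测技术研究); Computer Engineering (《计算机工程》); 2017-01-15 (No. 01); pp. 242-246 * |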
Also Published As
Publication number | Publication date |
---|---|
CN109543406A (en) | 2019-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543406B (en) | Android malicious software detection method based on XGboost machine learning algorithm | |
Pei et al. | AMalNet: A deep learning framework based on graph convolutional networks for malware detection | |
Kolosnjaji et al. | Adversarial malware binaries: Evading deep learning for malware detection in executables | |
CN109145600B (en) | System and method for detecting malicious files using static analysis elements | |
CN106503558B (en) | An Android malicious code detection method based on community structure analysis | |
Xiaofeng et al. | ASSCA: API sequence and statistics features combined architecture for malware detection | |
WO2021027831A1 (en) | Malicious file detection method and apparatus, electronic device and storage medium | |
CN108985061B (en) | A webshell detection method based on model fusion | |
CN113704759B (en) | Adaboost-based android malicious software detection method and system and storage medium | |
CN113297571B (en) | Graph Neural Network Model-Oriented Backdoor Attack Detection Method and Device | |
CN113139185A (en) | Malicious code detection method and system based on heterogeneous information network | |
CN111144274A (en) | A method and device for protecting social image privacy for YOLO detector | |
CN114595451A (en) | Graph convolution-based android malicious application classification method | |
Wu | A systematical study for deep learning based android malware detection | |
CN114637990A (en) | File malice degree evaluation method and device, electronic equipment and medium | |
CN108959930A (en) | Malice PDF detection method, system, data storage device and detection program | |
Onoja et al. | Exploring the effectiveness and efficiency of LightGBM algorithm for windows malware detection | |
Olowoyo et al. | Malware classification using deep learning technique | |
CN111368894B (en) | A FCBF Feature Selection Method and Its Application in Network Intrusion Detection | |
Du et al. | A mobile malware detection method based on malicious subgraphs mining | |
CN110647747B (en) | False mobile application detection method based on multi-dimensional similarity | |
CN107622201B (en) | A kind of Android platform clone's application program rapid detection method of anti-reinforcing | |
CN110197068A (en) | Based on the Android malicious application detection method for improving grey wolf algorithm | |
KR20200067044A (en) | Method and apparatus for detecting malicious file | |
CN113449304B (en) | Malicious software detection method and device based on strategy gradient dimension reduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20230411 |