CN107092829A - A malicious code detection method based on image matching

Info

Publication number: CN107092829A
Application number: CN201710265324.1A
Authority: CN
Inventors: 喻波; 刘浏; 杨强; 解炜; 唐勇; 陈曙晖; 方莹
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2017-04-21
Filing date: 2017-04-21
Publication date: 2017-08-25
Anticipated expiration: 2037-04-21
Also published as: CN107092829B

Abstract

The invention discloses a malicious code detection method based on image matching, comprising the following steps. S1: obtain training samples of malicious code for different family categories, convert each training sample into a grayscale image, and extract the corresponding image texture features; select a first reference sample from the training samples of each family category, select second reference samples according to the differences in image texture features between the first reference sample and the other samples, and combine the first and second reference samples selected for each family category into that category's reference sample set. S2: convert the malicious code to be detected into a grayscale image and extract its image texture features. S3: match the image texture features extracted in step S2 against the reference sample set of each family category, and determine the family category of the malicious code to be detected from the matching results. The method is simple to implement, robust, and achieves high detection accuracy and good detection performance.

Description

A Malicious Code Detection Method Based on Image Matching

Technical Field

The invention relates to the technical field of malicious code detection and analysis, and in particular to a malicious code detection method based on image matching.

Background

With the widespread use of automated malicious code generation tools and the reuse of open-source code in malicious code, the numbers of malicious code variants and of new malicious code families have grown rapidly. According to statistics, the number of malicious code variants detected in a year has reached 430 million, and malicious code has become a major challenge to cyberspace security. Traditional malicious code detection methods fall into two main types. The first is signature-based detection, which can quickly detect known malicious code samples but requires extensive expert experience and manual analysis and copes poorly with mutated and obfuscated samples. The second is detection based on abnormal behavior, which can detect malicious code samples that exploit zero-day vulnerabilities or belong to new families, but has a high false positive rate.

Malicious code detection methods based on automated analysis can address these problems. Such methods mainly use machine learning to analyze malicious code and usually involve three steps: ① extracting malicious code features; ② selecting an appropriate model; ③ obtaining the classification result. From the perspective of feature selection, prior-art methods based on automated analysis fall into two types: methods based on static features and methods based on dynamic features. Methods of the first type do not need to run the malicious code; they only need to obtain its binary code or opcodes with tools such as IDA, PEView and hap-depends. For example, byte-code N-grams are used as detection features, the feature dimension is reduced by a term-frequency method, and the malicious code is finally classified with a model such as an SVM; alternatively, features combining opcodes and byte codes are used and the samples are classified by ensemble learning. Compared with static analysis, dynamic analysis can accurately capture the behavior and intent of malicious code. Methods of the second type need to run the program to obtain its dynamic behavior, and the features finally extracted are static and dynamic features; for example, behavioral knowledge of the malicious code is extracted from API call strings and parameter information and converted into feature vectors.

The above malicious code detection methods based on automated analysis have the following drawbacks:

(1) Poor robustness and low detection accuracy. These methods classify and detect malicious code based on extracted features, and different features may yield different detection accuracy. Both the accuracy of feature extraction and the choice of features directly affect the accuracy of the final result, so detection in practice is not robust and its accuracy is low.

(2) Low detection efficiency. These methods are usually complex to implement and usually require lengthy model training, which makes detection inefficient.

Summary of the Invention

The technical problem to be solved by the present invention is as follows: in view of the technical problems in the prior art, the present invention provides a malicious code detection method based on image matching that is simple to implement, robust, and achieves high detection accuracy and good detection performance.

To solve the above technical problem, the present invention proposes the following technical solution:

A malicious code detection method based on image matching, comprising the steps of:

S1. Reference sample selection: obtain training samples of malicious code for different family categories, convert each training sample into a grayscale image, and extract the corresponding image texture features; select a first reference sample from the training samples of each family category, select second reference samples according to the differences in image texture features between the first reference sample and the other samples, and form the reference sample set of each family category from the first and second reference samples selected for it;

S2. Image feature extraction: convert the malicious code to be detected into a grayscale image and extract the corresponding image texture features;

S3. Test code classification: match the image texture features extracted in step S2 against the reference sample set of each family category, and determine the family category of the malicious code to be detected from the matching results.

As a further improvement of the present invention, the second reference samples of each family category are selected in step S1 as follows:

S11. Candidate reference sample acquisition: match each selected first reference sample against the remaining training samples, and take the training samples of each family category that are wrongly assigned according to the matching results as candidate reference samples;

S12. Second reference sample determination: for each family category, calculate the difference value between each candidate reference sample and the other candidate reference samples; if the calculated difference value is greater than a specified threshold, take the corresponding candidate reference sample as a second reference sample of that family category.

As a further improvement of the present invention, in step S12 the Gabor function value of each candidate reference sample and the distance between each candidate reference sample and the other training samples are calculated, and the difference value between the candidate reference sample and the other candidate reference samples is calculated from the Gabor function values and the distances.

As a further improvement of the present invention, the difference value between a candidate reference sample and the other candidate reference samples is calculated by the following formula:

$$p_d(es_{id}) = \sum_{j=0,1,\ldots,N} D(es_{id}, es_{hj})$$

where es_id is the d-th candidate reference sample of class i, es_hj is the j-th candidate reference sample of class h, D(es_id, es_hj) is the distance between sample es_id and sample es_hj, h is the family category to which es_id was wrongly assigned, μ is the trade-off coefficient, N is the number of reference samples contained in family category h, M is the number of reference samples, and l is the length of the image texture feature vector.

As a further improvement of the present invention, the image texture features are signal-type static texture features.

As a further improvement of the present invention, the image texture features are extracted with a Gabor filter.

As a further improvement of the present invention, the family category of the malicious code to be detected is determined in step S3 as follows:

S31. Obtain the matching results between the malicious code to be detected and all reference samples in the reference sample set of each family category;

S32. Derive a composite matching value for each family category from all of its matching results, and judge from the composite matching value of each family category whether the malicious code to be detected belongs to that family category.

As a further improvement of the present invention, the composite matching value is calculated by the following formula:

where es_test is the malicious code to be detected, es_ij is the j-th reference sample of class i, and N is the number of reference samples contained in family category i.

As a further improvement of the present invention, when the composite matching value R of a target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise it is judged not to belong to it.

Compared with the prior art, the present invention has the following advantages:

1) The method automatically extracts image texture features, performs image matching based on feature similarity analysis, and then determines the family classification from the image matching results. Detection can thus be automated, and large-scale detection and analysis of malicious code families can be carried out conveniently and efficiently.

2) During image matching, the method first selects the first reference sample of each family category, then selects second reference samples according to the first reference sample and the differences in image texture features between samples. The reference sample set formed by the first and second reference samples is used to match the image of the malicious code to be detected, and the family category of that code is finally determined. No lengthy model training is required, the selected reference samples are highly reliable, and the influence of reference sample selection on the detection result is greatly reduced, which improves detection accuracy.

3) The method first selects the first reference sample and then selects second reference samples based on it. Samples that are wrongly matched against the first reference samples are taken as candidate reference samples, the difference value between each candidate reference sample and the training samples of the current family category is calculated, and the difference value determines whether the candidate becomes a new reference sample. Samples that are wrongly assigned and also differ substantially from the other samples thus become reference samples as well. The approach is simple to implement and, compared with the direct selection of reference samples in traditional image matching, effectively increases the reliability of reference sample selection and further improves the accuracy of malicious code detection.

Description of the Drawings

FIG. 1 is a schematic flowchart of the malicious code detection method based on image matching in this embodiment.

FIG. 2 is a schematic diagram of the principle of the malicious code detection method based on image matching in this embodiment.

FIG. 3 is a schematic flowchart of the grayscale image conversion in this embodiment.

FIG. 4 shows the grayscale images obtained in a specific embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and specific preferred embodiments, which do not limit the scope of protection of the present invention.

As shown in FIGS. 1 and 2, the malicious code detection method based on image matching in this embodiment comprises the steps of:

S1. Reference sample selection: obtain training samples of malicious code for different family categories, convert each training sample into a grayscale image, and extract the corresponding image texture features; select a first reference sample from the training samples of each family category, select second reference samples according to the differences in image texture features between the first reference sample and the other samples, and form the reference sample set of each family category from the first and second reference samples selected for it;

S2. Image feature extraction: convert the malicious code to be detected into a grayscale image and extract the corresponding image texture features;

S3. Test code classification: match the image texture features extracted in step S2 against the reference sample set of each family category, and determine the family category of the malicious code to be detected from the matching results.

Image texture reflects the visual characteristics of homogeneous phenomena in an image and captures the slowly or periodically varying structural arrangement of an object's surface. This embodiment exploits these texture properties to detect malicious code by image matching: image texture features are extracted automatically, image matching is performed based on feature similarity analysis, and the family classification is determined from the image matching results. Detection can thus be automated. The method is readily applied to malware detection on PCs and mobile clients, or in back-end classification systems for large-scale homology analysis of malicious code families, and can efficiently mine malicious code from massive numbers of samples to be detected, online or offline.

During image matching, this embodiment first selects the first reference sample of each family category, then selects second reference samples according to the first reference sample and the differences in image texture features between samples. The reference sample set formed by the first and second reference samples is used to match the image of the malicious code to be detected, and the family category of that code is finally determined. No lengthy model training is required, so detection is efficient; the selected reference samples are highly reliable, and the influence of reference sample selection on the detection result is greatly reduced, which improves detection accuracy.

In this embodiment, the second reference samples of each family category are selected in step S1 as follows:

S11. Candidate reference sample acquisition: match each selected first reference sample against the remaining training samples, and take the training samples of each family category that are wrongly assigned according to the matching results as candidate reference samples;

S12. Second reference sample determination: for each family category, calculate the difference value between each candidate reference sample and the other candidate reference samples; if the calculated difference value is greater than a specified threshold, take the corresponding candidate reference sample as a second reference sample of that family category.

In traditional image matching, one sample is selected directly from the samples of a known category as that category's reference sample; if the image of a sample to be detected matches the image of the reference sample, the sample is assigned to the reference sample's category. Choosing different samples as the reference image may therefore give different matching results, so the choice of reference sample directly affects matching accuracy. When selecting reference samples, this embodiment selects second reference samples based on the first reference sample. It first finds the samples that are wrongly matched against the first reference samples and takes them as candidate reference samples, then calculates the difference value between each candidate reference sample and the other candidate reference samples, and decides from the difference value whether the candidate becomes a new reference sample. Samples that are wrongly assigned and also differ substantially from the other candidate reference samples thus become reference samples as well. The approach is simple to implement and, compared with the direct selection of reference samples in traditional image matching, effectively increases the reliability of reference sample selection and further improves the accuracy of malicious code detection.

In step S1 of this embodiment, each training sample is first converted into an image; that is, the binary malicious code is converted into a grayscale image. As shown in FIG. 3, each pixel of a grayscale image is represented by an unsigned integer in [0, 255], so the binary malicious code is first converted into a matrix of unsigned integers. Because 8 bits of binary data convert to an integer that is at least 0 and less than 256, the binary file is split into units of 8 consecutive bits and each unit is converted into one pixel value. The image width can be adjusted according to the conversion requirements, yielding the grayscale image of each training sample.
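The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the embodiment's actual code: the function name `bytes_to_grayscale`, the default width of 256 and the zero-padding of the last row are all assumptions, since the text only states that the width may be adjusted.

```python
from __future__ import annotations


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape a byte string into rows of `width` pixels, each in [0, 255]."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Each byte (8 consecutive bits) becomes one unsigned pixel value.
    pixels = list(data)
    # Assumption: zero-pad the final row so that every row has the same width.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


image = bytes_to_grayscale(bytes([0b01100111, 0, 255, 16, 32]), width=2)
print(image)  # [[103, 0], [255, 16], [32, 0]]
```

For a real sample, the bytes would be read with `open(path, "rb").read()` and the width chosen to suit the file size.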

Image texture features fall into four main types: statistical, model-based, signal-type, and structural texture features. In this embodiment the image texture features are signal-type texture features, extracted with a signal-type texture processing method, specifically a Gabor filter. This embodiment relies on static features, so the malicious code does not need to be run and the implementation is simple.

The Gabor filter is a linear filter used for image edge feature extraction. It can be defined as a sinusoid multiplied by a Gaussian function; for a two-dimensional Gabor filter the sinusoid is a sinusoidal plane wave. By the convolution theorem (multiplication in the spatial domain corresponds to convolution in the frequency domain), the Fourier transform of the Gabor filter's impulse response is the convolution of the Fourier transform of the harmonic function and the Fourier transform of the Gaussian function. The filter consists of a real part and an imaginary part, which are orthogonal to each other. The Gabor filter used in this embodiment is as follows, where the complex expression is:

The real part is:

The imaginary part is:

where x′ = x cos θ + y sin θ and y′ = −x sin θ + y cos θ; λ is the wavelength, in pixels; θ is the orientation, with 0° ≤ θ ≤ 360°; ψ is the phase offset, in the range [−180°, 180°]; γ determines the ellipticity of the shape of the Gabor function; and σ is the standard deviation of the Gaussian factor of the Gabor function, which varies with the bandwidth.
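For reference, the parameter definitions above match the conventional two-dimensional Gabor function. Assuming the embodiment uses that conventional form (an assumption, since only the parameter definitions are stated in the text here), the complex expression and its real and imaginary parts can be written as:

```latex
\begin{align}
g(x, y; \lambda, \theta, \psi, \sigma, \gamma)
  &= \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)
     \exp\!\left(i\left(2\pi\frac{x'}{\lambda} + \psi\right)\right) \\
\operatorname{Re} g
  &= \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)
     \cos\!\left(2\pi\frac{x'}{\lambda} + \psi\right) \\
\operatorname{Im} g
  &= \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)
     \sin\!\left(2\pi\frac{x'}{\lambda} + \psi\right)
\end{align}
```

The real and imaginary parts differ only in the cosine versus sine carrier, which is why they are in quadrature with each other.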

During image feature extraction, an array of Gabor functions with different frequencies and orientations can be obtained. When texture features are computed with Gabor filters, the image texture features of each sample can be expressed as T = ([a1, a2], [b1, b2], [c1, c2], [d1, d2]); that is, the image texture features consist of four feature values a, b, c and d, each composed of a real part (subscript 1) and an imaginary part (subscript 2).
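One way to obtain a feature tuple of the form T is sketched below in plain Python. Everything specific here is an assumption made for illustration: the conventional complex Gabor function is used, the four feature values are taken from four orientations (0°, 45°, 90°, 135°), each feature is the mean filter response over the image, and the helper names and parameter defaults are hypothetical. The text does not state how a, b, c and d are actually derived.

```python
import cmath
import math


def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Sample the complex Gabor function on a size x size grid centred on the origin."""
    if size % 2 == 0:
        raise ValueError("size must be odd so that the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta, as in the definitions of x' and y'.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = cmath.exp(1j * (2 * math.pi * xr / lam + psi))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def mean_response(image, kernel):
    """Average the kernel's response over every position where it fits inside the image."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    if height < k or width < k:
        raise ValueError("image is smaller than the kernel")
    total, count = 0j, 0
    for top in range(height - k + 1):
        for left in range(width - k + 1):
            total += sum(kernel[i][j] * image[top + i][left + j]
                         for i in range(k) for j in range(k))
            count += 1
    return total / count


def texture_features(image, size=5, lam=4.0, psi=0.0, sigma=2.0, gamma=0.5):
    """Return T = [[a1, a2], [b1, b2], [c1, c2], [d1, d2]] for four orientations."""
    features = []
    for theta in (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4):
        response = mean_response(image, gabor_kernel(size, lam, theta, psi, sigma, gamma))
        features.append([response.real, response.imag])
    return features


# A small synthetic 8 x 8 "image" stands in for a converted malware sample.
image = [[(40 * x + 7 * y) % 256 for x in range(8)] for y in range(8)]
T = texture_features(image)
print(len(T), [len(pair) for pair in T])  # 4 [2, 2, 2, 2]
```

Plain lists keep the sketch dependency-free; on real samples an array library and a proper convolution routine would be the practical choice.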

When samples are matched, the difference value between their image texture features is calculated and used to judge how well the samples match: the smaller the difference value, the better the match. The difference between two individual samples s_i and s_j is calculated as:
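The matching rule (smaller difference, better match) can be illustrated as follows. Because the difference formula itself is not reproduced in this text, the sketch substitutes the Euclidean distance between flattened feature tuples; that choice, the helper names and the `threshold` parameter are assumptions for illustration only.

```python
import math


def flatten(features):
    """Turn [[a1, a2], [b1, b2], ...] into [a1, a2, b1, b2, ...]."""
    return [value for pair in features for value in pair]


def difference(t_i, t_j):
    """Assumed difference measure: Euclidean distance between feature tuples."""
    return math.dist(flatten(t_i), flatten(t_j))


def is_match(t_i, t_j, threshold):
    """Two samples match when their difference does not exceed the threshold."""
    return difference(t_i, t_j) <= threshold


t_a = [[0.0, 0.0], [1.0, 1.0]]
t_b = [[3.0, 4.0], [1.0, 1.0]]
print(difference(t_a, t_b))     # 5.0
print(is_match(t_a, t_b, 2.0))  # False
```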

In this embodiment, step S12 calculates the Gabor function values of the candidate reference samples and the training samples, and the distances between each candidate reference sample and the other candidate reference samples, and derives from the Gabor function values and the distances the difference value between each candidate reference sample and the other candidate reference samples.

In this embodiment, the difference value between a candidate reference sample es_id and the other candidate reference samples is calculated by the following formula:

where es_id is the d-th candidate reference sample of class i, es_hj is the j-th candidate reference sample of class h, D(es_id, es_hj) is the distance between sample es_id and sample es_hj, h is the family category to which es_id was wrongly assigned, μ is the trade-off coefficient, N is the number of reference samples contained in family category h, M is the number of reference samples, and l is the length of the image texture feature vector obtained by the Gabor filter. The magnitude of the difference value between each candidate reference sample and the training samples of the current family category determines whether the candidate should become a new reference sample, which improves the reliability of the reference samples.

For a sample set C containing n samples from m family categories, written C = {C_1, C_2, …, C_m}, where C contains n_1 training samples and n_2 unknown samples with n = n_1 + n_2, the candidate reference samples are selected as follows:

① Randomly select m samples {b_11, b_21, …, b_m1} from the training samples as reference samples, where b_ij denotes the j-th reference sample of family i; that is, one training sample is selected at random from the training samples of each family category as the initial reference sample (the first reference sample);

② Match each remaining training sample against the initial reference samples and collect the wrongly matched samples, that is, the wrongly assigned samples. The wrongly matched samples likewise fall into m classes and are taken as candidate reference samples. Suppose the m classes of wrongly matched samples are written:

es = {{es_11, es_12, …}, {es_21, es_22, …}, …, {es_m1, es_m2, …}}

③ Perform a second round of matching within each family category's set of wrongly matched samples. Specifically, with the candidate reference sample set of family i written {es_i1, es_i2, …}, calculate the Gabor function value of each candidate reference sample, {gabor_l(es_i1), gabor_l(es_i2), gabor_l(es_i3), …}, then calculate the differences between the candidate reference samples, computing the difference value of sample es_id with respect to family i by formula (5). If the difference value of candidate sample es_id satisfies D(es_id) > ρ, add es_id as a new reference sample (second reference sample) of family i.
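Steps ① to ③ can be sketched as below. This is a simplified reading under stated assumptions, not the embodiment's implementation: feature vectors are flat lists of numbers, matching assigns a sample to the family of its nearest first reference sample, the distance D is taken to be Euclidean, and the difference value of a candidate is the plain sum of its distances to the other candidates of the same family (the Gabor-value term and the coefficient μ are omitted). The function names are hypothetical.

```python
import math
import random


def pick_first_references(training, rng):
    """Step 1: choose one training sample per family at random."""
    return {family: rng.choice(samples) for family, samples in training.items()}


def select_references(training, first, rho):
    """Steps 2 and 3: collect misassigned samples, keep those whose difference exceeds rho."""
    # Step 2: assign each remaining sample to the family of its nearest first
    # reference sample; samples that land in the wrong family become candidates.
    candidates = {family: [] for family in training}
    for family, samples in training.items():
        for sample in samples:
            if sample is first[family]:
                continue
            assigned = min(first, key=lambda f: math.dist(sample, first[f]))
            if assigned != family:
                candidates[family].append(sample)
    # Step 3: within each family, sum the distances from a candidate to the other
    # candidates and promote it to a second reference sample when the sum exceeds rho.
    references = {family: [ref] for family, ref in first.items()}
    for family, cands in candidates.items():
        for cand in cands:
            diff = sum(math.dist(cand, other) for other in cands if other is not cand)
            if diff > rho:
                references[family].append(cand)
    return references


training = {
    "A": [[0, 0], [1, 0], [10, 3], [11, 4]],
    "B": [[10, 10], [9, 10]],
}
# Fixed first reference samples keep the printed result reproducible;
# pick_first_references(training, random.Random(0)) would choose them at random.
first = {"A": training["A"][0], "B": training["B"][0]}
print(select_references(training, first, rho=1.0))
# {'A': [[0, 0], [10, 3], [11, 4]], 'B': [[10, 10]]}
```

Note that under this simplified difference value a family with a single candidate never gains a second reference sample, because the sum over the other candidates is zero.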

本实施例对待检测恶意代码，首先将待检测恶意代进行图像转化，将二进制的待检测恶意代码转化为灰度图像形式，再提取图像纹理特征，具体与上述训练样本的处理方法相同。In this embodiment, the malicious code to be detected first converts the malicious code to be detected into an image, converts the binary malicious code to be detected into a grayscale image form, and then extracts image texture features, which is the same as the above-mentioned training sample processing method.

本实施例中，步骤S3中基于提取的图像纹理特征，确认待检测恶意代码的家族类别的具体步骤为：In this embodiment, based on the extracted image texture features in step S3, the specific steps for confirming the family category of the malicious code to be detected are:

S31.分别获取待检测恶意代码与各家族类别的基准样本集中各基准样本的匹配结果；S31. Respectively obtain the matching results of the malicious code to be detected and each benchmark sample in the benchmark sample set of each family category;

S32.由各家族类别的所有匹配结果分别得到对应各家族类别的综合匹配值，根据各家族类别的综合匹配值判断待检测恶意代码是否属于对应家族类别。S32. Obtain a comprehensive matching value corresponding to each family category from all matching results of each family category, and judge whether the malicious code to be detected belongs to the corresponding family category according to the comprehensive matching value of each family category.

In this embodiment, the comprehensive matching value is calculated according to the following formula:

$$R = \sum_{j=0}^{N} w_{ij} \qquad (6)$$

In this embodiment, when the malicious code to be detected es_test is matched against benchmark sample es_ij, the matching result w_ij is 1 if they match and -1 if they do not; other values can of course be assigned to the matching result as required. All matching results obtained for each family category are then summed to give the final comprehensive matching value, which determines the family category.

In this embodiment, when the comprehensive matching value R of the target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise it is judged not to belong to it.
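The summation and the R > 0 rule can be illustrated with a short sketch. The function names and the example list of match outcomes are hypothetical; only the +1/-1 scoring and the threshold of 0 come from the text.

```python
def comprehensive_match_value(matches):
    """Sum per-benchmark results: +1 for a match, -1 for a mismatch."""
    return sum(1 if matched else -1 for matched in matches)


def belongs_to_family(matches):
    """Apply the R > 0 decision rule to one family's matching results."""
    return comprehensive_match_value(matches) > 0


# Hypothetical outcomes of matching one test sample against three benchmark samples.
example = [True, True, False]
r = comprehensive_match_value(example)   # 1 + 1 - 1 = 1
decision = belongs_to_family(example)    # True, because 1 > 0
```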

The invention is further described below, taking the detection and classification of 10 test samples from two family categories as an example.

The training samples used in this embodiment are listed in Table 1.

Table 1: Training samples.

Step 1: Benchmark sample selection

Step 1.1: Texture feature extraction from training sample images

Take as examples two training samples of family (1), S1 (0B06744D7C5822BA585C5992B10ADFA0) and S2 (0BDAFFBA037A4880D31C93C0AADCC1FE), and two training samples of family (2), S3 (2C69C485A46B03C277B5F88DED0BABF0) and S4 (2C9F38EF39CFD73AA52E22869E8ABD90). The four malicious code training samples are first converted from binary files into grayscale images. For example, the binary code "01100111" converts to the unsigned integer 103, so the corresponding pixel in the grayscale image has the value 103. The resulting grayscale images are shown in Figure 4, where panel (a) shows family (2), corresponding to samples S3 and S4, and panel (b) shows family (1), corresponding to samples S1 and S2. A Gabor filter is then used to extract the texture features of the malicious code; the Gabor filter is implemented as in formulas (1) to (3) above. The computed texture features of the four samples are:

T_S1 = ([3.64589196e-01, 1.78531921e-02], [1.11456886e-01, 3.62631582e-03], [2.45940133e-01, 4.82167451e-03], [3.66851460e-04, 1.85390288e-04]);

T_S2 = ([3.67820753e-01, 2.47166142e-02], [1.12444790e-01, 5.22362168e-03], [2.48120037e-01, 7.30584538e-03], [3.70103068e-04, 3.69625304e-04]);

T_S3 = ([3.82683113e-01, 1.65478632e-02], [1.16988294e-01, 5.31969120e-03], [2.58145706e-01, 3.20882018e-03], [3.85057648e-04, 4.53992963e-04]);

T_S4 = ([3.78114609e-01, 2.53183776e-02], [1.15591678e-01, 5.70669572e-03], [2.55063941e-01, 7.49917029e-03], [3.80460797e-04, 3.41053618e-04]).
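How a Gabor filter bank can yield four value pairs per sample is sketched below. Formulas (1) to (3) are not reproduced in this part of the document, so the sketch uses the standard real-valued Gabor function instead. The kernel size, σ, λ, γ, the four orientations, and the reading of each pair as the mean and variance of one filter response are all assumptions; the sketch is not expected to reproduce the numbers listed above.

```python
import math


def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor function on a size x size grid (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)    # rotate coordinates by theta
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """Slide the kernel over the image (no padding) and collect the responses."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    responses = []
    for top in range(height - k + 1):
        for left in range(width - k + 1):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    acc += kernel[dy][dx] * image[top + dy][left + dx]
            responses.append(acc)
    return responses


def texture_features(image, size=5, orientations=4):
    """Return one [mean, variance] pair per filter orientation (assumed layout)."""
    features = []
    for n in range(orientations):
        kernel = gabor_kernel(size, theta=n * math.pi / orientations)
        responses = filter_valid(image, kernel)
        mean = sum(responses) / len(responses)
        variance = sum((r - mean) ** 2 for r in responses) / len(responses)
        features.append([mean, variance])
    return features


# Hypothetical 8x8 image with a diagonal gradient, pixel values scaled to [0, 1].
image = [[((x + y) * 16) / 255 for x in range(8)] for y in range(8)]
features = texture_features(image)   # four [mean, variance] pairs
```

A production implementation would normally rely on an image-processing library for the filtering; the nested loops here only keep the example self-contained.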

Step 1.2: Candidate benchmark sample selection

Family (1) and family (2) in this embodiment provide training sample sets 1 and 2, respectively, as shown in Table 1, each containing 10 training samples. Formula (4) is used to calculate the differences between training samples for matching. Assuming the initial benchmark samples are S1 and S3, the matching result is that samples S7 and S10 in training sample set 1 are misassigned and sample S5 in training sample set 2 is misassigned. These three samples are therefore added as candidate benchmark samples {[c₁₁, c₁₂], [c₂₁]}.
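The candidate search can be sketched as a nearest-benchmark assignment followed by a check against the known family. Formula (4) is not reproduced in this part of the document, so plain Euclidean distance over the flattened texture features stands in for it; that substitution, and all names in the code, are assumptions. The vectors are the Step 1.1 features of S1 to S4, the only training samples whose features are listed.

```python
import math

# Texture features from Step 1.1, flattened into 8-element vectors.
FEATURES = {
    "S1": [3.64589196e-01, 1.78531921e-02, 1.11456886e-01, 3.62631582e-03,
           2.45940133e-01, 4.82167451e-03, 3.66851460e-04, 1.85390288e-04],
    "S2": [3.67820753e-01, 2.47166142e-02, 1.12444790e-01, 5.22362168e-03,
           2.48120037e-01, 7.30584538e-03, 3.70103068e-04, 3.69625304e-04],
    "S3": [3.82683113e-01, 1.65478632e-02, 1.16988294e-01, 5.31969120e-03,
           2.58145706e-01, 3.20882018e-03, 3.85057648e-04, 4.53992963e-04],
    "S4": [3.78114609e-01, 2.53183776e-02, 1.15591678e-01, 5.70669572e-03,
           2.55063941e-01, 7.49917029e-03, 3.80460797e-04, 3.41053618e-04],
}
BENCHMARKS = {1: "S1", 2: "S3"}      # initial (first) benchmark sample per family
TRUE_FAMILY = {"S2": 1, "S4": 2}     # remaining samples with listed features


def distance(a, b):
    """Euclidean distance, used here as a stand-in for formula (4)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_family(sample):
    """Assign a sample to the family whose benchmark sample is closest."""
    return min(BENCHMARKS,
               key=lambda fam: distance(FEATURES[sample], FEATURES[BENCHMARKS[fam]]))


def find_candidates(true_family):
    """Misassigned training samples become candidate benchmark samples."""
    return [s for s, fam in true_family.items() if assign_family(s) != fam]


candidates = find_candidates(TRUE_FAMILY)   # S2 and S4 are both assigned correctly
```

Under this stand-in distance, S2 and S4 fall on the correct side, so neither becomes a candidate; the misassigned samples S7, S10, and S5 reported above would be found the same way once their features are available.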

Step 1.3: Second benchmark sample determination

The texture features of the candidate benchmark samples are calculated as:

([3.53133564e-01, 2.24345224e-02], [1.07954837e-01, 4.99747304e-03], [2.38212532e-01, 6.49801062e-03], [3.55324746e-04, 4.32171770e-04]),

([3.54380214e-01, 2.24449735e-02], [1.08335945e-01, 5.00705347e-03], [2.39053482e-01, 6.41765146e-03], [3.56579131e-04, 3.85045161e-04]),

([3.66485717e-01, 2.55705031e-02], [1.12036663e-01, 5.15760513e-03], [2.47219465e-01, 8.83855001e-03], [3.68759749e-04, 3.02423971e-04]).

The difference value between each candidate benchmark sample and the other benchmark samples is calculated with formula (5) above, with μ = 2.

This embodiment assumes the threshold ρ = 0.45. Since D(c₁₁) > ρ and D(c₁₂) > ρ, candidate benchmark samples c₁₁ and c₁₂ are added as new benchmark samples (second benchmark samples), so the benchmark sample set of family (1) is {b₁₁, c₁₁, c₁₂}. Since family (2) has only one candidate benchmark sample, it is added directly as a new benchmark sample (second benchmark sample), giving the benchmark sample set {b₂₁, c₂₁}.
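The thresholding step can be sketched as below. The sketch reads formula (5) as a within-family term (the mean Euclidean distance to the other candidates of the same family) plus a cross-family term (the summed distance to another family's samples, scaled by 1/(Nμ)). That reading, the function names, and the toy two-dimensional vectors are assumptions, so the values do not correspond to the D(c₁₁) and D(c₁₂) of this embodiment.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def difference_value(candidate, same_family, other_family, mu=2.0):
    """Assumed reading of formula (5): within-family mean plus scaled cross-family sum."""
    within = sum(distance(candidate, s) for s in same_family) / len(same_family)
    cross = sum(distance(candidate, s) for s in other_family) / (len(other_family) * mu)
    return within + cross


def select_second_benchmarks(candidates, other_family, rho, mu=2.0):
    """Keep candidates whose difference value exceeds rho.

    A family with a single candidate keeps it directly, as in this embodiment.
    """
    if len(candidates) == 1:
        return list(candidates)
    selected = []
    for name, vec in candidates.items():
        others = [v for n, v in candidates.items() if n != name]
        if difference_value(vec, others, other_family, mu) > rho:
            selected.append(name)
    return selected


# Toy data: two candidates of one family and two samples of another family.
toy_candidates = {"c11": [0.0, 0.0], "c12": [3.0, 4.0]}
toy_other = [[0.0, 2.0], [0.0, 4.0]]
d_c11 = difference_value(toy_candidates["c11"], [toy_candidates["c12"]], toy_other)
# d_c11 = 5.0 + (2.0 + 4.0) / (2 * 2.0) = 6.5
```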

Step 2: Texture feature extraction from test sample images

Each malicious code test sample is converted into a grayscale image and its image texture features are extracted, using the method described above.

Step 3: Detection and classification

The benchmark sample set {b₁₁, c₁₁, c₁₂} of family (1) and the benchmark sample set {b₂₁, c₂₁} of family (2) are used to match each test sample, where the test sample set contains ten test samples {S₁, S₂, S₃, S₄, S₅, S₆, S₇, S₈, S₉, S₁₀}. The comprehensive matching results of the test samples, obtained with formula (6) above, are:

Table 2: Test matching results.

Test sample  S₁  S₂  S₃  S₄  S₅  S₆  S₇  S₈  S₉  S₁₀
Family 1     3   3   3   1   3   0   0   3   1   0
Family 2     0   0   0   2   0   2   2   0   2   2

If the comprehensive matching result is greater than 0, the sample is judged to belong to the corresponding family category; otherwise it does not. From the comprehensive matching results above, the final test result is that {S₁, S₂, S₃, S₅, S₈} belong to family (1) and {S₄, S₆, S₇, S₉, S₁₀} belong to family (2). The detection results show that the detection method of this embodiment divides malicious code into family categories accurately and efficiently.
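The grouping can be checked against the values in Table 2 with a few lines of code. Samples S₄ and S₉ have positive values for both families, so the sketch assumes that the family with the larger positive value wins; this tie-break reproduces the stated grouping but is an assumption, as are the names used.

```python
SAMPLES = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10"]
FAMILY_1 = [3, 3, 3, 1, 3, 0, 0, 3, 1, 0]   # comprehensive matching values, Table 2
FAMILY_2 = [0, 0, 0, 2, 0, 2, 2, 0, 2, 2]


def classify(r1, r2):
    """Return 1 or 2 for the family with the larger positive value, else None."""
    if max(r1, r2) <= 0 or r1 == r2:
        return None                          # no family exceeds 0, or no clear winner
    return 1 if r1 > r2 else 2


result = {name: classify(r1, r2) for name, r1, r2 in zip(SAMPLES, FAMILY_1, FAMILY_2)}
family_1 = [name for name, fam in result.items() if fam == 1]
family_2 = [name for name, fam in result.items() if fam == 2]
# family_1 is ["S1", "S2", "S3", "S5", "S8"]
# family_2 is ["S4", "S6", "S7", "S9", "S10"]
```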

The above are only preferred embodiments of the invention and do not limit the invention in any form. Although the invention has been disclosed above through preferred embodiments, they are not intended to limit it. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, shall fall within the scope of protection of the technical solution of the invention.

Claims

1. A malicious code detection method based on image matching, characterized by comprising the following steps:

S1. Selecting reference samples: acquiring training samples of malicious codes corresponding to different family categories, converting each training sample into a grayscale image, and extracting the corresponding image texture features; selecting a first reference sample from the training samples of each family category, selecting a second reference sample according to the difference in image texture features between the first reference sample and the samples, and forming the first reference sample and the second reference sample selected for each family category into a corresponding reference sample set;

S2. Image feature extraction: converting the malicious code to be detected into a grayscale image and extracting the corresponding image texture features;

S3. Test code classification: matching the image texture features extracted in step S2 against the reference sample set of each family category, and confirming the family category of the malicious code to be detected according to the matching results.

2. The malicious code detection method based on image matching according to claim 1, wherein the specific steps of selecting the second reference sample in step S1 are:

S11. Obtaining candidate reference samples: matching the selected first reference samples against the remaining training samples, and taking the training samples that are misassigned in each family category, according to the matching results, as candidate reference samples;

S12. Determining the second reference sample: calculating, within each family category, the difference value between each candidate reference sample and the other candidate reference samples, and if the calculated difference value is greater than a specified threshold, taking the corresponding candidate reference sample as a second reference sample of the corresponding family category.

3. The method according to claim 2, wherein in step S12 the Gabor function value of each candidate reference sample and the distance value between each candidate reference sample and the other candidate reference samples are calculated, and the difference value between each candidate reference sample and the other candidate reference samples is calculated from the Gabor function values and the distance values.

4. The image matching-based malicious code detection method according to claim 3, wherein the difference value between one candidate reference sample and the other candidate reference samples is calculated according to the following formula:

$$D(es_{id}) = \frac{1}{M}\sum_{j=0,1,\ldots,M}\sqrt{\sum_{l=0,1,\ldots}\left(gabor_l(es_{id}) - gabor_l(es_{ij})\right)^2} + \frac{1}{N\mu}\sum_{d=1}^{N} p_d(es_{id})$$

$$p_d(es_{id}) = \sum_{j=0,1,\ldots,N} D(es_{id}, es_{hj})$$

wherein es_id is the d-th candidate reference sample of the i-th class, es_hj is the j-th candidate reference sample of the h-th class, D(es_id, es_hj) is the distance value between sample es_id and sample es_hj, h is the family class against which es_id is compared, μ is a weighting coefficient, N is the number of reference samples contained in family class h, M is the number of reference samples, and l is the vector length of the image texture feature.

5. The malicious code detection method based on image matching according to any one of claims 1 to 4, wherein the image texture features are signal type static texture features.

6. The malicious code detection method based on image matching according to any one of claims 1 to 4, wherein the image texture features are extracted with a Gabor filter.

7. The image matching-based malicious code detection method according to any one of claims 1 to 4, wherein the specific steps of confirming the family category of the malicious code to be detected in the step S3 are as follows:

S31. Obtaining the matching result between the malicious code to be detected and each reference sample in the reference sample set of each family category;

S32. Obtaining, from all matching results of each family category, a comprehensive matching value for that family category, and judging from the comprehensive matching value of each family category whether the malicious code to be detected belongs to that family category.

8. The image matching-based malicious code detection method according to claim 7, wherein the comprehensive matching value is calculated according to the following formula:

$$R = \sum_{j=0}^{N} w_{ij}$$

wherein w_ij is the matching result between es_test and es_ij, es_test is the malicious code to be detected, es_ij is the j-th reference sample of the i-th class, and N is the number of reference samples contained in family class i.

9. The image matching-based malicious code detection method according to claim 8, wherein when the comprehensive matching value R corresponding to the target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise, it is judged not to belong to the target family category.