CN1625121A

CN1625121A - A Layered Cooperative Network Virus and Malicious Code Identification Method

Info

Publication number: CN1625121A
Application number: CN 200310106551
Authority: CN
Inventors: 王煦法; 曹先彬; 罗文坚; 马建辉; 张四海
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2003-12-05
Filing date: 2003-12-05
Publication date: 2005-06-08
Anticipated expiration: 2023-12-05
Also published as: CN1300982C

Abstract

A layered cooperative method for identifying network viruses and malicious code. Drawing on the strong self-protection mechanism of biological immunity, the method maps network virus and malicious code identification onto the multi-layer protection mechanism of the biological immune system. It judges the risk degree of a script under test by statistically analyzing keyword frequencies; it analyzes and judges abnormal behavior in registry write entry paths from the viewpoint of a "self set" of registry operations; it performs non-self identification on the execution sequence of application programming interface (API) calls; and finally it sends all abnormal behavior information over the network to a network console. The method addresses the identification of unknown network viruses and malicious code with good identification capability, and enables monitoring and management of network viruses and malicious code on a single system and across the whole subnet.

Description

A Layered Cooperative Network Virus and Malicious Code Identification Method

Technical field:

The invention belongs to the technical field of computer network security, and in particular relates to techniques for identifying network viruses and malicious code.

Background art:

According to IEEE Potentials, a magazine of the Institute of Electrical and Electronics Engineers published in the United States (October 2001, No. 4, pp. 16-19), existing computer anti-virus identification techniques fall roughly into the following categories. (1) Signature-based scanning, which mainly targets known viruses. (2) Virtual machine technology, whose basic idea is to execute a suspicious program inside a virtual machine to determine whether it is a virus; it still faces many problems, such as the effectiveness of the virtual machine and how to guarantee the virtual machine's own security. (3) Heuristic methods, whose basic idea is to detect virus families and unknown viruses through generalized signatures. They often rely on signature and virtual machine technology, and their effectiveness against unknown viruses still needs improvement. (4) Behavior analysis, which detects viruses by monitoring behavior characteristic of viruses. It requires first summarizing a general behavior pattern of viruses and then designing a finite state machine for that pattern, in which state transitions correspond to program behavior and the accepting state corresponds to detection of a virus. The problem is that it is hard to summarize a general behavior pattern for the endless stream of new viruses. (5) Checksum methods, which generate and save verification information in the machine's initial state and raise an alarm when that information changes abnormally (verification failure). Their main problems are high implementation overhead and the difficulty of handling the installation of new applications and version upgrades. In general, among existing anti-virus techniques, signature scanning is mainly used to identify known viruses, and the various techniques proposed for unknown viruses each have their own shortcomings and limitations.

Network viruses and malicious code have become widespread and seriously harmful network security incidents only in recent years. The method for preventing computer virus infection proposed in Chinese Patent Application No. 96114050 can guard against only some early computer viruses, and such anti-virus cards have now completely faded from the market. The firewall system proposed in Chinese Patent Application No. 96109573 performs security checks on connections or information entering and leaving an internal network and has essentially no ability to identify network viruses and malicious code. These techniques are therefore unsuitable for identifying network viruses and malicious code.

Summary of the invention:

To address the deficiencies of existing techniques for identifying network viruses and malicious code, the present invention proposes a layered cooperative identification method that solves the problem of identifying the abnormal behavior of unknown network viruses and malicious code, and that monitors such abnormal behavior on a single system and across the whole subnet.

The layered cooperative network virus and malicious code identification method of the present invention comprises: separating keywords from script files; obtaining the application programming interface (API) execution sequence and the registry write entry paths by injecting a dynamic link library (DLL); and saving the registry write entry paths and the API sequence on hard disk or in memory. It is characterized by:

performing statistical analysis of the keyword frequencies of scripts and making an abnormality judgment;

performing self identification on the registry write entry paths and making an abnormality judgment;

performing non-self identification on the API sequence and making an abnormality judgment;

sending abnormal behavior information to the network console;

The script files are script files written in JavaScript, script files written in VBScript, and script files in which JavaScript or VBScript code is embedded;

Obtaining the API execution sequence and the registry write entry paths by DLL injection means injecting the DLL into the target program (that is, the program to be monitored) as a remote thread, then intercepting the target program's API execution sequence by replacing entries in the Import Address Table (IAT), and obtaining the registry write entry paths from the parameters of the registry API functions;

Statistical analysis of script keywords and the corresponding abnormality judgment mean separating out from the script file the 29 keywords copyfile, Createobject, Delete, FolderDelete, RegWrite, Virus, .Write, GetSpecialFolder, keys, opentextfile, readall, .save, startup, execute, .add, buildpath, copyfolder, createfolder, createtextfile, deletefile, fileexists, folderexists, getfile, getfolder, getparentfolder, format, .run, do copy and document.write, and performing the following steps:

(1) Divide the 29 keywords into three groups. The first group is the object-creation keyword: Createobject. The second group is the keywords that are not dangerous operations in themselves: Virus, .Write, GetSpecialFolder, keys, opentextfile, readall, startup, execute, .add, buildpath, fileexists, folderexists, getfile, getfolder, getparentfolder, .run, document.write. The third group is the keywords that may perform destructive operations: copyfile, Delete, FolderDelete, RegWrite, .save, copyfolder, createfolder, createtextfile, deletefile, format, do copy;

(2) Compute the expected frequency f_i (1 ≤ i ≤ 29) with which each of the 29 keywords occurs in normal scripts and the expected frequency f_i′ (1 ≤ i ≤ 29) with which it occurs in abnormal scripts, then calculate the normalized frequency difference of the 29 keywords between normal and abnormal scripts: $e_{i} = (f_{i} - f_{i}') / \sum_{i=1}^{29} (f_{i} - f_{i}'), \quad 1 \leq i \leq 29;$

(3) Count the frequency m_i (1 ≤ i ≤ 29) with which each keyword occurs in the script under test, and calculate the risk degree Risk of that script:

$Risk = G \sum_{i=1}^{29} P(i) F(i)$

where P(i), F(i) and G are, respectively:

(4) Define the risk threshold TH as:

$TH = \sum_{i=1}^{29} P(i) / 29$

When the risk degree Risk exceeds the threshold TH, warning information is sent to the network console;
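The script-analysis steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the definitions of P(i), F(i) and G are not reproduced in this text, so the sketch takes them as caller-supplied values, and the function names and the case-insensitive substring counting are assumptions.

```python
# The 29 keywords listed in the text; matching is case-insensitive (an assumption).
KEYWORDS = [
    "copyfile", "Createobject", "Delete", "FolderDelete", "RegWrite", "Virus",
    ".Write", "GetSpecialFolder", "keys", "opentextfile", "readall", ".save",
    "startup", "execute", ".add", "buildpath", "copyfolder", "createfolder",
    "createtextfile", "deletefile", "fileexists", "folderexists", "getfile",
    "getfolder", "getparentfolder", "format", ".run", "do copy", "document.write",
]


def keyword_counts(script_text):
    """Return m_i, the occurrence count of each keyword in one script.

    Plain substring counting is used, so a keyword nested in another
    (for example Delete inside deletefile) is counted for both.
    """
    lowered = script_text.lower()
    return [lowered.count(k.lower()) for k in KEYWORDS]


def normalized_frequency_difference(f_normal, f_abnormal):
    """e_i = (f_i - f_i') / sum_i (f_i - f_i'), as in step (2)."""
    diffs = [a - b for a, b in zip(f_normal, f_abnormal)]
    total = sum(diffs)
    if total == 0:
        raise ValueError("the frequency differences sum to zero")
    return [d / total for d in diffs]


def risk(G, P, F):
    """Risk = G * sum_i P(i) * F(i); G, P and F are supplied by the caller."""
    return G * sum(p * f for p, f in zip(P, F))


def threshold(P):
    """TH = sum_i P(i) / 29."""
    return sum(P) / len(KEYWORDS)
```

A script would then be reported when `risk(G, P, F) > threshold(P)`, with `P` and `F` built from the `e_i` and `m_i` values according to the definitions in the original patent.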

Self identification of the registry write entry paths and the corresponding abnormality judgment take the following steps:

(1) Collect the normal registry write entry paths of the target program (the program to be monitored) in the normal state and store them in a database. Each normal registry write entry path is called a "self", and their collection is called the "self set";

(2) Read the current registry write entry path and compare it with the "self" operations already in the database; if it is not in the "self set", send abnormal behavior information to the network console;
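The self-set check on registry write paths amounts to a set-membership test. The sketch below assumes an in-memory set in place of the database mentioned in the text, and lower-cases paths because Windows registry key names are case-insensitive; the function names and the example paths are hypothetical.

```python
def build_self_set(normal_paths):
    """Collect the registry write paths observed during normal runs."""
    # Lower-casing is an assumption made because registry key names
    # are case-insensitive on Windows.
    return {path.lower() for path in normal_paths}


def is_abnormal(path, self_set):
    """A write path is abnormal when it is not a member of the self set."""
    return path.lower() not in self_set


if __name__ == "__main__":
    # Illustrative paths only; they are not taken from the patent.
    self_set = build_self_set([r"HKCU\Software\Example\Settings"])
    for written in (r"HKCU\Software\Example\Settings",
                    r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run\updater"):
        status = "abnormal" if is_abnormal(written, self_set) else "self"
        print(written, status)
```

Because the check is exact membership, any write to a path never seen during normal operation is reported, which is why the self set has to be collected from runs that cover the target program's normal behavior.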

Non-self identification of the API sequence and the corresponding abnormality judgment take the following steps:

(1) API selection:

(a) Intercept the API sequence of the target program in the normal state and cut it, with sliding step W₀, into a set S₀ of strings of length L₀;

(b) Intercept the API sequence of the target program while it runs infected and cut it, with sliding step W₀, into a set R₀ of strings of length L₀;

(c) Compare the sequences that differ between the string sets S₀ and R₀, extract the API functions that make up those sequences, and use these API functions as the set of API functions to be monitored;

(2) Using the selected API functions, intercept the API sequence of the target program in the normal state and cut it, with sliding step W, into strings of length L to generate the self set S;

(3) Obtain the current API execution sequence of the target program and cut it, with sliding step W, into strings of length L; read N API sequences at a time and perform the following detection process:

(a) Generate the initial detector set D₀: randomly generate pre-detectors from the selected API functions and filter out self (that is, delete the API sequences that match self) to obtain the initial detector set. The matching strategy is partial matching: two sequences match if and only if the two strings agree at r consecutive positions;

(b) Compare the current API execution sequence with every detector in the detector set: if a match is found, mark the sequence and increase the total match count by 1; when the total number of matches for the API sequences acquired in real time reaches the threshold G_n, send abnormal behavior information to the network console;

(c) If the number of evolution generations t exceeds the threshold G_e or all API sequences have been marked, continue by reading and checking the next batch of API sequences; otherwise, for the unmatched API sequences, the three subsets D_A, D_G and D_R, produced respectively by affinity mutation, gene pool evolution and random generation, together with the memory set D_M, form the next-generation detector set D_t = D_A + D_G + D_R + D_M, where the subsets D_A, D_G and D_R satisfy $\frac{D_{A}}{1} \approx \frac{D_{G}}{2} \approx \frac{D_{M}}{1};$
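The core of the detection steps above, namely cutting a call trace into fixed-length strings, partial matching at r consecutive positions, and filtering randomly generated pre-detectors against the self set, can be sketched in Python as follows. API calls are represented by integer ids; the function names, the `max_tries` cap and the use of a seeded random generator are assumptions of this sketch, and the evolutionary update of the detector set is left out.

```python
import random


def windows(seq, length, step):
    """Cut a sequence of API ids into strings of `length`, sliding by `step`."""
    return [tuple(seq[i:i + length])
            for i in range(0, len(seq) - length + 1, step)]


def r_contiguous_match(a, b, r):
    """True when a and b agree at r or more consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, api_ids, length, r, count, rng, max_tries=100000):
    """Generate random pre-detectors and keep those that match no self string."""
    detectors = []
    tries = 0
    while len(detectors) < count and tries < max_tries:
        tries += 1
        candidate = tuple(rng.choice(api_ids) for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def count_matches(batch, detectors, r):
    """Number of strings in the batch matched by at least one detector."""
    return sum(1 for s in batch
               if any(r_contiguous_match(s, d, r) for d in detectors))


if __name__ == "__main__":
    normal_trace = [0, 1, 2, 3] * 5                    # toy normal API trace
    self_set = set(windows(normal_trace, length=4, step=1))
    rng = random.Random(0)
    detectors = generate_detectors(self_set, list(range(6)), 4, 3, 5, rng)
    current = windows([0, 1, 2, 3, 5, 5, 4, 4, 5, 5], length=4, step=1)
    print(count_matches(current, detectors, r=3), "of", len(current), "strings matched")
```

Since every detector is filtered against the self set, a string drawn from normal behavior is never matched; an alert would be raised once `count_matches` for the current batch reaches the threshold G_n.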

The detector subset D_A is produced by affinity mutation: when the degree of match between an API sequence and any detector in the detector set exceeds the affinity threshold G_f, N_c (N_c ≥ 1) offspring individuals are produced by mutation;

The detector subset D_G is produced by gene pool evolution, which raises the selection probability of the APIs that make up effective detectors, that is, P_api = P_api + ΔP. When detectors are actually generated, pre-detectors are produced by roulette-wheel selection according to the API selection probabilities and are then filtered against self to yield the detector subset D_G;
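Gene pool evolution as described here combines a weight update with roulette-wheel selection. The following Python sketch shows one way to express it; the function names are hypothetical, the weights are kept unnormalized, and applying ΔP once per distinct API in an effective detector (rather than once per occurrence) is an assumption.

```python
import random


def reinforce(probabilities, detector, delta):
    """P_api = P_api + delta for each distinct API in an effective detector."""
    for api in set(detector):
        probabilities[api] += delta


def roulette_pick(probabilities, rng):
    """Pick one API id with probability proportional to its weight."""
    total = sum(probabilities.values())
    spin = rng.uniform(0, total)
    running = 0.0
    for api, weight in probabilities.items():
        running += weight
        if spin <= running:
            return api
    return api  # guards against floating-point rounding at the upper edge


def roulette_candidate(probabilities, length, rng):
    """Build one pre-detector of the given length by repeated roulette picks."""
    return tuple(roulette_pick(probabilities, rng) for _ in range(length))


if __name__ == "__main__":
    weights = {0: 1.0, 1: 1.0, 2: 1.0}
    reinforce(weights, detector=(2, 2, 1), delta=0.5)
    print(weights)
    print(roulette_candidate(weights, length=4, rng=random.Random(0)))
```

The candidates produced this way would still have to be filtered against the self set before joining D_G, as the text requires.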

The detector subset D_R is produced by random generation;

The memory set D_M consists of the existing detectors that are able to match abnormal sequences;

The network console is a network program that receives the abnormality information obtained by analyzing scripts, registry write entry paths and API sequences.

Compared with the prior art, the present invention has the following advantages:

1. By computing the frequencies of the 29 selected keywords in normal and abnormal scripts, the invention obtains normalized frequencies and, on that basis, gives methods for calculating the risk degree and the risk threshold used to judge the risk of a script under test, which solves the problem of identifying malicious scripts.

2. The invention judges abnormal behavior in registry write entry paths from the viewpoint of a "self set" of registry operations, and is applicable to a variety of target programs.

3. The invention combines four learning and memory modules (gene pool evolution, random generation, affinity mutation and the memory set) with anomaly detection on API execution sequences, so that non-self identification on API sequences detects anomalies well and is applicable to a variety of target programs.

4. Drawing on the strong self-protection mechanism of biological immunity, the invention for the first time unifies keyword statistical analysis of scripts, self identification of registry write entry paths and non-self identification of API execution sequences to monitor the abnormal behavior of target programs, which improves the identification of unknown network viruses and malicious code.

5. The invention automatically records in detail the registry write entry paths and API execution sequences of programs, providing first-hand material for further analysis of network viruses and malicious code.

In summary, the present invention draws on the strong self-protection mechanism of biological immunity and maps network virus and malicious code identification onto the multi-layer protection mechanism of the biological immune system. Through keyword statistical analysis of scripts, self identification of registry write entry paths and non-self identification of API execution sequences, it solves fairly well the problem of identifying the abnormal behavior of unknown network viruses and malicious code, and thereby the difficulty that existing techniques have in identifying virus variants and unknown viruses. It not only monitors the abnormal behavior of network viruses and malicious code on a single system but also lets an administrator monitor and manage the security of the whole subnet in real time through the network console.

Description of the drawings:

Fig. 1 is a workflow diagram of the layered cooperative identification of network viruses and malicious code according to the present invention.

Detailed description of embodiments:

The method of the present invention is described in further detail below with reference to the drawing and an example.

Embodiment 1:

1. Several general-purpose personal computers are connected through a switch to form a network environment

This embodiment uses three Pentium IV microcomputers, one Dell notebook, one enterprise server and one Great Wall GES-1125 24-port 10M/100M auto-sensing Ethernet switch; the switch connects the three Pentium IV microcomputers, the Dell notebook and the enterprise server into one network.

Fig. 1 shows the workflow of the layered cooperative identification of network viruses and malicious code in this embodiment. The arrows indicate the order of the workflow: the tail of an arrow is the input to the next step, and the head is the operation performed next. One Pentium-series microcomputer runs the network console 1; the other two Pentium IV microcomputers, the Dell notebook and the enterprise server all perform keyword frequency statistical analysis of scripts 2, self identification of registry write entry paths 3 and non-self identification of API execution sequences 4, and send the results of all three analyses to the network console 1.

2. Statistical analysis of script keywords and judgment of malicious code abnormality

Keyword frequency statistical analysis of scripts (2 in Fig. 1) takes the following steps:

(1) Collect a large number of normal script files and malicious script files, preferably at least 50 of each, and separate out from the script files the 29 keywords copyfile, Createobject, Delete, FolderDelete, RegWrite, Virus, .Write, GetSpecialFolder, keys, opentextfile, readall, .save, startup, execute, .add, buildpath, copyfolder, createfolder, createtextfile, deletefile, fileexists, folderexists, getfile, getfolder, getparentfolder, format, .run, do copy and document.write;

(2) Divide the 29 keywords into three groups. The first group is the object-creation keyword: Createobject. The second group is the keywords that are not dangerous operations in themselves: Virus, .Write, GetSpecialFolder, keys, opentextfile, readall, startup, execute, .add, buildpath, fileexists, folderexists, getfile, getfolder, getparentfolder, .run, document.write. The third group is the keywords that may perform destructive operations: copyfile, Delete, FolderDelete, RegWrite, .save, copyfolder, createfolder, createtextfile, deletefile, format, do copy;

(3) Normal-script keyword frequency statistics (A1 in Fig. 1): compute the expected frequency f_i (1 ≤ i ≤ 29) with which each of the 29 keywords occurs in normal scripts;

(4) Abnormal-script keyword frequency statistics (A2 in Fig. 1): compute the expected frequency f_i′ (1 ≤ i ≤ 29) with which each of the 29 keywords occurs in malicious scripts;

(5) Normalized frequency calculation (A3 in Fig. 1): calculate the normalized frequency difference of the 29 keywords between normal and abnormal scripts: $e_{i} = (f_{i} - f_{i}') / \sum_{i=1}^{29} (f_{i} - f_{i}'), \quad 1 \leq i \leq 29;$

(6) Analysis of the script under test (A4 in Fig. 1): read a specified script file from hard disk, or read the script file that a browser (such as IExplore.exe) is currently accessing from the browser's temporary file directory, and count the frequency m_i with which each of the 29 keywords occurs in that script;

(7) Risk calculation (A5 in Fig. 1): calculate the risk degree Risk of the script under test:

$Risk = G \sum_{i=1}^{29} P(i) F(i)$

where P(i), F(i) and G are, respectively:

(8) Calculate the risk threshold TH as:

$TH = \sum_{i=1}^{29} P(i) / 29$

(9) Sending of warning information (A6 in Fig. 1): when the risk degree Risk exceeds the threshold TH, send warning information over the network to the network console 1 (the corresponding send and receive programs are written using the Sockets of the Windows operating system).
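Sending a warning to the console over a socket can be sketched as follows. The text only states that Windows Sockets are used; the JSON message format, the one-alert-per-connection framing and the function names below are assumptions of this sketch, written with Python's standard `socket` module.

```python
import json
import socket


def send_alert(host, port, alert):
    """Monitor side: send one JSON-encoded alert to the console over TCP."""
    payload = json.dumps(alert).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)


def receive_alert(server_socket):
    """Console side: accept one connection and decode the alert it carries."""
    conn, _addr = server_socket.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # the sender closed the connection
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


if __name__ == "__main__":
    # Loopback demonstration; a real console would listen on a fixed port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    send_alert("127.0.0.1", port, {"source": "script", "risk": 3.5})
    print(receive_alert(server))
    server.close()
```

The same channel could carry the registry and API-sequence alerts, with the `source` field (a hypothetical name) distinguishing the three layers.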

3. Self identification of registry write entry paths (3 in Fig. 1) and the corresponding abnormality judgment may take the following steps:

(1) Interception of registry write entry paths (B1 in Fig. 1): inject a DLL that intercepts the registry API functions into the target programs, such as IExplore.exe and Outlook.exe, to obtain the calls made to the registry API functions and their parameters, and obtain the registry write entry paths from those parameters. The DLL can be injected by the remote-thread method (for the remote thread function, see CreateRemoteThread in MSDN). Inside the injected DLL, replacing entries in the IAT (Import Address Table) intercepts the target program's API execution sequence; note that GetProcAddress, LoadLibraryA, LoadLibraryExA, LoadLibraryW and LoadLibraryExW need special handling. For details, see Programming Applications for Microsoft Windows by Jeffrey Richter, published by Microsoft;

(2) Self collection (B2 in Fig. 1): run the target program in the normal state, for example by using IExplore.exe to visit web pages free of malicious code or Outlook.exe to receive mail free of network viruses and malicious code; collect the normal registry write entry paths of the target program (here IExplore.exe or Outlook.exe) in the normal state and store them in a database. Each normal registry write entry path is called a "self", and their collection is called the "self set";

(3) Collection of the registry write entry paths currently under test (B3 in Fig. 1): while the target program is running, the injected DLL obtains in real time the registry write entry paths of the target program, for example the registry write operations of IExplore.exe or Outlook.exe, and saves them in shared memory. Meanwhile, the registry detection module reads the current registry write entry path from shared memory and compares it with the "self" operations already in the database (self identification, B4 in Fig. 1); if it is not in the "self set", abnormal behavior information is sent to the network console (sending of abnormal behavior information, B5 in Fig. 1).

4. Non-self identification of API execution sequences (4 in Fig. 1) and the corresponding abnormality judgment may take the following steps.

Note that if speed is not a concern, steps (1) and (2) may be skipped and the full set of API functions used directly; alternatively, step (1) alone may be skipped and the selection made directly from the full set of API functions.

(1) First renumber all API functions and determine the full set of API functions used by the target program (API set in use, C1 in Fig. 1):

(a) Because there are too many API functions in total, about 3000, they can be divided into 20 groups of about 150 each, and a corresponding injection DLL generated for each group;

(b) Inject these DLLs separately into the target program, such as IExplore.exe or Outlook.exe, run the target program under both normal and infected conditions, and obtain the list of API functions used by the target program from the recorded files;

(2) API selection (C2 in Fig. 1):

(a) Intercept the API sequence of the target program in the normal state and cut it, with sliding step W₀, into a set S₀ of strings of length L₀, where W₀ may be any integer from 1 to L₀ [recommended value of W₀ missing in source] and L₀ may be an integer greater than 8, with 8, 16, 32 or 64 recommended;

(3) API renumbering (C3 in Fig. 1): renumber the selected API functions so that API sequences are easy to represent;

(4) Self collection (C4 in Fig. 1): using the selected API functions, intercept the API sequence of the target program in the normal state and cut it, with sliding step W, into strings of length L to generate the self set S, where W₀ may be any integer from 1 to L₀ [recommended value of W₀ missing in source] and L₀ may be an integer greater than 8, with 8, 16, 32 or 64 recommended;

(5) Acquisition of the target program's current API execution sequence (C5 in Fig. 1): obtain the current API execution sequence of the target program, such as IExplore.exe or Outlook.exe, and read N API sequences at a time for the following detection process; N = 128 is recommended:

(a) Start detection and check whether the end condition is satisfied (C7 in Fig. 1), and generate the initial detector set D₀: randomly generate pre-detectors from the selected API functions and filter out self (that is, delete the API sequences that match self) to obtain the initial detector set. The matching strategy is partial matching: two sequences match if and only if the two strings agree at r consecutive positions;

(b) Matching (C6 in Figure 1): compare the current API execution sequence with any detector in the detector set. If a match is found, mark the sequence and add 1 to the total match count. When the total match count of the API sequences acquired in real time reaches the threshold G_n, send abnormal behavior information to the network console (C8 in Figure 1);

(c) Check the end condition (C7 in Figure 1): if the evolution generation t exceeds the threshold G_e, or all API sequences have been marked, continue with detection of the next batch of API sequences;

(d) For the unmatched API sequences, the three subsets D_A, D_G and D_R, produced by affinity mutation, gene pool evolution and random generation respectively, together with the memory set D_M, form the next-generation detector set D_t = D_A + D_G + D_R + D_M, where the subsets D_A, D_G and D_R satisfy $\frac{D_{A}}{1} \approx \frac{D_{G}}{2} \approx \frac{D_{M}}{1};$

(e) Affinity mutation (C9 in Figure 1): the detector subset D_A is produced by affinity mutation. Affinity mutation means that when the degree of matching between an API sequence and any detector in the detector set exceeds the affinity threshold G_f, N_c (N_c ≥ 1) offspring individuals are generated by mutation;

One suggested concrete mutation method is as follows: if the number of positions at which the current API execution sequence matches a detector exceeds the affinity mutation threshold, randomly generate a number a in [1, L] and mutate position a of that detector to obtain one offspring detector; repeat this 4 times, so that 4 offspring detectors are generated for each detector that needs to be mutated.
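The suggested mutation method can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how the mutated position gets its new value, so the sketch replaces it with a different, randomly chosen API number; each offspring is mutated from the parent detector rather than from the previous offspring; and "number of matching positions" is read as the count of positions at which the two sequences agree. The function names `affinity`, `mutate` and `affinity_mutation` are hypothetical.

```python
import random

def affinity(seq, detector):
    """Count the positions at which two equal-length sequences agree (assumed measure)."""
    return sum(1 for a, b in zip(seq, detector) if a == b)

def mutate(detector, api_ids, rng, n_children=4):
    """Return n_children offspring, each differing from the parent at one random position."""
    children = []
    for _ in range(n_children):
        a = rng.randrange(len(detector))  # 0-based counterpart of a in [1, L]
        # Assumption: the new value is any other API number, chosen uniformly.
        alternatives = [x for x in api_ids if x != detector[a]]
        child = list(detector)
        child[a] = rng.choice(alternatives)
        children.append(tuple(child))
    return children

def affinity_mutation(seq, detectors, api_ids, g_f, rng):
    """Build the subset D_A: offspring of every detector whose affinity to seq exceeds g_f."""
    offspring = []
    for d in detectors:
        if affinity(seq, d) > g_f:
            offspring.extend(mutate(d, api_ids, rng))
    return offspring
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing threshold settings.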

(f) Gene pool evolution (C10 in Figure 1): the detector subset D_G is produced by gene pool evolution. Gene pool evolution means raising the selection probability of the APIs that make up effective detectors, so that these APIs are more likely to be chosen when pre-detectors are generated by the roulette wheel method, that is, P_api = P_api + ΔP. Note that all APIs start with the same selection probability P_api. To avoid local optima, the step of each gene pool evolution is very small, that is, the increment ΔP of the API selection probability is very small, and ΔP is the same for all APIs;

The code for the part of gene pool evolution that raises the API selection probability can be abbreviated as:

for (each gene Gene of an effective detector)

Begin

P[Gene] = P[Gene] + ΔP

End

Here ΔP is a small constant. If the initial P[Gene] is 100 for every Gene, ΔP can be set to 0.1 or 0.01.
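As a minimal Python sketch of this update combined with roulette wheel selection: the weights play the role of P[Gene], and the function names `reinforce`, `roulette_pick` and `make_predetector` are hypothetical. The self-filtering step that turns pre-detectors into the subset D_G is not shown.

```python
import random

def reinforce(weights, effective_detector, delta_p=0.1):
    """Apply P[Gene] = P[Gene] + delta_p for every gene of an effective detector."""
    for gene in effective_detector:
        weights[gene] += delta_p

def roulette_pick(weights, rng):
    """Pick one API number with probability proportional to its weight."""
    total = sum(weights.values())
    spin = rng.uniform(0, total)
    running = 0.0
    for gene, w in weights.items():
        running += w
        if spin <= running:
            return gene
    return gene  # guard against floating-point round-off at the upper end

def make_predetector(weights, length, rng):
    """Generate one pre-detector of the given length by repeated roulette wheel picks."""
    return tuple(roulette_pick(weights, rng) for _ in range(length))

# All APIs start with the same weight, as the text requires.
weights = {api_id: 100.0 for api_id in range(5)}
reinforce(weights, (1, 1, 3))          # API 1 occurs twice, so it is raised twice
candidate = make_predetector(weights, 8, random.Random(1))
```

Because ΔP is tiny relative to the initial weight of 100, the distribution drifts slowly away from uniform, which matches the stated aim of avoiding local optima.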

(g) Random generation (C11 in Figure 1): the detector subset D_R is generated randomly. This means that a certain proportion of the detectors in each generation of the detector set is produced randomly, in order to maintain the diversity of the detectors;

(h) Memory set (C12 in Figure 1): the memory set D_M consists of detectors that can match abnormal sequences. It can be generated offline before real-time detection starts, and detectors that detect abnormal sequences during actual monitoring can also be added to it;
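Steps (a) and (b) above, together with the sliding-window cutting used to build the self set, can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: API functions are assumed to be already renumbered as small integers, "agree at r consecutive positions" is read as "at r or more consecutive positions", the evolution of later detector generations is omitted, and all function names and the `max_tries` cap are hypothetical.

```python
import random

def windows(api_seq, length, step):
    """Cut an API number sequence into strings of `length` with sliding step `step`."""
    return [tuple(api_seq[i:i + length])
            for i in range(0, len(api_seq) - length + 1, step)]

def r_match(a, b, r):
    """Partial matching: True iff a and b agree at r or more consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, api_ids, length, r, count, rng, max_tries=100000):
    """Generate random pre-detectors and keep those that match no self string."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == count:
            break
        cand = tuple(rng.choice(api_ids) for _ in range(length))
        if not any(r_match(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def count_matches(batch, detectors, r):
    """Number of sequences in the batch that are matched by at least one detector."""
    return sum(1 for seq in batch if any(r_match(seq, d, r) for d in detectors))
```

A caller would compare the value returned by `count_matches` with the threshold G_n and report abnormal behavior once the threshold is reached.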

5. The network console 1 is a program that receives network datagrams. It can be written with a visual programming tool such as VC++ or Delphi, has a visual interface, and can receive network datagrams and read and write a database; the database can be Microsoft SQL Server. Through the network console, an administrator can obtain the abnormal behavior information produced by analyzing scripts, registry write entry paths and API sequences.
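The reporting path to the console can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the text only says that the console receives network datagrams, so UDP, the JSON message layout, the field names and the function names are all hypothetical, and Python stands in for the VC++ or Delphi tools named above. Database storage is not shown.

```python
import json
import socket

def encode_alert(host, layer, detail):
    """Serialize one abnormal-behavior report (assumed JSON layout)."""
    return json.dumps({"host": host, "layer": layer, "detail": detail}).encode("utf-8")

def decode_alert(payload):
    """Inverse of encode_alert, as the console side would use it."""
    return json.loads(payload.decode("utf-8"))

def send_alert(console_addr, alert_bytes):
    """Send one report to the console as a single UDP datagram (assumed transport)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(alert_bytes, console_addr)

def receive_one(bind_addr, timeout=5.0):
    """Console side: wait for one datagram and return (report, sender address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(bind_addr)
        s.settimeout(timeout)
        payload, sender = s.recvfrom(65535)
        return decode_alert(payload), sender
```

In such a setup each detection layer (script keywords, registry paths, API sequences) would call `send_alert` with its own `layer` value, so the console can tell the three sources apart.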

6. Following the above method, which comprises keyword frequency statistical analysis of scripts 2, self identification of registry write entry paths 3 and non-self identification of API execution sequences 4, the detection results for 75 Email viruses, Email worms and malicious code samples are listed below. The results show that the invention is effective against network viruses and malicious code: only one sample (No. 27, script.exploit) was not reported.

| No. | Name | Type | Reported as virus |
|---|---|---|---|
| 1 | Bloodhound.vbs.worm | Email, worm | Yes |
| 2 | Bloodhound.vbs.worm variant | Email, worm | Yes |
| 3 | vbs.mesut | Email | Yes |
| 4 | Jesus | Email, worm | Yes |
| 5 | Vbs.jadra | Email | Yes |
| 6 | Vbs.infi | Email | Yes |
| 7 | Vbs.hatred.b | Email | Yes |
| 8 | Vbs.godog | Email | Yes |
| 9 | Vbs.hard | Email, worm | Yes |
| 10 | Vbs.gascript | Email, Trojan | Yes |
| 11 | I-Worm.CIAN | Email | Yes |
| 12 | Vbs.vbswg.qen | Email, worm | Yes |
| 13 | I-Worm.doublet | Email, worm | Yes |
| 14 | White house | Email, worm | Yes |
| 15 | I-Worm.chu | Email | Yes |
| 16 | Loveletter | Email, worm | Yes |
| 17 | freelink | Email, worm | Yes |
| 18 | Mbop.d | Email, worm | Yes |
| 19 | Kounikewa | Email, worm | Yes |
| 20 | json888 | Malicious code | Yes |
| 21 | gator[1] | Malicious code | Yes |
| 22 | overkill2 | Malicious code | Yes |
| 23 | redlof | Malicious code | Yes |
| 24 | script.unrealer | Malicious code | Yes |
| 25 | vbs.both | Malicious code | Yes |
| 26 | VBS.kremp | Malicious code | Yes |
| 27 | script.exploit | Malicious code | No |
| 28 | script.happytime | Malicious code | Yes |
| 29 | vbs.godog | Malicious code | Yes |
| 30 | I-worm.doublet | Malicious code | Yes |
| 31 | I-worm.chu | Malicious code | Yes |
| 32 | vbs.baby | Malicious code | Yes |
| 33 | vbs.gascript | Malicious code | Yes |
| 34 | vbs.jesus | Malicious code | Yes |
| 35 | vbs.mbop.d | Malicious code | Yes |
| 36 | vbs.fasan | Malicious code | Yes |
| 37 | vbs.hard.vbs | Malicious code | Yes |
| 38 | vbs.infi | Malicious code | Yes |
| 39 | vbs.jadra | Malicious code | Yes |
| 40 | LOVE-LETTER-FOR-YOU | Malicious code | Yes |
| 41 | vbs.mesut | Malicious code | Yes |
| 42 | JS.Exception.Exploit1 | Malicious code | Yes |
| 43 | JS.Exception.Exploit2 | Malicious code | Yes |
| 44 | Self-written Writefile | Malicious code | Yes |
| 45 | Writefile variant | Malicious code | Yes |
| 46 | IRC.salim | Malicious code | Yes |
| 47 | Vbs.vbswg.qen | Malicious code | Yes |
| 48 | Bloodhound.vbs.3 | Malicious code | Yes |
| 49 | Bloodhound.vbs.3 variant 1 | Malicious code | Yes |
| 50 | Bloodhound.vbs.3 variant 2 | Malicious code | Yes |
| 51 | Bloodhound.vbs.3 variant 3 | Malicious code | Yes |
| 52 | Bloodhound.vbs.3 variant 4 | Malicious code | Yes |
| 53 | Bloodhound.vbs.3 variant 5 | Malicious code | Yes |
| 54 | Bloodhound.vbs.3 variant 6 | Malicious code | Yes |
| 55 | Bloodhound.vbs.3 variant 7 | Malicious code | Yes |
| 56 | Bloodhound.vbs.3 variant 8 | Malicious code | Yes |
| 57 | Bloodhound.vbs.3 variant 9 | Malicious code | Yes |
| 58 | Vbs.bound | Malicious code | Yes |
| 59 | Vbs.charl | Malicious code | Yes |
| 60 | VBS.Phram.D(vbs.cheese) | Malicious code | Yes |
| 61 | Vbs.entice | Malicious code | Yes |
| 62 | Vbs.ave.a | Malicious code | Yes |
| 63 | Vbs.exposed | Malicious code | Yes |
| 64 | Vbs.annod(vbs.jadra) | Malicious code | Yes |
| 65 | Vbs.nomekop | Malicious code | Yes |
| 66 | Html.reality(vbs.reality) | Malicious code | Yes |
| 67 | Bloodhound.vbs.3 | Malicious code | Yes |
| 68 | Bloodhound.vbs.3 variant 1 | Malicious code | Yes |
| 69 | Bloodhound.vbs.3 variant 2 | Malicious code | Yes |
| 70 | Bloodhound.vbs.3 variant 3 | Malicious code | Yes |
| 71 | Bloodhound.vbs.3 variant 4 | Malicious code | Yes |
| 72 | Bloodhound.vbs.3 variant 5 | Malicious code | Yes |
| 73 | Bloodhound.vbs.3 variant 6 | Malicious code | Yes |
| 74 | Bloodhound.vbs.3 variant 7 | Malicious code | Yes |
| 75 | Bloodhound.vbs.3 variant 8 | Malicious code | Yes |

Claims

1. A layered cooperative network virus and malicious code identification method, comprising:

separating keywords from script files, obtaining application programming interface (API) execution sequences and registry write entry paths by injecting a dynamic link library (DLL), and storing the registry write entry paths and the API sequences on the hard disk or in memory; characterized by:

performing keyword frequency statistical analysis on the script and making an abnormality judgment;

performing self identification on the registry write entry paths and making an abnormality judgment;

performing non-self identification on the API sequences and making an abnormality judgment;

sending abnormal behavior information to the network console;

wherein said script files are script files written in the Javascript language, script files written in the VBScript language, and script files with embedded Javascript or VBScript code;

wherein said obtaining of the API execution sequences and registry write entry paths by DLL injection means that the DLL is injected into the target program as a remote thread, the API execution sequence of the target program is then intercepted by replacing the Import Address Table (IAT), and the registry write entry paths are obtained from the parameters of the registry API functions;

wherein said keyword statistical analysis of the script and abnormality judgment means separating from the script file the 29 keywords copyfile, Createobject, Delete, FolderDelete, RegWrite, Virus, .Write, GetSpecialFolder, keys, opentextfile, readall, .save, startup, execute, .add, buildpath, copyfolder, createfolder, createtextfile, deletefile, fileexists, folderexists, getfile, getfolder, getparentfolder, format, .run, do copy, document.write, and carrying out the following steps:

(1) the 29 keywords are divided into three groups. The first group is the object creation keyword: Createobject. The second group consists of keywords that are not themselves dangerous operations: Virus, .Write, GetSpecialFolder, keys, opentextfile, readall, startup, execute, .add, buildpath, fileexists, folderexists, getfile, getfolder, getparentfolder, .run, document.write. The third group consists of keywords for potentially destructive operations: copyfile, Delete, FolderDelete, RegWrite, .save, copyfolder, createfolder, createtextfile, deletefile, format, do copy;

(2) compute the expected frequency f_i, 1≤i≤29, with which each of the 29 keywords occurs in normal scripts, compute the expected frequency f_i', 1≤i≤29, with which each of the 29 keywords occurs in abnormal scripts, and calculate the normalized frequency difference of the 29 keywords between normal and abnormal scripts

$e_{i} = \frac{f_{i} - f_{i}^{'}}{\sum_{i = 1}^{29} (f_{i} - f_{i}^{'})},$

1≤i≤29;

(3) count the frequency m_i, 1≤i≤29, with which each keyword occurs in the current script to be detected, and calculate the risk factor Risk of the script to be detected,

$Risk = G \sum_{i = 1}^{29} P(i) F(i)$

where P(i), F(i) and G are respectively:

(4) the risk factor threshold TH is defined as:

$TH = \sum_{i = 0}^{29} P(i) / 29$

When the risk factor Risk exceeds the threshold TH, early warning information is sent to the network console;

wherein said self identification of the registry write entry paths and abnormality judgment takes the following steps:

(1) collect the normal registry write entry paths of the target program in the normal state and store them in a database; each normal registry write entry path is called a "self", and the set of them is called the "self set";

(2) read the current registry write entry path and compare it with the original "self" operations in the database; if it is not in the "self set", send abnormal behavior information to the network console;

wherein said non-self identification of the API sequences and abnormality judgment takes the following steps:

(1) API selection:

(a) intercept the API sequence of the target program in the normal state and cut it, with a sliding step of W₀, into a string set S₀ whose strings have length L₀;

(b) intercept the API sequence of the target program while it is running with a virus and cut it, with a sliding step of W₀, into a string set R₀ whose strings have length L₀;

(c) compare the string sets S₀ and R₀, take the sequences that differ, extract the API functions that make up these sequences, and use these API functions as the set of API functions to be monitored;

(2) using the selected API functions, intercept the API sequence of the target program in the normal state and cut it, with a sliding step of W, into strings of length L to generate the self set S;

(3) obtain the current API execution sequence of the target program, cut it, with a sliding step of W, into strings of length L, and read N API sequences at a time to carry out the following detection process:

(a) generate the initial detector set D₀: randomly generate pre-detectors from the selected API functions and filter out self to obtain the initial detector set; the matching strategy here is partial matching, that is, two sequences match if and only if the two strings agree at r consecutive positions;

(b) compare the current API execution sequence with any detector in the detector set: if a match is found, mark the sequence and add 1 to the total match count; when the total match count of the API sequences acquired in real time reaches the threshold G_n, send abnormal behavior information to the network console;

(c) if the evolution generation t exceeds the threshold G_e or all API sequences have been marked, continue by reading the next batch of API sequences for detection; otherwise, for the unmatched API sequences, the three subsets D_A, D_G and D_R, produced by affinity mutation, gene pool evolution and random generation respectively, together with the memory set D_M, form the next-generation detector set D_t = D_A + D_G + D_R + D_M, where the subsets D_A, D_G and D_R satisfy

$\frac{D_{A}}{1} \approx \frac{D_{G}}{2} \approx \frac{D_{M}}{1};$

the detector subset D_A is produced by affinity mutation, where affinity mutation means that when the degree of matching between an API sequence and any detector in the detector set exceeds the affinity threshold G_f, N_c (N_c ≥ 1) offspring individuals are generated by mutation;

the detector subset D_G is produced by gene pool evolution, where gene pool evolution means raising the selection probability of the APIs that make up effective detectors, that is, P_api = P_api + ΔP; when detectors are actually generated, pre-detectors are generated by the roulette wheel method according to the API selection probabilities, and self is then filtered out to produce the detector subset D_G;

the detector subset D_R is produced by random generation;

the existing detectors that can match abnormal sequences form the memory set D_M;

wherein said network console is a network program for receiving the abnormality information obtained by analyzing and processing scripts, registry write entry paths and API sequences.