CN110096439B

CN110096439B - Test case generation method for solidity language

Info

Publication number: CN110096439B
Application number: CN201910341716.0A
Authority: CN
Inventors: 张鹏程; 于佳男; 吉顺慧; 肖锋
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2020-07-14
Anticipated expiration: 2039-04-26
Also published as: CN110096439A

Abstract

The invention discloses a method for generating a test case facing to a solidness language, aiming at an intelligent contract program realized by applying the solidity language, a corresponding Control Flow Graph (CFG) is obtained according to the program, and the control flow graph covers calling information among function methods in the intelligent contract program; deriving from the program CFG definition-use pairs (dup) of variables present in the program; in the process of generating the test case, a genetic algorithm is applied to accelerate the generation speed of the effective test case. On the other hand, considering the problem of integer overflow safety existing in the solid, when the fitness function of the genetic algorithm is designed, the definition and the use of variables which can cause integer overflow errors by the test cases are emphasized to realize coverage, and the fitness value of the test cases which cover dup related to the integer overflow errors is relatively large. The method is feasible, and realizes the integer overflow coverage test on the basis of realizing the coverage test of the conventional variable operation in the program.

Description

A Test Case Generation Method for Solidity Language

技术领域technical field

本发明涉及一种面向solidity语言的测试用例生成方法，尤其涉及到应用遗传算法并基于数据流测试的测试用例生成方法，属于软件测试技术领域。The invention relates to a test case generation method oriented to solidity language, in particular to a test case generation method using genetic algorithm and based on data flow testing, and belongs to the technical field of software testing.

背景技术Background technique

近两年来，随着区块链技术的不断发展，数字货币、智能合约成为了许多研究的热点，其中以太坊是支持智能合约部署编译的典型平台之一，solidity更是成为当今智能合约编写的热门语言。之所以被称为智能合约，其主要特点就在于可以自动的或具备一定的智能性的，通过预先定义好的执行条件和逻辑，在条件满足时，自动的执行合约内容，实现期望的功能；尤其是借助于区块链平台的去中心化、去信任、防篡改等特性，智能合约的优势被更好的放大，有了更大的应用前景。因此，对solidity这门语言的测试也是一个需更深入研究的问题。In the past two years, with the continuous development of blockchain technology, digital currency and smart contracts have become the focus of many researches. Among them, Ethereum is one of the typical platforms that support the deployment and compilation of smart contracts, and solidity has become the author of smart contracts. popular languages. The reason why it is called a smart contract is that its main feature is that it can be automatic or possess a certain intelligence. Through pre-defined execution conditions and logic, when the conditions are met, the contract content is automatically executed to achieve the desired function; In particular, with the help of the decentralization, trustlessness, and tamper resistance of the blockchain platform, the advantages of smart contracts are better amplified and have greater application prospects. Therefore, the testing of the solidity language is also a problem that needs more in-depth study.

由于区块链平台的特性，以solidity语言编写的智能合约没办法在执行过程中完成动态测试，因此，目前对于solidity语言的测试分析主要集中在对源代码进行静态分析上。除了传统语言中常见的程序错误，在solidity语言中还可能因编程缺陷等引发一些其特有的错误。针对solidity语言编程，Tsankov等人开发了一个程序分析器Securify，该分析器将程序预定义为两种模式：合规模式和违规模式，然后通过分析合约的依赖图从代码中得到语言信息，通过预定义的条件，判断程序合规或违规或者警告；基于重入攻击这个错误，Liu等提出了一个模型ReGuard，该模型通过将solidity程序通过中间表示转换成通过语言C++，然后通过对得到的C++程序执行模糊测试迭代的生成随机又不同的交易来发现错误。Due to the characteristics of the blockchain platform, smart contracts written in the solidity language cannot complete dynamic testing during the execution process. Therefore, the current testing analysis of the solidity language mainly focuses on the static analysis of the source code. In addition to the common program errors in traditional languages, some unique errors may also be caused in solidity languages due to programming defects. For solidity language programming, Tsankov et al. developed a program analyzer Securify, which predefines the program into two modes: compliance mode and violation mode, and then obtains language information from the code by analyzing the dependency graph of the contract, through Predefined conditions, judge program compliance or violation or warning; based on the error of reentrancy attack, Liu et al. proposed a model ReGuard, which converts solidity programs into language C++ through intermediate representation, and then passes on the obtained C++ The program performs fuzzing iterations to generate random and distinct transactions to find bugs.

随着对智能约合的使用越来越广泛，已经不仅仅是在货币交易领域，在其他通用信息产业等领域也将会有很广泛的使用，而面向solidity语言的测试仍处于起步阶段，因此有必要考虑到面向程序整体测试的测试用例生成技术。As the use of smart contracts becomes more and more extensive, it will be widely used not only in the field of currency transactions, but also in other general information industries and other fields, and the test for solidity language is still in its infancy, so It is necessary to take into account the test case generation techniques for the overall testing of the program.

发明内容SUMMARY OF THE INVENTION

发明目的：考虑到solidity语言的新颖性，且在使用过程中已出现了一些较为严重的安全性错误；与此同时，随着智能合约的应用领域越来越广，在现有的一些静态代码分析的基础上，程序的通用测试也是一个需求。本发明目的是提供一种面向solidity语言的测试用例生成方法，基于数据流测试的方法，实现对程序执行过程中变量操作进行整体性测试，并同时强调了对执行过程中可引发整数溢出错误的变量操作的测试。Purpose of the invention: Considering the novelty of the solidity language, and some serious security errors have occurred in the process of use; at the same time, with the wider application of smart contracts, some existing static codes On the basis of analysis, general testing of the program is also a requirement. The purpose of the present invention is to provide a test case generation method oriented to solidity language, based on the method of data flow test, to realize the integrity test of variable operation in the process of program execution, and at the same time to emphasize the problem of integer overflow errors in the execution process. Tests for variable manipulation.

技术方案：为实现上述发明目的，本发明采用如下技术方案：Technical scheme: In order to realize the above-mentioned purpose of the invention, the present invention adopts the following technical scheme:

一种面向solidity语言的测试用例生成方法，包括如下步骤：A test case generation method for solidity language, including the following steps:

(1)根据solidity语言的变量类型、流程控制语句、函数体结构、内部函数require以及函数修改器的使用对solidity语言实现的待测试智能合约程序进行分析得到对应的控制流图CFG；(1) According to the variable type, flow control statement, function body structure, internal function require and function modifier of solidity language, analyze the smart contract program to be tested implemented by solidity language to obtain the corresponding control flow graph CFG;

(2)根据步骤(1)的CFG图遍历各节点信息，判断存在uint型变量使用的节点是否存在整数溢出错误的安全性问题，若存在，则将存在整数溢出错误的节点标记；(2) traverse the information of each node according to the CFG graph of step (1), and judge whether there is a security problem of integer overflow error in the node used by the uint type variable, and if so, there will be a node mark with an integer overflow error;

(3)根据步骤(1)的CFG图遍历各节点信息，统计出程序中所有数值型变量的定义-使用对，记为dup，且若步骤(2)中判断结果为存在整数溢出错误，则将包含步骤(2)中的标记节点存在的变量的定义-使用对统计出来，记为dup’；(3) Traverse the information of each node according to the CFG graph of step (1), and count the definition-use pairs of all numerical variables in the program, which are recorded as dup, and if the judgment result in step (2) is that there is an integer overflow error, then Count the definition-use pairs of the variables including the existence of the marked node in step (2), and denote it as dup';

(4)根据步骤(3)统计出的dup，针对程序中的所有数值型变量随机生成初始的包含若干组测试用例的测试用例集；(4) According to the dup counted in step (3), randomly generate an initial test case set containing several groups of test cases for all numerical variables in the program;

(5)设计遗传算法中的适应度函数用于选取较优的测试用例以驱动算法执行；其中测试用例的适应度值为测试用例覆盖的dup的数量与覆盖的涉及到整数溢出错误的dup’数量的加权和与所有dup的数量与所有dup’数量加权和的比值；(5) The fitness function in the genetic algorithm is designed to select the optimal test case to drive the execution of the algorithm; the fitness value of the test case is the number of dups covered by the test case and the covered dups involving integer overflow errors. The ratio of the weighted sum of the number to the number of all dups and the weighted sum of the number of all dup's;

(6)根据步骤(5)中的适应度函数以及算法的执行界限，求得步骤(4)中初始测试用例的适应度值，开始遗传算法迭代执行，得到算法内最优结果。(6) According to the fitness function in step (5) and the execution limit of the algorithm, the fitness value of the initial test case in step (4) is obtained, and the iterative execution of the genetic algorithm is started to obtain the optimal result in the algorithm.

在优选的实施方案中，所述步骤(1)中分析待测试程序的CFG图时，考虑程序中地址型变量的使用，考虑由地址变量引出其他变量的定义；考虑程序中require和assert条件判断语句的使用，将require语句作为一个if条件结构语句进行处理；将函数修改器看作是函数调用的一种形态，遇到函数体时，首先分析该函数是否使用了函数修改器，若使用了函数修改器则先转入函数修改器，根据函数修改器内容进行节点间的连接关系转移。In a preferred embodiment, when analyzing the CFG diagram of the program to be tested in the step (1), consider the use of the address variable in the program, and consider the definition of other variables derived from the address variable; consider the require and assert conditions in the program to judge The use of the statement, the require statement is treated as an if conditional structure statement; the function modifier is regarded as a form of function call, when encountering a function body, first analyze whether the function uses a function modifier, if used The function modifier is first transferred to the function modifier, and the connection relationship between nodes is transferred according to the content of the function modifier.

在优选的实施方案中，所述步骤(2)中包括如下步骤：In a preferred embodiment, the step (2) includes the following steps:

(21)根据步骤(1)得到的CFG分析程序中是否存在整数溢出错误，具体为：遍历CFG图节点，对于存在uint型变量使用的节点，其中变量使用包括执行加、减、乘或除计算，判断在变量使用节点前是否已有对相应变量操作结果进行溢出判断的语句节点，若没有，则判定程序会出现整数溢出错误；(21) Whether there is an integer overflow error in the CFG analysis program obtained according to step (1), specifically: traversing the nodes of the CFG graph, and for the nodes where the uint variable is used, the variable use includes performing addition, subtraction, multiplication or division calculation , judging whether there is a statement node for overflow judgment on the corresponding variable operation result before the variable use node, if not, the judgment program will have an integer overflow error;

(22)若存在整数溢出错误，对CFG中涉及到引发整数溢出错误的变量操作节点进行标注。(22) If there is an integer overflow error, mark the variable operation node involved in the integer overflow error in the CFG.

在优选的实施方案中，所述步骤(3)中包括如下步骤：In a preferred embodiment, the step (3) includes the following steps:

(31)根据CFG图，逐节点分析语句信息，找到程序中存在的数值型变量的定义节点；(31) According to the CFG graph, analyze the statement information node by node, and find the definition node of the numerical variable existing in the program;

(32)针对步骤(31)中找到的每一个变量的定义节点，找到其对应的所有使用节点；(32) for the definition node of each variable found in step (31), find all use nodes corresponding to it;

(33)结合步骤(31)、(32)，得到程序中存在的变量的定义-使用对：dup＝(d,u,v)，计算其总数量，记为n；其中v表示某一变量，d表示v的某一定义节点，u表示v的某一使用节点；(33) Combining steps (31) and (32), the definition of the variable existing in the program is obtained-use pair: dup=(d, u, v), calculate the total number, and denote it as n; where v represents a variable , d represents a definition node of v, and u represents a usage node of v;

(34)结合步骤(22)与步骤(31)、(32)的内容得到存在整数溢出问题的变量定义-使用对，这里用dup’表示，计算其总数量，记为m，若步骤(22)中结果为没有整数溢出问题，即未做标记，则此处m统计值为0。(34) Combining the contents of step (22) and steps (31) and (32), the variable definition-use pair with integer overflow problem is obtained, which is represented by dup' here, and the total number is calculated, which is recorded as m. If step (22) ), the result is that there is no integer overflow problem, that is, it is not marked, so the m statistic value here is 0.

在优选的实施方案中，所述步骤(4)中针对程序中的所有数值型变量随机生成初始的包含4组测试用例的测试用例集。In a preferred embodiment, in the step (4), an initial test case set including 4 groups of test cases is randomly generated for all numerical variables in the program.

在优选的实施方案中，所述步骤(5)中，适应度函数公式

其中p_i表示第i个测试用例覆盖的dup的数量，q_i表示第i个测试用例覆盖的涉及到整数溢出错误的dup’数量，n表示程序中所有dup的数量，m表示程序中所有涉及到整数溢出错误的dup’数量，ε为权重参数，0<ε<1。In a preferred embodiment, in the step (5), the fitness function formula

where pi represents the number of dups covered by the _ith test case, qi represents the number of dup's covered by the _ith test case involving integer overflow errors, n represents the number of all dups in the program, and m represents all the dups involved in the program. The number of dup's to integer overflow errors, ε is the weight parameter, 0<ε<1.

在优选的实施方案中，所述步骤(6)中，算法执行界限的设定包含两部分：(1)遗传算法迭代次数，若达到最高代数限制，适应度值仍未达到1，也结束；(2)在遗传代数上限内，若适应度值达到1则保存当前测试用例生成结果，再进行下一代实验，若实验结果的适应度值较父代小，则结束，取上一代结果。In a preferred embodiment, in the step (6), the setting of the algorithm execution limit includes two parts: (1) the number of iterations of the genetic algorithm, if the highest algebraic limit is reached, and the fitness value has not yet reached 1, it will also end; (2) Within the upper limit of the genetic algebra, if the fitness value reaches 1, the current test case generation result will be saved, and then the next generation experiment will be performed. If the fitness value of the experimental result is smaller than that of the parent generation, it will end, and the result of the previous generation will be taken.

在优选的实施方案中，所述步骤(6)中包括如下步骤：In a preferred embodiment, the step (6) includes the following steps:

(61)种群中个体适应度值计算：对于生成的每个个体，即每个测试用例，将对应的测试数据放到程序中执行，通过预先对源程序进行插桩的方式计算其覆盖的dup数量，记为p，以及覆盖dup’数量，记为q，在统计q的值时，其前提条件为在当前个体执行状态下，变量的操作满足溢出条件，才将其记为对整数溢出错误的一次有效覆盖，根据适应度函数，计算个体适应度值；(61) Calculation of individual fitness value in the population: For each individual generated, that is, each test case, put the corresponding test data into the program for execution, and calculate the covered dup by pre-instrumenting the source program. The number, denoted as p, and the number of covered dup's, denoted as q, when the value of q is counted, the precondition is that in the current individual execution state, the operation of the variable satisfies the overflow condition, and then it is recorded as an integer overflow error An effective coverage of , according to the fitness function, calculate the individual fitness value;

(62)遗传算法迭代执行，选取算法内最优结果：依照遗传算法中的轮盘赌选择法、均匀交叉算子以及变异执行迭代执行遗传算法，遗传算法每一次执行适应度值计算都如步骤(61)，直到达到算法执行界限为止，最终得到算法结果。(62) Iteratively execute the genetic algorithm, and select the optimal result in the algorithm: execute the genetic algorithm iteratively according to the roulette selection method, the uniform crossover operator and the mutation in the genetic algorithm, and each time the genetic algorithm performs the fitness value calculation steps (61), until the algorithm execution limit is reached, and finally the algorithm result is obtained.

有益效果：本发明提供的一种面向solidity语言的基于数据流测试的测试用例生成方法，考虑到程序执行中涉及到较多的函数调用，而在函数执行调用过程中，变量的定义和使用又是其关键环节，因此采用基于数据流测试的方法，根据程序中变量的定义和使用情况，分析状态的变化。在实现对程序执行过程中变量操作进行整体性测试的同时强调了对执行过程中可引发整数溢出错误的变量操作的测试，进行这一操作的前提是通过控制流分析对引起整数溢出操作是否已进行了溢出判断，从而判定程序执行中是否会出现整数溢出的问题。与现有技术相比，本发明在面向solidity语言测试用例生成领域，提出了一个切实可行的算法，在实现对程序中常规变量操作进行覆盖测试的基础上，进一步考虑了对solidity语言中可引起严重问题的整数溢出错误的变量操作的覆盖测试。并且本发明采用的遗传算法对测试用例生成过程进行优化，可以在短时间内生成较优的测试用例。Beneficial effect: The method for generating test cases based on data flow testing oriented to solidity language provided by the present invention, considering that many function calls are involved in program execution, and in the process of function execution and invocation, the definition and use of variables are different. is its key link, so the method based on data flow test is adopted to analyze the change of state according to the definition and usage of variables in the program. While implementing the overall testing of variable operations during program execution, it also emphasizes the testing of variable operations that can cause integer overflow errors during execution. The premise of this operation is to analyze whether the integer overflow operation has been The overflow judgment is carried out to judge whether the problem of integer overflow occurs in the execution of the program. Compared with the prior art, the present invention proposes a feasible algorithm in the field of test case generation oriented to solidity language. Coverage testing of variable operations for serious problems with integer overflow errors. And the genetic algorithm adopted in the present invention optimizes the test case generation process, and can generate a better test case in a short time.

附图说明Description of drawings

图1为本发明实施例的整体步骤图；1 is an overall step diagram of an embodiment of the present invention;

图2为本发明实施例的方法流程图。FIG. 2 is a flowchart of a method according to an embodiment of the present invention.

具体实施方式Detailed ways

下面结合具体实施例，进一步阐明本发明，应理解这些实施例仅用于说明本发明而不用于限制本发明的范围，在阅读了本发明之后，本领域技术人员对本发明的各种等价形式的修改均落于本申请所附权利要求所限定的范围。Below in conjunction with specific embodiments, the present invention will be further illustrated, and it should be understood that these embodiments are only used to illustrate the present invention and not to limit the scope of the present invention. The modifications all fall within the scope defined by the appended claims of this application.

如图1所示，本发明实施例提供的一种面向solidity语言的测试用例生成方法，包括如下步骤：As shown in FIG. 1 , a method for generating a test case for solidity language provided by an embodiment of the present invention includes the following steps:

步骤S1：待测试程序的CFG绘制。根据solidity语言的变量类型(尤其包括地址类型address和无符号整型uint)；包括if-else、while、for等的流程控制语句；通过关键字function定义函数体结构；特有的内部函数require以及函数修改器Function Modifier的使用等得到准确的CFG。本步骤在绘制CFG图时主要注意以下内容：(1)当程序中(一般是函数内)遇到require(或assert)这样的条件判断语句时，将require()当成是一个if判断结构，把需要判断的语句作为一个条件节点处理。当()内的条件满足时，继续执行该语句下面的内容，如果条件不满足时，直接跳出函数。(2)对程序中存在的函数间的调用要进行处理。对于存在函数方法调用的语句节点要通过程序CFG图实现对函数的调用，得到正确的执行顺序，并将被调用函数的返回结果正确返回调用节点；(3)正确处理函数修改器的调用，将其作为一个通用的函数调用处理，若当前函数需调用函数修改器，则在函数体内语句节点之前要先调用函数修改器的语句节点，在CFG中，要按照语句执行顺序实现相关节点的连接调用。Step S1: CFG drawing of the program to be tested. According to the variable type of solidity language (especially including address type address and unsigned integer type uint); flow control statements including if-else, while, for, etc.; define function body structure through keyword function; unique internal function require and function Use of modifiers Function Modifier etc to get accurate CFG. In this step, when drawing the CFG diagram, pay attention to the following: (1) When a conditional judgment statement such as require (or assert) is encountered in the program (usually within a function), require() is regarded as an if judgment structure, and the The statement that needs to be judged is processed as a conditional node. When the condition in () is satisfied, continue to execute the content below the statement, if the condition is not satisfied, jump out of the function directly. (2) The calls between functions existing in the program should be processed. For the statement node with function method call, the function call should be realized through the program CFG graph, the correct execution order should be obtained, and the return result of the called function should be correctly returned to the calling node; (3) Correctly handle the call of the function modifier, put It is processed as a general function call. If the current function needs to call the function modifier, the statement node of the function modifier must be called before the statement node in the function body. .

步骤S2：整数溢出错误分析。根据步骤S1的CFG图遍历节点信息，在统计变量信息前，首先分析该程序是否存在整数溢出的安全性问题，若存在整数溢出，将存在整数溢出的节点标记；Step S2: Integer overflow error analysis. According to the CFG graph of step S1 traversing the node information, before statistical variable information, first analyze whether the program has an integer overflow security problem, if there is an integer overflow, there will be an integer overflow node mark;

步骤S3：程序变量的定义使用对收集。根据步骤S1的CFG遍历节点，统计出程序中所有变量的定义-使用对,记为dup，且若步骤S2中分析结果为存在整数溢出错误，则将有该标记节点存在的变量的定义-使用对统计出来，记为dup’；Step S3: Definition of program variables using pair collection. According to the CFG traversal node in step S1, the definition-use pairs of all variables in the program are counted, and denoted as dup, and if the analysis result in step S2 is that there is an integer overflow error, there will be a variable definition-use pair that exists in the marked node. For statistics, it is recorded as dup';

步骤S4：初始测试用例集生成。根据步骤S3统计出的dup，首先针对程序中的所有变量随机生成初始的包含4组测试用例的测试用例集；Step S4: an initial test case set is generated. According to the dup counted in step S3, first randomly generate an initial test case set containing 4 groups of test cases for all variables in the program;

步骤S5：适应度函数设计及算法执行界限定义。设计遗传算法中的适应度函数用于选取较优者测试用例以驱动算法执行执行；并定义遗传算法执行界限，均衡覆盖率和执行时间，辅助算法执行；Step S5: fitness function design and algorithm execution limit definition. Design the fitness function in the genetic algorithm to select the best test case to drive the execution of the algorithm; and define the execution limit of the genetic algorithm, balance the coverage and execution time, and assist the execution of the algorithm;

步骤S6：遗传算法执行获取最优测试用例。根据步骤S5中的是适应度函数，求得步骤S4中初始测试用例(种群)的适应度值，进入遗传算法执行阶段，得到算法内最优结果。Step S6: The genetic algorithm is executed to obtain the optimal test case. According to the fitness function in step S5, the fitness value of the initial test case (population) in step S4 is obtained, and the genetic algorithm execution stage is entered to obtain the optimal result in the algorithm.

图2为本发明实施例的一种面向solidity语言的基于数据流测试的测试用例生成方法的详细步骤，具体如下：步骤S1中，根据待测试程序要得到对应的控制流图，具体步骤为：Fig. 2 is the detailed steps of a kind of test case generation method based on data flow test oriented to solidity language according to an embodiment of the present invention, and the details are as follows: In step S1, a corresponding control flow diagram is to be obtained according to the program to be tested, and the specific steps are:

步骤101：绘制给定程序的CFG，在绘制solidity编写的程序的控制流图时，首先考虑其控制结构语句：通用的有if-else、while、do-while、for、break、continue、return、？：(三目运算符)；solidity中没有switch和goto控制结构语句。以外要注意程序中require语句的使用，遇到require语句，将其作为if选择控制结构处理，条件满足到达下一个语句，条件不满足时，退出当前正在执行的函数。以及该语言中特有的函数修改器的使用，当一个函数中使用了函数修改器，将该函数修改器作为一个函数调用处理，首先执行函数修改器的内容；Step 101: Draw the CFG of a given program. When drawing the control flow graph of the program written by Solidity, first consider its control structure statements: common ones are if-else, while, do-while, for, break, continue, return, ? : (ternary operator); There are no switch and goto control structure statements in solidity. In addition, pay attention to the use of the require statement in the program. When the require statement is encountered, it is treated as an if selection control structure. When the condition is satisfied, the next statement is reached. When the condition is not satisfied, the currently executing function is exited. And the use of the unique function modifier in the language, when a function modifier is used in a function, the function modifier is processed as a function call, and the content of the function modifier is executed first;

步骤102：在solidity中，一个合约程序的内部也可能存在着多个函数方法(关键字为function)，因此，对于给定程序中所有存在函数调用的语句，在处理该语句的顶点节点相关信息时也要正确定义其涉及的函数的调用以及被调函数的返回关系。Step 102: In solidity, there may also be multiple function methods (the keyword is function) in a contract program. Therefore, for all statements with function calls in a given program, the information about the vertex nodes that process the statement is processed. When it is necessary to correctly define the call of the function involved and the return relationship of the called function.

步骤103：solidity中包含modifier函数修改器属性，一个函数可以通过调用函数修改器实现对该函数的修正作用。因此在生成控制流图过程中，需正确处理存在的对函数修改器的调用，将其作为一个通用的函数调用处理——若当前函数调用了函数修改器，则在函数体内语句节点执行之前应先调用函数修改器的语句节点。Step 103: Solidity includes the modifier function modifier attribute, and a function can implement the modification effect of the function by calling the function modifier. Therefore, in the process of generating the control flow graph, it is necessary to correctly handle the existing call to the function modifier and treat it as a general function call - if the current function calls the function modifier, it should be executed before the statement node in the function body. The statement node of the function modifier is called first.

步骤S2中，根据程序CFG分析程序是否存在整数溢出错误，并当存在整数溢出错误时，对涉及到溢出错误的节点进行标记，其具体步骤如下：In step S2, analyze whether the program has an integer overflow error according to the program CFG, and when there is an integer overflow error, mark the node involved in the overflow error, and the specific steps are as follows:

步骤201：程序中对uint类型(该类型变量最小值为0)变量的操作都有可能会导致整数溢出的发生，尤其是针对uint8这种短位的变量，所以针对这种变量在进行算数运算操作(加、减、乘、除)前要进行变量溢出判断，若程序没有溢出判断，则会有整数溢出问题；因此对步骤1中的程序CFG从开始节点顺序分析每个节点，当遇到存在uint型变量运算操作的节点，判断在该节点前是否对该操作进行了整数溢出判断，若没有溢出判断语言节点，则判定程序存在整数溢出问题；Step 201: The operation of the uint type (the minimum value of the variable of this type is 0) in the program may lead to the occurrence of integer overflow, especially for the short-bit variable such as uint8, so the arithmetic operation is performed for this variable Before the operation (addition, subtraction, multiplication, and division), variable overflow judgment should be performed. If the program does not have overflow judgment, there will be an integer overflow problem; therefore, the program CFG in step 1 analyzes each node sequentially from the starting node. If there is a node with a uint type variable operation operation, it is judged whether the operation has performed an integer overflow judgment before the node. If there is no overflow judgment language node, it is judged that the program has an integer overflow problem;

步骤202：若经分析，待测试程序存在整数溢出，则对CFG中涉及到引发整数溢出错误的变量操作节点做特殊标注。Step 202 : If the program to be tested has an integer overflow after analysis, special mark is made on the variable operation node involved in the integer overflow error in the CFG.

步骤S3中，根据程序的CFG分析出程序中存在的所有变量的定义-使用对dup，其具体步骤如下：In step S3, the definition-use pair dup of all variables existing in the program is analyzed according to the CFG of the program, and the specific steps are as follows:

步骤301：对CFG中的每一个节点，按前向顺序逐顶点节点识别出包含变量定义的顶点节点(例如，存在一个顶点节点④处的语句为sum＝a+b，则顶点④包含对变量sum的定义，是sum的一个定义节点)，一个变量可能在多个节点处被定义，在寻找定义节点def过程中，要注意步骤202中做了特殊标记节点，若为变量的定义节点，仍作为特殊节点处理；Step 301: For each node in the CFG, identify the vertex node containing the variable definition by vertex node in the forward order (for example, if there is a vertex node ④ where the statement is sum=a+b, then the vertex ④ contains a pair of variables. The definition of sum is a definition node of sum), a variable may be defined at multiple nodes, in the process of searching for the definition node def, pay attention to the special marked node in step 202, if it is the definition node of the variable, still Treated as a special node;

步骤302：针对步骤301中的每个变量的每个定义节点，寻找其对应的所有使用节点(例如，存在一个顶点节点④处的语句为sum＝a+b，则顶点④包含对变量a和变量b的使用，既是a又是b的一个使用节点)，同样的，一个变量可能存在多个使用节点use，在寻找使用节点use时，要注意步骤202中做了特殊标记节点，若为变量的使用节点，仍作为特殊节点处理；Step 302: For each definition node of each variable in step 301, look for all corresponding use nodes (for example, if there is a vertex node ④ where the statement is sum=a+b, then vertex ④ contains a pair of variables a and The use of variable b is both a and a use node of b). Similarly, a variable may have multiple use nodes use. When looking for use node use, pay attention to the special marked node in step 202. If it is a variable The used node is still treated as a special node;

步骤303：综合步骤301和步骤302，将会得到程序中所有的变量的定义-使用对，这里用如下表示dup＝(d,u,v)，其中v表示某一变量，d表示v的某一定义节点，u表示v的某一使用节点。对于同一个变量可能多个定义节点和多个使用节点，因此变量的定义使用对可以多个，计算出现的dup的数量总和(记为n)；Step 303: Combining step 301 and step 302, the definition-use pairs of all variables in the program will be obtained. Here, dup=(d, u, v) is expressed as follows, where v represents a variable, and d represents a certain variable of v. A definition node, u represents a certain usage node of v. For the same variable, there may be multiple definition nodes and multiple use nodes, so the definition and use pairs of the variable can be multiple, and the sum of the number of dup that appears (denoted as n) is calculated;

步骤304：结合步骤202与步骤301-302的内容得到存在整数溢出问题的变量定义-使用对，这里用dup’表示，用以区分通用变量和可引起整数溢出的变量的定义和使用，计算出现的dup’的数量总和(记为m)，若步骤202中结果为不存在整数溢出，即未做标记，则此处m统计值为0。Step 304: Combining the contents of Step 202 and Steps 301-302 to obtain a variable definition-use pair that has an integer overflow problem, which is represented by dup' here, it is used to distinguish the definition and use of a general variable and a variable that can cause an integer overflow, and the calculation occurs The sum of the number of dup's (denoted as m), if the result in step 202 is that there is no integer overflow, that is, it is not marked, the statistical value of m here is 0.

步骤S4中，生成初始测试用例(初代种群)，具体步骤如下：In step S4, an initial test case (primary population) is generated, and the specific steps are as follows:

步骤401：对于遗传算法中的种群大小(有几组测试用例，每个测试用例包含了程序中所有变量的一个测试输入)设置问题，考虑到适应度值计算的速度、遗传算法中理想测试用例的生成效率，结合对相关测试用例生成文献中参数选取和实验分析，同时要配合遗传算法中的交叉算子的使用，最终确定种群大小为4较为合适；Step 401: For the problem of setting the population size in the genetic algorithm (there are several groups of test cases, each test case contains a test input for all variables in the program), considering the speed of the calculation of the fitness value, the ideal test case in the genetic algorithm Combined with the parameter selection and experimental analysis in the relevant test case generation literature, and at the same time with the use of the crossover operator in the genetic algorithm, it is more appropriate to finally determine the population size to be 4;

步骤402：关于初始测试用例生成方法采用随机测试随机生成方法，根据程序中存在的变量随机的生成一个随机数作为变量的测试输入。Step 402: Regarding the initial test case generation method, a random test random generation method is adopted, and a random number is randomly generated according to a variable existing in the program as a test input of the variable.

与面向对象编程语言(如java)不同，solidity中存在着一些可引起严重问题的安全性问题，本发明中主要涉及到整数溢出，该问题的出现在具体使用环境中会引起严重的后果，这就促使在程序测试过程中应着重强调这一点。结合错误产生的具体原因，该问题都可以通过步骤S2中的程序执行顺序分析得到；因此，在应用遗传算法过程中，需设计合适的适应度函数求得测试用例的适应度值以找到较优测试用例，另一方面，要对遗传算法执行进行限定以便在合理的时间内找得到算法的较优解。步骤S5中遗传算法适应度函数设计及算法执行上限说明，具体步骤如下：Different from object-oriented programming languages (such as java), solidity has some security problems that can cause serious problems. The present invention mainly involves integer overflow. The occurrence of this problem will cause serious consequences in a specific use environment. This is why it should be emphasized in the process of program testing. Combined with the specific cause of the error, this problem can be analyzed by the program execution sequence in step S2; therefore, in the process of applying the genetic algorithm, it is necessary to design an appropriate fitness function to obtain the fitness value of the test case to find the optimal fitness value. The test case, on the other hand, is to limit the execution of the genetic algorithm in order to find a better solution to the algorithm in a reasonable time. In step S5, the genetic algorithm fitness function design and the algorithm execution upper limit description, the specific steps are as follows:

步骤501：设置参数ε，其中0<ε<1，ε在这里的作用是为了增加可导致整数溢出的相关变量操作的权重，可通过一些预实验，小范围内调整ε的值，通过分析测试用例趋于最优时遗传算法迭代次数以及整体适应度值收敛情况得到ε值，用于后面的实验；Step 501: Set the parameter ε, where 0<ε<1, the role of ε here is to increase the weight of related variable operations that can lead to integer overflow. Through some pre-experiments, the value of ε can be adjusted in a small range, and the analysis test can be passed. When the use case tends to be optimal, the number of iterations of the genetic algorithm and the convergence of the overall fitness value are used to obtain the ε value for subsequent experiments;

步骤502：基于数据流覆盖测试的目标(当前测试用例能否覆盖程序中每个变量的定义和该定义所到达的对该变量的使用之间的路径)及基本准则(测试用例的对程序内所有变量的定义-使用路径的覆盖情况决定了测试用例的好坏)，构造适应度函数

其中i表示第i个测试用例，p_i表示第i个测试用例覆盖的dup的数量，q_i表示第i个测试用例覆盖的涉及到整数溢出错误的dup’数量，n表示程序中所有dup的数量，m表示程序中所有涉及到整数溢出错误的dup’数量；Step 502: Cover the test target based on the data flow (whether the current test case can cover the path between the definition of each variable in the program and the use of the variable reached by the definition) and the basic criteria (the test case's internal reference to the program) The definition of all variables - the coverage of the use path determines the quality of the test case), construct the fitness function

where _i represents the ith test case, pi represents the number of dups covered by the ith test case, q _i represents the number of dup's covered by the ith test case involving integer overflow errors, and n represents the number of dups covered by the ith test case. The number, m represents the number of dup's involved in integer overflow errors in the program;

步骤503：通过以下两部分内容对算法执行结束条件做了说明：(1)为达到可以在有限时间内执行这一目标，设定合理遗传算法迭代次数，若达到最高代数限制，即使适应度值仍未达到1，结束算法执行；(2)在遗传代数上限内，若适应度值达到1，为了取得较优的测试用例，算法会继续执行下一代的遗传计算，同时保存当前测试用例生成结果，若下一代的实验结果适应度值没有达到1，则算法结束，取上一代结果；反之存在适应度值为1的测试用例，算法结束，取最后一代执行结果。Step 503: The execution end condition of the algorithm is explained through the following two parts: (1) In order to achieve the goal that can be executed in a limited time, set a reasonable number of iterations of the genetic algorithm. If the highest algebraic limit is reached, even if the fitness value (2) Within the upper limit of genetic algebra, if the fitness value reaches 1, in order to obtain better test cases, the algorithm will continue to execute the next generation of genetic calculations, while saving the current test case generation results , if the fitness value of the next generation experimental result does not reach 1, the algorithm ends, and the result of the previous generation is taken; otherwise, there is a test case with a fitness value of 1, the algorithm ends, and the execution result of the last generation is taken.

步骤S6中，遗传算法执行，得到算法内最优测试用例，其具体步骤如下(注：在遗传算法执行过程中，使用“个体”来表示每个测试用例)：In step S6, the genetic algorithm is executed to obtain the optimal test case within the algorithm, and the specific steps are as follows (Note: in the execution process of the genetic algorithm, "individual" is used to represent each test case):

步骤601：根据种群中每个个体中对变量的测试输入值，按照程序的CFG顺序执行每个语句节点，对于当前个体，首先计算其覆盖的dup数量(记为p)，以及覆盖dup’数量(记为q)，在统计q的值时，要注意其前提条件应满足：在将当前个体带入执行的状态下，对变量的使用或定义满足溢出条件(也就是带入变量的测试值后，保存相应节点操作的中间结果进行比较分析，且分析结果为溢出)时，才可以将该个体记为对整数溢出操作的一次有效覆盖，若步骤201中没有整数溢出、不存在标记节点，步骤202中没有被标记的dup’，则此处的q统计值为0，根据步骤502中的适应度函数计算每个个体的适应度值，其后的每一代遗传算法的执行都先计算新生成的种群每个个体的适应度值；Step 601: According to the test input value of the variable in each individual in the population, execute each statement node according to the CFG sequence of the program. For the current individual, first calculate the number of dups covered by it (denoted as p), and the number of covered dup' (denoted as q), when counting the value of q, it should be noted that its preconditions should be satisfied: in the state of bringing the current individual into the execution state, the use or definition of the variable satisfies the overflow condition (that is, the test value brought into the variable). After saving the intermediate results of the corresponding node operations for comparative analysis, and the analysis result is overflow), the individual can be recorded as an effective coverage of the integer overflow operation. If there is no integer overflow in step 201, there is no marked node, There is no marked dup' in step 202, the q statistic value here is 0, and the fitness value of each individual is calculated according to the fitness function in step 502, and the execution of each subsequent generation of genetic algorithm is calculated first. The fitness value of each individual in the generated population;

步骤602：本发明中，遗传算法的选择算子采用轮盘赌选择算子，具体为：个体被选中的概率与其适应度值的大小成正比，每个个体被选中的概率可以表示为

其中M表示种群大小(即测试用例的数量)，F_i表示第i个测试用例的适应度值；Step 602: In the present invention, the selection operator of the genetic algorithm adopts the roulette selection operator, specifically: the probability of an individual being selected is proportional to the size of its fitness value, and the probability of each individual being selected can be expressed as:

where M represents the population size (that is, the number of test cases), and F _i represents the fitness value of the ith test case;

步骤603：本发明中，遗传算法部分采用的交叉算子基本原理同均匀交叉算子，但在传统均匀交叉算子的原理上进行了一些改变，具体为：交叉环节取两个个体做为一组，组内两个相互配对的个体中的每个变量以相同的概率进行交换，形成两个新个体，最终则可得到全新的四个个体；Step 603: In the present invention, the basic principle of the crossover operator used in the genetic algorithm is the same as that of the uniform crossover operator, but some changes have been made in the principle of the traditional uniform crossover operator. Group, each variable in the two paired individuals in the group is exchanged with the same probability to form two new individuals, and finally four new individuals can be obtained;

步骤604：本发明中，遗传算法部分采用的变异算子在传统遗传算法的基本位变异的基础上进行了修改，具体为：对经过交叉算子后得到的个体，随机指定个体中的某一个或几个变量以变异概率P_m进行变换；Step 604: In the present invention, the mutation operator used in the genetic algorithm is modified on the basis of the basic bit mutation of the traditional genetic algorithm. Specifically, for the individuals obtained after the crossover operator, randomly designate one of the individuals. or several variables are transformed with the probability of mutation P _m ;

步骤605：经过步骤601—604，遗传算法将会自动生成下一代种群；Step 605: After steps 601-604, the genetic algorithm will automatically generate the next generation population;

步骤606：在步骤503的算法执行结束条件的约束下，迭代的执行步601-605，最终得到的测试用例即可视为算法内最优解。Step 606: Under the constraints of the end condition of the algorithm execution in step 503, iteratively execute steps 601-605, and the finally obtained test case can be regarded as the optimal solution in the algorithm.

Claims

1. A method for generating a test case facing to a language is characterized by comprising the following steps:

(1) analyzing an intelligent contract program to be tested realized by the solidity language according to the variable type, the flow control statement, the function body structure, the internal function requirer and the use of the function modifier of the solidity language to obtain a corresponding control flow graph CFG;

(2) traversing each node information according to the CFG graph in the step (1), judging whether the node used by the agent type variable has the safety problem of the integer overflow error, and if so, marking the node with the integer overflow error;

(3) traversing each node information according to the CFG graph in the step (1), counting definition-use pairs of all numerical variables in the program, and recording the definition-use pairs as dup, and counting the definition-use pairs of the variables including the marked nodes in the step (2) if the judgment result in the step (2) is that an integer overflow error exists, and recording the definition-use pairs as dup';

(4) randomly generating an initial test case set containing a plurality of groups of test cases aiming at all numerical variables in the program according to the dup counted in the step (3); the group number of the test cases is determined according to the size of the population in the genetic algorithm;

(5) designing a fitness function in the genetic algorithm for selecting a better test case to drive the algorithm to execute; the fitness value of the test case is the ratio of the weighted sum of the number of dups covered by the test case and the number of covered dup 'related to integer overflow errors to the weighted sum of the number of all dups and the number of all dup';

(6) and (4) according to the fitness function in the step (5) and the execution limit of the algorithm, solving the fitness value of the initial test case in the step (4), and starting iterative execution of the genetic algorithm to obtain an optimal result in the algorithm.

2. The solidity language-oriented test case generation method according to claim 1, wherein, when analyzing the CFG diagram of the program to be tested in step (1), the use of an address type variable in the program is considered, and the definition of an unsigned integer variable derived from the address variable is considered; considering the use of a require and alert condition judgment statement in a program, and processing the require statement as an if condition structure statement; the function modifier is regarded as a form of function call, when a function body is encountered, whether the function uses the function modifier is firstly analyzed, if the function modifier is used, the function modifier is firstly transferred, and the connection relation among nodes is transferred according to the content of the function modifier.

3. The solidity language-oriented test case generation method according to claim 1, wherein the step (2) includes the following steps:

(21) according to whether an integer overflow error exists in the CFG analysis program obtained in the step (1), the method specifically comprises the following steps: traversing CFG graph nodes, and judging whether statement nodes for performing overflow judgment on corresponding variable operation results exist before the variable use nodes or not for the nodes with the agent type variable use, wherein the variable use comprises addition, subtraction, multiplication or division calculation, and if not, judging that an integer overflow error occurs in a program;

(22) if the integer overflow error exists, marking the variable operation nodes which are involved in the CFG and cause the integer overflow error.

4. The solidity language-oriented test case generation method according to claim 3, wherein the step (3) includes the following steps:

(31) analyzing the statement information node by node according to the CFG graph, and finding out definition nodes of numerical variables in the program;

(32) finding all corresponding use nodes of the definition nodes of each variable found in the step (31);

(33) in connection with steps (31), (32), the definition-use pairs of the variables present in the program are obtained: d, calculating the total number of dup and recording the total number as n; wherein v represents a variable, d represents a defined node of v, and u represents a used node of v;

(34) and (3) combining the contents of the step (22) and the steps (31) and (32) to obtain a variable definition-use pair with the integer overflow problem, wherein the variable definition-use pair is denoted by dup', the total number of the variable definition-use pair is calculated and is marked as m, and if the result in the step (22) is that no integer overflow problem exists, namely the integer overflow problem is not marked, the statistical value of m is 0.

5. The method according to claim 1, wherein the step (4) randomly generates an initial test case set containing 4 groups of test cases for all the numerical variables in the program.

6. The method for generating relevance language-oriented test cases according to claim 1, wherein in the step (5), the fitness function formula

Wherein p is_iIndicates the number of dup covered by the ith test case, q_iIndicating the number of dup' covered by the ith test case and related to integer overflow errors, n indicating the number of all dups in the program, and m indicating all integer overflow errors in the programThe number of false dup's, which is a weight parameter, 0<<1。

7. The method for generating testcases in the ontology language according to claim 1, wherein the step (6) of setting the algorithm execution limit includes two steps: (1) the iteration times of the genetic algorithm are finished, if the highest algebraic limit is reached, the fitness value still does not reach 1; (2) and (4) within the upper limit of the genetic algebra, if the fitness value reaches 1, storing the result generated by the current test case, then carrying out the next generation experiment, if the fitness value of the experiment result is smaller than that of the parent generation, ending, and taking the previous generation result.

8. The solidity language-oriented test case generation method according to claim 1, wherein the step (6) includes the following steps:

(61) calculating individual fitness values in the population: for each generated individual, namely each test case, putting corresponding test data into a program for execution, calculating the number of covered dup, which is recorded as p, the number of covered dup', which is recorded as q, in a way of pre-instrumentation of a source program, and when the value of q is counted, the precondition is that under the current individual execution state, the operation of a variable meets an overflow condition, and then the variable is recorded as one-time effective coverage of an integer overflow error, and an individual fitness value is calculated according to a fitness function;

(62) and (3) performing iterative execution of a genetic algorithm, and selecting an optimal result in the algorithm: and (4) performing iterative execution of the genetic algorithm according to a roulette selection method, a uniform crossover operator and a variation in the genetic algorithm, wherein the fitness value calculation is performed in the step (61) every time the genetic algorithm performs the fitness value calculation until the algorithm execution limit is reached, and finally obtaining an algorithm result.