CN111611177B

CN111611177B - A Software Performance Defect Detection Method Based on Performance Expectation of Configuration Items

Info

Publication number: CN111611177B
Application number: CN202010610996.3A
Authority: CN
Inventors: 李姗姗; 廖湘科; 王戟; 董威; 何浩辰; 陈振邦; 陈立前; 贾周阳; 王腾
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2023-06-09
Anticipated expiration: 2040-06-29
Also published as: CN111611177A

Abstract

The invention discloses a software performance defect detection method based on configuration item performance expectations, and aims to provide a method for effectively detecting performance defects related to configuration items. The technical solution is: use configuration item performance expectations to build a performance defect detection system composed of configuration item expectation prediction module, test sample generation module, and performance defect detection module; train configuration item expectation prediction module; read in the software to be tested by configuring The item expectation prediction module predicts the performance expectation of the configuration item, the test sample generation module generates test samples according to the performance expectation and the software test set, the performance defect detection module executes the test sample and detects whether the performance expectation is consistent with the actual performance, if not, then Output performance flaws. The invention can not only effectively detect software performance defects, but also detect new performance defects for the software community, and adopt the invention to effectively distinguish the performance difference between non-defective software and defective software.

Description

A software performance defect detection method based on configuration item performance expectation

技术领域Technical Field

本发明涉及大型软件中的性能缺陷检测领域，具体涉及一种基于配置项性能期望的软件性能缺陷检测方法。The present invention relates to the field of performance defect detection in large-scale software, and in particular to a software performance defect detection method based on configuration item performance expectations.

背景技术Background Art

随着社会的不断进步，软件系统已经在各个领域得到广泛应用，在现代社会中扮演着举足轻重的角色，发挥了重要的作用。随着软件系统的不断发展，人们对软件的可靠性，安全性，性能(软件运行速度)要求越来越高，导致软件规模不断增大，软件复杂度不断提升。例如，Hadoop分布式开源软件的2.8.0版本，源码文件数量超过8000，代码总行数接近千万。同时，软件系统提供更多更加灵活的配置项以使用户根据需求配置软件。例如，Apache httpd软件中共有个1000多个配置项，MySQL中有800多个配置项。且非功能属性的所占的比例日益增加，这些配置项与计算资源(如CPU、内存等)、性能优化策略密切相关。同时，随着软件规模不断增大，提高软件性能是软件演化和维护最重要的任务之一。Xue Han等人在ESEM 2016发表的文章“An Empirical Study on Performance Bugs for HighlyConfigurable Software Systems(高可配置软件系统中的性能缺陷实证研究)”表明：配置项也成为引发软件性能问题的主要原因之一，比例高达59％。在对148家企业的调查中，92％的企业认为提高软件性能是软件发展过程中最重要的任务之一。近年来，软件配置项相关的代码缺陷导致的软件性能问题造成了巨大的商业损失。With the continuous progress of society, software systems have been widely used in various fields, playing a pivotal role in modern society and playing an important role. With the continuous development of software systems, people have higher and higher requirements for software reliability, security, and performance (software running speed), resulting in the continuous increase of software scale and software complexity. For example, the 2.8.0 version of Hadoop distributed open source software has more than 8,000 source code files and nearly 10 million lines of code. At the same time, the software system provides more and more flexible configuration items to enable users to configure the software according to their needs. For example, there are more than 1,000 configuration items in Apache httpd software and more than 800 configuration items in MySQL. And the proportion of non-functional attributes is increasing day by day. These configuration items are closely related to computing resources (such as CPU, memory, etc.) and performance optimization strategies. At the same time, as the scale of software continues to increase, improving software performance is one of the most important tasks in software evolution and maintenance. Xue Han et al. published an article titled "An Empirical Study on Performance Bugs for HighlyConfigurable Software Systems" at ESEM 2016, which showed that configuration items have also become one of the main causes of software performance problems, accounting for as high as 59%. In a survey of 148 companies, 92% of them believed that improving software performance is one of the most important tasks in the software development process. In recent years, software performance problems caused by code defects related to software configuration items have caused huge business losses.

针对软件性能问题，现有技术主要采用两类方法对其进行检测。第一类方法，如DuShen等人在ISSTA2015发表的“Automating Performance Bottleneck Detection usingSearch-Based Application Profiling(一种基于搜索和profiling的性能缺陷自动检测方法)”，主要基于profiler等性能瓶颈诊断工具生成使软件运行缓慢的测试用例，并将执行该用例耗时最长的函数作为性能缺陷报告给开发者。虽然此类方法检测性能缺陷的覆盖率较高，但会存在大量误报。原因是测试用例执行缓慢可能并非由于性能缺陷导致，而是因为测试用例本身所需的时间较长。即，该类方法缺乏有效的性能测试预言(Test Oracle:Incomputing,software engineering,and software testing,a test oracle(or justoracle)is a mechanism for determining whether a test has passed or failed.测试预言：在计算机、软件、软件测试领域，测试预言是判断一个测试是否通过测试的标准)。In view of software performance issues, the existing technology mainly uses two types of methods to detect them. The first type of method, such as "Automating Performance Bottleneck Detection using Search-Based Application Profiling (a performance defect automatic detection method based on search and profiling)" published by DuShen et al. in ISSTA2015, mainly generates test cases that make the software run slowly based on performance bottleneck diagnostic tools such as profilers, and reports the function that takes the longest time to execute the case to the developer as a performance defect. Although this type of method has a high coverage rate for detecting performance defects, there will be a large number of false positives. The reason is that the slow execution of test cases may not be caused by performance defects, but because the test cases themselves take a long time. That is, this type of method lacks an effective performance test oracle (Test Oracle: Input computing, software engineering, and software testing, a test oracle (or just oracle) is a mechanism for determining whether a test has passed or failed. Test oracle: In the field of computers, software, and software testing, test oracle is a standard for determining whether a test has passed the test).

第二类方法，如Adrian Nistor等人在ICSE 2013发表的“Toddler:DetectingPerformance Problems via Similar Memory-Access Patterns(通过相似的内存读写模式检测性能缺陷)”，通过总结循环结构中的性能缺陷代码模式和变量读取模式，匹配待测软件中的性能缺陷。此类方法基于缺陷代码模式构建测试预言，能够有效减少性能故障的误报。然而，循环结构中的性能缺陷仅占一般性能缺陷的少量比例，因此该类方法局限于检测某种特定类型的故障(如循环结构中的缺陷)，且经验证，该类方法仅能检测出9.8％配置项相关的性能故障。The second type of method, such as "Toddler: Detecting Performance Problems via Similar Memory-Access Patterns" published by Adrian Nistor et al. at ICSE 2013, matches the performance defects in the software under test by summarizing the performance defect code patterns and variable read patterns in the loop structure. This type of method builds test predictions based on defective code patterns, which can effectively reduce false positives of performance faults. However, performance defects in loop structures only account for a small proportion of general performance defects, so this type of method is limited to detecting a specific type of fault (such as defects in loop structures), and it has been verified that this type of method can only detect 9.8% of performance faults related to configuration items.

综上，如何构建低误报、高覆盖的性能测试预言，并自动化生成相应的测试样例，以有效、全面地检测软件性能缺陷是本领域技术人员正在探讨的热点问题。In summary, how to construct a performance test oracle with low false positives and high coverage, and automatically generate corresponding test samples to effectively and comprehensively detect software performance defects is a hot issue that technicians in this field are discussing.

发明内容Summary of the invention

本发明要解决的技术问题是提供一种基于配置项性能期望的软件性能缺陷检测方法。此方法利用软件配置项性能期望构建测试预言(即当软件实际性能与配置项性能期望不符时，则存在性能缺陷)，同时自动预测待测软件的测试预言；基于测试预言，自动生成测试样例，有效检测出配置项相关的性能缺陷。The technical problem to be solved by the present invention is to provide a method for detecting software performance defects based on configuration item performance expectations. This method uses the performance expectations of software configuration items to construct test predictions (i.e., when the actual performance of the software does not meet the performance expectations of the configuration items, there is a performance defect), and automatically predicts the test predictions of the software to be tested; based on the test predictions, test samples are automatically generated to effectively detect performance defects related to the configuration items.

为解决上述技术问题，本发明的技术方案为：首先，利用He Haochen在ESEC/FSE2019发表的“Tuning backfired？not(always)your fault:understanding anddetecting configuration-related performance bugs(配置调节适得其反？不总是你的错！理解并检测配置相关的性能缺陷)”所述的配置项性能期望构建由配置项期望预测模块、测试样例生成模块、性能缺陷检测模块构成的性能缺陷检测系统；然后，读入人工标记了配置项期望的训练数据集，对配置项期望预测模块进行训练；最后读入待检测软件(包括软件、软件自带测试集、软件配置项用户手册)，由配置项期望预测模块预测配置项的性能期望并发送到测试样例生成模块和性能缺陷检测模块，测试样例生成模块根据性能期望和软件测试集生成测试样例并发送到性能缺陷检测模块，性能缺陷检测模块执行测试样例并检测性能期望和实际性能是否相符，若不相符则输出性能缺陷。To solve the above technical problems, the technical solution of the present invention is as follows: first, the configuration item performance expectations described in "Tuning backfired? not (always) your fault: understanding and detecting configuration-related performance bugs" published by He Haochen at ESEC/FSE2019 are used to construct a performance defect detection system consisting of a configuration item expectation prediction module, a test sample generation module, and a performance defect detection module; then, a training data set with manually marked configuration item expectations is read in to train the configuration item expectation prediction module; finally, the software to be tested (including software, software built-in test set, and software configuration item user manual) is read in, and the configuration item expectation prediction module predicts the performance expectation of the configuration item and sends it to the test sample generation module and the performance defect detection module. The test sample generation module generates test samples according to the performance expectations and the software test set and sends them to the performance defect detection module. The performance defect detection module executes the test samples and detects whether the performance expectations are consistent with the actual performance. If they are not consistent, a performance defect is output.

本发明包括以下步骤：The present invention comprises the following steps:

第一步，构建性能缺陷检测系统，性能缺陷检测系统由配置项期望预测模块、测试样例生成模块、性能缺陷检测模块构成。The first step is to build a performance defect detection system, which consists of a configuration item expectation prediction module, a test sample generation module, and a performance defect detection module.

配置项期望预测模块是一个加权投票分类器，与测试样例生成模块、性能缺陷检测模块相连，从待检测软件的配置项用户手册读取配置项的描述、取值范围，对待预测配置项的性能期望进行预测，得到配置项的性能期望标签(用标签表示性能期望的类别)，将配置项的性能期望标签发送给测试样例生成模块和性能缺陷检测模块。The configuration item expectation prediction module is a weighted voting classifier, which is connected to the test sample generation module and the performance defect detection module. It reads the description and value range of the configuration item from the configuration item user manual of the software to be tested, predicts the performance expectation of the configuration item to be predicted, obtains the performance expectation label of the configuration item (using the label to represent the category of the performance expectation), and sends the performance expectation label of the configuration item to the test sample generation module and the performance defect detection module.

测试样例生成模块与配置项期望预测模块、性能缺陷检测模块相连，从配置项期望预测模块接收配置项的性能期望标签，从待检测软件的测试集读取测试命令，根据配置项的性能期望标签和待检测软件测试集生成测试样例集合T。The test sample generation module is connected to the configuration item expectation prediction module and the performance defect detection module, receives the performance expectation label of the configuration item from the configuration item expectation prediction module, reads the test command from the test set of the software to be tested, and generates a test sample set T according to the performance expectation label of the configuration item and the test set of the software to be tested.

性能缺陷检测模块与配置项期望预测模块、测试样例生成模块相连，从测试样例生成模块接收测试样例集合T，从配置项期望预测模块接收配置项的性能期望标签，执行测试样例集合T中测试样例并检测配置项的性能期望标签所对应的期望性能和实际性能是否相符，若不相符则输出待检测软件的性能缺陷。The performance defect detection module is connected to the configuration item expectation prediction module and the test sample generation module, receives the test sample set T from the test sample generation module, receives the performance expectation label of the configuration item from the configuration item expectation prediction module, executes the test samples in the test sample set T and detects whether the expected performance corresponding to the performance expectation label of the configuration item is consistent with the actual performance. If not, the performance defect of the software to be tested is output.

第二步：训练性能缺陷检测系统的配置项期望预测模块。读入人工标注期望的配置项和配置项的官方文档描述，训练配置项期望预测模块。Step 2: Train the configuration item expectation prediction module of the performance defect detection system. Read the manually annotated expected configuration items and the official document descriptions of the configuration items to train the configuration item expectation prediction module.

2.1构建训练集，方法是：从MySQL、MariaDB、Apache-httpd、Apache-Tomcat、Apache-Derby、H2、PostgreSQL、GCC、Clang、MongoDB、RocksDB、Squid共12款软件的1万多个配置项中随机选取N(其中，N≥500)个配置项。2.1 Construct a training set by randomly selecting N (where N ≥ 500) configuration items from more than 10,000 configuration items of 12 software, including MySQL, MariaDB, Apache-httpd, Apache-Tomcat, Apache-Derby, H2, PostgreSQL, GCC, Clang, MongoDB, RocksDB, and Squid.

2.2根据N个配置项的官方文档描述，对配置项人工标注其性能期望标签，方法为：根据配置项(记为c)的文档描述(记为d)，如果调节该配置项的目的是为了开启优化开关(即性能期望标签的含义是开启优化开关)，则该配置项的性能期望标签为Label₁；如果调节该配置项的目的是为了提升性能牺牲可靠性等非功能需求，则该配置项的性能期望标签为Label₂；如果调节该配置项的目是为了分配更多计算机资源，则该配置项的性能期望标签为Label₃；如果调节该配置项的目的是为了开启软件额外功能，则该配置项的性能期望标签为Label₄；如果调节该配置项与软件性能无关，则该配置项的性能期望标签为Label₅；最终得到训练集，记为

其中，N₁+N₂+N₃+N₄+N₅＝N；N₁、N₂、N₃、N₄、N₅分别为性能期望标签为Label₁,Label₂,Label₃,Label₄,Label₅的配置项文档描述的个数。

是训练集中性能期望标签为Label_l的第i_l个配置项。

是

的文档描述，由单词组成。其中，1≤l≤5，1≤i_l≤N_l。令

中的单词总数为

记为(单词₁，单词₂，…，单词

…，单词

)。2.2 According to the official document descriptions of N configuration items, manually label the configuration items with their performance expectation labels. The method is as follows: according to the document description (denoted as d) of the configuration item (denoted as c), if the purpose of adjusting the configuration item is to turn on the optimization switch (that is, the meaning of the performance expectation label is to turn on the optimization switch), then the performance expectation label of the configuration item is Label ₁ ; if the purpose of adjusting the configuration item is to improve performance at the expense of non-functional requirements such as reliability, then the performance expectation label of the configuration item is Label ₂ ; if the purpose of adjusting the configuration item is to allocate more computer resources, then the performance expectation label of the configuration item is Label ₃ ; if the purpose of adjusting the configuration item is to enable additional software functions, then the performance expectation label of the configuration item is Label ₄ ; if adjusting the configuration item is not related to software performance, then the performance expectation label of the configuration item is Label ₅ ; finally, the training set is obtained, denoted as

Among them, N ₁ +N ₂ +N ₃ +N ₄ +N ₅ =N; N ₁ , N ₂ , N ₃ , N ₄ , and N ₅ are the numbers of configuration item document descriptions with performance expectation labels Label ₁ , Label ₂ , Label ₃ , Label ₄ , and Label ₅ respectively.

It is the i _lth configuration item with expected performance label Label _l in the training set.

yes

The document description consists of words. Among them, 1≤l≤5, 1≤i _l ≤N _l . Let

The total number of words in

Recorded as (word ₁ , word ₂ , ..., word

…,word

).

2.3配置项期望预测模块预处理训练集；2.3 Configuration item expectation prediction module preprocesses the training set;

2.3.1初始化变量l＝1；2.3.1 Initialize variable l = 1;

2.3.2初始化变量o_l＝1；2.3.2 Initialize variable o _l = 1;

2.3.3对

进行预处理，方法是：2.3.3 Pair

Perform preprocessing by:

2.3.3.1令变量

2.3.3.1 Let variables

2.3.3.2将单词

转化为

其中

为单词的词性标签(如名词(Noun)，动词(Verb)等)，

为计算机领域同义词(如memory、CPU的DS均为resource)；2.3.3.2 Word

Convert to

in

is the part-of-speech tag of the word (such as noun, verb, etc.),

They are synonyms in the computer field (e.g., DS in memory and CPU both stand for resource);

2.3.3.3若

令

转2.3.3.2；若

则得到预处理后的

为如下形式：

简记为

转2.3.4；2.3.3.3 If

make

Go to 2.3.3.2; if

Then the preprocessed

In the following form:

Abbreviated as

Go to 2.3.4;

2.3.4判断i_l是否等于N_l，若是，转2.3.5，否则令i_l＝i_l+1，转2.3.3；2.3.4 Determine whether i _l is equal to N _l . If so, go to 2.3.5. Otherwise, set i _l =i _l +1 and go to 2.3.3.

2.3.5判断l是否等于5，若是，转2.4，否则令l＝l+1，转2.3.2；2.3.5 Determine whether l is equal to 5. If so, go to 2.4. Otherwise, set l = l + 1 and go to 2.3.2.

2.4配置项期望预测模块挖掘频繁子序列。使用Jian Pei等人在ICDE 2001发表的文献“PrefixSpan:Mining Sequential Patterns Efficiently by Prefix-ProjectedPattern Growth(PrefixSpan：通过前缀投影模式有效地挖掘序列模式)”的PrefixSpan算法分别对集合

进行频繁子序列挖掘，得到5个频繁子序列集合：

其中Q₁,Q₂,…,Q_l,…,Q₅为正整数，表示当l＝1,2,…,5时，PrefixSpan算法从集合

挖掘出的频繁子序列的个数；1≤q≤Q_l；2.4 Configuration Item Expectation Prediction Module Mines Frequent Subsequences. We use the PrefixSpan algorithm from the paper “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth” published by Jian Pei et al. in ICDE 2001 to mine frequent subsequences.

Perform frequent subsequence mining and obtain 5 frequent subsequence sets:

Where Q ₁ ,Q ₂ ,…,Q _l ,…,Q ₅ are positive integers, indicating that when l＝1,2,…,5, the PrefixSpan algorithm selects

The number of frequent subsequences mined; 1≤q≤Q _l ;

2.5对P₁,P₂,…,P₅中的所有频繁子序列计算置信度(Confidence)，方法是：2.5 Calculate the confidence of all frequent subsequences in P ₁ , P ₂ ,…, P ₅ by:

2.5.1初始化变量l＝1；2.5.1 Initialize variable l = 1;

2.5.2初始化变量q＝1；2.5.2 Initialize variable q = 1;

2.5.3计算频繁子序列p_(l,q)的置信度Confidence₍l,q)：2.5.3 Calculate the confidence ( _l,q) of the frequent subsequence p ₍ l,q):

中的匹配次数之和)。其中，若p_(l,q)是

的一个子序列，则判定p_(l,q)与

一次匹配。

The sum of the number of matches in ). If p _(l,q) is

is a subsequence of , then determine whether p _(l,q) is

One match.

2.5.4判断q是否等于Q_l，若是，转2.5.5；若否，令q＝q+1，转2.5.3；2.5.4 Determine whether q is equal to Q _l . If so, go to 2.5.5. If not, set q = q + 1 and go to 2.5.3.

2.5.5判断l是否等于5，若是，表示得到了P₁,P₂,…,P₅中的所有频繁子序列的置信度，转2.6；若否，令l＝l+1，转2.5.2。2.5.5 Determine whether l is equal to 5. If so, it means that the confidence of all frequent subsequences in P ₁ , P ₂ , …, P ₅ is obtained, and go to 2.6; if not, set l = l + 1, and go to 2.5.2.

2.6根据P₁,P₂,…,P₅中的频繁子序列的置信度Confidence，对P₁,P₂,…,P₅中的频繁子序列进行筛选。方法为：2.6 According to the confidence of the frequent subsequences in P ₁ , P ₂ , …, P _5, filter the frequent subsequences in P ₁ , P ₂ , …, P _5. The method is:

2.6.1初始化变量l＝1；2.6.1 Initialize variable l = 1;

2.6.2初始化变量q＝1；2.6.2 Initialize variable q = 1;

2.6.3若

其中5为期望标签种类数，则将p_lq放入集合P_l'中；2.6.3 If

Where 5 is the expected number of label types, then p _lq is put into the set P _l ';

2.6.4判断q是否等于Q_l，若是，转2.6.5；若否，令q＝q+1，转2.6.3；2.6.4 Determine whether q is equal to Q _l . If so, go to 2.6.5. If not, set q = q + 1 and go to 2.6.3.

2.6.5判断l是否等于5，若是，表示得到了筛选后的频繁子序列集合P₁'，P₂'，P₃'，P₄'，P₅'，转2.7；若否，令l＝l+1，转2.6.2。2.6.5 Determine whether l is equal to 5. If so, it means that the filtered frequent subsequence set P ₁ ', P ₂ ', P ₃ ', P ₄ ', P ₅ ' is obtained, and go to 2.7; if not, set l = l + 1, and go to 2.6.2.

2.7采用P₁'，P₂'，P₃'，P₄'，P₅'对配置项期望预测模块进行训练，方法是：2.7 Use P ₁ ', P ₂ ', P ₃ ', P ₄ ', P ₅ ' to train the configuration item expectation prediction module, the method is:

2.7.1初始化：从P₁'，P₂'，P₃'，P₄'，P₅'中分别随机选出(选出后放回)100个频繁子序列，构成随机选出频繁子序列集合P₁”，P₂”，P₃”，P₄”，P₅”。P₁”，P₂”，P₃”，P₄”，P₅”中共包含500个频繁子序列，即：2.7.1 Initialization: Randomly select (and replace after selection) 100 frequent subsequences from P ₁ ', P ₂ ', P ₃ ', P ₄ ', and P ₅ ' respectively to form the randomly selected frequent subsequence set P ₁ ', P ₂ ', P ₃ ', P ₄ ', and P ₅ '. _{P 1} ', P ₂ ', P ₃ ', P ₄ ', and P ₅ ' contain a total of 500 frequent subsequences, namely:

{p_(1,1),p_(1,2),…,p_(1,r),…,p_(1,100)},…,{p_(l,1),p_(l,2),…,p_(l,r),…,p_(l,100)},…,{p _(1,1) ,p _(1,2) ,…,p _(1,r) ,…,p _(1,100) },…,{p _(l,1) ,p _(l,2) ,… ,p _(l,r) ,…,p _(l,100) },…,

{p_(5,1),p_(5,2),…,p_(5,r),…,p_(5,100)}，1≤r≤100；{p _(5,1) ,p _(5,2) ,…,p _(5,r) ,…,p _(5,100) }, 1≤r≤100;

2.7.2分别计算P₁”，P₂”，P₃”，P₄”，P₅”在训练数据集上的准确率(Precision)、召2.7.2 Calculate the _accuracy ₍ Precision), _recall ( _Recall ) _and

回率(Recall)、F-score(准确率和召回率的调和平均数)：Recall, F-score (the harmonic mean of precision and recall):

2.7.3判断F-score最大值的估计累积分布函数值是否大于阈值δ，δ一般为99％-99.9％，若大于，转2.8；若小于等于阈值δ，转2.7.1；2.7.3 Determine whether the estimated cumulative distribution function value of the maximum F-score is greater than a threshold value δ, which is generally 99%-99.9%. If so, go to 2.8; if less than or equal to the threshold value δ, go to 2.7.1;

2.8配置项期望预测模块选取F-score最大时对应的P₁”，P₂”，P₃”，P₄”，P₅”构建加权投票分类器。方法为：将加权投票分类器的输入设定为任一待预测期望标签的已预处理的配置项描述《POS_x,DS_x》(简记为x)，输出为5个期望标签的得票，x的性能期望标签为得票最高的期望标签。其中，类别l的得票为频繁子序列

的置信度之和，1≤r_x≤100，且

为x的子序列。分类器输出得票五元组，记为

2.8 Configuration item expectation prediction module selects P ₁ ”, P ₂ ”, P ₃ ”, P ₄ ”, P ₅ ” corresponding to the maximum F-score to construct a weighted voting classifier. The method is: set the input of the weighted voting classifier to any preprocessed configuration item description of the expected label to be predicted "POS _x , DS _x " (abbreviated as x), and the output is the votes of the 5 expected labels. The performance expected label of x is the expected label with the highest votes. Among them, the votes of category l are frequent subsequences.

The sum of the confidences of , 1≤r _x ≤100, and

is a subsequence of x. The classifier outputs a five-tuple of votes, recorded as

其中，

的含义为：“在P_l”中，满足“为x的子序列”的

的置信度之和(其中l＝1,2,…,5)”。若Votes(x)中有元素不为0，则找到Votes(x)中最大值所对应的元素，该元素对应的序号l即为x的性能期望标签对应的序号，为Label_l，转第三步；若Votes(x)＝[0,0,0,0,0]，则x的性能期望标签为空，转第三步；例如，若Votes(x)＝[1.1,1.4,5.3,0,2.0]，则x的性能期望标签为Label₃，若Votes(x)＝[0,0,0,0,0]，则该配置项x的性能期望标签为空；in,

The meaning is: "In P _l ", the sequence that satisfies "is a subsequence of x"

The sum of the confidences of (where l = 1, 2, …, 5)". If any element in Votes(x) is not 0, find the element corresponding to the maximum value in Votes(x). The sequence number l corresponding to the element is the sequence number corresponding to the performance expectation label of x, which is Label _l , and go to the third step; if Votes(x) = [0, 0, 0, 0, 0], the performance expectation label of x is empty, and go to the third step; for example, if Votes(x) = [1.1, 1.4, 5.3, 0, 2.0], the performance expectation label of x is Label ₃ , and if Votes(x) = [0, 0, 0, 0, 0], the performance expectation label of the configuration item x is empty;

第三步，利用训练后的配置项期望预测模块为待检测软件生成性能期望标签集合L，将L发送给测试样例生成模块和性能缺陷检测模块，方法是：The third step is to use the trained configuration item expectation prediction module to generate a performance expectation label set L for the software to be tested, and send L to the test sample generation module and the performance defect detection module. The method is:

训练后的配置项期望预测模块从待检测软件的配置项用户手册读取配置项描述，的加权投票分类器对所有待测配置项C＝{c₁,c₂,…,c_z,…,c_N'}，其中，1≤z≤N′(令N′为配置项用户手册中配置项的个数)的性能期望进行预测，得到性能期望标签集合L＝[Lab₁,Lab₂,…,Lab_z,…,Lab_N′]，其中Lab_z∈{Label₁,Label₂,Label₃,Label₄,Label₅,null(空)}；将L发送给测试样例生成模块和性能缺陷检测模块。The trained configuration item expectation prediction module reads the configuration item description from the configuration item user manual of the software to be tested, and uses the weighted voting classifier to predict the performance expectations of all configuration items to be tested C = {c ₁ ,c ₂ ,…,c _z ,…,c _N '}, where 1≤z≤N′ (let N′ be the number of configuration items in the configuration item user manual), and obtains the performance expectation label set L = [Lab ₁ ,Lab ₂ ,…,Lab _z ,…,Lab _N′ ], where Lab _z ∈{Label ₁ ,Label ₂ ,Label ₃ ,Label ₄ ,Label ₅ ,null}; L is sent to the test sample generation module and the performance defect detection module.

第四步，测试样例生成模块为待检测软件生成测试样例集合T，并将T发送给性能缺陷检测模块，方法是：In the fourth step, the test sample generation module generates a test sample set T for the software to be tested and sends T to the performance defect detection module, the method is:

4.1测试样例生成模块使用Tianyin Xu等人在SOSP 2013发表的文章“Do NotBlame Users for Misconfigurations(不要责备用户的配置错误)”的Spex算法，对C中的软件配置项的语法类型和取值范围进行提取。Spex最终提取出的语法类型分为四类：数值类型(int)、布尔类型(bool)、枚举类型(enum)、字符串类型(string)；4.1 The test sample generation module uses the Spex algorithm from the article “Do Not Blame Users for Misconfigurations” published by Tianyin Xu et al. in SOSP 2013 to extract the syntax type and value range of software configuration items in C. The syntax types finally extracted by Spex are divided into four categories: numeric type (int), Boolean type (bool), enumeration type (enum), and string type (string);

4.2测试样例生成模块为配置项集合C＝{c₁,c₂,…,c_z,…,c_N'}生成待测值集合V，V＝{V₁,V₂,…,V_z,…,V_N'}，其中

为配置项c_z的一个取值，K_z为测试样例生成模块为c_z生成的值的个数。方法为：4.2 The test sample generation module generates a set of test values V for the configuration item set C = {c ₁ ,c ₂ ,…,c _z ,…,c _N' }, V = {V ₁ ,V ₂ ,…,V _z ,…,V _N' }, where

is a value of the configuration item c _z , and K _z is the number of values generated by the test sample generation module for c _z . The method is:

4.2.1初始化变量z＝1；4.2.1 Initialize variable z = 1;

4.2.2若c_z对应的期望标签为空，则令

转4.2.7；4.2.2 If the expected label corresponding to c _z is empty, then let

Go to 4.2.7;

4.2.3若c_z为布尔类型(bool)，则令V_z＝{0,1}，转4.2.7；4.2.3 If c _z is of Boolean type (bool), let V _z = {0,1} and go to 4.2.7;

4.2.4若c_z为枚举类型(enum)，则令

其中

为Spex算法提取到的c_z的全部可能取值，转4.2.7；4.2.4 If c _z is an enumeration type (enum), then let

in

All possible values of c _z extracted by the Spex algorithm, go to 4.2.7;

4.2.5若c_z为字符串类型(string)，则令

(根据何浩辰在ESEC/FSE 2019发表的“Tuning backfired？not(always)your fault:understanding and detectingconfiguration-related performance bugs(配置调节适得其反？不总是你的错！理解并检测配置相关的性能缺陷)”的结论，极少数字符串类型的配置项会导致性能缺陷)，转4.2.7；4.2.5 If c _z is a string type (string), then let

(According to the conclusion of "Tuning backfired? not (always) your fault: understanding and detecting configuration-related performance bugs" published by Haochen He at ESEC/FSE 2019, very few string type configuration items can cause performance defects), go to 4.2.7;

4.2.6若c_z为数值类型(int)，则对c_z的值进行抽样，方法为：记Spex算法提取到的c_z的最小取值和最大取值记为Min、Max，令V_z＝{Min,10·Min,10²·Min,Max,10^-1·Max,10^-2·Max}，转4.2.7；4.2.6 If c _z is a numeric type (int), sample the value of c _z as follows: record the minimum and maximum values of c _z extracted by the Spex algorithm as Min and Max, let V _z = {Min, 10·Min, 10 ² ·Min, Max, 10 ^-1 ·Max, 10 ^-2 ·Max}, and go to 4.2.7;

4.2.7若z＝N′，转4.3；否则，令z＝z+1，转4.2.2；4.2.7 If z = N', go to 4.3; otherwise, let z = z + 1 and go to 4.2.2;

4.3对V₁,V₂,…,V_z,…,V_N'取笛卡尔积，得到笛卡尔积V^Cartesian＝V₁×V₂×…×V_N'；4.3 Taking the Cartesian product of V ₁ , V ₂ , …, V _z , …, V _N' , we obtain the Cartesian product V ^Cartesian = V ₁ × V ₂ × … × V _N' ;

4.4软件的性能测试集一般以性能测试工具的形式提供。因此测试样例生成模块基于性能测试工具(如sysbench、apache-benchmark)生成测试命令。方法为：采用经典的pair-wise方法(Pair-wise Testing is a combinatorial method of software testingthat,for each pair of input parameters to a system,tests all possiblediscrete combinations of those parameters.“pair-wise方法是一种软件测试领域的组合方法，该方法针对系统的每对输入参数，测试这些参数的所有可能的离散组合”--《Pragmatic Software Testing:Becoming an Effective and Efficient TestProfessional》“实用软件测试：成为一个高效的测试专业”)对性能测试工具的参数进行抽样，然后将参数(如并发度、负载类型、数据表大小、数据表数量、读操作比例、写操作比例)输入性能测试工具，输出测试命令，得到测试命令集合B＝{b₁,b₂,b₃,…,b_y,…,b_Y}，1≤y≤Y，Y为B中测试命令的个数；4.4 The performance test set of the software is generally provided in the form of a performance test tool. Therefore, the test sample generation module generates test commands based on the performance test tool (such as sysbench, apache-benchmark). The method is as follows: the parameters of the performance testing tool are sampled by using the classic pair-wise method (Pair-wise Testing is a combinatorial method of software testingthat, for each pair of input parameters to a system, tests all possible discrete combinations of those parameters. "Pragmatic Software Testing: Becoming an Effective and Efficient Test Professional"), and then the parameters (such as concurrency, load type, data table size, number of data tables, read operation ratio, write operation ratio) are input into the performance testing tool, and the test commands are output to obtain the test command set B = {b ₁ ,b ₂ ,b ₃ ,…,by _, …,b _Y }, 1≤y≤Y, and Y is the number of test commands in B;

4.5测试样例生成模块生成测试样例集合T，T＝B×V^Cartesian＝{t₁,t₂,t₃,…,t_a,…,t_W}，1≤a≤W，t_a为一个二元组，

(其中，

的含义是：c_z的取值为

)，W为T中测试样例的个数，

为c₁的第u(1≤u≤K₁)个可能取值,

为c_z的第h(1≤h≤K_z)个可能取值,

为c_8′(1≤j≤K_8′)的第j个可能取值，K₁、K_z、K_8′分别为Spex算法提取到的配置项c₁、c_z、c_8′的可能取值的个数，且K₁、K_z、K_N′均为正整数；将测试样例集合T发送给性能缺陷检测模块；4.5 Test sample generation module generates a test sample set T, T = B × V ^Cartesian = {t ₁ ,t ₂ ,t ₃ ,…,t _a ,…,t _W }, 1≤a≤W, t _a is a two-tuple,

(in,

The meaning is: the value of c _z is

), W is the number of test samples in T,

is the uth (1≤u≤K ₁ ) possible value of c ₁ ,

is the hth ( _1≤h≤Kz ) possible value of _cz ,

is the jth possible value of c _8′ (1≤j≤K _8′ ), K ₁ , K _z , K _8′ are respectively the number of possible values of configuration items c ₁ , c _z , c _8′ extracted by the Spex algorithm, and K ₁ , K _z , K _N′ are all positive integers; the test sample set T is sent to the performance defect detection module;

第五步：性能缺陷检测模块根据T和L检测待测软件可执行文件的性能缺陷：Step 5: The performance defect detection module detects the performance defects of the executable file of the software to be tested based on T and L:

5.1性能缺陷检测模块执行T中的测试样例，得到测试样例的性能值，方法是：5.1 The performance defect detection module executes the test samples in T and obtains the performance values of the test samples. The method is:

5.1.1初始化变量a＝1；5.1.1 Initialize variable a=1;

5.1.2为防止因测试环境不稳定导致的性能波动，性能缺陷检测模块重复执行每个测试样例A次，A为正整数，A优选为10；因此，令变量repeat＝1(变量repeat记录当前重复执行的次数)；5.1.2 To prevent performance fluctuations caused by unstable test environment, the performance defect detection module repeats each test sample A times, where A is a positive integer, preferably 10; therefore, let the variable repeat = 1 (the variable repeat records the number of current repeated executions);

5.1.3性能缺陷检测模块将测试样例t_a输入待检测软件，运行待检测软件，记录第repeat次输入t_a运行得到的性能值

设定检测性能值的默认性能指标为软件数据吞吐量；5.1.3 The performance defect detection module inputs the test sample t _a into the software to be tested, runs the software to be tested, and records the performance value obtained by running the software after the repeat input t _a.

The default performance indicator for setting the detection performance value is the software data throughput;

5.1.4判定repeat是否等于A，若是，则得到一组关于测试样例t_a的性能指标，记为：

转4.1.5；否则令repeat＝repeat+1，转4.1.3；5.1.4 Determine whether repeat is equal to A. If so, a set of performance indicators for the test sample _ta is obtained, which is recorded as:

Go to 4.1.5; otherwise, set repeat = repeat + 1 and go to 4.1.3;

5.1.5判定a是否等于W，若是，记输出为Out＝{[t₁,R₁],…,[t_a,R_a],…,[t_W,R_W]}(其中，二元组[t_a,R_a]的第一个元素为测试样例，第二个元素为执行该测试样例A次得到的性能值集合)，转5.2；否则令a＝a+1，转5.1.2；5.1.5 Determine whether a is equal to W. If so, record the output as Out = {[t ₁ ,R ₁ ],…,[t _a ,R _a ],…,[t _W ,R _W ]} (where the first element of the tuple [t _a ,R _a ] is the test sample, and the second element is the performance value set obtained by executing the test sample A times), and go to 5.2; otherwise, let a = a + 1, and go to 5.1.2;

5.2性能缺陷检测模块将Out依据测试样例进行分组，方法是：5.2 The performance defect detection module groups Out according to the test samples by:

5.2.1初始化变量a＝1；5.2.1 Initialize variable a=1;

5.2.2判断若[t_a,R_a]已被分组，则令a＝a+1，转5.2.2；否则转5.2.3；5.2.2 If [t _a ,R _a ] has been grouped, set a=a+1 and go to 5.2.2; otherwise go to 5.2.3;

5.2.3将[t_a,R_a]按t_a中的配置项取值和测试命令进行分组，即[t_a,R_a]与{[t₁,R₁],…,[t_a,R_a],…,[t_W,R_W]}中，若t_a和t_a'同时满足以下3个条件，则将[t_a,R_a],[t_a’,R_a’]组成一组：5.2.3 Group [t _a ,R _a ] according to the configuration item values and test commands in t _a , that is, [t _a ,R _a ] and {[t ₁ ,R ₁ ],…,[t _a ,R _a ],…,[t _W ,R _W ]}, if t _a and t _a' simultaneously meet the following three conditions, then [t _a ,R _a ],[t _a' ,R _a' ] are grouped together:

条件1，t_a和t_a'仅有某一个配置项c_z(其中，1≤z≤N′)的取值不同；Condition 1: _ta and ta _' have only one configuration item c _z (where 1≤z≤N') that differs in value;

条件2，测试命令均为

Condition 2, the test commands are

条件3，[t_a,R_a],[t_a’,R_a’]未被分组；Condition 3, [t _a ,R _a ], [t _a' ,R _a' ] are not grouped;

令与[t_a,R_a]满足以上条件的共有Num_a个，即

分为一组，记为Group_(z,y)，Group_(z,y)＝{[t_a,R_a],[t_a‘,R_a‘],[t_a’‘,R_a“],…,[t_a*,R_a*]}(其中，1≤a',a”,…,a*≤W，Num_a为正整数，Num_a的大小与c_z的类型有关：若c_z为布尔类型，则Num_a＝2；若为枚举类型，则Num_a＝K_z；若为数值类型，则Num_a＝6；若为字符串类型，则Num_a＝1)。例如:t_a为

并且t_a’为

时，将[t_a,R_a],[t_a’,R_a’]组成一组；Let [t _a ,R _a ] satisfy the above conditions. _That is,

Divide them into a group, denoted as Group _(z,y) , Group _(z,y) ₌ {[ _ta , Ra], [ta _' , Ra _' ], [ta _'' , _Ra" ], ..., [ta _* , Ra _* ]} (where 1 ≤ a', a", ..., a* ≤ W, _Numa is a positive integer, and the size of _Numa depends on the type of _cz : if _cz is a Boolean type, then _Numa = 2; if it is an enumeration type, then _Numa = _Kz ; if it is a numeric type, then _Numa = 6; if it is a string type, then _Numa = 1). For example: t _a is

And t _a' is

, group [t _a ,R _a ],[t _a' ,R _a' ] together;

5.2.4若a＝W，表示分组完成，得到分组后的测试结果集合G＝{Group_(1,1),Group_(1,2),…,Group_(1,Y),…,Group_(z,y)…,Group_(N',Y)}，转5.3；否则令a＝a+1，转5.2.2；5.2.4 If a＝W, it means the grouping is completed, and the test result set after grouping is G＝{Group _(1,1) ,Group _(1,2) ,…,Group _(1,Y) ,…,Group _(z,y) …,Group _(N',Y) }, go to 5.3; otherwise let a＝a+1, go to 5.2.2;

5.3性能缺陷检测模块根据配置项集合C的性能期望标签L以及分组后的测试结果集合G，使用假设检验(假设检验(hypothesis testing)，又称统计假设检验，是用来判断样本与样本、样本与总体的差异是由抽样误差引起还是本质差别造成的统计推断方法。假设检验参数β为小于1的正实数.优选β＝0.05)的方法判别待检测软件是否存在缺陷。假设检验原理为：若任意一配置项c_z的期望标签为Label₁、Label₂或Label₃，调节c_z的性能预期为性能提升，若实际测试结果为性能下降，则软件存在性能缺陷；若c_z的期望标签为Label₄，调节c_z的性能预期为性能合理下降，若实际测试结果为性能大幅下降，则软件存在性能缺陷；若c_z的期望标签为Label₅，调节c_z的性能预期为性能不变，若实际测试结果为性能下降，则软件存在性能缺陷。方法为：遍历R中每一个分组，使用假设检验的方法判别待检测软件是否存在缺陷：5.3 The performance defect detection module uses the hypothesis test (hypothesis testing, also known as statistical hypothesis testing, is a statistical inference method used to determine whether the difference between samples and samples, samples and the population is caused by sampling errors or essential differences. The hypothesis test parameter β is a positive real number less than 1. β = 0.05 is preferred) to determine whether the software to be tested has defects. The hypothesis test principle is: if the expected label of any configuration item c _z is Label ₁ , Label ₂ or Label ₃ , the performance expectation of c _z is adjusted to improve performance. If the actual test result is a performance degradation, the software has a performance defect; if the expected label of c z is Label ₄ , the performance expectation of c _z is adjusted to a reasonable performance degradation. If the actual test result is a significant performance degradation, the software has a performance defect; if the expected label of c _z is Label ₅ , the performance expectation of c _z is adjusted to remain unchanged. If _the actual test result is a performance degradation, the software has a performance defect. The method is: traverse each group in R and use the hypothesis test method to determine whether the software to be tested has defects:

5.3.1初始化变量z＝1；5.3.1 Initialize variable z = 1;

5.3.2初始化变量y＝1；5.3.2 Initialize variable y = 1;

5.3.3若Lab_z＝Label₁，(其中，Lab_z为c_z的期望标签)设定待验假设H₀：R_a≤R_a'(其中，c_z在t_a中的值为0，c_z在t_a'中的值为1)。转5.3.8；5.3.3 If Lab _z = Label ₁ , (where Lab _z is the expected label of c _z ), set the hypothesis to be tested H ₀ : _Ra ≤ _Ra' (where the value of c _z in _ta is 0, and the value of c _z in ta _' is 1). Go to 5.3.8;

5.3.4若Lab_z＝Label₂，设定待验假设H₀：R_a≤R_a'(其中，c_z在t_a中的值大于c_z在t_a'中的值)。转5.3.8；5.3.4 If Lab _z = Label ₂ , set the hypothesis to be tested H ₀ : _Ra ≤ _Ra' (where the value of c _z in _ta is greater than the value of c _z in ta _' ). Go to 5.3.8;

5.3.5若Lab_z＝Label₃，设定待验假设H₀：R_a≤R_a'(其中，c_z在t_a中的值小于c_z在t_a'中的值)。转5.3.8；5.3.5 If Lab _z = Label ₃ , set the hypothesis to be tested H ₀ : _Ra ≤ _Ra ' (where the value of c _z in _ta is less than the value of c _z in ta _' ). Go to 5.3.8;

5.3.6若Lab_z＝Label₄，设定待验假设H₀：5·R_a≤R_a'(其中，c_z在t_a中的值为1，c_z在t_a'中的值为0)。转5.3.8；5.3.6 If Lab _z = Label ₄ , set the hypothesis to be tested H ₀ : 5·R _a ≤R _a' (where c _z is 1 in t _a _and 0 in t _a' ). Go to 5.3.8;

5.3.7若Lab_z＝Label₅，设定待验假设H₀：R_a≠R_a'。转5.3.8；5.3.7 If Lab _z = Label ₅ , set the hypothesis to be tested as H ₀ : _Ra ≠Ra _' . Go to 5.3.8;

5.3.8当假设检验结果表明H₀被拒绝时(即使用假设检验方法计算得到的被拒绝概率≥1-β)，表明软件存在一个与配置项c_z有关的性能缺陷，且触发该缺陷的测试命令为

5.3.8 When the hypothesis test result shows that H ₀ is rejected (i.e. the rejection probability calculated by the hypothesis test method is ≥ 1-β), it indicates that the software has a performance defect related to configuration item c _z , and the test command that triggers the defect is

5.3.9若y＝Y，则转5.3.10；否则令y＝y+1，转5.3.3；5.3.9 If y = Y, go to 5.3.10; otherwise, let y = y + 1 and go to 5.3.3;

5.3.10若z＝N′，结束检测；否则令z＝z+1，转5.3.2。5.3.10 If z=N′, end the test; otherwise, set z=z+1 and go to 5.3.2.

与现有技术相比，采用本发明能达到以下有益效果：Compared with the prior art, the present invention can achieve the following beneficial effects:

1、采用本发明能有效检测出软件性能缺陷。采用本发明在12款大型开源软件MySQL、MariaDB、Apache-httpd、Apache-Tomcat、Apache-Derby、H2、PostgreSQL、GCC、Clang、MongoDB、RocksDB、Squid中的61个历史性能缺陷中，基于52个配置项的预期，运行了23418个测试样例，耗费178个小时，成功检测出54个性能缺陷，仅产生7个假阳性(误报)。而已有工作(Adrian Nistor等人在ICSE 2013发表的“Toddler:Detecting PerformanceProblems via Similar Memory-Access Patterns”通过相似的内存读写模式检测性能缺陷)仅能检测出6个。1. The present invention can effectively detect software performance defects. The present invention was used to detect 61 historical performance defects in 12 large open source software, including MySQL, MariaDB, Apache-httpd, Apache-Tomcat, Apache-Derby, H2, PostgreSQL, GCC, Clang, MongoDB, RocksDB, and Squid. Based on the expectations of 52 configuration items, 23,418 test samples were run, which took 178 hours, and 54 performance defects were successfully detected, with only 7 false positives (false alarms). However, existing work (“Toddler: Detecting Performance Problems via Similar Memory-Access Patterns” published by Adrian Nistor et al. at ICSE 2013, which detects performance defects through similar memory read and write patterns) can only detect 6.

2、采用本发明能为软件社区检测出11个新的性能缺陷，防止了潜在的因软件性能问题可能导致的经济、用户损失。缺陷ID为：Clang-43576，Clang-43084，Clang-44359，Clang-44518，GCC-93521，GCC-93037，GCC91895，GCC91852，GCC-91817，GCC-91875，GCC-93535。2. The present invention can detect 11 new performance defects for the software community, preventing potential economic and user losses caused by software performance problems. The defect IDs are: Clang-43576, Clang-43084, Clang-44359, Clang-44518, GCC-93521, GCC-93037, GCC91895, GCC91852, GCC-91817, GCC-91875, GCC-93535.

3、本发明第二步给出了一个详尽的配置项性能期望分类并给出了自动预测配置项性能期望的方法，并给出了包含大量配置项及其期望的数据集；本发明基于配置项的预期，可有效判别无缺陷软件和有缺陷软件的性能差异，具有良好的应用前景。3. The second step of the present invention provides a detailed classification of configuration item performance expectations and a method for automatically predicting configuration item performance expectations, and provides a data set containing a large number of configuration items and their expectations; based on the expectations of configuration items, the present invention can effectively distinguish the performance differences between defect-free software and defective software, and has good application prospects.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明的总体流程图；Fig. 1 is an overall flow chart of the present invention;

图2是本发明第一步构建的性能期望检测系统逻辑结构图；FIG2 is a logical structure diagram of the performance expectation detection system constructed in the first step of the present invention;

图3是本发明第二步使用的配置项性能期望表。FIG. 3 is a configuration item performance expectation table used in the second step of the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合附图对本发明进行说明。The present invention will be described below in conjunction with the accompanying drawings.

如图1所示，本发明包括以下步骤：As shown in Figure 1, the present invention comprises the following steps:

第一步，构建性能缺陷检测系统，性能缺陷检测系统如图2所示，由配置项期望预测模块、测试样例生成模块、性能缺陷检测模块构成。The first step is to build a performance defect detection system. The performance defect detection system is shown in Figure 2 and consists of a configuration item expectation prediction module, a test sample generation module, and a performance defect detection module.

配置项期望预测模块是一个加权投票分类器，与测试样例生成模块、性能缺陷检测模块相连，从待检测软件的配置项用户手册读取配置项的描述、取值范围，对待预测配置项的性能期望进行预测，得到配置项的性能期望标签，将配置项的性能期望标签发送给测试样例生成模块和性能缺陷检测模块。The configuration item expectation prediction module is a weighted voting classifier, which is connected to the test sample generation module and the performance defect detection module. It reads the description and value range of the configuration item from the configuration item user manual of the software to be tested, predicts the performance expectation of the configuration item to be predicted, obtains the performance expectation label of the configuration item, and sends the performance expectation label of the configuration item to the test sample generation module and the performance defect detection module.

2.2根据N个配置项的官方文档描述，对配置项人工标注其性能期望标签，方法如图3所示：根据配置项(记为c)的文档描述(记为d)，如果调节该配置项的目的是为了开启优化开关，则该配置项的性能期望标签为Label₁；如果调节该配置项的目的是为了提升性能牺牲可靠性等非功能需求，则该配置项的性能期望标签为Label₂；如果调节该配置项的目是为了分配更多计算机资源，则该配置项的性能期望标签为Label₃；如果调节该配置项的目的是为了开启软件额外功能，则该配置项的性能期望标签为Label₄；如果调节该配置项与软件性能无关，则该配置项的性能期望标签为Label₅；最终得到训练集，记为

是训练集中性能期望标签为Label_l的第i_l个配置项。

是

的文档描述，由单词组成。其中，1≤l≤5，1≤i_l≤N_l。令

中的单词总数为

记为(单词₁，单词₂，…，单词

…，单词

)。2.2 According to the official document descriptions of N configuration items, the performance expectation labels of the configuration items are manually labeled. The method is shown in Figure 3: According to the document description (denoted as d) of the configuration item (denoted as c), if the purpose of adjusting the configuration item is to turn on the optimization switch, the performance expectation label of the configuration item is Label ₁ ; if the purpose of adjusting the configuration item is to improve performance at the expense of non-functional requirements such as reliability, the performance expectation label of the configuration item is Label ₂ ; if the purpose of adjusting the configuration item is to allocate more computer resources, the performance expectation label of the configuration item is Label ₃ ; if the purpose of adjusting the configuration item is to turn on additional software functions, the performance expectation label of the configuration item is Label ₄ ; if adjusting the configuration item is not related to software performance, the performance expectation label of the configuration item is Label ₅ ; finally, the training set is obtained, denoted as

yes

The total number of words in

Recorded as (word ₁ , word ₂ , ..., word

…,word

).

2.3.1初始化变量l＝1；2.3.1 Initialize variable l = 1;

2.3.2初始化变量i_l＝1；2.3.2 Initialize variable i _l = 1;

2.3.3对

进行预处理，方法是：2.3.3 Pair

Perform preprocessing by:

2.3.3.1令变量

2.3.3.1 Let variables

2.3.3.2将单词

转化为

其中

为单词的词性标签，

为计算机领域同义词；2.3.3.2 Word

Convert to

in

is the part-of-speech tag of the word,

Synonymous with the computer field;

2.3.3.3若

令

转2.3.3.2；若

则得到预处理后的

为如下形式：

简记为

转2.3.4；2.3.3.3 If

make

Go to 2.3.3.2; if

Then the preprocessed

In the following form:

Abbreviated as

Go to 2.3.4;

2.4配置项期望预测模块挖掘频繁子序列。使用的PrefixSpan算法分别对集合

进行频繁子序列挖掘，得到5个频繁子序列集合：

挖掘出的频繁子序列的个数；1≤q≤Q_l；2.4 Configuration item expectation prediction module mines frequent subsequences. The PrefixSpan algorithm used is respectively

Perform frequent subsequence mining and obtain 5 frequent subsequence sets:

The number of frequent subsequences mined; 1≤q≤Q _l ;

2.5.1初始化变量l＝1；2.5.1 Initialize variable l = 1;

2.5.2初始化变量q＝1；2.5.2 Initialize variable q = 1;

2.5.3计算频繁子序列p_(l,q)的置信度Confidence_(l,q)：2.5.3 Calculate the confidence ( _l,q) of the frequent subsequence p _(l,q) :

中的匹配次数之和)。其中，若p_(l,q)是

的一个子序列，则判定p_(l,q)与

一次匹配。

The sum of the number of matches in ). If p _(l,q) is

is a subsequence of , then determine whether p _(l,q) is

One match.

2.6.1初始化变量l＝1；2.6.1 Initialize variable l = 1;

2.6.2初始化变量q＝1；2.6.2 Initialize variable q = 1;

2.6.3若

其中5为期望标签种类数，则将p_lq放入集合P_l'中；2.6.3 If

2.7.1初始化：从P₁'，P₂'，P₃'，P₄'，P₅'中分别随机选出(选出后放回)100个频繁子序列，构成随机选出频繁子序列集合P₁”，P₂”，P₃”，P₄”，P₅”。P₁”，P₂”，P₃”，P₄”，P₅”中共包含500个频繁子序列，即：{p_(1,1),p_(1,2),…,p_(1,r),…,p_(1,100)},…,{p_(l,1),p_(l,2),…,p_(l,r),…,p_(l,100)},…,2.7.1 Initialization: Randomly select (and replace after selection) 100 frequent subsequences from P ₁ ′, P ₂ ′, P ₃ ′, P ₄ ′, and P ₅ ′ respectively to form the randomly selected frequent subsequence set P ₁ ′, P ₂ ′, P ₃ ′, P ₄ ′, and P ₅ ′. P ₁ ′, P ₂ ′, P ₃ ′, P ₄ ′, and P ₅ ′ contain a total of 500 frequent subsequences, namely: {p _(1,1) , p _(1,2) ,…, p _(1,r) ,…, p _(1,100) },…, {p _(l,1) , p _(l,2) ,…, p _(l,r) ,…, p _(l,100) },…,

2.7.2分别计算P₁”，P₂”，P₃”，P₄”，P₅”在训练数据集上的准确率(Precision)、召回率(Recall)、F-score(准确率和召回率的调和平均数)：2.7.2 Calculate the accuracy (Precision), recall (Recall), and F-score (the harmonic mean of accuracy and recall) of P ₁ ”, P ₂ ”, P ₃ ”, P ₄ ”, and P ₅ ” on the training data set respectively:

的置信度之和，1≤r_x≤100，且

为x的子序列。分类器输出得票五元组，记为Votes(x)＝

The sum of the confidences of , 1≤r _x ≤100, and

is a subsequence of x. The classifier outputs a five-tuple of votes, recorded as Votes(x) =

其中，

为x的子序列

的含义为：“在P_l”中，满足“为x的子序列”的

is a subsequence of x

The meaning is: "In P _l ", the sequence that satisfies "is a subsequence of x"

4.1测试样例生成模块使用Spex算法，对C中的软件配置项的语法类型和取值范围进行提取。Spex最终提取出的语法类型分为四类：数值类型(int)、布尔类型(bool)、枚举类型(enum)、字符串类型(string)；4.1 The test sample generation module uses the Spex algorithm to extract the syntax type and value range of the software configuration items in C. The syntax types finally extracted by Spex are divided into four categories: numeric type (int), Boolean type (bool), enumeration type (enum), and string type (string);

4.2.1初始化变量z＝1；4.2.1 Initialize variable z = 1;

4.2.2若c_z对应的期望标签为空，则令

转4.2.7；4.2.2 If the expected label corresponding to c _z is empty, then let

Go to 4.2.7;

4.2.4若c_z为枚举类型(enum)，则令

其中

in

All possible values of c _z extracted by the Spex algorithm, go to 4.2.7;

4.2.5若c_z为字符串类型(string)，则令

转4.2.7；4.2.5 If c _z is a string type (string), then let

Go to 4.2.7;

4.4软件的性能测试集一般以性能测试工具的形式提供。因此测试样例生成模块基于性能测试工具(如sysbench、apache-benchmark)生成测试命令。方法为：采用经典的pair-wise方法对性能测试工具的参数进行抽样，然后将参数(如并发度、负载类型、数据表大小、数据表数量、读操作比例、写操作比例)输入性能测试工具，输出测试命令，得到测试命令集合B＝{b₁,b₂,b₃,…,b_y,…,b_Y}，1≤y≤Y，Y为B中测试命令的个数；4.4 The performance test set of the software is generally provided in the form of a performance testing tool. Therefore, the test sample generation module generates test commands based on the performance testing tools (such as sysbench, apache-benchmark). The method is: use the classic pair-wise method to sample the parameters of the performance testing tool, and then input the parameters (such as concurrency, load type, data table size, number of data tables, read operation ratio, write operation ratio) into the performance testing tool, output the test command, and obtain the test command set B = {b ₁ , b ₂ , b ₃ ,…, by _, …, b _Y }, 1≤y≤Y, Y is the number of test commands in B;

(其中，

的含义是：c_z的取值为

)，W为T中测试样例的个数，

为c₁的第u(1≤u≤K₁)个可能取值,

为c_z的第h(1≤h≤K_z)个可能取值,

为c_N′(1≤j≤K_N′)的第j个可能取值，K₁、K_z、K_N′分别为Spex算法提取到的配置项c₁、c_z、c_N′的可能取值的个数，且均为正整数；将测试样例集合T发送给性能缺陷检测模块；4.5 Test sample generation module generates a test sample set T, T = B × V ^Cartesian = {t ₁ ,t ₂ ,t ₃ ,…,t _a ,…,t _W }, 1≤a≤W, t _a is a two-tuple,

(in,

The meaning is: the value of c _z is

), W is the number of test samples in T,

is the uth (1≤u≤K ₁ ) possible value of c ₁ ,

is the hth ( _1≤h≤Kz ) possible value of _cz ,

is the jth possible value of c _N′ (1≤j≤K _N′ ), K ₁ , K _z , K _N′ are respectively the number of possible values of configuration items c ₁ , c _z , c _N′ extracted by the Spex algorithm, and are all positive integers; the test sample set T is sent to the performance defect detection module;

5.1.1初始化变量a＝1；5.1.1 Initialize variable a=1;

5.1.2为防止因测试环境不稳定导致的性能波动，性能缺陷检测模块重复执行每个测试样例A次，A为正整数，A优选为10；因此，令变量repeat＝1；5.1.2 To prevent performance fluctuations caused by unstable test environment, the performance defect detection module repeats each test sample A times, where A is a positive integer and A is preferably 10; therefore, let the variable repeat = 1;

设定检测性能值的默认性能指标为软件数据吞吐量；5.1.3 The performance defect detection module inputs the test sample t _a into the software to be tested, runs the software to be tested, and records the performance value obtained by running the software after the input t _a for the first time.

Go to 4.1.5; otherwise, set repeat = repeat + 1 and go to 4.1.3;

5.2.1初始化变量a＝1；5.2.1 Initialize variable a=1;

条件2，测试命令均为

Condition 2, the test commands are

令与[t_a,R_a]满足以上条件的共有Num_a个，即

并且t_a'为

And t _a' is

, group [t _a ,R _a ],[t _a' ,R _a' ] together;

5.3性能缺陷检测模块根据配置项集合C的性能期望标签L以及分组后的测试结果集合G，使用假设检验(假设检验参数β为小于1的正实数.优选β＝0.05)的方法判别待检测软件是否存在缺陷。假设检验原理为：若任意一配置项c_z的期望标签为Label₁、Label₂或Label₃，调节c_z的性能预期为性能提升，若实际测试结果为性能下降，则软件存在性能缺陷；若c_z的期望标签为Label₄，调节c_z的性能预期为性能合理下降，若实际测试结果为性能大幅下降，则软件存在性能缺陷；若c_z的期望标签为Label₅，调节c_z的性能预期为性能不变，若实际测试结果为性能下降，则软件存在性能缺陷。方法为：遍历R中每一个分组，使用假设检验的方法判别待检测软件是否存在缺陷：5.3 The performance defect detection module uses the hypothesis test method (the hypothesis test parameter β is a positive real number less than 1. β = 0.05 is preferred) to determine whether the software to be tested has defects based on the performance expectation label L of the configuration item set C and the grouped test result set G. The principle of hypothesis testing is: if the expected label of any configuration item c _z is Label ₁ , Label ₂ or Label ₃ , the performance expectation of c _z is adjusted to improve the performance. If the actual test result is a performance degradation, the software has a performance defect; if the expected label of c _z is Label ₄ , the performance expectation of c _z is adjusted to a reasonable performance degradation. If the actual test result is a significant performance degradation, the software has a performance defect; if the expected label of c _z is Label ₅ , the performance expectation of c _z is adjusted to remain unchanged. If the actual test result is a performance degradation, the software has a performance defect. The method is: traverse each group in R and use the hypothesis test method to determine whether the software to be tested has defects:

5.3.1初始化变量z＝1；5.3.1 Initialize variable z = 1;

5.3.2初始化变量y＝1；5.3.2 Initialize variable y = 1;

5.3.5若Lab_z＝Label₃，设定待验假设H₀：R_a≤R_a'(其中，c_z在t_a中的值小于c_z在t_a'中的值)。转5.3.8；5.3.5 If Lab _z = Label ₃ , set the hypothesis to be tested H ₀ : _Ra ≤ _Ra' (where the value of c _z in _ta is less than the value of c _z in ta _' ). Go to 5.3.8;

Claims

1. A method for detecting software performance defects based on configuration item performance expectations, characterized by comprising the following steps:

The first step is to build a performance defect detection system, which consists of a configuration item expectation prediction module, a test sample generation module, and a performance defect detection module.

The configuration item expectation prediction module is a weighted voting classifier, which is connected to the test sample generation module and the performance defect detection module. It reads the description and value range of the configuration item from the configuration item user manual of the software to be tested, predicts the performance expectation of the configuration item to be predicted, obtains the performance expectation label of the configuration item, and sends the performance expectation label of the configuration item to the test sample generation module and the performance defect detection module.

The test sample generation module is connected to the configuration item expectation prediction module and the performance defect detection module, receives the performance expectation label of the configuration item from the configuration item expectation prediction module, reads the test command from the test set of the software to be tested, and generates a test sample set T according to the performance expectation label of the configuration item and the test set of the software to be tested;

The performance defect detection module is connected to the configuration item expectation prediction module and the test sample generation module, receives the test sample set T from the test sample generation module, receives the performance expectation label of the configuration item from the configuration item expectation prediction module, executes the test samples in the test sample set T and detects whether the expected performance corresponding to the performance expectation label of the configuration item is consistent with the actual performance, and outputs the performance defect of the software to be detected if they are not consistent;

Step 2: Read the manually annotated expected configuration items and the official document descriptions of the configuration items, and train the configuration item expectation prediction module of the performance defect detection system. The method is:

2.1 Construct a training set with a total of N configuration items in the training set, N ≥ 500;

2.2 According to the official document descriptions of N configuration items, manually label the configuration items with their expected performance labels. The method is as follows: according to the document description d of configuration item c, if the purpose of adjusting the configuration item is to turn on the optimization switch, the expected performance label of the configuration item is Label ₁ ; if the purpose of adjusting the configuration item is to improve performance at the expense of non-functional requirements such as reliability, the expected performance label of the configuration item is Label ₂ ; if the purpose of adjusting the configuration item is to allocate more computer resources, the expected performance label of the configuration item is Label ₃ ; if the purpose of adjusting the configuration item is to turn on additional software functions, the expected performance label of the configuration item is Label ₄ ; if adjusting the configuration item is not related to software performance, the expected performance label of the configuration item is Label ₅ ; finally, the training set is obtained, which is recorded as

0≤i ₁ ≤N ₁ ;

0≤i ₂ ≤N ₂ ;

0≤i ₃ ≤N ₃ ;

0≤i ₄ ≤N ₄ ;

0≤i ₅ ≤N ₅ , N ₁ +N ₂ +N ₃ +N ₄ +N ₅ =N; N ₁ , N ₂ , N ₃ , N ₄ , and N ₅ are the numbers of configuration item document descriptions whose performance expectation labels are Label ₁ , Label ₂ , Label ₃ , Label ₄ , and Label ₅ respectively;

is the i _lth configuration item with expected performance label Label _l in the training set,

yes

The document description consists of words, 1≤l≤5, 1≤i _l ≤N _l ; let

The total number of words in

Recorded as: word ₁ , word ₂ , ..., word

…,word

2.3 The configuration item expectation prediction module preprocesses the training set by:

2.3.1 Initialize variable l = 1;

2.3.2 Initialize variable i _l = 1;

2.3.3 Pair

Perform preprocessing by:

2.3.3.1 Let variables

2.3.3.2 Word

Convert to

is the part-of-speech tag of the word,

Synonymous with the computer field;

2.3.3.3 If

make

Go to 2.3.3.2; if

Then the preprocessed

It is in the following form: <POS ₁ , DS ₁ >, <POS ₂ , DS ₂ >, ...,

Abbreviated as

Go to 2.3.4;

2.3.4 Determine whether i _l is equal to N _l . If so, go to 2.3.5. Otherwise, set i _l =i _l +1 and go to 2.3.3.

2.3.5 Determine whether l is equal to 5. If so, go to 2.4. Otherwise, set l = l + 1 and go to 2.3.2.

2.4 Configuration item expectation prediction module mines frequent subsequences and uses the PrefixSpan algorithm to respectively

Perform frequent subsequence mining and obtain 5 frequent subsequence sets:

Where Q ₁ , Q ₂ , ..., Q _l , ..., Q ₅ are positive integers, indicating that when l = 1, 2, ..., 5, the PrefixSpan algorithm selects

The number of frequent subsequences mined; 1≤q≤Q _l :

2.5 Calculate the confidence of all frequent subsequences in P ₁ , P ₂ , ..., P ₅ by:

2.5.1 Initialize variable l = 1;

2.5.2 Initialize variable q = 1;

2.5.3 Calculate the confidence ( _{l, q)} of the frequent subsequence p _{(l, q)} :

Confidence _{(l, q)} = (p _{(l, q)} in the set

The number of matches in the five sets)/(p _(l,q)

The sum of the number of matches in ), where if p _{(l, q)} is

is a subsequence of , then determine whether p _{(l, q)} is equal to

One match;

2.5.4 Determine whether q is equal to Q _l . If so, go to 2.5.5. If not, set q = q + 1 and go to 2.5.3.

2.5.5 Determine whether l is equal to 5. If so, it means that the confidence of all frequent subsequences in P ₁ , P ₂ , ..., P ₅ is obtained, and go to 2.6; if not, set l = l + 1, and go to 2.5.2;

2.6 According to the confidence Confidence of the frequent subsequences in P ₁ , P ₂ , ..., P _5, the frequent subsequences in P ₁ , P ₂ , ..., P ₅ are screened to obtain the screened frequent subsequence set P ₁ ′, P ₂ ′, P ₃ ′, P ₄ ′, P ₅ ′;

2.7 Use P ₁ ′, P ₂ ′, P ₃ ′, P ₄ ′, P ₅ ′ to train the configuration item expectation prediction module, the method is:

2.7.1 Initialization: Randomly select 100 frequent subsequences from P ₁ ′, P ₂ ′, P ₃ ′, P ₄ ′, P ₅ ′ respectively to form the randomly selected frequent subsequence set P ₁ ″, P ₂ ″, P ₃ ″, P ₄ ″, P ₅ ″; P ₁ ″, P ₂ ″, P _{3 ′, P 4} ″, _P ₅ ″ contain 500 frequent subsequences in total, that is:

{p _(1,1) , p _(1,2) ,..., p _(1,r) ,..., p _(1,100) },..., {p _(l,1) , p _{(l, 2)} , ..., p _{(l, r)} , ..., p _{(l, 100} )}, ..., {p _{(5, 1)} , p _{(5, 2)} ,. .., p _{(5, r)} , ..., p _{(5, 100)} }, 1≤r≤100;

2.7.2 Calculate the accuracy Precision, recall Recall, and harmonic mean F-score of accuracy and recall of P ₁ ”, P ₂ ”, P ₃ ”, P ₄ ”, and P ₅ ” on the training data set respectively:

2.7.3 Determine whether the estimated cumulative distribution function value of the maximum F-score is greater than the threshold δ. If so, go to 2.8; if less than or equal to the threshold δ, go to 2.7.1;

2.8 Configuration Item Expectation Prediction Module Select P ₁ ″, P ₂ ″, P ₃ ″, P ₄ ″, P ₅ ″ corresponding to the maximum F-score to construct a weighted voting classifier and go to the third step;

The third step is to use the trained configuration item expectation prediction module to generate a performance expectation label set L for the software to be tested, and send L to the test sample generation module and the performance defect detection module. The method is:

The trained configuration item expectation prediction module reads the configuration item description from the configuration item user manual of the software to be tested, and the weighted voting classifier predicts the performance expectations of all configuration items to be tested C = {c ₁ , c ₂ , ..., c _z , ..., c _N′ } to obtain a performance expectation label set L = [Lab ₁ , Lab ₂ , ..., Lab _z , ..., Lab _N′ ], where 1≤z≤N′, Lab _z ∈{Label ₁ , Label ₂ , Label ₃ , Label ₄ , Label ₅ , null}, and N′ is the number of configuration items in the configuration item user manual; L is sent to the test sample generation module and the performance defect detection module;

In the fourth step, the test sample generation module generates a test sample set T for the software to be tested and sends T to the performance defect detection module, the method is:

4.1 The test sample generation module extracts the syntax type and value range of the software configuration items in C. The extracted syntax types are divided into four categories: numeric type, Boolean type, enumeration type, and string type;

4.2 The test sample generation module generates a set of test values V for the configuration item set C = {c ₁ , c ₂ , ..., c _z , ..., c _N′ }, V = {V ₁ , V ₂ , ..., V _z , ..., V _N′ }, where

is a value of the configuration item c _z , K _z is the number of values generated by the test sample generation module for c _z ;

4.3 Taking the Cartesian product of V ₁ , V ₂ , ..., V _z , ..., V _N′ , we obtain the Cartesian product V ^Cartesian = V ₁ × V ₂ × ... ΔV _N′ ;

4.4 The test sample generation module generates test commands based on the performance test tool by adopting a pair-wise method to sample the parameters of the performance test tool, then inputting the parameters into the performance test tool, outputting the test commands, and obtaining a test command set B = {b ₁ , b ₂ , b ₃ , ..., by _, ..., b _Y }, 1≤y≤Y, where Y is the number of test commands in B;

4.5 Test sample generation module generates a test sample set T, T = B × V ^Cartesian = {t ₁ , t ₂ , t ₃ , ..., _ta , ..., t _W }, 1≤a≤W, _ta is a two-tuple,

in,

The meaning is: the value of c _z is

W is the number of test samples in T,

is the uth possible value of c ₁ ,

is the hth possible value of c _z ,

is the jth possible value of c _N′ , 1≤u≤K ₁ , 1≤h≤K _z , 1≤j≤K _N′ , K ₁ , K _z , K _N′ are respectively the number of possible values of configuration items c ₁ , c _z , c _N′ extracted by the Spex algorithm, and are all positive integers; send the test sample set T to the performance defect detection module;

Step 5: The performance defect detection module detects the performance defects of the executable file of the software to be tested based on T and L:

5.1 The performance defect detection module executes the test samples in T and obtains the performance values of the test samples. The method is:

5.1.1 Initialize variable a=1;

5.1.2 The performance defect detection module repeats each test sample A times, and sets the variable repeat = 1, where A is a positive integer;

5.1.3 The performance defect detection module inputs the test sample t _a into the software to be tested, runs the software to be tested, and records the performance value obtained by running the software after the input t _a for the first time.

5.1.4 Determine whether repeat is equal to A. If so, a set of performance indicators for the test sample _ta is obtained, which is recorded as:

Go to 4.1.5; otherwise, set repeat = repeat + 1 and go to 4.1.3;

5.1.5 Determine whether a is equal to W. If so, record the output as Out = {[t ₁ , R ₁ ], ..., [t _a , _Ra ], ..., [t _W , R _W ]}, where the first element of the tuple [t _a , _Ra ] is the test sample, and the second element is the performance value set obtained by executing the test sample A times. Go to 5.2. Otherwise, let a = a + 1 and go to 5.1.2.

5.2 The performance defect detection module groups Out according to the test samples to obtain a grouped test result set G = {Group _{(1, 1)} , Group _{(1, 2)} , ..., Group _{(1, Y)} , ..., Group _{(z, y)} ..., Group _{(N′, Y)} };

5.3 The performance defect detection module uses the hypothesis testing method to determine whether the software to be tested has defects based on the performance expectation label L of the configuration item set C and the grouped test result set G:

5.3.1 Initialize variable z = 1;

5.3.2 Initialize variable y = 1;

5.3.3 If Lab _z = Label ₁ , set the hypothesis to be tested H ₀ : _Ra ≤ _Ra′ , where the value of c _z in _ta is 0, the value of c _z in ta _′ is 1, 1≤a′≤W, go to 5.3.8;

5.3.4 If Lab _z = Label ₂ , set the hypothesis to be tested H ₀ : _Ra ≤ _Ra′ , where the value of c _z in t _a is greater than the value of c _z in t _a′ , go to 5.3.8;

5.3.5 If Lab _z = Label ₃ , set the hypothesis to be tested H ₀ : _Ra ≤ _Ra′ , where the value of c _z in t _a is less than the value of c _z in t _a′ , go to 5.3.8;

5.3.6 If Lab _z = Label ₄ , set the hypothesis to be tested H0: 5·R _a ≤R _a′ , where the value of c _z in t _a is 1, and the value of c _z in t _a′ is 0, go to 5.3.8;

5.3.7 If Lab _z = Label ₅ , set the hypothesis to be tested H ₀ : _Ra ≠Ra _′ and go to 5.3.8;

5.3.8 When the hypothesis test result shows that H ₀ is rejected, that is, the rejection probability is ≥ 1-β, it indicates that the software has a performance defect related to configuration item c _z , and the test command that triggers the defect is

β is the hypothesis test parameter, which is a positive real number less than 1;

5.3.9 If y = Y, go to 5.3.10; otherwise, let y = y + 1 and go to 5.3.3;

5.3.10 If z=N′, end the test; otherwise, set z=z+1 and go to 5.3.2.

2. A software performance defect detection method based on configuration item performance expectations as described in claim 1, characterized in that the method of constructing a training set in step 2.1 is: randomly selecting N configuration items from more than 10,000 configuration items of 12 software including MySQL, MariaDB, Apache-httpd, Apache-Tomcat, Apache-Derby, H2, PostgreSQL, GCC, Clang, MongoDB, RocksDB, and Squid.

3. A method for detecting software performance defects based on configuration item performance expectations as claimed in claim 1, characterized in that in step 2.6, frequent subsequences in P ₁ , P ₂ , ..., P ₅ are screened according to the confidence of the frequent subsequences in P ₁ , P ₂ , ..., P _5, by:

2.6.1 Initialize variable l = 1;

2.6.2 Initialize variable q = 1;

2.6.3 If

Where 5 is the expected number of label types, then p _lq is put into the set P _l ′;

2.6.4 Determine whether q is equal to Q _l . If so, go to 2.6.5. If not, set q = q + 1 and go to 2.6.3.

2.6.5 Determine whether l is equal to 5. If so, it means that the filtered frequent subsequence set P ₁ ′, P ₂ ′, P ₃ ′, P ₄ ′, P ₅ ′ is obtained; if not, set l = l + 1 and go to 2.6.2.

4. A method for detecting software performance defects based on configuration item performance expectations as described in claim 1, characterized in that the threshold δ in step 2.7.3 is 99%-99.9%.

5. A method for detecting software performance defects based on configuration item performance expectations as described in claim 1, characterized in that the method for constructing a weighted voting classifier by the configuration item expectation prediction module in step 2.8 is as follows: the input of the weighted voting classifier is set to any preprocessed configuration item description "POS _x , DS _x " of the expected label to be predicted, "POS _x , DS _x " is abbreviated as x, and the output is the votes of the five expected labels, and the performance expected label of x is the expected label with the highest votes; wherein the votes of category l are the frequent subsequences

The sum of the confidences of , 1≤r _x ≤100, and

is a subsequence of x; the weighted voting classifier outputs a five-tuple of votes, recorded as

in,

The meaning is: "In P _l ", the sequence that satisfies "is a subsequence of x"

The sum of the confidences of"; if any element in Votes(x) is not 0, then find the element corresponding to the maximum value in Votes(x), and the sequence number l corresponding to this element is the sequence number corresponding to the performance expectation label of x, which is Label _l ; if Votes(x) = [0, 0, 0, 0, 0], then the performance expectation label of x is empty.

6. A software performance defect detection method based on configuration item performance expectations as described in claim 1, characterized in that the test sample generation module in step 4.1 uses the Spex algorithm to extract the syntax type and value range of the software configuration item in C.

7. A method for detecting software performance defects based on configuration item performance expectations as claimed in claim 1, characterized in that the method by which the test sample generation module in step 4.2 generates a set of test values V for the configuration item set C = {c ₁ , c ₂ , ..., c _z , ..., c _N′ } is:

4.2.1 Initialize variable z = 1;

4.2.2 If the expected label corresponding to c _z is empty, then let

Go to 4.2.7;

4.2.3 If c _z is a Boolean type, let V _z = {0, 1} and go to 4.2.7;

4.2.4 If c _z is an enumeration type, then let

in

All possible values of c _z extracted by the Spex algorithm, go to 4.2.7;

4.2.5 If c _z is a string type, then let

Go to 4.2.7;

4.2.6 If c _z is a numeric type, sample the value of c _z by recording the minimum and maximum values of c _z as Min and Max, setting V _z = {Min, 10·Min, 10 ² ·Min, Max, 10 ^-1 ·Max, 10 ^-2 ·Max}, and go to 4.2.7;

4.2.7 If z = N′, end; otherwise, let z = z + 1 and go to 4.2.2.

8. A software performance defect detection method based on configuration item performance expectations as described in claim 1, characterized in that the performance testing tool in step 4.4 refers to sysbench and apache-benchmark, and the parameters of the performance testing tool include concurrency, load type, data table size, number of data tables, read operation ratio, and write operation ratio.

9. A method for detecting software performance defects based on configuration item performance expectations as described in claim 1, characterized in that A in the fifth step is 10.

10. A method for detecting software performance defects based on configuration item performance expectations as claimed in claim 1, characterized in that the method in step 5.2 of grouping Out according to test samples by the performance defect detection module is:

5.2.1 Initialize variable a=1;

5.2.2 If [t _a , _Ra ] has been grouped, set a=a+1 and go to 5.2.2; otherwise go to 5.2.3;

5.2.3 Group [ _ta , _Ra ] according to the configuration item values and test commands in _ta , that is, if [ _ta , _Ra ] and {[ _t1 , _R1 ], ..., [ _ta , _Ra ], ..., [ _tW , _RW ]} and _ta' satisfy the following three conditions at the same time, then group [ _ta , _Ra ] and [ta _' , Ra _' _] :

Condition 1: t _a and t _a′ differ in the value of only one configuration item c _z , 1≤z≤N′;

Condition 2, the test commands are

Condition 3, [t _a , _Ra ], [t _a' , Ra _' ] are not grouped;

Let [t _a , _Ra ] satisfy the above conditions. _That is,

Divide into a group, denoted as Group _{(z, y)} ,

Wherein, 1≤a′, a″, ..., a*≤W, and Num _a is a positive integer;

5.2.4 If α=W, it means that the grouping is completed, and the test result set after grouping is G={Group _(1,1) , Group _(1,2) , ..., Group _(1,Y) , ..., Group _(z,y) ..., Group _(N′,Y) }, go to 5.3; otherwise, let α=a+1, go to 5.2.2.

11. A method for detecting software performance defects based on configuration item performance expectations as described in claim 1, characterized in that the size of _Numa is related to the type of _cz : if _cz is a Boolean type, _Numa = 2; if it is an enumeration type, _Numa = _Kz ; if it is a numerical type, _Numa = 6; if it is a string type, _Numa = 1.

12. A method for detecting software performance defects based on configuration item performance expectations as described in claim 1, characterized in that the test principle of the hypothesis test used by the performance defect detection module in step 5.3 is: if the expected label of any configuration item c _z is Label ₁ , Label ₂ or Label ₃ , the performance expectation of c _z is adjusted to improve the performance. If the actual test result is a performance degradation, then the software has a performance defect; if the expected label of c _z is Label ₄ , the performance expectation of c _z is adjusted to a reasonable performance degradation. If the actual test result is a significant performance degradation, then the software has a performance defect; if the expected label of c _z is Label ₅ , the performance expectation of c _z is adjusted to keep the performance unchanged. If the actual test result is a performance degradation, then the software has a performance defect.

13. A method for detecting software performance defects based on configuration item performance expectations as described in claim 1, characterized in that the hypothesis test parameter β in step 5.3 is 0.05.