CN113361780A

CN113361780A - Behavior data-based crowdsourcing tester evaluation method

Info

Publication number: CN113361780A
Application number: CN202110641778.0A
Authority: CN
Inventors: 王崇骏; 蒋先杰; 姚懿容; 李珂帆; 张雷
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2021-09-07

Abstract

The invention discloses an evaluation method for crowdsourcing testers. S10, data collection: crowdsourced testers register their information on the crowdsourcing platform, take examinations, accept tasks, participate in testing, and submit reports. S20, feature extraction: the collected data come from three sources (personnel information, human-machine collaboration, and historical test data), and 22 indicators covering five types of features (identity background, social connections, capability certification, performance record, and historical behavior) are extracted from them. S30, training the logistic regression classifier: the indicator data in the training data set and the expert evaluations of the crowdsourced testers are input into a logistic regression model, which is trained by using gradient ascent to compute the maximum likelihood estimate and determine the parameters; in S33, a prediction function is constructed, the probability value of every record is computed, and the classifier's result is obtained by comparing the probability value with a threshold. The evaluation of each tester is then obtained through the trained classifier.

Description

Behavior data-based crowdsourcing tester evaluation method

Technical Field

The invention belongs to the technical field of program analysis and verification in software engineering, and more particularly relates to a method for evaluating testers of mobile applications in crowdsourced testing.

Background

With the rapid development of mobile devices, mainstream mobile applications have become increasingly powerful and complex. Users expect mobile applications to be reliable and secure, yet this growing complexity also makes bugs more likely, so software testing is increasingly important for assuring quality during use. However, the particularities of mobile devices, such as unreliable networks, widely varying screen sizes, and diverse operating systems, make testing mobile applications challenging.

Today, many companies and organizations test mobile applications through crowdsourcing, recruiting large numbers of online testers distributed around the world. Compared with traditional testing, crowdsourced testing involves different platforms, languages, and users. Developers can obtain real feedback on functional requirements and user experience; many testers can be recruited to test at the same time, achieving a high degree of parallelism and markedly improving testing efficiency; and crowdsourced testing provides diverse test environments, including mobile devices, network environments, and operating systems, which ensures high software and hardware coverage while greatly reducing testing cost.

However, the skill levels of crowdsourced testers vary widely. Some testers write accurate test steps and describe error phenomena precisely, whereas others provide only rough steps and vague descriptions from which developers cannot even reproduce the reported problem. To assess the quality of a crowdsourced tester's results, an existing method extracts morphological, lexical, analytical, and relational indicators from the reports the tester submits and then classifies these indicators to determine report quality. However, that method considers only the submitted reports and ignores other factors about the tester, such as the compliance (on-time completion) rate and the difficulty of the tests performed.

Disclosure of Invention

Purpose of the invention: to evaluate crowdsourced testers more comprehensively, the invention provides an evaluation method for crowdsourcing testers in which personnel information and historical test data are processed accordingly, five types of features are extracted, and a logistic regression classifier is trained to obtain the evaluation results of the crowdsourced testers.

To achieve this purpose, the invention adopts the following technical solution: an evaluation method for crowdsourcing testers, comprising the following steps:

S10, collecting data: crowdsourced testers register their information on the crowdsourcing platform, take examinations, accept tasks, participate in testing, and submit reports;

S20, extracting features: the sources of the collected data are divided into three types, namely personnel information, human-machine collaboration, and historical test data; 22 indicators covering five types of features, namely identity background, social connections, capability certification, performance record, and historical behavior, are extracted from the collected data;

S30, training the logistic regression classifier, comprising the following steps:

S31, establishing a training data set, in which each record corresponds to one crowdsourced tester and comprises an independent variable field and a decision target field; the independent variable field contains the tester's five types of features (identity background, social connections, capability certification, performance record, and historical behavior), and the decision target field contains the expert's evaluation of the tester;

S32, inputting the indicator data in the training data set and the expert evaluations of the testers into a logistic regression model and training the model: the maximum likelihood estimate of the parameters is computed by gradient ascent on the log-likelihood (equivalently, by minimizing the loss function), which determines the parameters;

S33, constructing a prediction function, computing the probability value of every record, and obtaining the classifier's result by comparing each probability value with a threshold;

S40, classifying with the trained logistic regression classifier to obtain the evaluation results of the crowdsourced testers.
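The data passed from S31 into S32 can be pictured as one record per tester holding the 22 indicators and the expert label. This is a minimal sketch, not the patent's data format: the class name `TesterRecord`, the helper `split_xy`, the field names, and the binary encoding 1 = excellent / 0 = poor are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_INDICATORS = 22  # x1..x22 extracted in step S20


@dataclass
class TesterRecord:
    """One row of the S31 training set (hypothetical layout)."""
    tester_id: str
    x: List[float]  # independent variable field X = [x1, ..., x22]
    y: int          # decision target field; assumed 1 = excellent, 0 = poor

    def __post_init__(self) -> None:
        # Reject rows that do not match the 22-indicator layout of S20.
        if len(self.x) != NUM_INDICATORS:
            raise ValueError(f"expected {NUM_INDICATORS} indicators, got {len(self.x)}")
        if self.y not in (0, 1):
            raise ValueError("expert label y must be 0 or 1")


def split_xy(records: List[TesterRecord]) -> Tuple[List[List[float]], List[int]]:
    """Separate the rows X_i and labels Y_i that S32 feeds to the model."""
    return [r.x for r in records], [r.y for r in records]
```

For example, `X, Y = split_xy([TesterRecord("t001", [1.0] * 22, 1)])` yields one 22-element row and the label list `[1]`. Validating the row length at construction time catches feature-extraction mistakes before training starts.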

Preferably, in step S20 the 22 indicators are extracted from the data as follows:

The identity background features include education, occupation, certificates, resources, and region. Education reflects the crowdsourced tester's prior learning ability; occupation shows how closely the tester's work background relates to crowdsourced testing; certificates is the number of testing-related skill certificates the tester holds; resources indicates whether the tester has the relevant devices and network; and region is the tester's geographic location.

The social connection features include social network and cooperative relationships. Social network refers to other high-performing crowdsourced testers connected with the tester; cooperative relationships refers to the partner ratings in historical tasks.

The capability certification features include level examination, capability assessment test, completed task level, participated task distribution, and timeliness. Level examination represents the level of the crowdtesting examination questions completed during registration; capability assessment test is the score on an examination assessing crowdtesting ability; completed task level represents the level distribution of the projects the tester has participated in; participated task distribution represents the types of those projects and the corresponding time distribution; and timeliness reflects how efficiently the tester completes tasks.

The performance record features include compliance count, compliance rate, compliance count in the past year, and compliance rate in the past year. Compliance count is the number of tasks historically completed on time; compliance rate is the proportion of tasks historically completed on time; compliance count in the past year is the number of tasks completed on time within the past year; and compliance rate in the past year is the proportion of tasks completed on time within the past year.
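The four performance record indicators follow directly from a tester's task history. The sketch below assumes a hypothetical `TaskOutcome` record with a completion date and an on-time flag, treats "the past year" as the 365 days before a reference date, and returns a rate of 0.0 when there are no tasks; none of these choices is specified in the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class TaskOutcome:
    """Hypothetical record of one historical task taken by a tester."""
    finished_on: date  # date the task closed
    on_time: bool      # True if the tester completed it on time


def _count_on_time(tasks: List[TaskOutcome]) -> int:
    return sum(1 for t in tasks if t.on_time)


def _on_time_rate(tasks: List[TaskOutcome]) -> float:
    # Assumption: a tester with no tasks gets a rate of 0.0.
    return _count_on_time(tasks) / len(tasks) if tasks else 0.0


def performance_record(tasks: List[TaskOutcome], today: date) -> Dict[str, float]:
    """Return the performance record indicators x13..x16."""
    one_year_ago = today - timedelta(days=365)  # assumed meaning of "past year"
    recent = [t for t in tasks if one_year_ago <= t.finished_on <= today]
    return {
        "compliance_count": _count_on_time(tasks),             # x13
        "compliance_rate": _on_time_rate(tasks),               # x14
        "compliance_count_past_year": _count_on_time(recent),  # x15
        "compliance_rate_past_year": _on_time_rate(recent),    # x16
    }
```

Computing the all-time and past-year figures from the same filter logic keeps the two pairs of indicators consistent with each other.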

The historical behavior features include experience value, popularity, total bug value, bug mining ability, task points, and report quality. Experience value reflects participation in similar historical projects; popularity is the number of tasks completed in the past three months; total bug value is calculated from the bugs' severity, uniqueness, and description length; bug mining ability is the ratio of the bug value to the number of bugs found in the project; task points are the scores earned in historically participated tasks; and report quality is measured by the upvotes/downvotes received by the tester's reports.
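Several historical behavior indicators reduce to simple aggregations. This sketch assumes "the past three months" means 90 days and that each bug's value (derived from severity, uniqueness, and description length by a rule the text does not give) is already available as a number; the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def popularity(completion_dates: List[date], today: date, window_days: int = 90) -> int:
    """x18: tasks completed in the past three months (assumed to be 90 days)."""
    start = today - timedelta(days=window_days)
    return sum(1 for d in completion_dates if start <= d <= today)


def total_bug_value(bug_values: List[float]) -> float:
    """x19: sum of per-bug values (each value assumed precomputed upstream)."""
    return sum(bug_values)


def bug_mining_ability(bug_values: List[float]) -> float:
    """x20: total bug value divided by the number of bugs found (0.0 if none)."""
    return total_bug_value(bug_values) / len(bug_values) if bug_values else 0.0
```

Because bug mining ability divides the total by the bug count, it rewards a few high-value bugs over many low-value ones, which complements the raw total in x19.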

Preferably, the loss function of logistic regression in step S32 is calculated as follows:

$$J(w) = -\frac{1}{m}\sum_{i=1}^{m}\left[Y_i \log h_w(X_i) + (1 - Y_i)\log\left(1 - h_w(X_i)\right)\right]$$

where $h_w(X_i)$ is the prediction function of step S33 evaluated at $X_i$, $Y_i$ is the label of the i-th row of input data, $X_i$ is the indicator vector of the i-th row of input data, $m$ is the number of input rows, and $w$ is the target parameter to be determined. The loss function penalizes erroneous conclusions of $h_w(X_i)$; thus, the smaller the loss, the more accurate the conclusions of $h_w(X_i)$ are considered to be.
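The loss can be evaluated directly. This minimal sketch assumes the standard logistic regression cross-entropy averaged over the m rows, with labels Y_i in {0, 1}; the function names and the small clipping constant `eps` are implementation details added here, not part of the method.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss(w: Sequence[float], X: List[Sequence[float]], Y: List[int], eps: float = 1e-12) -> float:
    """Average cross-entropy over the m input rows."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, Y):
        h = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))  # h_w(X_i)
        h = min(max(h, eps), 1.0 - eps)  # keep log() finite
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m
```

With all-zero weights every prediction is 0.5, so the loss equals ln 2 regardless of the data; weights that push predictions toward the correct labels give a smaller value, matching the statement that a smaller loss means more accurate conclusions.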

Preferably, the prediction function of logistic regression in step S33 is calculated as follows:

$$h_w(X) = \frac{1}{1 + e^{-w^{T}X}}$$

where $w$ is the determined target parameter, $X$ is the input indicator vector, and $w^{T}$ is the transpose of $w$.
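Because the logistic function is strictly increasing, comparing $h_w(X)$ with the threshold $t$ of step S33 is equivalent to comparing the linear score $w^{T}X$ with a fixed value. For $0 < t < 1$:

```latex
\begin{aligned}
h_w(X) > t
&\iff 1 + e^{-w^{T}X} < \frac{1}{t}
\iff e^{-w^{T}X} < \frac{1-t}{t} \\
&\iff w^{T}X > \ln\frac{t}{1-t}.
\end{aligned}
```

With $t = 0.5$ the right-hand side is $\ln 1 = 0$, so a tester is classified as excellent exactly when $w^{T}X > 0$.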

Compared with the prior art, the invention has the following beneficial effects. Existing methods consider only the content of test reports and do not take into account records such as a crowdsourced tester's identity background and performance record; the method of the invention instead defines indicators in five aspects (identity background, social connections, capability certification, performance record, and historical behavior), so that various factors are fully considered and a better evaluation of testers is achieved. Data related to personnel information are obtained from the information a tester fills in when registering an account; data related to human-machine collaboration are obtained from the results of the level examination and the capability assessment test completed before the tester accepts tasks; and historical test data are obtained from the results of all of the tester's historical test reports. Using these three kinds of data, a logistic regression classification model is trained on the testers' data, yielding a trained logistic regression prediction function that evaluates the testers (semi-quantitatively or quantitatively). The method offers strong real-time performance, high detection efficiency, and accurate evaluation.

Drawings

FIG. 1 is an overall flow chart of the method of the present invention.

FIG. 2 is a block diagram of the indicator system for evaluating the characteristic parameters of crowdsourced testers.

FIG. 3 is a flow chart of the logistic regression classification performed in accordance with the present invention.

Detailed Description

The present invention is further illustrated by the following description in conjunction with the accompanying drawings and the specific embodiments, it is to be understood that these examples are given solely for the purpose of illustration and are not intended as a definition of the limits of the invention, since various equivalent modifications will occur to those skilled in the art upon reading the present invention and fall within the limits of the appended claims.

As shown in FIG. 1, the evaluation method for crowdsourcing testers comprises the following steps:

S20, feature extraction: as shown in FIG. 2, 22 indicators covering five types of features (identity background, social connections, capability certification, performance record, and historical behavior) are extracted from the collected data and expressed as X = [x₁, x₂, ..., x₂₂];

The identity background features include education, occupation, certificates, resources, and region, represented as [x₁, x₂, x₃, x₄, x₅]. Education reflects the crowdsourced tester's prior learning ability; occupation shows how closely the tester's work background relates to crowdsourced testing; certificates is the number of testing-related technical certificates the tester holds; resources indicates whether the tester has the relevant devices and network; and region is the tester's geographic location.

The social connection features include social network and cooperative relationships, represented as [x₆, x₇]. Social network refers to other high-performing crowdsourced testers connected with the tester; cooperative relationships refers to the partner ratings in historical tasks.

The capability certification features include level examination, capability assessment test, completed task level, participated task distribution, and timeliness, represented as [x₈, x₉, x₁₀, x₁₁, x₁₂]. Level examination represents the level of the crowdtesting examination questions completed during registration; capability assessment test is the score on an examination assessing crowdtesting ability; completed task level represents the level distribution of the projects the tester has participated in; participated task distribution represents the types of those projects and the corresponding time distribution; and timeliness reflects how efficiently the tester completes tasks.

The performance record features include compliance count, compliance rate, compliance count in the past year, and compliance rate in the past year, represented as [x₁₃, x₁₄, x₁₅, x₁₆]. Compliance count is the number of tasks historically completed on time; compliance rate is the proportion of tasks historically completed on time; compliance count in the past year is the number of tasks completed on time within the past year; and compliance rate in the past year is the proportion of tasks completed on time within the past year.

The historical behavior features include experience value, popularity, total bug value, bug mining ability, task points, and report quality, represented as [x₁₇, x₁₈, x₁₉, x₂₀, x₂₁, x₂₂]. Experience value reflects participation in similar historical projects; popularity is the number of tasks completed in the past three months; total bug value is calculated from the bugs' severity, uniqueness, and description length; bug mining ability is the ratio of the bug value to the number of bugs found in the project; task points are the scores earned in historically participated tasks; and report quality is measured by the upvotes/downvotes received by the tester's reports.
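The index layout above can be captured as an ordered mapping from feature groups to indicator names, so that every tester's indicators are packed into X = [x₁, ..., x₂₂] in the same order. The snake_case names and `build_vector` are hypothetical, and each value is assumed to be already encoded as a number; categorical items such as education or participated task distribution would need an encoding the text does not specify.

```python
from typing import Dict, List

# Feature groups in the order x1..x22 used by the text (dicts keep insertion order).
FEATURE_GROUPS: Dict[str, List[str]] = {
    "identity_background": ["education", "occupation", "certificates",
                            "resources", "region"],                      # x1-x5
    "social_connections": ["social_network", "cooperative_relationship"],  # x6-x7
    "capability_certification": ["level_exam", "capability_assessment",
                                 "completed_task_level", "task_distribution",
                                 "timeliness"],                          # x8-x12
    "performance_record": ["compliance_count", "compliance_rate",
                           "compliance_count_past_year",
                           "compliance_rate_past_year"],                 # x13-x16
    "historical_behavior": ["experience_value", "popularity", "total_bug_value",
                            "bug_mining_ability", "task_points",
                            "report_quality"],                           # x17-x22
}

# Flattened order: FEATURE_ORDER[j - 1] is the name of indicator x_j.
FEATURE_ORDER: List[str] = [name for names in FEATURE_GROUPS.values() for name in names]


def build_vector(values: Dict[str, float]) -> List[float]:
    """Pack one tester's named indicators into X = [x1, ..., x22]."""
    missing = [n for n in FEATURE_ORDER if n not in values]
    if missing:
        raise KeyError(f"missing indicators: {missing}")
    return [float(values[n]) for n in FEATURE_ORDER]
```

Keeping the order in one place means the learned weights w₁, ..., w₂₂ always line up with the same indicators during training and classification.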

S30, training the logistic regression classifier, comprising the following steps:

S31, establishing a training data set, in which each record corresponds to one crowdsourced tester and comprises an independent variable field and a decision target field; the independent variable field X = [x₁, x₂, ..., x₂₂] covers the five types of features of the tester (identity background, social connections, capability certification, performance record, and historical behavior), and the decision target field y contains the expert's evaluation of the tester;

S32, inputting the indicator data in the training data set and the expert evaluations of the crowdsourced testers into the logistic regression model and training it by gradient ascent on the log-likelihood (equivalently, by minimizing the loss function given above). The maximum likelihood estimate of the parameter vector w = (w₁, w₂, ..., w₂₂) is determined to be w = (0.22261609, 0.20556465, ..., 0.17643208);
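Step S32 can be sketched as batch gradient ascent on the average log-likelihood, whose gradient with respect to w is (1/m) Σ (Y_i − h_w(X_i)) X_i. The learning rate, epoch count, zero initialization, and function names below are assumptions; the text reports only the resulting w. The 22-component w listed above has no separate intercept, so none is added here; a constant column could be appended to X if one were wanted.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_gradient_ascent(
    X: List[Sequence[float]],
    Y: List[int],
    lr: float = 0.1,     # assumed learning rate
    epochs: int = 1000,  # assumed number of full passes over the data
) -> List[float]:
    """Maximize the average log-likelihood of logistic regression by gradient ascent."""
    m, n = len(X), len(X[0])
    w = [0.0] * n  # assumed zero initialization
    for _ in range(epochs):
        grad = [0.0] * n
        for xi, yi in zip(X, Y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))  # Y_i - h_w(X_i)
            for j in range(n):
                grad[j] += err * xi[j]
        # Ascent step: move w in the direction that increases the log-likelihood.
        w = [wj + lr * gj / m for wj, gj in zip(w, grad)]
    return w
```

On a real training set, X would hold the 22-indicator rows from S31 and Y the expert labels (1 = excellent, 0 = poor under the encoding assumed here). Ascending the log-likelihood and descending the cross-entropy loss give the same parameter update.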

S33, constructing the prediction function $h_w(X) = \frac{1}{1 + e^{-w^{T}X}}$. For X = [3, 1, ..., 1], the probability value is 0.7512438; since 0.7512438 > t (threshold t = 0.5), the classifier outputs y = excellent;
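Steps S33 and S40 amount to scoring each tester with h_w(X) and thresholding the result. In this minimal sketch the labels "excellent"/"poor" mirror the results quoted in the text and the strict comparison p > t follows the worked example; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """h_w(X) = 1 / (1 + exp(-w^T X)), computed in a numerically stable way."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(probability: float, threshold: float = 0.5) -> str:
    """Map a probability to the binary evaluation label (strictly above t -> excellent)."""
    return "excellent" if probability > threshold else "poor"


def evaluate(w: Sequence[float], testers: List[Sequence[float]], threshold: float = 0.5) -> List[str]:
    """S40: evaluate a batch of testers' indicator vectors with trained parameters w."""
    return [classify(predict_proba(w, x), threshold) for x in testers]
```

For instance, `classify(0.7512438)` returns `"excellent"`, consistent with the worked example above, because 0.7512438 > 0.5.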

S40, classification: the logistic regression classifier is used to evaluate 10 crowdsourced testers in a certain city, giving the evaluation results (excellent, poor, excellent, poor, excellent, excellent).

The invention addresses the shortcoming that existing methods consider only report content and do not measure testers' identity background, performance record, and similar factors. The method defines indicators in five aspects: identity background, social connections, capability certification, performance record, and historical behavior, so that various factors are fully considered and testers are evaluated more effectively. A logistic regression classification model is trained on the testers' data, and the resulting trained logistic regression prediction function is used to evaluate the testers.

The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are also intended to fall within the scope of the invention.

Claims

1. An evaluation method for crowdsourcing testers, characterized by comprising the following steps:

S10, data collection: crowdsourced testers register their information on the crowdsourcing platform, take examinations, accept tasks, participate in testing, and submit reports;

S20, feature extraction: the sources of the collected data are divided into three categories, namely personnel information, human-machine collaboration, and historical test data; 22 indicators covering five types of features, namely identity background, social connections, capability certification, performance record, and historical behavior, are extracted from the collected data;

S30, training a logistic regression classifier, comprising the following steps:

S31, establishing a training data set, in which each record corresponds to one crowdsourced tester and comprises an independent variable field and a decision target field, wherein the independent variable field includes the five types of features of the tester (identity background, social connections, capability certification, performance record, and historical behavior), and the decision target field includes the expert's evaluation of the tester;

S32, inputting the indicator data in the training data set and the evaluations of the crowdsourced testers into the logistic regression model, training the model, using gradient ascent to compute the maximum likelihood estimate corresponding to the loss function, and determining the parameters;

S33, constructing a prediction function, computing the probability value of every record, and obtaining the classifier's result from the probability value and the threshold;

S40, classification: classifying with the logistic regression classifier to obtain the evaluation results of the crowdsourced testers.

2. The evaluation method for crowdsourcing testers according to claim 1, characterized in that step S20 extracts the 22 indicators from the data as follows:

The identity background features include: education, occupation, certificate, resource and region;

The social connection features include: social network and cooperative relationship;

The capability certification features include: level examination, capability assessment test, completed task level, participated task distribution, and timeliness;

The performance record features include: compliance count, compliance rate, compliance count in the past year, and compliance rate in the past year;

The historical behavior features include: experience value, popularity, total bug value, bug mining ability, task points, and report quality.

3. The evaluation method for crowdsourcing testers according to claim 1, characterized in that the loss function of logistic regression in step S32 is calculated as follows:

$$J(w) = -\frac{1}{m}\sum_{i=1}^{m}\left[Y_i \log h_w(X_i) + (1 - Y_i)\log\left(1 - h_w(X_i)\right)\right]$$

where $h_w(X_i)$ is the conditional probability for $X_i$, $Y_i$ is the label of the i-th row of input data, $X_i$ is the indicator vector of the i-th row of input data, $m$ is the number of input rows, and $w$ is the target parameter to be determined.

4. The evaluation method for crowdsourcing testers according to claim 1, characterized in that the prediction function of logistic regression in step S33 is calculated as follows:

$$h_w(X) = \frac{1}{1 + e^{-w^{T}X}}$$

where $w$ is the target parameter, $X$ is the input indicator vector, and $w^{T}$ is the transpose of $w$.