CN112631922A

CN112631922A - Flow playback data selection method, system and storage medium

Info

Publication number: CN112631922A
Application number: CN202011584777.9A
Authority: CN
Inventors: 袁丽莉; 梁北才; 杨浩文
Original assignee: Guangzhou Pinwei Software Co Ltd
Current assignee: Guangzhou Pinwei Software Co Ltd
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2021-04-09

Abstract

The invention discloses a traffic playback data selection method, system and storage medium. The method includes: capturing interface request data at a preset time interval; calculating a signature value for each piece of interface request data according to a simhash algorithm, and marking all of the obtained signature values as an interface data total set; dividing the interface data total set into k interface data subsets according to a k-means algorithm; and selecting n/k signature values from each interface data subset as playback data. For the same amount of playback data, the invention achieves larger coverage than the traditional approach of randomly selecting interface request data, which avoids test judgments being affected by low playback coverage. For the same coverage, the method needs far less playback data than the traditional approach, which increases playback speed and improves test efficiency.

Description

Flow playback data selection method, system and storage medium

Technical Field

The invention relates to the field of software testing, in particular to a flow playback data selection method, a flow playback data selection system and a storage medium.

Background

Flow playback is an important method for checking the quality of pre-release code before a pre-release version of the software goes online. Existing flow playback mainly captures interface request data from the online environment at random and plays it back directly.

However, a randomly selected set of interface request data may not provide sufficient coverage. To compensate, as much interface request data as possible is usually selected for playback, so that the test result of the pre-release code is not affected by insufficient coverage. If the amount of interface request data is too large, though, playback takes too long and playback efficiency is seriously affected.

Disclosure of Invention

One object of the invention is to provide a flow playback data selection method. For the same amount of playback data, the method achieves larger coverage than the traditional approach of randomly selecting interface request data, which avoids test judgments being affected by low playback coverage. For the same coverage, the method needs far less playback data than the traditional approach, which increases playback speed and greatly improves test efficiency.

Another object of the invention is to provide a flow playback data selection system with the same advantages: larger coverage than traditional random selection for the same amount of playback data, and far less playback data for the same coverage, so that playback speed and test efficiency are improved.

A further object of the invention is to provide a storage medium that, for the same amount of playback data, achieves larger coverage than the traditional approach of randomly selecting interface request data, and thereby avoids test judgments being affected by low playback coverage.

In order to achieve the aim, the invention discloses a flow playback data selection method, which comprises the following steps:

S1, capturing interface request data according to a preset time interval;

S2, calculating the signature value of each piece of interface request data according to a simhash algorithm, and marking all the obtained signature values as an interface data total set;

S3, dividing the interface data total set into k interface data subsets according to a k-means algorithm;

S4, selecting n/k signature values from each interface data subset as playback data, wherein n is the total number of interface request data needing to be played back.

Compared with the prior art, the invention calculates the signature value of each piece of interface request data through a simhash algorithm, marks all the obtained signature values as an interface data total set, and divides the interface data total set into k interface data subsets according to a k-means algorithm, so that the k interface data subsets have the same or similar weight and the data within each subset can be considered to have the same playback value. Therefore, when n/k signature values are selected from each interface data subset as playback data, the method achieves, for the same amount of playback data, larger coverage than the traditional approach of randomly selecting interface request data; the selected data is more representative and more reliable, and test judgments are not affected by low playback coverage. In addition, for the same coverage, the method needs far less playback data than the traditional approach, which increases playback speed and greatly improves test efficiency.

Preferably, the step (2) specifically includes:

S21, dividing the current interface request data into a plurality of interface words according to the parameters of the current interface request data and their corresponding values;

S22, respectively calculating the hash value of each interface word to obtain the vector characteristic value of the current interface word, and carrying out vector combination on all vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data;

S23, marking all the obtained signature values as an interface data total set.

Preferably, the step (22) specifically comprises:

S221, respectively calculating the hash value of each interface word to obtain a first vector characteristic value of the current interface word;

S222, carrying out vector combination on all first vector characteristic values corresponding to the current interface request data to obtain a signature value of the current interface request data.

Preferably, the step (222) specifically includes:

S2221, weighting each first vector characteristic value to obtain a second vector characteristic value;

S2222, carrying out vector combination on all second vector characteristic values corresponding to the current interface request data to obtain a signature value of the current interface request data.

Preferably, the step (2222) specifically includes:

S22221, carrying out vector combination on all second vector characteristic values corresponding to the current interface request data to obtain a third vector characteristic value;

S22222, carrying out dimensionality reduction on the third vector characteristic value to obtain a signature value of the current interface request data.

Preferably, the step (3) specifically includes:

S31, randomly selecting k signature values from the interface data total set as virtual center points;

S32, dividing the interface data total set into k interface data subsets, wherein each interface data subset comprises only one virtual center point;

S33, respectively calculating the Hamming distance from every signature value in the interface data total set, other than the virtual center points, to each virtual center point, and assigning each signature value to the interface data subset whose virtual center point has the smallest Hamming distance to that signature value;

S34, calculating the center point of each interface data subset, and taking the center point as the new virtual center point of the current interface data subset;

S35, repeatedly iterating over the interface data total set according to the new virtual center points to obtain k converged interface data subsets.

Preferably, the flow playback data selection method is respectively executed in the pre-release code and the comparison code;

the step (4) is followed by:

S5, acquiring the playback data of the pre-release code and the playback data of the comparison code;

S6, performing differential analysis on the playback data of the pre-release code and the playback data of the comparison code;

S7, judging whether the pre-release code has problems according to the analysis result.

Preferably, the value of k is between A and B, wherein A, B and k are natural numbers.

Correspondingly, the invention also discloses a flow playback data selection system, which comprises:

the data capturing module is used for capturing the interface request data according to a preset time interval;

the first processing module is used for calculating a signature value of each interface request data according to a simhash algorithm and marking all obtained signature values as an interface data aggregate;

the second processing module is used for dividing the interface data total set into k interface data subsets according to a k-means algorithm;

and the execution module is used for respectively selecting n/k signature values from each interface data subset as playback data, wherein n is the total number of the interface request data needing to be played back.

Correspondingly, the invention also discloses a storage medium for storing a computer program, and the program realizes the flow playback data selection method when being executed by a processor.

Drawings

FIG. 1 is a block flow diagram of a flow playback data selection method of the present invention;

FIG. 2 is a block flow diagram of step (2) of the flow playback data selection method of the present invention;

FIG. 3 is a block flow diagram of step (3) of the flow playback data selection method of the present invention;

FIG. 4 is a block diagram of the flow playback data selection system of the present invention.

Detailed Description

In order to explain technical contents, structural features, and objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.

Referring to FIG. 1 to FIG. 3, the present invention discloses a flow playback data selection method, which includes the following steps:

S1, capturing the interface request data according to the preset time interval.

It can be understood that each capture collects all interface request data generated between two adjacent capture times, so that after multiple captures a large number of interface request data are obtained. For example, if the preset time interval is 100 ms, 100 captures are performed, and each capture is assumed to yield one hundred thousand pieces of interface request data, then ten million pieces of interface request data need to be processed. It should be noted that interface request data is generated dynamically, so the number captured each time is not necessarily fixed: for example, one hundred thousand pieces in the first capture, fifty thousand in the second, twelve thousand in the third, and so on.
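As a rough illustration of interval-based capture, the following Python sketch groups timestamped requests into consecutive capture windows. The function name `group_by_interval`, the `(timestamp_ms, payload)` tuple layout and the sample values are assumptions made for this example; the source does not specify how captured requests are buffered.

```python
from collections import defaultdict


def group_by_interval(requests, interval_ms):
    """Group (timestamp_ms, payload) pairs into consecutive capture windows.

    Hypothetical helper: each window holds the requests generated between
    two adjacent capture times.
    """
    batches = defaultdict(list)
    for ts_ms, payload in requests:
        batches[ts_ms // interval_ms].append(payload)
    # Return the windows in chronological order.
    return [batches[key] for key in sorted(batches)]


# Illustrative data only: five requests spread over three 100 ms windows.
sample = [(5, "a"), (40, "b"), (120, "c"), (250, "d"), (260, "e")]
batches = group_by_interval(sample, 100)
total = sum(len(batch) for batch in batches)
```

The window sizes differ (two, one and two requests here), which mirrors the remark that the number of requests captured each time is not fixed.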

S2, calculating the signature value of each piece of interface request data according to a simhash algorithm, and marking all the obtained signature values as an interface data total set.

The simhash algorithm is an algorithm for calculating a locality-sensitive hash value. A locality-sensitive hash value is one that preserves similarity: if two strings are similar to a certain degree, their hash values remain similar after hashing. An ordinary hash value does not have this locality-sensitive property. The simhash algorithm is mainly applied to the deduplication of massive data in search engines.

Because interface request data is inconvenient to process and compute on directly, the method uses the simhash algorithm to calculate a signature value for each piece of interface request data. This converts each piece of interface request data into a signature value that is easy to compute with, while preserving the similarity between pieces of interface request data as far as possible after the conversion.

S3, dividing the interface data total set into k interface data subsets according to a k-means algorithm.

The k-means algorithm is an iterative clustering analysis algorithm. It iterates over the k interface data subsets to locally minimize the sum of squared errors within the subsets, so that the k interface data subsets can be regarded as having the same weight.

S4, selecting n/k signature values from each interface data subset as playback data, wherein n is the total number of interface request data needing to be played back.

Since the k interface data subsets obtained in step (3) have the same weight, selecting n/k signature values from each interface data subset gives n pieces of playback data in total. It can be understood that each piece of playback data has the same weight, and the selected n pieces of playback data can theoretically cover all interface request data.
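The per-subset selection can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the source does not say how the n/k values are picked inside a subset, so random sampling is assumed here, n/k is rounded down, and a subset with fewer than n/k members simply contributes all of them. The name `select_playback` is hypothetical.

```python
import random


def select_playback(subsets, n, seed=0):
    """Pick about n/k items from each of the k subsets (assumed: at random)."""
    rng = random.Random(seed)
    per_subset = n // len(subsets)  # assumption: round n/k down
    chosen = []
    for subset in subsets:
        # A small subset cannot give more members than it has.
        chosen.extend(rng.sample(subset, min(per_subset, len(subset))))
    return chosen


# Illustrative subsets of very different sizes; integers stand in for signatures.
subsets = [list(range(0, 10)), list(range(10, 30)), list(range(30, 35))]
picked = select_playback(subsets, n=9)
```

Each subset contributes the same number of items regardless of its size, which is how a rarely seen kind of request still reaches the playback set.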

Preferably, the step (2) specifically includes:

S21, dividing the current interface request data into a plurality of interface words according to the parameters of the current interface request data and their corresponding values.

For example, if the current interface request data is http://mapi.v.com/viss-mobile/rest/favorite/store/status?timestamp=1577329909&user_token=12345, it can be divided into two interface words, namely timestamp=1577329909 and user_token=12345. For other interface request data, the number of interface words may also be one, three, four, and so on; the actual number is determined by the parameters of the current interface request data and their corresponding values.
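One possible way to split a request into interface words is sketched below in Python using the standard `urllib.parse` module. It assumes that the parameters travel in the URL query string as `name=value` pairs, as in `?timestamp=1577329909&user_token=12345`; the function name `interface_words` is hypothetical, and requests that carry parameters in a body would need a different parser.

```python
from urllib.parse import parse_qsl, urlsplit


def interface_words(url):
    """Return one "name=value" word per query parameter of the request URL."""
    query = urlsplit(url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    return [f"{name}={value}" for name, value in pairs]


# Example request; the query-string form is an assumption of this sketch.
url = (
    "http://mapi.v.com/viss-mobile/rest/favorite/store/status"
    "?timestamp=1577329909&user_token=12345"
)
words = interface_words(url)
```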

S22, respectively calculating the hash value of each interface word to obtain the vector characteristic value of the current interface word, and carrying out vector combination on all vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.

Calculating the hash value of an interface word yields the vector characteristic value of that interface word, where the components of the vector characteristic value are the binary digits of the hash value. For example, if HASH1(timestamp=1577329909) = 100101 and HASH2(user_token=12345) = 101011, then the vector characteristic value T1 of HASH1 is (1, 0, 0, 1, 0, 1), and the vector characteristic value T2 of HASH2 is (1, 0, 1, 0, 1, 1).
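The conversion from a hash value to a vector of bits can be sketched in Python as below. The source does not name the hash function; MD5 truncated to a fixed width is used here purely as an assumption, and the helper names `hash_bits` and `bits_from_string` are hypothetical. The 6-bit values 100101 and 101011 are the illustrative hashes from the text, not real MD5 output.

```python
import hashlib


def hash_bits(word, width=64):
    """Hash an interface word and return its top `width` bits as a tuple.

    Assumption: MD5 (128 bits) stands in for the unspecified hash function,
    so `width` must not exceed 128.
    """
    digest = hashlib.md5(word.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") >> (128 - width)
    return tuple((value >> (width - 1 - i)) & 1 for i in range(width))


def bits_from_string(bit_string):
    """Turn a string such as "100101" into the vector (1, 0, 0, 1, 0, 1)."""
    return tuple(int(ch) for ch in bit_string)


T1 = bits_from_string("100101")
T2 = bits_from_string("101011")
real_bits = hash_bits("timestamp=1577329909")
```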

Preferably, the step (22) specifically comprises:

S221, respectively calculating the hash value of each interface word to obtain a first vector characteristic value of the current interface word.

Continuing the example of step (22), the first vector characteristic value T1 of HASH1 obtained in step (221) is (1, 0, 0, 1, 0, 1), and the first vector characteristic value T2 of HASH2 is (1, 0, 1, 0, 1, 1).

Preferably, the step (222) specifically includes:

S2221, weighting each first vector characteristic value to obtain a second vector characteristic value.

Specifically, each first vector characteristic value is weighted according to the formula W = HASH × WEIGHT: where a bit of the hash value is 1, that position takes the positive weight (+WEIGHT), and where a bit is 0, that position takes the negative weight (-WEIGHT). This yields the second vector characteristic values T1' and T2'.
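The weighting rule can be illustrated with a short Python sketch. The weights 2 and 1 used below are made-up values for the example (the source does not say how weights are chosen), and `weight_vector` is a hypothetical helper name.

```python
def weight_vector(bits, weight):
    """Map each 1 bit to +weight and each 0 bit to -weight."""
    return tuple(weight if bit == 1 else -weight for bit in bits)


# First vector characteristic values from the example in the text.
T1 = (1, 0, 0, 1, 0, 1)
T2 = (1, 0, 1, 0, 1, 1)

# Assumed weights: 2 for the timestamp word, 1 for the user_token word.
T1_weighted = weight_vector(T1, 2)
T2_weighted = weight_vector(T2, 1)
```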

Preferably, the step (2222) specifically includes:

S22221, carrying out vector combination on all second vector characteristic values corresponding to the current interface request data to obtain a third vector characteristic value, i.e., the third vector characteristic value T' = T1' + T2'.

S22222, carrying out dimensionality reduction on the third vector characteristic value to obtain the signature value of the current interface request data.

Specifically, for each component of T', if the component is greater than 0 it is set to 1, otherwise it is set to 0, which gives the signature value T of the current interface request data. This step makes every component of the signature either 0 or 1, which simplifies subsequent calculation.
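A small worked example of the combination and dimensionality reduction, written as a Python sketch. The weighted input vectors assume illustrative weights of 2 and 1 applied to the example hashes 100101 and 101011; the helper names `merge` and `reduce_to_bits` are hypothetical.

```python
def merge(vectors):
    """Add the weighted vectors component by component (T' = T1' + T2' + ...)."""
    return tuple(sum(column) for column in zip(*vectors))


def reduce_to_bits(merged):
    """Set each component to 1 if it is greater than 0, otherwise to 0."""
    return tuple(1 if value > 0 else 0 for value in merged)


# Weighted vectors under the assumed weights 2 and 1.
T1_weighted = (2, -2, -2, 2, -2, 2)
T2_weighted = (1, -1, 1, -1, 1, 1)

merged = merge([T1_weighted, T2_weighted])  # (3, -3, -1, 1, -1, 3)
signature = reduce_to_bits(merged)          # (1, 0, 0, 1, 0, 1)
```

With these weights the more heavily weighted word dominates every position where the two hashes disagree, so the signature equals the bit pattern of the first word.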

Preferably, the step (3) specifically includes:

S31, randomly selecting k signature values from the interface data total set as virtual center points.

Preferably, the value of k is between A and B, wherein A, B and k are natural numbers. For example, if A is 10 and B is 50, the value of k lies within the interval [10, 50], e.g., k is 15, 18, 20, 25 or 30. It should be noted that the values of A and B are selected according to actual requirements and are not limited herein.

S32, dividing the interface data total set into k interface data subsets, wherein each interface data subset comprises only one virtual center point.

S33, respectively calculating the Hamming distance from every signature value in the interface data total set, other than the virtual center points, to each virtual center point, and assigning each signature value to the interface data subset whose virtual center point has the smallest Hamming distance to that signature value.

S34, calculating the center point of each interface data subset, and taking the center point as the new virtual center point of the current interface data subset.

S35, repeatedly iterating over the interface data total set according to the new virtual center points to obtain k converged interface data subsets.

It is understood that step (32), step (33) and step (34) are repeated until all k interface data subsets converge. At that point the interface data total set has been divided into k interface data subsets whose importance can be considered the same or similar, i.e., each interface data subset has the same weight.
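The clustering loop described in the steps above can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements from the source: the center point of a subset is computed by a per-bit majority vote (the source does not say how the center is calculated), an empty subset keeps its previous center, the initial center points are also passed through the assignment step (each is at distance 0 from itself), and iteration stops when the centers no longer change or after `max_iter` rounds. All function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def majority_center(members):
    """Assumed center rule: a bit is 1 if more than half of the members have 1."""
    n = len(members)
    return tuple(1 if 2 * sum(column) > n else 0 for column in zip(*members))


def cluster(signatures, k, max_iter=100, seed=0):
    """Divide bit-vector signatures into k subsets by Hamming distance."""
    rng = random.Random(seed)
    centers = rng.sample(signatures, k)  # randomly chosen virtual center points
    for _ in range(max_iter):
        subsets = [[] for _ in range(k)]
        for sig in signatures:
            # Assign to the subset whose center is nearest in Hamming distance.
            nearest = min(range(k), key=lambda i: hamming(sig, centers[i]))
            subsets[nearest].append(sig)
        new_centers = [
            majority_center(members) if members else centers[i]
            for i, members in enumerate(subsets)
        ]
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return subsets


# Illustrative 4-bit signatures.
signatures = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0), (1, 0, 1, 1)]
subsets = cluster(signatures, k=2)
```

As with any k-means variant, the result depends on the random initial centers and is only a local optimum, so different seeds may give different subsets.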

Preferably, the flow playback data selection method is executed in the pre-release code and the comparison code respectively.

The pre-release code refers to the code to be tested, the comparison code refers to the code which can normally run after the early-stage test, and corresponding playback data can be obtained by executing the method in the pre-release code and the comparison code.

The step (4) is followed by:

S5, obtaining the playback data of the pre-release code and the playback data of the comparison code.

S6, performing differential analysis on the playback data of the pre-release code and the playback data of the comparison code.

S7, judging whether the pre-release code has problems according to the analysis result.

It is understood that the differential analysis compares, manually or automatically, the signature values corresponding to each interface in the pre-release code and in the comparison code. When the difference is larger than a preset threshold, it can be judged that the pre-release code has a problem, and the programmer is reminded to modify the pre-release code.
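An automatic version of this comparison might look like the Python sketch below. The source does not define how the difference between two signature values is measured; Hamming distance is assumed here, and the dictionary layout (interface name mapped to signature), the function name `find_problem_interfaces` and the sample values are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def find_problem_interfaces(pre_release, comparison, threshold):
    """Return interfaces whose signatures differ by more than `threshold` bits."""
    return [
        name
        for name in sorted(pre_release)
        if name in comparison
        and hamming(pre_release[name], comparison[name]) > threshold
    ]


# Illustrative signatures per interface for the two code versions.
pre_release = {"favorite/status": (1, 0, 0, 1), "cart/add": (1, 1, 1, 1)}
comparison = {"favorite/status": (1, 0, 0, 1), "cart/add": (0, 0, 0, 1)}

flagged = find_problem_interfaces(pre_release, comparison, threshold=2)
```

Here only `cart/add` is flagged, because its two signatures differ in three bits while the threshold is two.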

Referring to fig. 4, correspondingly, the present invention also discloses a flow playback data selecting system, which includes:

the data grabbing module 10 is configured to grab the interface request data according to a preset time interval;

the first processing module 20 is configured to calculate a signature value of each interface request data according to a simhash algorithm, and mark all obtained signature values as an interface data aggregate;

the second processing module 30 is configured to divide the total set of interface data into k interface data subsets according to a k-means algorithm;

and the execution module 40 is configured to select n/k signature values from each of the interface data subsets as playback data, where n is a total number of interface request data to be played back.

With reference to FIG. 1 to FIG. 4, the invention calculates the signature value of each piece of interface request data through a simhash algorithm, marks all the obtained signature values as an interface data total set, and divides the interface data total set into k interface data subsets according to a k-means algorithm, so that the k interface data subsets have the same or similar weight and the data within each subset can be considered to have the same playback value. Therefore, when n/k signature values are selected from each interface data subset as playback data, the method achieves, for the same amount of playback data, larger coverage than the traditional approach of randomly selecting interface request data; the selected data is more representative and more reliable, and test judgments are not affected by low playback coverage. In addition, for the same coverage, the method needs far less playback data than the traditional approach, which increases playback speed and greatly improves test efficiency.

The above disclosure only illustrates the preferred embodiments of the present invention and is not to be construed as limiting the scope of the present invention; therefore, equivalent changes made in accordance with the appended claims still fall within the scope of the present invention.

Claims

1. A traffic playback data selection method, characterized by comprising the steps of:

capturing interface request data at a preset time interval;

calculating a signature value of each piece of the interface request data according to a simhash algorithm, and marking all obtained signature values as an interface data total set;

dividing the interface data total set into k interface data subsets according to a k-means algorithm;

selecting n/k signature values from each of the interface data subsets as playback data, where n is the total number of interface request data to be played back.

2. The traffic playback data selection method as claimed in claim 1, characterized in that calculating the signature value of each piece of the interface request data according to the simhash algorithm and marking all obtained signature values as the interface data total set specifically comprises:

dividing the current interface request data into several interface words according to the parameters of the current interface request data and their corresponding values;

calculating the hash value of each interface word separately to obtain the vector characteristic value of the current interface word, and performing vector combination on all vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data;

marking all obtained signature values as the interface data total set.

3. The traffic playback data selection method as claimed in claim 2, characterized in that calculating the hash value of each interface word separately to obtain the vector characteristic value of the current interface word, and performing vector combination on all vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data, specifically comprises:

calculating the hash value of each interface word separately to obtain a first vector characteristic value of the current interface word;

performing vector combination on all first vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.

4. The traffic playback data selection method as claimed in claim 3, characterized in that performing vector combination on all first vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data specifically comprises:

performing weighting processing on each of the first vector characteristic values to obtain second vector characteristic values;

performing vector combination on all second vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.

5. The traffic playback data selection method as claimed in claim 4, characterized in that performing vector combination on all second vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data specifically comprises:

performing vector combination on all second vector characteristic values corresponding to the current interface request data to obtain a third vector characteristic value;

performing dimensionality reduction processing on the third vector characteristic value to obtain the signature value of the current interface request data.

6. The traffic playback data selection method as claimed in claim 1, characterized in that dividing the interface data total set into k interface data subsets according to the k-means algorithm specifically comprises:

randomly selecting k signature values from the interface data total set as virtual center points;

dividing the interface data total set into k interface data subsets, each of the interface data subsets including only one virtual center point;

calculating the Hamming distance from every signature value in the interface data total set, other than the virtual center points, to each virtual center point, and assigning each signature value to the interface data subset whose virtual center point has the smallest Hamming distance to that signature value;

calculating the center point of each interface data subset, and using the center point as the new virtual center point of the current interface data subset;

repeatedly iterating over the interface data total set according to the new virtual center points to obtain k converged interface data subsets.

7. The traffic playback data selection method as claimed in claim 1, characterized in that the traffic playback data selection method is performed in pre-release code and in comparison code respectively;

after selecting n/k signature values from each of the interface data subsets as playback data, where n is the total number of interface request data to be played back, the method further comprises:

obtaining the playback data of the pre-release code and the playback data of the comparison code;

performing differential analysis on the playback data of the pre-release code and the playback data of the comparison code;

determining whether there is a problem with the pre-release code according to the analysis result.

8. The method for selecting traffic playback data according to claim 1, wherein the value of k is between A and B, wherein A, B, and k are all natural numbers.

9. A traffic playback data selection system, characterized by comprising:

a data capture module, configured to capture interface request data at a preset time interval;

a first processing module, configured to calculate a signature value of each piece of the interface request data according to a simhash algorithm, and mark all obtained signature values as an interface data total set;

a second processing module, configured to divide the interface data total set into k interface data subsets according to a k-means algorithm;

an execution module, configured to select n/k signature values from each of the interface data subsets as playback data, where n is the total number of interface request data to be played back.

10. A storage medium for storing a computer program, characterized in that, when the program is executed by a processor, the traffic playback data selection method according to any one of claims 1 to 8 is implemented.