CN112631922A - Flow playback data selection method, system and storage medium - Google Patents

Flow playback data selection method, system and storage medium

Info

Publication number
CN112631922A
Authority
CN
China
Prior art keywords
data
interface
playback
vector
request data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011584777.9A
Other languages
Chinese (zh)
Inventor
袁丽莉
梁北才
杨浩文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Pinwei Software Co Ltd
Original Assignee
Guangzhou Pinwei Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Pinwei Software Co Ltd
Priority to CN202011584777.9A
Publication of CN112631922A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G06F11/3668 Testing of software
    • G06F11/3672 Test management
    • G06F11/3676 Test management for coverage analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G06F11/3668 Testing of software
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G06F11/3668 Testing of software
    • G06F11/3672 Test management
    • G06F11/3692 Test management for test results analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention discloses a flow playback data selection method, system and storage medium. The method comprises: capturing interface request data at a preset time interval; calculating a signature value for each item of interface request data with the simhash algorithm, and marking all the signature values obtained as an interface data total set; dividing the interface data total set into k interface data subsets with the k-means algorithm; and selecting n/k signature values from each interface data subset as playback data. For the same amount of playback data, the method covers more than the traditional approach of randomly selecting interface request data for playback, which avoids test judgments being distorted by narrow coverage. For the same coverage, the method needs far less playback data than the traditional approach, which speeds up playback and improves test efficiency.

Figure 202011584777

Description

Flow playback data selection method, system and storage medium
Technical Field
The invention relates to the field of software testing, in particular to a flow playback data selection method, a flow playback data selection system and a storage medium.
Background
Flow playback is a vital method for checking the quality of pre-release code before that version of the software goes online. Existing flow playback mainly captures existing interface request data from the production environment at random and plays it back directly.
However, a randomly selected batch of interface request data may cover too little. To compensate, as much interface request data as possible is usually selected for playback, so that insufficient coverage does not distort the test result for the pre-release code. But if the volume of interface request data is too large, playback takes too long and playback efficiency suffers badly.
Disclosure of Invention
One object of the invention is to provide a flow playback data selection method that, for the same amount of playback data, covers more than the traditional approach of randomly selecting interface request data, so that test judgments are not distorted by narrow coverage. For the same coverage, the method needs far less playback data than the traditional approach, which speeds up playback and greatly improves test efficiency.
Another object of the invention is to provide a flow playback data selection system that, for the same amount of playback data, covers more than the traditional approach of randomly selecting interface request data, so that test judgments are not distorted by narrow coverage. For the same coverage, the system needs far less playback data than the traditional approach, which speeds up playback and greatly improves test efficiency.
A further object of the invention is to provide a storage medium that, for the same amount of playback data, covers more than the traditional approach of randomly selecting interface request data, so that test judgments are not distorted by narrow coverage.
To achieve these objects, the invention discloses a flow playback data selection method comprising the following steps:
S1: capturing interface request data at a preset time interval;
S2: calculating a signature value for each item of interface request data with the simhash algorithm, and marking all the signature values obtained as an interface data total set;
S3: dividing the interface data total set into k interface data subsets with the k-means algorithm;
S4: selecting n/k signature values from each interface data subset as playback data, where n is the total number of items of interface request data to be played back.
Compared with the prior art, the invention calculates the signature value of each item of interface request data with the simhash algorithm, marks all the signature values obtained as an interface data total set, and divides that total set into k interface data subsets with the k-means algorithm, so that the k subsets have the same or similar weight and the data in each subset can be considered to have the same playback value. When n/k signature values are then selected from each subset as playback data, the method, for the same amount of playback data, covers more than the traditional approach of randomly selecting interface request data; it is more representative and more reliable, and it avoids test judgments being distorted by narrow coverage. In addition, for the same coverage, the method needs far less playback data than the traditional approach, which speeds up playback and greatly improves test efficiency.
Preferably, step S2 specifically comprises:
S21: dividing the current interface request data into several interface words according to its input parameters and their corresponding values;
S22: calculating the hash value of each interface word to obtain the vector characteristic value of that interface word, and combining all the vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data;
S23: marking all the signature values obtained as the interface data total set.
Preferably, step S22 specifically comprises:
S221: calculating the hash value of each interface word to obtain a first vector characteristic value of that interface word;
S222: combining all the first vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.
Preferably, step S222 specifically comprises:
S2221: weighting each first vector characteristic value to obtain a second vector characteristic value;
S2222: combining all the second vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.
Preferably, step S2222 specifically comprises:
S22221: combining all the second vector characteristic values corresponding to the current interface request data to obtain a third vector characteristic value;
S22222: performing dimensionality reduction on the third vector characteristic value to obtain the signature value of the current interface request data.
Preferably, step S3 specifically comprises:
S31: randomly selecting k signature values from the interface data total set as virtual center points;
S32: dividing the interface data total set into k interface data subsets, each of which contains exactly one virtual center point;
S33: calculating the Hamming distance from every signature value in the interface data total set, other than the virtual center points, to each virtual center point, and assigning each signature value to the interface data subset whose virtual center point is at the smallest Hamming distance from it;
S34: calculating the center point of each interface data subset and taking it as the new virtual center point of that subset;
S35: iterating repeatedly over the interface data total set with the new virtual center points to obtain k converged interface data subsets.
Preferably, the flow playback data selection method is executed on the pre-release code and on the comparison code respectively;
step S4 is followed by:
S5: obtaining the playback data of the pre-release code and the playback data of the comparison code;
S6: performing difference analysis on the playback data of the pre-release code and the playback data of the comparison code;
S7: judging from the analysis result whether the pre-release code has a problem.
Preferably, the value of k lies between A and B, where A, B and k are all natural numbers.
Correspondingly, the invention also discloses a flow playback data selection system, comprising:
a data capture module, used to capture interface request data at a preset time interval;
a first processing module, used to calculate a signature value for each item of interface request data with the simhash algorithm and to mark all the signature values obtained as an interface data total set;
a second processing module, used to divide the interface data total set into k interface data subsets with the k-means algorithm;
an execution module, used to select n/k signature values from each interface data subset as playback data, where n is the total number of items of interface request data to be played back.
Correspondingly, the invention also discloses a storage medium for storing a computer program which, when executed by a processor, implements the flow playback data selection method described above.
Drawings
FIG. 1 is a flow chart of the flow playback data selection method of the present invention;
FIG. 2 is a flow chart of step S2 of the flow playback data selection method of the present invention;
FIG. 3 is a flow chart of step S3 of the flow playback data selection method of the present invention;
FIG. 4 is a block diagram of the flow playback data selection system of the present invention.
Detailed Description
In order to explain technical contents, structural features, and objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Referring to FIG. 1 to FIG. 3, the invention discloses a flow playback data selection method comprising the following steps:
S1: capturing interface request data at a preset time interval.
Each capture collects all the interface request data generated between two adjacent capture times, so multiple captures yield multiple batches of interface request data. For example, with a preset time interval of 100 ms and 100 captures of one hundred thousand items each, ten million items of interface request data have to be processed. Note that interface request data is generated dynamically, so the number of items captured each time is not necessarily fixed: for example, one hundred thousand in the first capture, fifty thousand in the second, twelve thousand in the third, and so on.
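The capture windows described above can be sketched as follows. This is only a minimal illustration, assuming each captured request carries a millisecond timestamp; the function name `bucket_by_interval` and the `(timestamp_ms, request)` tuple layout are hypothetical and do not come from the invention.

```python
from collections import defaultdict


def bucket_by_interval(records, interval_ms):
    """Group (timestamp_ms, request) pairs into consecutive capture windows.

    Window i holds every request whose timestamp falls in
    [i * interval_ms, (i + 1) * interval_ms), so batch sizes vary with traffic.
    """
    buckets = defaultdict(list)
    for timestamp_ms, request in records:
        buckets[timestamp_ms // interval_ms].append(request)
    # Return the non-empty batches in time order.
    return [buckets[index] for index in sorted(buckets)]


# Hypothetical traffic: three requests in the first 100 ms window, one in the second.
records = [(5, "req-a"), (40, "req-b"), (99, "req-c"), (130, "req-d")]
batches = bucket_by_interval(records, interval_ms=100)
print([len(batch) for batch in batches])
```

The varying batch lengths mirror the point made above that the number of items captured in each interval is not fixed.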
S2: calculating a signature value for each item of interface request data with the simhash algorithm, and marking all the signature values obtained as an interface data total set.
The simhash algorithm computes a locality-sensitive hash value. Locality-sensitive means that if two strings are similar to some degree, their hash values remain similar after hashing. An ordinary hash value does not have this locality-sensitive property. The simhash algorithm is mainly applied to deduplicating massive data in search engines.
Because interface request data is inconvenient to process and compute on directly, the method uses the simhash algorithm to calculate a signature value for each item. This converts each item of interface request data into a signature value that is easy to compute with, while preserving the similarity between items, so that the conversion has little effect on how similar the items appear to one another.
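To illustrate why an ordinary hash is unsuitable here, the sketch below hashes two nearly identical request strings with SHA-256 and counts the differing bits. SHA-256 is chosen only as an example of a common hash (the invention does not name one), and the request strings and the function name `bit_difference` are hypothetical.

```python
import hashlib


def bit_difference(text_a, text_b):
    """Count the bits that differ between the SHA-256 digests of two strings."""
    digest_a = int.from_bytes(hashlib.sha256(text_a.encode("utf-8")).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(text_b.encode("utf-8")).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


# Two hypothetical requests that differ in a single character.
similar_a = "timestamp=1577329909&user_token=12345"
similar_b = "timestamp=1577329909&user_token=12346"
print(bit_difference(similar_a, similar_b), "of 256 bits differ")
```

An ordinary hash typically changes about half of its output bits even for a one-character change, so the distance between two digests says nothing about how similar the inputs were. A locality-sensitive signature is intended to keep similar inputs close together.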
S3: dividing the interface data total set into k interface data subsets with the k-means algorithm.
The k-means algorithm is an iterative clustering algorithm. It iterates over the k interface data subsets to locally minimize the sum of squared errors within them, so that the k interface data subsets can be regarded as having the same weight.
S4: selecting n/k signature values from each interface data subset as playback data, where n is the total number of items of interface request data to be played back.
Since the k interface data subsets obtained in step S3 have the same weight, n/k signature values are selected from each subset as playback data, giving n items of playback data in total. Each item of playback data can then be understood to have the same weight, and the n selected items can in theory cover all the interface request data.
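The selection in step S4 can be sketched as below. The text does not say what happens when n is not divisible by k or when a subset holds fewer than n/k signature values, so this sketch assumes rounding down and taking every member of a small subset. The function name `select_playback` and the sample signatures are hypothetical.

```python
import random


def select_playback(subsets, n, seed=None):
    """Pick about n / k signature values from each of the k subsets.

    Assumptions (not stated in the text): n / k is rounded down, and a subset
    with fewer than n // k members contributes all of its members.
    """
    rng = random.Random(seed)
    per_subset = n // len(subsets)
    selected = []
    for subset in subsets:
        count = min(per_subset, len(subset))
        selected.extend(rng.sample(subset, count))
    return selected


# Hypothetical subsets of 6-bit signature values (k = 3), with n = 6.
subsets = [
    ["000000", "000001", "000011", "000010"],
    ["111000", "111001", "110000"],
    ["010101", "010111"],
]
playback = select_playback(subsets, n=6, seed=42)
print(playback)
```

Sampling within each subset, rather than from the whole total set, is what guarantees that every cluster of similar requests is represented in the playback data.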
Preferably, step S2 specifically comprises:
S21: dividing the current interface request data into several interface words according to its input parameters and their corresponding values.
For example, if the current interface request data is http://mapi.v.com/viss-mobile/rest/favorite/store/status?timestamp=1577329909&user_token=12345, it can be divided into two interface words: timestamp=1577329909 and user_token=12345. Other interface request data may yield one, three, four or some other number of interface words; the actual number is determined by the parameters of that request data and their corresponding values.
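A minimal sketch of step S21, assuming the parameters arrive as an ordinary URL query string. The function name `interface_words` and the `example.com` URL are hypothetical; the URL carries a timestamp parameter and a user_token parameter like the example above.

```python
from urllib.parse import parse_qsl, urlsplit


def interface_words(url):
    """Split a request URL into one "name=value" interface word per parameter."""
    query = urlsplit(url).query
    return [f"{name}={value}" for name, value in parse_qsl(query, keep_blank_values=True)]


# Hypothetical request URL with two query parameters.
url = "http://example.com/rest/favorite/status?timestamp=1577329909&user_token=12345"
print(interface_words(url))
```

Requests whose parameters travel in a body rather than in the query string would need a different parser; that case is outside this sketch.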
S22: calculating the hash value of each interface word to obtain the vector characteristic value of that interface word, and combining all the vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.
Calculating the hash value of an interface word yields the vector characteristic value of that interface word, which is a vector of 0s and 1s made up of the bits of the hash value. For example, if HASH1(timestamp=1577329909) = 100101 and HASH2(user_token=12345) = 101011, then the vector characteristic value T1 of HASH1 is (1, 0, 0, 1, 0, 1) and the vector characteristic value T2 of HASH2 is (1, 0, 1, 0, 1, 1).
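The conversion from a hash value to a vector characteristic value can be sketched as follows. The text does not name the hash function or the signature width, so `hash_bits` uses SHA-256 truncated to 6 bits purely as an assumption; both function names are hypothetical. The two binary literals are the hash values 100101 and 101011 quoted in the example.

```python
import hashlib


def hash_bits(word, width=6):
    """Hash an interface word and keep the lowest `width` bits as an integer.

    SHA-256 is an assumption made for this sketch; the text names no hash function.
    """
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << width) - 1)


def to_bit_vector(value, width=6):
    """Expand an integer into a tuple of bits, most significant bit first."""
    return tuple((value >> shift) & 1 for shift in range(width - 1, -1, -1))


# The two hash values quoted in the text, written as binary literals.
T1 = to_bit_vector(0b100101)
T2 = to_bit_vector(0b101011)
print(T1)  # (1, 0, 0, 1, 0, 1)
print(T2)  # (1, 0, 1, 0, 1, 1)

# A real interface word goes through the hash first; its bits depend on the hash chosen.
print(to_bit_vector(hash_bits("timestamp=1577329909")))
```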
S23: marking all the signature values obtained as the interface data total set.
Preferably, step S22 specifically comprises:
S221: calculating the hash value of each interface word to obtain a first vector characteristic value of that interface word.
Continuing the example under step S22, the first vector characteristic value T1 of HASH1 in step S221 is (1, 0, 0, 1, 0, 1), and the first vector characteristic value T2 of HASH2 is (1, 0, 1, 0, 1, 1).
S222: combining all the first vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.
Preferably, step S222 specifically comprises:
S2221: weighting each first vector characteristic value to obtain a second vector characteristic value.
Specifically, each first vector characteristic value is weighted according to the formula W = HASH × WEIGHT: where a bit of the hash value is 1, that position takes the positive weight, and where a bit is 0, it takes the negative weight. This yields the second vector characteristic values T1' and T2'.
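The weighting rule can be sketched as below. The text gives no concrete weights, so the values 2 and 1 are hypothetical, as is the function name `weight_vector`.

```python
def weight_vector(bits, weight):
    """Turn a 0/1 bit vector into a signed vector: +weight for 1, -weight for 0."""
    return tuple(weight if bit == 1 else -weight for bit in bits)


# First vector characteristic values from the text; the weights 2 and 1 are hypothetical.
T1 = (1, 0, 0, 1, 0, 1)
T2 = (1, 0, 1, 0, 1, 1)
T1_weighted = weight_vector(T1, weight=2)
T2_weighted = weight_vector(T2, weight=1)
print(T1_weighted)  # (2, -2, -2, 2, -2, 2)
print(T2_weighted)  # (1, -1, 1, -1, 1, 1)
```

A larger weight lets one interface word pull the final signature more strongly toward its own bit pattern.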
S2222: combining all the second vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.
Preferably, step S2222 specifically comprises:
S22221: combining all the second vector characteristic values corresponding to the current interface request data to obtain a third vector characteristic value, that is, T' = T1' + T2'.
S22222: performing dimensionality reduction on the third vector characteristic value to obtain the signature value of the current interface request data.
Specifically, each element of T' is set to 1 if it is greater than 0 and to 0 otherwise, which gives the signature value T of the current interface request data. This step makes every element of the signature 0 or 1, which simplifies subsequent calculation.
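Steps S22221 and S22222 can be worked through on the running example. The sketch assumes hypothetical weights of 2 and 1 for the two interface words whose hash bits are 100101 and 101011; the function names are illustrative only.

```python
def merge_vectors(vectors):
    """Add weighted vectors position by position to obtain T'."""
    return tuple(sum(column) for column in zip(*vectors))


def reduce_to_signature(merged):
    """Map each element of T' to 1 if it is greater than 0, otherwise to 0."""
    return tuple(1 if value > 0 else 0 for value in merged)


# Second vector characteristic values under the hypothetical weights 2 and 1.
T1_weighted = (2, -2, -2, 2, -2, 2)
T2_weighted = (1, -1, 1, -1, 1, 1)
T_merged = merge_vectors([T1_weighted, T2_weighted])
T = reduce_to_signature(T_merged)
print(T_merged)  # (3, -3, -1, 1, -1, 3)
print(T)         # (1, 0, 0, 1, 0, 1)
```

With these weights the signature equals the bit pattern of the more heavily weighted interface word, because that word outvotes the other at every position where they disagree.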
Preferably, step S3 specifically comprises:
S31: randomly selecting k signature values from the interface data total set as virtual center points.
Preferably, the value of k lies between A and B, where A, B and k are all natural numbers. For example, if A = 10 and B = 50, then k lies in the interval [10, 50] and may be 15, 18, 20, 25, 30 and so on. The values of A and B are chosen according to actual requirements and are not limited here.
S32: dividing the interface data total set into k interface data subsets, each of which contains exactly one virtual center point.
S33: calculating the Hamming distance from every signature value in the interface data total set, other than the virtual center points, to each virtual center point, and assigning each signature value to the interface data subset whose virtual center point is at the smallest Hamming distance from it.
S34: calculating the center point of each interface data subset and taking it as the new virtual center point of that subset.
S35: iterating repeatedly over the interface data total set with the new virtual center points to obtain k converged interface data subsets.
It can be understood that steps S32, S33 and S34 are repeated until all k interface data subsets converge. At that point the interface data total set has been divided into k interface data subsets whose importance can be regarded as the same or similar, that is, each interface data subset has the same weight.
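Steps S31 to S35 can be sketched as a small k-means loop over bit-tuple signatures with Hamming distance. The text does not say how the center point of a subset is calculated, so this sketch assumes a per-bit majority vote; it also lets an empty subset keep its previous center and accepts optional `initial_centers` so that a run can be reproduced. All function and parameter names are hypothetical.

```python
import random


def hamming(sig_a, sig_b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def majority_center(members):
    """Per-bit majority vote over a subset (an assumed definition of its center point)."""
    size = len(members)
    return tuple(1 if 2 * sum(column) > size else 0 for column in zip(*members))


def assign(signatures, centers):
    """Put every signature into the subset of its nearest center (ties go to the first)."""
    subsets = [[] for _ in centers]
    for sig in signatures:
        nearest = min(range(len(centers)), key=lambda i: hamming(sig, centers[i]))
        subsets[nearest].append(sig)
    return subsets


def kmeans_hamming(signatures, k, initial_centers=None, seed=None, max_iter=100):
    """Partition signatures into k subsets; returns (subsets, centers)."""
    rng = random.Random(seed)
    centers = list(initial_centers) if initial_centers else rng.sample(signatures, k)
    subsets = assign(signatures, centers)
    for _ in range(max_iter):
        # An empty subset keeps its previous center.
        new_centers = [
            majority_center(subset) if subset else center
            for subset, center in zip(subsets, centers)
        ]
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
        subsets = assign(signatures, centers)
    return subsets, centers


# Two hypothetical groups of 6-bit signatures that are far apart in Hamming distance.
group_a = [(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 1, 0)]
group_b = [(1, 1, 1, 1, 1, 1), (1, 1, 1, 1, 1, 0), (1, 1, 1, 1, 0, 1)]
subsets, centers = kmeans_hamming(
    group_a + group_b, k=2, initial_centers=[group_a[0], group_b[0]]
)
print(centers)
```

Omitting `initial_centers` gives the random selection of step S31. In that case the result depends on the seed, and like any k-means run it may settle in a local optimum.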
Preferably, the flow playback data selection method is executed on the pre-release code and on the comparison code respectively.
The pre-release code is the code under test; the comparison code is code that has passed earlier testing and runs normally. Executing the method on both yields the corresponding playback data for each.
Step S4 is followed by:
S5: obtaining the playback data of the pre-release code and the playback data of the comparison code.
S6: performing difference analysis on the playback data of the pre-release code and the playback data of the comparison code.
S7: judging from the analysis result whether the pre-release code has a problem.
It can be understood that difference analysis means comparing, manually or automatically, the signature values that correspond to each interface in the pre-release code and in the comparison code, and measuring the difference between them. When the difference exceeds a preset threshold, the pre-release code can be judged to have a problem, and the programmer is reminded that it needs to be modified.
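An automatic form of the comparison in steps S5 to S7 might look like the sketch below. The text does not define how the difference value is measured, so Hamming distance between signature strings is assumed here; the interface names, the signatures, the threshold and the function names are all hypothetical.

```python
def hamming_distance(sig_a, sig_b):
    """Number of positions at which two equal-length signature strings differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def find_suspect_interfaces(pre_release, comparison, threshold):
    """Return (interface, distance) pairs whose signatures differ by more than threshold."""
    suspects = []
    for name in sorted(pre_release.keys() & comparison.keys()):
        distance = hamming_distance(pre_release[name], comparison[name])
        if distance > threshold:
            suspects.append((name, distance))
    return suspects


# Hypothetical per-interface signatures from the two code versions.
pre_release = {"/favorite/status": "100101", "/cart/list": "111000"}
comparison = {"/favorite/status": "100111", "/cart/list": "000111"}
print(find_suspect_interfaces(pre_release, comparison, threshold=2))
```

An empty result would mean that no interface exceeded the preset threshold, so under this sketch the pre-release code would not be flagged.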
Referring to FIG. 4, the invention correspondingly also discloses a flow playback data selection system, comprising:
a data capture module 10, configured to capture interface request data at a preset time interval;
a first processing module 20, configured to calculate a signature value for each item of interface request data with the simhash algorithm and to mark all the signature values obtained as an interface data total set;
a second processing module 30, configured to divide the interface data total set into k interface data subsets with the k-means algorithm;
an execution module 40, configured to select n/k signature values from each interface data subset as playback data, where n is the total number of items of interface request data to be played back.
Correspondingly, the invention also discloses a storage medium for storing a computer program which, when executed by a processor, implements the flow playback data selection method described above.
With reference to FIG. 1 to FIG. 4, the invention calculates the signature value of each item of interface request data with the simhash algorithm, marks all the signature values obtained as an interface data total set, and divides that total set into k interface data subsets with the k-means algorithm, so that the k subsets have the same or similar weight and the data in each subset can be considered to have the same playback value. When n/k signature values are then selected from each subset as playback data, the method, for the same amount of playback data, covers more than the traditional approach of randomly selecting interface request data; it is more representative and more reliable, and it avoids test judgments being distorted by narrow coverage. In addition, for the same coverage, the method needs far less playback data than the traditional approach, which speeds up playback and greatly improves test efficiency.
The above disclosure describes only preferred embodiments of the present invention and is not to be construed as limiting its scope; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the present invention.

Claims (10)

1.一种流量回放数据选取方法,其特征在于,包括如下步骤:1. a traffic playback data selection method, is characterized in that, comprises the steps: 按照预设的时间间隔抓取接口请求数据;Capture interface request data at preset time intervals; 根据simhash算法计算每一所述接口请求数据的签名值,将获得的所有签名值标记为接口数据总集;Calculate the signature value of each of the interface request data according to the simhash algorithm, and mark all the obtained signature values as the interface data aggregate; 根据k-means算法将所述接口数据总集划分为k个接口数据子集;Divide the total set of interface data into k interface data subsets according to the k-means algorithm; 从每一所述接口数据子集中分别选取n/k个签名值作为回放数据,其中,n为需要回放的接口请求数据的总条数。n/k signature values are respectively selected from each of the interface data subsets as playback data, where n is the total number of interface request data to be played back. 2.如权利要求1所述的流量回放数据选取方法,其特征在于,所述根据simhash算法计算每一所述接口请求数据的签名值,将获得的所有签名值标记为接口数据总集,具体包括:2. The method for selecting traffic playback data as claimed in claim 1, characterized in that, according to the simhash algorithm, the signature value of each described interface request data is calculated, and all signature values obtained are marked as the interface data collection, specifically include: 依据当前接口请求数据的入参值和对应值,将当前接口请求数据划分为若干个接口词;According to the input parameter value and corresponding value of the current interface request data, divide the current interface request data into several interface words; 分别计算每一所述接口词的哈希值以得到当前接口词的向量特征值,将当前接口请求数据对应的所有向量特征值进行向量合并,得到当前接口请求数据的签名值;Calculate the hash value of each described interface word separately to obtain the vector feature value of the current interface word, and perform vector merging of all vector feature values corresponding to the current interface request data to obtain the signature value of the current interface request data; 将获得的所有签名值标记为接口数据总集。Mark all obtained signature values as the interface data set. 3.如权利要求2所述的流量回放数据选取方法,其特征在于,所述分别计算每一所述接口词的哈希值以得到当前接口词的向量特征值,将当前接口请求数据对应的所有向量特征值进行向量合并,得到当前接口请求数据的签名值,具体包括:3. 
The method for selecting traffic playback data as claimed in claim 2, wherein the hash value of each described interface word is calculated separately to obtain the vector characteristic value of the current interface word, and the corresponding value of the current interface request data is calculated. All vector eigenvalues are combined by vector to obtain the signature value of the current interface request data, including: 分别计算每一所述接口词的哈希值,以得到当前接口词的第一向量特征值;Calculate the hash value of each interface word separately to obtain the first vector feature value of the current interface word; 将当前接口请求数据对应的所有第一向量特征值进行向量合并,得到当前接口请求数据的签名值。All the first vector feature values corresponding to the current interface request data are vector combined to obtain the signature value of the current interface request data. 4.如权利要求3所述的流量回放数据选取方法,其特征在于,所述将当前接口请求数据对应的所有第一向量特征值进行向量合并,得到当前接口请求数据的签名值,具体包括:4. The method for selecting traffic playback data as claimed in claim 3, wherein the vector merging is carried out for all the first vector eigenvalues corresponding to the current interface request data, and the signature value of the current interface request data is obtained, specifically comprising: 对每一所述第一向量特征值进行加权处理,得到第二向量特征值;Perform weighting processing on each of the first vector eigenvalues to obtain second vector eigenvalues; 将当前接口请求数据对应的所有第二向量特征值进行向量合并,得到当前接口请求数据的签名值。Perform vector combination of all the second vector feature values corresponding to the current interface request data to obtain the signature value of the current interface request data. 5.如权利要求4所述的流量回放数据选取方法,其特征在于,所述将当前接口请求数据对应的所有第二向量特征值进行向量合并,得到当前接口请求数据的签名值,具体包括:5. 
The method for selecting traffic playback data according to claim 4, wherein performing vector merging on all second vector feature values corresponding to the current interface request data to obtain the signature value of the current interface request data specifically comprises:
performing vector merging on all second vector feature values corresponding to the current interface request data to obtain a third vector feature value;
performing dimensionality reduction on the third vector feature value to obtain the signature value of the current interface request data.

6. The method for selecting traffic playback data according to claim 1, wherein dividing the total interface data set into k interface data subsets according to the k-means algorithm specifically comprises:
randomly selecting k signature values from the total interface data set as virtual center points;
dividing the total interface data set into k interface data subsets, each interface data subset containing exactly one virtual center point;
calculating the Hamming distance from every signature value in the total interface data set, other than the virtual center points, to each virtual center point, and assigning each signature value to the interface data subset whose virtual center point is at the smallest Hamming distance from that signature value;
calculating the center point of each interface data subset, and using that center point as the new virtual center point of the subset;
repeatedly iterating over the total interface data set with the new virtual center points to obtain k converged interface data subsets.

7. The method for selecting traffic playback data according to claim 1, wherein the method is executed on pre-release code and on comparison code respectively; and after selecting n/k signature values from each interface data subset as playback data, where n is the total number of interface request data items to be played back, the method further comprises:
obtaining the playback data of the pre-release code and the playback data of the comparison code;
performing a difference analysis on the playback data of the pre-release code and the playback data of the comparison code;
determining, from the analysis result, whether the pre-release code has a problem.

8. The method for selecting traffic playback data according to claim 1, wherein the value of k lies between A and B, where A, B and k are all natural numbers.

9. A traffic playback data selection system, comprising:
a data capture module, configured to capture interface request data at a preset time interval;
a first processing module, configured to calculate the signature value of each item of interface request data according to the simhash algorithm, and to mark all obtained signature values as the total interface data set;
a second processing module, configured to divide the total interface data set into k interface data subsets according to the k-means algorithm;
an execution module, configured to select n/k signature values from each interface data subset as playback data, where n is the total number of interface request data items to be played back.

10. A storage medium for storing a computer program, wherein the program, when executed by a processor, implements the method for selecting traffic playback data according to any one of claims 1 to 8.
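
The signature, clustering, and selection steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The claims do not fix several details, so the following are assumptions made only for illustration: whitespace tokenization with term frequency as the weight, a 64-bit signature taken from a truncated MD5 hash, a bitwise majority vote as the "center point" of a subset, and the function names `simhash`, `kmeans_hamming`, and `select_playback`.

```python
import hashlib
import random
from collections import Counter

BITS = 64  # assumed signature width; the claims do not state one


def simhash(text, bits=BITS):
    """Signature of one request: weighted bit vectors are merged, then reduced to bits."""
    weights = Counter(text.split())  # assumed features: whitespace tokens, weight = frequency
    merged = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # vector merging: add +weight for a 1 bit, -weight for a 0 bit
            merged[i] += weight if (h >> i) & 1 else -weight
    # dimensionality reduction: positive component -> 1, otherwise 0
    return sum(1 << i for i in range(bits) if merged[i] > 0)


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def majority_center(members, bits=BITS):
    """Assumed center rule: bitwise majority vote over the subset."""
    center = 0
    for i in range(bits):
        ones = sum((m >> i) & 1 for m in members)
        if 2 * ones > len(members):
            center |= 1 << i
    return center


def kmeans_hamming(signatures, k, max_iter=20, seed=0):
    """Partition signatures into k subsets using Hamming distance."""
    rng = random.Random(seed)
    centers = rng.sample(signatures, k)  # k random signatures as virtual center points
    subsets = []
    for _ in range(max_iter):
        subsets = [[] for _ in range(k)]
        for s in signatures:
            nearest = min(range(k), key=lambda j: hamming(s, centers[j]))
            subsets[nearest].append(s)
        new_centers = [
            majority_center(sub) if sub else centers[j]
            for j, sub in enumerate(subsets)
        ]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return subsets


def select_playback(subsets, n, seed=0):
    """Pick up to n/k signatures from each of the k subsets."""
    rng = random.Random(seed)
    per_subset = n // len(subsets)
    picked = []
    for sub in subsets:
        picked.extend(rng.sample(sub, min(per_subset, len(sub))))
    return picked
```

A caller would compute `simhash` for each captured request, pass the resulting list to `kmeans_hamming`, and hand the subsets to `select_playback`. Because a subset may hold fewer than n/k signatures, this sketch can return fewer than n items; the claims do not say how that case is handled.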
CN202011584777.9A 2020-12-28 2020-12-28 Flow playback data selection method, system and storage medium Pending CN112631922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011584777.9A CN112631922A (en) 2020-12-28 2020-12-28 Flow playback data selection method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011584777.9A CN112631922A (en) 2020-12-28 2020-12-28 Flow playback data selection method, system and storage medium

Publications (1)

Publication Number Publication Date
CN112631922A true CN112631922A (en) 2021-04-09

Family

ID=75285841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011584777.9A Pending CN112631922A (en) 2020-12-28 2020-12-28 Flow playback data selection method, system and storage medium

Country Status (1)

Country Link
CN (1) CN112631922A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831055A (en) * 2012-07-05 2012-12-19 陈振宇 Test case selection method based on weighting attribute
US20150213112A1 (en) * 2014-01-24 2015-07-30 Facebook, Inc. Clustering using locality-sensitive hashing with improved cost model
CN105138647A (en) * 2015-08-26 2015-12-09 陕西师范大学 Travel network cell division method based on Simhash algorithm
CN106557777A (en) * 2016-10-17 2017-04-05 中国互联网络信息中心 An improved K-means clustering method based on SimHash
US20170199811A1 (en) * 2016-01-12 2017-07-13 Wipro Limited Method and System for Optimizing a Test Suite Comprising Plurality of Test Cases
CN107451686A (en) * 2017-07-18 2017-12-08 广东双新电气科技有限公司 Microgrid energy optimization method using a genetic algorithm that considers stochastic prediction error
CN109766754A (en) * 2018-12-04 2019-05-17 平安科技(深圳)有限公司 Human face five-sense-organ clustering method, device, computer equipment and storage medium
CN110221965A (en) * 2019-05-09 2019-09-10 阿里巴巴集团控股有限公司 Test cases technology, test method, device, equipment and system
CN111367782A (en) * 2018-12-25 2020-07-03 中国移动通信集团浙江有限公司 Method and device for automatic generation of regression test data
CN111444411A (en) * 2020-03-30 2020-07-24 深圳前海微众银行股份有限公司 Network data increment acquisition method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US9430743B2 (en) Composite defect classifier
JP5385759B2 (en) Image processing apparatus and image processing method
CN111027069A (en) Malware family detection method, storage medium and computing device
TWI567660B (en) Multi-class object classifying method and system
CN112634022B (en) Credit risk assessment method and system based on unbalanced data processing
US20150332172A1 (en) Learning method, information processing device, and recording medium
JPWO2014013686A1 (en) Verification device, verification device control method, and computer program
CN113052577A (en) Method and system for estimating category of virtual address of block chain digital currency
CN116582300A (en) Network traffic classification method and device based on machine learning
JP7502972B2 (en) Pruning management device, pruning management system, and pruning management method
CN103927530A (en) Acquiring method, application method and application system of final classifier
CN112836731A (en) Signal random forest classification method, system and device based on decision tree accuracy and correlation measurement
CN111368894B (en) A FCBF Feature Selection Method and Its Application in Network Intrusion Detection
CN112631922A (en) Flow playback data selection method, system and storage medium
CN106852171B (en) User multiple behavior recognition method based on sound information
WO2024159109A1 (en) System and method for authentication of rareness of a digital asset
CN113792141B (en) Feature selection method based on covariance measurement factor
CN117179777A (en) Electrocardiogram data classification method and device based on multi-scale feature fusion network
CN110390309B (en) A method for identifying illegal users of finger veins based on residual distribution
JP5791666B2 (en) Dynamic generation device for visual keywords
TW201122844A (en) System and method for simplifying a matrix based boosting algorithm
CN112422505A (en) Network malicious traffic identification method based on high-dimensional extended key feature vector
US9008437B2 (en) Information processing apparatus, information processing method and storage medium
TWI858596B (en) Model training method and model training apparatus
CN115779444B (en) A cloud game data security protection method and server applied to artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination