CN115567412B - Traffic deduplication method, device, electronic device and storage medium - Google Patents

Info

Publication number
CN115567412B
Authority
CN
China
Prior art keywords
flow
identifier
sub
recording
call
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211268231.1A
Other languages
Chinese (zh)
Other versions
CN115567412A (en
Inventor
周官宝
陈吉
吴广贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dewu Information Group Co Ltd
Original Assignee
Shanghai Dewu Information Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dewu Information Group Co Ltd
Priority to CN202211268231.1A
Publication of CN115567412A
Application granted
Publication of CN115567412B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/02: Capturing of monitoring data
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Prevention of errors by analysis, debugging or testing of software
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a traffic deduplication method, apparatus, electronic device and storage medium. The method includes: obtaining traffic data to be deduplicated, which comprises multiple pieces of traffic data, wherein each piece of traffic data includes a first identifier, a second identifier and recorded traffic, the first identifier being generated from the recording interface corresponding to the recorded traffic and the second identifier being generated from the sub-call set and service label corresponding to the recorded traffic; and deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier of each piece of traffic data. By deduplicating traffic using the first identifier corresponding to the recording interface and the second identifier generated from the service label and the sub-call set, the present application improves the efficiency and accuracy of traffic deduplication.

Description

Traffic duplicate removal method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of software development and testing, in particular to a traffic deduplication method, a traffic deduplication device, electronic equipment and a storage medium.
Background
After a business service is updated, it needs to be tested through traffic recording and playback. During testing, a test result is generated by comparing the traffic data recorded on the live network with the traffic data played back in the playback environment; because the recorded traffic contains a large amount of duplicate traffic, test efficiency is low.
To solve this problem, recorded traffic is currently deduplicated by manually labeling the traffic, and this approach to deduplication is inefficient.
Disclosure of Invention
The embodiment of the application aims to provide a traffic duplication eliminating method, a traffic duplication eliminating device, electronic equipment and a storage medium, which are used for improving traffic duplication eliminating efficiency.
In a first aspect, an embodiment of the present application provides a traffic deduplication method, including:
The method comprises the steps of obtaining flow data to be de-duplicated, wherein the flow data to be de-duplicated comprises a plurality of pieces of flow data, each piece of flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, and the second identifier is generated according to a sub-calling set and a service label corresponding to the recording flow;
And de-duplicating the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each piece of flow data.
According to the embodiment of the application, the flow duplicate removal is performed by utilizing the first identifier corresponding to the recording interface and the second identifier generated by the service tag and the sub-calling set, so that the efficiency and the accuracy of the flow duplicate removal are improved.
In any embodiment, the de-duplication of the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each flow data includes:
Determining identical traffic data in the recorded traffic according to the first identifier and the second identifier;
and removing the redundant copies of the identical traffic data so that only one piece is retained, thereby deduplicating the traffic data to be deduplicated.
In the embodiment of the application, the first identifier can be used for representing the uniqueness of the recording interface, and the second identifier can be used for representing the uniqueness of the sub-call, so that the recording flow is subjected to duplication removal through the first identifier and the second identifier, and the accuracy of duplication removal can be improved.
In any embodiment, before obtaining the to-be-de-duplicated traffic data, the method further comprises:
acquiring recorded traffic through the jvm-sandbox-repeater tool, wherein the recorded traffic comprises a sub-call set;
extracting the recording interface of the recorded traffic and generating a first identifier according to the recording interface;
and generating a service label corresponding to the recorded flow, and generating a second identifier according to the service label and the sub-call set.
According to the embodiment of the application, the flow duplicate removal is performed by utilizing the first identifier corresponding to the recording interface and the second identifier generated by the service tag and the sub-calling set, so that the efficiency and the accuracy of the flow duplicate removal are improved.
In any embodiment, the sub-call set comprises at least one sub-call, and the call type corresponding to the at least one sub-call is determined by the service server in advance according to the hash value of the thread corresponding to the sub-call, wherein the call type comprises a main thread sub-call and an asynchronous thread sub-call.
According to the embodiment of the application, the sub-call is divided into the main thread sub-call and the asynchronous thread sub-call, so that a plurality of unnecessary scenes can be avoided, and the accuracy of recording flow de-duplication is improved.
In any embodiment, generating the second identifier according to the service tag and the sub-call set of the recorded traffic includes:
Generating a main thread sub-call set from the sub-calls whose call type is main thread sub-call;
Generating an asynchronous thread sub-call set from the sub-calls whose call type is asynchronous thread sub-call;
Generating a target character string from the main thread sub-call set, the asynchronous thread sub-call set and the service tag according to a preset format;
And calculating the target character string by using a preset algorithm to obtain the second identifier.
According to the embodiment of the application, the sub-call is divided into the main thread sub-call and the asynchronous thread sub-call, so that a plurality of unnecessary scenes can be avoided, and the accuracy of recording flow de-duplication is improved by combining the service labels.
In any embodiment, after generating the second identification, the method further comprises:
The first identifier, the second identifier and the recorded traffic corresponding to each piece of traffic data are stored in a search server as one piece of traffic data;
correspondingly, the obtaining the flow data to be de-duplicated includes:
and obtaining, from the search server, the traffic data to be deduplicated within a preset time period.
Because the volume of recorded traffic is large, the preprocessed traffic data is stored in a search server and the traffic data to be deduplicated is later read back from the search server, which reduces the load on the recording platform.
In any embodiment, generating the service label corresponding to the recorded traffic includes:
extracting a preset field from each piece of flow data;
And determining the service label that matches the preset field from a pre-stored correspondence between fields and labels.
According to the embodiment of the application, the corresponding service label is generated for each flow data, and whether the scenes corresponding to the flow data are the same or not is reflected by the service label, so that the accuracy of recording the flow de-duplication is improved.
In a second aspect, an embodiment of the present application provides a traffic deduplication apparatus, including:
a data acquisition module, configured to obtain traffic data to be deduplicated, wherein the traffic data to be deduplicated comprises a plurality of pieces of traffic data, each piece of traffic data comprises a first identifier, a second identifier and recorded traffic, the first identifier is generated according to the recording interface corresponding to the recorded traffic, and the second identifier is generated according to the sub-call set and service label corresponding to the recorded traffic;
and the de-duplication module is used for de-duplication the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each flow data.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor, a memory, and a bus, wherein,
The processor and the memory complete communication with each other through the bus;
The memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a non-transitory computer readable storage medium comprising:
The non-transitory computer-readable storage medium stores computer instructions that cause the computer to perform the method of the first aspect.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a flow deduplication method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another flow deduplication method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a flow de-duplication apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the technical scheme of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and thus are merely examples, and are not intended to limit the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs, the terms used herein are for the purpose of describing particular embodiments only and are not intended to be limiting of the application, and the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the above description of the drawings are intended to cover non-exclusive inclusions.
In the description of embodiments of the present application, the technical terms "first," "second," and the like are used merely to distinguish between different objects and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated, a particular order or a primary or secondary relationship. In the description of the embodiments of the present application, the meaning of "plurality" is two or more unless explicitly defined otherwise.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In the description of the embodiment of the present application, the term "and/or" is merely an association relationship describing the association object, and indicates that three relationships may exist, for example, a and/or B, and may indicate that a exists alone, while a and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
In the description of the embodiments of the present application, the term "plurality" means two or more (including two), and similarly, "plural sets" means two or more (including two), and "plural sheets" means two or more (including two).
In the description of the embodiments of the present application, the orientation or positional relationship indicated by the technical terms "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. are based on the orientation or positional relationship shown in the drawings, and are merely for convenience of description and simplification of the description, and do not indicate or imply that the apparatus or element referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the embodiments of the present application.
In the description of the embodiments of the present application, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "fixed" and the like are to be construed broadly and include, for example, fixed connection, detachable connection, or integral therewith, mechanical connection, electrical connection, direct connection, indirect connection via an intermediary, communication between two elements, or interaction between two elements. The specific meaning of the above terms in the embodiments of the present application will be understood by those of ordinary skill in the art according to specific circumstances.
The traffic generated by a production environment every day is massive and highly repetitive. A large number of repeated service requests cannot test the service system under test effectively, so a scheme is needed that can deduplicate the large volume of service requests produced in the production environment.
At present, traffic can be deduplicated by manually labeling the traffic data: a label is set manually for each piece of traffic, and traffic sharing the same label is removed until only one piece remains per label. However, when the volume of traffic is large, manual labeling makes deduplication slow.
To solve the above technical problem, the present inventors provide a traffic deduplication method in which pieces of traffic data are distinguished from one another according to the recording interface, the service label corresponding to each piece of traffic data, and the sub-call set corresponding to each piece of traffic data, so that recorded traffic can be deduplicated.
Before describing the embodiments of the present application, in order to better understand the present application, related concepts will be described:
Entry call: a call formed by an externally initiated request is called an entry call. Currently, the types supported by the traffic playback platform are http, dubbo, mq (rocketMQ) and the like.
Sub-call: a call formed by an out-of-process invocation or by a Java method (which requires enhancement), such as out-of-process calls (redis, mybatis), enhanced Java methods, custom sub-calls and the like. Sub-calls can be divided into main thread sub-calls and asynchronous thread sub-calls.
Jvm-Sandbox-Repeater: a general-purpose server-side recording/playback solution based on JVM-Sandbox.
Recording: serializing and storing the input parameters, output parameters, downstream RPC, DB, cache and so on of a single request.
Playback: restoring the recorded data, re-initiating one or N requests, and applying MOCK to specific downstream nodes.
MOCK: during playback, an intercepted sub-call does not make a real call; instead, the return value captured during recording is returned directly by using Sandbox's flow-intervention capability.
Search server: a distributed, highly scalable, near-real-time search and data analysis engine (Elasticsearch). It makes it convenient to search, analyze and explore large amounts of data, and its horizontal scalability makes the data more valuable in a production environment. The implementation principle of Elasticsearch can be summarized in the following steps: first, a user submits data to Elasticsearch; a tokenizer then segments the corresponding text into terms, and the segmentation results are stored together with their weights; when the user searches, the results are scored and ranked according to the weights and then returned to the user.
It can be understood that the traffic deduplication method provided by the embodiment of the application can be applied to an electronic device, which can be a terminal or a server. The terminal can be a smart phone, a tablet computer, a personal digital assistant (PDA) and the like, and the server can be an application server or a Web server. It can be understood that a traffic recording platform is deployed on the electronic device; the platform can be communicatively connected to the service server and receives the recorded traffic sent by the service server.
Fig. 1 is a schematic flow chart of a flow deduplication method according to an embodiment of the present application, as shown in fig. 1, where the method includes:
Step 101, obtaining flow data to be de-duplicated, wherein the flow data to be de-duplicated comprises a plurality of flow data, each flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, and the second identifier is generated according to a sub-calling set and a service label corresponding to the recording flow;
Step 102, de-duplicating the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each flow data.
In step 101, the data of the traffic to be de-duplicated may be pre-stored in a preset location, for example, in a search server, or in a database. The electronic equipment can acquire flow data in a preset time period from the search server according to a preset period to form flow data to be de-duplicated. For example, the preset period may be one day, and the electronic device may acquire all traffic data of the previous day at 10 am every day.
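The periodic fetch described above can be sketched as follows. This is a minimal illustration only: the in-memory list stands in for the search server, and the names `previous_day_window`, `select_window` and the `recorded_at` field are hypothetical, not taken from the application.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def previous_day_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return [start, end) covering the whole calendar day before `now`."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return start_of_today - timedelta(days=1), start_of_today


def select_window(records: List[dict], start: datetime, end: datetime) -> List[dict]:
    """Keep only records whose (hypothetical) recorded_at timestamp is in the window."""
    return [r for r in records if start <= r["recorded_at"] < end]


# Hypothetical stored traffic data; a real system would query the search server.
stored = [
    {"id": 1, "recorded_at": datetime(2022, 10, 16, 23, 59)},
    {"id": 2, "recorded_at": datetime(2022, 10, 17, 8, 30)},
    {"id": 3, "recorded_at": datetime(2022, 10, 17, 23, 59)},
    {"id": 4, "recorded_at": datetime(2022, 10, 18, 9, 0)},
]

# A job running at 10 am picks up everything recorded on the previous day.
window_start, window_end = previous_day_window(datetime(2022, 10, 18, 10, 0))
to_deduplicate = select_window(stored, window_start, window_end)
```

The half-open window avoids counting a record at exactly midnight in two consecutive runs.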
Because a large amount of traffic data is generated in one day and much of it is repeated, the repeated traffic data needs to be deduplicated. It is understood that repeated traffic data refers to traffic data of the same scenario under the same recording interface. Taking online shopping as an example, if two users submit orders in the same promotional activity on the same online shopping platform, the traffic data corresponding to the two orders is repeated traffic data.
The first identifier is an identifier for representing global uniqueness of a recording interface corresponding to the recording flow, and can be obtained by calculating the recording interface by adopting an MD5 algorithm, or by calculating the recording interface by adopting a hash algorithm, or by adopting other algorithms, which is not particularly limited in the embodiment of the application.
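As a minimal sketch of the MD5 option mentioned above, the first identifier could be computed as the hex digest of the recording interface string. The function name `first_identifier` and the example interface strings are hypothetical; the application does not specify how the interface is represented.

```python
import hashlib


def first_identifier(recording_interface: str) -> str:
    """Return the hex MD5 digest of the recording interface string."""
    return hashlib.md5(recording_interface.encode("utf-8")).hexdigest()


# Hypothetical interface names, used only for illustration.
id_a = first_identifier("POST /order/submit")
id_b = first_identifier("POST /order/submit")
id_c = first_identifier("GET /order/detail")
```

Because the digest depends only on the interface string, all traffic recorded on the same interface shares one first identifier, while different interfaces get different ones (barring hash collisions).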
The recorded traffic arises as follows: a user terminal sends a service request to the service server; the service server sends the service traffic generated while responding to that request to the electronic device in a preset request form; and the traffic recording platform in the electronic device records it. It will be appreciated that one service request may correspond to one piece of recorded traffic. The service server may send the service traffic to the electronic device as an http request. The http request includes the recording interface, so the recording interface can be extracted from the http request.
The second identifier is used to represent the global uniqueness of the recorded traffic and is obtained by applying a preset algorithm to the sub-call set and service label corresponding to the recorded traffic. The preset algorithm may be a one-way function algorithm, that is, one for which it is difficult to derive the specific input data from the function's output. The one-way function algorithm may be a cryptographic hash function, specifically an MD5 algorithm, a hash algorithm and the like. The recorded traffic includes a plurality of sub-call sets, which may include main thread sub-calls and asynchronous thread sub-calls.
The service label is used to characterize the type of the recorded traffic, which can be determined from certain specific fields in the recorded traffic. It may be understood that a piece of recorded traffic may correspond to one service label, to a plurality of service labels, or to no service label. Where there is no service label, the electronic device may generate the second identifier from the sub-call set alone.
In step 102, since the same recording interface may include traffic data of multiple scenes and the same scene may include multiple pieces of traffic data, multiple pieces of traffic data belonging to the same scene in the same recording interface need to be deduplicated. In addition, different recording interfaces are considered to be different flows even if the flow data included in the interfaces are the same. Therefore, in order to improve the accuracy of flow deduplication, the recording interface can be distinguished by the first identifier, and the flow data can be distinguished by the second identifier, so that the data of the flow to be deduplicated is deduplicated according to the first identifier and the second identifier.
According to the embodiment of the application, the flow duplicate removal is performed by utilizing the first identifier corresponding to the recording interface and the second identifier generated by the service tag and the sub-calling set, so that the efficiency and the accuracy of the flow duplicate removal are improved.
On the basis of the foregoing embodiment, the performing deduplication on the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data includes:
Determining identical traffic data in the recorded traffic according to the first identifier and the second identifier;
and removing the redundant copies of the identical traffic data so that only one piece is retained, thereby deduplicating the traffic data to be deduplicated.
In a specific implementation, deduplication may be performed with Spark. Spark offers several ways to deduplicate, such as distinct, group by, and the row_number window function. With distinct, one or more fields serve as the deduplication basis: when two records have the same values in those fields, they are considered identical and only one is retained. The group by operation uses the deduplication columns as aggregation fields and achieves deduplication through aggregation; for example, with the first identifier and the second identifier as aggregation fields, the traffic data is aggregated by these two identifiers and only one piece of traffic data is retained per group. The traffic data to be deduplicated is thereby deduplicated.
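The grouping logic can be illustrated without Spark. The sketch below is a plain-Python stand-in for the group-by approach, not Spark code: it keys each record on the pair (first identifier, second identifier) and keeps the first record seen per key. The field names `first_id`, `second_id` and `traffic` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def deduplicate(records: Iterable[dict]) -> List[dict]:
    """Keep one record per (first_id, second_id) pair, preserving first-seen order."""
    kept: Dict[Tuple[str, str], dict] = {}
    for record in records:
        key = (record["first_id"], record["second_id"])
        # setdefault stores the record only if the key has not been seen yet.
        kept.setdefault(key, record)
    return list(kept.values())


# Hypothetical traffic data: the first two records share both identifiers.
records = [
    {"first_id": "a1", "second_id": "s1", "traffic": "order #1"},
    {"first_id": "a1", "second_id": "s1", "traffic": "order #2"},
    {"first_id": "a1", "second_id": "s2", "traffic": "order #3"},
    {"first_id": "b7", "second_id": "s1", "traffic": "query #1"},
]
unique = deduplicate(records)
```

Note that the last record is kept even though its second identifier matches an earlier one, because its first identifier (recording interface) differs.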
In the embodiment of the application, the first identifier can be used for representing the uniqueness of the recording interface, and the second identifier can be used for representing the uniqueness of the sub-call, so that the recording flow is subjected to duplication removal through the first identifier and the second identifier, and the accuracy of duplication removal can be improved.
On the basis of the above embodiment, before obtaining the traffic data to be de-duplicated, the method further includes:
acquiring recorded traffic through the jvm-sandbox-repeater tool, wherein the recorded traffic comprises a sub-call set;
extracting the recording interface of the recorded traffic and generating a first identifier according to the recording interface;
and generating a service label corresponding to the recorded flow, and generating a second identifier according to the service label and the sub-call set.
In a specific implementation process, the recorded traffic is obtained by recording service requests with the jvm-sandbox-repeater tool running in the electronic device; for an explanation of recorded traffic, refer to the above embodiments, which is not repeated here. When responding to a service request sent by a user terminal, the service server may make multiple calls, so the recorded traffic includes traffic data corresponding to multiple sub-calls, and this traffic data forms the sub-call set.
When the service server reports the recorded traffic to the electronic device, the recording interface is carried in the reporting request; for example, the recording interface can be extracted from the http request through the entry call. After the recording interface is obtained, a first identifier corresponding to the recording interface may be generated according to a preset algorithm, as described in the above embodiment.
After the electronic device acquires the recorded traffic, it determines the service label corresponding to the recorded traffic according to a preset field in the recorded traffic. It can be understood that the correspondence between fields and service labels may be stored in the electronic device in advance. After the recorded traffic is obtained, each field in the recorded traffic is matched against the preset fields; a field that matches successfully is a preset field, and its service label is determined from the pre-stored correspondence.
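The field-to-label lookup can be sketched as a dictionary match. The mapping contents, the field names and the function `service_labels` are hypothetical examples; the application does not list concrete fields or labels.

```python
from typing import Dict, List

# Hypothetical pre-stored correspondence between preset fields and service labels.
FIELD_TO_LABEL: Dict[str, str] = {
    "couponId": "coupon",
    "activityId": "promotion",
}


def service_labels(recorded_traffic: dict) -> List[str]:
    """Return the labels of all preset fields present in the recorded traffic."""
    # Sorting is an assumption made here so the result is order-independent.
    return sorted(
        label for field, label in FIELD_TO_LABEL.items() if field in recorded_traffic
    )


with_coupon = service_labels({"userId": 7, "couponId": "C-1"})
with_both = service_labels({"activityId": "A-9", "couponId": "C-1"})
without_label = service_labels({"userId": 7})
```

The three results cover the cases named in the text: one label, several labels, and no label at all.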
After the electronic device obtains the service tag, a preset algorithm is adopted to generate the second identifier for the service tag and the sub-call set, and the specific preset algorithm can be referred to the above embodiment and is not described herein.
According to the embodiment of the application, the flow duplicate removal is performed by utilizing the first identifier corresponding to the recording interface and the second identifier generated by the service tag and the sub-calling set, so that the efficiency and the accuracy of the flow duplicate removal are improved.
On the basis of the above embodiment, the call type of the sub-call may be determined as follows:
the electronic equipment pre-stores the hash value corresponding to the main thread, calculates the hash value of the thread corresponding to the sub-call, compares the calculated hash value with the stored hash value of the main thread, determines that the sub-call is the main thread sub-call if the calculated hash value is consistent with the stored hash value of the main thread, and determines that the sub-call is the asynchronous thread sub-call if the calculated hash value is inconsistent with the stored hash value of the main thread.
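A minimal sketch of this comparison follows. It assumes each recorded sub-call carries a `thread_hash` value captured at recording time; that field, the `name` field and the constant `MAIN_THREAD_HASH` are hypothetical.

```python
from typing import List, Tuple

MAIN_THREAD_HASH = 1001  # hypothetical pre-stored hash value of the main thread


def split_by_call_type(
    sub_calls: List[dict], main_thread_hash: int
) -> Tuple[List[str], List[str]]:
    """Split sub-calls into (main thread sub-calls, asynchronous thread sub-calls)."""
    main_calls: List[str] = []
    async_calls: List[str] = []
    for call in sub_calls:
        # A sub-call whose thread hash equals the stored main-thread hash ran
        # on the main thread; any other hash means an asynchronous thread.
        if call["thread_hash"] == main_thread_hash:
            main_calls.append(call["name"])
        else:
            async_calls.append(call["name"])
    return main_calls, async_calls


sub_calls = [
    {"name": "redis.get", "thread_hash": 1001},
    {"name": "mybatis.select", "thread_hash": 1001},
    {"name": "mq.send", "thread_hash": 2002},
]
main_calls, async_calls = split_by_call_type(sub_calls, MAIN_THREAD_HASH)
```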
On the basis of the foregoing embodiment, the generating, according to the service tag and the sub-call set of the recording flow, a second identifier includes:
generating a main thread sub-call set for sub-calls of the main thread sub-call according to the call type;
generating an asynchronous thread sub-call set for the sub-call of the asynchronous thread sub-call according to the call type;
generating a target character string from the main thread sub-call set, the asynchronous thread sub-call set and the service tag according to a preset format;
And calculating the target character string by using a preset algorithm to obtain the second identifier.
In a specific implementation process, after determining the call types of the sub-calls, the electronic device may form a main thread sub-call set from the sub-calls that are main thread sub-calls and an asynchronous thread sub-call set from the sub-calls that are asynchronous thread sub-calls. The main thread sub-call set, the asynchronous thread sub-call set and the service tag are then combined into one large set, which forms the target character string. Specifically, the main thread sub-call set, the asynchronous thread sub-call set and the service tag are separated by English (half-width) commas, that is, target character string = main thread sub-call set,asynchronous thread sub-call set,service tag.
After the target character string is obtained, a preset algorithm, such as the MD5 algorithm or another hash algorithm, can be applied to the target character string to obtain the second identifier.
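The construction of the target character string and the second identifier can be sketched as follows. The snippet assumes that each sub-call is represented by a string name and that each set is de-duplicated and sorted before joining so that call order does not affect the result; neither detail is stated in the embodiment, and the function names are hypothetical. MD5 is used as the preset algorithm.

```python
import hashlib
from typing import Iterable


def target_string(main_calls: Iterable[str], async_calls: Iterable[str], tag: str) -> str:
    """Join the main thread set, the asynchronous thread set and the tag with commas."""
    # Assumption: each group is de-duplicated and sorted for a stable result.
    parts = sorted(set(main_calls)) + sorted(set(async_calls)) + [tag]
    return ",".join(parts)


def second_identifier(main_calls: Iterable[str], async_calls: Iterable[str], tag: str) -> str:
    """MD5 hex digest of the target character string."""
    return hashlib.md5(target_string(main_calls, async_calls, tag).encode("utf-8")).hexdigest()


# Hypothetical sub-call names used only for demonstration.
print(target_string(["queryStock", "createOrder"], ["sendMessage"], "order"))
print(second_identifier(["queryStock", "createOrder"], ["sendMessage"], "order"))
```

Because the service tag is part of the hashed string, two recording flows with the same sub-calls but different service tags receive different second identifiers.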
According to the embodiment of the application, the sub-call is divided into the main thread sub-call and the asynchronous thread sub-call, so that a plurality of unnecessary scenes can be avoided, and the accuracy of recording flow de-duplication is improved by combining the service labels.
On the basis of the above embodiment, after generating the second identifier, the method further includes:
Storing the first identifier, the second identifier and the recorded flow as one flow data to a search server;
correspondingly, the obtaining the flow data to be de-duplicated includes:
and obtaining the data of the flow to be de-duplicated in a preset time period from the search server.
In a specific implementation process, a service server can generate a large amount of service traffic in a short time, so the electronic device may acquire a large number of recording flows in real time. De-duplicating the recording flows in real time would increase the load on the electronic device. To reduce this load, after the first identifier and the second identifier corresponding to a recording flow are obtained, the embodiment of the application combines the recording flow with its first identifier and second identifier into one piece of flow data and stores that flow data in the search server. It can be understood that the flow data may also include information such as a timestamp corresponding to the recording flow. The timestamp may be, for example, the time at which the recording flow was generated, the time at which it was stored in the search server, or the time at which the electronic device received it.
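One piece of flow data as described above could be modeled as a small record. The sketch assumes the recording flow is kept as a serialized string and that the timestamp is the time at which the electronic device received it; the class and field names are hypothetical, and the actual call that writes the record to the search server is omitted.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class FlowData:
    first_id: str        # identifier generated from the recording interface
    second_id: str       # identifier generated from the sub-call set and service tag
    recording_flow: str  # the recorded flow itself, serialized (assumption)
    timestamp: str       # ISO 8601 time stamp (assumption: receive time)


def make_flow_data(first_id: str, second_id: str, recording_flow: str,
                   received_at: datetime) -> Dict[str, Any]:
    """Build the document that would be stored in the search server."""
    return asdict(FlowData(first_id, second_id, recording_flow, received_at.isoformat()))


doc = make_flow_data("a1", "b2", "{}", datetime(2022, 10, 1, 12, 0, tzinfo=timezone.utc))
print(doc)
```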
The electronic device may acquire the traffic data within the preset time period from the search server and treat the acquired traffic data as the traffic data to be de-duplicated. The preset time period may be a time period input by a tester, for example October 1 to October 2, 2022, or a time period determined from the current time, for example the day before the current date.
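The selection of a preset time period can be sketched as follows. The snippet computes the "previous day" window from the current time and builds a request body in the shape of an Elasticsearch range query, assuming the search server is Elasticsearch as in the Fig. 2 embodiment. The field name `timestamp` and the function names are hypothetical, and no client call is made.

```python
from datetime import datetime, time, timedelta
from typing import Any, Dict, Tuple


def previous_day_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return [start, end) covering the day before the date of `now`."""
    today_start = datetime.combine(now.date(), time.min)
    return today_start - timedelta(days=1), today_start


def range_query(window: Tuple[datetime, datetime], field: str = "timestamp") -> Dict[str, Any]:
    """Build a range-query body selecting flow data inside the window."""
    start, end = window
    return {"query": {"range": {field: {"gte": start.isoformat(), "lt": end.isoformat()}}}}


window = previous_day_window(datetime(2022, 10, 2, 9, 30))
print(range_query(window))
```

A half-open window (inclusive start, exclusive end) is used so that consecutive daily runs do not fetch the same flow data twice.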
It will be appreciated that, in order to reduce the storage pressure on the search server, traffic data in the search server may be cleaned up periodically; for example, traffic data older than two weeks may be deleted.
Because the volume of recorded traffic is large, the preprocessed flow data is first stored in the search server, and the flow data to be de-duplicated is later read back from the search server, which reduces the pressure on the recording platform.
Fig. 2 is a schematic flow chart of another flow deduplication method according to an embodiment of the present application, as shown in fig. 2, where the method includes:
The service server determines whether each sub-call is an asynchronous thread sub-call: after the service server generates service traffic according to a service request sent by the user side, it determines whether each sub-call contained in the service traffic runs in an asynchronous thread and marks the sub-call accordingly. For the method of determining whether the thread corresponding to a sub-call is an asynchronous thread, see the above embodiment; it is not described again here. For example, "0" may indicate that the thread corresponding to the sub-call is the main thread, and "1" may indicate that it is an asynchronous thread. It will be appreciated that other marking methods may also be used, and embodiments of the present application are not particularly limited in this regard.
The service server sends the service traffic to the traffic recording platform (jvm-sandbox-repeater) of the electronic device; specifically, the service traffic may be sent by means of an HTTP request (jvm-sandbox-repeater module). After receiving the service traffic, the traffic recording platform console (jvm-sandbox-repeater console) in the electronic device records it to obtain the recorded traffic.
The electronic device acquires the recording interface from the recorded traffic sent by the service server and generates an entry MD5 according to the recording interface.
The electronic device adds the main thread sub-call set, the asynchronous thread sub-call set and the service tag to one large set, joins the elements with English commas to form a character string (namely the target character string), and performs an MD5 calculation on the character string to obtain the sub MD5.
The entry MD5, the sub MD5 and the recorded traffic form one piece of traffic data, which is stored in Elasticsearch.
Multiple pieces of traffic data are fetched from Elasticsearch, and the recorded traffic is de-duplicated using Spark according to the entry MD5 and the sub MD5.
The de-duplicated traffic data is stored in a preset database, such as a MySQL database.
The embodiment of the application ensures the basic skeleton of the recorded service request by controlling the flow by utilizing the service tag and combining the sub-call set of the main thread and the sub-call set of the asynchronous thread, and improves the accuracy of the duplication removal on the basis of improving the duplication removal efficiency of the recorded flow.
Fig. 3 is a schematic structural diagram of a flow deduplication apparatus according to an embodiment of the present application, where the apparatus may be a module, a program segment, or a code on an electronic device. It should be understood that the apparatus corresponds to the embodiment of the method of fig. 1 described above, and is capable of performing the steps involved in the embodiment of the method of fig. 1, and specific functions of the apparatus may be referred to in the foregoing description, and detailed descriptions thereof are omitted herein as appropriate to avoid redundancy. The device comprises a data acquisition module 301 and a deduplication module 302, wherein:
The data acquisition module 301 is configured to acquire to-be-de-duplicated flow data, where the to-be-de-duplicated flow data includes a plurality of flow data, each flow data includes a first identifier, a second identifier, and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, and the second identifier is generated according to a sub-call set and a service tag corresponding to the recording flow;
the deduplication module 302 is configured to deduplicate the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.
Based on the above embodiment, the deduplication module 302 is specifically configured to:
determining identical flow data in the recorded flow according to the first identifier and the second identifier;
and removing the redundant copies of the identical flow data so that only one piece is retained, thereby de-duplicating the flow data to be de-duplicated.
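The de-duplication performed by this module can be sketched in plain Python. The Fig. 2 embodiment carries out this step with Spark; the dictionary-based version here only illustrates the rule that flow data with the same first identifier and second identifier are treated as identical and only one piece is kept. Keeping the first piece encountered is an assumption, and the key names `first_id` and `second_id` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def deduplicate(flow_data: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one piece of flow data per (first identifier, second identifier) pair."""
    kept: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for item in flow_data:
        key = (item["first_id"], item["second_id"])
        kept.setdefault(key, item)  # later duplicates of the same key are discarded
    return list(kept.values())


# Hypothetical flow data used only for demonstration.
sample = [
    {"first_id": "e1", "second_id": "s1", "recording_flow": "request A"},
    {"first_id": "e1", "second_id": "s1", "recording_flow": "request B"},
    {"first_id": "e1", "second_id": "s2", "recording_flow": "request C"},
]
print(deduplicate(sample))
```

Flow data that share an entry interface but differ in sub-calls or service tag are kept as separate pieces, since their second identifiers differ.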
On the basis of the above embodiment, the apparatus further includes a flow preprocessing module, configured to:
acquiring a recording flow through the jvm-sandbox-repeater tool, wherein the recording flow comprises a sub-call set;
extracting a recording interface of the recording flow and generating a first identifier according to the recording interface;
and generating a service label corresponding to the recorded flow, and generating a second identifier according to the service label and the sub-call set.
On the basis of the embodiment, the sub-call set comprises at least one sub-call, and the call type corresponding to the at least one sub-call is determined by a service server in advance according to the hash value of the thread corresponding to the sub-call, wherein the call type comprises a main thread sub-call and an asynchronous thread sub-call.
Based on the above embodiment, the flow preprocessing module is specifically configured to:
generating a main thread sub-call set for sub-calls of the main thread sub-call according to the call type;
generating an asynchronous thread sub-call set for the sub-call of the asynchronous thread sub-call according to the call type;
generating a target character string from the main thread sub-call set, the asynchronous thread sub-call set and the service tag according to a preset format;
And calculating the target character string by using a preset algorithm to obtain the second identifier.
On the basis of the above embodiment, the apparatus further includes a data storage module for:
Storing the first identifier, the second identifier and the recorded flow as one flow data to a search server;
Correspondingly, the data acquisition module is specifically used for:
and obtaining the data of the flow to be de-duplicated in a preset time period from the search server.
Based on the above embodiment, the flow preprocessing module is specifically configured to:
extracting a preset field from each piece of flow data;
and determining the service tag matched with the preset field from the pre-stored correspondence between fields and service tags.
Fig. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present application. As shown in Fig. 4, the electronic device includes a processor 401, a memory 402, and a bus 403, wherein:
the processor 401 and the memory 402 communicate with each other through the bus 403;
the processor 401 is configured to call the program instructions in the memory 402 to execute the method provided in the above method embodiments, for example: obtaining flow data to be de-duplicated, where the flow data to be de-duplicated includes a plurality of pieces of flow data, each piece of flow data includes a first identifier, a second identifier, and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, and the second identifier is generated according to a sub-call set and a service tag corresponding to the recording flow; and de-duplicating the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each piece of flow data.
The processor 401 may be an integrated circuit chip having signal processing capabilities. The processor 401 may be a general purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc., or may be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the various methods, steps, and logical blocks disclosed in embodiments of the application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 402 may include, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), and the like.
The embodiment discloses a computer program product, which comprises a computer program stored on a non-transitory computer readable storage medium, wherein the computer program comprises program instructions, when the program instructions are executed by a computer, the computer can execute the method provided by the method embodiments, for example, the method comprises the steps of obtaining flow data to be de-duplicated, wherein the flow data to be de-duplicated comprises a plurality of pieces of flow data, each piece of flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, the second identifier is generated according to a sub-call set and a service label corresponding to the recording flow, and the de-duplication is carried out on the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each piece of flow data.
The embodiment provides a non-transitory computer readable storage medium, which stores computer instructions, wherein the computer instructions enable a computer to execute the method provided by the method embodiments, for example, the method comprises the steps of obtaining to-be-de-duplicated flow data, wherein the to-be-de-duplicated flow data comprises a plurality of flow data, each flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, the second identifier is generated according to a sub-call set and a service label corresponding to the recording flow, and de-duplication is performed on the to-be-de-duplicated flow data according to the first identifier and the second identifier corresponding to each flow data.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
Further, the units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A traffic deduplication method, comprising:
The method comprises the steps of obtaining flow data to be de-duplicated, wherein the flow data to be de-duplicated comprises a plurality of pieces of flow data, each piece of flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, the second identifier is generated according to a sub-calling set corresponding to the recording flow and a service tag according to a preset algorithm, the first identifier is used for representing the global uniqueness of the recording interface corresponding to the recording flow, the second identifier is used for representing the global uniqueness of the recording flow, and the service tag is used for representing the type of the recording flow;
And de-duplicating the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each piece of flow data.
2. The method according to claim 1, wherein the de-duplication of the traffic data to be de-duplicated according to the first identifier and the second identifier corresponding to each traffic data includes:
determining identical flow data in the recorded flow according to the first identifier and the second identifier;
and removing the redundant copies of the identical flow data so that only one piece is retained, thereby de-duplicating the flow data to be de-duplicated.
3. The method of claim 1, wherein prior to obtaining the flow data to be de-duplicated, the method further comprises:
acquiring a recording flow through the jvm-sandbox-repeater tool, wherein the recording flow comprises a sub-call set;
extracting a recording interface of the recording flow and generating a first identifier according to the recording interface;
and generating a service label corresponding to the recorded flow, and generating a second identifier according to the service label and the sub-call set.
4. The method of claim 3, wherein the set of sub-calls comprises at least one sub-call, and wherein a call type corresponding to the at least one sub-call is determined by a service server in advance according to a hash value of a thread corresponding to the sub-call, and wherein the call type comprises a main thread sub-call and an asynchronous thread sub-call.
5. The method of claim 4, wherein the generating a second identification from the service tag and the set of sub-calls comprises:
generating a main thread sub-call set for sub-calls of the main thread sub-call according to the call type;
generating an asynchronous thread sub-call set for the sub-call of the asynchronous thread sub-call according to the call type;
generating a target character string from the main thread sub-call set, the asynchronous thread sub-call set and the service tag according to a preset format;
And calculating the target character string by using a preset algorithm to obtain the second identifier.
6. The method of claim 3, wherein the generating the service label corresponding to the recorded traffic comprises:
extracting a preset field from each piece of flow data;
And determining the business label matched with the preset field from the corresponding relation of the pre-stored field label.
7. The method of any of claims 3-6, wherein after generating the second identifier, the method further comprises:
Storing the first identifier, the second identifier and the recorded flow as one flow data to a search server;
correspondingly, the obtaining the flow data to be de-duplicated includes:
and obtaining the data of the flow to be de-duplicated in a preset time period from the search server.
8. A flow deduplication apparatus, comprising:
a data acquisition module, used for acquiring flow data to be de-duplicated, wherein the flow data to be de-duplicated comprises a plurality of pieces of flow data, each piece of flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, the second identifier is generated according to a sub-call set corresponding to the recording flow and a service tag according to a preset algorithm, the first identifier is used for representing the global uniqueness of the recording interface corresponding to the recording flow, the second identifier is used for representing the global uniqueness of the recording flow, and the service tag is used for representing the type of the recording flow;
and the de-duplication module is used for de-duplication the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each flow data.
9. An electronic device comprising a processor, a memory and a bus, wherein,
The processor and the memory complete communication with each other through the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-7.
CN202211268231.1A 2022-10-17 2022-10-17 Traffic deduplication method, device, electronic device and storage medium Active CN115567412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211268231.1A CN115567412B (en) 2022-10-17 2022-10-17 Traffic deduplication method, device, electronic device and storage medium


Publications (2)

Publication Number Publication Date
CN115567412A CN115567412A (en) 2023-01-03
CN115567412B true CN115567412B (en) 2025-02-14

Family

ID=84746954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211268231.1A Active CN115567412B (en) 2022-10-17 2022-10-17 Traffic deduplication method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115567412B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104322039A (en) * 2012-12-31 2015-01-28 华为技术有限公司 System architecture, subsystem, and method for opening of telecommunication network capability
CN112214395A (en) * 2020-09-02 2021-01-12 浙江大搜车融资租赁有限公司 Interface testing method based on flow data, electronic device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9516049B2 (en) * 2013-11-13 2016-12-06 ProtectWise, Inc. Packet capture and network traffic replay
CN112637005B (en) * 2020-12-08 2022-06-14 广州品唯软件有限公司 Flow playback method and device, computer equipment and storage medium
CN114710562B (en) * 2022-03-31 2022-11-08 珠海市鸿瑞信息技术股份有限公司 Big data-based equipment application log correlation analysis system and method


Also Published As

Publication number Publication date
CN115567412A (en) 2023-01-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 6416, Building 13, No. 723 Tongxin Road, Hongkou District, Shanghai 200080

Applicant after: Shanghai Dewu Information Group Co.,Ltd.

Address before: Room B6-2005, No. 121 Zhongshan North 1st Road, Hongkou District, Shanghai

Applicant before: SHANGHAI SHIZHUANG INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20230103

Assignee: Shanghai Dewen Information Technology Co.,Ltd.

Assignor: Shanghai Dewu Information Group Co.,Ltd.

Contract record no.: X2025980039885

Denomination of invention: Traffic de-duplication method, device, electronic equipment and storage medium

Granted publication date: 20250214

License type: Common License

Record date: 20251128
