Disclosure of Invention
The embodiments of the present application aim to provide a traffic deduplication method and apparatus, an electronic device, and a storage medium, so as to improve the efficiency of traffic deduplication.
In a first aspect, an embodiment of the present application provides a traffic deduplication method, including:
obtaining traffic data to be deduplicated, wherein the traffic data to be deduplicated comprises a plurality of pieces of traffic data, each piece of traffic data comprises a first identifier, a second identifier and recorded traffic, the first identifier is generated according to a recording interface corresponding to the recorded traffic, and the second identifier is generated according to a sub-call set and a service tag corresponding to the recorded traffic;
and deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.
In the embodiments of the present application, traffic is deduplicated using the first identifier, which corresponds to the recording interface, and the second identifier, which is generated from the service tag and the sub-call set, thereby improving both the efficiency and the accuracy of traffic deduplication.
In any embodiment, deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data includes:
determining identical pieces of traffic data in the recorded traffic according to the first identifier and the second identifier;
and removing the redundant copies of the identical traffic data so that only one piece is retained, thereby deduplicating the traffic data to be deduplicated.
In the embodiment of the application, the first identifier can be used for representing the uniqueness of the recording interface, and the second identifier can be used for representing the uniqueness of the sub-call, so that the recording flow is subjected to duplication removal through the first identifier and the second identifier, and the accuracy of duplication removal can be improved.
In any embodiment, before obtaining the to-be-de-duplicated traffic data, the method further comprises:
acquiring recorded traffic through the jvm-sandbox-repeater tool, wherein the recorded traffic comprises a sub-call set;
extracting the recording interface of the recorded traffic and generating the first identifier according to the recording interface;
and generating a service tag corresponding to the recorded traffic, and generating the second identifier according to the service tag and the sub-call set.
In the embodiments of the present application, traffic is deduplicated using the first identifier, which corresponds to the recording interface, and the second identifier, which is generated from the service tag and the sub-call set, thereby improving both the efficiency and the accuracy of traffic deduplication.
In any embodiment, the sub-call set comprises at least one sub-call, and the call type corresponding to the at least one sub-call is determined by the service server in advance according to the hash value of the thread corresponding to the sub-call, wherein the call type comprises a main thread sub-call and an asynchronous thread sub-call.
According to the embodiment of the application, the sub-call is divided into the main thread sub-call and the asynchronous thread sub-call, so that a plurality of unnecessary scenes can be avoided, and the accuracy of recording flow de-duplication is improved.
In any embodiment, generating the second identifier according to the service tag and the sub-call set of the recorded traffic includes:
generating a main thread sub-call set from the sub-calls whose call type is main thread sub-call;
generating an asynchronous thread sub-call set from the sub-calls whose call type is asynchronous thread sub-call;
Generating a target character string from the main thread sub-call set, the asynchronous thread sub-call set and the service tag according to a preset format;
And calculating the target character string by using a preset algorithm to obtain the second identifier.
According to the embodiment of the application, the sub-call is divided into the main thread sub-call and the asynchronous thread sub-call, so that a plurality of unnecessary scenes can be avoided, and the accuracy of recording flow de-duplication is improved by combining the service labels.
In any embodiment, after generating the second identification, the method further comprises:
storing the first identifier, the second identifier and the recorded traffic in a search server as one piece of traffic data;
correspondingly, the obtaining the flow data to be de-duplicated includes:
and obtaining the data of the flow to be de-duplicated in a preset time period from the search server.
Because the volume of recorded traffic is large, the preprocessed traffic data is stored in a search server, and the traffic data to be deduplicated is later read from the search server, which reduces the pressure on the recording platform.
In any embodiment, generating the service label corresponding to the recorded traffic includes:
extracting a preset field from each piece of flow data;
and determining the service tag matching the preset field from a pre-stored correspondence between fields and service tags.
According to the embodiment of the application, the corresponding service label is generated for each flow data, and whether the scenes corresponding to the flow data are the same or not is reflected by the service label, so that the accuracy of recording the flow de-duplication is improved.
In a second aspect, an embodiment of the present application provides a traffic deduplication apparatus, including:
a data acquisition module, configured to acquire traffic data to be deduplicated, wherein the traffic data to be deduplicated comprises a plurality of pieces of traffic data, each piece of traffic data comprises a first identifier, a second identifier and recorded traffic, the first identifier is generated according to a recording interface corresponding to the recorded traffic, and the second identifier is generated according to a sub-call set and a service tag corresponding to the recorded traffic;
and a deduplication module, configured to deduplicate the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor, a memory, and a bus, wherein,
The processor and the memory complete communication with each other through the bus;
The memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a non-transitory computer readable storage medium comprising:
The non-transitory computer-readable storage medium stores computer instructions that cause the computer to perform the method of the first aspect.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Detailed Description
Embodiments of the technical scheme of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and thus are merely examples, and are not intended to limit the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs, the terms used herein are for the purpose of describing particular embodiments only and are not intended to be limiting of the application, and the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the above description of the drawings are intended to cover non-exclusive inclusions.
In the description of embodiments of the present application, the technical terms "first," "second," and the like are used merely to distinguish between different objects and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated, a particular order or a primary or secondary relationship. In the description of the embodiments of the present application, the meaning of "plurality" is two or more unless explicitly defined otherwise.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In the description of the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, that A and B exist together, or that B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
In the description of the embodiments of the present application, the term "plurality" means two or more (including two), and similarly, "plural sets" means two or more (including two), and "plural sheets" means two or more (including two).
In the description of the embodiments of the present application, the orientation or positional relationship indicated by the technical terms "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. are based on the orientation or positional relationship shown in the drawings, and are merely for convenience of description and simplification of the description, and do not indicate or imply that the apparatus or element referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the embodiments of the present application.
In the description of the embodiments of the present application, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "fixed" and the like are to be construed broadly and include, for example, fixed connection, detachable connection, or integral therewith, mechanical connection, electrical connection, direct connection, indirect connection via an intermediary, communication between two elements, or interaction between two elements. The specific meaning of the above terms in the embodiments of the present application will be understood by those of ordinary skill in the art according to specific circumstances.
The traffic generated by a production environment every day is massive and highly repetitive, and a large number of repeated service requests cannot effectively test the service system under test. A scheme is therefore needed that can deduplicate the large number of service requests of a production system.
At present, traffic can be controlled by manually labeling the traffic data: a corresponding label is set manually for each piece of traffic, and traffic sharing the same label is removed so that only one piece remains for each label, thereby deduplicating the traffic. However, when the volume of traffic is large, manual labeling makes traffic deduplication inefficient.
In order to solve the above technical problem, the present inventors provide a traffic deduplication method in which pieces of traffic data are distinguished from one another according to the recording interface, the service tag corresponding to each piece of traffic data, and the sub-call set corresponding to each piece of traffic data, so as to deduplicate the recorded traffic.
Before describing the embodiments of the present application, in order to better understand the present application, related concepts will be described:
Entry call: a call formed by an externally initiated request is called an entry call. Currently, the entry types supported by the traffic playback platform include http, dubbo, mq (RocketMQ) and the like.
Sub-call: a call formed by an out-of-process call or by a Java method (which requires enhancement), such as out-of-process calls (redis, mybatis), enhanced Java methods, custom sub-calls and the like. Sub-calls can be divided into main thread sub-calls and asynchronous thread sub-calls.
jvm-sandbox-repeater: a general-purpose server-side recording/playback solution based on JVM-Sandbox.
Recording: serializing and storing the input parameters, output parameters, downstream RPC, DB, cache and the like of one request.
Playback: restoring the recorded data, re-initiating the request once or N times, and applying MOCK to specific downstream nodes.
MOCK: during playback, an intercepted sub-call does not make a real call; instead, the return value captured at recording time is returned directly by using the flow intervention capability of Sandbox.
Search server: a distributed, highly scalable and highly real-time search and data analysis engine, for example Elasticsearch. It conveniently gives large amounts of data the ability to be searched, analyzed and explored, and the horizontal scalability of Elasticsearch makes the data more valuable in a production environment. The implementation principle of Elasticsearch mainly comprises the following steps: first, a user submits data to Elasticsearch; a tokenizer then segments the corresponding text, and the weights and the segmentation results are stored together with the data; when the user searches the data, the results are scored and ranked according to the weights and then returned to the user.
It can be understood that the traffic deduplication method provided by the embodiments of the present application can be applied to an electronic device, which may be a terminal or a server; the terminal may be a smartphone, a tablet computer, a personal digital assistant (PDA) or the like, and the server may be an application server or a Web server. The electronic device is provided with a traffic recording platform, which can be communicatively connected to the service server and receives the recorded traffic sent by the service server.
Fig. 1 is a schematic flow chart of a flow deduplication method according to an embodiment of the present application, as shown in fig. 1, where the method includes:
Step 101, obtaining flow data to be de-duplicated, wherein the flow data to be de-duplicated comprises a plurality of flow data, each flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, and the second identifier is generated according to a sub-calling set and a service label corresponding to the recording flow;
Step 102, deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.
In step 101, the data of the traffic to be de-duplicated may be pre-stored in a preset location, for example, in a search server, or in a database. The electronic equipment can acquire flow data in a preset time period from the search server according to a preset period to form flow data to be de-duplicated. For example, the preset period may be one day, and the electronic device may acquire all traffic data of the previous day at 10 am every day.
Since there are many flow data generated in one day and many repeated flow data, it is necessary to deduplicate the repeated flow data. It is understood that the repeated traffic data refers to traffic data of the same scene corresponding to the same recording interface. Taking online shopping as an example, two users submit orders in the same activity of the same online shopping platform, and flow data corresponding to the two orders belong to repeated flow data.
The first identifier is an identifier for representing global uniqueness of a recording interface corresponding to the recording flow, and can be obtained by calculating the recording interface by adopting an MD5 algorithm, or by calculating the recording interface by adopting a hash algorithm, or by adopting other algorithms, which is not particularly limited in the embodiment of the application.
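As a minimal sketch of how the first identifier might be computed, assuming the recording interface is available as a plain string and MD5 is the chosen algorithm, the following uses Python's standard hashlib. The function name and the sample interface strings are hypothetical and serve only as illustration.

```python
import hashlib


def first_identifier(recording_interface: str) -> str:
    """Return the hex MD5 digest of the recording interface string."""
    return hashlib.md5(recording_interface.encode("utf-8")).hexdigest()


# Hypothetical interface names, for illustration only.
id_a = first_identifier("POST /order/submit")
id_b = first_identifier("POST /order/submit")
id_c = first_identifier("GET /order/detail")

print(id_a == id_b)  # same interface, same identifier
print(id_a == id_c)  # different interface, different identifier
```

Because the digest depends only on the interface string, all recorded traffic captured at the same recording interface shares one first identifier.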
The recorded traffic is obtained as follows: the user terminal sends a service request to the service server; the service server sends the service traffic generated while responding to the service request to the electronic device in a preset request mode; and the traffic recording platform in the electronic device records that service traffic. It will be appreciated that one service request may correspond to one piece of recorded traffic. The service server can send the service traffic to the electronic device in the form of an http request. The http request includes the recording interface, so the recording interface can be extracted from the http request.
The second identifier is used to represent the global uniqueness of the recorded traffic and is obtained by applying a preset algorithm to the sub-call set and the service tag corresponding to the recorded traffic. The preset algorithm may be a one-way function algorithm, that is, an algorithm for which it is difficult to derive the specific input data from the output result. The one-way function algorithm may be a cryptographic hash function, specifically an MD5 algorithm, a hash algorithm or the like. The recorded traffic includes a sub-call set, and the sub-calls in it may include main thread sub-calls and asynchronous thread sub-calls.
The service tag is used to characterize the type of the recorded traffic and can be determined from certain specific fields in the recorded traffic. The recorded traffic may correspond to one service tag, to a plurality of service tags, or to no service tag. When there is no service tag, the electronic device may generate the second identifier from the sub-call set alone.
In step 102, since the same recording interface may include traffic data of multiple scenes and the same scene may include multiple pieces of traffic data, multiple pieces of traffic data belonging to the same scene in the same recording interface need to be deduplicated. In addition, different recording interfaces are considered to be different flows even if the flow data included in the interfaces are the same. Therefore, in order to improve the accuracy of flow deduplication, the recording interface can be distinguished by the first identifier, and the flow data can be distinguished by the second identifier, so that the data of the flow to be deduplicated is deduplicated according to the first identifier and the second identifier.
In the embodiments of the present application, traffic is deduplicated using the first identifier, which corresponds to the recording interface, and the second identifier, which is generated from the service tag and the sub-call set, thereby improving both the efficiency and the accuracy of traffic deduplication.
On the basis of the foregoing embodiment, the performing deduplication on the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data includes:
determining identical pieces of traffic data in the recorded traffic according to the first identifier and the second identifier;
and removing the redundant copies of the identical traffic data so that only one piece is retained, thereby deduplicating the traffic data to be deduplicated.
In a specific implementation, Spark may be used for deduplication. Spark offers several deduplication approaches, such as distinct deduplication, the group by operation, and the row_number window operation. Distinct deduplication takes one or more fields as the deduplication basis: when two pieces of data have the same values for those fields, they are regarded as the same data and only one piece is retained. The group by operation uses the deduplication columns as aggregation fields and achieves deduplication through aggregation; for example, the first identifier and the second identifier are used as the aggregation fields, the traffic data are aggregated by the first identifier and the second identifier, and only one piece of traffic data is retained for each aggregated group. In this way, the traffic data to be deduplicated is deduplicated.
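The grouping semantics described here can be illustrated without a Spark cluster. The sketch below is plain Python, not Spark code: it keeps the first record seen for each (first identifier, second identifier) pair, which has the same effect as aggregating on the two identifiers and retaining one row per group. The record layout and the field names first_id, second_id and traffic are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple


def deduplicate(records: Iterable[dict]) -> List[dict]:
    """Keep one record per (first_id, second_id) pair, preserving first-seen order."""
    kept: Dict[Tuple[str, str], dict] = {}
    for record in records:
        key = (record["first_id"], record["second_id"])
        # setdefault stores the record only if the key has not been seen yet.
        kept.setdefault(key, record)
    return list(kept.values())


# Hypothetical records: two orders on the same interface and scene are duplicates.
records = [
    {"first_id": "if-1", "second_id": "scene-a", "traffic": "order from user 1"},
    {"first_id": "if-1", "second_id": "scene-a", "traffic": "order from user 2"},
    {"first_id": "if-1", "second_id": "scene-b", "traffic": "order with coupon"},
    {"first_id": "if-2", "second_id": "scene-a", "traffic": "query on another interface"},
]
unique = deduplicate(records)
print(len(unique))
```

Note that the last record is kept even though its second identifier matches an earlier one, because it belongs to a different recording interface.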
In the embodiment of the application, the first identifier can be used for representing the uniqueness of the recording interface, and the second identifier can be used for representing the uniqueness of the sub-call, so that the recording flow is subjected to duplication removal through the first identifier and the second identifier, and the accuracy of duplication removal can be improved.
On the basis of the above embodiment, before obtaining the traffic data to be de-duplicated, the method further includes:
acquiring recorded traffic through the jvm-sandbox-repeater tool, wherein the recorded traffic comprises a sub-call set;
extracting the recording interface of the recorded traffic and generating the first identifier according to the recording interface;
and generating a service tag corresponding to the recorded traffic, and generating the second identifier according to the service tag and the sub-call set.
In a specific implementation process, the recorded traffic is obtained by recording the service request with the jvm-sandbox-repeater tool running in the electronic device; for an explanation of the recorded traffic, refer to the above embodiments, which is not repeated here. When responding to a service request sent by a user terminal, the service server may make multiple calls, so the recorded traffic includes traffic data corresponding to multiple sub-calls, and the traffic data corresponding to these sub-calls forms the sub-call set.
When the service server reports the recording flow to the electronic equipment, the recording interface is carried in the reporting request, for example, the recording interface can be extracted from the http request through the entrance call. After the recording interface is obtained, a first identifier corresponding to the recording interface may be generated according to a preset algorithm, which may be specifically referred to the above embodiment.
After the electronic device acquires the recorded traffic, it determines the service tag corresponding to the recorded traffic according to a preset field in the recorded traffic. The correspondence between fields and service tags may be stored in the electronic device in advance; after the recorded traffic is obtained, each field in the recorded traffic is matched against the preset fields, and a field that matches successfully is a preset field. The service tag of that preset field is then determined from the pre-stored correspondence.
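The field matching step can be sketched as a dictionary lookup. The example assumes that the recorded traffic is available as a mapping of field names to values and that the pre-stored correspondence is keyed on field names; both are assumptions, and the field names and tags shown are hypothetical.

```python
from typing import Dict, List

# Hypothetical pre-stored correspondence between preset fields and service tags.
FIELD_TO_TAG: Dict[str, str] = {
    "couponId": "coupon",
    "groupBuyId": "group_buy",
}


def service_tags(recorded_traffic: dict) -> List[str]:
    """Return the sorted service tags of all preset fields present in the traffic."""
    matched = {FIELD_TO_TAG[name] for name in recorded_traffic if name in FIELD_TO_TAG}
    return sorted(matched)


print(service_tags({"userId": 1, "couponId": "C9"}))
print(service_tags({"userId": 1}))
```

The result may hold one tag, several tags, or none, which mirrors the three cases the description allows for.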
After the electronic device obtains the service tag, a preset algorithm is adopted to generate the second identifier for the service tag and the sub-call set, and the specific preset algorithm can be referred to the above embodiment and is not described herein.
In the embodiments of the present application, traffic is deduplicated using the first identifier, which corresponds to the recording interface, and the second identifier, which is generated from the service tag and the sub-call set, thereby improving both the efficiency and the accuracy of traffic deduplication.
On the basis of the above embodiment, the call type of the sub-call may be determined as follows:
the electronic equipment pre-stores the hash value corresponding to the main thread, calculates the hash value of the thread corresponding to the sub-call, compares the calculated hash value with the stored hash value of the main thread, determines that the sub-call is the main thread sub-call if the calculated hash value is consistent with the stored hash value of the main thread, and determines that the sub-call is the asynchronous thread sub-call if the calculated hash value is inconsistent with the stored hash value of the main thread.
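The comparison just described can be sketched as follows, assuming each sub-call is represented as a mapping that carries the hash value of its thread. The keys name and thread_hash, and the sample values, are hypothetical.

```python
from typing import List, Tuple


def split_sub_calls(sub_calls: List[dict], main_thread_hash: int) -> Tuple[List[str], List[str]]:
    """Split sub-calls into main thread and asynchronous thread groups by thread hash."""
    main_calls: List[str] = []
    async_calls: List[str] = []
    for call in sub_calls:
        if call["thread_hash"] == main_thread_hash:
            main_calls.append(call["name"])   # hash matches the stored main thread hash
        else:
            async_calls.append(call["name"])  # any other hash is treated as asynchronous
    return main_calls, async_calls


# Hypothetical sub-calls; 1001 stands in for the stored main thread hash value.
sub_calls = [
    {"name": "redis.get", "thread_hash": 1001},
    {"name": "mq.send", "thread_hash": 2002},
    {"name": "mybatis.select", "thread_hash": 1001},
]
main_calls, async_calls = split_sub_calls(sub_calls, main_thread_hash=1001)
print(main_calls, async_calls)
```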
On the basis of the foregoing embodiment, the generating, according to the service tag and the sub-call set of the recording flow, a second identifier includes:
generating a main thread sub-call set from the sub-calls whose call type is main thread sub-call;
generating an asynchronous thread sub-call set from the sub-calls whose call type is asynchronous thread sub-call;
generating a target character string from the main thread sub-call set, the asynchronous thread sub-call set and the service tag according to a preset format;
And calculating the target character string by using a preset algorithm to obtain the second identifier.
In a specific implementation process, after determining the call types of the sub-calls, the electronic device may form a main thread sub-call set from the sub-calls that are main thread sub-calls, and form an asynchronous thread sub-call set from the sub-calls that are asynchronous thread sub-calls. The main thread sub-call set, the asynchronous thread sub-call set and the service tag are then combined into one large set, from which the target character string is formed. Specifically, the main thread sub-call set, the asynchronous thread sub-call set and the service tag are separated by English (half-width) commas to obtain the target character string, that is, target character string = main thread sub-call set,asynchronous thread sub-call set,service tag.
After the target character string is obtained, a preset algorithm such as an MD5 algorithm or a hash algorithm can be adopted to calculate the target character string, so that the second identifier is obtained.
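A minimal sketch of this calculation is given below, using MD5 from Python's hashlib. The description fixes only the comma-separated order of the three parts; how each set is serialized is not specified, so the sketch assumes that every sub-call is represented by a name string and that the members of each part are de-duplicated, sorted and joined with "|". That serialization, the function name and the sample values are assumptions.

```python
import hashlib
from typing import Iterable


def second_identifier(main_calls: Iterable[str],
                      async_calls: Iterable[str],
                      tags: Iterable[str]) -> str:
    """Build the target string 'main,async,tags' and return its hex MD5 digest."""
    # Assumed serialization: unique members, sorted, joined with "|".
    main_part = "|".join(sorted(set(main_calls)))
    async_part = "|".join(sorted(set(async_calls)))
    tag_part = "|".join(sorted(set(tags)))
    target = ",".join([main_part, async_part, tag_part])
    return hashlib.md5(target.encode("utf-8")).hexdigest()


# Hypothetical sub-call names and tag.
sid_1 = second_identifier(["redis.get", "mybatis.select"], ["mq.send"], ["coupon"])
sid_2 = second_identifier(["mybatis.select", "redis.get"], ["mq.send"], ["coupon"])
sid_3 = second_identifier(["redis.get", "mybatis.select", "mq.send"], [], ["coupon"])
sid_4 = second_identifier(["redis.get", "mybatis.select"], ["mq.send"], [])

print(sid_1 == sid_2)  # ordering inside a set does not matter in this sketch
print(sid_1 == sid_3)  # moving a call between thread groups changes the identifier
```

Sorting makes the identifier insensitive to the order in which sub-calls were recorded, which is a design choice of this sketch. The call with an empty tag list corresponds to the case where the recorded traffic has no service tag.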
According to the embodiment of the application, the sub-call is divided into the main thread sub-call and the asynchronous thread sub-call, so that a plurality of unnecessary scenes can be avoided, and the accuracy of recording flow de-duplication is improved by combining the service labels.
On the basis of the above embodiment, after generating the second identifier, the method further includes:
Storing the first identifier, the second identifier and the recorded flow as one flow data to a search server;
correspondingly, the obtaining the flow data to be de-duplicated includes:
and obtaining the data of the flow to be de-duplicated in a preset time period from the search server.
In a specific implementation process, since a large amount of service flows can be generated in a service server in a short time, the electronic device can acquire a large amount of recording flows in real time. If the recorded flow is de-duplicated in real time, the load pressure of the electronic equipment is increased. In order to reduce the load pressure of the electronic equipment, after the first identifier and the second identifier corresponding to the recorded flow are obtained, the embodiment of the application generates a piece of flow data from the recorded flow and the first identifier and the second identifier corresponding to the recorded flow, and stores the generated flow data in the search server. It can be understood that the flow data may also include information such as a timestamp corresponding to the recorded flow. It can be appreciated that the timestamp of the recorded traffic may be the time of generating the recorded traffic, the time of storing the recorded traffic in the search server, the time of receiving the recorded traffic by the electronic device, and so on.
The electronic device may acquire the traffic data within the preset time period from the search server and take the acquired traffic data as the traffic data to be deduplicated. The preset time period may be a time period input by a tester, for example October 1 to October 2, 2022, or may be a time period determined according to the current time, for example the day before the date corresponding to the current time.
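The case where the time period is derived from the current time can be sketched as below. The example computes the half-open window covering the previous calendar day and filters in-memory records by it; a real deployment would express the same range as a query against the search server. The timestamp field and the sample records are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def previous_day_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return [start, end) covering the calendar day before the date of `now`."""
    today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_start - timedelta(days=1), today_start


def select_in_window(records: List[dict], start: datetime, end: datetime) -> List[dict]:
    """Keep the records whose timestamp falls inside [start, end)."""
    return [r for r in records if start <= r["timestamp"] < end]


# Hypothetical run at 10:00 on 2 October 2022.
start, end = previous_day_window(datetime(2022, 10, 2, 10, 0))
records = [
    {"id": 1, "timestamp": datetime(2022, 9, 30, 23, 59)},
    {"id": 2, "timestamp": datetime(2022, 10, 1, 8, 30)},
    {"id": 3, "timestamp": datetime(2022, 10, 2, 0, 0)},
]
selected = select_in_window(records, start, end)
print(start, end, [r["id"] for r in selected])
```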
It will be appreciated that traffic data in the search server may be cleaned up periodically in order to reduce the storage pressure of the search server, for example, traffic data two weeks away from the current time may be cleaned up.
Because the volume of recorded traffic is large, the preprocessed traffic data is stored in a search server, and the traffic data to be deduplicated is later read from the search server, which reduces the pressure on the recording platform.
Fig. 2 is a schematic flow chart of another flow deduplication method according to an embodiment of the present application, as shown in fig. 2, where the method includes:
The service server determines whether each sub-call is made on an asynchronous thread; that is, after generating service traffic according to the service request sent by the user terminal, the service server determines whether each sub-call contained in the service traffic corresponds to an asynchronous thread and marks the sub-call accordingly. For the method of determining whether the thread corresponding to a sub-call is an asynchronous thread, refer to the above embodiment; it is not repeated here. For example, "0" may indicate that the thread corresponding to the sub-call is the main thread, and "1" may indicate that it is an asynchronous thread. Other marking methods may also be used, and the embodiments of the present application are not particularly limited in this regard.
The service server sends the service traffic to the traffic recording platform (jvm-sandbox-repeater) of the electronic device; specifically, the service traffic may be sent by an http request (the jvm-sandbox-repeater module). After receiving the service traffic, the traffic recording platform console (the jvm-sandbox-repeater console) in the electronic device records it to obtain the recorded traffic.
The electronic device obtains the recording interface from the recorded traffic sent by the service server and generates an entry MD5 from the recording interface.
The electronic device merges the main thread sub-call set, the asynchronous thread sub-call set and the service tag into one combined set, joins the elements of the combined set with ASCII commas to form a character string (namely the target character string), and computes the MD5 of the character string to obtain the sub MD5.
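A minimal sketch of the two identifiers, using Python's standard `hashlib`. The function names are hypothetical. Removing repeated sub-calls and sorting each set before joining are assumptions added here so that the same sub-calls always yield the same digest regardless of recording order; the text itself only specifies comma-joining followed by MD5.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def entry_md5(recording_interface):
    """First identifier: MD5 of the recording interface."""
    return md5_hex(recording_interface)


def sub_md5(main_calls, async_calls, service_tag):
    """Second identifier: MD5 of the comma-joined target character string.

    Assumption: duplicates are dropped and each group is sorted so that
    the digest does not depend on the order in which sub-calls were seen.
    """
    parts = sorted(set(main_calls)) + sorted(set(async_calls)) + [service_tag]
    target_string = ",".join(parts)
    return md5_hex(target_string)
```

Two recordings of the same interface that trigger the same sub-calls under the same service tag then share both digests, which is what makes them candidates for de-duplication.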
The entry MD5, the sub MD5 and the recorded traffic are combined into one piece of traffic data, which is stored in Elasticsearch.
Multiple pieces of traffic data are fetched from Elasticsearch, and the recorded traffic is de-duplicated with Spark according to the entry MD5 and the sub MD5.
The de-duplicated traffic data is stored in a preset database, such as a MySQL database.
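The de-duplication rule itself can be shown without a cluster. The sketch below is plain Python rather than Spark, and the field names `entry_md5` and `sub_md5` are hypothetical; it keeps the first piece of traffic data seen for each (entry MD5, sub MD5) pair. In PySpark, a comparable effect could presumably be obtained with `DataFrame.dropDuplicates` on the same two columns.

```python
def deduplicate(flow_records):
    """Keep one record per (entry_md5, sub_md5) pair, in first-seen order."""
    kept = {}
    for record in flow_records:
        key = (record["entry_md5"], record["sub_md5"])
        # setdefault stores the record only if the key is not present yet,
        # so later duplicates are ignored.
        kept.setdefault(key, record)
    # Dicts preserve insertion order, so the output follows first appearance.
    return list(kept.values())
```

Using both identifiers as the key means two recordings of the same interface are still kept apart when their sub-calls or service tags differ.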
By using the service tag to govern the traffic and combining the main thread sub-call set with the asynchronous thread sub-call set, the embodiment of the application preserves the basic skeleton of the recorded service request, and improves the accuracy of de-duplication in addition to improving the de-duplication efficiency of the recorded flow.
Fig. 3 is a schematic structural diagram of a flow deduplication apparatus according to an embodiment of the present application, where the apparatus may be a module, a program segment, or code on an electronic device. It should be understood that the apparatus corresponds to the method embodiment of fig. 1 described above and is capable of performing the steps involved in that embodiment; for the specific functions of the apparatus, refer to the foregoing description, and detailed descriptions are omitted here where appropriate to avoid repetition. The apparatus comprises a data acquisition module 301 and a deduplication module 302, wherein:
The data acquisition module 301 is configured to acquire to-be-de-duplicated flow data, where the to-be-de-duplicated flow data includes a plurality of pieces of flow data, each piece of flow data includes a first identifier, a second identifier, and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, and the second identifier is generated according to a sub-call set and a service tag corresponding to the recording flow;
the deduplication module 302 is configured to deduplicate the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.
Based on the above embodiment, the deduplication module 302 is specifically configured to:
determining identical flow data in the recorded flow according to the first identifier and the second identifier;
and removing the redundant copies of the identical flow data so that only one piece of flow data is retained, thereby de-duplicating the flow data to be de-duplicated.
On the basis of the above embodiment, the apparatus further includes a flow preprocessing module, configured to:
acquiring the recording flow using the jvm sandbox repeater tool, wherein the recording flow comprises a sub-call set;
extracting a recording interface of the recording flow and generating the first identifier according to the recording interface;
and generating a service label corresponding to the recorded flow, and generating a second identifier according to the service label and the sub-call set.
On the basis of the above embodiment, the sub-call set comprises at least one sub-call, and the call type corresponding to each sub-call is determined in advance by a service server according to the hash value of the thread corresponding to the sub-call, wherein the call type is either main thread sub-call or asynchronous thread sub-call.
Based on the above embodiment, the flow preprocessing module is specifically configured to:
generating, according to the call type, a main thread sub-call set from the sub-calls whose call type is main thread sub-call;
generating, according to the call type, an asynchronous thread sub-call set from the sub-calls whose call type is asynchronous thread sub-call;
generating a target character string from the main thread sub-call set, the asynchronous thread sub-call set and the service tag according to a preset format;
and computing the second identifier from the target character string by using a preset algorithm.
On the basis of the above embodiment, the apparatus further includes a data storage module for:
Storing the first identifier, the second identifier and the recorded flow as one flow data to a search server;
Correspondingly, the flow preprocessing module is specifically used for:
and obtaining the data of the flow to be de-duplicated in a preset time period from the search server.
Based on the above embodiment, the flow preprocessing module is specifically configured to:
extracting a preset field from each piece of flow data;
and determining the service label matched with the preset field from a pre-stored correspondence between fields and labels.
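The label lookup can be sketched as a dictionary lookup. Everything named here is hypothetical: the preset field `order_type`, the sample mapping, the fallback label `unlabeled` and the function name `service_label` are illustrations only, since the text does not specify which field is used or how a missing match is handled.

```python
DEFAULT_LABEL = "unlabeled"  # hypothetical fallback when no mapping matches


def service_label(flow_record, preset_field, field_to_label):
    """Look up the service label for the value of the preset field.

    `field_to_label` stands in for the pre-stored correspondence
    between field values and labels.
    """
    value = flow_record.get(preset_field)
    return field_to_label.get(value, DEFAULT_LABEL)


# Hypothetical pre-stored correspondence between field values and labels.
FIELD_TO_LABEL = {
    "presale": "presale_order",
    "normal": "normal_order",
}
```

For instance, `service_label({"order_type": "presale"}, "order_type", FIELD_TO_LABEL)` yields `"presale_order"` under this sample mapping.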
Fig. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic device includes a processor 401, a memory 402, and a bus 403, wherein:
the processor 401 and the memory 402 communicate with each other through the bus 403;
the processor 401 is configured to call the program instructions in the memory 402 to execute the method provided in the above method embodiments, for example, including: obtaining to-be-de-duplicated flow data, where the to-be-de-duplicated flow data includes a plurality of pieces of flow data, each piece of flow data includes a first identifier, a second identifier, and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, and the second identifier is generated according to a sub-call set and a service tag corresponding to the recording flow; and de-duplicating the to-be-de-duplicated flow data according to the first identifier and the second identifier corresponding to each piece of flow data.
The processor 401 may be an integrated circuit chip having signal processing capabilities. The processor 401 may be a general purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc., or may be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 401 may implement or perform the various methods, steps, and logical blocks disclosed in the embodiments of the application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Memory 402 may include, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The embodiment discloses a computer program product, which comprises a computer program stored on a non-transitory computer readable storage medium, wherein the computer program comprises program instructions, when the program instructions are executed by a computer, the computer can execute the method provided by the method embodiments, for example, the method comprises the steps of obtaining flow data to be de-duplicated, wherein the flow data to be de-duplicated comprises a plurality of pieces of flow data, each piece of flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, the second identifier is generated according to a sub-call set and a service label corresponding to the recording flow, and the de-duplication is carried out on the flow data to be de-duplicated according to the first identifier and the second identifier corresponding to each piece of flow data.
The embodiment provides a non-transitory computer readable storage medium storing computer instructions, wherein the computer instructions enable a computer to execute the method provided by the above method embodiments, for example, including: obtaining to-be-de-duplicated flow data, wherein the to-be-de-duplicated flow data comprises a plurality of pieces of flow data, each piece of flow data comprises a first identifier, a second identifier and a recording flow, the first identifier is generated according to a recording interface corresponding to the recording flow, and the second identifier is generated according to a sub-call set and a service label corresponding to the recording flow; and de-duplicating the to-be-de-duplicated flow data according to the first identifier and the second identifier corresponding to each piece of flow data.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interface, device or unit, and may be electrical, mechanical or in another form.
Further, the units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.