CN112784113B

CN112784113B - Data processing method and device, computer readable storage medium, and electronic device

Info

Publication number: CN112784113B
Application number: CN201911083637.0A
Authority: CN
Inventors: 刘新颖
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2024-10-18
Anticipated expiration: 2039-11-07
Also published as: CN112784113A

Abstract

The embodiment of the present invention relates to a data processing method and device, a computer-readable storage medium and an electronic device, and relates to the field of big data processing technology. The method includes: obtaining multiple abnormal data, and classifying each abnormal data according to the data level of each abnormal data to obtain multiple parent node data and multiple child node data; constructing multiple initial feature trees according to each parent node data and the child node data having the same feature information as each parent node data; fusing the non-empty feature information of all nodes in each initial feature tree to obtain the attribute information of each initial feature tree; screening the initial feature trees with the same attribute information to obtain multiple target feature trees, and searching for the target object that generates each abnormal data according to the attribute information of each target feature tree. The embodiment of the present invention improves the search speed for the target object that generates each abnormal data.

Description

Data processing method and device, computer readable storage medium and electronic equipment

Technical Field

The embodiment of the invention relates to the technical field of big data processing, in particular to a data processing method, a data processing device, a computer readable storage medium and electronic equipment.

Background

With the development of e-commerce, online shopping has become increasingly widespread, so shopping platforms generate large volumes of order streams at all times. To discover business anomalies in time, certain business indicators (e.g., effective orders, effective amounts) must be monitored in real time. Because the levels of an anomaly detection system stand in a drill-down relationship, such as a first-level category drilling down to a second-level category, an anomaly in a single SKU (stock keeping unit, a single item) may trigger multiple alarms simultaneously. Quickly and accurately locating the root cause among the many redundant and coupled alarms that are detected is therefore critical.

Existing methods for locating the cause of transaction data anomalies mainly fall into two types. In the first, a high-level alarm searches in real time among multiple low-level alarms for the child node with the largest anomaly score to form an information chain; the information of that child node is taken as the located cause of the anomaly and is used to query for SKUs that meet the condition. In the second, after each indicator has been accumulated, the data are stored in a database and the abnormal SKU is then searched for.

However, these methods have the following drawbacks. In the first method, possible correlations among the low-level alarms are ignored and only one of them is selected as output, so only local information is exposed and the accuracy of the SKUs found is low; moreover, the method must calculate an anomaly score, and measuring that score correctly is difficult, which complicates the search for qualifying SKUs. In the second method, detection is performed offline, so the cause cannot be located in real time; timeliness is therefore poor and the accuracy of the abnormal SKUs found is low.

Therefore, it is desirable to provide a new data processing method and apparatus.

It should be noted that the information of the present invention in the above background section is only for enhancing the understanding of the background of the present invention and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.

Disclosure of Invention

The invention aims to provide a data processing method, a data processing device, a computer readable storage medium and electronic equipment, so as to overcome the problem of low accuracy of the found abnormal SKU caused by the limitations and defects of the related art at least to a certain extent.

According to one aspect of the present disclosure, there is provided a data processing method including:

acquiring a plurality of abnormal data, and classifying each abnormal data according to the data level of each abnormal data to obtain a plurality of parent node data and a plurality of child node data;

constructing a plurality of initial feature trees according to each parent node data and the child node data having the same feature information as each parent node data;

fusing the non-empty feature information of all nodes in each initial feature tree to obtain attribute information of each initial feature tree;

screening the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and searching for the target object generating each abnormal data according to the attribute information of each target feature tree.

In one exemplary embodiment of the present disclosure, acquiring a plurality of anomaly data includes:

acquiring a plurality of abnormal data that are output by the same data generation platform at the same moment and have the same data identifier.

In one exemplary embodiment of the present disclosure, the data levels are divided into a first data level and a second data level; the first data level is higher than the second data level;

wherein classifying each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data includes:

taking each abnormal data corresponding to the first data level as parent node data, and each abnormal data corresponding to the second data level as child node data.

In one exemplary embodiment of the present disclosure, constructing a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data includes:

determining the alarm features of each parent node data, and determining the child node data having the same alarm features as each parent node data;

constructing a plurality of initial feature trees according to each parent node data and the child node data having the same alarm features as each parent node data.

In an exemplary embodiment of the present disclosure, filtering an initial feature tree having the same attribute information to obtain a plurality of target feature trees includes:

constructing a feature tree set from the initial feature trees, and judging, according to the attribute information of each initial feature tree in the feature tree set, whether a containing-and-contained relationship exists between the initial feature trees;

if a containing-and-contained relationship exists between any two initial feature trees, deleting the initial feature tree that is contained;

taking the initial feature trees that remain in the feature tree set after all contained initial feature trees have been deleted as the plurality of target feature trees;

wherein no containing-and-contained relationship exists between any two target feature trees.

In one exemplary embodiment of the present disclosure, the attribute information includes the category level and business group to which the target object corresponding to each initial feature tree belongs, the source of the target object, the geographic location where each abnormal data corresponding to each initial feature tree is generated, and the channel level through which each abnormal data corresponding to each initial feature tree is generated.

In an exemplary embodiment of the present disclosure, searching for the target object generating each of the abnormal data according to the attribute information of each of the target feature trees includes:

searching for the target object generating each abnormal data according to the category level and business group of the target object corresponding to each target feature tree, the source of the target object, the geographic location where each abnormal data corresponding to each target feature tree is generated, and the channel level through which each abnormal data corresponding to each target feature tree is generated.

According to an aspect of the present disclosure, there is provided a data processing apparatus comprising:

a data classification module, configured to acquire a plurality of abnormal data and classify each abnormal data according to the data level of each abnormal data to obtain a plurality of parent node data and a plurality of child node data;

an initial feature tree construction module, configured to construct a plurality of initial feature trees according to each parent node data and the child node data having the same feature information as each parent node data;

a feature information integration module, configured to fuse the non-empty feature information of all nodes in each initial feature tree to obtain attribute information of each initial feature tree;

a target object searching module, configured to screen the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and to search for the target object generating each abnormal data according to the attribute information of each target feature tree.

According to one aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method of any one of the above.

According to one aspect of the present disclosure, there is provided an electronic device including:

a processor; and

A memory for storing executable instructions of the processor;

Wherein the processor is configured to perform the data processing method of any of the above via execution of the executable instructions.

According to the data processing method and device provided by the embodiments of the invention, on the one hand, the abnormal data are classified according to their data levels to obtain a plurality of parent node data and a plurality of child node data; a plurality of initial feature trees are constructed from each parent node data and the child node data having the same feature information as that parent node data; the non-empty feature information of all nodes in each initial feature tree is fused to obtain the attribute information of each initial feature tree; and finally the initial feature trees having the same attribute information are screened to obtain a plurality of target feature trees, and the target object generating each abnormal data is searched for according to the attribute information of each target feature tree. This solves the prior-art problem that, because possible correlations among low-level alarms are ignored and only one alarm is selected as output, only local information is exposed and the accuracy of the SKUs found is low; the accuracy of the target objects found is thereby improved. On another hand, it solves the prior-art problem that an anomaly score must be calculated and is difficult to measure correctly, which complicates the search for qualifying SKUs; the search for the target object generating each abnormal data is simplified and the search speed is improved. On yet another hand, it solves the prior-art problem that offline detection cannot locate the cause in real time, so that timeliness is poor and the accuracy of the abnormal SKUs found is low. Furthermore, because the initial feature trees having the same attribute information are screened to obtain the target feature trees, and the target objects generating the abnormal data are searched for according to the attribute information of each target feature tree, the number of feature trees to be searched is reduced and the search speed is further improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.

FIG. 1 schematically illustrates a flow chart of a data processing method according to an example embodiment of the invention;

FIG. 2 schematically illustrates a flow chart of a method of constructing a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data, according to an example embodiment of the invention;

FIGS. 3, 4, 5 and 6 schematically illustrate an example diagram of an initial feature tree according to an example embodiment of the invention;

FIG. 7 schematically illustrates a flowchart of a method for filtering an initial feature tree having identical attribute information to obtain a plurality of target feature trees, according to an example embodiment of the present invention;

FIG. 8 schematically shows a block diagram of a data processing apparatus according to an example embodiment of the invention;

FIG. 9 schematically shows an electronic device for implementing the above-described data processing method according to an example embodiment of the invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known aspects have not been shown or described in detail to avoid obscuring aspects of the invention.

Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

In this exemplary embodiment, a data processing method is provided first, where the method may operate on a server, a server cluster, or a cloud server, or may also operate on a terminal device; of course, those skilled in the art may also operate the method of the present invention on other platforms as required, and this is not a particular limitation in the present exemplary embodiment. Referring to fig. 1, the data processing method may include the steps of:

S110, acquiring a plurality of abnormal data, and classifying each abnormal data according to the data level of each abnormal data to obtain a plurality of parent node data and a plurality of child node data.

S120, constructing a plurality of initial feature trees according to each parent node data and the child node data having the same feature information as each parent node data.

S130, fusing the non-empty feature information of all nodes in each initial feature tree to obtain attribute information of each initial feature tree.

S140, screening the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and searching for the target object generating each abnormal data according to the attribute information of each target feature tree.

In the above data processing method, on the one hand, the abnormal data are classified according to their data levels to obtain a plurality of parent node data and a plurality of child node data; a plurality of initial feature trees are constructed from each parent node data and the child node data having the same feature information as that parent node data; the non-empty feature information of all nodes in each initial feature tree is fused to obtain the attribute information of each initial feature tree; and finally the initial feature trees having the same attribute information are screened to obtain a plurality of target feature trees, and the target object generating each abnormal data is searched for according to the attribute information of each target feature tree. This solves the prior-art problem that, because possible correlations among low-level alarms are ignored and only one alarm is selected as output, only local information is exposed and the accuracy of the SKUs found is low; the accuracy of the target objects found is thereby improved. On another hand, it solves the prior-art problem that an anomaly score must be calculated and is difficult to measure correctly, which complicates the search for qualifying SKUs; the search for the target object generating each abnormal data is simplified and the search speed is improved. On yet another hand, it solves the prior-art problem that offline detection cannot locate the cause in real time, so that timeliness is poor and the accuracy of the abnormal SKUs found is low. Furthermore, because the initial feature trees having the same attribute information are screened to obtain the target feature trees, and the target objects generating the abnormal data are searched for according to the attribute information of each target feature tree, the number of feature trees to be searched is reduced and the search speed is further improved.

Hereinafter, each step involved in the data processing method according to the exemplary embodiment of the present invention will be explained and illustrated in detail with reference to the accompanying drawings.

First, the purpose and the background of the exemplary embodiments of the present invention will be described.

Specifically, the exemplary embodiment of the invention has two objectives. First, to expose as much information as possible about the different dimensions of an anomaly, the embodiment links related alarms that occur simultaneously, forms a unified multi-dimensional structure, and uses it to search for abnormal SKUs. Second, to locate the cause quickly and stop losses in time, the system performs the localization in real time.

Further, the anomaly markings output for abnormal SKUs are divided into different levels; in total there are alarms of level 0 (first data level), level 1 (second data level) and level 2 (third data level). For example, a piece of abnormal data with level = 0 may be as shown in Table 1 below:

TABLE 1

| Date | Time | Platform | Index | Alert content | Level | Marking |
|------|------|----------|-------|---------------|-------|---------|
| d | t | p | Effective amount | {} | 0 | Higher |

For another example, a certain piece of abnormal data with level=2 may be as shown in table 2 below:

TABLE 2

| Date | Time | Platform | Index | Alert content | Level | Marking |
|------|------|----------|-------|---------------|-------|---------|
| d | t | B | Effective order | { Business group: a, first class: b } | 2 | Higher |

The anomaly detection system evaluates each time series independently, so the anomaly markings it outputs (each abnormal data) are also independent. Even when some alarms stand in a drill-down relationship, the output process does not take that association into account. For the two alarms shown in Table 3 below, the time, platform and index are identical; the first alarm belongs to level 1 and its alert content is { business group A: higher }, while the second belongs to level 2 and its alert content is { business group A, first class B: higher }. The two alarms occur simultaneously because both are caused by one or more abnormal SKUs in first class B under business group A.

TABLE 3

|  | Date | Time | Platform | Index | Alert content | Level | Marking |
|---|------|------|----------|-------|---------------|-------|---------|
| Alarm 1 | d | t | B | Effective order | { Business group: a } | 1 | Higher |
| Alarm 2 | d | t | B | Effective order | { Business group: a, first class: b } | 2 | Higher |
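The alarm rows above can be modeled as simple records. The following is a minimal sketch, assuming each alert content is a mapping from feature name to value; the `Alarm` class, its field names and the `is_drill_down` helper are hypothetical names introduced for illustration and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Alarm:
    """One anomaly-marking record; field names are illustrative only."""
    date: str
    time: str
    platform: str
    index: str                # monitored indicator, e.g. "effective order"
    content: Dict[str, str]   # alert content: feature name -> feature value
    level: int
    marking: str


def is_drill_down(parent: Alarm, child: Alarm) -> bool:
    """Return True if `child` drills down from `parent`.

    Assumed test: same date, time, platform and index, a lower-numbered
    level for the parent, and every parent feature repeated in the child.
    """
    same_context = (parent.date, parent.time, parent.platform, parent.index) == (
        child.date, child.time, child.platform, child.index)
    return (same_context
            and parent.level < child.level
            and parent.content.items() < child.content.items())


# The two alarms of Table 3.
alarm_1 = Alarm("d", "t", "B", "effective order",
                {"business group": "a"}, 1, "higher")
alarm_2 = Alarm("d", "t", "B", "effective order",
                {"business group": "a", "first class": "b"}, 2, "higher")

print(is_drill_down(alarm_1, alarm_2))  # True
print(is_drill_down(alarm_2, alarm_1))  # False
```

The detection system itself emits both alarms independently; a check of this kind is only what a downstream step would need in order to recognize that they are related.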

In step S110, a plurality of abnormal data are acquired, and each of the abnormal data is classified according to a data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data.

In the present exemplary embodiment, a plurality of abnormal data are first acquired; specifically, a plurality of abnormal data that are output by the same data generation platform at the same moment and have the same data identifier may be acquired. Acquiring abnormal data with the same data identifier (the index, such as effective orders or effective amount) from the same data generation platform at the same moment ensures that the data are real-time, and avoids the prior-art problem that offline detection cannot locate the cause in real time, so that timeliness is poor and the accuracy of the abnormal SKUs found is low. For ease of understanding, the abnormal data shown in Table 4 below are used for illustration.

For example, on the same date d, on the same data generation platform P and at the same time t, the following 12 alarms are generated simultaneously for the same data identifier (effective amount). Of these 12 alarms, 1 belongs to level 0, 4 belong to level 1 and 7 belong to level 2, as shown in Table 4 below.

TABLE 4
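The acquisition condition described above (same date, same moment, same data generation platform, same data identifier) amounts to grouping alarms by a shared context key. The sketch below assumes alarms arrive as dictionaries; the key names and the `group_by_context` function are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ContextKey = Tuple[str, str, str, str]  # (date, time, platform, index)


def group_by_context(alarms: List[dict]) -> Dict[ContextKey, List[dict]]:
    """Group alarms that share date, time, platform and index."""
    groups = defaultdict(list)
    for alarm in alarms:
        key = (alarm["date"], alarm["time"], alarm["platform"], alarm["index"])
        groups[key].append(alarm)
    return dict(groups)


# Illustrative alarms only; they are not the rows of Table 4.
alarms = [
    {"date": "d", "time": "t", "platform": "P", "index": "effective amount",
     "level": 0, "content": {}},
    {"date": "d", "time": "t", "platform": "P", "index": "effective amount",
     "level": 1, "content": {"business group": "b"}},
    {"date": "d", "time": "t", "platform": "P", "index": "effective order",
     "level": 1, "content": {"province": "p"}},
]

groups = group_by_context(alarms)
for key, batch in groups.items():
    print(key, len(batch))
```

Each resulting batch would then be processed on its own, so that alarms for different indicators or platforms are never merged into the same feature tree.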

Further, after the plurality of abnormal data are obtained, they may be classified according to the data level of each abnormal data to obtain the plurality of parent node data and child node data, wherein the data levels may include a first data level and a second data level, and the first data level is higher than the second data level. Specifically, each abnormal data corresponding to the first data level may be used as parent node data, and each abnormal data corresponding to the second data level may be used as child node data. For example, according to this classification rule, the 4 level 1 alarms in Table 4 above are each taken as a parent node, and the 7 level 2 alarms are each assigned as child nodes according to their feature attributes.

It should further be noted that the data levels may also include a level 0 data level; for ease of explanation it is not described specifically here, but it remains within the scope of the embodiments of the invention. The idea expressed throughout the exemplary embodiments is that data with a higher data level are taken as parent node data and data with a lower data level as child node data; when a child node data cannot be contained in any parent node data, that child node data may itself be treated as parent node data.
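The classification rule of step S110, including the promotion of a child that fits under no parent, can be sketched as follows. This is a minimal illustration under stated assumptions: alarms are dictionaries with `level` and `content` keys, "contained in a parent" is read as "repeats all of that parent's features", and level 0 alarms are left out as in the text. The `classify` function is a hypothetical name.

```python
from typing import List, Tuple


def classify(alarms: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split alarms into parent node data and child node data."""
    level_1 = [a for a in alarms if a["level"] == 1]
    parents = list(level_1)
    children = []
    for alarm in alarms:
        if alarm["level"] != 2:
            continue  # level 0 and level 1 alarms are not child candidates
        # A level 2 alarm is covered if it repeats every feature of some parent.
        covered = any(p["content"].items() <= alarm["content"].items()
                      for p in level_1)
        if covered:
            children.append(alarm)
        else:
            parents.append(alarm)  # orphan child is treated as a parent
    return parents, children


# Illustrative alarms only.
alarms = [
    {"level": 0, "content": {}},
    {"level": 1, "content": {"business group": "b"}},
    {"level": 2, "content": {"business group": "b", "first class": "c"}},
    {"level": 2, "content": {"primary channel": "x", "secondary channel": "y"}},
]

parents, children = classify(alarms)
print(len(parents), len(children))  # 2 1
```

The last alarm shares no feature with the only level 1 alarm, so it ends up among the parents rather than being dropped.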

In step S120, a plurality of initial feature trees are constructed from each of the parent node data and child node data having the same feature information as each of the parent node data.

In the present exemplary embodiment, referring to fig. 2, constructing a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data may include step S210 and step S220, which will be described in detail below.

In step S210, an alarm characteristic of each of the parent node data is determined, and child node data having the same alarm characteristic as each of the parent node data is determined.

In step S220, a plurality of initial feature trees are constructed according to each of the parent node data and child node data having the same alarm feature as each of the parent node data.

Steps S210 and S220 are explained below. First, the alarm features of each parent node data are determined; then the level 2 alarms containing the same features are set as child nodes of the corresponding level 1 alarm, forming a plurality of initial feature trees. For example, referring to FIG. 3, the parent node is { first class: c } and the corresponding child nodes are { secondary class: s, first class: c }, { business group: b, first class: c }, { self-operated: o, first class: c }, { province: p, first class: c } and so on. Referring to FIG. 4, the parent node is { business group: b } and the corresponding child nodes are { business group: b, first class: c }, { self-operated: o, business group: b }, { province: p, business group: b } and so on. Referring to FIG. 5, the parent node is { self-operated: o } and the corresponding child nodes are { self-operated: o, first class: c }, { self-operated: o, business group: b }, { province: p, self-operated: o } and so on. Finally, referring to FIG. 6, the parent node is { province: p } and the corresponding child nodes are { province: p, first class: c }, { province: p, self-operated: o }, { province: p, business group: b } and so on.
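Steps S210 and S220 can be sketched as attaching to each parent every child that repeats the parent's alarm features. The sample data below are the parent and child nodes listed in the text for FIGS. 3 to 6; the `FeatureTree` type and the function names are hypothetical, and "same alarm features" is assumed to mean that all of the parent's feature/value pairs appear in the child.

```python
from typing import Dict, List, NamedTuple

Features = Dict[str, str]


class FeatureTree(NamedTuple):
    parent: Features
    children: List[Features]


def shares_alarm_features(parent: Features, child: Features) -> bool:
    """Assumed rule: every feature/value pair of the parent is in the child."""
    return parent.items() <= child.items()


def build_initial_trees(parents: List[Features],
                        children: List[Features]) -> List[FeatureTree]:
    """Build one initial feature tree per parent node."""
    return [FeatureTree(p, [c for c in children if shares_alarm_features(p, c)])
            for p in parents]


parents = [
    {"first class": "c"},
    {"business group": "b"},
    {"self-operated": "o"},
    {"province": "p"},
]
children = [
    {"secondary class": "s", "first class": "c"},
    {"business group": "b", "first class": "c"},
    {"self-operated": "o", "first class": "c"},
    {"province": "p", "first class": "c"},
    {"self-operated": "o", "business group": "b"},
    {"province": "p", "business group": "b"},
    {"province": "p", "self-operated": "o"},
]

trees = build_initial_trees(parents, children)
print([len(tree.children) for tree in trees])  # [4, 3, 3, 3]
```

A child with two features is attached to two different trees, which is why the same alarm appears under more than one parent in the figures.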

In step S130, non-empty feature information of all nodes in each initial feature tree is fused to obtain attribute information of each initial feature tree.

In the present exemplary embodiment, the attribute information includes the category level and business group to which the target object corresponding to each initial feature tree belongs, the source of the target object, the geographic location where each abnormal data corresponding to each initial feature tree is generated, the channel level through which each abnormal data corresponding to each initial feature tree is generated, and the like. Specifically, the attributes of each initial feature tree include first class, secondary class, business group, self-operated or not, province, primary channel, secondary channel and city.

Further, for each initial feature tree, all non-empty feature information of all its nodes (the parent node and the child nodes) is fused to form the attributes of that feature tree. Attributes that appear in neither the parent node nor any child node remain empty. Specifically, according to this fusion rule, feature fusion is performed on each feature tree independently, and the fusion result is shown in Table 5 below:

TABLE 5
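The fusion rule of step S130 can be sketched as a union of the non-empty features of the parent and all children, with unseen attributes left empty. The attribute names follow the list given in the text; the `fuse` function is a hypothetical name, empty values are represented by `None`, and the handling of conflicting values (here, a later node overwrites an earlier one) is an assumption, since the source does not specify it.

```python
from typing import Dict, List, Optional

ATTRIBUTES = (
    "first class", "secondary class", "business group", "self-operated",
    "province", "primary channel", "secondary channel", "city",
)


def fuse(parent: Dict[str, str],
         children: List[Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Fuse the non-empty features of all nodes of one initial feature tree."""
    fused: Dict[str, Optional[str]] = {name: None for name in ATTRIBUTES}
    for node in [parent, *children]:
        for name, value in node.items():
            if value not in (None, ""):
                fused[name] = value  # assumption: the last non-empty value wins
    return fused


# The tree described for FIG. 3.
parent = {"first class": "c"}
children = [
    {"secondary class": "s", "first class": "c"},
    {"business group": "b", "first class": "c"},
    {"self-operated": "o", "first class": "c"},
    {"province": "p", "first class": "c"},
]

fused = fuse(parent, children)
print({name: value for name, value in fused.items() if value is not None})
```

For this tree the fused attributes are first class, secondary class, business group, self-operated and province, while the channel and city attributes stay empty.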

In step S140, the initial feature tree with the same attribute information is filtered to obtain a plurality of target feature trees, and the target object for generating each abnormal data is searched according to the attribute information of each target feature tree.

In the present exemplary embodiment, first, an initial feature tree having the same attribute information is filtered to obtain a plurality of target feature trees. Specifically, referring to fig. 7, filtering the initial feature tree having the same attribute information to obtain a plurality of target feature trees may include steps S710 to S730, which will be described in detail below.

In step S710, a feature tree set is constructed from the initial feature trees, and whether a containing-and-contained relationship exists between the initial feature trees is judged according to the attribute information of each initial feature tree in the feature tree set.

In step S720, if a containing-and-contained relationship exists between any two initial feature trees, the initial feature tree that is contained is deleted.

In step S730, the initial feature trees that remain in the feature tree set after the contained initial feature trees have been deleted are used as the plurality of target feature trees; wherein no containing-and-contained relationship exists between any two target feature trees.

Steps S710 to S730 are explained below. Specifically, according to the above screening rule (if a containing-and-contained relationship exists between any two initial feature trees, the contained initial feature tree is deleted), the attributes of initial feature tree 2, initial feature tree 3 and initial feature tree 4 are all contained in initial feature tree 1, so feature trees 2, 3 and 4 are removed and feature tree 1 is retained. The results are shown in Table 6 below.

TABLE 6
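The screening of steps S710 to S730 can be sketched as a containment test on the non-empty attributes of each tree. The input below is what fusing the trees described for FIGS. 3 to 6 would give; the function names are hypothetical, and keeping the first of several trees with identical attributes is an assumption that the source does not state.

```python
from typing import Dict, Optional, Set, Tuple

Attrs = Dict[str, Optional[str]]


def non_empty(attrs: Attrs) -> Set[Tuple[str, str]]:
    """Return the set of (attribute, value) pairs that are not empty."""
    return {(k, v) for k, v in attrs.items() if v not in (None, "")}


def screen(trees: Dict[str, Attrs]) -> Dict[str, Attrs]:
    """Delete every tree whose attributes are contained in another tree."""
    items = [(name, non_empty(attrs)) for name, attrs in trees.items()]
    kept = {}
    for i, (name, feats) in enumerate(items):
        contained = any(
            feats < other or (feats == other and j < i)  # tie: keep the first
            for j, (_, other) in enumerate(items) if j != i
        )
        if not contained:
            kept[name] = trees[name]
    return kept


common = {"first class": "c", "business group": "b",
          "self-operated": "o", "province": "p"}
initial_trees = {
    "feature tree 1": {**common, "secondary class": "s"},
    "feature tree 2": dict(common),
    "feature tree 3": dict(common),
    "feature tree 4": dict(common),
}

targets = screen(initial_trees)
print(list(targets))  # ['feature tree 1']
```

After screening, no remaining tree is contained in another, which matches the condition stated for the target feature trees.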

Finally, the final output is all the attributes of feature tree 1, as shown in Table 7 below:

TABLE 7

| Feature tree 1 attribute | Value |
|--------------------------|-------|
| First class | c |
| Secondary class | s |
| Business group | b |
| Self-operated | o |
| Province | p |
| Primary channel | |
| Secondary channel | |
| City | |

By constructing the feature tree set, the accuracy of the resulting target feature trees can be further improved, which in turn further improves the accuracy of the target objects found to generate the abnormal data.

Further, after the target feature trees are obtained, the target object generating each abnormal data may be searched for according to the attribute information of each target feature tree. Specifically, this may include: searching for the target object generating each abnormal data according to the category level and business group of the target object corresponding to each target feature tree, the source of the target object, the geographic location where each abnormal data corresponding to each target feature tree is generated, and the channel level through which each abnormal data corresponding to each target feature tree is generated. By comparison, the prior-art method of searching for the abnormal SKU through an information chain ultimately outputs only two features.
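The final lookup can be sketched as filtering a catalog of objects on the non-empty attributes of a target feature tree. The source does not describe how SKUs are stored or queried, so the in-memory `catalog`, its `sku_id` field and the `find_target_objects` function are all hypothetical and serve only to illustrate the matching.

```python
from typing import Dict, List, Optional


def find_target_objects(tree_attrs: Dict[str, Optional[str]],
                        catalog: List[dict]) -> List[str]:
    """Return ids of catalog entries matching every non-empty tree attribute."""
    conditions = {k: v for k, v in tree_attrs.items() if v not in (None, "")}
    return [sku["sku_id"] for sku in catalog
            if all(sku.get(k) == v for k, v in conditions.items())]


# Attributes of feature tree 1 as given in Table 7 (empty ones set to None).
target_tree = {
    "first class": "c", "secondary class": "s", "business group": "b",
    "self-operated": "o", "province": "p",
    "primary channel": None, "secondary channel": None, "city": None,
}

# Hypothetical catalog rows.
catalog = [
    {"sku_id": "sku-1", "first class": "c", "secondary class": "s",
     "business group": "b", "self-operated": "o", "province": "p"},
    {"sku_id": "sku-2", "first class": "c", "secondary class": "s2",
     "business group": "b", "self-operated": "o", "province": "p"},
    {"sku_id": "sku-3", "first class": "x", "secondary class": "s",
     "business group": "b", "self-operated": "o", "province": "p"},
]

print(find_target_objects(target_tree, catalog))  # ['sku-1']
```

Because empty attributes impose no condition, a tree with more fused attributes narrows the candidate set further than a tree with only one or two.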

The disclosure also provides a data processing device. Referring to fig. 8, the data processing apparatus may include a data classification module 810, an initial feature tree construction module 820, a feature information integration module 830, and a target object search module 840. Wherein:

The data classification module 810 may be configured to obtain a plurality of abnormal data, and classify each of the abnormal data according to a data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data.

The initial feature tree construction module 820 may be configured to construct a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data.

The feature information integration module 830 may be configured to integrate non-empty feature information of all nodes in each initial feature tree to obtain attribute information of each initial feature tree.

The target object search module 840 may be configured to screen the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and to search for the target object generating each abnormal data according to the attribute information of each target feature tree.

In one exemplary embodiment of the present disclosure, the data levels are divided into a first data level and a second data level, the first data level being higher than the second data level; wherein classifying each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data includes: taking each abnormal data corresponding to the first data level as parent node data, and taking each abnormal data corresponding to the second data level as child node data.

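In one exemplary embodiment of the present disclosure, constructing a plurality of initial feature trees according to each parent node data and the child node data having the same feature information as each parent node data includes: determining the alarm feature of each parent node data, and determining the child node data having the same alarm feature as each parent node data; and constructing a plurality of initial feature trees according to each parent node data and the child node data having the same alarm feature as that parent node data.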
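The classification, tree construction, and fusion of non-empty feature information handled by modules 810 to 830 can be sketched in Python as follows. The record layout (`level`, `alarm_feature`, `features`), the numeric encoding of the two data levels, and the rule that the first non-empty value wins when nodes disagree are assumptions made for this sketch only; the specification does not prescribe them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical encoding: the first (higher) data level marks parent node data.
FIRST_LEVEL, SECOND_LEVEL = 1, 2


def classify(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split abnormal data into parent node data and child node data by data level."""
    parents = [r for r in records if r["level"] == FIRST_LEVEL]
    children = [r for r in records if r["level"] == SECOND_LEVEL]
    return parents, children


def build_initial_trees(parents: List[dict], children: List[dict]) -> List[dict]:
    """Attach to each parent the children that share its alarm feature."""
    by_alarm = defaultdict(list)
    for child in children:
        by_alarm[child["alarm_feature"]].append(child)
    return [{"parent": p, "children": list(by_alarm.get(p["alarm_feature"], []))}
            for p in parents]


def fuse_attributes(tree: dict) -> Dict[str, str]:
    """Merge the non-empty feature information of all nodes in one tree."""
    fused: Dict[str, str] = {}
    for node in [tree["parent"], *tree["children"]]:
        for name, value in node["features"].items():
            # Assumption: keep the first non-empty value seen for each attribute.
            if value and name not in fused:
                fused[name] = value
    return fused


# Invented abnormal data, for illustration only.
records = [
    {"level": 1, "alarm_feature": "a1",
     "features": {"first_category": "c", "province": ""}},
    {"level": 2, "alarm_feature": "a1",
     "features": {"second_category": "s", "province": "p"}},
    {"level": 2, "alarm_feature": "a2",
     "features": {"first_category": "d"}},
]

parents, children = classify(records)
trees = build_initial_trees(parents, children)
attributes = [fuse_attributes(t) for t in trees]
```

In this sample, the single parent with alarm feature `a1` receives one child, and the fused attribute information combines the parent's first-level category with the child's second-level category and province. The child with alarm feature `a2` matches no parent and is left out of every tree.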

In one exemplary embodiment of the present disclosure, screening the initial feature trees having the same attribute information to obtain a plurality of target feature trees includes: constructing a feature tree set according to each initial feature tree, and judging, according to the attribute information of each initial feature tree in the feature tree set, whether a containing and being-contained relationship exists between the initial feature trees; if a containing and being-contained relationship exists between any two initial feature trees, deleting the initial feature tree that is contained; and taking the initial feature trees remaining in the feature tree set after all contained initial feature trees have been deleted as the plurality of target feature trees; wherein no containing and being-contained relationship exists between any two target feature trees.

The specific details of each module in the above data processing apparatus have been described in detail in the corresponding data processing method, so that the details are not repeated here.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

Furthermore, although the steps of the methods of the present invention are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the steps shown must be performed, in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.

In an exemplary embodiment of the present invention, an electronic device capable of implementing the above method is also provided.

Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."

An electronic device 900 according to this embodiment of the invention is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of the electronic device 900 may include, but are not limited to: at least one processing unit 910, at least one storage unit 920, and a bus 930 connecting the different system components (including the storage unit 920 and the processing unit 910).

The storage unit stores program code that is executable by the processing unit 910, such that the processing unit 910 performs the steps according to various exemplary embodiments of the present invention described in the above "exemplary methods" section of the present specification. For example, the processing unit 910 may perform step S110 as shown in fig. 1: acquiring a plurality of abnormal data, and classifying each abnormal data according to the data level of each abnormal data to obtain a plurality of parent node data and a plurality of child node data; step S120: constructing a plurality of initial feature trees according to each parent node data and the child node data having the same feature information as each parent node data; step S130: fusing the non-empty feature information of all nodes in each initial feature tree to obtain attribute information of each initial feature tree; step S140: screening the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and searching for the target object generating each abnormal data according to the attribute information of each target feature tree.

The storage unit 920 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 9201 and/or cache memory 9202, and may further include Read Only Memory (ROM) 9203.

The storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The bus 930 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 950. Also, electronic device 900 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 960. As shown, the network adapter 960 communicates with other modules of the electronic device 900 over the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 900, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present invention.

In an exemplary embodiment of the present invention, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.

A program product for implementing the above-described method according to an embodiment of the present invention may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims

1. A data processing method, comprising:

Acquiring a plurality of abnormal data, and classifying each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data;

Constructing a plurality of initial feature trees according to the parent node data and the child node data having the same feature information as the parent node data;

Merging the non-empty feature information of all nodes in each of the initial feature trees to obtain the attribute information of each of the initial feature trees;

Screening the initial feature trees with the same attribute information to obtain a plurality of target feature trees, and searching for the target object generating each of the abnormal data according to the attribute information of each of the target feature trees.

2. The data processing method according to claim 1, wherein obtaining a plurality of abnormal data comprises:

Obtaining a plurality of abnormal data that are obtained at the same time, have the same data identifier, and are output by the same data generation platform.

3. The data processing method according to claim 2, characterized in that the data level is divided into a first data level and a second data level; the first data level is higher than the second data level;

Wherein classifying each abnormal data according to the data level of each abnormal data to obtain a plurality of parent node data and a plurality of child node data comprises:

Each abnormal data corresponding to the first data level is used as each parent node data, and each abnormal data corresponding to the second data level is used as child node data.

4. The data processing method according to claim 3, characterized in that constructing a plurality of initial feature trees according to each of the parent node data and the child node data having the same feature information as each of the parent node data comprises:

Determining the alarm feature of each parent node data, and determining the child node data having the same alarm feature as each parent node data;

A plurality of initial feature trees are constructed according to the parent node data and the child node data having the same alarm feature as the parent node data.

5. The data processing method according to claim 4, characterized in that screening the initial feature trees having the same attribute information to obtain a plurality of target feature trees comprises:

Constructing a feature tree set according to each of the initial feature trees, and judging whether there is a containing and being contained relationship between each of the initial feature trees according to the attribute information of each of the initial feature trees in the feature tree set;

If there is a containing and being contained relationship between any two initial feature trees, deleting the initial feature tree that is contained;

The initial feature trees remaining after deleting the initial feature trees having the included relationship from the feature tree set are used as the plurality of target feature trees;

There is no relationship of inclusion and being included between any two of the target feature trees.

6. The data processing method according to any one of claims 1-5, characterized in that the attribute information includes a plurality of: the category level and business group to which the target object corresponding to each of the initial feature trees belongs, the source of the target object, the geographical location where each of the abnormal data corresponding to each of the initial feature trees is generated, and the channel level that generates each of the abnormal data corresponding to each of the initial feature trees.

7. The data processing method according to claim 6, characterized in that searching for the target object that generates each of the abnormal data according to the attribute information of each of the target feature trees comprises:

Searching for the target object that generates each abnormal data according to the category level and business group to which the target object corresponding to each target feature tree belongs, the source of the target object, the geographical location where the abnormal data corresponding to each target feature tree is generated, and the channel level where the abnormal data corresponding to each target feature tree is generated.

8. A data processing device, comprising:

A data classification module, used for acquiring a plurality of abnormal data, and classifying each abnormal data according to the data level of each abnormal data to obtain a plurality of parent node data and a plurality of child node data;

An initial feature tree construction module, used to construct multiple initial feature trees according to each of the parent node data and the child node data having the same feature information as each of the parent node data;

A feature information integration module, used to integrate the non-empty feature information of all nodes in each of the initial feature trees to obtain the attribute information of each of the initial feature trees;

A target object search module, used to screen the initial feature trees with the same attribute information to obtain multiple target feature trees, and to search for the target object that generates each of the abnormal data according to the attribute information of each of the target feature trees.

9. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the data processing method according to any one of claims 1 to 7 is implemented.

10. An electronic device, comprising:

A processor; and

A memory, configured to store executable instructions of the processor;

The processor is configured to execute the data processing method according to any one of claims 1 to 7 by executing the executable instructions.