CN112784113B - Data processing method and device, computer readable storage medium, and electronic device - Google Patents

Info

Publication number
CN112784113B
Authority
CN
China
Prior art keywords
data
feature
trees
abnormal
initial feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911083637.0A
Other languages
Chinese (zh)
Other versions
CN112784113A (en
Inventor
刘新颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201911083637.0A
Publication of CN112784113A
Application granted
Publication of CN112784113B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9024: Graphs; Linked lists
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention relate to a data processing method and device, a computer-readable storage medium and an electronic device, in the technical field of big data processing. The method includes: acquiring a plurality of abnormal data, and classifying each abnormal data according to its data level to obtain a plurality of parent node data and a plurality of child node data; constructing a plurality of initial feature trees from each parent node data and the child node data having the same feature information as that parent node data; fusing the non-empty feature information of all nodes in each initial feature tree to obtain the attribute information of each initial feature tree; and screening the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and searching for the target objects that generated the abnormal data according to the attribute information of each target feature tree. The embodiments of the present invention increase the speed at which the target objects that generated the abnormal data are found.

Description

Data processing method and device, computer readable storage medium and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of big data processing, in particular to a data processing method, a data processing device, a computer readable storage medium and electronic equipment.
Background
With the development of e-commerce, online shopping has become increasingly widespread, so shopping platforms generate large order streams at all times. To discover business anomalies promptly, certain business indicators (e.g., effective orders, effective amounts) must be monitored in real time. Because the levels of an anomaly detection system are related by drill-down (for example, a first class, i.e. a top-level category, drills down to a second class), an anomaly in a single SKU (stock keeping unit, i.e. a single item) may trigger multiple alarms at the same time. It is therefore critical to locate the root cause quickly and accurately among the many redundant, coupled alarms that are detected.
Existing methods for locating the cause of transaction data anomalies fall mainly into two kinds. In the first, a high-level alarm searches in real time, among several low-level alarms, for the child node with the largest anomaly score to form an information chain; the information of that child node is taken as the located cause and is used to query for the SKUs that meet the condition. In the second, after each index has been aggregated, the data is stored in a database and the abnormal SKUs are then searched for.
However, these methods have the following drawbacks. The first ignores possible correlations among the low-level alarms and selects only one of them as output, so only local information is exposed and the accuracy of the SKUs found is low; it also requires an anomaly score, and measuring that score correctly is itself difficult, which complicates the search for qualifying SKUs. The second relies on offline detection, so the cause cannot be located in real time; timeliness is poor and the accuracy of the abnormal SKUs found is low.
Therefore, it is desirable to provide a new data processing method and apparatus.
It should be noted that the information disclosed in the Background section above is intended only to enhance understanding of the background of the invention, and may therefore include information that does not constitute prior art already known to those of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a data processing method, a data processing device, a computer readable storage medium and electronic equipment, so as to overcome the problem of low accuracy of the found abnormal SKU caused by the limitations and defects of the related art at least to a certain extent.
According to one aspect of the present disclosure, there is provided a data processing method including:
Acquiring a plurality of abnormal data, and classifying each abnormal data according to the data level of each abnormal data to obtain a plurality of father node data and a plurality of child node data;
Constructing a plurality of initial feature trees according to the father node data and the child node data with the same feature information as the father node data;
Fusing the non-empty characteristic information of all nodes in each initial characteristic tree to obtain attribute information of each initial characteristic tree;
Screening the initial feature tree with the same attribute information to obtain a plurality of target feature trees, and searching for target objects for generating the abnormal data according to the attribute information of each target feature tree.
In one exemplary embodiment of the present disclosure, acquiring a plurality of anomaly data includes:
acquiring a plurality of abnormal data that are output by the same data generation platform at the same moment and have the same data identifier.
In one exemplary embodiment of the present disclosure, the data levels are divided into a first data level and a second data level; the first data level is higher than the second data level;
wherein classifying each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data includes:
taking the abnormal data corresponding to the first data level as the parent node data, and the abnormal data corresponding to the second data level as the child node data.
In one exemplary embodiment of the present disclosure, constructing a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data includes:
determining alarm characteristics of the father node data, and determining child node data with the same alarm characteristics as the father node data;
And constructing a plurality of initial feature trees according to the father node data and the child node data with the same alarm features as the father node data.
In an exemplary embodiment of the present disclosure, filtering an initial feature tree having the same attribute information to obtain a plurality of target feature trees includes:
constructing a feature tree set from the initial feature trees, and judging, according to the attribute information of each initial feature tree in the feature tree set, whether a containing/contained relationship exists between the initial feature trees;
if a containing/contained relationship exists between any two initial feature trees, deleting the initial feature tree that is contained;
taking the initial feature trees that remain in the feature tree set after all contained initial feature trees have been deleted as the plurality of target feature trees;
wherein no containing/contained relationship exists between any two target feature trees.
In one exemplary embodiment of the present disclosure, the attribute information includes several of the following: the category level and business group to which the target object corresponding to each initial feature tree belongs, the source of the target object, the geographic location at which each abnormal data corresponding to each initial feature tree was generated, and the channel level at which each abnormal data corresponding to each initial feature tree was generated.
In an exemplary embodiment of the present disclosure, searching for the target object generating each of the abnormal data according to the attribute information of each of the target feature trees includes:
Searching the target object for generating the abnormal data according to the class level and the business group of the corresponding target object of each target feature tree, the source of the target object, the geographic position for generating the abnormal data corresponding to each target feature tree and the channel level for generating the abnormal data corresponding to each target feature tree.
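As a non-limiting illustration, the lookup described above can be sketched as a filter over a catalog of target objects, using only the non-empty attributes of a target feature tree as conditions. The catalog, the `sku_id` field and the attribute key names below are hypothetical and are not taken from the disclosure.

```python
def find_target_objects(catalog, tree_attributes):
    """Return the ids of catalog entries matching every non-empty attribute.

    `catalog` is a hypothetical list of dicts describing target objects (SKUs);
    `tree_attributes` is the fused attribute information of one target feature tree.
    """
    # Empty (None) attributes place no constraint on the search.
    conditions = {k: v for k, v in tree_attributes.items() if v is not None}
    return [
        item["sku_id"]
        for item in catalog
        if all(item.get(k) == v for k, v in conditions.items())
    ]


# Hypothetical catalog: category, business group, source and province per SKU.
catalog = [
    {"sku_id": "sku-1", "first_class": "c", "business_group": "b",
     "self_operated": "o", "province": "p"},
    {"sku_id": "sku-2", "first_class": "c", "business_group": "b",
     "self_operated": "n", "province": "p"},
    {"sku_id": "sku-3", "first_class": "z", "business_group": "b",
     "self_operated": "o", "province": "p"},
]

target_tree_attributes = {
    "first_class": "c", "business_group": "b",
    "self_operated": "o", "province": "p",
    "first_channel": None,   # empty attribute, ignored by the lookup
}

print(find_target_objects(catalog, target_tree_attributes))  # ['sku-1']
```

Because each condition is an equality test on one attribute, the same conditions could equally be translated into a database query; the in-memory list is used here only to keep the sketch self-contained.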
According to an aspect of the present disclosure, there is provided a data processing apparatus comprising:
the data classification module is used for acquiring a plurality of abnormal data, classifying each abnormal data according to the data level of each abnormal data to obtain a plurality of father node data and a plurality of child node data;
The initial feature tree construction module is used for constructing a plurality of initial feature trees according to the father node data and the child node data with the same feature information as the father node data;
The feature information integration module is used for integrating the non-empty feature information of all nodes in each initial feature tree to obtain attribute information of each initial feature tree;
And the target object searching module is used for screening the initial feature tree with the same attribute information to obtain a plurality of target feature trees, and searching the target object for generating the abnormal data according to the attribute information of each target feature tree.
According to one aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method of any one of the above.
According to one aspect of the present disclosure, there is provided an electronic device including:
a processor; and
A memory for storing executable instructions of the processor;
Wherein the processor is configured to perform the data processing method of any of the above via execution of the executable instructions.
According to the data processing method and device provided by the embodiments of the invention: first, the abnormal data are classified according to their data levels to obtain a plurality of parent node data and a plurality of child node data; a plurality of initial feature trees are constructed from each parent node data and the child node data having the same feature information as that parent node data; the non-empty feature information of all nodes in each initial feature tree is fused to obtain the attribute information of that tree; finally, the initial feature trees having the same attribute information are screened to obtain a plurality of target feature trees, and the target objects that generated the abnormal data are searched for according to the attribute information of each target feature tree. This solves the prior-art problem that, because possible correlations among low-level alarms are ignored and only one alarm is selected as output, only local information is exposed and the accuracy of the SKUs found is low, and it thereby improves the accuracy of the target objects found. Second, it solves the prior-art problem that the search for qualifying SKUs is complicated by the need to compute an anomaly score that is difficult to measure accurately; the search for the target objects is simplified and its speed is improved. Third, it solves the prior-art problem that offline detection prevents the cause from being located in real time, which leaves timeliness poor and the accuracy of the abnormal SKUs found low. Furthermore, screening the initial feature trees having the same attribute information to obtain the target feature trees, and searching for the target objects according to the attribute information of each target feature tree, reduces the number of feature trees used in the search and further improves the search speed.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 schematically illustrates a flow chart of a data processing method according to an example embodiment of the invention;
FIG. 2 schematically illustrates a flow chart of a method of constructing a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data, according to an example embodiment of the invention;
FIGS. 3, 4, 5 and 6 schematically illustrate example diagrams of initial feature trees according to an example embodiment of the invention;
FIG. 7 schematically illustrates a flowchart of a method for filtering an initial feature tree having identical attribute information to obtain a plurality of target feature trees, according to an example embodiment of the present invention;
FIG. 8 schematically shows a block diagram of a data processing apparatus according to an example embodiment of the invention;
FIG. 9 schematically shows an electronic device for implementing the above-described data processing method according to an example embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known aspects have not been shown or described in detail to avoid obscuring aspects of the invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In this exemplary embodiment, a data processing method is provided first, where the method may operate on a server, a server cluster, or a cloud server, or may also operate on a terminal device; of course, those skilled in the art may also operate the method of the present invention on other platforms as required, and this is not a particular limitation in the present exemplary embodiment. Referring to fig. 1, the data processing method may include the steps of:
S110, acquiring a plurality of abnormal data, and classifying each abnormal data according to its data level to obtain a plurality of parent node data and a plurality of child node data.
S120, constructing a plurality of initial feature trees from each parent node data and the child node data having the same feature information as that parent node data.
S130, fusing the non-empty feature information of all nodes in each initial feature tree to obtain the attribute information of each initial feature tree.
S140, screening the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and searching for the target objects that generated the abnormal data according to the attribute information of each target feature tree.
In the data processing method above: first, each abnormal data is classified according to its data level to obtain a plurality of parent node data and a plurality of child node data; a plurality of initial feature trees are constructed from each parent node data and the child node data having the same feature information as that parent node data; the non-empty feature information of all nodes in each initial feature tree is fused to obtain the attribute information of that tree; finally, the initial feature trees having the same attribute information are screened to obtain a plurality of target feature trees, and the target objects that generated the abnormal data are searched for according to the attribute information of each target feature tree. This solves the prior-art problem that, because possible correlations among low-level alarms are ignored and only one alarm is selected as output, only local information is exposed and the accuracy of the SKUs found is low, and it improves the accuracy of the target objects found. Second, it solves the prior-art problem that the search for qualifying SKUs is complicated by the need to compute an anomaly score that is difficult to measure accurately; the search for the target objects that generated the abnormal data is simplified and its speed is improved. Third, it solves the prior-art problem that offline detection prevents the cause from being located in real time, which leaves timeliness poor and the accuracy of the abnormal SKUs found low. Further, screening the initial feature trees having the same attribute information to obtain the target feature trees, and searching for the target objects according to the attribute information of each target feature tree, reduces the number of feature trees used in the search and further improves the speed at which the target objects that generated the abnormal data are found.
Hereinafter, each step involved in the data processing method according to the exemplary embodiment of the present invention will be explained and illustrated in detail with reference to the accompanying drawings.
First, the purpose and the background of the exemplary embodiments of the present invention will be described.
Specifically, the exemplary embodiments of the invention have two objectives. First, to expose as much information as possible about the different dimensions of an anomaly, the embodiments chain together related alarms that occur simultaneously, forming a unified multi-dimensional structure with which the abnormal SKUs are searched for. Second, to enable fast localization and timely loss control, the system performs localization in real time.
Further, the output of abnormal SKU marking is divided into levels; in total there are three levels of alarm information: level 0 (first data level), level 1 (second data level) and level 2 (third data level). For example, a piece of abnormal data with level = 0 may be as shown in Table 1 below:
TABLE 1
Date | Time | Platform | Index | Alert content | Level | Marking
d | t | p | Effective amount | {} | 0 | Higher
For another example, a certain piece of abnormal data with level=2 may be as shown in table 2 below:
TABLE 2
Date | Time | Platform | Index | Alert content | Level | Marking
d | t | B | Effective order | {business group: a, first class: b} | 2 | Higher
The anomaly detection system evaluates each time series independently, so the anomaly-marking outputs (the individual abnormal data) are also independent of one another. Even when some alarms stand in a drill-down relationship, their association is not considered when they are output. Table 3 below shows two alarms whose time, platform and index are identical: the first belongs to level 1 and its alert content is {business group: A, higher}; the second belongs to level 2 and its alert content is {business group: A, first class: B, higher}. The two alarms occur simultaneously because both are caused by one or more abnormal SKUs in first class B under business group A.
TABLE 3
Alarm | Date | Time | Platform | Index | Alert content | Level | Marking
Alarm 1 | d | t | B | Effective order | {business group: a} | 1 | Higher
Alarm 2 | d | t | B | Effective order | {business group: a, first class: b} | 2 | Higher
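As a non-limiting illustration, the two alarms of Table 3 and their drill-down relationship can be modeled as follows. The `Alarm` class, its field names and the `is_drill_down` helper are assumptions introduced for this sketch; the disclosure does not prescribe a concrete data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    """One row of anomaly-marking output (field names are illustrative)."""
    date: str
    time: str
    platform: str
    index: str                                   # monitored business index
    content: dict = field(default_factory=dict)  # alert content, e.g. {"business_group": "a"}
    level: int = 0
    marking: str = "higher"

    def series_key(self):
        # Alarms are comparable only when date, time, platform and index agree.
        return (self.date, self.time, self.platform, self.index)


def is_drill_down(upper: Alarm, lower: Alarm) -> bool:
    """True if `lower` refines `upper`: same key, deeper level, superset content."""
    return (
        upper.series_key() == lower.series_key()
        and upper.level < lower.level
        and upper.content.items() <= lower.content.items()
    )


alarm_1 = Alarm("d", "t", "B", "effective order", {"business_group": "a"}, 1)
alarm_2 = Alarm("d", "t", "B", "effective order",
                {"business_group": "a", "first_class": "b"}, 2)

print(is_drill_down(alarm_1, alarm_2))  # True
print(is_drill_down(alarm_2, alarm_1))  # False
```

The check makes explicit what the detection system itself does not record: alarm 2 carries every feature of alarm 1 plus one more, which is why the two alarms can share a root cause.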
In step S110, a plurality of abnormal data are acquired, and each of the abnormal data is classified according to a data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data.
In the present exemplary embodiment, a plurality of abnormal data is acquired first; specifically, abnormal data that have the same data identifier and are output by the same data generation platform at the same moment may be acquired. Acquiring abnormal data that share the same data identifier (the index, for example effective orders or effective amount), the same data generation platform and the same moment ensures that the data are handled in real time, and avoids the prior-art problem that offline detection prevents the cause from being located in real time, leaving timeliness poor and the accuracy of the abnormal SKUs found low. For ease of understanding, the abnormal data shown in Table 4 below are used for illustration.
For example, on the same date d, on the same data generation platform P and at the same time t, the following 12 alarms are generated simultaneously for the same data identifier (effective amount). Of these 12 alarms, 1 belongs to level 0, 4 belong to level 1, and 7 belong to level 2, as shown in Table 4 below.
TABLE 4
Further, after the abnormal data are obtained, they may be classified according to their data levels to obtain the parent node data and the child node data. The data levels may include a first data level and a second data level, the first being higher than the second. Specifically, each abnormal data corresponding to the first data level may be used as parent node data, and each abnormal data corresponding to the second data level may be used as child node data. For example, under this classification rule, the 4 level-1 alarms in Table 4 above each serve as a parent node, and the 7 level-2 alarms are assigned to them as child nodes according to their feature attributes.
It should further be noted that the data levels may also include level 0; for ease of explanation that level is not described in detail here, but it also falls within the scope of the embodiments of the invention. The idea expressed throughout the exemplary embodiments is that data with a higher data level serve as parent node data and data with a lower data level serve as child node data; when a child node data cannot be contained in any parent node data, that child node data may itself be regarded as parent node data.
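As a non-limiting illustration, step S110 can be sketched as a grouping step followed by a classification step. The alarm contents below are inferred from the parent and child nodes described for FIGS. 3 to 6 rather than copied from Table 4, and the dict layout, key names and function names are assumptions made for this sketch.

```python
from collections import defaultdict


def alarm(level, **content):
    # Every example alarm shares date d, time t, platform P and the same index.
    return {"date": "d", "time": "t", "platform": "P",
            "index": "effective amount", "level": level, "content": content}


alarms = [
    alarm(0),
    alarm(1, first_class="c"),
    alarm(1, business_group="b"),
    alarm(1, self_operated="o"),
    alarm(1, province="p"),
    alarm(2, secondary_class="s", first_class="c"),
    alarm(2, business_group="b", first_class="c"),
    alarm(2, self_operated="o", first_class="c"),
    alarm(2, province="p", first_class="c"),
    alarm(2, self_operated="o", business_group="b"),
    alarm(2, province="p", business_group="b"),
    alarm(2, province="p", self_operated="o"),
]


def group_by_key(records):
    """Group alarms that share date, time, platform and data identifier."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["date"], r["time"], r["platform"], r["index"])].append(r)
    return dict(groups)


def classify(records, parent_level=1, child_level=2):
    """Split one group into parent node data and child node data by level."""
    parents = [r for r in records if r["level"] == parent_level]
    children, promoted = [], []
    for r in records:
        if r["level"] != child_level:
            continue
        # A child not contained in any parent is itself treated as a parent.
        covered = any(p["content"].items() <= r["content"].items() for p in parents)
        (children if covered else promoted).append(r)
    return parents + promoted, children


groups = group_by_key(alarms)
parents, children = classify(groups[("d", "t", "P", "effective amount")])
print(len(parents), len(children))  # 4 7
```

The level-0 alarm is left out of both lists here, mirroring the text, which sets that level aside for ease of explanation.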
In step S120, a plurality of initial feature trees are constructed from each of the parent node data and child node data having the same feature information as each of the parent node data.
In the present exemplary embodiment, referring to fig. 2, constructing a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data may include step S210 and step S220, which will be described in detail below.
In step S210, an alarm characteristic of each of the parent node data is determined, and child node data having the same alarm characteristic as each of the parent node data is determined.
In step S220, a plurality of initial feature trees are constructed according to each of the parent node data and child node data having the same alarm feature as each of the parent node data.
Steps S210 and S220 are explained below. Specifically, the alarm features of each parent node data are determined first, and the level-2 alarms containing the same features are then set as child nodes of the corresponding level-1 alarm, forming a plurality of initial feature trees. For example, referring to FIG. 3, the parent node is {first class: c}, and the corresponding child nodes are {secondary class: s, first class: c}, {business group: b, first class: c}, {self-operated: o, first class: c}, {province: p, first class: c}, and the like. Referring to FIG. 4, the parent node is {business group: b}, and the corresponding child nodes are {business group: b, first class: c}, {self-operated: o, business group: b}, {province: p, business group: b}, and the like. Referring to FIG. 5, the parent node is {self-operated: o}, and the corresponding child nodes are {self-operated: o, first class: c}, {self-operated: o, business group: b}, {province: p, self-operated: o}, and the like. Finally, in FIG. 6 the parent node is {province: p}, and the corresponding child nodes are {province: p, first class: c}, {province: p, self-operated: o}, {province: p, business group: b}, and the like.
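As a non-limiting illustration, steps S210 and S220 can be sketched by attaching to each parent every child whose alert content contains all of the parent's features. The dict-based node representation, the key names and the function name are assumptions of this sketch; the node contents follow the examples given for FIGS. 3 to 6.

```python
def build_initial_trees(parents, children):
    """Attach to each parent every child that carries all of the parent's features."""
    trees = []
    for parent in parents:
        # dict.items() views are set-like, so <= tests "is a subset of".
        matched = [c for c in children if parent.items() <= c.items()]
        trees.append({"parent": parent, "children": matched})
    return trees


parents = [
    {"first_class": "c"},
    {"business_group": "b"},
    {"self_operated": "o"},
    {"province": "p"},
]
children = [
    {"secondary_class": "s", "first_class": "c"},
    {"business_group": "b", "first_class": "c"},
    {"self_operated": "o", "first_class": "c"},
    {"province": "p", "first_class": "c"},
    {"self_operated": "o", "business_group": "b"},
    {"province": "p", "business_group": "b"},
    {"province": "p", "self_operated": "o"},
]

trees = build_initial_trees(parents, children)
for tree in trees:
    print(tree["parent"], len(tree["children"]))
```

A child may appear in more than one tree; {business group: b, first class: c}, for instance, belongs both to the tree rooted at {first class: c} and to the tree rooted at {business group: b}, as in FIGS. 3 and 4.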
In step S130, non-empty feature information of all nodes in each initial feature tree is fused to obtain attribute information of each initial feature tree.
In the present exemplary embodiment, the attribute information includes the category level and business group to which the target object corresponding to each initial feature tree belongs, the source of the target object, the geographic location at which each abnormal data corresponding to each initial feature tree was generated, the channel level at which each such abnormal data was generated, and the like. Specifically, the attributes of each initial feature tree comprise the first class, secondary class, business group, whether self-operated, province, primary channel, secondary channel and city.
Further, for each initial feature tree, all non-empty feature information of all its nodes (the parent node and the child nodes) is fused to form the attributes of that feature tree. An attribute that appears in neither the parent node nor any child node remains empty. Specifically, feature fusion is performed on each feature tree independently according to this fusion rule, and the fusion result is shown in Table 5 below:
TABLE 5
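As a non-limiting illustration, the fusion rule of step S130 can be sketched as a union of the non-empty features of the parent node and all child nodes, with every other attribute left empty. The attribute key names are hypothetical, and letting the first non-empty value win is an assumption of this sketch, since the description does not discuss conflicting values.

```python
# The eight attributes named in the description, under hypothetical key names.
ATTRIBUTES = (
    "first_class", "secondary_class", "business_group", "self_operated",
    "province", "first_channel", "secondary_channel", "city",
)


def fuse(tree):
    """Merge the non-empty features of all nodes of one initial feature tree."""
    fused = dict.fromkeys(ATTRIBUTES)  # every attribute starts empty (None)
    for node in [tree["parent"], *tree["children"]]:
        for name, value in node.items():
            # Assumption: the first non-empty value seen for an attribute is kept.
            if value is not None and fused.get(name) is None:
                fused[name] = value
    return fused


# Initial feature tree rooted at {first class: c}, as described for FIG. 3.
tree_1 = {
    "parent": {"first_class": "c"},
    "children": [
        {"secondary_class": "s", "first_class": "c"},
        {"business_group": "b", "first_class": "c"},
        {"self_operated": "o", "first_class": "c"},
        {"province": "p", "first_class": "c"},
    ],
}

for name, value in fuse(tree_1).items():
    print(name, value)
```

For this tree the fused attributes are first class c, secondary class s, business group b, self-operated o and province p, while the two channel attributes and the city stay empty.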
In step S140, the initial feature trees having the same attribute information are filtered to obtain a plurality of target feature trees, and the target object that generates each piece of abnormal data is searched for according to the attribute information of each target feature tree.
In the present exemplary embodiment, the initial feature trees having the same attribute information are first filtered to obtain a plurality of target feature trees. Specifically, referring to FIG. 7, this filtering may include steps S710 to S730, which are described in detail below.
In step S710, a feature tree set is constructed from the initial feature trees, and whether a containing-and-contained relationship exists between the initial feature trees is determined according to the attribute information of each initial feature tree in the feature tree set.
In step S720, if a containing-and-contained relationship exists between any two initial feature trees, the initial feature tree that is contained is deleted.
In step S730, the initial feature trees remaining in the feature tree set after the contained initial feature trees have been deleted are used as the plurality of target feature trees, wherein no containing-and-contained relationship exists between any two target feature trees.
Hereinafter, steps S710 to S730 will be explained. Specifically, according to the above filtering rule (if a containing-and-contained relationship exists between any two initial feature trees, the contained initial feature tree is deleted), the attributes of initial feature tree 2, initial feature tree 3, and initial feature tree 4 are all contained in initial feature tree 1; feature trees 2, 3, and 4 are therefore filtered out, and feature tree 1 is retained. The results are shown in Table 6 below.
TABLE 6
Finally, the output consists of all the attributes of feature tree 1, as shown in Table 7 below:
TABLE 7
Feature tree 1 attribute
Primary class: c
Secondary class: s
Business group: b
Whether self-operated: o
Province: p
Primary channel:
Secondary channel:
City:
By constructing the feature tree set, the accuracy of the obtained target feature trees can be further improved, which in turn improves the accuracy of the target object found to have generated the abnormal data.
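The filtering of steps S710 to S730 can be sketched as a subset test over the non-empty attributes of each tree. This is an illustrative sketch only: the function names and key names are hypothetical, and the decision to keep the first of several trees with identical attributes is an assumption, since the disclosure only states that contained trees are deleted.

```python
from typing import Dict, List, Optional, Set, Tuple

Attrs = Dict[str, Optional[str]]


def non_empty_items(attrs: Attrs) -> Set[Tuple[str, str]]:
    """Return the set of (attribute, value) pairs whose value is not empty."""
    return {(key, value) for key, value in attrs.items() if value not in (None, "")}


def filter_contained(trees: List[Attrs]) -> List[Attrs]:
    """Delete every tree whose attributes are contained in those of another tree."""
    item_sets = [non_empty_items(attrs) for attrs in trees]
    kept = []
    for i, current in enumerate(item_sets):
        contained = any(
            # Strictly contained in another tree, or identical to an earlier tree
            # (keeping the first of several identical trees is an assumption).
            current < other or (current == other and j < i)
            for j, other in enumerate(item_sets)
            if j != i
        )
        if not contained:
            kept.append(trees[i])
    return kept


# Hypothetical fused attributes for feature trees 1 to 4 of the example.
tree_1 = {"primary_class": "c", "secondary_class": "s", "business_group": "b",
          "self_operated": "o", "province": "p", "city": None}
tree_2 = {"primary_class": "c", "business_group": "b", "self_operated": "o", "province": "p"}
tree_3 = dict(tree_2)
tree_4 = dict(tree_2)

targets = filter_contained([tree_1, tree_2, tree_3, tree_4])
print(targets)
```

With these inputs only feature tree 1 remains, matching the outcome described for Table 6 and Table 7.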
Further, after the target feature trees are obtained, the target object that generates each piece of abnormal data may be searched for according to the attribute information of each target feature tree. Specifically, this may include: searching for the target object that generates each piece of abnormal data according to the category level and business group to which the target object corresponding to each target feature tree belongs, the source of the target object, the geographic location where the abnormal data corresponding to each target feature tree was generated, and the channel level through which the abnormal data corresponding to each target feature tree was generated. By contrast, the prior-art approach of searching for abnormal SKUs by means of an information chain ultimately outputs only two features.
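One way to picture the search step is as a filter over candidate records using the non-empty attributes of a target feature tree. The record layout, the `sku_id` field, and the function `find_target_objects` below are hypothetical; the disclosure does not describe how candidate objects are stored or queried.

```python
from typing import Dict, List, Optional


def find_target_objects(target_attrs: Dict[str, Optional[str]],
                        records: List[Dict[str, str]]) -> List[str]:
    """Return the IDs of records that match every non-empty attribute of the tree."""
    wanted = {key: value for key, value in target_attrs.items() if value not in (None, "")}
    return [
        record["sku_id"]
        for record in records
        if all(record.get(key) == value for key, value in wanted.items())
    ]


# Hypothetical attributes of a target feature tree; empty attributes are ignored.
target_attrs = {"primary_class": "c", "secondary_class": "s", "business_group": "b",
                "self_operated": "o", "province": "p", "city": None}

# Hypothetical candidate records combining object and location information.
records = [
    {"sku_id": "sku-1", "primary_class": "c", "secondary_class": "s",
     "business_group": "b", "self_operated": "o", "province": "p", "city": "x"},
    {"sku_id": "sku-2", "primary_class": "c", "secondary_class": "t",
     "business_group": "b", "self_operated": "o", "province": "p", "city": "x"},
]
matches = find_target_objects(target_attrs, records)
print(matches)
```

The more attributes a target feature tree carries, the narrower the set of matching candidates becomes in this sketch.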
The disclosure also provides a data processing device. Referring to fig. 8, the data processing apparatus may include a data classification module 810, an initial feature tree construction module 820, a feature information integration module 830, and a target object search module 840. Wherein:
The data classification module 810 may be configured to obtain a plurality of abnormal data, and classify each of the abnormal data according to a data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data.
The initial feature tree construction module 820 may be configured to construct a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data.
The feature information integration module 830 may be configured to integrate non-empty feature information of all nodes in each initial feature tree to obtain attribute information of each initial feature tree.
The target object searching module 840 may be configured to filter the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and to search for the target object that generates each piece of abnormal data according to the attribute information of each target feature tree.
In one exemplary embodiment of the present disclosure, acquiring a plurality of anomaly data includes:
Acquiring a plurality of abnormal data that are output by the same data generation platform at the same moment and have the same data identifier.
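As a rough sketch of this acquisition condition, alarms can be grouped by platform, moment, and data identifier so that each group is processed together. The field names `platform`, `timestamp`, and `data_id` are assumptions introduced for the example and are not defined in the disclosure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_anomalies(alarms: List[dict]) -> Dict[Tuple[str, str, str], List[dict]]:
    """Group alarms that share the same platform, moment, and data identifier."""
    groups: Dict[Tuple[str, str, str], List[dict]] = defaultdict(list)
    for alarm in alarms:
        key = (alarm["platform"], alarm["timestamp"], alarm["data_id"])
        groups[key].append(alarm)
    return dict(groups)


# Hypothetical alarms; only the first two belong to the same batch.
alarms = [
    {"platform": "A", "timestamp": "t0", "data_id": "m1", "features": {"primary_class": "c"}},
    {"platform": "A", "timestamp": "t0", "data_id": "m1", "features": {"province": "p"}},
    {"platform": "A", "timestamp": "t1", "data_id": "m1", "features": {"province": "p"}},
]
batches = group_anomalies(alarms)
print({key: len(batch) for key, batch in batches.items()})
```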
In one exemplary embodiment of the present disclosure, the data levels are divided into a first data level and a second data level; the first data level is higher than the second data level; wherein classifying each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data includes:
Taking the abnormal data corresponding to the first data level as the parent node data, and taking the abnormal data corresponding to the second data level as the child node data.
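The classification by data level can be sketched as a simple partition. The numeric `level` field, with 1 standing for the first (higher) data level and 2 for the second, is an assumption of this example; the disclosure does not say how the level is encoded.

```python
from typing import List, Tuple


def classify_by_level(alarms: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split alarms into parent node data (level 1) and child node data (level 2)."""
    parents = [alarm["features"] for alarm in alarms if alarm["level"] == 1]
    children = [alarm["features"] for alarm in alarms if alarm["level"] == 2]
    return parents, children


# Hypothetical alarms with an explicit level field.
alarms = [
    {"level": 1, "features": {"primary_class": "c"}},
    {"level": 2, "features": {"secondary_class": "s", "primary_class": "c"}},
    {"level": 1, "features": {"province": "p"}},
]
parents, children = classify_by_level(alarms)
print(parents, children)
```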
In one exemplary embodiment of the present disclosure, constructing a plurality of initial feature trees from each of the parent node data and child node data having the same feature information as each of the parent node data includes:
Determining the alarm features of each piece of parent node data, and determining the child node data having the same alarm features as each piece of parent node data; and constructing a plurality of initial feature trees according to each piece of parent node data and the child node data having the same alarm features as that parent node data.
In an exemplary embodiment of the present disclosure, filtering an initial feature tree having the same attribute information to obtain a plurality of target feature trees includes:
Constructing a feature tree set from the initial feature trees, and determining, according to the attribute information of each initial feature tree in the feature tree set, whether a containing-and-contained relationship exists between the initial feature trees; if a containing-and-contained relationship exists between any two initial feature trees, deleting the initial feature tree that is contained; and using the initial feature trees remaining in the feature tree set after the contained initial feature trees have been deleted as the plurality of target feature trees, wherein no containing-and-contained relationship exists between any two target feature trees.
In one exemplary embodiment of the present disclosure, the attribute information includes several of the following: the category level and business group to which the target object corresponding to each initial feature tree belongs, the source of the target object, the geographic location where each piece of abnormal data corresponding to each initial feature tree was generated, and the channel level through which each piece of abnormal data corresponding to each initial feature tree was generated.
In an exemplary embodiment of the present disclosure, searching for the target object generating each of the abnormal data according to the attribute information of each of the target feature trees includes:
Searching for the target object that generates each piece of abnormal data according to the category level and business group to which the target object corresponding to each target feature tree belongs, the source of the target object, the geographic location where the abnormal data corresponding to each target feature tree was generated, and the channel level through which the abnormal data corresponding to each target feature tree was generated.
The specific details of each module in the above data processing apparatus have been described in detail in the corresponding data processing method, so that the details are not repeated here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the steps of the methods of the present invention are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in this particular order or that all of the steps shown must be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
In an exemplary embodiment of the present invention, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
An electronic device 900 according to this embodiment of the invention is described below with reference to FIG. 9. The electronic device 900 shown in FIG. 9 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: the at least one processing unit 910, the at least one storage unit 920, and a bus 930 connecting the different system components (including the storage unit 920 and the processing unit 910).
The storage unit stores program code that is executable by the processing unit 910, such that the processing unit 910 performs the steps according to the various exemplary embodiments of the present invention described in the above "exemplary methods" section of the present specification. For example, the processing unit 910 may perform step S110 as shown in FIG. 1: acquiring a plurality of abnormal data, and classifying each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data; step S120: constructing a plurality of initial feature trees according to each of the parent node data and the child node data having the same feature information as each of the parent node data; step S130: fusing the non-empty feature information of all nodes in each initial feature tree to obtain attribute information of each initial feature tree; step S140: filtering the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and searching for the target object that generates each of the abnormal data according to the attribute information of each target feature tree.
The storage unit 920 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 9201 and/or cache memory 9202, and may further include Read Only Memory (ROM) 9203.
The storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus 930 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 950. Also, electronic device 900 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 960. As shown, the network adapter 960 communicates with other modules of the electronic device 900 over the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 900, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present invention.
In an exemplary embodiment of the present invention, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
A program product for implementing the above-described method according to an embodiment of the present invention may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (10)

1. A data processing method, comprising: acquiring a plurality of abnormal data, and classifying each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data; constructing a plurality of initial feature trees according to each of the parent node data and the child node data having the same feature information as each of the parent node data; fusing the non-empty feature information of all nodes in each of the initial feature trees to obtain the attribute information of each of the initial feature trees; and filtering the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and searching, according to the attribute information of each of the target feature trees, for the target object that generates each of the abnormal data.
2. The data processing method according to claim 1, wherein acquiring a plurality of abnormal data comprises: acquiring a plurality of the abnormal data that are from the same moment, have the same data identifier, and are output by the same data generation platform.
3. The data processing method according to claim 2, wherein the data levels are divided into a first data level and a second data level, the first data level being higher than the second data level; and wherein classifying each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data comprises: taking each abnormal data corresponding to the first data level as each of the parent node data, and taking each abnormal data corresponding to the second data level as the child node data.
4. The data processing method according to claim 3, wherein constructing a plurality of initial feature trees according to each of the parent node data and the child node data having the same feature information as each of the parent node data comprises: determining the alarm features of each of the parent node data, and determining the child node data having the same alarm features as each of the parent node data; and constructing a plurality of initial feature trees according to each of the parent node data and the child node data having the same alarm features as each of the parent node data.
5. The data processing method according to claim 4, wherein filtering the initial feature trees having the same attribute information to obtain a plurality of target feature trees comprises: constructing a feature tree set from the initial feature trees, and determining, according to the attribute information of each of the initial feature trees in the feature tree set, whether a containing-and-contained relationship exists between the initial feature trees; if a containing-and-contained relationship exists between any two initial feature trees, deleting the initial feature tree that is contained; and using the initial feature trees remaining in the feature tree set after the contained initial feature trees have been deleted as the plurality of target feature trees; wherein no containing-and-contained relationship exists between any two of the target feature trees.
6. The data processing method according to any one of claims 1 to 5, wherein the attribute information includes several of the following: the category level and business group to which the target object corresponding to each of the initial feature trees belongs, the source of the target object, the geographic location where each of the abnormal data corresponding to each of the initial feature trees is generated, and the channel level through which each of the abnormal data corresponding to each of the initial feature trees is generated.
7. The data processing method according to claim 6, wherein searching for the target object that generates each of the abnormal data according to the attribute information of each of the target feature trees comprises: searching for the target object that generates each of the abnormal data according to the category level and business group to which the target object corresponding to each of the target feature trees belongs, the source of the target object, the geographic location where each of the abnormal data corresponding to each of the target feature trees is generated, and the channel level through which each of the abnormal data corresponding to each of the target feature trees is generated.
8. A data processing device, comprising: a data classification module, configured to acquire a plurality of abnormal data, and classify each of the abnormal data according to the data level of each of the abnormal data to obtain a plurality of parent node data and a plurality of child node data; an initial feature tree construction module, configured to construct a plurality of initial feature trees according to each of the parent node data and the child node data having the same feature information as each of the parent node data; a feature information integration module, configured to integrate the non-empty feature information of all nodes in each of the initial feature trees to obtain the attribute information of each of the initial feature trees; and a target object searching module, configured to filter the initial feature trees having the same attribute information to obtain a plurality of target feature trees, and to search, according to the attribute information of each of the target feature trees, for the target object that generates each of the abnormal data.
9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
10. An electronic device, comprising: a processor; and a memory, configured to store executable instructions of the processor; wherein the processor is configured to execute the data processing method according to any one of claims 1 to 7 by executing the executable instructions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911083637.0A CN112784113B (en) 2019-11-07 2019-11-07 Data processing method and device, computer readable storage medium, and electronic device

Publications (2)

Publication Number Publication Date
CN112784113A CN112784113A (en) 2021-05-11
CN112784113B true CN112784113B (en) 2024-10-18

Family

ID=75748011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911083637.0A Active CN112784113B (en) 2019-11-07 2019-11-07 Data processing method and device, computer readable storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN112784113B (en)

Also Published As

Publication number Publication date
CN112784113A (en) 2021-05-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant