Disclosure of Invention
To solve, or at least partially solve, the technical problems described above, the present application provides a method for constructing a metadata tag library that implements automatic batch calculation and automatic marking of metadata tags.
In a first aspect, the present application provides a method for constructing a metadata tag library, including:
acquiring a plurality of metadata entities;
acquiring a dimension tag corresponding to each metadata entity according to the relationships among the metadata entities, wherein the dimension tag indicates the dimension of the relationship between the metadata entity and another metadata entity;
acquiring the blood relationship corresponding to each metadata entity according to the dimension tag corresponding to that metadata entity; and
according to the blood relationship of each metadata entity, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag.
As an optional implementation of the embodiments of the present application, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity includes:
acquiring the number of times each metadata entity is referenced, its reference frequency and its referrer weight according to the blood relationship of each metadata entity; and
acquiring the liveness tag of each metadata entity according to the number of times referenced, the reference frequency and the referrer weight.
As an optional implementation of the embodiments of the present application, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity includes:
acquiring one or more of the centrality, intermediacy and compactness of each metadata entity according to the blood relationship of each metadata entity; and
acquiring the influence tag of each metadata entity according to one or more of the centrality, the intermediacy and the compactness.
As an optional implementation of the embodiments of the present application, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity includes:
performing cluster analysis on the metadata entities to obtain a clustering result; and
acquiring the influence tag of each metadata entity according to the clustering result.
As an optional implementation of the embodiments of the present application, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity includes:
obtaining a calculation result of the degree of similarity among the metadata entities according to the blood relationships of the metadata entities; and
obtaining the similarity tag of each metadata entity according to the calculation result of the degree of similarity among the metadata entities.
As an optional implementation of the embodiments of the present application, before the dimension tags corresponding to the metadata entities are acquired according to the relationships among the metadata entities, the method further includes:
acquiring the relationships among the metadata entities by one or more of parsing a data dictionary, parsing SQL statements, parsing the database and parsing an audit log.
As an optional implementation of the embodiments of the present application, the method further includes:
generating a relationship graph by taking each metadata entity as a vertex and, within the blood relationship corresponding to each metadata entity, the relationships between that metadata entity and other metadata entities as edges; and
storing the relationship graph in a graph database.
In a second aspect, the present application provides an apparatus for constructing a metadata tag library, including:
a metadata entity acquisition module, configured to acquire a plurality of metadata entities;
a first acquisition module, configured to acquire a dimension tag corresponding to each metadata entity according to the relationships among the metadata entities, wherein the dimension tag indicates the dimension of the relationship between the metadata entity and another metadata entity;
a second acquisition module, configured to acquire the blood relationship corresponding to each metadata entity according to the dimension tag corresponding to that metadata entity; and
a third acquisition module, configured to acquire and add, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity.
As an optional implementation of the embodiments of the present application, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity includes:
acquiring the number of times each metadata entity is referenced, its reference frequency and its referrer weight according to the blood relationship of each metadata entity; and
acquiring the liveness tag of each metadata entity according to the number of times referenced, the reference frequency and the referrer weight.
As an optional implementation of the embodiments of the present application, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity includes:
acquiring one or more of the centrality, intermediacy and compactness of each metadata entity according to the blood relationship of each metadata entity; and
acquiring the influence tag of each metadata entity according to one or more of the centrality, the intermediacy and the compactness.
As an optional implementation of the embodiments of the present application, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity includes:
performing cluster analysis on the metadata entities to obtain a clustering result; and
acquiring the influence tag of each metadata entity according to the clustering result.
As an optional implementation of the embodiments of the present application, acquiring and adding, for each metadata entity, at least one of a liveness tag, an influence tag and a similarity tag according to the blood relationship of each metadata entity includes:
obtaining a calculation result of the degree of similarity among the metadata entities according to the blood relationships of the metadata entities; and
obtaining the similarity tag of each metadata entity according to the calculation result of the degree of similarity among the metadata entities.
As an optional implementation of the embodiments of the present application, before the dimension tags corresponding to the metadata entities are acquired according to the relationships among the metadata entities, the method further includes:
acquiring the relationships among the metadata entities by one or more of parsing a data dictionary, parsing SQL statements, parsing the database and parsing an audit log.
As an optional implementation of the embodiments of the present application, the method further includes:
generating a relationship graph by taking each metadata entity as a vertex and, within the blood relationship corresponding to each metadata entity, the relationships between that metadata entity and other metadata entities as edges; and
storing the relationship graph in a graph database.
In a third aspect, the present application provides a computer device, comprising a memory and a processor, the memory storing a computer program, the processor executing the computer program to perform the method of building a metadata tag library according to the first aspect or any implementation of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium having a computer program stored thereon, wherein the computer program is executed by a processor to perform the method for building a metadata tag library according to the first aspect or any implementation manner of the first aspect.
Compared with the prior art, the technical scheme provided by the application has the following advantages:
In the method for constructing a metadata tag library, a plurality of metadata entities are first acquired; dimension tags corresponding to the metadata entities are then acquired according to the relationships among the metadata entities; the blood relationship corresponding to each metadata entity is acquired according to its dimension tag; and finally, at least one of a liveness tag, an influence tag and a similarity tag is acquired and added for each metadata entity according to its blood relationship. Compared with traditional ways of generating data tags, the dimension reflecting the association relationships among metadata entities is incorporated into the data asset tag system, so that all kinds of metadata entities and their corresponding blood relationships can be acquired completely and a complete metadata tag system can be constructed.
Detailed Description
In order that the above objects, features and advantages of the application will be more clearly understood, a further description of the application will be made. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the application.
Definition of technical terms:
Data tag: a data tag is an important means of organizing data assets. On the one hand, data asset managers can use data tags to supplement and extend the classified and graded management of data and to enrich the description of data characteristics and attributes; on the other hand, users of data assets can use data tags to quickly find the data they need.
Big data tag: when the data resource is big data, the data tag evolves into a big data tag, which differs from a conventional enterprise data tag in the following respects:
(1) Different starting points: traditional enterprise tags start mainly from the enterprise's business perspective, whereas big data tags in principle target every class of data ID that is worth tagging and feasible to tag.
(2) Different data mining methods: traditional enterprise tags rely mainly on experience to combine the relevant dimensions and set thresholds, whereas big data tags use data models for dimension screening and threshold setting.
(3) Different tag management: traditional enterprise tags are generally not managed through a systematic framework of their own, whereas big data tags emphasize full-lifecycle and dynamic management.
(4) Different supporting applications: traditional enterprise tags are also applied on the basis of experience; for example, from a product perspective, tags are used to identify potential target customers for a product, who are then reached through the relevant channels to promote the product. Big data tags rely more on an in-depth understanding of the scenario and emphasize obtaining a profile of a given class of data ID, so that an integrated offering of solutions, information, channels and value can be matched to it.
Blood relationship analysis (also known as data lineage analysis) is a technique for comprehensively tracking data processing: starting from a given data object, it finds all related metadata objects and the relationships among them. These relationships are, specifically, the input-output relationships of the data flows between the metadata objects. The purposes of blood relationship analysis are to obtain the source information of result data in an integrated database or view by tracing its lineage and, when data is updated, to reflect changes in the original database and to inspect how the data changes along the data flow.
Blood relationship analysis maps the internal relationships among data objects and, combined with their temporal order and precedence, reflects their correlation and their cause and effect. It has a very wide range of applications and is a core tool for data asset management.
The embodiments of the application provide a method, an apparatus, an electronic device, a computer-readable storage medium and a program product for constructing a metadata tag library. In the method, a plurality of metadata entities are acquired, the dimension reflecting the association relationships among the metadata entities is incorporated into the data asset tag system, and at least one of a liveness tag, an influence tag and a similarity tag is acquired and added for each metadata entity according to its blood relationship. Because artificial intelligence modeling is introduced into the tagging work when the liveness, influence and similarity tags are acquired, the low efficiency, high cost and long cycle of manual tagging can be overcome.
In one embodiment, the application provides a method for constructing a metadata tag library. As shown in FIG. 1, the method includes the following steps:
S101, acquiring a plurality of metadata entities.
By operation mode, the acquisition of metadata entities is divided into automatic acquisition and manual acquisition.
Specifically, automatic acquisition means that acquisition tasks are completed automatically and on schedule. An acquisition task is an automatically scheduled unit of work that provides an automated, periodic or time-triggered mechanism for acquiring metadata. Tasks can be maintained through the interface (queried, added, modified and deleted), and the time and state of their automatic execution can be configured. Illustratively, metadata is collected automatically once connection rights for the data source have been obtained. The data source types may include conventional relational database management systems such as Oracle Database, MySQL, IBM DB2, Informix (a relational database management system from IBM), MariaDB and Sybase. Manual acquisition means selecting a local file and uploading it to the server to collect metadata manually. In contrast to automatic acquisition, manual acquisition collects, in real time and by hand, metadata stored in a local file, such as an Excel file.
Further, after the database is connected, the file storing the metadata is read into the parsing platform. Take the data dictionary as an example: it is a table containing descriptive data such as table names, field names, data types and stored procedures, and tables of this kind are canonical and have a uniform format. Different parsers may be used for different types of databases to parse this descriptive data into nodes in the graph database. The acquisition log can also be viewed to check whether acquisition succeeded. The acquisition log provides the following information about an acquisition task: start time, task status, end time, process log, number of items acquired, and so on.
After metadata acquisition is completed, the metadata is stored in a database to support metadata applications including metadata statistics, query, blood relationship analysis, influence analysis, data asset maps and the like.
S102, acquiring dimension labels corresponding to the metadata entities according to the relation among the metadata entities.
Wherein the dimension tag is used to indicate a dimension of a relationship between the metadata entity and another metadata entity.
S103, acquiring the blood-edge relation corresponding to each metadata entity according to the dimension label corresponding to each metadata entity.
A blood relationship is the relationship between related data discovered while tracing the data. The blood relationship of big data is the chain along which data is produced, that is, where the data comes from and which processes and stages it has passed through. Blood relationships at different levels make the migration and flow of data clear. From the dimension tag corresponding to each metadata entity, the dimension of the relationship between one metadata entity and another can be obtained, that is, how many levels of blood relationship separate the two metadata entities.
Because the dimension reflecting the metadata association relationships is incorporated into the data asset tag system, all kinds of metadata nodes and relationships can be acquired automatically and completely, which provides a guarantee for automatically constructing a complete data map. A data asset tag describes an asset from one of several different angles: one tag can be applied to different assets, one asset can carry several different tags at the same time, and data asset tags can be classified and managed in groups or catalogs. The tag system should therefore be constructed with the different application angles in mind, such as querying, taking inventory of and recommending data assets. For example, data security management can define a tag catalog and tags from the perspective of data security level and apply those tags to various types of assets; once tagging is complete, data assets can be searched through the different tag systems. The tag catalog system may also be defined from the perspective of business lines, the data lifecycle and the like.
S104, according to the blood relationship of each metadata entity, at least one label of an activity label, an influence label and a similar label is obtained and added for each metadata entity.
In one embodiment, acquiring and adding at least one of a liveness tag, an influence tag and a similarity tag for each metadata entity according to the blood relationship of each metadata entity includes: acquiring the number of times each metadata entity is referenced, its reference frequency and its referrer weight according to the blood relationship of each metadata entity; and acquiring the liveness tag of each metadata entity according to the number of times referenced, the reference frequency and the referrer weight.
Specifically, for the metadata liveness tag, liveness calculation rules are set based on the number of times referenced, the reference frequency and the referrer weight.
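As a minimal sketch of such a liveness calculation rule, the three inputs can be normalized and combined into a weighted score that is then mapped to a tag. The weights, normalization caps, thresholds and the names `ReferenceStats`, `liveness_score` and `liveness_tag` are all illustrative assumptions; the embodiment does not prescribe a particular formula.

```python
from dataclasses import dataclass


@dataclass
class ReferenceStats:
    times_referenced: int       # how many times the entity is referenced
    reference_frequency: float  # references per day in the observation window
    referrer_weight: float      # average weight of the referrers, in [0, 1]


def liveness_score(stats, w_times=0.4, w_freq=0.4, w_referrer=0.2,
                   max_times=100, max_freq=10.0):
    """Weighted sum of the three inputs, each scaled to [0, 1] (assumed rule)."""
    times = min(stats.times_referenced / max_times, 1.0)
    freq = min(stats.reference_frequency / max_freq, 1.0)
    return w_times * times + w_freq * freq + w_referrer * stats.referrer_weight


def liveness_tag(score):
    """Map a score to a coarse liveness tag; thresholds are hypothetical."""
    if score >= 0.66:
        return "high"
    if score >= 0.33:
        return "medium"
    return "low"


hot_table = ReferenceStats(times_referenced=100, reference_frequency=10.0,
                           referrer_weight=1.0)
cold_table = ReferenceStats(times_referenced=0, reference_frequency=0.0,
                            referrer_weight=0.0)
print(liveness_tag(liveness_score(hot_table)))
print(liveness_tag(liveness_score(cold_table)))
```

In practice the weights and thresholds would be tuned to the application scenario.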
In one embodiment, the obtaining and adding at least one tag of the liveness tag, the influence tag and the similar tag for each metadata entity according to the blood relationship of each metadata entity includes: acquiring one or more of centrality, intermediacy and compactness of each metadata entity according to the blood relationship of each metadata entity; and obtaining influence tags of each metadata entity according to one or more of the centrality, the intermediacy and the compactness.
Centrality is an index that measures the importance of a node. It has three most basic measures: degree centrality, betweenness centrality and eigenvector centrality. Degree centrality is the most direct measure of node centrality in network analysis: the greater the degree of a node, the higher its degree centrality and the more important the node is in the network.
Degree centrality: a node that is directly connected to many other nodes occupies a central position. That is, the more relationships a node has and the more neighbors it has, the more important the node is.
Betweenness centrality (intermediacy): the number of shortest paths between other nodes on which a node lies. Such a node acts as a gateway through which the nodes connected to it must pass to reach other nodes.
Closeness centrality (compactness): reflects how close a node is to the other nodes. A node that is close to the other nodes does not need to rely on intermediaries when it propagates information; if its distance to every other node in the network is short, it is not constrained by other nodes.
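The three measures described above can be sketched on a small undirected graph as follows. This is an illustrative sketch only: the graph `lineage_graph`, its entity names and the function names are assumptions, and the betweenness routine uses Brandes' algorithm, which the embodiment does not specifically name.

```python
from collections import deque

# Hypothetical undirected graph: entity -> set of directly related entities.
lineage_graph = {
    "orders": {"a", "b", "c"},
    "a": {"orders"},
    "b": {"orders"},
    "c": {"orders"},
}


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Unnormalized count of shortest paths passing through each node."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


degree = degree_centrality(lineage_graph)
closeness = closeness_centrality(lineage_graph)
betweenness = betweenness_centrality(lineage_graph)
```

Here the hub entity `orders` scores highest on all three measures, which is the kind of signal an influence tag could be derived from.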
Nodes are a graph database concept and correspond to libraries, tables, fields, views and so on; relationships are the relationships among them, such as those between libraries and tables or between one library and another. In database design, a node is equivalent to an entity. An entity usually refers to a collection of things of some kind in the database; it can be a concrete person, event or object, or an abstract concept or association.
Because centrality, intermediacy and compactness all measure the importance of a metadata entity in the network, different criteria may be used in different application scenarios, and this embodiment imposes no particular limitation. In addition, the importance ranking of nodes in a network can be calculated with the PageRank algorithm.
PageRank, i.e. web page ranking, is the algorithm Google originally used to rank web pages; it treats links as votes to indicate how important a web page is. PageRank models web pages as vertices and the hyperlinks between them as edges, so that the entire Internet becomes one very large graph. The calculation is not complex: before the first iteration, every vertex sets its PageRank value to 1; in each iteration, every vertex divides its current PageRank value by its number of outgoing edges and contributes the result to each of its neighbors as a vote, then accumulates the votes it receives from its neighbors as its new PageRank value; this repeats until the change in every vertex's PageRank value between adjacent rounds falls below a threshold. When a search engine returns results, it must consider not only how relevant a page's content is to the keywords but also the quality of the page.
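The iteration described above can be sketched as follows. The damping factor is an assumption added here for numerical stability (setting `damping=1.0` gives the procedure exactly as described in the text); the function name `pagerank` and the sample graphs are likewise illustrative.

```python
def pagerank(out_edges, damping=0.85, tol=1e-6, max_iter=100):
    """Iterative PageRank over a dict: vertex -> list of vertices it links to."""
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)

    # Before the first round every vertex starts with a PageRank value of 1.
    rank = {v: 1.0 for v in vertices}
    for _ in range(max_iter):
        received = {v: 0.0 for v in vertices}
        for v, targets in out_edges.items():
            if targets:
                # Each vertex splits its current value evenly over its out-edges.
                share = rank[v] / len(targets)
                for t in targets:
                    received[t] += share
        new_rank = {v: (1 - damping) + damping * received[v] for v in vertices}
        change = max(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        # Stop once no vertex changed by more than the threshold.
        if change < tol:
            break
    return rank


# Hypothetical lineage graphs: an edge points from a referrer to what it uses.
cycle_ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
hub_ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In the second graph, `c` is referenced by both `a` and `b` and therefore ends up with the highest value, while `b`, which nothing references, ends up with the lowest.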
In one embodiment, the obtaining and adding at least one tag of the liveness tag, the influence tag and the similar tag for each metadata entity according to the blood relationship of each metadata entity includes: performing cluster analysis on the metadata entities to obtain a cluster result; and acquiring influence tags of the metadata entities according to the clustering result.
Clustering is an important class of unsupervised algorithms in machine learning that groups data points into a number of specific clusters. In theory, data points assigned to the same cluster share the same characteristics, while data points in different clusters have different attributes.
In this embodiment, a density clustering or community clustering method may be used to obtain the clustering result.
Density clustering examines the connectivity between samples from the perspective of sample density and keeps expanding clusters from connectable samples until the final clustering result is obtained.
Community discovery algorithm based on spectral analysis: the graph is represented by a specific matrix built from its adjacency matrix and a diagonal matrix, for example the Laplacian matrix L = D - W, where D is the diagonal matrix whose diagonal elements are the degrees of the nodes and W is the adjacency matrix of the graph. The eigenvector components corresponding to each node are taken as spatial coordinates, so that the nodes of the network are mapped into a multidimensional feature vector space, and a traditional clustering method then clusters the nodes into communities.
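As a small illustration of the first step of the spectral approach, the Laplacian matrix L = D - W can be built directly from an adjacency matrix. The sample matrix `W` and the function name `laplacian` are assumptions; the subsequent eigen-decomposition and clustering steps are not shown.

```python
def laplacian(W):
    """Return L = D - W, where D holds each node's degree on its diagonal."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [
        [(degrees[i] if i == j else 0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical adjacency matrix of three entities; entity 0 relates to 1 and 2.
W = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
L = laplacian(W)
for row in L:
    print(row)
```

Every row of a graph Laplacian sums to zero, which is a quick sanity check on the construction.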
In one embodiment, acquiring and adding at least one of a liveness tag, an influence tag and a similarity tag for each metadata entity according to the blood relationship of each metadata entity includes: obtaining a calculation result of the degree of similarity among the metadata entities according to their blood relationships; and obtaining the similarity tag of each metadata entity according to that calculation result.
Illustratively, in this embodiment a similarity algorithm is used for modeling to obtain the calculation result of the degree of similarity among the metadata entities. Similarity measurement algorithms are widely used in natural language processing, data mining and machine learning and are the basis of text computation. A similarity measure helps developers discover relevance in data and hinges on two aspects: the feature representation of the data, and the way the relationship between sets is represented.
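The embodiment does not name a specific similarity algorithm. As one possible sketch, each metadata entity can be represented by the set of entities it is related to in its blood relationship, and pairs can be compared with the Jaccard coefficient. The choice of Jaccard, the threshold and the names `jaccard`, `similarity_tags` and `lineage` are assumptions made for illustration.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_tags(lineage, threshold=0.5):
    """For each entity, list the entities whose lineage sets are similar."""
    names = sorted(lineage)
    tags = {name: [] for name in names}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(lineage[x], lineage[y]) >= threshold:
                tags[x].append(y)
                tags[y].append(x)
    return tags


# Hypothetical lineage: entity -> upstream entities it is derived from.
lineage = {
    "t1": {"src_a", "src_b"},
    "t2": {"src_a", "src_b", "src_c"},
    "t3": {"src_z"},
}
tags = similarity_tags(lineage)
```

With this data, `t1` and `t2` share two of three upstream sources and are tagged as similar to each other, while `t3` receives no similarity tag.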
In this embodiment, artificial intelligence modeling is used in acquiring each tag of the metadata entities, which overcomes the low efficiency, high cost and long cycle of manual tagging.
In one embodiment, before the dimension tags corresponding to the metadata entities are acquired according to the relationships among the metadata entities, the relationships among the metadata entities are acquired by one or more of parsing a data dictionary, parsing SQL statements, parsing the database and parsing an audit log.
Specifically, the first approach is to access the data dictionary tables of the database to obtain the user permission information in the database and information such as the table names, field names, data types, primary keys and foreign keys of the base tables; all base tables and data items are defined as entities in the data map, and library/table relationships, table/field relationships, foreign key relationships among tables and relationships between users and data are constructed.
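A minimal sketch of this first approach, assuming the data dictionary has already been read into rows of plain dictionaries: each row yields library, table and field entities plus the relationships between them. The row layout (`database`, `table`, `field`, optional `references`), the relationship names and the function `build_relationships` are hypothetical and do not correspond to any particular database's dictionary views.

```python
def build_relationships(dictionary_rows):
    """Turn data dictionary rows into entity and relationship sets."""
    entities, edges = set(), set()
    for row in dictionary_rows:
        db = row["database"]
        table_id = f"{db}.{row['table']}"
        field_id = f"{table_id}.{row['field']}"
        entities.update([db, table_id, field_id])
        edges.add((db, "has_table", table_id))         # library/table relationship
        edges.add((table_id, "has_field", field_id))   # table/field relationship
        target = row.get("references")
        if target:
            edges.add((field_id, "foreign_key", target))  # foreign key relationship
    return entities, edges


rows = [
    {"database": "sales", "table": "orders", "field": "id"},
    {"database": "sales", "table": "orders", "field": "customer_id",
     "references": "sales.customers.id"},
    {"database": "sales", "table": "customers", "field": "id"},
]
entities, edges = build_relationships(rows)
```

The resulting triples can be loaded as vertices and edges into whichever graph store is used.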
A table here refers to the settings of a table's relevant properties. In a relational database, a table is also called a "relation".
Data types classify data by data structure: data with the same data structure belong to the same class, and each such class is called a data type. Illustratively, the MySQL relational database management system has three main kinds of data types: text, numeric and date/time.
A view is a virtual table derived from one or more base tables (tables that actually store data) according to certain conditions; it is not an actual table and is essentially just a SELECT statement.
A stored procedure is a set of SQL statements stored in the database of a large database system to perform a specific function. After it is compiled for the first time, it does not need to be recompiled when called again; the user calls a stored procedure by specifying its name and supplying its parameters (if it has any).
Relation: one relation corresponds to what is generally called a table.
Primary key: a table usually has a column or combination of columns whose values uniquely identify each row in the table.
Foreign key: the values of a field in one table must be drawn from the values of a primary key field of another table.
The second approach is to analyze the source, destination and processing of the data flow through SQL statements, thereby constructing inter-table stored procedure relationships, inter-table function relationships, table/field function relationships and inter-table view relationships.
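As a deliberately simplified sketch of deriving source and destination tables from an SQL statement, the following uses regular expressions on a plain `INSERT INTO ... SELECT ... FROM ... JOIN ...` statement. A production parser would need a real SQL grammar to handle subqueries, CTEs, quoting and dialect differences; the function name `table_lineage` and the sample statement are assumptions.

```python
import re


def table_lineage(sql):
    """Return (source_table, target_table) pairs for a simple INSERT ... SELECT."""
    target = re.search(r"\binsert\s+into\s+([\w.]+)", sql, re.IGNORECASE)
    if not target:
        return []
    sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, re.IGNORECASE)
    # dict.fromkeys removes duplicate sources while keeping their order.
    return [(src, target.group(1)) for src in dict.fromkeys(sources)]


sql = (
    "INSERT INTO dw.sales_summary "
    "SELECT o.id, c.name FROM ods.orders o "
    "JOIN ods.customers c ON o.cid = c.id"
)
pairs = table_lineage(sql)
```

Each returned pair can be recorded as a directed edge from a source table to the table it feeds.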
The third approach is to obtain information such as tables, columns, data types, views, stored procedures, relations, primary keys and foreign keys through Schema parsing, supplementing the data dictionary and SQL statement parsing results to obtain a more comprehensive set of metadata association relationships.
The fourth approach is to parse the audit log to obtain the relationships between users and the libraries, tables and fields they access, together with the time and frequency of access.
Specifically, in the data assets, the more frequently users access a given library, table or field and the more important the accessing department, the more important that data is; this information therefore needs to serve as one of the bases for tagging when the metadata is tagged.
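One way to turn audit log records into such a tagging basis is to add up accesses per object, weighting each access by the importance of the accessing department. The record layout, the department weights and the function name `access_importance` are hypothetical; the embodiment does not define a scoring formula.

```python
from collections import defaultdict


def access_importance(audit_records, department_weight):
    """Sum department-weighted accesses for each accessed object."""
    scores = defaultdict(float)
    for record in audit_records:
        # Departments without a configured weight count as 1.0 (assumption).
        weight = department_weight.get(record["department"], 1.0)
        scores[record["object"]] += weight
    return dict(scores)


audit_records = [
    {"user": "u1", "department": "finance", "object": "dw.sales"},
    {"user": "u2", "department": "ops", "object": "dw.sales"},
    {"user": "u2", "department": "ops", "object": "dw.tmp"},
]
importance = access_importance(audit_records, {"finance": 2.0, "ops": 1.0})
```

Objects accessed more often, or by more heavily weighted departments, receive higher scores.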
In one embodiment, a relationship graph is generated by taking each metadata entity as a vertex and, within the blood relationship corresponding to each metadata entity, the relationships between that metadata entity and other metadata entities as edges; and the relationship graph is stored in a graph database.
A graph is an abstract data structure for representing association relationships between objects. It is described with vertices and edges: vertices represent objects, and edges represent the relationships between them. Data that can be described by a graph is graph data.
A graph database provides efficient association queries. In a data map stored in a graph database, the other entities related to an entity can be obtained quickly by querying the entity's edges and the tags on those edges; this eliminates complicated multi-table join operations, makes relationship queries more convenient and markedly improves efficiency.
The data map is a metadata relationship graph built on the graph database and all the metadata information. Its entities include metadata entities such as libraries, tables and fields, as well as management entities such as visitors and owners; its relationships include relationships among metadata entities, such as foreign key relationships, schema relationships and blood relationships, and relationships between metadata entities and data visitors.
Illustratively, referring to FIG. 2, the acquired entity and relationship data is stored as an adjacency matrix, which is partitioned by rows and stored across the physical nodes of the big data platform. The first row and the first column both list the entities; each remaining cell indicates whether the two corresponding entities are related, with "1" meaning related and "0" meaning unrelated, so that first-degree relationships can be read directly from the adjacency matrix.
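Reading first-degree relationships from such a 0/1 adjacency matrix can be sketched as below. The entity names and matrix contents are made up for illustration and are not taken from FIG. 2.

```python
# Row and column order of the matrix (the "first row and first column").
entities = ["db", "orders", "customers", "order_id"]

# matrix[i][j] == 1 means entities[i] and entities[j] are related.
matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]


def one_degree(entity):
    """Return the entities directly related to the given entity."""
    row = matrix[entities.index(entity)]
    return [entities[j] for j, flag in enumerate(row) if flag == 1]


print(one_degree("orders"))
```

A single row lookup is enough for the first degree, which is why partitioning the matrix by rows suits distributed storage.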
Specifically, each split matrix stored on a physical node of the big data platform, each row of which is continuously stored in a data block on the file system, is not stored in order to save space, and is marked by a position marking mode.
Given a main entity, the row corresponding to that entity is located on the cluster physical node that holds the relevant split matrix, and the associated entities having a first-level relationship with the main entity are obtained. The values of these associated entities are then passed, by message transmission among the cluster nodes, to the physical nodes where their matrix rows are located, and the second-level relationships are found there. This process repeats until all entities associated with the main entity within the specified number of levels have been found.
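The level-by-level search can be sketched as a breadth-first traversal in which each step groups the current frontier by the node that stores the corresponding rows. The dictionaries below simulate cluster nodes and message passing in a single process; `find_related`, `partitions`, and `owner` are hypothetical names, and rows are shown as neighbor lists for brevity.

```python
def find_related(partitions, owner_of, start, max_level):
    """Return {level: [entities]} reachable from `start` within `max_level` hops.

    partitions: node id -> {entity: [neighbor entities]} (rows held by that node)
    owner_of:   callable mapping an entity to the node id holding its row
    """
    visited = {start}
    frontier = [start]
    levels = {}
    for level in range(1, max_level + 1):
        # "Messages": route each frontier entity to the node that stores its row.
        messages = {}
        for entity in frontier:
            messages.setdefault(owner_of(entity), []).append(entity)
        next_frontier = []
        for node_id, wanted in messages.items():
            rows = partitions[node_id]
            for entity in wanted:
                for neighbor in rows.get(entity, []):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.append(neighbor)
        if not next_frontier:
            break
        levels[level] = sorted(next_frontier)
        frontier = next_frontier
    return levels


partitions = {
    0: {"A": ["B", "C"], "C": ["A", "D"]},
    1: {"B": ["A"], "D": ["C", "E"], "E": ["D"]},
}
owner = {"A": 0, "C": 0, "B": 1, "D": 1, "E": 1}
two_levels = find_related(partitions, owner.get, "A", 2)
```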
By applying the embodiment of the application, a plurality of metadata entities are first acquired; then the dimension label corresponding to each metadata entity is acquired according to the relationships among the metadata entities; next, the blood relationship corresponding to each metadata entity is acquired according to its dimension label; and finally, according to the blood relationship of each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag is acquired and added for each metadata entity. Compared with conventional ways of generating data tags, this method incorporates into the data asset tag system a dimension that reflects the association relationships among metadata entities, so that the metadata entities and their corresponding blood relationships can be acquired completely and a complete metadata tag system can be constructed.
In one embodiment, as shown in fig. 3, there is provided an apparatus for constructing a metadata tag library, the apparatus comprising:
A metadata entity acquisition module 301, configured to acquire a plurality of metadata entities.
A first obtaining module 302, configured to obtain the dimension label corresponding to each metadata entity according to the relationships among the metadata entities; the dimension label is used to indicate the dimension of the relationship between the metadata entity and another metadata entity.
A second obtaining module 303, configured to obtain the blood relationship corresponding to each metadata entity according to the dimension label corresponding to each metadata entity.
A third obtaining module 304, configured to obtain and add, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity.
As an optional implementation of the embodiment of the present invention, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining the number of times each metadata entity is referenced, its reference frequency, and its referrer weight according to the blood relationship of each metadata entity; and obtaining the liveness tag of each metadata entity according to the number of references, the reference frequency, and the referrer weight.
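As an illustration only, the three inputs can be derived from a list of reference events and combined into a score. The weighted sum, the coefficients, the thresholds, and the tag values below are hypothetical; the embodiment names the inputs but does not give a formula.

```python
def liveness_inputs(entity, references, referrer_weight, window_days):
    """Derive (reference count, reference frequency, mean referrer weight).

    references: list of (referrer, referenced entity) events inside the window.
    """
    referrers = [src for src, dst in references if dst == entity]
    count = len(referrers)
    frequency = count / window_days
    weight = sum(referrer_weight.get(r, 1.0) for r in referrers) / count if count else 0.0
    return count, frequency, weight


def liveness_tag(count, frequency, weight, coefficients=(0.4, 0.4, 0.2)):
    """Hypothetical scoring: a weighted sum mapped onto three tag values."""
    a, b, c = coefficients
    score = a * count + b * frequency + c * weight
    if score >= 5.0:      # assumed threshold
        tag = "high"
    elif score >= 2.0:    # assumed threshold
        tag = "medium"
    else:
        tag = "low"
    return score, tag


references = [
    ("report_a", "orders"),
    ("report_b", "orders"),
    ("report_a", "orders"),
    ("report_a", "users"),
]
weights = {"report_a": 1.0, "report_b": 4.0}
orders_inputs = liveness_inputs("orders", references, weights, window_days=2)
orders_score, orders_tag = liveness_tag(*orders_inputs)
```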
As an optional implementation of the embodiment of the present invention, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining one or more of the centrality, intermediacy, and compactness of each metadata entity according to the blood relationship of each metadata entity; and obtaining the influence tag of each metadata entity according to one or more of the centrality, the intermediacy, and the compactness.
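The sketch below reads "centrality" as degree centrality, "intermediacy" as betweenness centrality, and "compactness" as closeness centrality. That mapping is an interpretation, not something the embodiment states. The lineage graph is treated as undirected and unweighted, betweenness is computed with Brandes' algorithm, and the rule that averages the three metrics into a tag is hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) if n > 1 else 0.0 for v in adj}


def closeness_centrality(adj):
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search for shortest hop counts
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' algorithm, normalized for an undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    pairs = (n - 1) * (n - 2) / 2 if n > 2 else 1.0
    # Each unordered pair is visited twice in an undirected graph, hence / 2.
    return {v: value / 2 / pairs for v, value in score.items()}


def influence_tags(adj, threshold=0.6):
    """Hypothetical rule: average the three metrics and compare to a threshold."""
    deg, clo, bet = degree_centrality(adj), closeness_centrality(adj), betweenness_centrality(adj)
    return {v: "high" if (deg[v] + clo[v] + bet[v]) / 3 >= threshold else "low" for v in adj}


# Path-shaped lineage: A - B - C, stored as symmetric neighbor lists.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
tags = influence_tags(adj)
```

In this path-shaped example the middle entity `B` sits on every shortest path between the other two, which is why it scores highest on all three metrics.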
As an optional implementation of the embodiment of the present invention, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: performing cluster analysis on the metadata entities to obtain a clustering result; and obtaining the influence tag of each metadata entity according to the clustering result.
As an optional implementation of the embodiment of the present invention, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining a calculation result of the degree of similarity among the metadata entities according to the blood relationship of each metadata entity; and obtaining the similar tag of each metadata entity according to the calculation result of the degree of similarity among the metadata entities.
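The embodiment does not name a similarity measure. As one possible choice, the sketch below uses the Jaccard similarity between the sets of upstream entities in each entity's blood relationship and tags pairs whose similarity reaches a threshold. The measure, the threshold, and all names are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_tags(lineage, threshold=0.5):
    """Map each entity to the entities whose lineage sets are similar enough."""
    names = sorted(lineage)
    tags = {name: [] for name in names}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(lineage[x], lineage[y]) >= threshold:
                tags[x].append(y)
                tags[y].append(x)
    return tags


# Hypothetical upstream sets taken from each entity's blood relationship.
lineage = {
    "orders_daily": {"orders", "users"},
    "orders_weekly": {"orders", "users", "calendar"},
    "inventory": {"stock"},
}
tags = similar_tags(lineage)
```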
As an optional implementation of the embodiment of the present invention, before the dimension label corresponding to each metadata entity is obtained according to the relationships among the metadata entities, the method further includes: obtaining the relationships among the metadata entities by one or more of parsing a data dictionary, parsing SQL statements, parsing a database, and parsing an audit log.
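As a rough illustration of the SQL parsing route, the sketch below extracts source-to-target table relationships from a simple `INSERT ... SELECT` statement with regular expressions. It is deliberately simplistic: it handles neither subqueries, common table expressions, nor quoted identifiers, and a real implementation would more likely use a full SQL parser. The function name and table names are hypothetical.

```python
import re

# Matches "INSERT INTO t" and the Hive-style "INSERT OVERWRITE TABLE t".
_TARGET = re.compile(r"\binsert\s+(?:into|overwrite\s+table)\s+([\w.]+)", re.IGNORECASE)
# Matches the table name that follows FROM or JOIN.
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def lineage_edges(sql):
    """Return (source table, target table) pairs found in one statement."""
    target = _TARGET.search(sql)
    if target is None:
        return []
    sources = []
    for name in _SOURCE.findall(sql):
        if name not in sources:  # keep first-seen order, drop repeats
            sources.append(name)
    return [(source, target.group(1)) for source in sources]


sql = (
    "INSERT INTO dw.orders_daily "
    "SELECT o.id, u.name FROM ods.orders o "
    "JOIN ods.users u ON o.user_id = u.id"
)
edges = lineage_edges(sql)
```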
As an optional implementation of the embodiment of the present invention, the method further includes: generating a relationship graph by taking each metadata entity as a vertex and taking, as edges, the relationships between that metadata entity and other metadata entities in its corresponding blood relationship; and storing the relationship graph in a graph database.
For specific limitations on the apparatus for constructing a metadata tag library, reference may be made to the limitations on the method for constructing a metadata tag library above, which are not repeated here. Each module in the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in a processor of the computer device or be independent of that processor, or may be stored, in software form, in a memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a notebook computer and whose internal structure may be as shown in fig. 4. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The computer program is executed by the processor to implement a method for constructing a metadata tag library. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen; a key, a trackball, or a touch pad arranged on the housing of the computer device; or an external keyboard, touch pad, mouse, or the like.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 4 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, the apparatus for constructing a metadata tag library provided by the present application may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in fig. 4. The memory of the computer device may store the program modules that make up the apparatus for constructing a metadata tag library, such as the metadata entity acquisition module, the first obtaining module, the second obtaining module, and the third obtaining module shown in fig. 3. The computer program formed by these program modules causes the processor to execute the steps of the method for constructing a metadata tag library according to the embodiments of the present application described in this specification.
For example, the computer device shown in fig. 4 may perform step S101 through the metadata entity acquisition module in the apparatus for constructing a metadata tag library shown in fig. 3. The computer device may perform step S102 through the first obtaining module, step S103 through the second obtaining module, and step S104 through the third obtaining module.
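The correspondence between the four modules and steps S101 to S104 can be sketched as a small pipeline in which each module is a pluggable callable. The class name, method signatures, and toy stand-in functions below are hypothetical and serve only to show the order of the calls; they are not the modules of the embodiment.

```python
class TagLibraryBuilder:
    """Chains four pluggable modules in the order of steps S101 to S104."""

    def __init__(self, acquire_entities, acquire_labels, acquire_lineage, acquire_tags):
        self.acquire_entities = acquire_entities  # module 301, step S101
        self.acquire_labels = acquire_labels      # module 302, step S102
        self.acquire_lineage = acquire_lineage    # module 303, step S103
        self.acquire_tags = acquire_tags          # module 304, step S104

    def run(self):
        entities = self.acquire_entities()
        labels = self.acquire_labels(entities)
        lineage = self.acquire_lineage(entities, labels)
        return self.acquire_tags(entities, lineage)


# Toy stand-ins (assumptions) so that the pipeline can be exercised end to end.
RELATIONS = [("orders", "lineage", "orders_daily")]


def toy_entities():
    return ["orders", "orders_daily"]


def toy_labels(entities):
    return {e: sorted({label for a, label, b in RELATIONS if e in (a, b)}) for e in entities}


def toy_lineage(entities, labels):
    return {
        e: [b for a, label, b in RELATIONS if a == e and label in labels[e]]
        for e in entities
    }


def toy_tags(entities, lineage):
    # Placeholder rule: an entity with downstream consumers is tagged "high".
    return {e: {"liveness": "high" if lineage[e] else "low"} for e in entities}


library = TagLibraryBuilder(toy_entities, toy_labels, toy_lineage, toy_tags).run()
```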
In one embodiment, a computer device is provided comprising a memory storing a computer program and a processor that when executing the computer program performs the steps of: acquiring a plurality of metadata entities; acquiring dimension labels corresponding to the metadata entities according to the relation among the metadata entities; the dimension tag is used for indicating the dimension of the relation between the metadata entity and another metadata entity; acquiring blood relationship corresponding to each metadata entity according to the dimension label corresponding to each metadata entity; and according to the blood relationship of each metadata entity, acquiring and adding at least one tag of the liveness tag, the influence tag and the similar tag for each metadata entity.
In one embodiment, when the processor executes the computer program, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining the number of times each metadata entity is referenced, its reference frequency, and its referrer weight according to the blood relationship of each metadata entity; and obtaining the liveness tag of each metadata entity according to the number of references, the reference frequency, and the referrer weight.
In one embodiment, when the processor executes the computer program, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining one or more of the centrality, intermediacy, and compactness of each metadata entity according to the blood relationship of each metadata entity; and obtaining the influence tag of each metadata entity according to one or more of the centrality, the intermediacy, and the compactness.
In one embodiment, when the processor executes the computer program, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: performing cluster analysis on the metadata entities to obtain a clustering result; and obtaining the influence tag of each metadata entity according to the clustering result.
In one embodiment, when the processor executes the computer program, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining a calculation result of the degree of similarity among the metadata entities according to the blood relationship of each metadata entity; and obtaining the similar tag of each metadata entity according to the calculation result of the degree of similarity among the metadata entities.
In one embodiment, the processor, when executing the computer program, further performs the following step: before the dimension label corresponding to each metadata entity is obtained according to the relationships among the metadata entities, obtaining the relationships among the metadata entities by one or more of parsing a data dictionary, parsing SQL statements, parsing a database, and parsing an audit log.
In one embodiment, the processor, when executing the computer program, further performs the following steps: generating a relationship graph by taking each metadata entity as a vertex and taking, as edges, the relationships between that metadata entity and other metadata entities in its corresponding blood relationship; and storing the relationship graph in a graph database.
In one embodiment, a computer readable storage medium having a computer program stored thereon is provided, which when executed by a processor, performs the steps of: acquiring a plurality of metadata entities; acquiring dimension labels corresponding to the metadata entities according to the relation among the metadata entities; the dimension tag is used for indicating the dimension of the relation between the metadata entity and another metadata entity; acquiring blood relationship corresponding to each metadata entity according to the dimension label corresponding to each metadata entity; and according to the blood relationship of each metadata entity, acquiring and adding at least one tag of the liveness tag, the influence tag and the similar tag for each metadata entity.
In one embodiment, when the computer program is executed by the processor, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining the number of times each metadata entity is referenced, its reference frequency, and its referrer weight according to the blood relationship of each metadata entity; and obtaining the liveness tag of each metadata entity according to the number of references, the reference frequency, and the referrer weight.
In one embodiment, when the computer program is executed by the processor, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining one or more of the centrality, intermediacy, and compactness of each metadata entity according to the blood relationship of each metadata entity; and obtaining the influence tag of each metadata entity according to one or more of the centrality, the intermediacy, and the compactness.
In one embodiment, when the computer program is executed by the processor, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: performing cluster analysis on the metadata entities to obtain a clustering result; and obtaining the influence tag of each metadata entity according to the clustering result.
In one embodiment, when the computer program is executed by the processor, obtaining and adding, for each metadata entity, at least one of a liveness tag, an influence tag, and a similar tag according to the blood relationship of each metadata entity includes: obtaining a calculation result of the degree of similarity among the metadata entities according to the blood relationship of each metadata entity; and obtaining the similar tag of each metadata entity according to the calculation result of the degree of similarity among the metadata entities.
In one embodiment, the computer program, when executed by the processor, further performs the following step: before the dimension label corresponding to each metadata entity is obtained according to the relationships among the metadata entities, obtaining the relationships among the metadata entities by one or more of parsing a data dictionary, parsing SQL statements, parsing a database, and parsing an audit log.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: generating a relationship graph by taking each metadata entity as a vertex and taking, as edges, the relationships between that metadata entity and other metadata entities in its corresponding blood relationship; and storing the relationship graph in a graph database.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the computer program, when executed, may include the processes of the method embodiments described above. Any reference to memory, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include ROM (Read-Only Memory), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include RAM (Random Access Memory) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.