CN115481298B

CN115481298B - Graph data processing method and electronic equipment

Info

Publication number: CN115481298B
Application number: CN202211417991.4A
Authority: CN
Inventors: 于文渊; 徐静波; 罗小简; 李雪; 曾维彬
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-11-14
Filing date: 2022-11-14
Publication date: 2023-03-14
Anticipated expiration: 2042-11-14
Also published as: CN115481298A

Abstract

The embodiments of the present application provide a graph data processing method and an electronic device. The graph data processing method comprises: acquiring information of graph data to be stored, wherein the information of the graph data comprises at least node type information and data value information of the graph data; determining a matching node standard format according to the node type indicated by the node type information of the graph data; and storing the graph data as standard graph data conforming to the node standard format according to the node type information and the data value information of the graph data. Because the graph data is stored in a unified standard format, it can be shared between graph applications without format conversion, which greatly improves the utilization rate of the graph data.

Description

Graph data processing method and electronic equipment

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a graph data processing method and electronic equipment.

Background

A graph is a collection of points and edges: the "points" represent entities, the "edges" represent relationships between the entities, and graph data is a collective term for the points, the edges, and their relationships. In practical applications, graph data may be recorded and stored in different file formats, for example the ORC (Optimized Row Columnar) format, the Parquet format (a column store oriented to analytical tasks), the HDF5 (Hierarchical Data Format version 5) format, the CSV (Comma-Separated Values) format, and so on. In addition, stored graph data differs in many aspects of its representation, such as whether attributes are included, whether updates are supported, whether partitioning is done by point or by edge, and how the edge topology is represented (whether it is sorted, and whether it is sorted by start point or by end point).

Graph computation based on graph data is very rich, and many services and applications for graph data have therefore emerged, such as in-memory and external-memory storage of graph data, graph database computation, graph query engines, and so on. However, because each service or application uses a different file format and representation method for graph data, serious incompatibilities arise when graph data is accessed, imported, or exported. These incompatibilities in turn make data processing inefficient for applications or services based on graph data and degrade the quality of service that they provide.

Disclosure of Invention

In view of the above, embodiments of the present application provide a graph data processing scheme to at least partially solve the above problems.

According to a first aspect of embodiments of the present application, there is provided a graph data processing method, including: acquiring information of graph data to be stored, wherein the information of the graph data comprises at least node type information and data value information of the graph data; determining a matching node standard format according to the node type indicated by the node type information of the graph data; and storing the graph data as standard graph data conforming to the node standard format according to the node type information and the data value information of the graph data.

According to a second aspect of embodiments of the present application, there is provided an electronic device, including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the method according to the first aspect.

According to a third aspect of embodiments herein, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, performs the method according to the first aspect.

According to a fourth aspect of the embodiments of the present application, there is provided another graph data processing method, including: acquiring information of graph data to be stored, wherein the information of the graph data comprises at least node type information and data value information of the graph data; determining a matching node standard format according to the node type indicated by the node type information of the graph data; generating, for the graph data, standard graph data conforming to the node standard format according to the correspondence between each field in a target graph database and the node standard format and based on the node type information and the data value information of the graph data; and storing the standard graph data into the corresponding fields in the target graph database.

According to the graph data processing scheme provided by the embodiments of the present application, graph data can be stored in a preset standard format regardless of the scenario or application in which it was generated. The points and edges in the graph data correspond to different node standard formats, and the graph data is stored in the standard format that matches its node type. Thus, if a graph is constructed in one graph application, the graph data it produces can be used without distinction in another graph application, or can be used for graph-data-based interaction with another graph application without hindrance. Compared with conventional conversion schemes, storing graph data in a unified standard format removes the need for format conversion and greatly improves the utilization rate of the graph data. This effectively addresses the incompatibility caused by the different graph data file formats used by different services or applications, and in turn the resulting low data processing efficiency and degraded service quality of applications or services based on graph data.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present application, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1 is a schematic diagram of an exemplary system to which graph data processing methods of embodiments of the present application are applicable;

FIG. 2A is a flowchart illustrating steps of a graph data processing method according to an embodiment of the present application;

FIG. 2B is a schematic diagram of an example graph in the embodiment shown in FIG. 2A;

FIG. 2C is a schematic diagram of a logical edge table corresponding to the example graph shown in FIG. 2B;

FIG. 2D is a diagram of a physical point table corresponding to the example graph shown in FIG. 2B;

FIG. 2E is a diagram of a physical edge table corresponding to the example graph shown in FIG. 2B;

FIG. 2F is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 2A;

FIG. 3A is a flowchart illustrating steps of another graph data processing method according to an embodiment of the present application;

FIG. 3B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 3A;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of protection of the embodiments in the present application.

The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.

Fig. 1 illustrates an exemplary system to which a graph data processing method according to an embodiment of the present application is applied. As shown in fig. 1, the system 100 may include a cloud server 102, a communication network 104, and/or one or more user devices 106, illustrated in fig. 1 as a plurality of user devices. It should be noted that the graph data processing scheme in the embodiment of the present application may be implemented in the cloud server 102, or may be implemented in the user device 106.

Cloud server 102 may be any suitable device for storing information, data, programs, and/or any other suitable type of content, including but not limited to distributed storage system devices, server clusters, computing cloud server clusters, and the like. In some embodiments, cloud server 102 may perform any suitable functions. For example, when the graph data processing scheme according to the embodiment of the present application is implemented in the cloud server 102, the cloud server 102 may store the graph data to be stored as standard graph data in the standard format according to a preset node standard format and the node type of the graph data to be stored. As an optional example, in some embodiments, the cloud server 102 may be configured to determine a matching node standard format according to the node type indicated by the node type information of the graph data. As another example, in some embodiments, the cloud server 102 may be used to store graph data as standard graph data conforming to the node standard format according to the node type information and the data value information of the graph data. In some embodiments, the cloud server 102 may store the graph data to be stored as standard graph data at the cloud server 102 according to the graph data storage request sent by the user device 106 and the information of the graph data to be stored.

In some embodiments, the communication network 104 may be any suitable combination of one or more wired and/or wireless networks. For example, the communication network 104 can include any one or more of the following: the internet, an intranet, a Wide Area Network (WAN), a Local Area Network (LAN), a wireless network, a Digital Subscriber Line (DSL) network, a frame relay network, an Asynchronous Transfer Mode (ATM) network, a Virtual Private Network (VPN), and/or any other suitable communication network. The user device 106 can be connected to the communication network 104 via one or more communication links (e.g., communication link 112), and the communication network 104 can be linked to the cloud server 102 via one or more communication links (e.g., communication link 114). A communication link may be any communication link suitable for communicating data between the user device 106 and the cloud server 102, such as a network link, a dial-up link, a wireless link, a hardwired link, any other suitable communication link, or any suitable combination of such links.

User device 106 may include any one or more user devices suitable for interacting with a user and performing corresponding functions. For example, when the graph data processing scheme according to the embodiment of the present application is implemented in the user device 106, the user device 106 may store the graph data to be stored as standard graph data in the standard format according to a preset node standard format and the node type of the graph data to be stored. As an optional example, in some embodiments, the user device 106 may be configured to determine a matching node standard format based on the node type indicated by the node type information of the graph data. As another example, in some embodiments, the user device 106 may be used to store graph data as standard graph data conforming to the node standard format based on the node type information and the data value information of the graph data. In some embodiments, the user device 106 may send the graph data storage request and the information of the graph data to be stored to the cloud server 102, and the cloud server 102 stores the graph data to be stored as standard graph data in the cloud server 102 or, further, sends the standard graph data back to the user device 106. In some embodiments, the user device 106 may comprise any suitable type of device. For example, in some embodiments, the user device 106 may include a mobile device, a tablet computer, a laptop computer, a desktop computer, a wearable computer, a game console, a media player, a vehicle entertainment system, and/or any other suitable type of user device.

Based on the above system, the graph data processing method provided in the embodiment of the present application is described below by an embodiment.

Referring to FIG. 2A, a flowchart of the steps of a graph data processing method according to an embodiment of the present application is shown.

The graph data processing method of the embodiment includes the steps of:

Step S202: Acquire information of the graph data to be stored.

The graph data to be stored may be the data of a graph generated by any graph application or non-graph application, such as a graph characterizing the relationship between users and commodities, a graph in a knowledge graph, or the like. In the embodiment of the present application, the specific manner in which the graph data is generated and acquired is not limited.

In the embodiment of the present application, the information of the graph data to be stored comprises at least node type information and data value information of the graph data. The node type information indicates whether the current graph data is a point or an edge, and the data value information indicates the values corresponding to that point or edge, including but not limited to attribute values. Without limitation, in one possible manner the information of the graph data may further include label information, which characterizes the classification of the graph data in the application to which it belongs. For example, if a graph corresponds to commodities, its label information may be "commodity"; if a graph corresponds to users, its label information may be "user"; and if a graph describes the courses examined and the examination scores of students, its label may be "examination score", and so on.

Step S204: Determine the matching node standard format according to the node type indicated by the node type information of the graph data.

For a graph, its nodes are generally of two types, namely a point type and an edge type. In the embodiment of the present application, a corresponding node standard format is set for each of the two types. Specifically: when the node type is the point type, the node standard format comprises recording, in a logical point table, a point identifier and at least one point attribute field corresponding to the point identifier; when the node type is the edge type, the node standard format comprises recording, in a logical edge table, an edge start point identifier, an edge end point identifier, and the edge attribute fields of the edge between that start point and end point. In this way, graph data can be managed in a standardized and efficient manner.

Exemplarily, referring to fig. 2B, an example of a graph is shown. Further illustratively, the graph may be the graph structure of a community forum, which includes nodes of four point types, namely: users, published posts, the tags corresponding to posts (e.g., categories), and forums. As can be seen from the figure, there are edges between the nodes of these four point types that characterize their interrelationships, namely: between a user and a post, the relationship that the post was created by the user and the relationship that the user likes the post; the relationship that a post has a tag; the relationships that a forum contains posts and is classified by tags; and the like.

When the graph needs to be stored, for the points in it, the point identifier and at least one point attribute field corresponding to the point identifier are recorded in a logical point table. In order to clearly distinguish graph data, a logical point table may be created for the set of points of each label, using the label included in the graph data.

That is, the points of each label correspond to one logical point table, and each point is assigned a consecutive number starting from 0, the global id, which is the row number of the point in the logical point table. Illustratively, the logical point table of tag points has one row per tag point, indexed by global id, with attribute columns such as name and url.

Therefore, label + global id uniquely identifies a point, and within the logical point table of a given label the attributes of a point can be accessed randomly by global id. Subsequently, when the topology of the graph is maintained, the global id can also be used to identify the start and end points of an edge.

Without limitation, in one possible approach the node standard format of the point type also includes a node key value field. For example, in the logical point table, some of the properties may be marked as the key (i.e., the node key value); for instance, the name column of the tag table may serve as the node key value field of the tag table.
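The logical point table described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the class name `LogicalPointTable`, its methods, and the sample tag rows are hypothetical and do not come from the patent; the only behavior taken from the text is that the global id is the row number, that attributes are accessed randomly by global id, and that one property may be marked as the key.

```python
class LogicalPointTable:
    """One table per label; the row number of a point is its global id."""

    def __init__(self, label, columns, key=None):
        self.label = label          # e.g. "tag"
        self.columns = columns      # attribute fields, e.g. ["name", "url"]
        self.key = key              # optional node key value field
        self.rows = []              # row i holds the attributes of global id i
        self._key_index = {}        # key value -> global id

    def append(self, **props):
        """Store one point and return its global id (consecutive, from 0)."""
        global_id = len(self.rows)
        self.rows.append(tuple(props[c] for c in self.columns))
        if self.key is not None:
            self._key_index[props[self.key]] = global_id
        return global_id

    def get(self, global_id):
        """Random access to the attributes of a point by global id."""
        return dict(zip(self.columns, self.rows[global_id]))

    def find(self, key_value):
        """Look up the global id of a point by its node key value."""
        return self._key_index[key_value]


# Hypothetical sample data for the "tag" label.
tags = LogicalPointTable("tag", ["name", "url"], key="name")
tags.append(name="music", url="http://example.com/music")
tags.append(name="sports", url="http://example.com/sports")
```

With this layout, the pair (`tags.label`, global id) identifies a point, and `tags.get(1)` reads its attributes without scanning the table.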

For an edge in the graph, the edge start point identifier, the edge end point identifier, and the edge attribute fields of the edge between that start point and end point may be recorded in a logical edge table. Optionally, the node standard format of the edge type further includes recording, in the logical edge table, offset position information of the start point corresponding to the edge start point identifier. Although all of this information may be stored together in the logical edge table, in order to make subsequent management of and access to the stored graph data more efficient and convenient, in one possible way an offset table may be used to record the offset position information. That is, the logical edge table may be divided into an offset table and an edge table, wherein the offset table stores only the offset position information of the start point corresponding to each edge start point identifier.

Specifically, each triplet (start point label, edge label, end point label) corresponds to one logical edge table. To ensure that a graph can be created quickly, the logical edge table may maintain topology information in a manner similar to CSR (compressed sparse row), CSC (compressed sparse column), or COO (coordinate list). Taking CSR as an example, the logical edge table is split into two tables: an offset table and an edge table. Edges with the same start point are stored consecutively in the edge table. Taking the "user likes post" edge between "user" and "post" shown in fig. 2B as an example, it may be represented in the form shown in fig. 2C.

As can be seen in fig. 2C, the offset table is also indexed by global id: each row corresponds to one start point and records the position in the edge table at which the edges of that start point begin, and one start point may correspond to multiple rows (edges together with their attributes) in the edge table. For example, because edges are stored consecutively, "0" in the offset table indicates that the edges whose start point is "0" begin at row 0 of the edge table; similarly, "15" in the offset table indicates that the edges whose start point is "1" begin at row 15 of the edge table. Each row of the edge table records the start point (e.g., user No. 0 in fig. 2B), the end point (e.g., post No. 8430 liked by user No. 0), and the attribute fields of the edge between that start point and end point; the attribute illustrated in fig. 2C is the creation time of the post.
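The CSR-style split into an offset table and an edge table can be illustrated with the following Python sketch. It is an assumption-laden illustration, not the patent's implementation: the function names and sample edges are hypothetical, and the offset table here carries one extra trailing entry (the total edge count) so that the end of the last start point's range is known, which is a common CSR convention that the text does not spell out.

```python
def build_csr(num_points, edges):
    """Build (offset table, edge table) from (start, end, attribute) triples.

    Edges with the same start point end up stored consecutively, and
    offsets[s] is the row in the edge table where the edges of start s begin.
    """
    edge_table = sorted(edges, key=lambda e: e[0])   # stable sort by start id
    offsets = [0] * (num_points + 1)                 # assumed n + 1 entries
    for start, _end, _attr in edge_table:
        offsets[start + 1] += 1                      # count edges per start
    for i in range(num_points):
        offsets[i + 1] += offsets[i]                 # prefix sum -> row offsets
    return offsets, edge_table


def out_edges(offsets, edge_table, start):
    """All edges of one start point, read as a contiguous slice."""
    return edge_table[offsets[start]:offsets[start + 1]]


# Hypothetical "user likes post" edges: (user id, post id, creation time).
likes = [(1, 5, "2022-01-03"), (0, 8430, "2022-01-01"), (0, 7, "2022-01-02")]
offsets, edge_table = build_csr(3, likes)
```

Reading the edges of a start point is then a single slice of the edge table, which is the property that makes the offset table useful for fast graph construction.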

By analogy, corresponding logical point tables and logical edge tables can be established for each point and each edge in the graph. On the one hand, the data in the logical point tables and logical edge tables has a clear meaning and is easy for subsequent users to read and understand; on the other hand, the tables provide an effective basis for managing the graph data and improve management efficiency.

Step S206: Store the graph data as standard graph data conforming to the node standard format according to the node type information and the data value information of the graph data.

As described above, according to the standard format corresponding to the point type and the standard format corresponding to the edge type, the graph data may be stored as standard graph data conforming to the node standard format, where the data value information supplies the specific values of the fields specified in the node standard format.
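Steps S202 to S206 can be summarized by the following Python sketch. It is a minimal illustration under assumptions: the record layout (`node_type`, `label`, `values`), the field names `src` and `dst`, and the function name `store_as_standard` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Fields that each node standard format requires (hypothetical names).
NODE_STANDARD_FORMATS = {
    "point": (),               # point id is assigned as the row number
    "edge": ("src", "dst"),    # edge start point id and end point id
}


def store_as_standard(records):
    """Store raw graph records into logical point tables and edge tables."""
    point_tables = defaultdict(list)   # label -> rows
    edge_tables = defaultdict(list)    # (src label, edge label, dst label) -> rows
    for record in records:             # S202: acquired graph data information
        node_type = record["node_type"]
        if node_type not in NODE_STANDARD_FORMATS:
            raise ValueError(f"unknown node type: {node_type!r}")
        required = NODE_STANDARD_FORMATS[node_type]   # S204: matching format
        values = record["values"]
        missing = [name for name in required if name not in values]
        if missing:
            raise ValueError(f"missing fields {missing} for {node_type}")
        if node_type == "point":                      # S206: store
            table = point_tables[record["label"]]
            table.append({"global_id": len(table), **values})
        else:
            edge_tables[record["label"]].append(dict(values))
    return dict(point_tables), dict(edge_tables)
```

For example, a record such as `{"node_type": "point", "label": "user", "values": {"name": "alice"}}` would land in the `user` point table with global id 0.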

In addition, when the graph data further includes label information, the graph data may be stored as standard graph data conforming to the node standard format according to the label information, the node type information, and the data value information of the graph data, so that the graph data is easier to identify.

Through the process, the standard graph data which is convenient for a user to use and view can be formed in the form of a logical point table and a logical edge table. However, it should be clear to those skilled in the art that efficient management and access to graph data also depends on the way the graph data is physically stored.

For this reason, in one possible manner of the embodiment of the present application, for graph data of the point type, the logical point table may be divided by rows into a plurality of sub-tables according to a first preset size; each sub-table is then divided into a plurality of column groups according to the columns of the logical point table; and the graph data corresponding to the logical point table is physically stored according to the divided column groups to obtain a plurality of corresponding physical point tables. Dividing the table by rows effectively reduces the amount of data in each data block and improves the capacity for parallel processing; further dividing it by columns makes subsequent access to and splicing of the data more convenient. The first preset size can be set by a person skilled in the art according to actual requirements, and the embodiment of the present application does not limit this.

Taking the logical point table corresponding to the example graph shown in fig. 2B as an example, in order to support parallel reading and writing, the logical point table may be divided into chunks, and in order to preserve random access, the chunk size of the logical point table of a given label may be set to a constant value. Furthermore, in order to support adding attribute fields to points without modifying existing graph files, the columns of the logical point table can be divided into a plurality of column groups. Taking the tag logical point table as an example, assuming that chunk size = 8192 and that the two attribute columns are divided into two column groups, the resulting physical point tables are shown in fig. 2D.

As can be seen from fig. 2D, the tag logical point table is divided into two chunks according to chunk size = 8192: the first chunk stores the standard graph data with global ids from 0 to 8191, and the second chunk stores the standard graph data with global ids from 8192 to 16079. Further, in each chunk, the attribute data is divided into two column groups, one for the name column and one for the url column. In the figure these are labeled tag_v0_p0 for the name column of the first chunk, tag_v0_p1 for the url column of the first chunk, tag_v1_p0 for the name column of the second chunk, and tag_v1_p1 for the url column of the second chunk.
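Because the chunk size is constant for a label, the chunk that holds a point follows directly from its global id. The Python sketch below illustrates this with the numbers from the tag example (16080 points, chunk size 8192, two column groups); the function names are hypothetical, and the naming pattern `label_v<chunk>_p<group>` is inferred from the labels quoted from fig. 2D.

```python
import math


def locate_point(global_id, chunk_size):
    """Return (chunk index, row inside the chunk) for a point's global id."""
    return divmod(global_id, chunk_size)


def physical_point_tables(label, num_points, chunk_size, num_column_groups):
    """List the physical point tables produced by row and column division."""
    num_chunks = math.ceil(num_points / chunk_size)
    return [
        f"{label}_v{chunk}_p{group}"
        for chunk in range(num_chunks)
        for group in range(num_column_groups)
    ]


tables = physical_point_tables("tag", 16080, 8192, 2)
```

Random access then needs no index: `locate_point(16079, 8192)` gives chunk 1, row 7887, so only `tag_v1_p0` or `tag_v1_p1` has to be read, depending on the attribute wanted.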

For the standard graph data of the edge type, an offset table for recording offset position information, a topology table for recording the edge topology, and an edge attribute table for recording edge attribute fields may be obtained from the graph data information stored in the logical edge table. The offset table is divided by rows into a plurality of offset sub-tables according to a second preset size; a plurality of corresponding topology sub-tables and edge attribute sub-tables are obtained according to the plurality of offset sub-tables; and the graph data corresponding to the logical edge table is physically stored according to the offset sub-tables, the topology sub-tables, and the edge attribute sub-tables, yielding a plurality of corresponding physical edge tables. In this embodiment, the logical edge table is divided into the offset table, the topology table, and the edge attribute table according to the kind of information stored; this division may be purely logical, so that the logical edge table need not actually be split into several physical tables, although an actual physical division may also be performed. In this way, graph data of the edge type can be managed more efficiently. The second preset size may be set by a person skilled in the art according to actual conditions, and the embodiment of the present application does not limit this. Optionally, the second preset size may be the same as the first preset size, so that the partition of the offset table is aligned with the partition of the logical point table. It should be understood by those skilled in the art that a second preset size different from the first preset size is equally applicable to the embodiments of the present application.

In addition, it should be noted that, if there is no offset table, the topology table and the edge attribute table may be split directly according to a certain size.

In order to keep the correspondence between the data consistent before and after splitting, and thereby facilitate data access and management, in one feasible manner, obtaining the corresponding topology sub-tables and edge attribute sub-tables according to the offset sub-tables may be implemented as follows: obtaining, according to the plurality of offset sub-tables, a plurality of corresponding table blocks, each comprising a topology table and an edge attribute table; for each table block, dividing the topology table in the table block into a plurality of topology sub-tables according to a third preset size; and dividing the edge attribute table in the table block according to the plurality of topology sub-tables, to obtain a plurality of edge attribute sub-tables aligned with the topology sub-tables in the table block. The third preset size may be set by a person skilled in the art as appropriate according to actual situations, and is not limited in this embodiment of the present application. Further, each edge attribute sub-table can be divided into a plurality of column groups according to attribute grouping.

Still taking the logical edge table corresponding to the example graph shown in fig. 2B as an example, the logical edge table can be divided into three types of tables: an offset table (optional), a topology table (containing only the global ids of the start and end points), and an edge attribute table (absent if the edges have no attributes). The division of the offset table can be aligned with the division of the corresponding logical point table, and the first row of each offset sub-table is 0. After the offset table is divided, the remainder of the logical edge table is in effect divided into sub-tables (i.e., table blocks) with unequal numbers of rows. Each of these sub-tables is then divided by a fixed chunk size. The resulting chunks contain both topology information and edge attribute information: the topology part can be regarded as corresponding to the topology table and the edge attribute part as corresponding to the edge attribute table. Dividing the two further yields topology sub-tables and edge attribute sub-tables whose divisions are aligned with each other.
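The statement that each offset sub-table starts at 0 can be made concrete with a short Python sketch. It assumes, as an illustration only, that the global offset table has one trailing entry holding the total edge count, and that each sub-table is rebased to the start of its own table block; the function name `split_offsets` and the sample numbers are hypothetical.

```python
def split_offsets(offsets, point_chunk_size):
    """Split a global offset table into per-chunk offset sub-tables.

    offsets has len(points) + 1 entries (assumed convention). Each sub-table
    is rebased so that its first row is 0, i.e. its values are row positions
    inside the table block that holds the edges of that chunk of start points.
    """
    num_points = len(offsets) - 1
    sub_tables = []
    for begin in range(0, num_points, point_chunk_size):
        end = min(begin + point_chunk_size, num_points)
        base = offsets[begin]                      # first edge row of this block
        sub_tables.append([o - base for o in offsets[begin:end + 1]])
    return sub_tables


# Four start points with 2, 1, 0 and 4 edges; two start points per chunk.
sub_tables = split_offsets([0, 2, 3, 3, 7], point_chunk_size=2)
```

Under this convention the last value of each sub-table is the number of edges in its table block, which is why the table blocks generally have unequal numbers of rows.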

Taking the logical edge table corresponding to the "user likes post" edge in the example graph shown in fig. 2B as an example, the chunk size used to divide the offset table is set to 1024, and the chunk size used to divide each table block corresponding to an offset sub-table is likewise set to 1024. It should be noted that, when an offset table is present, the topology table must remain sorted; since the table is stored in column groups, it can be used as a CSR. Without limitation, the data may also be organized as a CSC (compressed sparse column), in which case the offset table is built over end points and chunking is done by end point. Illustratively, the physical edge tables corresponding to the "user likes post" edge are shown in fig. 2E.

As can be seen from fig. 2E, the offset table is divided into two offset sub-tables: the first offset sub-table offset_v0 stores the offset position information for global ids 0 to 1023, and the second offset sub-table offset_v1 stores the offset position information for global ids 1024 to 1638. The first table block, corresponding to the first offset sub-table, is divided into 4 sub-blocks, and each sub-block is divided by attribute columns into a topology sub-table and an edge attribute sub-table. The second table block holds less data and is not divided into sub-blocks; it is only divided by attribute columns into a topology sub-table and an edge attribute sub-table. This yields four topology sub-tables corresponding to offset_v0, namely topo_v0_e0, topo_v0_e1, topo_v0_e2, and topo_v0_e3, together with four edge attribute sub-tables aligned with them, namely prop_v0_e0, prop_v0_e1, prop_v0_e2, and prop_v0_e3, as well as the topology sub-table topo_v1_e0 and the edge attribute sub-table prop_v1_e0 corresponding to offset_v1.
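The layout of fig. 2E can be reproduced with the following Python sketch, which enumerates the physical edge tables from the number of edges in each table block. The edge counts used here (4000 and 500) are hypothetical values chosen only so that the first block needs four chunks of 1024 rows and the second needs one; the function name is likewise an assumption.

```python
import math


def physical_edge_tables(edges_per_block, edge_chunk_size):
    """Map each offset sub-table to its aligned (topology, attribute) chunks.

    edges_per_block[v] is the number of edges in the table block that
    corresponds to offset sub-table v.
    """
    layout = {}
    for v, num_edges in enumerate(edges_per_block):
        # A block always yields at least one chunk, even if it is small.
        num_chunks = max(1, math.ceil(num_edges / edge_chunk_size))
        layout[f"offset_v{v}"] = [
            (f"topo_v{v}_e{e}", f"prop_v{v}_e{e}") for e in range(num_chunks)
        ]
    return layout


layout = physical_edge_tables([4000, 500], edge_chunk_size=1024)
```

Because the topology and attribute chunks share the same indices, row r of `topo_v0_e2` and row r of `prop_v0_e2` describe the same edge.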

Thereby, a standard and efficient storage of graph data is achieved.

Further, in order to facilitate interaction between different graph applications or services and long-term storage of graph data, in most cases, standard graph data is output in the form of a file. Therefore, in the embodiment of the application, on the basis that the graph data is presented in the form of the above-mentioned logical point and edge table and is physically stored in the form of the above-mentioned physical point and edge table, a file corresponding to the standard graph data can be generated according to preset graph file metadata information.

The graph file metadata information includes: graph file naming information, graph file path base information, point file metadata information, edge file metadata information, and the like. The point file metadata information may include: point type information, first data segmentation size information, point attribute grouping information, point file path information, and the like. The edge file metadata information may include: the start point, end point, and edge type information of the edge, edge file path information, edge topology representation method information, second data segmentation size information, edge attribute grouping information, and the like.

When the file is generated, a graph file of a preset type corresponding to the standard graph data can be generated according to the preset graph file metadata information. In order to use and remain compatible with existing file formats as far as possible, in one possible approach the preset type of graph file may be any of ORC, Parquet, HDF5, CSV, and the like.

Illustratively, the graph file metadata information may be implemented as:

the meta information file of the graph uses the file name suffix .yml;

defining basic information of the graph, such as the name of the graph, etc.;

defining prefix corresponding to the point type, and describing vertex meta of the point type through a vertex.yml file;

define (start type, edge type, end type) triplet corresponding prefix, by means of.

The vertex meta described in the above metadata can be described as follows:

defining a point type;

define chunk size;

defining point file path information;

defining the attribute groups of the point (optional; mandatory for CSV), where each group defines:

o the corresponding attribute file prefix and file format;

o the name and type of each attribute it contains, and whether that attribute is a primary key.

The edge meta described in the above metadata can be described as follows:

defining the types of a starting point, an end point and an edge;

define chunk size;

defining whether the edge is directed;

defining edge file path information;

defining a representation of the edge topology;

for each representation method of the topological structure, defining the prefix corresponding to the topological structure table, the stored file format and the attribute grouping of the edges;

each attribute group defines:

o the file prefix and file format corresponding to the edge attribute table;

o the name and type of each attribute it contains, and whether that attribute is a primary key.
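As an illustration only, the vertex meta and edge meta listed above could be modelled with plain Python dataclasses. All class names, field names, and example values below (`VertexMeta`, `EdgeMeta`, `ordered_by_source`, the `vertex/user/` prefix, and so on) are assumptions for this sketch; the source does not fix a concrete schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    data_type: str
    is_primary_key: bool = False


@dataclass
class AttributeGroup:
    prefix: str        # file prefix of this attribute group
    file_format: str   # e.g. "parquet", "orc", "hdf5" or "csv"
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class VertexMeta:
    point_type: str
    chunk_size: int
    path_prefix: str
    attribute_groups: List[AttributeGroup] = field(default_factory=list)


@dataclass
class TopologyMeta:
    representation: str   # hypothetical label, e.g. "ordered_by_source" (CSR)
    prefix: str
    file_format: str
    attribute_groups: List[AttributeGroup] = field(default_factory=list)


@dataclass
class EdgeMeta:
    start_type: str
    edge_type: str
    end_type: str
    chunk_size: int
    directed: bool
    path_prefix: str
    topologies: List[TopologyMeta] = field(default_factory=list)


def vertex_chunk_count(meta: VertexMeta, num_points: int) -> int:
    """Number of row chunks needed to hold num_points points."""
    return (num_points + meta.chunk_size - 1) // meta.chunk_size


# Illustrative metadata for a "user likes post" graph.
user_meta = VertexMeta(
    point_type="user",
    chunk_size=1024,
    path_prefix="vertex/user/",
    attribute_groups=[
        AttributeGroup("id/", "parquet", [Attribute("id", "int64", True)]),
        AttributeGroup("name_url/", "parquet",
                       [Attribute("name", "string"), Attribute("url", "string")]),
    ],
)
likes_meta = EdgeMeta(
    start_type="user", edge_type="likes", end_type="post",
    chunk_size=1024, directed=True, path_prefix="edge/user_likes_post/",
    topologies=[TopologyMeta("ordered_by_source", "ordered_by_source/", "parquet")],
)
```

With a chunk size of 1024, `vertex_chunk_count(user_meta, 1639)` gives 2, which matches the two offset sub-tables covering global ids 0 to 1638 in fig. 2E.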

Through the metadata, the logical point table and the logical edge table can be stored as the corresponding graph files. The logical point table and the logical edge table can each be divided into a plurality of sub-tables, and each sub-table can be stored as a file in one of the following four formats:

Parquet, ORC format: can be used to store point/edge attributes of primitive types;

HDF5 format: can be used to store point/edge data of array types;

CSV format: can be used for either of the above.

Based on the above definition of the graph file metadata information, generation of and access to graph files can be implemented, including at least data writing and data reading of graph files.

For data writing of the graph file, standard graph data of the file to be written and information of a node standard format corresponding to the standard graph data can be obtained; and accessing a graph file writing interface according to the type of the standard graph data, and writing the standard graph data into a file with a format matched with the type of the standard graph data according to the standard format of the node through the graph file writing interface to generate a corresponding graph file. The writing may be divided into writing for a point and writing for an edge, and during specific writing, according to the metadata defined by the foregoing vertex meta and edge meta, and considering the validity of the chunk to be written, corresponding data may be written into the graph file in the corresponding format, so as to generate a final graph file corresponding to the point or a graph file corresponding to the edge.

For data reading of the graph file, a graph file reading request can be received; and accessing the graph file reading interface according to the graph file reading request, reading a file corresponding to the standard graph data requested by the graph file reading request through the graph file reading interface, and analyzing to obtain the standard graph data in the file. For example, the graph file may be parsed to obtain the information of the corresponding metadata and the information of the chunk, and based on this, the specific data in the graph file may be read, so as to obtain the standard graph data stored in the graph file.
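The writing and reading paths described above can be sketched as a round trip over chunked CSV files, using only the Python standard library. This is a simplified illustration under stated assumptions: the function names, the `chunk{i}.csv` file naming, and the `vertex/user` prefix are hypothetical, and a real implementation would also handle Parquet, ORC, and HDF5 and would read the chunk size from the metadata file.

```python
import csv
import os
import tempfile


def write_vertex_chunks(rows, columns, chunk_size, base_dir, prefix):
    """Write point rows (dicts) as fixed-size CSV chunk files; return the paths."""
    if chunk_size <= 0:
        # Validity check on the chunk to be written.
        raise ValueError("chunk_size must be positive")
    target_dir = os.path.join(base_dir, prefix)
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    for i, start in enumerate(range(0, len(rows), chunk_size)):
        path = os.path.join(target_dir, f"chunk{i}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=columns)
            writer.writeheader()
            writer.writerows(rows[start:start + chunk_size])
        paths.append(path)
    return paths


def read_vertex_chunks(paths):
    """Read chunk files back in order and return the rows as dicts of strings."""
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            rows.extend(dict(row) for row in csv.DictReader(f))
    return rows


# Round trip: five points, two rows per chunk, so three chunk files.
with tempfile.TemporaryDirectory() as base:
    users = [{"id": str(i), "name": f"user{i}"} for i in range(5)]
    paths = write_vertex_chunks(users, ["id", "name"], 2, base, "vertex/user")
    restored = read_vertex_chunks(paths)
```

CSV stores every value as text, which is why the example uses string ids; a typed format such as Parquet would preserve the column types.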

Because of the standardization of the graph data storage, the interface for reading and writing data of the graph file can be realized as a uniform and standardized interface so as to facilitate the access to the graph data.

However, in practical applications, the standard graph data as read may not by itself be sufficient to implement the required functions. Therefore, in a feasible manner of the embodiment of the present application, the standard graph data obtained by parsing may be combined according to the combination operation indicated by a graph data operation instruction, and a new graph may be generated according to the combination result. As described above, the standard graph data may be stored in columns and chunks, and this storage method provides great flexibility and convenience for recombining different data to generate a new graph. When necessary, the standard graph data can therefore be flexibly combined to generate the graph data actually required and the graphs corresponding to it, thereby expanding the application of the graph data.
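A combination operation of this kind might look like the following Python sketch, which keeps a subset of points and the edges between them. The function `combine_subgraph`, the row layout (dicts with an `id` key, edges as `(start, end)` pairs), and the sample data are assumptions chosen for illustration; the source does not prescribe a particular combination operation.

```python
def combine_subgraph(vertex_chunks, edge_chunks, keep_vertex):
    """Build a new graph from selected points and the edges between them.

    vertex_chunks: list of chunks, each a list of dict rows with an "id" key.
    edge_chunks:   list of chunks, each a list of (start_id, end_id) pairs.
    keep_vertex:   predicate deciding which point rows to keep.
    """
    vertices = [row for chunk in vertex_chunks for row in chunk if keep_vertex(row)]
    kept_ids = {row["id"] for row in vertices}
    # An edge survives only if both of its end points were kept.
    edges = [
        (s, d)
        for chunk in edge_chunks
        for (s, d) in chunk
        if s in kept_ids and d in kept_ids
    ]
    return vertices, edges


# Two point chunks and two edge chunks read from existing graph files.
vertex_chunks = [
    [{"id": 0, "kind": "user"}, {"id": 1, "kind": "post"}],
    [{"id": 2, "kind": "post"}, {"id": 3, "kind": "user"}],
]
edge_chunks = [[(0, 1), (0, 2)], [(3, 2)]]

new_vertices, new_edges = combine_subgraph(
    vertex_chunks, edge_chunks, keep_vertex=lambda row: row["id"] != 2
)
```

Because the input chunks are only read, the original graph files stay unmodified while the new graph is assembled.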

Hereinafter, the above process is explained by a scenario example, as shown in fig. 2F.

In this example, assume that user 01 constructs a graph X through graph application A. Graph application A uploads graph X to the cloud server and calls the graph file writing interface of the graph data processing method, so that a first graph file corresponding to the standard graph data of the points in graph X and a second graph file corresponding to the standard graph data of the edges in graph X are generated. The first graph file and the second graph file are stored in the cloud server.

On the other side, another user 02 has graph application B and wishes to use graph X via graph application B to generate a new graph Y. In this case, graph application B first sends a graph file reading request to the cloud server, reads the first graph file and the second graph file corresponding to graph X through the graph file reading interface of the cloud server, and downloads them locally to be opened in graph application B. Since both graph applications A and B by default store graph data in the node standard format of the embodiment of the present application, the first graph file and the second graph file can be read and used in graph application B.

Further, the user 02 recombines part of the points in the first graph file and part of the edges in the second graph file by the graph application B to generate the graph Y.

It can be seen that, with the present embodiment, graph data can be stored in a preset standard format regardless of the scenario or application in which it was generated. The points and edges in the graph data correspond to different node standard formats, so the graph data is stored in the standard format that matches its node type. Thus, when a graph is constructed in one graph application, the graph data it produces can be used without distinction in another graph application, or can support unimpeded graph-data-based interaction with another graph application. Compared with traditional conversion schemes, storage in a unified standard format requires no format conversion of the graph data, which greatly improves the utilization rate of the graph data. This effectively solves the incompatibility caused in the traditional approach by the different graph data file formats used by various services or applications, and in turn the resulting low data processing efficiency and degraded service quality of applications or services based on graph data.

Moreover, existing file formats such as ORC, Parquet, HDF5, and CSV can be used effectively, and compatibility with them is maintained. Based on the standard format, different standalone or distributed graph computation engines, databases, and the like can conveniently load and export graph data. In addition, since the format does not deviate from the characteristics of graph data, non-graph computing engines such as Spark/Hadoop can generate standard graph data or graph files, and graph representations such as CSR/CSC can also be supported. Furthermore, based on standard graph data, rich downstream computing tasks such as out-of-core graph computation can be conveniently processed, and further operations may be performed while keeping the payload graph files unmodified, such as adding new attributes, adding or deleting points, and freely combining different types of points and edges to organize a new graph.

Referring to FIG. 3A, a flowchart of the steps of another graph data processing method according to an embodiment of the present application is shown. In this embodiment, the graph data processing method of the embodiment of the present application is described in a scenario in which standard graph data is generated from a graph and then stored in a graph database.

The graph data processing method comprises the following steps:

Step S302: acquiring information of the graph data to be stored.

The graph data to be stored may be data of a graph generated via any graph application or non-graph application, such as a graph for characterizing a relationship between a user and a commodity, a graph in a knowledge graph, and the like. In the embodiment of the present application, the specific generation and acquisition manner of the graph data is not limited.

Step S304: determining the matched node standard format according to the node type indicated by the node type information of the graph data.

For a graph, its nodes are generally of two types: a point type and an edge type. In the embodiment of the application, a corresponding node standard format is set for each of the two types. Specifically: when the node type is the point type, the node standard format comprises recording, using a logical point table, a point identifier and at least one point attribute field corresponding to the point identifier; when the node type is the edge type, the node standard format comprises recording, using a logical edge table, an edge start point identifier, an edge end point identifier, and the edge attribute fields of the edge corresponding to the edge start point identifier and the edge end point identifier. In this way, graph data can be managed in an effectively standardized manner.
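The selection of a node standard format by node type can be illustrated with a small Python sketch. The record classes, the function `to_standard_record`, and the input keys `id`, `src`, and `dst` are hypothetical names introduced here; they only mirror the fields of the logical point table and the logical edge table described above.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class PointRecord:
    """One row of a logical point table: identifier plus attribute fields."""
    point_id: Any
    attributes: Dict[str, Any]


@dataclass
class EdgeRecord:
    """One row of a logical edge table: start, end, plus attribute fields."""
    start_id: Any
    end_id: Any
    attributes: Dict[str, Any]


def to_standard_record(node_type, values):
    """Pick the standard format matching node_type and fill it from values."""
    values = dict(values)  # copy so the caller's data is not modified
    if node_type == "point":
        return PointRecord(values.pop("id"), values)
    if node_type == "edge":
        return EdgeRecord(values.pop("src"), values.pop("dst"), values)
    raise ValueError(f"unknown node type: {node_type!r}")


point = to_standard_record("point", {"id": 0, "name": "Hamid"})
edge = to_standard_record("edge", {"src": 0, "dst": 5237, "creationDate": "2011-01-01"})
```

Any key that is not an identifier is treated as an attribute field, so points and edges with different attribute sets fit the same two record shapes.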

Step S306: generating, for the graph data, standard graph data conforming to the node standard format according to the correspondence between each field in the target graph database and the node standard format and based on the node type information and the data value information of the graph data, and storing the standard graph data into the corresponding fields in the target graph database.

The target graph database includes a plurality of fields, such as the plurality of fields in the aforementioned logical point table or logical edge table. In an alternative, the sequence of fields in the target graph database may be consistent with a logical point table or a logical edge table to improve storage efficiency. However, in some cases, the sequence of each field in the target graph database may not be completely consistent with the logical point table or the logical edge table, and the actually stored data may be consistent but the field names may not be completely consistent.

The above process is exemplified below with a specific scenario example, as shown in fig. 3B.

In this example, a user posting in a community forum is used as a simple illustration. Assume that the user's username is Hamid and that he has published 3 posts in the forum, for example posts 5237, 5338, and 6197. Both the user Hamid and the posts he published carry corresponding information; for example, Hamid corresponds to url information, and each post corresponds to creation Date information. From this, it can be determined that the name and the post can be points in the graph, and the relationship between the name and the post can be an edge between the two. Then, according to the node standard format corresponding to the point type and the node standard format corresponding to the edge type, standard graph data of the point type and standard graph data of the edge type can be generated for the data. Further assume that, in graph database X (the target database), the field name corresponding to the name is "user name", the field name corresponding to the url is "user address", the field name corresponding to the post is "post", and the field name corresponding to the creation Date is "creation time". Then, according to this correspondence, the corresponding information in the standard graph data of the point type and of the edge type is used to generate the standard graph data that can be stored in graph database X (the target database). In turn, the stored standard graph data may be presented to the user in graph form.
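The field correspondence in this scenario can be sketched in Python as a simple renaming step before storage. The mapping values come from the example above; the key spellings (`creationDate`), the function `to_target_fields`, and the sample url and date values are placeholders assumed for illustration.

```python
# Correspondence between standard-format fields and graph database X fields.
FIELD_MAP = {
    "name": "user name",
    "url": "user address",
    "post": "post",
    "creationDate": "creation time",
}


def to_target_fields(record, field_map=FIELD_MAP):
    """Rename the fields of a standard record to the target database's names."""
    unknown = [key for key in record if key not in field_map]
    if unknown:
        raise KeyError(f"no target field for: {unknown}")
    return {field_map[key]: value for key, value in record.items()}


# Placeholder values; the source gives only the user name and the post ids.
user_row = to_target_fields({"name": "Hamid", "url": "http://example.org/hamid"})
post_row = to_target_fields({"post": 5237, "creationDate": "2011-01-01"})
```

Only the field names change; the stored values stay the same, which matches the remark that the data can be consistent even when the field names are not.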

Therefore, according to the embodiment, the graph data is stored in the graph database by adopting the unified standard format, and the format conversion of the graph data is not needed in the subsequent use and interaction based on the graph data, so that the utilization rate of the graph data is greatly improved. Therefore, the problem of incompatibility caused by different formats of graph data files used by various services or applications in the traditional mode is effectively solved, and the problems that the data processing efficiency of the applications or services based on the graph data is low and the service quality is influenced are further solved.

Referring to fig. 4, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, and the specific embodiment of the present application does not limit a specific implementation of the electronic device.

As shown in fig. 4, the electronic device may include: a processor (processor) 402, a communication Interface 404, a memory 406, and a communication bus 408.

Wherein:

the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.

A communication interface 404 for communicating with other electronic devices or servers.

The processor 402 is configured to execute the program 410, and may specifically execute relevant steps in the above-described data processing method embodiment.

In particular, program 410 may include program code comprising computer operating instructions.

The processor 402 may be a CPU, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

And a memory 406 for storing a program 410. Memory 406 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.

The program 410 may be specifically configured to enable the processor 402 to execute operations corresponding to the graph data processing method described in the foregoing method embodiment.

For specific implementation of each step in the program 410, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing method embodiments, and corresponding beneficial effects are provided, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.

The embodiment of the present application further provides a computer program product, which includes computer instructions, where the computer instructions instruct a computing device to execute operations corresponding to any of the above-described method embodiments.

It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.

The above-described methods according to embodiments of the present application may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing those methods.

Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.

The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims

1. A graph data processing method, comprising:

acquiring information of graph data to be stored, wherein the information of the graph data at least comprises node type information and data value information of the graph data;

determining a matched standard node format according to the node type indicated by the node type information of the graph data, wherein the node type comprises: point types and edge types in graph data; when the node type is the point type, the standard format of the node comprises: recording a point identifier and at least one point attribute field corresponding to the point identifier using a logical point table; when the node type is an edge type, the standard format of the node comprises: recording an edge starting point identifier, an edge end point identifier and an edge attribute field of an edge corresponding to the edge starting point identifier and the edge end point identifier by using a logic edge table;

storing the graph data into standard graph data conforming to the standard format of the nodes according to the node type information and the data value information of the graph data;

the method further comprises the following steps:

dividing the logic point table into a plurality of sub-tables according to rows according to a first preset size; aiming at each sub-table, dividing the sub-table into a plurality of column groups according to the column division of the logic point table; according to the divided multiple column groups, physically storing the graph data corresponding to the logic point table to obtain multiple corresponding physical point tables;

or,

according to the information of the graph data stored in the logic edge table, obtaining an offset table for recording offset position information, a topological structure table for recording edge topology and an edge attribute table for recording the edge attribute field corresponding to the logic edge table; dividing the offset table into a plurality of offset sub-tables according to a second preset size; obtaining a plurality of corresponding topological structure sub-tables and a plurality of edge attribute sub-tables according to the plurality of offset sub-tables; and according to the plurality of offset sub-tables, the topological structure sub-table and the plurality of edge attribute sub-tables, physically storing the graph data corresponding to the logical edge table to obtain a plurality of corresponding physical edge tables.

2. The method of claim 1, wherein the information of the graph data further comprises: label information of the graph data, the label information being used to characterize a classification of the graph data in an application to which it belongs;

the storing the graph data into standard graph data conforming to the standard format of the node according to the node type information and the data value information of the graph data includes:

and storing the graph data into standard graph data conforming to the standard format of the nodes according to the label information, the node type information and the data value information of the graph data.

3. The method of claim 1 or 2,

the node standard format of the point type further includes: a node key value field;

and/or,

the node standard format of the edge type further includes: and recording offset position information of the start point corresponding to the edge start point identification by using the logic edge table.

4. The method of claim 1, wherein said obtaining a corresponding plurality of topology sub-tables and edge attribute sub-tables from the plurality of offset sub-tables comprises:

obtaining a plurality of corresponding table blocks comprising a topological structure table and an edge attribute table according to the plurality of offset sub-tables;

for each table block, dividing the topological structure table in each table block into a plurality of topological structure sub-tables according to a third preset size; and dividing the edge attribute table in the table block according to the plurality of topological structure sub-tables to obtain a plurality of edge attribute sub-tables aligned with the plurality of topological structure sub-tables in the table block.

5. The method of claim 1, wherein the method further comprises:

and generating a file corresponding to the standard graph data according to preset graph file metadata information.

6. The method of claim 5, wherein the graph file metadata information comprises: graph file naming information, graph file path base information, point file metadata information, and edge file metadata information;

wherein,

the point file metadata information includes: point type information, first data segmentation size information, point attribute grouping information and point file path information;

the edge file metadata information includes: the start point, end point, and edge type information of the edge, edge file path information, edge topology representation method information, second data segmentation size information, and edge attribute grouping information.

7. The method of claim 5, wherein the method further comprises:

receiving a graph file reading request;

and accessing a graph file reading interface according to the graph file reading request, reading a file corresponding to the standard graph data requested by the graph file reading request through the graph file reading interface, and analyzing to obtain the standard graph data in the file.

8. The method of claim 5, wherein the method further comprises:

acquiring standard graph data of a file to be written and information of a node standard format corresponding to the standard graph data;

and accessing a graph file writing interface according to the type of the standard graph data, and writing the standard graph data into a file with a format matched with the type of the standard graph data according to the node standard format through the graph file writing interface to generate a corresponding graph file.

9. A graph data processing method, comprising:

determining a matched standard node format according to the node type indicated by the node type information of the graph data, wherein the node type comprises: point types and edge types in graph data; when the node type is the point type, the standard format of the node comprises: recording a point identifier and at least one point attribute field corresponding to the point identifier using a logical point table; when the node type is an edge type, the standard format of the node comprises: recording an edge starting point identifier, an edge end point identifier and an edge attribute field of an edge corresponding to the edge starting point identifier and the edge end point identifier by using a logic edge table;

dividing the logic point table into a plurality of sub-tables according to rows according to a first preset size; aiming at each sub-table, dividing the sub-table into a plurality of column groups according to the column division of the logic point table; according to the divided column groups, carrying out physical storage on the graph data corresponding to the logic point table to obtain a plurality of corresponding physical point tables;

or,

according to the information of the graph data stored in the logical edge table, obtaining an offset table for recording offset position information, a topology structure table for recording edge topology, and an edge attribute table for recording the edge attribute field, which correspond to the logical edge table; dividing the offset table into a plurality of offset sub-tables according to a row according to a second preset size; obtaining a plurality of corresponding topological structure sub-tables and a plurality of edge attribute sub-tables according to the plurality of offset sub-tables; according to the plurality of offset sub-tables, the topological structure sub-table and the plurality of edge attribute sub-tables, carrying out physical storage on the graph data corresponding to the logic edge table to obtain a plurality of corresponding physical edge tables;

and generating standard graph data conforming to the standard format of the nodes for the graph data according to the corresponding relation between each field in the target graph database and the standard format of the nodes and based on the node type information and the data value information of the graph data, and storing the standard graph data into the corresponding field in the target graph database.

10. An electronic device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the method in any one of claims 1-9.